[ This article is part of a series
promoting FailoverConf, a virtual event
dedicated to resilience hosted by Gremlin on April 21. Join! ]
By Jeff Lo, Director of Product Marketing at Splunk
Traditional IT teams and application
developers approached building resilience into their infrastructure and
applications in a waterfall fashion. They would lean on documentation, force
change requests, and try to anticipate everything that could go wrong before
testing what had been built. Modern tech environments are so complex that even
the most diligent development teams cannot predict how systems will fail, and
things will inevitably go wrong in the most unexpected ways.
In today's fast-paced and competitive business
environment, the risk of delaying deployment while trying to plan ahead for
every possible disaster scenario is far greater than the risk of shipping code
that might fail. The teams that maintain application velocity are often the
ones that win in markets where customer demands are high.
That is why resilience needs to be not only
tightly integrated into development and testing but also baked into the
tools that monitor and support applications in production. This means that
monitoring tools need to see every layer of the application, from the back-end
infrastructure to the front-end user. They also need to detect and surface
issues in a timely and relevant way. And they need to provide enough context
that on-call engineers do not have to hunt for details related to the issue.
Only with robust observability can teams
maintain application velocity and support the crucial feedback loop that is
necessary to build stability into applications and the infrastructure that runs
them. For legacy applications, an a priori approach to resilience may have
worked. But modern application architectures are complex and consist of many
small, interrelated stateful and stateless services, with more service
dependencies, faster release velocity, and even more disparity between
environments. All of this means operations and SRE teams need to expect
failure, respond quickly and learn from it. In addition to a robust and
automated testing strategy, modern environments require processes and tools
that set guard rails for resilience and support feedback loops.
Splunk supports data-driven application
development with a suite of observability tools that increase visibility from
planning to production. For modern, cloud-native applications, SignalFx
Infrastructure Monitoring and SignalFx Microservices APM keep pace with
architectures built on containers, Kubernetes and microservices, and deliver
insights that improve the performance and resiliency of all services in those
applications. With these solutions, Splunk gives today's DevOps teams the
confidence to deploy code quickly and reliably.
Additionally, Splunk's incident response solution from
VictorOps connects incidents across modern and legacy systems and helps
mobilize the right on-call engineers while providing them the context they need
to address issues with speed and precision. Splunk Enterprise gives
organizations high-fidelity data to support troubleshooting of the most complex
technical issues, and Splunk gives DevOps teams the ability to tie delivery
chain metrics to business value. All of Splunk's technologies are backed by AI
and machine learning to reduce the human time needed to respond to issues,
detect issues before they happen, and help spot abnormal behavior in real time.
Join us at the virtual FailoverConf on
April 21st to learn more about resilience from other industry leaders.
##
About the Author
Jeff Lo is Director of Product Marketing at Splunk with 20 years of experience in
product marketing, product management and go-to-market. Prior to SignalFx, Jeff
led Product Marketing at Scalyr and ran global product marketing for Predix
Studio and Digital Twins at GE Digital. Jeff holds a B.Sc. in Electrical
Engineering from the University of Alberta.