Virtualization Technology News and Information
Gremlin Announces Automatic Service Discovery for More Targeted and Effective Chaos Engineering

Gremlin announces Automatic Service Discovery at FailoverConf. The new feature from Gremlin automatically identifies the various services running across distributed systems, which enables engineers to directly target them for more effective Chaos Engineering experiments.

"When we started Gremlin our primary focus was on the underlying infrastructure, helping customers answer questions like, 'Can we handle server crashes?' or 'Can this cluster deal with a 10X traffic spike?'" said Matthew Fornaciari, CTO and Co-Founder of Gremlin. "But the rise in popularity of microservices necessitate services functioning as first-class citizens. The infrastructure layer is becoming more abstract and engineers are increasingly thinking about their systems as a collection of services. We want to replicate that mental model in Gremlin and reduce the cognitive load necessary to create controlled chaos."

[ Watch the VMblog Expert Video Interview with James Thigpen of Gremlin ]

Gremlin's Automatic Service Discovery works by identifying the services running where the Gremlin agent is installed, and then surfacing the operational data that makes those services function, such as process names, container images, and where the service is deployed. This makes it easier than ever before for engineers to run targeted chaos experiments, regardless of how they are hosted, be it distributed across hosts, containers, or even multiple cloud providers.

"End customers won't care about the ephemeral workloads and API calls happening behind the UI, they just want applications that function and perform as expected," said Jason English, Principal Analyst at Intellyx. "Before DevOps teams can shift-left and engineer resiliency into a system with early performance testing, chaos experiments and telemetry; they need to shift-right and discover exactly what services are contributing to that customer experience in production."

Gremlin has also built a new way to track reliability progress, enabling SREs and DevOps teams to click into a particular service and view the full history of experiments run over time. The owner of the service can also include links to runbooks for remediation and any associated dashboards for deeper observability. Having a single view for all of this information will provide engineers with a greater understanding of the reliability of their services.

More resources

Read the State of Chaos Engineering 2020 report.

Published Tuesday, April 27, 2021 10:15 AM by David Marshall
Filed under:
There are no comments for this post.
To post a comment, you must be a registered user. Registration is free and easy! Sign up now!
<April 2021>