VMblog Expert Interview: Thundra Talks Bringing Order to the Chaos - Answering Questions on Chaos Engineering

Thundra's VP of Product Emrah Samdan recently spoke at the Chaos Conference, the world's largest chaos engineering event.  After he spoke, the attendees had some great follow-up questions.  Given the discussion, we figured other people might have similar questions, so we've curated those questions and Emrah's answers.

VMblog:  How confidently can we start experimenting in non-production environments if we know for sure that test environments are not mirroring production?  What are some strategies you may apply to tackle this?

Emrah Samdan:  You don't always have to test in the production environment. You can still test how you respond as a team. Running chaos experiments in production is the end goal, not the way to start. Starting in non-production also lets you "learn" how to run chaos experiments.

VMblog:  You talked about latency injections; what other types of anomalies are included in chaos injections?

Samdan:  Injecting different types of failures, playing with the concurrent execution limits of serverless functions, and playing with the IAM permissions.

VMblog:  In your teams, how do you manage/keep track of what chaos experiments have been performed across all your serverless functions?

Samdan:  Well, my preferred way is to create a shared communication channel and use the retrospective templates provided by incident management platforms like Opsgenie and PagerDuty.

VMblog:  Could you give us some examples of chaos injections into web applications?  And, how effective is it?  What are the metrics that we can use to measure those as well?

Samdan:  You can start really small instead of with "the day after tomorrow" scenario. For example, you can simply inject latency into your API endpoints and see whether problems appear in other parts of your system, such as more items waiting in the queue or problems with DB connections. After that, you can make it bigger by injecting more latency or injecting latency into more places.
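
The latency injection described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Thundra implements it: `inject_latency` and `get_order` are hypothetical names, and the decorator simply sleeps before a fraction of calls so that the effect on queues and DB connections downstream can be observed.

```python
import functools
import random
import time


def inject_latency(delay_seconds, probability=1.0, sleep=time.sleep, rng=random.random):
    """Wrap a handler so that a fraction of calls is delayed before running.

    Hypothetical helper for illustration; `sleep` and `rng` are injectable
    so the behavior can be checked without actually waiting.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            # Delay only some calls so the experiment can start small.
            if rng() < probability:
                sleep(delay_seconds)
            return handler(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_seconds=0.05, probability=0.1)
def get_order(order_id):
    # Hypothetical endpoint handler standing in for a real API route.
    return {"order_id": order_id, "status": "shipped"}
```

Making the experiment "bigger" then means raising `delay_seconds` or `probability`, or applying the decorator to more handlers.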

VMblog:  Do APM tools come under the chaos engineering umbrella?

Samdan:  Not very frequently.

VMblog:  What are some of the metrics to measure in chaos attacks?

Samdan:  I'd say business-level metrics can be more important than your system-level metrics. For example, your Apdex score is more important than anything else, because it reflects the value your customers get. You should also check application-level metrics such as latency, and infrastructure-level metrics such as CPU usage.
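
For readers unfamiliar with Apdex, the score is commonly computed from response times and a target threshold T: requests at or under T count as satisfied, requests up to 4T count as tolerating (at half weight), and slower ones count as frustrated. A minimal sketch, assuming response times in seconds and a hypothetical function name `apdex`:

```python
def apdex(response_times, threshold):
    """Return the Apdex score (0.0 to 1.0) for a list of response times.

    satisfied:  t <= threshold
    tolerating: threshold < t <= 4 * threshold (counted at half weight)
    frustrated: t > 4 * threshold (counted as zero)
    """
    if not response_times:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Two satisfied, one tolerating, one frustrated request with T = 0.5s.
score = apdex([0.1, 0.2, 0.6, 3.0], threshold=0.5)  # → 0.625
```

Comparing this score before and during an experiment gives a customer-facing view of the impact, alongside latency and CPU usage.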

VMblog:  Can we inject chaos attacks on the database layer?

Samdan:  Yes, you can. But Thundra does it at the application level. For more infrastructure-level chaos on the database, you can use Gremlin.

VMblog:  You said "recursion is deadly in serverless."  Why?

Samdan:  You never know whether the base case actually covers the problematic inputs. You can end up in infinite recursion, and your function may time out.
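
A small Python sketch of the problem, under assumptions: `countdown` is a hypothetical function whose base case only covers exactly zero, so a negative input never reaches it. An explicit depth budget makes the call fail fast instead of running until a timeout. In a serverless function that re-invokes itself, the same counter could be carried in the event payload.

```python
def countdown(n, depth=0, max_depth=100):
    """Recurse toward zero, refusing to go deeper than max_depth."""
    if depth > max_depth:
        # Fail fast: the base case evidently does not cover this input.
        raise RecursionError("depth budget exceeded; check the base case")
    if n == 0:
        # Base case covers only exactly zero, not negative inputs.
        return 0
    return countdown(n - 1, depth + 1, max_depth)
```

Calling `countdown(3)` returns normally, while `countdown(-1)` raises `RecursionError` once the budget is spent rather than recursing indefinitely.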

VMblog:  Can chaos testing be part of the CI/CD pipeline?

Samdan:  Very, very good question. I think about this a lot, but automated chaos doesn't seem like that much chaos to me. Maybe we can embed previous chaos experiments into our CI/CD process, just to make sure things still work, but for new game days, there should be a new hypothesis that you come up with.
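
One way to read "embedding previous chaos experiments" is as a regression test that replays an earlier experiment's hypothesis on every build. The sketch below assumes a past game day established that price lookups should fall back to a cache when the pricing dependency fails; `fetch_price`, `failing_client`, and the test name are all hypothetical.

```python
def fetch_price(product_id, client, cache):
    """Return the live price, falling back to the cached one on failure."""
    try:
        price = client(product_id)
    except ConnectionError:
        # Hypothesis from the earlier experiment: serve the cached value.
        return cache.get(product_id)
    cache[product_id] = price
    return price


def failing_client(product_id):
    # Injected failure standing in for an unreachable pricing service.
    raise ConnectionError("injected failure")


def test_price_falls_back_to_cache_when_dependency_fails():
    cache = {"sku-1": 9.99}
    assert fetch_price("sku-1", failing_client, cache) == 9.99
```

A test runner such as pytest would pick up the `test_` function in the pipeline; new game days would still start from a fresh hypothesis rather than from this check.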


Published Wednesday, November 04, 2020 7:38 AM by David Marshall