Virtualization Technology News and Information
VMblog Expert Interview: Codecov Founder Unravels the Mysteries of Code Coverage and Discusses Sentry Acquisition


Sentry, the developer-first application monitoring platform, recently acquired code coverage provider Codecov. To find out more, VMblog spoke with Codecov founder Jerrod Engelberg, now head of Codecov at Sentry, to hear his advice about code coverage and learn more about what the acquisition means for the customers of both companies.

VMblog: What is code coverage and why is it important?

Jerrod Engelberg: Simply put, code coverage is a metric used by software developers and organizations around the world to understand what percentage of their code is being tested before deployment.

In mathematical terms, code coverage is expressed as:

  • Code coverage % = (number of tested lines of code (LoC) / total number of lines of code) × 100
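As a toy sketch of this formula (the file names, line numbers, and function names here are hypothetical, not Codecov's implementation):

```python
# Toy illustration: line coverage is the ratio of executed lines
# to executable lines, expressed as a percentage.
executed_lines = {"app.py": {1, 2, 3, 5, 8}}              # lines hit by the test suite
executable_lines = {"app.py": {1, 2, 3, 4, 5, 7, 8, 9}}  # all measurable lines

def coverage_percent(executed, executable):
    """Return total coverage % across all files."""
    hit = sum(len(executed.get(f, set()) & lines) for f, lines in executable.items())
    total = sum(len(lines) for lines in executable.values())
    return 100.0 * hit / total if total else 0.0

print(round(coverage_percent(executed_lines, executable_lines), 1))  # 62.5
```

Real coverage tools collect the "executed lines" data by instrumenting the code while the test suite runs; the arithmetic at the end is the same.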

The impact of code coverage is debated, but what's clear is that there is always some risk to leaving production code untested. Code coverage is one of the fastest ways to see where you may have accidentally neglected to test your code contribution, or to identify areas of the code base where you should add tests later.

VMblog: What's the right coverage number to aim for?

Engelberg: One of the major pitfalls we see folks fall into is the "100% tested code or bust" mentality. There are benefits to having 100% code coverage, but it's important to keep in mind that achieving complete coverage may not always be possible or practical, especially for complex applications or legacy codebases.

In such cases, it may be better to focus on testing critical or high-risk areas of the application rather than trying to achieve 100% code coverage across the entire codebase.

To put numbers behind this: as much talk as there is about 100% coverage, we find that only about 22% of repos using Codecov have actually achieved the 100% mark. The majority of Codecov users are above 80% coverage.

VMblog: If I'm at 100% coverage, why do I sometimes still get errors? What's going on?

Engelberg: While untested code is generally seen as risky, tested code can still generate errors or defects. The main reasons we see for this are:

  • Incomplete or insufficient test cases: Although achieving 100% code coverage means that all code paths have been executed by the test suite, it does not guarantee that all possible scenarios have been tested. To make testing even more effective, Codecov is working on ways to "test your tests" and evaluate whether you missed any test cases.
  • Integration issues: Even if all the individual components of a software application are tested thoroughly, issues can arise when integrating these components together. Although code coverage is generally synonymous with unit testing, Codecov supports testing data from integration tests and end-to-end tests, which more closely represent a fully integrated software system.
  • Performance/environmental factors: The software application may be impacted by environmental factors such as hardware limitations, network issues, or system configuration errors, which may not be caught by the test suite.
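The first point, incomplete test cases, is worth seeing concretely. In this small hypothetical example, a single test executes every line of the function, so line coverage is 100%, yet an untested input still fails at runtime:

```python
# Hypothetical example: 100% line coverage does not mean all
# scenarios are tested.
def average(values):
    return sum(values) / len(values)

def test_average():
    # This one test executes every line of average() -> 100% coverage.
    assert average([2, 4, 6]) == 4

test_average()   # passes
# average([])    # but an empty list still raises ZeroDivisionError
```

Coverage tells you the line was executed; only a test case for the empty-list scenario would catch the defect.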

VMblog: What defines a high-quality test and how do you know how good your tests are?

Engelberg: Here are a few attributes of tests that are considered high-quality:

  • Thorough: A test suite should be thorough. That means that it should cover all relevant cases and edge cases. Often, it also means using different types of tests to ensure resiliency (unit tests, integration tests, end-to-end tests).
  • Reliable and Repeatable: A well-isolated test should be repeatable. A test should produce the same results every time it is run, provided the code being tested has not changed.
  • Performant: On average, a unit test should take only a few milliseconds to run. Integration tests and end-to-end tests are often more computationally intensive.
  • Maintainable: It's important to write tests in a way that makes them easy to understand and maintain, so they can continue to provide value as the codebase evolves. (Ensuring tests are well-named and well-documented always helps with this!)
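To make "Reliable and Repeatable" concrete, here is a minimal sketch (a hypothetical test, not from Codecov) showing how pinning a source of nondeterminism, in this case a random seed, makes a test produce the same result on every run:

```python
import random

# Sketch: a test that depends on unseeded randomness is not repeatable;
# injecting a seeded generator makes it deterministic run after run.
def sample_ids(rng, n=3):
    return rng.sample(range(100), n)

def test_sample_is_repeatable():
    a = sample_ids(random.Random(42))  # seeded: same result every run
    b = sample_ids(random.Random(42))
    assert a == b

test_sample_is_repeatable()
```

The same principle applies to clocks, network calls, and shared state: isolate or stub the dependency so the test's outcome depends only on the code under test.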

Some metrics that you can look to measure the quality of your tests include:

  • Test flakiness: Is there a test or set of tests that frequently fails during build times? The test may be flaky and/or non-deterministic due to a race condition or similar.
  • Overly robust tests: This is less discussed, but if a test hasn't failed in many months, it may be overly robust and passing even when it should not. Mutation testing is a technique that can help sniff out overly robust tests.
  • Test time: If your test suite takes many hours to run, you may want to investigate the performance of your tests. If that doesn't help, consider parallelizing your testing in CI, or using a tool like Smart Automated Test Selection to test only what has actually changed and its dependencies.
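The mutation-testing idea mentioned above can be sketched in a few lines (all names here are hypothetical, and real tools such as mutation-testing frameworks automate the mutation step): deliberately alter the code under test, re-run the tests, and see whether they fail. A mutant that survives reveals an overly permissive test.

```python
import operator

# Minimal sketch of mutation testing: flip an operator in the code
# under test and check whether the test suite notices.
def make_discount(op):
    # discount(100, 10) should be 90; the "mutant" flips '-' to '+'
    def discount(price, pct):
        return op(price, price * pct / 100)
    return discount

def weak_suite(discount):
    discount(100, 10)                 # executes the code, asserts nothing
    return True

def strong_suite(discount):
    return discount(100, 10) == 90.0  # actually checks the behavior

original = make_discount(operator.sub)
mutant = make_discount(operator.add)

print(weak_suite(mutant))    # True  -> mutant survives: overly robust test
print(strong_suite(mutant))  # False -> mutant killed: test checks behavior
```

A "100% coverage" suite made of weak tests lets every mutant survive; killing mutants is a stronger signal of test quality than coverage alone.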

VMblog: Is it possible to measure the ROI of code coverage?

Engelberg: If having a well-tested, resilient code base is important to you, code coverage is one of the most objective and tangible metrics available. Organizations that start using Codecov frequently increase their code coverage almost immediately after showing Codecov metrics in their pull request / merge request flow.

But why does increasing the testing of your codebase produce ROI?

The short answer is, the earlier you catch / prevent a defect, the "cheaper" it is for your software organization to patch that error. We believe it's far, far better for a developer to catch a bug while writing the code than for a customer to catch it for you in production -- better in terms of being faster, less damaging to your reputation, less distracting, and less costly. Finding an error during development costs an average of $80 per defect, while finding it in production can cost up to $7,600 per defect.

Beyond avoiding defects, well-tested code is generally understood to be of higher quality, more maintainable, and easier for onboarding new team members.

VMblog: Why did Sentry acquire Codecov?

Engelberg: Sentry and Codecov share a singular focus on the developer, and the acquisition expanded Sentry's capabilities for development teams to improve code quality and velocity earlier in the development life cycle. Now, when an issue surfaces in production, devs can identify which lines of code have been tested and which may need more attention. Together, the companies enable devs to be more proactive in preventing issues, and when issues inevitably do arise, they can be resolved faster with actionable data.

Even if you can't catch all the errors in a tested code base, the sooner you catch them, the cheaper they are to fix. Sentry wanted to help its customers "shift left," catching and eliminating errors before the code is deployed.

VMblog: What does the intersection of observability and code coverage look like?

Engelberg: Historically, pre-deploy and post-deploy tools were not typically part of the same software toolkit, except for internal tools built at the largest companies like Google and Facebook. However, what became clear as Sentry and Codecov started working together was that regardless of the deployment stage, more and more of this work has become the developer's responsibility, and a unified tool, focused on the developer, was needed.

Having a unified tool allows you to answer questions like:

Q: "I found an error in my running production systems. Was the function that caused this error tested?"

  • A: If the error-producing code IS NOT tested, the next step might be to write a test around this case. But if the error-producing code IS tested, the next step is to investigate the test in question relative to the code logic and see if, for example, a test case was missed.

Q: "I'm about to make this change to a part of the codebase I'm not familiar with. How often is it actually used in production? What parts of the customer experience are impacted by this change?"

  • A: We heard again and again stories of a well-meaning developer changing a deeply-nested, shared library without realizing that that library was consumed by something like the log-in button or the checkout flow. All of a sudden, changing a completely different part of the codebase knocked out a core user experience. Bringing production data back to the change management step allows developers to understand the *customer impact* of their code contribution.

VMblog: What does Sentry buying Codecov mean for existing Codecov customers?

Engelberg: Codecov helps you understand how your deployments will impact your users before you deploy your code, and Sentry allows you to monitor for errors and performance after deployment. By using them together, you fully understand your application's reliability, which means faster development cycles, quicker discovery and remediation of bugs, and an overall better developer experience.

We just announced our first integration on March 29, so developers using Codecov's integration with Sentry will now see any untested code causing errors directly in the Sentry Issue stack trace. Our goal with this first iteration was to unify pre-production and post-deploy data and give developers actionable insights, saving them time and their organizations money by making it faster and easier to find and resolve the problems that will inevitably happen. There hasn't been a single solution that helps developers fill gaps in coverage and fix the most critical issues caused by untested code - until now.

We'll continue to build tighter integrations and more capabilities as we work toward our goal of being the best code coverage tool on the market while providing more value than ever to our user base.

VMblog: How does code coverage connect to developer productivity and innovation?

Engelberg: We discussed the ROI of code coverage and the benefits of testing above. One additional benefit of code coverage is that it can speed deployment and innovation, because you have the safety net of code coverage behind you.

  1. Code authors know nearly instantly if they were successful in testing the parts of the code base they intended to, and have an immediate remediation path to know where to write tests if they did not.
  2. Change (pull request) reviewers no longer have to spend time reviewing if code is tested, and can focus instead on reviewing the logic of the code submitted to them.
  3. New teammates can more quickly be onboarded to the code base, looking at tests (or even contributing tests!) to understand how and why the codebase was written.
  4. Managers and executives can have a constant pulse on the code quality of the codebase and don't have to feel the vertigo effect of a codebase changing constantly around them without any insight.
Published Wednesday, March 29, 2023 9:01 AM by David Marshall