Sentry, the developer-first application
monitoring platform, recently acquired code coverage provider Codecov. To find
out more, VMblog spoke with Codecov founder Jerrod Engelberg, now head of
Codecov at Sentry, to hear his advice about code coverage and learn more about
what the acquisition means for the customers of both companies.
VMblog:
What is code coverage and why is it important?
Jerrod Engelberg: Simply put, code coverage is a metric used by software
developers and organizations around the world to understand what percentage of
their code is being tested before deployment.
In mathematical terms, code coverage is
expressed as:
- Code coverage % = (number of tested lines of
code (LoC) ÷ total number of lines of code) × 100
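As a minimal sketch with made-up numbers (the counts below are hypothetical, not drawn from any real project), the calculation looks like this:

```python
# Hypothetical line counts used only to illustrate the coverage formula.
tested_loc = 842      # lines executed at least once by the test suite
total_loc = 1_000     # total executable lines in the codebase

coverage_pct = tested_loc / total_loc * 100
print(f"Code coverage: {coverage_pct:.1f}%")  # -> Code coverage: 84.2%
```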
The impact of code coverage is debated, but
what's clear is that there is always some risk to leaving production code untested. Code coverage is one of the
fastest ways to see where you may have accidentally neglected to test your code
contribution, or to identify areas of the code base where you should add tests
later.
VMblog:
What's the right coverage number to aim for?
Engelberg: One of the major pitfalls we see
folks fall into is the "100% tested code or bust" mentality. There are benefits
to having 100% code coverage, but it's important to keep in mind that achieving
complete coverage may not always be possible or practical,
especially for complex applications or legacy codebases.
In such cases, it may be better to focus on
testing critical or high-risk areas of the application rather than trying to
achieve 100% code coverage across the entire codebase.
To put numbers behind this: as much talk as
there is about 100% coverage, we find that only about 22% of repos using
Codecov have actually achieved the 100% mark. The majority of Codecov users are
above 80% coverage.
VMblog:
If I'm at 100% coverage, why do I sometimes still get errors? What's going on?
Engelberg: While untested code is generally
seen as risky, tested code can still generate errors or defects. The main
reasons we see for this are:
- Incomplete or insufficient test
cases: Achieving 100% code coverage means that every line of code has been
executed by the test suite, but it does not guarantee that all possible
scenarios have been tested (a small illustration follows this list). To make
testing even more effective, Codecov is working on ways to "test your tests"
and evaluate whether you missed any test cases.
- Integration issues: Even if all
the individual components of a software application are tested thoroughly,
issues can arise when those components are integrated. Although code
coverage is generally synonymous with unit testing, Codecov supports testing
data from integration tests and end-to-end tests, which more closely represent
a fully integrated software system.
- Performance/environmental factors:
The software application may be impacted by environmental factors such as
hardware limitations, network issues, or system configuration errors, which may
not be caught by the test suite.
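To make the first point concrete, here is a hypothetical example (the function and test are invented for illustration) where line coverage reports 100% even though a real-world scenario is never exercised:

```python
# Every line of average() is executed by the test below, so line
# coverage reports 100% -- yet the empty-list case is never tested
# and would raise ZeroDivisionError in production.
def average(values):
    return sum(values) / len(values)

def test_average():
    assert average([2, 4, 6]) == 4
```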
VMblog:
What defines a high-quality test and how do you know how good your tests are?
Engelberg: Here are a few attributes of tests
that are considered high-quality:
- Thorough: A test suite should cover all relevant cases and edge cases. Often,
that also means using different types of tests to ensure resiliency (unit
tests, integration tests, end-to-end tests).
- Reliable and Repeatable: A well-isolated test
should be repeatable: it should produce the same results every time it is
run, provided the code being tested has not changed. (A short example follows
this list.)
- Performant: On average, a unit test should
take only a few milliseconds to run. Integration tests and end-to-end tests are
often more computationally intensive.
- Maintainable: It's important to write tests in
a way that makes them easy to understand and maintain, so they can continue to
provide value as the codebase evolves. (Ensuring tests are well-named and
well-documented always helps with this!)
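As a rough sketch of what these attributes look like in practice (the function and tests below are hypothetical), a high-quality unit test is small, isolated, deterministic, and descriptively named:

```python
# Hypothetical example: no network, no shared state, deterministic
# inputs, and names that describe the behavior being verified.
def apply_discount(price_cents: int, percent: float) -> int:
    """Return the discounted price, rounded down to whole cents."""
    return int(price_cents * (1 - percent / 100))

def test_apply_discount_rounds_down_to_whole_cents():
    assert apply_discount(1099, 10) == 989

def test_zero_percent_discount_leaves_price_unchanged():
    assert apply_discount(1099, 0) == 1099
```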
Some metrics you can use to measure the
quality of your tests include:
- Test flakiness: Is there a test or set of
tests that frequently fails during builds? The test may be flaky and/or
non-deterministic due to a race condition or something similar.
- Overly robust tests: This is less discussed,
but if a test hasn't failed in many months, it may be overly robust and passing
even when it should not. Mutation testing is a technique that can help
sniff out overly robust tests (a hand-rolled illustration follows this list).
- Test time: If your test suite takes many hours
to run, you may want to investigate the performance of your tests. If that
doesn't help, consider parallelizing your testing in CI, or using a tool like Smart Automated Test Selection to test only
what has actually changed and its dependencies.
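The mutation-testing idea mentioned above can be illustrated by hand. In the hypothetical sketch below (real mutation-testing tools automate the mutation step), a deliberately broken "mutant" still passes a weak test suite, which signals that the tests never probe the boundary case:

```python
# If the tests still pass when the code is deliberately broken
# ("mutated"), they are too permissive to catch that kind of defect.
def is_adult(age: int) -> bool:          # original implementation
    return age >= 18

def is_adult_mutant(age: int) -> bool:   # mutant: >= changed to >
    return age > 18

def weak_suite_passes(fn) -> bool:
    """A weak test suite that never checks the boundary value 18."""
    return fn(30) is True and fn(5) is False

print(weak_suite_passes(is_adult))         # True - passes on the original
print(weak_suite_passes(is_adult_mutant))  # True - mutant survives undetected
```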
VMblog:
Is it possible to measure the ROI of code coverage?
Engelberg: If having a well-tested, resilient
code base is important to you, code coverage is one of the most objective and
tangible metrics available. Organizations that start using Codecov frequently
increase their code coverage almost immediately after showing Codecov metrics
in their pull request / merge request flow.
But why does increasing the testing of your
codebase produce ROI?
The short answer is, the earlier you catch /
prevent a defect, the "cheaper" it is for your software organization to patch
that error. We believe it's far, far better for a developer to catch a bug
while writing the code than for a customer to catch it for you in production --
better in terms of being faster, less damaging to your reputation, less
distracting, and less costly. Finding an error in development costs an average
of $80 per defect, while finding it in production can cost up to $7,600 per
defect.
Beyond avoiding defects, well-tested code is
generally understood to be of higher quality, more maintainable, and easier to
onboard new team members onto.
VMblog:
Why did Sentry acquire Codecov?
Engelberg: Sentry and Codecov share a singular
focus on the developer, and the acquisition expanded Sentry's capabilities for
development teams to improve code quality and velocity earlier in the
development life cycle. Now, once code is in production, devs can identify
which lines have been tested and which may need more attention. Together, the
companies enable devs to be more proactive in preventing issues, and when
issues inevitably do arise, they can be resolved faster with actionable data.
Even if you can't catch all the errors in a
tested code base, the sooner you can catch them, the cheaper they are to fix.
Sentry wanted to help its customers "shift left," catching errors and
eliminating them before the code is deployed.
VMblog:
What does the intersection of observability and code coverage look like?
Engelberg: Historically, pre-deploy and
post-deploy tools were not typically part of the same software toolkit, except
for internal tools built at the largest companies like Google and Facebook.
However, what became clear as Sentry and Codecov started working together was
that regardless of the deployment stage, more and more of this work has become
the developer's responsibility, and a unified tool, focused on the developer,
was needed.
Having a unified tool allows you to answer
questions like:
Q: "I found an error in my running production
systems. Was the function that caused this error tested?"
- A: If the error-producing code IS NOT
tested, the next step might be to write a test around this case. But if the
error-producing code IS tested, the next step is to investigate the test
in question relative to the code logic and see if, for example, a test case was
missed. (A rough sketch of this kind of lookup follows.)
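As a purely hypothetical illustration of that lookup (this is not Sentry's or Codecov's actual API, just the idea of joining coverage data with an error's stack frame):

```python
# Hypothetical sketch: given a coverage report (file -> covered line
# numbers) and the stack frame that raised a production error, decide
# whether the failing code was ever exercised by the test suite.
coverage_report = {
    "billing/invoice.py": {10, 11, 12, 20, 21},
}
error_frame = {"file": "billing/invoice.py", "line": 42}

def frame_is_covered(report: dict, frame: dict) -> bool:
    return frame["line"] in report.get(frame["file"], set())

if frame_is_covered(coverage_report, error_frame):
    print("Covered: inspect the existing test for a missed case.")
else:
    print("Not covered: write a test that reproduces this error first.")
```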
Q: "I'm about to make this change to a part of
the codebase I'm not familiar with. How often is it actually used in
production? What parts of the customer experience are impacted by this change?"
- A: We heard, again and again,
stories of a well-meaning developer changing a deeply nested, shared library
without realizing that the library was consumed by something like the log-in
button or the checkout flow. All of a sudden, changing a completely different
part of the codebase knocked out a core user experience. Bringing production
data back to the change-management step allows developers to understand the
*customer impact* of their code contribution.
VMblog:
What does Sentry buying Codecov mean for existing Codecov customers?
Engelberg: Codecov helps you understand how
your deployments will impact your users before you deploy your code, and Sentry
allows you to monitor for errors and performance after deployment. By using
them together, you fully understand your application's reliability, which means
faster development cycles, quicker discovery and remediation of bugs, and an
overall better developer experience.
We just announced our first integration on March 29: developers using
Codecov's integration with Sentry will now see any untested code causing errors
directly in the Sentry Issue stack trace. Our goal with this first iteration
was to unify pre-production and post-deploy data and provide actionable
insights to developers, saving them time and saving organizations money by
making it faster and easier to find and resolve the problems that will
inevitably happen. There hasn't been a single solution that helps developers
fill gaps in coverage and fix the most critical issues caused by untested code
- until now.
We'll continue to build tighter integrations and more capabilities, to ensure
we're providing the best code coverage tools on the market while providing more
value than ever to our user base.
VMblog:
How does code coverage connect to developer productivity
and innovation?
Engelberg: We discussed the ROI of code
coverage and the benefits of testing above. One additional benefit is that code
coverage can speed deployment and innovation, because you have that safety net
behind you.
- Code authors know nearly instantly if
they were successful in testing the parts of the code base they intended
to, and have an immediate remediation path to know where to write tests if
they did not.
- Change (pull
request) reviewers no longer
have to spend time reviewing whether code is tested, and can focus instead on
reviewing the logic of the code submitted to them.
- New teammates can more
quickly be onboarded to the code base, looking at tests (or even contributing tests!) to understand
how and why the codebase was written.
- Managers and executives can
have a constant pulse on the code quality of the codebase and don't have
to feel the vertigo effect of a codebase changing constantly around them
without any insight.
##