By Ayse Kaya
Between 2020 and
2021, the number of all-time pulls on Docker Hub more than doubled, from 130
billion to 318 billion. That level of growth is
astounding, especially when you consider that it took more than six years to
achieve the first 130 billion and that some estimates say Docker Hub, while
still the most popular container registry, is home to just half
of the world's containers.
Containers have
become the norm for application development (report), driven by massive developer adoption
of cloud native apps and containerized workflows. The result: millions of
public image repositories. And with more than 8
million container images on Docker Hub alone, the container
landscape continues to get more complex, more specialized, and more difficult
to secure.
At Slim.AI, our
startup focused on container best practices and developer experience, we
have container enthusiasts using our various tools every day, scanning
containers, optimizing them, and sharing their experience with us.
We thought it
would be interesting to find out what's inside the public images that serve as
starting points for nearly all modern software development. So, we looked.
In our forthcoming
report, to be released during KubeCon, we look inside the most popular
containers available on Docker Hub. We were interested in profiling these
containers from the point of view of a developer starting a project, so we
designed our study to answer questions like: Is the container going to be easy
to use? Is it efficient? Is it safe? Will it cause issues when I ship my
application?
Containers
promise a simplified world for developers and operations alike, and they
certainly have turbocharged the move to the cloud. But with new solutions come
new problems. Our preliminary findings show an increasingly complex container
ecosystem that requires deep expertise and great tooling to fulfill that
promise; otherwise, it just creates more friction for developers and ops
teams alike.
We applied an
array of proprietary and open-source tools, combined with the insights we've
gleaned from talking to the container experts who use our DockerSlim
open-source project, to curate a list of the Top 100 most impactful containers being
used today and to perform a deep-dive analysis of what those containers comprise.
Why the Top 100?
When we look at pulls (that is, actual usage), we see a skewed Pareto distribution
in the data set, where a subset of popular containers makes up a significant
percentage of overall use. So, by focusing on a "Top 100" (give or take), we
had a convenient cutoff point that is at once relevant to most developers and
manageable from an analytics point of view. Several of these containers, such as
Jenkins, Ubuntu, CentOS, Redis, and PostgreSQL, have been pulled more than a
billion times each.
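To make the cutoff idea concrete, here is a minimal Python sketch of how one might find the number of top-ranked images that account for a given share of all pulls. The function names and the pull counts are hypothetical and chosen only for illustration; this is not the tooling or the data used for the report.

```python
def cumulative_share(pull_counts):
    """Return the cumulative share of total pulls, most-pulled image first."""
    ranked = sorted(pull_counts, reverse=True)
    total = sum(ranked)
    shares = []
    running = 0
    for count in ranked:
        running += count
        shares.append(running / total)
    return shares


def cutoff_rank(pull_counts, target_share):
    """Smallest number of top images whose pulls reach target_share of the total."""
    for rank, share in enumerate(cumulative_share(pull_counts), start=1):
        if share >= target_share:
            return rank
    return len(pull_counts)


# Hypothetical pull counts for ten images (not real Docker Hub data).
example_pulls = [1000, 600, 300, 50, 20, 10, 8, 6, 4, 2]
print(cutoff_rank(example_pulls, 0.9))  # prints 3
```

In this made-up data set, three of the ten images account for at least 90 percent of pulls, which is the kind of skew that makes a short "top N" list representative of most real-world usage.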
We also added an
element of qualitative curation to our analysis based on the usage of
DockerSlim and our emerging Slim Developer Platform, as looking purely at pull
count or Docker Hub official images is not necessarily the only way to
understand the evolving container landscape. Developers use Docker containers
in myriad ways, and many highly active development communities have highly
relevant images that aren't always prevalent in "Most Popular" type
lists: LinuxServer.io, Nuxt.js, and Raspberry Pi, for example. Ignoring these
types of images would not ring true to our analysis or to how developers think about
public container images.
Here are a few
more details about our methodology:
- The containers included a mix of
base images and purpose-built public containers.
- Using DockerSlim's X-Ray tool
along with a suite of other open source tools, combined with data and insights
from our SaaS platform, we looked into the "latest" version of each container
and extracted a series of characteristics that define what we call the level of
"production readiness" of these containers.
- We analyzed those reports
individually and in aggregate using state-of-the-art data science techniques,
evaluating them against container best practices to provide a big-picture
overview and to surface outliers from the norm.
Here's a sneak
preview of three of the early takeaways from the forthcoming report. Some
surprised us, and others made perfect sense:
- Slim Is In: We saw a provable correlation between container size and attack
surface (vulnerabilities). Larger containers tend to contain more
vulnerabilities with varying degrees of severity and, in some cases, with no
easy fix. Some of the containers in the programming languages category, such as
Rust, Ruby, and Perl, and some web frameworks, such as Node and Django, have
significantly more vulnerabilities than average.
- Production-Ready vs
Developer-Friendly: As expected, the analysis shows
that many generic public images contain developer-friendly tooling such as
shells, package managers, and helper libraries. While these assets are often
necessary for initial container development or debugging, they should be
removed for deployment to production. This implies that by using a popular public
image, dev teams incur inevitable tech debt downstream when they go to
ship containers to production.
- Categories Matter: We find interesting in-group/out-group dynamics between categories.
Data Science containers, for example, have the highest number of licenses and
libraries, and fairly high numbers of files, packages, and special permissions.
They also tend to be some of the largest images overall. DevOps
containers, on the other hand, are less exploratory and more functional in
nature. As such, they tend to be smaller and less complicated, with fewer
licenses, packages, and libraries. Yet both categories have fewer
vulnerabilities on average, potentially because they are crafted for more
specialized use cases.
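The size-versus-vulnerability relationship described in the first takeaway can be illustrated with a simple correlation check. The sketch below computes a Pearson correlation coefficient in plain Python. The choice of Pearson's r is an assumption made for illustration, since the statistical method behind the finding is not described here, and the image sizes and vulnerability counts are invented sample values, not results from the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical image sizes (MB) and vulnerability counts, for illustration only.
sizes_mb = [5, 30, 120, 350, 900]
vuln_counts = [0, 4, 25, 60, 140]

r = pearson(sizes_mb, vuln_counts)
print(f"size vs. vulnerabilities: r = {r:.2f}")
```

A value of r close to 1 would indicate that larger images tend to carry more vulnerabilities. Correlation alone does not say why, so a check like this is a starting point for investigation rather than a conclusion.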
Of course, these
findings are preliminary. But they are interesting, especially when one
considers the vast deployment landscape of these Top 100 containers. Continued
research is needed to understand what containers look like once they are in
production environments, and how DevOps and DevX teams are evolving the way
organizations store and manage containers. We're excited to undertake these
research challenges as well.
To get a copy of
the report when it is published for KubeCon, sign
up here. And follow us on Twitter to let us know what you'd like to
see in future container best practices research that we conduct.
##
To hear more
about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon+CloudNativeCon North America 2021 - October 11-15, 2021
About the Author
Ayse Kaya is Senior Director of Strategic
Insights & Analytics at Slim.AI. Prior to Slim.AI, she served in analytics
leadership roles at Cisco, Cloudlock and Keystone Strategies. Ayse is a
graduate of the MIT Sloan School of Management and the Massachusetts Institute
of Technology.