Virtualization Technology News and Information
What We Discovered Analyzing the Top 100 Public Container Images

By Ayse Kaya

Between 2020 and 2021, the number of all-time pulls on Docker Hub nearly tripled, from 130 billion to 318 billion. That level of growth is astounding, especially when you consider that it took more than six years to achieve the first 130 billion and that some estimates say Docker Hub, while still the most popular container registry, is home to just half of the world's containers.

Containers have become the norm for application development (report), with the massive developer adoption of cloud native apps and containerized workflows. The result: millions of public image repositories. And with more than 8 million container images on Docker Hub alone, the container landscape continues to get more complex, more specialized, and more difficult to secure.

At Slim.AI, our startup focused on container best practices and developer experience, we have container enthusiasts using our various tools every day: scanning containers, optimizing them, and sharing their experience with us.

We thought it would be interesting to find out what's inside the public images that serve as starting points for nearly all modern software development. So, we looked.

In our forthcoming report, to be released during KubeCon, we look inside the most popular containers available on Docker Hub. We were interested in profiling these containers from the point of view of a developer starting a project, so we designed our study to answer questions like: Is the container going to be easy to use? Is it efficient? Is it safe? Will it cause issues when I ship my application?

Containers promise a simplified world for developers and operations alike, and they certainly have turbocharged the move to the cloud. But with new solutions come new problems. Our preliminary findings show an increasingly complex container ecosystem that requires deep expertise and great tooling to fulfill that promise; otherwise, it just creates more friction for developers and ops teams alike.

We applied an array of proprietary and open-source tools, combined with the insights we've gleaned from talking to the container experts who use our open-source DockerSlim project, to curate a list of the Top 100 most impactful containers in use today and to analyze in depth what those containers comprise.

Why the Top 100? When we look at pulls (that is, actual usage), we see a skewed, Pareto-like distribution in the data set, where a small subset of popular containers makes up a significant percentage of overall use. So, by focusing on a "Top 100" (give or take), we had a convenient cutoff point that is at once relevant to most developers and manageable from an analytics point of view. Several of these containers, such as Jenkins, Ubuntu, CentOS, Redis, and PostgreSQL, have been pulled more than a billion times each.
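To make the cutoff idea concrete, here is a minimal sketch of how one might measure the share of total pulls captured by the N most-pulled images. The function name `top_n_share` and the pull counts are hypothetical and purely illustrative; they are not figures or code from the study.

```python
def top_n_share(pull_counts, n):
    """Return the fraction of all pulls accounted for by the n most-pulled images."""
    if n < 0:
        raise ValueError("n must be non-negative")
    ranked = sorted(pull_counts, reverse=True)  # most-pulled images first
    total = sum(ranked)
    if total == 0:
        return 0.0  # no pulls recorded, so no share to report
    return sum(ranked[:n]) / total


# Hypothetical pull counts for ten images (illustrative only, not study data).
pulls = [1000, 500, 250, 100, 50, 40, 30, 20, 5, 5]

share = top_n_share(pulls, 2)
print(f"Top 2 images account for {share:.0%} of pulls")
```

In a heavily skewed data set like this one, a small N already covers most of the usage, which is the property that makes a fixed cutoff such as a "Top 100" reasonable.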

We also added an element of qualitative curation to our analysis, based on usage of DockerSlim and our emerging Slim Developer Platform, because pull counts and Docker Hub official images are not the only way to understand the evolving container landscape. Developers use Docker containers in myriad ways, and many highly active development communities have highly relevant images that don't always appear in "Most Popular" lists: LinuxServer.io, Nuxt.js, and Raspberry Pi, for example. Ignoring these types of images would not have been true to our analysis or to how developers think about public container images.

Here are a few more details about our methodology:

  • The containers included a mix of base images and purpose-built public containers.
  • Using DockerSlim's X-Ray tool along with a suite of other open source tools, combined with data and insights from our SaaS platform, we looked into the "latest" version of each container and extracted a series of characteristics that define what we call the level of "production readiness" of these containers.
  • We analyzed those reports individually and in aggregate using state-of-the-art data science techniques, evaluated them against container best practices, and used the results to provide a big-picture overview as well as to highlight outliers from the norm.

Here's a sneak preview of three of the early takeaways from the forthcoming report. Some surprised us, and others made perfect sense:

  • Slim Is In: We saw a provable correlation between container size and attack surface (vulnerabilities). Larger containers tend to contain more vulnerabilities, with varying degrees of severity and, in some cases, with no easy fix. Some containers in the programming languages category, such as Rust, Ruby, and Perl, and some web frameworks, such as Node and Django, have significantly more vulnerabilities than average.
  • Production-Ready vs Developer-Friendly: The analysis shows, as expected, that many generic public images contain developer-friendly tooling such as shells, package managers, and helper libraries. While these assets are often necessary for initial container development or debugging, they should be removed before deployment to production. This implies that by using a popular public image, dev teams incur tech debt downstream when they go to ship containers to production.
  • Categories Matter: We find interesting in-group/out-group dynamics between categories. Data Science containers, for example, have the highest number of licenses and libraries and fairly high numbers of files, packages, and special permissions, and they also tend to be some of the largest images overall. DevOps containers, on the other hand, are less exploratory and more functional in nature. As such, they tend to be smaller and less complicated, with fewer licenses, packages, and libraries. Yet both categories have fewer vulnerabilities on average, potentially because they are crafted for more specialized use cases.
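As an illustration of the kind of size-versus-vulnerability relationship described in the first takeaway, the sketch below computes a Pearson correlation coefficient in plain Python. The helper `pearson` and the sample values are assumptions made for this example; the article does not say which statistic or tooling the study used, and the numbers are not study data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical image sizes (MB) and vulnerability counts, illustrative only.
sizes_mb = [25, 80, 150, 400, 900]
vuln_counts = [3, 10, 22, 60, 130]

r = pearson(sizes_mb, vuln_counts)
print(f"Pearson r between size and vulnerability count: {r:.3f}")
```

A coefficient close to 1 indicates that larger images tend to carry more vulnerabilities in the sample. It does not by itself show that size causes them, and the category differences noted above suggest the relationship is worth checking per category as well as overall.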

Of course, these findings are preliminary. But they are interesting, especially when one considers the vast deployment landscape of these Top 100 containers. Continued research is needed to understand what containers look like once they are in production environments, and how DevOps and DevX teams are evolving the way organizations store and manage containers. We're excited to undertake these research challenges as well.

To get a copy of the report when it is published for KubeCon, sign up here. And follow us on Twitter to let us know what you'd like to see in future container best practices research that we conduct.

##

To hear more about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon+CloudNativeCon North America 2021, October 11-15, 2021.

About the Author

Ayse Kaya 

Ayse Kaya is Senior Director of Strategic Insights & Analytics at Slim.AI. Prior to Slim.AI, she served in analytics leadership roles at Cisco, Cloudlock and Keystone Strategies. Ayse is a graduate of the MIT Sloan School of Management and the Massachusetts Institute of Technology.

Published Thursday, September 30, 2021 7:43 AM by David Marshall