Under the hood of CRI-O, the Kubernetes Container Runtime

By Sascha Grunert, Senior Software Engineer, SUSE

CRI-O is often the container runtime of choice for running containerized workloads on top of Kubernetes. The project recently graduated to the CNCF incubator, and as a result, a more stable and active community has been established around the technology. This blog post provides a partial preview of the corresponding CRI-O Maintainers Track and Deep Dive sessions at KubeCon + CloudNativeCon North America 2019.

Responsibilities of a Kubernetes Conformant Container Runtime

To work with Kubernetes in a conformant way, a container runtime must fulfill the Container Runtime Interface (CRI) specification. But what does that mean in detail? What exactly happens under the hood if we create a Kubernetes Deployment, for example? There really isn't a whole lot of magic involved, but to answer the question in a sensible way, let's take one step back.

At a high level, CRI-O is just one of many components of a fully working Kubernetes cluster; it directly interfaces with the kubelet, the Kubernetes worker node agent.

During the node startup process, the kubelet tries to connect to a gRPC server and expects a working CRI implementation there. CRI-O is the component which serves this Application Programming Interface (API). If that sounds a bit heady, it's because a look at the CRI shows that container runtimes have to support the complete container life cycle management process. Beyond that, the runtime has to download and manage container images, isolate container resources, and fulfill the monitoring and logging requirements of the CRI. CRI-O achieves all of that by following the main UNIX philosophy and doing the least amount of work possible; the result is an easy-to-maintain project with a reduced attack surface for possible vulnerabilities.
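
The same gRPC API can also be exercised by hand with crictl, the command-line client from the Kubernetes cri-tools project. As a minimal sketch, assuming CRI-O is already running on its default socket, a quick sanity check could look like this (the output is shortened and illustrative):

> sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
Version:            0.1.0
RuntimeName:        cri-o
RuntimeVersion:     1.15.2
RuntimeApiVersion:  v1alpha2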

Running a Kubernetes Node on top of CRI-O

To get started with Kubernetes and CRI-O, pick your preferred installation method and start the kubelet by specifying the remote container runtime and the UNIX socket on which CRI-O listens:

> kubelet --container-runtime=remote \
          --container-runtime-endpoint=/var/run/crio/crio.sock \
          --config=...

By default, CRI-O expects a TOML-based configuration file in /etc/crio/crio.conf. The location of the listening crio.sock can be changed there as well:

# The crio.api table contains settings for the kubelet/gRPC interface.
[crio.api]

# Path to AF_LOCAL socket on which CRI-O will listen.
listen = "/var/run/crio/crio.sock"

That's it. The recommended way of running Kubernetes components like CRI-O and the kubelet is via systemd units; an example unit can be found in the CRI-O repository. By default, CRI-O does not log with a very high verbosity, which means that a typical CRI-O startup may look like this:

INFO[...] no seccomp profile specified, using the internal default
INFO[...] installing default apparmor profile: crio-default-1.15.2

By default, CRI-O tries to be as secure as possible, which means that it provides a default AppArmor and seccomp profile out of the box.
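
Coming back to the systemd units mentioned above: a minimal unit for CRI-O could look roughly like the following sketch (hypothetical and trimmed down; the CRI-O repository ships a complete, maintained example). Note the ExecReload line, which will come in handy in a moment:

[Unit]
Description=Container Runtime Interface for OCI (CRI-O)
After=network-online.target

[Service]
ExecStart=/usr/local/bin/crio
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target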

If we want to increase the logging verbosity, we could simply edit the configuration file in /etc/crio/crio.conf and adjust the log level like this:

# Changes the verbosity of the logs based on the level it is set to.
log_level = "debug"

Okay, so if you've followed me this far, you've been able to make the configuration changes I talked about. Now let's talk about how to apply them. CRI-O relies only on the on-disk state, so restarting the container runtime wouldn't be a problem if we wanted to do so. However, CRI-O supports a live configuration reload feature for some parameters, which means that we can send a hangup signal (SIGHUP) to the process to apply the new configuration:

> sudo kill -HUP $(pgrep crio)
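
Alternatively, if CRI-O runs under a systemd unit with an ExecReload directive like in the sketch above, systemd can deliver the same signal for us:

> sudo systemctl reload crio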

CRI-O now logs that the configuration has been applied successfully:

INFO[...] reloading configuration "/etc/crio/crio.conf"
INFO[...] set config log_level to "debug"

From now on, every request, together with additional debug information, will be logged by CRI-O, which enables full cluster debugging capabilities during operation. The log level is just one of many examples of how CRI-O can be reconfigured; now let's move on and look at what happens when we run a Kubernetes workload.

Life Cycle of a Kubernetes Workload

With higher logging verbosity and a successfully connected kubelet, we can now see that multiple requests are processed periodically by CRI-O:

time="..." level=debug msg="ListContainersRequest  ...
time="..." level=debug msg="ListContainersResponse ...
time="..." level=debug msg="ListPodSandboxRequest  ...
time="..." level=debug msg="ListPodSandboxResponse ...
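
As a side note, the same List endpoints can be queried by hand, which comes in handy when debugging a node. A sketch using crictl, assuming it is pointed at CRI-O's socket as shown earlier:

> sudo crictl pods
> sudo crictl ps -a

The first command lists all pod sandboxes CRI-O knows about, the second all containers, including stopped ones.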

These periodic List calls are part of the kubelet synchronization loop, which handles workload updates and verifies that the worker node is in an overall healthy state. We will assume that everything works as intended and create our first Kubernetes workload via kubectl, like this alpine pod:

> kubectl run --generator=run-pod/v1 \
              -it --rm --image=alpine:latest alpine sh
/ #

We're now running a live shell session inside a freshly created container, but a lot has happened in the background to get there. During the kubectl invocation, we connect to the Kubernetes API server and write the desired target manifest to it. The manifest is picked up by the scheduler, which assigns a free target node to it. That node runs the kubelet, which now has to create the workload via CRI-O. This means the first gRPC request created by the kubelet will be a RunPodSandboxRequest, in our case (the outputs are formatted for better readability):

level=debug msg="
    RunPodSandboxRequest &RunPodSandboxRequest{
        Config: &PodSandboxConfig{
            Metadata: &PodSandboxMetadata{
                Name: alpine,
                Uid: 8aecd56d-8cc9-46e3-8d9f-375ad373254b,
                Namespace: kube-system,
                Attempt: 0,
            },
            Hostname: alpine,
            LogDirectory: /var/log/pods/kube-system_alpine ...,
            DnsConfig: &DNSConfig{...},
            PortMappings: [],
            Labels: map[string]string{...},
            Annotations: map[string]string{...},
            Linux: &LinuxPodSandboxConfig{...},
        },
    }"

The request already contains a lot of information for CRI-O to get started. The first thing the runtime has to do is to spawn a new container with the configured pause_image and pause_command. If the pause image is not available locally, CRI-O will download it automatically and set up the pod sandbox. The image was already available in our case, so CRI-O can set up the container directly, as the logs tell us:

Attempting to run pod sandbox with infra container: kube-system/alpine/POD
parsed reference into "[overlay@...]k8s.gcr.io/pause:3.1"
exporting opaque data as blob "sha256: ..."
created pod sandbox "..."
pod sandbox "..." has work directory ...
pod sandbox "..." has run directory ...
mounted container "..." at ...
running conmon: conmon args="..."
Received container pid: 10879

The most interesting part of the logs is that CRI-O runs containers on top of the dedicated container monitoring project conmon, which keeps track of the container process during its whole lifetime. The container process itself can be run by any Open Container Initiative (OCI) compatible lower-level runtime, like runc or kata-containers.
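
Which lower-level runtime gets used is configurable in crio.conf as well. A minimal sketch for registering an additional OCI runtime next to the default could look like this (the kata-runtime path is a hypothetical example; consult the crio.conf documentation for the exact semantics):

# The default OCI runtime used for containers.
[crio.runtime]
default_runtime = "runc"

# Hypothetical entry registering an additional OCI runtime.
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/kata-runtime"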

For now, the sandbox seems to be running, but CRI-O has to set up a working network environment too, which will be shared by all containers running inside the pod. Networking in Kubernetes is done via the Container Network Interface (CNI), for which multiple solutions exist in the open source world. For demonstration purposes, the bridge CNI plugin can be utilized to create a simple network bridge to existing local interfaces. The CRI-O logs tell us that the networking has been set up successfully:

Got pod network &{...}
About to add CNI network crio-bridge (type=bridge)
Got pod network &{...}
About to check CNI network crio-bridge (type=bridge)
CNI setup result:
Interfaces:[
    {Name:crio.0 Mac:26:d7:27:1a:2f:e9 Sandbox:}
    {Name:veth6d76c137 Mac:06:2b:89:90:8e:3d Sandbox:}
    {Name:eth0 Mac:c2:f2:88:59:08:e6 Sandbox:/proc/10879/ns/net}],
IP:[{
    Version:4 Interface:0xc00073b6b8
    Address:{IP:10.10.2.73 Mask:ffffff00}
    Gateway:10.10.2.1}],
Routes:[{Dst:{IP:0.0.0.0 Mask:00000000} GW:<nil>}],
DNS:{Nameservers:[] Domain: Search:[] Options:[]}

The resulting IP address 10.10.2.73 can now be accessed directly on the node running CRI-O, for example for debugging purposes. From Kubernetes 1.16 onward, the dual-stack IPv6 feature can be used with CRI-O as well. Once the sandbox is running, the kubelet periodically checks its status via a PodSandboxStatusRequest.
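
For reference, a bridge setup like the one above is usually configured via a JSON file in /etc/cni/net.d. A rough sketch with illustrative values chosen to match the addresses from the logs (the actual shipped configuration may differ):

{
    "cniVersion": "0.3.1",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "crio.0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.10.2.0/24",
        "routes": [{ "dst": "0.0.0.0/0" }]
    }
}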

Once the sandbox is up and running, the kubelet verifies that the target container image (alpine) exists on the node. It does so via an ImageStatusRequest, and if the image does not exist locally, it sends a PullImageRequest to CRI-O to retrieve it. CRI-O supports a fine-grained registry configuration syntax, which allows end users to specify container image registry mirrors and URL rewrites. These can be combined in nearly every possible way, which allows users to run Kubernetes in partially or fully disconnected environments. For example, a registry mirror can be specified by this entry inside the global registry configuration file /etc/containers/registries.conf:

[[registry]]
prefix = "example.com/foo"
mirror = [
    { location = "example-mirror-0.local/path" }
]

If CRI-O now pulls an image named example.com/foo/image:latest, it will first try the configured mirror at example-mirror-0.local/path/image:latest.
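
Such a pull can also be triggered manually, independently of the kubelet. A sketch using crictl and the hypothetical image name from the mirror example above:

> sudo crictl pull example.com/foo/image:latest
Image is up to date for example.com/foo/image@sha256:...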

Let's assume that the alpine image got pulled successfully onto the node; what's the next step? CRI-O will receive a CreateContainerRequest for every container running inside the pod. The request contains all the information CRI-O needs to set up the environment correctly, like the command to be executed or the devices to be mounted. The container processes run on top of conmon too, but are not started right away: CRI-O waits for an incoming StartContainerRequest before it finally starts the container. During the whole time, the kubelet synchronizes with CRI-O by checking that the sandbox and its containers are still running and have not stopped unexpectedly.
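
This create-then-start sequence can be replayed manually, too. A minimal sketch, assuming a running sandbox with the ID $POD_ID and a pod-config.json that mirrors the RunPodSandboxRequest shown earlier:

> cat container-config.json
{
    "metadata": { "name": "alpine" },
    "image": { "image": "alpine:latest" },
    "command": [ "sleep", "3600" ]
}
> sudo crictl create $POD_ID container-config.json pod-config.json
> sudo crictl start $CTR_ID

crictl create prints the ID of the new container, which crictl start (here abbreviated as $CTR_ID) then consumes.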

An additional AttachRequest is needed to access the container via a streamed terminal session. The request returns a URL to CRI-O's internal streaming server, which then interactively streams the input and output of the shell session created by kubectl run. Because we specified the --rm flag, kubectl shuts the pod down after we exit it. The kubelet creates two additional requests to clean up the workload: first a StopContainerRequest and then a StopPodSandboxRequest. CRI-O now takes care of removing the container processes, unmounting everything created for them, and cleaning up the networking interfaces.
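
The equivalent manual cleanup, continuing the hypothetical crictl session from above, could look like this:

> sudo crictl stop $CTR_ID
> sudo crictl rm $CTR_ID
> sudo crictl stopp $POD_ID
> sudo crictl rmp $POD_ID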

Conclusion

That's the complete life cycle of a container workload in Kubernetes. A lot of details and features have not been covered here, but I will leave those up to you as you explore the exciting world of container runtimes. Join our CRI-O community by contributing to the project or by messaging us in the official Kubernetes #crio Slack channel.

That's it. I hope you enjoyed the read and got some insights into how the kubelet interacts with CRI-O to drive Kubernetes workloads. Feel free to reach out to me anytime with feedback or questions; I'm happy to get in contact with you!


About the Author

Sascha Grunert 

Sascha is a Senior Software Engineer at SUSE, where he works on many different container-related open-source projects like Kubernetes and CRI-O. He joined the open-source community in November 2018, having gained container experience before joining SUSE. Sascha's passions include contributing to open source, as well as giving talks and evangelizing Kubernetes-related technologies. Sascha joined SIG Release (Release Notes team) at the beginning of the Kubernetes 1.14 release cycle to boost the community and add a different perspective compared to his daily work.

Published Wednesday, October 30, 2019 7:24 AM by David Marshall