By Jeffrey Saelens, Principal Engineer, and W. Watson, Principal Consultant, Vulk Coop
Introduction
What does ideology mean? In common pejorative use, another person's ideology is simply "that which is wrong". Other definitions range from ideas imposed from the top, "a rational system of ideas to oppose the irrational impulses of the mob", to ideas that erupt from the bottom, "fantasies that hide inconvenient truths" [1]. The term cloud native has been used along a similar spectrum, starting with "no one knows what cloud native means" (that which is wrong) and ending with treating cloud native as a panacea (a fantasy). The answer is somewhere in the middle.
A clue to how to reason about cloud native ideas in this middle ground can be found in the paper "Programming-in-the-large versus programming-in-the-small" by Frank DeRemer and Hans Kron, written back in 1975 [2]. The problem that has plagued programming for 40+ years is figuring out how to reason about, define, and direct several programs, components, or modules as a system. Cloud native principles as a whole are analogous to programming in the large, while the individual domains of cloud native (containerization, CI/CD, orchestration, observability, service meshes, networking, etc.) [3] correspond to programming in the small. Some aspects of the former are found in the latter.
With respect to networking, the right question to ask is "how do we take cloud native principles and apply them to the individual domain of networking?" For cloud native in the large, networking means service mesh networking: service discovery, health checking, routing for REST services, load balancing, authentication and authorization, and the generation of observability metrics and traces [4]. These are mostly layer 7 concerns, but they address networking the "components or modules" as a system. For cloud native in the small, networking includes concerns down to layer 2, with implications for layer 1. Here, in the words of Ed Warnicke, the packet is the payload.
What About Cloud Native Buzzword Bingo?
The result of any critique of an architectural offering should be the trade-offs of the offering. Critiquing the promise of cloud native should be no different. One recommendation for keeping us on track with using the terms and definitions of cloud native in a reasonable way is to force ourselves to define what we mean by those terms. We can do this by playing buzzword bingo during our talks, papers, and conferences to keep us from relying too heavily on buzzwords [5]. When describing cloud native networks while playing, a question to ask ourselves is "does this description sound more like an enterprise application concept, a traditional SDN networking concept, or neither, because the terms are not well defined?"
SDN - Champion of All Buzzwords
Software defined networking (SDN) might just be the champion of all buzzwords in the networking space. A proposed paradigm shift, not unlike the recent pivot to cloud native, in how network operations were to be run, SDN was supposed to fundamentally change how the industry approached networking. Unfortunately, SDN buckled under its own hype and a business model that threatened the very vendors pushing solutions into the market. It's hard to identify at this point what problem SDN set out to solve, as its scope has crept indefinitely since its proposal. Reducing complexity, automating network operations, and providing a common interface to multiple network vendors' equipment are some of the primary examples of what SDN promised to deliver. This desire to please everyone resulted in a confusing landscape of anything and everything claiming to be "SDN". Given this, it's hard to tell what SDN even entails. Is it a complex protocol like OpenFlow or NETCONF, or custom Python manipulating CLIs and APIs? Furthermore, what is the difference between SDN, orchestration, and policy driven networking?
This confusion makes at least two of the core goals, reducing complexity and achieving interoperability between vendors, very hard to reach. None of this is to say there have been no SDN success stories; however, the vast majority of them are greenfield deployments of both hardware and software. That is often a suitable approach in the enterprise space, but it creates challenges in multi-vendor communication service provider (CSP) networks with large brownfields.
NFV - 50 Shades of Virtualization
Network Function Virtualization (NFV) was another major proposal for shifting how networking at large is tackled. Unlike SDN, NFV had a very clearly defined goal: reduce OPEX. It would achieve this by enabling network operators to migrate to a commercial off-the-shelf (COTS) model for their hardware base and simply license virtual network functions (VNFs) from vendors in a multi-vendor ecosystem. This ideal turned out to be quite a reach. Endless variations between CSPs' infrastructure, and physical network function code simply shoved into a virtual machine and labeled a VNF, made the early days of NFV a continuous cycle of pain and frustration. Compounding matters, the NFV deployment models and architectures often directly mirrored their legacy physical counterparts, with additional complexity built in. Using the packet core as an example, line cards often had a one-to-one mapping to a single VM consuming an entire server. This perpetuated the appliance based approach within NFV, making integration extremely challenging. The appliances were built with very specific expectations about infrastructure tuning and the type of dataplane available to them. Some wanted SR-IOV, others a DPDK enabled virtual switch to handle packet treatment before it reached their VNF. The pain that arose from this infinite mutability was likely a major catalyst for pushing the cloud native community towards standardizing on immutable infrastructure.

Finally, there is the provisioning side of NFV. ETSI's MANO architecture for VNF orchestration and life-cycle management came out very early in the NFV journey. While providing a general context for how things "could" be done, its lack of specificity made using it as an established standard challenging. There was room for endless interpretation of how a network function virtualization orchestrator (NFVO) was supposed to communicate with the virtual network function manager (VNFM) and the virtualized infrastructure manager (VIM). Every vendor sought to innovate in this space, and each had a different take. This complexity delayed the maturity of the overall virtualization effort within the risk-averse CSP space, causing some providers to question whether they should skip it entirely with the cloud native tsunami on the horizon.
SDN and NFV: What Went Wrong?
The unique challenges and complexity of both the SDN and NFV spaces have made that ever elusive OPEX reduction hard to achieve. Too often, SDN and NFV are used interchangeably despite solving very different technical problems. They've been lumped into the primordial soup we refer to as buzzword bingo, losing their gravity. The lessons learned in these spaces can and should be carried forward. One of the biggest deltas between the SDN/NFV and cloud native approaches is the concept of declarative consumption models. Even in the more successful SDN and NFV deployments, the presentation layer is incredibly imperative. This means that operators have to hire multiple experts for a single deployment across all their teams. First, each team needs someone who knows, in granular detail, what the VNF configuration and network service are supposed to encompass. Second, these teams need someone who understands the modeling languages of the MANO stack they chose to deploy and what the API interactions between the stack's components entail. Third, each organization needs someone who deeply understands virtualization in order to tune and troubleshoot the infrastructure. An alternative approach would be to jettison this mindset and instead put the needed skill sets on the appropriate teams. These teams would fall along the declarative vs. imperative spectrum and focus on their specializations: each team would imperatively define its own domain, creating abstractions that allow itself and other teams to declaratively consume the work of others, as sketched below.
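To make the distinction concrete, here is a minimal Go sketch of the two consumption models. Every name in it (VlanSpec, reconcile, the switch ports) is a hypothetical illustration, not an actual SDN or NFV API: the imperative consumer spells out device-level steps, while the declarative consumer states an end state that a reconciler drives toward.

```go
package main

import "fmt"

// Imperative consumption: the consumer spells out each step and
// must know device-level details and their ordering.
func imperativeProvision() {
	fmt.Println("ssh switch1; vlan 100; interface eth1; switchport access vlan 100")
	fmt.Println("ssh switch2; vlan 100; interface eth7; switchport access vlan 100")
}

// Declarative consumption: the consumer states only the desired end state.
type VlanSpec struct {
	ID    int
	Ports []string // ports that should end up in the VLAN
}

// reconcile drives the actual state toward the declared state;
// the consumer never sees these steps.
func reconcile(desired VlanSpec, actual map[string]int) {
	for _, port := range desired.Ports {
		if actual[port] != desired.ID {
			fmt.Printf("moving %s into VLAN %d\n", port, desired.ID)
			actual[port] = desired.ID
		}
	}
}

func main() {
	imperativeProvision()

	desired := VlanSpec{ID: 100, Ports: []string{"switch1/eth1", "switch2/eth7"}}
	actual := map[string]int{"switch1/eth1": 1, "switch2/eth7": 1}
	reconcile(desired, actual)
}
```

The declarative version is what lets a team expose its domain as an abstraction: other teams submit a VlanSpec and never touch the switches.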
What You Think You Want
Service providers want speedy deployments and code changes with minimal toil. They want to apply security patches without having to upgrade everything in lock step. They want resilient infrastructure. What many service providers think they want from cloud native is Kubernetes. Kubernetes has the benefits of orchestration, and running Kubernetes seems to legitimize the containerization previously done on workloads. But there are problems with simply adopting Kubernetes without changing your development process; this premature adoption puts the cart before the horse. Many of the desires of service providers are addressed higher up in the CNCF's cloud native trail map.
Cloud Native Trail Map [6]
What You Really Want
What service providers really want, once you filter out the noise, is uninterrupted service delivery to their customers and the ability to continuously deploy new services without impacting their infrastructure. Translated into a cloud native paradigm, what service providers really want are the properties of agile: speedy deployments and the ability to change code with minimal toil. This is supported by a strong continuous delivery process, which is often skipped on the trail to cloud native.
With continuous delivery, you have a process (a pipeline) that applies a series of tests at different stages. The first stage creates an artifact from the source code; the second stage tests the integration of that artifact with other artifacts, configuration, and resources in a test environment. The third stage and beyond can be manual tests, deployment into production, or deployment into other environments. Sometimes the later stages include tests of how well stacks of infrastructure elements (servers, switches, databases, i.e. any grouping of elements that must be modified all at once) work together.
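As an illustration only, here is a minimal Go sketch of such a pipeline. The stage names and functions are hypothetical stand-ins for whatever CI/CD tooling a provider actually uses; the point is the staged, fail-fast structure.

```go
package main

import "fmt"

// Stage is one step in the continuous delivery pipeline.
// A failing stage stops the artifact from progressing further.
type Stage struct {
	Name string
	Run  func(artifact string) error
}

func main() {
	pipeline := []Stage{
		// Stage 1: compile source into an environment-agnostic artifact.
		{"build", func(a string) error { fmt.Println("building", a); return nil }},
		// Stage 2: test the artifact against other artifacts, configuration,
		// and resources in a test environment.
		{"integration-test", func(a string) error { fmt.Println("integrating", a); return nil }},
		// Stage 3 and beyond: manual tests, production, or other environments.
		{"deploy-production", func(a string) error { fmt.Println("deploying", a); return nil }},
	}

	artifact := "network-function:v1.2.3"
	for _, stage := range pipeline {
		if err := stage.Run(artifact); err != nil {
			fmt.Printf("stage %s failed: %v\n", stage.Name, err)
			return // pull the pain forward: stop at the first failure
		}
	}
}
```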
Continuous Delivery
The reason you need continuous delivery is to pull the pain forward: the more you practice something, the easier it becomes. For service providers this means preferring software and hardware options that lend themselves to the continuous delivery process. It means promoting, contributing to, and using open source, open hardware, bare metal switches, and commodity solutions that allow for completely automated deployments [7] with separate artifacts [8], configuration [9], and environments [10] [11]. If you use proprietary software and hardware, it can be harder to automate deployments. Service providers need to demand from their vendors the ability to make completely automated deployments with separate artifacts, configuration, and environments in order to get the benefits of an agile process.
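A minimal sketch of that separation, using the definitions from footnotes [7] through [10] with hypothetical names: the artifact is environment-agnostic, configuration carries the environment-specific values, and an environment is configuration plus resources.

```go
package main

import "fmt"

// Artifact is the environment-agnostic result of compiling code [8].
type Artifact struct {
	Name, Version string
}

// Configuration communicates environment-specific information to artifacts [9].
type Configuration map[string]string

// Environment is configuration plus resources such as servers or
// switches; everything besides the artifacts [10].
type Environment struct {
	Name      string
	Config    Configuration
	Resources []string
}

// deploy is the act of putting an artifact into an environment [7].
// The same artifact is reused unchanged across environments.
func deploy(a Artifact, env Environment) {
	fmt.Printf("deploying %s:%s to %s with config %v\n",
		a.Name, a.Version, env.Name, env.Config)
}

func main() {
	artifact := Artifact{Name: "bgp-daemon", Version: "1.4.0"}

	test := Environment{"test", Configuration{"peer": "10.0.0.1"}, []string{"switch-lab-1"}}
	prod := Environment{"production", Configuration{"peer": "203.0.113.1"}, []string{"switch-core-1"}}

	deploy(artifact, test) // same binary,
	deploy(artifact, prod) // different configuration and resources
}
```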
Immutable Infrastructure
If infrastructure is immutable, it is easily reproduced, consistent, and disposable; it has a repeatable provisioning process; and it has no configuration or artifacts that are modifiable in place. Kubernetes allows for immutable infrastructure above the orchestration level. The orchestrator itself needs to be in a continuous delivery process as well, in order to get agile benefits for everything below the application level, which is a great source of pain for service providers. Changes that happen frequently in production should be intelligently automated, based on conditions that trigger the change. This is a significant departure from the previous NFV mentality.
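The following Go sketch, with hypothetical node and image names, contrasts the two mentalities: the legacy habit patches a running instance in place, while the immutable approach provisions a replacement from a versioned image and disposes of the old one.

```go
package main

import "fmt"

// Node represents a provisioned piece of infrastructure,
// stamped with the image it was built from.
type Node struct {
	ID    int
	Image string
}

// mutableUpgrade patches a node in place (the legacy NFV habit);
// the node's real state now drifts from any recorded state.
func mutableUpgrade(n *Node, patch string) {
	n.Image = n.Image + "+" + patch
}

// immutableUpgrade replaces the node with one provisioned from a new
// versioned image; provisioning stays repeatable and reproducible.
func immutableUpgrade(old Node, image string) Node {
	fmt.Printf("disposing node %d, provisioning from %s\n", old.ID, image)
	return Node{ID: old.ID + 1, Image: image}
}

func main() {
	n := Node{ID: 1, Image: "router-os:2.1"}

	mutableUpgrade(&n, "hotfix-37") // in-place change: hard to reproduce
	fmt.Println("mutated node:", n)

	n = immutableUpgrade(n, "router-os:2.2") // replace, don't modify
	fmt.Println("replaced node:", n)
}
```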
Challenges
There are many challenges in applying cloud native principles to networking. One of them is identifying the requirements of the diverse service provider community. One requirement for larger service providers is the ability to procure solutions from multiple vendors. This means that vendors cannot sell solutions that do not play nicely with other vendors' solutions. Examples of past interoperability failures include SNMP [12] and NETCONF/YANG; success stories include HTTP and IPv4 [13]. The deployment process should be integrated with the procurement process to help multiple organizations work with Conway's law [14] rather than against it. For larger service providers this should include automated performance, security, and compliance testing stages in the continuous delivery process.
Network Service Mesh
Network Service Mesh (NSM) [15] is one solution that lends itself to declarative configuration for layer 2 and layer 3 payloads. This means it also lends itself to a comprehensive continuous delivery process, since the configuration can be saved in source control, versioned, and tested, and is generally easier to reason about [16]. This helps with the requirements for easily repeatable deployments and for the separation of network service artifacts, network configuration, and network environments, which is needed for testing. It also has abstractions that are easier to reason about for the network service developer, the operator (i.e. sneaky network people), and even the consumer (i.e. Sarah, the application developer).
When an application developer consumes a cloud native network function, Network Service Mesh allows the developer to consume it using a declarative API. When an operator combines cloud native network functions into a service chain using Network Service Mesh, they combine the services using a declarative API and then expose the chain as a declarative API for the application developer. When a cloud native network function developer creates networking software using Network Service Mesh, they expose that software using a declarative API, as the sketch below illustrates.
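Here is a hedged Go sketch of that layering, using hypothetical types rather than NSM's actual API: the CNF developer publishes a named network service, the operator declares a chain of such services, and the application developer declares only which service their workload needs.

```go
package main

import "fmt"

// NetworkService is what a CNF developer publishes: a named
// capability with a payload type (e.g. layer 2 or layer 3).
type NetworkService struct {
	Name    string
	Payload string // "ETHERNET" or "IP"
}

// ServiceChain is the operator's declaration: an ordered set of
// network services exposed under a single name.
type ServiceChain struct {
	Name     string
	Services []NetworkService
}

// WorkloadSpec is all the application developer declares:
// which network service their workload should be connected to.
type WorkloadSpec struct {
	App            string
	NetworkService string
}

func main() {
	// CNF developers expose their software declaratively.
	firewall := NetworkService{Name: "firewall", Payload: "IP"}
	vpn := NetworkService{Name: "vpn-gateway", Payload: "IP"}

	// The operator chains them and exposes the chain declaratively.
	chain := ServiceChain{Name: "secure-egress", Services: []NetworkService{firewall, vpn}}

	// The application developer only declares intent.
	workload := WorkloadSpec{App: "billing", NetworkService: chain.Name}

	fmt.Printf("%s requests %q; the mesh resolves it to %d services\n",
		workload.App, workload.NetworkService, len(chain.Services))
}
```

Because each layer is a declaration rather than a procedure, each can be versioned, reviewed, and tested independently in the delivery pipeline.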
Conclusion
For cloud native network functions (CNFs) to qualify as cloud native, the ideology behind what it means to be cloud native must be taken into consideration. Simply repackaging network functions in containers and deploying them with Kubernetes will not be enough to give service providers what they "want" or "need". While NSM may not be a silver bullet, it at least approaches the problem with cloud native considerations at the forefront rather than relegating them to a nice-to-have afterthought.
About the Authors
Jeffrey Saelens - Principal Engineer
Jeffrey Saelens is a Principal Architect in the telecommunications industry. Starting his career in the US Army, Jeffrey was a Green Beret who focused on communications and systems engineering. After leaving the military, he dove into the service provider world, focusing heavily on NFV and SDN transformations within data center, core, and access networks. Currently, Jeffrey seeks out ways of implementing cloud native philosophies within the service provider space.
W. Watson - Principal Consultant, Vulk Coop
W. Watson has been professionally developing software for 25 years. He has spent numerous years studying game theory and other business expertise in pursuit of the perfect organizational structure for software co-operatives. He also founded the Austin Software Cooperatives meetup group and Vulk Coop as an alternative way to work on software as a group. He has a diverse background that includes service in the Marine Corps as a computer programmer, and software development in numerous industries including defense, medical, education, and insurance. He has spent the last couple of years developing complementary cloud native systems such as the cncf.ci dashboard. He currently works on the Cloud Native Network Function (CNF) Conformance test suite (https://github.com/cncf/cnf-conformance), the CNF Testbed (https://github.com/cncf/cnf-testbed), and the cloud native networking principles (https://networking.cloud-native-principles.org/) initiatives. Recent speaking experiences include ONS NA, KubeCon NA 2019, and Open Source Summit 2020.
[1] https://en.wikipedia.org/wiki/Ideology
[2] https://dl.acm.org/doi/10.1145/390016.808431
[3] https://github.com/cncf/trailmap
[4] https://blog.envoyproxy.io/service-mesh-data-plane-vs-control-plane-2774e720f7fc
[5] https://en.wikipedia.org/wiki/Buzzword_bingo
[6] https://github.com/cncf/trailmap
[7] A deployment is the act of putting artifacts into
an environment.
[8] An artifact is the result of compiling code: a binary. It is environment agnostic.
[9] Configuration is specific to an environment. It is used to communicate information about the environment to artifacts.
[10] An environment is configuration plus resources (such as a server or switch). It is everything besides the artifacts. An environment usually has a name such as 'test' or 'production'.
[11] "Open code availability.
Perhaps the next most important technical consideration is that a protocol have
freely available implementation code. This may have been the case when deciding
between IPv4 and IPX, the latter of which at the time was, in many ways, the
technically superior of the two." https://www.ietfjournal.org/what-makes-for-a-successful-protocol/
[12] "SNMP implementations vary
across platform vendors. In some cases, SNMP is an added feature, and is not
taken seriously enough to be an element of the core design. Some major
equipment vendors tend to over-extend their proprietary command line interface (CLI) centric configuration andcontrol systems." https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol#Implementation_issues
[13] "If we apply those
definitions, then a protocol such as HTTP is defined as wildly successful
because it exceeded its design in both purpose and scale. Another example of a
wildly successful protocol is IPv4. Although it was designed for all purposes
("Everything over IP and IP over Everything"), it has been deployed on a far
greater scale than it was originally designed to meet." https://www.ietfjournal.org/what-makes-for-a-successful-protocol/
[14] https://en.wikipedia.org/wiki/Conway%27s_law
[15] https://networkservicemesh.io/
[16] "Declarative
configuration is different from imperative configuration , where you simply
take a series of actions (e.g., apt-get install foo ) to modify the world.
Years of production experience have taught us that maintaining a written record
of the system's desired state leads to a more manageable, reliable system. Declarative configuration enables
numerous advantages, including code
review for configurations as
well as documenting the current state of
the world for distributed teams. Additionally, it is the basis for all of the self-healing behaviors in Kubernetes
that keep applications running without user action." Hightower, Kelsey; Burns,
Brendan; Beda, Joe. Kubernetes: Up and Running: Dive into the Future of
Infrastructure (Kindle Locations 892-896). Kindle Edition.