By Raj Nair, Founder and CEO, Avesha
Enterprises find themselves on a cloud journey full of opportunities and challenges. They enjoy the agility and ubiquity of a cloud-first architecture, yet are challenged to scale it along four dimensions: Security, Observability, Resiliency and Load Balancing. In this paper, we examine each of these dimensions from the viewpoint of an enterprise deployment.
Complexity of Microservice Interconnections
A key feature of a modern architecture is communication among microservices, and those interconnections can introduce a great deal of complexity, creating challenges for observability and security. The problem worsens as applications grow rapidly in the number of microservices.
Figure 1: Uber Service Graph
This problem is compounded further by multi-cluster and multi-cloud deployment and by the continued growth in the number of microservices.
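To make the scale concrete, here is a back-of-the-envelope sketch in Python; the service and cluster counts are illustrative assumptions, not measurements, and the point is only that potential interconnections multiply quickly as services and clusters are added:

```python
# Back-of-the-envelope sketch: potential service-to-service links.
# Service and cluster counts below are illustrative, not measured data.

def potential_links(services: int, clusters: int) -> int:
    """Directed service pairs, multiplied across every ordered cluster pair."""
    service_pairs = services * (services - 1)   # who could call whom
    cluster_paths = clusters * clusters         # including cross-cluster hops
    return service_pairs * cluster_paths

for services, clusters in [(10, 1), (100, 1), (100, 4), (500, 8)]:
    print(f"{services:>4} services x {clusters} clusters -> "
          f"{potential_links(services, clusters):,} potential links")
```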
Scaling Locations
Enterprises must deploy in multiple locations to meet their business objectives: serving a global customer base while respecting national boundaries for data sovereignty and maintaining good performance from a latency standpoint.
Modernization of Applications
While traditional monolithic applications talk only minimally to each other or to microservices, the challenge is to surface more APIs from those monoliths and to scale up interconnectivity across domains and clusters.
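As a minimal, hypothetical illustration of what surfacing an API from a monolith can look like, the sketch below wraps an existing in-process function behind a small HTTP endpoint using only the Python standard library; the function name, path and payload are invented for the example:

```python
# Minimal sketch: expose one function of a legacy monolith as an HTTP API.
# `get_order_status` stands in for existing monolith logic (hypothetical name).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def get_order_status(order_id):
    # In a real monolith this would call into existing business logic.
    return {"order_id": order_id, "status": "SHIPPED"}

class OrderAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect paths like /orders/<id>/status
        parts = self.path.strip("/").split("/")
        if len(parts) == 3 and parts[0] == "orders" and parts[2] == "status":
            body = json.dumps(get_order_status(parts[1])).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), OrderAPI).serve_forever()
```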
Enterprise Service Mesh: A Key Building Block for Application Connectivity
The Enterprise Service Mesh (ESM) has emerged as the key building block for inter-service connectivity. Envoy and Istio have effectively abstracted inter-service connectivity via the "sidecar" mechanism, which moves the complexity of connectivity out of the application and into the service mesh. In addition, Envoy handles load balancing and rate limiting, making it capable of handling north-south traffic as well - opening the possibility of a simplified network infrastructure layer that integrates API gateway functionality. This is evident from the latest positioning of service mesh providers. Yet what is lacking is a comprehensive service-level network architecture that addresses the scaling of the ESM in terms of the sheer growth of interconnections and the diversity of locations.
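Conceptually, the sidecar pattern pulls connectivity concerns such as retries, timeouts and metrics out of application code. The toy sketch below illustrates that separation in plain Python; it is an analogy only, since Envoy implements this as a separate proxy process that transparently intercepts a pod's traffic:

```python
# Toy illustration of the sidecar idea: connectivity concerns (retries,
# backoff, latency metrics) live in a wrapper, not in the application code.
# Conceptual sketch only; this is not how Envoy is implemented.
import time
from typing import Callable

def with_connectivity_policy(call: Callable[[], str],
                             retries: int = 3,
                             backoff_s: float = 0.2) -> str:
    """Apply retry/backoff and record latency, like a sidecar proxy would."""
    for attempt in range(1, retries + 1):
        start = time.monotonic()
        try:
            result = call()
            print(f"attempt {attempt}: ok in {time.monotonic() - start:.3f}s")
            return result
        except ConnectionError as exc:
            print(f"attempt {attempt}: failed ({exc}), retrying")
            time.sleep(backoff_s * attempt)
    raise ConnectionError("upstream unavailable after retries")

# Application code stays free of connectivity logic:
def fetch_inventory() -> str:
    return "inventory payload"   # stand-in for a real upstream call

print(with_connectivity_policy(fetch_inventory))
```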
One issue is that the notion of internal and external addresses is not enough. In a cloud, what is internal to a customer differs from what is internal to the cloud provider, and it is easy to end up with overlapping IP addresses, causing confusion. The problem compounds as the meshes grow, because individual teams deploy to the same cluster and there is no automated mechanism to prevent overlaps.
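A minimal sketch of the kind of automated overlap check that is missing today might look like the following; the cluster names and CIDR ranges are hypothetical:

```python
# Minimal sketch of an automated check for overlapping address ranges
# across clusters. Cluster names and CIDRs below are hypothetical.
from ipaddress import ip_network
from itertools import combinations

cluster_cidrs = {
    "cluster-a-pods": "10.244.0.0/16",
    "cluster-b-pods": "10.244.0.0/16",   # accidental reuse by another team
    "cluster-c-pods": "10.12.0.0/16",
}

for (name_a, cidr_a), (name_b, cidr_b) in combinations(cluster_cidrs.items(), 2):
    if ip_network(cidr_a).overlaps(ip_network(cidr_b)):
        print(f"overlap: {name_a} ({cidr_a}) conflicts with {name_b} ({cidr_b})")
```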
With growth comes the challenge of configuration and administration. A common practice among enterprises is to run very large clusters because of the interconnection problem: it is difficult to ensure security across clusters without a great deal of tedious NetOps work to manually implement and maintain network policies. In addition, the configuration of the service meshes is not optimized, for the same reason. Automation of some of these functions is sorely needed to support scalability.
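One form such automation could take is deriving explicit allow-rules from a declared service graph instead of hand-maintaining them. The sketch below is hypothetical (service names, edges and the port are invented) and emits a neutral intent format rather than any particular policy API:

```python
# Sketch: derive allow-rules from a declared service graph instead of
# hand-maintaining them. Service names, edges and the port are hypothetical,
# and the output is a neutral intent format, not a specific policy API.
allowed_calls = {
    "frontend": ["orders", "catalog"],
    "orders":   ["payments", "inventory"],
    "payments": [],
}

def to_policy_intents(graph):
    """Expand a caller -> callees graph into explicit allow rules."""
    intents = []
    for caller, callees in graph.items():
        for callee in callees:
            intents.append({
                "allow_from": caller,
                "allow_to": callee,
                "ports": ["tcp/8080"],   # assumed default service port
            })
    return intents

for intent in to_policy_intents(allowed_calls):
    print(intent)
```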
Mesh of Meshes
One way to extend the scale of service meshes is to look to the way the Internet evolved into a network of networks spanning multiple administrative domains. For service meshes, this would mean a mesh of underlying service meshes, connected via gateways into those meshes. These gateways become the "edges" of the service meshes and perform gatekeeping and content-routing functions. This "Mesh-of-Meshes" approach can address scalability along the four dimensions:
- Security: As interconnections increase, the attack surface grows and needs protection beyond what mTLS offers, because some of the traffic inevitably originates outside the local service mesh. More intelligent traffic monitoring is necessary to flag and respond to attacks; in other words, the zero-trust posture must account for multi-mesh traffic across different network domains. Achieving zero trust here requires the gateway itself to participate in the service mesh, which also provides a way to extend zero trust to monolithic applications.
- Observability: Traffic across all clusters exponentially increases the complexity of the potential interactions. It is important to provide end-to-end visibility and traceability across the ESM to help isolate issues and prevent them from affecting other service meshes in the service graph. Currently, there is no easy way to troubleshoot and trace issues, because existing mechanisms such as namespaces do not offer traffic segmentation.
- Resiliency: In an ESM, resiliency inherently spans clusters and includes failover scenarios that work across the entire ESM. Failover capability should be checked and tested continuously, and failover mechanisms should work seamlessly across application tiers with minimal deployment cost. Existing mechanisms are too expensive to deploy and too slow to respond because they lack a way to isolate failures.
- Load Balancing: Across an ESM, workloads may be placed at the right locations based on available hardware acceleration (for compute - GPU or CPU - and storage), latency to the client, or cloud costs. In addition, traffic may be directed to the clusters that can provide the best available service, which is particularly useful under overload or failure conditions to avoid turning away client traffic. A new controller of controllers is needed to balance workloads while remaining sensitive to the changing QoE of those workloads (a minimal sketch of such cluster selection follows this list).
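As referenced in the Load Balancing item above, a minimal sketch of cross-cluster traffic steering might score candidate clusters on latency, cost and spare capacity and route to the best healthy one; the cluster data and scoring weights below are hypothetical:

```python
# Sketch of cross-cluster traffic steering: score candidate clusters on
# latency, cost, and available capacity, then pick the best healthy one.
# All cluster data and weights below are hypothetical.
clusters = [
    {"name": "us-east", "latency_ms": 18, "cost_per_hr": 1.0, "free_capacity": 0.10, "healthy": True},
    {"name": "eu-west", "latency_ms": 95, "cost_per_hr": 0.8, "free_capacity": 0.60, "healthy": True},
    {"name": "us-west", "latency_ms": 40, "cost_per_hr": 0.9, "free_capacity": 0.45, "healthy": False},
]

def score(cluster, w_latency=1.0, w_cost=20.0, w_capacity=50.0):
    """Lower is better: penalize latency and cost, reward spare capacity."""
    return (w_latency * cluster["latency_ms"]
            + w_cost * cluster["cost_per_hr"]
            - w_capacity * cluster["free_capacity"])

def pick_cluster(candidates):
    healthy = [c for c in candidates if c["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy cluster available")
    return min(healthy, key=score)

print("route traffic to:", pick_cluster(clusters)["name"])
```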
Conclusion
In this paper, we sketched a vision of an Enterprise Service Mesh that can meet the needs of enterprise application connectivity through a Mesh-of-Meshes approach. We analyzed what this might mean along the dimensions of Security, Observability, Resiliency and Load Balancing. We conclude that intelligent automation is an essential requirement for scaling service meshes in the enterprise.
To hear more about cloud native topics, join the Cloud Native Computing Foundation and the cloud native community at KubeCon+CloudNativeCon North America 2021 - October 11-15, 2021.
ABOUT THE AUTHOR
Raj Nair, Founder and CEO, Avesha
Raj Nair is a serial entrepreneur who has previously founded successful startups in cloud computing and media delivery. His first startup, Arrowpoint Communications, invented L5 network load balancing and was acquired by Cisco Systems for $6B. He later co-founded Azuki Systems, a pioneer in over-the-top media delivery to phones and tablets, which was acquired by Ericsson in 2014. Raj holds over 35 patents across a range of communications technologies, including content routing, security and load balancing.