Virtualization Technology News and Information
Enterprise Service Mesh -- a panacea or the beginning of the problem?

By Raj Nair, Founder and CEO, Avesha

Enterprises find themselves on a cloud journey full of opportunities and challenges. They enjoy the agility and ubiquity of a cloud-first architecture, yet struggle to scale it along four dimensions: Security, Observability, Resiliency and Load Balancing. In this paper, we examine each of these dimensions from the viewpoint of an enterprise deployment.

Complexity of Microservice Interconnections

A key feature of modern architecture is that microservices communicate extensively with one another. These interconnections can become very complex, which creates challenges for observability and security. The problem keeps getting worse as the number of microservices in applications grows rapidly.


Figure 1: Uber Service Graph

Multi-cluster and multi-cloud deployments, together with the continued growth of microservices, compound this problem further.
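A back-of-the-envelope count shows why growth compounds the problem. Assuming, as a worst case, that any of n microservices may call any other, the number of potential directed caller-to-callee links is n(n - 1), which grows quadratically with n. The helper name is illustrative and the figures are upper bounds, not measurements of any real service graph.

```python
def potential_links(n: int) -> int:
    """Upper bound on directed caller->callee links among n services (no self-calls)."""
    return n * (n - 1)


# Each tenfold increase in services yields roughly a hundredfold increase in links
# that may need to be secured, observed and traced.
for n in (10, 100, 1000):
    print(f"{n:>5} services -> {potential_links(n):>7} potential links")
```

Real graphs are far sparser than this bound, but even a small fraction of it becomes hard to reason about once the links also cross cluster and cloud boundaries.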

Scaling Locations

Enterprises must deploy in multiple locations to meet their business objectives: serving a global customer base, respecting national boundaries for data sovereignty, and keeping latency low enough for good performance.

Modernization of Applications

Traditional monolithic applications communicate only minimally with each other or with microservices. The challenge is to expose more APIs from these monoliths and to scale up interconnectivity across domains and clusters.

Enterprise Service Mesh - key building block for application connectivity

The Enterprise Service Mesh (ESM) has emerged as the key building block for inter-service connectivity. Envoy and Istio have effectively abstracted interconnectivity through the "sidecar" mechanism, which moves the complexity of connectivity out of the application and into the service mesh. In addition, Envoy handles load balancing and rate limiting, which makes it capable of handling north-south traffic as well. This opens the possibility of a simplified network infrastructure layer that integrates API gateway functionality, as the latest positioning of service mesh providers shows. What is still lacking, however, is a comprehensive service-level network architecture that addresses scaling the ESM, both in the sheer growth of interconnections and in the diversity of locations.
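The sidecar idea can be illustrated with a toy, in-process model: the application handler contains only business logic, while a wrapper standing in for the sidecar enforces a rate limit before any request reaches it. This is a minimal sketch, not how Envoy or Istio are implemented; `Sidecar`, `TokenBucket`, `orders_app` and the 429 response are hypothetical choices, the limiter is a token bucket (one common rate-limiting algorithm), and a real sidecar runs as a separate proxy process that intercepts network traffic.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add the tokens earned since the last check, without exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Sidecar:
    """Toy stand-in for a sidecar proxy: traffic policy lives here, not in the app."""

    def __init__(self, app: Callable[[str], str], limiter: TokenBucket):
        self.app = app
        self.limiter = limiter

    def handle(self, request: str) -> str:
        if not self.limiter.allow():
            return "429 Too Many Requests"  # rejected before reaching the app
        return self.app(request)


def orders_app(request: str) -> str:
    # Business logic only: no rate-limiting code here.
    return f"200 OK: {request}"


fake_now = [0.0]
proxy = Sidecar(orders_app, TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0]))
print([proxy.handle("GET /orders") for _ in range(3)])  # third request is rejected
fake_now[0] = 1.0  # one second later, one token has been refilled
print(proxy.handle("GET /orders"))
```

Because `orders_app` never changes, the limit can be tuned or replaced in the wrapper alone, which is the separation of concerns the sidecar pattern is meant to provide.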

  • IP Address Allocation

One issue is that the simple distinction between internal and external addresses is not enough. In a cloud, what is internal to a customer differs from what is internal to the cloud provider, so it is easy to end up with overlapping IP addresses that cause confusion. The problem compounds as meshes grow, because individual teams deploy to the same cluster and there is no automated mechanism to prevent overlaps.
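An automated guard against such overlaps can be fairly simple. A minimal sketch, assuming each team's address block is recorded as a CIDR string in some shared registry (the team names, ranges and the `find_overlaps`/`allocate` helpers are hypothetical), using Python's standard `ipaddress` module: one function detects overlapping allocations, and the other hands out the first free block from a pool instead of letting teams pick ranges by hand.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def find_overlaps(allocations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of teams whose CIDR blocks overlap."""
    nets = {team: ipaddress.ip_network(cidr) for team, cidr in allocations.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]


def allocate(pool: str, prefix: int, taken: List[str]) -> Optional[str]:
    """Return the first /prefix subnet of `pool` that overlaps nothing in `taken`."""
    used = [ipaddress.ip_network(cidr) for cidr in taken]
    for candidate in ipaddress.ip_network(pool).subnets(new_prefix=prefix):
        if not any(candidate.overlaps(u) for u in used):
            return str(candidate)
    return None  # pool exhausted


allocations = {
    "payments": "10.0.0.0/16",
    "search": "10.0.128.0/17",  # chosen by hand, falls inside payments' range
    "catalog": "10.1.0.0/16",
}
print(find_overlaps(allocations))                              # [('payments', 'search')]
print(allocate("10.0.0.0/8", 16, list(allocations.values())))  # 10.2.0.0/16
```

Running the allocator at deploy time, rather than auditing after the fact, is what keeps two teams sharing a cluster from silently colliding.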

  • Need for Automation

With growth comes the challenge of configuration and administration. Enterprises commonly run very large clusters to sidestep the interconnection problem, because securing traffic across clusters otherwise requires a lot of tedious NetOps work to implement and maintain network policies by hand. For the same reason, service mesh configurations are often left unoptimized. Automating some of these functions is sorely needed to support scalability.
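One form this automation could take is deriving network policies from declared service dependencies instead of writing them by hand. A minimal sketch, assuming each service declares the services it calls (the manifest format and the `build_policies`/`is_allowed` helpers are hypothetical): the declared graph is inverted into default-deny ingress allowlists, which a real system would then translate into its platform's own policy objects and regenerate whenever the declarations change.

```python
from typing import Dict, Set


def build_policies(deps: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn declared caller->callee dependencies into per-service ingress allowlists.

    Any caller not listed for a service is denied (default deny).
    """
    allowed: Dict[str, Set[str]] = {svc: set() for svc in deps}
    for caller, callees in deps.items():
        for callee in callees:
            allowed.setdefault(callee, set()).add(caller)
    return allowed


def is_allowed(policies: Dict[str, Set[str]], src: str, dst: str) -> bool:
    return src in policies.get(dst, set())


# Hypothetical dependency manifest, e.g. kept alongside each service's deployment.
deps = {
    "frontend": {"orders", "catalog"},
    "orders": {"payments"},
}
policies = build_policies(deps)
print(is_allowed(policies, "frontend", "orders"))    # True
print(is_allowed(policies, "frontend", "payments"))  # False: not declared, so denied
```

The design choice is that engineers maintain one small, reviewable declaration per service, and every policy is computed from it, so cross-cluster rules no longer drift out of sync with the actual call graph.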

Mesh of Meshes

One way to extend the scale of service meshes is to follow the way the Internet evolved into a network of networks spanning multiple administrative domains. For service meshes, this means building a mesh of underlying service meshes, connected through gateways to each of them. These gateways become the "edges" of the service meshes and perform gatekeeping and content routing functions. This "Mesh-of-Meshes" approach can address scalability along the four dimensions:

  • Security: As interconnections increase, so does the attack surface, and it needs protection beyond what mTLS offers, because some of the traffic inevitably originates outside the local service mesh. More intelligent traffic monitoring is necessary to flag and respond to attacks; in other words, a zero-trust posture must account for multi-mesh traffic across different network domains. Achieving this requires the gateway itself to participate in the service mesh, which would also provide a way to extend zero-trust to monolithic applications.
  • Observability: Traffic across all clusters sharply increases the complexity of potential interactions. End-to-end visibility and traceability across the ESM are important to help isolate issues and keep them from affecting other service meshes in the service graph. Currently, there is no easy way to troubleshoot and trace such issues, because existing mechanisms such as namespaces do not offer traffic segmentation.
  • Resiliency: In an ESM, resiliency inherently spans clusters and includes failover scenarios that work across the entire ESM. Failover capability can also be checked and tested continuously, and failover mechanisms can work seamlessly across application tiers with minimal deployment cost. Existing mechanisms are too expensive to deploy and too slow to respond, because they lack a way to isolate failures.
  • Load Balancing: Across an ESM, workloads can be placed in the right locations based on available hardware acceleration (GPU or CPU for compute, as well as storage), latency to the client, or cloud costs. Traffic can also be directed to the clusters that offer the best available service, which is particularly useful under overload or failure conditions to avoid turning away client traffic. A new "controller of controllers" is needed to balance workloads and respond to their changing quality of experience (QoE).
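The gateway's gatekeeping and content-routing roles, together with the resiliency and load-balancing goals listed above, can be illustrated with a toy routing decision. This is a minimal sketch under stated assumptions, not an existing product API: `EdgeGateway`, `MeshEndpoint`, the mesh names and the 80% utilization threshold are hypothetical, and the health and load figures are assumed to come from the continuous checks and telemetry a real controller would have to collect.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MeshEndpoint:
    mesh: str           # name of an underlying service mesh that serves the service
    healthy: bool       # outcome of the latest continuous health check
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    latency_ms: float   # latency observed from this gateway


class EdgeGateway:
    """Toy mesh-of-meshes gateway combining gatekeeping with content routing."""

    def __init__(self, exports: Dict[str, List[MeshEndpoint]], max_util: float = 0.8):
        # Gatekeeping: only services listed here are reachable across meshes.
        self.exports = exports
        self.max_util = max_util

    def route(self, service: str) -> Optional[str]:
        """Pick the mesh that should serve `service`, or None if none can."""
        if service not in self.exports:
            raise PermissionError(f"{service!r} is not exported by any mesh")
        healthy = [e for e in self.exports[service] if e.healthy]
        if not healthy:
            return None  # every mesh has failed; the caller must queue or reject
        # Normal case: lowest-latency mesh that still has headroom.
        roomy = [e for e in healthy if e.utilization < self.max_util]
        if roomy:
            return min(roomy, key=lambda e: e.latency_ms).mesh
        # Overload: spill to the least-loaded healthy mesh instead of turning traffic away.
        return min(healthy, key=lambda e: e.utilization).mesh


gateway = EdgeGateway({
    "payments": [
        MeshEndpoint("mesh-nyc", healthy=True, utilization=0.95, latency_ms=5),
        MeshEndpoint("mesh-east", healthy=True, utilization=0.40, latency_ms=18),
        MeshEndpoint("mesh-west", healthy=False, utilization=0.10, latency_ms=70),
    ],
})
# mesh-nyc is closest but overloaded, and mesh-west failed its health check.
print(gateway.route("payments"))  # mesh-east
```

Preferring headroom over raw latency, and spilling over rather than refusing when every mesh is busy, mirrors the goal of not turning away client traffic; a real controller of controllers would also weigh the placement factors listed above, such as hardware acceleration and cloud cost.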

Conclusion

In this paper, we sketched a vision of an Enterprise Service Mesh that can meet the needs of enterprise application connectivity through a Mesh-of-Meshes approach. We analyzed what this might mean along the dimensions of Security, Observability, Resiliency and Load Balancing. We conclude that intelligent automation is an essential requirement for scaling service meshes in the enterprise.

##

To hear more about cloud native topics, join the Cloud Native Computing Foundation and cloud native community at KubeCon+CloudNativeCon North America 2021 - October 11-15, 2021     

ABOUT THE AUTHOR

Raj Nair, Founder and CEO, Avesha


Raj Nair is a serial entrepreneur who has founded successful startups in cloud computing and media delivery. His first startup, Arrowpoint Communications, invented L5 network load balancing and was acquired by Cisco Systems for $6B. He later co-founded Azuki Systems, a pioneer in over-the-top (OTT) media delivery to phones and tablets, which was acquired by Ericsson in 2014. Raj holds over 35 patents in a range of communications technologies, including content routing, security and load balancing.
Published Thursday, September 30, 2021 7:33 AM by David Marshall