Four Main Service Mesh Deployment Models
This article provides a comprehensive analysis of the four primary service mesh data plane deployment models: Sidecar, Ambient, Cilium mesh, and gRPC. We explore the architecture, performance, security, management complexity, and resource efficiency of each model, offering recommendations to help you make the best decision for different application scenarios. Whether you prioritize high performance, low resource consumption, or require stronger security guarantees, this guide will help you choose the right deployment model.
Service Mesh Data Plane Comparison Matrix
The following table compares the deployment modes of the service mesh data plane across multiple dimensions:
Data plane modes | Platform security – threat assessment, risk | Resource Efficiency – infra/resource consumption, etc. | Manageability – upgrades, vulnerabilities, etc. | Performance – latency, etc. |
Sidecar mode: L4 and L7 Proxy per Service Instance | High security, as each service instance has an independent proxy, reducing the attack surface. Risk management depends on control plane configuration. | Higher resource consumption, as each instance requires an independent proxy. | Centralized management and configuration are required; upgrades are relatively complex but can be simplified through the control plane. | May increase latency as requests need to be forwarded through the proxy. |
Ambient mode: Shared L4 – L7 per Service Model | Designed for security, with ztunnel (in Istio) for local routing. However, shared proxies can introduce risks, and its overall security maturity is still evolving. | Higher efficiency as multiple services share the same proxy. | Relatively simple management, but may face vulnerabilities due to the shared proxy. | Good performance with local routing, but may incur cross-AZ costs with waypoint proxies. |
Cilium mesh mode: Shared L4 and L7 Model | Moderate security with a focus on eBPF and fine-grained access control. However, there are known issues with identity and trust models. | Efficiency due to kernel-level processing, reducing infrastructure expenses. | Management is more complex, needing to handle configurations for multiple services. | Variable performance; certain scenarios might introduce significant latency. |
gRPC mode: L4 and L7 Part of the Application Model | While gRPC integrates proxy functions within the application, theoretically reducing the attack surface, the application’s complexity and variability can actually expand it. The security of the gRPC mode depends on specific use cases and needs careful evaluation of potential threats and attack surfaces. | Higher efficiency because the proxy is implemented inline in the same process as the app. | Complex management, regular updates and maintenance of application layer proxy required. | Superior performance with low latency, suitable for real-time applications. |
These four deployment modes are differentiated based on how proxies are associated with service instances.
The following diagram illustrates potential locations for proxies in different deployment modes of the service mesh data plane.
- Sidecar Mode: The proxy is in the same Pod as the application container.
- Ambient Mode: The L4 proxy is on the same node as the application container, while the L7 proxy may not be on the same node.
- Cilium Mode: The L4 and L7 proxies are combined and located on the same node as the application container.
- gRPC Mode: The gRPC framework is integrated into the application and deployed within the same container.
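The placements listed above can also be written down as a small lookup table, which makes the differences easy to query. This is only an illustrative sketch: the names `Location`, `Placement`, `PLACEMENTS`, and `l7_may_leave_node` are hypothetical and do not come from Istio, Cilium, or gRPC.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """Where a proxy function runs relative to the application container."""
    SAME_CONTAINER = "inside the application container"
    SAME_POD = "separate proxy in the same Pod"
    SAME_NODE = "shared proxy on the same node"
    ANY_NODE = "shared proxy, possibly on another node"


@dataclass(frozen=True)
class Placement:
    l4: Location
    l7: Location


# Transcribed from the list above; keys and types are illustrative only.
PLACEMENTS = {
    "sidecar": Placement(l4=Location.SAME_POD, l7=Location.SAME_POD),
    "ambient": Placement(l4=Location.SAME_NODE, l7=Location.ANY_NODE),
    "cilium": Placement(l4=Location.SAME_NODE, l7=Location.SAME_NODE),
    "grpc": Placement(l4=Location.SAME_CONTAINER, l7=Location.SAME_CONTAINER),
}


def l7_may_leave_node(mode: str) -> bool:
    """True when L7 processing may happen on a different node than the app."""
    return PLACEMENTS[mode].l7 is Location.ANY_NODE


for mode, placement in PLACEMENTS.items():
    print(f"{mode:8s} L4: {placement.l4.value}; L7: {placement.l7.value}")
```

Only ambient mode has an L7 location that may differ from the application's node, which is where the cross-AZ cost noted in the comparison matrix comes from.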
Sidecar Mode: L4 and L7 Proxy per Service Instance
The diagram below shows the communication paths in sidecar mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
This is the most common deployment mode for service meshes and was the initial mode supported by Istio. Each service instance is accompanied by a proxy (such as Envoy), which handles all inbound and outbound network communications, including L4 and L7 layers.
- Advantages: High security, as each service instance is isolated, reducing potential attack surfaces.
- Disadvantages: High resource consumption, as each service instance requires a separate proxy, increasing infrastructure costs.
- Maturity: Istio's sidecar mode is production-ready. It has undergone extensive testing and is ready for use in real-world environments.
Ambient Mode: Shared L4 – L7 per Service Model
The diagram below illustrates the communication paths in ambient mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
In this mode, a shared L4 proxy on each node serves all service instances on the same physical host, while each service account has a dedicated L7 proxy.
- Advantages: Lower costs, as the proxy is shared among multiple services.
- Disadvantages: Although the ztunnel component is designed for security, shared proxies can introduce risks. The security maturity of this model is still evolving.
- Maturity: Istio ambient mode is currently in the beta stage. There are no large-scale, production-level best practices yet, and it does not support multi-cluster deployments.
In Istio ambient mode, ztunnel is deployed as an L4 proxy on each node. Waypoint proxies, which handle L7, can be deployed per namespace, for specific services, or shared across namespaces.
Cilium Mesh Mode: Shared L4 and L7 Model
The diagram below displays the communication paths in Cilium mesh mode where Application 1 accesses Application 2 on the same node and Application 3 across nodes.
This mode is a middle ground between fully independent and fully shared setups, with each node having a shared L7 proxy. However, L4 functions like traffic routing can be managed without a proxy through kernel programs (e.g., eBPF programs) or a mesh proxy. An example of this data plane mode is the Cilium service mesh, which deploys Envoy proxies as L7 proxies according to its CiliumEnvoyConfig specification. Using a CNI plugin like Cilium can achieve secure isolation between services while reducing resource usage.
- Advantages: Kernel-level efficiency can reduce infrastructure costs in specific scenarios.
- Disadvantages: Management is more complex, and certain scenarios may result in increased latency.
- Maturity: Cilium mesh manages L4 traffic directly through eBPF and configures the Envoy proxy on each node to control L7 traffic via CRDs (such as CiliumEnvoyConfig). However, there are concerns about its security due to inconsistent identity models. The proxy is a customized Envoy build with minimal extensions plus Cilium policy enforcement filters, so Cilium mesh may not support all the features of the Envoy proxy.
Note: This model is not the data plane of Istio.
gRPC Mode: L4 and L7 Part of the Application Model
In the gRPC mode, no external proxies are deployed; instead, proxy functions are directly integrated into the application using the RPC framework, leading to significant intrusion into the application. The service mesh control plane uses a set of discovery APIs known as xDS APIs to dynamically configure the application. The gRPC client libraries within the application provide extensive support for the xDS APIs. With this capability, the service mesh control plane can program L4 and L7 proxy functions directly within this library inside the service container.
The diagram below illustrates how, in Istio’s gRPC mode, the control plane communicates with the application.
In this mode, when a gRPC service communicates with the control plane, a traditional Sidecar proxy is not needed; instead, a specific agent is used for initialization and communication with the control plane. This design reduces resource consumption and deployment complexity while still enabling functions such as service discovery and traffic management.
- Advantages: High performance, as the proxy is tightly integrated with the application, reducing network hops and additional overhead.
- Disadvantages: High complexity, as complex network processing functions need to be implemented within the application, which may increase development costs.
- Security Considerations: The security of this model is debated. While integrating proxy functions within the application theoretically reduces the external attack surface, the application’s diversity and complexity could expand the overall attack surface. Therefore, when considering the security of the gRPC mode, it is crucial to carefully analyze the security threat model and attack risks in specific use cases.
- Maturity: The gRPC mode in Istio is still in the experimental stage.
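To make the xDS-based configuration described above more concrete, the sketch below builds the kind of JSON bootstrap document that xDS-enabled gRPC libraries read at startup (they locate it through the `GRPC_XDS_BOOTSTRAP` environment variable and then resolve `xds:///` targets against the listed server). The function name `build_bootstrap`, the node ID, and the socket path are placeholders chosen for illustration, not values taken from Istio; in a real deployment the agent mentioned above would typically generate this file.

```python
import json


def build_bootstrap(node_id: str, xds_server_uri: str) -> dict:
    """Return a minimal gRPC xDS bootstrap document as a dict.

    Both arguments are placeholders supplied by the caller; real values
    depend on the control plane and the agent in use.
    """
    return {
        "xds_servers": [
            {
                # Address of the xDS endpoint the gRPC library connects to.
                "server_uri": xds_server_uri,
                # Assumption: a local, unauthenticated connection to an agent.
                "channel_creds": [{"type": "insecure"}],
                "server_features": ["xds_v3"],
            }
        ],
        # Identity the library presents to the control plane.
        "node": {"id": node_id},
    }


bootstrap = build_bootstrap(
    node_id="example-node-id",                        # hypothetical
    xds_server_uri="unix:///path/to/agent/xds.sock",  # hypothetical
)
print(json.dumps(bootstrap, indent=2))
```

The application itself would then dial a target such as `xds:///my-service` (a hypothetical name), and routing, load balancing, and discovery decisions are driven by what the control plane sends over xDS rather than by a separate proxy.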
Which Mode Should I Use?
As previously introduced, several factors influence the choice of a service mesh data plane deployment mode:
- Maturity
- Enterprise security needs
- Resource constraints
- Performance requirements
- Network overhead
- Tolerance for management complexity
Maturity
When considering the deployment modes of the service mesh data plane, maturity is a key factor. The maturity level of each mode affects its reliability and support in production environments:
- Sidecar Mode: This is the most mature service mesh deployment mode, widely adopted in production environments and well-supported.
- Ambient Mode: While this mode offers some cost and performance advantages, it is still in the early stages and may lack mature best practices and broad ecosystem support.
- Cilium Mesh Mode: As a relatively new option, it offers unique technological advantages, especially in scenarios using eBPF. However, concerns about its security model and identity management suggest it may not be as mature or reliable as other modes.
- gRPC Mode: Despite excellent performance, the complexity and intrusiveness of this mode mean it may require more custom development and is still in the experimental stage.
Enterprise Security Needs
If your business has high security requirements, such as in the financial or healthcare sectors, then the Sidecar Mode might be the best choice. This mode provides strong security by ensuring each service instance has its own independent proxy, thus maximizing service isolation. For those exploring newer models like Ambient Mode, it’s essential to understand that while ztunnel aims for secure local routing, the mode’s overall security strategy is still evolving.
Resource Constraints
In resource-constrained environments, deploying a separate proxy for each service instance may not be practical. In such cases, consider the gRPC Mode or Ambient Mode. gRPC Mode is particularly suitable for organizations that already use gRPC extensively and are willing to handle complex networking functions internally within the application. The Ambient Mode, on the other hand, uses a shared proxy to reduce resource consumption.
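A rough way to see why per-instance proxies become expensive is to count how many proxy processes each mode needs for the same cluster. The sketch below is a simplification under stated assumptions: it counts proxies only (not their CPU or memory, and a shared proxy is usually larger than a single sidecar), it assumes one L4 proxy per node in ambient mode and one shared proxy per node in Cilium mesh mode, and the function name `proxy_count` and the cluster sizes are made up for illustration.

```python
def proxy_count(mode: str, pods: int, nodes: int, waypoints: int = 0) -> int:
    """Estimate the number of standalone proxies a mode deploys.

    Simplified model based on the descriptions in this article:
    - sidecar: one proxy per service instance (Pod)
    - ambient: one L4 proxy per node plus however many waypoints are deployed
    - cilium:  one shared proxy per node
    - grpc:    no standalone proxy; the logic lives inside the application
    """
    if mode == "sidecar":
        return pods
    if mode == "ambient":
        return nodes + waypoints
    if mode == "cilium":
        return nodes
    if mode == "grpc":
        return 0
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical cluster: 200 Pods on 10 nodes, with 5 waypoint proxies.
for mode in ("sidecar", "ambient", "cilium", "grpc"):
    print(mode, proxy_count(mode, pods=200, nodes=10, waypoints=5))
```

In this model the sidecar count grows with the number of Pods, while the shared modes grow with the number of nodes (plus waypoints for ambient), which is the scaling difference behind the resource-efficiency column of the comparison matrix.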
Performance Requirements
For applications requiring high performance and low latency, the gRPC Mode provides optimal performance because it eliminates the additional network hops introduced by traditional proxies. However, it’s important to note that the gRPC Mode is still experimental and may not support all features of Istio. Consider your service mesh functionality needs accordingly.
Network Overhead
Each data plane mode has distinct characteristics affecting network overhead. Sidecar mode, with locality-aware routing, reduces cross-zone traffic but adds network hops, increasing latency and compute use. Ambient mode uses ztunnel for local routing but may incur cross-AZ costs when traffic passes through waypoint proxies. Cilium mode places proxies on the same node as applications, potentially reducing inter-node traffic, but it could introduce more latency. gRPC mode integrates the RPC framework into the application, minimizing network hops and overhead, which makes it well suited to high-performance, low-latency needs.
Tolerance for Management Complexity
Management complexity is also a significant consideration when choosing a service mesh data plane mode. Sidecar Mode and gRPC Mode may require more complex configurations and maintenance, while the Ambient Mode might offer a more streamlined management experience in some deployment environments. Cilium Mode could require complex management due to its reliance on eBPF and multiple configuration points.
Conclusion
Choosing the right service mesh data plane deployment mode depends on specific factors including maturity, security, resource constraints, performance, and management complexity. Here’s a quick guide:
- Sidecar Mode: Best for high security needs, offering the most isolation.
- gRPC Mode: Suitable for environments with high-performance demands where gRPC is already in use.
- Ambient Mode: Good for cost-effectiveness and lower isolation needs, but the security model is evolving.
- Cilium Mesh Mode: Could be good for infrastructures utilizing eBPF technology, but consider security and management complexity.
The best choice will align with your application requirements, security policies, and technical familiarity. It’s essential to understand each mode’s strengths and limitations to make an informed decision that balances benefits, risks, and costs.
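The quick guide above can be expressed as a simple lookup from a primary priority to a suggested mode and its main caveat. This is a deliberately naive sketch: the priority keys, the `QUICK_GUIDE` table, and the `recommend` function are hypothetical, and a real decision should weigh all of the factors discussed in this article rather than a single one.

```python
# Priority -> (suggested mode, main caveat), paraphrased from the quick guide.
QUICK_GUIDE = {
    "security": ("sidecar", "higher resource consumption per instance"),
    "performance": ("grpc", "experimental in Istio; requires gRPC in the app"),
    "cost": ("ambient", "security model is still evolving"),
    "ebpf": ("cilium", "weigh security and management complexity"),
}


def recommend(priority: str) -> tuple[str, str]:
    """Return (mode, caveat) for a single primary priority."""
    try:
        return QUICK_GUIDE[priority]
    except KeyError:
        raise ValueError(
            f"unknown priority {priority!r}; expected one of {sorted(QUICK_GUIDE)}"
        ) from None


mode, caveat = recommend("security")
print(f"Suggested mode: {mode} (caveat: {caveat})")
```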
A correction was made on Sept. 12, 2024: an earlier version of this article referred to “ambient mode” as “Ambient Mode – Shared L4/L7 per Service Node.” This has been corrected to more closely match the NIST SP 800-233 initial public draft. The relative security of each mode has been updated in Figure 1 to more accurately reflect the assessment of NIST SP 800-233.