This is the first in a series of three articles: this one reviews the development of the Istio open-source project, the second shows how to optimize Istio performance, and the third describes Istio's open-source ecosystem and future. I also share my views on the most appropriate uses of eBPF with Istio, mostly in the second article.
Service mesh technology is on the rise due to the popularity of Kubernetes container management software, the use of microservices and the DevOps approach in application development and delivery, and the growing use of cloud native architectures. Istio is the leading service mesh software, nearly always implemented with Envoy as a sidecar proxy.
The rise of Kubernetes and of programmable data proxies such as Envoy created the foundation for Istio. Istio's future is to serve, further, as the foundation of a secure, zero-trust network.
The Birth of Istio
Since 2013, with the explosion of the mobile Internet, enterprises have needed greater efficiency and faster updates to existing code in application development. As a result, application architecture has shifted from the traditional monoliths to microservices, and the DevOps approach to application development and delivery has also become popular.
In the same year, with Docker container software being put into open source, the problems of application encapsulation and isolation were largely solved, making it easier to schedule applications using an orchestration system.
In 2014, Kubernetes and Spring Boot were open-sourced, and using the Spring Framework for the development of microservice applications became popular. In the next few years, many remote procedure call (RPC) middleware open source projects appeared, such as Google’s gRPC 1.0, released in 2016.
Today, service frameworks are in full bloom. To save costs, increase development efficiency, and make applications more flexible, enterprises are migrating to the cloud. But this is not as simple as just moving applications to the cloud: to use cloud computing resources efficiently, a set of “cloud native” methods and concepts has emerged.
The Istio Open Source Timeline
Let’s briefly review the major events of Istio open source, beginning with the Envoy data proxy, since it’s an important part of Istio:
- September 2016: Envoy was open sourced. Before that, it had been used as an edge proxy inside Lyft and verified there in large-scale production. Envoy was open sourced before Istio and caught the attention of Google engineers: at the time, Google was planning to launch an open source service mesh project and had initially planned to use Nginx as the proxy. In 2017, the Envoy open source project was donated to CNCF.
- May 2017: Istio was open sourced by Google, IBM, and Lyft, using Envoy as the data proxy. Istio itself adopted a microservices architecture from the beginning, and the composition of the data plane, the control plane, and the sidecar pattern was settled.
- March 2018: Kubernetes became the first project to graduate from CNCF; its basic APIs were finalized, and it became increasingly “boring” (that is, stable). CNCF officially added service mesh to the second version of its cloud native definition. Tetrate was founded by members of the Google Istio team.
- July 2018: Istio 1.0 was released, billed as “production ready.”
- March 2020: Istio 1.5 was released. The architecture returned to a monolithic application; the release cycle was settled, with a major version released every three months; and the API became stable.
- From 2020 to the present: Istio development has focused mainly on Day-2 operations, performance optimization, and extensibility. Several open-source projects in the Istio ecosystem have begun to emerge, such as Slime, Aeraki, and Merbridge.
Why Did Istio Arrive after Kubernetes?
The fundamental drivers of service mesh are the combined effects of a proliferation of languages and associated technology stacks, the surge in the number of services, and the shortened life cycle and dynamism of containers.
To make it possible for developers to manage traffic between services with minimal cost, Istio needs to solve three problems:
- Provide an efficient and scalable sidecar proxy that can be configured through an API.
- Inject proxies into applications to handle inter-application traffic and efficiently manage the fleet of distributed sidecar proxies.
- Transparently mediate traffic between applications, so developers can take advantage of Istio’s capabilities without modifying their applications.
Solving all three of these problems is essential for the Istio service mesh, and the choice of sidecar proxy has a direct effect on the direction and success of the project.
For container orchestration, scheduling, and management, Istio relies on Kubernetes; the need for a programmable proxy is met by Envoy.
In Figure 1 below, we can see the transition of the service deployment architecture from Kubernetes to Istio.
From Kubernetes to Istio, in a nutshell, the deployment architecture of the application has the following characteristics:
- Kubernetes manages the life cycle of applications—specifically, application deployment and management (scaling, automatic recovery, and rollouts).
- Sidecars are injected automatically, using a Kubernetes init container together with the sidecar pattern, to achieve transparent traffic interception: the service's inbound and outbound traffic is first intercepted by the sidecar proxy, and the proxy's behavior is then managed through configuration from the Istio control plane. (There's also a proxyless mode for Istio; see gRPC proxyless service mesh for details.)
- The service mesh decouples traffic management from Kubernetes: traffic inside the mesh does not need the support of the kube-proxy component. Through an abstraction close to the microservice application layer, the mesh manages traffic between services and provides security and observability features.
- The control plane sends proxy configuration to the data plane through the xDS protocol. The proxies that have implemented xDS include Envoy and the MOSN project, which, like Envoy, is open source.
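To make the xDS relationship concrete, here is an illustrative (and deliberately incomplete) Envoy bootstrap fragment showing a proxy configured to fetch its listeners and clusters dynamically over ADS, the aggregated variant of xDS, from a control plane. The cluster name and the control-plane address are assumptions for the sketch, not values mandated by Istio:

```yaml
# Illustrative Envoy bootstrap fragment (not a complete configuration):
# listeners (LDS) and clusters (CDS) are delivered dynamically over ADS.
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster   # assumed name for the control-plane cluster
  lds_config:
    resource_api_version: V3
    ads: {}
  cds_config:
    resource_api_version: V3
    ads: {}
static_resources:
  clusters:
  - name: xds_cluster               # points at the control plane, e.g. istiod
    type: STRICT_DNS
    connect_timeout: 1s
    http2_protocol_options: {}      # xDS is served over gRPC (HTTP/2)
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: istiod.istio-system.svc   # assumed address
                port_value: 15010
```

In an Istio mesh you never write this bootstrap by hand; the injected sidecar is generated with an equivalent configuration pointing at istiod.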
Typically, north-south traffic in Kubernetes is managed by the Kubernetes Ingress resource. With Istio, this has changed, and the traffic is managed by the Gateway resource. Note that Istio has support for the Ingress resource as well. (This allows Istio to be used as a lightweight API gateway; a similar capability is being added to Envoy through the Envoy Gateway project.)
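As a sketch of what this looks like in practice, the following pair of resources exposes a service at the edge of the mesh. The hostnames, resource names, and ports are hypothetical; only the resource kinds and fields come from Istio's networking API:

```yaml
# Hypothetical example: route external HTTP traffic for web.example.com
# through Istio's ingress gateway to the in-mesh "web" service.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
spec:
  selector:
    istio: ingressgateway   # bind to Istio's default ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "web.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web
spec:
  hosts:
  - "web.example.com"
  gateways:
  - web-gateway             # attach the routes to the Gateway above
  http:
  - route:
    - destination:
        host: web           # assumed in-cluster service name
        port:
          number: 8080
```

The same VirtualService API also governs east-west routing inside the mesh, which is what makes Istio usable as a lightweight API gateway.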
Transparent Traffic Hijacking
If you are using middleware such as gRPC to develop microservices, the interceptor in the SDK will automatically intercept the traffic for you, as shown in the following figure.
How do we make the traffic in a Kubernetes pod go through the proxy? The answer is to inject a proxy into each application pod. Containers in a pod share the same network namespace, which allows us to hijack the application's inbound and outbound traffic and route it through the sidecar. The traffic is hijacked using iptables, as shown in the figure below.
From the figure, we can see a very complex set of traffic hijacking logic using iptables. The advantage of using iptables is that it is supported by any Linux operating system. But the use of iptables also has some side effects:
- All services in the Istio mesh need to add a network hop when entering and leaving the pod. Although each hop may be only a couple of milliseconds, as the dependencies between services in the mesh increase, total latency may increase significantly; this may not be suitable for services that require low latency.
- As the number of services increases, so does the number of injected sidecars. The control plane needs to deliver more Envoy proxy configurations to the data plane, which increases system memory and network resource use.
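The hijacking logic in the figure can be boiled down to a few simplified iptables rules. This is an illustrative fragment only, not the full rule set Istio's init container installs (the real implementation uses dedicated chains such as ISTIO_OUTPUT and ISTIO_REDIRECT, plus many exclusions):

```shell
# Illustrative sketch of Istio-style traffic hijacking (do not run as-is).

# Outbound: in the nat OUTPUT chain, first exempt Envoy's own traffic
# (it runs as UID 1337) so proxied traffic does not loop back through
# the proxy, then redirect everything else to Envoy's outbound
# listener on port 15001.
iptables -t nat -A OUTPUT -p tcp -m owner --uid-owner 1337 -j RETURN
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001

# Inbound: redirect TCP traffic arriving at the pod to Envoy's inbound
# listener on port 15006.
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006
```

Because REDIRECT rewrites the destination in the kernel's nat table, neither the application nor its peers need to know the sidecar exists; this is exactly the transparency, and the extra per-hop cost, discussed above.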
How can the service mesh be optimized in response to these two problems?
- Use proxyless mode: remove the sidecar proxy and return to the SDK.
- Optimize the data plane: reduce the frequency and size of proxy configurations delivered to the data plane.
- eBPF: optimize network hijacking with eBPF.
Specifics for each of these approaches will be provided in the second article of this series, on performance optimization.
Sidecar Operation and Maintenance Management
Istio is built on top of Kubernetes and leverages Kubernetes' container orchestration and lifecycle management, automatically injecting sidecars into pods through an admission controller each time Kubernetes creates a pod.
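In practice, Istio's mutating admission webhook injects the sidecar into any pod created in a namespace carrying the `istio-injection=enabled` label. A minimal sketch (the namespace name is hypothetical; the label is Istio's documented convention):

```yaml
# Any pod created in this namespace will have the Envoy sidecar
# (and the iptables init container) injected by Istio's webhook.
apiVersion: v1
kind: Namespace
metadata:
  name: demo                    # hypothetical namespace name
  labels:
    istio-injection: enabled    # opt this namespace into sidecar injection
```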
To address the sidecar resource consumption problem, four service mesh deployment modes have been proposed, as shown in the following figure.
The following table compares these four deployment modes in detail. Each has advantages and disadvantages; which one to choose depends on your situation.
| Mode | Memory overhead | Security | Fault domain | Operation and maintenance |
| --- | --- | --- | --- | --- |
| Sidecar proxy | Overhead is most significant, because a proxy is injected per pod. | Since the sidecar must be deployed with the workload, the workload can bypass the sidecar. | Pod-level isolation; if the proxy fails, only the workload in that pod is affected. | A workload's sidecar can be upgraded independently without affecting other workloads. |
| Node-shared proxy | Only one proxy per node, shared by all workloads on that node, so overhead is low. | There are security risks in managing encrypted content and private keys. | Node-level isolation; a version conflict, configuration conflict, or extension incompatibility during a shared-proxy upgrade may affect all workloads on that node. | No need to worry about injecting sidecars. |
| Service-account/node-shared proxy | All workloads under the same service account/identity share one proxy, so overhead is low. | Authentication and security of the connections between workloads and proxies are not guaranteed. | Same fault domain as the node-shared proxy. | Same as node-shared proxy mode, above. |
| Shared remote proxy with micro-proxy | Because a micro-proxy is injected per pod, overhead is relatively large. | L4 routing is decoupled from security concerns, since the micro-proxy handles only mTLS. | When an L7 policy needs to be applied, traffic from workload instances is routed through an L7 proxy; otherwise the L7 proxy can be bypassed entirely. The L7 proxy can run as a remote proxy, a per-service-account proxy, or a shared node proxy. | Same as sidecar proxy mode, above. |
The Evolution of the Programmable Proxy
In a 2022 article, Zhang Xiaohui of Flomesh explained the evolution of proxy software. I will quote some of his views below to illustrate the crucial role of programmable proxies in Istio.
The following figure shows the evolution process of the proxy software from configuration to programmable mode, and the representative proxy software in each stage.
The entire proxy evolution process follows the application as it moves from local and monolithic to large-scale and distributed. I will briefly outline the evolution of proxy software:
- Configuration file era: Almost all software has configuration files, and proxy software, with its relatively complex functionality, is inseparable from them. Proxies at this stage were mainly developed in C, including their extension modules, so extending them demanded deep expertise. This is the original form of proxies, represented by Nginx, Apache HTTP Server, Squid, etc.
- Configuration language era: Proxies in this era are more extensible and flexible, and support features such as dynamic data acquisition and matching logic judgment. Varnish and HAProxy are two representative examples.
- Scripting language era: Since the introduction of scripting languages, proxies have become programmable. We can use scripts to add dynamic logic to proxies more easily, increasing development efficiency. The representative software is Nginx together with the scripting languages it supports (such as Lua via OpenResty).
- Clusters era: With the popularity of cloud computing, large-scale deployment and dynamic configuration of APIs have become necessary capabilities for proxies. With the increase in network traffic, large-scale proxy clusters have emerged. The representative proxies of this era include Envoy, Kong, etc.
- Cloud-native era: Multi-tenancy, elasticity, heterogeneous hybrid cloud, multi-cluster, security, and observability are all higher-level requirements for proxies in the cloud-native era. This is also a historic opportunity for service meshes; proxies are combined to form a mesh, with representative software such as Istio, Linkerd, and Pipy.
Are These All Service Meshes?
The table below compares the current popular open source service mesh projects:
| | Istio | Linkerd | Consul Connect | Traefik Mesh | Kuma | Open Service Mesh (OSM) |
| --- | --- | --- | --- | --- | --- | --- |
| License | Apache License 2.0 | Apache License 2.0 | Mozilla Public License 2.0 | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 |
| Initiator | Google, IBM, Lyft | Buoyant | HashiCorp | Traefik Labs | Kong | Microsoft |
| Service proxy | Envoy, with a proxyless mode for gRPC | Linkerd2-proxy | Envoy by default, replaceable | Traefik Proxy | Envoy | Envoy |
| Ingress controller | Envoy and custom Ingress; supports the Kubernetes Gateway API | No built-in | Envoy, with support for the Kubernetes Gateway API | No built-in | Kong | Supports Contour and Nginx; compatible with others |
| Governance | Istio community and Open Usage Commons; proposed for donation to CNCF | CNCF | See contribution guidelines | See contribution guidelines | CNCF | CNCF |
| Comment | One of the most popular service mesh projects at present. | The earliest service mesh and creator of the term “service mesh”; the first service mesh project to enter CNCF, using a lightweight proxy developed in Rust. | Consul's service mesh, using Envoy as a sidecar proxy. | A service mesh launched by Traefik, using Traefik Proxy as a sidecar and supporting SMI (mentioned below). | A service mesh launched by Kong that uses Envoy as a sidecar proxy and Kong's own gateway as ingress. | An open source service mesh created by Microsoft, using Envoy as a sidecar, compatible with SMI (also proposed by Microsoft). |
For a detailed comparison of Istio, Linkerd, and Consul Connect, check out this blog post.
In addition to the items listed above, there are a few others that are related to service mesh:
- Envoy: As mentioned, Envoy is a cloud-native proxy, frequently used as a sidecar in other Envoy-based service meshes and for building API Gateways.
- Service Mesh Performance (SMP): Metrics that capture details of infrastructure capacity, service mesh configuration, and workload metadata to standardize service mesh values and describe the performance of any deployment.
- Service Mesh Interface (SMI): This is not a service mesh but a set of service mesh implementation standards. Similar to OAM, SPIFFE, CNI, CSI, etc., SMI defines interface standards, and the specific implementation varies. Currently, Traefik Mesh and Open Service Mesh claim to support SMI.
- Network Service Mesh: It’s worth mentioning this project because it’s often mistaken for a service mesh. In fact, it is oriented towards a three-layer network, and it can be used to connect multi-cloud/hybrid clouds without changing the CNI plug-in. It’s not a “service mesh” as we define it, but a powerful complement to a service mesh—even though the name is somewhat confusing.
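To show what an SMI standard looks like concretely, here is a sketch of its TrafficSplit resource, which shifts a weighted share of traffic between service versions regardless of which SMI-compatible mesh enforces it. The service and resource names are hypothetical:

```yaml
# Hypothetical SMI TrafficSplit: send 90% of traffic addressed to "web"
# to web-v1 and 10% to the canary web-v2.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: web-canary        # hypothetical name
spec:
  service: web            # the root service clients address
  backends:
  - service: web-v1
    weight: 90
  - service: web-v2
    weight: 10
```

Because the resource only describes intent, any mesh implementing the SMI split API can carry it out, which is the interoperability SMI aims for.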
Looking at the so-called “service mesh” projects mentioned above, we can see that most service mesh projects started as proxies first, and then the control plane was implemented later. Istio, Consul Connect, Open Service Mesh, and Kuma use Envoy as a sidecar proxy. Only Linkerd and Traefik Mesh have created their own proxies.
All service mesh projects support the sidecar pattern. Apart from Istio, Linkerd, and Consul Connect, which have been used in production, other service mesh projects don’t have significant production usage.
In this article, I’ve reviewed the development of the Istio open source project and described how it relates to Envoy proxy and some other components of the service mesh ecosystem. In the next article, we’ll show how to optimize Istio performance. In the final article, we’ll cover Istio’s open source ecosystem and its future.