If you’ve been watching the service mesh space recently, you’ll have noticed a lot of talk about eBPF and “sidecar-less” meshes. In fact, there’s been so much talk about these things that I’m hoping this post attracts a lot of readers just because both buzzwords are in the title!
But what actually are “sidecar-less” service meshes? How do they work? And do they solve the problems we’ve been told they do, namely improving performance and reducing resource usage? In this post I’ll explain what these two technologies are, what they can and can’t do for the mesh, and how they do — and do not — work together.
What is eBPF?
eBPF — which started life as Extended Berkeley Packet Filter, but now “is no longer an acronym for anything” — is a mechanism for injecting and running your own code in the kernel of a Linux system. It originally came, as the name suggests, from Berkeley Software Distribution, or BSD, Unix. This family of Unix distributions was famously good at networking, and systems like pfSense are still based on them today. The Berkeley Packet Filter was originally designed to support that networking use-case: to allow code to be loaded into the kernel which could do arbitrary things to packets being processed. BPF was ported to Linux and later extended there, becoming eBPF. In its modern form, it supports injecting code almost anywhere in the kernel – to monitor performance, detect intrusions, or hook any system call. Think of it as “AWS Lambda for the kernel.”
eBPF retains BPF’s strong support for networking, and there’s a whole framework in the kernel, called eXpress Data Path (XDP), through which eBPF programs can hook into network processing at the earliest possible point. An XDP program sees each packet as it arrives from the driver and can, say, make it skip the rest of the TCP/IP stack if an early decision has been made about what to do with it. When most people say “eBPF”, they mean networking and XDP.
Injecting code into the kernel sounds a bit dangerous, no? Well it would be, but eBPF programs aren’t machine code; they’re bytecode for a VM (interpreter) in the kernel. This VM sandboxes them and enforces access controls. It also does static analysis of eBPF programs before they’re loaded to check they won’t do anything harmful. That might sound like the halting problem, but it’s made possible by eBPF programs being limited – not Turing-complete. The downside to this is a limit to their expressive power; there are, as the term implies, algorithms they simply can’t encode.
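To make the verifier’s job concrete, here is a sketch of the packet-access pattern every XDP-style eBPF program must follow. The kernel types are stubbed out so the logic compiles and runs as ordinary user-space C; in a real program, `struct xdp_md` and the `XDP_*` verdicts come from `<linux/bpf.h>`.

```c
#include <stdint.h>

/* Stub of the kernel's xdp_md context. In real eBPF these fields are
 * offsets the program converts to pointers; plain pointers are used
 * here so the same logic runs in user space. */
struct xdp_md {
    const uint8_t *data;      /* start of packet */
    const uint8_t *data_end;  /* one past the last byte */
};

/* Verdict values, matching <linux/bpf.h>. */
enum { XDP_DROP = 1, XDP_PASS = 2 };

/* Drop IPv4 frames, pass everything else. The point is the bounds
 * check: the verifier statically rejects any program that could read
 * past data_end, so every access must be guarded like this. */
int filter_ipv4(const struct xdp_md *ctx)
{
    const uint8_t *data = ctx->data;
    const uint8_t *data_end = ctx->data_end;

    /* Guard BEFORE touching the 14-byte Ethernet header. */
    if (data + 14 > data_end)
        return XDP_PASS; /* runt frame: leave it to the stack */

    /* EtherType lives in bytes 12-13, big-endian: 0x0800 is IPv4. */
    if (data[12] == 0x08 && data[13] == 0x00)
        return XDP_DROP;

    return XDP_PASS;
}
```

On a real kernel this function would be annotated with `SEC("xdp")` and loaded via libbpf or `bpftool`; the unbounded loops and unguarded memory accesses that ordinary C allows would all be rejected at load time.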
The Node-Proxy Topology
Service meshes use a layer 7 proxy (Envoy, in Istio’s case) to provide advanced networking functionality for every service-to-service request. To do this, an instance of that proxy must be on the request path between every pair of service workloads. The common way to do this is with a sidecar — an instance of the proxy running alongside every workload, thus handling all traffic in and out of the service. There are other possible topologies, however, such as the “node-proxy” layout, where there’s one proxy per node (worker VM or the like). All traffic from every workload (i.e. pod) on the node goes in and out through that one proxy instance. This model is also called “sidecar-less,” as each service no longer has a dedicated sidecar. Note that in all these cases, the proxy is a normal process, running in user space (this is important later).
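As a concrete picture of the sidecar model, here is a simplified pod spec as it looks after sidecar injection (the workload name and image tags are hypothetical): the application container and an `istio-proxy` container share the pod’s network namespace, so all traffic can be routed through the proxy.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: reviews-v1               # hypothetical workload name
spec:
  containers:
  - name: reviews                # the application itself
    image: example/reviews:1.0   # placeholder image
  - name: istio-proxy            # the injected Envoy sidecar
    image: istio/proxyv2:1.17.0  # illustrative version tag
```

In the node-proxy topology, the second container disappears from every pod, and a single per-node proxy (deployed as, say, a DaemonSet) serves all of them.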
A Leaner Mesh?
We’re talking about eBPF and the node-proxy topology because they’re both claimed to improve the “classic” service mesh design. eBPF has the ability to inject specialist, complex code into the kernel, and to attach it to hooks on the network path. This means much of the work done by the sidecar proxy can potentially be re-implemented in (“moved to”) the kernel as eBPF programs. This is claimed to have performance benefits, but I’ll address that later.
The node-proxy topology means only running one instance of the mesh’s proxy on a node, where previously there were n. This is claimed to reduce resource usage, but again I’ll talk about that later.
A False Dichotomy
The first thing I want to stress is that eBPF and the node-proxy topology are not the same, nor do they have to go hand-in-hand. There’s content out there conflating them, and saying you need both to have either, but that isn’t true. The picture being painted is of a mesh where most of the sidecar proxy functions are moved to the kernel, and what’s “left” in user space can be served by just one proxy, as if “less” proxy is an unalloyed good.
To show why this isn’t the case, let’s look at eBPF and node proxy each in isolation.
eBPF and Performance
eBPF can indeed be used to take some of the work off the user-space sidecar. But the eBPF path won’t be inherently faster than Envoy’s processing; it’s just code, which uses CPU cycles whether it runs in user space or kernel space. In fact, Envoy is heavily optimized C++, whereas eBPF programs are written in a deliberately restricted language and run as bytecode (interpreted, or JIT-compiled on modern kernels), so there’s no reason to expect the eBPF version to win on raw processing speed.
People also state that encryption is inherently faster in kernel space, using terms like “mTLS offload”. It’s claimed that if you use the kernel’s crypto API then you get optimized use of the processor’s vector and cipher instructions. But there’s no reason this can’t be done in user space while using Envoy, with sufficient effort. Envoy’s crypto library, BoringSSL, does just that.
What is slower about the sidecar model is the buffer-copying and context-switching into user space, and the traversal of the whole TCP/IP and socket stack to get there. These costs can be eliminated if we can process an entire packet completely in eBPF, and some of the time we can. For example, simple access-control policy can be implemented in eBPF such that, if this code rejects a packet, there’s no need for it to traverse the TCP/IP stack, be copied into user space, or cause a context switch at all. Practically speaking, however, we can’t tackle all cases — implement all mesh features — in the kernel, because eBPF programs are limited in their complexity in order to enable the static analysis mentioned earlier. (Consider just how much work Envoy does — HTTP is a complicated protocol, and then there’s HTTP/2, Kafka, MySQL, and so forth. Re-implementing all of these in a limited language isn’t practical.) Another downside of this approach is that code which used to live in a normal, well-understood user-space process now runs in the kernel, where it’s much harder to observe and debug.
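As a sketch of the kind of simple access-control policy that can live entirely in the kernel, here is the core of an XDP-style filter that drops TCP packets addressed to a denied port. The kernel types are stubbed so the logic compiles and runs standalone, and the denied port number is purely illustrative.

```c
#include <stdint.h>

/* User-space stand-ins for the kernel's xdp_md context and verdicts. */
struct xdp_md { const uint8_t *data; const uint8_t *data_end; };
enum { XDP_DROP = 1, XDP_PASS = 2 };

#define DENIED_PORT 9042  /* hypothetical policy: block this port */

/* Parse Ethernet -> IPv4 -> TCP, bounds-checking at every step as
 * the verifier requires, and drop traffic to the denied port. */
int deny_port(const struct xdp_md *ctx)
{
    const uint8_t *p = ctx->data, *end = ctx->data_end;

    if (p + 14 > end) return XDP_PASS;      /* Ethernet header */
    if (!(p[12] == 0x08 && p[13] == 0x00))  /* EtherType != IPv4 */
        return XDP_PASS;

    const uint8_t *ip = p + 14;
    if (ip + 20 > end) return XDP_PASS;     /* minimum IPv4 header */
    if ((ip[0] >> 4) != 4 || ip[9] != 6)    /* version 4, proto TCP */
        return XDP_PASS;

    int ihl = (ip[0] & 0x0f) * 4;           /* IP header length */
    const uint8_t *tcp = ip + ihl;
    if (tcp + 4 > end) return XDP_PASS;     /* need src+dst ports */

    /* Destination port is bytes 2-3 of the TCP header, big-endian. */
    uint16_t dport = (uint16_t)((tcp[2] << 8) | tcp[3]);
    return dport == DENIED_PORT ? XDP_DROP : XDP_PASS;
}
```

When a packet is rejected here, it never traverses the TCP/IP stack, is never copied, and user space is never woken.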
Where eBPF can more usefully come into play is with the interception and redirection of traffic headed in and out of the pod — getting it to the sidecar and out to the next destination. This was traditionally done with iptables rules, but eBPF can use XDP to bypass the kernel’s TCP/IP stack. This advantage is real, providing up to 20% better throughput and latency, but it’s separate from actually doing the work of the sidecar, and has been available in Istio for years.
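For reference, the traditional interception looks something like the following: a simplified sketch of the NAT rules installed in a sidecar-injected pod’s network namespace. The real rule set Istio installs is longer and uses intermediate chains, but the chain-name and port conventions (15001 outbound, 15006 inbound, proxy UID 1337) are Istio’s.

```shell
# Outbound: redirect all TCP leaving the pod to Envoy's outbound port.
iptables -t nat -N ISTIO_REDIRECT
iptables -t nat -A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
iptables -t nat -A OUTPUT -p tcp -j ISTIO_REDIRECT
# But don't loop Envoy's own traffic back into itself.
iptables -t nat -I OUTPUT 1 -m owner --uid-owner 1337 -j RETURN

# Inbound: redirect arriving TCP to Envoy's inbound port.
iptables -t nat -N ISTIO_IN_REDIRECT
iptables -t nat -A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
iptables -t nat -A PREROUTING -p tcp -j ISTIO_IN_REDIRECT
```

An eBPF-based redirect replaces this per-packet netfilter rule traversal with a small program attached directly to the relevant kernel hook, which is where the throughput gain comes from.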
Eliminating the Proxy?
So if we can move some of the sidecar’s functionality into eBPF, does that mean we can — or should — get rid of sidecars completely? Well, as I said, they’re separate concerns. You’ll always need a proxy to handle some of the requests, so you can’t do away with proxies altogether. This gets confusing because there’s literature out there talking about “sidecarless” meshes, and while that term is technically accurate, it’s heavily implied that there are no proxies, which isn’t, and can’t be, the case. To avoid this confusion, we at Tetrate use the term “node proxy” or “host-based proxy” when talking about this topology. There is a decision to be made around how many proxies you want — one per service or one per node — but that decision is separate from whether or not to use eBPF.
Node Proxy, Resources, and Isolation
So, should you use node proxies? Much has been said about the use of node proxies being a way to reduce resource usage compared to individual sidecars. This resource usage occurs along two main axes: CPU cycles and bytes of RAM.
CPU usage does not favor the node-proxy approach: the same proxy code runs, over the same traffic, on the same machine; it’s just concentrated in one process rather than spread across many. And, as discussed above, re-implementing that work in eBPF doesn’t reduce the cycle count either.
RAM usage will also be similar. To understand why, let’s go over what kind of things processes keep in RAM:
- Program code. The compiled program code itself (i.e., the Envoy binary on disk) must be loaded into RAM to run. However, this binary is the same for all running Envoy instances, and it’s only ever read, not written, so it won’t change over time. Thus, the OS optimizes this by only loading the binary into RAM once, after which all running Envoy processes share it.
- Working data. The other main thing stored in RAM is the program’s working data – the .data and .bss segments, stack, and heap, if you want to get technical. In Envoy’s case, this working data is mostly the config sent by the control plane, Istio. Envoy’s config objects can be very large; at Tetrate, we have seen JSON-formatted config dumps hit 200K lines. However, while a node proxy will need all the config applicable to all the services on its node, sidecars only need the config relevant to their one service. A naïve setup might give every sidecar the config for all the services, and yes, that would waste a lot of space. But Istio is pretty good at only sending sidecars the parts of the config they need. This can be further helped by the user applying Sidecar resources, which Tetrate’s TSB management plane will do automatically for you. This puts the sum total of the sidecars’ config space in a similar range to the combined config applied to a node proxy.
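A Sidecar resource like the following (the namespace name is hypothetical) tells Istio that the sidecars in a namespace only need config for their own namespace plus the mesh infrastructure, which is what keeps per-sidecar config, and therefore RAM, small:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: bookinfo        # hypothetical application namespace
spec:
  egress:
  - hosts:
    - "./*"                  # only services in this namespace...
    - "istio-system/*"       # ...plus the mesh's own components
```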
(Note that modern memory management is very complicated, and it’s important to read the metrics correctly. While virtual memory allocations may well be higher when using sidecars, the number of actual physical pages used will be similar; this figure will be some combination of RSS and shared memory.)
So, by using a node proxy, you’re just adding up all of the sidecars’ CPU usage, and much the same is true of their actual physical RAM usage: you arrive at roughly equivalent resource consumption, just concentrated in one process. But you’re trading away a lot of isolation. Sidecar proxies run in separate pods, which means they’re not just separate OS processes; they also run in separate namespaces (providing software isolation) and cgroups (providing resource isolation). A node proxy’s lack of this isolation creates problems such as:
- Noisy neighbors. Lots of traffic flowing down one route can starve the resources needed to process the other routes. In the worst case, this can be a deliberate overload caused by someone performing a denial of service attack. Significant engineering effort has been spent over the past three decades to implement noisy-neighbor mitigations in hypervisors and kernels (including container technology), but since Envoy was designed as an app sidecar, not as a central traffic director, it doesn’t even attempt any of these mitigations internally.
- Security. The “root” of the Envoy config structure is the Listener, representing a bound socket. There have been claims that Envoy is “multi-tenant” because you can build a forest of these separate config trees, one per service being proxied. However, there’s no permissions model in Envoy; anyone who can apply any config can alter all of it, so engineering a name clash and hopping into the config tree for another service isn’t hard. Additionally, everything happens in one process, with no memory protection between its different parts. Even if Envoy were to try to enforce that, it wouldn’t be as good as process- or namespace-based isolation.
- Service identity. Another security concern is that, because there’s now one proxy deputizing for all the services, the mesh can only verify the identities of hosts (nodes), not of individual services (pods). This undermines a lot of the mesh’s guarantees, including much of the value of mTLS.
- Availability. Envoy sometimes crashes and can be catastrophically mis-configured. Errors in the config for any service can potentially cause disruption to all, and worse, when this (sporadically) occurs, it’s not obvious to the owners of those other services what happened — they didn’t push any config changes recently, yet the proxy for their service broke. This is the kind of thing that causes days lost to incident response, and puts app devs off wanting to use the platform.
- Performance. Combining all the config into a single Envoy instance means that Envoy must be reconfigured any time there’s a change related to any service it’s proxying for. Over the past five years building Istio in close partnership with the Envoy community, we’ve learned that minimizing configuration churn in Envoy has significant runtime benefits for applications. Istio goes to great lengths today to minimize the configuration that needs to be sent to any individual Envoy and compute minimal sets of affected Envoys for each change in user or cluster configuration.
And these are just the most serious issues. There are more besides: how do you do cost attribution when the proxy is shared? How do you perform canary upgrades of the proxy in a way that service owners can understand?
eBPF and the use of a node proxy are separate technologies, both of which affect a service mesh in different ways. Each can be used independently. It’s not true that “using a mesh with sidecars means you can’t have the advantages of eBPF.”
Our view at Tetrate is that node-proxy topology isn’t a best practice for the general case, and eBPF neither addresses nor solves its shortcomings. Node proxy is actually going to land in Istio soon, but it’ll always be opt-in. Unless you have an exceptional use-case, like extreme resource constraints, we don’t recommend taking on all the isolation trade-offs I listed above. Instead, we’re confident that careful configuration of the mesh will get you all the advantages of a node-proxy approach with few, if any, of the downsides. Manually writing all such config is tedious, but Tetrate’s TSB management plane can do it for you.
We Tetrate folks are excited about eBPF (and even more so by the somewhat-similar WebAssembly). Used properly, it can provide real mesh performance improvements, but those improvements apply to node-proxy and sidecar topologies alike. The Istio community has spent years ensuring Istio has the fastest, most efficient data plane, one which already uses eBPF in some places. Any work done by the wider community on mesh or Envoy speedups using eBPF is likely to be quickly adopted by Istio, too.