data:image/s3,"s3://crabby-images/35a60/35a600496596c01febf3805686e46b6001e4d912" alt=""
Organizations often want to know how a service mesh can help provide better visibility into their deployments, so they can get a clearer understanding of their user experience.
Metrics and logs help with that, but neither can provide specifics on individual cases. That’s where tracing comes in.
A trace gives a developer the full context of a user experience, by attaching a correlation ID to each user request. That correlation ID is the thread that links the trace together through multiple services.
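To make the idea of a correlation ID concrete, here is a minimal sketch of assigning one to a request that arrives without it. The function name `ensure_trace_id` is hypothetical, and where the ID is actually minted depends on your setup (in a mesh it is typically done by the ingress proxy or a tracing library, not hand-written code). The sketch assumes the B3 convention of a lower-hex trace ID.

```python
import secrets

# B3 trace header name; HTTP header names are case-insensitive, so we compare in lower case.
TRACE_HEADER = "x-b3-traceid"

def ensure_trace_id(headers):
    """Return a lower-cased copy of `headers` that carries a trace ID, minting one if absent."""
    normalized = {name.lower(): value for name, value in headers.items()}
    if TRACE_HEADER not in normalized:
        # 16 random bytes -> 32 lower-hex characters (a 128-bit trace ID).
        normalized[TRACE_HEADER] = secrets.token_hex(16)
    return normalized

# A request with no trace header gets a fresh ID; an existing ID is left untouched.
fresh = ensure_trace_id({"Host": "example.com"})
existing = ensure_trace_id({"Host": "example.com", "X-B3-TraceId": "1234"})
```

The value `1234` mirrors the simplified trace ID used in the walkthrough; real B3 trace IDs are 16 or 32 hex characters.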
Since all requests go through Envoy, it may seem that Envoy could provide the tracing information, but it’s not quite that simple. To Envoy, each application looks just like the network – Envoy doesn’t have any special insight into the application’s internals. This means Envoy can’t track causality: if 10 requests enter a service and 100 leave it, which of the 100 relate to which of the 10? Because Envoy cannot answer that question, it cannot automatically forward trace context – or any other kind of context – on behalf of your application. The presentation below shows this graphically:
Request Context in a Mesh
1. Tracing involves following a path through multiple services to understand the full context of the user experience. A trace begins with a user request, which is assigned a correlation ID.
data:image/s3,"s3://crabby-images/11876/1187605a4c47b23ed344fffb5452574a57684034" alt="An incoming user request setting example headers, including the HTTP Host header, an example X-User-Identity header, and X-B3-TraceID header with trace 1234"
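2. Several headers are attached to the request as the trace gets started, including standard HTTP headers such as Host.
3. A trace header is also attached to the request. In this example, X-B3-TraceID carries the trace ID 1234.
4. A custom header is also attached. In this example, X-User-Identity carries a Base64 token.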
data:image/s3,"s3://crabby-images/abaff/abaff0ed18caa73c7bab617581818d553003a0c2" alt="The example incoming request sets custom headers. In this example, X-User-Identity: Base64 Token."
5. Envoy is sitting right beside the application, and the two of them talk to each other. Any request that comes in goes through Envoy.
data:image/s3,"s3://crabby-images/6c3b3/6c3b336c018ea69c7f95b03edbd4d5a7018699cb" alt="The incoming request is intercepted by Envoy before it reaches the application"
6. The trace will show everything that happens to the user request. Since the request is going through Envoy, that will be part of the trace. After going through Envoy, the request goes to the application.
data:image/s3,"s3://crabby-images/6bd59/6bd592e6bd505acf4e7964ed130e4bd85329c12b" alt="Envoy forwards the incoming request to the application."
7. Envoy can also attach headers of its own to the request before it reaches the app. In this example, it sets X-Forwarded-For and X-Forwarded-Client-Cert, which describe the caller to the app.
data:image/s3,"s3://crabby-images/cbea5/cbea5621aa1a535651f53828e2a0f7ab382b7ec4" alt="Envoy forwards the incoming request to the application. Envoy can set headers in addition to the incoming request’s original headers. In this example, we show Envoy setting the X-Forwarded-For and X-Forwarded-Client-Cert headers."
8. As the request moves through the app, the app will likely contact another system to process the request.
data:image/s3,"s3://crabby-images/b449e/b449ea27eec29d20df5c5cb90cfe0dd77c5512dc" alt="The application’s business logic makes an outgoing request in response to our example incoming requests. These outgoing calls are intercepted by Envoy too."
9. We can see inside the application that the request going out is on behalf of one that came in from the user with the trace ID 1234.
data:image/s3,"s3://crabby-images/14ee7/14ee77e1b1c4f0eb70bb0f9614ba1f825977d7ab" alt="The application calls to backends, in this example, on behalf of request 1234."
10. Every request needs an identity, and we can see here that this request is correlated to the user.
data:image/s3,"s3://crabby-images/a1192/a11928f471d6025cdba1d7698078f1433bd2b809" alt="For the outgoing call, it’s not clear what X-User-Identity and X-B3-TraceID should be set; the application knows the outgoing call is in response to request 1234, but Envoy doesn’t."
11. This is where things get more complicated. The application has to copy the user’s identity and the trace ID from the incoming request onto each outgoing call. That context can’t get from one hop to the next unless the application copies it.
data:image/s3,"s3://crabby-images/0af48/0af48a336cd093a722989c8fd72a03bdd23f67e3" alt="To get the correct metadata onto the outgoing call, the business logic needs to propagate the headers."
12. The app sends its request to the backend, and in turn gets a response back.
data:image/s3,"s3://crabby-images/fc61d/fc61d398cb44d84e0c92923b4bcbc3217a7b73d8" alt="The backend sends a response to our application’s request."
13. The system will probably have to make multiple requests back and forth to get the full answer for the user request and return it.
data:image/s3,"s3://crabby-images/abb6b/abb6bb3053ba4a6426a2a2c816a1ab23d392d2a4" alt="There can be many calls to different backends from one inbound request. In the business logic, we know they’re all related, but Envoy doesn’t."
14. If this one request were all that the app was receiving, the service mesh could propagate headers and do all the tracing. But there’s never just one request coming in at a time; there are always multiple requests happening concurrently. It’s that concurrency (multiple things happening at one time) that causes a loss of visibility.
data:image/s3,"s3://crabby-images/f79c8/f79c8808603308f33bd6ad0a1b850b69073bcac4" alt="The app completes its work and returns its response to our original user request 1234."
15. Because Envoy can only see the network, and not inside the app, all it sees are multiple requests and responses. There’s no way for Envoy to know which of those belong to which user request, because they all happen at the same time. Envoy therefore can’t attach the right trace context to each individual request.
data:image/s3,"s3://crabby-images/83181/8318167e46c3d3a97950367d5f7988472f50a898" alt="Envoy doesn’t see any of the business logic involved in this request. It has no knowledge of the app’s innards: instead, Envoy just sees a series of requests in and out of the application."
16. If we add multiple user requests at the same time, we can see the back-and-forth starts to grow rapidly.
data:image/s3,"s3://crabby-images/ed4e5/ed4e548c89e0761ef8d1394a0bff8dab4f7ec86c" alt="When we overlay multiple incoming requests, the back-and-forth calls happen concurrently, so Envoy cannot track causality (what call was generated by which incoming request)."
17. That’s why the application has to be involved, because it has to copy that data. That’s not necessarily easy to do, but there are tools built by the tracing community to make it easier. Tetrate recommends Zipkin.
data:image/s3,"s3://crabby-images/dad7a/dad7a648d224d143361882a15ded34c25c63ee15" alt="The application has responded to the original incoming request 1234."
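The walkthrough above can be reduced to a small simulation. The sketch below is purely illustrative: `proxy_log` is a hypothetical stand-in for what a sidecar can observe, and the "app" deliberately forwards no headers. Two inbound requests are handled concurrently, and the outbound calls they trigger show up at the proxy with nothing tying them back to either one.

```python
import asyncio

# Hypothetical stand-in for the sidecar's view: (direction, trace ID or None).
proxy_log = []

async def call_backend(headers):
    """An outbound call as the sidecar sees it: only the headers the app chose to send."""
    proxy_log.append(("out", headers.get("x-b3-traceid")))
    await asyncio.sleep(0)  # yield control so concurrent requests interleave

async def handle(trace_id, fan_out):
    """App logic that makes several backend calls but copies no headers onto them."""
    proxy_log.append(("in", trace_id))
    for _ in range(fan_out):
        await call_backend({})  # no trace header forwarded

async def main():
    # Two user requests in flight at once, with different fan-out.
    await asyncio.gather(handle("1234", 2), handle("5678", 3))

asyncio.run(main())

inbound = [tid for direction, tid in proxy_log if direction == "in"]
outbound = [tid for direction, tid in proxy_log if direction == "out"]
```

Here `inbound` holds both trace IDs, while every entry in `outbound` is `None`: the proxy sees two requests enter and five leave, with no way to say which belongs to which.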
This is why the business logic itself needs to forward headers from the incoming request onto outgoing requests. Libraries that do this exist for a variety of tracing systems and languages. TSB ships with Zipkin, so any of Zipkin’s listed libraries/agents would work out of the box.
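As a rough sketch of what that forwarding amounts to, the snippet below copies trace headers from an inbound request onto an outbound one. It assumes the B3 header names used by Zipkin-style tracing plus `x-request-id`, which Envoy uses; the helper `trace_headers_to_forward` is hypothetical and not part of any Zipkin library. A real tracing library would do more than this, for example starting a new child span for each outgoing call.

```python
# Header names assumed for B3 (Zipkin-style) propagation, plus Envoy's request ID.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
)

def trace_headers_to_forward(incoming):
    """Pick the trace headers off an inbound request so they can be set on outbound calls."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}

# The inbound request from the walkthrough, and an outbound call made on its behalf.
incoming = {"Host": "app.example.com", "X-B3-TraceId": "1234", "X-B3-Sampled": "1"}
outgoing = {"Host": "backend.example.com"}
outgoing.update(trace_headers_to_forward(incoming))
```

After the update, `outgoing` carries trace ID 1234, so the backend call can be stitched into the same trace as the user request.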
For custom, non-tracing headers that need to be forwarded, you need to implement the forwarding yourself in a library your developers use.
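One possible shape for such a library, sketched here under assumptions: the inbound handler records the headers to forward in a per-request context, and every outbound call reads them back. The names `begin_request`, `outbound_headers`, and the `FORWARDED` list are hypothetical. The sketch uses Python's `contextvars`, so each concurrently running request keeps its own values.

```python
import asyncio
import contextvars

# Hypothetical list of custom headers this shared library is configured to forward.
FORWARDED = ("x-user-identity",)

# Per-request storage; each asyncio task gets its own copy of the context.
_request_headers = contextvars.ContextVar("request_headers", default={})

def begin_request(incoming):
    """Call once per inbound request: remember the headers that must be forwarded."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    _request_headers.set({name: lowered[name] for name in FORWARDED if name in lowered})

def outbound_headers(extra=None):
    """Headers for an outgoing call: the caller's own plus the remembered ones."""
    headers = dict(extra or {})
    headers.update(_request_headers.get())
    return headers

async def handle(identity):
    begin_request({"X-User-Identity": identity})
    await asyncio.sleep(0)  # other requests run here; the context stays per-task
    return outbound_headers({"accept": "application/json"})

async def main():
    # Two user requests handled concurrently.
    return await asyncio.gather(handle("alice-token"), handle("bob-token"))

results = asyncio.run(main())
```

Each entry in `results` carries the identity of the request that produced it, even though both handlers were in flight at the same time. This is the causality information the application has and Envoy does not.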
data:image/s3,"s3://crabby-images/e4406/e4406530399bdb85136f9b135c928791e0bb22be" alt=""
data:image/s3,"s3://crabby-images/b5fad/b5fad1ec114a4338d2563ecfeb8f110a1a54782e" alt=""
Zack Butcher is a Tetrate engineer and an Istio contributor; Eileen AJ Connelly is a content writer for Tetrate. Tetrate writer Tevah Platt contributed.
Tetrate is a service mesh company that makes it easier for companies to adopt and use Istio and Envoy.