One of the most repeated pieces of advice for anyone getting started with microservices is to make sure you can see everything that’s going on inside your services. Leverage the power of observability. However, observability is a loaded term – so it’s valuable to understand what that terms mean, and what’s involved.

This blog is good for beginners, intermediates or anyone who’s had a bit of a brain fart and needs a refresher course. It will tell you what an event is, the difference between observability tools and what you can do with them in an environment that’s supported by an Istio and Envoy service mesh. 

Metrics, logs, and tracing

In its most basic sense, observability is best explained as the ability to understand your system using external outputs of metrics, logs and tracing that have been generated based on events from within your system. 

Key Terms: Metrics: Metrics are a general aggregation of data that gives you a picture of what’s happening and where to dig deeper. Services are continually generating metrics for consumption; they’re a constant measure of health. Logs: Logs are verbose. They contain some information on an ‘event’ from start to finish. A log can gather (anonymized) user data, for example, which user made a request, where the request started, where it entered the service and much more. Tracing: Tracing gives you the ability to see a request from start to finish as it happens. It’s a live capture of event behavior. It can help to pinpoint where a failure occurred, or what caused a performance issue based on a real-time example.

In a microservices-based environment, you’re generating a lot of events, and an event is defined as anything that happens from the moment that a request reaches the outer perimeter of your network, the action that generates observable data.

Observable data showing interconnected microservices

Metrics, logs and tracing are all ways to understand those events and by extension, the inner workings of your services. 

Observability in Istio service mesh

If you’re using microservices to any great extent, or you have a hybrid environment, you are more than likely considering or using a service mesh.  

Istio and Envoy produce a significant amount of data for service behavior, in metrics, logs and tracing – which is great! You’ll need to make some judgement calls to decide which data are critical or important and which really aren’t, so that you don’t drown in the data you collect, or pay to store data you don’t need. 


Generally speaking, there are two types of metrics. Application/Business metrics and Operational metrics.

Key Terms: Application metrics: Application Performance Metrics (APM) data relates to application performance such as load time and response time, and ensuring that applications are delivering the expected performance to customers. An open source technology like Apache SkyWalking can be integrated into an Istio service mesh to act both as an APM and as additional Service Performance Management (SPM) system - double the benefits! Operational metrics: Operational metrics are concerned with how the services are running. How is your environment performing, and are often described as ‘RED’ metrics - measuring the rate of Requests, rate of Errors, and Duration.

The sole concern of a service mesh, such as Istio, is gathering those operational metrics to help you ascertain how our services are performing, and getting a general understanding of service health.  

Proxy-level metrics

Envoy generates its own statistics as well as access logs and tracing, more information on which can be found here. Envoy observability will tell you about its configuration and health at a very granular level, giving you a picture of the relationship between service instances and Envoy resources. 

Many Envoy observability insights are gathered automatically, so Istio takes the immense capabilities of Envoy observability, and adds in a further control to give you the ability to select which metrics are generated and gathered at each proxy. Also, if you need to expand the reach, you can scale up and back as needed!

Service-level metrics

Covering the four basic service observability needs for service-to-service communication, service-level metrics give you insight into your latency, traffic, errors and saturation (saturation is the total number of requests per second that your service can handle, and how close you are to meeting that threshold). 

Control plane metrics

These are the metrics that Istio gathers on itself! (Yes, the description really is this short.)


Access logs give users the ability to see and understand behavior from a single instance’s point of view. Istio’s logging capabilities are enabled by Envoy and they’re customizable, which means that there is complete control over what is gathered, and how. 


Istio’s tracing capabilities are, again, enabled by the Envoy proxies automatically, and give users the ability to see a request flow through the mesh in real-time, to understand any sources of latency that may be causing downstream dependency failures or other unintended outcomes. The traces are sent to a distributed tracing backend, such as Zipkin, that can act as a collection and lookup to provide an all-round easier experience for reviewing distributed traces. Istio’s role in this is again to extend greater control over the tracing capabilities by giving users the ability to dictate the volume of tracing carried out, based on what they deem necessary. 

The value of observability

The further you move away from a monolith the more costly and confusing your observability can become if it’s not set up properly. Good observability means:

  • You spend less on storage (genuinely, it will impact your financial outgoings).
  • Engineering can actually see what is going on in your systems! Ambiguity helps no one in this instance. 
  • If you are going through a migration process, you’ll find a lot of issues you may not have known existed, giving you ample opportunity to implement solid processes to prevent future problems.
  • You’ll find the badly written code. It won’t rewrite it for you, but it will highlight where changes need to be made. 
  • It will embed in your engineering teams a culture of care. 
  • You’ll be able to see very easily the features that are getting the most use, and focus attention on improvements to where customers are spending the most time, before they even realise it. 

The list could go on, but the point is that Observability tools do far more than help debug, they are your eyes and ears into a distributed system that needs more oversight to be successful.

Additional resources:


Tia Louden - Marketing
Tia Louden is a content writer for Tetrate. Tetrate is solving observability problems for enterprises with products that are powered by the open source projects Istio, Envoy, Zipkin, and Apache SkyWalking.