In microservices architectures, observability has moved from a nice-to-have to a necessity. You need to understand what is happening inside your service mesh to maintain system health, diagnose issues, and optimize performance. Envoy Gateway, an edge proxy, recently released version 0.5.0, which introduces a set of enhanced proxy observability features. This article walks through those features and shows how they can give you insight into your microservices ecosystem.
Prerequisites
Before exploring proxy observability with Envoy Gateway, let’s make sure the necessary setup is in place. The Envoy Gateway observability architecture is shown in the figure below.
Ensure the following prerequisites are met:
Install Envoy Gateway
Follow the steps in the Quickstart Guide to install Envoy Gateway and the example manifest.
helm install eg oci://docker.io/envoyproxy/gateway-helm --version v0.5.0 -n envoy-gateway-system --create-namespace
kubectl apply -f https://github.com/envoyproxy/gateway/releases/download/v0.5.0/quickstart.yaml -n default
Install FluentBit
Install FluentBit to collect logs from EnvoyProxy instances and forward them to Loki.
helm repo add fluent https://fluent.github.io/helm-charts
helm repo update
helm upgrade --install fluent-bit fluent/fluent-bit -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/fluent-bit/helm-values.yaml -n monitoring --create-namespace --version 0.30.4
Install Loki
Set up Loki to store the logs collected by FluentBit.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/loki/loki.yaml -n monitoring
Install Tempo
Install Tempo to store traces.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install tempo grafana/tempo -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/tempo/helm-values.yaml -n monitoring --create-namespace --version 1.3.1
Install the OpenTelemetry Collector
Install the OpenTelemetry Collector, a vendor-agnostic tool for receiving, processing, and exporting telemetry data.
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade --install otel-collector open-telemetry/opentelemetry-collector -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/otel-collector/helm-values.yaml -n monitoring --create-namespace --version 0.60.0
Send Test Traffic
Test the observability features by sending traffic to the External IP. To get the external IP of the Envoy service, run the following:
export ENVOY_SERVICE=$(kubectl get svc -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
export GATEWAY_HOST=$(kubectl get svc/${ENVOY_SERVICE} -n envoy-gateway-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl --verbose --header "Host: www.example.com" http://$GATEWAY_HOST/get
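A single request produces very little telemetry, so it can help to send a small batch and tally the status codes. The helper below is a minimal sketch rather than part of the Envoy Gateway tooling: send_test_traffic is a hypothetical function name, and it assumes GATEWAY_HOST has been exported as shown above and that the quickstart route for www.example.com is in place.

```shell
# Hypothetical helper: send COUNT requests to the gateway and print how many
# responses came back with each HTTP status code.
send_test_traffic() {
  host="$1"
  count="${2:-10}"   # default to 10 requests when no count is given
  i=0
  while [ "$i" -lt "$count" ]; do
    # Discard the body and print only the status code, one per line.
    curl --silent --output /dev/null --write-out '%{http_code}\n' \
      --header "Host: www.example.com" "http://${host}/get"
    i=$((i + 1))
  done | sort | uniq -c
}
```

For example, send_test_traffic "$GATEWAY_HOST" 20 sends twenty requests and prints one line per distinct status code with its count.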
Expose Endpoints
Ensure you expose the necessary endpoints for querying metrics, logs, and traces.
LOKI_IP=$(kubectl get svc loki -n monitoring -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
With these prerequisites in place, let’s explore the proxy observability scenarios.
Metrics Observability
Envoy handles metrics data by default through its built-in statistics and metrics collection feature. Here’s how it works:
- Statistics and metrics collection: Envoy collects various performance and operational metrics by default. These metrics include information about network traffic, request and response rates, error rates, and more. Envoy uses these metrics to monitor its behavior and performance.
- Exposing metrics: Envoy provides an HTTP admin interface that exposes collected metrics in a format that can be scraped by monitoring systems like Prometheus. By default, metrics are available at the endpoint /stats/prometheus on the Envoy admin interface.
- Metric types: Envoy collects different types of metrics, including counters (e.g., request count), gauges (e.g., current connections), and histograms (e.g., request duration distribution). These metrics provide detailed insights into the behavior of the proxy.
- Custom metrics: Envoy also allows you to define custom metrics and statistics in the configuration file to capture specific application-level metrics or monitor aspects of Envoy’s behavior not covered by default metrics.
- Integration with monitoring systems: You can configure Envoy to integrate with external monitoring and observability systems like Prometheus, Grafana, and StatsD to collect, store, and visualize metrics data.
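To make the exposition format concrete, the sketch below sums every sample of one metric from Prometheus text output, across all of its label sets. sum_metric is a hypothetical helper, and it assumes each sample line ends with its value (no trailing timestamp), so treat it as an illustration rather than a general-purpose parser.

```shell
# Hypothetical helper: read Prometheus text format on stdin and print the sum
# of all samples whose metric name equals the first argument.
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/\{.*/, "", metric)            # drop the label set, keep the bare name
      if (metric == name) total += $NF   # assumes the value is the last field
    }
    END { print total + 0 }
  '
}
```

Once the Envoy admin port is reachable locally, output from the /stats/prometheus endpoint can be piped into it, for instance curl -s localhost:19001/stats/prometheus | sum_metric envoy_cluster_upstream_rq, where the metric name is only an example.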
How Envoy Gateway Handles Metrics
Envoy Gateway offers comprehensive metrics observability. Here’s how you can harness it:
- Enable Prometheus metrics endpoints: Set telemetry.metrics.prometheus in the EnvoyProxy Custom Resource Definition (CRD) to enable metrics collection. Expose Prometheus metrics endpoints to start collecting data.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/metric/prometheus.yaml
- Verify metrics: Query the Envoy admin endpoint to confirm that the metrics are exposed.
export ENVOY_POD_NAME=$(kubectl get pod -n envoy-gateway-system --selector=gateway.envoyproxy.io/owning-gateway-namespace=default,gateway.envoyproxy.io/owning-gateway-name=eg -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$ENVOY_POD_NAME -n envoy-gateway-system 19001:19001
# Check metrics
curl localhost:19001/stats/prometheus | grep "default/backend/rule/0/match/0-www"
You will receive no response because Prometheus has sunk the metrics to OpenTelemetry.
- Configure OpenTelemetry Sink: Envoy Gateway can also send metrics to an OpenTelemetry Sink. Apply the following manifest to configure it.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/metric/otel-sink.yaml
- Verify OTel-Collector metrics: Query the OpenTelemetry Collector to confirm that it has received the metrics.
export OTEL_POD_NAME=$(kubectl get pod -n monitoring --selector=app.kubernetes.io/name=opentelemetry-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward pod/$OTEL_POD_NAME -n monitoring 19001:19001
# Check metrics
curl localhost:19001/metrics | grep "default/backend/rule/0/match/0-www"
Logs Observability
Understanding what’s happening in your microservices also requires effective log management. Envoy Gateway handles logs by default through its access logging feature and sends logs to stdout in default text format. Here’s how it works:
- Access logging: Envoy can generate access logs for incoming and outgoing requests. These logs capture information about each request, including details such as the request and response times, HTTP status codes, request headers, and more.
- Log formats: Envoy allows you to define custom log formats, which specify what information should be included in the logs and in what format. You can configure the log format in the Envoy configuration file.
- Log output: Envoy supports various log output targets, including writing logs to files, sending them to stdout, forwarding them to a syslog server, or even sending them to an HTTP server for remote log storage.
- Filtering and sampling: Envoy provides options for filtering and sampling log data, so you can control which requests are logged and which are not. This can help reduce the volume of log data generated.
- Security: Envoy’s access logs can also be configured to include security-related information, such as request and response headers, to aid in security monitoring and auditing.
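As an illustration of what the default text format contains, the sketch below counts log lines per response code and response flag. tally_response_codes is a hypothetical helper, and it assumes Envoy's default access log layout, in which the quoted request line is followed by the response code and the response flags (as in the sample line that appears in the Loki output later in this article). A custom log format would need different parsing.

```shell
# Hypothetical helper: read default-format Envoy access log lines on stdin and
# print "CODE FLAGS COUNT" for each distinct pair of response code and flags.
tally_response_codes() {
  awk -F'"' '
    {
      # With a double quote as the separator, field 3 is the text right after
      # the quoted request line: CODE FLAGS BYTES_RECEIVED BYTES_SENT DURATION
      n = split($3, f, " ")
      if (n >= 2) count[f[1] " " f[2]]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort
}
```

Piping the Envoy pod's stdout logs into tally_response_codes gives a quick summary such as 400 DPE 1 without needing a log backend.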
How Envoy Gateway Handles Logs
Envoy Gateway provides flexibility in this area:
- Verification with Loki: Run the following command to retrieve logs from Loki, the log storage solution:
curl -s "http://$LOKI_IP:3100/loki/api/v1/query_range" --data-urlencode "query={job=\"fluentbit\"}" | jq '.data.result[0].values'
- Disabling logs: Disable access logs by setting telemetry.accesslog.disabled in the EnvoyProxy CRD. See the proxyAccessLog API type for more details.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/accesslog/disable-accesslog.yaml
- Sending logs to OpenTelemetry Sink: Apply the following manifest to send logs to an OpenTelemetry Sink for centralized log management.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/accesslog/otel-accesslog.yaml
- Verifying OpenTelemetry Sink logs: After enabling the OpenTelemetry Sink, query Loki to confirm that the logs are stored there.
curl -s "http://$LOKI_IP:3100/loki/api/v1/query_range" --data-urlencode "query={exporter=\"OTLP\"}" | jq '.data.result[0].values'
The output will look similar to this:
[
  [
    "1693314563284333000",
    "{\"body\":\"[2023-08-29T13:09:23.284Z] \\\"- - HTTP/1.1\\\" 400 DPE 0 11 0 \\\"-\\\" \\\"-\\\" \\\"-\\\" \\\"-\\\" \\\"-\\\"\\n\",\"resources\":{\"cluster_name\":\"default/eg\",\"k8s.cluster.name\":\"cluster-1\",\"log_name\":\"otel_envoy_accesslog\",\"node_name\":\"envoy-default-eg-64656661-6fccffddc5-662np\",\"zone_name\":\"\"}}"
  ]
]
Traces Observability
Traces provide a chronological view of requests and are essential for debugging and performance optimization.
Envoy can generate trace data and send it to backend tracing systems that support OpenTracing and Zipkin formats. This enables Envoy to integrate with distributed tracing systems to track request flow.
Here are the basic steps of how Envoy handles trace data:
- Generating trace data: Envoy generates trace data that records the journey of a request through the proxy. This includes start and end times, trace identifiers (Trace ID and Span ID), and service and operation names, among other information. This data is used to construct timelines and request paths.
- Reporting trace data: Envoy can report trace data to backend tracing systems. Envoy supports the OpenTracing standard, so you can configure it to send trace data to tracing systems that support OpenTracing. Envoy also supports the Zipkin format, allowing you to send trace data to a Zipkin tracing system.
- Configuring trace data destinations: In the Envoy configuration file, you can define where trace data should be sent. This typically includes the tracing system’s address, port, and trace data format (Envoy Gateway currently only supports OpenTelemetry).
- Enabling tracing: To enable tracing, you can add the appropriate configuration options in the Envoy configuration file to ensure Envoy starts generating and sending trace data.
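Trace identifiers have a fixed shape, which is useful to check before querying a tracing backend. The sketch below assumes the OpenTelemetry convention of a 16-byte trace ID written as 32 lowercase hexadecimal characters with at least one non-zero digit. is_trace_id is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the argument looks like an
# OpenTelemetry trace ID (32 lowercase hex characters, not all zeros).
is_trace_id() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;                        # contains a non-hex or uppercase character
    00000000000000000000000000000000) return 1 ;;   # the all-zero ID is invalid
  esac
  [ "${#1}" -eq 32 ]                                # must be exactly 32 characters long
}
```

A script can then guard a lookup with is_trace_id "$id" || echo "not a trace ID" so that a truncated or mistyped ID fails early instead of producing a confusing empty result.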
How to Enable Tracing in Envoy Gateway
- Enabling Traces: Set telemetry.tracing in the EnvoyProxy CRD to enable tracing.
kubectl apply -f https://raw.githubusercontent.com/envoyproxy/gateway/latest/examples/kubernetes/tracing/default.yaml
- Sample rate considerations: Because the sample rate can affect performance, you can set the samplingRate field to adjust the sampling rate to suit your performance profile.
- Verify traces with Tempo: To verify traces in Tempo, the trace storage solution, first expose the Tempo service:
kubectl port-forward tempo-0 3100 -n monitoring
- List the tracing data:
curl -s "http://127.0.0.1:3100/api/search" --data-urlencode "q={ component=envoy }" | jq .traces
You will see output similar to this:
{
"traceID": "8010d4fd89e024c0626d984621babd71",
"rootServiceName": "eg.default",
"rootTraceName": "ingress",
"startTimeUnixNano": "1693377719403756000"
}
- Fetch specific traces: Retrieve a trace by its trace ID for detailed analysis:
curl -s "http://127.0.0.1:3100/api/traces/<trace_id>" | jq
These traces provide detailed information about a request’s journey through Envoy Gateway, including start and end times, trace and span IDs, service and operation names, attributes, and more.
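The startTimeUnixNano field in the search output is a count of nanoseconds since the Unix epoch, which is hard to read at a glance. The sketch below converts such a value to a UTC timestamp. ns_to_utc is a hypothetical helper, and it assumes a shell with 64-bit arithmetic and either GNU date or BSD date.

```shell
# Hypothetical helper: convert nanoseconds since the Unix epoch to a UTC
# timestamp with one-second resolution.
ns_to_utc() {
  secs=$(( $1 / 1000000000 ))   # drop the sub-second part
  # GNU date accepts -d @SECONDS; BSD and macOS date accept -r SECONDS.
  date -u -d "@${secs}" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -r "${secs}" '+%Y-%m-%dT%H:%M:%SZ'
}
```

Applied to the sample trace above, ns_to_utc 1693377719403756000 prints 2023-08-30T06:41:59Z.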
Conclusion
Observability is critical to managing microservices in a complex and dynamic environment. Envoy Gateway, with its enhanced proxy observability features, allows you to gain deep insights into your microservices ecosystem. You can monitor, diagnose, and optimize your services by enabling metrics, logs, and trace observability.
Envoy Gateway’s integration with popular observability tools like Prometheus, Loki, and Tempo allows you to centralize and visualize your metrics, logs, and traces for better visibility and control. Whether you’re debugging issues, optimizing performance, or ensuring the reliability of your microservices, Envoy Gateway’s observability features provide the data you need to make informed decisions and keep your services running smoothly.
As you continue to explore and utilize Envoy Gateway’s observability capabilities, you’ll be better equipped to understand, manage, and scale your microservices architecture confidently.