This is the set of steps I walk through any time I sit down to debug an Istio setup, regardless of how much experience I have with the deployment. In my experience, most Istio errors are simple, “stupid” mistakes, and having a checklist to walk through helps me catch them much more quickly:

1. Is it syntactically valid?

      • ‘istioctl analyze’ verifies that your configuration parses correctly.
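If you run this check often, it's easy to script. The following is a minimal sketch that assumes istioctl is on your PATH. The helper names are hypothetical. The flags (-n, --all-namespaces, and --use-kube=false for files-only analysis) are the ones I know from recent istioctl releases, so confirm them with istioctl analyze --help for your version.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def analyze_cmd(namespace: Optional[str] = None,
                files: Sequence[str] = (),
                live_cluster: bool = True) -> List[str]:
    """Build an `istioctl analyze` command line (hypothetical helper)."""
    cmd = ["istioctl", "analyze"]
    if not live_cluster:
        # Analyze only the given files, without reading from the cluster.
        if not files:
            raise ValueError("offline analysis needs at least one file")
        cmd.append("--use-kube=false")
    elif namespace:
        cmd += ["-n", namespace]
    else:
        cmd.append("--all-namespaces")
    # Any files given are analyzed together with the selected source.
    return cmd + list(files)


def run_analyze(cmd: List[str]) -> Tuple[int, str]:
    """Run the command and return its exit code and combined output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

For example, analyze_cmd("default") checks one namespace of the live cluster. In contrast, analyze_cmd(files=["vs.yaml"], live_cluster=False) validates a manifest before it ever reaches the cluster.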

2. Does it have a status set?

      • Look for warnings or errors that istiod has set in the resource's status. istioctl analyze catches these too. When pointed at your live cluster, it performs essentially the same analysis as istiod; when run locally against files, it performs more limited validation.
      • istioctl analyze includes a bunch of referential checks that are easy to miss. For example, it checks that the hostnames you use in a VirtualService match the Gateway it's bound to.
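To make the hostname check concrete, here is a simplified sketch of that kind of referential check. It is not istioctl analyze's actual implementation. It treats VirtualService hosts as literal names and supports only the * and *.suffix wildcard forms on the Gateway side. It also drops the optional namespace/ prefix that Gateway server hosts can carry. The function names are hypothetical.

```python
from typing import Iterable, List


def host_matches(vs_host: str, gw_host: str) -> bool:
    """Does a VirtualService host fall under one Gateway server host?"""
    gw_host = gw_host.split("/", 1)[-1]  # drop an optional "namespace/" prefix
    if gw_host == "*":
        return True
    if gw_host.startswith("*."):
        # "*.example.com" matches subdomains but not "example.com" itself
        return vs_host.endswith(gw_host[1:])
    return vs_host == gw_host


def unbound_hosts(vs_hosts: Iterable[str], gw_hosts: Iterable[str]) -> List[str]:
    """VirtualService hosts that no Gateway server host will accept."""
    gw_hosts = list(gw_hosts)
    return [h for h in vs_hosts if not any(host_matches(h, g) for g in gw_hosts)]
```

Given a Gateway that serves *.example.com, a VirtualService listing bookinfo.example.com and api.other.com would get api.other.com back as unbound.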

Really, 99% of the errors in Istio are just slight misconfigurations. istioctl analyze tries to catch a lot of these common mistakes, but it can't catch them all. So we're going back to basics and checking from the bottom up:

3. Is the name right? Is the resource in the right namespace?

      • Since Istio 1.4, nearly all resources are namespace-scoped, including all networking configuration such as VirtualService, EnvoyFilter, Gateway, and ServiceEntry. Make sure they're in the same namespace as the service you're working on.
      • This is especially important because selectors are namespaced. A common misconfiguration is to publish a VirtualService in the application's namespace (e.g. default) targeting the istio: ingressgateway selector, intending for it to bind to the istio-ingressgateway deployment in the istio-system namespace. This will only work if your VirtualService is also in the istio-system namespace.
      • Alternatively, write a Sidecar resource in the istio-system namespace that imports the VirtualService from the application namespace. In general, avoid this if you can; instead, deploy a set of Envoy gateways for each application that needs ingress.
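One name-and-namespace mistake is easy to check mechanically. An entry in a VirtualService's gateways list is either namespace/name or a bare name. A bare name refers to a Gateway in the VirtualService's own namespace, and the name mesh is reserved for the sidecars. Below is a sketch that resolves those references against the Gateways you know exist. The helper names are hypothetical, and the set of existing Gateways is assumed to be collected separately.

```python
from typing import Iterable, List, Set, Tuple

GatewayKey = Tuple[str, str]  # (namespace, name)


def resolve_gateway_ref(ref: str, vs_namespace: str) -> GatewayKey:
    """Resolve one entry of a VirtualService's `gateways` field."""
    ns, sep, name = ref.partition("/")
    if not sep:
        # A bare name means a Gateway in the VirtualService's own namespace.
        return vs_namespace, ref
    return ns, name


def missing_gateways(refs: Iterable[str], vs_namespace: str,
                     existing: Set[GatewayKey]) -> List[str]:
    """Gateway references that don't resolve to an existing Gateway."""
    return [r for r in refs
            if r != "mesh"  # reserved: the mesh's sidecars, not a Gateway
            and resolve_gateway_ref(r, vs_namespace) not in existing]
```

Consider a VirtualService in default that lists my-gateway. It is looking for default/my-gateway, so a Gateway that exists only in istio-system won't be found. Writing istio-system/my-gateway fixes the reference.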

4. Are the resource selectors correct?

    • Verify the pods in your deployment have the right labels; make sure they match character-for-character.
    • Remember, per above, resource selectors are bound to the namespace the resource is published in.
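The two rules above are namespace-bound selection and character-for-character label matching. They can be sketched as follows, together with a helper that flags near misses (case or whitespace differences) that are easy to overlook. The sketch models only the behavior described above. The function names and plain-dict inputs are hypothetical, not an Istio API.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, selector_ns: str,
                     pod_labels: Labels, pod_ns: str) -> bool:
    """Namespace-bound, exact-match label selection as described above."""
    if selector_ns != pod_ns:
        return False
    return all(pod_labels.get(k) == v for k, v in selector.items())


def near_misses(selector: Labels, pod_labels: Labels) -> List[str]:
    """Selector labels that differ from a pod label only by case or whitespace."""
    def norm(s: str) -> str:
        return s.strip().lower()

    by_norm_key = {norm(k): (k, v) for k, v in pod_labels.items()}
    hints = []
    for key, value in selector.items():
        if pod_labels.get(key) == value:
            continue  # exact match, nothing to report
        candidate = by_norm_key.get(norm(key))
        if candidate and norm(candidate[1]) == norm(value):
            hints.append(f"selector {key}={value!r} vs pod {candidate[0]}={candidate[1]!r}")
    return hints
```

A selector of app: Reviews does not select a pod labeled app: reviews, and near_misses points at exactly that pair.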

At this point we’re fairly confident the configuration is correct on our side, so let’s start to see how the runtime system is handling the configuration.

You can use the experimental istioctl describe command (istioctl experimental describe) to do a lot of this analysis automatically. Because it's still experimental, we're not covering it as the primary way to debug; once it's promoted to stable, it will feature more heavily here. We describe the manual steps below regardless, because they always work. The catch, and the reason istioctl describe is better, is that the manual steps are only fool-proof if you have deep knowledge of Envoy or are willing to do a lot of searching through the Envoy reference docs (which is a good exercise in and of itself).

5. Did Envoy accept (ACK) the configuration?

    • istioctl proxy-status pod_name -n pod_namespace
    • We expect everything to be SYNCED; anything else indicates an error. If there is an error here, check Pilot's logs: skip forward to step 7.
    • If Envoy did ACK the config, let’s make sure it manifested correctly in Envoy.
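If you check many proxies, a small filter over the proxy-status table helps. This sketch makes several assumptions about the output: it has a header row, the proxy name is in the first column, and the xDS status values are SYNCED, NOT SENT, or STALE. The exact columns vary between Istio versions. Following the rule above, the sketch reports any status other than SYNCED. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Status values as this sketch assumes them; "NOT SENT" contains a space,
# so statuses are found with a regex rather than by splitting columns.
STATUS = re.compile(r"NOT SENT|STALE|SYNCED")


def unsynced_proxies(output: str) -> dict[str, list[str]]:
    """Map each proxy with a non-SYNCED status to those statuses."""
    problems: dict[str, list[str]] = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        name = line.split()[0]
        bad = [s for s in STATUS.findall(line) if s != "SYNCED"]
        if bad:
            problems[name] = bad
    return problems
```

Feed it the text captured from istioctl proxy-status. An empty result means every proxy reported SYNCED.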

6. Did the configuration appear as expected in Envoy?

    • istioctl proxy-config type pod_name -n pod_namespace is our primary tool.
      • It collects data from Envoy's admin interface, mostly the /config_dump endpoint, which holds a ton of useful information.
    • Based on the configuration we wrote, we're going to look for different types of configuration in Envoy. Generally speaking:
      • VirtualService HTTP rules manifest as routes in Envoy.
        • istioctl proxy-config routes pod_name -n pod_namespace
        • Other VirtualService rules (sometimes the hostname, and TCP rules) can manifest in the listeners too.
        • When evaluating VirtualServices, you're looking for hostnames to be present in Envoy's configuration as you wrote them (in either Listeners or Routes), and for the Routes you expect to be present (e.g. both destinations of a 50-50 traffic split). You should be able to trace from a Listener (identified by the host) to a Route (named in the Listener) to a Cluster (named in the Route).
      • Gateways manifest as listeners.
        • istioctl proxy-config listeners pod_name -n pod_namespace
        • You're looking for listeners bound to the ports in your Gateways, with hostnames matching the Servers in the Gateway.
      • DestinationRules and ServiceEntries manifest as clusters.
        • istioctl proxy-config clusters pod_name -n pod_namespace
        • Remember that DestinationRules do not appear unless their host already exists in Istio's service registry (for example, as a Kubernetes Service or a ServiceEntry).
      • EnvoyFilters manifest wherever you tell Istio to put them. Typically, a bad EnvoyFilter manifests as Envoy rejecting the configuration (i.e. not being in the SYNCED state above), and you need to check the Istiod (Pilot) logs for Envoy's rejection errors.
      • Sidecar resources scope which configuration applies to a proxy overall, so you're generally looking for the presence or absence of the configuration above to see whether a Sidecar is applying correctly.
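The trace from Listener to Route to Cluster can be sketched over a deliberately simplified model of what istioctl proxy-config listeners, routes, and clusters show. The dataclasses below are hypothetical and far flatter than Envoy's real configuration; they ignore filter chains, route matching, and most fields. The cluster names follow Istio's outbound|port|subset|host convention, but the rest of the example data is made up. The sketch either reports the first missing hop or returns the weighted clusters, so you can confirm something like a 50-50 split.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Listener:
    name: str
    port: int
    route_config: str  # name of the route configuration this listener uses


@dataclass
class VirtualHost:
    domains: list[str]  # hostnames this virtual host answers for
    weighted_clusters: dict[str, int]  # destination cluster -> traffic weight


@dataclass
class RouteConfig:
    name: str
    virtual_hosts: list[VirtualHost] = field(default_factory=list)


def trace(host: str, port: int, listeners: list[Listener],
          routes: dict[str, RouteConfig], clusters: set[str]) -> dict[str, int]:
    """Follow host:port from listener to route to clusters.

    Returns the weighted clusters, or raises LookupError naming the first
    missing hop.
    """
    listener = next((lis for lis in listeners if lis.port == port), None)
    if listener is None:
        raise LookupError(f"no listener on port {port}")
    route = routes.get(listener.route_config)
    if route is None:
        raise LookupError(
            f"listener {listener.name} names missing route {listener.route_config!r}")
    vhost = next((v for v in route.virtual_hosts if host in v.domains), None)
    if vhost is None:
        raise LookupError(f"route {route.name!r} has no virtual host for {host}")
    unknown = [c for c in vhost.weighted_clusters if c not in clusters]
    if unknown:
        raise LookupError(f"route {route.name!r} points at unknown clusters {unknown}")
    return dict(vhost.weighted_clusters)


# Made-up example data: a 50-50 split between two subsets of "reviews".
V1 = "outbound|9080|v1|reviews.default.svc.cluster.local"
V2 = "outbound|9080|v2|reviews.default.svc.cluster.local"
LISTENERS = [Listener("0.0.0.0_9080", 9080, "9080")]
ROUTES = {"9080": RouteConfig("9080", [VirtualHost(
    ["reviews.default.svc.cluster.local", "reviews"], {V1: 50, V2: 50})])}
```

With both subset clusters present, trace("reviews", 9080, LISTENERS, ROUTES, {V1, V2}) returns both clusters at weight 50. If one subset's cluster is missing (for example, because the DestinationRule defining it never applied), the error names the missing cluster instead.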

7. Did Istiod (Pilot) log errors?

If you're trying to see whether Pilot is logging an error about a piece of configuration, delete and quickly re-create the resource before checking the logs. That way, the relevant messages are recent and show up near the bottom.

    • If configuration didn't appear in Envoy at all (Envoy did not ACK it, or it's an EnvoyFilter configuration), it's likely that the configuration is invalid (Istio cannot syntactically validate the configuration inside an EnvoyFilter) or is placed in the wrong spot in Envoy's configuration.
    • In either case, Envoy will reject the configuration as invalid and Pilot will log the error; you can generally search for the name of your resource to find it.
      • Here, you'll have to use judgement to determine whether it's an error in the configuration you wrote or a bug in Pilot that produced invalid configuration.
      • Either way, please file an issue describing what tripped you up so we can add checks to istioctl analyze that help prevent similar failures.
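Searching the logs can be scripted as a simple filter. This sketch assumes you've already captured istiod's logs as text. It treats any line that mentions your resource and contains "error" or "warn" as interesting. That level check is a plain substring match, not a parser for Istio's log format, and the function name is hypothetical.

```python
from typing import List, Sequence


def config_errors(log_text: str, resource_name: str,
                  levels: Sequence[str] = ("error", "warn")) -> List[str]:
    """Log lines that mention `resource_name` and look like errors or warnings.

    Lines come back in log order, so the newest matches are at the end,
    which is where a freshly re-created resource's errors should appear.
    """
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if resource_name in line and any(level in lowered for level in levels):
            hits.append(line)
    return hits
```

Combined with the delete-and-re-create tip above, the last few entries returned are usually the ones worth reading.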

Zack Butcher

Zack Butcher is a founding engineer at Tetrate, a core contributor to @IstioMesh and co-author of Istio: Up and Running. Tetrate is committed to open source and offers services and products that make it easier for organizations to adopt Istio and Envoy. You can ask us anything about service mesh at