Ready to take your Envoy Gateway setup into production?
This article provides a list of items, organized by common operational concerns, that can help you validate your production readiness.
Environment Configuration
First, let’s make sure we have no environment configuration surprises:
- Non-prod and prod parity: Ensure the environment you’ve tested adequately represents your production environment. You don’t want any surprises due to differences in environment setup.
- Cluster compatibility: Double-check that your Kubernetes version is supported by the Envoy Gateway release you plan to run. Incompatible versions may cause unpredictable behavior or missing features. See the Envoy Gateway compatibility matrix.
- Namespace setup: Organize namespaces for Envoy Gateway and related services, ensuring separation of concerns and securing access. Implement role-based access control (RBAC) to limit permissions to only what’s necessary.
Pro tip: Regularly review your Kubernetes environment’s resource allocations to detect any bottlenecks early, especially if you’re running multiple services on the same cluster.
Optimize for Scalability and Performance
Production traffic can be unpredictable, so preparing your setup for varying loads is crucial:
- Set resource limits: Set appropriate CPU and memory resource limits for the Envoy Proxy data plane and the Envoy Gateway control plane. See “Customize EnvoyProxy” in the Envoy Gateway documentation.
- Enable autoscaling: Set a replica count for Envoy Gateway and Envoy Proxy that meets your high-availability needs, or configure Horizontal Pod Autoscaling (HPA) for both.
Pro tip: During your load testing phase, simulate different traffic patterns to determine the best settings for autoscaling thresholds.
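When choosing autoscaling thresholds, it can help to see how a target value translates into replica counts. The sketch below applies the scaling formula described in the Kubernetes Horizontal Pod Autoscaler documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), to a few load levels. The function name, the replica bounds, and the sample CPU figures are illustrative assumptions, not Envoy Gateway settings.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Estimate the replica count an HPA would aim for (simplified sketch).

    tolerance mirrors the HPA idea of skipping scaling when the observed
    metric is already close to the target; 0.1 is assumed here.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical load-test readings: average CPU utilization (%) with
    # 3 replicas running and a 60% utilization target.
    for cpu in (30, 65, 90, 300):
        replicas = desired_replicas(3, cpu, 60, min_replicas=2, max_replicas=8)
        print(f"cpu={cpu}% -> {replicas} replicas")
```

Running the same readings against several candidate targets during load testing gives a rough feel for how aggressively each threshold would scale the proxy fleet.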
Validate Your Traffic Management Configurations
Make sure your traffic routing and handling do what you expect:
- Routing and path matching: Verify that your routing configurations are accurate. Misconfigured paths can lead to unintended traffic behaviors, impacting user experience.
- Rate limiting and resilience policies: Implement rate limiting to control the number of requests your services handle. Additionally, set up retry and circuit breaker policies to help your system recover from temporary failures without cascading issues.
Pro tip: Regularly test these configurations in a staging environment to ensure they behave as expected under different scenarios.
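One common source of routing surprises is prefix matching. The Kubernetes Gateway API describes PathPrefix matching as working element by element on the path split by "/", so the prefix /abc matches /abc and /abc/def but not /abcd. The sketch below models that rule in plain Python so you can reason about test cases; it is a simplified illustration with hypothetical function names, not the matching code Envoy uses.

```python
def _elements(path):
    # Split on "/" and drop empty pieces, so a trailing "/" is ignored.
    # Simplification: this also collapses repeated slashes, which a real
    # proxy may treat differently.
    return [part for part in path.split("/") if part]


def prefix_matches(prefix, request_path):
    """Return True if request_path matches prefix element by element."""
    want = _elements(prefix)
    got = _elements(request_path)
    return got[:len(want)] == want


if __name__ == "__main__":
    cases = [
        ("/abc", "/abc/def"),
        ("/abc", "/abcd"),
        ("/abc/", "/abc"),
        ("/", "/anything"),
    ]
    for prefix, path in cases:
        print(prefix, path, prefix_matches(prefix, path))

    # A naive string comparison disagrees on "/abcd", which is the kind of
    # mismatch worth covering in staging tests.
    print("/abcd".startswith("/abc"))
```

Turning cases like these into requests against a staging gateway is one way to confirm that the deployed routes behave the way the configuration reads.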
Securing Your Setup
Ensure you have taken the time to validate your security configurations:
- TLS setup: Implement TLS for encrypted communication between services and mTLS for mutual authentication, ensuring both clients and servers verify each other’s identity.
- Access controls: Use authentication mechanisms like JWT tokens for service-to-service communication and user requests. Implement proper authorization policies to control who can access what.
- Secrets management: Store secrets (e.g., certificates, tokens) securely in line with your organization’s security guidelines. Avoid hardcoding secrets in configuration files or code.
- Conduct a security review: Make sure your information security expert has had a chance to review your setup’s security and provide suggestions for improvement.
Pro tip: Rotate your secrets regularly to minimize the risk of exposure, and audit your access control configurations periodically.
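A rotation policy is easier to follow when something flags overdue secrets for you. The sketch below is a minimal illustration of such an audit: given creation timestamps, it lists the secrets older than a chosen maximum age. The secret names, the 90-day window, and the function name are all hypothetical; in practice the timestamps would come from your secrets manager or from Kubernetes Secret metadata.

```python
from datetime import datetime, timedelta, timezone


def secrets_due_for_rotation(created_at_by_name, max_age_days, now=None):
    """Return the names of secrets created at least max_age_days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, created in created_at_by_name.items()
        if created <= cutoff
    )


if __name__ == "__main__":
    # Hypothetical inventory; a fixed "now" keeps the example reproducible.
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    inventory = {
        "gateway-tls-cert": now - timedelta(days=120),
        "jwt-signing-key": now - timedelta(days=30),
        "upstream-api-token": now - timedelta(days=95),
    }
    for name in secrets_due_for_rotation(inventory, max_age_days=90, now=now):
        print(f"rotate: {name}")
```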
Observability and Monitoring Essentials
When you inevitably get woken up at 2am for an issue, let’s make it easy to find the root cause:
- Telemetry integration: Enable telemetry with tools like Prometheus or OpenTelemetry to capture metrics from the control and data planes so you can investigate when an issue happens.
- Centralized logging: Set up centralized logging so that logs are collected in one place, making issues easier to diagnose.
- Dashboards and alerts: Set up Grafana dashboards to visualize key metrics, and configure alerts for indicators like high latency, failed requests, or resource exhaustion.
Pro tip: Create separate dashboards for different environments (staging, production) to compare behaviors and spot potential issues before promoting changes.
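To make the alerting idea concrete, the sketch below evaluates two of the indicators mentioned above, failed requests and high latency, over a window of samples. It is only an illustration of the logic in Python: real alerts would normally be defined in your monitoring system, and the thresholds (1% server errors, 500 ms p99), function names, and sample data are assumptions.

```python
import math


def error_rate(status_codes):
    """Fraction of responses in the window that were 5xx."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if code >= 500) / len(status_codes)


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate_alerts(status_codes, latencies_ms,
                    max_error_rate=0.01, max_p99_ms=500):
    """Return a list of alert messages for the given window of samples."""
    alerts = []
    rate = error_rate(status_codes)
    if rate > max_error_rate:
        alerts.append(f"error rate {rate:.1%} exceeds {max_error_rate:.1%}")
    p99 = percentile(latencies_ms, 99)
    if p99 > max_p99_ms:
        alerts.append(f"p99 latency {p99} ms exceeds {max_p99_ms} ms")
    return alerts


if __name__ == "__main__":
    # Hypothetical window: 98 successes, 2 server errors, one slow request.
    codes = [200] * 98 + [503] * 2
    latencies = [40] * 98 + [900, 1200]
    for message in evaluate_alerts(codes, latencies):
        print("ALERT:", message)
```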
High Availability and Fault Tolerance
Building resilience into your system helps avoid downtime:
- Multi-region failover: Ensure you have a global traffic manager set up to fail traffic over to another region if there is an outage.
- Simulate failures: Before going to production, practice a failover scenario while live traffic is flowing through the system.
Pro tip: Try out chaos engineering exercises to find weaknesses in your fault tolerance strategies and continuously improve your configurations.
Documentation Will Save You Time
Well-documented setups improve team efficiency:
- Operational playbook: Develop a playbook for common incidents and troubleshooting steps so anyone on the team can resolve issues that come up.
- Failure point diagram: Create a failure point diagram from client request to target service. Highlight each network hop and document potential root causes for broken connections.
Pro tip: Document the reasoning behind your configuration choices, as it helps future engineers understand the setup and make informed decisions when updates are needed.
Backup and Recovery Plans
- Regular backups: Schedule frequent backups for configuration files, secrets, and critical data. Verify you can quickly redeploy your setup if necessary.
- Disaster recovery testing: Test your disaster recovery procedures regularly to ensure your team knows how to restore services quickly in case of a failure.
Pro tip: Manage all your configuration in Git and ensure your repo is backed up regularly.
Next Steps
Looking to take Envoy Gateway to production? We’d love to help you on your journey. As a Tetrate Enterprise Gateway customer you’ll get expert guidance and support.
Get in touch with us to learn more about how we can help you.