As an engineering leader, you play a crucial role in organizing your team to meet SLOs and SLAs in the era of cloud native architecture. This article offers practical guidance on organizing your teams so that you get a clear picture of the current state of your teams and systems and can make the right investments for the future. We’ll also look at why consistent edge metrics, like those delivered by Envoy Gateway, matter to the success of a monthly service review.
Whether you are building B2B SaaS solutions, internal software, or B2C software, your work directly affects the performance of your systems, which is critical to your business’s success.
If you provide a SaaS solution, you understand the weight of ensuring that your software meets your clients’ service level agreements (SLAs).
If you provide systems for critical business functions within your organization, you know that how well you meet your service level objectives (SLOs) directly impacts the success of the business.
If you have a B2C solution, you know how important it is to ensure that every user has a good experience with your software.
The Performance Clarity Challenges of Cloud Computing
While the era of cloud architecture has improved scalability and resiliency, tracking service quality levels has become increasingly complex. Distributed, multi-component systems make it harder, not easier, to see how a system as a whole is doing.
As the number of components in our systems grows, it becomes increasingly difficult to get a clear picture of what is impacting your SLOs. It also gets harder to measure, pinpoint problems, and hold people accountable for taking action to improve performance.
When examining SLAs and SLOs, we need to consider the quality delivered to clients and users. Stepping back from all the small components that make up your system brings clarity: focus on the time between a request arriving at the boundary of your system and the response being returned.
Set Your Team Up for Success
Establish goals, accountability, and transparency in your organization to clarify whether and where investments are necessary to meet SLAs and SLOs.
Establish Goals
Each team should be responsible for documenting its SLOs. Not all services are equally critical, so SLOs will differ from service to service.
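To make the idea of a written-down SLO concrete, here is a minimal sketch of how a team might record one and translate an availability target into a downtime allowance. The class, field names, services, and target values are all hypothetical and chosen only for illustration; the arithmetic assumes a 30-day review window.

```python
from dataclasses import dataclass

MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes in a 30-day window


@dataclass(frozen=True)
class ServiceLevelObjective:
    """A documented SLO for one service (all field names are illustrative)."""
    service: str
    availability_target: float  # e.g. 0.999 means 99.9% of the window
    p95_latency_ms: float       # latency ceiling for 95% of requests

    def allowed_downtime_minutes(self, window_minutes: int = MINUTES_PER_30_DAYS) -> float:
        """Downtime the availability target tolerates within the window."""
        return (1.0 - self.availability_target) * window_minutes


# Hypothetical services: a customer-facing API and a less critical internal tool.
checkout = ServiceLevelObjective("checkout-api", availability_target=0.999, p95_latency_ms=300)
reports = ServiceLevelObjective("internal-reports", availability_target=0.99, p95_latency_ms=2000)

for slo in (checkout, reports):
    minutes = slo.allowed_downtime_minutes()
    print(f"{slo.service}: {minutes:.1f} minutes of downtime allowed per 30 days")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, while a 99% target leaves about 7 hours, which is one way to express that less critical services carry looser objectives.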
Establish Accountability
Ask each of your teams to appoint a person responsible for the team’s service performance.
Organize a Monthly Service Review
Measure
It is no surprise that the first step is to measure. If you’re unaware of the status of service quality, you won’t be able to make decisions rooted in reality.
It can be overwhelming to pick what to measure, so start simple and measure the most critical items. Three areas are directly linked to the quality of the service you provide to users:
- Outages
  - How many complete outages did we have, for how long, and which components?
- Errors
  - Application error rate trends – what is the % of errors in total traffic?
- Performance
  - API response performance trends – is it getting slower or faster?
  - Static content delivery performance – is it getting slower or faster?
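The three areas above can be sketched as small calculations over request records captured at the system boundary. This is only an illustration under stated assumptions: the `EdgeRequest` record shape is hypothetical, a 5xx status is treated as an error, the percentile uses the nearest-rank method, and an "outage minute" is crudely approximated as a minute in which every observed request failed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRequest:
    """One request as seen at the system boundary (hypothetical record shape)."""
    timestamp_s: int    # arrival time, in seconds since the start of the window
    status: int         # HTTP response status code
    duration_ms: float  # time from request arrival to response


def error_rate_percent(requests):
    """Share of requests answered with a 5xx status, as a percentage."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return 100.0 * errors / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def outage_minutes(requests):
    """Count minutes in which every observed request failed (a crude outage proxy)."""
    failures_by_minute = {}
    for r in requests:
        failures_by_minute.setdefault(r.timestamp_s // 60, []).append(r.status >= 500)
    return sum(1 for failures in failures_by_minute.values() if all(failures))


# Made-up sample data covering four minutes of traffic.
sample = [
    EdgeRequest(5, 200, 120.0),
    EdgeRequest(20, 200, 95.0),
    EdgeRequest(70, 503, 30.0),
    EdgeRequest(95, 503, 28.0),
    EdgeRequest(130, 200, 410.0),
    EdgeRequest(150, 500, 15.0),
    EdgeRequest(185, 200, 101.0),
    EdgeRequest(200, 200, 88.0),
]

print("error rate %:", error_rate_percent(sample))
print("p95 latency ms:", percentile([r.duration_ms for r in sample], 95))
print("outage minutes:", outage_minutes(sample))
```

Tracking these numbers month over month, rather than as single snapshots, is what turns them into the trends the list asks about. API and static content traffic could be measured separately by running the same functions over each subset of requests.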
Attribute
To make accountability clear, ensure every metric is attributed to the right application and team.
Attribute performance metrics by:
- Owning team
- Application
- Environment
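As a rough sketch of what attribution by these three dimensions could look like, the example below groups request records by owning team, application, and environment, then summarizes each group. The record keys, team names, and application names are hypothetical; in practice the labels would have to come from whatever routing or deployment metadata your edge component exposes.

```python
from collections import defaultdict

# Hypothetical edge records that already carry ownership labels.
records = [
    {"team": "payments", "app": "checkout-api", "env": "prod", "status": 200, "duration_ms": 120.0},
    {"team": "payments", "app": "checkout-api", "env": "prod", "status": 503, "duration_ms": 30.0},
    {"team": "payments", "app": "checkout-api", "env": "prod", "status": 200, "duration_ms": 90.0},
    {"team": "payments", "app": "checkout-api", "env": "staging", "status": 200, "duration_ms": 100.0},
    {"team": "search", "app": "catalog-api", "env": "prod", "status": 200, "duration_ms": 40.0},
    {"team": "search", "app": "catalog-api", "env": "prod", "status": 200, "duration_ms": 60.0},
]


def attribute(records):
    """Summarize requests, error rate, and mean latency per (team, app, env)."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "duration_ms": 0.0})
    for rec in records:
        bucket = totals[(rec["team"], rec["app"], rec["env"])]
        bucket["requests"] += 1
        bucket["errors"] += 1 if rec["status"] >= 500 else 0  # assumption: 5xx counts as an error
        bucket["duration_ms"] += rec["duration_ms"]
    return {
        key: {
            "requests": b["requests"],
            "error_rate_pct": 100.0 * b["errors"] / b["requests"],
            "mean_duration_ms": b["duration_ms"] / b["requests"],
        }
        for key, b in totals.items()
    }


summary = attribute(records)
for (team, app, env), stats in sorted(summary.items()):
    print(f"{team}/{app}/{env}: {stats['requests']} requests, "
          f"{stats['error_rate_pct']:.1f}% errors, {stats['mean_duration_ms']:.0f} ms mean")
```

Keeping the environment in the key stops staging traffic from diluting or inflating the production figures a team is held accountable for.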
Teams might argue that they depend on underlying systems that can’t meet their applications’ performance needs. However, accountability is essential here. As a leader, you must empower the person who owns a product’s service quality to have the necessary conversations with the owners of the systems they depend on. Remember that the solution often sits on both sides of the fence.
Report
Organize the findings into a digestible monthly report, broken down by team and applications.
Ask all service performance owners to add commentary to the report to shed light on the outages, errors, and performance changes.
Ask all service performance owners to add any action points taken in the past month and any planned actions anticipated to impact performance (positive or negative).
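One way to picture such a report is a plain-text summary grouped by team and application, with each owner's commentary and actions attached. The sketch below assumes the metrics have already been computed; every name, number, and comment in it is made up for illustration, and a real report could just as well live in a dashboard or a shared document.

```python
def render_report(month, rows, commentary, actions):
    """Build a plain-text monthly report grouped by team, then application."""
    lines = [f"Service review: {month}"]
    for team in sorted({row["team"] for row in rows}):
        lines.append("")
        lines.append(f"Team: {team}")
        team_rows = sorted((row for row in rows if row["team"] == team),
                           key=lambda row: row["app"])
        for row in team_rows:
            key = (team, row["app"])
            lines.append(
                f"  {row['app']}: outages={row['outages']}, "
                f"errors={row['error_rate_pct']:.2f}%, p95={row['p95_ms']:.0f} ms"
            )
            # Commentary and actions are supplied by the service performance owner.
            lines.append(f"    Commentary: {commentary.get(key, 'none provided')}")
            for action in actions.get(key, []):
                lines.append(f"    Action: {action}")
    return "\n".join(lines)


# Hypothetical inputs for one review cycle.
rows = [
    {"team": "search", "app": "catalog-api", "outages": 0, "error_rate_pct": 0.12, "p95_ms": 85.0},
    {"team": "payments", "app": "checkout-api", "outages": 1, "error_rate_pct": 0.8, "p95_ms": 310.0},
]
commentary = {
    ("payments", "checkout-api"): "One outage during a database failover.",
}
actions = {
    ("payments", "checkout-api"): [
        "Taken: added connection retry on failover.",
        "Planned: schema migration, may raise latency temporarily.",
    ],
}

report = render_report("example month", rows, commentary, actions)
print(report)
```

Showing "none provided" when commentary is missing makes the gap visible in the review instead of silently hiding it.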
Review
Set up a monthly meeting to review performance with all service performance owners, underlying systems and infrastructure owners, and internal business stakeholders. Make sure your operations team is involved in this meeting.
Act
Address issues in the system’s service quality with engineering actions, and measure the impact of each one.
Pick Edge Components That Make It Easy to Measure Performance
Consistency in edge metrics will make measuring and reporting much easier. It is important to pick a solution that provides the granularity to measure and attribute appropriately.
Envoy Proxy, a mature reverse proxy originally developed at Lyft, allows you to capture rich metrics from requests. With it, you can measure performance, collect the data needed to attribute that performance, and report on service performance.
However, without a scalable control plane, Envoy Proxy can be difficult to manage and configure. The easiest way to use Envoy Proxy to handle incoming requests to your system is to run it as a Kubernetes Gateway managed by Envoy Gateway.
Although Envoy Gateway runs Envoy Proxy as a Kubernetes Gateway, it can route traffic to services both inside and outside Kubernetes, giving you a consistent technology component whether or not all of your services are hosted on Kubernetes.
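To illustrate how edge data from a proxy could feed the measurements described earlier, the sketch below parses JSON-formatted access log lines into simple records. It assumes the proxy has been configured to emit one JSON object per request with the keys `authority`, `response_code`, and `duration`. Those key names are hypothetical choices for this example, not Envoy defaults; Envoy lets you define your own keys and fill them with access log operators such as `%RESPONSE_CODE%` and `%DURATION%`.

```python
import json

# Made-up log lines in the assumed format; the second-to-last one is malformed on purpose.
raw_lines = [
    '{"authority": "checkout.example.com", "response_code": 200, "duration": 118}',
    '{"authority": "checkout.example.com", "response_code": 503, "duration": 31}',
    'not a json line',
    '{"authority": "catalog.example.com", "response_code": 200, "duration": 42}',
]


def parse_access_log(lines):
    """Turn JSON access log lines into records, skipping lines that do not fit the assumed shape."""
    records = []
    for line in lines:
        try:
            entry = json.loads(line)
            records.append({
                "host": entry["authority"],
                "status": int(entry["response_code"]),
                "duration_ms": float(entry["duration"]),
            })
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # ignore malformed or incomplete lines
    return records


parsed = parse_access_log(raw_lines)
for rec in parsed:
    print(rec["host"], rec["status"], rec["duration_ms"])
```

Because every request passes through the same edge component and is logged in the same shape, the same parsing and aggregation can be reused across all teams, which is the consistency this section argues for.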