Can Rate Limiting Help Control Compute Costs?

It’s an exhilarating feeling. Your application or platform is really popular, and the traffic is pouring in.
Then reality hits when the cloud computing bill arrives. Your services have been scaling up to absorb the demand, and the excitement fades fast.
How do you mitigate this problem? By setting boundaries with rate limiting.
With effective rate limiting, you control the traffic coming into a system, which helps control compute costs by preventing excessive usage and abuse.
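At its core, a rate limiter decides, per request, whether the caller is still within its allowance. A common way to implement that decision is a token bucket. Here is a minimal sketch in Python (the class name and parameters are illustrative, not from any particular library): a bucket refills at a steady rate and each request spends one token, so short bursts are absorbed while sustained excess is rejected.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: refills `rate` tokens per
    second and allows bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # Over the limit: the caller would typically respond with HTTP 429.
        return False

# A burst of 15 back-to-back requests against a capacity of 10:
bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(15)]
```

In this sketch the first 10 requests of the burst succeed and the rest are rejected until the bucket refills. Production systems (including Envoy's rate limit service, discussed below) keep these counters in shared storage rather than in process memory.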
However, when we talk about rate limiting, we often need to be more precise. It is one of those things that changes meaning depending on context.
What Do You Mean by “Rate Limiting”?
There are three main categories of rate limiting to consider:
- Upstream services protection: This form of rate limiting shields the underlying systems from being flooded with excessive requests.
- Reasonable usage limits: These limits are based on reasonable user activity and prevent abnormal usage patterns.
- Product-defined limits: These are limits based on a business agreement. If you have third-party clients accessing your services, you likely have a specific agreement regarding rate limits for their usage.
Enforce Rate Limits with a Scalable Gateway
Now, the question arises: How do you effectively enforce these limits?
The answer lies in using a gateway solution such as Envoy Gateway, which offers simple, configurable rate limiting on top of Envoy Proxy.
When you enable Envoy Gateway in your Kubernetes cluster, it automatically installs the control plane and rate-limiting server required to enforce rate limiting for your resources.
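As a concrete illustration, recent Envoy Gateway versions express rate limits as a BackendTrafficPolicy attached to a route. The sketch below assumes an existing HTTPRoute named backend-route (a hypothetical name) and caps all traffic through it at 100 requests per second:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: upstream-protection
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: backend-route   # hypothetical route name
  rateLimit:
    type: Global            # enforced by the shared rate-limit service
    global:
      rules:
        - limit:
            requests: 100
            unit: Second
```

Because the limit is `Global`, the count is shared across all Envoy Proxy replicas rather than enforced per pod.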
Running Envoy Proxy under Envoy Gateway's control on Kubernetes gives you a scalable gateway solution: as traffic volume changes, the data plane handling the requests scales up and down as necessary.
Want global rate limiting across gateways and regions? Connecting the rate-limit service to a cloud-hosted, cross-region-replicated Redis lets you achieve truly cross-cluster, global rate limiting for your system.
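In Envoy Gateway, the Redis backend for the rate-limit service is set in the EnvoyGateway startup configuration. A sketch, assuming a Redis instance reachable at the (hypothetical) address below:

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyGateway
provider:
  type: Kubernetes
gateway:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
rateLimit:
  backend:
    type: Redis
    redis:
      # Hypothetical endpoint; point this at a cross-region-replicated
      # Redis to share counters across clusters and regions.
      url: redis.redis-system.svc.cluster.local:6379
```

With every gateway's rate-limit service reading and writing the same replicated Redis, the counters, and therefore the limits, become global.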
A Simple Approach to Defining Rate Limits
Here’s a simple way to approach defining your rate limits:
First, assess how much traffic your underlying service can handle. This forms the foundation for your rate limits, usually set for a short period targeting the underlying service. This should be the most comprehensive rate limit, as it impacts all incoming requests. This is your upstream services protection rate limiting.
Next, consider limiting the requests from end users. If you have an application, distinguishing between reasonable human activity and non-reasonable activity helps you establish limits across different routes. This is a more restrictive approach but ensures fair and regular usage. Figuring out the appropriate rate limits here allows you to set reasonable usage limits.
Finally, if third-party clients access your services programmatically, it’s essential to adhere to any rate limits set in your business agreement. These limits will be based on the agreement with the client and may vary for different APIs, but they allow you to track usage per client. These are your product-defined limits, which should never exceed your upstream services protection rate limiting.
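The layered approach above maps naturally onto Envoy Gateway's client selectors. The fragment below (a sketch; the header names are hypothetical and would come from your own authentication layer) shows a per-client product limit alongside a per-user limit inside a BackendTrafficPolicy's `rateLimit` section:

```yaml
rateLimit:
  type: Global
  global:
    rules:
      # Product-defined limit for one third-party client, identified
      # by a hypothetical x-client-id header.
      - clientSelectors:
          - headers:
              - name: x-client-id
                value: partner-a
        limit:
          requests: 50
          unit: Second
      # Reasonable usage limit per end user: "Distinct" counts each
      # header value separately, so every user gets their own bucket.
      - clientSelectors:
          - headers:
              - name: x-user-id
                type: Distinct
        limit:
          requests: 10
          unit: Second
```

A broad upstream-protection rule (as shown earlier) would sit alongside these, with the per-client and per-user values always kept below it.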
Parting Thoughts
Rate limiting controls incoming traffic and, in turn, helps control compute costs. Implementing a layered approach to rate limiting can help organizations strike the right balance between ensuring fair usage and protecting their underlying systems.