How Rate Limiters Protect Your System From Abuse! EP 3: Behind The Screen
This is the 3rd episode of my series Behind The Screen, where I break down, in the simplest way, the tech that enhances our daily lives.
Here are the other episodes of this series, in case you find something interesting.
In this episode we are going to discuss:
- What are rate limiters?
- What problems do they solve?
- How do they work?
- Rate-limiting algorithms
IMPORTANT - This post is for beginners who want to understand rate limiters. If you are an experienced developer and already know about rate limiters, this post is not for you. You might find my two other posts interesting.
What is a Rate Limiter?
A rate limiter is a backend system component that controls how often a client can access your service/system.
In some apps that require OTP verification, you might have seen that you can request an OTP only once every minute, or whatever interval the developer has set. This is done to prevent abuse of the system, since sending an OTP is an expensive operation.
This blocking is done by a rate limiter.
What Problems Do Rate Limiters Solve?
The answer is simple, and you might have guessed it: they protect your system from abuse, and they also reduce your operational costs. You surely don't want a user requesting an OTP every second, because that's expensive for you.
How Do Rate Limiters Work?
The rate limiter keeps track of each user and the requests they have made so far. For example, let's say you have an API endpoint:
POST - /auth/otp/

This API is integrated into your mobile app and is responsible for sending an OTP to the user.
The rate limiter sits in between: when a request comes to this API, it checks whether the user should be allowed to access it. If yes, the request passes through; if not, the request is blocked and a 429 status code is sent, which means "Too Many Requests".
But how does the rate limiter decide?
The rate limiter already knows how many requests are allowed for a particular API within a fixed interval of time. It keeps track of the number of requests already made, and if that number reaches the maximum allowed, it blocks further requests.
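As a minimal sketch of this decision (the function name, the in-memory dict, and the limit value here are all illustrative; a real limiter would keep these counts in a shared store like Redis and reset them each interval):

```python
# Hypothetical sketch of the limiter's decision for the OTP endpoint.
# A real system would store counts in Redis and reset them every interval;
# here a plain dict stands in and the reset logic is omitted.
MAX_REQUESTS = 1           # e.g. one OTP request allowed per interval
request_counts = {}        # user_id -> requests made in the current interval

def handle_otp_request(user_id):
    count = request_counts.get(user_id, 0)
    if count >= MAX_REQUESTS:
        return 429         # "Too Many Requests": the limiter blocks the call
    request_counts[user_id] = count + 1
    return 200             # request passes through to the OTP-sending logic
```

The only state the limiter needs is a counter per user; everything else is comparing that counter against the configured maximum.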
What Algorithms Do Rate Limiters Use?
There are three commonly used rate-limiting algorithms:
- Token Bucket
- Fixed Window
- Sliding Window
Let’s discuss each of them.
Token Bucket

This is the simplest rate-limiting algorithm. Imagine you have a bucket holding some tokens. Every time a user accesses a service, a token is removed from that bucket.
We keep removing tokens as the user keeps accessing the resource. If the user accesses it too frequently, the bucket will eventually be empty. Once the bucket is empty, we no longer allow the user to access the resource.
We also fill tokens to this bucket at a fixed interval of time. You can see in the above illustration that at time 12:21:00 the bucket had 60 tokens. In the next 23 seconds the user accesses the resource (hits the API) 40 times, and now 20 tokens remain in the bucket.
By the 35-second mark, the user had consumed all the tokens.
The bucket is refilled with 60 tokens every 60 seconds, which means that from 12:21:35 to 12:21:59 the user can't access the API/resource.
At 12:22:00 this bucket will be refilled with 60 tokens so the user can continue accessing the server resources.
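A minimal sketch of this scheme in Python (the class and method names are my own; it mirrors the 60-tokens-per-60-seconds example by refilling the bucket in full once the interval elapses):

```python
import time

class TokenBucket:
    """Minimal token bucket: `capacity` tokens, refilled in full
    every `refill_interval` seconds."""

    def __init__(self, capacity, refill_interval):
        self.capacity = capacity
        self.refill_interval = refill_interval
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill the whole bucket once the interval has passed,
        # matching the "60 tokens every 60 seconds" example above.
        if now - self.last_refill >= self.refill_interval:
            self.tokens = self.capacity
            self.last_refill = now
        if self.tokens > 0:
            self.tokens -= 1   # each request consumes one token
            return True
        return False           # bucket empty: block the request
```

Note that many production token buckets refill gradually (a few tokens per second) rather than all at once; the all-at-once variant is what the illustration above describes.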
Disadvantages
With simplicity come a number of disadvantages.
- Poor by design: For every bucket you need to store the max token count, the current token count, and the last refill time, and keep the bucket refilled. At scale, with millions of users, this bookkeeping causes performance issues.
- Difficult to express complex policies: A token bucket struggles with rules like "at most 10 requests per second and at most 1,000 per day."
- Potential abuse: This does not completely prevent abuse. In the example above, a user can spend their last tokens just before 12:22:00 and another 60 right after the refill, so up to 120 requests land in about 2 seconds.
💡 Quick note: If you enjoy understanding how everyday tech actually works under the hood, I write one of these breakdowns every week in Behind The Screen. You can subscribe below — no spam, just deep system-level explanations.
Fixed Window

This algorithm divides time into fixed windows. In the diagram above, time is divided into windows of 1 minute. The rule is simple: at most N requests are allowed in a fixed time window of length T.
In our example, 12:36 to 12:37 is one window. We count the requests made during this window, and if the count reaches the maximum, we rate-limit the user. If a client accesses the resource at 12:37:01, they fall into a new window, and the count starts again from 0.
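This can be sketched with a single counter per user per window (names are illustrative; production systems typically keep these counters in something like Redis):

```python
import time

class FixedWindowLimiter:
    """Allow at most `max_requests` per fixed window of `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counts = {}  # user_id -> (window_id, count)

    def allow(self, user_id):
        # Every timestamp inside the same window maps to one window_id,
        # e.g. 12:36:00-12:36:59 for a 60-second window.
        window_id = int(time.time() // self.window_seconds)
        current_window, count = self.counts.get(user_id, (window_id, 0))
        if current_window != window_id:
            count = 0  # a new window has started, so the count resets
        if count >= self.max_requests:
            return False  # caller should respond with 429
        self.counts[user_id] = (window_id, count + 1)
        return True
```

Storage is just one `(window_id, count)` pair per user, which is why this approach is so cheap compared to keeping per-request state.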
Advantages
- Less storage overhead: Unlike a token bucket, this has little storage overhead and works well at scale.
Disadvantages
- Burst at corners: If you look carefully, this algorithm is not abuse-proof either. It allows a user to fire all their requests at the end of the current window and the start of the next one, which can be just a second or two of real time.
Sliding Window

This algorithm allows at most N requests in the last T seconds, measured continuously rather than in fixed buckets. Instead of checking a fixed window of time, we look back T seconds from the moment the request is made.
For example, if a request is made at 12:36:24 and our interval (T) is set to 60 seconds, we check how many requests the user has made since 12:35:24. This solves the burst problem we faced with the other two algorithms.
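One common way to implement this is a sliding window log: store a timestamp per request and count only those inside the last T seconds. A minimal sketch (class and method names are mine; at scale the log usually lives in something like a Redis sorted set rather than process memory):

```python
import time
from collections import deque, defaultdict

class SlidingWindowLog:
    """Allow at most `max_requests` in the last `window_seconds`, continuously."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.logs = defaultdict(deque)  # user_id -> timestamps of past requests

    def allow(self, user_id):
        now = time.monotonic()
        log = self.logs[user_id]
        # Drop timestamps that have slid out of the last T seconds.
        while log and now - log[0] > self.window_seconds:
            log.popleft()
        if len(log) >= self.max_requests:
            return False  # N requests already made in the last T seconds
        log.append(now)
        return True
```

The trade-off is storage: unlike the fixed window's single counter, this keeps up to N timestamps per user, which is the price paid for eliminating the burst-at-corners problem.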
I’m building an Open Source Integration Engine to make SaaS integrations easy.
You can check it out here: Connective
Open source contributions are also welcome in this project.