What is Rate Limiting?


By Hayley Brown

What is rate limiting and what does it do?

Rate limiting is a restriction placed on an API to control the rate or number of requests/API calls a user can make to a network, server, or other resource in a specific period, for instance, login attempts to an account. API rate limits are used to manage network traffic for an API and to prevent excessive or abusive use of a resource, ensuring it remains available to all users.

The Different Types of API Rate Limiting

There are multiple types of rate limiting that an organisation may implement to control requests made to an API.

User Rate Limits

User rate limits are the most popular rate-limiting method and work by tracking the number of requests a given user makes, usually via the user’s IP address or API key. Once a user exceeds their specified limit, the application denies further requests until the rate-limit timeframe resets, at which point the user can try again. If the issue arises repeatedly, users can ask developers to increase their rate limit.
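As a rough illustration, a per-user limit might be tracked like this, keyed by API key; the 100-requests-per-minute window here is an invented figure, not one from the text:

```python
import time
from collections import defaultdict

# Minimal sketch of per-user rate limiting keyed by API key.
# WINDOW_SECONDS and MAX_REQUESTS are illustrative values.
WINDOW_SECONDS = 60
MAX_REQUESTS = 100

counters = defaultdict(lambda: {"start": 0.0, "count": 0})

def allow_request(api_key: str) -> bool:
    entry = counters[api_key]
    now = time.monotonic()
    if now - entry["start"] >= WINDOW_SECONDS:
        entry["start"], entry["count"] = now, 0  # reset this user's window
    entry["count"] += 1
    return entry["count"] <= MAX_REQUESTS
```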

Geographic Rate Limits

Geographic rate limits allow developers to further secure applications in a given geographic region by setting a rate limit for each region over a specified timeframe. For example, if users in a given region are expected to be less active between midnight and 9:00 am, developers can define a lower rate limit for that timeframe. This approach helps block suspicious traffic and further reduces risk.
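A sketch of how such region- and time-dependent limits might be chosen; the region names and request counts are invented for illustration, while the midnight-to-9:00-am quiet period follows the example above:

```python
from datetime import datetime

# Illustrative per-region limits; the quiet period mirrors the
# midnight-to-9am example, the counts themselves are invented.
REGION_LIMITS = {
    "eu-west": {"normal": 1000, "quiet": 200},
    "us-east": {"normal": 1500, "quiet": 300},
}

def limit_for(region: str, now: datetime) -> int:
    """Return the requests-per-hour limit for a region at a given time."""
    limits = REGION_LIMITS[region]
    if now.hour < 9:  # expected low-activity hours: midnight to 9:00 am
        return limits["quiet"]
    return limits["normal"]
```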

Server Rate Limits

Server rate limits enable developers to establish restrictions at the server level when they designate a particular server for certain application components. This method offers greater adaptability, permitting developers to raise the rate limit on frequently used servers while reducing the traffic limit on those that are less active.

Rate Limiting Algorithms

There are several API rate-limiting algorithms that regulate the flow of requests to a system, and each has its own advantages and limitations.

Token Bucket Algorithm

The algorithm can be pictured as a container that is replenished with tokens at a steady pace, with each request consuming a specific number of tokens. A request can proceed only if sufficient tokens are available.

For instance, a web server utilises the Token Bucket Algorithm to limit bursts from a single IP address: the bucket holds up to 50 tokens, the server replenishes it at a rate of 10 tokens per minute, and each request consumes one token. If an IP address attempts to submit a request when the bucket is empty, that request is denied.
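A minimal sketch of that example, assuming the 50 tokens represent the bucket’s capacity (the burst allowance) and 10 tokens per minute is the sustained refill rate:

```python
import time

class TokenBucket:
    """Sketch of the example above: a 50-token bucket refilled at
    10 tokens per minute, with each request consuming one token."""

    def __init__(self, capacity: float = 50, refill_per_sec: float = 10 / 60):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_per_sec = refill_per_sec
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Top the bucket up for the time elapsed since the last check.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket is empty: deny the request
```

As the text notes below, a shared bucket like this would also need a lock in a multithreaded server to avoid race conditions.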

[Figure: Token Bucket Algorithm]

One benefit of the Token Bucket algorithm is its efficient use of memory, as it necessitates the storage of only a predetermined number of tokens. This efficiency is particularly crucial in environments with constrained memory availability. Nevertheless, the algorithm is vulnerable to race conditions, which may arise when several threads or processes try to access the same resource at the same time.

Leaky Bucket Algorithm

The Leaky Bucket Algorithm is similar to the previously mentioned Token Bucket Algorithm. The difference is that, instead of storing a fixed number of tokens, it holds a fixed amount of data. The process can be visualised as a bucket with a hole at the bottom: incoming requests fill the bucket, and additional requests are permitted as long as it isn’t full. Once the bucket reaches capacity, any extra requests overflow and are lost. The bucket drains at a steady pace.

For instance, an email platform utilises the Leaky Bucket Algorithm to restrict users to sending a maximum of 10 emails each hour. This bucket can accommodate 10 emails, with each new email added to its contents. When the bucket is at capacity, any further emails are discarded. It is emptied at a rate of one email every six minutes.
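A sketch of that email example, assuming a 10-email bucket that drains one email every six minutes (360 seconds):

```python
import time

class LeakyBucket:
    """Sketch of the example above: the bucket holds 10 emails and
    drains at a rate of one email every six minutes."""

    def __init__(self, capacity: int = 10, drain_interval: float = 360.0):
        self.capacity = capacity
        self.drain_interval = drain_interval
        self.level = 0.0
        self.last_drain = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain the bucket at a steady pace since the last check.
        self.level = max(0.0, self.level - (now - self.last_drain) / self.drain_interval)
        self.last_drain = now
        if self.level < self.capacity:
            self.level += 1  # the new email fits in the bucket
            return True
        return False  # bucket is full: the email is discarded
```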

[Figure: Leaky Bucket Algorithm]

This algorithm is straightforward and intuitive to comprehend and apply. It enables a predetermined volume of data to be sent at a stable pace, which is advantageous for systems needing a continuous data stream. Nonetheless, the leaky bucket algorithm may be less precise than alternative methods in monitoring and enforcing rate limits, as it depends on a constant data transmission rate rather than a specific count of requests. 

Fixed Window Counter Algorithm

The Fixed Window Counter Algorithm works by dividing time into fixed intervals, or windows, such as a minute or an hour, and tallying the requests made within each one. When the request count surpasses a predetermined limit, all subsequent requests are throttled until the next window begins.

[Figure: Fixed Window Algorithm]

For example, an API implements the Fixed Window Counter Algorithm to limit the number of requests per minute to 50. If a user sends over 50 requests in a minute, further requests are blocked until the next minute starts.
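A sketch of that example, using integer division to assign each moment to a window and resetting the counter whenever the window changes:

```python
import time

# Sketch of the example above: at most 50 requests per one-minute window.
LIMIT = 50
WINDOW_SECONDS = 60

current_window = None
count = 0

def allow_request() -> bool:
    global current_window, count
    window = int(time.time() // WINDOW_SECONDS)
    if window != current_window:
        current_window, count = window, 0  # a new window begins
    count += 1
    return count <= LIMIT
```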

Sliding-Window Rate Limiting

Sliding-window rate limiting algorithms function similarly to fixed-window algorithms but differ in how they define the beginning of each time frame. In sliding-window rate limiting, the period commences only when a user submits a new request, rather than adhering to a set schedule. 

For example, if the initial request is made at 9:00:24 am and the limit is 200 requests per minute, the server permits up to 200 requests until 9:01:24 am.

[Figure: Sliding Window Algorithm]

Sliding-window algorithms help solve the issues affecting requests in fixed-window rate limiting. They also mitigate the starvation issue facing leaky bucket rate limiting by providing more flexibility.
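One common way to realise this is a sliding log of request timestamps; a sketch, assuming the 200-requests-per-minute limit from the example:

```python
import time
from collections import deque

# Sketch of sliding-window rate limiting: up to 200 requests in any
# rolling 60-second period, measured back from each new request.
LIMIT = 200
WINDOW_SECONDS = 60

timestamps = deque()

def allow_request() -> bool:
    now = time.monotonic()
    # Drop requests that have slid out of the window.
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) < LIMIT:
        timestamps.append(now)
        return True
    return False
```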

Why is rate limiting important?

API rate limiting helps ensure the performance and stability of an API system by preventing downtime, slow responses, and malicious attacks. It can also prevent accidental or unintentional misuse of an API, which helps mitigate security risks and potential data loss.

By implementing rate limiting, businesses can protect their valuable resources and data — all while providing reliable performance for their users. There are numerous key benefits to rate-limiting, including the following:

Prevents DoS (Denial-of-Service) Attacks

Rate limiting is often used to protect against DoS (denial-of-service) attacks, which are designed to overwhelm a network with a high volume of requests, leaving the resource unavailable to legitimate users. Limiting requests makes it harder for potential attackers to execute such an attack.

Mitigates Credential Stuffing

Credential stuffing is another type of attack, in which attackers obtain databases of compromised user credentials. Bots then submit hundreds, if not thousands, of these credentials to a login form to gain access to accounts. Rate limiting restricts the volume of login attempts, helping to block bots before they can access an account.

Prevents Abuse

Not only does rate limiting help prevent the previously mentioned attacks, it also stops a single user or group from monopolising a resource. As a result, it ensures that the resource remains available to all users and prevents excessive requests, which are wasteful and impact performance.

Improves User Experience

User experience is of great importance to organisations, and rate limiting helps enhance it by keeping the rate of requests within bounds. As a result, it can improve the responsiveness of a network or server by reducing delays, which is important for applications that require real-time or near-real-time responses.

Reduces Costs

Implementing rate limiting helps users avoid incurring extra costs by preventing the overuse of a resource. If a particular resource experiences a high volume of requests, it requires additional capacity to handle the load, which can result in additional costs. By reducing the demand and rate of requests, rate limiting avoids the need for that extra capacity.

Cyclr and API Rate Limiting

Cyclr’s Rate Limits can be specified in two ways – which you use depends on which better fits the limits of the API you’re working with.

Maximum Requests Per Time Frame

This allows you to specify the number of calls Cyclr can make within a set period of time.

For instance:

Up to 1,000 Requests every 1 Second

Up to 10,000 Requests every 60 Minutes

The smallest Time Unit available here is Seconds.

Time Between Requests

This allows you to have Cyclr wait between each call.

For instance:

1 Request every 500 Milliseconds

1 Request every 5 Seconds

The smallest Time Unit available here is Milliseconds.
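For intuition, the two styles can be mimicked with simple client-side throttles; these are illustrative sketches, not Cyclr’s own implementation:

```python
import time

def max_per_time_frame(calls, limit=1000, frame_seconds=1.0):
    """Up to `limit` calls per time frame, e.g. 1,000 requests every 1 second."""
    start, sent = time.monotonic(), 0
    for call in calls:
        if sent >= limit:
            # Wait out the remainder of the current frame, then start a new one.
            time.sleep(max(0.0, frame_seconds - (time.monotonic() - start)))
            start, sent = time.monotonic(), 0
        call()
        sent += 1

def time_between_requests(calls, gap_seconds=0.5):
    """A fixed wait between calls, e.g. 1 request every 500 milliseconds."""
    for call in calls:
        call()
        time.sleep(gap_seconds)
```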

We continually add new rate limit options, and if you want to be the first to know when they are announced, sign up to our newsletter.


About Author


Hayley Brown

Joined Cyclr in 2020 after working in marketing teams in the eCommerce and education industries. She graduated from Sussex University and has a keen interest in graphic design and content writing. Follow Hayley on LinkedIn
