Implementing Rate Limiting with Redis and Node.js

By Felix Hassan
Guide · Architecture & Patterns · nodejs · redis · api-security · backend-development · rate-limiting

A developer at a growing fintech startup watches their dashboard in real-time. Suddenly, a single IP address begins hitting a specific API endpoint 500 times per second. The database CPU spikes to 95%. Without a mechanism to throttle these requests, the entire application hangs, and legitimate users are locked out. This is the reality of unprotected APIs.

This post breaks down how to build a rate-limiting system using Node.js and Redis. We'll look at the logic behind the "Fixed Window" algorithm, how to implement it in code, and why Redis is the right tool for the job. You'll walk away with a functional pattern to protect your backend from basic DoS attacks or aggressive scrapers.

Why Should You Use Redis for Rate Limiting?

Redis is the ideal choice because its in-memory nature allows for extremely low-latency read and write operations. When you're checking a user's request count, you can't afford to wait 50ms for a traditional database response—that would defeat the purpose of protecting the app. Redis provides the speed needed to keep your middleware overhead minimal.

Most developers start with an in-memory object inside their Node.js process to track hits. It works fine for a single instance. But the moment you scale to two or more containers or serverless functions, that local memory becomes useless. Each instance has its own isolated count, meaning a user could bypass your limits by hitting different pods. Redis acts as a centralized, single source of truth that all your application instances can talk to simultaneously.

Here is why Redis beats other options for this specific task:

  • Atomic Operations: Using commands like INCR ensures that even with thousands of concurrent requests, your counters stay accurate without race conditions.
  • TTL (Time To Live): You can set keys to expire automatically. This means the "window" resets itself without you having to write cleanup scripts.
  • Data Structures: Beyond simple strings, you can use sorted sets for more complex "Sliding Window" logic.
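To make the INCR-plus-TTL behavior concrete, here's a minimal in-memory sketch of the semantics (a plain `Map` standing in for Redis; on the server the real commands are atomic, which this sketch doesn't attempt to reproduce):

```javascript
// Hypothetical stand-in for Redis INCR + EXPIRE, showing how a TTL
// makes each window reset itself with no cleanup scripts.
const store = new Map(); // key -> { count, expiresAt }

function incrWithTtl(key, ttlSeconds, now = Date.now()) {
  const entry = store.get(key);
  if (!entry || now >= entry.expiresAt) {
    // First hit in a fresh window: INCR creates the key at 1,
    // and EXPIRE schedules its removal.
    store.set(key, { count: 1, expiresAt: now + ttlSeconds * 1000 });
    return 1;
  }
  entry.count += 1; // subsequent INCRs just bump the counter
  return entry.count;
}
```

Calling `incrWithTtl('a', 60, 0)` returns 1, a second call inside the window returns 2, and a call after the expiry time starts a fresh window at 1.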

If you're already managing high-traffic workloads, you might want to check out how to avoid N+1 query loops to keep your database healthy. Rate limiting is just one part of the larger battle for performance.

How Do You Implement a Fixed Window Rate Limiter in Node.js?

You implement a fixed window rate limiter by using the INCR command followed by an EXPIRE command on a key tied to a specific user ID or IP address. This creates a time-bound window that resets once the key expires.

Let's look at a practical implementation. We'll use the ioredis library because it's reliable and handles many edge cases out of the box. You'll need a running Redis instance—I usually use Redis OSS or a managed service like AWS ElastiCache for production environments.


const Redis = require('ioredis');
const redis = new Redis(); // Connects to localhost:6379 by default

async function rateLimiterMiddleware(req, res, next) {
  const ip = req.ip;
  const windowSeconds = 60; // length of the fixed window, in seconds
  const maxRequests = 10;
  const key = `rate_limit:${ip}`;

  try {
    // Atomically increment the count for this IP
    const currentRequests = await redis.incr(key);

    // If this is the first request in the window, set the expiration.
    // Note: if the process dies between INCR and EXPIRE, the key can
    // linger without a TTL; a Lua script or SET ... EX NX avoids this.
    if (currentRequests === 1) {
      await redis.expire(key, windowSeconds);
    }

    // Check if the user exceeded the limit
    if (currentRequests > maxRequests) {
      return res.status(429).json({
        error: 'Too many requests. Please try again later.',
        retryAfter: windowSeconds
      });
    }

    next();
  } catch (err) {
    console.error('Redis Error:', err);
    // If Redis fails, fail open: let the request through rather than
    // breaking the app because the cache went offline
    next();
  }
}

The code above is a basic starting point. It's simple, but it has a flaw: if the window is 60 seconds, the count resets exactly 60 seconds after the first request. It doesn't account for the "edge" of the window perfectly. For most applications, this is fine. For high-stakes financial APIs, you might need a more sophisticated approach.

One thing to keep in mind: if your Redis connection drops, the catch block ensures your API doesn't just stop working. It's better to let a few extra requests through than to crash your entire service because a cache went offline. This is a form of building resilient systems through graceful degradation.
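That fail-open behavior can be factored out into a small wrapper so every limiter check degrades the same way. A minimal sketch (the helper name is my own, not from any library):

```javascript
// Hypothetical helper: wrap any async limiter check so that an
// infrastructure failure (e.g. Redis down) fails open instead of
// rejecting legitimate traffic.
async function checkLimitFailOpen(limiterFn) {
  try {
    return await limiterFn(); // true = allowed, false = over limit
  } catch (err) {
    console.error('Limiter backend error, failing open:', err.message);
    return true; // degrade gracefully: allow the request
  }
}
```

The trade-off is deliberate: while the cache is down you have no rate limiting at all, so for abuse-sensitive endpoints you might invert this and fail closed instead.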

What is the Difference Between Fixed Window and Sliding Window?

The main difference is that Fixed Window resets at specific time intervals (like the start of a new minute), while Sliding Window tracks the exact timestamp of every request to ensure the limit is never exceeded in any rolling 60-second period.

| Feature        | Fixed Window                 | Sliding Window              |
| -------------- | ---------------------------- | --------------------------- |
| Complexity     | Very low                     | Moderate to high            |
| Memory usage   | Minimal (one key per user)   | Higher (stores timestamps)  |
| Precision      | Can allow bursts at the edge | Extremely precise           |
| Implementation | Simple INCR and EXPIRE       | Requires sorted sets (ZSET) |

A Fixed Window can be "gamed." If a user makes 10 requests at 00:59 and another 10 requests at 01:01, they've effectively made 20 requests in a two-second span. If your backend can't handle that burst, you'll need to move to a Sliding Window algorithm. This uses a Redis Sorted Set where each member is a timestamp. You prune the old timestamps and count the remaining ones on every request.

Implementing the Sliding Window (The Pro Way)

If you need more precision, you'll use the ZREMRANGEBYSCORE and ZADD commands. This approach is more expensive in terms of CPU and memory, but it provides a much smoother experience for the end user.

  1. Generate a unique key for the user/IP.
  2. Get the current timestamp (in milliseconds).
  3. Remove all elements in the Sorted Set that are older than (Current Time - Window Duration).
  4. Count the remaining elements in the set.
  5. If the count is below the limit, add the current timestamp to the set.
  6. If the count is over the limit, reject the request.
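In Redis, those steps map to ZREMRANGEBYSCORE, ZCARD, and ZADD (ideally batched in a MULTI block or Lua script so they run atomically). As a language-level sketch of the same algorithm, with a plain array standing in for the per-user sorted set:

```javascript
// Sketch of the sliding-window steps above; the timestamps array
// plays the role of a Redis sorted set keyed per user/IP.
const windows = new Map(); // key -> array of request timestamps (ms)

function slidingWindowAllow(key, limit, windowMs, now = Date.now()) {
  const timestamps = windows.get(key) || [];
  // Step 3: drop entries older than the rolling window
  // (ZREMRANGEBYSCORE key -inf now-windowMs in Redis)
  const recent = timestamps.filter((t) => t > now - windowMs);
  // Step 4: count what's left (ZCARD)
  if (recent.length >= limit) {
    windows.set(key, recent);
    return false; // Step 6: over the limit, reject
  }
  // Step 5: record this request (ZADD key now now)
  recent.push(now);
  windows.set(key, recent);
  return true;
}
```

The rolling filter is what kills the burst-at-the-edge problem: no matter where the window boundary falls, the count always covers exactly the last `windowMs` milliseconds.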

It's a bit more code, but it prevents the "burst at the edge" problem entirely. It ensures that at any given point in time, the user hasn't exceeded the threshold in the last X seconds.

Don't forget to handle the 429 Too Many Requests status code correctly. It's a standard HTTP response that tells the client exactly what happened. Most well-behaved scrapers and legitimate clients will see this and back off. If you don't include a Retry-After header, you're making it harder for legitimate clients to know when to try again.
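A sketch of deriving Retry-After from the key's remaining lifetime (in the middleware you'd get this from `await redis.ttl(key)` before rejecting; the helper name and shape here are my own):

```javascript
// Hypothetical helper: build the 429 payload and headers from the
// remaining window time reported by Redis TTL.
function buildRejection(ttlSeconds, windowSeconds) {
  // TTL returns -1/-2 when the key has no expiry or is gone;
  // fall back to the full window length in that case.
  const retryAfter = ttlSeconds > 0 ? ttlSeconds : windowSeconds;
  return {
    status: 429,
    headers: { 'Retry-After': String(retryAfter) },
    body: { error: 'Too many requests. Please try again later.', retryAfter }
  };
}
```

Sending the real remaining TTL instead of a fixed number means a client that hits the limit late in the window isn't told to wait longer than it has to.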

Testing your rate limiter is just as important as writing it. I always use a tool like autocannon or even a simple Bash loop to hit my local endpoint and verify the 429 responses trigger exactly when expected. You don't want to find out your limit is set to 10,000 instead of 10 when you're already in production.
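Before load-testing the live endpoint, you can sanity-check the limiter math itself with no server at all. A throwaway sketch, with a simple counter standing in for the Redis key within one window:

```javascript
// Drive a burst through fixed-window counting logic and record the
// status each request would get. In a real test you'd point
// autocannon or fetch() at the running endpoint instead.
function simulateBurst(totalRequests, maxRequests) {
  let count = 0; // stands in for the Redis counter within one window
  const statuses = [];
  for (let i = 0; i < totalRequests; i++) {
    count += 1; // INCR
    statuses.push(count > maxRequests ? 429 : 200);
  }
  return statuses;
}
```

With a limit of 10, a burst of 15 should produce exactly ten 200s followed by five 429s; if it doesn't, your threshold or comparison is off before any network testing even starts.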