Rate Limiting: How to Handle It When Web Scraping

Rate limiting is the most basic yet effective anti-scraping measure websites deploy. It restricts how many requests a client can make within a time window. When you exceed the limit, you get blocked — temporarily or permanently.

Unlike sophisticated bot detection that analyzes browser fingerprints and behavior, rate limiting is purely about request volume. This makes it predictable and manageable, but only if you handle it correctly.

How Rate Limiting Works

Server-Side Implementation

Websites track requests using various identifiers:

Rate limit key: IP address + endpoint
Window: 60 seconds
Limit: 100 requests

Request 1-100: → 200 OK
Request 101:   → 429 Too Many Requests
               Retry-After: 60
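
A minimal sketch of how a server might implement the fixed-window counting shown above. The in-memory dictionary and key layout are assumptions for illustration; real deployments typically keep these counters in a shared store such as Redis.

import time

# Fixed-window counter keyed by (client IP, endpoint) -- illustrative only
WINDOW_SECONDS = 60
LIMIT = 100
counters = {}  # key -> (window_start, request_count)

def check_rate_limit(client_ip, endpoint):
    """Return (allowed, retry_after_seconds) for one incoming request."""
    key = (client_ip, endpoint)
    now = time.time()
    window_start, count = counters.get(key, (now, 0))

    # Start a fresh window once the old one expires
    if now - window_start >= WINDOW_SECONDS:
        window_start, count = now, 0

    if count >= LIMIT:
        retry_after = int(WINDOW_SECONDS - (now - window_start)) + 1
        return False, retry_after  # respond 429 with Retry-After

    counters[key] = (window_start, count + 1)
    return True, None  # respond 200 and count the request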

Common Rate Limit Strategies

Strategy       | How It Works                         | Example
Fixed window   | Count resets at fixed intervals      | 100 requests per minute, resets at :00
Sliding window | Rolling time window                  | 100 requests in any 60-second period
Token bucket   | Tokens replenish at fixed rate       | 10 tokens/second, burst up to 100
Leaky bucket   | Requests processed at constant rate  | Queue processed at 2 requests/second
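
The token bucket appears again later as a client-side implementation; for contrast, here is a minimal sketch of the leaky bucket from the last row, where requests queue up and are drained at a constant rate. The class name and parameters are illustrative assumptions, not a real library API.

import time
from collections import deque

class LeakyBucket:
    def __init__(self, drain_rate=2, capacity=10):
        self.drain_interval = 1.0 / drain_rate  # seconds between processed requests
        self.capacity = capacity                # queued requests allowed before rejecting
        self.queue = deque()

    def submit(self, request_id):
        """Accept a request into the queue, or reject it if the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False  # overflow -- a server would respond 429 here
        self.queue.append(request_id)
        return True

    def drain(self):
        """Process queued requests at a constant rate."""
        while self.queue:
            request_id = self.queue.popleft()
            print(f"processing {request_id}")
            time.sleep(self.drain_interval)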

Rate Limit Identifiers

Websites may rate-limit by:

  • IP address (most common)
  • IP subnet (/24 range)
  • User agent
  • Session/cookie
  • API key
  • Account
  • Endpoint (different limits per URL pattern)
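
Several of these identifiers are often combined into a single rate-limit key. A rough sketch of how such a composite key might be built (the function name and key layout are hypothetical, shown only to illustrate the idea):

import ipaddress
from urllib.parse import urlsplit

def rate_limit_key(client_ip, url, api_key=None):
    """Compose a rate-limit key from subnet, endpoint, and (optionally) API key."""
    # Collapse the client IP to its /24 subnet, as some sites do
    subnet = ipaddress.ip_network(f"{client_ip}/24", strict=False)

    # Different limits per URL pattern -> key on a path prefix
    path = urlsplit(url).path
    endpoint = "/".join(path.split("/")[:3])  # e.g. "/api/search"

    parts = [str(subnet), endpoint]
    if api_key:
        parts.append(api_key)
    return ":".join(parts)

print(rate_limit_key("203.0.113.45", "https://example.com/api/search?q=x"))
# -> 203.0.113.0/24:/api/search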

Detecting Rate Limits

HTTP Status Codes

def detect_rate_limit(response):
    """Identify rate-limit responses."""

    # Standard rate limit response
    if response.status_code == 429:
        retry_after = response.headers.get("Retry-After")
        return True, retry_after

    # Some sites use 403 for rate limits
    if response.status_code == 403:
        body = response.text.lower()
        if "rate limit" in body or "too many requests" in body:
            return True, None

    # Some use 503 with retry header
    if response.status_code == 503:
        retry_after = response.headers.get("Retry-After")
        if retry_after:
            return True, retry_after

    return False, None

Rate Limit Headers

Many APIs communicate limits through headers:

def parse_rate_limit_headers(response):
    """Extract rate limit info from response headers."""
    info = {}

    # Standard headers
    info["limit"] = response.headers.get("X-RateLimit-Limit")
    info["remaining"] = response.headers.get("X-RateLimit-Remaining")
    info["reset"] = response.headers.get("X-RateLimit-Reset")

    # Alternate hyphenation used by some APIs (e.g., X-Rate-Limit-*)
    if not info["limit"]:
        info["limit"] = response.headers.get("X-Rate-Limit-Limit")
        info["remaining"] = response.headers.get("X-Rate-Limit-Remaining")
        info["reset"] = response.headers.get("X-Rate-Limit-Reset")

    # Retry-After (seconds or HTTP date)
    info["retry_after"] = response.headers.get("Retry-After")

    return info

Content-Based Detection

Some sites return 200 but serve CAPTCHA or block pages instead of actual content:

def is_soft_blocked(response):
    """Detect soft blocks (200 status but no real content)."""
    if response.status_code != 200:
        return False

    content = response.text.lower()
    block_signals = [
        "please verify you are human",
        "access denied",
        "too many requests",
        "rate limit exceeded",
        "please try again later",
        "captcha",
        "checking your browser",
    ]

    return any(signal in content for signal in block_signals)

Handling Rate Limits

Strategy 1: Exponential Backoff

The standard approach when rate-limited:

import time
import random
from curl_cffi import requests

def request_with_backoff(session, url, max_retries=5):
    """Make request with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = session.get(url, timeout=30)

        if response.status_code == 200:
            return response

        if response.status_code == 429:
            # Use Retry-After if provided
            retry_after = response.headers.get("Retry-After")
            if retry_after and retry_after.isdigit():
                wait = int(retry_after)
            else:
                # Exponential backoff with jitter: ~1, 2, 4, 8, 16 seconds
                wait = (2 ** attempt) + random.uniform(0, 1)

            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            continue

        # Other errors
        return response

    raise Exception(f"Failed after {max_retries} retries: {url}")

Strategy 2: Proactive Rate Control

Don’t wait for 429s — control your request rate proactively:

import time
import threading
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_second=2):
        self.rate = requests_per_second
        self.interval = 1.0 / requests_per_second
        self.timestamps = deque()
        self.lock = threading.Lock()

    def wait(self):
        """Block until it's safe to make the next request."""
        with self.lock:
            now = time.time()

            # Remove timestamps older than 1 second
            while self.timestamps and now - self.timestamps[0] > 1.0:
                self.timestamps.popleft()

            # If at capacity, wait
            if len(self.timestamps) >= self.rate:
                sleep_time = 1.0 - (now - self.timestamps[0])
                if sleep_time > 0:
                    time.sleep(sleep_time)

            self.timestamps.append(time.time())

# Usage
limiter = RateLimiter(requests_per_second=2)

for url in urls:
    limiter.wait()  # Blocks if too fast
    response = session.get(url)

Strategy 3: Token Bucket

Allows bursting while maintaining long-term rate:

import time
import threading

class TokenBucket:
    def __init__(self, rate, capacity):
        """
        rate: tokens added per second
        capacity: maximum burst size
        """
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def consume(self, tokens=1):
        """Consume tokens, blocking if necessary."""
        with self.lock:
            self._refill()

            while self.tokens < tokens:
                # Wait for tokens to replenish
                wait_time = (tokens - self.tokens) / self.rate
                time.sleep(wait_time)
                self._refill()

            self.tokens -= tokens

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        new_tokens = elapsed * self.rate
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

# 5 requests/second sustained, burst up to 20
bucket = TokenBucket(rate=5, capacity=20)

for url in urls:
    bucket.consume()
    response = session.get(url)

Strategy 4: Concurrent Rate Limiting

For async or multi-threaded scrapers:

import asyncio
import aiohttp
from asyncio import Semaphore

class AsyncRateLimiter:
    def __init__(self, rate_per_second=5, max_concurrent=10):
        self.semaphore = Semaphore(max_concurrent)
        self.interval = 1.0 / rate_per_second
        self.lock = asyncio.Lock()
        self.last_start = 0.0

    async def acquire(self):
        await self.semaphore.acquire()
        # Space out request starts so the global rate stays at rate_per_second,
        # while the semaphore caps how many requests are in flight at once
        async with self.lock:
            loop = asyncio.get_running_loop()
            wait = self.last_start + self.interval - loop.time()
            if wait > 0:
                await asyncio.sleep(wait)
            self.last_start = loop.time()

    def release(self):
        self.semaphore.release()

async def scrape_async(urls, rate_per_second=5):
    limiter = AsyncRateLimiter(rate_per_second=rate_per_second)

    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            await limiter.acquire()
            try:
                async with session.get(url) as resp:
                    return {"url": url, "status": resp.status}
            finally:
                limiter.release()

        tasks = [fetch(url) for url in urls]
        return await asyncio.gather(*tasks)

# Fetch 1000 URLs at 5 requests/second
results = asyncio.run(scrape_async(urls, rate_per_second=5))

Combining Rate Limiting with IP Rotation

The most effective approach: rate-limit per IP while rotating across many IPs:

import time
import random
from curl_cffi import requests

class DistributedScraper:
    def __init__(self, proxy_gateway, username, password,
                 per_ip_rpm=30, total_rpm=300):
        self.gateway = proxy_gateway
        self.username = username
        self.password = password
        self.per_ip_rpm = per_ip_rpm
        self.total_interval = 60 / total_rpm
        self.run_id = int(time.time())  # fixed once so session IDs stay stable across requests

    def scrape(self, urls):
        results = []

        for i, url in enumerate(urls):
            # Rotate to a new sticky session (new IP) every per_ip_rpm requests
            session_id = f"s{i // self.per_ip_rpm}_{self.run_id}"

            session = requests.Session(impersonate="chrome120")
            session.proxies = {
                "http": f"http://{self.username}-session_{session_id}:{self.password}@{self.gateway}",
                "https": f"http://{self.username}-session_{session_id}:{self.password}@{self.gateway}"
            }

            resp = session.get(url, timeout=30)
            results.append({"url": url, "status": resp.status_code})

            # Rate limit: don't exceed total RPM
            delay = self.total_interval + random.uniform(0, 0.5)
            time.sleep(delay)

        return results

scraper = DistributedScraper(
    proxy_gateway="gate.proxy.com:7777",
    username="user",
    password="pass",
    per_ip_rpm=30,    # 30 requests per IP per minute
    total_rpm=300     # 300 total requests per minute (across all IPs)
)

results = scraper.scrape(urls)

Discovering Rate Limits

When scraping a new target, discover its limits:

import time
from curl_cffi import requests

def discover_rate_limit(url, proxy=None, max_requests=200):
    """Discover a site's rate limit by gradually increasing speed."""
    session = requests.Session(impersonate="chrome120")
    if proxy:
        session.proxies = {"http": proxy, "https": proxy}

    # Start with 1 req/s and increase
    rates = [1, 2, 5, 10, 20, 50]

    for rate in rates:
        interval = 1.0 / rate
        success = 0
        blocked = 0

        print(f"\nTesting {rate} req/s...")

        for i in range(min(rate * 10, max_requests)):
            resp = session.get(url)

            if resp.status_code == 200:
                success += 1
            elif resp.status_code in (429, 403, 503):
                blocked += 1
                if blocked >= 3:
                    break

            time.sleep(interval)

        block_rate = blocked / max(success + blocked, 1) * 100
        print(f"  Success: {success}, Blocked: {blocked} ({block_rate:.0f}%)")

        if block_rate > 20:
            safe_rate = rates[rates.index(rate) - 1] if rate > 1 else 0.5
            print(f"\nEstimated safe rate: {safe_rate} req/s per IP")
            return safe_rate

    print(f"\nNo rate limit detected up to {rates[-1]} req/s")
    return rates[-1]

Best Practices

  1. Start conservative — Begin at 1 request every 2-3 seconds and increase gradually
  2. Monitor response codes — Track your 200/429/403 ratio in real-time
  3. Respect Retry-After — Always honor the server’s suggested wait time
  4. Add jitter — Randomize delays to avoid periodic patterns
  5. Use per-endpoint limits — APIs often have different limits for different endpoints
  6. Implement circuit breakers — If you hit too many 429s, pause all requests for a cooldown period (see the sketch after this list)
  7. Cache responses — Never re-request pages you’ve already successfully scraped
  8. Use residential proxies — More IPs = more total capacity
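
A minimal circuit-breaker sketch for practice 6. The failure threshold and cooldown values are illustrative assumptions, not recommendations for any particular site.

import time

class CircuitBreaker:
    """Pause all requests after too many rate-limit responses in a row."""

    def __init__(self, max_failures=5, cooldown=300):
        self.max_failures = max_failures  # consecutive 429/403s before opening
        self.cooldown = cooldown          # seconds to pause once open
        self.failures = 0
        self.opened_at = None

    def before_request(self):
        """Call before each request; blocks while the breaker is open."""
        if self.opened_at is not None:
            remaining = self.cooldown - (time.time() - self.opened_at)
            if remaining > 0:
                print(f"Circuit open, cooling down {remaining:.0f}s")
                time.sleep(remaining)
            self.opened_at = None
            self.failures = 0

    def record(self, status_code):
        """Call after each response to update the breaker state."""
        if status_code in (429, 403):
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
        else:
            self.failures = 0

# Usage: breaker.before_request(); resp = session.get(url); breaker.record(resp.status_code)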

FAQ

What’s the difference between rate limiting and IP banning?

Rate limiting is temporary and threshold-based — exceed X requests per minute, get blocked for Y minutes. Once the window resets, you can make requests again. IP banning is a persistent block that doesn’t automatically expire. Rate limiting is a warning; an IP ban is the consequence of ignoring too many warnings. See our IP ban bypass guide for handling bans.

How do I know if a site has rate limits?

Make requests at increasing speeds and monitor status codes. Rate limits typically manifest as 429 (Too Many Requests) responses with a Retry-After header. Some sites use 403 or 503 instead. You can also check the site’s API documentation, robots.txt, or response headers for X-RateLimit-* values.

Should I use delays between requests even with proxy rotation?

Yes. Even with IP rotation, making requests too fast can trigger site-wide rate limits (which apply across all IPs). Additionally, extremely rapid requests from different IPs but the same session or fingerprint are suspicious. A 1-3 second delay per request is safe for most sites.

Can rate limits apply to entire subnets?

Yes. Some sites rate-limit /24 subnets (256 IPs) as a unit. This is particularly problematic with datacenter proxies, where many IPs share the same subnet. Residential proxies from diverse geographic locations avoid this issue.

What’s the safest rate for scraping most websites?

As a general rule, 10-20 requests per minute per IP is safe for most websites. For APIs, check their documented limits. For heavily protected sites (social media, major e-commerce), 5-10 requests per minute per IP is safer. Combine with IP rotation to multiply your effective rate.

Conclusion

Rate limiting is the most predictable anti-scraping measure, and the easiest to handle properly. Build rate awareness into your scraper from day one — proactive rate control is always cheaper than recovering from bans. Combine thoughtful rate limiting with IP rotation and user-agent rotation for a sustainable scraping operation.
