Rate Limiting: How to Handle It When Web Scraping
Rate limiting is the most basic yet effective anti-scraping measure websites deploy. It restricts how many requests a client can make within a time window. When you exceed the limit, you get blocked — temporarily or permanently.
Unlike sophisticated bot detection that analyzes browser fingerprints and behavior, rate limiting is purely about request volume. This makes it predictable and manageable, but only if you handle it correctly.
How Rate Limiting Works
Server-Side Implementation
Websites track requests using various identifiers:
```
Rate limit key: IP address + endpoint
Window:         60 seconds
Limit:          100 requests

Request 1-100:  → 200 OK
Request 101:    → 429 Too Many Requests
                  Retry-After: 60
```

Common Rate Limit Strategies
| Strategy | How It Works | Example |
|---|---|---|
| Fixed window | Count resets at fixed intervals | 100 requests per minute, resets at :00 |
| Sliding window | Rolling time window | 100 requests in any 60-second period |
| Token bucket | Tokens replenish at fixed rate | 10 tokens/second, burst up to 100 |
| Leaky bucket | Requests processed at constant rate | Queue processed at 2 requests/second |
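To make the first row of the table concrete, here is a minimal sketch of how a server-side fixed-window counter works (class name and limits are illustrative, not taken from any particular site):

```python
import time

class FixedWindowCounter:
    """Minimal fixed-window rate limiter: the count resets at each interval."""

    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.time()
        self.count = 0

    def allow(self):
        now = time.time()
        # A new window has begun: reset the counter
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # the server would answer 429 here
```

The fixed window is the easiest to exploit from the client side: a burst at the end of one window plus a burst at the start of the next can briefly double the effective rate, which is exactly why many sites prefer sliding windows or token buckets.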
Rate Limit Identifiers
Websites may rate-limit by:
- IP address (most common)
- IP subnet (/24 range)
- User agent
- Session/cookie
- API key
- Account
- Endpoint (different limits per URL pattern)
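On the scraping side, it can pay to mirror this keying: keep an independent delay per (host, endpoint) pair rather than one global delay. A sketch, with `PerKeyLimiter` and the host-plus-first-path-segment key being my own illustrative choices:

```python
import time
from collections import defaultdict
from urllib.parse import urlparse

class PerKeyLimiter:
    """Enforce a minimum interval per rate-limit key (host + first path segment)."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self.last_request = defaultdict(float)  # key -> timestamp of last request

    def key_for(self, url):
        parts = urlparse(url)
        first_segment = parts.path.strip("/").split("/")[0]
        return (parts.hostname, first_segment)

    def wait(self, url):
        key = self.key_for(url)
        elapsed = time.time() - self.last_request[key]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request[key] = time.time()
```

With this keying, requests to `/products/...` and `/search/...` on the same host are throttled independently, matching sites that apply different limits per URL pattern.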
Detecting Rate Limits
HTTP Status Codes
```python
def detect_rate_limit(response):
    """Identify rate-limit responses."""
    # Standard rate limit response
    if response.status_code == 429:
        retry_after = response.headers.get("Retry-After")
        return True, retry_after

    # Some sites use 403 for rate limits
    if response.status_code == 403:
        body = response.text.lower()
        if "rate limit" in body or "too many requests" in body:
            return True, None

    # Some use 503 with retry header
    if response.status_code == 503:
        retry_after = response.headers.get("Retry-After")
        if retry_after:
            return True, retry_after

    return False, None
```

Rate Limit Headers
Many APIs communicate limits through headers:
```python
def parse_rate_limit_headers(response):
    """Extract rate limit info from response headers."""
    info = {}

    # Standard headers
    info["limit"] = response.headers.get("X-RateLimit-Limit")
    info["remaining"] = response.headers.get("X-RateLimit-Remaining")
    info["reset"] = response.headers.get("X-RateLimit-Reset")

    # GitHub-style
    if not info["limit"]:
        info["limit"] = response.headers.get("X-Rate-Limit-Limit")
        info["remaining"] = response.headers.get("X-Rate-Limit-Remaining")
        info["reset"] = response.headers.get("X-Rate-Limit-Reset")

    # Retry-After (seconds or HTTP date)
    info["retry_after"] = response.headers.get("Retry-After")
    return info
```

Content-Based Detection
Some sites return 200 but serve CAPTCHA or block pages instead of actual content:
```python
def is_soft_blocked(response):
    """Detect soft blocks (200 status but no real content)."""
    if response.status_code != 200:
        return False

    content = response.text.lower()
    block_signals = [
        "please verify you are human",
        "access denied",
        "too many requests",
        "rate limit exceeded",
        "please try again later",
        "captcha",
        "checking your browser",
    ]
    return any(signal in content for signal in block_signals)
```

Handling Rate Limits
Strategy 1: Exponential Backoff
The standard approach when rate-limited:
```python
import time
import random

from curl_cffi import requests

def request_with_backoff(session, url, max_retries=5):
    """Make request with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = session.get(url, timeout=30)

        if response.status_code == 200:
            return response

        if response.status_code == 429:
            # Use Retry-After if provided; it may be an HTTP date
            # rather than seconds, so fall back to backoff if it
            # doesn't parse as an integer
            retry_after = response.headers.get("Retry-After")
            try:
                wait = int(retry_after)
            except (TypeError, ValueError):
                # Exponential backoff with jitter: ~1, 2, 4, 8, 16 seconds
                wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait:.1f}s (attempt {attempt + 1})")
            time.sleep(wait)
            continue

        # Other errors
        return response

    raise Exception(f"Failed after {max_retries} retries: {url}")
```

Strategy 2: Proactive Rate Control
Don’t wait for 429s — control your request rate proactively:
```python
import time
import threading
from collections import deque

class RateLimiter:
    def __init__(self, requests_per_second=2):
        self.rate = requests_per_second
        self.interval = 1.0 / requests_per_second
        self.timestamps = deque()
        self.lock = threading.Lock()

    def wait(self):
        """Block until it's safe to make the next request."""
        with self.lock:
            now = time.time()

            # Remove timestamps older than 1 second
            while self.timestamps and now - self.timestamps[0] > 1.0:
                self.timestamps.popleft()

            # If at capacity, wait
            if len(self.timestamps) >= self.rate:
                sleep_time = 1.0 - (now - self.timestamps[0])
                if sleep_time > 0:
                    time.sleep(sleep_time)

            self.timestamps.append(time.time())

# Usage
limiter = RateLimiter(requests_per_second=2)
for url in urls:
    limiter.wait()  # Blocks if too fast
    response = session.get(url)
```

Strategy 3: Token Bucket
Allows bursting while maintaining long-term rate:
```python
import time
import threading

class TokenBucket:
    def __init__(self, rate, capacity):
        """
        rate: tokens added per second
        capacity: maximum burst size
        """
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def consume(self, tokens=1):
        """Consume tokens, blocking if necessary."""
        with self.lock:
            self._refill()
            while self.tokens < tokens:
                # Wait for tokens to replenish
                wait_time = (tokens - self.tokens) / self.rate
                time.sleep(wait_time)
                self._refill()
            self.tokens -= tokens

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        new_tokens = elapsed * self.rate
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

# 5 requests/second sustained, burst up to 20
bucket = TokenBucket(rate=5, capacity=20)
for url in urls:
    bucket.consume()
    response = session.get(url)
```

Strategy 4: Concurrent Rate Limiting
For async or multi-threaded scrapers:
```python
import asyncio
import aiohttp
from asyncio import Semaphore

class AsyncRateLimiter:
    def __init__(self, rate_per_second=5, max_concurrent=10):
        self.semaphore = Semaphore(max_concurrent)
        self.rate = rate_per_second
        self.interval = 1.0 / rate_per_second

    async def acquire(self):
        await self.semaphore.acquire()
        await asyncio.sleep(self.interval)

    def release(self):
        self.semaphore.release()

async def scrape_async(urls, rate_per_second=5):
    limiter = AsyncRateLimiter(rate_per_second=rate_per_second)
    async with aiohttp.ClientSession() as session:
        async def fetch(url):
            await limiter.acquire()
            try:
                async with session.get(url) as resp:
                    return {"url": url, "status": resp.status}
            finally:
                limiter.release()

        tasks = [fetch(url) for url in urls]
        return await asyncio.gather(*tasks)

# Fetch 1000 URLs at 5 requests/second
results = asyncio.run(scrape_async(urls, rate_per_second=5))
```

Combining Rate Limiting with IP Rotation
The most effective approach: rate-limit per IP while rotating across many IPs:
```python
import time
import random

from curl_cffi import requests

class DistributedScraper:
    def __init__(self, proxy_gateway, username, password,
                 per_ip_rpm=30, total_rpm=300):
        self.gateway = proxy_gateway
        self.username = username
        self.password = password
        self.per_ip_rpm = per_ip_rpm
        self.per_ip_interval = 60 / per_ip_rpm
        self.total_interval = 60 / total_rpm

    def scrape(self, urls):
        results = []
        for i, url in enumerate(urls):
            # Create new session (new IP) periodically
            session_id = f"s{i // self.per_ip_rpm}_{int(time.time())}"
            session = requests.Session(impersonate="chrome120")
            session.proxies = {
                "http": f"http://{self.username}-session_{session_id}:{self.password}@{self.gateway}",
                "https": f"http://{self.username}-session_{session_id}:{self.password}@{self.gateway}",
            }

            resp = session.get(url, timeout=30)
            results.append({"url": url, "status": resp.status_code})

            # Rate limit: don't exceed total RPM
            delay = self.total_interval + random.uniform(0, 0.5)
            time.sleep(delay)
        return results

scraper = DistributedScraper(
    proxy_gateway="gate.proxy.com:7777",
    username="user",
    password="pass",
    per_ip_rpm=30,   # 30 requests per IP per minute
    total_rpm=300,   # 300 total requests per minute (across all IPs)
)
results = scraper.scrape(urls)
```

Discovering Rate Limits
When scraping a new target, discover its limits:
```python
import time

from curl_cffi import requests

def discover_rate_limit(url, proxy=None, max_requests=200):
    """Discover a site's rate limit by gradually increasing speed."""
    session = requests.Session(impersonate="chrome120")
    if proxy:
        session.proxies = {"http": proxy, "https": proxy}

    # Start with 1 req/s and increase
    rates = [1, 2, 5, 10, 20, 50]
    for rate in rates:
        interval = 1.0 / rate
        success = 0
        blocked = 0
        print(f"\nTesting {rate} req/s...")

        for i in range(min(rate * 10, max_requests)):
            resp = session.get(url)
            if resp.status_code == 200:
                success += 1
            elif resp.status_code in (429, 403, 503):
                blocked += 1
                if blocked >= 3:
                    break
            time.sleep(interval)

        # Guard against division by zero when every response is some other code
        block_rate = blocked / max(success + blocked, 1) * 100
        print(f"  Success: {success}, Blocked: {blocked} ({block_rate:.0f}%)")

        if block_rate > 20:
            safe_rate = rates[rates.index(rate) - 1] if rate > 1 else 0.5
            print(f"\nEstimated safe rate: {safe_rate} req/s per IP")
            return safe_rate

    print(f"\nNo rate limit detected up to {rates[-1]} req/s")
    return rates[-1]
```

Best Practices
- Start conservative — Begin at 1 request every 2-3 seconds and increase gradually
- Monitor response codes — Track your 200/429/403 ratio in real-time
- Respect Retry-After — Always honor the server’s suggested wait time
- Add jitter — Randomize delays to avoid periodic patterns
- Use per-endpoint limits — APIs often have different limits for different endpoints
- Implement circuit breakers — If you hit too many 429s, pause all requests for a cooldown period
- Cache responses — Never re-request pages you’ve already successfully scraped
- Use residential proxies — More IPs = more total capacity
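The circuit-breaker idea above can be sketched as a small state machine; the class name and thresholds here are illustrative choices, not a standard API:

```python
import time

class CircuitBreaker:
    """Pause all requests after too many consecutive 429s, then retry after a cooldown."""

    def __init__(self, failure_threshold=5, cooldown_seconds=120):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (requests allowed)

    def allow(self):
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the circuit and try again
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, status_code):
        if status_code == 429:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()  # trip: block requests for the cooldown
        else:
            self.failures = 0  # any success resets the streak
```

The scraper calls `allow()` before each request (skipping or sleeping when it returns False) and `record()` after each response, so a burst of 429s pauses the whole run instead of burning through retries.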
FAQ
What’s the difference between rate limiting and IP banning?
Rate limiting is temporary and threshold-based — exceed X requests per minute, get blocked for Y minutes. Once the window resets, you can make requests again. IP banning is a persistent block that doesn’t automatically expire. Rate limiting is a warning; an IP ban is the consequence of ignoring too many warnings. See our IP ban bypass guide for handling bans.
How do I know if a site has rate limits?
Make requests at increasing speeds and monitor status codes. Rate limits typically manifest as 429 (Too Many Requests) responses with a Retry-After header. Some sites use 403 or 503 instead. You can also check the site’s API documentation, robots.txt, or response headers for X-RateLimit-* values.
Should I use delays between requests even with proxy rotation?
Yes. Even with IP rotation, making requests too fast can trigger site-wide rate limits (which apply across all IPs). Additionally, extremely rapid requests from different IPs but the same session or fingerprint are suspicious. A 1-3 second delay per request is safe for most sites.
Can rate limits apply to entire subnets?
Yes. Some sites rate-limit /24 subnets (256 IPs) as a unit. This is particularly problematic with datacenter proxies, where many IPs share the same subnet. Residential proxies from diverse geographic locations avoid this issue.
What’s the safest rate for scraping most websites?
As a general rule, 10-20 requests per minute per IP is safe for most websites. For APIs, check their documented limits. For heavily protected sites (social media, major e-commerce), 5-10 requests per minute per IP is safer. Combine with IP rotation to multiply your effective rate.
Conclusion
Rate limiting is the most predictable anti-scraping measure, and the easiest to handle properly. Build rate awareness into your scraper from day one — proactive rate control is always cheaper than recovering from bans. Combine thoughtful rate limiting with IP rotation and user-agent rotation for a sustainable scraping operation.
Useful Resources
- Cloudflare Rate Limiting Documentation
- How Websites Detect Bots
- 403 Forbidden Fix Guide
- Bypass IP Bans
Related Reading
- 403 Forbidden in Web Scraping: How to Fix It
- Best CAPTCHA Solving Services in 2026: Complete Comparison
- Anti-Phishing with Proxies: How Security Teams Use Mobile IPs
- Brand Protection with Proxies: Detect Counterfeit Sellers & Trademark Violations
- How Cybersecurity Teams Use Proxies for Threat Intelligence
- Using Mobile Proxies for Dark Web Monitoring and Research