How to Bypass IP Bans When Web Scraping
IP bans are the most basic and most common anti-scraping measure websites use. When a site detects suspicious activity from your IP address, it blocks all further requests from that IP — sometimes temporarily, sometimes permanently.
This guide covers how IP bans work, how to detect them, techniques to bypass them, and strategies to prevent them in the first place.
How IP Bans Work
Websites implement IP bans through several mechanisms:
Server-Level Blocks
Direct blocks at the web server (Nginx, Apache) or firewall level:
# Nginx IP block example
deny 203.0.113.0/24;   # Block entire subnet
deny 198.51.100.42;    # Block single IP
CDN/WAF Blocks
Services like Cloudflare, Akamai, and AWS WAF maintain IP reputation databases and can block IPs across their entire network:
- Cloudflare: Blocks based on IP reputation, ASN, and country
- Akamai: Uses historical threat data to score and block IPs
- AWS WAF: Custom rules based on IP sets and rate limiting
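Whether a rule lives in an Nginx `deny` directive or a WAF IP set, the underlying check is a CIDR membership test. The following is a minimal sketch of that test using Python's standard `ipaddress` module; the `BLOCKED_NETWORKS` list and the `is_blocked` helper are illustrative names, not part of any vendor's API, and the ranges are the documentation-only addresses from the Nginx example above.

```python
import ipaddress

# Illustrative rules mirroring the Nginx example (documentation-only ranges).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # entire subnet
    ipaddress.ip_network("198.51.100.42/32"),  # single address
]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if `ip` falls inside any of the blocked networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


print(is_blocked("203.0.113.7"))    # inside the /24
print(is_blocked("198.51.100.43"))  # neighbour of the single blocked address
```

This is why a subnet-level block can catch a freshly rotated IP: if the new address comes from the same range, it matches the same rule.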
Application-Level Blocks
The website’s application code tracks and blocks IPs:
# How sites typically implement IP tracking (simplified sketch)
class IPTracker:
    def __init__(self, redis_client):
        self.redis = redis_client

    def check_ip(self, ip_address):
        key = f"requests:{ip_address}"
        request_count = self.redis.get(key)
        if int(request_count or 0) > 100:  # More than 100 requests in the current minute
            self.block_ip(ip_address, duration=3600)  # Block for 1 hour (block_ip not shown)
            return False
        if self.redis.incr(key) == 1:
            self.redis.expire(key, 60)  # Start the 60-second window on the first request only
        return True
Detecting IP Bans
Common Ban Indicators
import requests

def check_ban_status(url, proxies=None):
    """Check if you're IP banned from a site."""
    try:
        response = requests.get(url, proxies=proxies, timeout=15, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })
        body = response.text.lower()
        # HTTP status indicators
        if response.status_code == 403:
            if "blocked" in body or "banned" in body:
                return "IP_BANNED"
            if "cloudflare" in body:
                return "CLOUDFLARE_BLOCK"
            return "FORBIDDEN"
        if response.status_code == 429:
            return "RATE_LIMITED"
        if response.status_code == 503:
            if "access denied" in body:
                return "SERVICE_BLOCK"
            return "SERVICE_UNAVAILABLE"
        # Content-level indicators
        if response.status_code == 200:
            # Check for soft blocks (served a different page)
            if len(body) < 500 and "captcha" in body:
                return "CAPTCHA_CHALLENGE"
            return "OK"
        return f"UNKNOWN_{response.status_code}"
    # Catch Timeout first: a connect timeout is also a ConnectionError subclass
    except requests.exceptions.Timeout:
        return "TIMEOUT"  # Possibly being throttled
    except requests.exceptions.ConnectionError:
        return "CONNECTION_REFUSED"  # Possibly IP blocked at firewall

status = check_ban_status("https://target-site.com")
print(f"Ban status: {status}")
Distinguishing Ban Types
| Response | Meaning | Typical Duration |
|---|---|---|
| 403 + “blocked” | Hard IP ban | Hours to permanent |
| 403 + Cloudflare page | WAF block | Minutes to hours |
| 429 + Retry-After | Rate limit | Seconds to minutes |
| Connection refused | Firewall block | Hours to permanent |
| 503 | Temporary block | Minutes |
| 200 + CAPTCHA | Soft block (challenge) | Per-request |
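For the rate-limit row, the `Retry-After` header tells you how long to wait. It can hold either a number of seconds or an HTTP date. Here is a minimal sketch of parsing both forms with the standard library; the helper name `retry_after_seconds` and the 60-second fallback are assumptions chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=60.0):
    """Convert a Retry-After header value into a wait time in seconds.

    Falls back to `default` when the header is missing or unparseable.
    """
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if retry_at.tzinfo is None:  # treat naive dates as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


print(retry_after_seconds("120"))
```

With `requests`, the value would typically come from `response.headers.get("Retry-After")` on a 429 response.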
Method 1: Proxy Rotation
The most straightforward way to bypass IP bans is to route your traffic through different IP addresses.
Residential Proxy Rotation
from curl_cffi import requests
import random

class ProxyRotator:
    def __init__(self, proxy_gateway, username, password):
        self.gateway = proxy_gateway
        self.username = username
        self.password = password
        self.banned_ips = set()

    def get_proxy(self):
        """Get a residential proxy URL with session ID for sticky sessions."""
        session_id = random.randint(10000, 99999)
        return f"http://{self.username}-session-{session_id}:{self.password}@{self.gateway}"

    def fetch(self, url, max_retries=5):
        for attempt in range(max_retries):
            proxy = self.get_proxy()
            proxies = {"http": proxy, "https": proxy}
            try:
                response = requests.get(
                    url,
                    impersonate="chrome",
                    proxies=proxies,
                    timeout=20
                )
                if response.status_code == 200:
                    return response
                if response.status_code in (403, 429):
                    print(f"Blocked on attempt {attempt + 1}, rotating IP...")
                    continue
            except Exception as e:
                print(f"Error: {e}")
                continue
        return None

# Usage
rotator = ProxyRotator(
    proxy_gateway="gate.provider.com:7777",
    username="user",
    password="pass"
)
response = rotator.fetch("https://ip-banned-site.com")
if response:
    print(f"Success: {len(response.text)} bytes")
For proxy provider recommendations, see our proxy provider reviews.
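The rotator above picks a fresh session on every retry but never records which proxies were blocked (`banned_ips` is declared and left unused). If you work from a fixed list of proxies rather than a rotating gateway, one option is to bench a blocked proxy for a cooldown period. Below is a minimal sketch of that idea; the `ProxyPool` class, its method names, and the 600-second cooldown are illustrative assumptions, not a provider API.

```python
import time


class ProxyPool:
    """Round-robin proxy pool that benches blocked proxies for a cooldown."""

    def __init__(self, proxies, cooldown=600, clock=time.monotonic):
        self.proxies = list(proxies)
        self.cooldown = cooldown      # seconds a blocked proxy stays benched
        self.clock = clock            # injectable so the pool can be tested
        self.blocked_until = {}       # proxy -> time it becomes usable again
        self._next = 0

    def mark_blocked(self, proxy):
        """Record that `proxy` was just blocked (e.g. after a 403 or 429)."""
        self.blocked_until[proxy] = self.clock() + self.cooldown

    def get(self):
        """Return the next usable proxy, or None if all are cooling down."""
        now = self.clock()
        for _ in range(len(self.proxies)):
            proxy = self.proxies[self._next]
            self._next = (self._next + 1) % len(self.proxies)
            if self.blocked_until.get(proxy, 0) <= now:
                return proxy
        return None
```

A scraper would call `pool.get()` before each request and `pool.mark_blocked(proxy)` whenever the response status is 403 or 429.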
Datacenter vs Residential Proxies
Datacenter proxies are cheaper but more likely to be pre-banned:
- IPs are from known hosting providers
- Easy to detect via ASN lookup
- Often already in ban lists
- Good for sites with minimal protection
Residential proxies are harder to ban:
- IPs belong to real ISPs
- Hard to distinguish from regular users by IP address alone
- Rotate through millions of IPs
- Essential for well-protected sites
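Because the two proxy types differ in cost and detectability, one possible approach is to start with the cheaper tier and escalate only when a site blocks it. Here is a minimal sketch under that assumption; `fetch_with_escalation`, the tier names, and the `fetch` callable (which takes a URL and a proxy URL and returns an HTTP status code) are all hypothetical placeholders for your own request code.

```python
def fetch_with_escalation(url, tiers, fetch, blocked_statuses=(403, 429)):
    """Try proxy tiers in order, moving to the next tier only when blocked.

    tiers: list of (tier_name, proxy_url) pairs, cheapest first.
    fetch: callable(url, proxy_url) -> HTTP status code (hypothetical).
    Returns (tier_name, status) for the first tier that is not blocked,
    or (None, None) if every tier was blocked.
    """
    for tier_name, proxy_url in tiers:
        status = fetch(url, proxy_url)
        if status not in blocked_statuses:
            return tier_name, status
    return None, None


# Example with a stand-in fetch function that simulates a site
# blocking datacenter IPs but accepting residential ones.
def simulated_fetch(url, proxy_url):
    return 403 if "datacenter" in proxy_url else 200


tiers = [
    ("datacenter", "http://user:pass@datacenter.example:8000"),
    ("residential", "http://user:pass@residential.example:8000"),
]
print(fetch_with_escalation("https://example.com", tiers, simulated_fetch))
```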
Mobile Proxies
Mobile proxies use cellular network IPs, which are almost never banned because:
- Mobile carriers use CGNAT (many users share one IP)
- Banning a mobile IP would block thousands of legitimate users
- IPs rotate naturally as phones connect to different towers
# Mobile proxy usage
from curl_cffi import requests

mobile_proxy = "http://user:pass@mobile-proxy.provider.com:7777"
response = requests.get(
    "https://heavily-protected-site.com",
    impersonate="chrome",
    proxies={"http": mobile_proxy, "https": mobile_proxy}
)
See our mobile proxy guides for detailed setup.
Method 2: IP Rotation Strategies
Not all rotation patterns are equal. The strategy matters as much as the proxies.
Sticky Sessions
Use the same IP for a series of related requests, then switch:
import random
import time

class StickySessionRotator:
    def __init__(self, proxy_base, requests_per_session=20, session_duration=300):
        self.proxy_base = proxy_base
        self.requests_per_session = requests_per_session
        self.session_duration = session_duration
        self.current_session = None
        self.session_start = 0
        self.session_requests = 0

    def get_proxy(self):
        now = time.time()
        if (self.current_session is None or
                self.session_requests >= self.requests_per_session or
                now - self.session_start > self.session_duration):
            # Create new session
            self.current_session = f"{self.proxy_base}-session-{random.randint(1, 99999)}"
            self.session_start = now
            self.session_requests = 0
        self.session_requests += 1
        return self.current_session
Geographic Rotation
Rotate between IPs in different locations to avoid geographic pattern detection:
import requests

countries = ["us", "gb", "de", "fr", "ca", "au", "jp"]

def get_geo_proxy(country):
    return f"http://user-country-{country}:pass@gate.provider.com:7777"

# Rotate countries (`urls` is your own list of target URLs)
for i, url in enumerate(urls):
    country = countries[i % len(countries)]
    proxy = get_geo_proxy(country)
    response = requests.get(url, proxies={"https": proxy})
For more rotation techniques, see our IP rotation strategies guide.
Method 3: Find the Origin Server
If the IP ban is implemented by a CDN (like Cloudflare), you might be able to bypass it by accessing the origin server directly.
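A candidate address is only useful if it lies outside the CDN's own ranges; otherwise the request still passes through the CDN. The sketch below shows that filter, assuming a small hand-copied subset of Cloudflare's published IPv4 ranges. The list is incomplete and may be out of date, so fetch the current ranges from Cloudflare before relying on it. The name `is_cdn_ip` is illustrative.

```python
import ipaddress

# Partial, possibly outdated subset of Cloudflare's published IPv4 ranges.
# Assumption for illustration only; consult Cloudflare's current list.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network("104.16.0.0/13"),
    ipaddress.ip_network("172.64.0.0/13"),
    ipaddress.ip_network("173.245.48.0/20"),
    ipaddress.ip_network("162.158.0.0/15"),
]


def is_cdn_ip(ip, ranges=CLOUDFLARE_RANGES):
    """Return True if `ip` belongs to one of the listed CDN ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in ranges)


print(is_cdn_ip("104.16.1.1"))    # inside 104.16.0.0/13
print(is_cdn_ip("203.0.113.42"))  # documentation address, not in the list
```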
import dns.exception
import dns.resolver
import requests

def find_origin_ip(domain):
    """Attempt to find the origin server IP."""
    subdomains = ['mail', 'ftp', 'staging', 'dev', 'api', 'direct', 'old']
    results = []
    for sub in subdomains:
        try:
            answers = dns.resolver.resolve(f"{sub}.{domain}", "A")
            for rdata in answers:
                ip = str(rdata)
                results.append({"subdomain": sub, "ip": ip})
                print(f"{sub}.{domain}: {ip}")
        except dns.exception.DNSException:
            continue  # Subdomain does not resolve; try the next one
    return results

# If you find the origin IP
origin_ip = "203.0.113.42"  # Example
headers = {
    "Host": "target-site.com",  # Must include the Host header
    "User-Agent": "Mozilla/5.0..."
}
response = requests.get(
    f"https://{origin_ip}/path",
    headers=headers,
    verify=False  # SSL cert won't match the IP
)
Method 4: VPN and Tor
VPN Rotation
VPNs provide a quick way to change your IP, though they’re not scalable for large operations:
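import subprocess

def connect_vpn(server):
    """Connect to a VPN server (example using OpenVPN)."""
    # --daemon makes OpenVPN detach once the tunnel is up; without it,
    # subprocess.run() blocks until the tunnel closes. Usually needs root.
    subprocess.run(["openvpn", "--config", f"{server}.ovpn", "--daemon"], check=True)

# Rotate VPN servers
servers = ["us-east", "us-west", "eu-london", "eu-amsterdam"]
Tor Network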
Tor routes traffic through multiple relays, but it’s slow and many sites block Tor exit nodes:
import requests

# Using Tor SOCKS proxy (requires Tor service running and `pip install requests[socks]`)
# socks5h resolves DNS through Tor as well, so hostnames are not leaked locally
tor_proxy = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050"
}
response = requests.get(
    "https://target-site.com",
    proxies=tor_proxy,
    timeout=30
)
Method 5: Session and Cookie Management
Sometimes the ban is cookie-based rather than purely IP-based. Clearing cookies can help:
from curl_cffi import requests

def fresh_session():
    """Create a completely fresh session."""
    session = requests.Session(impersonate="chrome")
    # No cookies, no history
    return session

# Use a fresh session for each batch
# (`batches` and `get_new_proxy()` are placeholders for your own URL batches and proxy source)
for batch in batches:
    session = fresh_session()
    session.proxies = {"https": get_new_proxy()}
    for url in batch:
        response = session.get(url)
        if response.status_code != 200:
            break  # Session might be flagged, get new one
Prevention: Avoiding IP Bans
The best strategy is prevention. Here are practices that minimize bans:
1. Respect Rate Limits
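Respecting a limit also means slowing down further when the server pushes back. Here is a minimal sketch of exponential backoff with full jitter, a common way to space out retries after a 429 response; the function name `backoff_delay` and the default base and cap values are assumptions chosen for illustration.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds.

    `attempt` starts at 0 for the first retry. Randomising the whole
    interval ("full jitter") keeps parallel workers from retrying in sync.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Upper bounds grow 1s, 2s, 4s, ... until they reach the 60s cap.
for attempt in range(4):
    print(f"retry {attempt}: sleep {backoff_delay(attempt):.2f}s")
```

In a scraper, the returned value would be passed to `time.sleep()` before the next attempt.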
import random
import time

import requests

def polite_scrape(urls, min_delay=2, max_delay=5):
    for url in urls:
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)
        response = requests.get(url)
        yield response
2. Respect robots.txt
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def check_robots(url, user_agent="*"):
    rp = RobotFileParser()
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    rp.set_url(robots_url)
    rp.read()
    if rp.can_fetch(user_agent, url):
        return True
    else:
        print(f"robots.txt disallows: {url}")
        return False
3. Distribute Requests Over Time
import random
import time

class RequestScheduler:
    def __init__(self, requests_per_hour=200):
        self.interval = 3600 / requests_per_hour
        self.last_request = 0

    def wait(self):
        now = time.time()
        elapsed = now - self.last_request
        wait_time = self.interval - elapsed
        if wait_time > 0:
            # Add jitter
            jitter = random.uniform(-self.interval * 0.2, self.interval * 0.2)
            actual_wait = max(0, wait_time + jitter)
            time.sleep(actual_wait)
        self.last_request = time.time()
4. Rotate User-Agents
Don’t use the same User-Agent for thousands of requests. See our User-Agent rotation guide.
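A minimal sketch of User-Agent rotation is to pick a header set at random from a small pool for each request. The strings below are illustrative examples of common desktop browser formats, and their version numbers will age; `USER_AGENTS` and `random_headers` are hypothetical names.

```python
import random

# Illustrative pool; refresh the version numbers periodically.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {
        "User-Agent": rng.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }


print(random_headers()["User-Agent"])
```

If you also impersonate a browser's TLS fingerprint (as with `impersonate="chrome"` earlier), it is probably safer to keep the User-Agent consistent with that browser, since a mismatch can itself be a detection signal.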
5. Mimic Human Browsing Patterns
import random
import time

def human_browsing_pattern(session, target_urls):
    """Scrape with human-like navigation patterns."""
    for url in target_urls:
        # Occasionally visit homepage
        if random.random() < 0.1:
            session.get("https://target-site.com/")
            time.sleep(random.uniform(1, 3))
        # Visit target page
        response = session.get(url)
        # Variable reading time
        reading_time = random.gauss(5, 2)  # Mean 5 seconds, std 2
        time.sleep(max(1, reading_time))
        yield response
Monitoring and Alerting
Build monitoring into your scraper to detect bans early:
import requests

class BanMonitor:
    def __init__(self, threshold=0.1):
        self.total_requests = 0
        self.blocked_requests = 0
        self.threshold = threshold

    def record(self, status_code):
        self.total_requests += 1
        if status_code in (403, 429, 503):
            self.blocked_requests += 1

    @property
    def block_rate(self):
        if self.total_requests == 0:
            return 0
        return self.blocked_requests / self.total_requests

    def check(self):
        if self.block_rate > self.threshold:
            print(f"WARNING: Block rate is {self.block_rate:.1%}")
            print("Consider: slower rate, proxy rotation, or different approach")
            return True
        return False

# `urls` is your own list of target URLs
monitor = BanMonitor(threshold=0.05)
for url in urls:
    response = requests.get(url)
    monitor.record(response.status_code)
    if monitor.total_requests % 100 == 0:
        monitor.check()
Recovering from an IP Ban
If your primary IP is already banned:
- Wait it out: Many bans are temporary (1-24 hours)
- Contact the site: Some sites unban IPs upon request, especially for legitimate research
- Get a new IP: Ask your ISP for a new address (some ISPs assign a new IP after a router restart)
- Use proxies: Route through residential proxies for all future requests
- Change approach: Consider using the site’s API if available
Conclusion
IP bans are the most basic anti-scraping measure but can be persistent and widespread when CDNs are involved. The key principle is prevention over cure: use residential proxies from the start, implement respectful rate limiting, and rotate IPs proactively rather than reactively.
For related guides, see our articles on IP rotation strategies, rate limiting, and 403 Forbidden errors.
Related Reading
- 403 Forbidden in Web Scraping: How to Fix It
- Best CAPTCHA Solving Services in 2026: Complete Comparison
- Anti-Phishing with Proxies: How Security Teams Use Mobile IPs
- Brand Protection with Proxies: Detect Counterfeit Sellers & Trademark Violations
- How Cybersecurity Teams Use Proxies for Threat Intelligence
- Using Mobile Proxies for Dark Web Monitoring and Research