How to Bypass IP Bans When Web Scraping

IP bans are one of the most basic and common anti-scraping measures. When a site detects suspicious activity from your IP address, it blocks further requests from that IP, sometimes temporarily and sometimes permanently.

This guide covers how IP bans work, how to detect them, techniques to bypass them, and strategies to prevent them in the first place.

How IP Bans Work

Websites implement IP bans through several mechanisms:

Server-Level Blocks

Direct blocks at the web server (Nginx, Apache) or firewall level:

# Nginx IP block example
deny 203.0.113.0/24;  # Block entire subnet
deny 198.51.100.42;   # Block single IP

CDN/WAF Blocks

Services like Cloudflare, Akamai, and AWS WAF maintain IP reputation databases and can block IPs across their entire network:

  • Cloudflare: Blocks based on IP reputation, ASN, and country
  • Akamai: Uses historical threat data to score and block IPs
  • AWS WAF: Custom rules based on IP sets and rate limiting

Application-Level Blocks

The website’s application code tracks and blocks IPs:

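# How sites typically implement IP tracking (fixed window: 100 requests per 60 s)
import redis

class IPTracker:
    def __init__(self, client=None, limit=100, window=60):
        self.redis = client or redis.Redis()
        self.limit, self.window = limit, window

    def check_ip(self, ip_address):
        if self.redis.exists(f"banned:{ip_address}"):
            return False
        key = f"requests:{ip_address}"
        count = self.redis.incr(key)
        if count == 1:
            self.redis.expire(key, self.window)  # start the window on the first hit only
        if count > self.limit:
            self.redis.set(f"banned:{ip_address}", 1, ex=3600)  # Block for 1 hour
            return False
        return True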
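To see the per-IP counter logic without a Redis server, here is a minimal in-memory sketch of the same fixed-window idea. The `FixedWindowLimiter` class and its injectable `clock` parameter are hypothetical illustrations, not any site's actual code; real deployments keep the counters in a shared store such as Redis so every web server sees the same totals.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """In-memory sketch of a per-IP fixed-window counter with temporary bans."""

    def __init__(self, limit=100, window=60, ban_duration=3600, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.ban_duration = ban_duration
        self.clock = clock              # injectable so the behaviour can be tested
        self.window_start = {}          # ip -> start time of its current window
        self.counts = defaultdict(int)  # ip -> requests seen in the current window
        self.banned_until = {}          # ip -> time when the ban expires

    def allow(self, ip):
        now = self.clock()
        if self.banned_until.get(ip, 0) > now:
            return False
        # Start a fresh window if none exists or the previous one has expired
        if now - self.window_start.get(ip, float("-inf")) >= self.window:
            self.window_start[ip] = now
            self.counts[ip] = 0
        self.counts[ip] += 1
        if self.counts[ip] > self.limit:
            self.banned_until[ip] = now + self.ban_duration
            return False
        return True


# Simulate 101 requests from one IP inside a single minute
fake_now = [0.0]
limiter = FixedWindowLimiter(clock=lambda: fake_now[0])
results = [limiter.allow("198.51.100.42") for _ in range(101)]
print(results.count(True), results[-1])  # 100 False
```

The 101st request trips the ban, and every request from that IP is refused until `ban_duration` has passed, even after the 60-second window would otherwise have reset.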

Detecting IP Bans

Common Ban Indicators

import requests

def check_ban_status(url, proxies=None):
    """Classify a response as a likely ban, rate limit, challenge, or OK."""
    try:
        response = requests.get(url, proxies=proxies, timeout=15, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        })
    # Catch Timeout first: ConnectTimeout subclasses both Timeout and ConnectionError
    except requests.exceptions.Timeout:
        return "TIMEOUT"  # Possibly being throttled
    except requests.exceptions.ConnectionError:
        return "CONNECTION_REFUSED"  # Possibly IP blocked at firewall

    body = response.text.lower()

    # HTTP status indicators
    if response.status_code == 403:
        if "blocked" in body or "banned" in body:
            return "IP_BANNED"
        if "cloudflare" in body:
            return "CLOUDFLARE_BLOCK"
        return "FORBIDDEN"

    if response.status_code == 429:
        return "RATE_LIMITED"

    if response.status_code == 503:
        if "access denied" in body:
            return "SERVICE_BLOCK"
        return "SERVICE_UNAVAILABLE"

    # Content-level indicators: a 200 that serves a challenge instead of the page
    if response.status_code == 200:
        if "captcha" in body:  # heuristic; normal pages can mention CAPTCHAs too
            return "CAPTCHA_CHALLENGE"
        return "OK"

    return f"UNKNOWN_{response.status_code}"

status = check_ban_status("https://target-site.com")
print(f"Ban status: {status}")

Distinguishing Ban Types

| Response | Meaning | Typical Duration |
|---|---|---|
| 403 + “blocked” | Hard IP ban | Hours to permanent |
| 403 + Cloudflare page | WAF block | Minutes to hours |
| 429 + Retry-After | Rate limit | Seconds to minutes |
| Connection refused | Firewall block | Hours to permanent |
| 503 | Temporary block | Minutes |
| 200 + CAPTCHA | Soft block (challenge) | Per-request |
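A 429 (and sometimes a 503) may carry a `Retry-After` header, which HTTP allows as either a number of seconds or an HTTP date. Here is a minimal sketch for turning it into a wait time. The `retry_after_seconds` name and the 60-second fallback are assumptions, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=60.0, now=None):
    """Convert a Retry-After header value into seconds to wait."""
    if not header_value:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Delay-seconds form, e.g. "Retry-After: 120"
        return float(value)
    try:
        # HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT"
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to a conservative default
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


print(retry_after_seconds("120"))  # 120.0
print(retry_after_seconds(None))   # 60.0
```

Pass `response.headers.get("Retry-After")` and sleep for the returned number of seconds before retrying.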

Method 1: Proxy Rotation

The most straightforward way to bypass IP bans is to route your traffic through different IP addresses.

Residential Proxy Rotation

from curl_cffi import requests
import random

class ProxyRotator:
    def __init__(self, proxy_gateway, username, password):
        self.gateway = proxy_gateway
        self.username = username
        self.password = password

    def get_proxy(self):
        """Return a proxy URL with a new random session ID.

        Many residential providers map each session ID to its own exit IP, so a new
        ID usually means a new IP. The username-session-ID format varies by provider.
        """
        session_id = random.randint(10000, 99999)
        return f"http://{self.username}-session-{session_id}:{self.password}@{self.gateway}"

    def fetch(self, url, max_retries=5):
        for attempt in range(max_retries):
            proxy = self.get_proxy()
            proxies = {"http": proxy, "https": proxy}

            try:
                response = requests.get(
                    url,
                    impersonate="chrome",
                    proxies=proxies,
                    timeout=20
                )

                if response.status_code == 200:
                    return response

                if response.status_code in (403, 429):
                    print(f"Blocked on attempt {attempt + 1}, rotating IP...")
                    continue

            except Exception as e:
                print(f"Error: {e}")
                continue

        return None

# Usage
rotator = ProxyRotator(
    proxy_gateway="gate.provider.com:7777",
    username="user",
    password="pass"
)

response = rotator.fetch("https://ip-banned-site.com")
if response:
    print(f"Success: {len(response.text)} bytes")

For proxy provider recommendations, see our proxy provider reviews.

Datacenter vs Residential Proxies

Datacenter proxies are cheaper but more likely to be pre-banned:

  • IPs are from known hosting providers
  • Easy to detect via ASN lookup
  • Often already in ban lists
  • Good for sites with minimal protection

Residential proxies are harder to ban:

  • IPs belong to consumer ISPs
  • Traffic is harder to distinguish from regular users
  • Providers rotate through pools that often contain millions of IPs
  • Often necessary for well-protected sites
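Real ASN lookups map an IP to the autonomous system that announces it, using BGP data or IP-intelligence databases. As a simplified stand-in, this sketch checks an address against a small table of hosting-provider prefixes with Python's standard `ipaddress` module. The prefixes are IETF documentation ranges used as hypothetical placeholders, not real provider allocations.

```python
import ipaddress

# Hypothetical prefix table: documentation ranges standing in for hosting-provider blocks
HOSTING_PREFIXES = {
    ipaddress.ip_network("203.0.113.0/24"): "ExampleCloud (hypothetical)",
    ipaddress.ip_network("198.51.100.0/24"): "ExampleHosting (hypothetical)",
}


def hosting_provider(ip):
    """Return the provider name if the IP falls in a known hosting prefix, else None."""
    addr = ipaddress.ip_address(ip)
    for network, provider in HOSTING_PREFIXES.items():
        if addr in network:
            return provider
    return None


print(hosting_provider("203.0.113.42"))  # ExampleCloud (hypothetical)
print(hosting_provider("192.0.2.10"))    # None
```

A site holding a full table of hosting-provider ranges can flag every datacenter IP this way without ever seeing it misbehave, which is why datacenter proxies are often blocked on the first request.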

Mobile Proxies

Mobile proxies use cellular network IPs, which sites rarely ban outright because:

  • Mobile carriers use CGNAT (many users share one public IP)
  • Banning a mobile IP can block thousands of legitimate users
  • Carrier-assigned IPs change frequently as devices reconnect

# Mobile proxy usage
from curl_cffi import requests

mobile_proxy = "http://user:pass@mobile-proxy.provider.com:7777"

response = requests.get(
    "https://heavily-protected-site.com",
    impersonate="chrome",
    proxies={"http": mobile_proxy, "https": mobile_proxy}
)

See our mobile proxy guides for detailed setup.

Method 2: IP Rotation Strategies

Not all rotation patterns are equal. The strategy matters as much as the proxies.

Sticky Sessions

Use the same IP for a series of related requests, then switch:

import random
import time

class StickySessionRotator:
    """Reuse one session ID (and so, with most providers, one exit IP) for a while."""

    def __init__(self, username, password, gateway,
                 requests_per_session=20, session_duration=300):
        self.username = username
        self.password = password
        self.gateway = gateway
        self.requests_per_session = requests_per_session
        self.session_duration = session_duration
        self.current_session = None
        self.session_start = 0
        self.session_requests = 0

    def get_proxy(self):
        now = time.time()

        if (self.current_session is None or
            self.session_requests >= self.requests_per_session or
            now - self.session_start > self.session_duration):
            # Create new session (the session-ID username format is provider-specific)
            session_id = random.randint(1, 99999)
            self.current_session = (
                f"http://{self.username}-session-{session_id}:{self.password}@{self.gateway}"
            )
            self.session_start = now
            self.session_requests = 0

        self.session_requests += 1
        return self.current_session

Geographic Rotation

Rotate between IPs in different locations to avoid geographic pattern detection:

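import requests

countries = ["us", "gb", "de", "fr", "ca", "au", "jp"]
urls = ["https://target-site.com/page/1", "https://target-site.com/page/2"]  # example URLs

def get_geo_proxy(country):
    # Country targeting via the username is a provider-specific convention
    return f"http://user-country-{country}:pass@gate.provider.com:7777"

# Rotate countries round-robin
for i, url in enumerate(urls):
    country = countries[i % len(countries)]
    proxy = get_geo_proxy(country)
    response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=20)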

For more rotation techniques, see our IP rotation strategies guide.

Method 3: Find the Origin Server

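If the IP ban is enforced only by a CDN (like Cloudflare), you might be able to bypass it by reaching the origin server directly. This works only when the origin still accepts traffic that does not come through the CDN; well-configured origins accept connections only from the CDN's IP ranges.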

import dns.exception
import dns.resolver
import requests

def find_origin_ip(domain):
    """Look for subdomains that may resolve to the origin rather than the CDN."""
    subdomains = ['mail', 'ftp', 'staging', 'dev', 'api', 'direct', 'old']
    results = []

    for sub in subdomains:
        try:
            answers = dns.resolver.resolve(f"{sub}.{domain}", "A")
        except dns.exception.DNSException:
            continue  # NXDOMAIN, no A record, timeout, etc.
        for rdata in answers:
            ip = str(rdata)
            results.append({"subdomain": sub, "ip": ip})
            print(f"{sub}.{domain}: {ip}")

    return results

# If you find the origin IP
origin_ip = "203.0.113.42"  # Example
headers = {
    "Host": "target-site.com",  # Must include the Host header
    "User-Agent": "Mozilla/5.0..."
}

# TLS SNI will carry the IP rather than the hostname, so some origins reject the handshake
response = requests.get(
    f"https://{origin_ip}/path",
    headers=headers,
    verify=False  # SSL cert won't match the IP
)

Method 4: VPN and Tor

VPN Rotation

VPNs provide a quick way to change your IP, though they’re not scalable for large operations:

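import subprocess

def connect_vpn(server):
    """Start OpenVPN in the background (example; usually needs root)."""
    # openvpn stays in the foreground, so run() would block; Popen returns immediately
    return subprocess.Popen(["openvpn", "--config", f"{server}.ovpn"])

# Rotate VPN servers
servers = ["us-east", "us-west", "eu-london", "eu-amsterdam"]
for server in servers:
    vpn = connect_vpn(server)
    # ... wait for the tunnel to come up, then scrape a batch ...
    vpn.terminate()
    vpn.wait()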

Tor Network

Tor routes traffic through multiple relays, but it’s slow and many sites block Tor exit nodes:

import requests

# Using Tor's SOCKS proxy (requires a running Tor service and `pip install requests[socks]`)
# socks5h:// resolves DNS through Tor instead of locally
tor_proxy = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050"
}

response = requests.get(
    "https://target-site.com",
    proxies=tor_proxy,
    timeout=30
)

Method 5: Session and Cookie Management

Sometimes the ban is cookie-based rather than purely IP-based. Clearing cookies can help:

from curl_cffi import requests

def fresh_session(proxy):
    """Create a completely fresh session: empty cookie jar, new proxy."""
    return requests.Session(impersonate="chrome", proxies={"http": proxy, "https": proxy})

# Use a fresh session for each batch.
# `batches` (lists of URLs) and `get_new_proxy()` (returns a proxy URL, e.g. from
# ProxyRotator.get_proxy()) are placeholders for your own code.
for batch in batches:
    session = fresh_session(get_new_proxy())

    for url in batch:
        response = session.get(url)
        if response.status_code != 200:
            break  # Session might be flagged, get new one

Prevention: Avoiding IP Bans

The best strategy is prevention. Here are practices that minimize bans:

1. Respect Rate Limits

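import random
import time

import requests

def polite_scrape(urls, min_delay=2, max_delay=5):
    """Yield responses, pausing a random min_delay to max_delay seconds before each."""
    for url in urls:
        delay = random.uniform(min_delay, max_delay)
        time.sleep(delay)

        response = requests.get(url, timeout=20)
        yield response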
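When the site does answer with 429 or 503, backing off is gentler than retrying at a fixed pace. Below is a minimal sketch of exponential backoff with "full jitter" (a random delay between zero and a doubling cap). The function names, the retryable status set, and the `base`/`cap` values are assumptions to tune per site; `fetch` is any callable that returns an object with a `status_code`.

```python
import random
import time


def backoff_delays(max_retries=5, base=2.0, cap=120.0, rng=random.random):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def fetch_with_backoff(fetch, url, max_retries=5, sleep=time.sleep):
    """Call fetch(url), sleeping with growing jittered delays on 429/503."""
    response = fetch(url)
    for delay in backoff_delays(max_retries):
        if response.status_code not in (429, 503):
            break
        sleep(delay)
        response = fetch(url)
    return response  # may still be a 429/503 if every retry was refused
```

For example, `fetch_with_backoff(requests.get, "https://target-site.com")`. The random component keeps many workers from retrying in lockstep after a shared block.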

2. Respect robots.txt

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def check_robots(url, user_agent="*"):
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"

    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()

    if rp.can_fetch(user_agent, url):
        return True
    print(f"robots.txt disallows: {url}")
    return False
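`RobotFileParser` can also parse rules already in memory, and its `crawl_delay()` method (Python 3.6+) reports a site's requested `Crawl-delay`, a natural input for the delays used elsewhere in this guide. The robots.txt content below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("my-scraper", "https://target-site.com/products/1"))  # True
print(rp.can_fetch("my-scraper", "https://target-site.com/private/a"))   # False
print(rp.crawl_delay("my-scraper"))                                      # 10
```

If `crawl_delay()` returns a value, using it as the minimum delay between requests keeps you within the limits the site has published.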

3. Distribute Requests Over Time

import random
import time

class RequestScheduler:
    def __init__(self, requests_per_hour=200):
        self.interval = 3600 / requests_per_hour
        self.last_request = 0

    def wait(self):
        now = time.time()
        elapsed = now - self.last_request
        wait_time = self.interval - elapsed

        if wait_time > 0:
            # Add jitter
            jitter = random.uniform(-self.interval * 0.2, self.interval * 0.2)
            actual_wait = max(0, wait_time + jitter)
            time.sleep(actual_wait)

        self.last_request = time.time()
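A fixed interval spaces every request evenly. A token bucket is a common alternative that enforces the same average rate while allowing short bursts up to a chosen capacity. A minimal sketch follows; the `TokenBucket` name, its parameters, and the injectable `clock`/`sleep` hooks (there for testing) are assumptions.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for the missing fraction of a token to refill
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1


# 200 requests per hour on average, bursts of up to 5
bucket = TokenBucket(rate=200 / 3600, capacity=5)
```

Call `bucket.acquire()` before each request. With these settings the first five calls return immediately, after which requests settle at about one every 18 seconds.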

4. Rotate User-Agents

Don’t use the same User-Agent for thousands of requests. See our User-Agent rotation guide.

5. Mimic Human Browsing Patterns

import random
import time
from urllib.parse import urlparse

def human_browsing_pattern(session, target_urls):
    """Scrape with human-like navigation patterns."""

    for url in target_urls:
        # Occasionally visit the homepage of the URL's own domain
        if random.random() < 0.1:
            parsed = urlparse(url)
            session.get(f"{parsed.scheme}://{parsed.netloc}/")
            time.sleep(random.uniform(1, 3))

        # Visit target page
        response = session.get(url)

        # Variable reading time
        reading_time = random.gauss(5, 2)  # Mean 5 seconds, std 2
        time.sleep(max(1, reading_time))

        yield response

Monitoring and Alerting

Build monitoring into your scraper to detect bans early:

import requests

class BanMonitor:
    def __init__(self, threshold=0.1):
        self.total_requests = 0
        self.blocked_requests = 0
        self.threshold = threshold

    def record(self, status_code):
        self.total_requests += 1
        if status_code in (403, 429, 503):
            self.blocked_requests += 1

    @property
    def block_rate(self):
        if self.total_requests == 0:
            return 0
        return self.blocked_requests / self.total_requests

    def check(self):
        if self.block_rate > self.threshold:
            print(f"WARNING: Block rate is {self.block_rate:.1%}")
            print("Consider: slower rate, proxy rotation, or different approach")
            return True
        return False

monitor = BanMonitor(threshold=0.05)

for url in urls:  # `urls` is your list of target URLs
    response = requests.get(url, timeout=20)
    monitor.record(response.status_code)

    if monitor.total_requests % 100 == 0:
        monitor.check()
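Because `BanMonitor` averages over the whole run, a sudden wave of blocks late in a long crawl barely moves its rate. A sliding window over the most recent responses reacts faster. A minimal sketch; the `RecentBlockRate` name and the default 100-response window are assumptions.

```python
from collections import deque

BLOCK_STATUSES = {403, 429, 503}


class RecentBlockRate:
    """Track the share of blocked responses among the last `window` requests."""

    def __init__(self, window=100, threshold=0.05):
        self.recent = deque(maxlen=window)  # oldest entries drop off automatically
        self.threshold = threshold

    def record(self, status_code):
        self.recent.append(status_code in BLOCK_STATUSES)

    @property
    def block_rate(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def should_slow_down(self):
        return self.block_rate > self.threshold
```

For example, after 1,000 successes followed by 3 blocks, the cumulative monitor reports about 0.3%, while a 10-response window reports 30% and signals immediately.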

Recovering from an IP Ban

If your primary IP is already banned:

  1. Wait it out: Many bans are temporary, often lasting 1 to 24 hours
  2. Contact the site: Some sites lift bans on request, especially for legitimate research
  3. Get a new IP from your ISP: Some ISPs assign a new IP when you restart your router
  4. Use proxies: Route future requests through residential proxies
  5. Change approach: Use the site’s official API if one is available

Conclusion

IP bans are among the most basic anti-scraping measures, but they can be persistent and widespread when CDNs are involved. The key principle is prevention over cure: use residential proxies from the start, implement respectful rate limiting, and rotate IPs proactively rather than reactively.

For related guides, see our articles on IP rotation strategies, rate limiting, and 403 Forbidden errors.

