Sucuri WAF blocks scrapers via IP reputation, rate limits, and JavaScript challenges. You can bypass it using residential proxies, rotated headers, and request throttling. This guide covers detection methods and practical workarounds.

What is the Sucuri WAF?

Sucuri is a cloud-based web application firewall (WAF) that sits in front of websites and filters malicious traffic. It inspects HTTP requests, checks IP reputation, enforces rate limits, and can issue JavaScript challenges (similar to Cloudflare’s bot detection). Many e-commerce and news sites use Sucuri to protect against scraping.

When Sucuri blocks your scraper, you typically get a 403 Forbidden response or a redirect to a challenge page. The response may include headers such as X-Sucuri-ID or X-Sucuri-Cache that confirm the WAF is present.

How to detect Sucuri protection

Before scraping, you can use WhatWaf or manual header inspection to detect Sucuri. Look for these signals in the response headers:
```
X-Sucuri-ID: 12345
X-Sucuri-Cache: HIT
Server: Sucuri/Cloudproxy
```

You can also check via DNS: if the site’s A record points to Sucuri IP ranges (185.93.x.x or 66.248.x.x), the WAF is active. A quick nslookup or dig will confirm this.
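The header check can also be scripted. Below is a minimal sketch (the helper name and the specific header list are illustrative, not an official fingerprint database) that flags a response as Sucuri-protected when an X-Sucuri-* header or a Sucuri Server banner is present:

```python
# Sucuri fingerprints commonly seen in response headers (illustrative list).
SUCURI_HEADER_KEYS = {"x-sucuri-id", "x-sucuri-cache", "x-sucuri-block"}

def looks_like_sucuri(headers):
    """Return True if the response headers carry Sucuri fingerprints."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if SUCURI_HEADER_KEYS & lowered.keys():
        return True
    # The Server banner may also name the proxy directly.
    return "sucuri" in lowered.get("server", "").lower()

print(looks_like_sucuri({"X-Sucuri-ID": "12345", "Server": "Sucuri/Cloudproxy"}))  # True
print(looks_like_sucuri({"Server": "nginx"}))  # False
```

With requests, you can pass response.headers straight into the function, since it behaves like a dict of header names to values.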
Method 1: use residential proxies

Sucuri’s IP reputation database flags datacenter IPs (AWS, DigitalOcean, Hetzner) almost immediately. Residential proxies route your requests through real ISP addresses, which Sucuri is far less likely to block. Look for providers offering rotating residential pools.
```python
import requests

proxies = {
    "http": "http://user:pass@residential-proxy.example.com:8080",
    "https": "http://user:pass@residential-proxy.example.com:8080",
}

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
}

response = requests.get("https://target-site.com/page", proxies=proxies, headers=headers)
print(response.status_code)
```

Method 2: rotate user agents and headers
Sucuri inspects request headers for bot signatures. A missing Accept header, an unusual User-Agent string, or a missing Referer is a dead giveaway. Rotate a pool of real browser user agents on every request.
```python
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

def get_headers():
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": "https://www.google.com/",
    }
```

Method 3: throttle your request rate
Sucuri rate limits typically trigger at 100+ requests per minute from a single IP. Dropping to one request every 2-5 seconds, combined with jitter, dramatically reduces detection. Add randomized delays between requests to mimic human browsing patterns.
```python
import time
import random

def polite_get(url, session):
    time.sleep(random.uniform(2.0, 5.0))  # randomized delay with jitter
    return session.get(url, headers=get_headers())
```

Method 4: handle JavaScript challenges
If Sucuri issues a JS challenge, your scraper needs to execute JavaScript. Use Playwright or Selenium with a real browser to render the challenge page and collect the resulting session cookies. Once you have valid cookies, you can reuse them in lighter requests-based scrapers for subsequent calls.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://target-site.com")
    page.wait_for_load_state("networkidle")
    cookies = page.context.cookies()
    browser.close()
```

Method 5: target the origin server directly
Sucuri is a reverse proxy, meaning the real server sits behind it. If you can discover the origin IP through DNS history (SecurityTrails, Shodan, or certificate transparency logs), you can send requests directly to the origin and bypass the WAF entirely. Set the Host header to the domain name while pointing the request at the origin IP.
```python
import requests

# origin IP discovered via DNS history
origin_ip = "203.0.113.50"
target_domain = "target-site.com"

response = requests.get(
    f"https://{origin_ip}/page",
    headers={"Host": target_domain},
    verify=False,  # TLS cert won't match the origin IP
)
```

Best practices and ethical considerations
Always check the site’s robots.txt and terms of service before scraping. Sucuri bypass techniques should only be used on sites you own, sites you have permission to scrape, or for legitimate research purposes. Rate limiting and residential proxies are standard professional scraping practice and are widely used in data collection pipelines.
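The robots.txt check is easy to automate with Python’s standard library. This sketch (the rules and URLs are made up for illustration) uses urllib.robotparser to evaluate a path before you send any scraping traffic:

```python
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(robots_txt, url, user_agent="*"):
    """Check whether robots.txt permits fetching the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt disallowing the /admin/ section.
rules = "User-agent: *\nDisallow: /admin/\n"
print(allowed_to_fetch(rules, "https://target-site.com/page"))    # True
print(allowed_to_fetch(rules, "https://target-site.com/admin/"))  # False
```

In a real pipeline you would fetch https://target-site.com/robots.txt once, cache it, and run this check before each new path you crawl.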
For more background, see what is web scraping and what is a proxy server. For proxy type selection, review SOCKS5 vs HTTP proxy.