WhatWaf: Detect WAF Protection Before Scraping

TL;DR
WhatWaf is an open-source Python tool that identifies which WAF (web application firewall) protects a target before you start scraping it. it tests 80+ WAF signatures, including Cloudflare, Sucuri, Akamai, and AWS WAF. running it first can save hours of debugging blocked requests.

why WAF detection matters for scrapers

different WAFs call for different bypass strategies. Cloudflare’s JS challenge needs a headless browser. Akamai Bot Manager fingerprints client behavior and the TLS handshake, so the scraper has to present a browser-like TLS fingerprint. Sucuri can usually be handled with residential proxies and realistic headers. knowing the WAF before you write the scraper means you pick the right approach from the start.

WhatWaf was created by Ekultek and has over 3,800 GitHub stars. it sends a series of test payloads and analyzes the response headers, body, and status codes to fingerprint the WAF. it supports both direct URL testing and integration into Python scripts.
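to make the fingerprinting idea concrete, here is a minimal sketch of signature matching against a single HTTP response. the `SIGNATURES` table, the `BLOCK_STATUSES` set, and the scoring are illustrative assumptions, not WhatWaf's actual rules or code.

```python
import re
from typing import Dict, Mapping

# Illustrative signatures only (assumption); WhatWaf's real rules differ.
SIGNATURES = {
    "Cloudflare": {
        "headers": {"server": r"cloudflare", "cf-ray": r".+"},
        "body": r"attention required! \| cloudflare",
    },
    "Sucuri": {
        "headers": {"server": r"sucuri", "x-sucuri-id": r".+"},
        "body": r"sucuri website firewall",
    },
}

# Status codes that often accompany a block page (assumption).
BLOCK_STATUSES = {403, 406, 429, 503}


def fingerprint(status: int, headers: Mapping[str, str], body: str) -> Dict[str, int]:
    """Return {waf_name: number_of_matching_signals} for one response."""
    lowered = {k.lower(): v for k, v in headers.items()}
    scores: Dict[str, int] = {}
    for name, sig in SIGNATURES.items():
        # Count every header whose value matches the signature pattern.
        hits = sum(
            1
            for key, pattern in sig["headers"].items()
            if key in lowered and re.search(pattern, lowered[key], re.I)
        )
        # A matching block-page body counts as one more signal.
        if re.search(sig["body"], body, re.I):
            hits += 1
        # A block status only supports a match, it never creates one.
        if hits and status in BLOCK_STATUSES:
            hits += 1
        if hits:
            scores[name] = hits
    return scores
```

a real detector sends several payloads and combines the evidence from all responses; this sketch scores only one response to keep the idea visible.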

installation

git clone https://github.com/Ekultek/WhatWaf.git
cd WhatWaf
pip install -r requirements.txt

WhatWaf requires Python 3.6+. it uses requests, beautifulsoup4, and colorlog. on first run it downloads the latest WAF signatures from its GitHub repository.
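before the first run, a quick check like the one below can confirm that the interpreter and the dependencies named above are in place. it is a sketch: the function names are hypothetical, and the module list simply mirrors the packages mentioned in this section (`beautifulsoup4` is imported as `bs4`).

```python
import importlib.util
import sys

# Import names for the packages listed above (beautifulsoup4 installs as "bs4").
REQUIRED_MODULES = ["requests", "bs4", "colorlog"]


def python_is_supported(minimum=(3, 6)):
    """True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.6 or newer is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("missing modules:", ", ".join(missing))
```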

basic command-line usage

the simplest usage is passing a URL with the -u flag. WhatWaf will return the detected WAF name and confidence score.

# test a single URL
python whatwaf.py -u https://target-site.com

# test with a random user agent
python whatwaf.py -u https://target-site.com --ra

# batch test from a file of URLs
python whatwaf.py -l urls.txt

# output results to JSON (option names can differ between releases; confirm with --help)
python whatwaf.py -u https://target-site.com --format json -o results.json
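for batch runs, the URL file can be generated from your crawl list. the sketch below keeps one URL per hostname, on the assumption that a WAF is usually configured per host, so repeated paths add little. the helper names and the `urls.txt` default are illustrative.

```python
from urllib.parse import urlsplit


def one_url_per_host(urls):
    """Keep the first URL seen for each hostname, preserving input order."""
    seen = set()
    kept = []
    for raw in urls:
        url = raw.strip()
        if not url:
            continue
        # hostname is None for inputs without a scheme, e.g. "example.com/page"
        host = urlsplit(url).hostname
        if host is None or host in seen:
            continue
        seen.add(host)
        kept.append(url)
    return kept


def write_url_list(urls, path="urls.txt"):
    """Write the deduplicated URLs, one per line, for a batch run."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(one_url_per_host(urls)) + "\n")
```

if a site applies stricter rules to specific paths (login or search endpoints, for example), keep those URLs as separate entries instead of deduplicating them away.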

example output for a Cloudflare-protected site:

[+] detected WAF: Cloudflare
[+] confidence: 95%
[+] detected tamper scripts: between, randomcase, space2comment
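if you only have the text output, it can be turned into a dictionary with a small parser. this sketch assumes the exact `[+] key: value` lines shown in the example above; the output of your WhatWaf version may be formatted differently, so treat the key names as assumptions.

```python
import re

# Matches lines such as "[+] confidence: 95%".
LINE = re.compile(r"^\[\+\]\s*(?P<key>[^:]+):\s*(?P<value>.*)$")


def parse_report(text):
    """Parse the example-style text report into a dict."""
    report = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        key = match.group("key").strip().lower()
        value = match.group("value").strip()
        if key == "detected waf":
            report["waf"] = value
        elif key == "confidence":
            report["confidence"] = int(value.rstrip("%"))
        elif key == "detected tamper scripts":
            report["tampers"] = [t.strip() for t in value.split(",") if t.strip()]
    return report
```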

interpreting results

the confidence score falls into three tiers. above 80% means the signature match is reliable. between 50% and 80% means more than one WAF could match. below 50% usually means the site uses a custom or uncommon WAF. use the JSON output to build automated decision logic in your scraping pipeline.

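import json
import subprocess
import sys
import tempfile
from pathlib import Path

WHATWAF_DIR = "/path/to/WhatWaf"  # adjust to the location of your clone

def detect_waf(url, timeout=120):
    """Run WhatWaf against url and return its parsed JSON report."""
    with tempfile.TemporaryDirectory() as tmp:
        out_file = Path(tmp) / "waf.json"  # per-call file, safe for parallel runs
        # sys.executable reuses the interpreter (and virtualenv) running this script
        result = subprocess.run(
            [sys.executable, "whatwaf.py", "-u", url, "--format", "json", "-o", str(out_file)],
            capture_output=True, text=True, cwd=WHATWAF_DIR, timeout=timeout,
        )
        # fail loudly instead of reading a missing or stale report
        if result.returncode != 0 or not out_file.exists():
            raise RuntimeError(f"WhatWaf failed: {result.stderr.strip()}")
        return json.loads(out_file.read_text())

waf_info = detect_waf("https://example.com")
print(waf_info)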
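the three confidence tiers described above can be encoded as a small decision helper. this is a sketch: the tier labels and the suggested next steps are assumptions for illustration, and the score is passed in as a plain number because the field names in the JSON report are not shown here.

```python
def confidence_tier(score):
    """Map a 0-100 confidence score to one of the three tiers."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score > 80:
        return "reliable"   # signature match can be trusted
    if score >= 50:
        return "ambiguous"  # more than one WAF could match
    return "unknown"        # likely a custom or uncommon WAF


def next_step(waf_name, score):
    """Suggest what the pipeline should do with a detection (illustrative)."""
    tier = confidence_tier(score)
    if tier == "reliable":
        return f"use the {waf_name} strategy"
    if tier == "ambiguous":
        return f"re-test, then try the {waf_name} strategy with a fallback"
    return "treat as unknown WAF and probe manually"
```

for example, `next_step("Cloudflare", 95)` selects the Cloudflare strategy, while a score of 40 routes the target to manual inspection.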

supported WAFs

WhatWaf detects 80+ WAFs including: Cloudflare, Sucuri, Akamai, AWS WAF, Imperva Incapsula, F5 BIG-IP ASM, Barracuda, Citrix NetScaler, Reblaze, StackPath, ModSecurity, and many more. the full list is in the content/wafdetector.py file in the repository.

using WhatWaf programmatically

for integration into a scraping framework, you can import WhatWaf’s detection module directly. this avoids subprocess overhead and lets you run detection in-process.

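import sys
sys.path.insert(0, "/path/to/WhatWaf")  # make the clone importable

# module and class names depend on the WhatWaf version you cloned;
# confirm them against your checkout before relying on this import
from lib.core.waf_detections import WafDetections

def check_protection(url):
    """Return WhatWaf's detections for url without spawning a subprocess."""
    detector = WafDetections(url)
    detections = detector.get_page()
    return detections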

bypass strategy selection based on WAF type

once you know the WAF, apply the right bypass strategy. Cloudflare: use Playwright with stealth mode or a browser fingerprint-spoofing library. Sucuri: residential proxies and realistic headers. Akamai: rotate TLS fingerprints using curl-impersonate or tls-client. ModSecurity: encode payloads and avoid SQLi-like patterns in parameters.
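the mapping above can be written as a lookup table so the pipeline picks a strategy automatically. this is a minimal sketch: the strategy strings restate the paragraph above, while `DEFAULT_STRATEGY` and the substring matching are assumptions added for illustration.

```python
# Strategy per WAF, restating the recommendations above.
STRATEGIES = {
    "cloudflare": "headless browser such as Playwright with stealth mode",
    "sucuri": "residential proxies and realistic headers",
    "akamai": "rotated TLS fingerprints via curl-impersonate or tls-client",
    "modsecurity": "encoded parameters with no SQLi-like patterns",
}

# Fallback for WAFs not listed above (assumption, not from the article).
DEFAULT_STRATEGY = "plain HTTP client; escalate only if requests are blocked"


def pick_strategy(waf_name):
    """Return the strategy whose key appears in the detected WAF name."""
    name = (waf_name or "").lower()
    for key, strategy in STRATEGIES.items():
        # Substring match so "Akamai Bot Manager" still maps to "akamai".
        if key in name:
            return strategy
    return DEFAULT_STRATEGY
```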

for more on proxy types relevant to bypassing WAFs, see what is a proxy server and SOCKS5 vs HTTP proxy. for scraping fundamentals, see what is web scraping.