How to Bypass DataDome When Web Scraping

DataDome is one of the most aggressive anti-bot solutions on the market. It protects over 10,000 websites, including major ecommerce platforms, ticketing sites, and classified marketplaces. If you’ve ever hit a page that suddenly shows a CAPTCHA or returns a 403 after your first few requests, there’s a good chance DataDome is behind it.

This guide breaks down how DataDome actually detects scrapers and what you can do to get around it. Every technique includes working Python code so you can test against your target site immediately.

How DataDome Detects Scrapers

Before you can bypass DataDome, you need to understand what it looks for. DataDome uses a layered detection approach that combines server-side and client-side signals.

Server-Side Detection

On the server side, DataDome analyzes:

  • IP reputation – datacenter IPs from AWS, GCP, Azure, and other cloud providers are flagged immediately. DataDome maintains a massive database of known proxy and VPN IP ranges.
  • Request rate – sending too many requests from a single IP triggers rate limiting. DataDome tracks requests per IP across all its clients, meaning an IP flagged on site A may already be flagged on site B.
  • Header consistency – mismatched or missing HTTP headers are a dead giveaway. DataDome checks whether your headers match what a real browser would send.
  • TLS fingerprint – the way your HTTP client negotiates the TLS handshake reveals whether you’re using a real browser or a library like requests or urllib (see the snippet after this list).
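
You can see the TLS side of this for yourself. The sketch below fetches a public fingerprint-echo endpoint twice, once with the standard requests library and once with curl_cffi impersonation; tls.browserleaks.com is just one convenient echo service, and the JSON field names are its, not DataDome’s.

from curl_cffi import requests as cffi_requests
import requests  # the standard client, with a non-browser TLS stack

# the same endpoint reports very different JA3 hashes for the two clients
plain = requests.get("https://tls.browserleaks.com/json").json()
impersonated = cffi_requests.get(
    "https://tls.browserleaks.com/json", impersonate="chrome124"
).json()

print("plain requests ja3:", plain.get("ja3_hash"))
print("curl_cffi chrome124 ja3:", impersonated.get("ja3_hash"))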

Client-Side Detection

DataDome injects JavaScript on every page that collects:

  • Browser fingerprint – canvas rendering, WebGL hash, audio context, installed fonts, screen resolution, timezone, and language settings
  • Mouse and keyboard events – real users move their mouse, scroll, and interact with pages; bots typically don’t (see the sketch after this list)
  • Cookie handling – DataDome sets and reads cookies to track sessions. Scrapers that don’t handle cookies properly get caught.
  • JavaScript execution – DataDome’s JS challenge must execute successfully. Tools that don’t run JavaScript fail this check entirely.
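
If you end up driving a real browser (Method 2 below), generating some of these interaction events helps. Here is a minimal sketch using Playwright’s mouse API; the coordinates, counts, and delays are arbitrary placeholder values, not tuned thresholds.

import asyncio
import random
from playwright.async_api import async_playwright

async def act_human(page):
    """Generate the mouse and scroll events that fingerprinting scripts listen for."""
    # glide the mouse through a few random points instead of teleporting it
    for _ in range(random.randint(3, 6)):
        await page.mouse.move(
            random.randint(100, 1200),
            random.randint(100, 700),
            steps=random.randint(10, 25),  # emit intermediate move events
        )
        await page.wait_for_timeout(random.randint(200, 800))
    # scroll in small increments, the way a reader would
    for _ in range(random.randint(2, 4)):
        await page.mouse.wheel(0, random.randint(200, 600))
        await page.wait_for_timeout(random.randint(400, 1200))

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto("https://example.com")
        await act_human(page)
        await browser.close()

asyncio.run(main())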

Method 1: Using curl_cffi with TLS Fingerprint Impersonation

The easiest way to get past DataDome’s TLS fingerprinting is curl_cffi, which can impersonate real browser TLS fingerprints.

from curl_cffi import requests

# impersonate Chrome's TLS fingerprint
session = requests.Session(impersonate="chrome124")

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "Connection": "keep-alive",
}

response = session.get(
    "https://datadome-protected-site.com",
    headers=headers
)

print(f"Status: {response.status_code}")
print(f"DataDome cookie set: {'datadome' in response.cookies}")

This works for sites where DataDome relies primarily on TLS fingerprinting and header checks. For sites with the full JavaScript challenge enabled, you’ll need a browser-based approach.
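
Before escalating, it helps to know whether a given response is actually a DataDome block. Here is a small heuristic helper; the markers are based on commonly observed DataDome block pages (the challenge is typically served from captcha-delivery.com), so treat them as assumptions rather than a complete list.

def looks_blocked(response):
    """Heuristic: does this response look like a DataDome challenge?"""
    if response.status_code in (403, 429):
        return True
    # DataDome commonly serves its challenge widget from captcha-delivery.com
    return "captcha-delivery.com" in response.text.lower()

# reusing the response from the snippet above
if looks_blocked(response):
    print("DataDome challenge detected; switch to a browser-based approach")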

Method 2: Browser Automation with Playwright

For sites where DataDome runs its full JavaScript challenge, you need a real browser. Playwright with stealth tweaks is the most reliable approach.

import asyncio
from playwright.async_api import async_playwright

async def scrape_with_playwright(url, proxy=None):
    async with async_playwright() as p:
        browser_args = [
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
        ]

        launch_options = {
            "headless": False,  # DataDome often blocks headless browsers
            "args": browser_args,
        }

        if proxy:
            launch_options["proxy"] = {
                "server": proxy["server"],
                "username": proxy.get("username", ""),
                "password": proxy.get("password", ""),
            }

        browser = await p.chromium.launch(**launch_options)

        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
            locale="en-US",
            timezone_id="America/New_York",
        )

        # remove navigator.webdriver flag
        await context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)

        page = await context.new_page()

        # navigate and wait for DataDome's JS to execute
        await page.goto(url, wait_until="networkidle")

        # wait extra time for DataDome challenge resolution
        await page.wait_for_timeout(3000)

        content = await page.content()
        await browser.close()

        return content

# usage
html = asyncio.run(scrape_with_playwright(
    "https://datadome-protected-site.com",
    proxy={"server": "http://proxy-server:port", "username": "user", "password": "pass"}
))
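
A useful hybrid once the browser has passed the challenge: export the datadome cookie and reuse it in fast curl_cffi requests. This is a sketch under a few assumptions; the cookie may be bound to the IP that earned it (so route both clients through the same proxy), and the exact cookie-jar API can differ between curl_cffi versions.

import asyncio
from curl_cffi import requests as cffi_requests
from playwright.async_api import async_playwright

async def get_datadome_cookie(url):
    """Let a real browser pass the JS challenge, then export its cookies."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(url, wait_until="networkidle")
        await page.wait_for_timeout(3000)  # give the challenge time to resolve
        cookies = await context.cookies()
        await browser.close()
    # DataDome's session cookie is literally named "datadome"
    return next((c for c in cookies if c["name"] == "datadome"), None)

dd = asyncio.run(get_datadome_cookie("https://datadome-protected-site.com"))
if dd:
    session = cffi_requests.Session(impersonate="chrome124")
    session.cookies.set("datadome", dd["value"], domain=dd["domain"])
    response = session.get("https://datadome-protected-site.com/products")
    print(f"Status with browser-earned cookie: {response.status_code}")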

Why Headless Mode Often Fails

DataDome specifically tests for headless browser indicators:

  • navigator.webdriver returns true in automated browsers (headless or not) unless it is explicitly removed
  • The Chrome DevTools Protocol leaves traces
  • Canvas and WebGL rendering differs between headless and headed modes
  • Certain JavaScript APIs behave differently in headless environments
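
You can compare the two modes yourself. The sketch below launches Chromium both ways and dumps a few of the signals a detection script can read; everything here is a standard web API, nothing DataDome-specific.

import asyncio
from playwright.async_api import async_playwright

SIGNALS_JS = """() => ({
    webdriver: navigator.webdriver,
    languages: navigator.languages,
    platform: navigator.platform,
    hardwareConcurrency: navigator.hardwareConcurrency,
    screen: [screen.width, screen.height],
})"""

async def dump_signals():
    async with async_playwright() as p:
        for headless in (True, False):
            browser = await p.chromium.launch(headless=headless)
            page = await browser.new_page()
            print("headless" if headless else "headed", await page.evaluate(SIGNALS_JS))
            await browser.close()

asyncio.run(dump_signals())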

You can test your browser’s fingerprint visibility using the Browser Fingerprint Tester on dataresearchtools.com to see exactly which signals are being leaked.

Method 3: Proxy Rotation with Residential IPs

DataDome’s IP reputation system is extremely effective against datacenter IPs. Residential proxies are essential for any serious scraping of DataDome-protected sites.

from curl_cffi import requests
import random
import time

class DataDomeScraper:
    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.session = requests.Session(impersonate="chrome124")
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-User": "?1",
        }

    def get_random_proxy(self):
        proxy = random.choice(self.proxy_list)
        return {"http": proxy, "https": proxy}

    def scrape(self, url, max_retries=3):
        for attempt in range(max_retries):
            proxy = self.get_random_proxy()
            try:
                response = self.session.get(
                    url,
                    headers=self.headers,
                    proxies=proxy,
                    timeout=30
                )

                if response.status_code == 200:
                    return response

                if response.status_code == 403:
                    print(f"attempt {attempt + 1}: blocked (403). rotating proxy...")
                    time.sleep(random.uniform(2, 5))
                    continue

            except Exception as e:
                print(f"attempt {attempt + 1}: error - {e}")
                time.sleep(random.uniform(1, 3))

        return None

# usage with residential proxy list
proxies = [
    "http://user:pass@residential1.proxy.com:port",
    "http://user:pass@residential2.proxy.com:port",
    "http://user:pass@residential3.proxy.com:port",
]

scraper = DataDomeScraper(proxies)
result = scraper.scrape("https://target-site.com/products")

When choosing proxies, compare costs across providers using the Proxy Cost Calculator to find the best value for your volume. Also note that the scraper above keeps a single cookie jar while rotating IPs; if the target binds the datadome cookie to the IP that earned it, start a fresh session whenever you switch proxies.

Method 4: Cookie and Session Management

DataDome tracks sessions using cookies, so proper cookie management is critical to avoid detection.

from curl_cffi import requests
import json

class DataDomeSessionManager:
    def __init__(self, proxy):
        self.proxy = {"http": proxy, "https": proxy}
        self.session = requests.Session(impersonate="chrome124")
        self.cookies = {}

    def initialize_session(self, base_url):
        """visit the homepage first to get initial DataDome cookies"""
        response = self.session.get(
            base_url,
            proxies=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Accept-Language": "en-US,en;q=0.9",
            }
        )

        # store DataDome cookie
        if "datadome" in self.session.cookies:
            self.cookies["datadome"] = self.session.cookies["datadome"]
            print(f"DataDome cookie acquired: {self.cookies['datadome'][:20]}...")

        return response.status_code == 200

    def scrape_page(self, url):
        """scrape a page using the established session"""
        response = self.session.get(
            url,
            proxies=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Accept-Language": "en-US,en;q=0.9",
                "Referer": url.rsplit("/", 1)[0] + "/",
            }
        )
        return response

# usage
manager = DataDomeSessionManager("http://user:pass@residential.proxy.com:port")
if manager.initialize_session("https://target-site.com"):
    result = manager.scrape_page("https://target-site.com/products/page/1")
    print(f"Got {len(result.text)} bytes")

Method 5: Solving DataDome CAPTCHAs

When DataDome presents a CAPTCHA challenge, you can use a CAPTCHA-solving service. DataDome typically uses its own custom slider CAPTCHA rather than reCAPTCHA or hCaptcha.

import requests
import time

def solve_datadome_captcha(page_url, captcha_url, api_key):
    """solve DataDome CAPTCHA using a solving service"""

    # submit the captcha
    submit_response = requests.post(
        "https://api.captcha-service.com/createTask",
        json={
            "clientKey": api_key,
            "task": {
                "type": "DataDomeSliderTask",
                "websiteURL": page_url,
                "captchaUrl": captcha_url,
                "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "proxy": "http://user:pass@proxy:port"
            }
        }
    )

    task_id = submit_response.json()["taskId"]

    # poll for result
    for _ in range(30):
        time.sleep(3)
        result = requests.post(
            "https://api.captcha-service.com/getTaskResult",
            json={"clientKey": api_key, "taskId": task_id}
        )
        data = result.json()
        if data["status"] == "ready":
            return data["solution"]["cookie"]

    return None
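
A hedged usage sketch follows. The task type and field names in the function above mirror the general shape of anticaptcha-style APIs, so check your provider’s documentation for the real ones; the challenge URL itself typically appears in the body of the blocked response (see the extractor in the detection section below), and every value here is a placeholder.

from curl_cffi import requests as cffi_requests

# placeholder values; pull the real captcha URL from the blocked response
cookie_value = solve_datadome_captcha(
    page_url="https://datadome-protected-site.com/products",
    captcha_url="https://geo.captcha-delivery.com/captcha/?initialCid=...",
    api_key="YOUR_API_KEY",
)

if cookie_value:
    # inject the solved cookie, then retry with a matching TLS fingerprint
    session = cffi_requests.Session(impersonate="chrome124")
    session.cookies.set("datadome", cookie_value, domain=".datadome-protected-site.com")
    response = session.get("https://datadome-protected-site.com/products")
    print(response.status_code)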

Timing and Rate Limiting

DataDome monitors request patterns closely. Here are the timing strategies that work:

import random
import time

def human_delay():
    """simulate human-like delays between requests"""
    # base delay of 3-8 seconds
    base = random.uniform(3, 8)
    # occasionally add a longer pause (simulating reading)
    if random.random() < 0.15:
        base += random.uniform(10, 30)
    return base

def scrape_with_timing(urls, scraper):
    results = []
    for i, url in enumerate(urls):
        delay = human_delay()
        print(f"[{i+1}/{len(urls)}] waiting {delay:.1f}s before next request...")
        time.sleep(delay)

        result = scraper.scrape(url)
        if result:
            results.append(result)
        else:
            # if blocked, take a longer break
            print("blocked. cooling down for 60s...")
            time.sleep(60)

    return results

Detecting DataDome on a Website

Before investing time in bypass techniques, confirm that the site actually uses DataDome.

from curl_cffi import requests

def detect_datadome(url):
    """check if a website uses DataDome protection"""
    session = requests.Session(impersonate="chrome124")
    response = session.get(url)

    indicators = {
        "datadome_cookie": "datadome" in dict(response.cookies),
        "datadome_js": "datadome" in response.text.lower(),
        "dd_script": "js.datadome.co" in response.text,
        "dd_header": "x-datadome" in {k.lower(): v for k, v in response.headers.items()},
    }

    is_datadome = any(indicators.values())
    print(f"DataDome detected: {is_datadome}")
    for check, result in indicators.items():
        print(f"  {check}: {result}")

    return is_datadome

detect_datadome("https://example-site.com")
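
If the site is protected and you are already being blocked, the 403 body often embeds the challenge URL that Method 5 needs. Here is a heuristic extractor, assuming the commonly observed geo.captcha-delivery.com pattern; the exact response shape varies by configuration.

import re

def extract_captcha_url(body):
    """Pull the DataDome challenge URL out of a blocked response body."""
    # blocked pages commonly reference geo.captcha-delivery.com
    match = re.search(r'https://geo\.captcha-delivery\.com/[^"\'\s\\]+', body)
    return match.group(0) if match else None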

Common Mistakes That Get You Blocked

These are the errors I see most often when people try to scrape DataDome-protected sites:

  1. Using datacenter proxies – DataDome blocks nearly all datacenter IP ranges. Residential or mobile proxies are required.
  2. Not handling cookies – DataDome sets tracking cookies on every request. Ignoring them flags you as a bot immediately.
  3. Consistent request intervals – sending requests at exactly 5-second intervals is obviously automated. Randomize your delays.
  4. Ignoring TLS fingerprints – using the Python requests library without curl_cffi exposes a non-browser TLS fingerprint that DataDome catches instantly.
  5. Scraping too fast – even with perfect fingerprinting, scraping 100 pages per minute from a single IP will trigger rate limits.
  6. Not rotating user agents – using the same User-Agent string for thousands of requests is suspicious. Rotate between 5-10 realistic browser strings, and keep each one paired with a matching TLS fingerprint (see the sketch after this list).
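
A minimal sketch of point 6, assuming curl_cffi impersonation targets; available target names vary by installed curl_cffi version, so verify them against your release. The point is to rotate the User-Agent and the TLS fingerprint together so they never contradict each other.

import random
from curl_cffi import requests

# each impersonation target is paired with a User-Agent that tells the same
# story; the target names here are examples and depend on your curl_cffi version
BROWSER_PROFILES = [
    ("chrome124", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"),
    ("chrome123", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"),
]

def new_session():
    target, ua = random.choice(BROWSER_PROFILES)
    session = requests.Session(impersonate=target)
    session.headers["User-Agent"] = ua
    return session

session = new_session()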

Choosing the Right Proxy Type for DataDome

Proxy Type        Success Rate   Cost          Best For
Datacenter        ~5%            Low           Not recommended for DataDome
Residential       ~70-85%        Medium        High-volume scraping
Mobile (4G/5G)    ~90-95%        High          Maximum success rate
ISP Proxies       ~75-85%        Medium-High   Consistent sessions
Mobile proxies have the highest success rate because DataDome trusts mobile carrier IPs more than any other proxy type. The tradeoff is cost and speed.

Summary

Bypassing DataDome requires a combination of techniques working together:

  • Use curl_cffi or a real browser for proper TLS fingerprinting
  • Rotate residential or mobile proxies for every session
  • Manage cookies and maintain proper session state
  • Add realistic timing between requests
  • Use headful (not headless) browsers when JavaScript challenges appear
  • Solve CAPTCHAs through external services when they’re triggered

No single technique will work on its own. DataDome’s layered detection means you need to pass every check simultaneously. Start with the TLS fingerprint plus residential proxy approach, and escalate to full browser automation only if needed, since browser-based scraping is significantly slower and more resource-intensive.
