How to Bypass DataDome When Web Scraping
DataDome is one of the most aggressive anti-bot solutions on the market. It protects over 10,000 websites, including major ecommerce platforms, ticketing sites, and classified marketplaces. If you’ve ever hit a page that suddenly shows a CAPTCHA or returns a 403 after your first few requests, there’s a good chance DataDome is behind it.
This guide breaks down how DataDome actually detects scrapers and what you can do to get around it. Every technique includes working Python code so you can test against your target site immediately.
How DataDome Detects Scrapers
Before you can bypass DataDome, you need to understand what it looks for. DataDome uses a layered detection approach that combines server-side and client-side signals.
Server-Side Detection
On the server side, DataDome analyzes:
- IP reputation – Datacenter IPs from AWS, GCP, Azure, and other cloud providers are flagged immediately. DataDome maintains a massive database of known proxy and VPN IP ranges.
- Request rate – Sending too many requests from a single IP triggers rate limiting. DataDome tracks requests per IP across all its clients, meaning that if your IP was flagged on site A, it might already be flagged on site B.
- Header consistency – Mismatched or missing HTTP headers are a dead giveaway. DataDome checks whether your headers match what a real browser would send.
- TLS fingerprint – The way your HTTP client negotiates the TLS handshake reveals whether you’re using a real browser or a library like requests or urllib.
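To make the header-consistency point concrete, here is a rough sketch of how such a check could work on the server side; the expected-header set and function name are illustrative, not DataDome’s actual implementation:

```python
# a rough sketch of a server-side header-consistency check; the expected set
# is illustrative, not DataDome's actual rule list
EXPECTED_CHROME_HEADERS = {
    "user-agent", "accept", "accept-language", "accept-encoding",
    "sec-fetch-dest", "sec-fetch-mode", "sec-fetch-site",
    "upgrade-insecure-requests",
}

def headers_look_like_chrome(headers):
    """Return True only if every header a real Chrome navigation sends is present."""
    present = {name.lower() for name in headers}
    return EXPECTED_CHROME_HEADERS <= present

# python-requests' default headers fail this check immediately
requests_defaults = {
    "User-Agent": "python-requests/2.31.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "*/*",
    "Connection": "keep-alive",
}
print(headers_look_like_chrome(requests_defaults))  # False
```

The point is not the exact rule list but the principle: a missing Sec-Fetch-* or Accept-Language header is enough to separate most HTTP libraries from real browsers before any JavaScript ever runs.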
Client-Side Detection
DataDome injects JavaScript on every page that collects:
- Browser fingerprint – Canvas rendering, WebGL hash, audio context, installed fonts, screen resolution, timezone, and language settings.
- Mouse and keyboard events – Real users move their mouse, scroll, and interact with pages; bots typically don’t.
- Cookie handling – DataDome sets and reads cookies to track sessions. Scrapers that don’t handle cookies properly get caught.
- JavaScript execution – DataDome’s JS challenge must execute successfully. Tools that don’t run JavaScript fail this check entirely.
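The collected signals are typically folded into a single fingerprint hash that can be compared across visits. A simplified sketch of that idea, with made-up field names and values (this is not DataDome’s actual algorithm):

```python
import hashlib
import json

def fingerprint_hash(signals):
    """Combine collected browser signals into one stable fingerprint hash."""
    # serialize with sorted keys so the same signals always produce the same hash
    canonical = json.dumps(signals, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

signals = {
    "canvas": "a91f0c2b",               # hash of a canvas rendering
    "webgl": "ANGLE (NVIDIA GeForce)",  # WebGL renderer string
    "screen": "1920x1080",
    "timezone": "America/New_York",
    "languages": ["en-US", "en"],
}
print(fingerprint_hash(signals)[:16])
```

Because the hash is stable, changing any single signal (say, the timezone) yields a completely different fingerprint, which is why spoofing signals inconsistently can be worse than not spoofing at all.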
Method 1: Using curl_cffi with TLS Fingerprint Impersonation
The easiest way to get past DataDome’s TLS fingerprinting is to use curl_cffi, which can impersonate real browser TLS fingerprints.
```python
from curl_cffi import requests

# impersonate Chrome's TLS fingerprint
session = requests.Session(impersonate="chrome124")

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "Connection": "keep-alive",
}

response = session.get(
    "https://datadome-protected-site.com",
    headers=headers,
)

print(f"Status: {response.status_code}")
print(f"DataDome cookie set: {'datadome' in response.cookies}")
```
This works for sites where DataDome relies primarily on TLS fingerprinting and header checks. For sites with full JavaScript challenges enabled, you’ll need a browser-based approach.
Method 2: Browser Automation with Playwright
For sites where DataDome runs its full JavaScript challenge, you need a real browser. Playwright with stealth plugins is the most reliable approach.
```python
import asyncio

from playwright.async_api import async_playwright


async def scrape_with_playwright(url, proxy=None):
    async with async_playwright() as p:
        browser_args = [
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--no-sandbox",
        ]
        launch_options = {
            "headless": False,  # DataDome often blocks headless browsers
            "args": browser_args,
        }
        if proxy:
            launch_options["proxy"] = {
                "server": proxy["server"],
                "username": proxy.get("username", ""),
                "password": proxy.get("password", ""),
            }

        browser = await p.chromium.launch(**launch_options)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
            locale="en-US",
            timezone_id="America/New_York",
        )

        # remove the navigator.webdriver flag
        await context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });
        """)

        page = await context.new_page()

        # navigate and wait for DataDome's JS to execute
        await page.goto(url, wait_until="networkidle")

        # wait extra time for the DataDome challenge to resolve
        await page.wait_for_timeout(3000)

        content = await page.content()
        await browser.close()
        return content


# usage
html = asyncio.run(scrape_with_playwright(
    "https://datadome-protected-site.com",
    proxy={"server": "http://proxy-server:port", "username": "user", "password": "pass"}
))
```
Why Headless Mode Often Fails
DataDome specifically tests for headless browser indicators:
- navigator.webdriver returns true in headless mode
- The Chrome DevTools Protocol leaves traces
- Canvas and WebGL rendering differs between headless and headed modes
- Certain JavaScript APIs behave differently in headless environments
You can test your browser’s fingerprint visibility using the Browser Fingerprint Tester on dataresearchtools.com to see exactly which signals are being leaked.
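You can also probe these leaks from the scraper’s side. The sketch below pairs with the Playwright code from Method 2; the probe list reflects commonly cited headless indicators, not DataDome’s exact tests:

```python
# JS expressions that are commonly cited as headless-Chrome giveaways;
# each should be falsy in a normal headed browser session
HEADLESS_PROBES = {
    "webdriver_flag": "navigator.webdriver === true",
    "no_plugins": "navigator.plugins.length === 0",
    "no_languages": "navigator.languages.length === 0",
}

async def check_headless_leaks(page):
    """Evaluate each probe in the live page; any True result is a leaked signal.

    `page` is a Playwright Page, e.g. the one created in Method 2.
    """
    leaks = {}
    for name, expression in HEADLESS_PROBES.items():
        leaks[name] = await page.evaluate(expression)
    return {name: result for name, result in leaks.items() if result}
```

Run it right after page.goto(); an empty dict means none of these particular probes fired, though DataDome checks far more signals than this.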
Method 3: Proxy Rotation with Residential IPs
DataDome’s IP reputation system is extremely effective against datacenter IPs. Residential proxies are essential for any serious scraping of DataDome-protected sites.
```python
from curl_cffi import requests
import random
import time


class DataDomeScraper:
    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.session = requests.Session(impersonate="chrome124")
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-User": "?1",
        }

    def get_random_proxy(self):
        proxy = random.choice(self.proxy_list)
        return {"http": proxy, "https": proxy}

    def scrape(self, url, max_retries=3):
        for attempt in range(max_retries):
            proxy = self.get_random_proxy()
            try:
                response = self.session.get(
                    url,
                    headers=self.headers,
                    proxies=proxy,
                    timeout=30,
                )
                if response.status_code == 200:
                    return response
                if response.status_code == 403:
                    print(f"attempt {attempt + 1}: blocked (403). rotating proxy...")
                    time.sleep(random.uniform(2, 5))
                    continue
            except Exception as e:
                print(f"attempt {attempt + 1}: error - {e}")
                time.sleep(random.uniform(1, 3))
        return None


# usage with a residential proxy list
proxies = [
    "http://user:pass@residential1.proxy.com:port",
    "http://user:pass@residential2.proxy.com:port",
    "http://user:pass@residential3.proxy.com:port",
]

scraper = DataDomeScraper(proxies)
result = scraper.scrape("https://target-site.com/products")
```
When choosing proxies, compare costs across providers using the Proxy Cost Calculator to find the best value for your volume.
Method 4: Cookie and Session Management
DataDome tracks sessions using cookies, so proper cookie management is critical to avoid detection.
```python
from curl_cffi import requests


class DataDomeSessionManager:
    def __init__(self, proxy):
        self.proxy = {"http": proxy, "https": proxy}
        self.session = requests.Session(impersonate="chrome124")
        self.cookies = {}

    def initialize_session(self, base_url):
        """Visit the homepage first to get initial DataDome cookies."""
        response = self.session.get(
            base_url,
            proxies=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Accept-Language": "en-US,en;q=0.9",
            },
        )
        # store the DataDome cookie
        if "datadome" in self.session.cookies:
            self.cookies["datadome"] = self.session.cookies["datadome"]
            print(f"DataDome cookie acquired: {self.cookies['datadome'][:20]}...")
        return response.status_code == 200

    def scrape_page(self, url):
        """Scrape a page using the established session."""
        response = self.session.get(
            url,
            proxies=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
                "Accept-Language": "en-US,en;q=0.9",
                "Referer": url.rsplit("/", 1)[0] + "/",
            },
        )
        return response


# usage
manager = DataDomeSessionManager("http://user:pass@residential.proxy.com:port")
if manager.initialize_session("https://target-site.com"):
    result = manager.scrape_page("https://target-site.com/products/page/1")
    print(f"Got {len(result.text)} bytes")
```
Method 5: Solving DataDome CAPTCHAs
When DataDome presents a CAPTCHA challenge, you can use a CAPTCHA-solving service. DataDome typically serves its own custom slider CAPTCHA, not reCAPTCHA or hCaptcha.
```python
import requests
import time


def solve_datadome_captcha(page_url, captcha_url, api_key):
    """Solve a DataDome CAPTCHA using a solving service."""
    # submit the captcha task
    submit_response = requests.post(
        "https://api.captcha-service.com/createTask",
        json={
            "clientKey": api_key,
            "task": {
                "type": "DataDomeSliderTask",
                "websiteURL": page_url,
                "captchaUrl": captcha_url,
                "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                "proxy": "http://user:pass@proxy:port",
            },
        },
    )
    task_id = submit_response.json()["taskId"]

    # poll for the result
    for _ in range(30):
        time.sleep(3)
        result = requests.post(
            "https://api.captcha-service.com/getTaskResult",
            json={"clientKey": api_key, "taskId": task_id},
        )
        data = result.json()
        if data["status"] == "ready":
            return data["solution"]["cookie"]
    return None
```
Timing and Rate Limiting
DataDome monitors request patterns closely. Here are timing strategies that work:
```python
import random
import time


def human_delay():
    """Simulate human-like delays between requests."""
    # base delay of 3-8 seconds
    base = random.uniform(3, 8)
    # occasionally add a longer pause (simulating reading)
    if random.random() < 0.15:
        base += random.uniform(10, 30)
    return base


def scrape_with_timing(urls, scraper):
    results = []
    for i, url in enumerate(urls):
        delay = human_delay()
        print(f"[{i+1}/{len(urls)}] waiting {delay:.1f}s before next request...")
        time.sleep(delay)

        result = scraper.scrape(url)
        if result:
            results.append(result)
        else:
            # if blocked, take a longer break
            print("blocked. cooling down for 60s...")
            time.sleep(60)
    return results
```
Detecting DataDome on a Website
Before investing time in bypass techniques, confirm that the site actually uses DataDome.
```python
from curl_cffi import requests


def detect_datadome(url):
    """Check whether a website uses DataDome protection."""
    session = requests.Session(impersonate="chrome124")
    response = session.get(url)

    indicators = {
        "datadome_cookie": "datadome" in dict(response.cookies),
        "datadome_js": "datadome" in response.text.lower(),
        "dd_script": "js.datadome.co" in response.text,
        "dd_header": "x-datadome" in {k.lower(): v for k, v in response.headers.items()},
    }

    is_datadome = any(indicators.values())
    print(f"DataDome detected: {is_datadome}")
    for check, result in indicators.items():
        print(f"  {check}: {result}")
    return is_datadome


detect_datadome("https://example-site.com")
```
Common Mistakes That Get You Blocked
These are the errors I see most often when people try to scrape DataDome-protected sites:
- Using datacenter proxies – DataDome blocks nearly all datacenter IP ranges. Residential or mobile proxies are required.
- Not handling cookies – DataDome sets tracking cookies on every request. Ignoring them flags you as a bot immediately.
- Consistent request intervals – Sending requests at exactly 5-second intervals is obviously automated. Randomize your delays.
- Ignoring TLS fingerprints – Using the Python requests library without curl_cffi exposes a non-browser TLS fingerprint that DataDome catches instantly.
- Scraping too fast – Even with perfect fingerprinting, scraping 100 pages per minute from a single IP will trigger rate limits.
- Not rotating user agents – Using the same User-Agent string for thousands of requests is suspicious. Rotate between 5-10 realistic browser strings.
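The last point can be handled with a small rotation helper. A minimal sketch; the strings are example Chrome builds that should be refreshed periodically, and the pool is kept all-Chrome so the User-Agent stays consistent with the chrome124 TLS impersonation used throughout this guide:

```python
import random

# small pool of realistic Chrome user agents on different platforms;
# all Chrome so the string stays consistent with chrome124 TLS impersonation
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def random_user_agent():
    """Pick a user agent at random for each new session."""
    return random.choice(USER_AGENTS)

headers = {"User-Agent": random_user_agent()}
```

Rotate per session, not per request; changing the User-Agent mid-session while keeping the same DataDome cookie is itself a red flag.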
Choosing the Right Proxy Type for DataDome
| Proxy Type | Success Rate | Cost | Best For |
|---|---|---|---|
| Datacenter | ~5% | Low | Not recommended for DataDome |
| Residential | ~70-85% | Medium | High-volume scraping |
| Mobile (4G/5G) | ~90-95% | High | Maximum success rate |
| ISP Proxies | ~75-85% | Medium-High | Consistent sessions |
Mobile proxies have the highest success rate because DataDome trusts mobile carrier IPs more than any other type. The tradeoff is cost and speed.
Summary
Bypassing DataDome requires a combination of techniques working together:
- Use curl_cffi or a real browser for proper TLS fingerprinting
- Rotate residential or mobile proxies for every session
- Manage cookies and maintain proper session state
- Add realistic timing between requests
- Use headful (not headless) browsers when JavaScript challenges appear
- Solve CAPTCHAs through external services when they’re triggered
No single technique works on its own. DataDome’s layered detection means you need to pass every check simultaneously. Start with the TLS fingerprint + residential proxy approach and escalate to full browser automation only if needed, since browser-based scraping is significantly slower and more resource-intensive.