Headless Browser + Proxy Setup: The Anti-Detection Stack

Why Headless Browsers Are Now a Scraping Requirement

Five years ago, most web scraping could be done with HTTP request libraries like Python’s requests or Node’s axios. You sent a GET request, received HTML, parsed it, and moved on. That era is over for any non-trivial scraping target.

Modern websites rely on JavaScript to render content, load data via AJAX calls, and implement anti-bot protections that require a real browser environment to bypass. Anti-bot systems like Cloudflare, HUMAN (formerly PerimeterX), and Akamai Bot Manager check for browser APIs, execute JavaScript challenges, and validate TLS fingerprints that only a real browser can produce.

A headless browser is a web browser that runs without a visible graphical interface. It executes JavaScript, renders CSS, handles cookies, processes redirects, and produces a browser fingerprint: everything a real browser does, just without drawing anything on screen. When configured correctly and routed through a quality proxy, a headless browser is nearly indistinguishable from a real user.

Chrome Headless vs Puppeteer vs Playwright

The three main options for headless browser automation each have distinct strengths.

Chrome Headless (Direct)

Running Chrome with the --headless flag gives you a full Chrome browser without the GUI. You interact with it via the Chrome DevTools Protocol (CDP).
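
As a rough illustration, the following Python sketch launches headless Chrome, discovers the DevTools websocket endpoint, and sends a raw CDP command. The chrome binary name and the websocket-client dependency are assumptions about your environment, not requirements of CDP itself.

import json
import subprocess
import time
import urllib.request

import websocket  # third-party: pip install websocket-client

# Launch Chrome with no GUI and a DevTools endpoint on port 9222.
# The binary may be named google-chrome or chromium on your system.
subprocess.Popen([
    "chrome", "--headless=new", "--remote-debugging-port=9222",
    "--user-data-dir=/tmp/cdp-profile",
])
time.sleep(2)  # crude wait for the DevTools server to come up

# List open targets over HTTP, then attach to the first one's websocket.
targets = json.load(urllib.request.urlopen("http://localhost:9222/json"))
ws = websocket.create_connection(targets[0]["webSocketDebuggerUrl"])

# Send a raw CDP command: navigate the attached page.
ws.send(json.dumps({"id": 1, "method": "Page.navigate",
                    "params": {"url": "https://example.com"}}))
print(ws.recv())  # the reply is JSON keyed to the same id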

Pros:

  • Exact same rendering engine as real Chrome
  • Full access to all Chrome features
  • Smallest abstraction layer (direct CDP access)

Cons:

  • Low-level API requires more code for common tasks
  • No built-in convenience functions for scraping patterns
  • Managing browser lifecycle is your responsibility
  • Headless Chrome has detectable differences from headed Chrome (this matters for anti-bot evasion)

Puppeteer

Puppeteer is Google’s official Node.js library for controlling Chrome via the Chrome DevTools Protocol, wrapping the raw CDP in a high-level API.

Pros:

  • Well-documented, mature ecosystem
  • Large community and extensive plugin ecosystem
  • Good TypeScript support
  • Tight integration with Chrome updates

Cons:

  • Node.js only (though there are unofficial Python ports)
  • Chrome/Chromium only (no Firefox or WebKit)
  • Some default behaviors are detectable (Puppeteer adds identifiable properties to the browser)
  • Resource management can be tricky at scale

Playwright

Playwright is Microsoft’s browser automation library, designed as a modern successor to Puppeteer. It supports multiple languages and multiple browsers.

Pros:

  • Supports Chromium, Firefox, and WebKit (Safari’s engine)
  • Available in Node.js, Python, Java, and .NET
  • Better auto-wait mechanics reduce flaky scripts
  • Strong context isolation at low cost (multiple isolated browser contexts share a single browser process)
  • Built-in proxy support per context (different proxies for different scraping tasks in the same browser instance)
  • Network interception is more robust than Puppeteer

Cons:

  • Slightly younger ecosystem than Puppeteer
  • Uses its own patched browser builds (not stock Chrome), which can have subtle fingerprint differences

The Recommendation

For new scraping projects in 2026, Playwright is the stronger choice. Its multi-browser support, built-in proxy configuration, and context isolation make it superior for scraping workloads. The Python API is particularly well-designed for data practitioners who are already working in Python.
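
To make the comparison concrete, here is a minimal Playwright sketch in Python; the target URL is a placeholder.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # The same script drives all three engines by swapping one attribute.
    for engine in (p.chromium, p.firefox, p.webkit):
        browser = engine.launch(headless=True)
        page = browser.new_page()
        page.goto("https://example.com")
        print(engine.name, page.title())
        browser.close()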

Proxy Integration with Headless Browsers

Routing your headless browser through a proxy is the foundation of the anti-detection stack.

Playwright Proxy Configuration

Playwright supports proxy configuration at two levels: browser-wide and per-context.

Browser-level proxy applies to all pages opened by that browser instance:

browser = playwright.chromium.launch(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)

Context-level proxy allows different proxy configurations for different scraping tasks within the same browser process:

context = browser.new_context(
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "pass"
    }
)

Context-level proxies are powerful for multi-account operations where each account needs a different IP.
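
A sketch of that pattern with two hypothetical proxy endpoints (note that some Chromium builds require a placeholder proxy at launch for per-context proxies to take effect; check the Playwright docs for your version):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # Each context has its own proxy, cookies, and storage, yet all
    # contexts share one browser process.
    for server in ("http://us1.proxy.example.com:8080",
                   "http://de1.proxy.example.com:8080"):
        context = browser.new_context(
            proxy={"server": server, "username": "user", "password": "pass"}
        )
        page = context.new_page()
        page.goto("https://httpbin.org/ip")
        print(page.inner_text("body"))  # each context reports its own IP
        context.close()
    browser.close()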

Puppeteer Proxy Configuration

Puppeteer sets the proxy at the browser launch level:

const browser = await puppeteer.launch({
    args: ['--proxy-server=http://proxy.example.com:8080']
});
const page = await browser.newPage();
await page.authenticate({ username: 'user', password: 'pass' });

For authenticated proxies, supply credentials with page.authenticate() on each page, as shown above; Chromium does not support inline credentials in the proxy URL.

SOCKS5 vs HTTP Proxies

Both Playwright and Puppeteer support SOCKS5 and HTTP/HTTPS proxies. For web scraping:

  • HTTP/HTTPS proxies: Simpler setup, work with most proxy providers, slightly more overhead per request
  • SOCKS5 proxies: Lower overhead, support for non-HTTP traffic, but fewer proxy providers offer them
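
In both libraries the scheme in the server string selects the proxy type. One caveat worth knowing: Chromium does not send credentials to SOCKS5 proxies, so authenticated SOCKS5 generally does not work. A Playwright sketch of both forms:

# HTTP proxy with credentials:
browser = p.chromium.launch(
    proxy={"server": "http://proxy.example.com:8080",
           "username": "user", "password": "pass"}
)

# SOCKS5 proxy (no username/password support in Chromium):
browser = p.chromium.launch(
    proxy={"server": "socks5://proxy.example.com:1080"}
)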

For mobile proxies from DataResearchTools, HTTP proxy connections are the standard and provide the most reliable integration with headless browsers.

Stealth Plugins: Making Headless Browsers Undetectable

Out of the box, headless browsers are detectable. Anti-bot systems check for specific properties that differ between headless and headed browser environments.

What Gets Detected

Without stealth configuration, headless browsers expose:

  • navigator.webdriver property set to true
  • Missing or incorrect navigator.plugins array (headed Chrome has plugins, headless often does not)
  • Chrome-specific properties like window.chrome being absent or incomplete
  • Inconsistent screen dimensions and color depth
  • Missing or incorrect permissions API responses
  • WebGL renderer string revealing software rendering instead of hardware GPU
  • Canvas fingerprint anomalies from software rendering
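
You can read these properties from inside the page to see exactly what a detection script sees. A minimal self-check sketch:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    # Evaluate the same probes an anti-bot script would run.
    print(page.evaluate("""() => ({
        webdriver: navigator.webdriver,
        plugins: navigator.plugins.length,
        chrome: typeof window.chrome,
        webglRenderer: (() => {
            const gl = document.createElement('canvas').getContext('webgl');
            const ext = gl && gl.getExtension('WEBGL_debug_renderer_info');
            return ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null;
        })(),
    })"""))
    browser.close()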

Puppeteer Stealth Plugin

The puppeteer-extra-plugin-stealth package patches Puppeteer to fix these detectable differences:

  • Overrides navigator.webdriver to false
  • Adds realistic navigator.plugins and navigator.mimeTypes
  • Patches window.chrome to match headed Chrome
  • Fixes iframe contentWindow access patterns
  • Overrides Permissions API responses
  • Patches WebGL vendor and renderer strings

This plugin handles the most commonly checked detection vectors, but it is not a complete solution. Some anti-bot systems have evolved beyond these checks.

Playwright Stealth

Playwright does not have an official stealth plugin, but several community options exist:

  • playwright-extra with puppeteer-extra-plugin-stealth adapted for Playwright
  • playwright-stealth (Python package)
  • Manual patching via addInitScript to override detectable properties before page JavaScript executes
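
A minimal example of the manual approach in Python; this patches only one detection vector, where real stealth packages cover dozens:

context = browser.new_context()
# Runs before any page script, so detection code never observes the
# automation default of navigator.webdriver === true.
context.add_init_script(
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)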

Beyond Stealth Plugins

Stealth plugins handle the low-hanging fruit. For hard targets, you need additional measures:

  • Custom browser builds: Compile Chromium with modifications that remove headless-specific behaviors at the engine level
  • Real browser profiles: Import actual browser profiles (with extensions, history, bookmarks) to create realistic browser environments
  • Hardware-backed rendering: Run headless browsers on machines with real GPUs to produce authentic WebGL and Canvas fingerprints

Fingerprint Management

Browser fingerprinting is a multi-dimensional identification technique. Managing your fingerprint across scraping sessions is critical for avoiding detection.

Key Fingerprint Components

Component | What It Reveals | How to Control
User-Agent | Browser version, OS | Rotate realistic UAs
Screen resolution | Device type | Match UA to resolution
Timezone | Geographic location | Match to proxy location
Language | User locale | Match to proxy location
WebGL renderer | GPU hardware | Spoof or use real GPU
Canvas hash | Rendering engine | Varies by OS/GPU
Audio context | Audio hardware | Spoof fingerprint
Font list | Installed fonts | Use OS-appropriate fonts
Platform | Operating system | Match to UA
Hardware concurrency | CPU cores | Set realistic values
Device memory | RAM amount | Set realistic values

Fingerprint Consistency

The most common mistake is creating an internally inconsistent fingerprint. Sending a Windows User-Agent but reporting a Mac-specific font list, or claiming to be an iPhone but reporting a screen resolution that no iPhone has, is immediately suspicious.

Rules for consistent fingerprints (a sketch follows the list):

  1. Every fingerprint component must be consistent with the others
  2. The User-Agent, platform, screen resolution, and available fonts must correspond to a real device
  3. The timezone and language must match your proxy’s geographic location
  4. WebGL and Canvas output should be consistent with the claimed GPU
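
A sketch of one internally consistent profile in Playwright: every value below describes the same hypothetical device, a Windows desktop behind a US proxy.

context = browser.new_context(
    user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/124.0.0.0 Safari/537.36"),
    viewport={"width": 1920, "height": 1080},  # common Windows desktop size
    locale="en-US",                  # matches the proxy's country
    timezone_id="America/New_York",  # matches the proxy's location
)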

Fingerprint Rotation

Just as you rotate IPs, you should rotate fingerprints. But fingerprint rotation follows different rules (see the sketch after this list):

  • Rotate fingerprints when you rotate to a new IP (new IP should equal new user)
  • Keep fingerprint consistent within a sticky session
  • Maintain a library of pre-built consistent fingerprint profiles
  • Each profile should represent a real device configuration (iPhone 15 on iOS 18, MacBook Pro on macOS Sequoia, etc.)
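
A sketch of such a library with illustrative values, rotated together with the proxy:

import random

# Hypothetical profile library: each entry is a complete, internally
# consistent device description.
PROFILES = [
    {
        "user_agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) "
                       "AppleWebKit/605.1.15 (KHTML, like Gecko) "
                       "Version/17.5 Mobile/15E148 Safari/604.1"),
        "viewport": {"width": 393, "height": 852},
        "locale": "en-US",
        "timezone_id": "America/New_York",
    },
    {
        "user_agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/124.0.0.0 Safari/537.36"),
        "viewport": {"width": 1440, "height": 900},
        "locale": "en-US",
        "timezone_id": "America/Los_Angeles",
    },
]

def new_session(browser, proxy):
    # New IP == new user: draw a fresh profile with each proxy rotation.
    profile = random.choice(PROFILES)
    return browser.new_context(proxy=proxy, **profile)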

Detecting Headless Detection

How do you know if a target site is detecting your headless browser? Monitor for these signals.

Direct Detection Indicators

  • Receiving CAPTCHAs on pages that do not show them to real users
  • Being redirected to bot detection pages
  • Receiving empty or different content than a real browser sees
  • HTTP 403 or 429 responses on pages that load normally in a real browser
  • JavaScript challenge pages that loop infinitely

Subtle Detection Indicators

  • Response content differs slightly from what a real browser receives (missing elements, different ad content)
  • Slower response times (may indicate request is being held for additional analysis)
  • Different cookies being set compared to a real browser session
  • Missing or different response headers

Testing Your Setup

Before deploying at scale, validate your headless browser setup against detection test sites:

  • bot.sannysoft.com: Tests common headless browser detection vectors
  • browserleaks.com: Shows your browser’s full fingerprint
  • pixelscan.net: Evaluates fingerprint consistency and detects proxy usage
  • creepjs: Advanced fingerprinting detection

Compare the results from your headless browser against a real browser on the same machine to identify discrepancies.
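
A simple way to make that comparison repeatable is to screenshot the report from your automated setup and diff it against the same page in a regular browser:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://bot.sannysoft.com", wait_until="networkidle")
    page.screenshot(path="headless-report.png", full_page=True)
    browser.close()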

Resource Optimization

Headless browsers are resource-intensive. A single Chrome instance uses 200-500 MB of RAM. At scale, resource optimization is critical.

Memory Management

  • Limit concurrent pages: Each tab consumes additional memory. Close pages when done.
  • Use browser contexts: Playwright’s browser contexts share a single browser process, using less memory than separate browser instances.
  • Block unnecessary resources: Intercept and block images, fonts, CSS, and media files that you do not need for data extraction. This can reduce memory usage by 40-60%.
  • Periodic restart: Chromium has known memory leaks. Restart browser instances every 50-100 pages, as in the sketch below.
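
A sketch of the restart pattern; MAX_PAGES is an assumed tuning knob, not a Playwright setting:

MAX_PAGES = 75  # adjust per workload and observed memory growth

def scrape_all(p, urls):
    browser, served = None, 0
    for url in urls:
        # Recycle the browser process periodically to bound leak growth.
        if browser is None or served >= MAX_PAGES:
            if browser:
                browser.close()
            browser, served = p.chromium.launch(headless=True), 0
        page = browser.new_page()
        try:
            page.goto(url)
            # ... extract data here ...
        finally:
            page.close()  # close tabs promptly to release memory
            served += 1
    if browser:
        browser.close()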

Network Optimization

Block unnecessary network requests to reduce bandwidth and speed up page loads:

  • Block image loading (unless you need images)
  • Block font downloads
  • Block analytics and tracking scripts (Google Analytics, Facebook Pixel, etc.)
  • Block ad network requests
  • Block video and audio content

This reduces page load time by 50-80% and significantly reduces proxy bandwidth consumption.
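
A sketch using Playwright's request interception; the blocked host list is illustrative, not exhaustive:

BLOCKED_TYPES = {"image", "font", "media"}
BLOCKED_HOSTS = ("google-analytics.com", "googletagmanager.com",
                 "facebook.net", "doubleclick.net")

def filter_request(route):
    req = route.request
    if (req.resource_type in BLOCKED_TYPES
            or any(host in req.url for host in BLOCKED_HOSTS)):
        route.abort()      # drop the request entirely
    else:
        route.continue_()  # let everything else through

context.route("**/*", filter_request)  # applies to every page in the context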

CPU Optimization

  • Disable animations: CSS animations consume CPU cycles without providing scraping value
  • Disable smooth scrolling: Use instant scroll when scrolling is needed for content loading
  • Avoid unnecessary rendering: If you only need API response data (intercepted via network monitoring), you can navigate with minimal rendering
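
For example, a sketch that captures a JSON endpoint's payload directly; the /api/ path is a placeholder for the target's real endpoint:

captured = []

def on_response(response):
    # Keep the structured payload instead of parsing rendered HTML.
    if "/api/" in response.url and response.ok:
        captured.append(response.json())

page.on("response", on_response)
page.goto("https://example.com/listing")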

Scaling Architecture

For production scraping operations:

  • Container-based: Run each browser instance in a Docker container with resource limits
  • Pool management: Maintain a pool of warm browser instances rather than launching and closing for each task
  • Horizontal scaling: Distribute browser instances across multiple machines
  • Queue-based workload: Decouple URL generation from browser-based scraping to manage concurrency

Putting It All Together

The complete anti-detection stack for production scraping:

  1. Playwright with stealth patches for browser automation
  2. Mobile proxies from DataResearchTools for high-trust IP addresses
  3. Consistent fingerprint profiles that match your proxy’s geographic and device characteristics
  4. Network interception to block unnecessary resources and capture API responses
  5. Proxy rotation coordinated with fingerprint rotation
  6. Monitoring to detect degradation before it becomes blocking
  7. Rate limiting to stay within sustainable request budgets
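
A condensed sketch wiring the first four items together; the proxy endpoint, fingerprint values, and target URL are all placeholders:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        # 2. Proxy: substitute your provider's endpoint and credentials.
        proxy={"server": "http://proxy.example.com:8080",
               "username": "user", "password": "pass"},
        # 3. Fingerprint: every field agrees with the others and the proxy.
        user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/124.0.0.0 Safari/537.36"),
        viewport={"width": 1920, "height": 1080},
        locale="en-US",
        timezone_id="America/New_York",
    )
    # 1. Stealth patch, applied before any page script runs.
    context.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )
    # 4. Network interception: block heavy resources.
    context.route("**/*", lambda route: route.abort()
                  if route.request.resource_type in {"image", "font", "media"}
                  else route.continue_())
    page = context.new_page()
    page.goto("https://example.com")
    browser.close()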

This stack handles the vast majority of anti-bot defenses deployed on the web today, including Cloudflare, HUMAN, Akamai, and DataDome. For a deeper understanding of how these systems work, see our guide on how anti-bot detection systems identify scrapers.

Start with Playwright, add a mobile proxy, apply stealth patches, and validate against detection test sites before scaling to production workloads. Explore our web scraping proxy solutions for proxy infrastructure that integrates seamlessly with headless browser setups.

