Playwright Proxy Configuration: Step-by-Step Scraping Guide

Playwright has become the preferred browser automation tool for many scraping teams, and for good reason. It offers native support for per-context proxy configuration, built-in authentication handling, multi-browser support (Chromium, Firefox, WebKit), and a modern async API. Where Selenium requires workarounds for proxy authentication and Puppeteer lacks per-page proxy switching, Playwright handles both natively.

This guide covers Playwright proxy configuration from basic setup through advanced patterns including per-context routing, request interception, and production-ready scraper architectures. Examples are provided in both Python and Node.js.

Why Playwright for Proxy-Based Scraping

Playwright was built by the team that created Puppeteer, and it addresses many of Puppeteer’s limitations. For proxy-based scraping specifically, Playwright offers several advantages.

Native Proxy Authentication

Unlike Selenium and Puppeteer, Playwright supports proxy authentication as a first-class feature. No extensions, no third-party libraries, no local proxy forwarders — just pass credentials directly in the proxy configuration.

Browser Contexts with Independent Proxies

Playwright’s browser context model lets you create isolated browsing sessions within a single browser instance. Each context can have its own proxy, cookies, storage, and viewport. This means you can scrape multiple sites with different proxies simultaneously without launching multiple browsers.

Multi-Browser Support

Test your scraping setup across Chromium, Firefox, and WebKit from the same codebase. Different anti-bot systems may respond differently to different browsers, and Playwright lets you switch with a single parameter change.

Basic Proxy Setup

Python

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        proxy={
            'server': 'http://proxy-host:proxy-port',
            'username': 'your-username',
            'password': 'your-password'
        }
    )

    page = browser.new_page()
    page.goto('https://httpbin.org/ip')
    print(page.content())
    browser.close()

Node.js

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    headless: true,
    proxy: {
      server: 'http://proxy-host:proxy-port',
      username: 'your-username',
      password: 'your-password'
    }
  });

  const page = await browser.newPage();
  await page.goto('https://httpbin.org/ip');
  console.log(await page.content());
  await browser.close();
})();

When you set the proxy at the browser level, all pages and contexts created from that browser instance use the same proxy. This is the simplest configuration and works for single-proxy scraping tasks.

SOCKS5 Proxy

browser = p.chromium.launch(
    proxy={
        'server': 'socks5://proxy-host:proxy-port'
    }
)

Playwright handles SOCKS5 natively, including DNS resolution through the proxy. One limitation to note: the browsers themselves do not support SOCKS5 proxy authentication, so username/password credentials work only with HTTP(S) proxies. For SOCKS5, use a provider that authenticates via IP whitelisting instead.
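Many providers hand out proxy endpoints as single URLs with embedded credentials, while Playwright wants the server and the credentials as separate fields. A small stdlib-only helper (the function name here is ours, not part of Playwright's API) bridges the two:

```python
from urllib.parse import urlparse

def proxy_url_to_playwright(url: str) -> dict:
    """Split a proxy URL with embedded credentials into the separate
    'server'/'username'/'password' fields Playwright expects."""
    parsed = urlparse(url)
    config = {'server': f'{parsed.scheme}://{parsed.hostname}:{parsed.port}'}
    if parsed.username:
        config['username'] = parsed.username
    if parsed.password:
        config['password'] = parsed.password
    return config
```

The resulting dict can be passed directly as the `proxy` argument to `launch()` or `new_context()`.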

Per-Context Proxy Configuration

This is where Playwright shines compared to other tools. You can assign different proxies to different browser contexts within a single browser instance.

Python Example

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Launch browser WITHOUT a proxy -- proxies are set per context
    browser = p.chromium.launch(headless=True)

    # Context 1: Singapore mobile proxy
    context_sg = browser.new_context(
        proxy={
            'server': 'http://sg-mobile-proxy:port',
            'username': 'user_sg',
            'password': 'pass_sg'
        },
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    )

    # Context 2: US mobile proxy
    context_us = browser.new_context(
        proxy={
            'server': 'http://us-mobile-proxy:port',
            'username': 'user_us',
            'password': 'pass_us'
        },
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    )

    # Scrape with different proxies simultaneously
    page_sg = context_sg.new_page()
    page_us = context_us.new_page()

    page_sg.goto('https://example.sg/products')
    page_us.goto('https://example.com/products')

    sg_content = page_sg.content()
    us_content = page_us.content()

    context_sg.close()
    context_us.close()
    browser.close()

Node.js Example

const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });

  // Create contexts with different proxies
  const contextSG = await browser.newContext({
    proxy: {
      server: 'http://sg-mobile-proxy:port',
      username: 'user_sg',
      password: 'pass_sg'
    }
  });

  const contextUS = await browser.newContext({
    proxy: {
      server: 'http://us-mobile-proxy:port',
      username: 'user_us',
      password: 'pass_us'
    }
  });

  const pageSG = await contextSG.newPage();
  const pageUS = await contextUS.newPage();

  await Promise.all([
    pageSG.goto('https://example.sg/products'),
    pageUS.goto('https://example.com/products')
  ]);

  // Extract data from both...

  await browser.close();
})();

Memory Advantage

Each browser context uses significantly less memory than a separate browser instance. A Chromium browser with 10 contexts uses roughly 500 MB, compared to 1.5-3 GB for 10 separate browser instances. For multi-account scraping operations that need distinct proxies per account, this is a major advantage.

Learn more about managing multiple accounts with proxies in our multi-account proxy guide.

Request Interception

Playwright’s route API lets you intercept, modify, or block requests. This is useful for reducing bandwidth, injecting headers, and debugging proxy issues.

Blocking Unnecessary Resources

def block_resources(route, request):
    blocked_types = ['image', 'stylesheet', 'font', 'media']
    if request.resource_type in blocked_types:
        route.abort()
    else:
        route.continue_()

page = context.new_page()
page.route('**/*', block_resources)
page.goto('https://example.com')

Blocking images, fonts, and stylesheets can reduce bandwidth consumption by 50-70%, which directly reduces proxy costs when paying per GB.
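To see what that saving means in dollars, here is a rough, stdlib-only estimator. The function and the sample numbers are illustrative, not any provider's actual pricing:

```python
def monthly_proxy_cost(pages: int, kb_per_page: float,
                       price_per_gb: float,
                       blocked_fraction: float = 0.0) -> float:
    """Estimate monthly proxy spend in dollars.

    blocked_fraction is the share of bandwidth saved by blocking heavy
    resources (e.g. 0.6 for the 60% midpoint of the range above)."""
    gb = pages * kb_per_page * (1 - blocked_fraction) / (1024 * 1024)
    return round(gb * price_per_gb, 2)

# e.g. 100k pages/month at ~1 MB each: blocking half the bandwidth
# roughly halves the per-GB bill.
```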

Modifying Request Headers

def add_custom_headers(route, request):
    headers = {
        **request.headers,
        'Accept-Language': 'en-SG,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br'
    }
    route.continue_(headers=headers)

page.route('**/*', add_custom_headers)

Intercepting API Responses

Sometimes the data you need is in API responses loaded by the page, not in the rendered HTML. Playwright can intercept these:

from playwright.sync_api import sync_playwright
import json

api_data = []

def handle_response(response):
    if '/api/products' in response.url:
        try:
            data = response.json()
            api_data.append(data)
        except Exception:
            pass

with sync_playwright() as p:
    browser = p.chromium.launch(
        proxy={'server': 'http://proxy:port', 'username': 'user', 'password': 'pass'}
    )

    page = browser.new_page()
    page.on('response', handle_response)
    page.goto('https://example.com/products')
    page.wait_for_timeout(5000)  # Wait for API calls to complete

    print(json.dumps(api_data, indent=2))
    browser.close()

This technique is often more efficient than parsing rendered HTML, especially for SPAs that load data via JSON APIs.

Proxy Rotation Strategies

Context-Based Rotation

Create a new context with a different proxy for each batch of requests:

from playwright.sync_api import sync_playwright
import random

proxies = [
    {'server': 'http://proxy1:port', 'username': 'user1', 'password': 'pass1'},
    {'server': 'http://proxy2:port', 'username': 'user2', 'password': 'pass2'},
    {'server': 'http://proxy3:port', 'username': 'user3', 'password': 'pass3'},
]

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)

    urls = ['https://example.com/page1', 'https://example.com/page2', ...]
    context = None

    for i, url in enumerate(urls):
        if i % 5 == 0:  # New context every 5 URLs
            if context is not None:
                context.close()
            proxy = random.choice(proxies)
            context = browser.new_context(proxy=proxy)

        page = context.new_page()
        page.goto(url, wait_until='networkidle')
        content = page.content()
        # Process content...
        page.close()

    browser.close()
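Random selection can hit the same proxy several times in a row. A round-robin pool (a stdlib-only sketch; the hostnames are placeholders) spreads load evenly across the list:

```python
from itertools import cycle

proxies = [
    {'server': 'http://proxy1:8080'},
    {'server': 'http://proxy2:8080'},
    {'server': 'http://proxy3:8080'},
]

proxy_pool = cycle(proxies)  # endless round-robin iterator

def next_proxy() -> dict:
    """Return the next proxy config in rotation, wrapping around
    when the list is exhausted. Pass the result to new_context()."""
    return next(proxy_pool)
```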

Rotating Gateway

With a rotating proxy gateway, the provider handles IP rotation server-side:

browser = p.chromium.launch(
    proxy={
        'server': 'http://rotating-gateway.provider.com:port',
        'username': 'user',
        'password': 'pass'
    }
)

# Each new context or page may get a different IP
for url in urls:
    context = browser.new_context()
    page = context.new_page()
    page.goto(url)
    # Scrape...
    context.close()

Sticky Session Rotation

For scraping that requires session continuity (login-based scraping, pagination), use sticky sessions:

# Use a proxy endpoint that supports sticky sessions via session ID in username
context = browser.new_context(
    proxy={
        'server': 'http://sticky-gateway.provider.com:port',
        'username': 'user-session-abc123',  # Session ID keeps same IP
        'password': 'pass'
    }
)

page = context.new_page()
# All requests in this context use the same IP
page.goto('https://example.com/login')
# ... login ...
page.goto('https://example.com/dashboard')
# ... scrape multiple pages ...
context.close()
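Session-ID formats vary by provider; assuming the `<user>-session-<id>` convention shown above, a small helper can mint sticky identities on demand (the function name is ours; check your provider's docs for the exact format):

```python
import random
import string
from typing import Optional

def sticky_username(base_user: str, session_id: Optional[str] = None) -> str:
    """Build a sticky-session username in the '<user>-session-<id>' form.

    When no session_id is given, a random 8-character one is generated,
    which effectively requests a fresh IP from the gateway."""
    if session_id is None:
        session_id = ''.join(
            random.choices(string.ascii_lowercase + string.digits, k=8))
    return f"{base_user}-session-{session_id}"
```

Reuse the same session ID to pin one IP across a login flow; mint a new one to rotate.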

Anti-Detection with Playwright

Playwright is harder to detect than stock Selenium, but it still requires configuration to avoid fingerprinting.

Stealth Configuration

context = browser.new_context(
    viewport={'width': 1920, 'height': 1080},
    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    locale='en-SG',
    timezone_id='Asia/Singapore',
    proxy={'server': 'http://sg-proxy:port', 'username': 'user', 'password': 'pass'}
)

page = context.new_page()

# Remove automation indicators before any page script runs
page.add_init_script("""
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    window.chrome = { runtime: {} };
""")

Playwright Stealth Plugin (Node.js)

const { chromium } = require('playwright-extra');
const stealth = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before launching
chromium.use(stealth());

const browser = await chromium.launch({
  headless: true,
  proxy: {
    server: 'http://mobile-proxy:port',
    username: 'user',
    password: 'pass'
  }
});

Geographic Consistency

When using a Singapore mobile proxy, your browser context should reflect a Singapore user:

context = browser.new_context(
    proxy={'server': 'http://sg-mobile-proxy:port', 'username': 'user', 'password': 'pass'},
    locale='en-SG',
    timezone_id='Asia/Singapore',
    geolocation={'latitude': 1.3521, 'longitude': 103.8198},
    permissions=['geolocation'],
    viewport={'width': 1920, 'height': 1080}
)

Anti-bot systems cross-reference IP geolocation with browser timezone and locale. A mismatch is a detection signal. For more on how specific anti-bot platforms analyze these signals, see our guides on Cloudflare bypass and Akamai bypass.
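One way to enforce that consistency is a lookup table keyed by the proxy's exit country. The profiles below are illustrative (the US entry and its coordinates are our assumption); extend the table with the locations your provider actually offers:

```python
# Geo profiles keyed by proxy exit country -- each entry bundles the
# locale, timezone, and coordinates that should accompany that IP.
GEO_PROFILES = {
    'SG': {
        'locale': 'en-SG',
        'timezone_id': 'Asia/Singapore',
        'geolocation': {'latitude': 1.3521, 'longitude': 103.8198},
    },
    'US': {
        'locale': 'en-US',
        'timezone_id': 'America/New_York',
        'geolocation': {'latitude': 40.7128, 'longitude': -74.0060},
    },
}

def context_options(country: str, proxy: dict) -> dict:
    """Build kwargs for browser.new_context() so locale, timezone,
    and geolocation all match the proxy's exit country."""
    return {
        'proxy': proxy,
        'permissions': ['geolocation'],
        'viewport': {'width': 1920, 'height': 1080},
        **GEO_PROFILES[country],
    }
```

Usage: `context = browser.new_context(**context_options('SG', sg_proxy))`.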

Production Scraper: Python Async

Here is a production-ready async scraper using Playwright with mobile proxies:

import asyncio
from playwright.async_api import async_playwright
import logging
import random

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class PlaywrightProxyScraper:
    def __init__(self, proxy_config, concurrency=3):
        self.proxy_config = proxy_config
        self.concurrency = concurrency
        self.browser = None
        self.playwright = None

    async def start(self):
        self.playwright = await async_playwright().start()
        self.browser = await self.playwright.chromium.launch(headless=True)

    async def scrape_url(self, url, semaphore):
        async with semaphore:
            context = await self.browser.new_context(
                proxy=self.proxy_config,
                viewport={'width': 1920, 'height': 1080},
                user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) '
                           'Chrome/120.0.0.0 Safari/537.36',
                locale='en-SG',
                timezone_id='Asia/Singapore'
            )

            page = await context.new_page()

            # Block heavy resources
            await page.route('**/*.{png,jpg,jpeg,gif,svg,css,woff,woff2}',
                           lambda route: route.abort())

            try:
                await asyncio.sleep(random.uniform(0.5, 2.0))
                await page.goto(url, wait_until='networkidle', timeout=30000)

                title = await page.title()
                if any(x in title.lower() for x in ['blocked', 'denied', 'captcha']):
                    logger.warning(f"Blocked on {url}")
                    return {'url': url, 'success': False, 'content': None}

                content = await page.content()
                logger.info(f"Scraped {url}")
                return {'url': url, 'success': True, 'content': content}

            except Exception as e:
                logger.error(f"Failed {url}: {e}")
                return {'url': url, 'success': False, 'content': None}

            finally:
                await context.close()

    async def scrape_many(self, urls):
        semaphore = asyncio.Semaphore(self.concurrency)
        tasks = [self.scrape_url(url, semaphore) for url in urls]
        return await asyncio.gather(*tasks)

    async def close(self):
        if self.browser:
            await self.browser.close()
        if self.playwright:
            await self.playwright.stop()

# Usage
async def main():
    scraper = PlaywrightProxyScraper(
        proxy_config={
            'server': 'http://mobile-proxy.example.com:port',
            'username': 'your-username',
            'password': 'your-password'
        },
        concurrency=5
    )

    await scraper.start()

    urls = [f'https://example.com/page/{i}' for i in range(1, 51)]
    results = await scraper.scrape_many(urls)

    successful = sum(1 for r in results if r['success'])
    logger.info(f"Scraped {successful}/{len(urls)} pages successfully")

    await scraper.close()

asyncio.run(main())

Production Scraper: Node.js

const { chromium } = require('playwright');

class PlaywrightScraper {
  constructor(proxyConfig, concurrency = 3) {
    this.proxyConfig = proxyConfig;
    this.concurrency = concurrency;
    this.browser = null;
  }

  async start() {
    this.browser = await chromium.launch({ headless: true });
  }

  async scrapeUrl(url) {
    const context = await this.browser.newContext({
      proxy: this.proxyConfig,
      viewport: { width: 1920, height: 1080 },
      userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
      locale: 'en-SG',
      timezoneId: 'Asia/Singapore'
    });

    const page = await context.newPage();
    await page.route('**/*.{png,jpg,jpeg,gif,svg,css,woff,woff2}',
      route => route.abort()
    );

    try {
      await page.waitForTimeout(Math.random() * 1500 + 500);
      await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });

      const content = await page.content();
      return { url, success: true, content };
    } catch (err) {
      console.error(`Failed ${url}: ${err.message}`);
      return { url, success: false, content: null };
    } finally {
      await context.close();
    }
  }

  async scrapeMany(urls) {
    const results = [];
    for (let i = 0; i < urls.length; i += this.concurrency) {
      const batch = urls.slice(i, i + this.concurrency);
      const batchResults = await Promise.all(
        batch.map(url => this.scrapeUrl(url))
      );
      results.push(...batchResults);
    }
    return results;
  }

  async close() {
    if (this.browser) await this.browser.close();
  }
}

// Usage
(async () => {
  const scraper = new PlaywrightScraper({
    server: 'http://mobile-proxy.example.com:port',
    username: 'your-username',
    password: 'your-password'
  }, 5);

  await scraper.start();

  const urls = Array.from({ length: 50 }, (_, i) =>
    `https://example.com/page/${i + 1}`
  );

  const results = await scraper.scrapeMany(urls);
  const successful = results.filter(r => r.success).length;
  console.log(`Scraped ${successful}/${urls.length} pages`);

  await scraper.close();
})();

Playwright vs. Puppeteer vs. Selenium for Proxy Scraping

Feature             | Playwright                | Puppeteer             | Selenium
--------------------|---------------------------|-----------------------|---------------------------
Native proxy auth   | Yes                       | No (needs workaround) | No (needs extension/Wire)
Per-context proxies | Yes                       | No (per-browser only) | No (per-driver only)
Multi-browser       | Chromium, Firefox, WebKit | Chromium only         | All browsers
Async API           | Native                    | Native                | Via async wrapper
Stealth ecosystem   | Growing                   | Mature                | Mature (undetected-chromedriver)
Memory per session  | Low (contexts)            | High (instances)      | High (instances)

For most new scraping projects, Playwright is the recommended choice. Its proxy handling is cleaner, its context model is more memory-efficient, and its API is more intuitive.

For Puppeteer-specific setups, see our Puppeteer proxy guide. For Selenium, check the Selenium proxy guide.

Conclusion

Playwright’s native proxy support — especially per-context configuration with built-in authentication — makes it the most scraping-friendly browser automation tool available. Combined with mobile proxies, you get a stack that handles both IP trust and browser fingerprinting in a clean, maintainable architecture.

The per-context model is particularly powerful for multi-account operations where each account needs its own proxy and cookie store. Instead of running dozens of browser instances, you run one browser with dozens of lightweight contexts.

DataResearchTools mobile proxies work seamlessly with Playwright’s proxy configuration. Get started with our scraping proxy plans and connect them to Playwright in under five minutes.

