Migrating from Puppeteer to Playwright in 2026: Step by Step

If you’re still running Puppeteer in 2026, migrating to Playwright isn’t a nice-to-have anymore. It’s overdue. Playwright 1.43+ handles Chromium, Firefox, and WebKit from a single API, ships with better async primitives, and its locator system handles dynamic DOM updates without the manual wait juggling that made Puppeteer scripts brittle. This guide covers what actually breaks during the migration, what you can reuse, and the two or three places where the API is genuinely different enough to bite you.

Why Puppeteer isn’t keeping up

Puppeteer has never shipped WebKit support, and its Firefox support (via WebDriver BiDi) still trails its Chromium coverage. That’s fine if you only care about Chromium — but a lot of modern anti-bot stacks fingerprint browser type as part of their detection logic. Running exclusively on Chromium narrows your options. Playwright gives you all three browsers without a separate tool.

The other thing is request interception. Puppeteer’s page.on('request', ...) model is async in an awkward way that causes race conditions if you’re not careful. Playwright’s page.route() is synchronous from the handler’s perspective. It’s one of those changes that sounds small until you’ve spent an afternoon debugging a Puppeteer scrape that drops requests intermittently.

Before going further: if you’re still deciding whether to migrate at all, the detailed benchmarks are in Playwright vs Puppeteer vs Selenium for Web Scraping 2026. Detection rates, performance numbers, community support — it’s all there.

What maps over and what doesn’t

Most of the API is familiar. The things that aren’t are BrowserContext, locators, and where cookies live.

| Puppeteer | Playwright equivalent | Notes |
| --- | --- | --- |
| puppeteer.launch() | chromium.launch() | Per-browser launchers |
| page.goto(url) | page.goto(url) | Identical |
| page.waitForSelector() | page.locator().waitFor() | Locators preferred |
| page.evaluate() | page.evaluate() | Identical |
| page.on('request', ...) | page.route(url, handler) | Cleaner in Playwright |
| page.$() / page.$$() | page.locator() | Auto-retrying, lazy |
| page.cookies() | context.cookies() | Moved to BrowserContext |
| Incognito context | browser.new_context() | More explicit |

The BrowserContext shift is the biggest conceptual change. In Puppeteer, context was implicit — you launched a browser and got a page. In Playwright, you create contexts explicitly, and each one gets its own isolated cookies, local storage, and permissions. It’s more verbose to set up but it’s what makes parallelism clean.
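A minimal sketch of the explicit-context model — the cookie values and example.com domain are placeholders, and `browser` is assumed to be an already-launched Playwright browser:

```python
async def show_isolation(browser):
    # Each context gets its own cookie jar, storage, and permissions
    ctx_a = await browser.new_context()
    ctx_b = await browser.new_context()
    await ctx_a.add_cookies([{
        "name": "session", "value": "user-a",
        "domain": "example.com", "path": "/",
    }])
    # Cookies now live on the context, not the page: Puppeteer's
    # page.cookies() becomes context.cookies() here
    a = await ctx_a.cookies()
    b = await ctx_b.cookies()  # ctx_b's jar stays empty
    return len(a), len(b)
```

The verbosity buys you clean teardown too: closing a context discards all of its state without touching sibling contexts.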

Step-by-step migration

Do these in order. Each step is independently testable, so you can migrate incrementally rather than rewriting everything at once.

  1. Install playwright and remove puppeteer (or pyppeteer) from your project
  2. Replace browser launch and initialization code
  3. Update navigation and action calls (mostly find-and-replace)
  4. Migrate request interception to page.route()
  5. Replace page.$() and page.$$() with locators
  6. Move cookie and storage logic to BrowserContext
  7. Audit your wait logic — most of it can go away
  8. Test against real targets before calling it done

Here’s the before/after for the most common pattern:

# Puppeteer (via pyppeteer)
from pyppeteer import launch

async def scrape():
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto('https://example.com')
    text = await page.evaluate('() => document.querySelector("h1").textContent')
    await browser.close()
    return text

# Playwright (playwright-python)
from playwright.async_api import async_playwright

async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto('https://example.com')
        text = await page.locator('h1').text_content()
        await browser.close()
        return text

Mechanical search-and-replace gets you maybe 60% of the way there. The remaining 40% is request interception and the wait logic.
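The wait-logic audit (step 7) is mostly deletion. A sketch of the before/after, assuming a hypothetical '.price' selector:

```python
# Puppeteer-era pattern (pyppeteer): wait explicitly, then query
#   await page.waitForSelector('.price')
#   text = await page.evaluate('() => document.querySelector(".price").textContent')

async def read_price(page):
    # Playwright locators auto-wait: text_content() retries until the
    # element appears or the default 30s timeout expires
    return await page.locator('.price').text_content()
```

Explicit waits survive only for conditions a locator can’t express, such as `await page.wait_for_load_state('networkidle')` before taking a full-page snapshot.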

Migrating request interception

This is where most migrations stall. Blocking images and scripts in Puppeteer requires setting up a request listener, checking resource type, and handling the async abort correctly — and it’s easy to get wrong. Playwright’s page.route() is cleaner:

# Puppeteer (via pyppeteer)
import asyncio

async def block_images(req):
    # abort/continue must be awaited, but the 'request' listener
    # can't be a coroutine — hence the ensure_future wrapper below
    if req.resourceType == 'image':
        await req.abort()
    else:
        await req.continue_()

await page.setRequestInterception(True)
page.on('request', lambda req: asyncio.ensure_future(block_images(req)))

# Playwright
async def handle(route):
    if route.request.resource_type == 'image':
        await route.abort()
    else:
        await route.continue_()

await page.route('**/*', handle)

But page.route() also accepts glob and regex patterns, so you can drop entire CDN domains in one line. Blocking fonts, analytics, and ad scripts this way speeds up scrapes without handling every request individually.
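A sketch of that pattern — the blocked resource types and the two domain globs are illustrative choices, not a canonical blocklist:

```python
# Resource types that rarely matter for data extraction
BLOCKED_TYPES = {"image", "font", "media", "stylesheet"}

def should_block(resource_type: str) -> bool:
    return resource_type in BLOCKED_TYPES

async def install_blocking(page):
    async def by_type(route):
        if should_block(route.request.resource_type):
            await route.abort()
        else:
            await route.continue_()

    async def drop(route):
        await route.abort()

    # Catch-all route filters by resource type...
    await page.route("**/*", by_type)
    # ...while glob patterns kill whole third-party domains in one line each
    await page.route("**://*.google-analytics.com/**", drop)
    await page.route("**://*.doubleclick.net/**", drop)
```

Routes registered later take precedence, so the domain-specific handlers fire before the catch-all.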

Anti-bot and proxy considerations

Playwright doesn’t ship stealth built in, but playwright-stealth (Python) and playwright-extra with the stealth plugin (Node) both work as of mid-2026. Playwright’s default fingerprint surface is somewhat smaller than Puppeteer’s, but that doesn’t mean you’re invisible.

Against serious stacks like Kasada or Akamai, you need residential proxies regardless of which browser tool you’re using. The same infrastructure decisions that come up during Scrapy migrations — proxy rotation strategy, session management, retry logic — matter here too. After migrating, at minimum:

  • Set a real user_agent via context.new_context(user_agent=...)
  • Set viewport to an actual screen size
  • Use a residential or mobile proxy pool if you’re hitting rate limits
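Those three adjustments together, as a sketch — the user agent string, viewport size, and proxy URL are all placeholders to swap for your own values:

```python
async def hardened_context(browser, proxy_server=None):
    # All values here are illustrative; tune them to your real targets
    kwargs = dict(
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        viewport={"width": 1920, "height": 1080},  # a real screen size
        locale="en-US",
    )
    if proxy_server:
        # e.g. "http://user:pass@proxy.example.com:8000"
        kwargs["proxy"] = {"server": proxy_server}
    return await browser.new_context(**kwargs)
```

Because these options live on the context, every page opened from it inherits them — no per-page setup.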

Don’t assume the migration alone fixes detection issues. It helps. It doesn’t solve everything.

Parallel sessions

Playwright’s BrowserContext model makes parallelism straightforward. Each context is fully isolated — separate cookies, storage, and state — running concurrently under a single browser process. It’s what the API was designed for from the start.

import asyncio
from playwright.async_api import async_playwright

async def scrape_batch(urls):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # One isolated context per URL: separate cookies, storage, state
        contexts = [await browser.new_context() for _ in urls]
        pages = [await ctx.new_page() for ctx in contexts]

        async def fetch(page, url):
            await page.goto(url)
            return await page.locator('h1').text_content()

        results = await asyncio.gather(*[
            fetch(page, url) for page, url in zip(pages, urls)
        ])
        await browser.close()
        return results

Puppeteer incognito contexts are heavier and session isolation is less reliable. If you’re running any volume, the Playwright model is noticeably better.

Bottom line

Migrate to Playwright. The API conversion is mostly mechanical once you get the BrowserContext model and locators in your head. Multi-browser support, cleaner request interception, and better parallel session handling make it the more durable choice for 2026 scraping infrastructure. DRT covers browser automation and anti-bot bypass in depth — if you’re building or rebuilding a scraping stack this year, it’s worth reading before you lock in your toolchain.
