Playwright Page.goto Timeouts: Root Causes and Fixes for Scrapers

page.goto() timing out is one of the most common errors scrapers hit in 2026, and it almost never means the site is simply slow. The real causes range from anti-bot detection and proxy routing latency to Playwright’s own default waitUntil strategy firing too early or too late. Understanding which layer is failing cuts debugging time from hours to minutes.

What Playwright’s timeout actually measures

When you call page.goto(url, { timeout: 30000 }), the clock starts the moment navigation is initiated and stops when the condition set by waitUntil is satisfied. The default is load, which waits for the page’s load event. If that event never fires (because a third-party script hangs, an ad network stalls, or the server never finishes sending the response), Playwright throws TimeoutError: page.goto: Timeout 30000ms exceeded.

The waitUntil option is where you have the most leverage. Here is how its four values behave in practice:

`waitUntil` value	Fires when	Best for
`commit`	response received, document starts loading	fastest; API-style pages
`domcontentloaded`	HTML parsed; images and iframes may still be loading	SPAs with client-side render (pair with a selector wait)
`load` (default)	load event: page plus subresources (images, stylesheets, iframes) loaded	general content pages
`networkidle`	no network connections for at least 500 ms	pages with heavy async hydration

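networkidle sounds safe but is a trap on pages that poll analytics endpoints or keep long-lived requests open: the network never stays quiet for 500 ms, so the call simply runs until the timeout fires. Playwright’s own documentation marks networkidle as discouraged for this reason. For most scraping workloads, domcontentloaded followed by an explicit wait for the specific selector you need is faster and more reliable.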
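A minimal sketch of that pattern, assuming the async Python API used later in this article. The helper name load_content and its default timeouts are illustrative choices, not anything Playwright prescribes; page is an ordinary Playwright async Page.

```python
async def load_content(page, url, selector, nav_timeout_ms=30_000, selector_timeout_ms=15_000):
    """Hypothetical helper: return once the HTML is parsed and `selector` is visible.

    `page` is a playwright.async_api.Page. No Playwright import is needed here
    because the helper only calls methods on the object it is given.
    """
    # Stop waiting as soon as the HTML is parsed, instead of waiting for every
    # image, ad script, and iframe to fire the load event.
    response = await page.goto(url, wait_until="domcontentloaded", timeout=nav_timeout_ms)
    # Then wait only for the element the scraper actually needs.
    await page.wait_for_selector(selector, state="visible", timeout=selector_timeout_ms)
    return response
```

Called as await load_content(page, url, ".product-price"), it fails fast on the element you care about rather than on unrelated third-party resources. page.goto can return None for some navigations, so check the response before reading response.status.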

The three most common root causes

Slow or blocked proxies

Proxy latency is the leading cause of goto timeouts that aren’t anti-bot related. A residential proxy hop through a congested exit node can add 8-15 seconds to TTFB alone, which leaves only about 15 seconds for everything else inside a 30-second budget. Datacenter proxies are faster but more likely to be blocked outright.
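If the slow hop is a residential proxy, the navigation budget has to account for it. A sketch assuming the async Python API and a proxy URL with embedded credentials: Playwright’s proxy option takes server, username, and password as separate fields, so the hypothetical build_proxy_settings helper splits the URL first, and the context’s default navigation timeout is then raised to 45 seconds. The URL format and helper name are assumptions, and IPv6 proxy hosts are not handled.

```python
from urllib.parse import urlsplit, unquote


def build_proxy_settings(proxy_url):
    """Hypothetical helper: split "scheme://user:pass@host:port" into the
    dict shape Playwright's `proxy` option accepts (server, username, password)."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    settings = {"server": server}
    # Credentials may be percent-encoded in the URL; Playwright wants them raw.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


async def main(proxy_url, target_url):
    # Imported here so build_proxy_settings is usable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy=build_proxy_settings(proxy_url))
        context = await browser.new_context()
        # 45 s leaves room for an 8-15 s proxy TTFB plus the page itself.
        context.set_default_navigation_timeout(45_000)
        page = await context.new_page()
        await page.goto(target_url, wait_until="domcontentloaded")
        await browser.close()
```

Setting the timeout on the context rather than on each goto call keeps every page opened through that proxy on the same budget.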

If timeouts hit URLs across the whole run rather than clustering on a few specific targets, the proxy is usually the culprit. To isolate the variable, request the same URL with curl through the same proxy. When timeouts originate server-side and cascade through multiple layers, the 504 Gateway Timeout: Causes & Fixes guide covers that pattern, and its diagnosis approach transfers directly to scraper infrastructure.
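Before reaching for curl, it helps to know whether timeouts are spread across the run or concentrated on a few hosts. A minimal sketch, assuming your scraper logs one (url, timed_out) pair per navigation; the function names and the 0.5 default threshold are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def timeout_rates_by_host(results):
    """results: iterable of (url, timed_out) pairs from one scraping run.

    Returns {host: timeout_rate} so you can see whether timeouts are spread
    evenly across the run or concentrated on a few targets.
    """
    totals = defaultdict(int)
    timeouts = defaultdict(int)
    for url, timed_out in results:
        host = urlsplit(url).hostname or ""
        totals[host] += 1
        if timed_out:
            timeouts[host] += 1
    return {host: timeouts[host] / totals[host] for host in totals}


def hosts_above(rates, threshold=0.5):
    """Hosts whose timeout rate is at or above `threshold`, sorted by name."""
    return sorted(host for host, rate in rates.items() if rate >= threshold)
```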

Anti-bot middleware stalling the connection

Some anti-bot systems don’t block outright; they hold the TCP connection open while fingerprinting the client, then either serve content or close the socket. Akamai Bot Manager and Cloudflare’s challenge pages are the most common offenders. If you see timeouts specifically on Akamai-protected domains, check whether the error page includes a reference code. Akamai Reference Number 18.xxxxxx: Decoding the Error Code (2026) explains what those codes mean and which fingerprint signals triggered them.
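A small sketch for pulling that code out of a captured page, assuming the dotted "Reference #18." format that Akamai’s stock denial page commonly shows, often HTML-entity encoded in the raw source. The pattern and the find_akamai_reference name are hypothetical, not an Akamai API; adjust the regex if your target’s page differs.

```python
import html
import re

# Assumption: the denial page contains text like "Reference #18.<hex>.<digits>.<hex>".
_AKAMAI_REF = re.compile(r"Reference\s*#\s*(\d+(?:\.[0-9a-f]+)+)", re.IGNORECASE)


def find_akamai_reference(page_html):
    """Return the Akamai reference string from an error page, or None."""
    text = html.unescape(page_html)  # e.g. "&#32;&#35;" becomes " #"
    match = _AKAMAI_REF.search(text)
    return match.group(1) if match else None
```

In a scraper you would feed it await page.content() after a suspicious navigation and log the result alongside the URL and proxy.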

Headless browser fingerprinting

Playwright in headless mode exposes detectable signals: missing navigator.plugins, predictable canvas hashes, and the HeadlessChrome string in the user agent. Sites that detect these signals often stall rather than return a 403, which looks like a timeout to your scraper. This is the same class of problem covered in Why Your Headless Chrome Times Out: Common Causes and Fixes (2026). The usual fixes are playwright-extra (a Node.js package) running puppeteer-extra-plugin-stealth, or switching to Browserbase or Bright Data’s Scraping Browser, which handle fingerprint normalization at the infrastructure level.
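Of the three signals, the user-agent string is the simplest to address by hand. A minimal sketch, assuming Chromium’s headless UA contains "HeadlessChrome/" (as in the classic headless mode); it does nothing for navigator.plugins or canvas fingerprints, which is what the stealth plugins and managed browsers are for. normalize_user_agent is a hypothetical name; the usage lines in the comment use standard Playwright async calls.

```python
def normalize_user_agent(ua):
    """Hypothetical helper: drop the headless marker from a Chromium UA string."""
    return ua.replace("HeadlessChrome/", "Chrome/")


# Usage sketch (async Playwright API):
#   probe = await browser.new_page()
#   ua = await probe.evaluate("navigator.userAgent")
#   await probe.close()
#   context = await browser.new_context(user_agent=normalize_user_agent(ua))
```

Reading the UA from the running browser, rather than hard-coding one, keeps the version number consistent with the actual Chromium build.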

Fixing timeouts in code

Start with a defensive goto wrapper that retries timeouts but lets every other navigation error (DNS failures, refused proxy connections) surface immediately:

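```python
import asyncio
from playwright.async_api import TimeoutError as PlaywrightTimeout

async def safe_goto(page, url, retries=3):
    # Only timeouts are retried; other Playwright errors (for example
    # net::ERR_PROXY_CONNECTION_FAILED) propagate on the first attempt.
    for attempt in range(retries):
        try:
            await page.goto(
                url,
                wait_until="domcontentloaded",
                timeout=45_000,
            )
            return True
        except PlaywrightTimeout:
            if attempt == retries - 1:
                raise  # retries exhausted: surface the last timeout
            await asyncio.sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, ...
    return False  # only reached when retries <= 0
```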

A few things this pattern enforces:

domcontentloaded instead of load cuts median wait time by roughly 30-60% on content-heavy pages
45 seconds gives proxy hops enough headroom without tying up your worker pool for too long
Exponential backoff between retries (1 s, then 2 s) avoids hammering a rate-limited endpoint
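The worker-pool trade-off is easy to quantify. Assuming every attempt times out, the safe_goto wrapper holds a worker for retries × timeout plus the backoff sleeps between attempts; this hypothetical helper does that arithmetic so you can size the pool.

```python
def worst_case_seconds(timeout_s=45, retries=3):
    """Upper bound on how long safe_goto-style retries can occupy one worker:
    every attempt times out, with 2**attempt seconds of backoff in between."""
    # No sleep follows the final attempt, so only retries - 1 backoffs count.
    backoff = sum(2 ** attempt for attempt in range(retries - 1))
    return retries * timeout_s + backoff
```

With the defaults (45 s, 3 retries) that is 3 × 45 + 1 + 2 = 138 seconds per stuck URL, which is why the per-attempt timeout matters more than the retry count.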

For targets that legitimately need networkidle, give that goto call a shorter timeout and fall back to a selector-based wait for the content you need:

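```python
from playwright.async_api import TimeoutError as PlaywrightTimeout

async def goto_with_fallback(page, url):
    try:
        await page.goto(url, wait_until="networkidle", timeout=20_000)
    except PlaywrightTimeout:
        # networkidle never settled (polling, long-lived requests);
        # fall back to waiting for the content we actually need
        await page.wait_for_selector(".product-price", timeout=25_000)
```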

Distinguishing timeout causes quickly

When a timeout fires in production, run through this checklist in order:

Check TTFB with curl -o /dev/null -s -w "%{time_starttransfer}\n" --proxy PROXY_URL TARGET_URL. If TTFB exceeds 10 seconds, the proxy or upstream is the bottleneck, not Playwright
Replay the request in a headed (non-headless) browser through the same proxy to see whether a challenge page loads; if it does, anti-bot is the cause
Check whether the timeout is consistent or intermittent: consistent timeouts point to detection, intermittent ones to proxy quality or rate limiting
Compare behavior across IP types (residential, datacenter, mobile) to see whether the block is based on IP class
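The first and third checks are mechanical enough to encode. This hypothetical triage function assumes you already have a curl TTFB measurement and per-target attempt counts; it treats "consistent" as every attempt timing out, which is an assumption rather than a standard threshold. The headed-browser replay and IP-class comparison stay manual.

```python
def triage(ttfb_s, timeouts, attempts, ttfb_limit_s=10.0):
    """Hypothetical helper: first-pass label for one target.

    ttfb_s: time to first byte measured with curl through the same proxy.
    timeouts, attempts: navigation outcomes for that target over a run.
    """
    if ttfb_s > ttfb_limit_s:
        return "proxy or upstream latency"
    if attempts == 0 or timeouts == 0:
        return "no timeouts observed"
    if timeouts == attempts:
        return "consistent: suspect detection"
    return "intermittent: suspect proxy quality or rate limiting"
```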

If you are migrating from Selenium and hitting similar issues there, Selenium WebDriverException: Diagnosis and Fixes for Scrapers (2026) covers the equivalent diagnostic process for WebDriver errors. The root causes overlap heavily: most are network and fingerprint issues, not framework bugs.

One signal that is easy to miss: a consistent HTTP 403 that Playwright surfaces as a timeout, because the 403 page triggers a JavaScript redirect that never resolves. During debugging, always log response.status from a page.on("response", ...) listener. HTTP 403 Forbidden When Scraping: Top 12 Causes and Fixes (2026) covers the 403 side of that pattern in detail.
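A minimal version of that listener for the Python API, assuming you want both an in-memory record and a console line for error statuses. page.on("response", handler) is Playwright’s event hook; attach_status_logger is a hypothetical name.

```python
def attach_status_logger(page, log):
    """Hypothetical debugging helper: record (status, url) for every response
    event the page emits. `page` is a Playwright Page; `log` is a list."""

    def on_response(response):
        # Response.status and Response.url are plain properties in the Python API.
        log.append((response.status, response.url))
        if response.status >= 400:
            print(f"[{response.status}] {response.url}")

    page.on("response", on_response)
    return on_response
```

Attach it before calling goto; if the log shows a 403 on the document URL followed by silence, the timeout is a block, not a slow page.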

Infrastructure-level fixes

When code-level changes aren’t enough, the following adjustments address the systemic causes:

Rotate proxies per request, not per session (in Playwright, proxies are set per browser or per context, so this means a fresh context per request); session stickiness trains bot detection models faster
Set a realistic viewport and device scale factor (viewport { width: 1440, height: 900 } plus deviceScaleFactor: 1, which Playwright takes as separate context options); headless defaults are a known signal
Limit browser context reuse: contexts used for more than 50-100 requests accumulate fingerprint data in cookies and storage that anti-bot systems can detect
Cap concurrent contexts per proxy IP at 1-2; higher concurrency from a single IP is a strong bot signal regardless of browser fingerprint quality
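The rotation, reuse, and concurrency rules above can be enforced with plain asyncio primitives. A sketch under stated assumptions: proxies are plain server URLs, the cap is 2 in flight per proxy, and a context is retired after 50 uses; ProxyRotator and ContextBudget are hypothetical names, not part of Playwright.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager


class ProxyRotator:
    """Hypothetical helper: a different proxy for every request, with at most
    `per_proxy_limit` requests in flight through any single proxy.

    Create it inside a running event loop (e.g. from an async main) so the
    semaphores bind to that loop on older Python versions.
    """

    def __init__(self, proxies, per_proxy_limit=2):
        self._cycle = itertools.cycle(proxies)
        self._limits = {proxy: asyncio.Semaphore(per_proxy_limit) for proxy in proxies}

    @asynccontextmanager
    async def lease(self):
        proxy = next(self._cycle)        # rotate on every request, not per session
        async with self._limits[proxy]:  # wait if this proxy is already at its cap
            yield proxy


class ContextBudget:
    """Hypothetical counter: tells you when a reused browser context should be retired."""

    def __init__(self, max_uses=50):
        self.max_uses = max_uses
        self.uses = 0

    def record_use(self):
        """Count one request; True means close this context and open a fresh one."""
        self.uses += 1
        return self.uses >= self.max_uses
```

In a scraper, each lease would wrap one browser.new_context(proxy={"server": proxy_url}) call, so the concurrency cap and the rotation apply at the same level. ContextBudget only matters for setups that do reuse contexts across requests.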

For high-volume workloads (10,000+ pages/day), managed scraping browsers such as Browserbase or Bright Data’s Scraping Browser absorb the fingerprint and proxy management overhead entirely. At roughly $0.001-0.003 per page, that is usually cheaper than the engineering time spent chasing detection.
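The per-page figure is easier to weigh against engineering time in monthly terms: at 10,000 pages/day and $0.001-0.003 per page, a 30-day month comes to about $300-900. The sketch below is just that arithmetic; the hourly rate in the break-even comparison is a hypothetical input, and providers may bill per GB or per browser-hour rather than per page, so convert their pricing first.

```python
def monthly_spend(pages_per_day, cost_per_page, days=30):
    """Managed-browser spend over `days` at a flat per-page price (USD)."""
    return pages_per_day * cost_per_page * days


def break_even_hours(spend, hourly_rate):
    """Engineering hours that cost the same as `spend` at a hypothetical hourly rate."""
    return spend / hourly_rate


low = monthly_spend(10_000, 0.001)   # about 300 USD
high = monthly_spend(10_000, 0.003)  # about 900 USD
```

If a month of detection fixes would take more engineering hours than break_even_hours(high, your_rate), the managed option wins on cost alone.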

Bottom line

Most page.goto timeouts fall into three buckets: slow proxy routing, anti-bot stalling, or the wrong waitUntil strategy. The fix for each is different, so measure before you change anything. Make domcontentloaded plus a selector wait your default, add a curl-based TTFB check to your debugging playbook, and treat persistent timeouts on specific domains as a fingerprint problem, not a timeout-tuning problem. This class of scraping infrastructure failure is a regular topic on dataresearchtools.com, and the patterns here apply across Playwright, Puppeteer, and any headless stack running behind proxies in 2026.