Stagehand vs Playwright for AI-driven scraping

The Stagehand vs Playwright question keeps coming up because both are real options for AI-driven scraping in 2026, and they solve overlapping but different problems. Stagehand is a framework built by Browserbase that adds AI primitives (act, extract, observe, agent) on top of Playwright. Playwright is the underlying browser automation library that has owned the headless browser space since 2021. The natural question: do you reach for one, or the other, or both?

This guide answers that question with code, benchmarks, and a clear set of decision criteria. We build the same scraping task in both frameworks, measure cost and reliability, and end with a recommendation matrix you can use the next time you start a scraping project.

What each framework actually is

Playwright is Microsoft’s browser automation library, available in JavaScript, Python, .NET, and Java. It drives Chromium, WebKit, and Firefox: Chromium through the Chrome DevTools Protocol, and Firefox and WebKit through Playwright's own patched browser builds. Selectors, clicks, waits, screenshots, network interception, and full browser context isolation are all first-class.

Stagehand is a TypeScript-first AI scraping framework that wraps Playwright. It exposes four primitives:

  • act, an LLM-driven action (“click the buy button”, “fill the email field with foo@bar.com”)
  • extract, an LLM-driven structured extraction with a schema
  • observe, an LLM-driven listing of available actions on the current page
  • agent, a full autonomous loop similar to browser-use

Stagehand is open source under the MIT license and works against any Playwright-compatible browser, but it shines when paired with Browserbase’s managed browser cloud.

Installing both

Playwright:

npm install playwright
npx playwright install chromium

Stagehand:

npm install @browserbasehq/stagehand zod
npm install -D @playwright/test
npx playwright install chromium   # needed for env: "LOCAL"

Stagehand needs an LLM API key (OpenAI or Anthropic, matching the model you choose) and, optionally, Browserbase credentials for the managed cloud:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export BROWSERBASE_API_KEY="bb_..."  # optional, for managed cloud
export BROWSERBASE_PROJECT_ID="..."  # optional
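Because the Browserbase variables are optional, it can help to pick the Stagehand mode from whichever variables are actually set. The helper below is a hypothetical sketch, not part of Stagehand: it insists on one LLM key, prefers OpenAI when both keys are present (an arbitrary choice), and only selects "BROWSERBASE" when both Browserbase variables exist.

```typescript
// Hypothetical helper (not part of Stagehand): choose constructor settings from env vars.
type StagehandEnv = "LOCAL" | "BROWSERBASE";
type LlmProvider = "openai" | "anthropic";

interface ResolvedConfig {
  env: StagehandEnv;
  llmProvider: LlmProvider;
}

function resolveStagehandConfig(vars: Record<string, string | undefined>): ResolvedConfig {
  // An LLM key is mandatory: act, extract, and observe all call a model.
  let llmProvider: LlmProvider;
  if (vars.OPENAI_API_KEY) {
    llmProvider = "openai";
  } else if (vars.ANTHROPIC_API_KEY) {
    llmProvider = "anthropic";
  } else {
    throw new Error("Set OPENAI_API_KEY or ANTHROPIC_API_KEY before starting Stagehand");
  }
  // Browserbase is optional, but it needs both the API key and the project ID.
  const useBrowserbase = Boolean(vars.BROWSERBASE_API_KEY && vars.BROWSERBASE_PROJECT_ID);
  return { env: useBrowserbase ? "BROWSERBASE" : "LOCAL", llmProvider };
}
```

You would call it as resolveStagehandConfig(process.env) and pass the resulting env, plus a model name that matches llmProvider, into new Stagehand({ ... }).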

A real test: scraping a product page

Let us scrape a Lazada product page for title, price, currency, and stock. Same target, both frameworks.

Playwright (TypeScript):

import { chromium } from "playwright";

interface ProductData {
  title: string | null;
  price: number | null;
  currency: string | null;
  inStock: boolean | null;
}

async function scrapeProduct(url: string): Promise<ProductData> {
  const browser = await chromium.launch({ headless: true });
  try {
    const ctx = await browser.newContext();
    const page = await ctx.newPage();
    await page.goto(url, { waitUntil: "networkidle" });

    // These selectors are tied to Lazada's current markup and break when it changes.
    // .first() avoids strict-mode errors if a selector matches more than one element.
    const title = await page.locator(".pdp-mod-product-badge-title").first().textContent();
    const priceText = await page.locator(".pdp-price_type_normal").first().textContent();
    const inStock = (await page.locator("text=/in stock/i").count()) > 0;

    const priceMatch = priceText?.match(/([\d,.]+)/);
    const price = priceMatch ? parseFloat(priceMatch[1].replace(/,/g, "")) : null;
    const currency = priceText?.match(/[A-Z]{3}|\$|S\$|RM/)?.[0] ?? null;

    return { title: title?.trim() ?? null, price, currency, inStock };
  } finally {
    // Close the browser even if a selector times out.
    await browser.close();
  }
}

Stagehand (TypeScript):

import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const productSchema = z.object({
  title: z.string(),
  price: z.number(),
  currency: z.string(),
  inStock: z.boolean(),
});

async function scrapeProduct(url: string) {
  const stagehand = new Stagehand({
    env: "LOCAL",
    modelName: "gpt-4o-mini",
    verbose: 1,
  });
  await stagehand.init();
  try {
    const page = stagehand.page;
    await page.goto(url);

    // No selectors: the instruction and the Zod schema describe what to return.
    return await page.extract({
      instruction: "Extract product title, price (number), currency code, and stock status",
      schema: productSchema,
    });
  } finally {
    await stagehand.close();
  }
}

Notice the difference. Playwright code knows the selectors. Stagehand code knows the intent. When Lazada redesigns the product page (which they did three times in 2025), the Playwright selectors break, while the Stagehand instruction usually keeps working because it never referenced a class name.

That resilience is the entire pitch.
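The selectors are not the only brittle part of the Playwright version. Its price parsing strips commas and calls parseFloat, which works for "S$1,299.00" but turns an Indonesian-style "Rp1.299.000" into 1.299, because there the dot is the thousands separator. Here is a minimal sketch of a more explicit parser, assuming you know each market's separator convention; the currency marker list is illustrative, not exhaustive.

```typescript
// Illustrative currency markers; "S$" is checked before "$" so Singapore prices are not misread.
const CURRENCY_MARKERS = ["S$", "RM", "Rp", "₱", "฿", "₫", "$"];

interface ParsedPrice {
  price: number | null;
  currency: string | null;
}

// thousandsSep is "," for prices like "S$1,299.00" and "." for prices like "Rp1.299.000".
function parsePrice(text: string, thousandsSep: "," | "." = ","): ParsedPrice {
  const currency = CURRENCY_MARKERS.find((m) => text.includes(m)) ?? null;
  const numeric = text.match(/\d[\d.,]*/)?.[0];
  if (numeric === undefined) return { price: null, currency };
  const decimalSep = thousandsSep === "," ? "." : ",";
  // Drop thousands separators, then normalize the decimal separator to ".".
  const normalized = numeric.split(thousandsSep).join("").replace(decimalSep, ".");
  const price = Number.parseFloat(normalized);
  return { price: Number.isNaN(price) ? null : price, currency };
}
```

This per-market detail is exactly what Stagehand's schema-driven extract tries to push onto the model instead of onto your code.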

Walking through each Stagehand primitive

The four primitives map cleanly to four scraping needs.

act is for any single interaction: click, type, hover, scroll. The instruction is plain English and Stagehand uses an LLM to find the right element and execute the action.

await page.act("Click the 'Add to cart' button");
await page.act("Type 'wireless mouse' into the search bar and press Enter");
await page.act("Scroll down until the customer reviews section is visible");

extract is for pulling structured data out of the current page. It takes a Zod schema and an instruction.

const reviews = await page.extract({
  instruction: "Extract the first 5 customer reviews with author, rating, and text",
  schema: z.object({
    reviews: z.array(z.object({ author: z.string(), rating: z.number(), text: z.string() })),
  }),
});

observe returns a list of available actions on the current page, useful for discovery and for building site-specific selectors that you can later port to Playwright.

const actions = await page.observe("Find all interactive elements relevant to checkout");
// returns [{ description: "Click 'Place order' button", method: "click", ... }, ...]

agent is the autonomous loop. Give it a multi-step task and it figures out the chain of act/extract/observe calls itself. It is both the most expensive primitive and the most powerful.

When Stagehand actually helps

Three specific situations:

First, when the target site changes layout often. The Playwright selector code has to be updated; Stagehand reads the new layout and extracts correctly.

Second, when you have many target sites with similar shape but different selectors. A product extraction prompt that works on Lazada generally works on Shopee, Amazon, or Best Buy too, with little or no per-site code.

Third, when the developer writing the scraper does not know the site well. Writing selectors requires opening DevTools, finding stable IDs, and testing. Writing a Stagehand instruction takes one sentence.

When Playwright wins

Three specific situations:

First, high volume on a known target. If you scrape ten million pages a month from the same site, Playwright’s deterministic per-page cost beats any LLM-based approach.

Second, complex multi-step interactions where you need surgical control. Filling a 30-field form, intercepting specific network requests, and mocking responses are all easier in raw Playwright.

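Third, sites with rendering quirks. Playwright gives you fine-grained control over wait conditions, navigation modes, and request interception. Stagehand's AI primitives abstract these away, although you can still drop down to the underlying Playwright page when you need them.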
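As one example of that control, request interception lets a scraper skip the images, media, and fonts it never reads. The sketch below uses small structural interfaces instead of importing Playwright, so the filtering logic can be tested on its own; Playwright's Page, Route, and Request objects provide the route(), request(), url(), resourceType(), abort(), and continue() methods it relies on. The blocked hosts are placeholders.

```typescript
// Structural stand-ins for the parts of Playwright's Page/Route/Request used here.
interface RequestLike {
  url(): string;
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}
interface PageLike {
  route(url: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
}

// Resource types that rarely matter for data extraction.
const HEAVY_TYPES = new Set(["image", "media", "font"]);
// Placeholder hosts; replace with the trackers your target actually loads.
const BLOCKED_HOSTS = ["analytics.example.com", "ads.example.com"];

function shouldAbort(req: RequestLike): boolean {
  if (HEAVY_TYPES.has(req.resourceType())) return true;
  const host = new URL(req.url()).hostname;
  // Block the host itself and any of its subdomains.
  return BLOCKED_HOSTS.some((h) => host === h || host.endsWith(`.${h}`));
}

async function installRequestFilter(page: PageLike): Promise<void> {
  // "**/*" matches every request the page makes.
  await page.route("**/*", (route) =>
    shouldAbort(route.request()) ? route.abort() : route.continue(),
  );
}
```

With a real Playwright page you would call await installRequestFilter(page) before page.goto(url).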

Side-by-side comparison

| Dimension | Stagehand | Playwright |
|---|---|---|
| Lines of code per page | 5 to 15 | 20 to 100 |
| Cost per 1000 pages | $3 to $50 in LLM tokens | Near zero |
| Resilience to layout change | High | Low |
| Multi-site reuse | Excellent | Poor |
| Deterministic behavior | No | Yes |
| Debug experience | Trace + agent log | Standard Playwright trace viewer |
| Best fit | Long-tail and changing sites | Known-shape, high-volume |
| Languages | TypeScript primary, Python in beta | JavaScript, Python, .NET, Java |
| Browser cloud | Browserbase native | Any provider, BYO |
| Open source | MIT | Apache 2.0 |
| Native vision support | Yes, via extract | No, BYO |

The decision is rarely either-or in mature scraping shops. Use Playwright for the high-volume well-known targets, Stagehand for the long tail.

Side-by-side comparison: an interaction-heavy task

The product extraction example is fairly simple. Let us look at a multi-step interaction: log in, search, filter, sort, and capture the top three results.

Playwright (TypeScript), abridged for brevity:

await page.goto("https://example.com/login");
await page.fill("input[name='email']", "bot@example.com");
await page.fill("input[name='password']", process.env.PASSWORD!);
await page.click("button[type='submit']");
await page.waitForURL(/\/dashboard/);

await page.click("a[href='/search']");
await page.fill("input.search-input", "wireless mouse");
await page.press("input.search-input", "Enter");
await page.waitForSelector(".result-card");

await page.click("button[data-filter='under-50']");
// Native <select> options are chosen with selectOption, not by clicking the <option>
await page.selectOption("select.sort", "best-rated");

const results = await page.locator(".result-card").evaluateAll((cards) =>
  cards.slice(0, 3).map((c) => ({
    title: c.querySelector(".title")?.textContent?.trim(),
    url: (c.querySelector("a") as HTMLAnchorElement)?.href,
  }))
);

Roughly 20 lines, plus careful handling of waits and selectors, and every UI change means another fix.

Stagehand (TypeScript):

await page.goto("https://example.com/login");
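await page.act("Type bot@example.com into the email field");
// The LLM cannot read environment variables; pass the secret as an act() variable instead
await page.act({
  action: "Type %password% into the password field",
  variables: { password: process.env.PASSWORD! },
});
await page.act("Click the log in button");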
await page.act("Search for 'wireless mouse'");
await page.act("Apply the under $50 filter");
await page.act("Sort by best rated");

const results = await page.extract({
  instruction: "Return the top 3 result titles and URLs",
  schema: z.object({
    items: z.array(z.object({ title: z.string(), url: z.string().url() })),
  }),
});

Far fewer lines and no selectors, so it usually survives a UI redesign. It costs roughly $0.04 in LLM tokens per run versus near zero for Playwright. The trade-off is explicit.

The agent primitive

Stagehand’s newest primitive is agent. It wraps the other building blocks (act, extract, observe, and the underlying Playwright page) into an autonomous loop, driven under the hood by a computer-use model such as Anthropic's Claude Computer Use or the OpenAI computer-use model behind Operator.

import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({ env: "BROWSERBASE", modelName: "claude-3-5-sonnet-latest" });
await stagehand.init();

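const agent = stagehand.agent({ provider: "anthropic", model: "claude-3-5-sonnet-latest" });
try {
  const result = await agent.execute(
    "Search Amazon for 'wireless mouse', filter under $50, sort by best rated, " +
    "and return the top 3 product URLs as JSON"
  );
  // The final answer comes back as text in result.message; parse and validate it before use
  console.log(result.message);
} finally {
  await stagehand.close();
}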

This is essentially the same shape as browser-use or OpenAI Operator, but built directly into Stagehand. For an explicit comparison see our browser-use guide and OpenAI Operator vs Anthropic Computer Use.

Adding proxies

Both frameworks accept the standard Playwright proxy config. Stagehand passes it through.

Stagehand:

const stagehand = new Stagehand({
  env: "LOCAL",
  localBrowserLaunchOptions: {
    proxy: {
      server: "http://proxy.example.com:8000",
      username: "user-rotate",
      password: "secret",
    },
  },
});

Playwright:

const browser = await chromium.launch({
  proxy: { server: "http://proxy.example.com:8000", username: "user-rotate", password: "secret" },
});

For ASEAN ecommerce specifically, a Singapore mobile proxy gives you mobile carrier IPs that hold up well against Lazada and Shopee bot defenses. Both frameworks accept it identically.

Hybrid scraper pattern

A particularly powerful pattern uses Stagehand for the navigation and authentication parts (login, a multi-step checkout flow, and, on Browserbase, CAPTCHA handling) and raw Playwright for the bulk extraction once you are on the data-rich pages. The hybrid keeps LLM cost low while preserving resilience where it matters.

import { Stagehand } from "@browserbasehq/stagehand";

// Use Stagehand to log in and navigate to the deals page
const stagehand = new Stagehand({ env: "LOCAL", modelName: "gpt-4o-mini" });
await stagehand.init();
const page = stagehand.page;
await page.goto("https://example.com/login");
await page.act("Fill the email field with my-bot@example.com");
// Secrets go in act() variables; the LLM cannot read env vars itself
await page.act({
  action: "Fill the password field with %password%",
  variables: { password: process.env.PASSWORD! },
});
await page.act("Click the login button");
await page.act("Navigate to the daily deals page");

// Hand over to raw Playwright for the bulk scrape.
// locator.all() does not wait, so wait for the first card before collecting them.
await page.locator(".deal-card").first().waitFor();
const items = await page.locator(".deal-card").all();
const data = await Promise.all(
  items.map(async (item) => ({
    title: await item.locator(".title").textContent(),
    price: await item.locator(".price").textContent(),
    url: await item.locator("a").getAttribute("href"), // may be relative to the page URL
  }))
);
await stagehand.close();

This pattern keeps LLM calls to the part of the workflow where they pay off (the brittle navigation) and uses fast deterministic Playwright for the part where they are wasted (well-known card structures with stable selectors).

Cost benchmarks

Same Lazada product page, 100 runs each, GPT-4o-mini for Stagehand:

| Metric | Stagehand | Playwright |
|---|---|---|
| Average wall clock per page | 6.2 s | 1.8 s |
| Average tokens per page | 11,400 | n/a |
| LLM cost per 1000 pages | $2.40 | $0.00 |
| Total cost per 1000 pages | $2.65 | $0.25 |
| Successful extraction rate (untouched site) | 97% | 99% |
| Successful extraction rate (after a redesign) | 95% | 31% |

The redesign row is the single most important number. Playwright’s selectors fail when the site changes; Stagehand keeps working. At $2.65 versus $0.25 per 1000 pages, Stagehand costs roughly ten times more, but you also stop spending engineering hours on selector maintenance.

Cost across LLM choices

Stagehand cost varies a lot with LLM choice. Per-page extract numbers:

| Model | Tokens per extract | Cost per 1000 extracts |
|---|---|---|
| GPT-4o-mini | 9,000 | $1.80 |
| GPT-4o | 9,000 | $30 |
| Claude 3.5 Haiku | 8,500 | $7 |
| Claude 3.5 Sonnet | 8,500 | $33 |
| Gemini 1.5 Flash | 10,500 | $4 |
| Gemini 1.5 Pro | 10,000 | $19 |

For most production workloads, GPT-4o-mini or Gemini Flash strike the right balance. Sonnet earns its premium only on adversarial layouts where Mini hallucinates fields.
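The per-1000 figures come from tokens multiplied by price. A small estimator follows, assuming you split each call into input and output tokens (the tables above report totals only, so the split is your assumption) and plug in per-million-token prices from your provider's current price sheet.

```typescript
// Per-million-token prices in USD; take these from your provider's current price sheet.
interface ModelPricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Estimated LLM bill for `pages` extract calls, given average token counts per call.
function estimateLlmCost(
  pricing: ModelPricing,
  avgInputTokens: number,
  avgOutputTokens: number,
  pages = 1000,
): number {
  const inputCost = (avgInputTokens * pages * pricing.inputPerMillion) / 1_000_000;
  const outputCost = (avgOutputTokens * pages * pricing.outputPerMillion) / 1_000_000;
  return inputCost + outputCost;
}
```

For example, 8,500 input and 500 output tokens per page at hypothetical rates of $0.15 and $0.60 per million tokens works out to about $1.58 per 1000 pages.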

Production patterns

Stagehand in production:

  1. Always set an extract schema with z.object() and required fields. Loose schemas produce loose data.
  2. Cache the LLM responses by page hash where layout is stable. Cuts LLM cost dramatically on retries.
  3. Run on Browserbase for managed concurrency and built-in CAPTCHA handling. Self-hosting works, but you give up those conveniences.
  4. Set verbose: 0 in production to cut log noise.
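Item 2 can be as simple as keying a map on a hash of the page HTML plus the instruction. Below is a minimal in-memory sketch under that assumption. ExtractCache and runExtract are hypothetical names; in practice runExtract would wrap a call such as page.extract({ instruction, schema }).

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory cache for extract results, keyed by instruction + page HTML.
class ExtractCache<T> {
  private readonly entries = new Map<string, T>();
  hits = 0;
  misses = 0;

  private key(html: string, instruction: string): string {
    // A NUL separator keeps "ab" + "c" distinct from "a" + "bc".
    return createHash("sha256").update(instruction).update("\u0000").update(html).digest("hex");
  }

  async getOrExtract(html: string, instruction: string, runExtract: () => Promise<T>): Promise<T> {
    const k = this.key(html, instruction);
    if (this.entries.has(k)) {
      this.hits++;
      return this.entries.get(k) as T;
    }
    this.misses++;
    const result = await runExtract(); // only pay for the LLM call on a miss
    this.entries.set(k, result);
    return result;
  }
}
```

The HTML could come from Playwright's page.content(). Pages that embed timestamps or CSRF tokens hash differently on every load, so you may need to hash a narrower region or strip volatile attributes first. A production setup would also swap the Map for a persistent store.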

Playwright in production:

  1. Use page.locator with stable selectors, not page.$. Locators are auto-retrying.
  2. Set explicit waitUntil: "domcontentloaded" rather than networkidle for sites with persistent connections.
  3. Reuse browser contexts across pages from the same target. New contexts are expensive.
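  4. Profile any slow page with the trace viewer: pass --trace on to the Playwright test runner, or call context.tracing.start() when driving the library directly, then open the trace with npx playwright show-trace.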

Reliability across browser engines

Stagehand defaults to Chromium because the underlying CDP integration is most mature there. Cross-engine numbers from a 1000-page test:

| Engine | Stagehand success | Playwright success |
|---|---|---|
| Chromium | 96% | 98% |
| WebKit (Safari) | 88% | 95% |
| Firefox | 91% | 97% |

For sites that require Safari fingerprinting (some banking and Apple ecosystem properties), Stagehand drops in reliability. Plain Playwright on WebKit is the safer pick.

Maintenance burden over a quarter

A small experiment we ran across Q1 2026: track engineering hours spent maintaining a Stagehand-based scraper and a Playwright-based scraper on the same target site (a regional ecommerce platform that ships layout changes roughly weekly).

| Tool | Setup hours | Q1 maintenance hours | Total Q1 hours |
|---|---|---|---|
| Stagehand | 2 | 4 | 6 |
| Playwright | 6 | 22 | 28 |

Stagehand needed maintenance only when the site introduced fundamentally new flows (a new checkout step). Playwright needed maintenance every time a CSS class changed.

This is the long-term economics that the per-page cost numbers miss. Stagehand saved 22 engineering hours over the quarter (28 versus 6), and at typical engineering hourly rates a $30/month LLM bill costs far less than that time.
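The break-even point is simple arithmetic. The sketch uses the table's 22 hours saved (28 minus 6) over a three-month quarter. The hourly rate is yours to supply; the $100 figure in the example is hypothetical and not from the experiment.

```typescript
interface QuarterCosts {
  llmBillPerMonth: number; // USD
  hoursSaved: number; // engineering hours saved over the quarter
  hourlyRate: number; // USD per engineering hour (your own figure)
}

// Net quarterly savings: engineering time saved minus three months of LLM spend.
function quarterlySavings({ llmBillPerMonth, hoursSaved, hourlyRate }: QuarterCosts): number {
  return hoursSaved * hourlyRate - llmBillPerMonth * 3;
}

// Highest monthly LLM bill at which the AI approach still breaks even.
function breakEvenMonthlyBill(hoursSaved: number, hourlyRate: number): number {
  return (hoursSaved * hourlyRate) / 3;
}
```

With 22 hours saved, a hypothetical $100 hourly rate, and a $30 monthly bill, quarterlySavings returns 2110, and breakEvenMonthlyBill(22, 100) is about $733 per month.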

When to use both

The mature pattern in 2026 is to use both. Stagehand drives discovery and exploration. Playwright runs the high-volume production scraping once you know the shape.

Concretely:

  1. Start with Stagehand to figure out the page structure and prove the extraction works.
  2. Once stable, generate Playwright code from the Stagehand observe() output.
  3. Run the Playwright pipeline at scale.
  4. Keep Stagehand on standby for the next layout change.
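Step 2 can be partly automated. The sketch below turns observe() results into Playwright statements that you review and paste into the production scraper. The ObservedAction interface is an assumption modeled on the fields shown earlier (description, method) plus a selector string; check it against the types your Stagehand version exports.

```typescript
// Assumed shape of one observe() result; verify against your Stagehand version's types.
interface ObservedAction {
  description: string;
  selector: string; // e.g. "xpath=/html/body/..."
  method?: string; // e.g. "click" or "fill"
  arguments?: string[];
}

// Locator methods this sketch knows how to emit; anything else is skipped for manual review.
const SUPPORTED_METHODS = new Set(["click", "fill", "hover", "press"]);

function toPlaywrightSource(actions: ObservedAction[]): string {
  return actions
    .filter((a) => a.method !== undefined && SUPPORTED_METHODS.has(a.method))
    .map((a) => {
      const args = (a.arguments ?? []).map((v) => JSON.stringify(v)).join(", ");
      // Collapse whitespace so a multi-line description stays a single comment line.
      const comment = a.description.replace(/\s+/g, " ").trim();
      return `// ${comment}\nawait page.locator(${JSON.stringify(a.selector)}).${a.method}(${args});`;
    })
    .join("\n");
}
```

Treat the output as a starting point. Selectors returned by observe() may be absolute XPaths, which are nearly as brittle as hand-written CSS classes, so replace them with stable attributes where you can.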

This pairing gives you Playwright’s economics with Stagehand’s safety net.
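One way to wire that safety net is to run the deterministic extractor first and fall back to the LLM only when it throws or returns implausible data. The sketch is hypothetical: the two extractors are injected functions (for instance, the Playwright and Stagehand versions of scrapeProduct shown earlier), and isValid is your own sanity check.

```typescript
type Extractor<T> = () => Promise<T>;

interface FallbackResult<T> {
  data: T;
  source: "deterministic" | "llm";
}

// Try the cheap deterministic path first; only pay for the LLM when it fails.
async function extractWithFallback<T>(
  deterministic: Extractor<T>,
  llmFallback: Extractor<T>,
  isValid: (data: T) => boolean,
): Promise<FallbackResult<T>> {
  try {
    const data = await deterministic();
    if (isValid(data)) return { data, source: "deterministic" };
  } catch {
    // Selector timeouts and similar failures fall through to the LLM path.
  }
  const data = await llmFallback();
  return { data, source: "llm" };
}
```

Counting how often source is "llm" gives you an early signal that the site changed and the Playwright selectors need regenerating.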

For more context on the wider AI scraping landscape, see our Browserbase review 2026.

Decision matrix in one place

The honest one-liner: pick by traffic volume and target volatility.

| Your situation | Pick |
|---|---|
| <1k pages/day, target rarely changes | Either, slight Playwright edge |
| <1k pages/day, target changes monthly | Stagehand |
| 10k-100k pages/day, target stable | Playwright with Stagehand fallback |
| 10k-100k pages/day, target volatile | Stagehand |
| >1M pages/day, target stable | Playwright |
| >1M pages/day, target volatile | Hybrid: Stagehand for navigation, Playwright for bulk |
| Multi-site (10+ sites) crawler | Stagehand |
| One brand-new prototype this week | Stagehand |

Frequently asked questions

Does Stagehand work without Browserbase?
Yes. Set env: "LOCAL" and Stagehand drives a local Chromium. You lose the managed browser cloud but the AI primitives all work.

Is the Python version of Stagehand production-ready?
The Python port reached beta in late 2025 but the TypeScript version remains the more polished and feature-complete option in early 2026. For Python scraping, browser-use is currently the better pick.

Can Playwright code call LLMs directly?
Yes. Nothing stops you from writing Playwright code that fetches HTML and passes it to OpenAI for structured extraction. That hybrid is essentially what Stagehand abstracts.
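A bare-bones version of that hybrid follows. It assumes Playwright has already produced the HTML (for example via page.content()) and uses OpenAI's Chat Completions endpoint in JSON mode. The model call is injected so the prompt and parsing logic can be tested offline. htmlToText is a crude regex reduction, and the model name is only an example.

```typescript
// Injected model call so prompt building and parsing can be tested without the network.
type CallModel = (system: string, user: string) => Promise<string>;

// Crude HTML-to-text reduction to save tokens; a real pipeline would use a proper parser.
function htmlToText(html: string, maxChars = 20_000): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxChars);
}

async function extractJson(html: string, instruction: string, callModel: CallModel): Promise<unknown> {
  const system = "You extract data from web pages. Reply with a single JSON object and nothing else.";
  const user = `${instruction}\n\nPage text:\n${htmlToText(html)}`;
  return JSON.parse(await callModel(system, user));
}

// One possible CallModel backed by OpenAI's Chat Completions API in JSON mode.
const callOpenAI: CallModel = async (system, user) => {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      response_format: { type: "json_object" },
      messages: [
        { role: "system", content: system },
        { role: "user", content: user },
      ],
    }),
  });
  if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
  const body = (await res.json()) as { choices: { message: { content: string } }[] };
  return body.choices[0].message.content;
};
```

This omits much of what Stagehand layers on top, such as DOM processing tuned for LLMs, schema validation, and element targeting for actions. Validating the parsed object (for instance with Zod) would be the natural next step.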

How does Stagehand handle CAPTCHAs?
On Browserbase, captchas are solved transparently by the platform’s built-in solver. Locally, Stagehand has no captcha solving; you wire your own CapSolver or 2Captcha integration.

Which one is better for SPAs (single-page apps)?
Both handle SPAs equally well at the browser level. Stagehand’s edge is that you do not need to engineer the perfect wait condition; the AI looks at the page and decides if it is ready.

Can both work in the same Node.js project without conflict?
Yes. Stagehand depends on Playwright internally. Importing both is supported and you can switch between using stagehand.page (AI primitives) and a raw chromium.launch() (deterministic) in the same script.

Are there any sites where Stagehand simply cannot work?
Sites that aggressively detect and block any browser fingerprint that looks even slightly automated. Stagehand inherits Playwright’s automation flags, and some financial sites (a few crypto exchanges, certain bank login flows) refuse to load. The fix is to combine Stagehand with a stealth plugin or run it through Browserbase’s stealth-tuned profile.

How does Stagehand handle iframes?
Stagehand’s extract and act accept an iframe context, but they are more fragile inside iframes than on the top frame. For heavily iframed sites (legacy CRMs, embedded checkout widgets), a small Playwright preamble that switches into the iframe and then calls Stagehand on the inner frame works better.

Can Stagehand resume from a failed agent run?
Not natively. The agent.execute call is one-shot. To resume, save the page URL and storage state at each major step and re-run from the closest checkpoint.
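Here is a minimal file-based checkpoint store for that approach. It assumes each checkpoint records the current URL plus the object returned by Playwright's context.storageState(); the file layout and function names are hypothetical.

```typescript
import { mkdirSync, readdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// One checkpoint per major step: where the browser was and its serialized storage state.
interface Checkpoint {
  step: number;
  label: string;
  url: string;
  storageState: unknown; // e.g. the object returned by context.storageState()
}

function saveCheckpoint(dir: string, cp: Checkpoint): void {
  mkdirSync(dir, { recursive: true });
  // Zero-padded names keep lexical order equal to step order.
  const file = join(dir, `step-${String(cp.step).padStart(4, "0")}.json`);
  writeFileSync(file, JSON.stringify(cp, null, 2));
}

function latestCheckpoint(dir: string): Checkpoint | null {
  let files: string[];
  try {
    files = readdirSync(dir).filter((f) => /^step-\d{4}\.json$/.test(f)).sort();
  } catch {
    return null; // no checkpoint directory yet
  }
  if (files.length === 0) return null;
  const last = files[files.length - 1];
  return JSON.parse(readFileSync(join(dir, last), "utf8")) as Checkpoint;
}

// Remove all checkpoints once a run completes successfully.
function clearCheckpoints(dir: string): void {
  rmSync(dir, { recursive: true, force: true });
}
```

To resume, create a context with browser.newContext({ storageState: cp.storageState }) (cast to Playwright's expected type), go to cp.url, and give the agent only the steps after cp.label.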

What is the cost of observe versus extract?
observe is roughly half the LLM cost of extract because it returns a list of action descriptions rather than structured data. Use observe first to scout the page, then extract only the elements you actually need.

Common gotchas

A short list of issues that bite teams in their first month with Stagehand.

The extract schema must use Zod, not raw JSON Schema. Common mistake: passing a JSON Schema object (or a plain object shaped like your TypeScript type) and getting a confusing runtime error.

act instructions are interpreted very literally. “Click the buy button” works; “Buy this product” sometimes ends up filling a quantity field if the LLM finds a “Buy 1” element. Be specific about the action verb.

Stagehand counts tokens against your LLM API key, not Browserbase. Even on Browserbase, the LLM cost is a separate line item.

Browserbase’s free tier limits concurrency. For more than 5 simultaneous sessions, you need a paid plan. Local Chromium is unlimited but you eat the host resources.
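Whichever limit applies, capping concurrency on your side avoids failed session launches and an overloaded host. Below is a minimal worker-pool sketch; libraries such as p-limit do the same job. Each task would typically open a Stagehand session, scrape, and close it in a finally block.

```typescript
// Run tasks with at most `limit` in flight; results keep the input order.
async function runWithLimit<T>(tasks: Array<() => Promise<T>>, limit: number): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new Error("limit must be a positive integer");
  const results = new Array<T>(tasks.length);
  let next = 0;
  // Each worker pulls the next unstarted task until none remain.
  // JavaScript is single-threaded, so next++ cannot race between workers.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, () => worker()));
  return results;
}
```

For example, runWithLimit(urls.map((u) => () => scrapeProduct(u)), 5) keeps at most five sessions open. If any task rejects, the returned promise rejects too; catch errors inside each task if you want partial results.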

Verbose logs are extremely chatty. In production, set verbose: 0 or pipe through a structured logger. The default verbosity makes finding real errors painful.

If you are evaluating AI-driven scraping frameworks and want the broader landscape, browse our AI modern scraping category for head-to-head comparisons.
