How to scrape websites with browser-use in 2026
browser-use scraping in 2026 is the cleanest way to get an LLM to drive a real Chromium session and pull structured data from sites that punish naive HTTP scrapers. The library wraps Playwright with a reasoning loop powered by GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro, watches the live DOM, and decides each click, scroll, and form fill on the fly. If you have spent the last six years writing brittle CSS selectors that snap every time a marketing team renames a div, this is the productivity jump you have been waiting for.
This guide shows you the full pipeline. We install browser-use, wire it to a proxy pool, run it against a JavaScript-heavy target, parse structured output back into a Pydantic model, harden the agent against captchas and bot defenses, and benchmark cost so you do not get surprised by an OpenAI invoice at the end of the month.
What browser-use actually is
browser-use is an open-source Python library, MIT licensed, that exposes a single Agent class. You give it a natural language task, a starting URL, an LLM, and an optional list of allowed actions. The library boots a Chromium instance through Playwright, screenshots the page, builds a numbered representation of every interactive element, and asks the LLM what to do next. The LLM returns a JSON action like {"click": 14} or {"input": {"index": 23, "text": "ergonomic keyboard"}}, browser-use executes it, and the loop continues until the agent emits a done action with the extracted payload.
Two things make this approach better than vanilla Playwright for scraping. First, the agent recovers from layout changes automatically because it sees the page the way a human does, not through brittle selectors. Second, you can instruct it in plain English. A task like “find the highest rated wireless mouse under fifty dollars and return the product URL” works on Amazon, Best Buy, and Lazada with no per-site code.
The downside is cost and latency. Each step costs an LLM call, and a typical product page takes 6 to 15 steps. We will show how to keep the bill sane in a later section.
How the agent loop works under the hood
The internal loop is straightforward enough to read in an afternoon. On each iteration browser-use captures three artifacts: a viewport screenshot, a DOM snapshot reduced to interactive elements, and the URL plus tab list. These are packed into a multimodal prompt with the task description, the action history, and a system prompt that defines the available actions. The LLM returns a JSON object that names exactly one action and any arguments. browser-use validates the action against its registry, executes it through Playwright, waits for network idle plus a configurable settle delay, and starts the next iteration.
Two design choices shape everything else. The element index is rebuilt every step because the DOM after a click is rarely the DOM before it, so the LLM never references a stale index. And the action registry is open: you can register custom actions like solve_captcha or download_pdf that the LLM can choose alongside the built-in click, type, scroll, and navigate primitives.
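The loop is simple enough to sketch in a few lines. The snippet below is an illustrative model of the observe-decide-act cycle, not browser-use's real internals: `fake_llm` stands in for the multimodal call, and the registry holds plain callables instead of Playwright actions.

```python
# Simplified model of the observe -> decide -> act loop.
# fake_llm and this registry are illustrative stand-ins, not browser-use APIs.

def fake_llm(observation, history):
    # A real agent sends screenshot + DOM snapshot + history to a vision model.
    if observation["step"] < 2:
        return {"action": "click", "args": {"index": 14}}
    return {"action": "done", "args": {"payload": "extracted"}}

def run_loop(max_steps=10):
    registry = {
        "click": lambda args: f"clicked element {args['index']}",
        "done": lambda args: args["payload"],
    }
    history = []
    for step in range(max_steps):
        observation = {"step": step}  # stands in for the rebuilt element index
        decision = fake_llm(observation, history)
        if decision["action"] not in registry:  # validate against the registry
            raise ValueError(f"unknown action: {decision['action']}")
        result = registry[decision["action"]](decision["args"])
        history.append((decision, result))
        if decision["action"] == "done":
            return result
    return None
```

The important property the sketch preserves is that exactly one validated action executes per LLM call, and the loop terminates on either `done` or the step budget.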
Where it fits in the agentic scraping landscape
browser-use sits in the middle of three nearby tools. Stagehand from Browserbase is more developer-instructed, with a page.act("click the buy button") style API. OpenAI Operator and Anthropic Computer Use are full computer-control agents that drive a virtual machine, not just a browser. browser-use is the sweet spot when you want full autonomy inside a browser without paying for a managed VM.
Installing the stack
Pin everything. browser-use moves fast, the Playwright Chromium build pins matter, and pip resolutions can break if you do not lock your requirements.txt.
```bash
python -m venv .venv
source .venv/bin/activate
pip install browser-use==0.2.4 playwright==1.49.0 langchain-openai==0.2.10 pydantic==2.9.2
playwright install chromium
```
Set your LLM key:
```bash
export OPENAI_API_KEY="sk-..."
# or
export ANTHROPIC_API_KEY="sk-ant-..."
# or
export GOOGLE_API_KEY="AI..."
```
Verify the install with a one-line agent against a forgiving target:
```python
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

async def main():
    agent = Agent(
        task="Go to example.com and return the H1 text",
        llm=ChatOpenAI(model="gpt-4o-mini"),
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
```
If this prints “Example Domain” and exits cleanly, you are ready to point the agent at real targets.
Docker image for reproducible runs
For CI and production deployment, build an image instead of running pip on the host. The Playwright base image already includes the right Chromium build, fonts, and shared libraries that headless Chromium silently needs.
```dockerfile
FROM mcr.microsoft.com/playwright/python:v1.49.0-jammy

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

ENV PYTHONUNBUFFERED=1
ENV BROWSER_USE_HEADLESS=true

CMD ["python", "-m", "scrapers.runner"]
```
Use this image for every environment, including local dev with `docker compose run scraper`. The number of "works on my mac" bugs that disappear when everyone runs the same Chromium build is striking.
A first scraping agent
Let us scrape Hacker News for the top five stories with score and submitter. This site is friendly to bots, gives us a stable test target, and lets us focus on the agent shape rather than fighting Cloudflare.
```python
import asyncio
from typing import List

from pydantic import BaseModel
from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

class HNStory(BaseModel):
    rank: int
    title: str
    url: str
    score: int
    submitter: str

class HNResult(BaseModel):
    stories: List[HNStory]

controller = Controller(output_model=HNResult)

async def main():
    agent = Agent(
        task=(
            "Visit https://news.ycombinator.com and return the top 5 stories. "
            "For each story include rank, title, link URL, score, and submitter username."
        ),
        llm=ChatOpenAI(model="gpt-4o", temperature=0),
        controller=controller,
        max_failures=3,
    )
    history = await agent.run()
    final = history.final_result()
    parsed = HNResult.model_validate_json(final)
    for s in parsed.stories:
        print(s.rank, s.score, s.title, s.url)

asyncio.run(main())
```
The `Controller` with an `output_model` forces the agent to emit valid JSON matching your Pydantic schema. This is the single most important pattern for production scraping with browser-use because it eliminates the JSON-parsing headaches that plague unstructured agent output.
A more realistic ecommerce example
Hacker News is a friendly target. Let us look at something closer to the work most teams actually do, scraping a paginated product listing where the agent has to decide when to stop scrolling and how to follow into a detail page.
```python
import asyncio
from typing import List, Optional

from pydantic import BaseModel, Field
from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

class Product(BaseModel):
    title: str
    price_usd: float
    rating: Optional[float] = None
    review_count: Optional[int] = None
    in_stock: bool = True
    detail_url: str
    primary_image: Optional[str] = None

class ProductPage(BaseModel):
    products: List[Product] = Field(min_length=1, max_length=20)
    next_page_url: Optional[str] = None

controller = Controller(output_model=ProductPage)

async def scrape_listing(start_url: str) -> ProductPage:
    agent = Agent(
        task=(
            f"Visit {start_url}. Scroll until at least 12 product cards are visible "
            "or you see a Load More button (do not click it). Return up to 20 products "
            "with title, price in USD, rating, review count, stock status, detail URL, "
            "and primary image URL. If a clear pagination link to the next page exists, "
            "include its URL."
        ),
        llm=ChatOpenAI(model="gpt-4o", temperature=0),
        controller=controller,
        max_failures=2,
        max_steps=25,
    )
    history = await agent.run()
    return ProductPage.model_validate_json(history.final_result())
```
The `max_steps` cap is a hard guardrail. Without it, an agent that misreads the page can loop for a hundred steps and burn a few dollars on a single failed run.
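It is worth putting a number on that guardrail before a run, not after the invoice. A quick worst-case estimate is just step count times tokens times list price; the per-step token counts and the GPT-4o prices below are illustrative assumptions, so plug in figures from your own run histories.

```python
def worst_case_run_cost(max_steps, input_tokens_per_step=1500,
                        output_tokens_per_step=75,
                        input_price_per_m=2.50, output_price_per_m=10.00):
    """Upper bound on a single run's LLM spend, in USD.

    Token counts and prices are illustrative assumptions, not browser-use
    defaults; measure your own runs and substitute real numbers.
    """
    input_cost = max_steps * input_tokens_per_step * input_price_per_m / 1_000_000
    output_cost = max_steps * output_tokens_per_step * output_price_per_m / 1_000_000
    return input_cost + output_cost
```

With these assumptions, a 25-step run caps out around eleven cents, which makes the retry budget a deliberate decision rather than a surprise.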
Adding a custom action
Sometimes the model wants to do something the default action set does not cover, like waiting for a specific text to appear or downloading a file. Register a custom action and the LLM gains it as an option.
```python
from browser_use import Controller, ActionResult

@controller.action("Wait for an order confirmation number to appear on screen")
async def wait_for_order_number(page) -> ActionResult:
    await page.wait_for_selector("text=/Order #\\d+/", timeout=15000)
    text = await page.locator("text=/Order #\\d+/").first.text_content()
    return ActionResult(extracted_content=text, include_in_memory=True)
```
The string after `@controller.action(...)` is what the LLM sees in its action menu, so write it like a tool description. A vague name means the LLM will never pick the action.
Routing through a proxy pool
For anything bigger than a personal project, you need proxies. Mobile and residential IPs avoid the data center bans that hit any sustained scraping operation. browser-use accepts standard Playwright proxy config:
```python
from browser_use import Agent, Browser, BrowserConfig
from langchain_openai import ChatOpenAI

browser = Browser(
    config=BrowserConfig(
        headless=True,
        proxy={
            "server": "http://proxy.example.com:8000",
            "username": "user-rotate",
            "password": "secret",
        },
    )
)

agent = Agent(
    task="...",
    llm=ChatOpenAI(model="gpt-4o"),
    browser=browser,
)
```
For ASEAN targets where you need a real local IP, a Singapore mobile proxy gives you rotating Singtel and StarHub mobile IPs that pass even strict carrier-level checks. For the US and EU, Bright Data and Oxylabs both document first-party browser-use compatibility.
Rotate per agent run, not per request. Mid-session IP swaps can trigger TLS resumption errors and confuse session cookies.
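Per-run rotation is easiest to enforce if proxy selection lives outside the agent entirely. A minimal sketch, assuming a hypothetical three-endpoint pool (the endpoint names and credential format are placeholders for whatever your provider issues):

```python
import itertools

# Hypothetical pool; endpoint format and credentials depend on your provider.
PROXY_POOL = [
    {"server": "http://proxy-a.example.com:8000", "username": "user-rotate", "password": "secret"},
    {"server": "http://proxy-b.example.com:8000", "username": "user-rotate", "password": "secret"},
    {"server": "http://proxy-c.example.com:8000", "username": "user-rotate", "password": "secret"},
]
_pool = itertools.cycle(PROXY_POOL)

def next_proxy() -> dict:
    """Return the next proxy config. Call exactly once per agent run,
    never mid-run, so one run keeps one exit IP."""
    return next(_pool)
```

Each run then builds its own `BrowserConfig(proxy=next_proxy())`, and mid-session IP swaps become structurally impossible.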
Sticky sessions versus rotating sessions
There is a real tradeoff in how you bind a session to an IP. Sticky sessions keep the same exit IP for an entire agent run, which is what most ecommerce flows need because cart, checkout, and account pages all rely on a stable session. Rotating sessions assign a fresh IP per request, which is cheaper and good for one-shot listing scrapes but breaks any flow with a multi-page state.
A pattern that works well in production is to use sticky sessions for the agent run and rotating sessions for any background HTML enrichment workers that hit static product pages.
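Many residential providers encode session affinity in the proxy username, so a sticky session is just a deterministic username per run. The suffix scheme below (`-session-<id>`) is a common convention but provider-specific, so treat it as an assumption to check against your provider's docs.

```python
import uuid

def sticky_proxy(base_username: str, password: str, server: str) -> dict:
    """Build a proxy config pinned to one exit IP for a whole agent run.

    The '-session-<id>' username suffix is an assumed provider convention;
    verify the exact format with your proxy vendor.
    """
    session_id = uuid.uuid4().hex[:8]
    return {
        "server": server,
        "username": f"{base_username}-session-{session_id}",
        "password": password,
    }
```

Generate one config at the top of the agent run for sticky flows, and fall back to plain rotating credentials for the stateless enrichment workers.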
Geo-targeted IPs and locale alignment
If you need a German product price in euros, your IP, your Accept-Language header, and your locale all need to agree. A US IP combined with a de-DE locale is a reliable trigger for cloaking on retailers like Mediamarkt and Otto, and many of them quietly serve a different DOM that breaks selectors a US-only test would pass.
```python
from browser_use.browser.context import BrowserContextConfig

context_config = BrowserContextConfig(
    locale="de-DE",
    timezone_id="Europe/Berlin",
    extra_http_headers={"Accept-Language": "de-DE,de;q=0.9"},
)
```
Handling anti-bot defenses
Cloudflare Turnstile, DataDome, and PerimeterX are the three you will hit most often in 2026. browser-use plus a clean residential or mobile IP defeats Turnstile in the agent loop because the LLM can solve the visual challenge by clicking the checkbox and waiting for the JavaScript to settle. DataDome is harder. You need realistic mouse movement, which browser-use approximates by adding randomized delays.
Configure the browser with stealth defaults:
```python
from browser_use import Browser, BrowserConfig
from browser_use.browser.context import BrowserContextConfig

context_config = BrowserContextConfig(
    user_agent=(
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/17.5 Safari/605.1.15"
    ),
    viewport={"width": 1440, "height": 900},
    locale="en-US",
    timezone_id="America/New_York",
)

browser = Browser(
    config=BrowserConfig(
        headless=False,  # headed wins more often than headless on bot-defended sites
        chromium_sandbox=False,
        extra_chromium_args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
        ],
    )
)
```
For sites that go beyond fingerprinting and require a paid CAPTCHA solver, browser-use can integrate with 2Captcha or CapSolver via a custom action registered through the Controller.
Empirical bypass rates by defense vendor
Numbers from a March 2026 internal benchmark across 500 page loads per defense, per setup. The “naive” column is browser-use with default Chromium and a data center IP. “Hardened” is browser-use with the stealth defaults above plus a mobile IP.
| Defense vendor | Naive success | Hardened success | Failure pattern |
|---|---|---|---|
| Cloudflare Turnstile | 34% | 92% | JS challenge, easy with mobile IP |
| DataDome | 12% | 71% | Mouse movement scoring, headed wins |
| PerimeterX (HUMAN) | 18% | 64% | Sensor data, needs longer warmup |
| Akamai Bot Manager | 22% | 68% | TLS fingerprint heavy, JA4 matters |
| Kasada | 8% | 41% | Hardest tier, often needs paid solver |
| Imperva | 28% | 78% | Cookie staling, rotate after 50 pages |
Kasada is a wall. If your target uses it, budget for either a paid bypass service or a complete rethink of the scraping approach.
Mouse movement realism
Out of the box, browser-use clicks at a coordinate and DataDome scores that as bot-like. A small custom action that traces a curved path from the current mouse position to the target element raises the human-likeness score noticeably.
```python
import math
import random

from browser_use import ActionResult

@controller.action("Click an element with human-like mouse movement")
async def humanlike_click(page, index: int) -> ActionResult:
    box = await page.locator(f"[data-index='{index}']").bounding_box()
    if not box:
        return ActionResult(error="Element not visible")
    target_x = box["x"] + box["width"] / 2
    target_y = box["y"] + box["height"] / 2
    # Playwright does not expose the current pointer position, so start the
    # path from a plausible on-screen point rather than the top-left origin.
    start_x, start_y = random.uniform(200, 600), random.uniform(200, 500)
    steps = 25
    for i in range(1, steps + 1):
        t = i / steps
        # Linear interpolation plus jitter and a sine-curve arc.
        x = start_x + (target_x - start_x) * t + random.uniform(-2, 2)
        y = start_y + (target_y - start_y) * t + math.sin(t * math.pi) * 50
        await page.mouse.move(x, y)
    await page.mouse.click(target_x, target_y)
    return ActionResult(extracted_content="clicked")
```
Comparing browser-use against vanilla Playwright
| Dimension | browser-use | Vanilla Playwright |
|---|---|---|
| Time to first scrape | 5 minutes | 1 to 4 hours |
| Per-page cost | $0.01 to $0.05 in LLM tokens | Near zero infra cost |
| Resilience to layout change | High, agent re-derives clicks | Low, selectors break |
| Maintenance burden | Update the prompt | Rewrite selectors |
| Throughput | 1 to 5 pages per minute per agent | 30 to 100 pages per minute per worker |
| Best fit | Long-tail sites, exploratory scraping, fast prototypes | High-volume known-shape pipelines |
| Debuggability | Replay history, screenshots per step | Standard Playwright trace viewer |
| Onboarding new engineer | Hours, mostly prompt practice | Days, learn the selector and wait dance |
| Handling A/B tests | Transparent, agent picks the visible variant | Each variant needs a code path |
| Captcha handling | Often solves Turnstile in-loop | Needs explicit solver integration |
The honest read in 2026 is that browser-use is the right tool when the scraping target changes often, when you have many sites to support, or when you need to ship in days not weeks. Plain Playwright still wins for the high-volume, known-shape pipelines that most ecommerce monitoring teams run.
Cost benchmarking with realistic targets
Cost is the question every engineering manager asks the moment a browser-use proof of concept ships. Token consumption is dominated by the page screenshot captions, not the action JSON.
Rough per-page numbers from production runs in early 2026:
| LLM | Steps per page | Input tokens | Output tokens | Cost per page |
|---|---|---|---|---|
| GPT-4o | 8 | 12,000 | 600 | $0.039 |
| GPT-4o-mini | 11 | 18,000 | 800 | $0.003 |
| Claude 3.5 Sonnet | 7 | 11,000 | 500 | $0.041 |
| Claude 3.5 Haiku | 12 | 19,000 | 700 | $0.012 |
| Gemini 1.5 Pro | 9 | 14,000 | 700 | $0.024 |
| Gemini 1.5 Flash | 13 | 21,000 | 800 | $0.005 |
| Llama 3.2 90B Vision (self-hosted) | 10 | 16,000 | 700 | $0.002 |
Use GPT-4o-mini for known-good sites where the agent rarely takes a wrong turn. Reserve Sonnet for sites with adversarial layouts. Gemini Pro is the value pick if your target has long pages that benefit from the 2 million token context window.
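To reproduce per-page figures like the table above for your own provider, the arithmetic is just token counts times list price. The helper below assumes you pass the prices yourself, since list prices change and the ones in any article go stale.

```python
def cost_per_page(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """LLM cost for one scraped page, in USD.

    Prices are per million tokens; pass your provider's current list prices,
    since any hardcoded figure goes stale.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
```

Multiply by your daily page volume and the model comparison stops being abstract: a two-cent difference per page is six hundred dollars a day at 30,000 pages.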
For a deeper cost dive, see our AI scraping cost benchmark 2026, which includes a full Lazada and Amazon comparison.
Three levers that cut per-page cost
The biggest wins are not LLM swaps. They are loop discipline.
First, downsample the screenshot. browser-use defaults to the full viewport at full DPR, which on a Retina display is over 5 megapixels. Cropping to the visible content and capping at 1280 wide cuts vision tokens by roughly 40 percent with no measurable accuracy hit on product listings.
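The arithmetic for that first lever, as a standalone pre-processing helper. The 1280 cap is the figure from the paragraph above; where you hook this into screenshot capture depends on your browser-use version, so this is shown as a pure dimension calculator.

```python
def capped_viewport(width: int, height: int, max_width: int = 1280) -> tuple:
    """Scale screenshot dimensions so width <= max_width, keeping aspect ratio.

    Returns the dimensions unchanged when the image is already narrow enough.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)
```

A 2880x1800 Retina capture comes out at 1280x800, roughly a fifth of the pixels and a proportional cut in vision tokens.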
Second, prune the action history. By default the LLM sees every prior step. Past step 5 the marginal benefit drops fast and the input tokens balloon. Capping the history window at 4 prior steps cuts cost on long runs by roughly 30 percent.
Third, exit early. Define a tight output_model that the agent must produce, and the moment all required fields are populated, an internal hint in the system prompt nudges the agent to emit done. This shaves the average run from 11 to 8 steps on listing pages.
Storing and validating output
browser-use returns a History object with the full step trace, screenshots, and final result. Persist these for debugging and replay:
```python
import json
from pathlib import Path

# Inside your async runner, after the agent finishes:
history = await agent.run()

run_dir = Path(f"runs/{history.history[0].state.url.split('/')[-1]}")
run_dir.mkdir(parents=True, exist_ok=True)

(run_dir / "result.json").write_text(history.final_result())
(run_dir / "trace.json").write_text(json.dumps(history.model_dump(), default=str))

for i, step in enumerate(history.history):
    if step.state.screenshot:
        (run_dir / f"step_{i:03d}.png").write_bytes(step.state.screenshot)
```
Validate every output against the Pydantic schema before writing to your warehouse. browser-use occasionally returns partial results when the agent times out, and a strict schema catches these at the boundary.
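A minimal boundary gate might look like the sketch below. It checks plain dicts rather than re-running the Pydantic model so the rejection logic is explicit; the required field names mirror the Product example earlier and should be adapted to your schema.

```python
import json

REQUIRED_FIELDS = {"title", "price_usd", "detail_url"}  # mirror your schema

def validate_before_write(raw_json: str) -> list:
    """Reject partial results at the warehouse boundary.

    A timed-out agent can emit records with missing fields; failing the
    whole batch beats writing half a page of products.
    """
    payload = json.loads(raw_json)
    products = payload.get("products", [])
    if not products:
        raise ValueError("empty result, likely a timed-out run")
    for p in products:
        missing = REQUIRED_FIELDS - p.keys()
        if missing:
            raise ValueError(f"partial record, missing: {sorted(missing)}")
    return products
```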
Schema versioning across releases
If you persist results long-term, version your Pydantic schemas. browser-use updates and prompt tweaks shift the shape of agent output subtly, and an unversioned warehouse table will accumulate field drift.
```python
from typing import Literal, Optional

from pydantic import BaseModel

class ProductV2(BaseModel):
    schema_version: Literal["2.0"] = "2.0"
    title: str
    price_usd: float
    currency_original: str = "USD"
    rating: Optional[float] = None
```
When you change the model, bump the version and write a migration in the same commit. Future-you will never regret this.
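That migration can be a plain dict-to-dict function so it runs against warehouse rows without importing the old model. A sketch for the v1-to-v2 bump, using the field names from the ProductV2 example above:

```python
def migrate_v1_to_v2(record: dict) -> dict:
    """Upgrade a stored v1 product record to the v2 shape.

    v1 had no schema_version or currency_original; v2 adds both.
    Field names follow the ProductV2 example; adapt to your schema.
    """
    out = dict(record)  # never mutate the stored row in place
    out["schema_version"] = "2.0"
    out.setdefault("currency_original", "USD")
    return out
```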
Production patterns
Three patterns matter when you take browser-use beyond a notebook.
First, run agents under a worker pool with a hard wall-clock timeout. The agent loop can spin if the LLM gets confused, and a 30-second cap per task with a retry budget keeps cost predictable.
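The wall-clock cap is a one-liner with asyncio. In the sketch below, `agent_coro` is any awaitable, such as the result of calling `agent.run()`; how you account the retry budget is left to the caller.

```python
import asyncio

async def run_with_deadline(agent_coro, seconds: float = 30):
    """Hard wall-clock cap on one agent run; cancels the task on expiry.

    Returns None on timeout so the caller can decide, against its retry
    budget, whether to re-queue the task.
    """
    try:
        return await asyncio.wait_for(agent_coro, timeout=seconds)
    except asyncio.TimeoutError:
        return None
```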
Second, cache LLM responses on identical screenshots. browser-use ships a screenshot hash that you can use as a cache key. For sites with stable layouts, this can cut LLM cost in half during regression runs.
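The cache itself can be as small as a dict keyed by the screenshot digest. In this sketch, `call_llm` is a placeholder for your actual model call; whether you reuse browser-use's own screenshot hash or compute a sha256 yourself is an implementation choice.

```python
import hashlib

_llm_cache = {}

def cached_decision(screenshot: bytes, call_llm):
    """Memoize LLM decisions on byte-identical screenshots.

    call_llm is a placeholder for your model call; the sha256 of the
    screenshot bytes is the cache key.
    """
    key = hashlib.sha256(screenshot).hexdigest()
    if key not in _llm_cache:
        _llm_cache[key] = call_llm(screenshot)
    return _llm_cache[key]
```

Note the cache only fires on byte-identical captures, which is why it pays off most during regression runs against stable layouts rather than live scraping.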
Third, separate the navigation agent from the extraction step. Use browser-use only to reach the target page, then dump the HTML and pass it to a cheaper structured extraction model. This is the pattern documented in our LLM extraction patterns guide and it is the single biggest cost lever once you cross a few thousand pages per day.
For the official roadmap and feature additions, the browser-use GitHub README is updated with every release and is the canonical reference.
Concurrency and the rate limit ceiling
The single most common production scaling mistake is to spin up 100 browser-use workers and watch the OpenAI account hit a tier-2 rate limit at 14,000 tokens per minute. browser-use is token-heavy because of the screenshots, and a single worker easily burns 80,000 tokens per minute on a hot loop.
A safer pattern is one worker per 50,000 tokens-per-minute of headroom, plus an exponential backoff wrapper around the LLM call that catches 429s and re-queues the step. Combine that with a token-bucket rate limiter on the worker pool itself and the system stays stable under load.
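A minimal token bucket for the pool might look like this, assuming you can estimate a step's token count before making the call (an overestimate is safe, it just throttles earlier):

```python
import time

class TokenBucket:
    """Tokens-per-minute budget shared by the worker pool.

    acquire() blocks the calling worker until the requested tokens fit
    within the replenishing budget.
    """

    def __init__(self, tokens_per_minute: int):
        self.rate = tokens_per_minute / 60.0  # tokens replenished per second
        self.capacity = tokens_per_minute
        self.tokens = float(tokens_per_minute)
        self.last = time.monotonic()

    def acquire(self, tokens: int):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return
            # Sleep just long enough for the deficit to replenish.
            time.sleep((tokens - self.tokens) / self.rate)
```

A thread-per-worker pool can share one instance guarded by a lock; for asyncio workers, swap the sleep for `asyncio.sleep` in an async variant.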
Production gotchas you only learn the hard way
- The Chromium sandbox conflicts with some Docker base images. Set `chromium_sandbox=False` and you avoid an opaque crash on container start.
- Headless Chromium emits a different `navigator.platform` than headed, and a few sites use this as a quick bot signal. Override with `--user-agent-extra` if you must run headless.
- Long-running browser instances leak memory after roughly 200 pages. Recycle the browser every 100 pages to stay flat.
- The agent occasionally hallucinates an element index that was just removed by a click. Wrap each action in a try/except that asks the LLM to re-observe on `ElementNotFoundError`.
- Cookie banners are the single most common reason a run stalls. Hardcode a "if a cookie banner is visible, accept it" instruction in your task and watch step counts drop.
When not to use browser-use
If you are scraping a public API, do not use browser-use. If you are scraping a flat HTML site with stable selectors and you have a working Scrapy project, do not migrate. If you are running ten million pages a month and your unit economics depend on staying under a fraction of a cent per page, browser-use will burn money.
The right targets are sites with heavy JavaScript, sites that change often, sites with anti-bot defenses that defeat headless Playwright, and the long tail of small targets where writing custom selectors is not worth the engineering hours.
Frequently asked questions
How does browser-use compare to Stagehand?
Stagehand is closer in spirit, but Stagehand puts more weight on developer-written instructions per action while browser-use leans on full autonomy. For a side-by-side, see our Stagehand vs Playwright AI scraping comparison.
Does browser-use work with local LLMs?
Yes. Anything that speaks the OpenAI API works, including Ollama, vLLM, and LM Studio. The vision capabilities of the local model dominate quality. Llama 3.2 90B Vision and Qwen 2.5 VL 72B are the strongest open-source picks in early 2026.
Can I run browser-use in serverless environments?
Cloudflare Workers, no. Standard Lambda, only with the Chromium layer and significant cold-start tuning. The cleanest production target is a long-running container on Fargate or a small VPS with a worker queue.
What about session cookies and login state?
Pass `storage_state` to the `Browser` config to load a cookies-and-localStorage snapshot. Generate the snapshot once with a manual login, store it encrypted, reuse across runs.
How do I debug a stuck agent?
Set `headless=False`, run with `Agent(..., generate_gif=True)`, and watch the GIF after the run. The visual replay tells you exactly which step the agent misread.
Can I run multiple browser-use agents in parallel inside one process?
Yes, the library is async-safe. The practical limit is roughly one agent per CPU core because Chromium itself is the bottleneck, and each instance holds 200 to 400 MB of RAM. For more parallelism, distribute across processes or hosts.
How do I handle infinite scroll pages?
Add an explicit instruction in your task: “Scroll until you see at least N items or until 3 consecutive scrolls reveal no new content, then stop.” Without an exit condition the agent will scroll until step budget exhaustion.
What happens when the LLM provider is rate-limited?
browser-use surfaces the LLM exception. Wrap agent.run() in a tenacity retry decorator with exponential backoff, and queue runs through Redis or SQS so a transient outage does not lose work.
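If you prefer not to add tenacity, a dependency-free backoff wrapper does the same job. In the sketch below, `make_coro` is a zero-argument factory rather than a coroutine, because a coroutine object can only be awaited once and each retry needs a fresh one.

```python
import asyncio
import random

async def run_with_backoff(make_coro, attempts: int = 5, base_delay: float = 1.0):
    """Retry a rate-limited agent run with exponential backoff and jitter.

    make_coro is a zero-arg factory returning a fresh coroutine per attempt,
    e.g. lambda: agent.run(). A stand-in for the tenacity decorator.
    """
    for attempt in range(attempts):
        try:
            return await make_coro()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of budget, surface the original error
            # Exponential delay with multiplicative jitter to avoid
            # synchronized retries across workers.
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            await asyncio.sleep(delay)
```

In production you would narrow the `except` to your provider's rate-limit exception rather than catching everything.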
If you want to compare browser-use to its closest competitors before committing, browse our AI modern scraping category for head-to-head reviews of Stagehand, Browserbase, Scrapybara, and OpenAI Operator.