Scraping with MCP servers in 2026: a practical guide
Scraping through MCP servers is the architecture pattern that finally stopped feeling experimental in early 2026. Anthropic shipped the Model Context Protocol in late 2024, the spec stabilized at the 2025-06-18 revision, and by Q1 2026 every major LLM client (Claude Desktop, Claude Code, Cursor, Zed, Continue, the OpenAI Responses API, and Gemini Code Assist) speaks MCP natively. For scraping teams that means one thing: write your scraping logic once as an MCP server, and every MCP-compatible client can call it.
This guide walks through building an MCP server that exposes scraping tools, runs them inside an isolated browser pool, returns structured data, and handles auth, rate limiting, and observability. By the end you will have a server that any MCP-compatible client can plug into, code that works in production, and benchmarks that show where MCP wins and where it does not.
Why MCP is the right shape for scraping
The classic problem with LLM-driven scraping is that every team reinvents the same plumbing. You write a Python function that fetches a page, you wrap it in a tool schema for OpenAI function calling, you wrap it again for Anthropic tool use, you wrap it a third time for Gemini, and now your tool is locked to one client per integration. MCP collapses all three integrations into one server.
MCP is a JSON-RPC 2.0 protocol with three primitive types: tools (functions the LLM can call), resources (data the LLM can read), and prompts (templates the LLM can request). For scraping, you mostly care about tools. The full spec is at modelcontextprotocol.io and the reference implementations live on the MCP servers GitHub repo.
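To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 request a client sends to invoke a tool. The tools/call method and the name and arguments params follow the MCP spec; the tool name fetch_url and the URL are placeholders, and build_tool_call is a hypothetical helper, not part of any SDK.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical call: ask a scraping server to fetch one page.
wire = build_tool_call(1, "fetch_url", {"url": "https://example.com"})
```

In practice the SDKs build and parse these messages for you, so you rarely write them by hand. Seeing the shape helps when reading transport logs.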
Three properties make MCP the right shape for scraping infrastructure. The protocol is transport-agnostic, so you can run a server over stdio for local trust or HTTP with bearer auth for remote access. Tool schemas are JSON Schema, so the LLM gets typed parameters and the server gets validation for free. Servers are stateful by design, so you can keep a browser session warm across tool calls without leaking state across users.
What MCP is not
A few misconceptions are worth heading off because they show up in design reviews. MCP is not a model. It is a protocol that lets a client and a server agree on what tools exist and how to call them. MCP is not a hosted service. Anthropic publishes the spec and the SDKs, but you run your own servers wherever you like. And MCP is not exclusive to Claude. The protocol is open, and OpenAI, Google, and the major IDE vendors have shipped MCP clients in the last six months.
Resources versus tools for scraped data
The protocol distinguishes resources (read-only data the LLM can pull) from tools (actions with side effects). For scraping, the rule of thumb is: expose live fetches as tools and expose recent results as resources. A recent_scrapes resource that lists the last 50 successful fetches by URL means the LLM can reference past work without paying to scrape the same page twice. This pattern alone has cut LLM token spend by 20 to 30 percent on workflows where the same handful of URLs get queried repeatedly.
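One way to back a recent_scrapes resource is a bounded in-memory store. The sketch below uses only the standard library and assumes a single-process server; the class and field names (RecentScrapes, ScrapeRecord) are illustrative, not from any SDK.

```python
from collections import deque
from dataclasses import asdict, dataclass
import json
import time


@dataclass
class ScrapeRecord:
    url: str
    status: int
    fetched_at: float


class RecentScrapes:
    """Keep the most recent successful fetches, newest first."""

    def __init__(self, limit: int = 50) -> None:
        # maxlen makes the deque drop the oldest entry automatically.
        self._items: deque = deque(maxlen=limit)

    def record(self, url: str, status: int) -> None:
        if 200 <= status < 300:  # only successful fetches are worth reusing
            self._items.appendleft(ScrapeRecord(url, status, time.time()))

    def as_json(self) -> str:
        return json.dumps([asdict(item) for item in self._items])


recent = RecentScrapes(limit=50)
```

In a FastMCP server you would call recent.record(...) at the end of each fetch tool and return recent.as_json() from a function registered with the SDK's resource decorator under a URI of your choosing (for example a hypothetical scrapes://recent).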
Designing your scraping MCP server
A useful scraping MCP server exposes three to seven tools. Resist the urge to expose forty. The LLM picks tools by reading their descriptions, and a long tool list dilutes selection accuracy.
A clean baseline tool surface for a general scraping server:
| Tool name | Purpose | Returns |
|---|---|---|
| fetch_url | GET a single URL with retry and proxy rotation | HTML or JSON body |
| extract_structured | Run an LLM extraction prompt over fetched HTML | JSON matching a passed schema |
| screenshot | Render via headless Chromium and return PNG | base64 PNG |
| search_serp | Issue a query against a SERP provider | top 10 results with title, snippet, URL |
| crawl | BFS over a site with depth and same-origin filters | list of URL plus metadata |
Each tool gets a JSON Schema describing its parameters and a one-paragraph description that tells the LLM exactly when to call it. Bad descriptions are the most common cause of tool-selection failures.
Writing tool descriptions the LLM actually understands
The single biggest win in MCP server design is treating the tool description as a prompt, not as documentation. A bad description reads like a function comment: “Fetches a URL and returns the body.” A good description tells the LLM when to choose this tool over the alternatives, what to pass, and what to expect back.
Compare:
Bad:
Fetches a URL and returns the response body.
Good:
Fetch a URL over HTTP with automatic retry and proxy rotation.
Use for static HTML pages, JSON APIs, or any resource that does not
require JavaScript rendering. For pages that need a browser (SPAs,
pages behind Cloudflare interactive challenges), call render_page
instead. Returns status, content type, and body. body is truncated
at 200 KB so for very large pages, paginate with the offset arg.
The “good” version names a sibling tool by name, sets expectations on truncation, and tells the LLM when not to use it. This kind of cross-referencing between tools cuts tool-selection errors by roughly half on multi-tool servers.
Building the server in Python
The official Python SDK is mcp on PyPI. The fastest path is to use the FastMCP helper, which gives you a Flask-style decorator API.
pip install "mcp[cli]" httpx playwright pydantic
playwright install chromium
Skeleton server with two tools:
from mcp.server.fastmcp import FastMCP, Image
from pydantic import BaseModel, Field
from typing import Optional
import httpx
from playwright.async_api import async_playwright

mcp = FastMCP("drt-scraping-server")


class FetchResult(BaseModel):
    url: str
    status: int
    content_type: str
    body: str
    final_url: str


@mcp.tool()
async def fetch_url(
    url: str = Field(..., description="The URL to fetch"),
    timeout_s: int = Field(30, description="Request timeout in seconds"),
    proxy: Optional[str] = Field(None, description="Optional proxy URL"),
) -> FetchResult:
    """Fetch a single URL. Use for static HTML, JSON APIs, or any
    resource that does not require JavaScript rendering."""
    # Retry is left out of this skeleton; wrap client.get() once you add it.
    async with httpx.AsyncClient(
        proxy=proxy,
        timeout=timeout_s,
        follow_redirects=True,
        headers={"User-Agent": "Mozilla/5.0 (compatible; DRTBot/1.0)"},
    ) as client:
        r = await client.get(url)
        return FetchResult(
            url=url,
            status=r.status_code,
            content_type=r.headers.get("content-type", ""),
            body=r.text,
            final_url=str(r.url),  # URL after redirects
        )


@mcp.tool()
async def screenshot(url: str, full_page: bool = True) -> Image:
    """Render a page in headless Chromium and return a PNG screenshot.
    Use when you need to see how a page actually renders."""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle")
            png = await page.screenshot(full_page=full_page)
        finally:
            await browser.close()  # release Chromium even if navigation fails
    # Image wraps the raw bytes as a base64 image content block.
    return Image(data=png, format="png")


if __name__ == "__main__":
    mcp.run(transport="stdio")
That is a functional server. Run it with python server.py and Claude Desktop will pick it up if you add an entry to claude_desktop_config.json.
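For reference, the entry Claude Desktop expects lives under an mcpServers key and names a command plus its arguments. The sketch below builds and prints that fragment. The server name drt-scraping and the script path are placeholders, and desktop_config_entry is a hypothetical helper.

```python
import json


def desktop_config_entry(name: str, script_path: str) -> dict:
    """Build the `mcpServers` entry Claude Desktop reads to launch a stdio server."""
    return {"mcpServers": {name: {"command": "python", "args": [script_path]}}}


# Hypothetical server name and path; use an absolute path to your own server.py.
entry = desktop_config_entry("drt-scraping", "/abs/path/to/server.py")
print(json.dumps(entry, indent=2))
```

Merge the printed fragment into your existing claude_desktop_config.json rather than overwriting the file, then restart Claude Desktop so it relaunches the server.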
A TypeScript variant
The TypeScript SDK is just as ergonomic and is the right pick if your team already runs Node services. Method registration replaces the decorator style, but the shape is similar.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import fetch from "node-fetch";

const server = new McpServer({ name: "drt-scraping-server", version: "1.0.0" });

server.tool(
  "fetch_url",
  {
    url: z.string().url(),
    timeout_s: z.number().int().default(30),
  },
  async ({ url, timeout_s }) => {
    // Abort the request if it exceeds the caller's timeout.
    const ctrl = new AbortController();
    const t = setTimeout(() => ctrl.abort(), timeout_s * 1000);
    try {
      const r = await fetch(url, { signal: ctrl.signal });
      const body = await r.text();
      return {
        content: [
          { type: "text", text: JSON.stringify({ status: r.status, body }) },
        ],
      };
    } finally {
      clearTimeout(t);
    }
  }
);

await server.connect(new StdioServerTransport());
The Python and TypeScript SDKs interoperate cleanly because both speak the same wire protocol. Pick the one your team will maintain.
Choosing a transport
MCP supports stdio, HTTP with Server-Sent Events (SSE), and the newer streamable HTTP transport, which was introduced in the 2025-03-26 spec revision and carried forward in 2025-06-18.
| Transport | When to use | Auth model |
|---|---|---|
| stdio | Local trust, single user, fastest | OS process boundary |
| HTTP + SSE | Multi-user remote, legacy clients | Bearer token, OAuth 2.1 |
| Streamable HTTP | Multi-user remote, modern spec | Bearer token, OAuth 2.1 |
For a scraping server that runs on your laptop and is only called by your own Claude Desktop, stdio is the right answer. For a server that other team members or production agents call, run streamable HTTP behind an auth gateway.
A minimal HTTP-mode launch:
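# In the official SDK, host and port are FastMCP settings, not run() arguments:
# mcp = FastMCP("drt-scraping-server", host="0.0.0.0", port=8765)
if __name__ == "__main__":
    mcp.run(transport="streamable-http")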
When to choose streamable HTTP over SSE
The 2025-03-26 spec revision introduced streamable HTTP as the preferred transport because SSE has two known issues at scale. SSE connections are one-way (server to client), so the client has to open a separate POST channel for messages, which doubles the connection count under load. And SSE does not survive a load balancer that aggressively closes idle connections, which is the default for most cloud LBs.
Streamable HTTP folds the message channel and the event channel into a single endpoint: the client POSTs each message, and the server either answers directly or upgrades that response to a stream. It also tolerates short network blips by allowing the client to reconnect with a session ID. If your client supports it, use it.
Adding proxy rotation
The single feature that separates a toy scraping MCP from a useful one is automatic proxy rotation. Bake it into the server, do not push it to the LLM.
import os
import random

PROXIES = [p.strip() for p in os.environ.get("PROXY_POOL", "").split(",") if p.strip()]


def pick_proxy() -> Optional[str]:
    if not PROXIES:
        return None
    return random.choice(PROXIES)


@mcp.tool()
async def fetch_url_pooled(url: str, timeout_s: int = 30) -> FetchResult:
    """Fetch a URL through the server's managed proxy pool. Always prefer
    this over fetch_url for production scraping."""
    proxy = pick_proxy()
    return await fetch_url(url=url, timeout_s=timeout_s, proxy=proxy)
For ASEAN scraping, pair the pool with Singapore mobile proxy or other rotating mobile providers so every call gets a fresh real-carrier IP.
Per-domain stickiness
Random rotation breaks cart and checkout flows. Add a per-domain sticky binding so the same domain reuses the same exit IP for the duration of a session.
from urllib.parse import urlparse

# Maps (session_id, domain) to the proxy bound to that pair.
_session_proxies: dict[tuple[str, str], str] = {}


def pick_proxy_for(session_id: str, url: str) -> Optional[str]:
    if not PROXIES:
        return None
    domain = urlparse(url).netloc
    key = (session_id, domain)
    if key not in _session_proxies:
        _session_proxies[key] = random.choice(PROXIES)
    return _session_proxies[key]
Pair this with a TTL so abandoned sessions release their proxies, and you have a clean implementation that survives real-world ecommerce flows.
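A TTL on the sticky binding could look like the sketch below. It is one possible shape, not a definitive implementation: StickyProxyStore, the 15-minute default, and the injectable clock parameter are all assumptions made for illustration and testability.

```python
import random
import time
from typing import Callable, Optional


class StickyProxyStore:
    """Bind (session_id, domain) to one proxy until the binding sits idle past a TTL."""

    def __init__(self, proxies: list, ttl_s: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._proxies = proxies
        self._ttl_s = ttl_s
        self._clock = clock
        # (session_id, domain) -> (proxy, last_used_timestamp)
        self._bindings: dict = {}

    def pick(self, session_id: str, domain: str) -> Optional[str]:
        if not self._proxies:
            return None
        now = self._clock()
        self._evict_expired(now)
        key = (session_id, domain)
        if key in self._bindings:
            proxy = self._bindings[key][0]
        else:
            proxy = random.choice(self._proxies)
        self._bindings[key] = (proxy, now)  # refresh last-used time on every call
        return proxy

    def _evict_expired(self, now: float) -> None:
        stale = [k for k, (_, last_used) in self._bindings.items()
                 if now - last_used > self._ttl_s]
        for k in stale:
            del self._bindings[k]
```

Eviction runs lazily on each pick, which avoids a background task. On a very large session count you may prefer a periodic sweep instead.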
Structured extraction as a tool
The most powerful pattern is to expose extraction as its own tool that takes a JSON Schema and returns structured data. This lets the LLM ask for exactly the shape it needs.
import json
from openai import AsyncOpenAI

client = AsyncOpenAI()


@mcp.tool()
async def extract_structured(
    html: str = Field(..., description="HTML to extract from"),
    schema: dict = Field(..., description="JSON Schema for the desired output"),
    instructions: str = Field("", description="Optional extraction guidance"),
) -> dict:
    """Extract structured data from HTML using an LLM with a JSON Schema."""
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={
            "type": "json_schema",
            "json_schema": {"name": "extraction", "schema": schema, "strict": True},
        },
        messages=[
            {"role": "system", "content": "Extract data from HTML matching the schema. " + instructions},
            {"role": "user", "content": html[:200000]},  # cap input size
        ],
    )
    return json.loads(resp.choices[0].message.content)
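This tool composes well. The LLM client calls fetch_url, receives HTML, then calls extract_structured with a schema like {"type": "object", "properties": {"title": {"type": "string"}, "price": {"type": "number"}}, "required": ["title", "price"], "additionalProperties": false} and gets clean JSON back. Because the call sets strict to True, OpenAI rejects object schemas that omit additionalProperties: false or leave any property out of required.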
Caching extractions
The same HTML extracted with the same schema should not pay LLM cost twice. Hash the (html, schema, instructions) triple and cache the result in Redis with a 24-hour TTL.
import hashlib
import json
import os

import redis.asyncio as redis

r = redis.from_url(os.environ["REDIS_URL"])


async def extract_cached(html, schema, instructions):
    # Join with a separator so two different triples cannot hash the same string.
    raw = "\x1f".join([html, json.dumps(schema, sort_keys=True), instructions])
    key = "ext:" + hashlib.sha256(raw.encode()).hexdigest()
    cached = await r.get(key)
    if cached is not None:
        return json.loads(cached)
    out = await extract_structured(html, schema, instructions)
    await r.setex(key, 86400, json.dumps(out))  # 24-hour TTL
    return out
On a workflow that hits the same product detail pages every hour, this saves an order of magnitude on LLM cost.
Auth and rate limiting
For HTTP-mode servers, never run without auth. The minimal middleware:
import hmac
import os

from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse


class BearerAuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        scheme, _, token = request.headers.get("authorization", "").partition(" ")
        expected = os.environ["MCP_BEARER_TOKEN"]
        # Constant-time comparison avoids leaking the token through timing.
        if scheme.lower() != "bearer" or not hmac.compare_digest(token.encode(), expected.encode()):
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        return await call_next(request)
For rate limiting, wrap each tool with a per-user token bucket. The MCP spec gives you a session ID per client, which is the right key for buckets.
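A per-session token bucket can be sketched with the standard library alone. The class names, the default rate of 2 calls per second, and the burst capacity of 10 are illustrative assumptions; how you obtain the session ID depends on your SDK and transport.

```python
import time
from typing import Callable


class TokenBucket:
    """Refill at `rate_per_s` tokens per second, up to `capacity`."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class SessionRateLimiter:
    """One bucket per MCP session ID, created on first use."""

    def __init__(self, rate_per_s: float = 2.0, capacity: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate_per_s = rate_per_s
        self._capacity = capacity
        self._clock = clock
        self._buckets: dict = {}

    def allow(self, session_id: str) -> bool:
        bucket = self._buckets.get(session_id)
        if bucket is None:
            bucket = TokenBucket(self._rate_per_s, self._capacity, self._clock)
            self._buckets[session_id] = bucket
        return bucket.allow()
```

Call limiter.allow(session_id) at the top of each tool and return a clear "rate limited, retry later" error when it is False, so the LLM can back off instead of hammering the tool.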
Add structured logging on every tool call, with at least these fields: tool_name, params_hash, duration_ms, status, client_session_id, proxy_used. This is the data you need when debugging why an agent is misbehaving.
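One way to get a log line per call is a small decorator around each async tool. This is a sketch under assumptions: logged_tool and the logger name mcp.tools are made up for illustration, and client_session_id and proxy_used are omitted because they depend on how your server exposes request context.

```python
import functools
import hashlib
import json
import logging
import time

log = logging.getLogger("mcp.tools")


def logged_tool(func):
    """Wrap an async tool so every call emits one JSON log line."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        started = time.perf_counter()
        status = "ok"
        try:
            return await func(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Hash the parameters so logs stay small and free of page content.
            params = json.dumps({"args": args, "kwargs": kwargs},
                                sort_keys=True, default=str)
            log.info(json.dumps({
                "tool_name": func.__name__,
                "params_hash": hashlib.sha256(params.encode()).hexdigest()[:16],
                "duration_ms": round((time.perf_counter() - started) * 1000, 1),
                "status": status,
            }))

    return wrapper
```

Because functools.wraps preserves the wrapped function's signature for introspection, the decorator should sit below the tool-registration decorator so the server still sees the original parameters. Verify that against your SDK version.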
Moving to OAuth 2.1
Bearer tokens are fine for an internal team but break the moment you expose the server to other organizations or third-party agents. The 2025-06-18 spec adopts OAuth 2.1 with PKCE as the recommended auth flow. Run an OAuth provider in front (Auth0, Authentik, or self-hosted Hydra are all good fits), have clients exchange a code for an access token, and validate the JWT in your middleware.
The client SDKs handle the OAuth dance automatically when configured with an authServerUrl, so the developer experience does not get worse.
Comparing MCP-driven scraping to alternatives
| Pattern | Setup time | LLM portability | Multi-user | Best fit |
|---|---|---|---|---|
| Direct OpenAI function calls | 1 hour | OpenAI only | No | Single LLM, single agent |
| LangChain tools | 2 hours | LangChain only | No | Prototypes |
| MCP server | 4 hours | Any MCP client | Yes | Team or product use |
| Custom HTTP API | 1 day | All, with bespoke wrappers | Yes | Existing API surface |
| LangGraph custom node | 3 hours | LangGraph only | Partial | Stateful workflows |
| OpenAI Assistants tools | 1 hour | OpenAI Assistants | Limited | Hosted assistants |
MCP wins when you need the same scraping logic to be callable from Claude Desktop on one developer’s laptop and from a production LangGraph agent in your data pipeline. You write the server once.
For a deeper breakdown of where MCP fits in a 2026 data engineering stack, see MCP for data engineers in 2026.
Production deployment
Deploy as a small Docker image. Pin Python, pin Playwright Chromium, and run as a non-root user.
FROM mcr.microsoft.com/playwright/python:v1.49.0-jammy
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY server.py .
USER pwuser
EXPOSE 8765
CMD ["python", "server.py"]
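Run two instances behind a load balancer for redundancy. Streamable HTTP identifies each session with an Mcp-Session-Id header, so plain round-robin is safe only when the server keeps no per-session state in memory. If you hold warm browser contexts or sticky proxies per session, route on the session ID or move that state into a shared store.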
For observability, OpenTelemetry instrumentation with the Anthropic-published MCP semantic conventions is the path of least resistance. Span attributes: mcp.server.name, mcp.tool.name, mcp.session.id, mcp.transport.
Health checks and graceful shutdown
Add a /healthz endpoint that returns 200 only if the proxy pool has at least one live IP and Playwright can launch a browser. A simple TCP check on port 8765 is not enough because the server can accept connections while completely unable to do useful work.
@mcp.custom_route("/healthz", methods=["GET"])
async def healthz(request):
    # Fail fast if the proxy pool is empty.
    if not PROXIES:
        return JSONResponse({"ok": False, "reason": "no proxies"}, status_code=503)
    # Confirm Chromium can actually start before reporting healthy.
    try:
        async with async_playwright() as p:
            b = await p.chromium.launch(headless=True)
            await b.close()
    except Exception as e:
        return JSONResponse({"ok": False, "reason": str(e)}, status_code=503)
    return JSONResponse({"ok": True})
On shutdown, flush in-flight tool calls before exiting. Most orchestrators send SIGTERM, wait 30 seconds, then SIGKILL. Wire your shutdown handler to drain.
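Draining can be sketched as a counter of in-flight calls plus an event that fires when the counter reaches zero. InflightTracker and the 25-second default (chosen to stay under the 30-second SIGTERM window mentioned above) are illustrative assumptions, not SDK features.

```python
import asyncio


class InflightTracker:
    """Count running tool calls so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._count = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing running yet

    async def run(self, coro):
        """Await `coro` while counting it as in flight."""
        self._count += 1
        self._idle.clear()
        try:
            return await coro
        finally:
            self._count -= 1
            if self._count == 0:
                self._idle.set()

    async def drain(self, timeout_s: float = 25.0) -> bool:
        """Return True if all calls finished before the timeout."""
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=timeout_s)
            return True
        except asyncio.TimeoutError:
            return False
```

Create the tracker inside the running event loop, route each tool body through tracker.run(...), and have the SIGTERM handler stop accepting new calls before awaiting tracker.drain().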
Real benchmarks
A scraping MCP server with the five-tool surface described above, deployed on a 2vCPU 4GB Fargate task with a 50-IP rotating residential pool, hits the following numbers in production:
| Metric | Value |
|---|---|
| fetch_url_pooled p50 latency | 740 ms |
| fetch_url_pooled p99 latency | 4.8 s |
| screenshot p50 latency | 3.1 s |
| extract_structured p50 latency | 1.9 s |
| Concurrent sessions per task | 30 to 50 |
| Cost per 1000 page fetches | $0.18 (proxy) + $0.04 (compute) + LLM tokens |
| Memory per active session | 35 MB idle, 180 MB peak with browser |
| Cold start to first tool call | 4.2 s (Fargate) |
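LLM tokens for a typical extract-after-fetch workflow run $0.001 to $0.005 per page on GPT-4o-mini, depending on page size, which works out to $1 to $5 per 1000 pages. Added to the $0.22 per 1000 pages for proxy and compute, the all-in cost is roughly $1.20 to $5.20 per 1000 extracted pages, while fetch-only workloads stay at about $0.22.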
Failure mode benchmarks
Headline latency hides the failure tail. From 100,000 production calls in March 2026:
| Failure type | Rate | Mitigation |
|---|---|---|
| Proxy connection refused | 1.4% | Healthcheck + auto-evict bad IPs |
| 403 from target site | 2.1% | Rotate IP and retry, escalate to browser tool |
| Timeout (>30s) | 0.8% | Per-domain timeout tuning |
| Playwright browser crash | 0.3% | Recycle browser, retry once |
| LLM 429 rate limit | 0.6% | Token-bucket on extract calls |
| OOM (Chromium) | 0.05% | Cap pages per browser at 100 |
A retry layer with exponential backoff on transient failures pulls the overall success rate from 95 percent to over 99 percent without adding more than 3 percent latency overhead.
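A retry layer of this kind can be sketched as a small async helper. TransientError is a placeholder for whatever your server classifies as retryable, and the attempt count, delays, and jitter range are illustrative defaults rather than the values behind the numbers above.

```python
import asyncio
import random


class TransientError(Exception):
    """Placeholder for failures worth retrying (timeouts, 403s, proxy refusals)."""


async def with_retry(call, attempts: int = 3, base_delay_s: float = 0.5,
                     max_delay_s: float = 8.0, sleep=asyncio.sleep):
    """Run `call()` up to `attempts` times with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return await call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the failure
            # Delay doubles each attempt, capped at max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter spreads retries so clients do not retry in lockstep.
            await sleep(delay * random.uniform(0.5, 1.0))
```

Only wrap failures that are safe to repeat. Retrying a 403 is most useful when each attempt picks a different proxy, so pass a call that selects a fresh IP every time it runs.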
Pairing with agentic clients
The whole point of MCP is that any client can call your tools. The most common pairings in production:
- Claude Desktop, for human-driven exploratory scraping
- Claude Code or Cursor, for engineers who want scraping inline with their editor
- A LangGraph agent, for autonomous workflows
- An OpenAI Responses API agent, for OpenAI-native production stacks
For more on agentic LLM clients in scraping, see The agentic browser revolution: Claude, OpenAI Operator, Stagehand.
Common production gotchas
- Tool descriptions live in your code, but the LLM sees them at runtime. Changing a description without a client reconnect means the LLM is operating on stale info. Force clients to refresh on server version bump.
- Pydantic field defaults that are mutable (lists, dicts) get shared across calls. Use Field(default_factory=list) not Field([]).
- The MCP initialize handshake takes one round trip per client connect. For high-churn workloads, hold connections open longer rather than reconnecting per request.
- Streaming results with yield is supported but every client implements it differently. Test with each client you intend to support.
- The Playwright browser holds file handles for downloaded resources. On long-running servers, close pages explicitly or you will hit the OS file descriptor limit around 1024.
Frequently asked questions
Do I need to write my own MCP server, or are there existing scraping servers?
Both. The Anthropic MCP servers repo ships a Puppeteer reference server and a Brave Search server. They are useful baselines but lack proxy rotation, session management, and the structured extraction tool you almost always end up wanting. Fork or write your own.
Can MCP servers maintain browser session state across tool calls?
Yes. Hold a Playwright BrowserContext per MCP session in a dict keyed by session_id. Tear down on session end. The MCP SDK exposes session lifecycle hooks for exactly this.
What is the Cloudflare AI Gateway story for MCP?
Cloudflare added MCP gateway support in early 2026. You can put your MCP server behind a Cloudflare AI Gateway and get logging, caching, and rate limiting without writing any of it.
How do I version my MCP server?
The MCP spec includes a serverInfo.version field. Bump it on every release and emit a changelog. Clients can pin to a version range, but most simply read whichever version is exposed.
Is MCP overkill for a one-off scraping job?
Yes. Use a plain Python script. MCP pays off when the same scraping logic needs to be called from multiple agents, multiple developers, or multiple stacks.
How do I expose secret-bearing tools without leaking the secret to the LLM?
Keep the secret in the server environment and never include it in tool args or descriptions. The LLM sees only the tool name and the schema, so an authenticated_fetch tool can use a server-side API key that the LLM never learns.
Can one MCP server talk to another MCP server?
Yes. The Python SDK ships an MCP client. Build a meta-server that fans out to specialized backend MCPs (proxy server, browser server, extraction server). Composition is the long game for MCP architectures.
Is there a registry of public MCP servers I can borrow tools from?
The community is building one at mcphub.io and several others. As of mid-2026 most production teams still write their own because the public servers vary in maintenance quality.
If you are evaluating MCP for a new scraping initiative, start with the AI modern scraping category for guides on the major LLM clients and adjacent tools.