Running scrapers on Cloudflare Workers in 2026
Running scrapers on Cloudflare Workers is the kind of architectural decision that sounds clever in a Slack thread and either pays off massively or crashes against runtime limits within a week. Workers give you a globally-distributed serverless runtime with sub-50ms cold starts, free egress to anywhere, and tight integration with KV/R2/D1/Durable Objects for state. The catch is that each Worker invocation has a 30-second CPU limit (or 5 minutes on longer-duration plans), 128 MB of memory, no persistent disk, and an aggressive limit on subrequests. For some scraping workloads these limits are fine. For others they are fatal.
This guide covers what Workers can and cannot do for scraping in 2026, the cases where the architecture wins, the limits that bite, and a complete working scraper implementation including state management via Durable Objects, queue management, and the new Browser Rendering API for JS-heavy pages.
What Workers offer that other serverless does not
The Workers value proposition for scrapers:
- Sub-50ms cold start: each invocation feels instant, none of Lambda's multi-second cold starts
- Free egress: outbound HTTPS to anywhere is free, no per-GB egress charges
- 300+ POPs: requests originate from whichever POP serves the invocation, naturally distributing source IPs
- Built-in KV, R2, D1: persistent state without separate infrastructure
- Durable Objects: stateful coordination for queues, rate limits, locks
- Browser Rendering API: real Chromium rendering integrated with Workers in 2024
- Cron Triggers: scheduled execution without separate scheduler
For official Cloudflare Workers docs, see Cloudflare’s developer site.
What Workers cannot do for scrapers
The limits that catch teams:
- 30 sec CPU limit per invocation (Workers Standard) or 5 min (Workers Unbound, costs more)
- 128 MB memory (Standard) or 1 GB (Unbound)
- 50 subrequests (Standard) or 1000 (Unbound) per invocation
- No persistent disk: no SQLite files, no temp directories
- Limited Node.js compat: many npm packages do not work
- Outbound IP is Cloudflare’s: no proxy support (requires fetch-via-proxy patterns)
- No long-lived connections: WebSockets are supported but rarely useful for scraping
- Browser Rendering API is metered: $0.20 per browser-minute, not free
The CPU limit is the one that bites hardest. A scraper that fetches 100 pages and parses them all in one invocation will hit 30 seconds easily. The pattern that works: each Worker invocation does one small unit of work, persists state to KV/D1/a Durable Object, and either schedules the next invocation or returns.
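A minimal sketch of that unit-of-work shape, assuming a KV namespace bound as STATE that was seeded with a url_list elsewhere (both names, and the one-page-per-tick pacing, are illustrative):
// Each cron tick does one small unit of work and persists a cursor,
// so no single invocation ever approaches the CPU limit.
export default {
  async scheduled(event, env, ctx) {
    // Re-hydrate progress persisted by the previous invocation.
    const cursor = parseInt((await env.STATE.get("cursor")) || "0", 10);
    const urls = JSON.parse((await env.STATE.get("url_list")) || "[]");
    if (cursor >= urls.length) return; // queue drained

    // One unit of work: fetch and store a single page.
    const resp = await fetch(urls[cursor]);
    await env.STATE.put(
      `page:${cursor}`,
      JSON.stringify({ url: urls[cursor], status: resp.status, body: await resp.text() })
    );

    // Persist the cursor so the next tick resumes where this one stopped.
    await env.STATE.put("cursor", String(cursor + 1));
  },
};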
When Workers win for scraping
Workers shine when:
- High-volume, light-per-page scraping (1000s of pages, each <1 sec to fetch)
- Geo-distributed scraping (target wants to see traffic from many countries)
- Scheduled lightweight jobs (cron-triggered, completes in seconds)
- API endpoints that need a global edge layer wrapping internal scrapers
- Webhook receivers that trigger scraping flows
- Real-time price monitoring at scale
When Workers do not win:
- Heavy JavaScript-rendered pages (Browser Rendering API helps but adds latency)
- Sites with strict TLS fingerprinting (Worker fetch uses Cloudflare’s fingerprint)
- Pages requiring proxies (no native proxy support)
- Long-running scraping that needs minutes per page
- Stateful scraping with complex local state
The pattern that wins: small, frequent, distributed scraping. The pattern that loses: long, complex, single-machine.
Setup: a basic Worker scraper
Install Wrangler:
npm install -g wrangler
wrangler login
Create a Worker:
wrangler init my-scraper
cd my-scraper
The basic Worker that fetches a page:
// src/index.js
export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url).searchParams.get("url");
    if (!url) return new Response("Missing url parameter", { status: 400 });
    try {
      const resp = await fetch(url, {
        headers: {
          "User-Agent":
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
            "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
          "Accept-Language": "en-US,en;q=0.9",
        },
      });
      // Parse with HTMLRewriter (the runtime's built-in streaming HTML parser).
      // Text arrives in chunks, so accumulate until lastInTextNode fires.
      const titles = [];
      let buffer = "";
      const rewriter = new HTMLRewriter().on("h1", {
        text(text) {
          buffer += text.text;
          if (text.lastInTextNode) {
            if (buffer.trim()) titles.push(buffer.trim());
            buffer = "";
          }
        },
      });
      // Drain the transformed stream so the handlers actually run.
      await rewriter.transform(resp).text();
      return new Response(JSON.stringify({ url, titles }), {
        headers: { "Content-Type": "application/json" },
      });
    } catch (err) {
      return new Response(JSON.stringify({ error: err.message }), { status: 500 });
    }
  },
};
Deploy:
wrangler deploy
This Worker accepts a url query parameter, fetches it, extracts H1 titles, and returns JSON. Try it: curl "https://my-scraper.your-subdomain.workers.dev/?url=https://example.com" (quote the URL so your shell does not mangle the nested query string).
Storage layer: KV, R2, D1, Durable Objects
Each storage option has tradeoffs:
| storage | use case | latency | cost |
|---|---|---|---|
| KV | global k-v store, eventually consistent | <50ms read | $0.50/M reads |
| R2 | object storage (S3-like) | varies | $0.36/M operations + storage |
| D1 | SQL database (SQLite at edge) | <50ms | $0.001/M rows read |
| Durable Objects | strongly consistent state, stateful | <10ms | $0.15/M requests |
| Workers Cache | HTTP-style cache | <10ms | included |
For scraping state:
- Crawl queue: Durable Object (FIFO, strongly consistent)
- Visited URLs (dedup): KV (eventually consistent is fine; see the sketch after this list)
- Scraped data (records): D1 or R2
- Per-host rate limits: Durable Object (atomic counters)
- Page snapshots (HTML, screenshots): R2
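A sketch of the KV dedup check, assuming a KV namespace bound as VISITED (the key prefix and TTL are arbitrary choices). Eventual consistency means a URL can very occasionally be fetched twice from different POPs, an acceptable cost for dedup:
async function shouldFetch(url, env) {
  const key = `visited:${url}`;
  if (await env.VISITED.get(key)) return false; // already crawled
  // Record the visit with a TTL so entries expire and pages get recrawled.
  await env.VISITED.put(key, "1", { expirationTtl: 60 * 60 * 24 * 7 });
  return true;
}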
A Durable Object for queue management:
// src/queue.js
export class CrawlQueue {
  constructor(state, env) {
    this.state = state;
    this.env = env;
  }
  async fetch(request) {
    const url = new URL(request.url);
    const action = url.searchParams.get("action");
    if (action === "push") {
      const body = await request.json();
      const queue = (await this.state.storage.get("queue")) || [];
      queue.push(...body.urls);
      await this.state.storage.put("queue", queue);
      return new Response(JSON.stringify({ size: queue.length }));
    }
    if (action === "pop") {
      const queue = (await this.state.storage.get("queue")) || [];
      const item = queue.shift(); // undefined when the queue is empty
      await this.state.storage.put("queue", queue);
      return new Response(JSON.stringify({ url: item, remaining: queue.length }));
    }
    return new Response("unknown action", { status: 400 });
  }
}
Bind it in wrangler.toml:
[[durable_objects.bindings]]
name = "QUEUE"
class_name = "CrawlQueue"
[[migrations]]
tag = "v1"
new_classes = ["CrawlQueue"]
Use from a Worker:
const id = env.QUEUE.idFromName("global");
const stub = env.QUEUE.get(id);
const resp = await stub.fetch("http://queue/?action=pop");
const { url } = await resp.json();
The Durable Object guarantees that two Workers calling pop simultaneously get different items (no race condition).
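The same single-threaded guarantee is what makes Durable Objects work as the per-host rate limiters mentioned earlier. A sketch with one instance per hostname; the class name, 10-second window, and 5-request threshold are all illustrative:
export class RateLimiter {
  constructor(state, env) {
    this.state = state;
  }
  async fetch(request) {
    const now = Date.now();
    // Storage operations inside one DO are serialized, so this
    // check-then-increment is atomic across concurrent Workers.
    const windowStart = (await this.state.storage.get("windowStart")) || 0;
    let count = (await this.state.storage.get("count")) || 0;
    if (now - windowStart > 10_000) {
      await this.state.storage.put("windowStart", now); // new window
      count = 0;
    }
    if (count >= 5) return new Response("throttled", { status: 429 });
    await this.state.storage.put("count", count + 1);
    return new Response("ok");
  }
}
Route each target through env.LIMITER.get(env.LIMITER.idFromName(hostname)) so every host gets its own counter.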
Browser Rendering API
For JavaScript-heavy pages, Cloudflare’s Browser Rendering API gives you Chromium in a Worker. It launched in 2024 and stabilized through 2025.
// wrangler.toml binding:
// [browser]
// binding = "MYBROWSER"
import puppeteer from "@cloudflare/puppeteer";

export default {
  async fetch(request, env) {
    const url = new URL(request.url).searchParams.get("url");
    const browser = await puppeteer.launch(env.MYBROWSER);
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle0" });
    const html = await page.content();
    const screenshot = await page.screenshot();
    await browser.close();
    // Store the screenshot in R2 (SCREENSHOTS is an R2 bucket binding)
    await env.SCREENSHOTS.put(`${Date.now()}.png`, screenshot);
    return new Response(html, {
      headers: { "Content-Type": "text/html" },
    });
  },
};
Pricing: $0.20 per browser-minute. A 5-second page render costs ~$0.017, so 1,000 such pages cost about $17. Running your own browser pool is usually cheaper at high volume, but carries the operational overhead Workers avoid.
Stealth on Workers
Worker fetch uses Cloudflare’s network. The TLS fingerprint is whatever Cloudflare’s outbound HTTP client uses, which is distinctive. Targets that fingerprint TLS see “Cloudflare Worker” patterns.
Mitigations:
- For HTTP-only sites without TLS fingerprinting: standard fetch is fine
- For sites with light fingerprinting: Browser Rendering API uses real Chromium
- For heavy targets: Workers are not the right tool. Use them for orchestration and route the actual fetches through proxied scrapers elsewhere (sketched below)
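A sketch of that orchestration split. The endpoint, payload shape, and SCRAPER_TOKEN secret are placeholders for whatever external, proxied scraping service you run:
// The Worker never touches the target directly; it hands the URL to an
// external scraper fleet that owns the proxies and TLS fingerprints.
async function fetchViaExternalScraper(targetUrl, env) {
  const resp = await fetch("https://scraper-fleet.example.com/fetch", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${env.SCRAPER_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: targetUrl, render: false }),
  });
  return resp.json(); // { html, status, ... } -- whatever your service returns
}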
Cloudflare also offers Smart Placement, which runs a Worker at a POP near its upstream services instead of near the caller. For scraping, that can cut latency to the target, though it concentrates traffic at fewer POPs instead of distributing it.
Subrequest limits
Each Worker invocation can make up to 50 subrequests on Standard plan, 1000 on Unbound. A subrequest is any external fetch, KV operation, D1 query, etc.
For high-volume scraping, this means a single invocation can fetch at most 50 pages, and every KV read or D1 write counts against the same budget. Within one invocation, chunk parallel fetches to keep concurrency sane; anything beyond the cap must be chained into a separate invocation:
async function scrapeWithChunking(urls, env) {
  const CHUNK_SIZE = 30; // leave headroom under the 50-subrequest cap
  for (let i = 0; i < urls.length; i += CHUNK_SIZE) {
    const chunk = urls.slice(i, i + CHUNK_SIZE);
    // scrapeOne() is whatever per-URL fetch-and-parse routine you use.
    await Promise.all(chunk.map((url) => scrapeOne(url, env)));
  }
}
If the total work would blow the 30-second CPU budget, split it into separate Worker invocations triggered by Cron or by Durable Object scheduling.
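Durable Object Alarms are the cleanest self-scheduling mechanism for that chaining. A sketch that drains a queue one chunk per alarm tick, reusing the hypothetical scrapeOne from above:
export class ChunkedCrawler {
  constructor(state, env) {
    this.state = state;
    this.env = env;
  }
  // Seed the queue and arm the first alarm.
  async fetch(request) {
    const { urls } = await request.json();
    const queue = (await this.state.storage.get("queue")) || [];
    await this.state.storage.put("queue", queue.concat(urls));
    await this.state.storage.setAlarm(Date.now() + 1000);
    return new Response("scheduled");
  }
  // Each alarm tick is its own invocation with a fresh CPU budget.
  async alarm() {
    const queue = (await this.state.storage.get("queue")) || [];
    const chunk = queue.splice(0, 30);
    await this.state.storage.put("queue", queue);
    await Promise.all(chunk.map((url) => scrapeOne(url, this.env)));
    // Re-arm until the queue is drained.
    if (queue.length > 0) await this.state.storage.setAlarm(Date.now() + 1000);
  }
}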
Cron Triggers for scheduled scraping
Workers support scheduled execution via Cron Triggers in wrangler.toml:
[triggers]
crons = ["*/5 * * * *", "0 0 * * *"] # every 5 min and daily at midnight
Handler:
export default {
  async scheduled(event, env, ctx) {
    if (event.cron === "*/5 * * * *") {
      // Runs every 5 minutes
      ctx.waitUntil(scrapePriceUpdates(env));
    }
    if (event.cron === "0 0 * * *") {
      // Daily full crawl at midnight UTC
      ctx.waitUntil(fullCrawl(env));
    }
  },
};
ctx.waitUntil lets work continue after the handler returns, which is what makes fire-and-forget scrapers possible.
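The same mechanism in a fetch handler is what makes the webhook-receiver case from earlier work: acknowledge immediately, scrape in the background. handleWebhookScrape is a placeholder for your own pipeline:
export default {
  async fetch(request, env, ctx) {
    const payload = await request.json();
    // Return 202 right away so the webhook sender does not time out and retry;
    // waitUntil keeps the scrape alive after the response is sent.
    ctx.waitUntil(handleWebhookScrape(payload, env));
    return new Response("accepted", { status: 202 });
  },
};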
Comparison: Workers vs other serverless for scraping
| platform | cold start | max duration | egress cost | proxy support | browser support |
|---|---|---|---|---|---|
| Cloudflare Workers | <50ms | 30s (5min Unbound) | free | none native | Browser Rendering API |
| AWS Lambda | 100-3000ms | 15 min | $0.09/GB | yes | via container |
| Google Cloud Functions | 100-3000ms | 9 min | $0.12/GB | yes | via container |
| Azure Functions | 200-3000ms | 10 min | $0.087/GB | yes | via container |
| Vercel Functions | 100-1000ms | 60s (Pro) | $0.40/GB | limited | yes |
| Deno Deploy | <50ms | 50s | free | yes | no |
For free egress and global edge distribution, Workers and Deno Deploy lead. For long-running tasks and proxy support, Lambda or GCF win. For browser rendering, Workers’ Browser Rendering API or Lambda with custom container.
For broader serverless-scraping patterns in JS, see Deno scraping libraries 2026 reviewed.
Architecture: Workers + Durable Object + R2
A complete distributed scraper:
Cron Trigger (every 1 min)
         |
         v
  +----------------+
  |     Worker     |  reads queue, dispatches subrequests
  +----------------+
         |
    +----+-----+
    |          |
    v          v
Durable Object     Worker subrequests
(queue, state)     (parallel fetches)
                        |
                        v
                   KV (dedup)
                   D1 (data)
                   R2 (HTML, screenshots)
Each Cron tick: the Worker reads a batch of URLs from the Durable Object queue (around 40, leaving subrequest headroom), fetches them in parallel, parses, writes results to D1 and R2, and marks URLs as visited in KV. If the queue still has items, it schedules another invocation immediately.
This pattern handles tens of thousands of URLs per hour on Workers Standard, scales to millions on Unbound.
Cost analysis
A scraper doing 1 million page fetches per month:
| component | cost |
|---|---|
| Workers requests (1M) | $0.30 |
| Workers CPU time | varies, ~$5-20 |
| KV reads (visited check) | $0.50 |
| D1 writes (results) | $1-3 |
| R2 storage (HTML at 50 KB avg) | $0.75 |
| Durable Object requests | $0.15 |
| Browser Rendering (10% of pages, 5 sec each) | ~$1,667 |
| Total without browser | ~$10/month |
| Total with browser on 10% of pages | ~$1,680/month |
Without browser rendering, Workers are extremely cheap. With browser rendering, they are still competitive but no longer the absolute bargain. For pages that need browser rendering at scale, consider running Playwright on dedicated VMs and reserving Workers for HTTP-only flows.
Common pitfalls
- Cold storage state on first invocation: Durable Objects start empty; handle nil/empty cases
- HTMLRewriter is streaming, not DOM: cannot do complex selector queries
- No npm packages with native code: pure JS only
- Subrequest limit: the fetch that crosses the cap fails with "Too many subrequests", easy to miss if a broad catch swallows it
- CPU limit is CPU time, not wall clock: awaiting a fetch does not burn the budget, but parsing and processing do
- No persistent disk: cannot save SQLite files, must use D1 or KV
For the Cloudflare Workers limits reference, see Workers limits.
Operational checklist
For production scrapers on Workers in 2026:
- Wrangler 3+ for deploys
- Workers Unbound plan if you need >30s CPU or >50 subrequests
- Durable Objects for queues and rate limits
- KV for dedup and lightweight state
- D1 or R2 for results storage
- Cron Triggers for scheduling
- Monitor CPU time, subrequest count, error rate per Worker
- Use Workers Logpush to stream logs to R2 or external SIEM
- Reserve Browser Rendering API for pages that truly need JS
- For TLS-fingerprinted targets, route through external proxy infrastructure
- Test with wrangler dev --remote for true edge testing
FAQ
Q: can I use proxies with Workers?
Not natively. Workers fetch from Cloudflare’s network. To use proxies, you would need to route to a proxy service via fetch (a custom proxy that accepts HTTPS and forwards), which is awkward. For proxy-required scraping, run your scrapers elsewhere.
Q: how does Worker fetch’s TLS fingerprint look?
Distinct from any browser. Cloudflare’s outbound HTTP client has its own JA4. Targets that check TLS fingerprints can identify it. For TLS-sensitive targets, do not use Workers for the fetch.
Q: is Browser Rendering API a real Chrome?
Yes, it runs Chromium in Cloudflare's infrastructure, and your Worker drives it through the puppeteer-compatible @cloudflare/puppeteer API. It is stock headless Chromium with metered usage; any stealth hardening is on you.
Q: can I do long-running scraping on Workers?
Not in a single invocation. Use Durable Objects to break work into small chunks, with each Worker invocation handling one chunk. Coordinate via Cron or self-scheduled triggers.
Q: how do Workers compare to Vercel Edge Functions for scraping?
Workers have lower cold starts and free egress; Vercel's Edge Functions are similar in shape but bill usage, egress included. For raw scraping cost at scale, Workers are cheaper. For projects already on Vercel, Edge Functions integrate natively.
Common pitfalls in production Workers scraping
The first failure mode is the subrequest fan-out cliff. Workers Standard caps subrequests at 50 per invocation; Workers Unbound raises this to 1000. A scraper that processes a category page and queues 60 product fetches from a single Worker invocation hits the cap mid-batch: the fetches beyond the 50th fail with "Too many subrequests", and if each per-URL task swallows its own errors with a catch-all fallback, the failures never surface. Your error counter never increments, and you end up with 50 successful fetches and 10 silently dropped requests. The fix is two-part: first, always check response.ok and response.status after every fetch, and count caught rejections explicitly; second, batch outbound fetches into chunks of 40 (leaving headroom for retries) and queue overflow into the Durable Object for the next Cron tick.
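A sketch of that two-part fix, wired to the CrawlQueue Durable Object from earlier (same QUEUE binding and push action; the 40-URL batch size leaves retry headroom):
const SAFE_BATCH = 40;

async function fanOutSafely(urls, env) {
  const batch = urls.slice(0, SAFE_BATCH);
  const overflow = urls.slice(SAFE_BATCH);

  // Overflow goes back onto the queue for the next tick instead of
  // blowing the subrequest cap in this invocation.
  if (overflow.length) {
    const stub = env.QUEUE.get(env.QUEUE.idFromName("global"));
    await stub.fetch("http://queue/?action=push", {
      method: "POST",
      body: JSON.stringify({ urls: overflow }),
    });
  }

  return Promise.all(batch.map(async (url) => {
    try {
      const resp = await fetch(url);
      // HTTP errors do not throw -- check status explicitly.
      if (!resp.ok) return { url, error: `status ${resp.status}` };
      return { url, html: await resp.text() };
    } catch (err) {
      return { url, error: err.message }; // rejections counted, never dropped
    }
  }));
}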
The second pitfall is HTMLRewriter buffering on slow upstreams. HTMLRewriter is streaming, but if your handler does any async work inside an element callback, it pauses the stream. A handler that fetches a related URL on every <a href> element it encounters effectively serializes the entire HTML parse, turning a 200ms streaming parse into a 30+ second one (one fetch per anchor). The fix is to collect the URLs synchronously into an array during the parse, then dispatch fetches in parallel after the parse completes:
const collectedUrls = [];
const rewriter = new HTMLRewriter().on("a[href]", {
  element(el) {
    collectedUrls.push(el.getAttribute("href")); // sync only -- never await here
  },
});
await rewriter.transform(response).text();
// Now fan out in parallel, capped below the subrequest limit
const subResponses = await Promise.all(
  collectedUrls.slice(0, 40).map((u) => fetch(new URL(u, baseUrl)))
);
The third pitfall is Durable Object hibernation losing in-memory state. DOs hibernate after about 10 seconds of inactivity, and any state stored in instance variables (not in state.storage) is lost on wake. Scrapers that maintain a “currently processing” set in this.processing = new Set() find that the set is empty after every cron-triggered wake, leading to duplicate processing. Always persist coordination state via state.storage.put() and re-hydrate via state.storage.get() on every method entry. Hibernation is documented but easy to forget when your local wrangler dev keeps state in memory across reloads.
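A sketch of that re-hydrate-on-entry discipline; the class and storage keys are illustrative:
export class SafeCoordinator {
  constructor(state, env) {
    this.state = state;
    // No coordination state in instance fields -- it would vanish on hibernation.
  }
  async fetch(request) {
    // Re-hydrate from durable storage on every method entry.
    const processing = new Set((await this.state.storage.get("processing")) || []);
    const { url } = await request.json();
    if (processing.has(url)) return new Response("duplicate", { status: 409 });
    processing.add(url);
    // Persist before acknowledging so a hibernation cycle cannot lose the claim.
    await this.state.storage.put("processing", [...processing]);
    return new Response("claimed");
  }
}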
Real-world example: 10M-URL crawl for about $40 a month
A scraper team built a price-monitoring system that fetched 10 million product URLs per month across 200 ecommerce sites. The architecture used Workers Cron + Durable Objects + KV + D1 + R2:
// cron.js — runs every minute
export default {
  async scheduled(event, env, ctx) {
    const queueDO = env.QUEUE_DO.get(env.QUEUE_DO.idFromName("global"));
    const batch = await queueDO
      .fetch("https://internal/dequeue?count=40")
      .then((r) => r.json());
    if (!batch.urls.length) return;

    const results = await Promise.all(batch.urls.map(async (url) => {
      try {
        const r = await fetch(url, {
          headers: { "user-agent": "Mozilla/5.0 ..." },
          cf: { cacheTtl: 0 }, // never cache, always fetch fresh
        });
        if (!r.ok) return { url, error: `status ${r.status}` };
        const html = await r.text();
        const price = extractPrice(html); // sync HTMLRewriter parse, defined elsewhere
        return { url, price, fetched_at: Date.now() };
      } catch (e) {
        return { url, error: e.message };
      }
    }));

    // Write results to D1 in one batched insert
    const stmt = env.DB.prepare(
      "INSERT INTO prices (url, price, fetched_at) VALUES (?, ?, ?)"
    );
    await env.DB.batch(
      results
        .filter((r) => r.price)
        .map((r) => stmt.bind(r.url, r.price, r.fetched_at))
    );
  },
};
Monthly billing breakdown:
- Workers invocations: 43,200 cron ticks, each clearing several 40-URL dequeue batches (~230 fetches per tick, ~14M subrequests in total across fetches, KV, and D1) = $4.80
- Workers CPU time at avg 80ms per invocation = $11.50
- Durable Object requests = $1.20
- D1 writes (10M rows) = $15
- KV reads (dedup checks) = $5
- R2 storage (sampled raw HTML, 100GB) = $1.50
Total: $39 per month for 10M monitored URLs. The same workload on AWS Lambda + RDS would have cost approximately $850 per month, dominated by RDS instance fees and Lambda invocation costs. The lesson: Workers win decisively when the workload fits the platform's constraints (no proxies needed, sub-second per page, results structured enough for D1).
Detection: when Workers are the wrong choice
Five signals that your scraping workload should NOT live on Workers:
- Per-request proxy required: Workers cannot rotate through residential proxy pools efficiently. Move to Lambda or VMs.
- Long-running JS challenges: Cloudflare’s own Under Attack mode takes 5-10 seconds, and other JS challenges take similar time. Workers Unbound max is 5 minutes but CPU time is metered separately. Use Browserbase or VM-based Playwright.
- Heavy data parsing: Workers have no DOMParser, so large documents go through pure-JS DOM libraries; a 5MB HTML page parsed that way chews through the 30s Workers Standard CPU budget fast. HTMLRewriter handles streaming HTML but is awkward for non-trivial extraction.
- Stateful sessions across many requests: Cookies and login flows that span dozens of requests work better in long-lived processes.
- Custom TLS fingerprinting: Workers cannot impersonate Chrome’s TLS handshake. Targets that check JA4 see Cloudflare’s outbound fingerprint instead.
If any two of these apply, run your scraper on Fargate, EKS, or self-hosted infrastructure instead.
Wrapping up
Cloudflare Workers are a niche but powerful scraping platform when the workload fits: high volume, distributed, HTTP-light, with state in KV/D1/Durable Objects. Browser Rendering API closes the JS gap at the cost of metered pricing. Pair this with our scrapers vs Playwright integration patterns and Deno scraping libraries for the full serverless picture, and browse the dev-tools-projects category on DRT for related infrastructure deep-dives.