Deno scraping libraries 2026 reviewed

Deno scraping libraries reached a stability tipping point when Deno 2.0 shipped in late 2024 with full npm compatibility, native package management without package.json, and stable JSR (JavaScript Registry) support. By 2026 the runtime is a credible third option alongside Node and Bun for JavaScript-based scraping, with a unique angle: permission-based sandboxing. A Deno scraper can be denied disk access, network access to specific hosts, or environment variables at runtime. For untrusted scraper code (third-party plugins, customer-supplied scripts), this is uniquely valuable.

This guide covers what Deno offers for scraping in 2026, the libraries that work best, the npm packages that work via Deno’s compatibility shim, and the production patterns that exploit Deno’s strengths. Code is TypeScript throughout. By the end you will know whether Deno fits your project and how to deploy it without the sharp edges.

Why Deno for scraping

Deno’s specific strengths for scrapers:

  • Permission system: granular runtime permissions for filesystem, network, env vars
  • TypeScript native: no transpile step, no tsconfig wrangling
  • Web Standards APIs: fetch, ReadableStream, WebCrypto are all Web API spec
  • JSR registry: a faster, more secure alternative to npm with better TypeScript support
  • Built-in formatter, linter, tester, bundler: no separate tools
  • Single binary: easy install, no node_modules
  • Deno Deploy: edge serverless that runs Deno natively, free egress

For Deno’s official documentation, see docs.deno.com.

Where Deno does not lead

  • Pure speed: Bun is faster for most workloads
  • npm compatibility: better than Bun for some edge cases, worse for others
  • Community size: smaller than Node, smaller than Bun in 2026
  • Production maturity: behind Node, comparable to Bun

For a project where speed is the deciding factor, Bun. For maximum compatibility, Node. For permission-sandboxed code, Deno.

Installing Deno

curl -fsSL https://deno.land/install.sh | sh
# or via brew
brew install deno

deno --version  # 2.0+ in 2026

A first scraper

// scrape.ts
import { DOMParser } from "jsr:@b-fuze/deno-dom";

async function scrape(url: string) {
  const resp = await fetch(url, {
    headers: {
      "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                   + "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    },
  });
  const html = await resp.text();
  const doc = new DOMParser().parseFromString(html, "text/html");

  if (!doc) {
    throw new Error("Failed to parse HTML");
  }

  const titles = Array.from(doc.querySelectorAll("h2.title"))
    .map((el) => el.textContent.trim());

  return titles;
}

const url = Deno.args[0];
if (!url) {
  console.error("Usage: deno run --allow-net scrape.ts <url>");
  Deno.exit(1);
}

const titles = await scrape(url);
console.log(JSON.stringify(titles, null, 2));

Run with explicit network permission:

deno run --allow-net=example.com scrape.ts https://example.com/products

The --allow-net=example.com restricts network access to only that host. Try to fetch any other URL and Deno blocks it. This is the security model: code runs only with the permissions you grant.
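
To see the model in action, here is a minimal sketch (a hypothetical blocked.ts): only example.com is granted, so the second fetch rejects. In an interactive terminal Deno prompts before denying; in CI or with --no-prompt it rejects outright:

// blocked.ts — run with: deno run --allow-net=example.com blocked.ts
const ok = await fetch("https://example.com");
console.log("example.com:", ok.status); // allowed by the grant

try {
  await fetch("https://example.org"); // not in the granted host list
} catch (err) {
  // Surfaces as a permission error, not a network error.
  console.error("blocked:", (err as Error).message);
}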

The permission model

Deno permissions for scrapers:

| permission | flag | use |
| --- | --- | --- |
| Network | --allow-net=host1,host2 | outbound fetch |
| Read FS | --allow-read=path | read files |
| Write FS | --allow-write=path | write files |
| Env vars | --allow-env=VAR1,VAR2 | read environment |
| Subprocesses | --allow-run | spawn external processes |
| FFI | --allow-ffi | native library calls |
| Workers | included by default | start Web Workers |
| All | --allow-all (or -A) | bypass all checks |

For a scraper, typical permissions:

deno run \
    --allow-net=target.example.com,api.example.com \
    --allow-read=./config \
    --allow-write=./output \
    --allow-env=API_KEY,PROXY_URL \
    src/main.ts

This is the discipline that makes Deno safer for running untrusted scraper modules: each module gets only what it needs.
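
Permission failures otherwise surface only when a code path runs, so it is worth verifying grants up front with the Deno.permissions API. A fail-fast sketch, with placeholder hostnames and env var names:

// Verify at startup that everything this scraper needs was granted.
const needed: Deno.PermissionDescriptor[] = [
  { name: "net", host: "target.example.com" },
  { name: "env", variable: "API_KEY" },
];

for (const desc of needed) {
  const status = await Deno.permissions.query(desc);
  if (status.state !== "granted") {
    console.error(`missing permission: ${JSON.stringify(desc)}`);
    Deno.exit(1);
  }
}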

Library survey

The major libraries and their 2026 state:

| library | purpose | source | maturity |
| --- | --- | --- | --- |
| deno-dom | HTML parsing, DOM API | jsr:@b-fuze/deno-dom | excellent |
| cheerio | HTML parsing, jQuery-style | npm:cheerio | excellent (via npm: specifier) |
| linkedom | HTML parsing, DOM API | npm:linkedom | excellent |
| puppeteer | browser automation | npm:puppeteer | good (Node compat) |
| playwright | browser automation | npm:playwright | partial (Node compat) |
| astral | Deno-native browser automation | jsr:@astral/astral | very good |
| got | HTTP client | npm:got | excellent (via npm:) |
| axios | HTTP client | npm:axios | excellent (via npm:) |
| Crawlee | crawler framework | npm:crawlee | excellent |
| p-queue | concurrency control | npm:p-queue | excellent |

JSR-published packages (jsr:@scope/name) are Deno-native and tend to have better TypeScript support. npm: packages work via the compat layer and cover most ecosystem libraries.
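
The two specifier styles coexist in a single module — a minimal illustration:

// JSR: Deno-native, types ship with the package
import { DOMParser } from "jsr:@b-fuze/deno-dom";
// npm: resolved through the Node compatibility layer
import * as cheerio from "npm:cheerio@1.0.0";

const doc = new DOMParser().parseFromString("<p>hi</p>", "text/html");
const $ = cheerio.load("<p>hi</p>");
console.log(doc?.querySelector("p")?.textContent, $("p").text());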

deno-dom for HTML parsing

deno-dom is the standard HTML parser for Deno. WASM-backed, fast, and exposes the browser DOM API:

import { DOMParser } from "jsr:@b-fuze/deno-dom";

const html = await fetch("https://example.com").then(r => r.text());
const doc = new DOMParser().parseFromString(html, "text/html");

// Standard DOM API
const title = doc?.querySelector("h1")?.textContent;
const links = Array.from(doc?.querySelectorAll("a[href]") || [])
  .map(a => a.getAttribute("href"));
const products = Array.from(doc?.querySelectorAll("article.product") || [])
  .map(p => ({
    title: p.querySelector("h2")?.textContent?.trim(),
    price: p.querySelector(".price")?.textContent?.trim(),
  }));

Performance is comparable to cheerio for typical HTML sizes, and the two stay roughly equal even on very large documents.
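
If you prefer the jQuery-style API instead, the npm build of cheerio runs unchanged under Deno. A sketch equivalent to the deno-dom extraction above:

import * as cheerio from "npm:cheerio@1.0.0";

const html = await fetch("https://example.com").then((r) => r.text());
const $ = cheerio.load(html);
const products = $("article.product")
  .map((_, el) => ({
    title: $(el).find("h2").text().trim(),
    price: $(el).find(".price").text().trim(),
  }))
  .get();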

Astral for browser automation

Astral is the Deno-native equivalent of Puppeteer. It runs Chromium with a TypeScript-first API:

import { launch } from "jsr:@astral/astral";

const browser = await launch();
const page = await browser.newPage("https://example.com/products");

// Wait for content to render
await page.waitForSelector("article.product");

// Extract via page evaluation
const products = await page.evaluate(() => {
  return Array.from(document.querySelectorAll("article.product")).map((el) => ({
    title: el.querySelector("h2")?.textContent?.trim(),
    price: el.querySelector(".price")?.textContent?.trim(),
  }));
});

await browser.close();
console.log(products);

Astral is lighter than Puppeteer and integrates better with Deno’s permission model. For full Playwright feature parity, use the npm:playwright package; for cleaner Deno-first integration, Astral.

Crawlee on Deno

Crawlee, originally a Node framework, runs on Deno via npm compat:

import { CheerioCrawler } from "npm:crawlee";

const crawler = new CheerioCrawler({
  async requestHandler({ request, $, enqueueLinks }) {
    const title = $("h1").text();
    console.log(`${request.url}: ${title}`);

    // Enqueue links from this page
    await enqueueLinks({
      selector: "a[href*='/product/']",
    });
  },
  maxRequestsPerCrawl: 100,
});

await crawler.run(["https://example.com/products"]);

Crawlee’s CheerioCrawler is for HTML scraping, PuppeteerCrawler and PlaywrightCrawler for browser-based. All three work on Deno.
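
The browser-based variants share the same handler shape. A sketch with PlaywrightCrawler — hedged per the survey table, since Playwright’s Node compat on Deno is partial, so verify it against your Deno version first:

import { PlaywrightCrawler } from "npm:crawlee";

const browserCrawler = new PlaywrightCrawler({
  async requestHandler({ page, request }) {
    // Real browser context: wait for client-side rendering to finish.
    await page.waitForSelector("article.product");
    const count = await page.locator("article.product").count();
    console.log(`${request.url}: ${count} products rendered`);
  },
  maxRequestsPerCrawl: 50,
});

await browserCrawler.run(["https://example.com/products"]);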

Stealth on Deno

Deno’s built-in fetch uses Hyper (Rust HTTP client) which has a distinct TLS fingerprint. For TLS-fingerprinted targets:

  1. Use curl-impersonate via subprocess: requires --allow-run (see the sketch after this list)
  2. Use Astral or Puppeteer for a full browser: heavier, but bypasses the TLS check entirely
  3. Use undici via npm: with a custom agent: limited stealth options
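
Option 1 is a thin subprocess wrapper. A minimal sketch, assuming a curl-impersonate Chrome wrapper binary (curl_chrome116 here — adjust to whatever your install provides) is on the PATH and the script runs with --allow-run:

// Route the request through curl-impersonate so the TLS fingerprint
// matches a real Chrome build instead of Deno's Hyper client.
async function impersonatedFetch(url: string): Promise<string> {
  const cmd = new Deno.Command("curl_chrome116", {
    args: ["-s", "--max-time", "15", url],
    stdout: "piped",
  });
  const { code, stdout } = await cmd.output();
  if (code !== 0) throw new Error(`curl-impersonate exited with code ${code}`);
  return new TextDecoder().decode(stdout);
}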

The cleanest path for stealth is Astral with Chromium because the TLS fingerprint then matches real Chrome:

import { launch } from "jsr:@astral/astral";

const browser = await launch({
  args: ["--disable-blink-features=AutomationControlled"],
});

const page = await browser.newPage();
await page.goto("https://target.example.com");
const html = await page.content();
await browser.close();

For broader fingerprinting context, see TLS fingerprinting in 2026.

Comparison: Deno vs Bun vs Node

| dimension | Deno 2 | Bun 1.1 | Node 20 |
| --- | --- | --- | --- |
| TypeScript native | yes | yes | no (transpile) |
| Web Standards APIs | full | most | partial |
| Permission system | yes | no | no |
| Built-in test runner | yes | yes | yes |
| Built-in fmt/lint | yes | yes | no |
| npm compat | very good | very good | native |
| JSR registry | yes | partial | no |
| Speed (typical scraping) | medium | fast | slow |
| Memory footprint | medium | small | large |
| Production maturity | good | good | excellent |

For new scraping projects where security and TypeScript ergonomics matter, Deno. For raw speed, Bun. For library compatibility above all, Node.

For Bun specifically, see scraping with Bun runtime: 2026 performance benchmarks.

Deno Deploy: edge serverless scraping

Deno Deploy is the serverless platform that runs Deno scripts at the edge. Similar to Cloudflare Workers but Deno-native:

// main.ts
Deno.serve(async (req) => {
  const url = new URL(req.url).searchParams.get("url");
  if (!url) return new Response("Missing url param", { status: 400 });

  try {
    const resp = await fetch(url);
    const html = await resp.text();

    // Extract titles
    const titles = [...html.matchAll(/<h2[^>]*>(.*?)<\/h2>/g)].map(m => m[1]);

    return Response.json({ url, titles });
  } catch (err) {
    // err is typed unknown in TypeScript; narrow before reading .message
    const message = err instanceof Error ? err.message : String(err);
    return Response.json({ error: message }, { status: 500 });
  }
});

Deploy:

deployctl deploy --project=my-scraper main.ts

Deno Deploy gives you global edge distribution, free egress, and negligible cold starts. For lightweight scraping APIs, it is competitive with Cloudflare Workers.

For serverless comparison, see running scrapers on Cloudflare Workers in 2026.

Production patterns

A production Deno scraper layout:

my-scraper/
├── src/
│   ├── main.ts
│   ├── fetch.ts
│   ├── parse.ts
│   └── store.ts
├── deno.json          # config + dependencies + tasks
├── deno.lock          # lockfile
└── Dockerfile

deno.json example:

{
  "tasks": {
    "dev": "deno run --watch --allow-net --allow-read --allow-write src/main.ts",
    "start": "deno run --allow-net --allow-read --allow-write src/main.ts",
    "test": "deno test --allow-net=test.example.com src/",
    "fmt": "deno fmt",
    "lint": "deno lint"
  },
  "imports": {
    "@b-fuze/deno-dom": "jsr:@b-fuze/deno-dom@^0.1.45",
    "cheerio": "npm:cheerio@^1.0.0",
    "p-queue": "npm:p-queue@^8.0.1"
  }
}

Run tasks via deno task dev, deno task test, etc.

Container deployment:

FROM denoland/deno:2.0

WORKDIR /app
COPY deno.json deno.lock ./
COPY src ./src
RUN deno cache src/main.ts

USER deno
EXPOSE 8000
CMD ["run", "--allow-net", "--allow-read", "src/main.ts"]

Pre-cache dependencies at build time so runtime is fast.

Long-running scraping

For continuous scrapers:

// src/main.ts
let shutdown = false;

Deno.addSignalListener("SIGINT", () => { shutdown = true; });
Deno.addSignalListener("SIGTERM", () => { shutdown = true; });

// getNextURL() pulls the next URL from your queue; scrapeOne() fetches and parses it.
async function main() {
  while (!shutdown) {
    const url = await getNextURL();
    if (!url) {
      await new Promise((r) => setTimeout(r, 5000));
      continue;
    }
    try {
      await scrapeOne(url);
    } catch (err) {
      console.error(`Error on ${url}:`, err);
    }
  }
  console.log("Shutting down");
}

await main();

Deno’s signal handling matches Node’s pattern, just with the Deno.addSignalListener API.

Concurrency: Workers and parallel scraping

Deno supports Web Workers natively:

// src/main.ts
const worker = new Worker(new URL("./scrape-worker.ts", import.meta.url).href, {
  type: "module",
  deno: {
    permissions: { net: ["target.example.com"] },
  },
});

worker.onmessage = (e) => console.log("Worker result:", e.data);
worker.postMessage({ url: "https://target.example.com/page1" });

Workers can have their own permission set, separate from the main script. This is unique to Deno among the JS runtimes.
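
The worker side is plain Web Worker code. A minimal sketch for the scrape-worker.ts referenced above, assuming it only fetches and reports a result back:

/// <reference lib="deno.worker" />
// src/scrape-worker.ts — runs with only the permissions granted at creation.
self.onmessage = async (e: MessageEvent<{ url: string }>) => {
  const resp = await fetch(e.data.url); // rejects if the host was not granted
  const html = await resp.text();
  // Swap in real parsing here; post back a trivial result for the demo.
  self.postMessage({ url: e.data.url, bytes: html.length });
};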

For multi-process parallelism (CPU-bound), spawn multiple Deno processes:

const procs = await Promise.all(
  Array.from({ length: 4 }, (_, i) =>
    new Deno.Command("deno", {
      args: ["run", "--allow-net", "src/scrape-worker.ts", String(i)],
    }).output()
  )
);
// Each entry exposes .code, .stdout, and .stderr (Uint8Array) for inspection.

Common pitfalls

  • npm: imports require explicit version: pin in deno.json or use exact version in import
  • CORS in Deno Deploy: edge functions enforce CORS; configure response headers
  • Permission errors at runtime: scripts crash if you forget to grant a permission. Test with --allow-all then narrow down.
  • node:fs is partial: not every Node fs method works in Deno’s compat shim (see the sketch after this list)
  • Process management: Deno spawns processes via Deno.Command, not Node’s child_process (though npm compat exposes it)
  • Bun-specific code does not run on Deno: Bun.write, bun:sqlite need rewrites
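
For the node:fs item above, the safer pattern is to prefer Deno’s native file APIs in your own code. Both calls below read the same (hypothetical) config file, but the second avoids the compat shim entirely:

// Via the Node compat shim — works for the common promise-based calls:
import { readFile } from "node:fs/promises";
const viaNode = await readFile("./config/settings.json", "utf8");

// Deno-native equivalent, no shim involved:
const viaDeno = await Deno.readTextFile("./config/settings.json");
console.log(viaNode === viaDeno); // true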

Operational checklist

For production Deno scrapers in 2026:

  • Deno 2.0+ on Linux for production
  • denoland/deno:2.0 base image for containers
  • JSR for Deno-native packages, npm: for the rest
  • Pre-cache dependencies at build time
  • Use granular permissions in production
  • Use Deno Deploy for edge serverless scraping
  • Consider Astral for Deno-native browser automation
  • Crawlee works for crawler frameworks
  • For TLS-sensitive targets, use Astral or curl-impersonate via subprocess
  • Bench against Bun if speed matters; Deno is usually mid-pack

When to choose Deno over Bun

The cases where Deno wins despite being slower:

  • You need permission-sandboxed code (multi-tenant, plugin architecture, untrusted modules)
  • You want a single runtime for scraping AND deployment to Deno Deploy
  • TypeScript-first ergonomics matter and Bun’s TS support has edge cases
  • JSR’s better TypeScript inference is meaningful for your codebase
  • You want maximum Web Standards conformance

For pure speed, Bun. For permission control or Deno Deploy fit, Deno.

FAQ

Q: how complete is npm compatibility in Deno 2 in 2026?
Very high. Most npm packages work via the npm: specifier, or need at most a minor adjustment. Native binding modules (sharp, sqlite3) are the most common holdouts. Pure JavaScript packages almost always work.

Q: should I rewrite my Node scrapers in Deno?
Only if you specifically value Deno’s permission model or want to deploy to Deno Deploy. For pure speed gain, switch to Bun instead. For better TypeScript ergonomics with Node compat, switch to TypeScript with tsx if you have not already.

Q: how does Deno Deploy compare to Cloudflare Workers?
Both are edge serverless with free egress. Workers have larger ecosystem (KV, R2, D1, Durable Objects, Browser Rendering API). Deno Deploy is leaner but integrates with Deno KV. For complex distributed scraping, Workers. For lightweight Deno-native APIs, Deno Deploy.

Q: what about Deno’s built-in KV store?
Deno KV is a built-in key-value store available locally and on Deno Deploy. For scraper state (visited URLs, simple results), it works well. Less feature-rich than Cloudflare KV but native to Deno.

Q: is Deno faster than Node for scraping?
Modestly, yes, for typical I/O patterns; for compute-heavy work the two are comparable. Bun is faster than both, so the usual ordering is Bun > Deno > Node, with gaps on the order of 20-50% depending on the workload.

Common pitfalls in production Deno scraping

The first failure mode is permission scope drift in Workers. When you spawn a Web Worker with deno: { permissions: { net: ["target.example.com"] } }, the worker can only fetch from that domain. If your scraper later needs to fetch from a CDN (target.cdn-cgi.com), the fetch rejects with a permission error at runtime, and in aggregated logs that is easy to misread as a network failure rather than a permissions issue. The fix is to pre-compute the set of all hostnames the worker might touch (including subdomains, CDNs, and analytics endpoints) and grant them all at worker creation, or use the blanket net: true in development and tighten in production:

const worker = new Worker(workerUrl, {
  type: "module",
  deno: {
    permissions: {
      // Deno net permissions match exact hosts, so list every hostname
      // the worker may touch, including each subdomain and CDN.
      net: [
        "target.example.com",
        "www.target.example.com",
        "cdn.target.com",
        "fonts.googleapis.com",  // common transitive
      ],
    },
  },
});

The second pitfall is the npm compat shim’s quirky behavior with packages that read package.json at runtime. Some npm packages (like axios’s adapter selection logic) introspect their own package.json to detect the runtime environment. Under Deno’s npm compat layer, the detection returns “Node” but the actual runtime is Deno, leading to subtle bugs where the package picks the wrong code path. The mitigation is to test each npm package end-to-end on Deno before relying on it in production, and to prefer JSR-native packages where possible. Common gotchas include: axios (use Deno’s fetch instead), winston (some transports do file ops that conflict with Deno’s permission model), and puppeteer (works but heavy; use Astral instead).
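
That end-to-end testing is cheap with Deno’s built-in test runner. A minimal per-dependency smoke-test sketch using @std/assert:

// npm_compat_test.ts — run with: deno test npm_compat_test.ts
import { assertEquals } from "jsr:@std/assert";
import * as cheerio from "npm:cheerio@1.0.0";

Deno.test("cheerio works under the npm compat layer", () => {
  const $ = cheerio.load("<h2 class='t'>hello</h2>");
  assertEquals($("h2.t").text(), "hello");
});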

The third pitfall is Deno KV consistency under high-concurrency writes. Deno KV uses optimistic concurrency control with versioned reads. If you have 50 workers all trying to update the same dedupe set with kv.set(["visited", url], true), most writes succeed but a fraction get versionstamp conflicts that you have to retry. The fix is atomic transactions with explicit conflict handling:

// Reuse one KV handle; opening it per call adds needless latency.
const kv = await Deno.openKv();

async function markVisited(url: string): Promise<boolean> {
  for (let attempt = 0; attempt < 3; attempt++) {
    const existing = await kv.get(["visited", url]);
    if (existing.value !== null) return false;  // already visited
    const result = await kv.atomic()
      .check({ key: ["visited", url], versionstamp: existing.versionstamp })
      .set(["visited", url], { ts: Date.now() })
      .commit();
    if (result.ok) return true;  // we won the race
    // Another worker changed the key between read and commit; retry.
  }
  return false;  // gave up after retries
}

Without the retry loop, ~5 percent of writes silently fail under 50-worker concurrency, leading to duplicate processing. With the retry loop, the duplicate rate drops to under 0.1 percent.

Real-world example: Deno Deploy edge scraper for 200 sites

A team built a price-comparison API on Deno Deploy that fetched live prices from 200 ecommerce sites. Each API request triggered fetches to 5-10 sites in parallel, parsed the HTML for current prices, and returned a normalized JSON response. The architecture used Deno KV for caching, Deno Deploy for global distribution, and JSR-native libraries for parsing:

// main.ts
import { DOMParser } from "@b-fuze/deno-dom";

const kv = await Deno.openKv();

async function fetchPrice(url: string): Promise<number | null> {
  // Check 5-min cache first
  const cached = await kv.get<{price: number, ts: number}>(["price", url]);
  if (cached.value && Date.now() - cached.value.ts < 5 * 60 * 1000) {
    return cached.value.price;
  }

  try {
    const resp = await fetch(url, {
      headers: {
        "user-agent": "Mozilla/5.0 (compatible; PriceComparator/1.0)",
        "accept": "text/html",
      },
      signal: AbortSignal.timeout(8000),
    });
    if (!resp.ok) return null;
    const html = await resp.text();
    const doc = new DOMParser().parseFromString(html, "text/html");
    const priceEl = doc?.querySelector('[itemprop="price"]') ||
                    doc?.querySelector('.price') ||
                    doc?.querySelector('[data-price]');
    const price = parseFloat(
      priceEl?.getAttribute("content") || priceEl?.textContent || ""
    );
    if (isNaN(price)) return null;
    await kv.set(["price", url], { price, ts: Date.now() }, { expireIn: 600_000 });
    return price;
  } catch {
    return null;
  }
}

Deno.serve(async (req) => {
  const url = new URL(req.url);
  const targets = url.searchParams.getAll("url");
  const prices = await Promise.all(targets.map(fetchPrice));
  return new Response(
    JSON.stringify(targets.map((u, i) => ({ url: u, price: prices[i] }))),
    { headers: { "content-type": "application/json" } },
  );
});

Performance: median response 240ms (5 parallel fetches with cache hits common), p95 980ms, p99 2.1s. Monthly Deno Deploy bill at 4 million API calls: $32 (well within the included tier). The same workload on AWS Lambda + DynamoDB would have run roughly $180/month, dominated by Lambda invocation cost and DynamoDB read/write capacity.

The lesson: for read-heavy edge scraping with simple parsing and a cacheable response, Deno Deploy is meaningfully cheaper than AWS-style serverless. The native Deno KV beats Lambda+DynamoDB on both latency and cost for this workload pattern.

Comparison: JSR vs npm imports for scraping libraries

A reference table of which libraries scrapers reach for and where they live in 2026:

| library | available on | recommendation |
| --- | --- | --- |
| @b-fuze/deno-dom | JSR | use for HTML parsing, Deno-native |
| cheerio | npm only | works via npm: import, slightly slower |
| axiod (axios for Deno) | JSR | discouraged, use built-in fetch |
| @astral/astral | JSR | use for browser automation, Deno-native |
| puppeteer | npm | works but heavy; prefer Astral |
| @hono/hono | JSR | excellent for APIs |
| crawlee | npm | works, Node compat is solid |
| zod | JSR | runtime validation, used heavily in scrapers |
| @std/cli | JSR | Deno standard library, CLI argument parsing |
| postgres | JSR + npm | both work, JSR version more current |

Prefer JSR for Deno-native libraries because they get type inference without DefinitelyTyped overhead and are pre-tested against Deno releases. Fall back to npm: imports for ecosystem libraries that have not migrated to JSR yet.

Wrapping up

Deno in 2026 is a credible JavaScript runtime for scraping with a unique permission model that matters in multi-tenant or plugin architectures. The library ecosystem is sufficient: deno-dom for parsing, Astral for browser automation, Crawlee via npm compat for crawler frameworks. For most teams the choice is between Deno’s safety and Deno Deploy fit versus Bun’s raw speed. Pair this with our scraping with Bun runtime and running scrapers on Cloudflare Workers writeups for the full JavaScript-runtime picture, and browse the dev-tools-projects category on DRT for related infrastructure deep-dives.
