Cheerio vs JSDom vs Linkedom for Node.js Scrapers (2026)

Picking the right HTML parser for your Node.js scraper is one of those decisions that looks trivial until you’re burning 4 GB of RAM on a 10,000-page crawl. Cheerio, JSDom, and Linkedom all let you query HTML with CSS selectors in Node.js, but they make very different tradeoffs around speed, correctness, and memory footprint — and choosing wrong will cost you.

What each library actually does

Cheerio is a thin jQuery-style API layered over a fast HTML parser. Cheerio 1.x uses the spec-compliant parse5 by default and can switch to the faster, more lenient htmlparser2 through its xml option. It does not execute JavaScript, it does not build a live browser DOM, and it does not care about CSS rendering. It just parses markup and gives you a traversal API. That’s it.

JSDom simulates a full browser environment in Node. It can run page JavaScript (only when you opt in with its runScripts option), fires events, maintains a living DOM, and implements a large chunk of the Web APIs. It’s what tools like Jest use to fake a browser in tests. For scraping, this power usually works against you.

Linkedom sits between the two. It implements enough of the DOM standard to run querySelector and basic DOM APIs, but it skips JavaScript execution and keeps memory usage low. Think of it as “the DOM spec, minus the browser.”

Speed and memory: the numbers that matter

Here’s a rough benchmark comparison across a 5 MB HTML document (a realistic news index page with ~2,000 nodes):

Library                  Parse time   Peak memory   JS execution
Cheerio (htmlparser2)    ~8 ms        ~18 MB        No
Cheerio (parse5 mode)    ~22 ms       ~28 MB        No
Linkedom                 ~14 ms       ~22 MB        No
JSDom                    ~120 ms      ~180 MB       Yes

JSDom is roughly 15x slower and 10x heavier than Cheerio with htmlparser2 for parse-only workloads. If your scraper doesn’t need JavaScript rendering, JSDom is almost never the right answer, and for JavaScript-heavy SPAs you should be reaching for Playwright or Puppeteer anyway. For a sense of what browser automation actually costs at scale, the Python-side comparison in Pyppeteer vs Playwright Python: Which to Use in 2026 is a useful reference.

When to use Cheerio

Cheerio is the default choice for static HTML scraping in Node.js. The API is familiar, the ecosystem is mature, and its parsers are lenient enough to handle the broken HTML you’ll find on real sites.

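import * as cheerio from 'cheerio';
import { fetch } from 'undici'; // Node 18+ also ships a global fetch

// Top-level await requires an ES module (.mjs or "type": "module").
const res = await fetch('https://example.com/products');
if (!res.ok) throw new Error(`HTTP ${res.status} for ${res.url}`);
const $ = cheerio.load(await res.text());

// Collect the trimmed text of every price inside a product card.
const prices = [];
$('.product-card .price').each((_, el) => {
  prices.push($(el).text().trim());
});

console.log(prices);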

Cheerio works well when:

  • you’re scraping static or server-rendered pages
  • you need to process thousands of documents per minute
  • your team already knows jQuery selectors
  • you’re running in a memory-constrained environment (cheap VPS, Lambda)

One caveat: htmlparser2 is forgiving to a fault. If you’ve opted into it for speed and are getting wrong results on sites with deeply malformed HTML, switch back to parse5, which is spec-compliant and handles edge cases htmlparser2 silently mishandles. In Cheerio 1.x parse5 is the default, so that simply means calling cheerio.load(html) without the xml option. If you’re exploring the broader parser landscape, Selectolax: The Fastest HTML Parser You’re Not Using in 2026 covers the Python equivalent with similar speed-vs-correctness tradeoffs.

When Linkedom makes sense

Linkedom’s value proposition is standard DOM APIs without the JSDom weight. If your scraping logic relies on native DOM calls rather than Cheerio’s jQuery-style wrappers, such as element.closest(), MutationObserver stubs, or document.createElement for re-serialization, Linkedom handles these without loading a full browser runtime.

It’s also a clean fit for Worker Threads workloads. Each parseHTML() call returns its own self-contained window and document, and Linkedom is light enough to load in every worker, so you can parse documents in parallel across Node.js worker threads without sharing DOM state between them.

A step-by-step setup for a Linkedom worker pipeline:

  1. spawn N worker threads (one per CPU core)
  2. pass raw HTML strings via workerData
  3. parse with parseHTML(html) inside each worker
  4. return structured objects (not DOM nodes) back to main thread
  5. aggregate results in main thread

This pattern keeps per-worker memory predictable and avoids the GC pressure that builds up when many heavyweight JSDom documents are live at once. If you’re running this on Bun instead of Node, the performance gap widens further; Web Scraping with Bun: Faster Than Node.js for Scrapers in 2026? benchmarks the runtime difference directly.

When JSDom is actually justified

JSDom earns its place in two specific scenarios:

  • test environments where you need window, document, and event simulation for code that runs in both browser and Node (Jest’s jsdom test environment, jest-environment-jsdom, is exactly this)
  • scraping targets that do light client-side rendering via inline