html5ever vs lol-html: Rust HTML Parsing Compared (2026)

If you’re parsing HTML in Rust and you haven’t chosen between html5ever and lol-html yet, you’re likely leaving significant performance on the table. Both crates handle HTML parsing, but they solve fundamentally different problems — and picking the wrong one for your scraper or pipeline can mean the difference between 50MB/s throughput and 500MB/s.

What Each Parser Actually Does

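html5ever is a spec-compliant HTML5 parser. it tokenizes and builds the tree the way browsers do, including their recovery rules for malformed markup, and it was developed as part of the Servo browser engine project (started at Mozilla). html5ever itself feeds parsed nodes into a TreeSink, and a crate such as scraper supplies the DOM and the CSS selector API on top of it. if you need to traverse nodes, modify the tree, or run CSS selectors against a parsed document, that html5ever-based stack gives you the full tree structure.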

lol-html (Low Output Latency streaming HTML rewriter/parser) is a streaming, chunk-based rewriter built by Cloudflare. it never builds a DOM. instead it processes HTML as a byte stream and lets you attach handlers to CSS selectors that fire as matching elements pass through. memory usage stays nearly constant regardless of document size.

Both are production-grade in 2026. the question is architecture, not maturity.

Performance: Numbers That Matter

lol-html is faster for single-pass extraction tasks, often by 5x to 10x, because it avoids heap allocation for the full tree. Cloudflare uses it to rewrite HTML at the edge on billions of requests per day. html5ever is slower on throughput but gives you a queryable structure you can traverse multiple times without re-parsing.

Metric	html5ever	lol-html
Throughput (clean HTML)	~80-120 MB/s	~400-600 MB/s
Memory model	Full DOM tree	Streaming, O(1)
Multi-pass queries	Yes	No (one pass)
Spec compliance	Full HTML5	Partial (no full DOM)
Malformed HTML handling	Browser-grade	Tolerant, pass-through
CSS selector support	Via selectors crate	Built-in via `element!` macro

For a 10MB HTML document: html5ever allocates roughly 40-80MB of heap to build the tree. lol-html stays under 5MB regardless.
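to turn that estimate into a capacity check, here is a back-of-envelope sketch. it assumes the 4x to 8x heap factor implied by the 10MB figure above scales linearly with document size, which is an assumption rather than a measured property; the function names are hypothetical.

```rust
/// Heap-to-input ratio implied by the article's "10MB -> 40-80MB" figure.
/// Assumption: the ratio scales linearly with document size.
const DOM_HEAP_FACTOR: (f64, f64) = (4.0, 8.0);

/// Estimated (best, worst) heap in MB for a DOM built from `doc_mb` of HTML.
fn dom_heap_range_mb(doc_mb: f64) -> (f64, f64) {
    (doc_mb * DOM_HEAP_FACTOR.0, doc_mb * DOM_HEAP_FACTOR.1)
}

/// How many documents of `doc_mb` fit in `budget_mb` if every tree
/// hits the worst-case factor.
fn max_concurrent_docs(budget_mb: f64, doc_mb: f64) -> u64 {
    let (_, worst) = dom_heap_range_mb(doc_mb);
    (budget_mb / worst).floor() as u64
}

fn main() {
    let (best, worst) = dom_heap_range_mb(10.0);
    println!("10MB document: {best}-{worst} MB of heap for the tree");
    println!(
        "512MB budget: {} concurrent 10MB documents (worst case)",
        max_concurrent_docs(512.0, 10.0)
    );
}
```

measure your own pages before relying on the factor; markup density changes the ratio considerably.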

When to Use lol-html

lol-html wins when you have a clear extraction target and volume matters. common cases:

scraping thousands of product pages where you need title, price, and one or two attributes
rewriting HTML in a proxy or middleware layer (the Cloudflare Workers HTMLRewriter API is built on lol-html)
streaming large HTML responses without buffering the full body
pipelines where latency matters more than flexibility

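use lol_html::{element, HtmlRewriter, Settings};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // sample input; in a real scraper this would be the response body
    let html_bytes: &[u8] = br#"<h1 class="product-title" data-name="Widget">Widget</h1>"#;

    // the rewritten output is collected here; this example only reads attributes
    let mut output = vec![];
    let mut rewriter = HtmlRewriter::new(
        Settings {
            element_content_handlers: vec![
                // runs for every <h1 class="product-title"> seen in the stream
                element!("h1.product-title", |el| {
                    let text = el.get_attribute("data-name").unwrap_or_default();
                    println!("Product: {}", text);
                    Ok(())
                }),
            ],
            ..Settings::default()
        },
        |c: &[u8]| output.extend_from_slice(c),
    );

    // write() can be called repeatedly, once per incoming chunk
    rewriter.write(html_bytes)?;
    rewriter.end()?;
    Ok(())
}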

the handler fires once per matching element as the stream passes through. when you feed the rewriter chunk by chunk, you never hold the full document in memory.
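the chunk-by-chunk feeding can be sketched with the standard library alone. the helper below is hypothetical (feed_in_chunks is not part of lol-html): it reads any std::io::Read source through one fixed-size buffer and hands each chunk to a sink closure. in a real pipeline the sink would forward the chunk to rewriter.write and map its error type, which is left out here as an assumption.

```rust
use std::io::{self, Read};

/// Reads `reader` through a single fixed-size buffer and passes each chunk
/// to `sink`. Peak buffer memory is `chunk_size` bytes no matter how large
/// the input is. `chunk_size` must be greater than zero.
fn feed_in_chunks<R: Read>(
    mut reader: R,
    chunk_size: usize,
    mut sink: impl FnMut(&[u8]) -> io::Result<()>,
) -> io::Result<usize> {
    let mut buf = vec![0u8; chunk_size];
    let mut total = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        sink(&buf[..n])?; // stand-in for forwarding to a streaming rewriter
        total += n;
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let body = b"<html><body><h1 class=\"product-title\">Widget</h1></body></html>";
    let mut chunks_seen = 0;
    let total = feed_in_chunks(&body[..], 16, |_chunk| {
        chunks_seen += 1;
        Ok(())
    })?;
    println!("fed {total} bytes in {chunks_seen} chunks");
    Ok(())
}
```

the same loop works for a file or a socket, since both implement Read.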

When to Use html5ever

html5ever wins when your extraction logic isn’t known upfront, when you need to traverse parent-child relationships, or when you’re building tooling rather than a point scraper. specific cases:

you need to find all tags of one kind inside a container element that also contains a specific class: relationships that require tree context
you’re building a general-purpose scraper that accepts arbitrary CSS selectors at runtime
you need to reconstruct or modify the document and re-serialize it
the HTML is severely malformed and spec-correct error recovery matters (think legacy CMS output)

the HTML Parsing: Complete Guide to DOM, SAX, and Regex Approaches covers why tree-based parsing matters for complex document structures — html5ever’s full DOM is the Rust equivalent of that model.

html5ever is typically used through the scraper crate, which wraps it with a CSS selector API:

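use scraper::{Html, Selector};

fn main() {
    // sample input; in a real scraper this would be the fetched page
    let html_string = r#"<div class="product"><span class="price">$19.99</span></div>"#;

    let document = Html::parse_document(html_string);
    // the selector is a fixed literal, so a parse failure is a programmer error
    let selector = Selector::parse("div.product > span.price").unwrap();

    // print the text content of every matching element
    for element in document.select(&selector) {
        println!("{}", element.text().collect::<String>());
    }
}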

the scraper crate is the practical entry point — you rarely use html5ever directly unless you’re implementing a custom TreeSink.

Ecosystem and Language Context

if you’re coming from Python, Selectolax: The Fastest HTML Parser You’re Not Using in 2026 covers a Python parser with a similar speed-first philosophy (Selectolax still builds a tree, so it is not a streaming rewriter like lol-html), and it is worth reading if you run mixed-language pipelines. for Node.js, the tradeoffs between Cheerio vs JSDom vs Linkedom for Node.js Scrapers (2026) map loosely to html5ever vs lol-html: Cheerio is your fast selector-based tool, JSDom is your full spec-compliant DOM.

Rust HTML parsing does have a gap: neither parser handles JavaScript rendering. if the pages you’re targeting require browser execution, you’re looking at headless tooling instead. the comparison in Pyppeteer vs Playwright Python: Which to Use in 2026 covers that layer, and it’s worth reading before assuming static parsing will be enough.

one niche worth calling out: if your scraper handles forms, sessions, or login flows in a stateless HTTP client, that logic sits above the parser layer entirely. Mechanicalsoup Library Review 2026: When Cookies + Forms Matter is Python-specific, but the session-management patterns it describes apply regardless of your parsing layer.

Practical Decision Checklist

do you know exactly which elements you need before parsing? use lol-html
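do you need selection that depends on content appearing elsewhere in the tree (e.g., “only links inside a section that also contains a given class”)? use html5ever via scraper. plain descendant selectors such as “nav a” also work in lol-html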
are you processing more than 1,000 pages per minute on a single thread? benchmark both, lol-html likely wins
is your HTML severely broken or CMS-generated garbage? html5ever’s error recovery is more robust
are you building a reusable library or CLI tool that accepts arbitrary selectors? html5ever
are you running in a constrained memory environment (edge functions, embedded)? lol-html
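the checklist can be written down as a small decision function. this is a sketch, not a rule: the field names are hypothetical, and the priority order (anything that needs a tree wins over throughput and memory concerns) is an assumption about how to break ties between the questions above.

```rust
#[derive(Debug, PartialEq)]
enum Parser {
    LolHtml,
    Html5everViaScraper,
}

/// One flag per checklist question.
struct Needs {
    selectors_known_upfront: bool,
    needs_tree_context: bool,
    severely_malformed_input: bool,
    runtime_selectors: bool,
    memory_constrained: bool,
}

fn choose_parser(n: &Needs) -> Parser {
    // Assumption: requirements a streaming pass cannot satisfy take priority.
    if n.needs_tree_context || n.runtime_selectors || n.severely_malformed_input {
        return Parser::Html5everViaScraper;
    }
    // Known targets or tight memory favor the streaming rewriter.
    if n.selectors_known_upfront || n.memory_constrained {
        return Parser::LolHtml;
    }
    // Nothing decided yet: default to the more flexible option.
    Parser::Html5everViaScraper
}

fn main() {
    let product_scraper = Needs {
        selectors_known_upfront: true,
        needs_tree_context: false,
        severely_malformed_input: false,
        runtime_selectors: false,
        memory_constrained: true,
    };
    println!("{:?}", choose_parser(&product_scraper));
}
```

the throughput question is deliberately left out of the function, since the checklist answer there is to benchmark both.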

neither crate is “better” — they’re optimized for different architectures. the mistake most Rust scraper authors make is defaulting to scraper (html5ever) for everything because the API is familiar, then wondering why their pipeline saturates at 100 pages/second when lol-html could handle 10x that for the same extraction task.

Bottom Line

use lol-html when you’re doing high-volume, single-pass extraction with a known selector set — it is faster, leaner, and production-proven at cloud scale. use html5ever (via the scraper crate) when your queries are complex, relational, or determined at runtime. for most production scrapers, the right answer is lol-html for the hot path and html5ever for the edge cases. DRT covers both Rust and Python parsing tooling in depth — if you found this useful, the pillar guide on HTML parsing approaches is worth bookmarking for the broader context.