If you’re parsing HTML in Rust and you haven’t chosen between html5ever and lol-html yet, you’re likely leaving significant performance on the table. Both crates handle HTML parsing, but they solve fundamentally different problems — and picking the wrong one for your scraper or pipeline can mean the difference between 50MB/s throughput and 500MB/s.
What Each Parser Actually Does
html5ever is a full spec-compliant HTML5 parser. It builds a complete DOM tree (through a TreeSink implementation such as the one in the scraper crate), handles malformed markup the way browsers do, and was developed as part of Servo, the browser engine that originated at Mozilla. If you need to traverse nodes, modify the tree, or run CSS selectors against a parsed document, html5ever gives you that full graph structure.
lol-html (Low Output Latency streaming HTML rewriter) is a streaming, chunk-based rewriter built by Cloudflare. It never builds a DOM: it processes HTML as a byte stream and lets you attach handlers to CSS selectors that fire as matching elements pass through. Memory usage stays nearly constant regardless of document size.
Both are production-grade in 2026. The question is architecture, not maturity.
Performance: Numbers That Matter
lol-html is faster for single-pass extraction tasks, often by 5x to 10x, because it never allocates a full tree on the heap. Cloudflare uses it to rewrite HTML at the edge on billions of requests per day. html5ever has lower throughput but gives you a queryable structure you can traverse multiple times without re-parsing. Treat the figures below as rough, workload-dependent estimates rather than benchmark results.
| Metric | html5ever | lol-html |
|---|---|---|
| Throughput (clean HTML) | ~80-120 MB/s | ~400-600 MB/s |
| Memory model | Full DOM tree | Streaming, O(1) |
| Multi-pass queries | Yes | No (one pass) |
| Spec compliance | Full HTML5 | Partial (no full DOM) |
| Malformed HTML handling | Browser-grade | Tolerant, pass-through |
| CSS selector support | Via selectors crate | Built-in via element! macro |
For a 10MB HTML document: html5ever allocates roughly 40-80MB of heap to build the tree. lol-html stays under 5MB regardless.
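Numbers like these shift with input and hardware, so it is worth measuring on your own pages. Below is a minimal sketch of a throughput harness using only the standard library. The name `throughput_mb_per_s` and its closure-based interface are illustrative assumptions, not part of either crate, and a serious comparison would use a benchmarking crate with warm-up and statistics.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `parse` over `input` `runs` times and returns throughput in MB/s.
/// `parse` is any closure wrapping the parser under test (hypothetical interface).
fn throughput_mb_per_s<F: FnMut(&[u8])>(input: &[u8], runs: u32, mut parse: F) -> f64 {
    let start = Instant::now();
    for _ in 0..runs {
        // black_box keeps the optimizer from discarding the call.
        parse(black_box(input));
    }
    // Guard against a zero reading on very fast runs.
    let secs = start.elapsed().as_secs_f64().max(f64::EPSILON);
    let megabytes = (input.len() as f64 * f64::from(runs)) / (1024.0 * 1024.0);
    megabytes / secs
}
```

Wrap each parser in its own closure (one that calls `Html::parse_document`, one that drives an `HtmlRewriter`) and compare the results on the same input.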
When to Use lol-html
lol-html wins when you have a clear extraction target and volume matters. Common cases:
- scraping thousands of product pages where you need title, price, and one or two attributes
- rewriting HTML in a proxy or middleware layer (the HTMLRewriter API in Cloudflare Workers is built on lol-html)
- streaming large HTML responses without buffering the full body
- pipelines where latency matters more than flexibility
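```rust
use lol_html::{element, HtmlRewriter, Settings};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sample input; in a scraper this would be the response body.
    let html_bytes = b"<h1 class=\"product-title\" data-name=\"Widget\">Widget</h1>";
    let mut output = vec![];

    let mut rewriter = HtmlRewriter::new(
        Settings {
            element_content_handlers: vec![
                // Fires when the start tag of a matching element is parsed.
                element!("h1.product-title", |el| {
                    let name = el.get_attribute("data-name").unwrap_or_default();
                    println!("Product: {}", name);
                    Ok(())
                }),
            ],
            ..Settings::default()
        },
        // Output sink: receives the (possibly rewritten) HTML as it is produced.
        |c: &[u8]| output.extend_from_slice(c),
    );

    rewriter.write(html_bytes)?;
    rewriter.end()?;
    Ok(())
}
```

The handler fires once per matching element as the stream passes through. You never hold the full document in memory.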
When to Use html5ever
html5ever wins when your extraction logic isn’t known upfront, when you need to traverse parent-child relationships, or when you’re building tooling rather than a point scraper. Specific cases:
- you need to find all tags of a given type inside a container that also contains a specific class: relationships that require tree context
- you’re building a general-purpose scraper that accepts arbitrary CSS selectors at runtime
- you need to reconstruct or modify the document and re-serialize it
- the HTML is severely malformed and spec-correct error recovery matters (think legacy CMS output)
The pillar guide HTML Parsing: Complete Guide to DOM, SAX, and Regex Approaches covers why tree-based parsing matters for complex document structures; html5ever’s full DOM is the Rust equivalent of that model.
html5ever is typically used through the scraper crate, which wraps it with a CSS selector API:
```rust
use scraper::{Html, Selector};

fn main() {
    // Sample input; in a scraper this would be the fetched page.
    let html_string = r#"<div class="product"><span class="price">$19.99</span></div>"#;
    let document = Html::parse_document(html_string);
    let selector = Selector::parse("div.product > span.price").unwrap();
    for element in document.select(&selector) {
        println!("{}", element.text().collect::<String>());
    }
}
```

The scraper crate is the practical entry point. You rarely use html5ever directly unless you’re implementing a custom TreeSink.
Ecosystem and Language Context
If you’re coming from Python, Selectolax: The Fastest HTML Parser You’re Not Using in 2026 covers a Python parser with a similar speed-first philosophy to lol-html (Selectolax still builds a tree, so the parallel is about throughput rather than streaming), which is worth reading if you run mixed-language pipelines. For Node.js, the tradeoffs in Cheerio vs JSDom vs Linkedom for Node.js Scrapers (2026) map loosely onto lol-html vs html5ever: Cheerio is your fast selector-based tool, JSDom is your full spec-compliant DOM.
Rust HTML parsing does have a gap: neither parser handles JavaScript rendering. If the pages you’re targeting require browser execution, you’re looking at headless tooling instead. The comparison in Pyppeteer vs Playwright Python: Which to Use in 2026 covers that layer, and it’s worth reading before assuming static parsing will be enough.
One niche worth calling out: if your scraper handles forms, sessions, or login flows in a stateless HTTP client, that logic sits above the parser layer entirely. Mechanicalsoup Library Review 2026: When Cookies + Forms Matter is Python-specific, but the session-management patterns it describes apply regardless of your parsing layer.
Practical Decision Checklist
- do you know exactly which elements you need before parsing? use lol-html
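- do you need selection that depends on content appearing later in the document, such as siblings or descendants of the match (e.g., “only rows that contain a sale badge”)? use html5ever via scraper; plain ancestor context such as “only links inside nav” is expressible in lol-html with a descendant selector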
- are you processing more than 1,000 pages per minute on a single thread? benchmark both, lol-html likely wins
- is your HTML severely broken or CMS-generated garbage? html5ever’s error recovery is more robust
- are you building a reusable library or CLI tool that accepts arbitrary selectors? html5ever
- are you running in a constrained memory environment (edge functions, embedded)? lol-html
Neither crate is “better”; they’re optimized for different architectures. The mistake most Rust scraper authors make is defaulting to scraper (html5ever) for everything because the API is familiar, then wondering why their pipeline saturates at 100 pages/second when lol-html could handle 10x that for the same extraction task.
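The checklist can be condensed into a small decision function. This is only a sketch of the reasoning: the `Workload` fields and the `choose_parser` name are illustrative, and borderline cases (such as the throughput question) still call for a benchmark.

```rust
/// Which crate the checklist points to.
#[derive(Debug, PartialEq)]
enum Parser {
    LolHtml,
    Html5everViaScraper,
}

/// Answers to the checklist questions (field names are illustrative).
struct Workload {
    selectors_known_upfront: bool,
    needs_tree_traversal: bool,
    severely_malformed_input: bool,
}

/// Any requirement that needs a full tree outweighs the throughput advantage.
fn choose_parser(w: &Workload) -> Parser {
    let needs_dom = w.needs_tree_traversal
        || w.severely_malformed_input
        || !w.selectors_known_upfront;
    if needs_dom {
        Parser::Html5everViaScraper
    } else {
        Parser::LolHtml
    }
}
```

A high-volume product scraper with fixed selectors and reasonably clean input lands on `Parser::LolHtml`; flipping any one of the three flags toward tree requirements moves it to `Parser::Html5everViaScraper`.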
Bottom Line
Use lol-html when you’re doing high-volume, single-pass extraction with a known selector set: it is faster, leaner, and production-proven at cloud scale. Use html5ever (via the scraper crate) when your queries are complex, relational, or determined at runtime. For most production scrapers, the right answer is lol-html for the hot path and html5ever for the edge cases. DRT covers both Rust and Python parsing tooling in depth. If you found this useful, the pillar guide on HTML parsing approaches is worth bookmarking for the broader context.
Related guides on dataresearchtools.com
- Selectolax: The Fastest HTML Parser You're Not Using in 2026
- Cheerio vs JSDom vs Linkedom for Node.js Scrapers (2026)
- Pyppeteer vs Playwright Python: Which to Use in 2026
- Mechanicalsoup Library Review 2026: When Cookies + Forms Matter
- Pillar: HTML Parsing: Complete Guide to DOM, SAX, and Regex Approaches