Web Scraping with Reqwest + Tokio in Rust: Async Patterns (2026)

Rust is not the obvious choice for web scraping, but if you need to push thousands of concurrent HTTP requests with predictable latency and near-zero overhead, reqwest running on a tokio async runtime is one of the fastest stacks available in 2026. This article walks through the patterns that actually work in production, not the toy examples from the crate docs.

Why Rust for Scraping in 2026

The honest answer is that most scraping workloads don’t need Rust. If you’re hitting a hundred URLs a day, use Python. But once you’re running fleet-scale crawls, the cost of GC pauses, thread overhead, and memory bloat adds up fast. Rust’s async model compiles down to a state machine with no runtime cost, and reqwest wraps hyper under the hood, which is the same HTTP engine behind many production reverse proxies.

Compared to JVM-based stacks, the startup time is negligible. If you’ve looked at Scala web scraping with Sttp + Jsoup, you’ll know the JVM holds its own on long-running jobs once the JIT warms up, but for short-lived Lambda-style scrapers, Rust binaries consistently win on cold start by 300-800ms.

Setting Up Reqwest + Tokio

Add these to Cargo.toml:

[dependencies]
reqwest = { version = "0.12", features = ["json", "cookies", "gzip"] }
tokio = { version = "1", features = ["full"] }
futures = "0.3"
scraper = "0.19"
tower = { version = "0.4", features = ["limit", "retry"] }

The cookies feature is non-obvious but important: without cookie handling, session-based sites drop you after the first request. Note that the feature only exposes the API; you still opt in per client with cookie_store(true), as shown below. gzip cuts transfer size by 60-80% on most HTML pages, and futures supplies the stream combinators used in the concurrency section.
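A minimal sketch of opting in (the ? operator assumes an enclosing function that returns a Result):

use reqwest::Client;

let client = Client::builder()
    .cookie_store(true) // keep and resend session cookies across requests
    .build()?;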

A minimal async scraper looks like this:

use reqwest::Client;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One Client for the whole run: it owns the connection pool and is cheap to clone.
    let client = Client::builder()
        .user_agent("Mozilla/5.0 (compatible; DataBot/1.0)")
        .timeout(Duration::from_secs(10))
        .pool_max_idle_per_host(20)
        .build()?;

    let urls: Vec<&str> = vec![
        "https://example.com/page/1",
        "https://example.com/page/2",
    ];

    // Spawn one task per URL; each task captures its own clone of the Client.
    let handles: Vec<_> = urls.into_iter().map(|url| {
        let c = client.clone();
        tokio::spawn(async move {
            c.get(url).send().await
        })
    }).collect();

    for handle in handles {
        let resp = handle.await??;
        println!("{} -> {}", resp.url(), resp.status());
    }

    Ok(())
}

Client is cheap to clone because it wraps an Arc internally. Spawning one task per URL with tokio::spawn gives you true concurrency, but note that nothing here bounds it: pool_max_idle_per_host caps idle connections, not in-flight requests, which is why the next section adds an explicit limit.

Concurrency Patterns That Don’t Blow Up

Naively spawning 10,000 tasks at once will saturate your connection pool, trigger rate limiting, and OOM on large response bodies held in memory. Use futures::stream::iter with buffer_unordered to cap in-flight requests:

use futures::stream::{self, StreamExt};

let results = stream::iter(urls)
    .map(|url| {
        let client = client.clone();
        async move { client.get(url).send().await }
    })
    .buffer_unordered(50) // max 50 concurrent
    .collect::<Vec<_>>()
    .await;

50 concurrent requests is a reasonable default for scraping a single domain without triggering blocks. For multi-domain crawls, you want per-domain rate limiting with tower's RateLimit layer or a semaphore per host, as sketched below.
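A per-host semaphore sketch; HostLimiter and the permit count are illustrative names and choices, not a crate API:

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, OwnedSemaphorePermit, Semaphore};

struct HostLimiter {
    permits_per_host: usize,
    semaphores: Mutex<HashMap<String, Arc<Semaphore>>>,
}

impl HostLimiter {
    fn new(permits_per_host: usize) -> Self {
        Self { permits_per_host, semaphores: Mutex::new(HashMap::new()) }
    }

    // Lazily create one semaphore per host and take a permit from it.
    async fn acquire(&self, host: &str) -> OwnedSemaphorePermit {
        let sem = {
            let mut map = self.semaphores.lock().await;
            map.entry(host.to_string())
                .or_insert_with(|| Arc::new(Semaphore::new(self.permits_per_host)))
                .clone()
        };
        sem.acquire_owned().await.expect("semaphore never closed")
    }
}

Hold the returned permit across send().await so the slot frees only once the response (or error) comes back.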

Key patterns to implement:

  • Retry with exponential backoff on 429 and 503 responses, not on 404s (see the sketch after this list)
  • Per-host connection pooling via pool_max_idle_per_host to avoid SYN floods
  • Timeout at two levels: connection timeout (3s) and total request timeout (10-15s)
  • Circuit breaker per domain that pauses crawling when the error rate spikes; tower::limit::ConcurrencyLimit caps in-flight requests, but reacting to error rates takes custom logic on top
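A minimal backoff sketch for the first bullet; get_with_backoff and the 500ms starting delay are our own choices, not a library API:

use reqwest::{Client, Response, StatusCode};
use tokio::time::{sleep, Duration};

async fn get_with_backoff(
    client: &Client,
    url: &str,
    max_retries: u32,
) -> Result<Response, reqwest::Error> {
    let mut delay = Duration::from_millis(500);
    for _ in 0..max_retries {
        let resp = client.get(url).send().await?;
        match resp.status() {
            // Retry only on 429/503; a 404 is a real answer, not a transient failure.
            StatusCode::TOO_MANY_REQUESTS | StatusCode::SERVICE_UNAVAILABLE => {
                sleep(delay).await;
                delay *= 2; // 500ms, 1s, 2s, ...
            }
            _ => return Ok(resp),
        }
    }
    // Out of retries: return whatever the final attempt gives us.
    client.get(url).send().await
}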

For architecture guidance that scales beyond a single crawler process, the web scraping architecture patterns article covers queue-based designs and worker coordination that apply directly to Rust scrapers.

Comparing Rust Against Other Async Scraping Stacks

| Stack | Throughput (req/s, single core) | Memory per 1k tasks | Cold start | Maturity for scraping |
| --- | --- | --- | --- | --- |
| Rust + reqwest/tokio | ~9,000 | ~12 MB | <10ms | Medium |
| Go + colly | ~7,000 | ~18 MB | <15ms | High |
| Node.js + Axios | ~2,500 | ~85 MB | ~80ms | Very high |
| Bun + fetch | ~3,800 | ~55 MB | ~30ms | Medium |
| Python + httpx/asyncio | ~1,200 | ~110 MB | ~120ms | Very high |

Bun has closed the gap on Node significantly. If you’ve read the Bun vs Deno vs Node.js scraping benchmarks, you’ll see Bun hits close to 3,800 req/s in optimistic conditions, but Rust still has roughly 2.4x the throughput at lower memory cost. The tradeoff is development speed: a Rust scraper takes 3-4x longer to write than the equivalent Bun scraper.

Go sits in a practical middle ground, with better ecosystem support for scraping-specific tasks. Rust wins on raw numbers, not on ecosystem breadth.

Handling Anti-Bot and Proxy Rotation

Reqwest exposes a Proxy builder for routing through rotating proxies:

let proxy = reqwest::Proxy::all("http://user:pass@proxy.example.com:8080")?;
let client = Client::builder()
    .proxy(proxy)
    .build()?;

For rotating residential proxies, note that reqwest binds the proxy when the Client is built, so a rotating pool means building one Client per proxy and picking among those clients per request, round-robin or at random. Wrap each request in a retry loop that switches to a different client on 407 responses or connection errors.
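A round-robin sketch under that constraint; ProxyPool and next_client are illustrative names, not from any crate:

use std::sync::atomic::{AtomicUsize, Ordering};
use reqwest::{Client, Proxy};

struct ProxyPool {
    clients: Vec<Client>,
    next: AtomicUsize,
}

impl ProxyPool {
    // Build one Client per proxy URL up front.
    fn new(proxy_urls: &[&str]) -> Result<Self, reqwest::Error> {
        let clients = proxy_urls
            .iter()
            .map(|url| Client::builder().proxy(Proxy::all(*url)?).build())
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { clients, next: AtomicUsize::new(0) })
    }

    // Hand out clients in order; call again to rotate after a failure.
    fn next_client(&self) -> &Client {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.clients.len();
        &self.clients[i]
    }
}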

TLS fingerprinting is the real challenge in 2026. reqwest uses rustls or native-tls and sends a TLS ClientHello that fingerprinting services can distinguish from Chrome. Options:

  1. Use reqwest with native-tls on macOS/Linux to match OS-level TLS more closely (sketched after this list)
  2. Route through a headless browser for JS-heavy targets (Playwright handles this, see the Cypress vs Playwright comparison for when to escalate to a browser)
  3. Accept the fingerprint mismatch for targets that don’t do JA3/JA4 checks
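For option 1, a minimal sketch, assuming you add reqwest's native-tls feature in Cargo.toml:

reqwest = { version = "0.12", features = ["json", "cookies", "gzip", "native-tls"] }

let client = Client::builder()
    .use_native_tls() // negotiate the handshake through the platform TLS library
    .build()?;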

Most B2B SaaS sites and e-commerce APIs don’t fingerprint at the TLS layer. Government and financial sites often do.

Parsing HTML with the Scraper Crate

scraper provides a CSS selector API similar to BeautifulSoup:

use scraper::{Html, Selector};

let body = resp.text().await?;
let document = Html::parse_document(&body);
let selector = Selector::parse("div.product-title a").unwrap();

for element in document.select(&selector) {
    println!("{}", element.inner_html());
}

Pre-compile selectors outside your loop. Selector::parse is not cheap and will show up in profiles if called per-document. For XPath requirements, there is no mature native option in Rust yet; you’ll either shell out to a Python subprocess or switch to Go’s goquery + htmlquery combination.
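To make the hoisting concrete, a sketch where the selector compiles once per crawl rather than once per page; extract_titles is an illustrative helper:

use scraper::{Html, Selector};

fn extract_titles(bodies: &[String]) -> Vec<String> {
    // Compiled once, outside the per-document loop.
    let title_sel = Selector::parse("div.product-title a").expect("valid selector");
    let mut titles = Vec::new();
    for body in bodies {
        let doc = Html::parse_document(body);
        for el in doc.select(&title_sel) {
            titles.push(el.inner_html());
        }
    }
    titles
}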

Bottom line

Use Rust with reqwest and tokio when throughput and memory efficiency are the actual constraints, not when you want to move fast. For most data teams, Python or Bun gets the job done at a fraction of the development cost. When you do go Rust, the patterns above (bounded concurrency, per-host rate limiting, and pre-compiled selectors) are where the real gains come from. DRT covers the full scraping stack from infrastructure to framework selection, so check the other guides if you’re still evaluating languages for a new project.
