Most Go scrapers start with Colly and hit a wall the moment JavaScript renders the data they need. chromedp is the answer: a pure-Go library that drives a real Chrome instance over the DevTools Protocol, giving you full JS execution, cookie handling, and DOM access without leaving the Go ecosystem. This guide covers chromedp in 2026: setup, real patterns, concurrency, and where it falls short.
## why chromedp instead of Playwright or Puppeteer
chromedp has one serious advantage over Node.js alternatives: you compile a single binary, ship it, and run it anywhere Chrome is installed. No npm, no node_modules, no runtime version mismatches. If you've been tracking the Bun vs Deno vs Node.js web scraping benchmarks, you know JS runtimes carry real overhead even before your scraper logic runs. Go's goroutine scheduler lets you spin up hundreds of concurrent browser tasks with a fraction of the memory a Node.js process would need.
The tradeoff is ecosystem depth. Playwright has better auto-wait logic, richer selectors, and a larger community. chromedp requires you to wire up waits manually and read DevTools Protocol docs when something breaks. For teams already running Go services, that's a reasonable cost. For teams starting from scratch, Playwright may ship faster.
| feature | chromedp | Playwright (Go binding) | Rod |
|---|---|---|---|
| pure Go | yes | partial (wraps Node) | yes |
| auto-wait | no | yes | partial |
| network intercept | yes | yes | yes |
| screenshot/PDF | yes | yes | yes |
| maintained (2026) | active | active | active |
| binary size | small | large (Node runtime) | small |
Rod is a close competitor worth knowing: it wraps the same DevTools Protocol but with a higher-level API. For most production scrapers, chromedp and Rod both work; pick based on your team's comfort with low-level control.
## installation and a working scraper in 30 lines
```bash
go get github.com/chromedp/chromedp
```

Chrome or Chromium must be installed on the host. On Linux servers, install `chromium-browser`; if the binary isn't on `PATH`, point chromedp at it with the `chromedp.ExecPath` allocator option. Here's a minimal scraper that loads a JS-heavy page and extracts a value:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	var price string
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com/product/42"),
		chromedp.WaitVisible(`[data-testid="price"]`, chromedp.ByQuery),
		chromedp.Text(`[data-testid="price"]`, &price, chromedp.ByQuery),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("price:", price)
}
```

A few things to note: `WaitVisible` blocks until the element is present and visible, which is what replaces Playwright's auto-wait. Always set a `context.WithTimeout` or a single slow page will hang your worker forever. The `chromedp.ByQuery` option uses CSS selectors; `chromedp.ByXPath` is available but slower.
## concurrency patterns that don't leak memory
The naive pattern, one `chromedp.NewContext` per goroutine, creates a new Chrome process per scrape. At scale, that's process soup. The correct approach is an `ExecAllocator` pool:
```go
opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.Flag("headless", true),
	chromedp.Flag("no-sandbox", true),
	chromedp.Flag("disable-gpu", true),
	chromedp.UserAgent("Mozilla/5.0 (X11; Linux x86_64) ..."),
)

allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
defer cancel()

// child contexts = tabs, not new Chrome processes
for _, url := range urls {
	tabCtx, tabCancel := chromedp.NewContext(allocCtx)
	// run the scrape against tabCtx ...
	_ = chromedp.Run(tabCtx, chromedp.Navigate(url))
	// ... then release the tab
	tabCancel()
}
```

Child contexts created from `allocCtx` reuse the same Chrome process as separate tabs, so memory stays bounded. A single Chrome process comfortably handles 8-16 concurrent tabs before you see GC pressure. For larger fleets, run multiple allocators, one per CPU core, and distribute URLs via a channel, as sketched below.
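A minimal sketch of that fan-out, reusing the `opts` slice from the block above and assuming a hypothetical `urls` slice (imports for `sync`, `runtime`, and `log` omitted):

```go
// fan-out: one allocator (one Chrome process) per CPU core,
// all workers draining a shared URL channel
urlCh := make(chan string)
var wg sync.WaitGroup
for i := 0; i < runtime.NumCPU(); i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
		defer cancel()
		for u := range urlCh {
			tabCtx, tabCancel := chromedp.NewContext(allocCtx)
			if err := chromedp.Run(tabCtx, chromedp.Navigate(u)); err != nil {
				log.Printf("scrape %s: %v", u, err)
			}
			tabCancel() // release the tab before the next URL
		}
	}()
}
for _, u := range urls {
	urlCh <- u
}
close(urlCh)
wg.Wait()
```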
Numbered steps for a production-safe setup:

1. create one `ExecAllocator` per worker process
2. limit concurrent tab contexts to 8-12 via a semaphore (`chan struct{}`)
3. call `tabCancel()` immediately after each scrape; leaked tab contexts accumulate as zombie Chrome tabs
4. set a per-tab timeout via `context.WithTimeout`, not a global one
5. log network events via `chromedp.ListenTarget` if you need request/response interception
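Steps 2-5 combine into one pattern. A sketch, assuming the `allocCtx` from the earlier allocator block, a hypothetical `urls` slice, and imports for `context`, `log`, `sync`, `time`, and `github.com/chromedp/cdproto/network`:

```go
sem := make(chan struct{}, 10) // step 2: cap concurrent tabs at 10
var wg sync.WaitGroup
for _, url := range urls {
	wg.Add(1)
	sem <- struct{}{} // acquire a tab slot before spawning
	go func(u string) {
		defer wg.Done()
		defer func() { <-sem }() // release the slot

		tabCtx, tabCancel := chromedp.NewContext(allocCtx)
		defer tabCancel() // step 3: never leak the tab

		// step 4: per-tab timeout, not a global one
		tabCtx, timeoutCancel := context.WithTimeout(tabCtx, 30*time.Second)
		defer timeoutCancel()

		// step 5 (optional): observe network events for this tab
		chromedp.ListenTarget(tabCtx, func(ev interface{}) {
			if e, ok := ev.(*network.EventResponseReceived); ok {
				log.Printf("%d %s", e.Response.Status, e.Response.URL)
			}
		})

		// network.Enable makes Chrome emit the events listened for above
		if err := chromedp.Run(tabCtx, network.Enable(), chromedp.Navigate(u)); err != nil {
			log.Printf("scrape %s: %v", u, err)
		}
	}(url)
}
wg.Wait()
```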
## handling anti-bot and proxy injection
chromedp exposes Chrome's `--proxy-server` flag directly through `ExecAllocator` options:
```go
chromedp.ProxyServer("http://proxy.example.com:8080"),
```

Note that Chrome ignores credentials embedded in the proxy URL; for authenticated proxies you must answer the auth challenge over DevTools (the `fetch` domain) or run a local forwarding proxy that adds the credentials. Rotate proxies at the allocator level (one proxy per allocator) rather than per tab; per-tab proxy injection requires DevTools network override calls and is more brittle. For sticky sessions (same IP for a login flow), assign one allocator per session and don't share.
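A sketch of allocator-level rotation, assuming a hypothetical `proxies` slice:

```go
// build the pool once: one allocator (one Chrome process) per proxy;
// every tab opened under an allocator inherits its --proxy-server flag
allocators := make([]context.Context, len(proxies))
for i, p := range proxies {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.ProxyServer(p),
	)
	allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel() // fine in a sketch; tie to pool shutdown in real code
	allocators[i] = allocCtx
}

// round-robin tabs across the pool
for i, u := range urls {
	tabCtx, tabCancel := chromedp.NewContext(allocators[i%len(allocators)])
	_ = chromedp.Run(tabCtx, chromedp.Navigate(u))
	tabCancel()
}
```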
Anti-bot bypass is where raw chromedp falls short. It ships no stealth patches, so headless detection via `navigator.webdriver` is trivial for Cloudflare, DataDome, and Akamai. Your options:
- overwrite `navigator.webdriver` before any page script runs, via the DevTools `Page.addScriptToEvaluateOnNewDocument` call (sketched below)
- port puppeteer-extra-plugin-stealth concepts manually (TLS fingerprint, canvas noise, etc.)
- use a managed browser API (Browserless, Bright Data Scraping Browser) and point chromedp at its remote endpoint via `chromedp.NewRemoteAllocator`
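A sketch of the first option. `chromedp.Evaluate` runs in the current document, so the override has to be registered through the DevTools Page domain to fire before the target's own scripts; the helper name is mine:

```go
import (
	"context"

	"github.com/chromedp/cdproto/page"
	"github.com/chromedp/chromedp"
)

// hideWebdriver registers a script that runs in every new document
// before the page's own JS, so navigator.webdriver reads false
// from the very first navigation.
func hideWebdriver() chromedp.Action {
	return chromedp.ActionFunc(func(ctx context.Context) error {
		_, err := page.AddScriptToEvaluateOnNewDocument(
			`Object.defineProperty(navigator, 'webdriver', {get: () => false})`,
		).Do(ctx)
		return err
	})
}

// usage: run it before the first Navigate
// chromedp.Run(ctx, hideWebdriver(), chromedp.Navigate(url), ...)
```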
For a lighter alternative that avoids the headless stack entirely (useful when you control the browser session), see the best web scraping Chrome extensions guide for browser-side extraction patterns.
## when chromedp is the wrong choice
chromedp is overkill for static HTML. If the page doesn't need JavaScript, use `net/http` + goquery and run it 10x faster with 1% of the memory (see the sketch after the list below). chromedp is also a poor fit for:
- very high concurrency (1000+ simultaneous pages): browser automation doesn't scale linearly; consider a dedicated scraping cluster or a headless API. For concurrency-first architectures, the approach covered in Elixir Crawly's BEAM-based scraper is worth comparing
- mobile simulation: Chrome DevTools device emulation works, but it's coarser than a real device
- serverless environments: cold-starting Chrome in AWS Lambda is possible but painful; you'll fight layer size limits and `--no-sandbox` security tradeoffs
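For comparison, the static-HTML path is this small. A sketch with goquery, reusing the selector from the chromedp example:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	resp, err := http.Get("https://example.com/product/42")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// same selector as the chromedp example, no browser required
	fmt.Println("price:", doc.Find(`[data-testid="price"]`).First().Text())
}
```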
Key situations where chromedp earns its keep:
- SPA targets where all data loads via XHR after JS execution
- login flows requiring session cookie capture (sketched after this list)
- sites using CSRF tokens injected by JS
- screenshot or PDF pipelines already running in Go
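A sketch of that cookie capture, with hypothetical selectors and assuming a plain HTML login form; imports for `os`, `github.com/chromedp/cdproto/network`, and `github.com/chromedp/cdproto/storage` omitted:

```go
// capture session cookies once the post-login page renders
var cookies []*network.Cookie
err := chromedp.Run(ctx,
	chromedp.Navigate("https://example.com/login"),
	chromedp.SendKeys(`#email`, "user@example.com", chromedp.ByQuery),
	chromedp.SendKeys(`#password`, os.Getenv("SCRAPE_PASSWORD"), chromedp.ByQuery),
	chromedp.Click(`button[type="submit"]`, chromedp.ByQuery),
	chromedp.WaitVisible(`#dashboard`, chromedp.ByQuery),
	chromedp.ActionFunc(func(ctx context.Context) error {
		var err error
		// pull the full cookie jar over DevTools for reuse elsewhere
		cookies, err = storage.GetCookies().Do(ctx)
		return err
	}),
)
```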
If you're evaluating Go browser scrapers in 2026, also look at Web Scraping with Bun as a JS-side alternative if your team leans frontend; Bun's startup time and native fetch make it surprisingly competitive for lighter JS-rendering tasks.
## Bottom line
chromedp is the right tool for Go teams that need reliable headless Chrome automation without leaving the Go binary deployment model. Use an `ExecAllocator` pool, cap concurrency at 8-12 tabs per process, and handle stealth patches explicitly; the library won't do it for you. For static pages, drop back to goquery; for extreme scale, front it with a managed browser API. DRT covers this stack in depth because the choice of runtime and browser engine has compounding effects on scraper reliability and infrastructure cost.
## Related guides on dataresearchtools.com
- Bun vs Deno vs Node.js for Web Scraping in 2026: Speed Benchmarks
- Go Web Scraping with Colly v2: Production Patterns for 2026
- Elixir Web Scraping with Crawly: BEAM Concurrency for Scrapers (2026)
- Web Scraping with Bun: Faster Than Node.js for Scrapers in 2026?
- Best Web Scraping Chrome Extensions 2026: Extract Data From Your Browser