Most Go scrapers start with Colly and hit a wall the moment JavaScript renders the data they need. chromedp is the answer: a pure-Go library that drives a real Chrome instance over the DevTools Protocol, giving you full JS execution, cookie handling, and DOM access without leaving the Go ecosystem. This guide covers chromedp in 2026: setup, real patterns, concurrency, and where it falls short.
## why chromedp instead of Playwright or Puppeteer
chromedp has one serious advantage over Node.js alternatives: you compile a single binary, ship it, and run it anywhere Chrome is installed. No npm, no node_modules, no runtime version mismatches. If you've been tracking the Bun vs Deno vs Node.js web scraping benchmarks, you know JS runtimes carry real overhead even before your scraper logic runs. Go's goroutine scheduler lets you spin up hundreds of concurrent browser tasks with a fraction of the memory a Node.js process would need.
The tradeoff is ecosystem depth. Playwright has better auto-wait logic, richer selectors, and a larger community. chromedp requires you to wire up waits manually and read DevTools Protocol docs when something breaks. For teams already running Go services, that's a reasonable cost. For teams starting from scratch, Playwright may ship faster.
| feature | chromedp | Playwright (Go binding) | Rod |
|---|---|---|---|
| pure Go | yes | partial (wraps Node) | yes |
| auto-wait | no | yes | partial |
| network intercept | yes | yes | yes |
| screenshot/PDF | yes | yes | yes |
| maintained (2026) | active | active | active |
| binary size | small | large (Node runtime) | small |
Rod is a close competitor worth knowing: it wraps the same DevTools Protocol but with a higher-level API. For most production scrapers, chromedp and Rod both work; pick based on your team's comfort with low-level control.
## installation and a working scraper in 30 lines
```bash
go get github.com/chromedp/chromedp
```

Chrome or Chromium must be installed on the host. On Linux servers, install `chromium-browser`; if the binary isn't on `PATH`, point chromedp at it with the `chromedp.ExecPath` allocator option. Here's a minimal scraper that loads a JS-heavy page and extracts a value:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	ctx, cancel := chromedp.NewContext(context.Background())
	defer cancel()

	ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
	defer cancel()

	var price string
	err := chromedp.Run(ctx,
		chromedp.Navigate("https://example.com/product/42"),
		chromedp.WaitVisible(`[data-testid="price"]`, chromedp.ByQuery),
		chromedp.Text(`[data-testid="price"]`, &price, chromedp.ByQuery),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("price:", price)
}
```

A few things to note: `WaitVisible` blocks until the element is present and visible, which is what replaces Playwright's auto-wait. Always set a `context.WithTimeout` or a single slow page will hang your worker forever. The `chromedp.ByQuery` option uses CSS selectors; `chromedp.ByXPath` is available but slower.
## concurrency patterns that don't leak memory
The naive pattern, one `chromedp.NewContext` per goroutine, creates a new Chrome process per scrape. At scale, that's process soup. The correct approach is an `ExecAllocator` pool:
```go
opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.Flag("headless", true),
	chromedp.Flag("no-sandbox", true),
	chromedp.Flag("disable-gpu", true),
	chromedp.UserAgent("Mozilla/5.0 (X11; Linux x86_64) ..."),
)

allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
defer cancel()

// child contexts = tabs, not new Chrome processes
for _, url := range urls {
	tabCtx, tabCancel := chromedp.NewContext(allocCtx)
	// run the scrape against tabCtx ...
	_ = chromedp.Run(tabCtx, chromedp.Navigate(url))
	// ... then release the tab
	tabCancel()
}
```

Child contexts created from `allocCtx` reuse the same Chrome process as separate tabs, so memory stays bounded. A single Chrome process comfortably handles 8-16 concurrent tabs before you see GC pressure. For larger fleets, run multiple allocators, one per CPU core, and distribute URLs via a channel, as sketched below.
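A minimal sketch of that fan-out, reusing the `opts` slice from the block above and assuming a hypothetical `urls` slice (imports for `sync`, `runtime`, and `log` omitted):

```go
// fan-out: one allocator (one Chrome process) per CPU core,
// all workers draining a shared URL channel
urlCh := make(chan string)
var wg sync.WaitGroup
for i := 0; i < runtime.NumCPU(); i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
		defer cancel()
		for u := range urlCh {
			tabCtx, tabCancel := chromedp.NewContext(allocCtx)
			if err := chromedp.Run(tabCtx, chromedp.Navigate(u)); err != nil {
				log.Printf("scrape %s: %v", u, err)
			}
			tabCancel() // release the tab before the next URL
		}
	}()
}
for _, u := range urls {
	urlCh <- u
}
close(urlCh)
wg.Wait()
```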
Numbered steps for a production-safe setup:

1. create one `ExecAllocator` per worker process
2. limit concurrent tab contexts to 8-12 via a semaphore (`chan struct{}`)
3. call `tabCancel()` immediately after each scrape; leaked tab contexts accumulate as zombie Chrome tabs
4. set a per-tab timeout via `context.WithTimeout`, not a global one
5. log network events via `chromedp.ListenTarget` if you need request/response interception
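Steps 2-5 combine into one pattern. A sketch, assuming the `allocCtx` from the earlier allocator block, a hypothetical `urls` slice, and imports for `context`, `log`, `sync`, `time`, and `github.com/chromedp/cdproto/network`:

```go
sem := make(chan struct{}, 10) // step 2: cap concurrent tabs at 10
var wg sync.WaitGroup
for _, url := range urls {
	wg.Add(1)
	sem <- struct{}{} // acquire a tab slot before spawning
	go func(u string) {
		defer wg.Done()
		defer func() { <-sem }() // release the slot

		tabCtx, tabCancel := chromedp.NewContext(allocCtx)
		defer tabCancel() // step 3: never leak the tab

		// step 4: per-tab timeout, not a global one
		tabCtx, timeoutCancel := context.WithTimeout(tabCtx, 30*time.Second)
		defer timeoutCancel()

		// step 5 (optional): observe network events for this tab
		chromedp.ListenTarget(tabCtx, func(ev interface{}) {
			if e, ok := ev.(*network.EventResponseReceived); ok {
				log.Printf("%d %s", e.Response.Status, e.Response.URL)
			}
		})

		// network.Enable makes Chrome emit the events listened for above
		if err := chromedp.Run(tabCtx, network.Enable(), chromedp.Navigate(u)); err != nil {
			log.Printf("scrape %s: %v", u, err)
		}
	}(url)
}
wg.Wait()
```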
## handling anti-bot and proxy injection
chromedp exposes Chrome's `--proxy-server` flag directly through `ExecAllocator` options:
```go
chromedp.ProxyServer("http://proxy.example.com:8080"),
```

Note that Chrome ignores credentials embedded in the proxy URL; for authenticated proxies you must answer the auth challenge over DevTools (the `fetch` domain) or run a local forwarding proxy that adds the credentials. Rotate proxies at the allocator level (one proxy per allocator) rather than per tab; per-tab proxy injection requires DevTools network override calls and is more brittle. For sticky sessions (same IP for a login flow), assign one allocator per session and don't share.
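A sketch of allocator-level rotation, assuming a hypothetical `proxies` slice:

```go
// build the pool once: one allocator (one Chrome process) per proxy;
// every tab opened under an allocator inherits its --proxy-server flag
allocators := make([]context.Context, len(proxies))
for i, p := range proxies {
	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.ProxyServer(p),
	)
	allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel() // fine in a sketch; tie to pool shutdown in real code
	allocators[i] = allocCtx
}

// round-robin tabs across the pool
for i, u := range urls {
	tabCtx, tabCancel := chromedp.NewContext(allocators[i%len(allocators)])
	_ = chromedp.Run(tabCtx, chromedp.Navigate(u))
	tabCancel()
}
```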
Anti-bot bypass is where raw chromedp falls short. It ships no stealth patches, so headless detection via `navigator.webdriver` is trivial for Cloudflare, DataDome, and Akamai. Your options:
- overwrite `navigator.webdriver` before any page script runs, via the DevTools `Page.addScriptToEvaluateOnNewDocument` call (sketched below)
- port puppeteer-extra-plugin-stealth concepts manually (TLS fingerprint, canvas noise, etc.)
- use a managed browser API (Browserless, Bright Data Scraping Browser) and point chromedp at its remote endpoint via `chromedp.NewRemoteAllocator`
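A sketch of the first option. `chromedp.Evaluate` runs in the current document, so the override has to be registered through the DevTools Page domain to fire before the target's own scripts; the helper name is mine:

```go
import (
	"context"

	"github.com/chromedp/cdproto/page"
	"github.com/chromedp/chromedp"
)

// hideWebdriver registers a script that runs in every new document
// before the page's own JS, so navigator.webdriver reads false
// from the very first navigation.
func hideWebdriver() chromedp.Action {
	return chromedp.ActionFunc(func(ctx context.Context) error {
		_, err := page.AddScriptToEvaluateOnNewDocument(
			`Object.defineProperty(navigator, 'webdriver', {get: () => false})`,
		).Do(ctx)
		return err
	})
}

// usage: run it before the first Navigate
// chromedp.Run(ctx, hideWebdriver(), chromedp.Navigate(url), ...)
```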
For a lighter alternative that avoids the headless stack entirely (useful when you control the browser session), see the best web scraping Chrome extensions guide for browser-side extraction patterns.
## when chromedp is the wrong choice
chromedp is overkill for static HTML. If the page doesn't need JavaScript, use `net/http` + goquery and run it 10x faster with 1% of the memory (see the sketch after the list below). chromedp is also a poor fit for:
- very high concurrency (1000+ simultaneous pages): browser automation doesn't scale linearly; consider a dedicated scraping cluster or a headless API. For concurrency-first architectures, the approach covered in Elixir Crawly's BEAM-based scraper is worth comparing
- mobile simulation: Chrome DevTools device emulation works, but it's coarser than a real device
- serverless environments: cold-starting Chrome in AWS Lambda is possible but painful; you'll fight layer size limits and `--no-sandbox` security tradeoffs
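For comparison, the static-HTML path is this small. A sketch with goquery, reusing the selector from the chromedp example:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	resp, err := http.Get("https://example.com/product/42")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	// same selector as the chromedp example, no browser required
	fmt.Println("price:", doc.Find(`[data-testid="price"]`).First().Text())
}
```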
Key situations where chromedp earns its keep:
- SPA targets where all data loads via XHR after JS execution
- login flows requiring session cookie capture (sketched after this list)
- sites using CSRF tokens injected by JS
- screenshot or PDF pipelines already running in Go
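A sketch of that cookie capture, with hypothetical selectors and assuming a plain HTML login form; imports for `os`, `github.com/chromedp/cdproto/network`, and `github.com/chromedp/cdproto/storage` omitted:

```go
// capture session cookies once the post-login page renders
var cookies []*network.Cookie
err := chromedp.Run(ctx,
	chromedp.Navigate("https://example.com/login"),
	chromedp.SendKeys(`#email`, "user@example.com", chromedp.ByQuery),
	chromedp.SendKeys(`#password`, os.Getenv("SCRAPE_PASSWORD"), chromedp.ByQuery),
	chromedp.Click(`button[type="submit"]`, chromedp.ByQuery),
	chromedp.WaitVisible(`#dashboard`, chromedp.ByQuery),
	chromedp.ActionFunc(func(ctx context.Context) error {
		var err error
		// pull the full cookie jar over DevTools for reuse elsewhere
		cookies, err = storage.GetCookies().Do(ctx)
		return err
	}),
)
```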
If you're evaluating Go browser scrapers in 2026, also look at Web Scraping with Bun as a JS-side alternative if your team leans frontend; Bun's startup time and native fetch make it surprisingly competitive for lighter JS-rendering tasks.
## Bottom line
chromedp is the right tool for Go teams that need reliable headless Chrome automation without leaving the Go binary deployment model. Use an `ExecAllocator` pool, cap concurrency at 8-12 tabs per process, and handle stealth patches explicitly; the library won't do it for you. For static pages, drop back to goquery; for extreme scale, front it with a managed browser API. DRT covers this stack in depth because the choice of runtime and browser engine has compounding effects on scraper reliability and infrastructure cost.
## Related guides on dataresearchtools.com
- Bun vs Deno vs Node.js for Web Scraping in 2026: Speed Benchmarks
- Go Web Scraping with Colly v2: Production Patterns for 2026
- Elixir Web Scraping with Crawly: BEAM Concurrency for Scrapers (2026)
- Web Scraping with Bun: Faster Than Node.js for Scrapers in 2026?
- Best Web Scraping Chrome Extensions 2026: Extract Data From Your Browser