Best Go scraping libraries 2026 ranked
Go scraping libraries in 2026 occupy a niche that is small but unusually high-leverage. Go’s concurrency model (goroutines and channels) maps almost perfectly to the scraping problem, and Go’s compiled binary makes deployment dramatically simpler than Python or Node alternatives. The downside is library breadth: the Go scraping ecosystem has fewer options than Python’s, and the existing libraries are less actively maintained on average. For specific workloads (high-throughput HTTP scraping, distributed crawler workers, scraping infrastructure embedded in Go services), Go is the right choice and the libraries that exist are excellent. For one-off scrapers or projects that benefit from a rich ecosystem, Python or Node remain easier.
This guide ranks the Go scraping libraries actually worth using in 2026, with honest performance comparisons, clear use case mapping, and the gotchas specific to Go’s approach.
Why Go for scraping
Three reasons Go is interesting for scrapers:
Concurrency: a goroutine costs about 2 KB of stack memory. You can run 10,000+ concurrent goroutines on a modest server. Compared to Python’s coroutine overhead and Node’s event loop limits, Go’s concurrency is genuinely different in scale.
Compile-once deploy-anywhere: a Go binary is a single file, and it is fully static when built with CGO_ENABLED=0. Deployment to a new server or container is scp and ./scraper. No virtualenv, no node_modules, no version drift between dev and prod.
HTTP performance: Go’s net/http standard library is fast enough that “scraping” and “high-performance HTTP service” use the same toolkit. fasthttp pushes performance even further for extreme throughput needs.
Three reasons Go is sometimes wrong:
Smaller library ecosystem: fewer parsers, fewer pre-built scrapers, less community content.
No native browser automation: Chromedp and Rod are good but not as polished as Playwright in Python or JavaScript.
Verbose for one-offs: the two-line Python requests scraper has no equally terse Go equivalent.
HTTP clients
net/http (standard library)
The standard library client. Production-grade, well-documented, fast. Right choice for most scraping HTTP needs.
package main

import (
    "io"
    "net/http"
    "time"
)

func fetch(url string) ([]byte, error) {
    client := &http.Client{Timeout: 10 * time.Second}
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("User-Agent", "Mozilla/5.0 ...")
    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return io.ReadAll(resp.Body)
}
Best for: most Go HTTP work. Default unless you have specific needs.
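In a longer-running scraper the fetch helper above would usually share one client and reject non-2xx responses. The following is a minimal sketch under those assumptions. The names newClient, fetchOK and demoServer, the pool sizes, and the User-Agent string are all illustrative. The pages come from a local httptest server so the example runs without network access.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "time"
)

// newClient builds one client intended to be shared by every goroutine.
// The pool sizes are illustrative assumptions, not tuned values.
func newClient() *http.Client {
    return &http.Client{
        Timeout: 10 * time.Second,
        Transport: &http.Transport{
            MaxIdleConns:        100,
            MaxIdleConnsPerHost: 20,
            IdleConnTimeout:     90 * time.Second,
        },
    }
}

// fetchOK returns the body only when the server answers with a 2xx status.
func fetchOK(client *http.Client, url string) ([]byte, error) {
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("User-Agent", "example-scraper/0.1") // placeholder value
    resp, err := client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode < 200 || resp.StatusCode > 299 {
        return nil, fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
    }
    return io.ReadAll(resp.Body)
}

// demoServer starts a local server that always answers with the given
// status and body, so the example needs no external site.
func demoServer(status int, body string) *httptest.Server {
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(status)
        fmt.Fprint(w, body)
    }))
}

func main() {
    client := newClient()

    ok := demoServer(http.StatusOK, "<h1>hello</h1>")
    defer ok.Close()
    body, err := fetchOK(client, ok.URL)
    fmt.Printf("%q %v\n", body, err)

    down := demoServer(http.StatusServiceUnavailable, "try later")
    defer down.Close()
    _, err = fetchOK(client, down.URL)
    fmt.Println(err)
}
```

Sharing the client lets the Transport pool connections across requests, which is where most of the throughput of net/http comes from.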
fasthttp
Aggressive performance-oriented HTTP library that bypasses some net/http abstractions for raw speed. The project’s own benchmarks claim up to 10x the speed of net/http; in the scraping benchmark later in this guide the gap was about 2x. The API is different (uses fasthttp.Request and fasthttp.Response instead of net/http types).
import "github.com/valyala/fasthttp"

func fetchFast(url string) ([]byte, error) {
    req := fasthttp.AcquireRequest()
    resp := fasthttp.AcquireResponse()
    defer fasthttp.ReleaseRequest(req)
    defer fasthttp.ReleaseResponse(resp)

    req.SetRequestURI(url)
    req.Header.SetUserAgent("Mozilla/5.0 ...")
    if err := fasthttp.Do(req, resp); err != nil {
        return nil, err
    }
    // resp.Body() points into a pooled buffer that is recycled once the
    // response is released, so copy the bytes before returning them.
    return append([]byte(nil), resp.Body()...), nil
}
Best for: extreme throughput needs (10k+ requests/sec), low-latency requirements.
resty
The popular convenience HTTP client wrapping net/http with a nicer API. Fluent builder pattern, JSON serialization, retry support. Slightly slower than raw net/http but more readable.
import (
    "time"

    "github.com/go-resty/resty/v2"
)

client := resty.New().SetTimeout(10 * time.Second)
resp, err := client.R().
    SetHeader("User-Agent", "Mozilla/5.0").
    Get("https://example.com")
Best for: developer ergonomics, projects that benefit from convenience over absolute performance.
HTML parsers
GoQuery
The jQuery-style HTML parser. Cleanest API for Go HTML manipulation. Built on golang.org/x/net/html under the hood.
import (
    "strings"

    "github.com/PuerkitoBio/goquery"
)

func parseTitles(html string) []string {
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
    if err != nil {
        return nil
    }
    var titles []string
    doc.Find("h2.product-title").Each(func(i int, s *goquery.Selection) {
        titles = append(titles, s.Text())
    })
    return titles
}
Best for: most Go HTML parsing. Default choice.
golang.org/x/net/html
The standard parser GoQuery wraps. Direct use is verbose but available for custom AST manipulation.
Best for: low-level parsing needs, when you want zero dependencies.
colly’s parser
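Colly does not ship a separate parser: its OnHTML callbacks are built on GoQuery. Each callback receives an HTMLElement with helpers such as ChildText and Attr, which is less verbose than raw GoQuery for callback-driven scraping, and the underlying GoQuery selection is available as e.DOM. Used in conjunction with Colly only.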
Browser automation
Chromedp
The dominant Go browser automation library. Uses Chrome DevTools Protocol directly without intermediate libraries. Fast, well-maintained, but the API is verbose compared to Playwright.
import (
    "context"
    "time"

    "github.com/chromedp/chromedp"
)

func scrapeWithChrome(url string) (string, error) {
    ctx, cancel := chromedp.NewContext(context.Background())
    defer cancel()
    ctx, cancel = context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    var title string
    err := chromedp.Run(ctx,
        chromedp.Navigate(url),
        chromedp.WaitVisible("h1.product-title"),
        chromedp.Text("h1.product-title", &title),
    )
    return title, err
}
Best for: most Go browser automation. The default choice when you need a real browser.
Rod
A modern alternative to Chromedp with a more fluent API. Active development, strong feature parity with Playwright.
import "github.com/go-rod/rod"
browser := rod.New().MustConnect()
page := browser.MustPage("https://example.com").MustWaitLoad()
title := page.MustElement("h1.product-title").MustText()
Best for: developers who prefer Rod’s API ergonomics over Chromedp’s.
Playwright-go
A community-maintained Go binding for Microsoft’s Playwright. Newer and less mature than Chromedp/Rod but offers cross-browser (Firefox, WebKit) support that the Chrome-only alternatives lack.
Best for: cross-browser needs in Go, teams using Playwright in other languages.
Frameworks
Colly
The dominant Go scraping framework. Built-in caching, concurrency, request rate limiting, and HTML parsing callbacks. The right choice for crawler-heavy Go scrapers.
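package main

import (
    "fmt"
    "time"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        // Must match the visited host exactly; subdomains are not implied.
        colly.AllowedDomains("shop.example.com"),
        colly.Async(true),
    )
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 10,
        Delay:       100 * time.Millisecond,
    })
    c.OnHTML("div.product", func(e *colly.HTMLElement) {
        fmt.Println(e.ChildText("h2.title"))
    })
    // Follow pagination; Request.Visit resolves relative hrefs.
    c.OnHTML("a.next", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })
    c.Visit("https://shop.example.com/page/1")
    c.Wait() // block until the async requests finish
}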
Best for: large crawlers, the standard Go scraping framework.
Geziyor
Another Go scraping framework with similar feature set to Colly. Less popular but actively maintained.
Best for: Colly alternatives.
Comparison table
| library | layer | speed | learning curve | best for |
|---|---|---|---|---|
| net/http | HTTP | fast | easy | most HTTP work |
| fasthttp | HTTP | fastest | medium | extreme throughput |
| resty | HTTP | fast | easy | developer ergonomics |
| GoQuery | parser | fast | easy | most HTML parsing |
| golang.org/x/net/html | parser | fast | hard | custom AST work |
| Chromedp | browser | mid | medium | most browser automation |
| Rod | browser | mid | medium | Chromedp alternative |
| Playwright-go | browser | mid | medium | cross-browser |
| Colly | framework | fast | medium | most crawler work |
| Geziyor | framework | fast | medium | Colly alternative |
Decision matrix: solopreneur, SMB, enterprise
| profile | scale | recommended stack | reasoning |
|---|---|---|---|
| Solopreneur Go-curious | <10k pages/day | net/http + GoQuery | Standard library + the one parser |
| Indie scraper, single binary | <500k pages/day | net/http + GoQuery + Colly | Framework value at this scale |
| Indie extreme throughput | <1M pages/day | fasthttp + GoQuery | When net/http becomes a bottleneck |
| SMB scraping infra | 1-10M pages/day | Colly + Redis queue + custom workers | Distribute across N binaries |
| SMB JS-heavy | <500k pages/day | Chromedp + GoQuery post-parse | Browser only when needed |
| Embedded scraping in service | varies | net/http only | Avoid framework imports inside larger services |
| Enterprise data pipeline | 10M+ pages/day | Custom Go workers + Kafka + GoQuery | Maximum control, minimum dependencies |
The right pattern for Go at scale is custom workers reading from a queue rather than a monolithic Colly process. Goroutines do the concurrency; Redis or NATS does the work distribution. This pattern scales linearly with worker count and survives single-machine failures cleanly.
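The worker side of that pattern can be sketched with the standard library alone. In this minimal sketch a Go channel stands in for the Redis or NATS consumer, and runWorkers, Result and the fake scrape function are hypothetical names chosen for illustration.

```go
package main

import (
    "fmt"
    "sync"
)

// Result pairs a URL with whatever the scrape function produced.
type Result struct {
    URL  string
    Body string
    Err  error
}

// runWorkers starts n goroutines that drain jobs until the channel is closed.
// In a real deployment the jobs channel would be fed by a queue consumer;
// here it is an in-memory stand-in.
func runWorkers(n int, jobs <-chan string, scrape func(string) (string, error)) []Result {
    results := make(chan Result)
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for u := range jobs {
                body, err := scrape(u)
                results <- Result{URL: u, Body: body, Err: err}
            }
        }()
    }
    // Close results once every worker has exited so the collector loop ends.
    go func() {
        wg.Wait()
        close(results)
    }()

    var out []Result
    for r := range results {
        out = append(out, r)
    }
    return out
}

func main() {
    jobs := make(chan string)
    go func() {
        defer close(jobs)
        for i := 1; i <= 5; i++ {
            jobs <- fmt.Sprintf("https://example.com/p/%d", i)
        }
    }()
    fake := func(u string) (string, error) { return "body of " + u, nil }
    for _, r := range runWorkers(3, jobs, fake) {
        fmt.Println(r.URL, len(r.Body))
    }
}
```

Because the workers only see a channel of URLs, swapping the in-memory feeder for a real queue client should not require changes to the worker loop itself.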
Migration path: Python or Node to Go
Most Go migrations happen when Python or Node scrapers hit infrastructure limits at scale. The playbook:
- Identify the throughput bottleneck. If your Python scraper saturates one CPU core at 800 req/s, Go can run the same workload at 4000+ req/s on one core. If you are not CPU-bound, the migration may not pay off.
- Port one scraper end-to-end. Choose the highest-throughput single-target scraper as the migration pilot. Validate output equivalence on a sample.
- Keep Python or Node for orchestration. Many teams use Go for the scraper workers and Python for the data pipeline (Pandas, ML preprocessing). The tools do not have to match.
- Containerize and deploy in parallel. Run Go workers alongside Python workers reading from the same queue. Cut over by reducing Python worker count over a few weeks.
- Re-evaluate at six months. If the Go workers are stable and the throughput gain is real, migrate the rest. If they are not, the original choice was right.
The migration is rarely binary. Most production scrapers end up polyglot with Go for hot-path workers and Python for one-off and analytical work.
Performance benchmarks
Same workload as Python and Node benchmarks: 10,000 simple HTML pages from a local mirror, single Go binary.
| stack | total time | requests/sec |
|---|---|---|
| net/http (50 goroutines) | 6s | 1667 |
| fasthttp (50 goroutines) | 3s | 3333 |
| resty (50 goroutines) | 8s | 1250 |
| Colly (default) | 7s | 1429 |
| Chromedp (50 contexts) | 110s | 91 |
Go HTTP throughput is the highest of the three languages we benchmarked. fasthttp specifically is faster than even Node’s undici for this workload. Browser automation is similar across all languages because the bottleneck is browser execution.
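The requests/sec column is simply pages divided by wall time. A measurement of that shape could be taken with a harness like the sketch below. It is not the harness behind the table: measure and newLocalPage are hypothetical names, the page is served by a local httptest server, and the numbers it prints will depend on the machine.

```go
package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptest"
    "sync"
    "time"
)

// measure fetches url `total` times using the given number of goroutines and
// returns how many requests got a 200 plus the elapsed wall time.
func measure(client *http.Client, url string, total, goroutines int) (int, time.Duration) {
    jobs := make(chan struct{})
    var (
        ok int
        mu sync.Mutex
        wg sync.WaitGroup
    )
    start := time.Now()
    for i := 0; i < goroutines; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for range jobs {
                resp, err := client.Get(url)
                if err != nil {
                    continue
                }
                io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
                resp.Body.Close()
                if resp.StatusCode == http.StatusOK {
                    mu.Lock()
                    ok++
                    mu.Unlock()
                }
            }
        }()
    }
    for i := 0; i < total; i++ {
        jobs <- struct{}{}
    }
    close(jobs)
    wg.Wait()
    return ok, time.Since(start)
}

// newLocalPage serves one small static HTML page from a local test server.
func newLocalPage() *httptest.Server {
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprint(w, "<html><body><h1>page</h1></body></html>")
    }))
}

func main() {
    srv := newLocalPage()
    defer srv.Close()
    ok, elapsed := measure(srv.Client(), srv.URL, 1000, 50)
    fmt.Printf("%d ok in %s (%.0f req/s)\n", ok, elapsed, float64(ok)/elapsed.Seconds())
}
```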
Cost worked example
For a 1M-pages-per-day Go scraping workload (roughly 12 req/s sustained):
- 1 medium VPS ($40/mo, 8 vCPU, 16 GB)
- net/http + GoQuery + Colly stack (free)
- uTLS for TLS fingerprint impersonation when needed (free)
- Smartproxy/Decodo residential proxies (~$200/mo for ~25 GB)
- Redis on a small managed instance ($10/mo) for distributed work coordination
- PostgreSQL on a hosted instance ($25/mo)
Total: about $275/month for a workload that handles 30 million pages per month. The Python or Node equivalent would need 2-3x the compute capacity for the same throughput, which takes the $40 VPS line to $80-120/month, an extra $40-80. Go’s compiled binary also reduces deployment complexity (no language runtime, no virtualenv) and operational toil.
The break-even point where Go’s lower compute cost overcomes its higher development cost typically sits around 10M pages/month. Below that, Python or Node ergonomics usually win on total team productivity.
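The arithmetic behind the worked example can be checked in a few lines. This sketch uses only the figures listed above; sustainedRate and monthlyTotal are illustrative helper names.

```go
package main

import "fmt"

// sustainedRate converts a daily page count into average requests per second.
func sustainedRate(pagesPerDay float64) float64 {
    return pagesPerDay / (24 * 60 * 60)
}

// monthlyTotal sums the line items, in US dollars per month.
func monthlyTotal(items map[string]float64) float64 {
    var sum float64
    for _, v := range items {
        sum += v
    }
    return sum
}

func main() {
    items := map[string]float64{
        "vps":                 40,
        "residential proxies": 200,
        "redis":               10,
        "postgres":            25,
    }
    total := monthlyTotal(items)
    pagesPerMonth := 1_000_000.0 * 30

    fmt.Printf("rate: %.1f req/s\n", sustainedRate(1_000_000))
    fmt.Printf("total: $%.0f/month\n", total)
    fmt.Printf("cost per 1,000 pages: $%.4f\n", total/pagesPerMonth*1000)
}
```

This gives about 11.6 req/s sustained, $275/month in total, and a little under one cent per thousand pages, with proxies making up most of the bill.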
Stack recommendations
Most Go scraping: net/http + GoQuery + Colly. Standard library plus the two best community libraries. Adequate for almost everything.
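Extreme throughput: fasthttp + GoQuery. When you need 10k+ HTTP requests per second per machine. Colly is built on net/http, so it does not combine with fasthttp; at this tier you write the crawl loop yourself.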
Browser-required scraping: Chromedp + GoQuery (for parsing extracted HTML). Use Chromedp for JS execution, GoQuery for the parsing because it is more ergonomic.
Distributed scraping: net/http + GoQuery + custom code with Redis queues. Colly does not have great distributed support; for multi-machine scrapers you build the coordination layer yourself.
Scraping inside a larger Go service: net/http directly. Avoid pulling in framework overhead for embedded scraping inside a service that does other things.
Idiomatic Go scraper template
A modern Go scraper using net/http and GoQuery:
package main

import (
    "context"
    "fmt"
    "io"
    "net/http"
    "strings"
    "sync"
    "time"

    "github.com/PuerkitoBio/goquery"
)

type Product struct {
    Name  string
    Price string
    URL   string
}

// One shared client so connections are pooled across goroutines.
var client = &http.Client{Timeout: 15 * time.Second}

func fetchPage(ctx context.Context, url string) (string, error) {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return "", err
    }
    req.Header.Set("User-Agent", "Mozilla/5.0")
    resp, err := client.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("GET %s: status %s", url, resp.Status)
    }
    body, err := io.ReadAll(resp.Body)
    return string(body), err
}

func parseProducts(html string) []Product {
    doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
    if err != nil {
        return nil
    }
    var products []Product
    doc.Find("div.product-card").Each(func(i int, s *goquery.Selection) {
        products = append(products, Product{
            Name:  s.Find("h2.title").Text(),
            Price: s.Find("span.price").Text(),
            URL:   s.Find("a").AttrOr("href", ""),
        })
    })
    return products
}

func scrapeAll(ctx context.Context, urls []string, concurrency int) []Product {
    sem := make(chan struct{}, concurrency)
    var wg sync.WaitGroup
    var mu sync.Mutex
    var allProducts []Product
    for _, url := range urls {
        // Acquire before spawning so at most `concurrency` goroutines exist.
        sem <- struct{}{}
        wg.Add(1)
        go func(u string) {
            defer wg.Done()
            defer func() { <-sem }()
            html, err := fetchPage(ctx, u)
            if err != nil {
                fmt.Println("error:", err)
                return
            }
            products := parseProducts(html)
            mu.Lock()
            allProducts = append(allProducts, products...)
            mu.Unlock()
        }(url)
    }
    wg.Wait()
    return allProducts
}

func main() {
    urls := []string{"https://example.com/p/1", "https://example.com/p/2"}
    products := scrapeAll(context.Background(), urls, 20)
    for _, p := range products {
        fmt.Printf("%+v\n", p)
    }
}
This pattern handles 1500+ pages per minute on a small VPS with proper concurrency control.
Distributed scraper architecture
For workloads that exceed one machine, the canonical Go scraper architecture is:
- Coordinator service that pushes URLs to a queue (Redis Streams, NATS JetStream, or Kafka).
- Worker pool of N stateless Go binaries, each consuming from the queue, scraping in parallel goroutines, and writing results to a sink (Postgres, S3, Kafka).
- Health and metrics exposed via Prometheus endpoints on each worker; scraped via a central Prometheus + Grafana stack.
- Dead-letter queue for URLs that fail repeatedly, picked up by a slower retry process or surfaced for manual investigation.
This pattern scales linearly: doubling worker count doubles throughput up until the target rate-limits or the queue itself bottlenecks. With NATS or Kafka, the queue layer easily handles 100k messages/sec, far beyond what most scrapers need.
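The dead-letter step is the part of this architecture most often left vague, so here is a minimal sketch of the retry bookkeeping. Slices stand in for the real queue and dead-letter queue, and process, job and errFail are hypothetical names; a production worker would read the attempt count from the message metadata of whichever queue it uses.

```go
package main

import (
    "errors"
    "fmt"
)

// errFail is a placeholder error used by the demonstration scrape functions.
var errFail = errors.New("scrape failed")

// job tracks how many times a URL has been attempted.
type job struct {
    URL      string
    Attempts int
}

// process runs scrape over the pending URLs, re-queueing failures until
// maxAttempts is reached and then moving them to the dead-letter list.
func process(urls []string, maxAttempts int, scrape func(string) error) (done, dead []string) {
    queue := make([]job, 0, len(urls))
    for _, u := range urls {
        queue = append(queue, job{URL: u})
    }
    for len(queue) > 0 {
        j := queue[0]
        queue = queue[1:]
        j.Attempts++
        if err := scrape(j.URL); err == nil {
            done = append(done, j.URL)
        } else if j.Attempts >= maxAttempts {
            dead = append(dead, j.URL) // give up: surface for manual investigation
        } else {
            queue = append(queue, j) // retry later, behind the remaining work
        }
    }
    return done, dead
}

func main() {
    calls := map[string]int{}
    scrape := func(u string) error {
        calls[u]++
        switch {
        case u == "https://example.com/broken":
            return errFail // never succeeds
        case u == "https://example.com/flaky" && calls[u] < 2:
            return errFail // succeeds on the second attempt
        }
        return nil
    }
    done, dead := process([]string{
        "https://example.com/ok",
        "https://example.com/flaky",
        "https://example.com/broken",
    }, 3, scrape)
    fmt.Println("done:", done)
    fmt.Println("dead-letter:", dead)
}
```

With a limit of three attempts, the flaky URL ends up in done after its retry and the broken URL lands in the dead-letter list.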
Common mistakes to avoid
Forgetting to close response bodies: every HTTP response body must be closed or you leak file descriptors. The defer resp.Body.Close() pattern is essential.
Unbounded goroutine spawning: launching one goroutine per URL without a semaphore exhausts memory and overwhelms target sites. Use a buffered channel as a semaphore.
Using fasthttp when you do not need it: fasthttp’s API is different from net/http and the integration cost is real. For most workloads, net/http is fast enough.
Ignoring context cancellation: pass context.Context through your scraper functions so you can cancel cleanly on shutdown signals.
Trying to use Python-style async patterns: Go’s concurrency primitives (goroutines, channels, sync.WaitGroup) are different from async/await. Embrace them rather than fighting them.
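The context advice above is easiest to see in code. The sketch below wires shutdown signals to a context with signal.NotifyContext from the standard library and checks the context between URLs. The crawl and cancelAfter functions are hypothetical names, and the visit callback is a stand-in for the real fetch-and-parse step.

```go
package main

import (
    "context"
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

// crawl visits URLs in order and stops early once ctx is cancelled.
func crawl(ctx context.Context, urls []string, visit func(context.Context, string) error) (int, error) {
    visited := 0
    for _, u := range urls {
        select {
        case <-ctx.Done():
            return visited, ctx.Err()
        default:
        }
        if err := visit(ctx, u); err != nil {
            return visited, err
        }
        visited++
    }
    return visited, nil
}

// cancelAfter is a demonstration helper: it cancels the context during the
// visit numbered `limit`, the way a shutdown signal would arrive mid-crawl.
func cancelAfter(urls []string, limit int) (int, error) {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    seen := 0
    return crawl(ctx, urls, func(ctx context.Context, u string) error {
        seen++
        if seen == limit {
            cancel()
        }
        return nil
    })
}

func main() {
    // Cancel the context on Ctrl-C or SIGTERM.
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
    defer stop()

    urls := []string{"https://example.com/p/1", "https://example.com/p/2"}
    n, err := crawl(ctx, urls, func(ctx context.Context, u string) error {
        select {
        case <-time.After(10 * time.Millisecond): // pretend to do work
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    })
    fmt.Println("visited:", n, "err:", err)

    n, err = cancelAfter([]string{"a", "b", "c"}, 2)
    fmt.Println("visited before cancel:", n, "err:", err)
}
```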
We cover the Python and Node alternatives in our best Python scraping libraries 2026 and best Node.js scraping libraries 2026 reviews.
External authoritative reference: the Go net/http documentation covers the standard library client.
Common gotchas
- Goroutine leaks. Goroutines started without a clear exit path can leak forever if the channel they wait on never closes. Always include a ctx.Done() check or a select with a timeout case.
- net/http connection reuse defaults. The default Transport reuses connections, which is good for performance but bad if you want each request to go out over a fresh proxy connection. For per-request isolation, set Transport.DisableKeepAlives = true.
- fasthttp’s pooled objects. fasthttp.Request and fasthttp.Response are pooled; you must Acquire and Release them. Forgetting Release causes memory growth that looks like a leak.
- GoQuery selector syntax differences. GoQuery uses CSS selectors but does not support all jQuery extensions. :contains() and :has() are supported; jQuery’s positional extensions such as :first and :eq() are not (use the First() and Eq() methods instead). Test your selectors against the actual DOM before assuming.
- Chromedp context cancellation. Cancelling the parent context kills all in-flight Chrome operations, but the headless Chrome process can survive. Always call chromedp.Cancel(ctx) explicitly to ensure cleanup.
- JSON unmarshaling silent failures. Unknown fields are silently dropped by json.Unmarshal. If your target’s response shape changes, you may not notice. Use DisallowUnknownFields() on a json.Decoder during development.
- Slice append concurrency. Multiple goroutines appending to the same slice race with each other and can lose or corrupt data. Use a sync.Mutex or a channel-based aggregator.
- Colly OnHTML callback ordering. Multiple OnHTML handlers for overlapping selectors fire in registration order, not in DOM order. Test handler ordering if you depend on it.
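The JSON gotcha is quick to demonstrate. In this minimal sketch the Product shape and the decodeStrict helper are hypothetical; the second payload simulates a target that renamed price to price_cents.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// Product is a hypothetical shape for a scraped JSON API response.
type Product struct {
    Name  string `json:"name"`
    Price string `json:"price"`
}

// decodeStrict fails if the payload contains fields Product does not declare.
func decodeStrict(payload string) (Product, error) {
    var p Product
    dec := json.NewDecoder(strings.NewReader(payload))
    dec.DisallowUnknownFields()
    err := dec.Decode(&p)
    return p, err
}

func main() {
    p, err := decodeStrict(`{"name":"Widget","price":"9.99"}`)
    fmt.Println(p, err)

    // The upstream API renamed a field: plain json.Unmarshal would leave
    // Price empty without complaint, while the strict decoder reports it.
    _, err = decodeStrict(`{"name":"Widget","price_cents":999}`)
    fmt.Println(err)
}
```

Strict decoding is best kept to development and canary runs, since a harmless added field would otherwise fail every request in production.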
When to use Go vs Python vs Node
| consideration | best language |
|---|---|
| highest HTTP throughput per machine | Go |
| richest ecosystem | Python |
| best browser automation | Python or Node (Playwright) |
| simplest deployment | Go (single binary) |
| smallest learning curve | Python |
| best for embedded scraping in services | Go |
| largest community of scraping content | Python |
| AI/LLM integration | Python |
For dedicated scraping projects that scale and benefit from compiled performance, Go is excellent. For one-offs and projects requiring rich ecosystem support, Python wins. For Node-shop infrastructure, Node Crawlee fits naturally.
FAQ
Q: Colly or write my own?
For projects with link-following, deduplication, and rate-limiting needs across thousands of pages, Colly saves significant code. For simple scrapers with a known URL list, raw net/http + GoQuery is enough.
Q: Chromedp or Rod or Playwright-go?
Chromedp is the safe default with the most production usage. Rod has a nicer API. Playwright-go is right when you need Firefox or WebKit. Performance is similar across all three.
Q: how do I handle TLS fingerprinting in Go?
Go’s TLS stack does not have first-class fingerprint impersonation. The closest options are utls (uTLS) which mimics specific browser TLS handshakes, and routing through a proxy that handles fingerprinting.
Q: is fasthttp worth the complexity?
For most workloads, no. fasthttp gives you an extra 2-5x throughput at the cost of API divergence from the standard library. Use it when you have measured a performance need that net/http cannot meet.
Q: does Go have an equivalent to Scrapy?
Colly is the closest. Less batteries-included than Scrapy but covers the core crawler patterns.
Q: how do I handle proxies in Go?
Set the Proxy field on the http.Transport that your http.Client uses. For per-request proxy rotation, give that field a function that selects a proxy from a pool. Colly accepts a proxy switcher function natively.
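A minimal sketch of per-request rotation with the standard library follows. The Proxy field on http.Transport takes a function that is consulted for each request. The roundRobin and newRotatingClient names and the proxy hostnames are placeholders, and a real pool would also need health checks and credentials.

```go
package main

import (
    "fmt"
    "net/http"
    "net/url"
    "sync/atomic"
    "time"
)

// roundRobin returns a function usable as http.Transport.Proxy that cycles
// through the given proxy URLs, one per call.
func roundRobin(proxies []string) (func(*http.Request) (*url.URL, error), error) {
    if len(proxies) == 0 {
        return nil, fmt.Errorf("no proxies given")
    }
    parsed := make([]*url.URL, 0, len(proxies))
    for _, p := range proxies {
        u, err := url.Parse(p)
        if err != nil {
            return nil, fmt.Errorf("parse proxy %q: %w", p, err)
        }
        parsed = append(parsed, u)
    }
    var next uint64
    return func(*http.Request) (*url.URL, error) {
        i := atomic.AddUint64(&next, 1) - 1 // safe for concurrent callers
        return parsed[i%uint64(len(parsed))], nil
    }, nil
}

// newRotatingClient wires the selector into a client. DisableKeepAlives makes
// every request open a new connection instead of reusing a pooled one.
func newRotatingClient(proxies []string) (*http.Client, error) {
    pick, err := roundRobin(proxies)
    if err != nil {
        return nil, err
    }
    return &http.Client{
        Timeout:   15 * time.Second,
        Transport: &http.Transport{Proxy: pick, DisableKeepAlives: true},
    }, nil
}

func main() {
    pick, err := roundRobin([]string{
        "http://proxy-a.example:8080", // placeholder hosts
        "http://proxy-b.example:8080",
    })
    if err != nil {
        panic(err)
    }
    for i := 0; i < 3; i++ {
        u, _ := pick(nil) // the selector ignores the request itself
        fmt.Println(u.Host)
    }
}
```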
Q: are there structured-data extraction libraries?
A few exist (go-rod/rod for browser, tdewolff/parse for streaming HTML). For strongly typed extraction, write a struct and unmarshal CSS selectors into it manually with reflection or code generation.
Q: is Go’s gc a problem for long-running scrapers?
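Generally no. Go’s GC pauses have typically been sub-millisecond since Go 1.8, and the asynchronous preemption added in 1.14 removed most of the remaining stalls caused by tight loops. The main GC concern is allocating in tight loops; reuse buffers and pool objects with sync.Pool if you see GC pressure.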
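Buffer reuse with sync.Pool might look like the sketch below. The withBody and pageLength helpers are hypothetical names. The callback style is one possible design, chosen so that the pooled bytes are never handed out beyond the buffer’s lifetime.

```go
package main

import (
    "bytes"
    "fmt"
    "io"
    "strings"
    "sync"
)

// bufPool hands out reusable buffers so reading a page does not have to
// allocate and grow a fresh one every time.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

// withBody reads r into a pooled buffer and passes the bytes to fn.
// fn must not keep the slice after it returns, because the buffer is reused.
func withBody(r io.Reader, fn func([]byte) error) error {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)
    if _, err := buf.ReadFrom(r); err != nil {
        return err
    }
    return fn(buf.Bytes())
}

// pageLength is a small usage example: it measures a page body without
// retaining it.
func pageLength(page string) (int, error) {
    n := 0
    err := withBody(strings.NewReader(page), func(b []byte) error {
        n = len(b)
        return nil
    })
    return n, err
}

func main() {
    for _, page := range []string{"<html></html>", "<p>hi</p>"} {
        n, err := pageLength(page)
        fmt.Println(n, err)
    }
}
```

In a scraper, r would be resp.Body and fn would run the parser. Anything the parser needs to keep should be copied out as strings or structs before fn returns.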
Closing
Go scraping in 2026 is the right choice for high-throughput dedicated scraping infrastructure, distributed crawler workers, and embedded scraping inside Go services. The ecosystem is smaller than Python’s but the libraries that exist are excellent. net/http + GoQuery + Colly is the standard stack; fasthttp and Chromedp cover specialized needs. For broader scraping infrastructure see our dev-tools-projects category hub.