Deploying Scrapers on Hetzner: Cheapest Production Stack 2026

Hetzner is the default choice for cost-conscious scraping infrastructure in 2026. Here’s the complete production stack.

If you’re running scrapers at any meaningful volume, Hetzner is probably the cheapest dedicated or cloud VM provider that doesn’t make you compromise on network performance. Deploying scrapers on Hetzner gives you bare-metal-level control, EU and US-East regions, a large traffic allowance bundled into most plans, and pricing that undercuts AWS by 60-80% for equivalent compute. This guide covers the full production stack: VM sizing, proxy routing, job scheduling, and monitoring, based on what actually works in 2026.

Why Hetzner for Scrapers

Hetzner’s CAX and CX lines are the two starting points. CAX runs on Ampere ARM64 chips and costs roughly €3.79/month for 2 vCPU + 4GB RAM. CX runs on x86. For pure scraping workloads (HTTP requests, light parsing), ARM64 is fine and meaningfully cheaper. For Playwright or Puppeteer setups that depend on x86-only browser binaries, stick with CX or switch to ARM64 browser builds.

The key advantage Hetzner has over serverless scraping platforms is egress pricing. Hetzner bundles 20TB of outbound traffic on most plans. If you’ve hit Modal’s cold-start limits doing browser-heavy scraping (as covered in Deploying Scrapers on Modal Labs 2026: Serverless GPU Headless Browsers), a warm VM that’s always on solves that immediately.

One honest tradeoff: you’re managing your own infrastructure. No autoscaling out of the box, no managed queues, no zero-downtime deploys unless you build them.

VM Sizing and OS Setup

For a single-scraper node handling up to 50 concurrent HTTP workers:

Plan  | vCPU  | RAM  | Storage   | Egress | Price/mo
CAX11 | 2 ARM | 4GB  | 40GB SSD  | 20TB   | ~€3.79
CX22  | 2 x86 | 4GB  | 40GB SSD  | 20TB   | ~€4.35
CX32  | 4 x86 | 8GB  | 80GB SSD  | 20TB   | ~€8.49
CX42  | 8 x86 | 16GB | 160GB SSD | 20TB   | ~€18.39

For Playwright/Chromium-based scrapers, CX32 is the minimum practical starting point. Chromium alone consumes 200-400MB per browser instance. Eight concurrent browser workers will saturate a 4GB node.
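
The sizing arithmetic above can be sketched as a small helper. This is a rough estimate, not a benchmark: the 400MB figure is the upper end of the range quoted above, and the 1GB reserved for the OS, Redis, and the Python workers is an assumption made for illustration.

```python
def max_browser_workers(node_ram_mb: int, per_browser_mb: int = 400,
                        reserved_mb: int = 1024) -> int:
    """Estimate how many Chromium instances fit in a node's RAM.

    reserved_mb is an assumed allowance for the OS, Redis, and the
    worker processes themselves; it is not a measured figure.
    """
    usable_mb = node_ram_mb - reserved_mb
    return max(usable_mb // per_browser_mb, 0)


if __name__ == "__main__":
    # RAM sizes taken from the plan table above.
    for plan, ram_mb in [("CX22", 4096), ("CX32", 8192), ("CX42", 16384)]:
        print(plan, max_browser_workers(ram_mb))
```

Under these assumptions a 4GB node has room for about seven browsers, which is why eight concurrent workers push it into swap.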

Ubuntu 24.04 LTS is the recommended OS. After provisioning:

apt update && apt upgrade -y
apt install -y python3-pip python3-venv git redis-server nginx fail2ban
ufw allow 22 && ufw allow 80 && ufw allow 443 && ufw enable

Use Hetzner Firewall rules at the network level in addition to ufw. The two layers are belt and suspenders: ufw and fail2ban only act once traffic has reached your interface, while the Hetzner Firewall drops unwanted traffic before it gets to the VM.

Job Scheduling: Celery + Redis vs. Cron

For persistent scraping workloads, Celery backed by Redis is the standard stack. Redis runs comfortably on the same node for low-to-medium throughput (under 500 tasks/hour). Above that, move Redis to its own CAX11 instance.

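A minimal Celery task that fetches a URL through a caller-supplied proxy, so the caller controls rotation:

from celery import Celery
import httpx

# Redis on the same node acts as the broker.
app = Celery("scraper", broker="redis://localhost:6379/0")

# Retry up to 3 times, 5 seconds apart, on any httpx transport or HTTP status error.
@app.task(autoretry_for=(httpx.HTTPError,), retry_kwargs={"max_retries": 3, "countdown": 5})
def fetch(url: str, proxy: str) -> dict:
    # httpx 0.26+ takes `proxy=`; the older `proxies=` argument was removed in 0.28.
    with httpx.Client(proxy=proxy, timeout=15) as client:
        r = client.get(url)
        r.raise_for_status()  # raises httpx.HTTPStatusError on 4xx/5xx
        # Truncate the body so the returned payload stays small.
        return {"url": url, "status": r.status_code, "body": r.text[:5000]}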
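
Because the task takes the proxy as an argument, rotation can live in whatever code enqueues the jobs. One simple option is round-robin assignment, sketched below. The pool URLs and target pages are placeholders, and the helper name `assign_proxies` is made up for this example.

```python
from itertools import cycle
from typing import Iterable, Iterator


def assign_proxies(urls: Iterable[str], proxies: list[str]) -> Iterator[tuple[str, str]]:
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("proxy pool is empty")
    # zip stops when the URLs run out; cycle repeats the pool indefinitely.
    return zip(urls, cycle(proxies))


if __name__ == "__main__":
    # Hypothetical pool and targets, for illustration only.
    pool = [
        "http://user:pass@proxy-a.example:8000",
        "http://user:pass@proxy-b.example:8000",
    ]
    targets = [f"https://example.com/page/{n}" for n in range(5)]
    for url, proxy in assign_proxies(targets, pool):
        # In a real producer this line would be: fetch.delay(url, proxy)
        print(url, "->", proxy)
```

Round-robin is the simplest policy. Weighted or health-aware selection is a natural next step once you track per-proxy error rates.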

For simple periodic jobs (daily crawls, weekly index sweeps), plain cron is less overhead than running a Celery beat scheduler. Unlike Railway (see Deploying Scrapers on Railway 2026: Cron + Background Workers), Hetzner doesn’t give you a managed cron UI. systemd timers are a solid alternative to crontab: they log to the journal natively and can catch up on runs missed while the machine was down.

Proxy Routing on Hetzner

Hetzner’s own IPs are heavily flagged by Cloudflare, Akamai, and most major anti-bot systems. You cannot scrape serious targets using Hetzner egress IPs without a proxy layer.

The standard architecture:

  1. Scrapers run on Hetzner VMs (cheap compute, not egress)
  2. All outbound traffic routes through a residential or mobile proxy pool via HTTPS_PROXY or per-request proxy rotation
  3. Session pinning (sticky sessions) for multi-step flows like login + cart
  4. Proxy failover logic with exponential backoff on 429/403
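
Steps 3 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `send` is a hypothetical callable that performs the request and returns the HTTP status code (in practice it would wrap an httpx call), and the base delay, cap, and attempt count are arbitrary defaults.

```python
import hashlib
import random
import time
from typing import Callable, Sequence

RETRYABLE = {403, 429}  # statuses treated as block signals


def sticky_proxy(session_id: str, proxies: Sequence[str]) -> str:
    """Map a session ID to the same proxy every time (session pinning)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return proxies[int.from_bytes(digest[:8], "big") % len(proxies)]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound on the wait after 0-based attempt: base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def fetch_with_failover(
    url: str,
    proxies: Sequence[str],
    send: Callable[[str, str], int],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Rotate to the next proxy after each 429/403, backing off between tries.

    Returns (status, proxy_used) for the first response that is not blocked.
    """
    if not proxies:
        raise ValueError("proxy pool is empty")
    for attempt in range(max_attempts):
        proxy = proxies[attempt % len(proxies)]  # fail over to the next proxy
        status = send(url, proxy)
        if status not in RETRYABLE:
            return status, proxy
        if attempt < max_attempts - 1:
            # Random jitter keeps many workers from retrying in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt)))
    raise RuntimeError(f"all {max_attempts} attempts were blocked for {url}")
```

Multi-step flows would call `sticky_proxy` once per session and reuse the result, while stateless fetches go through `fetch_with_failover`. Passing `sleep` as a parameter keeps the function easy to test without real delays.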

For Cloudflare-protected targets specifically, your proxy tier matters more than your scraper framework. Deploying Scrapers on Cloudflare Workers 2026: Limits and Workarounds covers why CF Workers aren’t a clean bypass for this problem either.

If you’re region-pinning requests (scraping country-specific SERPs, geo-gated content), you need proxies with confirmed residential IPs in the target country, not datacenter IPs on a Hetzner Falkenstein node. Deploying Scrapers on Fly.io 2026: Region-Pinned Workers is worth reading if your use case demands actual VM placement in target regions, rather than just proxy-level geo-routing.

Monitoring and Alerting

Minimal production monitoring for a Hetzner scraper node:

  • Prometheus + Grafana for CPU, memory, and task throughput (Hetzner’s built-in metrics are read-only snapshots, not real-time)
  • Celery Flower on a private port for queue depth and worker status
  • Alertmanager webhook to Telegram or Slack on: queue depth over threshold, worker crash, disk over 80%

For error tracking at the task level, Sentry’s free tier is enough while scraper exception volume stays within its event quota (check the current limit before relying on it). Above that, self-host Sentry on a second small VM or switch to structured logging into a Loki instance.

Key metrics to watch:

  • Proxy error rate (429/403/CAPTCHA) as a percentage of total requests
  • Task retry rate (indicates target difficulty or proxy quality degradation)
  • Queue drain time (lag between enqueue and completion)
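
The three metrics above reduce to simple ratios and a time difference. The sketch below shows one way to compute them from raw counters. The `ScrapeStats` fields are hypothetical names chosen for this example, and a production setup would more likely expose the counters to Prometheus and compute the rates there.

```python
from dataclasses import dataclass


@dataclass
class ScrapeStats:
    total_requests: int
    blocked_responses: int  # 429, 403, or CAPTCHA pages
    tasks_started: int
    task_retries: int


def proxy_error_rate(stats: ScrapeStats) -> float:
    """Blocked responses as a percentage of all requests."""
    if stats.total_requests == 0:
        return 0.0
    return 100 * stats.blocked_responses / stats.total_requests


def task_retry_rate(stats: ScrapeStats) -> float:
    """Retries as a percentage of started tasks."""
    if stats.tasks_started == 0:
        return 0.0
    return 100 * stats.task_retries / stats.tasks_started


def queue_drain_seconds(enqueued_at: float, completed_at: float) -> float:
    """Lag between enqueue and completion, given two Unix timestamps."""
    return completed_at - enqueued_at


if __name__ == "__main__":
    # Made-up counters for illustration.
    stats = ScrapeStats(total_requests=2000, blocked_responses=150,
                        tasks_started=500, task_retries=40)
    print(proxy_error_rate(stats), task_retry_rate(stats))
```

A rising proxy error rate with a flat retry rate usually points at the proxy pool, while both rising together suggests the target has changed its defenses.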

Bottom Line

For pure cost-per-compute-hour on persistent scraping workloads in 2026, Hetzner is hard to beat. Start on a CX32 (€8.49/month) with Celery + Redis, route all egress through a residential proxy pool, and add monitoring before you scale to multiple nodes, not after. DRT covers the full deployment landscape including serverless and edge alternatives, but for teams that want one always-on node and predictable bills, Hetzner is the answer.
