Building a custom rotating proxy pool with Squid in 2026

Building a custom rotating proxy pool with Squid is one of the cheapest ways to gate egress traffic across a fleet of upstream proxies without paying for a premium gateway. You buy 50 residential or mobile proxies from various providers, point Squid at all of them as parent peers, expose a single internal port to your scraping fleet, and let Squid rotate which upstream takes each request. Done correctly, this gives you per-customer rate limiting, per-target proxy affinity, automatic failover when an upstream dies, and the ability to drop in or remove upstream peers without touching your scraper code.

This guide walks through Squid 6.x on Ubuntu 24.04 (22.04 ships Squid 5.x in its stock archive, and most of what follows applies there too), the upstream peer configuration that does rotation and failover, ACLs that keep your pool from being abused, monitoring that actually works, and a Python client that integrates cleanly. Every config snippet is production-tested and the Squid ACL choices reflect lessons from running scraper farms that survived three years of growth.

Why Squid for proxy pooling

Squid is overkill for many use cases but well-suited for proxy pooling specifically because:

  • It supports multiple parent peers with rotation and failover
  • ACL system is mature and powerful
  • Logging is verbose enough for forensic debugging
  • Performance is excellent (handles 10k+ requests/second on modest hardware)
  • Free and battle-tested since 1996
  • Well-documented, with the official wiki covering most edge cases

What Squid does not do well: SOCKS5 forwarding (Squid is HTTP-first), per-request authentication to different upstreams, or websocket forwarding. For SOCKS5 use 3proxy or a custom Go service. For websockets use a forward proxy designed for it.

Architecture overview

   scrapers (multiple servers)
         |
         | HTTP CONNECT / GET on internal port
         v
   +-----------+
   |   Squid   |  <- rotates across upstream peers
   +-----------+
         |
   +-----+-----+--------+----------+
   |     |     |        |          |
  ISP1  ISP2 mobile1 residential  ...  (upstream proxies)
   |     |     |        |
   +--> internet

Scrapers connect to one Squid endpoint (port 3128 by default). Squid picks an upstream peer per request, forwards through it, and returns the response. Scrapers see a stable single endpoint while Squid handles rotation and failover internally.

Installing Squid 6.x on Ubuntu

sudo apt-get update
sudo apt-get install -y squid

# Verify version (6.x on Ubuntu 24.04; stock 22.04 packages report 5.x)
squid -v | head -1

For the latest Squid features, especially TLS interception and improved peer health checks, use Ubuntu 24.04 or build from source.

Base configuration

Replace /etc/squid/squid.conf with this base. We will layer on parent peers and ACLs after.

# /etc/squid/squid.conf — base scraper proxy pool

http_port 3128

# Logging
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log

# Cache disabled (this is a forward proxy pool, not an HTTP cache)
cache deny all
cache_store_log none

# DNS (the old dns_v4_first directive is obsolete since Squid 5 and is omitted here)
positive_dns_ttl 1 hour
negative_dns_ttl 1 minute

# Connection limits
client_lifetime 1 hour
read_timeout 60 seconds
connect_timeout 30 seconds
request_timeout 30 seconds

# Header forwarding hygiene (do not leak client IP via X-Forwarded-For)
forwarded_for delete
via off

# Reject CONNECT to non-standard ports
acl SSL_ports port 443
acl Safe_ports port 80
acl Safe_ports port 443
acl CONNECT method CONNECT
http_access deny CONNECT !SSL_ports
http_access deny !Safe_ports

The key choices: cache disabled (we are forwarding, not caching), forwarded_for delete (do not leak client IPs to upstreams), via off (do not advertise Squid in headers).

Adding upstream peers

Each upstream proxy is a cache_peer directive. The example below adds five parent peers with round-robin selection and per-peer authentication.

# /etc/squid/squid.conf — parent peer pool

# Parent peers: cache_peer <host> <type> <http-port> <icp-port> [options]
cache_peer proxy1.provider.com parent 8080 0 round-robin no-query \
    login=user1:pass1 name=peer_1
cache_peer proxy2.provider.com parent 8080 0 round-robin no-query \
    login=user2:pass2 name=peer_2
cache_peer proxy3.provider.com parent 8080 0 round-robin no-query \
    login=user3:pass3 name=peer_3
cache_peer proxy4.provider.com parent 8080 0 round-robin no-query \
    login=user4:pass4 name=peer_4
cache_peer proxy5.provider.com parent 8080 0 round-robin no-query \
    login=user5:pass5 name=peer_5

# Force all traffic through parent peers (never go direct)
never_direct allow all

Options explained:

  • parent: this is an upstream HTTP proxy
  • 8080: port to connect to on upstream
  • 0: ICP port (we set to 0 because no ICP)
  • round-robin: rotate across peers with this option in round-robin
  • no-query: do not send ICP queries
  • login=user:pass: HTTP Basic auth credentials for the upstream
  • name=peer_N: human-readable name for logs and metrics

The never_direct allow all line is critical. Without it, Squid is free to connect to the target directly, both when all peers are down and for requests it treats as non-hierarchical (which can include CONNECT), and that leaks your origin IP. With it, requests fail when peers are exhausted, which is the safer behavior for scrapers.
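With 50 upstreams, typing each cache_peer line by hand invites typos. The sketch below renders the lines from a list instead. The Upstream class and render_cache_peers function are hypothetical names for this illustration, and the sketch assumes credentials contain no whitespace or characters that need escaping in squid.conf.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    """One upstream proxy (hypothetical helper type for this sketch)."""
    host: str
    port: int
    user: str
    password: str


def render_cache_peers(upstreams, strategy="round-robin"):
    """Return one cache_peer line per upstream, named peer_1..peer_N."""
    lines = []
    for index, up in enumerate(upstreams, start=1):
        lines.append(
            f"cache_peer {up.host} parent {up.port} 0 {strategy} no-query "
            f"login={up.user}:{up.password} name=peer_{index}"
        )
    return "\n".join(lines) + "\n"


pool = [
    Upstream("proxy1.provider.com", 8080, "user1", "pass1"),
    Upstream("proxy2.provider.com", 8080, "user2", "pass2"),
]
print(render_cache_peers(pool), end="")
```

Redirect the output into a file that squid.conf pulls in, and run squid -k parse before reloading so a bad line never reaches the running instance.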

Reload and verify

sudo squid -k parse  # syntax check
sudo systemctl reload squid

# Test by curling through it
curl -x http://localhost:3128 https://httpbin.org/ip
# Expected: an IP from one of your upstream pools, not your origin

If you see your origin IP, never_direct is not working. Check that the line is present and Squid has been reloaded.
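A single curl only shows one egress IP. To confirm that rotation is actually happening, a small script can repeat the request and count the distinct IPs. This is a minimal sketch using only the standard library; SQUID_URL, fetch_egress_ip, summarize, and main are names chosen for this illustration, and it assumes Squid listens on localhost:3128 and that httpbin.org is reachable through your upstreams.

```python
import json
import urllib.request
from collections import Counter

SQUID_URL = "http://localhost:3128"  # assumption: Squid listens locally on 3128


def fetch_egress_ip(proxy_url=SQUID_URL, timeout=15):
    """Ask httpbin.org which IP the request arrived from, via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    with opener.open("https://httpbin.org/ip", timeout=timeout) as resp:
        return json.loads(resp.read())["origin"]


def summarize(ips):
    """Count how often each egress IP appeared."""
    return Counter(ips)


def main(samples=10):
    """Fetch the egress IP several times and print the distribution."""
    seen = [fetch_egress_ip() for _ in range(samples)]
    for ip, count in summarize(seen).most_common():
        print(f"{ip}: {count}")
```

Call main() from a host that is allowed to use the proxy. With five round-robin peers you would expect several distinct IPs across ten samples; a single repeated IP suggests only one peer is alive or selected.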

Rotation strategies

Squid supports several peer selection strategies via cache_peer options:

option                behavior
round-robin           each request goes to the next peer in order
weighted-round-robin  round-robin weighted by measured peer round-trip time, with weight= as an additional bias
carp                  consistent hash of the request URL (same URL always goes to the same peer)
userhash              consistent hash of the client's proxy_auth username
sourcehash            consistent hash of the client source IP

For most scraper pools, round-robin is the sensible starting point and gives even distribution. When you want the same target site to always egress through the same proxy (some sites set IP-bound cookies), use carp:

cache_peer proxy1.provider.com parent 8080 0 carp no-query login=user1:pass1
cache_peer proxy2.provider.com parent 8080 0 carp no-query login=user2:pass2
# ... etc

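CARP hashes each request URL and consistently picks the same peer for the same URL. For HTTPS, the URL Squid sees in a CONNECT request is just host:port, so every tunnel to the same site lands on the same proxy, which avoids tripping IP-binding checks. For plain HTTP the full URL is hashed, so different pages on one site can land on different peers; adding carp-key=host to each cache_peer line narrows the hash to the hostname.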
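The useful property of this kind of hashing is that removing one peer only remaps the keys that were on that peer. The sketch below illustrates the idea with highest-score (rendezvous) hashing, the same family CARP belongs to. It uses SHA-256 for scoring, which is an assumption for illustration and not the hash function Squid itself uses; pick_peer and the peer names are hypothetical.

```python
import hashlib


def pick_peer(key, peers):
    """Return the peer with the highest hash score for this key."""
    def score(peer):
        digest = hashlib.sha256(f"{peer}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(peers, key=score)


peers = ["peer_1", "peer_2", "peer_3", "peer_4"]
keys = [f"shop{i}.example.com:443" for i in range(200)]

# Map every key with the full pool, then again with peer_3 removed
before = {k: pick_peer(k, peers) for k in keys}
survivors = [p for p in peers if p != "peer_3"]
after = {k: pick_peer(k, survivors) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
on_peer_3 = [k for k in keys if before[k] == "peer_3"]
print(f"keys that changed peer: {len(moved)}")
print(f"keys that were on peer_3: {len(on_peer_3)}")
```

The two printed counts are equal: only traffic that belonged to the removed peer moves, so sessions pinned to the surviving peers keep their egress IP when one upstream drops out.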

Failover and health checks

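Squid tracks connection failures to each parent and stops selecting a peer once it looks dead. With no-query there are no ICP pings, so detection relies on failed TCP connections: a peer is marked dead after a run of consecutive connect failures (10 by default; add connect-fail-limit=3 to a cache_peer line to lower it to 3). The timeouts that govern this:

# Timeouts that govern peer failure detection
connect_timeout 30 seconds
peer_connect_timeout 10 seconds
dead_peer_timeout 5 minutes

While a peer is marked dead, Squid skips it for normal traffic and periodically retries a connection to it. If the peer responds, it is brought back into the pool. Note that dead_peer_timeout applies to peers queried over ICP, so it has little effect on no-query parents.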

For more aggressive failover, write a custom health check that probes upstreams every minute and rewrites the Squid config. Sketch:

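#!/usr/bin/env python3
# squid_peer_healthcheck.py
# Probe each upstream directly and rewrite peers.conf with the healthy ones.
import os
import subprocess
import tempfile

import requests

PEERS_CONF = "/etc/squid/peers.conf"

# (name, host, port, user, password); names are fixed so logs and metrics
# keep the same label for a peer when other peers drop out
PEERS = [
    ("peer_1", "proxy1.provider.com", 8080, "user1", "pass1"),
    ("peer_2", "proxy2.provider.com", 8080, "user2", "pass2"),
    # ...
]

def check_peer(name, host, port, user, password):
    """Return True if a request through this upstream succeeds."""
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    try:
        r = requests.get(
            "https://httpbin.org/ip",
            proxies={"https": proxy_url},
            timeout=10,
        )
        return r.status_code == 200
    except requests.RequestException:
        return False

def render_config(peers):
    """Build the cache_peer lines for the given peers."""
    return "".join(
        f"cache_peer {host} parent {port} 0 round-robin no-query "
        f"login={user}:{password} name={name}\n"
        for name, host, port, user, password in peers
    )

def write_if_changed(path, content):
    """Atomically replace path; return True only if the content changed."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; Squid must be able to read it
    os.replace(tmp, path)
    return True

def reload_squid():
    subprocess.run(["sudo", "systemctl", "reload", "squid"], check=True)

def main():
    healthy = [p for p in PEERS if check_peer(*p)]
    print(f"{len(healthy)}/{len(PEERS)} peers healthy")
    if not healthy:
        # Every probe failed: more likely the probe target or the local
        # network than all upstreams at once, so keep the current pool
        return
    # Reload only when the peer set actually changed
    if write_if_changed(PEERS_CONF, render_config(healthy)):
        reload_squid()

if __name__ == "__main__":
    main()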

Run via cron every minute. Move the cache_peer lines out of squid.conf so each peer is defined only once, and include the generated file instead:

include /etc/squid/peers.conf

This pattern auto-removes failing peers without manual intervention.

ACL hygiene

The base config above has no rule limiting who may connect, so any client that can reach port 3128 can use the pool. For a scraper pool, restrict access to your scraper IPs:

# Define allowed clients
acl scraper_clients src 10.0.0.0/8 192.168.0.0/16 172.16.0.0/12

# Allow only those IPs
http_access allow scraper_clients
http_access deny all

For authenticated client access (when scrapers run on internet IPs), use HTTP Basic auth:

auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic realm "Scraper Proxy Pool"

acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all

Generate the password file (htpasswd comes from the apache2-utils package):

sudo htpasswd -c /etc/squid/passwd scraper_user
sudo chown proxy:proxy /etc/squid/passwd
sudo chmod 640 /etc/squid/passwd

Per-tenant rate limiting

If you serve multiple internal teams from one Squid pool, rate limit per team:

# Define teams by subnet
acl team_a src 10.1.0.0/16
acl team_b src 10.2.0.0/16

# Delay pools (rate limiting)
delay_pools 2

# Pool 1: team_a, 10 MB/s aggregate, 1 MB/s per host
delay_class 1 2
delay_parameters 1 10000000/10000000 1000000/1000000
delay_access 1 allow team_a
delay_access 1 deny all

# Pool 2: team_b, 5 MB/s aggregate
delay_class 2 1
delay_parameters 2 5000000/5000000
delay_access 2 allow team_b
delay_access 2 deny all

This caps total bandwidth per team, useful when one runaway scraper would otherwise saturate your upstream contracts.
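Each delay_parameters pair is restore/max: a bucket that holds at most max bytes and refills at restore bytes per second. The simulation below is a rough model of that behavior in one-second ticks, not Squid's implementation. simulate_bucket is a hypothetical name, and the sketch assumes the bucket starts full, whereas in Squid the starting level is governed by delay_initial_bucket_level.

```python
def simulate_bucket(restore, maximum, demand_per_second):
    """Return bytes actually delivered in each one-second tick."""
    level = maximum  # assumption for this sketch: the bucket starts full
    delivered = []
    for demand in demand_per_second:
        sent = min(demand, level)          # cannot send more than the bucket holds
        delivered.append(sent)
        level = min(maximum, level - sent + restore)  # refill, capped at maximum
    return delivered


demand = [8_000_000] * 3  # a team trying to pull 8 MB/s for three seconds

# Pool 2 as configured above: 5 MB/s restore, 5 MB bucket
print(simulate_bucket(5_000_000, 5_000_000, demand))

# Same restore rate with a 10 MB bucket: an initial burst, then the steady cap
print(simulate_bucket(5_000_000, 10_000_000, demand))
```

With restore equal to max, as in the config above, throughput is flat at the restore rate. Raising max while keeping restore the same permits short bursts without changing the sustained cap.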

Logging

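Default Squid log format is fine for debugging. For monitoring, switch to a structured log format, replacing the access_log line from the base config with the one below so requests are not logged twice: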

logformat scraperjson { \
    "ts": "%ts.%03tu", \
    "client_ip": "%>a", \
    "duration_ms": %tr, \
    "status": "%>Hs", \
    "bytes_sent": %<st, \
    "method": "%rm", \
    "url": "%ru", \
    "peer_used": "%<a", \
    "user_agent": "%{User-Agent}>h" \
}

access_log /var/log/squid/access.log scraperjson

Then ingest via Vector, Fluent Bit, or a similar log shipper into Loki or Elasticsearch for querying. Common queries:

  • Per-peer success rate: count of status=200 group by peer_used
  • Per-team bandwidth: sum of bytes_sent group by client_ip block
  • Slow upstreams: p95 of duration_ms group by peer_used

These queries highlight which upstream peers are degrading and which scrapers are hammering the pool.
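Before a log pipeline is in place, the same per-peer numbers can be pulled straight from the access log. The sketch below assumes the scraperjson format defined above, with one JSON object per line; peer_stats is a hypothetical helper, and p95 is computed with the simple nearest-rank method.

```python
import json
import math
from collections import defaultdict


def peer_stats(lines):
    """Aggregate request count, success rate and p95 duration per peer."""
    durations = defaultdict(list)
    ok = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        peer = entry["peer_used"]
        durations[peer].append(entry["duration_ms"])
        if str(entry["status"]).startswith("2"):
            ok[peer] += 1
    stats = {}
    for peer, values in durations.items():
        values.sort()
        rank = max(1, math.ceil(0.95 * len(values)))  # nearest-rank p95
        stats[peer] = {
            "requests": len(values),
            "success_rate": ok[peer] / len(values),
            "p95_ms": values[rank - 1],
        }
    return stats


def report(path="/var/log/squid/access.log"):
    """Print one summary line per peer for the given log file."""
    with open(path) as f:
        for peer, s in sorted(peer_stats(f).items()):
            print(f"{peer}: {s['requests']} reqs, "
                  f"{s['success_rate']:.1%} ok, p95 {s['p95_ms']} ms")
```

One caveat when reading the numbers: for HTTPS, Squid logs the CONNECT tunnel rather than the requests inside it, so status reflects whether the tunnel was established and duration_ms covers the lifetime of the tunnel.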

Monitoring with Prometheus

Squid exposes runtime stats via squidclient mgr:info. Convert to Prometheus metrics with a small exporter:

# squid_exporter.py
from prometheus_client import start_http_server, Gauge
import subprocess
import re
import time

requests_total = Gauge("squid_requests_total", "Total requests", ["state"])
peer_status = Gauge("squid_peer_status", "Peer up/down", ["peer"])

def parse_squid_info():
    out = subprocess.check_output(["squidclient", "mgr:info"]).decode()
    # Parse lines like "Number of HTTP requests received: 12345"
    m = re.search(r"Number of HTTP requests received:\s+(\d+)", out)
    if m:
        requests_total.labels(state="received").set(int(m.group(1)))

def parse_peer_status():
    out = subprocess.check_output(["squidclient", "mgr:server_list"]).decode()
    # Each peer block has a "Host : host/http-port/icp-port" line followed by
    # a "Status : Up" or "Status : Down" line; check your version's output
    for match in re.finditer(r"Host\s*:\s*([^/\s]+).*?Status\s*:\s*(\w+)", out, re.DOTALL):
        peer, status = match.group(1), match.group(2)
        peer_status.labels(peer=peer).set(1 if status.lower() in ("up", "alive") else 0)

if __name__ == "__main__":
    start_http_server(9301)
    while True:
        parse_squid_info()
        parse_peer_status()
        time.sleep(15)

Scrape from Prometheus and visualize peer health in Grafana. Alert when any peer flips to dead or aggregate request rate drops below baseline.

Python client integration

Scrapers integrate with the Squid pool by setting it as the HTTPS proxy. Most clients support this via standard environment variables or per-call config.

For curl_cffi:

from curl_cffi import requests

resp = requests.get(
    "https://target.example.com",
    impersonate="chrome124",
    proxies={
        "http": "http://scraper_user:secret@squid.internal:3128",
        "https": "http://scraper_user:secret@squid.internal:3128",
    },
)

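For Playwright (shown with patchright, a drop-in Playwright fork; the same code works with from playwright.async_api import async_playwright):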

from patchright.async_api import async_playwright

async def fetch():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={
                "server": "http://squid.internal:3128",
                "username": "scraper_user",
                "password": "secret",
            },
        )
        # ...

The single Squid endpoint masks the entire upstream pool from your scraper code. Add or remove upstream peers without changing scraper config.

Comparison: Squid vs alternatives

tool                                        strength                   weakness
Squid                                       mature, ACLs, monitoring   HTTP-first, complex config
HAProxy                                     very fast, modern config   less ACL flexibility for proxy use
Nginx (with stream module)                  fast, widely known         weaker for proxy pooling specifically
3proxy                                      SOCKS5 support             smaller community, less feature-rich
custom Go service                           exactly what you want      maintenance burden
commercial gateway (Bright Data, Oxylabs)   zero infra                 expensive at scale

For HTTP/HTTPS proxy pooling, Squid is the right pick. For SOCKS5, 3proxy or a custom Go service. For very high throughput, HAProxy or a custom service.

For broader proxy infrastructure patterns, see self-hosted proxy infrastructure: complete 2026 guide.

Operational considerations

A Squid pool in production needs:

  • Hardware: 4 vCPU, 8 GB RAM handles 10k req/s for typical scraper traffic
  • Bandwidth: budget for outgoing + incoming, rule of thumb 1-2 Mbit/s per active scraper worker
  • Disk: 100 GB for logs (rotate daily, keep 30 days)
  • Monitoring: peer status, request rate, error rate, p95 latency
  • Alerting: any peer dead, error rate >5%, latency p95 >1s
  • Backup: config and password file in version control
  • Security: ACL restricting access, never expose 3128 to the internet

For scaling, run multiple Squid instances behind a load balancer (HAProxy or a TCP load balancer). Each Squid instance has the same peer pool, so requests distribute evenly across the cluster.

Common failure modes

  • All peers marked dead: usually a transient upstream provider issue or DNS failure. Check upstream connectivity directly.
  • High latency: an upstream is slow. Check per-peer p95 latency in your logs.
  • High error rate: upstream is degraded. Health check script should auto-remove.
  • Connection refused: Squid is not running or wrong port. Check systemctl.
  • 407 Proxy Auth Required: the client sent no credentials or wrong ones, or an upstream rejected the login= credentials. Check auth_param, the password file, and http_access order.
  • 403 Forbidden from Squid itself: the request hit a Squid-side ACL. Check http_access rules.
  • Memory growing without bound: a buggy Squid version or misconfigured cache. Check that cache deny all is set.

For OWASP guidance on proxy security, see the OWASP Proxy security cheat sheet.

Operational checklist

  • Squid 6.x on Ubuntu 22.04 or 24.04
  • Cache disabled, forwarded_for delete, via off
  • Parent peers with login auth and round-robin
  • never_direct allow all
  • ACL restricting client access
  • Health check cron rotating peers in/out
  • Structured JSON access log
  • Prometheus exporter + Grafana dashboard
  • Alerting on peer status and error rate
  • Config in version control
  • Multiple Squid instances behind LB for scale

FAQ

Q: how many upstream peers do I need?
Match your concurrency. If you run 100 concurrent scraper workers and each makes 1 request/second, you have 100 req/s. With 5 peers in round-robin, each peer handles 20 req/s. Most residential proxy plans handle that comfortably. For high-volume work, 20-50 peers gives you headroom and failover capacity.

Q: can Squid rotate proxies on every request to a single domain?
Yes with round-robin. Squid picks the next peer per request regardless of destination. For per-domain affinity (same domain always uses same peer), use carp.

Q: does Squid support SOCKS5 upstreams?
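No. Mainline Squid, including 6.x, has no cache_peer option for SOCKS5 upstreams; connection-auth controls forwarding of connection-oriented authentication such as NTLM and is unrelated to SOCKS. If your upstreams only speak SOCKS5, put a converter such as 3proxy or a custom Go service in front of them so Squid sees an HTTP proxy, or use 3proxy for the whole pool.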

Q: how do I add or remove peers without restart?
Edit /etc/squid/peers.conf and reload Squid: sudo systemctl reload squid. Reload is graceful and does not drop active connections.

Q: can I use Squid for HTTPS interception?
Yes with SSL bumping, but it requires installing a CA certificate on every client. For scrapers, this is usually unnecessary because you do not need to inspect HTTPS content, just forward it. Stick to CONNECT method for HTTPS.

Common pitfalls in production Squid pools

The first failure mode is silent peer exhaustion under high concurrency. Squid’s cache_peer with round-robin does not automatically queue requests when peers are busy: it picks the next peer in sequence regardless of that peer’s current load. If your scraper fleet sends 500 concurrent requests through 5 peers, each peer receives 100 simultaneous connections. Most residential providers cap concurrent connections at 50-100 per credential set, so half your requests stall or fail with upstream timeouts. The fix is to add max-conn=N to each cache_peer line so Squid stops sending new requests to peers at their capacity:

cache_peer proxy1.provider.com parent 8080 0 round-robin no-query \
    login=user1:pass1 name=peer_1 max-conn=50
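The arithmetic behind choosing max-conn and the pool size is simple enough to script. The sketch below assumes load spreads evenly across peers, as round-robin aims for; per_peer_load and peers_needed are hypothetical helper names.

```python
import math


def per_peer_load(concurrent_requests, peer_count):
    """Connections each peer sees if load spreads evenly across the pool."""
    return math.ceil(concurrent_requests / peer_count)


def peers_needed(concurrent_requests, max_conn_per_peer, spare_peers=0):
    """Peers required to stay under a per-peer connection cap, plus spares."""
    return math.ceil(concurrent_requests / max_conn_per_peer) + spare_peers


# The scenario from the paragraph above: 500 concurrent requests, 5 peers
print(per_peer_load(500, 5))

# Peers needed to keep each one at or below max-conn=50, with 2 spares
print(peers_needed(500, 50, spare_peers=2))
```

The first call gives 100 connections per peer, double a 50-connection cap. The second shows that 10 peers carry the load at max-conn=50, and the spares leave room for a couple of upstreams to fail without pushing the rest over their cap.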

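The second pitfall is looking for DNS leakage in the wrong place. Many residential proxy providers route by hostname (their internal load balancer chooses an exit IP based on the destination domain), so an upstream that receives a pre-resolved IP instead of a hostname may skip routing optimizations or return 502s. In explicit forward-proxy mode, which is what this guide configures, Squid forwards the hostname the client asked for to the parent peer rather than a locally resolved IP. Pre-resolved destinations become a concern mainly when Squid intercepts traffic transparently, which is where host_verify_strict and client_dst_passthru apply; ACLs of type dst also trigger a local DNS lookup on the Squid host. If you ever add an intercept port, these settings keep Squid from pinning requests to the original destination IP: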

host_verify_strict off
client_dst_passthru off
forwarded_for delete

For HTTP CONNECT (HTTPS tunneling), Squid passes the original host to the upstream by default. Verify by checking the upstream’s logs that they receive CONNECT example.com:443 rather than CONNECT 93.184.216.34:443.

The third pitfall is Squid’s stale connection pool to dead upstreams. When a peer has been marked dead, Squid periodically retries it. If the peer is still dead, Squid waits for peer_connect_timeout (30 seconds by default) before failing the request. During that wait, requests through that peer hang, and your scrapers see latency spikes of up to 30 seconds. Set peer_connect_timeout 5 seconds to fail fast on dead peers, and pair with the active health check script that runs every 60 seconds to remove dead peers from the config entirely:

peer_connect_timeout 5 seconds
connect_timeout 15 seconds

Real-world example: 95th percentile latency tuning

A scraper farm running 80 concurrent workers through a Squid pool of 12 residential upstreams saw average latency of 800ms but p95 of 14 seconds. Investigation showed two of the 12 upstreams were experiencing intermittent 10-15 second response delays during certain hours, but Squid’s round-robin still routed traffic to them. The fix combined three changes: first, lowering peer_connect_timeout from 30s to 5s so dead-feeling peers were skipped faster; second, adding max-conn=40 on each peer so a slow peer did not accumulate stuck connections; third, switching from round-robin to weighted-round-robin and assigning the two slow peers weight=1 while the fast peers got weight=10:

cache_peer fast_proxy1.provider.com parent 8080 0 weighted-round-robin no-query \
    login=user1:pass1 name=fast_1 weight=10 max-conn=40
cache_peer slow_proxy1.provider.com parent 8080 0 weighted-round-robin no-query \
    login=user2:pass2 name=slow_1 weight=1 max-conn=40

After deployment, p95 latency dropped from 14s to 2.1s within an hour and p99 from 22s to 4.3s. Each slow peer still received roughly one tenth of a fast peer's share of traffic (useful for monitoring whether it recovered) without dragging down the overall pool experience. The lesson: Squid’s defaults assume homogeneous peers, but real-world residential pools are heterogeneous and need weight-based shaping plus aggressive timeouts to keep the tail latency bounded.
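To see what those weights mean for the pool as a whole, the expected split can be computed directly. This sketch assumes selection is proportional to weight alone, which is a simplification: weighted-round-robin in Squid can also factor in measured round-trip time. traffic_shares is a hypothetical helper, and the peer names extend the fast_N and slow_N pattern from the config above.

```python
def traffic_shares(weights):
    """Return each peer's expected fraction of requests, given its weight."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# 10 fast peers at weight=10 and 2 slow peers at weight=1, as in the example
weights = {f"fast_{i}": 10 for i in range(1, 11)}
weights.update({"slow_1": 1, "slow_2": 1})

shares = traffic_shares(weights)
slow_total = shares["slow_1"] + shares["slow_2"]
print(f"each fast peer: {shares['fast_1']:.2%}")
print(f"each slow peer: {shares['slow_1']:.2%}")
print(f"both slow peers combined: {slow_total:.2%}")
```

Under that assumption each slow peer carries about 1% of requests and the pair about 2%, enough to keep observing their latency while limiting how often a scraper lands on one.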

Wrapping up

A Squid-based rotating proxy pool gives you per-tenant control, automatic failover, and simple integration with any scraping client for the cost of a small VM and 50 lines of config. It pays for itself within weeks compared to commercial gateway pricing if you have moderate volume. Pair this with our self-hosted proxy infrastructure and best residential proxy providers 2026 guides for the full stack, and browse the dev-tools-projects category on DRT for related infrastructure deep-dives.
