Building a custom rotating proxy pool with Squid in 2026
Building a custom rotating proxy pool with Squid is one of the cheapest ways to gate egress traffic across a fleet of upstream proxies without paying for a premium gateway. You buy 50 residential or mobile proxies from various providers, point Squid at all of them as parent peers, expose a single internal port to your scraping fleet, and let Squid rotate which upstream takes each request. Done correctly, this gives you per-customer rate limiting, per-target proxy affinity, automatic failover when an upstream dies, and the ability to drop in or remove upstream peers without touching your scraper code.
This guide walks through Squid 6.x on Ubuntu 22.04, the upstream peer configuration that does rotation and failover, ACLs that keep your pool from being abused, monitoring that actually works, and a Python client that integrates cleanly. Every config snippet is production-tested and the Squid ACL choices reflect lessons from running scraper farms that survived three years of growth.
Why Squid for proxy pooling
Squid is overkill for many use cases but well-suited for proxy pooling specifically because:
- It supports multiple parent peers with rotation and failover
- ACL system is mature and powerful
- Logging is verbose enough for forensic debugging
- Performance is excellent (handles 10k+ requests/second on modest hardware)
- Free and battle-tested since 1996
- Well-documented, with the official wiki covering most edge cases
What Squid does not do well: SOCKS5 forwarding (Squid is HTTP-first), per-request authentication to different upstreams, or websocket forwarding. For SOCKS5 use 3proxy or a custom Go service. For websockets use a forward proxy designed for it.
Architecture overview
scrapers (multiple servers)
        |
        |  HTTP CONNECT / GET on internal port
        v
  +-----------+
  |   Squid   |  <- rotates across upstream peers
  +-----------+
        |
  +-----+-----+--------+----------+
  |     |     |        |          |
 ISP1  ISP2 mobile1 residential  ...  (upstream proxies)
  |     |     |        |
  +-----+-----+--------+--> internet
Scrapers connect to one Squid endpoint (port 3128 by default). Squid picks an upstream peer per request, forwards through it, and returns the response. Scrapers see a stable single endpoint while Squid handles rotation and failover internally.
Installing Squid 6.x on Ubuntu
sudo apt-get update
sudo apt-get install -y squid
# Verify version (Ubuntu 24.04 ships 6.x; stock 22.04 ships 5.x, so 6.x there needs a backport or source build)
squid -v | head -1
For the latest Squid features, especially TLS interception and improved peer health checks, use Ubuntu 24.04 or build from source.
Base configuration
Replace /etc/squid/squid.conf with this base. We will layer on parent peers and ACLs after.
# /etc/squid/squid.conf — base scraper proxy pool
http_port 3128
# Logging
access_log /var/log/squid/access.log squid
cache_log /var/log/squid/cache.log
# Cache disabled (this is a forward proxy pool, not an HTTP cache)
cache deny all
cache_store_log none
# DNS
# dns_v4_first was removed in Squid 5; Squid 6 chooses between IPv4 and IPv6 addresses on its own
positive_dns_ttl 1 hour
negative_dns_ttl 1 minute
# Connection limits
client_lifetime 1 hour
read_timeout 60 seconds
connect_timeout 30 seconds
request_timeout 30 seconds
# Header forwarding hygiene (do not leak client IP via X-Forwarded-For)
forwarded_for delete
via off
# Reject CONNECT to non-standard ports
acl SSL_ports port 443
acl Safe_ports port 80
acl Safe_ports port 443
acl CONNECT method CONNECT
http_access deny CONNECT !SSL_ports
http_access deny !Safe_ports
The key choices: cache disabled (we are forwarding, not caching), forwarded_for delete (do not leak client IPs to upstreams), via off (do not advertise Squid in headers).
Adding upstream peers
Each upstream proxy is a cache_peer directive. The example below adds five parent peers with round-robin selection and per-peer authentication.
# /etc/squid/squid.conf — parent peer pool
# Parent peers: format is hostname type http-port icp-port options
cache_peer proxy1.provider.com parent 8080 0 round-robin no-query \
login=user1:pass1 name=peer_1
cache_peer proxy2.provider.com parent 8080 0 round-robin no-query \
login=user2:pass2 name=peer_2
cache_peer proxy3.provider.com parent 8080 0 round-robin no-query \
login=user3:pass3 name=peer_3
cache_peer proxy4.provider.com parent 8080 0 round-robin no-query \
login=user4:pass4 name=peer_4
cache_peer proxy5.provider.com parent 8080 0 round-robin no-query \
login=user5:pass5 name=peer_5
# Force all traffic through parent peers (never go direct)
never_direct allow all
Options explained:
- parent: this is an upstream HTTP proxy
- 8080: port to connect to on the upstream
- 0: ICP port (set to 0 because ICP is not used)
- round-robin: rotate across all peers that carry this option
- no-query: do not send ICP queries
- login=user:pass: HTTP Basic auth credentials for the upstream
- name=peer_N: human-readable name for logs and metrics
The never_direct allow all line is critical. Without it, Squid falls back to direct connection if all peers are down, which leaks your origin IP. With it, requests fail when peers are exhausted, which is the safer behavior for scrapers.
Reload and verify
sudo squid -k parse # syntax check
sudo systemctl reload squid
# Test by curling through it
curl -x http://localhost:3128 https://httpbin.org/ip
# Expected: an IP from one of your upstream pools, not your origin
If you see your origin IP, never_direct is not working. Check that the line is present and Squid has been reloaded.
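A single curl only proves that one request left through an upstream. To confirm that rotation is happening, a short sketch like the following can tally egress IPs over several requests. The function names are illustrative and not part of any Squid tooling; `fetch_ip_via_squid` assumes the third-party `requests` package is installed and that httpbin.org is reachable through the pool.

```python
from collections import Counter


def egress_distribution(fetch_ip, attempts=20):
    """Call fetch_ip() repeatedly and count how often each egress IP appears."""
    counts = Counter()
    for _ in range(attempts):
        counts[fetch_ip()] += 1
    return counts


def fetch_ip_via_squid(proxy="http://localhost:3128"):
    """Ask httpbin which IP it sees when the request goes through Squid."""
    import requests  # third-party; imported here so the counter works without it

    resp = requests.get(
        "https://httpbin.org/ip",
        proxies={"https": proxy},
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()["origin"]
```

Running `egress_distribution(fetch_ip_via_squid)` against a round-robin pool would be expected to show several distinct addresses. A single address suggests every request is leaving through one peer.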
Rotation strategies
Squid supports several peer selection strategies via cache_peer options:
| option | behavior |
|---|---|
| round-robin | each request goes to the next peer in order |
| weighted-round-robin | round-robin biased by per-peer weight (and by peer round-trip time when it is measured) |
| carp | hash of the request URL (the same URL always maps to the same peer) |
| userhash | hash of the authenticated client username |
| sourcehash | hash of the client source IP |
For most scraper pools, round-robin is the right default and gives even distribution. When the same target site should always egress through the same proxy (some sites set IP-bound cookies), use carp with a host-based key:
cache_peer proxy1.provider.com parent 8080 0 carp carp-key=host no-query login=user1:pass1
cache_peer proxy2.provider.com parent 8080 0 carp carp-key=host no-query login=user2:pass2
# ... etc
By default CARP hashes the full request URL, so two different plain-HTTP URLs on the same site can land on different peers. The carp-key=host option restricts the hash to the hostname, so a session of requests to the same target uses the same proxy, which avoids tripping IP-binding checks. HTTPS requests arrive as CONNECT host:port, so they are keyed by host and port either way.
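The difference between sequential and hash-based selection is easier to see in a toy model. The sketch below is not Squid's CARP algorithm; it uses a plain SHA-256 modulo hash as a stand-in, and the peer names and helper functions are hypothetical. It only illustrates that round-robin ignores the request while a hash keyed on the hostname pins a site to one peer.

```python
import hashlib
from itertools import cycle
from urllib.parse import urlsplit

PEERS = ["peer_1", "peer_2", "peer_3", "peer_4", "peer_5"]


def round_robin(peers):
    """Return an iterator that walks the peers in order, wrapping around."""
    return cycle(peers)


def pick_by_hash(key, peers):
    """Map a key to a peer deterministically (simple modulo hash, not CARP)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def host_of(url):
    """Extract the hostname, the part a host-keyed hash would use."""
    return urlsplit(url).hostname


urls = [
    "http://shop.example.com/item/1",
    "http://shop.example.com/item/2",
    "http://shop.example.com/cart",
]
rr = round_robin(PEERS)
for url in urls:
    print(url, "round-robin:", next(rr), "host-hash:", pick_by_hash(host_of(url), PEERS))
```

Every URL in the loop shares a hostname, so the host-hash column repeats one peer while the round-robin column advances on each request.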
Failover and health checks
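Squid tracks the health of each parent peer and stops routing to peers it considers dead. With no-query set there are no ICP pings, so liveness is inferred from failed TCP connection attempts. The number of consecutive failures tolerated before a peer is marked dead is set per peer with the connect-fail-limit=N option on cache_peer (default 10). The global timeouts below control how quickly each failure is detected:
# Timeouts that determine how quickly a failing peer is detected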
connect_timeout 30 seconds
peer_connect_timeout 10 seconds
dead_peer_timeout 5 minutes
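While a peer is marked dead, Squid skips it for new requests and periodically probes it in the background. When a probe connection succeeds, the peer is brought back into the pool. Note that dead_peer_timeout is documented in terms of missing ICP replies, so for no-query peers the connection timeouts are the settings that matter most.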
For more aggressive failover, write a custom health check that probes upstreams every minute and rewrites the Squid config. Sketch:
#!/usr/bin/env python3
# squid_peer_healthcheck.py
import subprocess

import requests

PEERS = [
    ("proxy1.provider.com", 8080, "user1", "pass1"),
    ("proxy2.provider.com", 8080, "user2", "pass2"),
    # ...
]


def check_peer(host, port, user, password):
    """Return True if a test request through this upstream succeeds."""
    try:
        proxy_url = f"http://{user}:{password}@{host}:{port}"
        r = requests.get(
            "https://httpbin.org/ip",
            proxies={"https": proxy_url},
            timeout=10,
        )
        return r.status_code == 200
    except Exception:
        return False


def write_squid_config(healthy_peers):
    """Rewrite peers.conf with one cache_peer line per healthy upstream."""
    with open("/etc/squid/peers.conf", "w") as f:
        for i, (host, port, user, password) in enumerate(healthy_peers, 1):
            f.write(
                f"cache_peer {host} parent {port} 0 round-robin no-query "
                f"login={user}:{password} name=peer_{i}\n"
            )


def reload_squid():
    subprocess.run(["sudo", "systemctl", "reload", "squid"], check=True)


def main():
    healthy = [p for p in PEERS if check_peer(*p)]
    print(f"{len(healthy)}/{len(PEERS)} peers healthy")
    write_squid_config(healthy)
    reload_squid()


if __name__ == "__main__":
    main()
Run via cron every minute. Include peers.conf in your main squid.conf via:
include /etc/squid/peers.conf
This pattern auto-removes failing peers without manual intervention.
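Rewriting a config file every minute raises two practical concerns: Squid could read a half-written file, and a reload is wasted when nothing changed. One way to address both, sketched here under the assumption that the temp file and the target sit on the same filesystem, is to write to a temporary file, rename it into place, and report whether the content changed. The helper names and the 0o640 default mode are illustrative choices, not part of Squid.

```python
import os
import tempfile


def write_atomically(path, text, mode=0o640):
    """Write text to a temp file beside path, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".peers-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def update_if_changed(path, text):
    """Return True if the file was rewritten, False if it was already current."""
    try:
        with open(path) as f:
            if f.read() == text:
                return False
    except FileNotFoundError:
        pass
    write_atomically(path, text)
    return True
```

A health check could then call the reload step only when `update_if_changed` returns True, which keeps a stable pool from being reloaded every minute.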
ACL hygiene
By default, Squid is open to everything once you grant http_access. For a scraper pool, restrict access to your scraper IPs:
# Define allowed clients
acl scraper_clients src 10.0.0.0/8 192.168.0.0/16 172.16.0.0/12
# Allow only those IPs
http_access allow scraper_clients
http_access deny all
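The src ACL is a CIDR membership test. When a scraper is unexpectedly denied, it can help to reproduce that test offline. This sketch uses Python's standard `ipaddress` module with the three ranges from the ACL above; the `is_allowed` helper is a hypothetical name and only mirrors the address match, not the rest of Squid's http_access evaluation.

```python
import ipaddress

# Same ranges as the scraper_clients ACL.
ALLOWED = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "192.168.0.0/16", "172.16.0.0/12")
]


def is_allowed(client_ip, networks=ALLOWED):
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


for ip in ("10.1.2.3", "172.31.255.1", "172.32.0.1", "8.8.8.8"):
    print(ip, "allowed" if is_allowed(ip) else "denied")
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so 172.32.0.1 is outside the ACL even though it looks similar.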
For authenticated client access (when scrapers run on internet IPs), use HTTP Basic auth:
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic realm "Scraper Proxy Pool"
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all
Generate the password file:
sudo htpasswd -c /etc/squid/passwd scraper_user
sudo chown proxy:proxy /etc/squid/passwd
sudo chmod 640 /etc/squid/passwd
Per-tenant rate limiting
If you serve multiple internal teams from one Squid pool, rate limit per team:
# Define teams by subnet
acl team_a src 10.1.0.0/16
acl team_b src 10.2.0.0/16
# Delay pools (rate limiting)
delay_pools 2
# Pool 1: team_a, 10 MB/s aggregate, 1 MB/s per host
delay_class 1 2
delay_parameters 1 10000000/10000000 1000000/1000000
delay_access 1 allow team_a
delay_access 1 deny all
# Pool 2: team_b, 5 MB/s aggregate
delay_class 2 1
delay_parameters 2 5000000/5000000
delay_access 2 allow team_b
delay_access 2 deny all
This caps total bandwidth per team, useful when one runaway scraper would otherwise saturate your upstream contracts.
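The delay_parameters values are bytes per second written as restore-rate/bucket-size pairs, which is easy to get wrong by a factor of ten when typed by hand. A small helper, sketched here with a hypothetical function name and assuming decimal megabytes (1 MB = 1,000,000 bytes) as in the comments above, can render the lines from human-readable limits.

```python
def delay_parameters_line(pool, aggregate_mb_s, per_host_mb_s=None):
    """Render a delay_parameters line from limits given in decimal MB/s."""

    def bucket(mb_s):
        rate = int(mb_s * 1_000_000)  # bytes per second
        return f"{rate}/{rate}"  # restore rate / maximum bucket size

    parts = [f"delay_parameters {pool}", bucket(aggregate_mb_s)]
    if per_host_mb_s is not None:
        # Class 2 pools take a second pair for the per-host limit.
        parts.append(bucket(per_host_mb_s))
    return " ".join(parts)


print(delay_parameters_line(1, 10, 1))
print(delay_parameters_line(2, 5))
```

The two printed lines reproduce the team_a and team_b settings shown in the config.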
Logging
Default Squid log format is fine for debugging. For monitoring, switch to a structured log format:
logformat scraperjson { \
"ts": "%ts.%03tu", \
"client_ip": "%>a", \
"duration_ms": %tr, \
"status": "%>Hs", \
"bytes_sent": %<st, \
"method": "%rm", \
"url": "%ru", \
"peer_used": "%<a", \
"user_agent": "%{User-Agent}>h" \
}
access_log /var/log/squid/access.log scraperjson
Then ingest via Vector, Fluent Bit, or a similar log shipper into Loki or Elasticsearch for querying. Common queries:
- Per-peer success rate: count of status=200 group by peer_used
- Per-team bandwidth: sum of bytes_sent group by client_ip block
- Slow upstreams: p95 of duration_ms group by peer_used
These queries highlight which upstream peers are degrading and which scrapers are hammering the pool.
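Before a log pipeline is in place, the per-peer queries can be approximated with a short script over the JSON access log. The sketch below assumes one JSON object per line with the field names from the scraperjson format (`peer_used`, `status`, `duration_ms`); the function names are illustrative, and the percentile uses the simple nearest-rank method rather than whatever a given log backend implements.

```python
import json
from collections import defaultdict


def load_records(lines):
    """Parse JSON log lines, skipping blanks and lines that fail to parse."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # e.g. a URL containing an unescaped quote
    return records


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling division
    return ordered[rank - 1]


def per_peer_stats(records):
    """Return request count, success rate, and p95 latency for each peer."""
    by_peer = defaultdict(list)
    for rec in records:
        by_peer[rec["peer_used"]].append(rec)
    stats = {}
    for peer, rows in by_peer.items():
        ok = sum(1 for r in rows if str(r["status"]) == "200")
        stats[peer] = {
            "requests": len(rows),
            "success_rate": ok / len(rows),
            "p95_ms": percentile([r["duration_ms"] for r in rows], 95),
        }
    return stats
```

Typical use would be `per_peer_stats(load_records(open("/var/log/squid/access.log")))`, run on a copy of the log rather than the live file.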
Monitoring with Prometheus
Squid exposes runtime stats via squidclient mgr:info. Convert to Prometheus metrics with a small exporter:
# squid_exporter.py
from prometheus_client import start_http_server, Gauge
import subprocess
import re
import time

requests_total = Gauge("squid_requests_total", "Total requests", ["state"])
peer_status = Gauge("squid_peer_status", "Peer up/down", ["peer"])


def parse_squid_info():
    out = subprocess.check_output(["squidclient", "mgr:info"]).decode()
    # Parse lines like "Number of HTTP requests received: 12345"
    m = re.search(r"Number of HTTP requests received:\s+(\d+)", out)
    if m:
        requests_total.labels(state="received").set(int(m.group(1)))


def parse_peer_status():
    out = subprocess.check_output(["squidclient", "mgr:server_list"]).decode()
    # Each peer block has a "Host : ..." line followed later by "Status : ...".
    # Squid normally reports the status as "Up" or "Down"; check your build's output.
    for match in re.finditer(r"Host\s*:\s*(\S+).*?Status\s*:\s*(\w+)", out, re.DOTALL):
        peer, status = match.group(1), match.group(2)
        peer_status.labels(peer=peer).set(1 if status.lower() == "up" else 0)


if __name__ == "__main__":
    start_http_server(9301)
    while True:
        parse_squid_info()
        parse_peer_status()
        time.sleep(15)
Scrape from Prometheus and visualize peer health in Grafana. Alert when any peer flips to dead or aggregate request rate drops below baseline.
Python client integration
Scrapers integrate with the Squid pool by setting it as the HTTPS proxy. Most clients support this via standard environment variables or per-call config.
For curl_cffi:
from curl_cffi import requests

resp = requests.get(
    "https://target.example.com",
    impersonate="chrome124",
    proxies={
        "http": "http://scraper_user:secret@squid.internal:3128",
        "https": "http://scraper_user:secret@squid.internal:3128",
    },
)
For Playwright (the import below uses patchright, a drop-in Playwright fork; swap in playwright.async_api for stock Playwright):
from patchright.async_api import async_playwright


async def fetch():
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={
                "server": "http://squid.internal:3128",
                "username": "scraper_user",
                "password": "secret",
            },
        )
        # ...
The single Squid endpoint masks the entire upstream pool from your scraper code. Add or remove upstream peers without changing scraper config.
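One detail that tends to bite when credentials are embedded in a proxy URL: characters such as `@`, `:` or `/` in the password must be percent-encoded, or the client will misparse the URL. A minimal sketch using only the standard library follows; the helper names are hypothetical, and the hostname and credentials are placeholders.

```python
from urllib.parse import quote


def squid_proxy_url(host, port, username=None, password=None):
    """Build an http:// proxy URL, percent-encoding any credentials."""
    if username is None:
        return f"http://{host}:{port}"
    user = quote(username, safe="")
    secret = quote(password or "", safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def proxies_for(host, port, username=None, password=None):
    """Return the proxies mapping that requests-style clients accept."""
    url = squid_proxy_url(host, port, username, password)
    return {"http": url, "https": url}


print(squid_proxy_url("squid.internal", 3128, "scraper_user", "p@ss:w/rd"))
```

Clients that take the username and password as separate fields, as in the Playwright example, do not need this encoding.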
Comparison: Squid vs alternatives
| tool | strength | weakness |
|---|---|---|
| Squid | mature, ACLs, monitoring | HTTP-first, complex config |
| HAProxy | very fast, modern config | less ACL flexibility for proxy use |
| Nginx (with stream module) | fast, widely known | weaker for proxy pooling specifically |
| 3proxy | SOCKS5 support | smaller community, less feature-rich |
| custom Go service | exactly what you want | maintenance burden |
| commercial gateway (Bright Data, Oxylabs) | zero infra | expensive at scale |
For HTTP/HTTPS proxy pooling, Squid is the right pick. For SOCKS5, 3proxy or a custom Go service. For very high throughput, HAProxy or a custom service.
For broader proxy infrastructure patterns, see self-hosted proxy infrastructure: complete 2026 guide.
Operational considerations
A Squid pool in production needs:
- Hardware: 4 vCPU, 8 GB RAM handles 10k req/s for typical scraper traffic
- Bandwidth: budget for outgoing + incoming, rule of thumb 1-2 Mbit/s per active scraper worker
- Disk: 100 GB for logs (rotate daily, keep 30 days)
- Monitoring: peer status, request rate, error rate, p95 latency
- Alerting: any peer dead, error rate >5%, latency p95 >1s
- Backup: config and password file in version control
- Security: ACL restricting access, never expose 3128 to the internet
For scaling, run multiple Squid instances behind a load balancer (HAProxy or a TCP load balancer). Each Squid instance has the same peer pool, so requests distribute evenly across the cluster.
Common failure modes
- All peers marked dead: usually a transient upstream provider issue or DNS failure. Check upstream connectivity directly.
- High latency: an upstream is slow. Check per-peer p95 latency in your logs.
- High error rate: upstream is degraded. Health check script should auto-remove.
- Connection refused: Squid is not running or wrong port. Check systemctl.
- 407 Proxy Authentication Required: the client sent no credentials or the wrong ones, or an auth ACL is evaluated before the intended allow rule. Check auth_param and the order of http_access rules.
- 403 Forbidden from Squid itself: the request hit a Squid-side ACL. Check http_access rules.
- Memory growing without bound: a buggy Squid version or misconfigured cache. Check that cache deny all is set.
For OWASP guidance on proxy security, see the OWASP Proxy security cheat sheet.
Operational checklist
- Squid 6.x on Ubuntu 22.04 or 24.04
- Cache disabled, forwarded_for delete, via off
- Parent peers with login auth and round-robin
- never_direct allow all
- ACL restricting client access
- Health check cron rotating peers in/out
- Structured JSON access log
- Prometheus exporter + Grafana dashboard
- Alerting on peer status and error rate
- Config in version control
- Multiple Squid instances behind LB for scale
FAQ
Q: how many upstream peers do I need?
Match your concurrency. If you run 100 concurrent scraper workers and each makes 1 request/second, you have 100 req/s. With 5 peers in round-robin, each peer handles 20 req/s. Most residential proxy plans handle that comfortably. For high-volume work, 20-50 peers gives you headroom and failover capacity.
Q: can Squid rotate proxies on every request to a single domain?
Yes with round-robin. Squid picks the next peer per request regardless of destination. For per-domain affinity (same domain always uses the same peer), use carp with carp-key=host so the hash covers only the hostname rather than the full URL.
Q: does Squid support SOCKS5 upstreams?
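Not natively. Squid's cache_peer directive speaks HTTP to its parents and has no SOCKS5 mode in 6.x. The connection-auth option sometimes mentioned in this context only controls connection-oriented authentication such as NTLM and is unrelated to SOCKS. For SOCKS5-heavy work, 3proxy or a custom service is the more reliable choice.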
Q: how do I add or remove peers without restart?
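Edit /etc/squid/peers.conf and reload Squid: sudo systemctl reload squid. A reload re-reads the configuration without restarting the process. Established connections are generally preserved, though Squid briefly pauses accepting new ones while it reconfigures.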
Q: can I use Squid for HTTPS interception?
Yes with SSL bumping, but it requires installing a CA certificate on every client. For scrapers, this is usually unnecessary because you do not need to inspect HTTPS content, just forward it. Stick to CONNECT method for HTTPS.
Common pitfalls in production Squid pools
The first failure mode is silent peer exhaustion under high concurrency. Squid's cache_peer with round-robin does not automatically queue requests when peers are busy: it picks the next peer in sequence regardless of that peer's current load. If your scraper fleet sends 500 concurrent requests through 5 peers, each peer receives 100 simultaneous connections. Most residential providers cap concurrent connections at 50-100 per credential set, so with a cap of 50 half your requests stall or fail with upstream timeouts. The fix is to add max-conn=N to each cache_peer line so Squid stops sending new requests to peers that are at capacity:
cache_peer proxy1.provider.com parent 8080 0 round-robin no-query \
login=user1:pass1 name=peer_1 max-conn=50
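The arithmetic behind that example is worth making explicit when sizing a pool. The sketch below assumes round-robin spreads connections evenly and that every peer has the same max-conn; the function names are illustrative.

```python
import math


def per_peer_load(total_concurrent, peer_count):
    """Connections each peer sees when load is spread evenly."""
    return math.ceil(total_concurrent / peer_count)


def overflow(total_concurrent, peer_count, max_conn):
    """Requests that cannot be placed once every peer is at max-conn."""
    return max(0, total_concurrent - peer_count * max_conn)


def peers_needed(total_concurrent, max_conn):
    """Smallest pool that fits the load without exceeding max-conn per peer."""
    return math.ceil(total_concurrent / max_conn)


print(per_peer_load(500, 5))   # 100 connections per peer
print(overflow(500, 5, 50))    # 250 requests over capacity
print(peers_needed(500, 50))   # 10 peers to fit the load
```

With 500 concurrent requests and a cap of 50 per peer, five peers leave 250 requests unplaced, which matches the "half your requests" figure above, and ten peers would be the minimum to carry the load.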
The second pitfall is sending upstream proxies an IP address instead of a hostname. Many residential proxy providers route by hostname (their internal load balancer chooses an exit IP based on the destination domain), so a pre-resolved IP causes them to skip routing optimizations or return 502s. In an explicit forward-proxy setup like this one, Squid forwards the hostname the client asked for, so the problem mainly appears when traffic reaches Squid through NAT or TPROXY interception, or when scrapers resolve DNS themselves and request IP literals. The host_verify_strict and client_dst_passthru directives govern how Squid treats the Host header and the original destination IP in those intercepted cases, so they only matter if interception is in play:
host_verify_strict off
client_dst_passthru off
forwarded_for delete
For HTTP CONNECT (HTTPS tunneling), Squid passes the original host to the upstream by default. Verify by checking the upstream’s logs that they receive CONNECT example.com:443 rather than CONNECT 93.184.216.34:443.
The third pitfall is slow failure detection on dead upstreams. Squid periodically retries a peer it has marked dead. If the peer is still down, each attempt waits for peer_connect_timeout (30 seconds by default) before failing. During that wait, requests routed through that peer hang, and your scrapers see 30-second latency spikes. Set peer_connect_timeout 5 seconds to fail fast on dead peers, and pair it with the active health check script that runs every 60 seconds to remove dead peers from the config entirely:
peer_connect_timeout 5 seconds
connect_timeout 15 seconds
Real-world example: 95th percentile latency tuning
A scraper farm running 80 concurrent workers through a Squid pool of 12 residential upstreams saw average latency of 800ms but p95 of 14 seconds. Investigation showed two of the 12 upstreams were experiencing intermittent 10-15 second response delays during certain hours, but Squid’s round-robin still routed traffic to them. The fix combined three changes: first, lowering peer_connect_timeout from 30s to 5s so dead-feeling peers were skipped faster; second, adding max-conn=40 on each peer so a slow peer did not accumulate stuck connections; third, switching from round-robin to weighted-round-robin and assigning the two slow peers weight=1 while the fast peers got weight=10:
cache_peer fast_proxy1.provider.com parent 8080 0 weighted-round-robin no-query \
login=user1:pass1 name=fast_1 weight=10 max-conn=40
cache_peer slow_proxy1.provider.com parent 8080 0 weighted-round-robin no-query \
login=user2:pass2 name=slow_1 weight=1 max-conn=40
After deployment, p95 latency dropped from 14s to 2.1s within an hour and p99 from 22s to 4.3s. Each slow peer still received roughly a tenth of the traffic a fast peer did (useful for monitoring whether they recovered) without dragging down the overall pool experience. The lesson: Squid's defaults assume homogeneous peers, but real-world residential pools are heterogeneous and need weight-based shaping plus aggressive timeouts to keep the tail latency bounded.
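To see what those weights imply for the whole pool, the expected split can be computed directly. This is a simplified model that assumes traffic is divided purely in proportion to weight; Squid's weighted-round-robin can also factor in measured round-trip time, so real shares may differ. The peer names are placeholders for the 10 fast and 2 slow upstreams in the example.

```python
from fractions import Fraction


def expected_shares(weights):
    """Return each peer's share of traffic under pure weight-proportional routing."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# 10 fast peers at weight 10, 2 slow peers at weight 1 (total weight 102).
pool = {f"fast_{i}": 10 for i in range(1, 11)}
pool.update({"slow_1": 1, "slow_2": 1})

shares = expected_shares(pool)
print("each fast peer:", float(shares["fast_1"]))
print("each slow peer:", float(shares["slow_1"]))
print("both slow peers combined:", float(shares["slow_1"] + shares["slow_2"]))
```

Under this model each slow peer carries 1/102 of all requests, a tenth of a fast peer's 10/102, and the two slow peers together see just under 2% of total traffic.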
Wrapping up
A Squid-based rotating proxy pool gives you per-tenant control, automatic failover, and simple integration with any scraping client for the cost of a small VM and 50 lines of config. It pays for itself within weeks compared to commercial gateway pricing if you have moderate volume. Pair this with our self-hosted proxy infrastructure and best residential proxy providers 2026 guides for the full stack, and browse the dev-tools-projects category on DRT for related infrastructure deep-dives.