Proxy Connection Pooling: Maximize Throughput & Reduce Overhead
Every new TCP connection through a proxy involves a three-way handshake with the proxy, and the proxy then opens another connection to the target, adding 100-300ms of overhead per request. Connection pooling reuses established connections, eliminating that repeated setup cost and often delivering 3-10x throughput improvements.
This guide covers how connection pooling works with proxy servers, optimal pool sizing, and implementations across popular languages.
Why Connection Pooling Matters
Without Pooling (new connection per request):
Request 1: TCP handshake (50ms) → CONNECT (30ms) → TLS (80ms) → GET (100ms) = 260ms
Request 2: TCP handshake (50ms) → CONNECT (30ms) → TLS (80ms) → GET (100ms) = 260ms
Request 3: TCP handshake (50ms) → CONNECT (30ms) → TLS (80ms) → GET (100ms) = 260ms
Total: 780ms for 3 requests
With Pooling (reuse connections):
Request 1: TCP handshake (50ms) → CONNECT (30ms) → TLS (80ms) → GET (100ms) = 260ms
Request 2: GET (100ms) = 100ms (reused connection)
Request 3: GET (100ms) = 100ms (reused connection)
Total: 460ms for 3 requests (41% faster)
For 1,000 requests, the difference compounds:
- Without pooling: ~260 seconds
- With pooling: ~103 seconds (2.5x faster)
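The simplest way to get this benefit in Python is to keep one client object alive for the whole job. Here is a minimal synchronous sketch using httpx, the same library the examples below build on; the proxy URL is a placeholder:
import httpx

# Placeholder proxy URL; swap in your own gateway and credentials
PROXY = "http://user:pass@proxy.example.com:8080"

# One shared client means one shared connection pool
with httpx.Client(proxy=PROXY, timeout=30) as client:
    for i in range(3):
        # Only the first request pays the TCP/CONNECT/TLS setup cost;
        # the following requests reuse the kept-alive connection
        r = client.get(f"https://httpbin.org/get?i={i}")
        print(r.status_code, len(r.content))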
Python: httpx Connection Pooling
import httpx
import asyncio
import time
class PooledProxyScraper:
"""Scraper with optimized connection pooling through proxy."""
def __init__(self, proxy_url: str):
self.proxy_url = proxy_url
self.client = httpx.AsyncClient(
proxy=proxy_url,
limits=httpx.Limits(
max_connections=100, # Total pool size
max_keepalive_connections=20, # Keep-alive connections
keepalive_expiry=30, # Seconds before expiry
),
timeout=httpx.Timeout(
connect=10,
read=30,
write=10,
pool=5, # Wait for available connection
),
            http2=True,  # HTTP/2 multiplexing (requires the httpx[http2] extra)
)
async def scrape(self, urls: list) -> list:
"""Scrape URLs using pooled connections."""
tasks = [self._fetch(url) for url in urls]
return await asyncio.gather(*tasks, return_exceptions=True)
async def _fetch(self, url: str):
response = await self.client.get(url)
return {
"url": url,
"status": response.status_code,
"size": len(response.content),
}
async def close(self):
await self.client.aclose()
# Benchmark: pooled vs unpooled
async def benchmark():
urls = [f"https://httpbin.org/get?i={i}" for i in range(100)]
proxy = "http://user:pass@proxy.example.com:8080"
# Pooled
scraper = PooledProxyScraper(proxy)
start = time.time()
results = await scraper.scrape(urls)
pooled_time = time.time() - start
await scraper.close()
# Unpooled (new client per request)
start = time.time()
for url in urls:
async with httpx.AsyncClient(proxy=proxy) as client:
await client.get(url)
unpooled_time = time.time() - start
print(f"Pooled: {pooled_time:.1f}s ({len(urls)/pooled_time:.1f} req/s)")
print(f"Unpooled: {unpooled_time:.1f}s ({len(urls)/unpooled_time:.1f} req/s)")
print(f"Speedup: {unpooled_time/pooled_time:.1f}x")
asyncio.run(benchmark())
Pool Sizing Guidelines
Pool Size Formula:
optimal_pool_size = concurrent_requests * (1 + avg_latency / avg_processing_time)
Example:
- 20 concurrent scrapers
- 200ms average proxy latency
- 50ms processing time per response
optimal_pool_size = 20 * (1 + 200/50) = 20 * 5 = 100 connections
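As a quick sketch, the formula translates directly into a helper function; the clamping bounds below are illustrative defaults, not part of the formula:
def optimal_pool_size(concurrent_requests: int,
                      avg_latency_ms: float,
                      avg_processing_ms: float,
                      min_size: int = 10,
                      max_size: int = 500) -> int:
    """Apply the sizing formula, then clamp to sensible bounds."""
    size = concurrent_requests * (1 + avg_latency_ms / avg_processing_ms)
    return int(min(max(size, min_size), max_size))

# 20 concurrent scrapers, 200ms proxy latency, 50ms processing time -> 100
print(optimal_pool_size(20, 200, 50))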
Rules of Thumb:
┌──────────────────────────┬──────────────────┐
│ Scenario │ Pool Size │
├──────────────────────────┼──────────────────┤
│ Light scraping (1-5 rps) │ 10-20 │
│ Medium (5-50 rps) │ 20-50 │
│ Heavy (50-200 rps) │ 50-100 │
│ Enterprise (200+ rps) │ 100-500 │
└──────────────────────────┴──────────────────┘
Dynamic Pool Sizing
class DynamicPool:
"""Automatically adjust pool size based on demand."""
def __init__(self, proxy_url, min_size=10, max_size=200):
self.proxy_url = proxy_url
self.min_size = min_size
self.max_size = max_size
self.current_size = min_size
self._rebuild_client()
def _rebuild_client(self):
self.client = httpx.AsyncClient(
proxy=self.proxy_url,
limits=httpx.Limits(
max_connections=self.current_size,
max_keepalive_connections=self.current_size // 2,
),
)
    async def adjust_pool(self, pending_requests: int):
        """Scale pool based on pending requests."""
        # Note: closing the old client below interrupts any requests still in
        # flight, so call this between batches rather than mid-batch
target = min(max(pending_requests * 2, self.min_size), self.max_size)
if target != self.current_size:
await self.client.aclose()
self.current_size = target
self._rebuild_client()
print(f"Pool resized to {self.current_size}")Connection Pooling with aiohttp
import aiohttp
async def scrape_with_aiohttp_pool(urls, proxy_url):
"""aiohttp has built-in connection pooling via TCPConnector."""
connector = aiohttp.TCPConnector(
limit=100, # Total connection limit
limit_per_host=10, # Per-host limit
ttl_dns_cache=300, # DNS cache TTL
keepalive_timeout=30, # Keep-alive timeout
enable_cleanup_closed=True,
)
async with aiohttp.ClientSession(connector=connector) as session:
tasks = []
for url in urls:
tasks.append(session.get(url, proxy=proxy_url))
responses = await asyncio.gather(*tasks)
results = []
for resp in responses:
results.append({
"status": resp.status,
"size": len(await resp.read()),
})
resp.release() # Return connection to pool
        return results
Node.js Connection Pooling
const { HttpsProxyAgent } = require('https-proxy-agent');
const http = require('http');
const https = require('https');
// Create agent with connection pooling
const proxyAgent = new HttpsProxyAgent('http://user:pass@proxy.com:8080', {
keepAlive: true,
keepAliveMsecs: 30000,
maxSockets: 100,
maxFreeSockets: 20,
timeout: 30000,
});
// Use with fetch or axios
const axios = require('axios');
const instance = axios.create({
httpAgent: proxyAgent,
httpsAgent: proxyAgent,
timeout: 30000,
});
async function scrapeWithPool(urls) {
const results = await Promise.all(
urls.map(url => instance.get(url).catch(err => ({ error: err.message })))
);
return results;
}
Go Connection Pooling
package main
import (
"fmt"
"net/http"
"net/url"
"time"
"io/ioutil"
)
func createPooledClient(proxyURL string) *http.Client {
proxy, _ := url.Parse(proxyURL)
transport := &http.Transport{
Proxy: http.ProxyURL(proxy),
MaxIdleConns: 100,
MaxIdleConnsPerHost: 20,
MaxConnsPerHost: 50,
IdleConnTimeout: 30 * time.Second,
TLSHandshakeTimeout: 10 * time.Second,
ResponseHeaderTimeout: 10 * time.Second,
DisableKeepAlives: false, // Enable keep-alive
}
return &http.Client{
Transport: transport,
Timeout: 30 * time.Second,
}
}
func main() {
client := createPooledClient("http://user:pass@proxy.com:8080")
urls := []string{
"https://httpbin.org/get",
"https://httpbin.org/ip",
}
for _, u := range urls {
resp, err := client.Get(u)
if err != nil {
fmt.Printf("Error: %v\n", err)
continue
}
body, _ := ioutil.ReadAll(resp.Body)
resp.Body.Close() // Important: close body to return connection
fmt.Printf("%s: %d bytes\n", u, len(body))
}
}
Common Pooling Mistakes
Mistake 1: Not Reading or Closing Response Bodies
# Note: client.get() reads the body eagerly; the pitfall below applies to streamed requests
# WRONG — streamed body never read, connection cannot be reused
async with httpx.AsyncClient(proxy=proxy) as client:
    async with client.stream("GET", url) as response:
        status = response.status_code
        # Body never consumed, so the connection is discarded
        # instead of going back to the keep-alive pool
# RIGHT — read the body so the connection returns to the pool
async with httpx.AsyncClient(proxy=proxy) as client:
    async with client.stream("GET", url) as response:
        _ = await response.aread()  # Body read, connection returned to pool
Mistake 2: Creating New Clients Per Request
# WRONG — no connection reuse
for url in urls:
async with httpx.AsyncClient(proxy=proxy) as client: # New pool each time
await client.get(url)
# RIGHT — share the client
async with httpx.AsyncClient(proxy=proxy) as client: # One pool
for url in urls:
        await client.get(url)  # Connections reused
Mistake 3: Pool Too Small
# WRONG — pool bottleneck, requests queue up
client = httpx.AsyncClient(
proxy=proxy,
limits=httpx.Limits(max_connections=5) # Too small for 100 concurrent tasks
)
# RIGHT — size pool to match concurrency
client = httpx.AsyncClient(
proxy=proxy,
limits=httpx.Limits(max_connections=100)
)
Monitoring Pool Health
class PoolMonitor:
"""Monitor connection pool utilization."""
def __init__(self, client: httpx.AsyncClient):
self.client = client
def get_pool_stats(self):
        # Relies on private httpx/httpcore internals, which can change between versions
        pool = self.client._transport._pool
return {
"active_connections": len(pool._requests),
"idle_connections": len(pool._idle_connections)
if hasattr(pool, '_idle_connections') else 'N/A',
        }
Internal Links
- Proxy Load Balancing — distribute pooled connections across proxies
- Proxy Performance Benchmarks — measure pooling impact
- HTTP/2 and HTTP/3 with Proxies — HTTP/2 multiplexing as advanced pooling
- Web Scraping Architecture — design patterns using connection pools
- Bandwidth Optimization — combine with pooling for maximum efficiency
FAQ
What is the difference between connection pooling and HTTP/2 multiplexing?
Connection pooling reuses TCP connections sequentially — one request at a time per connection. HTTP/2 multiplexing sends multiple requests simultaneously over a single connection. They complement each other: pool HTTP/2 connections for the best performance.
How long should I keep idle connections alive?
30-60 seconds is typical. Too short and you waste connections; too long and you hold resources for proxies that may rotate your IP. Match the keep-alive timeout to your scraping pattern — continuous scraping benefits from longer timeouts.
Does connection pooling work with rotating proxies?
It depends. If the proxy gateway handles rotation on the server side, pooling works perfectly — you maintain connections to the gateway while the gateway rotates exit IPs. If rotation requires connecting to different proxy servers, each server gets its own pool entry.
Can connection pooling cause IP detection issues?
Yes, if you reuse connections too aggressively to the same target. A single connection sending hundreds of requests looks automated. Balance pooling efficiency with natural browsing patterns by limiting requests-per-connection.
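Client libraries do not generally expose a per-connection request counter, so one hedged approximation is to recycle the whole pooled client after a fixed number of requests. The sketch below assumes the httpx client setup from the earlier examples, and the threshold of 200 is purely illustrative:
import httpx

class RecyclingClient:
    """Rebuild the pooled client every N requests (N is illustrative)."""

    def __init__(self, proxy_url: str, max_requests_per_client: int = 200):
        self.proxy_url = proxy_url
        self.max_requests = max_requests_per_client
        self.request_count = 0
        self.client = httpx.AsyncClient(proxy=proxy_url)

    async def get(self, url: str) -> httpx.Response:
        if self.request_count >= self.max_requests:
            # Drop the old pool so subsequent requests open fresh connections
            await self.client.aclose()
            self.client = httpx.AsyncClient(proxy=self.proxy_url)
            self.request_count = 0
        self.request_count += 1
        return await self.client.get(url)

    async def aclose(self):
        await self.client.aclose()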
How do I handle connection pool exhaustion?
Set a pool timeout to wait briefly for available connections before failing. Monitor pool utilization and increase the max size if requests frequently wait. For spiky workloads, use dynamic pool sizing that scales with demand.
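As a sketch of the first suggestion, assuming the pooled httpx client from the earlier examples (where the pool timeout was set via httpx.Timeout(pool=5)), you can catch httpx.PoolTimeout and back off instead of failing the whole batch:
import asyncio
import httpx

async def fetch_with_pool_backoff(client: httpx.AsyncClient, url: str, retries: int = 3):
    """Retry with exponential backoff when no pooled connection frees up in time."""
    for attempt in range(retries):
        try:
            return await client.get(url)
        except httpx.PoolTimeout:
            # Pool exhausted: give in-flight requests time to finish, then retry
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"Connection pool still exhausted after {retries} attempts for {url}")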
Related Reading
- AJAX Request Interception: Scraping API Calls Directly
- Azure Functions for Serverless Web Scraping: the Complete Guide
- Build an Anti-Detection Test Suite: Verify Browser Stealth
- Build a News Crawler in Python: Step-by-Step Tutorial
- How to Configure Proxies on iPhone and Android
- How to Use Proxies in Node.js (Axios, Fetch, Puppeteer)