Bandwidth Optimization for Proxies: Reduce Costs & Increase Speed
Residential proxy providers charge $5-15 per GB. A naive scraper downloading full pages with images, CSS, and JavaScript can burn through gigabytes in hours. By optimizing bandwidth, you can reduce costs by 70-90% while also improving speed — less data means faster responses.
This guide covers every technique to minimize proxy bandwidth consumption.
The Bandwidth Problem
A typical web page in 2026:
Average page weight: 2.5 MB
├── HTML: 100 KB (4%)
├── JavaScript: 900 KB (36%)
├── CSS: 200 KB (8%)
├── Images: 1,100 KB (44%)
├── Fonts: 150 KB (6%)
└── Other: 50 KB (2%)
If you only need text data, you're wasting 96% of that bandwidth.
Scraping 100,000 pages at full weight: 250 GB = $1,250-$3,750 in residential proxy costs.
Scraping 100,000 pages (HTML only): 10 GB = $50-$150. A 25x cost reduction.
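To recompute these numbers for your own volumes, a tiny helper like this works (the function name and the example $10/GB rate are illustrative, not tied to any provider):

```python
def proxy_cost_usd(pages: int, kb_per_page: float, usd_per_gb: float) -> float:
    """Estimate proxy spend for a scraping job (illustrative helper)."""
    gb = pages * kb_per_page / 1_000_000  # KB -> GB (decimal)
    return gb * usd_per_gb

# Full pages at 2,500 KB each vs. HTML-only at 100 KB each, at $10/GB:
print(proxy_cost_usd(100_000, 2_500, 10))  # 2500.0 -> $2,500
print(proxy_cost_usd(100_000, 100, 10))    # 100.0  -> $100
```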
Technique 1: Block Unnecessary Resources
With Playwright
```python
from playwright.async_api import async_playwright

async def scrape_text_only(url, proxy):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": proxy}
        )
        page = await browser.new_page()

        # Block images, CSS, fonts, media
        await page.route("**/*.{png,jpg,jpeg,gif,svg,webp,ico}",
                         lambda route: route.abort())
        await page.route("**/*.{css,woff,woff2,ttf,eot}",
                         lambda route: route.abort())
        await page.route("**/*.{mp4,mp3,avi,flv}",
                         lambda route: route.abort())

        # Block known tracking/analytics domains
        block_domains = [
            "google-analytics.com", "googletagmanager.com",
            "facebook.net", "doubleclick.net", "hotjar.com",
        ]
        for domain in block_domains:
            await page.route(f"**/*{domain}*", lambda route: route.abort())

        await page.goto(url, wait_until="domcontentloaded")
        content = await page.content()
        await browser.close()
        return content
```

With Puppeteer (Node.js)
```javascript
const puppeteer = require('puppeteer');

async function scrapeEfficient(url, proxyServer) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyServer}`]
  });
  const page = await browser.newPage();

  // Enable request interception
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    const blockedTypes = ['image', 'stylesheet', 'font', 'media'];
    if (blockedTypes.includes(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });

  await page.goto(url, { waitUntil: 'domcontentloaded' });
  const content = await page.content();
  await browser.close();
  return content;
}
```

Technique 2: Request Compression
Ask servers to compress responses:
```python
import httpx  # decodes gzip/deflate automatically; install brotli for "br" support

async def compressed_request(url, proxy):
    async with httpx.AsyncClient(
        proxy=proxy,
        headers={
            "Accept-Encoding": "br, gzip, deflate",  # Prefer Brotli
        }
    ) as client:
        response = await client.get(url)

        # httpx handles decompression automatically,
        # but we can check which encoding was used
        encoding = response.headers.get("content-encoding", "none")
        raw_size = int(response.headers.get("content-length", 0))
        decoded_size = len(response.content)
        # content-length may be absent; report 0% savings rather than divide by zero
        compression_ratio = (
            (1 - raw_size / decoded_size) * 100 if raw_size and decoded_size else 0
        )

        print(f"Encoding: {encoding}")
        print(f"Transfer size: {raw_size:,} bytes")
        print(f"Decoded size: {decoded_size:,} bytes")
        print(f"Savings: {compression_ratio:.1f}%")
        return response
```

Compression savings:
HTML: 60-80% smaller with gzip, 65-85% with Brotli
JSON: 70-90% smaller with gzip
CSS: 60-75% smaller
JS: 50-70% smaller
Images: 0% (already compressed)

Technique 3: API-First Scraping
Many websites load data via API calls. Intercepting these APIs gives you structured data without page overhead:
```python
import httpx
from playwright.async_api import async_playwright

class APIInterceptScraper:
    """Find and use internal APIs instead of parsing HTML."""

    async def discover_apis(self, url, proxy):
        """Use a browser once to discover API endpoints."""
        apis_found = []
        async with async_playwright() as p:
            browser = await p.chromium.launch(proxy={"server": proxy})
            page = await browser.new_page()

            # Capture XHR/fetch requests
            page.on("request", lambda req: apis_found.append({
                "url": req.url,
                "method": req.method,
                "type": req.resource_type,
            }) if req.resource_type in ["xhr", "fetch"] else None)

            await page.goto(url)
            await page.wait_for_timeout(5000)
            await browser.close()

        return [
            api for api in apis_found
            if "api" in api["url"].lower() or "json" in api["url"].lower()
        ]

    async def scrape_via_api(self, api_url, proxy):
        """Directly call the API — much less bandwidth."""
        async with httpx.AsyncClient(proxy=proxy) as client:
            response = await client.get(
                api_url,
                headers={
                    "Accept": "application/json",
                    "X-Requested-With": "XMLHttpRequest",
                }
            )
            return response.json()

# Bandwidth comparison:
# Full page load: ~2.5 MB
# API call only: ~5-50 KB (50-500x less bandwidth)
```

Technique 4: Conditional Requests
Avoid re-downloading unchanged content:
```python
import httpx

class ConditionalScraper:
    """Only download content that has changed."""

    def __init__(self, proxy):
        self.proxy = proxy
        self.etag_cache = {}
        self.modified_cache = {}

    async def get_if_modified(self, url):
        headers = {}
        # Send ETag if we have one
        if url in self.etag_cache:
            headers["If-None-Match"] = self.etag_cache[url]
        # Send Last-Modified if we have it
        if url in self.modified_cache:
            headers["If-Modified-Since"] = self.modified_cache[url]

        async with httpx.AsyncClient(proxy=self.proxy) as client:
            response = await client.get(url, headers=headers)

        if response.status_code == 304:
            # Not modified — zero bandwidth for content
            print(f"  {url}: Not modified (0 bytes)")
            return None

        # Cache validators for the next request
        if "etag" in response.headers:
            self.etag_cache[url] = response.headers["etag"]
        if "last-modified" in response.headers:
            self.modified_cache[url] = response.headers["last-modified"]

        print(f"  {url}: Downloaded ({len(response.content)} bytes)")
        return response.content
```

Technique 5: Partial Content (Range Requests)
Download only what you need from large files:
```python
import httpx

async def download_partial(url, proxy, start_byte=0, end_byte=1024):
    """Download a specific byte range."""
    async with httpx.AsyncClient(proxy=proxy) as client:
        response = await client.get(
            url,
            headers={"Range": f"bytes={start_byte}-{end_byte}"}
        )
        if response.status_code == 206:
            print(f"Partial content: {len(response.content)} bytes")
        else:
            print("Server doesn't support range requests")
        return response.content

# Useful for: checking headers of large files,
# downloading just the first N KB of a page,
# resuming interrupted downloads
```

Technique 6: Response Streaming
Process data as it arrives without buffering the full response:
```python
import httpx

async def stream_and_extract(url, proxy, max_bytes=50_000):
    """Stream the response and stop once we've found what we need."""
    async with httpx.AsyncClient(proxy=proxy) as client:
        total_bytes = 0
        chunks = []
        async with client.stream("GET", url) as response:
            async for chunk in response.aiter_bytes(chunk_size=8192):
                chunks.append(chunk)
                total_bytes += len(chunk)

                # Check if we have what we need
                content_so_far = b"".join(chunks).decode("utf-8", errors="ignore")
                if "<title>" in content_so_far and "</title>" in content_so_far:
                    # Found the title — stop downloading
                    break
                if total_bytes > max_bytes:
                    break

    content = b"".join(chunks).decode("utf-8", errors="ignore")
    print(f"Downloaded {total_bytes:,} bytes instead of the full page")
    return content
```

Technique 7: Local Caching
Cache responses on disk so that repeat requests within the TTL never touch the proxy at all:
```python
import hashlib
import json
import os
import time

import httpx

class BandwidthCache:
    """Cache responses locally to avoid repeated proxy requests."""

    def __init__(self, cache_dir="./cache", ttl=3600):
        self.cache_dir = cache_dir
        self.ttl = ttl
        os.makedirs(cache_dir, exist_ok=True)

    def _cache_key(self, url):
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, url):
        key = self._cache_key(url)
        path = os.path.join(self.cache_dir, key)
        if not os.path.exists(path):
            return None
        with open(path, "r") as f:
            cached = json.load(f)
        if time.time() - cached["timestamp"] > self.ttl:
            os.remove(path)  # expired: evict
            return None
        return cached["content"]

    def set(self, url, content):
        key = self._cache_key(url)
        path = os.path.join(self.cache_dir, key)
        with open(path, "w") as f:
            json.dump({
                "url": url,
                "content": content,
                "timestamp": time.time(),
            }, f)

    async def get_or_fetch(self, url, proxy):
        cached = self.get(url)
        if cached:
            return cached
        async with httpx.AsyncClient(proxy=proxy) as client:
            response = await client.get(url)
        content = response.text
        self.set(url, content)
        return content
```

Bandwidth Savings Summary
| Technique | Savings | Effort |
|---|---|---|
| Block images/CSS/fonts | 60-80% | Low |
| Gzip/Brotli compression | 60-85% | None (just set header) |
| API-first scraping | 90-99% | Medium |
| Conditional requests | 100% on unchanged | Low |
| Streaming with early stop | 50-95% | Medium |
| Local caching | 100% on repeats | Low |
| Combined approach | 90-99% | Medium |
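To see how the pieces compose, here is a minimal sketch of a fetcher that layers caching, conditional requests, and compression on a single httpx client (the FrugalFetcher name and in-memory cache are illustrative; resource blocking would sit in the browser layer from Technique 1):

```python
import time
import httpx

class FrugalFetcher:
    """Illustrative sketch: cache + conditional requests + compression combined."""

    def __init__(self, proxy: str, ttl: int = 3600):
        self.client = httpx.AsyncClient(
            proxy=proxy,
            headers={"Accept-Encoding": "br, gzip, deflate"},  # Technique 2
        )
        self.ttl = ttl
        self.cache = {}  # url -> (timestamp, etag, body)

    async def get(self, url: str) -> str:
        cached = self.cache.get(url)
        # Fresh local copy: zero proxy bandwidth (Technique 7)
        if cached and time.time() - cached[0] < self.ttl:
            return cached[2]
        # Stale copy: revalidate with ETag instead of re-downloading (Technique 4)
        headers = {"If-None-Match": cached[1]} if cached and cached[1] else {}
        response = await self.client.get(url, headers=headers)
        if response.status_code == 304 and cached:
            # Unchanged: refresh the timestamp, reuse the cached body
            self.cache[url] = (time.time(), cached[1], cached[2])
            return cached[2]
        etag = response.headers.get("etag")
        self.cache[url] = (time.time(), etag, response.text)
        return response.text

    async def aclose(self):
        await self.client.aclose()
```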
Internal Links
- Proxy Performance Benchmarks — measure the speed impact of optimization
- Web Scraping Cost Optimization — broader cost reduction strategies
- Proxy Cost Calculator — calculate bandwidth savings
- Headless Chrome Optimization — reduce browser resource usage
- AJAX Request Interception — capture API calls for bandwidth savings
FAQ
How much can I save on proxy costs with bandwidth optimization?
Combining resource blocking (images/CSS), compression, and API-first scraping typically reduces bandwidth by 90-95%. On residential proxies at $10/GB, scraping 100K full-weight pages (~250 GB) would cost about $2,500 unoptimized vs $125-$250 optimized.
Does blocking resources affect the data I can scrape?
Blocking images and CSS does not affect text data extraction. However, some JavaScript-heavy sites need certain scripts to render content. Test with and without blocking to ensure your target data still loads.
Should I use headless browser or HTTP requests for bandwidth efficiency?
HTTP requests (httpx, requests) use dramatically less bandwidth than headless browsers. A headless browser downloads and executes all page resources. Use HTTP requests when possible, and switch to headless browsers only for JavaScript-rendered content — with resource blocking enabled.
Does compression use more CPU on my machine?
Decompression is fast — gzip adds less than 1ms per response, Brotli 1-3ms. The bandwidth savings (60-85% less data) far outweigh the minimal CPU cost. Always enable compression.
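You can verify this on your own hardware with a micro-benchmark along these lines (the repetitive sample payload is made up for illustration):

```python
import gzip
import time

# Roughly 1 MB of repetitive, HTML-like text (made-up sample payload)
payload = b"<div class='row'>hello world</div>" * 30_000
compressed = gzip.compress(payload)

start = time.perf_counter()
for _ in range(100):
    gzip.decompress(compressed)
elapsed_ms = (time.perf_counter() - start) * 1000 / 100

print(f"Compressed {len(payload):,} -> {len(compressed):,} bytes")
print(f"Average decompress time: {elapsed_ms:.2f} ms")
```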
How do I know which optimization gives the biggest improvement?
Measure your baseline first. Profile a sample of requests to see the average page size breakdown (HTML, JS, CSS, images). Block the largest category first. Typically, blocking images gives the biggest single improvement (40-50% savings), followed by CSS and fonts.
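One way to get that baseline is to tally transferred bytes per resource type with Playwright's response events, as in this sketch (the counts rely on the Content-Length header being present, so treat them as approximate):

```python
from collections import defaultdict
from playwright.async_api import async_playwright

async def profile_page_weight(url: str) -> dict:
    """Tally approximate transferred bytes per resource type."""
    bytes_by_type = defaultdict(int)

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        def on_response(response):
            # Content-Length may be missing for chunked responses
            size = int(response.headers.get("content-length", 0) or 0)
            bytes_by_type[response.request.resource_type] += size

        page.on("response", on_response)
        await page.goto(url, wait_until="networkidle")
        await browser.close()

    # Largest categories first: block those
    return dict(sorted(bytes_by_type.items(), key=lambda kv: -kv[1]))
```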
Related Reading
- AJAX Request Interception: Scraping API Calls Directly
- Azure Functions for Serverless Web Scraping: the Complete Guide
- Build an Anti-Detection Test Suite: Verify Browser Stealth
- Build a News Crawler in Python: Step-by-Step Tutorial
- How to Configure Proxies on iPhone and Android
- How to Use Proxies in Node.js (Axios, Fetch, Puppeteer)