HTTP Headers Reference: Complete Guide for Proxy Users & Web Scrapers

HTTP headers are the metadata sent between clients and servers with every request and response. For proxy users, web scrapers, and API developers, understanding headers is essential. The wrong headers can get you blocked, while the right ones make your requests indistinguishable from normal browser traffic. This reference covers every important header with practical examples.

Request Headers

Request headers are sent by the client (browser, cURL, script) to the server.

Essential Request Headers

User-Agent

Identifies the client software. One of the most important headers for web scraping.

# Browser-like User-Agent
curl -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
     https://example.com

# Mobile User-Agent
curl -H "User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1" \
     https://example.com

Accept

Tells the server what content types the client can handle:

# JSON API
curl -H "Accept: application/json" https://api.example.com/data

# HTML page
curl -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" https://example.com

# Any content
curl -H "Accept: */*" https://example.com

Accept-Language

Specifies preferred languages. Important for geo-targeted content:

curl -H "Accept-Language: en-US,en;q=0.9" https://example.com
curl -H "Accept-Language: ja-JP,ja;q=0.9,en;q=0.5" https://example.com

Accept-Encoding

Indicates supported compression:

curl -H "Accept-Encoding: gzip, deflate, br" --compressed https://example.com

Referer

Shows which page linked to the current request:

curl -H "Referer: https://www.google.com/" https://example.com/page

Cookie

Sends stored cookies to the server:

curl -b "session_id=abc123; user_pref=dark_mode" https://example.com/dashboard

Authentication Headers

Authorization

# Basic Auth
curl -H "Authorization: Basic dXNlcjpwYXNz" https://api.example.com

# Bearer Token
curl -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIs..." https://api.example.com

# API Key
curl -H "X-API-Key: your-api-key-here" https://api.example.com

Content Headers (for POST/PUT/PATCH)

Content-Type

# JSON
curl -H "Content-Type: application/json" -d '{"key":"value"}' https://api.example.com

# Form data
curl -H "Content-Type: application/x-www-form-urlencoded" -d "key=value" https://example.com

# Multipart (file upload) - cURL sets this automatically with -F
curl -F "file=@photo.jpg" https://example.com/upload

Content-Length

Automatically set by cURL based on the body size. Rarely needs manual configuration.
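
If you ever need to verify it, Content-Length is the byte length of the encoded body, not the character count (they differ for non-ASCII payloads). A quick Python check:

```python
import json

# Content-Length equals the byte length of the encoded body.
body = json.dumps({"key": "value"})
content_length = len(body.encode("utf-8"))
print(content_length)  # 16 bytes for '{"key": "value"}'
```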

Response Headers

Headers returned by the server to the client.

Status and Content Headers

| Header | Description | Example Value |
|---|---|---|
| Content-Type | Response body format | application/json; charset=utf-8 |
| Content-Length | Response body size in bytes | 4523 |
| Content-Encoding | Compression used | gzip |
| Content-Language | Response language | en-US |
| Content-Disposition | Download filename hint | attachment; filename="report.pdf" |

Caching Headers

| Header | Description | Example Value |
|---|---|---|
| Cache-Control | Caching directives | max-age=3600, public |
| ETag | Resource version identifier | "33a64df551425fcc55e4d42a148795d9f25f89d4" |
| Last-Modified | When resource was last changed | Wed, 15 Jan 2025 08:00:00 GMT |
| Expires | When cached content expires | Thu, 16 Jan 2025 08:00:00 GMT |
| Age | Time since response was cached (seconds) | 3600 |

Security Headers

| Header | Description | Example Value |
|---|---|---|
| Strict-Transport-Security | Force HTTPS | max-age=31536000; includeSubDomains |
| X-Content-Type-Options | Prevent MIME sniffing | nosniff |
| X-Frame-Options | Prevent clickjacking | DENY |
| Content-Security-Policy | Control resource loading | default-src 'self' |
| X-XSS-Protection | XSS filter (deprecated in modern browsers) | 1; mode=block |
| Referrer-Policy | Control Referer header | strict-origin-when-cross-origin |
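
When auditing a site, you can check which of these headers are missing from a response. A minimal Python sketch; the header list is an illustrative subset and the sample response is hypothetical:

```python
# Security headers commonly expected on production sites (illustrative subset).
EXPECTED = [
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
]

def missing_security_headers(headers):
    """Return expected security headers absent from a response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED if h.lower() not in present]

# Hypothetical response headers from a server under audit
resp = {"Content-Type": "text/html", "X-Frame-Options": "DENY"}
print(missing_security_headers(resp))
```

Header names are case-insensitive in HTTP, hence the lowercase comparison.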

Rate Limiting Headers

| Header | Description | Example Value |
|---|---|---|
| X-RateLimit-Limit | Max requests per window | 100 |
| X-RateLimit-Remaining | Requests left in window | 47 |
| X-RateLimit-Reset | When window resets (Unix timestamp) | 1705312800 |
| Retry-After | Seconds to wait after 429/503 | 60 |

# Check rate limit headers
curl -s -I https://api.example.com/data | grep -i "rate\|retry"

CORS Headers

| Header | Description | Example Value |
|---|---|---|
| Access-Control-Allow-Origin | Allowed origins | * or https://app.example.com |
| Access-Control-Allow-Methods | Allowed HTTP methods | GET, POST, PUT, DELETE |
| Access-Control-Allow-Headers | Allowed request headers | Authorization, Content-Type |
| Access-Control-Max-Age | Preflight cache time (seconds) | 86400 |

Proxy-Specific Headers

Headers Added by Proxies

| Header | Description | Impact |
|---|---|---|
| X-Forwarded-For | Client's original IP | Reveals you are using a proxy |
| X-Forwarded-Proto | Original protocol (http/https) | Can expose proxy usage |
| X-Forwarded-Host | Original Host header | May reveal proxy |
| Via | Proxy chain information | Directly identifies proxy |
| Forwarded | Standardized forwarding info (RFC 7239) | Modern replacement for the X-Forwarded-* headers |
| X-Real-IP | Single client IP | Common in Nginx setups |

Detecting Proxy Headers

Check if your proxy leaks identifying headers:

# Check what headers the target sees
curl -x http://proxy:8080 https://httpbin.org/headers | jq '.'

# Look for proxy-revealing headers
curl -x http://proxy:8080 https://httpbin.org/headers | \
  jq '.headers | to_entries[] | select(.key | test("forward|via|proxy|real.ip"; "i"))'
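
The same check can be run offline against any headers dict, such as the JSON returned by httpbin. A Python sketch mirroring the jq filter above; the sample header values are illustrative:

```python
import re

# Same pattern as the jq filter: header names that suggest a proxy.
REVEALING = re.compile(r"forward|via|proxy|real.?ip", re.IGNORECASE)

def leaked_headers(headers):
    """Return the subset of headers that could reveal proxy usage."""
    return {k: v for k, v in headers.items() if REVEALING.search(k)}

# Hypothetical headers as the target server might see them
seen = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 ...",
    "X-Forwarded-For": "203.0.113.7",
    "Via": "1.1 squid-proxy",
}
print(leaked_headers(seen))  # only X-Forwarded-For and Via match
```

An empty result from a real probe suggests an elite/high-anonymity proxy.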

Proxy Authentication Headers

# Proxy-Authorization (sent to proxy)
curl -x http://proxy:8080 \
     -H "Proxy-Authorization: Basic dXNlcjpwYXNz" \
     https://example.com

# vs Authorization (sent to target server)
curl -H "Authorization: Bearer token" https://api.example.com

Key difference:

  • Authorization: Authenticates with the target server
  • Proxy-Authorization: Authenticates with the proxy server
  • Proxy-Authenticate: Server tells client that proxy auth is needed (407 response)
  • WWW-Authenticate: Server tells client that server auth is needed (401 response)
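
Both headers carry the same Basic credential format (base64 of user:pass); only the recipient differs. A small Python sketch, with a hypothetical Bearer token for the target server:

```python
import base64

def basic_credentials(user, password):
    """Encode user:pass as a Basic credential string, used by both headers."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

headers = {
    # Authenticates with the target server (hypothetical token)
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiIs...",
    # Authenticates with the proxy; matches the cURL example above
    "Proxy-Authorization": basic_credentials("user", "pass"),
}
print(headers["Proxy-Authorization"])  # Basic dXNlcjpwYXNz
```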

Headers for Web Scraping Anti-Detection

Realistic Browser Headers

curl -s \
     -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
     -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8" \
     -H "Accept-Language: en-US,en;q=0.9" \
     -H "Accept-Encoding: gzip, deflate, br" \
     -H "Connection: keep-alive" \
     -H "Upgrade-Insecure-Requests: 1" \
     -H "Sec-Fetch-Dest: document" \
     -H "Sec-Fetch-Mode: navigate" \
     -H "Sec-Fetch-Site: none" \
     -H "Sec-Fetch-User: ?1" \
     -H "Cache-Control: max-age=0" \
     --compressed \
     https://example.com

Python Header Rotation

import requests
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]

ACCEPT_LANGUAGES = [
    "en-US,en;q=0.9",
    "en-GB,en;q=0.9",
    "en-US,en;q=0.9,es;q=0.8",
]

def get_realistic_headers():
    """Generate realistic browser-like headers."""
    ua = random.choice(USER_AGENTS)

    headers = {
        "User-Agent": ua,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": random.choice(ACCEPT_LANGUAGES),
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Cache-Control": "max-age=0",
    }

    # Add Chrome-specific headers if Chrome UA
    if "Chrome" in ua:
        headers["Sec-Ch-Ua"] = '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"'
        headers["Sec-Ch-Ua-Mobile"] = "?0"
        headers["Sec-Ch-Ua-Platform"] = '"Windows"' if "Windows" in ua else '"macOS"'

    return headers

# Usage with proxy
response = requests.get(
    "https://example.com",
    headers=get_realistic_headers(),
    proxies={"https": "http://user:pass@proxy:8080"}
)

Inspecting Headers with cURL

View Response Headers Only

curl -I https://example.com
# or
curl --head https://example.com

View Both Request and Response Headers

curl -v https://example.com 2>&1 | grep -E "^[<>]"
# > lines = sent (request headers)
# < lines = received (response headers)

View Headers Through a Proxy

curl -v -x http://proxy:8080 https://example.com 2>&1 | grep -E "^[<>]"

Save Headers to File

curl -D headers.txt -o body.html https://example.com
cat headers.txt

Header Troubleshooting Table

| Issue | Symptom | Header to Check/Fix |
|---|---|---|
| Blocked as bot | 403 Forbidden | User-Agent, Accept, Sec-Fetch-* |
| Wrong content type | Garbled response | Accept, Accept-Encoding |
| Authentication failed | 401 Unauthorized | Authorization |
| Proxy auth failed | 407 Proxy Authentication Required | Proxy-Authorization |
| Rate limited | 429 Too Many Requests | X-RateLimit-Remaining, Retry-After |
| Redirect loop | Too many redirects | Location, Referer |
| CORS error | Browser blocks response | Access-Control-Allow-Origin |
| Caching issues | Stale data | Cache-Control, ETag, If-None-Match |
| Proxy detected | Different results vs browser | X-Forwarded-For, Via |
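
For the caching row, revalidation headers are built from the previous response's ETag and Last-Modified values. A minimal sketch; the cached header values are illustrative:

```python
def conditional_headers(cached_response_headers):
    """Build If-None-Match / If-Modified-Since from a cached response's headers."""
    h = {}
    if "ETag" in cached_response_headers:
        h["If-None-Match"] = cached_response_headers["ETag"]
    if "Last-Modified" in cached_response_headers:
        h["If-Modified-Since"] = cached_response_headers["Last-Modified"]
    return h

# Hypothetical headers saved from an earlier response
cached = {"ETag": '"33a64df5"', "Last-Modified": "Wed, 15 Jan 2025 08:00:00 GMT"}
print(conditional_headers(cached))
```

A 304 Not Modified reply to a request carrying these headers means the cached copy is still valid.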

FAQ

What HTTP headers should I set for web scraping?

At minimum, set User-Agent to a current browser string, Accept to match what browsers send, Accept-Language to a common locale, and Accept-Encoding: gzip, deflate, br. For better stealth, also include Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site, and Sec-Ch-Ua headers that match your User-Agent. Rotate User-Agents between requests and set Referer to mimic natural browsing patterns.

How do I check if my proxy is leaking headers?

Send a request through your proxy to a header-inspection service like https://httpbin.org/headers or https://ifconfig.me/all. Look for X-Forwarded-For, Via, X-Real-IP, or Forwarded headers in the response. Elite/high-anonymity proxies should not add any of these headers. Transparent proxies add all of them, and anonymous proxies add some but mask your real IP.

What is the difference between Authorization and Proxy-Authorization headers?

Authorization authenticates you with the target web server (e.g., API authentication with Basic auth or Bearer tokens). Proxy-Authorization authenticates you with the proxy server itself. Both can be present simultaneously in the same request. A 401 status code means the target server rejected your Authorization header, while a 407 means the proxy rejected your Proxy-Authorization header.

How do I handle rate limit headers in my scraper?

Check X-RateLimit-Remaining in each response. When it reaches zero, read X-RateLimit-Reset for the reset timestamp and Retry-After for the wait duration. In Python: remaining = int(response.headers.get("X-RateLimit-Remaining", 1)). If remaining is zero, sleep until the reset time. Distribute requests across multiple proxies to effectively multiply your rate limit allowance.
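
The logic above can be sketched as a small helper (the X-RateLimit-* names follow the unofficial convention shown earlier; not every API uses them):

```python
import time

def seconds_to_wait(headers, now=None):
    """How long to sleep before the next request, based on rate-limit headers."""
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0.0
    # Prefer Retry-After (relative seconds); fall back to the reset timestamp.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    now = time.time() if now is None else now
    return max(0.0, float(headers.get("X-RateLimit-Reset", now)) - now)

print(seconds_to_wait({"X-RateLimit-Remaining": "0", "Retry-After": "60"}))  # 60.0
```

Call it on each response and time.sleep() for the returned duration before the next request.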

Why do some websites return different content when I use cURL vs a browser?

Websites use header fingerprinting to detect non-browser clients. cURL’s default headers (User-Agent: curl/8.x, no Accept-Language, no Sec-Fetch headers) are easily identified. Websites may serve different content, block requests, or serve CAPTCHAs. To get browser-identical responses, replicate the full set of headers your target browser sends, including Sec-Ch-Ua, Sec-Fetch-* headers, and proper Accept values. Use browser DevTools Network tab to copy exact headers.

