Networking Terms for Scrapers: Complete Glossary 2026
Web scraping operates at the intersection of networking and data extraction. Understanding networking fundamentals helps scrapers debug connection issues, optimize performance, and evade detection. This glossary covers the networking terms most relevant to web scraping professionals.
Core Networking Concepts
TCP (Transmission Control Protocol)
The reliable transport protocol underlying HTTP. TCP establishes connections through a three-way handshake (SYN, SYN-ACK, ACK) and guarantees ordered, error-checked delivery. Understanding TCP helps diagnose connection timeouts and proxy issues.
IP Address
Unique numerical identifier for devices on a network. IPv4 (e.g., 192.168.1.1) has ~4.3 billion addresses; IPv6 (e.g., 2001:0db8:85a3::8a2e:0370:7334) has 340 undecillion. Rotating proxy services can assign a different IP address to each request.
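As a rough illustration of the address sizes above, the sketch below uses Python's standard `ipaddress` module to classify an address and compute the size of each address space. The helper name `describe_ip` is made up for this example.

```python
import ipaddress

def describe_ip(text):
    """Return (IP version, is_private) for an IPv4 or IPv6 address string."""
    address = ipaddress.ip_address(text)
    return address.version, address.is_private

# Size of each address space: 32-bit for IPv4, 128-bit for IPv6.
IPV4_TOTAL = 2 ** 32
IPV6_TOTAL = 2 ** 128

print(describe_ip("192.168.1.1"))   # a private (RFC 1918) IPv4 address
print(describe_ip("8.8.8.8"))       # a public IPv4 address
print(f"IPv4 addresses: {IPV4_TOTAL:,}")
print(f"IPv6 addresses: {IPV6_TOTAL:.2e}")
```

A check like this can help confirm that a proxy exit address is a public address rather than a private or misconfigured one.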
DNS (Domain Name System)
Translates domain names (google.com) to IP addresses (142.250.80.46). DNS queries sent outside the proxy can reveal scraping activity to the local resolver or network operator. DNS-over-HTTPS (DoH) encrypts lookups so that on-path observers cannot read them.
Port
Numbered endpoint for network communication. Common ports: 80 (HTTP), 443 (HTTPS), 1080 (SOCKS), 8080 (HTTP proxy), 3128 (Squid proxy). Proxy connections typically use ports 80, 443, or custom ports.
Bandwidth
Maximum data transfer rate of a network connection. Measured in Mbps or Gbps. Proxy bandwidth directly affects scraping speed: residential proxies typically offer 10-50 Mbps, versus 100+ Mbps for datacenter proxies.
Latency
Time delay between sending a request and receiving the first byte of response. Measured in milliseconds. Lower latency = faster scraping. Proxy latency ranges from 50ms (datacenter) to 500ms (international residential).
HTTP/HTTPS Concepts
HTTP (Hypertext Transfer Protocol)
Application-layer protocol for web communication. Scrapers send HTTP requests (GET, POST) and parse HTTP responses. Understanding HTTP headers, status codes, and methods is fundamental to scraping.
HTTPS (HTTP Secure)
HTTP encrypted with TLS/SSL. Nearly all modern websites use HTTPS. Proxy servers can either tunnel HTTPS (CONNECT method) or terminate and re-encrypt (MITM proxy).
HTTP/2
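Binary protocol replacing HTTP/1.1’s text format. Features multiplexing (multiple requests over one connection), header compression (HPACK), and server push (since removed from Chrome). The values a client sends in its HTTP/2 SETTINGS frame vary between browsers and HTTP libraries, which makes them a fingerprinting vector.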
HTTP/3 (QUIC)
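Latest HTTP version. It runs over QUIC, a transport protocol built on UDP instead of TCP. Offers faster connection establishment and better performance over unreliable networks. Growing adoption in 2026 introduces new scraping challenges, since many HTTP client libraries and proxies do not yet support QUIC.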
HTTP Headers
Metadata sent with HTTP requests and responses. Critical headers for scraping include User-Agent, Accept, Accept-Language, Cookie, Referer, and browser-generated headers such as client hints (Sec-CH-UA) and fetch metadata (Sec-Fetch-*).
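A minimal sketch of attaching browser-like headers to a request, using only the standard library's `urllib.request.Request`. The header values, the `ExampleBrowser` string, and the `build_request` helper are illustrative assumptions, not values a real browser sends.

```python
from urllib.request import Request

# Illustrative values only; real browsers send version-specific strings.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
}

def build_request(url, extra_headers=None):
    """Build (but do not send) a request carrying the default headers."""
    headers = dict(BROWSER_LIKE_HEADERS)
    headers.update(extra_headers or {})
    return Request(url, headers=headers)

request = build_request("https://example.com/products", {"Cookie": "session=abc123"})

# urllib stores header names capitalized as "Accept-language", "User-agent", etc.
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Keeping the header set in one place makes it easier to keep the values mutually consistent, which is what header-based detection tends to check.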
HTTP Status Codes
Numerical codes indicating request outcomes. Key codes for scrapers:
| Code | Meaning | Scraping Action |
|---|---|---|
| 200 | OK | Parse response |
| 301/302 | Redirect | Follow redirect |
| 403 | Forbidden | Change proxy/headers |
| 407 | Proxy Auth Required | Check proxy credentials |
| 429 | Too Many Requests | Slow down, rotate proxy |
| 500 | Server Error | Retry later |
| 503 | Service Unavailable | Retry, check if blocked |
| 520-530 | Cloudflare-specific errors (usually an origin server problem) | Retry later; check whether the origin is down |
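The table above can be expressed as a small dispatch function. This is a sketch only: the action names are invented labels for this example, and a real scraper would attach its own retry and rotation logic to each one.

```python
def action_for_status(status):
    """Map an HTTP status code to a coarse scraping action label."""
    if status == 200:
        return "parse"
    if status in (301, 302):
        return "follow_redirect"
    if status == 403:
        return "change_proxy_or_headers"
    if status == 407:
        return "check_proxy_credentials"
    if status == 429:
        return "slow_down_and_rotate"
    if status == 503:
        return "retry_and_check_block"
    if 520 <= status <= 530:
        return "handle_cloudflare_error"
    if 500 <= status <= 599:
        return "retry_later"
    # Anything not covered by the table needs a human look.
    return "inspect_manually"

for code in (200, 302, 403, 407, 429, 500, 503, 522):
    print(code, action_for_status(code))
```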
Cookies
Small pieces of data that a server sets via the Set-Cookie header; the client stores them and sends them back with subsequent requests. Cookies maintain sessions, track users, and are essential for authenticated scraping.
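To make the round trip concrete, the sketch below parses `Set-Cookie` values with the standard library's `http.cookies.SimpleCookie` and builds the `Cookie` header a client would send back. The cookie names and values are made up, and the helper ignores attributes such as `Path` and expiry that a real cookie jar would enforce.

```python
from http.cookies import SimpleCookie

def cookie_header_from_set_cookie(set_cookie_values):
    """Turn a list of Set-Cookie header values into one Cookie header value."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)
    # Only name=value pairs are sent back; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

received = [
    "session=abc123; Path=/; HttpOnly",
    "theme=dark; Max-Age=3600",
]
print(cookie_header_from_set_cookie(received))
```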
Request Methods
HTTP verbs: GET (retrieve data), POST (submit data), PUT (update), DELETE (remove), HEAD (headers only), OPTIONS (capabilities). Most scraping uses GET; form submission uses POST.
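A short sketch of the two methods scrapers use most, built with the standard library without sending anything. The URLs, parameter names, and the `build_get` / `build_post` helpers are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_get(url, params):
    """GET: parameters travel in the query string."""
    return Request(f"{url}?{urlencode(params)}", method="GET")

def build_post(url, form):
    """POST: form fields travel in the request body."""
    body = urlencode(form).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

search = build_get("https://example.com/search", {"q": "proxy", "page": 2})
login = build_post("https://example.com/login", {"user": "alice", "lang": "en"})

print(search.get_method(), search.full_url)
print(login.get_method(), login.full_url, login.data)
```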
TLS/SSL Terms
TLS (Transport Layer Security)
Cryptographic protocol securing data in transit. TLS handshake characteristics (cipher suites, extensions, curves) create fingerprints used for bot detection (JA3/JA4).
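For a sense of how a JA3 fingerprint is formed: it joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) as decimal values and takes the MD5 of the resulting string. The sketch below assumes those values have already been extracted from a handshake; the numbers used are illustrative rather than captured from a real client, and it does not filter GREASE values as JA3 tooling does.

```python
import hashlib

def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string; list items are dash-separated."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(tls_version)]
    fields += ["-".join(str(value) for value in group) for group in groups]
    return ",".join(fields)

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string."""
    text = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()

# Illustrative ClientHello values, not taken from a real capture.
example = (771, [4865, 4866], [0, 10, 11], [29, 23], [0])
print(ja3_string(*example))
print(ja3_hash(*example))
```

Because the hash covers the order of ciphers and extensions, two clients offering the same set in a different order produce different fingerprints.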
SSL Certificate
Digital certificate authenticating a website’s identity. Self-signed or invalid certificates may indicate suspicious sites. A MITM proxy requires the client to trust the proxy’s CA certificate.
SNI (Server Name Indication)
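TLS extension that indicates the hostname the client wants to connect to. Allows multiple HTTPS sites on one IP. The SNI value is sent in plaintext in the ClientHello, so it is visible to network observers even though the rest of the session is encrypted (unless Encrypted Client Hello is in use).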
ALPN (Application-Layer Protocol Negotiation)
TLS extension for selecting the application protocol (identifiers http/1.1, h2, h3). The list of protocols a client offers is part of the TLS fingerprint that can identify scraping tools.
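A minimal sketch of where SNI and ALPN appear in client code, using Python's standard `ssl` module. The function names are invented for this example; `negotiated_protocol` needs network access, so it is defined but not called here. Python's `ssl` module works over TCP, so `h3` cannot be negotiated this way.

```python
import socket
import ssl

def make_client_context(alpn_protocols=("h2", "http/1.1")):
    """Create a verifying TLS context that offers the given ALPN protocols."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(list(alpn_protocols))
    return context

def negotiated_protocol(host, port=443, timeout=10.0):
    """Connect, send `host` as the SNI value, and return the ALPN choice."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname is what goes into the SNI extension.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()

context = make_client_context()
print(context.check_hostname, context.verify_mode)
```

Calling `negotiated_protocol("example.com")` on a networked machine would return the protocol the server selected, or `None` if the server did not negotiate ALPN.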
Proxy-Specific Terms
Forward Proxy
Server that sits between the client and the internet, forwarding requests on behalf of the client. This is what “proxy” typically means in a web scraping context.
Reverse Proxy
Server that sits in front of web servers, forwarding client requests to backend servers. CDNs like Cloudflare act as reverse proxies. Not used for scraping directly.
SOCKS Proxy
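Protocol-independent proxy that operates at a lower level than HTTP proxies, relaying connections without interpreting the application protocol. SOCKS5 supports authentication, UDP, and hostname resolution on the proxy side. Often used for non-HTTP traffic.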
HTTP CONNECT
HTTP method used to establish a tunnel through a proxy for HTTPS connections. The proxy creates a TCP connection to the target and relays data without inspecting it.
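The tunnel request itself is a few lines of text. The sketch below builds the bytes a client would send to a proxy, optionally with Basic proxy credentials; the helper name and the credentials are assumptions for illustration. After the proxy answers with a 2xx status, the client starts the TLS handshake over the same socket.

```python
import base64

def build_connect_request(host, port, username=None, password=None):
    """Return the raw CONNECT request for tunnelling to host:port."""
    target = f"{host}:{port}"
    lines = [f"CONNECT {target} HTTP/1.1", f"Host: {target}"]
    if username is not None:
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

print(build_connect_request("example.com", 443).decode("ascii"))
print(build_connect_request("example.com", 443, "user", "pass").decode("ascii"))
```

A 407 response to this request means the `Proxy-Authorization` header was missing or rejected.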
Proxy Chaining
Routing traffic through multiple proxies in sequence for additional anonymity. Increases latency but makes tracking more difficult.
Sticky Session
Proxy configuration that maintains the same IP address for multiple requests within a time window (typically 1-30 minutes). Essential for stateful scraping tasks.
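Providers usually implement sticky sessions on their side, but the idea can be sketched client-side: map a session ID to one proxy until a time window expires. The `StickySessions` class, the proxy URLs (documentation-range addresses), and the injectable clock are all assumptions made for this example.

```python
import time

class StickySessions:
    """Keep each session on one proxy until its time window expires."""

    def __init__(self, proxies, ttl_seconds=600, clock=time.monotonic):
        self._proxies = list(proxies)
        self._ttl = ttl_seconds
        self._clock = clock
        self._assigned = {}  # session id -> (proxy, expiry time)
        self._next = 0

    def proxy_for(self, session_id):
        now = self._clock()
        entry = self._assigned.get(session_id)
        if entry is not None and entry[1] > now:
            return entry[0]  # still inside the sticky window
        # Window expired or new session: take the next proxy in order.
        proxy = self._proxies[self._next % len(self._proxies)]
        self._next += 1
        self._assigned[session_id] = (proxy, now + self._ttl)
        return proxy

sessions = StickySessions(
    ["http://203.0.113.1:8080", "http://203.0.113.2:8080", "http://203.0.113.3:8080"],
    ttl_seconds=600,
)
print(sessions.proxy_for("checkout-flow"))
print(sessions.proxy_for("checkout-flow"))  # same proxy within the window
```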
IP Rotation
Automatically changing the proxy IP address between requests or at set intervals. Prevents rate limiting and distributes request load across many IPs.
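Per-request rotation can be sketched as a round-robin over a proxy list. The proxy URLs below use documentation-range addresses and the `assign_proxies` helper is hypothetical; production rotation usually also removes proxies that start failing.

```python
from itertools import cycle

def assign_proxies(urls, proxy_urls):
    """Pair each target URL with the next proxy, wrapping around the pool."""
    pool = cycle(proxy_urls)
    return [(url, next(pool)) for url in urls]

targets = [f"https://example.com/page/{n}" for n in range(1, 6)]
proxies = ["http://203.0.113.1:8080", "http://203.0.113.2:8080"]

for url, proxy in assign_proxies(targets, proxies):
    print(f"{url} via {proxy}")
```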
Backconnect Proxy
Proxy gateway that automatically selects and rotates IPs from a pool. The client connects to a single endpoint; the proxy provider handles IP selection and rotation.
DNS Concepts
DNS Resolution
Process of translating a domain name to an IP address through DNS queries. Slow or incorrect DNS resolution can impact scraping performance.
DNS Leak
When DNS queries bypass the proxy and use the default DNS server, revealing the scraper’s real location. DNS leak prevention is important for anonymous scraping.
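One common source of DNS leaks with SOCKS5 is the proxy URL scheme: in curl and in Python's `requests` (with SOCKS support installed), `socks5://` resolves hostnames locally while `socks5h://` asks the proxy to resolve them. The helper below is a small sketch that rewrites the scheme; its name and the example proxy address are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

def force_remote_dns(proxy_url):
    """Rewrite socks5:// to socks5h:// so hostnames are resolved by the proxy."""
    parts = urlsplit(proxy_url)
    if parts.scheme == "socks5":
        parts = parts._replace(scheme="socks5h")
    return urlunsplit(parts)

proxy = force_remote_dns("socks5://user:pass@203.0.113.10:1080")
print(proxy)

# Shape of the mapping that requests-style clients accept for proxies.
proxies = {"http": proxy, "https": proxy}
print(proxies)
```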
DNS over HTTPS (DoH)
Protocol that encrypts DNS queries within HTTPS, preventing ISPs and middleboxes from seeing DNS lookups. Relevant for privacy-focused scraping operations.
TTL (Time to Live)
Duration a DNS record should be cached. Short TTLs indicate frequently changing IPs (common with CDNs). Scrapers should respect DNS TTLs for correct target resolution.
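Respecting TTLs amounts to caching an answer only until it expires. The sketch below assumes a resolver callable that returns both the addresses and the record TTL; Python's `socket.getaddrinfo` does not expose TTLs, so in practice that callable would wrap a DNS library that does. The class name, the `fake_resolver`, and the injectable clock are all hypothetical.

```python
import time

class TtlDnsCache:
    """Cache DNS answers and re-resolve once the record's TTL has passed."""

    def __init__(self, resolver, clock=time.monotonic):
        # resolver: callable taking a hostname, returning (addresses, ttl_seconds)
        self._resolver = resolver
        self._clock = clock
        self._entries = {}  # hostname -> (addresses, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]
        addresses, ttl = self._resolver(hostname)
        self._entries[hostname] = (addresses, now + ttl)
        return addresses

def fake_resolver(hostname):
    # Stand-in for a real lookup: fixed documentation-range address, 30 s TTL.
    return ["203.0.113.7"], 30

cache = TtlDnsCache(fake_resolver)
print(cache.lookup("example.com"))
```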
Performance Terms
Throughput
Amount of data transferred per unit of time. Measured in requests/second or pages/minute. Proxy throughput depends on bandwidth, latency, and concurrent connections.
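As a back-of-the-envelope estimate, request throughput is roughly the number of concurrent connections divided by the average time each request takes. The sketch below applies that relationship; it is an upper-bound estimate that ignores bandwidth limits, retries, and rate limiting, and the function names are made up for this example.

```python
def estimated_requests_per_second(concurrency, avg_latency_seconds):
    """Rough throughput: each connection completes 1/latency requests per second."""
    return concurrency / avg_latency_seconds

def estimated_duration_seconds(total_requests, concurrency, avg_latency_seconds):
    """Time to finish a job at the estimated throughput."""
    rate = estimated_requests_per_second(concurrency, avg_latency_seconds)
    return total_requests / rate

# 20 connections at 500 ms per request
print(estimated_requests_per_second(20, 0.5))      # 40.0
print(estimated_duration_seconds(10_000, 20, 0.5)) # 250.0
```

With the same concurrency, moving from a 500 ms proxy to a 50 ms proxy raises the estimate tenfold, which is why latency matters as much as bandwidth for small pages.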
Concurrency
Number of simultaneous connections or requests. Higher concurrency = faster scraping, but too many concurrent connections trigger rate limiting.
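One way to cap concurrency is a semaphore around each request. The sketch below uses `asyncio` from the standard library with a stand-in `fake_fetch` coroutine instead of real network calls; `fetch_all`, `demo`, and the URLs are assumptions for illustration.

```python
import asyncio

async def fetch_all(urls, fetch, max_concurrency=5):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fetch(url)

    # gather preserves the order of the input URLs in its results.
    return await asyncio.gather(*(bounded(url) for url in urls))

async def demo(max_concurrency=3):
    in_flight = 0
    peak = 0

    async def fake_fetch(url):
        # Stand-in for a real HTTP call; tracks how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return f"body of {url}"

    urls = [f"https://example.com/page/{n}" for n in range(10)]
    results = await fetch_all(urls, fake_fetch, max_concurrency)
    return results, peak

if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "responses, peak concurrency", peak)
```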
Connection Pool
Pre-established group of TCP connections reused across multiple requests. Reduces overhead of creating new connections for each request.
Keep-Alive
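Persistent-connection behavior that keeps a TCP connection open for multiple requests. HTTP/1.0 clients request it with the Connection: keep-alive header; HTTP/1.1 connections are persistent by default, and HTTP/2 multiplexes requests over one connection without the header. Improves performance and matches normal browser behavior.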
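Connection reuse can be observed locally. The sketch below starts a throwaway HTTP/1.1 server on 127.0.0.1 that replies with the client's source port, then sends several requests over one `http.client.HTTPConnection`. If the connection is reused, every reply reports the same port. The handler and function names are invented for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections by default

    def do_GET(self):
        # Reply with the client's source port for this TCP connection.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def client_ports_seen(request_count=3):
    """Send several requests over one connection; return the ports the server saw."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        ports = []
        for _ in range(request_count):
            conn.request("GET", "/")
            ports.append(int(conn.getresponse().read()))
        conn.close()
        return ports
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(client_ports_seen())
```

A connection pool applies the same idea across many hosts: it keeps connections like `conn` open and hands them out again instead of opening a new socket per request.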
FAQ
What networking knowledge do I need for web scraping?
Understanding HTTP methods and headers, status codes, cookies, DNS resolution, and proxy protocols (HTTP vs SOCKS5) covers the essential networking knowledge for effective scraping.
Why do HTTP headers matter for scraping?
Anti-bot systems analyze headers like User-Agent, Accept-Language, Sec-CH-UA, and Sec-Fetch-* to determine if requests come from real browsers. Inconsistent or missing headers are immediate bot indicators.
What causes a 403 error in scraping?
A 403 (Forbidden) typically means the website has detected and blocked your request. Common causes: datacenter IP detected, missing/incorrect headers, rate limit exceeded, or geographic restriction.
What is the difference between HTTP and SOCKS5 proxies?
HTTP proxies understand and can modify HTTP traffic, working at the application layer. SOCKS5 proxies work below the application layer, relaying TCP (and optionally UDP) traffic without interpreting it. SOCKS5 is more flexible, but HTTP proxies can add features like header modification.
Related Reading
- Anti-Bot Detection Glossary: 50+ Terms Defined
- Anti-Bot Terminology Glossary: Complete A-Z Reference 2026
- 403 Forbidden Error: What It Means & How to Fix It
- 407 Proxy Authentication Required: Fix Guide
- Backconnect Proxies Deep Dive: Architecture and Real-World Performance
- Best Proxies in Southeast Asia: Singapore, Thailand, Indonesia, Philippines