Networking Terms for Scrapers: Complete Glossary 2026

Web scraping operates at the intersection of networking and data extraction. Understanding networking fundamentals helps scrapers debug connection issues, optimize performance, and evade detection. This glossary covers the core networking terms that web scraping professionals run into.

Core Networking Concepts

TCP (Transmission Control Protocol)

The reliable transport protocol underlying HTTP/1.1 and HTTP/2. TCP establishes connections through a three-way handshake (SYN, SYN-ACK, ACK) and guarantees ordered, error-checked delivery. Understanding TCP helps diagnose connection timeouts and proxy issues.
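Since a connection timeout usually means this handshake never completed, it can help to time it directly. A minimal sketch using only Python's standard library; `tcp_connect_time` is a hypothetical helper name, and the demo listens on 127.0.0.1 so it runs without network access (in practice you would point it at a proxy's host and port):

```python
import socket
import time

def tcp_connect_time(host: str, port: int, timeout: float = 5.0) -> float:
    """Return seconds taken to complete the TCP three-way handshake."""
    start = time.perf_counter()
    # create_connection returns once SYN, SYN-ACK, ACK has finished,
    # or raises (ConnectionRefusedError, TimeoutError) if it cannot.
    with socket.create_connection((host, port), timeout=timeout):
        return time.perf_counter() - start

if __name__ == "__main__":
    # Throwaway local listener so the example needs no network access.
    with socket.socket() as server:
        server.bind(("127.0.0.1", 0))
        server.listen()
        port = server.getsockname()[1]
        print(f"handshake took {tcp_connect_time('127.0.0.1', port) * 1000:.3f} ms")
```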

IP Address

Unique numerical identifier for devices on a network. IPv4 (e.g., 192.168.1.1) has ~4.3 billion addresses; IPv6 (e.g., 2001:0db8:85a3::8a2e:0370:7334) has 340 undecillion. Proxy services let a scraper send requests from IP addresses other than its own, often a different one for each request.
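Python's `ipaddress` module can classify addresses and confirm the size of each address space (2^32 for IPv4, 2^128, about 3.4 x 10^38, for IPv6). A short sketch; `describe_ip` is an illustrative helper, not a standard function:

```python
import ipaddress

def describe_ip(text: str) -> dict:
    """Classify an address the way a proxy checker might."""
    ip = ipaddress.ip_address(text)  # raises ValueError for invalid input
    return {
        "version": ip.version,
        "private": ip.is_private,    # e.g. 10.0.0.0/8, 192.168.0.0/16
        "compressed": ip.compressed, # canonical short form
    }

# Total size of each address space: 2**32 vs 2**128.
IPV4_TOTAL = ipaddress.ip_network("0.0.0.0/0").num_addresses
IPV6_TOTAL = ipaddress.ip_network("::/0").num_addresses

print(describe_ip("192.168.1.1"))
print(describe_ip("2001:0db8:85a3::8a2e:0370:7334"))
print(f"{IPV4_TOTAL:,} IPv4 vs {IPV6_TOTAL:.2e} IPv6 addresses")
```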

DNS (Domain Name System)

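Translates domain names (google.com) to IP addresses (e.g., 142.250.80.46; the exact answer varies by location). DNS lookups that bypass the proxy can expose the scraper's real network (see DNS Leak). DNS-over-HTTPS (DoH) encrypts lookups so that ISPs and other on-path observers cannot read them.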

Port

Numbered endpoint for network communication. Common ports: 80 (HTTP), 443 (HTTPS), 1080 (SOCKS), 8080 (HTTP proxy), 3128 (Squid proxy). Proxy connections typically use ports 80, 443, or custom ports.
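Proxy URLs usually carry the port explicitly; when they do not, clients fall back to a scheme default. A minimal sketch with `urllib.parse`; the proxy host is a placeholder, and the 1080 default for SOCKS follows the convention listed above rather than any parsing rule:

```python
from urllib.parse import urlsplit

# Scheme defaults; proxies on 1080/3128/8080 are conventions, not rules.
DEFAULT_PORTS = {"http": 80, "https": 443, "socks5": 1080, "socks5h": 1080}

def proxy_endpoint(proxy_url: str) -> tuple[str, int]:
    """Split a proxy URL into (host, port), filling in the scheme default."""
    parts = urlsplit(proxy_url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

print(proxy_endpoint("http://user:pass@proxy.example.com:3128"))  # ('proxy.example.com', 3128)
print(proxy_endpoint("socks5://proxy.example.com"))                # ('proxy.example.com', 1080)
```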

Bandwidth

Maximum data transfer rate of a network connection. Measured in Mbps or Gbps. Proxy bandwidth directly affects scraping speed — residential proxies typically offer 10-50 Mbps vs 100+ Mbps for datacenter.
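Bandwidth sets a floor on transfer time. A back-of-the-envelope sketch, assuming an ideal link with no latency, TCP slow start, or protocol overhead, and a hypothetical 2 MB page:

```python
def transfer_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Ideal time to move size_bytes over a link of bandwidth_mbps (megabits/s)."""
    bits = size_bytes * 8  # bandwidth is quoted in bits, payloads in bytes
    return bits / (bandwidth_mbps * 1_000_000)

page = 2_000_000  # hypothetical 2 MB page including assets
for label, mbps in [("residential, 10 Mbps", 10), ("datacenter, 100 Mbps", 100)]:
    print(f"{label}: {transfer_seconds(page, mbps):.2f} s")
```

At 10 Mbps that works out to 1.6 s per page versus 0.16 s at 100 Mbps, before any latency is added.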

Latency

Time delay between sending a request and receiving the first byte of response. Measured in milliseconds. Lower latency = faster scraping. Proxy latency ranges from 50ms (datacenter) to 500ms (international residential).
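Latency compounds because a fresh HTTPS connection needs several round trips (RTTs) before the first response byte arrives. A rough sketch, assuming a direct connection with no extra proxy hop, no server processing time, 1 RTT for the TCP handshake, 1 RTT for a TLS 1.3 handshake (2 for TLS 1.2), and 1 RTT for the request itself:

```python
def first_byte_ms(rtt_ms: float, new_connection: bool, tls13: bool = True) -> float:
    """Rough time to first byte, counting only network round trips."""
    round_trips = 1  # the HTTP request/response itself
    if new_connection:
        round_trips += 1                   # TCP three-way handshake
        round_trips += 1 if tls13 else 2   # TLS handshake
    return round_trips * rtt_ms

for rtt in (50, 500):
    print(f"RTT {rtt} ms: new connection {first_byte_ms(rtt, True)} ms, "
          f"reused connection {first_byte_ms(rtt, False)} ms")
```

The gap between the two columns is why reusing connections matters most on high-latency proxies.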

HTTP/HTTPS Concepts

HTTP (Hypertext Transfer Protocol)

Application-layer protocol for web communication. Scrapers send HTTP requests (GET, POST) and parse HTTP responses. Understanding HTTP headers, status codes, and methods is fundamental to scraping.
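On the wire, an HTTP/1.1 request is plain text: a request line, headers, and a blank line. A minimal sketch that serializes one; the header values are placeholders and real clients send more:

```python
from typing import Optional

def build_get_request(host: str, path: str = "/", headers: Optional[dict] = None) -> bytes:
    """Serialize a minimal HTTP/1.1 GET request as it goes over the wire."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]  # Host is mandatory in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Every line ends in CRLF; an empty line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

raw = build_get_request("example.com", "/products?page=2",
                        {"User-Agent": "Mozilla/5.0", "Accept": "text/html"})
print(raw.decode())
```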

HTTPS (HTTP Secure)

HTTP over TLS (the successor to SSL). Nearly all modern websites use HTTPS. Proxy servers can either tunnel HTTPS (CONNECT method) or terminate and re-encrypt it (MITM proxy).

HTTP/2

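Binary framing protocol that replaced HTTP/1.1's text format on the wire. Features multiplexing (multiple requests over one connection), header compression (HPACK), and server push (since removed from Chrome). The client's SETTINGS values, their order, and its pseudo-header order create fingerprinting vectors.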
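A SETTINGS frame is one of the first things an HTTP/2 client sends, so its contents are easy for a server to record. A sketch that encodes the frame following the HTTP/2 specification (RFC 9113) and derives a simple order-preserving fingerprint string; the setting values are illustrative, not a capture from any particular browser:

```python
import struct

SETTINGS_FRAME_TYPE = 0x4
# Setting identifiers defined by the HTTP/2 specification.
HEADER_TABLE_SIZE, ENABLE_PUSH, MAX_CONCURRENT_STREAMS = 0x1, 0x2, 0x3
INITIAL_WINDOW_SIZE, MAX_FRAME_SIZE, MAX_HEADER_LIST_SIZE = 0x4, 0x5, 0x6

def settings_frame(settings: list[tuple[int, int]]) -> bytes:
    """Encode an HTTP/2 SETTINGS frame on stream 0 with no flags."""
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    payload = b"".join(struct.pack("!HI", ident, value) for ident, value in settings)
    # 9-byte header: 24-bit length, 8-bit type, 8-bit flags,
    # then a reserved bit plus 31-bit stream id (0 for connection-level frames).
    header = struct.pack("!I", len(payload))[1:] + struct.pack("!BBI", SETTINGS_FRAME_TYPE, 0, 0)
    return header + payload

def settings_fingerprint(settings: list[tuple[int, int]]) -> str:
    """Order-preserving 'id:value' string; both order and values distinguish clients."""
    return ";".join(f"{ident}:{value}" for ident, value in settings)

# Illustrative values only; real clients differ by vendor and version.
client_settings = [(HEADER_TABLE_SIZE, 65536), (ENABLE_PUSH, 0),
                   (INITIAL_WINDOW_SIZE, 6291456), (MAX_HEADER_LIST_SIZE, 262144)]
frame = settings_frame(client_settings)
print(len(frame), settings_fingerprint(client_settings))  # 33 1:65536;2:0;4:6291456;6:262144
```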

HTTP/3 (QUIC)

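Latest HTTP version, running over QUIC, a transport built on UDP instead of TCP. Offers faster connection establishment and better performance over unreliable networks. Growing adoption in 2026 introduces new scraping challenges: many scraping libraries still speak only HTTP/1.1 or HTTP/2, so the protocol a client uses can itself set it apart from a browser.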
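Servers typically advertise HTTP/3 through the `Alt-Svc` response header (for example `h3=":443"`), and clients that support QUIC switch over on later requests. A simplified parser sketch, assuming no commas appear inside quoted values:

```python
def alt_svc_protocols(header_value: str) -> set[str]:
    """Return the protocol IDs advertised in an Alt-Svc response header."""
    if header_value.strip() == "clear":  # server withdraws all alternatives
        return set()
    protocols = set()
    for entry in header_value.split(","):
        alternative = entry.split(";", 1)[0].strip()  # drop parameters such as ma=86400
        protocol_id = alternative.split("=", 1)[0].strip()
        if protocol_id:
            protocols.add(protocol_id)
    return protocols

advertised = alt_svc_protocols('h3=":443"; ma=86400, h2=":443"; ma=86400')
print("h3" in advertised)  # True: the server offers HTTP/3 over QUIC
```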

HTTP Headers

Metadata sent with HTTP requests and responses. Critical headers for scraping include User-Agent, Accept, Accept-Language, Cookie, Referer, client hints (Sec-CH-UA), and fetch metadata headers (Sec-Fetch-*).
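One way detection systems can use these headers is a consistency check between what the User-Agent claims and what accompanies it. A hypothetical heuristic, not any vendor's actual rule: a request claiming to be Chrome would normally also carry the client-hint and fetch-metadata headers listed in the code.

```python
# Hypothetical check: headers a Chrome request over HTTPS normally carries.
CHROME_COMPANION_HEADERS = ("sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform",
                            "sec-fetch-site", "sec-fetch-mode", "sec-fetch-dest")

def missing_companions(headers: dict[str, str]) -> list[str]:
    """List Chrome-typical headers absent from a request that claims to be Chrome."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    user_agent = next((v for k, v in headers.items() if k.lower() == "user-agent"), "")
    if "Chrome/" not in user_agent:
        return []  # this heuristic only covers Chrome claims
    return [h for h in CHROME_COMPANION_HEADERS if h not in present]

scraper_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}
print(missing_companions(scraper_headers))  # all six: an inconsistent request
```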

HTTP Status Codes

Numerical codes indicating request outcomes. Key codes for scrapers:

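| Code | Meaning | Scraping Action |
|---|---|---|
| 200 | OK | Parse response |
| 301/302 | Redirect | Follow redirect |
| 403 | Forbidden | Change proxy/headers |
| 407 | Proxy Authentication Required | Check proxy credentials |
| 429 | Too Many Requests | Slow down (honor Retry-After if sent), rotate proxy |
| 500 | Internal Server Error | Retry later |
| 503 | Service Unavailable | Retry; check whether the page is a block or challenge |
| 520-530 | Cloudflare errors (mostly failures between Cloudflare and the origin server) | Retry later; inspect the body for the Cloudflare error code |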
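A minimal dispatcher sketch for the codes above; the action names are made up for illustration, and 303, 307, and 308 are included as the other standard redirect codes:

```python
def next_action(status: int) -> str:
    """Rough mapping from HTTP status code to scraper behaviour."""
    if 200 <= status < 300:
        return "parse"
    if status in (301, 302, 303, 307, 308):
        return "follow_redirect"          # new URL is in the Location header
    if status == 403:
        return "change_proxy_or_headers"
    if status == 407:
        return "check_proxy_credentials"
    if status == 429:
        return "slow_down_and_rotate"     # honour Retry-After if present
    if 520 <= status <= 530:
        return "retry_later_cloudflare"
    if status in (500, 502, 503, 504):
        return "retry_with_backoff"
    return "log_and_skip"

for code in (200, 302, 429, 522, 418):
    print(code, next_action(code))
```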

Cookies

Small name-value pairs that a server sets with the Set-Cookie response header and the client sends back in the Cookie header on subsequent requests. Cookies maintain sessions, track users, and are essential for authenticated scraping.
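Handling cookies manually means turning `Set-Cookie` response headers into a `Cookie` request header. A sketch with the standard library's `http.cookies`; it ignores domain, path, and expiry rules, which `http.cookiejar.CookieJar` or an HTTP client's session object would enforce:

```python
from http.cookies import SimpleCookie

def cookie_header_from(set_cookie_values: list[str]) -> str:
    """Turn Set-Cookie response values into a Cookie request header value."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        jar.load(value)  # parses name=value plus attributes (Path, HttpOnly, ...)
    # Attributes stay on the client; only name=value pairs are sent back.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

print(cookie_header_from([
    "session=abc123; Path=/; HttpOnly; Secure",
    "consent=yes; Max-Age=31536000",
]))  # session=abc123; consent=yes
```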

Request Methods

HTTP verbs: GET (retrieve data), POST (submit data), PUT (update), DELETE (remove), HEAD (headers only), OPTIONS (capabilities). Most scraping uses GET; form submission uses POST.
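With Python's `urllib.request`, the method follows from how the request is built: no body means GET, a body means POST, and `method=` overrides both. The URLs and form fields are placeholders, and nothing is sent until `urlopen` is called:

```python
from urllib.parse import urlencode
from urllib.request import Request

get_req = Request("https://example.com/search?q=proxies")
# A form body (bytes) makes urllib default to POST.
post_req = Request("https://example.com/login",
                   data=urlencode({"user": "alice", "pass": "secret"}).encode())
# HEAD fetches headers only, e.g. to check size before downloading.
head_req = Request("https://example.com/large-file.zip", method="HEAD")

for req in (get_req, post_req, head_req):
    print(req.get_method(), req.full_url)
```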

TLS/SSL Terms

TLS (Transport Layer Security)

Cryptographic protocol securing data in transit. TLS handshake characteristics (cipher suites, extensions, curves) create fingerprints used for bot detection (JA3/JA4).
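JA3 builds its fingerprint from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), each written as dash-joined decimal values, then takes the MD5 of the comma-joined result. A sketch of that calculation; the input values are illustrative rather than captured from a real client, and JA4 uses a different format not shown here:

```python
import hashlib

# GREASE values (RFC 8701): 0x0a0a, 0x1a1a, ..., 0xfafa. JA3 drops them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}

def ja3(version: int, ciphers: list[int], extensions: list[int],
        curves: list[int], point_formats: list[int]) -> tuple[str, str]:
    """Return (JA3 string, MD5 hex digest) built from ClientHello fields."""
    def field(values: list[int]) -> str:
        # Decimal values joined by '-', in the order the client sent them.
        return "-".join(str(v) for v in values if v not in GREASE)

    ja3_string = ",".join([str(version), field(ciphers), field(extensions),
                           field(curves), field(point_formats)])
    # MD5 serves as a compact identifier here, not for security.
    digest = hashlib.md5(ja3_string.encode(), usedforsecurity=False).hexdigest()
    return ja3_string, digest

# Illustrative values (0x1A1A is a GREASE cipher), not a real capture.
s, digest = ja3(771, [0x1A1A, 4865, 4866, 49195], [0, 23, 65281, 10, 11, 16],
               [29, 23, 24], [0])
print(s)  # 771,4865-4866-49195,0-23-65281-10-11-16,29-23-24,0
print(digest)
```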

SSL Certificate

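Digital certificate (still commonly called an SSL certificate, though used with TLS) that authenticates a website's identity. Self-signed or invalid certificates may indicate suspicious sites. A MITM proxy presents certificates signed by its own certificate authority, so the client must be configured to trust that CA.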
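Rather than disabling certificate verification when using an intercepting proxy, a client can add the proxy's CA to its trust store. A sketch with Python's `ssl` module; `make_tls_context` is a hypothetical helper and the CA file path is a placeholder:

```python
import ssl
from typing import Optional

def make_tls_context(proxy_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verifying TLS context that can additionally trust a MITM proxy's CA."""
    ctx = ssl.create_default_context()  # system CAs, hostname checking on
    if proxy_ca_file:
        # Hypothetical path to the CA the intercepting proxy signs certificates with.
        ctx.load_verify_locations(cafile=proxy_ca_file)
    return ctx

ctx = make_tls_context()  # e.g. make_tls_context("/path/to/proxy-ca.pem")
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```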

SNI (Server Name Indication)

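TLS extension that indicates the hostname the client wants to connect to. Allows multiple HTTPS sites on one IP. SNI is sent unencrypted in the ClientHello, so observers can read it even though the rest of the traffic is encrypted, unless Encrypted Client Hello (ECH) is used.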
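The cleartext SNI claim can be checked without any network access: Python's `ssl` module can write a ClientHello into memory via `MemoryBIO`. A sketch; the hostname is a placeholder:

```python
import ssl

def client_hello(server_name: str) -> bytes:
    """Generate the ClientHello bytes Python's ssl module would send."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)  # sets SNI
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the handshake now waits for a ServerHello
    return outgoing.read()

hello = client_hello("shop.example.com")
print(hello[0] == 0x16)              # True: TLS record type 22 (handshake)
print(b"shop.example.com" in hello)  # True: the hostname travels in cleartext
```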

ALPN (Application-Layer Protocol Negotiation)

TLS extension for selecting the application protocol (http/1.1, h2, h3). The list a client advertises, and its order, are part of the TLS fingerprint that can identify scraping tools.
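A sketch of how the ALPN extension is encoded per RFC 7301, showing that the advertised list and its order end up as distinct bytes in the ClientHello. In practice `ssl.SSLContext.set_alpn_protocols()` does this encoding for you:

```python
import struct

def alpn_extension(protocols: list[str]) -> bytes:
    """Encode the ALPN ClientHello extension for the given protocols."""
    # Each protocol name gets a 1-byte length prefix, in preference order.
    proto_list = b"".join(bytes([len(p)]) + p.encode("ascii") for p in protocols)
    body = struct.pack("!H", len(proto_list)) + proto_list
    # Extension type 16 (0x0010), then a 2-byte extension length.
    return struct.pack("!HH", 16, len(body)) + body

print(alpn_extension(["h2", "http/1.1"]).hex())  # 0010000e000c02683208687474702f312e31
print(alpn_extension(["http/1.1"]).hex())         # different bytes, different fingerprint
```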

Proxy-Specific Terms

Forward Proxy

Server that sits between the client and the internet, forwarding requests on behalf of the client. This is what “proxy” typically means in a web scraping context.
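In Python's standard library, a forward proxy is configured with `urllib.request.ProxyHandler`. A minimal sketch; the proxy URL and credentials are placeholders:

```python
from urllib.request import ProxyHandler, build_opener

# Hypothetical proxy endpoint and credentials.
PROXY = "http://user:pass@proxy.example.com:8080"

# HTTPS targets are tunnelled through the proxy with CONNECT.
handler = ProxyHandler({"http": PROXY, "https": PROXY})
opener = build_opener(handler)
# opener.open("https://example.com/") would now route through the proxy.
print(handler.proxies["https"])
```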

Reverse Proxy

Server that sits in front of web servers, forwarding client requests to backend servers. CDNs like Cloudflare act as reverse proxies. Not used for scraping directly.

SOCKS Proxy

Protocol-independent proxy that operates at a lower level than HTTP proxies. SOCKS5 supports authentication and UDP. Often used for non-HTTP traffic.
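The SOCKS5 handshake (RFC 1928) is short enough to build by hand. A sketch of the client's first two messages; sending the target as a domain name (address type 0x03) lets the proxy do the DNS lookup:

```python
import struct

def socks5_greeting(with_password: bool = False) -> bytes:
    """Client greeting: version 5, then the authentication methods offered."""
    methods = [0x00, 0x02] if with_password else [0x00]  # 0x00 none, 0x02 user/pass
    return bytes([0x05, len(methods), *methods])

def socks5_connect(host: str, port: int) -> bytes:
    """CONNECT request (command 0x01) addressed by domain name (type 0x03)."""
    name = host.encode("idna")
    # version, command, reserved, address type, name length, name, 2-byte port
    return bytes([0x05, 0x01, 0x00, 0x03, len(name)]) + name + struct.pack("!H", port)

print(socks5_greeting().hex())                   # 050100
print(socks5_connect("example.com", 443).hex())  # 050100030b6578616d706c652e636f6d01bb
```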

HTTP CONNECT

HTTP method used to establish a tunnel through a proxy for HTTPS connections. The proxy creates a TCP connection to the target and relays data without inspecting it.
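A sketch of the request a client sends to open such a tunnel; the host and credentials are placeholders. Once the proxy replies with a 200 status, the client starts its TLS handshake through the tunnel. In Python's standard library, `http.client.HTTPSConnection.set_tunnel()` sends this request for you:

```python
import base64
from typing import Optional

def connect_request(host: str, port: int, user: Optional[str] = None,
                    password: Optional[str] = None) -> bytes:
    """Build the HTTP CONNECT request that asks a proxy to open a TCP tunnel."""
    target = f"{host}:{port}"
    lines = [f"CONNECT {target} HTTP/1.1", f"Host: {target}"]
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        # If the proxy rejects these credentials it answers 407.
        lines.append(f"Proxy-Authorization: Basic {token}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

print(connect_request("example.com", 443, "user", "pass").decode())
```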

Proxy Chaining

Routing traffic through multiple proxies in sequence for additional anonymity. Increases latency but makes tracking more difficult.

Sticky Session

Proxy configuration that maintains the same IP address for multiple requests within a time window (typically 1-30 minutes). Essential for stateful scraping tasks.
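Many providers implement sticky sessions on their side, often via a session parameter in the proxy credentials whose format is provider-specific. The client-side equivalent can be sketched as a map from session key to proxy with an expiry time; `StickySessions` and the proxy URLs are hypothetical, and the clock is injectable so the window is easy to test:

```python
import time
from typing import Callable

class StickySessions:
    """Pin each logical session to one proxy for `ttl` seconds (hypothetical helper)."""

    def __init__(self, proxies: list[str], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.proxies, self.ttl, self.clock = proxies, ttl, clock
        self._assigned: dict[str, tuple[str, float]] = {}
        self._next = 0

    def proxy_for(self, session_id: str) -> str:
        now = self.clock()
        hit = self._assigned.get(session_id)
        if hit and now < hit[1]:
            return hit[0]  # same IP while the window is open
        proxy = self.proxies[self._next % len(self.proxies)]
        self._next += 1
        self._assigned[session_id] = (proxy, now + self.ttl)
        return proxy

now = [0.0]
sessions = StickySessions(["http://p1.example:8000", "http://p2.example:8000"],
                          ttl=60, clock=lambda: now[0])
print(sessions.proxy_for("cart-42"))  # p1
now[0] = 30
print(sessions.proxy_for("cart-42"))  # still p1: inside the window
now[0] = 61
print(sessions.proxy_for("cart-42"))  # p2: window expired, new IP
```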

IP Rotation

Automatically changing the proxy IP address between requests or at set intervals. Helps avoid per-IP rate limits and distributes request load across many IPs.
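The simplest rotation policy is round-robin. A sketch with `itertools.cycle`; the pool entries are placeholders, and random or weighted selection are common alternatives:

```python
from itertools import cycle

# Hypothetical proxy pool; a provider or list file would supply these.
POOL = ["http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000"]
rotation = cycle(POOL)  # round-robin: each request takes the next IP

urls = [f"https://example.com/page/{n}" for n in range(1, 6)]
plan = [(url, next(rotation)) for url in urls]
for url, proxy in plan:
    print(proxy, "->", url)
```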

Backconnect Proxy

Proxy gateway that automatically selects and rotates IPs from a pool. The client connects to a single endpoint; the proxy provider handles IP selection and rotation.

DNS Concepts

DNS Resolution

Process of translating a domain name to an IP address through DNS queries. Slow or incorrect DNS resolution can impact scraping performance.
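A sketch that resolves a name through the operating system's resolver and times the lookup; it uses `localhost` so it runs offline, and `resolve` is a hypothetical helper name:

```python
import socket
import time

def resolve(hostname: str, port: int = 443) -> tuple[list[str], float]:
    """Resolve via the OS resolver; return unique addresses and seconds taken."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed

addrs, secs = resolve("localhost")
print(addrs, f"{secs * 1000:.2f} ms")
```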

DNS Leak

When DNS queries bypass the proxy and use the default DNS server, revealing the scraper’s real location. DNS leak prevention is important for anonymous scraping.
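With SOCKS proxies, whether lookups leak often comes down to the scheme in the proxy URL. A sketch of the convention used by curl and Python's requests library, where `socks5h://` sends hostnames to the proxy and `socks5://` resolves them locally; `proxy_resolves_dns` is a hypothetical helper:

```python
from urllib.parse import urlsplit

def proxy_resolves_dns(proxy_url: str) -> bool:
    """Does the proxy (rather than the client) resolve target hostnames?"""
    scheme = urlsplit(proxy_url).scheme.lower()
    if scheme in ("socks5h", "socks4a"):
        return True   # hostname is sent to the proxy
    if scheme in ("http", "https"):
        return True   # hostname travels in the request line or CONNECT
    if scheme in ("socks5", "socks4"):
        return False  # client resolves locally: possible DNS leak
    raise ValueError(f"unknown proxy scheme: {scheme}")

for url in ("socks5://proxy.example.com:1080", "socks5h://proxy.example.com:1080"):
    print(url, "-> remote DNS" if proxy_resolves_dns(url) else "-> local DNS (leak risk)")
```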

DNS over HTTPS (DoH)

Protocol that encrypts DNS queries within HTTPS, preventing ISPs and middleboxes from seeing DNS lookups. Relevant for privacy-focused scraping operations.
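RFC 8484 defines a GET form of DoH in which a wire-format DNS query is base64url-encoded into a `dns` query parameter. A sketch that builds such a URL; Cloudflare's public endpoint is used only as an example resolver, and the helper names are hypothetical:

```python
import base64
import struct

def dns_query(name: str, qtype: int = 1) -> bytes:
    """Wire-format DNS query for `name` (qtype 1 = A record), recursion desired."""
    # Header: ID 0 (RFC 8484 recommends 0 for cache friendliness), RD flag, 1 question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    labels = b"".join(bytes([len(part)]) + part.encode("ascii") for part in name.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN

def doh_get_url(name: str, resolver: str = "https://cloudflare-dns.com/dns-query") -> str:
    """RFC 8484 GET form: base64url-encoded query with padding stripped."""
    encoded = base64.urlsafe_b64encode(dns_query(name)).rstrip(b"=").decode()
    return f"{resolver}?dns={encoded}"

print(doh_get_url("example.com"))
```

Fetching that URL through the scraper's normal HTTPS path, with `Accept: application/dns-message`, should return a wire-format DNS response.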

TTL (Time to Live)

Duration a DNS record may be cached. Short TTLs often indicate frequently changing IPs (common with CDNs). Scrapers that cache DNS answers should expire them when the TTL runs out so they keep resolving the target correctly.
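A sketch of a resolver cache that honors TTLs; `TTLCache` is a hypothetical helper, the address comes from a documentation range, and the clock is injectable for testing:

```python
import time
from typing import Callable, Optional

class TTLCache:
    """Cache DNS answers only as long as their TTL allows (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._entries: dict[str, tuple[list[str], float]] = {}

    def put(self, name: str, addresses: list[str], ttl_seconds: int) -> None:
        self._entries[name] = (addresses, self.clock() + ttl_seconds)

    def get(self, name: str) -> Optional[list[str]]:
        """Cached addresses, or None when missing or expired (re-resolve then)."""
        entry = self._entries.get(name)
        if entry is None or self.clock() >= entry[1]:
            self._entries.pop(name, None)
            return None
        return entry[0]

now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("cdn.example.com", ["203.0.113.7"], ttl_seconds=60)
now[0] = 59
print(cache.get("cdn.example.com"))  # ['203.0.113.7']
now[0] = 60
print(cache.get("cdn.example.com"))  # None: TTL expired
```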

Performance Terms

Throughput

Amount of work completed per unit of time: bytes per second at the network level, or requests per second and pages per minute for a scraper. Proxy throughput depends on bandwidth, latency, and concurrent connections.
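Little's law ties these together: sustained requests per second equal the number of requests in flight divided by the average time each takes. A worked sketch, assuming bandwidth is not the bottleneck and the target does not throttle:

```python
def max_throughput(concurrency: int, avg_latency_s: float) -> float:
    """Little's law: requests/s = requests in flight / average seconds per request."""
    return concurrency / avg_latency_s

def pages_per_minute(concurrency: int, avg_latency_s: float) -> float:
    return max_throughput(concurrency, avg_latency_s) * 60

# 20 parallel connections through a proxy averaging 0.5 s per request:
print(max_throughput(20, 0.5), pages_per_minute(20, 0.5))  # 40.0 2400.0
```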

Concurrency

Number of simultaneous connections or requests. Higher concurrency = faster scraping, but too many concurrent connections can trigger rate limiting.
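In async Python, concurrency is typically capped with a semaphore. A sketch in which `asyncio.sleep` stands in for real requests (no network is used) and the URLs are placeholders:

```python
import asyncio

async def fetch(url: str, limit: asyncio.Semaphore, stats: dict) -> str:
    async with limit:  # at most max_concurrency coroutines run this block at once
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for a real network request
        stats["in_flight"] -= 1
        return url

async def crawl(urls: list[str], max_concurrency: int = 5) -> dict:
    limit = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(*(fetch(u, limit, stats) for u in urls))
    return {"fetched": len(results), "peak_concurrency": stats["peak"]}

print(asyncio.run(crawl([f"https://example.com/p/{i}" for i in range(50)])))
# {'fetched': 50, 'peak_concurrency': 5}
```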

Connection Pool

Set of open TCP connections kept and reused across multiple requests. Reduces the overhead of creating a new connection (and repeating the TLS handshake) for each request.
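Libraries such as urllib3 (used by requests) manage pools automatically. The idea can be sketched with `http.client`; `ConnectionPool` is a hypothetical helper and the host is a placeholder. Creating a connection object does not open a socket, so this runs offline:

```python
import http.client
from collections import defaultdict

class ConnectionPool:
    """Keep idle HTTP connections per (host, port) for reuse (hypothetical helper)."""

    def __init__(self, max_idle_per_host: int = 4):
        self.max_idle = max_idle_per_host
        self._idle = defaultdict(list)

    def acquire(self, host: str, port: int = 80) -> http.client.HTTPConnection:
        idle = self._idle[(host, port)]
        if idle:
            return idle.pop()  # reuse: no new TCP handshake needed
        # The socket only opens on the first request.
        return http.client.HTTPConnection(host, port, timeout=10)

    def release(self, conn: http.client.HTTPConnection) -> None:
        # Callers must fully read the previous response before releasing.
        idle = self._idle[(conn.host, conn.port)]
        if len(idle) < self.max_idle:
            idle.append(conn)
        else:
            conn.close()  # pool full: drop the extra connection

pool = ConnectionPool()
conn = pool.acquire("example.com")
# conn.request("GET", "/"); conn.getresponse().read()
pool.release(conn)
print(pool.acquire("example.com") is conn)  # True: the same connection is reused
```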

Keep-Alive

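HTTP header (Connection: keep-alive) that keeps a TCP connection open for multiple requests. HTTP/1.1 connections are persistent by default, so the explicit header matters mainly for HTTP/1.0, and HTTP/2 forbids connection-specific headers like this one. Reusing connections improves performance and matches normal browser behavior.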
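A sketch that runs a throwaway local server and sends three requests over one `http.client.HTTPConnection`; the server sees a single client port, meaning one TCP connection was reused. Handler and helper names are illustrative:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_ports = []  # client source port of every request the server handles

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections by default

    def do_GET(self):
        seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        # A known body length lets the connection stay open afterwards.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def requests_over_one_connection(n: int = 3) -> int:
    """Send n requests on one HTTPConnection; return distinct TCP connections seen."""
    seen_ports.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        for _ in range(n):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before reusing the connection
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(seen_ports))

print(requests_over_one_connection())  # 1: all requests shared one TCP connection
```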

FAQ

What networking knowledge do I need for web scraping?

Understanding HTTP methods and headers, status codes, cookies, DNS resolution, and proxy protocols (HTTP vs SOCKS5) covers the essential networking knowledge for effective scraping.

Why do HTTP headers matter for scraping?

Anti-bot systems analyze headers like User-Agent, Accept-Language, Sec-CH-UA, and Sec-Fetch-* to determine if requests come from real browsers. Inconsistent or missing headers are strong bot indicators.

What causes a 403 error in scraping?

A 403 (Forbidden) typically means the website has detected and blocked your request. Common causes: datacenter IP detected, missing/incorrect headers, rate limit exceeded, or geographic restriction.

What is the difference between HTTP and SOCKS5 proxies?

HTTP proxies understand and can modify HTTP traffic, working at the application layer. SOCKS5 proxies work at the transport layer, forwarding any type of traffic without understanding it. SOCKS5 is more flexible but HTTP proxies can add features like header modification.

