The CFAA and Web Scraping: Understanding US Legal Precedents

The Computer Fraud and Abuse Act (CFAA) is the primary federal statute that governs unauthorized access to computer systems in the United States. Originally enacted in 1986 to combat hacking, the CFAA has become one of the most frequently litigated laws in web scraping disputes. Understanding how courts interpret the CFAA in the scraping context is essential for any organization collecting data from US-based websites.

What the CFAA Actually Says

The CFAA, codified at 18 U.S.C. Section 1030, creates criminal and civil liability for anyone who “intentionally accesses a computer without authorization, or exceeds authorized access.” The statute was designed to target computer hackers, but its broad language has been applied to a wide range of activities, including web scraping.

Key Provisions

Section 1030(a)(2): Prohibits intentionally accessing a computer without authorization, or exceeding authorized access, and thereby obtaining information from any protected computer. This is the most commonly invoked provision in scraping cases.

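Section 1030(a)(5): Prohibits knowingly causing the transmission of a program, information, code, or command that intentionally causes damage without authorization to a protected computer, as well as causing damage through intentional access without authorization. This may apply when scraping causes service disruption.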

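Section 1030(g): Provides a private right of action for any person who suffers damage or loss by reason of a CFAA violation, provided the conduct involves one of the statute’s enumerated factors, most commonly a loss of at least $5,000 within a one-year period. This enables website operators to sue scrapers directly.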

The Critical Terms

Two terms are central to CFAA analysis in scraping cases:

“Without authorization”: Accessing a computer system entirely without permission.

“Exceeds authorized access”: Accessing a computer with authorization but using that access to obtain information the accessor is not entitled to obtain.

The distinction between these two concepts, and their application to web scraping, has generated enormous judicial attention.

The Evolution of CFAA Scraping Law

Early Broad Interpretation

Early cases applied the CFAA broadly to web scraping. Courts reasoned that if a website’s terms of service prohibited scraping, then scraping violated those terms and therefore “exceeded authorized access.” This interpretation effectively gave every website operator veto power over scraping through their ToS.

Notable early cases:

EF Cultural Travel v. Zefer (1st Cir. 2003): The court suggested that a website operator could define the boundary of authorization by stating its restrictions explicitly, for example through a clear notice on the site banning scrapers, so that scraping in defiance of that notice would fall outside authorized access.

United States v. Drew (C.D. Cal. 2009): The court initially allowed a CFAA prosecution based on ToS violations but later overturned the conviction, finding that tying CFAA liability to ToS violations was unconstitutionally vague.

The Narrowing Trend

Over time, courts began narrowing the CFAA’s application to scraping:

United States v. Nosal (9th Cir. 2012) (Nosal I): The Ninth Circuit, sitting en banc, held that “exceeds authorized access” in the CFAA is limited to violations of access restrictions, not use restrictions. An employee who had access to data but used it for an unauthorized purpose did not violate the CFAA. This narrowed the statute significantly.

Sandvig v. Barr (D.D.C. 2020): In a challenge brought by researchers whose planned studies of online discrimination would violate websites’ ToS, the court held that the planned conduct would not violate the CFAA, reasoning that ToS violations alone do not establish unauthorized access.

The Supreme Court Weighs In

Van Buren v. United States (2021): The Supreme Court resolved a circuit split by holding that “exceeds authorized access” applies only to those who access areas of a computer they are not authorized to access, not to those who misuse access they are otherwise authorized to have.

The Court adopted a “gates-up-or-down” framework: the CFAA is violated when someone accesses information they have no right to access (the gate is down), not when they access information they can reach but use it improperly (the gate is up but they misuse what they find).

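This decision significantly limited the CFAA’s reach and undermined the theory that ToS violations constitute CFAA violations. The Court did, however, expressly leave open whether the “gates” inquiry turns only on technological limits or also on limits contained in contracts or policies.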

The hiQ Labs v. LinkedIn Saga

The most important scraping-specific CFAA case is hiQ Labs v. LinkedIn, which wound its way through the courts over several years.

Background

hiQ Labs built a business analyzing publicly available LinkedIn profile data to predict employee turnover. LinkedIn sent hiQ a cease-and-desist letter and implemented technical measures to block hiQ’s access. hiQ sued for declaratory relief.

Ninth Circuit Decision (2022)

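After the Supreme Court vacated the Ninth Circuit’s earlier 2019 ruling and remanded the case for reconsideration in light of Van Buren, the Ninth Circuit again affirmed a preliminary injunction in hiQ’s favor. Because the appeal concerned a preliminary injunction, the court’s conclusions address how the CFAA likely applies rather than a final ruling on the merits. The court reasoned:

Accessing public data is not “without authorization”: When data is publicly available (no login required), accessing it is likely not “without authorization” under the CFAA. The concept of authorization applies to access controls (passwords, login requirements), not to unilateral restrictions imposed through ToS or cease-and-desist letters.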

A cease-and-desist does not revoke authorization: LinkedIn’s cease-and-desist letter did not convert hiQ’s access into unauthorized access. For publicly available data, there is no authorization to revoke because no authorization is required in the first place.

The analogy to breaking and entering: The court described the CFAA as an anti-intrusion statute aimed at conduct analogous to breaking and entering. A public website has no door to break through, so a website operator cannot turn access to publicly available data into a CFAA violation merely by telling someone not to access it.

Impact

The hiQ decision is widely read as supporting the following propositions, at least within the Ninth Circuit:

Scraping publicly available web data is unlikely to violate the CFAA
Website operators cannot create CFAA liability for public data through cease-and-desist letters alone
The CFAA’s “authorization” concept is tied to access controls such as login requirements, not to contractual or verbal restrictions

However, the decision left open:

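Whether other legal theories (trespass, contract breach, state computer fraud laws) might still apply; in later proceedings the district court ruled that hiQ had breached LinkedIn’s User Agreement, and the case ended in a settlement in late 2022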
Whether scraping behind authentication barriers violates the CFAA
Whether intentional evasion of technical blocking measures affects the analysis

Current State of CFAA Scraping Law

What Is Generally Permissible

Based on current precedent, the following scraping activities are generally not CFAA violations:

Scraping publicly available web pages (no login required)
Scraping despite a website’s terms of service prohibiting it
Continuing to scrape publicly available pages after receiving a cease-and-desist letter (regarding the CFAA specifically; other legal theories may still apply, and the analysis differs for login-protected data)
Using standard HTTP requests to access publicly available content

What Remains Risky

The following activities may still violate the CFAA:

Scraping behind login walls: Data that requires authentication sits behind a gate. Accessing it without permission, or continuing to access it after permission has been revoked, may constitute access without authorization
Using stolen credentials: Accessing a website with credentials obtained without the account holder’s permission is clearly “without authorization”
Circumventing technical barriers: Courts have not definitively addressed whether bypassing IP blocks, CAPTCHAs, or rate limiters constitutes unauthorized access
Causing system damage: If scraping causes service disruption, the site operator may invoke Section 1030(a)(5), so heavy request loads carry risk even when the data itself is public

Circuit Splits and Uncertainty

While the Supreme Court’s Van Buren decision and the Ninth Circuit’s hiQ decision provide strong guidance, not all circuits have addressed scraping-specific questions. Organizations operating outside the Ninth Circuit should be aware that other courts may reach different conclusions on open questions.

Practical Implications for Proxy Users

Using Proxies for Public Data Collection

Using proxy infrastructure like DataResearchTools mobile proxies to collect publicly available data from US websites is consistent with current CFAA precedent, particularly in the Ninth Circuit. The hiQ decision indicates that accessing public data does not require authorization, and routing requests through a proxy does not by itself change that analysis.

However, proxy users should be mindful of several nuances:

Rate limiting and system impact: Even if accessing public data is not “unauthorized,” causing system disruption through aggressive scraping could trigger Section 1030(a)(5). Use rate limiting and respect the target site’s capacity.
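The rate-limiting advice above can be sketched in Python as a minimum-interval limiter. This is only an illustration, not a compliance guarantee: the class name `MinIntervalLimiter` and the two-second interval are assumptions made for the example, and a reasonable rate depends on the capacity of the site being accessed.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive requests to one host."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None      # time the previous request was released

    def wait(self):
        """Pause until the minimum interval has elapsed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval_s - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Hypothetical setting: at most one request every two seconds.
limiter = MinIntervalLimiter(min_interval_s=2.0)
```

Calling `limiter.wait()` immediately before each request keeps the request rate at or below the configured ceiling, and the returned delay can be logged as evidence of throttling.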

IP rotation and blocking evasion: If a website blocks your IP address and you rotate to a new IP to continue scraping, the legal analysis is not entirely clear. Some courts might view this as circumventing an access restriction; others might not, given that the data remains publicly available. Conservative practice is to interpret IP blocking as a signal to reassess your approach.
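One way to act on the conservative practice described above is to treat certain HTTP responses as a cue to slow down or stop rather than to rotate to a new IP. The sketch below is a hypothetical policy: the function name `next_action`, the choice of status codes, and the 60-second default are assumptions, and real sites signal blocking in many other ways.

```python
def next_action(status_code, retry_after=None, default_backoff_s=60):
    """Map an HTTP status code to a conservative next step.

    Returns an (action, wait_seconds) tuple. The policy is illustrative only.
    """
    if status_code == 429:
        # Too Many Requests: slow down, honoring Retry-After when given in seconds.
        value = "" if retry_after is None else str(retry_after).strip()
        if value.isdigit():
            return ("back_off", int(value))
        # Retry-After missing or in HTTP-date form: fall back to the default.
        return ("back_off", default_backoff_s)
    if status_code in (401, 403):
        # Possibly an access restriction: stop and escalate to a human
        # reviewer instead of switching IP addresses automatically.
        return ("stop_and_review", None)
    return ("continue", 0)
```

Routing every response through a function like this keeps the decision to continue after a block in human hands, which fits the suggestion to reassess the approach when a block appears.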

CAPTCHA circumvention: Bypassing CAPTCHAs is a gray area. CAPTCHAs can be viewed as access controls (the gate is down), which might bring the CFAA back into play. Avoid automated CAPTCHA solving for US-based targets.

Documentation Best Practices

Maintain records that demonstrate CFAA compliance:

Document that target data is publicly available: Screenshots showing no login is required
Record robots.txt compliance: Evidence that you checked and respected robots.txt
Log rate limiting: Evidence that your request rates are reasonable
Track system impact: Monitoring showing that your scraping does not degrade target site performance
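The records listed above could be captured as one structured log line per request. The following Python sketch assumes a hypothetical `RequestRecord` layout; the field names are illustrative and not a legal standard, and counsel may want different or additional evidence.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RequestRecord:
    timestamp_utc: str       # ISO 8601 time the request was sent
    url: str
    status_code: int
    login_required: bool     # False documents that the page was publicly reachable
    robots_allowed: bool     # result of the robots.txt check for this URL
    delay_before_s: float    # pause enforced before the request (rate limiting)
    response_time_s: float   # observed latency, a rough proxy for target load


def to_log_line(record):
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Example values are made up for illustration.
example = RequestRecord(
    timestamp_utc="2024-01-01T00:00:00Z",
    url="https://example.com/page",
    status_code=200,
    login_required=False,
    robots_allowed=True,
    delay_before_s=2.0,
    response_time_s=0.35,
)
print(to_log_line(example))
```

Appending these lines to a write-once log gives a timestamped trail covering public availability, robots.txt checks, request pacing, and observed response times.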

Beyond the CFAA

The CFAA is not the only legal framework that applies to scraping in the US. Even if your activities do not violate the CFAA, consider:

State computer fraud laws: Many states have their own unauthorized access statutes, some broader than the CFAA. California’s Comprehensive Computer Data Access and Fraud Act (Penal Code Section 502) is particularly relevant.

Copyright law: The CFAA does not preempt copyright claims. Scraping copyrighted content may infringe even if access is authorized.

Trespass to chattels: Some courts recognize common law trespass claims against scrapers who impose excessive burdens on website infrastructure.

Breach of contract: If you agreed to terms of service (especially through clickwrap), breach of contract claims may succeed even where CFAA claims fail.

State privacy laws: California’s CCPA (as amended by the CPRA) and other state privacy laws may impose obligations when scraping personal data of state residents.

Unfair competition: State unfair competition laws may apply to scraping that gives one business an unfair advantage over another.

Key Takeaways for Scraping Operations

Do

Focus on publicly available data that requires no authentication
Implement reasonable rate limiting
Respect robots.txt directives
Document your compliance approach
Use reputable proxy infrastructure like DataResearchTools that supports compliant data collection
Monitor legal developments, as the law continues to evolve
Consult legal counsel for high-risk scraping projects
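As an illustration of the robots.txt item in the list above, Python’s standard library includes `urllib.robotparser`. The sketch below parses an inline robots.txt so that it runs without network access; the user agent string, the example rules, and the helper name `load_rules` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content and user agent, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""
USER_AGENT = "example-research-bot"

rules = load_rules(EXAMPLE_ROBOTS_TXT)

# Check each URL before requesting it, and skip anything disallowed.
public_ok = rules.can_fetch(USER_AGENT, "https://example.com/public/page")
private_ok = rules.can_fetch(USER_AGENT, "https://example.com/private/data")

# A Crawl-delay value, when present, can feed the scraper's rate limiter.
delay_s = rules.crawl_delay(USER_AGENT)
```

In a live scraper the robots.txt content would be fetched from the target site and refreshed periodically, and the result of each `can_fetch` check could be stored alongside the request as compliance evidence.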

Do Not

Scrape behind login walls without explicit authorization
Use stolen or shared credentials
Circumvent CAPTCHAs or technical access controls
Scrape at rates that could impact target site performance
Assume that avoiding CFAA liability resolves every legal risk (copyright, privacy, and other laws still apply)
Rely solely on the CFAA analysis without considering other legal theories

The Future of CFAA and Scraping

Several developments may further shape CFAA scraping law:

Legislative reform: Proposals to amend the CFAA have been introduced in Congress periodically. Some would narrow the statute further; others would expand it.

AI training cases: The wave of lawsuits over AI training data collection may produce new precedents on CFAA application to large-scale automated data collection.

Technical evolution: As websites implement more sophisticated anti-scraping measures, courts will need to address whether circumventing these measures constitutes unauthorized access.

International harmonization: As other countries develop their own computer fraud frameworks, US law may be influenced by international approaches, particularly in cross-border scraping disputes.

Conclusion

The CFAA’s application to web scraping has narrowed significantly over the past decade. Read together, the Van Buren and hiQ decisions indicate that scraping publicly available data, meaning data accessible without authentication, is unlikely to violate the CFAA, at least in the Ninth Circuit. This provides meaningful legal protection for proxy users engaged in legitimate data collection from public web pages.

However, the CFAA is just one piece of the legal puzzle. Compliant scraping requires attention to copyright, privacy, contract, and state law in addition to federal computer fraud statutes. Organizations that build comprehensive compliance programs — including responsible proxy infrastructure from providers like DataResearchTools — position themselves for sustainable, legally defensible data collection operations.