Web Scraping Legal Guide 2026: GDPR, CFAA, hiQ vs LinkedIn, and More

scraping publicly available data is generally legal in the United States and most of Europe in 2026. US courts have repeatedly held that accessing public web pages is not access “without authorization” under the CFAA, and the hiQ Labs v LinkedIn line of cases makes that explicit for scraping. but personal data falls under GDPR even if it is public, ToS violations create separate contract risk, and copyright can still protect scraped content even when the scraping itself is allowed.

this guide is informational and not legal advice. for any production scraping operation, consult a qualified attorney in the relevant jurisdictions.

the high-level rule

three legal frameworks apply to most scraping:

  1. computer access laws (CFAA in the US, Computer Misuse Act in the UK, similar in EU member states): regulate unauthorized access to computer systems
  2. data protection laws (GDPR in the EU/UK, CCPA in California, PIPL in China): regulate processing of personal data
  3. contract and tort law: ToS breach, trespass to chattels, copyright

each can apply independently. you can be compliant with one and violate another. the safest scrapers map each project against all three.

CFAA and hiQ Labs v LinkedIn

the Computer Fraud and Abuse Act (CFAA, 18 USC § 1030) is the main US law sites have tried to use against scrapers. the key question: does scraping public data constitute access “without authorization”?

the hiQ Labs Inc v LinkedIn Corp case answered no, with caveats. the timeline matters:

  • 2017: hiQ sued LinkedIn after LinkedIn sent a cease-and-desist for scraping public profiles. hiQ won a preliminary injunction.
  • 2019: 9th Circuit affirmed: scraping public data is not “without authorization” under CFAA.
  • 2021: Supreme Court narrowed CFAA generally in Van Buren v. United States, supporting hiQ’s reading, and sent the hiQ case back to the 9th Circuit for reconsideration in light of that ruling.
  • 2022: 9th Circuit reaffirmed on remand. The hiQ-LinkedIn dispute eventually settled, with the November 2022 final judgment and permanent injunction finding hiQ liable for breach of contract under LinkedIn’s User Agreement. hiQ was permanently enjoined from scraping LinkedIn data and ordered to delete data already collected.

so the hiQ ruling has two threads:

  • CFAA: hiQ won. Scraping public web data is not “unauthorized access.” This precedent is binding in the 9th Circuit and has been influential in other US courts.
  • Contract: hiQ lost. Even though scraping was not a CFAA violation, hiQ had agreed to LinkedIn’s User Agreement (which prohibits scraping), and that agreement was enforceable.

the practical takeaway: scraping public data is not a federal crime in the US, but it can still be a breach of contract if you have a binding agreement with the site (you logged in, accepted ToS, etc.) that prohibits it. the LinkedIn case is the clearest authority on this distinction.

current CFAA precedent post-Van Buren

Van Buren v United States (Supreme Court, 2021) tightened CFAA broadly. the Court ruled CFAA’s “exceeds authorized access” clause applies only to information you have no right to access at all, not to information you have access to but use for an improper purpose.

for scrapers this means:

  • accessing public pages: not CFAA-covered, you have authorization
  • scraping behind a login that you legitimately have: probably not CFAA, but may be ToS breach
  • scraping using stolen credentials, or accessing pages you have been specifically blocked from: CFAA risk

hard cases live in the third category. if a site sends you a cease-and-desist or actively blocks your IP and you continue, courts have sometimes treated that as crossing into “without authorization.” the law is unsettled here. the conservative read: stop when explicitly told to.
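
the conservative read above can be sketched in code. this is a minimal Python sketch, not a legal standard: the `BlockTracker` class, the set of status codes treated as a block, and the strike threshold are all assumptions made for illustration. the idea is that once a site has signalled “stop” (repeated 401/403/451 responses, or a cease-and-desist you record by hand), the scraper refuses to fetch from that domain instead of rotating IPs around the block.

```python
from dataclasses import dataclass, field

# Status codes treated here as an explicit "you are not welcome" signal.
# This set is an assumption for illustration, not a legal definition.
BLOCK_STATUSES = {401, 403, 451}


@dataclass
class BlockTracker:
    """Remembers domains that have told us to stop, so they are not retried."""

    max_strikes: int = 3  # hypothetical threshold of consecutive block responses
    strikes: dict = field(default_factory=dict)
    halted: set = field(default_factory=set)

    def record_response(self, domain: str, status: int) -> None:
        """Count consecutive block responses; halt the domain at the threshold."""
        if status in BLOCK_STATUSES:
            self.strikes[domain] = self.strikes.get(domain, 0) + 1
            if self.strikes[domain] >= self.max_strikes:
                self.halted.add(domain)
        else:
            # Any other response resets the consecutive counter.
            self.strikes[domain] = 0

    def record_cease_and_desist(self, domain: str) -> None:
        """A written demand to stop halts the domain immediately."""
        self.halted.add(domain)

    def may_fetch(self, domain: str) -> bool:
        """True only if the domain has not been halted."""
        return domain not in self.halted
```

a scraper loop would call `may_fetch(domain)` before every request and `record_response(domain, status)` after it. a halted domain stays halted until a human reviews it; the sketch deliberately has no automatic “unhalt.”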

GDPR and personal data scraping

if you scrape personal data of people in the EU/UK, GDPR can apply even if you operate elsewhere, typically because you offer them goods or services or monitor their behaviour (Article 3(2)). “personal data” includes anything that identifies a natural person: name, email, photo, online identifier, profile URL.

GDPR requires you have a legal basis to process. the relevant ones for scraping:

  • consent: rarely practical (you cannot ask each scraped person)
  • legitimate interest: most common basis for B2B scraping. requires balancing test against the data subject’s rights
  • legal obligation, vital interests, public task: rarely apply to commercial scraping

even with legitimate interest, you must:

  • inform data subjects within a reasonable period, and at the latest within one month of obtaining the data (Article 14), unless that proves impossible or would involve disproportionate effort
  • honor right to erasure, access, and objection requests
  • implement appropriate security
  • consider whether the data is “special category” (health, political opinion, sexual orientation, etc.) which has higher protection

some EU data protection authorities (notably France’s CNIL and the Italian Garante) have fined companies that scraped personal data without proper basis. Clearview AI received €20M fines from multiple EU regulators in 2022-2023 for scraping faces from social media.

practical compliance steps:

  • exclude EU/UK data subjects where possible
  • if you must include them, document your legitimate interest assessment (LIA)
  • publish a privacy notice covering scraped data
  • maintain a data deletion process that responds within one month of a request (the GDPR deadline, which can be extended for complex requests)
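
the deletion step in the list above can be sketched as follows. this is a hedged Python illustration, assuming the response deadline is counted as one calendar month from receipt and that scraped records are plain dicts with an optional `email` key; the function names `one_month_after` and `erase_subject` are hypothetical and a real pipeline would also have to purge backups and downstream copies.

```python
import calendar
from datetime import date


def one_month_after(received: date) -> date:
    """Return the same day in the following month, clamped to that month's length."""
    year = received.year + (received.month // 12)
    month = received.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(received.day, last_day))


def erase_subject(records: list, subject_email: str) -> list:
    """Return the records with every entry for the given email removed."""
    target = subject_email.strip().lower()
    return [
        r for r in records
        if (r.get("email") or "").strip().lower() != target
    ]
```

for example, a request received on 31 January 2026 would be due by `one_month_after(date(2026, 1, 31))`, which this sketch clamps to the last day of February. how a regulator counts the month in edge cases is a question for counsel, not for this helper.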

CCPA and US state privacy laws

California’s CCPA (and successor CPRA), Virginia’s VCDPA, Colorado’s CPA, and a growing list of other state laws apply to companies that process California/state residents’ data above thresholds. for scrapers:

  • if you sell scraped data, that is a “sale” under CCPA and triggers opt-out requirements
  • consumers can request deletion, access, and opt out
  • “publicly available” data has a narrower exception under CCPA than people often assume; just because it is public on LinkedIn does not exempt it

federal privacy law in the US remains stalled in Congress as of mid-2026. expect continued state-by-state expansion.

copyright and database rights

even when scraping is legal, what you do with the scraped data is a separate question.

United States: data and facts are not copyrightable (Feist Publications v Rural Telephone established this). but creative arrangements, written content, photos, and original prose are copyrighted. scraping articles and republishing them is infringement. scraping prices and aggregating them into a database is generally fine.

European Union: the Database Directive gives database makers a separate “sui generis” right protecting substantial investment in database creation, even when the contents are not copyrighted. scraping a “substantial part” of an EU-protected database can violate this even when the underlying data is factual.

United Kingdom: post-Brexit, the UK retained the EU database rights regime. similar rules apply.

if you scrape and re-publish content (not just facts), you need either a license, fair use/fair dealing defense, or transformative use that does not substitute for the original. AI training has been a hot litigation area here in 2024-2026, with multiple cases pending in US and EU courts.

ToS breach and contract law

most major sites have ToS prohibiting scraping. legally, this matters when:

  • you have a binding agreement (clicked “I agree”, created an account, logged in)
  • the ToS clearly prohibits scraping
  • the site can prove damages

US courts have enforced anti-scraping ToS in some cases (the LinkedIn Final Judgment 2022 against hiQ being the clearest), but ToS-only claims usually result in injunctions (stop scraping) rather than large damages. unless you ignored a cease-and-desist, you usually have time to comply once a dispute escalates.

mere browsewrap (ToS link in the footer that you never clicked) is harder for sites to enforce. clickwrap (you actively agreed) is much stronger.

trespass to chattels and “computer trespass”

a tort theory some sites have used: by sending too many requests, you interfere with the site’s servers (trespass to chattels). courts require actual server impairment to apply this, not just a ToS violation.

the bar is high. a scraper running a few thousand requests a day rarely meets it. but DDoS-style scraping at very high volume has triggered successful claims. rate limiting protects you legally, not just technically.
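
a per-domain throttle is one simple way to stay far from the “server impairment” territory described above. the sketch below is a minimal Python illustration: `DomainThrottle` and its `min_interval` default are assumptions, not a threshold any court has endorsed. the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Enforces a minimum gap between requests to the same host."""

    def __init__(self, min_interval: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests per host
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = {}  # host -> earliest time the next request may start

    def wait(self, url: str) -> float:
        """Sleep until the host may be contacted again; return the delay applied."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay:
            self._sleep(delay)
        # Schedule the next slot relative to when this request actually starts.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

calling `throttle.wait(url)` before each request keeps every host at or below one request per `min_interval` seconds, while requests to different hosts do not delay each other.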

country-specific notes

United States: most permissive. CFAA narrow per Van Buren and hiQ. ToS enforceable but injunctive relief is the typical remedy. state privacy laws growing.

European Union: GDPR is the binding constraint. database rights add a copyright-like layer. national data protection authorities (DPAs) interpret “necessary processing” differently.

United Kingdom: post-Brexit, mostly aligned with EU but on its own track. UK GDPR and Computer Misuse Act 1990 apply.

Australia: privacy law similar to GDPR-lite. anti-spam regulation strict. Copyright Act 1968 protects content.

Singapore: PDPA covers personal data with a “publicly available” exemption broader than GDPR. Computer Misuse Act applies to unauthorized access.

China: PIPL and the Cybersecurity Law are strict. data export controls add friction. scraping Chinese sites from outside is technically possible but legally fraught.

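Canada: PIPEDA and CASL apply. Canadian courts have enforced website terms of use against scrapers (Century 21 Canada v Rogers Communications is the usual reference), so treat ToS risk as real and proceed cautiously.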

the safest pattern for production scraping

  1. only scrape public pages without bypassing auth or paywalls
  2. respect robots.txt as a courtesy, even though violation is not itself illegal
  3. rate-limit to avoid trespass-to-chattels exposure (a few requests per second per domain max is a reasonable default)
  4. identify your bot in user-agent if appropriate, or use realistic browser UAs without forging origin
  5. honor cease-and-desist and IP blocks; do not work around them once explicit
  6. exclude personal data unless you have a documented legal basis
  7. respect copyright: scrape facts and structured data freely, but do not republish creative content without a license
  8. document your decisions: keep a written record of what you scrape, why, and your legal basis

following all eight puts you in the safest position legally. our comprehensive scraping legal guide goes deeper on each point.
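
steps 2, 4, and 8 of the list above lend themselves to a small code sketch. the Python below is illustrative only: the `USER_AGENT` string, the bot-info URL, and the `audit_record` fields are hypothetical, and the robots.txt text is passed in as a string so the check can run without network access. it uses the standard library’s `urllib.robotparser` to test a URL and emits one JSON line per decision as a written record.

```python
import json
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical self-identifying user agent; replace with your own contact page.
USER_AGENT = "example-research-bot/1.0 (+https://example.com/bot-info)"


def allowed_by_robots(robots_txt: str, url: str,
                      user_agent: str = USER_AGENT) -> bool:
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def audit_record(url: str, purpose: str, legal_basis: str, allowed: bool) -> str:
    """Build one JSON line documenting what was considered, why, and the outcome."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "purpose": purpose,
        "legal_basis": legal_basis,
        "allowed": allowed,
    }, sort_keys=True)
```

appending each `audit_record(...)` line to a log file gives you a dated trail of what you scraped and on what basis, which is the kind of record step 8 asks for. the field names here are a suggestion, not a compliance format.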

scraping specific big sites: what we know

Amazon: scraping public product data has been litigated and Amazon has lost CFAA claims when the data was public. ToS prohibits scraping, but enforcement is mostly IP blocks. our Amazon scraping legal guide covers this in detail.

LinkedIn: hiQ-style scraping of public profiles is not a CFAA violation, but users who have accepted the User Agreement can be enjoined under contract law. scraping while not logged in is the safer pattern.

Google: SERP scraping violates ToS but is universally done. Google enforces with IP blocks and CAPTCHAs, not lawsuits, for normal-volume use.

Twitter/X: post-2023 API changes, scraping is more legally fraught. X has been aggressive with cease-and-desists. proceed cautiously.

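Reddit: post-2023 API pricing change, scraping public threads is technically possible but ToS prohibits it. Reddit was not litigious historically, but it filed suits against AI companies and scraping providers in 2025, so treat it as higher risk than before.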

Meta (Facebook, Instagram): very aggressive. multiple lawsuits won under CFAA-adjacent state laws and ToS. high risk for commercial scraping.

AI training data: a special case

scraping data to train AI models is a 2024-2026 hot legal topic with no settled answer. major lawsuits include:

  • New York Times v OpenAI/Microsoft (filed Dec 2023): copyright infringement claim over training data
  • Getty Images v Stability AI: image scraping for diffusion model training
  • Authors Guild class actions: training on copyrighted books
  • various artist class actions: training on artwork

courts have not yet given settled guidance: early trial-court rulings have pointed in different directions, and fair use arguments are central to defenses. the EU AI Act (phasing in over 2024-2026) requires training data transparency for general-purpose models. expect more rulings and regulation in 2026-2027.

if you scrape for AI training, document sources, exclude opted-out content (many major sites now publish AI-specific opt-outs in robots.txt, and some use proposed conventions such as ai.txt), and consult counsel before commercial deployment.
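
one way to detect AI-specific opt-outs is to test a site’s robots.txt against the user-agent tokens that AI crawlers commonly announce. the Python sketch below assumes the tokens `GPTBot`, `CCBot`, and `Google-Extended` as examples; the list is illustrative and incomplete, and the function name `ai_opt_out_signals` is hypothetical. treating a disallow for any of these tokens as a signal to skip the URL for training purposes is a cautious design choice, not a legal requirement stated in this guide.

```python
from urllib.robotparser import RobotFileParser

# Example AI-crawler tokens; an assumption for illustration, not a complete list.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "Google-Extended")


def ai_opt_out_signals(robots_txt: str, url: str,
                       tokens=AI_CRAWLER_TOKENS) -> list:
    """Return the AI-crawler tokens that robots.txt disallows for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, url)]
```

a non-empty result means the site has told at least one AI crawler to stay away from that URL, so a training pipeline could exclude it and log the reason. an empty result only means no opt-out was found in robots.txt; it says nothing about ToS or copyright.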

faq

is scraping illegal?
in most cases, no. scraping public data without bypassing auth is legal in most jurisdictions. specific data types (personal data, copyrighted content) and specific sources (sites where you have a binding ToS prohibiting it) carry separate legal risk.

did hiQ Labs really beat LinkedIn?
on CFAA, yes. on contract (ToS), no. the November 2022 final judgment found hiQ in breach of LinkedIn’s User Agreement and permanently enjoined hiQ from scraping LinkedIn. so the hiQ case actually establishes both that scraping public data is not CFAA-illegal and that ToS breaches can still be enforced separately.

what about the EU AI Act?
it entered into force in 2024 and its obligations phase in through 2026-2027. the parts most relevant to scrapers training general-purpose AI are training-data transparency requirements, respect for copyright opt-outs, and risk classification of AI systems. it does not directly regulate scraping itself.

can I scrape data and sell it?
depends what data and which jurisdictions. selling scraped factual data (prices, business listings, public records) is generally legal in the US and often in the EU subject to GDPR if personal data is involved. selling scraped copyrighted content (articles, photos) is infringement.

do I need to respect robots.txt?
not legally required in most jurisdictions, but courts have cited robots.txt non-compliance as evidence of bad faith. respect it where reasonable; document why if you do not.

should I use a real browser to avoid legal liability?
no. browser vs HTTP client does not change the legal analysis. what matters is whether you bypass auth, what data you collect, and your purpose.

am I liable if my scraper accidentally hits a private endpoint?
possibly. unauthorized access claims focus on what you knew or should have known. discovering an exposed private endpoint by accident, then continuing to scrape it after realizing, is risky. stop and notify if you see something that looks like a leak.

conclusion

web scraping in 2026 is mostly legal most of the time, but “mostly” carries real risk. the CFAA does not generally apply to public-data scraping in the US after hiQ and Van Buren. GDPR creates a bigger constraint when personal data is involved. ToS breach is an ever-present contract risk if you have an account on the target site. copyright applies to what you do with what you scraped, separate from the scraping itself.

the safe path is: scrape public data, respect explicit blocks and cease-and-desists, exclude or carefully justify personal data, do not republish copyrighted content, and document your decisions. for any commercial operation, talk to a lawyer in your jurisdiction before scaling. this guide is informational and is not a substitute for legal advice on your specific case.
