Is Web Scraping Legal? A Comprehensive Legal Guide for 2026
The legality of web scraping is one of the most commonly asked questions in the data collection industry — and the answer is nuanced. Web scraping itself is not inherently illegal. It is a technique, like a hammer: legal to own and use, but the way you use it determines legality.
In short: scraping publicly available data is generally legal, but how you scrape, what you scrape, and what you do with the data all matter.
This guide breaks down the legal frameworks, landmark court cases, and practical guidelines for conducting web scraping operations within legal boundaries.
Table of Contents
- The Short Answer
- Key Legal Frameworks
- Landmark Court Cases
- What Makes Scraping Legal or Illegal
- Data Protection Laws and Scraping
- Terms of Service Considerations
- Industry-Specific Regulations
- Best Practices for Legal Web Scraping
- When You Need Legal Counsel
- FAQ
The Short Answer
| Scenario | Generally Legal? | Notes |
|---|---|---|
| Scraping publicly available data | Yes | No login required, no ToS bypass |
| Scraping for personal/research use | Yes | Academic and journalistic protections |
| Scraping behind login walls | Risky | May violate CFAA or ToS |
| Scraping personal/private data | Risky | GDPR/CCPA compliance required |
| Scraping copyrighted content for republication | No | Copyright infringement |
| Scraping after receiving cease-and-desist | Risky | May constitute trespass to chattels |
| Scraping that damages target servers | No | Denial of service is a crime in most jurisdictions |
Key Legal Frameworks
Computer Fraud and Abuse Act (CFAA) — United States
The CFAA is the primary federal law governing unauthorized computer access. The critical question is whether scraping constitutes “unauthorized access” or “exceeding authorized access.”
Key points:
- Accessing publicly available websites is generally not “unauthorized access”
- The Supreme Court’s 2021 ruling in Van Buren v. United States narrowed the CFAA’s scope, clarifying that “exceeding authorized access” means entering parts of a computer system (files, folders, databases) that are off-limits to you, not merely violating usage policies about information you are otherwise entitled to access
- However, circumventing technical barriers (IP blocks, CAPTCHAs) after being explicitly banned could still raise CFAA concerns
General Data Protection Regulation (GDPR) — European Union
GDPR affects web scraping when personal data of EU residents is involved:
- Lawful basis required — You need a legal justification for processing personal data (legitimate interest is the most common basis for scraping)
- Data minimization — Only collect data that’s necessary for your purpose
- Purpose limitation — Use data only for the stated purpose
- Right to erasure — Individuals can request deletion of their scraped data
- Data Protection Impact Assessment — Required when processing is likely to result in a high risk to individuals, which can include large-scale processing of personal data
California Consumer Privacy Act (CCPA/CPRA)
Similar to GDPR but specific to California residents:
- Applies to businesses meeting certain revenue/data thresholds
- Gives consumers the right to know what data is collected about them
- Provides opt-out rights for data sales
- Excludes “publicly available” information, such as data lawfully made available from government records, from the definition of personal information
Copyright Law
Copyright protects the creative expression in content, not facts:
- Facts are not copyrightable — Product prices, addresses, phone numbers, stock prices
- Creative content is copyrightable — Articles, images, reviews, product descriptions
- Database rights (EU) — The EU protects databases that required substantial investment to create, even if individual data points aren’t copyrightable
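To see how the facts/expression line can translate into code, the sketch below keeps only factual fields from a product page and skips creative text such as descriptions. It assumes the page marks its data with schema.org-style `itemprop` attributes; the `FACTUAL_PROPS` allowlist, the `FactExtractor` class, and the sample HTML are hypothetical. Extracting only facts lowers copyright exposure, but it does not address EU database rights or contract terms.

```python
from html.parser import HTMLParser

# Hypothetical allowlist of factual fields. Creative fields such as
# "description" or "reviewBody" are deliberately left out.
FACTUAL_PROPS = {"name", "sku", "price", "priceCurrency"}


class FactExtractor(HTMLParser):
    """Collects the text (or content= value) of allowlisted itemprop elements."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._current = None  # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop not in FACTUAL_PROPS:
            return
        if attrs.get("content") is not None:
            # <meta itemprop="price" content="19.99"> style markup
            self.facts[prop] = attrs["content"]
        else:
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.facts[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None  # don't let an empty element capture later text


html = """
<div itemscope itemtype="https://schema.org/Product">
  <h1 itemprop="name">Widget 123</h1>
  <span itemprop="sku">W-123</span>
  <meta itemprop="price" content="19.99">
  <meta itemprop="priceCurrency" content="USD">
  <p itemprop="description">A lovingly written paragraph about widgets.</p>
</div>
"""

parser = FactExtractor()
parser.feed(html)
print(parser.facts)
# {'name': 'Widget 123', 'sku': 'W-123', 'price': '19.99', 'priceCurrency': 'USD'}
```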
Landmark Court Cases
hiQ Labs v. LinkedIn (2022)
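Ruling: In April 2022, on remand after Van Buren, the Ninth Circuit reaffirmed that scraping publicly accessible LinkedIn profiles likely does not violate the CFAA, upholding a preliminary injunction in hiQ’s favor.
Significance: This is widely regarded as the most important web scraping case. It established that:
- Scraping public data on the web likely does not amount to access “without authorization” under the CFAA
- LinkedIn could not use the CFAA to prevent hiQ from scraping public profiles
- The ruling distinguished between public data and data behind authentication
Limitation: The Ninth Circuit addressed only the CFAA, not copyright, contract law, or state privacy laws. Later in 2022, the district court found that hiQ had breached LinkedIn’s User Agreement, and the parties settled.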
Ryanair v. PR Aviation (EU, 2015)
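Ruling: The Court of Justice of the EU held that where a database is protected by neither copyright nor the sui generis database right, the Database Directive does not stop its owner from limiting use of the data through contractual terms.
Significance: In the EU, contractual restrictions on scraping (ToS) can carry legal weight even when the data is protected by neither copyright nor database rights. Whether a given ToS binds a scraper is then a question of national contract law.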
Meta v. Bright Data (2024)
Ruling: A federal district court in California held that Meta’s terms did not prohibit Bright Data’s logged-out scraping of publicly visible Facebook and Instagram data. The claims at issue were for breach of contract, not the CFAA.
Significance: Together with hiQ, the ruling suggests that publicly visible social media data collected without logging in is difficult to restrict, whether under federal computer fraud law or a platform’s own terms.
Van Buren v. United States (Supreme Court, 2021)
Ruling: The Supreme Court held that “exceeding authorized access” under the CFAA requires accessing information that one’s computer access doesn’t extend to — not merely misusing information one is authorized to access.
Significance: This narrowed interpretation makes it harder to prosecute scrapers under the CFAA for violating terms of service, as ToS violations alone don’t constitute “unauthorized access.”
What Makes Scraping Legal or Illegal
Generally Legal
- Scraping publicly accessible data — If anyone can see it in a browser without logging in, scraping it is generally permissible
- Scraping factual data — Prices, product specifications, business listings, weather data
- Scraping for research — Academic research, journalism, competitive analysis
- Scraping government/public records — Court records, business filings, regulatory data
- Respecting technical boundaries — Following robots.txt, rate limiting, not overwhelming servers
Legally Risky
- Circumventing access controls — Bypassing CAPTCHAs, login requirements, or IP bans specifically designed to block scrapers
- Scraping personal data without legal basis — Collecting emails, phone numbers, or personal profiles without GDPR/CCPA compliance
- Violating explicit contractual agreements — You signed terms of service that prohibit scraping
- Republishing copyrighted content — Scraping articles and republishing them as your own
- Causing server damage — Overwhelming servers with requests to the point of disruption
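In practice, “don’t circumvent access controls” means treating certain responses as a signal to stop rather than an obstacle to engineer around. The sketch below is one possible policy, not a legal standard; the status-code mapping and the `CHALLENGE_MARKERS` strings are assumptions to adapt to the sites you work with.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    SLOW_DOWN = "slow_down"
    STOP = "stop"


# Hypothetical text markers suggesting a CAPTCHA or bot-challenge page.
CHALLENGE_MARKERS = ("captcha", "are you a robot", "verify you are human")


def classify_response(status_code: int, body: str = "") -> Action:
    """Map an HTTP response to a crawler action without trying to bypass anything."""
    if status_code in (401, 403):
        # Login required or access refused: don't log in, rotate IPs, or retry.
        return Action.STOP
    if status_code in (429, 503):
        # The server is asking clients to back off.
        return Action.SLOW_DOWN
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        # A challenge page is an access control; solving it automatically is circumvention.
        return Action.STOP
    return Action.CONTINUE


print(classify_response(200, "<h1>Widget 123</h1>"))           # Action.CONTINUE
print(classify_response(429))                                   # Action.SLOW_DOWN
print(classify_response(200, "Please complete the CAPTCHA"))    # Action.STOP
```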
Generally Illegal
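- Scraping classified/confidential data — Government secrets, trade secrets, or data you can only reach by defeating encryption or authentication
- Scraping non-public medical or financial records — Such records are rarely reachable without unauthorized access, and HIPAA and financial privacy regulations govern how they are handled
- Denial of service through scraping — Sending so many requests that the target server goes down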
- Identity theft or fraud — Using scraped data for criminal purposes
Data Protection Laws and Scraping
GDPR Compliance Checklist
If you’re scraping data that includes personal information of EU residents:
- [ ] Identify your lawful basis — Legitimate interest is most common for scraping
- [ ] Conduct a Legitimate Interest Assessment (LIA)
- [ ] Minimize data collection — Only scrape what you need
- [ ] Document your processing activities
- [ ] Implement appropriate security measures for stored data
- [ ] Honor data subject requests — Deletion, access, portability
- [ ] Consider a Data Protection Impact Assessment for large-scale scraping
- [ ] Appoint a DPO if required by your organization’s profile
Personal Data vs. Business Data
| Data Type | GDPR Classification | Scraping Risk |
|---|---|---|
| Product prices | Not personal data | Low |
| Business addresses | Generally not personal data | Low |
| Business email (info@company.com) | Debatable | Low-Medium |
| Personal email (john@gmail.com) | Personal data | High |
| Full names + job titles | Personal data | Medium |
| Photos of individuals | Personal data (biometric potential) | High |
| Phone numbers | Personal data | High |
| IP addresses | Personal data under GDPR | Medium |
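The email rows of the table can be applied as a triage rule before anything is stored. This is a heuristic sketch, not a legal test: the regular expression, the free-mail domain list, and the role-address prefixes are assumptions. A named work address such as jane.doe@company.com identifies a person, so it is treated as personal data here even though it sits on a business domain.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Illustrative lists only; extend them for your own data.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}
ROLE_PREFIXES = {"info", "sales", "support", "contact", "press", "hello"}


def classify_email(address: str) -> str:
    """Rough GDPR triage following the table: role addresses are 'debatable',
    free-mail and named mailboxes are 'personal data'."""
    local, _, domain = address.lower().partition("@")
    if domain in FREE_MAIL_DOMAINS:
        return "personal data"  # e.g. john@gmail.com
    if local in ROLE_PREFIXES:
        return "debatable"      # e.g. info@company.com
    return "personal data"      # named mailbox on a business domain


text = "Write to info@company.com or john@gmail.com, or ask jane.doe@company.com."
for email in EMAIL_RE.findall(text):
    print(email, "->", classify_email(email))
```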
For more on compliance, read our guide on web scraping compliance and legal frameworks.
Terms of Service Considerations
Are ToS Legally Binding?
The enforceability of Terms of Service depends on:
- Browsewrap agreements (ToS link in footer) — Generally weak enforceability. Courts have found that merely visiting a site doesn’t constitute agreement, particularly when the terms were not conspicuous enough to put users on notice.
- Clickwrap agreements (must click “I agree”) — Strong enforceability. If you created an account and agreed to terms, you’re bound.
- Signed contracts — Fully enforceable.
After Van Buren
The Supreme Court’s Van Buren ruling suggests that violating ToS alone is not a federal crime under the CFAA. However:
- ToS violations could still support breach of contract claims
- State laws may treat ToS differently
- EU courts may give ToS more weight (see Ryanair case)
Practical Approach
- Read the ToS before scraping
- If ToS prohibits scraping, assess the risk based on your jurisdiction and purpose
- Don’t accept clickwrap terms that prohibit scraping (for example, when creating an account) and then scrape anyway
- Consider reaching out to the website for data access (APIs, data partnerships)
Industry-Specific Regulations
Real Estate (MLS Data)
MLS databases have strong copyright and contractual protections. Scraping MLS data typically violates both licensing agreements and copyright.
Healthcare (HIPAA)
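HIPAA protects patient health information held by covered entities (healthcare providers, health plans, and clearinghouses) and their business associates. Records containing patient identifiers are rarely public, so scraping them generally requires unauthorized access and is illegal without authorization.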
Financial (SEC, FINRA)
Public SEC filings are fair game, but some financial data providers have strong terms restricting scraping. Trading on material nonpublic information obtained through scraping could trigger securities violations.
Travel (Airlines, Hotels)
Airlines have aggressively pursued scrapers (see Ryanair cases). Many travel sites employ sophisticated anti-bot systems. Scraping is common but often challenged legally.
Best Practices for Legal Web Scraping
1. Only Scrape Public Data
Stick to data that anyone can access without authentication:
```python
import requests

# Good: scraping a public product page (no login, no session cookies)
response = requests.get("https://store.com/products/widget-123", timeout=10)

# Risky: scraping behind authentication
session = requests.Session()
session.post("https://store.com/login", data={"user": "...", "pass": "..."})
response = session.get("https://store.com/account/orders", timeout=10)
```
2. Respect robots.txt
Always check and follow robots.txt directives:
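```python
from urllib.robotparser import RobotFileParser

import requests

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # download and parse robots.txt

url = "https://example.com/products"
if rp.can_fetch("*", url):  # "*" checks the generic rules; pass your bot's name to apply rules written for it
    print("Allowed to scrape this URL")
    response = requests.get(url, timeout=10)
else:
    print("robots.txt disallows scraping this URL")
```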
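`RobotFileParser` also exposes a site’s `Crawl-delay` directive through `crawl_delay()` (Python 3.6+). The sketch below feeds an already-fetched robots.txt body to `parse()`, so it runs offline; the sample rules, the `DataResearchBot` agent name, and the 2-second fallback are illustrative assumptions. `crawl_delay()` returns `None` when no delay is set for the agent.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt body you have already downloaded (sample rules for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Crawl-delay: 5
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

agent = "DataResearchBot"  # hypothetical bot name; falls back to the "*" rules
for url in ("https://example.com/products", "https://example.com/account/orders"):
    print(url, "allowed" if rp.can_fetch(agent, url) else "disallowed")

# Use the site's Crawl-delay if present, else a conservative default.
delay = rp.crawl_delay(agent) or 2.0
print("seconds between requests:", delay)
```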
3. Rate Limit Your Requests
Don’t overwhelm target servers:
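```python
import random
import time

import requests

urls = ["https://example.com/page/1", "https://example.com/page/2"]  # placeholder list

for url in urls:
    response = requests.get(url, timeout=10)
    # Random delay between 2 and 5 seconds
    time.sleep(random.uniform(2, 5))
```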
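A fixed random delay is a baseline. When a server answers `429 Too Many Requests` or `503 Service Unavailable`, it may also send a `Retry-After` header saying how long to wait. A standard-library sketch of honoring it with exponential backoff; `backoff_delay`, `polite_get`, the retry count, and the 2-second base are illustrative choices, and only the seconds form of `Retry-After` is parsed (the HTTP-date form falls back to backoff).

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after and retry_after.strip().isdigit():
        # The server said how long to wait; honor it as given.
        return float(retry_after)
    # No usable Retry-After (missing, or in HTTP-date form): exponential backoff.
    return min(base * (2 ** attempt), cap)


def polite_get(url: str, user_agent: str, max_retries: int = 4) -> bytes:
    """Fetch a URL, backing off on 429/503 and giving up on any other error."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(max_retries + 1):
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in (429, 503) or attempt == max_retries:
                raise  # e.g. 403 is not retried: a refusal is not a rate limit
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Other errors, such as `403 Forbidden`, are re-raised rather than retried, in line with not circumventing blocks.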
4. Identify Yourself
Use a descriptive User-Agent that includes contact information:
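```python
import requests

url = "https://example.com/products"  # placeholder target
headers = {
    "User-Agent": "DataResearchBot/1.0 (https://yoursite.com/bot; contact@yoursite.com)"
}
response = requests.get(url, headers=headers, timeout=10)
```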
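If you use the standard library rather than `requests`, the same header goes on a `urllib.request.Request`. A short sketch; `make_user_agent` is a hypothetical helper, and the bot name, URL, and contact address are placeholders.

```python
import urllib.request


def make_user_agent(name: str, version: str, info_url: str, contact: str) -> str:
    """Build a descriptive User-Agent: Name/version (info URL; contact)."""
    return f"{name}/{version} ({info_url}; {contact})"


UA = make_user_agent("DataResearchBot", "1.0", "https://yoursite.com/bot", "contact@yoursite.com")
req = urllib.request.Request("https://example.com/products", headers={"User-Agent": UA})

# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
# Send it with urllib.request.urlopen(req, timeout=10) when ready.
```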
5. Don’t Scrape Personal Data Unless Necessary
Apply data minimization principles:
- Skip personal identifiers when collecting business data
- Anonymize or pseudonymize personal data immediately
- Delete data you no longer need
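One way to apply these points at collection time is an allowlist of needed fields plus keyed hashing for identifiers. A minimal sketch, assuming hypothetical field names and a secret key stored outside the dataset (the `PSEUDONYM_KEY` environment variable is a placeholder). Under GDPR, pseudonymized data is still personal data as long as the key exists, so this reduces risk rather than taking the data out of scope.

```python
import hashlib
import hmac
import os

# Hypothetical: the only fields this project actually needs.
KEEP_FIELDS = {"company", "job_title", "country"}
PSEUDONYMIZE_FIELDS = {"email"}

# Placeholder key source; keep the real key out of the stored data.
SECRET_KEY = os.environ.get("PSEUDONYM_KEY", "change-me").encode()


def pseudonym(value: str) -> str:
    """Keyed hash: the same email always maps to the same token,
    but the token cannot be recomputed without the key."""
    return hmac.new(SECRET_KEY, value.strip().lower().encode(), hashlib.sha256).hexdigest()


def minimize(record: dict) -> dict:
    """Keep allowlisted fields, replace identifiers with tokens, drop the rest."""
    out = {k: v for k, v in record.items() if k in KEEP_FIELDS}
    for field in PSEUDONYMIZE_FIELDS:
        if field in record:
            out[field + "_id"] = pseudonym(record[field])
    return out


raw = {
    "name": "Jane Doe",
    "email": "Jane.Doe@example.com",
    "phone": "+1 555 0100",
    "company": "Example Corp",
    "job_title": "Engineer",
    "country": "US",
}
print(minimize(raw))  # name and phone dropped; email replaced by a token
```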
6. Document Your Scraping Activities
Maintain records of:
- What data you scrape and why
- Which sites you scrape
- How you store and process the data
- Your legal basis for processing (if personal data is involved)
- Data retention policies
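One lightweight way to keep these records is a structured entry per scraping job, stored alongside the data. The sketch below serializes a dataclass to JSON; the `ScrapeRecord` schema and its example values are assumptions loosely mirroring the list above, not a regulatory template.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ScrapeRecord:
    """One entry per scraping job (hypothetical schema)."""
    job_name: str
    purpose: str                # why you scrape
    data_fields: List[str]      # what you scrape
    sources: List[str]          # which sites you scrape
    storage: str                # how the data is stored and processed
    legal_basis: Optional[str]  # needed if personal data is involved
    retention_days: int         # data retention policy
    created: str = field(default_factory=lambda: date.today().isoformat())


record = ScrapeRecord(
    job_name="price-monitor",
    purpose="Track public list prices of competing products",
    data_fields=["product_name", "sku", "price"],
    sources=["https://example.com/products"],
    storage="Internal database; access limited to the pricing team",
    legal_basis=None,  # no personal data collected
    retention_days=90,
)

# Append this line to a JSON Lines file or document store kept with the data.
line = json.dumps(asdict(record))
print(line)
```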
7. Use Proxies Responsibly
Proxies are legal tools for distributing request load, but don’t use them to circumvent specific bans directed at you.
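Proxies change where requests come from, not how many reach the target site. One way to keep load distribution honest is to enforce each site’s rate limit centrally, before a request is handed to any proxy. A sketch; `HostThrottle` and the 3-second interval are hypothetical, and the clock is injectable so the logic can be checked without sleeping.

```python
import threading
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Minimum interval between requests to the same host, shared by all
    workers and proxies, so adding proxies never raises the per-site rate."""

    def __init__(self, min_interval: float, clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._next_allowed: Dict[str, float] = {}
        self._lock = threading.Lock()

    def wait_time(self, url: str) -> float:
        """Reserve the next slot for this URL's host; return seconds to wait first."""
        host = urlsplit(url).hostname or ""
        with self._lock:
            now = self.clock()
            slot = max(now, self._next_allowed.get(host, now))
            self._next_allowed[host] = slot + self.min_interval
        return slot - now


# Demo with a frozen clock so nothing actually sleeps.
throttle = HostThrottle(min_interval=3.0, clock=lambda: 0.0)
print(throttle.wait_time("https://example.com/a"))  # 0.0
print(throttle.wait_time("https://example.com/b"))  # 3.0, same host
print(throttle.wait_time("https://example.org/x"))  # 0.0, different host

# Real use: time.sleep(throttle.wait_time(url)), then send through any proxy.
```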
When You Need Legal Counsel
Consult a lawyer specializing in internet law when:
- You receive a cease-and-desist letter
- You’re scraping personal data at scale
- Your business model depends on scraped data
- You’re operating across multiple jurisdictions
- You’re scraping from competitors in a regulated industry
- You’re unsure about the legality of your specific use case
FAQ
Can I get sued for web scraping?
Yes, you can be sued, though whether the lawsuit succeeds depends on the circumstances. Common legal theories include breach of contract (ToS violation), copyright infringement, trespass to chattels, and unfair competition. The Ninth Circuit’s hiQ v. LinkedIn ruling held that scraping public data likely doesn’t violate the CFAA, yet hiQ was later found to have breached LinkedIn’s User Agreement, which shows that other legal claims remain available to website owners.
Is scraping Google legal?
Scraping Google search results is not illegal per se, but it violates Google’s Terms of Service. Google actively blocks scrapers and may take legal action against large-scale commercial scraping. For legitimate SEO monitoring, Google’s official APIs are the safest approach; third-party SERP API providers are widely used, but relying on one does not make the underlying scraping authorized by Google.
Is it legal to scrape and sell data?
Scraping publicly available factual data and selling it is generally legal, and many price comparison and market-intelligence services are built on it. However, you cannot scrape copyrighted content for resale, and selling scraped personal data must comply with GDPR/CCPA. The data’s nature and your processing practices determine legality.
Do I need to follow robots.txt?
Robots.txt is not legally binding in most jurisdictions — it’s a voluntary standard. However, ignoring robots.txt after being explicitly told not to scrape a site could be used as evidence of bad faith in a legal proceeding. From a practical and ethical standpoint, respecting robots.txt is strongly recommended.
Is it legal to scrape social media?
Scraping publicly visible social media data is generally legal: hiQ v. LinkedIn addressed the CFAA, and Meta v. Bright Data held that Meta’s terms did not bar logged-out scraping of public data. However, scraping private profiles, using scraped data for harassment, or violating GDPR by processing personal data without a lawful basis can create legal liability. Each platform also has its own terms that may restrict scraping. For more on specific platforms, see our guide on social media proxies.
—
Want to learn about scraping specific platforms? Read our guide on whether it’s legal to scrape Amazon, or explore our web scraping compliance guides for jurisdiction-specific details.