CCPA compliance for scrapers handling US consumer data

CCPA compliance has become one of the most common blockers for B2C data pipelines that rely on scraping, alongside GDPR. The California Consumer Privacy Act, as amended by the California Privacy Rights Act (CPRA) and now enforced by the California Privacy Protection Agency (CPPA) alongside the Attorney General, reshaped what US-touching scrapers can safely do. Many engineering teams still operate under the original 2018 CCPA mental model, and that gap is where recent enforcement has landed. This guide walks through the rules as enforced today, the public-record carve-out that scraping operators rely on (and frequently misread), the consumer rights you must honour, and a checklist your team can implement this quarter.

The audience here is the data engineer or product lead who already runs a scraping pipeline that touches California residents and needs a defensible compliance posture in 2026.

What CCPA actually covers in a scraping context

CCPA applies to any for-profit business that does business in California, collects personal information of California residents, and meets at least one of three thresholds: more than USD 25 million in annual gross revenue (a figure adjusted periodically for inflation); buys, sells, or shares the personal information of 100,000 or more consumers or households; or derives 50 percent or more of annual revenue from selling or sharing personal information. Scraping businesses often meet the second or third threshold even when their revenue is well below the first.
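The three thresholds translate directly into a scope check your team can re-run each year. A minimal sketch, assuming you track these figures per fiscal year; the dataclass and its field names are hypothetical, and the USD 25 million constant is the statutory base figure, so confirm the current inflation-adjusted value before relying on it.

```python
from dataclasses import dataclass

# Statutory base figures from the text. The revenue figure is adjusted for
# inflation, so check the current value before relying on this constant.
REVENUE_THRESHOLD_USD = 25_000_000
CONSUMER_THRESHOLD = 100_000
SALE_REVENUE_SHARE_THRESHOLD = 0.5


@dataclass
class BusinessProfile:
    # Hypothetical fields; populate them from your own finance and data records.
    annual_gross_revenue_usd: float
    ca_consumers_or_households_bought_sold_or_shared: int
    revenue_from_selling_or_sharing_pi_usd: float


def meets_ccpa_threshold(p: BusinessProfile) -> bool:
    """Return True if any one of the three CCPA thresholds is met."""
    if p.annual_gross_revenue_usd > 0:
        sale_share = p.revenue_from_selling_or_sharing_pi_usd / p.annual_gross_revenue_usd
    else:
        sale_share = 0.0
    return (
        p.annual_gross_revenue_usd > REVENUE_THRESHOLD_USD
        or p.ca_consumers_or_households_bought_sold_or_shared >= CONSUMER_THRESHOLD
        or sale_share >= SALE_REVENUE_SHARE_THRESHOLD
    )


# A small operator with modest revenue but 250,000 consumers' data sold is in scope.
print(meets_ccpa_threshold(BusinessProfile(3_000_000, 250_000, 0)))  # True
```

Any single threshold is enough, which is why the function uses `or` rather than requiring all three.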

Personal information under Cal. Civ. Code Section 1798.140(v) is defined extremely broadly: any information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. The list of examples runs from the obvious (name, email, address, phone) to the operationally relevant (IP addresses, cookie identifiers, browsing history, geolocation, inferences drawn to create a consumer profile). If you scrape it and it relates to a person, treat it as personal information unless an exclusion, such as the publicly available carve-out, clearly applies.

CPRA, which took effect in January 2023, added a category called sensitive personal information (SPI). It includes Social Security numbers, precise geolocation, racial or ethnic origin, religious or philosophical beliefs, union membership, contents of mail, email, and text messages, genetic data, biometric data, health information, sex life, and sexual orientation. SPI carries additional restrictions and is the highest-risk class for scrapers.

For the broader US context and how state-level privacy laws are converging, see the personal vs public data scraping framework. For the EU equivalent, the GDPR compliance guide for scrapers is the right next read.

The publicly available information carve-out (and its limits)

CCPA explicitly excludes “publicly available information” from the definition of personal information. Section 1798.140(v)(2) defines publicly available as information that is lawfully made available from federal, state, or local government records, or information that a business has a reasonable basis to believe is lawfully made available to the general public by the consumer or from widely distributed media; or information made available by a person to whom the consumer has disclosed the information if the consumer has not restricted the information to a specific audience.

This is a real carve-out, but it is narrower than scrapers often assume. Three pitfalls.

First, the “lawfully made available” qualifier means information that was leaked or hacked does not become publicly available just because it ended up online. A doxxing forum dump is not publicly available information under CCPA, even if you can read it. Collecting data in breach of a site’s terms of service also undercuts any claim that you had a reasonable basis to believe it was lawfully made public.

Second, the “consumer has not restricted” qualifier means a profile that a user marked private, but that you accessed via a workaround, does not qualify. The user’s restriction setting at the time of collection matters.

Third, inferences drawn from publicly available information are not themselves publicly available. If you scrape a profile photo from a public LinkedIn page and then run a face-recognition model against it to infer ethnicity, the inferred ethnicity is personal information (and likely SPI), even though the source was public.

The CPPA has signalled in 2024 and 2025 enforcement guidance that it reads the carve-out narrowly. Treat it as a defence you may invoke, not a shield you assume.

Compliance checklist for scrapers handling California data

| Control | What it requires | Legal source |
|---|---|---|
| Privacy policy with CCPA disclosures | Categories of PI collected, sources, purposes, third parties | Section 1798.130 |
| “Do Not Sell or Share My Personal Information” link | Homepage link if you sell or share | Section 1798.135 |
| Opt-out mechanism | Opt-out requests honoured within 15 business days | Section 1798.135 |
| Right to know request handling | Response to verifiable requests within 45 days | Section 1798.130 |
| Right to delete request handling | Deletion on verifiable requests within 45 days | Section 1798.105 |
| Right to correct request handling | Honour correction requests | Section 1798.106 |
| Limit use of SPI | Honour the right to limit use of SPI | Section 1798.121 |
| Service provider contracts | CCPA-compliant DPAs with vendors | Section 1798.140(ag) |
| Data minimisation | Collect only what is necessary and proportionate | Section 1798.100(c) |
| Retention schedules | Disclose and enforce retention periods | Section 1798.100(a)(3) |
| Annual cybersecurity audit (if high risk) | Per CPPA regulations; check current compliance dates | CPRA |
| Risk assessment for high-risk processing | Per CPPA regulations; check current compliance dates | CPRA |

A scraper that ticks every row above has a defensible baseline posture, though CCPA offers no formal safe harbour. One that ticks half is exposed.

Consumer rights and the request workflow

CCPA grants California residents seven core rights: right to know, right to delete, right to correct, right to opt out of sale or sharing, right to limit use of SPI, right to non-discrimination, and right to data portability. For a scraper, the operationally heavy rights are right to know, right to delete, and right to opt out of sale or sharing.

Right to know means a consumer can request the categories and specific pieces of personal information you collected about them, the sources, the business or commercial purpose, and the third parties you shared it with. You have 45 days to respond, extendable once by a further 45 days if you notify the consumer. In practice you cannot answer without finding the consumer in your dataset, which means your storage schema needs to be queryable by every identifier type you collected (email, name plus ZIP code, device ID).
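Keeping the dataset queryable by every identifier type you collected can start as an inverted index from normalised identifier to record IDs. A minimal in-memory sketch; the class, the identifier types, and the normalisation rules are illustrative assumptions, and a production system would back this with database indexes.

```python
from collections import defaultdict


def normalise(id_type: str, value: str) -> str:
    # Illustrative rules only: trim, lower-case, and keep digits for phone numbers.
    value = value.strip()
    if id_type == "phone":
        return "".join(ch for ch in value if ch.isdigit())
    return value.lower()


class IdentifierIndex:
    """Maps (identifier type, normalised value) to the IDs of records containing it."""

    def __init__(self):
        self._index = defaultdict(set)
        self.records = {}

    def add(self, record_id: str, record: dict, identifiers: dict) -> None:
        self.records[record_id] = record
        for id_type, value in identifiers.items():
            self._index[(id_type, normalise(id_type, value))].add(record_id)

    def lookup(self, id_type: str, value: str) -> list:
        """Return every stored record matching the identifier, for a right-to-know response."""
        ids = self._index.get((id_type, normalise(id_type, value)), set())
        return [self.records[i] for i in sorted(ids)]


idx = IdentifierIndex()
idx.add("r1", {"name": "A. Example"}, {"email": "A@Example.org"})
print(idx.lookup("email", " a@example.org"))  # [{'name': 'A. Example'}]
```

Normalising on both write and lookup is the design point: a request typed with different casing or spacing still finds the record.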

Right to delete means that once you receive a verifiable request, you must delete the consumer’s personal information from your records, instruct service providers and contractors to do the same, and notify third parties to which you sold or shared it. There are exceptions (legal compliance, security, free speech, internal uses reasonably aligned with consumer expectations), but the default is delete.
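The delete-then-notify flow can be sketched as a function that removes matched records, keeps any held under a documented exception, and returns the downstream parties to instruct. A minimal sketch assuming hypothetical record fields (`retention_exception`, `shared_with`); whether an exception applies is a legal determination recorded by a person, not something this code decides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Record:
    record_id: str
    consumer_key: str  # normalised identifier the request was matched on
    retention_exception: Optional[str] = None  # e.g. "legal_obligation"; None means delete
    shared_with: Set[str] = field(default_factory=set)  # downstream recipients of this record


def process_deletion(
    records: Dict[str, Record], consumer_key: str
) -> Tuple[List[str], List[str], Set[str]]:
    """Delete a consumer's records unless an exception is documented.

    Returns (deleted_ids, retained_ids, parties_to_notify).
    """
    deleted: List[str] = []
    retained: List[str] = []
    notify: Set[str] = set()
    for record_id, rec in list(records.items()):
        if rec.consumer_key != consumer_key:
            continue
        if rec.retention_exception is None:
            del records[record_id]
            deleted.append(record_id)
            notify |= rec.shared_with  # these parties must be told to delete too
        else:
            retained.append(record_id)  # keep, and log which exception was relied on
    return deleted, retained, notify
```

Returning the retained IDs alongside the deleted ones gives you the audit trail for the response you send the consumer.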

Right to opt out of sale or sharing is broader than many teams realise. “Sale” includes any disclosure for monetary or other valuable consideration, so if you scrape data and license it to customers, that is a sale. You must honour the Global Privacy Control (GPC) signal as a valid opt-out, automatically and without requiring further action. The requirement is written into the CCPA regulations (Cal. Code Regs. tit. 11, § 7025) and was central to the Attorney General’s 2022 Sephora settlement.

For a worked decision tree on how to triage rights requests, see the ethics-first scraping policy guide.

How CCPA enforcement shifted in 2024 and 2025

The CPPA, which gained administrative enforcement authority in 2023 alongside the Attorney General’s existing civil enforcement, brought a rulemaking and audit-driven approach that Attorney General enforcement alone lacked. Three trends stand out.

First, the CPPA has targeted data brokers explicitly. The Delete Act (SB 362), signed in 2023, moved data broker registration to the CPPA, requires brokers to register annually, and creates a single deletion mechanism that consumers can use across all registered brokers at once, with brokers required to begin processing those requests in August 2026. A scraper that knowingly collects and sells personal information of consumers with whom it has no direct relationship meets the California data broker definition, and registration is not optional.

Second, enforcement shifted from notice-and-cure to direct penalties. CPRA eliminated the mandatory 30-day cure period that the original CCPA included. A scraper that fails to honour an opt-out request can face civil penalties of USD 2,500 per violation or USD 7,500 per intentional violation (base amounts, now adjusted for inflation), with each affected consumer counted separately. An intentional violation affecting 10,000 California residents therefore carries a theoretical ceiling of USD 75 million (10,000 × USD 7,500).
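The ceiling is plain multiplication, which is why per-consumer counting dominates the exposure. A sketch using the statutory base amounts quoted above; actual amounts are inflation-adjusted and penalties are set case by case, so treat the output as an illustrative upper bound, not an estimate.

```python
# Statutory base amounts per violation (before inflation adjustment).
BASE_PENALTY_USD = 2_500
INTENTIONAL_PENALTY_USD = 7_500


def max_penalty_exposure(affected_consumers: int, intentional: bool) -> int:
    """Upper bound if each affected consumer counts as a separate violation."""
    per_violation = INTENTIONAL_PENALTY_USD if intentional else BASE_PENALTY_USD
    return affected_consumers * per_violation


print(max_penalty_exposure(10_000, intentional=True))   # 75000000
print(max_penalty_exposure(10_000, intentional=False))  # 25000000
```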

Third, the CPPA has made opt-out handling, including recognition of GPC signals, a recurring enforcement theme, and the message is unmistakable: GPC is mandatory.

For the parallel UK and EU enforcement environment, see the GDPR compliance guide.

Decision tree for a US-touching scrape

Q1: Does the target site host personal info of California residents?
    ├── No  -> CCPA likely not in scope. Document the assessment.
    └── Yes -> Q2
Q2: Is the data clearly within the publicly available carve-out?
    ├── Yes -> Document why; still recommended to honour deletion requests.
    └── No  -> Q3
Q3: Does your business meet a CCPA threshold?
    ├── No  -> CCPA does not apply directly; state laws may.
    └── Yes -> Q4
Q4: Have you published a CCPA-compliant privacy policy?
    ├── No  -> Publish before launching.
    └── Yes -> Q5
Q5: Do you sell or share the scraped data?
    ├── Yes -> Add "Do Not Sell or Share" link; honour GPC; register if data broker.
    └── No  -> Q6
Q6: Will you process sensitive personal information?
    ├── Yes -> Honour limit-use right; consider risk assessment.
    └── No  -> Proceed; log the assessment in your records.
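The tree translates directly into a function, which makes the assessment repeatable and easy to log per target. A sketch with hypothetical field names for the answers your team records; the returned strings are actions to document, not legal conclusions. One deliberate deviation from the tree: it still asks Q6 when Q5 is yes, because SPI obligations apply whether or not you sell.

```python
from dataclasses import dataclass


@dataclass
class ScrapeAssessment:
    # One field per question in the decision tree (hypothetical names).
    hosts_ca_personal_info: bool       # Q1
    clearly_publicly_available: bool   # Q2
    meets_ccpa_threshold: bool         # Q3
    has_ccpa_privacy_policy: bool      # Q4
    sells_or_shares: bool              # Q5
    processes_spi: bool                # Q6


def assess(a: ScrapeAssessment) -> list:
    """Walk Q1 to Q6 and return the actions the tree prescribes."""
    if not a.hosts_ca_personal_info:
        return ["CCPA likely not in scope; document the assessment"]
    if a.clearly_publicly_available:
        return ["Document why the carve-out applies; still honour deletion requests"]
    if not a.meets_ccpa_threshold:
        return ["CCPA does not apply directly; check other state laws"]
    if not a.has_ccpa_privacy_policy:
        return ["Publish a CCPA-compliant privacy policy before launching"]
    actions = []
    if a.sells_or_shares:
        actions.append("Add 'Do Not Sell or Share' link; honour GPC; register if a data broker")
    if a.processes_spi:
        actions.append("Honour the right to limit SPI use; consider a risk assessment")
    if not actions:
        actions.append("Proceed; log the assessment in your records")
    return actions
```

Store the `ScrapeAssessment` alongside the returned actions per target, so the documented assessment the tree asks for exists automatically.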

Service provider, contractor, and third party

CCPA distinguishes between three downstream relationships. A service provider processes personal information on your behalf under a written contract that restricts further use. A contractor receives personal information that you make available to it for a business purpose, under a written contract with similar restrictions. A third party receives personal information for its own purposes; this is where “sale” attaches.

Scrapers commonly sit in two roles: as a service provider when they scrape on behalf of a customer under a DPA, and as a third party when they license the dataset for the customer’s independent use. The DPA you sign with a proxy provider is a service provider agreement. The DPA you sign with a customer who buys your dataset is potentially a third-party arrangement, depending on how restrictive the contract is. Get this categorisation wrong and you have either misclassified a sale (CPPA fine territory) or imposed restrictions you cannot enforce (commercial conflict).

Comparison: CCPA vs GDPR for scrapers

| Dimension | CCPA / CPRA | GDPR |
|---|---|---|
| Personal data definition | Broad, includes households | Broad, individuals only |
| Lawful basis required | No, but right to opt out of sale | Yes, six bases |
| Public data carve-out | Yes (publicly available information) | None |
| Right to delete | Yes (with exceptions) | Yes (Article 17) |
| Right to opt out of sale | Yes (GPC must be honoured) | No direct equivalent; right to object (Article 21) |
| Sensitive data category | Yes (SPI, CPRA addition) | Yes (special categories, Article 9) |
| Extraterritorial reach | Yes, if doing business in California | Yes, if offering goods or services to, or monitoring, people in the EU (Article 3(2)) |
| Penalties | Per-violation civil penalties | Administrative fines up to EUR 20 million or 4% of worldwide annual turnover, whichever is higher |
| Cure period | None (after CPRA) | No statutory cure period |
| Private right of action | Limited (data breach only) | Yes (Article 82) |

The two regimes overlap heavily but diverge on lawful basis and the public data carve-out. Building for both gives you a strong compliance baseline across the US and EU.

External references

The statute is California Civil Code, Title 1.81.5 (Section 1798.100 onward); the Attorney General’s CCPA overview is at oag.ca.gov/privacy/ccpa. The CPPA publishes its regulations and enforcement actions at cppa.ca.gov. The Global Privacy Control specification is at globalprivacycontrol.org.

Operationalising opt-out signals

The Global Privacy Control is a browser-emitted signal in the request headers (Sec-GPC: 1) that indicates the user has opted out of the sale or sharing of their personal information. The CPPA requires you to honour it automatically. Implementation for a scraping operator is two-part: detect the GPC signal at any user-facing surface (your website, your customer portal, your data preview pages) and treat any consumer whose original collection context included GPC as opted out by default.
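Detection at a user-facing surface reduces to reading one request header and persisting the result against the consumer. A framework-agnostic sketch: it takes the inbound request headers as a plain mapping, and the `opted_out` set is a hypothetical stand-in for whatever opt-out store you use.

```python
from typing import Mapping, MutableSet


def gpc_enabled(request_headers: Mapping[str, str]) -> bool:
    """True if the request carries Sec-GPC: 1. HTTP header names are case-insensitive."""
    for name, value in request_headers.items():
        if name.lower() == "sec-gpc":
            # Only the value "1" signals an opt-out under the GPC specification.
            return value.strip() == "1"
    return False


def apply_gpc(
    consumer_id: str, request_headers: Mapping[str, str], opted_out: MutableSet[str]
) -> bool:
    """Record a sale/share opt-out automatically when GPC is present; no extra user action."""
    if gpc_enabled(request_headers):
        opted_out.add(consumer_id)
        return True
    return False
```

Persisting the opt-out, rather than checking the header only on the current request, is what lets the signal follow the consumer into your downstream sale and share logic.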

For a scraped dataset, this is harder, because you typically do not have GPC headers from the scraping target. The practical workaround is to keep verification proportionate when requests arrive. Opt-out requests do not need to be verified, so do not require the requester to re-authenticate from a GPC-enabled browser; honour the request based on the identifier match alone. Deletion requests do need verification, but match against identifiers you already hold rather than demanding new proof, and document the verification path.

Special cases: data brokers, AI training, and hiring

The Delete Act (SB 362) made California the first US state with a single-source deletion mechanism for data brokers. Once the deletion portal is fully live (2026 phased rollout), any consumer can submit a single request that deletes their data across every registered broker. Scrapers who meet the data broker definition must register, must honour the central deletion list, and must keep re-checking that list rather than re-collecting and selling data about consumers who have already been deleted.
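Not re-collecting deleted consumers is easiest to enforce as a suppression check in front of every ingest write. A sketch under stated assumptions: the suppression set stores SHA-256 hashes of normalised email addresses so the list itself holds no raw identifiers, and loading entries from the central deletion mechanism is out of scope. The hashing scheme is illustrative, not the format the state platform uses.

```python
import hashlib


def identifier_hash(email: str) -> str:
    # Illustrative scheme: SHA-256 of the trimmed, lower-cased email address.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


class SuppressionList:
    """Hashes of consumers who must not be re-collected."""

    def __init__(self):
        self._hashes = set()

    def add(self, email: str) -> None:
        self._hashes.add(identifier_hash(email))

    def is_suppressed(self, email: str) -> bool:
        return identifier_hash(email) in self._hashes


def filter_for_ingest(records, suppression: SuppressionList):
    """Yield only records whose email is not on the suppression list."""
    for record in records:
        if not suppression.is_suppressed(record["email"]):
            yield record
```

Running the filter before the write, rather than as a later cleanup job, means a deleted consumer never re-enters the dataset at all.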

AI training can also trigger CPPA risk assessment requirements when the training set includes California residents’ personal information at scale. The risk assessment must address the necessity of the training data, the safeguards against re-identification, and the consumer rights surface for opt-out and deletion.

Employee and business-to-business data was largely exempt from CCPA from its 2020 effective date until January 2023, when those exemptions expired. A scraper that pulls professional profile data of California residents now operates under full CCPA, with no employment-context or B2B exemption.

FAQ

Is publicly available data exempt from CCPA?
Partially. The carve-out only covers data that was lawfully made publicly available and that the consumer has not restricted. Inferences drawn from public data are not themselves public.

Do I need to honour Global Privacy Control signals?
Yes. The CCPA regulations require businesses to treat GPC as a valid opt-out request and to process it automatically; the Attorney General’s 2022 Sephora settlement turned on this point.

What is the fine range under CCPA in 2026?
Civil penalties are USD 2,500 per violation or USD 7,500 per intentional violation, with each individual consumer counted separately.

Am I a data broker if I scrape and sell?
If you knowingly collect and sell personal information of consumers with whom you do not have a direct relationship, yes, and you must register annually under the Delete Act.

Does CCPA apply to B2B data?
Yes since 2023. The B2B carve-out expired and professional contact data of California residents is now fully covered.

Extended enforcement analysis 2024-2026

The California Privacy Protection Agency moved from rulemaking to active enforcement during 2024 and 2025, while the Attorney General continued its own actions. The Attorney General’s DoorDash settlement (February 2024, USD 375,000) concerned disclosing customer personal information to a marketing co-operative without the required notice or opt-out, the same sell-without-opt-out pattern that resale-driven scrapers risk. Enforcement activity through 2025 highlighted patterns relevant to scrapers, namely failure to honour the Global Privacy Control (Sec-GPC) signal and failure to surface a Do Not Sell or Share My Personal Information link in the privacy notice that links the scraping operation to the consumer-facing brand.

The Sephora settlement (August 2022, USD 1.2 million) remains the touchstone for California enforcement on opt-out signals. The Attorney General alleged that Sephora failed to disclose that it sold personal information and failed to process GPC opt-out signals as valid requests. Every scraper that touches California residents should treat that case as the reference point and design GPC handling into the ingest layer, not a downstream marketing tool.

A pattern emerged in 2025 that scrapers should plan for. The CPPA increasingly scrutinises scraping followed by enrichment, segmentation, and resale, and any resale of the enriched output is a sale under CCPA. The key question is whether the consumer would reasonably expect their public information to be combined with non-public signals and sold downstream, which mirrors the reasonable-expectations test the CPPA regulations apply to purpose limitation. For B2B people-data vendors this is now the central compliance question.

Implementation patterns for a CCPA-clean pipeline

The minimum control set for a US-touching 2026 scraping pipeline includes nine items.

  1. A GPC check (the Sec-GPC request header that consumers’ browsers send) wherever consumer requests reach you, with the result logged per request.
  2. A privacy notice link surfaced on every consumer-facing surface that touches scraped data.
  3. A right-to-know workflow that responds within 45 days, extendable once to 90 days in total.
  4. A right-to-delete workflow with verification that does not over-collect identity proof.
  5. A right-to-correct workflow, added by CPRA with effect from 2023 and now actively enforced.
  6. A right-to-limit-use-of-sensitive-personal-information workflow.
  7. A service provider contract with every downstream processor.
  8. A data inventory that distinguishes personal information from sensitive personal information.
  9. A retention schedule documented per category and enforced.
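Item 9 is the control most often documented but not enforced. A minimal sketch of enforcement as a periodic purge; the categories and periods are placeholders rather than recommended values, and each record is assumed to carry a `category` and a timezone-aware `collected_at` timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Placeholder schedule (category -> maximum retention); set it from your disclosed policy.
RETENTION = {
    "contact": timedelta(days=365),
    "sensitive": timedelta(days=90),
}


def is_expired(record: dict, now: datetime) -> bool:
    limit = RETENTION.get(record["category"])
    if limit is None:
        # A category with no documented period means the schedule is incomplete: fail loudly.
        raise KeyError(f"no retention period for category {record['category']!r}")
    return now - record["collected_at"] > limit


def purge(records: List[dict], now: Optional[datetime] = None) -> List[dict]:
    """Return only records still inside their retention period."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if not is_expired(r, now)]
```

Raising on an unknown category is a deliberate choice: silently keeping unclassified data is exactly the gap a retention audit looks for.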

Worked example: GPC handling at fetch time

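```python
import logging

logger = logging.getLogger(__name__)


def should_index(url, request_headers):
    # Sec-GPC is a request header sent by the consumer's browser; HTTP header names are case-insensitive.
    gpc = {k.lower(): v.strip() for k, v in request_headers.items()}.get("sec-gpc", "0")
    if gpc == "1":
        logger.info("gpc_signal_present url=%s", url)
        return False  # treat as opt-out for downstream sale or share
    return True
```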

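The check belongs at the ingest layer, wherever a consumer’s request headers are captured, because removing data downstream after vectorisation is harder than never indexing it. A target site’s response headers will not carry Sec-GPC, so scraped records without a captured consumer request still depend on your opt-out request workflow.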

Additional FAQ

Do I need a CCPA notice if I never sell data?
Yes if you process personal information of California residents above the thresholds. The notice obligation is independent of sale.

Does the publicly available carve-out cover LinkedIn profiles?
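Generally no. The carve-out covers information lawfully made available from federal, state, or local government records, information the business reasonably believes the consumer lawfully made available to the general public or that appears in widely distributed media, and information shared by someone the consumer disclosed it to without restricting the audience. Visibility on a commercial platform whose terms of service restrict bulk access does not satisfy the carve-out by itself.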

What is the difference between sale and share under CCPA?
Sale is exchange for monetary or other valuable consideration. Share is disclosure for cross-context behavioural advertising. Both trigger opt-out rights and the Do Not Sell or Share link.

How do I verify a deletion request without over-collecting?
Match against information you already hold. A consumer should not have to provide more identity than the minimum needed to confirm the match. Documentation of the verification logic is part of compliance.
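One way to express this is to compare only the data points the requester supplies against values you already store, and to require a minimum number of matches. A sketch; the two-match default and the field names are assumptions, and the CCPA regulations set graded verification standards that depend on how sensitive the data is, so tune the threshold accordingly.

```python
def verify_request(supplied: dict, stored: dict, required_matches: int = 2) -> bool:
    """Verify a request by matching supplied data points against stored ones.

    Fields you do not already hold are ignored, and nothing here persists the
    supplied values, so verification adds no new personal information.
    """

    def norm(v) -> str:
        return str(v).strip().lower()

    matches = sum(
        1
        for name, value in supplied.items()
        if name in stored and norm(stored[name]) == norm(value)
    )
    return matches >= required_matches
```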

Practical scope determination for CCPA

Determining whether the CCPA applies to a scraping operation requires analysis on three axes. First, whether the operation processes personal information of California residents. Second, whether the operating entity meets a threshold (more than USD 25 million in annual revenue; buying, selling, or sharing the personal information of 100,000 or more California consumers or households; or deriving 50 percent or more of revenue from selling or sharing personal information). Third, whether the activity falls within one of the CCPA’s exemptions.

For most commercial scrapers the first axis is yes by default, the second is often met through the 100,000-consumer threshold even by small teams, and the third offers little relief. The narrow exemptions for medical information governed by HIPAA, financial information governed by GLBA, and the business-to-business communications exemption that expired in January 2023 are of limited use to a generic scraper.

The 2024 amendments and CPPA regulations clarified that aggregators, brokers, and AI training data vendors fall squarely within scope when they touch California-resident data. The CPPA’s enforcement priorities published in 2025 listed data brokers and AI training data as the top two areas of focus. Scrapers in those categories should plan for data broker registration under the Delete Act where applicable and a higher level of regulatory attention.

Sensitive personal information and the right to limit

The CPRA amendments, effective in 2023, introduced a new category of sensitive personal information (SPI) and a new right to limit its use and disclosure. SPI includes Social Security numbers, driver’s licence numbers, financial account information, precise geolocation, racial or ethnic origin, religious beliefs, mail and email content, genetic data, biometric data, health data, and sex life or sexual orientation.

For scrapers the SPI category is operationally similar to GDPR Article 9 special category data. The scraper should detect SPI at ingest, route it to a separate handling pathway with stricter access controls, and surface a right-to-limit-use mechanism on the consumer-facing surface.

The right to limit is narrower than the right to delete. A consumer who exercises the right to limit is restricting use to specific listed purposes (services requested, security and integrity, certain analytics) but is not requiring deletion. The scraper must therefore have a way to flag SPI records as limited and prevent downstream non-listed uses.
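The flag-and-gate mechanism can be sketched as a boolean on each SPI record plus a purpose check in front of every downstream use. The purpose labels are hypothetical shorthand for the permitted purposes listed above; align them with the current regulations before relying on them.

```python
from dataclasses import dataclass

# Hypothetical labels for the purposes that remain allowed after a limit request.
PERMITTED_WHEN_LIMITED = {"requested_service", "security_integrity", "listed_analytics"}


@dataclass
class StoredRecord:
    record_id: str
    is_spi: bool
    use_limited: bool = False


def limit_use(record: StoredRecord) -> None:
    """Apply a right-to-limit request; the record is kept, not deleted."""
    if record.is_spi:
        record.use_limited = True


def may_use(record: StoredRecord, purpose: str) -> bool:
    """Gate every downstream use of a record on its limit flag."""
    if record.is_spi and record.use_limited:
        return purpose in PERMITTED_WHEN_LIMITED
    return True
```

Because `limit_use` only sets a flag, the same record can still serve the listed purposes, which is the difference from deletion described above.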

Service provider, contractor, and third party distinctions

The CCPA distinguishes service providers (who process personal information on behalf of a business under a written contract restricting their use), contractors (a 2023 addition broadly similar to service providers but with subtle differences), and third parties (everyone else). The classification matters because transfers to service providers and contractors are not sales or shares, but transfers to third parties typically are.

A scraping operation that resells data to clients must therefore decide whether each client is a service provider, contractor, or third party, and put the right contract in place. The contract terms required by the CPPA regulations (Cal. Code Regs. tit. 11, § 7051) are the safest starting point. A recipient without a contract containing those terms does not qualify as a service provider or contractor, so the transfer is treated like one to a third party.

A common 2026 mistake is treating analytics platforms as service providers without a service provider contract. Without the contract, the data transfer to the analytics platform is a sale or share that triggers the opt-out right and the Do Not Sell or Share link.
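The classification rule can be run as a check before any outbound transfer. A sketch assuming you record each recipient's role and whether a contract with the required terms is on file; the enum and function names are illustrative, and assigning the role remains a legal judgement the code cannot make. It conservatively treats every third-party transfer as a potential sale or share.

```python
from enum import Enum


class Role(Enum):
    SERVICE_PROVIDER = "service_provider"
    CONTRACTOR = "contractor"
    THIRD_PARTY = "third_party"


def transfer_is_sale_or_share(role: Role, compliant_contract: bool) -> bool:
    """Only a service provider or contractor under a compliant contract is exempt."""
    if role in (Role.SERVICE_PROVIDER, Role.CONTRACTOR) and compliant_contract:
        return False
    return True


def allowed_to_transfer(role: Role, compliant_contract: bool, consumer_opted_out: bool) -> bool:
    """Block sale or share transfers for consumers who have opted out."""
    return not (transfer_is_sale_or_share(role, compliant_contract) and consumer_opted_out)
```

The analytics-platform mistake above shows up here as `Role.SERVICE_PROVIDER` with `compliant_contract=False`, which the check treats as a sale or share.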

Next steps

The fastest path to a defensible CCPA posture in 2026 is to publish the privacy policy with the required disclosures, wire up GPC detection across your customer-facing surfaces, stand up a deletion inbox you actually monitor, and register as a data broker if you sell scraped data about consumers with whom you have no direct relationship. For broader policy guidance, head to the DRT compliance and ethics hub and pair this guide with the ethics-first policy build.

This guide is informational, not legal advice.
