Scraping Competitor Ad Libraries: Meta, Google, TikTok in 2026

Scraping competitor ad libraries is one of the highest-leverage intelligence moves a growth team can make, and in 2026 it has become both easier and harder. Meta, Google, and TikTok all expose public ad transparency portals, but each has tightened rate limits, added fingerprinting layers, and now restructures its HTML every quarter. This guide covers what actually works right now, platform by platform, with real tool choices and honest tradeoffs.

Why Ad Library Scraping Matters for Competitive Intel

Ad libraries show you what your competitors are spending money on: which creatives are running, how long they’ve been live, which regions they target, and which angles they’re testing. A creative that has been running for 90-plus days is almost certainly profitable. That signal alone can cut your own creative testing budget significantly.

Combined with scraping Reddit subreddit sentiment for marketing intel, you get a full picture: what competitors are saying in paid channels and how real users are responding to those messages organically.

Meta Ad Library: The Most Data, the Worst Bot Defenses

Meta’s Ad Library at facebook.com/ads/library is the richest source. It exposes active and recently inactive ads with creative, copy, and start dates; for political and social-issue ads it adds spend ranges, impression estimates, and demographic targeting breakdowns. Non-political ads omit that extra data but still show creative, copy, and run duration.

Meta’s defenses in 2026 include aggressive browser fingerprinting via their Bowser-based detection stack, rate limits that trigger at roughly 80-100 requests per session, and occasional CAPTCHAs on geographic anomalies. Playwright with stealth patches (playwright-stealth, the Python port of the puppeteer-extra stealth plugin) handles the fingerprinting. The harder problem is session rotation.

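from urllib.parse import quote_plus

from playwright.async_api import async_playwright
# stealth_async as used in the original; newer playwright-stealth releases may expose a different API
from playwright_stealth import stealth_async

async def scrape_meta_ad_library(query: str, proxy: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy={"server": proxy})
        try:
            page = await browser.new_page()
            await stealth_async(page)  # apply stealth patches before navigating
            await page.goto(
                "https://www.facebook.com/ads/library/"
                # URL-encode the query so spaces and symbols survive
                f"?active_status=active&ad_type=all&q={quote_plus(query)}",
                wait_until="networkidle",
            )
            # scroll-load and extract ad cards here
            return await page.content()  # raw HTML for a later parsing stage
        finally:
            await browser.close()  # release the browser even if navigation fails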

Use residential proxies with sticky sessions, one per scraping worker. Rotating too fast on Meta triggers device-change signals. For volume work, the same proxy discipline that matters for scraping Google Maps and local pack data applies here: residential IPs, low concurrency per IP, and session persistence measured in minutes, not seconds.
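The sticky-session rule can be sketched as a small helper that keeps one proxy per worker until a request budget or age limit is hit. This is a minimal illustration, not a production rotator: the class name, the 80-request budget, and the 10-minute lifetime are assumptions chosen to sit under the rough limits quoted above, and the proxy URLs are placeholders.

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class StickyProxySession:
    """Hands out one proxy URL until a request budget or age limit is reached."""

    proxies: Iterator[str]                      # endless iterator over a (hypothetical) proxy pool
    max_requests: int = 80                      # assumed budget, under the ~80-100 estimate
    max_age_seconds: float = 600.0              # assumed lifetime: minutes, not seconds
    clock: Callable[[], float] = time.monotonic # injectable for testing
    _current: str = field(default="", init=False)
    _used: int = field(default=0, init=False)
    _started: float = field(default=0.0, init=False)

    def acquire(self) -> str:
        """Return the proxy for the next request, rotating only when needed."""
        now = self.clock()
        expired = bool(self._current) and (now - self._started) >= self.max_age_seconds
        exhausted = self._used >= self.max_requests
        if not self._current or expired or exhausted:
            self._current = next(self.proxies)
            self._used = 0
            self._started = now
        self._used += 1
        return self._current


# Usage: one session object per worker, never shared across workers.
session = StickyProxySession(itertools.cycle(["http://proxy-a:8000", "http://proxy-b:8000"]))
proxy_for_next_request = session.acquire()
```

Keeping the rotation decision in one place makes it easy to tune the budget per platform without touching the scraper itself.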

Google Ads Transparency Center: Structured but Rate-Limited

Google launched its Ads Transparency Center in 2023 and expanded coverage through 2025. By 2026, it covers Search, Shopping, Display, and YouTube ads with verified advertiser status, first-seen/last-seen dates, and geographic targeting.

The portal lives at adstransparency.google.com. Google serves it as a client-rendered Angular app, so raw HTTP requests return only a shell and you need a headless browser. The good news: Google’s fingerprinting here is lighter than on Search itself, and the API calls the frontend makes to adstransparency.google.com/api/v1/ads are consistent enough to reverse-engineer.

Key data points available per ad:

  • Advertiser verified name and domain
  • First and last shown dates
  • Ad format (Search, Display, Video)
  • Regional targeting (country-level)
  • Creative text and headlines for Search ads

Rate limits are roughly 200-300 API calls per hour per IP before you hit soft blocks. Residential proxies with 10-15 minute sticky sessions work well. If you’re also running SERP feature scraping for SEO audits, you can reuse the same proxy pool across both jobs since the timing profiles are compatible.
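One way to stay under an hourly per-IP budget is a sliding-window limiter. The sketch below assumes a conservative 200 calls per hour (the low end of the estimate above); the class and method names are illustrative, not part of any library.

```python
import time
from collections import deque
from typing import Callable, Deque


class HourlyBudget:
    """Sliding-window limiter: at most max_calls per window_seconds for one IP."""

    def __init__(self, max_calls: int = 200, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls            # assumed: low end of the 200-300 estimate
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable for testing
        self._calls: Deque[float] = deque()   # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self._calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self._calls.append(self.clock())
```

A worker would call `wait_time()`, sleep for the returned duration if it is non-zero, make the request, then call `record()`. Keep one limiter instance per proxy IP, since the budget described above is per IP.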

TikTok Creative Center: The Easiest Target in 2026

TikTok’s Creative Center (ads.tiktok.com/business/creativecenter) is the least defended of the three. TikTok exposes a public API that the Creative Center frontend calls, and it is effectively undocumented but stable. The base endpoint is:

GET https://ads.tiktok.com/creative_radar_api/v1/top_ads/v2/list
  ?period=7&industry_id=<id>&region=US&limit=20&page=1

No authentication required. Rate limits are permissive: 1,000-plus calls per hour from a single datacenter IP without issues in testing through April 2026. You can use httpx with no browser overhead.
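A minimal sketch of calling that endpoint follows. It uses the standard library's urllib instead of httpx so it runs without extra dependencies; the same `params` dict can be passed to `httpx.get(BASE_URL, params=params)`. The function names are hypothetical, and because the response schema is undocumented, the fetch helper returns the decoded JSON as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://ads.tiktok.com/creative_radar_api/v1/top_ads/v2/list"


def build_top_ads_url(industry_id: str, region: str = "US", period: int = 7,
                      limit: int = 20, page: int = 1) -> str:
    """Build the list URL from the query parameters shown above."""
    params = {"period": period, "industry_id": industry_id,
              "region": region, "limit": limit, "page": page}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_top_ads_page(industry_id: str, page: int = 1, timeout: float = 15.0) -> dict:
    """Fetch one page and return the decoded JSON without assuming its schema."""
    request = Request(build_top_ads_url(industry_id, page=page),
                      headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Paginating is then a loop over `page` that writes each raw payload to staging before any parsing happens.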

The tradeoff: Creative Center shows top-performing ads across the platform, not a specific advertiser’s full library. For advertiser-specific intel, TikTok’s formal Ad Library (library.tiktok.com) is more relevant but requires account login and is more defended. For category-level trend analysis, the Creative Center API is the cleaner option.

Platform Comparison and Tooling

Platform                | Auth Required | JS Rendering  | Rate Limit (est.)   | Best Proxy Type
------------------------|---------------|---------------|---------------------|---------------------
Meta Ad Library         | No (public)   | Yes (React)   | ~80-100 req/session | Residential sticky
Google Ads Transparency | No (public)   | Yes (Angular) | ~200-300 req/hr     | Residential sticky
TikTok Creative Center  | No            | No (API)      | ~1,000+ req/hr      | Datacenter OK
TikTok Ad Library       | Yes (login)   | Yes           | Unknown             | Residential rotating

Parsing and Storage Pipeline

Once you’ve handled the scraping layer, the parsing and storage pipeline is where most teams lose time. Recommended stack:

  1. Scrape raw HTML or API JSON into a staging bucket (S3 or local filesystem) before parsing. Never parse inline on the scraper worker.
  2. Run a separate extraction job using selectolax (faster than BeautifulSoup for HTML) or jmespath for JSON payloads.
  3. Normalize advertiser names with fuzzy matching (rapidfuzz) since the same brand appears differently across platforms.
  4. Store in Postgres with a first_seen / last_seen / last_scraped pattern. Upsert on a composite key of (platform, ad_id), widened per platform where the deduplication gotchas below require it.
  5. Flag ads with last_seen - first_seen > 60 days as “evergreen” for creative intelligence reports.
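Steps 4 and 5 can be sketched as follows. To keep the example runnable without a database server it uses the standard library's sqlite3 as a stand-in for Postgres; the `ON CONFLICT ... DO UPDATE` clause has the same shape in Postgres, while the placeholder style and the date arithmetic in the evergreen query would differ. Table and column names beyond first_seen, last_seen, and last_scraped are assumptions.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE IF NOT EXISTS ads (
    platform     TEXT NOT NULL,
    ad_id        TEXT NOT NULL,
    advertiser   TEXT,
    first_seen   TEXT NOT NULL,
    last_seen    TEXT NOT NULL,
    last_scraped TEXT NOT NULL,
    PRIMARY KEY (platform, ad_id)
)
"""

# On a repeat sighting, keep first_seen and advance only the later timestamps.
UPSERT = """
INSERT INTO ads (platform, ad_id, advertiser, first_seen, last_seen, last_scraped)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (platform, ad_id) DO UPDATE SET
    last_seen = excluded.last_seen,
    last_scraped = excluded.last_scraped
"""

# Ads whose observed run exceeds 60 days (SQLite date arithmetic).
EVERGREEN = """
SELECT platform, ad_id FROM ads
WHERE julianday(last_seen) - julianday(first_seen) > 60
"""


def record_sighting(conn: sqlite3.Connection, platform: str, ad_id: str,
                    advertiser: str, seen_on: date) -> None:
    """Insert a new ad or extend the last_seen date of a known one."""
    day = seen_on.isoformat()
    conn.execute(UPSERT, (platform, ad_id, advertiser, day, day, day))
```

Calling `record_sighting` once per ad per scrape run is enough to maintain the run-duration signal; the evergreen report is then a single query.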

For sentiment layering on top of the ad creative data, cross-referencing with YouTube comment sentiment gives you a feedback loop: identify which competitor ad angles generate positive versus negative response in organic video comments.

The same Postgres schema works well for backlink monitoring if you’re also running backlink network scraping at scale, since both workflows benefit from the same deduplication and staleness-detection patterns.

Deduplication gotchas

  • Meta recycles ad IDs when creatives are paused and relaunched, so treat (ad_id, start_date) as the composite key, not ad_id alone.
  • TikTok Creative Center returns the same ad in multiple regions with the same id. Use (id, region) as your key.
  • Google Transparency uses opaque internal IDs that can change when an ad is edited. Hash the creative text + advertiser + start date as a stable fingerprint.

Bottom Line

For most growth and data teams in 2026, TikTok Creative Center is the fastest win: no auth, no browser overhead, permissive limits. Meta Ad Library is the most valuable for e-commerce competitive intel but requires residential proxy investment and careful session management. Google Ads Transparency sits in the middle on both dimensions. DRT covers infrastructure patterns for exactly these kinds of scraping pipelines, so if you’re building this out at scale, the proxy and parsing architecture pieces are worth reading in depth before you hit your first soft block.
