Scraping competitor ad libraries is one of the highest-leverage intelligence moves a growth team can make, and in 2026 it has gotten both easier and harder simultaneously. Meta, Google, and TikTok all expose public ad transparency portals, but each one has tightened rate limits, added fingerprinting layers, and restructured their HTML quarterly. This guide covers what actually works right now, platform by platform, with real tool choices and honest tradeoffs.
## Why Ad Library Scraping Matters for Competitive Intel
Ad libraries show you what your competitors are spending money on: which creatives are running, how long they’ve been live, which regions they target, and which angles they’re testing. A creative that has been running for 90-plus days is almost certainly profitable. That signal alone can cut your own creative testing budget significantly.
Combined with scraping Reddit subreddit sentiment for marketing intel, you get a full picture: what competitors are saying in paid channels and how real users are responding to those messages organically.
## Meta Ad Library: The Most Data, the Worst Bot Defenses
Meta’s Ad Library at facebook.com/ads/library is the richest source. It exposes active and recently inactive ads with spend ranges, impression estimates, start dates, and demographic targeting breakdowns for political ads. Non-political ads skip the demographic data but still show creative, copy, and run duration.
Meta’s defenses in 2026 include aggressive browser fingerprinting, rate limits that trigger at roughly 80-100 requests per session, and occasional CAPTCHAs on geographic anomalies. Playwright with stealth patches (the puppeteer-extra stealth evasions, ported to Python as playwright-stealth) handles the fingerprinting. The harder problem is session rotation.
```python
from playwright.async_api import async_playwright
from playwright_stealth import stealth_async


async def scrape_meta_ad_library(query: str, proxy: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy={"server": proxy})
        page = await browser.new_page()
        await stealth_async(page)
        await page.goto(
            f"https://www.facebook.com/ads/library/?active_status=active&ad_type=all&q={query}",
            wait_until="networkidle",
        )
        # scroll-load and extract ad cards here
        await browser.close()
```

Use residential proxies with sticky sessions per scraping worker. Rotating too fast on Meta triggers device-change signals. For volume work at scale, the same proxy discipline that matters for scraping Google Maps and local pack data applies here: residential IPs, low concurrency per IP, and session persistence measured in minutes, not seconds.
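Sticky sessions are typically implemented by encoding a stable session ID into the proxy username, a convention most residential providers support in some form. The gateway host, port, and username scheme below are placeholders for illustration, not any real provider's format:

```python
def sticky_proxy(worker_id: int,
                 gateway: str = "gw.example-proxy.net:7777") -> dict:
    """Build a Playwright proxy config pinned to one residential exit IP.

    One stable session ID per scraping worker keeps the same exit IP for
    minutes at a time, avoiding Meta's device-change signals.
    """
    session = f"meta-worker-{worker_id}"
    return {
        "server": f"http://{gateway}",
        "username": f"user-session-{session}",  # provider-specific scheme
        "password": "PROXY_PASSWORD",           # placeholder credential
    }
```

Pass the returned dict straight to `p.chromium.launch(proxy=...)` and keep the same `worker_id` for the lifetime of the worker, so the session only rotates when the worker restarts.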
## Google Ads Transparency Center: Structured but Rate-Limited
Google launched its Ads Transparency Center in 2023 and expanded coverage through 2025. By 2026, it covers Search, Shopping, Display, and YouTube ads with verified advertiser status, first-seen/last-seen dates, and geographic targeting.
The endpoint is adstransparency.google.com. Google serves it as a client-rendered Angular app, so raw HTTP requests return a shell. You need a headless browser. The good news: Google’s fingerprinting here is lighter than on Search itself, and the API calls the frontend makes to adstransparency.google.com/api/v1/ads are consistent enough to reverse-engineer from the browser's network panel.
Key data points available per ad:
- Advertiser verified name and domain
- First and last shown dates
- Ad format (Search, Display, Video)
- Regional targeting (country-level)
- Creative text and headlines for Search ads
Rate limits are roughly 200-300 API calls per hour per IP before you hit soft blocks. Residential proxies with 10-15 minute sticky sessions work well. If you’re also running SERP feature scraping for SEO audits, you can reuse the same proxy pool across both jobs since the timing profiles are compatible.
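A minimal sketch of pacing requests against that internal endpoint to stay under the hourly soft block. The JSON payload fields here are illustrative assumptions, not a documented contract; capture the exact request the Angular frontend sends in DevTools and mirror it:

```python
import time


def min_interval_seconds(calls_per_hour: int) -> float:
    # Even spacing keeps a single IP under the hourly soft-block threshold:
    # at 250 calls/hr (midpoint of the observed 200-300 range), ~14.4 s apart.
    return 3600.0 / calls_per_hour


def fetch_ads(advertiser_id: str, proxy_url: str) -> dict:
    import httpx  # imported lazily so the pacing helper stays stdlib-only

    # NOTE: the payload shape below is an assumption for illustration --
    # mirror the real frontend request observed in DevTools exactly.
    with httpx.Client(proxy=proxy_url, timeout=30) as client:
        resp = client.post(
            "https://adstransparency.google.com/api/v1/ads",
            json={"advertiser_id": advertiser_id, "region": "US"},
        )
        resp.raise_for_status()
        time.sleep(min_interval_seconds(250))  # throttle before the next call
        return resp.json()
```

Sleeping inside the fetch is the simplest throttle; a shared token bucket works better once multiple workers share one sticky proxy session.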
## TikTok Creative Center: The Easiest Target in 2026
TikTok’s Creative Center (ads.tiktok.com/business/creativecenter) is the least defended of the three. TikTok exposes a public API that the Creative Center frontend calls, and it is effectively undocumented but stable. The base endpoint is:
```
GET https://ads.tiktok.com/creative_radar_api/v1/top_ads/v2/list
    ?period=7&industry_id=<id>&region=US&limit=20&page=1
```

No authentication required. Rate limits are permissive: 1,000-plus calls per hour from a single datacenter IP without issues in testing through April 2026. You can use httpx with no browser overhead.
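Since no browser is needed, a plain httpx GET is enough. The response schema is undocumented, so treat the JSON keys as subject to change; the parameter names below mirror the query string shown above:

```python
BASE = "https://ads.tiktok.com/creative_radar_api/v1/top_ads/v2/list"


def top_ads_params(industry_id: str, region: str = "US",
                   period: int = 7, page: int = 1, limit: int = 20) -> dict:
    # Mirrors the query string the Creative Center frontend sends.
    return {"period": period, "industry_id": industry_id,
            "region": region, "limit": limit, "page": page}


def fetch_top_ads(industry_id: str) -> dict:
    import httpx  # lazy import keeps the param builder dependency-free

    resp = httpx.get(BASE, params=top_ads_params(industry_id), timeout=30)
    resp.raise_for_status()
    # Undocumented response schema: inspect the JSON before relying on keys.
    return resp.json()
```

Paginate by incrementing `page` until the result list comes back empty; with the permissive limits, sequential pagination from one IP is fine.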
The tradeoff: Creative Center shows top-performing ads across the platform, not a specific advertiser’s full library. For advertiser-specific intel, TikTok’s formal Ad Library (library.tiktok.com) is more relevant but requires account login and is more defended. For category-level trend analysis, the Creative Center API is the cleaner option.
## Platform Comparison and Tooling
| Platform | Auth Required | JS Rendering | Rate Limit (est.) | Best Proxy Type |
|---|---|---|---|---|
| Meta Ad Library | No (public) | Yes (React) | ~80-100 req/session | Residential sticky |
| Google Ads Transparency | No (public) | Yes (Angular) | ~200-300 req/hr | Residential sticky |
| TikTok Creative Center | No | No (API) | ~1,000+ req/hr | Datacenter OK |
| TikTok Ad Library | Yes (login) | Yes | Unknown | Residential rotating |
## Parsing and Storage Pipeline
Once you’ve handled the scraping layer, the parsing and storage pipeline is where most teams lose time. Recommended stack:
- Scrape raw HTML or API JSON into a staging bucket (S3 or local filesystem) before parsing. Never parse inline on the scraper worker.
- Run a separate extraction job using `selectolax` (faster than BeautifulSoup for HTML) or `jmespath` for JSON payloads.
- Normalize advertiser names with fuzzy matching (`rapidfuzz`), since the same brand appears differently across platforms.
- Store in Postgres with a `first_seen` / `last_seen` / `last_scraped` pattern. Upsert on a composite key of `(platform, ad_id)`.
- Flag ads with `last_seen - first_seen > 60 days` as “evergreen” for creative intelligence reports.
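The upsert and evergreen-flag steps above can be sketched in SQL. The table and column names are assumptions matching the pattern described, with psycopg-style named parameters:

```python
# Upsert on the composite key; first_seen is preserved, last_seen advances.
UPSERT_SQL = """
INSERT INTO ads (platform, ad_id, advertiser, first_seen, last_seen, last_scraped)
VALUES (%(platform)s, %(ad_id)s, %(advertiser)s, %(seen)s, %(seen)s, now())
ON CONFLICT (platform, ad_id) DO UPDATE
SET last_seen    = GREATEST(ads.last_seen, EXCLUDED.last_seen),
    last_scraped = now();
"""

# Ads running 60+ days are flagged as evergreen for intel reports.
EVERGREEN_SQL = """
SELECT platform, ad_id, advertiser
FROM ads
WHERE last_seen - first_seen > INTERVAL '60 days';
"""
```

Because `first_seen` is only ever written on insert, the `DO UPDATE` clause never touches it, which is what makes the evergreen query trustworthy over time.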
For sentiment layering on top of the ad creative data, cross-referencing with YouTube comment sentiment gives you a feedback loop: identify which competitor ad angles generate positive versus negative response in organic video comments.
The same Postgres schema works well for backlink monitoring if you’re also running backlink network scraping at scale, since both workflows benefit from the same deduplication and staleness-detection patterns.
### Deduplication gotchas
- Meta recycles ad IDs when creatives are paused and relaunched, so treat `(ad_id, start_date)` as the composite key, not `ad_id` alone.
- TikTok Creative Center returns the same ad in multiple regions with the same `id`. Use `(id, region)` as your key.
- Google Transparency uses opaque internal IDs that can change when an ad is edited. Hash the creative text + advertiser + start date as a stable fingerprint.
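The Google fingerprint can be a simple hash over the stable fields. The whitespace and case normalization here is a suggested convention, not something Google defines:

```python
import hashlib


def google_ad_fingerprint(creative_text: str, advertiser: str,
                          start_date: str) -> str:
    # Google's internal ad IDs can change when an ad is edited, so hash
    # the fields that survive edits into a deterministic dedup key.
    # A unit separator joins fields so boundaries cannot collide.
    parts = (creative_text.strip().lower(), advertiser.strip().lower(), start_date)
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
```

Store the hex digest in a unique-indexed column next to Google's own ID; when the opaque ID changes but the fingerprint matches, treat it as the same ad.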
## Bottom Line
For most growth and data teams in 2026, TikTok Creative Center is the fastest win: no auth, no browser overhead, permissive limits. Meta Ad Library is the most valuable for e-commerce competitive intel but requires residential proxy investment and careful session management. Google Ads Transparency sits in the middle on both dimensions. DRT covers infrastructure patterns for exactly these kinds of scraping pipelines, so if you’re building this out at scale, the proxy and parsing architecture pieces are worth reading in depth before you hit your first soft block.
## Related guides on dataresearchtools.com
- Scraping SERP Features for 2026 SEO Audits: PAA, Snippets, AIO
- Scraping Backlink Networks at Scale for Disavow Files (2026)
- Scraping YouTube Comment Sentiment for Brand Analysis (2026)
- Scraping Reddit Subreddit Sentiment for Marketing Intel (2026)
- Pillar: Best Proxy Types for Scraping Google Maps and Local Pack (2026)