How to Scrape Netflix Catalog Data with Proxies in 2026
Netflix operates one of the world’s largest streaming catalogs, but its library varies dramatically by country. A title available in the US might not be available in the UK, Japan, or Brazil. For content researchers, media analysts, VPN reviewers, and entertainment industry professionals, scraping Netflix catalog data across multiple countries provides unique insights into content licensing, regional strategy, and library composition.
This guide covers how to scrape Netflix catalog data using Python with geo-targeted residential proxies to compare libraries across regions.
Why Scrape Netflix Catalog Data?
Netflix catalog analysis serves several industries:
- Content licensing intelligence — Track which titles are available in which countries to understand licensing deals
- Media research — Analyze Netflix’s content strategy by region, genre distribution, and content investment
- VPN service reviews — Provide accurate country-specific library comparisons for VPN review content
- Entertainment journalism — Report on new additions and removals from Netflix libraries worldwide
- Academic research — Study content localization, cultural preferences, and streaming market dynamics
- Competitive analysis — Compare Netflix’s regional libraries against Disney+, Amazon Prime, and other services
- Content gap analysis — Identify titles available in other regions but not in your target market
Netflix’s Geo-Specific Catalogs
Netflix maintains different catalogs for almost every country it operates in. Key differences include:
- Title availability — A movie or series may be in the US catalog but absent from the UK catalog due to licensing
- Content volume — The US library typically has 5,000+ titles, while smaller markets may have 2,000-3,000
- Original vs. licensed — Netflix Originals are generally available worldwide, while licensed content varies
- Pricing — Subscription costs differ by country
- Local originals — Region-specific original content (Korean dramas, Spanish series, etc.)
- Language options — Available audio tracks and subtitle languages vary by region
Why You Need Proxies
Netflix determines your location based on your IP address and serves the catalog for that region. Without proxies:
- You can only see the catalog for your own country
- Netflix blocks known VPN and proxy IPs aggressively
- Datacenter IPs are blocked almost universally
- Multiple rapid requests from one IP trigger rate limiting
To see the catalog for a different country, you need a residential proxy IP from that country.
Data Points to Extract
| Data Point | Source | Notes |
|---|---|---|
| Title | Browse page / detail | Movie or series name |
| Type | Metadata | Movie, series, documentary |
| Genre | Category tags | Multiple genres per title |
| Release year | Metadata | Original release year |
| Netflix original | Badge | Whether it is a Netflix production |
| Description | Detail page | Synopsis text |
| Cast | Detail page | Actors and directors |
| Maturity rating | Metadata | Content rating (PG, R, etc.) |
| Duration | Metadata | Runtime or number of seasons |
| Available audio | Detail page | Language tracks |
| Available subtitles | Detail page | Subtitle languages |
| Match percentage | Browse page | Personalized recommendation score |
| Country availability | Requires multi-country scrape | Which countries have this title |
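For downstream processing it helps to fix a schema before scraping. Below is a minimal sketch of one catalog record matching the table above; the field names and the `normalize_year` helper are my own conventions, not anything Netflix exposes:

```python
from typing import List, Optional, TypedDict


class NetflixTitle(TypedDict, total=False):
    """One catalog entry; keys mirror the data-point table (names are illustrative)."""
    netflix_id: str
    name: str
    type: str                    # "movie" | "series" | "documentary"
    genres: List[str]
    release_year: Optional[int]
    is_original: bool
    description: str
    cast: List[str]
    maturity_rating: str
    duration: str                # runtime or "N Seasons"
    audio_languages: List[str]
    subtitle_languages: List[str]
    countries: List[str]         # filled in during multi-country comparison


def normalize_year(raw: str) -> Optional[int]:
    """Netflix metadata renders the year as text; coerce it defensively."""
    raw = raw.strip()
    return int(raw) if raw.isdigit() and len(raw) == 4 else None
```

Keeping every country's scrape in the same shape makes the cross-country comparison step a straightforward dictionary merge.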
Setting Up Your Environment
Netflix’s interface relies on heavy client-side rendering, so you need a real browser under automation:
pip install playwright beautifulsoup4
playwright install chromium
Python Code: Scraping Netflix Catalog with Proxies
Approach 1: Browse Page Scraping
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import json
import random
import logging
import time
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class NetflixCatalogScraper:
def __init__(self, proxy_map: dict):
"""
proxy_map: dict of country_code -> proxy_string
e.g., {"us": "user:pass@us-proxy:8080", "uk": "user:pass@uk-proxy:8080"}
"""
self.proxy_map = proxy_map
self.catalogs = {}
def parse_proxy(self, proxy_str: str) -> dict:
auth, server = proxy_str.rsplit("@", 1)
user, password = auth.split(":", 1)
return {
"server": f"http://{server}",
"username": user,
"password": password
}
async def scrape_country_catalog(self, country_code: str,
email: str, password: str,
max_genres: int = 20):
"""Scrape Netflix catalog for a specific country."""
proxy_str = self.proxy_map.get(country_code)
if not proxy_str:
logger.error(f"No proxy configured for {country_code}")
return
proxy = self.parse_proxy(proxy_str)
titles = []
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True,
proxy=proxy
)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
),
locale="en-US"
)
page = await context.new_page()
# Login to Netflix
logger.info(f"Logging into Netflix via {country_code} proxy")
await page.goto("https://www.netflix.com/login", wait_until="networkidle")
await page.wait_for_timeout(random.randint(2000, 4000))
# Fill login form
await page.fill('input[name="userLoginId"]', email)
await page.wait_for_timeout(random.randint(500, 1000))
await page.fill('input[name="password"]', password)
await page.wait_for_timeout(random.randint(500, 1000))
await page.click('button[type="submit"]')
await page.wait_for_timeout(random.randint(5000, 8000))
# Select profile if profile picker appears
profile_links = await page.query_selector_all("[class*='profile-link']")
if profile_links:
await profile_links[0].click()
await page.wait_for_timeout(random.randint(3000, 5000))
# Browse the catalog
logger.info(f"Browsing Netflix catalog from {country_code}")
await page.goto("https://www.netflix.com/browse", wait_until="networkidle")
await page.wait_for_timeout(random.randint(3000, 5000))
# Scroll to load more rows
for i in range(10):
await page.evaluate(f"window.scrollBy(0, {600 + i * 100})")
await page.wait_for_timeout(random.randint(1000, 2000))
html = await page.content()
browse_titles = self.parse_browse_page(html)
titles.extend(browse_titles)
logger.info(f"Found {len(browse_titles)} titles on browse page")
# Scrape genre-specific pages for more coverage
genre_ids = await self.get_genre_ids(page)
for genre_id in genre_ids[:max_genres]:
genre_url = f"https://www.netflix.com/browse/genre/{genre_id}"
try:
await page.goto(genre_url, wait_until="networkidle")
await page.wait_for_timeout(random.randint(2000, 4000))
# Scroll genre page
for i in range(5):
await page.evaluate(f"window.scrollBy(0, {400 + i * 100})")
await page.wait_for_timeout(random.randint(800, 1500))
html = await page.content()
genre_titles = self.parse_browse_page(html)
titles.extend(genre_titles)
logger.info(f"Genre {genre_id}: found {len(genre_titles)} titles")
except Exception as e:
logger.error(f"Genre {genre_id} failed: {e}")
await page.wait_for_timeout(random.randint(3000, 6000))
await browser.close()
# Deduplicate by title ID
seen = set()
unique_titles = []
for title in titles:
tid = title.get("netflix_id")
if tid and tid not in seen:
seen.add(tid)
title["country"] = country_code
unique_titles.append(title)
self.catalogs[country_code] = unique_titles
logger.info(f"Total unique titles for {country_code}: {len(unique_titles)}")
def parse_browse_page(self, html: str) -> list:
"""Extract titles from Netflix browse page HTML."""
soup = BeautifulSoup(html, "html.parser")
titles = []
# Netflix title cards
title_cards = soup.select(
"[class*='title-card'], [class*='slider-item'], "
"[class*='boxart-container']"
)
for card in title_cards:
title = {}
# Title name from aria-label or alt text
link = card.select_one("a[href*='/watch/'], a[href*='/title/']")
if link:
href = link.get("href", "")
# Extract Netflix ID from URL
for segment in href.split("/"):
if segment.isdigit():
title["netflix_id"] = segment
break
title["url"] = f"https://www.netflix.com{href}"
# Title from image alt or aria-label
img = card.select_one("img[alt]")
if img:
title["name"] = img.get("alt", "").strip()
title["image"] = img.get("src") or img.get("data-src")
# Fallback: aria-label on the card
aria = card.get("aria-label", "")
if aria and not title.get("name"):
title["name"] = aria
if title.get("name") or title.get("netflix_id"):
titles.append(title)
return titles
async def get_genre_ids(self, page) -> list:
"""Extract genre IDs from Netflix browse page."""
genre_ids = []
# Netflix genre links are often in navigation or embedded data
links = await page.query_selector_all("a[href*='/browse/genre/']")
for link in links:
href = await link.get_attribute("href")
if href and "/genre/" in href:
genre_id = href.split("/genre/")[1].split("?")[0].split("/")[0]
if genre_id.isdigit() and genre_id not in genre_ids:
genre_ids.append(genre_id)
# Common Netflix genre IDs as fallback
common_genres = [
"83", "6548", "8933", "1365", "7424", # Action, Comedy, Drama
"2243108", "52117", "11804", "31574", # Thriller, Sci-Fi, Anime
"5763", "6839", "8711", "7077", "99", # Horror, Documentary
"10118", "10673", "1492", "4370" # TV Shows, etc.
]
for gid in common_genres:
if gid not in genre_ids:
genre_ids.append(gid)
return genre_ids
async def scrape_title_detail(self, netflix_id: str, country_code: str,
email: str, password: str) -> dict:
"""Scrape detailed information for a specific title."""
proxy = self.parse_proxy(self.proxy_map[country_code])
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True, proxy=proxy)
context = await browser.new_context(
viewport={"width": 1920, "height": 1080},
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
)
page = await context.new_page()
# Login first (simplified -- reuse saved cookies in production)
await page.goto("https://www.netflix.com/login", wait_until="networkidle")
await page.fill('input[name="userLoginId"]', email)
await page.fill('input[name="password"]', password)
await page.click('button[type="submit"]')
await page.wait_for_timeout(5000)
# Navigate to title page
url = f"https://www.netflix.com/title/{netflix_id}"
await page.goto(url, wait_until="networkidle")
await page.wait_for_timeout(random.randint(3000, 5000))
html = await page.content()
detail = self.parse_title_detail(html, netflix_id)
await browser.close()
return detail
def parse_title_detail(self, html: str, netflix_id: str) -> dict:
"""Parse title detail page for comprehensive data."""
soup = BeautifulSoup(html, "html.parser")
detail = {"netflix_id": netflix_id}
# Title
title_el = soup.select_one("h1, [class*='title-title']")
if title_el:
detail["name"] = title_el.get_text(strip=True)
# Description
desc_el = soup.select_one("[class*='synopsis'], [class*='preview-modal-synopsis']")
if desc_el:
detail["description"] = desc_el.get_text(strip=True)
# Maturity rating
maturity_el = soup.select_one("[class*='maturity'], [class*='rating']")
if maturity_el:
detail["maturity_rating"] = maturity_el.get_text(strip=True)
# Year and duration
meta_items = soup.select("[class*='meta-item'], [class*='duration']")
for meta in meta_items:
text = meta.get_text(strip=True)
if text.isdigit() and len(text) == 4:
detail["year"] = text
elif "Season" in text or "Episode" in text:
detail["seasons"] = text
elif "h" in text or "min" in text:
detail["duration"] = text
# Genres
genre_els = soup.select("[class*='genre'], [class*='tag-item']")
detail["genres"] = [g.get_text(strip=True) for g in genre_els]
# Cast
cast_els = soup.select("[class*='cast'] a, [class*='creator'] a")
detail["cast"] = [c.get_text(strip=True) for c in cast_els]
return detail
def compare_catalogs(self) -> dict:
"""Compare catalogs across scraped countries."""
if len(self.catalogs) < 2:
return {}
all_titles = {}
for country, titles in self.catalogs.items():
for title in titles:
tid = title.get("netflix_id")
if tid:
if tid not in all_titles:
all_titles[tid] = {
"name": title.get("name"),
"countries": []
}
all_titles[tid]["countries"].append(country)
countries = list(self.catalogs.keys())
comparison = {
"total_unique_titles": len(all_titles),
"per_country": {
c: len(self.catalogs[c]) for c in countries
},
"available_everywhere": sum(
1 for t in all_titles.values()
if set(countries).issubset(set(t["countries"]))
),
"exclusive_titles": {}
}
for country in countries:
exclusive = [
t for t in all_titles.values()
if t["countries"] == [country]
]
comparison["exclusive_titles"][country] = len(exclusive)
return comparison
# Usage
if __name__ == "__main__":
proxy_map = {
"us": "user:pass@us-residential.proxy.com:8080",
"uk": "user:pass@uk-residential.proxy.com:8080",
"jp": "user:pass@jp-residential.proxy.com:8080",
"de": "user:pass@de-residential.proxy.com:8080",
}
scraper = NetflixCatalogScraper(proxy_map=proxy_map)
# Scrape US catalog
asyncio.run(scraper.scrape_country_catalog(
country_code="us",
email="your@email.com",
password="your_password",
max_genres=10
))
# Scrape UK catalog
asyncio.run(scraper.scrape_country_catalog(
country_code="uk",
email="your@email.com",
password="your_password",
max_genres=10
))
# Compare catalogs
comparison = scraper.compare_catalogs()
print(json.dumps(comparison, indent=2))
# Save results
with open("netflix_catalogs.json", "w") as f:
json.dump(scraper.catalogs, f, indent=2)
Approach 2: Using Netflix’s Shakti API
Netflix’s internal API (Shakti) returns structured JSON data. It requires valid authentication cookies:
import requests
import json
def query_netflix_api(auth_cookies: dict, proxy: str,
genre_id: int = 83, page: int = 0) -> dict:
"""Query Netflix's internal API for catalog data."""
# The Shakti API URL includes a build identifier that changes
# You need to extract this from the Netflix page source
build_id = "vPRELEASE" # Placeholder -- extract dynamically
url = (
f"https://www.netflix.com/api/shakti/{build_id}"
f"/pathEvaluator?withSize=true&materialize=true"
)
# Shakti uses a specific request format
body = {
"paths": [
["videos", genre_id, {"from": page * 50, "to": (page + 1) * 50 - 1},
["summary", "title", "synopsis", "maturity", "releaseYear",
"seasonCount", "episodeCount", "runtime", "userRating"]]
]
}
headers = {
"Content-Type": "application/json",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
try:
response = requests.post(
url,
json=body,
headers=headers,
cookies=auth_cookies,
proxies=proxies,
timeout=30
)
if response.status_code == 200:
return response.json()
except Exception as e:
print(f"Shakti API request failed: {e}")
return {}
Geo-Targeted Residential Proxies
Netflix has one of the most sophisticated VPN and proxy detection systems in the streaming industry. Here is what works and what does not:
What Works
- Premium residential proxies — IPs assigned to real home internet connections have the highest success rate
- ISP proxies — Static residential IPs from major ISPs pass Netflix’s detection
- Clean IP pools — Proxies with no history of Netflix abuse
What Gets Blocked
- Datacenter proxies — Blocked almost universally. Netflix maintains extensive datacenter IP databases.
- Known VPN IPs — IPs associated with VPN services are blocked
- Shared residential proxies — If too many Netflix users share the same IP, it gets flagged
- Free proxies — Never work with Netflix
Country Coverage
For comprehensive catalog comparison, you need proxies in each target country:
- Major markets: US, UK, Canada, Australia, Germany, France, Japan, South Korea
- Growing markets: India, Brazil, Mexico, Spain, Italy
- Smaller markets: Singapore, Thailand, Netherlands, Sweden
Verify your proxy’s detected country with our IP lookup tool before attempting Netflix scraping.
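A quick programmatic version of that check: before pointing the scraper at Netflix, confirm the proxy actually exits in the intended country. This sketch queries a public geo-IP endpoint (ip-api.com here; any equivalent service works) through the proxy:

```python
import requests


def parse_country(geo: dict) -> str:
    """ip-api.com returns the ISO country code under "countryCode"."""
    return geo.get("countryCode", "").lower()


def detected_country(proxy: str, timeout: int = 15) -> str:
    """Return the country the proxy's exit IP resolves to, e.g. "us".
    proxy is "user:pass@host:port", the same format as the proxy_map."""
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    resp = requests.get("http://ip-api.com/json", proxies=proxies, timeout=timeout)
    resp.raise_for_status()
    return parse_country(resp.json())
```

Run this once per entry in your `proxy_map` and abort the scrape for any country whose proxy resolves somewhere else; a mismatched exit country silently yields the wrong catalog.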
Netflix’s Detection Methods
Netflix invests heavily in proxy and VPN detection:
- IP database matching — Netflix maintains databases of known datacenter, VPN, and proxy IP ranges
- DNS leak detection — Checks if DNS requests match the expected ISP for the IP address
- WebRTC leak detection — Can detect your real IP through WebRTC in browsers
- Behavioral analysis — Unusual access patterns (rapid browsing, multiple profiles, frequent country changes) trigger reviews
- Payment method cross-referencing — Account payment country vs. access country mismatches may trigger restrictions
- ISP verification — Validates that the IP belongs to a residential ISP in the claimed country
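Some of these leaks can be mitigated on the client side. The sketch below builds Playwright launch options that pass Chromium's `--force-webrtc-ip-handling-policy` switch so WebRTC traffic stays on proxied routes; treat it as a mitigation, not a guarantee, and note the helper name is my own:

```python
def hardened_launch_options(proxy: dict) -> dict:
    """Options for p.chromium.launch(**options): proxy everything and ask
    Chromium not to expose the real IP over non-proxied UDP (WebRTC)."""
    return {
        "headless": True,
        "proxy": proxy,
        "args": [
            "--force-webrtc-ip-handling-policy=disable_non_proxied_udp",
        ],
    }
```

Usage: replace the bare `p.chromium.launch(headless=True, proxy=proxy)` calls in the scraper with `p.chromium.launch(**hardened_launch_options(proxy))`. DNS leak behavior depends on the proxy protocol and provider, so verify it separately.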
Troubleshooting
Problem: Netflix shows “You seem to be using an unblocker or proxy” error
- Your proxy IP is flagged by Netflix. Switch to a different residential proxy IP.
- Avoid datacenter and known VPN IPs entirely.
- Use ISP proxies (static residential) for the most reliable access.
Problem: Login succeeds but browse page is empty
- Netflix may be serving a restricted view. Verify the proxy IP country matches an active Netflix market.
- Wait longer after login for the browse page to fully render (Netflix uses heavy client-side rendering).
Problem: Title pages show “not available in your region”
- The title is genuinely not in the catalog for the proxy’s country. This is expected and is actually the data point you want to capture.
- Record this as a negative availability signal for that country.
Problem: Session expires quickly
- Netflix limits concurrent sessions. Ensure you are not logged in from too many simultaneous IPs.
- Maintain consistent proxy IPs within a session. Do not rotate proxies mid-session.
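One way to keep sessions stable is to log in once per country and reuse the saved cookies via Playwright's `storage_state` API (this is also what the "reuse saved cookies in production" comment in the scraper alludes to). A sketch, where the file layout is my own convention:

```python
from pathlib import Path

STATE_DIR = Path("netflix_state")


def state_path(country_code: str) -> Path:
    """One saved session file per country, so each proxy keeps its own login."""
    STATE_DIR.mkdir(exist_ok=True)
    return STATE_DIR / f"{country_code}.json"


async def save_session(context, country_code: str) -> None:
    """Persist cookies/localStorage after a successful login."""
    await context.storage_state(path=str(state_path(country_code)))


async def restore_context(browser, country_code: str):
    """Create a context from the saved session if present, fresh otherwise."""
    path = state_path(country_code)
    if path.exists():
        return await browser.new_context(storage_state=str(path))
    return await browser.new_context()
```

Call `save_session(context, country_code)` right after the profile selection step, and swap `browser.new_context(...)` for `restore_context(browser, country_code)` on subsequent runs to skip the login flow entirely.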
Problem: Getting rate limited on genre pages
- Add 5-10 second delays between genre page loads.
- Limit your scraping to 20-30 genres per session.
- Spread multi-country scraping across different time windows.
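The delay advice above can be wrapped in a small helper so every genre-page load goes through the same jittered throttle (defaults match the 5-10 second window suggested here):

```python
import random
import time


def polite_sleep(base: float = 5.0, spread: float = 5.0) -> float:
    """Sleep between base and base+spread seconds, with jitter so request
    spacing is never perfectly regular. Returns the delay actually used."""
    delay = base + random.uniform(0, spread)
    time.sleep(delay)
    return delay
```

Drop a `polite_sleep()` call (or its async equivalent with `asyncio.sleep`) between `page.goto(genre_url, ...)` iterations instead of relying solely on the shorter `wait_for_timeout` pauses.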
Estimate bandwidth and proxy costs with our proxy cost calculator.
Legal and Ethical Considerations
Netflix catalog scraping raises specific legal concerns:
- Netflix Terms of Use — Explicitly prohibit scraping, crawling, and automated access. Netflix has the resources and motivation to enforce these terms.
- DMCA implications — Circumventing Netflix’s geographic access controls could implicate the Digital Millennium Copyright Act’s anti-circumvention provisions.
- Copyright — Netflix’s catalog metadata (descriptions, images, ratings) is copyrighted content. Republishing this data may infringe on Netflix’s rights.
- Account Terms — Using a Netflix account for automated scraping violates the subscriber agreement and can result in account termination.
- Data licensing — Services like JustWatch, Reelgood, and uNoGS provide licensed Netflix catalog data. These are the legally proper sources for commercial applications.
- Academic use — Academic researchers may have stronger fair use arguments, but should still consult with their institution’s legal counsel.
For commercial catalog comparison products, consider licensing data from established providers rather than scraping Netflix directly.
Third-Party Alternatives
Before building a Netflix scraper, consider these data sources:
- uNoGS (unofficial Netflix Online Global Search) — Provides catalog data across countries. May use their own scraping infrastructure.
- JustWatch — Licensed streaming availability data across Netflix and other services
- Reelgood — Aggregated streaming catalog data
- TMDB (The Movie Database) — Community-maintained movie and TV data with streaming provider links
- Netflix Media Center — Official press releases about new content additions
These alternatives may be more sustainable and legally sound for production applications.
Conclusion
Scraping Netflix catalog data requires premium residential proxies from each target country, a valid Netflix subscription, and headless browser automation to handle Netflix’s JavaScript-heavy interface. The primary value lies in cross-country catalog comparison, which requires simultaneous proxy access in multiple regions. Given Netflix’s aggressive proxy detection and legal posture, carefully evaluate whether scraping is necessary or if third-party data providers can meet your needs. For research purposes, start with a small number of countries and genres, validate your approach, and expand gradually.
Related Reading
- How to Scrape Airbnb Listings with Proxies in 2026
- How to Scrape Facebook Marketplace with Proxies in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix