How to Scrape Netflix Catalog Data with Proxies in 2026

Netflix operates the world’s largest streaming library, but its catalog varies dramatically by country. A title available in the US might not be available in the UK, Japan, or Brazil. For content researchers, media analysts, VPN reviewers, and entertainment industry professionals, scraping Netflix catalog data across multiple countries provides unique insights into content licensing, regional strategy, and library composition.

This guide covers how to scrape Netflix catalog data using Python with geo-targeted residential proxies to compare libraries across regions.

Why Scrape Netflix Catalog Data?

Netflix catalog analysis serves several industries:

  • Content licensing intelligence — Track which titles are available in which countries to understand licensing deals
  • Media research — Analyze Netflix’s content strategy by region, genre distribution, and content investment
  • VPN service reviews — Provide accurate country-specific library comparisons for VPN review content
  • Entertainment journalism — Report on new additions and removals from Netflix libraries worldwide
  • Academic research — Study content localization, cultural preferences, and streaming market dynamics
  • Competitive analysis — Compare Netflix’s regional libraries against Disney+, Amazon Prime, and other services
  • Content gap analysis — Identify titles available in other regions but not in your target market

Netflix’s Geo-Specific Catalogs

Netflix maintains different catalogs for almost every country it operates in. Key differences include:

  • Title availability — A movie or series may be in the US catalog but absent from the UK catalog due to licensing
  • Content volume — The US library typically has 5,000+ titles, while smaller markets may have 2,000-3,000
  • Original vs. licensed — Netflix Originals are generally available worldwide, while licensed content varies
  • Pricing — Subscription costs differ by country
  • Local originals — Region-specific original content (Korean dramas, Spanish series, etc.)
  • Language options — Available audio tracks and subtitle languages vary by region

Why You Need Proxies

Netflix determines your location based on your IP address and serves the catalog for that region. Without proxies:

  • You can only see the catalog for your own country
  • Netflix blocks known VPN and proxy IPs aggressively
  • Datacenter IPs are blocked almost universally
  • Multiple rapid requests from one IP trigger rate limiting

To see the catalog for a different country, you need a residential proxy IP from that country.

Data Points to Extract

Data Point | Source | Notes
Title | Browse page / detail | Movie or series name
Type | Metadata | Movie, series, documentary
Genre | Category tags | Multiple genres per title
Release year | Metadata | Original release year
Netflix original | Badge | Whether it is a Netflix production
Description | Detail page | Synopsis text
Cast | Detail page | Actors and directors
Maturity rating | Metadata | Content rating (PG, R, etc.)
Duration | Metadata | Runtime or number of seasons
Available audio | Detail page | Language tracks
Available subtitles | Detail page | Subtitle languages
Match percentage | Browse page | Personalized recommendation score
Country availability | Requires multi-country scrape | Which countries have this title

Setting Up Your Environment

Netflix's interface is rendered almost entirely client-side, so plain HTTP requests return little useful HTML. You need a headless browser; install Playwright and BeautifulSoup:

pip install playwright beautifulsoup4
playwright install chromium

Python Code: Scraping Netflix Catalog with Proxies

Approach 1: Browse Page Scraping

import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import json
import random
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class NetflixCatalogScraper:
    def __init__(self, proxy_map: dict):
        """
        proxy_map: dict of country_code -> proxy_string
        e.g., {"us": "user:pass@us-proxy:8080", "uk": "user:pass@uk-proxy:8080"}
        """
        self.proxy_map = proxy_map
        self.catalogs = {}

    def parse_proxy(self, proxy_str: str) -> dict:
        auth, server = proxy_str.rsplit("@", 1)
        user, password = auth.split(":", 1)
        return {
            "server": f"http://{server}",
            "username": user,
            "password": password
        }

    async def scrape_country_catalog(self, country_code: str,
                                      email: str, password: str,
                                      max_genres: int = 20):
        """Scrape Netflix catalog for a specific country."""
        proxy_str = self.proxy_map.get(country_code)
        if not proxy_str:
            logger.error(f"No proxy configured for {country_code}")
            return

        proxy = self.parse_proxy(proxy_str)
        titles = []

        async with async_playwright() as p:
            browser = await p.chromium.launch(
                headless=True,
                proxy=proxy
            )
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                ),
                locale="en-US"
            )
            page = await context.new_page()

            # Login to Netflix
            logger.info(f"Logging into Netflix via {country_code} proxy")
            await page.goto("https://www.netflix.com/login", wait_until="networkidle")
            await page.wait_for_timeout(random.randint(2000, 4000))

            # Fill login form
            await page.fill('input[name="userLoginId"]', email)
            await page.wait_for_timeout(random.randint(500, 1000))
            await page.fill('input[name="password"]', password)
            await page.wait_for_timeout(random.randint(500, 1000))
            await page.click('button[type="submit"]')
            await page.wait_for_timeout(random.randint(5000, 8000))

            # Select profile if profile picker appears
            profile_links = await page.query_selector_all("[class*='profile-link']")
            if profile_links:
                await profile_links[0].click()
                await page.wait_for_timeout(random.randint(3000, 5000))

            # Browse the catalog
            logger.info(f"Browsing Netflix catalog from {country_code}")
            await page.goto("https://www.netflix.com/browse", wait_until="networkidle")
            await page.wait_for_timeout(random.randint(3000, 5000))

            # Scroll to load more rows
            for i in range(10):
                await page.evaluate(f"window.scrollBy(0, {600 + i * 100})")
                await page.wait_for_timeout(random.randint(1000, 2000))

            html = await page.content()
            browse_titles = self.parse_browse_page(html)
            titles.extend(browse_titles)
            logger.info(f"Found {len(browse_titles)} titles on browse page")

            # Scrape genre-specific pages for more coverage
            genre_ids = await self.get_genre_ids(page)
            for genre_id in genre_ids[:max_genres]:
                genre_url = f"https://www.netflix.com/browse/genre/{genre_id}"
                try:
                    await page.goto(genre_url, wait_until="networkidle")
                    await page.wait_for_timeout(random.randint(2000, 4000))

                    # Scroll genre page
                    for i in range(5):
                        await page.evaluate(f"window.scrollBy(0, {400 + i * 100})")
                        await page.wait_for_timeout(random.randint(800, 1500))

                    html = await page.content()
                    genre_titles = self.parse_browse_page(html)
                    titles.extend(genre_titles)
                    logger.info(f"Genre {genre_id}: found {len(genre_titles)} titles")

                except Exception as e:
                    logger.error(f"Genre {genre_id} failed: {e}")

                await page.wait_for_timeout(random.randint(3000, 6000))

            await browser.close()

        # Deduplicate by title ID
        seen = set()
        unique_titles = []
        for title in titles:
            tid = title.get("netflix_id")
            if tid and tid not in seen:
                seen.add(tid)
                title["country"] = country_code
                unique_titles.append(title)

        self.catalogs[country_code] = unique_titles
        logger.info(f"Total unique titles for {country_code}: {len(unique_titles)}")

    def parse_browse_page(self, html: str) -> list:
        """Extract titles from Netflix browse page HTML."""
        soup = BeautifulSoup(html, "html.parser")
        titles = []

        # Netflix title cards
        title_cards = soup.select(
            "[class*='title-card'], [class*='slider-item'], "
            "[class*='boxart-container']"
        )

        for card in title_cards:
            title = {}

            # Title name from aria-label or alt text
            link = card.select_one("a[href*='/watch/'], a[href*='/title/']")
            if link:
                href = link.get("href", "")
                # Extract Netflix ID from URL
                for segment in href.split("/"):
                    if segment.isdigit():
                        title["netflix_id"] = segment
                        break
                title["url"] = f"https://www.netflix.com{href}"

            # Title from image alt or aria-label
            img = card.select_one("img[alt]")
            if img:
                title["name"] = img.get("alt", "").strip()
                title["image"] = img.get("src") or img.get("data-src")

            # Fallback: aria-label on the card
            aria = card.get("aria-label", "")
            if aria and not title.get("name"):
                title["name"] = aria

            if title.get("name") or title.get("netflix_id"):
                titles.append(title)

        return titles

    async def get_genre_ids(self, page) -> list:
        """Extract genre IDs from Netflix browse page."""
        genre_ids = []

        # Netflix genre links are often in navigation or embedded data
        links = await page.query_selector_all("a[href*='/browse/genre/']")
        for link in links:
            href = await link.get_attribute("href")
            if href and "/genre/" in href:
                genre_id = href.split("/genre/")[1].split("?")[0].split("/")[0]
                if genre_id.isdigit() and genre_id not in genre_ids:
                    genre_ids.append(genre_id)

        # Common Netflix genre IDs as fallback
        common_genres = [
            "83", "6548", "8933", "1365", "7424",   # Action, Comedy, Drama
            "2243108", "52117", "11804", "31574",     # Thriller, Sci-Fi, Anime
            "5763", "6839", "8711", "7077", "99",     # Horror, Documentary
            "10118", "10673", "1492", "4370"           # TV Shows, etc.
        ]

        for gid in common_genres:
            if gid not in genre_ids:
                genre_ids.append(gid)

        return genre_ids

    async def scrape_title_detail(self, netflix_id: str, country_code: str,
                                   email: str, password: str) -> dict:
        """Scrape detailed information for a specific title."""
        proxy = self.parse_proxy(self.proxy_map[country_code])

        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True, proxy=proxy)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                )
            )
            page = await context.new_page()

            # Login first (simplified -- reuse saved cookies in production)
            await page.goto("https://www.netflix.com/login", wait_until="networkidle")
            await page.fill('input[name="userLoginId"]', email)
            await page.fill('input[name="password"]', password)
            await page.click('button[type="submit"]')
            await page.wait_for_timeout(5000)

            # Navigate to title page
            url = f"https://www.netflix.com/title/{netflix_id}"
            await page.goto(url, wait_until="networkidle")
            await page.wait_for_timeout(random.randint(3000, 5000))

            html = await page.content()
            detail = self.parse_title_detail(html, netflix_id)

            await browser.close()
            return detail

    def parse_title_detail(self, html: str, netflix_id: str) -> dict:
        """Parse title detail page for comprehensive data."""
        soup = BeautifulSoup(html, "html.parser")
        detail = {"netflix_id": netflix_id}

        # Title
        title_el = soup.select_one("h1, [class*='title-title']")
        if title_el:
            detail["name"] = title_el.get_text(strip=True)

        # Description
        desc_el = soup.select_one("[class*='synopsis'], [class*='preview-modal-synopsis']")
        if desc_el:
            detail["description"] = desc_el.get_text(strip=True)

        # Maturity rating
        maturity_el = soup.select_one("[class*='maturity'], [class*='rating']")
        if maturity_el:
            detail["maturity_rating"] = maturity_el.get_text(strip=True)

        # Year and duration
        meta_items = soup.select("[class*='meta-item'], [class*='duration']")
        for meta in meta_items:
            text = meta.get_text(strip=True)
            if text.isdigit() and len(text) == 4:
                detail["year"] = text
            elif "Season" in text or "Episode" in text:
                detail["seasons"] = text
            elif "h" in text or "min" in text:
                detail["duration"] = text

        # Genres
        genre_els = soup.select("[class*='genre'], [class*='tag-item']")
        detail["genres"] = [g.get_text(strip=True) for g in genre_els]

        # Cast
        cast_els = soup.select("[class*='cast'] a, [class*='creator'] a")
        detail["cast"] = [c.get_text(strip=True) for c in cast_els]

        return detail

    def compare_catalogs(self) -> dict:
        """Compare catalogs across scraped countries."""
        if len(self.catalogs) < 2:
            return {}

        all_titles = {}
        for country, titles in self.catalogs.items():
            for title in titles:
                tid = title.get("netflix_id")
                if tid:
                    if tid not in all_titles:
                        all_titles[tid] = {
                            "name": title.get("name"),
                            "countries": []
                        }
                    all_titles[tid]["countries"].append(country)

        countries = list(self.catalogs.keys())
        comparison = {
            "total_unique_titles": len(all_titles),
            "per_country": {
                c: len(self.catalogs[c]) for c in countries
            },
            "available_everywhere": sum(
                1 for t in all_titles.values()
                if set(countries).issubset(set(t["countries"]))
            ),
            "exclusive_titles": {}
        }

        for country in countries:
            exclusive = [
                t for t in all_titles.values()
                if t["countries"] == [country]
            ]
            comparison["exclusive_titles"][country] = len(exclusive)

        return comparison


# Usage
if __name__ == "__main__":
    proxy_map = {
        "us": "user:pass@us-residential.proxy.com:8080",
        "uk": "user:pass@uk-residential.proxy.com:8080",
        "jp": "user:pass@jp-residential.proxy.com:8080",
        "de": "user:pass@de-residential.proxy.com:8080",
    }

    scraper = NetflixCatalogScraper(proxy_map=proxy_map)

    # Scrape US catalog
    asyncio.run(scraper.scrape_country_catalog(
        country_code="us",
        email="your@email.com",
        password="your_password",
        max_genres=10
    ))

    # Scrape UK catalog
    asyncio.run(scraper.scrape_country_catalog(
        country_code="uk",
        email="your@email.com",
        password="your_password",
        max_genres=10
    ))

    # Compare catalogs
    comparison = scraper.compare_catalogs()
    print(json.dumps(comparison, indent=2))

    # Save results
    with open("netflix_catalogs.json", "w") as f:
        json.dump(scraper.catalogs, f, indent=2)
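The login comment in scrape_title_detail mentions reusing saved cookies in production. A minimal sketch of that with Playwright's storage_state mechanism; the netflix_state_{cc}.json file naming is our own convention, not a Netflix or Playwright requirement:

```python
# Persist and reuse per-country Netflix sessions so each run does not
# need a fresh login. storage_state saves cookies plus localStorage.
import os

def state_path(country_code: str) -> str:
    """Session file per country, so each proxy keeps its own cookies."""
    return f"netflix_state_{country_code}.json"

async def open_browse_page(p, proxy: dict, country_code: str):
    """Open /browse, restoring a saved session when one exists.

    `p` is the object yielded by async_playwright(), as in the scraper above.
    """
    browser = await p.chromium.launch(headless=True, proxy=proxy)
    path = state_path(country_code)
    context = await browser.new_context(
        storage_state=path if os.path.exists(path) else None
    )
    page = await context.new_page()
    await page.goto("https://www.netflix.com/browse", wait_until="networkidle")
    # After confirming the session is logged in, persist it for the next run:
    await context.storage_state(path=path)
    return browser, context, page
```

Reusing sessions this way also reduces the number of logins Netflix sees from each proxy IP, which helps with the behavioral checks discussed later.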

Approach 2: Using Netflix’s Shakti API

Netflix’s internal API (Shakti) returns structured JSON data. It requires valid authentication cookies:

import requests
import json

def query_netflix_api(auth_cookies: dict, proxy: str,
                      genre_id: int = 83, page: int = 0) -> dict:
    """Query Netflix's internal API for catalog data."""
    # The Shakti API URL includes a build identifier that changes
    # You need to extract this from the Netflix page source
    build_id = "vPRELEASE"  # Placeholder -- extract dynamically

    url = (
        f"https://www.netflix.com/api/shakti/{build_id}"
        f"/pathEvaluator?withSize=true&materialize=true"
    )

    # Shakti uses a specific request format
    body = {
        "paths": [
            ["videos", genre_id, {"from": page * 50, "to": (page + 1) * 50 - 1},
             ["summary", "title", "synopsis", "maturity", "releaseYear",
              "seasonCount", "episodeCount", "runtime", "userRating"]]
        ]
    }

    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    }

    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}

    try:
        response = requests.post(
            url,
            json=body,
            headers=headers,
            cookies=auth_cookies,
            proxies=proxies,
            timeout=30
        )
        if response.status_code == 200:
            return response.json()
    except Exception as e:
        print(f"Shakti API request failed: {e}")

    return {}
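The build_id placeholder in the snippet above has to come from the live page source. A hedged sketch of extracting it: the "BUILD_IDENTIFIER" key is what Netflix's embedded page JSON has historically contained, but treat the key name as an assumption that can change without notice:

```python
# Pull the Shakti build identifier out of a logged-in Netflix page's HTML.
# It has historically appeared in embedded JSON as "BUILD_IDENTIFIER":"...".
import re
from typing import Optional

def extract_build_id(html: str) -> Optional[str]:
    """Return the Shakti build ID embedded in page HTML, or None if absent."""
    match = re.search(r'"BUILD_IDENTIFIER":"([^"]+)"', html)
    return match.group(1) if match else None
```

Fetch any logged-in Netflix page through your proxy, run its HTML through this function, and substitute the result for the vPRELEASE placeholder.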

Geo-Targeted Residential Proxies

Netflix has one of the most sophisticated VPN and proxy detection systems in the streaming industry. Here is what works and what does not:

What Works

  • Premium residential proxies — IPs assigned to real home internet connections have the highest success rate
  • ISP proxies — Static residential IPs from major ISPs pass Netflix’s detection
  • Clean IP pools — Proxies with no history of Netflix abuse

What Gets Blocked

  • Datacenter proxies — Blocked almost universally. Netflix maintains extensive datacenter IP databases.
  • Known VPN IPs — IPs associated with VPN services are blocked
  • Shared residential proxies — If too many Netflix users share the same IP, it gets flagged
  • Free proxies — Never work with Netflix

Country Coverage

For comprehensive catalog comparison, you need proxies in each target country:

  • Major markets: US, UK, Canada, Australia, Germany, France, Japan, South Korea
  • Growing markets: India, Brazil, Mexico, Spain, Italy
  • Smaller markets: Singapore, Thailand, Netherlands, Sweden

Verify your proxy’s detected country with our IP lookup tool before attempting Netflix scraping.
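A quick pre-flight check can confirm where a proxy actually exits before you spend a session on it. This sketch uses the free ip-api.com JSON endpoint as one assumed geolocation source; any IP lookup service, including the one linked above, works the same way:

```python
# Check that a proxy's exit IP geolocates to the expected country.
# Proxy strings follow the user:pass@host:port format used in this guide.
import json
import urllib.request

def proxy_url(proxy_str: str) -> str:
    """Normalize user:pass@host:port into a full http:// proxy URL."""
    return proxy_str if proxy_str.startswith("http") else f"http://{proxy_str}"

def check_proxy_country(proxy_str: str, expected_cc: str,
                        timeout: int = 15) -> bool:
    """Return True if the proxy's exit IP geolocates to expected_cc."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({
            "http": proxy_url(proxy_str),
            "https": proxy_url(proxy_str),
        })
    )
    with opener.open("http://ip-api.com/json/", timeout=timeout) as resp:
        data = json.load(resp)
    return data.get("countryCode", "").lower() == expected_cc.lower()

if __name__ == "__main__":
    ok = check_proxy_country("user:pass@us-residential.proxy.com:8080", "us")
    print("Proxy country matches:", ok)
```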

Netflix’s Detection Methods

Netflix invests heavily in proxy and VPN detection:

  1. IP database matching — Netflix maintains databases of known datacenter, VPN, and proxy IP ranges
  2. DNS leak detection — Checks if DNS requests match the expected ISP for the IP address
  3. WebRTC leak detection — Can detect your real IP through WebRTC in browsers
  4. Behavioral analysis — Unusual access patterns (rapid browsing, multiple profiles, frequent country changes) trigger reviews
  5. Payment method cross-referencing — Account payment country vs. access country mismatches may trigger restrictions
  6. ISP verification — Validates that the IP belongs to a residential ISP in the claimed country
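For the WebRTC leak vector specifically, Chromium exposes a launch switch that confines WebRTC traffic to the proxied connection. A minimal sketch for the Playwright setup used earlier; whether this alone satisfies Netflix's checks is not guaranteed:

```python
# Chromium switch that stops WebRTC from sending UDP traffic outside the
# proxy, which is the usual source of real-IP leaks in automated browsers.
WEBRTC_ARGS = ["--force-webrtc-ip-handling-policy=disable_non_proxied_udp"]

async def launch_hardened(p, proxy: dict):
    """Launch Chromium with the proxy plus the WebRTC hardening switch.

    `p` is the object yielded by async_playwright(), as in the scraper above.
    """
    return await p.chromium.launch(headless=True, proxy=proxy,
                                   args=WEBRTC_ARGS)
```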

Troubleshooting

Problem: Netflix shows “You seem to be using an unblocker or proxy” error

  • Your proxy IP is flagged by Netflix. Switch to a different residential proxy IP.
  • Avoid datacenter and known VPN IPs entirely.
  • Use ISP proxies (static residential) for the most reliable access.

Problem: Login succeeds but browse page is empty

  • Netflix may be serving a restricted view. Verify the proxy IP country matches an active Netflix market.
  • Wait longer after login for the browse page to fully render (Netflix uses heavy client-side rendering).

Problem: Title pages show “not available in your region”

  • The title is genuinely not in the catalog for the proxy’s country. This is expected and is actually the data point you want to capture.
  • Record this as a negative availability signal for that country.

Problem: Session expires quickly

  • Netflix limits concurrent sessions. Ensure you are not logged in from too many simultaneous IPs.
  • Maintain consistent proxy IPs within a session. Do not rotate proxies mid-session.

Problem: Getting rate limited on genre pages

  • Add 5-10 second delays between genre page loads.
  • Limit your scraping to 20-30 genres per session.
  • Spread multi-country scraping across different time windows.
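The delay guidance above can be wrapped in a small helper so every genre-page loop paces itself the same way. The defaults reflect the 5-10 second range suggested here, with jitter so requests never arrive on a fixed cadence:

```python
import random
import time

def polite_delay(base: float = 5.0, jitter: float = 5.0) -> float:
    """Sleep for a randomized interval between base and base + jitter seconds."""
    delay = base + random.uniform(0.0, jitter)
    time.sleep(delay)
    return delay
```

Call polite_delay() between page.goto calls on genre pages in place of a fixed wait_for_timeout.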

Estimate bandwidth and proxy costs with our proxy cost calculator.

Legal and Ethical Considerations

Netflix catalog scraping raises specific legal concerns:

  • Netflix Terms of Use — Explicitly prohibit scraping, crawling, and automated access. Netflix has the resources and motivation to enforce these terms.
  • DMCA implications — Circumventing Netflix’s geographic access controls could implicate the Digital Millennium Copyright Act’s anti-circumvention provisions.
  • Copyright — Netflix’s catalog metadata (descriptions, images, ratings) is copyrighted content. Republishing this data may infringe on Netflix’s rights.
  • Account Terms — Using a Netflix account for automated scraping violates the subscriber agreement and can result in account termination.
  • Data licensing — Services like JustWatch, Reelgood, and uNoGS provide licensed Netflix catalog data. These are the legally proper sources for commercial applications.
  • Academic use — Academic researchers may have stronger fair use arguments, but should still consult with their institution’s legal counsel.

For commercial catalog comparison products, consider licensing data from established providers rather than scraping Netflix directly.

Third-Party Alternatives

Before building a Netflix scraper, consider these data sources:

  • uNoGS (unofficial Netflix Online Global Search) — Provides catalog data across countries. May use their own scraping infrastructure.
  • JustWatch — Licensed streaming availability data across Netflix and other services
  • Reelgood — Aggregated streaming catalog data
  • TMDB (The Movie Database) — Community-maintained movie and TV data with streaming provider links
  • Netflix Media Center — Official press releases about new content additions

These alternatives may be more sustainable and legally sound for production applications.

Conclusion

Scraping Netflix catalog data requires premium residential proxies from each target country, a valid Netflix subscription, and headless browser automation to handle Netflix’s JavaScript-heavy interface. The primary value lies in cross-country catalog comparison, which requires simultaneous proxy access in multiple regions. Given Netflix’s aggressive proxy detection and legal posture, carefully evaluate whether scraping is necessary or if third-party data providers can meet your needs. For research purposes, start with a small number of countries and genres, validate your approach, and expand gradually.

