How to scrape Coupang Korea: a practical 2026 guide

Scrape Coupang Korea correctly in 2026 and you have access to the dominant ecommerce platform in one of the most digitally mature markets in the world. Coupang serves over 22 million active customers in South Korea, runs the country’s largest same-day delivery network (Rocket Delivery), and indexes hundreds of millions of SKUs ranging from groceries to electronics to fashion. For brand managers tracking pricing, agencies running competitive intelligence, or product teams sizing demand, Coupang is non-negotiable.

This guide covers Coupang Korea scraping end-to-end: which endpoints to hit, how to handle the bot defenses Coupang has stacked since their NYSE listing, how to deal with Korean character encoding and KRW pricing nuances, and how to manage Korean mobile carrier proxies. Working Python code throughout.

What Coupang Korea exposes

Three surfaces matter:

Surface | URL pattern | Best for
Product detail page | coupang.com/vp/products/{product_id} | Full extraction with reviews
Internal API | coupang.com/vp/products/{product_id}/items/{item_id}/vendor-items | Variant-level data
Search results | coupang.com/np/search?q={query} | Discovery

Coupang’s structure is more nested than Lazada or Shopee. A “product” can have multiple “items” (variants), each with multiple “vendor items” (different sellers offering the same item). For complete competitive intelligence, you need vendor-item-level data, not just product-level.
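To make the nesting concrete, here is a minimal sketch of the product, item, and vendor-item hierarchy as Python dataclasses. The class and field names are illustrative assumptions, not Coupang's own schema.

```python
from dataclasses import dataclass, field


@dataclass
class VendorItem:
    """One seller's offer for a specific variant (hypothetical fields)."""
    vendor_item_id: int
    vendor_name: str
    price_krw: int


@dataclass
class Item:
    """One variant of a product, e.g. a colour or size."""
    item_id: int
    vendor_items: list[VendorItem] = field(default_factory=list)


@dataclass
class Product:
    product_id: int
    title: str
    items: list[Item] = field(default_factory=list)

    def all_offers(self) -> list[tuple[int, VendorItem]]:
        """Flatten the tree into (item_id, vendor_item) pairs for price comparison."""
        return [(item.item_id, v) for item in self.items for v in item.vendor_items]


product = Product(
    product_id=1,
    title="example",
    items=[
        Item(10, [VendorItem(100, "Seller A", 12000), VendorItem(101, "Seller B", 12500)]),
        Item(11, [VendorItem(102, "Seller A", 15000)]),
    ],
)
print(len(product.all_offers()))
```

Flattening to offer pairs is what makes vendor-level comparison possible; a product-level record alone would hide the per-seller prices.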

Anti-bot defenses

Coupang uses a custom bot defense stack assembled by the Coupang security team:

  1. Cloudflare protection on the public web pages
  2. Aggressive IP reputation scoring; data center IPs are heavily challenged
  3. Custom JavaScript challenges that defeat headless Chromium with default settings
  4. Header-based fingerprinting (specific Accept-Language and User-Agent combinations expected)

The recommended path in 2026: Korean mobile carrier IPs (KT, SK Telecom, LG U+), real Chromium driven through CDP with a Korean locale, and patient throttling.

Working browser-based scraper

import asyncio
import json
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

async def scrape_coupang_kr(product_url: str, proxy: dict | None = None) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy,
            args=["--disable-blink-features=AutomationControlled"],
        )
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Mobile/15E148 Safari/604.1",
            locale="ko-KR",
            timezone_id="Asia/Seoul",
            extra_http_headers={"Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7"},
            viewport={"width": 390, "height": 844},
        )
        page = await ctx.new_page()
        await page.goto(product_url, wait_until="networkidle", timeout=45000)
        html = await page.content()
        await browser.close()

    return _parse_coupang_html(html)

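import re

def _parse_coupang_html(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")

    title = soup.select_one("h2.prod-buy-header__title")
    price_el = soup.select_one(".total-price strong")
    original_price = soup.select_one(".price-amount.origin-price")
    rating_el = soup.select_one(".rating-star-num")
    review_count_el = soup.select_one(".count")
    stock_el = soup.select_one(".out-of-stock")

    return {
        "title": title.text.strip() if title else None,
        "price_krw": _parse_krw(price_el.text) if price_el else None,
        "original_price_krw": _parse_krw(original_price.text) if original_price else None,
        "rating": _parse_rating(rating_el.get("style", "")) if rating_el else None,
        "review_count": _parse_count(review_count_el.text) if review_count_el else None,
        # Absence of the out-of-stock marker is treated as "in stock"
        "in_stock": stock_el is None,
    }

def _parse_rating(style: str) -> float | None:
    # The star widget encodes the rating as a CSS width percentage: 100% = 5 stars
    match = re.search(r"width:\s*([\d.]+)%", style)
    return float(match.group(1)) / 20 if match else None

def _parse_count(text: str) -> int | None:
    # Keep digits only, so "(1,234)" and similar variants both parse
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None

def _parse_krw(text: str) -> int | None:
    # KRW has no fractional unit, so keep prices as integers; None means "no digits found"
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None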

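if __name__ == "__main__":
    result = asyncio.run(scrape_coupang_kr("https://www.coupang.com/vp/products/1234567890"))
    # ensure_ascii=False keeps Hangul titles readable in the output
    print(json.dumps(result, ensure_ascii=False, indent=2))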

Mobile user agent matters more here than on most sites. Coupang serves a more API-friendly (smaller, JSON-heavy) version to mobile clients.

Capturing the internal product API

Coupang’s product detail page makes several internal API calls. Intercepting them gives cleaner JSON than parsing HTML.

async def scrape_with_api_capture(url: str, proxy: dict | None = None) -> dict:
    api_payloads = {}
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True, proxy=proxy)
        ctx = await browser.new_context(locale="ko-KR")
        page = await ctx.new_page()

        async def handler(resp):
            if "/vp/products/" in resp.url and resp.status == 200:
                try:
                    if "application/json" in resp.headers.get("content-type", ""):
                        api_payloads[resp.url] = await resp.json()
                except Exception:
                    pass

        page.on("response", handler)
        await page.goto(url, wait_until="networkidle", timeout=45000)
        await asyncio.sleep(2)
        await browser.close()
    return api_payloads

The intercepted payloads include the structured price, stock, vendor, and review data without you having to parse HTML.

Korean Won price handling

Korean Won uses the symbol ₩ and is whole-number (no fractional units). Prices appear as “1,234,500원” or “₩1,234,500”. Strip everything that is not a digit to parse:

import re

def parse_krw(s: str) -> int:
    # KRW has no fractional unit, so return a whole number of won
    return int(re.sub(r"\D", "", s) or 0)

Exchange rates fluctuate; a rough 2026 figure is around 1,400 KRW per USD. Always store the raw KRW value and convert only for display.
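A small sketch of the "store KRW, convert for display" rule follows. The function name is hypothetical, and the 1,400 rate is only the rough figure quoted above; in practice you would pass in a current rate.

```python
from decimal import Decimal, ROUND_HALF_UP


def krw_to_usd_display(price_krw: int, krw_per_usd: Decimal) -> str:
    """Format a stored KRW price as a USD string; the stored value is never modified."""
    usd = (Decimal(price_krw) / krw_per_usd).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return f"${usd:,.2f}"


stored_price_krw = 1_234_500            # what goes into the database
rate = Decimal("1400")                  # assumption: rough rate, replace with a live one
print(krw_to_usd_display(stored_price_krw, rate))
```

Using Decimal rather than float avoids rounding surprises when the same price is reformatted under different rates.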

Korean character handling

Korean uses Hangul (한글) which is well-supported by UTF-8. Two specific gotchas:

First, Hangul has both completed syllable blocks (가, 나) and decomposed forms (Jamo). Coupang uses completed forms. Make sure your storage layer normalizes via NFC.

import unicodedata

def normalize_korean(s: str) -> str:
    return unicodedata.normalize("NFC", s)

Second, product titles often mix Hangul, Latin (brand names like Samsung, LG, Apple), and CJK ideographs (some traditional terms). Storage and indexing should support all three.
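Both gotchas can be checked with the standard library alone. The sketch below shows what NFC does to a decomposed string and classifies the scripts in a title; the helper names and the Unicode ranges chosen (Hangul syllables and Jamo, the basic CJK Unified Ideographs block, ASCII letters) are simplifying assumptions, not a complete script detector.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Compose decomposed Jamo sequences into precomposed Hangul syllables."""
    return unicodedata.normalize("NFC", text)


def scripts_in(text: str) -> set[str]:
    """Report which of the three scripts appear in a product title."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
            found.add("hangul")
        elif 0x4E00 <= cp <= 0x9FFF:
            found.add("cjk")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found


decomposed = unicodedata.normalize("NFD", "한글")  # Jamo sequence, as some sources emit
composed = to_nfc(decomposed)                      # back to syllable blocks
print(len(decomposed), len(composed), composed == "한글")
print(scripts_in("삼성 Galaxy 美"))
```

Normalizing before writing to storage means equality checks and unique indexes on titles behave the same regardless of which form the source delivered.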

Mobile proxy rotation

Korean mobile carrier IPs (KT, SK Telecom, LG U+) are the cleanest source for Coupang scraping. Korean residential IPs work for low volume; mobile is required for sustained throughput.

import random

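# Playwright takes proxy credentials as separate fields, and Chromium does not
# support authenticated SOCKS5, so use HTTP proxy endpoints here.
# Hostnames, ports, and credentials are placeholders.
KR_MOBILE_PROXIES = [
    {"server": "http://kr-kt-1.proxy.example.com:8080", "username": "us", "password": "pw"},
    {"server": "http://kr-skt-1.proxy.example.com:8080", "username": "us", "password": "pw"},
    {"server": "http://kr-lgu-1.proxy.example.com:8080", "username": "us", "password": "pw"},
]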

async def scrape_with_proxy(url: str):
    proxy = random.choice(KR_MOBILE_PROXIES)
    return await scrape_coupang_kr(url, proxy=proxy)
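Pure random choice keeps handing out proxies that were just challenged. One possible refinement is a pool that benches a proxy for a cooldown period. This is a sketch: the class name and the 600-second default are assumptions, and it only relies on each proxy dict having a "server" key.

```python
import random
import time


class ProxyPool:
    """Pick proxies at random, skipping any that were recently challenged."""

    def __init__(self, proxies: list[dict], cooldown_s: float = 600.0):
        self._proxies = proxies
        self._cooldown_s = cooldown_s          # assumption: tune to your provider
        self._benched_until: dict[str, float] = {}

    def pick(self) -> dict:
        now = time.monotonic()
        available = [
            p for p in self._proxies
            if self._benched_until.get(p["server"], float("-inf")) <= now
        ]
        if not available:
            raise RuntimeError("all proxies are cooling down")
        return random.choice(available)

    def bench(self, proxy: dict) -> None:
        """Call this when a request through `proxy` hits a challenge page."""
        self._benched_until[proxy["server"]] = time.monotonic() + self._cooldown_s
```

A caller would use `pool.pick()` in place of `random.choice(KR_MOBILE_PROXIES)` and call `pool.bench(proxy)` whenever the response looks like a challenge.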

For broader proxy strategy in Asia, see best mobile proxy providers 2026.

Discovering product URLs

Coupang’s category structure is deeply nested. Sitemap discovery works:

import httpx
import xml.etree.ElementTree as ET

async def list_coupang_sitemap_urls(limit: int = 5) -> list[str]:
    sitemap_index = "https://www.coupang.com/sitemap.xml"
    async with httpx.AsyncClient(timeout=30) as client:
        r = await client.get(sitemap_index)
        root = ET.fromstring(r.text)
        ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        sitemaps = [s.find("sm:loc", ns).text for s in root.findall("sm:sitemap", ns)][:limit]

        urls = []
        for sm_url in sitemaps:
            r = await client.get(sm_url)
            sm_root = ET.fromstring(r.text)
            urls.extend(u.find("sm:loc", ns).text for u in sm_root.findall("sm:url", ns))
        return urls

Search result pages also expose paginated listings:

from urllib.parse import quote

async def search_coupang(query: str, page: int = 1) -> list[dict]:
    # Percent-encode the query so Hangul search terms survive URL construction
    url = f"https://www.coupang.com/np/search?q={quote(query)}&page={page}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        ctx = await browser.new_context(locale="ko-KR")
        pg = await ctx.new_page()
        await pg.goto(url, wait_until="networkidle")
        items = await pg.locator(".search-product").all()
        results = []
        for item in items:
            href = await item.locator("a").first.get_attribute("href")
            if not href:
                continue  # skip cards without a product link
            title = await item.locator(".name").first.text_content()
            results.append({"url": f"https://www.coupang.com{href}", "title": title.strip() if title else ""})
        await browser.close()
    return results

Korean address and seller data

Korean ecommerce uses a unique address structure (시 / 도 / 군 / 구 / 동 hierarchy). Vendor location data on Coupang typically appears at the city or district level. For brand intelligence, normalize to a hierarchical structure:

KOREAN_REGIONS = {
    "Seoul": "서울특별시",
    "Busan": "부산광역시",
    "Gyeonggi": "경기도",
    # ...
}

def normalize_korean_region(text: str) -> str | None:
    for english, korean in KOREAN_REGIONS.items():
        if korean in text or english in text:
            return english
    return None

For PIPA compliance, store at city level, not specific addresses.

Comparison to other Asian markets

Market | Bot defense | Volume | Mobile proxy required
Coupang Korea | High | Largest in Korea | Yes
Naver Smart Store | High | Very large | Yes
Gmarket Korea | Medium | Large | Recommended
11Street Korea | Medium | Medium | Optional
Rakuten Japan | High | Largest in Japan | Yes
Amazon Japan | Medium | Largest in Japan | Optional

For Japan specifically, see our Rakuten Japan scraping guide.

Stealth fingerprint hardening for Coupang

Coupang’s Cloudflare integration trips on the standard headless Chromium fingerprint. Combine the AutomationControlled patch with realistic Korean mobile fingerprints:

context_init = """
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
Object.defineProperty(navigator, 'languages', {get: () => ['ko-KR', 'ko', 'en']});
Object.defineProperty(navigator, 'platform', {get: () => 'iPhone'});
Object.defineProperty(screen, 'colorDepth', {get: () => 32});
"""

await ctx.add_init_script(context_init)

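Additionally, Coupang appears to weigh which HTTP headers are present and how they are formed. Use extra_http_headers to send a Korean-realistic header set. Playwright sets the header values but does not guarantee the order in which they go out on the wire, and the header set should match the browser your User-Agent claims to be:

ctx = await browser.new_context(
    extra_http_headers={
        "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        # Sec-Ch-Ua-* client hints are a Chromium feature; Safari does not send them,
        # so drop these two lines when the User-Agent is a Safari one.
        "Sec-Ch-Ua-Platform": '"iOS"',
        "Sec-Ch-Ua-Mobile": "?1",
    },
)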

These details push the bot score from “high” to “medium” on Coupang’s internal scoring, which is enough to keep the session alive.

Coupang Rocket vs Marketplace

Coupang sells in two modes. Coupang-fulfilled (Rocket Delivery) products are sold by Coupang directly. Marketplace products are sold by third parties through Coupang. Both appear on the same product page, often with multiple vendor offers.

For competitive intelligence, vendor-level data matters. A single product might have 20 different vendors offering it at 20 different prices. The product-level price is meaningless without the vendor breakdown.

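async def scrape_coupang_vendors(page, product_id: int, item_id: int) -> list[dict]:
    url = f"https://www.coupang.com/vp/products/{product_id}/items/{item_id}/vendor-items"
    # Reuse the cookies of an already-open Playwright page via its request context
    resp = await page.request.get(url)
    if not resp.ok:
        return []
    # "vendorItems" is the response key assumed elsewhere in this guide
    return (await resp.json()).get("vendorItems", [])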

Cost optimization tactics

Three patterns specifically valuable for Coupang scraping:

Block image and font requests. Coupang product pages load 4 to 6 MB of imagery by default. Blocking via Playwright route interception cuts proxy bandwidth by 75 percent.

Cache vendor data per item. The vendor list rarely changes hourly. Refresh vendor data once per day for most items, more often only for hot SKUs.

Use the API capture pattern over HTML parsing. The intercepted JSON contains structured data; HTML parsing is brittle as Coupang ships frontend updates.

Combined, these cut typical per-page cost from $0.038 to $0.019, roughly half.
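The first tactic can be sketched as a Playwright route handler. The handler below aborts image and font requests and lets everything else through; the function name is an assumption, and actual bandwidth savings will depend on the page.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


async def block_heavy_assets(route) -> None:
    """Playwright route handler: abort image and font requests, pass the rest."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()


# Usage with an existing Playwright page, before calling page.goto():
#     await page.route("**/*", block_heavy_assets)
```

Registering the handler on the context instead of the page (`ctx.route(...)`) applies it to every tab opened from that context.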

Korean ecommerce calendar awareness

Korean ecommerce has different peak periods than Western or ASEAN markets. Plan capacity around:

  • Lunar New Year (Seollal): late January to mid-February. Surge in gift purchases.
  • Pepero Day (November 11): minor spike (different from China’s Singles Day but on the same date).
  • Coupang’s own anniversary sales: irregular schedule, usually late summer.
  • Christmas and New Year: standard global peak.

During peak windows, expect 3x normal load on Coupang infrastructure plus more aggressive bot defense. Scale your IP pool by 2x and increase pacing margins.

Production patterns

Three patterns matter.

First, throttle conservatively. 1-2 requests per second per IP. Coupang challenges aggressive scrapers within minutes.

Second, capture warm sessions. Sessions that have visited the homepage, browsed a category, and visited an item have a much lower challenge rate than cold sessions.

Third, monitor for the Cloudflare interstitial. If your scraper starts hitting “Just a moment…” pages, your IP pool is being challenged. Pause and rotate.
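The throttling rule can be enforced with a small per-proxy limiter. This is a minimal sketch with a hypothetical class name; a minimum interval of 0.5 to 1.0 seconds corresponds to the 1-2 requests per second per IP mentioned above.

```python
import asyncio
import time


class PerProxyThrottle:
    """Enforce a minimum delay between requests sent through the same proxy."""

    def __init__(self, min_interval_s: float = 1.0):
        self.min_interval_s = min_interval_s
        self._next_allowed: dict[str, float] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    async def wait(self, proxy_key: str) -> float:
        """Sleep until proxy_key may be used again; return the seconds slept."""
        lock = self._locks.setdefault(proxy_key, asyncio.Lock())
        async with lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed.get(proxy_key, 0.0) - now)
            if delay:
                await asyncio.sleep(delay)
            # Schedule the next slot relative to when this request actually starts
            self._next_allowed[proxy_key] = time.monotonic() + self.min_interval_s
            return delay


# Usage inside a scraping coroutine:
#     await throttle.wait(proxy["server"])
#     data = await scrape_coupang_kr(url, proxy=proxy)
```

Keying the limiter by proxy rather than globally lets a larger pool raise total throughput without any single IP exceeding the limit.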

Vendor-level data extraction

Coupang’s vendor-items endpoint is the only way to see all sellers offering a single SKU. The shape:

async def fetch_vendor_items(product_id: int, item_id: int, session_cookies: dict) -> list[dict]:
    url = (f"https://www.coupang.com/vp/products/{product_id}/items/{item_id}"
           f"/vendor-items")
    async with httpx.AsyncClient(cookies=session_cookies) as c:
        r = await c.get(url, headers={
            "Accept": "application/json",
            "User-Agent": "Mozilla/5.0 ...",
            "Referer": f"https://www.coupang.com/vp/products/{product_id}",
        })
        return r.json().get("vendorItems", [])

Each vendor item includes price, stock, shipping cost, vendor name, vendor rating, and delivery type. For brand intelligence (catching unauthorized resellers, monitoring grey-market pricing), this data is gold.
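Once the vendor list is in hand, a per-SKU summary is straightforward. The sketch below assumes each vendor item is a dict with "price" and "vendorName" keys; those key names are guesses and should be checked against the real payload.

```python
from statistics import median


def summarize_offers(vendor_items: list[dict], authorized: set[str]) -> dict:
    """Summarize all offers for one SKU and flag sellers not on the authorized list."""
    priced = [v for v in vendor_items if v.get("price") is not None]
    if not priced:
        return {"offer_count": 0, "min_price_krw": None, "median_price_krw": None,
                "cheapest_vendor": None, "unauthorized_vendors": []}
    cheapest = min(priced, key=lambda v: v["price"])
    return {
        "offer_count": len(priced),
        "min_price_krw": cheapest["price"],
        "median_price_krw": median(v["price"] for v in priced),
        "cheapest_vendor": cheapest.get("vendorName"),
        "unauthorized_vendors": sorted(
            {v.get("vendorName") for v in priced} - authorized - {None}
        ),
    }


offers = [
    {"vendorName": "Brand Official", "price": 15000},
    {"vendorName": "Reseller X", "price": 12000},
    {"vendorName": "Reseller Y", "price": 13500},
]
summary = summarize_offers(offers, authorized={"Brand Official"})
print(summary)
```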

Real benchmarks

A March 2026 production run, 10,000 Coupang products with the API capture pattern:

Metric | Value
Success rate | 91%
Median latency per item | 5.8 s
p99 latency | 18 s
Cost per 1000 items | $19
Cloudflare challenge rate | 5.3%
429 throttle rate | 1.4%

Cloudflare challenges are the leading failure cause. With proper stealth and IP rotation, you can keep the rate under 6 percent.
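To track that rate in your own runs, you need a way to recognize a challenge page. A simple heuristic, sketched here, is to look for the "Just a moment" page title; the function names are assumptions, and a title match alone may miss other challenge variants.

```python
import re

_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def is_cloudflare_interstitial(html: str) -> bool:
    """Heuristic: the interstitial page title contains 'Just a moment'."""
    match = _TITLE_RE.search(html)
    return bool(match) and "just a moment" in match.group(1).lower()


def challenge_rate(pages: list[str]) -> float:
    """Share of fetched pages that were challenge interstitials."""
    if not pages:
        return 0.0
    return sum(is_cloudflare_interstitial(p) for p in pages) / len(pages)


sample = [
    "<html><head><title>Just a moment...</title></head><body></body></html>",
    "<html><head><title>Product A</title></head><body></body></html>",
    "<html><head><title>Product B</title></head><body></body></html>",
    "<html><head><title>Product C</title></head><body></body></html>",
]
print(challenge_rate(sample))
```

Computing the rate per proxy instead of per run makes it easier to spot which IPs need to be rested.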

Storage schema

CREATE TABLE coupang_products (
    id BIGSERIAL PRIMARY KEY,
    product_id BIGINT NOT NULL,
    item_id BIGINT,
    vendor_item_id BIGINT,
    url TEXT NOT NULL,
    title TEXT NOT NULL,
    price_krw NUMERIC(12,0) NOT NULL,
    original_price_krw NUMERIC(12,0),
    rating NUMERIC(3,2),
    review_count INTEGER,
    in_stock BOOLEAN NOT NULL,
    is_rocket BOOLEAN DEFAULT FALSE,
    extracted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    raw_jsonb JSONB,
    UNIQUE(product_id, item_id, vendor_item_id)
);
CREATE INDEX idx_coupang_extracted_at ON coupang_products(extracted_at);

AI-driven extraction fallback

For pages where the deterministic JSON interception fails (Coupang ships UI updates frequently), fall through to LLM extraction:

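class NoPayloadError(Exception):
    """API capture returned no JSON payloads."""

async def scrape_with_fallback(url: str) -> dict:
    # fetch_html, llm_extract_product and PRODUCT_SCHEMA are placeholders for your own code
    try:
        payloads = await scrape_with_api_capture(url)
        if not payloads:
            raise NoPayloadError(url)
        return payloads
    except (NoPayloadError, KeyError):
        html = await fetch_html(url)
        return await llm_extract_product(html, schema=PRODUCT_SCHEMA)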

The LLM fallback runs at roughly 4x the cost per page but catches the cases where the deterministic path breaks. This hybrid keeps the happy path fast and cheap while staying resilient to frontend changes.

Cost expectations

10,000 Coupang Korea products per month with Korean mobile proxies:

Component | Cost
Korean mobile proxy traffic (~2.5 MB/page) | $80-$130
Browser compute | $40
LLM extraction (optional) | $30
Total | $150-$200

Korean mobile IPs are slightly cheaper than Indonesian mobile, partly because Korean carrier infrastructure has more capacity.
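The proxy line in the table can be reproduced with simple arithmetic: 10,000 pages at 2.5 MB is 25 GB, so $80-$130 implies roughly $3.20-$5.20 per GB. That per-GB range is derived from the table, not a quoted vendor price, and the sketch assumes 1 GB = 1000 MB.

```python
def monthly_cost_usd(pages: int, mb_per_page: float, usd_per_gb: float,
                     compute_usd: float = 40.0, llm_usd: float = 30.0) -> float:
    """Estimate monthly cost from traffic volume plus fixed compute and LLM costs."""
    proxy_gb = pages * mb_per_page / 1000   # assumption: decimal gigabytes
    return round(proxy_gb * usd_per_gb + compute_usd + llm_usd, 2)


low = monthly_cost_usd(10_000, 2.5, usd_per_gb=3.2)
high = monthly_cost_usd(10_000, 2.5, usd_per_gb=5.2)
print(low, high)
```

Changing `mb_per_page` shows how much blocking images and fonts matters, since proxy traffic is the largest line item.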

Legal considerations

Korea’s Personal Information Protection Act (PIPA) is strict. Public commercial data (product listings, prices, vendor names) is not personal data. Customer reviews that include real names are personal data and require care; the typical compliance pattern is to extract only ratings and review counts, not review text or reviewer names.

The Coupang terms of service prohibit automated access. Enforcement is typically civil; criminal exposure for scraping public commercial data is generally considered low.

For broader compliance reading, see GDPR compliance for web scraping, which covers many of the same principles that apply under Korean PIPA.

Coupang-specific data quirks

Several Coupang-only data points that other ecommerce platforms do not expose:

Rocket Wow membership pricing. Members get different prices on many SKUs. The page renders both prices and Wow-only prices appear with a Wow badge. Capture both.

Coupang Card discount. Coupang’s branded credit card offers an automatic discount that appears on the product page. Capture as a separate field; it affects price comparison logic.

Same-day delivery flag. The “Rocket Delivery” badge indicates next-day or same-day delivery. For demand intelligence, this flag is correlated with sales velocity.

Origin country. Coupang labels imported products with origin country (China, Korea, USA, etc). For brand and trade intelligence, this is essential.

def extract_coupang_specific(page_data: dict) -> dict:
    return {
        "wow_price_krw": page_data.get("wowPrice"),
        "card_discount_krw": page_data.get("cardDiscountAmount"),
        "is_rocket_delivery": page_data.get("rocketDelivery", False),
        "origin_country": page_data.get("originCountry"),
    }

Review and rating extraction

Coupang reviews are paginated and load lazily. Each review includes star rating, text, photos, and a buyer-helpful counter. The endpoint:

async def fetch_reviews(product_id: int, page: int = 1, size: int = 30) -> dict:
    url = (f"https://www.coupang.com/vp/product/reviews"
           f"?productId={product_id}&page={page}&size={size}")
    # uses the same session cookies as product fetches
    ...

For sentiment analysis, capture the text plus rating. For authenticity (counterfeit detection), photos are a strong signal because genuine buyers post product photos and fake reviews rarely do.

Frequently asked questions

Can I use Coupang’s Partner API?
Coupang has a Partners (affiliate) API for affiliates and an Open API for sellers. If you qualify, official APIs are the safe path. For competitive intelligence (you are not a seller), scraping is the practical option.

Why does my scraper work for an hour then start failing?
IP reputation degradation. Mobile IPs survive longer than residential, but every IP eventually gets flagged with sustained traffic. Rotate aggressively.

How does Coupang’s anti-bot compare to Naver Smart Store?
Naver is harder. Coupang relies on Cloudflare plus custom challenges; Naver has its own homegrown defense plus deep integration with Korean identity verification. For Naver scraping, expect 2x the cost and 30 percent lower success rate.

Can I scrape Coupang affiliate links?
The affiliate program API gives you tracked product URLs you can include in content. The scraping pattern for product data is the same; only the URL structure adds a tracking parameter.

Can I scrape Coupang Eats (food delivery)?
Yes with similar patterns. Coupang Eats has a mobile-first interface that works best with mobile user agents and Korean mobile IPs.

How do I detect when a Coupang product moves between Rocket and Marketplace?
Track the is_rocket flag over time. A change from true to false often signals supply chain or pricing changes that brand managers care about.

What about Coupang Play (streaming) metadata?
Title and synopsis data are scrapable. View counts and engagement data are not exposed publicly.

How do I handle the seller location data?
Vendor profiles include city-level location for marketplace sellers. Store at city granularity; scraping shop-level address details ventures into PIPA territory.

What about Coupang Fresh (groceries)?
Same scraping pattern. The coupang.com/vp/products/{id} URL is universal, but Fresh items carry additional perishability and chilled-delivery flags.

Can I scrape Coupang from outside Korea?
Yes for the public web pages, but mobile carrier IPs from Korea perform dramatically better. From a US IP, expect a 3x challenge rate.

How do I track price changes accurately on Coupang?
Snapshot daily for stable products, hourly for hot deals. Coupang prices can change multiple times per day during 11.11-style sales.

Does Coupang have a search-suggest API I can use for keyword discovery?
Yes, at coupang.com/np/search/suggestion?q={prefix}. Useful for brand monitoring and trend tracking.

Common production gotchas

  • The Cloudflare challenge cookie expires after 30 minutes. Sessions need refresh more often than on Lazada or Shopee.
  • Korean characters in URLs are UTF-8 percent-encoded (sequences such as %EC%BF%A0 for 쿠). URL parsing libraries usually handle this, but logs will show the encoded form rather than readable text.
  • Some Coupang pages require login for full pricing visibility (loyalty pricing). Scraping anonymously gets you the public price tier only.
  • The mobile site (m.coupang.com) returns slightly different DOM than the desktop site. Pick one and stick with it.
  • Vendor data updates more frequently than product data. Re-scrape vendors at least daily even if products are stable.
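For the percent-encoding gotcha, the standard library covers both directions. This sketch builds a search URL from a Hangul query and decodes it again for logging; the helper names are illustrative.

```python
from urllib.parse import unquote, urlencode


def build_search_url(query: str, page: int = 1) -> str:
    """Build a search URL with the query UTF-8 percent-encoded."""
    return "https://www.coupang.com/np/search?" + urlencode({"q": query, "page": page})


def readable(url: str) -> str:
    """Decode percent-encoding so log lines show the original Hangul."""
    return unquote(url)


url = build_search_url("쿠팡")
print(url)
print(readable(url))
```

Store and request the encoded form; decode only when writing human-facing logs.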

Storing variant data

Coupang’s nested product/item/vendor-item structure deserves a normalized schema. A relational design that scales:

CREATE TABLE coupang_product_master (
    product_id BIGINT PRIMARY KEY,
    title TEXT NOT NULL,
    brand TEXT,
    category_id INTEGER,
    first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE coupang_items (
    item_id BIGINT PRIMARY KEY,
    product_id BIGINT REFERENCES coupang_product_master(product_id),
    variant_attributes JSONB NOT NULL,
    first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE coupang_vendor_items (
    vendor_item_id BIGINT PRIMARY KEY,
    item_id BIGINT REFERENCES coupang_items(item_id),
    vendor_id BIGINT NOT NULL,
    vendor_name TEXT,
    is_rocket BOOLEAN DEFAULT FALSE,
    first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE coupang_price_history (
    id BIGSERIAL PRIMARY KEY,
    vendor_item_id BIGINT REFERENCES coupang_vendor_items(vendor_item_id),
    price_krw NUMERIC(12,0) NOT NULL,
    in_stock BOOLEAN NOT NULL,
    captured_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_coupang_price_history_vendor_time
    ON coupang_price_history(vendor_item_id, captured_at);

This shape supports the most common queries (price over time per vendor, which vendors offer SKU X, average price across vendors) without requiring expensive joins.

For more Asian ecommerce coverage, browse the ecommerce category.
