How to scrape Shopee Indonesia in 2026
Scrape Shopee Indonesia at scale and you tap into the largest ecommerce market in Southeast Asia. Indonesia has more than 90 million Shopee monthly active users, the platform processes billions of dollars in GMV per quarter, and Shopee.co.id serves a different SKU mix than Shopee Singapore or Shopee Thailand. Brand managers, agencies, and price intelligence teams cannot get a complete ASEAN picture without it.
Shopee is also the hardest ASEAN ecommerce target to scrape. Sea Limited’s bot defense team has shipped some of the most aggressive anti-scraping infrastructure in the region, and what worked in 2023 stopped working months ago. This guide covers what works in 2026: the right endpoints, the right proxies, the right stealth defaults, and working Python code that produces clean structured records from Shopee Indonesia.
What Shopee Indonesia exposes
Three surfaces produce useful data:
| Surface | URL pattern | Best for |
|---|---|---|
| Product detail page | shopee.co.id/{slug}-i.{shop_id}.{item_id} | Full single-product extraction |
| Internal API | shopee.co.id/api/v4/item/get | High-throughput product data |
| Shop page | shopee.co.id/{shop_username} | Seller catalog discovery |
The internal /api/v4/item/get endpoint returns a clean JSON object with everything: title, price (in IDR), stock, ratings, variations, shop info. This is the highest-value endpoint and the one most defended.
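The product URL pattern in the table carries both identifiers that the internal API needs. A minimal sketch of pulling them out, assuming the `-i.{shop_id}.{item_id}` suffix shown above; the helper name `parse_product_url` is ours for illustration, not part of any Shopee tooling:

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# Matches the "-i.{shop_id}.{item_id}" suffix from the URL pattern table.
_ITEM_SUFFIX = re.compile(r"-i\.(\d+)\.(\d+)$")


def parse_product_url(url: str) -> Optional[Tuple[int, int]]:
    """Return (shop_id, item_id) from a product detail URL, or None."""
    path = urlparse(url).path  # drops any query string or fragment
    match = _ITEM_SUFFIX.search(path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Shop pages and other URLs that lack the suffix return None, which makes the function usable as a cheap filter during URL discovery.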
Anti-bot defenses
Shopee uses a custom bot defense stack that combines:
- PerimeterX (now HUMAN) bot management on the public web pages
- Custom request signing on internal API endpoints (the X-API-SOURCE and X-Csrftoken headers)
- Aggressive IP reputation scoring; data center IPs are nearly useless
- JS-only fingerprinting with custom challenges that defeat headless Chromium with default settings
A clean extraction at scale needs three things in 2026: Indonesian mobile carrier IPs, a real (not headless-stealth-patched) Chromium driven through CDP, and either a fresh PerimeterX challenge solve or a captured warm session.
Working browser-based scraper
import asyncio
from playwright.async_api import async_playwright


async def scrape_shopee_id(item_url: str, proxy: dict | None = None) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--disable-features=IsolateOrigins,site-per-process",
            ],
        )
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Linux; Android 13; SM-A546B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36",
            locale="id-ID",
            timezone_id="Asia/Jakarta",
            viewport={"width": 412, "height": 915},
        )
        page = await ctx.new_page()

        # intercept the internal API call that fires on page load
        api_payload = {}

        async def handle_response(resp):
            if "/api/v4/item/get" in resp.url:
                try:
                    api_payload["data"] = await resp.json()
                except Exception:
                    pass  # non-JSON body (challenge page); leave payload empty

        page.on("response", handle_response)
        await page.goto(item_url, wait_until="networkidle", timeout=45000)
        await asyncio.sleep(2)
        await browser.close()
        # "data" can be missing or null when the request was blocked
        return _normalize_shopee_id((api_payload.get("data") or {}).get("data") or {})


def _normalize_shopee_id(item: dict) -> dict:
    if not item:
        return {"error": "no_item_data"}
    rating = item.get("item_rating") or {}
    return {
        "item_id": item.get("itemid"),
        "shop_id": item.get("shopid"),
        "title": item.get("name"),
        "brand": item.get("brand") or None,
        # price fields are Rupiah multiplied by 100,000
        "price_idr": (item.get("price") or 0) // 100_000,
        "original_price_idr": (item.get("price_before_discount") or 0) // 100_000,
        "stock": item.get("stock"),
        "sold": item.get("historical_sold"),
        "rating": rating.get("rating_star"),
        "review_count": (rating.get("rating_count") or [None])[0],
        "in_stock": (item.get("stock") or 0) > 0,
    }


if __name__ == "__main__":
    print(asyncio.run(scrape_shopee_id("https://shopee.co.id/example-i.12345678.987654321")))
The trick is intercepting the API response that fires when the page loads. This avoids reverse-engineering the request signing. The page does the signing for you.
Direct API path with X-Csrftoken
For teams willing to maintain the request signing, hitting /api/v4/item/get directly is dramatically faster than driving a browser. The endpoint requires:
- X-API-SOURCE: pc header
- X-Csrftoken header (extracted from the csrftoken cookie set on first page visit)
- A valid session cookie (SPC_EC and friends) from a warm session
- Correct Referer matching the product slug
import httpx


async def fetch_item_api(item_id: str, shop_id: str, session_cookies: dict) -> dict:
    url = f"https://shopee.co.id/api/v4/item/get?itemid={item_id}&shopid={shop_id}"
    csrf = session_cookies.get("csrftoken", "")
    async with httpx.AsyncClient(cookies=session_cookies) as c:
        r = await c.get(url, headers={
            "X-API-SOURCE": "pc",
            "X-Csrftoken": csrf,
            "Referer": f"https://shopee.co.id/i.{shop_id}.{item_id}",
            "User-Agent": "Mozilla/5.0 ...",
        })
        r.raise_for_status()  # surface 403/429 instead of parsing an error body
        return r.json()
The session cookies expire after roughly 2 hours of inactivity. Refresh via a browser warmup before the next batch.
Capturing warm session cookies
import random


async def capture_session_cookies() -> dict:
    # ID_MOBILE_PROXIES is the proxy list defined under "Mobile proxy rotation"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True, proxy=random.choice(ID_MOBILE_PROXIES))
        ctx = await browser.new_context(locale="id-ID")
        pg = await ctx.new_page()
        await pg.goto("https://shopee.co.id", wait_until="networkidle")
        await asyncio.sleep(3)  # let JS set all cookies
        cookies = {c["name"]: c["value"] for c in await ctx.cookies()}
        await browser.close()
        return cookies
A captured session is good for hundreds of API calls before getting throttled. Rotate sessions across IPs.
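One way to act on both limits (the roughly 2-hour idle expiry and the per-session throttle) is a small bookkeeping object that decides when a captured cookie set should be retired. This is a sketch under stated assumptions: `SessionBudget` is a hypothetical helper, and the 300-call budget is a placeholder for "hundreds of API calls" that you would tune from your own throttle data.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SessionBudget:
    """Tracks how much life a captured cookie set has left."""

    cookies: Dict[str, str]
    max_calls: int = 300              # assumed budget, not a documented Shopee limit
    max_idle_seconds: float = 7200.0  # ~2 hours of inactivity
    clock: Callable[[], float] = time.monotonic
    calls: int = 0
    last_used: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        # Start the idle timer at capture time.
        self.last_used = self.clock()

    def is_usable(self) -> bool:
        idle = self.clock() - self.last_used
        return self.calls < self.max_calls and idle < self.max_idle_seconds

    def record_call(self) -> None:
        self.calls += 1
        self.last_used = self.clock()
```

A worker would check `is_usable()` before each API call and capture a fresh session, ideally through a different proxy, once it returns False.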
Indonesian Rupiah price handling
Shopee stores prices as scaled integers: the integer field price is the price in Rupiah multiplied by 100,000 (loosely called micros here, although true micros would be a factor of 1,000,000). A 25,000 IDR product has price: 2_500_000_000. Always divide by 100,000 before display.
def parse_idr_micros(micros: int) -> float:
    # the raw value is Rupiah x 100,000
    return micros / 100_000
Indonesian Rupiah display formats use a period as the thousands separator and a comma as the decimal separator: 25.000,50 IDR. When you display extracted values back to users, format accordingly.
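A minimal sketch of that display format in plain Python. It avoids the `locale` module because an `id_ID` locale is not installed on every host; `format_idr` is a name chosen for illustration.

```python
def format_idr(amount: float, decimals: int = 0) -> str:
    """Format a Rupiah amount with '.' for thousands and ',' for decimals."""
    # Format with US-style separators first, then swap the two characters.
    us_style = f"{amount:,.{decimals}f}"
    swapped = us_style.replace(",", "_").replace(".", ",").replace("_", ".")
    return f"{swapped} IDR"
```

Calling `format_idr(25000.5, 2)` gives `25.000,50 IDR`, and `format_idr(1234567)` gives `1.234.567 IDR`.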
Indonesian language handling
Shopee Indonesia listings are predominantly in Bahasa Indonesia. Product titles often include both English brand name and Indonesian descriptors. Both are useful for matching across catalogs.
For language detection (useful for downstream analytics), langdetect works on Indonesian:
from langdetect import detect
title = "Sepatu Nike Air Max Original Pria"
print(detect(title)) # 'id'
Mobile proxy rotation
Indonesian mobile carriers (Telkomsel, Indosat, XL, Smartfren) provide the cleanest IP pool for Shopee Indonesia scraping. Mobile IPs from these carriers carry low suspicion scores.
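import random

# Playwright takes proxy credentials as separate "username"/"password" keys,
# and Chromium cannot authenticate against SOCKS5 proxies, so these
# placeholder entries use authenticated HTTP proxies instead.
ID_MOBILE_PROXIES = [
    {"server": "http://id-telkomsel-1.proxy.example.com:8080", "username": "us", "password": "pw"},
    {"server": "http://id-indosat-1.proxy.example.com:8080", "username": "us", "password": "pw"},
    {"server": "http://id-xl-1.proxy.example.com:8080", "username": "us", "password": "pw"},
]


async def scrape_with_proxy(url: str):
    proxy = random.choice(ID_MOBILE_PROXIES)
    return await scrape_shopee_id(url, proxy=proxy)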
For broader proxy strategy, see our best mobile proxy providers 2026 review.
Discovering product URLs at scale
Shopee provides category landing pages with paginated listings:
async def list_shopee_id_category(category_id: int, page: int = 0) -> list[dict]:
    url = (
        "https://shopee.co.id/api/v4/recommend/recommend"
        f"?bundle=category_landing_page&cat_level=1&catid={category_id}&limit=60&offset={page * 60}"
    )
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        ctx = await browser.new_context(locale="id-ID")
        pg = await ctx.new_page()
        # warm the session by visiting the category page first
        # (the slug here is an example; use the one that belongs to category_id)
        await pg.goto(f"https://shopee.co.id/Computer-Aksesoris-cat.{category_id}", wait_until="networkidle")
        # then fetch the recommend API through the page's own request context
        resp = await pg.request.get(url)
        data = await resp.json()
        await browser.close()
    # guard against empty or null sections so a blocked response yields []
    sections = (data.get("data") or {}).get("sections") or [{}]
    items = (sections[0].get("data") or {}).get("item") or []
    return [{"item_id": i["itemid"], "shop_id": i["shopid"], "name": i["name"]} for i in items]
The warm-up visit to the category page sets cookies that the recommend API requires.
Stealth fingerprint hardening for Shopee
Shopee’s PerimeterX (now HUMAN) defense is more aggressive than Lazada’s. The fingerprint hardening that works:
context_init_script = """
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
Object.defineProperty(navigator, 'plugins', {get: () => [1,2,3,4,5]});
Object.defineProperty(navigator, 'languages', {get: () => ['id-ID', 'id', 'en']});
window.chrome = {runtime: {}, app: {}};
"""
# inside context creation
await ctx.add_init_script(context_init_script)
These patches reduce the PerimeterX score from “high bot likelihood” to “low to medium”. Combined with mobile IPs and warm sessions, success rates climb from roughly 60 percent on bare Playwright to over 92 percent.
The other, harder defense is mouse movement scoring. PerimeterX times mouse trajectories and flags paths that are too linear. For high-volume scraping, simulating curved mouse movements from the screen edge to the clicked element is worth the implementation effort.
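import math
import random


async def humanlike_navigate(page, target_x, target_y):
    steps = 30
    start_x, start_y = random.randint(0, 100), random.randint(0, 100)
    for i in range(1, steps + 1):
        t = i / steps  # ends at exactly 1.0 so the last move lands on the target
        # horizontal jitter fades out as the pointer approaches the target
        x = start_x + (target_x - start_x) * t + random.uniform(-3, 3) * (1 - t)
        # the sine bump bends the path and vanishes at t == 1
        y = start_y + (target_y - start_y) * t + math.sin(t * math.pi) * 40
        await page.mouse.move(x, y)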
Comparison to other ASEAN markets
| Market | Bot defense | Volume | Mobile proxy required |
|---|---|---|---|
| Shopee Indonesia | Highest | Largest | Yes |
| Shopee Thailand | Highest | Very high | Yes |
| Shopee Vietnam | High | High | Yes |
| Shopee Malaysia | High | Medium | Recommended |
| Shopee Singapore | Medium | Smaller | Optional |
| Shopee Philippines | High | High | Recommended |
Indonesian Shopee is the largest market in the region and also the most defended. The patterns here translate directly to other Shopee markets with country-specific proxy pools.
For Lazada-specific patterns, see scrape Lazada Thailand.
Production patterns
Three patterns matter for sustained scraping.
First, persistent contexts. Shopee tracks session cookies. A warm session that has visited a category page and an item page survives longer than a cold session.
async def warm_session_loop(items: list[str]):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        ctx = await browser.new_context(locale="id-ID")
        pg = await ctx.new_page()
        # warm up
        await pg.goto("https://shopee.co.id", wait_until="networkidle")
        await asyncio.sleep(1)
        # scrape items in same session
        for url in items:
            await pg.goto(url, wait_until="networkidle")
            yield await pg.content()
            await asyncio.sleep(random.uniform(2, 6))
Second, exponential backoff on 429 and 403 responses. Aggressive retries amplify bans.
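A minimal sketch of that backoff, with jitter so that workers sharing a proxy pool do not retry in lockstep. The 2-second base and 300-second cap are assumed starting values, not figures measured against Shopee.

```python
import random


def should_retry(status_code: int) -> bool:
    """Only throttle (429) and block (403) responses get a backed-off retry."""
    return status_code in (429, 403)


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles on every attempt up to `cap`; the returned delay is
    drawn from the upper half of that ceiling so a retry is never immediate.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling / 2 + random.uniform(0, ceiling / 2)
```

In a worker loop this would sit between attempts as `await asyncio.sleep(backoff_delay(attempt))`, giving up after a fixed number of attempts rather than retrying indefinitely.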
Third, monitor the PerimeterX challenge rate. If your scraper starts hitting the captcha at higher than 5 percent of requests, your IP pool is getting flagged. Rotate or pause.
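The 5 percent rule can be tracked with a rolling window over recent requests. A sketch, assuming your scraper can already tell whether a response was a challenge page; `ChallengeRateMonitor` and the 200-request window are illustrative choices.

```python
from collections import deque


class ChallengeRateMonitor:
    """Rolling captcha rate over the most recent `window` requests."""

    def __init__(self, window: int = 200, threshold: float = 0.05) -> None:
        self.outcomes = deque(maxlen=window)  # True means the request hit a challenge
        self.threshold = threshold

    def record(self, challenged: bool) -> None:
        self.outcomes.append(challenged)

    @property
    def rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_rotate(self) -> bool:
        # Wait for a full window so one early captcha does not trip the alarm.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rate > self.threshold
```

Keeping one monitor per proxy, rather than one global monitor, makes it possible to retire only the IPs that are actually flagged.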
Shopee variations and SKUs
Many Shopee products have variants (size, color, model). The API exposes them in item.tier_variations and item.models. Parsing variations:
def parse_variations(item: dict) -> list[dict]:
    # item["tier_variations"] describes the option axes (size, color, ...);
    # item["models"] holds one entry per purchasable combination.
    models = item.get("models") or []
    out = []
    for m in models:
        out.append({
            "model_id": m.get("modelid"),
            "name": m.get("name"),
            "price_idr": (m.get("price") or 0) // 100_000,  # Rupiah x 100,000
            "stock": m.get("stock"),
            "sku": (m.get("extinfo") or {}).get("seller_sku"),
        })
    return out
For competitive intelligence, variant-level pricing is critical. Rolling up to a single product price misses the long tail.
Cost optimization for Shopee specifically
Three patterns that cut Shopee Indonesia scraping cost meaningfully:
Block image and font requests. Shopee product pages load 3 to 5 MB of images by default. Blocking them via Playwright route interception cuts proxy bandwidth by 70 percent.
Use the API path with cached cookies for items you scrape repeatedly. Browser-based scraping is for first-time visits only.
Batch requests by shop_id. Cookies and rate limits are session-scoped, so processing 10 items from the same shop in one session is cheaper than 10 cold scrapes.
Combined, these cut typical per-page cost from $0.082 to $0.018, roughly a 4.5x improvement.
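The first pattern, blocking images and fonts, can be sketched as a Playwright route handler. The handler only relies on `route.request.resource_type`, `route.abort()` and `route.continue_()`; the set of blocked types is an assumption you may want to widen.

```python
BLOCKED_RESOURCE_TYPES = {"image", "font"}


async def block_heavy_resources(route) -> None:
    """Playwright route handler: abort images and fonts, pass everything else."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        await route.abort()
    else:
        await route.continue_()


# Usage inside an existing Playwright browser context named ctx:
#     await ctx.route("**/*", block_heavy_resources)
```

Registering the handler on the context rather than on a single page means every page opened in that session inherits the same filtering.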
Storage schema
CREATE TABLE shopee_id_products (
    id BIGSERIAL PRIMARY KEY,
    item_id BIGINT NOT NULL,
    shop_id BIGINT NOT NULL,
    url TEXT NOT NULL,
    title TEXT NOT NULL,
    brand TEXT,
    price_idr NUMERIC(12,2) NOT NULL,
    original_price_idr NUMERIC(12,2),
    stock INTEGER,
    sold INTEGER,
    rating NUMERIC(3,2),
    review_count INTEGER,
    in_stock BOOLEAN NOT NULL,
    extracted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    raw_jsonb JSONB,
    UNIQUE(item_id, shop_id)
);

CREATE INDEX idx_shopee_id_extracted_at ON shopee_id_products(extracted_at);
CREATE INDEX idx_shopee_id_shop ON shopee_id_products(shop_id);
For variant-level data, normalize into a shopee_id_variants table referenced by (item_id, shop_id).
Real benchmarks
A March 2026 production run, 10,000 Shopee Indonesia products with the API path plus warm session cookies:
| Metric | Value |
|---|---|
| Success rate | 93% |
| Median latency per item | 1.4 s (API path) |
| Median latency per item (browser path) | 7.2 s |
| Cost per 1000 items (API path) | $18 |
| Cost per 1000 items (browser path) | $82 |
| Captcha rate | 4.1% |
| 429 throttle rate | 1.8% |
The API path is roughly 5x faster and 4.5x cheaper. The maintenance cost of keeping the session warmup and cookie capture working is real but worth it for high-volume teams.
Failure mode breakdown
Of the 7 percent failures:
- 4.1% PerimeterX captcha
- 1.8% rate-limit throttle
- 0.6% network timeout
- 0.3% legitimate 404 (item removed)
- 0.2% malformed response
Each failure mode needs different mitigation. Captcha rate is the leading indicator of pool health; if it climbs above 8 percent, rotate proxies.
Cost expectations
10,000 Shopee Indonesia products per month with mobile proxies:
| Component | Cost |
|---|---|
| Indonesian mobile proxy traffic (~3MB/page) | $120-$200 |
| Browser compute | $50 |
| LLM extraction (optional) | $30 |
| Total | $200-$280 |
Shopee Indonesia is on the higher end of cost per page in ASEAN because of the bot defense and the price of Indonesian mobile IPs.
Legal considerations
Indonesia’s Personal Data Protection Law (UU PDP) became fully enforceable in October 2024. Public commercial data (product listings, prices, store names) is not personal data. Buyer reviews that include real names or photos may be personal data and require care.
Shopee’s terms of service prohibit automated access. Breaching them is a matter of civil contract law; criminal exposure for scraping public commercial data is minimal in Indonesia. For a deeper compliance discussion, see scraping EU sites: jurisdictional realities, which covers the broader frameworks; Indonesia-specific guidance should come from local counsel.
Region-specific Shopee features
Shopee Indonesia ships features that other Shopee markets do not, which affect what you can scrape:
ShopeeFood is integrated into the main Shopee Indonesia app for restaurant delivery. Listings live at shopee.co.id/food. They expose menu items as separate “products” in the same item endpoint, with a food flag.
ShopeePay is the payment layer; for normal product scraping you do not interact with it, but its pricing tiers (e.g. ShopeePay-only deals) appear in product promotion blocks.
Shopee Live is a livestream commerce surface. Live broadcasts list featured products via a separate /api/v4/live endpoint with rapidly-changing item lists. For live commerce intelligence, this is a separate scraper pipeline.
ShopeeMall is the verified-brand tier, identifiable via shopee_verified and shopee_official flags on items. Treat as separate from marketplace listings for brand intelligence.
Promotion and voucher data
Shopee promotions are complex and important for accurate price tracking. The relevant fields:
- price_min and price_max: range across variants
- price_before_discount: original price before any promotion
- discount: percent string (“50%” or “Rp50,000 OFF”)
- flash_sale: boolean indicating active flash sale
- vouchers: array of voucher codes applicable to the listing
Capture all of these. A 30 percent off promotion plus a 10 percent voucher plus free shipping can work out to roughly a 45 percent effective discount, depending on how the voucher stacks and what shipping would have cost, and that effective price is what consumers actually pay.
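The arithmetic behind an effective discount is easy to get wrong, so here is a small worked sketch. It assumes the voucher applies to the already-discounted price and that a waived shipping fee counts toward what the buyer would otherwise have paid; real stacking rules vary by campaign, and `effective_discount` is an illustrative helper.

```python
def effective_discount(list_price: float, promo_pct: float, voucher_pct: float,
                       shipping_fee_waived: float = 0.0) -> float:
    """Effective discount as a fraction of (list price + normal shipping fee)."""
    # Promotion first, then the voucher on the reduced price.
    paid = list_price * (1 - promo_pct) * (1 - voucher_pct)
    # What the order would have cost with no promotion and paid shipping.
    full_cost = list_price + shipping_fee_waived
    return 1 - paid / full_cost
```

Under these assumptions a 30 percent promotion and a 10 percent voucher stack to 37 percent on their own. If a 15,000 IDR shipping fee is waived on a 100,000 IDR item, the buyer pays 63,000 against a full cost of 115,000, which is an effective discount of about 45 percent.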
Reviews and Q&A
Shopee Indonesia’s review system is verbose. Reviews include text, star rating, photos, videos, and seller responses. The endpoint:
async def fetch_reviews(item_id: str, shop_id: str, offset: int = 0) -> list[dict]:
    url = (f"https://shopee.co.id/api/v2/item/get_ratings"
           f"?itemid={item_id}&shopid={shop_id}&type=0&limit=20&offset={offset}")
    # uses same X-Csrftoken pattern as item/get
    ...
For sentiment analysis, capture the text plus rating. For authenticity verification, photos in reviews are a strong signal (genuine buyers post photos; fake reviews rarely do).
Frequently asked questions
Can I use Shopee’s Open Platform API?
Yes, if you are a registered seller or partner. The Open Platform API is well-documented, rate-limited, and the safest path. For competitive intelligence (you are not a seller), scraping remains the practical option.
How often does Shopee change its anti-bot logic?
Major changes every 4-8 weeks. Small tweaks more often. Self-healing scrapers (see our self-healing scraper guide) reduce maintenance burden.
Will Indonesian residential IPs work?
For low-volume scraping, yes. For sustained high volume, mobile carrier IPs are the only reliable path.
How does Shopee handle the relationship between item_id and shop_id in URLs?
Both are required to uniquely identify a listing because the same item_id can exist under different shops (different sellers selling identical products). Always store both as the composite key.
Can I scrape Shopee data while using a Shopee buyer account?
Authenticated scraping is possible but raises the legal stakes (you are violating ToS while logged in). For most intelligence purposes, anonymous public-page scraping is sufficient and lower risk.
Can I scrape live stream sales?
Yes for the listed product side. Shopee Live broadcast metadata (viewer count, host info) requires a different endpoint that changes frequently.
How do I scale to scraping 1 million products per month from Shopee Indonesia?
Pool 50 to 100 mobile IPs, run 20 to 30 worker processes with API-path scraping, and budget roughly $4,000 per month at current proxy and compute prices. Keep cookie rotation aggressive and monitor PerimeterX challenge rates.
What about ShopeePay payment data?
Not exposed publicly. Payment information is restricted to seller-side reports through the Open Platform API.
How do I track flash sale countdowns?
The flash sale info is in the flash_sale block with start and end timestamps. Poll items flagged for flash sale every minute during the sale window to capture the price-and-stock dynamics that brands care most about.
Does Shopee Indonesia have a different API version than Shopee Singapore?
The endpoints are nearly identical (both use /api/v4/) but content shapes differ slightly. Field names like flash_sale.special_promo_label vary in capitalization. Test against each market.
What is the right pacing per IP?
Roughly 1 request every 4 to 8 seconds per IP, with random jitter. Faster than 4 seconds gets throttled.
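A sketch of that pacing, plus the throughput ceiling it implies per IP. The function names are illustrative; the 4 and 8 second bounds come from the answer above.

```python
import random


def next_request_delay(min_s: float = 4.0, max_s: float = 8.0) -> float:
    """Seconds to wait before the next request on the same IP (uniform jitter)."""
    return random.uniform(min_s, max_s)


def requests_per_day(min_s: float = 4.0, max_s: float = 8.0) -> float:
    """Upper bound on daily requests for one IP at the mean delay."""
    mean_delay = (min_s + max_s) / 2
    return 86_400 / mean_delay
```

At a mean delay of 6 seconds one IP tops out at 14,400 requests per day if it runs continuously. Treat that as a ceiling, since retries, session warmups, and cooldowns all eat into it.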
Can I scrape Shopee at the SKU level over time for trend analysis?
Yes. Snapshot daily for stable products and hourly for flash-sale items, and you build a price history that supports robust trend analysis. Storage cost grows linearly, so plan to partition by month.
Common production gotchas
- The internal API occasionally returns a “shop closed” or “item removed” payload that looks like normal data but with all fields nulled. Check item.status == 1 for live items.
- Shopee uses CDN-side image URLs that expire. If you persist images, download them, do not store the URLs.
- The historical_sold count is approximate and rounds aggressively. For exact unit movement, you need seller-side data via the Open Platform API.
- Mobile carrier IPs from Indonesia have very different latency characteristics. Telkomsel is the most reliable; Smartfren is faster but blocks more often.
- Verbose JSONB storage adds up. Compress raw payloads if you store them long-term.
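Two of these gotchas translate directly into small helpers. A sketch using only the standard library: the status check follows the `status == 1` rule from the list, and gzip is one reasonable choice for archiving raw payloads (the helper names are ours).

```python
import gzip
import json


def is_live_item(item: dict) -> bool:
    """True only when the payload is non-empty and marked live (status == 1)."""
    return bool(item) and item.get("status") == 1


def compress_payload(payload: dict) -> bytes:
    """Gzip a raw JSON payload for long-term storage."""
    raw = json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return gzip.compress(raw.encode("utf-8"))


def decompress_payload(blob: bytes) -> dict:
    """Inverse of compress_payload."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

Compressed bytes no longer fit a JSONB column, so this assumes a separate binary column or object storage for archived payloads.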
Live commerce data
Shopee Live is a major sales channel in Indonesia. Scraping live broadcast metadata requires a separate endpoint.
async def fetch_live_streams() -> list[dict]:
    url = "https://shopee.co.id/api/v4/live/get_session_list?limit=20&offset=0"
    # same X-Csrftoken pattern as item endpoints
    ...
The data includes streamer name, viewer count, products featured, and current deals. Useful for influencer marketing intelligence and live commerce trend tracking.
Indonesia-specific compliance notes
Beyond UU PDP, Indonesian ecommerce scraping should consider Bank Indonesia regulations on payment data (which you do not access via scraping anyway) and regulations from Kominfo (renamed Komdigi in late 2024) on data residency for personal data of Indonesian users (relevant only if you store reviews containing personal data).
For brand intelligence projects, the relevant data (prices, listings, seller IDs at the public level) is squarely in the safe zone. For sentiment analysis using buyer reviews, anonymize before storing.
For more ASEAN scraping coverage, browse the ecommerce category.