How to scrape Flipkart India product data
Scrape Flipkart India effectively in 2026 and you have access to one of the two dominant ecommerce platforms in the largest English-speaking online market in the world. Flipkart serves over 200 million registered users in India, indexes hundreds of millions of SKUs across electronics, fashion, grocery, and home goods, and runs the Big Billion Days sales that move tens of millions of units in a single week. For brand managers tracking pricing, agencies running competitive intelligence, or product teams sizing demand in India, Flipkart and Amazon India together cover the market.
This guide walks the full Flipkart India scraping stack: which endpoints to hit, how to handle Walmart-owned Flipkart’s bot defenses, how to manage Indian mobile carrier proxies and INR pricing, and how to keep extraction quality high across the site’s wide category structure. Working Python and Playwright code throughout.
What Flipkart India exposes
| Surface | URL pattern | Best for |
|---|---|---|
| Product detail page | flipkart.com/{slug}/p/{pid} | Full extraction with reviews and Q&A |
| Search results | flipkart.com/search?q={query} | Discovery |
| Internal API | flipkart.com/api/3/page/fetch | High-throughput product extraction |
| Category landing | flipkart.com/{category} | Category sweeps |
The internal /api/3/page/fetch endpoint returns clean JSON used by the React frontend. It requires a CSRF token and a session cookie, which a browser session provides for free.
Anti-bot defenses
Flipkart uses a layered bot defense stack:
- PerimeterX (now HUMAN) on the public web pages
- Aggressive IP reputation; data center IPs blocked or heavily challenged
- Custom request signing on internal API endpoints
- Header-based fingerprinting (specific Accept-Language and User-Agent expected)
The recommended path: Indian mobile carrier IPs (Jio, Airtel, Vi), real Chromium with a mobile user agent and Indian locale, and patient throttling.
Working browser-based scraper
import asyncio
import json
import re
from playwright.async_api import async_playwright

async def scrape_flipkart_in(product_url: str, proxy: dict | None = None) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            proxy=proxy,
            args=["--disable-blink-features=AutomationControlled"],
        )
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Linux; Android 13; CPH2483) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Mobile Safari/537.36",
            locale="en-IN",
            timezone_id="Asia/Kolkata",
            extra_http_headers={"Accept-Language": "en-IN,en;q=0.9,hi;q=0.8"},
            viewport={"width": 412, "height": 915},
        )
        page = await ctx.new_page()

        # capture the internal page-fetch API response as the page loads it
        api_payload = {}

        async def on_response(resp):
            if "/api/3/page/fetch" in resp.url:
                try:
                    api_payload["data"] = await resp.json()
                except Exception:
                    pass

        page.on("response", on_response)
        await page.goto(product_url, wait_until="networkidle", timeout=45000)
        html = await page.content()
        await browser.close()

        # prefer the intercepted API JSON; fall back to HTML parsing
        if api_payload.get("data"):
            return _normalize_flipkart_api(api_payload["data"])
        return _parse_flipkart_html(html, product_url)
def _normalize_flipkart_api(data: dict) -> dict:
    slots = data.get("RESPONSE", {}).get("slots", [])
    product = next((s for s in slots if s.get("widget", {}).get("type") == "PRODUCT_SUMMARY"), {})
    if not product:
        return {"error": "no_product_summary"}
    info = product.get("widget", {}).get("data", {}).get("productSummary", {})
    return {
        "title": info.get("title"),
        "brand": info.get("brand"),
        "price_inr": info.get("pricing", {}).get("finalPrice", {}).get("value"),
        "original_price_inr": info.get("pricing", {}).get("mrp", {}).get("value"),
        "discount_percent": info.get("pricing", {}).get("totalDiscount"),
        "rating": info.get("ratingsAndReviews", {}).get("rating", {}).get("average"),
        "review_count": info.get("ratingsAndReviews", {}).get("rating", {}).get("count"),
        "in_stock": info.get("availability", {}).get("status") == "IN_STOCK",
    }
def _parse_flipkart_html(html: str, url: str) -> dict:
    # fallback to BeautifulSoup-based parsing if API intercept fails
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("span.B_NuCI") or soup.select_one("h1 span")
    price = soup.select_one("div._30jeq3._16Jk6d") or soup.select_one("div._30jeq3")
    return {
        "title": title.text.strip() if title else None,
        "price_inr": _parse_inr(price.text) if price else None,
        "url": url,
    }
def _parse_inr(s: str) -> float:
    # keep the decimal point so paise survive; strip the ₹ symbol and grouping commas
    return float(re.sub(r"[^\d.]", "", s) or 0)

asyncio.run(scrape_flipkart_in("https://www.flipkart.com/example-product/p/itm123456"))
Mobile user agent and the Indian locale matter. Flipkart serves a mobile-optimized API path with cleaner JSON to mobile clients.
Indian Rupee price handling
INR uses the ₹ symbol with Indian-system comma grouping: ₹1,23,456.78 (lakh-crore grouping, not Western thousands). The API returns plain numbers, so this is only a display concern. For parsing scraped UI text:
import re

def parse_inr(s: str) -> float:
    # handle both Western (1,234,567) and Indian (12,34,567) grouping
    return float(re.sub(r"[^\d.]", "", s) or 0)
USD conversion in 2026 hovers around 84-87 INR per USD. Always store the raw INR; convert only for display.
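If you do need a USD figure for reporting, keep the conversion at the display layer. A minimal sketch; the rate constant here is an illustrative assumption, not a live FX feed:

INR_PER_USD = 85.0  # illustrative; pull the day's rate from your FX source

def inr_to_usd_display(price_inr: float, rate: float = INR_PER_USD) -> str:
    # raw INR stays in storage; this only formats a USD figure for reports
    return f"${price_inr / rate:,.2f}"

# inr_to_usd_display(123456.0) -> '$1,452.42'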
Multi-language considerations
Flipkart serves predominantly English content nationwide. Some product titles include Hindi or regional script (Tamil, Telugu, Bengali, Malayalam) for grocery and traditional goods. UTF-8 storage handles all of them.
Search works best with English queries. Hindi queries in Devanagari script are supported, but coverage is sparse outside the largest categories.
Mobile proxy rotation
Indian mobile carrier IPs (Jio, Airtel, Vi/Vodafone Idea) are the cleanest source. Indian residential IPs work for low volume; mobile is required for sustained throughput.
import random
IN_MOBILE_PROXIES = [
    {"server": "socks5://us:pw@in-jio-1.proxy.example.com:1080"},
    {"server": "socks5://us:pw@in-airtel-1.proxy.example.com:1080"},
    {"server": "socks5://us:pw@in-vi-1.proxy.example.com:1080"},
]

async def scrape_with_proxy(url: str):
    proxy = random.choice(IN_MOBILE_PROXIES)
    return await scrape_flipkart_in(url, proxy=proxy)
For mobile proxy sourcing, see our guide to the best mobile proxy providers for 2026.
Discovering product URLs
Flipkart sitemaps are split by category:
import httpx
import xml.etree.ElementTree as ET
async def list_flipkart_sitemaps() -> list[str]:
    sitemap_index = "https://www.flipkart.com/sitemap.xml"
    async with httpx.AsyncClient(timeout=30) as client:
        r = await client.get(sitemap_index, headers={"User-Agent": "Mozilla/5.0"})
        root = ET.fromstring(r.text)
        ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
        return [s.find("sm:loc", ns).text for s in root.findall("sm:sitemap", ns)]
For category-driven discovery, browse search results with paginated queries:
async def search_flipkart(query: str, page: int = 1) -> list[dict]:
    url = f"https://www.flipkart.com/search?q={query}&page={page}"
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        ctx = await browser.new_context(locale="en-IN")
        pg = await ctx.new_page()
        await pg.goto(url, wait_until="networkidle")
        items = await pg.locator("a[href*='/p/']").all()
        results = []
        for item in items[:60]:
            href = await item.get_attribute("href")
            title = await item.text_content()
            if href:
                results.append({"url": f"https://www.flipkart.com{href}", "title": (title or "").strip()})
        await browser.close()
        return results
Comparison to other Indian markets
| Market | Bot defense | Volume | Mobile proxy required |
|---|---|---|---|
| Flipkart India | High | Largest with Amazon | Yes |
| Amazon India | High | Largest with Flipkart | Yes |
| Myntra (fashion) | High | Largest fashion | Yes |
| Meesho | Medium | Large | Recommended |
| Snapdeal | Medium | Smaller | Optional |
| JioMart | High | Growing | Recommended |
| BigBasket (grocery) | Medium | Medium | Recommended |
Flipkart and Amazon India together account for roughly 80 percent of Indian online retail GMV, so covering both captures most of the market.
Geographic IP pinning
Flipkart serves slightly different content based on detected IP geolocation. Delivery options, COD availability, and even some pricing tiers vary by city. For consistent scraping:
CITY_PROXY_POOLS = {
    "delhi": ["socks5://us:pw@in-jio-delhi-1...", "socks5://us:pw@in-jio-delhi-2..."],
    "mumbai": ["socks5://us:pw@in-jio-mumbai-1...", ...],
    "bangalore": ["socks5://us:pw@in-jio-bangalore-1...", ...],
}

async def scrape_for_city(url: str, city: str = "delhi"):
    proxy = {"server": random.choice(CITY_PROXY_POOLS[city])}
    return await scrape_flipkart_in(url, proxy=proxy)
For city-level price intelligence, sample the same product across multiple metro pools weekly.
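A sketch of that weekly sweep, reusing scrape_for_city and CITY_PROXY_POOLS from above; the five-second pause between cities is an arbitrary pacing choice:

import asyncio

async def sample_city_prices(url: str) -> dict[str, dict]:
    # one snapshot of the same PID per metro proxy pool
    snapshots = {}
    for city in CITY_PROXY_POOLS:
        snapshots[city] = await scrape_for_city(url, city=city)
        await asyncio.sleep(5)  # pace between pools
    return snapshots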
Reviewer-level data and sentiment
For sentiment analysis, the review payload exposes:
| Field | Use |
|---|---|
| rating (1-5) | Numeric sentiment |
| text | Long-form review text |
| helpful_count | Community endorsement |
| verified_buyer | Trust signal |
| images | Photo evidence (counterfeit detection) |
| date | Review recency |
For brand intelligence, the verified_buyer flag is the most important. Reviews from non-verified buyers are roughly 4x more likely to be fake.
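A small helper along those lines: keep only verified-buyer reviews in the sentiment corpus and weight them by helpful votes. Field names follow the table above; the weighting scheme itself is an assumption to tune:

def verified_review_corpus(reviews: list[dict]) -> list[dict]:
    # drop non-verified reviews; weight the rest by community endorsement
    return [
        {"rating": r["rating"], "text": r["text"], "weight": 1 + r.get("helpful_count", 0)}
        for r in reviews
        if r.get("verified_buyer")
    ]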
Cross-marketplace deduplication
For brand intelligence projects covering both Flipkart and Amazon India, deduplicating SKUs is non-trivial because each platform uses its own product ID system. The right approach:
- Match by EAN/UPC barcode where present (often missing on Flipkart)
- Fall back to fuzzy match on title + brand + key attributes
- Use LLM-based similarity for ambiguous cases
async def fuzzy_match_skus(flipkart_item: dict, amazon_items: list[dict]) -> dict | None:
    # cheap embedding-based similarity, then validate the top matches with an LLM;
    # embedding_search and llm_verify_match are your own helpers, not library calls
    candidates = embedding_search(flipkart_item["title"], [a["title"] for a in amazon_items], top_k=3)
    for c in candidates:
        if await llm_verify_match(flipkart_item, c):
            return c
    return None
Production patterns
Three patterns matter.
First, throttle conservatively. 1-2 requests per second per IP, longer pauses on first session. Flipkart’s challenge mechanism escalates fast under sustained traffic.
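A minimal pacing sketch for that first pattern; the exact delays are illustrative, not published limits:

import asyncio
import random

async def paced(urls: list[str], warmup: int = 3):
    # slow start on a fresh session, then roughly 1 request/second with jitter
    for i, url in enumerate(urls):
        delay = random.uniform(4, 8) if i < warmup else random.uniform(0.8, 1.6)
        await asyncio.sleep(delay)
        yield url

# usage: async for url in paced(watchlist): await scrape_flipkart_in(url, proxy=proxy)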
Second, intercept the page-fetch API. The API path returns cleaner JSON than parsing the React-rendered HTML. The interception pattern shown above is the production-grade approach.
Third, cache CSRF and session cookies. Pull them from a warm session at the start of a worker run, reuse for the worker lifetime, refresh on auth failure.
async def get_session_cookies():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        ctx = await browser.new_context(locale="en-IN")
        pg = await ctx.new_page()
        await pg.goto("https://www.flipkart.com", wait_until="networkidle")
        cookies = await ctx.cookies()
        await browser.close()
        return cookies
Big Billion Days strategy
Flipkart’s flagship sales event runs in early October and triples or quadruples site traffic. For brand intelligence during BBD:
Pre-BBD (1 week prior): snapshot all watched SKUs at high frequency to capture the baseline.
During BBD: switch to lighter polling on watched SKUs. Avoid scraping unrelated catalog data because the bot defense tightens.
Post-BBD (1 week after): resume normal scraping. Compare price trajectories to identify items that retained discounts versus those that snapped back to MRP.
Logging the entire BBD pricing graph for important SKUs is gold for next year’s pricing strategy work.
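A sketch of that post-event comparison over stored snapshots; it assumes each snapshot carries extracted_at and price_inr and is sorted by time, matching the storage schema below:

from datetime import datetime

def bbd_price_delta(snapshots: list[dict], bbd_start: datetime, bbd_end: datetime) -> dict:
    # last pre-BBD price vs latest post-BBD price for one SKU
    pre = [s["price_inr"] for s in snapshots if s["extracted_at"] < bbd_start]
    post = [s["price_inr"] for s in snapshots if s["extracted_at"] > bbd_end]
    if not pre or not post:
        return {}
    baseline, current = pre[-1], post[-1]
    return {
        "baseline_inr": baseline,
        "post_bbd_inr": current,
        "snapped_back": current >= baseline,  # discount not retained after the event
    }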
Storage schema
CREATE TABLE flipkart_in_products (
    id BIGSERIAL PRIMARY KEY,
    pid TEXT NOT NULL,
    url TEXT NOT NULL,
    title TEXT NOT NULL,
    brand TEXT,
    price_inr NUMERIC(12,2) NOT NULL,
    original_price_inr NUMERIC(12,2),
    discount_percent INTEGER,
    rating NUMERIC(3,2),
    review_count INTEGER,
    in_stock BOOLEAN NOT NULL,
    extracted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    raw_jsonb JSONB,
    UNIQUE(pid)
);
CREATE INDEX idx_flipkart_extracted_at ON flipkart_in_products(extracted_at);
CREATE INDEX idx_flipkart_brand ON flipkart_in_products(brand);
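A hedged upsert sketch against that table, assuming asyncpg for Postgres access and the normalized dict returned by _normalize_flipkart_api:

import json
import asyncpg

async def upsert_product(dsn: str, pid: str, url: str, item: dict):
    # one row per PID; re-scrapes overwrite the price/rating fields and refresh extracted_at
    conn = await asyncpg.connect(dsn)
    try:
        await conn.execute(
            """
            INSERT INTO flipkart_in_products
                (pid, url, title, brand, price_inr, original_price_inr,
                 discount_percent, rating, review_count, in_stock, raw_jsonb)
            VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
            ON CONFLICT (pid) DO UPDATE SET
                price_inr = EXCLUDED.price_inr,
                original_price_inr = EXCLUDED.original_price_inr,
                discount_percent = EXCLUDED.discount_percent,
                rating = EXCLUDED.rating,
                review_count = EXCLUDED.review_count,
                in_stock = EXCLUDED.in_stock,
                raw_jsonb = EXCLUDED.raw_jsonb,
                extracted_at = NOW()
            """,
            pid, url, item["title"], item.get("brand"), item["price_inr"],
            item.get("original_price_inr"), item.get("discount_percent"),
            item.get("rating"), item.get("review_count"), bool(item.get("in_stock")),
            json.dumps(item),
        )
    finally:
        await conn.close()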
Real benchmark numbers
A March 2026 production run of 10,000 Flipkart India products with the API capture pattern:
| Metric | Value |
|---|---|
| Success rate | 92% |
| Median latency per item | 6.4 s |
| p99 latency | 21 s |
| Cost per 1000 items | $19 |
| PerimeterX challenge rate | 6.1% |
| 429 throttle rate | 1.7% |
PerimeterX challenges are the leading failure cause. With proper stealth and IP rotation, you can keep the rate under 7 percent.
Stealth fingerprint hardening
Out-of-the-box headless Chromium fails on Flipkart within roughly 60 requests per IP. Combine the AutomationControlled patch with realistic Indian mobile fingerprints:
context_init = """
Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
Object.defineProperty(navigator, 'languages', {get: () => ['en-IN', 'en', 'hi']});
Object.defineProperty(navigator, 'platform', {get: () => 'Linux armv8l'});
Object.defineProperty(screen, 'colorDepth', {get: () => 32});
window.chrome = {runtime: {}, app: {}};
"""
# call inside the async setup, right after creating the browser context
await ctx.add_init_script(context_init)
Combined with mobile IPs and warm sessions, success rates climb from roughly 60 percent to over 90 percent.
Cost expectations
10,000 Flipkart India products per month with Indian mobile proxies:
| Component | Cost |
|---|---|
| Indian mobile proxy traffic (~3MB/page) | $90-$150 |
| Browser compute | $40 |
| LLM extraction (optional) | $30 |
| Total | $160-$220 |
Indian mobile IPs are competitive in cost with Indonesian and Thai mobile pools.
Legal considerations
India’s Digital Personal Data Protection Act (DPDP Act, enforced from 2024) regulates personal data. Public commercial data (product listings, prices, seller-level data at city granularity) is not personal data.
The Flipkart terms of use prohibit automated access. Civil enforcement only; no criminal exposure for scraping public commercial data in India.
For deeper compliance reading, see our guide to India's DPDP Act for scrapers.
Variants and SKUs
Flipkart products often have variants (size, color, model). The API exposes them in the variantOptions and swatchOptions widgets. Variant-level pricing matters for competitive intelligence:
def parse_variants(api_data: dict) -> list[dict]:
    slots = api_data.get("RESPONSE", {}).get("slots", [])
    variant_widget = next((s for s in slots if s.get("widget", {}).get("type") == "VARIANT_OPTIONS"), None)
    if not variant_widget:
        return []
    options = variant_widget.get("widget", {}).get("data", {}).get("variantOptions", [])
    return [
        {"value": o.get("value"), "pid": o.get("productId"), "available": o.get("available")}
        for o in options
    ]
Reviews
Reviews are paginated client-side. Each page loads roughly 10 reviews:
async def scrape_reviews(product_url: str, max_pages: int = 5):
    reviews = []
    for page in range(1, max_pages + 1):
        url = f"{product_url}/product-reviews?page={page}"
        # browser-based fetch and parsing of the review divs goes here
    return reviews
Review text is personal commentary; reviewer names are personal data under DPDP. Strip both for any pipeline beyond aggregate ratings unless you have legal basis for retention.
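A minimal pre-persistence scrub along those lines; the safe-field whitelist is a conservative assumption to adjust against your own legal review:

SAFE_REVIEW_FIELDS = {"rating", "helpful_count", "verified_buyer", "date"}

def anonymize_review(review: dict, keep_text: bool = False) -> dict:
    # drop reviewer names, profile photos, and any other personal identifiers
    out = {k: v for k, v in review.items() if k in SAFE_REVIEW_FIELDS}
    if keep_text:
        # retain free-text commentary only with a documented legal basis
        out["text"] = review.get("text")
    return out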
Flipkart-specific data points
A few fields specific to Flipkart that other platforms do not expose:
fAssured flag: Flipkart’s quality and fast-shipping certification. Strong predictor of conversion and the Indian equivalent of Amazon’s Prime badge.
bankOffers: array of bank-specific discounts (HDFC, ICICI, SBI cashback offers). These can shave 10 to 15 percent off the headline price for cardholders.
exchangeOffer: trade-in pricing for old devices, common on phones, laptops, and televisions. Captures the effective post-trade price.
emiOptions: EMI (installment) terms, including no-cost EMI flag. EMI dominates large-ticket purchases in India.
def extract_flipkart_specific(api_data: dict) -> dict:
    # _get_product_info mirrors the productSummary lookup in _normalize_flipkart_api above
    info = _get_product_info(api_data)
    pricing = info.get("pricing", {})
    return {
        "f_assured": info.get("fAssured", False),
        "bank_offers_count": len(pricing.get("bankOffers", [])),
        "best_bank_discount_inr": max(
            (o.get("discount", {}).get("value", 0) for o in pricing.get("bankOffers", [])),
            default=0,
        ),
        "emi_starting_inr": pricing.get("emi", {}).get("startingValue"),
        "no_cost_emi": pricing.get("emi", {}).get("noCost", False),
    }
Q&A and Q&A sentiment
Flipkart has a buyer Q&A section where shoppers post product questions and sellers or other buyers answer them. The endpoint:
async def get_qna(pid: str, session_cookies: dict) -> list[dict]:
    url = f"https://www.flipkart.com/api/3/product/{pid}/questions"
    async with httpx.AsyncClient(cookies=session_cookies) as c:
        r = await c.get(url, headers={"User-Agent": "Mozilla/5.0 ..."})
        return r.json().get("questions", [])
For brand monitoring, scraping competitor Q&A reveals customer concerns that the brand could address in their own listings.
Indian ecommerce calendar
Indian ecommerce has unique peak periods that affect scraping load:
- Republic Day Sales (late January)
- Independence Day Sales (mid-August)
- Big Billion Days (early October, Flipkart’s flagship event)
- Diwali season (late October to mid-November)
- New Year sales (December to early January)
During peak windows, expect 3 to 5x normal load on Flipkart infrastructure plus more aggressive bot defense. Scale your IP pool by 2x and increase pacing margins. Big Billion Days specifically: pause non-critical scraping for the week.
AI-driven extraction fallback
For pages where API interception fails, fall through to LLM extraction:
async def scrape_with_fallback(url: str) -> dict:
    try:
        return await scrape_flipkart_in(url)
    except (NoAPIPayloadError, KeyError):
        # fetch_html and llm_extract_product are your own helpers; PRODUCT_SCHEMA is your target schema
        html = await fetch_html(url)
        return await llm_extract_product(html, schema=PRODUCT_SCHEMA)
The LLM fallback runs at roughly 4x the cost per page but catches the cases where the deterministic path breaks.
Frequently asked questions
Can I use Flipkart’s official Affiliate API?
The Flipkart Affiliate program offers an API for affiliates with rate limits and category restrictions. Useful for affiliate marketers; less useful for general competitive intelligence.
Why does my scraper start failing during Big Billion Days?
Flipkart traffic spikes during sales events. The bot defense team also tightens during these periods. Pause aggressive scraping during the BBD week and resume after.
How do I scrape Flipkart Camera and similar specialty categories?
Same patterns. Specialty categories often have richer attribute data; capture the full attributes block as JSONB.
Does Flipkart support multiple sellers per product like Amazon?
Yes. The sellers block lists all sellers offering the same product, with their respective prices, ratings, and shipping options. Critical for brand intelligence on grey-market sellers.
Will residential IPs work?
For low volume, yes. For sustained scraping (tens of thousands of pages per day), Indian mobile IPs are required.
How do I scrape Flipkart Plus exclusive offers?
Flipkart Plus pricing requires a logged-in session with a Plus subscription. Saved storage state from a manual login enables this. Treat the credentials as sensitive.
Can I scrape Myntra (Flipkart-owned) with the same tooling?
Largely yes. Myntra uses similar PerimeterX defense and a similar React frontend. The API shapes differ but the patterns transfer.
Can I scrape Flipkart Wholesale (B2B)?
The Wholesale platform requires business registration. Public-facing pricing is limited; full catalogs are gated behind login.
How do I track price drops on a watch list of products?
Snapshot daily for the watch list, store in a price_history table, run a query for items where the latest price is at least 10 percent below a 7-day rolling average. Send alerts via Telegram or email.
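A sketch of that alert query, assuming a price_history(pid, price_inr, extracted_at) table fed by the daily snapshot job and asyncpg for access:

import asyncpg

PRICE_DROP_SQL = """
WITH latest AS (
    SELECT DISTINCT ON (pid) pid, price_inr
    FROM price_history
    ORDER BY pid, extracted_at DESC
),
rolling AS (
    SELECT pid, AVG(price_inr) AS avg_7d
    FROM price_history
    WHERE extracted_at > NOW() - INTERVAL '7 days'
    GROUP BY pid
)
SELECT l.pid, l.price_inr, r.avg_7d
FROM latest l
JOIN rolling r USING (pid)
WHERE l.price_inr <= r.avg_7d * 0.9;
"""

async def find_price_drops(dsn: str) -> list:
    # watch-list items whose latest price is at least 10% below their 7-day average
    conn = await asyncpg.connect(dsn)
    try:
        return await conn.fetch(PRICE_DROP_SQL)
    finally:
        await conn.close()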
Does Flipkart have variant-level reviews?
Yes. Reviews are tagged with the variant they were written about (size and color). Capture the variant tag for accurate variant-specific sentiment.
How do I detect out-of-stock pulse for inventory intelligence?
Track availability.status over time. Repeated transitions into OUT_OF_STOCK correlate with sales velocity and serve as a demand-intelligence signal.
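A small helper for counting those transitions from time-ordered snapshots:

def oos_transitions(statuses: list[str]) -> int:
    # statuses: availability.status values for one PID, sorted by extracted_at
    return sum(
        1
        for prev, cur in zip(statuses, statuses[1:])
        if prev == "IN_STOCK" and cur == "OUT_OF_STOCK"
    )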
Common production gotchas
A few patterns that cause issues in Flipkart scraping:
The PerimeterX cookie expires after 30 minutes of inactivity. Sessions need refresh more often than the cookie lifetime suggests.
Indian carrier IPs have higher latency (200 to 500 ms) than residential. Plan for slower per-page timing.
The Indian numbering system (lakh, crore) only appears in display, not in API. The API returns plain integers.
Flipkart occasionally rolls out region-specific UI experiments based on IP geolocation (Delhi vs Mumbai vs Bengaluru can see slightly different layouts). Pin the IP region for consistent scraping.
Affiliate URLs include tracking parameters that change. Strip them before storing canonical URLs.
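A minimal canonicalizer; it assumes the path up to /p/{pid} is the stable part of the product URL, which matches the URL pattern shown earlier:

from urllib.parse import urlsplit, urlunsplit

def canonical_flipkart_url(url: str) -> str:
    # keep scheme, host, and path; drop affiliate/tracking query params and fragments
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))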
The mobile site (m.flipkart.com) and desktop site return different DOMs and slightly different API shapes. Pick mobile for cleaner data and stick with it.
Cost optimization for Flipkart specifically
Three patterns specifically valuable for Flipkart:
Block image and font requests via Playwright route interception. Flipkart product pages load 5 to 7 MB of imagery by default. Blocking cuts proxy bandwidth by 75 percent.
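A sketch of that interception, attached to the page object from the scraper above; the blocked resource-type list is a judgment call:

BLOCKED_TYPES = {"image", "media", "font"}

async def block_static_assets(page):
    # register before page.goto(); aborts heavy static requests, lets everything else through
    async def handler(route):
        if route.request.resource_type in BLOCKED_TYPES:
            await route.abort()
        else:
            await route.continue_()
    await page.route("**/*", handler)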
Cache API responses by product_id. The same PID rarely has changing data within a 4-hour window outside of flash sales.
Use the API capture pattern over HTML parsing. The intercepted JSON contains structured data; HTML parsing is brittle as Flipkart frequently ships frontend updates.
Combined, these cut typical per-page cost from roughly $0.040 to $0.018, a reduction of more than half.
Compliance specifics for India
Beyond DPDP, Indian ecommerce data has a few specific regulations to consider:
The Consumer Protection (E-Commerce) Rules 2020 require seller information transparency. That information is already public on Flipkart, and the rules do not restrict scraping it.
The Information Technology Rules 2021 govern cybersecurity but exempt scraping of public commercial data.
For scraping personal reviewer data (names, profile photos), DPDP requires explicit consent which you do not have. Strip personal identifiers from any review data you persist.
For broader Asian ecommerce coverage, browse the ecommerce category.