How to Scrape Lazada with Proxies in 2026 (SEA E-commerce)
Lazada is Southeast Asia’s second-largest e-commerce platform, backed by Alibaba Group, and operates across six countries: Singapore, Malaysia, Thailand, Vietnam, the Philippines, and Indonesia. As a key competitor to Shopee, Lazada data is essential for anyone conducting SEA e-commerce research, competitive analysis, or price monitoring.
This guide covers how to scrape Lazada product data using Python with SEA regional proxies, including strategies for handling Alibaba’s advanced anti-bot technology.
Why Scrape Lazada?
Lazada data addresses multiple business needs in the SEA e-commerce ecosystem:
- Competitive intelligence — Compare product pricing, availability, and seller strategies against Shopee and other marketplaces
- Brand monitoring — Track authorized and unauthorized sellers of your products across Lazada markets
- Price benchmarking — Monitor competitor pricing across countries to optimize your own pricing strategy
- Market sizing — Estimate market size and demand for product categories by analyzing listing volume and sales data
- Seller research — Identify top-performing sellers, their product ranges, and pricing patterns
- Cross-border commerce — Understand how the same products are priced and positioned across different SEA countries
- Trend detection — Spot rising product categories and seasonal demand shifts
Lazada’s Multi-Country Operations
Lazada operates distinct sites for each country:
| Country | Domain | Currency | Notes |
|---|---|---|---|
| Singapore | lazada.sg | SGD | Mature market, high AOV |
| Malaysia | lazada.com.my | MYR | Fast-growing market |
| Thailand | lazada.co.th | THB | Largest Lazada market |
| Vietnam | lazada.vn | VND | Rapidly growing |
| Philippines | lazada.com.ph | PHP | Strong mobile commerce |
| Indonesia | lazada.co.id | IDR | Competitive with Tokopedia |
Each country site has distinct product catalogs, pricing, sellers, and promotional events.
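For multi-country work it helps to centralize this table in code. A minimal sketch (the constant and function names are our own) that builds the catalog search URLs used throughout this guide:

```python
from urllib.parse import quote_plus

# Country code -> Lazada domain, mirroring the table above
LAZADA_DOMAINS = {
    "sg": "lazada.sg",
    "my": "lazada.com.my",
    "th": "lazada.co.th",
    "vn": "lazada.vn",
    "ph": "lazada.com.ph",
    "id": "lazada.co.id",
}

def search_url(country: str, keyword: str, page: int = 1) -> str:
    """Build a catalog search URL for the given country site."""
    domain = LAZADA_DOMAINS[country]
    return f"https://www.{domain}/catalog/?q={quote_plus(keyword)}&page={page}"
```

Keeping the mapping in one place means the rest of your pipeline only ever deals with two-letter country codes.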
Data Points to Extract
| Data Point | Source | Notes |
|---|---|---|
| Product name | Listing card / detail | May be in local language |
| Price | Price element | Current, original, and discounted |
| Rating | Star display | Average rating (1-5) |
| Review count | Rating section | Total number of reviews |
| Sold count | Sales indicator | Monthly or total sales |
| Seller name | Seller section | Shop name and type (LazMall, etc.) |
| Seller rating | Seller profile | Positive rating percentage |
| Shipping info | Delivery section | Free shipping, estimated time |
| Brand | Product attributes | Brand name if listed |
| Category | Breadcrumb | Full category hierarchy |
| SKU variations | Product options | Colors, sizes, configurations |
| Images | Gallery | Product image URLs |
| Specifications | Detail tab | Technical specs table |
Alibaba’s Anti-Bot Technology
As an Alibaba-backed company, Lazada benefits from some of the most sophisticated anti-bot technology in e-commerce:
- Alibaba Security (ARES) — Lazada uses Alibaba’s proprietary bot detection system, which includes:
- Advanced browser fingerprinting
- Mouse movement and behavioral analysis
- Machine learning-based bot classification
- Slider CAPTCHA — Alibaba’s custom CAPTCHA system triggered by suspicious activity
- Encrypted API parameters — API requests require encrypted signature parameters that change with each session
- Cookie encryption — Session cookies include encrypted tokens that are validated server-side
- Rate limiting — Aggressive per-IP rate limits with progressive blocking
- Geographic restrictions — Country sites reject traffic from outside the target region
- JavaScript challenges — Complex JavaScript rendering required for data access
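In practice, you want to detect when one of these defenses has fired before you try to parse a response. A hedged sketch of a block-page check — the marker strings below are assumptions drawn from commonly observed Alibaba block pages (the "punish" redirect URL, the slider widget markup) and should be verified against real responses:

```python
# Marker strings seen on Alibaba-style block/CAPTCHA pages (assumptions;
# update these after inspecting actual blocked responses).
BLOCK_MARKERS = ("baxia-dialog", "nc_1_n1z", "x5secdata")

def looks_blocked(final_url: str, html: str) -> bool:
    """Return True if a response looks like a CAPTCHA or block page."""
    if "punish" in final_url:
        return True
    return any(marker in html for marker in BLOCK_MARKERS)
```

Call this on `page.url` and `page.content()` after each navigation; when it fires, rotate the proxy and back off rather than continuing to hammer the same IP.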
Setting Up Your Environment
Given Lazada’s heavy anti-bot measures, a headless browser approach is recommended:
```shell
pip install playwright beautifulsoup4 fake-useragent
playwright install chromium
```
Python Code: Scraping Lazada with Proxies
Approach 1: Browser-Based Scraping
```python
import asyncio
import json
import logging
import random
import re
from urllib.parse import quote_plus

from playwright.async_api import async_playwright
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class LazadaScraper:
    COUNTRY_CONFIGS = {
        "sg": {"domain": "lazada.sg", "currency": "SGD"},
        "my": {"domain": "lazada.com.my", "currency": "MYR"},
        "th": {"domain": "lazada.co.th", "currency": "THB"},
        "vn": {"domain": "lazada.vn", "currency": "VND"},
        "ph": {"domain": "lazada.com.ph", "currency": "PHP"},
        "id": {"domain": "lazada.co.id", "currency": "IDR"},
    }

    def __init__(self, country: str, proxy_list: list):
        if country not in self.COUNTRY_CONFIGS:
            raise ValueError(f"Unsupported country: {country}")
        self.country = country
        self.config = self.COUNTRY_CONFIGS[country]
        self.proxy_list = proxy_list
        self.products = []

    def get_random_proxy(self) -> dict:
        """Pick a random proxy and split it into Playwright's proxy format."""
        proxy_str = random.choice(self.proxy_list)
        auth, server = proxy_str.rsplit("@", 1)
        user, password = auth.split(":", 1)
        return {
            "server": f"http://{server}",
            "username": user,
            "password": password,
        }

    async def search_products(self, keyword: str, max_pages: int = 5):
        """Search Lazada for products using a headless browser."""
        async with async_playwright() as p:
            proxy = self.get_random_proxy()
            browser = await p.chromium.launch(headless=True, proxy=proxy)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                ),
                locale="en-US",
            )
            page = await context.new_page()

            for page_num in range(1, max_pages + 1):
                url = (
                    f"https://www.{self.config['domain']}"
                    f"/catalog/?q={quote_plus(keyword)}&page={page_num}"
                )
                logger.info(f"Scraping page {page_num}: {url}")
                try:
                    await page.goto(url, wait_until="networkidle", timeout=60000)
                    await page.wait_for_timeout(random.randint(3000, 5000))

                    # Scroll to trigger lazy-loaded products
                    for i in range(5):
                        await page.evaluate(f"window.scrollBy(0, {400 + i * 200})")
                        await page.wait_for_timeout(random.randint(800, 1500))

                    html = await page.content()
                    page_products = self.parse_search_results(html)
                    if not page_products:
                        logger.info("No more products found")
                        break
                    self.products.extend(page_products)
                    logger.info(f"Found {len(page_products)} products on page {page_num}")
                except Exception as e:
                    logger.error(f"Page scrape failed: {e}")
                    # Rotate proxy on failure
                    await browser.close()
                    proxy = self.get_random_proxy()
                    browser = await p.chromium.launch(headless=True, proxy=proxy)
                    context = await browser.new_context(
                        viewport={"width": 1920, "height": 1080},
                        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
                        locale="en-US",
                    )
                    page = await context.new_page()

                # Random pause between pages
                await page.wait_for_timeout(random.randint(4000, 8000))

            await browser.close()

    def parse_search_results(self, html: str) -> list:
        """Extract product data from Lazada search results."""
        soup = BeautifulSoup(html, "html.parser")
        products = []

        # Try to extract from embedded JSON data first
        for script in soup.find_all("script"):
            if script.string and "window.pageData" in script.string:
                try:
                    # Extract the JSON object from the window.pageData assignment
                    json_match = re.search(
                        r'window\.pageData\s*=\s*({.*?});',
                        script.string,
                        re.DOTALL,
                    )
                    if json_match:
                        page_data = json.loads(json_match.group(1))
                        items = page_data.get("mods", {}).get("listItems", [])
                        for item in items:
                            products.append(self.parse_item_json(item))
                except (json.JSONDecodeError, AttributeError):
                    continue

        # Fallback: parse HTML product cards
        if not products:
            cards = soup.select(
                "[data-qa-locator='product-item'], [class*='product-card']"
            )
            for card in cards:
                product = self.parse_product_card(card)
                if product:
                    products.append(product)

        return products

    def parse_item_json(self, item: dict) -> dict:
        """Parse a product from Lazada's embedded JSON data."""
        return {
            "item_id": item.get("itemId") or item.get("nid"),
            "name": item.get("name"),
            "price": item.get("price"),
            "original_price": item.get("originalPrice"),
            "discount": item.get("discount"),
            "rating": item.get("ratingScore"),
            "review_count": item.get("review"),
            "sold_count": item.get("itemSoldCntShow"),
            "brand": item.get("brandName"),
            "seller_name": item.get("sellerName"),
            "location": item.get("location"),
            "image": item.get("image"),
            "url": item.get("productUrl"),
            "is_lazmall": item.get("isLazMall", False),
            "free_shipping": item.get("isFreeShipping", False),
            "currency": self.config["currency"],
            "country": self.country,
        }

    def parse_product_card(self, card) -> dict:
        """Parse a product from an HTML card element."""
        product = {}

        # Title
        title_el = card.select_one("[class*='title'], a[title]")
        if title_el:
            product["name"] = title_el.get("title") or title_el.get_text(strip=True)

        # Price
        price_el = card.select_one("[class*='price'] span, [data-price]")
        if price_el:
            product["price"] = price_el.get_text(strip=True)

        # Rating
        rating_el = card.select_one("[class*='rating']")
        if rating_el:
            product["rating"] = rating_el.get_text(strip=True)

        # Link
        link_el = card.select_one("a[href*='/products/']")
        if link_el:
            product["url"] = link_el["href"]

        # Image
        img_el = card.select_one("img[src]")
        if img_el:
            product["image"] = img_el.get("src") or img_el.get("data-src")

        product["currency"] = self.config["currency"]
        product["country"] = self.country
        return product if product.get("name") else None

    async def scrape_product_detail(self, product_url: str) -> dict:
        """Scrape detailed product information from a listing page."""
        async with async_playwright() as p:
            proxy = self.get_random_proxy()
            browser = await p.chromium.launch(headless=True, proxy=proxy)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                ),
            )
            page = await context.new_page()
            detail = {}
            try:
                # Lazada product URLs are often protocol-relative ("//...")
                full_url = (
                    product_url if product_url.startswith("http")
                    else f"https:{product_url}"
                )
                await page.goto(full_url, wait_until="networkidle", timeout=60000)
                await page.wait_for_timeout(random.randint(3000, 5000))

                html = await page.content()
                soup = BeautifulSoup(html, "html.parser")

                # Product title
                title_el = soup.select_one("h1, [class*='pdp-product-title']")
                if title_el:
                    detail["name"] = title_el.get_text(strip=True)

                # Price
                price_el = soup.select_one("[class*='pdp-price'], [class*='price-current']")
                if price_el:
                    detail["price"] = price_el.get_text(strip=True)

                # Rating and reviews
                rating_el = soup.select_one("[class*='pdp-review-summary']")
                if rating_el:
                    detail["rating_summary"] = rating_el.get_text(strip=True)

                # Description
                desc_el = soup.select_one("[class*='detail-content'], [class*='pdp-product-detail']")
                if desc_el:
                    detail["description"] = desc_el.get_text(strip=True)

                # Specifications
                specs = {}
                spec_rows = soup.select("[class*='specification'] li, [class*='key-value']")
                for row in spec_rows:
                    key_el = row.select_one("[class*='key'], [class*='name']")
                    val_el = row.select_one("[class*='value']")
                    if key_el and val_el:
                        specs[key_el.get_text(strip=True)] = val_el.get_text(strip=True)
                detail["specifications"] = specs

                # Seller info
                seller_el = soup.select_one("[class*='seller-name'], [class*='store-name']")
                if seller_el:
                    detail["seller_name"] = seller_el.get_text(strip=True)

                # Sample of visible reviews
                reviews = []
                for rev in soup.select("[class*='review-content'], [class*='item-content']")[:10]:
                    reviews.append(rev.get_text(strip=True))
                detail["reviews_sample"] = reviews
            except Exception as e:
                logger.error(f"Detail scrape failed: {e}")

            await browser.close()
            return detail


# Usage
if __name__ == "__main__":
    # Use Thai proxies for Lazada Thailand
    th_proxies = [
        "user:pass@th-residential1.proxy.com:8080",
        "user:pass@th-residential2.proxy.com:8080",
    ]

    scraper = LazadaScraper(country="th", proxy_list=th_proxies)
    asyncio.run(scraper.search_products(keyword="wireless mouse", max_pages=3))
    print(f"Found {len(scraper.products)} products on Lazada TH")

    # Get detail for the first product
    if scraper.products and scraper.products[0].get("url"):
        detail = asyncio.run(
            scraper.scrape_product_detail(scraper.products[0]["url"])
        )
        print(f"Detail: {detail.get('name')}")

    with open("lazada_th_products.json", "w") as f:
        json.dump(scraper.products, f, indent=2, ensure_ascii=False)
```
Approach 2: Direct API Requests
When you can bypass the encrypted parameters, Lazada’s API returns clean JSON:
```python
import requests


def search_lazada_api(country_domain: str, keyword: str,
                      proxy: str, page: int = 1) -> dict:
    """Attempt a Lazada AJAX search (parameters may need updating)."""
    url = f"https://www.{country_domain}/catalog/"
    params = {
        "ajax": "true",
        "q": keyword,
        "page": page,
    }
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "application/json",
        "Referer": f"https://www.{country_domain}/",
        "X-Requested-With": "XMLHttpRequest",
    }
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    try:
        response = requests.get(
            url, params=params, headers=headers,
            proxies=proxies, timeout=30,
        )
        if response.status_code == 200:
            return response.json()
    except Exception as e:
        print(f"API request failed: {e}")
    return {}
```
SEA Proxy Requirements
Like Shopee, Lazada enforces geographic restrictions:
- Singapore — Requires SG IP addresses for lazada.sg
- Malaysia — Requires MY IP addresses for lazada.com.my
- Thailand — Requires TH IP addresses for lazada.co.th
- Vietnam — Requires VN IP addresses for lazada.vn
- Philippines — Requires PH IP addresses for lazada.com.ph
- Indonesia — Requires ID IP addresses for lazada.co.id
Premium residential proxies from these Southeast Asian countries are essential. Mobile proxies provide the highest trust scores, particularly in markets where mobile commerce dominates (Philippines, Indonesia, Vietnam).
Verify your proxy’s country with our IP lookup tool.
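That check can also be automated: route a geo-IP lookup through each proxy before using it. The sketch below uses ip-api.com purely as an illustrative free endpoint; any geo-IP service returning JSON works the same way, and the helper names are our own:

```python
import requests

def proxy_geo(proxy: str, timeout: int = 15) -> dict:
    """Fetch the proxy's exit-node geo info via a public geo-IP service."""
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    resp = requests.get("http://ip-api.com/json", proxies=proxies, timeout=timeout)
    resp.raise_for_status()
    return resp.json()

def country_matches(geo: dict, expected: str) -> bool:
    """Compare a geo-IP JSON payload against the expected ISO country code."""
    return geo.get("countryCode", "").upper() == expected.upper()
```

Run `country_matches(proxy_geo(p), "TH")` for every proxy in your Thai pool at startup and drop any that fail; a mislocated exit node wastes requests on guaranteed geo-blocks.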
Handling Alibaba’s Anti-Bot Defenses
Lazada’s Alibaba-backed security requires specific countermeasures:
Browser Fingerprint Consistency
```python
async def create_consistent_context(playwright, proxy):
    """Create a browser context with a consistent fingerprint."""
    browser = await playwright.chromium.launch(
        headless=True,
        proxy=proxy,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
        ],
    )
    context = await browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="Asia/Singapore",  # Match proxy location
        color_scheme="light",
    )
    # Override navigator.webdriver to avoid detection
    await context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
    """)
    return browser, context
```
Handling Slider CAPTCHAs
Lazada’s slider CAPTCHA is notoriously difficult to solve automatically. Options include:
- CAPTCHA solving services — Integrate with services like 2Captcha or Anti-Captcha
- Prevention — Use slower request rates and better proxy quality to avoid triggering CAPTCHAs
- Session caching — Save and reuse valid session cookies to minimize new session creation
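Session caching can be sketched with Playwright's `storage_state` mechanism, which persists cookies and localStorage to a JSON file. The file name and freshness window below are assumptions, not Lazada-specific values:

```python
import os
import time

STATE_PATH = "lazada_session.json"   # hypothetical cache file name
MAX_AGE_S = 30 * 60                  # reuse sessions for up to 30 min (assumption)

def state_is_fresh(path: str = STATE_PATH, max_age_s: int = MAX_AGE_S) -> bool:
    """A cached session file is reusable if it exists and is recent."""
    return os.path.exists(path) and (time.time() - os.path.getmtime(path)) < max_age_s

async def new_context_with_session(browser, **kwargs):
    """Create a Playwright context, reusing a saved storage state when fresh."""
    if state_is_fresh():
        return await browser.new_context(storage_state=STATE_PATH, **kwargs)
    return await browser.new_context(**kwargs)

async def save_session(context) -> None:
    """Persist cookies/localStorage after a successful, unblocked run."""
    await context.storage_state(path=STATE_PATH)
```

Call `save_session(context)` only after a run that produced valid data, so you never cache a blocked or CAPTCHA'd session.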
Recommended Proxy Type
For Lazada scraping:
- SEA residential proxies — Required for each target country
- Mobile proxies — Highest trust level, especially in mobile-first markets like PH and ID
- Sticky sessions (5-10 minutes) — Lazada tracks session consistency; maintain the same IP for multi-page workflows
- Premium providers — Alibaba’s bot detection scores IP reputation heavily. Use premium proxy providers with clean SEA IP pools.
- Minimum pool of 50 IPs per country — Lazada blocks aggressively; you need a large rotation pool
Estimate your multi-country proxy costs with our proxy cost calculator.
Troubleshooting
Problem: Getting redirected to CAPTCHA page
- Reduce request frequency to 1 request every 5-10 seconds.
- Use a headless browser instead of direct HTTP requests.
- Switch to higher-quality residential or mobile proxies.
- Add the `--disable-blink-features=AutomationControlled` flag to your browser launch.
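The first suggestion can be wrapped in a small throttling helper; the 5-10 second window below mirrors the advice above, and the function names are illustrative:

```python
import asyncio
import random

def next_delay(base: float = 5.0, jitter: float = 5.0) -> float:
    """Pick a delay between base and base+jitter seconds (5-10s by default)."""
    return base + random.random() * jitter

async def polite_goto(page, url: str):
    """Navigate with a randomized pre-request delay to stay under rate limits."""
    await asyncio.sleep(next_delay())
    return await page.goto(url, wait_until="networkidle", timeout=60000)
```

Randomized delays matter more than long delays: a perfectly regular interval is itself a bot signal.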
Problem: Search results page returns no products
- Verify your proxy IP is from the correct country.
- Ensure JavaScript is fully rendered by waiting longer after page load.
- Check for the `window.pageData` JSON in script tags — products may be in embedded data even if not visually rendered.
Problem: Product detail pages are empty or show error
- Lazada product URLs often use a protocol-relative format (starting with `//`). Prepend `https:` to these URLs.
- Some product pages redirect to login. Use fresh sessions with clean cookies.
Problem: Price data appears inconsistent
- Lazada has flash sales, vouchers, and dynamic pricing. Prices may change between requests.
- Look for both `price` and `originalPrice` fields to capture discount information.
Problem: Reviews not loading on detail pages
- Reviews are loaded via separate AJAX requests. Scroll down to the review section and wait for it to load.
- Alternatively, intercept the review API endpoint from network traffic.
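Interception can be sketched with Playwright's response events. The "review" URL filter below is an assumption — confirm the real endpoint in your browser's network tab before relying on it:

```python
def is_review_endpoint(url: str) -> bool:
    """Match XHR URLs that look like a review feed (heuristic, verify manually)."""
    return "review" in url.lower()

def capture_reviews(page, sink: list) -> None:
    """Attach a Playwright response listener that stores review JSON payloads."""
    async def on_response(response):
        if is_review_endpoint(response.url):
            try:
                sink.append(await response.json())
            except Exception:
                pass  # non-JSON review asset (e.g. review images)
    page.on("response", on_response)
```

Attach the listener before scrolling to the review section; the payloads land in `sink` as the AJAX calls fire, which is far more robust than parsing the rendered review HTML.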
Legal and Ethical Considerations
Scraping Lazada involves legal considerations across multiple SEA jurisdictions:
- Lazada Terms of Service — Prohibit scraping, data mining, and automated access. As an Alibaba subsidiary, Lazada has significant legal resources.
- Computer Misuse Act (Singapore) — Singapore’s CMA broadly criminalizes unauthorized computer access, which could extend to scraping if Lazada argues their anti-bot measures constitute access controls.
- Multi-jurisdiction compliance — Operating across six countries means compliance with six different legal frameworks. Thailand’s Computer Crime Act, Vietnam’s Cybersecurity Law, and Indonesia’s Electronic Information and Transactions Law all have provisions that could apply.
- Personal data — Seller names, review author names, and location data are personal information under various SEA data protection laws.
- Alibaba litigation history — Alibaba has pursued legal action against scrapers of its Chinese platforms (Taobao, Tmall). The same approach could be applied to Lazada.
- Commercial use — Using scraped data for competitive pricing or market intelligence could raise unfair competition claims.
Consult legal counsel familiar with Southeast Asian e-commerce and data protection law before conducting commercial Lazada scraping.
Conclusion
Lazada is one of the more challenging e-commerce platforms to scrape due to Alibaba’s advanced anti-bot technology. The headless browser approach with Playwright provides the best results, especially when combined with consistent browser fingerprinting and geo-targeted SEA proxies. Focus on extracting embedded JSON data from window.pageData rather than parsing HTML, as this is more reliable and contains richer product information. Start with a single country, refine your approach against Alibaba’s defenses, then expand to additional markets.
Related Reading
- How to Scrape Airbnb Listings with Proxies in 2026
- How to Scrape Facebook Marketplace with Proxies in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix