How to Scrape Lazada with Proxies in 2026 (SEA E-commerce)

Lazada is Southeast Asia’s second-largest e-commerce platform, backed by Alibaba Group, and operates across six countries: Singapore, Malaysia, Thailand, Vietnam, the Philippines, and Indonesia. As a key competitor to Shopee, Lazada data is essential for anyone conducting SEA e-commerce research, competitive analysis, or price monitoring.

This guide covers how to scrape Lazada product data using Python with SEA regional proxies, including strategies for handling Alibaba’s advanced anti-bot technology.

Why Scrape Lazada?

Lazada data addresses multiple business needs in the SEA e-commerce ecosystem:

  • Competitive intelligence — Compare product pricing, availability, and seller strategies against Shopee and other marketplaces
  • Brand monitoring — Track authorized and unauthorized sellers of your products across Lazada markets
  • Price benchmarking — Monitor competitor pricing across countries to optimize your own pricing strategy
  • Market sizing — Estimate market size and demand for product categories by analyzing listing volume and sales data
  • Seller research — Identify top-performing sellers, their product ranges, and pricing patterns
  • Cross-border commerce — Understand how the same products are priced and positioned across different SEA countries
  • Trend detection — Spot rising product categories and seasonal demand shifts

Lazada’s Multi-Country Operations

Lazada operates distinct sites for each country:

| Country | Domain | Currency | Notes |
|---|---|---|---|
| Singapore | lazada.sg | SGD | Mature market, high AOV |
| Malaysia | lazada.com.my | MYR | Fast-growing market |
| Thailand | lazada.co.th | THB | Largest Lazada market |
| Vietnam | lazada.vn | VND | Rapidly growing |
| Philippines | lazada.com.ph | PHP | Strong mobile commerce |
| Indonesia | lazada.co.id | IDR | Competitive with Tokopedia |

Each country site has distinct product catalogs, pricing, sellers, and promotional events.
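The domain map above translates directly to code. The helper below (names are our own) builds search URLs using the `/catalog/?q=` pattern used throughout this guide:

```python
from urllib.parse import quote_plus

# Country code -> Lazada domain, mirroring the table above
LAZADA_DOMAINS = {
    "sg": "lazada.sg",
    "my": "lazada.com.my",
    "th": "lazada.co.th",
    "vn": "lazada.vn",
    "ph": "lazada.com.ph",
    "id": "lazada.co.id",
}

def build_search_url(country: str, keyword: str, page: int = 1) -> str:
    """Build a Lazada catalog search URL for a given country site."""
    domain = LAZADA_DOMAINS[country]
    return f"https://www.{domain}/catalog/?q={quote_plus(keyword)}&page={page}"

print(build_search_url("th", "wireless mouse", 2))
# https://www.lazada.co.th/catalog/?q=wireless+mouse&page=2
```

Note the `quote_plus` call: multi-word keywords must be URL-encoded before they go into the query string.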

Data Points to Extract

| Data Point | Source | Notes |
|---|---|---|
| Product name | Listing card / detail | May be in local language |
| Price | Price element | Current, original, and discounted |
| Rating | Star display | Average rating (1-5) |
| Review count | Rating section | Total number of reviews |
| Sold count | Sales indicator | Monthly or total sales |
| Seller name | Seller section | Shop name and type (LazMall, etc.) |
| Seller rating | Seller profile | Positive rating percentage |
| Shipping info | Delivery section | Free shipping, estimated time |
| Brand | Product attributes | Brand name if listed |
| Category | Breadcrumb | Full category hierarchy |
| SKU variations | Product options | Colors, sizes, configurations |
| Images | Gallery | Product image URLs |
| Specifications | Detail tab | Technical specs table |

Alibaba’s Anti-Bot Technology

As an Alibaba-backed company, Lazada benefits from some of the most sophisticated anti-bot technology in e-commerce:

  1. Alibaba Security (ARES) — Lazada uses Alibaba’s proprietary bot detection system, which includes:
  • Advanced browser fingerprinting
  • Mouse movement and behavioral analysis
  • Machine learning-based bot classification
  2. Slider CAPTCHA — Alibaba’s custom CAPTCHA system triggered by suspicious activity
  3. Encrypted API parameters — API requests require encrypted signature parameters that change with each session
  4. Cookie encryption — Session cookies include encrypted tokens that are validated server-side
  5. Rate limiting — Aggressive per-IP rate limits with progressive blocking
  6. Geographic restrictions — Country sites reject traffic from outside the target region
  7. JavaScript challenges — Complex JavaScript rendering required for data access
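When a defense triggers, Lazada typically serves a challenge page instead of results, so it helps to detect that early and rotate the proxy. The marker strings below are guesses to refine after inspecting a real blocked response:

```python
# Heuristic check for an anti-bot challenge page. These marker strings are
# assumptions -- inspect an actual blocked response and adjust them.
CHALLENGE_MARKERS = (
    "punish",          # Alibaba challenge pages are often served under /punish/ paths
    "captcha",
    "slider",
    "security check",
)

def looks_like_challenge(html: str, final_url: str = "") -> bool:
    """Return True if the response resembles a CAPTCHA/challenge page."""
    haystack = (html[:5000] + final_url).lower()
    return any(marker in haystack for marker in CHALLENGE_MARKERS)
```

Call this on `page.content()` (and the final URL after redirects) before parsing; a positive hit means you should back off and switch IPs rather than parse an empty result set.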

Setting Up Your Environment

Given Lazada’s heavy anti-bot measures, a headless browser approach is recommended:

pip install playwright beautifulsoup4 fake-useragent
playwright install chromium

Python Code: Scraping Lazada with Proxies

Approach 1: Browser-Based Scraping

import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
import json
import random
import logging
import re

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class LazadaScraper:
    COUNTRY_CONFIGS = {
        "sg": {"domain": "lazada.sg", "currency": "SGD"},
        "my": {"domain": "lazada.com.my", "currency": "MYR"},
        "th": {"domain": "lazada.co.th", "currency": "THB"},
        "vn": {"domain": "lazada.vn", "currency": "VND"},
        "ph": {"domain": "lazada.com.ph", "currency": "PHP"},
        "id": {"domain": "lazada.co.id", "currency": "IDR"},
    }

    def __init__(self, country: str, proxy_list: list):
        if country not in self.COUNTRY_CONFIGS:
            raise ValueError(f"Unsupported country: {country}")
        self.country = country
        self.config = self.COUNTRY_CONFIGS[country]
        self.proxy_list = proxy_list
        self.products = []

    def get_random_proxy(self) -> dict:
        """Pick a random proxy; expects "user:pass@host:port" strings."""
        proxy_str = random.choice(self.proxy_list)
        auth, server = proxy_str.rsplit("@", 1)
        user, password = auth.split(":", 1)
        return {
            "server": f"http://{server}",
            "username": user,
            "password": password
        }

    async def search_products(self, keyword: str, max_pages: int = 5):
        """Search Lazada for products using headless browser."""
        async with async_playwright() as p:
            proxy = self.get_random_proxy()
            browser = await p.chromium.launch(
                headless=True,
                proxy=proxy
            )
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                ),
                locale="en-US"
            )
            page = await context.new_page()

            for page_num in range(1, max_pages + 1):
                url = (
                    f"https://www.{self.config['domain']}"
                    f"/catalog/?q={keyword.replace(' ', '+')}&page={page_num}"
                )
                logger.info(f"Scraping page {page_num}: {url}")

                try:
                    await page.goto(url, wait_until="networkidle", timeout=60000)
                    await page.wait_for_timeout(random.randint(3000, 5000))

                    # Scroll to load more products
                    for i in range(5):
                        await page.evaluate(f"window.scrollBy(0, {400 + i * 200})")
                        await page.wait_for_timeout(random.randint(800, 1500))

                    html = await page.content()
                    page_products = self.parse_search_results(html)

                    if not page_products:
                        logger.info("No more products found")
                        break

                    self.products.extend(page_products)
                    logger.info(f"Found {len(page_products)} products on page {page_num}")

                except Exception as e:
                    logger.error(f"Page scrape failed: {e}")
                    # Rotate proxy on failure
                    await browser.close()
                    proxy = self.get_random_proxy()
                    browser = await p.chromium.launch(headless=True, proxy=proxy)
                    context = await browser.new_context(
                        viewport={"width": 1920, "height": 1080},
                        user_agent=(
                            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                            "AppleWebKit/537.36 (KHTML, like Gecko) "
                            "Chrome/120.0.0.0 Safari/537.36"
                        ),
                        locale="en-US"
                    )
                    page = await context.new_page()

                await page.wait_for_timeout(random.randint(4000, 8000))

            await browser.close()

    def parse_search_results(self, html: str) -> list:
        """Extract product data from Lazada search results."""
        soup = BeautifulSoup(html, "html.parser")
        products = []

        # Try to extract from embedded JSON data first
        scripts = soup.find_all("script")
        for script in scripts:
            if script.string and "window.pageData" in script.string:
                try:
                    # Extract JSON from window.pageData assignment
                    json_match = re.search(
                        r'window\.pageData\s*=\s*({.*?});',
                        script.string,
                        re.DOTALL
                    )
                    if json_match:
                        page_data = json.loads(json_match.group(1))
                        items = (page_data.get("mods", {})
                                .get("listItems", []))
                        for item in items:
                            products.append(self.parse_item_json(item))
                except (json.JSONDecodeError, AttributeError):
                    continue

        # Fallback: parse HTML product cards
        if not products:
            cards = soup.select("[data-qa-locator='product-item'], [class*='product-card']")
            for card in cards:
                product = self.parse_product_card(card)
                if product:
                    products.append(product)

        return products

    def parse_item_json(self, item: dict) -> dict:
        """Parse product from Lazada's embedded JSON data."""
        return {
            "item_id": item.get("itemId") or item.get("nid"),
            "name": item.get("name"),
            "price": item.get("price"),
            "original_price": item.get("originalPrice"),
            "discount": item.get("discount"),
            "rating": item.get("ratingScore"),
            "review_count": item.get("review"),
            "sold_count": item.get("itemSoldCntShow"),
            "brand": item.get("brandName"),
            "seller_name": item.get("sellerName"),
            "location": item.get("location"),
            "image": item.get("image"),
            "url": item.get("productUrl"),
            "is_lazmall": item.get("isLazMall", False),
            "free_shipping": item.get("isFreeShipping", False),
            "currency": self.config["currency"],
            "country": self.country
        }

    def parse_product_card(self, card) -> dict | None:
        """Parse product from HTML card element."""
        product = {}

        # Title
        title_el = card.select_one("[class*='title'], a[title]")
        if title_el:
            product["name"] = title_el.get("title") or title_el.get_text(strip=True)

        # Price
        price_el = card.select_one("[class*='price'] span, [data-price]")
        if price_el:
            product["price"] = price_el.get_text(strip=True)

        # Rating
        rating_el = card.select_one("[class*='rating']")
        if rating_el:
            product["rating"] = rating_el.get_text(strip=True)

        # Link
        link_el = card.select_one("a[href*='/products/']")
        if link_el:
            product["url"] = link_el["href"]

        # Image
        img_el = card.select_one("img[src]")
        if img_el:
            product["image"] = img_el.get("src") or img_el.get("data-src")

        product["currency"] = self.config["currency"]
        product["country"] = self.country

        return product if product.get("name") else None

    async def scrape_product_detail(self, product_url: str) -> dict:
        """Scrape detailed product information from listing page."""
        async with async_playwright() as p:
            proxy = self.get_random_proxy()
            browser = await p.chromium.launch(headless=True, proxy=proxy)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/120.0.0.0 Safari/537.36"
                )
            )
            page = await context.new_page()
            detail = {}

            try:
                full_url = product_url if product_url.startswith("http") else f"https:{product_url}"
                await page.goto(full_url, wait_until="networkidle", timeout=60000)
                await page.wait_for_timeout(random.randint(3000, 5000))

                html = await page.content()
                soup = BeautifulSoup(html, "html.parser")

                # Product title
                title_el = soup.select_one("h1, [class*='pdp-product-title']")
                if title_el:
                    detail["name"] = title_el.get_text(strip=True)

                # Price
                price_el = soup.select_one("[class*='pdp-price'], [class*='price-current']")
                if price_el:
                    detail["price"] = price_el.get_text(strip=True)

                # Rating and reviews
                rating_el = soup.select_one("[class*='pdp-review-summary']")
                if rating_el:
                    detail["rating_summary"] = rating_el.get_text(strip=True)

                # Description
                desc_el = soup.select_one("[class*='detail-content'], [class*='pdp-product-detail']")
                if desc_el:
                    detail["description"] = desc_el.get_text(strip=True)

                # Specifications
                specs = {}
                spec_rows = soup.select("[class*='specification'] li, [class*='key-value']")
                for row in spec_rows:
                    key_el = row.select_one("[class*='key'], [class*='name']")
                    val_el = row.select_one("[class*='value']")
                    if key_el and val_el:
                        specs[key_el.get_text(strip=True)] = val_el.get_text(strip=True)
                detail["specifications"] = specs

                # Seller info
                seller_el = soup.select_one("[class*='seller-name'], [class*='store-name']")
                if seller_el:
                    detail["seller_name"] = seller_el.get_text(strip=True)

                # Reviews
                reviews = []
                review_els = soup.select("[class*='review-content'], [class*='item-content']")
                for rev in review_els[:10]:
                    reviews.append(rev.get_text(strip=True))
                detail["reviews_sample"] = reviews

            except Exception as e:
                logger.error(f"Detail scrape failed: {e}")

            await browser.close()
            return detail


# Usage
if __name__ == "__main__":
    # Use Thai proxies for Lazada Thailand
    th_proxies = [
        "user:pass@th-residential1.proxy.com:8080",
        "user:pass@th-residential2.proxy.com:8080",
    ]

    scraper = LazadaScraper(country="th", proxy_list=th_proxies)

    asyncio.run(scraper.search_products(
        keyword="wireless mouse",
        max_pages=3
    ))

    print(f"Found {len(scraper.products)} products on Lazada TH")

    # Get detail for first product
    if scraper.products and scraper.products[0].get("url"):
        detail = asyncio.run(
            scraper.scrape_product_detail(scraper.products[0]["url"])
        )
        print(f"Detail: {detail.get('name')}")

    with open("lazada_th_products.json", "w", encoding="utf-8") as f:
        json.dump(scraper.products, f, indent=2, ensure_ascii=False)

Approach 2: Direct API Requests

If you can supply valid request parameters (Lazada encrypts and rotates its signature parameters per session), the catalog AJAX endpoint returns clean JSON:

import requests
import json
import time
import random

def search_lazada_api(country_domain: str, keyword: str,
                      proxy: str, page: int = 1) -> dict:
    """Attempt Lazada API search (parameters may need updating)."""
    url = f"https://www.{country_domain}/catalog/"
    params = {
        "ajax": "true",
        "q": keyword,
        "page": page
    }

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "application/json",
        "Referer": f"https://www.{country_domain}/",
        "X-Requested-With": "XMLHttpRequest"
    }

    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}

    try:
        response = requests.get(
            url, params=params, headers=headers,
            proxies=proxies, timeout=30
        )
        if response.status_code == 200:
            return response.json()
    except Exception as e:
        print(f"API request failed: {e}")

    return {}
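A usage sketch with proxy rotation and jittered backoff follows. The `mods.listItems` path mirrors the embedded `pageData` structure parsed earlier; treat it as an assumption for the AJAX response too, and note the stub stands in for the real network call:

```python
import random
import time

def search_with_retries(search_fn, country_domain, keyword, proxies, max_retries=3):
    """Call a search function (e.g. search_lazada_api above) with a random
    proxy per attempt, backing off between failures. Returns the first
    non-empty list found under mods.listItems, else an empty list."""
    for attempt in range(max_retries):
        proxy = random.choice(proxies)
        data = search_fn(country_domain, keyword, proxy) or {}
        items = data.get("mods", {}).get("listItems", [])
        if items:
            return items
        # Exponential backoff with jitter before trying a different proxy
        time.sleep(min(2 ** attempt, 8) + random.uniform(0, 0.5))
    return []

# Example with a stub in place of the real network call:
stub = lambda domain, kw, proxy: {"mods": {"listItems": [{"name": kw}]}}
print(len(search_with_retries(stub, "lazada.sg", "usb hub", ["user:pass@sg1:8080"])))  # 1
```

Swapping the stub for `search_lazada_api` gives you a retry loop without changing the caller.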

SEA Proxy Requirements

Like Shopee, Lazada enforces geographic restrictions:

  • Singapore — Requires SG IP addresses for lazada.sg
  • Malaysia — Requires MY IP addresses for lazada.com.my
  • Thailand — Requires TH IP addresses for lazada.co.th
  • Vietnam — Requires VN IP addresses for lazada.vn
  • Philippines — Requires PH IP addresses for lazada.com.ph
  • Indonesia — Requires ID IP addresses for lazada.co.id

Premium residential proxies from these Southeast Asian countries are essential. Mobile proxies provide the highest trust scores, particularly in markets where mobile commerce dominates (Philippines, Indonesia, Vietnam).

Verify your proxy’s country with our IP lookup tool.
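Geolocation can also be checked programmatically before a scrape run. The sketch below routes a request through the proxy to ipinfo.io (one of several public IP-info services; any equivalent works), and the proxy string in the commented example is a placeholder:

```python
import requests

def proxy_country(proxy: str, timeout: int = 15) -> str:
    """Return the ISO country code seen when routing through the proxy."""
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    resp = requests.get("https://ipinfo.io/json", proxies=proxies, timeout=timeout)
    resp.raise_for_status()
    return resp.json().get("country", "").upper()

def matches_target(seen_country: str, target: str) -> bool:
    """True if the proxy exit country matches the Lazada site you target."""
    return seen_country.upper() == target.upper()

# Example (placeholder proxy):
# if not matches_target(proxy_country("user:pass@th1.example.com:8080"), "TH"):
#     raise RuntimeError("Proxy is not a Thai exit; lazada.co.th will likely block it")
```

Running this check at startup is cheaper than discovering mid-run that half your pool exits in the wrong country.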

Handling Alibaba’s Anti-Bot Defenses

Lazada’s Alibaba-backed security requires specific countermeasures:

Browser Fingerprint Consistency

async def create_consistent_context(playwright, proxy):
    """Create a browser context with consistent fingerprint."""
    browser = await playwright.chromium.launch(
        headless=True,
        proxy=proxy,
        args=[
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process"
        ]
    )

    context = await browser.new_context(
        viewport={"width": 1920, "height": 1080},
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="Asia/Singapore",  # Match proxy location
        color_scheme="light"
    )

    # Override navigator.webdriver to avoid detection
    await context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        });
    """)

    return browser, context

Handling Slider CAPTCHAs

Lazada’s slider CAPTCHA is notoriously difficult to solve automatically. Options include:

  1. CAPTCHA solving services — Integrate with services like 2Captcha or Anti-Captcha
  2. Prevention — Use slower request rates and better proxy quality to avoid triggering CAPTCHAs
  3. Session caching — Save and reuse valid session cookies to minimize new session creation
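Session caching (option 3) maps onto Playwright's `storage_state`, which serializes cookies and local storage to a JSON file. The file path and 30-minute freshness window below are arbitrary choices:

```python
import os
import time

STATE_PATH = "lazada_session.json"  # arbitrary cache location
MAX_AGE_SECONDS = 30 * 60           # treat sessions older than 30 min as stale

def session_is_fresh(path: str = STATE_PATH, max_age: int = MAX_AGE_SECONDS) -> bool:
    """A cached session is reusable if the file exists and is recent."""
    return os.path.exists(path) and (time.time() - os.path.getmtime(path)) < max_age

async def open_context(browser):
    """Reuse a cached session when fresh, otherwise start clean."""
    if session_is_fresh():
        return await browser.new_context(storage_state=STATE_PATH)
    return await browser.new_context()

async def save_session(context):
    """Persist the current session after pages have loaded successfully."""
    await context.storage_state(path=STATE_PATH)
```

Call `save_session` only after a successful page load, so you never cache a challenged or blocked session.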

Recommended Proxy Type

For Lazada scraping:

  • SEA residential proxies — Required for each target country
  • Mobile proxies — Highest trust level, especially in mobile-first markets like PH and ID
  • Sticky sessions (5-10 minutes) — Lazada tracks session consistency; maintain the same IP for multi-page workflows
  • Premium providers — Alibaba’s bot detection scores IP reputation heavily. Use premium proxy providers with clean SEA IP pools.
  • Minimum pool of 50 IPs per country — Lazada blocks aggressively; you need a large rotation pool
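The sticky-session recommendation can be sketched as a small pool wrapper that holds one proxy for a fixed window before rotating; the window length is the tunable from the list above, and the injectable clock exists only to make the logic testable:

```python
import random
import time

class StickyProxyPool:
    """Hold one proxy for a sticky window (Lazada tracks session
    consistency), then rotate to a fresh one."""

    def __init__(self, proxies: list[str], window_seconds: int = 300,
                 clock=time.monotonic):
        self.proxies = proxies
        self.window = window_seconds
        self.clock = clock            # injectable for testing
        self.current = None
        self.acquired_at = 0.0

    def get(self) -> str:
        now = self.clock()
        if self.current is None or now - self.acquired_at >= self.window:
            # Window expired (or first call): pick a new proxy
            self.current = random.choice(self.proxies)
            self.acquired_at = now
        return self.current
```

Every request inside one multi-page workflow then calls `pool.get()` and sees the same IP until the window lapses.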

Estimate your multi-country proxy costs with our proxy cost calculator.

Troubleshooting

Problem: Getting redirected to CAPTCHA page

  • Reduce request frequency to 1 request every 5-10 seconds.
  • Use a headless browser instead of direct HTTP requests.
  • Switch to higher-quality residential or mobile proxies.
  • Add the --disable-blink-features=AutomationControlled flag to your browser launch.

Problem: Search results page returns no products

  • Verify your proxy IP is from the correct country.
  • Ensure JavaScript is fully rendered by waiting longer after page load.
  • Check for the window.pageData JSON in script tags — products may be in embedded data even if not visually rendered.

Problem: Product detail pages are empty or show error

  • Lazada product URLs often use protocol-relative format (starting with //). Prepend https: to these URLs.
  • Some product pages redirect to login. Use fresh sessions with clean cookies.

Problem: Price data appears inconsistent

  • Lazada has flash sales, vouchers, and dynamic pricing. Prices may change between requests.
  • Look for both price and originalPrice fields to capture discount information.

Problem: Reviews not loading on detail pages

  • Reviews are loaded via separate AJAX requests. Scroll down to the review section and wait for it to load.
  • Alternatively, intercept the review API endpoint from network traffic.
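Interception can be done with Playwright's network events. The URL substring match below is an assumption; confirm the real review endpoint in your browser's network tab first:

```python
# Sketch of capturing review API responses via Playwright network events.
def is_review_response(url: str) -> bool:
    """Heuristic match for review AJAX calls -- adjust after inspecting
    the real endpoint in the network tab."""
    u = url.lower()
    return "review" in u and ("ajax" in u or "api" in u)

async def collect_reviews(page, product_url: str) -> list:
    """Navigate to a product page and capture any review JSON responses."""
    captured = []

    async def on_response(response):
        if is_review_response(response.url):
            try:
                captured.append(await response.json())
            except Exception:
                pass  # non-JSON body; ignore

    page.on("response", on_response)
    await page.goto(product_url, wait_until="networkidle")
    # Scroll toward the review section to trigger the AJAX load
    await page.mouse.wheel(0, 4000)
    await page.wait_for_timeout(3000)
    return captured
```

Registering the listener before `goto` matters: review requests can fire during initial load, not just after scrolling.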

Legal and Ethical Considerations

Scraping Lazada involves legal considerations across multiple SEA jurisdictions:

  • Lazada Terms of Service — Prohibit scraping, data mining, and automated access. As an Alibaba subsidiary, Lazada has significant legal resources.
  • Computer Misuse Act (Singapore) — Singapore’s CMA broadly criminalizes unauthorized computer access, which could extend to scraping if Lazada argues their anti-bot measures constitute access controls.
  • Multi-jurisdiction compliance — Operating across six countries means compliance with six different legal frameworks. Thailand’s Computer Crime Act, Vietnam’s Cybersecurity Law, and Indonesia’s Electronic Information and Transactions Law all have provisions that could apply.
  • Personal data — Seller names, review author names, and location data are personal information under various SEA data protection laws.
  • Alibaba litigation history — Alibaba has pursued legal action against scrapers of its Chinese platforms (Taobao, Tmall). The same approach could be applied to Lazada.
  • Commercial use — Using scraped data for competitive pricing or market intelligence could raise unfair competition claims.

Consult legal counsel familiar with Southeast Asian e-commerce and data protection law before conducting commercial Lazada scraping.

Conclusion

Lazada is one of the more challenging e-commerce platforms to scrape due to Alibaba’s advanced anti-bot technology. The headless browser approach with Playwright provides the best results, especially when combined with consistent browser fingerprinting and geo-targeted SEA proxies. Focus on extracting embedded JSON data from window.pageData rather than parsing HTML, as this is more reliable and contains richer product information. Start with a single country, refine your approach against Alibaba’s defenses, then expand to additional markets.

