How to Scrape Temu Product Data 2026

Temu has rapidly become one of the fastest-growing e-commerce platforms globally since its 2022 launch, reaching over 100 million monthly active users. Owned by PDD Holdings (the parent company of Pinduoduo), Temu offers ultra-low prices on consumer goods, making it a prime target for competitive intelligence, dropshipping research, and price monitoring.

This guide covers how to scrape Temu product data effectively with Python, navigate their anti-bot defenses, and build a production-ready scraping pipeline.

What Data Can You Extract from Temu?

Temu product listings contain valuable e-commerce intelligence:

  • Product titles and descriptions
  • Pricing (with flash deals and bulk discounts)
  • Product images (multiple angles)
  • Review counts and ratings
  • Sold count / popularity indicators
  • Category hierarchy
  • Product specifications
  • Shipping details and estimated delivery
  • Seller/brand information
  • Related product recommendations

Example JSON Output

{
  "product_id": "601099517482108",
  "title": "Men's Casual Running Shoes Breathable Mesh Sneakers",
  "price": {
    "current": 8.99,
    "original": 35.99,
    "discount_percentage": 75,
    "currency": "USD"
  },
  "rating": 4.6,
  "review_count": 12543,
  "sold_count": "50K+",
  "category_path": ["Shoes", "Men's Shoes", "Sneakers"],
  "images": [
    "https://img.kwcdn.com/product/image1.jpg",
    "https://img.kwcdn.com/product/image2.jpg"
  ],
  "specifications": {
    "Material": "Mesh, Rubber sole",
    "Closure": "Lace-up",
    "Season": "All seasons"
  },
  "shipping": {
    "free_shipping": true,
    "estimated_delivery": "7-15 business days"
  },
  "variations": [
    {"type": "Color", "options": ["Black", "White", "Gray", "Blue"]},
    {"type": "Size", "options": ["US 7", "US 8", "US 9", "US 10", "US 11"]}
  ],
  "url": "https://www.temu.com/product-detail-601099517482108.html"
}
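Note that fields like sold_count arrive as display strings ("50K+") rather than numbers. Before analysis you will usually want to normalize them; a quick helper (illustrative only, not part of any Temu API) might look like:

```python
import re

def parse_sold_count(text):
    """Convert display strings like '50K+' or '1.2M' to an integer lower bound."""
    if not text:
        return None
    match = re.match(r"(\d+(?:\.\d+)?)\s*([KM]?)", text.strip(), re.IGNORECASE)
    if not match:
        return None
    value = float(match.group(1))
    multiplier = {"": 1, "K": 1_000, "M": 1_000_000}[match.group(2).upper()]
    return int(value * multiplier)

print(parse_sold_count("50K+"))   # 50000
print(parse_sold_count("1.2M"))   # 1200000
```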

Prerequisites

pip install requests beautifulsoup4 selenium undetected-chromedriver fake-useragent lxml

Temu employs very aggressive anti-bot protections. Residential or mobile proxies are effectively mandatory for any scraping operation beyond basic testing.

Method 1: Scraping Temu with Requests

Temu’s website is heavily JavaScript-dependent, but some data can be extracted from initial page loads and API endpoints.

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import json
import re
import time
import random

class TemuScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.base_url = "https://www.temu.com"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Referer": "https://www.temu.com/",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "Connection": "keep-alive",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, max_pages=3):
        """Search Temu and extract product data."""
        all_products = []

        for page in range(1, max_pages + 1):
            url = f"{self.base_url}/search_result.html?search_key={requests.utils.quote(query)}&page={page}"

            try:
                response = self.session.get(
                    url,
                    headers=self._get_headers(),
                    proxies=self._get_proxies(),
                    timeout=30
                )
                response.raise_for_status()

                products = self._extract_products_from_html(response.text)
                all_products.extend(products)
                print(f"Page {page}: Found {len(products)} products")

                time.sleep(random.uniform(3, 7))

            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue

        return all_products

    def _extract_products_from_html(self, html):
        """Extract product data from page source."""
        products = []

        # Try to find embedded JSON data
        patterns = [
            r'window\.__INITIAL_STATE__\s*=\s*({.*?});',
            r'window\.__rawData\s*=\s*({.*?});',
            r'"itemList"\s*:\s*(\[.*?\])',
        ]

        for pattern in patterns:
            match = re.search(pattern, html, re.DOTALL)
            if match:
                try:
                    data = json.loads(match.group(1))
                    # Parse based on structure found
                    if isinstance(data, list):
                        for item in data:
                            products.append(self._parse_item(item))
                    elif isinstance(data, dict):
                        items = self._find_items_in_dict(data)
                        for item in items:
                            products.append(self._parse_item(item))
                    break
                except json.JSONDecodeError:
                    continue

        # Fallback to HTML parsing
        if not products:
            soup = BeautifulSoup(html, "lxml")
            cards = soup.select("[class*='product-card'], [class*='goods-item']")
            for card in cards:
                try:
                    title = card.select_one("[class*='title']")
                    price = card.select_one("[class*='price']")
                    link = card.select_one("a[href]")
                    products.append({
                        "title": title.get_text(strip=True) if title else None,
                        "price": price.get_text(strip=True) if price else None,
                        "url": self.base_url + link["href"] if link else None,
                    })
                except Exception:
                    continue

        return products

    def _parse_item(self, item):
        """Parse a single product item from JSON data."""
        return {
            "product_id": item.get("goodsId") or item.get("productId"),
            "title": item.get("goodsName") or item.get("title"),
            "price": item.get("salePrice") or item.get("price"),
            "original_price": item.get("marketPrice") or item.get("originalPrice"),
            "image": item.get("image") or item.get("thumbUrl"),
            "rating": item.get("avgRating"),
            "sold_count": item.get("salesTip"),
        }

    def _find_items_in_dict(self, data, key_names=None):
        """Recursively find product items in nested dict."""
        if key_names is None:
            key_names = ["items", "goodsList", "products", "itemList"]
        results = []

        for key, value in data.items():
            if key in key_names and isinstance(value, list):
                results.extend(value)
            elif isinstance(value, dict):
                results.extend(self._find_items_in_dict(value, key_names))

        return results

    def scrape_product_detail(self, product_url):
        """Scrape detailed product information."""
        try:
            response = self.session.get(
                product_url,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()

            soup = BeautifulSoup(response.text, "lxml")

            # Extract JSON-LD structured data
            for script in soup.find_all("script", type="application/ld+json"):
                try:
                    data = json.loads(script.string or "")
                    if data.get("@type") == "Product":
                        return {
                            "title": data.get("name"),
                            "description": data.get("description"),
                            "price": data.get("offers", {}).get("price"),
                            "currency": data.get("offers", {}).get("priceCurrency"),
                            "rating": data.get("aggregateRating", {}).get("ratingValue"),
                            "review_count": data.get("aggregateRating", {}).get("reviewCount"),
                            "image": data.get("image"),
                            "brand": data.get("brand", {}).get("name"),
                        }
                except json.JSONDecodeError:
                    continue

            return None

        except requests.RequestException as e:
            print(f"Error scraping product: {e}")
            return None


# Usage
if __name__ == "__main__":
    scraper = TemuScraper(proxy_url="http://user:pass@proxy:port")
    results = scraper.search_products("wireless earbuds", max_pages=2)
    print(f"Found {len(results)} products")
    print(json.dumps(results[:3], indent=2))
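Once search_products returns, you will usually want the results on disk. A minimal export helper (the save_products name and file paths here are just examples) could be:

```python
import csv
import json

def save_products(products, json_path="temu_products.json", csv_path="temu_products.csv"):
    """Write scraped products to JSON (full fidelity) and CSV (flat columns)."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(products, f, indent=2, ensure_ascii=False)

    if products:
        # Union of all keys, since fallback HTML parsing may yield sparser dicts
        fieldnames = sorted({key for product in products for key in product})
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(products)
```

CSV is convenient for spreadsheets, but keep the JSON copy too: nested fields like price objects and variations do not flatten cleanly.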

Method 2: Scraping Temu with Selenium

Since Temu relies heavily on client-side rendering, Selenium is usually the more reliable approach.

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
import time
import random

class TemuSeleniumScraper:
    def __init__(self, proxy=None):
        options = uc.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")

        if proxy:
            options.add_argument(f"--proxy-server={proxy}")

        self.driver = uc.Chrome(options=options)

    def search_products(self, query, max_pages=3):
        """Search Temu and extract products."""
        products = []

        for page in range(1, max_pages + 1):
            url = f"https://www.temu.com/search_result.html?search_key={query}&page={page}"
            self.driver.get(url)

            # Wait for product cards
            try:
                WebDriverWait(self.driver, 20).until(
                    EC.presence_of_element_located(
                        (By.CSS_SELECTOR, "[class*='product'], [class*='goods']")
                    )
                )
            except Exception:
                print(f"Timeout on page {page}")
                continue

            # Scroll to load all products
            self._scroll_page()

            # Extract product data via JS
            page_products = self.driver.execute_script("""
                const results = [];
                const cards = document.querySelectorAll('[class*="product-card"], [class*="goods-item"]');
                cards.forEach(card => {
                    const title = card.querySelector('[class*="title"]');
                    const price = card.querySelector('[class*="price"]');
                    const link = card.querySelector('a[href]');
                    const img = card.querySelector('img');

                    results.push({
                        title: title ? title.innerText.trim() : null,
                        price: price ? price.innerText.trim() : null,
                        url: link ? link.href : null,
                        image: img ? img.src : null
                    });
                });
                return results;
            """)

            products.extend(page_products)
            print(f"Page {page}: {len(page_products)} products")
            time.sleep(random.uniform(4, 8))

        return products

    def scrape_product_page(self, url):
        """Scrape individual product page."""
        self.driver.get(url)
        time.sleep(3)

        # Wait for main content
        try:
            WebDriverWait(self.driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "[class*='detail'], [class*='product-info']"))
            )
        except Exception:
            return None

        product = self.driver.execute_script("""
            const result = {};

            // Title
            const title = document.querySelector('h1, [class*="goods-name"]');
            result.title = title ? title.innerText.trim() : null;

            // Price
            const price = document.querySelector('[class*="sale-price"], [class*="current-price"]');
            result.price = price ? price.innerText.trim() : null;

            // Original price
            const origPrice = document.querySelector('[class*="origin-price"], [class*="market-price"]');
            result.original_price = origPrice ? origPrice.innerText.trim() : null;

            // Rating
            const rating = document.querySelector('[class*="star-rating"], [class*="rating"]');
            result.rating = rating ? rating.innerText.trim() : null;

            // Review count
            const reviews = document.querySelector('[class*="review-count"]');
            result.review_count = reviews ? reviews.innerText.trim() : null;

            // Sold count
            const sold = document.querySelector('[class*="sold"], [class*="sales"]');
            result.sold = sold ? sold.innerText.trim() : null;

            // Description
            const desc = document.querySelector('[class*="description"], [class*="detail-info"]');
            result.description = desc ? desc.innerText.substring(0, 500) : null;

            return result;
        """)

        return product

    def _scroll_page(self):
        """Scroll page to trigger lazy loading."""
        for _ in range(5):
            self.driver.execute_script("window.scrollBy(0, 800);")
            time.sleep(1)

    def close(self):
        self.driver.quit()


# Usage
scraper = TemuSeleniumScraper(proxy="http://proxy:port")
results = scraper.search_products("phone cases", max_pages=2)
print(json.dumps(results[:5], indent=2))
scraper.close()

Handling Temu’s Anti-Bot Protections

Temu has some of the most aggressive anti-scraping measures in e-commerce:

1. Advanced Fingerprinting

Temu uses sophisticated browser fingerprinting that checks:

  • WebGL rendering
  • Canvas fingerprint
  • Audio context
  • Installed fonts and plugins

Use undetected-chromedriver to minimize detection:

import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")
driver = uc.Chrome(options=options)

2. Dynamic HTML Structure

Temu frequently changes CSS class names and page structures. Build resilient selectors:

# Bad: fragile class-based selector
# soup.select("div.css-1abc2de")

# Good: attribute-based or content-based selectors
soup.select("[data-testid*='product']")
soup.find("span", string=re.compile(r'\$\d+\.\d{2}'))
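You can take this a step further with a small fallback chain that tries several selectors in priority order, so one renamed class doesn't break the whole parser (select_first is an illustrative helper, not a BeautifulSoup built-in):

```python
from bs4 import BeautifulSoup

def select_first(soup, selectors):
    """Try a list of CSS selectors in priority order; return the first match."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element:
            return element
    return None

html = '<div><span data-price="1">$8.99</span></div>'
soup = BeautifulSoup(html, "html.parser")
price = select_first(soup, ["[class*='sale-price']", "[data-price]", "span"])
print(price.get_text())  # $8.99
```

Order the selectors from most specific to most generic, and log which one matched so you notice when Temu ships a markup change.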

3. Request Rate Detection

Temu monitors request patterns aggressively. Use variable timing:

def adaptive_delay(request_num):
    """Progressive delay that increases over time."""
    base = 3 + (request_num // 20) * 2  # Increase base every 20 requests
    jitter = random.uniform(0, base * 0.5)
    return base + jitter

Proxy Recommendations for Temu

Proxy Type             Success Rate   Recommendation
Mobile Proxies         85-95%         Best option for Temu
Residential Rotating   60-75%         Good for moderate volume
ISP Proxies            50-65%         Decent for small batches
Datacenter             10-20%         Not recommended

Temu’s anti-bot system is trained to detect datacenter IPs. Use mobile proxies or rotating residential proxies for reliable access.
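If you are juggling a pool of residential or mobile proxies, a simple rotator keeps requests spread across IPs and sidelines dead ones (ProxyRotator is a sketch of my own, not a library class):

```python
from itertools import cycle

class ProxyRotator:
    """Round-robin over a pool of proxy URLs, skipping ones marked bad."""

    def __init__(self, proxy_urls):
        self.bad = set()
        self.pool = cycle(proxy_urls)
        self.size = len(proxy_urls)

    def next_proxy(self):
        # Scan at most one full cycle looking for a healthy proxy
        for _ in range(self.size):
            proxy = next(self.pool)
            if proxy not in self.bad:
                return {"http": proxy, "https": proxy}
        raise RuntimeError("All proxies marked bad")

    def mark_bad(self, proxies):
        self.bad.add(proxies["http"])
```

The returned dict plugs straight into requests, e.g. `session.get(url, proxies=rotator.next_proxy())`; call mark_bad when a proxy starts returning blocks or timeouts.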

Legal Considerations

  1. Terms of Service: Temu’s ToS strictly prohibits automated scraping and data extraction.
  2. Data Privacy: Chinese data protection regulations (PIPL) and international privacy laws apply.
  3. Copyright: Product images and descriptions are copyrighted content.
  4. Competition Law: Using scraped pricing data for price-fixing or anti-competitive purposes is illegal.
  5. Jurisdiction: Temu operates globally but is headquartered in China, adding jurisdictional complexity.

Always consult our web scraping compliance guide before starting a scraping project.

Rate Limiting Best Practices

  1. Start with 5-8 second delays between requests
  2. Take breaks: Pause for 60-120 seconds every 30 requests
  3. Rotate everything: IPs, user agents, and session cookies
  4. Respect 429 responses: Back off exponentially when rate-limited
  5. Monitor success rates: If they drop below 70%, slow down significantly
  6. Limit daily volume: Keep under 1,000 requests per IP per day
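The delay and 429-handling points above can be combined into an exponential-backoff fetch wrapper (a sketch under those assumptions; backoff_delay and fetch_with_backoff are illustrative names, not library functions):

```python
import random
import time

def backoff_delay(attempt, base=5.0, cap=300.0):
    """Exponential backoff with full jitter: base * 2^attempt, capped and randomized."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def fetch_with_backoff(session, url, max_attempts=5, **kwargs):
    """Retry on HTTP 429, sleeping longer after each rate-limited attempt."""
    response = None
    for attempt in range(max_attempts):
        response = session.get(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(backoff_delay(attempt))
    return response  # Still 429 after max_attempts; caller decides what to do
```

Full jitter (randomizing over the whole window rather than adding a small offset) keeps multiple workers from retrying in lockstep.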

Conclusion

Temu is one of the more challenging e-commerce platforms to scrape due to its aggressive anti-bot protections. Success requires a combination of undetected-chromedriver, high-quality residential or mobile proxies, and careful request pacing.

For the best scraping infrastructure, explore dataresearchtools.com for proxy comparisons and setup guides. Our e-commerce proxy guide covers additional strategies for scraping competitive marketplaces.
