How to Scrape Temu Product Data in 2026
Temu has rapidly become one of the fastest-growing e-commerce platforms globally since its 2022 launch, reaching over 100 million monthly active users. Owned by PDD Holdings (the parent company of Pinduoduo), Temu offers ultra-low prices on consumer goods, making it a critical target for competitive intelligence, dropshipping research, and price monitoring.
This guide covers how to scrape Temu product data effectively with Python, navigate their anti-bot defenses, and build a production-ready scraping pipeline.
What Data Can You Extract from Temu?
Temu product listings contain valuable e-commerce intelligence:
- Product titles and descriptions
- Pricing (with flash deals and bulk discounts)
- Product images (multiple angles)
- Review counts and ratings
- Sold count / popularity indicators
- Category hierarchy
- Product specifications
- Shipping details and estimated delivery
- Seller/brand information
- Related product recommendations
Example JSON Output
```json
{
  "product_id": "601099517482108",
  "title": "Men's Casual Running Shoes Breathable Mesh Sneakers",
  "price": {
    "current": 8.99,
    "original": 35.99,
    "discount_percentage": 75,
    "currency": "USD"
  },
  "rating": 4.6,
  "review_count": 12543,
  "sold_count": "50K+",
  "category_path": ["Shoes", "Men's Shoes", "Sneakers"],
  "images": [
    "https://img.kwcdn.com/product/image1.jpg",
    "https://img.kwcdn.com/product/image2.jpg"
  ],
  "specifications": {
    "Material": "Mesh, Rubber sole",
    "Closure": "Lace-up",
    "Season": "All seasons"
  },
  "shipping": {
    "free_shipping": true,
    "estimated_delivery": "7-15 business days"
  },
  "variations": [
    {"type": "Color", "options": ["Black", "White", "Gray", "Blue"]},
    {"type": "Size", "options": ["US 7", "US 8", "US 9", "US 10", "US 11"]}
  ],
  "url": "https://www.temu.com/product-detail-601099517482108.html"
}
```

Prerequisites

```bash
pip install requests beautifulsoup4 selenium undetected-chromedriver fake-useragent lxml
```

Temu has very aggressive anti-bot protections. Residential proxies are mandatory for any scraping operation beyond basic testing.
Method 1: Scraping Temu with Requests
Temu’s website is heavily JavaScript-dependent, but some data can be extracted from initial page loads and API endpoints.
```python
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import json
import re
import time
import random


class TemuScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.base_url = "https://www.temu.com"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Referer": "https://www.temu.com/",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "Connection": "keep-alive",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, max_pages=3):
        """Search Temu and extract product data."""
        all_products = []
        for page in range(1, max_pages + 1):
            url = f"{self.base_url}/search_result.html?search_key={query}&page={page}"
            try:
                response = self.session.get(
                    url,
                    headers=self._get_headers(),
                    proxies=self._get_proxies(),
                    timeout=30
                )
                response.raise_for_status()
                products = self._extract_products_from_html(response.text)
                all_products.extend(products)
                print(f"Page {page}: Found {len(products)} products")
                time.sleep(random.uniform(3, 7))
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
        return all_products

    def _extract_products_from_html(self, html):
        """Extract product data from page source."""
        products = []
        # Try to find embedded JSON data
        patterns = [
            r'window\.__INITIAL_STATE__\s*=\s*({.*?});',
            r'window\.__rawData\s*=\s*({.*?});',
            r'"itemList"\s*:\s*(\[.*?\])',
        ]
        for pattern in patterns:
            match = re.search(pattern, html, re.DOTALL)
            if match:
                try:
                    data = json.loads(match.group(1))
                    # Parse based on structure found
                    if isinstance(data, list):
                        for item in data:
                            products.append(self._parse_item(item))
                    elif isinstance(data, dict):
                        items = self._find_items_in_dict(data)
                        for item in items:
                            products.append(self._parse_item(item))
                    break
                except json.JSONDecodeError:
                    continue
        # Fallback to HTML parsing
        if not products:
            soup = BeautifulSoup(html, "lxml")
            cards = soup.select("[class*='product-card'], [class*='goods-item']")
            for card in cards:
                try:
                    title = card.select_one("[class*='title']")
                    price = card.select_one("[class*='price']")
                    link = card.select_one("a[href]")
                    products.append({
                        "title": title.get_text(strip=True) if title else None,
                        "price": price.get_text(strip=True) if price else None,
                        "url": self.base_url + link["href"] if link else None,
                    })
                except Exception:
                    continue
        return products

    def _parse_item(self, item):
        """Parse a single product item from JSON data."""
        return {
            "product_id": item.get("goodsId") or item.get("productId"),
            "title": item.get("goodsName") or item.get("title"),
            "price": item.get("salePrice") or item.get("price"),
            "original_price": item.get("marketPrice") or item.get("originalPrice"),
            "image": item.get("image") or item.get("thumbUrl"),
            "rating": item.get("avgRating"),
            "sold_count": item.get("salesTip"),
        }

    def _find_items_in_dict(self, data, key_names=None):
        """Recursively find product items in nested dict."""
        if key_names is None:
            key_names = ["items", "goodsList", "products", "itemList"]
        results = []
        for key, value in data.items():
            if key in key_names and isinstance(value, list):
                results.extend(value)
            elif isinstance(value, dict):
                results.extend(self._find_items_in_dict(value, key_names))
        return results

    def scrape_product_detail(self, product_url):
        """Scrape detailed product information."""
        try:
            response = self.session.get(
                product_url,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "lxml")
            # Extract JSON-LD structured data
            for script in soup.find_all("script", type="application/ld+json"):
                try:
                    data = json.loads(script.string)
                    if data.get("@type") == "Product":
                        return {
                            "title": data.get("name"),
                            "description": data.get("description"),
                            "price": data.get("offers", {}).get("price"),
                            "currency": data.get("offers", {}).get("priceCurrency"),
                            "rating": data.get("aggregateRating", {}).get("ratingValue"),
                            "review_count": data.get("aggregateRating", {}).get("reviewCount"),
                            "image": data.get("image"),
                            "brand": data.get("brand", {}).get("name"),
                        }
                except json.JSONDecodeError:
                    continue
            return None
        except requests.RequestException as e:
            print(f"Error scraping product: {e}")
            return None


# Usage
if __name__ == "__main__":
    scraper = TemuScraper(proxy_url="http://user:pass@proxy:port")
    results = scraper.search_products("wireless earbuds", max_pages=2)
    print(f"Found {len(results)} products")
    print(json.dumps(results[:3], indent=2))
```

Method 2: Scraping Temu with Selenium
Since Temu relies heavily on client-side rendering, Selenium is usually the more reliable approach.
```python
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
import time
import random


class TemuSeleniumScraper:
    def __init__(self, proxy=None):
        options = uc.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        if proxy:
            options.add_argument(f"--proxy-server={proxy}")
        self.driver = uc.Chrome(options=options)

    def search_products(self, query, max_pages=3):
        """Search Temu and extract products."""
        products = []
        for page in range(1, max_pages + 1):
            url = f"https://www.temu.com/search_result.html?search_key={query}&page={page}"
            self.driver.get(url)
            # Wait for product cards
            try:
                WebDriverWait(self.driver, 20).until(
                    EC.presence_of_element_located(
                        (By.CSS_SELECTOR, "[class*='product'], [class*='goods']")
                    )
                )
            except Exception:
                print(f"Timeout on page {page}")
                continue
            # Scroll to load all products
            self._scroll_page()
            # Extract product data via JS
            page_products = self.driver.execute_script("""
                const results = [];
                const cards = document.querySelectorAll('[class*="product-card"], [class*="goods-item"]');
                cards.forEach(card => {
                    const title = card.querySelector('[class*="title"]');
                    const price = card.querySelector('[class*="price"]');
                    const link = card.querySelector('a[href]');
                    const img = card.querySelector('img');
                    results.push({
                        title: title ? title.innerText.trim() : null,
                        price: price ? price.innerText.trim() : null,
                        url: link ? link.href : null,
                        image: img ? img.src : null
                    });
                });
                return results;
            """)
            products.extend(page_products)
            print(f"Page {page}: {len(page_products)} products")
            time.sleep(random.uniform(4, 8))
        return products

    def scrape_product_page(self, url):
        """Scrape individual product page."""
        self.driver.get(url)
        time.sleep(3)
        # Wait for main content
        try:
            WebDriverWait(self.driver, 15).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "[class*='detail'], [class*='product-info']")
                )
            )
        except Exception:
            return None
        product = self.driver.execute_script("""
            const result = {};
            // Title
            const title = document.querySelector('h1, [class*="goods-name"]');
            result.title = title ? title.innerText.trim() : null;
            // Price
            const price = document.querySelector('[class*="sale-price"], [class*="current-price"]');
            result.price = price ? price.innerText.trim() : null;
            // Original price
            const origPrice = document.querySelector('[class*="origin-price"], [class*="market-price"]');
            result.original_price = origPrice ? origPrice.innerText.trim() : null;
            // Rating
            const rating = document.querySelector('[class*="star-rating"], [class*="rating"]');
            result.rating = rating ? rating.innerText.trim() : null;
            // Review count
            const reviews = document.querySelector('[class*="review-count"]');
            result.review_count = reviews ? reviews.innerText.trim() : null;
            // Sold count
            const sold = document.querySelector('[class*="sold"], [class*="sales"]');
            result.sold = sold ? sold.innerText.trim() : null;
            // Description
            const desc = document.querySelector('[class*="description"], [class*="detail-info"]');
            result.description = desc ? desc.innerText.substring(0, 500) : null;
            return result;
        """)
        return product

    def _scroll_page(self):
        """Scroll page to trigger lazy loading."""
        for _ in range(5):
            self.driver.execute_script("window.scrollBy(0, 800);")
            time.sleep(1)

    def close(self):
        self.driver.quit()


# Usage
scraper = TemuSeleniumScraper(proxy="http://proxy:port")
results = scraper.search_products("phone cases", max_pages=2)
print(json.dumps(results[:5], indent=2))
scraper.close()
```

Handling Temu’s Anti-Bot Protections
Temu has some of the most aggressive anti-scraping measures in e-commerce:
1. Advanced Fingerprinting
Temu uses sophisticated browser fingerprinting that checks:
- WebGL rendering
- Canvas fingerprint
- Audio context
- Installed fonts and plugins
Use undetected-chromedriver to minimize detection:
```python
import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")
driver = uc.Chrome(options=options)
```

2. Dynamic HTML Structure
Temu frequently changes CSS class names and page structures. Build resilient selectors:
```python
# Bad: fragile class-based selector
# soup.select("div.css-1abc2de")

# Good: attribute-based or content-based selectors
soup.select("[data-testid*='product']")
soup.find("span", string=re.compile(r'\$\d+\.\d{2}'))
```

3. Request Rate Detection
Temu monitors request patterns aggressively. Use variable timing:
```python
def adaptive_delay(request_num):
    """Progressive delay that increases over time."""
    base = 3 + (request_num // 20) * 2  # Increase base every 20 requests
    jitter = random.uniform(0, base * 0.5)
    return base + jitter

# Usage: sleep before request i in your scraping loop
# time.sleep(adaptive_delay(i))
```

Proxy Recommendations for Temu
| Proxy Type | Success Rate | Recommendation |
|---|---|---|
| Mobile Proxies | 85-95% | Best option for Temu |
| Residential Rotating | 60-75% | Good for moderate volume |
| ISP Proxies | 50-65% | Decent for small batches |
| Datacenter | 10-20% | Not recommended |
Temu’s anti-bot system is trained to detect datacenter IPs. Use mobile proxies or rotating residential proxies for reliable access.
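Rotation can be as simple as drawing a fresh endpoint per request. A minimal sketch, assuming a hypothetical `PROXY_POOL` of gateway URLs from your proxy provider (the URLs below are placeholders):

```python
import random

# Hypothetical residential gateway URLs -- substitute your provider's endpoints.
PROXY_POOL = [
    "http://user:pass@res-gw-1.example:8000",
    "http://user:pass@res-gw-2.example:8000",
    "http://user:pass@res-gw-3.example:8000",
]


def next_proxy(pool=PROXY_POOL):
    """Return a requests-style proxies dict using a randomly chosen
    endpoint, so consecutive requests exit from different IPs."""
    url = random.choice(pool)
    return {"http": url, "https": url}
```

Pass `proxies=next_proxy()` into each `session.get(...)` call instead of reusing one fixed proxy for the whole session.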
Legal Considerations
- Terms of Service: Temu’s ToS strictly prohibits automated scraping and data extraction.
- Data Privacy: Chinese data protection regulations (PIPL) and international privacy laws apply.
- Copyright: Product images and descriptions are copyrighted content.
- Competition Law: Using scraped pricing data for price-fixing or anti-competitive purposes is illegal.
- Jurisdiction: Temu operates globally but is headquartered in China, adding jurisdictional complexity.
Always consult our web scraping compliance guide before starting a scraping project.
Rate Limiting Best Practices
- Start with 5-8 second delays between requests
- Take breaks: Pause for 60-120 seconds every 30 requests
- Rotate everything: IPs, user agents, and session cookies
- Respect 429 responses: Back off exponentially when rate-limited
- Monitor success rates: If they drop below 70%, slow down significantly
- Limit daily volume: Keep under 1,000 requests per IP per day
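The back-off rule from the list above can be sketched as a small helper. The constants here are illustrative defaults, not values Temu publishes:

```python
import random


def backoff_delay(attempt, base=5.0, cap=120.0):
    """Exponential backoff with jitter for 429 responses:
    roughly 5s, 10s, 20s, ... doubling per retry, capped at `cap`.
    Jitter (up to 25% extra) avoids synchronized retry bursts."""
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay * 0.25)
```

In a retry loop, call `time.sleep(backoff_delay(attempt))` after each 429 before retrying, and reset `attempt` to zero once a request succeeds.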
Conclusion
Temu is one of the more challenging e-commerce platforms to scrape due to its aggressive anti-bot protections. Success requires a combination of undetected-chromedriver, high-quality residential or mobile proxies, and careful request pacing.
For the best scraping infrastructure, explore dataresearchtools.com for proxy comparisons and setup guides. Our e-commerce proxy guide covers additional strategies for scraping competitive marketplaces.
Related Reading
- How to Scrape AliExpress Product Data
- How to Scrape Amazon Product Reviews in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix