How to Scrape AliExpress Product Data
AliExpress is one of the world’s largest online retail marketplaces, owned by Alibaba Group and connecting international buyers with Chinese manufacturers and sellers. With over 150 million active buyers and millions of product listings, AliExpress is a goldmine for dropshippers, price comparison tools, and e-commerce researchers.
This guide walks you through scraping AliExpress product data with Python, handling their robust anti-bot systems, and building a scalable data extraction pipeline.
What Data Can You Extract from AliExpress?
AliExpress product pages contain rich, structured data:
- Product titles and descriptions
- Pricing (with bulk discounts and flash deals)
- Seller information (store name, rating, years on platform)
- Product ratings and review counts
- Shipping options and costs by destination
- Product specifications and attributes
- Order count and popularity metrics
- Product images and videos
- Variation details (colors, sizes, styles)
Example JSON Output
{
  "product_id": "1005006789012345",
  "title": "Wireless Bluetooth Earbuds TWS Noise Canceling",
  "price": {
    "current": 15.99,
    "original": 39.99,
    "discount_percentage": 60,
    "currency": "USD"
  },
  "rating": 4.7,
  "review_count": 8432,
  "orders": 25600,
  "seller": {
    "store_name": "TechGadget Official Store",
    "store_rating": 96.2,
    "followers": 45000,
    "years_on_platform": 5
  },
  "shipping": {
    "free_shipping": true,
    "estimated_delivery": "15-25 days",
    "ship_from": "China"
  },
  "specifications": {
    "Brand": "Generic",
    "Bluetooth Version": "5.3",
    "Battery Life": "6 hours",
    "Waterproof Rating": "IPX5"
  },
  "categories": ["Consumer Electronics", "Earphones & Headphones", "TWS Earbuds"],
  "url": "https://www.aliexpress.com/item/1005006789012345.html"
}

Prerequisites
Install the required Python packages:
pip install requests beautifulsoup4 selenium webdriver-manager fake-useragent lxml

AliExpress has strong anti-bot protections, so rotating residential proxies are essential for any meaningful scraping operation.
Method 1: Scraping with Requests and BeautifulSoup
AliExpress renders significant content via JavaScript, but search result metadata can be captured from embedded JSON data.
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import json
import re
import time
import random
from urllib.parse import quote_plus


class AliExpressScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.base_url = "https://www.aliexpress.com"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Referer": "https://www.aliexpress.com/",
            "Connection": "keep-alive",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, max_pages=5):
        """Scrape AliExpress search results."""
        all_products = []
        for page in range(1, max_pages + 1):
            # URL-encode the query so multi-word searches produce a valid URL
            url = f"{self.base_url}/wholesale?SearchText={quote_plus(query)}&page={page}"
            try:
                response = self.session.get(
                    url,
                    headers=self._get_headers(),
                    proxies=self._get_proxies(),
                    timeout=30
                )
                response.raise_for_status()
                products = self._extract_search_data(response.text)
                all_products.extend(products)
                print(f"Page {page}: Found {len(products)} products")
                time.sleep(random.uniform(3, 7))
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
        return all_products

    def _extract_search_data(self, html):
        """Extract product data from embedded JavaScript."""
        products = []
        # AliExpress often embeds product data in script tags
        pattern = r'_init_data_\s*=\s*{\s*data:\s*({.*?})\s*}'
        match = re.search(pattern, html, re.DOTALL)
        if match:
            try:
                data = json.loads(match.group(1))
                items = (
                    data.get("data", {})
                    .get("root", {})
                    .get("fields", {})
                    .get("mods", {})
                    .get("itemList", {})
                    .get("content", [])
                )
                for item in items:
                    product = {
                        "product_id": item.get("productId"),
                        "title": item.get("title", {}).get("displayTitle"),
                        "price": item.get("prices", {}).get("salePrice", {}).get("minPrice"),
                        "original_price": item.get("prices", {}).get("originalPrice", {}).get("minPrice"),
                        "rating": item.get("evaluation", {}).get("starRating"),
                        "orders": item.get("trade", {}).get("tradeDesc"),
                        "store_name": item.get("store", {}).get("storeName"),
                        "url": f"https://www.aliexpress.com/item/{item.get('productId')}.html",
                        "image": item.get("image", {}).get("imgUrl"),
                        "free_shipping": item.get("logistics", {}).get("freeShipping", False),
                    }
                    products.append(product)
            except (json.JSONDecodeError, KeyError) as e:
                print(f"Error parsing JSON data: {e}")
        # Fallback: parse HTML directly
        if not products:
            soup = BeautifulSoup(html, "lxml")
            products = self._parse_html_search(soup)
        return products

    def _parse_html_search(self, soup):
        """Fallback HTML parsing for search results."""
        products = []
        cards = soup.select("div[class*='product-card']")
        for card in cards:
            try:
                title_elem = card.select_one("h3, [class*='title']")
                price_elem = card.select_one("[class*='price']")
                link_elem = card.select_one("a[href*='/item/']")
                product = {
                    "title": title_elem.get_text(strip=True) if title_elem else None,
                    "price": price_elem.get_text(strip=True) if price_elem else None,
                    "url": link_elem["href"] if link_elem else None,
                }
                products.append(product)
            except Exception:
                continue
        return products

    def scrape_product_detail(self, product_id):
        """Scrape a single product page for detailed data."""
        url = f"{self.base_url}/item/{product_id}.html"
        try:
            response = self.session.get(
                url,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "lxml")
            # Try to extract structured data
            script_tags = soup.find_all("script", type="application/ld+json")
            for script in script_tags:
                try:
                    # script.string can be None; JSON-LD can also be a list
                    data = json.loads(script.string or "")
                    if isinstance(data, dict) and data.get("@type") == "Product":
                        return {
                            "title": data.get("name"),
                            "description": data.get("description"),
                            "price": data.get("offers", {}).get("price"),
                            "currency": data.get("offers", {}).get("priceCurrency"),
                            "rating": data.get("aggregateRating", {}).get("ratingValue"),
                            "review_count": data.get("aggregateRating", {}).get("reviewCount"),
                            "image": data.get("image"),
                        }
                except json.JSONDecodeError:
                    continue
            # Extract from embedded page data
            return self._extract_product_page_data(response.text)
        except requests.RequestException as e:
            print(f"Error scraping product {product_id}: {e}")
            return None

    def _extract_product_page_data(self, html):
        """Extract product details from page scripts."""
        pattern = r'window\.runParams\s*=\s*({.*?});'
        match = re.search(pattern, html, re.DOTALL)
        if match:
            try:
                data = json.loads(match.group(1))
                action_data = data.get("data", {}).get("actionModule", {})
                title_data = data.get("data", {}).get("titleModule", {})
                price_data = data.get("data", {}).get("priceModule", {})
                return {
                    "title": title_data.get("subject"),
                    "product_id": action_data.get("productId"),
                    "price": price_data.get("formattedActivityPrice"),
                    "original_price": price_data.get("formattedPrice"),
                }
            except (json.JSONDecodeError, KeyError):
                pass
        return None


# Usage
if __name__ == "__main__":
    scraper = AliExpressScraper(proxy_url="http://user:pass@proxy:port")
    results = scraper.search_products("wireless earbuds", max_pages=3)
    for product in results[:3]:
        if product.get("product_id"):
            details = scraper.scrape_product_detail(product["product_id"])
            print(json.dumps(details, indent=2))
            time.sleep(random.uniform(4, 8))

Method 2: Scraping AliExpress with Selenium
Because AliExpress relies heavily on JavaScript, Selenium is often necessary for full data extraction.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from urllib.parse import quote_plus
import time
import random


class AliExpressSeleniumScraper:
    def __init__(self, proxy=None):
        chrome_options = Options()
        chrome_options.add_argument("--headless=new")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        if proxy:
            chrome_options.add_argument(f"--proxy-server={proxy}")
        self.driver = webdriver.Chrome(options=chrome_options)
        self.driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source": """
                Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
                window.chrome = {runtime: {}};
            """
        })

    def search_and_scrape(self, query, max_pages=3):
        """Search AliExpress and scrape product data."""
        products = []
        for page in range(1, max_pages + 1):
            url = f"https://www.aliexpress.com/wholesale?SearchText={quote_plus(query)}&page={page}"
            self.driver.get(url)
            # Wait for products to load
            try:
                WebDriverWait(self.driver, 20).until(
                    EC.presence_of_element_located((By.CSS_SELECTOR, "[class*='product-card'], [class*='search-item']"))
                )
            except Exception:
                print(f"Timeout on page {page}")
                continue
            # Scroll to load all products
            self._lazy_scroll()
            # Extract product data via JavaScript
            page_products = self.driver.execute_script("""
                const products = [];
                const cards = document.querySelectorAll('[class*="product-card"], [class*="search-item"]');
                cards.forEach(card => {
                    const title = card.querySelector('h3, [class*="title"]');
                    const price = card.querySelector('[class*="price"]');
                    const link = card.querySelector('a[href*="/item/"]');
                    const rating = card.querySelector('[class*="star"]');
                    products.push({
                        title: title ? title.innerText.trim() : null,
                        price: price ? price.innerText.trim() : null,
                        url: link ? link.href : null,
                        rating: rating ? rating.innerText.trim() : null
                    });
                });
                return products;
            """)
            products.extend(page_products)
            print(f"Page {page}: {len(page_products)} products")
            time.sleep(random.uniform(3, 6))
        return products

    def _lazy_scroll(self):
        """Gradually scroll the page to trigger lazy-loaded content."""
        last_height = self.driver.execute_script("return document.body.scrollHeight")
        while True:
            self.driver.execute_script("window.scrollBy(0, 800);")
            time.sleep(1)
            new_height = self.driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

    def close(self):
        self.driver.quit()

Handling AliExpress Anti-Bot Protections
AliExpress has some of the most aggressive anti-bot systems among e-commerce sites:
1. Akamai Bot Manager
AliExpress uses Akamai’s bot detection which analyzes browser fingerprints, mouse movements, and behavioral patterns. Countermeasures include:
- Use undetected-chromedriver for Selenium-based scraping
- Simulate human-like mouse movements and scrolling
- Use residential proxies that look like real users
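The scrolling point can be sketched without any browser dependency: generate randomized scroll increments and feed each one to `driver.execute_script` with a short pause in between, instead of one big jump. The function name and parameters below are illustrative, not part of any library.

```python
import random

def human_scroll_offsets(total=2400, step_mean=300, jitter=120):
    """Break a long scroll into randomized pixel increments, like a human
    flick-scrolling. Returns a list of offsets summing exactly to `total`;
    feed each to: driver.execute_script("window.scrollBy(0, arguments[0]);", off)
    with a random short sleep between calls."""
    offsets, scrolled = [], 0
    while scrolled < total:
        # Draw a roughly human-sized step, floored at 40px
        step = max(40, int(random.gauss(step_mean, jitter)))
        # Never overshoot the remaining distance
        step = min(step, total - scrolled)
        offsets.append(step)
        scrolled += step
    return offsets
```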
pip install undetected-chromedriver

import undetected_chromedriver as uc

driver = uc.Chrome(headless=True)
driver.get("https://www.aliexpress.com/")

2. CAPTCHA and Slide Verification
AliExpress frequently presents slide CAPTCHAs. Strategies:
- Reduce request frequency to avoid triggering
- Use mobile proxies for cleaner IP reputation
- Implement CAPTCHA-solving services as a fallback
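Before falling back to a solving service, it helps to detect that you have been served a verification page at all. A lightweight heuristic like the sketch below can flag suspected pages; the marker strings are assumptions and should be tuned against real blocked responses.

```python
# Marker strings are assumptions -- inspect real blocked responses and adjust.
CAPTCHA_MARKERS = ("captcha", "slide to verify", "unusual traffic")

def looks_like_captcha(html: str) -> bool:
    """Heuristic: does this response look like a verification page
    rather than a product or search page?"""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

When this returns True, rotate to a fresh proxy IP and back off rather than retrying the same endpoint immediately.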
3. Cookie and Session Management
Maintain proper cookies to appear as a returning visitor:
# Save and load cookies between sessions
import pickle

# Save cookies
with open("aliexpress_cookies.pkl", "wb") as f:
    pickle.dump(driver.get_cookies(), f)

# Load cookies
with open("aliexpress_cookies.pkl", "rb") as f:
    cookies = pickle.load(f)
for cookie in cookies:
    driver.add_cookie(cookie)

Proxy Recommendations for AliExpress
| Proxy Type | Success Rate | Best For |
|---|---|---|
| Residential Rotating | 70-80% | Search results, basic scraping |
| Mobile Proxies | 90%+ | High-volume, anti-bot bypass |
| ISP Proxies | 75-85% | Session-based scraping |
| Datacenter | 20-30% | Not recommended |
For AliExpress, mobile proxies offer the highest success rates: their IPs are shared by many legitimate mobile users, so blocking them carries collateral damage that anti-bot systems try to avoid. For budget-conscious projects, rotating residential proxies are a solid alternative.
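Wiring a rotating gateway into requests is a one-time session setup; a minimal sketch is below. The gateway URL and credentials are placeholders, not a real endpoint, and the exact rotation behavior (per-connection vs. sticky) depends on your provider.

```python
import requests

def make_proxied_session(gateway: str) -> requests.Session:
    """Build a requests session that routes all traffic through a proxy gateway.

    Rotating gateways typically assign a fresh exit IP per connection, so
    creating a new session every 5-10 requests effectively rotates IPs.
    """
    session = requests.Session()
    session.proxies = {"http": gateway, "https": gateway}
    session.headers["Accept-Language"] = "en-US,en;q=0.9"
    return session

# Placeholder credentials -- substitute your provider's rotating endpoint
session = make_proxied_session("http://user:pass@gateway.example-provider.com:7000")
```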
Legal Considerations
- Terms of Service: AliExpress explicitly prohibits scraping in their ToS. Proceed at your own risk.
- Data Protection: Chinese data protection laws (PIPL) may apply to seller data. Avoid collecting personal information.
- Intellectual Property: Product images and descriptions are protected by copyright.
- Rate Limits: Aggressive scraping can be considered a denial-of-service attack. Always use respectful rate limiting.
- Commercial Use: Consult legal counsel before using scraped data for commercial purposes.
See our web scraping legal guide for detailed compliance information.
Rate Limiting Best Practices
AliExpress is particularly sensitive to scraping patterns:
- Minimum 3-5 second delays between requests
- Rotate IPs every 5-10 requests
- Rotate user agents on every request
- Limit to 200-500 requests per hour per IP
- Implement exponential backoff on errors
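The exponential-backoff point can be sketched as a small retry helper that wraps any fetch callable; the names and defaults here are illustrative.

```python
import random
import time

def backoff_retry(func, max_attempts=4, base_delay=5):
    """Retry func() with exponentially growing, jittered delays between attempts."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the last error
            # 5s, 10s, 20s, ... plus jitter so retries don't look clockwork
            delay = base_delay * (2 ** attempt) + random.uniform(0, 2)
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)
```

Wrap individual page fetches in it, e.g. `backoff_retry(lambda: scraper.scrape_product_detail(pid))`.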
import random
import time

def smart_rate_limit(request_count, base_delay=3):
    """Adaptive rate limiting based on request count."""
    if request_count and request_count % 50 == 0:
        # Take a longer break every 50 requests
        time.sleep(random.uniform(30, 60))
    elif request_count and request_count % 10 == 0:
        time.sleep(random.uniform(10, 20))
    else:
        time.sleep(random.uniform(base_delay, base_delay * 2))

Conclusion
Scraping AliExpress requires more sophisticated techniques than many other e-commerce sites due to their aggressive anti-bot protections. By combining Selenium with undetected-chromedriver, rotating residential or mobile proxies, and careful rate limiting, you can build a reliable data pipeline.
For best results, use high-quality proxy services optimized for e-commerce scraping. Check out our proxy comparison guides to find the best provider for your AliExpress scraping needs.
Related Reading
- How to Scrape Amazon Product Reviews in 2026
- How to Scrape Apollo.io Contact Data in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix