How to Scrape Shein Product Data with Proxies in 2026
Shein has grown into one of the world’s largest fast-fashion e-commerce platforms, with millions of products updated daily. For competitive intelligence, price monitoring, and market research, extracting Shein product data programmatically is a common requirement. However, Shein employs sophisticated anti-bot measures that make scraping without proxies virtually impossible.
This guide walks you through scraping Shein product data using Python and residential proxies, covering everything from initial setup to handling pagination at scale.
Why Scrape Shein?
Shein’s catalog is massive and constantly changing. Brands, dropshippers, and market researchers need access to this data for several reasons:
- Competitive pricing analysis — Track how Shein prices products relative to competitors
- Trend identification — Spot emerging fashion trends before they hit mainstream retail
- Product research — Analyze bestsellers, review sentiment, and sizing data
- Supplier intelligence — Understand Shein’s product sourcing patterns
- Inventory monitoring — Track stock levels and restock patterns
Understanding Shein’s Anti-Bot Measures
Shein uses several layers of protection to prevent automated access:
- Rate limiting — Aggressive request throttling that blocks IPs making too many requests in short intervals
- Browser fingerprinting — JavaScript-based checks that detect headless browsers and automation tools
- CAPTCHA challenges — reCAPTCHA and custom challenges triggered by suspicious behavior
- Dynamic content loading — Heavy use of JavaScript rendering that prevents simple HTTP scraping
- Cookie validation — Session-based tokens that must be maintained across requests
- User-Agent verification — Checks for consistent and realistic browser signatures
Without proxies, your IP will be blocked within minutes of starting any scraping operation.
Data Points to Extract
A comprehensive Shein product scrape typically targets these fields:
| Data Point | Location | Notes |
|---|---|---|
| Product name | Title element | Often includes brand and style |
| SKU / Product ID | URL or data attributes | Unique identifier |
| Price | Price container | Current and original price |
| Discount percentage | Badge element | Flash sale indicators |
| Images | Gallery container | Multiple angles, zoom versions |
| Reviews | Review section | Text, rating, photos, sizing feedback |
| Size options | Size selector | Available sizes and stock status |
| Color variants | Color picker | Hex codes and swatch images |
| Category breadcrumb | Navigation | Full category path |
| Shipping info | Delivery section | Estimated delivery times |
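These fields map naturally onto a small container type. The sketch below is an illustrative model of the table above, not an official Shein schema; all field names (and the sample SKU) are my own choices:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SheinProduct:
    """One scraped product; field names are illustrative, not a Shein schema."""
    name: Optional[str] = None
    sku: Optional[str] = None
    price: Optional[float] = None
    original_price: Optional[float] = None
    discount_pct: Optional[float] = None
    images: list = field(default_factory=list)
    sizes: list = field(default_factory=list)
    colors: list = field(default_factory=list)
    category_path: list = field(default_factory=list)
    shipping_estimate: Optional[str] = None
    rating: Optional[float] = None
    review_count: Optional[int] = None

# Hypothetical record: normalize each scraped item into one shape up front
product = SheinProduct(name="Floral Midi Dress", sku="sw2208123456", price=18.49)
```

Normalizing every record into a single type early keeps downstream analysis code free of per-field existence checks.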
Setting Up Your Environment
Install the required Python packages:
```bash
pip install requests beautifulsoup4 lxml fake-useragent
```

Python Code: Scraping Shein with Proxy Rotation
Here is a complete scraper that extracts product data from Shein category pages:
```python
import json
import logging
import random
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SheinScraper:
    def __init__(self, proxy_list: list):
        self.proxy_list = proxy_list
        self.ua = UserAgent()
        self.session = requests.Session()
        self.base_url = "https://www.shein.com"
        self.results = []

    def get_proxy(self) -> dict:
        """Rotate through the proxy list randomly."""
        proxy = random.choice(self.proxy_list)
        return {
            "http": f"http://{proxy}",
            "https": f"http://{proxy}",
        }

    def get_headers(self) -> dict:
        """Generate realistic browser headers."""
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "Cache-Control": "max-age=0",
        }

    def scrape_category(self, category_url: str, max_pages: int = 10):
        """Scrape all products from a category with pagination."""
        for page in range(1, max_pages + 1):
            page_url = f"{category_url}?page={page}"
            logger.info(f"Scraping page {page}: {page_url}")
            try:
                response = self.session.get(
                    page_url,
                    headers=self.get_headers(),
                    proxies=self.get_proxy(),
                    timeout=30,
                )
                if response.status_code == 200:
                    self.parse_category_page(response.text)
                elif response.status_code == 403:
                    logger.warning("Access denied -- rotating proxy")
                    time.sleep(random.uniform(5, 10))
                    continue
                else:
                    logger.error(f"Status {response.status_code}")
            except requests.exceptions.RequestException as e:
                logger.error(f"Request failed: {e}")
            # Random delay between pages
            time.sleep(random.uniform(2, 5))

    def parse_category_page(self, html: str):
        """Extract product data from category page HTML."""
        soup = BeautifulSoup(html, "lxml")
        # Shein often embeds product data as JSON-LD within script tags
        scripts = soup.find_all("script", type="application/ld+json")
        for script in scripts:
            try:
                data = json.loads(script.string)
            except (json.JSONDecodeError, TypeError):
                continue
            # JSON-LD may be a single object or a list of objects
            items = data if isinstance(data, list) else [data]
            for item in items:
                if item.get("@type") == "Product":
                    self.results.append(self.extract_product(item))
        # Fallback: parse HTML elements directly
        product_cards = soup.select("[class*='product-card']")
        for card in product_cards:
            product = self.parse_product_card(card)
            if product and product not in self.results:
                self.results.append(product)

    def parse_product_card(self, card) -> dict:
        """Parse an individual product card element."""
        title_el = card.select_one("[class*='title'], [class*='name']")
        price_el = card.select_one("[class*='price']")
        link_el = card.select_one("a[href]")
        img_el = card.select_one("img[src]")
        return {
            "name": title_el.get_text(strip=True) if title_el else None,
            "price": price_el.get_text(strip=True) if price_el else None,
            # Resolve relative links against the site root
            "url": urljoin(self.base_url, link_el["href"]) if link_el else None,
            "image": img_el["src"] if img_el else None,
        }

    def extract_product(self, json_data: dict) -> dict:
        """Extract structured product data from a JSON-LD object."""
        offers = json_data.get("offers") or {}
        if isinstance(offers, list):  # offers can be a list of offer objects
            offers = offers[0] if offers else {}
        rating = json_data.get("aggregateRating") or {}
        return {
            "name": json_data.get("name"),
            "price": offers.get("price"),
            "currency": offers.get("priceCurrency"),
            "availability": offers.get("availability"),
            "image": json_data.get("image"),
            "sku": json_data.get("sku"),
            "rating": rating.get("ratingValue"),
            "review_count": rating.get("reviewCount"),
        }

    def scrape_product_detail(self, product_url: str) -> dict:
        """Scrape detailed data from an individual product page."""
        try:
            response = self.session.get(
                product_url,
                headers=self.get_headers(),
                proxies=self.get_proxy(),
                timeout=30,
            )
            if response.status_code != 200:
                return {}
            soup = BeautifulSoup(response.text, "lxml")
            # Extract review data
            reviews = []
            for rev in soup.select("[class*='review-item']"):
                rating_el = rev.select_one("[class*='rating']")
                text_el = rev.select_one("[class*='content'], [class*='text']")
                reviews.append({
                    "rating": rating_el.get_text(strip=True) if rating_el else None,
                    "text": text_el.get_text(strip=True) if text_el else None,
                })
            # Extract size information
            sizes = [
                s.get_text(strip=True)
                for s in soup.select("[class*='size-item'], [class*='size-option']")
            ]
            return {"reviews": reviews, "sizes": sizes, "url": product_url}
        except requests.exceptions.RequestException as e:
            logger.error(f"Detail page failed: {e}")
            return {}


# Usage example
if __name__ == "__main__":
    proxies = [
        "user:pass@residential1.proxy.com:8080",
        "user:pass@residential2.proxy.com:8080",
        "user:pass@residential3.proxy.com:8080",
    ]
    scraper = SheinScraper(proxy_list=proxies)
    scraper.scrape_category(
        "https://www.shein.com/Women-Dresses-c-1727.html",
        max_pages=5,
    )
    print(f"Scraped {len(scraper.results)} products")
    with open("shein_products.json", "w") as f:
        json.dump(scraper.results, f, indent=2)
```

Handling Pagination and Categories
Shein organizes products by category, subcategory, and collection. To scrape comprehensively:
- Start with the sitemap — Shein publishes a sitemap at `/sitemap.xml` that lists all category URLs
- Handle infinite scroll — Some category pages use lazy loading. Monitor the network requests to find the underlying API endpoint that returns paginated JSON
- Category tree traversal — Build a recursive crawler that follows category links down to leaf categories
- Pagination parameters — Shein uses `?page=N` for most category pages, typically with 120 products per page
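Because category pages return fixed-size batches (about 120 items), a pagination loop can stop early when a page comes back short, which is the usual end-of-category signal. In this sketch, `fetch_page` stands in for whatever proxied request function you use:

```python
def collect_pages(fetch_page, max_pages=10, page_size=120):
    """Accumulate products page by page, stopping early when a page
    returns fewer than page_size items (the end-of-category signal)."""
    products = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        products.extend(batch)
        if len(batch) < page_size:
            break
    return products
```

Plugging in a function that wraps the proxied request (like `scrape_category`'s inner `session.get` call) avoids fetching empty trailing pages.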
```python
# Additional SheinScraper method: pull category URLs from the sitemap
def get_all_categories(self):
    """Extract all category URLs from the Shein sitemap."""
    response = self.session.get(
        f"{self.base_url}/sitemap.xml",
        headers=self.get_headers(),
        proxies=self.get_proxy(),
        timeout=30,
    )
    # The "xml" parser is provided by lxml, installed earlier
    soup = BeautifulSoup(response.text, "xml")
    urls = [loc.text for loc in soup.find_all("loc")]
    # Category pages carry a "/c-" segment in their URL
    return [u for u in urls if "/c-" in u]
```

Recommended Proxy Type
For Shein scraping, residential proxies are the clear winner:
- Residential rotating proxies — Best for category page scraping at scale. Rotate IPs every request or every few requests to avoid detection.
- Sticky residential sessions — Use 5-10 minute sticky sessions when scraping individual product detail pages to maintain session consistency.
- Geo-targeting — Target US, UK, or EU IPs to access region-specific pricing and catalogs.
Datacenter proxies get detected and blocked almost immediately on Shein. Mobile proxies work well but are more expensive than necessary for this use case.
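Sticky sessions are usually requested by encoding a session ID into the proxy username. The `-session-<id>` convention below is one common provider pattern, assumed here for illustration; check your provider's documentation for the exact format:

```python
import uuid

def sticky_proxy(host, port, user, password, session_id=None):
    """Build a sticky-session proxy dict for requests.
    Appending '-session-<id>' to the username is a common but
    provider-specific convention (an assumption in this sketch)."""
    session_id = session_id or uuid.uuid4().hex[:8]
    auth = f"{user}-session-{session_id}:{password}"
    url = f"http://{auth}@{host}:{port}"
    return {"http": url, "https": url}
```

Reusing the same `session_id` across a product's detail-page requests keeps them on one exit IP; generating a fresh ID starts a new session.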
Use our proxy cost calculator to estimate bandwidth costs for your Shein scraping project.
Rate Limiting and Best Practices
Follow these guidelines to scrape Shein sustainably:
- Request delays — Wait 2-5 seconds between requests minimum
- Session rotation — Create new sessions every 50-100 requests
- User-Agent rotation — Rotate between 20+ realistic browser User-Agent strings
- Time distribution — Spread scraping across different hours to mimic organic traffic
- Retry logic — Implement exponential backoff when encountering 403 or 429 responses
- Respect robots.txt — Check Shein’s robots.txt for disallowed paths
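The retry advice above can be made concrete with capped exponential backoff plus full jitter, a standard pattern (not Shein-specific) for spacing out retries after 403 or 429 responses:

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Delay in seconds before retry number `attempt` (0-based):
    exponential growth capped at `cap`, with full jitter so
    retries from parallel workers don't align."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In practice you would call `time.sleep(backoff_delay(attempt))` in the 403/429 branch and give up after a fixed number of attempts.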
Troubleshooting
Problem: Getting empty responses or 403 errors
- Rotate to fresh proxy IPs. Your current IPs may be flagged.
- Verify your headers include realistic Accept and Accept-Language values.
- Try accessing from a different geographic region.
Problem: Product data is missing from HTML
- Shein renders much content via JavaScript. Consider using a headless browser like Playwright for JS-heavy pages.
- Check for JSON data embedded in `<script>` tags. JSON-LD blocks and inline state objects often contain the full product payload even when the rendered DOM looks empty.
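To confirm structured data is actually present in a response before reaching for a headless browser, JSON-LD blocks can be pulled out with nothing but the standard library:

```python
import json
from html.parser import HTMLParser

class JSONLDParser(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # skip malformed blocks
```

If `blocks` comes back empty on a saved page, the data is being injected client-side and a headless browser is the right tool.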