How to Scrape Etsy Product Listings in 2026
Etsy is one of the largest marketplaces for handmade, vintage, and unique goods, hosting over 90 million active buyers and 7.5 million active sellers. For e-commerce researchers, competitor analysts, and market intelligence professionals, scraping Etsy product listings provides invaluable data on pricing trends, popular categories, and seller performance.
In this comprehensive guide, you’ll learn how to scrape Etsy product data using Python, handle their anti-bot protections, and build a reliable data pipeline.
What Data Can You Extract from Etsy?
Etsy product listings contain a wealth of structured data that’s useful for market research and competitive analysis:
- Product titles and descriptions
- Pricing (including sale prices and original prices)
- Review counts and ratings
- Seller information (shop name, location, total sales)
- Product images and thumbnails
- Categories and tags
- Shipping information
- Variation options (sizes, colors, etc.)
Example JSON Output
```json
{
  "product_id": "1234567890",
  "title": "Handmade Ceramic Mug - Blue Glaze",
  "price": 28.99,
  "original_price": 34.99,
  "currency": "USD",
  "rating": 4.8,
  "review_count": 1243,
  "seller": {
    "shop_name": "CeramicStudioCo",
    "location": "Portland, Oregon",
    "total_sales": 15420
  },
  "categories": ["Home & Living", "Kitchen & Dining", "Drinkware", "Mugs"],
  "tags": ["handmade mug", "ceramic mug", "blue mug", "pottery"],
  "shipping": {
    "free_shipping": true,
    "estimated_delivery": "3-5 business days"
  },
  "variations": [
    {"type": "Color", "options": ["Blue", "Green", "White"]},
    {"type": "Size", "options": ["12oz", "16oz"]}
  ],
  "url": "https://www.etsy.com/listing/1234567890/"
}
```
Prerequisites
Before you start scraping Etsy, make sure you have the following installed:
```shell
pip install requests beautifulsoup4 lxml fake-useragent
```
You’ll also want a reliable proxy service to avoid IP blocks. Residential proxies are recommended for Etsy scraping since they rotate IPs from real residential connections.
Method 1: Scraping Etsy with Requests and BeautifulSoup
This method works well for extracting data from Etsy search results and individual product pages.
Scraping Etsy Search Results
```python
import json
import random
import time
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent


class EtsyScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.base_url = "https://www.etsy.com"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Referer": "https://www.etsy.com/",
            "DNT": "1",
            "Connection": "keep-alive",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, max_pages=5):
        """Scrape Etsy search results for a given query."""
        all_products = []
        for page in range(1, max_pages + 1):
            # URL-encode the query so multi-word searches work
            url = f"{self.base_url}/search?q={quote_plus(query)}&page={page}"
            try:
                response = self.session.get(
                    url,
                    headers=self._get_headers(),
                    proxies=self._get_proxies(),
                    timeout=30,
                )
                response.raise_for_status()
                soup = BeautifulSoup(response.text, "lxml")
                products = self._parse_search_results(soup)
                all_products.extend(products)
                print(f"Page {page}: Found {len(products)} products")
                # Respectful rate limiting
                time.sleep(random.uniform(2, 5))
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
        return all_products

    def _parse_search_results(self, soup):
        """Parse product data from a search results page."""
        products = []
        # Etsy renders product cards in a grid
        listings = soup.select("div.v2-listing-card")
        for listing in listings:
            try:
                product = {}
                # Extract title
                title_elem = listing.select_one("h3.v2-listing-card__title")
                product["title"] = title_elem.get_text(strip=True) if title_elem else None
                # Extract price
                price_elem = listing.select_one("span.currency-value")
                product["price"] = (
                    float(price_elem.get_text(strip=True).replace(",", ""))
                    if price_elem else None
                )
                # Extract URL
                link_elem = listing.select_one("a.listing-link")
                product["url"] = link_elem["href"] if link_elem else None
                # Extract the listing ID from the URL
                if product["url"] and "/listing/" in product["url"]:
                    product["listing_id"] = product["url"].split("/listing/")[1].split("/")[0]
                # Extract shop name
                shop_elem = listing.select_one("p.v2-listing-card__shop")
                product["shop_name"] = shop_elem.get_text(strip=True) if shop_elem else None
                # Extract rating
                rating_elem = listing.select_one("span.v2-listing-card__rating")
                if rating_elem:
                    product["rating"] = float(rating_elem.get_text(strip=True))
                products.append(product)
            except Exception as e:
                print(f"Error parsing listing: {e}")
                continue
        return products

    def scrape_product_page(self, url):
        """Scrape detailed data from an individual product page."""
        try:
            response = self.session.get(
                url,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30,
            )
            response.raise_for_status()
            soup = BeautifulSoup(response.text, "lxml")
            # Etsy embeds structured data in JSON-LD
            script_tags = soup.find_all("script", type="application/ld+json")
            for script in script_tags:
                try:
                    data = json.loads(script.string)
                    if data.get("@type") == "Product":
                        return self._parse_structured_data(data, soup)
                except (json.JSONDecodeError, TypeError):
                    continue
            # Fall back to HTML parsing
            return self._parse_product_html(soup)
        except requests.RequestException as e:
            print(f"Error scraping product: {e}")
            return None

    def _parse_structured_data(self, data, soup):
        """Parse product data from JSON-LD structured data."""
        product = {
            "title": data.get("name"),
            "description": data.get("description"),
            "url": data.get("url"),
            "image": data.get("image"),
        }
        # Extract pricing
        offers = data.get("offers", {})
        if isinstance(offers, list):
            offers = offers[0] if offers else {}
        product["price"] = offers.get("price")
        product["currency"] = offers.get("priceCurrency")
        # Extract reviews
        aggregate_rating = data.get("aggregateRating", {})
        product["rating"] = aggregate_rating.get("ratingValue")
        product["review_count"] = aggregate_rating.get("reviewCount")
        # Extract seller info from the HTML
        shop_elem = soup.select_one("a[data-shop-name]")
        if shop_elem:
            product["shop_name"] = shop_elem.get("data-shop-name")
        return product

    def _parse_product_html(self, soup):
        """Fallback HTML parsing for product pages."""
        product = {}
        title = soup.select_one("h1[data-buy-box-listing-title]")
        product["title"] = title.get_text(strip=True) if title else None
        price = soup.select_one("div[data-buy-box-region='price'] p.wt-text-title-larger")
        if price:
            price_text = price.get_text(strip=True).replace("$", "").replace(",", "")
            try:
                product["price"] = float(price_text)
            except ValueError:
                product["price"] = price_text
        return product


# Usage example
if __name__ == "__main__":
    # Initialize with a proxy
    scraper = EtsyScraper(proxy_url="http://user:pass@proxy-server:port")
    # Search for products
    results = scraper.search_products("handmade ceramic mug", max_pages=3)
    # Scrape individual product details
    for product in results[:5]:
        if product.get("url"):
            details = scraper.scrape_product_page(product["url"])
            print(json.dumps(details, indent=2))
            time.sleep(random.uniform(3, 6))
```
Method 2: Scraping Etsy with Selenium
For pages that rely heavily on JavaScript rendering, Selenium provides a more robust solution.
```python
import json
import time
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options


class EtsySeleniumScraper:
    def __init__(self, proxy=None):
        chrome_options = Options()
        chrome_options.add_argument("--headless")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        if proxy:
            chrome_options.add_argument(f"--proxy-server={proxy}")
        self.driver = webdriver.Chrome(options=chrome_options)
        # Hide the webdriver flag that anti-bot scripts check for
        self.driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
            "source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
        })

    def search_products(self, query, max_pages=3):
        """Search Etsy and extract product data."""
        products = []
        for page in range(1, max_pages + 1):
            url = f"https://www.etsy.com/search?q={quote_plus(query)}&page={page}"
            self.driver.get(url)
            # Wait for product cards to load
            WebDriverWait(self.driver, 15).until(
                EC.presence_of_all_elements_located(
                    (By.CSS_SELECTOR, "div.v2-listing-card")
                )
            )
            # Scroll to load lazy content
            self._scroll_page()
            cards = self.driver.find_elements(By.CSS_SELECTOR, "div.v2-listing-card")
            for card in cards:
                try:
                    product = {
                        "title": card.find_element(By.CSS_SELECTOR, "h3").text,
                        "url": card.find_element(By.CSS_SELECTOR, "a").get_attribute("href"),
                    }
                    try:
                        price_elem = card.find_element(By.CSS_SELECTOR, "span.currency-value")
                        product["price"] = price_elem.text
                    except Exception:
                        product["price"] = None
                    products.append(product)
                except Exception:
                    continue
            time.sleep(3)
        return products

    def _scroll_page(self):
        """Scroll down the page to trigger lazy loading."""
        total_height = self.driver.execute_script("return document.body.scrollHeight")
        for offset in range(0, total_height, 500):
            self.driver.execute_script(f"window.scrollTo(0, {offset});")
            time.sleep(0.3)

    def close(self):
        self.driver.quit()


# Usage
scraper = EtsySeleniumScraper(proxy="http://proxy-server:port")
results = scraper.search_products("vintage jewelry")
print(json.dumps(results[:5], indent=2))
scraper.close()
```
Handling Etsy’s Anti-Bot Protections
Etsy employs several anti-scraping measures that you need to handle:
1. Rate Limiting
Etsy will throttle or block your requests if you send too many in a short period. Implement respectful delays:
```python
import random
import time


def respectful_delay(min_seconds=2, max_seconds=6):
    """Add a random delay between requests."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
```
2. CAPTCHA Challenges
Etsy uses CAPTCHA to verify human visitors. To minimize CAPTCHA triggers:
- Rotate user agents frequently
- Use residential proxies that appear as regular users
- Maintain consistent session cookies
- Avoid rapid successive requests
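A minimal sketch of the first three points together: draw the user agent from a pool on each request while reusing one session so cookies persist. The agent strings below are illustrative placeholders; in practice use a maintained pool or the fake-useragent package shown earlier.

```python
import random

# Illustrative user-agent strings (placeholders, not a curated pool)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]


def fresh_headers():
    """Rotate the user agent while keeping the other headers stable."""
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://www.etsy.com/",
    }

# Reuse a single requests.Session() for all calls so cookies persist
# across requests even though the user agent rotates, e.g.:
#   session.get(url, headers=fresh_headers())
```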
3. JavaScript Rendering
Many Etsy pages load content dynamically. If you’re getting empty responses, switch to Selenium or consider using a headless browser with stealth plugins.
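One way to decide when to fall back is to check the raw HTML for markers that server-rendered content is actually present. This is a heuristic sketch; the marker strings match the selectors used earlier in this guide and will need updating if Etsy changes its markup.

```python
def needs_browser_rendering(html: str) -> bool:
    """Return True when the HTML contains none of the markers we parse,
    which usually means the content is rendered client-side or we were
    served a challenge page instead of real results."""
    markers = ("v2-listing-card", "application/ld+json")
    return not any(marker in html for marker in markers)
```

When this returns True for a search page, retry the same URL with a headless browser instead of plain requests.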
4. Request Headers
Always send complete, realistic headers. Missing headers are a common trigger for anti-bot systems:
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Referer": "https://www.etsy.com/",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "same-origin",
}
```
Proxy Recommendations for Etsy Scraping
Etsy is moderately aggressive with anti-bot detection. Here’s what works best:
| Proxy Type | Effectiveness | Cost | Best For |
|---|---|---|---|
| Residential Rotating | High | $$$ | Large-scale scraping |
| ISP Proxies | High | $$ | Consistent sessions |
| Datacenter | Low | $ | Small tests only |
| Mobile Proxies | Very High | $$$$ | When other methods fail |
For most Etsy scraping projects, rotating residential proxies provide the best balance of reliability and cost. Rotate IPs every 3-5 requests to stay under the radar.
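If your provider exposes individual proxy endpoints rather than a rotating gateway, a small helper can enforce that 3-5 request cadence. This is a sketch; the proxy URLs are placeholders for your provider's endpoints.

```python
import itertools
import random


class ProxyRotator:
    """Hand out the same proxy for a few requests, then move to the next."""

    def __init__(self, proxy_urls, min_uses=3, max_uses=5):
        self.pool = itertools.cycle(proxy_urls)
        self.min_uses = min_uses
        self.max_uses = max_uses
        self._advance()

    def _advance(self):
        """Switch to the next proxy and pick a random use budget."""
        self.current = next(self.pool)
        self.remaining = random.randint(self.min_uses, self.max_uses)

    def get(self):
        """Return a proxies dict suitable for requests' `proxies=` argument."""
        if self.remaining == 0:
            self._advance()
        self.remaining -= 1
        return {"http": self.current, "https": self.current}


# Placeholder endpoints; substitute your provider's proxy URLs
rotator = ProxyRotator([
    "http://user:pass@proxy-1:8000",
    "http://user:pass@proxy-2:8000",
])
```

Each call to `rotator.get()` can then be passed straight to `requests.get(..., proxies=rotator.get())`.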
Etsy API Alternative
Before scraping, consider using the Etsy Open API v3. It provides structured access to:
- Active listings
- Shop information
- Receipts and transactions (for shop owners)
- Taxonomy and categories
However, the API has rate limits (10 requests per second) and requires OAuth authentication, which is why many opt for web scraping for large-scale data collection.
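For illustration, a call to the v3 active-listings endpoint can be assembled like this. The endpoint path and parameter names reflect the v3 documentation at the time of writing (verify against the current API reference), and `build_listing_request` is a helper invented here, not part of any SDK.

```python
ETSY_API_BASE = "https://openapi.etsy.com/v3/application"


def build_listing_request(api_key, keywords, limit=25):
    """Assemble the pieces of an active-listings API call.
    Pass the result to requests.get(**req, timeout=30); the API key
    comes from the Etsy developer portal."""
    return {
        "url": f"{ETSY_API_BASE}/listings/active",
        "headers": {"x-api-key": api_key},
        "params": {"keywords": keywords, "limit": limit},
    }
```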
Legal Considerations
Before scraping Etsy, be aware of these legal aspects:
- Terms of Service: Etsy’s ToS prohibits automated data collection. Scraping may put your account or IP at risk.
- Copyright: Product descriptions and images are copyrighted by sellers. Don’t republish scraped content.
- Personal Data: Avoid collecting personally identifiable information (PII) about sellers or buyers. Comply with GDPR and other privacy regulations.
- Rate Limiting: Excessive scraping can impact Etsy’s servers. Always implement respectful rate limits.
- Commercial Use: Using scraped data commercially may have additional legal implications. Consult a lawyer for your specific use case.
For a deeper dive into the legal landscape, check out our guide to web scraping compliance.
Rate Limiting Best Practices
To scrape Etsy sustainably without getting blocked:
- Start slow: Begin with 1 request every 5 seconds and gradually increase
- Random delays: Use randomized intervals (2-6 seconds) between requests
- Session management: Rotate sessions every 50-100 requests
- Off-peak hours: Scrape during low-traffic hours (2 AM – 6 AM EST)
- Exponential backoff: If you receive a 429 status code, wait progressively longer before retrying
```python
import time


def exponential_backoff(attempt, base_delay=5, max_delay=300):
    """Sleep for base_delay * 2**attempt seconds, capped at max_delay."""
    delay = min(base_delay * (2 ** attempt), max_delay)
    time.sleep(delay)
```
Data Storage and Export
Once you’ve scraped Etsy data, store it efficiently:
```python
import json

import pandas as pd

# `products` is the list of product dicts collected by the scraper

# Save to CSV
df = pd.DataFrame(products)
df.to_csv("etsy_products.csv", index=False)

# Save to JSON
with open("etsy_products.json", "w") as f:
    json.dump(products, f, indent=2)
```
Conclusion
Scraping Etsy product listings gives you powerful market intelligence for e-commerce research, competitive analysis, and trend monitoring. By using the techniques in this guide — proper headers, rotating proxies, respectful rate limiting, and robust parsing — you can build a reliable Etsy scraping pipeline.
For the best results, pair your scraper with high-quality residential proxies and always respect Etsy’s servers with appropriate rate limiting. Check out our complete guide to e-commerce scraping for more platform-specific strategies.
Related Reading
- How to Scrape AliExpress Product Data
- How to Scrape Amazon Product Reviews in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix