Etsy surfaces its best-seller badges and trending tag labels on public product pages, and scraping them at scale is genuinely useful for competitive research, niche validation, and dropshipping product discovery. The catch is that Etsy runs aggressive bot detection, rate-limits unauthenticated crawlers hard, and returns different HTML depending on whether your request looks like a browser or a script. Here is a practical 2026 guide to getting the data reliably.
What Data You Can Actually Pull
Etsy does not expose a public API for best-seller or trending data. Everything you care about lives in rendered HTML or embedded JSON-LD on product and search pages.
Useful fields per listing:
- Listing title, price, sale price
- “Bestseller” badge (an element with class wt-badge--small)
- Star rating and review count
- Shop name and sales count
- Tags (visible on listing pages, not search results)
- Estimated monthly sales (inferred from review velocity, not served directly)
Trending tags appear in Etsy’s search autocomplete (/api/v3/ajax/typeahead/etsy/term) and in the “Shop by popular tags” carousels on category pages. Both sources are accessible without login but require consistent headers.
How Etsy Detects Bots
Before writing a single line of code, understand the detection stack you are up against:
| Layer | Method | Notes |
|---|---|---|
| TLS fingerprinting | JA3/JA4 hash check | Requests/httpx fail without spoofing |
| Header validation | User-Agent, Accept, Sec-Fetch-* | Missing Sec-Fetch headers = instant block |
| IP reputation | DataDome (embedded on most pages) | Datacenter IPs blocked by default |
| Behavioral analysis | Mouse events, scroll timing | Only triggers on JS-heavy category pages |
| CAPTCHA | hCaptcha | Triggered on rapid listing traversal |
The TLS fingerprint check is the highest-priority hurdle. Plain requests with a spoofed User-Agent still fails because the TLS handshake looks like Python. Use curl_cffi with impersonate="chrome120" or route through a residential proxy with its own TLS termination.
DataDome is the persistent layer. It tracks request cadence across sessions and will silently serve degraded HTML (no badge data, no review count) long before it serves a hard block. This is similar to the detection stack you encounter when doing more general marketplace work like scraping Walmart Marketplace seller data.
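For the header-validation layer in the table above, a small helper keeps the navigation headers consistent across requests. This is a minimal sketch: the function name browser_headers is hypothetical, and the values are assumptions modeled on what a desktop Chrome page load typically sends, not a set confirmed against Etsy. User-Agent is left out on the assumption that your TLS-impersonating client supplies one that matches its fingerprint.

```python
def browser_headers(referer: str = "https://www.etsy.com/") -> dict[str, str]:
    """Return navigation-style headers resembling a desktop Chrome page load.

    Assumption: these values are illustrative, not verified against Etsy.
    """
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": referer,
        # Fetch metadata headers that a top-level, user-initiated navigation carries.
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "same-origin",
        "Sec-Fetch-User": "?1",
        "Upgrade-Insecure-Requests": "1",
    }
```

Pass the result as the headers argument of each page request so every call in a session presents the same shape.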
Scraping Best-Seller Listings: Working Approach
For listing-level data, the most reliable path in 2026 is:
- Build a seed URL list from Etsy search (/search?q=&explicit=1&ship_to=US, with your keyword as the q value)
- Paginate through results pages (up to page 250, ~6000 results per query)
- For each listing URL, fetch the product page and parse the embedded JSON-LD block
- Supplement with HTML parsing for the bestseller badge
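The first two steps can be sketched as a URL builder. The q, explicit, and ship_to parameters come from the search URL above; the page parameter name and the helper search_page_urls are assumptions for illustration, so check them against what your browser actually requests.

```python
from urllib.parse import urlencode

BASE = "https://www.etsy.com/search"
MAX_PAGE = 250  # pagination ceiling quoted in the steps above


def search_page_urls(query: str, pages: int = 5) -> list[str]:
    """Build paginated search URLs for one query, capped at MAX_PAGE."""
    urls = []
    for page in range(1, min(pages, MAX_PAGE) + 1):
        # "page" is an assumed parameter name for pagination.
        params = {"q": query, "explicit": 1, "ship_to": "US", "page": page}
        urls.append(f"{BASE}?{urlencode(params)}")
    return urls
```

Each returned URL is a results page; the listing URLs you extract from those pages become the input for the per-listing fetch.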
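```python
import json
import re

from curl_cffi import requests as cffi_requests

# Shared session that impersonates Chrome's TLS fingerprint.
SESSION = cffi_requests.Session(impersonate="chrome120")


def fetch_listing(url: str) -> dict:
    resp = SESSION.get(
        url,
        headers={
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": "https://www.etsy.com/search",
        },
        timeout=20,
    )
    html = resp.text

    # Extract the first JSON-LD block; treat missing or malformed JSON as empty.
    match = re.search(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>', html, re.S
    )
    try:
        data = json.loads(match.group(1)) if match else {}
    except json.JSONDecodeError:
        data = {}
    if isinstance(data, list):  # JSON-LD can be a list of objects
        data = data[0] if data else {}
    if not isinstance(data, dict):
        data = {}

    offers = data.get("offers") or {}
    if isinstance(offers, list):  # offers can also be a list
        offers = offers[0] if offers else {}
    rating = data.get("aggregateRating") or {}

    # Bestseller badge: crude substring check on the raw HTML.
    is_bestseller = "wt-badge--small" in html and "Bestseller" in html

    return {
        "name": data.get("name"),
        "price": offers.get("price"),
        "rating": rating.get("ratingValue"),
        "review_count": rating.get("reviewCount"),
        "is_bestseller": is_bestseller,
    }
```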
Run this through a rotating residential proxy pool. Aim for one request every 3-8 seconds per IP, randomized. At 10 concurrent sessions across different IPs you can pull roughly 1,500 listings per hour without triggering DataDome's behavioral thresholds.
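The randomized 3-8 second pacing can be sketched as a small generator that sits between your URL list and the fetch call. The names next_delay and paced are hypothetical; the sleep function is injectable so you can test the pacing without actually waiting.

```python
import random
import time

MIN_DELAY_S = 3.0  # lower bound of the per-IP interval from the text
MAX_DELAY_S = 8.0  # upper bound of the per-IP interval from the text


def next_delay(rng: random.Random) -> float:
    """Pick a uniformly random pause between the two bounds."""
    return rng.uniform(MIN_DELAY_S, MAX_DELAY_S)


def paced(urls, rng=None, sleep=time.sleep):
    """Yield URLs one at a time, pausing a random interval between them."""
    rng = rng or random.Random()
    for i, url in enumerate(urls):
        if i:  # no pause before the first request
            sleep(next_delay(rng))
        yield url
```

Run one paced loop per IP: something like `for url in paced(listing_urls): fetch_listing(url)`, with each concurrent session owning its own loop and proxy.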
For tags, fetch the individual listing URL and parse the tag elements from the listing page HTML; search result pages do not include them.
Pulling Trending Tags from the Autocomplete API
The autocomplete endpoint is the fastest source for trending tag signals:
GET https://www.etsy.com/api/v3/ajax/typeahead/etsy/term?term=<prefix>&limit=10
No auth required, but you need to set x-csrf-token and x-etsy-user-agent headers that match a real browser session. Capture these once via browser DevTools, then reuse them. The token rotates every ~24 hours, so build a refresh mechanism.
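One way to structure the refresh mechanism is a small cache that re-fetches the token shortly before the ~24 hour window closes. This is a sketch under assumptions: TokenCache is a hypothetical name, the one-hour safety margin is arbitrary, and the fetch callable stands in for however you obtain a fresh token (the text only describes capturing it from DevTools).

```python
import time
from typing import Callable, Optional

TOKEN_TTL_S = 24 * 3600   # rotation interval quoted in the text
REFRESH_MARGIN_S = 3600   # assumption: refresh an hour early to be safe


class TokenCache:
    """Cache a token and re-fetch it shortly before it is expected to rotate."""

    def __init__(self, fetch: Callable[[], str], clock: Callable[[], float] = time.time):
        self._fetch = fetch          # hypothetical: returns a fresh token string
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        age = self._clock() - self._fetched_at
        if self._token is None or age >= TOKEN_TTL_S - REFRESH_MARGIN_S:
            self._token = self._fetch()
            self._fetched_at = self._clock()
        return self._token
```

Call get() before every autocomplete request and put the result in the x-csrf-token header; the cache only calls fetch when the stored token is missing or close to expiry.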
The response includes results[].term strings ranked by Etsy's internal trending score. Prefix-sweep common root terms ("handmade", "vintage", "personalized", "custom", "boho") to map the trending tag graph across a category. A full sweep of 200 seed prefixes takes about 15 minutes and produces a clean list of ~800 high-signal tags.
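The sweep itself reduces to three pieces: build the URL for a prefix, pull the term strings out of the response, and merge results across prefixes. The sketch below assumes the response is JSON shaped like {"results": [{"term": "..."}]}, as described above; the helper names are hypothetical and the HTTP call is left to your session code.

```python
from urllib.parse import urlencode

TYPEAHEAD = "https://www.etsy.com/api/v3/ajax/typeahead/etsy/term"


def typeahead_url(prefix: str, limit: int = 10) -> str:
    """Build the autocomplete URL for one seed prefix."""
    return f"{TYPEAHEAD}?{urlencode({'term': prefix, 'limit': limit})}"


def extract_terms(payload: dict) -> list[str]:
    """Pull non-empty term strings from an assumed {"results": [{"term": ...}]} payload."""
    return [
        r["term"]
        for r in payload.get("results", [])
        if isinstance(r, dict) and r.get("term")
    ]


def merge_ranked(sweeps: dict[str, list[str]]) -> list[str]:
    """Deduplicate terms across prefixes (case-insensitive), keeping first-seen order."""
    seen: dict[str, str] = {}
    for terms in sweeps.values():
        for term in terms:
            seen.setdefault(term.lower(), term)
    return list(seen.values())
```

Feed each prefix through typeahead_url, parse the JSON body with extract_terms, and store the lists in a dict keyed by prefix so merge_ranked can produce the final tag list.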
This approach is lighter-weight than scraping full search result pages. If you are already running scraping pipelines against other platforms, the session-header management pattern here is similar to what you need for scraping Lever and Greenhouse job boards, where CSRF tokens and session cookies also need active management.
Proxy and Infrastructure Choices
Residential proxies are non-negotiable for Etsy at any meaningful scale. Datacenter IPs are blocked at the DataDome layer. Here is a quick comparison of realistic options:
| Provider type | Pass rate on Etsy | Cost per GB | Best for |
|---|---|---|---|
| Residential rotating | ~85-90% | $3-$8 | High-volume listing crawls |
| Mobile (4G LTE) | ~95%+ | $15-$25 | Autocomplete API, badge extraction |
| ISP/static residential | ~80-85% | $4-$10 | Session-persistent flows |
| Datacenter | <20% | $0.50-$2 | Not viable for Etsy |
Mobile proxies outperform residential for Etsy because mobile IPs score well on Etsy's trust model. If you are running similar scraping work on other high-trust-requirement targets like Amazon brand registry pages, the same mobile proxy pool carries over cleanly.
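To compare the rows of the table on equal terms, you can fold pass rate into price and look at cost per usable gigabyte. This is rough arithmetic under a loud assumption: blocked or degraded responses consume the same bandwidth as good ones, which will not hold exactly in practice. The figures are the worst-case ends of the ranges in the table (highest price, lowest pass rate).

```python
def effective_cost_per_gb(cost_per_gb: float, pass_rate: float) -> float:
    """Cost of one GB of usable responses, given the fraction that pass."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_gb / pass_rate


# (cost per GB in USD, pass rate) taken from the table's worst-case ends.
TABLE = {
    "residential rotating": (8.0, 0.85),
    "mobile 4G": (25.0, 0.95),
    "ISP/static": (10.0, 0.80),
}

if __name__ == "__main__":
    for name, (cost, rate) in TABLE.items():
        print(f"{name}: ${effective_cost_per_gb(cost, rate):.2f} per usable GB")
```

Under these assumptions, the higher pass rate of mobile proxies does not offset their price for bulk crawls, which is consistent with reserving them for the lower-volume autocomplete and badge work.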
Rotate IPs per domain session, not per request. DataDome penalizes rapid IP rotation more than steady moderate-volume sessions. One IP, one Etsy session, 50-100 requests, then rotate.
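Session-level rotation can be sketched as a counter that hands out the same proxy until a randomized request budget is spent. SessionRotator is a hypothetical helper; the 50-100 default budget comes from the guidance above, and randomizing within that range is an assumption meant to avoid a fixed, recognizable session length.

```python
import random
from itertools import cycle


class SessionRotator:
    """Hand out one proxy for a budget of requests, then move to the next."""

    def __init__(self, proxies, rng=None, low=50, high=100):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)
        self._rng = rng or random.Random()
        self._low, self._high = low, high
        self._advance()

    def _advance(self):
        # Switch to the next proxy and draw a fresh request budget for it.
        self.current = next(self._pool)
        self._remaining = self._rng.randint(self._low, self._high)

    def proxy_for_next_request(self) -> str:
        if self._remaining == 0:
            self._advance()
        self._remaining -= 1
        return self.current
```

When the returned proxy changes, start a new HTTP session as well so cookies from the previous IP do not carry over.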
Handling Blocks and Soft Failures
Etsy soft-blocks look like real responses. You will get HTTP 200 with stripped content. Build explicit validation:
- Badge count in response should be non-zero if you are querying a bestseller-focused search
- JSON-LD block should always be present on listing pages (its absence = soft block)
- Review count of 0 on a listing with 4.8 stars is a signal you were served degraded HTML
When you detect a soft block, discard the IP, add a 30-second delay, and retry on a new session. Do not retry the same URL immediately on the same IP. Log soft-block rate per proxy provider to tune your rotation strategy.
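The listing-page checks above can be sketched as a validator that returns the reasons a response looks degraded. The function name soft_block_reasons is hypothetical, and the record keys (rating, review_count) assume the dict shape returned by fetch_listing earlier in this guide.

```python
import re

JSON_LD_RE = re.compile(r'<script[^>]*type="application/ld\+json"[^>]*>', re.I)


def _as_number(value) -> float:
    """Coerce a rating or count to a float, treating missing or junk values as 0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def soft_block_reasons(html: str, record: dict) -> list[str]:
    """Return the reasons a listing response looks degraded (empty list = looks fine)."""
    reasons = []
    if not JSON_LD_RE.search(html):
        reasons.append("missing JSON-LD block")
    # A star rating with no reviews behind it suggests stripped content.
    if _as_number(record.get("rating")) > 0 and _as_number(record.get("review_count")) == 0:
        reasons.append("rating present but review count is zero or missing")
    return reasons
```

Treat any non-empty result as a soft block: skip the write, retire the IP, and log the reasons alongside the proxy provider so you can track soft-block rate per provider.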
The general discipline here applies across scraping targets. When you hit dynamic sites like boutique recruitment portals you see the same pattern: HTTP 200 with missing data fields is often more dangerous than an explicit 403, because it silently corrupts your dataset.
For the full picture on Etsy's data model including seller-level metrics and shop statistics, the Etsy product and seller data scraping guide on DRT covers the shop endpoint structure and pagination in detail.
Bottom line
Use curl_cffi with Chrome impersonation, residential or mobile proxies rotated at the session level, and validate every response for soft-block signals before writing to your dataset. The autocomplete API is the fastest route to trending tag data and worth hitting separately from the listing crawl. DRT covers this category of scraping target in depth, so check back as Etsy's detection stack evolves through 2026.