How to Scrape Wix and Squarespace Stores in 2026
Scraping Wix and Squarespace stores is harder than most ecommerce targets, not because they have better bot protection, but because neither platform exposes a clean public API for product data. what you get instead is heavily client-rendered HTML, proprietary JSON blobs buried inside <script> tags, and JavaScript-dependent pagination that breaks naive scrapers on the first request. this guide covers what actually works in 2026, with platform-specific patterns, tool picks, and honest tradeoffs.
how Wix and Squarespace serve product data
both platforms render product catalogs via JavaScript frameworks, not server-side HTML. that single fact drives every decision you’ll make downstream.
Wix (now Wix Studio for most new stores) loads product data through its internal _api/wix-ecommerce-storefront-web/api endpoint. the JSON response is structured and predictable once you find it. Wix also injects a window.__VIEWER_MODEL__ object into the page source on many storefronts. it holds catalog state, so you can read it without a browser render.
Squarespace uses its own Commerce API (/api/2/commerce/) internally. product collections are served as JSON at /api/2/commerce/products with pagination via cursor. the HTML source usually includes a Static.SQUARESPACE_CONTEXT JSON block with store metadata, active collection IDs, and sometimes a partial product list.
this is a different situation from How to Scrape Shopify Stores at Scale 2026 (Without Getting Blocked), where /products.json is a public, documented endpoint anyone can hit. Wix and Squarespace require reverse-engineering internal APIs. there are no docs.
fingerprinting the platform before you write a single line
check which platform you’re dealing with first. they look similar from the outside and the scraper for one won’t work on the other.
| signal | Wix | Squarespace |
|---|---|---|
| HTTP response header | X-Wix-Request-Id present | X-ServedBy: squarespace |
| HTML source | window.rendererModel or __VIEWER_MODEL__ | Static.SQUARESPACE_CONTEXT block |
| asset CDN | static.parastorage.com | static1.squarespace.com |
| robots.txt | disallows on /_api/ | usually /api/ blocked |
a curl request to the root URL and a quick grep take under two seconds. that is more reliable than URL guessing or favicon matching. if you’re running a multi-platform pipeline that also handles Magento or BigCommerce, the detection approaches in How to Scrape Magento Stores in 2026: API and HTML Patterns and How to Scrape BigCommerce Stores Programmatically (2026) follow the same logic.
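the same check can be sketched in Python. this is a minimal example that only uses the signals from the table above; the function name and the order of the checks are assumptions, and it expects you to pass in the headers and body of a root-page response you already fetched.

```python
from typing import Mapping

# Markers taken from the table above; matching is plain substring search.
WIX_HTML_MARKERS = ("__VIEWER_MODEL__", "window.rendererModel", "static.parastorage.com")
SQSP_HTML_MARKERS = ("Static.SQUARESPACE_CONTEXT", "static1.squarespace.com")


def detect_platform(headers: Mapping[str, str], html: str) -> str:
    """Return 'wix', 'squarespace', or 'unknown' from one root-page response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    if "x-wix-request-id" in lowered:
        return "wix"
    if "squarespace" in lowered.get("x-servedby", ""):
        return "squarespace"
    # Fall back to the HTML body when the headers give no answer.
    if any(marker in html for marker in WIX_HTML_MARKERS):
        return "wix"
    if any(marker in html for marker in SQSP_HTML_MARKERS):
        return "squarespace"
    return "unknown"
```

with httpx this would be called as detect_platform(r.headers, r.text) on the response for the store's root URL.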
scraping Wix stores
the viewer model path
fastest option: parse window.__VIEWER_MODEL__ directly from the HTML. no browser, no JS execution.
```python
import json
import re

import httpx


def fetch_wix_viewer_model(url: str) -> dict:
    """Return the window.__VIEWER_MODEL__ object from a Wix page, or {} if absent."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    r = httpx.get(url, headers=headers, follow_redirects=True, timeout=15)
    # capture the object literal up to the ';</script>' that closes the assignment
    match = re.search(
        r'window\.__VIEWER_MODEL__\s*=\s*(\{.+?\})(?=;\s*</script>)',
        r.text, re.DOTALL
    )
    if match:
        return json.loads(match.group(1))
    return {}
```
this works on roughly 60-70% of Wix storefronts. for the rest — newer Wix Studio builds mostly — you POST to _api/wix-ecommerce-storefront-web/api directly using the store’s metaSiteId. find it in the viewer model or the page source. responses are paginated JSON with products under a predictable key path.
things that will trip you up:
- Wix rate-limits by IP around 1 req/2s. rotate residential proxies after every 50-80 requests
- _api/ endpoints return 403 without X-Wix-Brand or with an unexpected Origin header. mirror these from a real browser session on the target store
- product variants are nested under productItems inside each product object, not at the top level. easy to miss
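the nested-variant point is easier to see in code. this is a rough sketch of flattening one product object into one row per variant. only the productItems key comes from the text above; the id and name fields and the output row shape are hypothetical and will need adjusting to the response you actually see.

```python
from typing import Any, Iterator


def iter_variants(product: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per variant found under product['productItems']."""
    # 'productItems' is the key named above; 'id' and 'name' are assumed field names.
    parent = {"product_id": product.get("id"), "product_name": product.get("name")}
    items = product.get("productItems") or []
    if not items:
        # Products without variants still produce one row so nothing is dropped.
        yield {**parent, "variant": None}
        return
    for item in items:
        yield {**parent, "variant": item}
```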
Wix anti-bot
higher-traffic stores run Imperva (Incapsula). you’ll hit a JS challenge page. Playwright with rebrowser-patches plus rotating residential proxies clears it reliably. pure httpx won’t — don’t try to solve Imperva challenges in a pure HTTP client, it’s a time sink.
scraping Squarespace stores
Squarespace’s internal Commerce API is the easier of the two. products paginate at /api/2/commerce/products with a cursor param. the sequence:
- GET /api/2/commerce/products?per_page=200: first page plus pagination.nextPageCursor
- GET /api/2/commerce/products?per_page=200&cursor=<nextPageCursor>: repeat until pagination.hasNextPage is false
- parse items[] from each response. each item has variants, pricing, images, categories
the per_page cap is 200. most stores have under 2,000 products, so you’re looking at maybe 1-10 requests per store. no auth required on public storefronts. it’s almost too easy.
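the cursor sequence above can be sketched as a generator. the path, the per_page and cursor params, and the pagination and items keys are the ones described in this section. fetch_page is a hypothetical callable you supply (it takes a path and query params and returns the decoded JSON), and max_pages is an assumed safety limit, not a platform value.

```python
from typing import Any, Callable, Iterator, Optional

PRODUCTS_PATH = "/api/2/commerce/products"


def iter_squarespace_products(
    fetch_page: Callable[[str, dict[str, str]], dict[str, Any]],
    per_page: int = 200,
    max_pages: int = 100,
) -> Iterator[dict[str, Any]]:
    """Walk the cursor chain and yield every entry from items[]."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop in case the cursor never terminates
        params = {"per_page": str(per_page)}
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(PRODUCTS_PATH, params)
        yield from page.get("items", [])
        pagination = page.get("pagination", {})
        cursor = pagination.get("nextPageCursor")
        if not pagination.get("hasNextPage") or not cursor:
            break
```

with httpx, fetch_page could be as small as lambda path, params: httpx.get(base_url + path, params=params, timeout=15).json(), where base_url is the store's root URL. keeping the HTTP call outside the generator makes it easy to add proxies or throttling later without touching the pagination logic.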
password-protected stores redirect /api/2/ calls to the password page. check Content-Type on the response. JSON means open, HTML means gated. skip and move on.
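the Content-Type check is a one-liner, sketched here as a helper. the function name is an assumption, and treating any "+json" media type as open is a convenience choice rather than documented Squarespace behavior.

```python
def is_open_storefront(content_type: str) -> bool:
    """True when the Content-Type indicates JSON, False for HTML or anything else."""
    # Strip parameters such as '; charset=utf-8' before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

call it with r.headers.get("content-type", "") before attempting to decode the body.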
unlike the pattern-matching grunt work needed for How to Scrape WooCommerce Stores 2026: Pattern Recognition Approach, Squarespace gives you a consistent API surface regardless of theme. you write the scraper once and it works on nearly every open store.
proxy and rate-limit strategy
both platforms use Cloudflare CDN for static assets, but API traffic runs through different stacks. from testing in 2026:
- Wix: soft rate-limit around 1 req/2s per IP, 429 with retry-after
- Squarespace: more lenient, roughly 5-10 req/s per IP before a temporary block
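those limits translate into a simple pacing rule, sketched below. the intervals are derived from the figures above (2 s for Wix, the low end of the Squarespace range). the function name and the 5x fallback when a 429 arrives without a usable Retry-After value are assumptions.

```python
from typing import Mapping, Optional

# Minimum spacing between requests: 1 request per 2 s for Wix,
# and 5 req/s (the low end of the range) for Squarespace.
MIN_INTERVAL_SECONDS = {"wix": 2.0, "squarespace": 0.2}


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    # Case-insensitive lookup that also works with plain dicts.
    for key, value in headers.items():
        if key.lower() == name:
            return value
    return None


def next_delay(platform: str, status: int, headers: Mapping[str, str]) -> float:
    """Seconds to wait before the next request from the same IP."""
    base = MIN_INTERVAL_SECONDS[platform]
    if status != 429:
        return base
    retry_after = _header(headers, "retry-after")
    # Retry-After may also be an HTTP date; this sketch only handles the seconds form.
    if retry_after is not None and retry_after.strip().isdigit():
        return max(base, float(retry_after.strip()))
    return base * 5  # assumed fallback when the header is missing or not numeric
```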
for Wix, residential proxies are not optional for any serious run. datacenter IPs get blocked at the Imperva layer before they ever reach product data. for Squarespace, datacenter proxies from Bright Data or Oxylabs work fine on most stores — which cuts cost a lot if you’re doing bulk collection.
rules that hold up in practice:
- rotate IP every 50 Wix _api/ requests, or immediately on any 403
- rotate every 200 Squarespace /api/2/commerce/ requests, or on any 429
- keep User-Agent and Accept-Language consistent within a session. rotating headers independently from IPs creates a fingerprint mismatch and triggers blocks faster than the rate limit would
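the first two rules can be sketched as a small rotator. the budgets and trigger statuses come from the list above. the class name, the round-robin pool, and the proxy strings are hypothetical, and real proxy providers often handle rotation on their side.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator

# Rotation rules from the list above: (request budget per IP, status that forces rotation).
RULES = {"wix": (50, 403), "squarespace": (200, 429)}


@dataclass
class ProxyRotator:
    platform: str
    proxies: list[str]

    def __post_init__(self) -> None:
        self._pool: Iterator[str] = cycle(self.proxies)
        self.current = next(self._pool)
        self._used = 0

    def record(self, status: int) -> str:
        """Register one finished request and return the proxy to use for the next one."""
        budget, block_status = RULES[self.platform]
        self._used += 1
        if status == block_status or self._used >= budget:
            self.current = next(self._pool)
            self._used = 0
        return self.current
```

for the third rule, one option is to build a fresh header set whenever record() returns a different proxy and leave it untouched until the next rotation, so headers and IP always change together.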
common failure modes
- empty items[] on Squarespace: store uses a non-default catalog structure. check Static.SQUARESPACE_CONTEXT for activePageCollections, swap in the right collection ID
- Wix viewer model missing products: the store runs on Wix Blocks. fall back to Playwright and intercept XHR to _api/wix-ecommerce-storefront-web/
- 403 on Wix _api/: missing X-Wix-Brand or wrong Referer. copy headers from a live browser session on that exact store, not from a different Wix site
- Squarespace returns HTML not JSON: password-protected or in maintenance mode. skip it
- Wix pagination stops early: metaSiteId mismatch. extract the ID from each target URL independently, never reuse across stores
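for the first failure mode, you need the Static.SQUARESPACE_CONTEXT block out of the HTML. here is a minimal sketch that uses the standard library's JSON decoder instead of a regex. the function name is an assumption, and so is the idea that the block appears as a plain object literal right after the marker. the shape of activePageCollections is not documented, so inspect it on a real store before relying on it.

```python
import json
from typing import Any

MARKER = "Static.SQUARESPACE_CONTEXT"


def extract_squarespace_context(html: str) -> dict[str, Any]:
    """Return the JSON object assigned to Static.SQUARESPACE_CONTEXT, or {} if absent."""
    start = html.find(MARKER)
    if start == -1:
        return {}
    brace = html.find("{", start)
    if brace == -1:
        return {}
    try:
        # raw_decode parses one JSON value and ignores whatever follows it,
        # so nested braces and the trailing ';</script>' need no regex handling.
        context, _ = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return {}
    return context if isinstance(context, dict) else {}
```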
bottom line
Squarespace is the easier target — consistent API, no browser required, minimal anti-bot. Wix takes more setup: viewer model extraction, Playwright fallback for JS-heavy stores, residential proxies if Imperva shows up. if you’re building a multi-platform ecommerce scraper, validate on Squarespace first, then add Wix. dataresearchtools.com covers the full ecommerce scraping stack — the same reverse-engineering approach here applies to any headless-first storefront you’ll run into in 2026.