How to Scrape Wix and Squarespace Stores in 2026

Scraping Wix and Squarespace stores is harder than most ecommerce targets, not because they have better bot protection, but because neither platform exposes a clean public API for product data. what you get instead is heavily client-rendered HTML, proprietary JSON blobs buried inside <script> tags, and JavaScript-dependent pagination that breaks naive scrapers on the first request. this guide covers what actually works in 2026, with platform-specific patterns, tool picks, and honest tradeoffs.

how Wix and Squarespace serve product data

both platforms render product catalogs via JavaScript frameworks, not server-side HTML. that single fact drives every decision you’ll make downstream.

Wix (now Wix Studio for most new stores) loads product data through its internal _api/wix-ecommerce-storefront-web/api endpoint. the JSON response is structured and predictable once you find it. Wix also injects a window.__VIEWER_MODEL__ object into the page source on many storefronts, which carries catalog state you can read without rendering the page in a browser.

Squarespace uses its own Commerce API (/api/2/commerce/) internally. product collections are served as JSON at /api/2/commerce/products with pagination via cursor. the HTML source usually includes a Static.SQUARESPACE_CONTEXT JSON block with store metadata, active collection IDs, and sometimes a partial product list.
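both embedded blobs can be pulled out of raw HTML the same way. here is a minimal sketch, assuming the blob is strict JSON assigned as an object literal right after the marker name. extract_assigned_json is a hypothetical helper, not part of either platform, and it leans on json.JSONDecoder.raw_decode so trailing script content is ignored.

```python
import json
from typing import Any, Optional


def extract_assigned_json(html: str, marker: str) -> Optional[Any]:
    """Return the JSON value assigned right after `marker`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    # The object literal is assumed to begin at the first "{" after the marker.
    brace = html.find("{", start + len(marker))
    if brace == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        value, _end = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return None
    return value
```

call it with "Static.SQUARESPACE_CONTEXT" or "window.__VIEWER_MODEL__" as the marker. a None result means the page does not embed the blob in this form and you need one of the fallbacks.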

this is a different situation from How to Scrape Shopify Stores at Scale 2026 (Without Getting Blocked), where /products.json is a public, documented endpoint anyone can hit. Wix and Squarespace require reverse-engineering internal APIs. there are no docs.

fingerprinting the platform before you write a single line

check which platform you’re dealing with first. they look similar from the outside and the scraper for one won’t work on the other.

| signal | Wix | Squarespace |
| --- | --- | --- |
| HTTP response header | `X-Wix-Request-Id` present | `X-ServedBy: squarespace` |
| HTML source | `window.rendererModel` or `__VIEWER_MODEL__` | `Static.SQUARESPACE_CONTEXT` block |
| asset CDN | `static.parastorage.com` | `static1.squarespace.com` |
| robots.txt | disallows on `/_api/` | usually `/api/` blocked |

a curl request to the root URL and a quick grep take under two seconds. that is more reliable than URL guessing or favicon matching. if you’re running a multi-platform pipeline that also handles Magento or BigCommerce, the detection approaches in How to Scrape Magento Stores in 2026: API and HTML Patterns and How to Scrape BigCommerce Stores Programmatically (2026) follow the same logic.
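the same check is easy to script. a minimal sketch, assuming you already have the response headers and HTML of the root page. detect_platform and the two marker tuples are hypothetical names, and the marker strings are the signals listed above, which individual sites may not all emit.

```python
from typing import Mapping

# Markers taken from the signals listed above; real sites may vary.
WIX_HTML_MARKERS = ("window.rendererModel", "__VIEWER_MODEL__", "static.parastorage.com")
SQUARESPACE_HTML_MARKERS = ("Static.SQUARESPACE_CONTEXT", "static1.squarespace.com")


def detect_platform(headers: Mapping[str, str], html: str) -> str:
    """Return 'wix', 'squarespace', or 'unknown' from one root-page response."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}

    if "x-wix-request-id" in lowered:
        return "wix"
    if "squarespace" in lowered.get("x-servedby", ""):
        return "squarespace"

    # Fall back to markers in the HTML source.
    if any(marker in html for marker in WIX_HTML_MARKERS):
        return "wix"
    if any(marker in html for marker in SQUARESPACE_HTML_MARKERS):
        return "squarespace"
    return "unknown"
```

headers are checked first because they are cheap and hard to confuse. the HTML markers only matter when a proxy or CDN strips the platform headers.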

scraping Wix stores

the viewer model path

fastest option: parse window.__VIEWER_MODEL__ directly from the HTML. no browser, no JS execution.

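import json
import re

import httpx

# Lazily match the object literal assigned to window.__VIEWER_MODEL__,
# stopping at the ";" that closes its <script> tag.
VIEWER_MODEL_RE = re.compile(
    r'window\.__VIEWER_MODEL__\s*=\s*(\{.+?\})(?=;\s*</script>)',
    re.DOTALL,
)


def fetch_wix_viewer_model(url: str) -> dict:
    """Return the parsed viewer model, or {} if the page does not embed one."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    }
    r = httpx.get(url, headers=headers, follow_redirects=True, timeout=15)
    match = VIEWER_MODEL_RE.search(r.text)
    if not match:
        return {}
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}  # the matched text was not strict JSON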

this works on roughly 60-70% of Wix storefronts. for the rest — newer Wix Studio builds mostly — you POST to _api/wix-ecommerce-storefront-web/api directly using the store’s metaSiteId. find it in the viewer model or the page source. responses are paginated JSON with products under a predictable key path.

things that will trip you up:

Wix rate-limits by IP around 1 req/2s. rotate residential proxies after every 50-80 requests
_api/ endpoints return 403 without X-Wix-Brand or with an unexpected Origin header. mirror these from a real browser session on the target store
product variants are nested under productItems inside each product object, not at the top level. easy to miss
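the variant nesting is the gotcha most worth handling in code. a minimal sketch of flattening it, assuming each product is a dict with a productItems list as described above. the id, name, and price keys are illustrative guesses, not confirmed field names, so check them against a real response.

```python
from typing import Any, Iterator


def iter_variant_rows(product: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per variant, or one row for a variant-less product."""
    # "id", "name" and "price" are assumed field names for illustration.
    base = {"product_id": product.get("id"), "name": product.get("name")}
    variants = product.get("productItems") or []
    if not variants:
        yield {**base, "variant_id": None, "price": product.get("price")}
        return
    for item in variants:
        yield {
            **base,
            "variant_id": item.get("id"),
            # Fall back to the product-level price when the variant has none.
            "price": item.get("price", product.get("price")),
        }
```

emitting one row per variant keeps downstream price comparisons simple, since every row carries its parent product ID.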

Wix anti-bot

higher-traffic stores run Imperva (Incapsula). you’ll hit a JS challenge page. Playwright with rebrowser-patches plus rotating residential proxies clears it reliably. pure httpx won’t — don’t try to solve Imperva challenges in a pure HTTP client, it’s a time sink.

scraping Squarespace stores

Squarespace’s internal Commerce API is the easier of the two. products paginate at /api/2/commerce/products with a cursor param. the sequence:

GET /api/2/commerce/products?per_page=200 — first page plus pagination.nextPageCursor
GET /api/2/commerce/products?per_page=200&cursor=<nextPageCursor> — repeat until pagination.hasNextPage is false
parse items[] from each response — each has variants, pricing, images, categories

the per_page cap is 200. most stores have under 2,000 products, so you’re looking at maybe 1-10 requests per store. no auth required on public storefronts. it’s almost too easy.
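the sequence above can be sketched as a generator. this assumes the response shape described here (items, pagination.nextPageCursor, pagination.hasNextPage). iter_squarespace_products and the fetch_json callable are hypothetical names, and the HTTP call is injected so the loop is not tied to one client.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode

PRODUCTS_PATH = "/api/2/commerce/products"  # internal path named in the text


def iter_squarespace_products(
    base_url: str,
    fetch_json: Callable[[str], dict[str, Any]],
    per_page: int = 200,
    max_pages: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield products page by page until hasNextPage is false."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop so a bad cursor cannot loop forever
        params: dict[str, Any] = {"per_page": per_page}
        if cursor:
            params["cursor"] = cursor
        page = fetch_json(f"{base_url.rstrip('/')}{PRODUCTS_PATH}?{urlencode(params)}")

        yield from page.get("items", [])

        pagination = page.get("pagination", {})
        cursor = pagination.get("nextPageCursor")
        if not pagination.get("hasNextPage") or not cursor:
            break
```

with httpx, something like fetch_json=lambda u: httpx.get(u, timeout=15).json() would do. the max_pages guard is a design choice of this sketch, not something the API requires.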

password-protected stores redirect /api/2/ calls to the password page. check Content-Type on the response. JSON means open, HTML means gated. skip gated stores and move on.
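the Content-Type check is a few lines. a minimal sketch, where is_json_response is a hypothetical helper that takes the raw header value:

```python
from typing import Optional


def is_json_response(content_type: Optional[str]) -> bool:
    """True when a Content-Type header value denotes JSON."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")
```

run it on the final response after redirects. a False result on a products URL is the signal to mark the store as gated.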

unlike the pattern-matching grunt work needed for How to Scrape WooCommerce Stores 2026: Pattern Recognition Approach, Squarespace gives you a consistent API surface regardless of theme. you write the scraper once and it works on every store.

proxy and rate-limit strategy

both platforms use Cloudflare CDN for static assets, but API traffic runs through different stacks. from testing in 2026:

Wix: soft rate-limit around 1 req/2s per IP, 429 with retry-after
Squarespace: more lenient, roughly 5-10 req/s per IP before a temporary block
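when a 429 carries a Retry-After header, honor it rather than guessing. a minimal sketch of turning that header into a wait time, assuming the standard HTTP forms (delta-seconds or an HTTP-date). retry_after_seconds is a hypothetical helper, and the 2-second default simply mirrors the Wix pacing above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    value: Optional[str],
    default: float = 2.0,
    now: Optional[datetime] = None,
) -> float:
    """Convert a Retry-After header value into a non-negative wait in seconds."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "5"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header, fall back to the default pacing
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

sleep for the returned value before retrying on the same IP, or rotate instead if the wait is longer than your budget for that store.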

for Wix, residential proxies are not optional for any serious run. datacenter IPs get blocked at the Imperva layer before they ever reach product data. for Squarespace, datacenter proxies from Bright Data or Oxylabs work fine on most stores — which cuts cost a lot if you’re doing bulk collection.

rules that hold up in practice:

rotate IP every 50 Wix _api/ requests, or immediately on any 403
rotate every 200 Squarespace /api/2/commerce/ requests, or on any 429
keep User-Agent and Accept-Language consistent within a session. rotating headers independently from IPs creates a fingerprint mismatch and triggers blocks faster than the rate limit would
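the rotation rules above can be sketched as a small state machine. RotationRule, RULES, and ProxyRotator are hypothetical names, the thresholds are the ones listed here, and the proxy strings are whatever your provider gives you.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RotationRule:
    max_requests: int      # rotate after this many requests on one IP
    rotate_on_status: int  # rotate immediately on this HTTP status


# Thresholds copied from the rules above.
RULES = {
    "wix": RotationRule(max_requests=50, rotate_on_status=403),
    "squarespace": RotationRule(max_requests=200, rotate_on_status=429),
}


class ProxyRotator:
    """Hand out one proxy at a time and rotate per the platform's rule."""

    def __init__(self, proxies: Iterable[str], platform: str) -> None:
        # The pool is assumed to be non-empty; cycle() loops over it forever.
        self._pool: Iterator[str] = cycle(list(proxies))
        self._rule = RULES[platform]
        self._used = 0
        self.current = next(self._pool)

    def record(self, status_code: int) -> str:
        """Record one finished request; return the proxy for the next one."""
        self._used += 1
        if status_code == self._rule.rotate_on_status or self._used >= self._rule.max_requests:
            self.current = next(self._pool)
            self._used = 0
        return self.current
```

call record() after every response and use its return value for the next request. to respect the header rule, pick one User-Agent and Accept-Language per proxy and change them only when the proxy changes.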

common failure modes

empty items[] on Squarespace: store uses a non-default catalog structure. check Static.SQUARESPACE_CONTEXT for activePageCollections, swap in the right collection ID
Wix viewer model missing products: the store runs on Wix Blocks. fall back to Playwright and intercept XHR to _api/wix-ecommerce-storefront-web/
403 on Wix _api/: missing X-Wix-Brand or wrong Referer. copy headers from a live browser session on that exact store, not from a different Wix site
Squarespace returns HTML not JSON: password-protected or in maintenance mode. skip it
Wix pagination stops early: metaSiteId mismatch. extract the ID from each target URL independently, never reuse across stores

bottom line

Squarespace is the easier target — consistent API, no browser required, minimal anti-bot. Wix takes more setup: viewer model extraction, Playwright fallback for JS-heavy stores, residential proxies if Imperva shows up. if you’re building a multi-platform ecommerce scraper, validate on Squarespace first, then add Wix. dataresearchtools.com covers the full ecommerce scraping stack — the same reverse-engineering approach here applies to any headless-first storefront you’ll run into in 2026.