BigCommerce powers roughly 45,000 live stores as of 2026, and unlike Shopify or WooCommerce, it ships with a public Storefront GraphQL API enabled by default on almost every store. That single fact changes your scraping strategy completely: reach for the API first, fall back to HTML only when you have to.
Why BigCommerce Is Easier to Scrape Than Most Platforms
BigCommerce separates its catalog layer from its storefront rendering in a way most platforms do not. Every store exposes a `/graphql` endpoint on the store domain that accepts unauthenticated POST requests for product, category, and collection data. Read-only catalog queries on the storefront API require no API key, only an `x-bc-storefront-api-token` for rate-limited operations.
Compare this to the fingerprinting-heavy JavaScript rendering you deal with on How to Scrape Shopify Stores at Scale 2026 (Without Getting Blocked), where headless Chrome is often unavoidable. BigCommerce catalog data is available over plain HTTP with a simple POST body.
The Storefront GraphQL API: Your First Stop
Send a POST to `https://{store-domain}/graphql` with `Content-Type: application/json`. No auth header is needed for basic product queries.
```python
import httpx

STORE = "https://example-store.mybigcommerce.com"

QUERY = """
{
  site {
    products(first: 50) {
      edges {
        node {
          entityId
          name
          sku
          prices {
            price { value currencyCode }
            salePrice { value currencyCode }
          }
          brand { name }
          inventory { aggregated { availableToSell } }
        }
      }
      pageInfo { hasNextPage endCursor }
    }
  }
}
"""

resp = httpx.post(
    f"{STORE}/graphql",
    json={"query": QUERY},
    headers={"Content-Type": "application/json"},
    timeout=15,
)
data = resp.json()["data"]["site"]["products"]
```

Paginate with `products(first: 50, after: "{endCursor}")` until `hasNextPage` is false. Most stores return 50 products per page; pushing to 250 sometimes works but triggers rate limits faster.
For category trees, query `site.categoryTree` on the same endpoint. Brand and variant data live under `node.brand` and `node.variants` respectively. If a store uses a custom storefront (Stencil theme with heavy JS), the GraphQL endpoint is still present unless the merchant has explicitly disabled it, which is rare.
HTML Fallback: When the API Isn’t Enough
Reviews, store policies, custom metafields, and some promotional content only appear in the rendered HTML. BigCommerce Stencil themes have predictable class patterns.
Key selectors across most Stencil themes:
- Product name: `.productView-title`
- Price: `.price--withoutTax` or `.price--withTax`
- Review count: `.productView-reviewLink`
- Category breadcrumb: `.breadcrumbs`
- Pagination: `.pagination-list`
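A minimal fallback parser over those selectors might look like the sketch below (BeautifulSoup assumed; real themes vary, so treat every selector as a best-effort default):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def parse_product_page(html):
    """Pull fields the GraphQL API does not cover out of a rendered Stencil page."""
    soup = BeautifulSoup(html, "html.parser")

    def text(selector):
        el = soup.select_one(selector)
        return el.get_text(strip=True) if el else None

    return {
        "name": text(".productView-title"),
        "price": text(".price--withoutTax") or text(".price--withTax"),
        "review_count": text(".productView-reviewLink"),
        "breadcrumbs": [li.get_text(strip=True) for li in soup.select(".breadcrumbs li")],
    }
```

Returning `None` for missing fields keeps downstream normalization simple when a theme deviates from the defaults.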
BigCommerce does run Cloudflare on many stores. For HTML scraping, rotate residential proxies and keep request rates below 1 req/sec per domain. The same anti-bot discipline applies here as on How to Scrape WooCommerce Stores 2026: Pattern Recognition Approach, where recognizing platform-specific patterns lets you stay under detection thresholds.
Rate Limits and Error Handling
The Storefront GraphQL API enforces rate limits by IP and by store. Common responses you will encounter:
| HTTP Code | Meaning | Action |
|---|---|---|
| 200 | Success | Parse normally |
| 429 | Rate limited | Back off 30-60s, retry with jitter |
| 403 | CF challenge or store disabled API | Switch IP, try HTML fallback |
| 503 | Store temporarily offline | Retry after 120s |
| 200 + errors key | GraphQL validation error | Check query syntax |
Note that BigCommerce returns HTTP 200 even for GraphQL errors, so always check `resp.json().get("errors")` before processing. A missing `data` key or a populated `errors` array means the query failed, even though the HTTP layer reported success.
For large crawls across many stores, the recommended approach mirrors what works for How to Scrape Magento Stores in 2026: API and HTML Patterns: detect the platform first, then route to the right extraction method. BigCommerce stores are identifiable by the `x-bc-store-hash` response header or by the `/graphql` endpoint answering with a GraphQL-shaped response.
Scaling Across Multiple BigCommerce Stores
If you are aggregating data from hundreds of stores (price intelligence, competitor monitoring, review aggregation), the workflow is:
- Detect the store platform by probing `/graphql` and checking for `x-bc-store-hash` in response headers.
- Extract the store hash from the header or from the Stencil JS bundle on the page (`window.BCData.store_hash`).
- Issue GraphQL catalog queries with cursor-based pagination.
- Fall back to HTML scraping for reviews and promotional blocks.
- Normalize SKUs, prices, and inventory counts into a shared schema before storage.
- Re-crawl on a schedule: full catalog weekly, price/inventory fields daily or every 6 hours for volatile categories.
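The normalization step might target a schema like the one below. Field names are an assumption; the node shape matches the GraphQL query shown earlier:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProductRecord:
    """Shared schema for catalog rows aggregated across stores (illustrative fields)."""
    store_hash: str
    sku: str
    name: str
    price: float
    sale_price: Optional[float]
    available_to_sell: Optional[int]
    currency: str = "USD"

def from_graphql_node(store_hash, node):
    """Map one GraphQL product node into the shared schema, tolerating missing keys."""
    prices = node.get("prices") or {}
    inv = (node.get("inventory") or {}).get("aggregated") or {}
    return ProductRecord(
        store_hash=store_hash,
        sku=node.get("sku", ""),
        name=node["name"],
        price=(prices.get("price") or {}).get("value", 0.0),
        sale_price=(prices.get("salePrice") or {}).get("value"),
        available_to_sell=inv.get("availableToSell"),
        currency=(prices.get("price") or {}).get("currencyCode", "USD"),
    )
```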
Store hashes also let you hit the BCApp CDN directly for product images rather than downloading from the store domain, which cuts your bandwidth and avoids storefront-level bot detection entirely.
For comparison, platforms like Wix or Squarespace give you almost none of these structural handholds. If you have worked through How to Scrape Wix and Squarespace Stores in 2026, the BigCommerce GraphQL API will feel like a gift.
Proxy and Header Strategy
For the GraphQL endpoint, datacenter proxies usually work fine since the API is not as aggressively fingerprinted as rendered storefronts. Use residential proxies when:
- The store has Cloudflare in front and you start seeing 403s on GraphQL POSTs.
- You need to scrape region-specific pricing (geo-targeting is common in B2B BigCommerce stores).
- You are scraping HTML pages for review content.
Keep your `User-Agent` set to a modern Chrome string and include `Accept-Language: en-US,en;q=0.9`. BigCommerce does not check `Referer` on GraphQL requests, but some Cloudflare rules do.
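One plausible baseline header set (the Chrome version string is illustrative and should be kept current):

```python
HEADERS = {
    # Modern Chrome UA string; rotate or update periodically
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Content-Type": "application/json",
}
```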
Review Data via the Public REST API
A lesser-known fact: BigCommerce also exposes a public v2 REST endpoint for product reviews on some stores at `/api/v2/products/{id}/reviews.json`. No auth is needed if the store has not locked it down. Check for a 403 or a redirect to determine whether a given store has restricted it. This is faster than parsing HTML review widgets and gives you structured JSON with rating, author, and date fields.
If you are building a broader review aggregation pipeline that spans multiple software directories, the extraction patterns overlap significantly with what is covered in How to Scrape G2.com and Capterra SaaS Reviews Programmatically.
Bottom Line
BigCommerce is one of the more scraper-friendly ecommerce platforms because the Storefront GraphQL API removes the need for headless rendering on most catalog data. Start with GraphQL, layer in HTML fallback for reviews and promotions, handle 429s with exponential backoff, and use residential proxies only when Cloudflare starts blocking your GraphQL POSTs. If you need coverage across all major ecommerce platforms, DRT maintains platform-specific guides for each one as part of its ongoing web scraping infrastructure series.
Related guides on dataresearchtools.com
- How to Scrape Shopify Stores at Scale 2026 (Without Getting Blocked)
- How to Scrape WooCommerce Stores 2026: Pattern Recognition Approach
- How to Scrape Magento Stores in 2026: API and HTML Patterns
- How to Scrape Wix and Squarespace Stores in 2026
- How to Scrape G2.com and Capterra SaaS Reviews Programmatically