How to Scrape Magento Stores in 2026: API and HTML Patterns

Magento powers a surprising chunk of mid-market and enterprise ecommerce. Scraping it in 2026 means knowing which version you're dealing with, whether the store exposes its REST or GraphQL API, and how aggressive its bot mitigation is. The platform is actually more scraper-friendly than most, if you know where to look.

Detect the Magento Version First

Before writing a single line of scraper code, confirm that the target runs Magento and which generation it is. Magento 1 reached end of life in June 2020 but still runs on thousands of stores. Magento 2 (Adobe Commerce / Magento Open Source) is the default target.

Quick fingerprint signals:

/skin/frontend/ in asset paths = Magento 1
/static/version<number>/frontend/ in asset paths = Magento 2 (the number is the static-content deploy version)
X-Magento-Cache-Id response header = Magento 2
Mage.Cookies in page source = Magento 1

curl -sI https://example.com/ | grep -i magento
curl -s https://example.com/ | grep -o 'static/version[^/]*'
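The same signals can run as a preflight check inside a scraper. A minimal sketch, assuming you already have the homepage HTML and response headers in hand; `detect_magento_generation` is a hypothetical helper name. The checks are heuristics (a CDN can strip the header, a custom theme can move asset paths), so treat "unknown" as "look closer", not "not Magento".

```python
import re
from typing import Mapping


def detect_magento_generation(html: str, headers: Mapping[str, str]) -> str:
    """Return "magento2", "magento1", or "unknown" from the fingerprint signals.

    Heuristic only: CDNs can strip headers and custom themes can move assets.
    """
    # Header names are case-insensitive, so compare them lowercased.
    header_names = {name.lower() for name in headers}
    if "x-magento-cache-id" in header_names:
        return "magento2"
    # Magento 2 serves signed static assets under /static/version<...>/frontend/.
    if re.search(r"/static/version[^/]+/frontend/", html):
        return "magento2"
    # Magento 1 themes load assets from /skin/frontend/ and define Mage.Cookies in inline JS.
    if "/skin/frontend/" in html or "Mage.Cookies" in html:
        return "magento1"
    return "unknown"
```

With httpx, `r = httpx.get(url)` followed by `detect_magento_generation(r.text, r.headers)` works, since `r.headers` is a mapping of header names to values.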

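Magento 1 does ship SOAP/XML-RPC and REST APIs, but they require an API user or OAuth credentials and are rarely open to anonymous callers, so in practice Magento 1 stores mean pure HTML parsing (covered below). Magento 2 is the main event.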

Magento 2 REST and GraphQL APIs

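Magento 2 ships with a full REST API and a GraphQL endpoint. GraphQL catalog queries are anonymous by design, because headless storefronts such as PWA Studio depend on them. REST is stricter: the catalog endpoints answer without a token only if the admin has set Allow Anonymous Guest Access to Yes (Stores > Configuration > Services > Magento Web API > Web API Security), which is off by default, though plenty of stores have switched it on.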

REST API

The base path is /rest/V1/ (stores also accept a store-view code prefix, e.g. /rest/default/V1/). Common catalog endpoints:

Endpoint	Returns
`/rest/V1/products?searchCriteria[pageSize]=50`	product list with full attributes
`/rest/V1/products/{sku}`	single product detail
`/rest/V1/categories`	full category tree
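`/rest/V1/products/{sku}/media`	media gallery entries (image file paths relative to /media/catalog/product/)
`/rest/V1/configurable-products/{sku}/children`	child (variant) products of a configurable SKU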
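For the {sku} endpoints, encode the SKU before putting it in the path, since SKUs can contain spaces or slashes. A minimal sketch with hypothetical helper names and the placeholder store URL. Even when encoded, a slash-containing SKU may still fail on servers that reject %2F in paths (Apache's default AllowEncodedSlashes Off does this), so log those SKUs rather than assume success.

```python
from urllib.parse import quote

BASE = "https://example.com/rest/V1"  # placeholder store


def product_endpoint(sku: str, suffix: str = "") -> str:
    """Build /products/{sku}[/suffix], percent-encoding every reserved character."""
    path = f"{BASE}/products/{quote(sku, safe='')}"
    return f"{path}/{suffix}" if suffix else path


def children_endpoint(sku: str) -> str:
    """Build /configurable-products/{sku}/children for a configurable parent SKU."""
    return f"{BASE}/configurable-products/{quote(sku, safe='')}/children"
```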

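import httpx

BASE = "https://example.com/rest/V1"
params = {
    "searchCriteria[pageSize]": 100,
    "searchCriteria[currentPage]": 1,
    # entity_id is the product table's primary key, so page order stays stable
    "searchCriteria[sortOrders][0][field]": "entity_id",
    "searchCriteria[sortOrders][0][direction]": "ASC",
}
r = httpx.get(f"{BASE}/products", params=params, timeout=15)
r.raise_for_status()  # a 401 here means anonymous REST access is switched off
data = r.json()
products = data["items"]     # product dicts for this page
total = data["total_count"]  # matching products across all pages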

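Paginate by incrementing currentPage. total_count tells you the full catalog size upfront, so you can size your job queue before firing a single extra request, and it is also the safer stop condition: stop after ceil(total_count / pageSize) pages. Don't rely only on a short page (len(products) < pageSize) as the end signal. Magento's collection layer has historically clamped an out-of-range currentPage to the last page, so a catalog whose size is an exact multiple of pageSize never returns a short page and the loop never ends.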
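A minimal sketch of that loop, assuming the placeholder store URL and hypothetical function names. It uses the standard library's urllib instead of httpx so it runs without dependencies, and it takes the page fetcher as a parameter so the stop logic can be exercised without a live store.

```python
import json
import math
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://example.com/rest/V1"  # placeholder store


def fetch_products_page(page: int, page_size: int) -> dict:
    """GET one page of /products (needs anonymous REST access or a token)."""
    params = {
        "searchCriteria[pageSize]": page_size,
        "searchCriteria[currentPage]": page,
        "searchCriteria[sortOrders][0][field]": "entity_id",
        "searchCriteria[sortOrders][0][direction]": "ASC",
    }
    with urlopen(f"{BASE}/products?{urlencode(params)}", timeout=15) as resp:
        return json.load(resp)


def iter_all_products(
    fetch_page: Callable[[int, int], dict] = fetch_products_page,
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield every product, requesting exactly ceil(total_count / page_size) pages."""
    first = fetch_page(1, page_size)
    yield from first["items"]
    last_page = math.ceil(first["total_count"] / page_size)
    for page in range(2, last_page + 1):
        yield from fetch_page(page, page_size)["items"]
```

Because the page count comes from total_count, a store that repeats its last page for an out-of-range currentPage cannot trap the loop.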

GraphQL

Magento 2.3+ has a GraphQL endpoint at /graphql. It's often faster than REST for storefront data because you pull exactly what you need in one round-trip.

{
  products(search: "", pageSize: 50, currentPage: 1) {
    total_count
    items {
      sku
      name
      price_range {
        minimum_price { regular_price { value currency } }
      }
      categories { id name url_key }
    }
  }
}

POST that to /graphql as a JSON body, {"query": "..."}, with Content-Type: application/json. Catalog queries need no auth on most stores. GraphQL also returns bundled product structures and layered navigation filters (the aggregations field) in one shot, which REST fumbles.
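The query above can be sent with nothing beyond the standard library. A minimal sketch, assuming the placeholder store URL and hypothetical helper names (build_graphql_request, run_graphql); it trims the query and moves pageSize and currentPage into standard GraphQL variables so one query string serves every page.

```python
import json
from urllib.request import Request, urlopen

GRAPHQL_URL = "https://example.com/graphql"  # placeholder store

# The query above, trimmed, with paging passed as GraphQL variables.
PRODUCTS_QUERY = """
query Products($pageSize: Int!, $currentPage: Int!) {
  products(search: "", pageSize: $pageSize, currentPage: $currentPage) {
    total_count
    items {
      sku
      name
      price_range { minimum_price { regular_price { value currency } } }
    }
  }
}
"""


def build_graphql_request(query: str, variables: dict, url: str = GRAPHQL_URL) -> Request:
    """Wrap a query and its variables as a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def run_graphql(query: str, variables: dict) -> dict:
    """POST the query and return the "data" object, raising on GraphQL errors."""
    with urlopen(build_graphql_request(query, variables), timeout=15) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Call it as run_graphql(PRODUCTS_QUERY, {"pageSize": 50, "currentPage": 1}) and read ["products"]["total_count"] from the result to plan the remaining pages.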

If you're used to the structured API approach from other platforms, How to Scrape BigCommerce Stores Programmatically (2026) covers a similar REST-first pattern that maps cleanly to Magento's field structure.

HTML Scraping for Magento 1 and API-Blocked Stores

Some stores disable the API entirely, put it behind OAuth, or still run Magento 1. For those, fall back to HTML parsing. Magento's frontend markup is consistent enough that a few selectors cover most themes built on Luma or Blank.

Useful selectors on the default Luma and Blank themes:

product list items: .product-item
product name: .product-item-link
price: .price (or .special-price .price for sale items)
SKU on PDP: [itemprop="sku"]
pagination: the rel="next" link

Magento 1 uses .product-name and .price-box but keeps the same microdata itemprop pattern.
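Because the itemprop attributes survive across Magento 1 and 2 themes better than class names, a PDP extractor can key on them alone. A minimal sketch using the standard library's html.parser; MicrodataParser is a hypothetical name, and the value rules (content, then src, then href, then element text) follow the general microdata convention rather than anything Magento-specific.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "img", "link", "br", "input", "source"}


class MicrodataParser(HTMLParser):
    """Record the first value of each wanted itemprop on a product page (PDP)."""

    def __init__(self, props=("sku", "price", "image", "name")):
        super().__init__()
        self.wanted = set(props)
        self.values: dict = {}
        # [tag, prop, nesting depth, text chunks] while inside a text-valued itemprop
        self._capture = None

    def handle_starttag(self, tag, attrs):
        if self._capture is not None and tag == self._capture[0]:
            self._capture[2] += 1  # nested element with the same tag name
        a = dict(attrs)
        prop = a.get("itemprop")
        if prop not in self.wanted or prop in self.values:
            return
        # Attribute-valued elements: <meta content>, <img src>, <a>/<link href>.
        for attr in ("content", "src", "href"):
            if a.get(attr):
                self.values[prop] = a[attr].strip()
                return
        # Otherwise read the element's text, unless the tag cannot hold any.
        if self._capture is None and tag not in VOID_TAGS:
            self._capture = [tag, prop, 0, []]

    def handle_data(self, data):
        if self._capture is not None:
            self._capture[3].append(data)

    def handle_endtag(self, tag):
        if self._capture is None or tag != self._capture[0]:
            return
        if self._capture[2] > 0:
            self._capture[2] -= 1
            return
        _, prop, _, chunks = self._capture
        self.values[prop] = " ".join("".join(chunks).split())  # collapse whitespace
        self._capture = None
```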

The extraction flow for a category page:

1. Fetch the category URL and parse the .product-item nodes.
2. Extract the href from each .product-item-link to get the PDP URLs.
3. Fetch each PDP and extract [itemprop="sku"], [itemprop="price"], and [itemprop="image"].
4. Check for a rel="next" link and iterate.
5. On configurable products, pull the JSON config that Magento binds to [data-role="swatch-options"] (it sits in a <script type="text/x-magento-init"> block) for full variant data without extra requests.
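Steps 1, 2, and 4 reduce to a loop over category pages. A minimal sketch, assuming `fetch` is any function you supply that returns page HTML for a URL (httpx, a headless browser, a cache) and that the theme marks its next-page link with rel="next"; the class and function names are hypothetical. The seen set and max_pages guard against pagers that link back to themselves.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional
from urllib.parse import urljoin


class CategoryPageParser(HTMLParser):
    """Collect PDP links (a.product-item-link) and the first rel="next" URL."""

    def __init__(self):
        super().__init__()
        self.product_urls: list = []
        self.next_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        classes = (a.get("class") or "").split()
        rels = (a.get("rel") or "").split()
        if tag == "a" and "product-item-link" in classes:
            self.product_urls.append(href)
        elif tag in ("a", "link") and "next" in rels and self.next_url is None:
            self.next_url = href


def crawl_category(
    start_url: str, fetch: Callable[[str], str], max_pages: int = 500
) -> Iterator[str]:
    """Yield absolute PDP URLs, following rel="next" until it disappears or repeats."""
    url: Optional[str] = start_url
    seen: set = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = CategoryPageParser()
        parser.feed(fetch(url))
        for href in parser.product_urls:
            yield urljoin(url, href)
        url = urljoin(url, parser.next_url) if parser.next_url else None
```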

That JSON blob in step 5 is the real shortcut. Magento inlines the full variant matrix as a JSON object in the page source, so you get every variant price and attribute combination without touching the REST API at all. That matters when every parent SKU in the catalog has a dozen children.
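A minimal sketch of pulling that blob out, assuming the Luma-style layout in which a text/x-magento-init script maps the "[data-role=swatch-options]" selector to the "Magento_Swatches/js/swatch-renderer" widget with a jsonConfig object; verify those keys against your target, since themes and extensions can rename them. Stores that render variants as plain dropdowns rather than swatches may carry a similar config under a different selector key. The function and class names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class MagentoInitParser(HTMLParser):
    """Collect the raw bodies of every <script type="text/x-magento-init"> tag."""

    def __init__(self):
        super().__init__()
        self.blocks: list = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/x-magento-init":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_swatch_config(html: str) -> Optional[dict]:
    """Return the swatch renderer's jsonConfig dict, or None if the page has none."""
    parser = MagentoInitParser()
    parser.feed(html)
    for raw in parser.blocks:
        try:
            block = json.loads(raw)
        except ValueError:
            continue  # skip bodies that are not plain JSON
        if not isinstance(block, dict):
            continue
        widgets = block.get("[data-role=swatch-options]")
        if not isinstance(widgets, dict):
            continue
        renderer = widgets.get("Magento_Swatches/js/swatch-renderer")
        if isinstance(renderer, dict) and "jsonConfig" in renderer:
            return renderer["jsonConfig"]
    return None
```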

This is the same embedded JSON island pattern described in How to Scrape WooCommerce Stores 2026: Pattern Recognition Approach, where most structured data lives inside