How to Scrape StockX and GOAT for Sneaker Price Tracking

The sneaker resale market generates over $10 billion annually, with StockX and GOAT serving as the two dominant marketplace platforms. For resellers, investors, brand analysts, and market researchers, real-time pricing data from these platforms drives buy/sell decisions, identifies profitable arbitrage opportunities, and reveals demand trends across sneaker models.

Both platforms heavily protect their pricing data through aggressive anti-bot measures, API authentication requirements, and sophisticated fingerprinting. This guide demonstrates how to build a sneaker price tracker that extracts pricing, bid data, and sales history from StockX and GOAT using Python and mobile proxy rotation.

Understanding StockX and GOAT API Architecture

Both platforms use modern frontend architectures backed by API endpoints that serve data to their React-based interfaces.

StockX

StockX operates as a stock market for consumer goods. Key data points include:

  • Ask price: The lowest price a seller is willing to accept
  • Bid price: The highest price a buyer is offering
  • Last sale: The most recent completed transaction price
  • Sales history: A time series of all past transactions
  • Volatility: Price fluctuation metrics over time periods
  • Number of bids/asks: Market depth indicators

StockX uses a GraphQL API internally, which provides structured access to product and market data. However, it requires authentication tokens and implements rate limiting.
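These market fields map naturally onto a small record type. The sketch below uses made-up numbers, not live StockX data, and a class of our own (`StockXMarketSnapshot` is illustrative, not a StockX API type); it also shows the bid-ask spread, a derived liquidity signal the raw fields make possible:

```python
from dataclasses import dataclass


@dataclass
class StockXMarketSnapshot:
    """One point-in-time view of a product's market data (illustrative only)."""
    lowest_ask: float    # lowest price a seller will accept
    highest_bid: float   # highest price a buyer is offering
    last_sale: float     # most recent completed transaction
    number_of_bids: int  # market depth on the buy side
    number_of_asks: int  # market depth on the sell side

    @property
    def spread(self) -> float:
        """Bid-ask spread: a tight spread signals a liquid, actively traded shoe."""
        return round(self.lowest_ask - self.highest_bid, 2)


# Hypothetical numbers for illustration
snap = StockXMarketSnapshot(
    lowest_ask=245.0, highest_bid=230.0, last_sale=238.0,
    number_of_bids=112, number_of_asks=87,
)
print(snap.spread)  # 15.0
```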

GOAT

GOAT offers similar marketplace functionality but with a REST-based API structure:

  • Retail price vs. resale price
  • Size-specific pricing
  • Condition grades (new, used, defect)
  • Historical price charts
  • Seller ratings and listing counts

Both platforms implement anti-bot protections that make web scraping proxies essential for sustained data collection.
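GOAT-style payloads typically report prices in cents and break them out per size, so a small normalization step is useful before comparing against StockX's dollar figures. A minimal sketch (the field names `size` and `cents` are assumptions for illustration, not GOAT's actual schema):

```python
def cheapest_size(size_prices):
    """Convert per-size prices in cents to dollars and return the cheapest
    (size, dollars) pair, skipping sizes with no active listing (0 or None)."""
    priced = [
        (sp["size"], sp["cents"] / 100)
        for sp in size_prices
        if sp.get("cents")
    ]
    return min(priced, key=lambda pair: pair[1]) if priced else None


listings = [
    {"size": "9", "cents": 21500},
    {"size": "10", "cents": 19900},
    {"size": "11", "cents": 0},  # no active listing
]
print(cheapest_size(listings))  # ('10', 199.0)
```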

Setting Up the Environment

pip install requests beautifulsoup4 selenium webdriver-manager pandas

Building the StockX Scraper

StockX data is accessible through a combination of server-rendered HTML (which contains initial product data) and API endpoints (for detailed market data):

import requests
from bs4 import BeautifulSoup
import json
import time
import random
import re
import pandas as pd
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class SneakerProxyPool:
    """Manages proxy rotation for sneaker marketplace scraping."""

    def __init__(self, proxy_list):
        self.proxies = proxy_list
        self.index = 0
        self.cooldown = {}

    def get_proxy(self):
        """Return the next available proxy."""
        now = time.time()
        available = [
            p for p in self.proxies
            if p not in self.cooldown or now > self.cooldown[p]
        ]
        if not available:
            self.cooldown.clear()
            available = self.proxies

        proxy = available[self.index % len(available)]
        self.index += 1
        return proxy

    def set_cooldown(self, proxy, seconds=60):
        """Put a proxy on cooldown."""
        self.cooldown[proxy] = time.time() + seconds

    def get_requests_proxy(self):
        """Return proxy formatted for requests library."""
        proxy = self.get_proxy()
        return {"http": proxy, "https": proxy}

    def create_selenium_driver(self, proxy=None):
        """Create a configured Selenium driver."""
        if proxy is None:
            proxy = self.get_proxy()

        options = Options()
        options.add_argument(f"--proxy-server={proxy}")
        options.add_argument("--headless=new")
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_argument("--no-sandbox")
        options.add_argument(
            "--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
            "Mobile/15E148 Safari/604.1"
        )
        options.add_experimental_option("excludeSwitches", ["enable-automation"])

        service = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=service, options=options)

        driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
        )

        return driver, proxy


class StockXScraper:
    """Scrapes product and pricing data from StockX."""

    def __init__(self, proxy_pool):
        self.proxy_pool = proxy_pool
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
                "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
                "Mobile/15E148 Safari/604.1"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
        })

    def scrape_product_page(self, product_slug):
        """Scrape product details and current pricing from a StockX product page."""
        url = f"https://stockx.com/{product_slug}"
        proxy = self.proxy_pool.get_requests_proxy()

        try:
            response = self.session.get(url, proxies=proxy, timeout=20)

            if response.status_code == 200:
                return self._parse_product_page(response.text, product_slug)
            elif response.status_code == 403:
                print(f"Blocked on {product_slug}, trying Selenium fallback...")
                return self._scrape_with_selenium(product_slug)
            else:
                print(f"HTTP {response.status_code} for {product_slug}")
                return None

        except requests.RequestException as e:
            print(f"Request error: {e}")
            return self._scrape_with_selenium(product_slug)

    def _parse_product_page(self, html, slug):
        """Extract product data from StockX page HTML."""
        soup = BeautifulSoup(html, "html.parser")
        product = {"slug": slug, "scraped_at": datetime.now().isoformat()}

        # Extract Next.js data
        next_data = soup.select_one("script#__NEXT_DATA__")
        if next_data:
            try:
                data = json.loads(next_data.string)
                props = data.get("props", {}).get("pageProps", {})
                product_data = props.get("req", {}).get("product", {})

                if product_data:
                    product["title"] = product_data.get("title")
                    product["brand"] = product_data.get("brand")
                    product["colorway"] = product_data.get("colorway")
                    product["retail_price"] = product_data.get("retailPrice")
                    product["style_id"] = product_data.get("styleId")
                    product["release_date"] = product_data.get("releaseDate")
                    product["product_id"] = product_data.get("id")

                    # Market data
                    market = product_data.get("market", {})
                    product["lowest_ask"] = market.get("lowestAsk")
                    product["highest_bid"] = market.get("highestBid")
                    product["last_sale"] = market.get("lastSale")
                    product["sales_last_72h"] = market.get("salesLast72Hours")
                    product["change_value"] = market.get("changeValue")
                    product["change_percentage"] = market.get("changePercentage")
                    product["volatility"] = market.get("volatility")
                    product["number_of_bids"] = market.get("numberOfBids")
                    product["number_of_asks"] = market.get("numberOfAsks")

                    return product

            except (json.JSONDecodeError, TypeError, KeyError) as e:
                print(f"JSON parsing error: {e}")

        # Fallback: parse visible elements
        title_el = soup.select_one("h1")
        product["title"] = title_el.get_text(strip=True) if title_el else None

        return product

    def _scrape_with_selenium(self, product_slug):
        """Fallback scraper using Selenium for JavaScript-rendered content."""
        driver, proxy = self.proxy_pool.create_selenium_driver()

        try:
            url = f"https://stockx.com/{product_slug}"
            driver.get(url)

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
            )

            time.sleep(random.uniform(2, 4))
            html = driver.page_source
            return self._parse_product_page(html, product_slug)

        except Exception as e:
            print(f"Selenium scrape error: {e}")
            return None

        finally:
            driver.quit()

    def scrape_search_results(self, query, max_results=40):
        """Search StockX and return matching products."""
        driver, proxy = self.proxy_pool.create_selenium_driver()
        products = []

        try:
            search_url = f"https://stockx.com/search?s={query.replace(' ', '%20')}"
            driver.get(search_url)

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "[data-testid='product-tile'], a[href*='/']")
                )
            )

            time.sleep(random.uniform(2, 4))

            # Scroll to load more results
            for _ in range(3):
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
                time.sleep(random.uniform(1, 2))

            soup = BeautifulSoup(driver.page_source, "html.parser")

            tiles = soup.select("[data-testid='product-tile'], .browse-grid a.tile")

            for tile in tiles[:max_results]:
                product = {}

                name_el = tile.select_one("[data-testid='product-tile-title'], .tile-title")
                product["name"] = name_el.get_text(strip=True) if name_el else None

                price_el = tile.select_one("[data-testid='product-tile-lowest-ask'], .tile-price")
                if price_el:
                    price_text = price_el.get_text(strip=True)
                    product["lowest_ask"] = self._clean_price(price_text)
                    product["lowest_ask_raw"] = price_text

                link_el = tile if tile.name == "a" else tile.select_one("a")
                if link_el and link_el.get("href"):
                    href = link_el["href"]
                    product["slug"] = href.strip("/").split("/")[-1]
                    product["url"] = f"https://stockx.com{href}"

                if product.get("name"):
                    products.append(product)

            print(f"Search '{query}': found {len(products)} products")

        except Exception as e:
            print(f"Search error: {e}")

        finally:
            driver.quit()

        return products

    @staticmethod
    def _clean_price(price_text):
        """Extract numeric price from text."""
        match = re.search(r"[\d,]+\.?\d*", price_text.replace(",", ""))
        return float(match.group()) if match else None
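
The `_clean_price` helper is worth a quick sanity check in isolation, since marketplace price strings arrive in several shapes (`$1,250`, `$89.99`, placeholder text like `--`). The standalone version below mirrors the static method above:

```python
import re


def clean_price(price_text):
    """Extract a numeric price from display text; None if no digits found."""
    match = re.search(r"[\d,]+\.?\d*", price_text.replace(",", ""))
    return float(match.group()) if match else None


assert clean_price("$1,250") == 1250.0
assert clean_price("$89.99") == 89.99
assert clean_price("--") is None
```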

Building the GOAT Scraper

GOAT has a different data structure but similar anti-bot protections:

class GOATScraper:
    """Scrapes sneaker data from GOAT marketplace."""

    def __init__(self, proxy_pool):
        self.proxy_pool = proxy_pool
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
                "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
                "Mobile/15E148 Safari/604.1"
            ),
            "Accept": "application/json, text/html",
        })

    def scrape_product(self, product_slug):
        """Scrape product data from GOAT."""
        driver, proxy = self.proxy_pool.create_selenium_driver()

        try:
            url = f"https://www.goat.com/sneakers/{product_slug}"
            driver.get(url)

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
            )

            time.sleep(random.uniform(2, 4))
            soup = BeautifulSoup(driver.page_source, "html.parser")

            product = {
                "slug": product_slug,
                "platform": "goat",
                "scraped_at": datetime.now().isoformat(),
            }

            # Try to extract from Next.js data
            next_data = soup.select_one("script#__NEXT_DATA__")
            if next_data:
                try:
                    data = json.loads(next_data.string)
                    product_info = (
                        data.get("props", {})
                        .get("pageProps", {})
                        .get("productTemplate", {})
                    )

                    if product_info:
                        product["title"] = product_info.get("name")
                        product["brand"] = product_info.get("brandName")
                        product["retail_price"] = (product_info.get("retailPriceCents") or 0) / 100
                        product["release_date"] = product_info.get("releaseDate")
                        product["colorway"] = product_info.get("color")
                        product["sku"] = product_info.get("sku")

                        # Pricing by size
                        sizes = product_info.get("sizeRange", [])
                        product["size_prices"] = []
                        for size in sizes:
                            size_data = {
                                "size": size.get("value"),
                                "lowest_price": ((size.get("lowestPriceCents") or {}).get("amount") or 0) / 100,
                            }
                            product["size_prices"].append(size_data)

                except (json.JSONDecodeError, TypeError, KeyError):
                    pass

            # Fallback to visible elements
            if not product.get("title"):
                title_el = soup.select_one("h1")
                product["title"] = title_el.get_text(strip=True) if title_el else None

            return product

        except Exception as e:
            print(f"GOAT scrape error for {product_slug}: {e}")
            return None

        finally:
            driver.quit()

    def search_products(self, query, max_results=20):
        """Search for products on GOAT."""
        driver, proxy = self.proxy_pool.create_selenium_driver()
        products = []

        try:
            search_url = f"https://www.goat.com/search?query={query.replace(' ', '%20')}"
            driver.get(search_url)

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "[data-testid='grid-item'], .grid-item")
                )
            )

            time.sleep(random.uniform(2, 4))
            soup = BeautifulSoup(driver.page_source, "html.parser")

            items = soup.select("[data-testid='grid-item'], .grid-item, a[href*='/sneakers/']")

            for item in items[:max_results]:
                product = {"platform": "goat"}

                name_el = item.select_one("[data-testid='product-card-title'], .product-name")
                product["name"] = name_el.get_text(strip=True) if name_el else None

                price_el = item.select_one("[data-testid='product-card-price'], .product-price")
                if price_el:
                    product["price"] = self._clean_price(price_el.get_text(strip=True))

                link = item if item.name == "a" else item.select_one("a")
                if link and link.get("href"):
                    href = link["href"]
                    product["slug"] = href.split("/")[-1]
                    product["url"] = f"https://www.goat.com{href}" if href.startswith("/") else href

                if product.get("name"):
                    products.append(product)

        except Exception as e:
            print(f"GOAT search error: {e}")

        finally:
            driver.quit()

        return products

    @staticmethod
    def _clean_price(price_text):
        match = re.search(r"[\d,]+\.?\d*", price_text.replace(",", ""))
        return float(match.group()) if match else None
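
Both scrapers expose a style code (StockX's `style_id`, GOAT's `sku`), which is a more reliable join key than slugs when pairing the same shoe across platforms, since slug formats differ between the two sites. A hedged normalization helper (the separator handling is an assumption based on common style-code formats like `DZ5485-612`):

```python
import re


def normalize_sku(sku):
    """Uppercase a style code and strip separators so 'dz5485 612',
    'DZ5485-612', and 'DZ5485612' all compare equal."""
    return re.sub(r"[^A-Z0-9]", "", sku.upper())


def same_product(stockx_style_id, goat_sku):
    """True when two style codes refer to the same product after normalization."""
    return normalize_sku(stockx_style_id) == normalize_sku(goat_sku)


print(same_product("DZ5485-612", "dz5485 612"))  # True
```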
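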

Building the Price Tracker

Combine StockX and GOAT data into a unified price tracking system:

class SneakerPriceTracker:
    """Unified price tracker across StockX and GOAT."""

    def __init__(self, stockx_scraper, goat_scraper, data_dir="sneaker_data"):
        self.stockx = stockx_scraper
        self.goat = goat_scraper
        self.data_dir = data_dir

    def track_product(self, stockx_slug, goat_slug=None):
        """Get current prices from both platforms for a product."""
        result = {
            "tracked_at": datetime.now().isoformat(),
            "stockx_slug": stockx_slug,
            "goat_slug": goat_slug,
        }

        # StockX data
        stockx_data = self.stockx.scrape_product_page(stockx_slug)
        if stockx_data:
            result["stockx_lowest_ask"] = stockx_data.get("lowest_ask")
            result["stockx_highest_bid"] = stockx_data.get("highest_bid")
            result["stockx_last_sale"] = stockx_data.get("last_sale")
            result["stockx_title"] = stockx_data.get("title")
            result["retail_price"] = stockx_data.get("retail_price")

        time.sleep(random.uniform(3, 6))

        # GOAT data
        if goat_slug:
            goat_data = self.goat.scrape_product(goat_slug)
            if goat_data:
                result["goat_lowest_price"] = None
                if goat_data.get("size_prices"):
                    prices = [
                        sp["lowest_price"] for sp in goat_data["size_prices"]
                        if sp["lowest_price"] > 0
                    ]
                    if prices:
                        result["goat_lowest_price"] = min(prices)
                result["goat_title"] = goat_data.get("title")

        # Calculate arbitrage
        if result.get("stockx_lowest_ask") and result.get("goat_lowest_price"):
            result["price_diff"] = round(
                result["stockx_lowest_ask"] - result["goat_lowest_price"], 2
            )
            result["cheaper_platform"] = (
                "goat" if result["goat_lowest_price"] < result["stockx_lowest_ask"]
                else "stockx"
            )

        return result

    def track_watchlist(self, watchlist):
        """Track prices for a list of products."""
        results = []

        for item in watchlist:
            print(f"Tracking: {item.get('name', item['stockx_slug'])}")

            data = self.track_product(
                item["stockx_slug"],
                item.get("goat_slug"),
            )
            data["name"] = item.get("name", "")
            results.append(data)

            time.sleep(random.uniform(5, 10))

        return results

    def save_snapshot(self, results):
        """Save a price tracking snapshot."""
        import os
        os.makedirs(self.data_dir, exist_ok=True)

        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"{self.data_dir}/prices_{timestamp}.json"

        with open(filename, "w") as f:
            json.dump(results, f, indent=2, default=str)

        print(f"Snapshot saved: {filename}")
        return filename

    def analyze_price_history(self, snapshots_dir=None):
        """Analyze price trends from historical snapshots."""
        import os
        import glob

        data_dir = snapshots_dir or self.data_dir
        files = sorted(glob.glob(f"{data_dir}/prices_*.json"))

        if not files:
            print("No snapshots found")
            return None

        all_data = []
        for filepath in files:
            with open(filepath) as f:
                snapshot = json.load(f)
                for item in snapshot:
                    all_data.append(item)

        df = pd.DataFrame(all_data)
        df["tracked_at"] = pd.to_datetime(df["tracked_at"])

        return df
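
`analyze_price_history` loads every snapshot into one DataFrame; the core trend calculation — the price change between the oldest and newest observation of each product — can also be sketched without pandas. The snapshot shape below matches what `save_snapshot` writes (a list of tracked dicts per file):

```python
def price_changes(snapshots):
    """Given snapshots in chronological order (each a list of tracked dicts),
    return {name: (first_ask, last_ask, delta)} using the StockX lowest ask."""
    first, last = {}, {}
    for snapshot in snapshots:
        for item in snapshot:
            name, ask = item.get("name"), item.get("stockx_lowest_ask")
            if not name or ask is None:
                continue
            first.setdefault(name, ask)  # keep the earliest observation
            last[name] = ask             # overwrite with the latest
    return {
        name: (first[name], last[name], round(last[name] - first[name], 2))
        for name in first
    }


# Two hypothetical snapshots, oldest first
old = [{"name": "Panda Dunk", "stockx_lowest_ask": 110.0}]
new = [{"name": "Panda Dunk", "stockx_lowest_ask": 98.0}]
print(price_changes([old, new]))  # {'Panda Dunk': (110.0, 98.0, -12.0)}
```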

Running the Complete Pipeline

def main():
    proxies = [
        "http://user:pass@mobile-proxy1.example.com:8080",
        "http://user:pass@mobile-proxy2.example.com:8080",
        "http://user:pass@mobile-proxy3.example.com:8080",
        "http://user:pass@mobile-proxy4.example.com:8080",
        "http://user:pass@mobile-proxy5.example.com:8080",
    ]

    pool = SneakerProxyPool(proxies)
    stockx_scraper = StockXScraper(pool)
    goat_scraper = GOATScraper(pool)
    tracker = SneakerPriceTracker(stockx_scraper, goat_scraper)

    # Define watchlist
    watchlist = [
        {
            "name": "Jordan 1 Retro High OG Chicago Lost and Found",
            "stockx_slug": "air-jordan-1-retro-high-og-chicago-lost-and-found",
            "goat_slug": "air-jordan-1-retro-high-og-lost-found-dz5485-612",
        },
        {
            "name": "Nike Dunk Low Panda",
            "stockx_slug": "nike-dunk-low-retro-white-black-2021",
            "goat_slug": "nike-dunk-low-retro-black-white-dd1391-100",
        },
        {
            "name": "Adidas Yeezy Slide Onyx",
            "stockx_slug": "adidas-yeezy-slide-onyx",
            "goat_slug": "adidas-yeezy-slide-onyx-hz5453",
        },
        {
            "name": "New Balance 550 White Green",
            "stockx_slug": "new-balance-550-white-green",
            "goat_slug": "new-balance-550-white-green-bb550wt1",
        },
    ]

    # Track current prices
    results = tracker.track_watchlist(watchlist)
    tracker.save_snapshot(results)

    # Display results
    df = pd.DataFrame(results)
    display_cols = [
        "name", "stockx_lowest_ask", "goat_lowest_price",
        "price_diff", "cheaper_platform",
    ]
    available_cols = [c for c in display_cols if c in df.columns]
    print("\nCurrent Prices:")
    print(df[available_cols].to_string())

    # Search for new products
    print("\nSearching StockX for trending shoes...")
    search_results = stockx_scraper.scrape_search_results("travis scott", max_results=10)
    for product in search_results:
        print(f"  {product.get('name')}: ${product.get('lowest_ask', 'N/A')}")

    # Export
    df.to_csv("sneaker_prices_latest.csv", index=False)
    print(f"\nTracked {len(results)} products across StockX and GOAT")


if __name__ == "__main__":
    main()

Why Mobile Proxies Are Critical for Sneaker Sites

StockX and GOAT have invested heavily in bot detection because automated purchasing bots have been a problem in the sneaker resale industry for years. Their defenses are specifically tuned to detect:

  • Datacenter IP addresses (almost always blocked)
  • Residential proxy patterns (partially effective)
  • Browser automation fingerprints
  • Rapid request sequences

Mobile proxies provide the highest success rates because:

  1. Carrier IP trust. Mobile IPs are assigned by cellular providers and shared by thousands of real users via CGNAT. StockX and GOAT cannot block these without affecting legitimate mobile shoppers.
  2. Natural browsing patterns. Mobile proxy traffic inherently mimics real user behavior patterns, reducing behavioral detection flags.
  3. IP rotation. Mobile proxies can rotate IPs by reconnecting to the cellular network, providing fresh addresses without maintaining large proxy pools.
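
In practice these points translate into one operational rule: on a 403 or 429, cool the current proxy down and retry through the next one. A minimal sketch of that retry loop, using a stand-in `fetch` callable and a tiny pool (mirroring `SneakerProxyPool`) so the rotation logic can be exercised without network access:

```python
import time


class MiniPool:
    """Tiny round-robin pool with cooldowns (mirrors SneakerProxyPool)."""
    def __init__(self, proxies):
        self.proxies, self.index, self.cooldown = proxies, 0, {}

    def get_proxy(self):
        now = time.time()
        live = [p for p in self.proxies if self.cooldown.get(p, 0) <= now] or self.proxies
        proxy = live[self.index % len(live)]
        self.index += 1
        return proxy

    def set_cooldown(self, proxy, seconds=60):
        self.cooldown[proxy] = time.time() + seconds


def fetch_with_rotation(pool, fetch, url, max_attempts=3):
    """On a 403/429 block, sideline the proxy and retry through the next one.
    `fetch(url, proxy)` returns (status_code, body); returns body or None."""
    for _ in range(max_attempts):
        proxy = pool.get_proxy()
        status, body = fetch(url, proxy)
        if status == 200:
            return body
        if status in (403, 429):
            pool.set_cooldown(proxy)
    return None


# Simulated responses: the first proxy is blocked, the second succeeds.
responses = {"p1": (403, None), "p2": (200, "<html>ok</html>")}
pool = MiniPool(["p1", "p2"])
print(fetch_with_rotation(pool, lambda url, proxy: responses[proxy], "https://example.com"))
# <html>ok</html>
```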

Scheduling Automated Price Checks

For production use, schedule the tracker to run at regular intervals:

import schedule

def scheduled_track():
    """Run price tracking on a schedule."""
    proxies = ["http://user:pass@proxy.example.com:8080"]
    pool = SneakerProxyPool(proxies)
    stockx = StockXScraper(pool)
    goat = GOATScraper(pool)
    tracker = SneakerPriceTracker(stockx, goat)

    watchlist = [...]  # Your watchlist
    results = tracker.track_watchlist(watchlist)
    tracker.save_snapshot(results)
    print(f"Scheduled check complete: {len(results)} products tracked")

# Run every 6 hours
schedule.every(6).hours.do(scheduled_track)

while True:
    schedule.run_pending()
    time.sleep(60)

Conclusion

Building a sneaker price tracker that spans StockX and GOAT provides a comprehensive view of the resale market. The cross-platform comparison reveals arbitrage opportunities, and historical price tracking identifies trends before they become obvious.

The aggressive anti-bot measures on both platforms make mobile proxy rotation non-negotiable for reliable data collection. With proper proxy infrastructure and the scraping framework outlined in this guide, you can maintain continuous price monitoring across the sneaker market.

For more e-commerce scraping techniques, explore our other tutorials. The proxy glossary provides definitions for proxy concepts referenced throughout this guide, and our web scraping proxy hub covers additional platform-specific scraping strategies.

