How to Scrape StockX and GOAT for Sneaker Price Tracking
The sneaker resale market generates over $10 billion annually, with StockX and GOAT serving as the two dominant marketplace platforms. For resellers, investors, brand analysts, and market researchers, real-time pricing data from these platforms drives buy/sell decisions, identifies profitable arbitrage opportunities, and reveals demand trends across sneaker models.
Both platforms heavily protect their pricing data through aggressive anti-bot measures, API authentication requirements, and sophisticated fingerprinting. This guide demonstrates how to build a sneaker price tracker that extracts pricing, bid data, and sales history from StockX and GOAT using Python and mobile proxy rotation.
Understanding StockX and GOAT API Architecture
Both platforms use modern frontend architectures backed by API endpoints that serve data to their React-based interfaces.
StockX
StockX operates as a stock market for consumer goods. Key data points include:
- Ask price: The lowest price a seller is willing to accept
- Bid price: The highest price a buyer is offering
- Last sale: The most recent completed transaction price
- Sales history: A time series of all past transactions
- Volatility: Price fluctuation metrics over time periods
- Number of bids/asks: Market depth indicators
StockX uses a GraphQL API internally, which provides structured access to product and market data. However, it requires authentication tokens and implements rate limiting.
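The exact GraphQL operations are undocumented and change over time, so any request you script should be captured fresh from your browser's DevTools network tab. The sketch below shows only the general shape of such a call; the endpoint URL, query fields, and headers are placeholders, not confirmed values:
import requests

GRAPHQL_URL = "https://stockx.com/api/graphql"  # placeholder: capture the real endpoint in DevTools

QUERY = """
query ProductMarket($slug: String!) {
  product(slug: $slug) {
    title
    market { lowestAsk highestBid lastSale }
  }
}
"""

resp = requests.post(
    GRAPHQL_URL,
    json={"query": QUERY, "variables": {"slug": "nike-dunk-low-retro-white-black-2021"}},
    headers={"User-Agent": "Mozilla/5.0", "Content-Type": "application/json"},
    timeout=20,
)
print(resp.status_code)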
GOAT
GOAT offers similar marketplace functionality but with a REST-based API structure (a normalized sample record follows the list):
- Retail price vs. resale price
- Size-specific pricing
- Condition grades (new, used, defect)
- Historical price charts
- Seller ratings and listing counts
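The GOAT scraper built later in this guide normalizes those fields into records shaped roughly like this (all values illustrative):
goat_record = {
    "slug": "air-jordan-1-retro-high-og-lost-found-dz5485-612",
    "platform": "goat",
    "title": "Air Jordan 1 Retro High OG 'Lost and Found'",
    "brand": "Air Jordan",
    "retail_price": 180.0,
    "sku": "DZ5485-612",
    "size_prices": [
        {"size": 9.0, "lowest_price": 412.0},
        {"size": 10.0, "lowest_price": 398.0},
    ],
}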
Both platforms implement anti-bot protections that make web scraping proxies essential for sustained data collection.
Setting Up the Environment
pip install requests beautifulsoup4 selenium webdriver-manager pandas schedule
Building the StockX Scraper
StockX data is accessible through a combination of server-rendered HTML (which contains initial product data) and API endpoints (for detailed market data):
import requests
from bs4 import BeautifulSoup
import json
import time
import random
import re
import pandas as pd
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
class SneakerProxyPool:
"""Manages proxy rotation for sneaker marketplace scraping."""
def __init__(self, proxy_list):
self.proxies = proxy_list
self.index = 0
        self.cooldown = {}  # proxy URL -> unix timestamp when it becomes usable again
def get_proxy(self):
"""Return the next available proxy."""
now = time.time()
available = [
p for p in self.proxies
if p not in self.cooldown or now > self.cooldown[p]
]
if not available:
self.cooldown.clear()
available = self.proxies
proxy = available[self.index % len(available)]
self.index += 1
return proxy
def set_cooldown(self, proxy, seconds=60):
"""Put a proxy on cooldown."""
self.cooldown[proxy] = time.time() + seconds
def get_requests_proxy(self):
"""Return proxy formatted for requests library."""
proxy = self.get_proxy()
return {"http": proxy, "https": proxy}
def create_selenium_driver(self, proxy=None):
"""Create a configured Selenium driver."""
if proxy is None:
proxy = self.get_proxy()
options = Options()
options.add_argument(f"--proxy-server={proxy}")
options.add_argument("--headless=new")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--no-sandbox")
options.add_argument(
"--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
"Mobile/15E148 Safari/604.1"
)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
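        # Patch navigator.webdriver before any page script runs; it is one of
        # the first properties anti-bot scripts probe to detect Selenium.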
driver.execute_cdp_cmd(
"Page.addScriptToEvaluateOnNewDocument",
{"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)
return driver, proxy
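# Usage sketch: round-robin rotation with cooldowns (proxy URLs are placeholders).
# pool = SneakerProxyPool([
#     "http://user:pass@mobile-proxy1.example.com:8080",
#     "http://user:pass@mobile-proxy2.example.com:8080",
# ])
# proxy = pool.get_proxy()               # next available proxy
# pool.set_cooldown(proxy, seconds=120)  # bench it after a 403/429
# proxies = pool.get_requests_proxy()    # {"http": ..., "https": ...}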
class StockXScraper:
"""Scrapes product and pricing data from StockX."""
def __init__(self, proxy_pool):
self.proxy_pool = proxy_pool
self.session = requests.Session()
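        # A mobile Safari user agent keeps the browser fingerprint consistent
        # with carrier-grade mobile proxy IPs.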
self.session.headers.update({
"User-Agent": (
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
"Mobile/15E148 Safari/604.1"
),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.5",
})
def scrape_product_page(self, product_slug):
"""Scrape product details and current pricing from a StockX product page."""
url = f"https://stockx.com/{product_slug}"
proxy = self.proxy_pool.get_requests_proxy()
try:
response = self.session.get(url, proxies=proxy, timeout=20)
if response.status_code == 200:
return self._parse_product_page(response.text, product_slug)
elif response.status_code == 403:
print(f"Blocked on {product_slug}, trying Selenium fallback...")
return self._scrape_with_selenium(product_slug)
else:
print(f"HTTP {response.status_code} for {product_slug}")
return None
except requests.RequestException as e:
print(f"Request error: {e}")
return self._scrape_with_selenium(product_slug)
def _parse_product_page(self, html, slug):
"""Extract product data from StockX page HTML."""
soup = BeautifulSoup(html, "html.parser")
product = {"slug": slug, "scraped_at": datetime.now().isoformat()}
        # StockX ships the page's full data payload in a Next.js
        # <script id="__NEXT_DATA__"> tag; parsing that JSON is more reliable
        # than scraping rendered markup, though its key layout can change.
next_data = soup.select_one("script#__NEXT_DATA__")
if next_data:
try:
data = json.loads(next_data.string)
props = data.get("props", {}).get("pageProps", {})
product_data = props.get("req", {}).get("product", {})
if product_data:
product["title"] = product_data.get("title")
product["brand"] = product_data.get("brand")
product["colorway"] = product_data.get("colorway")
product["retail_price"] = product_data.get("retailPrice")
product["style_id"] = product_data.get("styleId")
product["release_date"] = product_data.get("releaseDate")
product["product_id"] = product_data.get("id")
# Market data
market = product_data.get("market", {})
product["lowest_ask"] = market.get("lowestAsk")
product["highest_bid"] = market.get("highestBid")
product["last_sale"] = market.get("lastSale")
product["sales_last_72h"] = market.get("salesLast72Hours")
product["change_value"] = market.get("changeValue")
product["change_percentage"] = market.get("changePercentage")
product["volatility"] = market.get("volatility")
product["number_of_bids"] = market.get("numberOfBids")
product["number_of_asks"] = market.get("numberOfAsks")
return product
except (json.JSONDecodeError, TypeError, KeyError) as e:
print(f"JSON parsing error: {e}")
# Fallback: parse visible elements
title_el = soup.select_one("h1")
product["title"] = title_el.get_text(strip=True) if title_el else None
return product
def _scrape_with_selenium(self, product_slug):
"""Fallback scraper using Selenium for JavaScript-rendered content."""
driver, proxy = self.proxy_pool.create_selenium_driver()
try:
url = f"https://stockx.com/{product_slug}"
driver.get(url)
WebDriverWait(driver, 15).until(
EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
)
time.sleep(random.uniform(2, 4))
html = driver.page_source
return self._parse_product_page(html, product_slug)
except Exception as e:
print(f"Selenium scrape error: {e}")
return None
finally:
driver.quit()
def scrape_search_results(self, query, max_results=40):
"""Search StockX and return matching products."""
driver, proxy = self.proxy_pool.create_selenium_driver()
products = []
try:
search_url = f"https://stockx.com/search?s={query.replace(' ', '%20')}"
driver.get(search_url)
WebDriverWait(driver, 15).until(
EC.presence_of_element_located(
(By.CSS_SELECTOR, "[data-testid='product-tile'], a[href*='/']")
)
)
time.sleep(random.uniform(2, 4))
# Scroll to load more results
for _ in range(3):
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(random.uniform(1, 2))
soup = BeautifulSoup(driver.page_source, "html.parser")
tiles = soup.select("[data-testid='product-tile'], .browse-grid a.tile")
for tile in tiles[:max_results]:
product = {}
name_el = tile.select_one("[data-testid='product-tile-title'], .tile-title")
product["name"] = name_el.get_text(strip=True) if name_el else None
price_el = tile.select_one("[data-testid='product-tile-lowest-ask'], .tile-price")
if price_el:
price_text = price_el.get_text(strip=True)
product["lowest_ask"] = self._clean_price(price_text)
product["lowest_ask_raw"] = price_text
link_el = tile if tile.name == "a" else tile.select_one("a")
if link_el and link_el.get("href"):
href = link_el["href"]
product["slug"] = href.strip("/").split("/")[-1]
product["url"] = f"https://stockx.com{href}"
if product.get("name"):
products.append(product)
print(f"Search '{query}': found {len(products)} products")
except Exception as e:
print(f"Search error: {e}")
finally:
driver.quit()
return products
@staticmethod
def _clean_price(price_text):
"""Extract numeric price from text."""
match = re.search(r"[\d,]+\.?\d*", price_text.replace(",", ""))
        return float(match.group()) if match else None
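A quick usage sketch for the StockX scraper (the slug is the trailing path segment of a product URL; the proxy URL is a placeholder):
pool = SneakerProxyPool(["http://user:pass@mobile-proxy1.example.com:8080"])
stockx = StockXScraper(pool)
data = stockx.scrape_product_page("air-jordan-1-retro-high-og-chicago-lost-and-found")
if data:
    print(data.get("title"), data.get("lowest_ask"), data.get("last_sale"))
Building the GOAT Scraper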
GOAT has a different data structure but similar anti-bot protections:
class GOATScraper:
"""Scrapes sneaker data from GOAT marketplace."""
def __init__(self, proxy_pool):
self.proxy_pool = proxy_pool
self.session = requests.Session()
self.session.headers.update({
"User-Agent": (
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 "
"Mobile/15E148 Safari/604.1"
),
"Accept": "application/json, text/html",
})
def scrape_product(self, product_slug):
"""Scrape product data from GOAT."""
driver, proxy = self.proxy_pool.create_selenium_driver()
try:
url = f"https://www.goat.com/sneakers/{product_slug}"
driver.get(url)
WebDriverWait(driver, 15).until(
EC.presence_of_element_located((By.CSS_SELECTOR, "h1"))
)
time.sleep(random.uniform(2, 4))
soup = BeautifulSoup(driver.page_source, "html.parser")
product = {
"slug": product_slug,
"platform": "goat",
"scraped_at": datetime.now().isoformat(),
}
# Try to extract from Next.js data
next_data = soup.select_one("script#__NEXT_DATA__")
if next_data:
try:
data = json.loads(next_data.string)
product_info = (
data.get("props", {})
.get("pageProps", {})
.get("productTemplate", {})
)
if product_info:
product["title"] = product_info.get("name")
product["brand"] = product_info.get("brandName")
product["retail_price"] = product_info.get("retailPriceCents", 0) / 100
product["release_date"] = product_info.get("releaseDate")
product["colorway"] = product_info.get("color")
product["sku"] = product_info.get("sku")
# Pricing by size
sizes = product_info.get("sizeRange", [])
product["size_prices"] = []
for size in sizes:
                            size_data = {
                                "size": size.get("value"),
                                # lowestPriceCents may be missing or None for unlisted sizes
                                "lowest_price": ((size.get("lowestPriceCents") or {}).get("amount") or 0) / 100,
                            }
product["size_prices"].append(size_data)
except (json.JSONDecodeError, TypeError, KeyError):
pass
# Fallback to visible elements
if not product.get("title"):
title_el = soup.select_one("h1")
product["title"] = title_el.get_text(strip=True) if title_el else None
return product
except Exception as e:
print(f"GOAT scrape error for {product_slug}: {e}")
return None
finally:
driver.quit()
def search_products(self, query, max_results=20):
"""Search for products on GOAT."""
driver, proxy = self.proxy_pool.create_selenium_driver()
products = []
try:
search_url = f"https://www.goat.com/search?query={query.replace(' ', '%20')}"
driver.get(search_url)
WebDriverWait(driver, 15).until(
EC.presence_of_element_located(
(By.CSS_SELECTOR, "[data-testid='grid-item'], .grid-item")
)
)
time.sleep(random.uniform(2, 4))
soup = BeautifulSoup(driver.page_source, "html.parser")
items = soup.select("[data-testid='grid-item'], .grid-item, a[href*='/sneakers/']")
for item in items[:max_results]:
product = {"platform": "goat"}
name_el = item.select_one("[data-testid='product-card-title'], .product-name")
product["name"] = name_el.get_text(strip=True) if name_el else None
price_el = item.select_one("[data-testid='product-card-price'], .product-price")
if price_el:
product["price"] = self._clean_price(price_el.get_text(strip=True))
link = item if item.name == "a" else item.select_one("a")
if link and link.get("href"):
href = link["href"]
product["slug"] = href.split("/")[-1]
product["url"] = f"https://www.goat.com{href}" if href.startswith("/") else href
if product.get("name"):
products.append(product)
except Exception as e:
print(f"GOAT search error: {e}")
finally:
driver.quit()
return products
@staticmethod
def _clean_price(price_text):
match = re.search(r"[\d,]+\.?\d*", price_text.replace(",", ""))
        return float(match.group()) if match else None
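Usage mirrors the StockX scraper. A quick search sketch, reusing the proxy pool from the earlier example (query and output fields illustrative):
goat = GOATScraper(pool)
for hit in goat.search_products("jordan 1 chicago", max_results=5):
    print(hit.get("name"), hit.get("price"), hit.get("url"))
Building the Price Tracker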
Combine StockX and GOAT data into a unified price tracking system:
class SneakerPriceTracker:
"""Unified price tracker across StockX and GOAT."""
def __init__(self, stockx_scraper, goat_scraper, data_dir="sneaker_data"):
self.stockx = stockx_scraper
self.goat = goat_scraper
self.data_dir = data_dir
def track_product(self, stockx_slug, goat_slug=None):
"""Get current prices from both platforms for a product."""
result = {
"tracked_at": datetime.now().isoformat(),
"stockx_slug": stockx_slug,
"goat_slug": goat_slug,
}
# StockX data
stockx_data = self.stockx.scrape_product_page(stockx_slug)
if stockx_data:
result["stockx_lowest_ask"] = stockx_data.get("lowest_ask")
result["stockx_highest_bid"] = stockx_data.get("highest_bid")
result["stockx_last_sale"] = stockx_data.get("last_sale")
result["stockx_title"] = stockx_data.get("title")
result["retail_price"] = stockx_data.get("retail_price")
time.sleep(random.uniform(3, 6))
# GOAT data
if goat_slug:
goat_data = self.goat.scrape_product(goat_slug)
if goat_data:
result["goat_lowest_price"] = None
if goat_data.get("size_prices"):
prices = [
sp["lowest_price"] for sp in goat_data["size_prices"]
if sp["lowest_price"] > 0
]
if prices:
result["goat_lowest_price"] = min(prices)
result["goat_title"] = goat_data.get("title")
# Calculate arbitrage
if result.get("stockx_lowest_ask") and result.get("goat_lowest_price"):
result["price_diff"] = round(
result["stockx_lowest_ask"] - result["goat_lowest_price"], 2
)
result["cheaper_platform"] = (
"goat" if result["goat_lowest_price"] < result["stockx_lowest_ask"]
else "stockx"
)
return result
def track_watchlist(self, watchlist):
"""Track prices for a list of products."""
results = []
for item in watchlist:
print(f"Tracking: {item.get('name', item['stockx_slug'])}")
data = self.track_product(
item["stockx_slug"],
item.get("goat_slug"),
)
data["name"] = item.get("name", "")
results.append(data)
time.sleep(random.uniform(5, 10))
return results
def save_snapshot(self, results):
"""Save a price tracking snapshot."""
import os
os.makedirs(self.data_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
filename = f"{self.data_dir}/prices_{timestamp}.json"
with open(filename, "w") as f:
json.dump(results, f, indent=2, default=str)
print(f"Snapshot saved: {filename}")
return filename
def analyze_price_history(self, snapshots_dir=None):
"""Analyze price trends from historical snapshots."""
import os
import glob
data_dir = snapshots_dir or self.data_dir
files = sorted(glob.glob(f"{data_dir}/prices_*.json"))
if not files:
print("No snapshots found")
return None
all_data = []
for filepath in files:
with open(filepath) as f:
snapshot = json.load(f)
for item in snapshot:
all_data.append(item)
df = pd.DataFrame(all_data)
df["tracked_at"] = pd.to_datetime(df["tracked_at"])
        return df
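With a few snapshots accumulated, a short pandas pass can rank products by price movement. A minimal sketch using the DataFrame returned above, given the SneakerPriceTracker instance (tracker) built in the pipeline below and at least two tracking runs:
df = tracker.analyze_price_history()
if df is not None and "stockx_lowest_ask" in df.columns:
    trend = (
        df.sort_values("tracked_at")
        .groupby("name")["stockx_lowest_ask"]
        .agg(["first", "last"])
    )
    trend["change_pct"] = (trend["last"] - trend["first"]) / trend["first"] * 100
    print(trend.sort_values("change_pct", ascending=False))
Running the Complete Pipeline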
def main():
proxies = [
"http://user:pass@mobile-proxy1.example.com:8080",
"http://user:pass@mobile-proxy2.example.com:8080",
"http://user:pass@mobile-proxy3.example.com:8080",
"http://user:pass@mobile-proxy4.example.com:8080",
"http://user:pass@mobile-proxy5.example.com:8080",
]
pool = SneakerProxyPool(proxies)
stockx_scraper = StockXScraper(pool)
goat_scraper = GOATScraper(pool)
tracker = SneakerPriceTracker(stockx_scraper, goat_scraper)
# Define watchlist
watchlist = [
{
"name": "Jordan 1 Retro High OG Chicago Lost and Found",
"stockx_slug": "air-jordan-1-retro-high-og-chicago-lost-and-found",
"goat_slug": "air-jordan-1-retro-high-og-lost-found-dz5485-612",
},
{
"name": "Nike Dunk Low Panda",
"stockx_slug": "nike-dunk-low-retro-white-black-2021",
"goat_slug": "nike-dunk-low-retro-black-white-dd1391-100",
},
{
"name": "Adidas Yeezy Slide Onyx",
"stockx_slug": "adidas-yeezy-slide-onyx",
"goat_slug": "adidas-yeezy-slide-onyx-hz5453",
},
{
"name": "New Balance 550 White Green",
"stockx_slug": "new-balance-550-white-green",
"goat_slug": "new-balance-550-white-green-bb550wt1",
},
]
# Track current prices
results = tracker.track_watchlist(watchlist)
tracker.save_snapshot(results)
# Display results
df = pd.DataFrame(results)
display_cols = [
"name", "stockx_lowest_ask", "goat_lowest_price",
"price_diff", "cheaper_platform",
]
available_cols = [c for c in display_cols if c in df.columns]
print("\nCurrent Prices:")
print(df[available_cols].to_string())
# Search for new products
print("\nSearching StockX for trending shoes...")
search_results = stockx_scraper.scrape_search_results("travis scott", max_results=10)
for product in search_results:
print(f" {product.get('name')}: ${product.get('lowest_ask', 'N/A')}")
# Export
df.to_csv("sneaker_prices_latest.csv", index=False)
print(f"\nTracked {len(results)} products across StockX and GOAT")
if __name__ == "__main__":
    main()
Why Mobile Proxies Are Critical for Sneaker Sites
StockX and GOAT have invested heavily in bot detection because automated purchasing bots have been a problem in the sneaker resale industry for years. Their defenses are specifically tuned to detect:
- Datacenter IP addresses (almost always blocked)
- Residential proxy patterns (partially effective)
- Browser automation fingerprints
- Rapid request sequences
Mobile proxies provide the highest success rates because:
- Carrier IP trust. Mobile IPs are assigned by cellular providers and shared by thousands of real users via CGNAT. StockX and GOAT cannot block these without affecting legitimate mobile shoppers.
- Natural browsing patterns. Mobile proxy traffic inherently mimics real user behavior patterns, reducing behavioral detection flags.
- IP rotation. Mobile proxies can rotate IPs by reconnecting to the cellular network, providing fresh addresses without maintaining large proxy pools (see the sketch below).
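Most providers expose this rotation as an API call or special URL you can hit between scraping runs. A minimal sketch; the endpoint below is hypothetical, so substitute your provider's documented rotation URL:
import requests
import time

ROTATE_URL = "https://panel.example-provider.com/api/rotate?key=YOUR_KEY"  # hypothetical endpoint

def rotate_mobile_ip():
    """Ask the provider to reconnect the modem, yielding a fresh carrier IP."""
    resp = requests.get(ROTATE_URL, timeout=30)
    resp.raise_for_status()
    time.sleep(15)  # give the modem time to re-register on the network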
Scheduling Automated Price Checks
For production use, schedule the tracker to run at regular intervals:
import schedule
def scheduled_track():
"""Run price tracking on a schedule."""
proxies = ["http://user:pass@proxy.example.com:8080"]
pool = SneakerProxyPool(proxies)
stockx = StockXScraper(pool)
goat = GOATScraper(pool)
tracker = SneakerPriceTracker(stockx, goat)
watchlist = [...] # Your watchlist
results = tracker.track_watchlist(watchlist)
tracker.save_snapshot(results)
print(f"Scheduled check complete: {len(results)} products tracked")
# Run every 6 hours
schedule.every(6).hours.do(scheduled_track)
while True:
schedule.run_pending()
    time.sleep(60)
Conclusion
Building a sneaker price tracker that spans StockX and GOAT provides a comprehensive view of the resale market. The cross-platform comparison reveals arbitrage opportunities, and historical price tracking identifies trends before they become obvious.
The aggressive anti-bot measures on both platforms make mobile proxy rotation non-negotiable for reliable data collection. With proper proxy infrastructure and the scraping framework outlined in this guide, you can maintain continuous price monitoring across the sneaker market.
For more e-commerce scraping techniques, explore our other tutorials. The proxy glossary provides definitions for proxy concepts referenced throughout this guide, and our web scraping proxy hub covers additional platform-specific scraping strategies.
Related Reading
- How to Scrape Amazon Product Data with Proxies: 2026 Python Guide
- How to Scrape Bing Search Results with Python and Proxies
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix