Grubhub sits on top of one of the richest food-delivery datasets in the US. If you need to scrape Grubhub for menu pricing, item availability, or restaurant density across cities, the good news is that the data is structured and accessible. The bad news: Grubhub’s anti-bot stack has gotten meaningfully tighter since 2024.
How Grubhub structures its data
Grubhub organizes data in three layers: city/market, restaurant listing, and menu.
At the city level, each market has a slug (e.g., chicago-il, new-york-ny) that scopes search results geographically. Restaurant listings include name, cuisine tags, address, rating, review count, delivery fee, and estimated delivery time. Menu data goes deeper: categories, item names, descriptions, prices, modifiers (size, extras), and availability windows.
The key fields you’ll want:
- `restaurant_id` (internal numeric ID, stable across requests)
- `menu_category_id` and `menu_item_id`
- `price` (in cents, divide by 100)
- `availability` (lunch/dinner windows, sometimes day-of-week flags)
- `delivery_zone` (polygon or radius, relevant for multi-city work)
Grubhub also exposes `is_orderable`, which flags restaurants that are live vs. listed-but-closed. That’s useful for filtering before scraping menus.
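For example, a quick pre-filter on a page of search results might look like the sketch below. The `search_result`/`results` nesting matches the response shape used later in this article, but treat the exact structure as something to verify against a live payload.

```python
# Sketch: keep only restaurants that are live before queuing menu scrapes.
# The response shape (search_result -> results, is_orderable) follows the
# fields described above; verify the nesting against a real response.
def orderable_restaurant_ids(search_payload: dict) -> list[str]:
    results = search_payload.get("search_result", {}).get("results", [])
    return [
        r["restaurant_id"]
        for r in results
        if r.get("is_orderable")  # skip listed-but-closed restaurants
    ]
```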
The API approach vs HTML scraping
Don’t scrape the HTML. Grubhub’s frontend is React-rendered, so raw HTML gives you almost nothing without a headless browser. Instead, target the internal JSON API that the webapp calls.
The main endpoints are:
- `https://api-gtm.grubhub.com/restaurants/search`: takes lat/lon + radius, returns the restaurant list
- `https://api-gtm.grubhub.com/restaurants/{restaurant_id}/menu`: full menu JSON
- `https://api-gtm.grubhub.com/restaurants/{restaurant_id}`: restaurant metadata
These endpoints return clean JSON with no HTML parsing needed. Compared to scraping DoorDash restaurant menus and pricing, Grubhub’s API is slightly more permissive in terms of payload structure, but stricter on request headers.
The search endpoint accepts `pageSize` up to 100 and supports pagination via `offset`. You can also filter by cuisine and maximum delivery fee, and sort by rating or distance.
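Since each call caps at 100 results, a small pagination loop is usually the first helper you write. A minimal sketch, assuming a fetch function like the `fetch_restaurants` example later in this post; stopping on an empty page is an assumption, not documented behavior:

```python
# Sketch: walk the search endpoint with offset-based pagination.
def fetch_all_pages(fetch_page, page_size: int = 100, max_pages: int = 20) -> list[dict]:
    results: list[dict] = []
    for page in range(max_pages):
        payload = fetch_page(page_size=page_size, offset=page * page_size)
        page_results = payload.get("search_result", {}).get("results", [])
        if not page_results:
            break  # assume an empty page means no more listings in this radius
        results.extend(page_results)
    return results
```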
| approach | complexity | data quality | speed |
|---|---|---|---|
| HTML scraping | high | poor (client-rendered React) | slow |
| internal API (JSON) | medium | excellent | fast |
| official API (none) | n/a | n/a | n/a |
| headless browser | high | good | very slow |
Grubhub has no public API for third-party access, so the internal API route is your only real option for structured data at scale.
Anti-bot measures and how to handle them
Grubhub uses a combination of rate limiting, TLS fingerprinting, and behavior analysis. Hitting the API without proper headers returns 403s within a few dozen requests. Rotating IPs alone won’t fix it if your TLS fingerprint screams Python `requests`.
What actually works:
- use `httpx` with HTTP/2 support, which more closely matches browser TLS fingerprints than `requests`
- set realistic headers: `User-Agent`, `Accept-Language`, `Referer` (set to `https://www.grubhub.com/`), and `x-csrf-token` (pull from an initial page load)
- rotate residential proxies, not datacenter IPs; Grubhub blocks datacenter CIDR blocks aggressively
- add 2-5 seconds of jitter between requests per session
- keep sessions alive with cookies from an initial homepage hit (see the sketch after this list)
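Here’s a minimal session-bootstrap sketch along those lines. It assumes the CSRF token shows up as a cookie after the homepage load; in practice you may need to pull it from an inline script or an auth response instead, so the cookie name below is a placeholder.

```python
import httpx

# Sketch: seed a session with cookies from the homepage before calling the API.
# The csrf cookie name is an assumption; inspect a real browser session to
# confirm where the token actually lives.
def bootstrap_session(headers: dict, proxy: str | None = None) -> httpx.Client:
    proxies = {"https://": proxy} if proxy else None
    client = httpx.Client(http2=True, headers=headers, proxies=proxies, timeout=15)
    client.get("https://www.grubhub.com/")  # seeds session cookies
    csrf = client.cookies.get("csrf_token")  # placeholder cookie name
    if csrf:
        client.headers["x-csrf-token"] = csrf
    return client
```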
Similar challenges come up when you scrape Uber Eats restaurant listings at scale, though Uber Eats has a different fingerprint profile. And if you’re expanding coverage to European platforms, the same residential-proxy approach applies when you scrape Deliveroo restaurant menus in the UK and EU.
Python code: fetching Grubhub restaurant listings
Here’s a realistic example hitting the search endpoint with proper headers and session reuse:
```python
import httpx
import time
import random

BASE_URL = "https://api-gtm.grubhub.com"

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.grubhub.com/",
    "Origin": "https://www.grubhub.com",
}

def fetch_restaurants(lat: float, lon: float, radius_meters: int = 5000,
                      page_size: int = 100, offset: int = 0, proxy: str | None = None):
    """Hit the search endpoint for one lat/lon tile and return the parsed JSON."""
    params = {
        "orderMethod": "delivery",
        "locationMode": "DELIVERY",
        "pageSize": page_size,
        "hideHateos": True,
        "latitude": lat,
        "longitude": lon,
        "radius": radius_meters,
        "offset": offset,
    }
    proxies = {"https://": proxy} if proxy else None
    with httpx.Client(http2=True, headers=HEADERS, proxies=proxies, timeout=15) as client:
        resp = client.get(f"{BASE_URL}/restaurants/search", params=params)
        resp.raise_for_status()
        time.sleep(random.uniform(2, 5))  # jitter between requests
        return resp.json()

# example: downtown Chicago
data = fetch_restaurants(lat=41.8781, lon=-87.6298, proxy="http://user:pass@residential-proxy:8080")
restaurants = data.get("search_result", {}).get("results", [])
print(f"fetched {len(restaurants)} restaurants")
```

For menu data, swap the endpoint to `/restaurants/{restaurant_id}/menu` and parse the `menu_category_list` key. Each category has an `item_list` array with price, name, and modifier groups.
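Building on that, here’s a hedged sketch of fetching and flattening one menu. It reuses `BASE_URL` and a client configured like the one above; `menu_category_list` and `item_list` come from the structure just described, but the per-item field names (`name`, `price`) are assumptions to verify against a live response.

```python
def fetch_menu(client: httpx.Client, restaurant_id: str) -> list[dict]:
    # Sketch: flatten a menu into (category, item, price) rows.
    # Nested field names are assumptions; check them against a real payload.
    resp = client.get(f"{BASE_URL}/restaurants/{restaurant_id}/menu")
    resp.raise_for_status()
    menu = resp.json()

    rows = []
    for category in menu.get("menu_category_list", []):
        for item in category.get("item_list", []):
            rows.append({
                "restaurant_id": restaurant_id,
                "category": category.get("name"),
                "name": item.get("name"),
                "price_cents": item.get("price"),  # prices come back in cents
            })
    return rows
```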
Scaling across cities and storing the data
The cleanest multi-city approach is a lat/lon grid. Pick a city center, then tile outward with overlapping radius circles (5 km radius, 4 km step) to avoid coverage gaps. For a city like LA you’ll need 15-20 tiles to cover the metro; for NYC, closer to 30.
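Here’s a rough tile-generation sketch using a flat-earth approximation (1 degree of latitude ≈ 111 km, longitude scaled by the cosine of latitude), which is fine at city scale; the `rings` parameter, a placeholder, controls how far the grid extends from the center.

```python
import math

# Sketch: overlapping grid tiles around a city center, 4 km step per the
# approach above. Flat-earth math is acceptable at city scale.
def city_tiles(center_lat: float, center_lon: float, step_km: float = 4.0, rings: int = 2):
    km_per_deg_lat = 111.0
    km_per_deg_lon = 111.0 * math.cos(math.radians(center_lat))
    tiles = []
    for i in range(-rings, rings + 1):
        for j in range(-rings, rings + 1):
            tiles.append((
                round(center_lat + (i * step_km) / km_per_deg_lat, 6),
                round(center_lon + (j * step_km) / km_per_deg_lon, 6),
            ))
    return tiles

# example: a 5x5 grid (25 tiles) centered on downtown Chicago
tiles = city_tiles(41.8781, -87.6298)
```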
Store results in Postgres with a schema like:
```
restaurants(id, grubhub_id, city, name, lat, lon, rating, scraped_at)
menu_items(id, restaurant_id, category, name, price_cents, scraped_at)
```

Index `grubhub_id` as unique to deduplicate on re-runs. Run incremental scrapes daily to capture pricing volatility; menu prices on Grubhub shift frequently, especially for items with demand-based pricing.
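As a sketch of that dedup step, the upsert might look like this with psycopg2; the DSN, column list, and updated columns are placeholders to adapt to your schema.

```python
import psycopg2
from psycopg2.extras import execute_values

# Sketch: upsert restaurant rows, deduplicating on the unique grubhub_id
# index described above. DSN and column list are placeholders.
def upsert_restaurants(rows: list[tuple], dsn: str = "postgresql://localhost/grubhub"):
    sql = """
        INSERT INTO restaurants (grubhub_id, city, name, lat, lon, rating, scraped_at)
        VALUES %s
        ON CONFLICT (grubhub_id) DO UPDATE
        SET rating = EXCLUDED.rating,
            scraped_at = EXCLUDED.scraped_at
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(cur, sql, rows)
```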
For the job queue, use Celery with Redis or a simple Postgres-backed queue. 10-15 concurrent workers with a shared rotating proxy pool can cover ~50 cities overnight without triggering hard bans.
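A minimal Celery sketch along those lines, with one task per grid tile; the broker URL, rate limit, and retry policy are placeholders, and `fetch_restaurants` is the function from the earlier example.

```python
from celery import Celery

# Sketch: one task per (lat, lon) grid tile, throttled per worker.
# Broker URL and rate limit are placeholders to tune for your proxy pool.
app = Celery("grubhub_scraper", broker="redis://localhost:6379/0")

@app.task(rate_limit="10/m", autoretry_for=(Exception,), retry_backoff=True, max_retries=3)
def scrape_tile(lat: float, lon: float, proxy: str | None = None):
    data = fetch_restaurants(lat=lat, lon=lon, proxy=proxy)  # from the earlier example
    # persist results here (e.g., an upsert), then queue menu fetches per restaurant
    return len(data.get("search_result", {}).get("results", []))
```

Start workers with something like `celery -A tasks worker --concurrency=15` (module name assumed) and fan the grid tiles out as individual task calls.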
If you’re also pulling data from Asian markets, the same grid-based architecture translates well when you scrape Foodpanda menu data across Asia and the EU or scrape GrabFood restaurant and menu data. The proxy and rate-limit logic is nearly identical across these platforms.
Expect roughly 800 restaurants per major US city on average, with 20-60 menu items each. A full 50-city dataset runs 40,000-80,000 restaurants and 2-4 million menu rows, which is manageable in Postgres with proper partitioning by city or scrape date.
Bottom line
Scraping Grubhub at scale is doable with httpx, residential proxies, and the internal JSON API; skip the HTML layer entirely. The biggest failure mode is fingerprint mismatch, not IP bans, so invest in realistic headers and HTTP/2 before you scale. dataresearchtools.com covers Grubhub alongside the full food-delivery ecosystem if you need benchmarks or want to track how the anti-bot landscape evolves through 2026.