Scraping GoFood (Gojek) Restaurant Listings at Scale
GoFood, the food delivery arm of Indonesian super-app Gojek, is one of the largest food delivery platforms in Southeast Asia. With a dominant position in Indonesia and a growing presence across the region, GoFood holds an enormous dataset of restaurant information, menu items, pricing, and delivery logistics.
For market researchers, F&B businesses, and data analysts focusing on the Indonesian market, GoFood data is indispensable. This guide covers how to scrape GoFood restaurant listings at scale while navigating the platform’s technical defenses.
The GoFood Data Landscape
Scale of the Platform
GoFood’s data footprint is substantial:
- 500,000+ restaurant partners across Indonesia
- Presence in 50+ cities including Jakarta, Surabaya, Bandung, Medan, and Bali
- Millions of menu items with pricing in Indonesian Rupiah
- Real-time delivery data including driver availability and ETAs
- Ratings and reviews from one of SEA’s largest user bases
Data Categories Available
| Data Type | Description | Update Frequency |
|---|---|---|
| Restaurant profiles | Name, address, cuisine, hours | Weekly |
| Menu items | Names, prices, descriptions | Daily |
| Promotions | Discounts, vouchers, flash deals | Hourly |
| Ratings | Star ratings, review counts | Daily |
| Delivery info | Fees, ETAs, zones | Real-time |
| Photos | Restaurant and food images | Weekly |
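The update frequencies in the table above can drive a simple re-scrape scheduler. A minimal sketch, assuming you track a `scraped_at` timestamp per record (the interval values mirror the table; the function and dictionary names are ours):

```python
from datetime import datetime, timedelta

# Re-scrape intervals derived from the update frequencies in the table above
REFRESH_INTERVALS = {
    "restaurant_profiles": timedelta(weeks=1),
    "menu_items": timedelta(days=1),
    "promotions": timedelta(hours=1),
    "ratings": timedelta(days=1),
    "photos": timedelta(weeks=1),
}

def needs_refresh(data_type, last_scraped_at, now=None):
    """Return True if a record of this type is older than its refresh interval."""
    now = now or datetime.utcnow()
    interval = REFRESH_INTERVALS.get(data_type)
    if interval is None:
        return True  # unknown data types: always refresh
    return now - last_scraped_at >= interval
```

This keeps hourly-churn data such as promotions fresh without re-crawling the slow-moving restaurant profiles every run.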
Understanding GoFood’s Technical Stack
Mobile-First Architecture
GoFood is deeply integrated into the Gojek super-app. Unlike some competitors, there is limited web-based access to GoFood data. This means:
- Primary data source: The Gojek mobile app and its API
- API authentication: Token-based auth tied to app sessions
- Data delivery: JSON responses from RESTful and GraphQL endpoints
- Location services: GPS-based location verification
Why Mobile Proxies Are Essential
GoFood’s security infrastructure validates that requests originate from real mobile devices on Indonesian mobile networks. The platform checks:
- IP origin: Must be from known Indonesian mobile carriers (Telkomsel, Indosat, XL Axiata, Tri)
- Network type: Mobile data connections are treated differently from WiFi and datacenter connections
- Device attestation: App-level checks for rooted or emulated devices
- Request patterns: Behavioral analysis of API call sequences
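One practical consequence of device attestation and behavioral analysis: the device identifier your scraper sends should stay stable across runs, because a fingerprint that changes on every session is itself a red flag. A hedged sketch that persists a generated ID to disk (the file path and JSON format are our choice):

```python
import json
import uuid
from pathlib import Path

def load_or_create_device_id(path="device_id.json"):
    """Reuse one device ID across runs so the device fingerprint stays consistent."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())["device_id"]
    device_id = str(uuid.uuid4())
    p.write_text(json.dumps({"device_id": device_id}))
    return device_id
```

You would then pass this value into the scraper's `X-UniqueId` header instead of generating a fresh UUID per session.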
DataResearchTools provides mobile proxies on Indonesian carriers including Telkomsel, Indosat Ooredoo, and XL Axiata, giving your scraper the authentic network fingerprint that GoFood expects.
Setting Up the Scraping Infrastructure
Environment Setup
# Create project directory
mkdir gofood-scraper && cd gofood-scraper
# Set up virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install requests aiohttp pandas sqlalchemy
Core Scraper Class
import requests
import time
import random
import json
from datetime import datetime
class GoFoodScraper:
def __init__(self, proxy_user, proxy_pass):
self.base_api = "https://api.gojekapi.com"
self.session = requests.Session()
# Configure DataResearchTools Indonesian mobile proxy
self.session.proxies = {
"http": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
"https": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080"
}
self.session.headers.update({
"User-Agent": "Gojek/4.x.x (Android 14; samsung SM-A546E)",
"Accept": "application/json",
"Accept-Language": "id-ID,id;q=0.9,en;q=0.8",
"X-Platform": "android",
"X-AppVersion": "4.80.0",
"X-UniqueId": self._generate_device_id()
})
def _generate_device_id(self):
"""Generate a realistic device identifier."""
import uuid
return str(uuid.uuid4())
def _respectful_delay(self, min_sec=2, max_sec=5):
"""Add a random delay between requests."""
        time.sleep(random.uniform(min_sec, max_sec))
Scraping Restaurant Listings
Location-Based Discovery
GoFood restaurant discovery is entirely location-dependent. You need to query from coordinates within Indonesian cities.
# Major Indonesian city coordinates for scraping
INDONESIAN_CITIES = {
"Jakarta": {
"center": (-6.2088, 106.8456),
"bounds": {"min_lat": -6.38, "max_lat": -6.08, "min_lng": 106.65, "max_lng": 107.00}
},
"Surabaya": {
"center": (-7.2575, 112.7521),
"bounds": {"min_lat": -7.35, "max_lat": -7.20, "min_lng": 112.65, "max_lng": 112.85}
},
"Bandung": {
"center": (-6.9175, 107.6191),
"bounds": {"min_lat": -6.98, "max_lat": -6.85, "min_lng": 107.55, "max_lng": 107.70}
},
"Medan": {
"center": (3.5952, 98.6722),
"bounds": {"min_lat": 3.50, "max_lat": 3.70, "min_lng": 98.60, "max_lng": 98.75}
},
"Bali": {
"center": (-8.6500, 115.2167),
"bounds": {"min_lat": -8.80, "max_lat": -8.50, "min_lng": 115.10, "max_lng": 115.30}
}
}
def generate_search_grid(city_bounds, step=0.005):
"""Generate coordinate grid for a city."""
points = []
lat = city_bounds["min_lat"]
while lat <= city_bounds["max_lat"]:
lng = city_bounds["min_lng"]
while lng <= city_bounds["max_lng"]:
points.append((lat, lng))
lng += step
lat += step
    return points
Fetching Restaurant Lists
def get_nearby_restaurants(self, latitude, longitude, page=0, limit=20):
"""Fetch restaurants near a specific coordinate."""
params = {
"lat": latitude,
"long": longitude,
"page": page,
"limit": limit,
"cuisine": "",
"sort_by": "distance"
}
response = self.session.get(
f"{self.base_api}/gofood/v3/restaurants",
params=params
)
if response.status_code == 200:
return response.json()
elif response.status_code == 429:
# Rate limited - back off
time.sleep(random.uniform(30, 60))
return None
else:
return None
def scrape_city_restaurants(self, city_name):
"""Scrape all restaurants in a given city."""
city = INDONESIAN_CITIES.get(city_name)
if not city:
raise ValueError(f"Unknown city: {city_name}")
grid_points = generate_search_grid(city["bounds"])
all_restaurants = {}
for i, (lat, lng) in enumerate(grid_points):
print(f"Scanning point {i+1}/{len(grid_points)}: ({lat}, {lng})")
page = 0
while True:
data = self.get_nearby_restaurants(lat, lng, page=page)
if not data or not data.get("restaurants"):
break
for restaurant in data["restaurants"]:
rid = restaurant.get("id")
if rid and rid not in all_restaurants:
all_restaurants[rid] = {
"id": rid,
"name": restaurant.get("name"),
"address": restaurant.get("address"),
"latitude": restaurant.get("latitude"),
"longitude": restaurant.get("longitude"),
"cuisine": restaurant.get("cuisine_type"),
"rating": restaurant.get("rating"),
"review_count": restaurant.get("total_reviews"),
"delivery_fee": restaurant.get("delivery_fee"),
"min_order": restaurant.get("min_order"),
"estimated_delivery": restaurant.get("eta_minutes"),
"is_open": restaurant.get("is_open"),
"city": city_name,
"scraped_at": datetime.utcnow().isoformat()
}
if len(data["restaurants"]) < 20:
break
page += 1
self._respectful_delay()
self._respectful_delay(1, 3)
        return list(all_restaurants.values())
Extracting Menu Data
Detailed Menu Scraping
def get_restaurant_menu(self, restaurant_id):
"""Fetch the complete menu for a restaurant."""
response = self.session.get(
f"{self.base_api}/gofood/v2/restaurants/{restaurant_id}/menu"
)
if response.status_code != 200:
return None
raw_menu = response.json()
parsed_items = []
for category in raw_menu.get("categories", []):
for item in category.get("items", []):
parsed_items.append({
"restaurant_id": restaurant_id,
"category": category.get("name", ""),
"item_name": item.get("name", ""),
"description": item.get("description", ""),
"price_idr": item.get("price", 0),
"original_price_idr": item.get("original_price", item.get("price", 0)),
"is_discounted": item.get("price", 0) < item.get("original_price", item.get("price", 0)),
"discount_percentage": self._calc_discount(
item.get("price", 0),
item.get("original_price", item.get("price", 0))
),
"is_available": item.get("available", True),
"is_popular": item.get("is_popular", False),
"image_url": item.get("image_url", ""),
"variants": [
{"name": v.get("name"), "price": v.get("price")}
for v in item.get("variants", [])
]
})
return parsed_items
def _calc_discount(self, current, original):
"""Calculate discount percentage."""
if original <= 0 or current >= original:
return 0
        return round((1 - current / original) * 100, 1)
Scaling the Scraping Operation
Asynchronous Scraping
For large-scale data collection across multiple Indonesian cities, use asynchronous requests:
import aiohttp
import asyncio
import random
class AsyncGoFoodScraper:
def __init__(self, proxy_url):
self.proxy_url = proxy_url
self.base_api = "https://api.gojekapi.com"
self.semaphore = asyncio.Semaphore(5) # Max concurrent requests
async def fetch_menu(self, session, restaurant_id):
"""Fetch menu data asynchronously."""
async with self.semaphore:
try:
async with session.get(
f"{self.base_api}/gofood/v2/restaurants/{restaurant_id}/menu",
proxy=self.proxy_url,
timeout=aiohttp.ClientTimeout(total=30)
) as response:
if response.status == 200:
return await response.json()
return None
except Exception as e:
print(f"Error fetching menu {restaurant_id}: {e}")
return None
finally:
await asyncio.sleep(random.uniform(1, 3))
async def scrape_menus_batch(self, restaurant_ids):
"""Scrape menus for a batch of restaurants."""
async with aiohttp.ClientSession(headers={
"User-Agent": "Gojek/4.x.x (Android 14)",
"Accept": "application/json"
}) as session:
tasks = [
self.fetch_menu(session, rid) for rid in restaurant_ids
]
            return await asyncio.gather(*tasks)
Batch Processing Strategy
For scraping at the scale of GoFood’s Indonesian catalog:
def process_city_in_batches(scraper, city_name, batch_size=50):
"""Process a city's restaurants in manageable batches."""
# Step 1: Get all restaurant IDs
restaurants = scraper.scrape_city_restaurants(city_name)
restaurant_ids = [r["id"] for r in restaurants]
print(f"Found {len(restaurant_ids)} restaurants in {city_name}")
# Step 2: Scrape menus in batches
all_menus = []
for i in range(0, len(restaurant_ids), batch_size):
batch = restaurant_ids[i:i + batch_size]
print(f"Processing batch {i//batch_size + 1} ({len(batch)} restaurants)")
for rid in batch:
menu = scraper.get_restaurant_menu(rid)
if menu:
all_menus.extend(menu)
scraper._respectful_delay()
# Longer pause between batches
time.sleep(random.uniform(10, 20))
    return all_menus
Data Storage and Analysis
Saving to Database
from sqlalchemy import create_engine, Column, String, Float, Integer, Boolean, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker
Base = declarative_base()
class GoFoodRestaurant(Base):
__tablename__ = "gofood_restaurants"
id = Column(String, primary_key=True)
name = Column(String)
address = Column(String)
latitude = Column(Float)
longitude = Column(Float)
cuisine = Column(String)
rating = Column(Float)
review_count = Column(Integer)
delivery_fee = Column(Integer)
min_order = Column(Integer)
city = Column(String)
scraped_at = Column(DateTime)
class GoFoodMenuItem(Base):
__tablename__ = "gofood_menu_items"
id = Column(Integer, primary_key=True, autoincrement=True)
restaurant_id = Column(String)
category = Column(String)
item_name = Column(String)
price_idr = Column(Integer)
original_price_idr = Column(Integer)
is_discounted = Column(Boolean)
is_available = Column(Boolean)
    scraped_at = Column(DateTime)
Price Analysis Queries
Common analytical queries on GoFood data:
from sqlalchemy import text

def analyze_city_pricing(db_session, city):
    """Analyze pricing patterns for a city."""
    results = db_session.execute(text("""
        SELECT
            cuisine,
            COUNT(DISTINCT r.id) AS restaurant_count,
            AVG(m.price_idr) AS avg_item_price,
            MIN(m.price_idr) AS min_price,
            MAX(m.price_idr) AS max_price,
            AVG(r.delivery_fee) AS avg_delivery_fee,
            AVG(r.rating) AS avg_rating
        FROM gofood_restaurants r
        JOIN gofood_menu_items m ON r.id = m.restaurant_id
        WHERE r.city = :city AND m.is_available = true
        GROUP BY cuisine
        ORDER BY restaurant_count DESC
    """), {"city": city})
    return results.fetchall()
Indonesian Market-Specific Considerations
Currency and Pricing
Indonesian Rupiah values are large numbers. Menu items typically range from IDR 10,000 to IDR 200,000. Store prices as integers (in Rupiah) to avoid floating-point issues.
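With prices stored as integer Rupiah, a small helper keeps display formatting consistent. Indonesian convention uses a dot as the thousands separator; the function name here is ours:

```python
def format_idr(amount):
    """Format an integer Rupiah amount, e.g. 25000 -> 'Rp25.000'."""
    return "Rp" + f"{amount:,}".replace(",", ".")
```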
Language Handling
GoFood content is primarily in Bahasa Indonesia. Ensure your scraper handles Indonesian characters correctly:
# Ensure proper encoding
response.encoding = "utf-8"
data = response.json()
# Indonesian menu items may contain special characters
item_name = data.get("name", "").strip()
Peak Hours in Indonesia
Schedule heavy scraping during off-peak hours to minimize impact and reduce detection risk:
- Off-peak: 1:00 AM – 6:00 AM WIB (UTC+7)
- Moderate: 9:00 AM – 11:00 AM WIB
- Peak (avoid): 11:30 AM – 1:30 PM and 6:00 PM – 8:30 PM WIB
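These windows can be encoded so a scheduler checks the current Jakarta time before launching heavy jobs. A sketch using the standard-library zoneinfo (the tier labels and function name are ours; the time windows follow the list above):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# Windows to avoid, per the schedule above (WIB, UTC+7)
PEAK_WINDOWS = [(time(11, 30), time(13, 30)), (time(18, 0), time(20, 30))]

def scraping_tier(now=None):
    """Classify the current WIB time as 'off-peak', 'moderate', or 'peak'."""
    now = now or datetime.now(ZoneInfo("Asia/Jakarta"))
    t = now.time()
    if any(start <= t <= end for start, end in PEAK_WINDOWS):
        return "peak"
    if time(1, 0) <= t <= time(6, 0):
        return "off-peak"
    return "moderate"
```

A batch job can then sleep or throttle itself whenever `scraping_tier()` returns `"peak"`.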
Regional Cuisine Categories
GoFood has cuisine categories specific to Indonesia:
- Nasi (Rice dishes)
- Mie & Bakso (Noodles and Meatballs)
- Ayam & Bebek (Chicken and Duck)
- Sate (Satay)
- Martabak
- Kopi (Coffee)
- Minuman (Beverages)
- Jajanan (Snacks)
Understanding these categories helps you structure your data collection and analysis.
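When aggregating across cities, it also helps to normalize the raw category strings into stable keys. A minimal mapping sketch (the raw labels follow the list above; the English keys and the `"other"` fallback are our choice):

```python
# Map raw GoFood category labels (lowercased) to stable English keys
CUISINE_MAP = {
    "nasi": "rice",
    "mie & bakso": "noodles_meatballs",
    "ayam & bebek": "chicken_duck",
    "sate": "satay",
    "martabak": "martabak",
    "kopi": "coffee",
    "minuman": "beverages",
    "jajanan": "snacks",
}

def normalize_cuisine(raw):
    """Map a raw category label to a stable key, defaulting to 'other'."""
    return CUISINE_MAP.get(raw.strip().lower(), "other")
```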
Handling Common Challenges
Dynamic API Endpoints
GoFood occasionally updates its API structure. Build your scraper to handle endpoint changes gracefully:
def get_with_fallback(self, endpoint_versions, params):
"""Try multiple API versions until one works."""
for endpoint in endpoint_versions:
try:
response = self.session.get(
f"{self.base_api}{endpoint}",
params=params,
timeout=30
)
if response.status_code == 200:
return response.json()
except Exception:
continue
return None
# Usage (inside the scraper class; rid is a restaurant ID)
menu = self.get_with_fallback(
[
f"/gofood/v3/restaurants/{rid}/menu",
f"/gofood/v2/restaurants/{rid}/menu",
f"/gofood/v1/restaurants/{rid}/catalog"
],
params={}
)
Session Expiration
GoFood sessions expire periodically. Implement automatic session refresh:
def ensure_valid_session(self):
"""Check and refresh session if needed."""
test_response = self.session.get(
f"{self.base_api}/gofood/v1/health",
timeout=10
)
if test_response.status_code == 401:
        # _initialize_session() is assumed to re-run your app's auth flow (not shown here)
        self._initialize_session()
        return True
    return False
Conclusion
Scraping GoFood at scale requires understanding Indonesia’s mobile-first digital landscape and using proxy infrastructure that matches the platform’s expectations. DataResearchTools mobile proxies on Indonesian carriers provide the authentic network identity needed to collect GoFood data reliably.
Start with a single city like Jakarta, build out your data pipeline, validate the quality of collected data, and then expand to additional cities. The combination of proper mobile proxy infrastructure, respectful scraping practices, and structured data storage will give you a comprehensive view of Indonesia’s food delivery market.
Related Reading
- Best Proxies for Food Delivery Platform Scraping
- How Cloud Kitchens Use Proxies for Competitive Menu Analysis
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)