Scraping GoFood (Gojek) Restaurant Listings at Scale

GoFood, the food delivery arm of Indonesian super-app Gojek, is one of the largest food delivery platforms in Southeast Asia. With a dominant position in Indonesia and a growing presence across the region, GoFood holds an enormous dataset of restaurant information, menu items, pricing, and delivery logistics.

For market researchers, F&B businesses, and data analysts focusing on the Indonesian market, GoFood data is indispensable. This guide covers how to scrape GoFood restaurant listings at scale while navigating the platform’s technical defenses.

The GoFood Data Landscape

Scale of the Platform

GoFood’s data footprint is substantial:

  • 500,000+ restaurant partners across Indonesia
  • Presence in 50+ cities including Jakarta, Surabaya, Bandung, Medan, and Bali
  • Millions of menu items with pricing in Indonesian Rupiah
  • Real-time delivery data including driver availability and ETAs
  • Ratings and reviews from one of SEA’s largest user bases

Data Categories Available

| Data Type | Description | Update Frequency |
| --- | --- | --- |
| Restaurant profiles | Name, address, cuisine, hours | Weekly |
| Menu items | Names, prices, descriptions | Daily |
| Promotions | Discounts, vouchers, flash deals | Hourly |
| Ratings | Star ratings, review counts | Daily |
| Delivery info | Fees, ETAs, zones | Real-time |
| Photos | Restaurant and food images | Weekly |

Understanding GoFood’s Technical Stack

Mobile-First Architecture

GoFood is deeply integrated into the Gojek super-app. Unlike some of its competitors, GoFood offers only limited web-based access to its data. This means:

  • Primary data source: The Gojek mobile app and its API
  • API authentication: Token-based auth tied to app sessions
  • Data delivery: JSON responses from RESTful and GraphQL endpoints
  • Location services: GPS-based location verification

Why Mobile Proxies Are Essential

GoFood’s security infrastructure validates that requests originate from real mobile devices on Indonesian mobile networks. The platform checks:

  1. IP origin: Must be from known Indonesian mobile carriers (Telkomsel, Indosat, XL Axiata, Tri)
  2. Network type: Mobile data connections are treated differently from WiFi and datacenter connections
  3. Device attestation: App-level checks for rooted or emulated devices
  4. Request patterns: Behavioral analysis of API call sequences

DataResearchTools provides mobile proxies on Indonesian carriers including Telkomsel, Indosat Ooredoo, and XL Axiata, giving your scraper the authentic network fingerprint that GoFood expects.

Setting Up the Scraping Infrastructure

Environment Setup

# Create project directory
mkdir gofood-scraper && cd gofood-scraper

# Set up virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies (asyncio is part of the standard library and needs no install)
pip install requests aiohttp pandas sqlalchemy

Core Scraper Class

import requests
import time
import random
import json
from datetime import datetime

class GoFoodScraper:
    def __init__(self, proxy_user, proxy_pass):
        self.base_api = "https://api.gojekapi.com"
        self.session = requests.Session()

        # Configure DataResearchTools Indonesian mobile proxy
        self.session.proxies = {
            "http": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
            "https": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080"
        }

        self.session.headers.update({
            "User-Agent": "Gojek/4.x.x (Android 14; samsung SM-A546E)",
            "Accept": "application/json",
            "Accept-Language": "id-ID,id;q=0.9,en;q=0.8",
            "X-Platform": "android",
            "X-AppVersion": "4.80.0",
            "X-UniqueId": self._generate_device_id()
        })

    def _generate_device_id(self):
        """Generate a realistic device identifier."""
        import uuid
        return str(uuid.uuid4())

    def _respectful_delay(self, min_sec=2, max_sec=5):
        """Add a random delay between requests."""
        time.sleep(random.uniform(min_sec, max_sec))

Scraping Restaurant Listings

Location-Based Discovery

GoFood restaurant discovery is entirely location-dependent. You need to query from coordinates within Indonesian cities.

# Major Indonesian city coordinates for scraping
INDONESIAN_CITIES = {
    "Jakarta": {
        "center": (-6.2088, 106.8456),
        "bounds": {"min_lat": -6.38, "max_lat": -6.08, "min_lng": 106.65, "max_lng": 107.00}
    },
    "Surabaya": {
        "center": (-7.2575, 112.7521),
        "bounds": {"min_lat": -7.35, "max_lat": -7.20, "min_lng": 112.65, "max_lng": 112.85}
    },
    "Bandung": {
        "center": (-6.9175, 107.6191),
        "bounds": {"min_lat": -6.98, "max_lat": -6.85, "min_lng": 107.55, "max_lng": 107.70}
    },
    "Medan": {
        "center": (3.5952, 98.6722),
        "bounds": {"min_lat": 3.50, "max_lat": 3.70, "min_lng": 98.60, "max_lng": 98.75}
    },
    "Bali": {
        "center": (-8.6500, 115.2167),
        "bounds": {"min_lat": -8.80, "max_lat": -8.50, "min_lng": 115.10, "max_lng": 115.30}
    }
}

def generate_search_grid(city_bounds, step=0.005):
    """Generate coordinate grid for a city."""
    points = []
    lat = city_bounds["min_lat"]
    while lat <= city_bounds["max_lat"]:
        lng = city_bounds["min_lng"]
        while lng <= city_bounds["max_lng"]:
            points.append((lat, lng))
            lng += step
        lat += step
    return points
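Before launching a scrape, it is worth checking how many points a city's grid will produce, since each point costs at least one API call. A quick standalone check (the grid function and Jakarta bounds are repeated from above so this runs on its own):

```python
# Sanity-check grid density before scraping.
def generate_search_grid(city_bounds, step=0.005):
    points = []
    lat = city_bounds["min_lat"]
    while lat <= city_bounds["max_lat"]:
        lng = city_bounds["min_lng"]
        while lng <= city_bounds["max_lng"]:
            points.append((lat, lng))
            lng += step
        lat += step
    return points

jakarta = {"min_lat": -6.38, "max_lat": -6.08,
           "min_lng": 106.65, "max_lng": 107.00}

# One degree of latitude is ~111 km, so step=0.01 spaces points ~1.1 km apart;
# the default step=0.005 quadruples the point count for denser coverage.
grid = generate_search_grid(jakarta, step=0.01)
print(len(grid))
```

At the coarser 0.01-degree step Jakarta already yields over a thousand grid points, so budget request volume and delays accordingly.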

Fetching Restaurant Lists

def get_nearby_restaurants(self, latitude, longitude, page=0, limit=20):
    """Fetch restaurants near a specific coordinate."""
    params = {
        "lat": latitude,
        "long": longitude,
        "page": page,
        "limit": limit,
        "cuisine": "",
        "sort_by": "distance"
    }

    response = self.session.get(
        f"{self.base_api}/gofood/v3/restaurants",
        params=params
    )

    if response.status_code == 200:
        return response.json()
    elif response.status_code == 429:
        # Rate limited - back off
        time.sleep(random.uniform(30, 60))
        return None
    else:
        return None

def scrape_city_restaurants(self, city_name):
    """Scrape all restaurants in a given city."""
    city = INDONESIAN_CITIES.get(city_name)
    if not city:
        raise ValueError(f"Unknown city: {city_name}")

    grid_points = generate_search_grid(city["bounds"])
    all_restaurants = {}

    for i, (lat, lng) in enumerate(grid_points):
        print(f"Scanning point {i+1}/{len(grid_points)}: ({lat}, {lng})")

        page = 0
        while True:
            data = self.get_nearby_restaurants(lat, lng, page=page)
            if not data or not data.get("restaurants"):
                break

            for restaurant in data["restaurants"]:
                rid = restaurant.get("id")
                if rid and rid not in all_restaurants:
                    all_restaurants[rid] = {
                        "id": rid,
                        "name": restaurant.get("name"),
                        "address": restaurant.get("address"),
                        "latitude": restaurant.get("latitude"),
                        "longitude": restaurant.get("longitude"),
                        "cuisine": restaurant.get("cuisine_type"),
                        "rating": restaurant.get("rating"),
                        "review_count": restaurant.get("total_reviews"),
                        "delivery_fee": restaurant.get("delivery_fee"),
                        "min_order": restaurant.get("min_order"),
                        "estimated_delivery": restaurant.get("eta_minutes"),
                        "is_open": restaurant.get("is_open"),
                        "city": city_name,
                        "scraped_at": datetime.utcnow().isoformat()
                    }

            if len(data["restaurants"]) < 20:
                break
            page += 1
            self._respectful_delay()

        self._respectful_delay(1, 3)

    return list(all_restaurants.values())
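With the listings collected, snapshotting them for inspection takes a few lines with pandas (already in the dependency list). The sample rows below stand in for real output of scrape_city_restaurants():

```python
import pandas as pd

# Sample rows standing in for scrape_city_restaurants() output.
restaurants = [
    {"id": "r1", "name": "Warung Sederhana", "city": "Jakarta", "rating": 4.6},
    {"id": "r2", "name": "Bakso Pak Min", "city": "Jakarta", "rating": 4.8},
]

df = pd.DataFrame(restaurants)
df.to_csv("jakarta_restaurants.csv", index=False)
print(df.shape)  # (2, 4)
```

CSV snapshots are handy for spot-checking coverage; the database models later in this guide are the better home for production data.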

Extracting Menu Data

Detailed Menu Scraping

def get_restaurant_menu(self, restaurant_id):
    """Fetch the complete menu for a restaurant."""
    response = self.session.get(
        f"{self.base_api}/gofood/v2/restaurants/{restaurant_id}/menu"
    )

    if response.status_code != 200:
        return None

    raw_menu = response.json()
    parsed_items = []

    for category in raw_menu.get("categories", []):
        for item in category.get("items", []):
            parsed_items.append({
                "restaurant_id": restaurant_id,
                "category": category.get("name", ""),
                "item_name": item.get("name", ""),
                "description": item.get("description", ""),
                "price_idr": item.get("price", 0),
                "original_price_idr": item.get("original_price", item.get("price", 0)),
                "is_discounted": item.get("price", 0) < item.get("original_price", item.get("price", 0)),
                "discount_percentage": self._calc_discount(
                    item.get("price", 0),
                    item.get("original_price", item.get("price", 0))
                ),
                "is_available": item.get("available", True),
                "is_popular": item.get("is_popular", False),
                "image_url": item.get("image_url", ""),
                "variants": [
                    {"name": v.get("name"), "price": v.get("price")}
                    for v in item.get("variants", [])
                ]
            })

    return parsed_items

def _calc_discount(self, current, original):
    """Calculate discount percentage."""
    if original <= 0 or current >= original:
        return 0
    return round((1 - current / original) * 100, 1)
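The discount logic deserves a quick worked example. Shown here as a standalone function so the rounding behaviour on typical Rupiah price points is visible:

```python
def calc_discount(current, original):
    """Same logic as _calc_discount above, standalone for demonstration."""
    if original <= 0 or current >= original:
        return 0
    return round((1 - current / original) * 100, 1)

print(calc_discount(20_000, 25_000))  # 20.0 — IDR 20,000 down from 25,000
print(calc_discount(25_000, 25_000))  # 0 — no discount
print(calc_discount(10_000, 0))       # 0 — guards against a missing original price
```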

Scaling the Scraping Operation

Asynchronous Scraping

For large-scale data collection across multiple Indonesian cities, use asynchronous requests:

import aiohttp
import asyncio

class AsyncGoFoodScraper:
    def __init__(self, proxy_url):
        self.proxy_url = proxy_url
        self.base_api = "https://api.gojekapi.com"
        self.semaphore = asyncio.Semaphore(5)  # Max concurrent requests

    async def fetch_menu(self, session, restaurant_id):
        """Fetch menu data asynchronously."""
        async with self.semaphore:
            try:
                async with session.get(
                    f"{self.base_api}/gofood/v2/restaurants/{restaurant_id}/menu",
                    proxy=self.proxy_url,
                    timeout=aiohttp.ClientTimeout(total=30)
                ) as response:
                    if response.status == 200:
                        return await response.json()
                    return None
            except Exception as e:
                print(f"Error fetching menu {restaurant_id}: {e}")
                return None
            finally:
                await asyncio.sleep(random.uniform(1, 3))

    async def scrape_menus_batch(self, restaurant_ids):
        """Scrape menus for a batch of restaurants."""
        async with aiohttp.ClientSession(headers={
            "User-Agent": "Gojek/4.x.x (Android 14)",
            "Accept": "application/json"
        }) as session:
            tasks = [
                self.fetch_menu(session, rid) for rid in restaurant_ids
            ]
            return await asyncio.gather(*tasks)
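The Semaphore(5) is the key scaling control in the class above. The stub below shows the same bounded-gather pattern without any network calls, so the concurrency ceiling is directly observable:

```python
import asyncio

async def bounded_fetch(semaphore, counter, item):
    # Mirrors fetch_menu(): acquire the semaphore, do the "request", release.
    async with semaphore:
        counter["active"] += 1
        counter["peak"] = max(counter["peak"], counter["active"])
        await asyncio.sleep(0.01)  # stands in for the HTTP round trip
        counter["active"] -= 1
        return item

async def main():
    semaphore = asyncio.Semaphore(5)  # same cap as AsyncGoFoodScraper
    counter = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(bounded_fetch(semaphore, counter, i) for i in range(20))
    )
    return results, counter["peak"]

results, peak = asyncio.run(main())
print(len(results), peak)  # 20 results; peak concurrency never exceeds 5
```

Note that asyncio.gather preserves input order, so results line up with the restaurant IDs you pass in.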

Batch Processing Strategy

For scraping at the scale of GoFood’s Indonesian catalog:

def process_city_in_batches(scraper, city_name, batch_size=50):
    """Process a city's restaurants in manageable batches."""
    # Step 1: Get all restaurant IDs
    restaurants = scraper.scrape_city_restaurants(city_name)
    restaurant_ids = [r["id"] for r in restaurants]
    print(f"Found {len(restaurant_ids)} restaurants in {city_name}")

    # Step 2: Scrape menus in batches
    all_menus = []
    for i in range(0, len(restaurant_ids), batch_size):
        batch = restaurant_ids[i:i + batch_size]
        print(f"Processing batch {i//batch_size + 1} ({len(batch)} restaurants)")

        for rid in batch:
            menu = scraper.get_restaurant_menu(rid)
            if menu:
                all_menus.extend(menu)
            scraper._respectful_delay()

        # Longer pause between batches
        time.sleep(random.uniform(10, 20))

    return all_menus

Data Storage and Analysis

Saving to Database

from sqlalchemy import create_engine, Column, String, Float, Integer, Boolean, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker  # declarative_base moved to sqlalchemy.orm in 1.4+

Base = declarative_base()

class GoFoodRestaurant(Base):
    __tablename__ = "gofood_restaurants"

    id = Column(String, primary_key=True)
    name = Column(String)
    address = Column(String)
    latitude = Column(Float)
    longitude = Column(Float)
    cuisine = Column(String)
    rating = Column(Float)
    review_count = Column(Integer)
    delivery_fee = Column(Integer)
    min_order = Column(Integer)
    city = Column(String)
    scraped_at = Column(DateTime)

class GoFoodMenuItem(Base):
    __tablename__ = "gofood_menu_items"

    id = Column(Integer, primary_key=True, autoincrement=True)
    restaurant_id = Column(String)
    category = Column(String)
    item_name = Column(String)
    price_idr = Column(Integer)
    original_price_idr = Column(Integer)
    is_discounted = Column(Boolean)
    is_available = Column(Boolean)
    scraped_at = Column(DateTime)
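Wiring the models to a database and inserting scraped rows looks like this. SQLite in memory keeps the example self-contained; swap the URL for PostgreSQL in production. A trimmed-down restaurant model keeps the sketch short:

```python
from datetime import datetime
from sqlalchemy import create_engine, Column, String, Float, DateTime
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Restaurant(Base):
    __tablename__ = "gofood_restaurants"
    id = Column(String, primary_key=True)
    name = Column(String)
    city = Column(String)
    rating = Column(Float)
    scraped_at = Column(DateTime)

engine = create_engine("sqlite:///:memory:")  # use a PostgreSQL URL in production
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    session.add(Restaurant(id="r1", name="Warung Sederhana",
                           city="Jakarta", rating=4.6,
                           scraped_at=datetime.utcnow()))
    session.commit()
    count = session.query(Restaurant).count()

print(count)  # 1
```

Because restaurant IDs are the primary key, re-scrapes should use merge() or an upsert rather than add() to avoid duplicate-key errors.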

Price Analysis Queries

Common analytical queries on GoFood data:

from sqlalchemy import text

def analyze_city_pricing(db_session, city):
    """Analyze pricing patterns for a city."""
    # SQLAlchemy 1.4+ requires raw SQL to be wrapped in text()
    results = db_session.execute(text("""
        SELECT
            r.cuisine,
            COUNT(DISTINCT r.id) AS restaurant_count,
            AVG(m.price_idr) AS avg_item_price,
            MIN(m.price_idr) AS min_price,
            MAX(m.price_idr) AS max_price,
            AVG(r.delivery_fee) AS avg_delivery_fee,
            AVG(r.rating) AS avg_rating
        FROM gofood_restaurants r
        JOIN gofood_menu_items m ON r.id = m.restaurant_id
        WHERE r.city = :city AND m.is_available = true
        GROUP BY r.cuisine
        ORDER BY restaurant_count DESC
    """), {"city": city})
    return results.fetchall()

Indonesian Market-Specific Considerations

Currency and Pricing

Indonesian Rupiah amounts are large: menu items typically range from IDR 10,000 to IDR 200,000, and the currency's subunit (sen) is not used in practice. Store prices as integers (whole Rupiah) to avoid floating-point issues.
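For display and reporting, Indonesian convention uses a dot as the thousands separator ("Rp 25.000"). A small helper keeps that formatting out of the stored integers:

```python
def format_idr(amount: int) -> str:
    """Format whole-Rupiah integers in Indonesian style, e.g. 'Rp 25.000'."""
    return "Rp " + f"{amount:,}".replace(",", ".")

print(format_idr(25_000))     # Rp 25.000
print(format_idr(1_250_000))  # Rp 1.250.000
```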

Language Handling

GoFood content is primarily in Bahasa Indonesia. Ensure your scraper handles Indonesian characters correctly:

# Ensure proper encoding
response.encoding = "utf-8"
data = response.json()

# Indonesian menu items may contain special characters
item_name = data.get("name", "").strip()

Peak Hours in Indonesia

Schedule heavy scraping during off-peak hours to minimize impact and reduce detection risk:

  • Off-peak: 1:00 AM – 6:00 AM WIB (UTC+7)
  • Moderate: 9:00 AM – 11:00 AM WIB
  • Peak (avoid): 11:30 AM – 1:30 PM and 6:00 PM – 8:30 PM WIB
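A scheduler can enforce these windows with a small guard. WIB is UTC+7 year-round (Indonesia does not observe daylight saving), so a fixed offset is safe:

```python
from datetime import datetime, time, timedelta, timezone

WIB = timezone(timedelta(hours=7))  # Western Indonesia Time, no DST

PEAK_WINDOWS = [
    (time(11, 30), time(13, 30)),  # lunch rush
    (time(18, 0), time(20, 30)),   # dinner rush
]

def is_peak_hour(now=None):
    """Return True if the given (or current) WIB time falls in a peak window."""
    now = now or datetime.now(WIB)
    t = now.astimezone(WIB).time()
    return any(start <= t <= end for start, end in PEAK_WINDOWS)

print(is_peak_hour(datetime(2024, 5, 1, 12, 0, tzinfo=WIB)))  # True (lunch)
print(is_peak_hour(datetime(2024, 5, 1, 3, 0, tzinfo=WIB)))   # False (off-peak)
```

Call is_peak_hour() before each batch and sleep or reschedule when it returns True.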

Regional Cuisine Categories

GoFood has cuisine categories specific to Indonesia:

  • Nasi (Rice dishes)
  • Mie & Bakso (Noodles and Meatballs)
  • Ayam & Bebek (Chicken and Duck)
  • Sate (Satay)
  • Martabak
  • Kopi (Coffee)
  • Minuman (Beverages)
  • Jajanan (Snacks)

Understanding these categories helps you structure your data collection and analysis.
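A simple normalisation map helps when joining GoFood categories with data from other platforms; the English groupings below are our own labels, not GoFood's:

```python
# Our own English groupings for GoFood's Indonesian cuisine categories.
CUISINE_LABELS = {
    "Nasi": "Rice dishes",
    "Mie & Bakso": "Noodles & meatballs",
    "Ayam & Bebek": "Chicken & duck",
    "Sate": "Satay",
    "Martabak": "Martabak",
    "Kopi": "Coffee",
    "Minuman": "Beverages",
    "Jajanan": "Snacks",
}

def normalize_cuisine(raw: str) -> str:
    """Map a raw GoFood category string to a normalized English label."""
    return CUISINE_LABELS.get(raw.strip(), "Other")

print(normalize_cuisine("Kopi"))     # Coffee
print(normalize_cuisine("Seafood"))  # Other (unmapped categories fall through)
```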

Handling Common Challenges

Dynamic API Endpoints

GoFood occasionally updates its API structure. Build your scraper to handle endpoint changes gracefully:

def get_with_fallback(self, endpoint_versions, params):
    """Try multiple API versions until one works."""
    for endpoint in endpoint_versions:
        try:
            response = self.session.get(
                f"{self.base_api}{endpoint}",
                params=params,
                timeout=30
            )
            if response.status_code == 200:
                return response.json()
        except Exception:
            continue
    return None

# Usage (rid is a restaurant ID obtained from the listings scrape)
menu = self.get_with_fallback(
    [
        f"/gofood/v3/restaurants/{rid}/menu",
        f"/gofood/v2/restaurants/{rid}/menu",
        f"/gofood/v1/restaurants/{rid}/catalog"
    ],
    params={}
)

Session Expiration

GoFood sessions expire periodically. Implement automatic session refresh:

def ensure_valid_session(self):
    """Check and refresh session if needed."""
    test_response = self.session.get(
        f"{self.base_api}/gofood/v1/health",
        timeout=10
    )
    if test_response.status_code == 401:
        self._initialize_session()
        return True
    return False

Conclusion

Scraping GoFood at scale requires understanding Indonesia’s mobile-first digital landscape and using proxy infrastructure that matches the platform’s expectations. DataResearchTools mobile proxies on Indonesian carriers provide the authentic network identity needed to collect GoFood data reliably.

Start with a single city like Jakarta, build out your data pipeline, validate the quality of collected data, and then expand to additional cities. The combination of proper mobile proxy infrastructure, respectful scraping practices, and structured data storage will give you a comprehensive view of Indonesia’s food delivery market.

