Scraping Zomato and Yelp for Restaurant Market Research

While food delivery platforms like GrabFood and Foodpanda dominate transactional food data in Southeast Asia, review and discovery platforms like Zomato and Yelp offer a different, complementary perspective. These platforms specialize in restaurant discovery, detailed reviews, dine-in experiences, and comprehensive restaurant information that food delivery apps often lack.

This guide covers how to scrape Zomato and Yelp for restaurant market research, including practical techniques for extracting listings, reviews, menu data, and competitive intelligence.

Why Zomato and Yelp Data Matters

Complementary Data Sources

Food delivery platforms and review platforms capture different aspects of the dining market:

| Data Type           | Food Delivery Apps               | Zomato/Yelp                |
|---------------------|----------------------------------|----------------------------|
| Menu pricing        | Delivery prices (often inflated) | Dine-in prices             |
| Reviews             | Delivery experience focused      | Full dining experience     |
| Restaurant coverage | Delivery partners only           | All restaurants            |
| Photos              | Food items only                  | Ambiance, interiors, food  |
| Operating info      | Delivery hours                   | Full hours, reservations   |
| Price level         | Exact prices                     | Price range categories     |

Research Applications

  • Market sizing: Count total restaurants by area, cuisine, and price level
  • Trend identification: Track new restaurant openings and closures
  • Consumer preferences: Analyze review content for dining trends
  • Location intelligence: Map restaurant density and gaps
  • Investment research: Evaluate F&B market opportunities

Zomato in Southeast Asia

Zomato’s SEA Presence

Zomato operates in several Southeast Asian markets, with significant presence in the Philippines and Indonesia. The platform offers:

  • Restaurant discovery and reviews
  • Table reservations
  • Menu information with photos
  • Curated restaurant collections
  • User-generated ratings and reviews

Zomato’s Technical Architecture

Zomato provides data through both its website and mobile app:

  • Web: Server-rendered pages with some dynamic content loading
  • API: Public developer API (officially deprecated; a few endpoints still respond, subject to rate limits)
  • Mobile app: Full-featured API for app users

Scraping Zomato Listings

import requests
from bs4 import BeautifulSoup
import time
import random
import json

class ZomatoScraper:
    def __init__(self, proxy_user, proxy_pass, country="PH"):
        self.session = requests.Session()

        proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
        self.session.proxies = {
            "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080",
            "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080"
        }

        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
                         "(KHTML, like Gecko) Chrome/121.0.0.0 Mobile Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9"
        })

    def search_restaurants(self, city, cuisine=None, page=1):
        """Search for restaurants in a city."""
        params = {
            "page": page,
            "sort": "rating",
            "order": "desc"
        }
        if cuisine:
            params["cuisines"] = cuisine

        url = f"https://www.zomato.com/{city}/restaurants"
        response = self.session.get(url, params=params)

        if response.status_code != 200:
            return []

        soup = BeautifulSoup(response.text, 'html.parser')
        restaurants = []

        # Parse restaurant cards from search results
        for card in soup.select('[data-type="restaurant"]'):
            restaurant = {
                "name": self._safe_text(card.select_one('.result-title')),
                "url": card.select_one('a')['href'] if card.select_one('a') else None,
                "cuisine": self._safe_text(card.select_one('.search-page-text')),
                "rating": self._safe_text(card.select_one('.rating-large')),
                "votes": self._safe_text(card.select_one('.rating-votes')),
                "price_for_two": self._safe_text(card.select_one('.res-cost')),
                "locality": self._safe_text(card.select_one('.search_result_subzone'))
            }
            restaurants.append(restaurant)

        return restaurants

    def _safe_text(self, element):
        return element.get_text(strip=True) if element else ""
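The price_for_two field comes back as a display string rather than a number. A small normalizer (a hypothetical helper; the currency formats shown are assumptions, not confirmed Zomato output) makes the values usable for aggregation:

```python
import re

def parse_price_for_two(text):
    """Pull the numeric amount out of a display string such as
    '₱1,500 for two'. Returns None when no digits are found.
    The input formats are assumptions -- adjust to the live markup."""
    if not text:
        return None
    match = re.search(r'\d[\d,]*', text)
    return int(match.group().replace(',', '')) if match else None
```

Run this over the price_for_two values collected by search_restaurants before computing averages or distributions.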

Extracting Detailed Restaurant Info

# The following methods extend the ZomatoScraper class defined above.
def get_restaurant_detail(self, restaurant_url):
    """Scrape detailed information from a restaurant page."""
    response = self.session.get(restaurant_url)

    if response.status_code != 200:
        return None

    soup = BeautifulSoup(response.text, 'html.parser')

    # Try to extract structured data from JSON-LD
    script_tags = soup.find_all('script', type='application/ld+json')
    structured_data = {}
    for script in script_tags:
        try:
            data = json.loads(script.string)
            if data.get('@type') == 'Restaurant':
                structured_data = data
                break
        except (json.JSONDecodeError, TypeError):
            continue

    detail = {
        "name": structured_data.get("name", self._safe_text(soup.select_one('h1'))),
        "address": structured_data.get("address", {}).get("streetAddress", ""),
        "cuisine_types": [],
        "rating": structured_data.get("aggregateRating", {}).get("ratingValue"),
        "review_count": structured_data.get("aggregateRating", {}).get("reviewCount"),
        "price_range": structured_data.get("priceRange", ""),
        "phone": structured_data.get("telephone", ""),
        "hours": self._extract_hours(soup),
        "features": self._extract_features(soup),
        "latitude": structured_data.get("geo", {}).get("latitude"),
        "longitude": structured_data.get("geo", {}).get("longitude"),
        "photos_count": self._count_photos(soup)
    }

    return detail

def _extract_hours(self, soup):
    """Extract operating hours."""
    hours = {}
    hours_section = soup.select_one('.res-timing')
    if hours_section:
        hours["display"] = hours_section.get_text(strip=True)
    return hours

def _extract_features(self, soup):
    """Extract restaurant features and amenities."""
    features = []
    for feature in soup.select('.res-info-feature'):
        features.append(feature.get_text(strip=True))
    return features

def _count_photos(self, soup):
    """Count photo thumbnails on the page (called by get_restaurant_detail).
    The selector is a placeholder -- adjust it to the live markup."""
    return len(soup.select('.res-photo-thumbnail'))
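The JSON-LD filtering loop in get_restaurant_detail is easy to exercise offline. A minimal sketch with made-up script contents (illustrative data, not real Zomato markup):

```python
import json

def pick_restaurant_ld(script_contents):
    """Return the first JSON-LD object whose @type is Restaurant,
    mirroring the filtering loop in get_restaurant_detail but taking
    raw strings so it can run without fetching a page."""
    for raw in script_contents:
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue
        if isinstance(data, dict) and data.get('@type') == 'Restaurant':
            return data
    return {}

# Sample script contents: a breadcrumb blob to skip, then a restaurant
samples = [
    '{"@type": "BreadcrumbList"}',
    '{"@type": "Restaurant", "name": "Sample Bistro",'
    ' "aggregateRating": {"ratingValue": 4.3, "reviewCount": 212}}',
]
```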

Scraping Yelp for SEA Data

Yelp’s SEA Coverage

Yelp's coverage in Southeast Asia is thinner than in Western markets, but it still provides valuable data, particularly in:

  • Singapore (moderate coverage)
  • Manila (growing presence)
  • Bangkok (tourist-focused listings)

Yelp’s Anti-Bot Defenses

Yelp has some of the most aggressive anti-scraping measures of any review platform:

  • Advanced bot detection using behavioral analysis
  • Aggressive rate limiting
  • Content obfuscation in HTML
  • Dynamic content loading
  • Legal enforcement against scraping

Mobile proxies significantly improve success rates because Yelp’s detection systems have higher trust thresholds for mobile carrier IPs.

Yelp Scraping Implementation

class YelpScraper:
    def __init__(self, proxy_user, proxy_pass, country="SG"):
        self.session = requests.Session()

        proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
        self.session.proxies = {
            "http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080",
            "https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080"
        }

        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
                         "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
                         "Mobile/15E148 Safari/604.1",
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.9"
        })

    def search_restaurants(self, location, term="restaurants", start=0):
        """Search Yelp for restaurants in a location."""
        params = {
            "find_desc": term,
            "find_loc": location,
            "start": start
        }

        response = self.session.get(
            "https://www.yelp.com/search",
            params=params
        )

        if response.status_code != 200:
            return []

        soup = BeautifulSoup(response.text, 'html.parser')
        results = []

        # Extract search results from JSON embedded in page
        for script in soup.find_all('script', type='application/json'):
            try:
                data = json.loads(script.string)
                businesses = self._extract_businesses(data)
                if businesses:
                    results.extend(businesses)
            except (json.JSONDecodeError, TypeError):
                continue

        return results

    def _extract_businesses(self, data):
        """Recursively search JSON for business data."""
        businesses = []
        if isinstance(data, dict):
            if "bizId" in data or "businessId" in data:
                businesses.append({
                    "id": data.get("bizId") or data.get("businessId"),
                    "name": data.get("name") or data.get("businessName"),
                    "rating": data.get("rating"),
                    "review_count": data.get("reviewCount"),
                    "price": data.get("priceRange"),
                    "categories": data.get("categories", []),
                    "neighborhood": data.get("neighborhoods", []),
                    "address": data.get("formattedAddress")
                })
            for value in data.values():
                businesses.extend(self._extract_businesses(value))
        elif isinstance(data, list):
            for item in data:
                businesses.extend(self._extract_businesses(item))
        return businesses

    def get_restaurant_reviews(self, business_id, start=0):
        """Fetch reviews for a specific business."""
        response = self.session.get(
            f"https://www.yelp.com/biz/{business_id}",
            params={"start": start, "sort_by": "date_desc"}
        )

        if response.status_code != 200:
            return []

        soup = BeautifulSoup(response.text, 'html.parser')
        reviews = []

        for review_div in soup.select('[data-review-id]'):
            review = {
                "review_id": review_div.get('data-review-id'),
                "rating": self._extract_rating(review_div),
                "text": self._safe_text(review_div.select_one('.comment')),
                "date": self._safe_text(review_div.select_one('.rating-qualifier')),
                "user": self._safe_text(review_div.select_one('.user-display-name')),
                "photos": len(review_div.select('.photo-box-img'))
            }
            reviews.append(review)

        time.sleep(random.uniform(3, 7))
        return reviews

    def _extract_rating(self, element):
        """Extract star rating from review element."""
        rating_element = element.select_one('[aria-label*="star"]')
        if rating_element:
            label = rating_element.get('aria-label', '')
            try:
                return float(label.split()[0])
            except (ValueError, IndexError):
                pass
        return None

    def _safe_text(self, element):
        return element.get_text(strip=True) if element else ""
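Because _extract_businesses is a plain recursive walk with no Yelp-specific dependencies, the same logic can be sanity-checked as a standalone function against a hand-built payload (the structure below is illustrative, not Yelp's actual embedded JSON):

```python
def extract_businesses(data):
    """Standalone version of YelpScraper._extract_businesses: recursively
    walks nested dicts/lists and collects anything carrying a bizId."""
    businesses = []
    if isinstance(data, dict):
        if "bizId" in data or "businessId" in data:
            businesses.append({
                "id": data.get("bizId") or data.get("businessId"),
                "name": data.get("name") or data.get("businessName"),
                "rating": data.get("rating"),
                "review_count": data.get("reviewCount"),
            })
        for value in data.values():
            businesses.extend(extract_businesses(value))
    elif isinstance(data, list):
        for item in data:
            businesses.extend(extract_businesses(item))
    return businesses

# Hand-built payload shaped loosely like embedded search JSON
payload = {
    "legacyProps": {
        "searchResults": [
            {"bizId": "abc123", "name": "Kopi Corner",
             "rating": 4.5, "reviewCount": 87},
            {"decoration": {"ad": True}},  # non-business node to skip
        ]
    }
}
```

The recursive approach is deliberately schema-agnostic: when Yelp reshuffles its embedded JSON, the walk still finds business nodes as long as the bizId key survives.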

Combining Data Sources for Market Research

Unified Restaurant Database

Merge data from Zomato, Yelp, and food delivery platforms into a unified view:

def build_unified_restaurant_profile(zomato_data, yelp_data, delivery_data):
    """Merge restaurant data from multiple sources."""
    profile = {
        "name": zomato_data.get("name") or yelp_data.get("name") or delivery_data.get("name"),
        "address": zomato_data.get("address") or yelp_data.get("address"),
        "coordinates": {
            "lat": zomato_data.get("latitude") or delivery_data.get("latitude"),
            "lng": zomato_data.get("longitude") or delivery_data.get("longitude")
        },
        "ratings": {
            "zomato": zomato_data.get("rating"),
            "yelp": yelp_data.get("rating"),
            "grabfood": delivery_data.get("grabfood_rating"),
            "foodpanda": delivery_data.get("foodpanda_rating"),
            "average": None
        },
        "pricing": {
            "dinein_price_level": zomato_data.get("price_range") or yelp_data.get("price"),
            "delivery_avg_price": delivery_data.get("avg_item_price"),
            "price_for_two": zomato_data.get("price_for_two")
        },
        "review_count": {
            "zomato": zomato_data.get("review_count", 0),
            "yelp": yelp_data.get("review_count", 0),
            "delivery": delivery_data.get("total_reviews", 0),
            "total": 0
        },
        "cuisine": list(set(
            (zomato_data.get("cuisine_types") or []) +
            (yelp_data.get("categories") or []) +
            (delivery_data.get("cuisines") or [])
        )),
        "data_sources": []
    }

    # Calculate average rating
    ratings = [v for v in profile["ratings"].values() if v and isinstance(v, (int, float))]
    if ratings:
        profile["ratings"]["average"] = round(sum(ratings) / len(ratings), 2)

    # Total reviews
    profile["review_count"]["total"] = sum(
        v for v in profile["review_count"].values()
        if isinstance(v, int)
    )

    # Track data sources
    if zomato_data:
        profile["data_sources"].append("zomato")
    if yelp_data:
        profile["data_sources"].append("yelp")
    if delivery_data:
        profile["data_sources"].append("delivery_platforms")

    return profile
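build_unified_restaurant_profile assumes you already know which records from each platform describe the same restaurant. Matching usually starts with normalized names; a minimal sketch using the standard library's difflib (the 0.85 threshold is an illustrative guess to tune against labeled pairs, not a recommendation):

```python
import difflib
import re

def normalize_name(name):
    """Lowercase, strip punctuation, and collapse whitespace."""
    cleaned = re.sub(r'[^a-z0-9 ]', '', name.lower())
    return re.sub(r'\s+', ' ', cleaned).strip()

def same_restaurant(name_a, name_b, threshold=0.85):
    """Heuristic match on normalized names using sequence similarity."""
    ratio = difflib.SequenceMatcher(
        None, normalize_name(name_a), normalize_name(name_b)
    ).ratio()
    return ratio >= threshold
```

In practice you would confirm candidate matches with coordinates or addresses before merging, since chains reuse names across branches.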

Market Analysis Queries

def analyze_restaurant_market(unified_restaurants, area_name):
    """Generate market analysis from unified restaurant data."""
    total = len(unified_restaurants)

    # Cuisine distribution
    cuisine_counts = {}
    for r in unified_restaurants:
        for cuisine in r.get("cuisine", []):
            cuisine_counts[cuisine] = cuisine_counts.get(cuisine, 0) + 1

    # Price level distribution
    price_distribution = {"$": 0, "$$": 0, "$$$": 0, "$$$$": 0}
    for r in unified_restaurants:
        level = r.get("pricing", {}).get("dinein_price_level", "")
        if level in price_distribution:
            price_distribution[level] += 1

    # Rating distribution
    rating_buckets = {"4.5+": 0, "4.0-4.4": 0, "3.5-3.9": 0, "3.0-3.4": 0, "<3.0": 0}
    for r in unified_restaurants:
        # "average" may be stored as None, so coerce to 0 before comparing
        avg_rating = r.get("ratings", {}).get("average") or 0
        if avg_rating >= 4.5:
            rating_buckets["4.5+"] += 1
        elif avg_rating >= 4.0:
            rating_buckets["4.0-4.4"] += 1
        elif avg_rating >= 3.5:
            rating_buckets["3.5-3.9"] += 1
        elif avg_rating >= 3.0:
            rating_buckets["3.0-3.4"] += 1
        else:
            rating_buckets["<3.0"] += 1

    # Guard against empty inputs to avoid division by zero
    rated = [r["ratings"]["average"] for r in unified_restaurants
             if r.get("ratings", {}).get("average")]
    with_delivery = [r for r in unified_restaurants
                     if "delivery_platforms" in r.get("data_sources", [])]

    return {
        "area": area_name,
        "total_restaurants": total,
        "cuisine_distribution": dict(sorted(
            cuisine_counts.items(), key=lambda x: x[1], reverse=True
        )),
        "price_distribution": price_distribution,
        "rating_distribution": rating_buckets,
        "avg_rating": round(sum(rated) / len(rated), 2) if rated else None,
        "delivery_presence": f"{len(with_delivery) / total:.1%}" if total else "0.0%"
    }

Practical Market Research Scenarios

Scenario 1: New Restaurant Concept Validation

Before launching a new restaurant in Bangkok:

  1. Scrape Zomato for existing restaurants in target area
  2. Analyze cuisine gaps and price level distribution
  3. Review competitor ratings and customer feedback themes
  4. Assess delivery platform presence for the cuisine type
  5. Determine optimal price positioning
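Step 2 above can be sketched as a comparison between the target area's cuisine mix and a citywide baseline. All figures and the 0.5 cutoff below are illustrative:

```python
def find_cuisine_gaps(area_counts, city_share, area_total, ratio=0.5):
    """Flag cuisines whose share in the target area falls well below
    their citywide share. A cuisine is a candidate gap when its local
    share is under `ratio` times its citywide share."""
    gaps = []
    for cuisine, city_frac in city_share.items():
        local_frac = area_counts.get(cuisine, 0) / area_total if area_total else 0
        if local_frac < city_frac * ratio:
            gaps.append((cuisine, round(local_frac, 3), city_frac))
    return gaps
```

A flagged cuisine is only a starting point: low presence can reflect weak demand rather than an opportunity, so pair the gap list with review volume and rating data.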

Scenario 2: Franchise Expansion Research

When evaluating a franchise opportunity in Manila:

  1. Map all restaurants of the same cuisine in target districts
  2. Compare pricing between franchise and independent operators
  3. Analyze review sentiment for the franchise brand across existing locations
  4. Assess market saturation by price level
  5. Identify underserved neighborhoods

Scenario 3: Investor Due Diligence

For F&B investment decisions:

  1. Track restaurant opening and closing rates over time
  2. Analyze category-level rating trends
  3. Compare review volume as a proxy for customer engagement
  4. Map competitive intensity across neighborhoods
  5. Identify emerging cuisine trends from new restaurant openings
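Step 1 falls out of diffing listing snapshots scraped at two dates. A minimal sketch over sets of restaurant IDs (disappearance from listings only approximates closure, since restaurants also drop off platforms for other reasons):

```python
def listing_churn(snapshot_old, snapshot_new):
    """Compare two collections of restaurant IDs scraped at different
    dates and report openings, closures, and the closure rate."""
    old, new = set(snapshot_old), set(snapshot_new)
    opened, closed = new - old, old - new
    return {
        "opened": len(opened),
        "closed": len(closed),
        "churn_rate": round(len(closed) / len(old), 3) if old else 0.0,
    }
```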

Proxy Best Practices for Review Platform Scraping

Review platforms like Zomato and Yelp employ sophisticated bot detection. Key practices:

  1. Use mobile proxies: DataResearchTools mobile proxies provide the high-trust IPs needed for platforms with aggressive bot detection
  2. Rotate sessions: Create new sessions every 50-100 requests
  3. Vary request patterns: Randomize delays, page orders, and browsing patterns
  4. Respect rate limits: Keep requests to 10-15 per minute per IP
  5. Handle blocks gracefully: Implement exponential backoff on 403 responses
  6. Maintain geographic consistency: Use proxies from the same country as the target data
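Item 5 can be sketched as an exponential-backoff schedule with full jitter; the base and cap values are illustrative defaults, not tuned recommendations:

```python
import random

def backoff_delay(attempt, base=5.0, cap=300.0):
    """Delay in seconds before retry `attempt` (0-based): exponential
    growth capped at `cap`, with full jitter so parallel workers do
    not retry in lockstep."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

After a 403, sleep for backoff_delay(attempt) and increment attempt; reset the counter once a request succeeds.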

Conclusion

Zomato and Yelp provide critical restaurant market data that complements food delivery platform information. By combining data from all these sources using mobile proxies from DataResearchTools, researchers and F&B businesses can build comprehensive market intelligence covering the full spectrum of dining and delivery in Southeast Asia.

The key is treating each platform as a unique data source with its own technical challenges and defensive measures, while building a unified analytical framework that brings all the data together for actionable insights.

