Scraping Zomato and Yelp for Restaurant Market Research
While food delivery platforms like GrabFood and Foodpanda dominate transactional food data in Southeast Asia, review and discovery platforms like Zomato and Yelp offer a different and complementary perspective. These platforms specialize in restaurant discovery, detailed reviews, dining-in experiences, and comprehensive restaurant information that food delivery apps often lack.
This guide covers how to scrape Zomato and Yelp for restaurant market research, including practical techniques for extracting listings, reviews, menu data, and competitive intelligence.
Why Zomato and Yelp Data Matters
Complementary Data Sources
Food delivery platforms and review platforms capture different aspects of the dining market:
| Data Type | Food Delivery Apps | Zomato/Yelp |
|---|---|---|
| Menu pricing | Delivery prices (often inflated) | Dine-in prices |
| Reviews | Delivery experience focused | Full dining experience |
| Restaurant coverage | Delivery partners only | All restaurants |
| Photos | Food items only | Ambiance, interiors, food |
| Operating info | Delivery hours | Full hours, reservations |
| Price level | Exact prices | Price range categories |
Research Applications
- Market sizing: Count total restaurants by area, cuisine, and price level
- Trend identification: Track new restaurant openings and closures
- Consumer preferences: Analyze review content for dining trends
- Location intelligence: Map restaurant density and gaps
- Investment research: Evaluate F&B market opportunities
Zomato in Southeast Asia
Zomato’s SEA Presence
Zomato operates in several Southeast Asian markets, with significant presence in the Philippines and Indonesia. The platform offers:
- Restaurant discovery and reviews
- Table reservations
- Menu information with photos
- Curated restaurant collections
- User-generated ratings and reviews
Zomato’s Technical Architecture
Zomato provides data through both its website and mobile app:
- Web: Server-rendered pages with some dynamic content loading
- API: Public API with rate limits (deprecated for most use cases but some endpoints remain)
- Mobile app: Full-featured API for app users
Scraping Zomato Listings
import requests
from bs4 import BeautifulSoup
import time
import random
import json
class ZomatoScraper:
def __init__(self, proxy_user, proxy_pass, country="PH"):
self.session = requests.Session()
proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
self.session.proxies = {
"http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080",
"https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080"
}
self.session.headers.update({
"User-Agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/121.0.0.0 Mobile Safari/537.36",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9"
})
def search_restaurants(self, city, cuisine=None, page=1):
"""Search for restaurants in a city."""
params = {
"page": page,
"sort": "rating",
"order": "desc"
}
if cuisine:
params["cuisines"] = cuisine
url = f"https://www.zomato.com/{city}/restaurants"
response = self.session.get(url, params=params)
if response.status_code != 200:
return []
soup = BeautifulSoup(response.text, 'html.parser')
restaurants = []
# Parse restaurant cards from search results
for card in soup.select('[data-type="restaurant"]'):
restaurant = {
"name": self._safe_text(card.select_one('.result-title')),
"url": card.select_one('a')['href'] if card.select_one('a') else None,
"cuisine": self._safe_text(card.select_one('.search-page-text')),
"rating": self._safe_text(card.select_one('.rating-large')),
"votes": self._safe_text(card.select_one('.rating-votes')),
"price_for_two": self._safe_text(card.select_one('.res-cost')),
"locality": self._safe_text(card.select_one('.search_result_subzone'))
}
restaurants.append(restaurant)
return restaurants
def _safe_text(self, element):
return element.get_text(strip=True) if element else ""
Extracting Detailed Restaurant Info
def get_restaurant_detail(self, restaurant_url):
"""Scrape detailed information from a restaurant page."""
response = self.session.get(restaurant_url)
if response.status_code != 200:
return None
soup = BeautifulSoup(response.text, 'html.parser')
# Try to extract structured data from JSON-LD
script_tags = soup.find_all('script', type='application/ld+json')
structured_data = {}
for script in script_tags:
try:
data = json.loads(script.string)
if data.get('@type') == 'Restaurant':
structured_data = data
break
except (json.JSONDecodeError, TypeError):
continue
detail = {
"name": structured_data.get("name", self._safe_text(soup.select_one('h1'))),
"address": structured_data.get("address", {}).get("streetAddress", ""),
"cuisine_types": [],
"rating": structured_data.get("aggregateRating", {}).get("ratingValue"),
"review_count": structured_data.get("aggregateRating", {}).get("reviewCount"),
"price_range": structured_data.get("priceRange", ""),
"phone": structured_data.get("telephone", ""),
"hours": self._extract_hours(soup),
"features": self._extract_features(soup),
"latitude": structured_data.get("geo", {}).get("latitude"),
"longitude": structured_data.get("geo", {}).get("longitude"),
"photos_count": self._count_photos(soup)
}
return detail
def _extract_hours(self, soup):
"""Extract operating hours."""
hours = {}
hours_section = soup.select_one('.res-timing')
if hours_section:
hours["display"] = hours_section.get_text(strip=True)
return hours
def _extract_features(self, soup):
"""Extract restaurant features and amenities."""
features = []
for feature in soup.select('.res-info-feature'):
features.append(feature.get_text(strip=True))
return features
Scraping Yelp for SEA Data
Yelp’s SEA Coverage
Yelp's coverage in Southeast Asia is more limited than in Western markets, but it still provides valuable data, particularly in:
- Singapore (moderate coverage)
- Manila (growing presence)
- Bangkok (tourist-focused listings)
Yelp’s Anti-Bot Defenses
Yelp has some of the most aggressive anti-scraping measures of any review platform:
- Advanced bot detection using behavioral analysis
- Aggressive rate limiting
- Content obfuscation in HTML
- Dynamic content loading
- Legal enforcement against scraping
Mobile proxies significantly improve success rates because Yelp’s detection systems have higher trust thresholds for mobile carrier IPs.
Yelp Scraping Implementation
class YelpScraper:
def __init__(self, proxy_user, proxy_pass, country="SG"):
self.session = requests.Session()
proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
self.session.proxies = {
"http": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080",
"https": f"http://{proxy_user}:{proxy_pass}@{proxy_host}:8080"
}
self.session.headers.update({
"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
"Mobile/15E148 Safari/604.1",
"Accept": "text/html,application/xhtml+xml",
"Accept-Language": "en-US,en;q=0.9"
})
def search_restaurants(self, location, term="restaurants", start=0):
"""Search Yelp for restaurants in a location."""
params = {
"find_desc": term,
"find_loc": location,
"start": start
}
response = self.session.get(
"https://www.yelp.com/search",
params=params
)
if response.status_code != 200:
return []
soup = BeautifulSoup(response.text, 'html.parser')
results = []
# Extract search results from JSON embedded in page
for script in soup.find_all('script', type='application/json'):
try:
data = json.loads(script.string)
businesses = self._extract_businesses(data)
if businesses:
results.extend(businesses)
except (json.JSONDecodeError, TypeError):
continue
return results
def _extract_businesses(self, data):
"""Recursively search JSON for business data."""
businesses = []
if isinstance(data, dict):
if "bizId" in data or "businessId" in data:
businesses.append({
"id": data.get("bizId") or data.get("businessId"),
"name": data.get("name") or data.get("businessName"),
"rating": data.get("rating"),
"review_count": data.get("reviewCount"),
"price": data.get("priceRange"),
"categories": data.get("categories", []),
"neighborhood": data.get("neighborhoods", []),
"address": data.get("formattedAddress")
})
for value in data.values():
businesses.extend(self._extract_businesses(value))
elif isinstance(data, list):
for item in data:
businesses.extend(self._extract_businesses(item))
return businesses
def get_restaurant_reviews(self, business_id, start=0):
"""Fetch reviews for a specific business."""
response = self.session.get(
f"https://www.yelp.com/biz/{business_id}",
params={"start": start, "sort_by": "date_desc"}
)
if response.status_code != 200:
return []
soup = BeautifulSoup(response.text, 'html.parser')
reviews = []
for review_div in soup.select('[data-review-id]'):
review = {
"review_id": review_div.get('data-review-id'),
"rating": self._extract_rating(review_div),
"text": self._safe_text(review_div.select_one('.comment')),
"date": self._safe_text(review_div.select_one('.rating-qualifier')),
"user": self._safe_text(review_div.select_one('.user-display-name')),
"photos": len(review_div.select('.photo-box-img'))
}
reviews.append(review)
time.sleep(random.uniform(3, 7))  # polite delay before requesting the next page
return reviews
def _extract_rating(self, element):
"""Extract star rating from review element."""
rating_element = element.select_one('[aria-label*="star"]')
if rating_element:
label = rating_element.get('aria-label', '')
try:
return float(label.split()[0])
except (ValueError, IndexError):
pass
return None
def _safe_text(self, element):
return element.get_text(strip=True) if element else ""
Combining Data Sources for Market Research
Unified Restaurant Database
Merge data from Zomato, Yelp, and food delivery platforms into a unified view:
def build_unified_restaurant_profile(zomato_data, yelp_data, delivery_data):
"""Merge restaurant data from multiple sources."""
profile = {
"name": zomato_data.get("name") or yelp_data.get("name") or delivery_data.get("name"),
"address": zomato_data.get("address") or yelp_data.get("address"),
"coordinates": {
"lat": zomato_data.get("latitude") or delivery_data.get("latitude"),
"lng": zomato_data.get("longitude") or delivery_data.get("longitude")
},
"ratings": {
"zomato": zomato_data.get("rating"),
"yelp": yelp_data.get("rating"),
"grabfood": delivery_data.get("grabfood_rating"),
"foodpanda": delivery_data.get("foodpanda_rating"),
"average": None
},
"pricing": {
"dinein_price_level": zomato_data.get("price_range") or yelp_data.get("price"),
"delivery_avg_price": delivery_data.get("avg_item_price"),
"price_for_two": zomato_data.get("price_for_two")
},
"review_count": {
"zomato": zomato_data.get("review_count", 0),
"yelp": yelp_data.get("review_count", 0),
"delivery": delivery_data.get("total_reviews", 0),
"total": 0
},
"cuisine": list(set(
(zomato_data.get("cuisine_types") or []) +
(yelp_data.get("categories") or []) +
(delivery_data.get("cuisines") or [])
)),
"data_sources": []
}
# Calculate average rating
ratings = [v for v in profile["ratings"].values() if v and isinstance(v, (int, float))]
if ratings:
profile["ratings"]["average"] = round(sum(ratings) / len(ratings), 2)
# Total reviews
profile["review_count"]["total"] = sum(
v for v in profile["review_count"].values()
if isinstance(v, int)
)
# Track data sources
if zomato_data:
profile["data_sources"].append("zomato")
if yelp_data:
profile["data_sources"].append("yelp")
if delivery_data:
profile["data_sources"].append("delivery_platforms")
return profile
Market Analysis Queries
def analyze_restaurant_market(unified_restaurants, area_name):
"""Generate market analysis from unified restaurant data."""
total = len(unified_restaurants)
# Cuisine distribution
cuisine_counts = {}
for r in unified_restaurants:
for cuisine in r.get("cuisine", []):
cuisine_counts[cuisine] = cuisine_counts.get(cuisine, 0) + 1
# Price level distribution
price_distribution = {"$": 0, "$$": 0, "$$$": 0, "$$$$": 0}
for r in unified_restaurants:
level = r.get("pricing", {}).get("dinein_price_level", "")
if level in price_distribution:
price_distribution[level] += 1
# Rating distribution
rating_buckets = {"4.5+": 0, "4.0-4.4": 0, "3.5-3.9": 0, "3.0-3.4": 0, "<3.0": 0}
for r in unified_restaurants:
avg_rating = r.get("ratings", {}).get("average") or 0
if avg_rating >= 4.5:
rating_buckets["4.5+"] += 1
elif avg_rating >= 4.0:
rating_buckets["4.0-4.4"] += 1
elif avg_rating >= 3.5:
rating_buckets["3.5-3.9"] += 1
elif avg_rating >= 3.0:
rating_buckets["3.0-3.4"] += 1
else:
rating_buckets["<3.0"] += 1
rated = [r for r in unified_restaurants if r["ratings"]["average"]]
return {
"area": area_name,
"total_restaurants": total,
"cuisine_distribution": dict(sorted(
cuisine_counts.items(), key=lambda x: x[1], reverse=True
)),
"price_distribution": price_distribution,
"rating_distribution": rating_buckets,
"avg_rating": round(
sum(r["ratings"]["average"] for r in rated) / len(rated), 2
) if rated else None,
"delivery_presence": f"{len([r for r in unified_restaurants if 'delivery_platforms' in r['data_sources']]) / total:.1%}" if total else "0.0%"
}
Practical Market Research Scenarios
Scenario 1: New Restaurant Concept Validation
Before launching a new restaurant in Bangkok:
- Scrape Zomato for existing restaurants in target area
- Analyze cuisine gaps and price level distribution
- Review competitor ratings and customer feedback themes
- Assess delivery platform presence for the cuisine type
- Determine optimal price positioning
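Steps 2 and 5 above can be sketched against the output of the `analyze_restaurant_market` function shown earlier. The reference cuisine list and the 5% share threshold below are illustrative assumptions, not fixed benchmarks:

```python
def find_market_gaps(market_report, reference_cuisines, min_share=0.05):
    """Flag cuisines that are under-represented in an area relative to
    the total restaurant count (a simple proxy for a market gap)."""
    total = market_report["total_restaurants"]
    counts = market_report["cuisine_distribution"]
    gaps = []
    for cuisine in reference_cuisines:
        share = counts.get(cuisine, 0) / total if total else 0
        if share < min_share:
            gaps.append({"cuisine": cuisine, "share": round(share, 3)})
    return gaps
```

A cuisine missing entirely from the report simply shows up with a share of zero, so underserved and absent categories surface in the same pass.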
Scenario 2: Franchise Expansion Research
When evaluating a franchise opportunity in Manila:
- Map all restaurants of the same cuisine in target districts
- Compare pricing between franchise and independent operators
- Analyze review sentiment for the franchise brand across existing locations
- Assess market saturation by price level
- Identify underserved neighborhoods
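The pricing comparison in step 2 can be sketched as follows, assuming each record carries a numeric `price_for_two` value already parsed from the listing text (the scraper above returns it as display text, so a parsing step is implied):

```python
def compare_franchise_pricing(restaurants, brand_name):
    """Compare average price-for-two between a franchise brand and
    independent operators, matching the brand by name substring."""
    needle = brand_name.lower()
    brand = [r["price_for_two"] for r in restaurants if needle in r["name"].lower()]
    others = [r["price_for_two"] for r in restaurants if needle not in r["name"].lower()]

    def avg(values):
        return round(sum(values) / len(values), 2) if values else None

    return {
        "brand_avg": avg(brand),
        "independent_avg": avg(others),
        "brand_locations": len(brand),
    }
```

Substring matching on the name is a simplification; chains with localized branch names usually need a curated alias list.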
Scenario 3: Investor Due Diligence
For F&B investment decisions:
- Track restaurant opening and closing rates over time
- Analyze category-level rating trends
- Compare review volume as a proxy for customer engagement
- Map competitive intensity across neighborhoods
- Identify emerging cuisine trends from new restaurant openings
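Tracking openings and closures (step 1) reduces to diffing periodic scrape snapshots. The sketch below keys records on name plus address, which is a simplifying assumption; production matching typically needs fuzzier logic for renamed or relisted venues:

```python
def diff_snapshots(earlier, later):
    """Compare two scrape snapshots (lists of restaurant dicts) and
    report apparent openings and closings between them."""
    def key(r):
        # Normalize name + address into a crude identity key
        return (r.get("name", "").strip().lower(),
                r.get("address", "").strip().lower())

    earlier_keys = {key(r) for r in earlier}
    later_keys = {key(r) for r in later}
    return {
        "opened": [r for r in later if key(r) not in earlier_keys],
        "closed": [r for r in earlier if key(r) not in later_keys],
    }
```

Run over monthly snapshots, the ratio of `opened` to `closed` gives the churn signal investors look for.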
Proxy Best Practices for Review Platform Scraping
Review platforms like Zomato and Yelp employ sophisticated bot detection. Key practices:
- Use mobile proxies: DataResearchTools mobile proxies provide the high-trust IPs needed for platforms with aggressive bot detection
- Rotate sessions: Create new sessions every 50-100 requests
- Vary request patterns: Randomize delays, page orders, and browsing patterns
- Respect rate limits: Keep requests to 10-15 per minute per IP
- Handle blocks gracefully: Implement exponential backoff on 403 responses
- Maintain geographic consistency: Use proxies from the same country as the target data
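The backoff and rotation guidance above can be sketched as a small request wrapper. Here `make_session` stands in for whatever factory builds your proxied `requests.Session` (such as the scraper constructors shown earlier), and the retry counts and delays are illustrative defaults:

```python
import random
import time


def fetch_with_backoff(session, url, max_retries=4, base_delay=5):
    """GET a URL, backing off exponentially when the platform
    responds with 403 (blocked) or 429 (rate limited)."""
    for attempt in range(max_retries):
        response = session.get(url, timeout=30)
        if response.status_code not in (403, 429):
            return response
        # Exponential backoff with jitter: ~5s, 10s, 20s, 40s
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 2))
    return None  # give up after max_retries blocked attempts


def rotating_session_fetch(urls, make_session, rotate_every=75):
    """Fetch a list of URLs, replacing the session (and with it the
    proxy identity) every ~75 requests, within the 50-100 guideline."""
    session = make_session()
    results = []
    for i, url in enumerate(urls):
        if i > 0 and i % rotate_every == 0:
            session.close()
            session = make_session()  # fresh session -> fresh identity
        results.append(fetch_with_backoff(session, url))
        time.sleep(random.uniform(4, 6))  # ~10-15 requests per minute
    return results
```

Keeping the backoff and rotation logic in one wrapper means both scraper classes inherit the same block-handling behavior without duplicating it.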
Conclusion
Zomato and Yelp provide critical restaurant market data that complements food delivery platform information. By combining data from all these sources using mobile proxies from DataResearchTools, researchers and F&B businesses can build comprehensive market intelligence covering the full spectrum of dining and delivery in Southeast Asia.
The key is treating each platform as a unique data source with its own technical challenges and defensive measures, while building a unified analytical framework that brings all the data together for actionable insights.
Related Reading
- Best Proxies for Food Delivery Platform Scraping
- How Cloud Kitchens Use Proxies for Competitive Menu Analysis
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)