How Food Aggregators Use Proxies to Verify Restaurant Listings
Food aggregator platforms, data companies, and business intelligence firms depend on accurate restaurant data. But food delivery platforms are dynamic environments where listings change constantly: restaurants open and close, menus update, prices shift, operating hours change, and promotional offers come and go. Verifying that restaurant data is current and accurate requires systematic, automated checking across multiple platforms and locations.
This guide explains how food aggregators use proxy-powered verification to maintain data quality across Southeast Asia’s food delivery landscape.
The Data Quality Challenge
Why Restaurant Data Goes Stale
Restaurant listings on food delivery platforms become inaccurate for many reasons:
- Temporary closures: Restaurants close for renovations, holidays, or unforeseen events
- Permanent closures: Businesses shut down without removing platform listings
- Menu changes: Items added, removed, or repriced without consistent updates
- Operating hour changes: Seasonal or event-driven schedule modifications
- Location changes: Restaurants relocate but keep old addresses
- Ownership changes: New owners may change concepts while keeping the listing
- Platform inconsistencies: Same restaurant with different data on different platforms
Cost of Bad Data
Inaccurate restaurant data has real business consequences:
| Stakeholder | Impact of Bad Data |
|---|---|
| Aggregator platforms | User frustration, reduced trust |
| Data analytics firms | Flawed market analysis, bad recommendations |
| Investors | Misinformed investment decisions |
| Franchise brands | Inability to track market presence |
| Mapping services | Incorrect business listings |
| Marketing agencies | Wasted ad spend on closed locations |
Building a Verification System
Verification Architecture
[Restaurant Database] --> [Verification Scheduler]
|
[Platform Scrapers with Mobile Proxies]
|
[Cross-Reference Engine]
|
[Data Quality Scoring]
|
[Alerts & Updates] --> [Clean Database]

Core Implementation
import requests
import time
import random
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, Optional, Dict
from enum import Enum
class ListingStatus(Enum):
ACTIVE = "active"
TEMPORARILY_CLOSED = "temporarily_closed"
PERMANENTLY_CLOSED = "permanently_closed"
NOT_FOUND = "not_found"
DATA_MISMATCH = "data_mismatch"
UNVERIFIED = "unverified"
@dataclass
class VerificationResult:
restaurant_id: str
platform: str
country: str
status: ListingStatus
verified_at: datetime
name_match: bool = True
address_match: bool = True
menu_available: bool = True
is_accepting_orders: bool = True
price_changes_detected: bool = False
operating_hours_match: bool = True
confidence_score: float = 0.0
issues: List[str] = field(default_factory=list)
class RestaurantVerifier:
def __init__(self, proxy_user, proxy_pass):
self.proxy_user = proxy_user
self.proxy_pass = proxy_pass
def _get_session(self, country):
session = requests.Session()
proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
session.proxies = {
"http": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080",
"https": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080"
}
session.headers.update({
"User-Agent": "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36",
"Accept": "application/json"
})
    return session

Single-Platform Verification
def verify_on_platform(self, restaurant_record, platform, country):
"""Verify a restaurant's listing on a specific platform."""
session = self._get_session(country)
platform_id = restaurant_record.get(f"{platform}_id")
if not platform_id:
return VerificationResult(
restaurant_id=restaurant_record["id"],
platform=platform,
country=country,
status=ListingStatus.NOT_FOUND,
verified_at=datetime.utcnow(),
confidence_score=0,
issues=["No platform ID in database"]
)
# Fetch current platform data
platform_data = self._fetch_platform_listing(session, platform, platform_id, country)
if not platform_data:
return VerificationResult(
restaurant_id=restaurant_record["id"],
platform=platform,
country=country,
status=ListingStatus.NOT_FOUND,
verified_at=datetime.utcnow(),
confidence_score=0,
issues=["Listing not found on platform"]
)
# Run verification checks
issues = []
checks = {}
# Check 1: Name match
checks["name_match"] = self._verify_name(
restaurant_record.get("name", ""),
platform_data.get("name", "")
)
if not checks["name_match"]:
issues.append(f"Name mismatch: DB='{restaurant_record['name']}', "
f"Platform='{platform_data['name']}'")
# Check 2: Address match
checks["address_match"] = self._verify_address(
restaurant_record.get("address", ""),
platform_data.get("address", ""),
restaurant_record.get("latitude"),
restaurant_record.get("longitude"),
platform_data.get("latitude"),
platform_data.get("longitude")
)
if not checks["address_match"]:
issues.append("Address or coordinates mismatch")
# Check 3: Is accepting orders
checks["is_accepting_orders"] = platform_data.get("is_active", False)
if not checks["is_accepting_orders"]:
issues.append("Restaurant not currently accepting orders")
# Check 4: Menu available
menu = self._fetch_menu(session, platform, platform_id, country)
checks["menu_available"] = menu is not None and len(menu) > 0
if not checks["menu_available"]:
issues.append("No menu items available")
# Check 5: Price consistency
if menu and restaurant_record.get("expected_prices"):
checks["price_changes_detected"] = self._check_prices(
menu, restaurant_record["expected_prices"]
)
if checks["price_changes_detected"]:
issues.append("Significant price changes detected")
# Determine status
if not checks["is_accepting_orders"] and not checks["menu_available"]:
status = ListingStatus.TEMPORARILY_CLOSED
elif not checks["name_match"]:
status = ListingStatus.DATA_MISMATCH
elif issues:
status = ListingStatus.DATA_MISMATCH
else:
status = ListingStatus.ACTIVE
# Calculate confidence score
confidence = self._calculate_confidence(checks)
return VerificationResult(
restaurant_id=restaurant_record["id"],
platform=platform,
country=country,
status=status,
verified_at=datetime.utcnow(),
name_match=checks["name_match"],
address_match=checks["address_match"],
menu_available=checks["menu_available"],
is_accepting_orders=checks["is_accepting_orders"],
price_changes_detected=checks.get("price_changes_detected", False),
confidence_score=confidence,
issues=issues
    )

Name and Address Verification
def _verify_name(self, expected_name, actual_name):
"""Verify restaurant name with fuzzy matching."""
from difflib import SequenceMatcher
# Normalize names
expected = self._normalize_name(expected_name)
actual = self._normalize_name(actual_name)
# Exact match after normalization
if expected == actual:
return True
# Fuzzy match
similarity = SequenceMatcher(None, expected, actual).ratio()
return similarity >= 0.8
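The normalization step matters more than it might look: a platform-added suffix alone can drag an otherwise identical name below the 0.8 threshold, which is exactly what `_normalize_name` addresses. A standalone illustration using only `difflib` (the sample names are invented):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio of matching characters between two lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Raw platform names often fail the 0.8 bar purely because of a suffix:
raw = similarity("Nasi Lemak Corner", "Nasi Lemak Corner - Food Delivery")

# After stripping the suffix, the comparison is exact:
clean = similarity("Nasi Lemak Corner", "Nasi Lemak Corner")
```

Here `raw` comes out well under 0.8 while `clean` is 1.0, so fuzzy matching without normalization would wrongly flag the listing as a mismatch.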
def _normalize_name(self, name):
"""Normalize a restaurant name for comparison."""
import re
name = name.lower().strip()
# Remove common suffixes added by platforms
suffixes = [" - food delivery", " (halal)", " (non-halal)",
" - order online", " delivery", " (new)"]
for suffix in suffixes:
name = name.replace(suffix, "")
name = re.sub(r'[^a-z0-9\s]', '', name)
return ' '.join(name.split())
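The `_verify_address` check below calls a `_haversine` helper that these snippets don't show. A standard great-circle implementation returning kilometers might look like this (written as a plain function for brevity; as a method it would take `self`):

```python
import math

def haversine(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two lat/lng points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Two points roughly 200 m apart in central Singapore:
d = haversine(1.3000, 103.8500, 1.3018, 103.8500)
```

With the 0.2 km threshold used in `_verify_address`, these two points sit right at the boundary of what counts as the same location.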
def _verify_address(self, expected_addr, actual_addr,
expected_lat=None, expected_lng=None,
actual_lat=None, actual_lng=None):
"""Verify address using text similarity and coordinates."""
    # Prefer a distance check when coordinates are present. Note the
    # explicit None test: all([...]) would wrongly reject a legitimate
    # 0.0 latitude or longitude, which matters near the equator.
    if None not in (expected_lat, expected_lng, actual_lat, actual_lng):
distance = self._haversine(
expected_lat, expected_lng,
actual_lat, actual_lng
)
return distance < 0.2 # Within 200 meters
# Fall back to text comparison
from difflib import SequenceMatcher
similarity = SequenceMatcher(
None,
expected_addr.lower(),
actual_addr.lower()
).ratio()
    return similarity >= 0.7

Cross-Platform Verification
def cross_platform_verify(self, restaurant_record, country):
"""Verify a restaurant across all available platforms."""
platforms = ["grabfood", "foodpanda", "shopeefood"]
results = {}
for platform in platforms:
if restaurant_record.get(f"{platform}_id"):
result = self.verify_on_platform(restaurant_record, platform, country)
results[platform] = result
time.sleep(random.uniform(2, 4))
# Cross-reference results
cross_ref = {
"restaurant_id": restaurant_record["id"],
"platforms_checked": len(results),
"platforms_active": len([r for r in results.values()
if r.status == ListingStatus.ACTIVE]),
"platforms_closed": len([r for r in results.values()
if r.status in [ListingStatus.TEMPORARILY_CLOSED,
ListingStatus.PERMANENTLY_CLOSED]]),
"data_consistent": self._check_cross_platform_consistency(results),
"overall_confidence": self._aggregate_confidence(results),
"platform_results": {
platform: {
"status": result.status.value,
"confidence": result.confidence_score,
"issues": result.issues
}
for platform, result in results.items()
}
}
# Determine overall status
if all(r.status == ListingStatus.ACTIVE for r in results.values()):
cross_ref["overall_status"] = "verified_active"
elif all(r.status in [ListingStatus.TEMPORARILY_CLOSED, ListingStatus.NOT_FOUND]
for r in results.values()):
cross_ref["overall_status"] = "likely_closed"
elif any(r.status == ListingStatus.DATA_MISMATCH for r in results.values()):
cross_ref["overall_status"] = "data_review_needed"
else:
cross_ref["overall_status"] = "partially_active"
return cross_ref
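The overall-status cascade above can be exercised in isolation. Here is a condensed standalone version of the same decision rules, with statuses as plain strings for brevity:

```python
def overall_status(statuses):
    """Condensed version of the decision cascade in cross_platform_verify."""
    if all(s == "active" for s in statuses):
        return "verified_active"
    if all(s in ("temporarily_closed", "not_found") for s in statuses):
        return "likely_closed"
    if any(s == "data_mismatch" for s in statuses):
        return "data_review_needed"
    return "partially_active"

# One active listing plus one closed listing falls through to the default:
print(overall_status(["active", "temporarily_closed"]))  # partially_active
```

Note the ordering: the two `all(...)` rules are checked before the `any(...)` mismatch rule, so a restaurant that is fully closed everywhere is reported as closed even if one platform also shows stale data.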
def _check_cross_platform_consistency(self, results):
    """Check if data is consistent across platforms."""
    active_results = [r for r in results.values()
                      if r.status == ListingStatus.ACTIVE]
    if len(active_results) < 2:
        return True  # Nothing to compare with fewer than two listings
    # Placeholder: a full implementation would compare the names and
    # addresses fetched from each platform, which would require
    # VerificationResult to carry those fields. Assume consistent for now.
    return True

Batch Verification
Running Large-Scale Verification
def batch_verify(self, restaurants, country, batch_size=50):
"""Verify a large batch of restaurants."""
all_results = []
total = len(restaurants)
for i in range(0, total, batch_size):
batch = restaurants[i:i + batch_size]
print(f"Verifying batch {i//batch_size + 1} "
f"({i+1}-{min(i+batch_size, total)} of {total})")
for restaurant in batch:
result = self.cross_platform_verify(restaurant, country)
all_results.append(result)
time.sleep(random.uniform(1, 3))
# Longer pause between batches
time.sleep(random.uniform(5, 10))
return all_results
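`batch_verify` above assumes every request eventually succeeds. In practice, transient failures are common at this scale, so it is worth layering a small retry wrapper with exponential backoff and jitter underneath the per-restaurant calls. This helper is illustrative and not part of the class above:

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=2.0):
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # base_delay, 2x, 4x... plus jitter so concurrent workers
            # don't retry in lockstep against the same platform
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
```

Inside the batch loop this would wrap the verification call, e.g. `with_retries(lambda: self.cross_platform_verify(restaurant, country))`.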
def generate_verification_report(results):
"""Generate a summary report from batch verification results."""
total = len(results)
status_counts = {}
for r in results:
status = r["overall_status"]
status_counts[status] = status_counts.get(status, 0) + 1
needs_review = [
r for r in results
if r["overall_status"] in ["likely_closed", "data_review_needed"]
]
report = {
"total_verified": total,
"verification_date": datetime.utcnow().isoformat(),
"status_breakdown": status_counts,
        "data_quality_score": round(
            status_counts.get("verified_active", 0) / total * 100, 1
        ) if total else 0.0,
"needs_review_count": len(needs_review),
"needs_review": [
{
"id": r["restaurant_id"],
"status": r["overall_status"],
"confidence": r["overall_confidence"],
"issues": [
issue
for pr in r["platform_results"].values()
for issue in pr["issues"]
]
}
for r in needs_review
]
}
    return report

Data Quality Scoring
def _calculate_confidence(self, checks):
"""Calculate a confidence score for a verification result."""
weights = {
"name_match": 25,
"address_match": 20,
"is_accepting_orders": 25,
"menu_available": 20,
"price_changes_detected": 10
}
score = 0
for check, weight in weights.items():
if check == "price_changes_detected":
# No price changes = good
if not checks.get(check, False):
score += weight
else:
if checks.get(check, False):
score += weight
return score
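As a concrete example of the weighting: a listing that passes every check except menu availability, with no price changes, scores 25 + 20 + 25 + 0 + 10 = 80. A standalone version of the same arithmetic:

```python
WEIGHTS = {
    "name_match": 25,
    "address_match": 20,
    "is_accepting_orders": 25,
    "menu_available": 20,
    "price_changes_detected": 10,  # inverted: NO changes earns the weight
}

def confidence(checks: dict) -> int:
    """Standalone version of _calculate_confidence's weighting."""
    score = 0
    for check, weight in WEIGHTS.items():
        if check == "price_changes_detected":
            passed = not checks.get(check, False)
        else:
            passed = checks.get(check, False)
        if passed:
            score += weight
    return score

checks = {"name_match": True, "address_match": True,
          "is_accepting_orders": True, "menu_available": False}
print(confidence(checks))  # 80
```

Because `price_changes_detected` defaults to False when the price check was skipped, a listing still earns those 10 points, which slightly inflates scores for restaurants without expected-price data.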
def _aggregate_confidence(self, results):
"""Aggregate confidence across platforms."""
scores = [r.confidence_score for r in results.values()]
if not scores:
return 0
    return round(sum(scores) / len(scores), 1)

Scheduling Verification
Verification Priority System
Not all restaurants need the same verification frequency:
VERIFICATION_SCHEDULE = {
"high_priority": {
"criteria": "Chain restaurants, high-traffic locations, recently updated",
"frequency_days": 7,
"platforms": ["grabfood", "foodpanda", "shopeefood"]
},
"medium_priority": {
"criteria": "Independent restaurants, moderate traffic",
"frequency_days": 14,
"platforms": ["grabfood", "foodpanda"]
},
"low_priority": {
"criteria": "Low-traffic areas, stable listings",
"frequency_days": 30,
"platforms": ["grabfood"]
},
"flagged": {
"criteria": "Previously had data issues",
"frequency_days": 3,
"platforms": ["grabfood", "foodpanda", "shopeefood"]
}
}

Why Mobile Proxies Are Critical for Verification
Restaurant verification requires:
- Reliable access: Every failed request is an unverified listing
- Location accuracy: Restaurant availability depends on query location
- Platform trust: Verification requires access to the same data customers see
- Scale: Verifying thousands of listings requires sustained access without blocks
DataResearchTools mobile proxies provide the foundation for reliable, at-scale restaurant verification. Their Southeast Asian mobile carrier IPs ensure that food delivery platforms return accurate, customer-facing data, which is essential for meaningful verification results.
Conclusion
Restaurant listing verification is a critical but often overlooked component of food data quality. By building systematic verification pipelines with DataResearchTools mobile proxies, food aggregators and data companies can maintain accurate, trustworthy restaurant databases across Southeast Asia’s major food delivery platforms.
The key is treating verification as an ongoing process rather than a one-time exercise. Regular verification cycles, priority-based scheduling, and automated alerting ensure that data quality stays high as the dynamic food delivery landscape continues to evolve.
Related Reading
- Best Proxies for Food Delivery Platform Scraping
- How Cloud Kitchens Use Proxies for Competitive Menu Analysis
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)