How Food Aggregators Use Proxies to Verify Restaurant Listings

Food aggregator platforms, data companies, and business intelligence firms depend on accurate restaurant data. But food delivery platforms are dynamic environments where listings change constantly: restaurants open and close, menus update, prices shift, operating hours change, and promotional offers come and go. Verifying that restaurant data is current and accurate requires systematic, automated checking across multiple platforms and locations.

This guide explains how food aggregators use proxy-powered verification to maintain data quality across Southeast Asia’s food delivery landscape.

The Data Quality Challenge

Why Restaurant Data Goes Stale

Restaurant listings on food delivery platforms become inaccurate for many reasons:

  • Temporary closures: Restaurants close for renovations, holidays, or unforeseen events
  • Permanent closures: Businesses shut down without removing platform listings
  • Menu changes: Items added, removed, or repriced without consistent updates
  • Operating hour changes: Seasonal or event-driven schedule modifications
  • Location changes: Restaurants relocate but their listings keep the old address
  • Ownership changes: New owners may change concepts while keeping the listing
  • Platform inconsistencies: Same restaurant with different data on different platforms

Cost of Bad Data

Inaccurate restaurant data has real business consequences:

Stakeholder | Impact of Bad Data
--- | ---
Aggregator platforms | User frustration, reduced trust
Data analytics firms | Flawed market analysis, bad recommendations
Investors | Misinformed investment decisions
Franchise brands | Inability to track market presence
Mapping services | Incorrect business listings
Marketing agencies | Wasted ad spend on closed locations

Building a Verification System

Verification Architecture

[Restaurant Database] --> [Verification Scheduler]
                              |
                    [Platform Scrapers with Mobile Proxies]
                              |
                    [Cross-Reference Engine]
                              |
                    [Data Quality Scoring]
                              |
                    [Alerts & Updates] --> [Clean Database]
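
As a rough sketch of how these stages connect, the loop below wires them together as injected callables. The function name `run_verification_cycle`, the `verify`/`score`/`alert` parameters, and the 60-point alert threshold are illustrative assumptions, not part of any library; in a real system `verify` might wrap the `cross_platform_verify` method shown later.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_verification_cycle(
    due_records: Iterable[Record],
    verify: Callable[[Record], Record],
    score: Callable[[Record], float],
    alert: Callable[[Record], None],
    alert_below: float = 60.0,  # hypothetical threshold for raising an alert
) -> List[Record]:
    """Run one pass of the pipeline and return the rows to write back."""
    updates: List[Record] = []
    for record in due_records:                   # [Verification Scheduler] output
        result = verify(record)                  # [Platform Scrapers] + [Cross-Reference Engine]
        result["quality_score"] = score(result)  # [Data Quality Scoring]
        if result["quality_score"] < alert_below:
            alert(result)                        # [Alerts & Updates]
        updates.append(result)                   # rows destined for [Clean Database]
    return updates
```

Injecting each stage keeps the loop independent of any particular platform or proxy setup, so stages can be swapped or tested in isolation.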

Core Implementation

import requests
import time
import random
from datetime import datetime
from dataclasses import dataclass, field
from typing import List, Optional, Dict
from enum import Enum

class ListingStatus(Enum):
    ACTIVE = "active"
    TEMPORARILY_CLOSED = "temporarily_closed"
    PERMANENTLY_CLOSED = "permanently_closed"
    NOT_FOUND = "not_found"
    DATA_MISMATCH = "data_mismatch"
    UNVERIFIED = "unverified"

@dataclass
class VerificationResult:
    restaurant_id: str
    platform: str
    country: str
    status: ListingStatus
    verified_at: datetime
    name_match: bool = True
    address_match: bool = True
    menu_available: bool = True
    is_accepting_orders: bool = True
    price_changes_detected: bool = False
    operating_hours_match: bool = True
    confidence_score: float = 0.0
    issues: List[str] = field(default_factory=list)

class RestaurantVerifier:
    def __init__(self, proxy_user, proxy_pass):
        self.proxy_user = proxy_user
        self.proxy_pass = proxy_pass

    def _get_session(self, country):
        session = requests.Session()
        proxy_host = f"{country.lower()}-mobile.dataresearchtools.com"
        session.proxies = {
            "http": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080",
            "https": f"http://{self.proxy_user}:{self.proxy_pass}@{proxy_host}:8080"
        }
        session.headers.update({
            "User-Agent": "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36",
            "Accept": "application/json"
        })
        return session

Single-Platform Verification

def verify_on_platform(self, restaurant_record, platform, country):
    """Verify a restaurant's listing on a specific platform."""
    session = self._get_session(country)
    platform_id = restaurant_record.get(f"{platform}_id")

    if not platform_id:
        return VerificationResult(
            restaurant_id=restaurant_record["id"],
            platform=platform,
            country=country,
            status=ListingStatus.NOT_FOUND,
            verified_at=datetime.utcnow(),
            confidence_score=0,
            issues=["No platform ID in database"]
        )

    # Fetch current platform data
    platform_data = self._fetch_platform_listing(session, platform, platform_id, country)

    if not platform_data:
        return VerificationResult(
            restaurant_id=restaurant_record["id"],
            platform=platform,
            country=country,
            status=ListingStatus.NOT_FOUND,
            verified_at=datetime.utcnow(),
            confidence_score=0,
            issues=["Listing not found on platform"]
        )

    # Run verification checks
    issues = []
    checks = {}

    # Check 1: Name match
    checks["name_match"] = self._verify_name(
        restaurant_record.get("name", ""),
        platform_data.get("name", "")
    )
    if not checks["name_match"]:
        issues.append(f"Name mismatch: DB='{restaurant_record.get('name', '')}', "
                      f"Platform='{platform_data.get('name', '')}'")

    # Check 2: Address match
    checks["address_match"] = self._verify_address(
        restaurant_record.get("address", ""),
        platform_data.get("address", ""),
        restaurant_record.get("latitude"),
        restaurant_record.get("longitude"),
        platform_data.get("latitude"),
        platform_data.get("longitude")
    )
    if not checks["address_match"]:
        issues.append("Address or coordinates mismatch")

    # Check 3: Is accepting orders
    checks["is_accepting_orders"] = platform_data.get("is_active", False)
    if not checks["is_accepting_orders"]:
        issues.append("Restaurant not currently accepting orders")

    # Check 4: Menu available
    menu = self._fetch_menu(session, platform, platform_id, country)
    checks["menu_available"] = menu is not None and len(menu) > 0
    if not checks["menu_available"]:
        issues.append("No menu items available")

    # Check 5: Price consistency
    if menu and restaurant_record.get("expected_prices"):
        checks["price_changes_detected"] = self._check_prices(
            menu, restaurant_record["expected_prices"]
        )
        if checks["price_changes_detected"]:
            issues.append("Significant price changes detected")

    # Determine status: closed if neither orders nor menu are available,
    # a data mismatch if any other check failed, otherwise active
    if not checks["is_accepting_orders"] and not checks["menu_available"]:
        status = ListingStatus.TEMPORARILY_CLOSED
    elif issues:
        status = ListingStatus.DATA_MISMATCH
    else:
        status = ListingStatus.ACTIVE

    # Calculate confidence score
    confidence = self._calculate_confidence(checks)

    return VerificationResult(
        restaurant_id=restaurant_record["id"],
        platform=platform,
        country=country,
        status=status,
        verified_at=datetime.utcnow(),
        name_match=checks["name_match"],
        address_match=checks["address_match"],
        menu_available=checks["menu_available"],
        is_accepting_orders=checks["is_accepting_orders"],
        price_changes_detected=checks.get("price_changes_detected", False),
        confidence_score=confidence,
        issues=issues
    )
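
`verify_on_platform` calls a `_check_prices` helper that this guide does not define. A minimal sketch of one possible implementation follows, written as a standalone function. It assumes each menu item is a dict with `name` and `price` keys, that `expected_prices` maps item names to last known prices, and that a change above 10% counts as significant; all three are assumptions about the data, not documented platform formats.

```python
from typing import Dict, List, Mapping


def check_prices(menu: List[Dict], expected_prices: Mapping[str, float],
                 tolerance: float = 0.10) -> bool:
    """Return True if any tracked item's price moved by more than `tolerance`.

    Items missing from the current menu are ignored here, since menu
    availability is checked separately.
    """
    # Index current prices by a lowercased item name (hypothetical menu shape)
    current = {
        item["name"].strip().lower(): float(item["price"])
        for item in menu
        if item.get("name") and item.get("price") is not None
    }
    for name, old_price in expected_prices.items():
        new_price = current.get(name.strip().lower())
        if new_price is None or old_price <= 0:
            continue  # cannot compute a relative change
        if abs(new_price - old_price) / old_price > tolerance:
            return True
    return False
```

Attached to the class, `_check_prices` could simply return `check_prices(menu, expected_prices)`.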

Name and Address Verification

def _verify_name(self, expected_name, actual_name):
    """Verify restaurant name with fuzzy matching."""
    from difflib import SequenceMatcher

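    # Normalize names; an empty result cannot be verified, and two empty
    # strings would otherwise count as an exact match
    expected = self._normalize_name(expected_name)
    actual = self._normalize_name(actual_name)
    if not expected or not actual:
        return False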

    # Exact match after normalization
    if expected == actual:
        return True

    # Fuzzy match
    similarity = SequenceMatcher(None, expected, actual).ratio()
    return similarity >= 0.8

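def _normalize_name(self, name):
    """Normalize a restaurant name for comparison."""
    import unicodedata
    name = (name or "").casefold().strip()
    # Strip decorations that platforms append to names, repeating until
    # none remain (e.g. "Ali's (Halal) - Order Online")
    suffixes = (" - food delivery", " (halal)", " (non-halal)",
                " - order online", " delivery", " (new)")
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            if name.endswith(suffix):
                name = name[:-len(suffix)].rstrip()
                stripped = True
    # Drop punctuation and symbols but keep letters, marks and digits from
    # any script, so Thai or Vietnamese names are not erased entirely
    name = "".join(ch for ch in name
                   if not unicodedata.category(ch).startswith(("P", "S")))
    return ' '.join(name.split())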
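
`SequenceMatcher` compares characters in order, so two listings that use the same words in a different order can fall below the 0.8 threshold. One common remedy, sketched here as an assumption rather than part of the original design, is a token-sort comparison: sort the words of each name before computing the ratio.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Similarity of two names after lowercasing and sorting their words."""
    def sort_tokens(s: str) -> str:
        # Word order no longer matters once tokens are sorted
        return " ".join(sorted(s.lower().split()))

    return SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio()
```

For example, "Nasi Lemak Ali" and "Ali Nasi Lemak" score about 0.71 with a plain `SequenceMatcher` but 1.0 here. It would be applied to the output of `_normalize_name`, possibly alongside the plain ratio rather than replacing it.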

def _verify_address(self, expected_addr, actual_addr,
                     expected_lat=None, expected_lng=None,
                     actual_lat=None, actual_lng=None):
    """Verify address using text similarity and coordinates."""
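    # Use a distance check when all four coordinates are present
    # (compare with None so a legitimate 0.0 coordinate is not skipped)
    coords = (expected_lat, expected_lng, actual_lat, actual_lng)
    if all(c is not None for c in coords):
        distance = self._haversine(
            expected_lat, expected_lng,
            actual_lat, actual_lng
        )
        return distance < 0.2  # Within 200 meters, assuming _haversine returns km

    # Fall back to text comparison; a missing address cannot be verified
    if not expected_addr or not actual_addr:
        return False
    from difflib import SequenceMatcher
    similarity = SequenceMatcher(
        None,
        expected_addr.lower(),
        actual_addr.lower()
    ).ratio()
    return similarity >= 0.7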
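
The coordinate check relies on a `_haversine` method that is not shown. The standard haversine formula below is one way to fill it in. The only assumption carried over from `_verify_address` is that the result is in kilometres, since the code compares it against `0.2` for 200 metres; a mean Earth radius of 6,371 km keeps the spherical approximation accurate enough at this scale.

```python
import math


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

As a method, `_haversine` could return `haversine_km(lat1, lng1, lat2, lng2)`. As a sanity check, 0.001 degrees of latitude comes out at roughly 111 metres.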

Cross-Platform Verification

def cross_platform_verify(self, restaurant_record, country):
    """Verify a restaurant across all available platforms."""
    platforms = ["grabfood", "foodpanda", "shopeefood"]
    results = {}

    for platform in platforms:
        if restaurant_record.get(f"{platform}_id"):
            result = self.verify_on_platform(restaurant_record, platform, country)
            results[platform] = result
            time.sleep(random.uniform(2, 4))

    # Cross-reference results
    cross_ref = {
        "restaurant_id": restaurant_record["id"],
        "platforms_checked": len(results),
        "platforms_active": len([r for r in results.values()
                                 if r.status == ListingStatus.ACTIVE]),
        "platforms_closed": len([r for r in results.values()
                                  if r.status in [ListingStatus.TEMPORARILY_CLOSED,
                                                   ListingStatus.PERMANENTLY_CLOSED]]),
        "data_consistent": self._check_cross_platform_consistency(results),
        "overall_confidence": self._aggregate_confidence(results),
        "platform_results": {
            platform: {
                "status": result.status.value,
                "confidence": result.confidence_score,
                "issues": result.issues
            }
            for platform, result in results.items()
        }
    }

    # Determine overall status. all() is True for an empty collection, so a
    # record with no platform IDs must be handled first.
    if not results:
        cross_ref["overall_status"] = "unverified"
    elif all(r.status == ListingStatus.ACTIVE for r in results.values()):
        cross_ref["overall_status"] = "verified_active"
    elif all(r.status in [ListingStatus.TEMPORARILY_CLOSED, ListingStatus.NOT_FOUND]
             for r in results.values()):
        cross_ref["overall_status"] = "likely_closed"
    elif any(r.status == ListingStatus.DATA_MISMATCH for r in results.values()):
        cross_ref["overall_status"] = "data_review_needed"
    else:
        cross_ref["overall_status"] = "partially_active"

    return cross_ref

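def _check_cross_platform_consistency(self, results):
    """Check whether the platforms that returned a listing agree with each other.

    VerificationResult keeps per-check flags rather than raw platform values,
    so this compares those flags: if one platform shows the restaurant
    accepting orders and another does not, the data is inconsistent.
    """
    found = [r for r in results.values()
             if r.status != ListingStatus.NOT_FOUND]

    if len(found) < 2:
        return True  # Nothing to compare with fewer than two listings

    signatures = {(r.name_match, r.address_match, r.is_accepting_orders)
                  for r in found}
    return len(signatures) == 1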

Batch Verification

Running Large-Scale Verification

def batch_verify(self, restaurants, country, batch_size=50):
    """Verify a large batch of restaurants."""
    all_results = []
    total = len(restaurants)

    for i in range(0, total, batch_size):
        batch = restaurants[i:i + batch_size]
        print(f"Verifying batch {i//batch_size + 1} "
              f"({i+1}-{min(i+batch_size, total)} of {total})")

        for restaurant in batch:
            result = self.cross_platform_verify(restaurant, country)
            all_results.append(result)
            time.sleep(random.uniform(1, 3))

        # Longer pause between batches (skipped after the final batch)
        if i + batch_size < total:
            time.sleep(random.uniform(5, 10))

    return all_results
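
With thousands of listings and multi-second pauses, a batch run can take hours, and a crash midway would otherwise discard all progress. One way to make runs resumable, sketched below under the assumption that results are stored as JSON Lines in a local file (the file layout and function names are hypothetical), is to append each cross-reference result as it completes and skip already-recorded IDs on restart.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Set


def load_completed_ids(path: Path) -> Set[str]:
    """Read restaurant IDs already present in a JSON Lines checkpoint file."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as f:
        return {json.loads(line)["restaurant_id"] for line in f if line.strip()}


def append_result(path: Path, result: Dict) -> None:
    """Append one cross-reference result as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(result, ensure_ascii=False) + "\n")


def pending(restaurants: Iterable[Dict], done: Set[str]) -> List[Dict]:
    """Filter out restaurants whose results are already checkpointed."""
    return [r for r in restaurants if r["id"] not in done]
```

Inside `batch_verify`, `pending(restaurants, load_completed_ids(path))` would replace the full list, and `append_result(path, result)` would follow each `cross_platform_verify` call.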

def generate_verification_report(results):
    """Generate a summary report from batch verification results."""
    total = len(results)
    status_counts = {}

    for r in results:
        status = r["overall_status"]
        status_counts[status] = status_counts.get(status, 0) + 1

    needs_review = [
        r for r in results
        if r["overall_status"] in ["likely_closed", "data_review_needed"]
    ]

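    report = {
        "total_verified": total,
        "verification_date": datetime.utcnow().isoformat(),
        "status_breakdown": status_counts,
        # Share of restaurants verified active on every checked platform;
        # guarded so an empty result list does not divide by zero
        "data_quality_score": round(
            status_counts.get("verified_active", 0) / total * 100, 1
        ) if total else 0.0,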
        "needs_review_count": len(needs_review),
        "needs_review": [
            {
                "id": r["restaurant_id"],
                "status": r["overall_status"],
                "confidence": r["overall_confidence"],
                "issues": [
                    issue
                    for pr in r["platform_results"].values()
                    for issue in pr["issues"]
                ]
            }
            for r in needs_review
        ]
    }

    return report
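
The report summarizes a single run, while the Alerts & Updates stage in the architecture is easier to drive from changes between runs: a listing that flips from `verified_active` to `likely_closed` deserves attention even if the overall quality score barely moves. A minimal sketch of that comparison, assuming the previous run's cross-reference results were kept, might look like this (`status_changes` is a hypothetical name):

```python
from typing import Dict, List


def status_changes(previous: List[Dict], current: List[Dict]) -> List[Dict]:
    """List restaurants whose overall_status differs from the previous run.

    Restaurants absent from the previous run are reported with "from": None.
    """
    before = {r["restaurant_id"]: r["overall_status"] for r in previous}
    changes = []
    for r in current:
        old = before.get(r["restaurant_id"])
        if old != r["overall_status"]:
            changes.append({
                "restaurant_id": r["restaurant_id"],
                "from": old,
                "to": r["overall_status"],
            })
    return changes
```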

Data Quality Scoring

def _calculate_confidence(self, checks):
    """Calculate a confidence score for a verification result."""
    weights = {
        "name_match": 25,
        "address_match": 20,
        "is_accepting_orders": 25,
        "menu_available": 20,
        "price_changes_detected": 10
    }

    score = 0
    for check, weight in weights.items():
        if check == "price_changes_detected":
            # No price changes = good
            if not checks.get(check, False):
                score += weight
        else:
            if checks.get(check, False):
                score += weight

    return score
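
Written out, the scoring is a weighted sum of pass/fail checks whose weights total 100, where the price check counts as passing when no significant change was detected. For example, a listing that fails only the name check scores 75:

```latex
\begin{aligned}
\text{confidence} &= \sum_i w_i c_i, \qquad c_i \in \{0, 1\}, \qquad \sum_i w_i = 25 + 20 + 25 + 20 + 10 = 100 \\
\text{name mismatch only:}\quad \text{confidence} &= 25 \cdot 0 + 20 \cdot 1 + 25 \cdot 1 + 20 \cdot 1 + 10 \cdot 1 = 75
\end{aligned}
```

Because `_aggregate_confidence` averages these per-platform scores, the overall confidence also stays on a 0 to 100 scale.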

def _aggregate_confidence(self, results):
    """Aggregate confidence across platforms."""
    scores = [r.confidence_score for r in results.values()]
    if not scores:
        return 0
    return round(sum(scores) / len(scores), 1)

Scheduling Verification

Verification Priority System

Not all restaurants need the same verification frequency:

VERIFICATION_SCHEDULE = {
    "high_priority": {
        "criteria": "Chain restaurants, high-traffic locations, recently updated",
        "frequency_days": 7,
        "platforms": ["grabfood", "foodpanda", "shopeefood"]
    },
    "medium_priority": {
        "criteria": "Independent restaurants, moderate traffic",
        "frequency_days": 14,
        "platforms": ["grabfood", "foodpanda"]
    },
    "low_priority": {
        "criteria": "Low-traffic areas, stable listings",
        "frequency_days": 30,
        "platforms": ["grabfood"]
    },
    "flagged": {
        "criteria": "Previously had data issues",
        "frequency_days": 3,
        "platforms": ["grabfood", "foodpanda", "shopeefood"]
    }
}
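
The schedule states how often each tier should be checked but leaves open how to pick the listings due on a given day. A small sketch follows, assuming each restaurant record already carries a `tier` key naming one of the schedule entries and a timezone-aware `last_verified` timestamp (or `None` if never checked); how tiers are assigned from the criteria strings is not specified here. `FREQUENCY_DAYS` is a hypothetical copy of the `frequency_days` values above so the block runs on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# Hypothetical mirror of the frequency_days values in VERIFICATION_SCHEDULE
FREQUENCY_DAYS = {"high_priority": 7, "medium_priority": 14,
                  "low_priority": 30, "flagged": 3}


def due_for_verification(restaurants: List[Dict],
                         now: Optional[datetime] = None) -> List[Dict]:
    """Return records whose last check is older than their tier's frequency.

    Never-verified records come first, then the most overdue.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    def overdue_by(record: Dict) -> timedelta:
        last = record.get("last_verified")
        if last is None:
            return timedelta.max  # never checked: always due, highest urgency
        return (now - last) - timedelta(days=FREQUENCY_DAYS[record["tier"]])

    due = [r for r in restaurants if overdue_by(r) >= timedelta(0)]
    return sorted(due, key=overdue_by, reverse=True)
```

Sorting by how overdue a listing is means that, if a run is cut short, the stalest data is refreshed first.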

Why Mobile Proxies Are Critical for Verification

Restaurant verification requires:

  1. Reliable access: Every failed request is an unverified listing
  2. Location accuracy: Restaurant availability depends on query location
  3. Platform trust: Verification requires access to the same data customers see
  4. Scale: Verifying thousands of listings requires sustained access without blocks
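
Because every failed request leaves a listing unverified, transient proxy or network errors are worth retrying before a listing is treated as not found. The helper below is a generic sketch of retry with jittered exponential backoff; the name `with_retries`, the default of three attempts, and the 2-second base delay are assumptions chosen for illustration, and `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 2.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying listed exceptions with jittered exponential backoff.

    The wait before retry k (k = 1, 2, ...) is base_delay * 2**(k - 1) plus up
    to one second of random jitter. The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 1))
    raise RuntimeError("unreachable: the loop always returns or raises")
```

With `requests`, `fn` could be a small closure that calls `session.get(url, timeout=15)`, then `raise_for_status()`, then returns `.json()`, with `retry_on=(requests.RequestException, ValueError)` so HTTP errors and malformed JSON are retried as well.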

DataResearchTools mobile proxies provide the foundation for reliable, at-scale restaurant verification. Their Southeast Asian mobile carrier IPs help ensure that food delivery platforms return the same localized, customer-facing data a local app user would see, which is essential for meaningful verification results.

Conclusion

Restaurant listing verification is a critical but often overlooked component of food data quality. By building systematic verification pipelines with DataResearchTools mobile proxies, food aggregators and data companies can maintain accurate, trustworthy restaurant databases across Southeast Asia’s major food delivery platforms.

The key is treating verification as an ongoing process rather than a one-time exercise. Regular verification cycles, priority-based scheduling, and automated alerting ensure that data quality stays high as the dynamic food delivery landscape continues to evolve.

