Building a Healthcare Price Comparison Engine with Mobile Proxies

Healthcare costs are notoriously opaque. Patients struggle to compare prices for medical procedures, consultations, and treatments across providers. In Southeast Asia, where healthcare systems range from Singapore’s world-class private hospitals to Indonesia’s rapidly expanding public health infrastructure, price transparency is even more challenging.

Building a healthcare price comparison engine addresses this gap by collecting, normalizing, and presenting medical pricing data from hospitals, clinics, and healthcare platforms across the region. The technical backbone of such a system is reliable proxy infrastructure that enables continuous data collection from diverse healthcare provider websites.

This guide walks you through designing and building a healthcare price comparison engine powered by DataResearchTools mobile proxies.

The Healthcare Price Transparency Problem

Why Prices Are Hard to Compare

Healthcare pricing in Southeast Asia is complex for several reasons:

  • Bundled vs. itemized pricing: Some providers quote all-inclusive package prices while others list each component separately
  • Variable pricing: Costs depend on patient conditions, insurance coverage, and negotiated rates
  • Limited online information: Many providers do not publish pricing online, or publish only starting prices
  • Currency differences: Cross-border comparison requires currency normalization
  • Quality variations: Price alone does not reflect the quality of care or included services
  • Regional variations: Pricing within a single country can vary dramatically between cities

Market Opportunity

Despite these challenges, there is strong demand for healthcare price comparison:

  • Medical tourism: Southeast Asia attracts millions of medical tourists who need to compare costs across countries
  • Insurance companies: Insurers need pricing data to set reimbursement rates and identify cost-effective providers
  • Employers: Companies with regional operations need to benchmark healthcare costs for employee benefits
  • Patients: Local patients increasingly research costs before choosing providers
  • Healthcare providers: Hospitals and clinics need competitive pricing intelligence

Data Sources for Healthcare Pricing

Hospital and Clinic Websites

Most major hospitals in Southeast Asia now publish at least some pricing information online:

  • Package prices for common procedures (health screenings, dental, cosmetic)
  • Consultation fees by department or specialist level
  • Room rates and hospitalization costs
  • Health screening packages with tiered pricing

Healthcare Booking Platforms

Online healthcare booking platforms aggregate provider information:

  • Doctor Anywhere, Halodoc, and similar platforms list consultation fees
  • Medical tourism platforms like Medical Departures and Dental Departures publish procedure prices
  • Health screening aggregators compare package prices

Government Transparency Initiatives

Several SEA governments have launched healthcare pricing transparency efforts:

  • Singapore: Ministry of Health publishes bill size data by procedure
  • Thailand: Medical tourism board publishes indicative pricing
  • Indonesia: BPJS Kesehatan (national insurance) publishes covered procedure rates
  • Malaysia: Ministry of Health publishes fee schedule guidelines

Insurance and Benefits Data

  • Published insurance plan networks and fee schedules
  • Corporate healthcare benefits information
  • Government insurance coverage rates

System Architecture

High-Level Design

Data Collection Layer         Processing Layer         Presentation Layer
---------------------         ----------------         ------------------
Hospital websites     -->  Price Extraction    -->  Search API
Booking platforms     -->  Normalization       -->  Comparison Dashboard
Government databases  -->  Categorization      -->  Analytics Reports
Insurance data        -->  Quality Scoring     -->  Alert System

         |                       |
    DataResearchTools        Database
    Mobile Proxies           (PostgreSQL)

Component Design

Data Collection Engine

  • Multi-threaded crawler using DataResearchTools mobile proxies
  • Source-specific parsers for each hospital and platform
  • Scheduling system for regular price updates
  • Error handling and retry logic
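Retry logic can be as simple as exponential backoff around each request. The sketch below uses a hypothetical helper, `fetch_with_retry`, which is not part of any library or of the collector code in this guide. It takes the request as a zero-argument callable, so it can be tested without network access.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once max_attempts (must be >= 1) is used up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ... seconds
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be >= 1")
```

`requests.get` does not raise on 4xx/5xx status codes. If responses such as 429 or 503 should be retried, call `response.raise_for_status()` inside the callable. Narrow `retry_on` (for example to `requests.RequestException`) so that parser bugs are not retried.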

Processing Pipeline

  • Price extraction and normalization
  • Procedure categorization using standard medical coding
  • Currency conversion
  • Data quality validation
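For currency conversion, one option is to convert with `Decimal` and store the rate and its date alongside each converted price. That way, comparisons can state which rates they used. This is a minimal sketch: `RateTable` and `to_usd` are hypothetical names, and the rates loaded into the table should come from a current FX source rather than hard-coded values.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict


@dataclass(frozen=True)
class RateTable:
    """USD value of one unit of each currency, plus the date the rates apply to."""
    as_of: date
    usd_per_unit: Dict[str, Decimal] = field(default_factory=dict)


def to_usd(amount: Decimal, currency: str, rates: RateTable) -> dict:
    """Convert a local price to USD, recording which rate and date were used."""
    if currency == "USD":
        rate = Decimal("1")
    elif currency in rates.usd_per_unit:
        rate = rates.usd_per_unit[currency]
    else:
        raise KeyError(f"No exchange rate for {currency}")
    # Decimal avoids float rounding drift on money; round to cents at the end
    usd = (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"price_usd": usd, "fx_rate": rate, "fx_as_of": rates.as_of.isoformat()}
```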

Comparison Engine

  • Search and filter capabilities
  • Cross-provider comparison
  • Cross-country comparison with currency normalization
  • Trend analysis and historical pricing
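Trend analysis can start from the rows in `price_history`. The sketch below assumes history arrives as `(recorded_at, price_usd)` pairs. `price_trend` is a hypothetical helper that reports the overall change plus the low and high over the period.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def price_trend(history: List[Tuple[datetime, float]]) -> Optional[dict]:
    """Summarize how a price moved across its recorded history.

    Returns None when there are fewer than two points or the first price is zero.
    """
    if len(history) < 2:
        return None
    # Rows may come back unordered, so sort by timestamp first
    ordered = sorted(history, key=lambda row: row[0])
    (first_at, first), (last_at, last) = ordered[0], ordered[-1]
    if first == 0:
        return None
    return {
        "from": first_at.isoformat(),
        "to": last_at.isoformat(),
        "change_pct": round((last - first) / first * 100, 1),
        "low": min(p for _, p in ordered),
        "high": max(p for _, p in ordered),
    }
```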

Database Schema

CREATE TABLE providers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    type VARCHAR(50),  -- hospital, clinic, platform
    country VARCHAR(2),
    city VARCHAR(100),
    website VARCHAR(500),
    accreditations TEXT[],
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE procedures (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    category VARCHAR(100),
    icd_code VARCHAR(20),
    cpt_code VARCHAR(20),
    description TEXT
);

CREATE TABLE prices (
    id SERIAL PRIMARY KEY,
    provider_id INTEGER REFERENCES providers(id),
    procedure_id INTEGER REFERENCES procedures(id),
    price DECIMAL(12,2),
    currency VARCHAR(3),
    price_usd DECIMAL(12,2),
    price_type VARCHAR(50),  -- package, starting_from, fixed, estimate
    includes TEXT,
    excludes TEXT,
    source_url VARCHAR(500),
    collected_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE price_history (
    id SERIAL PRIMARY KEY,
    price_id INTEGER REFERENCES prices(id),
    price DECIMAL(12,2),
    price_usd DECIMAL(12,2),
    recorded_at TIMESTAMP DEFAULT NOW()
);
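One way to populate `price_history` is to keep a single current row per provider and procedure in `prices`, and append to history only when a newly collected value differs. The sketch below uses Python's built-in `sqlite3` with a trimmed translation of the tables above, so it runs without a database server. The uniqueness rule on `(provider_id, procedure_id)` is an assumption; the PostgreSQL schema above does not declare it.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Trimmed SQLite translation of the prices / price_history tables above.
    # Assumption: one current row per (provider, procedure) pair.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY,
            provider_id INTEGER NOT NULL,
            procedure_id INTEGER NOT NULL,
            price REAL,
            currency TEXT,
            price_usd REAL,
            collected_at TEXT DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (provider_id, procedure_id)
        );
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY,
            price_id INTEGER REFERENCES prices(id),
            price REAL,
            price_usd REAL,
            recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
    """)


def record_price(conn, provider_id, procedure_id, price, currency, price_usd):
    """Store the latest observation; append to history only when the price changed."""
    row = conn.execute(
        "SELECT id, price FROM prices WHERE provider_id = ? AND procedure_id = ?",
        (provider_id, procedure_id),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO prices (provider_id, procedure_id, price, currency, price_usd) "
            "VALUES (?, ?, ?, ?, ?)",
            (provider_id, procedure_id, price, currency, price_usd),
        )
        price_id, changed = cur.lastrowid, True
    else:
        price_id, old_price = row
        changed = old_price != price
        # Always refresh collected_at so freshness checks see the latest crawl
        conn.execute(
            "UPDATE prices SET price = ?, currency = ?, price_usd = ?, "
            "collected_at = CURRENT_TIMESTAMP WHERE id = ?",
            (price, currency, price_usd, price_id),
        )
    if changed:
        conn.execute(
            "INSERT INTO price_history (price_id, price, price_usd) VALUES (?, ?, ?)",
            (price_id, price, price_usd),
        )
    conn.commit()
    return price_id
```

In PostgreSQL, adding that unique constraint lets you collapse the select-then-write into `INSERT ... ON CONFLICT (provider_id, procedure_id) DO UPDATE`.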

Building the Data Collection Engine

Core Collector Class

import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time
import json

class HealthcarePriceCollector:
    def __init__(self, proxy_user, proxy_pass):
        self.proxy_endpoints = {
            "SG": f"http://{proxy_user}:{proxy_pass}@sg-mobile.dataresearchtools.com:8080",
            "TH": f"http://{proxy_user}:{proxy_pass}@th-mobile.dataresearchtools.com:8080",
            "ID": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
            "PH": f"http://{proxy_user}:{proxy_pass}@ph-mobile.dataresearchtools.com:8080",
            "MY": f"http://{proxy_user}:{proxy_pass}@my-mobile.dataresearchtools.com:8080",
            "VN": f"http://{proxy_user}:{proxy_pass}@vn-mobile.dataresearchtools.com:8080"
        }

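    def get_proxies(self, country):
        proxy_url = self.proxy_endpoints.get(country)
        if proxy_url is None:
            # Fail loudly: requests ignores proxy entries set to None, so the
            # request would bypass the country proxy without any error
            raise ValueError(f"No proxy endpoint configured for {country}")
        return {"http": proxy_url, "https": proxy_url}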

    def collect_hospital_prices(self, hospital_config):
        country = hospital_config["country"]
        proxies = self.get_proxies(country)
        prices = []

        for page_config in hospital_config["pricing_pages"]:
            try:
                response = requests.get(
                    page_config["url"],
                    proxies=proxies,
                    headers={
                        "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                                      "AppleWebKit/537.36 Chrome/120.0.0.0 Mobile Safari/537.36"
                    },
                    timeout=30
                )

                if response.status_code == 200:
                    parsed = page_config["parser"](response.text)
                    for item in parsed:
                        item["provider"] = hospital_config["name"]
                        item["country"] = country
                        item["source_url"] = page_config["url"]
                        item["collected_at"] = datetime.utcnow().isoformat()
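                        prices.append(item)
                else:
                    print(f"HTTP {response.status_code} from {page_config['url']}")
            except Exception as e:
                # Broad catch so one failing page or parser does not abort the run
                print(f"Error collecting from {hospital_config['name']}: {e}")
            finally:
                time.sleep(2)  # pause between requests, including after errors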

        return prices

Hospital-Specific Parsers

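Each hospital website has its own structure, so each needs a custom parser. The class names and CSS selectors below are illustrative; match them to each site's actual markup: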

class SingaporeHospitalParsers:

    @staticmethod
    def parse_mount_elizabeth(html):
        soup = BeautifulSoup(html, "html.parser")
        prices = []

        price_cards = soup.select(".price-package-card, .procedure-price")
        for card in price_cards:
            procedure = card.select_one(".procedure-name, .package-title")
            price_elem = card.select_one(".price-amount, .package-price")
            includes = card.select(".includes-item, .package-includes li")

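            if procedure and price_elem:
                price_text = price_elem.get_text(strip=True)
                price_value = extract_numeric_price(price_text)
                if price_value is None:
                    continue  # e.g. "Price on request"

                prices.append({
                    "procedure": procedure.get_text(strip=True),
                    "price": price_value,
                    "currency": "SGD",
                    # BeautifulSoup returns class as a list of tokens, so match
                    # the full class name rather than the substring "package"
                    "price_type": "package" if "price-package-card" in
                                  card.get("class", []) else "starting_from",
                    "includes": [i.get_text(strip=True) for i in includes]
                })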

        return prices

    @staticmethod
    def parse_raffles_hospital(html):
        soup = BeautifulSoup(html, "html.parser")
        prices = []

        tables = soup.select("table.pricing-table")
        for table in tables:
            category = table.find_previous("h2")
            rows = table.select("tbody tr")

            for row in rows:
                cells = row.select("td")
                if len(cells) >= 2:
                    prices.append({
                        "procedure": cells[0].get_text(strip=True),
                        "price": extract_numeric_price(
                            cells[-1].get_text(strip=True)
                        ),
                        "currency": "SGD",
                        "category": category.get_text(strip=True)
                                    if category else None,
                        "price_type": "estimated_range"
                    })

        return prices
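Both parsers call `extract_numeric_price`, which is not shown. A minimal sketch follows. It assumes that the first number in the string is the price, and that a separator followed by exactly three digits is a thousands separator. That rule covers both `S$1,234.50` and Indonesian-style `Rp 1.500.000`. Ranges such as `THB 45,000 - 60,000` yield only their lower bound, so you may want to capture the upper bound separately.

```python
import re
from typing import Optional

# A digit followed by any run of digits, "." or "," (e.g. "1,234.50", "1.500.000")
_NUMBER_RE = re.compile(r"\d[\d.,]*")


def extract_numeric_price(text: str) -> Optional[float]:
    """Return the first numeric amount in a price string, or None if there is none.

    Assumption: if the last separator is followed by exactly three digits, every
    separator is a thousands separator; otherwise the last one is the decimal point.
    """
    match = _NUMBER_RE.search(text or "")
    if not match:
        return None
    token = match.group().rstrip(".,")  # drop sentence punctuation like "1,200."
    last_sep = max(token.rfind("."), token.rfind(","))
    if last_sep == -1:
        return float(token)
    decimals = token[last_sep + 1:]
    if len(decimals) == 3:
        return float(token.replace(",", "").replace(".", ""))
    integer_part = token[:last_sep].replace(",", "").replace(".", "")
    return float(f"{integer_part}.{decimals}")
```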

Medical Tourism Platform Scraping

class MedicalTourismCollector:
    def __init__(self, collector):
        self.collector = collector

    def collect_procedure_prices(self, procedure, countries):
        all_prices = []

        for country in countries:
            proxies = self.collector.get_proxies(country)

            # Placeholder URL: substitute the platform you are collecting from
            response = requests.get(
                "https://example-medical-tourism.com/search",
                params={
                    "procedure": procedure,
                    "country": country
                },
                proxies=proxies,
                headers={
                    "User-Agent": "Mozilla/5.0 (Linux; Android 14)"
                },
                timeout=30
            )

            if response.status_code == 200:
                prices = self.parse_results(response.text, country)
                all_prices.extend(prices)

            time.sleep(2)

        return all_prices

    def parse_results(self, html, country):
        # Platform-specific: implement with selectors for the target site,
        # following the same pattern as the hospital parsers
        raise NotImplementedError

Price Normalization and Comparison

Normalizing Prices

Healthcare prices come in many formats. Normalize them for meaningful comparison:

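from datetime import datetime, timezone

class PriceNormalizer:
    # USD per unit of local currency. Illustrative snapshot only: load current
    # rates at run time and record the date they were taken.
    EXCHANGE_RATES = {
        "SGD": 0.74, "THB": 0.028, "IDR": 0.000063,
        "PHP": 0.018, "MYR": 0.22, "VND": 0.000041
    }

    def normalize(self, price_data):
        normalized = price_data.copy()

        # Convert to USD; leave None when the rate or the price is unknown
        rate = self.EXCHANGE_RATES.get(price_data["currency"])
        price = price_data.get("price")
        normalized["price_usd"] = (
            round(price * rate, 2) if rate is not None and price is not None
            else None
        )

        # Determine price type confidence
        normalized["confidence"] = self.assess_confidence(price_data)

        # Categorize the procedure
        normalized["standard_category"] = self.categorize_procedure(
            price_data["procedure"]
        )

        return normalized

    def categorize_procedure(self, procedure_name):
        # Placeholder: map free-text names onto your standard taxonomy
        # (for example the ICD/CPT codes in the procedures table)
        return None

    def assess_confidence(self, price_data):
        """Rate confidence in the price accuracy"""
        score = 0.5

        if price_data.get("price_type") == "package":
            score += 0.2  # Package prices are more reliable
        elif price_data.get("price_type") == "starting_from":
            score -= 0.1  # Starting prices may be lower than actual

        if price_data.get("includes"):
            score += 0.1  # Detailed inclusions improve confidence

        if price_data.get("collected_at"):
            collected = datetime.fromisoformat(price_data["collected_at"])
            if collected.tzinfo is None:
                # Naive timestamps (such as datetime.utcnow()) are treated as UTC
                collected = collected.replace(tzinfo=timezone.utc)
            days_old = (datetime.now(timezone.utc) - collected).days
            if days_old > 90:
                score -= 0.2
            elif days_old > 30:
                score -= 0.1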

        return min(max(score, 0.0), 1.0)
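The normalizer relies on `categorize_procedure`, whose logic this guide leaves open. A keyword lookup is a simple starting point before mapping to standard codes such as the `icd_code` and `cpt_code` columns in the schema. The keyword table below is a hypothetical example, not a clinical taxonomy. Categories are checked in dictionary order, so the first match wins.

```python
import re

# Hypothetical keyword map; extend it or replace it with code-based mapping
CATEGORY_KEYWORDS = {
    "health_screening": ("screening", "health check", "checkup", "check-up"),
    "dental": ("dental", "tooth", "teeth", "root canal", "scaling"),
    "consultation": ("consultation", "consult", "specialist visit"),
    "imaging": ("mri", "ct scan", "x-ray", "ultrasound", "mammogram"),
    "maternity": ("delivery", "maternity", "caesarean", "c-section"),
}


def categorize_procedure(procedure_name: str) -> str:
    """Return the first category whose keyword appears as a whole word, else 'uncategorized'."""
    name = procedure_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        # Word boundaries stop short keywords like "mri" matching inside other words
        if any(re.search(rf"\b{re.escape(k)}\b", name) for k in keywords):
            return category
    return "uncategorized"
```

To use it from `PriceNormalizer`, have the `categorize_procedure` method return `categorize_procedure(procedure_name)`.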

Building Comparison Views

class PriceComparison:
    def compare_procedure(self, procedure_name, prices_db):
        """Generate cross-provider comparison for a procedure"""
        prices = prices_db.get_prices(procedure_name)

        comparison = {
            "procedure": procedure_name,
            "generated_at": datetime.utcnow().isoformat(),
            "by_country": {},
            "overall_stats": {}
        }

        for price in prices:
            country = price["country"]
            # Stats start as None and are filled in below once USD prices are known
            bucket = comparison["by_country"].setdefault(country, {
                "providers": [],
                "min_usd": None,
                "max_usd": None,
                "avg_usd": None
            })

            bucket["providers"].append({
                "provider": price["provider"],
                "price_local": price["price"],
                "currency": price["currency"],
                "price_usd": price.get("price_usd"),
                "price_type": price["price_type"],
                "includes": price.get("includes", []),
                "confidence": price.get("confidence", 0.5)
            })

        # Calculate statistics per country, skipping prices with no USD value
        for data in comparison["by_country"].values():
            data["provider_count"] = len(data["providers"])
            usd_prices = [p["price_usd"] for p in data["providers"]
                          if p["price_usd"] is not None]
            if usd_prices:
                data["min_usd"] = min(usd_prices)
                data["max_usd"] = max(usd_prices)
                data["avg_usd"] = sum(usd_prices) / len(usd_prices)

        return comparison
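Once the per-country statistics exist, a cross-country view often only needs the countries ordered by price. `rank_countries` is a hypothetical helper that operates on the dictionary returned by `compare_procedure`. It skips any country whose statistic was never filled in: left as `None`, or left at `float("inf")` for the minimum when no USD price was available.

```python
import math
from typing import List, Tuple


def rank_countries(comparison: dict, key: str = "min_usd") -> List[Tuple[str, float]]:
    """Return (country, value) pairs for the chosen statistic, cheapest first."""
    ranked = []
    for country, stats in comparison["by_country"].items():
        value = stats.get(key)
        # Unset statistics mean the country had no USD-convertible prices
        if value is None or math.isinf(value):
            continue
        ranked.append((country, value))
    return sorted(ranked, key=lambda pair: pair[1])
```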

Keeping Data Fresh

Update Scheduling

Different data types need different update frequencies:

collection_schedule = {
    "health_screening_packages": {
        "frequency": "weekly",
        "reason": "Packages change monthly; weekly catches promotions"
    },
    "consultation_fees": {
        "frequency": "biweekly",
        "reason": "Consultation fees are relatively stable"
    },
    "procedure_estimates": {
        "frequency": "monthly",
        "reason": "Procedure pricing changes less frequently"
    },
    "room_rates": {
        "frequency": "weekly",
        "reason": "Room rates may have seasonal variations"
    },
    "dental_procedures": {
        "frequency": "monthly",
        "reason": "Dental pricing is relatively stable"
    }
}
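A scheduler can use this table to decide which data types are due on each run. The sketch assumes you persist the last run time per data type. It interprets `biweekly` as every two weeks and `monthly` as 30 days. `FREQUENCY_INTERVALS` and `due_collections` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumed mapping from the schedule's frequency labels to intervals
FREQUENCY_INTERVALS = {
    "weekly": timedelta(weeks=1),
    "biweekly": timedelta(weeks=2),
    "monthly": timedelta(days=30),
}


def due_collections(
    schedule: Dict[str, dict], last_run: Dict[str, datetime], now: datetime
) -> List[str]:
    """Return the data types whose interval has elapsed since their last run."""
    due = []
    for data_type, config in schedule.items():
        interval = FREQUENCY_INTERVALS[config["frequency"]]
        previous: Optional[datetime] = last_run.get(data_type)
        # Never-collected data types are always due
        if previous is None or now - previous >= interval:
            due.append(data_type)
    return due
```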

Change Detection

Alert stakeholders when significant price changes occur:

def detect_price_changes(self, new_prices, threshold_pct=10):
    alerts = []
    for price in new_prices:
        previous = self.db.get_previous_price(
            price["provider"], price["procedure"]
        )
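        # Need a non-zero baseline to compute a percentage change
        if previous and previous["price"] and price["price"] is not None:
            change_pct = abs(
                (price["price"] - previous["price"]) / previous["price"] * 100
            )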
            if change_pct >= threshold_pct:
                alerts.append({
                    "provider": price["provider"],
                    "procedure": price["procedure"],
                    "old_price": previous["price"],
                    "new_price": price["price"],
                    "change_pct": change_pct,
                    "direction": "increase" if price["price"] > previous["price"]
                                else "decrease"
                })
    return alerts

Best Practices

  1. Use country-specific mobile proxies: DataResearchTools mobile proxies ensure you see authentic local pricing for each SEA market.
  2. Always note what is included: A lower price that excludes anesthesia, room charges, or follow-up visits is not truly cheaper. Capture inclusion/exclusion details.
  3. Distinguish price types: Clearly label whether a price is a fixed package, starting estimate, or negotiable range.
  4. Validate outliers: Extremely low or high prices may indicate parsing errors. Implement automated validation.
  5. Respect the data: Healthcare pricing is sensitive. Present data responsibly and note the limitations of collected pricing information.
  6. Update exchange rates: Use current exchange rates for cross-currency comparisons, and note the rates used.
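Outlier validation can be automated cheaply by comparing each price for the same procedure against the group median. The sketch below is one simple heuristic, not a statistical standard. It flags non-positive prices and anything more than `factor` times above or below the median. That catches typical parsing slips such as a dropped decimal point. `flag_outliers` and the default factor of 5 are assumptions to tune against your data.

```python
from statistics import median
from typing import List


def flag_outliers(prices: List[float], factor: float = 5.0) -> List[bool]:
    """Return one flag per price: True if it looks like a parsing or data error."""
    positive = [p for p in prices if p > 0]
    if len(positive) < 3:
        # Too few points for a meaningful median; only flag non-positive values
        return [p <= 0 for p in prices]
    mid = median(positive)
    return [p <= 0 or p > mid * factor or p < mid / factor for p in prices]
```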

Conclusion

A healthcare price comparison engine powered by DataResearchTools mobile proxies fills a critical gap in healthcare transparency across Southeast Asia. By collecting pricing data from hospitals, clinics, and booking platforms across the region, normalizing it for meaningful comparison, and presenting it in an accessible format, you create value for patients, insurers, employers, and healthcare providers alike.

DataResearchTools provides the proxy infrastructure essential for this type of cross-market data collection, with mobile proxy endpoints in every major SEA country ensuring authentic, reliable access to healthcare pricing data.

