Scraping Health Insurance Plans and Premium Data

Health insurance markets in Southeast Asia are undergoing rapid transformation. With growing middle-class populations, government mandates for insurance coverage, and the rise of digital insurance platforms, the volume and complexity of health insurance products have exploded. For insurance companies, comparison platforms, brokers, and market researchers, collecting and analyzing health insurance plan data across the region is critical for competitive strategy.

This guide covers how to scrape health insurance plans, premium data, coverage details, and benefits information from insurance company websites, comparison platforms, and regulatory databases across Southeast Asia.

The Southeast Asian Health Insurance Landscape

Market Overview

Singapore

Mandatory MediShield Life provides basic coverage
Integrated Shield Plans (IPs) from private insurers provide enhanced coverage
Highly regulated by the Monetary Authority of Singapore (MAS)
Key insurers: AIA, Great Eastern, Prudential, Income Insurance (formerly NTUC Income), HSBC Life (which absorbed AXA Singapore)

Thailand

Universal Coverage Scheme covers most citizens
Growing private health insurance market
Office of Insurance Commission (OIC) regulates
Key players: AIA Thailand, Thai Life Insurance, Muang Thai Life

Indonesia

BPJS Kesehatan provides national health coverage
Rapidly growing private insurance market
OJK (Financial Services Authority) regulates
Key players: Allianz Indonesia, Prudential Indonesia, Manulife Indonesia

Philippines

PhilHealth provides basic coverage
Growing HMO (Health Maintenance Organization) market
Insurance Commission regulates
Key players: Maxicare, Intellicare, Medicard, Pacific Cross

Malaysia

Government healthcare widely accessible
Private health insurance growing
Bank Negara Malaysia regulates
Key players: AIA Malaysia, Great Eastern Malaysia, Prudential Malaysia

Vietnam

Vietnam Social Security provides basic coverage
Private insurance market expanding rapidly
Ministry of Finance regulates
Key players: Bao Viet, Prudential Vietnam, Manulife Vietnam

Data Points to Collect

Plan Details

Plan names and tiers (basic, standard, premium)
Coverage types (hospitalization, outpatient, dental, maternity)
Annual coverage limits
Co-payment and deductible structures
Waiting periods for different coverage types
Pre-existing condition policies
Network restrictions (panel vs. non-panel)
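
The plan fields listed above could be held in a single record type so that every parser returns the same shape. The following is a minimal sketch, not a fixed schema: the class name `PlanRecord` and all field names are assumptions chosen for illustration, and the sample values are placeholders rather than a real product.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PlanRecord:
    """One scraped health plan; field names are illustrative assumptions."""
    insurer: str
    country: str                      # market code such as "SG" or "TH"
    plan_name: str
    tier: Optional[str] = None        # e.g. "basic", "standard", "premium"
    coverage_types: list = field(default_factory=list)
    annual_limit: Optional[float] = None
    currency: Optional[str] = None
    deductible: Optional[float] = None
    copay_pct: Optional[float] = None
    waiting_period_days: dict = field(default_factory=dict)
    covers_pre_existing: Optional[bool] = None
    panel_only: Optional[bool] = None


record = PlanRecord(
    insurer="Example Insurer",        # placeholder, not a real product
    country="SG",
    plan_name="Example Plan A",
    tier="standard",
    coverage_types=["hospitalization", "outpatient"],
    waiting_period_days={"maternity": 300},
)
row = asdict(record)  # plain dict, ready for JSON or a database insert
```

Leaving unknown fields as `None` rather than guessing a default keeps "not published" distinguishable from "zero" in later analysis.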

Premium Data

Monthly and annual premium rates
Premium variations by age, gender, and smoking status
Family plan pricing structures
Group insurance rate indicators
Premium loading for pre-existing conditions
Discount programs (no-claims, multi-year, bundling)
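
Premiums usually appear on pages as formatted strings, quoted per month on some sites and per year on others. A possible normalization step is sketched below. The helper names `parse_amount` and `to_annual` are hypothetical, and the separator convention is an assumption that will not hold in every market.

```python
import re
from typing import Optional


def parse_amount(text: str) -> Optional[float]:
    """Extract the first number from a price string, or None if absent.

    Assumes "," is a thousands separator and "." the decimal point.
    Some locales use the reverse, so adapt this per market.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if not match:
        return None
    return float(match.group(0).replace(",", ""))


def to_annual(amount: float, period: str) -> float:
    """Convert a monthly or annual figure to an annual one.

    Ignores any surcharge an insurer may add for monthly payment.
    """
    factors = {"monthly": 12, "annual": 1}
    return amount * factors[period]
```

For example, `parse_amount("S$1,234.50 / year")` yields `1234.5`, and `to_annual(89.0, "monthly")` yields `1068.0`.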

Benefits and Features

Hospital room and board limits
Surgical benefit schedules
Outpatient treatment coverage
Prescription drug coverage
Preventive care and wellness benefits
International coverage options
Telemedicine integration
Critical illness riders

Claims and Performance Data

Claim settlement ratios (where published)
Average claim processing time
Customer satisfaction ratings
Complaint statistics from regulators

Technical Implementation

Proxy Configuration

import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time
import json
import re

class InsuranceScraper:
    def __init__(self, proxy_user, proxy_pass):
        self.proxies = {
            "SG": f"http://{proxy_user}:{proxy_pass}@sg-mobile.dataresearchtools.com:8080",
            "TH": f"http://{proxy_user}:{proxy_pass}@th-mobile.dataresearchtools.com:8080",
            "ID": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
            "PH": f"http://{proxy_user}:{proxy_pass}@ph-mobile.dataresearchtools.com:8080",
            "MY": f"http://{proxy_user}:{proxy_pass}@my-mobile.dataresearchtools.com:8080",
            "VN": f"http://{proxy_user}:{proxy_pass}@vn-mobile.dataresearchtools.com:8080"
        }

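    def get_proxy(self, country):
        # requests expects one entry per URL scheme; both use the same proxy
        proxy_url = self.proxies[country]
        return {"http": proxy_url, "https": proxy_url}

    def get_currency(self, country):
        # Fallback ISO 4217 code, used when a calculator response omits "currency"
        currencies = {"SG": "SGD", "TH": "THB", "ID": "IDR",
                      "PH": "PHP", "MY": "MYR", "VN": "VND"}
        return currencies.get(country)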

    def get_headers(self, country):
        lang_map = {
            "SG": "en-SG,en;q=0.9",
            "TH": "th-TH,th;q=0.9,en;q=0.8",
            "ID": "id-ID,id;q=0.9,en;q=0.8",
            "PH": "en-PH,en;q=0.9",
            "MY": "ms-MY,ms;q=0.9,en;q=0.8",
            "VN": "vi-VN,vi;q=0.9,en;q=0.8"
        }
        return {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/120.0.0.0 Mobile Safari/537.36",
            "Accept-Language": lang_map.get(country, "en-US,en;q=0.9")
        }
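
The proxy URLs above interpolate the username and password directly, which breaks if either contains characters such as "@", ":" or "/". One way to guard against that, and to keep credentials out of source code, is sketched here. The function names and the environment variable names `PROXY_USER` and `PROXY_PASS` are assumptions, not part of any provider's documented setup.

```python
import os
from urllib.parse import quote


def build_proxy_url(user: str, password: str, host: str, port: int = 8080) -> str:
    """Build an HTTP proxy URL with percent-encoded credentials."""
    # safe="" also encodes "/", ":" and "@", which would otherwise be
    # misread as URL delimiters.
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


def proxy_from_env(host: str) -> dict:
    """Read credentials from PROXY_USER / PROXY_PASS (assumed variable names)."""
    user = os.environ["PROXY_USER"]
    password = os.environ["PROXY_PASS"]
    url = build_proxy_url(user, password, host)
    # Same mapping shape that get_proxy returns for requests
    return {"http": url, "https": url}
```

The returned dictionary can be passed to `requests.get(..., proxies=...)` in the same way as the output of `get_proxy`.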

Insurance Company Website Scraping

class InsuranceCompanyScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_plan_details(self, insurer_config, country):
        """Scrape health insurance plan details from insurer website"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        plans = []

        for page in insurer_config["plan_pages"]:
            try:
                response = requests.get(
                    page["url"],
                    proxies=proxy,
                    headers=headers,
                    timeout=30
                )

                if response.status_code == 200:
                    parsed = page["parser"](response.text)
                    for plan in parsed:
                        plan["insurer"] = insurer_config["name"]
                        plan["country"] = country
                        plan["source_url"] = page["url"]
                        plan["collected_at"] = datetime.utcnow().isoformat()
                        plans.append(plan)

                time.sleep(2)
            except Exception as e:
                print(f"Error scraping {insurer_config['name']}: {e}")

        return plans

    def scrape_premium_calculator(self, insurer_config, country,
                                  age_range=range(25, 66, 5)):
        """
        Interact with premium calculators to collect pricing
        across different age groups
        """
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        premium_data = []

        for age in age_range:
            for gender in ["male", "female"]:
                try:
                    # Assumes the calculator is backed by a JSON endpoint;
                    # the payload keys below are placeholders to adapt
                    # to each insurer's actual request format
                    response = requests.post(
                        insurer_config["calculator_url"],
                        json={
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            )
                        },
                        proxies=proxy,
                        headers={**headers, "Content-Type": "application/json"},
                        timeout=30
                    )

                    if response.status_code == 200:
                        result = response.json()
                        premium_data.append({
                            "insurer": insurer_config["name"],
                            "country": country,
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            ),
                            "monthly_premium": result.get("monthly_premium"),
                            "annual_premium": result.get("annual_premium"),
                            "currency": result.get("currency",
                                self.scraper.get_currency(country)),
                            "collected_at": datetime.utcnow().isoformat()
                        })

                    time.sleep(1)
                except Exception as e:
                    print(f"Calculator error for age {age}: {e}")

        return premium_data
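
The calculator loop above always sends `"smoker": False`, although smoking status is one of the premium variables listed earlier. A small sketch of how the full demographic grid could be generated up front follows. `build_profiles` is a hypothetical helper, and the payload keys simply mirror the assumed request body used above.

```python
from itertools import product


def build_profiles(ages=range(25, 66, 5),
                   genders=("male", "female"),
                   smoker_flags=(False, True)):
    """Return every age/gender/smoker combination as a request payload."""
    return [
        {"age": age, "gender": gender, "smoker": smoker}
        for age, gender, smoker in product(ages, genders, smoker_flags)
    ]


profiles = build_profiles()
```

With the defaults this produces 36 profiles (9 ages, 2 genders, 2 smoker flags). Reusing the same list for every insurer keeps the resulting premiums directly comparable.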

Comparison Platform Scraping

class ComparisonPlatformScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_comparison_site(self, platform_config, country):
        """Scrape insurance comparison platforms"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)

        comparison_data = []

        try:
            response = requests.get(
                platform_config["url"],
                params=platform_config.get("params", {}),
                proxies=proxy,
                headers=headers,
                timeout=30
            )

            if response.status_code == 200:
                plans = platform_config["parser"](response.text)
                for plan in plans:
                    plan["platform"] = platform_config["name"]
                    plan["country"] = country
                    plan["collected_at"] = datetime.utcnow().isoformat()
                    comparison_data.append(plan)

        except Exception as e:
            print(f"Error scraping {platform_config['name']}: {e}")

        return comparison_data

    def collect_sg_shield_plans(self):
        """Collect Singapore Integrated Shield Plan comparisons"""
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")

        # MAS and CPF Board publish IP comparison data
        try:
            response = requests.get(
                "https://www.cpf.gov.sg/member/healthcare-financing/"
                "medishield-life/comparison-of-integrated-shield-plans",
                proxies=proxy,
                headers=headers,
                timeout=30
            )

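            if response.status_code == 200:
                # parse_shield_plan_comparison is not defined in this guide;
                # implement it against the page's actual markup
                return self.parse_shield_plan_comparison(response.text)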
        except Exception as e:
            print(f"Error collecting Shield Plan data: {e}")
        return []
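
The code above calls `parse_shield_plan_comparison` without defining it. If the comparison data is published as a plain HTML table, a parser along the following lines might serve as a starting point. This is a sketch under that assumption only: the real page structure is not known here, it may be rendered by JavaScript, and `TableExtractor` and `parse_table` are hypothetical names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return table rows as dicts keyed by the first row's cell text."""
    extractor = TableExtractor()
    extractor.feed(html)
    if not extractor.rows:
        return []
    header, *body = extractor.rows
    return [dict(zip(header, row)) for row in body]
```

This version uses only the standard library and treats the first row as the header. Pages with several tables, merged cells, or nested tables would need extra handling.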

Regulatory Data Collection

class InsuranceRegulatoryMonitor:
    def __init__(self, scraper):
        self.scraper = scraper

    def monitor_mas_singapore(self):
        """Monitor MAS for insurance regulatory updates"""
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")

        updates = []
        try:
            response = requests.get(
                "https://www.mas.gov.sg/regulation/insurance",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                # parse_mas_updates is not defined in this guide;
                # implement it against the page's actual markup
                updates = self.parse_mas_updates(response.text)
        except Exception as e:
            print(f"Error monitoring MAS: {e}")
        return updates

    def monitor_ojk_indonesia(self):
        """Monitor OJK for insurance regulatory updates"""
        proxy = self.scraper.get_proxy("ID")
        headers = self.scraper.get_headers("ID")

        updates = []
        try:
            response = requests.get(
                "https://www.ojk.go.id/id/kanal/iknb/regulasi/asuransi",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                # parse_ojk_updates is not defined in this guide;
                # implement it against the page's actual markup
                updates = self.parse_ojk_updates(response.text)
        except Exception as e:
            print(f"Error monitoring OJK: {e}")
        return updates
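
Monitoring a regulator page is mostly a question of noticing when it has changed since the last visit. One simple approach, sketched below, is to store a hash of each page and compare it on the next run. The function names and the in-memory `seen` dictionary are assumptions for illustration.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """Hash the page text after collapsing whitespace."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(url: str, html: str, seen: dict) -> bool:
    """Return True when a page differs from its last recorded fingerprint.

    `seen` maps URL -> fingerprint and is updated in place; persisting it
    between runs (file, database) is left to the caller. A URL that has
    never been seen counts as changed.
    """
    fingerprint = content_fingerprint(html)
    changed = seen.get(url) != fingerprint
    seen[url] = fingerprint
    return changed
```

Pages that embed timestamps or session tokens will look different on every fetch, so in practice it may work better to hash only the extracted list of updates rather than the raw HTML.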

Data Analysis

Premium Comparison Analysis

class PremiumAnalyzer:
    def compare_premiums(self, premium_data, country, age, plan_tier):
        """Compare premiums across insurers for specific demographics"""
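        # Skip records with no annual premium (calculators may return None),
        # which would otherwise break the sort and the average below
        filtered = [
            p for p in premium_data
            if p["country"] == country
            and p["age"] == age
            and plan_tier.lower() in p.get("plan", "").lower()
            and p.get("annual_premium") is not None
        ]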

        if not filtered:
            return None

        comparison = {
            "country": country,
            "age": age,
            "plan_tier": plan_tier,
            "insurers": [],
            "cheapest": None,
            "most_expensive": None,
            "avg_annual_premium": 0
        }

        premiums = []
        for p in filtered:
            annual = p.get("annual_premium", 0)
            premiums.append(annual)
            comparison["insurers"].append({
                "name": p["insurer"],
                "annual_premium": annual,
                "monthly_premium": p.get("monthly_premium"),
                "currency": p["currency"]
            })

        comparison["insurers"].sort(key=lambda x: x["annual_premium"])
        comparison["cheapest"] = comparison["insurers"][0]
        comparison["most_expensive"] = comparison["insurers"][-1]
        comparison["avg_annual_premium"] = sum(premiums) / len(premiums)
        comparison["spread_pct"] = (
            (comparison["most_expensive"]["annual_premium"] -
             comparison["cheapest"]["annual_premium"]) /
            comparison["cheapest"]["annual_premium"] * 100
        ) if comparison["cheapest"]["annual_premium"] > 0 else 0

        return comparison

    def analyze_premium_trends(self, historical_premiums, insurer,
                               country, age):
        """Analyze premium trends over time"""
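        # Skip records with no annual premium so the comparison below is numeric
        filtered = [
            p for p in historical_premiums
            if p["insurer"] == insurer
            and p["country"] == country
            and p["age"] == age
            and p.get("annual_premium") is not None
        ]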

        if len(filtered) < 2:
            return None

        sorted_data = sorted(filtered, key=lambda x: x["collected_at"])
        first = sorted_data[0]["annual_premium"]
        latest = sorted_data[-1]["annual_premium"]

        return {
            "insurer": insurer,
            "country": country,
            "age": age,
            "first_premium": first,
            "latest_premium": latest,
            "change_pct": ((latest - first) / first * 100) if first > 0 else 0,
            "data_points": len(sorted_data),
            "period": f"{sorted_data[0]['collected_at'][:10]} to "
                      f"{sorted_data[-1]['collected_at'][:10]}"
        }
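
The `change_pct` figure above depends on how long the observation window happens to be, so two insurers tracked over different periods are hard to compare. A possible complement is an annualized rate, computed as (latest / first) raised to the power 1 / years, minus 1. The sketch below assumes `collected_at` values are ISO-8601 strings as produced earlier; the function name is hypothetical.

```python
from datetime import datetime


def annualized_change_pct(first_premium, latest_premium,
                          first_seen, latest_seen):
    """Compound annual rate of change between two ISO-8601 timestamps."""
    start = datetime.fromisoformat(first_seen)
    end = datetime.fromisoformat(latest_seen)
    days = (end - start).total_seconds() / 86400
    # No meaningful rate without a positive base and a positive time span
    if first_premium <= 0 or latest_premium < 0 or days <= 0:
        return None
    years = days / 365.25
    return ((latest_premium / first_premium) ** (1 / years) - 1) * 100
```

A premium that rises from 1000 to 1210 over two years works out to roughly 10% per year. Very short windows exaggerate the annualized figure, so it is worth requiring a minimum span before reporting it.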

Coverage Comparison

def compare_coverage(plans, coverage_aspect):
    """Compare specific coverage aspects across plans"""
    comparison = []
    for plan in plans:
        coverage_value = plan.get("coverage", {}).get(coverage_aspect)
        if coverage_value is not None:
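            # value_ratio needs a numeric coverage value and a positive premium;
            # a missing premium may be stored as None rather than omitted
            premium = plan.get("annual_premium") or 0
            has_ratio = isinstance(coverage_value, (int, float)) and premium > 0
            comparison.append({
                "insurer": plan["insurer"],
                "plan_name": plan.get("plan_name"),
                "country": plan["country"],
                "coverage_aspect": coverage_aspect,
                "coverage_value": coverage_value,
                "annual_premium": plan.get("annual_premium"),
                "value_ratio": coverage_value / premium if has_ratio else None
            })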

    return sorted(
        comparison,
        key=lambda x: x.get("value_ratio", 0) or 0,
        reverse=True
    )

Reporting

Executive Summary Report

Generate regular reports with these key metrics:

Premium trends: Average premium changes by market, insurer, and age group
Market competitiveness: Premium spreads and positioning by plan tier
Coverage evolution: Changes in coverage limits and benefits over time
Regulatory updates: New regulations affecting product design or pricing
Market entry activity: New plans launched or discontinued
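
As an illustration of the first metric, average premiums can be grouped by any combination of record fields. The sketch below assumes records shaped like the premium dictionaries collected earlier; `average_premium_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def average_premium_by_group(premium_data, keys=("country", "insurer")):
    """Average annual premium per group, skipping records with no premium."""
    groups = defaultdict(list)
    for record in premium_data:
        annual = record.get("annual_premium")
        if annual is not None:
            groups[tuple(record[k] for k in keys)].append(annual)
    return {group: mean(values) for group, values in groups.items()}
```

Calling it with `keys=("country", "age")` gives the per-market, per-age view. Running it on two collection dates and comparing the results gives the change over time.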

Competitive Intelligence Dashboard

Track these metrics in near real time:

Premium positioning map (coverage vs. price scatter)
Market share estimates by premium volume
New product launch timeline
Regulatory change impact assessments
Customer satisfaction benchmarks

Best Practices

Use country-specific mobile proxies: Insurance websites serve different content based on location. DataResearchTools mobile proxies ensure you see the plans and pricing available to local consumers in each SEA market.

Capture premium calculations systematically: Run premium calculators for standardized demographic profiles to enable meaningful cross-insurer comparison.

Track plan details, not just premiums: Coverage details, exclusions, and waiting periods are as important as pricing for competitive analysis.

Monitor regulatory changes: Insurance regulations in SEA markets change frequently. Regulatory changes often precede product redesigns and pricing adjustments.

Validate with published data: Cross-reference your scraped data against published regulatory reports and industry statistics to ensure accuracy.

Respect data sensitivity: Insurance plan information is public, but customer data is not. Never attempt to access policyholder login portals or claims systems.
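
Before cross-referencing against published data, it can help to run basic internal consistency checks on each scraped record. The following is a minimal sketch of such a check. The function name and the 25% tolerance are assumptions; the tolerance allows for insurers that charge more for monthly payment than one twelfth of the annual premium.

```python
def validate_premium_record(record, tolerance=0.25):
    """Return a list of problems found in one premium record (empty if none)."""
    problems = []
    annual = record.get("annual_premium")
    monthly = record.get("monthly_premium")
    for name, value in (("annual_premium", annual), ("monthly_premium", monthly)):
        if value is not None and value <= 0:
            problems.append(f"{name} must be positive")
    # Compare 12 monthly payments with the annual figure when both are usable
    if annual and monthly and annual > 0 and monthly > 0:
        if abs(monthly * 12 - annual) / annual > tolerance:
            problems.append("monthly and annual premiums are inconsistent")
    if not record.get("currency"):
        problems.append("currency is missing")
    return problems
```

Records that return a non-empty list are better set aside for manual review than silently dropped, since an inconsistency often points to a parser that has drifted from the page layout.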

Conclusion

Health insurance plan and premium data collection across Southeast Asia requires geo-targeted proxy infrastructure to access local content from insurer websites, comparison platforms, and regulatory databases. DataResearchTools mobile proxies in all major SEA markets provide the reliable, localized access needed for comprehensive insurance market intelligence.

By automating plan data collection, premium tracking, and coverage analysis, insurance companies, brokers, and market researchers can maintain a real-time understanding of competitive dynamics across the rapidly evolving Southeast Asian health insurance landscape.

Start collecting health insurance intelligence with DataResearchTools mobile proxies today.