Scraping Health Insurance Plans and Premium Data

Health insurance markets in Southeast Asia are undergoing rapid transformation. With growing middle-class populations, government mandates for insurance coverage, and the rise of digital insurance platforms, the volume and complexity of health insurance products have grown sharply. For insurance companies, comparison platforms, brokers, and market researchers, collecting and analyzing health insurance plan data across the region is critical for competitive strategy.

This guide covers how to scrape health insurance plans, premium data, coverage details, and benefits information from insurance company websites, comparison platforms, and regulatory databases across Southeast Asia.

The Southeast Asian Health Insurance Landscape

Market Overview

Singapore

  • Mandatory MediShield Life provides basic coverage
  • Integrated Shield Plans (IPs) from private insurers provide enhanced coverage
  • Highly regulated by the Monetary Authority of Singapore (MAS)
  • Key insurers: AIA, Great Eastern, Prudential, Income Insurance (formerly NTUC Income), HSBC Life (which absorbed AXA Singapore)

Thailand

  • Universal Coverage Scheme covers most citizens
  • Growing private health insurance market
  • Office of Insurance Commission (OIC) regulates
  • Key players: AIA Thailand, Thai Life Insurance, Muang Thai Life

Indonesia

  • BPJS Kesehatan provides national health coverage
  • Rapidly growing private insurance market
  • OJK (Financial Services Authority) regulates
  • Key players: Allianz Indonesia, Prudential Indonesia, Manulife Indonesia

Philippines

  • PhilHealth provides basic coverage
  • Growing HMO (Health Maintenance Organization) market
  • Insurance Commission regulates
  • Key players: Maxicare, Intellicare, Medicard, Pacific Cross

Malaysia

  • Government healthcare widely accessible
  • Private health insurance growing
  • Bank Negara Malaysia regulates
  • Key players: AIA Malaysia, Great Eastern Malaysia, Prudential Malaysia

Vietnam

  • Vietnam Social Security provides basic coverage
  • Private insurance market expanding rapidly
  • Ministry of Finance regulates
  • Key players: Bao Viet, Prudential Vietnam, Manulife Vietnam

Data Points to Collect

Plan Details

  • Plan names and tiers (basic, standard, premium)
  • Coverage types (hospitalization, outpatient, dental, maternity)
  • Annual coverage limits
  • Co-payment and deductible structures
  • Waiting periods for different coverage types
  • Pre-existing condition policies
  • Network restrictions (panel vs. non-panel)
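
The plan-level fields listed above can be captured in a single record type so that every parser returns the same shape. The following is a minimal sketch, not a fixed schema: the class name PlanRecord and all field names are assumptions chosen for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PlanRecord:
    """One scraped health plan. Field names are illustrative assumptions."""
    insurer: str
    plan_name: str
    country: str                              # two-letter code, e.g. "SG"
    tier: Optional[str] = None                # "basic", "standard", "premium"
    coverage_types: list = field(default_factory=list)
    annual_limit: Optional[float] = None      # in local currency
    deductible: Optional[float] = None
    copay_pct: Optional[float] = None         # 10.0 means 10%
    waiting_period_days: dict = field(default_factory=dict)
    covers_pre_existing: Optional[bool] = None
    panel_only: Optional[bool] = None         # True if restricted to panel providers


# Placeholder values, not real plan data.
record = PlanRecord(
    insurer="Example Insurer",
    plan_name="Example Plan A",
    country="SG",
    tier="standard",
    coverage_types=["hospitalization", "outpatient"],
    waiting_period_days={"maternity": 300},
)
row = asdict(record)   # plain dict, ready for JSON or a database insert
```

Leaving unknown values as None rather than 0 keeps "not published" distinct from "zero" in later analysis.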

Premium Data

  • Monthly and annual premium rates
  • Premium variations by age, gender, and smoking status
  • Family plan pricing structures
  • Group insurance rate indicators
  • Premium loading for pre-existing conditions
  • Discount programs (no-claims, multi-year, bundling)

Benefits and Features

  • Hospital room and board limits
  • Surgical benefit schedules
  • Outpatient treatment coverage
  • Prescription drug coverage
  • Preventive care and wellness benefits
  • International coverage options
  • Telemedicine integration
  • Critical illness riders

Claims and Performance Data

  • Claim settlement ratios (where published)
  • Average claim processing time
  • Customer satisfaction ratings
  • Complaint statistics from regulators

Technical Implementation

Proxy Configuration

import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time
import json
import re

class InsuranceScraper:
    def __init__(self, proxy_user, proxy_pass):
        self.proxies = {
            "SG": f"http://{proxy_user}:{proxy_pass}@sg-mobile.dataresearchtools.com:8080",
            "TH": f"http://{proxy_user}:{proxy_pass}@th-mobile.dataresearchtools.com:8080",
            "ID": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
            "PH": f"http://{proxy_user}:{proxy_pass}@ph-mobile.dataresearchtools.com:8080",
            "MY": f"http://{proxy_user}:{proxy_pass}@my-mobile.dataresearchtools.com:8080",
            "VN": f"http://{proxy_user}:{proxy_pass}@vn-mobile.dataresearchtools.com:8080"
        }

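    def get_proxy(self, country):
        """Return a requests-style proxies dict for the given country code."""
        proxy_url = self.proxies[country]
        return {"http": proxy_url, "https": proxy_url}

    def get_currency(self, country):
        """Return the ISO 4217 currency code for a country (used as a fallback)."""
        currency_map = {
            "SG": "SGD", "TH": "THB", "ID": "IDR",
            "PH": "PHP", "MY": "MYR", "VN": "VND"
        }
        return currency_map.get(country)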

    def get_headers(self, country):
        lang_map = {
            "SG": "en-SG,en;q=0.9",
            "TH": "th-TH,th;q=0.9,en;q=0.8",
            "ID": "id-ID,id;q=0.9,en;q=0.8",
            "PH": "en-PH,en;q=0.9",
            "MY": "ms-MY,ms;q=0.9,en;q=0.8",
            "VN": "vi-VN,vi;q=0.9,en;q=0.8"
        }
        return {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 "
                          "Mobile Safari/537.36",
            "Accept-Language": lang_map.get(country, "en-US,en;q=0.9")
        }
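
Credentials that contain characters such as "@", ":" or "/" will corrupt a proxy URL if they are interpolated directly. One way to avoid this, sketched below, is to percent-encode them and read them from the environment instead of hard-coding them. The helper name build_proxy_url and the environment variable names PROXY_USER and PROXY_PASS are assumptions for illustration.

```python
import os
from urllib.parse import quote


def build_proxy_url(user, password, host, port=8080):
    """Return an HTTP proxy URL with percent-encoded credentials."""
    # safe="" makes quote() escape every reserved character, including "/".
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


# PROXY_USER / PROXY_PASS are hypothetical variable names; the defaults
# are dummy values so the sketch runs without any configuration.
user = os.environ.get("PROXY_USER", "demo-user")
password = os.environ.get("PROXY_PASS", "p@ss:word/1")

url = build_proxy_url(user, password, "sg-mobile.dataresearchtools.com")
proxies = {"http": url, "https": url}   # same shape that get_proxy() returns
```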

Insurance Company Website Scraping

class InsuranceCompanyScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_plan_details(self, insurer_config, country):
        """Scrape health insurance plan details from insurer website"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        plans = []

        for page in insurer_config["plan_pages"]:
            try:
                response = requests.get(
                    page["url"],
                    proxies=proxy,
                    headers=headers,
                    timeout=30
                )

                if response.status_code == 200:
                    parsed = page["parser"](response.text)
                    for plan in parsed:
                        plan["insurer"] = insurer_config["name"]
                        plan["country"] = country
                        plan["source_url"] = page["url"]
                        plan["collected_at"] = datetime.utcnow().isoformat()
                        plans.append(plan)

                time.sleep(2)
            except Exception as e:
                print(f"Error scraping {insurer_config['name']}: {e}")

        return plans

    def scrape_premium_calculator(self, insurer_config, country,
                                  age_range=range(25, 66, 5)):
        """
        Interact with premium calculators to collect pricing
        across different age groups
        """
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        premium_data = []

        for age in age_range:
            for gender in ["male", "female"]:
                try:
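                    # Some insurers expose a JSON endpoint behind their premium
                    # calculator. The payload fields below are illustrative;
                    # inspect each calculator's requests to find the real ones.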
                    response = requests.post(
                        insurer_config["calculator_url"],
                        json={
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            )
                        },
                        proxies=proxy,
                        headers={**headers, "Content-Type": "application/json"},
                        timeout=30
                    )

                    if response.status_code == 200:
                        result = response.json()
                        premium_data.append({
                            "insurer": insurer_config["name"],
                            "country": country,
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            ),
                            "monthly_premium": result.get("monthly_premium"),
                            "annual_premium": result.get("annual_premium"),
                            "currency": result.get("currency",
                                self.scraper.get_currency(country)),
                            "collected_at": datetime.utcnow().isoformat()
                        })

                    time.sleep(1)
                except Exception as e:
                    print(f"Calculator error for age {age}: {e}")

        return premium_data
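
The scrape_plan_details method expects each entry in insurer_config["plan_pages"] to carry a "parser" callable that takes HTML text and returns a list of plan dicts. Real parsers depend entirely on each site's markup, so the following is only a sketch under stated assumptions: it uses the standard library's html.parser, and it assumes a hypothetical table whose rows contain plan name, annual limit, and annual premium, with commas as thousands separators (Indonesian and Vietnamese sites often use dots instead).

```python
from html.parser import HTMLParser


class PlanTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:               # header rows made only of <th> stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_plan_table(html):
    """Hypothetical parser for rows of (plan name, annual limit, annual premium)."""
    parser = PlanTableParser()
    parser.feed(html)
    plans = []
    for cells in parser.rows:
        if len(cells) < 3:
            continue
        plans.append({
            "plan_name": cells[0],
            "annual_limit": float(cells[1].replace(",", "")),
            "annual_premium": float(cells[2].replace(",", "")),
        })
    return plans


# Made-up markup and figures, for illustration only.
sample = """
<table>
  <tr><th>Plan</th><th>Annual limit</th><th>Annual premium</th></tr>
  <tr><td>Example Basic</td><td>100,000</td><td>1,200</td></tr>
  <tr><td>Example Plus</td><td>500,000</td><td>2,400.50</td></tr>
</table>
"""
plans = parse_plan_table(sample)
```

A function like parse_plan_table could then be supplied as page["parser"] in the insurer configuration.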

Comparison Platform Scraping

class ComparisonPlatformScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_comparison_site(self, platform_config, country):
        """Scrape insurance comparison platforms"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)

        comparison_data = []

        try:
            response = requests.get(
                platform_config["url"],
                params=platform_config.get("params", {}),
                proxies=proxy,
                headers=headers,
                timeout=30
            )

            if response.status_code == 200:
                plans = platform_config["parser"](response.text)
                for plan in plans:
                    plan["platform"] = platform_config["name"]
                    plan["country"] = country
                    plan["collected_at"] = datetime.utcnow().isoformat()
                    comparison_data.append(plan)

        except Exception as e:
            print(f"Error scraping {platform_config['name']}: {e}")

        return comparison_data

    def collect_sg_shield_plans(self):
        """Collect Singapore Integrated Shield Plan comparisons"""
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")

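        # Official IP comparison data is published by Singapore government
        # agencies. Verify this URL before use; parse_shield_plan_comparison()
        # is not defined here and must be written for the page's markup.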
        try:
            response = requests.get(
                "https://www.cpf.gov.sg/member/healthcare-financing/"
                "medishield-life/comparison-of-integrated-shield-plans",
                proxies=proxy,
                headers=headers,
                timeout=30
            )

            if response.status_code == 200:
                return self.parse_shield_plan_comparison(response.text)
        except Exception as e:
            print(f"Error collecting Shield Plan data: {e}")
        return []

Regulatory Data Collection

class InsuranceRegulatoryMonitor:
    def __init__(self, scraper):
        self.scraper = scraper

    def monitor_mas_singapore(self):
        """Monitor MAS for insurance regulatory updates.

        parse_mas_updates() is site-specific and is not shown here.
        """
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")

        updates = []
        try:
            response = requests.get(
                "https://www.mas.gov.sg/regulation/insurance",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                updates = self.parse_mas_updates(response.text)
        except Exception as e:
            print(f"Error monitoring MAS: {e}")
        return updates

    def monitor_ojk_indonesia(self):
        """Monitor OJK for insurance regulatory updates.

        parse_ojk_updates() is site-specific and is not shown here.
        """
        proxy = self.scraper.get_proxy("ID")
        headers = self.scraper.get_headers("ID")

        updates = []
        try:
            response = requests.get(
                "https://www.ojk.go.id/id/kanal/iknb/regulasi/asuransi",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                updates = self.parse_ojk_updates(response.text)
        except Exception as e:
            print(f"Error monitoring OJK: {e}")
        return updates
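
The monitors above fetch a regulator page on each run, but deciding whether anything changed is a separate step. One simple approach, offered here as a sketch and not as the method the monitors actually use, is to hash the page's visible text and compare it with the fingerprint stored from the previous run. The function names are assumptions, and the regex-based tag stripping is deliberately crude.

```python
import hashlib
import re


def content_fingerprint(html):
    """Hash the visible text of a page so markup-only changes are ignored."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)              # drop remaining tags
    text = re.sub(r"\s+", " ", text).strip().lower()  # normalize whitespace
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(html, previous_fingerprint):
    """Return (changed, new_fingerprint) for the latest page fetch."""
    current = content_fingerprint(html)
    return current != previous_fingerprint, current


# Made-up pages for illustration.
old_page = "<html><body><h1>Notices</h1><p>Circular 1</p></body></html>"
new_page = ("<html><body><h1>Notices</h1><p>Circular 1</p>"
            "<p>Circular 2</p></body></html>")
reformatted = "<html><body>\n  <h1>Notices</h1>\n  <p>Circular 1</p>\n</body></html>"

baseline = content_fingerprint(old_page)
changed, latest = has_changed(new_page, baseline)
changed_after_reformat, _ = has_changed(reformatted, baseline)
```

A fingerprint only says that something changed. Identifying the new circular still requires a page-specific parser.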

Data Analysis

Premium Comparison Analysis

class PremiumAnalyzer:
    def compare_premiums(self, premium_data, country, age, plan_tier):
        """Compare premiums across insurers for specific demographics"""
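        filtered = [
            p for p in premium_data
            if p["country"] == country
            and p["age"] == age
            and plan_tier.lower() in p.get("plan", "").lower()
            # Skip quotes with no premium so sorting and averaging cannot fail
            and p.get("annual_premium") is not None
        ]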

        if not filtered:
            return None

        comparison = {
            "country": country,
            "age": age,
            "plan_tier": plan_tier,
            "insurers": [],
            "cheapest": None,
            "most_expensive": None,
            "avg_annual_premium": 0
        }

        premiums = []
        for p in filtered:
            annual = p.get("annual_premium", 0)
            premiums.append(annual)
            comparison["insurers"].append({
                "name": p["insurer"],
                "annual_premium": annual,
                "monthly_premium": p.get("monthly_premium"),
                "currency": p["currency"]
            })

        comparison["insurers"].sort(key=lambda x: x["annual_premium"])
        comparison["cheapest"] = comparison["insurers"][0]
        comparison["most_expensive"] = comparison["insurers"][-1]
        comparison["avg_annual_premium"] = sum(premiums) / len(premiums)
        comparison["spread_pct"] = (
            (comparison["most_expensive"]["annual_premium"] -
             comparison["cheapest"]["annual_premium"]) /
            comparison["cheapest"]["annual_premium"] * 100
        ) if comparison["cheapest"]["annual_premium"] > 0 else 0

        return comparison
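
Because scrape_premium_calculator records one quote per gender, the same insurer can appear more than once in a compare_premiums result for a given age. If a single figure per insurer is preferred, one option is to collapse the quotes before ranking. The sketch below uses the median; the function name and the sample figures are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def median_premium_by_insurer(quotes):
    """Group quotes by insurer and return {insurer: median annual premium}."""
    grouped = defaultdict(list)
    for q in quotes:
        premium = q.get("annual_premium")
        if premium is not None:          # ignore quotes with no premium
            grouped[q["insurer"]].append(premium)
    return {name: median(values) for name, values in grouped.items()}


# Placeholder figures for illustration only, not real market data.
quotes = [
    {"insurer": "Insurer A", "gender": "male", "annual_premium": 1200},
    {"insurer": "Insurer A", "gender": "female", "annual_premium": 1300},
    {"insurer": "Insurer B", "gender": "male", "annual_premium": 1500},
    {"insurer": "Insurer B", "gender": "female", "annual_premium": None},
]
medians = median_premium_by_insurer(quotes)
cheapest = min(medians, key=medians.get)
```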

    def analyze_premium_trends(self, historical_premiums, insurer,
                               country, age):
        """Analyze premium trends over time"""
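        filtered = [
            p for p in historical_premiums
            if p["insurer"] == insurer
            and p["country"] == country
            and p["age"] == age
            # Skip records with no premium so the change calculation cannot fail
            and p.get("annual_premium") is not None
        ]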

        if len(filtered) < 2:
            return None

        sorted_data = sorted(filtered, key=lambda x: x["collected_at"])
        first = sorted_data[0]["annual_premium"]
        latest = sorted_data[-1]["annual_premium"]

        return {
            "insurer": insurer,
            "country": country,
            "age": age,
            "first_premium": first,
            "latest_premium": latest,
            "change_pct": ((latest - first) / first * 100) if first > 0 else 0,
            "data_points": len(sorted_data),
            "period": f"{sorted_data[0]['collected_at'][:10]} to "
                      f"{sorted_data[-1]['collected_at'][:10]}"
        }

Coverage Comparison

def compare_coverage(plans, coverage_aspect):
    """Compare specific coverage aspects across plans"""
    comparison = []
    for plan in plans:
        coverage_value = plan.get("coverage", {}).get(coverage_aspect)
        if coverage_value is not None:
            comparison.append({
                "insurer": plan["insurer"],
                "plan_name": plan.get("plan_name"),
                "country": plan["country"],
                "coverage_aspect": coverage_aspect,
                "coverage_value": coverage_value,
                "annual_premium": plan.get("annual_premium"),
                "value_ratio": (
                    coverage_value / plan["annual_premium"]
                    # "or 0" guards against a stored premium of None
                    if (plan.get("annual_premium") or 0) > 0 else None
                )
            })

    return sorted(
        comparison,
        key=lambda x: x.get("value_ratio", 0) or 0,
        reverse=True
    )

Reporting

Executive Summary Report

Generate regular reports with these key metrics:

  • Premium trends: Average premium changes by market, insurer, and age group
  • Market competitiveness: Premium spreads and positioning by plan tier
  • Coverage evolution: Changes in coverage limits and benefits over time
  • Regulatory updates: New regulations affecting product design or pricing
  • Market entry activity: New plans launched or discontinued
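
Several of these metrics come from comparing two collection runs. As a rough sketch, plans can be keyed by insurer and plan name and compared with set operations to find launches, discontinuations, and coverage-limit changes. The function name, the key choice, and the "annual_limit" field are assumptions for illustration.

```python
def diff_plan_snapshots(previous, current):
    """Compare two lists of plan dicts keyed by (insurer, plan_name)."""
    def key(plan):
        return (plan["insurer"], plan["plan_name"])

    prev_by_key = {key(p): p for p in previous}
    curr_by_key = {key(p): p for p in current}

    launched = sorted(curr_by_key.keys() - prev_by_key.keys())
    discontinued = sorted(prev_by_key.keys() - curr_by_key.keys())

    limit_changes = []
    for k in sorted(prev_by_key.keys() & curr_by_key.keys()):
        before = prev_by_key[k].get("annual_limit")
        after = curr_by_key[k].get("annual_limit")
        if before != after:
            limit_changes.append({"plan": k, "before": before, "after": after})

    return {"launched": launched, "discontinued": discontinued,
            "limit_changes": limit_changes}


# Placeholder snapshots, not real plan data.
last_month = [
    {"insurer": "Insurer A", "plan_name": "Basic", "annual_limit": 100000},
    {"insurer": "Insurer A", "plan_name": "Legacy", "annual_limit": 50000},
]
this_month = [
    {"insurer": "Insurer A", "plan_name": "Basic", "annual_limit": 150000},
    {"insurer": "Insurer B", "plan_name": "Plus", "annual_limit": 300000},
]
report = diff_plan_snapshots(last_month, this_month)
```

A renamed plan shows up as one discontinuation plus one launch, so results are worth a manual review before they go into a report.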

Competitive Intelligence Dashboard

Track these metrics on each collection run, as close to real time as your scraping schedule allows:

  • Premium positioning map (coverage vs. price scatter)
  • Market share estimates by premium volume
  • New product launch timeline
  • Regulatory change impact assessments
  • Customer satisfaction benchmarks

Best Practices

  1. Use country-specific mobile proxies: Insurance websites serve different content based on location. DataResearchTools mobile proxies help you see the plans and pricing available to local consumers in each SEA market.
  2. Capture premium calculations systematically: Run premium calculators for standardized demographic profiles to enable meaningful cross-insurer comparison.
  3. Track plan details, not just premiums: Coverage details, exclusions, and waiting periods are as important as pricing for competitive analysis.
  4. Monitor regulatory changes: Insurance regulations in SEA markets change frequently, and regulatory changes often precede product redesigns and pricing adjustments.
  5. Validate with published data: Cross-reference your scraped data against published regulatory reports and industry statistics to ensure accuracy.
  6. Respect data sensitivity: Insurance plan information is public, but customer data is not. Never attempt to access authenticated customer portals or claims systems.
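
The "standardized demographic profiles" mentioned above can be generated once and reused for every insurer, so that all quotes cover the same grid. A minimal sketch follows. The ages match the default age_range of scrape_premium_calculator, while the function name and the inclusion of smoker profiles are assumptions (the scraper shown earlier only requests non-smoker quotes).

```python
from itertools import product

AGES = range(25, 66, 5)             # 25, 30, ..., 65
GENDERS = ("male", "female")
SMOKER_FLAGS = (False, True)


def build_profiles(ages=AGES, genders=GENDERS, smoker_flags=SMOKER_FLAGS):
    """Return every age/gender/smoker combination as a list of dicts."""
    return [
        {"age": age, "gender": gender, "smoker": smoker}
        for age, gender, smoker in product(ages, genders, smoker_flags)
    ]


profiles = build_profiles()
# Each profile can be sent to a calculator endpoint, and the same list
# reused for every insurer so results line up one-to-one.
```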

Conclusion

Health insurance plan and premium data collection across Southeast Asia requires geo-targeted proxy infrastructure to access local content from insurer websites, comparison platforms, and regulatory databases. DataResearchTools mobile proxies in all major SEA markets provide the reliable, localized access needed for comprehensive insurance market intelligence.

By automating plan data collection, premium tracking, and coverage analysis, insurance companies, brokers, and market researchers can maintain a current view of competitive dynamics across the rapidly evolving Southeast Asian health insurance landscape.

Start collecting health insurance intelligence with DataResearchTools mobile proxies today.

