Scraping Health Insurance Plans and Premium Data
Health insurance markets in Southeast Asia are undergoing rapid transformation. With growing middle-class populations, government mandates for insurance coverage, and the rise of digital insurance platforms, the volume and complexity of health insurance products have exploded. For insurance companies, comparison platforms, brokers, and market researchers, collecting and analyzing health insurance plan data across the region is critical for competitive strategy.
This guide covers how to scrape health insurance plans, premium data, coverage details, and benefits information from insurance company websites, comparison platforms, and regulatory databases across Southeast Asia.
The Southeast Asian Health Insurance Landscape
Market Overview
Singapore
- Mandatory MediShield Life provides basic coverage
- Integrated Shield Plans (IPs) from private insurers provide enhanced coverage
- Highly regulated by the Monetary Authority of Singapore (MAS)
- Key insurers: AIA, Great Eastern, Prudential, NTUC Income, AXA
Thailand
- Universal Coverage Scheme covers most citizens
- Growing private health insurance market
- Office of Insurance Commission (OIC) regulates
- Key players: AIA Thailand, Thai Life Insurance, Muang Thai Life
Indonesia
- BPJS Kesehatan provides national health coverage
- Rapidly growing private insurance market
- OJK (Financial Services Authority) regulates
- Key players: Allianz Indonesia, Prudential Indonesia, Manulife Indonesia
Philippines
- PhilHealth provides basic coverage
- Growing HMO (Health Maintenance Organization) market
- Insurance Commission regulates
- Key players: Maxicare, Intellicare, Medicard, Pacific Cross
Malaysia
- Government healthcare widely accessible
- Private health insurance growing
- Bank Negara Malaysia regulates
- Key players: AIA Malaysia, Great Eastern Malaysia, Prudential Malaysia
Vietnam
- Vietnam Social Security provides basic coverage
- Private insurance market expanding rapidly
- Ministry of Finance regulates
- Key players: Bao Viet, Prudential Vietnam, Manulife Vietnam
Data Points to Collect
Plan Details
- Plan names and tiers (basic, standard, premium)
- Coverage types (hospitalization, outpatient, dental, maternity)
- Annual coverage limits
- Co-payment and deductible structures
- Waiting periods for different coverage types
- Pre-existing condition policies
- Network restrictions (panel vs. non-panel)
Premium Data
- Monthly and annual premium rates
- Premium variations by age, gender, and smoking status
- Family plan pricing structures
- Group insurance rate indicators
- Premium loading for pre-existing conditions
- Discount programs (no-claims, multi-year, bundling)
Benefits and Features
- Hospital room and board limits
- Surgical benefit schedules
- Outpatient treatment coverage
- Prescription drug coverage
- Preventive care and wellness benefits
- International coverage options
- Telemedicine integration
- Critical illness riders
Claims and Performance Data
- Claim settlement ratios (where published)
- Average claim processing time
- Customer satisfaction ratings
- Complaint statistics from regulators
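Before collection starts, it helps to fix a single record shape that the fields above map into. A minimal sketch of one such schema (the field names here are illustrative choices, not any insurer's API):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class PlanRecord:
    # Identity fields
    insurer: str
    country: str                 # market code, e.g. "SG"
    plan_name: str
    tier: str                    # "basic" / "standard" / "premium"
    # Coverage fields (None = not published by the insurer)
    annual_limit: Optional[float] = None
    room_and_board_limit: Optional[float] = None
    outpatient_covered: bool = False
    waiting_period_days: Optional[int] = None
    # Pricing fields
    annual_premium: Optional[float] = None
    currency: str = "USD"

plan = PlanRecord(insurer="ExampleCo", country="SG",
                  plan_name="Shield Plus", tier="premium",
                  annual_limit=1_000_000, currency="SGD")
print(asdict(plan)["currency"])  # -> SGD
```

Normalizing every source into one schema up front makes the cross-insurer comparisons later in this guide straightforward.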
Technical Implementation
Proxy Configuration
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time
import json
import re


class InsuranceScraper:
    def __init__(self, proxy_user, proxy_pass):
        self.proxies = {
            "SG": f"http://{proxy_user}:{proxy_pass}@sg-mobile.dataresearchtools.com:8080",
            "TH": f"http://{proxy_user}:{proxy_pass}@th-mobile.dataresearchtools.com:8080",
            "ID": f"http://{proxy_user}:{proxy_pass}@id-mobile.dataresearchtools.com:8080",
            "PH": f"http://{proxy_user}:{proxy_pass}@ph-mobile.dataresearchtools.com:8080",
            "MY": f"http://{proxy_user}:{proxy_pass}@my-mobile.dataresearchtools.com:8080",
            "VN": f"http://{proxy_user}:{proxy_pass}@vn-mobile.dataresearchtools.com:8080"
        }

    def get_proxy(self, country):
        proxy_url = self.proxies[country]
        return {"http": proxy_url, "https": proxy_url}

    def get_currency(self, country):
        # Fallback currency per market, used when a calculator
        # response omits the currency field
        return {
            "SG": "SGD", "TH": "THB", "ID": "IDR",
            "PH": "PHP", "MY": "MYR", "VN": "VND"
        }.get(country, "USD")

    def get_headers(self, country):
        lang_map = {
            "SG": "en-SG,en;q=0.9",
            "TH": "th-TH,th;q=0.9,en;q=0.8",
            "ID": "id-ID,id;q=0.9,en;q=0.8",
            "PH": "en-PH,en;q=0.9",
            "MY": "ms-MY,ms;q=0.9,en;q=0.8",
            "VN": "vi-VN,vi;q=0.9,en;q=0.8"
        }
        return {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 "
                          "Mobile Safari/537.36",
            "Accept-Language": lang_map.get(country, "en-US,en;q=0.9")
        }
```

Insurance Company Website Scraping
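The company scraper below is driven by a per-insurer config dict. A hypothetical example of the expected shape (the URL, CSS selectors, and insurer name are placeholders, not a real site):

```python
from bs4 import BeautifulSoup

def parse_example_plans(html):
    # Hypothetical parser: selectors depend entirely on the target site
    soup = BeautifulSoup(html, "html.parser")
    plans = []
    for card in soup.select(".plan-card"):
        name = card.select_one(".plan-name")
        plans.append({"plan_name": name.get_text(strip=True) if name else None})
    return plans

example_insurer_config = {
    "name": "ExampleCo",                                   # placeholder insurer
    "plan_pages": [
        {"url": "https://example.com/plans", "parser": parse_example_plans}
    ],
    "calculator_url": "https://example.com/api/premium",   # placeholder endpoint
    "default_plan": "standard",
}
```

Each `plan_pages` entry pairs a URL with its own parser callable, since every insurer's markup differs.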
```python
class InsuranceCompanyScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_plan_details(self, insurer_config, country):
        """Scrape health insurance plan details from insurer website"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        plans = []
        for page in insurer_config["plan_pages"]:
            try:
                response = requests.get(
                    page["url"],
                    proxies=proxy,
                    headers=headers,
                    timeout=30
                )
                if response.status_code == 200:
                    parsed = page["parser"](response.text)
                    for plan in parsed:
                        plan["insurer"] = insurer_config["name"]
                        plan["country"] = country
                        plan["source_url"] = page["url"]
                        plan["collected_at"] = datetime.utcnow().isoformat()
                        plans.append(plan)
                time.sleep(2)
            except Exception as e:
                print(f"Error scraping {insurer_config['name']}: {e}")
        return plans

    def scrape_premium_calculator(self, insurer_config, country,
                                  age_range=range(25, 66, 5)):
        """
        Interact with premium calculators to collect pricing
        across different age groups
        """
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        premium_data = []
        for age in age_range:
            for gender in ["male", "female"]:
                try:
                    # Many insurers have API endpoints for premium calculation
                    response = requests.post(
                        insurer_config["calculator_url"],
                        json={
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            )
                        },
                        proxies=proxy,
                        headers={**headers, "Content-Type": "application/json"},
                        timeout=30
                    )
                    if response.status_code == 200:
                        result = response.json()
                        premium_data.append({
                            "insurer": insurer_config["name"],
                            "country": country,
                            "age": age,
                            "gender": gender,
                            "smoker": False,
                            "plan": insurer_config.get(
                                "default_plan", "standard"
                            ),
                            "monthly_premium": result.get("monthly_premium"),
                            "annual_premium": result.get("annual_premium"),
                            "currency": result.get(
                                "currency",
                                self.scraper.get_currency(country)
                            ),
                            "collected_at": datetime.utcnow().isoformat()
                        })
                    time.sleep(1)
                except Exception as e:
                    print(f"Calculator error for age {age}: {e}")
        return premium_data
```

Comparison Platform Scraping
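Comparison sites frequently list the same plan under slightly different names, so results from multiple platforms should be deduplicated before analysis. A simple sketch, assuming records carry `insurer` and `plan_name` fields as in the scrapers in this guide:

```python
def dedupe_plans(plans):
    """Keep one record per (insurer, plan name) pair, normalizing case
    and whitespace so 'HealthShield Gold' matches ' healthshield  gold '."""
    seen = {}
    for plan in plans:
        key = (
            (plan.get("insurer") or "").strip().lower(),
            " ".join((plan.get("plan_name") or "").lower().split()),
        )
        # First occurrence wins; later duplicates are dropped
        seen.setdefault(key, plan)
    return list(seen.values())

plans = [
    {"insurer": "AIA", "plan_name": "HealthShield Gold", "platform": "A"},
    {"insurer": "aia", "plan_name": "  healthshield   gold ", "platform": "B"},
]
print(len(dedupe_plans(plans)))  # -> 1
```

A fuzzier match (e.g. edit distance) may be needed in practice; exact normalized keys are the conservative starting point.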
```python
class ComparisonPlatformScraper:
    def __init__(self, scraper):
        self.scraper = scraper

    def scrape_comparison_site(self, platform_config, country):
        """Scrape insurance comparison platforms"""
        proxy = self.scraper.get_proxy(country)
        headers = self.scraper.get_headers(country)
        comparison_data = []
        try:
            response = requests.get(
                platform_config["url"],
                params=platform_config.get("params", {}),
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                plans = platform_config["parser"](response.text)
                for plan in plans:
                    plan["platform"] = platform_config["name"]
                    plan["country"] = country
                    plan["collected_at"] = datetime.utcnow().isoformat()
                    comparison_data.append(plan)
        except Exception as e:
            print(f"Error scraping {platform_config['name']}: {e}")
        return comparison_data

    def collect_sg_shield_plans(self):
        """Collect Singapore Integrated Shield Plan comparisons"""
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")
        # MAS and CPF Board publish IP comparison data
        try:
            response = requests.get(
                "https://www.cpf.gov.sg/member/healthcare-financing/"
                "medishield-life/comparison-of-integrated-shield-plans",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                return self.parse_shield_plan_comparison(response.text)
        except Exception as e:
            print(f"Error collecting Shield Plan data: {e}")
        return []
```

Regulatory Data Collection
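Regulatory pages change infrequently, so monitors like the ones below are typically run on a schedule and should alert only on items not seen in previous runs. One sketch of that dedup step, assuming each update is a plain dict:

```python
import hashlib
import json

def new_updates(updates, seen_hashes):
    """Return only updates absent from previous runs. The hash covers
    the whole record, so any changed field re-surfaces the item."""
    fresh = []
    for update in updates:
        digest = hashlib.sha256(
            json.dumps(update, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(update)
    return fresh

seen = set()
batch = [{"title": "Notice 123", "date": "2025-01-10"}]
print(len(new_updates(batch, seen)))  # -> 1
print(len(new_updates(batch, seen)))  # -> 0
```

In production the `seen_hashes` set would be persisted (a file or database) between runs.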
```python
class InsuranceRegulatoryMonitor:
    def __init__(self, scraper):
        self.scraper = scraper

    def monitor_mas_singapore(self):
        """Monitor MAS for insurance regulatory updates"""
        proxy = self.scraper.get_proxy("SG")
        headers = self.scraper.get_headers("SG")
        updates = []
        try:
            response = requests.get(
                "https://www.mas.gov.sg/regulation/insurance",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                updates = self.parse_mas_updates(response.text)
        except Exception as e:
            print(f"Error monitoring MAS: {e}")
        return updates

    def monitor_ojk_indonesia(self):
        """Monitor OJK for insurance regulatory updates"""
        proxy = self.scraper.get_proxy("ID")
        headers = self.scraper.get_headers("ID")
        updates = []
        try:
            response = requests.get(
                "https://www.ojk.go.id/id/kanal/iknb/regulasi/asuransi",
                proxies=proxy,
                headers=headers,
                timeout=30
            )
            if response.status_code == 200:
                updates = self.parse_ojk_updates(response.text)
        except Exception as e:
            print(f"Error monitoring OJK: {e}")
        return updates
```

Data Analysis
Premium Comparison Analysis
```python
class PremiumAnalyzer:
    def compare_premiums(self, premium_data, country, age, plan_tier):
        """Compare premiums across insurers for specific demographics"""
        filtered = [
            p for p in premium_data
            if p["country"] == country
            and p["age"] == age
            and plan_tier.lower() in p.get("plan", "").lower()
        ]
        if not filtered:
            return None
        comparison = {
            "country": country,
            "age": age,
            "plan_tier": plan_tier,
            "insurers": [],
            "cheapest": None,
            "most_expensive": None,
            "avg_annual_premium": 0
        }
        premiums = []
        for p in filtered:
            # Calculator responses may omit the premium; treat missing as 0
            annual = p.get("annual_premium") or 0
            premiums.append(annual)
            comparison["insurers"].append({
                "name": p["insurer"],
                "annual_premium": annual,
                "monthly_premium": p.get("monthly_premium"),
                "currency": p["currency"]
            })
        comparison["insurers"].sort(key=lambda x: x["annual_premium"])
        comparison["cheapest"] = comparison["insurers"][0]
        comparison["most_expensive"] = comparison["insurers"][-1]
        comparison["avg_annual_premium"] = sum(premiums) / len(premiums)
        comparison["spread_pct"] = (
            (comparison["most_expensive"]["annual_premium"] -
             comparison["cheapest"]["annual_premium"]) /
            comparison["cheapest"]["annual_premium"] * 100
        ) if comparison["cheapest"]["annual_premium"] > 0 else 0
        return comparison

    def analyze_premium_trends(self, historical_premiums, insurer,
                               country, age):
        """Analyze premium trends over time"""
        filtered = [
            p for p in historical_premiums
            if p["insurer"] == insurer
            and p["country"] == country
            and p["age"] == age
        ]
        if len(filtered) < 2:
            return None
        sorted_data = sorted(filtered, key=lambda x: x["collected_at"])
        first = sorted_data[0].get("annual_premium") or 0
        latest = sorted_data[-1].get("annual_premium") or 0
        return {
            "insurer": insurer,
            "country": country,
            "age": age,
            "first_premium": first,
            "latest_premium": latest,
            "change_pct": ((latest - first) / first * 100) if first > 0 else 0,
            "data_points": len(sorted_data),
            "period": f"{sorted_data[0]['collected_at'][:10]} to "
                      f"{sorted_data[-1]['collected_at'][:10]}"
        }
```

Coverage Comparison
```python
def compare_coverage(plans, coverage_aspect):
    """Compare specific coverage aspects across plans"""
    comparison = []
    for plan in plans:
        coverage_value = plan.get("coverage", {}).get(coverage_aspect)
        if coverage_value is not None:
            comparison.append({
                "insurer": plan["insurer"],
                "plan_name": plan.get("plan_name"),
                "country": plan["country"],
                "coverage_aspect": coverage_aspect,
                "coverage_value": coverage_value,
                "annual_premium": plan.get("annual_premium"),
                "value_ratio": (
                    coverage_value / plan["annual_premium"]
                    # Guard against missing or None premiums
                    if plan.get("annual_premium") else None
                )
            })
    return sorted(
        comparison,
        key=lambda x: x.get("value_ratio", 0) or 0,
        reverse=True
    )
```

Reporting
Executive Summary Report
Generate regular reports with these key metrics:
- Premium trends: Average premium changes by market, insurer, and age group
- Market competitiveness: Premium spreads and positioning by plan tier
- Coverage evolution: Changes in coverage limits and benefits over time
- Regulatory updates: New regulations affecting product design or pricing
- Market entry activity: New plans launched or discontinued
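The headline numbers for such a report can be assembled directly from analyzer output. A minimal sketch, assuming a list of trend dicts shaped like the `analyze_premium_trends` results above:

```python
def build_summary(trend_results):
    """Aggregate per-insurer trend results into report headline numbers
    (the input record shape is an assumption)."""
    valid = [t for t in trend_results if t]  # drop None results
    if not valid:
        return {"markets": 0, "avg_change_pct": 0.0, "largest_increase": None}
    avg_change = sum(t["change_pct"] for t in valid) / len(valid)
    largest = max(valid, key=lambda t: t["change_pct"])
    return {
        "markets": len({t["country"] for t in valid}),
        "avg_change_pct": round(avg_change, 2),
        "largest_increase": f"{largest['insurer']} ({largest['country']})",
    }

trends = [
    {"insurer": "A", "country": "SG", "change_pct": 4.0},
    {"insurer": "B", "country": "TH", "change_pct": 8.0},
]
print(build_summary(trends)["avg_change_pct"])  # -> 6.0
```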
Competitive Intelligence Dashboard
Track these metrics in real-time:
- Premium positioning map (coverage vs. price scatter)
- Market share estimates by premium volume
- New product launch timeline
- Regulatory change impact assessments
- Customer satisfaction benchmarks
Best Practices
- Use country-specific mobile proxies: Insurance websites serve different content based on location. DataResearchTools mobile proxies ensure you see the plans and pricing available to local consumers in each SEA market.
- Capture premium calculations systematically: Run premium calculators for standardized demographic profiles to enable meaningful cross-insurer comparison.
- Track plan details, not just premiums: Coverage details, exclusions, and waiting periods are as important as pricing for competitive analysis.
- Monitor regulatory changes: Insurance regulations in SEA markets change frequently. Regulatory changes often precede product redesigns and pricing adjustments.
- Validate with published data: Cross-reference your scraped data against published regulatory reports and industry statistics to ensure accuracy.
- Respect data sensitivity: Insurance plan information is public, but customer data is not. Never attempt to access customer-facing portals or claims systems.
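To capture premium calculations systematically, as recommended above, quote every insurer on the same fixed demographic grid. A sketch of one such grid (the specific ages and attributes are illustrative choices):

```python
from itertools import product

def standard_profiles(ages=(30, 40, 50), genders=("male", "female"),
                      smoker_states=(False, True)):
    """Fixed demographic grid so every insurer's calculator is queried
    on identical inputs, making quotes directly comparable."""
    return [
        {"age": age, "gender": gender, "smoker": smoker}
        for age, gender, smoker in product(ages, genders, smoker_states)
    ]

profiles = standard_profiles()
print(len(profiles))  # -> 12 (3 ages x 2 genders x 2 smoker states)
```

Feeding each profile into a calculator method like `scrape_premium_calculator` yields an apples-to-apples pricing matrix per market.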
Conclusion
Health insurance plan and premium data collection across Southeast Asia requires geo-targeted proxy infrastructure to access local content from insurer websites, comparison platforms, and regulatory databases. DataResearchTools mobile proxies in all major SEA markets provide the reliable, localized access needed for comprehensive insurance market intelligence.
By automating plan data collection, premium tracking, and coverage analysis, insurance companies, brokers, and market researchers can maintain a real-time understanding of competitive dynamics across the rapidly evolving Southeast Asian health insurance landscape.
Start collecting health insurance intelligence with DataResearchTools mobile proxies today.
Related Reading
- How AI + Proxies Are Transforming Drug Discovery Data Pipelines
- Best Proxies for Healthcare Data Collection in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix