How to Scrape Government Vehicle Registration and Auction Data
Government databases and auction platforms contain some of the most valuable and authoritative vehicle data available. Registration statistics reveal market trends, auction results establish fair market values, and deregistration data shows fleet turnover patterns. For automotive businesses, researchers, and data providers in Southeast Asia, accessing this data systematically through web scraping unlocks insights that are simply not available elsewhere.
This guide covers how to collect government vehicle registration data and auction information across Southeast Asian markets using proxy infrastructure.
Government Vehicle Data Sources by Country
Singapore
Singapore has some of the most accessible government vehicle data in the region:
Land Transport Authority (LTA):
- Vehicle registration and deregistration statistics
- COE bidding results (published after each twice-monthly bidding exercise)
- Open data portal with vehicle population datasets
- Road tax and vehicle inspection data
OneMotoring:
- Vehicle registration services
- COE statistics and trends
- Vehicle inspection records
Government Auctions:
- Singapore Customs auctions (seized vehicles)
- Government surplus vehicle sales
- Statutory board vehicle disposals
Malaysia
Jabatan Pengangkutan Jalan (JPJ):
- Vehicle registration database
- Road tax status checks
- Ownership transfer records
Government Auctions:
- PUSPAKOM condemned vehicle data
- Royal Malaysian Customs (seized vehicles)
- Government fleet disposal
Thailand
Department of Land Transport (DLT):
- Vehicle registration statistics
- Annual vehicle inspection data
- Commercial vehicle licensing
Thai Government Auctions:
- Customs Department auctions
- Revenue Department seized property sales
- State-owned vehicle disposals
Indonesia
Korlantas Polri (Traffic Police):
- Vehicle registration data (STNK)
- Vehicle ownership records
Government Auctions:
- DJKN (State Asset Management) auctions
- Customs auction platforms
- BUMN (state enterprise) vehicle disposals
Proxy Requirements for Government Sites
Technical Challenges
Government websites present unique scraping challenges:
- Rate limiting: Government sites often have strict rate limits due to limited infrastructure
- Geographic restrictions: Some data is only accessible from within the country
- Session management: Multi-step data lookups require session continuity
- Legacy technology: Older sites may use non-standard HTML or require specific browser behaviors
- CAPTCHA protection: Many government forms include CAPTCHA challenges
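Given the strict rate limits and CAPTCHA protection above, requests to government sites should be paced conservatively and retried with exponential backoff. A minimal sketch (`fetch_with_backoff` is a hypothetical helper, not part of any library):

```python
import random
import time

def fetch_with_backoff(session, url, max_retries=4, base_delay=5.0, **kwargs):
    """GET a URL politely: exponential backoff on 429/5xx, bail out on CAPTCHA pages."""
    for attempt in range(max_retries):
        response = session.get(url, timeout=30, **kwargs)
        if response.status_code == 429 or response.status_code >= 500:
            # Back off 5s, 10s, 20s, ... plus jitter before retrying
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
            continue
        if "captcha" in response.text.lower():
            # A CAPTCHA page usually means this IP is flagged; rotate the session and retry later
            return None
        return response
    return None
```

In practice the CAPTCHA check would inspect for the specific challenge markup each site uses; a substring test is only a first approximation.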
DataResearchTools Proxy Configuration
For government data collection, the key requirements are geographic authenticity and session stability:
import time

class GovDataProxyManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"

    def get_gov_proxy(self, country):
        """Get a sticky proxy for government site access."""
        session_id = f"gov-{country}-{int(time.time()) % 100000}"
        auth = f"{self.api_key}:country-{country}-type-mobile-session-{session_id}-ttl-1800"
        return {
            "http": f"http://{auth}@{self.endpoint}:8080",
            "https": f"http://{auth}@{self.endpoint}:8080"
        }

Mobile proxies from DataResearchTools are effective for government sites because they provide authentic in-country IP addresses. Government websites are most likely to serve full data to visitors who appear to be local residents accessing services through their mobile devices.
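The snippets in this guide call `get_random_mobile_ua()` and, later, `safe_text()` without defining them. One plausible implementation of each is shown below; the User-Agent strings are illustrative examples, not an authoritative list, and a production pool should be kept current:

```python
import random

# A small pool of example mobile User-Agent strings; expand and refresh these in production
MOBILE_UAS = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
]

def get_random_mobile_ua():
    """Return a random mobile User-Agent string from the pool."""
    return random.choice(MOBILE_UAS)

def safe_text(element, selector):
    """Return the stripped text of the first CSS-selector match inside element, or None."""
    found = element.select_one(selector)
    return found.get_text(strip=True) if found else None
```

`safe_text` assumes a BeautifulSoup element (anything exposing `select_one`), which keeps the parsers below from raising on missing nodes.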
Scraping Singapore LTA Data
COE Bidding Results
COE results are published after every bidding exercise and are publicly available:
# Shared imports for the scraping examples in this guide
import random
import re
import time
from datetime import datetime, timedelta

import requests
from bs4 import BeautifulSoup

class LTACOEScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.onemotoring.com.sg"

    def scrape_coe_results(self):
        proxy = self.proxy_manager.get_gov_proxy("SG")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-SG,en;q=0.9"
        })
        response = session.get(
            f"{self.base_url}/content/onemotoring/home/buying/coe.html",  # placeholder path; confirm the current COE results URL
            timeout=30
        )
        if response.status_code == 200:
            return self.parse_coe_results(response.text)
        return None

    def parse_coe_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        results = {}
        for table in soup.select('table'):
            for row in table.select('tr'):
                cells = row.select('td')
                if len(cells) >= 3:
                    category = cells[0].get_text(strip=True)
                    quota = cells[1].get_text(strip=True)
                    premium = cells[2].get_text(strip=True)
                    if any(cat in category for cat in ["Cat A", "Cat B", "Cat C", "Cat D", "Cat E"]):
                        results[category] = {
                            "quota": quota,
                            "premium": self.parse_premium(premium),
                        }
        return results

    def parse_premium(self, premium_str):
        # Strip currency symbols and thousands separators, e.g. "S$106,000" -> 106000.0
        clean = re.sub(r'[^\d.]', '', premium_str)
        try:
            return float(clean)
        except ValueError:
            return None

    def scrape_coe_history(self, months=24):
        """Scrape historical COE premiums."""
        proxy = self.proxy_manager.get_gov_proxy("SG")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})
        history = []
        # Step back one month at a time through historical COE result pages
        for month_offset in range(months):
            date = datetime.now() - timedelta(days=month_offset * 30)
            # get_month_coe_data() fetches and parses that month's results page (site-specific, not shown)
            month_data = self.get_month_coe_data(session, date)
            if month_data:
                history.extend(month_data)
            time.sleep(random.uniform(3, 7))
        return history

LTA Vehicle Population Data
class LTAVehiclePopulationScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_vehicle_population(self):
        """Scrape vehicle population statistics from LTA."""
        proxy = self.proxy_manager.get_gov_proxy("SG")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})
        # LTA publishes these datasets on data.gov.sg
        response = session.get(
            "https://data.gov.sg/api/action/datastore_search",
            params={
                "resource_id": "vehicle_population_resource_id",  # replace with the dataset's actual resource_id
                "limit": 1000,
            },
            timeout=30
        )
        if response.status_code == 200:
            return response.json().get("result", {}).get("records", [])
        return []

    def scrape_registration_stats(self):
        """Scrape monthly vehicle registration/deregistration statistics."""
        proxy = self.proxy_manager.get_gov_proxy("SG")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})
        response = session.get(
            "https://data.gov.sg/api/action/datastore_search",
            params={
                "resource_id": "registration_stats_resource_id",  # replace with the dataset's actual resource_id
                "limit": 5000,
            },
            timeout=30
        )
        if response.status_code == 200:
            records = response.json().get("result", {}).get("records", [])
            return self.process_registration_stats(records)
        return None

    def process_registration_stats(self, records):
        monthly = {}
        for record in records:
            month = record.get("month")
            if month not in monthly:
                monthly[month] = {
                    "registrations": 0,
                    "deregistrations": 0,
                    "by_make": {},
                }
            # Simplified: aggregates registrations only; extend for deregistrations and per-make counts
            monthly[month]["registrations"] += int(record.get("number", 0))
        return monthly

Scraping Government Auction Platforms
General Auction Scraping Framework
from urllib.parse import urlparse

from playwright.sync_api import sync_playwright

class GovAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_auction_listings(self, auction_url, country):
        proxy = self.proxy_manager.get_gov_proxy(country)
        # Playwright expects proxy credentials as separate fields, not embedded in the server URL
        parsed = urlparse(proxy["http"])
        with sync_playwright() as p:
            browser = p.chromium.launch(proxy={
                "server": f"http://{parsed.hostname}:{parsed.port}",
                "username": parsed.username,
                "password": parsed.password,
            })
            context = browser.new_context(
                user_agent=get_random_mobile_ua(),
                locale=self.get_locale(country)
            )
            page = context.new_page()
            page.goto(auction_url, wait_until="networkidle", timeout=30000)
            # Give client-side rendering a moment to populate the auction listings
            page.wait_for_timeout(3000)
            listings = page.evaluate("""
                () => {
                    const items = document.querySelectorAll(
                        '.auction-item, .lot-item, [class*="auction"], [class*="lot"]'
                    );
                    return Array.from(items).map(item => ({
                        lot_number: item.querySelector('[class*="lot-number"]')?.textContent?.trim(),
                        title: item.querySelector('h3, h4, [class*="title"]')?.textContent?.trim(),
                        description: item.querySelector('[class*="description"], [class*="desc"]')?.textContent?.trim(),
                        starting_bid: item.querySelector('[class*="price"], [class*="bid"]')?.textContent?.trim(),
                        auction_date: item.querySelector('[class*="date"]')?.textContent?.trim(),
                        location: item.querySelector('[class*="location"]')?.textContent?.trim(),
                        image: item.querySelector('img')?.src,
                    }));
                }
            """)
            browser.close()
        return [l for l in listings if l.get("title")]

    def get_locale(self, country):
        locales = {
            "SG": "en-SG",
            "MY": "en-MY",
            "TH": "th-TH",
            "ID": "id-ID",
        }
        return locales.get(country, "en-US")

Singapore Customs Auctions
class SGCustomsAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_customs_auctions(self):
        proxy = self.proxy_manager.get_gov_proxy("SG")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-SG,en;q=0.9"
        })
        # Singapore Customs publishes public auction notices on its website
        response = session.get(
            "https://www.customs.gov.sg/news-and-media/public-auction",
            timeout=30
        )
        if response.status_code == 200:
            return self.parse_auction_notices(response.text)
        return []

    def parse_auction_notices(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        auctions = []
        for notice in soup.select('.auction-notice, article, .content-item'):
            # safe_text() returns the first matching element's stripped text, or None
            link_el = notice.select_one('a')
            auction = {
                "title": safe_text(notice, 'h2, h3'),
                "date": safe_text(notice, '[class*="date"], time'),
                "location": safe_text(notice, '[class*="venue"], [class*="location"]'),
                "description": safe_text(notice, 'p, [class*="description"]'),
                "link": link_el.get('href') if link_el else None,
            }
            if auction["title"]:
                auctions.append(auction)
        return auctions

Indonesian Government Auctions (DJKN)
class DJKNAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_vehicle_auctions(self):
        proxy = self.proxy_manager.get_gov_proxy("ID")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "id-ID,id;q=0.9"
        })
        # DJKN's e-auction platform; "kendaraan" is the vehicle category
        response = session.get(
            "https://lelang.go.id/search",
            params={
                "category": "kendaraan",
                "status": "upcoming",
            },
            timeout=30
        )
        if response.status_code == 200:
            return self.parse_djkn_results(response.text)
        return []

    def parse_djkn_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        auctions = []
        for item in soup.select('.auction-card, .lelang-item'):
            auctions.append({
                "platform": "djkn",
                "country": "ID",
                "title": safe_text(item, '.item-title, h4'),
                "starting_price": safe_text(item, '.starting-price, [class*="harga"]'),
                "auction_date": safe_text(item, '.auction-date, [class*="tanggal"]'),
                "location": safe_text(item, '.location, [class*="lokasi"]'),
                "lot_number": safe_text(item, '.lot-number'),
                "vehicle_details": safe_text(item, '.vehicle-info, [class*="keterangan"]'),
            })
        return auctions

Analyzing Government Data
Registration Trend Analysis
import statistics

class RegistrationAnalyzer:
    def analyze_registration_trends(self, registration_data, months=12):
        """Analyze vehicle registration trends."""
        # filter_recent_months() keeps only the last N months of data (not shown)
        recent = self.filter_recent_months(registration_data, months)
        if not recent:
            return None
        monthly_totals = []
        for month, data in sorted(recent.items()):
            monthly_totals.append({
                "month": month,
                "registrations": data["registrations"],
                "deregistrations": data.get("deregistrations", 0),
                "net_change": data["registrations"] - data.get("deregistrations", 0),
            })
        return {
            "period": f"Last {months} months",
            "total_registrations": sum(m["registrations"] for m in monthly_totals),
            "total_deregistrations": sum(m["deregistrations"] for m in monthly_totals),
            "net_fleet_change": sum(m["net_change"] for m in monthly_totals),
            "avg_monthly_registrations": statistics.mean([m["registrations"] for m in monthly_totals]),
            "trend_direction": self.calculate_trend(monthly_totals),
            "monthly_data": monthly_totals,
        }

    def calculate_trend(self, monthly_data):
        if len(monthly_data) < 3:
            return "insufficient_data"
        midpoint = len(monthly_data) // 2
        first_half = statistics.mean([m["registrations"] for m in monthly_data[:midpoint]])
        second_half = statistics.mean([m["registrations"] for m in monthly_data[midpoint:]])
        if second_half > first_half * 1.05:
            return "increasing"
        elif second_half < first_half * 0.95:
            return "decreasing"
        return "stable"

Auction Price Analysis
import statistics
from collections import Counter

class AuctionPriceAnalyzer:
    def analyze_auction_results(self, auction_data):
        """Analyze government auction results for vehicle pricing intelligence."""
        # is_vehicle_auction() and extract_vehicle_info() classify and parse listings (site-specific, not shown)
        vehicle_auctions = [a for a in auction_data if self.is_vehicle_auction(a)]
        if not vehicle_auctions:
            return None
        results = []
        for auction in vehicle_auctions:
            vehicle_info = self.extract_vehicle_info(auction)
            if vehicle_info:
                results.append({
                    **vehicle_info,
                    "auction_price": auction.get("final_price") or auction.get("starting_price"),
                    "auction_date": auction.get("auction_date"),
                    "location": auction.get("location"),
                })
        # Scraped prices must already be parsed to numbers before the statistics below
        prices = [r["auction_price"] for r in results if isinstance(r.get("auction_price"), (int, float))]
        return {
            "total_vehicle_auctions": len(results),
            "avg_auction_price": statistics.mean(prices) if prices else None,
            "by_make": Counter(r.get("make") for r in results if r.get("make")),
            "price_range": {
                "min": min(prices) if prices else None,
                "max": max(prices) if prices else None,
            },
            "results": results,
        }

    def compare_auction_to_market(self, auction_results, market_data):
        """Compare auction prices to market prices."""
        comparisons = []
        for auction in auction_results:
            make = auction.get("make")
            model = auction.get("model")
            year = auction.get("year")
            # get_market_price() looks up a comparable market listing price (not shown)
            market_price = self.get_market_price(make, model, year, market_data)
            if market_price and auction.get("auction_price"):
                discount = ((market_price - auction["auction_price"]) / market_price) * 100
                comparisons.append({
                    "vehicle": f"{make} {model} {year}",
                    "auction_price": auction["auction_price"],
                    "market_price": market_price,
                    "discount_pct": round(discount, 1),
                })
        return sorted(comparisons, key=lambda x: x["discount_pct"], reverse=True)

Data Storage
Schema for Government Data
CREATE TABLE gov_registration_stats (
    id SERIAL PRIMARY KEY,
    country VARCHAR(5),
    month DATE,
    vehicle_type VARCHAR(50),
    make VARCHAR(100),
    registrations INTEGER,
    deregistrations INTEGER,
    source VARCHAR(100),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE gov_auction_listings (
    id SERIAL PRIMARY KEY,
    country VARCHAR(5),
    platform VARCHAR(100),
    lot_number VARCHAR(50),
    vehicle_description TEXT,
    vehicle_make VARCHAR(100),
    vehicle_model VARCHAR(200),
    vehicle_year INTEGER,
    starting_price DECIMAL(15, 2),
    final_price DECIMAL(15, 2),
    currency VARCHAR(5),
    auction_date DATE,
    location VARCHAR(200),
    status VARCHAR(20),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE coe_results (
    id SERIAL PRIMARY KEY,
    bidding_date DATE,
    bidding_round INTEGER,
    category VARCHAR(10),
    quota INTEGER,
    bids_received INTEGER,
    premium DECIMAL(10, 2),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (bidding_date, bidding_round, category)
);

Ethical Considerations
When scraping government data:
- Public data only: Only collect data that is publicly available and intended for public access
- Respect rate limits: Government servers often have limited capacity; be conservative with request frequency
- No personal data: Avoid collecting personal information from registration records
- Attribution: Cite government sources when using the data
- Compliance: Ensure your data collection complies with each country’s computer misuse and data protection laws
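One concrete way to act on these guidelines is to check a site's robots.txt rules before scheduling a crawl. A minimal sketch using Python's standard-library parser (`is_allowed` is a hypothetical helper; the rules string would normally be fetched from the site's /robots.txt):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against already-fetched robots.txt rules before scraping it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

In practice you would fetch robots.txt once per site through the same proxied session, cache it, and consult it before every scheduled run.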
Conclusion
Government vehicle registration and auction data provides authoritative market intelligence that complements commercial data sources. COE results in Singapore, registration statistics across the region, and government auction pricing all contribute to a more complete picture of automotive market dynamics in Southeast Asia.
DataResearchTools mobile proxies enable reliable access to government websites and auction platforms across the region. With in-country mobile IPs that government sites trust, sticky sessions for multi-page data lookups, and the reliability needed for scheduled data collection, DataResearchTools provides the proxy infrastructure that makes systematic government data collection feasible.
Combine government data with commercial marketplace data to build the most comprehensive automotive intelligence datasets available for Southeast Asian markets.
Related Reading
- Automotive Inventory Tracking Across Multiple Dealer Websites
- Automotive Review Aggregation Using Proxy Networks
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)