How to Scrape Government Vehicle Registration and Auction Data

Government databases and auction platforms contain some of the most valuable and authoritative vehicle data available. Registration statistics reveal market trends, auction results establish fair market values, and deregistration data shows fleet turnover patterns. For automotive businesses, researchers, and data providers in Southeast Asia, accessing this data systematically through web scraping unlocks insights that are simply not available elsewhere.

This guide covers how to collect government vehicle registration data and auction information across Southeast Asian markets using proxy infrastructure.

Government Vehicle Data Sources by Country

Singapore

Singapore has some of the most accessible government vehicle data in the region:

Land Transport Authority (LTA):

  • Vehicle registration and deregistration statistics
  • COE bidding results (published after each twice-monthly bidding exercise)
  • Open data portal with vehicle population datasets
  • Road tax and vehicle inspection data

OneMotoring:

  • Vehicle registration services
  • COE statistics and trends
  • Vehicle inspection records

Government Auctions:

  • Singapore Customs auctions (seized vehicles)
  • Government surplus vehicle sales
  • Statutory board vehicle disposals

Malaysia

Jabatan Pengangkutan Jalan (JPJ):

  • Vehicle registration database
  • Road tax status checks
  • Ownership transfer records

Government Auctions:

  • PUSPAKOM condemned vehicle data
  • Royal Malaysian Customs (seized vehicles)
  • Government fleet disposal

Thailand

Department of Land Transport (DLT):

  • Vehicle registration statistics
  • Annual vehicle inspection data
  • Commercial vehicle licensing

Thai Government Auctions:

  • Customs Department auctions
  • Revenue Department seized property sales
  • State-owned vehicle disposals

Indonesia

Korlantas Polri (Traffic Police):

  • Vehicle registration data (STNK)
  • Vehicle ownership records

Government Auctions:

  • DJKN (State Asset Management) auctions
  • Customs auction platforms
  • BUMN (state enterprise) vehicle disposals

Proxy Requirements for Government Sites

Technical Challenges

Government websites present unique scraping challenges:

  • Rate limiting: Government sites often have strict rate limits due to limited infrastructure
  • Geographic restrictions: Some data is only accessible from within the country
  • Session management: Multi-step data lookups require session continuity
  • Legacy technology: Older sites may use non-standard HTML or require specific browser behaviors
  • CAPTCHA protection: Many government forms include CAPTCHA challenges
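Rate limiting is usually the constraint that bites first. A minimal way to respect it is to wrap requests in a retry helper with exponential backoff and jitter. This is a sketch only; the delay values are illustrative starting points, not tuned for any particular site:

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=2.0):
    """Call `fetch` (a zero-argument function returning a response-like
    object with a `status_code`), retrying on 429/5xx responses with
    exponential backoff plus jitter. Returns the response, or None after
    exhausting all attempts."""
    for attempt in range(max_attempts):
        response = fetch()
        if response is not None and response.status_code not in (429, 500, 502, 503):
            return response
        # Exponential backoff: base, 2x base, 4x base... plus up to 1s of jitter
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    return None
```

Any of the `session.get(...)` calls in this guide can be wrapped as `fetch_with_backoff(lambda: session.get(url, timeout=30))`.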

DataResearchTools Proxy Configuration

For government data collection, the key requirements are geographic authenticity and session stability:

import time

class GovDataProxyManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"

    def get_gov_proxy(self, country):
        """Get a sticky proxy for government site access"""
        session_id = f"gov-{country}-{int(time.time()) % 100000}"
        auth = f"{self.api_key}:country-{country}-type-mobile-session-{session_id}-ttl-1800"
        return {
            "http": f"http://{auth}@{self.endpoint}:8080",
            "https": f"http://{auth}@{self.endpoint}:8080"
        }

Mobile proxies from DataResearchTools are effective for government sites because they provide authentic in-country IP addresses. Government websites are most likely to serve full data to visitors who appear to be local residents accessing services through their mobile devices.
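The snippets in this guide also rely on two small helpers that are not part of any library: get_random_mobile_ua, which returns a rotating mobile User-Agent string, and safe_text, which extracts stripped text from the first element matching a CSS selector. Minimal versions might look like this (the UA strings are examples only; keep your own list current):

```python
import random

# Example mobile User-Agent strings; rotate and refresh these regularly
MOBILE_USER_AGENTS = [
    "Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
]


def get_random_mobile_ua():
    """Return a random mobile User-Agent string."""
    return random.choice(MOBILE_USER_AGENTS)


def safe_text(element, selector):
    """Return the stripped text of the first match for `selector`, or None.

    Works with any object exposing a BeautifulSoup-style select_one()."""
    found = element.select_one(selector)
    return found.get_text(strip=True) if found else None
```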

Scraping Singapore LTA Data

COE Bidding Results

COE results are published after every bidding exercise and are publicly available:

import random
import re
import time
from datetime import datetime, timedelta

import requests
from bs4 import BeautifulSoup

class LTACOEScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.onemotoring.com.sg"

    def scrape_coe_results(self):
        proxy = self.proxy_manager.get_gov_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-SG,en;q=0.9"
        })

        # Placeholder path: substitute the current OneMotoring COE results
        # page URL before running
        response = session.get(
            f"{self.base_url}/coe-results-page-path",
            timeout=30
        )

        if response.status_code == 200:
            return self.parse_coe_results(response.text)
        return None

    def parse_coe_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        results = {}

        tables = soup.select('table')
        for table in tables:
            rows = table.select('tr')
            for row in rows:
                cells = row.select('td')
                if len(cells) >= 3:
                    category = cells[0].get_text(strip=True)
                    quota = cells[1].get_text(strip=True)
                    premium = cells[2].get_text(strip=True)

                    if any(cat in category for cat in ["Cat A", "Cat B", "Cat C", "Cat D", "Cat E"]):
                        results[category] = {
                            "quota": quota,
                            "premium": self.parse_premium(premium),
                        }

        return results

    def parse_premium(self, premium_str):
        try:
            clean = re.sub(r'[^\d.]', '', premium_str)
            return float(clean)
        except (TypeError, ValueError):
            return None

    def scrape_coe_history(self, months=24):
        """Scrape historical COE premiums"""
        proxy = self.proxy_manager.get_gov_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})

        history = []
        # Navigate through historical COE result pages
        for month_offset in range(months):
            date = datetime.now() - timedelta(days=month_offset * 30)
            # get_month_coe_data is a site-specific helper that parses one
            # month's archive page (not shown here)
            month_data = self.get_month_coe_data(session, date)
            if month_data:
                history.extend(month_data)
            time.sleep(random.uniform(3, 7))

        return history

LTA Vehicle Population Data

class LTAVehiclePopulationScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_vehicle_population(self):
        """Scrape vehicle population statistics from LTA"""
        proxy = self.proxy_manager.get_gov_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})

        # LTA publishes data on data.gov.sg; substitute the real resource_id
        # from the dataset's API documentation (placeholder shown below)
        response = session.get(
            "https://data.gov.sg/api/action/datastore_search",
            params={"resource_id": "vehicle_population_resource_id", "limit": 1000},
            timeout=30
        )

        if response.status_code == 200:
            return response.json().get("result", {}).get("records", [])
        return []

    def scrape_registration_stats(self):
        """Scrape monthly vehicle registration/deregistration statistics"""
        proxy = self.proxy_manager.get_gov_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})

        # Placeholder resource_id; substitute the real one from data.gov.sg
        response = session.get(
            "https://data.gov.sg/api/action/datastore_search",
            params={"resource_id": "registration_stats_resource_id", "limit": 5000},
            timeout=30
        )

        if response.status_code == 200:
            records = response.json().get("result", {}).get("records", [])
            return self.process_registration_stats(records)
        return None

    def process_registration_stats(self, records):
        monthly = {}
        for record in records:
            month = record.get("month")
            if month not in monthly:
                monthly[month] = {
                    "registrations": 0,
                    "deregistrations": 0,
                    "by_make": {},
                }

            # Field names depend on the specific dataset schema; adjust to match
            count = int(record.get("number", 0) or 0)
            if record.get("type") == "deregistration":
                monthly[month]["deregistrations"] += count
            else:
                monthly[month]["registrations"] += count

            make = record.get("make")
            if make:
                by_make = monthly[month]["by_make"]
                by_make[make] = by_make.get(make, 0) + count

        return monthly
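With monthly totals keyed like "2024-06", year-over-year comparisons become a one-liner away. A small standalone sketch, assuming month keys in YYYY-MM format and the dict shape used above:

```python
def yoy_growth(monthly, month):
    """Year-over-year registration growth (%) for a 'YYYY-MM' month key.

    Returns None if either month is missing or the base value is zero."""
    year, mm = month.split("-")
    prior = f"{int(year) - 1}-{mm}"
    if month not in monthly or prior not in monthly:
        return None
    base = monthly[prior]["registrations"]
    if base == 0:
        return None
    return round((monthly[month]["registrations"] - base) / base * 100, 1)
```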

Scraping Government Auction Platforms

General Auction Scraping Framework

from playwright.sync_api import sync_playwright

class GovAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_auction_listings(self, auction_url, country):
        proxy = self.proxy_manager.get_gov_proxy(country)
        # Playwright expects proxy credentials as separate fields, not
        # embedded in the server URL
        auth, _, server = proxy["http"].removeprefix("http://").rpartition("@")
        username, _, password = auth.partition(":")

        with sync_playwright() as p:
            browser = p.chromium.launch(proxy={
                "server": f"http://{server}",
                "username": username,
                "password": password,
            })
            context = browser.new_context(
                user_agent=get_random_mobile_ua(),
                locale=self.get_locale(country)
            )
            page = context.new_page()

            page.goto(auction_url, wait_until="networkidle", timeout=30000)

            # Wait for auction listings to load
            page.wait_for_timeout(3000)

            listings = page.evaluate("""
                () => {
                    const items = document.querySelectorAll(
                        '.auction-item, .lot-item, [class*="auction"], [class*="lot"]'
                    );
                    return Array.from(items).map(item => ({
                        lot_number: item.querySelector('[class*="lot-number"]')?.textContent?.trim(),
                        title: item.querySelector('h3, h4, [class*="title"]')?.textContent?.trim(),
                        description: item.querySelector('[class*="description"], [class*="desc"]')?.textContent?.trim(),
                        starting_bid: item.querySelector('[class*="price"], [class*="bid"]')?.textContent?.trim(),
                        auction_date: item.querySelector('[class*="date"]')?.textContent?.trim(),
                        location: item.querySelector('[class*="location"]')?.textContent?.trim(),
                        image: item.querySelector('img')?.src,
                    }));
                }
            """)

            browser.close()
            return [l for l in listings if l.get("title")]

    def get_locale(self, country):
        locales = {
            "SG": "en-SG",
            "MY": "en-MY",
            "TH": "th-TH",
            "ID": "id-ID",
        }
        return locales.get(country, "en-US")

Singapore Customs Auctions

class SGCustomsAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_customs_auctions(self):
        proxy = self.proxy_manager.get_gov_proxy("SG")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-SG,en;q=0.9"
        })

        # Singapore Customs auction notices
        response = session.get(
            "https://www.customs.gov.sg/news-and-media/public-auction",
            timeout=30
        )

        if response.status_code == 200:
            return self.parse_auction_notices(response.text)
        return []

    def parse_auction_notices(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        auctions = []

        for notice in soup.select('.auction-notice, article, .content-item'):
            link = notice.select_one('a')
            auction = {
                "title": safe_text(notice, 'h2, h3'),
                "date": safe_text(notice, '[class*="date"], time'),
                "location": safe_text(notice, '[class*="venue"], [class*="location"]'),
                "description": safe_text(notice, 'p, [class*="description"]'),
                "link": link.get('href') if link else None,
            }
            if auction["title"]:
                auctions.append(auction)

        return auctions
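The link values scraped this way are often relative paths. They can be normalized against the page URL using only the standard library, for example with a helper like this:

```python
from urllib.parse import urljoin


def absolutize_links(auctions, base_url):
    """Rewrite each auction's 'link' to an absolute URL, in place."""
    for auction in auctions:
        if auction.get("link"):
            auction["link"] = urljoin(base_url, auction["link"])
    return auctions
```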

Indonesian Government Auctions (DJKN)

class DJKNAuctionScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def scrape_vehicle_auctions(self):
        proxy = self.proxy_manager.get_gov_proxy("ID")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "id-ID,id;q=0.9"
        })

        # DJKN e-auction platform
        response = session.get(
            "https://lelang.go.id/search",
            params={
                "category": "kendaraan",
                "status": "upcoming",
            },
            timeout=30
        )

        if response.status_code == 200:
            return self.parse_djkn_results(response.text)
        return []

    def parse_djkn_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        auctions = []

        for item in soup.select('.auction-card, .lelang-item'):
            auctions.append({
                "platform": "djkn",
                "country": "ID",
                "title": safe_text(item, '.item-title, h4'),
                "starting_price": safe_text(item, '.starting-price, [class*="harga"]'),
                "auction_date": safe_text(item, '.auction-date, [class*="tanggal"]'),
                "location": safe_text(item, '.location, [class*="lokasi"]'),
                "lot_number": safe_text(item, '.lot-number'),
                "vehicle_details": safe_text(item, '.vehicle-info, [class*="keterangan"]'),
            })

        return auctions

Analyzing Government Data

Registration Trend Analysis

import statistics

class RegistrationAnalyzer:
    def analyze_registration_trends(self, registration_data, months=12):
        """Analyze vehicle registration trends"""
        recent = self.filter_recent_months(registration_data, months)

        if not recent:
            return None

        monthly_totals = []
        for month, data in sorted(recent.items()):
            monthly_totals.append({
                "month": month,
                "registrations": data["registrations"],
                "deregistrations": data.get("deregistrations", 0),
                "net_change": data["registrations"] - data.get("deregistrations", 0),
            })

        return {
            "period": f"Last {months} months",
            "total_registrations": sum(m["registrations"] for m in monthly_totals),
            "total_deregistrations": sum(m["deregistrations"] for m in monthly_totals),
            "net_fleet_change": sum(m["net_change"] for m in monthly_totals),
            "avg_monthly_registrations": statistics.mean([m["registrations"] for m in monthly_totals]),
            "trend_direction": self.calculate_trend(monthly_totals),
            "monthly_data": monthly_totals,
        }

    def calculate_trend(self, monthly_data):
        if len(monthly_data) < 3:
            return "insufficient_data"

        first_half = statistics.mean([m["registrations"] for m in monthly_data[:len(monthly_data)//2]])
        second_half = statistics.mean([m["registrations"] for m in monthly_data[len(monthly_data)//2:]])

        if second_half > first_half * 1.05:
            return "increasing"
        elif second_half < first_half * 0.95:
            return "decreasing"
        return "stable"

Auction Price Analysis

import statistics
from collections import Counter

class AuctionPriceAnalyzer:
    def analyze_auction_results(self, auction_data):
        """Analyze government auction results for vehicle pricing intelligence"""
        vehicle_auctions = [a for a in auction_data if self.is_vehicle_auction(a)]

        if not vehicle_auctions:
            return None

        results = []
        for auction in vehicle_auctions:
            vehicle_info = self.extract_vehicle_info(auction)
            if vehicle_info:
                results.append({
                    **vehicle_info,
                    "auction_price": auction.get("final_price") or auction.get("starting_price"),
                    "auction_date": auction.get("auction_date"),
                    "location": auction.get("location"),
                })

        return {
            "total_vehicle_auctions": len(results),
            "avg_auction_price": statistics.mean([r["auction_price"] for r in results if r.get("auction_price")]),
            "by_make": Counter(r.get("make") for r in results if r.get("make")),
            "price_range": {
                "min": min(r["auction_price"] for r in results if r.get("auction_price")),
                "max": max(r["auction_price"] for r in results if r.get("auction_price")),
            },
            "results": results,
        }

    def compare_auction_to_market(self, auction_results, market_data):
        """Compare auction prices to market prices"""
        comparisons = []
        for auction in auction_results:
            make = auction.get("make")
            model = auction.get("model")
            year = auction.get("year")

            market_price = self.get_market_price(make, model, year, market_data)
            if market_price and auction.get("auction_price"):
                discount = ((market_price - auction["auction_price"]) / market_price) * 100
                comparisons.append({
                    "vehicle": f"{make} {model} {year}",
                    "auction_price": auction["auction_price"],
                    "market_price": market_price,
                    "discount_pct": round(discount, 1),
                })

        return sorted(comparisons, key=lambda x: x["discount_pct"], reverse=True)
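The analyzers above assume numeric prices, but scraped values arrive as strings in several formats ("S$96,000", "RM 45,000", "Rp 150.000.000"). A parsing sketch that handles the common Southeast Asian conventions; the assumption that Indonesian listings use dots as thousands separators is based on typical formatting and should be verified per source:

```python
import re


def parse_price(price_str, country=None):
    """Parse a scraped price string to a float, or None on failure.

    Indonesian ('ID') listings typically write Rp 150.000.000 with dots
    as thousands separators, so dots are stripped for that country;
    elsewhere commas are treated as thousands separators."""
    if not price_str:
        return None
    digits = re.sub(r"[^\d.,]", "", price_str)
    if not digits:
        return None
    if country == "ID":
        digits = digits.replace(".", "").replace(",", "")
    else:
        digits = digits.replace(",", "")
    try:
        return float(digits)
    except ValueError:
        return None
```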

Data Storage

Schema for Government Data

CREATE TABLE gov_registration_stats (
    id SERIAL PRIMARY KEY,
    country VARCHAR(5),
    month DATE,
    vehicle_type VARCHAR(50),
    make VARCHAR(100),
    registrations INTEGER,
    deregistrations INTEGER,
    source VARCHAR(100),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE gov_auction_listings (
    id SERIAL PRIMARY KEY,
    country VARCHAR(5),
    platform VARCHAR(100),
    lot_number VARCHAR(50),
    vehicle_description TEXT,
    vehicle_make VARCHAR(100),
    vehicle_model VARCHAR(200),
    vehicle_year INTEGER,
    starting_price DECIMAL(15, 2),
    final_price DECIMAL(15, 2),
    currency VARCHAR(5),
    auction_date DATE,
    location VARCHAR(200),
    status VARCHAR(20),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE coe_results (
    id SERIAL PRIMARY KEY,
    bidding_date DATE,
    bidding_round INTEGER,
    category VARCHAR(10),
    quota INTEGER,
    bids_received INTEGER,
    premium DECIMAL(10, 2),
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (bidding_date, bidding_round, category)
);
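The UNIQUE constraint on coe_results makes repeated scrapes idempotent when writes go through an upsert. A sketch using DB-API placeholders (shown with `?` as in sqlite3; Postgres drivers such as psycopg2 use `%s` instead):

```python
UPSERT_COE = """
INSERT INTO coe_results
    (bidding_date, bidding_round, category, quota, bids_received, premium)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT (bidding_date, bidding_round, category)
DO UPDATE SET
    quota = excluded.quota,
    bids_received = excluded.bids_received,
    premium = excluded.premium
"""


def save_coe_result(conn, row):
    """Insert or update one COE result; `row` is a dict with the six columns."""
    conn.execute(UPSERT_COE, (
        row["bidding_date"], row["bidding_round"], row["category"],
        row["quota"], row["bids_received"], row["premium"],
    ))
    conn.commit()
```

Re-running a scrape then simply refreshes quota, bids, and premium for rounds already stored, instead of raising duplicate-key errors.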

Ethical Considerations

When scraping government data:

  • Public data only: Only collect data that is publicly available and intended for public access
  • Respect rate limits: Government servers often have limited capacity; be conservative with request frequency
  • No personal data: Avoid collecting personal information from registration records
  • Attribution: Cite government sources when using the data
  • Compliance: Ensure your data collection complies with each country’s computer misuse and data protection laws

Conclusion

Government vehicle registration and auction data provides authoritative market intelligence that complements commercial data sources. COE results in Singapore, registration statistics across the region, and government auction pricing all contribute to a more complete picture of automotive market dynamics in Southeast Asia.

DataResearchTools mobile proxies enable reliable access to government websites and auction platforms across the region. With in-country mobile IPs that government sites trust, sticky sessions for multi-page data lookups, and the reliability needed for scheduled data collection, DataResearchTools provides the proxy infrastructure that makes systematic government data collection feasible.

Combine government data with commercial marketplace data to build the most comprehensive automotive intelligence datasets available for Southeast Asian markets.

