Building a Legislative Bill Tracker with Proxy-Powered Scraping

Legislative activity directly shapes the business environment. New laws create compliance obligations, open or restrict markets, and shift competitive opportunities. For businesses, legal professionals, lobbyists, and policy analysts operating across Southeast Asia, automated legislative tracking provides early warning of changes that affect strategy and operations.

This guide covers how to build a legislative bill tracker that monitors parliamentary activity across multiple ASEAN countries using proxy-powered scraping.

Why Track Legislation

Business Impact

Legislative changes affect businesses in concrete ways:

  • Tax bills change cost structures and planning
  • Trade legislation affects import/export operations and market access
  • Labor laws impact hiring, compensation, and workforce management
  • Technology regulation shapes product development and compliance requirements
  • Environmental legislation affects industrial operations and reporting obligations
  • Financial regulation changes banking, insurance, and capital market operations

Early Warning Advantage

Legislation typically moves through a predictable sequence of stages: introduction, committee review, debate, amendment, passage, and implementation. Tracking a bill from introduction can give organizations weeks or months of lead time to prepare, depending on how quickly the legislature acts.

Multi-Jurisdiction Complexity

Companies operating across ASEAN must track legislative activity in multiple countries simultaneously. Each parliament has its own procedures, timelines, and information systems.

Legislative Systems in ASEAN

Singapore

Parliament of Singapore (parliament.gov.sg)

  • Unicameral legislature
  • Bills published before second reading
  • Parliamentary debates available in Hansard
  • Select committee reports
  • Acts published in the Government Gazette

Singapore’s parliament publishes comprehensive information including bill texts, debate records, and committee reports in English.

Indonesia

DPR RI (dpr.go.id) – House of Representatives

  • Bicameral legislature (DPR and DPD)
  • Prolegnas: National legislation program published annually
  • Bill drafts (RUU) available during deliberation
  • Committee reports and hearing transcripts

DPD RI (dpd.go.id) – Regional Representative Council

  • Reviews legislation affecting regional interests

Philippines

Congress of the Philippines

  • Senate (senate.gov.ph): Upper chamber with published bill tracker
  • House of Representatives (congress.gov.ph): Lower chamber
  • Bicameral conference committee for reconciling differences
  • Bill status searchable online

The Philippine Congress provides one of the most detailed legislative tracking systems in ASEAN.

Thailand

National Assembly (parliament.go.th)

  • Bicameral (House of Representatives and Senate)
  • Bill drafts available during review
  • Committee reports published

Malaysia

Parliament of Malaysia (parlimen.gov.my)

  • Bicameral (Dewan Rakyat and Dewan Negara)
  • Hansard records of debates
  • Bill texts published before readings

Vietnam

National Assembly (quochoi.vn)

  • Unicameral legislature
  • Legislative program published annually
  • Bill drafts available for public comment
  • Committee review reports
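
The six systems above can be captured in a small registry that scrapers and schedulers read from. The sketch below is one possible shape, not a required design: the ParliamentSource name and its fields are assumptions made for illustration, while the domains and chamber structures are taken from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParliamentSource:
    """One legislature to monitor (hypothetical structure for illustration)."""
    country: str      # ISO 3166-1 alpha-2 code
    name: str
    bicameral: bool
    domains: tuple    # public websites named in this guide


PARLIAMENTS = {
    "SG": ParliamentSource("SG", "Parliament of Singapore", False, ("parliament.gov.sg",)),
    "ID": ParliamentSource("ID", "DPR RI / DPD RI", True, ("dpr.go.id", "dpd.go.id")),
    "PH": ParliamentSource("PH", "Congress of the Philippines", True,
                           ("senate.gov.ph", "congress.gov.ph")),
    "TH": ParliamentSource("TH", "National Assembly of Thailand", True, ("parliament.go.th",)),
    "MY": ParliamentSource("MY", "Parliament of Malaysia", True, ("parlimen.gov.my",)),
    "VN": ParliamentSource("VN", "National Assembly of Vietnam", False, ("quochoi.vn",)),
}


def bicameral_countries(registry):
    """Return country codes whose scrapers must cover two chambers."""
    return sorted(code for code, src in registry.items() if src.bicameral)
```

Keeping this data in one place means a scheduler can tell which countries need two scrapers without hard-coding that knowledge in each scraper class.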

Building the Legislative Tracker

Architecture Overview

import random
import time


class LegislativeTracker:
    """Track legislative bills across ASEAN parliaments."""

    def __init__(self, proxy_manager, database, alert_engine):
        self.proxy_manager = proxy_manager
        self.db = database
        self.parliament_scrapers = {}
        # LegislativeAlertEngine needs a notifier, so the caller constructs it
        self.alert_engine = alert_engine

    def register_parliament(self, country, scraper):
        """Register a parliament scraper."""
        self.parliament_scrapers[country] = scraper

    def run_tracking_cycle(self):
        """Execute one complete tracking cycle."""
        for country, scraper in self.parliament_scrapers.items():
            try:
                proxy = self.proxy_manager.get_proxy_for_country(country)

                # Fetch new and updated bills
                bills = scraper.fetch_bills(proxy)

                for bill in bills:
                    existing = self.db.get_bill(bill['bill_id'])

                    if not existing:
                        # New bill detected
                        self.db.store_bill(bill)
                        self.alert_engine.notify_new_bill(bill)
                    elif bill.get('status') != existing.get('status'):
                        # Bill status changed
                        self.db.update_bill(bill)
                        self.alert_engine.notify_status_change(
                            bill, existing['status'], bill['status']
                        )

                time.sleep(random.uniform(3, 8))

            except Exception as e:
                print(f"Error tracking {country} legislature: {e}")
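
The tracker assumes a database object that offers get_bill, store_bill, and update_bill, but the guide does not define one. As a minimal sketch of that interface, the class below stores each bill as JSON in SQLite. The SQLiteBillStore name, the table layout, and the sample bill ID are assumptions for illustration only.

```python
import json
import sqlite3


class SQLiteBillStore:
    """Minimal bill store exposing the methods the tracker calls (illustrative)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS bills "
            "(bill_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get_bill(self, bill_id):
        """Return the stored bill dict, or None if the bill is unknown."""
        row = self.conn.execute(
            "SELECT payload FROM bills WHERE bill_id = ?", (bill_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def store_bill(self, bill):
        """Insert a newly detected bill."""
        self.conn.execute(
            "INSERT INTO bills (bill_id, payload) VALUES (?, ?)",
            (bill["bill_id"], json.dumps(bill)),
        )
        self.conn.commit()

    def update_bill(self, bill):
        """Overwrite the stored copy of an existing bill."""
        self.conn.execute(
            "UPDATE bills SET payload = ? WHERE bill_id = ?",
            (json.dumps(bill), bill["bill_id"]),
        )
        self.conn.commit()


# Sample data: the bill number is made up
store = SQLiteBillStore()
store.store_bill({"bill_id": "PH:S:1234", "status": "filed"})
store.update_bill({"bill_id": "PH:S:1234", "status": "committee"})
```

Passing a file path instead of ":memory:" keeps the bill history between runs, which the new-versus-changed comparison in run_tracking_cycle depends on.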

Bill Data Schema

from dataclasses import dataclass


@dataclass
class LegislativeBill:
    bill_id: str
    country: str
    chamber: str            # senate, house, unicameral
    bill_number: str
    title: str
    short_title: str
    description: str
    sponsor: str
    co_sponsors: list
    filed_date: str
    status: str             # filed, committee, plenary, passed, enacted, vetoed
    committee: str
    subject_areas: list
    full_text_url: str
    related_bills: list
    timeline: list          # list of status changes with dates
    votes: list             # voting records
    amendments: list
    source_url: str
    last_checked: str
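
The timeline field is described as a list of status changes with dates. One way to maintain it, sketched here on a plain dict that uses the schema's field names, is to append an entry whenever the status moves. The record_status_change helper and the entry keys (date, from, to) are assumptions, not part of the schema above.

```python
from datetime import date


def record_status_change(bill, new_status, on=None):
    """Append a timeline entry and update the bill's current status.

    `bill` is a plain dict using the schema's field names; `on` is an
    ISO date string and defaults to today.
    """
    on = on or date.today().isoformat()
    bill.setdefault("timeline", []).append(
        {"date": on, "from": bill.get("status"), "to": new_status}
    )
    bill["status"] = new_status
    return bill


# Sample data: the bill ID and date are made up
bill = {"bill_id": "SG:example", "status": "filed", "timeline": []}
record_status_change(bill, "committee", on="2024-03-01")
```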

Singapore Parliament Scraper

import random
import time

import requests


class SingaporeParliamentScraper:
    """Scraper for Singapore Parliament bill tracking."""

    BASE_URL = "https://www.parliament.gov.sg"

    def fetch_bills(self, proxy):
        """Fetch current bills before parliament."""
        session = requests.Session()
        session.proxies = proxy
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
            'Accept-Language': 'en-SG,en;q=0.9'
        })

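        # Fetch bills listing page
        response = session.get(
            f"{self.BASE_URL}/bills",
            timeout=30
        )
        # Fail fast on HTTP errors so an error page is not parsed as an empty bill list
        response.raise_for_status()

        bills = self._parse_bills_listing(response.text)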

        # Fetch detail for each bill
        for bill in bills:
            if bill.get('detail_url'):
                time.sleep(random.uniform(2, 4))
                detail_response = session.get(
                    bill['detail_url'],
                    timeout=30
                )
                # _parse_bill_detail is site-specific and not shown here
                bill.update(self._parse_bill_detail(detail_response.text))

        return bills

    def _parse_bills_listing(self, html):
        """Parse bills listing page."""
        import hashlib
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        bills = []
        for item in soup.select('.bill-item, .content-item'):
            bill = {
                'country': 'SG',
                'chamber': 'unicameral',
                'title': '',
                'bill_number': '',
                'status': '',
                'detail_url': ''
            }

            title_elem = item.select_one('a, .title')
            if title_elem:
                bill['title'] = title_elem.get_text(strip=True)
                href = title_elem.get('href', '')
                if href:
                    bill['detail_url'] = f"{self.BASE_URL}{href}" if href.startswith('/') else href

            # Skip items with no title: they cannot be identified or alerted on
            if not bill['title']:
                continue

            if bill['bill_number']:
                bill['bill_id'] = f"SG:{bill['bill_number']}"
            else:
                # Built-in hash() is salted per process for strings, so use a
                # digest to keep the fallback ID stable between runs
                digest = hashlib.sha1(bill['title'].encode('utf-8')).hexdigest()[:12]
                bill['bill_id'] = f"SG:{digest}"
            bills.append(bill)

        return bills

    def fetch_hansard(self, proxy, date=None):
        """Fetch parliamentary debate records."""
        session = requests.Session()
        session.proxies = proxy

        params = {}
        if date:
            params['date'] = date

        response = session.get(
            f"{self.BASE_URL}/hansard",
            params=params,
            timeout=30
        )

        # _parse_hansard is site-specific and not shown here
        return self._parse_hansard(response.text)

Philippines Congress Scraper

import requests


class PhilippinesCongressScraper:
    """Scraper for Philippine Congress bill tracking."""

    SENATE_URL = "https://www.senate.gov.ph"
    HOUSE_URL = "https://www.congress.gov.ph"

    def fetch_bills(self, proxy):
        """Fetch bills from both chambers."""
        senate_bills = self._fetch_senate_bills(proxy)
        house_bills = self._fetch_house_bills(proxy)

        return senate_bills + house_bills

    def _fetch_senate_bills(self, proxy):
        """Fetch bills from the Philippine Senate."""
        session = requests.Session()
        session.proxies = proxy
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
            'Accept-Language': 'en-PH,en;q=0.9'
        })

        response = session.get(
            f"{self.SENATE_URL}/lis/leg_sys.aspx",
            timeout=30
        )

        return self._parse_senate_bills(response.text)

    def _fetch_house_bills(self, proxy):
        """Fetch bills from the House of Representatives."""
        session = requests.Session()
        session.proxies = proxy

        response = session.get(
            f"{self.HOUSE_URL}/legisdocs",
            timeout=30
        )

        # _parse_house_bills mirrors _parse_senate_bills and is not shown here
        return self._parse_house_bills(response.text)

    def _parse_senate_bills(self, html):
        """Parse Philippine Senate bill listings."""
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        bills = []
        for row in soup.select('table tr')[1:]:
            cells = row.find_all('td')
            if len(cells) >= 4:
                bill = {
                    'country': 'PH',
                    'chamber': 'senate',
                    'bill_number': cells[0].get_text(strip=True),
                    'title': cells[1].get_text(strip=True),
                    'sponsor': cells[2].get_text(strip=True),
                    'status': cells[3].get_text(strip=True),
                    'filed_date': cells[4].get_text(strip=True) if len(cells) > 4 else ''
                }
                bill['bill_id'] = f"PH:S:{bill['bill_number']}"
                bills.append(bill)

        return bills

Indonesia DPR Scraper

import requests


class IndonesiaDPRScraper:
    """Scraper for Indonesian House of Representatives."""

    BASE_URL = "https://www.dpr.go.id"

    def fetch_bills(self, proxy):
        """Fetch bills (RUU) from DPR."""
        session = requests.Session()
        session.proxies = proxy
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Linux; Android 13)',
            'Accept-Language': 'id-ID,id;q=0.9'
        })

        # Fetch Prolegnas (national legislation program)
        response = session.get(
            f"{self.BASE_URL}/prolegnas",
            timeout=30
        )

        return self._parse_prolegnas(response.text)

    def _parse_prolegnas(self, html):
        """Parse Indonesian national legislation program."""
        import hashlib
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        bills = []
        for item in soup.select('.prolegnas-item, .ruu-item, table tr'):
            bill = {
                'country': 'ID',
                'chamber': 'dpr',
                'title': '',
                'status': '',
                'sponsor': '',
                'subject_areas': []
            }

            title_elem = item.select_one('.title, td:first-child')
            if title_elem:
                bill['title'] = title_elem.get_text(strip=True)
                # Built-in hash() is salted per process for strings, so use a
                # digest to keep the ID stable between runs
                digest = hashlib.sha1(bill['title'].encode('utf-8')).hexdigest()[:12]
                bill['bill_id'] = f"ID:DPR:{digest}"
                bills.append(bill)

        return bills

Bill Classification and Analysis

Subject Area Classification

class BillClassifier:
    """Classify bills by subject area and business impact."""

    SUBJECT_AREAS = {
        'taxation': ['tax', 'pajak', 'revenue', 'duty', 'excise', 'fiscal'],
        'trade': ['trade', 'commerce', 'import', 'export', 'tariff', 'perdagangan'],
        'finance': ['banking', 'insurance', 'securities', 'fintech', 'perbankan'],
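        # Keywords must be lowercase because classify() lowercases the text;
        # an uppercase entry such as 'AI' would never match
        'technology': ['digital', 'data protection', 'cyber', 'artificial intelligence', 'teknologi'],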
        'labor': ['employment', 'labor', 'wage', 'worker', 'ketenagakerjaan'],
        'environment': ['environment', 'emission', 'climate', 'lingkungan'],
        'healthcare': ['health', 'pharmaceutical', 'medical', 'kesehatan'],
        'education': ['education', 'university', 'school', 'pendidikan'],
        'defense': ['defense', 'military', 'security', 'pertahanan'],
        'infrastructure': ['infrastructure', 'construction', 'transport', 'infrastruktur'],
        'corporate': ['company law', 'corporate governance', 'merger', 'perseroan'],
        'land': ['land', 'property', 'real estate', 'agraria', 'pertanahan'],
    }

    def classify(self, bill):
        """Classify a bill by subject area."""
        text = f"{bill.get('title', '')} {bill.get('description', '')}".lower()
        matches = []

        for area, keywords in self.SUBJECT_AREAS.items():
            score = sum(1 for kw in keywords if kw in text)
            if score > 0:
                matches.append({'area': area, 'score': score})

        return sorted(matches, key=lambda x: x['score'], reverse=True)

    def assess_business_impact(self, bill, business_profile):
        """Assess potential business impact of a bill."""
        classifications = self.classify(bill)
        bill_areas = set(c['area'] for c in classifications)
        profile_areas = set(business_profile.get('relevant_areas', []))

        overlap = bill_areas & profile_areas
        impact_score = len(overlap) * 25

        # Check for high-impact language
        import re
        text = f"{bill.get('title', '')} {bill.get('description', '')}".lower()
        high_impact_terms = [
            'mandatory', 'prohibition', 'penalty', 'license requirement',
            'new obligation', 'ban', 'wajib', 'larangan', 'sanksi'
        ]
        for term in high_impact_terms:
            # Match whole words so 'ban' does not fire on 'banking' or 'urban'
            if re.search(rf"\b{re.escape(term)}\b", text):
                impact_score += 10

        return {
            'score': min(impact_score, 100),
            'relevant_areas': list(overlap),
            'all_areas': [c['area'] for c in classifications],
            'priority': 'high' if impact_score >= 60 else 'medium' if impact_score >= 30 else 'low'
        }
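
To make the scoring arithmetic concrete: each subject area shared between the bill and the business profile adds 25 points, each high-impact term found adds 10, the reported score is capped at 100, and the priority thresholds (60 and 30) are applied to the uncapped total. The priority_for helper below is a hypothetical restatement of that arithmetic for illustration; it is not part of the classifier.

```python
def priority_for(overlap_count, high_impact_hits):
    """Reproduce the scoring arithmetic used by assess_business_impact."""
    raw = overlap_count * 25 + high_impact_hits * 10
    if raw >= 60:
        priority = "high"
    elif raw >= 30:
        priority = "medium"
    else:
        priority = "low"
    # The score is capped for reporting; the priority uses the uncapped total
    return min(raw, 100), priority


examples = [priority_for(1, 0), priority_for(1, 1), priority_for(2, 1), priority_for(4, 2)]
```

A bill that touches a single relevant area therefore stays at low priority unless it also contains at least one high-impact term.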

Monitoring Legislative Progress

Status Change Detection

from datetime import datetime, timedelta


class LegislativeProgressMonitor:
    """Monitor bills as they progress through legislative stages."""

    STAGE_ORDER = {
        'filed': 1,
        'first_reading': 2,
        'committee': 3,
        'committee_report': 4,
        'second_reading': 5,
        'amendments': 6,
        'third_reading': 7,
        'passed_chamber': 8,
        'other_chamber': 9,
        'conference': 10,
        'enrolled': 11,
        'signed': 12,
        'enacted': 13,
        'vetoed': -1
    }

    def detect_progress(self, current_status, previous_status):
        """Detect if a bill has advanced, regressed, been vetoed, or not moved."""
        # Handle the veto first: its sentinel order (-1) is not a real stage
        if current_status == 'vetoed':
            return 'vetoed'

        current_order = self.STAGE_ORDER.get(current_status)
        previous_order = self.STAGE_ORDER.get(previous_status)

        # A status missing from STAGE_ORDER cannot be ranked against the others
        if current_order is None or previous_order is None:
            return 'unknown'

        if current_order > previous_order:
            return 'advanced'
        elif current_order < previous_order:
            return 'regressed'
        return 'unchanged'

    def stale_bill_detection(self, db, stale_days=90):
        """Identify active bills with no recorded activity for stale_days days."""
        active_bills = db.get_active_bills()
        stale_bills = []

        # Assumes last_activity_date is a naive ISO 8601 timestamp in UTC
        now = datetime.utcnow()
        cutoff = now - timedelta(days=stale_days)
        for bill in active_bills:
            last_activity = bill.get('last_activity_date')
            if not last_activity:
                continue
            last_seen = datetime.fromisoformat(last_activity)
            if last_seen < cutoff:
                stale_bills.append({
                    'bill': bill,
                    'days_since_activity': (now - last_seen).days
                })

        return stale_bills
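
STAGE_ORDER only works when statuses use its exact keys, whereas scraped status cells usually contain free-form text. A normalization step between the scraper and the monitor can bridge that gap. The rules below are hypothetical examples: real status wording differs per parliament and would need to be collected from each site.

```python
import re

# Hypothetical phrase-to-stage rules, checked from latest stage to earliest
STATUS_RULES = [
    (r"\bveto", "vetoed"),
    (r"\benacted\b", "enacted"),
    (r"\bthird reading\b", "third_reading"),
    (r"\bsecond reading\b", "second_reading"),
    (r"\bcommittee report\b", "committee_report"),
    (r"\b(committee|referred)\b", "committee"),
    (r"\bfirst reading\b", "first_reading"),
    (r"\b(filed|introduced)\b", "filed"),
]


def normalize_status(raw_status):
    """Map free-form status text to a STAGE_ORDER key, or None if unrecognized."""
    text = raw_status.strip().lower()
    for pattern, stage in STATUS_RULES:
        if re.search(pattern, text):
            return stage
    return None
```

Checking later stages first means a status such as "Filed; referred to Committee" resolves to the most advanced stage it mentions. Returning None for unrecognized text lets the caller log the raw string and extend the rules rather than guess.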

Alert System

Configurable Legislative Alerts

class LegislativeAlertEngine:
    """Alert system for legislative changes."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.subscribers = []

    def add_subscription(self, config):
        """Add a legislative alert subscription."""
        self.subscribers.append({
            'name': config['name'],
            'email': config['email'],
            'countries': config.get('countries', []),
            'subject_areas': config.get('subject_areas', []),
            'keywords': config.get('keywords', []),
            'alert_on': config.get('alert_on', ['new_bill', 'committee', 'passed', 'enacted']),
            'frequency': config.get('frequency', 'immediate')
        })

    def notify_new_bill(self, bill):
        """Notify subscribers about a new bill."""
        for sub in self.subscribers:
            if 'new_bill' in sub['alert_on'] and self._matches(bill, sub):
                self.notifier.send(
                    to=sub['email'],
                    subject=f"New Bill: {bill['title'][:50]} ({bill['country']})",
                    body=self._format_bill_alert(bill, 'New bill introduced')
                )

    def notify_status_change(self, bill, old_status, new_status):
        """Notify subscribers about a bill status change."""
        for sub in self.subscribers:
            if new_status in sub['alert_on'] and self._matches(bill, sub):
                self.notifier.send(
                    to=sub['email'],
                    subject=f"Bill Update: {bill['title'][:40]} - {new_status}",
                    body=self._format_bill_alert(
                        bill,
                        f"Status changed: {old_status} -> {new_status}"
                    )
                )

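    def _matches(self, bill, subscription):
        """Return True if the bill passes every filter the subscriber has set."""
        if subscription['countries'] and bill['country'] not in subscription['countries']:
            return False
        if subscription['subject_areas']:
            # Bills with no classified subject areas never match an area filter
            if not set(subscription['subject_areas']) & set(bill.get('subject_areas', [])):
                return False
        if subscription['keywords']:
            text = f"{bill['title']} {bill.get('description', '')}".lower()
            if not any(kw.lower() in text for kw in subscription['keywords']):
                return False
        return True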
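
Subscriptions carry a frequency field, but the engine above sends every alert immediately. One way to honor the field is to queue alerts for non-immediate subscribers and send them as a digest. The DigestQueue class and the "daily" frequency value below are assumptions for illustration; the guide itself only names "immediate".

```python
from collections import defaultdict


class DigestQueue:
    """Hold alerts for non-immediate subscribers until a digest is sent (illustrative)."""

    def __init__(self):
        self._pending = defaultdict(list)   # email -> list of alert lines

    def add(self, subscription, line):
        """Queue one alert line; return True if it should be sent right away instead."""
        if subscription.get("frequency", "immediate") == "immediate":
            return True
        self._pending[subscription["email"]].append(line)
        return False

    def flush(self):
        """Return {email: digest body} for everything queued, then clear the queue."""
        digests = {email: "\n".join(lines) for email, lines in self._pending.items()}
        self._pending.clear()
        return digests


queue = DigestQueue()
daily_sub = {"email": "analyst@example.com", "frequency": "daily"}
queue.add(daily_sub, "New bill: Sample Data Protection Bill (SG)")
queue.add(daily_sub, "Status changed: filed -> committee")
```

A scheduled job would call flush() once per digest period and pass each body to the notifier.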

DataResearchTools for Legislative Tracking

DataResearchTools provides the proxy infrastructure that makes multi-country legislative tracking reliable:

  • Six-country ASEAN coverage for accessing parliament and congress websites across the region
  • Mobile carrier IPs that government websites are less likely to block
  • Session management for navigating complex legislative database interfaces
  • Consistent uptime for daily legislative monitoring without gaps
  • Smart rotation to maintain access while checking multiple parliamentary sources

Our proxy network ensures that your legislative tracker has uninterrupted access to every parliament and congress website across Southeast Asia.

Getting Started

Phase 1: Core Tracker

Set up scrapers for your top 2-3 priority countries. Configure DataResearchTools proxies, implement basic bill detection, and set up keyword-based alerting.
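
The scrapers in this guide assign the proxy directly to session.proxies, so the proxy manager needs to return the mapping that the requests library expects: scheme names as keys and proxy URLs as values. The sketch below shows one way to build it per country. The CountryProxyManager class, the gateway hostname, and the credentials are placeholders, not DataResearchTools' actual endpoints or API.

```python
from urllib.parse import quote


class CountryProxyManager:
    """Hand out requests-style proxy mappings per country (illustrative)."""

    def __init__(self, gateways, username, password):
        # gateways maps country code to "host:port"; the hosts here are placeholders
        self.gateways = gateways
        # Percent-encode credentials so characters like '@' do not break the URL
        self.auth = f"{quote(username, safe='')}:{quote(password, safe='')}"

    def get_proxy_for_country(self, country):
        """Return the mapping that requests.Session.proxies expects."""
        try:
            gateway = self.gateways[country]
        except KeyError:
            raise ValueError(f"No proxy gateway configured for {country}") from None
        url = f"http://{self.auth}@{gateway}"
        return {"http": url, "https": url}


manager = CountryProxyManager({"SG": "sg.proxy.example:8000"}, "user", "p@ss")
proxy = manager.get_proxy_for_country("SG")
```

Raising an error for an unconfigured country keeps a scraper from silently falling back to a direct connection.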

Phase 2: Expanded Coverage

Add remaining ASEAN countries, implement bill classification and impact assessment, and build a searchable bill database.

Phase 3: Intelligence Layer

Add progress monitoring, stale bill detection, legislative forecasting based on historical patterns, and integration with your compliance and strategy workflows.

Conclusion

A legislative bill tracker powered by DataResearchTools’ proxy infrastructure gives organizations early visibility into the laws and regulations that will shape their operating environment across Southeast Asia. By automating the detection and classification of legislative activity, you transform scattered parliamentary information into structured intelligence that drives proactive strategy and compliance planning.

Start with the countries and legislative topics most critical to your business, build reliable scraping with appropriate proxy infrastructure, and expand systematically to create comprehensive legislative intelligence across the region.

