Scraping Government Gazette Publications Across Southeast Asia

Government gazettes are the official record of a country’s legal and administrative actions. They publish new legislation, regulations, government appointments, company registrations, land notices, bankruptcy proceedings, and other official announcements. For businesses, legal professionals, and researchers operating in Southeast Asia, gazette data is an authoritative primary source that is often hard to obtain in the same form anywhere else.

This guide covers how to systematically scrape government gazette publications across ASEAN countries using proxy infrastructure.

What Government Gazettes Publish

Legal and Regulatory Content

  • New acts of parliament and amendments
  • Government regulations and ministerial decrees
  • Executive orders and presidential decrees
  • Subsidiary legislation and rules

Administrative Notices

  • Government appointments and dismissals
  • Formation and dissolution of government bodies
  • Changes to administrative boundaries
  • Public holiday declarations

Business and Commercial Notices

  • Company incorporation and dissolution
  • Trademark and patent publications
  • Bankruptcy and winding-up orders
  • Change of company names

Land and Property Notices

  • Land acquisition notices
  • Zoning changes and development plans
  • Environmental impact assessment publications
  • Mining and exploration permits

Tender and Procurement Notices

Some countries publish procurement notices through their official gazettes in addition to dedicated procurement portals.

Government Gazettes by Country

Singapore

Singapore Government Gazette (eGazette)

  • URL: egazette.gov.sg
  • Format: Searchable online database with PDF supplements
  • Content: Bills, subsidiary legislation, government notifications, appointments
  • Language: English
  • Update frequency: Multiple times per week

Indonesia

Berita Negara (State Gazette)

  • URL: ditjenpp.kemenkumham.go.id
  • Format: Online database with downloadable PDFs
  • Content: Laws, government regulations, presidential decrees, ministerial regulations
  • Language: Bahasa Indonesia
  • Update frequency: Regular, tied to legislative calendar

Lembaran Negara (State Sheet)

  • Contains enacted legislation
  • Published alongside explanatory memoranda

Philippines

Official Gazette of the Republic of the Philippines

  • URL: officialgazette.gov.ph
  • Format: Web articles and PDF publications
  • Content: Executive orders, proclamations, administrative orders, legislation
  • Language: English and Filipino
  • Update frequency: Multiple times per week

Thailand

Royal Gazette (Ratchakitcha)

  • URL: ratchakitcha.soc.go.th
  • Format: Online database with PDF documents
  • Content: Royal decrees, legislation, regulations, appointments
  • Language: Thai
  • Update frequency: Tied to royal and legislative calendar

Malaysia

Federal Gazette

  • URL: federalgazette.agc.gov.my
  • Format: PDF publications
  • Content: Acts, subsidiary legislation, government notifications
  • Languages: Bahasa Malaysia and English
  • Update frequency: Regular

Vietnam

Official Gazette (Cong Bao)

  • URL: congbao.chinhphu.vn
  • Format: Online database
  • Content: Laws, decrees, decisions, circulars
  • Language: Vietnamese
  • Update frequency: Regular

Technical Implementation

Proxy Configuration for Gazette Scraping

class GazetteProxyConfig:
    """Proxy configuration for gazette scraping across ASEAN."""

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.gazette_configs = {
            'singapore': {
                'url': 'https://egazette.gov.sg',
                'country': 'SG',
                'session_type': 'sticky',
                'language': 'en-SG,en;q=0.9'
            },
            'indonesia': {
                'url': 'https://ditjenpp.kemenkumham.go.id',
                'country': 'ID',
                'session_type': 'rotating',
                'language': 'id-ID,id;q=0.9'
            },
            'philippines': {
                'url': 'https://www.officialgazette.gov.ph',
                'country': 'PH',
                'session_type': 'rotating',
                'language': 'en-PH,en;q=0.9'
            },
            'thailand': {
                'url': 'https://ratchakitcha.soc.go.th',
                'country': 'TH',
                'session_type': 'rotating',
                'language': 'th-TH,th;q=0.9'
            },
            'malaysia': {
                'url': 'https://federalgazette.agc.gov.my',
                'country': 'MY',
                'session_type': 'rotating',
                'language': 'ms-MY,ms;q=0.9,en;q=0.8'
            },
            'vietnam': {
                'url': 'https://congbao.chinhphu.vn',
                'country': 'VN',
                'session_type': 'rotating',
                'language': 'vi-VN,vi;q=0.9'
            }
        }

    def get_config(self, country_name):
        return self.gazette_configs.get(country_name.lower())
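
The scrapers in this guide call proxy_manager.get_proxy_for_country(), which is never defined in the text. A minimal sketch of what such an object might look like follows, assuming a gateway-style provider that selects the exit country through the proxy username. The class name, gateway host, port, and username convention are all hypothetical; substitute whatever your provider documents.

```python
from dataclasses import dataclass


@dataclass
class SimpleProxyManager:
    """Hypothetical proxy manager. The gateway host, port, and the
    'username-country-xx' convention are illustrative assumptions."""
    username: str
    password: str
    host: str = "gateway.example.com"
    port: int = 8000

    def get_proxy_for_country(self, country_code: str) -> dict:
        """Return a requests-style proxies mapping for one exit country."""
        user = f"{self.username}-country-{country_code.lower()}"
        proxy_url = f"http://{user}:{self.password}@{self.host}:{self.port}"
        # requests expects one entry per URL scheme.
        return {"http": proxy_url, "https": proxy_url}


def build_headers(config: dict) -> dict:
    """Turn one entry of gazette_configs into request headers."""
    return {"Accept-Language": config["language"]}
```

The mapping returned by get_proxy_for_country() can be assigned to session.proxies directly, which is how the scrapers below use it.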

Singapore eGazette Scraper

import requests


class SingaporeGazetteScraper:
    """Scraper for Singapore Government Gazette."""

    BASE_URL = "https://egazette.gov.sg"

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def fetch_recent_publications(self, year=None, gazette_type=None):
        """Fetch recent gazette publications."""
        proxy = self.proxy_manager.get_proxy_for_country('SG')
        session = requests.Session()
        session.proxies = proxy
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
            'Accept-Language': 'en-SG,en;q=0.9'
        })

        params = {}
        if year:
            params['year'] = year
        if gazette_type:
            params['type'] = gazette_type

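        response = session.get(
            f"{self.BASE_URL}/",
            params=params,
            timeout=30
        )
        # Fail loudly on 4xx/5xx instead of parsing an error page as a listing.
        response.raise_for_status()

        return self.parse_gazette_list(response.text)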

    def parse_gazette_list(self, html):
        """Parse gazette listing page."""
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        publications = []
        for item in soup.select('.gazette-item, .listing-item, tr'):
            pub = self._extract_publication(item)
            if pub:
                publications.append(pub)

        return publications

    def _extract_publication(self, element):
        """Extract publication data from a listing element."""
        from urllib.parse import urljoin

        title_elem = element.select_one('a, .title')
        date_elem = element.select_one('.date, td:nth-child(2)')

        if title_elem:
            href = title_elem.get('href')
            return {
                'title': title_elem.get_text(strip=True),
                # Resolve relative links against the site root.
                'url': urljoin(self.BASE_URL, href) if href else '',
                'date': date_elem.get_text(strip=True) if date_elem else '',
                'country': 'Singapore',
                'source': 'eGazette'
            }
        return None

    def download_gazette_pdf(self, pdf_url):
        """Download a gazette publication PDF and return its bytes, or None."""
        proxy = self.proxy_manager.get_proxy_for_country('SG')
        # The whole body is read into memory below, so stream=True is not needed.
        response = requests.get(pdf_url, proxies=proxy, timeout=120)
        return response.content if response.status_code == 200 else None
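
A 200 status alone does not guarantee that download_gazette_pdf() received a PDF: a portal or proxy can answer with an HTML error or challenge page and still report success. One possible safeguard, sketched here with a hypothetical helper name, is to check for the "%PDF-" marker that PDF files begin with before storing the payload.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """Return True if the payload starts with the PDF header marker."""
    # Tolerate leading whitespace before the '%PDF-' marker.
    return payload.lstrip()[:5] == b"%PDF-"
```

A caller could discard or retry any download for which looks_like_pdf() returns False instead of writing it to storage.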

Indonesia State Gazette Scraper

import requests


class IndonesiaGazetteScraper:
    """Scraper for Indonesian State Gazette (Berita Negara)."""

    BASE_URL = "https://ditjenpp.kemenkumham.go.id"

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_regulations(self, keyword=None, year=None, regulation_type=None):
        """Search Indonesian state gazette."""
        proxy = self.proxy_manager.get_proxy_for_country('ID')
        session = requests.Session()
        session.proxies = proxy
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Linux; Android 13)',
            'Accept-Language': 'id-ID,id;q=0.9,en;q=0.8'
        })

        params = {}
        if keyword:
            params['q'] = keyword
        if year:
            params['tahun'] = year
        if regulation_type:
            params['jenis'] = regulation_type

        response = session.get(
            f"{self.BASE_URL}/id/peraturan",
            params=params,
            timeout=30
        )
        # Fail loudly on 4xx/5xx instead of parsing an error page as results.
        response.raise_for_status()

        return self.parse_regulation_list(response.text)

    def parse_regulation_list(self, html):
        """Parse regulation listing page."""
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        regulations = []
        for item in soup.select('.regulation-item, .result-item'):
            reg = {
                'title': '',
                'number': '',
                'date': '',
                'type': '',
                'url': '',
                'country': 'Indonesia',
                'source': 'JDIH'
            }

            title_elem = item.select_one('.title, h3, h4')
            if title_elem:
                reg['title'] = title_elem.get_text(strip=True)
                link = title_elem.find('a')
                if link:
                    reg['url'] = link.get('href', '')

            number_elem = item.select_one('.number, .nomor')
            if number_elem:
                reg['number'] = number_elem.get_text(strip=True)

            regulations.append(reg)

        return regulations
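
The parser above leaves the 'date' field empty. If the listing exposes dates written with Indonesian month names, they need converting before they can go into a date column. The helper below is a sketch under that assumption; the function name is hypothetical and the "day month year" pattern should be checked against the live pages.

```python
import re
from datetime import date
from typing import Optional

INDONESIAN_MONTHS = {
    "januari": 1, "februari": 2, "maret": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "agustus": 8,
    "september": 9, "oktober": 10, "november": 11, "desember": 12,
}


def parse_indonesian_date(text: str) -> Optional[date]:
    """Parse dates such as '17 Agustus 2023'; return None if unrecognized."""
    match = re.search(r"(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})", text)
    if not match:
        return None
    day, month_name, year = match.groups()
    month = INDONESIAN_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Impossible calendar dates such as 31 Februari.
        return None
```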

Philippines Official Gazette Scraper

import requests
from bs4 import BeautifulSoup


class PhilippinesGazetteScraper:
    """Scraper for Philippines Official Gazette."""

    BASE_URL = "https://www.officialgazette.gov.ph"

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def _fetch_section(self, section, entry_type, page=1):
        """Fetch one listing page of a gazette section and parse it."""
        proxy = self.proxy_manager.get_proxy_for_country('PH')
        session = requests.Session()
        session.proxies = proxy

        response = session.get(
            f"{self.BASE_URL}/{section}/page/{page}",
            timeout=30
        )
        response.raise_for_status()

        return self.parse_gazette_entries(response.text, entry_type)

    def fetch_executive_orders(self, page=1):
        """Fetch executive orders from the Official Gazette."""
        return self._fetch_section('executive-orders', 'executive_order', page)

    def fetch_proclamations(self, page=1):
        """Fetch presidential proclamations."""
        return self._fetch_section('proclamations', 'proclamation', page)

    def fetch_administrative_orders(self, page=1):
        """Fetch administrative orders."""
        return self._fetch_section(
            'administrative-orders', 'administrative_order', page
        )

    def parse_gazette_entries(self, html, entry_type):
        """Parse a listing page into publication dicts.

        The CSS selectors are assumptions about the page markup and
        should be verified against the live site.
        """
        soup = BeautifulSoup(html, 'html.parser')

        entries = []
        for item in soup.select('article, .entry'):
            link = item.select_one('h2 a, h3 a, .entry-title a')
            if not link:
                continue
            date_elem = item.select_one('time, .entry-date')
            entries.append({
                'title': link.get_text(strip=True),
                'url': link.get('href', ''),
                'date': date_elem.get_text(strip=True) if date_elem else '',
                'type': entry_type,
                'country': 'Philippines',
                'source': 'Official Gazette'
            })

        return entries
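
Each fetch method above returns a single page. To walk a whole section, one option is a small generator that keeps requesting pages until one comes back empty, pausing between requests. This is a sketch only: the helper name, the page cap, and the two-second delay are assumptions, and a site that answers with an error past the last page would need the caller to handle that exception.

```python
import time
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], list], max_pages: int = 50,
               delay_seconds: float = 2.0) -> Iterator[dict]:
    """Yield entries page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        entries = fetch_page(page)
        if not entries:
            break
        yield from entries
        # Pause between pages to keep the request rate modest.
        time.sleep(delay_seconds)
```

With a scraper instance in hand, iter_pages(scraper.fetch_executive_orders) would iterate over executive orders across pages.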

Change Detection and Monitoring

Detecting New Publications

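import logging

logger = logging.getLogger(__name__)


class GazetteMonitor:
    """Monitor gazettes for new publications."""

    def __init__(self, scrapers, database, alert_service, classifier=None):
        self.scrapers = scrapers
        self.db = database
        self.alerts = alert_service
        # Any object with a classify(publication) method.
        self.classifier = classifier

    def check_all_gazettes(self):
        """Check all configured gazettes for new publications."""
        for country, scraper in self.scrapers.items():
            try:
                # Every scraper must expose fetch_recent_publications().
                # The Indonesia and Philippines scrapers need a thin
                # wrapper around their own fetch methods to qualify.
                publications = scraper.fetch_recent_publications()

                for pub in publications:
                    if not self.db.publication_exists(pub):
                        self.db.store_publication(pub)
                        self.process_new_publication(pub)

            except Exception:
                # Log with traceback and continue with the next country.
                logger.exception("Error checking %s gazette", country)

    def classify_publication(self, publication):
        """Classify a publication, or return [] if no classifier is set."""
        if self.classifier is None:
            return []
        return self.classifier.classify(publication)

    def process_new_publication(self, publication):
        """Process a newly detected gazette publication."""
        # Classify the publication
        classification = self.classify_publication(publication)

        # Check against alert subscriptions
        subscribers = self.db.get_matching_subscribers(classification)

        for subscriber in subscribers:
            self.alerts.send(
                recipient=subscriber,
                publication=publication,
                classification=classification
            )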
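
The monitor relies on db.publication_exists(pub), but the text does not say how two scraped records are recognized as the same publication. One possible approach is to derive a stable fingerprint from identifying fields and store it alongside each record. The function name and the choice of fields below are assumptions for illustration.

```python
import hashlib


def publication_fingerprint(publication: dict) -> str:
    """Build a stable key from fields that identify a publication."""
    parts = [
        publication.get("country", ""),
        publication.get("source", ""),
        # Prefer the URL; fall back to the title when no link was scraped.
        publication.get("url") or publication.get("title", ""),
    ]
    # Collapse runs of whitespace so cosmetic differences do not matter.
    normalized = "|".join(" ".join(part.split()) for part in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A database layer could then implement publication_exists() as a lookup on a uniquely indexed fingerprint column.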

Classification of Gazette Entries

class GazetteClassifier:
    """Classify gazette publications by type and subject matter."""

    CATEGORIES = {
        'legislation': ['act', 'law', 'undang-undang', 'bill'],
        'regulation': ['regulation', 'peraturan', 'rule', 'circular'],
        'executive': ['executive order', 'decree', 'keputusan presiden'],
        'trade': ['tariff', 'customs', 'import', 'export', 'bea cukai'],
        'taxation': ['tax', 'pajak', 'revenue', 'duty'],
        'corporate': ['company', 'incorporation', 'perseroan', 'winding up'],
        'land': ['land', 'tanah', 'acquisition', 'zoning'],
        'environment': ['environment', 'lingkungan', 'emission', 'pollution'],
        'labor': ['employment', 'labor', 'wage', 'ketenagakerjaan'],
        'finance': ['banking', 'insurance', 'securities', 'perbankan'],
    }

    def classify(self, publication):
        """Classify a gazette publication."""
        import re

        text = f"{publication.get('title', '')} {publication.get('description', '')}".lower()

        matches = []
        for category, keywords in self.CATEGORIES.items():
            # Match whole words (with an optional plural 's') so that
            # 'land' does not fire on 'Thailand' or 'act' on 'contract'.
            score = sum(
                1 for kw in keywords
                if re.search(rf"\b{re.escape(kw)}s?\b", text)
            )
            if score > 0:
                matches.append({'category': category, 'score': score})

        return sorted(matches, key=lambda x: x['score'], reverse=True)
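
The ranked list returned by classify() is what the monitor passes to get_matching_subscribers(). How that lookup works is not specified. A minimal in-memory sketch follows, assuming subscriptions are stored as a mapping from subscriber to the categories they follow; both the function name and that data shape are hypothetical.

```python
def matching_subscribers(classification, subscriptions, min_score=1):
    """Return subscribers whose categories overlap the classification.

    classification: list of {'category': str, 'score': int} dicts.
    subscriptions: mapping of subscriber -> iterable of category names.
    """
    hit_categories = {
        match["category"] for match in classification
        if match["score"] >= min_score
    }
    return sorted(
        subscriber for subscriber, wanted in subscriptions.items()
        if hit_categories & set(wanted)
    )
```

Raising min_score is one way to suppress alerts for publications that match a category on a single keyword only.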

Data Storage and Indexing

Database Schema

CREATE TABLE gazette_publications (
    id SERIAL PRIMARY KEY,
    country VARCHAR(50) NOT NULL,
    source VARCHAR(100) NOT NULL,
    publication_type VARCHAR(100),
    title TEXT NOT NULL,
    reference_number VARCHAR(200),
    publication_date DATE,
    effective_date DATE,
    url TEXT,
    pdf_url TEXT,
    full_text TEXT,
    categories JSONB,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_gazette_country ON gazette_publications(country);
CREATE INDEX idx_gazette_date ON gazette_publications(publication_date);
CREATE INDEX idx_gazette_type ON gazette_publications(publication_type);
CREATE INDEX idx_gazette_categories ON gazette_publications USING GIN(categories);
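
To load scraped records into this table, each publication dict has to be mapped onto the columns. The sketch below assumes a DB-API driver that uses %s placeholders (psycopg is one example) and shows only a subset of columns. The helper name and the 'publication_date' key are assumptions, since the scrapers above return a raw 'date' string that would need parsing first.

```python
import json

INSERT_SQL = """
INSERT INTO gazette_publications
    (country, source, publication_type, title, publication_date, url, categories)
VALUES (%s, %s, %s, %s, %s, %s, %s::jsonb)
"""


def to_row(publication: dict, classification: list) -> tuple:
    """Map a scraped publication and its classification to INSERT parameters."""
    return (
        publication["country"],
        publication["source"],
        publication.get("type"),
        publication["title"],
        publication.get("publication_date"),  # a datetime.date, or None
        publication.get("url") or None,       # store NULL rather than ''
        # JSONB column: serialize the category names as a JSON array.
        json.dumps([match["category"] for match in classification]),
    )
```

With a cursor from such a driver, cursor.execute(INSERT_SQL, to_row(pub, classification)) would insert one row.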

Full-Text Search

Index gazette publications with Elasticsearch or PostgreSQL full-text search so that specific topics, entities, or regulatory references can be found across all countries and time periods.
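
To make the idea concrete, the toy index below shows the core structure both of those engines build on: a mapping from each token to the publications that contain it. It is an in-memory stand-in for illustration, not a replacement for Elasticsearch or PostgreSQL, and its simple tokenizer will not segment languages written without spaces, such as Thai.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory index: token -> set of publication ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    @staticmethod
    def tokenize(text):
        """Lowercase the text and split it into word tokens."""
        return re.findall(r"\w+", text.lower())

    def add(self, pub_id, text):
        """Index one publication's text under its id."""
        for token in self.tokenize(text):
            self.postings[token].add(pub_id)

    def search(self, query):
        """Return ids of publications containing every query token."""
        tokens = self.tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result
```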

Practical Applications

Regulatory Compliance Teams

Legal and compliance teams use gazette monitoring to stay ahead of regulatory changes that affect their operations across multiple ASEAN jurisdictions.

Trade and Customs Professionals

Customs brokers and trade compliance specialists monitor gazette publications for changes to tariff schedules, import/export regulations, and trade agreements.

Real Estate and Property Professionals

Land acquisition notices, zoning changes, and development plans published in gazettes are essential intelligence for real estate investors and developers.

Corporate Legal Teams

Company registrations, name changes, bankruptcy notices, and winding-up orders published in gazettes are critical for due diligence and corporate governance.

DataResearchTools for Gazette Monitoring

DataResearchTools provides the proxy infrastructure needed for comprehensive gazette monitoring:

  • Six-country coverage with native mobile IPs in Singapore, Indonesia, Philippines, Thailand, Malaysia, and Vietnam
  • PDF download support with high-bandwidth connections for gazette documents
  • Reliable scheduling support for regular gazette checks
  • Language-appropriate routing for accessing non-English gazette portals
  • Scalable infrastructure for monitoring all ASEAN gazettes simultaneously

Our proxies are optimized for government website access, providing the reliability and authenticity needed for gazette monitoring operations.

Conclusion

Government gazettes are an authoritative but underutilized source of business intelligence. By systematically monitoring gazette publications across Southeast Asia with proxy-powered scraping, organizations can detect regulatory changes, track corporate events, and identify market opportunities before they become widely known.

DataResearchTools provides the foundation for reliable gazette monitoring across all ASEAN countries. Start with the gazettes most relevant to your business, build automated monitoring with appropriate classification and alerting, and expand your coverage to gain a comprehensive view of the official government record across the region.

