Scraping Government Gazette Publications Across Southeast Asia
Government gazettes are the official record of a country’s legal and administrative actions. They publish new legislation, regulations, government appointments, company registrations, land notices, bankruptcy proceedings, and other official announcements. For businesses, legal professionals, and researchers operating in Southeast Asia, gazette data is the authoritative primary record. News coverage and commercial databases are often derived from it, and they can lag behind it or omit items.
This guide covers how to systematically scrape government gazette publications in six ASEAN countries (Singapore, Indonesia, the Philippines, Thailand, Malaysia, and Vietnam) using proxy infrastructure.
What Government Gazettes Publish
Legal and Regulatory Content
- New acts of parliament and amendments
- Government regulations and ministerial decrees
- Executive orders and presidential decrees
- Subsidiary legislation and rules
Administrative Notices
- Government appointments and dismissals
- Formation and dissolution of government bodies
- Changes to administrative boundaries
- Public holiday declarations
Business and Commercial Notices
- Company incorporation and dissolution
- Trademark and patent publications
- Bankruptcy and winding-up orders
- Change of company names
Land and Property Notices
- Land acquisition notices
- Zoning changes and development plans
- Environmental impact assessment publications
- Mining and exploration permits
Tender and Procurement Notices
Some countries publish procurement notices through their official gazettes in addition to dedicated procurement portals.
Government Gazettes by Country
Singapore
Singapore Government Gazette (eGazette)
- URL: egazette.gov.sg
- Format: Searchable online database with PDF supplements
- Content: Bills, subsidiary legislation, government notifications, appointments
- Language: English
- Update frequency: Multiple times per week
Indonesia
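Berita Negara (State Bulletin)
- URL: ditjenpp.kemenkumham.go.id (the regulation database of the Directorate General of Legislation)
- Format: Online database with downloadable PDFs
- Content: Ministerial and agency regulations; the database also indexes laws, government regulations, and presidential decrees
- Language: Bahasa Indonesia
- Update frequency: Regular, tied to legislative calendar
Lembaran Negara (State Gazette)
- Contains enacted legislation
- Explanatory memoranda (elucidations) are published alongside it in the Supplement to the State Gazette (Tambahan Lembaran Negara)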
Philippines
Official Gazette of the Republic of the Philippines
- URL: officialgazette.gov.ph
- Format: Web articles and PDF publications
- Content: Executive orders, proclamations, administrative orders, legislation
- Language: English and Filipino
- Update frequency: Multiple times per week
Thailand
Royal Gazette (Ratchakitcha)
- URL: ratchakitcha.soc.go.th
- Format: Online database with PDF documents
- Content: Royal decrees, legislation, regulations, appointments
- Language: Thai
- Update frequency: Tied to royal and legislative calendar
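Thai government documents are dated in the Buddhist Era (B.E.), which runs 543 years ahead of the Gregorian calendar, and they may use Thai digits (๐ to ๙) with abbreviated Thai month names. A minimal sketch for normalizing such dates, assuming they arrive as whitespace-separated "day month year" text such as "๑๕ ม.ค. ๒๕๖๗"; the helper name `parse_thai_date` is hypothetical:

```python
from datetime import date

# Thai digits (U+0E50..U+0E59) mapped to ASCII 0..9.
THAI_DIGITS = str.maketrans('๐๑๒๓๔๕๖๗๘๙', '0123456789')

# Abbreviated Thai month names, January to December.
THAI_MONTHS = {
    'ม.ค.': 1, 'ก.พ.': 2, 'มี.ค.': 3, 'เม.ย.': 4, 'พ.ค.': 5, 'มิ.ย.': 6,
    'ก.ค.': 7, 'ส.ค.': 8, 'ก.ย.': 9, 'ต.ค.': 10, 'พ.ย.': 11, 'ธ.ค.': 12,
}

BE_OFFSET = 543  # Buddhist Era year = Gregorian year + 543


def parse_thai_date(text):
    """Parse 'day month year (B.E.)' into a datetime.date (hypothetical helper).

    Assumes three whitespace-separated parts with an abbreviated month name;
    raises ValueError for anything else.
    """
    parts = text.translate(THAI_DIGITS).split()
    if len(parts) != 3 or parts[1] not in THAI_MONTHS:
        raise ValueError(f"Unrecognised Thai date: {text!r}")
    day, month, be_year = int(parts[0]), THAI_MONTHS[parts[1]], int(parts[2])
    return date(be_year - BE_OFFSET, month, day)
```

Listings that spell out full month names, or put the year first, need extra cases.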
Malaysia
Federal Gazette
- URL: federalgazette.agc.gov.my
- Format: PDF publications
- Content: Acts, subsidiary legislation, government notifications
- Languages: Bahasa Malaysia and English
- Update frequency: Regular
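Items in the Malaysian Federal Gazette are commonly cited by a P.U. reference such as "P.U. (A) 123/2024", where the (A) series generally carries subsidiary legislation and the (B) series carries notices. A minimal sketch that pulls the series, number, and year out of such references; `parse_pu_reference` is hypothetical, and the pattern should be checked against real documents because spacing varies:

```python
import re

# Matches references such as "P.U. (A) 123/2024" or "P.U.(B) 45/2023".
PU_PATTERN = re.compile(r'P\.\s*U\.\s*\(\s*([AB])\s*\)\s*(\d+)\s*/\s*(\d{4})')


def parse_pu_reference(text):
    """Return (series, number, year) for the first P.U. reference, or None."""
    match = PU_PATTERN.search(text)
    if not match:
        return None
    series, number, year = match.groups()
    return series, int(number), int(year)
```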
Vietnam
Official Gazette (Cong Bao)
- URL: congbao.chinhphu.vn
- Format: Online database
- Content: Laws, decrees, decisions, circulars
- Language: Vietnamese
- Update frequency: Regular
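Vietnamese legal documents carry a symbol of the form number/year/type-issuer, for example 15/2024/NĐ-CP for a government decree (NĐ = nghị định, decree; CP = Chính phủ, the Government) or 01/2021/TT-BTC for a Ministry of Finance circular, while laws use a shorter form such as 59/2020/QH14 (QH = Quốc hội, the National Assembly). A minimal sketch for splitting these symbols into fields; `parse_vn_document_number` is hypothetical:

```python
import re

# number / year / symbol, e.g. "15/2024/NĐ-CP", "01/2021/TT-BTC", "59/2020/QH14".
# \w is Unicode-aware for str patterns, so it also matches letters such as Đ.
VN_DOC_PATTERN = re.compile(r'\b(\d+)/(\d{4})/([\w-]+)')


def parse_vn_document_number(text):
    """Split the first Vietnamese document symbol into parts, or return None."""
    match = VN_DOC_PATTERN.search(text)
    if not match:
        return None
    number, year, symbol = match.groups()
    doc_type, _, issuer = symbol.partition('-')
    if not issuer:
        # Laws carry only the issuing body, e.g. "QH14".
        doc_type, issuer = '', symbol
    return {'number': int(number), 'year': int(year),
            'type': doc_type, 'issuer': issuer}
```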
Technical Implementation
Proxy Configuration for Gazette Scraping
```python
class GazetteProxyConfig:
    """Proxy configuration for gazette scraping across ASEAN."""

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        # Per-gazette settings (how they are honoured depends on the proxy manager):
        #   url          - base URL of the gazette site
        #   country      - ISO 3166-1 alpha-2 code for the proxy exit location
        #   session_type - 'sticky' keeps one exit IP across a session (useful
        #                  when the site ties state to cookies); 'rotating'
        #                  may change the exit IP per request
        #   language     - Accept-Language header value to send
        self.gazette_configs = {
            'singapore': {
                'url': 'https://egazette.gov.sg',
                'country': 'SG',
                'session_type': 'sticky',
                'language': 'en-SG,en;q=0.9',
            },
            'indonesia': {
                'url': 'https://ditjenpp.kemenkumham.go.id',
                'country': 'ID',
                'session_type': 'rotating',
                'language': 'id-ID,id;q=0.9',
            },
            'philippines': {
                'url': 'https://www.officialgazette.gov.ph',
                'country': 'PH',
                'session_type': 'rotating',
                'language': 'en-PH,en;q=0.9',
            },
            'thailand': {
                'url': 'https://ratchakitcha.soc.go.th',
                'country': 'TH',
                'session_type': 'rotating',
                'language': 'th-TH,th;q=0.9',
            },
            'malaysia': {
                'url': 'https://federalgazette.agc.gov.my',
                'country': 'MY',
                'session_type': 'rotating',
                'language': 'ms-MY,ms;q=0.9,en;q=0.8',
            },
            'vietnam': {
                'url': 'https://congbao.chinhphu.vn',
                'country': 'VN',
                'session_type': 'rotating',
                'language': 'vi-VN,vi;q=0.9',
            },
        }

    def get_config(self, country_name):
        """Return the settings for a country name (case-insensitive), or None."""
        return self.gazette_configs.get(country_name.lower())
```
Singapore eGazette Scraper
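The scrapers in this guide call `self.proxy_manager.get_proxy_for_country(code)` and hand the result to requests as its proxies mapping, but the proxy manager itself is not shown. A minimal stand-in, assuming a proxy gateway that picks the exit country (and an optional sticky session) from the proxy username; the class name, the gateway host and port, and the `user-country-xx-session-id` username format are all hypothetical and differ between providers:

```python
import uuid
from urllib.parse import quote


class SimpleProxyManager:
    """Hypothetical proxy manager returning requests-style proxies dicts."""

    def __init__(self, username, password, gateway='gateway.example.com:8000'):
        # All three values are placeholders; substitute your provider's.
        self.username = username
        self.password = password
        self.gateway = gateway
        self._sticky_sessions = {}  # country code -> session id

    def get_proxy_for_country(self, country_code, sticky=False):
        """Return {'http': url, 'https': url} routed through country_code."""
        user = f"{self.username}-country-{country_code.lower()}"
        if sticky:
            # Reuse one session id per country so the exit IP stays the same.
            session_id = self._sticky_sessions.setdefault(
                country_code, uuid.uuid4().hex[:8])
            user += f"-session-{session_id}"
        url = f"http://{quote(user)}:{quote(self.password)}@{self.gateway}"
        # requests sends HTTPS traffic through the same HTTP proxy via CONNECT.
        return {'http': url, 'https': url}
```

To honour a gazette's `session_type` setting, a caller could pass `sticky=(config['session_type'] == 'sticky')`.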
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


class SingaporeGazetteScraper:
    """Scraper for Singapore Government Gazette."""

    BASE_URL = "https://egazette.gov.sg"

    def __init__(self, proxy_manager):
        # get_proxy_for_country() is expected to return a requests-style
        # proxies dict, e.g. {'http': ..., 'https': ...}.
        self.proxy_manager = proxy_manager

    def fetch_recent_publications(self, year=None, gazette_type=None):
        """Fetch recent gazette publications."""
        session = requests.Session()
        session.proxies.update(self.proxy_manager.get_proxy_for_country('SG'))
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
            'Accept-Language': 'en-SG,en;q=0.9',
        })

        # The 'year' and 'type' parameter names are illustrative; check the
        # live site's search form for the real ones.
        params = {}
        if year:
            params['year'] = year
        if gazette_type:
            params['type'] = gazette_type

        response = session.get(f"{self.BASE_URL}/", params=params, timeout=30)
        response.raise_for_status()  # don't parse an error page as a listing
        return self.parse_gazette_list(response.text)

    def parse_gazette_list(self, html):
        """Parse gazette listing page."""
        soup = BeautifulSoup(html, 'html.parser')
        publications = []
        # Selectors cover card- and table-style listings; adjust to the live markup.
        for item in soup.select('.gazette-item, .listing-item, tr'):
            pub = self._extract_publication(item)
            if pub:
                publications.append(pub)
        return publications

    def _extract_publication(self, element):
        """Extract publication data from a listing element."""
        title_elem = element.select_one('a, .title')
        date_elem = element.select_one('.date, td:nth-child(2)')
        if not title_elem:
            return None  # e.g. table header rows
        href = title_elem.get('href', '')
        return {
            'title': title_elem.get_text(strip=True),
            # Resolve relative links against the site root.
            'url': urljoin(self.BASE_URL + '/', href) if href else '',
            'date': date_elem.get_text(strip=True) if date_elem else '',
            'country': 'Singapore',
            'source': 'eGazette',
        }

    def download_gazette_pdf(self, pdf_url):
        """Download a gazette publication PDF; return its bytes or None."""
        proxy = self.proxy_manager.get_proxy_for_country('SG')
        # No stream=True: .content reads the whole body into memory anyway.
        response = requests.get(pdf_url, proxies=proxy, timeout=120)
        return response.content if response.status_code == 200 else None
```
Indonesia State Gazette Scraper
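Indonesian regulation titles conventionally spell out the type, number, and year, as in "Peraturan Pemerintah Nomor 5 Tahun 2021 tentang ..." ("Nomor" = number, "Tahun" = year, "tentang" = concerning). When a listing has no separate number element, those fields can be recovered from the title. A minimal sketch covering only that common form; `parse_regulation_title` is hypothetical:

```python
import re

# "<type> Nomor <number> Tahun <year> [tentang <subject>]"
TITLE_PATTERN = re.compile(
    r'^(?P<type>.+?)\s+Nomor\s+(?P<number>[\w./-]+)\s+Tahun\s+(?P<year>\d{4})'
    r'(?:\s+tentang\s+(?P<subject>.+))?$',
    re.IGNORECASE,
)


def parse_regulation_title(title):
    """Split an Indonesian regulation title into parts, or return None."""
    match = TITLE_PATTERN.match(title.strip())
    if not match:
        return None
    parts = match.groupdict()  # 'subject' is None when there is no "tentang"
    parts['year'] = int(parts['year'])
    return parts
```

Titles with trailing text that is not introduced by "tentang" return None, so callers should fall back to the raw title.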
```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


class IndonesiaGazetteScraper:
    """Scraper for Indonesian regulations (Berita Negara / Lembaran Negara)."""

    BASE_URL = "https://ditjenpp.kemenkumham.go.id"

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_regulations(self, keyword=None, year=None, regulation_type=None):
        """Search the regulation database."""
        session = requests.Session()
        session.proxies.update(self.proxy_manager.get_proxy_for_country('ID'))
        session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Linux; Android 13)',
            'Accept-Language': 'id-ID,id;q=0.9,en;q=0.8',
        })

        # Path and parameter names ('tahun' = year, 'jenis' = type) are
        # illustrative; confirm them against the live search form.
        params = {}
        if keyword:
            params['q'] = keyword
        if year:
            params['tahun'] = year
        if regulation_type:
            params['jenis'] = regulation_type

        response = session.get(f"{self.BASE_URL}/id/peraturan", params=params, timeout=30)
        response.raise_for_status()
        return self.parse_regulation_list(response.text)

    def parse_regulation_list(self, html):
        """Parse regulation listing page."""
        soup = BeautifulSoup(html, 'html.parser')
        regulations = []
        for item in soup.select('.regulation-item, .result-item'):
            title_elem = item.select_one('.title, h3, h4')
            if not title_elem:
                continue  # skip items without a title rather than storing blanks
            reg = {
                'title': title_elem.get_text(strip=True),
                'number': '',
                'date': '',
                'type': '',
                'url': '',
                'country': 'Indonesia',
                'source': 'JDIH',
            }
            # The title element may itself be the link or may wrap one.
            link = title_elem if title_elem.name == 'a' else title_elem.find('a')
            if link and link.get('href'):
                reg['url'] = urljoin(self.BASE_URL + '/', link['href'])
            number_elem = item.select_one('.number, .nomor')  # 'nomor' = number
            if number_elem:
                reg['number'] = number_elem.get_text(strip=True)
            regulations.append(reg)
        return regulations
```
Philippines Official Gazette Scraper
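Each listing method in the Philippine scraper takes a `page` number, so a backfill or catch-up run has to walk pages until the listing runs out. A minimal sketch of that loop, assuming an empty result list marks the last page; `iter_pages` is a hypothetical helper and the two-second delay is an arbitrary politeness default:

```python
import time


def iter_pages(fetch_page, max_pages=50, delay_seconds=2.0, stop_at_url=None):
    """Yield entries from fetch_page(1), fetch_page(2), ... until a page is empty.

    fetch_page: callable taking a page number and returning a list of dicts,
    e.g. a bound fetch_executive_orders method.
    stop_at_url: stop once an already-seen entry URL appears, which keeps
    routine catch-up runs to the first page or two.
    """
    for page in range(1, max_pages + 1):
        entries = fetch_page(page)
        if not entries:
            return
        for entry in entries:
            if stop_at_url is not None and entry.get('url') == stop_at_url:
                return
            yield entry
        if page < max_pages:
            time.sleep(delay_seconds)  # space out requests to the site
```

If the site answers an out-of-range page with HTTP 404 instead of an empty listing, `fetch_page` should catch that error and return an empty list.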
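```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin


class PhilippinesGazetteScraper:
    """Scraper for Philippines Official Gazette."""

    BASE_URL = "https://www.officialgazette.gov.ph"

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def _fetch_listing(self, section, entry_type, page=1):
        """Fetch one page of a listing section such as 'executive-orders'."""
        session = requests.Session()
        session.proxies.update(self.proxy_manager.get_proxy_for_country('PH'))
        response = session.get(f"{self.BASE_URL}/{section}/page/{page}", timeout=30)
        response.raise_for_status()
        return self.parse_gazette_entries(response.text, entry_type)

    def fetch_executive_orders(self, page=1):
        """Fetch executive orders from the Official Gazette."""
        return self._fetch_listing('executive-orders', 'executive_order', page)

    def fetch_proclamations(self, page=1):
        """Fetch presidential proclamations."""
        return self._fetch_listing('proclamations', 'proclamation', page)

    def fetch_administrative_orders(self, page=1):
        """Fetch administrative orders."""
        return self._fetch_listing('administrative-orders', 'administrative_order', page)

    def parse_gazette_entries(self, html, entry_type):
        """Parse a listing page into publication dicts.

        The 'article' / heading-link / <time> selectors assume a blog-style
        archive layout; check them against the live site.
        """
        soup = BeautifulSoup(html, 'html.parser')
        entries = []
        for article in soup.select('article'):
            link = article.select_one('h1 a, h2 a, h3 a')
            if not link:
                continue
            time_elem = article.find('time')
            entries.append({
                'title': link.get_text(strip=True),
                'url': urljoin(self.BASE_URL + '/', link.get('href', '')),
                # Prefer the machine-readable datetime attribute when present.
                'date': (time_elem.get('datetime') or time_elem.get_text(strip=True))
                        if time_elem else '',
                'type': entry_type,
                'country': 'Philippines',
                'source': 'Official Gazette',
            })
        return entries
```
Change Detection and Monitoring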
Detecting New Publications
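The monitor delegates the "have we seen this?" check to `db.publication_exists()`. One way to implement it, assumed here rather than taken from the guide, is to derive a stable fingerprint from the fields that identify a notice and store it in a unique column. Listing pages can differ in whitespace between fetches, so fields are normalized before hashing. `publication_fingerprint` and the in-memory `SeenStore` are hypothetical:

```python
import hashlib


def publication_fingerprint(pub):
    """Return a stable SHA-256 hex digest identifying a publication."""
    key_fields = [pub.get('country', ''), pub.get('source', '')]
    # Prefer the URL when present; otherwise fall back to title + date.
    if pub.get('url'):
        key_fields.append(pub['url'])
    else:
        key_fields += [pub.get('title', ''), pub.get('date', '')]
    # Collapse runs of whitespace so reformatting alone does not create a "new" item;
    # join with a unit separator so field boundaries stay unambiguous.
    normalized = '\x1f'.join(' '.join(str(f).split()) for f in key_fields)
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()


class SeenStore:
    """In-memory stand-in for publication_exists / store_publication."""

    def __init__(self):
        self._seen = set()

    def publication_exists(self, pub):
        return publication_fingerprint(pub) in self._seen

    def store_publication(self, pub):
        self._seen.add(publication_fingerprint(pub))
```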
```python
import logging

logger = logging.getLogger(__name__)


class GazetteMonitor:
    """Monitor gazettes for new publications."""

    def __init__(self, scrapers, database, alert_service, classifier):
        # scrapers: {country: scraper}. Each scraper must expose
        # fetch_recent_publications(); wrap search_regulations(),
        # fetch_executive_orders(), etc. for scrapers that lack one.
        # database and alert_service are application-specific objects.
        self.scrapers = scrapers
        self.db = database
        self.alerts = alert_service
        self.classifier = classifier  # e.g. a GazetteClassifier instance

    def check_all_gazettes(self):
        """Check all configured gazettes for new publications."""
        for country, scraper in self.scrapers.items():
            try:
                for pub in scraper.fetch_recent_publications():
                    if not self.db.publication_exists(pub):
                        self.db.store_publication(pub)
                        self.process_new_publication(pub)
            except Exception:
                # Log with traceback and move on so one failing gazette
                # does not stop the others.
                logger.exception("Error checking %s gazette", country)

    def process_new_publication(self, publication):
        """Process a newly detected gazette publication."""
        # Classify the publication
        classification = self.classifier.classify(publication)
        # Notify every subscriber whose alert subscription matches
        for subscriber in self.db.get_matching_subscribers(classification):
            self.alerts.send(
                recipient=subscriber,
                publication=publication,
                classification=classification,
            )
```
Classification of Gazette Entries
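The classifier's keyword table covers English and Bahasa Indonesia only, so Thai and Vietnamese titles will score zero. A sketch of additional keyword sets, assuming plain substring matching (Thai is written without spaces between words, so word-boundary matching would miss most hits) and Unicode NFC normalization (Vietnamese text can arrive precomposed or decomposed). The keyword list and `match_extra_categories` are hypothetical starting points:

```python
import unicodedata

# Hypothetical extra keywords (Vietnamese, Thai) per existing category.
EXTRA_KEYWORDS = {
    'taxation': ['thuế', 'ภาษี'],
    'land': ['đất đai', 'ที่ดิน'],
    'trade': ['hải quan', 'ศุลกากร'],
    'labor': ['lao động', 'แรงงาน'],
    'finance': ['ngân hàng', 'ธนาคาร'],
    'environment': ['môi trường', 'สิ่งแวดล้อม'],
}


def match_extra_categories(text):
    """Return categories whose Thai/Vietnamese keywords occur in text."""
    # Normalise both sides to NFC so composed and decomposed forms compare equal.
    text = unicodedata.normalize('NFC', text).lower()
    found = []
    for category, keywords in EXTRA_KEYWORDS.items():
        if any(unicodedata.normalize('NFC', kw) in text for kw in keywords):
            found.append(category)
    return found
```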
```python
import re


class GazetteClassifier:
    """Classify gazette publications by type and subject matter."""

    # Keywords in English and Bahasa Indonesia, matched against lowercased text.
    CATEGORIES = {
        'legislation': ['act', 'law', 'undang-undang', 'bill'],
        'regulation': ['regulation', 'peraturan', 'rule', 'circular'],
        'executive': ['executive order', 'decree', 'keputusan presiden'],
        'trade': ['tariff', 'customs', 'import', 'export', 'bea cukai'],
        'taxation': ['tax', 'pajak', 'revenue', 'duty'],
        'corporate': ['company', 'incorporation', 'perseroan', 'winding up'],
        'land': ['land', 'tanah', 'acquisition', 'zoning'],
        'environment': ['environment', 'lingkungan', 'emission', 'pollution'],
        'labor': ['employment', 'labor', 'wage', 'ketenagakerjaan'],
        'finance': ['banking', 'insurance', 'securities', 'perbankan'],
    }

    def __init__(self):
        # Whole-word matching with an optional plural "s"/"es", so 'act' does
        # not fire on 'contract' and 'land' does not fire on 'island'.
        self._patterns = {
            category: [re.compile(r'\b' + re.escape(kw) + r'(?:s|es)?\b')
                       for kw in keywords]
            for category, keywords in self.CATEGORIES.items()
        }

    def classify(self, publication):
        """Return matching categories with keyword-hit scores, best first."""
        text = f"{publication.get('title', '')} {publication.get('description', '')}".lower()
        matches = []
        for category, patterns in self._patterns.items():
            score = sum(1 for pattern in patterns if pattern.search(text))
            if score > 0:
                matches.append({'category': category, 'score': score})
        return sorted(matches, key=lambda x: x['score'], reverse=True)
```
Data Storage and Indexing
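The schema stores `publication_date` as a DATE, while the scrapers return dates as display strings in whatever format each site uses. A minimal normalization step before insertion, assuming a small list of candidate formats that you extend per source; `normalize_date` and the format list are hypothetical, and `%B`/`%b` assume English month names in the default C locale:

```python
from datetime import datetime

# Candidate formats; examples only. '%d/%m/%Y' assumes day-first ordering.
DATE_FORMATS = [
    '%Y-%m-%d',
    '%Y-%m-%dT%H:%M:%S%z',  # e.g. a <time datetime="..."> attribute
    '%d %B %Y',
    '%d %b %Y',
    '%B %d, %Y',
    '%d/%m/%Y',
]


def normalize_date(text):
    """Return an ISO 'YYYY-MM-DD' string, or None if no format matches."""
    text = ' '.join(text.split())  # collapse stray whitespace
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning None rather than guessing lets unparsed dates be stored as NULL and reviewed later.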
Database Schema
```sql
CREATE TABLE gazette_publications (
    id SERIAL PRIMARY KEY,
    country VARCHAR(50) NOT NULL,
    source VARCHAR(100) NOT NULL,
    publication_type VARCHAR(100),
    title TEXT NOT NULL,
    reference_number VARCHAR(200),
    publication_date DATE,
    effective_date DATE,
    url TEXT,
    pdf_url TEXT,
    full_text TEXT,
    categories JSONB,
    metadata JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_gazette_country ON gazette_publications(country);
CREATE INDEX idx_gazette_date ON gazette_publications(publication_date);
CREATE INDEX idx_gazette_type ON gazette_publications(publication_type);
CREATE INDEX idx_gazette_categories ON gazette_publications USING GIN(categories);
```
Full-Text Search
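Index the stored publications in Elasticsearch or PostgreSQL full-text search so you can find specific topics, entities, or regulatory references across all countries and time periods. Because the corpus spans English, Filipino, Bahasa Indonesia, Bahasa Malaysia, Thai, and Vietnamese, configure text analysis per language; Thai in particular is written without spaces between words, so whitespace tokenization does not split it correctly.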
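Elasticsearch and PostgreSQL handle tokenization, stemming, and ranking for you. To show the idea underneath both, here is a toy in-memory inverted index, a swapped-in illustration rather than either product's API, that maps each lowercase token to the IDs of publications containing it and answers queries requiring all terms. Its `\w+` tokenizer only suits space-delimited languages, not Thai:

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r'\w+')


class InvertedIndex:
    """Toy full-text index: token -> set of publication ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, pub_id, text):
        """Index every token of text under pub_id."""
        for token in TOKEN_RE.findall(text.lower()):
            self._postings[token].add(pub_id)

    def search(self, query):
        """Return sorted ids of publications containing every query token."""
        tokens = TOKEN_RE.findall(query.lower())
        if not tokens:
            return []
        # Intersect posting sets, starting from the smallest.
        postings = sorted((self._postings.get(t, set()) for t in tokens), key=len)
        result = set(postings[0])
        for other in postings[1:]:
            result &= other
        return sorted(result)
```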
Practical Applications
Regulatory Compliance Teams
Legal and compliance teams use gazette monitoring to stay ahead of regulatory changes that affect their operations across multiple ASEAN jurisdictions.
Trade and Customs Professionals
Customs brokers and trade compliance specialists monitor gazette publications for changes to tariff schedules, import/export regulations, and trade agreements.
Real Estate and Property Professionals
Land acquisition notices, zoning changes, and development plans published in gazettes are essential intelligence for real estate investors and developers.
Corporate Legal Teams
Company registrations, name changes, bankruptcy notices, and winding-up orders published in gazettes are critical for due diligence and corporate governance.
DataResearchTools for Gazette Monitoring
DataResearchTools provides the proxy infrastructure needed for comprehensive gazette monitoring:
- Six-country coverage with native mobile IPs in Singapore, Indonesia, Philippines, Thailand, Malaysia, and Vietnam
- PDF download support with high-bandwidth connections for gazette documents
- Reliable scheduling support for regular gazette checks
- Language-appropriate routing for accessing non-English gazette portals
- Scalable infrastructure for monitoring all ASEAN gazettes simultaneously
Our proxies are optimized for government website access, providing the reliability and authenticity needed for gazette monitoring operations.
Conclusion
Government gazettes are an authoritative but underutilized source of business intelligence. By systematically monitoring gazette publications across Southeast Asia with proxy-powered scraping, organizations can detect regulatory changes, track corporate events, and identify market opportunities before they become widely known.
DataResearchTools provides the foundation for reliable gazette monitoring across all ASEAN countries. Start with the gazettes most relevant to your business, build automated monitoring with appropriate classification and alerting, and expand your coverage to gain a comprehensive view of the official government record across the region.
Related Reading
- Best Proxies for Government Data Scraping
- Building a Legislative Bill Tracker with Proxy-Powered Scraping
- Building a Government Contract Intelligence System with Proxies
- How AI + Proxies Are Transforming Drug Discovery Data Pipelines
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix