How to Scrape Government Tender Portals with Rotating Proxies
Government tender portals publish thousands of procurement opportunities every day across Southeast Asia alone. Companies that can systematically monitor these portals gain a significant competitive advantage in the B2G (business-to-government) market. However, scraping tender portals at scale requires rotating proxy infrastructure to avoid blocks and maintain consistent data collection.
This guide walks you through the entire process of setting up a tender portal scraping system powered by rotating proxies.
Understanding Government Tender Portals
Government tender portals are online platforms where public agencies publish procurement notices, invitations to bid, requests for proposals, and contract awards. In Southeast Asia, major portals include:
- GeBIZ (Singapore) – The centralized procurement portal for Singapore government agencies
- LPSE (Indonesia) – Layanan Pengadaan Secara Elektronik, Indonesia’s electronic procurement system
- PhilGEPS (Philippines) – The Philippine Government Electronic Procurement System
- GPROCURE (Thailand) – Thailand’s government procurement platform
- ePerolehan (Malaysia) – Malaysia’s electronic procurement system
- muasamcong (Vietnam) – Vietnam’s national procurement network
Each portal has its own structure, anti-bot measures, and data formats. A successful scraping strategy must account for these differences while maintaining a unified data pipeline.
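One way to keep the pipeline unified despite those differences is a small portal registry that maps each portal to its country and a per-portal parser adapter. A minimal sketch — the adapter class names here are hypothetical placeholders, not anything defined by the portals themselves:

```python
# Illustrative registry: portal names and countries come from the list above;
# the "adapter" values are hypothetical parser class names.
PORTALS = {
    "GeBIZ":      {"country": "SG", "adapter": "GeBIZParser"},
    "LPSE":       {"country": "ID", "adapter": "LPSEParser"},
    "PhilGEPS":   {"country": "PH", "adapter": "PhilGEPSParser"},
    "GPROCURE":   {"country": "TH", "adapter": "GprocureParser"},
    "ePerolehan": {"country": "MY", "adapter": "EPerolehanParser"},
    "muasamcong": {"country": "VN", "adapter": "MuasamcongParser"},
}

def get_adapter_name(portal: str) -> str:
    """Look up the parser adapter registered for a portal."""
    return PORTALS[portal]["adapter"]
```

New portals then only require a new registry entry plus one adapter module; the scheduler and storage layers stay untouched.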
Why Rotating Proxies Are Essential for Tender Scraping
Volume of Data
A typical ASEAN-wide procurement monitoring operation needs to check hundreds of pages across multiple portals several times per day. This volume of requests from a single IP address will trigger rate limits within minutes.
Geographic Authentication
Tender portals often serve different content or impose different restrictions based on the visitor’s geographic location. Using proxies with IP addresses from the target country ensures you see the same content as local businesses.
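When a provider supports geo-targeting, the country pin typically lives in the proxy URL or credentials. A minimal sketch — the `country` query parameter is an assumption for illustration, so check your provider's actual geo-targeting syntax:

```python
def country_proxy(base: str, country_code: str) -> str:
    """Build a proxy URL pinned to a specific exit country.
    NOTE: the `country` query parameter is assumed; real providers
    encode geo-targeting differently (often in the username)."""
    return f"{base}?country={country_code.lower()}"

# e.g. route Singapore portal traffic through Singapore IPs
sg_proxy = country_proxy("http://user:pass@sea.dataresearchtools.com:8080", "SG")
```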
Continuous Monitoring Requirements
Procurement opportunities have strict deadlines. Missing a tender because your scraper was blocked for hours means losing potential business. Rotating proxies ensure that if one IP is blocked, traffic seamlessly shifts to another.
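That failover can be made explicit in the client: treat block and throttle responses as retryable, reissue the request through a fresh IP, and back off between attempts. A provider-agnostic sketch of just the decision logic:

```python
import random

# Status codes that usually indicate a block or throttle, not a real error
RETRYABLE = {403, 407, 429, 502, 503}

def should_retry(status_code: int, attempt: int, max_attempts: int = 4) -> bool:
    """Retry blocked/throttled responses on a fresh IP, up to a limit."""
    return status_code in RETRYABLE and attempt < max_attempts

def backoff_seconds(attempt: int) -> float:
    """Exponential backoff with jitter: ~2s, 4s, 8s... plus noise."""
    return (2 ** attempt) + random.uniform(0, 1)
```

With per-request rotation enabled, simply reissuing the request already lands on a different IP, so the retry loop doubles as the failover mechanism.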
Anti-Bot Protections
Government portals increasingly employ Cloudflare, Akamai, and custom anti-bot solutions. Rotating proxies distribute your requests across many IP addresses, making your traffic pattern indistinguishable from normal user behavior.
Setting Up Your Tender Portal Scraping Infrastructure
Step 1: Choose the Right Proxy Type
For tender portal scraping, mobile proxies deliver the best results. Government websites trust mobile IP addresses because blocking them would affect thousands of legitimate users accessing public information on their phones.
DataResearchTools offers mobile proxy plans with native carrier IPs across all major ASEAN countries. This geographic authenticity is crucial for accessing local tender portals without restrictions.
Step 2: Configure Proxy Rotation
The rotation strategy depends on the target portal’s behavior:
For portals with session-based access (GeBIZ, LPSE): Use sticky sessions that maintain the same IP for 10-30 minutes. This allows you to navigate multi-page listings and detail pages without triggering session validation errors.
```python
# Sticky session configuration for session-based portals
proxy_url = "http://user:pass@sea.dataresearchtools.com:8080"
# Session ID keeps the same IP for the duration (1800 s = 30 min)
session_proxy = f"{proxy_url}?session=tender_session_001&duration=1800"
```
For portals with simple pagination (PhilGEPS): Use per-request rotation to distribute requests across many IPs.
```python
# Per-request rotation for simple pagination
proxy_url = "http://user:pass@sea.dataresearchtools.com:8080"
# Each request gets a new IP
rotating_proxy = f"{proxy_url}?rotate=true"
```
Step 3: Build the Scraper Architecture
A robust tender scraping system has several components:
```
┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Scheduler  │────▶│ Proxy Router │────▶│ Target Portal │
│ (Cron/APQ)  │     │ (DataResearch│     │ (GeBIZ/LPSE/  │
│             │     │    Tools)    │     │   PhilGEPS)   │
└─────────────┘     └──────────────┘     └───────────────┘
       │                                         │
       │            ┌──────────────┐             │
       └───────────▶│    Parser    │◀────────────┘
                    │ (per-portal  │
                    │   adapter)   │
                    └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   Database   │
                    │  (tenders,   │
                    │   awards)    │
                    └──────────────┘
```
Step 4: Implement Per-Portal Parsers
Each tender portal has a unique HTML structure. Create dedicated parser modules for each portal:
```python
import requests
from bs4 import BeautifulSoup

class TenderParser:
    """Base class for tender portal parsers."""

    def __init__(self, proxy_config):
        self.session = requests.Session()
        self.session.proxies = proxy_config
        self.session.headers.update({
            'User-Agent': self.get_random_user_agent(),
            'Accept-Language': self.get_locale_header()
        })

    def fetch_listing_page(self, url, page_number):
        """Fetch a page of tender listings."""
        raise NotImplementedError

    def parse_tender_summary(self, html):
        """Extract tender summaries from a listing page."""
        raise NotImplementedError

    def fetch_tender_detail(self, tender_id):
        """Fetch full tender details."""
        raise NotImplementedError

    def parse_tender_detail(self, html):
        """Extract structured data from a tender detail page."""
        raise NotImplementedError


class GeBIZParser(TenderParser):
    """Parser for Singapore's GeBIZ portal."""

    BASE_URL = "https://www.gebiz.gov.sg"

    def fetch_listing_page(self, url, page_number):
        response = self.session.get(
            f"{self.BASE_URL}/ptn/opportunity/BOListing.xhtml",
            params={"page": page_number},
            timeout=30
        )
        return response.text

    def parse_tender_summary(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        tenders = []
        for row in soup.select('.commandLink_BOLD'):
            tenders.append({
                'title': row.get_text(strip=True),
                'reference': row.get('id', ''),
                'portal': 'GeBIZ',
                'country': 'Singapore'
            })
        return tenders
```
Step 5: Handle Common Scraping Challenges
Dynamic JavaScript Rendering: Some tender portals load data via AJAX calls. Use a headless browser with proxy support:
```python
from playwright.sync_api import sync_playwright

def scrape_dynamic_portal(url, proxy_config):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            proxy={
                "server": proxy_config["server"],
                "username": proxy_config["username"],
                "password": proxy_config["password"]
            }
        )
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        content = page.content()
        browser.close()
        return content
```
Pagination Handling: Tender portals use various pagination methods. Implement adaptive pagination:
```python
import random
import time

def scrape_all_pages(parser, base_url, max_pages=100):
    all_tenders = []
    for page in range(1, max_pages + 1):
        html = parser.fetch_listing_page(base_url, page)
        tenders = parser.parse_tender_summary(html)
        if not tenders:
            break  # An empty page means we have run out of listings
        all_tenders.extend(tenders)
        time.sleep(random.uniform(2, 5))  # Polite delay between pages
    return all_tenders
```
Date Range Filtering: Most portals allow filtering by date range. Use this to limit your scraping scope and reduce request volume:
```python
from datetime import datetime, timedelta

def get_recent_tenders(parser, base_url, days_back=7):
    # Assumes the parser's fetch_listing_page accepts date-filter kwargs,
    # which portal-specific subclasses must implement
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)
    return parser.fetch_listing_page(
        base_url,
        page_number=1,
        date_from=start_date.strftime("%Y-%m-%d"),
        date_to=end_date.strftime("%Y-%m-%d")
    )
```
Optimizing Proxy Usage for Cost Efficiency
Proxy bandwidth costs money. Optimize your scraping to minimize unnecessary requests:
Cache Previously Scraped Data
Store tender IDs you have already scraped and skip them on subsequent runs. Only fetch detail pages for new tenders.
```python
import redis

cache = redis.Redis(host='localhost', port=6379, db=0)

def is_new_tender(tender_id):
    return not cache.exists(f"tender:{tender_id}")

def mark_as_scraped(tender_id):
    # Remember the tender for 30 days
    cache.set(f"tender:{tender_id}", "scraped", ex=86400 * 30)
```
Use Conditional Requests
Check for content changes before downloading full pages:
```python
import requests

def fetch_with_etag(url, proxy_config, stored_etag=None):
    headers = {}
    if stored_etag:
        headers['If-None-Match'] = stored_etag
    response = requests.get(
        url, proxies=proxy_config, headers=headers
    )
    if response.status_code == 304:
        return None  # Content unchanged
    return {
        'content': response.text,
        'etag': response.headers.get('ETag')
    }
```
Schedule Scraping During Off-Peak Hours
Government portals experience peak traffic during business hours. Schedule your scraping runs during evenings and weekends for faster response times and lower block rates.
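A small helper can gate runs to that window. The exact boundaries below (weekends, and 20:00–06:00 portal-local time) are an assumption to tune per portal:

```python
from datetime import datetime

def is_off_peak(now: datetime) -> bool:
    """True on weekends and during evening/night hours.
    The window is an assumption; adjust per portal and time zone."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return now.hour >= 20 or now.hour < 6
```

A scheduler can call this before each run and skip (or defer) any run that falls in business hours.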
Data Structuring and Storage
Raw tender data needs to be structured for analysis. Create a standardized schema that works across all portals:
```python
tender_schema = {
    "tender_id": "string",
    "portal": "string",
    "country": "string",
    "title": "string",
    "description": "text",
    "agency": "string",
    "category": "string",
    "estimated_value": "decimal",
    "currency": "string",
    "published_date": "datetime",
    "closing_date": "datetime",
    "contact_info": "json",
    "documents": "json",
    "status": "string",
    "scraped_at": "datetime",
    "raw_html": "text"
}
```
Monitoring and Alerting
Set up alerts for high-value opportunities:
```python
def check_and_alert(tender):
    # Keywords lowercased so mixed-case titles ("IT Infrastructure") still match
    alert_keywords = [
        "it infrastructure", "cloud services",
        "data analytics", "cybersecurity",
        "software development"
    ]
    title_lower = tender['title'].lower()
    for keyword in alert_keywords:
        if keyword in title_lower:
            send_alert(
                channel="procurement-alerts",
                message=f"New matching tender: {tender['title']}\n"
                        f"Portal: {tender['portal']}\n"
                        f"Closing: {tender['closing_date']}\n"
                        f"Value: {tender['estimated_value']} {tender['currency']}"
            )
            break
```
Scaling Your Tender Monitoring Operation
As you expand to cover more portals and countries, consider these scaling strategies:
- Distributed scraping: Run scrapers across multiple servers, each handling different portals
- Queue-based architecture: Use message queues like RabbitMQ or Redis to distribute scraping tasks
- Proxy pool segmentation: Dedicate specific proxy pools to specific portals to optimize rotation patterns
- Incremental scraping: Only scrape changes since the last run rather than full portal scans
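The queue-based pattern can be prototyped with nothing but the standard library before introducing RabbitMQ or Redis; a sketch that fans scraping tasks out to worker threads through a shared queue:

```python
import queue
import threading

def run_workers(tasks, handler, num_workers=4):
    """Distribute tasks across worker threads via a queue.
    In production, swap queue.Queue for RabbitMQ or Redis as
    described above; the fan-out pattern is the same."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # Queue drained; worker exits
            out = handler(task)
            with lock:
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Here `handler` would be a function that scrapes one portal page through the proxy; in the sketch it can be any callable.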
DataResearchTools supports all these scaling patterns with flexible proxy allocation, API-driven configuration, and dedicated IP pools for high-priority targets.
Legal and Ethical Considerations
Government tender data is public information intended for broad dissemination. However, respect the following guidelines:
- Adhere to each portal’s terms of service
- Do not overload government servers with excessive requests
- Store and handle any personal data in compliance with local privacy laws
- Do not attempt to access restricted or authenticated sections without authorization
- Consider contributing to open data initiatives that make procurement data more accessible
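The "do not overload" guideline above can be enforced mechanically with a per-domain throttle that guarantees a minimum gap between requests to the same host; a minimal sketch:

```python
import time

class DomainThrottle:
    """Enforce a minimum delay between requests to the same domain,
    so scraping never hammers a government server."""

    def __init__(self, min_interval: float = 3.0):
        self.min_interval = min_interval
        self._last: dict[str, float] = {}

    def wait(self, domain: str) -> float:
        """Sleep if the last request to this domain was too recent;
        return how long we actually waited."""
        now = time.monotonic()
        elapsed = now - self._last.get(domain, 0.0)
        delay = max(0.0, self.min_interval - elapsed)
        if delay:
            time.sleep(delay)
        self._last[domain] = time.monotonic()
        return delay
```

Call `throttle.wait("gebiz.gov.sg")` before each request; rotation spreads load across your IPs, while the throttle caps load on the target.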
Conclusion
Scraping government tender portals with rotating proxies is a proven strategy for building procurement intelligence capabilities. The combination of mobile proxies from DataResearchTools, well-structured parsers, and smart caching creates a reliable data pipeline that keeps your tender monitoring operation running continuously.
Start with one or two portals, validate your data quality, and expand systematically. The competitive advantage of early access to procurement opportunities makes this investment worthwhile for any company operating in the B2G space across Southeast Asia.
Related Reading
- Best Proxies for Government Data Scraping
- Building a Government Contract Intelligence System with Proxies
- How AI + Proxies Are Transforming Drug Discovery Data Pipelines
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)