How to Scrape Government Tender Portals with Rotating Proxies

Government tender portals publish thousands of procurement opportunities every day across Southeast Asia alone. Companies that can systematically monitor these portals gain a significant competitive advantage in the B2G (business-to-government) market. However, scraping tender portals at scale requires rotating proxy infrastructure to avoid blocks and maintain consistent data collection.

This guide walks you through the entire process of setting up a tender portal scraping system powered by rotating proxies.

Understanding Government Tender Portals

Government tender portals are online platforms where public agencies publish procurement notices, invitations to bid, requests for proposals, and contract awards. In Southeast Asia, major portals include:

  • GeBIZ (Singapore) – The centralized procurement portal for Singapore government agencies
  • LPSE (Indonesia) – Layanan Pengadaan Secara Elektronik, Indonesia’s electronic procurement system
  • PhilGEPS (Philippines) – The Philippine Government Electronic Procurement System
  • GPROCURE (Thailand) – Thailand’s government procurement platform
  • ePerolehan (Malaysia) – Malaysia’s electronic procurement system
  • muasamcong (Vietnam) – Vietnam’s national procurement network

Each portal has its own structure, anti-bot measures, and data formats. A successful scraping strategy must account for these differences while maintaining a unified data pipeline.

Why Rotating Proxies Are Essential for Tender Scraping

Volume of Data

A typical ASEAN-wide procurement monitoring operation needs to check hundreds of pages across multiple portals several times per day. This volume of requests from a single IP address will trigger rate limits within minutes.

Geographic Authentication

Tender portals often serve different content or impose different restrictions based on the visitor’s geographic location. Using proxies with IP addresses from the target country ensures you see the same content as local businesses.

Continuous Monitoring Requirements

Procurement opportunities have strict deadlines. Missing a tender because your scraper was blocked for hours means losing potential business. Rotating proxies ensure that if one IP is blocked, traffic seamlessly shifts to another.
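This failover behavior can be sketched as a small retry wrapper. The `fetch` callable and proxy values here are placeholders; a real implementation would catch only proxy- and block-related exceptions rather than bare `Exception`:

```python
import time

def fetch_with_failover(fetch, url, proxy_pool, max_attempts=3, backoff=1.0):
    """Try a request through successive proxies until one succeeds.

    `fetch` is any callable taking (url, proxy) that raises when blocked;
    `proxy_pool` is an iterable of proxy URLs.
    """
    last_error = None
    for attempt, proxy in zip(range(max_attempts), proxy_pool):
        try:
            return fetch(url, proxy)
        except Exception as exc:            # blocked IP, timeout, etc.
            last_error = exc
            time.sleep(backoff * attempt)   # brief pause before rotating
    raise RuntimeError(f"all {max_attempts} proxies failed") from last_error
```

If the first IP is blocked, the request transparently retries through the next proxy in the pool, so a single block never stalls the monitoring run.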

Anti-Bot Protections

Government portals increasingly employ Cloudflare, Akamai, and custom anti-bot solutions. Rotating proxies distribute your requests across many IP addresses, making your traffic pattern far harder to distinguish from normal user behavior.

Setting Up Your Tender Portal Scraping Infrastructure

Step 1: Choose the Right Proxy Type

For tender portal scraping, mobile proxies deliver the best results. Government websites trust mobile IP addresses because blocking them would affect thousands of legitimate users accessing public information on their phones.

DataResearchTools offers mobile proxy plans with native carrier IPs across all major ASEAN countries. This geographic authenticity is crucial for accessing local tender portals without restrictions.

Step 2: Configure Proxy Rotation

The rotation strategy depends on the target portal’s behavior:

For portals with session-based access (GeBIZ, LPSE): Use sticky sessions that maintain the same IP for 10-30 minutes. This allows you to navigate multi-page listings and detail pages without triggering session validation errors.

# Sticky session configuration for session-based portals
proxy_url = "http://user:pass@sea.dataresearchtools.com:8080"
# Session ID keeps the same IP for the duration
session_proxy = f"{proxy_url}?session=tender_session_001&duration=1800"

For portals with simple pagination (PhilGEPS): Use per-request rotation to distribute requests across many IPs.

# Per-request rotation for simple pagination
proxy_url = "http://user:pass@sea.dataresearchtools.com:8080"
# Each request gets a new IP
rotating_proxy = f"{proxy_url}?rotate=true"
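Either URL plugs straight into a `requests` proxies mapping. A minimal sketch, with the gateway address illustrative as in the snippets above:

```python
import requests

def make_proxied_session(proxy_url):
    """Return a requests.Session that routes all traffic through proxy_url."""
    session = requests.Session()
    session.proxies = {"http": proxy_url, "https": proxy_url}
    return session

# Illustrative gateway; substitute your real credentials and endpoint
session = make_proxied_session(
    "http://user:pass@sea.dataresearchtools.com:8080?rotate=true"
)
```

Every `session.get(...)` call then exits through the rotating gateway without further per-request configuration.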

Step 3: Build the Scraper Architecture

A robust tender scraping system has several components:

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│  Scheduler   │────▶│ Proxy Router │────▶│ Target Portal │
│  (Cron/APQ)  │     │ (DataResearch│     │  (GeBIZ/LPSE/ │
│              │     │   Tools)     │     │   PhilGEPS)   │
└─────────────┘     └──────────────┘     └───────────────┘
       │                                         │
       │              ┌──────────────┐           │
       └─────────────▶│    Parser    │◀──────────┘
                      │  (per-portal │
                      │   adapter)   │
                      └──────────────┘
                             │
                      ┌──────────────┐
                      │   Database   │
                      │  (tenders,   │
                      │   awards)    │
                      └──────────────┘
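The components above can be wired together in miniature: a job queue stands in for the scheduler, a dict of callables for the per-portal parser adapters, and a list for the database. All names here are placeholders for real components:

```python
import queue

jobs = queue.Queue()
database = []

# Per-portal parser adapters; real ones would parse actual HTML
PARSERS = {
    "GeBIZ": lambda html: [{"portal": "GeBIZ", "raw": html}],
    "PhilGEPS": lambda html: [{"portal": "PhilGEPS", "raw": html}],
}

def schedule_run(portals):
    for portal in portals:
        jobs.put(portal)

def run_worker(fetch):
    # fetch(portal) would go out through the proxy router in production
    while not jobs.empty():
        portal = jobs.get()
        database.extend(PARSERS[portal](fetch(portal)))

schedule_run(["GeBIZ", "PhilGEPS"])
run_worker(lambda portal: f"<html>listing for {portal}</html>")
```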

Step 4: Implement Per-Portal Parsers

Each tender portal has a unique HTML structure. Create dedicated parser modules for each portal:

import requests


class TenderParser:
    """Base class for tender portal parsers."""

    def __init__(self, proxy_config):
        self.session = requests.Session()
        self.session.proxies = proxy_config
        self.session.headers.update({
            'User-Agent': self.get_random_user_agent(),
            'Accept-Language': self.get_locale_header()
        })

    def get_random_user_agent(self):
        """Return a User-Agent string; override to rotate from a pool."""
        raise NotImplementedError

    def get_locale_header(self):
        """Return an Accept-Language value for the target country."""
        raise NotImplementedError

    def fetch_listing_page(self, url, page_number):
        """Fetch a page of tender listings."""
        raise NotImplementedError

    def parse_tender_summary(self, html):
        """Extract tender summaries from listing page."""
        raise NotImplementedError

    def fetch_tender_detail(self, tender_id):
        """Fetch full tender details."""
        raise NotImplementedError

    def parse_tender_detail(self, html):
        """Extract structured data from tender detail page."""
        raise NotImplementedError


class GeBIZParser(TenderParser):
    """Parser for Singapore GeBIZ portal."""

    BASE_URL = "https://www.gebiz.gov.sg"

    def fetch_listing_page(self, url, page_number):
        response = self.session.get(
            f"{self.BASE_URL}/ptn/opportunity/BOListing.xhtml",
            params={"page": page_number},
            timeout=30
        )
        return response.text

    def parse_tender_summary(self, html):
        from bs4 import BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')
        tenders = []
        for row in soup.select('.commandLink_BOLD'):
            tenders.append({
                'title': row.get_text(strip=True),
                'reference': row.get('id', ''),
                'portal': 'GeBIZ',
                'country': 'Singapore'
            })
        return tenders

Step 5: Handle Common Scraping Challenges

Dynamic JavaScript Rendering: Some tender portals load data via AJAX calls. Use a headless browser with proxy support:

from playwright.sync_api import sync_playwright

def scrape_dynamic_portal(url, proxy_config):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            proxy={
                "server": proxy_config["server"],
                "username": proxy_config["username"],
                "password": proxy_config["password"]
            }
        )
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        content = page.content()
        browser.close()
        return content

Pagination Handling: Tender portals use various pagination methods. Implement adaptive pagination:

import random
import time

def scrape_all_pages(parser, base_url, max_pages=100):
    all_tenders = []
    for page in range(1, max_pages + 1):
        html = parser.fetch_listing_page(base_url, page)
        tenders = parser.parse_tender_summary(html)

        if not tenders:
            break

        all_tenders.extend(tenders)
        time.sleep(random.uniform(2, 5))  # Polite delay

    return all_tenders

Date Range Filtering: Most portals allow filtering by date range. Use this to limit your scraping scope and reduce request volume:

from datetime import datetime, timedelta

def get_recent_tenders(parser, base_url, days_back=7):
    end_date = datetime.now()
    start_date = end_date - timedelta(days=days_back)

    # Assumes the portal-specific parser accepts date filter kwargs
    return parser.fetch_listing_page(
        base_url,
        page_number=1,
        date_from=start_date.strftime("%Y-%m-%d"),
        date_to=end_date.strftime("%Y-%m-%d")
    )

Optimizing Proxy Usage for Cost Efficiency

Proxy bandwidth costs money. Optimize your scraping to minimize unnecessary requests:

Cache Previously Scraped Data

Store tender IDs you have already scraped and skip them on subsequent runs. Only fetch detail pages for new tenders.

import redis

cache = redis.Redis(host='localhost', port=6379, db=0)

def is_new_tender(tender_id):
    return not cache.exists(f"tender:{tender_id}")

def mark_as_scraped(tender_id):
    cache.set(f"tender:{tender_id}", "scraped", ex=86400 * 30)

Use Conditional Requests

Check for content changes before downloading full pages:

import requests

def fetch_with_etag(url, proxy_config, stored_etag=None):
    headers = {}
    if stored_etag:
        headers['If-None-Match'] = stored_etag

    response = requests.get(
        url, proxies=proxy_config, headers=headers, timeout=30
    )

    if response.status_code == 304:
        return None  # Content unchanged

    return {
        'content': response.text,
        'etag': response.headers.get('ETag')
    }

Schedule Scraping During Off-Peak Hours

Government portals experience peak traffic during business hours. Schedule your scraping runs during evenings and weekends for faster response times and lower block rates.
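A scheduler can gate runs on a simple off-peak check. The 20:00-06:00 window and weekend rule below are illustrative defaults, not portal-specific facts; tune them to each portal's local time zone and observed load:

```python
from datetime import datetime, time

def is_off_peak(now=None, window_start=time(20, 0), window_end=time(6, 0)):
    """True outside typical business hours (default window 20:00-06:00)."""
    now = now or datetime.now()
    if now.weekday() >= 5:                        # Saturday/Sunday
        return True
    t = now.time()
    return t >= window_start or t < window_end    # window wraps midnight
```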

Data Structuring and Storage

Raw tender data needs to be structured for analysis. Create a standardized schema that works across all portals:

tender_schema = {
    "tender_id": "string",
    "portal": "string",
    "country": "string",
    "title": "string",
    "description": "text",
    "agency": "string",
    "category": "string",
    "estimated_value": "decimal",
    "currency": "string",
    "published_date": "datetime",
    "closing_date": "datetime",
    "contact_info": "json",
    "documents": "json",
    "status": "string",
    "scraped_at": "datetime",
    "raw_html": "text"
}
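One possible materialization of this schema, sketched here against SQLite for simplicity (`json` fields are stored as TEXT blobs; a production system would likely use PostgreSQL with native JSON columns):

```python
import sqlite3

# Rough mapping from the schema's logical types to SQLite column types
TYPE_MAP = {
    "string": "TEXT", "text": "TEXT", "json": "TEXT",
    "decimal": "REAL", "datetime": "TEXT",
}

def create_tenders_table(conn, schema):
    columns = ", ".join(
        f"{name} {TYPE_MAP[dtype]}" for name, dtype in schema.items()
    )
    # Tender IDs are only unique within a portal, hence the composite key
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS tenders ({columns}, "
        "PRIMARY KEY (tender_id, portal))"
    )
```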

Monitoring and Alerting

Set up alerts for high-value opportunities:

def check_and_alert(tender):
    alert_keywords = [
        "IT infrastructure", "cloud services",
        "data analytics", "cybersecurity",
        "software development"
    ]

    title_lower = tender['title'].lower()
    for keyword in alert_keywords:
        # Lowercase the keyword too, or "IT infrastructure" never matches
        if keyword.lower() in title_lower:
            send_alert(
                channel="procurement-alerts",
                message=f"New matching tender: {tender['title']}\n"
                        f"Portal: {tender['portal']}\n"
                        f"Closing: {tender['closing_date']}\n"
                        f"Value: {tender['estimated_value']} {tender['currency']}"
            )
            break

Scaling Your Tender Monitoring Operation

As you expand to cover more portals and countries, consider these scaling strategies:

  1. Distributed scraping: Run scrapers across multiple servers, each handling different portals
  2. Queue-based architecture: Use message queues like RabbitMQ or Redis to distribute scraping tasks
  3. Proxy pool segmentation: Dedicate specific proxy pools to specific portals to optimize rotation patterns
  4. Incremental scraping: Only scrape changes since the last run rather than full portal scans
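Pattern 3 can be sketched as a dict of round-robin pools, one per portal. The endpoint URLs here are placeholders:

```python
import itertools

# Each portal draws from its own dedicated proxy pool, so rotation
# cadence and pool size can be tuned independently per target
PROXY_POOLS = {
    "GeBIZ": itertools.cycle(["http://sg-1:8080", "http://sg-2:8080"]),
    "LPSE": itertools.cycle(["http://id-1:8080", "http://id-2:8080"]),
}

def next_proxy(portal):
    """Round-robin through the pool dedicated to this portal."""
    return next(PROXY_POOLS[portal])
```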

DataResearchTools supports all these scaling patterns with flexible proxy allocation, API-driven configuration, and dedicated IP pools for high-priority targets.

Legal and Ethical Considerations

Government tender data is public information intended for broad dissemination. However, respect the following guidelines:

  • Adhere to each portal’s terms of service
  • Do not overload government servers with excessive requests
  • Store and handle any personal data in compliance with local privacy laws
  • Do not attempt to access restricted or authenticated sections without authorization
  • Consider contributing to open data initiatives that make procurement data more accessible

Conclusion

Scraping government tender portals with rotating proxies is a proven strategy for building procurement intelligence capabilities. The combination of mobile proxies from DataResearchTools, well-structured parsers, and smart caching creates a reliable data pipeline that keeps your tender monitoring operation running continuously.

Start with one or two portals, validate your data quality, and expand systematically. The competitive advantage of early access to procurement opportunities makes this investment worthwhile for any company operating in the B2G space across Southeast Asia.

