Proxies for Scraping Government Contract and Procurement Databases

Government contracts represent a massive B2B opportunity. The US federal government alone awards over $700 billion in contracts annually, and state and local governments add hundreds of billions more. Companies that win government contracts become excellent B2B leads — they have confirmed budgets, known project timelines, and documented needs.

Scraping government procurement databases with mobile proxies lets you identify contract awardees, track spending patterns, and build targeted lead lists of companies actively doing business with government agencies.

Key Government Data Sources

Federal Databases

| Database | URL | Data Available |
|---|---|---|
| SAM.gov | sam.gov | Contract opportunities, entity registrations |
| USAspending.gov | usaspending.gov | Award data, spending by agency |
| FPDS | fpds.gov | Federal procurement data |
| SBIR.gov | sbir.gov | Small business innovation research |
| GovWin (Deltek) | govwin.com | Intelligence on upcoming contracts |

State and Local

| Source | Coverage |
|---|---|
| State procurement portals | Each state has its own system |
| BidNet Direct | Multi-state bid aggregator |
| PublicPurchase | Municipal procurement |
| Government Bids | Aggregated government RFPs |

Why Proxies Are Needed for Government Sites

While government data is public by design, the websites hosting it have technical limitations:

  1. Rate limiting — SAM.gov and USAspending throttle rapid requests to protect server resources.
  2. Session management — Many procurement portals use session-based access that expires after inactivity.
  3. Geographic restrictions — Some state portals restrict access to in-state IP addresses or require US-based IPs.
  4. Legacy infrastructure — Government websites often have unpredictable response times and frequent timeouts.
  5. CAPTCHA challenges — Several portals employ CAPTCHA for repeated searches.

Mobile proxies provide reliable US-based IP addresses with the session persistence needed for these platforms.
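
As a concrete starting point, the sketch below shows one way to wrap that access in a proxied requests session with automatic backoff on 429s and transient server errors, which covers the rate-limiting and timeout issues above. The proxy URL and credentials are placeholders.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_proxy_session(proxy_url):
    """Requests session routed through a proxy, with retries and backoff."""
    session = requests.Session()
    session.proxies = {"http": proxy_url, "https": proxy_url}

    retry = Retry(
        total=5,
        backoff_factor=2,  # roughly 2s, 4s, 8s, ... between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"],
    )
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.mount("http://", HTTPAdapter(max_retries=retry))
    return session

# Placeholder proxy credentials; substitute your own endpoint
session = build_proxy_session("http://user:pass@mobile-proxy.example.com:8080")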

Scraping SAM.gov Contract Opportunities

SAM.gov is the primary source for federal contract opportunities and entity registrations:

import requests
import time
import random
from datetime import datetime, timedelta

class SAMGovScraper:
    """Scrape SAM.gov for contract opportunities and registrations"""

    def __init__(self, api_key, proxy_url):
        self.api_key = api_key
        self.proxy_url = proxy_url
        self.base_url = "https://api.sam.gov"
        self.session = requests.Session()
        self.session.proxies = {"https": proxy_url}
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0",
            "Accept": "application/json",
        })

    def search_opportunities(self, keywords, posted_from=None, limit=100):
        """Search for contract opportunities"""
        if posted_from is None:
            posted_from = (datetime.now() - timedelta(days=30)).strftime("%m/%d/%Y")

        params = {
            "api_key": self.api_key,
            "q": keywords,
            "postedFrom": posted_from,
            "limit": limit,
            "offset": 0,
        }

        all_opportunities = []

        while True:
            response = self.session.get(
                f"{self.base_url}/opportunities/v2/search",
                params=params,
                timeout=30,
            )

            if response.status_code == 200:
                data = response.json()
                opportunities = data.get("opportunitiesData", [])
                all_opportunities.extend(opportunities)

                if len(opportunities) < limit:
                    break

                params["offset"] += limit
                time.sleep(random.uniform(2, 5))
            elif response.status_code == 429:
                time.sleep(60)
            else:
                print(f"Error: {response.status_code}")
                break

        return all_opportunities

    def get_entity_registration(self, uei):
        """Look up entity registration details by UEI"""
        response = self.session.get(
            f"{self.base_url}/entity-information/v3/entities",
            params={
                "api_key": self.api_key,
                "ueiSAM": uei,
            },
            timeout=30,
        )

        if response.status_code == 200:
            data = response.json()
            entities = data.get("entityData", [])
            if entities:
                entity = entities[0]
                return {
                    "uei": uei,
                    "legal_name": entity.get("entityRegistration", {}).get("legalBusinessName"),
                    "dba_name": entity.get("entityRegistration", {}).get("dbaName"),
                    "cage_code": entity.get("entityRegistration", {}).get("cageCode"),
                    "naics_codes": entity.get("assertions", {}).get("naicsCode", []),
                    "address": entity.get("coreData", {}).get("physicalAddress"),
                    "poc": entity.get("pointsOfContact"),
                }
        return None

    def search_entities_by_naics(self, naics_code, state=None):
        """Find registered entities by NAICS code"""
        params = {
            "api_key": self.api_key,
            "naicsCode": naics_code,
            "registrationStatus": "A",  # Active only
        }
        if state:
            params["stateCode"] = state

        response = self.session.get(
            f"{self.base_url}/entity-information/v3/entities",
            params=params,
            timeout=30,
        )

        if response.status_code == 200:
            return response.json().get("entityData", [])
        return []
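
A brief usage sketch, assuming you already have a SAM.gov API key and a working proxy endpoint (both values below are placeholders):

scraper = SAMGovScraper(
    api_key="YOUR_SAM_GOV_API_KEY",  # placeholder key
    proxy_url="http://user:pass@mobile-proxy.example.com:8080",  # placeholder proxy
)

# Opportunities posted in the last 30 days that mention cloud migration
opportunities = scraper.search_opportunities("cloud migration")
print(f"Found {len(opportunities)} opportunities")

# Registration record for a specific awardee (hypothetical UEI)
entity = scraper.get_entity_registration("EXAMPLEUEI12")
if entity:
    print(entity["legal_name"], entity["cage_code"])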

Scraping USAspending.gov

USAspending provides detailed award data showing which companies received contracts and for how much:

class USASpendingScraper:
    """Scrape USAspending.gov for contract award data"""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url
        self.base_url = "https://api.usaspending.gov/api/v2"
        self.session = requests.Session()
        self.session.proxies = {"https": proxy_url}

    def search_awards(self, keywords=None, agency=None, min_amount=None, fiscal_year=None):
        """Search for contract awards"""
        filters = {"award_type_codes": ["A", "B", "C", "D"]}  # Contract types

        if keywords:
            filters["keywords"] = [keywords]
        if agency:
            filters["agencies"] = [{"type": "awarding", "tier": "toptier", "name": agency}]
        if min_amount:
            filters["award_amounts"] = [{"lower_bound": min_amount}]
        if fiscal_year:
            # Federal FY N runs Oct 1 of year N-1 through Sep 30 of year N
            filters["time_period"] = [{"start_date": f"{fiscal_year - 1}-10-01", "end_date": f"{fiscal_year}-09-30"}]

        payload = {
            "filters": filters,
            "fields": [
                "Award ID", "Recipient Name", "Award Amount",
                "Awarding Agency", "Start Date", "End Date",
                "Description", "recipient_id"
            ],
            "page": 1,
            "limit": 100,
            "sort": "Award Amount",
            "order": "desc",
        }

        all_awards = []

        while True:
            response = self.session.post(
                f"{self.base_url}/search/spending_by_award/",
                json=payload,
                timeout=30,
            )

            if response.status_code == 200:
                data = response.json()
                awards = data.get("results", [])
                all_awards.extend(awards)

                if not data.get("hasNext", False):
                    break

                payload["page"] += 1
                time.sleep(random.uniform(1, 3))
            else:
                break

        return all_awards

    def get_recipient_profile(self, recipient_id):
        """Get detailed profile of a contract recipient"""
        response = self.session.get(
            f"{self.base_url}/recipient/{recipient_id}/",
            timeout=30,
        )

        if response.status_code == 200:
            data = response.json()
            return {
                "name": data.get("name"),
                "duns": data.get("duns"),
                "uei": data.get("uei"),
                "parent_name": data.get("parent_name"),
                "location": data.get("location"),
                "total_transaction_amount": data.get("total_transaction_amount"),
                "total_contracts": data.get("total_contracts"),
                "total_grants": data.get("total_grants"),
            }
        return None
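
Usage follows the same pattern; the proxy URL below is a placeholder, and USAspending's public API does not require a key:

spending = USASpendingScraper("http://user:pass@mobile-proxy.example.com:8080")

# Cybersecurity contracts worth at least $1M in FY2024 (Oct 2023 - Sep 2024)
awards = spending.search_awards(
    keywords="cybersecurity",
    min_amount=1_000_000,
    fiscal_year=2024,
)
for award in awards[:10]:
    print(award.get("Recipient Name"), award.get("Award Amount"))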

State Procurement Portal Scraping

State procurement portals vary widely in structure, so browser automation is the most practical way to handle the diversity. For a refresher on proxy concepts, visit our proxy glossary.

from playwright.async_api import async_playwright

async def scrape_state_procurement(state_url, proxy_config, search_query):
    """Generic scraper for state procurement portals"""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy=proxy_config,
            headless=False,
        )
        page = await browser.new_page()

        await page.goto(state_url, wait_until="networkidle")
        await page.wait_for_timeout(random.randint(3000, 6000))

        # Most state portals have a search box
        search_input = await page.query_selector(
            'input[type="text"], input[type="search"], input[name*="search"], input[name*="keyword"]'
        )
        if search_input:
            await search_input.fill(search_query)
            await page.keyboard.press("Enter")
            await page.wait_for_timeout(random.randint(3000, 8000))

        # Extract results (generic approach)
        results = []
        rows = await page.query_selector_all('table tr, [class*="result"], [class*="bid"]')

        for row in rows:
            cells = await row.query_selector_all('td, [class*="cell"]')
            if cells:
                texts = []
                for cell in cells:
                    text = (await cell.inner_text()).strip()
                    texts.append(text)
                if texts:
                    results.append(texts)

        await browser.close()
        return results
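
To run it, Playwright expects the proxy as a dictionary with server, username, and password keys; the portal URL and credentials below are placeholders:

import asyncio

proxy_config = {
    "server": "http://mobile-proxy.example.com:8080",  # placeholder endpoint
    "username": "user",
    "password": "pass",
}

results = asyncio.run(scrape_state_procurement(
    "https://procurement.example-state.gov/bids",  # placeholder portal URL
    proxy_config,
    "janitorial services",
))
for row in results[:5]:
    print(row)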

Building a Contract Intelligence Pipeline

Combine federal and state data into a unified lead pipeline:

class ContractIntelligencePipeline:
    """Build lead lists from government contract data"""

    def __init__(self, sam_scraper, usa_spending_scraper):
        self.sam = sam_scraper
        self.spending = usa_spending_scraper
        self.leads = []

    def identify_growing_contractors(self, naics_code, min_growth_pct=20):
        """Find companies with growing government contract revenue"""
        # Note: precise NAICS targeting would require extending search_awards with a NAICS filter
        current_year = datetime.now().year
        totals = {}  # recipient name -> {fiscal year: total award amount}

        for year in [current_year - 1, current_year]:
            for award in self.spending.search_awards(fiscal_year=year):
                name = award.get("Recipient Name")
                year_totals = totals.setdefault(name, {})
                year_totals[year] = year_totals.get(year, 0) + (award.get("Award Amount") or 0)

        # Keep recipients whose year-over-year growth exceeds the threshold
        growing = []
        for name, by_year in totals.items():
            previous, current = by_year.get(current_year - 1, 0), by_year.get(current_year, 0)
            if previous > 0 and (current - previous) / previous * 100 >= min_growth_pct:
                growing.append({"company_name": name, "previous_total": previous,
                                "current_total": current, "lead_type": "growing_contractor"})
        return growing

    def find_new_contract_winners(self, days_back=30):
        """Find companies that recently won government contracts"""
        posted_from = (datetime.now() - timedelta(days=days_back)).strftime("%m/%d/%Y")
        opportunities = self.sam.search_opportunities(
            keywords="",
            posted_from=posted_from,
        )

        leads = []
        for opp in opportunities:
            if opp.get("awardee"):
                lead = {
                    "company_name": opp["awardee"].get("name"),
                    "contract_title": opp.get("title"),
                    "agency": opp.get("department"),
                    "award_amount": opp.get("award", {}).get("amount"),
                    "award_date": opp.get("award", {}).get("date"),
                    "naics": opp.get("naicsCode"),
                    "lead_type": "new_contract_winner",
                }
                leads.append(lead)

        return leads

    def find_expiring_contracts(self, months_ahead=6):
        """Find contracts expiring soon (renewal/replacement opportunity)"""
        today = datetime.now().strftime("%Y-%m-%d")
        end_date = (datetime.now() + timedelta(days=months_ahead * 30)).strftime("%Y-%m-%d")

        # Search for contracts ending between today and the target date
        awards = self.spending.search_awards()

        expiring = []
        for award in awards:
            award_end = award.get("End Date")
            if award_end and today <= award_end <= end_date:
                expiring.append({
                    "company_name": award.get("Recipient Name"),
                    "contract_description": award.get("Description"),
                    "expiry_date": award_end,
                    "award_amount": award.get("Award Amount"),
                    "agency": award.get("Awarding Agency"),
                    "lead_type": "expiring_contract",
                })

        return expiring
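
Wiring the scrapers into the pipeline might look like the sketch below; the credentials are placeholders and the classes are the ones defined earlier:

sam = SAMGovScraper("YOUR_SAM_GOV_API_KEY", "http://user:pass@mobile-proxy.example.com:8080")
spending = USASpendingScraper("http://user:pass@mobile-proxy.example.com:8080")
pipeline = ContractIntelligencePipeline(sam, spending)

new_winners = pipeline.find_new_contract_winners(days_back=14)
expiring = pipeline.find_expiring_contracts(months_ahead=6)
print(f"{len(new_winners)} recent winners, {len(expiring)} contracts expiring soon")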

Enrichment and Outreach

Government contract data provides company names and contract details but rarely includes direct contact information. Enrich using web scraping techniques:

def enrich_contractor(contractor, sam_scraper, proxy_url):
    """Enrich a government contractor lead with contact data"""
    enriched = contractor.copy()

    # Look up SAM.gov registration details (includes point of contact)
    entity = sam_scraper.get_entity_registration(contractor.get("uei"))
    if entity:
        poc = entity.get("poc") or {}
        enriched["contact_name"] = poc.get("name")
        enriched["contact_email"] = poc.get("email")
        enriched["contact_phone"] = poc.get("phone")
        enriched["address"] = entity.get("address")
        enriched["naics_codes"] = entity.get("naics_codes")

    # Scrape company website for additional contacts
    if contractor.get("website"):
        try:
            response = requests.get(
                contractor["website"],
                proxies={"https": proxy_url},
                timeout=15,
            )
            import re
            emails = re.findall(
                r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
                response.text
            )
            enriched["website_emails"] = list(set(emails))
        except Exception:
            pass

    return enriched
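
For example, enriching a single lead might look like this (the lead values and proxy URL are made up, and sam is the SAMGovScraper instance from the pipeline example above):

lead = {
    "company_name": "Example Federal Services LLC",  # hypothetical lead
    "uei": "EXAMPLEUEI12",
    "website": "https://www.example.com",
}
enriched = enrich_contractor(lead, sam, "http://user:pass@mobile-proxy.example.com:8080")
print(enriched.get("contact_email"), enriched.get("website_emails"))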

Use Cases for Government Contract Data

For Government IT Service Providers

Track which companies win IT contracts, identify subcontracting opportunities, and monitor competitor wins in your target agencies.

For SaaS Companies

Find companies receiving large contracts (indicating growth and budget availability) in your target NAICS codes. These companies are actively scaling and need software tools.

For Staffing Agencies

Monitor new contract awards to identify companies that will need to hire. A $10M contract award typically means immediate staffing needs.

For Financial Services

Track companies with growing government revenue for lending, insurance, and banking products tailored to government contractors.

Conclusion

Government procurement databases are an underutilized goldmine for B2B lead generation. The data is public, the signals are strong (confirmed budgets and project timelines), and the competition for these leads is lower than for generic business directories. Mobile proxies ensure reliable access to government platforms that throttle automated requests, while API-based approaches (where available) provide structured data at scale. Build monitoring around contract awards, entity registrations, and upcoming opportunities to maintain a continuous pipeline of high-value B2B leads.

