Proxies for Scraping Government Contract and Procurement Databases
Government contracts represent a massive B2B opportunity. The US federal government alone awards over $700 billion in contracts annually, and state and local governments add hundreds of billions more. Companies that win government contracts become excellent B2B leads — they have confirmed budgets, known project timelines, and documented needs.
Scraping government procurement databases with mobile proxies lets you identify contract awardees, track spending patterns, and build targeted lead lists of companies actively doing business with government agencies.
Key Government Data Sources
Federal Databases
| Database | URL | Data Available |
|---|---|---|
| SAM.gov | sam.gov | Contract opportunities, entity registrations |
| USAspending.gov | usaspending.gov | Award data, spending by agency |
| FPDS | fpds.gov | Federal procurement data |
| SBIR.gov | sbir.gov | Small business innovation research |
| GovWin (Deltek) | govwin.com | Intelligence on upcoming contracts |
State and Local
| Source | Coverage |
|---|---|
| State procurement portals | Each state has its own system |
| BidNet Direct | Multi-state bid aggregator |
| PublicPurchase | Municipal procurement |
| Government Bids | Aggregated government RFPs |
Why Proxies Are Needed for Government Sites
While government data is public by design, the websites hosting it have technical limitations:
- Rate limiting — SAM.gov and USAspending throttle rapid requests to protect server resources.
- Session management — Many procurement portals use session-based access that expires after inactivity.
- Geographic restrictions — Some state portals restrict access to in-state IP addresses or require US-based IPs.
- Legacy infrastructure — Government websites often have unpredictable response times and frequent timeouts.
- CAPTCHA challenges — Several portals employ CAPTCHA for repeated searches.
Mobile proxies provide reliable US-based IP addresses with the session persistence needed for these platforms.
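In practice this means building one HTTP session that pins every request to the same proxy IP and retries the transient failures these platforms are known for. A minimal sketch, assuming a mobile proxy endpoint of the form `http://user:pass@host:port` (the URL below is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_gov_session(proxy_url):
    """Build a requests session that keeps one proxy IP for the whole session
    and retries transient failures (common on legacy government hosts)."""
    session = requests.Session()
    # Route both schemes through the proxy so every request uses the same IP
    session.proxies = {"http": proxy_url, "https": proxy_url}
    retry = Retry(
        total=3,
        backoff_factor=2,  # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "POST"],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    session.headers.update({"Accept": "application/json"})
    return session
```

Retrying on 429 and 5xx at the transport layer keeps the scraper code itself simple; the per-request pacing still belongs in the scraping loop.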
Scraping SAM.gov Contract Opportunities
SAM.gov is the primary source for federal contract opportunities and entity registrations:
```python
import requests
import time
import random
from datetime import datetime, timedelta


class SAMGovScraper:
    """Scrape SAM.gov for contract opportunities and registrations"""

    def __init__(self, api_key, proxy_url):
        self.api_key = api_key
        self.proxy_url = proxy_url
        self.base_url = "https://api.sam.gov"
        self.session = requests.Session()
        # Route both schemes through the proxy so every request uses the same IP
        self.session.proxies = {"http": proxy_url, "https": proxy_url}
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0",
            "Accept": "application/json",
        })

    def search_opportunities(self, keywords, posted_from=None, limit=100):
        """Search for contract opportunities"""
        if posted_from is None:
            posted_from = (datetime.now() - timedelta(days=30)).strftime("%m/%d/%Y")
        params = {
            "api_key": self.api_key,
            "q": keywords,
            "postedFrom": posted_from,
            "postedTo": datetime.now().strftime("%m/%d/%Y"),  # SAM.gov expects both ends of the date range
            "limit": limit,
            "offset": 0,
        }
        all_opportunities = []
        while True:
            response = self.session.get(
                f"{self.base_url}/opportunities/v2/search",
                params=params,
                timeout=30,
            )
            if response.status_code == 200:
                data = response.json()
                opportunities = data.get("opportunitiesData", [])
                all_opportunities.extend(opportunities)
                if len(opportunities) < limit:
                    break  # short page means we've reached the last page
                params["offset"] += limit
                time.sleep(random.uniform(2, 5))
            elif response.status_code == 429:
                time.sleep(60)  # rate limited: back off, then retry the same page
            else:
                print(f"Error: {response.status_code}")
                break
        return all_opportunities

    def get_entity_registration(self, uei):
        """Look up entity registration details by UEI"""
        response = self.session.get(
            f"{self.base_url}/entity-information/v3/entities",
            params={
                "api_key": self.api_key,
                "ueiSAM": uei,
            },
            timeout=30,
        )
        if response.status_code == 200:
            entities = response.json().get("entityData", [])
            if entities:
                entity = entities[0]
                registration = entity.get("entityRegistration", {})
                return {
                    "uei": uei,
                    "legal_name": registration.get("legalBusinessName"),
                    "dba_name": registration.get("dbaName"),
                    "cage_code": registration.get("cageCode"),
                    "naics_codes": entity.get("assertions", {}).get("naicsCode", []),
                    "address": entity.get("coreData", {}).get("physicalAddress"),
                    "poc": entity.get("pointsOfContact"),
                }
        return None

    def search_entities_by_naics(self, naics_code, state=None):
        """Find registered entities by NAICS code"""
        params = {
            "api_key": self.api_key,
            "naicsCode": naics_code,
            "registrationStatus": "A",  # Active registrations only
        }
        if state:
            params["stateCode"] = state
        response = self.session.get(
            f"{self.base_url}/entity-information/v3/entities",
            params=params,
            timeout=30,
        )
        if response.status_code == 200:
            return response.json().get("entityData", [])
        return []
```

Scraping USAspending.gov
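Award searches on USAspending are typically scoped to federal fiscal years, which run October 1 through September 30 of the labeled year. A small helper (hypothetical name `fiscal_year_window`) makes the date math explicit:

```python
def fiscal_year_window(fy):
    """Return (start, end) ISO dates for a US federal fiscal year.
    FY2024, for example, runs 2023-10-01 through 2024-09-30."""
    return (f"{fy - 1}-10-01", f"{fy}-09-30")
```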
USAspending provides detailed award data showing which companies received contracts and for how much:
```python
class USASpendingScraper:
    """Scrape USAspending.gov for contract award data"""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url
        self.base_url = "https://api.usaspending.gov/api/v2"
        self.session = requests.Session()
        self.session.proxies = {"http": proxy_url, "https": proxy_url}

    def search_awards(self, keywords=None, agency=None, min_amount=None, fiscal_year=None):
        """Search for contract awards"""
        filters = {"award_type_codes": ["A", "B", "C", "D"]}  # Contract types
        if keywords:
            filters["keywords"] = [keywords]
        if agency:
            filters["agencies"] = [{"type": "awarding", "tier": "toptier", "name": agency}]
        if min_amount:
            filters["award_amounts"] = [{"lower_bound": min_amount}]
        if fiscal_year:
            # FY2024 runs 2023-10-01 through 2024-09-30
            filters["time_period"] = [{
                "start_date": f"{fiscal_year - 1}-10-01",
                "end_date": f"{fiscal_year}-09-30",
            }]
        payload = {
            "filters": filters,
            "fields": [
                "Award ID", "Recipient Name", "Award Amount",
                "Awarding Agency", "Start Date", "End Date",
                "Description", "recipient_id",
            ],
            "page": 1,
            "limit": 100,
            "sort": "Award Amount",
            "order": "desc",
        }
        all_awards = []
        while True:
            response = self.session.post(
                f"{self.base_url}/search/spending_by_award/",
                json=payload,
                timeout=30,
            )
            if response.status_code == 200:
                data = response.json()
                all_awards.extend(data.get("results", []))
                # The pagination flag lives under page_metadata
                if not data.get("page_metadata", {}).get("hasNext", False):
                    break
                payload["page"] += 1
                time.sleep(random.uniform(1, 3))
            else:
                break
        return all_awards

    def get_recipient_profile(self, recipient_id):
        """Get detailed profile of a contract recipient"""
        response = self.session.get(
            f"{self.base_url}/recipient/{recipient_id}/",
            timeout=30,
        )
        if response.status_code == 200:
            data = response.json()
            return {
                "name": data.get("name"),
                "duns": data.get("duns"),
                "uei": data.get("uei"),
                "parent_name": data.get("parent_name"),
                "location": data.get("location"),
                "total_transaction_amount": data.get("total_transaction_amount"),
                "total_contracts": data.get("total_contracts"),
                "total_grants": data.get("total_grants"),
            }
        return None
```

State Procurement Portal Scraping
State procurement portals vary widely in structure, so a generic browser-automation approach handles the diversity better than per-portal API clients. For a refresher on proxy concepts, visit our proxy glossary.
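Playwright takes its proxy settings as a dict rather than a URL string. A hedged example of the `proxy_config` argument used below, with placeholder credentials:

```python
# Shape expected by playwright's chromium.launch(proxy=...)
proxy_config = {
    "server": "http://proxy.example:8080",  # placeholder mobile proxy endpoint
    "username": "user",
    "password": "pass",
}
```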
```python
from playwright.async_api import async_playwright


async def scrape_state_procurement(state_url, proxy_config, search_query):
    """Generic scraper for state procurement portals"""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy=proxy_config,
            headless=False,
        )
        page = await browser.new_page()
        await page.goto(state_url, wait_until="networkidle")
        await page.wait_for_timeout(random.randint(3000, 6000))
        # Most state portals have a search box
        search_input = await page.query_selector(
            'input[type="text"], input[type="search"], '
            'input[name*="search"], input[name*="keyword"]'
        )
        if search_input:
            await search_input.fill(search_query)
            await page.keyboard.press("Enter")
            await page.wait_for_timeout(random.randint(3000, 8000))
        # Extract results (generic approach)
        results = []
        rows = await page.query_selector_all('table tr, [class*="result"], [class*="bid"]')
        for row in rows:
            cells = await row.query_selector_all('td, [class*="cell"]')
            texts = [(await cell.inner_text()).strip() for cell in cells]
            if texts:
                results.append(texts)
        await browser.close()
        return results
```

Building a Contract Intelligence Pipeline
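Merging federal and state results means the same company often appears under slightly different names ("Acme Corp" vs. "Acme Corp, Inc."). A minimal normalization-and-dedupe sketch; the legal-suffix list is illustrative, not exhaustive:

```python
import re

# Trailing legal suffixes to strip before matching (illustrative list)
SUFFIXES = re.compile(r"\b(inc|llc|corp|co|ltd|lp)\.?$", re.IGNORECASE)


def normalize_company(name):
    """Lowercase, strip punctuation and trailing legal suffixes for matching."""
    cleaned = re.sub(r"[.,]", "", name or "").strip().lower()
    while True:
        stripped = SUFFIXES.sub("", cleaned).strip()
        if stripped == cleaned:
            return cleaned
        cleaned = stripped


def dedupe_leads(leads):
    """Keep one lead per normalized company name; first occurrence wins."""
    seen, unique = set(), []
    for lead in leads:
        key = normalize_company(lead.get("company_name", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(lead)
    return unique
```

Exact-match deduplication after normalization is a deliberate compromise: fuzzy matching catches more duplicates but risks merging distinct companies.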
Combine federal and state data into a unified lead pipeline:
```python
class ContractIntelligencePipeline:
    """Build lead lists from government contract data"""

    def __init__(self, sam_scraper, usa_spending_scraper):
        self.sam = sam_scraper
        self.spending = usa_spending_scraper
        self.leads = []

    def identify_growing_contractors(self, naics_code, min_growth_pct=20):
        """Find companies with growing government contract revenue"""
        # naics_code is not yet applied: search_awards would need a NAICS filter
        current_year = datetime.now().year
        totals = {}  # recipient name -> {fiscal year: total awarded}
        for year in [current_year - 1, current_year]:
            for award in self.spending.search_awards(fiscal_year=year):
                name = award.get("Recipient Name")
                yearly = totals.setdefault(name, {})
                yearly[year] = yearly.get(year, 0) + (award.get("Award Amount") or 0)
        # Compare year-over-year growth
        growing = []
        for name, by_year in totals.items():
            prev = by_year.get(current_year - 1, 0)
            curr = by_year.get(current_year, 0)
            if prev > 0 and (curr - prev) / prev * 100 >= min_growth_pct:
                growing.append({
                    "company_name": name,
                    "growth_pct": round((curr - prev) / prev * 100, 1),
                    "lead_type": "growing_contractor",
                })
        return growing

    def find_new_contract_winners(self, days_back=30):
        """Find companies that recently won government contracts"""
        posted_from = (datetime.now() - timedelta(days=days_back)).strftime("%m/%d/%Y")
        opportunities = self.sam.search_opportunities(
            keywords="",
            posted_from=posted_from,
        )
        leads = []
        for opp in opportunities:
            if opp.get("awardee"):
                leads.append({
                    "company_name": opp["awardee"].get("name"),
                    "contract_title": opp.get("title"),
                    "agency": opp.get("department"),
                    "award_amount": opp.get("award", {}).get("amount"),
                    "award_date": opp.get("award", {}).get("date"),
                    "naics": opp.get("naicsCode"),
                    "lead_type": "new_contract_winner",
                })
        return leads

    def find_expiring_contracts(self, months_ahead=6):
        """Find contracts expiring soon (renewal/replacement opportunity)"""
        today = datetime.now().strftime("%Y-%m-%d")
        end_date = (datetime.now() + timedelta(days=months_ahead * 30)).strftime("%Y-%m-%d")
        # Search for contracts ending in the target window
        expiring = []
        for award in self.spending.search_awards():
            award_end = award.get("End Date")
            # ISO dates compare correctly as strings; skip already-expired awards
            if award_end and today <= award_end <= end_date:
                expiring.append({
                    "company_name": award.get("Recipient Name"),
                    "contract_description": award.get("Description"),
                    "expiry_date": award_end,
                    "award_amount": award.get("Award Amount"),
                    "agency": award.get("Awarding Agency"),
                    "lead_type": "expiring_contract",
                })
        return expiring
```

Enrichment and Outreach
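One caveat when harvesting emails from contractor websites (as the enrichment helper below does): naive regex matching picks up false positives such as asset filenames like `logo@2x.png`. A small filter sketch, with an illustrative extension blocklist:

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Asset filenames like logo@2x.png also match the email pattern
BAD_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html):
    """Extract likely-real email addresses, dropping asset-filename matches."""
    found = set(EMAIL_RE.findall(html))
    return sorted(e for e in found if not e.lower().endswith(BAD_SUFFIXES))
```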
Government contract data provides company names and contract details but rarely includes direct contact information. Enrich using web scraping techniques:
```python
import re


def enrich_contractor(contractor, sam_scraper, proxy_url):
    """Enrich a government contractor lead with contact data.
    sam_scraper is a SAMGovScraper instance, passed in rather than
    referenced as a global."""
    enriched = contractor.copy()
    # Search SAM.gov for registration details (includes POC)
    entity = sam_scraper.get_entity_registration(contractor.get("uei"))
    if entity:
        poc = entity.get("poc") or {}
        enriched["contact_name"] = poc.get("name")
        enriched["contact_email"] = poc.get("email")
        enriched["contact_phone"] = poc.get("phone")
        enriched["address"] = entity.get("address")
        enriched["naics_codes"] = entity.get("naics_codes")
    # Scrape company website for additional contacts
    if contractor.get("website"):
        try:
            response = requests.get(
                contractor["website"],
                proxies={"http": proxy_url, "https": proxy_url},
                timeout=15,
            )
            emails = re.findall(
                r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
                response.text,
            )
            enriched["website_emails"] = list(set(emails))
        except requests.RequestException:
            pass
    return enriched
```

Use Cases for Government Contract Data
For Government IT Service Providers
Track which companies win IT contracts, identify subcontracting opportunities, and monitor competitor wins in your target agencies.
For SaaS Companies
Find companies receiving large contracts (indicating growth and budget availability) in your target NAICS codes. These companies are actively scaling and need software tools.
For Staffing Agencies
Monitor new contract awards to identify companies that will need to hire. A $10M contract award typically means immediate staffing needs.
For Financial Services
Track companies with growing government revenue for lending, insurance, and banking products tailored to government contractors.
Conclusion
Government procurement databases are an underutilized goldmine for B2B lead generation. The data is public, the signals are strong (confirmed budgets and project timelines), and the competition for these leads is lower than for generic business directories. Mobile proxies ensure reliable access to government platforms that throttle automated requests, while API-based approaches (where available) provide structured data at scale. Build monitoring around contract awards, entity registrations, and upcoming opportunities to maintain a continuous pipeline of high-value B2B leads.
Related Reading
- How to Build an Automated Lead Scraping Pipeline with Proxies
- Building a B2B Contact Enrichment Pipeline with Mobile Proxies
- How to Scrape Job Listings at Scale with Rotating Proxies
- Proxies for HR Tech: Salary Benchmarking & Talent Intelligence
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked