How to Use Proxies for Patent and Trademark Database Scraping
Patent and trademark filings are among the most reliable indicators of a company’s innovation direction, product roadmap, and competitive strategy. When a company files a patent for a new technology or registers a trademark for a new product line, it signals investment, growth, and future market entry — all of which create B2B opportunities.
Scraping patent and trademark databases with mobile proxies lets competitive intelligence teams, IP law firms, technology scouts, and B2B sales organizations monitor filings at scale and spot opportunities long before they are widely noticed. Filings are public records, but few teams watch them systematically.
Why Patent Data Matters for B2B
Patent filings reveal information that companies do not voluntarily disclose:
- R&D direction — Patent applications show where a company is investing its engineering resources.
- Product launches — New trademark registrations typically precede product launches by 6–18 months.
- Hiring needs — Companies filing patents in new technology areas will need specialized talent.
- Acquisition signals — Patent portfolio buildup in a specific area may indicate acquisition intent.
- Partnership opportunities — Companies with complementary patent portfolios are natural partners.
B2B Use Cases
| If You Are | What Patent Data Reveals |
|---|---|
| IP Law Firm | Potential clients filing without counsel |
| Technology vendor | Companies investing in your technology area |
| Recruiter | Companies building new technology teams |
| Investor | Innovation velocity and competitive positioning |
| Market researcher | Industry trend analysis and competitive mapping |
Key Patent and Trademark Databases
| Database | Coverage | Access |
|---|---|---|
| USPTO Patent Public Search | US patents and applications | Free, API available |
| PATSTAT (EPO) | Worldwide patent statistics | Subscription + API |
| WIPO PATENTSCOPE | International PCT applications | Free, limited API |
| Google Patents | Worldwide, full-text search | Free |
| USPTO TSDR | US trademarks | Free |
| EUIPO | EU trademarks | Free |
| Espacenet | European patents | Free |
Scraping Google Patents
Google Patents provides the most accessible interface for patent data with full-text search:
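The scraper below takes a Playwright-style proxy dictionary. The host and credentials here are placeholders; substitute your provider's details:

```python
# Hypothetical example values. Playwright's launch() accepts a proxy dict
# with a "server" key and optional "username"/"password" credentials.
proxy_config = {
    "server": "http://proxy.example.com:8080",
    "username": "user",
    "password": "pass",
}
```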
from playwright.async_api import async_playwright
import asyncio
import random
import re
import json
async def scrape_google_patents(query, proxy_config, max_results=100):
"""Scrape Google Patents search results"""
patents = []
async with async_playwright() as p:
browser = await p.chromium.launch(
proxy=proxy_config,
            headless=False,  # headful mode draws less bot scrutiny; set True on servers
)
page = await browser.new_page()
page_num = 0
while len(patents) < max_results:
url = f"https://patents.google.com/?q={query}&oq={query}&page={page_num}"
await page.goto(url, wait_until="networkidle")
await page.wait_for_timeout(random.randint(3000, 6000))
# Extract patent cards
cards = await page.query_selector_all('search-result-item')
if not cards:
break
for card in cards:
patent = {}
# Patent number
num_el = await card.query_selector('[class*="patent-number"], .result-title span')
if num_el:
patent['patent_number'] = (await num_el.inner_text()).strip()
# Title
title_el = await card.query_selector('h3, [class*="title"]')
if title_el:
patent['title'] = (await title_el.inner_text()).strip()
# Assignee (company)
assignee_el = await card.query_selector('[class*="assignee"]')
if assignee_el:
patent['assignee'] = (await assignee_el.inner_text()).strip()
# Filing date
date_el = await card.query_selector('[class*="filing-date"], [class*="date"]')
if date_el:
patent['filing_date'] = (await date_el.inner_text()).strip()
# Abstract
abstract_el = await card.query_selector('[class*="abstract"]')
if abstract_el:
patent['abstract'] = (await abstract_el.inner_text()).strip()[:300]
if patent.get('patent_number'):
patents.append(patent)
page_num += 1
await page.wait_for_timeout(random.randint(5000, 12000))
await browser.close()
    return patents[:max_results]

USPTO Bulk Data Access
The USPTO provides bulk data downloads and APIs for programmatic access:
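For the bulk-download route, the main concern is streaming multi-gigabyte archives to disk without loading them into memory. A minimal sketch, assuming a generic archive URL (the one used here is a placeholder, not a real USPTO file path):

```python
import requests

def download_bulk_file(url, dest_path, proxy_url, chunk_size=1 << 20):
    """Stream a large bulk-data archive to disk through a proxy, 1 MB at a time."""
    proxies = {"http": proxy_url, "https": proxy_url}
    with requests.get(url, proxies=proxies, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return dest_path

# Usage (hypothetical URL):
# download_bulk_file(
#     "https://example.com/patent-grants-2025.zip",
#     "/tmp/grants.zip",
#     "http://proxy.example.com:8080",
# )
```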
import requests
import xml.etree.ElementTree as ET
class USPTOScraper:
"""Access USPTO patent data via API and bulk downloads"""
def __init__(self, proxy_url):
self.proxy_url = proxy_url
self.session = requests.Session()
        self.session.proxies = {"http": proxy_url, "https": proxy_url}
self.session.headers.update({
"User-Agent": "Mozilla/5.0",
"Accept": "application/json",
})
def search_patent_applications(self, query, start_date=None, end_date=None, rows=50):
"""Search USPTO patent applications"""
params = {
"searchText": query,
"start": 0,
"rows": rows,
}
if start_date:
params["searchText"] += f" AND applicationFilingDate:[{start_date} TO {end_date or '*'}]"
response = self.session.get(
"https://developer.uspto.gov/ibd-api/v1/application/publications",
params=params,
timeout=30,
)
if response.status_code == 200:
data = response.json()
results = data.get("results", [])
return [
{
"patent_number": r.get("patentNumber"),
"title": r.get("inventionTitle"),
"abstract": r.get("abstractText", "")[:300],
"assignee": r.get("assigneeEntityName"),
"filing_date": r.get("filingDate"),
"inventors": r.get("inventorNameArrayText", []),
"classification": r.get("mainCPCSymbolText"),
}
for r in results
]
return []
def search_trademarks(self, query, status="live"):
"""Search USPTO trademark database"""
# TSDR API for trademark search
params = {
"searchText": query,
"status": status,
}
response = self.session.get(
"https://tsdrapi.uspto.gov/ts/cd/casesearch/search",
params=params,
timeout=30,
)
if response.status_code == 200:
return response.json()
return None
def get_patent_details(self, patent_number):
"""Get detailed patent information"""
response = self.session.get(
f"https://developer.uspto.gov/ibd-api/v1/patent/{patent_number}",
timeout=30,
)
if response.status_code == 200:
data = response.json()
return {
"patent_number": patent_number,
"title": data.get("inventionTitle"),
"abstract": data.get("abstractText"),
"claims": data.get("claimText"),
"assignee": data.get("assigneeEntityName"),
"inventors": data.get("inventorNameArrayText"),
"filing_date": data.get("filingDate"),
"grant_date": data.get("grantDate"),
"classifications": data.get("cpcClassifications", []),
"citations": data.get("referencedBy", []),
}
        return None

Trademark Monitoring for Brand Intelligence
New trademark registrations signal upcoming product launches. For proxy terminology used in these scraping operations, see our proxy glossary.
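The monitoring classes in the rest of this guide assume a `proxy_pool` object with a `get_next()` method. A minimal round-robin sketch (swap in your provider's proxy URLs):

```python
import itertools

class ProxyPool:
    """Minimal round-robin proxy pool; cycles through a fixed list of URLs."""

    def __init__(self, proxy_urls):
        self._cycle = itertools.cycle(proxy_urls)

    def get_next(self):
        return next(self._cycle)

# Usage:
pool = ProxyPool(["http://p1.example.com:8080", "http://p2.example.com:8080"])
```

A production pool would also track failures and evict dead proxies, but this shape is all the code below relies on.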
class TrademarkMonitor:
"""Monitor new trademark filings for competitive intelligence"""
def __init__(self, proxy_pool):
self.proxy_pool = proxy_pool
async def monitor_new_filings(self, company_names, nice_classes=None):
"""Monitor new trademark filings by specific companies"""
results = []
for company in company_names:
proxy = self.proxy_pool.get_next()
scraper = USPTOScraper(proxy)
# Search for recent filings by this company
trademarks = scraper.search_trademarks(
query=f'owner:"{company}"',
status="live",
)
if trademarks:
for tm in trademarks.get("results", []):
filing = {
"company": company,
"mark_text": tm.get("markText"),
"serial_number": tm.get("serialNumber"),
"filing_date": tm.get("filingDate"),
"status": tm.get("status"),
"nice_classes": tm.get("niceClasses", []),
"goods_services": tm.get("goodsAndServices"),
}
results.append(filing)
await asyncio.sleep(random.uniform(3, 8))
return results
def analyze_filings(self, filings):
"""Analyze trademark filings for business intelligence"""
analysis = {}
for filing in filings:
company = filing["company"]
if company not in analysis:
analysis[company] = {
"total_filings": 0,
"recent_filings": [],
"nice_classes": set(),
"potential_products": [],
}
analysis[company]["total_filings"] += 1
analysis[company]["recent_filings"].append(filing)
analysis[company]["nice_classes"].update(filing.get("nice_classes", []))
# Infer product type from Nice classification
nice_descriptions = {
9: "Software/Electronics",
35: "Business Services",
38: "Telecommunications",
41: "Education/Entertainment",
42: "Technology Services",
45: "Legal/Security Services",
}
for cls in filing.get("nice_classes", []):
if cls in nice_descriptions:
analysis[company]["potential_products"].append(
f"{filing.get('mark_text')}: {nice_descriptions[cls]}"
)
# Convert sets to lists for serialization
for company in analysis:
analysis[company]["nice_classes"] = list(analysis[company]["nice_classes"])
        return analysis

Competitor Patent Portfolio Analysis
Track how competitors build their IP portfolios over time:
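One quick derivative of the yearly counts the analyzer produces is a filing trend: is a competitor's activity in an area accelerating or tapering off? A small helper over the `patents_by_year` dict:

```python
def filing_trend(patents_by_year):
    """Return (year, count, change_vs_prior_year) tuples sorted by year."""
    trend = []
    prev = None
    for year in sorted(patents_by_year):
        count = patents_by_year[year]
        change = count - prev if prev is not None else 0
        trend.append((year, count, change))
        prev = count
    return trend
```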
class PatentPortfolioAnalyzer:
"""Analyze competitor patent portfolios"""
def __init__(self, proxy_pool):
self.proxy_pool = proxy_pool
def build_portfolio(self, company_name, years_back=5):
"""Build a patent portfolio view for a company"""
proxy = self.proxy_pool.get_next()
scraper = USPTOScraper(proxy)
from datetime import datetime, timedelta
start_date = (datetime.now() - timedelta(days=years_back * 365)).strftime("%Y-%m-%d")
end_date = datetime.now().strftime("%Y-%m-%d")
patents = scraper.search_patent_applications(
query=f'assignee:"{company_name}"',
start_date=start_date,
end_date=end_date,
rows=1000,
)
portfolio = {
"company": company_name,
"total_patents": len(patents),
"patents_by_year": {},
"technology_areas": {},
"key_inventors": {},
"patents": patents,
}
for patent in patents:
# Group by year
year = patent.get("filing_date", "")[:4]
if year:
portfolio["patents_by_year"][year] = portfolio["patents_by_year"].get(year, 0) + 1
# Group by technology classification
classification = patent.get("classification", "Unknown")
if classification:
main_class = classification.split("/")[0] if "/" in classification else classification
portfolio["technology_areas"][main_class] = portfolio["technology_areas"].get(main_class, 0) + 1
# Track prolific inventors
for inventor in patent.get("inventors", []):
portfolio["key_inventors"][inventor] = portfolio["key_inventors"].get(inventor, 0) + 1
# Sort inventors by patent count
portfolio["key_inventors"] = dict(
sorted(portfolio["key_inventors"].items(), key=lambda x: x[1], reverse=True)[:20]
)
return portfolio
def compare_portfolios(self, companies):
"""Compare patent portfolios across competitors"""
portfolios = {}
for company in companies:
portfolios[company] = self.build_portfolio(company)
comparison = {
"companies": companies,
"total_patents": {c: p["total_patents"] for c, p in portfolios.items()},
"filing_trends": {c: p["patents_by_year"] for c, p in portfolios.items()},
"technology_overlap": self.find_technology_overlap(portfolios),
}
return comparison
def find_technology_overlap(self, portfolios):
"""Find technology areas where competitors overlap"""
all_areas = {}
for company, portfolio in portfolios.items():
for area, count in portfolio["technology_areas"].items():
if area not in all_areas:
all_areas[area] = {}
all_areas[area][company] = count
# Return areas where 2+ companies have patents
overlap = {
area: companies
for area, companies in all_areas.items()
if len(companies) >= 2
}
        return overlap

Converting Patent Data to B2B Leads
Transform patent intelligence into actionable sales leads. For teams already running web scraping operations, adding patent monitoring creates a high-value enrichment data source.
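Assignee names in patent data are notoriously inconsistent ("Acme Corp.", "Acme, Inc.", "ACME  CORPORATION"), so aggregating leads by raw assignee string will split one company into several. A simple normalization sketch; the suffix list is illustrative, not exhaustive:

```python
import re

# Common trailing legal suffixes to strip (illustrative subset).
_SUFFIXES = re.compile(
    r",?\s+(inc|corp|corporation|llc|ltd|co|gmbh|ag|sa)\.?$", re.IGNORECASE
)

def normalize_assignee(name):
    """Collapse whitespace and legal-suffix variants of an assignee name."""
    name = " ".join(name.split())   # collapse runs of whitespace
    name = _SUFFIXES.sub("", name)  # drop a trailing legal suffix
    return name.strip().lower()
```

Keying the `companies` dict in `find_leads_from_patents` on `normalize_assignee(assignee)` instead of the raw string avoids most duplicate leads.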
class PatentLeadGenerator:
"""Generate B2B leads from patent data"""
def __init__(self, proxy_pool):
self.proxy_pool = proxy_pool
def find_leads_from_patents(self, technology_area, min_patents=1):
"""Find companies filing patents in a specific area"""
proxy = self.proxy_pool.get_next()
scraper = USPTOScraper(proxy)
patents = scraper.search_patent_applications(
query=technology_area,
rows=500,
)
# Aggregate by company
companies = {}
for patent in patents:
assignee = patent.get("assignee", "Unknown")
if assignee == "Unknown":
continue
if assignee not in companies:
companies[assignee] = {
"company": assignee,
"patent_count": 0,
"patents": [],
"inventors": set(),
"latest_filing": None,
}
companies[assignee]["patent_count"] += 1
companies[assignee]["patents"].append(patent)
for inv in patent.get("inventors", []):
companies[assignee]["inventors"].add(inv)
filing_date = patent.get("filing_date")
if filing_date:
if not companies[assignee]["latest_filing"] or filing_date > companies[assignee]["latest_filing"]:
companies[assignee]["latest_filing"] = filing_date
# Filter and score
leads = []
for company_name, data in companies.items():
if data["patent_count"] >= min_patents:
data["inventors"] = list(data["inventors"])
data["lead_score"] = self.score_patent_lead(data)
leads.append(data)
return sorted(leads, key=lambda x: x["lead_score"], reverse=True)
def score_patent_lead(self, company_data):
"""Score a patent-based lead"""
score = 0
# More patents = more investment in the area
patent_count = company_data["patent_count"]
if patent_count >= 10:
score += 40
elif patent_count >= 5:
score += 30
elif patent_count >= 2:
score += 20
else:
score += 10
# Recent filings are more relevant
latest = company_data.get("latest_filing", "")
if latest:
from datetime import datetime
try:
filing_date = datetime.strptime(latest[:10], "%Y-%m-%d")
months_ago = (datetime.now() - filing_date).days / 30
if months_ago <= 6:
score += 30
elif months_ago <= 12:
score += 20
elif months_ago <= 24:
score += 10
except ValueError:
pass
# Multiple inventors = larger team
inventor_count = len(company_data.get("inventors", []))
if inventor_count >= 5:
score += 15
elif inventor_count >= 2:
score += 10
        return score

Automated Patent Alert System
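The alert system below assumes a small persistent key-value store (the `storage` object passed to its constructor) for remembering the date of the last check. A minimal JSON-file sketch:

```python
import json
import os

class JsonStorage:
    """Tiny key-value store backed by a JSON file on disk."""

    def __init__(self, path):
        self.path = path
        self._data = {}
        if os.path.exists(path):
            with open(path) as fh:
                self._data = json.load(fh)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        with open(self.path, "w") as fh:
            json.dump(self._data, fh)
```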
import json
from datetime import datetime
class PatentAlertSystem:
"""Automated alerts for new patent filings"""
def __init__(self, proxy_pool, storage):
self.proxy_pool = proxy_pool
self.storage = storage
def configure_alerts(self, alert_configs):
"""Set up monitoring alerts"""
self.alerts = alert_configs
# Example:
# [
# {"type": "company", "name": "CompetitorX", "technology": None},
# {"type": "technology", "name": None, "technology": "machine learning"},
# {"type": "inventor", "name": "John Smith", "technology": None},
# ]
def check_new_filings(self):
"""Check for new patent filings matching alert criteria"""
new_filings = []
proxy = self.proxy_pool.get_next()
scraper = USPTOScraper(proxy)
last_check = self.storage.get("last_patent_check", "2025-01-01")
        for alert in self.alerts:
            if alert["type"] == "company":
                results = scraper.search_patent_applications(
                    query=f'assignee:"{alert["name"]}"',
                    start_date=last_check,
                )
            elif alert["type"] == "technology":
                results = scraper.search_patent_applications(
                    query=alert["technology"],
                    start_date=last_check,
                )
            elif alert["type"] == "inventor":
                results = scraper.search_patent_applications(
                    query=f'inventor:"{alert["name"]}"',
                    start_date=last_check,
                )
            else:
                results = []  # skip unrecognized alert types
for result in results:
result["alert_type"] = alert["type"]
result["alert_name"] = alert.get("name") or alert.get("technology")
new_filings.append(result)
self.storage.set("last_patent_check", datetime.now().strftime("%Y-%m-%d"))
        return new_filings

Conclusion
Patent and trademark databases provide unique competitive intelligence and lead generation signals that most B2B teams overlook. Filing patterns reveal where companies are investing, what products they are developing, and which technology areas are seeing increased innovation. Mobile proxies ensure reliable access to USPTO, Google Patents, and international patent databases for large-scale data extraction. Build monitoring around your competitors’ filing activity, your target technology areas, and new trademark registrations to identify opportunities months before they become publicly obvious. This data enriches your existing lead profiles and creates an entirely new category of high-intent prospects.
Related Reading
- How to Build an Automated Lead Scraping Pipeline with Proxies
- Building a B2B Contact Enrichment Pipeline with Mobile Proxies
- How to Scrape Job Listings at Scale with Rotating Proxies
- Proxies for HR Tech: Salary Benchmarking & Talent Intelligence
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
last updated: April 3, 2026