How to Scrape Yellow Pages Business Data

Yellow Pages remains one of the largest business directories in the United States, listing millions of local businesses across every industry and geography. Scraping Yellow Pages gives you access to business names, addresses, phone numbers, categories, websites, and customer reviews — essential data for lead generation, market research, and competitive analysis.

This guide walks through scraping Yellow Pages with Python, covering HTML structure analysis, data extraction, pagination handling, proxy rotation, and best practices for reliable, large-scale collection.

Why Scrape Yellow Pages?

Yellow Pages data powers several valuable business use cases:

  • Lead generation — Build targeted prospect lists by industry and location [link to: b2b-lead-gen]
  • Market research — Analyze business density, competition levels, and market saturation by region
  • Data enrichment — Supplement existing CRM data with phone numbers, addresses, and websites
  • Competitive intelligence — Map competitor locations and service areas
  • Local SEO analysis — Audit business listings for NAP (Name, Address, Phone) consistency

What Data Can You Extract?

| Data Field         | Availability | Notes                       |
| ------------------ | ------------ | --------------------------- |
| Business name      | Always       | Primary listing title       |
| Address            | Usually      | Street, city, state, ZIP    |
| Phone number       | Usually      | Primary contact number      |
| Website URL        | Sometimes    | If listed by business       |
| Business category  | Always       | Industry classification     |
| Rating/Reviews     | Sometimes    | Star rating + review count  |
| Hours of operation | Sometimes    | If provided by business     |
| Years in business  | Occasionally | Badge on some listings      |
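For reference, a listing record can be modeled as a typed dictionary. This is an illustrative schema (the field names mirror the scraper output later in this guide, not any official Yellow Pages data model):

```python
from typing import Optional, TypedDict

class BusinessListing(TypedDict, total=False):
    name: str                  # always present on a valid listing
    address: str               # street, city, state, ZIP
    phone: Optional[str]
    website: Optional[str]
    category: Optional[str]
    rating: Optional[str]      # star rating, when shown
    review_count: str
    detail_url: str            # link to the business detail page

# Example record (values are made up for illustration)
listing: BusinessListing = {
    "name": "Acme Plumbing",
    "phone": "(212) 555-0100",
    "category": "Plumbers",
}
```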

Understanding Yellow Pages Structure

Yellow Pages search results follow a predictable URL pattern:

https://www.yellowpages.com/search?search_terms={query}&geo_location_terms={location}

For example:

  • Plumbers in New York: https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny
  • Restaurants in Chicago: https://www.yellowpages.com/search?search_terms=restaurant&geo_location_terms=chicago+il

Pagination adds a page parameter:

https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny&page=2
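If you generate many of these URLs, it is safer to assemble them with urllib than by string concatenation; a minimal sketch using the parameter names from the patterns above (the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE = "https://www.yellowpages.com/search"

def search_url(query: str, location: str, page: int = 1) -> str:
    """Build a Yellow Pages search URL; spaces are encoded as '+'."""
    params = {"search_terms": query, "geo_location_terms": location}
    if page > 1:
        params["page"] = page
    return f"{BASE}?{urlencode(params)}"

print(search_url("plumber", "new york ny", page=2))
# https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny&page=2
```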

Basic Yellow Pages Scraper

import requests
from bs4 import BeautifulSoup
import json
import time

class YellowPagesScraper:
    def __init__(self, proxy=None):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
        })
        if proxy:
            self.session.proxies = {
                'http': proxy,
                'https': proxy,
            }

    def search(self, query, location, max_pages=5):
        """Search Yellow Pages and return business listings."""
        all_businesses = []
        for page in range(1, max_pages + 1):
            url = "https://www.yellowpages.com/search"
            params = {
                'search_terms': query,
                'geo_location_terms': location,
                'page': page,
            }
            print(f"Scraping page {page} for '{query}' in '{location}'...")
            try:
                response = self.session.get(url, params=params, timeout=30)
                response.raise_for_status()
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
            businesses = self._parse_results(response.text)
            if not businesses:
                print(f"No more results on page {page}. Stopping.")
                break
            all_businesses.extend(businesses)
            print(f"  Found {len(businesses)} businesses (total: {len(all_businesses)})")
            # Rate limiting
            time.sleep(2)
        return all_businesses

    def _parse_results(self, html):
        """Parse search results page and extract business data."""
        soup = BeautifulSoup(html, 'lxml')
        businesses = []
        # Find all business listing cards
        listings = soup.select('div.result')
        for listing in listings:
            business = {}
            # Business name
            name_el = listing.select_one('a.business-name')
            if name_el:
                business['name'] = name_el.text.strip()
                business['detail_url'] = 'https://www.yellowpages.com' + name_el.get('href', '')
            else:
                continue  # Skip if no name found
            # Phone number
            phone_el = listing.select_one('div.phones')
            business['phone'] = phone_el.text.strip() if phone_el else None
            # Address
            street_el = listing.select_one('div.street-address')
            locality_el = listing.select_one('div.locality')
            street = street_el.text.strip() if street_el else ''
            locality = locality_el.text.strip() if locality_el else ''
            business['address'] = f"{street}, {locality}".strip(', ')
            # Category
            category_el = listing.select_one('div.categories a')
            business['category'] = category_el.text.strip() if category_el else None
            # Rating
            rating_el = listing.select_one('div.ratings')
            if rating_el:
                stars = rating_el.select_one('div.rating')
                count = rating_el.select_one('span.count')
                # The star rating is encoded in the element's second CSS class
                star_classes = stars.get('class', []) if stars else []
                business['rating'] = star_classes[1] if len(star_classes) > 1 else None
                business['review_count'] = count.text.strip('()') if count else '0'
            else:
                business['rating'] = None
                business['review_count'] = '0'
            # Website
            website_el = listing.select_one('a.track-visit-website')
            business['website'] = website_el.get('href', None) if website_el else None
            businesses.append(business)
        return businesses

    def save_json(self, data, filename):
        """Save results to JSON file."""
        with open(filename, 'w') as f:
            json.dump(data, f, indent=2)
        print(f"Saved {len(data)} businesses to {filename}")

    def save_csv(self, data, filename):
        """Save results to CSV file."""
        import csv
        if not data:
            return
        keys = data[0].keys()
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(data)
        print(f"Saved {len(data)} businesses to {filename}")

Usage

scraper = YellowPagesScraper(
    proxy='http://user:pass@gate.smartproxy.com:7777'
)

# Search for plumbers in New York
businesses = scraper.search('plumber', 'New York, NY', max_pages=5)

# Save results
scraper.save_json(businesses, 'plumbers_nyc.json')
scraper.save_csv(businesses, 'plumbers_nyc.csv')

Scraping Business Detail Pages

For richer data, scrape individual business detail pages:

def scrape_detail_page(self, url):
    """Scrape a business detail page for additional information."""
    try:
        response = self.session.get(url, timeout=30)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching detail page: {e}")
        return {}
    soup = BeautifulSoup(response.text, 'lxml')
    details = {}
    # Business hours
    hours_section = soup.select('div.open-details tr')
    if hours_section:
        hours = {}
        for row in hours_section:
            day = row.select_one('td:first-child')
            time_val = row.select_one('td:last-child')
            if day and time_val:
                hours[day.text.strip()] = time_val.text.strip()
        details['hours'] = hours
    # About / Description
    about = soup.select_one('dd.details-description')
    details['description'] = about.text.strip() if about else None
    # Services offered
    services = soup.select('div.services-services a')
    details['services'] = [s.text.strip() for s in services]
    # Amenities
    amenities = soup.select('div.amenities dd')
    details['amenities'] = [a.text.strip() for a in amenities]
    # Years in business
    years = soup.select_one('div.years-in-business span.count')
    details['years_in_business'] = years.text.strip() if years else None
    return details
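One way to tie this into the search results is a small enrichment helper that merges detail-page fields into each listing. A sketch; the fetch_details callable (e.g. the scrape_detail_page method above, once added to the scraper class) and the delay default are assumptions:

```python
import time

def enrich_listings(listings, fetch_details, delay=2.0):
    """Merge detail-page data into each search result.

    fetch_details is any callable that takes a detail URL and
    returns a dict of extra fields.
    """
    enriched = []
    for biz in listings:
        # Only fetch when the search result carried a detail URL
        details = fetch_details(biz["detail_url"]) if biz.get("detail_url") else {}
        enriched.append({**biz, **details})
        time.sleep(delay)  # stay polite between detail requests
    return enriched
```

Usage would look like `full = enrich_listings(businesses, scraper.scrape_detail_page)`.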

Multi-Location Scraping

To scrape across multiple cities or industries:

import itertools

def scrape_multiple_locations(scraper, queries, locations, max_pages=3):
    """Scrape Yellow Pages across multiple queries and locations."""
    all_results = []
    combinations = list(itertools.product(queries, locations))
    total = len(combinations)
    for i, (query, location) in enumerate(combinations, 1):
        print(f"\n[{i}/{total}] Scraping '{query}' in '{location}'")
        results = scraper.search(query, location, max_pages=max_pages)
        for r in results:
            r['search_query'] = query
            r['search_location'] = location
        all_results.extend(results)
        print(f"  Running total: {len(all_results)} businesses")
        time.sleep(3)  # Delay between searches
    return all_results

# Example: Scrape multiple industries across multiple cities
queries = ['plumber', 'electrician', 'hvac contractor']
locations = ['New York, NY', 'Los Angeles, CA', 'Chicago, IL', 'Houston, TX']

scraper = YellowPagesScraper(proxy='http://user:pass@gate.smartproxy.com:7777')
all_businesses = scrape_multiple_locations(scraper, queries, locations)
scraper.save_json(all_businesses, 'contractors_multi_city.json')

Handling Anti-Scraping Measures

Yellow Pages implements several anti-scraping protections:

Rate Limiting

Yellow Pages will block your IP or serve CAPTCHAs if you send requests too quickly. Best practices:

  • Add 2-3 second delays between page requests
  • Add 5-10 second delays between different search queries
  • Randomize delay timing to avoid pattern detection

import random

delay = random.uniform(2.0, 4.0)
time.sleep(delay)

Proxy Rotation

For large-scale scraping, rotate through residential proxies to distribute requests across many IP addresses [link to: best-residential-proxy-providers]:

proxy_list = [
    'http://user:pass@gate.smartproxy.com:7777',
    'http://user:pass@gate.smartproxy.com:7778',
    'http://user:pass@gate.smartproxy.com:7779',
]

def get_rotating_proxy():
    return random.choice(proxy_list)
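A request helper can then draw a fresh proxy from the pool on every call; a minimal sketch assuming the placeholder proxy URLs above (proxies_for and fetch are illustrative names):

```python
import random

import requests

PROXY_POOL = [
    'http://user:pass@gate.smartproxy.com:7777',
    'http://user:pass@gate.smartproxy.com:7778',
    'http://user:pass@gate.smartproxy.com:7779',
]

def proxies_for(proxy: str) -> dict:
    # requests expects a scheme -> proxy-URL mapping
    return {'http': proxy, 'https': proxy}

def fetch(url: str, **kwargs) -> requests.Response:
    # Pick a different proxy on each call to spread requests across IPs
    proxy = random.choice(PROXY_POOL)
    return requests.get(url, proxies=proxies_for(proxy), timeout=30, **kwargs)
```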

User-Agent Rotation

Rotate user agents to avoid fingerprint-based blocking:

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/537.36 Chrome/120.0.0.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/605.1.15 Safari/17.0',
]

headers = {'User-Agent': random.choice(USER_AGENTS)}

Data Cleaning and Deduplication

Raw scraped data often needs cleaning:

import re

def clean_business_data(businesses):
    """Clean and deduplicate scraped business listings."""
    seen = set()
    cleaned = []
    for biz in businesses:
        # Normalize phone number to (XXX) XXX-XXXX
        if biz.get('phone'):
            biz['phone'] = re.sub(r'[^\d]', '', biz['phone'])
            if len(biz['phone']) == 10:
                biz['phone'] = f"({biz['phone'][:3]}) {biz['phone'][3:6]}-{biz['phone'][6:]}"
        # Normalize name
        if biz.get('name'):
            biz['name'] = biz['name'].strip()
        # Deduplicate by phone + name combination
        key = (biz.get('name', '').lower(), biz.get('phone') or '')
        if key not in seen:
            seen.add(key)
            cleaned.append(biz)
    print(f"Cleaned: {len(businesses)} -> {len(cleaned)} (removed {len(businesses) - len(cleaned)} duplicates)")
    return cleaned

Alternative Approaches

Using the Yellow Pages API

Yellow Pages does not offer a public API, but some third-party services provide structured Yellow Pages data:

  • SerpApi — Offers a Yellow Pages API endpoint
  • Outscraper — Provides Yellow Pages data as a service
  • ScrapingBee — Handles rendering and anti-bot for you

These services are more expensive per query but eliminate the need to manage proxies and anti-detection yourself [link to: best-web-scraping-apis].

Similar Directories to Scrape

If Yellow Pages does not cover your needs, consider these alternatives:

  • Yelp — More reviews and photos [link to: how-to-scrape-yelp]
  • Google Maps — Largest business directory globally
  • BBB (Better Business Bureau) — Business ratings and complaints
  • Manta — US business directory
  • Whitepages — Phone and address lookups

Legal and Ethical Considerations

Before scraping Yellow Pages at scale, understand the legal landscape:

  • Yellow Pages’ Terms of Service prohibit automated data collection
  • Business names, addresses, and phone numbers are generally considered public information
  • Respect rate limits and do not overload their servers
  • Consider the CFAA (Computer Fraud and Abuse Act) implications for US-based scrapers
  • GDPR may apply if you collect data about EU-based businesses or individuals

Always consult with a legal professional before building a commercial scraping operation [link to: is-web-scraping-legal].

Frequently Asked Questions

Is it legal to scrape Yellow Pages?

Scraping publicly available business directory data exists in a legal gray area. While the data itself (business names, addresses, phone numbers) is generally public, Yellow Pages’ Terms of Service prohibit automated collection. The legal risk depends on your jurisdiction, the volume of data collected, and how you use it. Consult a lawyer for your specific use case [link to: is-web-scraping-legal].

How many listings can I scrape from Yellow Pages?

Yellow Pages shows approximately 30 listings per search results page with up to 100 pages per search query (around 3,000 results per search). By combining different search terms and locations, you can access millions of listings. Using residential proxies and proper rate limiting, you can reliably scrape thousands of listings per day [link to: best-rotating-proxy-services].

Does Yellow Pages have an API?

Yellow Pages does not offer a public API for bulk data access. You can use third-party services like SerpApi or Outscraper that provide structured Yellow Pages data through their APIs, or build your own scraper as shown in this guide [link to: best-web-scraping-apis].

How do I avoid getting blocked when scraping Yellow Pages?

Use residential proxies with rotation, add random delays between requests (2-5 seconds), rotate user agents, and avoid scraping during peak hours. If you encounter CAPTCHAs, reduce your request rate and switch to a fresh set of proxy IPs [link to: best-residential-proxy-providers].

Can I scrape Yellow Pages with a no-code tool?

Yes, tools like Octoparse, ParseHub, and WebScraper.io can scrape Yellow Pages without coding. These tools provide visual point-and-click interfaces for selecting data elements and handling pagination. However, they may struggle with anti-bot protections at scale [link to: best-no-code-web-scrapers].

last updated: March 12, 2026
