How to Scrape Yellow Pages Business Data

Yellow Pages remains one of the largest business directories in the United States, listing millions of local businesses across every industry and geography. Scraping Yellow Pages gives you access to business names, addresses, phone numbers, categories, websites, and customer reviews — essential data for lead generation, market research, and competitive analysis.

This guide walks through scraping Yellow Pages with Python, covering HTML structure analysis, data extraction, pagination handling, proxy rotation, and best practices for reliable, large-scale collection.

Why Scrape Yellow Pages?

Yellow Pages data powers several valuable business use cases:

  • Lead generation — Build targeted prospect lists by industry and location [link to: b2b-lead-gen]
  • Market research — Analyze business density, competition levels, and market saturation by region
  • Data enrichment — Supplement existing CRM data with phone numbers, addresses, and websites
  • Competitive intelligence — Map competitor locations and service areas
  • Local SEO analysis — Audit business listings for NAP (Name, Address, Phone) consistency

What Data Can You Extract?

| Data Field         | Availability | Notes                       |
| ------------------ | ------------ | --------------------------- |
| Business name      | Always       | Primary listing title       |
| Address            | Usually      | Street, city, state, ZIP    |
| Phone number       | Usually      | Primary contact number      |
| Website URL        | Sometimes    | If listed by business       |
| Business category  | Always       | Industry classification     |
| Rating/Reviews     | Sometimes    | Star rating + review count  |
| Hours of operation | Sometimes    | If provided by business     |
| Years in business  | Occasionally | Badge on some listings      |
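For reference, a listing record can be modeled as a typed dictionary. This is an illustrative schema (the field names mirror the scraper output later in this guide, not any official Yellow Pages data model):

```python
from typing import Optional, TypedDict

class BusinessListing(TypedDict, total=False):
    name: str                  # always present on a valid listing
    address: str               # street, city, state, ZIP
    phone: Optional[str]
    website: Optional[str]
    category: Optional[str]
    rating: Optional[str]      # star rating, when shown
    review_count: str
    detail_url: str            # link to the business detail page

# Example record (values are made up for illustration)
listing: BusinessListing = {
    "name": "Acme Plumbing",
    "phone": "(212) 555-0100",
    "category": "Plumbers",
}
```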

Understanding Yellow Pages Structure

Yellow Pages search results follow a predictable URL pattern:

https://www.yellowpages.com/search?search_terms={query}&geo_location_terms={location}

For example:

  • Plumbers in New York: https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny
  • Restaurants in Chicago: https://www.yellowpages.com/search?search_terms=restaurant&geo_location_terms=chicago+il

Pagination adds a page parameter:

https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny&page=2
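If you generate many of these URLs, it is safer to assemble them with urllib than by string concatenation; a minimal sketch using the parameter names from the patterns above (the helper name is illustrative):

```python
from urllib.parse import urlencode

BASE = "https://www.yellowpages.com/search"

def search_url(query: str, location: str, page: int = 1) -> str:
    """Build a Yellow Pages search URL; spaces are encoded as '+'."""
    params = {"search_terms": query, "geo_location_terms": location}
    if page > 1:
        params["page"] = page
    return f"{BASE}?{urlencode(params)}"

print(search_url("plumber", "new york ny", page=2))
# https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=new+york+ny&page=2
```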

Basic Yellow Pages Scraper

import requests
from bs4 import BeautifulSoup
import json
import time

class YellowPagesScraper:
    def __init__(self, proxy=None):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive',
        })
        if proxy:
            self.session.proxies = {
                'http': proxy,
                'https': proxy,
            }

    def search(self, query, location, max_pages=5):
        """Search Yellow Pages and return business listings."""
        all_businesses = []
        for page in range(1, max_pages + 1):
            url = "https://www.yellowpages.com/search"
            params = {
                'search_terms': query,
                'geo_location_terms': location,
                'page': page,
            }
            print(f"Scraping page {page} for '{query}' in '{location}'...")
            try:
                response = self.session.get(url, params=params, timeout=30)
                response.raise_for_status()
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
            businesses = self._parse_results(response.text)
            if not businesses:
                print(f"No more results on page {page}. Stopping.")
                break
            all_businesses.extend(businesses)
            print(f"  Found {len(businesses)} businesses (total: {len(all_businesses)})")
            # Rate limiting
            time.sleep(2)
        return all_businesses

    def _parse_results(self, html):
        """Parse search results page and extract business data."""
        soup = BeautifulSoup(html, 'lxml')
        businesses = []
        # Find all business listing cards
        listings = soup.select('div.result')
        for listing in listings:
            business = {}
            # Business name
            name_el = listing.select_one('a.business-name')
            if name_el:
                business['name'] = name_el.text.strip()
                business['detail_url'] = 'https://www.yellowpages.com' + name_el.get('href', '')
            else:
                continue  # Skip if no name found
            # Phone number
            phone_el = listing.select_one('div.phones')
            business['phone'] = phone_el.text.strip() if phone_el else None
            # Address
            street_el = listing.select_one('div.street-address')
            locality_el = listing.select_one('div.locality')
            street = street_el.text.strip() if street_el else ''
            locality = locality_el.text.strip() if locality_el else ''
            business['address'] = f"{street}, {locality}".strip(', ')
            # Category
            category_el = listing.select_one('div.categories a')
            business['category'] = category_el.text.strip() if category_el else None
            # Rating
            rating_el = listing.select_one('div.ratings')
            if rating_el:
                stars = rating_el.select_one('div.rating')
                count = rating_el.select_one('span.count')
                # The star rating is encoded in the element's second CSS class
                star_classes = stars.get('class', []) if stars else []
                business['rating'] = star_classes[1] if len(star_classes) > 1 else None
                business['review_count'] = count.text.strip('()') if count else '0'
            else:
                business['rating'] = None
                business['review_count'] = '0'
            # Website
            website_el = listing.select_one('a.track-visit-website')
            business['website'] = website_el.get('href', None) if website_el else None
            businesses.append(business)
        return businesses

    def save_json(self, data, filename):
        """Save results to JSON file."""
        with open(filename, 'w') as f:
            json.dump(data, f, indent=2)
        print(f"Saved {len(data)} businesses to {filename}")

    def save_csv(self, data, filename):
        """Save results to CSV file."""
        import csv
        if not data:
            return
        keys = data[0].keys()
        with open(filename, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=keys)
            writer.writeheader()
            writer.writerows(data)
        print(f"Saved {len(data)} businesses to {filename}")

Usage

scraper = YellowPagesScraper(
    proxy='http://user:pass@gate.smartproxy.com:7777'
)

# Search for plumbers in New York
businesses = scraper.search('plumber', 'New York, NY', max_pages=5)

# Save results
scraper.save_json(businesses, 'plumbers_nyc.json')
scraper.save_csv(businesses, 'plumbers_nyc.csv')

Scraping Business Detail Pages

For richer data, scrape individual business detail pages:

def scrape_detail_page(self, url):
    """Scrape a business detail page for additional information."""
    try:
        response = self.session.get(url, timeout=30)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching detail page: {e}")
        return {}
    soup = BeautifulSoup(response.text, 'lxml')
    details = {}
    # Business hours
    hours_section = soup.select('div.open-details tr')
    if hours_section:
        hours = {}
        for row in hours_section:
            day = row.select_one('td:first-child')
            time_val = row.select_one('td:last-child')
            if day and time_val:
                hours[day.text.strip()] = time_val.text.strip()
        details['hours'] = hours
    # About / Description
    about = soup.select_one('dd.details-description')
    details['description'] = about.text.strip() if about else None
    # Services offered
    services = soup.select('div.services-services a')
    details['services'] = [s.text.strip() for s in services]
    # Amenities
    amenities = soup.select('div.amenities dd')
    details['amenities'] = [a.text.strip() for a in amenities]
    # Years in business
    years = soup.select_one('div.years-in-business span.count')
    details['years_in_business'] = years.text.strip() if years else None
    return details
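One way to tie this into the search results is a small enrichment helper that merges detail-page fields into each listing. A sketch; the fetch_details callable (e.g. the scrape_detail_page method above, once added to the scraper class) and the delay default are assumptions:

```python
import time

def enrich_listings(listings, fetch_details, delay=2.0):
    """Merge detail-page data into each search result.

    fetch_details is any callable that takes a detail URL and
    returns a dict of extra fields.
    """
    enriched = []
    for biz in listings:
        # Only fetch when the search result carried a detail URL
        details = fetch_details(biz["detail_url"]) if biz.get("detail_url") else {}
        enriched.append({**biz, **details})
        time.sleep(delay)  # stay polite between detail requests
    return enriched
```

Usage would look like `full = enrich_listings(businesses, scraper.scrape_detail_page)`.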

Multi-Location Scraping

To scrape across multiple cities or industries:

import itertools

def scrape_multiple_locations(scraper, queries, locations, max_pages=3):
    """Scrape Yellow Pages across multiple queries and locations."""
    all_results = []
    combinations = list(itertools.product(queries, locations))
    total = len(combinations)
    for i, (query, location) in enumerate(combinations, 1):
        print(f"\n[{i}/{total}] Scraping '{query}' in '{location}'")
        results = scraper.search(query, location, max_pages=max_pages)
        for r in results:
            r['search_query'] = query
            r['search_location'] = location
        all_results.extend(results)
        print(f"  Running total: {len(all_results)} businesses")
        time.sleep(3)  # Delay between searches
    return all_results

# Example: Scrape multiple industries across multiple cities
queries = ['plumber', 'electrician', 'hvac contractor']
locations = ['New York, NY', 'Los Angeles, CA', 'Chicago, IL', 'Houston, TX']

scraper = YellowPagesScraper(proxy='http://user:pass@gate.smartproxy.com:7777')
all_businesses = scrape_multiple_locations(scraper, queries, locations)
scraper.save_json(all_businesses, 'contractors_multi_city.json')

Handling Anti-Scraping Measures

Yellow Pages implements several anti-scraping protections:

Rate Limiting

Yellow Pages will block your IP or serve CAPTCHAs if you send requests too quickly. Best practices:

  • Add 2-3 second delays between page requests
  • Add 5-10 second delays between different search queries
  • Randomize delay timing to avoid pattern detection

import random

delay = random.uniform(2.0, 4.0)
time.sleep(delay)

Proxy Rotation

For large-scale scraping, rotate through residential proxies to distribute requests across many IP addresses [link to: best-residential-proxy-providers]:

proxy_list = [
    'http://user:pass@gate.smartproxy.com:7777',
    'http://user:pass@gate.smartproxy.com:7778',
    'http://user:pass@gate.smartproxy.com:7779',
]

def get_rotating_proxy():
    return random.choice(proxy_list)
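A request helper can then draw a fresh proxy from the pool on every call; a minimal sketch assuming the placeholder proxy URLs above (proxies_for and fetch are illustrative names):

```python
import random

import requests

PROXY_POOL = [
    'http://user:pass@gate.smartproxy.com:7777',
    'http://user:pass@gate.smartproxy.com:7778',
    'http://user:pass@gate.smartproxy.com:7779',
]

def proxies_for(proxy: str) -> dict:
    # requests expects a scheme -> proxy-URL mapping
    return {'http': proxy, 'https': proxy}

def fetch(url: str, **kwargs) -> requests.Response:
    # Pick a different proxy on each call to spread requests across IPs
    proxy = random.choice(PROXY_POOL)
    return requests.get(url, proxies=proxies_for(proxy), timeout=30, **kwargs)
```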

User-Agent Rotation

Rotate user agents to avoid fingerprint-based blocking:

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/537.36 Chrome/120.0.0.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/605.1.15 Safari/17.0',
]

headers = {'User-Agent': random.choice(USER_AGENTS)}

Data Cleaning and Deduplication

Raw scraped data often needs cleaning:

import re

def clean_business_data(businesses):
    """Clean and deduplicate scraped business listings."""
    seen = set()
    cleaned = []
    for biz in businesses:
        # Normalize phone number to (XXX) XXX-XXXX
        if biz.get('phone'):
            biz['phone'] = re.sub(r'[^\d]', '', biz['phone'])
            if len(biz['phone']) == 10:
                biz['phone'] = f"({biz['phone'][:3]}) {biz['phone'][3:6]}-{biz['phone'][6:]}"
        # Normalize name
        if biz.get('name'):
            biz['name'] = biz['name'].strip()
        # Deduplicate by phone + name combination
        key = (biz.get('name', '').lower(), biz.get('phone') or '')
        if key not in seen:
            seen.add(key)
            cleaned.append(biz)
    print(f"Cleaned: {len(businesses)} -> {len(cleaned)} (removed {len(businesses) - len(cleaned)} duplicates)")
    return cleaned

Alternative Approaches

Using the Yellow Pages API

Yellow Pages does not offer a public API, but some third-party services provide structured Yellow Pages data:

  • SerpApi — Offers a Yellow Pages API endpoint
  • Outscraper — Provides Yellow Pages data as a service
  • ScrapingBee — Handles rendering and anti-bot for you

These services are more expensive per query but eliminate the need to manage proxies and anti-detection yourself [link to: best-web-scraping-apis].

Similar Directories to Scrape

If Yellow Pages does not cover your needs, consider these alternatives:

  • Yelp — More reviews and photos [link to: how-to-scrape-yelp]
  • Google Maps — Largest business directory globally
  • BBB (Better Business Bureau) — Business ratings and complaints
  • Manta — US business directory
  • Whitepages — Phone and address lookups

Legal and Ethical Considerations

Before scraping Yellow Pages at scale, understand the legal landscape:

  • Yellow Pages’ Terms of Service prohibit automated data collection
  • Business names, addresses, and phone numbers are generally considered public information
  • Respect rate limits and do not overload their servers
  • Consider the CFAA (Computer Fraud and Abuse Act) implications for US-based scrapers
  • GDPR may apply if you collect data about EU-based businesses or individuals

Always consult with a legal professional before building a commercial scraping operation [link to: is-web-scraping-legal].

Frequently Asked Questions

Is it legal to scrape Yellow Pages?

Scraping publicly available business directory data exists in a legal gray area. While the data itself (business names, addresses, phone numbers) is generally public, Yellow Pages’ Terms of Service prohibit automated collection. The legal risk depends on your jurisdiction, the volume of data collected, and how you use it. Consult a lawyer for your specific use case [link to: is-web-scraping-legal].

How many listings can I scrape from Yellow Pages?

Yellow Pages shows approximately 30 listings per search results page with up to 100 pages per search query (around 3,000 results per search). By combining different search terms and locations, you can access millions of listings. Using residential proxies and proper rate limiting, you can reliably scrape thousands of listings per day [link to: best-rotating-proxy-services].

Does Yellow Pages have an API?

Yellow Pages does not offer a public API for bulk data access. You can use third-party services like SerpApi or Outscraper that provide structured Yellow Pages data through their APIs, or build your own scraper as shown in this guide [link to: best-web-scraping-apis].

How do I avoid getting blocked when scraping Yellow Pages?

Use residential proxies with rotation, add random delays between requests (2-5 seconds), rotate user agents, and avoid scraping during peak hours. If you encounter CAPTCHAs, reduce your request rate and switch to a fresh set of proxy IPs [link to: best-residential-proxy-providers].

Can I scrape Yellow Pages with a no-code tool?

Yes, tools like Octoparse, ParseHub, and WebScraper.io can scrape Yellow Pages without coding. These tools provide visual point-and-click interfaces for selecting data elements and handling pagination. However, they may struggle with anti-bot protections at scale [link to: best-no-code-web-scrapers].

last updated: March 12, 2026
