Scraping Zillow and Real Estate Data with Proxies

Why Real Estate Data Is Worth Scraping

Real estate is one of the most data-intensive industries, yet much of the critical data is locked behind platforms that restrict programmatic access. Zillow, Redfin, Realtor.com, and their international equivalents aggregate property listings, historical sales data, price estimates, and neighborhood statistics that power billions of dollars in investment decisions.

Investors use scraped real estate data to identify undervalued properties before they hit mainstream attention. PropTech companies build valuation models trained on historical listing data. Market researchers track inventory trends and pricing dynamics across hundreds of metro areas. Real estate agencies monitor competitor listings and pricing strategies.

The challenge is that major real estate platforms have strong financial incentives to prevent scraping. Their data is their product, and they invest accordingly in protecting it.

Anti-Scraping Measures Across Major Platforms

Each real estate platform has a different defensive posture.

Zillow

Zillow is the most heavily protected of the major US real estate platforms. Their defenses include:

  • PerimeterX (now HUMAN): Zillow uses HUMAN’s bot management platform, which employs advanced behavioral analysis, device fingerprinting, and machine learning-based detection.
  • Heavy JavaScript rendering: Property data is loaded dynamically via API calls from their React-based frontend. Raw HTTP requests return empty shells.
  • Rate limiting: Aggressive per-IP rate limits, with residential IPs capped at roughly 50-100 pages per hour.
  • Legal enforcement: Zillow has sent cease-and-desist letters to scraping operations and has terms of service that explicitly prohibit automated data collection.

Redfin

Redfin’s protections are moderately strong:

  • Cloudflare: Redfin uses Cloudflare for bot detection and DDoS protection, which provides IP reputation scoring, JavaScript challenges, and browser fingerprint validation.
  • API-based data loading: Like Zillow, property details are fetched via internal APIs, not embedded in the initial HTML.
  • Rate limiting: Less aggressive than Zillow, but still present. Approximately 100-200 pages per hour per IP.
  • Geographic restrictions: Some data is restricted based on the requester’s location.

Realtor.com

Realtor.com (operated by Move, Inc., a subsidiary of News Corp) has:

  • Akamai Bot Manager: Enterprise-grade bot detection with sophisticated fingerprinting.
  • Session-based tracking: Strong session validation that detects session anomalies.
  • Data partitioning: Different data endpoints for listings, sold data, and estimates, each with different access controls.

International Platforms

Platforms like PropertyGuru (Southeast Asia), Rightmove (UK), and Domain.com.au (Australia) generally have lighter anti-scraping measures than US platforms, making them more accessible starting points for real estate data projects.

Proxy Setup for Real Estate Scraping

Real estate platforms present unique proxy requirements.

Mobile Proxies: The Optimal Choice

Real estate browsing is inherently mobile. Over 70% of property searches start on mobile devices. This means mobile proxy traffic blends naturally with legitimate real estate platform traffic.

Using mobile proxies for web scraping on real estate platforms provides:

  • High IP trust scores that bypass initial reputation checks
  • Natural traffic patterns that align with how real users browse property listings
  • CGNAT-shared IPs that platforms cannot block without disrupting real users
  • Better success rates against HUMAN and Cloudflare detection layers

Geographic Targeting

Real estate data is inherently geographic. Property listings in Miami are different from listings in Singapore. Your proxy geography should match your target market for two reasons:

  1. Data relevance: Platforms serve different content based on request origin.
  2. Detection avoidance: Browsing property listings in a market 10,000 miles from your IP location looks suspicious.

For APAC real estate markets (Singapore, Australia, Hong Kong), DataResearchTools’ Singapore-based mobile proxies provide the geographic alignment needed for authentic request patterns.

Session Management

Real estate browsing is session-heavy. A typical user searches in an area, views 10-20 listings, compares prices, and returns later. Your proxy sessions should mirror this:

  • Maintain sticky sessions for 15-30 minutes per browsing session
  • Use the same IP across a search and the resulting listing views
  • Implement session breaks (30-60 minute gaps) between browsing sessions
  • Rotate to a new IP for each new “browsing session”
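This rotation policy can be sketched as a small helper. The gateway hostname and the session-ID-in-username credential format are assumptions — a pattern many proxy providers support, not any specific product's API:

```python
import time
import uuid

# Hypothetical proxy gateway that pins an exit IP per session ID
# embedded in the username (a common provider convention).
PROXY_HOST = "proxy.example.com:8000"

class StickySession:
    def __init__(self, duration_s=20 * 60):
        # Hold one IP for a 15-30 minute browsing window.
        self.duration_s = duration_s
        self.session_id = uuid.uuid4().hex[:12]
        self.started_at = time.monotonic()

    def expired(self):
        return time.monotonic() - self.started_at >= self.duration_s

    def proxy_url(self):
        # Same session ID -> same IP across a search and its listing views.
        return f"http://user-session-{self.session_id}:pass@{PROXY_HOST}"

def current_proxy(holder):
    # Rotate to a fresh IP (new session ID) once the window expires.
    if holder[0] is None or holder[0].expired():
        holder[0] = StickySession()
    return holder[0].proxy_url()

holder = [None]
url_a = current_proxy(holder)
url_b = current_proxy(holder)  # same session while within the window
```

In real use the returned URL would be passed as the proxy for every request in that browsing session; the 30-60 minute gap between sessions is simply a matter of when the scheduler next invokes the scraper.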

Data Points to Extract

Active Listings

For current listings, target these data points:

  • Property basics: Address, listing price, property type (single-family, condo, townhouse, multi-family)
  • Physical attributes: Bedrooms, bathrooms, square footage, lot size, year built, stories
  • Listing details: Days on market, listing agent, brokerage, MLS number, listing status
  • Financial indicators: Price per square foot, HOA fees, property tax amount, estimated mortgage payment
  • Media: Photo URLs, virtual tour links, floor plan availability
  • Description: Full listing description text (valuable for NLP analysis)
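A lightweight record type keeps these fields consistent across platforms and parsers. The field names below are illustrative, not any platform's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# One way to model the active-listing data points above; field names
# are illustrative, not taken from any platform's API.
@dataclass
class Listing:
    address: str
    list_price: int
    property_type: str            # e.g. "single-family", "condo"
    bedrooms: int
    bathrooms: float
    sqft: int
    year_built: Optional[int] = None
    days_on_market: Optional[int] = None
    hoa_fee: Optional[int] = None
    photo_urls: list = field(default_factory=list)
    description: str = ""         # useful for NLP analysis later

    @property
    def price_per_sqft(self) -> float:
        # Derived financial indicator rather than a scraped field.
        return round(self.list_price / self.sqft, 2)

listing = Listing("123 Example St", 450_000, "condo", 2, 2.0, 1_000)
```

Storing raw platform responses alongside this normalized record makes it possible to re-parse later when you add new fields.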

Price History

Price history data is often the most valuable dataset:

  • Historical listing prices (initial list price, price reductions, final sale price)
  • Dates of each price event
  • Time on market for each listing period
  • Price appreciation/depreciation over time
  • Rental price history (where available)

Zillow Zestimate and Redfin Estimate

Platform-generated property valuations are valuable for modeling:

  • Current estimated value
  • Value change over time (monthly, yearly)
  • Estimate confidence range (high/low)
  • Comparison to listing price (for currently listed properties)
  • Rent estimate (Zillow provides “Rent Zestimates”)

Neighborhood and Market Data

Aggregate data at the neighborhood level:

  • Median home prices and trends
  • Inventory levels (active listings count)
  • Average days on market
  • Price-to-rent ratios
  • School ratings and proximity
  • Walk score, transit score, bike score
  • Crime statistics (where available)
  • Demographics

Sold/Closed Transaction Data

Historical transaction data powers valuation models:

  • Sale price
  • Sale date
  • Buyer and seller (in states with public disclosure)
  • Transfer tax amounts
  • Price relative to list price (over/under asking)

Scraping Architecture for Real Estate Data

Phase 1: Market Discovery

Start with search pages to build a comprehensive list of properties in your target market.

  1. Define geographic boundaries (ZIP codes, cities, or map bounding boxes)
  2. Execute searches with filters (property type, price range, status)
  3. Paginate through results to collect all property IDs/URLs
  4. Store property identifiers for Phase 2
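The four steps sketch naturally as a paginated loop. `fetch_search_page` is a hypothetical stand-in for whatever issues the proxied request and parses out property IDs plus a next-page flag:

```python
# Phase 1 sketch: paginate each search area and collect deduplicated
# property IDs. fetch_search_page(area, page) -> (ids, has_next) is a
# hypothetical callable wrapping the actual proxied request.
def discover_market(areas, fetch_search_page, max_pages=50):
    seen = set()
    for area in areas:
        for page in range(1, max_pages + 1):
            ids, has_next = fetch_search_page(area, page)
            seen.update(ids)       # dedupe across pages and areas
            if not has_next:
                break
    return sorted(seen)            # property identifiers for Phase 2

# Toy stand-in: two pages of overlapping results for one ZIP code.
def fake_fetch(area, page):
    data = {1: (["p1", "p2"], True), 2: (["p2", "p3"], False)}
    return data[page]

ids = discover_market(["94105"], fake_fetch)
```

Capping `max_pages` guards against search pages that paginate indefinitely; areas that hit the cap should be subdivided instead.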

Phase 2: Property Detail Collection

For each discovered property, visit the detail page to extract full data:

  1. Load the property page using a headless browser with proxy
  2. Wait for dynamic content to render
  3. Intercept API responses for structured data (preferred) or parse rendered HTML
  4. Extract all target data points
  5. Store raw and parsed data
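Step 3 is worth a sketch. With Playwright the hook is `page.on("response", ...)`; the capture logic itself is shown browser-agnostically below so the filtering is easy to follow. The `/api/property/` path is a placeholder, not a real endpoint:

```python
import json

# Match the internal API calls the page makes while rendering.
# The URL fragment here is a hypothetical placeholder.
def is_property_api(url: str) -> bool:
    return "/api/property/" in url

def capture(responses):
    # responses: iterable of (url, body) pairs as seen by an interceptor.
    # Keeping the JSON bodies sidesteps fragile HTML parsing entirely.
    captured = []
    for url, body in responses:
        if is_property_api(url):
            captured.append(json.loads(body))
    return captured

# In real use (not run here), a Playwright page would feed this via:
#   page.on("response",
#           lambda r: bucket.append((r.url, r.text())) if is_property_api(r.url) else None)

data = capture([
    ("https://example.com/api/property/123", '{"price": 500000}'),
    ("https://example.com/static/app.js", "not json"),
])
```

Intercepted API responses are usually more stable than rendered markup, since frontend redesigns change the HTML far more often than the underlying JSON.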

Phase 3: Ongoing Monitoring

Real estate data changes constantly. Listings are added, prices change, properties sell:

  • New listing detection: Re-run market discovery searches daily to find new listings
  • Price change monitoring: Re-scrape active listings every 2-3 days to detect price changes
  • Status change tracking: Detect when listings move from active to pending, sold, or withdrawn
  • Historical data preservation: Maintain historical snapshots for trend analysis
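The monitoring bullets reduce to diffing a fresh scrape against the stored snapshot and emitting change events. Field names and event labels here are illustrative:

```python
# Sketch of Phase 3 change detection. previous/current map property ID
# to the last-stored and freshly scraped record; structure is illustrative.
def detect_changes(previous, current):
    events = []
    for pid, new in current.items():
        old = previous.get(pid)
        if old is None:
            events.append((pid, "new_listing", None, new["price"]))
        elif new["price"] != old["price"]:
            events.append((pid, "price_change", old["price"], new["price"]))
        elif new["status"] != old["status"]:
            events.append((pid, "status_change", old["status"], new["status"]))
    # IDs that vanished from search results: pending, sold, or withdrawn.
    for pid in previous.keys() - current.keys():
        events.append((pid, "delisted", previous[pid]["status"], None))
    return events

prev = {"p1": {"price": 500_000, "status": "active"},
        "p2": {"price": 700_000, "status": "active"}}
curr = {"p1": {"price": 480_000, "status": "active"},
        "p3": {"price": 650_000, "status": "active"}}
events = detect_changes(prev, curr)
```

Appending these events to a log, rather than overwriting records in place, is what preserves the historical snapshots needed for trend analysis.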

Technical Stack

A reliable real estate scraping stack includes:

  • Playwright with stealth plugins for browser automation
  • Mobile proxy pool with geographic targeting and sticky sessions
  • Task queue (e.g., Celery, Bull) for managing scraping jobs
  • PostgreSQL with PostGIS for storing property data with geographic queries
  • Monitoring dashboard tracking success rates, new listings found, and proxy health

For headless browser configuration details, see our guide on headless browser and proxy setup.

Geo-Targeted Scraping Strategies

Bounding Box Searches

Zillow and Redfin support map-based searches defined by geographic bounding boxes. This is more efficient than paginating through text-based search results:

  1. Define your target area as a bounding box (southwest corner lat/lng, northeast corner lat/lng)
  2. If the area returns too many results, subdivide into smaller bounding boxes
  3. Recursively split boxes until each returns a manageable number of results (under 500)
  4. Merge results and deduplicate by property ID
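The recursive subdivision is a quadtree split. `count_results` below stands in for a search call that returns the hit count for a box; its signature is hypothetical:

```python
# Split a bounding box ((south, west), (north, east)) into quadrants.
def split(box):
    (s, w), (n, e) = box
    mid_lat, mid_lng = (s + n) / 2, (w + e) / 2
    return [((s, w), (mid_lat, mid_lng)),
            ((s, mid_lng), (mid_lat, e)),
            ((mid_lat, w), (n, mid_lng)),
            ((mid_lat, mid_lng), (n, e))]

def scrapeable_boxes(box, count_results, limit=500):
    # Recurse until every box returns a manageable number of results.
    if count_results(box) <= limit:
        return [box]
    boxes = []
    for sub in split(box):
        boxes.extend(scrapeable_boxes(sub, count_results, limit))
    return boxes

# Toy density model: listing count proportional to box area, so only
# large boxes exceed the limit.
def fake_count(box):
    (s, w), (n, e) = box
    return int((n - s) * (e - w) * 1000)

boxes = scrapeable_boxes(((25.0, -81.0), (26.0, -80.0)), fake_count)
```

Each leaf box then becomes one search task, and deduplication by property ID handles listings that sit on box boundaries.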

ZIP Code-Based Scraping

For systematic coverage, scrape by ZIP code:

  1. Obtain a list of all ZIP codes in your target market
  2. Search each ZIP code as a separate task, which naturally distributes load and provides geographic organization
  3. Further subdivide ZIP codes with high listing counts as needed

MLS Area Mapping

In the US, Multiple Listing Service (MLS) areas provide another organizational scheme. Mapping MLS areas to your scraping boundaries can help you align with the industry-standard geographic divisions used by real estate professionals.

Building a Property Database

Schema Design

Design your database to handle the temporal nature of real estate data:

  • Properties table: Unique properties identified by address or parcel ID, with static attributes (bedrooms, square footage, year built)
  • Listings table: Each listing event for a property (a property can be listed, sold, and relisted multiple times)
  • Price events table: Every price change, with timestamp and source
  • Estimates table: Platform-generated value estimates, tracked over time
  • Market snapshots table: Aggregate market statistics by geography and date
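A minimal version of this temporal schema can be sketched in SQL. SQLite is used here only to keep the example self-contained; production would use PostgreSQL with PostGIS as described above, and all table and column names are illustrative:

```python
import sqlite3

# Temporal schema sketch: static attributes live on properties,
# while listings and price_events capture events over time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    property_id INTEGER PRIMARY KEY,
    address     TEXT UNIQUE,      -- or parcel ID
    bedrooms    INTEGER,
    sqft        INTEGER,
    year_built  INTEGER
);
CREATE TABLE listings (           -- a property can be relisted many times
    listing_id  INTEGER PRIMARY KEY,
    property_id INTEGER REFERENCES properties(property_id),
    listed_at   TEXT,
    status      TEXT              -- active / pending / sold / withdrawn
);
CREATE TABLE price_events (       -- every price change, timestamped
    listing_id  INTEGER REFERENCES listings(listing_id),
    price       INTEGER,
    observed_at TEXT,
    source      TEXT
);
""")
conn.execute("INSERT INTO properties (address, bedrooms, sqft, year_built) "
             "VALUES ('123 Example St', 3, 1500, 1998)")
rows = conn.execute("SELECT address, bedrooms FROM properties").fetchall()
```

Keeping price changes in their own event table, rather than overwriting a price column, is what makes the historical trend queries possible later.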

Data Freshness

Real estate data has varying shelf lives:

  • Active listing prices: Valid for days (price changes happen frequently)
  • Property physical attributes: Valid for years (unless renovated)
  • Value estimates: Valid for weeks to months
  • Market statistics: Valid for weeks
  • Sold transaction data: Permanent (historical fact)

Design your refresh schedule accordingly to optimize proxy usage.
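These shelf lives can drive a per-field refresh schedule directly. The intervals below are illustrative defaults, not recommendations tuned to any particular platform:

```python
from datetime import timedelta

# Illustrative refresh intervals mirroring the shelf lives above;
# tune them against observed change rates to conserve proxy usage.
REFRESH_INTERVALS = {
    "active_price": timedelta(days=2),    # price changes happen frequently
    "attributes":   timedelta(days=365),  # stable unless renovated
    "estimate":     timedelta(weeks=4),
    "market_stats": timedelta(weeks=2),
    "sold_record":  None,                 # historical fact; never re-scrape
}

def needs_refresh(field, age):
    interval = REFRESH_INTERVALS[field]
    return interval is not None and age >= interval

stale = needs_refresh("active_price", timedelta(days=3))
fresh = needs_refresh("sold_record", timedelta(days=1000))
```

A scheduler that only enqueues fields failing this check spends proxy bandwidth where the data actually goes stale.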

Legal Considerations

Real estate data scraping involves specific legal considerations beyond general web scraping law.

MLS Data and Copyright

MLS data is copyrighted by the originating MLS organization. Listings that appear on Zillow or Realtor.com are syndicated under license agreements. Scraping and redistributing this data may implicate MLS copyright claims.

Fair Housing Compliance

If you use scraped real estate data for any purpose that could affect housing decisions, you must comply with fair housing laws. This includes avoiding data collection or analysis that could enable housing discrimination.

Terms of Service

All major real estate platforms prohibit scraping in their terms of service. As discussed in our web scraping legal guide, ToS violations are generally civil contract matters rather than criminal ones, but they can still expose you to legal action.

Public Records Alternative

Many property data points (ownership, transaction prices, tax assessments) are public records available through county assessor and recorder offices. These are legally unambiguous to access, though the data is less structured and comprehensive than platform data.

Practical Risk Mitigation

  • Scrape only data that is publicly displayed without authentication
  • Do not redistribute raw scraped data as a competing product
  • Use scraped data for analysis and insight generation, not data resale
  • Consult legal counsel before launching a commercial real estate data product
  • Consider blending scraped data with public records for a legally stronger dataset

Getting Started

Real estate data scraping requires patience and investment in quality infrastructure. Start with a single market and a single platform. Validate your data quality against known property values before scaling. The combination of mobile proxies and a well-configured headless browser will give you the foundation for reliable property data collection.

For high-volume real estate scraping across multiple markets, explore our proxy solutions designed for sustained data collection operations that need to maintain consistent access over weeks and months.

