How to Scrape Booking.com Hotel Data with Proxies

Booking.com is the world’s largest online travel agency for accommodations, reporting more than 28 million listings across 220+ countries and territories. For travel businesses, hospitality analysts, and price comparison platforms, that data is invaluable, but extracting it reliably requires the right proxy infrastructure.

This guide walks through the technical approach to scraping Booking.com for hotel prices, availability, reviews, and property details using DataResearchTools mobile proxies.

What Data Can You Extract from Booking.com

Available Data Points

Booking.com’s property listings contain rich, structured data:

Property Information:

  • Hotel name and star rating
  • Address and coordinates
  • Property type (hotel, hostel, apartment, villa)
  • Amenities and facilities list
  • Photos (URLs)
  • Check-in/check-out policies

Pricing Data:

  • Room rates by room type
  • Pricing for different date ranges
  • Taxes and fees breakdown
  • Discounts (Genius member, mobile deals, early booker)
  • Cancellation policy details
  • Breakfast inclusion status

Review Data:

  • Overall score and category scores (cleanliness, location, staff, etc.)
  • Individual review text and ratings
  • Reviewer nationality and travel type
  • Review date

Availability Data:

  • Room availability by date
  • Last booking indicators (“Only 2 rooms left!”)
  • Sold-out status
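The data points above can be collected into a small record schema before any scraping code is written. The following is a minimal sketch, not a prescribed format: the class names `RoomOffer` and `PropertyRecord` and all field names are hypothetical, chosen only to mirror the lists above.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import List, Optional


@dataclass
class RoomOffer:
    """One room rate observed for a specific stay (hypothetical schema)."""
    room_type: str
    price: float
    currency: str
    checkin: date
    checkout: date
    breakfast_included: Optional[bool] = None  # None means "not shown on the page"
    free_cancellation: Optional[bool] = None
    rooms_left: Optional[int] = None           # from "Only X rooms left" style tags
    # Every observation carries a UTC timestamp so price changes can be tracked.
    scraped_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class PropertyRecord:
    """Static property details plus the offers seen for it (hypothetical schema)."""
    name: str
    property_type: str                         # hotel, hostel, apartment, villa
    star_rating: Optional[int] = None
    address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    review_score: Optional[float] = None
    amenities: List[str] = field(default_factory=list)
    offers: List[RoomOffer] = field(default_factory=list)
```

Optional fields default to `None` rather than `False` so that "not displayed" can be told apart from "displayed as not included".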

Why This Data Matters

Use Case | Key Data Needed | Business Value
Price comparison platform | Rates, availability, room types | Consumer transparency
Hotel revenue management | Competitor rates, occupancy signals | Pricing optimization
Market research | Pricing trends, review sentiment | Investment decisions
Travel agency | Rates across OTAs, direct rates | Best-price sourcing
Hospitality consulting | Benchmarking data, market positioning | Client advisory

Booking.com’s Anti-Scraping Measures

Booking.com employs one of the most sophisticated anti-scraping systems in the travel industry.

Detection Layers

Layer 1: IP Reputation

  • Datacenter IP ranges are blocked or heavily restricted.
  • IPs with high request volumes are flagged and throttled.
  • Known proxy/VPN IP ranges receive degraded service or blocks.

Layer 2: Browser Fingerprinting

  • JavaScript-based fingerprinting checks for headless browser signatures.
  • Canvas fingerprint, WebGL renderer, and font enumeration are analyzed.
  • Inconsistencies between reported user-agent and actual browser capabilities trigger flags.

Layer 3: Behavioral Analysis

  • Request timing patterns are analyzed for automation signatures.
  • Navigation flow is evaluated — bots that skip directly to search results are flagged.
  • Mouse movement and scroll behavior may be monitored on some pages.

Layer 4: CAPTCHA and Verification

  • Suspected bots receive CAPTCHA challenges.
  • Persistent flagged IPs may receive phone verification prompts.
  • Some blocked requests return HTTP 429 (Too Many Requests) with extended cooldown periods.

Why Mobile Proxies Bypass These Defenses

DataResearchTools mobile proxies are effective against Booking.com’s defenses because:

  1. IP reputation: Mobile carrier IPs generally carry higher trust scores than datacenter or known proxy ranges, because the same addresses are used by large numbers of real Booking.com users every day.
  2. IP classification: Anti-bot systems tend to classify mobile IPs as ordinary user traffic rather than proxy traffic.
  3. Shared IP pools: Mobile IPs are naturally shared among many users via carrier-grade NAT, so multiple requests from the same IP are expected behavior.
  4. Geo-accuracy: Mobile IPs from DataResearchTools resolve to the correct country, ensuring localized pricing without geo-detection flags.

Step-by-Step Scraping Setup

Prerequisites

  • Python 3.8+ with Playwright or Selenium
  • DataResearchTools mobile proxy credentials
  • A database for storing results (PostgreSQL recommended)

Step 1: Configure Proxy Connection

Set up your DataResearchTools mobile proxy for Booking.com scraping:

PROXY_CONFIG = {
    "server": "http://sg.dataresearchtools.com:10001",
    "username": "your_username",
    "password": "your_password"
}

For geo-targeted pricing, configure proxies for each target country:

Country | Use Case | Endpoint
Singapore | SGD pricing, SG promotions | sg.dataresearchtools.com
Thailand | THB pricing, TH promotions | th.dataresearchtools.com
Malaysia | MYR pricing, MY promotions | my.dataresearchtools.com
Indonesia | IDR pricing, ID promotions | id.dataresearchtools.com
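The endpoint table can be turned into a small helper that builds a proxy configuration per country. This is a sketch under one stated assumption: the port `10001` appears only in the Singapore example, and using the same port for the other endpoints is a guess that should be checked against your account details. The helper name `build_proxy_config` is hypothetical.

```python
COUNTRY_HOSTS = {
    "SG": "sg.dataresearchtools.com",
    "TH": "th.dataresearchtools.com",
    "MY": "my.dataresearchtools.com",
    "ID": "id.dataresearchtools.com",
}

# Port taken from the Singapore example; assumed (not confirmed) for other countries.
DEFAULT_PORT = 10001


def build_proxy_config(country, username, password, port=DEFAULT_PORT):
    """Return a Playwright-style proxy dict for a two-letter country code."""
    try:
        host = COUNTRY_HOSTS[country.upper()]
    except KeyError:
        raise ValueError(f"No proxy endpoint configured for {country!r}") from None
    return {
        "server": f"http://{host}:{port}",
        "username": username,
        "password": password,
    }
```

For example, `build_proxy_config("th", "your_username", "your_password")` yields a dict with the same shape as `PROXY_CONFIG`.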

Step 2: Set Up Browser Automation

Booking.com requires full JavaScript rendering. A simple HTTP request will not return complete data.

Playwright setup (recommended):

from playwright.async_api import async_playwright

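async def create_browser(proxy_config):
    """Launch headless Chromium through the proxy and return (browser, context).

    The caller is responsible for closing the browser with
    `await browser.close()` once the scrape is finished.
    """
    pw = await async_playwright().start()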
    browser = await pw.chromium.launch(
        headless=True,
        proxy={
            "server": proxy_config["server"],
            "username": proxy_config["username"],
            "password": proxy_config["password"]
        }
    )
    context = await browser.new_context(
        user_agent="Mozilla/5.0 (Linux; Android 14; SM-S928B) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/122.0.0.0 Mobile Safari/537.36",
        viewport={"width": 412, "height": 915},
        locale="en-SG"
    )
    return browser, context

Key configuration points:

  • Use a mobile user-agent string matching a popular device.
  • Set viewport to a mobile device resolution.
  • Set locale to match your proxy country.
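To keep these settings consistent with the proxy country, the context options can be generated from one lookup table. The sketch below reuses the user-agent and viewport from the example above. The non-Singapore locales, the `timezone_id` values, and the `is_mobile` / `has_touch` flags are assumptions added for illustration, not settings from the original setup. `context_options` is a hypothetical helper name.

```python
MOBILE_UA = (
    "Mozilla/5.0 (Linux; Android 14; SM-S928B) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/122.0.0.0 Mobile Safari/537.36"
)

# Only "en-SG" comes from the example above; the other values are assumptions.
COUNTRY_PROFILES = {
    "SG": {"locale": "en-SG", "timezone_id": "Asia/Singapore"},
    "TH": {"locale": "th-TH", "timezone_id": "Asia/Bangkok"},
    "MY": {"locale": "ms-MY", "timezone_id": "Asia/Kuala_Lumpur"},
    "ID": {"locale": "id-ID", "timezone_id": "Asia/Jakarta"},
}


def context_options(country):
    """Build keyword arguments for browser.new_context() for one proxy country."""
    profile = COUNTRY_PROFILES[country.upper()]
    return {
        "user_agent": MOBILE_UA,
        "viewport": {"width": 412, "height": 915},
        "is_mobile": True,   # assumption: emulate a mobile device to match the UA
        "has_touch": True,
        **profile,
    }
```

Usage would look like `context = await browser.new_context(**context_options("TH"))`, so the locale always follows the proxy country.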

Step 3: Navigate and Search

Mimic natural user behavior when navigating Booking.com:

  1. Visit the homepage first — do not jump directly to a search URL.
  2. Wait 2-3 seconds after page load before interacting.
  3. Enter search parameters through the search form (destination, dates, guests).
  4. Submit the search and wait for results to load.

from playwright.async_api import TimeoutError as PlaywrightTimeoutError

async def search_hotels(page, destination, checkin, checkout, guests=2):
    # Navigate to homepage
    await page.goto("https://www.booking.com")
    await page.wait_for_timeout(2000)

    # Close the sign-in overlay if it appears; a timeout just means it was not shown
    try:
        await page.click('[aria-label="Dismiss sign-in info."]', timeout=3000)
    except PlaywrightTimeoutError:
        pass

    # Enter destination
    await page.click('[data-testid="destination-container"]')
    await page.fill('input[name="ss"]', destination)
    await page.wait_for_timeout(1500)

    # Select from autocomplete
    await page.click('[data-testid="autocomplete-result"]')

    # Set dates and search
    # (Date picker interaction varies by Booking.com's current UI)
    # ...

    await page.click('[data-testid="submit-button"]')
    await page.wait_for_load_state("networkidle")
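The date-picker step is left elided above because it depends on Booking.com's current UI. However the dates end up being entered, it can help to validate `checkin` and `checkout` before opening a browser session at all. A minimal sketch, assuming both are ISO `YYYY-MM-DD` strings (the helper name `validate_stay` is hypothetical):

```python
from datetime import date


def validate_stay(checkin, checkout, today=None):
    """Validate ISO date strings and return the number of nights."""
    start = date.fromisoformat(checkin)
    end = date.fromisoformat(checkout)
    today = today or date.today()
    if start < today:
        raise ValueError("check-in date is in the past")
    if end <= start:
        raise ValueError("check-out must be after check-in")
    return (end - start).days
```

Failing fast here avoids spending a proxy session on a search that cannot return results.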

Step 4: Extract Listing Data

Once search results load, extract key data points:

From search results page:

  • Property name and link
  • Star rating
  • Review score and count
  • Price per night
  • Distance from center
  • Key amenities

From individual property pages (for detailed data):

  • Full room type list with prices
  • Detailed amenity breakdown
  • All reviews
  • Cancellation policies
  • Photos
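Text scraped from a result card usually needs normalizing before it is stored. The sketch below assumes the raw strings look like "8.7", "1,234 reviews", and "350 m from centre"; the actual wording on the page varies by locale and UI version, so the patterns are illustrative only. All function names and dict keys are hypothetical.

```python
import re


def parse_review_count(text):
    """'1,234 reviews' -> 1234; returns None when no digits are present."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def parse_score(text):
    """'8.7' or '8,7' -> 8.7; returns None when no number is found."""
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    return float(match.group().replace(",", ".")) if match else None


def parse_distance_km(text):
    """'1.2 km from centre' -> 1.2, '350 m from centre' -> 0.35."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(km|m)\b", text)
    if not match:
        return None
    value = float(match.group(1))
    return value if match.group(2) == "km" else value / 1000


def normalize_card(raw):
    """Convert a dict of raw card strings into typed fields."""
    return {
        "name": raw["name"].strip(),
        "url": raw.get("url"),
        "review_score": parse_score(raw.get("score", "")),
        "review_count": parse_review_count(raw.get("reviews", "")),
        "distance_km": parse_distance_km(raw.get("distance", "")),
    }
```

Missing fields come back as `None` rather than raising, since not every card shows every data point.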

Step 5: Handle Pagination

Booking.com search results are paginated, typically showing 25 properties per page.

  • Scroll to the bottom of each page to trigger lazy-loaded content.
  • Click the “Next page” button or load the next offset parameter.
  • Maintain the same proxy session (sticky IP) throughout pagination to avoid detection.
  • Add randomized 5-8 second delays between page loads, consistent with the pacing recommended under Request Pacing.
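For the offset-based approach, the page offsets and URLs can be computed up front. This sketch assumes the query parameter is literally named `offset` and that pages hold 25 results, as described above; both should be verified against the live site. The example URL and function names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_offsets(total_results, per_page=25):
    """Offsets for each results page: 0, 25, 50, ... below total_results."""
    return range(0, total_results, per_page)


def with_offset(url, offset):
    """Return url with its 'offset' query parameter set to the given value."""
    parts = urlsplit(url)
    # Drop any existing offset so the parameter is not duplicated.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "offset"
    ]
    query.append(("offset", str(offset)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Iterating `for offset in page_offsets(total)` and loading `with_offset(results_url, offset)` then walks the result set one page at a time.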

Optimizing Your Scraping Strategy

Session Management

Sticky sessions are essential for Booking.com scraping:

  • Maintain the same DataResearchTools mobile IP throughout a complete search session (search + pagination + property detail pages).
  • Session duration of 15-20 minutes works well.
  • Rotate to a new IP between different search sessions (different destinations or date ranges).
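One way to enforce these rules is a small object that tracks how long the current session has been in use. This is a sketch only: how a session identifier maps to a sticky IP depends on the proxy provider and is not covered here, and the class name `StickySession` is hypothetical.

```python
import random
import time


class StickySession:
    """Tracks the age of a proxy session and signals when to rotate."""

    def __init__(self, min_minutes=15, max_minutes=20, clock=time.monotonic):
        self._clock = clock
        self._min_seconds = min_minutes * 60
        self._max_seconds = max_minutes * 60
        self.session_id = 0
        self._start_new()

    def _start_new(self):
        self.session_id += 1
        self._started = self._clock()
        # Randomize the lifetime so sessions do not all end at the same age.
        self._lifetime = random.uniform(self._min_seconds, self._max_seconds)

    def expired(self):
        """True once the session has been in use for its full lifetime."""
        return self._clock() - self._started >= self._lifetime

    def rotate(self):
        """Start a fresh session and return its new identifier."""
        self._start_new()
        return self.session_id
```

A scraper would call `rotate()` when starting a new destination or date range, and also whenever `expired()` returns true between searches.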

Request Pacing

Booking.com is sensitive to high request rates. Recommended pacing:

Action | Delay
Between page loads | 5-8 seconds
Between form interactions | 1-3 seconds
Between search sessions | 15-30 seconds
Between property detail visits | 4-7 seconds
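The pacing table translates directly into a lookup of delay ranges with a random draw inside each range. A minimal sketch follows; the action keys and function names are hypothetical labels for the four rows above.

```python
import asyncio
import random

# (min_seconds, max_seconds) per action, taken from the pacing table.
DELAY_RANGES = {
    "page_load": (5, 8),
    "form_interaction": (1, 3),
    "search_session": (15, 30),
    "property_detail": (4, 7),
}


def pick_delay(action, rng=random):
    """Draw a random delay in seconds for the given action."""
    low, high = DELAY_RANGES[action]
    return rng.uniform(low, high)


async def pause(action):
    """Sleep for a randomized, action-appropriate interval."""
    await asyncio.sleep(pick_delay(action))
```

Calling `await pause("page_load")` between navigations avoids the fixed-interval timing that automation detection looks for.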

Handling Booking.com’s Currency and Language

Booking.com automatically sets currency and language based on your IP location. With DataResearchTools mobile proxies:

  • A Singapore proxy shows prices in SGD by default.
  • A Thai proxy shows prices in THB by default.
  • Language is set based on IP but can be overridden via URL parameters or cookie settings.

To compare prices across markets, run the same search from different country proxies and record the currency-specific prices.
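Results from different country proxies can be grouped per property so the market-specific prices sit side by side. The sketch below assumes each observation is a dict with `property_id`, `market`, `price`, and `currency` keys (hypothetical names) and deliberately performs no currency conversion.

```python
from collections import defaultdict


def group_by_property(observations):
    """Map property_id -> {market: (price, currency)} for side-by-side comparison."""
    grouped = defaultdict(dict)
    for obs in observations:
        grouped[obs["property_id"]][obs["market"]] = (obs["price"], obs["currency"])
    return dict(grouped)
```

Keeping the original currency next to each price leaves the choice of exchange rate to the analysis stage.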

Managing Genius and Member Pricing

Booking.com offers discounted “Genius” pricing to frequent users. To capture both public and Genius prices:

  • Public pricing: Scrape without logging in. This shows the base rate available to all users.
  • Genius pricing: Log into a Genius-qualified account through the proxy to see member discounts.

Note: Maintaining Booking.com accounts for scraping purposes requires careful session management. Each account should be associated with a consistent proxy IP from a single country.

Data Extraction Patterns

Extracting Price Data

Booking.com’s price display includes multiple components:

Component | Where to Find | Notes
Original price | Strikethrough text near the final price | Only present when discounted
Final price | Prominent price display | Per night or total, varies by UI
Taxes and fees | “Includes taxes and fees” or separate line | May be included or additional
Genius discount | Tagged with Genius icon | Only visible to logged-in Genius users
Mobile discount | “Mobile-only price” tag | Visible with mobile user-agent
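Once the original and final price strings are located, they need to be split into a currency label and a numeric amount. The sketch below assumes a comma thousands separator and a dot decimal separator (for example "S$ 1,234"); other locales format numbers differently and would need their own handling. Function names are hypothetical.

```python
import re
from decimal import Decimal


def parse_price(text):
    """'S$ 1,234' -> ('S$', Decimal('1234')); returns None if no number is found."""
    cleaned = text.replace("\xa0", " ").strip()  # pages often use non-breaking spaces
    match = re.search(r"\d[\d,]*(?:\.\d+)?", cleaned)
    if not match:
        return None
    amount = Decimal(match.group().replace(",", ""))
    # The currency label may appear before or after the amount.
    currency = cleaned[:match.start()].strip() or cleaned[match.end():].strip()
    return currency, amount


def discount_percent(original, final):
    """Percentage reduction from the strikethrough price to the final price."""
    return round((original - final) / original * 100, 1)
```

`Decimal` is used instead of `float` so stored prices keep exactly the digits shown on the page.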

Extracting Review Data

Reviews are loaded dynamically and may require scrolling or clicking “Show more” buttons:

  • Overall score is typically in the property header area.
  • Category scores (cleanliness, comfort, location, etc.) are in the review summary section.
  • Individual reviews are paginated, usually 10-25 per page.
  • Each review includes: score, text, reviewer country, travel type, room type, and date.

Extracting Availability Signals

Booking.com shows several availability indicators that provide valuable market intelligence:

  • “Only X rooms left at this price” — indicates high demand.
  • “Booked X times in the last 24 hours” — demand signal.
  • Sold-out dates in the calendar — occupancy indicator.
  • “Limited supply in your area” banner — area-wide demand signal.
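These indicators can be pulled out of scraped text with a few regular expressions. The patterns below match the English phrasings quoted above; wording on the live site may differ by language and over time, so treat them as a starting point. The function name and the keys of the returned dict are hypothetical.

```python
import re

ROOMS_LEFT = re.compile(r"only (\d+) rooms? left", re.IGNORECASE)
BOOKED_RECENTLY = re.compile(
    r"booked (\d+) times? in the last (\d+) hours", re.IGNORECASE
)


def parse_signals(text):
    """Extract demand indicators from a block of listing text."""
    signals = {}
    match = ROOMS_LEFT.search(text)
    if match:
        signals["rooms_left"] = int(match.group(1))
    match = BOOKED_RECENTLY.search(text)
    if match:
        signals["bookings"] = int(match.group(1))
        signals["window_hours"] = int(match.group(2))
    if "limited supply" in text.lower():
        signals["limited_supply"] = True
    return signals
```

An empty dict simply means none of the known indicators appeared in the text.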

Scaling Booking.com Scraping

Parallel Scraping by Country

Run separate scraping processes for each target country simultaneously:

Process 1: SG proxy → Booking.com search → Singapore hotels
Process 2: TH proxy → Booking.com search → Bangkok hotels
Process 3: MY proxy → Booking.com search → KL hotels
Process 4: ID proxy → Booking.com search → Bali hotels

Each process uses its own DataResearchTools proxy endpoint and operates independently.
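The text describes separate processes. As a lighter-weight illustration of the same fan-out, the sketch below runs one asyncio task per country inside a single process. `scrape_market` is a placeholder that returns an empty result; a real worker would open a browser through that country's proxy and run the search flow.

```python
import asyncio

TARGETS = {
    "SG": "Singapore",
    "TH": "Bangkok",
    "MY": "Kuala Lumpur",
    "ID": "Bali",
}


async def scrape_market(country, destination):
    """Placeholder worker: stands in for a full per-country scrape."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return {"country": country, "destination": destination, "properties": []}


async def run_all(targets=TARGETS, worker=scrape_market):
    """Run one worker per country concurrently and collect results by country."""
    tasks = [worker(country, destination) for country, destination in targets.items()]
    # return_exceptions=True keeps one failing market from cancelling the others.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(zip(targets, results))
```

`asyncio.run(run_all())` returns a dict keyed by country code, where a failed market holds its exception instead of a result.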

Incremental vs. Full Crawls

  • Full crawl: Scrape all properties in a destination, including all room types and details. Run weekly or monthly.
  • Price check: Quick scrape of pricing for known properties. Run daily or multiple times per day.
  • Availability check: Check date-specific availability for monitored properties. Run daily for upcoming dates.

Data Freshness Strategy

Data Type | Refresh Frequency | Rationale
Prices | Every 6-12 hours | Prices change frequently
Availability | Daily | Rooms sell out daily
Reviews | Weekly | New reviews trickle in
Property details | Monthly | Amenities change rarely
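The refresh schedule can be expressed as a table of intervals plus a check for whether a given record is due. In this sketch the price interval uses the lower bound of the 6-12 hour range and "monthly" is approximated as 30 days; both are simplifying assumptions, and the names are hypothetical.

```python
from datetime import datetime, timedelta

REFRESH_INTERVALS = {
    "prices": timedelta(hours=6),            # lower bound of the 6-12 hour range
    "availability": timedelta(days=1),
    "reviews": timedelta(weeks=1),
    "property_details": timedelta(days=30),  # "monthly", approximated
}


def is_due(data_type, last_scraped, now):
    """True if the data type has never been scraped or its interval has elapsed."""
    if last_scraped is None:
        return True
    return now - last_scraped >= REFRESH_INTERVALS[data_type]
```

A scheduler loop can call `is_due` for each monitored property and data type to decide what to queue next.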

Common Issues and Solutions

Issue: Empty or Partial Results

Cause: Page did not fully render before data extraction attempted.

Solution: Increase wait times. Use wait_for_selector to confirm key elements are present before extracting.
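A minimal sketch of that pattern with Playwright's async API is shown below. The helper name `extract_when_ready` is hypothetical, and the selector must be whatever element reliably marks a loaded page in Booking.com's current markup; none is assumed here.

```python
async def extract_when_ready(page, selector, timeout_ms=15000):
    """Wait until the selector is present, then return the text of every match."""
    # Raises Playwright's TimeoutError if the element never appears.
    await page.wait_for_selector(selector, timeout=timeout_ms)
    return await page.locator(selector).all_inner_texts()
```

Letting the timeout propagate makes an unrendered page an explicit failure rather than a silently empty result.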

Issue: Different Prices on Repeat Checks

Cause: Booking.com’s dynamic pricing is working as intended — prices genuinely change.

Solution: Record timestamps with every price data point. Multiple checks per day establish the price range, not a single “correct” price.
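Timestamped observations can be kept in a simple per-property history from which the price range is derived. This is a sketch with an in-memory dict standing in for the database; the function names and record keys are hypothetical.

```python
from datetime import datetime, timezone


def record_price(history, property_id, price, currency, observed_at=None):
    """Append one timestamped price observation to the property's history."""
    history.setdefault(property_id, []).append({
        "price": price,
        "currency": currency,
        "observed_at": observed_at or datetime.now(timezone.utc),
    })


def price_range(history, property_id):
    """Return (lowest, highest) observed price, or None with no observations."""
    prices = [obs["price"] for obs in history.get(property_id, [])]
    return (min(prices), max(prices)) if prices else None
```

Reporting a range instead of a single figure reflects that each check is only a snapshot of a moving price.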

Issue: CAPTCHA After Many Requests

Cause: Request volume exceeded Booking.com’s per-IP threshold.

Solution: Reduce request rate. Rotate to a new DataResearchTools mobile IP. Ensure delays are randomized, not fixed intervals.
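Block detection and a randomized cooldown can be combined in two small helpers. The check below is a heuristic assumption: it treats HTTP 429 or the word "captcha" in the response body as a block, which may miss other challenge pages. The 30-second base and 10-minute cap are illustrative values, not figures from Booking.com.

```python
import random


def looks_blocked(status, body):
    """Heuristic: HTTP 429 or a CAPTCHA marker in the page suggests a block."""
    return status == 429 or "captcha" in body.lower()


def backoff_delay(attempt, base=30.0, cap=600.0, rng=random):
    """Exponential backoff with full jitter: a random wait up to base * 2**attempt."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))
```

After a block, wait `backoff_delay(attempt)` seconds, rotate to a new mobile IP, and increment `attempt` if the next request is blocked as well.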

Issue: Redirect to Country-Specific Domain

Cause: Booking.com may redirect based on IP to a country-specific version (e.g., booking.com/sg).

Solution: This is expected behavior with country-specific proxies. It confirms your geo-targeting is working correctly.

Legal and Ethical Considerations

Booking.com’s terms of service prohibit automated data collection. Users should:

  • Consult with legal counsel about applicable laws in their jurisdiction.
  • Implement respectful scraping rates that do not impact platform performance.
  • Avoid collecting personal data from reviews beyond what is publicly displayed.
  • Consider Booking.com’s affiliate API as a complementary data source for permitted use cases.

Conclusion

Scraping Booking.com for hotel data is technically demanding due to the platform’s sophisticated anti-bot systems, dynamic JavaScript rendering, and geo-targeted pricing. DataResearchTools mobile proxies address the core challenge by providing trusted mobile carrier IPs that Booking.com’s systems treat as legitimate user traffic.

The combination of mobile proxies with proper browser automation, realistic request pacing, and country-specific geo-targeting enables reliable data collection from the world’s largest accommodation platform. Whether you need price intelligence for a handful of competitor hotels or market-wide data across multiple SEA destinations, the approach outlined in this guide provides a solid foundation.

Start with a small set of properties in a single destination, validate your data extraction against manual checks, and scale your operation as you confirm reliability.

