How to Scrape Booking.com Hotel Prices with Proxy Rotation

Booking.com hosts over 28 million accommodation listings across 228 countries. For travel aggregators, market researchers, and hospitality analysts, access to this pricing data represents a significant competitive advantage. However, Booking.com employs some of the most aggressive anti-bot defenses in the travel industry, making scraping a non-trivial challenge.

In this guide, you will learn how to build a reliable Booking.com scraper using Python, Selenium, and mobile proxy rotation to extract hotel names, prices, ratings, and availability data without getting blocked.

Why Booking.com Is Difficult to Scrape

Booking.com presents several technical challenges that make naive scraping approaches fail quickly:

Dynamic pricing engine. Prices on Booking.com change based on session behavior, search history, device type, and geographic location. The same hotel room can display different prices to different users simultaneously. This means static HTTP requests often return incomplete or inaccurate data.

JavaScript-heavy rendering. Most pricing and availability data loads asynchronously through JavaScript. A simple requests.get() call returns a shell page with no useful hotel data. You need a browser automation tool to render the full page.

Sophisticated bot detection. Booking.com uses fingerprinting, behavioral analysis, and rate limiting to identify automated access. Repeated requests from the same IP address trigger CAPTCHAs, temporary blocks, or permanently altered search results.

Session-based personalization. The platform tracks user sessions extensively, adjusting displayed content based on browsing patterns. Without proper session management, scraped data may be inconsistent.

These challenges make web scraping proxies essential for any reliable Booking.com data collection project.

Setting Up Your Environment

Before writing any scraping code, install the required dependencies:

pip install selenium webdriver-manager beautifulsoup4 pandas

You will also need Chrome installed on your system. The webdriver-manager package handles ChromeDriver version management automatically.

Configuring Proxy Rotation for Booking.com

Booking.com is particularly sensitive to datacenter IP addresses. Mobile proxies provide the highest success rates because they use IP addresses assigned to real mobile carriers, making requests appear legitimate to Booking.com’s detection systems.

import random
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd

class BookingProxyManager:
    """Manages proxy rotation for Booking.com scraping sessions."""

    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.current_index = 0
        self.failed_proxies = set()

    def get_next_proxy(self):
        """Return the next working proxy in rotation."""
        available = [p for p in self.proxy_list if p not in self.failed_proxies]
        if not available:
            self.failed_proxies.clear()
            available = self.proxy_list

        proxy = available[self.current_index % len(available)]
        self.current_index += 1
        return proxy

    def mark_failed(self, proxy):
        """Mark a proxy as temporarily failed."""
        self.failed_proxies.add(proxy)

    def create_driver(self, proxy):
        """Create a Selenium WebDriver with the given proxy."""
        chrome_options = Options()
        chrome_options.add_argument(f"--proxy-server={proxy}")
        chrome_options.add_argument("--headless=new")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument(
            "--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
            "Mobile/15E148 Safari/604.1"
        )
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option("useAutomationExtension", False)

        service = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=service, options=chrome_options)

        driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
        )

        return driver

This configuration uses a mobile user agent string and disables common automation indicators. The proxy manager rotates through your available mobile proxies and automatically skips proxies that have been temporarily blocked.
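
The rotation behavior is easy to verify in isolation. The sketch below is a standalone reimplementation of the same round-robin-with-failures idea, independent of Selenium, showing how failed proxies are skipped and how the failure set resets once every proxy in the pool has been marked bad:

```python
class SimpleRotator:
    """Minimal standalone sketch of the rotation logic in BookingProxyManager."""

    def __init__(self, proxies):
        self.proxies = proxies
        self.index = 0
        self.failed = set()

    def next_proxy(self):
        available = [p for p in self.proxies if p not in self.failed]
        if not available:          # every proxy failed: reset and retry the full pool
            self.failed.clear()
            available = self.proxies
        proxy = available[self.index % len(available)]
        self.index += 1
        return proxy

rotator = SimpleRotator(["p1", "p2", "p3"])
print(rotator.next_proxy())  # p1
rotator.failed.add("p2")     # simulate a blocked proxy
print(rotator.next_proxy())  # p3 (p2 is skipped)
```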

Building the Booking.com Scraper

The core scraper navigates to Booking.com search results and extracts structured hotel data from the rendered page:

class BookingScraper:
    """Scrapes hotel data from Booking.com search results."""

    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.results = []

    def build_search_url(self, destination, checkin, checkout, adults=2, rooms=1):
        """Construct a Booking.com search URL with parameters."""
        base = "https://www.booking.com/searchresults.html"
        params = (
            f"?ss={destination.replace(' ', '+')}"
            f"&checkin={checkin}"
            f"&checkout={checkout}"
            f"&group_adults={adults}"
            f"&no_rooms={rooms}"
            f"&selected_currency=USD"
        )
        return base + params

    def scrape_search_results(self, destination, checkin, checkout, max_pages=3):
        """Scrape hotel listings from search results across multiple pages."""
        url = self.build_search_url(destination, checkin, checkout)

        for page in range(max_pages):
            offset = page * 25
            page_url = f"{url}&offset={offset}" if page > 0 else url

            proxy = self.proxy_manager.get_next_proxy()
            driver = None

            try:
                driver = self.proxy_manager.create_driver(proxy)
                driver.set_page_load_timeout(30)
                driver.get(page_url)

                # Wait for hotel cards to load
                WebDriverWait(driver, 15).until(
                    EC.presence_of_element_located(
                        (By.CSS_SELECTOR, "[data-testid='property-card']")
                    )
                )

                # Scroll to trigger lazy-loaded content
                self._scroll_page(driver)

                soup = BeautifulSoup(driver.page_source, "html.parser")
                hotels = self._parse_hotel_cards(soup)
                self.results.extend(hotels)

                print(f"Page {page + 1}: extracted {len(hotels)} hotels via {proxy}")

                # Randomized delay between pages
                time.sleep(random.uniform(3, 7))

            except Exception as e:
                print(f"Error on page {page + 1} with proxy {proxy}: {e}")
                self.proxy_manager.mark_failed(proxy)

            finally:
                if driver:
                    driver.quit()

        return self.results

    def _scroll_page(self, driver):
        """Scroll the page gradually to trigger lazy loading."""
        total_height = driver.execute_script("return document.body.scrollHeight")
        scroll_step = total_height // 5

        for i in range(1, 6):
            driver.execute_script(f"window.scrollTo(0, {scroll_step * i});")
            time.sleep(random.uniform(0.5, 1.5))

    def _parse_hotel_cards(self, soup):
        """Extract structured data from hotel property cards."""
        hotels = []
        cards = soup.select("[data-testid='property-card']")

        for card in cards:
            hotel = {}

            # Hotel name
            title_el = card.select_one("[data-testid='title']")
            hotel["name"] = title_el.get_text(strip=True) if title_el else None

            # Price
            price_el = card.select_one("[data-testid='price-and-discounted-price']")
            if price_el:
                price_text = price_el.get_text(strip=True)
                hotel["price"] = self._clean_price(price_text)
            else:
                hotel["price"] = None

            # Rating score
            score_el = card.select_one("[data-testid='review-score']")
            if score_el:
                score_text = score_el.get_text(strip=True)
                hotel["rating"] = self._extract_rating(score_text)
            else:
                hotel["rating"] = None

            # Location / distance
            distance_el = card.select_one("[data-testid='distance']")
            hotel["distance"] = distance_el.get_text(strip=True) if distance_el else None

            # Review count (hashed class names like this change frequently;
            # update this selector if review counts start coming back empty)
            review_el = card.select_one("[data-testid='review-score'] .a3332d346a")
            hotel["review_count"] = review_el.get_text(strip=True) if review_el else None

            if hotel["name"]:
                hotels.append(hotel)

        return hotels

    @staticmethod
    def _clean_price(price_text):
        """Extract numeric price from text like 'US$142' or 'US$1,142'."""
        import re
        match = re.search(r"\d+\.?\d*", price_text.replace(",", ""))
        return float(match.group()) if match else None

    @staticmethod
    def _extract_rating(score_text):
        """Extract numeric rating from review score text."""
        import re
        match = re.search(r"(\d+\.?\d*)", score_text)
        return float(match.group(1)) if match else None
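
The two regex helpers can be sanity-checked without launching a browser. This standalone sketch mirrors their logic so you can confirm it handles the price and score formats Booking.com typically renders (the sample strings here are illustrative, not captured output):

```python
import re

def clean_price(price_text):
    """Mirror of BookingScraper._clean_price: strip currency symbols and separators."""
    match = re.search(r"\d+\.?\d*", price_text.replace(",", ""))
    return float(match.group()) if match else None

def extract_rating(score_text):
    """Mirror of BookingScraper._extract_rating: pull the first decimal number."""
    match = re.search(r"(\d+\.?\d*)", score_text)
    return float(match.group(1)) if match else None

print(clean_price("US$1,142"))          # 1142.0
print(extract_rating("Scored 8.6"))     # 8.6
print(clean_price("See availability"))  # None
```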

Running the Scraper

Put everything together to scrape hotel prices for a specific destination and date range:

def main():
    # Configure your mobile proxy endpoints
    proxies = [
        "http://user:pass@proxy1.example.com:8080",
        "http://user:pass@proxy2.example.com:8080",
        "http://user:pass@proxy3.example.com:8080",
    ]

    proxy_manager = BookingProxyManager(proxies)
    scraper = BookingScraper(proxy_manager)

    # Scrape hotels in Tokyo for specific dates
    results = scraper.scrape_search_results(
        destination="Tokyo, Japan",
        checkin="2026-05-01",
        checkout="2026-05-05",
        max_pages=5,
    )

    # Export to CSV
    df = pd.DataFrame(results)
    df.to_csv("booking_tokyo_hotels.csv", index=False)
    print(f"\nTotal hotels scraped: {len(results)}")
    print(df.describe())


if __name__ == "__main__":
    main()

Handling Dynamic Pricing Variations

Booking.com changes prices based on the user’s perceived location. To capture accurate regional pricing, rotate proxies from different geographic regions and record the proxy location alongside each price point:

def scrape_regional_prices(scraper, proxy_manager, destination, checkin, checkout):
    """Compare hotel prices as seen from different proxy locations."""
    regional_data = []

    for proxy in proxy_manager.proxy_list:
        driver = None
        try:
            driver = proxy_manager.create_driver(proxy)
            url = scraper.build_search_url(destination, checkin, checkout)
            driver.get(url)

            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, "[data-testid='property-card']")
                )
            )

            soup = BeautifulSoup(driver.page_source, "html.parser")
            hotels = scraper._parse_hotel_cards(soup)

            for hotel in hotels:
                hotel["proxy_used"] = proxy
                regional_data.append(hotel)

            time.sleep(random.uniform(4, 8))

        except Exception as e:
            print(f"Regional scrape failed for {proxy}: {e}")
        finally:
            if driver:
                driver.quit()

    return regional_data

This approach reveals how Booking.com adjusts pricing by geography, which is valuable input for travel market research.
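
Once regional_data is collected, a quick aggregation makes the spread visible. A minimal pure-Python sketch, assuming each record carries the proxy_used and price keys set above (records without a visible price are skipped):

```python
from collections import defaultdict

def average_price_by_proxy(regional_data):
    """Group scraped records by proxy and compute the mean listed price."""
    totals = defaultdict(lambda: [0.0, 0])  # proxy -> [running sum, count]
    for hotel in regional_data:
        price = hotel.get("price")
        if price is None:
            continue  # cards without a parsed price are excluded
        totals[hotel["proxy_used"]][0] += price
        totals[hotel["proxy_used"]][1] += 1
    return {proxy: s / n for proxy, (s, n) in totals.items()}

sample = [
    {"proxy_used": "us-proxy", "price": 120.0},
    {"proxy_used": "us-proxy", "price": 140.0},
    {"proxy_used": "jp-proxy", "price": 110.0},
    {"proxy_used": "jp-proxy", "price": None},
]
print(average_price_by_proxy(sample))  # {'us-proxy': 130.0, 'jp-proxy': 110.0}
```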

Best Practices for Booking.com Scraping

Rotate proxies per session, not per page. Booking.com tracks session continuity, and switching proxies mid-session can trigger detection. Use one proxy per complete search session, then rotate for the next search.

Respect rate limits. Space requests at least 3-5 seconds apart. Booking.com’s rate limiter triggers quickly on rapid sequential requests, even from different IPs.

Use mobile user agents. Mobile proxies paired with mobile user agent strings achieve the highest success rates on Booking.com. The mobile version of the site also has simpler HTML structure, making parsing easier.

Handle currency normalization. Always set selected_currency in the URL to ensure consistent price comparison across different proxy locations.
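
Rather than string concatenation, the currency parameter can be enforced with urllib.parse so it survives any other query changes. A small sketch (the selected_currency parameter name matches the one used in the URL builder above):

```python
from urllib.parse import urlencode, urlparse, parse_qs, urlunparse

def force_currency(url, currency="USD"):
    """Rewrite a search URL so selected_currency is always set to one value."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["selected_currency"] = [currency]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

url = "https://www.booking.com/searchresults.html?ss=Tokyo&selected_currency=EUR"
print(force_currency(url))
# https://www.booking.com/searchresults.html?ss=Tokyo&selected_currency=USD
```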

Monitor for soft blocks. Booking.com sometimes returns inflated prices or limited results instead of outright blocking. Compare result counts across proxies to detect this behavior.
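
One way to detect this is to compare each proxy's result count against the median across the pool. A hedged sketch (the 0.5 threshold is an arbitrary starting point to tune, not a Booking.com constant):

```python
from statistics import median

def suspect_soft_blocks(counts_by_proxy, threshold=0.5):
    """Flag proxies whose result count falls well below the pool median."""
    med = median(counts_by_proxy.values())
    return [
        proxy for proxy, count in counts_by_proxy.items()
        if med > 0 and count < med * threshold
    ]

counts = {"proxy1": 25, "proxy2": 24, "proxy3": 6}
print(suspect_soft_blocks(counts))  # ['proxy3']
```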

Cache and deduplicate. Hotel listings repeat across pages. Use the hotel name and date range as a composite key to deduplicate your dataset.
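
Deduplication by composite key is a one-pass operation. A minimal sketch using the hotel name plus the check-in and check-out dates as the key (assuming those date fields are stored on each record alongside the scraped values):

```python
def deduplicate(hotels):
    """Keep the first occurrence of each (name, checkin, checkout) combination."""
    seen = set()
    unique = []
    for hotel in hotels:
        key = (hotel.get("name"), hotel.get("checkin"), hotel.get("checkout"))
        if key not in seen:
            seen.add(key)
            unique.append(hotel)
    return unique

rows = [
    {"name": "Hotel A", "checkin": "2026-05-01", "checkout": "2026-05-05", "price": 120.0},
    {"name": "Hotel A", "checkin": "2026-05-01", "checkout": "2026-05-05", "price": 120.0},
    {"name": "Hotel B", "checkin": "2026-05-01", "checkout": "2026-05-05", "price": 95.0},
]
print(len(deduplicate(rows)))  # 2
```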

Storing and Analyzing the Data

For ongoing price monitoring, store results in a structured format that tracks changes over time:

import json
from datetime import datetime

def save_with_metadata(results, destination, checkin, checkout):
    """Save scraping results with full metadata for historical tracking."""
    output = {
        "scrape_timestamp": datetime.now().isoformat(),
        "destination": destination,
        "checkin": checkin,
        "checkout": checkout,
        "total_hotels": len(results),
        "hotels": results,
    }

    filename = f"booking_{destination.replace(' ', '_')}_{checkin}.json"
    with open(filename, "w") as f:
        json.dump(output, f, indent=2)

    print(f"Saved {len(results)} hotels to {filename}")
    return filename
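
With timestamped snapshots on disk, tracking price movement becomes a matter of diffing two files. A sketch, assuming both snapshots follow the structure written by save_with_metadata and that hotels can be matched by name:

```python
def price_changes(old_snapshot, new_snapshot):
    """Compare two snapshot dicts and return per-hotel price deltas."""
    old_prices = {
        h["name"]: h["price"]
        for h in old_snapshot["hotels"]
        if h.get("price") is not None
    }
    changes = {}
    for hotel in new_snapshot["hotels"]:
        name, price = hotel.get("name"), hotel.get("price")
        if name in old_prices and price is not None and price != old_prices[name]:
            changes[name] = {"old": old_prices[name], "new": price}
    return changes

old = {"hotels": [{"name": "Hotel A", "price": 120.0}, {"name": "Hotel B", "price": 95.0}]}
new = {"hotels": [{"name": "Hotel A", "price": 135.0}, {"name": "Hotel B", "price": 95.0}]}
print(price_changes(old, new))  # {'Hotel A': {'old': 120.0, 'new': 135.0}}
```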

Common Pitfalls and Solutions

CAPTCHA challenges. If you encounter CAPTCHAs frequently, your proxy pool may be too small or your request patterns too uniform. Expand your pool of mobile proxies and add more randomization to request timing and scroll behavior.

Missing price data. Some hotel cards show “See prices” instead of actual prices. These require clicking through to the detail page. Consider a two-phase scraper: one pass for listings, a second pass for detailed pricing.

Stale selectors. Booking.com updates its frontend regularly. Build your selectors defensively using data-testid attributes, which tend to be more stable than CSS classes.

Conclusion

Scraping Booking.com hotel prices at scale requires a combination of browser automation, intelligent proxy rotation, and careful session management. Mobile proxies are particularly effective for this use case because they replicate the behavior of genuine mobile users browsing for hotels.

The scraper architecture presented here handles the core challenges of dynamic pricing, JavaScript rendering, and bot detection. For production deployments, consider adding database storage, automated scheduling, and alerting for price changes.

For more scraping guides covering other platforms, explore our web scraping proxy tutorials and proxy glossary for technical reference.

