Scraping Google Flights and Skyscanner for Fare Intelligence (2026)

Google Flights and Skyscanner are the two most powerful flight search engines available to consumers — and they are also the most valuable data sources for anyone building fare intelligence systems. Between them, they aggregate fares from hundreds of airlines and OTAs, providing a comprehensive view of the market that no single airline website can match. But these platforms were built for human users, not automated scrapers, and both deploy sophisticated anti-bot defenses that make data extraction a technical challenge. This guide covers the specific obstacles you will face when scraping these platforms in 2026, the proxy configurations that work, and the techniques for extracting clean, structured fare data at scale.

Why Google Flights and Skyscanner Are the Best Fare Data Sources

Before diving into the technical details, it is worth understanding why these two platforms are preferred over scraping individual airline sites.

Google Flights Advantages

  • Comprehensive coverage: Aggregates fares from virtually every major airline and many OTAs worldwide
  • Structured data: Fare data is embedded in the page source in a relatively parseable format
  • Price history: Provides built-in price trend data that can supplement your own monitoring
  • Calendar view: Shows price variation across dates in a single request, reducing scraping volume
  • No redirect model: Displays actual prices inline rather than redirecting to external sites for quotes

Skyscanner Advantages

  • Deeper OTA coverage: Includes smaller and regional OTAs that Google Flights may miss
  • Price alerts API: Underlying API endpoints can sometimes be accessed for cleaner data extraction
  • Multi-city search: Better support for complex itineraries
  • Explore feature: “Everywhere” search reveals cheapest destinations from a given origin
  • Historical pricing: Longer price trend data for many routes

| Feature | Google Flights | Skyscanner |
| --- | --- | --- |
| Airline coverage | Excellent — nearly all major carriers | Excellent — plus budget and regional carriers |
| OTA aggregation | Limited — focuses on airline direct | Extensive — dozens of OTAs per route |
| Anti-bot difficulty | High (Google infrastructure) | Moderate-high (Cloudflare + custom) |
| Data structure | Semi-structured in page source | JSON via API endpoints |
| Rate limiting | Aggressive | Moderate |
| Geographic pricing | Yes — varies by search origin | Yes — varies by market |

Anti-Bot Measures You Will Face

Both platforms invest heavily in preventing automated access. Understanding their specific defenses is the first step to working around them. For a comprehensive overview of bot detection systems used across the web, see our detailed breakdown of anti-bot systems that target price scrapers and how proxies help bypass them.

Google Flights Detection Stack

Google’s anti-bot infrastructure is among the most sophisticated on the web. When scraping Google Flights, you will encounter:

  • reCAPTCHA v3 (invisible scoring): Runs in the background and assigns a bot probability score to every session. There is no visible challenge — sessions that score poorly are silently served degraded results or blocked entirely.
  • Browser fingerprinting: Google analyzes dozens of browser attributes including WebGL renderer, canvas fingerprint, installed fonts, plugin enumeration, and screen resolution consistency.
  • Behavioral analysis: Mouse movement patterns, scroll behavior, click timing, and navigation patterns are evaluated against human baselines.
  • IP reputation database: Google maintains arguably the world’s most comprehensive IP reputation database, flagging datacenter ranges, known proxy IPs, and IPs with suspicious activity histories.
  • Request pattern analysis: Searches that follow predictable patterns (same routes, regular intervals, no variation) are flagged as automated.

Skyscanner Detection Stack

Skyscanner uses a layered approach combining third-party and custom solutions:

  • Cloudflare Bot Management: Front-line defense that challenges suspicious traffic with JavaScript challenges and Cloudflare Turnstile CAPTCHAs.
  • Custom rate limiting: Endpoint-specific rate limits that throttle search requests per IP and per session.
  • Session validation: Requires valid session cookies and proper header sequences — direct API calls without proper session initialization are rejected.
  • Device fingerprinting: Client-side JavaScript collects device characteristics and compares them against known bot signatures.

Proxy Setup for Flight Aggregator Scraping

Recommended Proxy Configuration

| Target | Proxy Type | Rotation | Concurrency | Notes |
| --- | --- | --- | --- | --- |
| Google Flights | Residential (rotating) | Per request | 3-5 sessions | Mobile proxies for persistent blocks |
| Google Flights (heavy volume) | Mobile (4G/5G) | Sticky 5-min | 2-3 sessions | Highest success rate but most expensive |
| Skyscanner Web | Residential (rotating) | Per request | 5-8 sessions | Standard residential usually sufficient |
| Skyscanner API endpoints | ISP/Static Residential | Sticky 10-min | 3-5 sessions | Session persistence needed for API flow |

Essential Proxy Configuration Rules

  1. Match proxy location to search context: If searching for flights departing from London, use a UK proxy. Mismatched locations trigger additional scrutiny.
  2. Consistent session fingerprinting: Your browser timezone, language, and locale settings must match your proxy’s geographic location. A browser claiming to be in Tokyo with a German IP is an obvious red flag.
  3. Avoid proxy chaining: Do not route traffic through multiple proxies. The added latency creates detectable timing anomalies.
  4. Monitor proxy health: Track success rates per proxy IP. Remove IPs that consistently trigger CAPTCHAs — they are likely flagged in the platform’s reputation database.
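
Rules 1 and 2 can be enforced mechanically. Below is a minimal Python sketch that builds Playwright browser-context options whose timezone and locale match the proxy's country. The `PROXY_LOCALES` map and `context_options` helper are illustrative names, and the map would need an entry for every market you scrape from.

```python
# Illustrative country -> browser-settings map; extend for your target markets.
PROXY_LOCALES = {
    "US": {"timezone_id": "America/New_York", "locale": "en-US"},
    "GB": {"timezone_id": "Europe/London", "locale": "en-GB"},
    "DE": {"timezone_id": "Europe/Berlin", "locale": "de-DE"},
    "JP": {"timezone_id": "Asia/Tokyo", "locale": "ja-JP"},
}

def context_options(proxy_country: str, proxy_url: str) -> dict:
    """Return Playwright new_context() options consistent with the proxy's location."""
    settings = PROXY_LOCALES.get(proxy_country.upper())
    if settings is None:
        raise ValueError(f"no locale profile for {proxy_country}")
    return {
        "proxy": {"server": proxy_url},
        "timezone_id": settings["timezone_id"],
        "locale": settings["locale"],
    }
```

Passing the returned dict to `browser.new_context(**opts)` keeps the browser's reported timezone and language consistent with the exit IP, avoiding the Tokyo-browser-on-German-IP mismatch described above.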

Handling JavaScript-Heavy Flight Search Pages

Both Google Flights and Skyscanner render search results entirely through client-side JavaScript. Simple HTTP request libraries (like requests in Python) cannot access fare data. You need a full browser rendering engine.

Headless Browser Setup

Use Playwright or Puppeteer with the following critical configurations:

  • Stealth plugins: Apply playwright-extra stealth or puppeteer-extra-plugin-stealth to mask automation signals (navigator.webdriver property, Chrome DevTools protocol detection, etc.)
  • Realistic viewport: Use common screen resolutions (1920×1080, 1366×768, 1440×900) — avoid unusual sizes
  • WebGL and Canvas: Ensure consistent GPU rendering fingerprints across sessions by using hardware-acceleration-compatible configurations
  • Font rendering: Match installed fonts to the operating system your user agent claims to be running
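
As a small illustration of the viewport rule, the sketch below picks a per-session viewport from the common resolutions listed above and adds a Chromium launch flag that suppresses one well-known automation signal. This covers only a fraction of what the stealth plugins patch (navigator.webdriver, CDP artifacts, and more), so treat it as a complement to them, not a replacement; all names here are illustrative.

```python
import random

# Common desktop resolutions; unusual viewport sizes are a fingerprinting signal.
COMMON_VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1440, "height": 900},
]

# Reduces (but does not eliminate) obvious Chromium automation markers.
STEALTH_ARGS = ["--disable-blink-features=AutomationControlled"]

def browser_profile(rng=None) -> dict:
    """Pick per-session launch settings for Playwright/Puppeteer."""
    rng = rng or random.Random()
    return {"viewport": rng.choice(COMMON_VIEWPORTS), "args": list(STEALTH_ARGS)}
```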

Waiting for Dynamic Content

Flight search results load progressively. A common mistake is scraping data before all results have loaded. Implement these waiting strategies:

  1. Wait for the loading spinner or progress indicator to disappear
  2. Monitor network requests — wait until XHR/fetch requests related to fare data have completed
  3. Check for the presence of a results count element (e.g., “Showing 245 flights”)
  4. Implement a stability check — wait until the DOM stops changing for a specified interval (2-3 seconds)
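
Strategy 4, the stability check, generalizes beyond any particular page. A sketch, with the snapshot function supplied by the caller — for example a hypothetical `lambda: page.evaluate("document.body.innerHTML.length")`:

```python
import time

def wait_for_stable(get_snapshot, quiet_seconds=2.0, timeout=30.0, poll=0.25):
    """Wait until get_snapshot() returns the same value for quiet_seconds.

    Returns True once the value has been stable for the quiet window,
    False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    last = get_snapshot()
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = get_snapshot()
        if current != last:
            # DOM still changing: reset the quiet window.
            last = current
            stable_since = time.monotonic()
        elif time.monotonic() - stable_since >= quiet_seconds:
            return True
    return False
```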

Understanding how major sites detect bot activity is crucial for maintaining access. Our guide on how sites detect and block bots covers the same fingerprinting and behavioral detection techniques used by travel platforms.

Extracting Structured Fare Data

Google Flights Data Extraction

Google Flights embeds fare data in several extractable formats:

  • Structured data in page source: Flight offers are often present in JSON-LD or microdata format within the page HTML
  • XHR response payloads: The actual API responses that populate search results contain structured JSON data. Intercepting these network requests yields the cleanest data.
  • DOM parsing: As a fallback, parse the rendered DOM for fare elements using CSS selectors targeting price containers, airline names, and itinerary details

The most reliable approach is to intercept XHR responses. Set up a network request listener in your headless browser that captures responses from Google’s internal flight search API endpoints. These responses contain structured fare data including prices, airlines, leg details, fare classes, and booking links.
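
A sketch of that interception pattern with Playwright's sync API is below. The URL fragments in `markers` are placeholders, not real endpoint names; inspect your own browser's network tab to find the actual ones, which Google changes periodically.

```python
def is_fare_response(url: str, markers=("GetShoppingResults", "flights/search")) -> bool:
    """Heuristic filter for fare-data responses; markers are hypothetical."""
    return any(marker in url for marker in markers)

captured = []  # accumulates parsed fare payloads for this session

def on_response(response):
    """Attach with page.on("response", on_response) before navigating."""
    if is_fare_response(response.url):
        try:
            captured.append(response.json())
        except Exception:
            pass  # body not JSON, or no longer available
```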

Skyscanner Data Extraction

Skyscanner’s architecture makes it somewhat easier to extract structured data:

  • API endpoints: Skyscanner uses identifiable API endpoints for search results. The response format is JSON with clearly labeled fields for price, carrier, departure/arrival times, and duration.
  • Session-based polling: Skyscanner’s search creates a session and then polls for results. You need to capture the session creation response and then follow the polling endpoint to get complete results.
  • Price alerts endpoint: For monitoring over time, the price alert functionality uses a separate, simpler endpoint that can be more stable for repeated access.
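
The create-then-poll flow can be wrapped in a small loop. In the sketch below, `fetch_page` is a caller-supplied callable that performs one poll through your proxy session; the `status` field and `"UpdatesComplete"` value reflect the shape of Skyscanner's historical live-pricing responses and should be verified against current traffic.

```python
import time

def poll_results(fetch_page, max_polls=20, delay=1.0):
    """Poll a Skyscanner-style search session until results are complete."""
    for _ in range(max_polls):
        page = fetch_page()
        if page.get("status") == "UpdatesComplete":
            return page  # all OTAs have reported; results are final
        time.sleep(delay)  # partial results so far; poll again
    raise TimeoutError("search session never completed")
```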

Data Normalization

Raw extracted data from different sources uses different formats. Normalize every fare observation into a consistent schema:

| Field | Description | Example |
| --- | --- | --- |
| route | Origin-destination (IATA codes) | JFK-LHR |
| departure_date | ISO 8601 date | 2026-06-15 |
| return_date | ISO 8601 date (null for one-way) | 2026-06-22 |
| price | Decimal in original currency | 487.50 |
| currency | ISO 4217 code | USD |
| price_usd | Converted to USD for comparison | 487.50 |
| airline | Operating carrier IATA code | BA |
| stops | Number of stops | 0 |
| duration_minutes | Total journey time | 420 |
| source | Data source identifier | google_flights |
| proxy_country | Country of proxy used | US |
| scraped_at | UTC timestamp | 2026-03-07T14:30:00Z |
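
One straightforward way to encode this schema is a plain dataclass, which makes missing fields fail loudly at construction time and serializes cleanly:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FareObservation:
    route: str                  # "JFK-LHR" (IATA origin-destination)
    departure_date: str         # ISO 8601 date
    return_date: Optional[str]  # None for one-way
    price: float                # decimal in original currency
    currency: str               # ISO 4217 code
    price_usd: float            # converted to USD for comparison
    airline: str                # operating carrier IATA code
    stops: int
    duration_minutes: int
    source: str                 # e.g. "google_flights"
    proxy_country: str
    scraped_at: str             # UTC ISO 8601 timestamp
```

Every extractor, whether it parses Google's XHR payloads or Skyscanner's polling responses, should emit `FareObservation` instances so the analysis layer never sees source-specific formats.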

Scaling Your Fare Intelligence Operation

Request Volume Planning

Start by calculating your actual monitoring needs. For each route-date combination, you need one search per target country. If you monitor 20 routes across 10 countries with daily checks, that is 200 searches per day — easily manageable with a modest proxy pool. Scale incrementally and monitor your block rate at each level.

Cost Optimization

Proxy costs are your primary operational expense. Reduce them by:

  • Caching results: Do not re-scrape a route if you already have a recent data point (within 2-4 hours for most routes)
  • Prioritizing routes: Focus proxy spend on routes with high price variance and upcoming travel dates
  • Using datacenter proxies for testing: Develop and debug your scraping logic with cheap datacenter proxies, then switch to residential for production
  • Off-peak scraping: Some proxy providers charge less during off-peak hours or offer unlimited bandwidth plans that favor lower concurrency
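
The caching rule is a simple TTL check. A sketch using an in-memory map of route to last-scraped Unix timestamp (a production system would persist this in Redis or a database):

```python
import time

def should_scrape(last_scraped: dict, route: str, ttl_hours: float = 3.0) -> bool:
    """Return True only if the route has no data point fresher than the TTL."""
    ts = last_scraped.get(route)
    return ts is None or (time.time() - ts) >= ttl_hours * 3600
```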

Error Handling and Recovery

Build robust retry logic that distinguishes between different failure modes:

  • CAPTCHA triggered: Rotate to a new IP and reduce request frequency for that target
  • Empty results: May indicate a soft block — retry with a different proxy type
  • Page structure change: Alert immediately — requires scraper code updates
  • Network timeout: Simple retry with the same or different proxy
  • 403/429 response: Hard block — rotate IP, increase delay, consider mobile proxies
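
That decision table translates naturally into a dispatch function. The sketch below maps an observed outcome to a recovery action; the action names are illustrative and would drive your retry scheduler:

```python
def classify_failure(status_code=None, captcha=False, results_count=None,
                     parse_error=False, timed_out=False) -> str:
    """Map a scrape outcome to a recovery action, mirroring the list above."""
    if captcha:
        return "rotate_ip_and_slow"       # new IP, reduce frequency
    if status_code in (403, 429):
        return "hard_block"               # rotate IP, back off, try mobile
    if timed_out:
        return "simple_retry"             # same or different proxy
    if parse_error:
        return "alert_structure_change"   # page layout changed; needs code fix
    if results_count == 0:
        return "retry_other_proxy_type"   # possible soft block
    return "ok"
```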

Frequently Asked Questions

Is scraping Google Flights against their terms of service?

Yes, automated scraping of Google services generally violates their terms of service. However, a terms-of-service violation is not the same thing as illegal activity. In the United States, the Ninth Circuit's rulings in hiQ v. LinkedIn indicated that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act, though the case ultimately settled and the broader legal questions remain unsettled. That said, Google is more aggressive than most companies in enforcing its ToS through technical measures (blocking, rate limiting) rather than legal action against individual scrapers. Proceed with awareness of the risks and operate at a scale that is proportionate to your legitimate business needs.

How often should I scrape fare data for a specific route?

For most routes, checking 2-4 times per day provides sufficient granularity to catch price changes. Fares on popular routes may change multiple times per day, but the significant drops that represent actual savings opportunities typically persist for hours, not minutes. For routes within 7 days of departure, increase monitoring to every 2-3 hours, as last-minute price changes are more frequent and larger in magnitude.

Can I use Skyscanner’s public API instead of scraping?

Skyscanner previously offered a public affiliate API, but access has become increasingly restricted. As of 2026, new API access requires an affiliate partnership application and approval. If you qualify, the API is far more efficient and reliable than scraping. For those without API access, scraping remains the practical alternative — but the API should always be your first choice if it is available to you.

What happens when Google Flights or Skyscanner changes their page structure?

Page structure changes are inevitable and are the most common cause of scraper failures. Build your extraction logic to be as resilient as possible by targeting stable attributes (data attributes, ARIA labels) rather than fragile CSS class names that may be obfuscated. Implement automated monitoring that alerts you when extraction yields empty or malformed data. Expect to update your scraping code every 4-8 weeks for Google Flights and every 2-3 months for Skyscanner.

Do I need separate proxy pools for Google Flights and Skyscanner?

You do not strictly need separate pools, but it is good practice to isolate them. If your IPs get flagged on one platform, you do not want that to affect your operations on the other. Use your proxy provider’s session management to assign different IP ranges to different targets. This also makes it easier to optimize rotation and concurrency settings per platform, since their detection thresholds are different.

Conclusion

Scraping Google Flights and Skyscanner for fare intelligence is technically demanding but entirely achievable with the right proxy infrastructure and extraction techniques. The key is to approach it systematically: understand the anti-bot defenses you face, configure your proxies and headless browsers to present a convincing human profile, extract data from the most reliable source within each page, and normalize everything into a consistent format for analysis. Start with one platform and a handful of routes, prove your extraction pipeline works reliably, and then scale. The fare data you collect will be orders of magnitude more comprehensive than what any single consumer search can reveal, and the pricing insights you derive from it will consistently save you money or generate revenue.
