Scraping Google Flights and Skyscanner for Fare Intelligence (2026)

Google Flights and Skyscanner are the two most powerful flight search engines available to consumers — and they are also the most valuable data sources for anyone building fare intelligence systems. Between them, they aggregate fares from hundreds of airlines and OTAs, providing a comprehensive view of the market that no single airline website can match. But these platforms were built for human users, not automated scrapers, and both deploy sophisticated anti-bot defenses that make data extraction a technical challenge. This guide covers the specific obstacles you will face when scraping these platforms in 2026, the proxy configurations that work, and the techniques for extracting clean, structured fare data at scale.

Why Google Flights and Skyscanner Are the Best Fare Data Sources

Before diving into the technical details, it is worth understanding why these two platforms are preferred over scraping individual airline sites.

Google Flights Advantages

  • Comprehensive coverage: Aggregates fares from virtually every major airline and many OTAs worldwide
  • Structured data: Fare data is embedded in the page source in a relatively parseable format
  • Price history: Provides built-in price trend data that can supplement your own monitoring
  • Calendar view: Shows price variation across dates in a single request, reducing scraping volume
  • No redirect model: Displays actual prices inline rather than redirecting to external sites for quotes

Skyscanner Advantages

  • Deeper OTA coverage: Includes smaller and regional OTAs that Google Flights may miss
  • Price alerts API: Underlying API endpoints can sometimes be accessed for cleaner data extraction
  • Multi-city search: Better support for complex itineraries
  • Explore feature: “Everywhere” search reveals cheapest destinations from a given origin
  • Historical pricing: Longer price trend data for many routes

| Feature | Google Flights | Skyscanner |
| --- | --- | --- |
| Airline coverage | Excellent — nearly all major carriers | Excellent — plus budget and regional carriers |
| OTA aggregation | Limited — focuses on airline direct | Extensive — dozens of OTAs per route |
| Anti-bot difficulty | High (Google infrastructure) | Moderate-high (Cloudflare + custom) |
| Data structure | Semi-structured in page source | JSON via API endpoints |
| Rate limiting | Aggressive | Moderate |
| Geographic pricing | Yes — varies by search origin | Yes — varies by market |

Anti-Bot Measures You Will Face

Both platforms invest heavily in preventing automated access. Understanding their specific defenses is the first step to working around them. For a comprehensive overview of bot detection systems used across the web, see our detailed breakdown of anti-bot systems that target price scrapers and how proxies help bypass them.

Google Flights Detection Stack

Google’s anti-bot infrastructure is among the most sophisticated on the web. When scraping Google Flights, you will encounter:

  • reCAPTCHA v3 (invisible scoring): Runs in the background and assigns a bot probability score to every session. There is no visible challenge — sessions that score poorly are silently served degraded results or blocked entirely.
  • Browser fingerprinting: Google analyzes dozens of browser attributes including WebGL renderer, canvas fingerprint, installed fonts, plugin enumeration, and screen resolution consistency.
  • Behavioral analysis: Mouse movement patterns, scroll behavior, click timing, and navigation patterns are evaluated against human baselines.
  • IP reputation database: Google maintains arguably the world’s most comprehensive IP reputation database, flagging datacenter ranges, known proxy IPs, and IPs with suspicious activity histories.
  • Request pattern analysis: Searches that follow predictable patterns (same routes, regular intervals, no variation) are flagged as automated.

Skyscanner Detection Stack

Skyscanner uses a layered approach combining third-party and custom solutions:

  • Cloudflare Bot Management: Front-line defense that challenges suspicious traffic with JavaScript challenges and Cloudflare Turnstile CAPTCHAs.
  • Custom rate limiting: Endpoint-specific rate limits that throttle search requests per IP and per session.
  • Session validation: Requires valid session cookies and proper header sequences — direct API calls without proper session initialization are rejected.
  • Device fingerprinting: Client-side JavaScript collects device characteristics and compares them against known bot signatures.

Proxy Setup for Flight Aggregator Scraping

Recommended Proxy Configuration

| Target | Proxy Type | Rotation | Concurrency | Notes |
| --- | --- | --- | --- | --- |
| Google Flights | Residential (rotating) | Per request | 3-5 sessions | Mobile proxies for persistent blocks |
| Google Flights (heavy volume) | Mobile (4G/5G) | Sticky 5-min | 2-3 sessions | Highest success rate but most expensive |
| Skyscanner Web | Residential (rotating) | Per request | 5-8 sessions | Standard residential usually sufficient |
| Skyscanner API endpoints | ISP/Static Residential | Sticky 10-min | 3-5 sessions | Session persistence needed for API flow |

Essential Proxy Configuration Rules

  1. Match proxy location to search context: If searching for flights departing from London, use a UK proxy. Mismatched locations trigger additional scrutiny.
  2. Consistent session fingerprinting: Your browser timezone, language, and locale settings must match your proxy’s geographic location. A browser claiming to be in Tokyo with a German IP is an obvious red flag.
  3. Avoid proxy chaining: Do not route traffic through multiple proxies. The added latency creates detectable timing anomalies.
  4. Monitor proxy health: Track success rates per proxy IP. Remove IPs that consistently trigger CAPTCHAs — they are likely flagged in the platform’s reputation database.
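
Rules 1 and 2 can be enforced mechanically. Below is a minimal Python sketch that builds Playwright browser-context options whose timezone and locale match the proxy's country. The `PROXY_LOCALES` map and `context_options` helper are illustrative names, and the map would need an entry for every market you scrape from.

```python
# Illustrative country -> browser-settings map; extend for your target markets.
PROXY_LOCALES = {
    "US": {"timezone_id": "America/New_York", "locale": "en-US"},
    "GB": {"timezone_id": "Europe/London", "locale": "en-GB"},
    "DE": {"timezone_id": "Europe/Berlin", "locale": "de-DE"},
    "JP": {"timezone_id": "Asia/Tokyo", "locale": "ja-JP"},
}

def context_options(proxy_country: str, proxy_url: str) -> dict:
    """Return Playwright new_context() options consistent with the proxy's location."""
    settings = PROXY_LOCALES.get(proxy_country.upper())
    if settings is None:
        raise ValueError(f"no locale profile for {proxy_country}")
    return {
        "proxy": {"server": proxy_url},
        "timezone_id": settings["timezone_id"],
        "locale": settings["locale"],
    }
```

Passing the returned dict to `browser.new_context(**opts)` keeps the browser's reported timezone and language consistent with the exit IP, avoiding the Tokyo-browser-on-German-IP mismatch described above.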

Handling JavaScript-Heavy Flight Search Pages

Both Google Flights and Skyscanner render search results entirely through client-side JavaScript. Simple HTTP request libraries (like requests in Python) cannot access fare data. You need a full browser rendering engine.

Headless Browser Setup

Use Playwright or Puppeteer with the following critical configurations:

  • Stealth plugins: Apply playwright-extra stealth or puppeteer-extra-plugin-stealth to mask automation signals (navigator.webdriver property, Chrome DevTools protocol detection, etc.)
  • Realistic viewport: Use common screen resolutions (1920×1080, 1366×768, 1440×900) — avoid unusual sizes
  • WebGL and Canvas: Ensure consistent GPU rendering fingerprints across sessions by using hardware-acceleration-compatible configurations
  • Font rendering: Match installed fonts to the operating system your user agent claims to be running
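
As a small illustration of the viewport rule, the sketch below picks a per-session viewport from the common resolutions listed above and adds a Chromium launch flag that suppresses one well-known automation signal. This covers only a fraction of what the stealth plugins patch (navigator.webdriver, CDP artifacts, and more), so treat it as a complement to them, not a replacement; all names here are illustrative.

```python
import random

# Common desktop resolutions; unusual viewport sizes are a fingerprinting signal.
COMMON_VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
    {"width": 1440, "height": 900},
]

# Reduces (but does not eliminate) obvious Chromium automation markers.
STEALTH_ARGS = ["--disable-blink-features=AutomationControlled"]

def browser_profile(rng=None) -> dict:
    """Pick per-session launch settings for Playwright/Puppeteer."""
    rng = rng or random.Random()
    return {"viewport": rng.choice(COMMON_VIEWPORTS), "args": list(STEALTH_ARGS)}
```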

Waiting for Dynamic Content

Flight search results load progressively. A common mistake is scraping data before all results have loaded. Implement these waiting strategies:

  1. Wait for the loading spinner or progress indicator to disappear
  2. Monitor network requests — wait until XHR/fetch requests related to fare data have completed
  3. Check for the presence of a results count element (e.g., “Showing 245 flights”)
  4. Implement a stability check — wait until the DOM stops changing for a specified interval (2-3 seconds)
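
Strategy 4, the stability check, generalizes beyond any particular page. A sketch, with the snapshot function supplied by the caller — for example a hypothetical `lambda: page.evaluate("document.body.innerHTML.length")`:

```python
import time

def wait_for_stable(get_snapshot, quiet_seconds=2.0, timeout=30.0, poll=0.25):
    """Wait until get_snapshot() returns the same value for quiet_seconds.

    Returns True once the value has been stable for the quiet window,
    False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    last = get_snapshot()
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = get_snapshot()
        if current != last:
            # DOM still changing: reset the quiet window.
            last = current
            stable_since = time.monotonic()
        elif time.monotonic() - stable_since >= quiet_seconds:
            return True
    return False
```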

Understanding how major sites detect bot activity is crucial for maintaining access. Our guide on how sites detect and block bots covers the same fingerprinting and behavioral detection techniques used by travel platforms.

Extracting Structured Fare Data

Google Flights Data Extraction

Google Flights embeds fare data in several extractable formats:

  • Structured data in page source: Flight offers are often present in JSON-LD or microdata format within the page HTML
  • XHR response payloads: The actual API responses that populate search results contain structured JSON data. Intercepting these network requests yields the cleanest data.
  • DOM parsing: As a fallback, parse the rendered DOM for fare elements using CSS selectors targeting price containers, airline names, and itinerary details

The most reliable approach is to intercept XHR responses. Set up a network request listener in your headless browser that captures responses from Google’s internal flight search API endpoints. These responses contain structured fare data including prices, airlines, leg details, fare classes, and booking links.
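
A sketch of that interception pattern with Playwright's sync API is below. The URL fragments in `markers` are placeholders, not real endpoint names; inspect your own browser's network tab to find the actual ones, which Google changes periodically.

```python
def is_fare_response(url: str, markers=("GetShoppingResults", "flights/search")) -> bool:
    """Heuristic filter for fare-data responses; markers are hypothetical."""
    return any(marker in url for marker in markers)

captured = []  # accumulates parsed fare payloads for this session

def on_response(response):
    """Attach with page.on("response", on_response) before navigating."""
    if is_fare_response(response.url):
        try:
            captured.append(response.json())
        except Exception:
            pass  # body not JSON, or no longer available
```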

Skyscanner Data Extraction

Skyscanner’s architecture makes it somewhat easier to extract structured data:

  • API endpoints: Skyscanner uses identifiable API endpoints for search results. The response format is JSON with clearly labeled fields for price, carrier, departure/arrival times, and duration.
  • Session-based polling: Skyscanner’s search creates a session and then polls for results. You need to capture the session creation response and then follow the polling endpoint to get complete results.
  • Price alerts endpoint: For monitoring over time, the price alert functionality uses a separate, simpler endpoint that can be more stable for repeated access.
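
The create-then-poll flow can be wrapped in a small loop. In the sketch below, `fetch_page` is a caller-supplied callable that performs one poll through your proxy session; the `status` field and `"UpdatesComplete"` value reflect the shape of Skyscanner's historical live-pricing responses and should be verified against current traffic.

```python
import time

def poll_results(fetch_page, max_polls=20, delay=1.0):
    """Poll a Skyscanner-style search session until results are complete."""
    for _ in range(max_polls):
        page = fetch_page()
        if page.get("status") == "UpdatesComplete":
            return page  # all OTAs have reported; results are final
        time.sleep(delay)  # partial results so far; poll again
    raise TimeoutError("search session never completed")
```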

Data Normalization

Raw extracted data from different sources uses different formats. Normalize every fare observation into a consistent schema:

| Field | Description | Example |
| --- | --- | --- |
| route | Origin-destination (IATA codes) | JFK-LHR |
| departure_date | ISO 8601 date | 2026-06-15 |
| return_date | ISO 8601 date (null for one-way) | 2026-06-22 |
| price | Decimal in original currency | 487.50 |
| currency | ISO 4217 code | USD |
| price_usd | Converted to USD for comparison | 487.50 |
| airline | Operating carrier IATA code | BA |
| stops | Number of stops | 0 |
| duration_minutes | Total journey time | 420 |
| source | Data source identifier | google_flights |
| proxy_country | Country of proxy used | US |
| scraped_at | UTC timestamp | 2026-03-07T14:30:00Z |
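
One straightforward way to encode this schema is a plain dataclass, which makes missing fields fail loudly at construction time and serializes cleanly:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FareObservation:
    route: str                  # "JFK-LHR" (IATA origin-destination)
    departure_date: str         # ISO 8601 date
    return_date: Optional[str]  # None for one-way
    price: float                # decimal in original currency
    currency: str               # ISO 4217 code
    price_usd: float            # converted to USD for comparison
    airline: str                # operating carrier IATA code
    stops: int
    duration_minutes: int
    source: str                 # e.g. "google_flights"
    proxy_country: str
    scraped_at: str             # UTC ISO 8601 timestamp
```

Every extractor, whether it parses Google's XHR payloads or Skyscanner's polling responses, should emit `FareObservation` instances so the analysis layer never sees source-specific formats.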

Scaling Your Fare Intelligence Operation

Request Volume Planning

Start by calculating your actual monitoring needs. For each route-date combination, you need one search per target country. If you monitor 20 routes across 10 countries with daily checks, that is 200 searches per day — easily manageable with a modest proxy pool. Scale incrementally and monitor your block rate at each level.

Cost Optimization

Proxy costs are your primary operational expense. Reduce them by:

  • Caching results: Do not re-scrape a route if you already have a recent data point (within 2-4 hours for most routes)
  • Prioritizing routes: Focus proxy spend on routes with high price variance and upcoming travel dates
  • Using datacenter proxies for testing: Develop and debug your scraping logic with cheap datacenter proxies, then switch to residential for production
  • Off-peak scraping: Some proxy providers charge less during off-peak hours or offer unlimited bandwidth plans that favor lower concurrency
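
The caching rule is a simple TTL check. A sketch using an in-memory map of route to last-scraped Unix timestamp (a production system would persist this in Redis or a database):

```python
import time

def should_scrape(last_scraped: dict, route: str, ttl_hours: float = 3.0) -> bool:
    """Return True only if the route has no data point fresher than the TTL."""
    ts = last_scraped.get(route)
    return ts is None or (time.time() - ts) >= ttl_hours * 3600
```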

Error Handling and Recovery

Build robust retry logic that distinguishes between different failure modes:

  • CAPTCHA triggered: Rotate to a new IP and reduce request frequency for that target
  • Empty results: May indicate a soft block — retry with a different proxy type
  • Page structure change: Alert immediately — requires scraper code updates
  • Network timeout: Simple retry with the same or different proxy
  • 403/429 response: Hard block — rotate IP, increase delay, consider mobile proxies
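
That decision table translates naturally into a dispatch function. The sketch below maps an observed outcome to a recovery action; the action names are illustrative and would drive your retry scheduler:

```python
def classify_failure(status_code=None, captcha=False, results_count=None,
                     parse_error=False, timed_out=False) -> str:
    """Map a scrape outcome to a recovery action, mirroring the list above."""
    if captcha:
        return "rotate_ip_and_slow"       # new IP, reduce frequency
    if status_code in (403, 429):
        return "hard_block"               # rotate IP, back off, try mobile
    if timed_out:
        return "simple_retry"             # same or different proxy
    if parse_error:
        return "alert_structure_change"   # page layout changed; needs code fix
    if results_count == 0:
        return "retry_other_proxy_type"   # possible soft block
    return "ok"
```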

Frequently Asked Questions

Is scraping Google Flights against their terms of service?

Yes, automated scraping of Google services generally violates their terms of service. However, a terms-of-service violation is not the same thing as illegal activity. In the United States, the Ninth Circuit's rulings in hiQ v. LinkedIn indicated that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act, though the case ultimately settled and the broader legal questions remain unsettled. That said, Google is more aggressive than most companies in enforcing its ToS through technical measures (blocking, rate limiting) rather than legal action against individual scrapers. Proceed with awareness of the risks and operate at a scale that is proportionate to your legitimate business needs.

How often should I scrape fare data for a specific route?

For most routes, checking 2-4 times per day provides sufficient granularity to catch price changes. Fares on popular routes may change multiple times per day, but the significant drops that represent actual savings opportunities typically persist for hours, not minutes. For routes within 7 days of departure, increase monitoring to every 2-3 hours, as last-minute price changes are more frequent and larger in magnitude.

Can I use Skyscanner’s public API instead of scraping?

Skyscanner previously offered a public affiliate API, but access has become increasingly restricted. As of 2026, new API access requires an affiliate partnership application and approval. If you qualify, the API is far more efficient and reliable than scraping. For those without API access, scraping remains the practical alternative — but the API should always be your first choice if it is available to you.

What happens when Google Flights or Skyscanner changes their page structure?

Page structure changes are inevitable and are the most common cause of scraper failures. Build your extraction logic to be as resilient as possible by targeting stable attributes (data attributes, ARIA labels) rather than fragile CSS class names that may be obfuscated. Implement automated monitoring that alerts you when extraction yields empty or malformed data. Expect to update your scraping code every 4-8 weeks for Google Flights and every 2-3 months for Skyscanner.

Do I need separate proxy pools for Google Flights and Skyscanner?

You do not strictly need separate pools, but it is good practice to isolate them. If your IPs get flagged on one platform, you do not want that to affect your operations on the other. Use your proxy provider’s session management to assign different IP ranges to different targets. This also makes it easier to optimize rotation and concurrency settings per platform, since their detection thresholds are different.

Conclusion

Scraping Google Flights and Skyscanner for fare intelligence is technically demanding but entirely achievable with the right proxy infrastructure and extraction techniques. The key is to approach it systematically: understand the anti-bot defenses you face, configure your proxies and headless browsers to present a convincing human profile, extract data from the most reliable source within each page, and normalize everything into a consistent format for analysis. Start with one platform and a handful of routes, prove your extraction pipeline works reliably, and then scale. The fare data you collect will be orders of magnitude more comprehensive than what any single consumer search can reveal, and the pricing insights you derive from it will consistently save you money or generate revenue.
