Monitoring fares across 100 travel routes is a side project. Monitoring 10,000 routes is an engineering challenge that demands purpose-built infrastructure, intelligent proxy pool management, and a database architecture that can handle millions of price records without degrading query performance. This guide covers the practical steps to scale a travel fare monitoring operation from hobby-scale to production-grade, drawing on the same principles used by commercial fare aggregators.
Understanding the Scale Challenge
The difference between 100 routes and 10,000 routes is not just “do it 100 times more.” Scaling introduces problems that do not exist at smaller volumes:
| Challenge | At 100 Routes | At 10,000 Routes |
|---|---|---|
| Daily scrape requests | 500-1,000 | 50,000-100,000+ |
| Proxy bandwidth (monthly) | 15-30 GB | 1.5-3 TB |
| Database records per year | ~200K | ~20M+ |
| Scrape execution time (sequential) | 2-4 hours | 8-20 days (impossible) |
| Proxy cost (monthly) | $200-$400 | $3,000-$15,000 |
| Failure recovery | Manual review | Must be fully automated |
| Server infrastructure | Single machine | Distributed system |
Sequential scraping is not viable at scale. If each scrape takes 5 seconds (including page load, rendering, and data extraction), 100,000 daily requests run back-to-back would take 100,000 × 5 s = 500,000 seconds, or nearly 6 days. You need distributed, concurrent execution. For the foundational principles of scaling price monitoring operations, see our detailed guide on scaling price monitoring to 100K products.
Distributed Scraping Architecture
Worker-Queue Architecture
The most proven architecture for large-scale scraping uses a job queue with multiple workers; a minimal worker loop is sketched after the component list:
- Scheduler: Generates scrape jobs based on route priority, schedule, and last-scrape timestamp. Places jobs in a message queue.
- Message Queue: Holds pending scrape jobs. Redis, RabbitMQ, or AWS SQS all work. The queue decouples job creation from execution.
- Worker Pool: Multiple worker processes (or containers) pull jobs from the queue, execute the scrape, and write results to the database. Workers can scale horizontally.
- Result Processor: Validates scraped data, performs normalization, updates caches, and triggers alerts.
- Monitor: Tracks worker health, queue depth, success rates, and proxy performance. Triggers alerts when metrics degrade.
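A minimal sketch of the worker side of this architecture, assuming Redis lists as the queue and results channel; the queue keys, the job shape, and the `scrape_route` placeholder are all illustrative, not a prescribed interface:

```python
import json

import redis  # pip install redis

QUEUE = "fare:jobs"        # hypothetical key for pending scrape jobs
RESULTS = "fare:results"   # hypothetical key consumed by the result processor

r = redis.Redis(host="localhost", port=6379)

def scrape_route(job: dict) -> dict:
    """Placeholder for the actual scrape (Scrapy or Playwright call)."""
    raise NotImplementedError

def worker_loop() -> None:
    while True:
        # BLPOP blocks until a job arrives; the timeout lets the loop
        # periodically check for shutdown signals
        item = r.blpop(QUEUE, timeout=5)
        if item is None:
            continue
        job = json.loads(item[1])
        try:
            result = scrape_route(job)
            r.rpush(RESULTS, json.dumps(result))
        except Exception:
            # Failed jobs re-enter the queue with a retry counter so they
            # can be dropped (and alerted on) after repeated failures
            job["retries"] = job.get("retries", 0) + 1
            if job["retries"] < 3:
                r.rpush(QUEUE, json.dumps(job))

if __name__ == "__main__":
    worker_loop()
```

Because workers hold no state between jobs, scaling horizontally means simply starting more copies of this process.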
Sizing Your Worker Pool
The number of workers you need depends on your target throughput and per-scrape latency; the sizing arithmetic is sketched after the table:
| Daily Scrape Target | Avg. Scrape Time | Scraping Window | Workers Needed |
|---|---|---|---|
| 10,000 | 5 seconds | 12 hours | 2-3 |
| 50,000 | 5 seconds | 12 hours | 6-8 |
| 100,000 | 5 seconds | 12 hours | 12-15 |
| 100,000 | 10 seconds | 12 hours | 24-30 |
| 500,000 | 5 seconds | 12 hours | 60-75 |
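The arithmetic behind these rows is simple enough to encode, which makes it easy to re-run as your measured per-scrape times drift; this is a direct translation of the table's assumptions (12-hour window, 30% retry buffer):

```python
import math

def workers_needed(daily_scrapes: int, avg_scrape_seconds: float,
                   window_hours: float = 12, buffer: float = 0.30) -> int:
    """Workers = total scrape-seconds / window-seconds, padded for retries."""
    total_work = daily_scrapes * avg_scrape_seconds
    window_seconds = window_hours * 3600
    return math.ceil(total_work / window_seconds * (1 + buffer))

# 100,000 scrapes at 5 s each in a 12-hour window -> 16,
# consistent with the upper end of the table's 12-15 estimate
print(workers_needed(100_000, 5))
```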
These estimates include a 30% buffer for retries and failures. In practice, per-scrape time varies by target site: simple API-based scrapes (FlixBus) might take 2 seconds, while heavily defended sites requiring headless browsers (Booking.com, Eurostar) might take 15-20 seconds.
Containerized Deployment
Workers should be deployed as containers (Docker) for several reasons:
- Horizontal scaling: Spin up more workers during peak scraping windows, scale down during quiet periods
- Isolation: Each worker has its own headless browser instance, preventing memory leaks in one worker from affecting others
- Reproducibility: Workers have identical environments, eliminating “works on my machine” issues
- Recovery: Crashed containers are automatically restarted by the orchestrator (Kubernetes, Docker Swarm, ECS)
A typical worker container includes the scraping framework (Scrapy, Playwright), proxy client configuration, and result reporting. Memory allocation depends on whether you use headless browsers (512MB-1GB per container) or HTTP-only scraping (128-256MB per container). For server setup considerations specific to large-scale bot operations, see our guide on server setup for bot operations, which covers many overlapping infrastructure decisions.
Proxy Pool Management at Scale
Why a Single Proxy Provider Is Not Enough
At 10,000 routes, you are consuming enough proxy bandwidth that relying on a single provider creates critical risks:
- Provider downtime: Even a few hours of proxy provider downtime creates data gaps across thousands of routes
- IP pool exhaustion: High-volume usage burns through proxy IPs faster than providers can refresh them for your account
- Pricing leverage: A single provider knows you are locked in and has less incentive to offer competitive rates
- Geographic gaps: No single provider has the best coverage in every region
For detailed strategies on working with multiple proxy providers, see our guide on managing multiple proxy providers.
Multi-Provider Proxy Architecture
Build an abstraction layer between your scraping workers and your proxy providers (a routing sketch follows the table):
| Component | Function | Implementation |
|---|---|---|
| Proxy Router | Selects the best proxy for each request based on target site, geography, and provider performance | Custom middleware or commercial proxy manager |
| Performance Tracker | Records success rate, latency, and cost per proxy per target site | Time-series metrics (Prometheus, InfluxDB) |
| Budget Allocator | Distributes bandwidth budget across providers based on cost-effectiveness | Custom logic using performance data |
| Failover Logic | Automatically shifts traffic when a provider degrades | Circuit breaker pattern with automatic recovery |
| Cost Reporter | Tracks actual spend per provider per day/week/month | Dashboard pulling from provider APIs and internal logs |
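A sketch of the routing core with circuit-breaker failover, assuming hypothetical provider names and a hardcoded site-to-tier map; a production version would load both from configuration and weight selection by the performance tracker's success rates:

```python
import random
import time
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    tiers: set                # proxy tiers this provider can serve
    failures: int = 0
    open_until: float = 0.0   # circuit breaker: skip provider until this time

    def healthy(self) -> bool:
        return time.time() >= self.open_until

    def record_failure(self, threshold: int = 5, cooldown: float = 300) -> None:
        self.failures += 1
        if self.failures >= threshold:
            self.open_until = time.time() + cooldown  # open the circuit
            self.failures = 0

# Hypothetical tier map: which proxy class each target site needs
SITE_TIER = {"booking.com": "mobile",
             "expedia.com": "residential",
             "flixbus.com": "datacenter"}

PROVIDERS = [Provider("provider_a", {"mobile", "residential"}),
             Provider("provider_b", {"residential", "datacenter"})]

def pick_provider(site: str) -> Provider:
    tier = SITE_TIER.get(site, "residential")
    candidates = [p for p in PROVIDERS if tier in p.tiers and p.healthy()]
    if not candidates:
        raise RuntimeError(f"no healthy provider for tier {tier!r}")
    return random.choice(candidates)  # weight by success rate in production
```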
Proxy Allocation Strategy
Not all routes need the same proxy quality. Implement tiered allocation:
- Tier 1 (high-defense sites: Booking.com, Google, Eurostar): Use ISP or mobile proxies. These are the most expensive but necessary for reliable scraping.
- Tier 2 (moderate-defense sites: Expedia, Amtrak, Trainline): Rotating residential proxies provide a good balance of success rate and cost.
- Tier 3 (low-defense sites: FlixBus API, smaller operators): High-quality datacenter proxies may work. Test before committing.
At 10,000 routes, proxy cost optimization matters. A 10% improvement in proxy efficiency at this scale saves $300-$1,500 per month.
IP Rotation and Cooling
At scale, you will encounter the same target sites hundreds or thousands of times per day. Even with large proxy pools, IP reuse is inevitable. Implement IP lifecycle management (a tracker sketch follows this list):
- Per-site IP tracking: Record which IPs have been used for each target site and when
- Cooling periods: After using an IP for a site, wait a minimum time before reusing it (typically 30-60 minutes for aggressive sites)
- Ban detection: When an IP is blocked, mark it as “cooled” for that site for an extended period (2-24 hours)
- Cross-site independence: An IP banned on Booking.com can still be used for FlixBus. Track bans per-site, not globally.
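A minimal per-site cooling tracker, assuming Redis so the state is shared across all workers; key names and cooldown durations are illustrative, and Redis key expiry handles the cleanup for free:

```python
import redis

r = redis.Redis()

def mark_used(ip: str, site: str, cool_seconds: int = 1800) -> None:
    """Record IP use; the key expires when the 30-minute cooling period ends."""
    r.set(f"cool:{site}:{ip}", "used", ex=cool_seconds)

def mark_banned(ip: str, site: str, ban_seconds: int = 7200) -> None:
    """Longer cooldown after a detected block (2 hours here; tune per site)."""
    r.set(f"cool:{site}:{ip}", "banned", ex=ban_seconds)

def usable(ip: str, site: str) -> bool:
    """Bans are tracked per site: an IP cooling for Booking.com
    remains usable for FlixBus."""
    return r.get(f"cool:{site}:{ip}") is None
```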
Database Optimization for Time-Series Fare Data
Schema Design
At 10,000 routes with multiple daily scrapes, your database will accumulate millions of records per month. The schema must support both fast writes (ingesting scrape results) and fast reads (querying price history, generating alerts, serving comparison results).
A recommended schema approach (DDL sketched after the list):
- Routes table: Static data about monitored routes (origin, destination, operator, mode of transport). Updated infrequently.
- Fares table (time-series): The core table. Each record is one price observation: route_id, departure_date, fare_class, price, currency, availability, scrape_timestamp. This table grows continuously.
- Alerts table: Active price alerts with route, threshold, and notification preferences.
- Proxy metrics table: Performance data for proxy optimization. Separate from fare data to avoid slowing fare queries.
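A sketch of the two core tables as PostgreSQL DDL, run here through psycopg2; column names follow the record shape described above, while the types are reasonable defaults rather than a prescription:

```python
import psycopg2  # pip install psycopg2-binary

DDL = """
CREATE TABLE IF NOT EXISTS routes (
    route_id    SERIAL PRIMARY KEY,
    origin      TEXT NOT NULL,
    destination TEXT NOT NULL,
    operator    TEXT NOT NULL,
    mode        TEXT NOT NULL             -- air, rail, bus
);

CREATE TABLE IF NOT EXISTS fares (
    route_id         INT NOT NULL REFERENCES routes(route_id),
    departure_date   DATE NOT NULL,
    fare_class       TEXT NOT NULL,
    price            NUMERIC(10, 2) NOT NULL,
    currency         CHAR(3) NOT NULL,
    availability     TEXT,
    scrape_timestamp TIMESTAMPTZ NOT NULL
);
"""

with psycopg2.connect("dbname=fares") as conn:  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(DDL)
```

Note that fares deliberately has no primary key: once it becomes a TimescaleDB hypertable (next section), any unique constraint would have to include the partition column anyway.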
PostgreSQL with TimescaleDB
For most teams, PostgreSQL with the TimescaleDB extension is the best balance of capability, performance, and operational simplicity for time-series fare data.
Key configuration for fare monitoring (the SQL equivalents are sketched after this list):
- Hypertable on fares table: Partition by scrape_timestamp with chunk intervals of 1 week (balances query performance with chunk management overhead)
- Compression policy: Compress chunks older than 2 weeks. Time-series compression in TimescaleDB achieves 90-95% compression on fare data, reducing storage costs dramatically.
- Retention policy: Keep detailed data for 12-24 months, then aggregate to daily min/max/median and drop the detail records.
- Continuous aggregates: Pre-compute daily and weekly price summaries as materialized views that update automatically. These power dashboards and trend analysis without hitting the raw data.
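These four items translate almost one-to-one into TimescaleDB SQL; a sketch, assuming the fares table above and TimescaleDB 2.x syntax (statements run individually because continuous aggregates cannot be created inside a transaction):

```python
import psycopg2

STATEMENTS = [
    # 1. Hypertable partitioned on scrape time, 1-week chunks
    """SELECT create_hypertable('fares', 'scrape_timestamp',
                                chunk_time_interval => INTERVAL '1 week')""",
    # 2. Compress chunks older than 2 weeks, segmented by route
    #    so per-route history reads stay fast
    """ALTER TABLE fares SET (timescaledb.compress,
                              timescaledb.compress_segmentby = 'route_id')""",
    """SELECT add_compression_policy('fares', INTERVAL '2 weeks')""",
    # 3. Drop raw chunks after 24 months (aggregate before they age out)
    """SELECT add_retention_policy('fares', INTERVAL '24 months')""",
    # 4. Continuous aggregate powering dashboards and trend queries
    """CREATE MATERIALIZED VIEW fares_daily
       WITH (timescaledb.continuous) AS
       SELECT route_id,
              time_bucket('1 day', scrape_timestamp) AS day,
              min(price) AS min_price,
              max(price) AS max_price
       FROM fares
       GROUP BY route_id, day""",
]

conn = psycopg2.connect("dbname=fares")  # hypothetical DSN
conn.autocommit = True  # required for continuous aggregate creation
with conn.cursor() as cur:
    for sql in STATEMENTS:
        cur.execute(sql)
```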
Indexing Strategy
Critical indexes for fare monitoring queries (sample DDL follows the table):
| Query Pattern | Required Index | Why |
|---|---|---|
| “Show price history for route X” | (route_id, scrape_timestamp DESC) | Most common query; needs fast range scan |
| “Find cheapest fare for destination Y on date Z” | (destination, departure_date, price) | Powers search results and alerts |
| “Which routes had price drops today?” | (scrape_timestamp, route_id) with partial index on recent data | Alert processing; only needs recent data |
| “Average price by operator for route X” | (route_id, operator, scrape_timestamp) | Competitive analysis queries |
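Two of these as concrete DDL against the fares table sketched earlier. One caveat the table glosses over: PostgreSQL disallows volatile functions like now() in partial-index predicates, so the "recent data" index needs a fixed cutoff that you refresh on a schedule:

```python
import psycopg2

INDEX_DDL = """
-- "Show price history for route X": fast range scan per route
CREATE INDEX IF NOT EXISTS idx_fares_route_time
    ON fares (route_id, scrape_timestamp DESC);

-- "Which routes had price drops today?": partial index over recent rows.
-- The cutoff must be a constant (now() is not allowed here), so recreate
-- this index periodically, e.g. from a nightly job.
CREATE INDEX IF NOT EXISTS idx_fares_recent
    ON fares (scrape_timestamp, route_id)
    WHERE scrape_timestamp > TIMESTAMPTZ '2025-01-01';
"""

with psycopg2.connect("dbname=fares") as conn:
    with conn.cursor() as cur:
        cur.execute(INDEX_DDL)
```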
Write Optimization
Ingesting 50,000-100,000 records per day requires attention to write performance (a COPY-based ingestion sketch follows this list):
- Batch inserts: Buffer scrape results and insert in batches of 500-1,000 records rather than one-at-a-time inserts
- Async writes: Workers push results to a queue; a dedicated ingestion process writes to the database. This decouples scrape speed from database write speed.
- COPY vs. INSERT: PostgreSQL COPY is 5-10x faster than INSERT for bulk loading. Use it for batch ingestion.
- Minimize indexes on the fares table: Every index slows writes. Only index what you actively query. Add indexes for new query patterns as needed, not preemptively.
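A sketch of the batch-and-COPY pattern with psycopg2, assuming results arrive as dicts from the results queue; the batch size and CSV staging are illustrative:

```python
import csv
import io

import psycopg2

COLUMNS = ("route_id", "departure_date", "fare_class", "price",
           "currency", "availability", "scrape_timestamp")

def flush_batch(conn, batch: list) -> None:
    """Bulk-load one batch of fare records via COPY instead of per-row INSERT."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for rec in batch:
        writer.writerow([rec[col] for col in COLUMNS])
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_expert(
            f"COPY fares ({', '.join(COLUMNS)}) FROM STDIN WITH (FORMAT csv)",
            buf)
    conn.commit()

# Dedicated ingestion process: drain the results queue into a buffer and
# call flush_batch whenever it reaches ~1,000 records (or a time limit).
```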
Route Prioritization and Scheduling
Not All Routes Deserve Equal Attention
At 10,000 routes, you cannot (and should not) scrape every route at the same frequency. Implement priority-based scheduling:
| Priority Tier | Criteria | Scrape Frequency | Percentage of Routes |
|---|---|---|---|
| Critical | High traffic, high revenue, active price alerts | Every 2-4 hours | 5-10% |
| High | Popular routes, competitive markets | Every 6-12 hours | 15-25% |
| Standard | Moderate demand, stable pricing | Once daily | 40-50% |
| Low | Low demand, infrequent price changes | Every 2-3 days | 20-30% |
Priority should be dynamic. A route that suddenly shows price volatility should be automatically promoted to a higher scraping frequency. A route with no price changes for 2 weeks can be demoted.
Intelligent Scheduling
Beyond simple frequency tiers, optimize your scheduling with the following signals (a scheduling sketch follows the list):
- Price change detection: Routes where prices changed in the last scrape get scheduled for a follow-up scrape sooner
- Departure date proximity: Scrape more frequently as departure dates approach (prices change faster in the 2-week window before departure)
- Time-of-day optimization: Some travel sites update prices at specific times. Identify these patterns and schedule scrapes after price updates.
- Load spreading: Distribute scrapes evenly across the day rather than running everything at midnight. This keeps database write load steady and proxy usage smooth.
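A sketch of the next-scrape-time computation combining these signals with the tier table above; the halving factors and ±10% jitter are starting points to tune, not measured constants:

```python
import random
from datetime import datetime, timedelta

# Base intervals per priority tier (midpoints of the earlier table)
TIER_INTERVAL = {
    "critical": timedelta(hours=3),
    "high":     timedelta(hours=9),
    "standard": timedelta(hours=24),
    "low":      timedelta(days=2),
}

def next_scrape(tier: str, departure_date: datetime,
                price_changed_last_scrape: bool) -> datetime:
    interval = TIER_INTERVAL[tier]
    # Departure proximity: prices move fastest in the final 2 weeks
    if departure_date - datetime.utcnow() < timedelta(days=14):
        interval /= 2
    # Price change detection: follow up sooner after observed movement
    if price_changed_last_scrape:
        interval /= 2
    # Load spreading: jitter so scrapes don't cluster at round hours
    jitter_s = interval.total_seconds() * random.uniform(-0.1, 0.1)
    return datetime.utcnow() + interval + timedelta(seconds=jitter_s)
```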
Cost Management
Breaking Down Operating Costs
At scale, understanding and optimizing costs becomes essential for sustainability:
| Cost Category | Small (100 routes) | Medium (1,000 routes) | Large (10,000 routes) |
|---|---|---|---|
| Proxy bandwidth | $200-$400/mo | $800-$1,500/mo | $3,000-$15,000/mo |
| Server/compute | $20-$50/mo | $100-$300/mo | $500-$2,000/mo |
| Database storage | $10-$20/mo | $50-$100/mo | $200-$500/mo |
| Monitoring/alerting | Free tier | $20-$50/mo | $100-$300/mo |
| Engineering time | Part-time | Half-time | 1-2 full-time engineers |
| Total | $250-$500/mo | $1,000-$2,000/mo | $4,000-$18,000/mo |
Cost Optimization Strategies
- API-first scraping: Whenever possible, scrape API endpoints rather than rendering full pages. This reduces bandwidth by 80-95% and uses less compute.
- Conditional scraping: Check whether a page has changed before fully processing it. HTTP ETag and Last-Modified headers can short-circuit unchanged pages (see the sketch after this list).
- Proxy tier matching: Use the cheapest proxy that works for each target. Do not use mobile proxies for sites that work fine with rotating residential.
- Shared proxy pools: If you are monitoring multiple types of data (flights, hotels, ground transport), share proxy pools where target sites overlap.
- Spot instances: Run workers on cloud spot/preemptible instances for 60-80% compute savings. Workers are inherently fault-tolerant (failed jobs re-enter the queue).
- Right-size scraping frequency: Audit which routes genuinely need frequent scraping. Over-scraping is the single biggest source of wasted cost.
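For conditional scraping specifically, here is a minimal requests-based sketch. Whether a given travel site actually honors ETag/If-None-Match varies, so treat the 304 path as an optimization to verify per target, not a guarantee; the in-memory cache stands in for shared state such as Redis:

```python
import requests

etag_cache = {}  # url -> last seen ETag (use Redis or similar in production)

def fetch_if_changed(url: str):
    """Return the page body, or None if the server reports it unchanged (304)."""
    headers = {}
    if url in etag_cache:
        headers["If-None-Match"] = etag_cache[url]
    resp = requests.get(url, headers=headers, timeout=15)
    if resp.status_code == 304:
        return None  # unchanged: skip parsing, save bandwidth and compute
    etag = resp.headers.get("ETag")
    if etag:
        etag_cache[url] = etag
    return resp.content
```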
Monitoring and Alerting
Key Metrics to Track
At 10,000 routes, manual oversight is impossible. Automated monitoring must cover the following; an instrumentation sketch follows the list:
- Scrape success rate (per site, per hour): Alert when it drops below 80% for any site. A sudden drop usually means the site changed its anti-bot rules or layout.
- Queue depth: If jobs are accumulating faster than workers process them, you need more workers or your scrapes are getting slower.
- Data freshness (per route tier): Track the age of the most recent scrape per route. Critical routes with data older than their target freshness need attention.
- Proxy cost per successful scrape: This is your core efficiency metric. If it spikes, investigate whether proxies are failing more often or bandwidth per scrape has increased.
- Database write latency: Increasing write latency indicates index bloat, insufficient buffer pool, or storage bottlenecks.
- Price anomaly rate: Track how often scraped prices fail validation checks (negative prices, prices 10x above historical average). A spike in anomalies usually means a site layout change broke your parser.
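A sketch of worker-side instrumentation for the first two metrics using prometheus_client; the metric names and port are illustrative, and the 80% threshold itself belongs in your alerting rules rather than in code:

```python
from prometheus_client import Counter, Gauge, start_http_server

SCRAPES = Counter("scrapes_total", "Scrape attempts", ["site", "outcome"])
QUEUE_DEPTH = Gauge("scrape_queue_depth", "Pending jobs in the queue")

def record_scrape(site: str, success: bool) -> None:
    outcome = "success" if success else "failure"
    SCRAPES.labels(site=site, outcome=outcome).inc()

# The scheduler can refresh queue depth on each tick, e.g.:
#   QUEUE_DEPTH.set(redis_client.llen("fare:jobs"))

# Expose /metrics for Prometheus to scrape; alert rules then fire when
# per-site success rate over the last hour drops below 80%
start_http_server(9100)
```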
Incident Response
Common incidents at scale and their response playbooks:
| Incident | Detection | Response |
|---|---|---|
| Target site layout change | Parse errors spike; anomaly rate increases | Pause scraping for affected site; update parser; replay failed jobs |
| Proxy provider outage | Success rate drops for all sites simultaneously | Failover to backup provider; resume when primary recovers |
| Database write bottleneck | Queue depth grows; write latency increases | Increase batch size; check for lock contention; scale database |
| Worker crash loop | Worker restart count spikes | Check logs for OOM or browser crashes; increase container memory |
| Anti-bot escalation | Success rate drops for one site; other sites unaffected | Switch to higher-tier proxies; reduce request rate; update fingerprints |
Migration Path: 100 to 10,000 Routes
Phase 1: Solid Foundation (100-500 Routes)
- Single server, single proxy provider
- PostgreSQL without TimescaleDB (standard partitioning is sufficient)
- Cron-based scheduling
- Manual monitoring with basic alerting
- Focus on getting your parsers and normalization right
Phase 2: Early Scale (500-2,000 Routes)
- Add a second proxy provider
- Implement job queue (Redis Queue or RabbitMQ)
- Move to containerized workers (2-5 containers)
- Add TimescaleDB for time-series optimization
- Implement priority-based scheduling
- Build basic monitoring dashboard
Phase 3: Production Scale (2,000-10,000 Routes)
- 3+ proxy providers with automated failover
- Kubernetes or ECS for worker orchestration (10-30 containers)
- Dedicated database server with read replicas
- Comprehensive monitoring with PagerDuty or similar alerting
- Automated parser testing (catch layout changes before they affect production)
- Cost optimization becomes a regular operational activity
Phase 4: Large Scale (10,000+ Routes)
- Multi-region deployment for geographic proxy diversity and fault tolerance
- Database sharding or migration to a purpose-built time-series database
- Machine learning for anomaly detection and schedule optimization
- Dedicated SRE (Site Reliability Engineering) for the scraping infrastructure
- Formal vendor management for proxy providers
FAQ
What is the biggest bottleneck when scaling from 100 to 10,000 routes?
For most teams, the bottleneck is proxy management, not compute or database. At 100 routes, you can use a single proxy provider and not worry much about optimization. At 10,000 routes, proxy costs dominate your budget, and proxy failures are the primary cause of data gaps. Investing in a proper proxy abstraction layer with multi-provider support and intelligent routing pays for itself quickly through better success rates and lower per-scrape costs.
Should I build my own scraping infrastructure or use a commercial scraping API?
At small scale (under 500 routes), commercial scraping APIs like ScraperAPI, Zyte, or Bright Data’s Web Scraper IDE can be cost-effective and save development time. At 10,000 routes, the economics usually favor custom infrastructure. Commercial APIs charge per request, and at 50,000-100,000 daily requests, the monthly cost can exceed $5,000-$10,000. Custom infrastructure requires more engineering investment but gives you lower marginal costs, more control over data quality, and faster adaptation when target sites change.
How do I handle different target sites updating prices at different frequencies?
Track the price change frequency for each site empirically. After a few weeks of daily scraping, you will have data showing how often each site actually changes prices for each route. Some airlines update prices every few hours, while some bus operators only change prices daily or weekly. Use this data to set per-site scraping frequencies. There is no value in scraping a site 4 times per day if it only updates prices once. Adaptive scheduling based on observed change rates is one of the most effective cost optimizations at scale.
What happens when a target site completely blocks my scraping operation?
This is inevitable at scale. Your response should be layered: first, rotate to a different proxy provider. Second, review and update your request fingerprints (headers, TLS signature, browser profile). Third, reduce your request rate for that site. Fourth, try a different scraping approach (switch from headless browser to API interception or vice versa). If the site is genuinely blocked and you have exhausted technical options, consider whether you can get the same data from an aggregator site that is easier to scrape. Always maintain at least one backup data source for critical routes.
How much engineering time does maintaining a 10,000-route monitoring system require?
Once operational, plan for 1-2 full-time engineers dedicated to the system. Approximately 40% of their time goes to parser maintenance (target sites change layouts and anti-bot systems regularly), 25% to infrastructure operations (scaling, database maintenance, provider management), 20% to feature development (new sites, new data points, better alerting), and 15% to cost optimization and performance tuning. The common mistake is building the system and assuming it runs itself. Travel sites actively evolve their defenses, and a monitoring system without ongoing engineering attention degrades within weeks.