Scraping Trade Lane Data for Freight Route Optimization
Trade lane data, the information about shipping routes between ports, airports, and logistics hubs, is fundamental to freight route optimization. Understanding which trade lanes offer the best combination of cost, speed, reliability, and capacity enables logistics teams to make routing decisions that save money and improve service quality.
In Southeast Asia, the complexity of trade lane options is enormous. With multiple major ports and airports in each country, numerous carrier options on each route, and the choice between direct and transshipment services, the permutations for any origin-destination pair can run into hundreds. Systematic data collection and analysis is the only practical way to navigate this complexity.
Understanding Trade Lane Data
What Constitutes a Trade Lane
A trade lane is a specific shipping route between two points, characterized by:
- Origin and destination: Ports, airports, or inland logistics hubs
- Mode of transport: Ocean, air, road, rail, or multimodal
- Service type: Direct, transshipment, consolidated, or express
- Carriers operating the lane: Which shipping lines, airlines, or trucking companies serve the route
- Frequency: How often services operate (daily, weekly, bi-weekly)
- Transit time: Expected duration from origin to destination
- Capacity: Available space/weight on each service
- Cost: Current market rates for the trade lane
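Frequency and transit time interact: a fast service that sails rarely can be slower in practice than a slower one that sails often. The sketch below illustrates this under two stated assumptions that are not from the source: cargo becomes ready at a uniformly random moment, so the average wait is half the interval between departures, and "bi-weekly" means once every two weeks. The `HEADWAY_DAYS` table and the function name are hypothetical.

```python
# Hypothetical mapping from frequency labels to days between departures.
# Assumption: "bi-weekly" means one departure every two weeks.
HEADWAY_DAYS = {"daily": 1, "weekly": 7, "bi-weekly": 14}


def expected_total_days(transit_days: float, frequency: str) -> float:
    """Transit time plus the average wait for the next departure.

    Assumes cargo is ready at a uniformly random moment, so the
    average wait is half the headway. This is a simplification.
    """
    headway = HEADWAY_DAYS[frequency]
    return transit_days + headway / 2


# A 6-day daily service versus a 4-day bi-weekly service.
daily = expected_total_days(6, "daily")
biweekly = expected_total_days(4, "bi-weekly")
print(daily, biweekly)  # 6.5 11.0
```

Under these assumptions the daily service is the faster choice on average, even though its sailing time is two days longer.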
Key Trade Lanes in Southeast Asia
Major intra-Asia trade lanes include:
Ocean shipping:
- Singapore to/from all major SEA ports (hub-and-spoke model)
- Intra-SEA direct services (Bangkok-Jakarta, Ho Chi Minh-Manila)
- China-SEA lanes (Shenzhen/Shanghai to Singapore, Bangkok, Jakarta)
- Northeast Asia-SEA lanes (Busan/Tokyo to SEA ports)
Air cargo:
- Singapore-Bangkok-Hong Kong corridor
- SEA-China express cargo routes
- Intra-SEA express delivery lanes
Road freight:
- Thailand-Malaysia-Singapore corridor
- Thailand-Cambodia-Vietnam routes
- ASEAN highway network lanes
Data Sources for Trade Lane Intelligence
Carrier Schedules and Services
Ocean carriers publish sailing schedules showing:
- Port rotation (sequence of ports called)
- Vessel assignments
- Transit times between each port pair
- Service frequency
- Connection options at transshipment ports
Major sources include Maersk Line, MSC, CMA CGM, Evergreen, ONE, Hapag-Lloyd, and regional carriers like PIL, RCL, and Wan Hai.
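A port rotation can be expanded into transit times for every port pair it serves. The following is a minimal sketch, assuming each schedule has already been parsed into an ordered list of (port code, days after the first call) tuples. The rotation values are illustrative and not taken from any carrier's schedule.

```python
from itertools import combinations

# Hypothetical port rotation: (UN/LOCODE, days after the first port call).
rotation = [("CNSHA", 0), ("SGSIN", 6), ("MYPKG", 8), ("IDJKT", 11)]


def port_pair_transit_days(rotation):
    """Return {(origin, destination): days} for every forward port pair."""
    pairs = {}
    # combinations() preserves input order, so origin always precedes
    # destination in the rotation.
    for (orig, orig_day), (dest, dest_day) in combinations(rotation, 2):
        pairs[(orig, dest)] = dest_day - orig_day
    return pairs


transit = port_pair_transit_days(rotation)
print(transit[("SGSIN", "IDJKT")])  # 5
```

Only forward pairs are produced, because a vessel on this rotation does not carry cargo back to a port it has already called.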
Airlines publish cargo flight schedules with:
- Route and frequency
- Aircraft type (indicating cargo capacity)
- Connection options
- Cut-off times
Trucking platforms show:
- Available routes
- Transit time estimates
- Vehicle availability by route
Port and Terminal Data
- PSA Singapore: Connectivity data showing which services call at Singapore and connection possibilities
- Port authorities: Berth allocation schedules showing which vessels are calling
- Terminal operators: Handling capacity and service information
Freight Forwarder Platforms
Digital freight forwarders and freight marketplaces such as Flexport and Freightos, along with regional forwarders, publish route options with comparative data on transit times and costs.
Why Proxies Are Needed for Trade Lane Data
Carrier Website Protections
Shipping line websites are increasingly protected against automated access:
- Schedule lookup rate limiting: Carriers restrict the number of schedule queries per session
- Point-to-point search restrictions: Repeated P2P schedule searches trigger bot detection
- Dynamic content: Schedule data is often loaded through JavaScript-heavy interfaces
- Geographic content serving: Some carriers show different schedule information based on user location
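When a carrier site does rate-limit schedule lookups, a collector usually needs to slow down rather than retry immediately. The sketch below shows one common approach, capped exponential backoff with full jitter. The function names, default values, and the choice of which status codes count as retryable are assumptions for illustration, not behavior documented by any carrier.

```python
import random


def backoff_delays(max_retries=5, base=2.0, cap=60.0, rng=None):
    """Return wait times in seconds: capped exponential backoff with full jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        # Ceiling doubles on each attempt but never exceeds the cap.
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


def should_retry(status_code):
    """Assumption: retry on 429 (too many requests) and 5xx server errors only."""
    return status_code == 429 or 500 <= status_code < 600


print(backoff_delays(rng=random.Random(0)))
print(should_retry(429), should_retry(403))
```

A 403 is treated as non-retryable here on the assumption that it signals a block rather than a temporary limit, in which case repeating the request from the same session is unlikely to help.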
Multi-Source Collection Requirements
Comprehensive trade lane data requires collecting from dozens of carrier websites, port authorities, and freight platforms. The volume of requests needed to build a complete picture would quickly exhaust any single IP address’s access allowance.
DataResearchTools mobile proxies distribute these queries across many IPs, keeping per-IP request rates at natural levels. Country-specific proxies also ensure you receive locally relevant schedule and pricing information.
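Keeping per-IP request rates low comes down to two things: rotating across the pool and enforcing a minimum gap between requests on the same proxy. The following sketch shows that bookkeeping only. The `ProxyRotator` class, the 10-second interval, and the `proxy-a.example` URLs are hypothetical and do not reflect any provider's actual API.

```python
import itertools


class ProxyRotator:
    """Round-robin over proxies while enforcing a minimum gap per proxy."""

    def __init__(self, proxies, min_interval_s=10.0):
        self._cycle = itertools.cycle(proxies)
        self._min_interval_s = min_interval_s
        self._last_used = {}  # proxy URL -> time of its latest scheduled request

    def next_proxy(self, now):
        """Return (proxy, wait_seconds) for a request issued at time `now`."""
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is None:
            wait = 0.0
        else:
            wait = max(0.0, self._min_interval_s - (now - last))
        # Record when this proxy will actually be used, after any wait.
        self._last_used[proxy] = now + wait
        return proxy, wait


rotator = ProxyRotator(
    ["http://proxy-a.example:8000", "http://proxy-b.example:8000"]
)
for t in (0.0, 1.0, 2.0):
    print(rotator.next_proxy(now=t))
```

In real use, the caller would pass `time.monotonic()` as `now` and sleep for the returned wait before sending the request. Passing the time in explicitly keeps the class easy to test.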
Building a Trade Lane Data Collection System
Data Model

    from dataclasses import dataclass, field
    from typing import List, Optional


    @dataclass
    class TradeLane:
        origin_port: str
        destination_port: str
        carrier: str
        service_name: str
        mode: str  # ocean, air, road
        service_type: str  # direct, transshipment, feeder
        transit_days: int
        frequency: str  # daily, weekly, bi-weekly
        vessel_or_flight: Optional[str] = None
        transshipment_ports: List[str] = field(default_factory=list)
        departure_day: Optional[str] = None  # day of week
        cut_off_days_before: int = 0
        capacity_teu: Optional[int] = None
        current_rate: Optional[float] = None
        rate_currency: Optional[str] = None
        collected_at: str = ""
        source: str = ""


    @dataclass
    class RouteOption:
        """A complete route from origin to destination, possibly multimodal."""
        origin: str
        destination: str
        legs: List[TradeLane]
        total_transit_days: int
        total_cost: Optional[float] = None
        cost_currency: Optional[str] = None
        transshipment_count: int = 0
        reliability_score: Optional[float] = None

Carrier Schedule Collection

    import random
    import time
    from datetime import datetime, timezone

    import requests


    class CarrierScheduleCollector:
        """Collect sailing schedules from carrier websites."""

        def __init__(self, proxy_config):
            # proxy_config is expected to expose get_proxy(country, sticky=...)
            # and return a requests-style proxies mapping.
            self.proxy_config = proxy_config

        def collect_schedules(self, carrier_config, origin, destination):
            """Collect schedule data from a carrier's website."""
            country = self._port_to_country(origin)
            proxy = self.proxy_config.get_proxy(country, sticky=True)
            # The context manager closes the session's connections on exit.
            with requests.Session() as session:
                session.proxies = proxy
                session.headers.update({
                    "User-Agent": (
                        "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
                        "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36"
                    ),
                    "Accept": "application/json",
                })
                try:
                    response = session.get(
                        carrier_config["schedule_url"],
                        params={
                            "origin": origin,
                            "destination": destination,
                            "weeks": 4,
                        },
                        timeout=30,
                    )
                    if response.status_code == 200:
                        return self._parse_schedule(
                            response.json(), carrier_config["name"],
                            origin, destination
                        )
                except Exception as e:
                    print(
                        f"Schedule collection error for "
                        f"{carrier_config['name']}: {e}"
                    )
            # Non-200 responses and errors both yield an empty result.
            return []

        def _parse_schedule(self, data, carrier, origin, destination):
            """Parse carrier schedule response into TradeLane objects."""
            lanes = []
            for service in data.get("services", []):
                transshipments = service.get("via_ports", [])
                lane = TradeLane(
                    origin_port=origin,
                    destination_port=destination,
                    carrier=carrier,
                    service_name=service.get("service_code", ""),
                    mode="ocean",
                    service_type=(
                        "direct" if not transshipments else "transshipment"
                    ),
                    transit_days=service.get("transit_days", 0),
                    frequency=service.get("frequency", "weekly"),
                    vessel_or_flight=service.get("vessel_name"),
                    transshipment_ports=transshipments,
                    departure_day=service.get("departure_day"),
                    cut_off_days_before=service.get("cut_off_days", 0),
                    capacity_teu=service.get("capacity"),
                    # Timezone-aware UTC timestamp (utcnow() is deprecated).
                    collected_at=datetime.now(timezone.utc).isoformat(),
                    source=carrier,
                )
                lanes.append(lane)
            return lanes

        def _port_to_country(self, port_code):
            """Map a port code to a proxy country, defaulting to Singapore."""
            mapping = {
                "SGSIN": "sg", "THLCH": "th", "THBKK": "th",
                "IDJKT": "id", "IDSBY": "id", "VNSGN": "vn",
                "VNHPH": "vn", "PHMNL": "ph", "MYPKG": "my",
            }
            return mapping.get(port_code, "sg")


    class MultiCarrierScheduleAggregator:
        """Aggregate schedules from multiple carriers for route comparison."""

        def __init__(self, collector, carrier_configs):
            self.collector = collector
            self.carrier_configs = carrier_configs

        def collect_all_schedules(self, origin, destination):
            """Collect schedules from all carriers for a port pair."""
            all_lanes = []
            for config in self.carrier_configs:
                lanes = self.collector.collect_schedules(
                    config, origin, destination
                )
                all_lanes.extend(lanes)
                # Randomized pause between carriers to avoid bursts.
                time.sleep(random.uniform(3, 7))
            return all_lanes

Route Optimization
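The RouteOptimizer class in this section calls `self.db.get_lanes(origin, destination)`, but the article does not show the database behind it. A minimal in-memory stand-in, assuming lanes are simply indexed by port pair, might look like the sketch below. `InMemoryTradeLaneDB`, `add_lane`, the cut-down `Lane` class, and the sample rates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lane:
    """Cut-down stand-in for the article's TradeLane (only a few fields)."""
    origin_port: str
    destination_port: str
    carrier: str
    transit_days: int
    current_rate: Optional[float] = None


class InMemoryTradeLaneDB:
    """Hypothetical store exposing the get_lanes() call the optimizer uses."""

    def __init__(self):
        self._lanes = defaultdict(list)  # (origin, destination) -> lanes

    def add_lane(self, lane):
        self._lanes[(lane.origin_port, lane.destination_port)].append(lane)

    def get_lanes(self, origin, destination):
        # Return a copy so callers cannot mutate the index.
        return list(self._lanes.get((origin, destination), []))


db = InMemoryTradeLaneDB()
db.add_lane(Lane("THLCH", "SGSIN", "Carrier A", 4, 350.0))
db.add_lane(Lane("SGSIN", "IDJKT", "Carrier B", 3, 280.0))
print(len(db.get_lanes("THLCH", "SGSIN")), len(db.get_lanes("THLCH", "IDJKT")))  # 1 0
```

With this data there is no direct THLCH to IDJKT lane, so the only way to connect the two ports is via a hub such as SGSIN. A production system would more likely back this with a real database.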

    class RouteOptimizer:
        """Optimize freight routing using collected trade lane data."""

        def __init__(self, trade_lane_db):
            # trade_lane_db must expose get_lanes(origin, destination).
            self.db = trade_lane_db

        def find_best_routes(
            self, origin, destination, criteria="balanced",
            max_transshipments=2
        ):
            """Find optimal routes between origin and destination."""
            # Get all direct lanes
            direct_lanes = self.db.get_lanes(origin, destination)
            # Get transshipment options
            transshipment_routes = self._find_transshipment_routes(
                origin, destination, max_transshipments
            )
            all_routes = []
            # Package direct lanes as routes
            for lane in direct_lanes:
                route = RouteOption(
                    origin=origin,
                    destination=destination,
                    legs=[lane],
                    total_transit_days=lane.transit_days,
                    total_cost=lane.current_rate,
                    cost_currency=lane.rate_currency,
                    transshipment_count=0,
                )
                all_routes.append(route)
            # Add transshipment routes
            all_routes.extend(transshipment_routes)
            # Score and rank routes
            return self._score_routes(all_routes, criteria)

        def _find_transshipment_routes(self, origin, destination, max_ts):
            """Find routes with a single transshipment connection.

            Only one-hub routes are built, so max_ts acts as an on/off
            switch: values below 1 disable transshipment routes.
            """
            routes = []
            if max_ts < 1:
                return routes
            # Candidate hubs: Singapore, Port Klang, Laem Chabang, Shanghai
            hubs = ["SGSIN", "MYPKG", "THLCH", "CNSHA"]
            # Fixed allowance for the connection at the hub. Actual
            # departure dates are not checked against each other.
            min_connection_days = 2
            for hub in hubs:
                if hub == origin or hub == destination:
                    continue
                first_legs = self.db.get_lanes(origin, hub)
                second_legs = self.db.get_lanes(hub, destination)
                for leg1 in first_legs:
                    for leg2 in second_legs:
                        total_transit = (
                            leg1.transit_days
                            + min_connection_days
                            + leg2.transit_days
                        )
                        # Sum rates only when both are known and share a
                        # currency; otherwise leave the cost unknown.
                        total_cost = None
                        if (
                            leg1.current_rate is not None
                            and leg2.current_rate is not None
                            and leg1.rate_currency == leg2.rate_currency
                        ):
                            total_cost = leg1.current_rate + leg2.current_rate
                        route = RouteOption(
                            origin=origin,
                            destination=destination,
                            legs=[leg1, leg2],
                            total_transit_days=total_transit,
                            total_cost=total_cost,
                            cost_currency=leg1.rate_currency,
                            transshipment_count=1,
                        )
                        routes.append(route)
            return routes

        def _score_routes(self, routes, criteria):
            """Score routes based on optimization criteria.

            The ranking score is stored in reliability_score for every
            criterion, not only for "reliable".
            """
            if not routes:
                return []
            for route in routes:
                if criteria == "fastest":
                    route.reliability_score = 100 - route.total_transit_days
                elif criteria == "cheapest":
                    # Routes with unknown cost sort last.
                    route.reliability_score = (
                        -route.total_cost
                        if route.total_cost is not None
                        else -float("inf")
                    )
                elif criteria == "balanced":
                    # Balance of cost and speed; unknown cost is
                    # treated as 5000.
                    cost = (
                        route.total_cost
                        if route.total_cost is not None
                        else 5000
                    )
                    transit_score = max(0, 50 - route.total_transit_days * 2)
                    cost_score = max(0, 50 - cost / 100)
                    ts_penalty = route.transshipment_count * 5
                    route.reliability_score = (
                        transit_score + cost_score - ts_penalty
                    )
                elif criteria == "reliable":
                    # Prefer direct routes with shorter transit times
                    route.reliability_score = (
                        50
                        - route.transshipment_count * 20
                        - route.total_transit_days
                    )
                else:
                    raise ValueError(f"Unknown criteria: {criteria!r}")
            routes.sort(key=lambda r: r.reliability_score, reverse=True)
            return routes

        def compare_trade_lanes(self, origin, destination):
            """Generate a comprehensive trade lane comparison."""
            lanes = self.db.get_lanes(origin, destination)
            if not lanes:
                return {"message": "No trade lanes found"}
            transit_days = [lane.transit_days for lane in lanes]
            comparison = {
                "origin": origin,
                "destination": destination,
                "total_options": len(lanes),
                "direct_services": sum(
                    1 for lane in lanes if lane.service_type == "direct"
                ),
                "transshipment_services": sum(
                    1 for lane in lanes if lane.service_type == "transshipment"
                ),
                "carriers": sorted({lane.carrier for lane in lanes}),
                "transit_time_range": {
                    "min": min(transit_days),
                    "max": max(transit_days),
                    "avg": round(sum(transit_days) / len(transit_days), 1),
                },
                "services": [
                    {
                        "carrier": lane.carrier,
                        "service": lane.service_name,
                        "type": lane.service_type,
                        "transit_days": lane.transit_days,
                        "frequency": lane.frequency,
                        "via": lane.transshipment_ports,
                        "rate": lane.current_rate,
                    }
                    for lane in sorted(lanes, key=lambda x: x.transit_days)
                ],
            }
            return comparison

Historical Performance Analysis
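The performance function in this section reads tracking records with `scheduled_transit_days` and `actual_transit_days` keys and derives delay statistics from them. To make those metrics concrete, here is a small worked example on hypothetical records. The values are illustrative only.

```python
# Hypothetical tracking records for one service.
records = [
    {"scheduled_transit_days": 5, "actual_transit_days": 5},
    {"scheduled_transit_days": 5, "actual_transit_days": 7},
    {"scheduled_transit_days": 5, "actual_transit_days": 4},
    {"scheduled_transit_days": 5, "actual_transit_days": 6},
]

# Positive delay = late arrival, negative = early arrival.
delays = [
    r["actual_transit_days"] - r["scheduled_transit_days"] for r in records
]

avg_delay = sum(delays) / len(delays)
# On time: arrived on or before the scheduled day.
on_time_pct = 100 * sum(1 for d in delays if d <= 0) / len(delays)
# Schedule reliability: arrived within one day of schedule, early or late.
within_one_day_pct = 100 * sum(1 for d in delays if abs(d) <= 1) / len(delays)

print(delays, avg_delay, on_time_pct, within_one_day_pct)
# [0, 2, -1, 1] 0.5 50.0 75.0
```

The two percentages answer different questions: a sailing that arrives one day late counts against on-time performance but still counts as reliable under the one-day tolerance.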

    def analyze_lane_performance(schedule_db, actual_tracking_db, lane):
        """Analyze actual vs scheduled performance for a trade lane.

        schedule_db is kept in the signature for compatibility, but it is
        not queried: scheduled transit days are read from each tracking
        record.
        """
        # Get actual arrival data
        actuals = actual_tracking_db.get_arrivals(
            lane.origin_port, lane.destination_port,
            lane.carrier, lane.service_name
        )
        if not actuals:
            return None
        actual_days = [a["actual_transit_days"] for a in actuals]
        # Positive delay = late arrival, negative = early arrival
        delays = [
            a["actual_transit_days"] - a["scheduled_transit_days"]
            for a in actuals
        ]
        n = len(delays)
        return {
            "lane": f"{lane.origin_port}-{lane.destination_port}",
            "carrier": lane.carrier,
            "service": lane.service_name,
            "scheduled_transit_days": lane.transit_days,
            # Averaged from the recorded actuals, not from
            # lane.transit_days plus delay.
            "avg_actual_days": round(sum(actual_days) / n, 1),
            "avg_delay_days": round(sum(delays) / n, 1),
            "on_time_pct": round(
                sum(1 for d in delays if d <= 0) / n * 100, 1
            ),
            "max_delay_days": max(delays),
            # Share of sailings arriving within one day of schedule
            "schedule_reliability": round(
                sum(1 for d in delays if abs(d) <= 1) / n * 100, 1
            ),
            "sailings_analyzed": n,
        }

DataResearchTools for Trade Lane Data Collection
DataResearchTools mobile proxies support trade lane data collection with:
- Carrier website access: Reliably access schedule pages of all major carriers serving SEA trade lanes
- Port authority data: Access berth schedules and vessel call information from SEA port authorities
- Multi-country routing: Collect data from platforms in all relevant countries for comprehensive route analysis
- Consistent collection: Reliable mobile proxies enable the ongoing schedule monitoring needed for performance tracking
Conclusion
Trade lane data collection and analysis is foundational for freight route optimization. In Southeast Asia’s complex shipping network, with its numerous ports, carriers, and routing options, systematic data collection provides the visibility needed to make optimal routing decisions.
DataResearchTools mobile proxies enable reliable collection from carrier websites, port authorities, and freight platforms across the region. By building a comprehensive trade lane database and applying optimization logic, logistics teams can identify the best routes for each shipment, balancing cost, speed, reliability, and capacity according to their specific priorities.
Start with your highest-volume trade lanes, build out your data collection and analysis capabilities, and progressively expand coverage to create a complete picture of available routing options across Southeast Asia.
Related Reading
- Best Proxies for Logistics and Supply Chain Data Collection
- Building a Delivery SLA Monitoring System with Proxies
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
last updated: April 3, 2026