Scraping Trade Lane Data for Freight Route Optimization

Trade lane data, the information about shipping routes between ports, airports, and logistics hubs, is fundamental to freight route optimization. Understanding which trade lanes offer the best combination of cost, speed, reliability, and capacity enables logistics teams to make routing decisions that save money and improve service quality.

In Southeast Asia, the number of trade lane options is enormous. With several major ports and airports in each country, numerous carriers on each route, and a choice between direct and transshipment services, the combinations for a single origin-destination pair can run into the hundreds. Systematic data collection and analysis is the only practical way to navigate this complexity.

Understanding Trade Lane Data

What Constitutes a Trade Lane

A trade lane is a specific shipping route between two points, characterized by:

  • Origin and destination: Ports, airports, or inland logistics hubs
  • Mode of transport: Ocean, air, road, rail, or multimodal
  • Service type: Direct, transshipment, consolidated, or express
  • Carriers operating the lane: Which shipping lines, airlines, or trucking companies serve the route
  • Frequency: How often services operate (daily, weekly, bi-weekly)
  • Transit time: Expected duration from origin to destination
  • Capacity: Available space/weight on each service
  • Cost: Current market rates for the trade lane

Key Trade Lanes in Southeast Asia

Major intra-Asia trade lanes include:

Ocean shipping:

  • Singapore to/from all major SEA ports (hub-and-spoke model)
  • Intra-SEA direct services (Bangkok-Jakarta, Ho Chi Minh-Manila)
  • China-SEA lanes (Shenzhen/Shanghai to Singapore, Bangkok, Jakarta)
  • Northeast Asia-SEA lanes (Busan/Tokyo to SEA ports)

Air cargo:

  • Singapore-Bangkok-Hong Kong corridor
  • SEA-China express cargo routes
  • Intra-SEA express delivery lanes

Road freight:

  • Thailand-Malaysia-Singapore corridor
  • Thailand-Cambodia-Vietnam routes
  • ASEAN highway network lanes

Data Sources for Trade Lane Intelligence

Carrier Schedules and Services

Ocean carriers publish sailing schedules showing:

  • Port rotation (sequence of ports called)
  • Vessel assignments
  • Transit times between each port pair
  • Service frequency
  • Connection options at transshipment ports

Major sources include Maersk Line, MSC, CMA CGM, Evergreen, ONE, Hapag-Lloyd, and regional carriers like PIL, RCL, and Wan Hai.

Airlines publish cargo flight schedules with:

  • Route and frequency
  • Aircraft type (indicating cargo capacity)
  • Connection options
  • Cut-off times

Trucking platforms show:

  • Available routes
  • Transit time estimates
  • Vehicle availability by route

Port and Terminal Data

  • PSA Singapore: Connectivity data showing which services call at Singapore and connection possibilities
  • Port authorities: Berth allocation schedules showing which vessels are calling
  • Terminal operators: Handling capacity and service information

Freight Forwarder Platforms

Digital forwarders and freight marketplaces such as Flexport and Freightos, along with regional forwarders, publish route options with comparative data on transit times and costs.

Why Proxies Are Needed for Trade Lane Data

Carrier Website Protections

Shipping line websites are increasingly protected against automated access:

  • Schedule lookup rate limiting: Carriers restrict the number of schedule queries per session
  • Point-to-point search restrictions: Repeated P2P schedule searches trigger bot detection
  • Dynamic content: Schedule data is often loaded through JavaScript-heavy interfaces
  • Geographic content serving: Some carriers show different schedule information based on user location

Multi-Source Collection Requirements

Comprehensive trade lane data requires collecting from dozens of carrier websites, port authorities, and freight platforms. The volume of requests needed to build a complete picture would quickly exhaust any single IP address’s access allowance.

DataResearchTools mobile proxies distribute these queries across many IPs, so each IP's request rate stays close to that of an ordinary visitor. Country-specific proxies also help ensure you receive the schedule and pricing information that carriers serve to users in that market.
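
The idea of spreading queries across IPs while capping the per-IP rate can be sketched as a small round-robin pool. This is a minimal illustration, not a provider API: the RotatingProxyPool class, its parameters, and the proxy URLs are all hypothetical, and the 5-second gap is an arbitrary placeholder.

```python
import itertools
import time


class RotatingProxyPool:
    """Round-robin over proxies with a minimum gap between uses of each one.

    Hypothetical helper for illustration; not part of any proxy provider's SDK.
    """

    def __init__(self, proxy_urls, min_interval_seconds=5.0, clock=time.monotonic):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._cycle = itertools.cycle(list(proxy_urls))
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._next_allowed = {}  # proxy URL -> earliest time it may be reused

    def next_proxy(self):
        """Return (proxy_url, seconds_to_wait_before_using_it)."""
        proxy = next(self._cycle)
        now = self._clock()
        wait = max(0.0, self._next_allowed.get(proxy, now) - now)
        # Reserve the slot: the proxy becomes free again min_interval after use.
        self._next_allowed[proxy] = now + wait + self._min_interval
        return proxy, wait


# Placeholder proxy URLs (assumptions, not real endpoints).
pool = RotatingProxyPool(
    ["http://proxy-1.example:8000", "http://proxy-2.example:8000"],
    min_interval_seconds=5.0,
)
proxy_url, wait = pool.next_proxy()
# A caller would time.sleep(wait) before sending a request through proxy_url.
```

With more proxies in the pool, the same total query volume translates into a lower request rate per IP, which is the effect described above.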

Building a Trade Lane Data Collection System

Data Model

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TradeLane:
    origin_port: str
    destination_port: str
    carrier: str
    service_name: str
    mode: str  # ocean, air, road
    service_type: str  # direct, transshipment, feeder
    transit_days: int
    frequency: str  # daily, weekly, bi-weekly
    vessel_or_flight: Optional[str] = None
    transshipment_ports: List[str] = field(default_factory=list)
    departure_day: Optional[str] = None  # day of week
    cut_off_days_before: int = 0
    capacity_teu: Optional[int] = None
    current_rate: Optional[float] = None
    rate_currency: Optional[str] = None
    collected_at: str = ""
    source: str = ""


@dataclass
class RouteOption:
    """A complete route from origin to destination, possibly multimodal."""
    origin: str
    destination: str
    legs: List[TradeLane]
    total_transit_days: int
    total_cost: Optional[float] = None
    cost_currency: Optional[str] = None
    transshipment_count: int = 0
    reliability_score: Optional[float] = None

Carrier Schedule Collection

import random
import time
from datetime import datetime, timezone

import requests


class CarrierScheduleCollector:
    """Collect sailing schedules from carrier websites."""

    def __init__(self, proxy_config):
        # proxy_config is expected to provide get_proxy(country, sticky=...)
        # and to return a requests-style proxies mapping.
        self.proxy_config = proxy_config

    def collect_schedules(self, carrier_config, origin, destination):
        """Collect schedule data from a carrier's website."""
        country = self._port_to_country(origin)
        proxy = self.proxy_config.get_proxy(country, sticky=True)

        headers = {
            "User-Agent": (
                "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/121.0.0.0 Mobile Safari/537.36"
            ),
            "Accept": "application/json",
        }
        params = {"origin": origin, "destination": destination, "weeks": 4}

        try:
            # The context manager closes the session's connections on exit.
            with requests.Session() as session:
                session.proxies = proxy
                session.headers.update(headers)
                response = session.get(
                    carrier_config["schedule_url"],
                    params=params,
                    timeout=30,
                )
                # Treat 4xx/5xx responses as errors instead of ignoring them.
                response.raise_for_status()
                return self._parse_schedule(
                    response.json(), carrier_config["name"],
                    origin, destination
                )
        except (requests.RequestException, ValueError) as e:
            # ValueError covers a response body that is not valid JSON.
            print(f"Schedule collection error for {carrier_config['name']}: {e}")

        return []

    def _parse_schedule(self, data, carrier, origin, destination):
        """Parse carrier schedule response into TradeLane objects."""
        lanes = []

        for service in data.get("services", []):
            transshipments = service.get("via_ports", [])
            lane = TradeLane(
                origin_port=origin,
                destination_port=destination,
                carrier=carrier,
                service_name=service.get("service_code", ""),
                mode="ocean",
                service_type=(
                    "direct" if not transshipments else "transshipment"
                ),
                transit_days=service.get("transit_days", 0),
                frequency=service.get("frequency", "weekly"),
                vessel_or_flight=service.get("vessel_name"),
                transshipment_ports=transshipments,
                departure_day=service.get("departure_day"),
                cut_off_days_before=service.get("cut_off_days", 0),
                capacity_teu=service.get("capacity"),
                # Timezone-aware UTC timestamp (utcnow() is deprecated).
                collected_at=datetime.now(timezone.utc).isoformat(),
                source=carrier,
            )
            lanes.append(lane)

        return lanes

    def _port_to_country(self, port_code):
        mapping = {
            "SGSIN": "sg", "THLCH": "th", "THBKK": "th",
            "IDJKT": "id", "IDSBY": "id", "VNSGN": "vn",
            "VNHPH": "vn", "PHMNL": "ph", "MYPKG": "my",
        }
        return mapping.get(port_code, "sg")


class MultiCarrierScheduleAggregator:
    """Aggregate schedules from multiple carriers for route comparison."""

    def __init__(self, collector, carrier_configs):
        self.collector = collector
        self.carrier_configs = carrier_configs

    def collect_all_schedules(self, origin, destination):
        """Collect schedules from all carriers for a port pair."""
        all_lanes = []

        for config in self.carrier_configs:
            lanes = self.collector.collect_schedules(
                config, origin, destination
            )
            all_lanes.extend(lanes)
            time.sleep(random.uniform(3, 7))

        return all_lanes
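
The collector above calls proxy_config.get_proxy(country, sticky=True) without defining that object. One possible shape for it is sketched below. Everything here is an assumption for illustration: the ProxyConfig class, the gateway host, and in particular the username suffix convention for country and session targeting, which differs between proxy providers and should be taken from the provider's own documentation.

```python
import random
import string


class ProxyConfig:
    """Hypothetical provider of per-country proxy settings for requests."""

    def __init__(self, gateway_host, port, username, password):
        self.gateway_host = gateway_host
        self.port = port
        self.username = username
        self.password = password

    def get_proxy(self, country, sticky=False):
        """Return a requests-style proxies dict for a country code."""
        # Assumed convention: targeting options are encoded in the username.
        user = f"{self.username}-country-{country}"
        if sticky:
            # A random session ID asks the gateway to keep the same exit IP.
            session_id = "".join(
                random.choices(string.ascii_lowercase + string.digits, k=8)
            )
            user += f"-session-{session_id}"
        url = f"http://{user}:{self.password}@{self.gateway_host}:{self.port}"
        return {"http": url, "https": url}


# Placeholder credentials and host (not real values).
proxy_config = ProxyConfig("gateway.example.com", 8000, "user", "secret")
proxies = proxy_config.get_proxy("sg")
```

An instance like proxy_config could then be passed to CarrierScheduleCollector, and the returned dict has the form that a requests session accepts for its proxies setting.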

Route Optimization

class RouteOptimizer:
    """Optimize freight routing using collected trade lane data."""

    def __init__(self, trade_lane_db):
        self.db = trade_lane_db

    def find_best_routes(
        self, origin, destination, criteria="balanced",
        max_transshipments=2
    ):
        """Find optimal routes between origin and destination."""
        # Get all direct lanes
        direct_lanes = self.db.get_lanes(origin, destination)

        # Get transshipment options
        transshipment_routes = self._find_transshipment_routes(
            origin, destination, max_transshipments
        )

        all_routes = []

        # Package direct lanes as routes
        for lane in direct_lanes:
            route = RouteOption(
                origin=origin,
                destination=destination,
                legs=[lane],
                total_transit_days=lane.transit_days,
                total_cost=lane.current_rate,
                cost_currency=lane.rate_currency,
                transshipment_count=0,
            )
            all_routes.append(route)

        # Add transshipment routes
        all_routes.extend(transshipment_routes)

        # Score and rank routes
        scored_routes = self._score_routes(all_routes, criteria)

        return scored_routes

    def _find_transshipment_routes(
        self, origin, destination, max_ts
    ):
        """Find routes with transshipment connections."""
        routes = []

        # Only single-hub routes are built here, so max_ts acts as an
        # on/off switch: values above 1 do not add multi-hub routes.
        if max_ts < 1:
            return routes

        # Common transshipment hubs for SEA cargo
        hubs = ["SGSIN", "MYPKG", "THLCH", "CNSHA"]
        # Fixed buffer for the connection at the hub. Sailing dates are not
        # compared, so this is an estimate rather than a feasibility check.
        min_connection_days = 2

        for hub in hubs:
            if hub == origin or hub == destination:
                continue

            first_legs = self.db.get_lanes(origin, hub)
            second_legs = self.db.get_lanes(hub, destination)

            for leg1 in first_legs:
                for leg2 in second_legs:
                    total_transit = (
                        leg1.transit_days
                        + min_connection_days
                        + leg2.transit_days
                    )

                    # Sum rates only when both are known and share a currency.
                    total_cost = None
                    if (
                        leg1.current_rate is not None
                        and leg2.current_rate is not None
                        and leg1.rate_currency == leg2.rate_currency
                    ):
                        total_cost = leg1.current_rate + leg2.current_rate

                    route = RouteOption(
                        origin=origin,
                        destination=destination,
                        legs=[leg1, leg2],
                        total_transit_days=total_transit,
                        total_cost=total_cost,
                        cost_currency=(
                            leg1.rate_currency
                            if total_cost is not None else None
                        ),
                        transshipment_count=1,
                    )
                    routes.append(route)

        return routes

    def _score_routes(self, routes, criteria):
        """Score routes based on optimization criteria.

        The score is stored in reliability_score, which is reused here as a
        generic ranking value (higher is better) for every criterion.
        """
        if not routes:
            return []

        for route in routes:
            if criteria == "fastest":
                route.reliability_score = 100 - route.total_transit_days
            elif criteria == "cheapest":
                # Routes without a known cost sort last.
                route.reliability_score = (
                    -route.total_cost
                    if route.total_cost is not None else -float("inf")
                )
            elif criteria == "balanced":
                # Balance of cost and speed; an unknown cost is treated
                # as 5000, which yields a cost score of zero.
                cost = (
                    route.total_cost
                    if route.total_cost is not None else 5000
                )
                transit_score = max(0, 50 - route.total_transit_days * 2)
                cost_score = max(0, 50 - cost / 100)
                ts_penalty = route.transshipment_count * 5
                route.reliability_score = (
                    transit_score + cost_score - ts_penalty
                )
            elif criteria == "reliable":
                # Prefer direct routes with shorter transit times
                route.reliability_score = (
                    50
                    - route.transshipment_count * 20
                    - route.total_transit_days
                )
            else:
                raise ValueError(f"Unknown criteria: {criteria!r}")

        routes.sort(key=lambda r: r.reliability_score, reverse=True)
        return routes

    def compare_trade_lanes(self, origin, destination):
        """Generate a comprehensive trade lane comparison."""
        lanes = self.db.get_lanes(origin, destination)

        if not lanes:
            return {"message": "No trade lanes found"}

        comparison = {
            "origin": origin,
            "destination": destination,
            "total_options": len(lanes),
            "direct_services": sum(
                1 for l in lanes if l.service_type == "direct"
            ),
            "transshipment_services": sum(
                1 for l in lanes if l.service_type == "transshipment"
            ),
            # Sorted so the carrier list is stable between runs
            "carriers": sorted(set(l.carrier for l in lanes)),
            "transit_time_range": {
                "min": min(l.transit_days for l in lanes),
                "max": max(l.transit_days for l in lanes),
                "avg": round(
                    sum(l.transit_days for l in lanes) / len(lanes), 1
                ),
            },
            "services": [
                {
                    "carrier": l.carrier,
                    "service": l.service_name,
                    "type": l.service_type,
                    "transit_days": l.transit_days,
                    "frequency": l.frequency,
                    "via": l.transshipment_ports,
                    "rate": l.current_rate,
                }
                for l in sorted(lanes, key=lambda x: x.transit_days)
            ],
        }

        return comparison
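
RouteOptimizer only requires that trade_lane_db offers a get_lanes(origin, destination) lookup. A minimal in-memory stand-in is sketched below, assuming lanes are objects with origin_port and destination_port attributes. The InMemoryTradeLaneDB class and its add_lane method are hypothetical, and SimpleNamespace objects with made-up values stand in for TradeLane to keep the example self-contained.

```python
from collections import defaultdict
from types import SimpleNamespace


class InMemoryTradeLaneDB:
    """Hypothetical in-memory store exposing a get_lanes() lookup."""

    def __init__(self):
        # (origin_port, destination_port) -> list of lane objects
        self._lanes = defaultdict(list)

    def add_lane(self, lane):
        """Index a lane by its origin and destination port codes."""
        self._lanes[(lane.origin_port, lane.destination_port)].append(lane)

    def get_lanes(self, origin, destination):
        """Return a copy of the lanes stored for a port pair."""
        return list(self._lanes.get((origin, destination), []))


db = InMemoryTradeLaneDB()
# Illustrative lanes with invented carriers and transit times.
db.add_lane(SimpleNamespace(
    origin_port="THLCH", destination_port="SGSIN",
    carrier="Carrier A", transit_days=4,
))
db.add_lane(SimpleNamespace(
    origin_port="SGSIN", destination_port="IDJKT",
    carrier="Carrier B", transit_days=3,
))

first_leg_options = db.get_lanes("THLCH", "SGSIN")
no_direct_options = db.get_lanes("THLCH", "IDJKT")
```

A production system would more likely back this lookup with a real database, but the same two-argument interface is all the optimizer depends on.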

Historical Performance Analysis

def analyze_lane_performance(schedule_db, actual_tracking_db, lane):
    """Analyze actual vs scheduled performance for a trade lane."""
    # Get scheduled transit times
    scheduled = schedule_db.get_lane_history(
        lane.origin_port, lane.destination_port,
        lane.carrier, lane.service_name
    )

    # Get actual arrival data
    actuals = actual_tracking_db.get_arrivals(
        lane.origin_port, lane.destination_port,
        lane.carrier, lane.service_name
    )

    if not actuals:
        return None

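    # Delay per sailing = actual transit days minus that sailing's schedule.
    delays = []
    actual_transit = []
    for actual in actuals:
        scheduled_days = actual["scheduled_transit_days"]
        actual_days = actual["actual_transit_days"]
        delays.append(actual_days - scheduled_days)
        actual_transit.append(actual_days)

    return {
        "lane": f"{lane.origin_port}-{lane.destination_port}",
        "carrier": lane.carrier,
        "service": lane.service_name,
        "scheduled_transit_days": lane.transit_days,
        # Average the recorded transit times directly; the schedule of a
        # past sailing may differ from the lane's current transit_days.
        "avg_actual_days": round(
            sum(actual_transit) / len(actual_transit), 1
        ),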
        "avg_delay_days": round(sum(delays) / len(delays), 1),
        "on_time_pct": round(
            sum(1 for d in delays if d <= 0) / len(delays) * 100, 1
        ),
        "max_delay_days": max(delays),
        "schedule_reliability": round(
            sum(1 for d in delays if abs(d) <= 1) / len(delays) * 100, 1
        ),
        "sailings_analyzed": len(delays),
    }
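
To make the metrics concrete, the same delay arithmetic can be applied to a small sample. The summarize_delays helper and the six delay values below are invented for illustration only; they are not measurements from any carrier.

```python
def summarize_delays(delays):
    """Summarize per-sailing delays (actual minus scheduled transit days)."""
    if not delays:
        return None
    n = len(delays)
    return {
        "avg_delay_days": round(sum(delays) / n, 1),
        # On time: arrived on or before the scheduled day.
        "on_time_pct": round(sum(1 for d in delays if d <= 0) / n * 100, 1),
        # Schedule reliability: arrived within one day of schedule.
        "schedule_reliability": round(
            sum(1 for d in delays if abs(d) <= 1) / n * 100, 1
        ),
        "max_delay_days": max(delays),
    }


# Made-up delays, in days, for six sailings.
sample_delays = [0, -1, 2, 0, 5, 1]
summary = summarize_delays(sample_delays)
print(summary)
```

For this sample, three of six sailings arrive on or before schedule (50.0% on time), four of six land within one day of schedule (66.7%), and the average delay is 1.2 days. The gap between the two percentages shows why the definition of "on time" should be fixed before comparing carriers.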

DataResearchTools for Trade Lane Data Collection

DataResearchTools mobile proxies support trade lane data collection with:

  • Carrier website access: Reliably access schedule pages of all major carriers serving SEA trade lanes
  • Port authority data: Access berth schedules and vessel call information from SEA port authorities
  • Multi-country routing: Collect data from platforms in all relevant countries for comprehensive route analysis
  • Consistent collection: Reliable mobile proxies enable the ongoing schedule monitoring needed for performance tracking

Conclusion

Trade lane data collection and analysis is foundational for freight route optimization. In Southeast Asia’s complex shipping network, with its numerous ports, carriers, and routing options, systematic data collection provides the visibility needed to make optimal routing decisions.

DataResearchTools mobile proxies enable reliable collection from carrier websites, port authorities, and freight platforms across the region. By building a comprehensive trade lane database and applying optimization logic, logistics teams can identify the best routes for each shipment, balancing cost, speed, reliability, and capacity according to their specific priorities.

Start with your highest-volume trade lanes, build out your data collection and analysis capabilities, and progressively expand coverage to create a complete picture of available routing options across Southeast Asia.


last updated: April 3, 2026
