Building a Freight Rate Comparison Engine with Proxy Infrastructure

Freight rate comparison is one of the most valuable applications in logistics technology. Shippers spend hours manually checking rates across carriers and platforms, often settling for suboptimal pricing because they lack visibility into the full market. A freight rate comparison engine that automatically aggregates data from multiple sources can eliminate most of that manual checking and expose lower-cost options that would otherwise go unnoticed.

Building such an engine requires solving two fundamental challenges: reliably collecting rate data from multiple protected platforms, and normalizing that data into a format that enables meaningful comparison. This guide covers both, with a focus on the proxy infrastructure that makes large-scale rate collection possible.

What a Freight Rate Comparison Engine Does

A freight rate comparison engine collects, normalizes, and presents freight rates from multiple sources so users can quickly identify the best shipping options for their specific needs. The most useful engines go beyond simple price comparison to include:

Multi-modal comparison: Ocean, air, road, and rail rates for the same origin-destination pair
Total cost calculation: Base rates plus surcharges, fees, and accessorial charges
Transit time comparison: Balancing cost against speed
Service level comparison: Direct versus transshipment, guaranteed versus standard
Historical pricing: Current rates in context of recent trends
Rate validity: When quotes expire and need refreshing

Market Context

Several commercial freight rate comparison platforms exist, including Freightos, Shifl, Flexport’s platform, and Cargobase. However, these platforms each have different carrier coverage, and none provides complete market visibility. Building your own comparison engine, even if supplementary to these platforms, gives you control over the data sources, update frequency, and analysis capabilities.

Architecture of a Rate Comparison Engine

System Components

                    +-------------------------+
                    |  Data Sources           |
                    |  (Carrier portals,      |
                    |   Freight platforms,    |
                    |   Rate APIs)            |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Proxy Layer            |
                    |  (DataResearchTools     |
                    |   Mobile Proxies)       |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Collection Layer       |
                    |  (Scrapers, API         |
                    |   clients, parsers)     |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Normalization          |
                    |  (Currency, units,      |
                    |   fee structures)       |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Storage Layer          |
                    |  (PostgreSQL,           |
                    |   time-series data)     |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Comparison Engine      |
                    |  (Ranking, filtering,   |
                    |   analytics)            |
                    +------------+------------+
                                 |
                    +------------v------------+
                    |  Presentation Layer     |
                    |  (Dashboard, API,       |
                    |   alerts)               |
                    +-------------------------+

Technology Stack Recommendations

Collection: Python with Requests, Scrapy, and Playwright for JavaScript-heavy sites
Proxy management: DataResearchTools mobile proxies with country-specific endpoints
Database: PostgreSQL with TimescaleDB extension for time-series rate data
Normalization: Python data processing with pandas
API: FastAPI or Flask for serving comparison results
Dashboard: Grafana or custom React dashboard

Setting Up the Proxy Infrastructure

Why Mobile Proxies Are Critical

Rate comparison engines need to access dozens of platforms repeatedly. The key challenges include:

Scale: Collecting rates for hundreds of route-carrier combinations daily
Geographic accuracy: Getting locally accurate pricing from each platform
Reliability: Maintaining consistent access without interruptions from blocking
Speed: Collecting fresh data fast enough to be useful for decision-making

DataResearchTools mobile proxies address all four challenges:

Scale: Automatic IP rotation distributes requests across large pools of mobile IPs
Geographic accuracy: Country-specific endpoints in Singapore, Thailand, Indonesia, Vietnam, Philippines, and Malaysia
Reliability: Mobile IPs are shared by many real subscribers through carrier-grade NAT, so platforms rarely block them outright
Speed: Low-latency connections through local mobile carriers

Proxy Configuration

from urllib.parse import quote

class ProxyPool:
    """Manage proxy connections for rate collection across platforms."""

    def __init__(self, config):
        self.config = config
        self.session_counter = 0

    def get_proxy(self, country="sg", sticky=False):
        """
        Get a proxy connection.

        Args:
            country: Two-letter country code
            sticky: If True, maintain same IP for session duration
        """
        # Percent-encode credentials so characters such as "@" or ":"
        # cannot corrupt the proxy URL
        user = quote(self.config["user"], safe="")
        password = quote(self.config["pass"], safe="")
        base = f"http://{user}:{password}"
        endpoint = f"@{country}.dataresearchtools.com:{self.config['port']}"

        if sticky:
            # Sticky-session syntax is provider-specific; confirm the exact
            # format against your proxy provider's documentation
            self.session_counter += 1
            endpoint += f"?session=rate_{self.session_counter}"

        proxy_url = base + endpoint
        return {"http": proxy_url, "https": proxy_url}

    def get_rotating_proxy(self, country="sg"):
        """Get a proxy that rotates IP on each request."""
        return self.get_proxy(country, sticky=False)

    def get_sticky_proxy(self, country="sg"):
        """Get a proxy that maintains the same IP for the session."""
        return self.get_proxy(country, sticky=True)

Building the Collection Layer

Platform Adapters

Create modular adapters for each data source:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime

import requests

@dataclass
class RawRate:
    """Raw rate data as collected from source."""
    source: str
    carrier: str
    origin_port: str
    destination_port: str
    container_type: str
    base_rate: float
    currency: str
    surcharges: dict
    transit_days: int
    service_type: str
    valid_from: str
    valid_to: str
    collected_at: str

class PlatformAdapter(ABC):
    """Base class for platform-specific rate collectors."""

    def __init__(self, proxy_pool):
        self.proxy_pool = proxy_pool
        self.session = requests.Session()

    @abstractmethod
    def collect_rates(self, origin, destination, container_type) -> List[RawRate]:
        pass

    @abstractmethod
    def get_supported_routes(self) -> List[dict]:
        pass

    def _setup_session(self, country):
        """Configure session with proxy and headers."""
        self.session.proxies = self.proxy_pool.get_proxy(country)
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
                "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36"
            ),
            "Accept": "application/json, text/html",
            "Accept-Language": "en-US,en;q=0.9",
        })


class CarrierPortalAdapter(PlatformAdapter):
    """Collect rates from individual carrier portals."""

    def __init__(self, proxy_pool, carrier_config):
        super().__init__(proxy_pool)
        self.carrier_config = carrier_config

    def collect_rates(self, origin, destination, container_type):
        """Collect rates from carrier's rate inquiry page."""
        country = self._origin_to_country(origin)
        self._setup_session(country)

        rates = []
        try:
            # Example: POST to carrier's rate API
            response = self.session.post(
                self.carrier_config["rate_url"],
                json={
                    "pol": origin,
                    "pod": destination,
                    "equipment": container_type,
                    "date": datetime.now().strftime("%Y-%m-%d"),
                },
                timeout=30,
            )
            if response.status_code == 200:
                data = response.json()
                for quote in data.get("rates", []):
                    rate = RawRate(
                        source=self.carrier_config["name"],
                        carrier=self.carrier_config["name"],
                        origin_port=origin,
                        destination_port=destination,
                        container_type=container_type,
                        base_rate=quote["amount"],
                        currency=quote["currency"],
                        surcharges=self._extract_surcharges(quote),
                        transit_days=quote.get("transit_time", 0),
                        service_type=quote.get("service", "standard"),
                        valid_from=quote.get("valid_from", ""),
                        valid_to=quote.get("valid_to", ""),
                        collected_at=datetime.utcnow().isoformat(),
                    )
                    rates.append(rate)
        except Exception as e:
            print(f"Error collecting from {self.carrier_config['name']}: {e}")

        return rates

    def _extract_surcharges(self, quote):
        """Extract surcharge breakdown from quote data."""
        surcharges = {}
        for charge in quote.get("charges", []):
            if charge["type"] != "base_freight":
                surcharges[charge["type"]] = charge["amount"]
        return surcharges

    def get_supported_routes(self):
        return self.carrier_config.get("routes", [])

    def _origin_to_country(self, port_code):
        """Map port code to country for proxy selection."""
        port_country = {
            "SGSIN": "sg", "THBKK": "th", "THLCH": "th",
            "IDJKT": "id", "IDSBY": "id", "VNSGN": "vn",
            "VNHPH": "vn", "PHMNL": "ph", "PHCEB": "ph",
            "MYPKG": "my", "MYPEN": "my",
        }
        return port_country.get(port_code, "sg")
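
The adapter above logs any failure and returns an empty list, so a transient timeout looks the same as a route with no rates. One option is to retry the HTTP call with exponential backoff before giving up. The following is a minimal sketch under that assumption, not part of the original design: retry_with_backoff, its parameters, and the injectable sleep argument are hypothetical names introduced for illustration.

```python
import random
import time


def retry_with_backoff(func, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted.

    Hypothetical helper: after the n-th failure (counting from 0) it waits
    base_delay * 2**n seconds plus up to one second of random jitter, and it
    re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # narrow this to network errors in practice
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise last_error


# Example: a stand-in fetch that fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return ["rate"]


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_fetch, sleep=delays.append)
```

In an adapter, the wrapped callable would be the request itself, for example a small function that posts to the carrier's rate URL and raises on a non-200 status.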

Collection Orchestrator

Coordinate collection across multiple platforms:

import time
import random
from datetime import datetime

class CollectionOrchestrator:
    """Orchestrate rate collection across all platforms."""

    def __init__(self, adapters, rate_store):
        self.adapters = adapters
        self.rate_store = rate_store

    def collect_all_rates(self, routes):
        """Collect rates from all adapters for specified routes."""
        collection_id = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
        total_collected = 0

        for adapter in self.adapters:
            adapter_name = adapter.__class__.__name__
            print(f"Collecting from {adapter_name}...")

            for route in routes:
                try:
                    rates = adapter.collect_rates(
                        route["origin"],
                        route["destination"],
                        route["container_type"],
                    )
                    if rates:
                        self.rate_store.save_rates(rates, collection_id)
                        total_collected += len(rates)
                        print(
                            f"  {route['origin']}->{route['destination']}: "
                            f"{len(rates)} rates"
                        )

                except Exception as e:
                    print(
                        f"  Error on {route['origin']}->"
                        f"{route['destination']}: {e}"
                    )
                finally:
                    # Delay between requests, even after a failure, so
                    # errors do not trigger a burst of rapid requests
                    time.sleep(random.uniform(3, 6))

        print(f"Collection complete: {total_collected} rates collected")
        return collection_id

Building the Normalization Layer

Currency Normalization

Rates from different sources come in different currencies. Normalize to a common currency:

class CurrencyNormalizer:
    """Normalize all rates to a common currency."""

    def __init__(self, base_currency="USD"):
        self.base_currency = base_currency
        self.exchange_rates = self._load_exchange_rates()

    def _load_exchange_rates(self):
        """Load current exchange rates."""
        # In production, fetch from a currency API
        return {
            "USD": 1.0,
            "SGD": 0.74,
            "THB": 0.028,
            "IDR": 0.000063,
            "VND": 0.000040,
            "PHP": 0.018,
            "MYR": 0.22,
            "EUR": 1.08,
            "CNY": 0.14,
        }

    def convert(self, amount, from_currency):
        """Convert amount to base currency."""
        rate = self.exchange_rates.get(from_currency.upper())
        if rate is None:
            raise ValueError(f"Unknown currency: {from_currency}")
        return round(amount * rate, 2)
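
The hardcoded table above is only a placeholder, and stale exchange rates skew every comparison. As a rough sketch of one way to keep them fresh, the class below refreshes its rates once they are older than a maximum age. CachedExchangeRates and its fetch argument are hypothetical: fetch stands in for whichever currency API client you choose, and no particular provider is assumed.

```python
from datetime import datetime, timedelta, timezone


class CachedExchangeRates:
    """Hold exchange rates and refresh them once they exceed max_age.

    Hypothetical helper: fetch is any zero-argument callable returning a
    dict such as {"SGD": 0.74} (value of one unit in the base currency).
    """

    def __init__(self, fetch, max_age=timedelta(days=1)):
        self._fetch = fetch
        self._max_age = max_age
        self._rates = {}
        self._loaded_at = None

    def get(self, currency, now=None):
        """Return the rate for a currency, refreshing the cache if stale."""
        now = now or datetime.now(timezone.utc)
        if self._loaded_at is None or now - self._loaded_at > self._max_age:
            self._rates = dict(self._fetch())
            self._loaded_at = now
        try:
            return self._rates[currency.upper()]
        except KeyError:
            raise ValueError(f"Unknown currency: {currency}") from None


# Example with a stub fetcher that counts how often it is called
fetch_calls = []


def stub_fetch():
    fetch_calls.append(1)
    return {"USD": 1.0, "SGD": 0.74}


rates = CachedExchangeRates(stub_fetch)
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
sgd_first = rates.get("sgd", now=start)                          # triggers a fetch
sgd_cached = rates.get("SGD", now=start + timedelta(hours=12))   # served from cache
sgd_refreshed = rates.get("SGD", now=start + timedelta(days=2))  # cache is stale, fetches again
```

Passing now explicitly keeps the example deterministic; in normal use it can be omitted.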

Total Cost Calculation

Different carriers include different fees in their base rates. Calculate comparable total costs:

class TotalCostCalculator:
    """Calculate total shipping cost from base rate and surcharges."""

    # Common surcharge types that should be included in total cost
    INCLUDED_SURCHARGES = [
        "baf", "bas", "fuel_surcharge",  # Fuel-related
        "thc_origin", "thc_destination",  # Terminal handling
        "caf",  # Currency adjustment
        "pss", "gri",  # Peak season / general rate increase
        "isps",  # Security
        "doc_fee",  # Documentation
    ]

    def calculate_total(self, raw_rate, currency_normalizer):
        """Calculate normalized total cost for comparison."""
        # Convert base rate to USD
        base_usd = currency_normalizer.convert(
            raw_rate.base_rate, raw_rate.currency
        )

        # Add all applicable surcharges
        surcharge_total = 0
        for charge_type, amount in raw_rate.surcharges.items():
            if charge_type.lower() in self.INCLUDED_SURCHARGES:
                surcharge_total += currency_normalizer.convert(
                    amount, raw_rate.currency
                )

        return {
            "base_rate_usd": base_usd,
            "surcharges_usd": round(surcharge_total, 2),
            "total_usd": round(base_usd + surcharge_total, 2),
            "surcharge_breakdown": raw_rate.surcharges,
        }
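
The comparison engine in the next section works on dictionaries that carry total_usd and transit_days, so something has to join each raw rate with its computed totals. The sketch below shows one possible shape for that step. normalize_rate is a hypothetical helper, the SimpleNamespace object is only a stand-in for a RawRate instance, and the numbers are illustrative.

```python
from types import SimpleNamespace


def normalize_rate(raw, fx_to_usd, included_surcharges):
    """Flatten one raw rate into a dict ready for ranking.

    Hypothetical helper. It assumes, as the calculator above does, that
    surcharges are quoted in the same currency as the base rate.
    """
    fx = fx_to_usd[raw.currency.upper()]
    base_usd = round(raw.base_rate * fx, 2)
    surcharges_usd = round(
        sum(
            amount * fx
            for name, amount in raw.surcharges.items()
            if name.lower() in included_surcharges
        ),
        2,
    )
    return {
        "source": raw.source,
        "carrier": raw.carrier,
        "transit_days": raw.transit_days,
        "base_rate_usd": base_usd,
        "surcharges_usd": surcharges_usd,
        "total_usd": round(base_usd + surcharges_usd, 2),
    }


# Stand-in for a RawRate instance, with illustrative numbers only
raw = SimpleNamespace(
    source="carrier_x",
    carrier="carrier_x",
    currency="SGD",
    base_rate=1000.0,
    transit_days=12,
    surcharges={"baf": 100.0, "thc_origin": 50.0, "insurance": 25.0},
)
normalized = normalize_rate(raw, {"SGD": 0.74}, {"baf", "thc_origin"})
```

Here the insurance charge is left out of the total because it is not in the included set, which mirrors how INCLUDED_SURCHARGES filters charges above.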

Building the Comparison Engine

Rate Ranking

Rank collected rates by multiple criteria:

class RateComparisonEngine:
    """Compare and rank freight rates across sources."""

    def compare_rates(self, normalized_rates, sort_by="total_cost"):
        """
        Compare rates for a specific route and return ranked results.

        sort_by options: total_cost, transit_time, cost_per_day
        """
        # Calculate cost-per-transit-day for value comparison
        for rate in normalized_rates:
            if rate["transit_days"] > 0:
                rate["cost_per_day"] = round(
                    rate["total_usd"] / rate["transit_days"], 2
                )
            else:
                rate["cost_per_day"] = float("inf")

        # Sort based on criteria
        sort_keys = {
            "total_cost": lambda r: r["total_usd"],
            "transit_time": lambda r: r["transit_days"],
            "cost_per_day": lambda r: r["cost_per_day"],
        }

        sorted_rates = sorted(
            normalized_rates,
            key=sort_keys.get(sort_by, sort_keys["total_cost"])
        )

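        # Add ranking metadata. Compare against the cheapest rate overall,
        # which is not necessarily first when sorting by another criterion.
        if sorted_rates:
            cheapest = min(r["total_usd"] for r in sorted_rates)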
            for i, rate in enumerate(sorted_rates):
                rate["rank"] = i + 1
                rate["vs_cheapest_pct"] = round(
                    (rate["total_usd"] / cheapest - 1) * 100, 1
                ) if cheapest > 0 else 0

        return sorted_rates

    def find_best_value(self, normalized_rates):
        """Find the rate with the best balance of cost and speed."""
        if not normalized_rates:
            return None

        # Score each rate (lower is better)
        # Normalize cost and time to 0-1 scale
        costs = [r["total_usd"] for r in normalized_rates]
        times = [r["transit_days"] for r in normalized_rates]
        min_cost, max_cost = min(costs), max(costs)
        min_time, max_time = min(times), max(times)

        cost_range = max_cost - min_cost if max_cost > min_cost else 1
        time_range = max_time - min_time if max_time > min_time else 1

        for rate in normalized_rates:
            cost_score = (rate["total_usd"] - min_cost) / cost_range
            time_score = (rate["transit_days"] - min_time) / time_range
            # 60% weight on cost, 40% on time
            rate["value_score"] = round(cost_score * 0.6 + time_score * 0.4, 3)

        return min(normalized_rates, key=lambda r: r["value_score"])
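
To see how the 60/40 weighting affects the outcome, it can help to run the same min-max scoring on a few made-up rates with different weights. The sketch below repeats the scoring logic in a standalone function; value_scores, its cost_weight parameter, and the sample carriers are hypothetical and exist only for this illustration.

```python
def value_scores(rates, cost_weight=0.6):
    """Return {carrier: score} using min-max scaling; lower is better."""
    time_weight = 1 - cost_weight
    costs = [r["total_usd"] for r in rates]
    times = [r["transit_days"] for r in rates]
    # Fall back to 1 when all values are equal to avoid dividing by zero
    cost_range = (max(costs) - min(costs)) or 1
    time_range = (max(times) - min(times)) or 1

    scores = {}
    for r in rates:
        cost_score = (r["total_usd"] - min(costs)) / cost_range
        time_score = (r["transit_days"] - min(times)) / time_range
        scores[r["carrier"]] = round(
            cost_score * cost_weight + time_score * time_weight, 3
        )
    return scores


# Illustrative rates: A is cheapest and slowest, C is fastest and priciest
sample = [
    {"carrier": "A", "total_usd": 1000, "transit_days": 20},
    {"carrier": "B", "total_usd": 1100, "transit_days": 14},
    {"carrier": "C", "total_usd": 1500, "transit_days": 10},
]

cost_focused = value_scores(sample, cost_weight=0.6)
time_focused = value_scores(sample, cost_weight=0.2)
best_cost_focused = min(cost_focused, key=cost_focused.get)
best_time_focused = min(time_focused, key=time_focused.get)
```

With the default weights the mid-priced carrier B scores best, while shifting most of the weight to transit time makes the fastest carrier C the winner. Exposing the weight as a parameter lets users tune the trade-off to their own priorities.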

Historical Trend Analysis

Compare current rates against historical data:

import pandas as pd

def analyze_rate_trends(db, origin, destination, container_type, days=90):
    """Analyze rate trends over a specified period."""
    query = """
        SELECT carrier, collected_at::date as date,
               AVG(total_usd) as avg_rate,
               MIN(total_usd) as min_rate,
               MAX(total_usd) as max_rate
        FROM normalized_rates
        WHERE origin_port = %s AND destination_port = %s
        AND container_type = %s
        AND collected_at >= NOW() - %s * INTERVAL '1 day'
        GROUP BY carrier, collected_at::date
        ORDER BY date
    """
    df = pd.read_sql(query, db, params=[
        origin, destination, container_type, days
    ])

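    # Calculate moving averages over the last 7 and 30 daily rows per
    # carrier (values are NaN until enough rows have accumulated)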
    for carrier in df["carrier"].unique():
        mask = df["carrier"] == carrier
        df.loc[mask, "ma_7d"] = (
            df.loc[mask, "avg_rate"].rolling(7).mean()
        )
        df.loc[mask, "ma_30d"] = (
            df.loc[mask, "avg_rate"].rolling(30).mean()
        )

    return df
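
A moving average is most useful when the current rate is compared against it. As a small, dependency-free sketch of that idea, the function below reports how far the latest rate sits above or below the average of the preceding days. pct_vs_trailing_average and its window parameter are hypothetical names, and the history values are made up.

```python
from statistics import mean


def pct_vs_trailing_average(daily_rates, window=7):
    """Compare the latest rate with the mean of the preceding window values.

    Hypothetical helper: daily_rates is a list of numbers ordered oldest to
    newest. Returns the percentage difference, or None when there is not
    enough history or the baseline is zero.
    """
    if len(daily_rates) < window + 1:
        return None
    # Baseline excludes the latest value so it does not dilute the signal
    baseline = mean(daily_rates[-(window + 1):-1])
    if baseline == 0:
        return None
    return round((daily_rates[-1] / baseline - 1) * 100, 1)


# Seven flat days followed by a jump on the latest day
history = [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1100]
change = pct_vs_trailing_average(history)
too_short = pct_vs_trailing_average([1000, 1100])
```

A positive result means the latest rate is above its recent average, which could feed an alert or a "wait for a better rate" hint in the dashboard.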

API for Serving Comparison Results

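from datetime import datetime

from fastapi import FastAPI, Query

# rate_store, normalizer, and comparison_engine are assumed to be
# module-level instances created at application startup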

app = FastAPI(title="Freight Rate Comparison API")

@app.get("/api/rates/compare")
async def compare_rates(
    origin: str = Query(..., description="Origin port code (e.g., SGSIN)"),
    destination: str = Query(..., description="Destination port code"),
    container_type: str = Query("40HC", description="Container type"),
    sort_by: str = Query("total_cost", description="Sort criteria"),
):
    """Compare current freight rates across all collected sources."""
    rates = rate_store.get_latest_rates(origin, destination, container_type)
    normalized = [normalizer.normalize(r) for r in rates]
    compared = comparison_engine.compare_rates(normalized, sort_by)

    return {
        "route": f"{origin} -> {destination}",
        "container_type": container_type,
        "rates_found": len(compared),
        "rates": compared,
        "best_value": comparison_engine.find_best_value(normalized),
        "collected_at": datetime.utcnow().isoformat(),
    }

Scheduling and Maintenance

Collection Schedule

Set up automated collection to keep data fresh:

Major trade lanes: Collect twice daily (morning and evening)
Secondary routes: Collect daily
Surcharge updates: Monitor carrier announcement pages weekly
Exchange rates: Update daily
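
The route schedule above can be expressed as a small lookup that decides which routes are due on each scheduler run. This is only a sketch under stated assumptions: the tier names, the COLLECTION_INTERVALS mapping, and routes_due are hypothetical, and "twice daily" is approximated as a 12-hour interval.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tier names mapped to the intervals listed above
COLLECTION_INTERVALS = {
    "major": timedelta(hours=12),    # major trade lanes: twice daily
    "secondary": timedelta(days=1),  # secondary routes: daily
}


def routes_due(routes, last_collected, now=None):
    """Return the routes whose tier interval has elapsed.

    routes: list of dicts with "origin", "destination", and "tier" keys
    last_collected: dict mapping (origin, destination) to an aware datetime
    """
    now = now or datetime.now(timezone.utc)
    due = []
    for route in routes:
        key = (route["origin"], route["destination"])
        last = last_collected.get(key)
        interval = COLLECTION_INTERVALS[route["tier"]]
        # Routes that have never been collected are always due
        if last is None or now - last >= interval:
            due.append(route)
    return due


now = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)
routes = [
    {"origin": "SGSIN", "destination": "THBKK", "tier": "major"},
    {"origin": "IDJKT", "destination": "PHMNL", "tier": "secondary"},
]
last_collected = {
    ("SGSIN", "THBKK"): now - timedelta(hours=13),
    ("IDJKT", "PHMNL"): now - timedelta(hours=13),
}
due = routes_due(routes, last_collected, now=now)
```

Both routes were last collected 13 hours ago, so only the major lane is due. A cron job or task queue could call a function like this every hour and pass the result to the orchestrator.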

Data Quality Monitoring

Monitor the health of your collection pipeline:

import pandas as pd

def check_collection_health(db, alert_threshold_hours=24):
    """Check if all sources are collecting data within expected timeframes."""
    # No time filter in WHERE: a source that has gone silent must still
    # appear in the results so it can be flagged
    query = """
        SELECT source, MAX(collected_at) as last_collection,
               COUNT(*) FILTER (
                   WHERE collected_at >= NOW() - INTERVAL '24 hours'
               ) as rates_last_24h
        FROM raw_rates
        GROUP BY source
    """
    results = pd.read_sql(query, db)

    now = pd.Timestamp.now(tz="UTC")
    alerts = []
    for _, row in results.iterrows():
        last = pd.Timestamp(row["last_collection"])
        if last.tzinfo is None:
            # Treat naive timestamps as UTC
            last = last.tz_localize("UTC")
        hours_since = (now - last).total_seconds() / 3600

        if hours_since > alert_threshold_hours:
            alerts.append({
                "source": row["source"],
                "hours_since_last": round(hours_since, 1),
                "severity": "high",
            })

    return alerts
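
Freshness checks catch collectors that have gone quiet, but a recently collected quote can still be past its validity date. The sketch below shows one way to flag that, assuming valid_to holds an ISO 8601 date string; real sources may use other formats, and is_quote_current is a hypothetical helper name.

```python
from datetime import date


def is_quote_current(valid_to, today=None):
    """Return True or False when the expiry is known, otherwise None.

    Assumes valid_to is an ISO 8601 date or datetime string such as
    "2024-03-31". An empty string (the adapter's default when the source
    gives no expiry) or an unparseable value yields None.
    """
    today = today or date.today()
    if not valid_to:
        return None
    try:
        # Keep only the date part so datetime strings are accepted too
        expiry = date.fromisoformat(valid_to[:10])
    except ValueError:
        return None
    return expiry >= today


today = date(2024, 3, 1)
still_valid = is_quote_current("2024-03-31", today)
expired = is_quote_current("2024-02-01", today)
unknown = is_quote_current("", today)
unparseable = is_quote_current("31/03/2024", today)
```

Returning None for unknown expiry keeps the decision explicit: the comparison engine can show such quotes with a warning rather than silently treating them as valid or expired.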

DataResearchTools Integration Benefits

Building a freight rate comparison engine with DataResearchTools mobile proxies provides several strategic advantages:

Comprehensive SEA coverage: Collect rates from carrier portals across all major Southeast Asian markets with country-specific mobile IPs
High reliability: Mobile proxies maintain consistent access to rate platforms that block datacenter IPs
Scalable collection: As you add more carriers and routes, DataResearchTools scales with your needs
Cost efficiency: Collect rates from dozens of sources without paying for individual platform subscriptions

Conclusion

A freight rate comparison engine transforms how logistics teams make shipping decisions. Instead of manually checking rates across platforms, they get a consolidated view of the market with ranking, trend analysis, and total cost calculations.

The foundation of any effective comparison engine is reliable data collection, and DataResearchTools mobile proxies provide the infrastructure needed to collect rates consistently from protected shipping platforms across Southeast Asia. Start with a focused set of routes and carriers, build out the normalization and comparison logic, and expand your coverage as the system proves its value.

Because every booking benefits from better rate visibility, the return on a freight rate comparison engine can show up within the first few shipments, which makes it a high-value logistics technology investment.