Proxy Setup for Scraping Automotive Parts and Accessories Marketplaces

The automotive parts and accessories market in Southeast Asia is enormous and fragmented. From OEM replacement parts to aftermarket accessories, pricing and availability data is spread across hundreds of online marketplaces, specialized retailers, and e-commerce platforms. For parts distributors, repair shops, insurance companies, and price comparison services, collecting this data systematically provides a significant competitive edge.

This guide covers the proxy setup and scraping strategies needed to collect automotive parts data from major marketplaces in the region.

The Auto Parts Data Landscape

Platform Categories

General E-Commerce:

  • Lazada (all SEA markets)
  • Shopee (all SEA markets)
  • Tokopedia (Indonesia)
  • TikTok Shop (growing parts category)

Specialized Auto Parts:

  • Boodmo (parts catalog and pricing)
  • PartSouq (OEM parts for Asian vehicles)
  • AutoDoc (global auto parts)
  • SpartCat (parts catalog)

Regional Specialists:

  • CarParts.sg (Singapore)
  • AutoParts.my (Malaysia)
  • Thai auto parts platforms

OEM Parts Portals:

  • Toyota Parts catalogs
  • Honda Parts websites
  • Brand-specific parts lookup tools

Data Value

Auto parts pricing data serves multiple business purposes:

  • Insurance claim validation: Verify repair cost estimates against actual market prices
  • Repair shop pricing: Set competitive labor and parts pricing
  • Parts distribution: Optimize inventory and pricing across the supply chain
  • Vehicle valuation: Factor in parts costs for total cost of ownership calculations
  • Market analysis: Track parts demand as an indicator of vehicle fleet composition

Proxy Infrastructure Requirements

Multi-Platform Challenges

Each e-commerce platform implements different anti-scraping measures:

  • Lazada: Akamai protection, API rate limiting, device fingerprinting
  • Shopee: Custom bot detection, session validation, geographic restrictions
  • Tokopedia: CloudFlare protection, JavaScript challenges
  • Specialized parts sites: Variable protection, often lighter but with strict rate limits

DataResearchTools Proxy Configuration

For auto parts scraping across Southeast Asian platforms, configure your proxy infrastructure to match each platform’s expectations:

import time
from uuid import uuid4

class AutoPartsProxyManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"

    def get_platform_proxy(self, platform, country):
        """Get optimized proxy for specific platform and country"""
        config = self.platform_configs.get(platform, {})
        proxy_type = config.get("preferred_type", "mobile")
        rotation = config.get("rotation", "per_request")

        session_id = uuid4().hex[:8] if rotation == "per_request" else f"sticky-{platform}-{int(time.time()) % 10000}"

        auth = f"{self.api_key}:country-{country}-type-{proxy_type}-session-{session_id}"

        return {
            "http": f"http://{auth}@{self.endpoint}:8080",
            "https": f"http://{auth}@{self.endpoint}:8080"
        }

    platform_configs = {
        "lazada": {
            "preferred_type": "mobile",
            "rotation": "sticky",
            "session_duration": 600,
        },
        "shopee": {
            "preferred_type": "mobile",
            "rotation": "per_request",
        },
        "tokopedia": {
            "preferred_type": "mobile",
            "rotation": "sticky",
            "session_duration": 300,
        },
        "specialized": {
            "preferred_type": "residential",
            "rotation": "per_request",
        }
    }
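
To make the session-naming behavior concrete, the auth string can be assembled standalone (build_proxyauth is a hypothetical helper mirroring the format above, not a provider SDK function):

```python
import time
from uuid import uuid4

def build_proxy_auth(api_key, country, proxy_type, rotation, platform):
    """Assemble a proxy auth string in the same illustrative format as above."""
    if rotation == "per_request":
        # Fresh random session id on every call -> new IP per request
        session_id = uuid4().hex[:8]
    else:
        # Stable id (per platform, per time window) -> sticky IP
        session_id = f"sticky-{platform}-{int(time.time()) % 10000}"
    return f"{api_key}:country-{country}-type-{proxy_type}-session-{session_id}"

auth = build_proxy_auth("KEY", "MY", "mobile", "sticky", "lazada")
print(auth.startswith("KEY:country-MY-type-mobile-session-sticky-lazada-"))  # True
```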

Why Mobile Proxies Work Best

Southeast Asian e-commerce platforms are mobile-first. The majority of shopping traffic comes from mobile devices, especially through their apps. Using DataResearchTools mobile proxies ensures your scraping traffic matches this pattern, significantly reducing detection risk.

Mobile proxies also provide access to mobile-specific APIs and pricing that may differ from desktop versions. Some platforms show different prices or availability on their mobile apps versus desktop websites.
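
The scrapers below call a get_random_mobile_ua() helper that is assumed rather than shown; a minimal sketch (the UA strings are illustrative examples and should be refreshed periodically to match current device/browser versions):

```python
import random

# Illustrative pool of mobile user-agent strings; extend in practice.
MOBILE_USER_AGENTS = [
    "Mozilla/5.0 (Linux; Android 13; SM-G991B) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 12; Redmi Note 11) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Mobile Safari/537.36",
]

def get_random_mobile_ua() -> str:
    """Return a random mobile user-agent string from the pool."""
    return random.choice(MOBILE_USER_AGENTS)
```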

Scraping Lazada for Auto Parts

HTML-Based Approach

Lazada's search results can be collected from its catalog pages and parsed out of the returned HTML:

import requests
from bs4 import BeautifulSoup

class LazadaPartsScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_parts(self, query, country="MY", page=1):
        proxy = self.proxy_manager.get_platform_proxy("lazada", country)

        country_domains = {
            "SG": "lazada.sg",
            "MY": "lazada.com.my",
            "TH": "lazada.co.th",
            "ID": "lazada.co.id",
            "PH": "lazada.com.ph",
            "VN": "lazada.vn",
        }

        domain = country_domains.get(country, "lazada.com")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-US,en;q=0.9",
        })

        # Visit search page
        response = session.get(
            f"https://www.{domain}/catalog/",
            params={"q": query, "page": page},
            timeout=30
        )

        if response.status_code == 200:
            return self.parse_results(response.text, country)
        return []

    def parse_results(self, html, country):
        soup = BeautifulSoup(html, 'html.parser')
        products = []

        for item in soup.select('[data-qa-locator="product-item"]'):
            product = {
                "platform": "lazada",
                "country": country,
                "title": safe_text(item, '.product-title, [class*="title"]'),
                "price": safe_text(item, '.product-price, [class*="price"]'),
                "original_price": safe_text(item, '.product-original-price, [class*="original"]'),
                "discount": safe_text(item, '.product-discount, [class*="discount"]'),
                "rating": safe_text(item, '.product-rating, [class*="rating"]'),
                "sold_count": safe_text(item, '.product-sold, [class*="sold"]'),
                "seller": safe_text(item, '.seller-name, [class*="seller"]'),
                "location": safe_text(item, '.seller-location'),
                "url": item.select_one('a')['href'] if item.select_one('a') else None,
                "image": item.select_one('img')['src'] if item.select_one('img') else None,
            }
            products.append(product)

        return products

    def get_product_details(self, product_url, country):
        proxy = self.proxy_manager.get_platform_proxy("lazada", country)

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})

        response = session.get(product_url, timeout=30)
        if response.status_code == 200:
            return self.parse_product_detail(response.text)
        return None

    def parse_product_detail(self, html):
        soup = BeautifulSoup(html, 'html.parser')

        # Extract specifications
        specs = {}
        for row in soup.select('.product-spec tr, [class*="specification"] tr'):
            cells = row.select('td')
            if len(cells) >= 2:
                key = cells[0].get_text(strip=True)
                value = cells[1].get_text(strip=True)
                specs[key] = value

        return {
            "specifications": specs,
            "description": safe_text(soup, '.product-description'),
            "shipping_info": safe_text(soup, '.shipping-info'),
            "return_policy": safe_text(soup, '.return-policy'),
        }
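
The safe_text() helper used in parse_results() and parse_product_detail() is assumed rather than defined; a minimal version on top of BeautifulSoup could be:

```python
from bs4 import BeautifulSoup

def safe_text(node, selector: str):
    """Return stripped text of the first element matching a CSS selector, or None."""
    el = node.select_one(selector)
    return el.get_text(strip=True) if el else None

# Quick check against a tiny HTML fragment
soup = BeautifulSoup('<div><span class="product-price">RM 120</span></div>', "html.parser")
print(safe_text(soup, ".product-price"))  # RM 120
print(safe_text(soup, ".missing"))        # None
```

Returning None instead of raising on missing elements keeps the parsers tolerant of the selector drift that is routine on these platforms.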

Scraping Shopee for Auto Parts

import requests

class ShopeePartsScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_parts(self, query, country="SG", page=0):
        proxy = self.proxy_manager.get_platform_proxy("shopee", country)

        country_configs = {
            "SG": {"domain": "shopee.sg", "api_domain": "shopee.sg"},
            "MY": {"domain": "shopee.com.my", "api_domain": "shopee.com.my"},
            "TH": {"domain": "shopee.co.th", "api_domain": "shopee.co.th"},
            "ID": {"domain": "shopee.co.id", "api_domain": "shopee.co.id"},
            "PH": {"domain": "shopee.ph", "api_domain": "shopee.ph"},
            "VN": {"domain": "shopee.vn", "api_domain": "shopee.vn"},
        }

        config = country_configs.get(country)
        if config is None:
            return []  # unsupported country code

        # Shopee search API
        url = f"https://{config['api_domain']}/api/v4/search/search_items"
        params = {
            "keyword": query,
            "limit": 60,
            "offset": page * 60,
            "order": "relevancy",
        }

        headers = {
            "User-Agent": get_random_mobile_ua(),
            "Referer": f"https://{config['domain']}/search?keyword={query}",
        }

        response = requests.get(url, params=params, headers=headers, proxies=proxy, timeout=30)

        if response.status_code == 200:
            data = response.json()
            return self.parse_shopee_results(data, config["domain"])
        return []

    def parse_shopee_results(self, data, domain):
        products = []
        for item in data.get("items", []):
            info = item.get("item_basic", {})
            products.append({
                "platform": "shopee",
                "title": info.get("name"),
                "price": info.get("price") / 100000 if info.get("price") else None,
                "price_min": info.get("price_min") / 100000 if info.get("price_min") else None,
                "price_max": info.get("price_max") / 100000 if info.get("price_max") else None,
                "discount": info.get("raw_discount"),
                "rating": info.get("item_rating", {}).get("rating_star"),
                "sold_count": info.get("sold"),
                "stock": info.get("stock"),
                "shop_id": info.get("shopid"),
                "item_id": info.get("itemid"),
                "url": f"https://{domain}/product/{info.get('shopid')}/{info.get('itemid')}",
                "image": f"https://cf.shopee.sg/file/{info.get('image')}" if info.get("image") else None,
            })
        return products
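
Searching the same part across multiple queries and platforms inevitably returns duplicate listings; a small helper to deduplicate the product dicts built above (dedupe_products is an illustrative name, keyed on the fields these scrapers emit):

```python
def dedupe_products(products):
    """Drop duplicate listings, keyed on URL (falling back to platform + title)."""
    seen = set()
    unique = []
    for p in products:
        key = p.get("url") or (p.get("platform"), p.get("title"))
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique

items = [
    {"platform": "shopee", "title": "Brake pad", "url": "https://shopee.sg/product/1/2"},
    {"platform": "shopee", "title": "Brake pad", "url": "https://shopee.sg/product/1/2"},
    {"platform": "lazada", "title": "Brake pad", "url": None},
]
print(len(dedupe_products(items)))  # 2
```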

Parts Compatibility Matching

Building a Compatibility Database

One of the most valuable auto parts datasets is compatibility information, i.e. which parts fit which vehicles:

import re

class CompatibilityMapper:
    def extract_compatibility(self, product_data):
        """Extract vehicle compatibility from product listings"""
        title = product_data.get("title", "")
        description = product_data.get("description", "")
        specs = product_data.get("specifications", {})

        compatibility = {
            "from_title": self.parse_vehicles_from_text(title),
            "from_description": self.parse_vehicles_from_text(description),
            "from_specs": self.parse_specs_compatibility(specs),
        }

        # Merge and deduplicate
        all_vehicles = set()
        for source in compatibility.values():
            all_vehicles.update(source)

        return list(all_vehicles)

    def parse_vehicles_from_text(self, text):
        """Extract vehicle make/model/year references from text"""
        vehicles = set()

        # Common pattern: "Honda Civic 2016-2021"; open-ended forms such as
        # "Toyota Camry 2018+" would need an additional pattern
        pattern = r'(Toyota|Honda|Nissan|Mazda|Mitsubishi|Suzuki|Hyundai|Kia|BMW|Mercedes|Audi)\s+(\w+[\s\w]*?)\s+(\d{4})\s*[-~]\s*(\d{4})'
        matches = re.finditer(pattern, text, re.IGNORECASE)

        for match in matches:
            make = match.group(1)
            model = match.group(2).strip()
            year_from = int(match.group(3))
            year_to = int(match.group(4))

            for year in range(year_from, year_to + 1):
                vehicles.add(f"{make} {model} {year}")

        return vehicles

    def parse_specs_compatibility(self, specs):
        """Parse compatibility from structured specification data"""
        vehicles = set()

        compat_keys = ["compatible with", "fits", "for vehicle", "application"]
        for key, value in specs.items():
            if any(ck in key.lower() for ck in compat_keys):
                vehicles.update(self.parse_vehicles_from_text(value))

        return vehicles
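
To sanity-check the year-range regex above, here is the same logic as a standalone function run on a sample listing title:

```python
import re

def parse_vehicles_from_text(text):
    """Standalone version of the CompatibilityMapper method above."""
    vehicles = set()
    pattern = r'(Toyota|Honda|Nissan|Mazda|Mitsubishi|Suzuki|Hyundai|Kia|BMW|Mercedes|Audi)\s+(\w+[\s\w]*?)\s+(\d{4})\s*[-~]\s*(\d{4})'
    for match in re.finditer(pattern, text, re.IGNORECASE):
        make, model = match.group(1), match.group(2).strip()
        # Expand the year range into one entry per model year
        for year in range(int(match.group(3)), int(match.group(4)) + 1):
            vehicles.add(f"{make} {model} {year}")
    return vehicles

result = parse_vehicles_from_text("Front brake pads for Honda Civic 2016-2018, genuine")
print(sorted(result))  # ['Honda Civic 2016', 'Honda Civic 2017', 'Honda Civic 2018']
```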

OEM Part Number Cross-Reference

class PartNumberCrossRef:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def find_cross_references(self, oem_part_number, make):
        """Find equivalent parts across different brands using OEM part numbers"""
        results = {
            "oem_number": oem_part_number,
            "make": make,
            "cross_references": [],
        }

        # Search general marketplaces for this part number (the
        # search_*_by_number helpers are assumed to wrap the scrapers
        # above with proxy-aware sessions)
        lazada_results = self.search_lazada_by_number(oem_part_number)
        shopee_results = self.search_shopee_by_number(oem_part_number)

        all_results = lazada_results + shopee_results

        # Extract alternative part numbers from results
        for result in all_results:
            alt_numbers = self.extract_part_numbers(result.get("title", ""))
            for num in alt_numbers:
                if num != oem_part_number:
                    results["cross_references"].append({
                        "part_number": num,
                        "source": result.get("platform"),
                        "brand": self.extract_brand(result.get("title", "")),
                        "price": result.get("price"),
                    })

        return results
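
The extract_part_numbers() helper above is assumed; OEM numbering schemes differ per manufacturer, but a rough heuristic for pulling candidate part numbers out of listing titles might look like this (the pattern is illustrative and will need tuning per make):

```python
import re

def extract_part_numbers(text):
    """Rough heuristic: hyphenated alphanumeric codes, or 6+ char tokens
    containing at least one digit. Tune the pattern per manufacturer."""
    candidates = re.findall(
        r'\b[A-Z0-9]{2,}(?:-[A-Z0-9]{2,})+\b|\b(?=\w*\d)[A-Z0-9]{6,}\b',
        text.upper(),
    )
    return sorted(set(candidates))

print(extract_part_numbers("Oil filter 90915-YZZE1 fits Toyota (also 0986AF0061)"))
# ['0986AF0061', '90915-YZZE1']
```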

Price Monitoring for Auto Parts

Tracking Parts Prices Over Time

import statistics

class PartsPriceMonitor:
    def __init__(self, proxy_manager, db):
        self.proxy_manager = proxy_manager
        self.db = db

    def monitor_part(self, part_number, search_terms, countries):
        """Monitor pricing for a specific part across platforms and countries"""
        all_prices = []

        for country in countries:
            lazada_scraper = LazadaPartsScraper(self.proxy_manager)
            shopee_scraper = ShopeePartsScraper(self.proxy_manager)

            # Search by part number
            lazada_results = lazada_scraper.search_parts(part_number, country)
            shopee_results = shopee_scraper.search_parts(part_number, country)

            # Also search by description terms
            for term in search_terms:
                lazada_results.extend(lazada_scraper.search_parts(term, country))
                shopee_results.extend(shopee_scraper.search_parts(term, country))

            for result in lazada_results + shopee_results:
                if self.is_relevant_part(result, part_number, search_terms):
                    price = self.parse_price(result.get("price"))
                    if price:
                        all_prices.append({
                            "platform": result["platform"],
                            "country": country,
                            "price": price,
                            "seller": result.get("seller"),
                            "rating": result.get("rating"),
                            "sold_count": result.get("sold_count"),
                            "url": result.get("url"),
                        })

        # Store price snapshot
        self.db.save_price_snapshot(part_number, all_prices)

        return {
            "part_number": part_number,
            "total_sources": len(all_prices),
            "cheapest": min(all_prices, key=lambda x: x["price"]) if all_prices else None,
            "most_expensive": max(all_prices, key=lambda x: x["price"]) if all_prices else None,
            "average_price": statistics.mean([p["price"] for p in all_prices]) if all_prices else None,
            "by_country": self.group_by_country(all_prices),
            "by_platform": self.group_by_platform(all_prices),
        }
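
The group_by_country() and group_by_platform() helpers referenced in the summary are assumed; both can share one implementation (group_prices is a hypothetical name, operating on the price records built above):

```python
import statistics
from collections import defaultdict

def group_prices(prices, key):
    """Group price records by a field and summarize each group."""
    groups = defaultdict(list)
    for p in prices:
        groups[p[key]].append(p["price"])
    return {
        k: {"count": len(v), "min": min(v), "max": max(v), "avg": statistics.mean(v)}
        for k, v in groups.items()
    }

snapshot = [
    {"platform": "lazada", "country": "MY", "price": 45.0},
    {"platform": "shopee", "country": "MY", "price": 39.9},
    {"platform": "shopee", "country": "SG", "price": 18.5},
]
print(group_prices(snapshot, "country")["MY"]["count"])  # 2
```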

Insurance Repair Cost Validation

Building a Parts Cost Database for Claims

Insurance companies can validate repair cost estimates using scraped parts pricing:

class RepairCostValidator:
    def __init__(self, parts_db):
        self.parts_db = parts_db

    def validate_repair_estimate(self, claim):
        """Validate a repair cost estimate against market parts pricing"""
        validation_results = []

        for line_item in claim.get("parts_list", []):
            part_name = line_item["part_name"]
            claimed_price = line_item["claimed_price"]
            vehicle = claim.get("vehicle", {})

            # Look up market pricing
            market_data = self.parts_db.get_part_pricing(
                part_name=part_name,
                vehicle_make=vehicle.get("make"),
                vehicle_model=vehicle.get("model"),
                vehicle_year=vehicle.get("year"),
                country=claim.get("country")
            )

            if market_data:
                deviation = ((claimed_price - market_data["average_price"]) / market_data["average_price"]) * 100

                validation_results.append({
                    "part": part_name,
                    "claimed_price": claimed_price,
                    "market_average": market_data["average_price"],
                    "market_range": [market_data["min_price"], market_data["max_price"]],
                    "deviation_pct": deviation,
                    "flag": "OVERPRICED" if deviation > 50 else "OK",
                    "sample_size": market_data["sample_size"],
                })

        # Compare like with like: only count line items that had market data
        validated_parts = {r["part"] for r in validation_results}
        total_claimed = sum(
            item["claimed_price"]
            for item in claim.get("parts_list", [])
            if item["part_name"] in validated_parts
        )
        total_market = sum(r["market_average"] for r in validation_results)

        return {
            "claim_id": claim.get("claim_id"),
            "line_items": validation_results,
            "total_claimed": total_claimed,
            "total_market_value": total_market,
            "overall_deviation_pct": ((total_claimed - total_market) / total_market * 100) if total_market else None,
            "flags": [r for r in validation_results if r.get("flag") == "OVERPRICED"],
        }
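
The deviation calculation above, pulled out as a standalone function with a hypothetical line item, also shows that the 50% flag threshold is strict (exactly 50% still passes):

```python
def price_deviation_pct(claimed, market_avg):
    """Percent deviation of a claimed price from the market average."""
    return (claimed - market_avg) / market_avg * 100

# Hypothetical: bumper claimed at 450 vs 300 market average
dev = price_deviation_pct(450.0, 300.0)
print(round(dev, 1))                       # 50.0
print("OVERPRICED" if dev > 50 else "OK")  # OK (exactly at the threshold)
```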

Scaling Your Parts Scraping Operation

Handling Large Catalogs

Auto parts catalogs can contain millions of items. Strategies for scale:

  • Prioritize by demand: Focus on the most commonly needed parts first
  • Category-based scraping: Organize scraping by parts category to manage scope
  • Incremental updates: After initial collection, only scrape for price changes
  • Parallel execution: Use DataResearchTools’ high-concurrency proxy support to run multiple scrapers simultaneously
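
For the incremental-update strategy above, one common approach is to fingerprint the price-relevant fields of each listing and re-store a record only when the hash changes; a sketch (listing_fingerprint and the chosen fields are assumptions, not a prescribed schema):

```python
import hashlib
import json

def listing_fingerprint(product: dict) -> str:
    """Hash only the price-relevant fields so unchanged listings can be skipped."""
    core = {k: product.get(k) for k in ("price", "stock", "discount")}
    return hashlib.sha256(json.dumps(core, sort_keys=True).encode()).hexdigest()

old = {"title": "Brake pad", "price": 39.9, "stock": 12, "discount": "10%"}
new = {"title": "Brake pad", "price": 35.0, "stock": 12, "discount": "10%"}
print(listing_fingerprint(old) == listing_fingerprint(new))  # False - price changed
```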

Proxy Bandwidth Optimization

Auto parts pages can be image-heavy. Optimize bandwidth:

  • Skip image downloads unless you need them
  • Use API endpoints where available instead of HTML pages
  • Cache product detail pages and only re-fetch at intervals
  • Use compressed response formats

Conclusion

The automotive parts and accessories marketplace in Southeast Asia offers rich data for businesses across the automotive value chain. From insurance claim validation to competitive pricing for parts distributors, systematically collected parts data provides actionable intelligence that manual research cannot match.

DataResearchTools mobile proxies are essential for accessing the major e-commerce platforms where auto parts are sold in Southeast Asia. With carrier-grade IPs from all major SEA countries, platform-optimized session management, and the bandwidth to handle large-scale product catalog scraping, DataResearchTools provides the proxy infrastructure needed to build comprehensive auto parts pricing databases.

Whether you are building a parts price comparison tool, validating insurance repair estimates, or optimizing a parts distribution business, reliable proxy access to Lazada, Shopee, Tokopedia, and specialized auto parts platforms is the foundation of data-driven operations in this space.

