Scraping Motorcycle and Scooter Listings in Southeast Asian Markets

Motorcycles and scooters are the dominant mode of personal transportation across most of Southeast Asia. In Vietnam, Indonesia, Thailand, and the Philippines, two-wheelers vastly outnumber cars, making the motorcycle market one of the region’s most significant automotive segments. For manufacturers, dealers, market researchers, and platform operators, collecting data from two-wheeler marketplaces provides critical insights into a market that moves billions of dollars annually.

This guide covers how to scrape motorcycle and scooter listing data from Southeast Asian platforms using proxy infrastructure.

The Two-Wheeler Market in Southeast Asia

Market Scale

  • Indonesia: Over 100 million registered motorcycles, largest market in the region
  • Vietnam: Over 70 million motorcycles, highest motorcycle-to-car ratio globally
  • Thailand: Over 20 million registered two-wheelers
  • Philippines: Rapidly growing market, especially for small displacement motorcycles
  • Malaysia: Significant motorcycle market alongside car ownership

Key Brands

  • Japanese: Honda, Yamaha, Suzuki, Kawasaki (dominant across the region)
  • Indian: TVS, Bajaj, Hero (growing presence)
  • Chinese: CFMoto, Benelli, GPX (expanding rapidly)
  • European: Vespa/Piaggio, Ducati, BMW Motorrad (premium segment)
  • Local: Modenas (Malaysia), SYM (regional presence)

Platform Landscape

General Classifieds with Strong Two-Wheeler Sections:

  • OLX (Indonesia, Philippines)
  • Mudah (Malaysia)
  • Carousell (Singapore, Malaysia, Philippines)
  • Kaidee (Thailand)
  • Cho Tot (Vietnam)

Specialized Motorcycle Platforms:

  • GIIAS Motor (Indonesia)
  • Bikewale / BikeAdvisor (regional)
  • Oto.com (Indonesia, includes motorcycles)
  • iMotorbike (Malaysia)

Manufacturer Direct:

  • Honda, Yamaha, Suzuki regional websites
  • Authorized dealer networks with online inventory

Proxy Requirements for Motorcycle Data

Geographic Targeting

Two-wheeler pricing varies dramatically across Southeast Asian countries due to local manufacturing, import duties, and government policies. Accurate data collection requires proxies that access each country’s platforms from local IPs.

DataResearchTools mobile proxies are particularly effective for motorcycle marketplace scraping because:

  • The vast majority of two-wheeler buyers in Southeast Asia browse on mobile devices
  • Motorcycle platforms in the region are designed mobile-first
  • Mobile IPs closely match the traffic pattern platforms expect
  • Coverage across all major SEA markets ensures comprehensive data collection

from uuid import uuid4

class MotorcycleProxyManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"
        self.markets = {
            "ID": {"name": "Indonesia", "platforms": ["olx_id", "oto"]},
            "VN": {"name": "Vietnam", "platforms": ["chotot", "xe_may"]},
            "TH": {"name": "Thailand", "platforms": ["kaidee", "bikethailand"]},
            "PH": {"name": "Philippines", "platforms": ["olx_ph", "carousell_ph"]},
            "MY": {"name": "Malaysia", "platforms": ["mudah", "imotorbike"]},
            "SG": {"name": "Singapore", "platforms": ["carousell_sg", "sgbike"]},
        }

    def get_proxy(self, country):
        """Build a requests-compatible proxy dict with a sticky mobile session ID."""
        session_id = uuid4().hex[:8]
        auth = f"{self.api_key}:country-{country}-type-mobile-session-{session_id}"
        return {
            "http": f"http://{auth}@{self.endpoint}:8080",
            "https": f"http://{auth}@{self.endpoint}:8080",
        }

Scraping Major Platforms
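
The scraper classes in this section rely on two small helper functions that the snippets assume but do not define: get_random_mobile_ua, which returns a rotating mobile User-Agent header, and safe_text, which reads text for a CSS selector without raising when the element is missing. A minimal sketch of both (the User-Agent strings are illustrative placeholders, not a maintained list):

```python
import random

# Illustrative mobile User-Agent strings; in production, rotate a maintained list.
MOBILE_USER_AGENTS = [
    "Mozilla/5.0 (Linux; Android 13; SM-A546B) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
]

def get_random_mobile_ua():
    """Return a random mobile User-Agent string for request headers."""
    return random.choice(MOBILE_USER_AGENTS)

def safe_text(node, selector):
    """Return stripped text for the first element matching a CSS selector, or None."""
    element = node.select_one(selector)
    return element.get_text(strip=True) if element else None
```

safe_text works with any BeautifulSoup node (a full document or a single result item), which is why the parsers below can pass either.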

OLX Indonesia – Motorcycle Listings

Indonesia’s OLX is the largest marketplace for used motorcycles in the country:

import requests
from bs4 import BeautifulSoup

class OLXMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.olx.co.id"

    def search_motorcycles(self, make=None, city=None, page=1):
        proxy = self.proxy_manager.get_proxy("ID")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "id-ID,id;q=0.9,en;q=0.8"
        })

        url = f"{self.base_url}/motor_c5116"
        params = {"page": page}

        if make:
            params["filter"] = f"make_eq_{make.lower()}"
        if city:
            url = f"{self.base_url}/{city}/motor_c5116"

        response = session.get(url, params=params, timeout=30)

        if response.status_code == 200:
            return self.parse_olx_results(response.text)
        return []

    def parse_olx_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        listings = []

        for item in soup.select('[data-aut-id="itemBox"]'):
            listing = {
                "platform": "olx_id",
                "country": "ID",
                "title": safe_text(item, '[data-aut-id="itemTitle"]'),
                "price": safe_text(item, '[data-aut-id="itemPrice"]'),
                "location": safe_text(item, '[data-aut-id="item-location"]'),
                "date_posted": safe_text(item, '[data-aut-id="item-date"]'),
                "url": item.select_one('a')['href'] if item.select_one('a') else None,
                "image": item.select_one('img')['src'] if item.select_one('img') else None,
            }
            listings.append(listing)

        return listings

    def get_listing_details(self, listing_url):
        proxy = self.proxy_manager.get_proxy("ID")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})

        full_url = f"{self.base_url}{listing_url}" if not listing_url.startswith("http") else listing_url
        response = session.get(full_url, timeout=30)

        if response.status_code == 200:
            return self.parse_listing_detail(response.text)
        return None

    def parse_listing_detail(self, html):
        soup = BeautifulSoup(html, 'html.parser')

        details = {}
        for row in soup.select('[class*="detail"] [class*="row"], [class*="spec"] tr'):
            label = row.select_one('[class*="label"], td:first-child')
            value = row.select_one('[class*="value"], td:last-child')
            if label and value:
                details[label.get_text(strip=True).lower()] = value.get_text(strip=True)

        return {
            "make": details.get("merk", details.get("brand")),
            "model": details.get("model"),
            "year": details.get("tahun", details.get("year")),
            "mileage_km": details.get("kilometer", details.get("jarak tempuh")),
            "engine_cc": details.get("kapasitas mesin", details.get("cc")),
            "transmission": details.get("transmisi", details.get("transmission")),
            "color": details.get("warna", details.get("color")),
            "condition": details.get("kondisi", details.get("condition")),
            "description": safe_text(soup, '[class*="description"]'),
        }
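
The detail fields above come back as raw display strings; mileage in particular typically arrives as "12.000 km" (dot as the Indonesian thousands separator) or as a range like "10.000 - 15.000 km". A hedged parsing sketch for these common patterns (the input formats are assumptions based on typical listing output, not an OLX specification):

```python
import re

def parse_mileage_km(raw):
    """Parse an odometer display string into an integer kilometre value.

    Handles '12.000 km' (dot as thousands separator) and range labels like
    '10.000 - 15.000 km' (returns the midpoint). Returns None when no
    number can be recovered.
    """
    if not raw:
        return None
    numbers = [int(n.replace(".", "").replace(",", ""))
               for n in re.findall(r'\d[\d.,]*', raw)]
    if not numbers:
        return None
    if len(numbers) >= 2:
        # Range label: report the midpoint
        return (numbers[0] + numbers[1]) // 2
    return numbers[0]
```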

Cho Tot Vietnam – Motorcycle Section

class ChoTotMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.chotot.com"

    def search_motorcycles(self, city=None, make=None, page=1):
        proxy = self.proxy_manager.get_proxy("VN")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "vi-VN,vi;q=0.9,en;q=0.8"
        })

        url = f"{self.base_url}/mua-ban-xe-may"
        params = {"page": page}

        if city:
            url = f"{self.base_url}/{city}/mua-ban-xe-may"
        if make:
            params["make"] = make

        response = session.get(url, params=params, timeout=30)

        if response.status_code == 200:
            return self.parse_results(response.text)
        return []

    def parse_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        listings = []

        for item in soup.select('.AdItem, [class*="ad-item"]'):
            listings.append({
                "platform": "chotot",
                "country": "VN",
                "title": safe_text(item, '.AdItem_title, [class*="title"]'),
                "price": safe_text(item, '.AdItem_price, [class*="price"]'),
                "location": safe_text(item, '.AdItem_location, [class*="location"]'),
                "url": item.select_one('a')['href'] if item.select_one('a') else None,
            })

        return listings

Mudah Malaysia – Motorcycles

class MudahMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_motorcycles(self, state=None, make=None, page=1):
        proxy = self.proxy_manager.get_proxy("MY")

        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-MY,en;q=0.9,ms;q=0.8"
        })

        url = "https://www.mudah.my/malaysia/motorcycles-for-sale"
        params = {"o": page}

        if state:
            url = f"https://www.mudah.my/{state}/motorcycles-for-sale"
        if make:
            params["q"] = make

        response = session.get(url, params=params, timeout=30)

        if response.status_code == 200:
            return self.parse_results(response.text)
        return []

    def parse_results(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        listings = []

        for item in soup.select('.listing-item, [class*="listing"]'):
            listings.append({
                "platform": "mudah",
                "country": "MY",
                "vehicle_type": "motorcycle",
                "title": safe_text(item, '.listing-title'),
                "price": safe_text(item, '.listing-price'),
                "location": safe_text(item, '.listing-location'),
                "year": safe_text(item, '.listing-year'),
                "engine_cc": safe_text(item, '.listing-cc'),
                "url": item.select_one('a')['href'] if item.select_one('a') else None,
            })

        return listings

Data Normalization for Two-Wheelers

Vehicle Classification

Motorcycles and scooters require different classification than cars:

import re

class TwoWheelerClassifier:
    def classify(self, listing):
        title = listing.get("title", "").lower()
        engine_cc = self.extract_cc(listing)

        # Determine vehicle type
        if any(word in title for word in ["scooter", "matic", "automatic", "vespa", "scoopy", "beat", "mio", "vario"]):
            vehicle_type = "scooter"
        elif any(word in title for word in ["cub", "kapchai", "underbone", "wave", "supra", "revo"]):
            vehicle_type = "underbone"
        elif any(word in title for word in ["sport", "ninja", "cbr", "r15", "r25", "gsx"]):
            vehicle_type = "sport"
        elif any(word in title for word in ["cruiser", "harley", "rebel", "vulcan"]):
            vehicle_type = "cruiser"
        elif any(word in title for word in ["adventure", "adv", "versys", "vstrom", "tenere"]):
            vehicle_type = "adventure"
        elif any(word in title for word in ["trail", "enduro", "crf", "klx", "wr"]):
            vehicle_type = "offroad"
        else:
            vehicle_type = "standard"

        # Determine segment by engine size
        if engine_cc:
            if engine_cc <= 125:
                segment = "small"
            elif engine_cc <= 250:
                segment = "medium"
            elif engine_cc <= 500:
                segment = "mid_large"
            else:
                segment = "large"
        else:
            segment = "unknown"

        return {
            "vehicle_type": vehicle_type,
            "segment": segment,
            "engine_cc": engine_cc,
        }

    def extract_cc(self, listing):
        """Extract engine displacement from listing data"""
        cc_str = listing.get("engine_cc", "")
        if cc_str:
            match = re.search(r'(\d+)\s*(?:cc|CC)', str(cc_str))
            if match:
                return int(match.group(1))

        # Try extracting from title
        title = listing.get("title", "")
        match = re.search(r'(\d{2,4})\s*(?:cc|CC)', title)
        if match:
            return int(match.group(1))

        # Common model name patterns
        model_cc = {
            "beat": 110, "vario": 125, "scoopy": 110, "mio": 125,
            "nmax": 155, "xmax": 250, "aerox": 155, "pcx": 160,
            "cbr150": 150, "r15": 155, "ninja 250": 250, "ninja 400": 400,
            "cb150": 150, "mt-15": 155, "duke 200": 200, "duke 390": 373,
        }

        title_lower = title.lower()
        for model, cc in model_cc.items():
            if model in title_lower:
                return cc

        return None

Price Normalization

Motorcycle prices span a huge range, from about $500 for a used scooter to $30,000+ for a premium bike:

class MotorcyclePriceNormalizer:
    CURRENCIES = {
        "ID": {"currency": "IDR", "symbols": ["Rp", "IDR"], "divisor": 1},
        "VN": {"currency": "VND", "symbols": ["₫", "VND", "đ"], "divisor": 1},
        "TH": {"currency": "THB", "symbols": ["฿", "THB", "บาท"], "divisor": 1},
        "PH": {"currency": "PHP", "symbols": ["₱", "PHP"], "divisor": 1},
        "MY": {"currency": "MYR", "symbols": ["RM", "MYR"], "divisor": 1},
        "SG": {"currency": "SGD", "symbols": ["S$", "SGD", "$"], "divisor": 1},
    }

    def normalize(self, price_str, country):
        if not price_str:
            return None

        config = self.CURRENCIES.get(country, {})

        # Clean price string
        clean = price_str
        for symbol in config.get("symbols", []):
            clean = clean.replace(symbol, "")

        # Handle Indonesian abbreviations
        if country == "ID":
            clean_lower = clean.lower().strip()
            if "jt" in clean_lower or "juta" in clean_lower:
                clean = re.sub(r'(?:jt|juta)', '', clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1000000
                except ValueError:
                    return None
            elif "rb" in clean_lower or "ribu" in clean_lower:
                clean = re.sub(r'(?:rb|ribu)', '', clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1000
                except ValueError:
                    return None

        # Handle Vietnamese pricing (often in millions)
        if country == "VN":
            clean_lower = clean.lower().strip()
            if "triệu" in clean_lower or "tr" in clean_lower:
                clean = re.sub(r'(?:triệu|tr)', '', clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1000000
                except ValueError:
                    return None

        # Standard number parsing
        clean = clean.replace(",", "").replace(" ", "").strip()
        try:
            return float(clean)
        except ValueError:
            return None
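
Because the normalizer returns prices in each market's local currency, any cross-market comparison needs a conversion step. A sketch using static rates (the rates below are placeholders for illustration; a real pipeline would pull daily rates from an FX source):

```python
# Placeholder exchange rates (local currency units per USD); refresh from an FX feed.
USD_RATES = {
    "IDR": 15500.0,
    "VND": 24500.0,
    "THB": 35.0,
    "PHP": 56.0,
    "MYR": 4.7,
    "SGD": 1.35,
}

CURRENCY_BY_COUNTRY = {
    "ID": "IDR", "VN": "VND", "TH": "THB",
    "PH": "PHP", "MY": "MYR", "SG": "SGD",
}

def to_usd(price_local, country):
    """Convert a normalized local-currency price to USD, or None if unknown."""
    currency = CURRENCY_BY_COUNTRY.get(country)
    rate = USD_RATES.get(currency)
    if price_local is None or not rate:
        return None
    return round(price_local / rate, 2)
```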

Market Analysis for Two-Wheelers

Market Segmentation

from collections import Counter
import statistics

import numpy as np

class MotorcycleMarketAnalyzer:
    def analyze_market(self, listings, country):
        classifier = TwoWheelerClassifier()
        normalizer = MotorcyclePriceNormalizer()

        analyzed = []
        for listing in listings:
            classification = classifier.classify(listing)
            price = normalizer.normalize(listing.get("price"), country)

            analyzed.append({
                **listing,
                **classification,
                "price_normalized": price,
            })

        return {
            "total_listings": len(analyzed),
            "by_type": Counter(a["vehicle_type"] for a in analyzed),
            "by_segment": Counter(a["segment"] for a in analyzed),
            "by_make": Counter(extract_make(a.get("title", "")) for a in analyzed).most_common(10),
            "price_distribution": self.calculate_price_distribution(analyzed),
            "avg_price_by_type": self.avg_price_by_group(analyzed, "vehicle_type"),
            "avg_price_by_segment": self.avg_price_by_group(analyzed, "segment"),
        }

    def calculate_price_distribution(self, analyzed):
        prices = [a["price_normalized"] for a in analyzed if a.get("price_normalized")]
        if not prices:
            return None

        return {
            "min": min(prices),
            "max": max(prices),
            "mean": statistics.mean(prices),
            "median": statistics.median(prices),
            "p25": np.percentile(prices, 25),
            "p75": np.percentile(prices, 75),
        }

    def avg_price_by_group(self, analyzed, group_key):
        groups = {}
        for item in analyzed:
            group = item.get(group_key, "unknown")
            price = item.get("price_normalized")
            if price:
                if group not in groups:
                    groups[group] = []
                groups[group].append(price)

        return {group: statistics.mean(prices) for group, prices in groups.items() if prices}
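
Both the analyzer's by_make counter and the brand-share calculation below depend on an extract_make helper that the snippets assume rather than define. A minimal keyword-matching sketch (the brand list is illustrative, not exhaustive):

```python
# Illustrative brand keywords; extend for full regional coverage.
KNOWN_MAKES = [
    "honda", "yamaha", "suzuki", "kawasaki", "vespa", "piaggio",
    "ducati", "bmw", "ktm", "tvs", "bajaj", "benelli", "cfmoto",
    "gpx", "sym", "modenas", "harley",
]

def extract_make(title):
    """Return the first known brand found in a listing title, else None."""
    title_lower = (title or "").lower()
    for make in KNOWN_MAKES:
        if make in title_lower:
            return make
    return None
```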

Brand Market Share

def calculate_brand_market_share(listings):
    makes = Counter()
    for listing in listings:
        make = extract_make(listing.get("title", ""))
        if make:
            makes[make] += 1

    total = sum(makes.values())
    market_share = {
        make: {"count": count, "share_pct": round(count / total * 100, 1)}
        for make, count in makes.most_common()
    }

    return market_share

Building a Comprehensive Two-Wheeler Database

Data Pipeline

import logging
import random
import time

logger = logging.getLogger(__name__)

class MotorcycleDataPipeline:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.scrapers = {
            "ID": [OLXMotorcycleScraper(proxy_manager)],
            "VN": [ChoTotMotorcycleScraper(proxy_manager)],
            "MY": [MudahMotorcycleScraper(proxy_manager)],
        }

    def collect_all_markets(self):
        all_data = {}

        for country, scrapers in self.scrapers.items():
            country_listings = []
            for scraper in scrapers:
                try:
                    listings = self.collect_all_pages(scraper, country)
                    country_listings.extend(listings)
                except Exception as e:
                    logger.error(f"Failed collecting from {country}: {e}")

            all_data[country] = country_listings
            logger.info(f"Collected {len(country_listings)} motorcycle listings from {country}")

        return all_data

    def collect_all_pages(self, scraper, country, max_pages=50):
        all_listings = []
        for page in range(1, max_pages + 1):
            listings = scraper.search_motorcycles(page=page)
            if not listings:
                break
            all_listings.extend(listings)
            time.sleep(random.uniform(2, 5))
        return all_listings
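
Collected listings become far more useful when persisted with deduplication across runs. A minimal SQLite sketch, keyed on (platform, url) so repeat scrapes update the price rather than duplicate rows (the schema is an assumption, sized to the listing dicts produced by the scrapers above):

```python
import sqlite3

def save_listings(db_path, listings):
    """Upsert scraped listings into SQLite, keyed on (platform, url)."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS listings (
            platform TEXT, country TEXT, title TEXT,
            price TEXT, location TEXT, url TEXT,
            PRIMARY KEY (platform, url)
        )
    """)
    conn.executemany("""
        INSERT INTO listings (platform, country, title, price, location, url)
        VALUES (:platform, :country, :title, :price, :location, :url)
        ON CONFLICT (platform, url) DO UPDATE SET
            price = excluded.price, title = excluded.title
    """, [{k: l.get(k) for k in
           ("platform", "country", "title", "price", "location", "url")}
          for l in listings])
    conn.commit()
    conn.close()
```

Re-running the pipeline then refreshes prices in place, which also makes it possible to track price changes per listing over time.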

Conclusion

The motorcycle and scooter market in Southeast Asia represents one of the largest and most dynamic two-wheeler markets globally. Scraping listing data from platforms across Indonesia, Vietnam, Thailand, Malaysia, the Philippines, and Singapore provides the raw material for market intelligence that drives better decisions for manufacturers, dealers, and investors.

DataResearchTools mobile proxies are the ideal infrastructure for two-wheeler marketplace scraping in Southeast Asia. The mobile-first nature of these markets means mobile proxy traffic blends naturally with legitimate users. With carrier-level IPs across all major SEA countries and the bandwidth to support comprehensive data collection, DataResearchTools enables businesses to build complete two-wheeler market datasets from the region’s leading platforms.

Whether you are a motorcycle manufacturer tracking competitive pricing, a dealer monitoring inventory across the region, or a researcher studying mobility patterns, systematic data collection from two-wheeler marketplaces provides insights that manual monitoring simply cannot match.

