Scraping Motorcycle and Scooter Listings in Southeast Asian Markets
Motorcycles and scooters are the dominant mode of personal transportation across most of Southeast Asia. In Vietnam, Indonesia, Thailand, and the Philippines, two-wheelers vastly outnumber cars, making the motorcycle market one of the region’s most significant automotive segments. For manufacturers, dealers, market researchers, and platform operators, collecting data from two-wheeler marketplaces provides critical insights into a market that moves billions of dollars annually.
This guide covers how to scrape motorcycle and scooter listing data from Southeast Asian platforms using proxy infrastructure.
The Two-Wheeler Market in Southeast Asia
Market Scale
- Indonesia: Over 100 million registered motorcycles, largest market in the region
- Vietnam: Over 70 million motorcycles, highest motorcycle-to-car ratio globally
- Thailand: Over 20 million registered two-wheelers
- Philippines: Rapidly growing market, especially for small displacement motorcycles
- Malaysia: Significant motorcycle market alongside car ownership
Key Brands
- Japanese: Honda, Yamaha, Suzuki, Kawasaki (dominant across the region)
- Indian: TVS, Bajaj, Hero (growing presence)
- Chinese: CFMoto, Benelli, GPX (expanding rapidly)
- European: Vespa/Piaggio, Ducati, BMW Motorrad (premium segment)
- Local: Modenas (Malaysia), SYM (regional presence)
Platform Landscape
General Classifieds with Strong Two-Wheeler Sections:
- OLX (Indonesia, Philippines)
- Mudah (Malaysia)
- Carousell (Singapore, Malaysia, Philippines)
- Kaidee (Thailand)
- Cho Tot (Vietnam)
Specialized Motorcycle Platforms:
- GIIAS Motor (Indonesia)
- Bikewale / BikeAdvisor (regional)
- Oto.com (Indonesia, includes motorcycles)
- iMotorbike (Malaysia)
Manufacturer Direct:
- Honda, Yamaha, Suzuki regional websites
- Authorized dealer networks with online inventory
Proxy Requirements for Motorcycle Data
Geographic Targeting
Two-wheeler pricing varies dramatically across Southeast Asian countries due to local manufacturing, import duties, and government policies. Accurate data collection requires proxies that access each country’s platforms from local IPs.
DataResearchTools mobile proxies are particularly effective for motorcycle marketplace scraping because:
- The vast majority of two-wheeler buyers in Southeast Asia browse on mobile devices
- Motorcycle platforms in the region are designed mobile-first
- Mobile IPs match the traffic pattern these platforms expect from real shoppers
- Coverage across all major SEA markets ensures comprehensive data collection
```python
from uuid import uuid4

class MotorcycleProxyManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.endpoint = "proxy.dataresearchtools.com"
        self.markets = {
            "ID": {"name": "Indonesia", "platforms": ["olx_id", "oto"]},
            "VN": {"name": "Vietnam", "platforms": ["chotot", "xe_may"]},
            "TH": {"name": "Thailand", "platforms": ["kaidee", "bikethailand"]},
            "PH": {"name": "Philippines", "platforms": ["olx_ph", "carousell_ph"]},
            "MY": {"name": "Malaysia", "platforms": ["mudah", "imotorbike"]},
            "SG": {"name": "Singapore", "platforms": ["carousell_sg", "sgbike"]},
        }

    def get_proxy(self, country):
        # A fresh 8-character session ID per call requests a new sticky mobile IP
        session_id = uuid4().hex[:8]
        auth = f"{self.api_key}:country-{country}-type-mobile-session-{session_id}"
        return {
            "http": f"http://{auth}@{self.endpoint}:8080",
            "https": f"http://{auth}@{self.endpoint}:8080",
        }
```
Scraping Major Platforms
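Before wiring up platform-specific scrapers, the credential scheme above can be smoke-tested offline. The helper below is not part of the manager; it simply mirrors the string construction in get_proxy so the format can be checked without touching the network:

```python
from uuid import uuid4

def build_proxy_url(api_key, country, endpoint="proxy.dataresearchtools.com", port=8080):
    # Mirrors MotorcycleProxyManager.get_proxy: a fresh 8-char session id
    # asks the gateway for a new sticky mobile IP in the target country.
    session_id = uuid4().hex[:8]
    auth = f"{api_key}:country-{country}-type-mobile-session-{session_id}"
    return f"http://{auth}@{endpoint}:{port}"

url = build_proxy_url("MY_KEY", "VN")
print(url)
```

Each call yields a distinct session ID, which is what lets a scraping pass hold one IP while separate passes get separate IPs.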
OLX Indonesia – Motorcycle Listings
Indonesia’s OLX is the largest marketplace for used motorcycles in the country:
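A note before the code: the scraper classes in this guide call two helpers that are never defined, get_random_mobile_ua and safe_text. A minimal sketch of each (the user-agent strings here are illustrative examples, not a maintained list):

```python
import random

# Illustrative mobile user-agent strings; rotate a larger, current pool in practice
MOBILE_UAS = [
    "Mozilla/5.0 (Linux; Android 13; SM-A536B) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1",
]

def get_random_mobile_ua():
    return random.choice(MOBILE_UAS)

def safe_text(node, selector):
    """Stripped text of the first CSS-selector match on a BeautifulSoup node, or None."""
    el = node.select_one(selector)
    return el.get_text(strip=True) if el else None
```

safe_text exists so that a missing element produces None in the listing record instead of raising mid-parse.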
```python
import requests
from bs4 import BeautifulSoup

class OLXMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.olx.co.id"

    def search_motorcycles(self, make=None, city=None, page=1):
        proxy = self.proxy_manager.get_proxy("ID")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "id-ID,id;q=0.9,en;q=0.8",
        })
        url = f"{self.base_url}/motor_c5116"
        params = {"page": page}
        if make:
            params["filter"] = f"make_eq_{make.lower()}"
        if city:
            url = f"{self.base_url}/{city}/motor_c5116"
        response = session.get(url, params=params, timeout=30)
        if response.status_code == 200:
            return self.parse_olx_results(response.text)
        return []

    def parse_olx_results(self, html):
        soup = BeautifulSoup(html, "html.parser")
        listings = []
        for item in soup.select('[data-aut-id="itemBox"]'):
            link = item.select_one("a")
            image = item.select_one("img")
            listings.append({
                "platform": "olx_id",
                "country": "ID",
                "title": safe_text(item, '[data-aut-id="itemTitle"]'),
                "price": safe_text(item, '[data-aut-id="itemPrice"]'),
                "location": safe_text(item, '[data-aut-id="item-location"]'),
                "date_posted": safe_text(item, '[data-aut-id="item-date"]'),
                "url": link["href"] if link else None,
                "image": image["src"] if image else None,
            })
        return listings

    def get_listing_details(self, listing_url):
        proxy = self.proxy_manager.get_proxy("ID")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({"User-Agent": get_random_mobile_ua()})
        full_url = listing_url if listing_url.startswith("http") else f"{self.base_url}{listing_url}"
        response = session.get(full_url, timeout=30)
        if response.status_code == 200:
            return self.parse_listing_detail(response.text)
        return None

    def parse_listing_detail(self, html):
        soup = BeautifulSoup(html, "html.parser")
        details = {}
        for row in soup.select('[class*="detail"] [class*="row"], [class*="spec"] tr'):
            label = row.select_one('[class*="label"], td:first-child')
            value = row.select_one('[class*="value"], td:last-child')
            if label and value:
                details[label.get_text(strip=True).lower()] = value.get_text(strip=True)
        # Indonesian spec labels (merk, tahun, warna, ...) with English fallbacks
        return {
            "make": details.get("merk", details.get("brand")),
            "model": details.get("model"),
            "year": details.get("tahun", details.get("year")),
            "mileage_km": details.get("kilometer", details.get("jarak tempuh")),
            "engine_cc": details.get("kapasitas mesin", details.get("cc")),
            "transmission": details.get("transmisi", details.get("transmission")),
            "color": details.get("warna", details.get("color")),
            "condition": details.get("kondisi", details.get("condition")),
            "description": safe_text(soup, '[class*="description"]'),
        }
```
Cho Tot Vietnam – Motorcycle Section
```python
class ChoTotMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.base_url = "https://www.chotot.com"

    def search_motorcycles(self, city=None, make=None, page=1):
        proxy = self.proxy_manager.get_proxy("VN")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "vi-VN,vi;q=0.9,en;q=0.8",
        })
        # "mua-ban-xe-may" is Cho Tot's motorcycles-for-sale section
        url = f"{self.base_url}/mua-ban-xe-may"
        params = {"page": page}
        if city:
            url = f"{self.base_url}/{city}/mua-ban-xe-may"
        if make:
            params["make"] = make
        response = session.get(url, params=params, timeout=30)
        if response.status_code == 200:
            return self.parse_results(response.text)
        return []

    def parse_results(self, html):
        soup = BeautifulSoup(html, "html.parser")
        listings = []
        for item in soup.select('.AdItem, [class*="ad-item"]'):
            link = item.select_one("a")
            listings.append({
                "platform": "chotot",
                "country": "VN",
                "title": safe_text(item, '.AdItem_title, [class*="title"]'),
                "price": safe_text(item, '.AdItem_price, [class*="price"]'),
                "location": safe_text(item, '.AdItem_location, [class*="location"]'),
                "url": link["href"] if link else None,
            })
        return listings
```
Mudah Malaysia – Motorcycles
```python
class MudahMotorcycleScraper:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager

    def search_motorcycles(self, state=None, make=None, page=1):
        proxy = self.proxy_manager.get_proxy("MY")
        session = requests.Session()
        session.proxies.update(proxy)
        session.headers.update({
            "User-Agent": get_random_mobile_ua(),
            "Accept-Language": "en-MY,en;q=0.9,ms;q=0.8",
        })
        url = "https://www.mudah.my/malaysia/motorcycles-for-sale"
        params = {"o": page}  # "o" serves as the page parameter here
        if state:
            url = f"https://www.mudah.my/{state}/motorcycles-for-sale"
        if make:
            params["q"] = make
        response = session.get(url, params=params, timeout=30)
        if response.status_code == 200:
            return self.parse_results(response.text)
        return []

    def parse_results(self, html):
        soup = BeautifulSoup(html, "html.parser")
        listings = []
        for item in soup.select('.listing-item, [class*="listing"]'):
            link = item.select_one("a")
            listings.append({
                "platform": "mudah",
                "country": "MY",
                "vehicle_type": "motorcycle",
                "title": safe_text(item, ".listing-title"),
                "price": safe_text(item, ".listing-price"),
                "location": safe_text(item, ".listing-location"),
                "year": safe_text(item, ".listing-year"),
                "engine_cc": safe_text(item, ".listing-cc"),
                "url": link["href"] if link else None,
            })
        return listings
```
Data Normalization for Two-Wheelers
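Each platform parser emits a slightly different dictionary. Downstream analysis is easier against one record shape; a small dataclass sketch of such a normalized record (the field names are illustrative, chosen to cover the parsers in this guide):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TwoWheelerListing:
    # Required provenance fields
    platform: str
    country: str
    title: str
    # Optional fields, filled in as parsing and normalization succeed
    url: Optional[str] = None
    price_raw: Optional[str] = None
    price_normalized: Optional[float] = None
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None
    engine_cc: Optional[int] = None
    vehicle_type: str = "standard"
    segment: str = "unknown"
```

Keeping platform and country on every record makes cross-market aggregation and deduplication straightforward later.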
Vehicle Classification
Motorcycles and scooters require different classification than cars:
```python
import re

class TwoWheelerClassifier:
    def classify(self, listing):
        title = listing.get("title", "").lower()
        engine_cc = self.extract_cc(listing)

        # Determine vehicle type from model keywords in the title
        if any(word in title for word in ["scooter", "matic", "automatic", "vespa", "scoopy", "beat", "mio", "vario"]):
            vehicle_type = "scooter"
        elif any(word in title for word in ["cub", "kapchai", "underbone", "wave", "supra", "revo"]):
            vehicle_type = "underbone"
        elif any(word in title for word in ["sport", "ninja", "cbr", "r15", "r25", "gsx"]):
            vehicle_type = "sport"
        elif any(word in title for word in ["cruiser", "harley", "rebel", "vulcan"]):
            vehicle_type = "cruiser"
        elif any(word in title for word in ["adventure", "adv", "versys", "vstrom", "tenere"]):
            vehicle_type = "adventure"
        elif any(word in title for word in ["trail", "enduro", "crf", "klx", "wr"]):
            vehicle_type = "offroad"
        else:
            vehicle_type = "standard"

        # Determine segment by engine displacement
        if engine_cc:
            if engine_cc <= 125:
                segment = "small"
            elif engine_cc <= 250:
                segment = "medium"
            elif engine_cc <= 500:
                segment = "mid_large"
            else:
                segment = "large"
        else:
            segment = "unknown"

        return {
            "vehicle_type": vehicle_type,
            "segment": segment,
            "engine_cc": engine_cc,
        }

    def extract_cc(self, listing):
        """Extract engine displacement from listing data."""
        cc_str = listing.get("engine_cc", "")
        if cc_str:
            match = re.search(r"(\d+)\s*cc", str(cc_str), re.IGNORECASE)
            if match:
                return int(match.group(1))
        # Try extracting from the title
        title = listing.get("title", "")
        match = re.search(r"(\d{2,4})\s*cc", title, re.IGNORECASE)
        if match:
            return int(match.group(1))
        # Fall back to known displacements for common model names
        model_cc = {
            "beat": 110, "vario": 125, "scoopy": 110, "mio": 125,
            "nmax": 155, "xmax": 250, "aerox": 155, "pcx": 160,
            "cbr150": 150, "r15": 155, "ninja 250": 250, "ninja 400": 400,
            "cb150": 150, "mt-15": 155, "duke 200": 200, "duke 390": 373,
        }
        title_lower = title.lower()
        for model, cc in model_cc.items():
            if model in title_lower:
                return cc
        return None
```
Price Normalization
Motorcycle prices span a huge range, from $500 used scooters to $30,000+ premium bikes:
```python
class MotorcyclePriceNormalizer:
    CURRENCIES = {
        "ID": {"currency": "IDR", "symbols": ["Rp", "IDR"], "divisor": 1},
        "VN": {"currency": "VND", "symbols": ["₫", "VND", "đ"], "divisor": 1},
        "TH": {"currency": "THB", "symbols": ["฿", "THB", "บาท"], "divisor": 1},
        "PH": {"currency": "PHP", "symbols": ["₱", "PHP"], "divisor": 1},
        "MY": {"currency": "MYR", "symbols": ["RM", "MYR"], "divisor": 1},
        "SG": {"currency": "SGD", "symbols": ["S$", "SGD", "$"], "divisor": 1},
    }

    def normalize(self, price_str, country):
        if not price_str:
            return None
        config = self.CURRENCIES.get(country, {})
        # Strip currency symbols
        clean = price_str
        for symbol in config.get("symbols", []):
            clean = clean.replace(symbol, "")

        # Handle Indonesian abbreviations: "jt"/"juta" = millions, "rb"/"ribu" = thousands
        if country == "ID":
            clean_lower = clean.lower().strip()
            if "jt" in clean_lower or "juta" in clean_lower:
                clean = re.sub(r"(?:jt|juta)", "", clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1_000_000
                except ValueError:
                    return None
            elif "rb" in clean_lower or "ribu" in clean_lower:
                clean = re.sub(r"(?:rb|ribu)", "", clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1_000
                except ValueError:
                    return None

        # Handle Vietnamese pricing: "triệu"/"tr" = millions of dong
        if country == "VN":
            clean_lower = clean.lower().strip()
            if "triệu" in clean_lower or "tr" in clean_lower:
                clean = re.sub(r"(?:triệu|tr)", "", clean_lower).strip()
                try:
                    return float(clean.replace(",", ".")) * 1_000_000
                except ValueError:
                    return None

        # Standard number parsing
        clean = clean.replace(",", "").replace(" ", "").strip()
        try:
            return float(clean)
        except ValueError:
            return None
```
Market Analysis for Two-Wheelers
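Before aggregating anything, it pays to sanity-check price parsing against strings with known values. For instance, the Indonesian abbreviation rule turns "Rp 15,5 jt" into 15,500,000: strip the symbol, drop the "jt" suffix, treat the comma as a decimal separator, and multiply by one million. A standalone re-implementation of just that branch, for quick verification:

```python
import re

def parse_idr_juta(price_str):
    # Mirrors the "jt"/"juta" branch of the normalizer:
    # Indonesian listings often quote prices in millions of rupiah.
    clean = price_str.replace("Rp", "").lower().strip()
    if "jt" in clean or "juta" in clean:
        clean = re.sub(r"(?:jt|juta)", "", clean).strip()
        try:
            return float(clean.replace(",", ".")) * 1_000_000
        except ValueError:
            return None
    return None

print(parse_idr_juta("Rp 15,5 jt"))   # 15500000.0
print(parse_idr_juta("Rp 21 juta"))   # 21000000.0
```

A handful of such fixed-input checks per country catches most parsing regressions before they skew averages.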
Market Segmentation
```python
import statistics
from collections import Counter

import numpy as np

class MotorcycleMarketAnalyzer:
    def analyze_market(self, listings, country):
        classifier = TwoWheelerClassifier()
        normalizer = MotorcyclePriceNormalizer()
        analyzed = []
        for listing in listings:
            classification = classifier.classify(listing)
            price = normalizer.normalize(listing.get("price"), country)
            analyzed.append({
                **listing,
                **classification,
                "price_normalized": price,
            })
        return {
            "total_listings": len(analyzed),
            "by_type": Counter(a["vehicle_type"] for a in analyzed),
            "by_segment": Counter(a["segment"] for a in analyzed),
            # extract_make (brand parsing from titles) is assumed to be available
            "by_make": Counter(self.extract_make(a.get("title", "")) for a in analyzed).most_common(10),
            "price_distribution": self.calculate_price_distribution(analyzed),
            "avg_price_by_type": self.avg_price_by_group(analyzed, "vehicle_type"),
            "avg_price_by_segment": self.avg_price_by_group(analyzed, "segment"),
        }

    def calculate_price_distribution(self, analyzed):
        prices = [a["price_normalized"] for a in analyzed if a.get("price_normalized")]
        if not prices:
            return None
        return {
            "min": min(prices),
            "max": max(prices),
            "mean": statistics.mean(prices),
            "median": statistics.median(prices),
            "p25": np.percentile(prices, 25),
            "p75": np.percentile(prices, 75),
        }

    def avg_price_by_group(self, analyzed, group_key):
        groups = {}
        for item in analyzed:
            group = item.get(group_key, "unknown")
            price = item.get("price_normalized")
            if price:
                groups.setdefault(group, []).append(price)
        return {group: statistics.mean(prices) for group, prices in groups.items() if prices}
```
Brand Market Share
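Both the analyzer above and the function below depend on an extract_make helper that this guide never defines. A minimal keyword-match sketch — the brand list mirrors the Key Brands section, and a production version would also need model-to-brand mapping and fuzzier matching:

```python
# Ordered longest-first so compound names are not shadowed by shorter ones
KNOWN_MAKES = [
    "bmw motorrad", "cfmoto", "kawasaki", "benelli", "piaggio", "ducati",
    "yamaha", "suzuki", "modenas", "honda", "vespa", "bajaj", "hero",
    "tvs", "gpx", "sym", "bmw",
]

def extract_make(title):
    """Return the first known brand found in a listing title, or None."""
    title_lower = title.lower()
    for make in KNOWN_MAKES:
        if make in title_lower:
            return make
    return None

print(extract_make("Dijual Yamaha NMAX 155"))  # yamaha
```

Short names like "tvs" or "sym" are prone to false positives inside longer words, which is one reason to prefer model-to-brand mapping where possible.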
```python
def calculate_brand_market_share(listings):
    makes = Counter()
    for listing in listings:
        make = extract_make(listing.get("title", ""))
        if make:
            makes[make] += 1
    total = sum(makes.values())
    if total == 0:
        return {}
    return {
        make: {"count": count, "share_pct": round(count / total * 100, 1)}
        for make, count in makes.most_common()
    }
```
Building a Comprehensive Two-Wheeler Database
Data Pipeline
```python
import logging
import random
import time

logger = logging.getLogger(__name__)

class MotorcycleDataPipeline:
    def __init__(self, proxy_manager):
        self.proxy_manager = proxy_manager
        self.scrapers = {
            "ID": [OLXMotorcycleScraper(proxy_manager)],
            "VN": [ChoTotMotorcycleScraper(proxy_manager)],
            "MY": [MudahMotorcycleScraper(proxy_manager)],
        }

    def collect_all_markets(self):
        all_data = {}
        for country, scrapers in self.scrapers.items():
            country_listings = []
            for scraper in scrapers:
                try:
                    listings = self.collect_all_pages(scraper, country)
                    country_listings.extend(listings)
                except Exception as e:
                    logger.error(f"Failed collecting from {country}: {e}")
            all_data[country] = country_listings
            logger.info(f"Collected {len(country_listings)} motorcycle listings from {country}")
        return all_data

    def collect_all_pages(self, scraper, country, max_pages=50):
        all_listings = []
        for page in range(1, max_pages + 1):
            listings = scraper.search_motorcycles(page=page)
            if not listings:
                break
            all_listings.extend(listings)
            # Randomized delay between pages to avoid hammering the platform
            time.sleep(random.uniform(2, 5))
        return all_listings
```
Conclusion
The motorcycle and scooter market in Southeast Asia represents one of the largest and most dynamic two-wheeler markets globally. Scraping listing data from platforms across Indonesia, Vietnam, Thailand, Malaysia, the Philippines, and Singapore provides the raw material for market intelligence that drives better decisions for manufacturers, dealers, and investors.
DataResearchTools mobile proxies are the ideal infrastructure for two-wheeler marketplace scraping in Southeast Asia. The mobile-first nature of these markets means mobile proxy traffic blends naturally with legitimate users. With carrier-level IPs across all major SEA countries and the bandwidth to support comprehensive data collection, DataResearchTools enables businesses to build complete two-wheeler market datasets from the region’s leading platforms.
Whether you are a motorcycle manufacturer tracking competitive pricing, a dealer monitoring inventory across the region, or a researcher studying mobility patterns, systematic data collection from two-wheeler marketplaces provides insights that manual monitoring simply cannot match.
Related Reading
- Automotive Inventory Tracking Across Multiple Dealer Websites
- Automotive Review Aggregation Using Proxy Networks
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)