How to Scrape Used Car Listings from Carousell and Mudah
Carousell and Mudah are two of the largest online classifieds platforms in Southeast Asia. Together, they host millions of used car listings across Singapore, Malaysia, the Philippines, Indonesia, and beyond. For automotive researchers, dealers, and data-driven businesses, these platforms represent an invaluable source of real-time market data.
This guide walks you through the process of scraping used car listings from both platforms, covering technical setup, proxy configuration, data extraction strategies, and common challenges you will encounter along the way.
Why Scrape Carousell and Mudah for Vehicle Data
Market Intelligence
Used car pricing in Southeast Asia is highly dynamic. Prices fluctuate with supply, demand, government policies such as Singapore's Certificate of Entitlement (COE) system, currency movements, and seasonal trends. Scraping these platforms gives you access to real-time pricing data that neither platform exposes through an official public API.
Competitive Analysis
Car dealers and resellers use scraped listing data to understand competitor pricing, identify undervalued vehicles, and spot inventory gaps in the market. A dealer in Kuala Lumpur can monitor every Mudah listing in their area to ensure their prices remain competitive.
Research and Analytics
Researchers studying automotive market trends, depreciation curves, and consumer behavior rely on large datasets that can only be built through systematic data collection from these platforms.
Understanding the Technical Landscape
Carousell Architecture
Carousell operates as a mobile-first platform with a web interface. Key technical characteristics include:
- Single Page Application (SPA): Built with React, Carousell renders content dynamically using JavaScript. This means traditional HTTP scraping may miss content that loads asynchronously.
- GraphQL API: Carousell uses GraphQL for its internal API, which can be accessed directly once you identify the correct endpoints and query structures.
- Anti-bot measures: Rate limiting, device fingerprinting, and session validation are all employed to detect automated access.
- Mobile API: The mobile app communicates through dedicated API endpoints that often return cleaner, more structured data than the web interface.
Mudah Architecture
Mudah (mudah.my) serves primarily the Malaysian market with a more traditional web architecture:
- Server-side rendering: Much of Mudah’s content is rendered server-side, making basic HTTP scraping more straightforward.
- Structured data: Listings include structured metadata that can be parsed from HTML without JavaScript rendering.
- Rate limiting: Mudah implements IP-based rate limiting that triggers after sustained request volumes.
- Regional filtering: Listings are organized by Malaysian states and regions, with location data embedded in URLs.
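As a quick illustration of the URL-based regional filtering, listing pages can be built per state. The path pattern below is an assumption inferred from Mudah's URL structure; verify the state slugs against live pages before relying on them:

```python
# Sketch: building region-scoped Mudah listing URLs.
# The /<state-slug>/cars-for-sale?o=<page> pattern is an assumption;
# confirm it against the live site.

BASE = "https://www.mudah.my"

def region_url(state_slug: str, page: int = 1) -> str:
    """Build a paginated car-listing URL for one Malaysian state."""
    return f"{BASE}/{state_slug}/cars-for-sale?o={page}"

urls = [region_url(s) for s in ("selangor", "kuala-lumpur", "penang")]
```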
Setting Up Your Proxy Infrastructure
Both Carousell and Mudah will block your IP address if you send too many requests without proxies. For reliable scraping, you need a proxy solution that meets these requirements:
Geographic Requirements
- Carousell: Use proxies from Singapore, Malaysia, the Philippines, or Indonesia depending on which country’s listings you need. Carousell shows different listings based on your apparent location.
- Mudah: Malaysian proxies are essential since Mudah serves the Malaysian market. Singapore proxies may also work, but can trigger additional verification.
Proxy Type Recommendations
For Carousell, mobile proxies from DataResearchTools are the optimal choice. Since Carousell is mobile-first, traffic from mobile IPs matches the platform's expected user behavior. Mobile proxies are also far less likely to trip the device fingerprinting checks that flag datacenter and residential IPs.
For Mudah, residential proxies work well for basic listing scraping, while mobile proxies are recommended for higher-volume operations or when accessing listing details that require session continuity.
Configuration Example
```python
from dataresearchtools_proxy import ProxyManager

# Configure proxies for Carousell (Singapore)
carousell_proxy = ProxyManager(
    provider="dataresearchtools",
    country="SG",
    proxy_type="mobile",
    rotation="per_request"
)

# Configure proxies for Mudah (Malaysia)
mudah_proxy = ProxyManager(
    provider="dataresearchtools",
    country="MY",
    proxy_type="mobile",
    rotation="sticky",
    session_duration=300  # 5-minute sticky sessions
)
```

Scraping Carousell Used Car Listings
Method 1: Web Scraping with Headless Browser
Because Carousell uses client-side rendering, a headless browser approach is often necessary:
```python
from playwright.sync_api import sync_playwright

def scrape_carousell_cars(proxy_config):
    with sync_playwright() as p:
        browser = p.chromium.launch(
            proxy={
                "server": proxy_config["server"],
                "username": proxy_config["username"],
                "password": proxy_config["password"],
            }
        )
        page = browser.new_page()
        page.set_extra_http_headers({
            "Accept-Language": "en-SG,en;q=0.9"
        })

        # Navigate to car listings
        page.goto("https://www.carousell.sg/categories/cars-159/")
        page.wait_for_selector('[data-testid="listing-card"]')

        listings = []
        cards = page.query_selector_all('[data-testid="listing-card"]')
        for card in cards:
            listing = {
                "title": card.query_selector("p").inner_text(),
                "price": card.query_selector('[data-testid="listing-price"]').inner_text(),
                "url": card.query_selector("a").get_attribute("href"),
            }
            listings.append(listing)

        browser.close()
        return listings
```

Method 2: GraphQL API Direct Access
For faster and more efficient data collection, you can query Carousell’s GraphQL API directly:
```python
import requests

def query_carousell_api(proxy, search_params):
    url = "https://www.carousell.sg/api/2.0/graphql/"
    query = {
        "operationName": "searchProducts",
        "variables": {
            "categoryId": 159,  # Cars category
            "country": "SG",
            "count": 40,
            "filters": search_params,
        },
    }
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
    }
    response = requests.post(url, json=query, headers=headers, proxies=proxy)
    return response.json()
```

Data Points Available from Carousell
Each Carousell car listing typically contains:
- Vehicle make and model
- Year of manufacture
- Asking price
- Mileage (odometer reading)
- Transmission type
- Fuel type
- COE expiry date (Singapore listings)
- Seller type (dealer or individual)
- Listing date and last updated time
- Location
- Photos (URLs)
- Description text
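These fields map naturally onto a typed record for downstream processing. The sketch below uses illustrative field names of my own choosing, not Carousell's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CarListing:
    """Typed record for one scraped listing; field names are illustrative."""
    platform: str                       # "carousell" or "mudah"
    listing_id: str
    title: str
    price: Optional[float] = None       # normalized numeric price
    year: Optional[int] = None
    mileage_km: Optional[int] = None
    transmission: Optional[str] = None
    fuel_type: Optional[str] = None
    seller_type: Optional[str] = None   # "dealer" or "individual"
    location: Optional[str] = None
    url: Optional[str] = None
    photos: list = field(default_factory=list)
```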
Scraping Mudah Used Car Listings
Basic HTTP Scraping
Mudah’s server-rendered pages make basic HTTP scraping viable:
```python
import requests
from bs4 import BeautifulSoup

def scrape_mudah_cars(proxy, page_num=1):
    url = f"https://www.mudah.my/malaysia/cars-for-sale?o={page_num}"
    headers = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 12; SM-G991B)",
        "Accept-Language": "en-MY,en;q=0.9,ms;q=0.8",
    }
    response = requests.get(url, headers=headers, proxies=proxy)
    soup = BeautifulSoup(response.text, "html.parser")

    listings = []
    for item in soup.select(".listing-item"):
        listing = {
            "title": item.select_one(".listing-title").get_text(strip=True),
            "price": item.select_one(".listing-price").get_text(strip=True),
            "location": item.select_one(".listing-location").get_text(strip=True),
            "year": item.select_one(".listing-year").get_text(strip=True),
            "url": item.select_one("a")["href"],
        }
        listings.append(listing)
    return listings
```

Paginating Through Results
Mudah organizes listings with pagination. To collect comprehensive data, you need to iterate through all available pages:
```python
import random
import time

def scrape_all_mudah_listings(proxy_manager):
    all_listings = []
    page = 1
    while True:
        proxy = proxy_manager.get_proxy()
        listings = scrape_mudah_cars(proxy, page)
        if not listings:
            break
        all_listings.extend(listings)
        page += 1
        # Respectful delay between pages
        time.sleep(random.uniform(2, 5))
    return all_listings
```

Data Points Available from Mudah
Mudah car listings typically include:
- Vehicle make, model, and variant
- Year of manufacture
- Price in Malaysian Ringgit
- Mileage
- Transmission type
- Fuel type
- Body type
- Color
- Seller location (state and area)
- Seller type (dealer or owner)
- Listing date
- Contact information
- Photos
Handling Common Challenges
Challenge 1: Dynamic Content Loading
Carousell uses infinite scroll to load additional listings. When using headless browsers, you need to simulate scrolling:
```python
async def scroll_and_collect(page, max_listings=200):
    collected = 0
    previous_count = 0
    cards = []
    while collected < max_listings:
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await page.wait_for_timeout(2000)
        cards = await page.query_selector_all('[data-testid="listing-card"]')
        collected = len(cards)
        if collected == previous_count:
            break  # No new listings loaded
        previous_count = collected
    return cards
```

Challenge 2: CAPTCHA Encounters
Both platforms may present CAPTCHAs during scraping. Strategies to minimize CAPTCHA frequency:
- Use mobile proxies from DataResearchTools, which carry high trust scores and rarely trigger CAPTCHAs
- Implement realistic browsing patterns with variable delays
- Rotate user agents alongside proxy rotation
- When a CAPTCHA appears, switch to a new IP immediately rather than attempting to solve it
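A minimal sketch of rotating the user agent alongside the proxy, assuming a `proxy_manager` object with a `get_proxy()` method as in the pagination example; the user-agent pool is illustrative:

```python
import random

# Illustrative user-agent pool; any set of realistic mobile strings works.
USER_AGENTS = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
    "Mozilla/5.0 (Linux; Android 12; SM-G991B)",
    "Mozilla/5.0 (Linux; Android 13; Pixel 7)",
]

def next_identity(proxy_manager):
    """Return a (proxy, headers) pair so the IP and user agent rotate together."""
    proxy = proxy_manager.get_proxy()
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-SG,en;q=0.9",
    }
    return proxy, headers
```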
Challenge 3: Price Format Variations
Car prices on these platforms come in various formats that need normalization:
```python
import re

def normalize_price(price_text, currency="SGD"):
    # Strip currency symbols, commas, and whitespace, keeping digits and dots
    clean = re.sub(r'[^\d.]', '', price_text)
    if clean:
        return float(clean)
    return None
```

Challenge 4: Listing Deduplication
Both platforms allow sellers to relist vehicles, creating duplicates. Implement deduplication based on:
- Vehicle images (perceptual hashing)
- Seller ID combined with vehicle description
- Price and specification matching algorithms
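A minimal sketch of the second strategy (seller ID combined with description); perceptual image hashing would need a dedicated library and is omitted here. The field names assume the listing dictionaries carry `seller_id` and `description` keys:

```python
import hashlib

def listing_fingerprint(listing: dict) -> str:
    """Fingerprint a listing by seller ID plus normalized description text."""
    # Lowercase and collapse whitespace so trivial relist edits still match
    desc = " ".join(listing.get("description", "").lower().split())
    raw = f"{listing.get('seller_id', '')}|{desc}"
    return hashlib.sha256(raw.encode()).hexdigest()

def deduplicate(listings):
    """Keep only the first listing seen for each fingerprint."""
    seen, unique = set(), []
    for item in listings:
        fp = listing_fingerprint(item)
        if fp not in seen:
            seen.add(fp)
            unique.append(item)
    return unique
```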
Building a Complete Data Pipeline
A production-grade pipeline for Carousell and Mudah scraping should include these components:
Scheduler
Run your scrapers at regular intervals, typically every 4-6 hours for active markets. Avoid peak hours when anti-bot systems are most sensitive.
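A bare-bones scheduler sketch along those lines; the peak-hour window is an assumption to tune against observed traffic, and the jitter keeps runs from landing at identical clock times:

```python
import random
import time
from datetime import datetime

def should_run(hour, peak_hours=range(19, 23)):
    """Skip the assumed evening peak window; tune per observed traffic."""
    return hour not in peak_hours

def run_on_schedule(scrape_fn, interval_hours=5):
    """Call scrape_fn roughly every interval_hours, off-peak only."""
    while True:
        if should_run(datetime.now().hour):
            scrape_fn()
        # +/- 10 minutes of jitter around the nominal interval
        time.sleep(interval_hours * 3600 + random.uniform(-600, 600))
```

In production you would more likely hand this to cron or a task queue, but the off-peak check and jitter carry over either way.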
Data Storage
Store raw scraped data in a structured format. A database schema for car listings might include:
- Listing ID (platform-specific)
- Platform source
- Vehicle make and model
- Year
- Price
- Mileage
- Location
- Seller type
- Scrape timestamp
- Listing URL
- Raw HTML or JSON (for reprocessing)
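One way to express that schema, sketched here with SQLite for portability; the column names are illustrative, and the composite key lets the same listing appear once per scrape cycle:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS listings (
    listing_id   TEXT NOT NULL,        -- platform-specific ID
    platform     TEXT NOT NULL,        -- 'carousell' or 'mudah'
    make         TEXT,
    model        TEXT,
    year         INTEGER,
    price        REAL,
    mileage_km   INTEGER,
    location     TEXT,
    seller_type  TEXT,
    scraped_at   TEXT NOT NULL,        -- ISO-8601 timestamp
    listing_url  TEXT,
    raw_payload  TEXT,                 -- raw HTML/JSON for reprocessing
    PRIMARY KEY (platform, listing_id, scraped_at)
);
"""

conn = sqlite3.connect(":memory:")  # swap for a file path in production
conn.execute(SCHEMA)
```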
Change Detection
Track price changes, new listings, and removed listings between scrape cycles. This change data is often more valuable than the raw listings themselves.
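Change detection can be as simple as diffing two `{listing_id: price}` snapshots from consecutive cycles:

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Compare {listing_id: price} maps from two consecutive scrape cycles."""
    new_ids = current.keys() - previous.keys()
    removed_ids = previous.keys() - current.keys()
    price_changes = {
        lid: (previous[lid], current[lid])
        for lid in previous.keys() & current.keys()
        if previous[lid] != current[lid]
    }
    return {"new": new_ids, "removed": removed_ids, "price_changes": price_changes}
```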
Data Validation
Implement validation rules to filter out obviously incorrect data such as unrealistic prices, impossible mileage figures, or incomplete listings.
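A sketch of such validation rules; the thresholds are illustrative and should be tuned per market and currency:

```python
def is_valid_listing(listing: dict) -> bool:
    """Reject listings with missing fields or implausible values.
    Thresholds are illustrative; tune them per market."""
    price = listing.get("price")
    year = listing.get("year")
    mileage = listing.get("mileage_km")
    if not listing.get("title") or price is None:
        return False
    if not (1000 <= price <= 5_000_000):          # plausible SGD/MYR range
        return False
    if year is not None and not (1980 <= year <= 2026):
        return False
    if mileage is not None and not (0 <= mileage <= 1_000_000):
        return False
    return True
```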
Scaling Your Operation
As your data needs grow, consider these scaling strategies:
- Parallel scraping: Run multiple scraper instances simultaneously using different proxy sessions from DataResearchTools
- Geographic distribution: Scrape different regions in parallel rather than sequentially
- Incremental updates: After your initial full scrape, switch to monitoring only new and changed listings
- API preference: Where available, prefer API access over HTML scraping for higher throughput and cleaner data
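The parallel-scraping idea can be sketched with a thread pool; `scrape_region` stands in for any of the scraper functions above, and in practice each worker would hold its own sticky proxy session:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_regions_parallel(scrape_region, regions, max_workers=4):
    """Fan one scraper function out across regions, one region per worker."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_region = list(pool.map(scrape_region, regions))
    # Flatten the per-region result lists into one combined list
    return [listing for region_result in per_region for listing in region_result]
```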
Conclusion
Scraping used car listings from Carousell and Mudah provides access to rich automotive market data across Southeast Asia. The key to successful, sustainable scraping lies in using the right proxy infrastructure, implementing respectful scraping patterns, and building robust data pipelines.
DataResearchTools mobile proxies are particularly effective for these platforms because they match the mobile-first user behavior that both Carousell and Mudah expect. With proper proxy rotation, geographic targeting, and session management, you can build comprehensive used car market datasets that power pricing intelligence, competitive analysis, and market research across the region.
Related Reading
- Automotive Inventory Tracking Across Multiple Dealer Websites
- Automotive Review Aggregation Using Proxy Networks
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)