Scraping AutoTrader, Cars.com, and CarGurus with Rotating Proxies
AutoTrader, Cars.com, and CarGurus are the three largest online automotive marketplaces in North America, collectively listing millions of vehicles from thousands of dealerships. These platforms contain invaluable data for market researchers, competitive intelligence teams, automotive startups, and businesses operating across North American and Southeast Asian markets.
This guide provides a detailed technical walkthrough for scraping all three platforms using rotating proxies, covering the unique challenges each platform presents and the strategies that work best in 2026.
Why These Platforms Matter for Global Automotive Data
Even if your primary market is Southeast Asia, data from AutoTrader, Cars.com, and CarGurus serves important purposes:
- Global pricing benchmarks: Compare vehicle values between North American and Southeast Asian markets to identify import/export opportunities
- Market trend indicators: North American pricing trends often precede shifts in other markets by 3-6 months
- Vehicle specification data: These platforms have the most comprehensive vehicle specification databases available
- Dealer behavior analysis: Study how the world’s most mature automotive marketplace operates to inform strategy in emerging markets
Understanding Each Platform’s Defenses
AutoTrader
AutoTrader (autotrader.com) employs multi-layered bot protection:
- Akamai Bot Manager: Advanced JavaScript challenges and device fingerprinting
- Rate limiting: Both per-IP and per-session request limits
- Browser validation: Checks for headless browser indicators
- Cookie-based tracking: Persistent session cookies that track browsing patterns
AutoTrader is one of the most challenging automotive platforms to scrape due to its enterprise-grade security infrastructure.
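Before building a scraper against defenses like these, it helps to detect block pages so retries can trigger early. The sketch below is not from the original guide: the status codes and body markers are generic signals commonly associated with Akamai-style anti-bot responses, not AutoTrader-specific guarantees, so tune them against real responses.

```python
# Common anti-bot block signals: these codes and markers are assumptions
# based on typical Akamai-protected sites, not confirmed AutoTrader values.
BLOCK_STATUS_CODES = {403, 429}
BLOCK_MARKERS = ("access denied", "pardon our interruption", "_abck")

def looks_blocked(status_code, body):
    """Return True if the response looks like an anti-bot block page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)
```

Wiring a check like this into your request loop lets you rotate to a fresh proxy immediately instead of parsing an empty or bogus page.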
Cars.com
Cars.com takes a slightly different approach:
- Cloudflare protection: JavaScript challenges and managed rules
- CAPTCHA integration: reCAPTCHA triggered by suspicious patterns
- Session validation: Requires valid session cookies for detailed listing pages
- API rate limiting: Strict limits on internal API endpoints
CarGurus
CarGurus implements its own protection system:
- Custom bot detection: Proprietary behavioral analysis
- IP reputation scoring: Known datacenter IPs are immediately blocked
- Request pattern analysis: Detects scraping patterns based on request timing and navigation paths
- Geographic validation: Cross-references IP location with search parameters
Proxy Strategy for Each Platform
Rotating Proxy Configuration
The key to scraping these platforms is using rotating proxies that match the expected traffic patterns. Each request should appear to come from a different legitimate user.
For all three platforms, mobile proxies offer the highest success rates because mobile traffic represents a significant and growing portion of automotive searches. DataResearchTools mobile proxies are effective for these sites because mobile IPs carry inherently high trust scores that bypass many of the anti-bot checks these platforms employ.
```python
import time
from uuid import uuid4

class AutoProxyManager:
    def __init__(self, api_key):
        self.base_url = "proxy.dataresearchtools.com"
        self.api_key = api_key

    def get_rotating_proxy(self, country="US"):
        # A fresh session ID per call yields a new exit IP on every request;
        # use the same ID for both schemes so they share one IP
        session_id = uuid4().hex
        proxy_url = f"http://{self.api_key}:country-{country}-session-{session_id}@{self.base_url}:8080"
        return {"http": proxy_url, "https": proxy_url}

    def get_sticky_proxy(self, country="US", duration_minutes=10):
        # Reusing the same session ID keeps the same exit IP for the duration
        session_id = f"sticky-{hash(time.time()) % 10000}"
        proxy_url = (
            f"http://{self.api_key}:country-{country}-session-{session_id}"
            f"-duration-{duration_minutes}@{self.base_url}:8080"
        )
        return {"http": proxy_url, "https": proxy_url}
```
Platform-Specific Proxy Recommendations
| Platform | Recommended Proxy Type | Rotation Strategy | Success Rate |
|---|---|---|---|
| AutoTrader | Mobile | Per-request rotation | 92-96% |
| Cars.com | Mobile or Residential | Per-request rotation | 90-95% |
| CarGurus | Mobile | Sticky (5-10 min) | 93-97% |
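The scraper examples in the following sections call a few small helper functions (`get_random_mobile_ua`, `get_random_ua`, `safe_text`, `safe_attr`, `safe_text_soup`) that are not defined in the snippets themselves. A minimal sketch is below; the user-agent strings are placeholders that should be replaced with a larger, regularly refreshed pool.

```python
import random

# Example UA strings only -- rotate a larger, current pool in production
MOBILE_UAS = [
    "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Mobile/15E148 Safari/604.1",
]
DESKTOP_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
]

def get_random_mobile_ua():
    return random.choice(MOBILE_UAS)

def get_random_ua():
    return random.choice(MOBILE_UAS + DESKTOP_UAS)

def safe_text(element, selector):
    """Playwright: inner text of the first match, or None if absent."""
    node = element.query_selector(selector)
    return node.inner_text().strip() if node else None

def safe_attr(element, selector, attr):
    """Playwright: attribute value of the first match, or None if absent."""
    node = element.query_selector(selector)
    return node.get_attribute(attr) if node else None

def safe_text_soup(card, selector):
    """BeautifulSoup counterpart used in the CarGurus parser."""
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None
```

The `safe_*` wrappers exist so that a missing element yields `None` in the output dict instead of raising mid-scrape.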
Scraping AutoTrader
Approach: Headless Browser with Stealth
AutoTrader’s Akamai protection requires a stealth browser approach:
```python
import random
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_autotrader(proxy_manager, search_params):
    with sync_playwright() as p:
        proxy = proxy_manager.get_rotating_proxy("US")
        # Playwright takes proxy credentials separately from the server address
        credentials, server = proxy["http"].replace("http://", "").split("@")
        username, password = credentials.split(":", 1)
        browser = p.chromium.launch(
            proxy={"server": f"http://{server}", "username": username, "password": password},
            headless=True,
        )
        context = browser.new_context(
            user_agent=get_random_mobile_ua(),
            viewport={"width": 412, "height": 915},
            device_scale_factor=2.625,
        )
        page = context.new_page()
        stealth_sync(page)

        # Build the search URL, including the pagination offset if one is set
        make = search_params.get("make", "")
        model = search_params.get("model", "")
        zip_code = search_params.get("zip", "90210")
        first_record = search_params.get("firstRecord", 0)
        url = (
            f"https://www.autotrader.com/cars-for-sale/all-cars/{make}/{model}"
            f"?zip={zip_code}&firstRecord={first_record}"
        )
        page.goto(url, wait_until="networkidle")

        # Wait for listings to render
        page.wait_for_selector('[data-cmp="inventoryListing"]', timeout=15000)

        listings = []
        for item in page.query_selector_all('[data-cmp="inventoryListing"]'):
            try:
                listings.append(extract_autotrader_listing(item))
            except Exception:
                continue
        browser.close()
        return listings

def extract_autotrader_listing(element):
    return {
        "title": safe_text(element, 'h2'),
        "price": safe_text(element, '[data-cmp="firstPrice"]'),
        "mileage": safe_text(element, '[class*="mileage"]'),
        "dealer": safe_text(element, '[data-cmp="dealerName"]'),
        "location": safe_text(element, '[class*="dealer-location"]'),
        "link": safe_attr(element, 'a', 'href'),
    }
```
Handling AutoTrader’s Pagination
AutoTrader loads results in pages of 25 listings. Navigate through pages systematically:
```python
import random
import time

def scrape_autotrader_all_pages(proxy_manager, search_params, max_pages=20):
    all_listings = []
    for page_num in range(1, max_pages + 1):
        # AutoTrader paginates with a firstRecord offset, 25 listings per page
        search_params["firstRecord"] = (page_num - 1) * 25
        listings = scrape_autotrader(proxy_manager, search_params)
        if not listings:
            break
        all_listings.extend(listings)
        # Randomized delay between pages to avoid a detectable request rhythm
        time.sleep(random.uniform(3, 7))
    return all_listings
```
Scraping Cars.com
Approach: API Interception
Cars.com’s search results are loaded via internal API calls that return structured JSON data. Intercepting these calls is more efficient than parsing HTML:
```python
from playwright.sync_api import sync_playwright

def scrape_carscom(proxy_manager, search_params):
    with sync_playwright() as p:
        proxy = proxy_manager.get_rotating_proxy("US")
        credentials, server = proxy["http"].replace("http://", "").split("@")
        username, password = credentials.split(":", 1)
        browser = p.chromium.launch(
            proxy={"server": f"http://{server}", "username": username, "password": password}
        )
        context = browser.new_context(user_agent=get_random_ua())
        page = context.new_page()

        # Capture the internal search API responses as the page loads
        api_responses = []

        def handle_response(response):
            if "/api/searchresults" in response.url:
                try:
                    api_responses.append(response.json())
                except Exception:
                    pass

        page.on("response", handle_response)

        make = search_params.get("make", "")
        url = f"https://www.cars.com/shopping/results/?stock_type=all&makes[]={make}"
        page.goto(url, wait_until="networkidle")
        browser.close()

    if api_responses:
        return parse_carscom_api_response(api_responses[0])
    return []

def parse_carscom_api_response(data):
    listings = []
    for vehicle in data.get("listings", []):
        listings.append({
            "title": vehicle.get("title"),
            "price": vehicle.get("price"),
            "mileage": vehicle.get("mileage"),
            "dealer_name": vehicle.get("dealer", {}).get("name"),
            "location": vehicle.get("dealer", {}).get("city"),
            "vin": vehicle.get("vin"),
            "url": vehicle.get("url"),
        })
    return listings
```
Direct API Approach
If you can replicate the necessary headers and cookies, direct API calls are faster:
```python
import requests

def query_carscom_api(proxy, make, model, zip_code):
    url = "https://www.cars.com/api/searchresults/"
    params = {
        "makes[]": make,
        "models[]": f"{make}-{model}",
        "zip": zip_code,
        "stock_type": "all",
        "per_page": 100,
    }
    headers = {
        "User-Agent": get_random_ua(),
        "Accept": "application/json",
        "Referer": "https://www.cars.com/shopping/results/",
    }
    response = requests.get(url, params=params, headers=headers, proxies=proxy, timeout=30)
    response.raise_for_status()
    return response.json()
```
Scraping CarGurus
Approach: Structured Search with Sticky Sessions
CarGurus works best with sticky sessions that simulate a real user browsing through search results:
```python
import random
import time

import requests
from bs4 import BeautifulSoup

def scrape_cargurus(proxy_manager, search_params):
    proxy = proxy_manager.get_sticky_proxy("US", duration_minutes=10)
    session = requests.Session()
    session.proxies.update(proxy)
    session.headers.update({
        "User-Agent": get_random_mobile_ua(),
        "Accept-Language": "en-US,en;q=0.9",
    })

    # Visit the homepage first to collect session cookies like a real user
    session.get("https://www.cargurus.com/")
    time.sleep(random.uniform(1, 3))

    # Build the search request
    make = search_params.get("make", "")
    url = "https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action"
    params = {
        "entitySelectingHelper.selectedEntity": make,
        "zip": search_params.get("zip", "90210"),
        "distance": 50,
        "sortDir": "ASC",
        "sortType": "DEAL_SCORE",
    }
    response = session.get(url, params=params)
    if response.status_code == 200:
        return parse_cargurus_html(response.text)
    return []

def parse_cargurus_html(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    listings = []
    for card in soup.select('[data-cg-ft="car-blade"]'):
        listings.append({
            "title": safe_text_soup(card, 'h4'),
            "price": safe_text_soup(card, '[class*="price"]'),
            "deal_rating": safe_text_soup(card, '[class*="deal-rating"]'),
            "mileage": safe_text_soup(card, '[class*="mileage"]'),
            "dealer": safe_text_soup(card, '[class*="dealer-name"]'),
            "days_on_market": safe_text_soup(card, '[class*="days-on-market"]'),
        })
    return listings
```
CarGurus Deal Rating Data
CarGurus is unique in providing deal ratings (Great Deal, Good Deal, Fair Deal, etc.) that represent their algorithmic assessment of listing value. This data is particularly valuable for building price intelligence tools:
```python
def extract_deal_ratings(listings):
    deal_distribution = {"great": 0, "good": 0, "fair": 0, "high": 0, "overpriced": 0}
    for listing in listings:
        # deal_rating may be None when the selector found nothing
        rating = (listing.get("deal_rating") or "").lower()
        for key in deal_distribution:
            if key in rating:
                deal_distribution[key] += 1
                break
    return deal_distribution
```
Cross-Platform Data Aggregation
Unified Data Schema
Normalize data from all three platforms into a single schema:
```python
class UnifiedListing:
    def __init__(self):
        self.source_platform = None
        self.source_url = None
        self.title = None
        self.make = None
        self.model = None
        self.year = None
        self.trim = None
        self.price = None
        self.mileage = None
        self.vin = None
        self.dealer_name = None
        self.dealer_location = None
        self.listing_date = None
        self.scraped_at = None
```
VIN-Based Deduplication
When the same vehicle appears on multiple platforms, use VIN to identify duplicates and compare pricing:
```python
def find_cross_platform_listings(listings):
    vin_map = {}
    for listing in listings:
        if listing.vin:
            vin_map.setdefault(listing.vin, []).append(listing)
    # Keep only VINs that appear on more than one platform
    return {vin: entries for vin, entries in vin_map.items() if len(entries) > 1}
```
Error Handling and Retry Logic
Scraping at scale requires robust error handling:
```python
import random
import time

import requests

class ScraperWithRetry:
    def __init__(self, proxy_manager, max_retries=3):
        self.proxy_manager = proxy_manager
        self.max_retries = max_retries

    def scrape_with_retry(self, scrape_func, *args):
        for _ in range(self.max_retries):
            try:
                result = scrape_func(self.proxy_manager, *args)
                if result:
                    return result
            except requests.exceptions.ProxyError:
                # The next attempt pulls a fresh proxy from the manager
                continue
            except Exception as e:
                if "captcha" in str(e).lower():
                    # Back off, then retry with a fresh mobile proxy
                    time.sleep(random.uniform(5, 10))
                    continue
                raise
        return None
```
Performance Optimization
Concurrent Scraping
Run scrapers for different platforms simultaneously:
```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all_platforms(proxy_manager, search_params):
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(scrape_autotrader, proxy_manager, search_params): "autotrader",
            executor.submit(scrape_carscom, proxy_manager, search_params): "carscom",
            executor.submit(scrape_cargurus, proxy_manager, search_params): "cargurus",
        }
        results = {}
        for future in futures:
            platform = futures[future]
            try:
                results[platform] = future.result(timeout=60)
            except Exception as e:
                results[platform] = {"error": str(e)}
        return results
```
Caching Strategy
Cache listing detail pages that rarely change (vehicle specifications) while re-fetching prices frequently:
```python
import time

class ListingCache:
    def __init__(self, cache_duration_hours=24):
        self.cache = {}
        self.cache_duration = cache_duration_hours * 3600

    def get_or_fetch(self, listing_id, fetch_func):
        # Serve from cache while fresh; otherwise fetch and re-cache
        if listing_id in self.cache:
            cached_at, data = self.cache[listing_id]
            if time.time() - cached_at < self.cache_duration:
                return data
        data = fetch_func(listing_id)
        self.cache[listing_id] = (time.time(), data)
        return data
```
Connecting North American and Southeast Asian Data
For businesses operating across both markets, combine data from AutoTrader, Cars.com, and CarGurus with Southeast Asian sources:
- Import opportunity detection: Find vehicles priced significantly lower in North America than in Southeast Asian markets
- Model availability comparison: Track which models are available in each market
- Feature-level pricing: Compare how specific features affect pricing in different markets
- Market maturity analysis: Use the depth of North American data to forecast Southeast Asian market development
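The first of these analyses, import opportunity detection, can be sketched as a simple comparison of normalized prices for the same make/model/year in each market. The field names (`make`, `model`, `year`, `price_usd`) and the 20% default gap below are hypothetical; they assume both datasets have already been normalized into a common schema and currency.

```python
def find_import_opportunities(na_listings, sea_listings, min_gap=0.20):
    """Return (key, na_price, sea_price) tuples where the median North
    American price sits at least `min_gap` below the Southeast Asian one."""

    def median(values):
        values = sorted(values)
        mid = len(values) // 2
        return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

    def median_price_by_vehicle(listings):
        # Group prices by (make, model, year), then take the median per group
        grouped = {}
        for item in listings:
            key = (item["make"], item["model"], item["year"])
            grouped.setdefault(key, []).append(item["price_usd"])
        return {key: median(prices) for key, prices in grouped.items()}

    na_prices = median_price_by_vehicle(na_listings)
    sea_prices = median_price_by_vehicle(sea_listings)
    opportunities = []
    for key, na_price in na_prices.items():
        sea_price = sea_prices.get(key)
        if sea_price and na_price <= sea_price * (1 - min_gap):
            opportunities.append((key, na_price, sea_price))
    return opportunities
```

Medians rather than means keep a handful of mispriced or salvage listings from skewing the comparison.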
DataResearchTools supports this cross-regional analysis by providing proxy infrastructure for both North American and Southeast Asian markets, allowing you to run unified data collection pipelines across all target geographies.
Conclusion
Scraping AutoTrader, Cars.com, and CarGurus requires sophisticated proxy infrastructure and platform-specific scraping strategies. Each platform has unique anti-bot measures that demand different approaches, from headless browsers with stealth plugins to API interception and sticky session management.
The rotating mobile proxies from DataResearchTools provide the high trust scores needed to maintain consistent access to these platforms. Combined with proper rate limiting, realistic browsing patterns, and robust error handling, you can build reliable data pipelines that deliver comprehensive automotive market intelligence from the world’s largest car listing platforms.
Related Reading
- Automotive Inventory Tracking Across Multiple Dealer Websites
- Automotive Review Aggregation Using Proxy Networks
- aiohttp + BeautifulSoup: Async Python Scraping
- How to Scrape AliExpress Product Data Without Getting Blocked
- Amazon Buy Box Monitoring: Proxy Setup for Continuous Tracking
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)