Scraping AutoTrader, Cars.com, and CarGurus with Rotating Proxies

AutoTrader, Cars.com, and CarGurus are the three largest online automotive marketplaces in North America, collectively listing millions of vehicles from thousands of dealerships. These platforms contain invaluable data for market researchers, competitive intelligence teams, automotive startups, and businesses expanding across North American and Southeast Asian markets.

This guide provides a detailed technical walkthrough for scraping all three platforms using rotating proxies, covering the unique challenges each platform presents and the strategies that work best in 2026.

Why These Platforms Matter for Global Automotive Data

Even if your primary market is Southeast Asia, data from AutoTrader, Cars.com, and CarGurus serves important purposes:

  • Global pricing benchmarks: Compare vehicle values between North American and Southeast Asian markets to identify import/export opportunities
  • Market trend indicators: North American pricing trends often precede shifts in other markets by 3-6 months
  • Vehicle specification data: These platforms have the most comprehensive vehicle specification databases available
  • Dealer behavior analysis: Study how the world’s most mature automotive marketplace operates to inform strategy in emerging markets

Understanding Each Platform’s Defenses

AutoTrader

AutoTrader (autotrader.com) employs multi-layered bot protection:

  • Akamai Bot Manager: Advanced JavaScript challenges and device fingerprinting
  • Rate limiting: Both per-IP and per-session request limits
  • Browser validation: Checks for headless browser indicators
  • Cookie-based tracking: Persistent session cookies that track browsing patterns

AutoTrader is one of the most challenging automotive platforms to scrape due to its enterprise-grade security infrastructure.

Cars.com

Cars.com takes a slightly different approach:

  • Cloudflare protection: JavaScript challenges and managed rules
  • CAPTCHA integration: reCAPTCHA triggered by suspicious patterns
  • Session validation: Requires valid session cookies for detailed listing pages
  • API rate limiting: Strict limits on internal API endpoints

CarGurus

CarGurus implements its own protection system:

  • Custom bot detection: Proprietary behavioral analysis
  • IP reputation scoring: Known datacenter IPs are immediately blocked
  • Request pattern analysis: Detects scraping patterns based on request timing and navigation paths
  • Geographic validation: Cross-references IP location with search parameters

Proxy Strategy for Each Platform

Rotating Proxy Configuration

The key to scraping these platforms is using rotating proxies that match the expected traffic patterns. Each request should appear to come from a different legitimate user.

For all three platforms, mobile proxies offer the highest success rates because mobile traffic represents a significant and growing portion of automotive searches. DataResearchTools mobile proxies are effective for these sites because mobile IPs carry inherently high trust scores that bypass many of the anti-bot checks these platforms employ.

import time
from uuid import uuid4


class AutoProxyManager:
    def __init__(self, api_key):
        self.base_url = "proxy.dataresearchtools.com"
        self.api_key = api_key

    def get_rotating_proxy(self, country="US"):
        # One session id per call: each request exits from a fresh IP, and
        # http/https share the same session rather than getting two different ones
        session_id = uuid4()
        return {
            "http": f"http://{self.api_key}:country-{country}-session-{session_id}@{self.base_url}:8080",
            "https": f"http://{self.api_key}:country-{country}-session-{session_id}@{self.base_url}:8080",
        }

    def get_sticky_proxy(self, country="US", duration_minutes=10):
        # A stable session id pins the same exit IP for the requested duration
        session_id = f"sticky-{uuid4().hex[:8]}"
        return {
            "http": f"http://{self.api_key}:country-{country}-session-{session_id}-duration-{duration_minutes}@{self.base_url}:8080",
            "https": f"http://{self.api_key}:country-{country}-session-{session_id}-duration-{duration_minutes}@{self.base_url}:8080",
        }

Platform-Specific Proxy Recommendations

Platform     Recommended Proxy Type    Rotation Strategy       Success Rate
AutoTrader   Mobile                    Per-request rotation    92-96%
Cars.com     Mobile or Residential     Per-request rotation    90-95%
CarGurus     Mobile                    Sticky (5-10 min)       93-97%

Scraping AutoTrader

Approach: Headless Browser with Stealth

AutoTrader’s Akamai protection requires a stealth browser approach:

from urllib.parse import urlparse

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

def scrape_autotrader(proxy_manager, search_params):
    with sync_playwright() as p:
        proxy = proxy_manager.get_rotating_proxy("US")
        # Playwright takes proxy credentials separately, so split them out of
        # the proxy URL instead of stripping them (which would break auth)
        parsed = urlparse(proxy["http"])
        browser = p.chromium.launch(
            proxy={
                "server": f"http://{parsed.hostname}:{parsed.port}",
                "username": parsed.username,
                "password": parsed.password,
            },
            headless=True
        )

        context = browser.new_context(
            user_agent=get_random_mobile_ua(),
            viewport={"width": 412, "height": 915},  # Pixel-class mobile viewport
            device_scale_factor=2.625,
        )

        page = context.new_page()
        stealth_sync(page)

        # Build search URL (firstRecord is the pagination offset)
        make = search_params.get("make", "")
        model = search_params.get("model", "")
        zip_code = search_params.get("zip", "90210")
        first_record = search_params.get("firstRecord", 0)

        url = (
            f"https://www.autotrader.com/cars-for-sale/all-cars/{make}/{model}"
            f"?zip={zip_code}&firstRecord={first_record}"
        )
        page.goto(url, wait_until="networkidle")

        # Wait for listings to render
        page.wait_for_selector('[data-cmp="inventoryListing"]', timeout=15000)

        listings = []
        for item in page.query_selector_all('[data-cmp="inventoryListing"]'):
            try:
                listings.append(extract_autotrader_listing(item))
            except Exception:
                continue

        browser.close()
        return listings

def extract_autotrader_listing(element):
    return {
        "title": safe_text(element, 'h2'),
        "price": safe_text(element, '[data-cmp="firstPrice"]'),
        "mileage": safe_text(element, '[class*="mileage"]'),
        "dealer": safe_text(element, '[data-cmp="dealerName"]'),
        "location": safe_text(element, '[class*="dealer-location"]'),
        "link": safe_attr(element, 'a', 'href'),
    }
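The snippets above lean on a few small helpers (get_random_mobile_ua, safe_text, safe_attr) that are never defined in this guide. A minimal sketch of them follows; the user-agent strings and helper names are our own placeholders, not anything AutoTrader-specific:

```python
import random

# Illustrative mobile user agents; rotate a larger, up-to-date pool in production
MOBILE_UAS = [
    "Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1",
]

def get_random_mobile_ua():
    """Pick a plausible mobile user agent at random."""
    return random.choice(MOBILE_UAS)

def safe_text(element, selector):
    """Text of the first child matching `selector`, stripped, or None.

    `element` is a Playwright ElementHandle (anything exposing
    query_selector/inner_text works).
    """
    node = element.query_selector(selector)
    return node.inner_text().strip() if node else None

def safe_attr(element, selector, attr):
    """Attribute value of the first child matching `selector`, or None."""
    node = element.query_selector(selector)
    return node.get_attribute(attr) if node else None
```

Wrapping every extraction in these None-safe helpers keeps one missing field from discarding the whole listing.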

Handling AutoTrader’s Pagination

AutoTrader loads results in pages of 25 listings. Navigate through pages systematically:

import random
import time

def scrape_autotrader_all_pages(proxy_manager, search_params, max_pages=20):
    all_listings = []

    for page_num in range(1, max_pages + 1):
        # AutoTrader paginates via a firstRecord offset, 25 listings per page
        search_params["firstRecord"] = (page_num - 1) * 25

        listings = scrape_autotrader(proxy_manager, search_params)
        if not listings:
            break  # an empty page means we have run out of results

        all_listings.extend(listings)
        time.sleep(random.uniform(3, 7))  # human-like pause between pages

    return all_listings

Scraping Cars.com

Approach: API Interception

Cars.com’s search results are loaded via internal API calls that return structured JSON data. Intercepting these calls is more efficient than parsing HTML:

from urllib.parse import urlparse

def scrape_carscom(proxy_manager, search_params):
    with sync_playwright() as p:
        proxy = proxy_manager.get_rotating_proxy("US")
        # As with AutoTrader, pass proxy credentials separately
        parsed = urlparse(proxy["http"])
        browser = p.chromium.launch(
            proxy={
                "server": f"http://{parsed.hostname}:{parsed.port}",
                "username": parsed.username,
                "password": parsed.password,
            }
        )

        context = browser.new_context(user_agent=get_random_ua())
        page = context.new_page()

        api_responses = []

        def handle_response(response):
            # Capture the internal search API's JSON payloads as they load
            if "/api/searchresults" in response.url:
                try:
                    api_responses.append(response.json())
                except Exception:
                    pass

        page.on("response", handle_response)

        make = search_params.get("make", "")
        url = f"https://www.cars.com/shopping/results/?stock_type=all&makes[]={make}"
        page.goto(url, wait_until="networkidle")

        browser.close()

        if api_responses:
            return parse_carscom_api_response(api_responses[0])
        return []

def parse_carscom_api_response(data):
    listings = []
    for vehicle in data.get("listings", []):
        listings.append({
            "title": vehicle.get("title"),
            "price": vehicle.get("price"),
            "mileage": vehicle.get("mileage"),
            "dealer_name": vehicle.get("dealer", {}).get("name"),
            "location": vehicle.get("dealer", {}).get("city"),
            "vin": vehicle.get("vin"),
            "url": vehicle.get("url"),
        })
    return listings

Direct API Approach

If you can replicate the necessary headers and cookies, direct API calls are faster:

import requests

def query_carscom_api(proxy, make, model, zip_code):
    url = "https://www.cars.com/api/searchresults/"

    params = {
        "makes[]": make,
        "models[]": f"{make}-{model}",
        "zip": zip_code,
        "stock_type": "all",
        "per_page": 100,
    }

    headers = {
        "User-Agent": get_random_ua(),
        "Accept": "application/json",
        "Referer": "https://www.cars.com/shopping/results/",
    }

    response = requests.get(url, params=params, headers=headers,
                            proxies=proxy, timeout=30)
    response.raise_for_status()
    return response.json()

Scraping CarGurus

Approach: Structured Search with Sticky Sessions

CarGurus works best with sticky sessions that simulate a real user browsing through search results:

import random
import time

import requests
from bs4 import BeautifulSoup

def scrape_cargurus(proxy_manager, search_params):
    proxy = proxy_manager.get_sticky_proxy("US", duration_minutes=10)

    session = requests.Session()
    session.proxies.update(proxy)
    session.headers.update({
        "User-Agent": get_random_mobile_ua(),
        "Accept-Language": "en-US,en;q=0.9",
    })

    # First, visit the homepage to pick up session cookies
    session.get("https://www.cargurus.com/", timeout=30)
    time.sleep(random.uniform(1, 3))

    # Build the search request
    make = search_params.get("make", "")
    url = "https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action"

    params = {
        "entitySelectingHelper.selectedEntity": make,
        "zip": search_params.get("zip", "90210"),
        "distance": 50,
        "sortDir": "ASC",
        "sortType": "DEAL_SCORE",
    }

    response = session.get(url, params=params, timeout=30)

    if response.status_code == 200:
        return parse_cargurus_html(response.text)
    return []

def parse_cargurus_html(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    listings = []

    for card in soup.select('[data-cg-ft="car-blade"]'):
        listing = {
            "title": safe_text_soup(card, 'h4'),
            "price": safe_text_soup(card, '[class*="price"]'),
            "deal_rating": safe_text_soup(card, '[class*="deal-rating"]'),
            "mileage": safe_text_soup(card, '[class*="mileage"]'),
            "dealer": safe_text_soup(card, '[class*="dealer-name"]'),
            "days_on_market": safe_text_soup(card, '[class*="days-on-market"]'),
        }
        listings.append(listing)

    return listings
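The parser above relies on a safe_text_soup helper that this guide never defines. One minimal version (the name and behavior are our assumption, mirroring the Playwright-side safe_text) is:

```python
def safe_text_soup(card, selector):
    """Stripped text of the first descendant matching the CSS selector, or None.

    `card` is expected to be a BeautifulSoup Tag, though any object exposing
    select_one/get_text works.
    """
    node = card.select_one(selector)
    return node.get_text(strip=True) if node else None
```

As with the AutoTrader extractor, returning None for missing fields lets a partially rendered card still yield whatever data it does contain.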

CarGurus Deal Rating Data

CarGurus is unique in providing deal ratings (Great Deal, Good Deal, Fair Deal, etc.) that represent their algorithmic assessment of listing value. This data is particularly valuable for building price intelligence tools:

def extract_deal_ratings(listings):
    deal_distribution = {
        "great": 0,
        "good": 0,
        "fair": 0,
        "high": 0,
        "overpriced": 0
    }

    for listing in listings:
        rating = listing.get("deal_rating", "").lower()
        for key in deal_distribution:
            if key in rating:
                deal_distribution[key] += 1
                break

    return deal_distribution
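To turn those raw counts into shares suitable for a dashboard or report, a small follow-up helper (hypothetical, not part of CarGurus' data) could normalize the distribution to percentages:

```python
def deal_rating_shares(deal_distribution):
    """Convert a {rating: count} mapping into {rating: percent}, one decimal."""
    total = sum(deal_distribution.values())
    if total == 0:
        return {key: 0.0 for key in deal_distribution}
    return {key: round(100.0 * count / total, 1)
            for key, count in deal_distribution.items()}
```

Tracking these shares over time for a given model is one way to spot a market segment heating up or cooling off.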

Cross-Platform Data Aggregation

Unified Data Schema

Normalize data from all three platforms into a single schema:

from dataclasses import dataclass
from typing import Optional

@dataclass
class UnifiedListing:
    source_platform: Optional[str] = None
    source_url: Optional[str] = None
    title: Optional[str] = None
    make: Optional[str] = None
    model: Optional[str] = None
    year: Optional[int] = None
    trim: Optional[str] = None
    price: Optional[int] = None
    mileage: Optional[int] = None
    vin: Optional[str] = None
    dealer_name: Optional[str] = None
    dealer_location: Optional[str] = None
    listing_date: Optional[str] = None
    scraped_at: Optional[str] = None
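Scraped prices arrive as display strings such as "$24,995", so populating the numeric fields of this schema needs a normalization step. A sketch of one (not tied to any single platform's formatting) is:

```python
import re

def parse_price(raw):
    """Convert a scraped price string like '$24,995' to an int, or None.

    Returns None for missing values and for non-numeric placeholders
    such as 'Call for price'.
    """
    if not raw:
        return None
    digits = re.sub(r"[^\d]", "", raw)
    return int(digits) if digits else None
```

The same digit-stripping approach works for mileage strings; keeping both fields numeric is what makes the cross-platform comparisons below possible.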

VIN-Based Deduplication

When the same vehicle appears on multiple platforms, use VIN to identify duplicates and compare pricing:

def find_cross_platform_listings(listings):
    vin_map = {}
    for listing in listings:
        if listing.vin:
            vin_map.setdefault(listing.vin, []).append(listing)

    # Only VINs that appear in more than one listing are cross-listed
    return {vin: entries for vin, entries in vin_map.items() if len(entries) > 1}
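Once cross-listed VINs are identified, the natural next step is comparing the same vehicle's price across platforms. A sketch of that comparison, using a stand-in Listing tuple rather than the full schema above:

```python
from collections import namedtuple

# Minimal stand-in for UnifiedListing, just the fields this analysis needs
Listing = namedtuple("Listing", ["vin", "price", "source_platform"])

def price_spread(cross_listed):
    """For each cross-listed VIN, find the cheapest and priciest platform."""
    spread = {}
    for vin, entries in cross_listed.items():
        priced = [e for e in entries if e.price is not None]
        if len(priced) < 2:
            continue  # need at least two priced listings to compare
        cheapest = min(priced, key=lambda e: e.price)
        priciest = max(priced, key=lambda e: e.price)
        spread[vin] = {
            "low": (cheapest.source_platform, cheapest.price),
            "high": (priciest.source_platform, priciest.price),
            "delta": priciest.price - cheapest.price,
        }
    return spread
```

A large delta on the same VIN often signals a stale listing on one platform or a dealer testing different price points.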

Error Handling and Retry Logic

Scraping at scale requires robust error handling:

import random
import time

import requests

class ScraperWithRetry:
    def __init__(self, proxy_manager, max_retries=3):
        self.proxy_manager = proxy_manager
        self.max_retries = max_retries

    def scrape_with_retry(self, scrape_func, *args):
        for attempt in range(self.max_retries):
            try:
                result = scrape_func(self.proxy_manager, *args)
                if result:
                    return result
            except requests.exceptions.ProxyError:
                # The scrape function pulls a fresh proxy from the manager on
                # every call, so retrying is enough to rotate the IP
                continue
            except Exception as e:
                if "captcha" in str(e).lower():
                    # Back off, then retry through a fresh mobile proxy
                    time.sleep(random.uniform(5, 10))
                    continue
                raise  # unknown failure: surface it rather than masking it

        return None
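The retry loop above waits a fixed 5-10 seconds after a CAPTCHA. For longer runs, exponential backoff with jitter spreads repeated failures further apart. A helper along those lines (not part of the class above, just one common pattern) might be:

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Exponential backoff with jitter.

    The delay doubles with each attempt (base * 2**attempt), is capped at
    `cap` seconds, and is multiplied by a random factor in [0.5, 1.5] so
    parallel workers do not retry in lockstep.
    """
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.5)
```

Dropping `time.sleep(backoff_delay(attempt))` into the retry loop makes each successive failure progressively less aggressive toward the target site.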

Performance Optimization

Concurrent Scraping

Run scrapers for different platforms simultaneously:

from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_all_platforms(proxy_manager, search_params):
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {
            executor.submit(scrape_autotrader, proxy_manager, search_params): "autotrader",
            executor.submit(scrape_carscom, proxy_manager, search_params): "carscom",
            executor.submit(scrape_cargurus, proxy_manager, search_params): "cargurus",
        }

        results = {}
        # Collect results as each scraper finishes rather than in fixed order
        for future in as_completed(futures):
            platform = futures[future]
            try:
                results[platform] = future.result(timeout=60)
            except Exception as e:
                results[platform] = {"error": str(e)}

    return results

Caching Strategy

Cache listing detail pages that rarely change (vehicle specifications) while re-fetching prices frequently:

import time

class ListingCache:
    def __init__(self, cache_duration_hours=24):
        self.cache = {}  # listing_id -> (cached_at_timestamp, data)
        self.cache_duration = cache_duration_hours * 3600

    def get_or_fetch(self, listing_id, fetch_func):
        if listing_id in self.cache:
            cached_at, data = self.cache[listing_id]
            if time.time() - cached_at < self.cache_duration:
                return data  # still fresh, skip the network round trip

        data = fetch_func(listing_id)
        self.cache[listing_id] = (time.time(), data)
        return data

Connecting North American and Southeast Asian Data

For businesses operating across both markets, combine data from AutoTrader, Cars.com, and CarGurus with Southeast Asian sources:

  • Import opportunity detection: Find vehicles priced significantly lower in North America than in Southeast Asian markets
  • Model availability comparison: Track which models are available in each market
  • Feature-level pricing: Compare how specific features affect pricing in different markets
  • Market maturity analysis: Use the depth of North American data to forecast Southeast Asian market development
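As an illustration of the first bullet, a simplified comparison might convert North American asking prices into an estimated landed cost and flag models that undercut local Southeast Asian listings. Every parameter here is a placeholder assumption: the exchange rate, the flat landed-cost percentage, and the input dict shapes are illustrative, not real duty schedules:

```python
def import_opportunities(na_listings, sea_listings, fx_rate, landed_cost_pct=0.35):
    """Flag models whose NA price plus import costs undercuts the SEA price.

    na_listings / sea_listings: lists of {"model": str, "price": number} dicts.
    fx_rate: USD -> target-currency conversion factor.
    landed_cost_pct: rough shipping/duty/tax markup (illustrative only).
    """
    sea_price_by_model = {l["model"]: l["price"] for l in sea_listings}
    opportunities = []
    for listing in na_listings:
        sea_price = sea_price_by_model.get(listing["model"])
        if sea_price is None:
            continue  # model not sold in the target market
        landed = listing["price"] * fx_rate * (1 + landed_cost_pct)
        if landed < sea_price:
            opportunities.append({
                "model": listing["model"],
                "landed_cost": round(landed, 2),
                "local_price": sea_price,
                "margin": round(sea_price - landed, 2),
            })
    return opportunities
```

In practice, import duties vary enormously by country and vehicle age, so the flat percentage should be replaced with a per-market cost model before acting on the output.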

DataResearchTools supports this cross-regional analysis by providing proxy infrastructure for both North American and Southeast Asian markets, allowing you to run unified data collection pipelines across all target geographies.

Conclusion

Scraping AutoTrader, Cars.com, and CarGurus requires sophisticated proxy infrastructure and platform-specific scraping strategies. Each platform has unique anti-bot measures that demand different approaches, from headless browsers with stealth plugins to API interception and sticky session management.

The rotating mobile proxies from DataResearchTools provide the high trust scores needed to maintain consistent access to these platforms. Combined with proper rate limiting, realistic browsing patterns, and robust error handling, you can build reliable data pipelines that deliver comprehensive automotive market intelligence from the world’s largest car listing platforms.

