How to Scrape AutoScout24 Car Listings Data

TL;DR
AutoScout24 blocks scrapers aggressively. Use Playwright with a residential proxy, parse the embedded JSON-LD data for structured car details, and add session rotation to stay under rate limits.

AutoScout24 is Europe’s largest online car marketplace, with millions of listings across 18 countries. Dealers, data brokers, and automotive startups all want this data: market pricing, make/model distribution, time-to-sell metrics. The site serves it all publicly, but blocks automated access with bot detection and dynamic page rendering.

This guide walks through what actually works in 2026 for scraping AutoScout24 at scale.

What you’re dealing with

AutoScout24 uses a React-based frontend that loads listing data asynchronously. Simple HTTP requests parsed with BeautifulSoup return an empty shell. You need either a headless browser or the underlying API call that the frontend makes.

The site also runs Cloudflare plus its own bot detection layer. Residential proxies are required for sustained scraping; datacenter IPs get flagged quickly. See our overview of what is a proxy server for a breakdown of proxy types.

Option 1: intercept the API call

Open Chrome DevTools, load a search results page, and filter network requests by XHR. You’ll find a call to the AutoScout24 search API that returns JSON. Replicate that call directly:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.autoscout24.com/lst",
    "X-Requested-With": "XMLHttpRequest",
}

params = {
    "make": "bmw",
    "sort": "standard",
    "desc": "0",
    "ustate": "N,U",
    "size": "20",
    "page": "1",
    "cy": "D",
    "atype": "C",
}

proxies = {
    "http": "http://user:pass@proxy:port",
    "https": "http://user:pass@proxy:port",
}

# The URL below is the public listing page, used here as a placeholder --
# substitute the exact endpoint and parameters you copied from DevTools.
r = requests.get(
    "https://www.autoscout24.com/lst",
    headers=headers,
    params=params,
    proxies=proxies,
    timeout=15,
)
r.raise_for_status()
data = r.json()
listings = data.get("listings", [])
for car in listings:
    print(car.get("title"), car.get("price"), car.get("mileage"))

This approach is the fastest, but also fragile: AutoScout24 can change the API structure or add authentication at any time.
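Even with matching headers, transient failures (429s, Cloudflare challenges, connection resets) are routine on a long crawl. A small retry helper with exponential backoff and jitter keeps the job alive. This is a generic sketch: `fetch` is any zero-argument callable you supply, e.g. a wrapper around `requests.get` that calls `raise_for_status()`.

```python
import random
import time

def fetch_with_retries(fetch, max_attempts=4, base_delay=1.0):
    """Retry a flaky fetch with exponential backoff plus jitter.

    'fetch' is any zero-argument callable that raises on failure.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # backoff doubles each round: base, 2*base, 4*base, ...
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Usage would look like `fetch_with_retries(lambda: requests.get(url, headers=headers, timeout=15))`; tune `max_attempts` and `base_delay` to your rate-limit budget.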

Option 2: Playwright with a residential proxy

from playwright.sync_api import sync_playwright
import json

proxy_config = {
    "server": "http://proxy-host:port",
    "username": "your-username",
    "password": "your-password"
}

with sync_playwright() as p:
    browser = p.chromium.launch(proxy=proxy_config, headless=True)
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        viewport={"width": 1280, "height": 800}
    )
    page = context.new_page()
    page.goto("https://www.autoscout24.com/lst/bmw?sort=standard&desc=0")
    # hashed CSS class names (e.g. ListItem_wrapper__TxHWu) change between
    # deploys; wait on the stable data attribute instead
    page.wait_for_selector("article[data-item-name='listing-item']", timeout=15000)

    listings = page.query_selector_all("article[data-item-name='listing-item']")
    for item in listings:
        title = item.query_selector("h2")
        price = item.query_selector("[data-testid='price-label']")
        if title and price:
            print(title.inner_text(), "|", price.inner_text())

    browser.close()

Extracting JSON-LD structured data

Many listing pages embed structured data in JSON-LD format inside a <script type="application/ld+json"> tag. This is cleaner to parse than scraping rendered HTML:

from bs4 import BeautifulSoup
import json

# after fetching the page HTML (via Playwright or requests)
soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script", type="application/ld+json")
for s in scripts:
    try:
        data = json.loads(s.string or "")
    except json.JSONDecodeError:
        continue
    # a single script tag may hold one object or a list of them
    items = data if isinstance(data, list) else [data]
    for item in items:
        if isinstance(item, dict) and item.get("@type") == "Car":
            print(item.get("name"), item.get("offers", {}).get("price"))

Handling pagination

import random
import time

def scrape_pages(make, max_pages=10):
    all_cars = []
    for page_num in range(1, max_pages + 1):
        url = f"https://www.autoscout24.com/lst/{make}?page={page_num}"
        # ... fetch and parse
        time.sleep(2 + random.uniform(0, 1))  # jitter helps avoid detection
    return all_cars

Add random delays between requests; uniform timing is easier to detect than human-like variance. For proxy rotation strategies, see our guide on SOCKS5 vs HTTP proxy.
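Session rotation can be as simple as cycling through a small pool of sticky-session proxy URLs so that consecutive pages come from a handful of stable identities rather than one. The username format below (`user-session-<id>`) is an assumption: many residential providers pin a session via a username suffix, but the exact syntax varies, so check your provider's docs.

```python
import itertools
import uuid

def session_proxy_pool(base_user, password, host, port, pool_size=5):
    """Build a round-robin pool of sticky-session proxy URLs.

    Assumes a provider that pins a session via a username suffix
    (e.g. 'user-session-<id>'); the exact format is provider-specific.
    """
    sessions = [uuid.uuid4().hex[:8] for _ in range(pool_size)]
    urls = [
        f"http://{base_user}-session-{s}:{password}@{host}:{port}"
        for s in sessions
    ]
    # itertools.cycle loops over the pool forever, one URL per request
    return itertools.cycle(urls)
```

Then each request pulls its proxy with `next(pool)`, so any single session's request rate stays well under the limit.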

What data you can extract

  • make, model, variant, year
  • price (asking price)
  • mileage, fuel type, transmission
  • location (city, country)
  • listing date, dealer vs private seller
  • color, doors, power (kW/hp)
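However you fetch them, it pays to normalize these fields immediately, since prices and mileage arrive as locale-formatted strings. A minimal sketch, with illustrative field names (`title`, `price`, `mileage`) that you should match to whatever your parser actually extracts:

```python
import re

def normalize_listing(raw):
    """Turn one scraped listing dict into typed fields.

    Field names here are illustrative, not AutoScout24's schema.
    """
    def to_int(text):
        # strip currency symbols, thousands separators, units
        digits = re.sub(r"[^\d]", "", text or "")
        return int(digits) if digits else None

    return {
        "title": (raw.get("title") or "").strip(),
        "price_eur": to_int(raw.get("price")),
        "mileage_km": to_int(raw.get("mileage")),
    }
```

Normalizing at ingest time means downstream analysis (price distributions, time-to-sell) never has to re-parse display strings.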

Legal and ethical notes

AutoScout24’s terms of service prohibit automated data collection for commercial use. Always consult legal counsel before building a commercial data product on scraped marketplace data. See our overview of what is web scraping for general guidelines on responsible scraping.
