OLX runs classifieds in over 40 countries, and scraping it is not a single problem: it's a dozen slightly different problems wearing the same brand. Poland, India, Brazil, Portugal, South Africa, and Ukraine each run on different subdomains with different anti-bot postures, different HTML structures, and in some cases entirely different backend APIs. If you've already worked through scrapers for similar platforms like Avito Russia or Gumtree, the core patterns carry over, but OLX's multi-market architecture adds a layer of complexity worth mapping before you write a single line of code.
## Understand the OLX market landscape first
OLX Group (owned by Prosus/Naspers) operates each country market as a semi-independent subdomain: olx.pl, olx.com.br, olx.ua, olx.co.za, olx.in, olx.pt, and so on. They don't share a unified frontend or API schema, so what works in Poland may break in India.
The anti-bot posture also varies significantly by market:
| Market | Cloudflare | JSON API available | Rate limit (est.) | IP sensitivity |
|---|---|---|---|---|
| olx.pl (Poland) | no | yes (internal REST) | ~60 req/min | medium |
| olx.com.br (Brazil) | yes (managed) | partial | ~30 req/min | high |
| olx.in (India) | yes (bot score) | yes | ~40 req/min | high |
| olx.ua (Ukraine) | no | yes | ~80 req/min | low |
| olx.co.za (South Africa) | no | partial | ~50 req/min | low |
| olx.pt (Portugal) | yes (JS challenge) | no | ~25 req/min | very high |
Markets with Cloudflare and no accessible JSON API (Portugal is the worst offender) need Playwright. Markets like Poland and Ukraine expose internal listing APIs that the browser calls on page load; intercept those and you get clean JSON without rendering overhead.
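For the Playwright-required markets, it's still cheaper to capture the JSON the page fetches for itself than to parse rendered HTML. A minimal sketch, assuming the internal endpoint path contains `/api/v1/offers` and returns a `data` array (true for olx.pl; verify both per market in devtools):

```python
LISTING_API_FRAGMENT = "/api/v1/offers"


def is_listing_api(url: str) -> bool:
    """True for responses from the internal offers endpoint."""
    return LISTING_API_FRAGMENT in url


def collect_listings(category_url: str) -> list[dict]:
    """Load a category page and capture the JSON the page fetches itself."""
    # Lazy import so the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    captured: list[dict] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Register the handler before navigating so no response is missed.
        page.on(
            "response",
            lambda resp: captured.extend(resp.json().get("data", []))
            if is_listing_api(resp.url)
            else None,
        )
        # networkidle gives the page time to fire its listing XHRs.
        page.goto(category_url, wait_until="networkidle")
        browser.close()
    return captured
```

On markets with infinite scroll or a "load more" button (olx.pt), keep the same response handler and trigger the button clicks in a loop; every XHR it fires lands in `captured`.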
## Hit the internal JSON API where it's available
On olx.pl, open the devtools network tab, load any category page, and you'll see XHR calls to `https://www.olx.pl/api/v1/offers/` with query parameters for category, region, page, and filters. This is your scraping target.
```python
import httpx

PROXY = "http://user:pass@mobile-proxy-sg:8101"

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.olx.pl/nieruchomosci/mieszkania/sprzedaz/",
    "x-requested-with": "XMLHttpRequest",
}


def fetch_olx_pl_listings(category_id: int, page: int = 1):
    url = "https://www.olx.pl/api/v1/offers/"
    params = {
        "offset": (page - 1) * 40,  # 40 listings per page
        "limit": 40,
        "category_id": category_id,
        "sort_by": "created_at:desc",
    }
    # httpx >= 0.26 takes a single proxy=...; on older versions use
    # proxies={"https://": PROXY} instead.
    with httpx.Client(proxy=PROXY, headers=HEADERS, timeout=15) as client:
        resp = client.get(url, params=params)
        resp.raise_for_status()
        return resp.json()["data"]


listings = fetch_olx_pl_listings(category_id=15)  # real estate
```

The response gives you title, price, location, category, listing URL, and a contact object. Note: phone numbers are almost never in the listing JSON directly. They're fetched via a separate `POST /api/v1/offers/{id}/limited-phones/` call that requires a session cookie, which means you need to run a login flow or accept that phone-gated contact data won't be available at scale without additional orchestration.
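If you do decide to run a logged-in session, the phone reveal is a plain POST once you have the cookies. A hedged sketch: the endpoint path matches what olx.pl's frontend fires, but the response shape (`data.phones`) is an assumption to verify in devtools before relying on it:

```python
def phone_endpoint(offer_id: int) -> str:
    # Path observed on olx.pl; other markets may differ or omit it entirely.
    return f"https://www.olx.pl/api/v1/offers/{offer_id}/limited-phones/"


def fetch_phone(offer_id: int, session_cookies: dict[str, str]):
    """POST with a logged-in session's cookies; None if the number is gated."""
    # Lazy import keeps the URL helper testable without httpx installed.
    import httpx

    with httpx.Client(cookies=session_cookies, timeout=15) as client:
        resp = client.post(phone_endpoint(offer_id))
        if resp.status_code != 200:
            return None  # expired session, rate limit, or seller hid the number
        # Assumed shape: {"data": {"phones": ["+48..."]}} -- verify per market.
        return resp.json().get("data", {}).get("phones")
```

Budget these calls separately from listing requests: they're tied to an account, so aggressive volume burns the account rather than just an IP.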
## Pagination patterns differ by market
OLX doesn’t standardize pagination across markets:
- `olx.pl`, `olx.ua`: `offset` + `limit` in the JSON API. Straightforward: just increment `offset` by `limit`.
- `olx.com.br`, `olx.in`: cursor-based pagination. The response includes a `nextPage` token or `metadata.next_page` field; store and pass this token in each subsequent request.
- `olx.co.za`: page-number param (`?page=2`) in the HTML layer. No JSON API; requires parsing listing cards from rendered HTML.
- `olx.pt`: full JS rendering required. No page param in the URL; pagination is handled by a "load more" button that fires an XHR internally.
For cursor-based markets, write your loop defensively:
```python
import time

# fetch_page is your market-specific request helper (see the olx.pl example above)
next_cursor = None
all_listings = []
while True:
    params = {"limit": 40}
    if next_cursor:
        params["page_token"] = next_cursor
    data = fetch_page(params)
    all_listings.extend(data["offers"])
    next_cursor = data.get("metadata", {}).get("next_page")
    if not next_cursor:
        break
    time.sleep(1.5)
```

This pattern also applies to Kijiji Canada, which uses a nearly identical offset-cursor hybrid depending on category.
## Proxy strategy across OLX markets
Datacenter IPs get blocked fast on OLX; most markets will 429 or soft-block a datacenter range within 200-500 requests. Residential proxies work better, but mobile proxies are the most reliable choice for high-sensitivity markets (Brazil, India, Portugal) because they carry real carrier ASNs that OLX's fraud scoring treats as genuine consumer traffic.
Key points for proxy selection:
- Use country-matched proxies: an Indian IP for `olx.in`, a Polish IP for `olx.pl`. Cross-country residential IPs still trigger geo-mismatch signals on some markets.
- Rotate per request (not per session) on high-sensitivity markets. On low-sensitivity markets like Ukraine, session rotation every 50-100 requests is fine and faster.
- Set realistic request delays: 1-2 seconds on low-sensitivity markets, 2-4 seconds on high-sensitivity ones. Anything faster than 0.5 req/sec per IP will get you rate-limited.
- If you're running multi-market scrapes in parallel, segment your proxy pool by country to avoid IP-reputation cross-contamination.
This same principle of matching IP type to the target market's bot-scoring model is covered in depth in the context of price-comparison work in Mobile Proxies for Insurance Quote Comparison, and the underlying logic transfers directly here.
## Schema design for multi-country data
The easiest mistake is building per-country tables. Maintain one `olx_listings` table with a `country_code` column and handle market-specific field differences in your ETL layer, not your schema.
Recommended unified schema:
```sql
listing_id       TEXT         -- OLX's internal ID (not URL slug)
country_code     TEXT         -- 'PL', 'BR', 'IN', 'UA', etc.
title            TEXT
price_amount     NUMERIC
price_currency   TEXT
location_city    TEXT
location_region  TEXT
category_id      INT
category_name    TEXT
url              TEXT
phone_available  BOOL         -- true if phone was fetchable
scraped_at       TIMESTAMPTZ
raw_json         JSONB        -- keep the source payload
```

Store `raw_json` always. OLX adds and removes fields by market without notice, and having the raw payload means you can backfill derived columns without re-scraping. For eBay Kleinanzeigen, a similar raw-first approach saved significant re-scraping time when their listing schema shifted in early 2026.
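The ETL layer then reduces to one small adapter per market that maps raw payloads into the unified shape. A sketch covering a subset of the columns; the raw field names for Brazil (`listId`, `subject`, `priceValue`) are illustrative assumptions, so confirm them against live responses before wiring this up:

```python
import json


def adapt_pl(raw: dict) -> dict:
    """Map an olx.pl API payload onto the unified columns."""
    price = raw.get("price") or {}
    return {
        "listing_id": str(raw["id"]),
        "country_code": "PL",
        "title": raw.get("title"),
        "price_amount": price.get("value"),
        "price_currency": price.get("currency", "PLN"),
        "raw_json": json.dumps(raw),  # raw-first: keep the full payload
    }


def adapt_br(raw: dict) -> dict:
    """Map an olx.com.br payload; field names are assumptions to verify."""
    return {
        "listing_id": str(raw["listId"]),
        "country_code": "BR",
        "title": raw.get("subject"),
        "price_amount": raw.get("priceValue"),
        "price_currency": "BRL",
        "raw_json": json.dumps(raw),
    }


ADAPTERS = {"PL": adapt_pl, "BR": adapt_br}


def normalize(country: str, raw: dict) -> dict:
    """Dispatch to the market-specific adapter."""
    return ADAPTERS[country](raw)
```

New markets become new entries in `ADAPTERS`; the table and every downstream query stay untouched.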
Numbered checklist before going to production:

1. Verify the target market's Cloudflare posture in a fresh browser session (check for the `cf-ray` response header).
2. Confirm whether JSON API endpoints exist by inspecting network traffic on a live category page.
3. Test your proxy pool with 50 requests across 10 IPs and measure the block rate.
4. Map the pagination type (offset / cursor / page param) for your target category.
5. Decide whether phone-gated data is worth the session overhead for your use case.
6. Set up deduplication on `(listing_id, country_code)` before you run any bulk ingestion.
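The deduplication step is easiest to enforce at the database level rather than in application code. A sketch using SQLite for brevity; the same `ON CONFLICT` upsert works in Postgres, where you'd use the JSONB and TIMESTAMPTZ types from the schema above:

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Composite primary key makes (listing_id, country_code) the dedup key:
    # the same numeric ID can legitimately exist in two markets.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS olx_listings (
            listing_id   TEXT NOT NULL,
            country_code TEXT NOT NULL,
            title        TEXT,
            raw_json     TEXT,
            PRIMARY KEY (listing_id, country_code)
        )
        """
    )


def upsert(conn: sqlite3.Connection, rows: list[dict]) -> None:
    # Re-scraped listings overwrite in place instead of duplicating.
    conn.executemany(
        """
        INSERT INTO olx_listings (listing_id, country_code, title, raw_json)
        VALUES (:listing_id, :country_code, :title, :raw_json)
        ON CONFLICT (listing_id, country_code) DO UPDATE SET
            title = excluded.title,
            raw_json = excluded.raw_json
        """,
        rows,
    )
```

With the constraint in the schema, a crashed-and-restarted bulk run can simply replay its batch: re-inserts update rather than duplicate.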
## Bottom line

OLX is scrapeable at scale in most markets if you match your approach to each market's anti-bot posture: JSON API plus light proxying for Poland and Ukraine, Playwright plus mobile proxies for Portugal and Brazil. Don't build separate scrapers per country; build one configurable scraper with market-specific adapters and a unified schema. DRT covers the full classifieds landscape if you're mapping competitors across markets; the patterns here apply directly to Avito, Gumtree, Kijiji, and the rest of the ecosystem.