OLX runs classifieds in over 40 countries, and scraping it is not a single problem: it's a dozen slightly different problems wearing the same brand. Poland, India, Brazil, Portugal, South Africa, and Ukraine each run on different subdomains with different anti-bot postures, different HTML structures, and in some cases entirely different backend APIs. If you've already worked through scrapers for similar platforms like Avito Russia or Gumtree, the core patterns carry over, but OLX's multi-market architecture adds a layer of complexity worth mapping before you write a single line of code.
## Understand the OLX market landscape first
OLX Group (owned by Prosus/Naspers) operates each country market as a semi-independent subdomain: olx.pl, olx.com.br, olx.ua, olx.co.za, olx.in, olx.pt, and so on. They don't share a unified frontend or API schema, so what works in Poland may break in India.
The anti-bot posture also varies significantly by market:
| Market | Cloudflare | JSON API available | Rate limit (est.) | IP sensitivity |
|---|---|---|---|---|
| olx.pl (Poland) | no | yes (internal REST) | ~60 req/min | medium |
| olx.com.br (Brazil) | yes (managed) | partial | ~30 req/min | high |
| olx.in (India) | yes (bot score) | yes | ~40 req/min | high |
| olx.ua (Ukraine) | no | yes | ~80 req/min | low |
| olx.co.za (South Africa) | no | partial | ~50 req/min | low |
| olx.pt (Portugal) | yes (JS challenge) | no | ~25 req/min | very high |
Markets with Cloudflare and no accessible JSON API (Portugal is the worst offender) need Playwright. Markets like Poland and Ukraine expose internal listing APIs that the browser calls on page load; intercept those and you get clean JSON without rendering overhead.
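For the Playwright-required markets, it's still cheaper to capture the JSON the page fetches for itself than to parse rendered HTML. A minimal sketch, assuming the internal endpoint path contains `/api/v1/offers` and returns a `data` array (true for olx.pl; verify both per market in devtools):

```python
LISTING_API_FRAGMENT = "/api/v1/offers"


def is_listing_api(url: str) -> bool:
    """True for responses from the internal offers endpoint."""
    return LISTING_API_FRAGMENT in url


def collect_listings(category_url: str) -> list[dict]:
    """Load a category page and capture the JSON the page fetches itself."""
    # Lazy import so the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    captured: list[dict] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Register the handler before navigating so no response is missed.
        page.on(
            "response",
            lambda resp: captured.extend(resp.json().get("data", []))
            if is_listing_api(resp.url)
            else None,
        )
        # networkidle gives the page time to fire its listing XHRs.
        page.goto(category_url, wait_until="networkidle")
        browser.close()
    return captured
```

On markets with infinite scroll or a "load more" button (olx.pt), keep the same response handler and trigger the button clicks in a loop; every XHR it fires lands in `captured`.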
## Hit the internal JSON API where it's available
On olx.pl, open the devtools network tab, load any category page, and you'll see XHR calls to `https://www.olx.pl/api/v1/offers/` with query parameters for category, region, page, and filters. This is your scraping target.
```python
import httpx

PROXY = "http://user:pass@mobile-proxy-sg:8101"

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Referer": "https://www.olx.pl/nieruchomosci/mieszkania/sprzedaz/",
    "x-requested-with": "XMLHttpRequest",
}


def fetch_olx_pl_listings(category_id: int, page: int = 1):
    url = "https://www.olx.pl/api/v1/offers/"
    params = {
        "offset": (page - 1) * 40,  # 40 listings per page
        "limit": 40,
        "category_id": category_id,
        "sort_by": "created_at:desc",
    }
    # httpx >= 0.26 takes a single proxy=...; on older versions use
    # proxies={"https://": PROXY} instead.
    with httpx.Client(proxy=PROXY, headers=HEADERS, timeout=15) as client:
        resp = client.get(url, params=params)
        resp.raise_for_status()
        return resp.json()["data"]


listings = fetch_olx_pl_listings(category_id=15)  # real estate
```

The response gives you title, price, location, category, listing URL, and a contact object. Note: phone numbers are almost never in the listing JSON directly. They're fetched via a separate `POST /api/v1/offers/{id}/limited-phones/` call that requires a session cookie, which means you need to run a login flow or accept that phone-gated contact data won't be available at scale without additional orchestration.
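If you do decide to run a logged-in session, the phone reveal is a plain POST once you have the cookies. A hedged sketch: the endpoint path matches what olx.pl's frontend fires, but the response shape (`data.phones`) is an assumption to verify in devtools before relying on it:

```python
def phone_endpoint(offer_id: int) -> str:
    # Path observed on olx.pl; other markets may differ or omit it entirely.
    return f"https://www.olx.pl/api/v1/offers/{offer_id}/limited-phones/"


def fetch_phone(offer_id: int, session_cookies: dict[str, str]):
    """POST with a logged-in session's cookies; None if the number is gated."""
    # Lazy import keeps the URL helper testable without httpx installed.
    import httpx

    with httpx.Client(cookies=session_cookies, timeout=15) as client:
        resp = client.post(phone_endpoint(offer_id))
        if resp.status_code != 200:
            return None  # expired session, rate limit, or seller hid the number
        # Assumed shape: {"data": {"phones": ["+48..."]}} -- verify per market.
        return resp.json().get("data", {}).get("phones")
```

Budget these calls separately from listing requests: they're tied to an account, so aggressive volume burns the account rather than just an IP.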
## Pagination patterns differ by market
OLX doesn’t standardize pagination across markets:
- `olx.pl`, `olx.ua`: `offset` + `limit` in the JSON API. Straightforward: just increment `offset` by `limit`.
- `olx.com.br`, `olx.in`: cursor-based pagination. The response includes a `nextPage` token or `metadata.next_page` field; store and pass this token in each subsequent request.
- `olx.co.za`: page-number param (`?page=2`) in the HTML layer. No JSON API; requires parsing listing cards from rendered HTML.
- `olx.pt`: full JS rendering required. No page param in the URL; pagination is handled by a "load more" button that fires an XHR internally.
For cursor-based markets, write your loop defensively:
```python
import time

# fetch_page is your market-specific request helper (see the olx.pl example above)
next_cursor = None
all_listings = []
while True:
    params = {"limit": 40}
    if next_cursor:
        params["page_token"] = next_cursor
    data = fetch_page(params)
    all_listings.extend(data["offers"])
    next_cursor = data.get("metadata", {}).get("next_page")
    if not next_cursor:
        break
    time.sleep(1.5)
```

This pattern also applies to Kijiji Canada, which uses a nearly identical offset-cursor hybrid depending on category.
## Proxy strategy across OLX markets
Datacenter IPs get blocked fast on OLX; most markets will 429 or soft-block a datacenter range within 200-500 requests. Residential proxies work better, but mobile proxies are the most reliable choice for high-sensitivity markets (Brazil, India, Portugal) because they carry real carrier ASNs that OLX's fraud scoring treats as genuine consumer traffic.
Key points for proxy selection:
- Use country-matched proxies: an Indian IP for `olx.in`, a Polish IP for `olx.pl`. Cross-country residential IPs still trigger geo-mismatch signals on some markets.
- Rotate per request (not per session) on high-sensitivity markets. On low-sensitivity markets like Ukraine, session rotation every 50-100 requests is fine and faster.
- Set realistic request delays: 1-2 seconds on low-sensitivity markets, 2-4 seconds on high-sensitivity ones. Anything faster than 0.5 req/sec per IP will get you rate-limited.
- If you're running multi-market scrapes in parallel, segment your proxy pool by country to avoid IP-reputation cross-contamination.
This same principle of matching IP type to the target market's bot-scoring model is covered in depth in the context of price-comparison work in Mobile Proxies for Insurance Quote Comparison, and the underlying logic transfers directly here.
## Schema design for multi-country data
The easiest mistake is building per-country tables. Maintain one `olx_listings` table with a `country_code` column and handle market-specific field differences in your ETL layer, not your schema.
Recommended unified schema:
```sql
listing_id       TEXT         -- OLX's internal ID (not URL slug)
country_code     TEXT         -- 'PL', 'BR', 'IN', 'UA', etc.
title            TEXT
price_amount     NUMERIC
price_currency   TEXT
location_city    TEXT
location_region  TEXT
category_id      INT
category_name    TEXT
url              TEXT
phone_available  BOOL         -- true if phone was fetchable
scraped_at       TIMESTAMPTZ
raw_json         JSONB        -- keep the source payload
```

Store `raw_json` always. OLX adds and removes fields by market without notice, and having the raw payload means you can backfill derived columns without re-scraping. For eBay Kleinanzeigen, a similar raw-first approach saved significant re-scraping time when their listing schema shifted in early 2026.
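The ETL layer then reduces to one small adapter per market that maps raw payloads into the unified shape. A sketch covering a subset of the columns; the raw field names for Brazil (`listId`, `subject`, `priceValue`) are illustrative assumptions, so confirm them against live responses before wiring this up:

```python
import json


def adapt_pl(raw: dict) -> dict:
    """Map an olx.pl API payload onto the unified columns."""
    price = raw.get("price") or {}
    return {
        "listing_id": str(raw["id"]),
        "country_code": "PL",
        "title": raw.get("title"),
        "price_amount": price.get("value"),
        "price_currency": price.get("currency", "PLN"),
        "raw_json": json.dumps(raw),  # raw-first: keep the full payload
    }


def adapt_br(raw: dict) -> dict:
    """Map an olx.com.br payload; field names are assumptions to verify."""
    return {
        "listing_id": str(raw["listId"]),
        "country_code": "BR",
        "title": raw.get("subject"),
        "price_amount": raw.get("priceValue"),
        "price_currency": "BRL",
        "raw_json": json.dumps(raw),
    }


ADAPTERS = {"PL": adapt_pl, "BR": adapt_br}


def normalize(country: str, raw: dict) -> dict:
    """Dispatch to the market-specific adapter."""
    return ADAPTERS[country](raw)
```

New markets become new entries in `ADAPTERS`; the table and every downstream query stay untouched.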
Numbered checklist before going to production:

1. Verify the target market's Cloudflare posture in a fresh browser session (check for the `cf-ray` response header).
2. Confirm whether JSON API endpoints exist by inspecting network traffic on a live category page.
3. Test your proxy pool with 50 requests across 10 IPs and measure the block rate.
4. Map the pagination type (offset / cursor / page param) for your target category.
5. Decide whether phone-gated data is worth the session overhead for your use case.
6. Set up deduplication on `(listing_id, country_code)` before you run any bulk ingestion.
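The deduplication step is easiest to enforce at the database level rather than in application code. A sketch using SQLite for brevity; the same `ON CONFLICT` upsert works in Postgres, where you'd use the JSONB and TIMESTAMPTZ types from the schema above:

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Composite primary key makes (listing_id, country_code) the dedup key:
    # the same numeric ID can legitimately exist in two markets.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS olx_listings (
            listing_id   TEXT NOT NULL,
            country_code TEXT NOT NULL,
            title        TEXT,
            raw_json     TEXT,
            PRIMARY KEY (listing_id, country_code)
        )
        """
    )


def upsert(conn: sqlite3.Connection, rows: list[dict]) -> None:
    # Re-scraped listings overwrite in place instead of duplicating.
    conn.executemany(
        """
        INSERT INTO olx_listings (listing_id, country_code, title, raw_json)
        VALUES (:listing_id, :country_code, :title, :raw_json)
        ON CONFLICT (listing_id, country_code) DO UPDATE SET
            title = excluded.title,
            raw_json = excluded.raw_json
        """,
        rows,
    )
```

With the constraint in the schema, a crashed-and-restarted bulk run can simply replay its batch: re-inserts update rather than duplicate.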
## Bottom line

OLX is scrapeable at scale in most markets if you match your approach to each market's anti-bot posture: JSON API plus light proxying for Poland and Ukraine, Playwright plus mobile proxies for Portugal and Brazil. Don't build separate scrapers per country; build one configurable scraper with market-specific adapters and a unified schema. DRT covers the full classifieds landscape if you're mapping competitors across markets; the patterns here apply directly to Avito, Gumtree, Kijiji, and the rest of the ecosystem.