How to scrape Jumia Africa product listings
Jumia is the largest pan-African ecommerce platform, with separate country instances for Nigeria, Kenya, Egypt, Morocco, Ivory Coast, Senegal, Ghana, Uganda, and a handful of other markets. Founded in 2012 and listed on the NYSE in 2019, by 2026 it remains the dominant horizontal marketplace across Sub-Saharan Africa, with category mixes that lean heavily toward mobile phones, home appliances, and fashion. Three things shape the scraping landscape: per-country subdomains with different catalogues; a pricing system that mixes Jumia direct, Jumia Mall verified sellers, and a long tail of independent merchants; and an anti-bot layer that becomes more aggressive on the larger Nigeria and Egypt domains.
This guide focuses on Jumia Nigeria as the canonical example, with notes on cross-country differences. The patterns transfer to every Jumia country instance with minor adjustments to the domain and currency.
Mapping the Jumia URL and listing structure
Each Jumia country lives at its own subdomain: www.jumia.com.ng for Nigeria, www.jumia.co.ke for Kenya, www.jumia.com.eg for Egypt, www.jumia.ci for Ivory Coast, and so on. Within each country, the URL structure is consistent: https://www.jumia.com.ng/<product-slug>.html for product detail pages and https://www.jumia.com.ng/<category-slug>/ for category listings. Product slugs include a SKU identifier embedded near the end, which is the canonical key you should store.
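Extracting that key is a one-line regex. A hedged sketch: it assumes the SKU is the final hyphen-separated token of the slug before .html, which you should verify against live URLs before relying on it.

```python
import re

# Assumption: the SKU is the last hyphen-separated token before ".html",
# e.g. /some-product-name-<sku>.html. Verify against live Jumia URLs.
SKU_RE = re.compile(r"-([A-Za-z0-9]+)\.html$")

def sku_from_url(url: str) -> str | None:
    m = SKU_RE.search(url)
    return m.group(1) if m else None
```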
Jumia exposes a server-rendered HTML front end with a JavaScript-hydrated layer for filters and recommendations. There is no fully public JSON API, but the product detail pages embed a <script type="application/ld+json"> block that carries Schema.org Product data including price, availability, brand, and SKU. Parsing this JSON-LD block is dramatically more reliable than scraping the visible HTML because the schema is stable while the visible markup changes regularly.
```python
import httpx
from bs4 import BeautifulSoup
import json

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept-Language": "en-NG,en;q=0.9",
}

def parse_product(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")
    # Walk every JSON-LD block and keep the first Schema.org Product.
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string)
        except (json.JSONDecodeError, TypeError):
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            return {
                "sku": data.get("sku"),
                "name": data.get("name"),
                "brand": (data.get("brand") or {}).get("name"),
                "price": (data.get("offers") or {}).get("price"),
                "currency": (data.get("offers") or {}).get("priceCurrency"),
                "availability": (data.get("offers") or {}).get("availability"),
                "rating": (data.get("aggregateRating") or {}).get("ratingValue"),
            }
    return {}

async def fetch_product(url: str, proxy: str) -> dict:
    async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as c:
        r = await c.get(url)
        if r.status_code != 200:
            return {}
        return parse_product(r.text)
```
Jumia’s JSON-LD includes the offer price but not the seller breakdown. To get the per-seller information, you have to parse the seller block from the HTML directly. The seller block is structured around a data-merchant-name attribute that is consistent across countries.
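A minimal extraction sketch, relying only on the data-merchant-name attribute described above; anything beyond the merchant name needs verification against the live markup.

```python
def parse_seller(html: str) -> dict:
    # Hedged sketch: uses only the data-merchant-name attribute noted above.
    # Additional seller fields require live-markup verification per country.
    soup = BeautifulSoup(html, "lxml")
    node = soup.find(attrs={"data-merchant-name": True})
    if node is None:
        return {}
    return {"seller_name": node["data-merchant-name"]}
```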
Country-specific proxy strategy for Jumia
Jumia’s bot detection becomes meaningfully stricter on the larger country instances. Nigeria and Egypt see the highest scrutiny, both because of the volume of legitimate scraping that already happens (price intelligence vendors, brand monitoring tools, affiliate networks) and because of the volume of fraud activity that uses the same patterns. Kenya and Ivory Coast see lighter enforcement.
Your proxy strategy should match the country you are scraping. For Nigeria and Egypt, use residential or mobile IPs in-country. MTN, Airtel, and 9mobile mobile pools work well for Nigeria. Vodafone and Etisalat residential pools work for Egypt. For the smaller markets, pan-African residential pools or even GCC-region pools sometimes work because Jumia’s geo-classification is less granular for low-volume countries.
| Country | Recommended proxy origin | Tolerance per IP |
|---|---|---|
| Nigeria | Nigerian mobile or residential | 100 req/hr per IP |
| Egypt | Egyptian residential | 100 req/hr per IP |
| Kenya | Kenyan or pan-African residential | 200 req/hr per IP |
| Morocco | Moroccan or French residential | 200 req/hr per IP |
| Ivory Coast | Pan-African or French residential | 250 req/hr per IP |
| Senegal | Pan-African or French residential | 250 req/hr per IP |
The cost differential between in-country residential and pan-African residential pools is significant for Nigerian inventory specifically. For most production workloads, the cost is justified by the success rate uplift. For lighter monitoring workloads under 1,000 SKUs per day, pan-African pools sometimes work even for Nigeria with careful rate limiting.
Crawling category trees and pagination
Category listings on Jumia are paginated with a ?page=N query parameter. The maximum reachable page depends on the category but typically caps at 50 pages of 40 products each. To go deeper, decompose by sub-category, brand facet, or price band. The category pages also embed a JSON-LD ItemList block that gives you structured access to the listing.
```python
async def crawl_category(base_url: str, max_pages: int, proxy_pool):
    items = []
    for page in range(1, max_pages + 1):
        proxy = proxy_pool.next()  # rotating pool; see the proxy sections below
        url = f"{base_url}?page={page}"
        async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as c:
            r = await c.get(url)
        if r.status_code != 200:
            continue
        soup = BeautifulSoup(r.text, "lxml")
        cards = soup.select("article.prd")
        if not cards:
            break  # ran past the last populated page
        for card in cards:
            name = card.select_one("h3.name")
            price = card.select_one("div.prc")
            link = card.select_one("a.core")
            items.append({
                "sku": card.get("data-sku"),
                "name": name.get_text(strip=True) if name else None,
                "price": price.get_text(strip=True) if price else None,
                "url": "https://www.jumia.com.ng" + link["href"] if link else None,
            })
    return items
```
For very broad categories like Phones and Tablets in Nigeria, the visible pagination only covers the first 2,000 SKUs. The remaining tail requires faceted decomposition. Build a recursive crawler that subdivides any category exceeding 2,000 results into brand and price-band buckets until each bucket fits within the pagination cap.
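A minimal sketch of that recursion. It assumes two hypothetical helpers, count_results() and facet_values(), that parse the result count and the filter sidebar off a listing page, and it reuses crawl_category from above; treat the query-string concatenation as a simplification.

```python
PAGE_CAP = 2000  # ~50 pages x 40 products per category listing

async def crawl_bucket(url: str, facets: list, proxy_pool) -> list:
    """Hedged sketch: count_results() and facet_values() are hypothetical
    helpers; facet URLs are naively appended as query parameters."""
    total = await count_results(url, proxy_pool)
    if total <= PAGE_CAP or not facets:
        return await crawl_category(url, max_pages=50, proxy_pool=proxy_pool)
    items = []
    facet, rest = facets[0], facets[1:]  # e.g. subdivide by brand, then price band
    for value in await facet_values(url, facet, proxy_pool):
        items += await crawl_bucket(f"{url}?{facet}={value}", rest, proxy_pool)
    return items
```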
Tracking Jumia Mall vs. third-party merchant signal
Jumia distinguishes between Jumia Mall (verified sellers with quality guarantees) and ordinary third-party merchants. The distinction matters for analytics because Jumia Mall pricing is often the more stable signal while third-party pricing is more volatile. The seller card on the product page exposes a Jumia Mall badge that you can detect by looking for the mall-badge CSS class.
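Detection is a one-liner once you have the seller card. A hedged sketch, using the mall-badge class from above and the data-mall attribute mentioned in the FAQ; both are worth re-verifying per country instance.

```python
def is_mall_seller(card) -> bool:
    # card is a BeautifulSoup node for the seller block. Hedged: the
    # mall-badge class and data-mall attribute are as described in this
    # guide; re-verify against live markup for each country.
    return card.select_one(".mall-badge") is not None or card.get("data-mall") is not None
```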
For brand monitoring use cases, separate the dataset into Mall and non-Mall slices. The Mall slice gives you the canonical price point that the brand wants to maintain. The non-Mall slice gives you the gray-market and parallel-import price activity that often signals supply chain shifts or unauthorized resellers.
Rate limits, retries, and session management
Jumia does not publish rate limits, but observed behavior is consistent across countries. A single IP can sustain about 1 request per 2 seconds for an hour before triggering a soft block that returns either a 429 or a Cloudflare challenge. After a 10-30 minute cooldown the IP is usable again. The cooldown extends if you continue retrying through the block.
```python
import asyncio

async def safe_request(url: str, proxy_pool, max_retries: int = 3):
    for attempt in range(max_retries):
        proxy = proxy_pool.next()
        try:
            async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as c:
                r = await c.get(url)
                if r.status_code == 200:
                    return r
                if r.status_code in (429, 503):
                    # Soft block: wait a full minute before rotating onward.
                    await asyncio.sleep(60)
                    continue
        except httpx.HTTPError:
            pass
        # Exponential backoff between attempts: 5s, 10s, 20s.
        await asyncio.sleep(5 * 2 ** attempt)
    return None
```
The exponential backoff matters more on Jumia than on some other African ecommerce sites because the block escalation is relatively slow but the cooldown extends quickly under retries. A patient retry pattern outperforms an aggressive one.
Working with multi-currency Jumia datasets
Each Jumia country uses its own local currency: NGN for Nigeria, KES for Kenya, EGP for Egypt, MAD for Morocco, XOF for Ivory Coast and Senegal. For pan-African analyses, normalize to USD or EUR using daily FX rates rather than scrape-time conversions. NGN in particular has had significant devaluation events in recent years, and any cross-time analysis needs to account for that.
Detecting and routing around CAPTCHA challenges on Jumia
When Jumia flags your traffic, the response is usually a Cloudflare challenge page rather than a clean HTTP error. Your scraper needs to detect this content swap explicitly. Look for the signature cf-mitigated header, the presence of __cf_chl_ cookies, or HTML containing "Just a moment...". Treat any of these as a soft block.
```python
def is_challenged(response) -> bool:
    """Heuristics for a Cloudflare challenge rather than a real response."""
    if response.status_code in (403, 503):
        return True
    if "cf-mitigated" in response.headers:
        return True
    if "__cf_chl_" in response.headers.get("set-cookie", ""):
        return True
    body = response.text[:2000].lower()
    return "just a moment" in body or "checking your browser" in body
```
When you detect a challenge, do not retry on the same IP for at least 30 minutes. Mark that IP as cooling and route subsequent requests to a different IP in your pool. Aggressive retries on a flagged IP cause the cooling window to extend and can lead to long-term blacklisting of your subnet.
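A minimal pool sketch that implements this cooling behavior. The next()/mark_cooling() interface is the shape assumed by the proxy_pool objects in the earlier snippets; there is no locking or persistence, so treat it as a starting point rather than a production pool.

```python
import itertools
import time

class CoolingProxyPool:
    """Round-robin pool that skips IPs marked as cooling. A sketch:
    single-threaded, in-memory, no health checks."""

    def __init__(self, proxies: list[str], cooldown_seconds: int = 1800):
        self._proxies = proxies
        self._cycle = itertools.cycle(proxies)
        self._cooldown = cooldown_seconds  # 30 minutes, per the guidance above
        self._cooling_until: dict[str, float] = {}

    def mark_cooling(self, proxy: str):
        self._cooling_until[proxy] = time.time() + self._cooldown

    def next(self) -> str:
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if self._cooling_until.get(proxy, 0) < time.time():
                return proxy
        raise RuntimeError("all proxies are cooling")
```

Call mark_cooling() whenever is_challenged() returns True for a response fetched through that proxy, and the pool routes subsequent requests around the flagged IP automatically.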
For pages that absolutely must be fetched (a specific SKU your client cares about), have a fallback path that uses a headless browser with a real Nigeria residential IP. The browser path costs more per page but solves the small percentage of challenges that the API path cannot handle. Most production setups maintain a 95/5 split: 95% of requests go through the lightweight HTTP and JSON path, 5% fall through to the browser path on challenge.
Working with NGN pricing and FX normalization
Pricing in Nigeria is denominated in NGN, and any cross-market analysis requires careful FX normalization. The naive approach of converting at scrape time using a live FX feed introduces noise into your trend lines because exchange rate movements get conflated with real price changes. The correct pattern is to store the price in local NGN and apply FX conversion at query time using a daily reference rate.
```sql
CREATE TABLE fx_rates (
    rate_date  DATE          NOT NULL,
    base_ccy   VARCHAR(3)    NOT NULL,
    quote_ccy  VARCHAR(3)    NOT NULL,
    rate       DECIMAL(18,8) NOT NULL,
    PRIMARY KEY (rate_date, base_ccy, quote_ccy)
);
```
Source the daily rates from a reliable feed such as the European Central Bank reference rates or your bank’s wholesale feed. Avoid scraping retail FX rates because they include the bank’s spread and produce inconsistent comparisons. For analyses that span multiple years, also account for currency revaluation events that occasionally happen in emerging markets.
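Conversion then happens in the read path. A sketch against the snapshot table used in the analytics section below; the column names are assumptions, and rate is read here as NGN per USD.

```sql
-- Hedged sketch: assumes snapshot(sku, selling_price, snapshot_at) with
-- prices stored in NGN, converted to USD at query time via the daily rate.
SELECT s.sku,
       s.selling_price / fx.rate AS price_usd,
       s.snapshot_at
FROM snapshot s
JOIN fx_rates fx
  ON fx.rate_date = s.snapshot_at::date
 AND fx.base_ccy  = 'USD'
 AND fx.quote_ccy = 'NGN';
```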
Comparing Jumia to other regional marketplaces
| Marketplace | Country focus | Catalogue scale | Bot strictness |
|---|---|---|---|
| Jumia | Pan-African, Nigeria largest | Large | High |
| Konga | Nigeria | Medium | Medium |
| Jiji | Nigeria plus several other African markets (classifieds) | Smaller | Lower |
Cross-marketplace analyses help separate platform-specific dynamics from genuine market trends. If a price drops on Jumia but stays flat across the comparable competitors, that is a platform-driven event rather than a market-wide signal. Your scraping pipeline should ingest from at least three platforms in any market where you intend to publish category insights.
Operational monitoring and alerting
Every production scraper needs three monitoring layers regardless of target. The first is per-IP success rate over a 5-minute window, alerting if any IP drops below 80%. The second is parser error rate, alerting if more than 1% of fetched pages fail to extract the canonical fields. The third is data freshness, alerting if your downstream consumers see snapshots more than 24 hours old.
```python
import time
from collections import deque

class IPHealthTracker:
    """Sliding-window success tracker, keyed by proxy IP."""

    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.events = {}

    def record(self, ip: str, success: bool):
        bucket = self.events.setdefault(ip, deque())
        now = time.time()
        bucket.append((now, success))
        # Evict events that have aged out of the window.
        while bucket and bucket[0][0] < now - self.window:
            bucket.popleft()

    def success_rate(self, ip: str) -> float:
        bucket = self.events.get(ip)
        if not bucket:
            return 1.0
        successes = sum(1 for _, ok in bucket if ok)
        return successes / len(bucket)
```
Wire this into Prometheus or your existing observability stack so the on-call engineer sees IP degradation as it happens rather than after the daily snapshot fails. For long-running operations against Jumia, IP rotation triggered by the health tracker is more reliable than fixed rotation schedules.
Legal and compliance considerations for Nigeria
Public product, price, and availability data are generally treated as fair to scrape in most jurisdictions, but Nigeria has its own consumer protection and personal data frameworks that overlay any general analysis. Confine your collection to non-personal data: SKU identifiers, prices, descriptions, ratings as aggregates, and seller display names. Avoid collecting individual buyer reviews with names, phone numbers, or email addresses attached, and avoid pulling any data behind a login.
For commercial deployment of a scraper that targets Jumia, document your basis for processing, your data retention period, and your purpose limitation. Most data protection regimes treat scraped public data more favorably when there is a clear lawful basis and the data is not used for direct marketing to identified individuals. Nigeria's Data Protection Act and the comparable frameworks in the other Jumia markets are the natural reference points when documenting your approach.
Pipeline orchestration and scheduling
For any non-trivial scraping operation, a dedicated orchestration layer is the difference between a script you babysit and a service that runs unattended. The two strong open-source choices in 2026 are Prefect 3 and Dagster. Both handle the patterns you need: DAG dependencies, retries, observability, secret management, and dynamic fan-out across IPs and categories.
```python
from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def fetch_category(category_id: int, page: int):
    # crawl_one_page is a placeholder for the category crawl shown earlier.
    return crawl_one_page(category_id, page)

@task
def store_pages(pages: list):
    write_to_db(pages)  # placeholder for your persistence layer

@flow(name="jumia-daily-sweep")
def daily_sweep(category_ids: list):
    futures = []
    for cid in category_ids:
        for page in range(1, 51):  # pagination caps at ~50 pages per category
            futures.append(fetch_category.submit(cid, page))
    pages = [f.result() for f in futures]
    store_pages(pages)
```
Run the flow on a 6-hour or 24-hour schedule depending on how dynamic the underlying catalogue is. For seasonal markets like apparel where pricing changes daily, a 6-hour cadence catches the meaningful movements without driving up proxy costs unnecessarily. For long-tail categories like books or industrial supplies, daily is sufficient and the cost saving is meaningful.
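One way to pin that cadence with Prefect is flow.serve() with a cron schedule. A sketch: the cron expression and category IDs are illustrative placeholders.

```python
if __name__ == "__main__":
    # serve() runs a long-lived process that triggers the flow on a 6-hour
    # cron. Swap the cron string for "0 0 * * *" for a daily cadence.
    daily_sweep.serve(
        name="jumia-6h-sweep",
        cron="0 */6 * * *",
        parameters={"category_ids": [100, 101, 102]},
    )
```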
Sample analytics queries on the collected dataset
Once your snapshots are landing reliably, the analytics layer is where the value materializes. A few queries that consistently come up across Jumia datasets:
```sql
-- Top 50 SKUs by price movement in the last 7 days.
-- Note: MIN - MAX measures the spread inside the window (always <= 0),
-- not a confirmed chronological drop; compare first and last snapshots
-- per SKU if you need direction.
SELECT sku, MIN(selling_price) - MAX(selling_price) AS price_drop
FROM snapshot
WHERE snapshot_at > now() - interval '7 days'
GROUP BY sku
ORDER BY price_drop ASC
LIMIT 50;

-- Stock-out frequency per category over the last 30 days.
SELECT category_id,
       SUM(CASE WHEN in_stock = 0 THEN 1 ELSE 0 END)::float / COUNT(*) AS oos_rate
FROM snapshot
WHERE snapshot_at > now() - interval '30 days'
GROUP BY category_id
ORDER BY oos_rate DESC;

-- New SKUs first seen in the last 14 days.
SELECT sku, MIN(snapshot_at) AS first_seen
FROM snapshot
GROUP BY sku
HAVING MIN(snapshot_at) > now() - interval '14 days'
ORDER BY first_seen DESC;
```
These three queries alone power most of the dashboards a category manager wants. Add a brand share view, a seller concentration view, and a campaign-frequency view and you have a competitive intelligence product. The collection layer is the prerequisite; the analytics layer is where you create defensible value.
Versioning your scraper for catalogue evolution
Every ecommerce site evolves its catalogue structure regularly. New attribute fields appear, old fields are deprecated, category trees are reorganized, and pricing display logic changes. Your scraper code has to evolve with these changes, and a versioning pattern that keeps old data interpretable is critical. Stamp every snapshot row with the scraper version that produced it. When you deploy a new version of the parser, increment the version number. Downstream analytics can filter by version when they need consistent semantics across a time range, or join across versions when they want long-running trend analysis.
```sql
ALTER TABLE snapshot ADD COLUMN scraper_version VARCHAR(16);
CREATE INDEX scraper_version_idx ON snapshot (scraper_version);
```
Pair this with a small registry table that documents what each scraper version did differently. When a downstream user asks why a particular metric jumped on a specific date, the version registry usually has the answer.
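A minimal shape for that registry, as a sketch; extend the columns to match your changelog practice.

```sql
CREATE TABLE scraper_versions (
    scraper_version VARCHAR(16) PRIMARY KEY,
    deployed_at     TIMESTAMP NOT NULL,
    change_summary  TEXT NOT NULL  -- what this version parses differently
);
```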
Common pitfalls when scraping Jumia
Three issues catch most teams off guard. The first is country-domain fragmentation. Jumia operates 11 country sites (jumia.com.ng, jumia.co.ke, jumia.ci, etc), each with its own currency, language, and seller pool. The same SKU can sit on multiple country sites with different prices and different stock. A scraper that treats jumia.com.ng prices as representative of West Africa understates Ivorian prices by 10-25% on average.
The second is JumiaPay vs cash-on-delivery price drift. Some sellers offer a JumiaPay discount that is rendered only when the user signs in. Anonymous scraping captures the list price; authenticated scraping captures the discounted price. Decide which population matters for your analysis and stay consistent.
The third is flash-sale staleness. The flash-sale listings serve a curated subset of SKUs with countdown timers, and the countdown is computed client-side from a server-issued epoch. If you cache the response for more than 60 seconds during an active flash sale, the timer drifts and downstream consumers see incorrect end times. Bypass your cache for flash-sale pages and accept the higher request cost during campaign hours.
FAQ
Can I scrape all Jumia countries from a single Nigerian residential IP pool?
Technically yes, but Jumia geo-classifies the visitor IP and serves different catalogues per country. A Nigerian IP requesting jumia.co.ke works but raises the bot score because real Kenyan traffic does not originate from Nigerian residential networks. For each country you intend to scrape, source proxies from that country or from a regional adjacent market.
How does Jumia handle the difference between Mall and non-Mall sellers in the API?
There is no public API. In the HTML, Jumia Mall items are tagged with a CSS badge and a separate data-mall attribute on the seller card. Capture this signal at parse time and persist it in your snapshot table so downstream analyses can filter by Mall status.
Does Jumia’s price include shipping?
The price shown on the listing card is the product price exclusive of shipping. Shipping is calculated at checkout based on delivery zone and seller fulfillment method. For total cost analyses you would need to simulate add-to-basket flows, which is more complex and less reliable. Most price intelligence projects work with the listed price as the canonical signal.
Are Jumia reviews scrapable for sentiment analysis?
Reviews are visible on product pages and accessible through HTML scraping. They include the reviewer’s display name (usually a first name and initial) and the review text. From a privacy perspective, treat the display name as personal data and avoid storing it in long-term datasets. The review text and rating are the analytically useful fields.
Does Jumia run a sitemap I can use for SKU discovery?
Jumia exposes sitemaps at /sitemap.xml for each country. They list categories and product URLs but the per-product entries are typically only a sample of the full catalogue. For exhaustive SKU discovery, combine sitemap parsing with category and seller crawls.
Can one scraper cover all 11 Jumia country sites?
Architecturally yes, but you need per-country proxy pools. A Lagos residential IP that hits jumia.co.ke succeeds at low volume but is throttled aggressively above ~200 requests per hour.
How do I reconcile Jumia data with Konga or other competitors?
Use brand+model+capacity as the join key for electronics. EAN/UPC coverage is patchy, so fall back to fuzzy title matching with a confidence score before merging analytics.
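A standard-library sketch of that fallback, using difflib's ratio as the confidence score; the 0.85 threshold is an illustrative assumption to tune against labeled pairs.

```python
from difflib import SequenceMatcher

def match_titles(title_a: str, title_b: str, threshold: float = 0.85) -> bool:
    """Fuzzy title match with a confidence score; case-normalized.
    The threshold is an assumption to calibrate on labeled pairs."""
    score = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    return score >= threshold
```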
To build a broader Africa ecommerce intelligence stack, browse the ecommerce scraping category for tooling reviews, proxy comparisons, and framework deep dives that pair with the patterns above.