How to scrape Konga Nigeria product listings

Scraping Konga gives you access to Nigeria's second-largest general-merchandise marketplace after Jumia. Konga has changed ownership several times since its founding in 2012 and has rebuilt its catalogue and logistics in each phase. By 2026 the platform has a strong base in electronics, fashion, and groceries, with a hybrid first-party and third-party seller model. Three things shape the scraping landscape: a JSON catalogue API that powers the front end, an aggressive Cloudflare front end that scrutinizes non-Nigerian traffic, and a SKU schema that overlaps partially with Jumia's, which matters for cross-marketplace analytics.

This guide focuses on Konga at konga.com as the canonical example. The patterns also apply to KongaPay merchant subsystems with minor adjustments.

Mapping Konga URL and JSON structure

Konga product URLs follow the pattern https://www.konga.com/product/<product-slug>-<productId>. The trailing productId is the canonical SKU. Behind every product page sits a JSON endpoint at https://api.konga.com/v1/catalog/products/sku/<productId>. The endpoint returns price, stock, full description, images, and seller information.

import httpx

API = "https://api.konga.com/v1/catalog/products/sku"

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Accept-Language": "en-NG,en;q=0.9",
}

async def fetch_konga(sku: str, proxy: str):
    url = f"{API}/{sku}"
    async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as c:
        r = await c.get(url)
        if r.status_code == 200:
            return r.json()
        return None

The response includes the canonical product object with sku, name, brand, original_price, special_price, stock_status, seller_name, seller_id, categories, and rating_summary. For most analytical use cases the API is sufficient.
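
If your input is a list of product page URLs rather than SKUs, you need the trailing productId before you can call the JSON endpoint. The following is a minimal sketch, assuming the productId is the final run of digits after the last hyphen in the path; the helper name and the example URL are illustrative, not taken from Konga.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: the productId is the trailing run of digits after the last hyphen.
_SKU_RE = re.compile(r"-(\d+)$")

def sku_from_url(url: str) -> Optional[str]:
    """Return the trailing productId from a product URL, or None if absent."""
    path = urlparse(url).path.rstrip("/")
    if not path.startswith("/product/"):
        return None
    match = _SKU_RE.search(path)
    return match.group(1) if match else None

# Hypothetical URL, used only to show the shape of the input
print(sku_from_url("https://www.konga.com/product/example-phone-64gb-1234567"))
```

Returning None instead of raising lets the caller log and skip non-product URLs without interrupting a batch.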

Nigerian proxy strategy

Konga’s Cloudflare front end is aggressive against non-Nigerian IPs. Nigerian residential or mobile IPs through MTN, Airtel, 9mobile, or Glo are required for sustained scraping. Pan-African residential pools work for light loads but degrade at volume.

For workloads under 2,000 product reads per day, a small Nigerian residential pool with sticky 15-minute sessions is sufficient. For higher volumes, dedicated Nigerian mobile ports are the cleaner path because they sustain higher request rates without challenges.
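
A sticky session simply means reusing one proxy for a fixed period before rotating. The sketch below shows one way to do that, assuming your provider gives you a plain list of proxy URLs; the class name and the injectable clock are illustrative choices, and the `next()` method matches how the crawl code in this guide calls its proxy pool.

```python
import itertools
import time

class StickyProxyPool:
    """Hand out the same proxy for `sticky_seconds`, then rotate to the next."""

    def __init__(self, proxies, sticky_seconds=15 * 60, clock=time.monotonic):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = itertools.cycle(proxies)
        self._sticky = sticky_seconds
        self._clock = clock  # injectable so the rotation can be tested
        self._current = next(self._cycle)
        self._since = clock()

    def next(self) -> str:
        now = self._clock()
        if now - self._since >= self._sticky:
            # Session expired: move to the next proxy and restart the timer
            self._current = next(self._cycle)
            self._since = now
        return self._current
```

Many residential providers implement stickiness on their side through a session token in the proxy username, in which case this class only needs to rotate the token.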

Crawling the category tree

Konga exposes a category tree at https://api.konga.com/v1/catalog/categories. Each category has a url_key and an id. The listing endpoint at https://api.konga.com/v1/catalog/products accepts category, sort, and pagination parameters, with practical limits of 25 pages of 40 products each.

async def crawl_category(category_id: int, proxy_pool, max_pages: int = 25):
    results = []
    for page in range(1, max_pages + 1):
        proxy = proxy_pool.next()
        url = "https://api.konga.com/v1/catalog/products"
        params = {
            "category_id": category_id,
            "page": page,
            "limit": 40,
            "sort": "relevance",
        }
        async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as c:
            r = await c.get(url, params=params)
            if r.status_code != 200:
                break
            items = r.json().get("products", [])
            if not items:
                break
            results.extend(items)
    return results

For broader categories like Phones and Tablets, decompose by brand and price band facets exposed in the search response.
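
Decomposing a broad category amounts to generating one listing query per facet combination so that each query stays under the page cap. The sketch below illustrates the idea; the parameter names `brand`, `price_min`, and `price_max` are assumptions, so substitute whatever facet keys the search response actually exposes.

```python
from itertools import product

def price_bands(low: int, high: int, step: int) -> list[tuple[int, int]]:
    """Split [low, high) into half-open bands of width `step`."""
    return [(lo, min(lo + step, high)) for lo in range(low, high, step)]

def facet_queries(category_id: int, brands: list[str],
                  bands: list[tuple[int, int]]) -> list[dict]:
    """Build one query-parameter dict per (brand, price band) combination."""
    return [
        # Hypothetical parameter names; check them against the real facets
        {"category_id": category_id, "brand": b, "price_min": lo, "price_max": hi}
        for b, (lo, hi) in product(brands, bands)
    ]

queries = facet_queries(1, ["BrandA", "BrandB"], price_bands(0, 100_000, 50_000))
print(len(queries))
```

If a single combination still returns a full final page, split its price band further and re-run only that slice.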

Cross-checking Konga pricing against Jumia

The most analytically interesting Nigerian ecommerce signal is the price differential between Konga and Jumia for the same SKU. They overlap heavily on consumer electronics, household goods, and fashion. For brand monitoring, the price gap often signals which platform is running a promotion or which platform a particular seller is using as the price-leader channel.

| Field | Konga | Jumia | Notes |
|---|---|---|---|
| Canonical ID | sku (string) | SKU embedded in URL | Different schemes |
| Mall equivalent | KongaCare | Jumia Mall | Both flag verified sellers |
| Price scheme | original + special | regular + sale | Both expose pre/post-discount |
| Update cadence | Hourly | Hourly | Both refresh throughout the day |

For SKU matching across the two platforms, group by EAN where available, then by normalized title plus brand for the long tail. Match rates on Nigerian ecommerce specifically tend to be lower than other markets because EAN coverage is voluntary and many sellers use product names that emphasize Nigerian-specific marketing claims.
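
The two-stage match described above can be sketched with the standard library alone. This is an illustration under assumptions: the `ean`, `brand`, and `name` keys are placeholder field names, `difflib.SequenceMatcher` stands in for whatever similarity measure you prefer, and the 0.85 threshold is a starting value to tune against labeled pairs.

```python
import re
from difflib import SequenceMatcher

def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())

def match_key(item: dict) -> str:
    return normalize(f"{item.get('brand', '')} {item.get('name', '')}")

def find_match(konga_item: dict, jumia_items: list[dict], threshold: float = 0.85):
    """Return the best Jumia candidate for a Konga item, or None."""
    # Stage 1: exact EAN match when both sides carry one
    ean = konga_item.get("ean")
    if ean:
        for cand in jumia_items:
            if cand.get("ean") == ean:
                return cand
    # Stage 2: fuzzy match on normalized brand plus title
    key = match_key(konga_item)
    best, best_score = None, 0.0
    for cand in jumia_items:
        score = SequenceMatcher(None, key, match_key(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

The linear scan is fine for a few thousand candidates; at catalogue scale, block the candidates by brand first so each item is compared only against a small subset.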

Detecting and routing around CAPTCHA challenges on Konga

When Konga flags your traffic, the response is usually a Cloudflare or vendor interrogation page rather than a clean HTTP error. Your scraper needs to detect this content-type swap explicitly. Look for the signature cf-mitigated header, the presence of __cf_chl_ cookies, or HTML containing Just a moment.... Treat any of these as a soft block.

def is_challenged(response) -> bool:
    if response.status_code in (403, 503):
        return True
    if "cf-mitigated" in response.headers:
        return True
    if "__cf_chl_" in response.headers.get("set-cookie", ""):
        return True
    body = response.text[:2000].lower()
    return "just a moment" in body or "checking your browser" in body

When you detect a challenge, do not retry on the same IP for at least 30 minutes. Mark that IP as cooling and route subsequent requests to a different IP in your pool. Aggressive retries on a flagged IP extend the cooling window and can lead to long-term blacklisting of your subnet. For pages that absolutely must be fetched, keep a fallback path that uses a headless browser behind a real Nigerian residential IP. The browser path costs more per page but handles the small percentage of challenges that the API path cannot.
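
The cooling rule can be enforced in the pool itself so that callers never see a flagged IP. The following is a minimal sketch, assuming the pool holds plain IP or proxy strings; the class and method names are illustrative.

```python
import time

class CoolingPool:
    """Rotate over IPs, skipping any challenged within the last `cooldown` seconds."""

    def __init__(self, ips, cooldown=30 * 60, clock=time.monotonic):
        self._ips = list(ips)
        self._cooldown = cooldown
        self._clock = clock  # injectable for testing
        self._cooling_until = {}
        self._index = 0

    def mark_challenged(self, ip: str) -> None:
        """Call this whenever a response from `ip` looks like a challenge."""
        self._cooling_until[ip] = self._clock() + self._cooldown

    def next(self):
        """Return the next usable IP, or None if every IP is cooling."""
        now = self._clock()
        for _ in range(len(self._ips)):
            ip = self._ips[self._index % len(self._ips)]
            self._index += 1
            if self._cooling_until.get(ip, float("-inf")) <= now:
                return ip
        return None
```

A None return is the signal to pause the crawl or switch to the headless-browser fallback rather than hammering the remaining pool.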

Working with NGN pricing and FX normalization

Pricing on Konga is denominated in NGN, and any cross-market analysis requires careful FX normalization. The naive approach of converting at scrape time using a live FX feed introduces noise into your trend lines because exchange rate movements get conflated with real price changes. Store the price in local NGN and apply FX conversion at query time using a daily reference rate.

CREATE TABLE fx_rates (
    rate_date DATE NOT NULL,
    base_ccy VARCHAR(3) NOT NULL,
    quote_ccy VARCHAR(3) NOT NULL,
    rate DECIMAL(18,8) NOT NULL,
    PRIMARY KEY (rate_date, base_ccy, quote_ccy)
);

Source the daily rates from a reliable feed that actually quotes NGN, such as the Central Bank of Nigeria's published exchange rates or your bank's wholesale feed. Avoid scraping retail FX rates because they include the bank spread.
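
Query-time conversion is a lookup of the reference rate for the snapshot date followed by a division. The sketch below uses an in-memory dict as a stand-in for the fx_rates table; the rate value is an illustrative number, not a real quote, and the convention that a USD/NGN row stores naira per dollar is an assumption.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Stand-in for fx_rates: (rate_date, base_ccy, quote_ccy) -> rate.
# Illustrative value only; assumed convention is NGN per 1 USD.
FX_RATES = {
    (date(2026, 1, 5), "USD", "NGN"): Decimal("1500.00"),
}

def ngn_to_usd(price_ngn: Decimal, on: date, rates=FX_RATES) -> Decimal:
    """Convert an NGN price using the reference rate for `on`.

    Raises KeyError when no rate is stored for that date, which is safer
    than silently falling back to a different day's rate.
    """
    rate = rates[(on, "USD", "NGN")]
    return (price_ngn / rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(ngn_to_usd(Decimal("300000"), date(2026, 1, 5)))
```

Using Decimal rather than float keeps the stored NGN prices and the derived USD values free of binary rounding artifacts.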

Comparing Konga to other regional marketplaces

| Marketplace | Country focus | Catalogue scale | Bot strictness |
|---|---|---|---|
| Konga | Nigeria | Large | High |
| Jumia | Adjacent markets | Medium | Medium |
| Jiji | Adjacent markets | Smaller | Lower |

Cross-marketplace analyses help separate platform-specific dynamics from genuine market trends. If a price drops on Konga but stays flat across the comparable competitors, that is a platform-driven event rather than a market-wide signal.
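
One way to make that distinction mechanical is to compare the Konga change against the changes on comparable listings elsewhere. The sketch below is illustrative only: the 2% tolerance and the simple-majority rule are assumptions to tune, and the inputs are fractional price changes you have already computed for matched SKUs.

```python
def classify_price_move(konga_change: float, competitor_changes: list[float],
                        tolerance: float = 0.02) -> str:
    """Label a Konga price move given fractional changes on comparable listings."""
    if abs(konga_change) <= tolerance:
        return "no_move"
    if not competitor_changes:
        return "unknown"
    # Competitors that moved meaningfully in the same direction as Konga
    same_direction = [
        c for c in competitor_changes
        if abs(c) > tolerance and (c > 0) == (konga_change > 0)
    ]
    # Assumed rule: a strict majority moving together means a market-wide event
    if len(same_direction) * 2 > len(competitor_changes):
        return "market_wide"
    return "platform_driven"

print(classify_price_move(-0.10, [0.0, 0.01]))
```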

Operational monitoring and alerting

Every production scraper needs three monitoring layers. The first is per-IP success rate over a 5-minute window, alerting if any IP drops below 80%. The second is parser error rate, alerting if more than 1% of fetched pages fail to extract the canonical fields. The third is data freshness, alerting if your downstream consumers see snapshots more than 24 hours old.

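import time
from collections import deque

class IPHealthTracker:
    """Rolling per-IP success rate over the last `window_seconds`."""

    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.events = {}  # ip -> deque of (timestamp, success)

    def _evict(self, bucket: deque, now: float) -> None:
        # Drop events that have aged out of the window
        while bucket and bucket[0][0] < now - self.window:
            bucket.popleft()

    def record(self, ip: str, success: bool):
        bucket = self.events.setdefault(ip, deque())
        now = time.time()
        bucket.append((now, success))
        self._evict(bucket, now)

    def success_rate(self, ip: str) -> float:
        bucket = self.events.get(ip)
        if bucket:
            # Evict on read too, so an idle IP is not judged on stale events
            self._evict(bucket, time.time())
        if not bucket:
            return 1.0  # no recent data: treat the IP as healthy
        successes = sum(1 for _, ok in bucket if ok)
        return successes / len(bucket)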

Wire this into Prometheus or your existing observability stack so the on-call engineer sees IP degradation as it happens rather than after the daily snapshot fails.
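
The three thresholds can be evaluated in one function that returns the alerts to raise. This is a sketch under assumptions: the function name and argument shapes are illustrative, and in practice the inputs would come from your metrics store rather than being passed in directly.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def monitoring_alerts(ip_success_rates: dict, pages_fetched: int,
                      pages_failed_parse: int, newest_snapshot_at: datetime,
                      now: Optional[datetime] = None) -> list[str]:
    """Evaluate the IP, parser, and freshness thresholds; return alert messages."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    # Layer 1: any IP below 80% success in the rolling window
    for ip, rate in sorted(ip_success_rates.items()):
        if rate < 0.80:
            alerts.append(f"ip {ip} success rate {rate:.0%} is below 80%")
    # Layer 2: more than 1% of fetched pages failed to parse
    if pages_fetched and pages_failed_parse / pages_fetched > 0.01:
        alerts.append(f"parser error rate {pages_failed_parse / pages_fetched:.1%} exceeds 1%")
    # Layer 3: newest snapshot older than 24 hours
    if now - newest_snapshot_at > timedelta(hours=24):
        alerts.append("newest snapshot is more than 24 hours old")
    return alerts
```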

Legal and compliance considerations for Nigeria

Public product, price, and availability data are generally treated as fair to scrape in most jurisdictions, but Nigeria has its own consumer protection and personal data frameworks, notably the Federal Competition and Consumer Protection Act 2018 and the Nigeria Data Protection Act 2023. Confine your collection to non-personal data: SKU identifiers, prices, descriptions, ratings as aggregates, and seller display names. Avoid collecting individual buyer reviews with names, phone numbers, or email addresses attached, and avoid pulling any data behind a login.

For commercial deployment, document your basis for processing, your data retention period, and your purpose limitation. Guidance published by the Nigeria Data Protection Commission is a useful starting point for documenting your approach.

Pipeline orchestration and scheduling

For any non-trivial scraping operation, a dedicated orchestration layer is the difference between a script you babysit and a service that runs unattended. The two strong open-source choices in 2026 are Prefect 3 and Dagster.

from prefect import flow, task

MAX_PAGES = 25  # practical listing limit noted earlier: 25 pages of 40 products

@task(retries=3, retry_delay_seconds=60)
def fetch_category(category_id: int, page: int):
    # crawl_one_page is a placeholder for your own single-page fetch helper
    return crawl_one_page(category_id, page)

@flow(name="konga-daily-sweep")
def daily_sweep(category_ids: list[int]):
    futures = []
    for cid in category_ids:
        for page in range(1, MAX_PAGES + 1):
            futures.append(fetch_category.submit(cid, page))
    return [f.result() for f in futures]

Run the flow on a 6-hour or 24-hour schedule depending on how dynamic the underlying catalogue is.

Sample analytics queries

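-- Top 50 SKUs by price drop in the last 7 days (latest price vs. earliest price in the window)
WITH w AS (
    SELECT DISTINCT sku,
           FIRST_VALUE(selling_price) OVER (PARTITION BY sku ORDER BY snapshot_at ASC)  AS first_price,
           FIRST_VALUE(selling_price) OVER (PARTITION BY sku ORDER BY snapshot_at DESC) AS last_price
    FROM snapshot
    WHERE snapshot_at > now() - interval '7 days'
)
SELECT sku, last_price - first_price AS price_change
FROM w
WHERE last_price < first_price
ORDER BY price_change ASC
LIMIT 50;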

-- Stock-out frequency per category
SELECT category_id,
       SUM(CASE WHEN in_stock = 0 THEN 1 ELSE 0 END)::float / COUNT(*) AS oos_rate
FROM snapshot
WHERE snapshot_at > now() - interval '30 days'
GROUP BY category_id
ORDER BY oos_rate DESC;

-- New SKUs first seen in the last 14 days
SELECT sku, MIN(snapshot_at) AS first_seen
FROM snapshot
GROUP BY sku
HAVING MIN(snapshot_at) > now() - interval '14 days';

These queries power most of the dashboards a category manager wants. Add a brand share view, a seller concentration view, and a campaign-frequency view and you have a competitive intelligence product.

Building robust deduplication across noisy listings

The long-tail catalogue is full of near-duplicate listings. The standard deduplication approach uses a three-pass funnel: exact match on EAN, normalized title plus brand TF-IDF similarity, then perceptual image hash similarity.

import imagehash
from PIL import Image

def perceptual_hash(image_path: str) -> str:
    img = Image.open(image_path)
    return str(imagehash.phash(img, hash_size=16))

Tune the similarity thresholds against a hand-labeled gold set of 500 to 1,000 known duplicate clusters. Without a gold set, you will either over-merge or under-merge.
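
Perceptual hashes are compared by Hamming distance, the number of bits that differ between two hashes. The sketch below works directly on the hex strings that `str(imagehash.phash(...))` produces, so it needs no image library; the 20-bit cutoff is a placeholder to be tuned against your gold set, not a recommended value.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_near_duplicate(hash_a: str, hash_b: str, max_bits: int = 20) -> bool:
    """True when two hashes differ by at most `max_bits` bits.

    With hash_size=16 each hash has 256 bits, so max_bits=20 tolerates
    roughly 8% differing bits. Placeholder threshold: tune it on labeled data.
    """
    return hamming_distance(hash_a, hash_b) <= max_bits

print(hamming_distance("ff00", "ff01"))
```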

Caching strategy and incremental crawls

Full daily snapshots scale linearly with catalogue size, which becomes expensive at multi-million SKU scale. Most production deployments shift from full snapshots to incremental refreshes after the initial ramp using freshness deadline, volatility, and business priority signals to decide what to refetch on each cycle. Priority-driven scheduling reduces total request volume by 60-80% compared to blind full snapshots, while keeping the data fresh on the SKUs that actually matter to the business.
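
Priority-driven scheduling can be as simple as scoring every SKU and refetching the top of the list within a request budget. The sketch below combines the three signals with fixed weights; the weights, the field names, and the 0-to-1 scaling of volatility and business priority are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class SkuState:
    sku: str
    hours_since_fetch: float
    freshness_deadline_hours: float
    volatility: float         # assumed scale 0..1, e.g. share of days with a price change
    business_priority: float  # assumed scale 0..1, set by the business

def refresh_score(s: SkuState, w_fresh=0.5, w_vol=0.3, w_biz=0.2) -> float:
    """Higher score means refetch sooner. Weights are illustrative."""
    # Staleness relative to the deadline, capped so one overdue SKU cannot dominate
    staleness = min(s.hours_since_fetch / s.freshness_deadline_hours, 2.0)
    return w_fresh * staleness + w_vol * s.volatility + w_biz * s.business_priority

def pick_refresh_batch(states: list[SkuState], budget: int) -> list[str]:
    """Return the SKUs to refetch this cycle, highest score first."""
    ranked = sorted(states, key=refresh_score, reverse=True)
    return [s.sku for s in ranked[:budget]]
```

The budget is the lever that ties the schedule to proxy cost: lowering it trades freshness on low-scoring SKUs for fewer requests.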

End-to-end pipeline architecture

A production-grade scraping pipeline has four layers that work together: collection, parsing, storage, and serving. Each layer has its own failure modes and its own scaling characteristics, and treating them as a single monolith is the most common architectural mistake teams make when scaling beyond hobby workloads.

The collection layer handles the network conversation: HTTP requests, proxy assignment, retry logic, and rate limit enforcement. It should know nothing about the data shape and nothing about how the data will eventually be queried. Its only job is to fetch raw bytes reliably and hand them off to the next layer with metadata about which IP, which user agent, and which timestamp produced them.

The parsing layer transforms raw bytes into structured records. It owns the schema, the field normalization, and the validation rules. When the upstream HTML or JSON structure changes, only the parsing layer needs to adapt. Keep parsers idempotent and version them aggressively so old raw bytes can be re-parsed when you discover bugs.

The storage layer holds the canonical snapshots in a query-optimized format. For most ecommerce datasets, a column-oriented store like DuckDB, ClickHouse, or BigQuery outperforms row-oriented Postgres at analytical scale. The trade-off is write latency and update support; column stores prefer append-only and bulk loads, which fits the snapshot model naturally.

The serving layer exposes the data to consumers, whether that is a BI dashboard, an API for downstream systems, or an alerting pipeline. Keep the serving layer denormalized and pre-aggregated where possible. Recomputing complex analytics on every dashboard load wastes resources and hurts responsiveness.

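# Sketch of the four-layer split; http_get, proxy_pool, and db are placeholders you supply
import json
from collections import namedtuple
from datetime import datetime, timezone

RawFetch = namedtuple("RawFetch", "url body fetched_at ip")
Snapshot = namedtuple("Snapshot", "sku price fetched_at")

async def collect(url: str, proxy_pool) -> RawFetch:
    # Collection: fetch raw bytes and record which IP and timestamp produced them
    proxy = proxy_pool.next()
    response = await http_get(url, proxy)
    return RawFetch(url=url, body=response.text, fetched_at=datetime.now(timezone.utc), ip=proxy.ip)

def parse(raw: RawFetch) -> Snapshot:
    # Parsing: the only layer that knows the payload shape (keys here are illustrative)
    data = json.loads(raw.body)
    return Snapshot(sku=data["id"], price=data["price"], fetched_at=raw.fetched_at)

def store(snapshot: Snapshot, db) -> None:
    # Storage: append-only writes fit the snapshot model
    db.append("snapshots", snapshot)

def serve(query: str, db) -> list:
    # Serving: read-only access for dashboards, APIs, and alerts
    return db.query(query)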

Decoupling these layers also enables independent scaling. The collection layer is bound by proxy capacity and network bandwidth. The parsing layer is CPU-bound. The storage layer is bound by I/O and disk capacity. The serving layer is bound by query concurrency. Each layer can scale horizontally without coupling to the others.

Data quality monitoring patterns

Beyond per-IP success rate, every snapshot should pass a small battery of data quality checks before being considered authoritative. The checks fall into three categories: structural, distributional, and semantic.

Structural checks verify that every required field is present and of the expected type. A snapshot row missing the price field is not a real snapshot. A row with a negative price is not a real price.

Distributional checks compare the current snapshot against recent history. If today’s snapshot has 30% fewer SKUs than yesterday, something broke either in collection or in the upstream catalogue. Either way, the on-call engineer needs to investigate before downstream consumers see broken data.

Semantic checks compare related fields for consistency. If a SKU shows in_stock = true but stock_quantity = 0, one of the fields is wrong. If the discount percentage is computed from list_price and selling_price, the computed value should match the stated discount field.

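def quality_check(snapshot: list[dict]) -> list[str]:
    errors = []
    if not snapshot:
        errors.append("empty snapshot")
        return errors
    # Distributional check; get_yesterday_avg_size() is a placeholder for your history lookup
    avg_yesterday = get_yesterday_avg_size()
    if len(snapshot) < avg_yesterday * 0.7:
        errors.append(
            f"snapshot size {len(snapshot)} is more than 30% below yesterday's {avg_yesterday}"
        )
    # Structural check: a missing, non-numeric, or negative price is invalid
    invalid = [
        r for r in snapshot
        if not isinstance(r.get("price"), (int, float)) or r["price"] < 0
    ]
    if invalid:
        errors.append(f"{len(invalid)} rows have invalid price")
    return errors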

Run quality checks as a separate flow that gates promotion of the snapshot from staging to production. A snapshot that fails quality checks should be quarantined for human review, not silently published.
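
The semantic checks described above operate per row rather than per snapshot. A minimal sketch follows, assuming rows carry `in_stock`, `stock_quantity`, `list_price`, and `selling_price`; the `discount_pct` key for the stated discount and the one-point tolerance are assumptions.

```python
def semantic_errors(row: dict, tolerance_pct: float = 1.0) -> list[str]:
    """Cross-field consistency checks for one snapshot row."""
    errors = []
    # Stock flag and quantity must agree
    if row.get("in_stock") and row.get("stock_quantity") == 0:
        errors.append("in_stock is true but stock_quantity is 0")
    # Stated discount should match the one implied by the two prices
    list_price = row.get("list_price")
    selling_price = row.get("selling_price")
    stated = row.get("discount_pct")  # hypothetical field name
    if list_price and selling_price is not None and stated is not None:
        computed = (list_price - selling_price) / list_price * 100
        if abs(computed - stated) > tolerance_pct:
            errors.append(
                f"stated discount {stated}% differs from computed {computed:.1f}%"
            )
    return errors
```

Rows missing any of the price fields are skipped here on purpose: field presence is the job of the structural checks, and reporting it twice would inflate the error counts.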

Cost optimization strategies

Proxy bandwidth is usually the dominant cost in a production scraping operation. Three optimization patterns consistently reduce cost without hurting data quality. The first is request deduplication: if two consumers ask for the same SKU within the same hour, the system should serve the cached response rather than refetching. The second is conditional GET: when the upstream supports ETag or If-Modified-Since headers, conditional requests transfer no body when the resource has not changed. The third is selective field hydration: when the upstream API supports field selection, requesting only the fields you need reduces payload size dramatically.

For workloads above 100 GB of monthly proxy bandwidth, these three optimizations together reduce cost by 40-60% without changing the analytical output. The engineering effort to implement them is modest and the payback period is usually under a month at production volume.
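
Request deduplication is the simplest of the three to implement: a cache keyed by SKU with a one-hour time-to-live. The sketch below is an in-process illustration; the class name is made up for this example, and a shared store such as Redis would replace the dict when several workers need the same cache.

```python
import time

class TTLCache:
    """Serve a cached value for `ttl_seconds` before calling `fetch` again."""

    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable for testing
        self._store = {}     # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no proxy bandwidth spent
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

Usage is a one-line wrap around the existing fetch, for example `cache.get_or_fetch(sku, fetch_product)` where `fetch_product` is whatever synchronous fetch function you already have.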

Common pitfalls when scraping Konga

Three issues recur. The first is KongaPay-financing price masking. Konga shows the headline price on the product card and a financed monthly figure on the detail page. Some scrapers extract the monthly figure and treat it as the SKU price, which understates the cash price by an order of magnitude. Always pull the selling_price field from the product JSON, not from rendered HTML.

The second is seller-type confusion. Konga operates a first-party (Konga Retail) and a third-party seller pool on the same SKU. Pricing dynamics differ: first-party prices are stable for weeks, third-party prices reprice daily. Aggregating both into one population smears the trend signal. Segment by seller_type before computing any time-series metric.

The third is FX-driven price staleness. The Naira moves frequently against the USD, and sellers reprice imported electronics on a 1-3 day lag after FX shocks. A scraper that stores only the local-currency price loses the FX-attribution signal. Record the reference FX rate for each scrape date from a stable source (CBN or Bloomberg) and compute USD-equivalent prices in a derived column.
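
For the seller-type pitfall, the fix is to group before aggregating. The sketch below computes a median price per day and seller type; the `snapshot_date` key and the `first_party` / `third_party` values are assumed names for illustration, so map them to whatever your snapshot rows actually contain.

```python
from collections import defaultdict
from statistics import median

def median_price_by_segment(rows: list[dict]) -> dict:
    """Median selling_price per (snapshot_date, seller_type)."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["snapshot_date"], r["seller_type"])].append(r["selling_price"])
    return {key: median(prices) for key, prices in groups.items()}

rows = [
    {"snapshot_date": "2026-01-05", "seller_type": "first_party", "selling_price": 100},
    {"snapshot_date": "2026-01-05", "seller_type": "third_party", "selling_price": 90},
    {"snapshot_date": "2026-01-05", "seller_type": "third_party", "selling_price": 110},
    {"snapshot_date": "2026-01-05", "seller_type": "third_party", "selling_price": 95},
]
print(median_price_by_segment(rows))
```

The median is used instead of the mean so that a single mispriced third-party listing does not move the segment's trend line.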

FAQ

Is the Konga API officially documented?
The api.konga.com endpoints are the same endpoints used by the public web site. They have been stable for several years but are not contractually supported.

Can I scrape Konga from a UK or US residential IP?
For occasional product lookups, yes. For sustained scraping, Konga blocks non-Nigerian IPs aggressively. Nigerian residential or mobile IPs are strongly preferred.

Does Konga distinguish between first-party and third-party sellers?
Yes. The seller_id and seller_name fields identify the seller. KongaCare-flagged sellers receive verification badges. For analytics, separating first-party Konga listings from third-party sellers is essential because the price dynamics differ.

How does Konga handle the Naira’s frequent revaluation?
Konga prices update with the underlying Naira movements, but lag behind FX shocks by 24-72 hours as sellers reprice. For longitudinal analyses, normalize prices to a stable currency using daily reference FX rates.

What about Konga’s logistics promise vs. actual delivery?
The product page shows the logistics promise but the API does not expose delivery success rates. For analytics that need to assess seller delivery reliability, you have to scrape buyer reviews and aggregate the delivery-related sentiment, which requires more text processing.

Are there days when Konga rate-limits scrapers more aggressively?
Yes. Black Friday week (late November) and December gift season (Dec 15-24) carry the tightest rate limits. Pre-stage baseline snapshots in October to avoid being throttled during the actual event.

How do I match Konga SKUs to Jumia for cross-marketplace analytics?
Use brand + model + storage capacity as the canonical join. Konga’s GTIN coverage is sparse, so fuzzy title matching with a similarity threshold of 0.85 is the practical fallback.

Does Konga’s mobile app expose a different API surface than the web site?
Yes. The Android app uses protobuf endpoints under mobileapi.konga.com for personalization, while the public web JSON layer under api.konga.com covers catalog and pricing. For catalog and pricing intelligence the web endpoints are sufficient and easier to maintain. Reverse-engineering the protobuf surface only pays back when the analytics design requires the personalization signal.

To build a broader Nigeria ecommerce intelligence stack, browse the ecommerce scraping category for tooling reviews, proxy comparisons, and framework deep dives.
