How to scrape Noon UAE ecommerce in 2026

Scraping Noon UAE means targeting the dominant home-grown ecommerce platform in the Gulf, which runs separate marketplaces for the United Arab Emirates, Saudi Arabia, and Egypt. Noon launched in 2017 as a Mohammed bin Rashid initiative to compete with Amazon’s regional expansion, and by 2026 it has captured a meaningful share of GCC ecommerce GMV through Noon Daily for grocery, Noon Food for delivery, and the core noon.com marketplace for general merchandise. The scraping landscape is shaped by three things: dual-language content (Arabic and English), country-specific catalogues with overlapping but not identical SKUs, and a Cloudflare front end that aggressively profiles non-GCC traffic.

This guide covers Noon UAE specifically. Most patterns transfer to noon.com.sa (Saudi) and noon.com.eg (Egypt) with minor adjustments to the country code parameter and currency parsing.

How Noon’s URL and language structure works

Noon URLs include a country segment, a language segment, and the product slug. A UAE Arabic URL looks like https://www.noon.com/uae-ar/<product-slug>/<sku>/p/, and the English equivalent is https://www.noon.com/uae-en/.... The two URLs serve different language pages but resolve to the same SKU. The trailing /p/ is what tells Noon’s routing layer that this is a product detail page.

The SKU at the end of the URL is the canonical identifier. A given product is sold in UAE, Saudi, and Egypt with the same SKU prefix but different listing IDs per country. If you are building a cross-country catalogue, capture both the SKU and the country segment as a composite key.
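As a sketch of that composite key, assuming the product URL shape described above (the regex and function names here are our own, not part of any Noon API):

```python
import re

# Matches the product-detail URL shape described above:
# noon.com/<country>-<lang>/<slug>/<sku>/p/
PDP_RE = re.compile(
    r"noon\.com/(?P<country>[a-z]+)-(?P<lang>[a-z]{2})/[^/]+/(?P<sku>[A-Za-z0-9]+)/p/?"
)

def composite_key(url: str):
    """Return (country, sku) for cross-country catalogue joins, or None
    if the URL is not a product detail page."""
    m = PDP_RE.search(url)
    if not m:
        return None
    return (m.group("country"), m.group("sku"))
```

Storing the pair rather than the bare SKU keeps UAE, Saudi, and Egyptian listings for the same product distinct in your catalogue.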

Noon’s product API lives at https://www.noon.com/_svc/catalog/api/v3/u/<sku>/p. The endpoint accepts an X-Locale header that controls language and country. Set it to en-ae for UAE English, ar-ae for UAE Arabic, en-sa for Saudi English, and so on. The JSON shape is consistent across locales.

import httpx
import asyncio

NOON_API = "https://www.noon.com/_svc/catalog/api/v3/u"

# X-Locale uses "<lang>-<country ISO>" (e.g. "en-ae"), but the URL path
# uses the longer country segment (e.g. "uae-en"), so the Referer needs
# a mapping rather than a string swap.
COUNTRY_SEGMENT = {"ae": "uae", "sa": "saudi", "eg": "egypt"}

def headers(locale: str):
    lang, country = locale.split("-")
    return {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
        "Accept": "application/json",
        "X-Locale": locale,
        "X-Platform": "web",
        "Referer": f"https://www.noon.com/{COUNTRY_SEGMENT[country]}-{lang}/",
    }

async def fetch_product(sku: str, locale: str, proxy: str):
    url = f"{NOON_API}/{sku}/p"
    async with httpx.AsyncClient(proxy=proxy, headers=headers(locale), timeout=20) as c:
        r = await c.get(url)
        if r.status_code == 200:
            return r.json()
        return None

The response includes the canonical product object, an offers array with merchant pricing, an attributes array with structured specs, and a crossSellRecommendations block that you can use to discover related SKUs without crawling categories.
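Assuming the crossSellRecommendations entries carry a sku field (an assumption to confirm against real responses), those links can seed catalogue discovery without category crawling. A minimal frontier sketch, with the fetcher injected so it can wrap fetch_product above:

```python
import asyncio

async def discover_skus(seed_skus, fetch, limit=500):
    """Frontier-based SKU discovery via cross-sell links.

    `fetch` is any coroutine mapping sku -> product JSON (e.g. a partial
    application of fetch_product above). The shape of the recommendation
    items is an assumption; adjust the extraction to real responses.
    """
    seen, frontier = set(seed_skus), list(seed_skus)
    while frontier and len(seen) < limit:
        data = await fetch(frontier.pop())
        if not data:
            continue
        for rec in data.get("crossSellRecommendations", []):
            rec_sku = rec.get("sku")
            if rec_sku and rec_sku not in seen:
                seen.add(rec_sku)
                frontier.append(rec_sku)
    return seen
```

The `limit` cap keeps a runaway recommendation graph from turning one seed SKU into an unbounded crawl.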

Why UAE residential proxies matter for Noon

Noon’s bot detection is built on Cloudflare Bot Management plus an internal scoring service that considers the visitor IP geography, the language requested, and the device fingerprint. A request from an AWS Frankfurt IP asking for X-Locale: ar-ae is immediately suspicious because real UAE Arabic-language traffic almost never originates from European data centers.

The cleanest signal you can give Noon is a UAE residential IP requesting the locale that matches its country. UAE residential pools are smaller than Turkish or Vietnamese pools because the population is smaller, but several major proxy providers list inventory specifically in UAE through Etisalat and du. Expect to pay a premium versus general residential pools.

For workloads under 5,000 SKUs per day, you can sometimes get away with rotating GCC residential IPs (any Saudi or Kuwait IP usually works as long as the locale matches). For higher volumes, dedicated UAE inventory matters because the pool size shrinks and Noon’s behavioral scoring catches the pattern of foreign IPs rapid-fire requesting Arabic content.

Pulling category and search data

Noon exposes a search API at https://www.noon.com/_svc/search/api/v3/u/search. The endpoint takes a query string, a country segment, sort, and pagination params. For category sweeps, you pass the category slug as a filter facet rather than a path param.

async def search_noon(query: str, locale: str, page: int, proxy: str):
    url = "https://www.noon.com/_svc/search/api/v3/u/search"
    params = {
        "q": query,
        "page": page,
        "limit": 50,
        "sort": "popularity",
    }
    async with httpx.AsyncClient(proxy=proxy, headers=headers(locale), timeout=20) as c:
        r = await c.get(url, params=params)
        if r.status_code == 200:
            return r.json().get("hits", [])
        return []

The search API caps result depth at roughly 1,000 hits per query, which is the typical CDN safeguard against scraping. To go deeper into a category, decompose the query by brand, price band, or attribute facets. The search response includes the available facets and counts, which gives you a recipe for subdivision.
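One way to turn those facet counts into a subdivision plan is to split a price range recursively until every band fits under the cap. A sketch with the count lookup injected (in production it would be a search call reading the facet counts from the response):

```python
def plan_price_bands(count_fn, lo, hi, cap=1000):
    """Recursively split the [lo, hi) price range until each band's hit
    count fits under the result-depth cap.

    `count_fn(lo, hi)` stands in for a search request returning the hit
    count for that band; any callable works here.
    """
    total = count_fn(lo, hi)
    if total <= cap or hi - lo <= 1:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (plan_price_bands(count_fn, lo, mid, cap)
            + plan_price_bands(count_fn, mid, hi, cap))
```

The same recursion works for any orderable facet; brand and attribute facets are better handled by enumerating the facet values the response already lists.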

Handling Arabic text and RTL parsing

If your downstream pipeline is going to treat product titles as English-only, you will lose half of Noon’s catalogue. Many sellers list products in both languages, but a meaningful long tail of grocery, beauty, and fashion is Arabic-only or has more detailed Arabic descriptions than English ones. Capture both languages from day one.

The simplest pattern is to make two requests per SKU, one with ar-ae and one with en-ae, and merge the results. The title, description, attributes, and seller.name fields differ between the two responses. Everything numeric (price, stock, rating) is identical. For storage, use a JSONB or JSON column that holds both language variants and let downstream consumers pick whichever they need.
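A minimal sketch of that merge, assuming flat title/description fields for brevity (extend LOCALIZED_FIELDS to cover attributes and seller.name once you see the real response shape):

```python
# Fields that differ between the ar-ae and en-ae responses.
LOCALIZED_FIELDS = ("title", "description")

def merge_locales(en: dict, ar: dict) -> dict:
    """Fold an en-ae and an ar-ae product response into one record.

    Numeric fields are taken from the English response (the two match);
    localized fields are kept side by side with _en/_ar suffixes.
    """
    merged = {k: v for k, v in en.items() if k not in LOCALIZED_FIELDS}
    for field in LOCALIZED_FIELDS:
        merged[f"{field}_en"] = en.get(field)
        merged[f"{field}_ar"] = ar.get(field)
    return merged
```

The suffixed record maps directly onto a JSONB column or onto paired title_en/title_ar columns like the snapshot schema later in this guide.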

For full-text search across the dataset, Postgres with the arabic text search configuration works well, as does Elasticsearch with the Arabic analyzer. Avoid stripping diacritics during ingestion because they carry semantic meaning in Arabic product names, especially for branded items.
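A small guard against accidental diacritic loss during ingestion can pay for itself; this sketch normalizes to NFC (which does not strip combining marks) and flags whether harakat survived. The diacritic set covers the common marks and is illustrative, not exhaustive:

```python
import unicodedata

# Common Arabic harakat (fathatan through sukun).
ARABIC_DIACRITICS = set("\u064b\u064c\u064d\u064e\u064f\u0650\u0651\u0652")

def normalize_title(title: str) -> str:
    # NFC collapses decomposed forms without removing the
    # combining marks themselves.
    return unicodedata.normalize("NFC", title.strip())

def has_diacritics(title: str) -> bool:
    return any(ch in ARABIC_DIACRITICS for ch in title)
```

Running has_diacritics on a sample of ingested titles is a cheap regression check that an upstream "cleaning" step has not silently stripped them.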

Comparing Noon’s three country marketplaces

| Country | Currency | Catalogue size estimate | Bot scrutiny | Typical proxy cost |
|---|---|---|---|---|
| UAE (uae) | AED | 8M+ SKUs | High | $$$ |
| Saudi (sa) | SAR | 12M+ SKUs | Highest | $$$$ |
| Egypt (eg) | EGP | 4M+ SKUs | Medium | $$ |

Saudi gets the most attention from Noon’s bot defenses because it generates the largest GMV. Egypt has the lightest defenses but the lowest catalogue depth. If you are building a Gulf-wide price intelligence product, plan for separate proxy budgets per country and don’t try to scrape all three from the same IP pool.

Rate limits and retry patterns

Noon’s rate limit thresholds are not published, but observed behavior is consistent: a single residential IP can sustain about 1 product detail request per second for 30-60 minutes before triggering a soft block that returns 403 with a Cloudflare challenge page. After a 5-15 minute cooldown the IP is usable again. Sticky-session residential proxies handle this gracefully with rotation on 403.

async def safe_fetch(sku: str, locale: str, proxy_pool, max_retries: int = 3):
    for attempt in range(max_retries):
        proxy = proxy_pool.next()
        try:
            data = await fetch_product(sku, locale, proxy)
            if data:
                return data
        except httpx.HTTPError:
            pass
        await asyncio.sleep(2 ** attempt + 5)
    return None

For large category sweeps, distribute work across IPs so no single IP exceeds the throughput threshold. A scheduling layer that tracks per-IP request counts and back-off windows pays for itself quickly.
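A minimal sketch of such a scheduling layer, tracking per-IP request counts in a sliding window plus explicit back-off windows. The per-window cap and window length are placeholders to tune against the observed limits above:

```python
import time

class IPScheduler:
    """Hands out the first IP that is neither cooling nor over its
    sliding-window request budget."""

    def __init__(self, ips, max_per_window=50, window=60.0):
        self.ips = list(ips)
        self.max_per_window = max_per_window
        self.window = window
        self.history = {ip: [] for ip in self.ips}        # request timestamps
        self.cooling_until = {ip: 0.0 for ip in self.ips}  # back-off deadlines

    def acquire(self, now=None):
        now = time.time() if now is None else now
        for ip in self.ips:
            # Drop timestamps that fell out of the sliding window.
            hist = [t for t in self.history[ip] if t > now - self.window]
            self.history[ip] = hist
            if now >= self.cooling_until[ip] and len(hist) < self.max_per_window:
                hist.append(now)
                return ip
        return None  # every IP is saturated or cooling; caller sleeps and retries

    def cool(self, ip, seconds=900.0, now=None):
        """Mark an IP as cooling, e.g. after a 403/challenge."""
        now = time.time() if now is None else now
        self.cooling_until[ip] = now + seconds
```

Calling cool() on a 403 implements the rotation-on-soft-block pattern described above without any fixed rotation schedule.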

Storing Noon snapshots

Schema for a per-country product snapshot:

CREATE TABLE noon_snapshot (
    snapshot_at TIMESTAMP NOT NULL,
    country VARCHAR(8) NOT NULL,  -- URL segment: 'uae', 'sa', 'eg'
    sku VARCHAR(64) NOT NULL,
    seller_id VARCHAR(64),
    price_aed DECIMAL(12,2),
    sale_price_aed DECIMAL(12,2),
    in_stock BOOLEAN,
    rating DECIMAL(3,2),
    review_count INT,
    title_en TEXT,
    title_ar TEXT,
    PRIMARY KEY (snapshot_at, country, sku)
);

For Saudi and Egypt, swap the price column to local currency. If you are normalizing across countries, store both local currency and a derived AED-equivalent column updated daily from a central FX table. Don’t try to convert at scrape time because exchange-rate noise will pollute your trend lines.

Linking Noon scraping to broader GCC strategy

Noon is the largest single property in GCC ecommerce, but it is not the only one. Amazon UAE and amazon.sa are still major players, and category-specific sites like Sharaf DG for electronics matter for some verticals. Build your scraping stack with a multi-source mindset from the start, even if you are only launching with Noon. Our GCC ecommerce scraping overview collects related guides as we publish them.

For broader proxy strategy in MENA markets, see our residential proxy provider ranking, which now includes vendor-by-vendor UAE and Saudi inventory counts.

Detecting and routing around CAPTCHA challenges

When Noon flags your traffic, the response is usually a Cloudflare interrogation page rather than an HTTP 4xx. Your scraper needs to detect this content-type swap explicitly. Look for the signature cf-mitigated header, the presence of __cf_chl_ cookies, or HTML containing Just a moment.... Treat any of these as a soft block.

def is_challenged(response) -> bool:
    if response.status_code in (403, 503):
        return True
    if "cf-mitigated" in response.headers:
        return True
    if "__cf_chl_" in response.headers.get("set-cookie", ""):
        return True
    body = response.text[:2000].lower()
    return "just a moment" in body or "checking your browser" in body

When you detect a challenge, do not retry on the same IP for at least 30 minutes. Mark that IP as cooling and route subsequent requests to a different IP in your pool. Aggressive retries on a flagged IP cause the cooling window to extend and can lead to long-term blacklisting of your subnet.

For pages that absolutely must be fetched (a specific SKU your client cares about), have a fallback path that uses a headless browser with a real UAE residential IP. The browser path costs more per page but solves the small percentage of challenges that the API path cannot handle. Most production setups maintain a 95/5 split: 95% of requests go through the lightweight HTTP+JSON path, and 5% fall through to the browser path on challenge.
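The routing itself can stay tiny if both paths expose the same interface. A sketch with both fetchers injected, where a None return from the HTTP path stands in for a detected challenge (in production, is_challenged above would make that determination):

```python
import asyncio

async def fetch_with_fallback(sku, http_fetch, browser_fetch):
    """95/5 routing: try the cheap HTTP+JSON path first and fall
    through to the headless-browser path only on challenge.

    Both arguments are coroutines mapping sku -> data-or-None;
    `browser_fetch` is a placeholder for your Playwright/etc. routine.
    """
    data = await http_fetch(sku)
    if data is not None:
        return data, "http"
    return await browser_fetch(sku), "browser"
```

Returning the path label alongside the data makes it easy to monitor whether the browser share drifts above the expected few percent.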

Working with AED pricing and FX normalization

Pricing in Noon is denominated in AED, and any cross-market analysis requires careful FX normalization. The naive approach of converting at scrape time using a live FX feed introduces noise into your trend lines because exchange rate movements get conflated with real price changes. The correct pattern is to store the price in local AED and apply FX conversion at query time using a daily reference rate.

CREATE TABLE fx_rates (
    rate_date DATE NOT NULL,
    base_ccy VARCHAR(3) NOT NULL,
    quote_ccy VARCHAR(3) NOT NULL,
    rate DECIMAL(18,8) NOT NULL,
    PRIMARY KEY (rate_date, base_ccy, quote_ccy)
);

Source the daily rates from a reliable feed such as the European Central Bank reference rates or your bank’s wholesale feed. Avoid scraping retail FX rates because they include the bank’s spread and produce inconsistent comparisons. For analyses that span multiple years, also account for currency revaluation events that occasionally happen in emerging markets.
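Query-time conversion can also live in application code. A sketch mirroring the fx_rates key structure above; the rate value in the test is a made-up illustration, not a real quote:

```python
from datetime import date

def to_aed(amount: float, ccy: str, rate_date: date, rates: dict) -> float:
    """Convert a stored local-currency price to AED using the daily
    reference rate.

    `rates` maps (rate_date, base_ccy, quote_ccy) -> rate, exactly like
    the fx_rates primary key. Raises KeyError if the day's rate is
    missing, which is preferable to silently falling back to a stale one.
    """
    if ccy == "AED":
        return amount
    return round(amount * rates[(rate_date, ccy, "AED")], 2)
```

Keeping the lookup keyed by date means re-running an analysis for a past window reproduces the same AED figures regardless of when the query runs.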

Comparing Noon to other regional marketplaces

| Marketplace | Country focus | Catalogue scale | Bot strictness |
|---|---|---|---|
| Noon | UAE | Large | High |
| Amazon UAE | Adjacent markets | Medium | Medium |
| Sharaf DG | Adjacent markets | Smaller | Lower |

Cross-marketplace analyses help separate platform-specific dynamics from genuine market trends. If a price drops on Noon but stays flat across the comparable competitors, that is a platform-driven event rather than a market-wide signal. Your scraping pipeline should ingest from at least three platforms in any market where you intend to publish category insights.

Operational monitoring and alerting

Every production scraper needs three monitoring layers regardless of target. The first is per-IP success rate over a 5-minute window, alerting if any IP drops below 80%. The second is parser error rate, alerting if more than 1% of fetched pages fail to extract the canonical fields. The third is data freshness, alerting if your downstream consumers see snapshots more than 24 hours old.

import time
from collections import deque

class IPHealthTracker:
    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.events = {}

    def record(self, ip: str, success: bool):
        bucket = self.events.setdefault(ip, deque())
        now = time.time()
        bucket.append((now, success))
        while bucket and bucket[0][0] < now - self.window:
            bucket.popleft()

    def success_rate(self, ip: str) -> float:
        bucket = self.events.get(ip)
        if not bucket:
            return 1.0
        successes = sum(1 for _, ok in bucket if ok)
        return successes / len(bucket)

Wire this into Prometheus or your existing observability stack so the on-call engineer sees IP degradation as it happens rather than after the daily snapshot fails. For long-running operations against Noon, IP rotation triggered by the health tracker is more reliable than fixed rotation schedules.

Legal and compliance considerations for UAE

Public product, price, and availability data are generally treated as fair to scrape in most jurisdictions, but the UAE has its own consumer protection and personal data frameworks that overlay any general analysis. Confine your collection to non-personal data: SKU identifiers, prices, descriptions, ratings as aggregates, and seller display names. Avoid collecting individual buyer reviews with names, phone numbers, or email addresses attached, and avoid pulling any data behind a login.

For commercial deployment of a scraper that targets Noon, document your basis for processing, your data retention period, and your purpose limitation. Most data protection regimes treat scraped public data more favorably when there is a clear lawful basis and the data is not used for direct marketing to identified individuals. The W3C Web Annotation guidance and similar published frameworks remain useful starting points for documenting your approach.

Pipeline orchestration and scheduling

For any non-trivial scraping operation, a dedicated orchestration layer is the difference between a script you babysit and a service that runs unattended. The two strong open-source choices in 2026 are Prefect 3 and Dagster. Both handle the patterns you need: DAG dependencies, retries, observability, secret management, and dynamic fan-out across IPs and categories.

from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def fetch_category(category_id: int, page: int):
    return crawl_one_page(category_id, page)

@task
def store_pages(pages: list):
    write_to_db(pages)

@flow(name="Noon-daily-sweep")
def daily_sweep(category_ids: list):
    futures = []
    for cid in category_ids:
        for page in range(1, 50):
            futures.append(fetch_category.submit(cid, page))
    pages = [f.result() for f in futures]
    store_pages(pages)

Run the flow on a 6-hour or 24-hour schedule depending on how dynamic the underlying catalogue is. For seasonal markets like apparel where pricing changes daily, a 6-hour cadence catches the meaningful movements without driving up proxy costs unnecessarily. For long-tail categories like books or industrial supplies, daily is sufficient and the cost saving is meaningful.

Sample analytics queries on the collected dataset

Once your snapshots are landing reliably, the analytics layer is where the value materializes. A few queries that consistently come up across Noon datasets:

-- Top 50 SKUs by price drop in the last 7 days.
-- MIN - MAX is negative for drops, so ascending order lists the
-- largest drops first.
SELECT sku, MIN(price_aed) - MAX(price_aed) AS price_drop
FROM noon_snapshot
WHERE snapshot_at > now() - interval '7 days'
GROUP BY sku
ORDER BY price_drop ASC
LIMIT 50;

-- Stock-out frequency per category
-- (assumes a category_id column captured alongside the schema above)
SELECT category_id,
       SUM(CASE WHEN NOT in_stock THEN 1 ELSE 0 END)::float / COUNT(*) AS oos_rate
FROM noon_snapshot
WHERE snapshot_at > now() - interval '30 days'
GROUP BY category_id
ORDER BY oos_rate DESC;

-- New SKUs first seen in the last 14 days
SELECT sku, MIN(snapshot_at) AS first_seen
FROM noon_snapshot
GROUP BY sku
HAVING MIN(snapshot_at) > now() - interval '14 days'
ORDER BY first_seen DESC;

These three queries alone power most of the dashboards a category manager wants. Add a brand share view, a seller concentration view, and a campaign-frequency view and you have a competitive intelligence product. The collection layer is the prerequisite; the analytics layer is where you create defensible value.

Common pitfalls when scraping Noon

Three failure patterns recur across Noon scrapers. The first is country code confusion. Noon runs separate storefronts for UAE, Saudi Arabia, and Egypt, each with its own pricing, currency, and stock. The country selector is set via cookie and URL prefix (/uae-en, /saudi-en, /egypt-en). A scraper that drops the country prefix gets redirected by IP geolocation, which corrupts the dataset when proxies move between PoPs. Always pin the country in both the URL and the X-Locale header.
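A small helper that pins both signals at once; the segment map reflects the URL prefixes listed above, and the function name is our own:

```python
# Locale country code -> URL path segment, per the prefixes above.
SEGMENTS = {"ae": "uae", "sa": "saudi", "eg": "egypt"}

def pinned_request(path: str, country: str, lang: str = "en"):
    """Build a URL and header set with the country pinned in both the
    URL prefix and the X-Locale header, so IP-geolocation redirects
    cannot silently switch storefronts."""
    segment = f"{SEGMENTS[country]}-{lang}"
    url = f"https://www.noon.com/{segment}/{path.lstrip('/')}"
    headers = {"X-Locale": f"{lang}-{country}"}
    return url, headers
```

Routing every request through one such helper makes the country-pinning invariant impossible to forget in individual call sites.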

The second is double-currency rendering. Saudi pages show prices in SAR but include a USD conversion in the markup for some SKUs. Naive selectors pick whichever appears first in the DOM, which flips between SAR and USD across page versions. Read the price from the structured JSON-LD block, not from visible HTML.
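A sketch of that extraction, locating script tags by regex and reading price and currency from the JSON-LD Product block. The offers field shape follows schema.org conventions and should be verified against real pages:

```python
import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL,
)

def price_from_jsonld(html: str):
    """Return (price, currency) from the page's JSON-LD Product block,
    or None if no parseable block is present."""
    for m in JSONLD_RE.finditer(html):
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            continue
        if data.get("@type") == "Product":
            offer = data.get("offers") or {}
            if isinstance(offer, list):
                offer = offer[0]
            return float(offer["price"]), offer.get("priceCurrency")
    return None
```

Because the currency comes from the same structured object as the price, the SAR/USD first-match ambiguity in the visible DOM never enters the dataset.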

The third is mall-vs-marketplace confusion. Noon Mall (curated brand storefronts) and Noon Marketplace (third-party sellers) coexist on the same product page. The seller_type and is_noon_mall flags distinguish them. Analytics that treat all listings as one population miss the fact that Mall pricing is more stable while Marketplace pricing reprices weekly. Segment your dataset by seller type before computing trend metrics.

FAQ

Is scraping Noon legal in the UAE?
Public product data is generally considered fair to scrape in most jurisdictions, but UAE law has a strong personal data and consumer protection framework. Restrict your scraping to product, price, and seller data. Avoid pulling buyer reviews with personally identifying details, avoid scraping logged-in pages, and respect any explicit terms of service. For commercial use, consult legal counsel familiar with UAE Federal Decree-Law No. 45 of 2021.

Why do prices differ between Noon UAE and Noon Saudi for the same product?
Each country marketplace is operated as a separate entity with separate seller relationships, separate logistics, and separate pricing decisions. The same SKU can be listed by completely different sellers in the two markets at different prices. Treat each country as an independent dataset for analytical purposes.

Does Noon block VPN traffic?
Noon does not specifically block VPN traffic. It blocks any IP that fingerprints as a data center or hosting provider, which is what most consumer VPN exits look like. Residential and mobile IPs from genuine consumer ISPs in the GCC are the only reliable way to scrape at volume.

How does Noon handle the Friday-Saturday weekend?
Noon catalogue updates and pricing changes continue through the GCC weekend, often with promotional bursts on Friday afternoons. If you are tracking pricing dynamics, your snapshot cadence should not skip weekends.

Can I use Noon’s affiliate API instead of scraping?
Noon runs an affiliate program through Tradedoubler and other networks. The affiliate APIs provide product data for approved affiliates but the catalogue coverage and refresh frequency are limited compared to scraping the public site. For competitive intelligence use cases, scraping remains the higher-fidelity option.

Does Noon block residential proxies from non-GCC countries?
Yes, with increasing severity. Casual product lookups from EU or US residential IPs usually succeed, but sustained scraping is throttled within 24 hours. UAE, Saudi, or Egyptian residential or mobile IPs are required for production volume.

How does Noon handle price changes during White Friday?
Prices update in waves every 1-3 hours during the campaign. Pre-campaign baselines should be captured at least 7 days before the event to expose true discount depth.

If you are scoping a scraping infrastructure for this market, browse the ecommerce scraping category for tooling reviews, proxy comparisons, and framework deep dives that pair with the patterns above.
