How to scrape Trendyol Turkey in 2026

Scrape Trendyol Turkey at scale and you immediately discover that the marketplace behaves differently from Western counterparts. Trendyol is the dominant ecommerce platform in Turkey, majority-owned by Alibaba Group since 2018, and serves more than 30 million active buyers across categories from fashion to electronics to grocery delivery via Trendyol Go. The site enforces Turkey-specific pricing, lira denomination, KDV (VAT) inclusive display, and a recommendation engine that geo-personalizes feeds based on the visitor IP. If you fetch Trendyol from a US data center IP, you get a stripped-down catalogue with no inventory data and frequent CAPTCHA challenges. If you fetch from a Turkish residential or mobile IP, you get the same payload a real Istanbul shopper sees.

This guide walks through everything you need to scrape Trendyol Turkey product, seller, and price data reliably in 2026. The patterns apply whether you are running price intelligence for a retail brand, building a competitive monitor for a Turkish seller, or feeding a category-level dataset into a machine learning pipeline.

Why Trendyol needs Turkey-resident proxies

Trendyol uses a CDN configuration that classifies the visitor IP into one of three buckets before serving content: domestic Turkish residential, domestic mobile, or international. International visitors get a slow path with aggressive rate limiting, frequent Cloudflare interstitials, and a noticeable degradation in the JSON payloads exposed to the browser. The most obvious symptom is missing seller information and missing stock counts when you scrape from the wrong country.

Use a Turkish residential or mobile proxy and the JSON endpoints behind the product card return full payloads that include the merchant ID, fulfillment warehouse, regional inventory, and KDV-inclusive prices. The cost difference between a Turkish residential pool and a US data center pool is real, but the data quality difference is larger. For most operations the math works out in favor of paying for clean Turkish IPs.

Mapping the Trendyol URL and JSON structure

Trendyol product URLs follow a predictable pattern that includes the brand slug, product slug, and a numeric product content ID. A typical URL looks like https://www.trendyol.com/<brand>/<product-slug>-p-<contentId>. The contentId is the stable identifier you want to capture in your database because the slug portion changes when sellers rename products.

Behind the scenes, Trendyol product pages hydrate from a JSON endpoint at https://public.trendyol.com/discovery-web-productgw-service/api/productDetail/<contentId>. This endpoint returns price, seller list, variants, ratings, and stock per variant in a single response. Hitting this endpoint directly is dramatically faster than parsing the HTML, and it is much less brittle to layout changes.

import asyncio
from typing import Optional

import httpx

TRENDYOL_API = "https://public.trendyol.com/discovery-web-productgw-service/api/productDetail"

HEADERS = {
    # Present a realistic Turkish browser profile: desktop Chrome UA plus
    # a tr-TR Accept-Language, which the personalization layer expects.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Accept": "application/json",
    "Accept-Language": "tr-TR,tr;q=0.9,en;q=0.8",
    "Referer": "https://www.trendyol.com/",
}

async def fetch_product(content_id: int, proxy: str) -> Optional[dict]:
    url = f"{TRENDYOL_API}/{content_id}"
    async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as client:
        r = await client.get(url)
        if r.status_code == 200:
            return r.json()
        if r.status_code == 429:
            # Rate limited: back off before the caller reuses this proxy.
            await asyncio.sleep(30)
        return None

The JSON returns a result object with the canonical product description, a variants array (size, color, stock, sellerId), and a merchantListings array containing every seller offering that product, the price, the cargo cost, and the campaign discount. For competitive intelligence on a single SKU across multiple sellers, this single endpoint is everything you need.
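A flattener over that response might look like the sketch below. The exact key names (id, merchantId, cargoCost, campaignDiscount) are assumptions inferred from the fields described above, so verify them against a live payload before relying on this in production:

```python
def flatten_seller_offers(payload: dict) -> list[dict]:
    """Emit one row per seller offer from a productDetail payload.

    Key names are assumed from the documented response shape; check them
    against a real response before deploying.
    """
    result = payload.get("result", {})
    rows = []
    for offer in result.get("merchantListings", []):
        rows.append({
            "content_id": result.get("id"),
            "seller_id": offer.get("merchantId"),
            "price": offer.get("price"),
            "cargo_cost": offer.get("cargoCost"),
            "campaign_discount": offer.get("campaignDiscount"),
        })
    return rows
```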

Building a category crawler that respects pagination

For category sweeps, Trendyol exposes a separate search and listing endpoint at https://public.trendyol.com/discovery-web-searchgw-service/v2/api/infinite-scroll. This endpoint accepts a category code, page number, sort option, and filter facets. The infinite-scroll naming reflects the front-end pattern, but the API is plain paginated JSON.

import asyncio

import httpx

LISTING_API = "https://public.trendyol.com/discovery-web-searchgw-service/v2/api/infinite-scroll"

async def fetch_category_page(category_id: int, page: int, proxy: str):
    # Reuses the HEADERS dict defined for the product detail fetcher above.
    params = {
        "wc": category_id,     # category code
        "pi": page,            # page index, 1-based
        "culture": "tr-TR",
        "sst": "BEST_SELLER",  # sort order: best sellers first
        "userGenderId": "",
    }
    async with httpx.AsyncClient(proxy=proxy, headers=HEADERS, timeout=20) as client:
        r = await client.get(LISTING_API, params=params)
        if r.status_code != 200:
            return []
        data = r.json()
        return data.get("result", {}).get("products", [])

async def crawl_category(category_id: int, proxies: list[str], max_pages: int = 50):
    all_rows = []
    for page in range(1, max_pages + 1):
        proxy = proxies[page % len(proxies)]
        rows = await fetch_category_page(category_id, page, proxy)
        if not rows:
            break
        all_rows.extend(rows)
        await asyncio.sleep(2)
    return all_rows

Trendyol caps a single category sweep at roughly 200 pages of 24 products each. For very broad categories you need to subdivide by facet (price band, brand, color) to recover the long tail. The aggregations field in the response tells you which facets are available and the count of products behind each facet.
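One way to plan that subdivision is to read the aggregations block and split facet values into those that fit under the page cap and those that need a second split (for example, facet crossed with price band). The key names used here (group, values, key, count) are assumptions about the response shape, not confirmed fields:

```python
def plan_facet_sweeps(aggregations: list[dict], page_cap: int = 200 * 24):
    """Split a broad category into facet-filtered sweeps.

    Facet values whose product count fits under the ~200-page cap can be
    swept directly; larger ones need further subdivision.
    """
    direct, needs_split = [], []
    for facet in aggregations:
        for value in facet.get("values", []):
            target = {
                "facet": facet.get("group"),
                "value": value.get("key"),
                "count": value.get("count", 0),
            }
            (direct if target["count"] <= page_cap else needs_split).append(target)
    return direct, needs_split
```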

Handling pricing, KDV, and campaign discounts

Trendyol pricing is messy in the way Turkish ecommerce is messy. Every price is presented as KDV-inclusive (VAT-included), but campaign discounts, basket discounts, and seller-level promo codes mean the headline price almost never matches what the buyer actually pays at checkout. If you are building a competitive intelligence dashboard, decide upfront which price you mean by price.

The product detail JSON exposes four useful fields:

  • originalPrice: sticker price before any discount, KDV inclusive
  • sellingPrice: current display price after the seller discount
  • discountedPrice: price after the Trendyol campaign overlay
  • basketPrice: price visible to the buyer when added to the basket (sometimes lower)

For most monitoring use cases, log all four every time. Models that try to compare to competitor sites need discountedPrice because that is the visible price on the listing card. Brand teams enforcing MAP (minimum advertised price) policies need originalPrice and sellingPrice because those are the prices the seller is publishing.

Proxy strategy for Trendyol at scale

Trendyol’s bot detection is layered. The first layer is Cloudflare bot management, which fingerprints TLS, HTTP/2 frames, and header order. The second layer is application-level behavioral analysis that watches for unrealistic page navigation patterns. The third is IP reputation scoring against a Turkish baseline.

For sub-10,000 product per day workloads, a small Turkish residential pool with rotating IPs per request is enough. For 100,000+ products per day, the math shifts toward sticky sessions on mobile IPs. The mobile IP costs more per port, but a single mobile IP can usually sustain a request rate of 5-10 product detail calls per second for hours without being flagged, and the per-product cost works out lower at high volume.

Reasonable starting allocation:

  • 1 mobile port on Türk Telekom or Turkcell: handles 50,000 product details per day
  • Backup of 50 rotating residential IPs in Turkey: handles category sweeps and seller crawls
  • Single fallback datacenter pool in Frankfurt: useful only for non-personalized public endpoints like sitemap discovery
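That allocation can be encoded as a small router keyed by task type. The pool contents below are placeholders; the routing mirrors the split above, with mobile for product details, residential for sweeps, and datacenter only for non-personalized endpoints:

```python
from itertools import cycle

class ProxyRouter:
    """Round-robin within the pool appropriate to each task type."""

    def __init__(self, mobile: list[str], residential: list[str], datacenter: list[str]):
        self.pools = {
            "product_detail": cycle(mobile),       # sticky mobile ports
            "category_sweep": cycle(residential),  # rotating Turkish residential
            "sitemap": cycle(datacenter),          # non-personalized endpoints only
        }

    def pick(self, task_type: str) -> str:
        return next(self.pools[task_type])
```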

For a deeper look at how different proxy categories behave under ecommerce scraping loads, see our residential vs mobile proxy comparison for ecommerce and our proxy provider ranking for 2026.

Avoiding common Trendyol scraping mistakes

The first mistake is treating the productCode in the URL as the product identifier. Trendyol uses three different identifiers in different parts of the system: the SEO slug, the contentId in the URL, and the merchant SKU inside the JSON. Always store the contentId as your primary key. The slug changes, the merchant SKU changes per seller, only the contentId is stable across the catalogue.

The second mistake is parsing the rendered HTML rather than calling the JSON endpoints. Trendyol re-renders the product card layout regularly. The HTML breaks every few months and your scraper needs maintenance. The JSON endpoints are the contract used by Trendyol’s own front end, and they are far more stable.

The third mistake is ignoring the seller dimension. A product page can list 30+ sellers offering the same SKU at different prices and shipping conditions. If you only capture the buy-box winner, you miss the entire competitive landscape on the listing. The merchantListings array is the source of truth for seller-level price intelligence.

Storing Trendyol data for analytics

For most workloads, a wide table per product snapshot works well in DuckDB or PostgreSQL. The schema should track the four price fields above, plus seller, stock per variant, ratings, review count, and the campaign banner if any. Take snapshots at a frequency aligned to your decision cadence. For dynamic-pricing competitors, every 4-6 hours captures meaningful change. For weekly category reports, a daily snapshot is enough.

CREATE TABLE trendyol_product_snapshot (
    snapshot_at TIMESTAMP NOT NULL,
    content_id BIGINT NOT NULL,
    seller_id BIGINT NOT NULL,
    original_price DECIMAL(12,2),
    selling_price DECIMAL(12,2),
    discounted_price DECIMAL(12,2),
    basket_price DECIMAL(12,2),
    in_stock INT,
    rating DECIMAL(3,2),
    review_count INT,
    campaign_text TEXT,
    PRIMARY KEY (snapshot_at, content_id, seller_id)
);
CREATE INDEX trendyol_content_idx ON trendyol_product_snapshot(content_id);

A 100k-product daily snapshot table will grow to roughly 30M rows per year. DuckDB handles that comfortably on a laptop. PostgreSQL handles it comfortably on a single node. Either way, partition by snapshot_at weekly or monthly to keep query plans tight.

Detecting and routing around CAPTCHA challenges

When Trendyol flags your traffic, the response is usually a Cloudflare interrogation page rather than an HTTP 4xx. Your scraper needs to detect this content-type swap explicitly. Look for the signature cf-mitigated header, the presence of __cf_chl_ cookies, or HTML containing Just a moment.... Treat any of these as a soft block.

def is_challenged(response) -> bool:
    if response.status_code in (403, 503):
        return True
    if "cf-mitigated" in response.headers:
        return True
    if "__cf_chl_" in response.headers.get("set-cookie", ""):
        return True
    body = response.text[:2000].lower()
    return "just a moment" in body or "checking your browser" in body

When you detect a challenge, do not retry on the same IP for at least 30 minutes. Mark that IP as cooling and route subsequent requests to a different IP in your pool. Aggressive retries on a flagged IP cause the cooling window to extend and can lead to long-term blacklisting of your subnet.

For pages that absolutely must be fetched (a specific SKU your client cares about), have a fallback path that uses a headless browser with a real Turkish residential IP. The browser path costs more per page but solves the small percentage of challenges that the API path cannot handle. Most production setups maintain a 95/5 split: 95% of requests go through the lightweight HTTP+JSON path, 5% fall through to the browser path on challenge.

Working with TRY pricing and FX normalization

Pricing in Trendyol is denominated in TRY, and any cross-market analysis requires careful FX normalization. The naive approach of converting at scrape time using a live FX feed introduces noise into your trend lines because exchange rate movements get conflated with real price changes. The correct pattern is to store the price in local TRY and apply FX conversion at query time using a daily reference rate.

CREATE TABLE fx_rates (
    rate_date DATE NOT NULL,
    base_ccy VARCHAR(3) NOT NULL,
    quote_ccy VARCHAR(3) NOT NULL,
    rate DECIMAL(18,8) NOT NULL,
    PRIMARY KEY (rate_date, base_ccy, quote_ccy)
);

Source the daily rates from a reliable feed such as the European Central Bank reference rates or your bank’s wholesale feed. Avoid scraping retail FX rates because they include the bank’s spread and produce inconsistent comparisons. For analyses that span multiple years, also account for currency revaluation events that occasionally happen in emerging markets.
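With the fx_rates table in place, conversion happens in the query rather than at scrape time. A sketch against the snapshot schema above, converting TRY to EUR by joining on the snapshot's calendar date:

```sql
-- Query-time conversion of TRY snapshots to EUR using the daily rate.
-- FX noise stays out of the stored price columns entirely.
SELECT s.content_id,
       s.seller_id,
       s.selling_price           AS price_try,
       s.selling_price * fx.rate AS price_eur,
       s.snapshot_at
FROM trendyol_product_snapshot s
JOIN fx_rates fx
  ON fx.rate_date = CAST(s.snapshot_at AS DATE)
 AND fx.base_ccy = 'TRY'
 AND fx.quote_ccy = 'EUR';
```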

Comparing Trendyol to other regional marketplaces

  • Trendyol: Turkey, large catalogue, high bot strictness
  • Hepsiburada: Turkey, medium catalogue, medium bot strictness
  • GittiGidiyor: Turkey, smaller catalogue, lower bot strictness

Cross-marketplace analyses help separate platform-specific dynamics from genuine market trends. If a price drops on Trendyol but stays flat across the comparable competitors, that is a platform-driven event rather than a market-wide signal. Your scraping pipeline should ingest from at least three platforms in any market where you intend to publish category insights.

Operational monitoring and alerting

Every production scraper needs three monitoring layers regardless of target. The first is per-IP success rate over a 5-minute window, alerting if any IP drops below 80%. The second is parser error rate, alerting if more than 1% of fetched pages fail to extract the canonical fields. The third is data freshness, alerting if your downstream consumers see snapshots more than 24 hours old.

import time
from collections import deque

class IPHealthTracker:
    def __init__(self, window_seconds: int = 300):
        self.window = window_seconds
        self.events = {}

    def record(self, ip: str, success: bool):
        bucket = self.events.setdefault(ip, deque())
        now = time.time()
        bucket.append((now, success))
        while bucket and bucket[0][0] < now - self.window:
            bucket.popleft()

    def success_rate(self, ip: str) -> float:
        bucket = self.events.get(ip)
        if not bucket:
            return 1.0
        successes = sum(1 for _, ok in bucket if ok)
        return successes / len(bucket)

Wire this into Prometheus or your existing observability stack so the on-call engineer sees IP degradation as it happens rather than after the daily snapshot fails. For long-running operations against Trendyol, IP rotation triggered by the health tracker is more reliable than fixed rotation schedules.

Legal and compliance considerations for Turkey

Public product, price, and availability data are generally treated as fair to scrape in most jurisdictions, but Turkey has its own consumer protection and personal data frameworks that overlay any general analysis. Confine your collection to non-personal data: SKU identifiers, prices, descriptions, ratings as aggregates, and seller display names. Avoid collecting individual buyer reviews with names, phone numbers, or email addresses attached, and avoid pulling any data behind a login.

For commercial deployment of a scraper that targets Trendyol, document your basis for processing, your data retention period, and your purpose limitation. Most data protection regimes treat scraped public data more favorably when there is a clear lawful basis and the data is not used for direct marketing to identified individuals. Published responsible-data-collection frameworks remain useful starting points for documenting your approach.

Pipeline orchestration and scheduling

For any non-trivial scraping operation, a dedicated orchestration layer is the difference between a script you babysit and a service that runs unattended. The two strong open-source choices in 2026 are Prefect 3 and Dagster. Both handle the patterns you need: DAG dependencies, retries, observability, secret management, and dynamic fan-out across IPs and categories.

from prefect import flow, task

@task(retries=3, retry_delay_seconds=60)
def fetch_category(category_id: int, page: int):
    # crawl_one_page is your single-page fetch, e.g. a sync wrapper
    # around fetch_category_page above.
    return crawl_one_page(category_id, page)

@task
def store_pages(pages: list):
    write_to_db(pages)  # your persistence layer

@flow(name="Trendyol-daily-sweep")
def daily_sweep(category_ids: list[int]):
    futures = []
    for cid in category_ids:
        for page in range(1, 50):
            futures.append(fetch_category.submit(cid, page))
    pages = [f.result() for f in futures]
    store_pages(pages)

Run the flow on a 6-hour or 24-hour schedule depending on how dynamic the underlying catalogue is. For seasonal markets like apparel where pricing changes daily, a 6-hour cadence catches the meaningful movements without driving up proxy costs unnecessarily. For long-tail categories like books or industrial supplies, daily is sufficient and the cost saving is meaningful.

Sample analytics queries on the collected dataset

Once your snapshots are landing reliably, the analytics layer is where the value materializes. A few queries that consistently come up across Trendyol datasets:

-- Top 50 products by price drop in the last 7 days
-- (price range within the window; filter for genuine drops downstream)
SELECT content_id, MAX(selling_price) - MIN(selling_price) AS price_drop
FROM trendyol_product_snapshot
WHERE snapshot_at > now() - interval '7 days'
GROUP BY content_id
ORDER BY price_drop DESC
LIMIT 50;

-- Stock-out frequency per category
-- (assumes a category_id column added to the snapshot table)
SELECT category_id,
       SUM(CASE WHEN in_stock = 0 THEN 1 ELSE 0 END)::float / COUNT(*) AS oos_rate
FROM trendyol_product_snapshot
WHERE snapshot_at > now() - interval '30 days'
GROUP BY category_id
ORDER BY oos_rate DESC;

-- New products first seen in the last 14 days
SELECT content_id, MIN(snapshot_at) AS first_seen
FROM trendyol_product_snapshot
GROUP BY content_id
HAVING MIN(snapshot_at) > now() - interval '14 days'
ORDER BY first_seen DESC;

These three queries alone power most of the dashboards a category manager wants. Add a brand share view, a seller concentration view, and a campaign-frequency view and you have a competitive intelligence product. The collection layer is the prerequisite; the analytics layer is where you create defensible value.

Common pitfalls when scraping Trendyol

Three failure modes account for most production incidents on Trendyol scrapers. The first is silent variant collapse. The product detail endpoint nests variant arrays inside allVariants and slicingAttributes. Naive flatteners pick the first variant and drop the rest, which means size and color price differences vanish from the dataset. Always iterate the full variant array and emit one row per content_id plus listing_id pair.

The second is timezone drift on price-change events. Trendyol’s backend timestamps are in Europe/Istanbul (UTC+3, no DST), but many cloud functions default to UTC. If you compare today’s price snapshot taken at 09:00 UTC against yesterday’s snapshot taken at 22:00 UTC you are comparing two windows separated by 11 hours, not 24. Pin the snapshot timestamp to the local Trendyol day and store both timestamps explicitly.

The third is campaign-price contamination. The originalPrice and sellingPrice fields carry the headline price, but campaigns like flashDiscount and crossDiscount apply at checkout and only appear inside the promotions array. A scraper that stores sellingPrice as the realized price will overstate revenue by 8-15% during major campaigns like Legendary Friday and Birthday Week. Compute the realized price by walking the promotions array and applying each rule in order.
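The promotion-walking step might look like the sketch below. The rule shape ({"percent": ...} / {"amount": ...}) is a hypothetical illustration, not the confirmed structure of the promotions array, so map it to the fields you observe in live payloads:

```python
def realized_price(selling_price: float, promotions: list[dict]) -> float:
    # Apply each promotion rule in order: percentage discounts compound,
    # flat amounts subtract. Rule keys here are hypothetical placeholders.
    price = selling_price
    for promo in promotions:
        if promo.get("percent"):
            price *= 1 - promo["percent"] / 100
        elif promo.get("amount"):
            price -= promo["amount"]
    return round(max(price, 0.0), 2)
```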

FAQ

Do I need to log in to scrape Trendyol product data?
No. Product details, category listings, and seller data are all available without authentication. Login is only required if you want to scrape order history, wallet balance, or personalized recommendations. For 99% of competitive intelligence and price monitoring use cases, anonymous scraping is sufficient.

Will the HTML scrape work without proxies if I rate limit aggressively?
You can pull a few hundred product pages per day from a single non-Turkish IP without immediate bans. Beyond that you hit either Cloudflare interstitials or a soft block where the JSON endpoints start returning 403. For any sustained operation, Turkish residential or mobile IPs are required.

How fresh is the price data on the public JSON endpoint?
The productDetail endpoint reflects current selling state with a CDN cache lifetime of about 60-180 seconds. For most monitoring workloads that is real-time enough. If you need true real-time pricing, the legacy productgw-service endpoint occasionally bypasses cache, but it is undocumented and can change without notice.

Does Trendyol expose stock counts or just availability?
The product detail JSON includes a stock integer for each variant. For top-selling SKUs, sellers often inflate stock counts to keep the buy box. For mid-tier and long-tail SKUs, the stock value is usually accurate to within 10-20%. Use it for trend signals rather than absolute inventory truth.

Can I scrape Trendyol Go (grocery) using the same approach?
Trendyol Go uses a different subdomain and a different API surface focused on hyper-local fulfillment. The proxy and rate-limit principles transfer, but the endpoints and JSON shape are different. Plan for separate code paths if your project covers both retail and grocery.

How do I detect when Trendyol rotates its anti-bot challenge variant?
Watch for a sudden jump in the share of responses returning HTML rather than JSON for the same endpoint. A rotation typically lifts the HTML share above 5% within an hour and stabilizes after 24-48 hours as your fingerprint pool adapts.

What is the right cadence for category-level snapshots vs SKU-level snapshots?
Category listings refresh every 6-12 hours for most analytical use cases. SKU-level price and stock snapshots run hourly for top 1000 SKUs and every 4-6 hours for the long tail.

If you are scoping a scraping infrastructure for this market, browse the ecommerce scraping category for tooling reviews, proxy comparisons, and framework deep dives that pair with the patterns above.
