How to Scrape eBay Listings for Competitive Intelligence

eBay remains one of the most important marketplaces globally, with over 1.9 billion live listings spanning categories from electronics to collectibles. Unlike purely fixed-price retail platforms, eBay’s hybrid auction and marketplace model creates uniquely valuable data — real-time supply and demand signals, true market pricing through completed auctions, and seller performance metrics that reveal competitive dynamics.

For e-commerce sellers, market researchers, and data analysts, eBay data provides competitive intelligence that is difficult to obtain from any other source. This guide walks through building a complete eBay scraper in Python using rotating proxies for reliable, large-scale data extraction.

Why eBay Data Is Valuable for Competitive Intelligence

eBay’s marketplace model generates data types that traditional retail sites do not expose:

  • Completed listings: Actual sale prices reveal true market values, not just asking prices.
  • Auction dynamics: Bid counts and bid histories show real-time demand intensity.
  • Seller metrics: Feedback scores, sell-through rates, and response times indicate market competitiveness.
  • Supply tracking: Listing volumes and durations show market saturation for specific products.
  • Price distribution: The range of prices for identical items reveals pricing power and buyer willingness to pay.

This data powers e-commerce strategies including product sourcing, pricing optimization, and inventory planning.

Why Proxies Are Essential for eBay Scraping

eBay’s anti-bot measures include:

  • IP-based rate limiting: eBay tracks request volume per IP and throttles or blocks excessive requestors.
  • Bot detection scripts: Client-side JavaScript verifies browser authenticity.
  • CAPTCHA challenges: Suspicious traffic patterns trigger verification.
  • Request header analysis: Missing or inconsistent headers flag automated tools.
  • Behavioral monitoring: eBay analyzes navigation patterns to distinguish bots from humans.

Residential and mobile proxies provide IP addresses that eBay treats as legitimate consumer traffic, dramatically reducing detection risk.
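In practice, each scraping session should come from a different IP. A minimal sketch of per-session rotation, assuming a hypothetical pool of provider-issued proxy URLs (substitute the hosts and credentials your proxy provider actually gives you):

```python
import random

# Hypothetical pool of rotating residential proxy endpoints — the hosts
# and credentials below are placeholders, not real endpoints.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

def pick_proxies(pool):
    """Choose one proxy at random and return the dict you would assign
    to requests.Session().proxies for that session."""
    proxy = random.choice(pool)
    return {"http": proxy, "https": proxy}
```

Rotating per session rather than per request keeps cookies consistent with a single IP, which matters for the session-legitimacy points covered later.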

Setting Up Your Environment

pip install requests beautifulsoup4 lxml pandas

Building the eBay Scraper

Step 1: Configure Session with Proxy

import requests
from bs4 import BeautifulSoup
import json
import time
import random
import re
from datetime import datetime

class EbayScraper:
    """Scrape eBay listings for competitive intelligence."""

    BASE_URL = "https://www.ebay.com"
    SEARCH_URL = "https://www.ebay.com/sch/i.html"

    def __init__(self, proxy_url):
        self.session = requests.Session()
        self.session.proxies = {
            "http": proxy_url,
            "https": proxy_url,
        }
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
        })

    def _fetch_page(self, url, params=None, max_retries=3):
        """Fetch a page with retry logic."""
        for attempt in range(max_retries):
            try:
                response = self.session.get(url, params=params, timeout=25)

                if response.status_code == 200:
                    if "captcha" in response.text.lower():
                        print(f"CAPTCHA detected, attempt {attempt + 1}")
                        time.sleep(random.uniform(10, 20))
                        continue
                    return response.text
                elif response.status_code == 429:
                    print("Rate limited, waiting...")
                    time.sleep(random.uniform(20, 40))
                elif response.status_code == 403:
                    print(f"Blocked (403), attempt {attempt + 1}")
                    time.sleep(random.uniform(10, 20))
                else:
                    print(f"Status {response.status_code}, attempt {attempt + 1}")
                    time.sleep(random.uniform(3, 8))

            except requests.exceptions.RequestException as e:
                print(f"Request error: {e}")
                time.sleep(random.uniform(5, 10))

        return None
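
The fixed sleep windows above are fine for small jobs; for longer runs, full-jitter exponential backoff (a common hardening technique, not something eBay mandates) spreads retries further apart as consecutive failures accumulate:

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Full-jitter exponential backoff: return a random delay in
    [0, min(cap, base * 2**attempt)] seconds for the given retry attempt."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

In `_fetch_page`, the `time.sleep(random.uniform(...))` calls could be swapped for `time.sleep(backoff_delay(attempt))` so that a third failure waits noticeably longer than a first.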

Step 2: Search Active Listings

    def search_listings(self, query, num_pages=3, condition=None,
                        min_price=None, max_price=None, sort="best_match"):
        """Search eBay for active listings."""
        all_listings = []

        sort_map = {
            "best_match": "12",
            "price_low": "15",
            "price_high": "16",
            "newly_listed": "10",
            "ending_soonest": "1",
        }

        for page in range(1, num_pages + 1):
            params = {
                "_nkw": query,
                "_pgn": page,
                "_sop": sort_map.get(sort, "12"),
            }

            # Add price filters
            if min_price:
                params["_udlo"] = min_price
            if max_price:
                params["_udhi"] = max_price

            # Add condition filter
            if condition == "new":
                params["LH_ItemCondition"] = "1000"
            elif condition == "used":
                params["LH_ItemCondition"] = "3000"
            elif condition == "refurbished":
                params["LH_ItemCondition"] = "2000|2500"

            print(f"Scraping page {page} for '{query}'...")
            html = self._fetch_page(self.SEARCH_URL, params=params)
            if not html:
                print(f"Failed to fetch page {page}")
                continue

            listings = self._parse_search_results(html)
            if not listings:
                print(f"No listings on page {page}")
                break

            all_listings.extend(listings)
            print(f"  Found {len(listings)} listings (total: {len(all_listings)})")
            time.sleep(random.uniform(3, 6))

        return all_listings

    def _parse_search_results(self, html):
        """Parse listing cards from eBay search results."""
        soup = BeautifulSoup(html, "lxml")
        listings = []

        # eBay search result items
        items = soup.select("li.s-item")

        for item in items:
            listing = {}

            # Title
            title_el = item.select_one("div.s-item__title span[role='heading']")
            if not title_el:
                title_el = item.select_one("h3.s-item__title")
            listing["title"] = title_el.get_text(strip=True) if title_el else None

            # Skip "Shop on eBay" promotional items
            if listing["title"] and "shop on ebay" in listing["title"].lower():
                continue

            # Price
            price_el = item.select_one("span.s-item__price")
            if price_el:
                price_text = price_el.get_text(strip=True)
                listing["price_text"] = price_text

                # Handle price ranges (e.g., "$10.00 to $20.00")
                prices = re.findall(r"\$([\d,]+\.?\d*)", price_text)
                if prices:
                    listing["price_low"] = float(prices[0].replace(",", ""))
                    listing["price_high"] = float(prices[-1].replace(",", "")) if len(prices) > 1 else listing["price_low"]

            # Listing URL
            link_el = item.select_one("a.s-item__link")
            if link_el:
                url = link_el.get("href", "")
                listing["url"] = url.split("?")[0]  # Remove tracking params

            # Item ID from URL
            if listing.get("url"):
                id_match = re.search(r"/itm/(\d+)", listing["url"])
                listing["item_id"] = id_match.group(1) if id_match else None

            # Shipping cost
            shipping_el = item.select_one("span.s-item__shipping")
            if shipping_el:
                ship_text = shipping_el.get_text(strip=True)
                listing["shipping_text"] = ship_text
                if "free" in ship_text.lower():
                    listing["shipping_cost"] = 0.0
                else:
                    ship_match = re.search(r"\$([\d,]+\.?\d*)", ship_text)
                    listing["shipping_cost"] = float(ship_match.group(1).replace(",", "")) if ship_match else None

            # Listing type (auction vs buy it now)
            bid_el = item.select_one("span.s-item__bids")
            if bid_el:
                listing["listing_type"] = "auction"
                bid_text = bid_el.get_text(strip=True)
                bid_match = re.search(r"(\d+)", bid_text)
                listing["bid_count"] = int(bid_match.group(1)) if bid_match else 0
            else:
                listing["listing_type"] = "buy_it_now"
                listing["bid_count"] = None

            # Time remaining (for auctions)
            time_el = item.select_one("span.s-item__time-left")
            listing["time_left"] = time_el.get_text(strip=True) if time_el else None

            # Condition
            condition_el = item.select_one("span.SECONDARY_INFO")
            listing["condition"] = condition_el.get_text(strip=True) if condition_el else None

            # Seller info
            seller_el = item.select_one("span.s-item__seller-info-text")
            listing["seller"] = seller_el.get_text(strip=True) if seller_el else None

            # Location
            location_el = item.select_one("span.s-item__location")
            listing["location"] = location_el.get_text(strip=True) if location_el else None

            # Image
            img_el = item.select_one("img.s-item__image-img")
            listing["image_url"] = img_el.get("src") if img_el else None

            listing["scraped_at"] = datetime.now().isoformat()

            if listing.get("title") and listing.get("item_id"):
                listings.append(listing)

        return listings
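
The price-range handling in `_parse_search_results` can be exercised in isolation — the same regex pulls one price from a fixed-price listing and two from a range like "$1,299.99 to $1,499.99":

```python
import re

# Same pattern used in _parse_search_results: dollar sign, digits/commas,
# optional decimal part.
PRICE_RE = re.compile(r"\$([\d,]+\.?\d*)")

def parse_price_range(text):
    """Return (price_low, price_high) floats, or (None, None) if no price."""
    prices = [float(p.replace(",", "")) for p in PRICE_RE.findall(text)]
    if not prices:
        return None, None
    return prices[0], prices[-1]
```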

Step 3: Scrape Completed/Sold Listings

Completed listings are the gold standard for market pricing research.

    def search_completed_listings(self, query, num_pages=3):
        """Search for completed (sold) eBay listings to get actual sale prices."""
        all_listings = []

        for page in range(1, num_pages + 1):
            params = {
                "_nkw": query,
                "_pgn": page,
                "LH_Complete": "1",  # Completed listings
                "LH_Sold": "1",     # Only sold items
                "_sop": "13",       # Sort by end date: recent first
            }

            print(f"Scraping completed listings page {page} for '{query}'...")
            html = self._fetch_page(self.SEARCH_URL, params=params)
            if not html:
                continue

            listings = self._parse_search_results(html)

            # Mark as completed/sold
            for listing in listings:
                listing["status"] = "sold"
                listing["data_type"] = "completed_listing"

            all_listings.extend(listings)
            print(f"  Found {len(listings)} sold listings")
            time.sleep(random.uniform(3, 6))

        return all_listings
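
A snapshot of active versus sold counts gives a rough sell-through signal. This is a heuristic of our own, not eBay’s official sell-through metric:

```python
def sell_through_rate(active_count, sold_count):
    """Rough sell-through: sold / (sold + active), as a percentage.
    Treats a single snapshot of listing counts as a proxy for true
    sell-through over the period eBay keeps completed listings."""
    total = active_count + sold_count
    if total == 0:
        return 0.0
    return round(100.0 * sold_count / total, 1)
```

A product with 180 sold listings against 120 active ones scores 60%, suggesting healthy demand; single-digit rates hint at a saturated market.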

Step 4: Extract Individual Listing Details

    def get_listing_details(self, item_url):
        """Fetch detailed information for a single eBay listing."""
        html = self._fetch_page(item_url)
        if not html:
            return None

        soup = BeautifulSoup(html, "lxml")
        details = {}

        # Title
        title_el = soup.select_one("h1.x-item-title__mainTitle span")
        details["title"] = title_el.get_text(strip=True) if title_el else None

        # Price
        price_el = soup.select_one("div.x-price-primary span.ux-textspans")
        if price_el:
            details["price_text"] = price_el.get_text(strip=True)
            match = re.search(r"\$([\d,]+\.?\d*)", details["price_text"])
            details["price"] = float(match.group(1).replace(",", "")) if match else None

        # Condition
        condition_el = soup.select_one("div.x-item-condition-text span.ux-textspans")
        details["condition"] = condition_el.get_text(strip=True) if condition_el else None

        # Seller information
        seller_el = soup.select_one("div.x-sellercard-atf__info span.ux-textspans--BOLD")
        details["seller_name"] = seller_el.get_text(strip=True) if seller_el else None

        # Seller feedback score
        feedback_el = soup.select_one("span.ux-seller-section__item--link span.ux-textspans")
        details["seller_feedback"] = feedback_el.get_text(strip=True) if feedback_el else None

        # Seller positive percentage
        positive_el = soup.select_one("div.x-sellercard-atf__data-item span.ux-textspans--SECONDARY")
        if positive_el:
            text = positive_el.get_text(strip=True)
            pct_match = re.search(r"([\d.]+)%", text)
            details["seller_positive_pct"] = pct_match.group(1) if pct_match else None

        # Item specifics
        details["item_specifics"] = {}
        spec_rows = soup.select("div.ux-layout-section-evo__row")
        for row in spec_rows:
            labels = row.select("div.ux-labels-values__labels span.ux-textspans")
            values = row.select("div.ux-labels-values__values span.ux-textspans")
            if labels and values:
                key = labels[0].get_text(strip=True)
                val = values[0].get_text(strip=True)
                if key and val:
                    details["item_specifics"][key] = val

        # Description (often in iframe, extract what's in main page)
        desc_el = soup.select_one("div.d-item-description")
        if desc_el:
            details["description_preview"] = desc_el.get_text(strip=True)[:500]

        # Watchers
        watcher_el = soup.select_one("span.d-watches__count")
        if watcher_el:
            watcher_text = watcher_el.get_text(strip=True)
            watcher_match = re.search(r"(\d+)", watcher_text)
            details["watchers"] = int(watcher_match.group(1)) if watcher_match else None

        # Quantity sold
        sold_el = soup.select_one("div.d-quantity__availability span.ux-textspans--BOLD")
        if sold_el:
            sold_text = sold_el.get_text(strip=True)
            sold_match = re.search(r"(\d+)", sold_text)
            details["quantity_sold"] = int(sold_match.group(1)) if sold_match else None

        # Available quantity
        qty_el = soup.select_one("select#qtySubTxt option:last-child")
        if qty_el:
            details["quantity_available"] = qty_el.get_text(strip=True)

        # Shipping
        shipping_el = soup.select_one("div.ux-labels-values--shipping span.ux-textspans--BOLD")
        details["shipping"] = shipping_el.get_text(strip=True) if shipping_el else None

        # Returns
        returns_el = soup.select_one("div.ux-labels-values--returns span.ux-textspans--BOLD")
        details["returns_policy"] = returns_el.get_text(strip=True) if returns_el else None

        details["scraped_at"] = datetime.now().isoformat()

        return details

Step 5: Competitive Analysis Functions

class EbayCompetitiveAnalyzer:
    """Analyze eBay data for competitive intelligence."""

    def __init__(self, scraper):
        self.scraper = scraper

    def analyze_market_pricing(self, query, include_sold=True):
        """Analyze active and sold listing prices for a product."""
        # Get active listings
        active = self.scraper.search_listings(query, num_pages=3)
        print(f"Active listings: {len(active)}")

        analysis = {
            "query": query,
            "analyzed_at": datetime.now().isoformat(),
            "active_listings": len(active),
        }

        # Active listing price analysis
        active_prices = [l["price_low"] for l in active if l.get("price_low")]
        if active_prices:
            analysis["active_prices"] = {
                "min": min(active_prices),
                "max": max(active_prices),
                "mean": round(sum(active_prices) / len(active_prices), 2),
                "median": sorted(active_prices)[len(active_prices) // 2],
                "count": len(active_prices),
            }

        # Buy It Now vs Auction breakdown
        bin_listings = [l for l in active if l.get("listing_type") == "buy_it_now"]
        auction_listings = [l for l in active if l.get("listing_type") == "auction"]
        analysis["buy_it_now_count"] = len(bin_listings)
        analysis["auction_count"] = len(auction_listings)

        # Completed/sold listing analysis
        if include_sold:
            time.sleep(random.uniform(5, 10))
            sold = self.scraper.search_completed_listings(query, num_pages=3)
            print(f"Sold listings: {len(sold)}")
            analysis["sold_listings"] = len(sold)

            sold_prices = [l["price_low"] for l in sold if l.get("price_low")]
            if sold_prices:
                analysis["sold_prices"] = {
                    "min": min(sold_prices),
                    "max": max(sold_prices),
                    "mean": round(sum(sold_prices) / len(sold_prices), 2),
                    "median": sorted(sold_prices)[len(sold_prices) // 2],
                    "count": len(sold_prices),
                }

                # Price gap analysis
                if active_prices:
                    analysis["price_gap"] = {
                        "avg_active_vs_sold": round(
                            analysis["active_prices"]["mean"] - analysis["sold_prices"]["mean"], 2
                        ),
                        "gap_pct": round(
                            ((analysis["active_prices"]["mean"] - analysis["sold_prices"]["mean"])
                             / analysis["sold_prices"]["mean"]) * 100, 2
                        ),
                    }

        return analysis, active, (sold if include_sold else [])

    def analyze_sellers(self, listings):
        """Analyze seller competition in search results."""
        seller_data = {}

        for listing in listings:
            seller = listing.get("seller", "Unknown")
            if seller not in seller_data:
                seller_data[seller] = {
                    "listing_count": 0,
                    "prices": [],
                    "conditions": [],
                }
            seller_data[seller]["listing_count"] += 1
            if listing.get("price_low"):
                seller_data[seller]["prices"].append(listing["price_low"])
            if listing.get("condition"):
                seller_data[seller]["conditions"].append(listing["condition"])

        # Calculate seller metrics
        seller_metrics = []
        for seller, data in seller_data.items():
            metric = {
                "seller": seller,
                "listing_count": data["listing_count"],
                "market_share_pct": round(
                    (data["listing_count"] / len(listings)) * 100, 2
                ),
            }
            if data["prices"]:
                metric["avg_price"] = round(sum(data["prices"]) / len(data["prices"]), 2)
                metric["min_price"] = min(data["prices"])
                metric["max_price"] = max(data["prices"])
            seller_metrics.append(metric)

        # Sort by listing count
        seller_metrics.sort(key=lambda x: x["listing_count"], reverse=True)
        return seller_metrics
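
The share percentages from `analyze_sellers` can be folded into a standard Herfindahl–Hirschman index to summarize how concentrated a market is. This helper is an addition for illustration, not part of the analyzer above:

```python
def herfindahl_index(seller_metrics):
    """Herfindahl–Hirschman index from market-share percentages:
    the sum of squared shares, on a 0–10,000 scale. Values above
    2,500 are conventionally considered highly concentrated."""
    return round(sum(m["market_share_pct"] ** 2 for m in seller_metrics), 1)
```

Two sellers splitting a market 50/50 score 5,000; a fragmented market with dozens of small sellers scores in the low hundreds.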

Step 6: Run the Complete Pipeline

def main():
    proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"
    scraper = EbayScraper(proxy_url)
    analyzer = EbayCompetitiveAnalyzer(scraper)

    # Market analysis for multiple products
    products_to_analyze = [
        "iPhone 15 Pro Max 256GB",
        "Nintendo Switch OLED",
        "Dyson V15 Detect",
    ]

    all_analyses = []
    for product in products_to_analyze:
        print(f"\n{'='*60}")
        print(f"Analyzing market for: {product}")
        print(f"{'='*60}")

        analysis, active, sold = analyzer.analyze_market_pricing(product)
        all_analyses.append(analysis)

        print(f"\nPricing Summary:")
        if "active_prices" in analysis:
            ap = analysis["active_prices"]
            print(f"  Active: ${ap['min']:.2f} - ${ap['max']:.2f} (avg: ${ap['mean']:.2f})")
        if "sold_prices" in analysis:
            sp = analysis["sold_prices"]
            print(f"  Sold:   ${sp['min']:.2f} - ${sp['max']:.2f} (avg: ${sp['mean']:.2f})")
        if "price_gap" in analysis:
            pg = analysis["price_gap"]
            print(f"  Gap:    ${pg['avg_active_vs_sold']:.2f} ({pg['gap_pct']:+.1f}%)")

        # Seller analysis
        seller_metrics = analyzer.analyze_sellers(active)
        print(f"\nTop Sellers:")
        for s in seller_metrics[:5]:
            print(f"  {s['seller']}: {s['listing_count']} listings "
                  f"({s['market_share_pct']}% share)")

        time.sleep(random.uniform(10, 15))

    # Save all results
    with open("ebay_market_analysis.json", "w", encoding="utf-8") as f:
        json.dump(all_analyses, f, indent=2, ensure_ascii=False)

    print(f"\nAnalysis complete for {len(all_analyses)} products")


if __name__ == "__main__":
    main()
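
The JSON dump above suits downstream pipelines; pandas (from the setup step) flattens the raw listing dicts into a CSV for spreadsheet work. The listing values below are made up purely for illustration:

```python
import pandas as pd

# Illustrative listing dicts in the shape produced by _parse_search_results.
listings = [
    {"item_id": "123", "title": "iPhone 15 Pro Max 256GB", "price_low": 899.0,
     "listing_type": "buy_it_now", "shipping_cost": 0.0},
    {"item_id": "456", "title": "iPhone 15 Pro Max 256GB (Used)", "price_low": 749.5,
     "listing_type": "auction", "shipping_cost": 9.99},
]

df = pd.DataFrame(listings)
# Landed cost: item price plus shipping (treat missing shipping as zero).
df["total_price"] = df["price_low"] + df["shipping_cost"].fillna(0)
df.to_csv("ebay_listings.csv", index=False)
```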

Anti-Detection Best Practices for eBay

Request Timing

eBay is moderately tolerant of automated requests compared to Amazon or Google, but still requires careful pacing:

  • Space search page requests 3-6 seconds apart.
  • Wait 4-8 seconds between individual listing page fetches.
  • Add 10-15 second pauses between different search queries.
  • Limit daily requests to under 1,000 per IP address.
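
These intervals can be enforced centrally instead of with ad-hoc sleep calls. A minimal pacer that guarantees a jittered minimum gap between consecutive requests:

```python
import random
import time

class Pacer:
    """Enforce a randomized minimum gap between consecutive requests."""

    def __init__(self, min_gap=3.0, max_gap=6.0):
        self.min_gap = min_gap
        self.max_gap = max_gap
        self._last = 0.0

    def wait(self):
        # Pick a fresh jittered gap each time, then sleep only for however
        # much of it has not already elapsed since the previous request.
        gap = random.uniform(self.min_gap, self.max_gap)
        elapsed = time.monotonic() - self._last
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self._last = time.monotonic()
```

Calling `pacer.wait()` before every `session.get()` gives one place to tune pacing per target (3-6 s for search pages, 4-8 s for item pages).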

Header Consistency

Maintain consistent headers within a session but vary them between sessions. eBay checks for header anomalies that indicate automated tools.

Geographic Matching

eBay serves different content based on location. Match your proxy location to the eBay domain you are scraping (US proxy for ebay.com, UK proxy for ebay.co.uk).

Cookie Management

Let cookies accumulate naturally throughout a session. eBay’s tracking cookies help establish session legitimacy.

Applications for eBay Competitive Data

eBay data drives numerous competitive intelligence use cases:

  • Product sourcing: Find underpriced items for resale by comparing active listing prices to completed sale prices.
  • Pricing optimization: Set competitive prices based on real market data rather than guesswork.
  • Demand forecasting: Track bid counts and watchers to predict which products will sell quickly.
  • Competitor monitoring: Track specific sellers’ inventory, pricing changes, and listing strategies.
  • Market entry analysis: Evaluate market opportunity by analyzing seller competition and price distributions before entering a new product category.
  • Trend identification: Monitor which product categories see increasing listings and higher sell-through rates.

For broader e-commerce intelligence, combining eBay data with Amazon and Walmart pricing creates a comprehensive market view.

Scaling Your eBay Scraper

For enterprise-scale eBay data collection:

  1. Distributed architecture: Run multiple scraper instances across different servers, each with its own proxy pool.
  2. Queue management: Use Redis or Celery to manage scraping tasks and distribute workload.
  3. Database storage: Store results in PostgreSQL with proper indexing on item_id, seller, and price columns.
  4. Deduplication: Track item IDs to avoid scraping the same listing multiple times within a monitoring cycle.
  5. Change detection: Compare current prices against previous scrapes to generate price change alerts.
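
Point 4 in miniature — an in-memory tracker; in a distributed deployment the set would become a Redis SET, whose SADD command is atomic across workers:

```python
class DedupTracker:
    """Track item IDs already scraped within a monitoring cycle."""

    def __init__(self):
        self._seen = set()

    def is_new(self, item_id):
        """Return True the first time an ID is seen, False afterwards."""
        if item_id in self._seen:
            return False
        self._seen.add(item_id)
        return True
```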

Legal Considerations

eBay’s Terms of Service restrict automated access. However, eBay also provides an official API (eBay Browse API and Finding API) that offers legitimate programmatic access to listing data. The API has rate limits and data restrictions, but for many use cases, it provides sufficient data without the legal risk of scraping.

Consider the API for production use cases and reserve scraping for data that the API does not expose.
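
As a sketch of what the official route looks like: the Browse API’s item-summary search takes an OAuth bearer token plus a marketplace header. The helper below only assembles the request pieces — endpoint and header names reflect eBay’s public developer docs, but verify them against the current documentation before relying on this — and the result is meant to be passed to `requests.get(**kwargs)`:

```python
BROWSE_SEARCH_URL = "https://api.ebay.com/buy/browse/v1/item_summary/search"

def build_browse_request(query, token, limit=50, marketplace="EBAY_US"):
    """Assemble kwargs for a Browse API item-summary search.
    Obtaining the OAuth application token is covered in eBay's
    developer documentation and omitted here."""
    return {
        "url": BROWSE_SEARCH_URL,
        "headers": {
            "Authorization": f"Bearer {token}",
            "X-EBAY-C-MARKETPLACE-ID": marketplace,
        },
        "params": {"q": query, "limit": limit},
    }

# Usage sketch: requests.get(**build_browse_request("nintendo switch oled", token))
```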

Conclusion

Scraping eBay listings for competitive intelligence provides actionable data on market pricing, seller competition, and demand patterns. The combination of active and completed listing analysis creates a complete picture of market dynamics that no other data source can match.

Reliable eBay scraping depends on quality proxy infrastructure. Rotating residential proxies from DataResearchTools provide the IP diversity and legitimacy needed for sustained data collection. For more web scraping guides and definitions of proxy-related terms, visit our proxy glossary.

