How to Scrape Google Shopping Results for Price Monitoring

Google Shopping aggregates product listings from thousands of retailers, making it one of the most comprehensive sources of pricing data on the internet. For e-commerce businesses, price monitoring through Google Shopping data enables competitive pricing strategies, market analysis, and automated repricing that can directly impact revenue.

Scraping Google Shopping, however, means going up against Google’s world-class anti-bot systems. This guide provides a complete Python framework for extracting Google Shopping data using rotating proxies for reliable, large-scale price monitoring.

Why Google Shopping Data Matters for E-Commerce

Google Shopping serves as a universal price comparison engine. The data it aggregates reveals:

  • Competitive pricing: See exactly what competitors charge for identical or similar products.
  • Market positioning: Understand where your prices sit relative to the market average.
  • Seller landscape: Identify which retailers compete on specific products.
  • Price trends: Track how prices change over time for seasonal analysis and demand forecasting.
  • Product availability: Monitor stock levels across multiple retailers.

For e-commerce businesses of any size, this data is a competitive advantage.

Why Proxies Are Non-Negotiable for Google Scraping

Google operates the most sophisticated anti-bot infrastructure on the internet. Scraping Google Shopping without proxies will result in:

  • Immediate CAPTCHAs after a handful of requests.
  • IP blacklisting across all Google properties.
  • Altered or degraded results served to detected bots, which may not reflect actual prices.
  • Rate limiting that makes data collection impractically slow.

Mobile and residential proxies are essential because Google trusts traffic from ISP-assigned and mobile carrier IPs. Because these addresses are shared by millions of real users, Google cannot block them without cutting off legitimate traffic.
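If your provider exposes several individual proxy endpoints rather than a single rotating gateway, rotation can be handled client-side. A minimal sketch, assuming a pool of placeholder endpoint URLs (substitute your provider's actual addresses):

```python
import itertools

# Hypothetical pool of proxy endpoints; replace with your provider's URLs.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
_proxy_cycle = itertools.cycle(PROXY_POOL)

def next_proxy_config():
    """Return a requests-style proxies dict using the next endpoint in the pool."""
    proxy = next(_proxy_cycle)
    return {"http": proxy, "https": proxy}
```

Pass the returned dict as the `proxies=` argument on each request (or assign it to `session.proxies` before each call) so successive requests exit from different IPs.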

Setting Up Your Environment

pip install requests beautifulsoup4 lxml pandas schedule

Building the Google Shopping Scraper

Step 1: Configure Session with Proxy Rotation

import requests
from bs4 import BeautifulSoup
import json
import time
import random
import re
import pandas as pd
from datetime import datetime
from urllib.parse import quote_plus

class GoogleShoppingScraper:
    """Scrape Google Shopping results for price monitoring."""

    BASE_URL = "https://www.google.com/search"

    def __init__(self, proxy_url):
        # proxy_url should point at a rotating proxy gateway, which assigns a
        # fresh exit IP per request or session, so a single endpoint suffices.
        self.session = requests.Session()
        self.session.proxies = {
            "http": proxy_url,
            "https": proxy_url,
        }
        self._rotate_headers()

    def _rotate_headers(self):
        """Set randomized but realistic browser headers."""
        user_agents = [
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
        ]

        self.session.headers.update({
            "User-Agent": random.choice(user_agents),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
            "Upgrade-Insecure-Requests": "1",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
            "Sec-Fetch-User": "?1",
        })

    def _fetch_with_retry(self, url, params=None, max_retries=3):
        """Fetch a URL with retry logic and header rotation."""
        for attempt in range(max_retries):
            try:
                self._rotate_headers()
                response = self.session.get(url, params=params, timeout=20)

                if response.status_code == 200:
                    if "captcha" in response.text.lower() or "unusual traffic" in response.text.lower():
                        print(f"CAPTCHA detected, attempt {attempt + 1}")
                        time.sleep(random.uniform(20, 40))
                        continue
                    return response.text
                elif response.status_code == 429:
                    print("Rate limited, waiting...")
                    time.sleep(random.uniform(30, 60))
                else:
                    print(f"Status {response.status_code}, attempt {attempt + 1}")
                    time.sleep(random.uniform(5, 10))

            except requests.exceptions.RequestException as e:
                print(f"Request error: {e}")
                time.sleep(random.uniform(5, 10))

        return None

Step 2: Search Google Shopping

    def search_products(self, query, num_pages=3, country="us"):
        """Search Google Shopping for products matching a query."""
        all_products = []

        for page in range(num_pages):
            params = {
                "q": query,
                "tbm": "shop",  # Shopping tab
                "hl": "en",
                "gl": country,
                "start": page * 20,
            }

            print(f"Scraping page {page + 1} for '{query}'...")
            html = self._fetch_with_retry(self.BASE_URL, params=params)
            if not html:
                print(f"Failed to fetch page {page + 1}")
                continue

            products = self._parse_shopping_results(html)
            if not products:
                print(f"No products found on page {page + 1}")
                break

            all_products.extend(products)
            print(f"  Found {len(products)} products (total: {len(all_products)})")

            time.sleep(random.uniform(4, 8))

        return all_products

    def _parse_shopping_results(self, html):
        """Parse product listings from Google Shopping results."""
        soup = BeautifulSoup(html, "lxml")
        products = []

        # Google Shopping product cards. These class names are obfuscated and
        # change often; update the selectors if parsing starts returning nothing.
        product_cards = soup.select("div.sh-dgr__gr-auto")
        if not product_cards:
            # Alternative selector used in some result layouts
            product_cards = soup.select("div.sh-dgr__content")

        for card in product_cards:
            product = {}

            # Product title
            title_el = card.select_one("h3.tAxDx") or card.select_one("h4")
            if not title_el:
                title_el = card.select_one("a[aria-label]")
                if title_el:
                    product["title"] = title_el.get("aria-label", "")
            else:
                product["title"] = title_el.get_text(strip=True)

            # Price
            price_el = card.select_one("span.a8Pemb") or card.select_one("span.kHxwFf")
            if price_el:
                product["price_text"] = price_el.get_text(strip=True)
                price_match = re.search(r"[\$\£\€]([\d,]+\.?\d*)", product["price_text"])
                product["price"] = float(price_match.group(1).replace(",", "")) if price_match else None

            # Seller/Store
            seller_el = card.select_one("div.aULzUe") or card.select_one("div.IuHnof")
            product["seller"] = seller_el.get_text(strip=True) if seller_el else None

            # Rating
            rating_el = card.select_one("span.Rsc7Yb")
            product["rating"] = rating_el.get_text(strip=True) if rating_el else None

            # Review count
            review_el = card.select_one("span.QIrs8")
            if review_el:
                review_text = review_el.get_text(strip=True)
                review_match = re.search(r"([\d,]+)", review_text)
                product["review_count"] = review_match.group(1) if review_match else None

            # Product link
            link_el = card.select_one("a[href*='/shopping/product/']")
            if link_el:
                href = link_el.get("href", "")
                product["google_shopping_url"] = (
                    f"https://www.google.com{href}" if href.startswith("/") else href
                )

            # Shipping info
            shipping_el = card.select_one("span.vEjMR")
            product["shipping"] = shipping_el.get_text(strip=True) if shipping_el else None

            # Image
            img_el = card.select_one("img")
            product["image_url"] = img_el.get("src") if img_el else None

            # Timestamp
            product["scraped_at"] = datetime.now().isoformat()

            if product.get("title") and product.get("price"):
                products.append(product)

        return products

Step 3: Get Detailed Product Pricing from Multiple Sellers

    def get_product_offers(self, product_url):
        """Fetch all seller offers for a specific product."""
        html = self._fetch_with_retry(product_url)
        if not html:
            return []

        soup = BeautifulSoup(html, "lxml")
        offers = []

        # Find offer listings
        offer_cards = soup.select("tr.sh-osd__offer")
        if not offer_cards:
            offer_cards = soup.select("div.sh-osd__content")

        for card in offer_cards:
            offer = {}

            # Seller name
            seller = card.select_one("td.sh-osd__seller-name a") or card.select_one("a.b5ycib")
            offer["seller"] = seller.get_text(strip=True) if seller else None

            # Price
            price = card.select_one("td.sh-osd__total-price") or card.select_one("span.g9WBQb")
            if price:
                offer["total_price_text"] = price.get_text(strip=True)
                price_match = re.search(r"[\$\£\€]([\d,]+\.?\d*)", offer["total_price_text"])
                offer["total_price"] = float(price_match.group(1).replace(",", "")) if price_match else None

            # Base price
            base_price = card.select_one("td.sh-osd__offer-price") or card.select_one("span.drzWO")
            if base_price:
                offer["base_price_text"] = base_price.get_text(strip=True)

            # Shipping cost
            shipping = card.select_one("td.sh-osd__shipping")
            offer["shipping"] = shipping.get_text(strip=True) if shipping else None

            # Seller rating
            seller_rating = card.select_one("span.sh-osd__seller-rating")
            offer["seller_rating"] = seller_rating.get_text(strip=True) if seller_rating else None

            offer["scraped_at"] = datetime.now().isoformat()

            if offer.get("seller"):
                offers.append(offer)

        return offers

Step 4: Build a Price Monitoring System

class PriceMonitor:
    """Monitor prices for specific products over time."""

    def __init__(self, proxy_url, data_dir="price_data"):
        self.scraper = GoogleShoppingScraper(proxy_url)
        self.data_dir = data_dir
        self.price_history = {}

    def monitor_products(self, product_queries):
        """Run a monitoring cycle for a list of product queries."""
        timestamp = datetime.now().isoformat()
        cycle_results = []

        for query in product_queries:
            print(f"\nMonitoring prices for: {query}")
            products = self.scraper.search_products(query, num_pages=2)

            for product in products:
                product["query"] = query
                product["monitor_timestamp"] = timestamp
                cycle_results.append(product)

                # Track price history
                key = f"{product.get('title', '')}|{product.get('seller', '')}"
                if key not in self.price_history:
                    self.price_history[key] = []
                self.price_history[key].append({
                    "price": product.get("price"),
                    "timestamp": timestamp,
                })

            time.sleep(random.uniform(5, 10))

        return cycle_results

    def detect_price_changes(self, threshold_pct=5.0):
        """Detect significant price changes from historical data."""
        alerts = []

        for product_key, history in self.price_history.items():
            if len(history) < 2:
                continue

            current = history[-1]["price"]
            previous = history[-2]["price"]

            if current is None or previous is None or previous == 0:
                continue

            change_pct = ((current - previous) / previous) * 100

            if abs(change_pct) >= threshold_pct:
                alerts.append({
                    "product": product_key.split("|")[0],
                    "seller": product_key.split("|")[1],
                    "previous_price": previous,
                    "current_price": current,
                    "change_pct": round(change_pct, 2),
                    "direction": "increase" if change_pct > 0 else "decrease",
                    "timestamp": history[-1]["timestamp"],
                })

        return alerts

    def generate_report(self, cycle_results):
        """Generate a price monitoring report."""
        df = pd.DataFrame(cycle_results)
        if df.empty:
            return {
                "timestamp": datetime.now().isoformat(),
                "total_products_tracked": 0,
            }

        report = {
            "timestamp": datetime.now().isoformat(),
            "total_products_tracked": len(df),
            "unique_sellers": df["seller"].nunique(),
            "queries_monitored": df["query"].nunique(),
        }

        # Price statistics per query
        for query in df["query"].unique():
            query_df = df[df["query"] == query]
            prices = query_df["price"].dropna()
            report[f"stats_{query}"] = {
                "count": len(prices),
                "min_price": prices.min(),
                "max_price": prices.max(),
                "avg_price": round(prices.mean(), 2),
                "median_price": round(prices.median(), 2),
            }

        return report
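The percentage-delta logic inside detect_price_changes can be isolated for clarity. A standalone version of that calculation, using made-up prices for illustration:

```python
def price_change_pct(previous, current):
    """Percentage change from previous to current, or None if not computable."""
    if previous in (None, 0) or current is None:
        return None
    return round(((current - previous) / previous) * 100, 2)

# A 5% threshold would flag this drop (about -6.7%) as a "decrease" alert,
# while a move from 100.00 to 101.00 (+1.0%) would pass unreported.
change = price_change_pct(299.99, 279.99)
```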

Step 5: Run the Complete Pipeline

def main():
    proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"

    # One-time product search
    scraper = GoogleShoppingScraper(proxy_url)

    queries = [
        "wireless noise cancelling headphones",
        "mechanical keyboard RGB",
        "4K webcam",
    ]

    all_products = []
    for query in queries:
        products = scraper.search_products(query, num_pages=2)
        for product in products:
            product["query"] = query  # Tag results so later analysis can group by query
        all_products.extend(products)
        time.sleep(random.uniform(5, 10))

    # Save search results
    with open("google_shopping_results.json", "w", encoding="utf-8") as f:
        json.dump(all_products, f, indent=2, ensure_ascii=False)

    df = pd.DataFrame(all_products)
    df.to_csv("google_shopping_results.csv", index=False)

    print(f"\nTotal products found: {len(all_products)}")

    # Price analysis
    for query in queries:
        query_products = [
            p for p in all_products
            if p.get("query") == query or query.lower() in str(p.get("title", "")).lower()
        ]
        prices = [p["price"] for p in query_products if p.get("price")]
        if prices:
            print(f"\n{query}:")
            print(f"  Products: {len(prices)}")
            print(f"  Price range: ${min(prices):.2f} - ${max(prices):.2f}")
            print(f"  Average: ${sum(prices)/len(prices):.2f}")

    # Set up continuous monitoring
    monitor = PriceMonitor(proxy_url)

    products_to_monitor = [
        "Sony WH-1000XM5",
        "Apple AirPods Pro",
        "Samsung Galaxy Buds",
    ]

    # Run one monitoring cycle
    results = monitor.monitor_products(products_to_monitor)
    report = monitor.generate_report(results)
    print("\nMonitoring Report:")
    print(json.dumps(report, indent=2))


if __name__ == "__main__":
    main()

Scheduling Automated Price Checks

For continuous price monitoring, schedule regular scraping runs:

import schedule

def scheduled_monitoring():
    """Run price monitoring on a schedule."""
    proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"
    monitor = PriceMonitor(proxy_url)

    products = [
        "Sony WH-1000XM5",
        "Apple AirPods Pro 2",
        "Bose QuietComfort Ultra",
    ]

    results = monitor.monitor_products(products)
    alerts = monitor.detect_price_changes(threshold_pct=3.0)

    if alerts:
        print("\nPrice change alerts:")
        for alert in alerts:
            print(f"  {alert['product']} ({alert['seller']}): "
                  f"${alert['previous_price']:.2f} -> ${alert['current_price']:.2f} "
                  f"({alert['change_pct']:+.1f}%)")

    # Save results (create the output directory if it does not exist yet)
    import os
    os.makedirs("price_data", exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    with open(f"price_data/monitoring_{timestamp}.json", "w") as f:
        json.dump({"results": results, "alerts": alerts}, f, indent=2)

# Run every 6 hours
schedule.every(6).hours.do(scheduled_monitoring)

while True:
    schedule.run_pending()
    time.sleep(60)

Google-Specific Anti-Detection Tips

CAPTCHA Prevention

Google’s CAPTCHAs are triggered by request volume and pattern analysis. Minimize CAPTCHA encounters by:

  1. Keeping requests under 30 per hour per IP address.
  2. Varying search query formats and parameters.
  3. Including natural pauses of 4-10 seconds between requests.
  4. Using mobile proxies which have higher trust scores with Google.
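Points 1 and 3 can be enforced in code with a small rolling-window limiter. This is a sketch; the 30-per-hour budget mirrors the guideline above, not any documented Google limit:

```python
import random
import time
from collections import deque

class HourlyRateLimiter:
    """Cap requests per rolling hour and insert jittered pauses between them."""

    def __init__(self, max_per_hour=30, min_pause=4.0, max_pause=10.0):
        self.max_per_hour = max_per_hour
        self.min_pause = min_pause
        self.max_pause = max_pause
        self.timestamps = deque()  # send times within the last hour

    def wait(self):
        """Block until it is safe to send the next request."""
        now = time.time()
        # Discard request timestamps older than one hour.
        while self.timestamps and now - self.timestamps[0] > 3600:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_per_hour:
            # Budget spent: sleep until the oldest request leaves the window.
            time.sleep(3600 - (now - self.timestamps[0]))
        else:
            time.sleep(random.uniform(self.min_pause, self.max_pause))
        self.timestamps.append(time.time())
```

Calling limiter.wait() immediately before each _fetch_with_retry call keeps every IP inside the budget.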

Geographic Consistency

Google Shopping results are heavily localized. Ensure your proxy location matches the gl parameter in your search queries. A US proxy should use gl=us, a UK proxy should use gl=gb, and so on.
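A small helper can keep the proxy location and the gl parameter in sync. The country list below is illustrative; extend it to match your actual proxy pool:

```python
# Google's gl values are ISO 3166-1 alpha-2 codes; note the UK is "gb", not "uk".
SUPPORTED_GL = {"us", "gb", "de", "fr", "ca", "au"}

def gl_for_proxy(proxy_country, default="us"):
    """Return the gl value matching a proxy's exit country, falling back to a default."""
    code = proxy_country.strip().lower()
    return code if code in SUPPORTED_GL else default
```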

Session Warmup

Before making shopping searches, establish a legitimate session:

def warmup_google_session(scraper):
    """Warm up a Google session before shopping searches."""
    # Visit Google homepage first
    scraper._fetch_with_retry("https://www.google.com/")
    time.sleep(random.uniform(2, 4))

    # Do a regular search first
    scraper._fetch_with_retry(
        "https://www.google.com/search",
        params={"q": "weather today"},
    )
    time.sleep(random.uniform(3, 6))

Use Cases for Google Shopping Data

Scraped Google Shopping data enables powerful e-commerce strategies:

  • Dynamic pricing: Automatically adjust your prices based on competitor pricing data.
  • MAP compliance: Monitor whether retailers are violating Minimum Advertised Price agreements.
  • Assortment gaps: Identify products competitors sell that you do not carry.
  • Marketplace intelligence: Track which sellers appear most frequently and their pricing strategies.
  • SEO and advertising: Analyze how Shopping ads appear for specific keywords to optimize your own campaigns.
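As one concrete example, the dynamic-pricing use case can start from a simple undercut rule. The percentages and floor below are illustrative, not a pricing recommendation:

```python
def suggest_price(own_price, competitor_prices, undercut_pct=1.0, floor=None):
    """Suggest a price just under the cheapest competitor, respecting a price floor."""
    valid = [p for p in competitor_prices if p is not None and p > 0]
    if not valid:
        return own_price  # No market data; keep the current price.
    target = min(valid) * (1 - undercut_pct / 100)
    if floor is not None:
        target = max(target, floor)  # Never price below the cost/margin floor.
    return round(target, 2)
```

Feeding it the price list extracted by search_products for a query turns scraped data directly into a repricing suggestion.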

Conclusion

Scraping Google Shopping results provides the pricing intelligence that e-commerce businesses need to compete effectively. The Python framework in this guide handles search result extraction, multi-seller pricing, and automated monitoring with proper anti-detection measures.

Success with Google Shopping scraping depends entirely on your proxy infrastructure. Mobile proxies from DataResearchTools offer the highest success rates against Google’s detection systems. For more web scraping techniques, visit our tutorial library and proxy glossary.

