How to Scrape Google Shopping Results for Price Monitoring
Google Shopping aggregates product listings from thousands of retailers, making it one of the most comprehensive sources of pricing data on the internet. For e-commerce businesses, price monitoring through Google Shopping data enables competitive pricing strategies, market analysis, and automated repricing that can directly impact revenue.
Scraping Google Shopping, however, means going up against Google’s world-class anti-bot systems. This guide provides a complete Python framework for extracting Google Shopping data using rotating proxies for reliable, large-scale price monitoring.
Why Google Shopping Data Matters for E-Commerce
Google Shopping serves as a universal price comparison engine. The data it aggregates reveals:
- Competitive pricing: See exactly what competitors charge for identical or similar products.
- Market positioning: Understand where your prices sit relative to the market average.
- Seller landscape: Identify which retailers compete on specific products.
- Price trends: Track how prices change over time for seasonal analysis and demand forecasting.
- Product availability: Monitor stock levels across multiple retailers.
For e-commerce businesses of any size, this data is a competitive advantage.
Why Proxies Are Non-Negotiable for Google Scraping
Google operates some of the most sophisticated anti-bot infrastructure on the internet. Scraping Google Shopping without proxies will quickly result in:
- Immediate CAPTCHAs after a handful of requests.
- IP blacklisting across all Google properties.
- Altered results: pages served to suspected bots may not reflect actual prices.
- Rate limiting that makes data collection impractically slow.
Mobile and residential proxies are essential because Google trusts traffic from ISP-assigned and mobile carrier IPs. These addresses are shared by many real users, so Google cannot block them outright without cutting off legitimate traffic.
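Rotation itself typically happens at the proxy gateway and is controlled through the proxy URL. A minimal sketch of two common credential formats (the gateway host and session syntax are placeholders; check your provider's documentation for the real format):
# Hypothetical gateway formats: the host and session syntax are placeholders
ROTATING_PROXY = "http://user:pass@gateway.example.com:8080"  # new exit IP per request
STICKY_PROXY = "http://user-session-abc123:pass@gateway.example.com:8080"  # pinned exit IP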
Setting Up Your Environment
pip install requests beautifulsoup4 lxml pandas schedule
Building the Google Shopping Scraper
Step 1: Configure Session with Proxy Rotation
import requests
from bs4 import BeautifulSoup
import json
import os
import time
import random
import re
import pandas as pd
from datetime import datetime
class GoogleShoppingScraper:
"""Scrape Google Shopping results for price monitoring."""
BASE_URL = "https://www.google.com/search"
def __init__(self, proxy_url):
self.session = requests.Session()
self.session.proxies = {
"http": proxy_url,
"https": proxy_url,
}
self._rotate_headers()
def _rotate_headers(self):
"""Set randomized but realistic browser headers."""
user_agents = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15",
]
self.session.headers.update({
"User-Agent": random.choice(user_agents),
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,en;q=0.9",
"Accept-Encoding": "gzip, deflate, br",
"Connection": "keep-alive",
"Upgrade-Insecure-Requests": "1",
"Sec-Fetch-Dest": "document",
"Sec-Fetch-Mode": "navigate",
"Sec-Fetch-Site": "none",
"Sec-Fetch-User": "?1",
})
def _fetch_with_retry(self, url, params=None, max_retries=3):
"""Fetch a URL with retry logic and header rotation."""
for attempt in range(max_retries):
try:
self._rotate_headers()
response = self.session.get(url, params=params, timeout=20)
if response.status_code == 200:
if "captcha" in response.text.lower() or "unusual traffic" in response.text.lower():
print(f"CAPTCHA detected, attempt {attempt + 1}")
time.sleep(random.uniform(20, 40))
continue
return response.text
elif response.status_code == 429:
print(f"Rate limited, waiting...")
time.sleep(random.uniform(30, 60))
else:
print(f"Status {response.status_code}, attempt {attempt + 1}")
time.sleep(random.uniform(5, 10))
except requests.exceptions.RequestException as e:
print(f"Request error: {e}")
time.sleep(random.uniform(5, 10))
        return None
Step 2: Search Google Shopping
def search_products(self, query, num_pages=3, country="us"):
"""Search Google Shopping for products matching a query."""
all_products = []
for page in range(num_pages):
params = {
"q": query,
"tbm": "shop", # Shopping tab
"hl": "en",
"gl": country,
"start": page * 20,
}
print(f"Scraping page {page + 1} for '{query}'...")
html = self._fetch_with_retry(self.BASE_URL, params=params)
if not html:
print(f"Failed to fetch page {page + 1}")
continue
products = self._parse_shopping_results(html)
if not products:
print(f"No products found on page {page + 1}")
break
all_products.extend(products)
print(f" Found {len(products)} products (total: {len(all_products)})")
time.sleep(random.uniform(4, 8))
return all_products
def _parse_shopping_results(self, html):
"""Parse product listings from Google Shopping results."""
soup = BeautifulSoup(html, "lxml")
products = []
# Google Shopping product cards
product_cards = soup.select("div.sh-dgr__gr-auto")
if not product_cards:
# Alternative selector
product_cards = soup.select("div.sh-dgr__content")
for card in product_cards:
product = {}
# Product title
title_el = card.select_one("h3.tAxDx") or card.select_one("h4")
if not title_el:
title_el = card.select_one("a[aria-label]")
if title_el:
product["title"] = title_el.get("aria-label", "")
else:
product["title"] = title_el.get_text(strip=True)
# Price
price_el = card.select_one("span.a8Pemb") or card.select_one("span.kHxwFf")
if price_el:
product["price_text"] = price_el.get_text(strip=True)
price_match = re.search(r"[\$\£\€]([\d,]+\.?\d*)", product["price_text"])
product["price"] = float(price_match.group(1).replace(",", "")) if price_match else None
# Seller/Store
seller_el = card.select_one("div.aULzUe") or card.select_one("div.IuHnof")
product["seller"] = seller_el.get_text(strip=True) if seller_el else None
# Rating
rating_el = card.select_one("span.Rsc7Yb")
product["rating"] = rating_el.get_text(strip=True) if rating_el else None
# Review count
review_el = card.select_one("span.QIrs8")
if review_el:
review_text = review_el.get_text(strip=True)
review_match = re.search(r"([\d,]+)", review_text)
product["review_count"] = review_match.group(1) if review_match else None
# Product link
link_el = card.select_one("a[href*='/shopping/product/']")
if link_el:
href = link_el.get("href", "")
product["google_shopping_url"] = (
f"https://www.google.com{href}" if href.startswith("/") else href
)
# Shipping info
shipping_el = card.select_one("span.vEjMR")
product["shipping"] = shipping_el.get_text(strip=True) if shipping_el else None
# Image
img_el = card.select_one("img")
product["image_url"] = img_el.get("src") if img_el else None
# Timestamp
product["scraped_at"] = datetime.now().isoformat()
if product.get("title") and product.get("price"):
products.append(product)
        return products
Step 3: Get Detailed Product Pricing from Multiple Sellers
def get_product_offers(self, product_url):
"""Fetch all seller offers for a specific product."""
html = self._fetch_with_retry(product_url)
if not html:
return []
soup = BeautifulSoup(html, "lxml")
offers = []
# Find offer listings
offer_cards = soup.select("tr.sh-osd__offer")
if not offer_cards:
offer_cards = soup.select("div.sh-osd__content")
for card in offer_cards:
offer = {}
# Seller name
seller = card.select_one("td.sh-osd__seller-name a") or card.select_one("a.b5ycib")
offer["seller"] = seller.get_text(strip=True) if seller else None
# Price
price = card.select_one("td.sh-osd__total-price") or card.select_one("span.g9WBQb")
if price:
offer["total_price_text"] = price.get_text(strip=True)
price_match = re.search(r"[\$\£\€]([\d,]+\.?\d*)", offer["total_price_text"])
offer["total_price"] = float(price_match.group(1).replace(",", "")) if price_match else None
# Base price
base_price = card.select_one("td.sh-osd__offer-price") or card.select_one("span.drzWO")
if base_price:
offer["base_price_text"] = base_price.get_text(strip=True)
# Shipping cost
shipping = card.select_one("td.sh-osd__shipping")
offer["shipping"] = shipping.get_text(strip=True) if shipping else None
# Seller rating
seller_rating = card.select_one("span.sh-osd__seller-rating")
offer["seller_rating"] = seller_rating.get_text(strip=True) if seller_rating else None
offer["scraped_at"] = datetime.now().isoformat()
if offer.get("seller"):
offers.append(offer)
        return offers
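A brief usage sketch tying Steps 2 and 3 together (the query and proxy credentials below are placeholders):
scraper = GoogleShoppingScraper("http://user:pass@proxy.dataresearchtools.com:8080")
products = scraper.search_products("wireless earbuds", num_pages=1)
if products and products[0].get("google_shopping_url"):
    offers = scraper.get_product_offers(products[0]["google_shopping_url"])
    for offer in offers:
        print(offer["seller"], offer.get("total_price"))
Step 4: Build a Price Monitoring System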
class PriceMonitor:
"""Monitor prices for specific products over time."""
    def __init__(self, proxy_url, data_dir="price_data"):
        self.scraper = GoogleShoppingScraper(proxy_url)
        self.data_dir = data_dir
        os.makedirs(self.data_dir, exist_ok=True)  # Ensure the output directory exists
        self.price_history = {}
def monitor_products(self, product_queries):
"""Run a monitoring cycle for a list of product queries."""
timestamp = datetime.now().isoformat()
cycle_results = []
for query in product_queries:
print(f"\nMonitoring prices for: {query}")
products = self.scraper.search_products(query, num_pages=2)
for product in products:
product["query"] = query
product["monitor_timestamp"] = timestamp
cycle_results.append(product)
# Track price history
key = f"{product.get('title', '')}|{product.get('seller', '')}"
if key not in self.price_history:
self.price_history[key] = []
self.price_history[key].append({
"price": product.get("price"),
"timestamp": timestamp,
})
time.sleep(random.uniform(5, 10))
return cycle_results
def detect_price_changes(self, threshold_pct=5.0):
"""Detect significant price changes from historical data."""
alerts = []
for product_key, history in self.price_history.items():
if len(history) < 2:
continue
current = history[-1]["price"]
previous = history[-2]["price"]
if current is None or previous is None or previous == 0:
continue
change_pct = ((current - previous) / previous) * 100
if abs(change_pct) >= threshold_pct:
alerts.append({
"product": product_key.split("|")[0],
"seller": product_key.split("|")[1],
"previous_price": previous,
"current_price": current,
"change_pct": round(change_pct, 2),
"direction": "increase" if change_pct > 0 else "decrease",
"timestamp": history[-1]["timestamp"],
})
return alerts
def generate_report(self, cycle_results):
"""Generate a price monitoring report."""
        df = pd.DataFrame(cycle_results)
        if df.empty:
            return {"timestamp": datetime.now().isoformat(), "total_products_tracked": 0}
report = {
"timestamp": datetime.now().isoformat(),
"total_products_tracked": len(df),
"unique_sellers": df["seller"].nunique(),
"queries_monitored": df["query"].nunique(),
}
# Price statistics per query
for query in df["query"].unique():
query_df = df[df["query"] == query]
prices = query_df["price"].dropna()
report[f"stats_{query}"] = {
"count": len(prices),
"min_price": prices.min(),
"max_price": prices.max(),
"avg_price": round(prices.mean(), 2),
"median_price": round(prices.median(), 2),
}
        return report
Step 5: Run the Complete Pipeline
def main():
proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"
# One-time product search
scraper = GoogleShoppingScraper(proxy_url)
queries = [
"wireless noise cancelling headphones",
"mechanical keyboard RGB",
"4K webcam",
]
all_products = []
    for query in queries:
        products = scraper.search_products(query, num_pages=2)
        for product in products:
            product["query"] = query  # Tag each result with its source query
        all_products.extend(products)
        time.sleep(random.uniform(5, 10))
# Save search results
with open("google_shopping_results.json", "w", encoding="utf-8") as f:
json.dump(all_products, f, indent=2, ensure_ascii=False)
df = pd.DataFrame(all_products)
df.to_csv("google_shopping_results.csv", index=False)
print(f"\nTotal products found: {len(all_products)}")
# Price analysis
for query in queries:
        query_products = [p for p in all_products if p.get("query") == query]
prices = [p["price"] for p in query_products if p.get("price")]
if prices:
print(f"\n{query}:")
print(f" Products: {len(prices)}")
print(f" Price range: ${min(prices):.2f} - ${max(prices):.2f}")
print(f" Average: ${sum(prices)/len(prices):.2f}")
# Set up continuous monitoring
monitor = PriceMonitor(proxy_url)
products_to_monitor = [
"Sony WH-1000XM5",
"Apple AirPods Pro",
"Samsung Galaxy Buds",
]
# Run one monitoring cycle
results = monitor.monitor_products(products_to_monitor)
report = monitor.generate_report(results)
print(f"\nMonitoring Report:")
print(json.dumps(report, indent=2))
if __name__ == "__main__":
    main()
Scheduling Automated Price Checks
For continuous price monitoring, schedule regular scraping runs:
import schedule
def scheduled_monitoring():
"""Run price monitoring on a schedule."""
proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"
monitor = PriceMonitor(proxy_url)
products = [
"Sony WH-1000XM5",
"Apple AirPods Pro 2",
"Bose QuietComfort Ultra",
]
results = monitor.monitor_products(products)
alerts = monitor.detect_price_changes(threshold_pct=3.0)
if alerts:
print(f"\nPrice change alerts:")
for alert in alerts:
print(f" {alert['product']} ({alert['seller']}): "
f"${alert['previous_price']:.2f} -> ${alert['current_price']:.2f} "
f"({alert['change_pct']:+.1f}%)")
# Save results
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
with open(f"price_data/monitoring_{timestamp}.json", "w") as f:
json.dump({"results": results, "alerts": alerts}, f, indent=2)
# Run every 6 hours
schedule.every(6).hours.do(scheduled_monitoring)
while True:
schedule.run_pending()
    time.sleep(60)
Google-Specific Anti-Detection Tips
CAPTCHA Prevention
Google’s CAPTCHAs are triggered by request volume and pattern analysis. Minimize CAPTCHA encounters by:
- Keeping requests under roughly 30 per hour per IP address (the pacing sketch after this list enforces this budget).
- Varying search query formats and parameters.
- Including natural pauses of 4-10 seconds between requests.
- Using mobile proxies, which have higher trust scores with Google.
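Below is a minimal pacing sketch that enforces such a budget. The 30-request ceiling and the 4-10 second jitter come from the guidance above; both are assumptions to tune to your own risk tolerance.
import random
import time

class RequestPacer:
    """Keep a single IP under a fixed hourly request budget."""
    def __init__(self, max_per_hour=30):
        self.max_per_hour = max_per_hour
        self.timestamps = []
    def wait(self):
        """Block until it is safe to send the next request."""
        now = time.time()
        # Keep only requests from the last hour in the sliding window
        self.timestamps = [t for t in self.timestamps if now - t < 3600]
        if len(self.timestamps) >= self.max_per_hour:
            # Sleep until the oldest request ages out of the window
            time.sleep(3600 - (now - self.timestamps[0]) + 1)
        # Natural pause between consecutive requests
        time.sleep(random.uniform(4, 10))
        self.timestamps.append(time.time())
Call pacer.wait() immediately before each _fetch_with_retry call.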
Geographic Consistency
Google Shopping results are heavily localized. Ensure your proxy location matches the gl parameter in your search queries. A US proxy should use gl=us, a UK proxy should use gl=gb, and so on.
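A small helper sketch keeps the two in sync (the mapping below is illustrative, not exhaustive):
# Illustrative mapping from proxy exit country to Google's gl parameter
PROXY_COUNTRY_TO_GL = {"us": "us", "gb": "gb", "de": "de", "fr": "fr"}

def localized_search_params(query, proxy_country):
    """Build search params whose gl value matches the proxy's exit country."""
    return {
        "q": query,
        "tbm": "shop",
        "hl": "en",
        "gl": PROXY_COUNTRY_TO_GL.get(proxy_country, "us"),
    }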
Session Warmup
Before making shopping searches, establish a legitimate session:
def warmup_google_session(scraper):
"""Warm up a Google session before shopping searches."""
# Visit Google homepage first
scraper._fetch_with_retry("https://www.google.com/")
time.sleep(random.uniform(2, 4))
# Do a regular search first
scraper._fetch_with_retry(
"https://www.google.com/search",
params={"q": "weather today"},
)
    time.sleep(random.uniform(3, 6))
Use Cases for Google Shopping Data
Scraped Google Shopping data enables powerful e-commerce strategies:
- Dynamic pricing: Automatically adjust your prices based on competitor pricing data (a repricing sketch follows this list).
- MAP compliance: Monitor whether retailers are violating Minimum Advertised Price agreements.
- Assortment gaps: Identify products competitors sell that you do not carry.
- Marketplace intelligence: Track which sellers appear most frequently and their pricing strategies.
- SEO and advertising: Analyze how Shopping ads appear for specific keywords to optimize your own campaigns.
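As a sketch of the dynamic-pricing idea mentioned above (the 2% undercut and the floor price are illustrative assumptions, not pricing advice):
def suggest_price(competitor_prices, floor_price, undercut=0.02):
    """Suggest a price just below the lowest competitor, never below the floor."""
    valid = [p for p in competitor_prices if p and p > 0]
    if not valid:
        return None
    candidate = min(valid) * (1 - undercut)
    return round(max(candidate, floor_price), 2)

# suggest_price([299.99, 279.00, 315.50], floor_price=250.0) -> 273.42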
Conclusion
Scraping Google Shopping results provides the pricing intelligence that e-commerce businesses need to compete effectively. The Python framework in this guide handles search result extraction, multi-seller pricing, and automated monitoring with proper anti-detection measures.
Success with Google Shopping scraping depends heavily on your proxy infrastructure. Mobile proxies from DataResearchTools offer the highest success rates against Google’s detection systems. For more web scraping techniques, visit our tutorial library and proxy glossary.
Related Reading
- How to Scrape Amazon Product Data with Proxies: 2026 Python Guide
- How to Scrape Bing Search Results with Python and Proxies
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix
- How to Build an Ethical Web Scraping Policy for Your Company