How to Scrape Google Trends Data for Market Research

Google Trends provides one of the most accessible windows into real-time consumer interest and market demand. Unlike traditional market research that relies on surveys and reports, Google Trends reflects actual search behavior from billions of queries. For product managers, investors, content strategists, and market researchers, this data reveals demand shifts, seasonal patterns, and emerging trends before they show up in sales figures.

While Google Trends offers a web interface, it limits data exports and does not support automated collection. The pytrends library provides programmatic access, but Google aggressively rate-limits automated requests. This guide demonstrates how to build a robust Google Trends data pipeline using Python, pytrends, and mobile proxy rotation to collect trend data at scale.

What Google Trends Data Tells You

Google Trends does not report absolute search volumes. Instead, it provides a relative interest score from 0 to 100, where 100 represents the peak popularity of a term within the selected time range and geography. This normalization makes it ideal for:

  • Trend comparison: How does interest in “AI chatbot” compare to “search engine” over the past 5 years?
  • Seasonal analysis: When does demand for “winter coats” peak relative to “swimwear”?
  • Geographic targeting: Which states or countries show the highest interest in your product category?
  • Related query discovery: What terms do people search alongside your target keywords?
  • Rising topic detection: What queries are seeing rapid growth before they go mainstream?
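
The 0-100 scale is easy to misread, so here is the shape of the normalization in miniature (an illustrative simplification; Google does not publish its exact method, which also samples and rounds):

```python
def normalize_interest(raw_values):
    """Scale raw interest values so the period's peak becomes 100,
    mimicking how Google Trends reports relative interest."""
    peak = max(raw_values)
    return [round(100 * v / peak) for v in raw_values]
```

For example, `normalize_interest([20, 40, 80])` yields `[25, 50, 100]` — the same shape you would see regardless of whether the raw peak was 80 searches or 80 million.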

Setting Up the Environment

Install the required packages:

pip install pytrends pandas matplotlib requests

Basic Pytrends Usage

Start with a simple example to understand the data structure before adding proxy rotation:

from pytrends.request import TrendReq
import pandas as pd
import time


def basic_trends_query(keywords, timeframe="today 12-m", geo="US"):
    """Fetch basic Google Trends data for a list of keywords."""
    pytrends = TrendReq(hl="en-US", tz=360)
    pytrends.build_payload(keywords, cat=0, timeframe=timeframe, geo=geo)

    # Interest over time
    interest_df = pytrends.interest_over_time()

    # Related queries
    related = pytrends.related_queries()

    # Interest by region
    regional = pytrends.interest_by_region(resolution="COUNTRY")

    return {
        "interest_over_time": interest_df,
        "related_queries": related,
        "regional_interest": regional,
    }

This works for one-off queries but fails under repeated use because Google rate-limits the underlying API requests.

Adding Proxy Rotation for Scale

To collect Google Trends data for hundreds of keywords, you need proxy rotation. Google is particularly aggressive about rate-limiting Trends requests, making web scraping proxies essential for any serious data collection effort:

import random
import time

import pandas as pd
from pytrends.request import TrendReq


class GoogleTrendsCollector:
    """Collects Google Trends data at scale using proxy rotation."""

    def __init__(self, proxy_list):
        self.proxy_list = proxy_list
        self.proxy_index = 0
        self.request_count = 0
        self.requests_per_proxy = 5  # Rotate after N requests per proxy

    def _get_pytrends_instance(self):
        """Create a pytrends instance with the current proxy."""
        proxy = self.proxy_list[self.proxy_index % len(self.proxy_list)]

        # pytrends expects proxies in a specific format
        proxy_config = [f"https://{proxy.replace('http://', '')}"]

        pytrends = TrendReq(
            hl="en-US",
            tz=360,
            timeout=(10, 30),
            proxies=proxy_config,
            retries=3,
            backoff_factor=1.0,
        )

        self.request_count += 1
        if self.request_count >= self.requests_per_proxy:
            self.proxy_index += 1
            self.request_count = 0
            print(f"Rotated to proxy index {self.proxy_index % len(self.proxy_list)}")

        return pytrends

    def get_interest_over_time(self, keywords, timeframe="today 12-m", geo=""):
        """Fetch interest over time with proxy rotation and retry logic."""
        max_retries = 3

        for attempt in range(max_retries):
            try:
                pytrends = self._get_pytrends_instance()
                pytrends.build_payload(
                    keywords, cat=0, timeframe=timeframe, geo=geo
                )

                df = pytrends.interest_over_time()
                time.sleep(random.uniform(2, 5))
                return df

            except Exception as e:
                print(f"Attempt {attempt + 1} failed: {e}")
                self.proxy_index += 1  # Force proxy rotation
                self.request_count = 0
                time.sleep(random.uniform(10, 20))

        return pd.DataFrame()

    def get_related_queries(self, keywords, timeframe="today 12-m", geo=""):
        """Fetch related queries with proxy rotation."""
        max_retries = 3

        for attempt in range(max_retries):
            try:
                pytrends = self._get_pytrends_instance()
                pytrends.build_payload(
                    keywords, cat=0, timeframe=timeframe, geo=geo
                )

                related = pytrends.related_queries()
                time.sleep(random.uniform(2, 5))
                return related

            except Exception as e:
                print(f"Related queries attempt {attempt + 1} failed: {e}")
                self.proxy_index += 1
                self.request_count = 0
                time.sleep(random.uniform(10, 20))

        return {}

    def get_regional_interest(self, keywords, timeframe="today 12-m", geo="", resolution="COUNTRY"):
        """Fetch interest by region with proxy rotation."""
        max_retries = 3

        for attempt in range(max_retries):
            try:
                pytrends = self._get_pytrends_instance()
                pytrends.build_payload(
                    keywords, cat=0, timeframe=timeframe, geo=geo
                )

                regional = pytrends.interest_by_region(
                    resolution=resolution, inc_low_vol=True, inc_geo_code=True
                )
                time.sleep(random.uniform(2, 5))
                return regional

            except Exception as e:
                print(f"Regional interest attempt {attempt + 1} failed: {e}")
                self.proxy_index += 1
                self.request_count = 0
                time.sleep(random.uniform(10, 20))

        return pd.DataFrame()

    def get_suggestions(self, keyword):
        """Get Google Trends keyword suggestions."""
        try:
            pytrends = self._get_pytrends_instance()
            suggestions = pytrends.suggestions(keyword=keyword)
            time.sleep(random.uniform(1, 3))
            return suggestions
        except Exception as e:
            print(f"Suggestions error: {e}")
            return []

Building a Market Research Pipeline

Combine the collector methods into a complete market research workflow that analyzes a topic from multiple angles:

class MarketResearchPipeline:
    """Comprehensive market research using Google Trends data."""

    def __init__(self, collector):
        self.collector = collector

    def analyze_market(self, primary_keywords, competitor_keywords, geo="US"):
        """Run a full market analysis for a set of keywords."""
        report = {}

        # 1. Trend comparison (up to 5 keywords at a time)
        print("Collecting trend data...")
        all_keywords = primary_keywords + competitor_keywords

        # Google Trends limits to 5 keywords per query
        keyword_batches = [
            all_keywords[i:i + 5] for i in range(0, len(all_keywords), 5)
        ]

        trend_data = []
        for batch in keyword_batches:
            df = self.collector.get_interest_over_time(
                batch, timeframe="today 5-y", geo=geo
            )
            if not df.empty:
                trend_data.append(df)
            time.sleep(random.uniform(3, 6))

        if trend_data:
            report["trends"] = pd.concat(trend_data, axis=1)
            # Remove duplicate 'isPartial' columns
            report["trends"] = report["trends"].loc[
                :, ~report["trends"].columns.duplicated()
            ]

        # 2. Related queries for each primary keyword
        print("Collecting related queries...")
        report["related"] = {}
        for keyword in primary_keywords:
            related = self.collector.get_related_queries(
                [keyword], timeframe="today 12-m", geo=geo
            )
            if related and keyword in related:
                report["related"][keyword] = {
                    "top": related[keyword].get("top"),
                    "rising": related[keyword].get("rising"),
                }
            time.sleep(random.uniform(3, 6))

        # 3. Regional interest
        print("Collecting regional data...")
        report["regional"] = {}
        for keyword in primary_keywords:
            regional = self.collector.get_regional_interest(
                [keyword], timeframe="today 12-m", geo=geo,
                resolution="REGION" if geo else "COUNTRY",
            )
            if not regional.empty:
                report["regional"][keyword] = regional
            time.sleep(random.uniform(3, 6))

        return report

    def seasonal_analysis(self, keyword, years=5, geo="US"):
        """Analyze seasonal patterns for a keyword."""
        # Google Trends only accepts fixed timeframe strings ("today 12-m",
        # "today 5-y", ...); for arbitrary year spans, pass an explicit
        # "YYYY-MM-DD YYYY-MM-DD" date range instead.
        end = pd.Timestamp.today()
        start = end - pd.DateOffset(years=years)
        df = self.collector.get_interest_over_time(
            [keyword], timeframe=f"{start:%Y-%m-%d} {end:%Y-%m-%d}", geo=geo
        )

        if df.empty:
            return None

        # Add month column for seasonal aggregation
        df["month"] = df.index.month
        df["year"] = df.index.year

        seasonal = df.groupby("month")[keyword].mean()
        peak_month = seasonal.idxmax()
        trough_month = seasonal.idxmin()

        return {
            "keyword": keyword,
            "monthly_averages": seasonal.to_dict(),
            "peak_month": peak_month,
            "trough_month": trough_month,
            "seasonality_ratio": seasonal.max() / seasonal.min() if seasonal.min() > 0 else None,
            "raw_data": df,
        }

    def find_rising_topics(self, seed_keywords, geo="US"):
        """Discover rising topics related to seed keywords."""
        rising_topics = []

        for keyword in seed_keywords:
            related = self.collector.get_related_queries(
                [keyword], timeframe="today 3-m", geo=geo
            )

            if related and keyword in related:
                rising = related[keyword].get("rising")
                if rising is not None and not rising.empty:
                    for _, row in rising.iterrows():
                        rising_topics.append({
                            "seed_keyword": keyword,
                            "rising_query": row.get("query", ""),
                            "growth_value": row.get("value", ""),
                        })

            time.sleep(random.uniform(3, 6))

        return pd.DataFrame(rising_topics)

Extracting and Exporting Results

def main():
    proxies = [
        "http://user:pass@proxy1.example.com:8080",
        "http://user:pass@proxy2.example.com:8080",
        "http://user:pass@proxy3.example.com:8080",
        "http://user:pass@proxy4.example.com:8080",
        "http://user:pass@proxy5.example.com:8080",
    ]

    collector = GoogleTrendsCollector(proxies)
    pipeline = MarketResearchPipeline(collector)

    # Market analysis
    primary = ["mobile proxy", "residential proxy", "rotating proxy"]
    competitors = ["bright data", "smartproxy"]

    report = pipeline.analyze_market(primary, competitors, geo="US")

    # Export trend data
    if "trends" in report and not report["trends"].empty:
        report["trends"].to_csv("trends_over_time.csv")
        print("Trend data exported")

    # Export related queries
    for keyword, data in report.get("related", {}).items():
        if data.get("rising") is not None:
            safe_name = keyword.replace(" ", "_")
            data["rising"].to_csv(f"rising_queries_{safe_name}.csv", index=False)

    # Seasonal analysis
    seasonal = pipeline.seasonal_analysis("web scraping", years=5, geo="US")
    if seasonal:
        print("\nSeasonal Analysis - 'web scraping':")
        print(f"Peak month: {seasonal['peak_month']}")
        print(f"Trough month: {seasonal['trough_month']}")
        # The ratio can be None when the trough month averages zero
        if seasonal["seasonality_ratio"] is not None:
            print(f"Seasonality ratio: {seasonal['seasonality_ratio']:.2f}")

    # Rising topics
    rising = pipeline.find_rising_topics(primary, geo="US")
    if not rising.empty:
        rising.to_csv("rising_topics.csv", index=False)
        print(f"\nFound {len(rising)} rising topics")
        print(rising.head(20).to_string())


if __name__ == "__main__":
    main()

Handling Google’s Rate Limiting

Google Trends is one of the most aggressively rate-limited Google services. Here are strategies that keep your collection running:

Slow and steady pacing. Unlike other scraping targets where 1-2 second delays suffice, Google Trends requires 3-10 second delays between requests. The service monitors burst patterns across short windows.

Proxy pool sizing. For Google Trends, plan on needing at least 5-10 mobile proxies for moderate workloads. Each proxy should handle no more than 5-10 requests before rotation.

Time-of-day optimization. Google’s rate limits appear to be less strict during off-peak hours (UTC 02:00-08:00). Schedule large collection jobs during these windows for higher success rates.
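
If the collector runs continuously, a small guard can gate heavy jobs to those hours (a sketch; the off-peak window itself is an observation from practice, not a documented limit):

```python
from datetime import datetime, timezone


def in_offpeak_window(now=None, start_hour=2, end_hour=8):
    """Return True if the current UTC hour falls inside the off-peak window."""
    now = now or datetime.now(timezone.utc)
    return start_hour <= now.hour < end_hour
```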

Batch keyword grouping. Google Trends allows up to 5 keywords per query. Always group related keywords into single queries to minimize the total number of API calls needed.
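
The batching used earlier in the pipeline can be factored into a one-line helper:

```python
def batch_keywords(keywords, size=5):
    """Split a keyword list into groups of at most `size`
    (Google Trends' per-query keyword limit)."""
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]
```

Twelve keywords therefore cost three queries (batches of 5, 5, and 2) instead of twelve.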

Request spacing variation. Use non-uniform delays. Instead of sleeping exactly 5 seconds between requests, use random.uniform(3, 8) to avoid creating detectable patterns.
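
A tiny helper captures that jitter (the 3-8 second bounds are illustrative, matching the pacing guidance above):

```python
import random
import time


def jittered_sleep(low=3.0, high=8.0):
    """Sleep for a random duration in [low, high] and return it,
    so pauses never form a fixed, detectable cadence."""
    delay = random.uniform(low, high)
    time.sleep(delay)
    return delay
```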

Advanced Use Cases

Competitor Monitoring

Track how interest in competitor brands changes relative to your own:

def monitor_brand_competition(collector, brands, timeframe="today 3-m"):
    """Track relative brand interest over time."""
    if len(brands) > 5:
        # Compare each brand against the first as baseline
        baseline = brands[0]
        comparisons = []
        for brand in brands[1:]:
            df = collector.get_interest_over_time(
                [baseline, brand], timeframe=timeframe
            )
            if not df.empty:
                comparisons.append(df)
            time.sleep(random.uniform(5, 10))
        return comparisons
    else:
        return collector.get_interest_over_time(brands, timeframe=timeframe)
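
When more than five brands are split across queries through a shared baseline, each result frame sits on its own 0-100 scale. They can be stitched onto one scale by matching the baseline column across frames. A sketch (`rescale_to_baseline` is a hypothetical helper, and linear rescaling is an approximation since Trends rounds its scores):

```python
import pandas as pd


def rescale_to_baseline(frames, baseline):
    """Put trend frames that share a baseline keyword onto one scale.

    Each frame contains the baseline column plus other keywords. Frames
    are scaled so every baseline mean matches the first frame's, making
    the non-baseline columns roughly comparable across frames.
    """
    ref_mean = frames[0][baseline].mean()
    scaled = [frames[0]]
    for df in frames[1:]:
        factor = ref_mean / df[baseline].mean()
        scaled.append(df * factor)
    combined = pd.concat(scaled, axis=1)
    # Drop the duplicate baseline columns introduced by concat
    return combined.loc[:, ~combined.columns.duplicated()]
```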

Content Calendar Planning

Use seasonal data to time content publication for maximum search demand:

def build_content_calendar(pipeline, topic_keywords):
    """Identify optimal months for publishing content on each topic."""
    calendar = {}

    for keyword in topic_keywords:
        seasonal = pipeline.seasonal_analysis(keyword, years=3)
        if seasonal:
            calendar[keyword] = {
                "best_publish_month": seasonal["peak_month"] - 1 or 12,
                "peak_demand_month": seasonal["peak_month"],
                "avoid_month": seasonal["trough_month"],
            }
        time.sleep(random.uniform(5, 10))

    return calendar

Publishing content one month before peak demand gives it time to be indexed and ranking before search volume peaks, a strategy that pairs well with SEO-proxy-powered rank tracking.

Data Quality Considerations

Normalization awareness. Google Trends scores are relative, not absolute. A score of 50 for “proxy server” and 50 for “web hosting” does not mean they have equal search volume. The scores are only comparable within the same query.

Geographic granularity. At the country level, Google Trends provides reliable data. At the city or metro level, data becomes sparse and less reliable for niche topics.

Category filtering. Use Google Trends categories to disambiguate keywords. “Python” in the programming category returns different data than “Python” without category filtering.

Short-term noise. Daily data contains significant noise. For trend analysis, weekly or monthly aggregation provides cleaner signals.
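
That aggregation is one call in pandas (assuming the frame has a `DatetimeIndex`, as pytrends returns):

```python
import pandas as pd


def weekly_average(df):
    """Resample daily trend data to weekly means for a cleaner signal."""
    return df.resample("W").mean()
```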

Conclusion

Google Trends is an underutilized resource for market research that becomes dramatically more powerful with automated collection. The combination of pytrends for structured access and mobile proxy rotation for rate limit management enables comprehensive trend analysis across hundreds of keywords and geographic regions.

For related data collection techniques, explore our SEO proxy guides and web scraping tutorials. The proxy glossary provides definitions for proxy-related terms mentioned in this guide.

