How to Scrape Google Play Reviews

Google Play Store hosts over 3.5 million Android apps with billions of user reviews. For mobile app developers, product teams, and competitive analysts, Play Store review data provides critical feedback about user experience and competitor apps.

What Data Can You Extract?

  • App metadata (name, developer, category, price, rating)
  • User reviews (text, rating, date, helpful count, developer reply)
  • Version history and changelogs
  • Download counts and installs
  • In-app purchase details
  • Similar apps recommendations
  • Developer contact information

Example JSON Output

{
  "app_id": "com.example.proxyapp",
  "name": "Proxy Manager Pro",
  "developer": "TechCorp Inc.",
  "rating": 4.3,
  "reviews_count": 25432,
  "installs": "1,000,000+",
  "category": "Tools",
  "reviews": [{
    "review_id": "gp:abc123",
    "text": "Works great for rotating proxies automatically...",
    "rating": 5,
    "author": "John D.",
    "date": "2026-02-28",
    "thumbs_up": 23,
    "developer_reply": "Thank you for the feedback!"
  }]
}

Prerequisites

pip install google-play-scraper requests beautifulsoup4

Method 1: Using google-play-scraper Library

from google_play_scraper import app, reviews, search, Sort
import json
import time

class PlayStoreScraper:
    def __init__(self, country="us", language="en"):
        self.country = country
        self.language = language

    def search_apps(self, query, n_hits=30):
        results = search(query, lang=self.language, country=self.country, n_hits=n_hits)
        return [{
            "app_id": r["appId"],
            "title": r["title"],
            "developer": r.get("developer"),
            "rating": r.get("score"),
            "installs": r.get("installs"),
            "price": r.get("price"),
            "icon": r.get("icon"),
        } for r in results]

    def get_app_details(self, app_id):
        details = app(app_id, lang=self.language, country=self.country)
        return {
            "app_id": details["appId"],
            "title": details["title"],
            "developer": details["developer"],
            "rating": details["score"],
            "ratings_count": details["ratings"],
            "reviews_count": details["reviews"],
            "installs": details["installs"],
            "price": details["price"],
            "free": details["free"],
            "category": details["genre"],
            "description": details["description"][:500],
            "version": details.get("version"),
            "updated": details.get("updated"),
            "content_rating": details.get("contentRating"),
        }

    def get_reviews(self, app_id, count=200, sort=Sort.NEWEST):
        result, continuation_token = reviews(
            app_id, lang=self.language, country=self.country,
            sort=sort, count=count
        )

        return [{
            "review_id": r["reviewId"],
            "author": r["userName"],
            "rating": r["score"],
            "text": r["content"],
            "date": str(r["at"]),
            "thumbs_up": r["thumbsUpCount"],
            "developer_reply": r.get("replyContent"),
            "app_version": r.get("reviewCreatedVersion"),
        } for r in result]

    def get_all_reviews(self, app_id, max_count=1000):
        all_reviews = []
        result, token = reviews(
            app_id, lang=self.language, country=self.country,
            sort=Sort.NEWEST, count=min(max_count, 200)
        )
        all_reviews.extend(result)

        while token and len(all_reviews) < max_count:
            result, token = reviews(
                app_id, lang=self.language, country=self.country,
                sort=Sort.NEWEST, count=200, continuation_token=token
            )
            all_reviews.extend(result)
            time.sleep(1)

        return all_reviews[:max_count]


# Usage
scraper = PlayStoreScraper(country="us")
apps = scraper.search_apps("proxy vpn", n_hits=10)

for a in apps[:3]:
    details = scraper.get_app_details(a["app_id"])
    reviews_data = scraper.get_reviews(a["app_id"], count=50)
    print(f"{details['title']} ({details['rating']}): {len(reviews_data)} reviews")
    time.sleep(1)

Handling Anti-Bot Protections

Google Play has moderate bot detection. The google-play-scraper library handles most of this, but for high-volume scraping:

  1. Rate limiting: 1-2 requests per second
  2. Proxy rotation: Every 50-100 requests
  3. User agent rotation: Vary user agents
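These steps can be sketched as a small helper. The user-agent strings and delay bounds below are illustrative placeholders, not values the library requires:

```python
import itertools
import random
import time

# Illustrative desktop user agents; use a maintained list in production.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

class PoliteSession:
    """Cycles user agents and enforces a randomized delay between requests."""

    def __init__(self, min_delay=0.5, max_delay=1.0):
        self._agents = itertools.cycle(USER_AGENTS)
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._last_request = 0.0

    def next_headers(self):
        # Sleep just enough to stay within the 1-2 requests/second budget.
        elapsed = time.monotonic() - self._last_request
        wait = random.uniform(self.min_delay, self.max_delay) - elapsed
        if wait > 0:
            time.sleep(wait)
        self._last_request = time.monotonic()
        return {"User-Agent": next(self._agents)}
```

Pass the returned headers dict into each `requests.get()` call so consecutive requests are both throttled and varied.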

Proxy Recommendations

Proxy Type     Success Rate   Best For
Residential    85-90%         High-volume scraping
Datacenter     60-70%         Moderate volume
Mobile         90%+           Highest success rate

Use residential proxies for high-volume scraping.
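A rotation scheme matching the "every 50-100 requests" guideline can be sketched as follows; the proxy URLs are placeholders for your provider's gateway endpoints:

```python
import itertools

# Placeholder endpoints; substitute your residential proxy provider's gateways.
PROXIES = [
    "http://user:pass@res-proxy-1.example.com:8080",
    "http://user:pass@res-proxy-2.example.com:8080",
    "http://user:pass@res-proxy-3.example.com:8080",
]

class ProxyRotator:
    """Hands out the same proxy for `rotate_every` requests, then advances."""

    def __init__(self, proxies, rotate_every=50):
        self._pool = itertools.cycle(proxies)
        self.rotate_every = rotate_every
        self._count = 0
        self._current = next(self._pool)

    def get(self):
        if self._count and self._count % self.rotate_every == 0:
            self._current = next(self._pool)
        self._count += 1
        # Returned in the dict shape that `requests` expects for proxies=...
        return {"http": self._current, "https": self._current}
```

Each call returns a `requests`-style proxies dict, so it drops straight into `requests.get(url, proxies=rotator.get())`.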

Legal Considerations

  1. Google ToS: Automated scraping of Google Play may violate Terms of Service.
  2. Review Content: Reviews are user-generated and copyrighted.
  3. Developer Data: Developer contact info may be personal data.
  4. Rate Limits: Excessive scraping may trigger temporary blocks.

See our compliance guide.

Method 2: Web Scraping with Selenium

For data beyond what the google-play-scraper library provides:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import json
import time

class PlayStoreSeleniumScraper:
    def __init__(self, proxy=None):
        options = Options()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        if proxy:
            options.add_argument(f"--proxy-server={proxy}")
        self.driver = webdriver.Chrome(options=options)

    def scrape_app_page(self, app_id):
        url = f"https://play.google.com/store/apps/details?id={app_id}&hl=en"
        self.driver.get(url)
        time.sleep(3)

        data = self.driver.execute_script('''
            const result = {};
            result.title = document.querySelector("h1")?.innerText;
            result.developer = document.querySelector("[class*='developer'] span")?.innerText;

            // JSON-LD
            const scripts = document.querySelectorAll('script[type="application/ld+json"]');
            for (const script of scripts) {
                try {
                    const json = JSON.parse(script.textContent);
                    if (json["@type"] === "SoftwareApplication") {
                        result.rating = json.aggregateRating?.ratingValue;
                        result.review_count = json.aggregateRating?.ratingCount;
                        result.price = json.offers?.price;
                        result.category = json.applicationCategory;
                    }
                } catch {}
            }
            return result;
        ''')

        return data

    def close(self):
        self.driver.quit()
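If you already have the page HTML (from requests or Selenium's page_source), the embedded JSON-LD can be parsed without a browser. Here is a stdlib-only sketch; BeautifulSoup from the prerequisites would be more robust against attribute-order changes, and the sample HTML below is a trimmed illustration, not real Play Store markup:

```python
import json
import re

def parse_app_jsonld(html):
    """Extract SoftwareApplication JSON-LD metadata from page HTML."""
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    for raw in re.findall(pattern, html, re.S):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if data.get("@type") == "SoftwareApplication":
            agg = data.get("aggregateRating", {})
            return {
                "name": data.get("name"),
                "rating": agg.get("ratingValue"),
                "review_count": agg.get("ratingCount"),
                "category": data.get("applicationCategory"),
            }
    return None

# Trimmed, illustrative sample of the markup the real page embeds
sample_html = '''<html><head><script type="application/ld+json">
{"@type": "SoftwareApplication", "name": "Proxy Manager Pro",
 "applicationCategory": "TOOLS",
 "aggregateRating": {"ratingValue": 4.3, "ratingCount": 25432}}
</script></head><body></body></html>'''

print(parse_app_jsonld(sample_html))
```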

Advanced Scraping Techniques

Batch Processing Multiple Apps

import time
import json

def batch_scrape_apps(app_ids, scraper, delay=2):
    results = []
    for i, app_id in enumerate(app_ids):
        try:
            details = scraper.get_app_details(app_id)
            reviews = scraper.get_reviews(app_id, count=50)
            results.append({
                "app": details,
                "reviews": reviews,
                "review_count": len(reviews)
            })
            print(f"[{i+1}/{len(app_ids)}] {details.get('title', 'Unknown')}")
        except Exception as e:
            print(f"Error with {app_id}: {e}")
        time.sleep(delay)
    return results

Sentiment Analysis on Reviews

def analyze_review_sentiment(reviews):
    total = len(reviews)
    if total == 0:
        return {}

    ratings = [r.get("rating") or r.get("score") or 0 for r in reviews]
    avg_rating = sum(ratings) / total

    positive = sum(1 for r in ratings if r >= 4)
    negative = sum(1 for r in ratings if r <= 2)
    neutral = total - positive - negative

    return {
        "total_reviews": total,
        "average_rating": round(avg_rating, 2),
        "positive_pct": round(positive / total * 100, 1),
        "negative_pct": round(negative / total * 100, 1),
        "neutral_pct": round(neutral / total * 100, 1),
    }
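A simple extension of rating-based sentiment is to surface what negative reviewers actually mention. This sketch uses a deliberately minimal stopword list and expects the normalized review dicts produced earlier (falling back to the raw `score` key):

```python
import re
from collections import Counter

# Deliberately minimal stopword list; extend for real analysis.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "i", "this", "app"}

def top_complaint_terms(reviews, n=5):
    """Most frequent words in the text of 1-2 star reviews."""
    words = []
    for r in reviews:
        if (r.get("rating") or r.get("score") or 0) <= 2:
            words += [
                w for w in re.findall(r"[a-z']+", (r.get("text") or "").lower())
                if w not in STOPWORDS and len(w) > 2
            ]
    return Counter(words).most_common(n)

sample = [
    {"rating": 1, "text": "Crashes constantly, crashes on startup"},
    {"rating": 2, "text": "Ads everywhere and it crashes"},
    {"rating": 5, "text": "Great app"},
]
print(top_complaint_terms(sample))  # "crashes" tops the list
```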

Tracking Reviews Over Time

from datetime import datetime, timedelta

def get_recent_reviews(scraper, app_id, days=30):
    all_reviews = scraper.get_all_reviews(app_id, max_count=500)
    cutoff = datetime.now() - timedelta(days=days)

    recent = [r for r in all_reviews if r.get("at") and r["at"] > cutoff]
    return recent
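The raw library dicts returned by get_all_reviews carry `at` as a datetime and `score` as the rating, which makes trend bucketing straightforward. A sketch that averages ratings per ISO week:

```python
from collections import defaultdict
from datetime import datetime

def weekly_rating_trend(reviews):
    """Average rating per ISO week, given raw dicts with 'at' and 'score'."""
    buckets = defaultdict(list)
    for r in reviews:
        when = r.get("at")
        score = r.get("score")
        if isinstance(when, datetime) and score is not None:
            year, week, _ = when.isocalendar()
            buckets[(year, week)].append(score)
    # Chronologically ordered {(year, week): average} mapping
    return {k: round(sum(v) / len(v), 2) for k, v in sorted(buckets.items())}
```

Plotting these weekly averages around each release date is a quick way to spot versions that hurt sentiment.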

Handling Google Play Anti-Bot Protections

1. Rate Limiting

The google-play-scraper library handles most rate limiting internally. For custom scrapers, implement 1-2 second delays between requests.

2. Regional Content Variations

Google Play serves different content, ratings, and even app availability by region. Always specify the country and language parameters:

# Common region codes
regions = {
    "US": ("us", "en"),
    "UK": ("gb", "en"),
    "Japan": ("jp", "ja"),
    "India": ("in", "en"),
    "Brazil": ("br", "pt"),
    "Germany": ("de", "de"),
}

3. Continuation Tokens

Google Play uses continuation tokens for pagination. The google-play-scraper library handles this automatically, but for custom implementations:

# Manual pagination with continuation tokens
from google_play_scraper import reviews

result, token = reviews(app_id, count=200)
while token:
    more, token = reviews(app_id, count=200, continuation_token=token)
    result.extend(more)
    time.sleep(1)

Frequently Asked Questions

Does Google Play have an official API?

Google does not offer a public API for Play Store reviews. The google-play-scraper Python library reverse-engineers Google’s internal APIs to provide structured access. Google’s official Play Developer API is only available to app publishers for their own apps.

How many reviews can I scrape per app?

Using the google-play-scraper library, you can typically retrieve up to 10,000-15,000 reviews per app through pagination. The practical limit depends on how many reviews the app has.

Can I get developer replies to reviews?

Yes, the google-play-scraper library returns developer replies (replyContent field) when available. This is valuable for analyzing customer service patterns.

How do I compare reviews across countries?

Specify different country codes when initializing the scraper. Reviews, ratings, and even app availability can vary significantly by region.

Is scraping Google Play legal?

Google’s Terms of Service restrict automated access. The google-play-scraper library operates in a legal gray area. For production use cases, consult legal counsel and consider Google’s Play Developer API for your own apps.

Data Export and Storage

For production-level scraping pipelines, store reviews in a database for trend analysis and deduplication:

import sqlite3
import json
import csv

class PlayStoreDataStore:
    def __init__(self, db_path="playstore_reviews.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS reviews
            (review_id TEXT PRIMARY KEY, app_id TEXT, author TEXT,
             rating INTEGER, text TEXT, date TEXT, thumbs_up INTEGER,
             developer_reply TEXT, app_version TEXT,
             scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')

    def upsert_reviews(self, app_id, reviews):
        new_count = 0
        for r in reviews:
            try:
                self.conn.execute(
                    """INSERT OR REPLACE INTO reviews
                    (review_id, app_id, author, rating, text, date, thumbs_up, developer_reply, app_version)
                    VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                    (r.get("review_id") or r.get("reviewId"), app_id,
                     r.get("author") or r.get("userName"), r.get("rating") or r.get("score"),
                     r.get("text") or r.get("content"), str(r.get("date") or r.get("at", "")),
                     r.get("thumbs_up") or r.get("thumbsUpCount", 0),
                     r.get("developer_reply") or r.get("replyContent"),
                     r.get("app_version") or r.get("reviewCreatedVersion"))
                )
                new_count += 1
            except Exception as e:
                print(f"Error storing review: {e}")
        self.conn.commit()
        return new_count

    def export_csv(self, app_id, output_path):
        cursor = self.conn.execute(
            "SELECT * FROM reviews WHERE app_id = ? ORDER BY date DESC", (app_id,)
        )
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(rows)
        print(f"Exported {len(rows)} reviews to {output_path}")

    def get_rating_distribution(self, app_id):
        cursor = self.conn.execute(
            "SELECT rating, COUNT(*) FROM reviews WHERE app_id = ? GROUP BY rating ORDER BY rating",
            (app_id,)
        )
        return dict(cursor.fetchall())
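Because `review_id` is the primary key and the upsert uses INSERT OR REPLACE, re-scraping the same pages never creates duplicate rows. A self-contained sketch of that deduplication behavior against a reduced version of the same schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE reviews
    (review_id TEXT PRIMARY KEY, app_id TEXT, rating INTEGER)""")

rows = [("gp:abc123", "com.example.app", 5),
        ("gp:def456", "com.example.app", 2)]
for _ in range(2):  # simulate scraping the same page twice
    conn.executemany("INSERT OR REPLACE INTO reviews VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
print(count)  # primary key dedupes: still 2 rows
```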

Comparing Apps in the Same Category

One of the most valuable use cases for Play Store scraping is competitive analysis. Compare multiple apps side by side to identify strengths and weaknesses:

def compare_apps(scraper, app_ids, review_count=100):
    comparison = []
    for app_id in app_ids:
        details = scraper.get_app_details(app_id)
        reviews_data = scraper.get_reviews(app_id, count=review_count)

        # Calculate review metrics
        ratings = [r.get("rating", 0) for r in reviews_data]
        avg = sum(ratings) / len(ratings) if ratings else 0
        negative = sum(1 for r in ratings if r <= 2)

        comparison.append({
            "app": details["title"],
            "overall_rating": details["rating"],
            "total_reviews": details.get("reviews_count", 0),
            "installs": details.get("installs", "N/A"),
            "recent_avg_rating": round(avg, 2),
            "recent_negative_pct": round(negative / len(ratings) * 100, 1) if ratings else 0,
            "has_dev_replies": any(r.get("developer_reply") for r in reviews_data),
        })
        time.sleep(2)

    return comparison

This comparison approach helps product teams identify where competitors fall short and where opportunities exist for differentiation.

Conclusion

The google-play-scraper library provides the most convenient access to Google Play Store data. For large-scale operations, combine the library with proxy rotation and careful rate limiting. Storing results in a database enables longitudinal analysis of review trends, developer responsiveness, and competitive positioning across your app category.

Visit dataresearchtools.com for proxy solutions and our app store optimization guides.

