How to Scrape Google Play Reviews
Google Play Store hosts over 3.5 million Android apps with billions of user reviews. For mobile app developers, product teams, and competitive analysts, Play Store review data provides critical feedback about user experience and competitor apps.
What Data Can You Extract?
- App metadata (name, developer, category, price, rating)
- User reviews (text, rating, date, helpful count, developer reply)
- Version history and changelogs
- Download counts and installs
- In-app purchase details
- Similar apps recommendations
- Developer contact information
Example JSON Output
{
  "app_id": "com.example.proxyapp",
  "name": "Proxy Manager Pro",
  "developer": "TechCorp Inc.",
  "rating": 4.3,
  "reviews_count": 25432,
  "installs": "1,000,000+",
  "category": "Tools",
  "reviews": [{
    "review_id": "gp:abc123",
    "text": "Works great for rotating proxies automatically...",
    "rating": 5,
    "author": "John D.",
    "date": "2026-02-28",
    "thumbs_up": 23,
    "developer_reply": "Thank you for the feedback!"
  }]
}
Prerequisites
pip install google-play-scraper requests beautifulsoup4
Method 1: Using the google-play-scraper Library
from google_play_scraper import app, reviews, search, Sort
import json
import time

class PlayStoreScraper:
    def __init__(self, country="us", language="en"):
        self.country = country
        self.language = language

    def search_apps(self, query, n_hits=30):
        results = search(query, lang=self.language, country=self.country, n_hits=n_hits)
        return [{
            "app_id": r["appId"],
            "title": r["title"],
            "developer": r.get("developer"),
            "rating": r.get("score"),
            "installs": r.get("installs"),
            "price": r.get("price"),
            "icon": r.get("icon"),
        } for r in results]

    def get_app_details(self, app_id):
        details = app(app_id, lang=self.language, country=self.country)
        return {
            "app_id": details["appId"],
            "title": details["title"],
            "developer": details["developer"],
            "rating": details["score"],
            "ratings_count": details["ratings"],
            "reviews_count": details["reviews"],
            "installs": details["installs"],
            "price": details["price"],
            "free": details["free"],
            "category": details["genre"],
            "description": details["description"][:500],
            "version": details.get("version"),
            "updated": details.get("updated"),
            "content_rating": details.get("contentRating"),
        }

    def get_reviews(self, app_id, count=200, sort=Sort.NEWEST):
        result, continuation_token = reviews(
            app_id, lang=self.language, country=self.country,
            sort=sort, count=count
        )
        return [{
            "review_id": r["reviewId"],
            "author": r["userName"],
            "rating": r["score"],
            "text": r["content"],
            "date": str(r["at"]),
            "thumbs_up": r["thumbsUpCount"],
            "developer_reply": r.get("replyContent"),
            "app_version": r.get("reviewCreatedVersion"),
        } for r in result]

    def get_all_reviews(self, app_id, max_count=1000):
        all_reviews = []
        result, token = reviews(
            app_id, lang=self.language, country=self.country,
            sort=Sort.NEWEST, count=min(max_count, 200)
        )
        all_reviews.extend(result)
        while token and len(all_reviews) < max_count:
            result, token = reviews(
                app_id, lang=self.language, country=self.country,
                sort=Sort.NEWEST, count=200, continuation_token=token
            )
            all_reviews.extend(result)
            time.sleep(1)
        return all_reviews[:max_count]

# Usage
scraper = PlayStoreScraper(country="us")
apps = scraper.search_apps("proxy vpn", n_hits=10)
for a in apps[:3]:
    details = scraper.get_app_details(a["app_id"])
    reviews_data = scraper.get_reviews(a["app_id"], count=50)
    print(f"{details['title']} ({details['rating']}): {len(reviews_data)} reviews")
    time.sleep(1)
Handling Anti-Bot Protections
Google Play has moderate bot detection. The google-play-scraper library handles most of this, but for high-volume scraping:
- Rate limiting: 1-2 requests per second
- Proxy rotation: Every 50-100 requests
- User agent rotation: Vary user agents
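The three measures above can be combined in one small request policy. The sketch below is illustrative, not part of the google-play-scraper library: the proxy URLs and user-agent strings are placeholders you would replace with your own pool.

```python
import itertools
import random
import time

# Hypothetical proxy pool and user agents -- substitute your own values.
PROXIES = ["http://proxy1.example:8000", "http://proxy2.example:8000"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

class RequestPolicy:
    """Rotates proxies every N requests and enforces a minimum delay."""

    def __init__(self, proxies, rotate_every=50, min_delay=0.5):
        self.proxies = itertools.cycle(proxies)
        self.rotate_every = rotate_every
        self.min_delay = min_delay
        self.current = next(self.proxies)
        self.count = 0

    def next_request(self):
        """Return (proxy, headers) to use for the next outgoing request."""
        self.count += 1
        if self.count % self.rotate_every == 0:
            self.current = next(self.proxies)  # rotate to the next proxy
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        time.sleep(self.min_delay)  # simple 1-2 req/s rate limit
        return self.current, headers
```

Each scraping call would then ask the policy for its proxy and headers instead of hardcoding them.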
Proxy Recommendations
| Proxy Type | Success Rate | Best For |
|---|---|---|
| Residential | 85-90% | High volume |
| Datacenter | 60-70% | Moderate volume |
| Mobile | 90%+ | Best success |
Use residential proxies for high-volume scraping.
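For custom HTTP scrapers, routing through a residential endpoint is a one-line `requests` session setting. This is a minimal sketch; the proxy URL is a placeholder for your provider's gateway.

```python
import requests

# Hypothetical residential proxy gateway -- replace with your provider's URL.
PROXY = "http://user:pass@residential.example.com:8000"

def make_session(proxy=PROXY):
    """requests.Session preconfigured to route all traffic through one proxy."""
    session = requests.Session()
    session.proxies = {"http": proxy, "https": proxy}
    session.headers["Accept-Language"] = "en-US,en;q=0.9"
    return session
```

Swap the proxy argument whenever your rotation policy says it is time to move on.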
Legal Considerations
- Google ToS: Automated scraping of Google Play may violate Terms of Service.
- Review Content: Reviews are user-generated and copyrighted.
- Developer Data: Developer contact info may be personal data.
- Rate Limits: Excessive scraping may trigger temporary blocks.
See our compliance guide.
Method 2: Web Scraping with Selenium
For data beyond what the google-play-scraper library provides:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import json
import time

class PlayStoreSeleniumScraper:
    def __init__(self, proxy=None):
        options = Options()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        if proxy:
            options.add_argument(f"--proxy-server={proxy}")
        self.driver = webdriver.Chrome(options=options)

    def scrape_app_page(self, app_id):
        url = f"https://play.google.com/store/apps/details?id={app_id}&hl=en"
        self.driver.get(url)
        time.sleep(3)
        data = self.driver.execute_script('''
            const result = {};
            result.title = document.querySelector("h1")?.innerText;
            result.developer = document.querySelector("[class*='developer'] span")?.innerText;
            // JSON-LD
            const scripts = document.querySelectorAll('script[type="application/ld+json"]');
            for (const script of scripts) {
                try {
                    const json = JSON.parse(script.textContent);
                    if (json["@type"] === "SoftwareApplication") {
                        result.rating = json.aggregateRating?.ratingValue;
                        result.review_count = json.aggregateRating?.ratingCount;
                        result.price = json.offers?.price;
                        result.category = json.applicationCategory;
                    }
                } catch {}
            }
            return result;
        ''')
        return data

    def close(self):
        self.driver.quit()
Advanced Scraping Techniques
Batch Processing Multiple Apps
import time
import json

def batch_scrape_apps(app_ids, scraper, delay=2):
    results = []
    for i, app_id in enumerate(app_ids):
        try:
            details = scraper.get_app_details(app_id)
            reviews = scraper.get_reviews(app_id, count=50)
            results.append({
                "app": details,
                "reviews": reviews,
                "review_count": len(reviews)
            })
            print(f"[{i+1}/{len(app_ids)}] {details.get('title', 'Unknown')}")
        except Exception as e:
            print(f"Error with {app_id}: {e}")
        time.sleep(delay)
    return results
Sentiment Analysis on Reviews
def analyze_review_sentiment(reviews):
    # Expects raw google-play-scraper review dicts (rating lives in "score")
    total = len(reviews)
    if total == 0:
        return {}
    ratings = [r.get("score", 0) for r in reviews]
    avg_rating = sum(ratings) / total
    positive = sum(1 for r in ratings if r >= 4)
    negative = sum(1 for r in ratings if r <= 2)
    neutral = total - positive - negative
    return {
        "total_reviews": total,
        "average_rating": round(avg_rating, 2),
        "positive_pct": round(positive / total * 100, 1),
        "negative_pct": round(negative / total * 100, 1),
        "neutral_pct": round(neutral / total * 100, 1),
    }
Tracking Reviews Over Time
from datetime import datetime, timedelta

def get_recent_reviews(scraper, app_id, days=30):
    all_reviews = scraper.get_all_reviews(app_id, max_count=500)
    cutoff = datetime.now() - timedelta(days=days)
    recent = [r for r in all_reviews if r.get("at") and r["at"] > cutoff]
    return recent
Handling Google Play Anti-Bot Protections
1. Rate Limiting
The google-play-scraper library handles most rate limiting internally. For custom scrapers, implement 1-2 second delays between requests.
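A generic wrapper can enforce that delay and also retry with exponential backoff when a request is rejected. This is a sketch, not a library feature: `polite_call` is a hypothetical helper you would wrap around calls such as `reviews()` or `app()`, and the delay values are illustrative.

```python
import random
import time

def polite_call(fn, *args, retries=3, base_delay=1.5, **kwargs):
    """Call fn with a randomized post-call delay, retrying with backoff.

    fn can be any scraper call, e.g. reviews() from google-play-scraper.
    """
    for attempt in range(retries):
        try:
            result = fn(*args, **kwargs)
            time.sleep(base_delay + random.uniform(0, 0.5))  # ~1.5-2s pause
            return result
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * 2 ** (attempt + 1))  # backoff: 3s, 6s, ...
```

For example, `polite_call(reviews, app_id, count=200)` behaves like a direct call but spaces requests out automatically.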
2. Regional Content Variations
Google Play serves different content, ratings, and even app availability by region. Always specify the country and language parameters:
# Common region codes
regions = {
    "US": ("us", "en"),
    "UK": ("gb", "en"),
    "Japan": ("jp", "ja"),
    "India": ("in", "en"),
    "Brazil": ("br", "pt"),
    "Germany": ("de", "de"),
}
3. Continuation Tokens
Google Play uses continuation tokens for pagination. The google-play-scraper library handles this automatically, but for custom implementations:
# Manual pagination with continuation tokens
result, token = reviews(app_id, count=200)
while token:
    more, token = reviews(app_id, count=200, continuation_token=token)
    result.extend(more)
    time.sleep(1)
Frequently Asked Questions
Does Google Play have an official API?
Google does not offer a public API for Play Store reviews. The google-play-scraper Python library reverse-engineers Google’s internal APIs to provide structured access. Google’s official Play Developer API is only available to app publishers for their own apps.
How many reviews can I scrape per app?
Using the google-play-scraper library, you can typically retrieve up to 10,000-15,000 reviews per app through pagination. The practical limit depends on how many reviews the app has.
Can I get developer replies to reviews?
Yes, the google-play-scraper library returns developer replies (replyContent field) when available. This is valuable for analyzing customer service patterns.
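One way to quantify that responsiveness is a reply-rate summary over the raw review dicts. A minimal sketch (the helper name is our own):

```python
def reply_stats(reviews):
    """Share of reviews that received a developer reply.

    Works on raw google-play-scraper review dicts, where the reply
    text lives in the "replyContent" field (None when absent).
    """
    if not reviews:
        return {"replied": 0, "reply_rate_pct": 0.0}
    replied = [r for r in reviews if r.get("replyContent")]
    return {
        "replied": len(replied),
        "reply_rate_pct": round(len(replied) / len(reviews) * 100, 1),
    }
```

Comparing reply rates across competitors gives a rough read on how actively each developer manages feedback.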
How do I compare reviews across countries?
Specify different country codes when initializing the scraper. Reviews, ratings, and even app availability can vary significantly by region.
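A per-country loop makes that comparison concrete. In this sketch, `fetch_details(app_id, country)` is a hypothetical injected function you would implement as a thin wrapper over `google_play_scraper.app(app_id, country=country)`; injecting it keeps the loop testable without network access.

```python
import time

def rating_by_country(app_id, fetch_details, countries=("us", "gb", "jp")):
    """Collect one app's rating from several regional storefronts."""
    ratings = {}
    for country in countries:
        details = fetch_details(app_id, country)
        ratings[country] = details.get("score")  # library's rating field
        time.sleep(0.2)  # polite delay between storefront requests; tune as needed
    return ratings
```

Differences of a few tenths of a star between storefronts are common and often reflect localization quality rather than the app itself.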
Is scraping Google Play legal?
Google’s Terms of Service restrict automated access. The google-play-scraper library operates in a legal gray area. For production use cases, consult legal counsel and consider Google’s Play Developer API for your own apps.
Data Export and Storage
For production-level scraping pipelines, store reviews in a database for trend analysis and deduplication:
import sqlite3
import json
import csv

class PlayStoreDataStore:
    def __init__(self, db_path="playstore_reviews.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS reviews
            (review_id TEXT PRIMARY KEY, app_id TEXT, author TEXT,
             rating INTEGER, text TEXT, date TEXT, thumbs_up INTEGER,
             developer_reply TEXT, app_version TEXT,
             scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')

    def upsert_reviews(self, app_id, reviews):
        new_count = 0
        for r in reviews:
            try:
                self.conn.execute(
                    """INSERT OR REPLACE INTO reviews
                       (review_id, app_id, author, rating, text, date, thumbs_up, developer_reply, app_version)
                       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
                    (r.get("review_id") or r.get("reviewId"), app_id,
                     r.get("author") or r.get("userName"), r.get("rating") or r.get("score"),
                     r.get("text") or r.get("content"), str(r.get("date") or r.get("at", "")),
                     r.get("thumbs_up") or r.get("thumbsUpCount", 0),
                     r.get("developer_reply") or r.get("replyContent"),
                     r.get("app_version") or r.get("reviewCreatedVersion"))
                )
                new_count += 1
            except Exception as e:
                print(f"Error storing review: {e}")
        self.conn.commit()
        return new_count

    def export_csv(self, app_id, output_path):
        cursor = self.conn.execute(
            "SELECT * FROM reviews WHERE app_id = ? ORDER BY date DESC", (app_id,)
        )
        rows = cursor.fetchall()
        columns = [desc[0] for desc in cursor.description]
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            writer.writerows(rows)
        print(f"Exported {len(rows)} reviews to {output_path}")

    def get_rating_distribution(self, app_id):
        cursor = self.conn.execute(
            "SELECT rating, COUNT(*) FROM reviews WHERE app_id = ? GROUP BY rating ORDER BY rating",
            (app_id,)
        )
        return dict(cursor.fetchall())
Comparing Apps in the Same Category
One of the most valuable use cases for Play Store scraping is competitive analysis. Compare multiple apps side by side to identify strengths and weaknesses:
def compare_apps(scraper, app_ids, review_count=100):
    comparison = []
    for app_id in app_ids:
        details = scraper.get_app_details(app_id)
        reviews_data = scraper.get_reviews(app_id, count=review_count)
        # Calculate review metrics. Note: get_reviews returns normalized
        # keys ("rating", "developer_reply"), not the library's raw names.
        ratings = [r.get("rating", 0) for r in reviews_data]
        avg = sum(ratings) / len(ratings) if ratings else 0
        negative = sum(1 for r in ratings if r <= 2)
        comparison.append({
            "app": details["title"],
            "overall_rating": details["rating"],
            "total_reviews": details.get("reviews_count", 0),
            "installs": details.get("installs", "N/A"),
            "recent_avg_rating": round(avg, 2),
            "recent_negative_pct": round(negative / len(ratings) * 100, 1) if ratings else 0,
            "has_dev_replies": any(r.get("developer_reply") for r in reviews_data),
        })
        time.sleep(2)
    return comparison
This comparison approach helps product teams identify where competitors fall short and where opportunities exist for differentiation.
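To eyeball the results, a small helper can render the comparison rows as an aligned text table. This is a sketch of our own (the column choices are illustrative); it consumes the dicts produced by compare_apps above.

```python
def format_comparison(comparison):
    """Render compare_apps() output as aligned text rows."""
    header = f"{'App':<25}{'Rating':>8}{'Recent avg':>12}{'Neg %':>8}"
    lines = [header]
    for row in comparison:
        lines.append(
            f"{row['app'][:24]:<25}{row['overall_rating']:>8}"
            f"{row['recent_avg_rating']:>12}{row['recent_negative_pct']:>8}"
        )
    return "\n".join(lines)
```

Printing `format_comparison(compare_apps(scraper, app_ids))` gives a quick side-by-side view without exporting anything.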
Conclusion
The google-play-scraper library provides the most convenient access to Google Play Store data. For large-scale operations, combine the library with proxy rotation and careful rate limiting. Storing results in a database enables longitudinal analysis of review trends, developer responsiveness, and competitive positioning across your app category.
Visit dataresearchtools.com for proxy solutions and our app store optimization guides.
Related Reading
- How to Scrape AliExpress Product Data
- How to Scrape Amazon Product Reviews in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix