How to Scrape Google Play Store Reviews and Install Counts (2026)

Google Play Store has over 3.5 million apps. For competitive intelligence, market research, or app analytics, Play Store reviews and install counts are among the most valuable data you can collect. The problem is that Google actively blocks scraping: rate limits, CAPTCHAs, and a shifting API surface make naive approaches fail within minutes. This guide covers what actually works in 2026: the unofficial internal API, third-party libraries, and browser automation fallbacks.

What data you can actually get

Install counts on Google Play are reported in ranges (“1M+” or “500K+”), not exact numbers. That’s annoying but workable for most use cases. Reviews, on the other hand, are rich: star rating, text, reviewer name, date, device type, and app version. Here’s what’s available per app listing:

Field                      Source                      Exact or Range
Install count              HTML / internal API         Range only (e.g., “10M+”)
Rating score               HTML / internal API         Exact (4.3, etc.)
Review count               HTML / internal API         Exact
Review text                Internal API (paginated)    Exact
Review date                Internal API                Exact timestamp
App version reviewed on    Internal API                Exact
Developer reply            Internal API                Exact

If your use case needs exact install counts, you’ll need to cross-reference with third-party analytics providers like AppFollow, Sensor Tower, or data.ai, because Play’s own data only exposes ranges.
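Since installs only come as range labels, it usually helps to turn them into a numeric lower bound before you sort or compare apps. Here is a minimal sketch of that conversion; the function name installs_lower_bound is made up for illustration, and it assumes the label looks like “10M+”, “500K+”, or “1,000,000,000+”:

```python
import re

# Multipliers for the abbreviated labels shown on listing pages.
_SUFFIX = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def installs_lower_bound(label: str) -> int:
    """Convert '10M+', '500K+' or '1,000,000,000+' to an integer lower bound."""
    cleaned = label.strip().rstrip("+").replace(",", "").upper()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognised installs label: {label!r}")
    number, suffix = match.groups()
    return int(float(number) * _SUFFIX.get(suffix, 1))
```

Treat the result as a floor, not an estimate: an app labelled “10M+” could sit anywhere between that bucket and the next one.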

The google-play-scraper library (fastest path)

The google-play-scraper Python package wraps Google Play’s internal _/PlayStoreUi/data/batchexecute endpoint and handles pagination and parsing for you. It’s the fastest way to get moving:

from google_play_scraper import app, reviews, Sort

# app metadata (includes installs range + rating)
result = app(
    'com.spotify.music',
    lang='en',
    country='us'
)
print(result['installs'])   # "1,000,000,000+"
print(result['score'])      # 4.3
print(result['ratings'])    # 35821944

# paginated reviews
result, continuation_token = reviews(
    'com.spotify.music',
    lang='en',
    country='us',
    sort=Sort.NEWEST,
    count=100,
    filter_score_with=None
)

# keep fetching with the token
result2, next_token = reviews(
    'com.spotify.music',
    continuation_token=continuation_token
)

The library unpacks the nested JSON arrays that Google’s batchexecute endpoint returns, which saves you the pain of doing it manually. You can pull 200-300 reviews per call before Google starts throttling. For full review dumps on a popular app, budget 10-20 requests with 2-3 second delays between them.

One gotcha: the count parameter is a hint, not a guarantee. Google sometimes returns fewer results per page, especially for older apps with sparse reviews.
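Because count is only a hint, a fetch loop should stop on an empty page rather than on a page-size check. The sketch below shows one way to structure that loop. fetch_all and fetch_page are hypothetical names: fetch_page is any callable you supply that takes the previous token (None on the first call) and returns a (reviews, next_token) pair, for example a thin wrapper around reviews() that passes continuation_token once it has one.

```python
import time
from typing import Any, Callable, Optional

Page = tuple[list[dict], Any]  # (reviews on this page, token for the next page)


def fetch_all(
    fetch_page: Callable[[Optional[Any]], Page],
    max_reviews: int = 1000,
    delay_s: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Follow continuation tokens until a page is empty or max_reviews is hit."""
    collected: list[dict] = []
    token = None
    while len(collected) < max_reviews:
        batch, token = fetch_page(token)
        if not batch:  # an empty page means there is nothing left to fetch
            break
        collected.extend(batch)
        sleep(delay_s)  # stay polite between requests
    return collected[:max_reviews]
```

Passing sleep in as a parameter is just a convenience so the loop can be exercised without real delays.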

Hitting the internal API directly

If you want more control (or the library breaks after a Play Store update), you can hit the batchexecute endpoint yourself. This is the same approach you’d use when reverse-engineering mobile app APIs for data extraction: identify the real endpoint, replicate the payload, strip out the obfuscation.

The request structure for Play reviews looks like this:

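import json
import requests

url = "https://play.google.com/_/PlayStoreUi/data/batchexecute"
# f.req is a JSON envelope: [[[rpc_id, inner_payload, None, "generic"]]].
# The inner payload is itself a JSON string; treat this one as an example
# and re-capture it from live traffic when Google changes the format.
payload = {
    "f.req": json.dumps([[["UsvDTd","[[null,[[10,[10,50]],true,null,[96,27,4,8,57,30,110,79,11,16,49,1,3,9,12,104,55,56,51,10,34,31,77,49,28,28,7,9,5,10,58,68,45,35,51,51,8,22,45,20,13,47,8,77]],[[\"en\",\"us\"]],null,null,null,[[]]]]",None,"generic"]]])
}
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
r = requests.post(url, data=payload, headers=headers, timeout=30)
r.raise_for_status()  # fail loudly on 429s and other errors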

The response is JSON nested inside a string, and Google prefixes it with )]}' followed by blank lines. You’ll need to strip that prefix, then parse through 2-3 layers of arrays to get to the review data. Annoying, but doable.
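The unwrapping can be sketched roughly like this. It assumes the body is the prefix followed by a single JSON array whose data frames look like ["wrb.fr", rpc_id, "<json string>", ...]; if your captured responses are chunked or shaped differently, adjust accordingly. decode_batchexecute is a hypothetical helper name, not part of any library.

```python
import json

XSSI_PREFIX = ")]}'"  # guard prefix Google puts in front of the JSON body


def decode_batchexecute(body: str) -> list:
    """Return the decoded inner payload of every data frame in the response."""
    text = body.lstrip()
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    envelope = json.loads(text)  # layer 1: the outer array of frames

    payloads = []
    for frame in envelope:
        is_data_frame = (
            isinstance(frame, list)
            and len(frame) > 2
            and frame[0] == "wrb.fr"
            and isinstance(frame[2], str)
        )
        if is_data_frame:
            payloads.append(json.loads(frame[2]))  # layer 2: JSON inside a string
    return payloads
```

From there, the review fields sit at fixed positions inside the decoded arrays, and those positions are exactly what tends to move when Google changes the format.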

If you need to intercept and modify these requests to understand the full parameter set, comparing Charles Proxy vs mitmproxy for mobile API scraping covers how to set up a MITM proxy to capture the exact payloads Play Store apps send.

Scaling up: proxies and rate limits

Scraping a single app is easy. Scraping 10,000 apps for a market survey is where things break. Google Play enforces per-IP rate limits aggressively: you’ll hit 429s after roughly 50-80 requests from the same IP in a short window.

Here’s a practical setup for scale:

  1. Use rotating residential proxies (datacenter proxies get blocked faster)
  2. Keep delays between 2-5 seconds per IP
  3. Rotate user-agent strings alongside IPs
  4. Implement exponential backoff on 429 responses
  5. Cache app metadata aggressively (install counts don’t change hourly)
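Steps 1 and 4 above can be combined in a small retry wrapper. This is a sketch under stated assumptions rather than a production client: fetch_with_backoff is a hypothetical name, send is a callable you provide that performs one request through the given proxy and returns (status_code, body), and the proxy list is whatever your provider gives you.

```python
import itertools
import random
import time
from typing import Callable, Sequence


def fetch_with_backoff(
    send: Callable[[str], tuple[int, str]],
    proxies: Sequence[str],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Retry on HTTP 429, doubling the wait each time and rotating proxies."""
    pool = itertools.cycle(proxies)
    for attempt in range(max_attempts):
        status, body = send(next(pool))  # each attempt goes out via the next proxy
        if status != 429:
            return status, body
        # Exponential backoff plus up to 1s of jitter so workers don't sync up.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

With the defaults, the waits grow from about 2 seconds to about 32 seconds before the wrapper gives up, which is usually long enough for a short rate-limit window to clear.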

For review monitoring at scale, consider pulling just new reviews using the Sort.NEWEST parameter and a watermark timestamp. Pulling all reviews on every run is wasteful and gets you blocked faster.
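The watermark idea boils down to “stop reading as soon as you reach something you have already seen”. A minimal sketch, assuming each review is a dict with an "at" datetime (the key google-play-scraper uses) and that the batch really is ordered newest first; new_since is a hypothetical helper name:

```python
from datetime import datetime
from typing import Optional


def new_since(
    reviews_newest_first: list[dict],
    watermark: Optional[datetime],
) -> tuple[list[dict], Optional[datetime]]:
    """Return reviews newer than the watermark, plus the updated watermark."""
    fresh = []
    for review in reviews_newest_first:
        if watermark is not None and review["at"] <= watermark:
            break  # everything after this point was seen on a previous run
        fresh.append(review)
    # Advance the watermark only if something new actually arrived.
    new_watermark = max((r["at"] for r in fresh), default=watermark)
    return fresh, new_watermark
```

Persist the returned watermark between runs. If a page comes back entirely fresh, keep paginating; if it contains an old review, you can stop requesting further pages.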

Some apps also serve different content by country. The country parameter matters: an app with 100 English reviews might have 5,000 Japanese ones. If you’re doing global sentiment analysis, you need to loop across country codes.
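Looping over markets tends to surface the same review more than once, so it is worth de-duplicating as you go. The sketch below assumes each review dict carries a "reviewId" key (as google-play-scraper’s results do) and takes a fetch callable of your own that returns the reviews for one (lang, country) pair; collect_by_market is a hypothetical name.

```python
from typing import Callable, Iterable


def collect_by_market(
    fetch: Callable[[str, str], Iterable[dict]],
    markets: Iterable[tuple[str, str]],
) -> list[dict]:
    """Fetch reviews for each (lang, country) pair, keeping each review once."""
    seen: dict[str, dict] = {}
    for lang, country in markets:
        for review in fetch(lang, country):
            # The first market a review shows up in wins; later copies are skipped.
            seen.setdefault(
                review["reviewId"],
                {**review, "lang": lang, "country": country},
            )
    return list(seen.values())
```

Tagging each record with the market it came from keeps per-country sentiment breakdowns possible later on.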

When the API breaks: HTML scraping fallback

Google occasionally changes the batchexecute payload structure, and libraries take a few days to catch up. The HTML fallback is slower but more stable for app metadata (installs, rating, description):

from bs4 import BeautifulSoup
import requests

app_id = "com.spotify.music"
url = f"https://play.google.com/store/apps/details?id={app_id}&hl=en&gl=us"
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}
r = requests.get(url, headers=headers, timeout=30)
r.raise_for_status()
soup = BeautifulSoup(r.text, "html.parser")

# install count is in a specific itemprop or data-g-label element;
# check the selector against the live page, since the markup shifts
installs_node = soup.find("div", {"data-g-label": "Installs"})
installs = installs_node.get_text(strip=True) if installs_node else None

The HTML structure shifts more often than you’d like. If you need something truly stable for a production pipeline, combining the API approach for reviews with HTML scraping for metadata hedges against breakage in either.

If Play Store starts serving JavaScript-rendered content that breaks requests-based scraping, Playwright with a real browser context is the nuclear option. It’s slow and expensive at scale but handles any anti-bot measure short of device attestation. For mobile-specific SSL pinning issues when working through a proxy, Frida vs Objection for bypassing mobile app SSL pinning is worth reading before you go down that path.

Handling pagination and completeness

Getting all reviews for a popular app (some have millions) requires careful pagination handling:

  • continuation_token expires after roughly 24 hours
  • The API caps total returnable reviews at around 4,000-5,000 per language/country/sort combination
  • Sorting by Sort.MOST_RELEVANT and Sort.NEWEST gives you different slices of the total review pool
  • There’s no way to get a full 100% complete dump through the API alone

For deeper coverage, the pillar guide on scraping Google Play reviews covers additional approaches including combining API results with third-party data sources.

One approach for completeness: pull NEWEST reviews on a daily cron, sort by date, and store incrementally. Over time you build a more complete dataset than any single bulk pull would give you.
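For the storage side of that daily pull, SQLite with the review ID as primary key is enough to make reruns idempotent. This is one possible sketch, not the only layout: the table schema and the store_reviews name are invented for illustration, and the dict keys (reviewId, score, content, at) follow google-play-scraper’s review format.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    review_id  TEXT PRIMARY KEY,
    app_id     TEXT NOT NULL,
    score      INTEGER,
    content    TEXT,
    created_at TEXT,
    fetched_at TEXT
)
"""


def store_reviews(conn: sqlite3.Connection, app_id: str, reviews: list[dict]) -> int:
    """Insert reviews, skipping IDs already stored; return how many were new."""
    conn.execute(SCHEMA)
    before = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
    fetched_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO reviews VALUES (?, ?, ?, ?, ?, ?)",
        [
            (r["reviewId"], app_id, r["score"], r["content"],
             r["at"].isoformat(), fetched_at)
            for r in reviews
        ],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
    return after - before
```

A run that returns 0 new rows for several days in a row is a cheap signal that you can lower the polling frequency for that app.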

Bottom line

google-play-scraper is the right starting point for 90% of use cases: it handles response decoding and pagination so you don’t have to. For scale, pair it with rotating residential proxies and incremental pulls by date. If exact install numbers matter for your analysis, augment with a paid analytics provider because Play Store won’t give them to you. DRT covers these mobile scraping tradeoffs in more depth across the rest of the mobile-scraping category.
