How to Scrape Google Play and App Store Data with Proxies

App store data is one of the most valuable resources for mobile marketers, product managers, and competitive intelligence teams. App metadata, user reviews, download estimates, and pricing information drive decisions about product positioning, feature development, and market entry.

The challenge is that both Google Play and Apple’s App Store actively resist automated data collection. This guide provides a technical walkthrough for scraping both platforms using mobile proxies, covering metadata extraction, review collection, and strategies for handling anti-bot protections.

Why Scrape App Store Data?

Before diving into the technical details, here is what you can do with app store data:

  • Competitive analysis. Track competitor apps’ ratings, reviews, update frequency, and feature changes.
  • Market research. Identify trending apps, emerging categories, and unmet user needs in specific countries.
  • ASO keyword research. Analyze which keywords competitors use in titles and descriptions.
  • Sentiment analysis. Mine user reviews for product feedback and pain points.
  • Price monitoring. Track pricing changes across regions for competitor apps and in-app purchases.
  • Investment research. Evaluate app performance metrics before acquisition or investment decisions.

Google Play vs. App Store: Key Differences for Scraping

Understanding the structural differences between the two stores helps you build more effective scrapers.

| Aspect                  | Google Play             | Apple App Store      |
|-------------------------|-------------------------|----------------------|
| Web interface           | Full web app            | Limited web presence |
| API availability        | No official public API  | iTunes Search API    |
| Data rendering          | Client-side JS heavy    | Server-side for web  |
| Anti-bot measures       | Aggressive              | Moderate             |
| Review access           | Web + API endpoints     | RSS feeds + API      |
| Geo-targeting parameter | gl (country code)       | country parameter    |
| Language parameter      | hl (language code)      | lang parameter       |

Google Play is harder to scrape because it relies heavily on JavaScript rendering and has more aggressive bot detection. The App Store is somewhat easier thanks to the iTunes Search API, but large-scale scraping still requires proxies.
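The geo-targeting parameters in the table above are easy to get wrong, so it helps to centralize URL construction. A minimal sketch (the app IDs shown are placeholders):

```python
def play_store_url(app_id: str, country: str = "sg", language: str = "en") -> str:
    """Google Play takes gl (country) and hl (language) query parameters."""
    return (f"https://play.google.com/store/apps/details"
            f"?id={app_id}&gl={country}&hl={language}")

def app_store_lookup_url(app_id: str, country: str = "sg") -> str:
    """The iTunes Lookup API takes a country parameter instead."""
    return f"https://itunes.apple.com/lookup?id={app_id}&country={country}"
```

Calling `play_store_url("com.example.app", "th", "th")` yields a URL localized for Thailand in Thai, while the App Store equivalent only needs the country code.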

Setting Up Your Proxy Infrastructure

Why Mobile Proxies Are Necessary

Both stores are designed for mobile users. Their anti-bot systems are tuned to expect mobile traffic patterns. Mobile proxies provide:

  • Legitimate mobile carrier IPs that match expected traffic sources
  • Natural IP rotation through carrier NAT pools
  • Lower detection rates compared to datacenter or residential proxies
  • Accurate geo-targeted results from real mobile networks

Configuring DataResearchTools Mobile Proxies

DataResearchTools offers mobile proxies with country-level targeting across Southeast Asia, which is ideal for scraping localized app store data. Here is a basic configuration:

# Proxy configuration for different countries
PROXY_CONFIGS = {
    "singapore": {
        "http": "http://user-country-sg:pass@gate.dataresearchtools.com:port",
        "https": "http://user-country-sg:pass@gate.dataresearchtools.com:port"
    },
    "thailand": {
        "http": "http://user-country-th:pass@gate.dataresearchtools.com:port",
        "https": "http://user-country-th:pass@gate.dataresearchtools.com:port"
    },
    "indonesia": {
        "http": "http://user-country-id:pass@gate.dataresearchtools.com:port",
        "https": "http://user-country-id:pass@gate.dataresearchtools.com:port"
    },
    "philippines": {
        "http": "http://user-country-ph:pass@gate.dataresearchtools.com:port",
        "https": "http://user-country-ph:pass@gate.dataresearchtools.com:port"
    }
}
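Workers should fail fast when asked for a country the pool does not cover. A small lookup helper over the dictionary above (a sketch; pass in `PROXY_CONFIGS` as defined earlier):

```python
def get_proxy_config(country: str, configs: dict) -> dict:
    """Return the proxy mapping for a country, failing fast if unsupported."""
    try:
        return configs[country.lower()]
    except KeyError:
        supported = ", ".join(sorted(configs))
        raise ValueError(f"No proxy configured for '{country}' (supported: {supported})")

# Usage with the PROXY_CONFIGS dict defined above:
# proxies = get_proxy_config("thailand", PROXY_CONFIGS)
```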

Scraping Google Play Store

Approach 1: Direct Web Scraping

Google Play renders most content with JavaScript, so you need a headless browser or a way to handle dynamic content.

Using Playwright with Proxies

from playwright.sync_api import sync_playwright
import json

def scrape_google_play_app(app_id, country="sg", language="en"):
    url = f"https://play.google.com/store/apps/details?id={app_id}&gl={country}&hl={language}"

    with sync_playwright() as p:
        browser = p.chromium.launch(
            proxy={
                "server": "http://gate.dataresearchtools.com:port",
                "username": f"user-country-{country}",
                "password": "your-password"
            }
        )

        context = browser.new_context(
            user_agent="Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36"
        )

        page = context.new_page()
        page.goto(url, wait_until="networkidle")

        # Extract app metadata. These selectors are fragile: Google Play's
        # obfuscated class names change frequently, so verify them before use.
        data = {}

        title_el = page.query_selector("h1")
        data["title"] = title_el.inner_text() if title_el else None

        dev_el = page.query_selector('[class*="developer"]')
        data["developer"] = dev_el.inner_text() if dev_el else None

        rating_el = page.query_selector('[class*="rating"]')
        data["rating"] = rating_el.inner_text() if rating_el else None

        # Extract description
        desc_element = page.query_selector('[data-g-id="description"]')
        if desc_element:
            data["description"] = desc_element.inner_text()

        browser.close()
        return data

Using HTTP Requests with Parsing

For higher-volume scraping, direct HTTP requests are more efficient than a full browser. Some Google Play pages return enough server-rendered HTML to parse without JavaScript execution:

import requests
from bs4 import BeautifulSoup

def scrape_play_store_search(keyword, country="th", language="th"):
    url = "https://play.google.com/store/search"
    params = {
        "q": keyword,
        "c": "apps",
        "gl": country,
        "hl": language
    }

    headers = {
        "User-Agent": "Mozilla/5.0 (Linux; Android 14; SM-S928B) AppleWebKit/537.36",
        "Accept-Language": f"{language}-{country.upper()},{language};q=0.9"
    }

    proxy = {
        "http": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port",
        "https": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port"
    }

    response = requests.get(url, params=params, headers=headers, proxies=proxy)

    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        # Parse search results. Class names like ULeU3b are obfuscated and
        # change when Google ships updates -- re-verify them periodically.
        apps = []
        for app_card in soup.select("div.ULeU3b"):
            app = {
                "name": app_card.select_one(".ubGTjb").text if app_card.select_one(".ubGTjb") else None,
                "developer": app_card.select_one(".wMUdtb").text if app_card.select_one(".wMUdtb") else None,
            }
            apps.append(app)
        return apps

    return None

Scraping Google Play Reviews

Google Play reviews can be fetched through an internal API endpoint. This requires some reverse engineering, but the basic approach is:

import requests

def fetch_play_reviews(app_id, country="id", language="id", count=100):
    """Fetch reviews from Google Play internal API."""

    # Google Play uses a batch RPC endpoint for reviews
    url = "https://play.google.com/_/PlayStoreUi/data/batchexecute"

    # The payload format requires specific protobuf-like parameters
    # This is a simplified example -- actual implementation needs
    # the correct request format

    proxy = {
        "http": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port",
        "https": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port"
    }

    headers = {
        "Content-Type": "application/x-www-form-urlencoded;charset=utf-8",
        "User-Agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36"
    }

    # Build the review request payload
    payload = build_review_payload(app_id, country, language, count)

    response = requests.post(url, data=payload, headers=headers, proxies=proxy)
    return parse_review_response(response.text)
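The batchexecute response is not plain JSON: it starts with an anti-hijacking prefix (`)]}'`) followed by a JSON envelope whose inner payload is itself a JSON-encoded string. A hedged sketch of `parse_review_response` based on observed responses (the envelope layout is undocumented and may change without notice):

```python
import json

def parse_review_response(raw_text: str):
    """Strip the anti-hijacking prefix and unwrap the nested JSON payload."""
    body = raw_text.lstrip()
    if body.startswith(")]}'"):
        body = body[4:]

    envelope = json.loads(body)
    # The inner payload is a JSON-encoded string; its position within the
    # envelope has been observed to vary, so search each frame for it.
    for frame in envelope:
        if isinstance(frame, list) and len(frame) > 2 and isinstance(frame[2], str):
            try:
                return json.loads(frame[2])
            except (json.JSONDecodeError, TypeError):
                continue
    return None
```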

Scraping Apple App Store

Approach 1: iTunes Search API

Apple provides an official search API that is easier to work with than Google Play:

import requests

def search_app_store(term, country="sg", limit=50):
    url = "https://itunes.apple.com/search"
    params = {
        "term": term,
        "country": country,
        "entity": "software",
        "limit": limit
    }

    proxy = {
        "http": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port",
        "https": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port"
    }

    response = requests.get(url, params=params, proxies=proxy)
    data = response.json()

    apps = []
    for result in data.get("results", []):
        # Some fields (e.g. price for certain entries) are occasionally
        # absent from the response, so use .get() with sensible defaults.
        app = {
            "name": result.get("trackName"),
            "developer": result.get("artistName"),
            "price": result.get("price", 0.0),
            "rating": result.get("averageUserRating", "N/A"),
            "review_count": result.get("userRatingCount", 0),
            "bundle_id": result.get("bundleId"),
            "description": result.get("description"),
            "genres": result.get("genres", []),
            "version": result.get("version"),
            "size_bytes": result.get("fileSizeBytes"),
            "content_rating": result.get("contentAdvisoryRating")
        }
        apps.append(app)

    return apps

Approach 2: App Store Lookup API

For specific apps, the lookup API provides detailed metadata:

def lookup_app(app_id, country="ph"):
    url = f"https://itunes.apple.com/lookup?id={app_id}&country={country}"

    proxy = {
        "http": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port",
        "https": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port"
    }

    response = requests.get(url, proxies=proxy)
    data = response.json()

    if data["resultCount"] > 0:
        return data["results"][0]
    return None

Scraping App Store Reviews

App Store reviews are available through RSS feeds, which makes them relatively straightforward to collect:

import requests
import xml.etree.ElementTree as ET

def fetch_app_store_reviews(app_id, country="sg", page=1):
    url = f"https://itunes.apple.com/{country}/rss/customerreviews/id={app_id}/page={page}/sortBy=mostRecent/xml"

    proxy = {
        "http": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port",
        "https": f"http://user-country-{country}:pass@gate.dataresearchtools.com:port"
    }

    response = requests.get(url, proxies=proxy)

    if response.status_code == 200:
        root = ET.fromstring(response.content)
        namespace = {"atom": "http://www.w3.org/2005/Atom", "im": "http://itunes.apple.com/rss"}

        reviews = []
        for entry in root.findall("atom:entry", namespace)[1:]:  # Skip first entry (metadata)
            review = {
                "title": entry.find("atom:title", namespace).text,
                "content": entry.find("atom:content", namespace).text,
                "rating": entry.find("im:rating", namespace).text,
                "author": entry.find("atom:author/atom:name", namespace).text,
                "version": entry.find("im:version", namespace).text
            }
            reviews.append(review)

        return reviews

    return []
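Once reviews are parsed into dictionaries like the ones above, a small helper can summarize the rating distribution before deeper sentiment analysis (a sketch that assumes the `rating` key produced by `fetch_app_store_reviews`):

```python
from collections import Counter

def summarize_ratings(reviews: list) -> dict:
    """Compute count, average, and star distribution from parsed reviews."""
    ratings = [int(r["rating"]) for r in reviews if r.get("rating")]
    if not ratings:
        return {"count": 0, "average": None, "distribution": {}}
    return {
        "count": len(ratings),
        "average": round(sum(ratings) / len(ratings), 2),
        "distribution": dict(Counter(ratings)),
    }
```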

Handling Anti-Bot Measures

Both stores employ various techniques to prevent automated scraping. Here is how to handle them.

Google Play Anti-Bot Defenses

Google Play uses several detection mechanisms:

  • JavaScript challenges. Renders content dynamically to block simple HTTP scrapers.
  • Rate limiting. Throttles requests from IPs that make too many requests.
  • CAPTCHA. Presents CAPTCHAs when suspicious activity is detected.
  • Fingerprinting. Analyzes browser characteristics and behavior patterns.

Mitigation Strategies

  1. Use headless browsers for JavaScript-heavy pages, but configure them to avoid detection:
# Playwright stealth configuration
context = browser.new_context(
    viewport={"width": 412, "height": 915},  # Mobile viewport
    user_agent="Mozilla/5.0 (Linux; Android 14; Pixel 8) ...",
    locale="th-TH",
    timezone_id="Asia/Bangkok",
    is_mobile=True,
    has_touch=True
)
  2. Implement request delays of 3 to 8 seconds with random jitter:
import time
import random

def smart_delay():
    base_delay = 5
    jitter = random.uniform(-2, 3)
    time.sleep(base_delay + jitter)
  3. Rotate proxies between requests. With DataResearchTools mobile proxies, each new connection can provide a fresh IP from the carrier’s pool.
  4. Maintain session consistency. Use the same proxy IP for related requests (e.g., loading an app page and then its reviews) to avoid appearing as multiple simultaneous users.
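The session-consistency point can be sketched as a small manager that pins each app's requests to one proxy (the proxy pool contents are placeholders):

```python
import random

class StickyProxyManager:
    """Assign one proxy per app_id and reuse it for all related requests."""

    def __init__(self, proxy_pool: list):
        self.proxy_pool = proxy_pool
        self.assignments = {}

    def proxy_for(self, app_id: str) -> dict:
        # Reuse the proxy already assigned to this app, if any
        if app_id not in self.assignments:
            self.assignments[app_id] = random.choice(self.proxy_pool)
        return self.assignments[app_id]
```

Every call to `proxy_for("com.example.app")` returns the same proxy mapping, so the app page and its review requests appear to come from a single user.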

App Store Anti-Bot Defenses

Apple’s defenses are generally less aggressive for API endpoints but include:

  • Rate limiting on the iTunes Search and Lookup APIs (approximately 20 requests per minute per IP).
  • Temporary bans for sustained high-volume requests.
  • Geo-verification that checks if requests match the claimed country.

Mitigation Strategies

  1. Respect rate limits. Keep API requests under 20 per minute per IP.
  2. Use mobile proxies from the target country so geo-verification passes naturally.
  3. Implement exponential backoff when you receive 403 or 429 responses:
import time
import random
import requests

def request_with_backoff(url, proxies, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, proxies=proxies)

        if response.status_code == 200:
            return response

        if response.status_code in (403, 429):
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(wait_time)
            continue

        break

    return None
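The ~20 requests/minute budget can be enforced per proxy with a minimal interval-based limiter. A sketch with an injectable clock so the logic is testable without real delays:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between requests (60/20 = 3s for 20 req/min)."""

    def __init__(self, requests_per_minute: int = 20,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_request = None

    def wait(self):
        # Sleep just long enough to keep the minimum spacing between calls
        now = self.clock()
        if self.last_request is not None:
            elapsed = now - self.last_request
            if elapsed < self.min_interval:
                self.sleep(self.min_interval - elapsed)
        self.last_request = self.clock()
```

Call `limiter.wait()` immediately before each API request made through a given proxy.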

Building a Complete Scraping Pipeline

Here is how to put all the pieces together into a production-ready pipeline.

Architecture Overview

[Scheduler] --> [Task Queue] --> [Scraper Workers] --> [Proxy Pool] --> [App Stores]
                                       |
                                  [Data Store]
                                       |
                                  [Processing]
                                       |
                                  [Output/Reports]
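A minimal in-process version of the task-queue and worker stages can be built on Python's standard `queue` and `threading` modules (a sketch; `scrape_fn` stands in for the scraper functions defined earlier):

```python
import queue
import threading

def run_pipeline(tasks: list, scrape_fn, num_workers: int = 4) -> list:
    """Feed tasks through a queue to worker threads and collect results."""
    task_queue = queue.Queue()
    results = []
    lock = threading.Lock()

    for task in tasks:
        task_queue.put(task)

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # Queue drained; worker exits
            result = scrape_fn(task)
            with lock:
                results.append(result)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In production the in-memory queue would typically be replaced with a broker such as Redis or RabbitMQ so workers can run on separate machines, each bound to its own proxy.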

Data Storage Schema

Structure your database to capture all relevant fields:

CREATE TABLE apps (
    id SERIAL PRIMARY KEY,
    store VARCHAR(10),          -- 'google_play' or 'app_store'
    app_id VARCHAR(255),
    country VARCHAR(5),
    title TEXT,
    developer TEXT,
    rating DECIMAL(3,2),
    review_count INTEGER,
    price DECIMAL(10,2),
    category VARCHAR(100),
    description TEXT,
    last_updated TIMESTAMP,
    scraped_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE reviews (
    id SERIAL PRIMARY KEY,
    app_id VARCHAR(255),
    store VARCHAR(10),
    country VARCHAR(5),
    rating INTEGER,
    title TEXT,
    content TEXT,
    author VARCHAR(255),
    app_version VARCHAR(50),
    review_date DATE,
    scraped_at TIMESTAMP DEFAULT NOW()
);
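For local development, the schema above (written for PostgreSQL) can be exercised with SQLite, which ships with Python. A sketch with column types loosened to SQLite's affinity rules and only a subset of fields shown:

```python
import sqlite3

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the apps table from the schema above, adapted for SQLite."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS apps (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            store TEXT,          -- 'google_play' or 'app_store'
            app_id TEXT,
            country TEXT,
            title TEXT,
            developer TEXT,
            rating REAL,
            review_count INTEGER,
            price REAL,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    return conn

def save_app(conn, store, app_id, country, title, rating):
    """Insert one scraped app record using parameterized SQL."""
    conn.execute(
        "INSERT INTO apps (store, app_id, country, title, rating) "
        "VALUES (?, ?, ?, ?, ?)",
        (store, app_id, country, title, rating),
    )
    conn.commit()
```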

Error Handling and Monitoring

Robust error handling is critical for long-running scraping operations:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app_store_scraper")

def scrape_with_monitoring(app_id, country, store):
    try:
        if store == "google_play":
            data = scrape_google_play_app(app_id, country)
        else:
            data = lookup_app(app_id, country)

        if data:
            save_to_database(data)
            logger.info(f"Successfully scraped {app_id} from {store} ({country})")
        else:
            logger.warning(f"No data returned for {app_id} from {store} ({country})")

    except requests.exceptions.ProxyError as e:
        logger.error(f"Proxy error for {app_id}: {e}")
        # Switch to backup proxy
    except requests.exceptions.Timeout as e:
        logger.error(f"Timeout for {app_id}: {e}")
        # Retry with longer timeout
    except Exception as e:
        logger.error(f"Unexpected error for {app_id}: {e}")

Scaling Your Scraping Operation

As your data needs grow, consider these scaling strategies:

  • Parallel workers. Run multiple scraper instances, each using a different proxy from your DataResearchTools pool.
  • Prioritized queues. Scrape high-priority apps (your own and top competitors) more frequently than the broader market.
  • Incremental updates. Only scrape full app details when you detect a change (new version, rating shift, etc.). Check for changes with lightweight API calls first.
  • Regional distribution. Use proxies from each target country simultaneously to maximize throughput while staying within per-IP rate limits.
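The incremental-update strategy above can be implemented by hashing the fields that signal a meaningful change and only scheduling a full scrape when the hash differs (a sketch; the field list is an assumption to adjust per use case):

```python
import hashlib
import json

def app_fingerprint(app: dict,
                    fields=("version", "rating", "review_count", "price")) -> str:
    """Stable hash over the fields that signal a meaningful change."""
    snapshot = {f: app.get(f) for f in fields}
    payload = json.dumps(snapshot, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def needs_full_scrape(app: dict, previous_fingerprint) -> bool:
    """Full scrape is needed when the app is new or its fingerprint changed."""
    return previous_fingerprint is None or app_fingerprint(app) != previous_fingerprint
```

A lightweight API call (e.g., the iTunes Lookup endpoint) supplies the fields for the fingerprint; the expensive browser-based scrape runs only when `needs_full_scrape` returns True.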

Legal and Ethical Considerations

Keep these points in mind when scraping app store data:

  • Review and comply with each store’s Terms of Service.
  • Do not scrape personal user data beyond what is publicly visible.
  • Store and process data in compliance with local data protection laws (PDPA in Thailand, etc.).
  • Use scraped data for legitimate business purposes such as market research and competitive analysis.
  • Avoid placing excessive load on app store servers.

Conclusion

Scraping Google Play and App Store data requires the right tools, careful anti-bot handling, and reliable proxies. Mobile proxies from DataResearchTools give you the authentic mobile IPs needed to access localized store data across Southeast Asian markets without triggering detection systems.

Start with the iTunes Search API for App Store data since it is the easiest entry point, then build out your Google Play scraping capabilities using headless browsers. Scale gradually, monitor your success rates, and adjust your approach based on what each store throws at you.

