How to Scrape Stock Market Data with Mobile Proxies

Financial data drives trading strategies, research, and investment decisions. While premium data feeds from Bloomberg Terminal or Refinitiv cost tens of thousands of dollars per year, much of the underlying stock market data is publicly available — if you know how to collect it reliably.

This guide covers how to scrape stock market data using mobile proxies, including the best data sources, proxy configuration, technical implementation, and the legal landscape you need to navigate.

Why Scrape Stock Market Data?

Cost Reduction

Professional financial data terminals cost $20,000-$25,000 per year per seat. For independent traders, small funds, and research teams, scraping publicly available data can provide 80% of the value at a fraction of the cost.

Custom Data Pipelines

Pre-built data feeds deliver standardized formats on fixed schedules. Scraping lets you:

  • Collect exactly the fields you need
  • Update at your preferred frequency (real-time, hourly, daily)
  • Combine data from multiple sources into a unified format
  • Build proprietary datasets that give you an analytical edge

Coverage Gaps

No single data provider covers everything. Scraping fills gaps:

  • Emerging market stocks not covered by major data providers
  • Alternative data from financial forums and social sentiment
  • Regional exchange data from Southeast Asian and other local markets
  • Historical data that providers have archived or paywalled

Best Stock Market Data Sources for Scraping

Tier 1: APIs (Preferred When Available)

Always prefer official APIs over web scraping when they exist. They are more reliable, structured, and typically allowed under terms of service.

| Source | Coverage | Free Tier | Rate Limits | Notes |
| --- | --- | --- | --- | --- |
| Yahoo Finance API (unofficial) | Global stocks, ETFs, crypto | Yes | ~2,000 req/hour | Most popular free source |
| Alpha Vantage | US + international stocks | 25 req/day (free) | Strict on free tier | Good historical data |
| Polygon.io | US stocks, options, crypto | Yes (delayed data) | 5 req/min (free) | Real-time with paid plans |
| Finnhub | Global stocks, forex, crypto | 60 req/min (free) | Moderate | Good international coverage |
| IEX Cloud | US stocks | Limited free tier | Plan-dependent | High-quality US data |
| Twelve Data | Global stocks, forex, crypto | 800 req/day (free) | Moderate | Good SEA coverage |

Proxy usage with APIs: Even with APIs, proxies help you stay within rate limits by distributing requests across multiple IPs. Each IP gets its own rate limit allocation.
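
For example, a simple round-robin rotator spreads calls evenly across a pool of proxy IPs. The endpoints below are placeholders, not real gateways:

```python
from itertools import cycle

# Placeholder proxy endpoints: substitute your own gateway credentials
PROXIES = [
    {"https": "http://user1:pass@proxy-a.example.com:8000"},
    {"https": "http://user2:pass@proxy-b.example.com:8000"},
    {"https": "http://user3:pass@proxy-c.example.com:8000"},
]

_rotation = cycle(PROXIES)

def next_proxy():
    """Return the next proxy in round-robin order, so each IP's
    rate-limit allocation is consumed evenly."""
    return next(_rotation)
```

Pass the result as the `proxies=` argument to `requests.get`; with three IPs, each one sees only a third of your total request volume.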

Tier 2: Financial Websites (Scraping Required)

When APIs are insufficient, these websites provide rich financial data:

Yahoo Finance (finance.yahoo.com)

  • Real-time and historical prices for global stocks
  • Financial statements (income, balance sheet, cash flow)
  • Analyst estimates and recommendations
  • Heavily rate-limited and increasingly difficult to scrape without quality proxies

Google Finance (google.com/finance)

  • Real-time quotes and basic charts
  • Limited historical data
  • Simpler to scrape than Yahoo Finance but less data depth

Investing.com

  • Comprehensive global coverage including SEA markets
  • Economic calendar, technical indicators
  • Aggressive anti-bot protection — requires mobile proxies for reliable access

MarketWatch (marketwatch.com)

  • US market focus with good fundamental data
  • Moderate anti-bot protection
  • Good for earnings and financial news data

Bloomberg (bloomberg.com)

  • Limited free data (paywalled for most content)
  • Heavy anti-bot protection
  • Mobile proxies essential for any sustained scraping

Tier 3: Exchange Websites (Regional Data)

For Southeast Asian markets specifically:

| Exchange | Website | Coverage | Scraping Difficulty |
| --- | --- | --- | --- |
| SGX (Singapore) | sgx.com | Singapore stocks, REITs | Moderate |
| SET (Thailand) | set.or.th | Thai stocks, warrants | Moderate |
| IDX (Indonesia) | idx.co.id | Indonesian stocks | Easy-Moderate |
| PSE (Philippines) | pse.com.ph | Philippine stocks | Easy |
| Bursa Malaysia | bursamalaysia.com | Malaysian stocks | Moderate |
| HOSE/HNX (Vietnam) | hsx.vn, hnx.vn | Vietnamese stocks | Moderate |

Critical note: Scraping regional SEA exchanges often requires IPs from the respective country. SGX may throttle or block non-Singapore IPs. SET may serve different content to Thai versus foreign IPs. DataResearchTools mobile proxies from SEA carriers provide the geographic authenticity needed for reliable access to these regional exchanges.

Proxy Setup for Financial Data Scraping

Why Mobile Proxies Are Ideal for Financial Scraping

Financial websites implement some of the strictest anti-bot measures because:

  1. Data value: Financial data has direct monetary value, making it a high-value scraping target
  2. Regulatory pressure: Financial platforms face regulatory requirements to prevent unauthorized data redistribution
  3. Commercial interest: They sell data subscriptions and want to protect that revenue stream

Mobile proxies overcome these defenses because:

  • CGNAT (carrier-grade NAT) IPs are shared by thousands of real mobile users, so blocking them also disrupts legitimate traffic
  • Mobile carrier IPs have the highest trust scores in IP reputation databases
  • Financial platforms expect mobile traffic (trading apps are predominantly mobile)

Recommended Proxy Architecture

Financial Data Scraper
    |
    +-- API Requests (Yahoo Finance, Alpha Vantage)
    |       -> Rotating residential or datacenter proxies
    |       -> High volume, moderate protection
    |
    +-- Website Scraping (Investing.com, Bloomberg)
    |       -> Mobile proxies (rotating)
    |       -> High protection targets need highest trust
    |
    +-- Regional Exchange Data (SGX, SET, IDX)
    |       -> Country-specific mobile proxies
    |       -> Singapore proxy for SGX, Thai proxy for SET, etc.
    |
    +-- Real-Time Price Feeds
            -> Dedicated ISP or mobile proxy (sticky session)
            -> Persistent WebSocket connections

Configuration Example

import time

import requests

# Configure proxies for financial scraping ("port" is your assigned gateway port)
PROXY_CONFIG = {
    # For Yahoo Finance API - rotating proxy
    "api_scraping": {
        "http": "http://user-country-us:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-us:pass@gateway.dataresearchtools.com:port"
    },
    # For SGX (Singapore Exchange) - Singapore mobile proxy
    "sgx_scraping": {
        "http": "http://user-country-sg:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-sg:pass@gateway.dataresearchtools.com:port"
    },
    # For SET (Thai Exchange) - Thailand mobile proxy
    "set_scraping": {
        "http": "http://user-country-th:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-th:pass@gateway.dataresearchtools.com:port"
    }
}

def scrape_stock_data(ticker, source="yahoo"):
    proxy = PROXY_CONFIG.get(f"{source}_scraping", PROXY_CONFIG["api_scraping"])

    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
        "Accept-Language": "en-US,en;q=0.9",
    }

    # Delay between requests to stay under per-IP rate limits
    time.sleep(2)

    url = f"https://finance.yahoo.com/quote/{ticker}"
    response = requests.get(url, proxies=proxy, headers=headers, timeout=30)
    response.raise_for_status()
    return response

Technical Implementation Guide

Step 1: Identify Your Data Requirements

Before writing any code, document exactly what data you need:

Price data:

  • Open, High, Low, Close, Volume (OHLCV)
  • Real-time vs delayed vs end-of-day
  • Time granularity (tick, 1-min, 5-min, hourly, daily)

Fundamental data:

  • Financial statements (quarterly, annual)
  • Earnings estimates
  • Dividend history
  • Company metrics (P/E, market cap, etc.)

Alternative data:

  • News headlines and sentiment
  • Analyst ratings changes
  • Insider transactions
  • Social media sentiment

Step 2: Build Your Scraper Architecture

A robust financial data scraper needs several components:

Request manager:

  • Handles proxy rotation
  • Implements rate limiting per target
  • Manages retries with exponential backoff
  • Logs all requests for debugging
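
As a sketch, the per-target rate limiting above can be a minimum-delay gate keyed by domain (the delay values are illustrative):

```python
import time
from collections import defaultdict

class DomainRateLimiter:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, default_delay=2.0, per_domain=None):
        self.default_delay = default_delay
        self.per_domain = per_domain or {}       # e.g. {"finance.yahoo.com": 5.0}
        self._last_request = defaultdict(float)  # domain -> monotonic timestamp

    def wait(self, domain):
        """Block until this domain's minimum delay has elapsed."""
        delay = self.per_domain.get(domain, self.default_delay)
        elapsed = time.monotonic() - self._last_request[domain]
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self._last_request[domain] = time.monotonic()
```

Because delays are tracked per domain, a slow target never stalls requests to the others.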

Parser layer:

  • Extracts structured data from HTML or JSON
  • Handles format changes gracefully
  • Validates data types (prices should be numbers, dates should be valid)

Storage layer:

  • Time-series optimized database (InfluxDB, TimescaleDB, or even well-indexed PostgreSQL)
  • Deduplication logic
  • Data quality checks

Scheduler:

  • Runs jobs at appropriate intervals
  • Handles market hours vs off-hours differently
  • Manages timezone conversions (critical for SEA markets spanning multiple zones)
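
A scheduler's market-hours check can lean on the standard-library `zoneinfo` module. The hours below are simplified (lunch breaks and exchange holidays are ignored):

```python
from datetime import datetime, time as dtime
from zoneinfo import ZoneInfo

# Trading windows per exchange (simplified: no lunch breaks or holidays)
MARKET_HOURS = {
    "NYSE": ("America/New_York", dtime(9, 30), dtime(16, 0)),
    "SGX":  ("Asia/Singapore",   dtime(9, 0),  dtime(17, 0)),
    "SET":  ("Asia/Bangkok",     dtime(10, 0), dtime(16, 30)),
    "IDX":  ("Asia/Jakarta",     dtime(9, 0),  dtime(16, 0)),
}

def is_market_open(exchange, now_utc=None):
    """Check whether `exchange` is inside its weekday trading window."""
    tz, open_t, close_t = MARKET_HOURS[exchange]
    now = (now_utc or datetime.now(ZoneInfo("UTC"))).astimezone(ZoneInfo(tz))
    return now.weekday() < 5 and open_t <= now.time() <= close_t
```

Converting one UTC timestamp into each exchange's local zone avoids the classic bug of comparing wall-clock times across zones.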

Step 3: Handle Common Challenges

Challenge: Rate limiting

Financial websites aggressively rate-limit scrapers. Solutions:

  • Distribute requests across multiple mobile proxy IPs
  • Implement per-domain request queues with configurable delays
  • Use exponential backoff: start at 2 seconds, double on each failure, cap at 60 seconds
  • Schedule bulk scraping during off-peak hours (weekends for market data, overnight for news)
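
The backoff schedule above (start at 2 seconds, double on each failure, cap at 60) can wrap any request function, here a generic `fetch` callable:

```python
import time

def fetch_with_retry(fetch, max_retries=5, base=2.0, cap=60.0):
    """Retry fetch() with exponential backoff: 2 s, 4 s, 8 s, ... capped at 60 s."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(min(base * (2 ** attempt), cap))
```

In production you would catch a narrower exception type (e.g. `requests.RequestException`) rather than bare `Exception`.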

Challenge: JavaScript rendering

Many financial websites load data dynamically via JavaScript. You need either:

  • A headless browser (Playwright, Puppeteer) to render the page
  • Network analysis to find the underlying API calls (often more efficient)

For mobile proxies, headless browsers with mobile device emulation provide the most authentic traffic pattern.

Challenge: Data format changes

Financial websites change their HTML structure without warning. Protect yourself:

  • Use multiple selectors for critical data points
  • Implement data validation (a stock price should be positive, volume should be an integer)
  • Set up alerts when scrapers return unexpected data
  • Monitor scraper success rates daily
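
A sketch of the multiple-selector idea: express each selector as a callable and take the first that yields a value. The two regex patterns below are hypothetical examples, not real site markup:

```python
import re

def extract_first(html, extractors):
    """Try each extractor in order; return the first non-None result.
    Extractors are callables, so they can wrap regexes, CSS lookups, etc."""
    for extract in extractors:
        try:
            value = extract(html)
        except Exception:
            continue
        if value is not None:
            return value
    return None  # signal upstream that every selector failed

# Two hypothetical patterns for the same price field; if the site's markup
# changes and the first stops matching, the second may still work
price_extractors = [
    lambda h: (m := re.search(r'data-price="([\d.]+)"', h)) and float(m.group(1)),
    lambda h: (m := re.search(r'<span class="price">([\d.]+)</span>', h)) and float(m.group(1)),
]
```

A `None` return here is exactly the condition that should trigger the alerts mentioned above.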

Challenge: Market-specific encoding

SEA exchange websites may use local character encoding for company names and announcements. Ensure your scraper handles:

  • Thai script (SET listings)
  • Indonesian language (IDX listings)
  • Chinese characters (SGX has dual-language listings)
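
A sketch of a decode fallback chain, assuming UTF-8 first and Python's built-in `tis-620` codec for legacy Thai pages:

```python
def decode_response(raw_bytes, encodings=("utf-8", "tis-620", "latin-1")):
    """Decode a response body, falling back through likely encodings.
    tis-620 covers legacy Thai pages; latin-1 never fails, so it acts
    as a last resort before replacement characters."""
    for enc in encodings:
        try:
            return raw_bytes.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw_bytes.decode("utf-8", errors="replace")
```

In practice, check the `Content-Type` charset and any `<meta charset>` declaration first; the fallback chain is for pages that declare nothing.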

Step 4: Data Quality Assurance

Financial data errors can cost real money. Implement checks:

import logging
from datetime import datetime

def validate_stock_data(data):
    checks = {
        "price_positive": data["close"] > 0,
        "volume_non_negative": data["volume"] >= 0,
        "high_gte_low": data["high"] >= data["low"],
        "close_in_range": data["low"] <= data["close"] <= data["high"],
        "date_valid": data["date"] <= datetime.now(),
        "price_reasonable": data["close"] < data["previous_close"] * 2,  # flag 100%+ jumps for manual review
    }

    failed = [k for k, v in checks.items() if not v]
    if failed:
        logging.warning(f"Data quality check failed: {failed} for {data['ticker']}")
        return False
    return True

Legal Considerations

What Is Generally Permissible

Based on current legal precedent (primarily US-based, with the hiQ v. LinkedIn ruling as the landmark case):

  • Scraping publicly available data is generally legal
  • Accessing data without authentication that is displayed to any visitor is typically permissible
  • Collecting factual data (stock prices, financial figures) does not violate copyright (facts cannot be copyrighted)

What Requires Caution

  • Terms of service violations can expose you to breach-of-contract claims (though enforcement is rare for small-scale scraping)
  • Data redistribution may violate exchange licensing agreements
  • Real-time data is often protected by exchange data agreements
  • Circumventing technical access controls may raise legal issues under the CFAA (US) or equivalent laws

SEA-Specific Legal Considerations

Each SEA country has its own legal framework:

  • Singapore: Strong intellectual property protections; respect SGX data licensing terms
  • Thailand: SET data has specific redistribution restrictions
  • Indonesia: OJK (financial regulator) oversees financial data distribution
  • Philippines: PSE data policies are relatively permissive for personal use

Best Practices for Legal Compliance

  1. Scrape only publicly available data — do not bypass login walls or paywalls
  2. Respect robots.txt — at minimum, document your decision if you choose to access disallowed paths
  3. Do not redistribute raw data commercially without understanding licensing requirements
  4. Add value — derivative analysis and insights built on factual data are generally protected
  5. Rate-limit your scraping — do not overload servers (this is where proxies help by distributing load)
  6. Cache aggressively — do not re-scrape data you already have

Building a Multi-Market Financial Data Pipeline

For teams tracking stocks across both US and SEA markets, here is a practical pipeline architecture:

US Market Data (NYSE, NASDAQ)

Schedule: Market hours (9:30 AM - 4:00 PM ET)
Sources: Yahoo Finance API + Polygon.io
Proxy: US residential rotating proxies
Frequency: Every 5 minutes during trading hours, daily after close

Singapore Market Data (SGX)

Schedule: Market hours (9:00 AM - 5:00 PM SGT)
Sources: SGX website + Yahoo Finance (SGX-listed tickers)
Proxy: DataResearchTools Singapore mobile proxy
Frequency: Every 15 minutes during trading hours

Thailand Market Data (SET)

Schedule: Market hours (10:00 AM - 4:30 PM ICT)
Sources: SET website + Investing.com
Proxy: DataResearchTools Thailand mobile proxy
Frequency: Every 15 minutes during trading hours

Indonesia Market Data (IDX)

Schedule: Market hours (9:00 AM - 4:00 PM WIB)
Sources: IDX website + Yahoo Finance
Proxy: DataResearchTools Indonesia mobile proxy
Frequency: Every 15 minutes during trading hours

End-of-Day Consolidation

Schedule: After all SEA markets close (10:00 PM SGT)
Tasks:
  - Validate all collected data
  - Fill gaps from backup sources
  - Calculate derived metrics
  - Export to analysis tools
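
One way to capture this plan in code is a single schedule table; the source and proxy labels below are illustrative names, not real identifiers:

```python
# Hypothetical schedule table mirroring the plan above
PIPELINE = {
    "US":  {"sources": ["yahoo_api", "polygon"],
            "proxy": "us_residential", "interval_min": 5},
    "SGX": {"sources": ["sgx_site", "yahoo_api"],
            "proxy": "sg_mobile", "interval_min": 15},
    "SET": {"sources": ["set_site", "investing_com"],
            "proxy": "th_mobile", "interval_min": 15},
    "IDX": {"sources": ["idx_site", "yahoo_api"],
            "proxy": "id_mobile", "interval_min": 15},
}

def jobs_due(minute_of_hour):
    """Markets whose polling interval divides the current minute.
    Very simplified: a real scheduler would also check market hours."""
    return [m for m, cfg in PIPELINE.items()
            if minute_of_hour % cfg["interval_min"] == 0]
```

A real scheduler (cron, APScheduler, Airflow) would read a table like this rather than hard-coding each job.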

Performance Optimization Tips

1. Cache Everything

Store previously collected data locally. Never re-scrape historical data that will not change (last year’s financial statements, historical price data).
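
A minimal sketch of such a cache for daily bars, keyed by ticker and date (the cache directory is an arbitrary local path):

```python
import json
from pathlib import Path

CACHE_DIR = Path("cache")  # hypothetical local cache directory

def get_daily_bar(ticker, date, fetch):
    """Return cached OHLCV for (ticker, date); call fetch() only on a miss.
    Historical bars never change, so a cache hit skips the network entirely."""
    path = CACHE_DIR / f"{ticker}_{date}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = fetch()
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data
```

Flat JSON files work for small universes; beyond a few thousand tickers, move the same keying scheme into the time-series database.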

2. Use APIs When Possible

Scraping HTML is 10-100x more expensive (in bandwidth and processing) than calling an API. Always check for an API endpoint before building an HTML scraper.

3. Parallel Collection

Run scrapers for different markets in parallel since they operate in different time zones:

  • SGX (UTC+8) and IDX (UTC+7) trading hours largely overlap
  • SET trading hours partially overlap with SGX
  • US markets open after SEA markets close
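
A sketch of parallel collection with a thread pool, using a stub in place of each market's real scraper (scraping is I/O-bound, so threads are appropriate):

```python
from concurrent.futures import ThreadPoolExecutor

def collect_market(market):
    """Stub for a per-market scraping job; in practice this does network I/O."""
    return market, f"collected {market}"

MARKETS = ["NYSE", "SGX", "SET", "IDX"]

def collect_all(markets=MARKETS, max_workers=4):
    # Threads suit this workload because scrapers mostly wait on the network
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(collect_market, markets))
```

Keep per-market proxy pools separate so one market's rate limiting never starves another's workers.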

4. Smart Proxy Allocation

  • Use cheaper datacenter/residential proxies for public APIs with published rate limits
  • Reserve mobile proxies for protected websites and regional exchange access
  • Match proxy geography to the exchange you are scraping

5. Monitor and Alert

Set up monitoring for:

  • Scraper success rate dropping below 95%
  • Unusual data patterns (potential scraping errors)
  • Proxy costs exceeding budget thresholds
  • Target website structure changes (CSS selector failures)
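
The success-rate alert can be a simple threshold check; the 95% figure comes from the list above, while the minimum-sample guard is an added assumption to avoid noisy alerts on tiny windows:

```python
def should_alert(successes, total, threshold=0.95, min_samples=20):
    """Flag a scraper whose success rate drops below the threshold.
    min_samples avoids alerting on tiny windows (e.g. 1 failure out of 2)."""
    if total < min_samples:
        return False
    return (successes / total) < threshold
```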

Conclusion

Scraping stock market data is a powerful capability for traders, researchers, and analysts who need cost-effective access to financial information across global markets. Mobile proxies are the key enabler for accessing protected financial platforms reliably, especially when targeting regional exchanges in Southeast Asia.

The combination of official APIs for high-volume standardized data and targeted web scraping with mobile proxies for protected sources gives you comprehensive coverage. For SEA market data, DataResearchTools mobile proxies provide the country-specific carrier IPs needed to access SGX, SET, IDX, and PSE data as a local user — a requirement that generic proxy providers cannot reliably meet.

Build your financial data pipeline thoughtfully: respect legal boundaries, implement robust quality checks, cache aggressively, and allocate proxy resources based on the specific protection level of each data source. The result is a cost-effective, reliable data infrastructure that supports informed financial decision-making across markets.

