How to Scrape Stock Market Data with Mobile Proxies
Financial data drives trading strategies, research, and investment decisions. While premium data feeds from Bloomberg Terminal or Refinitiv cost tens of thousands of dollars per year, much of the underlying stock market data is publicly available — if you know how to collect it reliably.
This guide covers how to scrape stock market data using mobile proxies, including the best data sources, proxy configuration, technical implementation, and the legal landscape you need to navigate.
Why Scrape Stock Market Data?
Cost Reduction
Professional financial data terminals cost $20,000-$25,000 per year per seat. For independent traders, small funds, and research teams, scraping publicly available data can provide 80% of the value at a fraction of the cost.
Custom Data Pipelines
Pre-built data feeds deliver standardized formats on fixed schedules. Scraping lets you:
- Collect exactly the fields you need
- Update at your preferred frequency (real-time, hourly, daily)
- Combine data from multiple sources into a unified format
- Build proprietary datasets that give you an analytical edge
Coverage Gaps
No single data provider covers everything. Scraping fills gaps:
- Emerging market stocks not covered by major data providers
- Alternative data from financial forums and social sentiment
- Regional exchange data from Southeast Asian and other local markets
- Historical data that providers have archived or paywalled
Best Stock Market Data Sources for Scraping
Tier 1: APIs (Preferred When Available)
Always prefer official APIs over web scraping when they exist. They are more reliable, structured, and typically allowed under terms of service.
| Source | Coverage | Free Tier | Rate Limits | Notes |
|---|---|---|---|---|
| Yahoo Finance API (unofficial) | Global stocks, ETFs, crypto | Yes | ~2,000 req/hour | Most popular free source |
| Alpha Vantage | US + international stocks | 25 req/day (free) | Strict on free tier | Good historical data |
| Polygon.io | US stocks, options, crypto | Yes (delayed data) | 5 req/min (free) | Real-time with paid plans |
| Finnhub | Global stocks, forex, crypto | 60 req/min (free) | Moderate | Good international coverage |
| IEX Cloud | US stocks | Limited free tier | Plan-dependent | High-quality US data |
| Twelve Data | Global stocks, forex, crypto | 800 req/day (free) | Moderate | Good SEA coverage |
Proxy usage with APIs: Even with APIs, proxies help you stay within rate limits by distributing requests across multiple IPs. Each IP gets its own rate limit allocation.
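As a sketch of that idea, a simple round-robin rotation over a pool of proxy endpoints spreads calls evenly so each IP sees only a fraction of the total request rate. The gateway URLs below are placeholders, not real endpoints:

```python
from itertools import cycle

# Hypothetical pool of proxy endpoints -- substitute your real gateway URLs
PROXY_POOL = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
]

_rotation = cycle(PROXY_POOL)

def next_proxy():
    """Return the next proxy in round-robin order, formatted for requests' proxies= argument."""
    endpoint = next(_rotation)
    return {"http": endpoint, "https": endpoint}

# Each successive request uses the next IP, so the per-IP request rate
# is 1/len(PROXY_POOL) of the total -- three IPs triple your effective limit.
```

With three proxies and a 2,000 req/hour per-IP cap, this pattern yields roughly 6,000 req/hour in aggregate without any single IP exceeding its allocation.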
Tier 2: Financial Websites (Scraping Required)
When APIs are insufficient, these websites provide rich financial data:
Yahoo Finance (finance.yahoo.com)
- Real-time and historical prices for global stocks
- Financial statements (income, balance sheet, cash flow)
- Analyst estimates and recommendations
- Heavily rate-limited and increasingly difficult to scrape without quality proxies
Google Finance (google.com/finance)
- Real-time quotes and basic charts
- Limited historical data
- Simpler to scrape than Yahoo Finance but less data depth
Investing.com
- Comprehensive global coverage including SEA markets
- Economic calendar, technical indicators
- Aggressive anti-bot protection — requires mobile proxies for reliable access
MarketWatch (marketwatch.com)
- US market focus with good fundamental data
- Moderate anti-bot protection
- Good for earnings and financial news data
Bloomberg (bloomberg.com)
- Limited free data (paywalled for most content)
- Heavy anti-bot protection
- Mobile proxies essential for any sustained scraping
Tier 3: Exchange Websites (Regional Data)
For Southeast Asian markets specifically:
| Exchange | Website | Coverage | Scraping Difficulty |
|---|---|---|---|
| SGX (Singapore) | sgx.com | Singapore stocks, REITs | Moderate |
| SET (Thailand) | set.or.th | Thai stocks, warrants | Moderate |
| IDX (Indonesia) | idx.co.id | Indonesian stocks | Easy-Moderate |
| PSE (Philippines) | pse.com.ph | Philippine stocks | Easy |
| Bursa Malaysia | bursamalaysia.com | Malaysian stocks | Moderate |
| HOSE/HNX (Vietnam) | hsx.vn, hnx.vn | Vietnamese stocks | Moderate |
Critical note: Scraping regional SEA exchanges often requires IPs from the respective country. SGX may throttle or block non-Singapore IPs. SET may serve different content to Thai versus foreign IPs. DataResearchTools mobile proxies from SEA carriers provide the geographic authenticity needed for reliable access to these regional exchanges.
Proxy Setup for Financial Data Scraping
Why Mobile Proxies Are Ideal for Financial Scraping
Financial websites implement some of the strictest anti-bot measures because:
- Data value: Financial data has direct monetary value, making it a high-value scraping target
- Regulatory pressure: Financial platforms face regulatory requirements to prevent unauthorized data redistribution
- Commercial interest: They sell data subscriptions and want to protect that revenue stream
Mobile proxies overcome these defenses because:
- CGNAT IPs are shared by thousands of real mobile users — blocking them affects legitimate traffic
- Mobile carrier IPs have the highest trust scores in IP reputation databases
- Financial platforms expect mobile traffic (trading apps are predominantly mobile)
Recommended Proxy Architecture
```
Financial Data Scraper
|
+-- API Requests (Yahoo Finance, Alpha Vantage)
|     -> Rotating residential or datacenter proxies
|     -> High volume, moderate protection
|
+-- Website Scraping (Investing.com, Bloomberg)
|     -> Mobile proxies (rotating)
|     -> High-protection targets need the highest trust
|
+-- Regional Exchange Data (SGX, SET, IDX)
|     -> Country-specific mobile proxies
|     -> Singapore proxy for SGX, Thai proxy for SET, etc.
|
+-- Real-Time Price Feeds
      -> Dedicated ISP or mobile proxy (sticky session)
      -> Persistent WebSocket connections
```
Configuration Example
```python
import time

import requests

# Configure proxies for financial scraping
PROXY_CONFIG = {
    # For Yahoo Finance API - rotating proxy
    "api_scraping": {
        "http": "http://user-country-us:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-us:pass@gateway.dataresearchtools.com:port",
    },
    # For SGX (Singapore Exchange) - Singapore mobile proxy
    "sgx_scraping": {
        "http": "http://user-country-sg:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-sg:pass@gateway.dataresearchtools.com:port",
    },
    # For SET (Thai Exchange) - Thailand mobile proxy
    "set_scraping": {
        "http": "http://user-country-th:pass@gateway.dataresearchtools.com:port",
        "https": "http://user-country-th:pass@gateway.dataresearchtools.com:port",
    },
}

def scrape_stock_data(ticker, source="yahoo"):
    # Fall back to the generic API proxy if no source-specific config exists
    proxy = PROXY_CONFIG.get(f"{source}_scraping", PROXY_CONFIG["api_scraping"])
    headers = {
        "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
        "Accept-Language": "en-US,en;q=0.9",
    }
    # Add an appropriate delay between requests to stay under rate limits
    time.sleep(2)
    url = f"https://finance.yahoo.com/quote/{ticker}"
    response = requests.get(url, proxies=proxy, headers=headers, timeout=30)
    return response
```
Technical Implementation Guide
Step 1: Identify Your Data Requirements
Before writing any code, document exactly what data you need:
Price data:
- Open, High, Low, Close, Volume (OHLCV)
- Real-time vs delayed vs end-of-day
- Time granularity (tick, 1-min, 5-min, hourly, daily)
Fundamental data:
- Financial statements (quarterly, annual)
- Earnings estimates
- Dividend history
- Company metrics (P/E, market cap, etc.)
Alternative data:
- News headlines and sentiment
- Analyst ratings changes
- Insider transactions
- Social media sentiment
Step 2: Build Your Scraper Architecture
A robust financial data scraper needs several components:
Request manager:
- Handles proxy rotation
- Implements rate limiting per target
- Manages retries with exponential backoff
- Logs all requests for debugging
Parser layer:
- Extracts structured data from HTML or JSON
- Handles format changes gracefully
- Validates data types (prices should be numbers, dates should be valid)
Storage layer:
- Time-series optimized database (InfluxDB, TimescaleDB, or even well-indexed PostgreSQL)
- Deduplication logic
- Data quality checks
Scheduler:
- Runs jobs at appropriate intervals
- Handles market hours vs off-hours differently
- Manages timezone conversions (critical for SEA markets spanning multiple zones)
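The scheduler's timezone handling can be sketched with the standard library's `zoneinfo`. The trading windows below are taken from this article's pipeline section but are deliberately simplified: they ignore lunch breaks (SET and IDX have them) and exchange holidays, which a production scheduler must also track:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# Trading windows per market, expressed in each exchange's local time zone.
# Simplified: no lunch breaks, no holiday calendar.
MARKET_HOURS = {
    "SGX": (ZoneInfo("Asia/Singapore"), time(9, 0), time(17, 0)),
    "SET": (ZoneInfo("Asia/Bangkok"), time(10, 0), time(16, 30)),
    "NYSE": (ZoneInfo("America/New_York"), time(9, 30), time(16, 0)),
}

def is_market_open(market, now=None):
    """True if `now` (any tz-aware datetime) falls inside the market's
    local weekday trading window."""
    tz, open_t, close_t = MARKET_HOURS[market]
    now = now or datetime.now(tz)
    local = now.astimezone(tz)
    if local.weekday() >= 5:  # Saturday or Sunday
        return False
    return open_t <= local.time() <= close_t
```

Because the check converts any tz-aware timestamp into the exchange's local zone first, one scheduler process can drive collectors for all markets without per-market clock arithmetic.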
Step 3: Handle Common Challenges
Challenge: Rate limiting
Financial websites aggressively rate-limit scrapers. Solutions:
- Distribute requests across multiple mobile proxy IPs
- Implement per-domain request queues with configurable delays
- Use exponential backoff: start at 2 seconds, double on each failure, cap at 60 seconds
- Schedule bulk scraping during off-peak hours (weekends for market data, overnight for news)
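The backoff schedule described above (start at 2 seconds, double on each failure, cap at 60 seconds) can be sketched as follows; `fetch` is any callable you supply, and the broad `except Exception` is a placeholder for the specific HTTP and timeout errors your client raises:

```python
import random
import time

def backoff_delays(base=2.0, cap=60.0, max_retries=8):
    """Yield the delay before each retry: 2s, 4s, 8s, ... capped at 60s."""
    delay = base
    for _ in range(max_retries):
        yield delay
        delay = min(delay * 2, cap)

def fetch_with_backoff(fetch, *args, **kwargs):
    """Call `fetch` until it succeeds, sleeping per the backoff schedule
    between failures; re-raise the last error once retries are exhausted."""
    last_error = None
    for delay in backoff_delays():
        try:
            return fetch(*args, **kwargs)
        except Exception as exc:  # in practice, catch only HTTP/timeout errors
            last_error = exc
            # Jitter prevents many workers from retrying in lockstep
            time.sleep(delay + random.uniform(0, 1))
    raise last_error
```

The jitter term matters when you run parallel scrapers: without it, workers that fail together retry together, re-triggering the rate limiter at the same instant.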
Challenge: JavaScript rendering
Many financial websites load data dynamically via JavaScript. You need either:
- A headless browser (Playwright, Puppeteer) to render the page
- Network analysis to find the underlying API calls (often more efficient)
For mobile proxies, headless browsers with mobile device emulation provide the most authentic traffic pattern.
Challenge: Data format changes
Financial websites change their HTML structure without warning. Protect yourself:
- Use multiple selectors for critical data points
- Implement data validation (a stock price should be positive, volume should be an integer)
- Set up alerts when scrapers return unexpected data
- Monitor scraper success rates daily
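A minimal sketch of the multiple-selector idea, using regexes as stand-ins for CSS selectors. The patterns below are hypothetical and must be matched to the actual markup of whatever page you scrape; the point is the ordered fallback plus validation before the value is accepted:

```python
import re

# Ordered fallback patterns for the same data point; if the site's markup
# changes and the first stops matching, later ones may still work.
PRICE_PATTERNS = [
    r'data-field="regularMarketPrice"[^>]*value="([\d.]+)"',
    r'class="current-price"[^>]*>([\d.]+)<',
    r'"price"\s*:\s*([\d.]+)',
]

def extract_price(html):
    """Try each pattern in order; validate that the match parses as a
    positive float before trusting it."""
    for pattern in PRICE_PATTERNS:
        match = re.search(pattern, html)
        if match:
            try:
                price = float(match.group(1))
            except ValueError:
                continue
            if price > 0:
                return price
    return None  # all selectors failed -- raise an alert upstream
```

Returning `None` rather than a default value is deliberate: a missing price should trip your monitoring, not silently pollute the dataset.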
Challenge: Market-specific encoding
SEA exchange websites may use local character encoding for company names and announcements. Ensure your scraper handles:
- Thai script (SET listings)
- Indonesian language (IDX listings)
- Chinese characters (SGX has dual-language listings)
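One way to handle mixed encodings, sketched with a fallback chain over the raw response bytes. The encoding order is an assumption based on the scripts listed above: TIS-620 is Thailand's legacy encoding, GB18030 covers Chinese, and latin-1 never raises, so it serves as a byte-preserving last resort (at the risk of mojibake that your validation layer should catch):

```python
# Try UTF-8 first, then legacy encodings seen on regional sites.
FALLBACK_ENCODINGS = ["utf-8", "tis-620", "gb18030", "latin-1"]

def decode_response(raw: bytes) -> str:
    """Decode response bytes using the first encoding that succeeds."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")
```

When the server sends a reliable `Content-Type` charset, prefer it over guessing; this chain is for the pages that omit or mislabel it.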
Step 4: Data Quality Assurance
Financial data errors can cost real money. Implement checks:
```python
import logging
from datetime import datetime

def validate_stock_data(data):
    checks = {
        "price_positive": data["close"] > 0,
        "volume_non_negative": data["volume"] >= 0,
        "high_gte_low": data["high"] >= data["low"],
        "close_in_range": data["low"] <= data["close"] <= data["high"],
        "date_valid": data["date"] <= datetime.now(),
        # No 100%+ jumps without manual review
        "price_reasonable": data["close"] < data["previous_close"] * 2,
    }
    failed = [k for k, v in checks.items() if not v]
    if failed:
        logging.warning("Data quality check failed: %s for %s", failed, data["ticker"])
        return False
    return True
```
Legal Considerations
What Is Generally Permissible
Based on current legal precedent (primarily US-based, with the hiQ v. LinkedIn ruling as the landmark case):
- Scraping publicly available data is generally legal
- Accessing data without authentication that is displayed to any visitor is typically permissible
- Collecting factual data (stock prices, financial figures) does not violate copyright (facts cannot be copyrighted)
What Requires Caution
- Terms of service violations can expose you to breach-of-contract claims (though enforcement is rare for small-scale scraping)
- Data redistribution may violate exchange licensing agreements
- Real-time data is often protected by exchange data agreements
- Circumventing technical access controls may raise legal issues under the CFAA (US) or equivalent laws
SEA-Specific Legal Considerations
Each SEA country has its own legal framework:
- Singapore: Strong intellectual property protections; respect SGX data licensing terms
- Thailand: SET data has specific redistribution restrictions
- Indonesia: OJK (financial regulator) oversees financial data distribution
- Philippines: PSE data policies are relatively permissive for personal use
Best Practices for Legal Compliance
- Scrape only publicly available data — do not bypass login walls or paywalls
- Respect robots.txt — at minimum, document your decision if you choose to access disallowed paths
- Do not redistribute raw data commercially without understanding licensing requirements
- Add value — derivative analysis and insights built on factual data are generally protected
- Rate-limit your scraping — do not overload servers (this is where proxies help by distributing load)
- Cache aggressively — do not re-scrape data you already have
Building a Multi-Market Financial Data Pipeline
For teams tracking stocks across both US and SEA markets, here is a practical pipeline architecture:
US Market Data (NYSE, NASDAQ)
Schedule: Market hours (9:30 AM - 4:00 PM ET)
Sources: Yahoo Finance API + Polygon.io
Proxy: US residential rotating proxies
Frequency: Every 5 minutes during trading hours, daily after close

Singapore Market Data (SGX)
Schedule: Market hours (9:00 AM - 5:00 PM SGT)
Sources: SGX website + Yahoo Finance (SGX-listed tickers)
Proxy: DataResearchTools Singapore mobile proxy
Frequency: Every 15 minutes during trading hours

Thailand Market Data (SET)
Schedule: Market hours (10:00 AM - 4:30 PM ICT)
Sources: SET website + Investing.com
Proxy: DataResearchTools Thailand mobile proxy
Frequency: Every 15 minutes during trading hours

Indonesia Market Data (IDX)
Schedule: Market hours (9:00 AM - 4:00 PM WIB)
Sources: IDX website + Yahoo Finance
Proxy: DataResearchTools Indonesia mobile proxy
Frequency: Every 15 minutes during trading hours

End-of-Day Consolidation
Schedule: After all markets close (10:00 PM SGT)
Tasks:
- Validate all collected data
- Fill gaps from backup sources
- Calculate derived metrics
- Export to analysis tools

Performance Optimization Tips
1. Cache Everything
Store previously collected data locally. Never re-scrape historical data that will not change (last year’s financial statements, historical price data).
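A minimal in-memory sketch of that policy: immutable historical data is cached with no expiry, while live quotes carry a short TTL. A production pipeline would back this with Redis or a database, but the interface is the same:

```python
import time

class QuoteCache:
    """In-memory cache with per-entry TTL. ttl=None means cache forever,
    which is appropriate for immutable historical data."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # evict stale entry
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._store[key] = (value, expires_at)

cache = QuoteCache()
cache.set("TICKER:2023-12-29:close", 100.0)        # historical: never expires
cache.set("TICKER:live_quote", 101.5, ttl=60)      # live quote: refresh after 60s
```

Checking this cache before every outbound request is usually the single largest reduction in proxy bandwidth cost.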
2. Use APIs When Possible
Scraping HTML is 10-100x more expensive (in bandwidth and processing) than calling an API. Always check for an API endpoint before building an HTML scraper.
3. Parallel Collection
Run scrapers for different markets in parallel since they operate in different time zones:
- SGX (UTC+8) and IDX (UTC+7) trading hours largely overlap
- SET trading hours partially overlap with SGX
- US markets open after SEA markets close
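Because scraping is I/O-bound, plain threads are enough for this parallelism. A sketch with `ThreadPoolExecutor`, where `collect_market` stands in for a real per-market collector that would use its own proxy pool and schedule:

```python
from concurrent.futures import ThreadPoolExecutor

def collect_market(market):
    """Hypothetical per-market collector: fetch, parse, and store data
    for one exchange. Stubbed here to return a status string."""
    return f"{market}: done"

MARKETS = ["SGX", "SET", "IDX", "PSE"]

def collect_all(markets):
    """Run one collector per market concurrently. I/O-bound scraping
    benefits from threads even under the GIL, since workers spend most
    of their time waiting on the network."""
    with ThreadPoolExecutor(max_workers=len(markets)) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(collect_market, markets))
```

Keeping one worker per market also keeps proxy geography clean: each thread can bind to the country-specific proxy its exchange requires.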
4. Smart Proxy Allocation
- Use cheaper datacenter/residential proxies for public APIs with published rate limits
- Reserve mobile proxies for protected websites and regional exchange access
- Match proxy geography to the exchange you are scraping
5. Monitor and Alert
Set up monitoring for:
- Scraper success rate dropping below 95%
- Unusual data patterns (potential scraping errors)
- Proxy costs exceeding budget thresholds
- Target website structure changes (CSS selector failures)
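The success-rate check from the list above can be sketched with a sliding window over recent request outcomes; the window size and minimum-sample guard are illustrative defaults:

```python
from collections import deque

class ScraperMonitor:
    """Track recent request outcomes and flag when the success rate
    drops below a threshold (95% per the checklist above)."""

    def __init__(self, window=200, threshold=0.95):
        self.window = deque(maxlen=window)  # rolling record of True/False outcomes
        self.threshold = threshold

    def record(self, success: bool):
        self.window.append(success)

    @property
    def success_rate(self):
        if not self.window:
            return 1.0
        return sum(self.window) / len(self.window)

    def should_alert(self, min_samples=50):
        # Require a minimum sample count so a single early failure
        # does not page anyone.
        return len(self.window) >= min_samples and self.success_rate < self.threshold
```

Hook `record()` into your request manager and poll `should_alert()` from the same scheduler that runs the scrapers; a sustained dip usually means a selector broke or a proxy pool got burned.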
Conclusion
Scraping stock market data is a powerful capability for traders, researchers, and analysts who need cost-effective access to financial information across global markets. Mobile proxies are the key enabler for accessing protected financial platforms reliably, especially when targeting regional exchanges in Southeast Asia.
The combination of official APIs for high-volume standardized data and targeted web scraping with mobile proxies for protected sources gives you comprehensive coverage. For SEA market data, DataResearchTools mobile proxies provide the country-specific carrier IPs needed to access SGX, SET, IDX, and PSE data as a local user — a requirement that generic proxy providers cannot reliably meet.
Build your financial data pipeline thoughtfully: respect legal boundaries, implement robust quality checks, cache aggressively, and allocate proxy resources based on the specific protection level of each data source. The result is a cost-effective, reliable data infrastructure that supports informed financial decision-making across markets.
Related Reading
- How to Collect Cryptocurrency Price Data Across Exchanges
- NFT Minting with Mobile Proxies: Multi-Wallet Setup Guide
- How to Avoid IP-Based Sybil Detection in Crypto Protocols
- Best Proxies for Binance, Bybit, and OKX API Trading
- 403 Forbidden Error: What It Means & How to Fix It
- 403 Forbidden in Web Scraping: How to Fix It