How to Scrape Betting Odds from Multiple Bookmakers
Scraping betting odds from bookmakers is one of the most technically demanding forms of web scraping. Bookmakers use cutting-edge anti-bot technology, serve odds through complex JavaScript frameworks, update prices every few seconds, and actively detect and block automated access. Yet the data is enormously valuable for odds comparison, market analysis, trading models, and research.
This guide provides a detailed, technical walkthrough for scraping odds from major bookmakers, with specific strategies for each platform and practical proxy configurations.
Bookmaker Landscape: Know Your Targets
Major International Bookmakers
| Bookmaker | Base | Primary Tech Stack | Scraping Difficulty | Best Proxy Type |
|---|---|---|---|---|
| Bet365 | UK | React, WebSocket | 10/10 | Mobile (UK) |
| Pinnacle | Curacao | React, REST API | 5/10 | Mobile (any) |
| Betfair Exchange | UK | Angular, REST API | 6/10 | Mobile (UK/IE) |
| William Hill | UK | React, WebSocket | 8/10 | Mobile (UK) |
| 1xBet | Curacao | Custom framework | 6/10 | Mobile (varies) |
| Betway | Malta | React | 7/10 | Mobile (EU) |
Asian Bookmakers
| Bookmaker | Base | Primary Tech Stack | Scraping Difficulty | Best Proxy Type |
|---|---|---|---|---|
| Sbobet | Philippines | Custom, AJAX | 7/10 | Mobile (SEA) |
| Maxbet/IBCBet | Philippines | Custom | 6/10 | Mobile (SEA) |
| M88 | Philippines | HTML + AJAX | 5/10 | Mobile (SEA) |
| W88 | Philippines | HTML + JS | 5/10 | Mobile (SEA) |
| 188bet | Isle of Man | React | 6/10 | Mobile (SEA/UK) |
| 12BET | Philippines | HTML | 4/10 | Mobile (SEA) |
| Fun88 | Philippines | Custom | 5/10 | Mobile (SEA) |
Asian bookmakers are particularly important because they often set the market for football (soccer) odds. Sharp bettors and trading firms watch Asian lines closely because they tend to move first.
DataResearchTools mobile proxies cover all major Southeast Asian markets, making them ideal for scraping Asian bookmakers that require regional IP addresses.
Technical Approaches by Bookmaker
Bet365: The Hardest Target
Bet365 is widely considered the most difficult bookmaker to scrape. Their anti-bot measures include:
- Custom JavaScript obfuscation that changes frequently
- WebSocket-based odds delivery
- Advanced browser fingerprinting
- Geographic IP verification
- Behavioral analysis (mouse movements, scroll patterns)
- Device attestation
Approach: Full Browser Automation
```python
from playwright.async_api import async_playwright
import asyncio
import json

class Bet365Scraper:
    def __init__(self, proxy_config):
        self.proxy = {
            "server": f"http://{proxy_config['host']}:{proxy_config['port']}",
            "username": proxy_config["user"],
            "password": proxy_config["pass"]
        }

    async def scrape(self, sport="soccer"):
        async with async_playwright() as p:
            browser = await p.chromium.launch(
                proxy=self.proxy,
                headless=False  # Bet365 detects headless browsers
            )
            context = await browser.new_context(
                viewport={"width": 412, "height": 915},
                user_agent=(
                    "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/121.0.0.0 Mobile Safari/537.36"
                ),
                locale="en-GB",
                timezone_id="Europe/London",
                geolocation={"latitude": 51.5074, "longitude": -0.1278},
                permissions=["geolocation"]
            )
            page = await context.new_page()

            # Intercept WebSocket messages for odds data
            ws_messages = []
            page.on("websocket", lambda ws: self.handle_websocket(ws, ws_messages))

            await page.goto("https://www.bet365.com", wait_until="networkidle")
            await page.wait_for_timeout(3000)

            # Human-like interaction
            await self.simulate_human_behavior(page)

            # Navigate to target sport
            sport_link = await page.query_selector(f'text="{sport.title()}"')
            if sport_link:
                await sport_link.click()
                await page.wait_for_timeout(2000)

            # Collect odds from the page
            odds_data = await self.extract_odds(page)
            await browser.close()
            return odds_data

    async def simulate_human_behavior(self, page):
        """Simulate realistic human browsing"""
        import random
        # Random mouse movements
        for _ in range(random.randint(3, 7)):
            x = random.randint(50, 350)
            y = random.randint(100, 800)
            await page.mouse.move(x, y)
            await page.wait_for_timeout(random.randint(200, 800))
        # Random scroll
        await page.mouse.wheel(0, random.randint(100, 500))
        await page.wait_for_timeout(random.randint(500, 1500))

    def handle_websocket(self, ws, messages):
        """Capture WebSocket messages containing odds"""
        ws.on("framereceived", lambda data: messages.append(data))

    async def extract_odds(self, page):
        """Extract odds from the rendered page"""
        # Bet365 uses dynamic class names, so use structural selectors
        events = await page.query_selector_all("[class*='event']")
        results = []
        for event in events:
            try:
                teams = await event.query_selector_all("[class*='participant']")
                odds_cells = await event.query_selector_all("[class*='odds']")
                if teams and odds_cells:
                    result = {
                        "home": await teams[0].inner_text() if len(teams) > 0 else None,
                        "away": await teams[1].inner_text() if len(teams) > 1 else None,
                        "odds": [await cell.inner_text() for cell in odds_cells]
                    }
                    results.append(result)
            except Exception:
                continue
        return results
```

Critical notes for Bet365:
- Use non-headless browsers (or undetectable headless setups)
- Mobile proxies from the UK are essential since Bet365 verifies geographic location
- Rotate browser profiles, not just IPs
- Limit sessions to 15-20 minutes before creating a new one
- DataResearchTools mobile proxies with UK endpoints provide the geographic authenticity Bet365 requires
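The session discipline described above (15-20 minute sessions, fresh profiles, capped request counts) can be enforced with a small budget object. This is an illustrative sketch; `SessionBudget` is not part of any library, and the numbers are the guideline values from the notes above.

```python
import time
import random

class SessionBudget:
    """Track one session's age and request count and decide when to
    retire it in favor of a fresh proxy IP and browser profile."""

    def __init__(self, max_minutes=17, max_requests=30):
        # Jitter the lifetime so sessions don't rotate on a fixed schedule
        lifetime = random.uniform(max_minutes * 0.8, max_minutes * 1.2) * 60
        self.deadline = time.monotonic() + lifetime
        self.requests_left = max_requests

    def spend(self):
        """Consume one request slot; False means retire the session."""
        if self.requests_left <= 0 or time.monotonic() >= self.deadline:
            return False
        self.requests_left -= 1
        return True

budget = SessionBudget(max_minutes=17, max_requests=30)
used = 0
while budget.spend():
    used += 1  # ...perform one scrape request here...
print(used)  # 30: the request cap is hit long before the time limit
```

When `spend()` returns False, tear down the browser context, request a new sticky proxy session, and start a fresh `SessionBudget`.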
Pinnacle: The Accessible Sharp Book
Pinnacle is the most scraper-friendly major bookmaker, partly because they welcome sharp bettors and do not limit winning accounts. Their odds serve as the market benchmark.
Approach: API-Style Scraping
```python
import time
import random
from datetime import datetime

import requests
from bs4 import BeautifulSoup

class PinnacleScraper:
    def __init__(self, proxy_config):
        proxy_url = (
            f"http://{proxy_config['user']}:{proxy_config['pass']}"
            f"@{proxy_config['host']}:{proxy_config['port']}"
        )
        self.proxy = {"http": proxy_url, "https": proxy_url}
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
                          "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
            "Accept": "application/json, text/html",
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": "https://www.pinnacle.com/",
            "X-Requested-With": "XMLHttpRequest"
        }
        self.session = requests.Session()
        self.session.proxies = self.proxy
        self.session.headers.update(self.headers)

    def get_sports(self):
        """Get available sports"""
        response = self.session.get(
            "https://guest.api.arcadia.pinnacle.com/0.1/sports",
            timeout=30
        )
        return response.json()

    def get_leagues(self, sport_id):
        """Get leagues for a sport"""
        response = self.session.get(
            f"https://guest.api.arcadia.pinnacle.com/0.1/sports/{sport_id}/leagues",
            timeout=30
        )
        return response.json()

    def get_matchups(self, sport_id, league_id=None):
        """Get events and odds"""
        url = f"https://guest.api.arcadia.pinnacle.com/0.1/sports/{sport_id}/matchups"
        if league_id:
            url += f"?leagueId={league_id}"
        response = self.session.get(url, timeout=30)
        return response.json()

    def get_odds(self, matchup_id):
        """Get detailed odds for a specific event"""
        response = self.session.get(
            f"https://guest.api.arcadia.pinnacle.com/0.1/matchups/{matchup_id}/markets/related/straight",
            timeout=30
        )
        return response.json()

    def scrape_all_football_odds(self):
        """Scrape all football odds"""
        # Football sport_id is typically 29
        matchups = self.get_matchups(sport_id=29)
        all_odds = []
        for matchup in matchups:
            odds = self.get_odds(matchup["id"])
            all_odds.append({
                "event": matchup,
                "odds": odds,
                "scraped_at": datetime.utcnow().isoformat()
            })
            # Respectful rate limiting
            time.sleep(random.uniform(1, 3))
        return all_odds
```

Sbobet: The Asian Market Leader
Sbobet sets the line for Asian handicap markets and is heavily used by professional bettors in Southeast Asia.
Approach: AJAX Interception
```python
import time
import random

import requests
from bs4 import BeautifulSoup

class SbobetScraper:
    def __init__(self, proxy_config):
        proxy_url = (
            f"http://{proxy_config['user']}:{proxy_config['pass']}"
            f"@{proxy_config['host']}:{proxy_config['port']}"
        )
        self.proxy = {"http": proxy_url, "https": proxy_url}
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; Samsung SM-A546B) "
                          "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
            "Accept-Language": "th-TH,th;q=0.9,en;q=0.8",
            "Referer": "https://www.sbobet.com/",
        }

    def scrape_football(self):
        """Scrape Sbobet football odds"""
        session = requests.Session()
        session.proxies = self.proxy
        session.headers.update(self.headers)
        # Load the main page first (establish session cookies)
        session.get("https://www.sbobet.com/", timeout=30)
        time.sleep(random.uniform(2, 4))
        # Access the football section via AJAX endpoint
        response = session.get(
            "https://www.sbobet.com/web-root/restricted/sport/football/today",
            timeout=30
        )
        return self.parse_sbobet_odds(response.text)

    def parse_sbobet_odds(self, html):
        """Parse Sbobet's odds from the response.

        extract_handicap, parse_odds, and extract_total are small
        cell-parsing helper methods, omitted here for brevity.
        """
        soup = BeautifulSoup(html, "html.parser")
        events = []
        for row in soup.select(".GameList tr"):
            try:
                teams = row.select(".TeamName")
                odds_cells = row.select(".OddsPrice")
                if teams and odds_cells:
                    event = {
                        "home": teams[0].text.strip() if len(teams) > 0 else None,
                        "away": teams[1].text.strip() if len(teams) > 1 else None,
                        "handicap": self.extract_handicap(row),
                        "odds_home": self.parse_odds(odds_cells[0].text),
                        "odds_away": self.parse_odds(odds_cells[1].text) if len(odds_cells) > 1 else None,
                        "total": self.extract_total(row)
                    }
                    events.append(event)
            except Exception:
                continue
        return events
```

For Sbobet, a Southeast Asian mobile proxy is essential. Sbobet restricts access based on geographic location and is primarily accessible from Asian IP addresses. DataResearchTools Thai, Indonesian, and Philippine mobile proxies provide the geographic authenticity needed.
Betfair Exchange: Unique Data Source
Betfair is a betting exchange, not a traditional bookmaker. Its odds are set by the market (bettors against each other), making it a unique data source.
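Because exchange prices are set by matched bets rather than a margin-setting bookmaker, the book's overround (the sum of implied probabilities across all outcomes) is a useful quality signal for the data you collect. A quick sketch, with purely illustrative prices:

```python
def implied_probability(decimal_odds):
    """Implied probability of a decimal price."""
    return 1 / decimal_odds

def overround(prices):
    """Sum of implied probabilities; 1.0 means a perfectly efficient book."""
    return sum(implied_probability(p) for p in prices)

# Best available back prices for a Home/Draw/Away market on an exchange
back_prices = [2.04, 3.5, 4.1]
print(round(overround(back_prices), 4))  # 1.0198: about a 2% overround
```

Exchange back prices typically carry an overround of only 1-2%, versus 5-10% at traditional bookmakers, which is why Betfair prices are often treated as close-to-true probabilities.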
Approach: Official API (Preferred)
Betfair offers an official API for data access:
```python
import betfairlightweight

class BetfairScraper:
    def __init__(self, username, password, app_key, proxy_config):
        self.trading = betfairlightweight.APIClient(
            username=username,
            password=password,
            app_key=app_key
        )
        # Configure proxy
        proxy_url = (
            f"http://{proxy_config['user']}:{proxy_config['pass']}"
            f"@{proxy_config['host']}:{proxy_config['port']}"
        )
        self.trading.session.proxies = {"http": proxy_url, "https": proxy_url}
        self.trading.login()

    def get_football_markets(self):
        """Get all active football markets"""
        event_filter = betfairlightweight.filters.market_filter(
            event_type_ids=["1"],  # Football
            market_type_codes=["MATCH_ODDS", "OVER_UNDER_25"],
            in_play_only=False
        )
        markets = self.trading.betting.list_market_catalogue(
            filter=event_filter,
            max_results=100,
            market_projection=["RUNNER_DESCRIPTION", "MARKET_START_TIME"]
        )
        return markets

    def get_market_odds(self, market_id):
        """Get current odds for a market"""
        price_projection = betfairlightweight.filters.price_projection(
            price_data=["EX_BEST_OFFERS"]
        )
        market_books = self.trading.betting.list_market_book(
            market_ids=[market_id],
            price_projection=price_projection
        )
        return market_books
```

Data Pipeline Architecture
Real-Time Odds Collection
```python
import asyncio
from datetime import datetime

class OddsPipeline:
    def __init__(self, scrapers, database, alert_system):
        self.scrapers = scrapers
        self.db = database
        self.alerts = alert_system

    async def run(self):
        """Main pipeline loop"""
        while True:
            tasks = [
                asyncio.create_task(self.scrape_and_store(scraper))
                for scraper in self.scrapers
            ]
            results = await asyncio.gather(*tasks, return_exceptions=True)
            # Log results
            for scraper, result in zip(self.scrapers, results):
                if isinstance(result, Exception):
                    self.alerts.send(
                        f"Scraper error for {scraper.name}: {result}"
                    )
            # Wait before next cycle
            await asyncio.sleep(30)  # Adjust based on your needs

    async def scrape_and_store(self, scraper):
        """Scrape odds from one bookmaker and store the results"""
        odds_data = await scraper.scrape()
        timestamp = datetime.utcnow()
        records = []
        for event in odds_data:
            for market in event.get("markets", []):
                for selection in market.get("selections", []):
                    records.append({
                        "bookmaker": scraper.name,
                        "event_id": event["id"],
                        "event_name": event["name"],
                        "sport": event["sport"],
                        "market_type": market["type"],
                        "selection": selection["name"],
                        "odds": selection["odds"],
                        "timestamp": timestamp
                    })
        await self.db.bulk_insert(records)
        return len(records)
```

Data Normalization
Every bookmaker presents odds differently. Normalize into a common format:
```python
from datetime import datetime

class OddsNormalizer:
    """Normalize odds data from various bookmakers into a standard format"""

    SPORT_MAPPING = {
        # Bet365
        "Soccer": "football",
        "Basketball": "basketball",
        "Tennis": "tennis",
        # Pinnacle
        "Football": "football",
        # Sbobet
        "football": "football",
    }

    MARKET_MAPPING = {
        "1X2": "match_result",
        "MATCH_ODDS": "match_result",
        "MoneyLine": "match_result",
        "Asian Handicap": "asian_handicap",
        "AH": "asian_handicap",
        "Over/Under": "total",
        "OVER_UNDER": "total",
        "O/U": "total",
    }

    def normalize(self, raw_odds, bookmaker):
        """Convert a bookmaker-specific format to the standard one"""
        return {
            "bookmaker": bookmaker,
            "sport": self.SPORT_MAPPING.get(raw_odds.get("sport"), raw_odds.get("sport", "").lower()),
            "league": raw_odds.get("league", ""),
            "event": {
                "home": raw_odds.get("home_team"),
                "away": raw_odds.get("away_team"),
                "start_time": raw_odds.get("start_time"),
            },
            "market": {
                "type": self.MARKET_MAPPING.get(raw_odds.get("market_type"), raw_odds.get("market_type")),
                "line": raw_odds.get("line"),
            },
            "selections": self.normalize_selections(raw_odds),
            "timestamp": datetime.utcnow().isoformat()
        }

    def normalize_selections(self, raw_odds):
        """Normalize selection names and odds values.

        clean_selection_name is a small string-cleanup helper,
        omitted here for brevity.
        """
        selections = []
        for sel in raw_odds.get("selections", []):
            selections.append({
                "name": self.clean_selection_name(sel["name"]),
                "odds_decimal": self.to_decimal(sel.get("odds"), sel.get("odds_format", "decimal")),
                "status": sel.get("status", "active")
            })
        return selections

    def to_decimal(self, odds, format_type):
        """Convert any odds format to decimal"""
        if format_type == "decimal":
            return float(odds)
        elif format_type == "american":
            return (odds / 100) + 1 if odds > 0 else (100 / abs(odds)) + 1
        elif format_type == "hongkong":
            return float(odds) + 1
        elif format_type in ("malay", "indonesian"):
            # Positive odds convert like Hong Kong; negative odds are inverted
            if odds >= 0:
                return float(odds) + 1
            return (1 / abs(float(odds))) + 1
        return float(odds)
```

Proxy Rotation Strategy by Bookmaker
Customized Rotation Policies
```python
ROTATION_POLICIES = {
    "bet365": {
        "proxy_type": "mobile",
        "country": "GB",
        "session_type": "sticky",
        "session_duration_minutes": 15,
        "requests_per_session": 30,
        "cooldown_minutes": 10,
        "concurrent_sessions": 1,
        "notes": "Most aggressive anti-bot. Single session, short duration."
    },
    "pinnacle": {
        "proxy_type": "mobile",
        "country": "any",
        "session_type": "rotating",
        "requests_per_ip": 50,
        "cooldown_minutes": 0,
        "concurrent_sessions": 3,
        "notes": "Tolerant of scraping. Can run multiple sessions."
    },
    "sbobet": {
        "proxy_type": "mobile",
        "country": ["TH", "ID", "PH", "MY"],
        "session_type": "sticky",
        "session_duration_minutes": 30,
        "requests_per_session": 40,
        "cooldown_minutes": 5,
        "concurrent_sessions": 2,
        "notes": "Requires SEA IP. DataResearchTools SEA proxies recommended."
    },
    "betfair": {
        "proxy_type": "mobile",
        "country": ["GB", "IE", "AU"],
        "session_type": "sticky",
        "session_duration_minutes": 60,
        "requests_per_session": 100,
        "cooldown_minutes": 0,
        "concurrent_sessions": 2,
        "notes": "API-based. Stable sessions preferred."
    },
    "m88": {
        "proxy_type": "mobile",
        "country": ["TH", "VN", "ID"],
        "session_type": "sticky",
        "session_duration_minutes": 45,
        "requests_per_session": 60,
        "cooldown_minutes": 3,
        "concurrent_sessions": 2,
        "notes": "Standard SEA bookmaker. Moderate protection."
    }
}
```

Handling Common Scraping Challenges
Challenge 1: Dynamic Content Loading
Many bookmakers load odds asynchronously after the initial page load:
```python
from playwright.async_api import TimeoutError as PlaywrightTimeoutError

async def wait_for_odds(page, timeout=10000):
    """Wait for odds to appear on the page"""
    try:
        await page.wait_for_selector(
            "[class*='odds'], [class*='price'], [data-odds]",
            timeout=timeout,
            state="visible"
        )
        # Additional wait for all odds to stabilize
        await page.wait_for_timeout(2000)
    except PlaywrightTimeoutError:
        # Playwright raises its own TimeoutError, not the builtin
        print("Odds did not load within timeout")
        return False
    return True
```

Challenge 2: Odds Format Differences
Asian bookmakers often display odds in Malay, Hong Kong, or Indonesian format:
| Format | Favorite | Underdog | Notes |
|---|---|---|---|
| Decimal | 1.85 | 2.10 | European standard |
| American | -118 | +110 | US standard |
| Hong Kong | 0.85 | 1.10 | HK = Decimal – 1 |
| Malay | 0.85 | -0.91 | Neg = inverse |
| Indonesian | -1.18 | 1.10 | Inverse of Malay |
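The conversions in the table are easy to spot-check in a few lines; these standalone helpers mirror the formulas used by the `to_decimal` method shown earlier:

```python
def hk_to_decimal(o):
    return o + 1  # Hong Kong odds are decimal odds minus 1

def american_to_decimal(o):
    return o / 100 + 1 if o > 0 else 100 / abs(o) + 1

def malay_to_decimal(o):
    # Positive Malay odds convert like Hong Kong; negative odds invert
    return o + 1 if o >= 0 else 1 / abs(o) + 1

print(round(hk_to_decimal(0.85), 2))        # 1.85
print(round(american_to_decimal(-118), 3))  # 1.847
print(round(american_to_decimal(110), 2))   # 2.1
print(round(malay_to_decimal(-0.91), 3))    # 2.099
```

Note that -118 and 0.85 describe the same price: converting everything to decimal immediately after scraping avoids format bugs downstream.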
Challenge 3: Market Matching
The same event appears differently across bookmakers. Matching events requires fuzzy matching:
```python
from fuzzywuzzy import fuzz

def match_events(event_a, event_b, threshold=85):
    """Determine if two events from different bookmakers are the same"""
    # Compare team names
    home_score = fuzz.ratio(event_a["home"].lower(), event_b["home"].lower())
    away_score = fuzz.ratio(event_a["away"].lower(), event_b["away"].lower())
    # Check if start times are close (within 5 minutes)
    time_diff = abs(
        (event_a["start_time"] - event_b["start_time"]).total_seconds()
    )
    time_match = time_diff < 300
    # Both team names must match well, and time must be close
    return home_score >= threshold and away_score >= threshold and time_match
```

Challenge 4: Geographic Restrictions
Some bookmakers are only accessible from specific countries. DataResearchTools mobile proxies solve this by providing genuine mobile IPs from the required regions:
| Bookmaker | Accessible Regions | DataResearchTools Coverage |
|---|---|---|
| Sbobet | Southeast Asia | Thailand, Indonesia, Philippines, Malaysia, Vietnam |
| M88 | Asia | Full SEA coverage |
| W88 | Asia | Full SEA coverage |
| Bet365 | UK, EU, select others | UK endpoints available |
| Betfair | UK, Ireland, Australia | UK endpoints available |
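The access table above translates naturally into a per-bookmaker country lookup. The sketch below is illustrative: the gateway hostname, port, and the `user-country-XX` username convention are placeholders, not real DataResearchTools endpoints, so check your provider's documentation for the actual syntax.

```python
import random

# Allowed proxy countries per bookmaker, from the access table above
PROXY_COUNTRIES = {
    "sbobet":  ["TH", "ID", "PH", "MY", "VN"],
    "m88":     ["TH", "VN", "ID"],
    "w88":     ["TH", "VN", "ID"],
    "bet365":  ["GB"],
    "betfair": ["GB", "IE", "AU"],
}

def proxy_for(bookmaker, user, password,
              gateway="gw.example-proxy.net", port=8000):
    """Build a requests-style proxy dict targeting an allowed country."""
    country = random.choice(PROXY_COUNTRIES[bookmaker])
    # Country targeting via the username is a common provider convention
    url = f"http://{user}-country-{country}:{password}@{gateway}:{port}"
    return {"http": url, "https": url}, country

proxies, country = proxy_for("sbobet", "user123", "secret")
print(country in PROXY_COUNTRIES["sbobet"])  # True
```

The returned dict plugs directly into `requests.Session.proxies`, so the same lookup can feed all of the requests-based scrapers in this guide.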
Monitoring and Maintenance
Health Checks
```python
class ScraperHealthMonitor:
    def __init__(self):
        self.metrics = {}

    def record_scrape(self, bookmaker, success, duration, records_count):
        if bookmaker not in self.metrics:
            self.metrics[bookmaker] = {
                "total_scrapes": 0,
                "successful": 0,
                "failed": 0,
                "avg_duration": 0,
                "total_records": 0
            }
        m = self.metrics[bookmaker]
        m["total_scrapes"] += 1
        if success:
            m["successful"] += 1
            m["total_records"] += records_count
        else:
            m["failed"] += 1
        # Running average
        m["avg_duration"] = (
            (m["avg_duration"] * (m["total_scrapes"] - 1) + duration)
            / m["total_scrapes"]
        )

    def get_health_report(self):
        report = {}
        for bookmaker, m in self.metrics.items():
            success_rate = m["successful"] / max(m["total_scrapes"], 1) * 100
            report[bookmaker] = {
                "success_rate": f"{success_rate:.1f}%",
                "avg_duration": f"{m['avg_duration']:.1f}s",
                "total_records": m["total_records"],
                "status": (
                    "healthy" if success_rate > 90
                    else "degraded" if success_rate > 70
                    else "failing"
                )
            }
        return report
```

Conclusion
Scraping betting odds from multiple bookmakers is complex but achievable when you pair the right tools with bookmaker-specific tactics: Bet365 demands full browser automation with UK mobile proxies, Pinnacle offers relatively accessible API-style endpoints, and Asian bookmakers like Sbobet require Southeast Asian mobile IPs.
DataResearchTools mobile proxies provide the geographic coverage and IP quality needed to access bookmakers across both European and Asian markets. Their Southeast Asian carrier network is particularly valuable for scraping the Asian bookmakers that professional bettors rely on for sharp pricing.
Start with the easiest targets (Pinnacle, smaller Asian books), build your normalization pipeline, and then tackle the harder bookmakers as your infrastructure matures. The odds data you collect will power comparison tools, arbitrage detection, market analysis, and predictive models that create genuine competitive advantage in the sports betting ecosystem.
Related Reading
- Mobile Proxies for Sports Betting Odds Scraping
- Proxies for Arbitrage Betting: Multi-Account Management Guide
- Best Mobile Proxies for Sneaker Botting in 2026
- How to Set Up Proxies with Sneaker Bots (Kodai, Cyber, Sole AIO)
- 403 Forbidden Error: What It Means & How to Fix It
- 407 Proxy Authentication Required: Fix Guide