How to Scrape Betting Odds from Multiple Bookmakers

Scraping betting odds from bookmakers is one of the most technically demanding forms of web scraping. Bookmakers use cutting-edge anti-bot technology, serve odds through complex JavaScript frameworks, update prices every few seconds, and actively detect and block automated access. Yet the data is enormously valuable for odds comparison, market analysis, trading models, and research.

This guide provides a detailed, technical walkthrough for scraping odds from major bookmakers, with specific strategies for each platform and practical proxy configurations.

Bookmaker Landscape: Know Your Targets

Major International Bookmakers

Bookmaker          Base          Primary Tech Stack   Scraping Difficulty   Best Proxy Type
Bet365             UK            React, WebSocket     10/10                 Mobile (UK)
Pinnacle           Curacao       React, REST API      5/10                  Mobile (any)
Betfair Exchange   UK            Angular, REST API    6/10                  Mobile (UK/IE)
William Hill       UK            React, WebSocket     8/10                  Mobile (UK)
1xBet              Curacao       Custom framework     6/10                  Mobile (varies)
Betway             Malta         React                7/10                  Mobile (EU)

Asian Bookmakers

Bookmaker       Base          Primary Tech Stack   Scraping Difficulty   Best Proxy Type
Sbobet          Philippines   Custom, AJAX         7/10                  Mobile (SEA)
Maxbet/IBCBet   Philippines   Custom               6/10                  Mobile (SEA)
M88             Philippines   HTML + AJAX          5/10                  Mobile (SEA)
W88             Philippines   HTML + JS            5/10                  Mobile (SEA)
188bet          Isle of Man   React                6/10                  Mobile (SEA/UK)
12BET           Philippines   HTML                 4/10                  Mobile (SEA)
Fun88           Philippines   Custom               5/10                  Mobile (SEA)

Asian bookmakers are particularly important because they often set the market for football (soccer) odds. Sharp bettors and trading firms watch Asian lines closely because they tend to move first.

DataResearchTools mobile proxies cover all major Southeast Asian markets, making them ideal for scraping Asian bookmakers that require regional IP addresses.

Technical Approaches by Bookmaker

Bet365: The Hardest Target

Bet365 is widely considered the most difficult bookmaker to scrape. Their anti-bot measures include:

  • Custom JavaScript obfuscation that changes frequently
  • WebSocket-based odds delivery
  • Advanced browser fingerprinting
  • Geographic IP verification
  • Behavioral analysis (mouse movements, scroll patterns)
  • Device attestation

Approach: Full Browser Automation

from playwright.async_api import async_playwright
import asyncio
import json

class Bet365Scraper:
    def __init__(self, proxy_config):
        self.proxy = {
            "server": f"http://{proxy_config['host']}:{proxy_config['port']}",
            "username": proxy_config["user"],
            "password": proxy_config["pass"]
        }

    async def scrape(self, sport="soccer"):
        async with async_playwright() as p:
            browser = await p.chromium.launch(
                proxy=self.proxy,
                headless=False  # Bet365 detects headless browsers
            )

            context = await browser.new_context(
                viewport={"width": 412, "height": 915},
                user_agent=(
                    "Mozilla/5.0 (Linux; Android 14; SM-S918B) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) "
                    "Chrome/121.0.0.0 Mobile Safari/537.36"
                ),
                locale="en-GB",
                timezone_id="Europe/London",
                geolocation={"latitude": 51.5074, "longitude": -0.1278},
                permissions=["geolocation"]
            )

            page = await context.new_page()

            # Intercept WebSocket messages for odds data
            ws_messages = []

            page.on("websocket", lambda ws: self.handle_websocket(ws, ws_messages))

            await page.goto("https://www.bet365.com", wait_until="networkidle")

            # Navigate to sport section
            await page.wait_for_timeout(3000)

            # Human-like interaction
            await self.simulate_human_behavior(page)

            # Navigate to target sport
            sport_link = await page.query_selector(f'text="{sport.title()}"')
            if sport_link:
                await sport_link.click()
                await page.wait_for_timeout(2000)

            # Collect odds from the page
            odds_data = await self.extract_odds(page)

            await browser.close()
            return odds_data

    async def simulate_human_behavior(self, page):
        """Simulate realistic human browsing"""
        import random

        # Random mouse movements
        for _ in range(random.randint(3, 7)):
            x = random.randint(50, 350)
            y = random.randint(100, 800)
            await page.mouse.move(x, y)
            await page.wait_for_timeout(random.randint(200, 800))

        # Random scroll
        await page.mouse.wheel(0, random.randint(100, 500))
        await page.wait_for_timeout(random.randint(500, 1500))

    def handle_websocket(self, ws, messages):
        """Capture WebSocket messages containing odds"""
        ws.on("framereceived", lambda data: messages.append(data))

    async def extract_odds(self, page):
        """Extract odds from the rendered page"""
        # Bet365 uses dynamic class names, so use structural selectors
        events = await page.query_selector_all("[class*='event']")

        results = []
        for event in events:
            try:
                teams = await event.query_selector_all("[class*='participant']")
                odds_cells = await event.query_selector_all("[class*='odds']")

                if teams and odds_cells:
                    result = {
                        "home": await teams[0].inner_text() if len(teams) > 0 else None,
                        "away": await teams[1].inner_text() if len(teams) > 1 else None,
                        "odds": [await cell.inner_text() for cell in odds_cells]
                    }
                    results.append(result)
            except Exception:
                continue

        return results

Critical notes for Bet365:

  • Use non-headless browsers (or undetectable headless setups)
  • Mobile proxies from the UK are essential since Bet365 verifies geographic location
  • Rotate browser profiles, not just IPs
  • Limit sessions to 15-20 minutes before creating a new one
  • DataResearchTools mobile proxies with UK endpoints provide the geographic authenticity Bet365 requires
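
The session-discipline advice above can be sketched as a small helper that tracks one session's age and request count. The class and its limits are illustrative (mirroring the 15-minute, ~30-request guidance), not part of any bookmaker or Playwright API:

```python
import time

class SessionRotator:
    """Track one scraping session's age and request budget (illustrative)."""

    def __init__(self, max_age_seconds=15 * 60, max_requests=30,
                 clock=time.monotonic):
        self.max_age = max_age_seconds
        self.max_requests = max_requests
        self.clock = clock
        self._started = clock()
        self._requests = 0

    def record_request(self):
        self._requests += 1

    def should_rotate(self):
        """True once the session is too old or has spent its budget."""
        too_old = (self.clock() - self._started) >= self.max_age
        exhausted = self._requests >= self.max_requests
        return too_old or exhausted

    def reset(self):
        """Call after launching a fresh browser context on a new profile."""
        self._started = self.clock()
        self._requests = 0
```

In the Bet365 scraper, check should_rotate() between page interactions and tear down the browser context (switching proxy session and browser profile together) whenever it returns True.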

Pinnacle: The Accessible Sharp Book

Pinnacle is the most scraper-friendly major bookmaker, partly because they welcome sharp bettors and do not limit winning accounts. Their odds serve as the market benchmark.

Approach: API-Style Scraping

import random
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup

class PinnacleScraper:
    def __init__(self, proxy_config):
        self.proxy = {
            "http": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}",
            "https": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}"
        }
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; Pixel 8) "
                           "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
            "Accept": "application/json, text/html",
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": "https://www.pinnacle.com/",
            "X-Requested-With": "XMLHttpRequest"
        }
        self.session = requests.Session()
        self.session.proxies = self.proxy
        self.session.headers.update(self.headers)

    def get_sports(self):
        """Get available sports"""
        response = self.session.get(
            "https://guest.api.arcadia.pinnacle.com/0.1/sports",
            timeout=30
        )
        return response.json()

    def get_leagues(self, sport_id):
        """Get leagues for a sport"""
        response = self.session.get(
            f"https://guest.api.arcadia.pinnacle.com/0.1/sports/{sport_id}/leagues",
            timeout=30
        )
        return response.json()

    def get_matchups(self, sport_id, league_id=None):
        """Get events and odds"""
        url = f"https://guest.api.arcadia.pinnacle.com/0.1/sports/{sport_id}/matchups"
        if league_id:
            url += f"?leagueId={league_id}"

        response = self.session.get(url, timeout=30)
        return response.json()

    def get_odds(self, matchup_id):
        """Get detailed odds for a specific event"""
        response = self.session.get(
            f"https://guest.api.arcadia.pinnacle.com/0.1/matchups/{matchup_id}/markets/related/straight",
            timeout=30
        )
        return response.json()

    def scrape_all_football_odds(self):
        """Scrape all football odds"""
        # Football sport_id is typically 29
        matchups = self.get_matchups(sport_id=29)

        all_odds = []
        for matchup in matchups:
            odds = self.get_odds(matchup["id"])
            all_odds.append({
                "event": matchup,
                "odds": odds,
                "scraped_at": datetime.utcnow().isoformat()
            })

            # Respectful rate limiting
            time.sleep(random.uniform(1, 3))

        return all_odds
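
Because Pinnacle's prices are widely treated as the market benchmark, a common post-processing step is stripping the bookmaker margin (overround) to recover no-vig probabilities. A minimal sketch, assuming decimal odds and simple proportional normalization:

```python
def no_vig_probabilities(decimal_odds):
    """Strip the overround by proportional normalization."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)  # > 1.0 for any priced market
    return [p / overround for p in raw]

def fair_odds(decimal_odds):
    """Margin-free decimal odds implied by a quoted market."""
    return [1.0 / p for p in no_vig_probabilities(decimal_odds)]

# A symmetric 1.95/1.95 market carries ~2.56% margin and implies
# fair odds of 2.00 on each side:
# fair_odds([1.95, 1.95]) → [2.0, 2.0]
```

Proportional normalization is the simplest de-vigging method; more elaborate approaches (e.g. weighting by favourite/longshot bias) exist but start from the same implied probabilities.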

Sbobet: The Asian Market Leader

Sbobet sets the line for Asian handicap markets and is heavily used by professional bettors in Southeast Asia.

Approach: AJAX Interception

import random
import time

import requests
from bs4 import BeautifulSoup

class SbobetScraper:
    def __init__(self, proxy_config):
        self.proxy = {
            "http": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}",
            "https": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}"
        }
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Linux; Android 14; Samsung SM-A546B) "
                           "AppleWebKit/537.36 Chrome/121.0.0.0 Mobile Safari/537.36",
            "Accept-Language": "th-TH,th;q=0.9,en;q=0.8",
            "Referer": "https://www.sbobet.com/",
        }

    def scrape_football(self):
        """Scrape Sbobet football odds"""
        session = requests.Session()
        session.proxies = self.proxy
        session.headers.update(self.headers)

        # Load the main page first (establish session cookies)
        session.get("https://www.sbobet.com/", timeout=30)
        time.sleep(random.uniform(2, 4))

        # Access the football section via AJAX endpoint
        response = session.get(
            "https://www.sbobet.com/web-root/restricted/sport/football/today",
            timeout=30
        )

        return self.parse_sbobet_odds(response.text)

    def parse_sbobet_odds(self, html):
        """Parse Sbobet's odds from the response"""
        soup = BeautifulSoup(html, "html.parser")
        events = []

        for row in soup.select(".GameList tr"):
            try:
                teams = row.select(".TeamName")
                odds_cells = row.select(".OddsPrice")

                if teams and odds_cells:
                    event = {
                        "home": teams[0].text.strip() if len(teams) > 0 else None,
                        "away": teams[1].text.strip() if len(teams) > 1 else None,
                        "handicap": self.extract_handicap(row),
                        "odds_home": self.parse_odds(odds_cells[0].text),
                        "odds_away": self.parse_odds(odds_cells[1].text) if len(odds_cells) > 1 else None,
                        "total": self.extract_total(row)
                    }
                    events.append(event)
            except Exception:
                continue

        return events

    # extract_handicap, parse_odds, and extract_total are site-specific
    # parsing helpers, omitted here for brevity

For Sbobet, a Southeast Asian mobile proxy is essential. Sbobet restricts access based on geographic location and is primarily accessible from Asian IP addresses. DataResearchTools Thai, Indonesian, and Philippine mobile proxies provide the geographic authenticity needed.

Betfair Exchange: Unique Data Source

Betfair is a betting exchange, not a traditional bookmaker. Its odds are set by the market (bettors against each other), making it a unique data source.

Approach: Official API (Preferred)

Betfair offers an official API for data access:

import betfairlightweight

class BetfairScraper:
    def __init__(self, username, password, app_key, proxy_config):
        self.trading = betfairlightweight.APIClient(
            username=username,
            password=password,
            app_key=app_key
        )
        # Configure proxy
        self.trading.session.proxies = {
            "http": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}",
            "https": f"http://{proxy_config['user']}:{proxy_config['pass']}@{proxy_config['host']}:{proxy_config['port']}"
        }
        self.trading.login()

    def get_football_markets(self):
        """Get all active football markets"""
        event_filter = betfairlightweight.filters.market_filter(
            event_type_ids=["1"],  # Football
            market_type_codes=["MATCH_ODDS", "OVER_UNDER_25"],
            in_play_only=False
        )

        markets = self.trading.betting.list_market_catalogue(
            filter=event_filter,
            max_results=100,
            market_projection=["RUNNER_DESCRIPTION", "MARKET_START_TIME"]
        )

        return markets

    def get_market_odds(self, market_id):
        """Get current odds for a market"""
        price_projection = betfairlightweight.filters.price_projection(
            price_data=["EX_BEST_OFFERS"]
        )

        market_books = self.trading.betting.list_market_book(
            market_ids=[market_id],
            price_projection=price_projection
        )

        return market_books

Data Pipeline Architecture

Real-Time Odds Collection

import asyncio
from datetime import datetime
import json

class OddsPipeline:
    def __init__(self, scrapers, database, alert_system):
        self.scrapers = scrapers
        self.db = database
        self.alerts = alert_system

    async def run(self):
        """Main pipeline loop"""
        while True:
            tasks = []
            for scraper in self.scrapers:
                task = asyncio.create_task(
                    self.scrape_and_store(scraper)
                )
                tasks.append(task)

            results = await asyncio.gather(*tasks, return_exceptions=True)

            # Log results
            for scraper, result in zip(self.scrapers, results):
                if isinstance(result, Exception):
                    self.alerts.send(
                        f"Scraper error for {scraper.name}: {str(result)}"
                    )

            # Wait before next cycle
            await asyncio.sleep(30)  # Adjust based on your needs

    async def scrape_and_store(self, scraper):
        """Scrape odds from one bookmaker and store results"""
        odds_data = await scraper.scrape()
        timestamp = datetime.utcnow()

        records = []
        for event in odds_data:
            for market in event.get("markets", []):
                for selection in market.get("selections", []):
                    record = {
                        "bookmaker": scraper.name,
                        "event_id": event["id"],
                        "event_name": event["name"],
                        "sport": event["sport"],
                        "market_type": market["type"],
                        "selection": selection["name"],
                        "odds": selection["odds"],
                        "timestamp": timestamp
                    }
                    records.append(record)

        await self.db.bulk_insert(records)
        return len(records)
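
With timestamped records flowing into storage, line-movement detection becomes straightforward. A sketch over the record shape produced above (the threshold and function name are illustrative):

```python
from collections import defaultdict

def detect_movement(records, threshold=0.05):
    """Flag selections whose odds moved by at least `threshold`
    between the first and last stored snapshot."""
    series = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        key = (rec["bookmaker"], rec["event_id"],
               rec["market_type"], rec["selection"])
        series[key].append(float(rec["odds"]))

    moves = []
    for key, odds in series.items():
        if len(odds) >= 2 and abs(odds[-1] - odds[0]) >= threshold:
            moves.append({"key": key, "from": odds[0], "to": odds[-1]})
    return moves
```

In production this query usually runs in the database rather than in Python, but the grouping logic is the same.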

Data Normalization

Every bookmaker presents odds differently. Normalize into a common format:

class OddsNormalizer:
    """Normalize odds data from various bookmakers into standard format"""

    SPORT_MAPPING = {
        # Bet365
        "Soccer": "football",
        "Basketball": "basketball",
        "Tennis": "tennis",
        # Pinnacle
        "Football": "football",
        # Sbobet
        "football": "football",
    }

    MARKET_MAPPING = {
        "1X2": "match_result",
        "MATCH_ODDS": "match_result",
        "MoneyLine": "match_result",
        "Asian Handicap": "asian_handicap",
        "AH": "asian_handicap",
        "Over/Under": "total",
        "OVER_UNDER": "total",
        "O/U": "total",
    }

    def normalize(self, raw_odds, bookmaker):
        """Convert bookmaker-specific format to standard"""
        return {
            "bookmaker": bookmaker,
            "sport": self.SPORT_MAPPING.get(raw_odds.get("sport"), raw_odds.get("sport", "").lower()),
            "league": raw_odds.get("league", ""),
            "event": {
                "home": raw_odds.get("home_team"),
                "away": raw_odds.get("away_team"),
                "start_time": raw_odds.get("start_time"),
            },
            "market": {
                "type": self.MARKET_MAPPING.get(raw_odds.get("market_type"), raw_odds.get("market_type")),
                "line": raw_odds.get("line"),
            },
            "selections": self.normalize_selections(raw_odds),
            "timestamp": datetime.utcnow().isoformat()
        }

    def normalize_selections(self, raw_odds):
        """Normalize selection names and odds values"""
        selections = []
        for sel in raw_odds.get("selections", []):
            selections.append({
                "name": self.clean_selection_name(sel["name"]),  # site-specific cleanup helper (not shown)
                "odds_decimal": self.to_decimal(sel.get("odds"), sel.get("odds_format", "decimal")),
                "status": sel.get("status", "active")
            })
        return selections

    def to_decimal(self, odds, format_type):
        """Convert any odds format to decimal"""
        value = float(odds)  # odds may arrive as strings from scraped HTML
        if format_type == "decimal":
            return value
        elif format_type == "american":
            if value > 0:
                return (value / 100) + 1
            return (100 / abs(value)) + 1
        elif format_type == "hongkong":
            return value + 1
        elif format_type in ("malay", "indonesian"):
            # The arithmetic is identical for both; the formats differ only
            # in which price range uses the negative representation
            if value >= 0:
                return value + 1
            return (1 / abs(value)) + 1
        return value
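
Once everything is in decimal format, cross-bookmaker comparison reduces to arithmetic: take the best available price per selection and check whether the implied probabilities sum to less than 1. A sketch (function names and the sample quotes are illustrative):

```python
def best_prices(quotes):
    """quotes: {selection: [(bookmaker, decimal_odds), ...]}"""
    return {sel: max(offers, key=lambda q: q[1])
            for sel, offers in quotes.items()}

def find_arbitrage(quotes):
    """An arb exists when the best prices imply total probability < 1."""
    best = best_prices(quotes)
    total = sum(1.0 / odds for _, odds in best.values())
    return {"is_arb": total < 1.0, "margin": 1.0 - total, "best": best}

result = find_arbitrage({
    "home": [("bet365", 2.10), ("pinnacle", 2.04)],
    "away": [("bet365", 1.85), ("sbobet", 2.02)],
})
# 1/2.10 + 1/2.02 ≈ 0.971, so this (contrived) market returns ~2.9%
```

Real arbitrage detection must also account for stake limits, settlement rule differences, and the lag between scraping and bet placement.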

Proxy Rotation Strategy by Bookmaker

Customized Rotation Policies

ROTATION_POLICIES = {
    "bet365": {
        "proxy_type": "mobile",
        "country": "GB",
        "session_type": "sticky",
        "session_duration_minutes": 15,
        "requests_per_session": 30,
        "cooldown_minutes": 10,
        "concurrent_sessions": 1,
        "notes": "Most aggressive anti-bot. Single session, short duration."
    },
    "pinnacle": {
        "proxy_type": "mobile",
        "country": "any",
        "session_type": "rotating",
        "requests_per_ip": 50,
        "cooldown_minutes": 0,
        "concurrent_sessions": 3,
        "notes": "Tolerant of scraping. Can run multiple sessions."
    },
    "sbobet": {
        "proxy_type": "mobile",
        "country": ["TH", "ID", "PH", "MY"],
        "session_type": "sticky",
        "session_duration_minutes": 30,
        "requests_per_session": 40,
        "cooldown_minutes": 5,
        "concurrent_sessions": 2,
        "notes": "Requires SEA IP. DataResearchTools SEA proxies recommended."
    },
    "betfair": {
        "proxy_type": "mobile",
        "country": ["GB", "IE", "AU"],
        "session_type": "sticky",
        "session_duration_minutes": 60,
        "requests_per_session": 100,
        "cooldown_minutes": 0,
        "concurrent_sessions": 2,
        "notes": "API-based. Stable sessions preferred."
    },
    "m88": {
        "proxy_type": "mobile",
        "country": ["TH", "VN", "ID"],
        "session_type": "sticky",
        "session_duration_minutes": 45,
        "requests_per_session": 60,
        "cooldown_minutes": 3,
        "concurrent_sessions": 2,
        "notes": "Standard SEA bookmaker. Moderate protection."
    }
}
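
A dispatcher consuming policies like these only needs to track a per-bookmaker request budget. A simplified sketch (the policy dict is trimmed and the class name is illustrative):

```python
import itertools

# Trimmed copies of the policies above (illustrative)
POLICIES = {
    "bet365": {"requests_per_session": 30},
    "pinnacle": {"requests_per_ip": 50},
}

class ProxyDispatcher:
    """Hand out a sticky session id per bookmaker and rotate it
    once the policy's request budget is spent."""

    def __init__(self, policies):
        self.policies = policies
        self._ids = itertools.count(1)
        self._session = {}  # bookmaker -> current session id
        self._used = {}     # bookmaker -> requests on that session

    def acquire(self, bookmaker):
        policy = self.policies[bookmaker]
        budget = policy.get("requests_per_session",
                            policy.get("requests_per_ip", 50))
        if bookmaker not in self._session or self._used[bookmaker] >= budget:
            self._session[bookmaker] = next(self._ids)
            self._used[bookmaker] = 0
        self._used[bookmaker] += 1
        return self._session[bookmaker]
```

In a real setup the session id would map to a proxy provider's sticky-session username suffix, and cooldown and duration limits would be enforced the same way.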

Handling Common Scraping Challenges

Challenge 1: Dynamic Content Loading

Many bookmakers load odds asynchronously after the initial page load:

from playwright.async_api import TimeoutError as PlaywrightTimeoutError

async def wait_for_odds(page, timeout=10000):
    """Wait for odds to appear on the page"""
    try:
        await page.wait_for_selector(
            "[class*='odds'], [class*='price'], [data-odds]",
            timeout=timeout,
            state="visible"
        )
        # Additional wait for all odds to stabilize
        await page.wait_for_timeout(2000)
    except PlaywrightTimeoutError:
        print("Odds did not load within timeout")
        return False
    return True

Challenge 2: Odds Format Differences

Asian bookmakers often display odds in Malay, Hong Kong, or Indonesian format:

Format       Favorite   Underdog   Notes
Decimal      1.85       2.10       European standard
American     -118       +110       US standard
Hong Kong    0.85       1.10       HK = decimal - 1
Malay        0.85       -0.91      Negative values are the inverse
Indonesian   -1.18      1.10       Inverse of Malay
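
The to_decimal method shown earlier converts into decimal; for display or for parity-checking scraped values you may also need the reverse direction. A sketch (rounding conventions vary slightly by site, so treat the rounded outputs as approximate):

```python
def decimal_to_american(d):
    return round((d - 1.0) * 100) if d >= 2.0 else round(-100.0 / (d - 1.0))

def decimal_to_hongkong(d):
    return round(d - 1.0, 3)

def decimal_to_malay(d):
    hk = d - 1.0
    # Malay uses the negative (inverse) form for prices above evens
    return round(hk, 3) if hk <= 1.0 else round(-1.0 / hk, 3)

def decimal_to_indonesian(d):
    hk = d - 1.0
    # Indonesian uses the negative (inverse) form for prices below evens
    return round(hk, 3) if hk >= 1.0 else round(-1.0 / hk, 3)
```

Running these on the table's decimal column reproduces the other columns, e.g. 1.85 → American -118 and Hong Kong 0.85, while 2.10 → American +110 and Malay -0.909.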

Challenge 3: Market Matching

The same event appears differently across bookmakers. Matching events requires fuzzy matching:

from fuzzywuzzy import fuzz  # the project is now maintained under the name "thefuzz"

def match_events(event_a, event_b, threshold=85):
    """Determine if two events from different bookmakers are the same"""
    # Compare team names
    home_score = fuzz.ratio(
        event_a["home"].lower(),
        event_b["home"].lower()
    )
    away_score = fuzz.ratio(
        event_a["away"].lower(),
        event_b["away"].lower()
    )

    # Check if start times are close (within 5 minutes)
    time_diff = abs(
        (event_a["start_time"] - event_b["start_time"]).total_seconds()
    )
    time_match = time_diff < 300

    # Both team names must match well, and time must be close
    return (home_score >= threshold and
            away_score >= threshold and
            time_match)
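
Fuzzy scores improve considerably if team names are canonicalized before comparison. A hypothetical normalizer; the alias table and noise pattern here are illustrative and should be built from your own observed data:

```python
import re

# Alias table is illustrative; in practice, build it from mismatches
# your pipeline actually encounters
ALIASES = {
    "man utd": "manchester united",
    "man city": "manchester city",
    "spurs": "tottenham hotspur",
}

# Common suffixes and punctuation that differ between bookmakers
NOISE = re.compile(r"\b(fc|cf|sc|afc)\b|[.'\u2019]")

def normalize_team(name):
    cleaned = NOISE.sub("", name.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return ALIASES.get(cleaned, cleaned)
```

Run normalize_team on both sides before calling fuzz.ratio; exact alias hits then score 100 immediately, and suffix noise like "FC" stops dragging genuine matches below the threshold.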

Challenge 4: Geographic Restrictions

Some bookmakers are only accessible from specific countries. DataResearchTools mobile proxies solve this by providing genuine mobile IPs from the required regions:

Bookmaker   Accessible Regions       DataResearchTools Coverage
Sbobet      Southeast Asia           Thailand, Indonesia, Philippines, Malaysia, Vietnam
M88         Asia                     Full SEA coverage
W88         Asia                     Full SEA coverage
Bet365      UK, EU, select others    UK endpoints available
Betfair     UK, Ireland, Australia   UK endpoints available

Monitoring and Maintenance

Health Checks

class ScraperHealthMonitor:
    def __init__(self):
        self.metrics = {}

    def record_scrape(self, bookmaker, success, duration, records_count):
        if bookmaker not in self.metrics:
            self.metrics[bookmaker] = {
                "total_scrapes": 0,
                "successful": 0,
                "failed": 0,
                "avg_duration": 0,
                "total_records": 0
            }

        m = self.metrics[bookmaker]
        m["total_scrapes"] += 1
        if success:
            m["successful"] += 1
            m["total_records"] += records_count
        else:
            m["failed"] += 1

        # Running average
        m["avg_duration"] = (
            (m["avg_duration"] * (m["total_scrapes"] - 1) + duration)
            / m["total_scrapes"]
        )

    def get_health_report(self):
        report = {}
        for bookmaker, m in self.metrics.items():
            success_rate = m["successful"] / max(m["total_scrapes"], 1) * 100
            report[bookmaker] = {
                "success_rate": f"{success_rate:.1f}%",
                "avg_duration": f"{m['avg_duration']:.1f}s",
                "total_records": m["total_records"],
                "status": "healthy" if success_rate > 90 else "degraded" if success_rate > 70 else "failing"
            }
        return report

Conclusion

Scraping betting odds from multiple bookmakers is a complex but achievable task when you combine the right tools. Each bookmaker requires a tailored approach: Bet365 demands full browser automation with UK mobile proxies, Pinnacle offers relatively accessible API-like endpoints, and Asian bookmakers like Sbobet require Southeast Asian mobile IPs.

DataResearchTools mobile proxies provide the geographic coverage and IP quality needed to access bookmakers across both European and Asian markets. Their Southeast Asian carrier network is particularly valuable for scraping the Asian bookmakers that professional bettors rely on for sharp pricing.

Start with the easiest targets (Pinnacle, smaller Asian books), build your normalization pipeline, and then tackle the harder bookmakers as your infrastructure matures. The odds data you collect will power comparison tools, arbitrage detection, market analysis, and predictive models that create genuine competitive advantage in the sports betting ecosystem.

