How to Scrape Trade Show and Conference Attendee Lists

Trade shows and industry conferences concentrate your ideal prospects in one place. Exhibitor lists, speaker profiles, and attendee directories contain pre-qualified leads — these are companies actively investing in your industry. Manually collecting this data from dozens of event websites is time-consuming, but automated scraping with mobile proxies makes it possible to build comprehensive event-based lead databases.

This guide covers how to extract exhibitor data, speaker information, and attendee details from major event platforms and individual conference websites.

Why Event Data Is High-Quality Lead Data

Event attendees and exhibitors are among the most qualified B2B leads available:

  • Active buyers — Companies attending trade shows are actively evaluating solutions.
  • Budget confirmed — Exhibitor booth fees range from $5,000 to $100,000+, indicating real purchasing power.
  • Decision makers — Conference attendees are typically senior professionals with buying authority.
  • Industry verified — Attendance confirms the company operates in your target vertical.
  • Timing signal — Companies preparing for shows are often in active procurement cycles.

Event Data Sources

Major Event Platforms

Most trade shows host their exhibitor and session data on standardized event platforms:

  • Map Your Show — Large trade shows: exhibitor lists, booth locations, categories
  • Swapcard — Tech conferences: attendees, speakers, sponsors
  • Whova — Professional conferences: agenda, speakers, exhibitors
  • Cvent — Corporate events: exhibitors, sessions
  • Eventbrite — Smaller events: organizer info, event details
  • a2z/Personify — Industry trade shows: exhibitors, products, floor plans

Individual Event Websites

Many major conferences host their own exhibitor directories. Examples include CES, SXSW, Web Summit, Dreamforce, and hundreds of industry-specific shows.

Scraping Exhibitor Directories

Map Your Show / a2z Platform

Many large trade shows use these platforms for their exhibitor directories. The data is typically loaded via JavaScript:

from playwright.async_api import async_playwright
import asyncio
import random

async def scrape_exhibitor_directory(event_url, proxy_config):
    """Scrape exhibitor directory from event website"""
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy=proxy_config,
            headless=False,
        )
        page = await browser.new_page()

        exhibitors = []

        # Many exhibitor directories load via AJAX — intercept API calls.
        # Register the listener before navigating so responses fired during
        # the initial page load are captured too.
        async def handle_response(response):
            if "exhibitor" in response.url.lower() and response.status == 200:
                try:
                    data = await response.json()
                    if isinstance(data, list):
                        exhibitors.extend(data)
                    elif isinstance(data, dict) and "exhibitors" in data:
                        exhibitors.extend(data["exhibitors"])
                except Exception:
                    pass

        page.on("response", handle_response)

        await page.goto(event_url, wait_until="networkidle")
        await page.wait_for_timeout(random.randint(3000, 6000))

        # Scroll through the directory to trigger all data loads
        await auto_scroll(page)

        # If no API data captured, parse the HTML directly
        if not exhibitors:
            exhibitors = await parse_exhibitor_html(page)

        await browser.close()
        return exhibitors


async def auto_scroll(page, max_scrolls=50):
    """Scroll through a page to trigger lazy loading"""
    for i in range(max_scrolls):
        await page.evaluate("window.scrollBy(0, 500)")
        await page.wait_for_timeout(random.randint(500, 1500))

        # Check if we've reached the bottom
        at_bottom = await page.evaluate(
            "(window.innerHeight + window.scrollY) >= document.body.scrollHeight"
        )
        if at_bottom:
            break


async def parse_exhibitor_html(page):
    """Parse exhibitor data from HTML when API interception fails"""
    exhibitors = []

    cards = await page.query_selector_all('[class*="exhibitor"], [class*="company-card"]')

    for card in cards:
        exhibitor = {}

        name_el = await card.query_selector('h2, h3, [class*="name"]')
        if name_el:
            exhibitor['company_name'] = (await name_el.inner_text()).strip()

        booth_el = await card.query_selector('[class*="booth"]')
        if booth_el:
            exhibitor['booth_number'] = (await booth_el.inner_text()).strip()

        category_els = await card.query_selector_all('[class*="category"], [class*="tag"]')
        exhibitor['categories'] = []
        for cat in category_els:
            exhibitor['categories'].append((await cat.inner_text()).strip())

        desc_el = await card.query_selector('[class*="description"], p')
        if desc_el:
            exhibitor['description'] = (await desc_el.inner_text()).strip()[:500]

        link_el = await card.query_selector('a[href*="http"]')
        if link_el:
            exhibitor['website'] = await link_el.get_attribute('href')

        if exhibitor.get('company_name'):
            exhibitors.append(exhibitor)

    return exhibitors
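Lazy-loading directories often render the same card more than once, and the API interception and HTML fallback can overlap, so the combined results usually contain duplicates. A minimal dedupe pass (a sketch keyed on the scraped company name) is worth running before export:

```python
def dedupe_exhibitors(exhibitors):
    """Keep the first record seen for each company name (case-insensitive)."""
    seen = set()
    unique = []
    for ex in exhibitors:
        key = ex.get("company_name", "").strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique
```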

Extracting Exhibitor Detail Pages

Each exhibitor listing often links to a detail page with contact information:

import re

async def scrape_exhibitor_detail(page, detail_url):
    """Scrape full details from an exhibitor's profile page"""
    await page.goto(detail_url, wait_until="networkidle")
    await page.wait_for_timeout(random.randint(2000, 5000))

    details = {}

    # Company description
    desc_el = await page.query_selector('[class*="description"], [class*="about"]')
    if desc_el:
        details['description'] = (await desc_el.inner_text()).strip()

    # Contact info
    contact_el = await page.query_selector('[class*="contact"]')
    if contact_el:
        contact_html = await contact_el.inner_html()

        # Extract email addresses
        emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', contact_html)
        details['emails'] = list(set(emails))

        # Extract phone numbers
        phones = re.findall(r'[\+]?[(]?[0-9]{1,3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4,6}', contact_html)
        details['phones'] = list(set(phones))

    # Website
    website_el = await page.query_selector('a[href*="http"][class*="website"]')
    if website_el:
        details['website'] = await website_el.get_attribute('href')

    # Social links (guard against missing href attributes)
    social_links = await page.query_selector_all('a[href*="linkedin"], a[href*="twitter"]')
    details['social'] = {}
    for link in social_links:
        href = await link.get_attribute('href') or ''
        if 'linkedin' in href:
            details['social']['linkedin'] = href
        elif 'twitter' in href:
            details['social']['twitter'] = href

    # Products/services
    product_els = await page.query_selector_all('[class*="product"]')
    details['products'] = []
    for prod in product_els:
        details['products'].append((await prod.inner_text()).strip())

    return details
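Visiting hundreds of detail pages back-to-back produces an obviously scripted request pattern. A small pacing helper can space the visits out; this is a sketch where `fetch` stands in for whatever async callable wraps the detail-page scrape:

```python
import asyncio
import random

async def visit_sequentially(urls, fetch, min_delay=2.0, max_delay=5.0):
    """Await fetch(url) for each URL with a randomized pause between requests."""
    results = []
    for url in urls:
        results.append(await fetch(url))
        # Random gap so request timing doesn't look machine-generated
        await asyncio.sleep(random.uniform(min_delay, max_delay))
    return results
```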

Scraping Speaker and Session Data

Conference speaker lists contain decision-makers and thought leaders:

async def scrape_speakers(agenda_url, proxy_config):
    """Scrape speaker data from conference agenda"""
    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy=proxy_config)
        page = await browser.new_page()

        await page.goto(agenda_url, wait_until="networkidle")
        await page.wait_for_timeout(random.randint(3000, 6000))

        speakers = []
        speaker_cards = await page.query_selector_all(
            '[class*="speaker"], [class*="presenter"]'
        )

        for card in speaker_cards:
            speaker = {}

            name_el = await card.query_selector('[class*="name"], h3, h4')
            if name_el:
                speaker['name'] = (await name_el.inner_text()).strip()

            title_el = await card.query_selector('[class*="title"], [class*="role"]')
            if title_el:
                speaker['title'] = (await title_el.inner_text()).strip()

            company_el = await card.query_selector('[class*="company"], [class*="org"]')
            if company_el:
                speaker['company'] = (await company_el.inner_text()).strip()

            bio_el = await card.query_selector('[class*="bio"], p')
            if bio_el:
                speaker['bio'] = (await bio_el.inner_text()).strip()[:300]

            img_el = await card.query_selector('img')
            if img_el:
                speaker['photo_url'] = await img_el.get_attribute('src')

            if speaker.get('name'):
                speakers.append(speaker)

        await browser.close()
        return speakers

Building an Event Calendar Scraping System

Automate discovery and scraping across multiple events throughout the year. For foundational concepts on proxy rotation used in this system, check our proxy glossary.

from datetime import datetime, timedelta

class EventCalendarScraper:
    """Automated scraping across multiple trade shows and conferences"""

    def __init__(self, proxy_pool):
        self.proxy_pool = proxy_pool
        self.events = []
        self.all_leads = []

    def add_event(self, event):
        """Register an event for scraping"""
        event_date = event.get("date")
        self.events.append({
            "name": event["name"],
            "exhibitor_url": event.get("exhibitor_url"),
            "speaker_url": event.get("speaker_url"),
            "event_date": event_date,
            "industry": event.get("industry"),
            # Begin scraping a month before the event; None if no date is known yet
            "scrape_start": event_date - timedelta(days=30) if event_date else None,
        })

    async def scrape_upcoming_events(self):
        """Scrape all events happening in the next 60 days"""
        today = datetime.now()
        upcoming = [
            e for e in self.events
            if e["event_date"] and today <= e["event_date"] <= today + timedelta(days=60)
        ]

        for event in upcoming:
            proxy = self.proxy_pool.get_next()
            proxy_config = {
                "server": proxy,
            }

            if event.get("exhibitor_url"):
                exhibitors = await scrape_exhibitor_directory(
                    event["exhibitor_url"],
                    proxy_config
                )
                for ex in exhibitors:
                    ex["event_name"] = event["name"]
                    ex["event_date"] = event["event_date"].isoformat()
                    ex["industry"] = event["industry"]
                    ex["lead_type"] = "exhibitor"
                self.all_leads.extend(exhibitors)

            if event.get("speaker_url"):
                speakers = await scrape_speakers(
                    event["speaker_url"],
                    proxy_config
                )
                for sp in speakers:
                    sp["event_name"] = event["name"]
                    sp["event_date"] = event["event_date"].isoformat()
                    sp["lead_type"] = "speaker"
                self.all_leads.extend(speakers)

        return self.all_leads
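The `proxy_pool` object passed into the class above is assumed rather than defined. A minimal round-robin rotator, as one possible implementation, could look like:

```python
import itertools

class ProxyPool:
    """Minimal round-robin rotator over a list of proxy server URLs."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy pool cannot be empty")
        self._cycle = itertools.cycle(proxy_urls)

    def get_next(self):
        """Return the next proxy server URL in rotation."""
        return next(self._cycle)
```

In production you would likely extend this with health checks and per-proxy cooldowns, but the interface (`get_next()`) matches what the scraper above expects.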

Enriching Event Leads

Event data provides company names but often lacks direct contact information. Enrich with email and phone data using your web scraping infrastructure:

import re
import requests

async def enrich_exhibitor(exhibitor, proxy_url):
    """Enrich exhibitor data with contact information"""
    enriched = exhibitor.copy()

    if exhibitor.get('website'):
        try:
            # Note: requests is blocking; inside a busy event loop, run this
            # via asyncio.to_thread() or swap in an async HTTP client.
            response = requests.get(
                exhibitor['website'],
                proxies={"https": proxy_url},
                timeout=15,
                headers={"User-Agent": "Mozilla/5.0"}
            )

            # Extract emails
            emails = re.findall(
                r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
                response.text
            )
            enriched['emails'] = list(set(e.lower() for e in emails))

            # Extract phone numbers
            phones = re.findall(
                r'[\+]?[(]?[0-9]{1,3}[)]?[-\s\.]?[0-9]{3}[-\s\.]?[0-9]{4,6}',
                response.text
            )
            enriched['phones'] = list(set(phones))[:3]

        except Exception:
            pass

    return enriched
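Regex extraction over a whole homepage picks up plenty of junk: no-reply addresses, third-party domains embedded in scripts, tracking pixels. A post-filter helps before the leads hit your CRM; this is a sketch, and the set of generic prefixes is an assumption you should tune:

```python
# Assumed list of local-parts that are never useful outreach targets
GENERIC_PREFIXES = {"noreply", "no-reply", "donotreply", "webmaster", "postmaster"}

def filter_emails(emails, company_domain=None):
    """Drop no-reply style addresses; optionally keep only the company's own domain."""
    cleaned = []
    for email in emails:
        local, _, domain = email.lower().partition("@")
        if local in GENERIC_PREFIXES:
            continue
        if company_domain and domain != company_domain.lower():
            continue
        cleaned.append(email.lower())
    return sorted(set(cleaned))
```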

Competitive Intelligence from Event Data

Track which competitors are exhibiting where and what they are promoting:

class CompetitorEventTracker:
    """Track competitor presence across trade shows"""

    def __init__(self):
        self.competitor_data = {}

    def track_competitor(self, competitor_name, event_data):
        """Record competitor's event participation"""
        if competitor_name not in self.competitor_data:
            self.competitor_data[competitor_name] = []

        self.competitor_data[competitor_name].append({
            "event": event_data["event_name"],
            "date": event_data["event_date"],
            "booth_size": event_data.get("booth_number"),
            "products_showcased": event_data.get("products", []),
            "description": event_data.get("description"),
        })

    def get_competitor_report(self, competitor_name):
        """Generate report on competitor's event strategy"""
        events = self.competitor_data.get(competitor_name, [])
        return {
            "competitor": competitor_name,
            "total_events": len(events),
            "events": sorted(events, key=lambda x: x.get("date", ""), reverse=True),
            "products_promoted": list(set(
                p for e in events for p in e.get("products_showcased", [])
            )),
        }

Output Formatting

Structure event leads for import into your CRM or outreach tool:

import csv

def export_event_leads(leads, output_file="event_leads.csv"):
    """Export event leads to CSV"""
    fieldnames = [
        'company_name', 'booth_number', 'categories', 'website',
        'emails', 'phones', 'event_name', 'event_date',
        'industry', 'lead_type', 'description'
    ]

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction='ignore')
        writer.writeheader()

        for lead in leads:
            row = lead.copy()
            row['categories'] = '; '.join(lead.get('categories', []))
            row['emails'] = '; '.join(lead.get('emails', []))
            row['phones'] = '; '.join(lead.get('phones', []))
            writer.writerow(row)

    print(f"Exported {len(leads)} event leads to {output_file}")

Timing Your Outreach

Event-based leads have a natural outreach timeline:

  • 30 days before event — Initial outreach: “We noticed you’re exhibiting at [Event]. Let’s connect beforehand.”
  • During event — Real-time engagement if you are also attending.
  • 1-7 days after event — Follow-up: “Great seeing [Company] at [Event]. Here’s how we can help…”
  • 30 days after event — Value-based follow-up with industry insights.
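The timeline above translates directly into concrete dates once the event date is known. A sketch (the stage names here are illustrative, not a fixed schema):

```python
from datetime import date, timedelta

def outreach_schedule(event_date):
    """Map the event-based outreach timeline onto concrete dates."""
    return {
        "pre_event": event_date - timedelta(days=30),       # initial outreach
        "during_event": event_date,                          # real-time engagement
        "post_event_followup": event_date + timedelta(days=3),   # 1-7 day follow-up
        "value_followup": event_date + timedelta(days=30),   # insight-driven touch
    }
```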

Conclusion

Trade show and conference data provides some of the highest-quality B2B leads available — pre-qualified by industry, budget, and buying intent. Automated scraping with mobile proxies lets you build comprehensive databases across dozens of events per year, capturing exhibitor details, speaker profiles, and session data. The key is systematic enrichment of raw event data with contact information from company websites and professional networks. Start with the five largest events in your industry, validate the lead quality through outreach, and expand your event calendar from there.

