Web Scraping Conferences and Events Worth Attending in 2026

TL;DR
2026 has a strong lineup of conferences covering web scraping, data engineering, and AI-driven data collection. this guide covers the events most relevant to scraping practitioners, from niche technical meetups to major data infrastructure summits.

the web scraping space rarely gets its own dedicated conferences; practitioners tend to scatter across data engineering, developer, and AI events. knowing which events are worth attending and which are mostly vendor showcases saves both time and budget.

this guide covers the 2026 events most relevant to people who build and run web scraping infrastructure, along with what to look for in each.

why attend in-person events at all

the scraping community is unusually private online. practitioners share little publicly because their edge depends on techniques others have not discovered yet. in-person events at the right venues open up conversations that never happen in public forums. hallway tracks and after-session drinks often produce more useful intelligence than the sessions themselves.

vendor relationships are also easier to build in person. if you are evaluating proxy providers or anti-bot solution vendors, a conference is the fastest way to meet their technical teams and get straight answers.

top events for scraping and data collection practitioners in 2026

PyCon US 2026

PyCon remains the most relevant general programming conference for scraping practitioners. the web scraping and data collection talks are scattered across the data engineering and automation tracks. look for talks that mention BeautifulSoup, Playwright, Scrapy, or async HTTP. Long Beach, CA for 2026 (the event moves on from Pittsburgh, PA), typically May.

Data Engineering Summit (EMEA)

the European data engineering community has strong representation from scraping-adjacent practitioners. expect talks on large-scale HTTP infrastructure, distributed crawling, and data pipeline reliability. the proxy and anti-bot vendor presence is significant. typically held in Amsterdam or London, March-April timeframe.

ScrapingBee Community Workshops

ScrapingBee runs periodic virtual and in-person workshops focused specifically on scraping techniques. these lean technical rather than sales-driven. topics typically include JavaScript rendering, proxy rotation, and captcha handling. check their blog for 2026 dates.

BigDataWorld London

BigDataWorld includes a dedicated web data track covering legal, technical, and commercial aspects of large-scale data collection. the legal track is particularly valuable given the evolving regulatory landscape around scraping. typically held April-May at ExCeL London.

AWS re:Invent 2026

re:Invent is relevant for serverless scraping practitioners. sessions on Lambda at scale, Step Functions for orchestration, and EventBridge for scheduling map directly to scraping workloads. the scale of the event means content is broad, but the infrastructure sessions are excellent. Las Vegas, November-December.

NodeConf EU

for practitioners using Playwright or Puppeteer at scale, NodeConf EU covers JavaScript-side automation techniques. the async patterns and browser automation sessions are particularly relevant. typically held in Ireland, October.

Strata Data and AI (O’Reilly)

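Strata covers the full data stack and typically includes sessions on data acquisition, which is a polite name for web scraping at scale. the business intelligence and ETL tracks overlap significantly with scraping pipeline concerns. O’Reilly wound down its in-person conference business in 2020, so check O’Reilly’s 2026 event calendar to confirm that an edition is actually scheduled, and in what format, before planning around it.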

what to look for when evaluating events

session quality signals

prioritize events where speakers present actual production systems rather than toy examples. abstracts that mention specific scale metrics (requests per day, IP pool sizes, failure rates) usually signal a practitioner talk rather than a vendor case study.
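
a quick way to apply this check to a long schedule is to scan abstracts for concrete numbers. the following is a minimal sketch, assuming you have the abstracts as plain strings; the `SCALE_PATTERNS` table and the `scale_signals` helper are hypothetical names, and the regular expressions are rough heuristics rather than a complete grammar for scale metrics.

```python
import re

# hypothetical heuristics for the three metric types named above;
# tune the patterns to the wording used in the schedules you read
SCALE_PATTERNS = {
    "request volume": re.compile(
        r"\b\d[\d,.]*\s*(?:k|m|b|thousand|million|billion)?\s*"
        r"requests?\s*(?:per|/|a)\s*(?:second|minute|hour|day)\b",
        re.IGNORECASE,
    ),
    "ip pool size": re.compile(
        r"\b\d[\d,.]*\s*(?:k|m|thousand|million)?\s*(?:ips?|proxies)\b",
        re.IGNORECASE,
    ),
    "failure rate": re.compile(
        r"\b\d+(?:\.\d+)?\s*%\s*(?:failure|error|block|success)\s*rate\b",
        re.IGNORECASE,
    ),
}


def scale_signals(abstract: str) -> list[str]:
    """Return the names of the metric types that appear in the abstract."""
    return [name for name, pattern in SCALE_PATTERNS.items() if pattern.search(abstract)]


# made-up abstracts for illustration only
practitioner = (
    "we run 40 million requests per day through a pool of 200,000 IPs "
    "with a 2.5% failure rate."
)
marketing = "learn how our platform makes scraping effortless."

print(scale_signals(practitioner))
print(scale_signals(marketing))
```

an abstract that matches none of the patterns is not necessarily a weak talk; treat an empty result as a prompt to read the abstract more closely, not as a verdict.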

hallway track culture

the best scraping knowledge transfer happens informally. events with structured networking (birds-of-a-feather sessions, workshop formats) produce more value than lecture-heavy formats. look for unconference elements in the schedule.

vendor-to-practitioner ratio

as a rule of thumb, events where more than roughly 40% of sessions are vendor-sponsored tend to be sales-heavy rather than technically rigorous. check the speaker list: if most speakers are from proxy or anti-bot companies presenting their own tools, the signal-to-noise ratio will be low.
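
the 40% rule of thumb is easy to check against a published schedule. this is a minimal sketch, assuming you have copied the session list into records by hand; the `Session` type, the `vendor_sponsored` flag, and the sample schedule are all hypothetical and not taken from any real event.

```python
from dataclasses import dataclass


@dataclass
class Session:
    title: str
    vendor_sponsored: bool  # hypothetical flag, filled in by hand from the schedule


def vendor_share(sessions: list[Session]) -> float:
    """Return the fraction of sessions marked vendor-sponsored (0.0 if the list is empty)."""
    if not sessions:
        return 0.0
    sponsored = sum(1 for s in sessions if s.vendor_sponsored)
    return sponsored / len(sessions)


def looks_sales_heavy(sessions: list[Session], threshold: float = 0.40) -> bool:
    """Apply the rule of thumb: strictly more than `threshold` sponsored sessions."""
    return vendor_share(sessions) > threshold


# made-up schedule for illustration only
sample = [
    Session("distributed crawling in production", False),
    Session("our proxy platform roadmap", True),
    Session("playwright pools under memory pressure", False),
    Session("why our unblocker API wins", True),
    Session("anti-bot vendor panel", True),
]

print(f"vendor share: {vendor_share(sample):.0%}")
print("sales-heavy" if looks_sales_heavy(sample) else "probably fine")
```

the threshold is a parameter because 40% is a judgment call; adjust it to your own tolerance for sponsored content.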

virtual alternatives worth tracking

several strong virtual options exist for those who cannot travel. the Scrapy community Discord runs periodic live coding sessions. the Zyte (formerly Scrapinghub) team, which operates Scrapy Cloud, publishes deep technical content and occasionally runs webinars that match conference quality. the web-scraping subreddit AMA sessions, while informal, have produced some of the most candid technical discussions available publicly.

practitioners like John Watson Rooney and Chris Mayer publish tutorial content on YouTube that rivals most conference sessions in quality, and it is free and on-demand.

building your 2026 event calendar

pick one or two in-person events that align with your primary stack (Python vs Node, serverless vs containerized, residential vs datacenter proxies). supplement with virtual events for topics outside your core. budget time for hallway track conversations: set up 1:1 meetings before the event using the attendee list when available.

understand how web scraping works at scale before attending technical sessions. practitioners who ask informed questions get better answers and more follow-up conversations than those attending to learn basics.