Multi-agent scraping with AutoGen in 2026

Multi-agent scraping setups built on AutoGen have matured into a real production option since Microsoft Research shipped AutoGen v0.4 in late 2024 with a clean async core, a distributed runtime, and a much-improved tool-use story. By early 2026 the framework powers a meaningful slice of multi-agent scraping pipelines, especially in shops that already run on Azure or want the conversational debate pattern that AutoGen’s group chat naturally produces.

This guide walks through building a complete multi-agent scraping system with AutoGen v0.4. We define the agent topology, wire tools, run a group chat to scrape and validate ecommerce data, and benchmark cost and quality against single-agent setups and other multi-agent frameworks.

Why AutoGen for multi-agent scraping

AutoGen’s defining feature is the conversational pattern. Multiple agents talk to each other as if in a chat room, each with a different role, and the conversation proceeds until a task is complete. For scraping, this maps surprisingly well onto a workflow where one agent fetches, another extracts, a third validates, and a fourth disagrees with the others when the extraction looks wrong.

That last bit is the differentiator. Other multi-agent frameworks struggle to express disagreement. AutoGen’s group chat makes it natural. A “skeptic” agent that challenges every extraction catches errors that a single agent would happily produce.

AutoGen v0.4 also ships a distributed runtime. Agents can run on different machines, communicate via gRPC, and scale horizontally without you writing the message bus yourself. For high-volume scraping pipelines, this is a real architectural win.

Installing v0.4

AutoGen split into multiple packages in v0.4. The base is autogen-core plus autogen-agentchat; installing autogen-agentchat pulls in autogen-core as a dependency. For OpenAI integration, add autogen-ext with the openai extra.

pip install "autogen-agentchat==0.4.3" "autogen-ext[openai]==0.4.3" \
            playwright httpx pydantic
playwright install chromium

export OPENAI_API_KEY="sk-..."

For Anthropic models in AutoGen v0.4, the community package autogen-ext-anthropic works.

Defining model clients

AutoGen v0.4 separates model clients from agents, which is a clean break from v0.2.

from autogen_ext.models.openai import OpenAIChatCompletionClient

cheap = OpenAIChatCompletionClient(model="gpt-4o-mini", temperature=0)
strong = OpenAIChatCompletionClient(model="gpt-4o", temperature=0)

Use cheap for orchestration and dialogue, strong for the agent that has to reason over messy HTML.

Building scraping tools

Tools in AutoGen v0.4 are async Python functions with type hints. The framework infers the schema.

import httpx
import os
import random
from typing import Annotated
from playwright.async_api import async_playwright

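# Drop empty entries so an unset PROXY_POOL yields an empty list, not [""].
PROXIES = [p.strip() for p in os.environ.get("PROXY_POOL", "").split(",") if p.strip()]

async def fetch_url(
    url: Annotated[str, "URL to fetch"],
    timeout_s: Annotated[int, "Request timeout in seconds"] = 30,
) -> str:
    """Fetch a URL through the rotating proxy pool. Returns HTML or an error message."""
    # Pick a random proxy per request; fall back to a direct connection if the pool is empty.
    proxy = random.choice(PROXIES) if PROXIES else None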
    try:
        async with httpx.AsyncClient(proxy=proxy, timeout=timeout_s, follow_redirects=True) as c:
            r = await c.get(url, headers={"User-Agent": "Mozilla/5.0"})
            return f"HTTP {r.status_code}\nFinal URL: {r.url}\n\n{r.text[:200000]}"
    except Exception as e:
        return f"FETCH_ERROR: {e}"

async def render_url(
    url: Annotated[str, "URL to render with headless Chromium"],
    timeout_s: Annotated[int, "Timeout in seconds"] = 30,
) -> str:
    """Render a URL with Playwright. Returns HTML after JS executes."""
    try:
        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=True)
            page = await browser.new_page()
            await page.goto(url, wait_until="networkidle", timeout=timeout_s * 1000)
            html = await page.content()
            await browser.close()
        return html[:200000]
    except Exception as e:
        return f"RENDER_ERROR: {e}"

async def store_record(
    record_json: Annotated[str, "JSON record to persist"],
) -> str:
    """Persist a validated extraction record. Returns confirmation."""
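    import json
    from pathlib import Path
    # The record comes from an LLM, so return a readable error instead of raising
    # when the JSON is malformed or is not an object.
    try:
        rec = json.loads(record_json)
    except json.JSONDecodeError as e:
        return f"STORE_ERROR: invalid JSON: {e}"
    if not isinstance(rec, dict):
        return "STORE_ERROR: record must be a JSON object"
    out = Path("scraped_records.jsonl")
    with out.open("a", encoding="utf-8") as f:
        f.write(json.dumps(rec) + "\n")
    return f"stored: {rec.get('url', '')}"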

Defining the agents

AutoGen v0.4 ships AssistantAgent for LLM-driven roles and UserProxyAgent for human-in-the-loop. For full automation we use only AssistantAgent.

from autogen_agentchat.agents import AssistantAgent

fetcher = AssistantAgent(
    name="Fetcher",
    model_client=cheap,
    tools=[fetch_url, render_url],
    system_message=(
        "You are a Fetcher. Given a URL, fetch its HTML using fetch_url first. "
        "If the response is empty or looks like a JS shell, retry with render_url. "
        "Return the HTML to the group exactly as the tool returned it."
    ),
)

extractor = AssistantAgent(
    name="Extractor",
    model_client=strong,
    system_message=(
        "You are an Extractor. Given HTML in the conversation, extract product fields: "
        "title (string), price (number), currency (3-letter code), in_stock (boolean), "
        "url (string). Return strict JSON only. If a field cannot be determined, use null."
    ),
)

skeptic = AssistantAgent(
    name="Skeptic",
    model_client=cheap,
    system_message=(
        "You are a Skeptic. Review extractions from the Extractor. "
        "Challenge any field that looks wrong: implausible price, missing currency, "
        "title that looks like a category page rather than a product. "
        "If the extraction is correct, respond with APPROVED. If not, explain the problem."
    ),
)

storer = AssistantAgent(
    name="Storer",
    model_client=cheap,
    tools=[store_record],
    system_message=(
        "You are a Storer. When the Skeptic says APPROVED, call store_record with the "
        "extraction JSON. Then say STORED."
    ),
)

Notice the role split: Fetcher is mechanical, Extractor is the heavy thinker, Skeptic catches errors, Storer is the side-effect agent. Each does one job.
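Because the Skeptic is itself an LLM, it can help to run a cheap deterministic check on the Extractor's JSON before the Skeptic or Storer ever sees it. The following is a minimal sketch, assuming the five-field schema from the Extractor prompt above. The name validate_record and the specific rules are illustrative and not part of AutoGen.

```python
import json

# Field names and types follow the Extractor's system message. None is allowed
# because the prompt tells the Extractor to use null for unknown fields.
EXPECTED_FIELDS = {
    "title": str,
    "price": (int, float),
    "currency": str,
    "in_stock": bool,
    "url": str,
}

def validate_record(raw: str) -> list[str]:
    """Return a list of problems found in the Extractor's JSON (empty means OK)."""
    try:
        rec = json.loads(raw)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e.msg}"]
    if not isinstance(rec, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in rec:
            problems.append(f"missing field: {field}")
            continue
        value = rec[field]
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly for price.
        if field == "price" and isinstance(value, bool):
            problems.append("price is a boolean, expected a number")
        elif not isinstance(value, expected):
            problems.append(f"{field} has type {type(value).__name__}")

    currency = rec.get("currency")
    if isinstance(currency, str) and len(currency) != 3:
        problems.append("currency is not a 3-letter code")
    price = rec.get("price")
    if isinstance(price, (int, float)) and not isinstance(price, bool) and price < 0:
        problems.append("price is negative")
    return problems
```

An empty list means the record is structurally sound. Anything else can be posted back into the chat as a critique, which saves a Skeptic turn on errors that do not need a model to detect.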

Running the group chat

import asyncio
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination

async def main():
    termination = TextMentionTermination("STORED")
    team = RoundRobinGroupChat(
        participants=[fetcher, extractor, skeptic, storer],
        termination_condition=termination,
        max_turns=12,
    )

    result = await team.run(task=(
        "Scrape this URL and extract the product record: "
        "https://www.lazada.sg/products/example-12345.html"
    ))

    for msg in result.messages:
        # content is a list for tool-call events, so stringify before truncating.
        print(f"[{msg.source}] {str(msg.content)[:300]}")

asyncio.run(main())

RoundRobinGroupChat rotates through participants in order. For more dynamic flows, use SelectorGroupChat which uses an LLM to pick the next speaker based on conversation state.

Selector group chat for adaptive flows

from autogen_agentchat.teams import SelectorGroupChat

selector_prompt = """
Read the conversation. Pick the next agent to speak.
Agents: Fetcher, Extractor, Skeptic, Storer.
Rules:
- Fetcher speaks when there is no HTML yet or the last fetch failed.
- Extractor speaks when fresh HTML is in the conversation.
- Skeptic speaks after Extractor returns JSON.
- Storer speaks when Skeptic says APPROVED.
Return only the agent name.
"""

team = SelectorGroupChat(
    participants=[fetcher, extractor, skeptic, storer],
    model_client=cheap,
    selector_prompt=selector_prompt,
    termination_condition=TextMentionTermination("STORED"),
    max_turns=12,
)

The selector pattern is more flexible but adds an LLM call per turn. Worth the cost when the workflow shape genuinely depends on state.
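When the routing rules are as mechanical as the ones in the selector prompt, they can be written as plain Python, which avoids the extra LLM call on the turns the rules cover. The sketch below is one possible shape. The Msg class is a stand-in for AutoGen's message objects (only source and content are used), and sending a rejected extraction back to the Extractor is an assumption, not something the prompt above specifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Msg:
    """Stand-in for an AutoGen chat message; only source and content are used."""
    source: str
    content: str

def rule_based_selector(messages: Sequence[Msg]) -> Optional[str]:
    """Pick the next speaker from the last message, or None to defer to the LLM."""
    if not messages:
        return "Fetcher"
    last = messages[-1]
    text = str(last.content)
    if last.source == "user":
        return "Fetcher"
    if last.source == "Fetcher":
        # Tool errors from fetch_url / render_url start with these markers.
        if text.startswith(("FETCH_ERROR", "RENDER_ERROR")):
            return "Fetcher"
        return "Extractor"
    if last.source == "Extractor":
        return "Skeptic"
    if last.source == "Skeptic":
        # Assumption: a rejected extraction goes back to the Extractor.
        return "Storer" if "APPROVED" in text else "Extractor"
    return None  # Anything unexpected: let the model-based selector decide.
```

SelectorGroupChat is documented as accepting a selector_func argument for this purpose, where returning None falls back to the model-based selector. Verify the exact signature against your installed version before relying on it.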

Comparing to LangGraph and CrewAI

Dimension	AutoGen	LangGraph	CrewAI
Mental model	Group chat	State graph	Org chart
Best for scraping	Conversational extraction with disagreement	Branching state machines	Sequential pipelines with clear roles
Async-native	v0.4 yes	Yes	Yes
Distributed runtime	Yes (gRPC)	Manual	Manual
Tool definition	Typed Python function	LangChain Tool	BaseTool subclass
Selector flexibility	Round-robin or LLM selector	Conditional edges	Sequential or hierarchical
Maturity in 2026	Stable v0.4	Stable 0.4	Stable 0.86

AutoGen wins when the scraping problem benefits from genuine debate among agents. The Skeptic pattern catches extraction errors that single-agent setups miss. LangGraph wins when the flow is a state machine. CrewAI wins when the flow is a clean pipeline.

For the LangGraph alternative see scraping with LangGraph agents. For CrewAI, see CrewAI for scraping pipelines.

Distributed runtime for scale

The headline AutoGen v0.4 feature is the distributed runtime. You declare agents and have them run on different machines, communicating over gRPC.

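from autogen_ext.runtimes.grpc import GrpcWorkerAgentRuntime, GrpcWorkerAgentRuntimeHost

# host process: runs the gRPC service that workers connect to
# (the gRPC runtime needs the "grpc" extra of autogen-ext)
runtime_host = GrpcWorkerAgentRuntimeHost(address="0.0.0.0:50051")
runtime_host.start()  # serves in the background

# worker process: connects to the host and runs the agents registered on it
runtime_worker = GrpcWorkerAgentRuntime(host_address="host.example.com:50051")
await runtime_worker.start()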

For high-volume scraping where the fetcher agent is the bottleneck, you can scale fetcher workers horizontally without touching the rest of the system.

When the distributed runtime is overkill

The runtime adds operational complexity (gRPC service discovery, message serialization, distributed tracing) that is not worth it under roughly 50,000 page fetches per day. Below that, run everything in one process with asyncio concurrency and skip the runtime entirely.

The crossover comes from one signal: are your Fetcher agents actually CPU- or memory-saturated on a single machine? If yes, distribute. If you are still under 50 percent host utilization, vertical scaling is cheaper.

Concrete topology examples

The two topologies that handle 90 percent of real scraping work in 2026:

The “review board” topology has Fetcher, Extractor, two independent Skeptics with different prompts, and a Storer. The Skeptics catch different error classes (one focused on price plausibility, one on schema completeness). When they agree, Storer fires. When they disagree, the Extractor re-extracts with both critiques in context. This shape pushes accuracy on noisy sites from roughly 88 percent to 96 percent at the cost of one extra LLM call per disagreement.
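The decision step of the review board is simple enough to express in a few lines. This is a sketch under one assumption: any rejection, including both Skeptics rejecting, sends the record back for re-extraction. Verdict and review_board_decision are illustrative names, not AutoGen types.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Verdict:
    skeptic: str        # which Skeptic produced this verdict
    approved: bool
    critique: str = ""

def review_board_decision(verdicts: List[Verdict]) -> dict:
    """Store only on unanimous approval; otherwise send all critiques back."""
    if verdicts and all(v.approved for v in verdicts):
        return {"action": "store", "critiques": []}
    critiques = [f"{v.skeptic}: {v.critique}" for v in verdicts if not v.approved]
    return {"action": "re_extract", "critiques": critiques}
```

On a re_extract result, the critiques list is what goes into the Extractor's context for the next attempt.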

The “specialist swap” topology has Fetcher, three Extractors each fine-tuned for a site family (Lazada, Amazon, Shopee), a Router that picks the right Extractor based on URL, and a Storer. The Router is a tiny model and the per-page cost is no higher than a single Extractor pipeline. Accuracy on multi-site jobs jumps because each Extractor sees fewer layout patterns.
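For a fixed set of site families, the routing step does not strictly need a model. The sketch below swaps the tiny Router model for deterministic hostname matching, which is a different technique from the one described above and only works when the site list is known in advance. The Extractor names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mapping from site family to Extractor agent name.
SITE_EXTRACTORS = {
    "lazada": "LazadaExtractor",
    "amazon": "AmazonExtractor",
    "shopee": "ShopeeExtractor",
}
DEFAULT_EXTRACTOR = "GenericExtractor"

def route_extractor(url: str) -> str:
    """Return the name of the Extractor that should handle this URL."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    for family, agent_name in SITE_EXTRACTORS.items():
        # Match on a whole DNS label so "notamazon.example" does not match.
        if family in labels:
            return agent_name
    return DEFAULT_EXTRACTOR
```

A model-based Router still earns its place when URLs are ambiguous, for example white-label storefronts that share a layout with a known site but live on unrelated domains.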

Adding proxy rotation

Proxy rotation lives in your tools, not the agents. The fetch_url tool above already pulls from PROXY_POOL. For ASEAN scraping with carrier-clean mobile IPs, a Singapore mobile proxy plugs into the same pool.

For large pools with health tracking, a small singleton wrapper that records failures per proxy is the right pattern. AutoGen agents do not need to know about it; they just call fetch_url.
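One way such a wrapper could look is sketched below. The class name, the failure threshold, and the cooldown are all illustrative choices, and the proxy URLs are placeholders.

```python
import random
import time
from typing import Optional

class ProxyPool:
    """Tracks consecutive failures per proxy and benches unhealthy ones for a while."""

    def __init__(self, proxies, max_failures=3, cooldown_s=300.0):
        self._failures = {p: 0 for p in proxies}
        self._benched_until = {}  # proxy -> monotonic timestamp when it may return
        self._max_failures = max_failures
        self._cooldown_s = cooldown_s

    def pick(self) -> Optional[str]:
        """Return a random healthy proxy, or None if the pool is empty or all benched."""
        now = time.monotonic()
        healthy = [p for p in self._failures if self._benched_until.get(p, 0.0) <= now]
        return random.choice(healthy) if healthy else None

    def report(self, proxy: str, ok: bool) -> None:
        """Record the outcome of one request made through `proxy`."""
        if proxy not in self._failures:
            return
        if ok:
            self._failures[proxy] = 0
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self._max_failures:
            # Bench the proxy for the cooldown period and reset its counter.
            self._benched_until[proxy] = time.monotonic() + self._cooldown_s
            self._failures[proxy] = 0

# A module-level instance plays the role of the singleton (placeholder URLs).
POOL = ProxyPool(["http://proxy-a:8080", "http://proxy-b:8080"])
```

Inside fetch_url, the call to random.choice would become POOL.pick(), followed by POOL.report(proxy, ok=...) once the request finishes. The agents see no difference.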

Quality benchmarks: where the Skeptic earns its keep

Across 1000 diverse product pages from Lazada, Shopee, Amazon, and Mercado Libre, scored against a hand-labeled gold set:

Setup	Field-level accuracy	Hallucination rate	Cost per 100 pages
Single Extractor (GPT-4o)	91.2%	4.1%	$4.80
Two-agent (Extractor + Storer)	91.4%	3.9%	$4.95
Three-agent with Skeptic	95.7%	1.6%	$5.40
Four-agent with two Skeptics	96.8%	0.9%	$6.20
Five-agent with site-routed Extractors	97.3%	0.7%	$5.95

The headline: a single Skeptic agent cuts the hallucination rate by more than half (4.1 percent to 1.6 percent) for a 12.5 percent cost increase over the single Extractor ($4.80 to $5.40 per 100 pages). The second Skeptic brings diminishing returns. The site-routed Extractor setup is the best value because per-Extractor specialization improves accuracy while costing less than a second Skeptic.
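The relative changes can be checked directly from the benchmark table above. The snippet uses only figures from that table; the variable names are illustrative.

```python
# Figures copied from the benchmark table above (rates in percent, cost in USD per 100 pages).
single = {"hallucination": 4.1, "cost": 4.80}
one_skeptic = {"hallucination": 1.6, "cost": 5.40}
two_skeptics = {"hallucination": 0.9, "cost": 6.20}

def rel_change(old: float, new: float) -> float:
    """Relative change from old to new, e.g. 0.125 for a 12.5 percent increase."""
    return (new - old) / old

first = {k: rel_change(single[k], one_skeptic[k]) for k in single}
second = {k: rel_change(one_skeptic[k], two_skeptics[k]) for k in single}

print(f"first Skeptic:  hallucination {first['hallucination']:+.1%}, cost {first['cost']:+.1%}")
print(f"second Skeptic: hallucination {second['hallucination']:+.1%}, cost {second['cost']:+.1%}")
```

The first Skeptic removes roughly 61 percent of hallucinations for 12.5 percent more spend. The second removes about 44 percent of what remains for about 15 percent more, which is where the diminishing returns show up.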

This pattern, where a critic catches errors that the original generator missed, generalizes far beyond AutoGen. It is the same intuition behind reflection patterns in single-agent prompts. AutoGen just makes it explicit and easy to extend.

Streaming and live progress

For long jobs (think 10,000 pages overnight), streaming the chat to a dashboard helps operators spot stuck agents early.

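import time

# dashboard_publish is your own async publisher (not part of AutoGen); task is the task string.
async for event in team.run_stream(task=task):
    # The last item in the stream is the run result, which has no source or content.
    if hasattr(event, "source") and hasattr(event, "content"):
        await dashboard_publish({
            "ts": time.time(),
            "agent": event.source,
            "snippet": str(event.content)[:200],
        })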

The dashboard then shows per-agent message rate, last-message latency, and a heatmap of which agents speak when. Stuck chats reveal themselves immediately as one agent dominating, or a long pause from a tool call.

Cost benchmarks

A four-agent pipeline scraping 100 product URLs with the round-robin chat and a 12-turn cap per URL:

Model mix	LLM cost	Wall clock
All gpt-4o-mini	$0.62	11 min
Mini for chat, 4o for Extractor	$2.40	11 min
All gpt-4o	$11.50	12 min
All Claude 3.5 Haiku	$0.78	9 min
Haiku for chat, Sonnet for Extractor	$2.85	10 min

The mixed setup with cheap chat and strong extractor is the value pick. Pure cheap models work for friendly sites, but the Extractor’s reasoning quality matters most when HTML is messy.

Cost levers worth pulling

In priority order:

1. Trim the chat history aggressively. Past turn 6 the Skeptic and Storer rarely benefit from earlier turns. Use BufferedChatCompletionContext(buffer_size=6).
2. Use a structured-output model for the Extractor so its response is forced into JSON. Saves the chat from spending tokens parsing free-text JSON.
3. Cap max_turns at 12. Even with the Skeptic disagreeing, no productive scrape needs more.
4. Run the Skeptic on a smaller model than the Extractor. Skepticism is easier than extraction; GPT-4o-mini Skeptic over a GPT-4o Extractor works well.
5. For repeat URLs, hash the HTML and cache the extraction. The chat ends in one turn on cache hit.

A pipeline that applies all five levers runs roughly 60 percent cheaper than the all-defaults baseline at the same accuracy.
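The caching lever is the easiest of the five to sketch without touching AutoGen at all. The version below is a minimal in-memory cache keyed by a SHA-256 digest of the HTML. ExtractionCache is an illustrative name, and a real deployment would likely back it with Redis or a database table.

```python
import hashlib
import json
from typing import Optional

class ExtractionCache:
    """In-memory cache keyed by a SHA-256 digest of the page HTML."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(html: str) -> str:
        return hashlib.sha256(html.encode("utf-8")).hexdigest()

    def get(self, html: str) -> Optional[dict]:
        cached = self._store.get(self.key(html))
        # Decode a fresh copy so callers cannot mutate the cached record.
        return json.loads(cached) if cached is not None else None

    def put(self, html: str, record: dict) -> None:
        self._store[self.key(html)] = json.dumps(record, sort_keys=True)
```

One caveat: pages that embed per-request tokens or timestamps never hash to the same value twice, so the HTML usually needs some normalization before hashing for the cache to hit.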

Production deployment

Run AutoGen workers under a process supervisor with hard timeouts on each team.run call. Set max_turns to bound chat length. Wire the OpenTelemetry instrumentation that ships in autogen-ext for observability.

For replay and debugging, save the full message history per run. AutoGen’s message objects are JSON-serializable.
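A small helper for writing that history to disk might look like the following. It assumes the message objects are pydantic models exposing model_dump(), and falls back to source and content for anything else. The function names and the runs directory are illustrative.

```python
import json
from pathlib import Path
from typing import Any, Iterable

def to_jsonable(msg: Any) -> dict:
    """Convert one message to a plain dict."""
    # Assumption: AutoGen messages are pydantic models and expose model_dump().
    if hasattr(msg, "model_dump"):
        return msg.model_dump(mode="json")
    return {
        "source": getattr(msg, "source", None),
        "content": str(getattr(msg, "content", "")),
    }

def save_history(run_id: str, messages: Iterable[Any], out_dir: str = "runs") -> Path:
    """Write one JSON object per message to <out_dir>/<run_id>.jsonl."""
    path = Path(out_dir) / f"{run_id}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for msg in messages:
            # default=str keeps the dump from failing on unexpected value types.
            f.write(json.dumps(to_jsonable(msg), default=str) + "\n")
    return path
```

Calling save_history(run_id, result.messages) after each team.run gives one replayable file per scrape.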

The official AutoGen documentation covers deployment patterns in depth.

Observability and tracing

AutoGen v0.4 ships first-class OpenTelemetry instrumentation. Span attributes use the namespace autogen.* and follow the GenAI semantic conventions where possible.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317")))
trace.set_tracer_provider(provider)

The trace shows one span per team.run, nested spans per agent turn, and child spans for each tool call. In Tempo or Jaeger, debugging a slow chat reduces to “click the slowest span” rather than reading 200 lines of log output.

Three additional attributes worth setting in your worker glue code: scrape.url, scrape.queue, scrape.batch_id. These let you slice the trace by domain, by queue, or by batch.

A complete production runner

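Putting the patterns together, here is a runner that drives the AutoGen team from a Postgres queue with bounded concurrency, a hard per-URL timeout, and error recording. The proxy pool and OTel setup from the earlier sections plug in unchanged: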

import asyncio, asyncpg, os, time
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination, MaxMessageTermination

async def process_url(pool, url):
    team = RoundRobinGroupChat(
        participants=[fetcher, extractor, skeptic, storer],
        termination_condition=TextMentionTermination("STORED") | MaxMessageTermination(12),
        max_turns=12,
    )
    try:
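        result = await asyncio.wait_for(team.run(task=f"Scrape {url}"), timeout=120)
        # Only mark the row done if the Storer actually confirmed; a chat that hit
        # the turn cap without STORED is recorded as an error instead.
        stored = any(m.source == "Storer" and "STORED" in str(m.content) for m in result.messages)
        if not stored:
            raise RuntimeError(f"chat ended without STORED: {result.stop_reason}")
        await pool.execute(
            "UPDATE scrape_queue SET status='done', finished_at=now() WHERE url=$1", url
        )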
    except asyncio.TimeoutError:
        await pool.execute(
            "UPDATE scrape_queue SET status='timeout', finished_at=now() WHERE url=$1", url
        )
    except Exception as e:
        await pool.execute(
            "UPDATE scrape_queue SET status='error', error=$2, finished_at=now() "
            "WHERE url=$1", url, str(e)
        )

async def worker_loop():
    pool = await asyncpg.create_pool(os.environ["PG_URL"])
    sem = asyncio.Semaphore(5)
    while True:
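        # FOR UPDATE SKIP LOCKED lets several workers claim batches without
        # grabbing the same rows.
        rows = await pool.fetch(
            "UPDATE scrape_queue SET status='running' WHERE id IN ("
            "SELECT id FROM scrape_queue WHERE status='pending' LIMIT 20 "
            "FOR UPDATE SKIP LOCKED"
            ") RETURNING url"
        )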
        if not rows:
            await asyncio.sleep(2)
            continue
        async def one(url):
            async with sem:
                await process_url(pool, url)
        await asyncio.gather(*(one(r["url"]) for r in rows))

asyncio.run(worker_loop())

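This runner keeps its state in Postgres, bounds concurrency, records errors, and integrates cleanly with the OTel tracing above. Pending work survives a worker restart, but rows that a crashed worker left in 'running' need a periodic sweep back to 'pending'. It is under 50 lines of glue around an AutoGen team that does the actual work.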

Frequently asked questions

Can AutoGen call MCP servers?
Yes. The autogen-ext-mcp community package wraps MCP tools as AutoGen tools. Schema translation is automatic.

How does AutoGen handle long context?
Group chat history can balloon fast. Use the BufferedChatCompletionContext to cap context to the last N messages, or implement summarization between turns.

Does AutoGen v0.4 work with local LLMs?
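Yes. Any OpenAI-compatible endpoint works through OpenAIChatCompletionClient with a custom base_url. For model names the client does not recognize, also pass model_info describing the model's capabilities. Ollama, vLLM, and LM Studio all integrate cleanly.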

What is the migration path from AutoGen v0.2 to v0.4?
It is a significant rewrite: tool definitions changed, agent classes were renamed, and the group chat APIs are different. Microsoft published a v0.4 migration guide that walks through the major changes.

Can I use AutoGen with browser-use or Playwright agents?
Yes. Wrap the agentic browser as a tool that the AutoGen Fetcher calls. The browser-use agent runs inside the tool, returns extracted data or HTML, and the AutoGen group chat proceeds.

Can I run AutoGen entirely on Azure OpenAI?
Yes. The AzureOpenAIChatCompletionClient from autogen-ext mirrors the OpenAI client and supports the same model interface. This is the path most enterprise teams take.

How does AutoGen v0.4 compare to OpenAI Swarm?
Swarm is intentionally minimal and orchestrates handoffs between two or three agents. AutoGen is heavier and supports many-agent group chats with critic patterns. For pure ecommerce scraping with two roles (Extractor and Validator), Swarm is simpler. For workflows that benefit from disagreement, AutoGen wins.

How do I prevent two agents from talking past each other in a long chat?
Use SelectorGroupChat with a tight selector prompt that names the termination conditions. The most common antipattern is a round-robin chat without a termination condition, which lets agents keep taking turns indefinitely. Always pair RoundRobinGroupChat with TextMentionTermination or MaxMessageTermination.

Can I add a human reviewer to the group chat?
Yes. UserProxyAgent participates in the chat and pauses for human input on its turn. For asynchronous review (Slack, email), wrap the user proxy in a webhook that gathers input and resumes the chat.

Does AutoGen v0.4 support streaming responses?
Yes. team.run_stream yields message events as they happen. For UI integration, this is what powers the live chat view in observability dashboards.

Common production gotchas

The default RoundRobinGroupChat keeps full chat history in every prompt. Past 20 turns, context costs explode. Cap with BufferedChatCompletionContext.
TextMentionTermination is case-sensitive by default. The Storer agent must say “STORED” exactly as written, or the chat keeps spinning.
AutoGen v0.4 changed how tool errors propagate. Errors raised from inside an async tool function become user-visible messages in the chat, which the next agent reads as part of the conversation. Catch and format errors carefully.
The OpenTelemetry exporter is opt-in. Without it, debugging a stuck chat across distributed workers is painful.
The selector_prompt for SelectorGroupChat is a single string. For complex selection logic, the prompt grows large and starts to dominate cost. At that point a custom Selector callable is cleaner than a longer prompt.
AutoGen does not retry tool errors automatically. Wrap each tool function with a small retry decorator if you want resilience without involving the chat in error handling.
Token usage attribution per agent is not exposed by default. Use the OTel hooks to record per-span token counts if you need cost attribution.
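For the retry gotcha above, a decorator along these lines is usually enough. It is a sketch using only the standard library; the name with_retries, the attempt count, and the backoff constants are illustrative.

```python
import asyncio
import functools
import random

def with_retries(attempts: int = 3, base_delay_s: float = 0.5):
    """Retry an async tool on any exception, with exponential backoff and jitter."""
    def decorator(fn):
        @functools.wraps(fn)  # keeps the tool's name, docstring, and annotations
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: let the error surface
                    delay = base_delay_s * 2 ** (attempt - 1)
                    await asyncio.sleep(delay + random.uniform(0, delay / 2))
        return wrapper
    return decorator
```

Note that fetch_url and render_url as written earlier catch their own exceptions and return FETCH_ERROR or RENDER_ERROR strings, so the decorator only helps once the inner call is allowed to raise and the string formatting moves outside the retried function.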

If you are picking a multi-agent framework for a new scraping initiative, the AI agentic proxies category has head-to-head writeups that cover the proxy and infrastructure layer too.