Google ADK + Web Scraping: Build AI Agents with Proxy Integration

TL;DR
Google’s Agent Development Kit (ADK) lets you build multi-step AI agents that can browse and scrape the web as part of a larger workflow. this guide shows how to connect ADK agents to web scraping tools and route requests through proxies for reliable data collection.

what is Google ADK

Google ADK (Agent Development Kit) is an open-source Python framework released in April 2025 for building multi-agent systems powered by Gemini models. it provides a structured way to define agents, give them tools, and orchestrate multi-step workflows. think of it as a runtime for agents that can call functions, run sub-agents, and maintain state across steps.

for scraping use cases, ADK’s tool system is the key piece. you can define a Python function as a tool and the agent will decide when to call it, what arguments to pass, and how to use the result in subsequent steps. this is fundamentally different from writing a scraping script where you hardcode the sequence of operations.

installing ADK

pip install google-adk requests beautifulsoup4
export GOOGLE_API_KEY='your-gemini-api-key'

defining a scraping tool for ADK

ADK tools are Python functions with type annotations and docstrings. the docstring is what the agent uses to understand when and how to call the tool:

import requests
from bs4 import BeautifulSoup
from google.adk.agents import Agent

def fetch_webpage(url: str, proxy_url: str = '') -> dict:
    """Fetches a web page and returns its title, visible text, and links.

    Args:
        url: the full URL of the page to fetch.
        proxy_url: optional proxy in http://user:pass@host:port format.

    Returns:
        a dict with 'title', 'text', and 'links' keys, or an 'error' key on failure.
    """
    proxies = {'http': proxy_url, 'https': proxy_url} if proxy_url else None
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        response = requests.get(url, headers=headers, proxies=proxies, timeout=15)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        # drop non-content tags before extracting text
        for tag in soup(['script', 'style', 'nav']):
            tag.decompose()
        return {
            'title': soup.title.string if soup.title else '',
            'text': soup.get_text(separator='\n', strip=True)[:5000],  # cap to keep token usage sane
            'links': [a['href'] for a in soup.find_all('a', href=True)][:20]
        }
    except Exception as e:
        # return errors as data so the agent can react instead of crashing
        return {'error': str(e)}
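
before wiring the tool into an agent, it's worth calling it directly to confirm the parsing works (example.com here is just a stable test page, not part of the original setup):

# plain function call, no agent or model involved
print(fetch_webpage('https://example.com'))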

building the ADK agent

scraping_agent = Agent(
    name='web_scraper',
    model='gemini-2.0-flash',
    description='an agent that scrapes web pages and extracts structured data',
    instruction='you are a web scraping agent. fetch pages, extract the requested data, return JSON.',
    tools=[fetch_webpage]
)

running the agent

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai.types import Content, Part
import asyncio

session_service = InMemorySessionService()
runner = Runner(agent=scraping_agent, app_name='scraper', session_service=session_service)

async def run_task(task: str):
    session = await session_service.create_session(app_name='scraper', user_id='user1')
    message = Content(role='user', parts=[Part(text=task)])
    async for event in runner.run_async(user_id='user1', session_id=session.id, new_message=message):
        if event.is_final_response():
            # the final answer arrives as a Content object, same type as the input message
            return event.content.parts[0].text

result = asyncio.run(run_task('fetch https://news.ycombinator.com and list the top 5 story titles'))
print(result)

multi-step scraping workflows

where ADK shines is in chaining steps the agent figures out on its own. instead of writing a script that fetches a list page, extracts URLs, then fetches each URL, you can give the agent the goal and let it plan the steps. the agent handles the iteration logic. your job is to provide reliable tools and sensible rate-limiting inside those tools.
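
a minimal sketch of that division of labor, reusing the run_task helper from the previous section: the delay lives in the tool (fetch_webpage_throttled is a hypothetical wrapper that would replace fetch_webpage in the agent's tools list), while the prompt only states the goal:

import time

_last_fetch = 0.0

def fetch_webpage_throttled(url: str, proxy_url: str = '') -> dict:
    """Fetches a web page, waiting at least 2 seconds between requests."""
    global _last_fetch
    wait = 2.0 - (time.time() - _last_fetch)
    if wait > 0:
        time.sleep(wait)  # rate-limiting belongs in the tool, not the prompt
    _last_fetch = time.time()
    return fetch_webpage(url, proxy_url=proxy_url)

# a goal-level prompt; the agent plans the list-page fetch and the per-URL fetches itself
result = asyncio.run(run_task(
    'fetch https://news.ycombinator.com, pick the top 3 story URLs, '
    'fetch each one, and give a one-sentence summary of each'
))
print(result)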

adding proxy rotation

import random
PROXY_POOL = [
    'http://user:pass@proxy1.example.com:8080',
    'http://user:pass@proxy2.example.com:8080',
]

def fetch_with_rotation(url: str) -> dict:
    """Fetches a web page through a randomly chosen proxy from the pool."""
    proxy = random.choice(PROXY_POOL)  # fresh proxy per request
    return fetch_webpage(url, proxy_url=proxy)
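
to actually use rotation, register the wrapper as the agent's tool in place of fetch_webpage; rotating_scraper below is a hypothetical name, and the rest mirrors the earlier Agent definition:

rotating_agent = Agent(
    name='rotating_scraper',
    model='gemini-2.0-flash',
    description='an agent that scrapes web pages through a rotating proxy pool',
    instruction='you are a web scraping agent. fetch pages, extract the requested data, return JSON.',
    tools=[fetch_with_rotation]  # the agent only ever sees the rotating wrapper
)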

for residential or mobile proxies with sticky sessions, see our guide on proxy servers for connection format details.
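
as a rough sketch of what sticky sessions look like in code: many providers encode a session ID in the proxy username so repeated requests reuse the same exit IP. STICKY_PROXY and fetch_sticky below are hypothetical, and the username syntax is provider-specific:

# hypothetical sticky-session credentials; the 'user-session-abc123'
# username format varies from provider to provider
STICKY_PROXY = 'http://user-session-abc123:pass@gw.example.com:8080'

def fetch_sticky(url: str) -> dict:
    """Fetches a web page through one sticky session so the exit IP stays stable."""
    return fetch_webpage(url, proxy_url=STICKY_PROXY)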

ADK vs writing a scraping script

use ADK when: the scraping task requires navigation decisions, the target sites change structure frequently, or you need to combine scraping with summarization or classification. use a traditional script when: the site structure is stable and predictable, you need maximum throughput, or you want to minimize API costs. see our web scraping fundamentals guide for the traditional approach.
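
for contrast, the traditional-script version of the earlier Hacker News task fits in a few lines: the sequence is hardcoded and there is no model in the loop. the CSS selector is an assumption about the page's current markup:

import requests
from bs4 import BeautifulSoup

# hardcoded sequence: one fetch, one fixed selector, no LLM calls
resp = requests.get('https://news.ycombinator.com',
                    headers={'User-Agent': 'Mozilla/5.0'}, timeout=15)
soup = BeautifulSoup(resp.text, 'html.parser')
# '.titleline > a' assumes the current HN markup for story titles
titles = [a.get_text() for a in soup.select('.titleline > a')][:5]
print(titles)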
