Slack is where most engineering and growth teams already live, which makes it the obvious place to surface scraped data without forcing anyone to open a dashboard. Building a Slack bot that scrapes the web with Claude cuts the loop from "I need that data" to "here it is" down to a single message. This guide walks through a working 2026 setup using the Slack Bolt SDK, Anthropic's Claude API, and Playwright, with honest notes on where it breaks.
Why Claude Instead of a Rules-Based Scraper
Traditional scraper bots are brittle: you write selectors, sites change their markup, and suddenly your pipeline returns empty strings at 3am. Claude changes the dynamic: instead of matching CSS paths, you describe what you want in plain English and let the model extract it from raw HTML or markdown. The tradeoff is latency (an LLM call adds 1-3 seconds) and cost, but for on-demand queries in Slack it is usually acceptable.
The architecture that works in practice is a thin Bolt app that receives slash commands or mentions, hands the URL and extraction intent to a tool-calling loop, and posts the result as a formatted message. If you have built anything similar for Discord, the Slack version shares about 80% of the same plumbing; see Building a Discord Bot That Scrapes for Your Server (2026) for that variant. The core difference is Slack's Block Kit response format versus Discord embeds.
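To make the Block Kit difference concrete, here is a minimal sketch of how a scrape result might be shaped for Bolt's `say()`. The helper name and the particular block layout are illustrative, not part of the Slack API; `section` and `context` are standard Block Kit block types.

```python
# Build kwargs for Bolt's say(): rich blocks for modern clients,
# plain text as the notification fallback.
def format_scrape_result(url: str, summary: str) -> dict:
    return {
        "text": summary,  # fallback shown in notifications and older clients
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {"type": "context", "elements": [
                {"type": "mrkdwn", "text": f"Source: <{url}>"},
            ]},
        ],
    }
```

Inside a handler you would call `say(**format_scrape_result(url, summary))`; the equivalent Discord code builds an embed object instead, but the surrounding logic is identical.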
Stack and Dependencies
You need four things:
- `slack-bolt >= 1.18` (Python) for the event loop and slash command handling
- `anthropic >= 0.25` for tool use with claude-sonnet-4-6
- `playwright` for JS-rendered pages (install with `playwright install chromium`)
- a residential or mobile proxy if you are scraping anything beyond public static pages
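Assuming a fresh virtualenv, the install step looks like this (version pins match the list above):

```shell
pip install "slack-bolt>=1.18" "anthropic>=0.25" playwright
playwright install chromium   # downloads the Chromium build Playwright drives
```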
For proxy selection, here is a quick comparison of what works reliably in 2026:
| Provider type | JS rendering | Bot detection risk | Cost estimate |
|---|---|---|---|
| Datacenter | yes | high | $0.50-2/GB |
| Residential rotating | yes | low-medium | $3-8/GB |
| Mobile 4G rotating | yes | very low | $10-25/GB |
| Static residential | no (usually) | medium | $2-5/GB |
For most Slack bot use cases (checking a competitor price, pulling a LinkedIn post count, grabbing a job listing), residential rotating is the right balance. Mobile proxies make sense if you are hitting mobile-first sites or anything with aggressive bot scoring.
Setting Up the Bolt App with Tool Use
The Slack app needs the `app_mentions:read`, `chat:write`, and `commands` scopes. Run it in Socket Mode during development so you avoid ngrok. Here is the minimal handler:
```python
import asyncio
import os

import anthropic
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
client = anthropic.Anthropic()


@app.event("app_mention")
def handle_mention(event, say):
    user_text = event["text"]
    result = run_scrape_agent(user_text)
    say(result)


def run_scrape_agent(prompt: str) -> str:
    tools = [{
        "name": "fetch_page",
        "description": "Fetch a URL and return its text content",
        "input_schema": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    }]
    messages = [{"role": "user", "content": prompt}]
    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            # end_turn (or max_tokens): return whatever text Claude produced
            return "".join(b.text for b in resp.content if b.type == "text")
        # Echo the assistant turn back, then answer every tool call in it
        messages.append({"role": "assistant", "content": resp.content})
        for block in resp.content:
            if block.type == "tool_use" and block.name == "fetch_page":
                html = asyncio.run(fetch_with_playwright(block.input["url"]))
                messages.append({"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": html[:15000],  # truncate to bound token cost
                }]})


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```

The 15,000-character truncation is intentional. Claude can extract structured data from a partial page; sending the full DOM of a modern SPA burns tokens without improving accuracy.
Handling Anti-Bot Measures
Most blocks happen at the network level, not the extraction level. A working setup in 2026 uses:
- Route Playwright through a rotating residential proxy (`--proxy-server=http://user:pass@host:port`)
- Set a real browser User-Agent (Playwright's default headless fingerprint is flagged by Cloudflare and DataDome)
- Add a 1-2 second random delay before extraction to mimic human read time
- For Cloudflare-protected pages, use a stealth plugin like `playwright-stealth` or switch to a managed scraping API for those specific domains
The bot does not need to handle every site perfectly. In practice, 70-80% of Slack queries target unprotected or lightly protected pages. For the rest, you can add a fallback that responds with a link to the page and a note that direct extraction was blocked. The Telegram Bot for Web Scraping Alerts (2026 Setup) guide covers a similar fallback pattern with alert routing, which translates directly to Slack's `say()` interface.
Rate Limits, Costs, and Scaling
A few numbers from running this in production with a 15-person team:
- average Claude call per query: 1,200 input tokens + 300 output = roughly $0.002 at Sonnet pricing
- average Playwright page load: 800ms on a residential proxy
- end-to-end Slack response time: 3-5 seconds for most queries
That is cheap enough to leave unrestricted for small teams. For larger orgs, add a per-user rate limit (the Slack user ID is in `event["user"]`) and a daily token budget tracked in Redis or a simple Postgres table.
The WhatsApp Bot for Price Alerts via Web Scraping (2026) post benchmarks similar per-query costs across messaging platforms; Slack lands in the middle, cheaper than the WhatsApp Business API per message but slightly more complex to auth than Telegram.
If you want to go beyond single-URL extraction and build multi-step agent scrapers (crawling paginated results, filling forms, handling login flows), the patterns in Claude Code for Web Scraping: Building Agent Scrapers in 2026 apply directly to this architecture. The tool-calling loop above is the same primitive, just triggered from Slack instead of a terminal.
Caching Repeated Queries
If the same URL gets queried more than once in a 15-minute window, serve the cached result rather than re-fetching. A simple dict keyed by URL, storing the fetch timestamp alongside the result, works at small scale. This cuts costs by 40-60% for teams that monitor the same pages repeatedly (competitor pricing, job boards, public dashboards).
Deployment Checklist
Deploying to production takes about 30 minutes if you use Railway or Fly.io:
- Move from Socket Mode to HTTP mode (set `SLACK_APP_TOKEN` only for dev)
- Set environment variables: `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `ANTHROPIC_API_KEY`, proxy credentials
- Add a health check endpoint (`GET /health` returning 200) for your host's uptime check
- Set `PLAYWRIGHT_BROWSERS_PATH=/ms-playwright` and run `playwright install --with-deps chromium` in your Dockerfile
- Cap memory at 512MB minimum: Chromium plus your app will push past 256MB under load
Socket Mode is fine for personal use or internal tooling. The HTTP deployment adds the signing secret verification step but is otherwise identical code.
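A minimal sketch of the HTTP-mode wiring using Bolt's Flask adapter; the route paths and port fallback are conventions, not requirements, and `bolt_app` stands in for the `App` instance built earlier:

```python
import os

from flask import Flask, request
from slack_bolt import App
from slack_bolt.adapter.flask import SlackRequestHandler

bolt_app = App(
    token=os.environ["SLACK_BOT_TOKEN"],
    signing_secret=os.environ["SLACK_SIGNING_SECRET"],  # verified per request
)

flask_app = Flask(__name__)
handler = SlackRequestHandler(bolt_app)


@flask_app.route("/slack/events", methods=["POST"])
def slack_events():
    # Bolt verifies the signing secret and dispatches to your handlers
    return handler.handle(request)


@flask_app.route("/health")
def health():
    return "ok", 200  # uptime check target for Railway / Fly.io


if __name__ == "__main__":
    flask_app.run(port=int(os.environ.get("PORT", 3000)))
```

Point the Slack app's Event Subscriptions request URL at `/slack/events`; everything registered with `@app.event` and slash command decorators keeps working unchanged.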
Bottom Line
A Slack bot backed by Claude and Playwright is one of the fastest ways to put on-demand web data in front of a team that will not leave Slack to get it. Start with the Bolt + tool-use pattern above, add a residential proxy for anything beyond public pages, and cache aggressively. DRT covers this infrastructure layer in depth; if you are standardizing a data collection stack, check the rest of the scraping infrastructure guides for proxy selection and anti-bot tooling.