Web scraping gets expensive the moment JavaScript rendering, screenshots, and anti-bot handling stop being edge cases and become the default path. In 2026, Modal Labs is one of the cleanest ways to ship that heavier scraper stack without babysitting servers, Docker hosts, or idle browser pools. You write Python, wrap jobs with `@app.function`, ship with `modal deploy`, and let scale-to-zero handle bursty demand. That model is especially strong when your scraper is not just fetching HTML, but launching Playwright, saving browser state, and running vision steps on screenshots before turning the result into structured data.
## Where Modal Labs actually wins for scraping
Modal is best when your workload is spiky, CPU or GPU heavy, and annoying to operate on traditional app platforms. A basic requests-based crawler does not need it. A scraper that opens Chromium, waits for client-side rendering, scrolls, screenshots, classifies elements, and retries through proxies often does.
Three use cases fit particularly well:
- JS-heavy SPA scraping at burst scale
- Vision-assisted extraction from screenshots
- One-shot jobs triggered by queues, cron, or webhooks
The big operational advantage is that Modal’s unit of deployment matches scraper reality. Most scraping jobs are independent, parallel, and short-lived. You do not want a permanently warm VM just to catch ten jobs per hour. Modal lets you run one function per job, fan out aggressively, and pay nothing when idle.
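That fan-out driver can be sketched in a few lines. Batching URLs before dispatch amortizes cold starts across jobs; `scrape_page_batch` below is a hypothetical deployed Modal function, not something defined in this article:

```python
def chunk(items, size):
    """Split a flat job list into fixed-size batches, one batch per container."""
    return [items[i:i + size] for i in range(0, len(items), size)]

urls = [f"https://example.com/page/{n}" for n in range(10)]
batches = chunk(urls, 4)

# With a deployed Modal function, fan-out is one call; each batch runs in
# its own container and the pool scales back to zero when the burst ends:
#   results = list(scrape_page_batch.map(batches))
print(len(batches))  # 3 batches: two of 4 URLs, one of 2
```

The batch size is a tuning knob: larger batches waste less startup time, smaller ones parallelize harder.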
That makes it a better fit than app-first platforms for ephemeral parallelism. If you have already looked at Deploying Scrapers on Render 2026: Background Worker Patterns, the contrast is straightforward: Render is comfortable for always-on worker queues, but Modal is cleaner when concurrency spikes unpredictably. The same pattern shows up against Deploying Scrapers on Cloudflare Workers 2026: Limits and Workarounds, because Workers are excellent for lightweight edge logic, but the 30-second CPU limit makes full browser automation and on-box model inference awkward fast.
The caveat is just as important. Modal is not cheap if your scraper runs flat-out all day. Once utilization is steady, bare metal economics catch up hard, which is why Deploying Scrapers on Hetzner: Cheapest Production Stack 2026 remains the price leader for sustained volume.
## Serverless GPU headless browsers: when they pay for themselves
Most scrapers do not need a GPU. If you are only using Playwright to wait for hydration and click a pagination button, CPU containers are enough. GPU becomes rational when the browser is only half the job and perception is the other half.
Typical examples:
- Taking screenshots and running a local detector such as YOLO or Florence to find buttons, solve visual flows, or label page regions
- Sending images to a model like Gemini 2.0 Flash for structured extraction from rendered pages, receipts, dashboards, or CAPTCHA-adjacent flows
- Processing large batches of screenshots where browser time and model time are tightly coupled
That third category is where Modal gets interesting. Instead of screenshotting on one system and shipping images to another pipeline, you can keep Playwright and inference in the same function. That reduces glue code, cuts queue churn, and simplifies retries.
If your vision stack is API-first, read Gemini 2.0 Flash for Web Scraping: Cheap Multi-Modal Scrapers in 2026. The practical pattern is often CPU browser plus external model API first, then move to T4 or A10G only when latency, screenshot volume, or local model control justify it.
A simple Modal layout looks like this:
```python
import modal

app = modal.App("playwright-scraper")

image = (
    modal.Image.debian_slim()
    .pip_install("playwright==1.52.0", "httpx", "beautifulsoup4")
    .run_commands("playwright install --with-deps chromium")
)

cache = modal.Volume.from_name("scraper-cache", create_if_missing=True)

@app.function(
    image=image,
    cpu=2,
    memory=4096,
    timeout=900,
    volumes={"/cache": cache},
)
def scrape_page(url: str):
    import os
    from playwright.sync_api import sync_playwright

    # Reuse saved cookies and session state only if a previous run committed it
    state = "/cache/state.json" if os.path.exists("/cache/state.json") else None
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(storage_state=state)
        page = context.new_page()
        page.goto(url, wait_until="networkidle", timeout=90_000)
        html = page.content()
        page.screenshot(path="/cache/last-page.png", full_page=True)
        context.storage_state(path="/cache/state.json")
        browser.close()
    cache.commit()  # persist session state and the screenshot to the volume
    return {"url": url, "html_len": len(html)}

@app.local_entrypoint()
def main():
    print(scrape_page.remote("https://example.com"))
```

That example is intentionally boring, which is good. Boring deployments survive production.
## Costs, cold starts, and the math that matters
Modal’s pricing only looks expensive or cheap when divorced from utilization. CPU-only functions cost about $0.000011 per vCPU-second. T4 is about $0.000583 per GPU-second, roughly $2.10 per GPU-hour. A10G is about $0.000972 per GPU-second, roughly $3.50 per GPU-hour. There is no idle bill because scale-to-zero is the point.
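The per-second rates convert directly into the hourly figures, and the same arithmetic prices an individual job. A quick sanity check using the rates quoted above:

```python
# Per-second rates quoted above
CPU_PER_VCPU_SEC = 0.000011
T4_PER_SEC = 0.000583
A10G_PER_SEC = 0.000972

t4_hourly = T4_PER_SEC * 3600      # ~2.10 USD per GPU-hour
a10g_hourly = A10G_PER_SEC * 3600  # ~3.50 USD per GPU-hour

# A 2-vCPU browser job running 120 seconds costs a fraction of a cent
job_cost = 2 * CPU_PER_VCPU_SEC * 120

print(round(t4_hourly, 2), round(a10g_hourly, 2), round(job_cost, 5))
```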
Here is the practical comparison:
| Platform | Best use case | Main strength | Main weakness |
|---|---|---|---|
| Modal Labs | Bursty browser jobs, vision scraping | Scale-to-zero, GPU option, simple Python deployment | Cold starts, shared datacenter IPs |
| Render | Always-on queues, stable worker apps | Familiar app platform, persistent workers | Less elegant for massive short bursts |
| Cloudflare Workers | Lightweight fetch, edge transforms | Very fast global edge execution | 30s CPU limit blocks heavy browser work |
| Fly.io | Region-sensitive scraping | Better region pinning and placement control | More ops overhead than pure serverless |
| Hetzner | Constant high-volume scraping | Lowest long-run cost | You operate the fleet |
Cold starts are real, so plan for them. CPU containers usually come up in 5 to 15 seconds. GPU containers with browser binaries are more like 20 to 40 seconds. If your use case depends on sub-second first response, Modal is the wrong primary host. If your jobs run for 60 to 300 seconds and arrive in bursts, those startup penalties are usually acceptable.
The decision rule is simple. Use Modal when the savings from not running idle infrastructure are larger than the premium you pay during execution. A scraper that runs 200 short browser jobs per day often fits Modal well. A farm running 24/7 across thousands of sessions usually belongs on cheaper dedicated infrastructure.
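The decision rule reduces to arithmetic. A minimal sketch using the CPU rate quoted above, comparing the two workload shapes from the previous paragraph (the job sizes and concurrency figures are illustrative assumptions):

```python
CPU_PER_VCPU_SEC = 0.000011  # Modal CPU rate quoted earlier

def monthly_modal_cost(jobs_per_day, seconds_per_job, vcpus=2):
    """Pay-per-use cost for bursty jobs; idle time costs nothing."""
    return jobs_per_day * 30 * seconds_per_job * vcpus * CPU_PER_VCPU_SEC

# 200 two-minute browser jobs per day at 2 vCPUs each
bursty = monthly_modal_cost(200, 120)

# A 24/7 farm: roughly 50 concurrent 2-vCPU sessions, all month
sustained = 50 * 30 * 86400 * 2 * CPU_PER_VCPU_SEC

print(round(bursty, 2), round(sustained, 2))
```

Under these assumptions the bursty workload lands in the tens of dollars per month, while the always-on farm runs into the thousands, which is exactly where dedicated hardware starts winning.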
## The deployment pattern that works in production
The cleanest production pattern on Modal is not one giant scraper function. It is a small set of focused functions with explicit state handling.
A good baseline looks like this:
- one fetch function for browser navigation
- one parse function for HTML or screenshot extraction
- one storage function or queue handoff
- one `modal.Volume` for cached profiles, cookies, and downloads
Volumes matter more than people expect. With `modal.Volume`, you can persist browser profiles, cookies, downloaded files, and intermediate screenshots across runs. That is useful for authenticated scraping, session reuse, and reducing repeated setup cost. It is not a database, but it is a practical cache layer for scraper state.
A deployment flow usually stays simple:
1. Build the container image with Chromium and your parsing libraries
2. Mount a volume for browser state and artifacts
3. Expose a web endpoint or queue-triggered function
4. Deploy with `modal deploy scraper.py`
5. Run one-shot jobs during development with `modal run scraper.py::scrape_page`
Two production warnings deserve emphasis. First, Modal egress IPs are shared AWS us-east-1 datacenter IPs. Serious anti-bot systems will fingerprint them quickly. If the target matters, pair Modal with residential proxies. Second, geographic control is weaker than region-pinned worker models, so if source-country affinity is non-negotiable, Deploying Scrapers on Fly.io 2026: Region-Pinned Workers may be a better fit.
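Pairing proxies with Modal stays inside the function. A minimal sketch, assuming credentials arrive via environment variables; `PROXY_SERVER`, `PROXY_USER`, and `PROXY_PASS` are illustrative names you might populate from a Modal secret, not built-ins:

```python
import os

def proxy_settings():
    """Build the proxy dict Playwright's launch(proxy=...) accepts,
    or None when no proxy is configured."""
    server = os.environ.get("PROXY_SERVER")
    if not server:
        return None
    return {
        "server": server,
        "username": os.environ.get("PROXY_USER", ""),
        "password": os.environ.get("PROXY_PASS", ""),
    }

# Inside scrape_page, the launch line becomes:
#   browser = p.chromium.launch(headless=True, proxy=proxy_settings())
```

Returning `None` when no proxy is set keeps local development on a direct connection without code changes.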
The opinionated default: start on CPU, add persistent volumes early, bring proxies before adding GPUs, and only move to T4 or A10G once the value of on-box vision is proven. Teams often reverse that order and waste money.
## Bottom line
Modal Labs is a strong 2026 choice for bursty, browser-heavy scrapers, especially when Playwright and vision inference need to live in the same pipeline. It is not the cheapest host for sustained workloads, and it is not the stealthiest because of shared AWS egress, but it is one of the fastest ways to get serverless GPU scraping into production with minimal ops. If you are comparing deployment hosts seriously, dataresearchtools.com covers Render, Workers, Fly.io, and Hetzner in the same series depending on your traffic shape.
## Related guides on dataresearchtools.com
- Deploying Scrapers on Render 2026: Background Worker Patterns
- Deploying Scrapers on Cloudflare Workers 2026: Limits and Workarounds
- Deploying Scrapers on Hetzner: Cheapest Production Stack 2026
- Deploying Scrapers on Fly.io 2026: Region-Pinned Workers
- Pillar: Gemini 2.0 Flash for Web Scraping: Cheap Multi-Modal Scrapers in 2026