n8n has quietly become one of the most capable scraping orchestration layers available in 2026, especially for teams that want self-hosted control, JavaScript execution, and Playwright support without paying per-operation fees. If you’re evaluating automation tools for data pipelines, web scraping with n8n covers a wider range of extraction patterns than most engineers expect from a workflow tool.
## Why n8n Works for Scraping Workflows
Most no-code tools hit a wall when pages require JavaScript rendering. n8n sidesteps this by letting you run arbitrary Node.js inside the Code node, and with a community browser-automation node such as n8n-nodes-playwright installed, you get full browser control inside your workflow.
The self-hosted model matters here. Unlike web scraping with Make.com, where every HTTP call consumes an operation credit, n8n on a VPS has no per-run cost. You pay for your server, not your scraping volume.
## HTTP Request Node: The Right Patterns
The HTTP Request node is the fastest path for static pages and APIs. Key settings to get right:
- Authentication: supports OAuth2, API keys, and custom headers out of the box
- Response format: set to `JSON` for APIs, `String` for raw HTML, `File` for binary downloads
- Batching: use the SplitInBatches node upstream to stay under rate limits
- Retry on fail: enable with a 2-5 second wait; most transient 429s resolve in one retry
For pagination, the most reliable pattern is a Loop using a cursor or page number stored in a workflow variable:
```
// HTTP Request node - Query Parameters
{
  "page": "={{ $workflow.variables.currentPage }}",
  "per_page": "50"
}
```

Set a Merge node at the loop exit to accumulate results, then push to your destination. This handles pagination up to several thousand pages without memory issues if you flush to storage every N iterations.
## Handling 403s and Anti-Bot Responses
Rotate your User-Agent in the header field using an expression:
```javascript
// Code node - before HTTP Request
const agents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
];
return [{ json: { ua: agents[Math.floor(Math.random() * agents.length)] } }];
```

For persistent 403s, you need residential proxies. Set the HTTP Request node’s proxy fields to your rotating endpoint. This is more reliable than anything Zapier offers at this layer — see the web scraping with Zapier comparison for what Zapier can and cannot reach.
## Playwright Workflows for JS-Heavy Sites
Install the community Playwright node (n8n-nodes-playwright) on your self-hosted instance. Once registered, you get a browser automation node with standard goto/click/extract actions.
A realistic Playwright workflow in n8n looks like:
- Trigger: Schedule node (every 6 hours)
- Set URL list: Code node that defines target pages
- SplitInBatches: Process 5 URLs at a time to limit memory
- Playwright node: Navigate, wait for selector, extract inner text or attribute
- Code node: Clean and normalize extracted data
- Postgres/Google Sheets node: Write results
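The cleanup step in that chain might look like this in the Code node. The `title` and `price` fields are illustrative; substitute whatever your Playwright extraction emits, and in n8n you would map over `$input.all()` rather than a local array:

```javascript
// Simulated Playwright output; in an actual Code node, read items via $input.all().
const raw = [
  { title: '  Widget A \n', price: '$1,299.00' },
  { title: 'Widget B', price: '  $49.50' }
];

const cleaned = raw.map((item) => ({
  title: item.title.trim().replace(/\s+/g, ' '),    // collapse internal whitespace
  price: Number(item.price.replace(/[^0-9.]/g, '')) // strip currency symbols and commas
}));

console.log(cleaned);
```

Normalizing prices to numbers here, before the write step, keeps the Postgres or Sheets node free of type-coercion surprises.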
The Playwright node’s “Wait for selector” field is critical. Use CSS selectors, not XPath, for reliability. Set a 10-15 second timeout for SPAs that lazy-load content.
For comparison, web scraping with Pipedream has tighter Node.js integration but no native browser node — you’d need to call an external Browserless API. n8n keeps it self-contained.
## n8n vs. Other Automation Tools for Scraping
| Tool | Browser Support | Self-Hosted | Cost Model | Rate Limit Control |
|---|---|---|---|---|
| n8n | Yes (Playwright node) | Yes | Server cost | Full control |
| Make.com | No native | No | Per-operation | Limited |
| Zapier | No | No | Per-task | Limited |
| Pipedream | Via API call | Partial | Per-compute | Good |
| Activepieces | No native | Yes | Server cost | Good |
n8n and Activepieces are the two serious self-hosted contenders. Activepieces has a cleaner UI and faster iteration cycle for simple HTTP scrapes. n8n wins when you need the Playwright node, complex branching logic, or the Code node’s full Node.js environment.
## Error Handling and Production Reliability
n8n’s error workflow feature is underused. Set a dedicated Error Workflow in Settings and handle failed executions centrally:
- Log errors to a Google Sheet or Supabase table with `executionId`, `nodeName`, `timestamp`, and `errorMessage`
- Alert via Telegram or Slack node on repeated failures
- Use the `continueOnFail` toggle on the HTTP Request node for non-critical extractions so one bad URL doesn’t kill the entire batch
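Inside the Error Workflow, the Error Trigger node emits an object describing the failed execution; a Code node can flatten it into a row for the log table. The payload shape below is approximate — inspect a real failed execution in your instance before relying on specific paths:

```javascript
// Simulated Error Trigger payload; the nested structure is an assumption,
// stubbed locally so the transformation can run standalone.
const payload = {
  execution: {
    id: '1042',
    error: { message: 'connect ETIMEDOUT', node: { name: 'HTTP Request' } }
  },
  workflow: { name: 'price-scraper' }
};

// Flatten into one row matching the logging fields listed above.
const row = {
  executionId: payload.execution.id,
  nodeName: payload.execution.error.node.name,
  timestamp: new Date().toISOString(),
  errorMessage: payload.execution.error.message,
  workflowName: payload.workflow.name
};

console.log(row.executionId, row.nodeName);
```

Feed `row` straight into the Google Sheets or Supabase node; one centralized Error Workflow then covers every scraping workflow on the instance.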
For high-frequency scraping (sub-hourly), run n8n with queue mode enabled (`EXECUTIONS_MODE=queue`) and a Redis backend. This prevents workflow collisions and gives you execution visibility in the n8n UI. A $6/mo DigitalOcean droplet handles 20-30 concurrent light HTTP scrapes without issue. Playwright workflows need at least 2GB RAM — plan for a $12-18/mo instance.
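A minimal queue-mode environment might look like the following; the variable names follow n8n's queue-mode configuration, while the Redis host is a placeholder for your own setup:

```
# Shared by the main instance and all workers
EXECUTIONS_MODE=queue
QUEUE_BULL_REDIS_HOST=redis
QUEUE_BULL_REDIS_PORT=6379

# Start each additional worker process with: n8n worker
```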
One production gotcha: n8n’s built-in scheduler drifts slightly over time. For exact-interval scraping, trigger via an external cron (a simple curl to the webhook trigger URL) rather than relying on the Schedule node for precision.
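The external-cron pattern is a one-line crontab entry hitting a Webhook trigger node; the hostname and webhook path below are placeholders for whatever you configure:

```
# Fire the scrape workflow every 15 minutes, on the minute
*/15 * * * * curl -fsS https://n8n.example.com/webhook/scrape-trigger >/dev/null 2>&1
```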
For deeper coverage of n8n scraping patterns including authentication flows, JavaScript injection, and output normalization, the n8n web scraping guide on DRT goes further than what a single article can cover.
## Bottom line
n8n is the right pick in 2026 if you want self-hosted browser-level scraping without per-operation billing and without writing a custom scraper from scratch. Use the HTTP Request node for API-style and static targets, add the Playwright community node for anything JS-rendered, and deploy with queue mode once you hit production scale. DataResearchTools covers the full scraping stack — tools, proxies, anti-bot bypass, and infrastructure — if you need to go deeper on any layer of this pipeline.