Axios Retry for Web Scraping in Node.js: The Complete Guide
When you are scraping websites at any meaningful scale, requests will fail. Servers return 429 rate-limit errors, proxies time out, connections drop, and CAPTCHAs appear unexpectedly. The difference between a hobby scraper and a production-grade one is how it handles these failures.
Axios is one of the most popular HTTP clients in the Node.js ecosystem, and with the right retry configuration it becomes a reliable foundation for web scraping. This guide covers everything from basic retry setup to advanced patterns with proxy rotation, exponential backoff, and intelligent error classification.
Why Axios for Web Scraping
Axios has several advantages over other Node.js HTTP clients for scraping:
- Interceptors let you modify requests and responses globally, which is perfect for adding headers, rotating proxies, and logging
- Automatic JSON parsing simplifies API scraping
- Built-in timeout support prevents hung connections from blocking your scraper
- Proxy support through the proxy config option or HTTP agents
- A wide ecosystem including axios-retry, axios-rate-limit, and other middleware
The main alternatives are got and undici, but Axios’s interceptor system makes it particularly well-suited to the middleware patterns common in scraping.
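As a quick illustration of why interceptors matter, here is a minimal sketch (the client and header are placeholders) that stamps a header onto every outgoing request and logs every response status through a single client:

const axios = require('axios');

const client = axios.create({ timeout: 10000 });

// request interceptor: runs before every request this client sends
client.interceptors.request.use((config) => {
  config.headers['Accept-Language'] = 'en-US,en;q=0.9';
  return config;
});

// response interceptor: runs on every response this client receives
client.interceptors.response.use((response) => {
  console.log(`${response.status} ${response.config.url}`);
  return response;
});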
Basic Axios Retry Setup
Installation
npm install axios axios-retry
Minimal Configuration
const axios = require('axios');
const axiosRetry = require('axios-retry').default;
const client = axios.create({
timeout: 30000, // 30 second timeout
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
}
});
axiosRetry(client, {
retries: 3,
retryDelay: axiosRetry.exponentialDelay,
retryCondition: (error) => {
return axiosRetry.isNetworkOrIdempotentRequestError(error) ||
error.response?.status === 429 ||
error.response?.status >= 500;
}
});
async function scrape(url) {
try {
const response = await client.get(url);
return response.data;
} catch (error) {
console.error(`failed to scrape ${url}: ${error.message}`);
return null;
}
}
This basic setup retries requests up to 3 times with exponential backoff when it encounters network errors, 429 rate limits, or 5xx server errors.
Understanding Retry Strategies
Exponential Backoff
Exponential backoff doubles the delay after each failed attempt. It is the most common strategy for scraping because it gives overwhelmed servers time to recover.
// axios-retry's built-in exponential delay (100ms base, doubling each retry)
// retry 1: ~200ms (plus up to 20% jitter)
// retry 2: ~400ms
// retry 3: ~800ms
// retry 4: ~1600ms
axiosRetry(client, {
retries: 5,
retryDelay: (retryCount, error) => {
const delay = axiosRetry.exponentialDelay(retryCount, error);
console.log(`retry ${retryCount}: waiting ${delay}ms before next attempt`);
return delay;
}
});
Custom Backoff with Jitter
Adding randomness (jitter) prevents multiple scrapers from retrying at exactly the same time, which would cause a thundering herd problem:
function customBackoff(retryCount, error) {
const baseDelay = 1000; // 1 second
const maxDelay = 30000; // 30 seconds cap
// exponential backoff with full jitter
const exponentialDelay = baseDelay * Math.pow(2, retryCount - 1);
const cappedDelay = Math.min(exponentialDelay, maxDelay);
const jitteredDelay = Math.random() * cappedDelay;
return Math.floor(jitteredDelay);
}
axiosRetry(client, {
retries: 5,
retryDelay: customBackoff
});
Linear Backoff
For sites with predictable rate limits, linear backoff can be more appropriate:
function linearBackoff(retryCount) {
return retryCount * 2000; // 2s, 4s, 6s, 8s...
}
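Wiring it in looks the same as the other strategies; a minimal sketch assuming the client from the basic setup:

axiosRetry(client, {
  retries: 4,
  retryDelay: linearBackoff // waits 2s, 4s, 6s, 8s
});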
Respecting Retry-After Headers
Some servers tell you exactly how long to wait via the Retry-After header, sent either as a number of seconds (Retry-After: 120) or as an HTTP date:
function respectRetryAfter(retryCount, error) {
const retryAfter = error.response?.headers['retry-after'];
if (retryAfter) {
// retry-after can be seconds or an HTTP date
const seconds = parseInt(retryAfter, 10);
if (!isNaN(seconds)) {
return seconds * 1000;
}
// try parsing as date
const retryDate = new Date(retryAfter);
if (!isNaN(retryDate.getTime())) {
return Math.max(0, retryDate.getTime() - Date.now());
}
}
// fallback to exponential backoff
return axiosRetry.exponentialDelay(retryCount, error);
}
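Hand it to axios-retry as the retryDelay, exactly like the other strategies. A quick sanity check with a stubbed error object shows the header taking precedence over the exponential fallback:

// hypothetical 429 response carrying "Retry-After: 120" (seconds)
const stubError = { response: { headers: { 'retry-after': '120' } } };
console.log(respectRetryAfter(1, stubError)); // 120000 (ms)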
Advanced Retry Configuration for Scraping
Smart Retry Conditions
Not all errors should trigger retries. Retrying a 404 wastes time, while a 403 might succeed if you switch proxies (the switching itself is wired up in the next section):
const RETRY_STATUS_CODES = new Set([
408, // request timeout
429, // too many requests
500, // internal server error
502, // bad gateway
503, // service unavailable
504, // gateway timeout
522, // connection timed out (Cloudflare)
524, // a timeout occurred (Cloudflare)
]);
const SWITCH_PROXY_STATUS_CODES = new Set([
403, // forbidden (likely IP blocked)
407, // proxy authentication required
429, // rate limited (might be IP-specific)
]);
axiosRetry(client, {
retries: 5,
retryCondition: (error) => {
// always retry network errors
if (!error.response) return true;
const status = error.response.status;
return RETRY_STATUS_CODES.has(status) || SWITCH_PROXY_STATUS_CODES.has(status);
},
retryDelay: (retryCount, error) => {
if (error.response?.status === 429) {
// longer delay for rate limits
return respectRetryAfter(retryCount, error);
}
return axiosRetry.exponentialDelay(retryCount, error);
}
});
Retry with Proxy Rotation
The most powerful pattern for scraping is combining retries with proxy rotation: when a request fails because of an IP block, the retry automatically goes out through a different proxy:
const axios = require('axios');
const axiosRetry = require('axios-retry').default;
const { HttpsProxyAgent } = require('https-proxy-agent');
class ProxyRotator {
constructor(proxies) {
this.proxies = proxies;
this.currentIndex = 0;
this.failedProxies = new Set();
}
getNext() {
let attempts = 0;
while (attempts < this.proxies.length) {
const proxy = this.proxies[this.currentIndex % this.proxies.length];
this.currentIndex++;
if (!this.failedProxies.has(proxy)) {
return proxy;
}
attempts++;
}
// all proxies failed, reset and try again
this.failedProxies.clear();
return this.proxies[0];
}
markFailed(proxy) {
this.failedProxies.add(proxy);
}
}
const proxyRotator = new ProxyRotator([
'http://user:pass@proxy1.example.com:8080',
'http://user:pass@proxy2.example.com:8080',
'http://user:pass@proxy3.example.com:8080',
'http://user:pass@proxy4.example.com:8080',
'http://user:pass@proxy5.example.com:8080',
]);
const client = axios.create({ timeout: 30000 });
// intercept requests to add proxy
client.interceptors.request.use((config) => {
const proxyUrl = proxyRotator.getNext();
config.httpsAgent = new HttpsProxyAgent(proxyUrl);
config.metadata = { proxyUrl }; // store for error handling
return config;
});
axiosRetry(client, {
retries: 5,
retryCondition: (error) => {
if (error.response?.status === 403 || error.response?.status === 429) {
// mark the proxy as failed so the next retry uses a different one
const proxyUrl = error.config?.metadata?.proxyUrl;
if (proxyUrl) {
proxyRotator.markFailed(proxyUrl);
}
return true;
}
return axiosRetry.isNetworkOrIdempotentRequestError(error);
},
retryDelay: axiosRetry.exponentialDelay
});
Retry with Different User Agents
Some anti-bot systems block specific user agents, so rotating them on retry can help:
const userAgents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0',
];
client.interceptors.request.use((config) => {
const retryCount = config['axios-retry']?.retryCount || 0;
config.headers['User-Agent'] = userAgents[retryCount % userAgents.length];
return config;
});
Building a Production Scraper with Axios Retry
Here is a complete production-ready scraper that combines the patterns above. In addition to axios and axios-retry, it uses cheerio, https-proxy-agent, and p-limit:
const axios = require('axios');
const axiosRetry = require('axios-retry').default;
const cheerio = require('cheerio');
const { HttpsProxyAgent } = require('https-proxy-agent');
const pLimit = require('p-limit'); // note: p-limit v4+ is ESM-only, so require() needs v3
class ProductionScraper {
constructor(options = {}) {
this.concurrency = options.concurrency || 5;
this.delayBetweenRequests = options.delayMs || 1000;
this.retryAttempts = options.retryAttempts || 4; // read by _setupRetry()
this.results = [];
this.errors = [];
// proxy setup
this.proxyGateway = options.proxyGateway || null;
// create axios client
this.client = axios.create({
timeout: options.timeout || 30000,
maxRedirects: 5,
validateStatus: (status) => status < 400, // reject 4xx/5xx so axios-retry sees them as errors
});
this._setupRetry();
this._setupInterceptors();
}
_setupRetry() {
axiosRetry(this.client, {
retries: this.retryAttempts,
retryDelay: (retryCount, error) => {
// respect Retry-After header
const retryAfter = error.response?.headers?.['retry-after'];
if (retryAfter) {
const seconds = parseInt(retryAfter, 10);
if (!isNaN(seconds)) return seconds * 1000;
}
// exponential backoff with jitter
const baseDelay = 1000;
const maxDelay = 30000;
const delay = Math.min(baseDelay * Math.pow(2, retryCount - 1), maxDelay);
const jitter = delay * 0.5 * Math.random();
return Math.floor(delay + jitter);
},
retryCondition: (error) => {
if (!error.response) return true; // network error
const status = error.response.status;
return [403, 408, 429, 500, 502, 503, 504, 522, 524].includes(status);
},
onRetry: (retryCount, error, requestConfig) => {
console.log(`retry #${retryCount} for ${requestConfig.url}: ${error.message}`);
}
});
}
_setupInterceptors() {
// request interceptor: add proxy and headers
this.client.interceptors.request.use((config) => {
if (this.proxyGateway) {
config.httpsAgent = new HttpsProxyAgent(this.proxyGateway);
}
config.headers = {
...config.headers,
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Connection': 'keep-alive',
};
return config;
});
// response interceptor: log successful requests
this.client.interceptors.response.use(
(response) => {
console.log(`success: ${response.config.url} [${response.status}]`);
return response;
},
(error) => {
return Promise.reject(error);
}
);
}
async scrapeUrl(url, parser) {
try {
const response = await this.client.get(url);
const $ = cheerio.load(response.data);
const data = parser($, url);
this.results.push(data);
return data;
} catch (error) {
const errorInfo = {
url,
status: error.response?.status || 'network_error',
message: error.message,
retries: error.config?.['axios-retry']?.retryCount || 0
};
this.errors.push(errorInfo);
return null;
}
}
async scrapeUrls(urls, parser) {
const limit = pLimit(this.concurrency);
const tasks = urls.map((url, index) => {
return limit(async () => {
// delay between requests
if (index > 0) {
await this._delay(this.delayBetweenRequests);
}
return this.scrapeUrl(url, parser);
});
});
await Promise.all(tasks);
return {
results: this.results,
errors: this.errors,
successRate: `${this.results.length}/${urls.length}`
};
}
_delay(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// usage example
async function main() {
const scraper = new ProductionScraper({
concurrency: 3,
delayMs: 2000,
timeout: 30000,
proxyGateway: 'http://user:pass@gateway.proxyservice.com:7777'
});
const urls = [
'https://example.com/products/page/1',
'https://example.com/products/page/2',
'https://example.com/products/page/3',
];
const parser = ($, url) => {
const products = [];
$('.product-item').each((i, el) => {
products.push({
name: $(el).find('.product-name').text().trim(),
price: $(el).find('.product-price').text().trim(),
rating: $(el).find('.rating').attr('data-score'),
source_url: url
});
});
return { url, products, scraped_at: new Date().toISOString() };
};
const results = await scraper.scrapeUrls(urls, parser);
console.log(`scraping complete: ${results.successRate}`);
console.log(`errors: ${JSON.stringify(results.errors, null, 2)}`);
}
main().catch(console.error);
Common Pitfalls and How to Avoid Them
Pitfall 1: Retrying Non-Idempotent Requests
By default, axios-retry only retries requests it treats as idempotent (GET, HEAD, OPTIONS, PUT, DELETE). If you are scraping with POST requests (some APIs require this), you need to allow it explicitly:
axiosRetry(client, {
retries: 3,
retryCondition: (error) => {
// this retries ALL request methods, including POST
return error.response?.status >= 500 || !error.response;
}
});
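If only one endpoint needs POST retries, axios-retry also supports per-request overrides, which leaves the safer global default untouched. A sketch with a hypothetical search endpoint; keep in mind that retrying POST can duplicate side effects, so only do this when the request is safe to repeat:

// inside an async function; the /search endpoint is hypothetical
const response = await client.post(
  'https://example.com/search',
  { query: 'widgets' },
  {
    'axios-retry': {
      retries: 2,
      // retry this POST only on network errors and 5xx responses
      retryCondition: (error) => !error.response || error.response.status >= 500
    }
  }
);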
Pitfall 2: Infinite Retry Loops
Always set a maximum retry count and cap the delay:
axiosRetry(client, {
retries: 5, // never retry more than 5 times
retryDelay: (retryCount) => {
return Math.min(1000 * Math.pow(2, retryCount), 60000); // cap at 60 seconds
}
});
Pitfall 3: Not Handling CAPTCHA Responses
A 200 response does not always mean success; some sites return a 200 with a CAPTCHA page. Register this interceptor before calling axiosRetry, so that axios-retry's error handler (added later in the interceptor chain) can catch the thrown error and trigger a retry:
client.interceptors.response.use((response) => {
const html = typeof response.data === 'string' ? response.data : '';
if (html.includes('captcha') || html.includes('challenge-platform')) {
// treat CAPTCHA pages as errors to trigger retry
const error = new Error('CAPTCHA detected');
error.config = response.config;
error.response = response;
error.response.status = 403; // fake status to trigger retry
throw error;
}
return response;
});
Pitfall 4: Memory Leaks with Large Response Bodies
When scraping thousands of pages, response bodies can consume significant memory:
const client = axios.create({
timeout: 30000,
maxContentLength: 10 * 1024 * 1024, // 10MB max response size
decompress: true,
});
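If you occasionally need large files rather than wanting to reject them, streaming the body to disk keeps memory flat. A sketch using axios's stream response type (the client is the one from above; the file path is a placeholder):

const fs = require('fs');
const { pipeline } = require('stream/promises');

async function downloadToFile(url, filePath) {
  // responseType 'stream' yields a readable stream instead of a buffered body
  const response = await client.get(url, { responseType: 'stream' });
  await pipeline(response.data, fs.createWriteStream(filePath));
}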
Monitoring Retry Performance
Tracking retry metrics helps you optimize your scraping configuration:
class RetryMetrics {
constructor() {
this.totalRequests = 0;
this.totalRetries = 0;
this.retriesByStatus = {};
this.averageRetryDelay = 0;
this.totalDelayMs = 0;
}
recordRetry(status, delayMs) {
this.totalRetries++;
this.totalDelayMs += delayMs;
this.retriesByStatus[status] = (this.retriesByStatus[status] || 0) + 1;
this.averageRetryDelay = this.totalDelayMs / this.totalRetries;
}
recordRequest() {
this.totalRequests++;
}
getReport() {
return {
totalRequests: this.totalRequests,
totalRetries: this.totalRetries,
retryRate: this.totalRequests ? `${((this.totalRetries / this.totalRequests) * 100).toFixed(1)}%` : 'n/a',
retriesByStatus: this.retriesByStatus,
averageRetryDelayMs: Math.round(this.averageRetryDelay),
timeSpentRetrying: `${(this.totalDelayMs / 1000).toFixed(1)}s`
};
}
}
const metrics = new RetryMetrics();
// integrate with axios-retry
axiosRetry(client, {
retries: 5,
retryDelay: (retryCount, error) => {
const delay = axiosRetry.exponentialDelay(retryCount, error);
metrics.recordRetry(error.response?.status || 'network', delay);
return delay;
}
});
client.interceptors.request.use((config) => {
metrics.recordRequest();
return config;
});
// print metrics after scraping
process.on('exit', () => {
console.log('retry metrics:', JSON.stringify(metrics.getReport(), null, 2));
});
Axios Retry vs. Alternatives
| Feature | axios-retry | got (built-in) | undici retry | Custom wrapper |
|---|---|---|---|---|
| Exponential backoff | yes | yes | yes | manual |
| Retry-After header | manual | yes | no | manual |
| Per-request config | yes | yes | limited | yes |
| Interceptor support | via axios | hooks | no | custom |
| Proxy rotation on retry | via interceptors | via hooks | manual | manual |
| Community size | large | large | growing | n/a |
axios-retry wins for scraping because of Axios’s interceptor system, which lets you build modular middleware for proxy rotation, user-agent rotation, and request/response logging.
Conclusion
Proper retry handling transforms Axios from a simple HTTP client into a reliable scraping engine. The key principles are:
- Always use exponential backoff with jitter to avoid thundering herd problems
- Classify errors intelligently so you only retry requests that have a chance of succeeding
- Rotate proxies on IP-related failures (403, 429) to get a fresh IP on retry
- Respect Retry-After headers when servers provide them
- Monitor retry metrics to identify patterns and optimize your configuration
Start with the basic setup and add complexity as your scraping needs grow. The production scraper class in this guide provides a solid foundation that you can extend for any scraping project.