Axios Retry for Web Scraping in Node.js: the Complete Guide

When you are scraping websites at any meaningful scale, requests will fail. Servers return 429 rate-limit errors, proxies time out, connections drop, and CAPTCHAs appear unexpectedly. The difference between a hobby scraper and a production-grade one is how it handles these failures.

Axios is one of the most popular HTTP clients in the Node.js ecosystem, and with the right retry configuration it becomes a reliable foundation for web scraping. This guide covers everything from basic retry setup to advanced patterns with proxy rotation, exponential backoff, and intelligent error classification.

Why Axios for Web Scraping

Axios has several advantages over other Node.js HTTP clients for scraping:

  • Interceptors let you modify requests and responses globally, which is perfect for adding headers, rotating proxies, and logging
  • Automatic JSON parsing simplifies API scraping
  • Built-in timeout support prevents hung connections from blocking your scraper
  • Proxy support through the proxy config option or HTTP agents
  • A wide ecosystem, including axios-retry, axios-rate-limit, and other middleware

The main alternatives are got and undici, but Axios’s interceptor system makes it particularly well suited to the middleware patterns common in scraping.

Basic Axios Retry Setup

Installation

npm install axios axios-retry

Minimal Configuration

const axios = require('axios');
const axiosRetry = require('axios-retry').default;

const client = axios.create({
  timeout: 30000, // 30 second timeout
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  }
});

axiosRetry(client, {
  retries: 3,
  retryDelay: axiosRetry.exponentialDelay,
  retryCondition: (error) => {
    return axiosRetry.isNetworkOrIdempotentRequestError(error) ||
           error.response?.status === 429 ||
           error.response?.status >= 500;
  }
});

async function scrape(url) {
  try {
    const response = await client.get(url);
    return response.data;
  } catch (error) {
    console.error(`failed to scrape ${url}: ${error.message}`);
    return null;
  }
}

This basic setup retries requests up to 3 times with exponential backoff when it encounters network errors, 429 rate limits, or server errors (5xx).

Understanding Retry Strategies

Exponential Backoff

Exponential backoff doubles the delay after each failed attempt. This is the most common strategy for scraping because it gives overwhelmed servers time to recover.

// axios-retry's built-in exponential delay uses a 100ms delay factor,
// doubling each retry, plus up to 20% random jitter:
// retry 1: ~200ms
// retry 2: ~400ms
// retry 3: ~800ms
// retry 4: ~1600ms

axiosRetry(client, {
  retries: 5,
  retryDelay: (retryCount, error) => {
    const delay = axiosRetry.exponentialDelay(retryCount, error);
    console.log(`retry ${retryCount}: waiting ${delay}ms before next attempt`);
    return delay;
  }
});
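To see the schedule without wiring up a live client, the built-in formula can be reproduced in plain JavaScript. This is a sketch assuming axios-retry's current implementation (100ms delay factor, doubling per retry, up to 20% random jitter); check the version you have installed before relying on exact numbers.

```javascript
// Sketch of axios-retry's exponentialDelay formula (assumed: 100ms delay
// factor, doubling per retry, plus up to 20% random jitter).
function exponentialDelay(retryCount, delayFactor = 100) {
  const delay = Math.pow(2, retryCount) * delayFactor;
  const jitter = delay * 0.2 * Math.random();
  return delay + jitter;
}

for (let n = 1; n <= 4; n++) {
  console.log(`retry ${n}: ~${Math.round(exponentialDelay(n))}ms`);
}
```

Because the jitter is a fraction of the base delay, the gaps stay proportional: each retry waits roughly twice as long as the last, never exactly the same amount twice.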

Custom Backoff with Jitter

Adding randomness (jitter) prevents multiple scrapers from retrying at the exact same time, which would cause a thundering-herd problem:

function customBackoff(retryCount, error) {
  const baseDelay = 1000; // 1 second
  const maxDelay = 30000; // 30 seconds cap

  // exponential backoff with full jitter
  const exponentialDelay = baseDelay * Math.pow(2, retryCount - 1);
  const cappedDelay = Math.min(exponentialDelay, maxDelay);
  const jitteredDelay = Math.random() * cappedDelay;

  return Math.floor(jitteredDelay);
}

axiosRetry(client, {
  retries: 5,
  retryDelay: customBackoff
});

Linear Backoff

For sites with predictable rate limits, linear backoff can be more appropriate:

function linearBackoff(retryCount) {
  return retryCount * 2000; // 2s, 4s, 6s, 8s...
}
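Side by side, the two schedules look like this (a standalone comparison; both functions are redefined here so the snippet runs on its own, with a 1-second base assumed for the exponential case):

```javascript
// Compare linear and exponential schedules for the first four retries.
function linearBackoff(retryCount) {
  return retryCount * 2000; // 2s, 4s, 6s, 8s
}

function exponentialBackoff(retryCount, base = 1000) {
  return base * Math.pow(2, retryCount - 1); // 1s, 2s, 4s, 8s
}

for (let n = 1; n <= 4; n++) {
  console.log(`retry ${n}: linear ${linearBackoff(n)}ms, exponential ${exponentialBackoff(n)}ms`);
}
// Over four retries, linear waits 20s total versus 15s for exponential:
// it backs off harder early on, which suits steady per-IP throttling.
```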

Respecting Retry-After Headers

Some servers tell you exactly how long to wait via the Retry-After header:

function respectRetryAfter(retryCount, error) {
  const retryAfter = error.response?.headers['retry-after'];

  if (retryAfter) {
    // retry-after can be seconds or an HTTP date
    const seconds = parseInt(retryAfter, 10);
    if (!isNaN(seconds)) {
      return seconds * 1000;
    }

    // try parsing as date
    const retryDate = new Date(retryAfter);
    if (!isNaN(retryDate.getTime())) {
      return Math.max(0, retryDate.getTime() - Date.now());
    }
  }

  // fallback to exponential backoff
  return axiosRetry.exponentialDelay(retryCount, error);
}

Advanced Retry Configuration for Scraping

Smart Retry Conditions

Not all errors should trigger retries. Retrying a 404 wastes time, while retrying a 403 might succeed if you switch proxies:

const RETRY_STATUS_CODES = new Set([
  408, // request timeout
  429, // too many requests
  500, // internal server error
  502, // bad gateway
  503, // service unavailable
  504, // gateway timeout
  522, // connection timed out (Cloudflare)
  524, // a timeout occurred (Cloudflare)
]);

const SWITCH_PROXY_STATUS_CODES = new Set([
  403, // forbidden (likely IP blocked)
  407, // proxy authentication required
  429, // rate limited (might be IP-specific)
]);

axiosRetry(client, {
  retries: 5,
  retryCondition: (error) => {
    // always retry network errors
    if (!error.response) return true;

    const status = error.response.status;
    return RETRY_STATUS_CODES.has(status) || SWITCH_PROXY_STATUS_CODES.has(status);
  },
  retryDelay: (retryCount, error) => {
    if (error.response?.status === 429) {
      // longer delay for rate limits
      return respectRetryAfter(retryCount, error);
    }
    return axiosRetry.exponentialDelay(retryCount, error);
  }
});
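The same classification can be condensed into a small helper, which is handy when the retry decision and the proxy-switch decision live in different places. A sketch, with the status sets redefined so the snippet runs standalone:

```javascript
// Classify a failed request into an action, mirroring the status sets above.
const RETRY_STATUS = new Set([408, 429, 500, 502, 503, 504, 522, 524]);
const SWITCH_PROXY_STATUS = new Set([403, 407, 429]);

function classifyFailure(status) {
  if (status == null) return 'retry';                         // network error: no response at all
  if (SWITCH_PROXY_STATUS.has(status)) return 'switch-proxy'; // retry through a new IP
  if (RETRY_STATUS.has(status)) return 'retry';               // transient server-side problem
  return 'fail';                                              // e.g. 404: retrying wastes time
}

console.log(classifyFailure(429)); // switch-proxy
console.log(classifyFailure(503)); // retry
console.log(classifyFailure(404)); // fail
```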

Retry with Proxy Rotation

The most powerful pattern for scraping is combining retries with proxy rotation. When a request fails due to an IP block, the retry automatically uses a different proxy:

const axios = require('axios');
const axiosRetry = require('axios-retry').default;
const { HttpsProxyAgent } = require('https-proxy-agent');

class ProxyRotator {
  constructor(proxies) {
    this.proxies = proxies;
    this.currentIndex = 0;
    this.failedProxies = new Set();
  }

  getNext() {
    let attempts = 0;
    while (attempts < this.proxies.length) {
      const proxy = this.proxies[this.currentIndex % this.proxies.length];
      this.currentIndex++;

      if (!this.failedProxies.has(proxy)) {
        return proxy;
      }
      attempts++;
    }

    // all proxies failed, reset and try again
    this.failedProxies.clear();
    return this.proxies[0];
  }

  markFailed(proxy) {
    this.failedProxies.add(proxy);
  }
}

const proxyRotator = new ProxyRotator([
  'http://user:pass@proxy1.example.com:8080',
  'http://user:pass@proxy2.example.com:8080',
  'http://user:pass@proxy3.example.com:8080',
  'http://user:pass@proxy4.example.com:8080',
  'http://user:pass@proxy5.example.com:8080',
]);

const client = axios.create({
  timeout: 30000,
  proxy: false, // disable axios's own proxy handling; the HttpsProxyAgent set below takes over
});

// intercept requests to add proxy
client.interceptors.request.use((config) => {
  const proxyUrl = proxyRotator.getNext();
  config.httpsAgent = new HttpsProxyAgent(proxyUrl);
  config.metadata = { proxyUrl }; // store for error handling
  return config;
});

axiosRetry(client, {
  retries: 5,
  retryCondition: (error) => {
    if (error.response?.status === 403 || error.response?.status === 429) {
      // mark the proxy as failed so the next retry uses a different one
      const proxyUrl = error.config?.metadata?.proxyUrl;
      if (proxyUrl) {
        proxyRotator.markFailed(proxyUrl);
      }
      return true;
    }
    return axiosRetry.isNetworkOrIdempotentRequestError(error);
  },
  retryDelay: axiosRetry.exponentialDelay
});
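The rotation logic is easy to verify in isolation. A condensed stand-in for the ProxyRotator above demonstrates the skip-on-failure behavior (hypothetical proxy names, no network involved):

```javascript
// Condensed ProxyRotator: round-robin that skips proxies marked as failed.
class MiniRotator {
  constructor(proxies) {
    this.proxies = proxies;
    this.index = 0;
    this.failed = new Set();
  }
  getNext() {
    for (let attempts = 0; attempts < this.proxies.length; attempts++) {
      const proxy = this.proxies[this.index++ % this.proxies.length];
      if (!this.failed.has(proxy)) return proxy;
    }
    this.failed.clear(); // every proxy failed: reset and start over
    return this.proxies[0];
  }
  markFailed(proxy) {
    this.failed.add(proxy);
  }
}

const rotator = new MiniRotator(['proxy-a', 'proxy-b', 'proxy-c']);
console.log(rotator.getNext()); // proxy-a
rotator.markFailed('proxy-b');
console.log(rotator.getNext()); // proxy-c (proxy-b is skipped)
```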

Retry with Different User Agents

Some anti-bot systems block specific user agents. Rotating them on retry can help:

const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/120.0.0.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 Chrome/120.0.0.0',
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/120.0.0.0',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:121.0) Gecko/20100101 Firefox/121.0',
];

client.interceptors.request.use((config) => {
  const retryCount = config['axios-retry']?.retryCount || 0;
  config.headers['User-Agent'] = userAgents[retryCount % userAgents.length];
  return config;
});
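The selection rule is plain modular indexing, so each retry walks the list and wraps around (placeholder strings stand in for the full user-agent list above):

```javascript
// Pick a user agent by retry count, wrapping around the list.
const userAgents = ['ua-chrome', 'ua-firefox', 'ua-safari']; // placeholders
const pickUserAgent = (retryCount) => userAgents[retryCount % userAgents.length];

console.log(pickUserAgent(0)); // ua-chrome (first attempt)
console.log(pickUserAgent(3)); // ua-chrome (wrapped around)
console.log(pickUserAgent(4)); // ua-firefox
```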

Building a Production Scraper with Axios Retry

Here is a complete production-ready scraper that combines the patterns above:

const axios = require('axios');
const axiosRetry = require('axios-retry').default;
const cheerio = require('cheerio');
const { HttpsProxyAgent } = require('https-proxy-agent');
const pLimit = require('p-limit');

class ProductionScraper {
  constructor(options = {}) {
    this.concurrency = options.concurrency || 5;
    this.delayBetweenRequests = options.delayMs || 1000;
    this.retryAttempts = options.retryAttempts || 4; // read later by _setupRetry()
    this.results = [];
    this.errors = [];

    // proxy setup
    this.proxyGateway = options.proxyGateway || null;

    // create axios client
    this.client = axios.create({
      timeout: options.timeout || 30000,
      maxRedirects: 5,
      validateStatus: (status) => status < 400,
    });

    this._setupRetry();
    this._setupInterceptors();
  }

  _setupRetry() {
    axiosRetry(this.client, {
      retries: this.retryAttempts || 4,
      retryDelay: (retryCount, error) => {
        // respect Retry-After header
        const retryAfter = error.response?.headers?.['retry-after'];
        if (retryAfter) {
          const seconds = parseInt(retryAfter, 10);
          if (!isNaN(seconds)) return seconds * 1000;
        }

        // exponential backoff with jitter
        const baseDelay = 1000;
        const maxDelay = 30000;
        const delay = Math.min(baseDelay * Math.pow(2, retryCount - 1), maxDelay);
        const jitter = delay * 0.5 * Math.random();
        return Math.floor(delay + jitter);
      },
      retryCondition: (error) => {
        if (!error.response) return true; // network error
        const status = error.response.status;
        return [403, 408, 429, 500, 502, 503, 504, 522, 524].includes(status);
      },
      onRetry: (retryCount, error, requestConfig) => {
        console.log(`retry #${retryCount} for ${requestConfig.url}: ${error.message}`);
      }
    });
  }

  _setupInterceptors() {
    // request interceptor: add proxy and headers
    this.client.interceptors.request.use((config) => {
      if (this.proxyGateway) {
        config.httpsAgent = new HttpsProxyAgent(this.proxyGateway);
      }

      config.headers = {
        ...config.headers,
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
      };

      return config;
    });

    // response interceptor: log successful requests
    this.client.interceptors.response.use(
      (response) => {
        console.log(`success: ${response.config.url} [${response.status}]`);
        return response;
      },
      (error) => {
        return Promise.reject(error);
      }
    );
  }

  async scrapeUrl(url, parser) {
    try {
      const response = await this.client.get(url);
      const $ = cheerio.load(response.data);
      const data = parser($, url);
      this.results.push(data);
      return data;
    } catch (error) {
      const errorInfo = {
        url,
        status: error.response?.status || 'network_error',
        message: error.message,
        retries: error.config?.['axios-retry']?.retryCount || 0
      };
      this.errors.push(errorInfo);
      return null;
    }
  }

  async scrapeUrls(urls, parser) {
    const limit = pLimit(this.concurrency);

    const tasks = urls.map((url, index) => {
      return limit(async () => {
        // delay between requests
        if (index > 0) {
          await this._delay(this.delayBetweenRequests);
        }
        return this.scrapeUrl(url, parser);
      });
    });

    await Promise.all(tasks);

    return {
      results: this.results,
      errors: this.errors,
      successRate: `${this.results.length}/${urls.length}`
    };
  }

  _delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// usage example
async function main() {
  const scraper = new ProductionScraper({
    concurrency: 3,
    delayMs: 2000,
    timeout: 30000,
    proxyGateway: 'http://user:pass@gateway.proxyservice.com:7777'
  });

  const urls = [
    'https://example.com/products/page/1',
    'https://example.com/products/page/2',
    'https://example.com/products/page/3',
  ];

  const parser = ($, url) => {
    const products = [];
    $('.product-item').each((i, el) => {
      products.push({
        name: $(el).find('.product-name').text().trim(),
        price: $(el).find('.product-price').text().trim(),
        rating: $(el).find('.rating').attr('data-score'),
        source_url: url
      });
    });
    return { url, products, scraped_at: new Date().toISOString() };
  };

  const results = await scraper.scrapeUrls(urls, parser);
  console.log(`scraping complete: ${results.successRate}`);
  console.log(`errors: ${JSON.stringify(results.errors, null, 2)}`);
}

main().catch(console.error);

Common Pitfalls and How to Avoid Them

Pitfall 1: Retrying Non-Idempotent Requests

By default, axios-retry only retries idempotent requests (GET, HEAD, OPTIONS, PUT, and DELETE). If you are scraping with POST requests (some APIs require this), you need to allow it explicitly with a custom retryCondition, and keep in mind that a retried POST can repeat server-side effects:

axiosRetry(client, {
  retries: 3,
  retryCondition: (error) => {
    // this retries ALL request methods, including POST
    return error.response?.status >= 500 || !error.response;
  }
});

Pitfall 2: Infinite Retry Loops

Always set a maximum retry count and cap the maximum delay:

axiosRetry(client, {
  retries: 5, // never retry more than 5 times
  retryDelay: (retryCount) => {
    return Math.min(1000 * Math.pow(2, retryCount), 60000); // cap at 60 seconds
  }
});

Pitfall 3: Not Handling CAPTCHA Responses

A 200 response does not always mean success. Some sites return a 200 with a CAPTCHA page:

client.interceptors.response.use((response) => {
  const html = typeof response.data === 'string' ? response.data : '';

  if (html.includes('captcha') || html.includes('challenge-platform')) {
    // treat CAPTCHA pages as errors to trigger retry
    const error = new Error('CAPTCHA detected');
    error.config = response.config;
    error.response = response;
    error.response.status = 403; // fake status to trigger retry
    throw error;
  }

  return response;
});
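The detection check itself is worth testing separately, since marker strings tend to accumulate over time. A sketch; the marker list is illustrative and site-specific:

```javascript
// Standalone CAPTCHA-page predicate matching the interceptor check above.
const CAPTCHA_MARKERS = ['captcha', 'challenge-platform'];

function looksLikeCaptcha(html) {
  const lower = String(html).toLowerCase();
  return CAPTCHA_MARKERS.some((marker) => lower.includes(marker));
}

console.log(looksLikeCaptcha('<div class="g-recaptcha"></div>'));      // true
console.log(looksLikeCaptcha('<html><body>product list</body></html>')); // false
```

Lowercasing the body first keeps the check case-insensitive, so variants like "Challenge-Platform" still match.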

Pitfall 4: Memory Leaks with Large Response Bodies

When scraping thousands of pages, response bodies can consume significant memory:

const client = axios.create({
  timeout: 30000,
  maxContentLength: 10 * 1024 * 1024, // 10MB max response size
  decompress: true,
});

Monitoring Retry Performance

Tracking retry metrics helps you optimize your scraping configuration:

class RetryMetrics {
  constructor() {
    this.totalRequests = 0;
    this.totalRetries = 0;
    this.retriesByStatus = {};
    this.averageRetryDelay = 0;
    this.totalDelayMs = 0;
  }

  recordRetry(status, delayMs) {
    this.totalRetries++;
    this.totalDelayMs += delayMs;
    this.retriesByStatus[status] = (this.retriesByStatus[status] || 0) + 1;
    this.averageRetryDelay = this.totalDelayMs / this.totalRetries;
  }

  recordRequest() {
    this.totalRequests++;
  }

  getReport() {
    return {
      totalRequests: this.totalRequests,
      totalRetries: this.totalRetries,
      retryRate: this.totalRequests > 0 ? `${((this.totalRetries / this.totalRequests) * 100).toFixed(1)}%` : 'n/a',
      retriesByStatus: this.retriesByStatus,
      averageRetryDelayMs: Math.round(this.averageRetryDelay),
      timeSpentRetrying: `${(this.totalDelayMs / 1000).toFixed(1)}s`
    };
  }
}

const metrics = new RetryMetrics();

// integrate with axios-retry
axiosRetry(client, {
  retries: 5,
  retryDelay: (retryCount, error) => {
    const delay = axiosRetry.exponentialDelay(retryCount, error);
    metrics.recordRetry(error.response?.status || 'network', delay);
    return delay;
  }
});

client.interceptors.request.use((config) => {
  metrics.recordRequest();
  return config;
});

// print metrics after scraping
process.on('exit', () => {
  console.log('retry metrics:', JSON.stringify(metrics.getReport(), null, 2));
});

Axios Retry vs. Alternatives

feature                   axios-retry        got (built-in)   undici retry   custom wrapper
exponential backoff       yes                yes              yes            manual
retry-after header        manual             yes              no             manual
per-request config        yes                yes              limited        yes
interceptor support       via axios          hooks            no             custom
proxy rotation on retry   via interceptors   via hooks        manual         manual
community size            large              large            growing        n/a

For scraping, axios-retry wins because of Axios’s interceptor system, which lets you build modular middleware for proxy rotation, user-agent rotation, and request/response logging.

Conclusion

Proper retry handling transforms Axios from a simple HTTP client into a reliable scraping engine. The key principles are:

  1. Always use exponential backoff with jitter to avoid thundering-herd problems
  2. Classify errors intelligently so you only retry requests that have a chance of succeeding
  3. Rotate proxies on IP-related failures (403, 429) to get a fresh IP on the retry
  4. Respect Retry-After headers when servers provide them
  5. Monitor retry metrics to identify patterns and optimize your configuration

Start with the basic setup and add complexity as your scraping needs grow. The production scraper class in this guide provides a solid foundation that you can extend for any scraping project.
