How to Scrape ZipRecruiter Job Data in 2026

ZipRecruiter is one of the leading job marketplaces in the United States, with over 110 million job seekers and 9 million active job postings. Its AI-powered matching technology and extensive employer network make it a valuable data source for labor market analysts, recruitment agencies, salary benchmarking services, and HR tech companies.

This guide covers how to scrape ZipRecruiter job data using Python, handle their anti-bot protections, and use proxies for reliable extraction.

What Data Can You Extract?

ZipRecruiter job listings include:

  • Job titles and descriptions
  • Company names and profiles
  • Salary estimates (ZipRecruiter provides salary estimates even when employers don’t)
  • Location data (city, state, remote options)
  • Employment type (full-time, part-time, contract)
  • Posted date and urgency indicators
  • Required qualifications and skills
  • Benefits information
  • Application count (“X people clicked apply”)

Example JSON Output

{
  "job_id": "zr_12345",
  "title": "Full Stack Developer",
  "company": "StartupXYZ",
  "location": "Austin, TX",
  "salary": "$120,000 - $160,000/year",
  "employment_type": "Full-time",
  "posted": "3 days ago",
  "description": "We are looking for an experienced Full Stack Developer...",
  "skills": ["React", "Node.js", "PostgreSQL", "AWS"],
  "benefits": ["401(k)", "Health Insurance", "Remote Work"],
  "url": "https://www.ziprecruiter.com/c/StartupXYZ/Job/Full-Stack-Developer/-in-Austin,TX"
}

Prerequisites

pip install requests beautifulsoup4 lxml fake-useragent playwright
playwright install chromium

Residential proxies are recommended for ZipRecruiter scraping.

Method 1: Scraping with Requests and BeautifulSoup

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import json
import time
import random

class ZipRecruiterScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml",
            "Accept-Language": "en-US,en;q=0.9",
            "Referer": "https://www.ziprecruiter.com/",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_jobs(self, search_term, location="", max_pages=5):
        """Search ZipRecruiter job listings."""
        all_jobs = []

        for page in range(1, max_pages + 1):
            # URL-encode the query so multi-word searches and locations work
            url = (f"https://www.ziprecruiter.com/jobs-search"
                   f"?search={requests.utils.quote(search_term)}"
                   f"&location={requests.utils.quote(location)}&page={page}")

            try:
                response = self.session.get(
                    url, headers=self._get_headers(),
                    proxies=self._get_proxies(), timeout=30
                )
                response.raise_for_status()
                soup = BeautifulSoup(response.text, "lxml")

                # Track how many jobs this page contributes via JSON-LD
                jobs_before = len(all_jobs)

                # Extract JSON-LD structured data (a page may embed a single
                # JobPosting object or a list of them)
                scripts = soup.find_all("script", type="application/ld+json")
                for script in scripts:
                    try:
                        data = json.loads(script.string)
                        items = data if isinstance(data, list) else [data]
                        for item in items:
                            if item.get("@type") != "JobPosting":
                                continue
                            job_location = item.get("jobLocation")
                            locality = None
                            if isinstance(job_location, dict):
                                locality = job_location.get("address", {}).get("addressLocality")
                            job = {
                                "title": item.get("title"),
                                "company": item.get("hiringOrganization", {}).get("name"),
                                "location": locality,
                                "salary": str(item.get("baseSalary", "")),
                                "date_posted": item.get("datePosted"),
                                "employment_type": item.get("employmentType"),
                                "description": item.get("description", "")[:500],
                            }
                            all_jobs.append(job)
                    except (json.JSONDecodeError, AttributeError, TypeError):
                        continue

                # Fallback: parse HTML cards only when JSON-LD yielded nothing
                # for this page (avoids duplicating listings on later pages)
                if len(all_jobs) == jobs_before:
                    cards = soup.select("article.job_result, div[class*='job-listing']")
                    for card in cards:
                        title = card.select_one("h2 a, [class*='job-title'] a")
                        company = card.select_one("[class*='company'], [class*='hiring-company']")
                        location_elem = card.select_one("[class*='location']")
                        salary = card.select_one("[class*='salary']")

                        job = {
                            "title": title.get_text(strip=True) if title else None,
                            "company": company.get_text(strip=True) if company else None,
                            "location": location_elem.get_text(strip=True) if location_elem else None,
                            "salary": salary.get_text(strip=True) if salary else None,
                            # hrefs on cards are often relative; resolve against the site root
                            "url": requests.compat.urljoin("https://www.ziprecruiter.com/", title["href"]) if title and title.get("href") else None,
                        }
                        if job.get("title"):
                            all_jobs.append(job)

                print(f"Page {page}: Total jobs: {len(all_jobs)}")
                time.sleep(random.uniform(3, 6))

            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue

        return all_jobs


# Usage
scraper = ZipRecruiterScraper(proxy_url="http://user:pass@proxy:port")
jobs = scraper.search_jobs("python developer", "New York, NY", max_pages=3)
print(f"Found {len(jobs)} jobs")
print(json.dumps(jobs[:3], indent=2))

Method 2: Scraping with Playwright

For JavaScript-rendered content and job detail pages, use Playwright:

import asyncio
from playwright.async_api import async_playwright
import json
import random

class ZipRecruiterPlaywrightScraper:
    def __init__(self, proxy=None):
        self.proxy = proxy

    async def scrape_job_details(self, job_url):
        """Scrape full job details from a listing page."""
        async with async_playwright() as p:
            browser_args = {"headless": True}
            if self.proxy:
                browser_args["proxy"] = {"server": self.proxy}

            browser = await p.chromium.launch(**browser_args)
            page = await browser.new_page()
            await page.goto(job_url, wait_until="networkidle", timeout=60000)
            await asyncio.sleep(3)

            data = await page.evaluate('''
                () => {
                    const result = {};
                    const title = document.querySelector("h1, [class*='job-title']");
                    result.title = title ? title.innerText.trim() : null;

                    const company = document.querySelector("[class*='company'], [class*='hiring']");
                    result.company = company ? company.innerText.trim() : null;

                    const salary = document.querySelector("[class*='salary']");
                    result.salary = salary ? salary.innerText.trim() : null;

                    const description = document.querySelector("[class*='description'], [class*='job-body']");
                    result.description = description ? description.innerText.trim() : null;

                    const benefits = [];
                    document.querySelectorAll("[class*='benefit']").forEach(el => {
                        benefits.push(el.innerText.trim());
                    });
                    result.benefits = benefits;

                    // JSON-LD
                    const scripts = document.querySelectorAll('script[type="application/ld+json"]');
                    for (const script of scripts) {
                        try {
                            const json = JSON.parse(script.textContent);
                            if (json["@type"] === "JobPosting") {
                                result.date_posted = json.datePosted;
                                result.valid_through = json.validThrough;
                                result.employment_type = json.employmentType;
                                result.structured_salary = json.baseSalary || null;
                            }
                        } catch {}
                    }

                    return result;
                }
            ''')

            await browser.close()
            return data

Handling ZipRecruiter Anti-Bot Protections

1. Cloudflare Protection

ZipRecruiter uses Cloudflare for bot mitigation. Use stealth browser configurations and avoid rapid request patterns. Playwright with stealth plugins reduces detection risk.
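Cloudflare keys on obvious automation signals. A minimal hardening sketch for the Playwright setup above (the Chromium flag and the `navigator.webdriver` patch are common mitigations, not a guaranteed bypass; the exact flags you need may vary):

```python
# Common Chromium hardening for Playwright launches. These reduce obvious
# automation signals; they do not guarantee a Cloudflare bypass.
STEALTH_ARGS = ["--disable-blink-features=AutomationControlled"]

# Inject via page.add_init_script() so it runs before any site script,
# hiding the navigator.webdriver automation flag.
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)

def build_launch_kwargs(proxy=None, headless=True):
    """Assemble keyword arguments for playwright's chromium.launch()."""
    kwargs = {"headless": headless, "args": list(STEALTH_ARGS)}
    if proxy:
        kwargs["proxy"] = {"server": proxy}
    return kwargs
```

Launch with `p.chromium.launch(**build_launch_kwargs("http://user:pass@proxy:port"))` and call `page.add_init_script(HIDE_WEBDRIVER_JS)` before navigating.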

2. Rate Limiting

ZipRecruiter will block IPs after sustained scraping. Implement 3-6 second delays between pages and rotate proxies every 20-30 requests.
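The pacing and rotation above can be wrapped in a small helper. A sketch (the 20-30 request window and 3-6 second jitter follow the guidance in this section; tune both to your proxy plan):

```python
import itertools
import random
import time

class ProxyRotator:
    """Cycle through a proxy pool, switching after a random 20-30 requests,
    with a jittered 3-6 second pause between calls."""

    def __init__(self, proxy_urls, min_reqs=20, max_reqs=30):
        self._pool = itertools.cycle(proxy_urls)
        self._min, self._max = min_reqs, max_reqs
        self._rotate()

    def _rotate(self):
        # Pick the next proxy and a fresh request budget for it
        self.current = next(self._pool)
        self._remaining = random.randint(self._min, self._max)

    def next_proxy(self, pause=True):
        """Return the proxy URL to use for the next request."""
        if self._remaining <= 0:
            self._rotate()
        self._remaining -= 1
        if pause:
            time.sleep(random.uniform(3, 6))
        return self.current
```

In the `search_jobs` loop you would call `rotator.next_proxy()` before each page and pass `{"http": url, "https": url}` to `requests.get`.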

3. Geographic Restrictions

ZipRecruiter serves different content based on IP location. Since it is a US-focused platform, always use US-based proxy IPs for complete and accurate results.

Data Storage and Salary Analysis

Store scraped salary data for compensation benchmarking and labor market analysis:

import sqlite3
import json

class ZipRecruiterDataStore:
    def __init__(self, db_path="ziprecruiter_jobs.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS jobs
            (job_id TEXT PRIMARY KEY, title TEXT, company TEXT,
             location TEXT, salary TEXT, employment_type TEXT,
             description TEXT, date_posted TEXT, url TEXT,
             scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')

    def store_job(self, job):
        self.conn.execute(
            """INSERT OR REPLACE INTO jobs
            (job_id, title, company, location, salary, employment_type, description, date_posted, url)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
            ((job.get("url") or "").split("/")[-1] or None,  # guard against a missing URL
             job.get("title"), job.get("company"),
             job.get("location"), job.get("salary"), job.get("employment_type"),
             job.get("description", "")[:1000], job.get("date_posted"), job.get("url"))
        )
        self.conn.commit()

    def salary_report(self, job_title_keyword):
        cursor = self.conn.execute(
            "SELECT title, company, location, salary FROM jobs WHERE title LIKE ? AND salary IS NOT NULL ORDER BY date_posted DESC",
            (f"%{job_title_keyword}%",)
        )
        return cursor.fetchall()

    def top_hiring_companies(self, limit=20):
        cursor = self.conn.execute(
            "SELECT company, COUNT(*) as job_count FROM jobs GROUP BY company ORDER BY job_count DESC LIMIT ?",
            (limit,)
        )
        return cursor.fetchall()
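The `salary` column above is raw text, so benchmarking queries usually need numeric bounds first. A minimal parsing sketch, assuming strings shaped like the example output earlier ("$120,000 - $160,000/year"); adjust the regex to whatever formats you actually scrape:

```python
import re

def parse_salary_range(salary_text):
    """Extract (min, max) annual dollars from strings like
    '$120,000 - $160,000/year'. Hourly figures are roughly annualized
    at 2,080 working hours. Returns (None, None) when nothing parses."""
    if not salary_text:
        return (None, None)
    numbers = [float(n.replace(",", ""))
               for n in re.findall(r"\$?(\d[\d,]*(?:\.\d+)?)", salary_text)]
    if "hour" in salary_text.lower():
        numbers = [n * 2080 for n in numbers]
    if not numbers:
        return (None, None)
    return (min(numbers), max(numbers))
```

This can feed a derived `salary_min`/`salary_max` pair when inserting rows, which makes the `salary_report` query sortable by compensation.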

Proxy Recommendations

Proxy Type       Success Rate   Best For
US Residential   80-90%         Job search scraping
ISP Proxies      75-85%         Consistent monitoring
Mobile           85-95%         Bypassing blocks
Datacenter       30-40%         Small-scale testing

US residential proxies work best since ZipRecruiter is US-focused. Mobile proxies from US carriers provide the highest success rates for bypassing Cloudflare protections.

Legal Considerations

  1. Terms of Service: ZipRecruiter’s ToS prohibits automated data collection without authorization.
  2. Salary Data: Salary estimates are proprietary ZipRecruiter data generated by their AI algorithms.
  3. Employer Data: Company information should be treated as business data, not personal data.
  4. Commercial Use: Consult legal counsel before using scraped data for commercial salary benchmarking or competitive products.

See our web scraping compliance guide for details.

Frequently Asked Questions

Does ZipRecruiter have a public API?

ZipRecruiter offers a Job Posting API for employers and a Job Search API for partners, but both require approval and have usage restrictions. Web scraping is the primary method for research data extraction.

How accurate are ZipRecruiter salary estimates?

ZipRecruiter generates salary estimates using their proprietary algorithm even when employers do not provide salary ranges. These estimates are generally within 10-15% of actual compensation but should be validated against other sources like Glassdoor, Indeed, or Bureau of Labor Statistics data.
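That 10-15% band is easy to check programmatically when you have a reference figure (e.g. a BLS median for the occupation). A trivial sketch:

```python
def estimate_deviation(zip_estimate, reference):
    """Relative deviation of a scraped estimate from a reference figure,
    as a fraction (0.10 means a 10% gap)."""
    return abs(zip_estimate - reference) / reference

def within_expected_band(zip_estimate, reference, band=0.15):
    """True when the estimate sits inside the expected band (default 15%)."""
    return estimate_deviation(zip_estimate, reference) <= band
```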

Can I scrape ZipRecruiter from outside the US?

ZipRecruiter is US-focused, so most job data requires US IP addresses. Use US residential proxies for accurate results. Non-US IPs may see limited or no results.

How often do ZipRecruiter listings update?

New jobs are posted continuously throughout the day. For market research, daily scraping captures most new listings. For real-time hiring intelligence or monitoring specific companies, scrape every few hours.

How does ZipRecruiter compare to other job boards for scraping?

ZipRecruiter’s JSON-LD structured data makes it one of the easier job boards to scrape. Indeed and LinkedIn have more aggressive anti-bot protections. ZipRecruiter’s unique value is its AI-generated salary estimates, which are not available on most other platforms.

Conclusion

ZipRecruiter’s JSON-LD structured data makes job listing extraction relatively straightforward compared to other job boards. Combine JSON-LD parsing with HTML fallbacks and Playwright for detail pages. Use US residential proxies for reliable data collection, and store results in a database for salary benchmarking and labor market analysis.

For more job market scraping guides, check our web scraping proxy guide and proxy provider comparisons.


last updated: April 3, 2026