How to Scrape Recruitee Pages for Lead Sourcing (2026)

Recruitee powers career pages for thousands of mid-market companies across Europe and North America, and its consistent URL structure makes it one of the more approachable ATS targets for lead sourcing at scale. If you’re building a list of companies actively hiring for a specific role, location, or tech stack, scraping Recruitee pages gives you a near-real-time hiring signal, while job boards like LinkedIn can lag by days.

Understanding Recruitee’s URL and API Structure

Every Recruitee career site follows the same pattern: https://{company}.recruitee.com/ for the public jobs page, and https://{company}.recruitee.com/api/offers/ for the JSON feed. That API endpoint is the main event — it returns structured job data without any rendering requirement.

A typical response from /api/offers/ looks like this:

```json
{
  "offers": [
    {
      "id": 182934,
      "title": "Senior Data Engineer",
      "department": "Engineering",
      "location": "Amsterdam, Netherlands",
      "remote": true,
      "created_at": "2026-04-12T08:00:00Z",
      "career_url": "https://acme.recruitee.com/o/senior-data-engineer"
    }
  ]
}
```

No authentication, no token rotation, just a clean GET. For most companies this endpoint returns 200 with Content-Type: application/json. It doesn’t paginate (all offers come back in one call), which keeps the scraper simple.
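Because the endpoint is a plain GET, the only logic worth isolating is building the URL and deciding whether a response is usable. A minimal sketch, assuming the URL pattern and response shape shown above; `offers_url` and `parse_offers` are hypothetical helper names, and fetching is left to whatever HTTP client you use so the parsing can be tested offline.

```python
import json


def offers_url(slug: str) -> str:
    """Build the public offers endpoint for a Recruitee subdomain slug."""
    return f"https://{slug.strip().lower()}.recruitee.com/api/offers/"


def parse_offers(content_type: str, body: str) -> list[dict]:
    """Return the offers array, or [] if the response is not usable JSON."""
    # A 200 that isn't JSON usually means the request landed on an HTML page.
    if "application/json" not in content_type.lower():
        return []
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    offers = payload.get("offers", []) if isinstance(payload, dict) else []
    return offers if isinstance(offers, list) else []


sample = '{"offers": [{"id": 182934, "title": "Senior Data Engineer"}]}'
print(offers_url("acme"))  # https://acme.recruitee.com/api/offers/
print(len(parse_offers("application/json; charset=utf-8", sample)))  # 1
print(parse_offers("text/html", "<html></html>"))  # []
```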

The harder part is getting the list of company subdomains to query. There’s no public Recruitee directory, so you need to seed your target list from a separate source: Apollo, Crunchbase, or a curated vertical list. This is conceptually similar to the approach covered in How to Scrape Yellow Pages Business Data, where you build a domain list first, then loop your scraper over it.
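Seed lists exported from Apollo, Crunchbase, or a hand-curated sheet rarely arrive as clean slugs. A small normalizer helps. This sketch assumes entries look like a bare slug, `acme.recruitee.com`, or a full `https://acme.recruitee.com/...` URL, and drops anything else (custom domains need the separate check described under edge cases). `to_slug` and `normalize_seeds` are hypothetical names, and the slug pattern is an assumption about valid subdomains.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse

# Assumed slug shape: lowercase letters, digits, and inner hyphens.
SLUG_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$")
SUFFIX = ".recruitee.com"


def to_slug(entry: str) -> str | None:
    """Reduce a seed entry to a Recruitee slug, or None if it doesn't fit."""
    text = entry.strip().lower()
    if not text:
        return None
    try:
        # urlparse only fills .hostname when a scheme is present.
        host = urlparse(text if "://" in text else f"https://{text}").hostname or ""
    except ValueError:
        return None
    if host.endswith(SUFFIX):
        text = host[: -len(SUFFIX)]
    elif "." in host or "/" in text:
        return None  # some other domain or path, not a bare slug
    return text if SLUG_RE.match(text) else None


def normalize_seeds(entries: list[str]) -> list[str]:
    """Deduplicate valid slugs while keeping the original order."""
    seen: dict[str, None] = {}
    for entry in entries:
        slug = to_slug(entry)
        if slug:
            seen.setdefault(slug, None)
    return list(seen)
```

Running `normalize_seeds` once before the scrape keeps duplicates and malformed rows from inflating the request count.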

Building the Scraper

Use httpx with async for throughput. Recruitee doesn’t aggressively rate-limit individual company subdomains, but if you’re hitting 1,000+ subdomains in one run, you’ll want concurrency caps and retry logic.

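```python
import asyncio

import httpx

MAX_CONCURRENCY = 25  # 20-30 concurrent requests is a reasonable cap


async def fetch_offers(
    client: httpx.AsyncClient, sem: asyncio.Semaphore, slug: str
) -> dict:
    url = f"https://{slug}.recruitee.com/api/offers/"
    async with sem:  # limit how many requests are in flight at once
        try:
            r = await client.get(url, timeout=10)
            if r.status_code == 200:
                return {"slug": slug, "offers": r.json().get("offers", [])}
        except httpx.RequestError:
            pass  # timeouts, DNS failures, and connection errors
        except ValueError:
            pass  # a 200 whose body isn't JSON, e.g. after a redirect to HTML
    return {"slug": slug, "offers": []}


async def main(slugs: list[str]) -> list[dict]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    async with httpx.AsyncClient(follow_redirects=True) as client:
        tasks = [fetch_offers(client, sem, s) for s in slugs]
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    results = asyncio.run(main(["acme"]))
    print(sum(len(r["offers"]) for r in results), "offers found")
```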

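With a semaphore capping concurrency at 20-30 requests, you can process 5,000 companies in under 10 minutes on a standard VPS. The follow_redirects=True flag matters because httpx does not follow redirects by default, and some subdomains can redirect legitimately (for example, to a custom domain). The catch is that companies that have migrated away from Recruitee may 301 to an unrelated HTML page, so don’t trust a 200 until the body parses as JSON.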

Useful fields to extract per offer: title, department, location, remote, created_at, career_url. The created_at field is the most valuable for freshness filtering — jobs posted in the last 14 days indicate active hiring, which is a strong lead qualifier.
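Freshness filtering on `created_at` needs only the standard library. This sketch assumes timestamps are ISO 8601 in UTC with a trailing `Z`, as in the sample response; `fresh_offers` and its 14-day default are illustrative, and the `now` argument exists so the cutoff can be pinned in tests.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: timestamps without an offset are UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def fresh_offers(offers: list[dict], days: int = 14, now: datetime | None = None) -> list[dict]:
    """Keep offers whose created_at falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    fresh = []
    for offer in offers:
        ts = offer.get("created_at")
        if not ts:
            continue  # skip records without a timestamp rather than guess
        try:
            if parse_ts(ts) >= cutoff:
                fresh.append(offer)
        except ValueError:
            continue  # malformed timestamp
    return fresh
```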

Anti-Bot Considerations and Proxy Use

The JSON API endpoint is low-friction for most companies, but if you’re scraping the HTML career pages (to capture structured data not in the API, like required skills parsed from job descriptions), you’ll hit Cloudflare on some subdomains. Recruitee’s default configuration doesn’t block the API path, but aggressive crawling of the HTML listings will get your IP flagged.

For the API-only approach, residential proxies are overkill — a rotating datacenter pool at ~3 req/s per IP is fine. If you’re parsing HTML job descriptions at scale, use residential or mobile IPs and add a randomized delay between 1.5 and 4 seconds. Keep your User-Agent consistent with a recent Chrome build.
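The pacing rules above can be sketched with asyncio alone. This assumes one limiter per proxy IP (for the ~3 req/s API budget) and a jittered sleep between HTML page fetches. `MinIntervalLimiter`, `html_pause`, and the pinned User-Agent string are illustrative placeholders; swap in a current Chrome UA before using it.

```python
import asyncio
import random
import time

# Placeholder: pin one recent Chrome UA for the whole run (update the version).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
}


class MinIntervalLimiter:
    """Spaces request starts so at most `rate` begin per second."""

    def __init__(self, rate: float) -> None:
        self.interval = 1.0 / rate
        self._next_slot = 0.0
        self._lock = asyncio.Lock()

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            if self._next_slot > now:
                await asyncio.sleep(self._next_slot - now)
                now = self._next_slot
            self._next_slot = now + self.interval


async def html_pause(low: float = 1.5, high: float = 4.0) -> float:
    """Sleep a random 1.5-4 s between HTML page fetches; returns the delay used."""
    delay = random.uniform(low, high)
    await asyncio.sleep(delay)
    return delay
```

In practice you would create `MinIntervalLimiter(3)` per proxy IP inside your async code, `await limiter.wait()` before each API request, and `await html_pause()` between HTML fetches, sending `HEADERS` with every request.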

Compared to more heavily protected ATS platforms, Recruitee is relatively open:

| ATS Platform | API Available | Cloudflare Present | Auth Required | JS Rendering Needed |
|---|---|---|---|---|
| Recruitee | Yes (/api/offers/) | Sometimes (HTML only) | No | No (API) |
| Workday | No public API | Yes | Yes | Yes |
| SmartRecruiters | Partial | Yes | Sometimes | Yes |
| Personio | No public API | Varies | No | No |
| Ashby | Yes (/api/job-board/) | Minimal | No | No |

Workday is the most locked down by far — as covered in How to Scrape Workday Career Sites at Scale (2026), it requires full browser automation and per-tenant URL discovery. Recruitee is closer to Ashby in permissiveness, which is why it’s a good starting point if you’re new to ATS scraping.

Enriching and Qualifying the Lead Data

Raw job posting data alone is a weak lead signal. You need to layer on company-level attributes to prioritize outreach. A useful enrichment stack:

  1. Resolve the company slug to a domain using the Clearbit or Hunter enrichment API
  2. Cross-reference against your CRM to filter out existing customers or known churned accounts
  3. Pull headcount and funding stage from Apollo or Crunchbase to segment by company size
  4. Score by job recency — offers posted within the last 7 days get the highest priority
  5. Filter by department if you’re targeting specific buyers (e.g., only “Engineering” or “Data” roles indicate a technical buyer)
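Once offers carry a resolved domain, steps 2, 4, and 5 reduce to a single filter-and-score pass. This is a sketch with made-up weights (3 points for offers up to 7 days old, 1 for 8-14 days, 0 otherwise). It assumes a hypothetical `domain` key attached during step 1, and you supply the CRM exclusion set and target departments.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Example buyer filter for step 5; compare lowercase department names.
TARGET_DEPARTMENTS = {"engineering", "data"}


def recency_points(created_at: str, now: datetime) -> int:
    """Hypothetical weights for step 4: 3 if posted within 7 days, 1 within 14, else 0."""
    posted = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if posted.tzinfo is None:
        posted = posted.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    age_days = (now - posted).days
    if age_days <= 7:
        return 3
    if age_days <= 14:
        return 1
    return 0


def score_companies(offers: list[dict], crm_domains: set[str], now: datetime) -> dict[str, int]:
    """Sum recency points per domain, highest score first."""
    scores: dict[str, int] = {}
    for offer in offers:
        domain = offer.get("domain")  # assumed to be attached during step 1
        if not domain or domain in crm_domains:
            continue  # step 2: unresolved, existing customer, or churned account
        department = (offer.get("department") or "").strip().lower()
        if department not in TARGET_DEPARTMENTS:
            continue  # step 5: off-target buyer
        created_at = offer.get("created_at")
        if not created_at:
            continue
        scores[domain] = scores.get(domain, 0) + recency_points(created_at, now)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```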

What the Recruitee API doesn’t give you is a company-level rollup: how many roles are open per department, or how those roles split by seniority. For that, you need to aggregate across all of a company’s postings. If a company has 8 open engineering roles spanning data, ML, and backend, that’s a much stronger signal than a single generic posting.
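Rolling a company’s offers up is a `Counter` over departments plus a crude seniority bucket. The title keywords below are a hypothetical heuristic, not something the API provides. Plain substring matching will misfire on some titles, so tune the lists for your market, including Dutch, German, or Polish titles.

```python
from collections import Counter

# Hypothetical keyword heuristic; the API has no seniority field.
# Checked in order, so "Senior Lead Engineer" counts as "lead".
SENIORITY_KEYWORDS = {
    "lead": ("lead", "principal", "staff", "head of"),
    "senior": ("senior", "sr."),
    "junior": ("junior", "jr.", "intern", "graduate"),
}


def seniority(title: str) -> str:
    lowered = title.lower()
    for level, words in SENIORITY_KEYWORDS.items():
        if any(word in lowered for word in words):
            return level
    return "mid/unspecified"


def company_rollup(offers: list[dict]) -> dict:
    """Summarize one company's open roles by department and seniority."""
    return {
        "open_roles": len(offers),
        "by_department": Counter((o.get("department") or "Unknown") for o in offers),
        "by_seniority": Counter(seniority(o.get("title") or "") for o in offers),
    }
```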

This enrichment workflow mirrors what’s needed when targeting other ATS sources. How to Scrape SmartRecruiters Hiring Pages (2026) and How to Scrape Personio Career Sites (2026) cover similar enrichment approaches for those platforms — the enrichment logic is largely portable once you have normalized offer records.
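A shared record shape is what makes that logic portable across sources. One possible shape, assuming the Recruitee field names from the sample response. `OfferRecord` and `from_recruitee` are hypothetical, and mappers for other ATS platforms would fill the same fields from their own payloads.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OfferRecord:
    source: str  # which ATS the record came from, e.g. "recruitee"
    company_slug: str
    offer_id: str
    title: str
    department: str | None
    location: str | None
    remote: bool
    created_at: str | None  # ISO 8601 string, parsed downstream
    url: str | None


def from_recruitee(slug: str, offer: dict) -> OfferRecord:
    """Map one Recruitee offer (fields as in the sample response) to OfferRecord."""
    return OfferRecord(
        source="recruitee",
        company_slug=slug,
        offer_id=str(offer["id"]),
        title=(offer.get("title") or "").strip(),
        department=offer.get("department"),
        location=offer.get("location"),
        remote=bool(offer.get("remote", False)),
        created_at=offer.get("created_at"),
        url=offer.get("career_url"),
    )
```

`asdict(record)` turns a record into a plain dict for CSV or CRM export.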

Handling Edge Cases

A few things that will break a naive scraper:

  • Subdomain not found (404/NXDOMAIN): Some companies deactivate their Recruitee account but the subdomain persists in your seed list. Catch DNS failures separately from HTTP errors and flag them for removal.
  • Empty offers array: A company may have a valid Recruitee account with zero active postings. Log these separately — they’re worth re-checking in 30 days rather than discarding.
  • Non-English job descriptions: Recruitee is popular in the Netherlands, Germany, and Poland. If your downstream NLP pipeline assumes English, add a language detection step (langdetect or fasttext) before parsing.
  • Custom domains: Some companies configure a custom domain (e.g., jobs.acme.com) that proxies to Recruitee. The API path still works: https://jobs.acme.com/api/offers/. Check the page source for the Recruitee widget script to confirm.
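Recording why each seed failed makes the first two cases actionable. This sketch uses a standard-library DNS pre-check so NXDOMAIN is logged separately from HTTP errors. If Recruitee serves wildcard DNS for its subdomains (not confirmed here), dead slugs will surface as 404s instead, which is why both outcomes are kept. The `"recruitee"` substring test for custom domains is an assumption about what the widget script looks like, and all names are hypothetical.

```python
from __future__ import annotations

import socket
from enum import Enum


class SeedStatus(str, Enum):
    DNS_FAILURE = "dns_failure"  # host no longer resolves: flag for removal
    NOT_FOUND = "not_found"  # host resolves, API path returns 404
    HTTP_ERROR = "http_error"  # other failure: retry on the next run
    NO_OPENINGS = "no_openings"  # valid account, empty offers: re-check in 30 days
    ACTIVE = "active"


def resolves(host: str) -> bool:
    """DNS pre-check: False on NXDOMAIN or other resolver failures."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except socket.gaierror:
        return False


def classify(host_ok: bool, status_code: int | None, offers: list | None) -> SeedStatus:
    if not host_ok:
        return SeedStatus.DNS_FAILURE
    if status_code == 404:
        return SeedStatus.NOT_FOUND
    if status_code != 200 or offers is None:
        return SeedStatus.HTTP_ERROR
    return SeedStatus.ACTIVE if offers else SeedStatus.NO_OPENINGS


def api_url_for_host(host: str) -> str:
    """Custom domains (e.g. jobs.acme.com) proxy to Recruitee, so the path is unchanged."""
    return f"https://{host}/api/offers/"


def mentions_recruitee(html: str) -> bool:
    # Assumption: the embedded widget script references "recruitee" in the page source.
    return "recruitee" in html.lower()
```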

How to Scrape Ashby Career Sites for Talent Pipelines (2026) documents a nearly identical custom-domain issue — it’s a common pattern across modern ATS platforms that support white-labeling.

Bottom Line

Recruitee’s /api/offers/ endpoint is the cleanest ATS scraping target available right now — no auth, no JS rendering, structured JSON out of the box. The real work is in building a quality seed list of company subdomains and enriching the output into actionable lead records. Start with a focused vertical (SaaS companies in the Netherlands, for example), validate your pipeline on 500 companies before scaling, and re-run the scrape weekly for fresh hiring signals. DRT covers this class of infrastructure scraping targets in depth — if you’re building a full multi-ATS pipeline, bookmark the full series.
