How to Scrape Ashby Career Sites for Talent Pipelines (2026)

Ashby has quietly become the ATS of choice for fast-growing startups and Series B+ companies, which makes scraping Ashby career sites a high-signal move for talent pipeline builders, recruiting agencies, and competitive intelligence teams. The problem is twofold: there is no central directory of Ashby job boards (each company lives under its own path at jobs.ashbyhq.com/{company-slug}), and the listing pages are React-rendered, which trips up naive scrapers.

How Ashby Job Pages Are Structured

Every Ashby career site follows a predictable URL schema:

https://jobs.ashbyhq.com/{company-slug}
https://jobs.ashbyhq.com/{company-slug}/{job-id}

The listing page renders a JSON payload into the DOM, but Ashby also exposes a public API endpoint that returns structured job data without JavaScript rendering:

GET https://api.ashbyhq.com/posting-api/job-board/{company-slug}

This is the cleanest extraction path. The response is JSON with fields like title, team, location, isRemote, employmentType, and applicationFormDefinition. No authentication required, no browser needed.

import httpx

SLUG = "linear"  # replace with target company slug

resp = httpx.get(f"https://api.ashbyhq.com/posting-api/job-board/{SLUG}", timeout=15)
resp.raise_for_status()
data = resp.json()

# Each entry in data["jobs"] is one open role
for job in data.get("jobs", []):
    print(job["title"], "|", job.get("location", {}).get("name"), "|", job["id"])

Run this against a list of target slugs and you have a structured talent pipeline feed in minutes.

Finding Company Slugs at Scale

The slug discovery problem is where most pipelines break. There’s no public directory of all Ashby customers, so you need to build your own list.

Three approaches that work in 2026:

  1. Google dork: site:jobs.ashbyhq.com returns thousands of indexed subpaths. Paginate through results and extract the slug from the URL path.
  2. LinkedIn scrape: Filter companies by ATS tech stack using tools like Clay or PhantomBuster, which surface the ATS provider from careers-page redirects.
  3. Common Crawl: Query the March 2026 crawl for jobs.ashbyhq.com hostnames and extract unique slugs from the url column in Athena or BigQuery.

For a talent agency scraping 500+ companies, a seeded Common Crawl query gives the highest coverage per compute dollar.
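Whichever discovery route you take, the raw output is a pile of URLs that needs deduplicating into slugs. A minimal sketch of that step (the regex and helper name are illustrative, not part of any Ashby tooling):

```python
import re

# Capture the first path segment after jobs.ashbyhq.com -- the company slug.
# Trailing segments (job IDs) are excluded because "/" is not in the class.
SLUG_RE = re.compile(r"https?://jobs\.ashbyhq\.com/([A-Za-z0-9][A-Za-z0-9._-]*)")

def extract_slugs(urls):
    """Return the sorted set of unique Ashby slugs found in a URL list."""
    slugs = set()
    for url in urls:
        m = SLUG_RE.match(url)
        if m:
            slugs.add(m.group(1))
    return sorted(slugs)

urls = [
    "https://jobs.ashbyhq.com/linear",
    "https://jobs.ashbyhq.com/linear/1234-abcd",   # job page, same slug
    "https://example.com/not-ashby",               # ignored
]
print(extract_slugs(urls))  # -> ['linear']
```

Run this over your Google-dork export or the Common Crawl url column and feed the result straight into the API loop above.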

Anti-Bot Behaviour and Rate Limits

The posting API (api.ashbyhq.com/posting-api) is intentionally public and low-friction. Ashby wants jobs indexed. That said, hammering it with concurrent requests will get your IP soft-blocked within minutes.

Realistic limits from testing:

  • Concurrent requests (same IP): ~5 before 429s appear
  • Requests per minute (single IP): ~60 sustained
  • Cooldown after a 429: 30-90 seconds
  • User-agent rejection: not enforced on the API
  • Bot detection on HTML pages: Cloudflare Turnstile (varies by company)
The HTML job listing pages (jobs.ashbyhq.com) are a different story. Some companies enable Cloudflare Turnstile on the front-end, which means rendering them requires a headless browser or a Turnstile solver. For bulk data extraction, stick to the API — avoid the HTML path entirely unless you need application form fields that aren’t exposed in the JSON.

Rotate IPs per company slug, not per request. A residential proxy pool with 1 request per slug per session keeps your fingerprint clean and stays well within Ashby’s tolerance. If you’re also scraping other ATS platforms in the same pipeline — say, Recruitee or Personio — use separate proxy sessions per provider to avoid cross-contamination of block signals.
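The retry behaviour implied by those limits can be sketched as a small helper. This is illustrative, not an Ashby SDK: `client` is assumed to be an httpx.Client (or any object with a compatible .get), created once per proxy session as described above, and the backoff values mirror the observed limits rather than any documented guarantee.

```python
import random
import time

def fetch_board(slug, client, max_retries=3):
    """Fetch one job board via the posting API, backing off on 429s and 5xx.

    `client` is any HTTP client whose .get(url) returns an object with
    .status_code, .raise_for_status(), and .json() (e.g. httpx.Client).
    """
    url = f"https://api.ashbyhq.com/posting-api/job-board/{slug}"
    for attempt in range(max_retries + 1):
        resp = client.get(url)
        if resp.status_code == 429 and attempt < max_retries:
            # Observed cooldown after a 429 is 30-90s; wait it out with jitter
            time.sleep(30 + random.uniform(0, 60))
            continue
        if resp.status_code >= 500 and attempt < max_retries:
            time.sleep(2 ** attempt)  # exponential backoff on server errors
            continue
        resp.raise_for_status()
        return resp.json()
```

In the loop over slugs, build a fresh client per slug with that slug's proxy session, call fetch_board once, and close the client before moving on.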

Normalising Ashby Data for Cross-ATS Pipelines

Raw Ashby output doesn’t map cleanly to other ATS schemas. If you’re building a unified talent intelligence feed that also pulls from iCIMS or Taleo, normalisation is the unglamorous work that determines whether your pipeline is actually useful.

Ashby-specific fields to watch:

  • location.name can be "Remote", a city, or a hybrid string like "New York, NY (Hybrid)" — parse these consistently
  • employmentType uses Ashby’s own enum: "FullTime", "PartTime", "Contract", "Temporary" — remap to your schema
  • team is a nested object with id and name, not a flat string
  • compensationTier appears only when the company has salary transparency enabled — treat it as optional

A canonical schema across ATS providers should use ISO 3166-1 alpha-2 for country codes, a remote_type enum (full, hybrid, none), and Unix timestamps for posted_at. Ashby’s createdAt field is UTC ISO 8601, which is straightforward to convert.
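A minimal normaliser for the fields above might look like the sketch below. The canonical field names (remote_type, posted_at, and so on) are the hypothetical target schema from this section, the hybrid/remote string check is a heuristic you should tune against your own data, and country-code mapping is omitted:

```python
from datetime import datetime

# Remap Ashby's employmentType enum to a hypothetical canonical schema
EMPLOYMENT_MAP = {
    "FullTime": "full_time",
    "PartTime": "part_time",
    "Contract": "contract",
    "Temporary": "temporary",
}

def remote_type(location_name, is_remote):
    """Collapse Ashby's free-text location into the remote_type enum."""
    name = (location_name or "").lower()
    if "hybrid" in name:
        return "hybrid"
    if is_remote or "remote" in name:
        return "full"
    return "none"

def normalise(job):
    """Map one raw Ashby job dict to a flat canonical row."""
    loc = job.get("location") or {}
    created = job.get("createdAt")  # UTC ISO 8601, e.g. "2026-01-15T09:30:00.000Z"
    posted_at = None
    if created:
        dt = datetime.fromisoformat(created.replace("Z", "+00:00"))
        posted_at = int(dt.timestamp())  # Unix timestamp for the canonical schema
    return {
        "job_id": job["id"],
        "title": job["title"],
        "team": (job.get("team") or {}).get("name"),  # nested object, not a string
        "employment_type": EMPLOYMENT_MAP.get(job.get("employmentType")),
        "remote_type": remote_type(loc.get("name"), job.get("isRemote")),
        "posted_at": posted_at,
    }
```

Keeping the raw dict alongside this row (as recommended below under production hygiene) means a schema change only requires re-running normalise, not re-scraping.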

The same normalisation discipline applies when you’re pulling structured data from completely different verticals — the schema design lessons in How to Scrape Latin American Real Estate Sites cover multi-source field unification patterns that transfer directly to multi-ATS pipelines.

Running This at Scale

For a production pipeline covering 1,000+ Ashby companies, the architecture is straightforward:

  • Orchestration: Temporal or a simple cron on a VPS — Ashby jobs don’t change by the minute, so daily or twice-daily refreshes are enough
  • Queue: Redis or SQS with one task per company slug
  • Workers: 10-20 concurrent workers, each with a dedicated residential IP session
  • Storage: Postgres with a ats_jobs table and a (company_slug, job_id, scraped_at) composite key for deduplication
  • Change detection: Hash the job list per slug on each run and only emit events when the hash changes
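The change-detection step above can be as simple as a deterministic hash of each slug's job list. A sketch, assuming jobs are plain dicts from the posting API:

```python
import hashlib
import json

def board_hash(jobs):
    """Stable hash of one slug's job list for change detection.

    Sort by job id and serialise deterministically so response ordering
    and dict key ordering cannot produce spurious "changes".
    """
    canonical = json.dumps(
        sorted(jobs, key=lambda j: j["id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

prev = board_hash([{"id": "a", "title": "SRE"}])
curr = board_hash([{"id": "a", "title": "Senior SRE"}])
if prev != curr:
    print("emit change event")  # prints: emit change event
```

Store the hash per slug per run; only when it differs do you pay the cost of diffing and emitting downstream events.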

Short bullet checklist before going to production:

  • Confirm the slug list covers your target company set (test 10 manually)
  • Set httpx timeout to 15s and retry twice with exponential backoff on 5xx
  • Log 429s with the slug and timestamp — patterns reveal which companies have extra rate protection
  • Store raw JSON alongside normalised rows — Ashby’s schema has changed twice in the past 18 months

For monitoring, track the ratio of slugs returning zero jobs vs. a non-empty list. A sudden spike in zero-job responses usually means your IP pool is blocked, not that all your targets froze hiring simultaneously.
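That health check is worth automating. A sketch, with a hypothetical alert threshold you would tune against your own run history:

```python
def zero_job_ratio(results):
    """results: mapping of slug -> job list (possibly empty) from one run."""
    if not results:
        return 0.0
    empty = sum(1 for jobs in results.values() if not jobs)
    return empty / len(results)

# Alert when the ratio jumps well above your historical baseline -- a
# blocked IP pool returns empty boards across many unrelated companies.
ALERT_THRESHOLD = 0.3  # hypothetical; tune against your own history

run = {"linear": [{"id": "a"}], "acme": [], "globex": []}
if zero_job_ratio(run) > ALERT_THRESHOLD:
    print("possible IP pool block")  # prints: possible IP pool block
```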

Bottom Line

The Ashby posting API is genuinely scraper-friendly — use it instead of rendering HTML, rotate IPs at the slug level, and invest the saved complexity into normalisation and deduplication. If you’re building a serious multi-ATS talent pipeline, Ashby is one of the easier integrations; the harder work is schema consistency across providers. DRT covers ATS scraping patterns, proxy infrastructure, and data pipeline design in depth — the same principles apply whether you’re pulling from five job boards or five hundred.
