How to Scrape Lever and Greenhouse Job Boards Programmatically (2026)

Lever and Greenhouse power job listings for thousands of tech companies, and scraping them programmatically is one of the cleanest data collection tasks you can do in 2026. Both platforms expose structured APIs and predictable URL patterns, which makes bulk extraction far more reliable than scraping legacy applicant tracking systems (ATS).

Why Lever and Greenhouse Are Scraper-Friendly

Neither platform is trying to hide its job data. Greenhouse publishes a public JSON board API that requires no authentication for read access. Lever similarly exposes a public posting endpoint per employer. Both are designed this way intentionally: companies want their listings indexed and aggregated.

That said, “scraper-friendly” doesn’t mean “rate-limit-free.” Both platforms throttle aggressive crawlers, and Greenhouse’s newer board configurations increasingly route through Cloudflare. If you’re building a high-volume aggregator, you’ll hit walls that a naive requests loop won’t survive. For context on how other ATS stacks compare, see How to Scrape Workday Career Sites at Scale (2026). Workday is the opposite of Greenhouse: no public API, full JS rendering required.

Greenhouse API: The Right Way to Pull Job Data

Greenhouse’s Job Board API is the cleanest starting point. Each company has a unique board token, and the base endpoint is:

GET https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs?content=true

The content=true parameter pulls full job descriptions. Without it, you only get metadata.

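import time

import httpx

# Example board tokens; confirm each one against the company's careers page.
BOARD_TOKENS = ["stripe", "airbnb", "figma", "notion"]

def fetch_greenhouse_jobs(token: str) -> list[dict]:
    """Return every published job for one board, including HTML descriptions."""
    url = f"https://boards-api.greenhouse.io/v1/boards/{token}/jobs"
    r = httpx.get(url, params={"content": "true"}, timeout=15)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return r.json().get("jobs", [])

all_jobs = []
for token in BOARD_TOKENS:
    try:
        all_jobs.extend(fetch_greenhouse_jobs(token))
    except httpx.HTTPError as exc:
        # One dead or renamed token should not abort the whole run.
        print(f"Skipping {token}: {exc}")
    time.sleep(1.2)  # stay under rate limits

print(f"Collected {len(all_jobs)} listings")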

Each job record returns id, title, location, updated_at, content (HTML), departments, offices, and metadata. The departments and offices arrays are particularly useful for org-structure analysis or filtering by function.
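To make those fields easier to filter on, here is a minimal sketch that flattens one record into a row. It assumes location is an object with a name key and that each entry in departments and offices carries a name. The function name and the sample record are hypothetical and only there for illustration.

```python
def flatten_greenhouse_job(job: dict) -> dict:
    """Reduce one Greenhouse job record to a flat row for filtering."""
    return {
        "id": job.get("id"),
        "title": job.get("title"),
        # Assumption: location is an object shaped like {"name": "..."}.
        "location": (job.get("location") or {}).get("name"),
        "updated_at": job.get("updated_at"),
        # Assumption: each department/office entry has a "name" key.
        "departments": [d.get("name") for d in job.get("departments") or []],
        "offices": [o.get("name") for o in job.get("offices") or []],
    }

# Hypothetical record, shaped like the fields listed above.
sample_job = {
    "id": 101,
    "title": "Backend Engineer",
    "location": {"name": "Remote"},
    "updated_at": "2026-01-15T10:30:00-05:00",
    "departments": [{"id": 1, "name": "Engineering"}],
    "offices": [{"id": 7, "name": "New York"}, {"id": 8, "name": "Dublin"}],
}

row = flatten_greenhouse_job(sample_job)
print(row["departments"], row["offices"])
```

Using .get with fallbacks keeps the function from raising on records that omit a field, which matters when you pull without content=true.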

Finding the board token for a given company is the only friction point. Most companies embed it in their careers page URL (boards.greenhouse.io/{token}) or in the page source as a data attribute. A simple regex against the page source finds it in under a second.
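A rough sketch of that regex step follows. The hosted-board pattern comes from the URL shape above. The embed-script pattern (a for= query parameter) is an assumption about how embedded boards reference the token, so check it against real pages before relying on it. The function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: embedded boards load a script whose URL carries ?for={token}.
EMBED_RE = re.compile(
    r"boards\.greenhouse\.io/embed/job_board(?:/js)?\?for=([A-Za-z0-9_-]+)"
)
# Hosted boards: boards.greenhouse.io/{token}, skipping the /embed/ path.
HOSTED_RE = re.compile(r"boards\.greenhouse\.io/(?!embed/)([A-Za-z0-9_-]+)")

def find_greenhouse_token(page_source: str) -> Optional[str]:
    """Return the first board token found in the page source, or None."""
    for pattern in (EMBED_RE, HOSTED_RE):
        match = pattern.search(page_source)
        if match:
            return match.group(1)
    return None

hosted = '<a href="https://boards.greenhouse.io/acme/jobs/123">Apply</a>'
embedded = '<script src="https://boards.greenhouse.io/embed/job_board/js?for=acme-corp"></script>'

print(find_greenhouse_token(hosted))
print(find_greenhouse_token(embedded))
```

A token found this way is only a candidate. Confirm it by calling the jobs endpoint before adding it to your list.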

Lever API: Postings and Department Filters

Lever’s public posting endpoint follows the same pattern:

GET https://api.lever.co/v0/postings/{company_slug}?mode=json

The mode=json parameter is essential: without it, Lever returns HTML. Lever also supports department and team filtering via query params, which makes targeted extraction much cleaner than post-processing a full dump.

Useful query parameters:

  • ?department=Engineering: filter by department
  • ?team=Backend: filter by team
  • ?commitment=Full-time: filter by job type
  • ?mode=json&limit=50&skip=0: pagination with skip and limit (default page size is 25)
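As a small illustration of combining these parameters, the sketch below builds a filtered postings URL with the standard library. The helper name and the examplecorp slug are hypothetical. The helper always sets mode=json so a forgotten parameter cannot silently switch the response to HTML.

```python
from urllib.parse import urlencode

LEVER_BASE = "https://api.lever.co/v0/postings"

def lever_postings_url(slug: str, **filters: str) -> str:
    """Build a postings URL that always requests JSON and drops empty filters."""
    params = {"mode": "json"}
    params.update({key: value for key, value in filters.items() if value})
    return f"{LEVER_BASE}/{slug}?{urlencode(params)}"

# "examplecorp" is a placeholder slug, not a real employer.
url = lever_postings_url(
    "examplecorp", department="Engineering", commitment="Full-time"
)
print(url)
```

The resulting string can be passed straight to httpx.get(url, timeout=15), mirroring the Greenhouse example earlier. urlencode takes care of escaping values that contain spaces.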

Lever responses include id, text (job title), categories (department, team, location, commitment), description (HTML), descriptionPlain, lists (responsibilities and requirements as structured arrays), salaryRange, and applyUrl. The salaryRange field is populated for US roles when companies opt into transparency.
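A hedged sketch of turning one posting into a flat row is shown here. The top-level field names come from the list above. The keys inside salaryRange (min, max, currency) are an assumption, so verify them against a live response. The function name and sample posting are hypothetical.

```python
def flatten_lever_posting(posting: dict) -> dict:
    """Reduce one Lever posting to a flat row; missing fields become None."""
    categories = posting.get("categories") or {}
    # Assumption: salaryRange, when present, carries min, max and currency.
    salary = posting.get("salaryRange") or {}
    return {
        "id": posting.get("id"),
        "title": posting.get("text"),
        "department": categories.get("department"),
        "team": categories.get("team"),
        "location": categories.get("location"),
        "commitment": categories.get("commitment"),
        "salary_min": salary.get("min"),
        "salary_max": salary.get("max"),
        "salary_currency": salary.get("currency"),
        "apply_url": posting.get("applyUrl"),
    }

# Hypothetical posting with no salaryRange, as for a company that has not opted in.
sample_posting = {
    "id": "abc-123",
    "text": "Site Reliability Engineer",
    "categories": {
        "department": "Engineering",
        "team": "Infrastructure",
        "location": "Berlin",
        "commitment": "Full-time",
    },
    "applyUrl": "https://jobs.lever.co/examplecorp/abc-123/apply",
}

lever_row = flatten_lever_posting(sample_posting)
print(lever_row["title"], lever_row["team"], lever_row["salary_min"])
```

Treating salaryRange as optional keeps the row schema stable whether or not an employer publishes pay data.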

Handling Scale: Board Tokens in Bulk

If you’re building a jobs aggregator covering thousands of companies, the bottleneck is token/slug discovery, not the API calls themselves. A practical pipeline looks like this:

  1. Seed a company list from a SaaS review aggregator (G2, Capterra). If you want structured data from those platforms, How to Scrape G2.com and Capterra SaaS Reviews Programmatically covers the extraction path in detail.
  2. For each company, check careers page URLs for Greenhouse or Lever patterns.
  3. Validate the token/slug returns a 200 before adding it to your active list.
  4. Run nightly delta syncs using updated_at filtering rather than full re-pulls.
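Step 2 of the pipeline above can be sketched as a URL classifier. The host list is an assumption: boards.greenhouse.io appears earlier in this guide, while job-boards.greenhouse.io and jobs.lever.co are hosts you would need to confirm for your own targets. The function name is hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed hosted-board domains; extend after checking your own company list.
ATS_HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "job-boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
}

def detect_ats(url: str) -> Optional[Tuple[str, str]]:
    """Return (platform, token_or_slug) for a careers URL, or None if unrecognized."""
    parsed = urlparse(url)
    platform = ATS_HOSTS.get(parsed.netloc.lower())
    segments = [segment for segment in parsed.path.split("/") if segment]
    if platform is None or not segments:
        return None
    if segments[0] == "embed":
        # Embed URLs keep the token in the query string, not the path.
        return None
    return platform, segments[0]

print(detect_ats("https://jobs.lever.co/examplecorp/abc-123"))
print(detect_ats("https://boards.greenhouse.io/acme"))
print(detect_ats("https://example.com/careers"))
```

Whatever this returns is still only a candidate. Step 3, confirming a 200 from the API, decides whether it enters the active list.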

Delta syncing matters because both APIs are fast but not free under load. Greenhouse returns updated_at per job; Lever returns createdAt and updatedAt. A daily sync that only fetches listings updated in the last 24 hours reduces your request volume by 80-90% on a mature dataset.
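The filtering side of a delta sync can be sketched as below. It is client-side filtering, under two assumptions: Greenhouse's updated_at is an ISO 8601 string, and Lever's timestamps are epoch milliseconds. The helper names and sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def parse_timestamp(value) -> datetime:
    """Accept an ISO 8601 string or epoch milliseconds; return an aware datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc)
    parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def changed_since(records: list, field: str, cutoff: datetime) -> list:
    """Keep only the records whose timestamp field is at or after the cutoff."""
    return [
        record for record in records
        if record.get(field) is not None
        and parse_timestamp(record[field]) >= cutoff
    ]

# Fixed "now" so the example is reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2026, 1, 16, 12, 0, tzinfo=timezone.utc)
cutoff = now - timedelta(hours=24)

greenhouse_jobs = [
    {"id": 1, "updated_at": "2026-01-16T02:00:00-05:00"},
    {"id": 2, "updated_at": "2026-01-10T09:00:00-05:00"},
]
recent_ms = int(datetime(2026, 1, 16, 6, 0, tzinfo=timezone.utc).timestamp() * 1000)
stale_ms = int(datetime(2026, 1, 1, 6, 0, tzinfo=timezone.utc).timestamp() * 1000)
lever_postings = [{"id": "a", "updatedAt": recent_ms}, {"id": "b", "updatedAt": stale_ms}]

fresh_greenhouse = changed_since(greenhouse_jobs, "updated_at", cutoff)
fresh_lever = changed_since(lever_postings, "updatedAt", cutoff)
print([job["id"] for job in fresh_greenhouse], [p["id"] for p in fresh_lever])
```

Because this filter runs after the list response arrives, the savings come from skipping downstream work, such as re-parsing descriptions or rewriting unchanged rows.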

Platform Comparison

| Feature | Greenhouse | Lever |
| --- | --- | --- |
| Public API | Yes, JSON | Yes, JSON |
| Auth required | No (read) | No (read) |
| Salary data | Rare | US roles, opt-in |
| Department filtering | Post-process | Native query param |
| Cloudflare protection | Increasing | Minimal |
| Pagination style | Single response | Limit/offset |
| HTML job descriptions | Yes (content=true) | Yes + plain text |

Greenhouse is better for companies with complex location/department hierarchies. Lever is cleaner for filtering by commitment type or team before pulling content.

When the API Breaks Down

Both APIs have edge cases that will quietly return garbage without throwing errors:

  • Deleted listings return 200: Greenhouse keeps deleted jobs in responses for up to 72 hours with no flag. Check updated_at recency and watch for empty departments arrays as a signal.
  • Lever pagination silently truncates: If a company has more than 250 listings, Lever’s API stops at 250 without a “next page” indicator in some configurations. If your result count exactly matches your limit or lands on a round number such as 250, request the next page with skip until a page comes back short or empty.
  • Cloudflare 1020 on Greenhouse: This is an access denied, not a rate limit. Rotating residential IPs fixes it. Data center IPs increasingly trigger 1020 on high-traffic boards.
  • Board token changes: Companies occasionally reorg and migrate to a new board slug. Build dead-link detection into your pipeline.
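One way to guard against the silent truncation case is to keep paging until a page comes back short. The sketch below assumes the paging parameter is named skip, as in Lever's public Postings API README. It takes the page fetcher as an argument so it can be exercised offline; fetch_all_postings and fake_fetch_page are hypothetical names.

```python
from typing import Callable, List

def fetch_all_postings(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int = 50,
    max_pages: int = 200,
) -> List[dict]:
    """Call fetch_page(skip, limit) until a page returns fewer than limit items."""
    results: List[dict] = []
    for page in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        batch = fetch_page(page * limit, limit)
        results.extend(batch)
        if len(batch) < limit:
            break  # a short or empty page means there is nothing left
    return results

# Offline stand-in for the API: 120 fake postings served in slices.
FAKE_POSTINGS = [{"id": str(n)} for n in range(120)]

def fake_fetch_page(skip: int, limit: int) -> List[dict]:
    return FAKE_POSTINGS[skip:skip + limit]

postings = fetch_all_postings(fake_fetch_page, limit=50)
print(len(postings))
```

In a real run, fetch_page would wrap an HTTP call that sends mode=json, skip, and limit as query parameters and returns the decoded list. When the total is an exact multiple of the limit, the loop makes one extra request that returns an empty page, which is the price of not trusting a missing "next page" indicator.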

For ATS platforms without public APIs, the extraction approach differs substantially. How to Scrape SmartRecruiters Hiring Pages (2026) covers a platform that requires a hybrid API-plus-DOM strategy, and How to Scrape Recruitee Pages for Lead Sourcing (2026) covers another mid-market ATS popular in Europe. If your target companies use smaller regional systems, How to Scrape Boutique Recruitment Site Postings (2026) is the right starting point for less structured targets.

Bottom line

Greenhouse and Lever are the easiest ATS platforms to scrape at scale in 2026: both expose well-documented public APIs that return structured JSON with no login required. Start with the API before reaching for a browser automation tool. Use delta syncing by updated_at to keep costs low, rotate residential IPs to handle Cloudflare friction on high-traffic Greenhouse boards, and build dead-slug detection from day one. DRT covers the full ATS ecosystem if you need extraction guides for the other major platforms your targets use.
