How to Scrape JobStreet Southeast Asia Listings in 2026

JobStreet is one of the most-scraped job boards in Southeast Asia, and for good reason: it covers Singapore, Malaysia, the Philippines, Indonesia, and Vietnam under one domain structure, making it a high-value target for recruiters building talent pipelines, researchers tracking hiring trends, and HR tech companies benchmarking salaries. Scraping JobStreet in 2026 is harder than it looks. Seek’s acquisition of JobStreet brought infrastructure upgrades that tightened bot detection, and naive Requests-based scrapers now hit walls within minutes.

What JobStreet’s Anti-Bot Stack Looks Like in 2026

JobStreet runs Cloudflare with JavaScript challenges on most listing pages. The search results endpoint (/en/job-search/) renders server-side HTML, but the pagination and filter state are managed via URL query parameters, which is good news for structured scraping. The job detail pages load salary and company data dynamically via XHR calls to internal APIs.

The key fingerprinting vectors to watch:

  • TLS fingerprint (JA3/JA4): Node’s default https module and Python’s requests both fail here
  • Browser challenge cookies: cf_clearance is required on subsequent requests
  • Rate limiting: more than 30 requests per minute from one IP triggers a 429 or silent redirect to a CAPTCHA wall
  • Headless detection: navigator.webdriver checks are active on job detail pages

If you’ve already dealt with similar Cloudflare setups on other regional platforms, the pattern will feel familiar. The approach in “How to Scrape Seek.com.au Australian Job Listings in 2026” maps closely to what works on JobStreet, since both are Seek-owned properties with shared infrastructure.

Recommended Scraping Stack

For most production use cases, you want one of three approaches depending on volume and budget:

Approach                               Best for                        Cost           Reliability
Playwright + residential proxy         Low-to-mid volume, full data    $50-200/mo     High
Scraping API (Apify/Zyte)              Medium volume, managed          $100-400/mo    High
Raw HTTP + TLS spoofing (curl-cffi)    High volume, cost-sensitive     $20-80/mo      Medium

For most teams scraping fewer than 10,000 listings per day, Playwright with a residential proxy rotation is the most reliable path. For higher volumes, Zyte’s Smart Browser handles Cloudflare challenges automatically and exposes a simple HTTP API.

The curl-cffi library in Python is worth knowing about: it mimics browser TLS fingerprints at the HTTP layer without spinning up a full browser, which cuts infrastructure cost significantly. It works on JobStreet’s listing index pages but breaks on detail pages that fire JavaScript challenges.

Extracting Listing Data

JobStreet’s search results page at https://www.jobstreet.com.sg/en/job-search/ accepts query parameters for keyword, location, and classification. The HTML structure is stable enough to parse with BeautifulSoup after you’ve passed the Cloudflare check.

Here’s a minimal working pattern using Playwright:

from playwright.async_api import async_playwright
from urllib.parse import quote_plus
import asyncio

async def scrape_jobstreet(keyword: str, location: str, pages: int = 5):
    results = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
            proxy={"server": "http://YOUR_RESIDENTIAL_PROXY:PORT",
                   "username": "user", "password": "pass"}
        )
        page = await context.new_page()
        for pg in range(1, pages + 1):
            # URL-encode user input so multi-word keywords survive the query string
            url = (f"https://www.jobstreet.com.sg/en/job-search/"
                   f"?keywords={quote_plus(keyword)}&location={quote_plus(location)}&page={pg}")
            await page.goto(url, wait_until="domcontentloaded")
            await page.wait_for_selector("[data-automation='jobListing']", timeout=15000)
            cards = await page.query_selector_all("[data-automation='jobListing']")
            for card in cards:
                title = await card.query_selector("[data-automation='jobTitle']")
                company = await card.query_selector("[data-automation='jobCompany']")
                results.append({
                    "title": await title.inner_text() if title else None,
                    "company": await company.inner_text() if company else None,
                })
            await asyncio.sleep(3)  # stay under JobStreet's per-IP rate limit
        await browser.close()
    return results

The data-automation attributes on JobStreet are relatively stable, though they’ve changed twice since 2023, so build your selectors around them but keep a fallback that extracts the job ID embedded in the listing URL.

Salary data is not always in the card. You’ll need to visit individual job detail pages, where salary appears in an element tagged with data-automation="job-detail-salary". On listings where salary is hidden, that element is absent entirely rather than masked, so a simple None check is sufficient.

Country and Language Variants

JobStreet operates slightly differently per country, with separate domains and locale-specific URL structures:

  1. Singapore: jobstreet.com.sg/en/job-search/
  2. Malaysia: jobstreet.com.my/en/job-search/
  3. Philippines: jobstreet.com.ph/en/job-search/
  4. Indonesia: jobstreet.co.id/id/job-search/ (Indonesian-language default)
  5. Vietnam: jobstreet.com.vn/en/job-search/

Indonesia is the outlier worth noting: it uses a .co.id domain rather than the .com.xx pattern the other countries follow, and the site defaults to Bahasa Indonesia, meaning classification labels and location names won’t match if you’re trying to normalize across countries. Run separate scrapers per domain and normalize at the data layer.
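One way to keep the per-country differences in one place is a small config map keyed on country code. The structure below is a sketch; the base URLs come from the list above, and the `search_url` helper is an assumed convenience, not a JobStreet API:

```python
from urllib.parse import quote_plus

# Base search URLs and default locales per country, matching the list above
JOBSTREET_DOMAINS = {
    "SG": {"base": "https://www.jobstreet.com.sg/en/job-search/", "lang": "en"},
    "MY": {"base": "https://www.jobstreet.com.my/en/job-search/", "lang": "en"},
    "PH": {"base": "https://www.jobstreet.com.ph/en/job-search/", "lang": "en"},
    "ID": {"base": "https://www.jobstreet.co.id/id/job-search/", "lang": "id"},
    "VN": {"base": "https://www.jobstreet.com.vn/en/job-search/", "lang": "en"},
}

def search_url(country: str, keyword: str, page: int = 1) -> str:
    """Build a search URL for the given country's JobStreet domain."""
    cfg = JOBSTREET_DOMAINS[country]
    return f"{cfg['base']}?keywords={quote_plus(keyword)}&page={page}"
```

Keeping the locale alongside the base URL lets the normalization layer know which label translations to apply downstream.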

If you’re building a multi-platform SEA hiring dataset, also look at what’s available on How to Scrape JobsDB Hong Kong + Thailand Job Listings (2026), since JobsDB covers markets that JobStreet doesn’t.

Proxy and Rate Limit Strategy

Residential proxies are non-negotiable for production JobStreet scraping. Datacenter IPs are blocked at the ASN level on Cloudflare’s radar within hours. Singapore residential IPs work best for .com.sg, Malaysian IPs for .com.my, and so on — using a geo-matched proxy reduces the chance of CAPTCHA challenges from location mismatch signals.

Rate limits to stay within:

  • No more than 1 request per 3 seconds per IP
  • Rotate IP after every 15-20 requests at most
  • Add random 1-4 second jitter between requests
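Those three limits can be encoded in a small throttle helper. The thresholds below mirror the numbers above; the class itself and its rotation hook are a sketch you'd wire into whatever proxy pool you use:

```python
import random
import time

class Throttle:
    """Pace requests to ~1 per 3s with jitter; rotate IP every N requests."""

    def __init__(self, base_delay: float = 3.0,
                 jitter: tuple[float, float] = (1.0, 4.0),
                 rotate_every: int = 15):
        self.base_delay = base_delay
        self.jitter = jitter
        self.rotate_every = rotate_every
        self.count = 0

    def wait(self) -> None:
        # Base 3s spacing plus 1-4s of random jitter per request
        time.sleep(self.base_delay + random.uniform(*self.jitter))

    def should_rotate(self) -> bool:
        # Signal the caller to swap proxies after every `rotate_every` requests
        self.count += 1
        if self.count >= self.rotate_every:
            self.count = 0
            return True
        return False
```

Calling `wait()` before each request and checking `should_rotate()` after keeps a scraper inside all three limits at once.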

For comparison, “How to Scrape Foundit (Monster India) Job Postings in 2026” covers a platform with looser rate limiting than JobStreet, so if you’re running a multi-board scraper, JobStreet should dictate your throttle ceiling.

If you’re operating at the data infrastructure level and need to understand how proxy sourcing works behind the scenes, the “How to Scrape Cars.com Vehicle Listings and Dealer Data (2026)” guide has a solid breakdown of residential proxy pool management that applies to any high-volume scraping operation.

For teams managing proprietary data sources alongside public scraping, “How to Scrape Tracxn Free Tier Pages in 2026” covers a useful adjacent case: extracting structured company-level data that pairs well with hiring signal data from job boards.

Storing and Structuring the Output

Raw JobStreet data needs cleanup before it’s useful. Common issues:

  • Salary ranges are free-text strings (“SGD 4,000 – SGD 7,000 per month”), so you’ll need a regex normalization pass
  • Company names include legal suffixes (“Pte. Ltd.”, “Sdn. Bhd.”) that fragment grouping if not stripped
  • Location fields mix district names with MRT station references in Singapore

A minimal Postgres schema:

CREATE TABLE jobstreet_listings (
    id TEXT PRIMARY KEY,          -- extracted from URL
    country CHAR(2),
    title TEXT,
    company TEXT,
    salary_min INT,
    salary_max INT,
    currency CHAR(3),
    location TEXT,
    posted_at DATE,
    scraped_at TIMESTAMPTZ DEFAULT NOW()
);

Deduplication on the URL-embedded job ID works within a single crawl, but JobStreet reuses IDs across reposts, so a composite key of (id, posted_at) is safer if you’re tracking repost frequency.
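With that composite key, inserts become idempotent via an upsert. A sketch against the schema above, assuming you widen the primary key to (id, posted_at) and that the default Postgres constraint name applies; the sample values are illustrative:

```sql
-- Widen the primary key so reposts of the same ID are distinct rows
ALTER TABLE jobstreet_listings
    DROP CONSTRAINT jobstreet_listings_pkey,
    ADD PRIMARY KEY (id, posted_at);

-- Re-scraping the same listing just refreshes the crawl timestamp
INSERT INTO jobstreet_listings (id, country, title, company, posted_at)
VALUES ('12345678', 'SG', 'Data Engineer', 'Acme Pte. Ltd.', '2026-01-15')
ON CONFLICT (id, posted_at) DO UPDATE
    SET scraped_at = NOW();
```

Note that posted_at must be non-null for the composite key to hold, so fall back to the scrape date when a listing omits its posting date.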

Bottom Line

JobStreet is scrapeable in 2026 with Playwright plus residential proxies, but you need geo-matched IPs, conservative rate limits, and selector monitoring since the data-automation attributes drift. For most teams, Zyte Smart Browser is the fastest path to stable production data. DRT covers scraping setups like this across dozens of platforms, so if you’re building a broader SEA data pipeline, check the rest of the job board coverage in the series.
