How to Scrape LinkedIn Job Listings with Rotating Proxies

LinkedIn is the world’s largest professional network and one of the most comprehensive sources of job market data. With millions of job listings posted across every industry and geography, LinkedIn data powers salary research, labor market analysis, competitive intelligence, and recruitment automation.

However, LinkedIn maintains some of the most aggressive anti-scraping defenses of any website. Its systems are specifically designed to detect and block automated access, making a robust proxy strategy essential. This guide covers how to scrape LinkedIn job listings using Python and rotating mobile proxies.

Why LinkedIn Is Exceptionally Hard to Scrape

LinkedIn’s anti-bot measures go beyond typical website protections:

  • Authentication walls: Most valuable content requires login, and LinkedIn tracks session behavior intensively.
  • Rate limiting tiers: Even logged-in users face strict limits on profile views, searches, and page loads.
  • Account restrictions: Accounts flagged for automation are throttled, restricted, or permanently banned.
  • Legal enforcement: LinkedIn has a history of pursuing legal action against scrapers, including the landmark hiQ Labs case.
  • JavaScript rendering: Key content loads dynamically, requiring browser-level rendering.
  • CSRF tokens: API endpoints require valid CSRF tokens that must be extracted from page loads (see the sketch at the end of this section).

For LinkedIn scraping, rotating mobile proxies offer the best balance of reliability and safety: carrier-grade NAT places thousands of real users behind each mobile IP, so LinkedIn cannot block those addresses without also blocking legitimate mobile traffic.
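
To make the CSRF point concrete: in the commonly reported pattern, the web app’s JSESSIONID cookie doubles as the CSRF token for LinkedIn’s internal API calls. The sketch below shows the general shape of extracting it with requests; the cookie name and header are observed behavior, not a documented contract, and may change without notice.

import requests

session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"  # use a realistic UA in practice

# Load any LinkedIn page so the server sets session cookies
session.get("https://www.linkedin.com/jobs/", timeout=20)

# Observed pattern (not a documented API): the JSESSIONID cookie value,
# e.g. "ajax:1234567890", is sent back as the csrf-token header on
# internal API requests; strip the surrounding quotes before reuse
jsessionid = session.cookies.get("JSESSIONID", "")
session.headers["csrf-token"] = jsessionid.strip('"')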

Setting Up Your Environment

pip install requests beautifulsoup4 lxml pandas selenium webdriver-manager

Approach 1: Scraping Public Job Listings (No Login Required)

LinkedIn makes some job listing data available without authentication through their public job search pages.

Configure Proxy and Session

import requests
from bs4 import BeautifulSoup
import time
import random
import json
import pandas as pd
from urllib.parse import quote_plus

class LinkedInJobScraper:
    """Scrape LinkedIn job listings using public endpoints."""

    BASE_URL = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"

    def __init__(self, proxy_url):
        self.session = requests.Session()
        self.session.proxies = {
            "http": proxy_url,
            "https": proxy_url,
        }
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
        })

    def search_jobs(self, keywords, location="", num_pages=5):
        """Search for job listings with pagination."""
        all_jobs = []

        for page in range(num_pages):
            start = page * 25
            params = {
                "keywords": keywords,
                "location": location,
                "start": start,
                "f_TPR": "r604800",  # Past week
                "sortBy": "DD",  # Date posted
            }

            try:
                response = self.session.get(
                    self.BASE_URL, params=params, timeout=20
                )

                if response.status_code == 200:
                    jobs = self._parse_job_cards(response.text)
                    if not jobs:
                        print(f"No more results at page {page + 1}")
                        break
                    all_jobs.extend(jobs)
                    print(f"Page {page + 1}: Found {len(jobs)} jobs (total: {len(all_jobs)})")
                elif response.status_code == 429:
                    # Back off on rate limiting; note that this skips the
                    # current page rather than retrying it
                    print("Rate limited, backing off...")
                    time.sleep(random.uniform(30, 60))
                    continue
                else:
                    print(f"Status {response.status_code} on page {page + 1}")

            except requests.exceptions.RequestException as e:
                print(f"Request error: {e}")

            time.sleep(random.uniform(3, 7))

        return all_jobs

    def _parse_job_cards(self, html):
        """Parse job listing cards from search results HTML."""
        soup = BeautifulSoup(html, "lxml")
        jobs = []

        cards = soup.find_all("div", class_="base-card")
        for card in cards:
            job = {}

            # Job title
            title_el = card.find("h3", class_="base-search-card__title")
            job["title"] = title_el.get_text(strip=True) if title_el else None

            # Company name
            company_el = card.find("h4", class_="base-search-card__subtitle")
            job["company"] = company_el.get_text(strip=True) if company_el else None

            # Location
            location_el = card.find("span", class_="job-search-card__location")
            job["location"] = location_el.get_text(strip=True) if location_el else None

            # Date posted
            date_el = card.find("time", class_="job-search-card__listdate")
            if date_el:
                job["date_posted"] = date_el.get("datetime")
            else:
                date_el = card.find("time", class_="job-search-card__listdate--new")
                job["date_posted"] = date_el.get("datetime") if date_el else None

            # Job URL
            link_el = card.find("a", class_="base-card__full-link")
            job["url"] = link_el.get("href", "").split("?")[0] if link_el else None

            # Job ID from the URL slug (None when the URL is missing)
            if job["url"] and "-" in job["url"]:
                job["job_id"] = job["url"].split("-")[-1]
            else:
                job["job_id"] = None

            # Salary (if shown)
            salary_el = card.find("span", class_="job-search-card__salary-info")
            job["salary"] = salary_el.get_text(strip=True) if salary_el else None

            if job["title"]:
                jobs.append(job)

        return jobs

Scrape Individual Job Details

    def get_job_details(self, job_url):
        """Fetch detailed information for a single job listing."""
        try:
            # Use the guest view endpoint
            response = self.session.get(job_url, timeout=20)
            if response.status_code != 200:
                return None

            soup = BeautifulSoup(response.text, "lxml")
            details = {}

            # Job description
            desc_el = soup.find("div", class_="show-more-less-html__markup")
            if desc_el:
                details["description"] = desc_el.get_text(strip=True)
                details["description_html"] = str(desc_el)

            # Seniority level
            criteria = soup.find_all("li", class_="description__job-criteria-item")
            for criterion in criteria:
                label = criterion.find("h3")
                value = criterion.find("span")
                if label and value:
                    label_text = label.get_text(strip=True).lower()
                    value_text = value.get_text(strip=True)

                    if "seniority" in label_text:
                        details["seniority_level"] = value_text
                    elif "employment" in label_text:
                        details["employment_type"] = value_text
                    elif "function" in label_text:
                        details["job_function"] = value_text
                    elif "industr" in label_text:
                        details["industry"] = value_text

            # Number of applicants
            applicants_el = soup.find("span", class_="num-applicants__caption")
            if applicants_el:
                details["applicants"] = applicants_el.get_text(strip=True)

            return details

        except Exception as e:
            print(f"Error fetching job details: {e}")
            return None
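
Because get_job_details gives up on any non-200 response, a small retry wrapper with exponential backoff makes detail fetches more resilient to transient blocks. This is a minimal sketch with hypothetical parameters, not part of the class above:

def fetch_with_backoff(scraper, job_url, max_attempts=3, base_delay=10):
    """Retry a detail fetch with exponential backoff (hypothetical helper)."""
    for attempt in range(max_attempts):
        details = scraper.get_job_details(job_url)
        if details is not None:
            return details
        # Back off 10s, 20s, 40s (plus jitter) for base_delay=10
        delay = base_delay * (2 ** attempt) + random.uniform(0, 5)
        print(f"Attempt {attempt + 1} failed, retrying in {delay:.0f}s")
        time.sleep(delay)
    return None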

Approach 2: Authenticated Scraping with Selenium

For more comprehensive data, authenticated scraping through a browser provides access to additional fields.

import time
import random
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager


class LinkedInAuthenticatedScraper:
    """Authenticated LinkedIn scraper using Selenium."""

    def __init__(self, proxy_host, proxy_port, proxy_user, proxy_pass):
        # Note: Chrome ignores credentials embedded in --proxy-server, so
        # proxy_user / proxy_pass must be handled via IP allowlisting with
        # your provider or a proxy-auth browser extension
        options = Options()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-blink-features=AutomationControlled")
        options.add_argument(f"--proxy-server=http://{proxy_host}:{proxy_port}")
        options.add_argument("--window-size=1920,1080")

        options.add_experimental_option("excludeSwitches", ["enable-automation"])

        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=options)
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
        )

    def login(self, email, password):
        """Log in to LinkedIn."""
        self.driver.get("https://www.linkedin.com/login")
        time.sleep(random.uniform(2, 4))

        email_field = self.driver.find_element(By.ID, "username")
        email_field.send_keys(email)
        time.sleep(random.uniform(0.5, 1.5))

        pass_field = self.driver.find_element(By.ID, "password")
        pass_field.send_keys(password)
        time.sleep(random.uniform(0.5, 1))

        submit_btn = self.driver.find_element(
            By.CSS_SELECTOR, "button[type='submit']"
        )
        submit_btn.click()
        time.sleep(random.uniform(3, 6))

        # Check for security verification
        if "checkpoint" in self.driver.current_url:
            print("Security checkpoint detected. Manual intervention may be needed.")
            return False
        return True

    def search_jobs_authenticated(self, keywords, location="", num_results=50):
        """Search jobs while logged in for richer data."""
        encoded_kw = quote_plus(keywords)
        encoded_loc = quote_plus(location)
        base_url = (
            f"https://www.linkedin.com/jobs/search/"
            f"?keywords={encoded_kw}&location={encoded_loc}"
        )

        self.driver.get(base_url)
        time.sleep(random.uniform(3, 5))

        jobs = []
        processed_ids = set()

        # Scroll to load results
        for scroll in range(num_results // 25 + 1):
            self.driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight)"
            )
            time.sleep(random.uniform(2, 4))

            # Click "See more jobs" if visible
            try:
                see_more = self.driver.find_element(
                    By.CSS_SELECTOR, "button.infinite-scroller__show-more-button"
                )
                see_more.click()
                time.sleep(random.uniform(2, 3))
            except Exception:
                pass

        # Extract job cards
        cards = self.driver.find_elements(
            By.CSS_SELECTOR, "li.jobs-search-results__list-item"
        )

        for card in cards[:num_results]:
            try:
                job_id = card.get_attribute("data-occludable-job-id")
                if not job_id or job_id in processed_ids:
                    continue
                processed_ids.add(job_id)

                # Click the job card to load details
                card.click()
                time.sleep(random.uniform(1.5, 3))

                job = {"job_id": job_id}

                # Title
                title_el = self.driver.find_elements(
                    By.CSS_SELECTOR, "h2.jobs-unified-top-card__job-title a"
                )
                job["title"] = title_el[0].text if title_el else None

                # Company
                company_el = self.driver.find_elements(
                    By.CSS_SELECTOR, "span.jobs-unified-top-card__company-name a"
                )
                job["company"] = company_el[0].text if company_el else None

                # Location
                loc_el = self.driver.find_elements(
                    By.CSS_SELECTOR, "span.jobs-unified-top-card__bullet"
                )
                job["location"] = loc_el[0].text if loc_el else None

                # Salary
                salary_el = self.driver.find_elements(
                    By.CSS_SELECTOR, "span.jobs-unified-top-card__job-insight span"
                )
                for sel in salary_el:
                    text = sel.text
                    if "$" in text or "yr" in text:
                        job["salary"] = text
                        break

                jobs.append(job)

            except Exception as e:
                print(f"Error processing card: {e}")
                continue

        return jobs

    def close(self):
        """Close the browser."""
        self.driver.quit()
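
A usage sketch with placeholder proxy details; credentials come from environment variables rather than being hard-coded:

import os

scraper = LinkedInAuthenticatedScraper(
    proxy_host="proxy.example.com",  # placeholder
    proxy_port=8080,
    proxy_user=os.environ.get("PROXY_USER"),
    proxy_pass=os.environ.get("PROXY_PASS"),
)
try:
    if scraper.login(os.environ["LINKEDIN_EMAIL"], os.environ["LINKEDIN_PASSWORD"]):
        jobs = scraper.search_jobs_authenticated(
            "data engineer", location="Remote", num_results=25
        )
        print(f"Collected {len(jobs)} jobs")
finally:
    scraper.close()  # always release the browser, even on failure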

Running the Complete Pipeline

def main():
    proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"

    # Method 1: Public scraping (no login)
    scraper = LinkedInJobScraper(proxy_url)

    # Search for jobs
    search_queries = [
        {"keywords": "python developer", "location": "San Francisco, CA"},
        {"keywords": "data engineer", "location": "New York, NY"},
        {"keywords": "machine learning engineer", "location": "Remote"},
    ]

    all_jobs = []
    for query in search_queries:
        print(f"\nSearching: {query['keywords']} in {query['location']}")
        jobs = scraper.search_jobs(
            keywords=query["keywords"],
            location=query["location"],
            num_pages=3,
        )
        all_jobs.extend(jobs)
        time.sleep(random.uniform(5, 10))

    print(f"\nTotal jobs found: {len(all_jobs)}")

    # Get details for top jobs
    for i, job in enumerate(all_jobs[:15]):
        if job.get("url"):
            print(f"Fetching details for: {job['title']} at {job['company']}")
            details = scraper.get_job_details(job["url"])
            if details:
                job.update(details)
            time.sleep(random.uniform(3, 7))

    # Save results
    with open("linkedin_jobs.json", "w", encoding="utf-8") as f:
        json.dump(all_jobs, f, indent=2, ensure_ascii=False)

    # Create analysis DataFrame
    df = pd.DataFrame(all_jobs)
    df.to_csv("linkedin_jobs.csv", index=False)

    # Basic analysis
    print("\nJobs by location:")
    print(df["location"].value_counts().head(10))
    print(f"\nJobs with salary info: {df['salary'].notna().sum()}")
    print("\nTop hiring companies:")
    print(df["company"].value_counts().head(10))


if __name__ == "__main__":
    main()

Session Management for LinkedIn

LinkedIn is particularly sensitive to session behavior. Here are critical guidelines:

Account Safety

class LinkedInSessionManager:
    """Manage LinkedIn scraping sessions to protect accounts."""

    MAX_DAILY_SEARCHES = 100
    MAX_DAILY_PROFILE_VIEWS = 80
    MAX_DAILY_JOB_VIEWS = 200

    def __init__(self):
        self.daily_counts = {
            "searches": 0,
            "profile_views": 0,
            "job_views": 0,
        }

    def can_perform(self, action_type):
        """Check if an action is within safe daily limits."""
        limits = {
            "searches": self.MAX_DAILY_SEARCHES,
            "profile_views": self.MAX_DAILY_PROFILE_VIEWS,
            "job_views": self.MAX_DAILY_JOB_VIEWS,
        }
        limit = limits.get(action_type, 0)
        return self.daily_counts.get(action_type, 0) < limit

    def record_action(self, action_type):
        """Record an action and return the updated count."""
        self.daily_counts[action_type] = self.daily_counts.get(action_type, 0) + 1
        return self.daily_counts[action_type]
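
Wiring the manager into a scraping loop is straightforward. The sketch below gates detail fetches using the scraper and job list from the pipeline above; resetting the counters at midnight is left out for brevity:

manager = LinkedInSessionManager()

for job in all_jobs:
    if not manager.can_perform("job_views"):
        print("Daily job-view budget exhausted, stopping for today")
        break
    if job.get("url"):
        details = scraper.get_job_details(job["url"])
        if details:
            job.update(details)
        manager.record_action("job_views")
    time.sleep(random.uniform(3, 7))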

Proxy Rotation Strategy

For LinkedIn, avoid rotating proxies too frequently. LinkedIn expects consistent IP addresses within a session. Instead (see the sketch after this list):

  1. Assign one proxy per LinkedIn account session.
  2. Use the same proxy for the duration of a scraping session (30–60 minutes).
  3. Rotate to a new proxy when starting a new session.
  4. Use geographically consistent proxies (your proxy location should match the account’s expected location).
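
A minimal sketch of sticky sessions, assuming a provider that pins an IP via a session ID embedded in the proxy username. The gateway hostname and username format here are hypothetical; check your provider’s docs for the real syntax:

import uuid

def make_session_proxy(user, password, host="gateway.example.com", port=8080):
    """Build a sticky-session proxy URL (hypothetical provider format)."""
    session_id = uuid.uuid4().hex[:8]
    # Many providers encode the session in the username, e.g.
    # "user-session-ab12cd34"; reusing the same ID keeps the same IP
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"

# One sticky proxy for each 30-60 minute scraping session
proxy_url = make_session_proxy("user", "pass")
scraper = LinkedInJobScraper(proxy_url)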

Legal Considerations

LinkedIn scraping carries significant legal weight. In the landmark hiQ Labs v. LinkedIn case, the Ninth Circuit held that scraping publicly available data likely does not violate the Computer Fraud and Abuse Act. The litigation later ended in a settlement, however, with hiQ found to have breached LinkedIn’s User Agreement, so the ruling’s nuances matter:

  • Public data: Scraping publicly accessible job listings without logging in has stronger legal footing.
  • Authenticated scraping: Using login credentials to access and scrape data may violate LinkedIn’s Terms of Service, which constitutes a contractual (not criminal) issue.
  • Data usage: How you use the data matters. Competing directly with LinkedIn’s products using their data creates legal risk.
  • User consent: Collecting personal information from LinkedIn profiles may require consent under GDPR or CCPA.

Always consult legal counsel before implementing LinkedIn scraping at scale for commercial purposes.

Use Cases for LinkedIn Job Data

The job listing data you extract powers numerous applications:

  • Salary benchmarking: Analyze compensation ranges across roles, locations, and industries.
  • Skills demand tracking: Monitor which skills appear most frequently in job descriptions to identify market trends (a small sketch follows this list).
  • Competitive hiring intelligence: Track which companies are hiring, for what roles, and in which locations.
  • Recruitment automation: Build feeds of relevant job listings for candidates or clients.
  • Labor market research: Study hiring trends over time for academic or policy research.
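
As an example of the skills demand use case, the sketch below counts keyword mentions in the descriptions collected by the pipeline above (it assumes the detail-fetch step populated the description column; the skill list is illustrative):

import pandas as pd

df = pd.read_csv("linkedin_jobs.csv")
skills = ["python", "sql", "aws", "docker", "kubernetes", "spark"]  # illustrative

# Count how many job descriptions mention each skill (case-insensitive)
descriptions = df["description"].dropna().str.lower()
demand = {s: int(descriptions.str.contains(s, regex=False).sum()) for s in skills}

for skill, count in sorted(demand.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {count} of {len(descriptions)} postings")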

Conclusion

Scraping LinkedIn job listings requires careful attention to both technical and legal considerations. The public job search API provides a solid foundation for data collection without authentication, while authenticated scraping via Selenium enables access to richer data at the cost of increased complexity and risk.

Rotating mobile proxies from DataResearchTools provide the IP diversity and legitimacy needed to sustain LinkedIn scraping operations. For broader web scraping strategies and proxy concepts, explore our resource library and proxy glossary.

