How to Scrape LinkedIn Job Listings with Rotating Proxies
LinkedIn is the world’s largest professional network and one of the most comprehensive sources of job market data. With millions of job listings posted across every industry and geography, LinkedIn data powers salary research, labor market analysis, competitive intelligence, and recruitment automation.
However, LinkedIn maintains some of the most aggressive anti-scraping defenses of any website. Their systems are specifically designed to detect and block automated access, making a robust proxy strategy absolutely essential. This guide covers how to scrape LinkedIn job listings using Python and rotating mobile proxies.
Why LinkedIn Is Exceptionally Hard to Scrape
LinkedIn’s anti-bot measures go beyond typical website protections:
- Authentication walls: Most valuable content requires login, and LinkedIn tracks session behavior intensively.
- Rate limiting tiers: Even logged-in users face strict limits on profile views, searches, and page loads.
- Account restrictions: Accounts flagged for automation are throttled, restricted, or permanently banned.
- Legal enforcement: LinkedIn has a history of pursuing legal action against scrapers, including the landmark hiQ Labs case.
- JavaScript rendering: Key content loads dynamically, requiring browser-level rendering.
- CSRF tokens: API endpoints require valid CSRF tokens that must be extracted from page loads.
For LinkedIn scraping, mobile proxies offer the best balance of reliability and safety: mobile carriers route many real users through shared IPs (CGNAT), so LinkedIn cannot block a mobile IP outright without cutting off legitimate app traffic.
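On the CSRF point above: a commonly observed (unofficial, subject to change) pattern is that LinkedIn's internal endpoints expect the `JSESSIONID` cookie value echoed back as a `csrf-token` header. A minimal sketch of that pattern, with the helper name `csrf_header` being our own:

```python
def csrf_header(jsessionid: str) -> dict:
    """Echo a JSESSIONID cookie value back as a csrf-token header.

    LinkedIn stores the value wrapped in double quotes (e.g. '"ajax:123"'),
    which must be stripped before sending it as a header.
    """
    return {"csrf-token": jsessionid.strip('"')} if jsessionid else {}

# Usage sketch with a requests session (network call omitted):
# session.get("https://www.linkedin.com/jobs")          # populates cookies
# token = session.cookies.get("JSESSIONID", "")
# session.headers.update(csrf_header(token))
```

This is an observed behavior of the site, not a documented API; treat it as an assumption and verify against live responses.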
Setting Up Your Environment
pip install requests beautifulsoup4 lxml pandas selenium webdriver-manager
Approach 1: Scraping Public Job Listings (No Login Required)
LinkedIn makes some job listing data available without authentication through their public job search pages.
Configure Proxy and Session
import requests
from bs4 import BeautifulSoup
import time
import random
import json
import pandas as pd
from urllib.parse import quote_plus

class LinkedInJobScraper:
    """Scrape LinkedIn job listings using public endpoints."""

    BASE_URL = "https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search"

    def __init__(self, proxy_url):
        self.session = requests.Session()
        self.session.proxies = {
            "http": proxy_url,
            "https": proxy_url,
        }
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/120.0.0.0 Safari/537.36"
            ),
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Connection": "keep-alive",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
        })
    def search_jobs(self, keywords, location="", num_pages=5):
        """Search for job listings with pagination."""
        all_jobs = []
        for page in range(num_pages):
            start = page * 25
            params = {
                "keywords": keywords,
                "location": location,
                "start": start,
                "f_TPR": "r604800",  # Past week
                "sortBy": "DD",      # Date posted
            }
            try:
                response = self.session.get(
                    self.BASE_URL, params=params, timeout=20
                )
                if response.status_code == 200:
                    jobs = self._parse_job_cards(response.text)
                    if not jobs:
                        print(f"No more results at page {page + 1}")
                        break
                    all_jobs.extend(jobs)
                    print(f"Page {page + 1}: found {len(jobs)} jobs (total: {len(all_jobs)})")
                elif response.status_code == 429:
                    print("Rate limited, waiting...")
                    time.sleep(random.uniform(30, 60))
                    continue  # skips this page; add a retry loop if you need every page
                else:
                    print(f"Status {response.status_code} on page {page + 1}")
            except requests.exceptions.RequestException as e:
                print(f"Request error: {e}")
            time.sleep(random.uniform(3, 7))
        return all_jobs
    def _parse_job_cards(self, html):
        """Parse job listing cards from search results HTML."""
        soup = BeautifulSoup(html, "lxml")
        jobs = []
        cards = soup.find_all("div", class_="base-card")
        for card in cards:
            job = {}
            # Job title
            title_el = card.find("h3", class_="base-search-card__title")
            job["title"] = title_el.get_text(strip=True) if title_el else None
            # Company name
            company_el = card.find("h4", class_="base-search-card__subtitle")
            job["company"] = company_el.get_text(strip=True) if company_el else None
            # Location
            location_el = card.find("span", class_="job-search-card__location")
            job["location"] = location_el.get_text(strip=True) if location_el else None
            # Date posted
            date_el = card.find("time", class_="job-search-card__listdate")
            if date_el:
                job["date_posted"] = date_el.get("datetime")
            else:
                date_el = card.find("time", class_="job-search-card__listdate--new")
                job["date_posted"] = date_el.get("datetime") if date_el else None
            # Job URL
            link_el = card.find("a", class_="base-card__full-link")
            job["url"] = link_el.get("href", "").split("?")[0] if link_el else None
            # Job ID from URL
            if job["url"]:
                job["job_id"] = job["url"].split("-")[-1] if "-" in job["url"] else None
            # Salary (if shown)
            salary_el = card.find("span", class_="job-search-card__salary-info")
            job["salary"] = salary_el.get_text(strip=True) if salary_el else None
            if job["title"]:
                jobs.append(job)
        return jobs
Scrape Individual Job Details
    def get_job_details(self, job_url):
        """Fetch detailed information for a single job listing."""
        try:
            # Use the guest view endpoint
            response = self.session.get(job_url, timeout=20)
            if response.status_code != 200:
                return None
            soup = BeautifulSoup(response.text, "lxml")
            details = {}
            # Job description
            desc_el = soup.find("div", class_="show-more-less-html__markup")
            if desc_el:
                details["description"] = desc_el.get_text(strip=True)
                details["description_html"] = str(desc_el)
            # Job criteria: seniority, employment type, function, industry
            criteria = soup.find_all("li", class_="description__job-criteria-item")
            for criterion in criteria:
                label = criterion.find("h3")
                value = criterion.find("span")
                if label and value:
                    label_text = label.get_text(strip=True).lower()
                    value_text = value.get_text(strip=True)
                    if "seniority" in label_text:
                        details["seniority_level"] = value_text
                    elif "employment" in label_text:
                        details["employment_type"] = value_text
                    elif "function" in label_text:
                        details["job_function"] = value_text
                    elif "industr" in label_text:
                        details["industry"] = value_text
            # Number of applicants
            applicants_el = soup.find("span", class_="num-applicants__caption")
            if applicants_el:
                details["applicants"] = applicants_el.get_text(strip=True)
            return details
        except Exception as e:
            print(f"Error fetching job details: {e}")
            return None
Approach 2: Authenticated Scraping with Selenium
For more comprehensive data, authenticated scraping through a browser provides access to additional fields.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

class LinkedInAuthenticatedScraper:
    """Authenticated LinkedIn scraper using Selenium."""

    def __init__(self, proxy_host, proxy_port, proxy_user, proxy_pass):
        options = Options()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-blink-features=AutomationControlled")
        # Note: Chrome's --proxy-server flag does not accept credentials.
        # Use IP-whitelisted proxies, or load proxy_user/proxy_pass via a
        # proxy-auth extension or a tool such as selenium-wire.
        options.add_argument(f"--proxy-server=http://{proxy_host}:{proxy_port}")
        options.add_argument("--window-size=1920,1080")
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        service = Service(ChromeDriverManager().install())
        self.driver = webdriver.Chrome(service=service, options=options)
        self.driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument",
            {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
        )

    def login(self, email, password):
        """Log in to LinkedIn."""
        self.driver.get("https://www.linkedin.com/login")
        time.sleep(random.uniform(2, 4))
        email_field = self.driver.find_element(By.ID, "username")
        email_field.send_keys(email)
        time.sleep(random.uniform(0.5, 1.5))
        pass_field = self.driver.find_element(By.ID, "password")
        pass_field.send_keys(password)
        time.sleep(random.uniform(0.5, 1))
        submit_btn = self.driver.find_element(
            By.CSS_SELECTOR, "button[type='submit']"
        )
        submit_btn.click()
        time.sleep(random.uniform(3, 6))
        # Check for security verification
        if "checkpoint" in self.driver.current_url:
            print("Security checkpoint detected. Manual intervention may be needed.")
            return False
        return True
    def search_jobs_authenticated(self, keywords, location="", num_results=50):
        """Search jobs while logged in for richer data."""
        encoded_kw = quote_plus(keywords)
        encoded_loc = quote_plus(location)
        base_url = (
            f"https://www.linkedin.com/jobs/search/"
            f"?keywords={encoded_kw}&location={encoded_loc}"
        )
        self.driver.get(base_url)
        time.sleep(random.uniform(3, 5))
        jobs = []
        processed_ids = set()
        # Scroll to load results
        for scroll in range(num_results // 25 + 1):
            self.driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight)"
            )
            time.sleep(random.uniform(2, 4))
            # Click "See more jobs" if visible
            try:
                see_more = self.driver.find_element(
                    By.CSS_SELECTOR, "button.infinite-scroller__show-more-button"
                )
                see_more.click()
                time.sleep(random.uniform(2, 3))
            except Exception:
                pass
            # Extract job cards (re-scanned each scroll; processed_ids deduplicates)
            cards = self.driver.find_elements(
                By.CSS_SELECTOR, "li.jobs-search-results__list-item"
            )
            for card in cards[:num_results]:
                try:
                    job_id = card.get_attribute("data-occludable-job-id")
                    if not job_id or job_id in processed_ids:
                        continue
                    processed_ids.add(job_id)
                    # Click the job card to load details
                    card.click()
                    time.sleep(random.uniform(1.5, 3))
                    job = {"job_id": job_id}
                    # Title
                    title_el = self.driver.find_elements(
                        By.CSS_SELECTOR, "h2.jobs-unified-top-card__job-title a"
                    )
                    job["title"] = title_el[0].text if title_el else None
                    # Company
                    company_el = self.driver.find_elements(
                        By.CSS_SELECTOR, "span.jobs-unified-top-card__company-name a"
                    )
                    job["company"] = company_el[0].text if company_el else None
                    # Location
                    loc_el = self.driver.find_elements(
                        By.CSS_SELECTOR, "span.jobs-unified-top-card__bullet"
                    )
                    job["location"] = loc_el[0].text if loc_el else None
                    # Salary (scan the top-card insights for pay text)
                    salary_el = self.driver.find_elements(
                        By.CSS_SELECTOR, "span.jobs-unified-top-card__job-insight span"
                    )
                    for sel in salary_el:
                        text = sel.text
                        if "$" in text or "yr" in text:
                            job["salary"] = text
                            break
                    jobs.append(job)
                except Exception as e:
                    print(f"Error processing card: {e}")
                    continue
        return jobs

    def close(self):
        """Close the browser."""
        self.driver.quit()
Running the Complete Pipeline
def main():
    proxy_url = "http://user:pass@proxy.dataresearchtools.com:8080"
    # Method 1: Public scraping (no login)
    scraper = LinkedInJobScraper(proxy_url)
    # Search for jobs
    search_queries = [
        {"keywords": "python developer", "location": "San Francisco, CA"},
        {"keywords": "data engineer", "location": "New York, NY"},
        {"keywords": "machine learning engineer", "location": "Remote"},
    ]
    all_jobs = []
    for query in search_queries:
        print(f"\nSearching: {query['keywords']} in {query['location']}")
        jobs = scraper.search_jobs(
            keywords=query["keywords"],
            location=query["location"],
            num_pages=3,
        )
        all_jobs.extend(jobs)
        time.sleep(random.uniform(5, 10))
    print(f"\nTotal jobs found: {len(all_jobs)}")
    # Get details for top jobs
    for job in all_jobs[:15]:
        if job.get("url"):
            print(f"Fetching details for: {job['title']} at {job['company']}")
            details = scraper.get_job_details(job["url"])
            if details:
                job.update(details)
            time.sleep(random.uniform(3, 7))
    # Save results
    with open("linkedin_jobs.json", "w", encoding="utf-8") as f:
        json.dump(all_jobs, f, indent=2, ensure_ascii=False)
    # Create analysis DataFrame
    df = pd.DataFrame(all_jobs)
    df.to_csv("linkedin_jobs.csv", index=False)
    # Basic analysis
    print("\nJobs by location:")
    print(df["location"].value_counts().head(10))
    print(f"\nJobs with salary info: {df['salary'].notna().sum()}")
    print("\nTop hiring companies:")
    print(df["company"].value_counts().head(10))

if __name__ == "__main__":
    main()
Session Management for LinkedIn
LinkedIn is particularly sensitive to session behavior. Here are critical guidelines:
Account Safety
class LinkedInSessionManager:
    """Manage LinkedIn scraping sessions to protect accounts."""

    MAX_DAILY_SEARCHES = 100
    MAX_DAILY_PROFILE_VIEWS = 80
    MAX_DAILY_JOB_VIEWS = 200

    def __init__(self):
        self.daily_counts = {
            "searches": 0,
            "profile_views": 0,
            "job_views": 0,
        }

    def can_perform(self, action_type):
        """Check if an action is within safe daily limits."""
        limits = {
            "searches": self.MAX_DAILY_SEARCHES,
            "profile_views": self.MAX_DAILY_PROFILE_VIEWS,
            "job_views": self.MAX_DAILY_JOB_VIEWS,
        }
        limit = limits.get(action_type, 0)
        return self.daily_counts.get(action_type, 0) < limit

    def record_action(self, action_type):
        """Record an action and return the updated count."""
        self.daily_counts[action_type] = self.daily_counts.get(action_type, 0) + 1
        return self.daily_counts[action_type]
Proxy Rotation Strategy
For LinkedIn, avoid rotating proxies too frequently. LinkedIn expects consistent IP addresses within a session. Instead:
- Assign one proxy per LinkedIn account session.
- Use the same proxy for the duration of a scraping session (30-60 minutes).
- Rotate to a new proxy when starting a new session.
- Use geographically consistent proxies (your proxy location should match the account’s expected location).
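The guidelines above can be sketched as a small helper that pins one proxy per session and rotates only when the session expires. The pool URLs and the 45-minute default are illustrative, not prescriptive:

```python
import itertools
import time

class SessionProxyRotator:
    """Pin one proxy per scraping session; rotate only between sessions."""

    def __init__(self, proxy_pool, session_minutes=45):
        self._pool = itertools.cycle(proxy_pool)
        self._session_seconds = session_minutes * 60
        self._current = None
        self._started_at = 0.0

    def current_proxy(self):
        """Return the session's proxy, rotating only after session expiry."""
        now = time.monotonic()
        if self._current is None or now - self._started_at > self._session_seconds:
            self._current = next(self._pool)
            self._started_at = now
        return self._current

# Hypothetical pool; real gateway URLs come from your proxy provider.
rotator = SessionProxyRotator([
    "http://user:pass@us-mobile-1.example.com:8080",
    "http://user:pass@us-mobile-2.example.com:8080",
])
proxies = {"http": rotator.current_proxy(), "https": rotator.current_proxy()}
```

Because both dict values call `current_proxy()` within the same session window, HTTP and HTTPS traffic share one IP, which matches LinkedIn's expectation of a consistent address per session.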
Legal Considerations
LinkedIn scraping carries significant legal weight. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly available data does not violate the Computer Fraud and Abuse Act; the litigation nonetheless ended with hiQ losing on breach-of-contract grounds and agreeing to an injunction. The ruling therefore has important nuances:
- Public data: Scraping publicly accessible job listings without logging in has stronger legal footing.
- Authenticated scraping: Using login credentials to access and scrape data may violate LinkedIn’s Terms of Service, which constitutes a contractual (not criminal) issue.
- Data usage: How you use the data matters. Competing directly with LinkedIn’s products using their data creates legal risk.
- User consent: Collecting personal information from LinkedIn profiles may require consent under GDPR or CCPA.
Always consult legal counsel before implementing LinkedIn scraping at scale for commercial purposes.
Use Cases for LinkedIn Job Data
The job listing data you extract powers numerous applications:
- Salary benchmarking: Analyze compensation ranges across roles, locations, and industries.
- Skills demand tracking: Monitor which skills appear most frequently in job descriptions to identify market trends.
- Competitive hiring intelligence: Track which companies are hiring, for what roles, and in which locations.
- Recruitment automation: Build feeds of relevant job listings for candidates or clients.
- Labor market research: Study hiring trends over time for academic or policy research.
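As an illustration of the skills-demand use case, the `description` fields written to `linkedin_jobs.json` by the pipeline above can be tallied against a keyword list. This is a sketch; the skills list is an arbitrary example:

```python
import json
from collections import Counter

SKILLS = ["python", "sql", "aws", "spark", "kubernetes", "docker"]

def count_skill_mentions(jobs):
    """Count how many job descriptions mention each skill at least once."""
    counts = Counter()
    for job in jobs:
        description = (job.get("description") or "").lower()
        for skill in SKILLS:
            if skill in description:
                counts[skill] += 1
    return counts

# Usage: load the JSON written by the pipeline and print a demand ranking.
# with open("linkedin_jobs.json", encoding="utf-8") as f:
#     jobs = json.load(f)
# for skill, n in count_skill_mentions(jobs).most_common():
#     print(f"{skill}: {n} listings")
```

Naive substring matching will overcount short skill names embedded in other words; for production analysis, switch to word-boundary regexes.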
Conclusion
Scraping LinkedIn job listings requires careful attention to both technical and legal considerations. The public job search API provides a solid foundation for data collection without authentication, while authenticated scraping via Selenium enables access to richer data at the cost of increased complexity and risk.
Rotating mobile proxies from DataResearchTools provide the IP diversity and legitimacy needed to sustain LinkedIn scraping operations. For broader web scraping strategies and proxy concepts, explore our resource library and proxy glossary.
Related Reading
- How to Scrape Amazon Product Data with Proxies: 2026 Python Guide
- How to Scrape Bing Search Results with Python and Proxies
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix