403 Forbidden in Web Scraping: How to Fix It
The 403 Forbidden status code is the most common error web scrapers encounter. It means the server understood your request but refuses to fulfill it. In the context of web scraping, this almost always means the server has identified your request as automated and is blocking it.
This guide covers every common cause of 403 errors and provides tested solutions for each scenario.
What Causes 403 Forbidden in Web Scraping?
1. Missing or Incorrect User-Agent
The most frequent cause. Python’s requests library sends python-requests/2.x.x as the default User-Agent, which no real browser would use.
2. Missing Browser Headers
Real browsers send 15-20 headers with every request. Sending just a User-Agent is suspicious.
3. IP Reputation
Your IP address is flagged — either because it’s a datacenter IP, it’s on a blacklist, or you’ve made too many requests.
4. TLS Fingerprint Mismatch
Your TLS handshake doesn’t match any real browser, even if your headers look correct.
5. Geographic Restrictions
The content is restricted to specific countries or regions.
6. Authentication Required
The page requires login or API authentication.
7. Rate Limiting
You’ve exceeded the server’s request rate threshold.
8. WAF Rules
Web Application Firewall rules are blocking your specific request pattern.
9. robots.txt Enforcement
Some servers actively block requests to URLs disallowed by robots.txt.
10. Referrer Checking
The server expects a valid Referer header from navigation within the site.
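Cause #1 is easy to verify locally without hitting any server: Python's HTTP clients announce themselves in their default User-Agent. A quick stdlib check shows what urllib would send (the `requests` library behaves the same way with its `python-requests/2.x.x` string):

```python
import urllib.request

# urllib's default headers include a User-Agent no real browser would send,
# e.g. "Python-urllib/3.12" -- an instant giveaway to anti-bot systems.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers["User-agent"])
```

Any server that compares the User-Agent against known browser strings will flag this immediately, which is why Fix 1 below starts with headers.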
Diagnosing the Cause
Before applying fixes, identify the specific cause:
```python
import requests

def diagnose_403(url):
    """Diagnose why a URL returns 403."""
    tests = {
        "default": {},
        "with_ua": {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"},
        "with_full_headers": {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.9",
            "Accept-Encoding": "gzip, deflate, br",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "none",
        },
    }
    for test_name, headers in tests.items():
        try:
            response = requests.get(url, headers=headers, timeout=15)
            print(f"{test_name}: {response.status_code} ({len(response.text)} bytes)")
            if response.status_code == 200:
                print(f"  -> Fix found: {test_name}")
                return test_name
        except Exception as e:
            print(f"{test_name}: Error - {e}")
    print("Headers alone don't fix it. Try TLS matching or proxies.")
    return None

diagnose_403("https://target-site.com")
```

Fix 1: Set Proper Browser Headers
The simplest and most common fix. Replace minimal headers with a complete browser header set:
```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Ch-Ua": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"Windows"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Cache-Control": "max-age=0",
}

response = requests.get("https://target-site.com", headers=headers)
print(response.status_code)
```

Success rate: This alone fixes 30-40% of 403 errors.
Fix 2: Match TLS Fingerprint
If headers alone don’t work, the server is likely checking TLS fingerprints:
```python
from curl_cffi import requests

response = requests.get(
    "https://target-site.com",
    impersonate="chrome"
)
print(response.status_code)
```

Success rate: Fixes an additional 20-30% of 403 errors.
For a detailed explanation, see our TLS/JA3 fingerprinting guide.
Fix 3: Use Residential Proxies
If TLS matching doesn’t help, the issue is likely IP-based:
```python
from curl_cffi import requests

proxies = {
    "http": "http://user:pass@residential-proxy.com:7777",
    "https": "http://user:pass@residential-proxy.com:7777",
}

response = requests.get(
    "https://target-site.com",
    impersonate="chrome",
    proxies=proxies
)
print(response.status_code)
```

Success rate: Fixes most remaining 403 errors when combined with proper headers and TLS matching.
See our proxy provider reviews for recommendations.
Fix 4: Add Referrer Header
Some sites check that requests come from within their own site:
```python
headers["Referer"] = "https://target-site.com/"
headers["Origin"] = "https://target-site.com"

response = requests.get("https://target-site.com/products", headers=headers)
```

For AJAX/API requests, the Referer should be the page that triggered the request:
```python
# When accessing an API endpoint called by a specific page
api_headers = {
    **headers,
    "Referer": "https://target-site.com/search-page",
    "X-Requested-With": "XMLHttpRequest",
    "Accept": "application/json",
}
```

Fix 5: Handle Cookies Properly
Many sites set required cookies on the homepage that must be present for subsequent requests:
```python
from curl_cffi import requests

session = requests.Session(impersonate="chrome")

# Step 1: Visit homepage to get cookies
home_response = session.get("https://target-site.com/")
print(f"Homepage: {home_response.status_code}")
print(f"Cookies: {dict(session.cookies)}")

# Step 2: Now access the target page (with cookies)
target_response = session.get("https://target-site.com/products")
print(f"Target: {target_response.status_code}")
```

Fix 6: Use Browser Automation
When all HTTP-level approaches fail, use a real browser:
```python
import undetected_chromedriver as uc
import time

driver = uc.Chrome()
driver.get("https://target-site.com/products")
time.sleep(5)

if "403" not in driver.title:
    print("Success!")
    content = driver.page_source
    # Extract cookies for future use
    cookies = {c["name"]: c["value"] for c in driver.get_cookies()}

driver.quit()
```

See our Undetected ChromeDriver tutorial for full setup.
Fix 7: Rotate User-Agents
Using the same User-Agent for thousands of requests gets flagged:
```python
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
]

def get_random_headers():
    ua = random.choice(user_agents)
    return {
        "User-Agent": ua,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
    }
```

For a comprehensive approach, see our User-Agent rotation guide.
Fix 8: Slow Down Request Rate
403 can be a response to rate limiting (instead of the more explicit 429):
```python
import time
import random

# Assumes `urls` and `session` are defined as in the earlier examples
for url in urls:
    response = session.get(url, headers=get_random_headers())
    if response.status_code == 403:
        print("Got 403, backing off...")
        time.sleep(random.uniform(30, 60))  # Long pause
        continue
    # Normal delay between requests
    time.sleep(random.uniform(2, 5))
```

For detailed rate limiting strategies, see our rate limiting guide.
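A fixed 30-60 second pause works, but repeated 403s are usually handled better by backing off exponentially. A minimal sketch (the function name and parameters here are illustrative, not from any library):

```python
import random

def backoff_delay(attempt, base=2.0, cap=120.0):
    """Delay for the Nth consecutive 403: 2s, 4s, 8s... capped at `cap`,
    with jitter so multiple workers don't retry in lockstep."""
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)
```

Call `time.sleep(backoff_delay(attempt))` after each consecutive 403 and reset `attempt` to zero on the first successful response.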
Fix 9: Use the Site’s API
Check for API endpoints that might be less restricted:
```python
# Open browser DevTools > Network tab
# Navigate the site normally
# Look for XHR/Fetch requests to API endpoints
# Common patterns:
api_endpoints = [
    "https://target-site.com/api/v1/products",
    "https://target-site.com/api/search",
    "https://api.target-site.com/v2/items",
    "https://target-site.com/graphql",
]

for endpoint in api_endpoints:
    response = session.get(endpoint, headers={
        "Accept": "application/json",
        "X-Requested-With": "XMLHttpRequest",
    })
    print(f"{endpoint}: {response.status_code}")
```

Fix 10: Change Request Method
Some WAF rules only block specific HTTP methods:
```python
# If GET is blocked, try POST
response = session.post(
    "https://target-site.com/api/search",
    json={"query": "keyword"},
    headers=headers
)

# Or try HEAD first to check accessibility
head_response = session.head("https://target-site.com/page", headers=headers)
print(f"HEAD: {head_response.status_code}")
```

Systematic Fix Approach
Apply fixes in order from simplest to most complex:
```python
from urllib.parse import urlparse
import time

from curl_cffi import requests
import undetected_chromedriver as uc

def resilient_fetch(url, proxy=None):
    """Try increasingly sophisticated methods to fetch a URL."""
    proxies = {"http": proxy, "https": proxy} if proxy else None

    # Level 1: curl_cffi with browser impersonation
    try:
        response = requests.get(
            url,
            impersonate="chrome",
            proxies=proxies,
            timeout=15
        )
        if response.status_code == 200:
            return {"method": "curl_cffi", "content": response.text}
    except Exception:
        pass

    # Level 2: curl_cffi with session (cookie accumulation)
    try:
        session = requests.Session(impersonate="chrome")
        if proxies:
            session.proxies = proxies
        parsed = urlparse(url)
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        session.get(base_url)  # Get cookies
        time.sleep(1)
        response = session.get(url)
        if response.status_code == 200:
            return {"method": "curl_cffi_session", "content": response.text}
    except Exception:
        pass

    # Level 3: Browser automation
    try:
        options = uc.ChromeOptions()
        if proxy:
            options.add_argument(f"--proxy-server={proxy}")
        driver = uc.Chrome(options=options)
        driver.get(url)
        time.sleep(8)
        content = driver.page_source
        driver.quit()
        if len(content) > 1000:
            return {"method": "browser", "content": content}
    except Exception:
        pass

    return None

result = resilient_fetch(
    "https://target-site.com/products",
    proxy="http://user:pass@residential-proxy:7777"
)
if result:
    print(f"Success with {result['method']}: {len(result['content'])} bytes")
```

Common 403 Patterns by Website Type
E-Commerce Sites (Amazon, eBay, etc.)
Typically check User-Agent, rate, and IP reputation. Fix with residential proxies + proper headers.
Social Media (LinkedIn, Facebook, etc.)
Strict authentication requirements. Usually need logged-in sessions + browser automation.
News Sites
Often use CDN-level blocks. Fix with TLS fingerprint matching + geographic proxies.
Government/Public Data Sites
Frequently block non-standard User-Agents. Fix with proper headers alone.
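The per-category recommendations above can be captured as a simple lookup so a scraper starts with the most likely fix instead of always escalating from scratch. A sketch (the category names and strategy labels below just mirror the list above and are not from any library):

```python
# Recommended fix order per site category (mirrors the patterns above)
FIX_STRATEGIES = {
    "ecommerce": ["headers", "residential_proxy"],
    "social": ["login_session", "browser_automation"],
    "news": ["tls_fingerprint", "geo_proxy"],
    "government": ["headers"],
}

def fixes_for(category):
    # Unknown categories fall back to the full escalation ladder
    return FIX_STRATEGIES.get(category, ["headers", "tls_fingerprint",
                                         "residential_proxy", "browser_automation"])

print(fixes_for("news"))  # ['tls_fingerprint', 'geo_proxy']
```

Each strategy label would map to one of the fixes above (Fix 1, Fix 2, Fix 3, Fix 6) in a real pipeline.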
Conclusion
403 Forbidden errors in web scraping have many causes, but they follow a predictable hierarchy. Start with the simplest fix (proper headers), escalate to TLS fingerprint matching, add residential proxies if needed, and fall back to browser automation for the most protected sites.
The systematic approach outlined above handles 95%+ of 403 errors. For the remaining edge cases, site-specific analysis of headers, cookies, and API endpoints is needed.
For related guides, see our articles on Cloudflare Error 1020, IP bans, and how websites detect bots.
Related Reading
- Best CAPTCHA Solving Services in 2026: Complete Comparison
- Browser Fingerprinting: What It Is and How to Prevent It
- Anti-Phishing with Proxies: How Security Teams Use Mobile IPs
- Brand Protection with Proxies: Detect Counterfeit Sellers & Trademark Violations
- How Cybersecurity Teams Use Proxies for Threat Intelligence
- Using Mobile Proxies for Dark Web Monitoring and Research