Naver is South Korea’s dominant search engine with 60%+ market share. scraping it requires handling Korean character encoding (UTF-8), session cookies, and Naver’s bot detection. this guide covers search results, blog posts, and shopping data extraction.
why scrape Naver?
Naver dominates Korean search with over 60% market share, making it essential for SEO research, price monitoring, and content analysis in the Korean market. unlike Google, Naver surfaces its own properties heavily — Naver Blog, Naver Cafe, and Naver Shopping all appear prominently in results. understanding Naver’s SERP structure is critical for Korean market research.
Naver’s search results page (search.naver.com) loads content server-side, making basic HTML scraping feasible for many result types. however, some sections use dynamic loading that requires JavaScript rendering.
setting up your environment
install the required libraries. beautifulsoup4 with lxml is the fastest combination for parsing Naver’s HTML, and requests handles the HTTP layer well.
pip install requests beautifulsoup4 lxml

Naver serves pages in UTF-8, so Python 3's default string handling covers Korean text. setting response.encoding = "utf-8" explicitly is still a good habit — it guards against responses that omit the charset header and prevents mojibake in your output.
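before building request URLs, it helps to confirm how a Korean keyword gets percent-encoded. a quick sanity check with the standard library (the keyword here is "웹 스크래핑", "web scraping"):

```python
import urllib.parse

keyword = "웹 스크래핑"  # "web scraping" in Korean
encoded = urllib.parse.quote(keyword)
print(encoded)
# → %EC%9B%B9%20%EC%8A%A4%ED%81%AC%EB%9E%98%ED%95%91

# the percent-encoding round-trips back to the original Korean string
assert urllib.parse.unquote(encoded) == keyword
```

each Hangul syllable becomes three percent-encoded UTF-8 bytes, which is exactly what Naver's search endpoint expects in the query parameter.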
scraping Naver search results
Naver’s search URL structure is straightforward: https://search.naver.com/search.naver?query=KEYWORD. the main result container uses the class lst_total for organic results. always set a Korean-language Accept-Language header and a realistic user agent.
import requests
from bs4 import BeautifulSoup
import urllib.parse

def scrape_naver_search(keyword, num_results=10):
    encoded = urllib.parse.quote(keyword)
    url = f"https://search.naver.com/search.naver?query={encoded}&display={num_results}"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "ko-KR,ko;q=0.9,en-US;q=0.8",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Referer": "https://www.naver.com/"
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = "utf-8"  # force UTF-8 so Korean text decodes correctly
    soup = BeautifulSoup(response.text, "lxml")
    results = []
    # organic results live inside the .lst_total container
    for item in soup.select(".lst_total .bx"):
        title_el = item.select_one(".title_link")
        desc_el = item.select_one(".dsc_txt")
        if title_el:
            results.append({
                "title": title_el.get_text(strip=True),
                "url": title_el.get("href", ""),
                "description": desc_el.get_text(strip=True) if desc_el else ""
            })
    return results
results = scrape_naver_search("웹 스크래핑")
for r in results:
    print(r["title"], "|", r["url"])

scraping Naver Blog posts
Naver Blog is a major content platform. blog search is accessible at https://search.naver.com/search.naver?where=post&query=KEYWORD. blog posts include publish dates and author handles, which are useful for content research.
def scrape_naver_blog(keyword):
    encoded = urllib.parse.quote(keyword)
    url = f"https://search.naver.com/search.naver?where=post&query={encoded}"
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Accept-Language": "ko-KR,ko;q=0.9",
        "Referer": "https://www.naver.com/"
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = "utf-8"
    soup = BeautifulSoup(response.text, "lxml")
    posts = []
    # each blog result snippet carries the .api_txt_lines class
    for item in soup.select(".api_txt_lines"):
        posts.append(item.get_text(strip=True))
    return posts

using the Naver Search API
Naver provides an official Search API that covers blog, news, shopping, and web search. register at developers.naver.com to get a client ID and secret. the API returns JSON and is the recommended approach for production pipelines — it avoids bot detection entirely.
import requests

CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

def naver_api_search(query, display=10, search_type="webkr"):
    url = f"https://openapi.naver.com/v1/search/{search_type}.json"
    headers = {
        "X-Naver-Client-Id": CLIENT_ID,
        "X-Naver-Client-Secret": CLIENT_SECRET
    }
    params = {"query": query, "display": display}
    response = requests.get(url, headers=headers, params=params, timeout=10)
    response.raise_for_status()  # surface quota or auth errors immediately
    return response.json()

data = naver_api_search("데이터 수집")
for item in data.get("items", []):
    print(item["title"], item["link"])

handling anti-bot measures
Naver implements rate limiting and checks for session cookies. after 50-100 requests from the same IP, you will start seeing CAPTCHAs or empty result pages. use residential proxies and add 2-4 second delays between requests. rotating proxies through Korean ISP addresses works best for location-specific results.
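the delay-and-retry pattern can be wrapped in a small helper. this is a sketch, not Naver-specific code: polite_get and its retry count are hypothetical names and values, and the empty-body check is a stand-in for whatever block signal you actually observe (CAPTCHA markup, missing result container, etc.):

```python
import random
import time

def polite_get(session, url, headers, max_retries=3):
    """Fetch a URL with a randomized 2-4 s delay and simple retry logic.

    `session` can be any object with a requests-style .get() method,
    e.g. a persistent requests.Session() that keeps Naver's cookies.
    """
    for attempt in range(max_retries):
        time.sleep(random.uniform(2, 4))  # human-like pause between requests
        resp = session.get(url, headers=headers, timeout=10)
        # an empty body or non-200 status often indicates a block page
        if resp.status_code == 200 and resp.text.strip():
            return resp
    return None
```

passing a persistent requests.Session() instead of calling requests.get directly also means Naver's session cookies survive across requests, which reduces the chance of being flagged.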
scraping Asian search engines often requires regional proxy infrastructure. our Singapore mobile proxy gives you a Southeast Asian IP footprint with real 4G/5G connections.
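routing requests through a proxy is a one-line change with requests. the host, port, and credentials below are placeholders — substitute your provider's values:

```python
# placeholder proxy endpoint -- substitute your provider's host and credentials
PROXY = "http://USERNAME:PASSWORD@proxy.example.com:8000"

# requests accepts a scheme-to-proxy mapping via the `proxies=` argument
proxies = {"http": PROXY, "https": PROXY}

# usage: requests.get(url, headers=headers, proxies=proxies, timeout=10)
```

the same mapping can be set once on a requests.Session() via session.proxies.update(proxies) so every request in the session exits through the proxy.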
for proxy setup guidance, see what is a proxy server and SOCKS5 vs HTTP proxy. for general scraping concepts, visit what is web scraping.