How to Scrape Naver: Korean Search Engine Data Extraction
Naver dominates the South Korean internet. With over 60% search market share in Korea, it is the gateway to the Korean digital economy. Unlike Google, Naver operates as a closed ecosystem with its own blog platform (Naver Blog), shopping mall (Naver Shopping), knowledge base (Naver Knowledge iN), and news aggregation. For anyone doing SEO, market research, or competitor analysis in the Korean market, Naver data is essential.
This guide covers how to scrape Naver's search results, shopping data, blog content, and trending keywords using Python, with proper proxy configuration for Korean geo-targeting.
Understanding Naver’s Structure
Naver is not just a search engine; it is a platform with multiple integrated services:
- Naver Search: web search results, blended with Naver’s own content
- Naver Shopping: product listings and price comparison
- Naver Blog: user-generated blog content (heavily featured in search results)
- Naver Cafe: community forums
- Naver News: aggregated news from Korean publishers
- Naver Knowledge iN: Q&A platform similar to Quora
- Naver Maps: mapping and local business data
- Naver DataLab: search trend analytics (similar to Google Trends)
Each service has different scraping requirements and challenges.
Setting Up Korean Proxies
Naver serves different content based on your geographic location. To get authentic Korean search results, you need Korean IP addresses:
import httpx

# Korean proxy configuration
PROXY_CONFIGS = {
    # residential proxies with Korean targeting
    "smartproxy_kr": "http://user:pass@gate.smartproxy.com:7777",  # add country=kr
    "bright_data_kr": "http://user-country-kr:pass@brd.superproxy.io:22225",
    "oxylabs_kr": "http://user-country-kr:pass@pr.oxylabs.io:7777"
}

async def test_proxy_location(proxy_url: str) -> dict:
    """verify proxy is routing through Korea."""
    async with httpx.AsyncClient(proxies={"all://": proxy_url}, timeout=15) as client:
        response = await client.get("https://ipinfo.io/json")
        data = response.json()
        print(f"IP: {data.get('ip')}, Country: {data.get('country')}, City: {data.get('city')}")
        return data
Most major proxy providers support Korean residential IPs. Make sure to specify Korea (KR) as the target country in your proxy configuration. Without Korean IPs, Naver may serve limited results or block your requests entirely.
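Before kicking off a crawl, it is worth failing fast if the exit node is not actually in Korea. A minimal sketch against the ipinfo.io response returned by `test_proxy_location` above (`require_korean_ip` is a hypothetical helper, not part of any library):

```python
def require_korean_ip(ipinfo: dict) -> None:
    """Raise if the proxy's exit node is not in South Korea."""
    country = ipinfo.get("country", "")
    if country != "KR":
        raise RuntimeError(f"proxy exits in {country or 'unknown'}, expected KR")

# a Seoul residential exit passes; a non-KR exit raises immediately
require_korean_ip({"ip": "1.2.3.4", "country": "KR", "city": "Seoul"})
```

Run this check once per proxy session rather than per request, since each call to ipinfo.io counts against your proxy bandwidth.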
Scraping Naver Search Results
Using the Naver Search API (Official)
Naver offers an official Search API through the Naver Developers platform. This is the cleanest approach:
import httpx
import json

class NaverSearchAPI:
    """official Naver Search API client."""

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.base_url = "https://openapi.naver.com/v1/search"

    async def search(self, query: str, search_type: str = "webkr",
                     display: int = 10, start: int = 1) -> dict:
        """
        search Naver.

        search_type options:
        - webkr: web results (Korean)
        - news: news articles
        - blog: Naver Blog posts
        - shop: shopping results
        - image: image results
        - cafearticle: Naver Cafe posts
        - kin: Knowledge iN answers
        """
        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.get(
                f"{self.base_url}/{search_type}",
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret
                },
                params={
                    "query": query,
                    "display": display,
                    "start": start,
                    "sort": "sim"  # sim=relevance, date=date
                }
            )
            response.raise_for_status()
            return response.json()

    async def search_all_types(self, query: str) -> dict:
        """search across all Naver verticals for a query."""
        types = ["webkr", "news", "blog", "shop", "cafearticle", "kin"]
        results = {}
        for search_type in types:
            try:
                data = await self.search(query, search_type, display=10)
                results[search_type] = data
            except Exception as e:
                results[search_type] = {"error": str(e)}
        return results

# usage
api = NaverSearchAPI(
    client_id="your_client_id",
    client_secret="your_client_secret"
)
To get API credentials, register at https://developers.naver.com and create an application. The free tier allows 25,000 requests per day, which is generous for most research use cases.
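The API returns matches in an `items` array and typically wraps the matched query terms in `<b>` tags inside titles and descriptions. A sketch for stripping that markup before storage (assuming that response shape; `clean_api_items` is a hypothetical helper):

```python
import re

def clean_api_items(api_response: dict) -> list[dict]:
    """Strip the <b>...</b> highlighting Naver adds around matched terms."""
    tag_pattern = re.compile(r"</?b>")
    cleaned = []
    for item in api_response.get("items", []):
        cleaned.append({
            "title": tag_pattern.sub("", item.get("title", "")),
            "link": item.get("link", ""),
            "description": tag_pattern.sub("", item.get("description", "")),
        })
    return cleaned

sample = {"items": [{"title": "<b>프록시</b> 서버 추천", "link": "https://example.com"}]}
# cleaned title becomes "프록시 서버 추천"
```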
Scraping Search Results Directly
When you need data beyond what the API provides (rankings, featured snippets, ad positions), scrape the search results pages directly:
import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
from urllib.parse import quote

class NaverSearchScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def search(self, query: str, page: int = 1) -> dict:
        """scrape Naver search results page."""
        start = (page - 1) * 10 + 1
        encoded_query = quote(query)
        url = f"https://search.naver.com/search.naver?query={encoded_query}&start={start}"
        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}
            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(
                locale="ko-KR",
                timezone_id="Asia/Seoul",
                viewport={"width": 1920, "height": 1080}
            )
            page_obj = await context.new_page()
            await page_obj.goto(url, wait_until="networkidle")
            html = await page_obj.content()
            await browser.close()
            return self._parse_results(html, query)

    def _parse_results(self, html: str, query: str) -> dict:
        """parse Naver search results HTML."""
        soup = BeautifulSoup(html, "html.parser")
        results = {
            "query": query,
            "organic_results": [],
            "blog_results": [],
            "news_results": [],
            "shopping_results": [],
            "knowledge_panel": None
        }
        # organic web results
        for item in soup.select(".lst_total .bx"):
            try:
                title_el = item.select_one(".total_tit a")
                desc_el = item.select_one(".total_dsc_wrap")
                if title_el:
                    results["organic_results"].append({
                        "title": title_el.get_text(strip=True),
                        "url": title_el.get("href", ""),
                        "description": desc_el.get_text(strip=True) if desc_el else ""
                    })
            except Exception:
                continue
        # blog results (Naver prioritizes its own blogs)
        for item in soup.select(".api_txt_lines.total_tit"):
            try:
                link = item.select_one("a")
                if link and "blog.naver.com" in link.get("href", ""):
                    results["blog_results"].append({
                        "title": link.get_text(strip=True),
                        "url": link.get("href", "")
                    })
            except Exception:
                continue
        # news section
        for item in soup.select(".news_area"):
            try:
                title_el = item.select_one(".news_tit")
                source_el = item.select_one(".info_group a")
                if title_el:
                    results["news_results"].append({
                        "title": title_el.get_text(strip=True),
                        "url": title_el.get("href", ""),
                        "source": source_el.get_text(strip=True) if source_el else ""
                    })
            except Exception:
                continue
        return results
Scraping Naver Shopping
Naver Shopping is Korea's largest product comparison platform; scraping it gives you price intelligence across Korean e-commerce:
class NaverShoppingScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def search_products(self, query: str, sort: str = "rel") -> list:
        """
        scrape Naver Shopping search results.

        sort options: rel (relevance), price_asc, price_dsc, date, review
        """
        encoded = quote(query)
        url = f"https://search.shopping.naver.com/search/all?query={encoded}&sort={sort}"
        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}
            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(locale="ko-KR", timezone_id="Asia/Seoul")
            page = await context.new_page()
            await page.goto(url, wait_until="networkidle")
            await page.wait_for_timeout(2000)
            # scroll to load more products
            for _ in range(3):
                await page.evaluate("window.scrollBy(0, 1000)")
                await page.wait_for_timeout(1000)
            html = await page.content()
            await browser.close()
            return self._parse_products(html)

    def _parse_products(self, html: str) -> list:
        """parse shopping search results."""
        soup = BeautifulSoup(html, "html.parser")
        products = []
        for item in soup.select(".product_item__MDtDF"):
            try:
                product = {}
                title_el = item.select_one(".product_title__Mmw2K")
                product["title"] = title_el.get_text(strip=True) if title_el else ""
                price_el = item.select_one(".price_num__S2p_v")
                if price_el:
                    price_text = price_el.get_text(strip=True).replace(",", "").replace("원", "")
                    product["price_krw"] = int(price_text) if price_text.isdigit() else price_text
                mall_el = item.select_one(".product_mall_title")
                product["mall"] = mall_el.get_text(strip=True) if mall_el else ""
                review_el = item.select_one(".product_num__fafe5")
                product["review_count"] = review_el.get_text(strip=True) if review_el else "0"
                rating_el = item.select_one(".product_grade__QiVVK")
                product["rating"] = rating_el.get_text(strip=True) if rating_el else ""
                link_el = item.select_one("a")
                product["url"] = link_el.get("href", "") if link_el else ""
                if product["title"]:
                    products.append(product)
            except Exception:
                continue
        return products
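For price intelligence, the parsed product list can be summarized into market statistics. A sketch over the `_parse_products` output shape, skipping entries whose price did not parse to an integer (`summarize_prices` is a hypothetical helper):

```python
def summarize_prices(products: list[dict]) -> dict:
    """Aggregate KRW prices, ignoring entries whose price failed to parse."""
    prices = [p["price_krw"] for p in products
              if isinstance(p.get("price_krw"), int)]
    if not prices:
        return {"count": 0}
    return {
        "count": len(prices),
        "min_krw": min(prices),
        "max_krw": max(prices),
        "avg_krw": sum(prices) // len(prices),
    }

sample = [{"price_krw": 12000}, {"price_krw": 15000}, {"price_krw": "품절"}]
# -> {"count": 2, "min_krw": 12000, "max_krw": 15000, "avg_krw": 13500}
```

Tracking `min_krw` per query over time is a cheap way to detect when a competitor undercuts the market.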
Scraping Naver DataLab Trends
Naver DataLab provides search trend data for the Korean market, similar to Google Trends:
class NaverDataLabScraper:
    """scrape search trends from Naver DataLab."""

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret

    async def get_trend(self, keywords: list[str], start_date: str,
                        end_date: str, time_unit: str = "month") -> dict:
        """
        get search trend data via Naver DataLab API.

        time_unit: month, week, date
        dates in format: yyyy-mm-dd
        """
        url = "https://openapi.naver.com/v1/datalab/search"
        keyword_groups = [
            {"groupName": kw, "keywords": [kw]} for kw in keywords
        ]
        payload = {
            "startDate": start_date,
            "endDate": end_date,
            "timeUnit": time_unit,
            "keywordGroups": keyword_groups
        }
        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.post(
                url,
                json=payload,
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret,
                    "Content-Type": "application/json"
                }
            )
            response.raise_for_status()
            return response.json()

    async def get_shopping_trend(self, category: str, start_date: str,
                                 end_date: str) -> dict:
        """get shopping category trend data."""
        url = "https://openapi.naver.com/v1/datalab/shopping/categories"
        payload = {
            "startDate": start_date,
            "endDate": end_date,
            "timeUnit": "month",
            "category": [{"name": category, "param": [category]}]
        }
        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.post(
                url,
                json=payload,
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret,
                    "Content-Type": "application/json"
                }
            )
            response.raise_for_status()
            return response.json()

# usage
datalab = NaverDataLabScraper("your_id", "your_secret")
trend = asyncio.run(datalab.get_trend(
    keywords=["프록시", "VPN", "웹스크래핑"],
    start_date="2025-01-01",
    end_date="2026-03-01"
))
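DataLab reports relative search ratios (scaled so the peak period is 100) per keyword group under `results[n]["data"]`. A sketch that pulls out each keyword's peak period, assuming that response shape (`peak_periods` is a hypothetical helper):

```python
def peak_periods(trend_response: dict) -> dict[str, str]:
    """Map each keyword group to the period with its highest search ratio."""
    peaks = {}
    for group in trend_response.get("results", []):
        data = group.get("data", [])
        if data:
            best = max(data, key=lambda point: point["ratio"])
            peaks[group["title"]] = best["period"]
    return peaks

sample = {"results": [{"title": "프록시", "data": [
    {"period": "2025-01-01", "ratio": 40.2},
    {"period": "2025-02-01", "ratio": 100.0},
]}]}
# peak_periods(sample) -> {"프록시": "2025-02-01"}
```

Because ratios are relative rather than absolute search volumes, compare keywords within a single request rather than across separate requests.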
Handling Korean Text and Encoding
Korean text requires proper encoding handling:
import re

def clean_korean_text(text: str) -> str:
    """clean and normalize Korean text from scraped content."""
    # decode common HTML entities
    text = text.replace("&nbsp;", " ").replace("&amp;", "&")
    # normalize whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    # remove control characters but keep Korean characters
    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', text)
    return text

def extract_korean_keywords(text: str) -> list[str]:
    """extract Korean keyword phrases from text."""
    # match sequences of Korean characters (Hangul)
    korean_pattern = re.compile(r'[가-힣]+(?:\s[가-힣]+)*')
    matches = korean_pattern.findall(text)
    return [m.strip() for m in matches if len(m.strip()) > 1]

def translate_naver_category(category_kr: str) -> str:
    """translate common Naver Shopping categories to English."""
    categories = {
        "디지털/가전": "Digital/Electronics",
        "패션의류": "Fashion",
        "패션잡화": "Fashion Accessories",
        "화장품/미용": "Cosmetics/Beauty",
        "식품": "Food",
        "출산/육아": "Baby/Kids",
        "생활/건강": "Living/Health",
        "스포츠/레저": "Sports/Leisure",
        "도서": "Books"
    }
    return categories.get(category_kr, category_kr)
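These helpers compose naturally into keyword-frequency analysis for scraped pages. A standalone sketch (it re-declares its own Hangul pattern so it runs on its own; `korean_keyword_frequency` is a hypothetical helper):

```python
import re
from collections import Counter

def korean_keyword_frequency(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Count Hangul words of two or more syllables in scraped text."""
    words = re.findall(r"[가-힣]{2,}", text)
    return Counter(words).most_common(top_n)

text = "프록시 서버와 프록시 설정, 그리고 데이터 수집"
# top entry: ("프록시", 2)
```

Running this across the top-ranked blog posts for a query surfaces the vocabulary Naver's algorithm currently rewards.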
Scraping Naver Blog Content
Naver Blog posts heavily influence Korean SEO. Extracting blog content helps with content analysis and keyword research:
class NaverBlogScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def scrape_blog_post(self, blog_url: str) -> dict:
        """scrape a single Naver Blog post."""
        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}
            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(locale="ko-KR")
            page = await context.new_page()
            await page.goto(blog_url, wait_until="networkidle")
            await page.wait_for_timeout(2000)
            # Naver Blog uses iframes for post content
            frames = page.frames
            content_frame = None
            for frame in frames:
                if "PostView" in frame.url or "post-view" in frame.url:
                    content_frame = frame
                    break
            if content_frame:
                html = await content_frame.content()
            else:
                html = await page.content()
            await browser.close()
            return self._parse_blog(html, blog_url)

    def _parse_blog(self, html: str, url: str) -> dict:
        soup = BeautifulSoup(html, "html.parser")
        title_el = soup.select_one(".se-title-text, .pcol1")
        content_el = soup.select_one(".se-main-container, #postViewArea")
        date_el = soup.select_one(".se_publishDate, .date")
        return {
            "url": url,
            "title": title_el.get_text(strip=True) if title_el else "",
            "content": clean_korean_text(content_el.get_text()) if content_el else "",
            "date": date_el.get_text(strip=True) if date_el else "",
            "word_count": len(content_el.get_text().split()) if content_el else 0
        }
Building a Complete Naver Research Pipeline
Here is a full pipeline that combines all the scrapers for competitive research in the Korean market:
import asyncio
import json
from datetime import datetime

class NaverResearchPipeline:
    def __init__(self, api_id: str, api_secret: str, proxy_url: str = None):
        self.api = NaverSearchAPI(api_id, api_secret)
        self.search_scraper = NaverSearchScraper(proxy_url)
        self.shopping_scraper = NaverShoppingScraper(proxy_url)
        self.datalab = NaverDataLabScraper(api_id, api_secret)

    async def research_keyword(self, keyword: str) -> dict:
        """comprehensive research on a Korean keyword."""
        report = {
            "keyword": keyword,
            "researched_at": datetime.now().isoformat(),
            "search_results": {},
            "shopping_data": [],
            "trends": {}
        }
        # search across verticals
        report["search_results"] = await self.api.search_all_types(keyword)
        # shopping products
        report["shopping_data"] = await self.shopping_scraper.search_products(keyword)
        # search trends (last 12 months)
        try:
            report["trends"] = await self.datalab.get_trend(
                keywords=[keyword],
                start_date="2025-03-01",
                end_date="2026-03-01"
            )
        except Exception as e:
            report["trends"] = {"error": str(e)}
        return report

    async def competitive_analysis(self, keywords: list[str]) -> list:
        """analyze multiple keywords for competitive intelligence."""
        results = []
        for kw in keywords:
            print(f"researching: {kw}")
            data = await self.research_keyword(kw)
            results.append(data)
            await asyncio.sleep(3)
        return results

# usage
pipeline = NaverResearchPipeline(
    api_id="your_id",
    api_secret="your_secret",
    proxy_url="http://user-country-kr:pass@proxy.example.com:8080"
)
keywords = ["모바일 프록시", "데이터 수집", "웹 스크래핑 도구"]
results = asyncio.run(pipeline.competitive_analysis(keywords))
with open("naver_research.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
Best Practices for Naver Scraping
- Use the official API first: Naver's API is generous at 25,000 requests/day. Only scrape the HTML when you need data the API does not provide.
- Always use Korean proxies: Naver content and rankings differ significantly between Korean and international IPs. Korean residential proxies are essential for accurate data.
- Set a Korean locale: configure your browser context with `locale="ko-KR"` and `timezone_id="Asia/Seoul"` to get the same experience as Korean users.
- Handle encoding correctly: always save Korean text with `ensure_ascii=False` and UTF-8 encoding.
- Respect rate limits: Naver blocks aggressive scraping quickly. Keep delays of 3-5 seconds between requests.
- Monitor for layout changes: Naver updates its frontend frequently, and CSS selectors that work today may break next week. The API-first approach minimizes this risk.
- Watch for CAPTCHAs: Naver shows CAPTCHAs after repeated automated access. If you hit one, rotate to a fresh proxy IP and add longer delays.
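The rate-limit and CAPTCHA advice above can be baked into a small retry wrapper with jittered exponential backoff. A sketch (`fetch_with_backoff` is a hypothetical helper; combine it with proxy rotation on repeated failures):

```python
import asyncio
import random

async def fetch_with_backoff(fetch, max_attempts: int = 4, base_delay: float = 3.0):
    """Retry an async scrape call with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 3s, 6s, 12s... plus jitter proportional to the base delay,
            # so retries from concurrent workers don't align
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Wrap individual page fetches rather than whole pipeline runs, so one blocked request does not restart an entire keyword batch.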
Conclusion
Naver is the essential platform for anyone doing research, SEO, or market analysis in South Korea. The combination of the official Naver API (for structured search and trend data) and browser-based scraping (for rankings, shopping prices, and blog analysis) gives you comprehensive coverage of the Korean digital market. Korean residential proxies are not optional for this work; they are required for accurate, geo-targeted results that match what Korean users actually see.