How to Scrape Naver: Korean Search Engine Data Extraction

Naver dominates the South Korean internet. With over 60% of the search market, it is the gateway to the Korean digital economy. Unlike Google, Naver operates as a closed ecosystem with its own blog platform (Naver Blog), shopping mall (Naver Shopping), knowledge base (Naver Knowledge iN), and news aggregation. For anyone doing SEO, market research, or competitor analysis in the Korean market, Naver data is essential.

This guide covers how to scrape Naver’s search results, shopping data, blog content, and trending keywords using Python, with proper proxy configuration for Korean geo-targeting.

Understanding Naver’s Structure

Naver is not just a search engine. It is a platform with multiple integrated services:

  • Naver Search: web search results, blended with Naver’s own content
  • Naver Shopping: product listings and price comparison
  • Naver Blog: user-generated blog content (heavily featured in search results)
  • Naver Cafe: community forums
  • Naver News: aggregated news from Korean publishers
  • Naver Knowledge iN: Q&A platform similar to Quora
  • Naver Maps: mapping and local business data
  • Naver DataLab: search trend analytics (similar to Google Trends)

Each service has different scraping requirements and challenges.

Setting Up Korean Proxies

Naver serves different content based on your geographic location. To get authentic Korean search results, you need Korean IP addresses:

import httpx

# Korean proxy configuration
PROXY_CONFIGS = {
    # residential proxies with Korean targeting
    "smartproxy_kr": "http://user:pass@gate.smartproxy.com:7777",  # add country=kr
    "bright_data_kr": "http://user-country-kr:pass@brd.superproxy.io:22225",
    "oxylabs_kr": "http://user-country-kr:pass@pr.oxylabs.io:7777"
}

async def test_proxy_location(proxy_url: str) -> dict:
    """verify proxy is routing through Korea."""
    # httpx 0.26+ takes a single proxy= URL (older versions used proxies={"all://": ...})
    async with httpx.AsyncClient(proxy=proxy_url, timeout=15) as client:
        response = await client.get("https://ipinfo.io/json")
        data = response.json()
        print(f"IP: {data.get('ip')}, Country: {data.get('country')}, City: {data.get('city')}")
        return data

Most major proxy providers support Korean residential IPs. Make sure to specify Korea (KR) as the target country in your proxy configuration. Without Korean IPs, Naver may serve limited results or block your requests entirely.
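
For example, a quick pre-flight check before a scraping run might look like this (the credentials below are placeholders for your provider's Korean gateway):

import asyncio

# placeholder credentials, substitute your provider's Korean gateway
kr_proxy = "http://user:pass@gate.smartproxy.com:7777"

location = asyncio.run(test_proxy_location(kr_proxy))
if location.get("country") != "KR":
    raise RuntimeError(f"proxy is not exiting in Korea: {location.get('country')}")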

Scraping Naver Search Results

Using the Naver Search API (Official)

Naver offers an official Search API through the Naver Developers platform. This is the cleanest approach:

import httpx
import json

class NaverSearchAPI:
    """official Naver Search API client."""

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.base_url = "https://openapi.naver.com/v1/search"

    async def search(self, query: str, search_type: str = "webkr",
                     display: int = 10, start: int = 1) -> dict:
        """
        search Naver.

        search_type options:
        - webkr: web results (Korean)
        - news: news articles
        - blog: Naver Blog posts
        - shop: shopping results
        - image: image results
        - cafearticle: Naver Cafe posts
        - kin: Knowledge iN answers
        """
        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.get(
                f"{self.base_url}/{search_type}",
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret
                },
                params={
                    "query": query,
                    "display": display,
                    "start": start,
                    "sort": "sim"  # sim=relevance, date=date
                }
            )
            response.raise_for_status()
            return response.json()

    async def search_all_types(self, query: str) -> dict:
        """search across all Naver verticals for a query."""
        types = ["webkr", "news", "blog", "shop", "cafearticle", "kin"]
        results = {}

        for search_type in types:
            try:
                data = await self.search(query, search_type, display=10)
                results[search_type] = data
            except Exception as e:
                results[search_type] = {"error": str(e)}

        return results

# usage
api = NaverSearchAPI(
    client_id="your_client_id",
    client_secret="your_client_secret"
)

To get API credentials, register at https://developers.naver.com and create an application. The free tier allows 25,000 requests per day, which is generous for most research use cases.
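
A minimal run with those credentials might look like the sketch below; per Naver's API docs, each result item carries fields such as title and link:

import asyncio

async def main():
    # top 10 shopping results for a sample query ("노트북" = laptop)
    data = await api.search("노트북", search_type="shop", display=10)
    for item in data.get("items", []):
        print(item["title"], item["link"])

asyncio.run(main())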

Scraping Search Results Directly

When you need data beyond what the API provides (rankings, featured snippets, ad positions), scrape the search results pages directly:

import asyncio
from playwright.async_api import async_playwright
from bs4 import BeautifulSoup
from urllib.parse import quote

class NaverSearchScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def search(self, query: str, page: int = 1) -> dict:
        """scrape Naver search results page."""
        start = (page - 1) * 10 + 1
        encoded_query = quote(query)
        url = f"https://search.naver.com/search.naver?query={encoded_query}&start={start}"

        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}

            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(
                locale="ko-KR",
                timezone_id="Asia/Seoul",
                viewport={"width": 1920, "height": 1080}
            )
            page_obj = await context.new_page()
            await page_obj.goto(url, wait_until="networkidle")

            html = await page_obj.content()
            await browser.close()

            return self._parse_results(html, query)

    def _parse_results(self, html: str, query: str) -> dict:
        """parse Naver search results HTML."""
        soup = BeautifulSoup(html, "html.parser")
        results = {
            "query": query,
            "organic_results": [],
            "blog_results": [],
            "news_results": [],
            "shopping_results": [],
            "knowledge_panel": None
        }

        # organic web results
        for item in soup.select(".lst_total .bx"):
            try:
                title_el = item.select_one(".total_tit a")
                desc_el = item.select_one(".total_dsc_wrap")
                if title_el:
                    results["organic_results"].append({
                        "title": title_el.get_text(strip=True),
                        "url": title_el.get("href", ""),
                        "description": desc_el.get_text(strip=True) if desc_el else ""
                    })
            except Exception:
                continue

        # blog results (Naver prioritizes its own blogs)
        for item in soup.select(".api_txt_lines.total_tit"):
            try:
                link = item.select_one("a")
                if link and "blog.naver.com" in link.get("href", ""):
                    results["blog_results"].append({
                        "title": link.get_text(strip=True),
                        "url": link.get("href", "")
                    })
            except Exception:
                continue

        # news section
        for item in soup.select(".news_area"):
            try:
                title_el = item.select_one(".news_tit")
                source_el = item.select_one(".info_group a")
                if title_el:
                    results["news_results"].append({
                        "title": title_el.get_text(strip=True),
                        "url": title_el.get("href", ""),
                        "source": source_el.get_text(strip=True) if source_el else ""
                    })
            except Exception:
                continue

        return results
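
A minimal usage sketch (the proxy URL is a placeholder):

scraper = NaverSearchScraper(proxy_url="http://user-country-kr:pass@proxy.example.com:8080")
serp = asyncio.run(scraper.search("모바일 프록시", page=1))
print(f"organic: {len(serp['organic_results'])}, news: {len(serp['news_results'])}")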

Scraping Naver Shopping

Naver Shopping is Korea’s largest product comparison platform. Scraping it gives you price intelligence across Korean e-commerce:

class NaverShoppingScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def search_products(self, query: str, sort: str = "rel") -> list:
        """
        scrape Naver Shopping search results.
        sort options: rel (relevance), price_asc, price_dsc, date, review
        """
        encoded = quote(query)
        url = f"https://search.shopping.naver.com/search/all?query={encoded}&sort={sort}"

        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}

            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(locale="ko-KR", timezone_id="Asia/Seoul")
            page = await context.new_page()

            await page.goto(url, wait_until="networkidle")
            await page.wait_for_timeout(2000)

            # scroll to load more products
            for _ in range(3):
                await page.evaluate("window.scrollBy(0, 1000)")
                await page.wait_for_timeout(1000)

            html = await page.content()
            await browser.close()

            return self._parse_products(html)

    def _parse_products(self, html: str) -> list:
        """parse shopping search results."""
        soup = BeautifulSoup(html, "html.parser")
        products = []

        for item in soup.select(".product_item__MDtDF"):
            try:
                product = {}

                title_el = item.select_one(".product_title__Mmw2K")
                product["title"] = title_el.get_text(strip=True) if title_el else ""

                price_el = item.select_one(".price_num__S2p_v")
                if price_el:
                    price_text = price_el.get_text(strip=True).replace(",", "").replace("원", "")
                    product["price_krw"] = int(price_text) if price_text.isdigit() else price_text

                mall_el = item.select_one(".product_mall_title")
                product["mall"] = mall_el.get_text(strip=True) if mall_el else ""

                review_el = item.select_one(".product_num__fafe5")
                product["review_count"] = review_el.get_text(strip=True) if review_el else "0"

                rating_el = item.select_one(".product_grade__QiVVK")
                product["rating"] = rating_el.get_text(strip=True) if rating_el else ""

                link_el = item.select_one("a")
                product["url"] = link_el.get("href", "") if link_el else ""

                if product["title"]:
                    products.append(product)
            except Exception:
                continue

        return products
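
Running it looks much the same as the search scraper (again, the proxy URL is a placeholder):

shopping = NaverShoppingScraper(proxy_url="http://user-country-kr:pass@proxy.example.com:8080")
products = asyncio.run(shopping.search_products("무선 이어폰", sort="review"))
for p in products[:5]:
    print(p["title"], p.get("price_krw"), p["mall"])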

Scraping Naver DataLab Trends

Naver DataLab provides search trend data for the Korean market, similar to Google Trends:

class NaverDataLabScraper:
    """scrape search trends from Naver DataLab."""

    def __init__(self, client_id: str, client_secret: str):
        self.client_id = client_id
        self.client_secret = client_secret

    async def get_trend(self, keywords: list[str], start_date: str,
                        end_date: str, time_unit: str = "month") -> dict:
        """
        get search trend data via Naver DataLab API.

        time_unit: month, week, date
        dates in format: yyyy-mm-dd
        """
        url = "https://openapi.naver.com/v1/datalab/search"

        keyword_groups = [
            {"groupName": kw, "keywords": [kw]} for kw in keywords
        ]

        payload = {
            "startDate": start_date,
            "endDate": end_date,
            "timeUnit": time_unit,
            "keywordGroups": keyword_groups
        }

        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.post(
                url,
                json=payload,
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret,
                    "Content-Type": "application/json"
                }
            )
            response.raise_for_status()
            return response.json()

    async def get_shopping_trend(self, category_name: str, category_id: str,
                                  start_date: str, end_date: str) -> dict:
        """
        get shopping category trend data.

        note: "param" expects a Naver category ID (e.g., "50000000"),
        not the category name; IDs are listed in the DataLab docs.
        """
        url = "https://openapi.naver.com/v1/datalab/shopping/categories"

        payload = {
            "startDate": start_date,
            "endDate": end_date,
            "timeUnit": "month",
            "category": [{"name": category_name, "param": [category_id]}]
        }

        async with httpx.AsyncClient(timeout=15) as client:
            response = await client.post(
                url,
                json=payload,
                headers={
                    "X-Naver-Client-Id": self.client_id,
                    "X-Naver-Client-Secret": self.client_secret,
                    "Content-Type": "application/json"
                }
            )
            response.raise_for_status()
            return response.json()

# usage
datalab = NaverDataLabScraper("your_id", "your_secret")
trend = asyncio.run(datalab.get_trend(
    keywords=["프록시", "VPN", "웹스크래핑"],
    start_date="2025-01-01",
    end_date="2026-03-01"
))
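
The response nests one entry per keyword group; per Naver's DataLab documentation, each entry's data points carry a period and a ratio (relative search volume, scaled so the busiest period is 100). A quick way to flatten it:

# flatten DataLab results into (group, period, ratio) rows
for group in trend.get("results", []):
    for point in group.get("data", []):
        print(group["title"], point["period"], point["ratio"])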

Handling Korean Text and Encoding

Korean text requires proper encoding handling:

import html
import re

def clean_korean_text(text: str) -> str:
    """clean and normalize Korean text from scraped content."""
    # decode HTML entities such as &nbsp; and &amp;
    text = html.unescape(text)

    # normalize whitespace
    text = re.sub(r'\s+', ' ', text).strip()

    # remove control characters but keep Korean characters
    text = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', text)

    return text

def extract_korean_keywords(text: str) -> list[str]:
    """extract Korean keyword phrases from text."""
    # match sequences of Korean characters (Hangul)
    korean_pattern = re.compile(r'[가-힣]+(?:\s[가-힣]+)*')
    matches = korean_pattern.findall(text)
    return [m.strip() for m in matches if len(m.strip()) > 1]

def translate_naver_category(category_kr: str) -> str:
    """translate common Naver Shopping categories to English."""
    categories = {
        "디지털/가전": "Digital/Electronics",
        "패션의류": "Fashion",
        "패션잡화": "Fashion Accessories",
        "화장품/미용": "Cosmetics/Beauty",
        "식품": "Food",
        "출산/육아": "Baby/Kids",
        "생활/건강": "Living/Health",
        "스포츠/레저": "Sports/Leisure",
        "도서": "Books"
    }
    return categories.get(category_kr, category_kr)
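
A quick demonstration of the helpers on a sample string:

raw = "모바일&nbsp;프록시는  &amp;  데이터 수집에 유용합니다"
cleaned = clean_korean_text(raw)
print(cleaned)                           # 모바일 프록시는 & 데이터 수집에 유용합니다
print(extract_korean_keywords(cleaned))  # ['모바일 프록시는', '데이터 수집에 유용합니다']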

Scraping Naver Blog Content

Naver Blog posts heavily influence Korean SEO. Extracting blog content helps with content analysis and keyword research:

class NaverBlogScraper:
    def __init__(self, proxy_url: str = None):
        self.proxy_url = proxy_url

    async def scrape_blog_post(self, blog_url: str) -> dict:
        """scrape a single Naver Blog post."""
        async with async_playwright() as p:
            launch_opts = {"headless": True}
            if self.proxy_url:
                launch_opts["proxy"] = {"server": self.proxy_url}

            browser = await p.chromium.launch(**launch_opts)
            context = await browser.new_context(locale="ko-KR")
            page = await context.new_page()

            await page.goto(blog_url, wait_until="networkidle")
            await page.wait_for_timeout(2000)

            # Naver Blog uses iframes for post content
            frames = page.frames
            content_frame = None
            for frame in frames:
                if "PostView" in frame.url or "post-view" in frame.url:
                    content_frame = frame
                    break

            if content_frame:
                html = await content_frame.content()
            else:
                html = await page.content()

            await browser.close()
            return self._parse_blog(html, blog_url)

    def _parse_blog(self, html: str, url: str) -> dict:
        soup = BeautifulSoup(html, "html.parser")

        title_el = soup.select_one(".se-title-text, .pcol1")
        content_el = soup.select_one(".se-main-container, #postViewArea")
        date_el = soup.select_one(".se_publishDate, .date")

        return {
            "url": url,
            "title": title_el.get_text(strip=True) if title_el else "",
            "content": clean_korean_text(content_el.get_text()) if content_el else "",
            "date": date_el.get_text(strip=True) if date_el else "",
            "word_count": len(content_el.get_text().split()) if content_el else 0
        }
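
Usage follows the same pattern as the other scrapers; the blog URL below is hypothetical:

blog = NaverBlogScraper(proxy_url="http://user-country-kr:pass@proxy.example.com:8080")
post = asyncio.run(blog.scrape_blog_post("https://blog.naver.com/example_blogger/223000000000"))
print(post["title"], post["date"], post["word_count"])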

Building a Complete Naver Research Pipeline

Here is a full pipeline that combines all the scrapers for competitive research in the Korean market:

import asyncio
import json
from datetime import datetime

class NaverResearchPipeline:
    def __init__(self, api_id: str, api_secret: str, proxy_url: str = None):
        self.api = NaverSearchAPI(api_id, api_secret)
        self.search_scraper = NaverSearchScraper(proxy_url)
        self.shopping_scraper = NaverShoppingScraper(proxy_url)
        self.datalab = NaverDataLabScraper(api_id, api_secret)

    async def research_keyword(self, keyword: str) -> dict:
        """comprehensive research on a Korean keyword."""
        report = {
            "keyword": keyword,
            "researched_at": datetime.now().isoformat(),
            "search_results": {},
            "shopping_data": [],
            "trends": {}
        }

        # search across verticals
        report["search_results"] = await self.api.search_all_types(keyword)

        # shopping products
        report["shopping_data"] = await self.shopping_scraper.search_products(keyword)

        # search trends (last 12 months)
        try:
            report["trends"] = await self.datalab.get_trend(
                keywords=[keyword],
                start_date="2025-03-01",
                end_date="2026-03-01"
            )
        except Exception as e:
            report["trends"] = {"error": str(e)}

        return report

    async def competitive_analysis(self, keywords: list[str]) -> list:
        """analyze multiple keywords for competitive intelligence."""
        results = []
        for kw in keywords:
            print(f"researching: {kw}")
            data = await self.research_keyword(kw)
            results.append(data)
            await asyncio.sleep(3)

        return results

# usage
pipeline = NaverResearchPipeline(
    api_id="your_id",
    api_secret="your_secret",
    proxy_url="http://user-country-kr:pass@proxy.example.com:8080"
)

keywords = ["모바일 프록시", "데이터 수집", "웹 스크래핑 도구"]
results = asyncio.run(pipeline.competitive_analysis(keywords))

with open("naver_research.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

Best Practices for Naver Scraping

  1. Use the official API first: Naver’s API is generous at 25,000 requests/day. Only scrape the HTML when you need data the API does not provide

  2. Always use Korean proxies: Naver content and rankings differ significantly between Korean and international IPs. Korean residential proxies are essential for accurate data

  3. Set the Korean locale: configure your browser context with locale="ko-KR" and timezone_id="Asia/Seoul" to get the same experience as Korean users

  4. Handle encoding correctly: always save Korean text with ensure_ascii=False and UTF-8 encoding

  5. Respect rate limits: Naver blocks aggressive scraping quickly. Keep delays of 3-5 seconds between requests (see the sketch after this list)

  6. Monitor for layout changes: Naver updates its frontend frequently. CSS selectors that work today may break next week. The API-first approach minimizes this risk

  7. Watch for CAPTCHAs: Naver shows CAPTCHAs after repeated automated access. If you hit one, rotate to a fresh proxy IP and add longer delays
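
As a sketch of points 5 and 7 together, here is a small helper that spaces out requests and backs off when a block or CAPTCHA is suspected (the status codes and backoff schedule are illustrative assumptions, not Naver-documented behavior):

import asyncio
import random

import httpx

async def polite_fetch(client: httpx.AsyncClient, url: str, max_retries: int = 3) -> str | None:
    """fetch a URL with randomized delays and backoff on likely blocks."""
    for attempt in range(max_retries):
        await asyncio.sleep(random.uniform(3, 5))  # 3-5s between requests
        response = await client.get(url)
        # 403/429 or a captcha page usually signals rate limiting
        if response.status_code in (403, 429) or "captcha" in response.text.lower():
            await asyncio.sleep(30 * (attempt + 1))  # back off before retrying
            continue
        return response.text
    return None  # still blocked: rotate to a fresh proxy IP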

Conclusion

Naver is the essential platform for anyone doing research, SEO, or market analysis in South Korea. The combination of the official Naver API (for structured search and trend data) and browser-based scraping (for rankings, shopping prices, and blog analysis) gives you comprehensive coverage of the Korean digital market. Korean residential proxies are not optional for this work; they are required for accurate, geo-targeted results that match what Korean users actually see.
