Scrapling is a Python library that combines the simplicity of BeautifulSoup with auto-matching for resilient element selection. It handles element location drift automatically, making scrapers more durable. This tutorial covers installation, selectors, and real-world usage.
What is Scrapling?
Scrapling is an open-source Python library designed to make web scrapers resilient to website structure changes. Traditional scrapers break when a site updates its HTML because hardcoded CSS selectors or XPaths stop matching. Scrapling solves this with auto-matching: it learns element fingerprints and finds elements even when their location shifts.
The library is built on top of lxml for performance and provides a jQuery-like API that feels familiar to developers coming from BeautifulSoup or Scrapy. Scrapling version 0.2.9+ supports both synchronous and async workflows.
Installation
pip install scrapling
Scrapling requires Python 3.8+. For async support and browser-based fetching, install the full extras:
pip install scrapling[all]
# installs playwright, httpx, and async dependencies
Basic usage: fetching and parsing
Scrapling’s Fetcher class handles HTTP requests, and the response object exposes a rich selector API. Here is a basic example scraping article titles from a news page:
from scrapling import Fetcher
fetcher = Fetcher(auto_match=False)
page = fetcher.get("https://news.ycombinator.com")
# CSS selector — returns a list of Tag objects
titles = page.css(".titleline > a")
for title in titles:
    print(title.text, "|", title.attrib.get("href", ""))
# XPath selector
links = page.xpath("//a[@class='storylink']/@href")
print(links)
Auto-matching: the killer feature
Auto-matching is what sets Scrapling apart. When enabled, it stores a fingerprint of each matched element (text content, tag structure, sibling context) so it can relocate the same element if the site’s HTML changes. This is ideal for production scrapers that run for months without maintenance.
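The fingerprint-and-relocate idea is worth understanding on its own. Here is a minimal pure-Python sketch of the concept, using toy dict "elements" and difflib similarity scoring; this illustrates the approach only, not Scrapling's actual fingerprint format or internals:

```python
from difflib import SequenceMatcher

def fingerprint(element):
    # Capture traits that tend to survive redesigns: tag, text, ancestor path.
    return {"tag": element["tag"], "text": element["text"], "path": element["path"]}

def similarity(fp, candidate):
    # Score a candidate element against a stored fingerprint (0.0 to 1.0).
    tag_score = 1.0 if candidate["tag"] == fp["tag"] else 0.0
    text_score = SequenceMatcher(None, candidate["text"], fp["text"]).ratio()
    path_score = SequenceMatcher(None, candidate["path"], fp["path"]).ratio()
    return (tag_score + text_score + path_score) / 3

def relocate(fp, candidates, threshold=0.6):
    # Pick the best-scoring candidate, or None if nothing is close enough.
    best = max(candidates, key=lambda c: similarity(fp, c))
    return best if similarity(fp, best) >= threshold else None

# First run: store a fingerprint of the price element.
saved = fingerprint({"tag": "span", "text": "$19.99", "path": "html/body/div/span"})

# Later run: the site was redesigned and the element moved (and the price changed).
new_dom = [
    {"tag": "h1", "text": "Product 123", "path": "html/body/main/h1"},
    {"tag": "span", "text": "$21.50", "path": "html/body/main/section/span"},
]
match = relocate(saved, new_dom)  # still finds the price span
```

Even this toy version survives a moved element; Scrapling applies the same principle with much richer structural context.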
from scrapling import Fetcher
fetcher = Fetcher(auto_match=True, auto_match_db="scrapling_cache.db")
page = fetcher.get("https://example-shop.com/product/123")
# first run: finds element by CSS and stores fingerprint
price = page.find("span.price", auto_save=True, identifier="product_price")
# subsequent runs: relocates element even if CSS changes
price = page.find(identifier="product_price")
print(price.text)
Async scraping with Scrapling
For scraping multiple pages concurrently, use the async fetcher. Scrapling wraps httpx under the hood for async HTTP and supports asyncio natively.
import asyncio
from scrapling import AsyncFetcher
async def scrape_pages(urls):
    fetcher = AsyncFetcher(auto_match=False)
    async def scrape_one(url):
        page = await fetcher.get(url)
        title = page.find("h1")
        return {"url": url, "title": title.text if title else ""}
    # asyncio.gather runs the requests concurrently instead of awaiting one at a time
    return await asyncio.gather(*(scrape_one(url) for url in urls))
urls = ["https://example.com/page1", "https://example.com/page2"]
data = asyncio.run(scrape_pages(urls))
print(data)
Browser-based fetching
For JavaScript-heavy pages, Scrapling integrates Playwright via its PlaywrightFetcher class. This lets you render pages fully before parsing, using the same Scrapling selector API on the rendered DOM.
from scrapling import PlaywrightFetcher
fetcher = PlaywrightFetcher(headless=True)
page = fetcher.get("https://spa-site.com/products")
# the page is rendered before parsing, so dynamic content is present
products = page.css(".product-card")
for p in products:
    name = p.find(".product-name")
    price = p.find(".product-price")
    print(name.text, price.text)
Comparison with BeautifulSoup
BeautifulSoup is the most popular HTML parser in Python but has no built-in resilience for structure changes. Scrapling is a better choice for long-running scrapers or production pipelines where maintenance cost matters. For one-off scripts, BeautifulSoup’s smaller footprint may be preferable.
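If you do stick with a conventional parser, one lightweight way to cut maintenance cost is a fallback list of selectors tried in order, so the scraper survives at least the redesigns you have seen before. A hypothetical, parser-agnostic sketch (the `query` callable and the toy `dom` dict stand in for whatever selector API you use):

```python
def find_with_fallbacks(query, selectors):
    # Return the first non-empty result among candidate selectors.
    for selector in selectors:
        result = query(selector)
        if result:
            return result
    return None

# Toy "DOM": maps selector -> matched text. Simulates a redesign where
# the old ".price" selector broke and ".amount" replaced it.
dom = {".amount": "$19.99"}
value = find_with_fallbacks(dom.get, [".price", ".amount"])  # falls through to ".amount"
```

Unlike auto-matching, this only covers changes you anticipated in advance, which is exactly the gap Scrapling's fingerprinting closes.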
For further context on scraping approaches, see what is web scraping. For proxy integration, check what is a proxy server and SOCKS5 vs HTTP proxy.