Scrapling Web Scraping Python Tutorial: Adaptive Scraping
Scrapling is a Python web scraping library designed to solve one of the biggest pain points in scraping: broken selectors. When a website changes its HTML structure, traditional scrapers break. Scrapling uses intelligent, adaptive selectors that can still find the right elements even after a redesign.
This tutorial covers everything you need to start scraping with Scrapling, from basic usage to advanced techniques with proxy integration.
What Is Scrapling?
Scrapling is an open-source Python library that takes a different approach to element selection. Instead of relying solely on CSS selectors or XPath that break when HTML changes, Scrapling builds a fingerprint of each element based on multiple attributes. When the page structure changes, it uses fuzzy matching to locate the same element in the new layout.
Key features include:
- Adaptive selectors: elements are identified by multiple properties, not just their CSS path
- Auto-match: if an element moves in the DOM, Scrapling can still find it
- Built-in stealth: browser fingerprint randomization and anti-detection features
- Playwright integration: full JavaScript rendering support
- Clean API: a simple, intuitive interface inspired by BeautifulSoup
Installation
Install Scrapling with pip:
pip install scrapling
python -m scrapling install # installs browser dependencies
This installs the core library plus the Playwright browsers used for JavaScript rendering.
For a minimal installation without browser support:
pip install scrapling[core]
Basic Usage
Scrapling provides three main fetcher classes depending on your needs.
StaticFetcher: Simple HTTP Requests
Use this for pages that do not require JavaScript rendering:
from scrapling import StaticFetcher
fetcher = StaticFetcher()
# fetch a page
page = fetcher.get("https://quotes.toscrape.com/")
# find elements using CSS selectors
quotes = page.find_all("div.quote")
for quote in quotes:
text = quote.find("span.text").text
author = quote.find("small.author").text
print(f"{author}: {text}")
StealthFetcher: Anti-Detection Browsing
Use this when websites block standard requests:
from scrapling import StealthFetcher
fetcher = StealthFetcher()
# this uses a real browser with anti-detection measures
page = fetcher.get("https://example.com/protected-page")
# Scrapling handles fingerprint randomization automatically
products = page.find_all("div.product-card")
for product in products:
name = product.find("h2").text
price = product.find(".price").text
print(f"{name}: {price}")
PlayWrightFetcher: Full Browser Control
Use this when you need full control over the browser:
from scrapling import PlayWrightFetcher
fetcher = PlayWrightFetcher()
page = fetcher.get(
"https://example.com/dynamic-page",
wait_selector="div.results", # wait for this element
timeout=15000
)
results = page.find_all("div.result-item")
for item in results:
print(item.text)
Adaptive Selectors: The Core Feature
The most powerful feature of Scrapling is adaptive selection. Here is how it works.
Traditional Approach (Fragile)
# this breaks when the website changes its HTML structure
price = page.find("div.product-info > span.price-current > strong")
Scrapling Adaptive Approach (Resilient)
from scrapling import StealthFetcher

fetcher = StealthFetcher()
# first run: Scrapling learns the element
html = fetcher.get("https://example.com/product")
# use auto_match to find elements adaptively
price_element = html.find("span.price", auto_match=True)
# Scrapling creates a fingerprint based on:
# - text content patterns
# - surrounding elements
# - attribute values
# - position in the document
# - visual similarity
# on subsequent runs, even if the class name changes from
# "price" to "product-price" or the element moves,
# Scrapling can still locate it using the fingerprint
How Auto-Match Works
Scrapling’s auto_match builds an element profile using multiple signals:
from scrapling import Adaptor
# parse HTML content
page = Adaptor(html_content)
# find with auto_match enabled
# Scrapling uses these signals to identify elements:
# 1. text content and patterns (e.g., "$XX.XX" looks like a price)
# 2. element tag and attributes
# 3. sibling and parent context
# 4. position relative to other matched elements
product_name = page.find("h1.product-title", auto_match=True)
product_price = page.find("span.price", auto_match=True)
# save the learned selectors for future use
# this stores the element fingerprints
page.save("product_page_profile")
# on the next run, load the profile
page2 = Adaptor(new_html_content)
page2.load("product_page_profile")
# even if classes changed, auto_match finds the right elements
name = page2.find("h1.product-title", auto_match=True)
price = page2.find("span.price", auto_match=True)
Integrating Proxies with Scrapling
Proxy support is essential for serious scraping. Here is how to set it up with each fetcher.
Proxies with StaticFetcher
from scrapling import StaticFetcher
# single proxy
fetcher = StaticFetcher()
page = fetcher.get(
"https://example.com",
proxy="http://user:pass@proxy.example.com:8080"
)
# rotating proxies
import random
PROXY_POOL = [
"http://user:pass@us1.proxy.com:8080",
"http://user:pass@us2.proxy.com:8080",
"http://user:pass@eu1.proxy.com:8080",
]
fetcher = StaticFetcher()
page = fetcher.get(
"https://example.com",
proxy=random.choice(PROXY_POOL)
)
Proxies with StealthFetcher
from scrapling import StealthFetcher
fetcher = StealthFetcher()
# residential proxy for maximum success rate
page = fetcher.get(
"https://protected-site.com/data",
proxy="http://user:pass@residential.proxy.com:8080"
)
Proxies with PlayWrightFetcher
from scrapling import PlayWrightFetcher
fetcher = PlayWrightFetcher()
page = fetcher.get(
"https://example.com/products",
proxy={
"server": "http://proxy.example.com:8080",
"username": "user",
"password": "pass"
}
)
Building a Complete Scraping Pipeline
Let's build a practical scraper that extracts product data with proxy rotation and error handling.
import json
import time
import random
from scrapling import StealthFetcher, Adaptor
class ScraplingPipeline:
def __init__(self, proxies=None):
self.fetcher = StealthFetcher()
self.proxies = proxies or []
self.results = []
def _get_proxy(self):
if not self.proxies:
return None
return random.choice(self.proxies)
def scrape_product(self, url):
"""scrape a single product page."""
proxy = self._get_proxy()
page = self.fetcher.get(url, proxy=proxy)
# use adaptive selectors
name = page.find("h1", auto_match=True)
price = page.find("[class*='price']", auto_match=True)
description = page.find("[class*='description']", auto_match=True)
rating = page.find("[class*='rating']", auto_match=True)
return {
"url": url,
"name": name.text if name else None,
"price": price.text if price else None,
"description": description.text if description else None,
"rating": rating.text if rating else None,
}
def scrape_listing(self, url, product_selector="a[href*='product']"):
"""scrape a listing page and follow product links."""
proxy = self._get_proxy()
page = self.fetcher.get(url, proxy=proxy)
# find product links
links = page.find_all(product_selector)
product_urls = []
for link in links:
href = link.get("href")
if href:
if href.startswith("/"):
# convert relative to absolute
from urllib.parse import urljoin
href = urljoin(url, href)
product_urls.append(href)
return product_urls
def run(self, listing_url, max_products=50, delay=2.0):
"""full pipeline: listing -> product pages -> structured data."""
print(f"scraping listing: {listing_url}")
product_urls = self.scrape_listing(listing_url)
product_urls = product_urls[:max_products]
print(f"found {len(product_urls)} products")
for i, url in enumerate(product_urls):
print(f" [{i+1}/{len(product_urls)}] {url}")
try:
data = self.scrape_product(url)
self.results.append(data)
except Exception as e:
print(f" error: {e}")
self.results.append({"url": url, "error": str(e)})
time.sleep(delay + random.uniform(0, 1))
return self.results
def save(self, filename):
"""save results to JSON."""
with open(filename, "w") as f:
json.dump(self.results, f, indent=2)
print(f"saved {len(self.results)} results to {filename}")
# usage
pipeline = ScraplingPipeline(proxies=[
"http://user:pass@proxy1.example.com:8080",
"http://user:pass@proxy2.example.com:8080",
])
results = pipeline.run("https://example.com/products", max_products=20)
pipeline.save("products.json")
Handling Pagination
To scrape multiple pages of results:
import random
import time

from scrapling import StealthFetcher
def scrape_paginated(base_url, max_pages=10, proxies=None):
"""scrape through paginated results."""
fetcher = StealthFetcher()
all_items = []
for page_num in range(1, max_pages + 1):
url = f"{base_url}?page={page_num}"
proxy = random.choice(proxies) if proxies else None
page = fetcher.get(url, proxy=proxy)
items = page.find_all("div.item", auto_match=True)
if not items:
print(f"no items on page {page_num}, stopping")
break
for item in items:
all_items.append({
"title": item.find("h2").text if item.find("h2") else None,
"price": item.find(".price").text if item.find(".price") else None,
})
print(f"page {page_num}: {len(items)} items (total: {len(all_items)})")
time.sleep(2)
return all_items
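A quick usage sketch for the helper above; the listing URL and proxy are placeholders:
items = scrape_paginated(
    "https://example.com/search",  # placeholder listing URL
    max_pages=5,
    proxies=["http://user:pass@proxy1.example.com:8080"],
)
print(f"collected {len(items)} items")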
Data Extraction Patterns
Extracting Tables
from scrapling import Adaptor
def extract_table(page, table_selector="table"):
"""extract table data into a list of dictionaries."""
table = page.find(table_selector)
if not table:
return []
# get headers
headers = []
header_row = table.find("thead tr") or table.find("tr")
if header_row:
headers = [th.text.strip() for th in header_row.find_all(["th", "td"])]
# get rows
rows = []
    body = table.find("tbody") or table
    # when there is no <thead>, the first row holds the headers, so skip it
    data_rows = body.find_all("tr") if table.find("thead") else body.find_all("tr")[1:]
    for tr in data_rows:
cells = [td.text.strip() for td in tr.find_all("td")]
if cells and len(cells) == len(headers):
rows.append(dict(zip(headers, cells)))
return rows
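As a usage sketch, assuming a page that contains a standard HTML table (the URL below is a placeholder):
from scrapling import StaticFetcher

page = StaticFetcher().get("https://example.com/price-history")  # placeholder URL
for row in extract_table(page)[:5]:
    print(row)  # each row is a dict keyed by the table headers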
Extracting Structured Data (JSON-LD)
import json
def extract_jsonld(page):
"""extract JSON-LD structured data from a page."""
scripts = page.find_all("script[type='application/ld+json']")
structured_data = []
for script in scripts:
try:
data = json.loads(script.text)
structured_data.append(data)
except json.JSONDecodeError:
continue
return structured_data
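JSON-LD often carries the product name, price, and rating in a stable, machine-readable form, so it can replace fragile CSS selectors entirely when a site provides it. A usage sketch with a placeholder URL, assuming the page exposes a schema.org Product object:
from scrapling import StealthFetcher

page = StealthFetcher().get("https://example.com/product")  # placeholder URL
for block in extract_jsonld(page):
    if isinstance(block, dict) and block.get("@type") == "Product":
        offers = block.get("offers") or {}
        if isinstance(offers, list):
            # some sites publish a list of offers; take the first one
            offers = offers[0] if offers else {}
        print(block.get("name"), offers.get("price"))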
Scrapling vs Other Libraries
| Feature | Scrapling | BeautifulSoup | Scrapy | Playwright |
|---|---|---|---|---|
| adaptive selectors | yes | no | no | no |
| anti-detection | built-in | no | no | partial |
| JS rendering | yes | no | with plugin | yes |
| async support | yes | no | yes | yes |
| learning curve | low | low | medium | medium |
| speed | moderate | fast | fast | slow |
| proxy support | yes | manual | yes | yes |
Troubleshooting Common Issues
auto_match returns the wrong element:
This usually happens when multiple elements look similar. Provide more context in your selector, or combine the tag name with partial class matching.
# too vague
price = page.find("span", auto_match=True)
# better: more context
price = page.find("span[class*='price']", auto_match=True)
# best: combine with parent context
product_div = page.find("div.product-main")
price = product_div.find("span[class*='price']", auto_match=True)
StealthFetcher is slow:
The stealth fetcher launches a full browser. For pages that do not need JavaScript rendering, switch to StaticFetcher, which is much faster.
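A minimal sketch of that switch; the flag and URL are illustrative, not part of Scrapling:
from scrapling import StaticFetcher, StealthFetcher

NEEDS_JS = False  # flip to True only for pages that render their content client-side

fetcher = StealthFetcher() if NEEDS_JS else StaticFetcher()
page = fetcher.get("https://example.com/simple-page")  # placeholder URL
title = page.find("h1")
print(title.text if title else "no h1 found")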
proxy connection errors:
Verify your proxy credentials and test connectivity separately before integrating with Scrapling:
import httpx
proxy = "http://user:pass@proxy.example.com:8080"
try:
resp = httpx.get("https://httpbin.org/ip", proxy=proxy, timeout=10)
print(f"proxy IP: {resp.json()['origin']}")
except Exception as e:
print(f"proxy error: {e}")
Performance Tips
- Use StaticFetcher when possible: it is 5 to 10 times faster than browser-based fetchers
- Cache responses: save HTML locally during development to avoid re-fetching (see the sketch after this list)
- Batch processing: collect URLs first, then process them with controlled concurrency
- Respect rate limits: add delays between requests to avoid getting blocked
- Rotate proxies: distribute requests across multiple IPs for better success rates
- Profile your selectors: auto_match is powerful but slower than direct CSS selectors, so use it only where you need adaptability
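For example, a minimal development-time cache might look like the sketch below. The helper and file layout are illustrative, not part of Scrapling; it fetches raw HTML with httpx and feeds the cached copy back through Adaptor, which accepts raw HTML as shown earlier:
import hashlib
from pathlib import Path

import httpx
from scrapling import Adaptor

CACHE_DIR = Path(".scrape_cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_get(url):
    """Fetch raw HTML once and reuse the saved copy on later runs."""
    cache_file = CACHE_DIR / (hashlib.sha1(url.encode()).hexdigest() + ".html")
    if cache_file.exists():
        html = cache_file.read_text()
    else:
        html = httpx.get(url, timeout=15).text
        cache_file.write_text(html)
    return Adaptor(html)

page = cached_get("https://example.com/products")  # placeholder URL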
Conclusion
Scrapling fills a real gap in the Python scraping ecosystem. Its adaptive selectors mean you spend less time maintaining broken scrapers and more time working with the data you extract. Combined with built-in stealth features and proxy support, it is a solid choice for projects where frequent site changes are a concern.
Start with StealthFetcher for protected sites, use StaticFetcher for speed on simple pages, and enable auto_match on elements that tend to change. This combination handles most real-world scraping scenarios without the constant maintenance that traditional approaches demand.