ImovelWeb is Brazil’s second-largest property portal, listing 3+ million active rental and sale properties across São Paulo, Rio de Janeiro, and every major metro. If you’re building a Brazilian real estate dataset — for investment analysis, price forecasting, or competitive research — scraping ImovelWeb is faster and more complete than any official data source. Here’s how to build a reliable pipeline in 2026.
What ImovelWeb Serves and How It Protects Itself
ImovelWeb runs on a React frontend with server-side rendering. Most listing pages load critical data (price, address, specs) inline in the HTML, which means you don’t need to execute JavaScript for basic fields. Detail pages hydrate additional data via XHR calls to their internal API, so a two-pass approach (static HTML for the listing index + XHR interception for full property detail) is the most efficient architecture.
Anti-bot defenses as of 2026:
- Cloudflare Turnstile on search result pages at high request volume
- rate limiting by IP: roughly 60-80 requests per minute before soft blocks appear
- user-agent and header fingerprinting on the detail page XHR endpoints
- cookie-based session tokens that expire after ~10 minutes of inactivity
There is no CAPTCHA on individual property pages at moderate volume, but aggressive crawling triggers 429s fast. The defense profile is similar to what you’d encounter on Realtor.com — if you’ve read How to Scrape Realtor.com Property Data in 2026 (Bypass Next.js Protection), the same proxy rotation and header hygiene principles apply directly here.
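The 60-80 requests/minute ceiling above translates into a simple client-side throttle. A minimal sketch — the 50 req/min target below is our own conservative choice to leave headroom, not a documented limit:

```python
import time

class RateLimiter:
    """Per-IP throttle to stay under ImovelWeb's ~60-80 req/min soft limit."""

    def __init__(self, max_per_minute: int = 50):
        self.min_interval = 60.0 / max_per_minute  # seconds between requests
        self.last_request = 0.0

    def wait(self) -> float:
        """Block until the next request slot opens; return the delay applied."""
        now = time.monotonic()
        delay = max(0.0, self.last_request + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_request = time.monotonic()
        return delay
```

Call `limiter.wait()` before each request on a given proxy IP; with one `RateLimiter` per IP, rotation and throttling stay independent.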
Parsing the HTML: Key Selectors
ImovelWeb listing pages use consistent CSS classes that have been stable through 2025-2026. The search results grid renders listing cards server-side, which is the cleanest extraction path.
```python
import httpx
from selectolax.parser import HTMLParser

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "pt-BR,pt;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
    "Referer": "https://www.imovelweb.com.br/",
}

def parse_listings(html: str) -> list[dict]:
    tree = HTMLParser(html)
    results = []
    for card in tree.css("div[data-qa='posting PROPERTY']"):
        price = card.css_first("div[data-qa='POSTING_CARD_PRICE']")
        address = card.css_first("div[data-qa='POSTING_CARD_LOCATION']")
        link = card.css_first("a[data-qa='posting PROPERTY']")
        results.append({
            "price": price.text(strip=True) if price else None,
            "address": address.text(strip=True) if address else None,
            "url": "https://www.imovelweb.com.br" + link.attributes.get("href", "") if link else None,
        })
    return results
```

Key attributes to extract from cards: `data-qa="POSTING_CARD_PRICE"`, `POSTING_CARD_FEATURES` (beds/baths/m²), `POSTING_CARD_LOCATION`, and the canonical listing URL. For the full detail page, the JSON-LD block under `<script type="application/ld+json">` contains structured RealEstateListing data including geo-coordinates and agent contact.
If you've worked through Google Shopping HTML Selectors 2026: sh-dgr__content and a8pemb Explained, the data-qa attribute pattern here is conceptually identical -- stable semantic hooks that survive minor redesigns.
Proxy Strategy and IP Requirements
Brazil is a geo-restricted target. ImovelWeb redirects non-Brazilian IPs to a regional landing page and degrades search results for international traffic. You need Brazilian residential or mobile IPs, not datacenter IPs from São Paulo AWS nodes -- those are fingerprinted and blocked within minutes.
Provider comparison for Brazilian residential IPs (2026):
| Provider | BR Residential | BR Mobile | Price/GB | Sticky Sessions |
|---|---|---|---|---|
| Bright Data | yes | yes | ~$8.40 | up to 30 min |
| Oxylabs | yes | yes | ~$8.00 | up to 30 min |
| Smartproxy | yes | limited | ~$7.00 | up to 10 min |
| IPRoyal | yes | no | ~$3.50 | up to 24h |
| SOAX | yes | yes | ~$6.00 | up to 30 min |
Mobile IPs are worth the premium for search result pages where Cloudflare Turnstile activates. For detail pages at moderate volume (under 20 req/min per IP), residential IPs are sufficient and cheaper. The same IP rotation logic used for review-site pipelines applies here -- see How Proxies Help Scrape Reviews at Scale: Yelp, Google, Trustpilot (2026) for rotation interval benchmarks that carry over directly.
Session stickiness matters for pagination: ImovelWeb sets a `gclid` and `_iw_session` cookie on the first request, and paginating without carrying that session forward causes result deduplication errors. Use sticky sessions of at least 5 minutes per spider thread.
Pipeline Architecture
A production ImovelWeb scraper has three stages:
- URL generation -- ImovelWeb uses a structured URL schema: `imovelweb.com.br/imoveis-venda-{city}-{neighborhood}.html`. Generate the full matrix of city/neighborhood/property-type combinations from their sitemap (`sitemap_index.xml` links to per-city sitemaps).
- Listing index crawl -- fetch paginated search results (up to page 50, ~25 listings/page). Store canonical URLs and card-level data to a staging table.
- Detail page enrichment -- for each canonical URL, fetch the full property page and extract JSON-LD, agent info, photo count, and the internal `postingId`. Use this ID to optionally call the XHR endpoint `/api/v3/posting/{postingId}` for fields not in the HTML (HOA fees, energy rating, floor number).
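The URL generation stage is a simple matrix expansion. A sketch — the city, neighborhood, and property-type slugs below are illustrative placeholders; in production you would read the real values from the per-city sitemaps:

```python
from itertools import product

# Placeholder slugs -- replace with values parsed from sitemap_index.xml.
CITIES = ["sao-paulo-sp", "rio-de-janeiro-rj"]
NEIGHBORHOODS = {
    "sao-paulo-sp": ["moema", "pinheiros"],
    "rio-de-janeiro-rj": ["copacabana"],
}
PROPERTY_TYPES = ["apartamentos", "casas"]

def seed_urls() -> list[str]:
    """Expand the city/neighborhood/property-type matrix into search URLs."""
    urls = []
    for city in CITIES:
        for hood, ptype in product(NEIGHBORHOODS[city], PROPERTY_TYPES):
            urls.append(
                f"https://www.imovelweb.com.br/{ptype}-venda-{city}-{hood}.html"
            )
    return urls
```

Feed the output into the staging table that the listing index crawl consumes, so each spider thread can claim a city+neighborhood slice.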
Infrastructure checklist:
- use `httpx` with an async connection pool (50-100 concurrent workers is safe with proxy rotation)
- retry on 429 with exponential backoff -- start at 5 seconds, cap at 60
- store raw HTML alongside parsed fields; ImovelWeb's selector names have shifted twice in the past 18 months
- checkpoint progress to a database by city+page; full Brazil crawls take 8-14 hours depending on proxy speed
for teams evaluating managed scraping tools that handle proxy integration natively, Tools That Integrate Proxies for B2B Data Collection at Scale (2026) covers platforms like Apify, ScrapeOps, and Zyte that can reduce infrastructure overhead significantly.
Common Errors and Fixes
| Error | Cause | Fix |
|---|---|---|
| 403 on search pages | no Brazilian IP or expired session | rotate to BR residential, reseed cookies |
| empty listing cards | JS hydration path used | switch to SSR HTML path, not Playwright |
| redirect to /en | non-BR IP detected | confirm proxy geo, set Accept-Language: pt-BR |
| duplicate listings | session cookie not carried across pages | enable sticky proxy sessions |
| 429 bursts | too many requests from one IP | drop to 15 req/min per IP, add jitter |
One non-obvious issue: ImovelWeb serves a stale cached page to requests missing a Referer header pointing back to their own domain. Always set `Referer: https://www.imovelweb.com.br/` even on direct detail page hits. This is the same header discipline required for B2B dataset extraction -- see Best Proxies for Extracting Jobs + B2B Datasets at Scale (2026) for a header template that works across multiple structured-data targets.
Bottom Line
ImovelWeb is scrapable at scale in 2026 with Brazilian residential proxies, the right data-qa selectors, and a staged pipeline that separates index crawls from detail enrichment. Skip datacenter IPs entirely -- they're blocked on first contact. For teams building recurring pipelines, pair the architecture above with a managed proxy rotation layer to avoid the maintenance overhead of IP pool management. DRT covers the full stack of scraping infrastructure decisions, from selector stability to proxy provider tradeoffs, so bookmark the site if you're building production data pipelines.
Related guides on dataresearchtools.com
- How Proxies Help Scrape Reviews at Scale: Yelp, Google, Trustpilot (2026)
- Best Proxies for Extracting Jobs + B2B Datasets at Scale (2026)
- Google Shopping HTML Selectors 2026: sh-dgr__content and a8pemb Explained
- Tools That Integrate Proxies for B2B Data Collection at Scale (2026)
- Pillar: How to Scrape Realtor.com Property Data in 2026 (Bypass Next.js Protection)