How to Scrape Latin American Real Estate Sites (Imovelweb, Mercado Libre)

TL;DR
Latin American real estate platforms like Imovelweb and Mercado Libre Inmuebles require geo-targeted proxies, Spanish/Portuguese language headers, and pagination handling tuned for regional URL structures. This guide covers practical extraction patterns for both platforms.

Latin American real estate data is underserved by English-language scraping guides, yet Brazil, Argentina, Colombia, and Mexico have significant real estate markets with active listing portals that update daily. Researchers, investors, and proptech startups regularly need this data at scale.

This guide focuses on Imovelweb (Brazil’s second-largest real estate portal) and Mercado Libre Inmuebles (Mercado Libre’s real estate vertical, dominant across Spanish-speaking Latin America).

Understanding the target sites

Imovelweb (Brazil)

Imovelweb operates at imovelweb.com.br and covers residential and commercial listings across Brazilian states. The site serves content in Portuguese and uses a React frontend with server-side rendered listing data embedded in the page HTML. Listing pages also expose structured data via JSON-LD schema markup, which is easier to parse than scraping the rendered HTML.

Mercado Libre Inmuebles

Mercado Libre’s real estate vertical operates under different domains by country: inmuebles.mercadolibre.com.ar (Argentina), inmuebles.mercadolibre.com.mx (Mexico), inmuebles.mercadolibre.com.co (Colombia). The listing data is partially accessible via Mercado Libre’s public developer API, which provides a cleaner extraction path than HTML scraping for some data points.

Proxy requirements for Latin American targets

Both platforms geo-restrict some content. Imovelweb may serve different results or redirect based on detected IP location. Using Brazilian residential or mobile proxies for Imovelweb, and country-specific proxies for Mercado Libre, avoids these restrictions and reduces bot detection rates.

Datacenter proxies work initially but get blocked relatively quickly on both platforms. Residential IPs have significantly higher success rates for sustained scraping. Learn more about proxy types in our guide on what a proxy server is.
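For Mercado Libre’s per-country domains, a small lookup that picks the matching proxy from the listing URL keeps routing consistent. A minimal sketch — the proxy gateway hostnames below are placeholders for your provider’s endpoints, not real services:

```python
# Map each Mercado Libre country TLD to a country-specific proxy.
# These gateway URLs are placeholders -- substitute your provider's endpoints.
COUNTRY_PROXIES = {
    "ar": "http://user:pass@ar-proxy.example.com:8080",
    "mx": "http://user:pass@mx-proxy.example.com:8080",
    "co": "http://user:pass@co-proxy.example.com:8080",
}

def proxy_for_listing_url(url):
    # inmuebles.mercadolibre.com.ar -> "ar", .com.mx -> "mx", etc.
    host = url.split("/")[2]
    for suffix, proxy in COUNTRY_PROXIES.items():
        if host.endswith("." + suffix):
            return proxy
    return None  # no country match: fall back to direct or a default pool
```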

Scraping Imovelweb listings in Python

import urllib.request
import json
import re
import time

def scrape_imovelweb_page(url, proxy_url=None):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    }
    
    if proxy_url:
        proxy_handler = urllib.request.ProxyHandler({
            "https": proxy_url,
            "http": proxy_url
        })
        opener = urllib.request.build_opener(proxy_handler)
    else:
        opener = urllib.request.build_opener()
    
    req = urllib.request.Request(url, headers=headers)
    
    try:
        with opener.open(req, timeout=20) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        return html
    except Exception as e:
        print(f"error fetching {url}: {e}")
        return None

def extract_jsonld_listings(html):
    # extract JSON-LD structured data from page
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    matches = re.findall(pattern, html, re.DOTALL)
    listings = []
    for match in matches:
        try:
            data = json.loads(match.strip())
            if data.get("@type") in ["Apartment", "House", "RealEstateListing", "Product"]:
                listings.append({
                    "type": data.get("@type"),
                    "name": data.get("name"),
                    "price": data.get("offers", {}).get("price"),
                    "currency": data.get("offers", {}).get("priceCurrency"),
                    "address": data.get("address", {}).get("streetAddress"),
                    "city": data.get("address", {}).get("addressLocality"),
                    "url": data.get("url")
                })
        except json.JSONDecodeError:
            continue
    return listings

# example usage
base_url = "https://www.imovelweb.com.br/imoveis-venda-sao-paulo-sp.html"
html = scrape_imovelweb_page(base_url, proxy_url="http://user:pass@br-proxy.example.com:8080")
if html:
    listings = extract_jsonld_listings(html)
    for listing in listings[:3]:
        print(json.dumps(listing, ensure_ascii=False, indent=2))

Pagination handling for Imovelweb

Imovelweb uses numeric pagination in the URL: /imoveis-venda-sao-paulo-sp-pagina-2.html. Extract the total listing count from the page to calculate the number of pages needed, rather than iterating until you hit an empty page. The total count is typically available in the page’s meta tags or in an embedded JavaScript variable.

def build_imovelweb_page_urls(base_path, total_pages):
    urls = [f"https://www.imovelweb.com.br{base_path}.html"]
    for page in range(2, total_pages + 1):
        urls.append(f"https://www.imovelweb.com.br{base_path}-pagina-{page}.html")
    return urls
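To get total_pages for the URL builder, the result count can be pulled from the listing page itself. A sketch, assuming the count appears near a keyword like “imóveis” or “resultados” and roughly 20 listings per page — both the regex and per_page are assumptions to verify against live markup:

```python
import math
import re

def estimate_total_pages(html, per_page=20):
    # Look for a number followed by a results keyword; the exact markup
    # varies, so treat this pattern and per_page=20 as assumptions.
    m = re.search(r'([\d.,]+)\s*(?:im[óo]veis|resultados)', html, re.IGNORECASE)
    if not m:
        return 1  # fall back to a single page if no count is found
    total = int(re.sub(r"[.,]", "", m.group(1)))  # strip thousand separators
    return math.ceil(total / per_page)
```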

Mercado Libre API approach

Mercado Libre exposes a public search API that works for real estate listings without requiring authentication for basic queries. This is cleaner and more reliable than HTML scraping for core listing data.

import urllib.request
import urllib.parse
import json

def search_mercadolibre_real_estate(country_code, query, limit=50, offset=0):
    # country codes: MLA=Argentina, MLM=Mexico, MCO=Colombia, MLB=Brazil
    params = urllib.parse.urlencode({
        "q": query,
        "category": f"{country_code}1459",  # real estate category
        "limit": limit,
        "offset": offset
    })
    url = f"https://api.mercadolibre.com/sites/{country_code}/search?{params}"
    
    req = urllib.request.Request(url, headers={
        "Accept": "application/json",
        "Accept-Language": "es-AR"
    })
    
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            return json.loads(resp.read())
    except Exception as e:
        print(f"API error: {e}")
        return None

# search Buenos Aires apartments for sale
result = search_mercadolibre_real_estate("MLA", "departamento venta buenos aires")
if result:
    print(f"total results: {result.get('paging', {}).get('total')}")
    for item in result.get("results", [])[:3]:
        print(f"- {item.get('title')} | ${item.get('price')} {item.get('currency_id')}")

Handling currency and price formats

Latin American real estate pricing varies by country: Brazil uses BRL (reais), Argentina uses ARS and USD (many listings show both), Mexico uses MXN. Mercado Libre’s API returns the currency code alongside the price. Normalize to a common currency at ingestion time using exchange rate data, rather than storing raw values across different currencies without context.
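At its simplest, normalization is a multiply by the current rate keyed on the currency code. The rates below are illustrative placeholders, not live data — in production, fetch them from an FX feed at ingestion time:

```python
# Illustrative normalization to USD. These rates are placeholders --
# pull current rates from an exchange-rate source in production.
EXAMPLE_USD_RATES = {"USD": 1.0, "BRL": 0.20, "ARS": 0.001, "MXN": 0.058}

def normalize_price(price, currency_id, rates=EXAMPLE_USD_RATES):
    rate = rates.get(currency_id)
    if rate is None:
        return None  # unknown currency: keep the raw value and flag for review
    return round(price * rate, 2)
```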

Data quality considerations

Listing data quality on Latin American portals varies more than on European or North American sites. Duplicate listings from multiple agencies are common. Price formats are inconsistent (some listings use dots as thousand separators, others use commas). Area measurements may appear as “m2”, “m²”, or the text “metros”. Build normalization into your extraction pipeline rather than treating raw values as clean data.
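A sketch of a number-normalization helper covering the separator and unit variants above. These are heuristics for common Latin American formats, not an exhaustive parser:

```python
import re

def parse_latam_number(text):
    # Strip unit tokens first so the "2" in "m2" isn't read as a digit.
    s = re.sub(r"m2|m²|metros", "", text, flags=re.IGNORECASE)
    s = re.sub(r"[^\d.,]", "", s)
    if "," in s and "." in s:
        # Whichever separator appears last is the decimal marker.
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # A lone comma with one or two trailing digits is likely a decimal.
        head, _, tail = s.rpartition(",")
        s = head.replace(",", "") + ("." + tail if len(tail) <= 2 else tail)
    elif s.count(".") > 1 or ("." in s and len(s.split(".")[-1]) == 3):
        # Dots acting as thousand separators: 1.234.567 or 450.000.
        s = s.replace(".", "")
    return float(s) if s else None
```

Deduplication across agencies is a separate step; matching on normalized price plus address is a common starting point.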

Understanding the full web scraping pipeline from request to structured storage helps you design for this data quality variance from the start.
