How to Scrape Latin American Real Estate: ImovelWeb, Mercado Libre, and More

Latin America is one of the fastest-growing regions for property investment, and its real estate data landscape is fragmented across dozens of country-specific platforms. Unlike the US market, where Zillow and Realtor.com dominate, LATAM real estate data is spread across platforms like ImovelWeb (Brazil), Mercado Libre Inmuebles (Argentina, Mexico, Colombia), Properati, Lamudi, and numerous local MLS systems.

This guide covers how to scrape the major Latin American real estate platforms, handle their anti-bot protections, structure the data for analysis, and use proxies to maintain reliable access across different countries.

Why Scrape LATAM Real Estate Data

The business cases for Latin American real estate scraping include:

  • Cross-market investment analysis: comparing property prices across countries to identify undervalued markets
  • Rental yield calculations: scraping both sale prices and rental listings to calculate yields
  • Market trend monitoring: tracking price changes over time in specific neighborhoods
  • Competitive intelligence for real estate agencies: understanding competitor listings and pricing strategies
  • Academic research on urbanization: studying housing patterns across developing cities
  • Proptech product development: building data-driven real estate tools for the LATAM market

Major Platforms by Country

Before diving into scraping techniques, here is a map of the key platforms:

| Country   | Primary platforms                     | Secondary platforms     |
| --------- | ------------------------------------- | ----------------------- |
| Brazil    | ImovelWeb, ZAP Imoveis, OLX Brasil    | VivaReal, Imoveis.com   |
| Argentina | Mercado Libre Inmuebles, ZonaProp     | Properati, ArgenProp    |
| Mexico    | Mercado Libre Inmuebles, Inmuebles24  | Lamudi, Vivanuncios     |
| Colombia  | Mercado Libre Inmuebles, FincaRaiz    | Properati, Metrocuadrado |
| Chile     | Portal Inmobiliario, Yapo.cl          | Mercado Libre, TocToc   |
| Peru      | Urbania, AdondeVivir                  | OLX, Mercado Libre      |

Scraping ImovelWeb (Brazil)

ImovelWeb is one of Brazil’s largest real estate platforms. It has moderate anti-bot protection and requires Brazilian IP addresses for some content.

Setting Up the Scraper

import httpx
from selectolax.parser import HTMLParser
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Property:
    title: str
    price: Optional[str]
    currency: str
    location: str
    neighborhood: str
    city: str
    state: str
    area_m2: Optional[float]
    bedrooms: Optional[int]
    bathrooms: Optional[int]
    parking: Optional[int]
    property_type: str
    listing_url: str
    source: str
    scraped_at: str

class ImovelWebScraper:
    def __init__(self, proxy_url: Optional[str] = None):
        self.base_url = "https://www.imovelweb.com.br"
        self.proxy = proxy_url
        self.client = httpx.Client(
            proxy=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                              "AppleWebKit/537.36 (KHTML, like Gecko) "
                              "Chrome/120.0.0.0 Safari/537.36",
                "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            },
            follow_redirects=True,
            timeout=30.0
        )

    def search_properties(self, city: str, property_type: str = "venda",
                          page: int = 1) -> list[Property]:
        """
        search for properties in a given city
        property_type: 'venda' (sale) or 'aluguel' (rent)
        """
        # imovelweb URL pattern
        url = f"{self.base_url}/{city}/{property_type}"
        if page > 1:
            url += f"/pagina-{page}"

        response = self.client.get(url)
        if response.status_code != 200:
            print(f"failed to fetch {url}: {response.status_code}")
            return []

        return self._parse_listing_page(response.text, city)

    def _parse_listing_page(self, html: str, city: str) -> list[Property]:
        """parse a listing page and extract property data"""
        tree = HTMLParser(html)
        properties = []

        # imovelweb uses data attributes for listing cards
        cards = tree.css('[data-qa="posting PROPERTY"]')

        for card in cards:
            try:
                prop = self._parse_card(card, city)
                if prop:
                    properties.append(prop)
            except Exception as e:
                print(f"error parsing card: {e}")
                continue

        return properties

    def _parse_card(self, card, city: str) -> Optional[Property]:
        """extract property data from a listing card"""
        from datetime import datetime

        # extract title
        title_el = card.css_first('[data-qa="POSTING_CARD_DESCRIPTION"]')
        title = title_el.text().strip() if title_el else "untitled"

        # extract price
        price_el = card.css_first('[data-qa="POSTING_CARD_PRICE"]')
        price_text = price_el.text().strip() if price_el else None

        # parse price and currency
        price = None
        currency = "BRL"
        if price_text:
            price = price_text.replace("R$", "").replace(".", "").replace(",", ".").strip()

        # extract location
        location_el = card.css_first('[data-qa="POSTING_CARD_LOCATION"]')
        location = location_el.text().strip() if location_el else ""

        # extract features (area, bedrooms, etc)
        area = self._extract_feature(card, "area")
        bedrooms = self._extract_feature_int(card, "bedrooms")
        bathrooms = self._extract_feature_int(card, "bathrooms")
        parking = self._extract_feature_int(card, "parking")

        # extract link
        link_el = card.css_first("a[href]")
        listing_url = ""
        if link_el:
            href = link_el.attributes.get("href", "")
            listing_url = href if href.startswith("http") else f"{self.base_url}{href}"

        return Property(
            title=title,
            price=price,
            currency=currency,
            location=location,
            neighborhood=self._extract_neighborhood(location),
            city=city,
            state=self._city_to_state(city),
            area_m2=area,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            parking=parking,
            property_type="sale",  # hardcoded; thread the search type through if scraping rentals
            listing_url=listing_url,
            source="imovelweb",
            scraped_at=datetime.utcnow().isoformat()
        )

    def _extract_feature(self, card, feature_type: str) -> Optional[float]:
        """extract a numeric feature from a card"""
        el = card.css_first(f'[data-qa="POSTING_CARD_{feature_type.upper()}"]')
        if el:
            text = el.text().strip()
            # extract numbers from text like "120 m2"
            import re
            numbers = re.findall(r'[\d.,]+', text)
            if numbers:
                return float(numbers[0].replace(",", "."))
        return None

    def _extract_feature_int(self, card, feature_type: str) -> Optional[int]:
        """extract an integer feature"""
        value = self._extract_feature(card, feature_type)
        return int(value) if value is not None else None

    def _extract_neighborhood(self, location: str) -> str:
        """extract neighborhood from location string"""
        parts = location.split(",")
        return parts[0].strip() if parts else ""

    def _city_to_state(self, city: str) -> str:
        """map city slug to Brazilian state"""
        city_state_map = {
            "sao-paulo": "SP",
            "rio-de-janeiro": "RJ",
            "belo-horizonte": "MG",
            "curitiba": "PR",
            "porto-alegre": "RS",
            "brasilia": "DF",
            "salvador": "BA",
            "fortaleza": "CE",
        }
        return city_state_map.get(city, "")

    def scrape_multiple_cities(self, cities: list[str], pages_per_city: int = 5) -> list[Property]:
        """scrape properties across multiple cities"""
        all_properties = []

        for city in cities:
            print(f"scraping {city}...")
            for page in range(1, pages_per_city + 1):
                properties = self.search_properties(city, page=page)
                all_properties.extend(properties)
                print(f"  page {page}: found {len(properties)} properties")
                time.sleep(2)  # respect rate limits

            time.sleep(5)  # longer delay between cities

        return all_properties

Running the ImovelWeb Scraper

# usage with a Brazilian proxy
scraper = ImovelWebScraper(
    proxy_url="http://user-country-br:pass@gate.proxyservice.com:7777"
)

# scrape properties in Sao Paulo and Rio
properties = scraper.scrape_multiple_cities(
    cities=["sao-paulo", "rio-de-janeiro"],
    pages_per_city=3
)

# export to JSON
import json
with open("latam_properties.json", "w", encoding="utf-8") as f:
    json.dump([asdict(p) for p in properties], f, ensure_ascii=False, indent=2)

print(f"scraped {len(properties)} properties total")
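
The same listing often appears on more than one results page, so deduplicating by URL before export is worth a small helper (the function name here is ours, not part of any library):

```python
def dedupe_by_url(properties: list) -> list:
    """Drop repeat listings. Works on any objects with a listing_url
    attribute, or on plain dicts with a 'listing_url' key."""
    seen = set()
    unique = []
    for p in properties:
        url = p.get("listing_url") if isinstance(p, dict) else p.listing_url
        if url and url in seen:
            continue  # already collected this listing on an earlier page
        seen.add(url)
        unique.append(p)
    return unique
```

Run it on the combined list before the JSON export above; listings with an empty URL are kept as-is since they cannot be matched.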

Scraping Mercado Libre Real Estate

Mercado Libre operates across most of Latin America and has a dedicated real estate section (Inmuebles). Its anti-bot protection is more aggressive than ImovelWeb’s.

Handling Mercado Libre’s Protections

Mercado Libre uses several anti-bot techniques:
  • JavaScript-rendered content requiring a headless browser
  • Device fingerprinting
  • Rate limiting tied to both IP and session
  • CAPTCHAs for suspicious patterns
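
One common mitigation for the fingerprinting checks is a Playwright init script that runs before any page JavaScript. Which properties Mercado Libre actually inspects is an assumption on our part; `add_init_script` itself is a real method on Playwright’s `BrowserContext`:

```python
# Sketch of a fingerprint-masking init script. The specific properties
# masked here are a guess at common checks, not documented behavior.
STEALTH_INIT_SCRIPT = """
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
Object.defineProperty(navigator, 'languages', { get: () => ['es-AR', 'es'] });
"""

async def harden_context(context):
    """Register the script on a playwright.async_api BrowserContext so it
    runs in every page before the site's own scripts execute."""
    await context.add_init_script(STEALTH_INIT_SCRIPT)
```

Call `await harden_context(context)` right after `browser.new_context(...)` and before opening pages.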

import asyncio
from playwright.async_api import async_playwright
import json

class MercadoLibreRealEstateScraper:
    def __init__(self, proxy_config: dict = None):
        """
        proxy_config: {
            "server": "http://gate.proxyservice.com:7777",
            "username": "user-country-ar",
            "password": "pass"
        }
        """
        self.proxy_config = proxy_config
        self.base_urls = {
            "argentina": "https://inmuebles.mercadolibre.com.ar",
            "mexico": "https://inmuebles.mercadolibre.com.mx",
            "colombia": "https://inmuebles.mercadolibre.com.co",
        }

    async def scrape_country(self, country: str, max_pages: int = 5) -> list[dict]:
        """scrape real estate listings for a specific country"""
        base_url = self.base_urls.get(country)
        if not base_url:
            raise ValueError(f"unsupported country: {country}")

        async with async_playwright() as p:
            browser_args = {
                "headless": True,
                "args": ["--disable-blink-features=AutomationControlled"]
            }

            if self.proxy_config:
                browser_args["proxy"] = self.proxy_config

            browser = await p.chromium.launch(**browser_args)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                locale={"argentina": "es-AR", "mexico": "es-MX", "colombia": "es-CO"}.get(country, "es-419"),
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                           "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36"
            )

            page = await context.new_page()
            all_listings = []

            for page_num in range(1, max_pages + 1):
                url = f"{base_url}/venta"
                if page_num > 1:
                    offset = (page_num - 1) * 48
                    url += f"/_Desde_{offset + 1}"

                print(f"scraping {country} page {page_num}: {url}")

                try:
                    await page.goto(url, wait_until="networkidle", timeout=60000)
                    await page.wait_for_selector(".ui-search-result", timeout=10000)

                    listings = await self._extract_listings(page, country)
                    all_listings.extend(listings)
                    print(f"  found {len(listings)} listings")

                    # random delay between pages
                    import random
                    await asyncio.sleep(random.uniform(3, 7))

                except Exception as e:
                    print(f"  error on page {page_num}: {e}")
                    continue

            await browser.close()
            return all_listings

    async def _extract_listings(self, page, country: str) -> list[dict]:
        """extract listing data from the current page"""
        listings = await page.evaluate("""
            () => {
                const results = [];
                const cards = document.querySelectorAll('.ui-search-result');

                cards.forEach(card => {
                    const titleEl = card.querySelector('.ui-search-item__title');
                    const priceEl = card.querySelector('.andes-money-amount__fraction');
                    const currencyEl = card.querySelector('.andes-money-amount__currency-symbol');
                    const locationEl = card.querySelector('.ui-search-item__location');
                    const linkEl = card.querySelector('a.ui-search-link');

                    // extract attributes (bedrooms, area, etc)
                    const attrs = {};
                    card.querySelectorAll('.ui-search-card-attributes__attribute').forEach(attr => {
                        attrs[attr.textContent.trim()] = true;
                    });

                    results.push({
                        title: titleEl ? titleEl.textContent.trim() : '',
                        price: priceEl ? priceEl.textContent.trim().replace(/\\./g, '') : null,
                        currency: currencyEl ? currencyEl.textContent.trim() : '',
                        location: locationEl ? locationEl.textContent.trim() : '',
                        url: linkEl ? linkEl.href : '',
                        attributes: Object.keys(attrs),
                    });
                });

                return results;
            }
        """)

        from datetime import datetime

        for listing in listings:
            listing["country"] = country
            listing["source"] = "mercadolibre"
            listing["scraped_at"] = datetime.utcnow().isoformat()

        return listings

# usage
async def main():
    scraper = MercadoLibreRealEstateScraper(
        proxy_config={
            "server": "http://gate.proxyservice.com:7777",
            "username": "user-country-ar",
            "password": "your_password"
        }
    )

    argentina_listings = await scraper.scrape_country("argentina", max_pages=3)
    print(f"total argentina listings: {len(argentina_listings)}")

    with open("mercadolibre_argentina.json", "w", encoding="utf-8") as f:
        json.dump(argentina_listings, f, ensure_ascii=False, indent=2)

asyncio.run(main())

Proxy Strategy for LATAM Scraping

Why Geo-Targeted Proxies Matter

Latin American real estate platforms serve different content based on location:

  • ImovelWeb may block non-Brazilian IPs entirely
  • Mercado Libre shows different listings based on country
  • Some platforms redirect international visitors to a generic page
  • Pricing may be displayed in USD instead of the local currency for foreign IPs

# proxy configuration for multi-country LATAM scraping
PROXY_CONFIG = {
    "gateway": "gate.proxyservice.com",
    "port": 7777,
    "username": "your_user",
    "password": "your_pass",
    "country_codes": {
        "brazil": "br",
        "argentina": "ar",
        "mexico": "mx",
        "colombia": "co",
        "chile": "cl",
        "peru": "pe",
    }
}

def get_proxy_url(country: str) -> str:
    """get a geo-targeted proxy URL for a specific country"""
    config = PROXY_CONFIG
    country_code = config["country_codes"].get(country, "us")
    username = f"{config['username']}-country-{country_code}"
    return f"http://{username}:{config['password']}@{config['gateway']}:{config['port']}"

# example usage
br_proxy = get_proxy_url("brazil")
# http://your_user-country-br:your_pass@gate.proxyservice.com:7777

ar_proxy = get_proxy_url("argentina")
# http://your_user-country-ar:your_pass@gate.proxyservice.com:7777

Proxy Rotation Between Requests

import random

class LatamProxyRotator:
    def __init__(self, proxy_gateway: str, username: str, password: str):
        self.gateway = proxy_gateway
        self.username = username
        self.password = password
        self.session_counter = 0

    def get_proxy(self, country: str) -> str:
        """get a proxy with session rotation for sticky sessions"""
        self.session_counter += 1
        session_id = f"{self.session_counter}-{random.randint(1000, 9999)}"
        username = f"{self.username}-country-{country}-session-{session_id}"
        return f"http://{username}:{self.password}@{self.gateway}"

    def get_proxy_for_platform(self, platform: str) -> str:
        """automatically select the right country proxy for a platform"""
        platform_country = {
            "imovelweb": "br",
            "zapimoveis": "br",
            "mercadolibre_ar": "ar",
            "mercadolibre_mx": "mx",
            "mercadolibre_co": "co",
            "portalinmobiliario": "cl",
            "fincaraiz": "co",
            "urbania": "pe",
        }
        country = platform_country.get(platform, "us")
        return self.get_proxy(country)
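
Residential proxy sessions in the region can drop mid-scrape, so wrapping each fetch in a retry helper with exponential backoff keeps long runs resilient. A generic sketch, not tied to any particular proxy vendor (the function name is illustrative):

```python
import random
import time

def fetch_with_retry(fetch_fn, max_attempts: int = 4, base_delay: float = 1.0):
    """Call fetch_fn() until it succeeds. Sleeps base_delay * 2**attempt
    plus random jitter between failures, and re-raises the last error if
    every attempt fails."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch_fn()
        except Exception as exc:
            last_error = exc
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise last_error
```

Pair it with the rotator, e.g. `fetch_with_retry(lambda: client.get(url))`; if you rebuild the client inside `fetch_fn`, each retry lands on a fresh proxy session.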

Data Normalization Across Platforms

One of the biggest challenges in multi-platform scraping is normalizing data that arrives in different formats and languages.

from dataclasses import dataclass
from typing import Optional
import re

@dataclass
class NormalizedProperty:
    """standardized property format across all LATAM platforms"""
    source_platform: str
    country: str
    city: str
    neighborhood: str
    property_type: str  # apartment, house, land, commercial
    listing_type: str   # sale, rent
    price_local: Optional[float]
    price_usd: Optional[float]
    local_currency: str
    area_m2: Optional[float]
    bedrooms: Optional[int]
    bathrooms: Optional[int]
    parking_spots: Optional[int]
    price_per_m2_local: Optional[float]
    price_per_m2_usd: Optional[float]
    listing_url: str
    scraped_at: str

class PropertyNormalizer:
    # approximate exchange rates (update regularly)
    USD_RATES = {
        "BRL": 5.0,
        "ARS": 900.0,
        "MXN": 17.0,
        "COP": 4000.0,
        "CLP": 900.0,
        "PEN": 3.7,
    }

    PROPERTY_TYPE_MAP = {
        # portuguese
        "apartamento": "apartment",
        "casa": "house",
        "terreno": "land",
        "comercial": "commercial",
        "sala": "commercial",
        "cobertura": "penthouse",
        # spanish
        "departamento": "apartment",
        "local": "commercial",
        "oficina": "office",
        "lote": "land",
    }

    @classmethod
    def normalize(cls, raw: dict, source: str, country: str) -> NormalizedProperty:
        """normalize a raw property dict into standard format"""
        from datetime import datetime

        currency = cls._detect_currency(raw, country)
        price_local = cls._parse_price(raw.get("price"))
        price_usd = cls._convert_to_usd(price_local, currency)
        area = cls._parse_area(raw.get("area_m2") or raw.get("area"))

        price_per_m2_local = None
        price_per_m2_usd = None
        if price_local and area and area > 0:
            price_per_m2_local = round(price_local / area, 2)
            if price_usd:
                price_per_m2_usd = round(price_usd / area, 2)

        return NormalizedProperty(
            source_platform=source,
            country=country,
            city=raw.get("city", ""),
            neighborhood=raw.get("neighborhood", ""),
            property_type=cls._normalize_type(raw.get("property_type", "")),
            listing_type=raw.get("listing_type", "sale"),
            price_local=price_local,
            price_usd=price_usd,
            local_currency=currency,
            area_m2=area,
            bedrooms=cls._parse_int(raw.get("bedrooms")),
            bathrooms=cls._parse_int(raw.get("bathrooms")),
            parking_spots=cls._parse_int(raw.get("parking")),
            price_per_m2_local=price_per_m2_local,
            price_per_m2_usd=price_per_m2_usd,
            listing_url=raw.get("listing_url", ""),
            scraped_at=datetime.utcnow().isoformat()
        )

    @classmethod
    def _detect_currency(cls, raw: dict, country: str) -> str:
        country_currency = {
            "brazil": "BRL",
            "argentina": "ARS",
            "mexico": "MXN",
            "colombia": "COP",
            "chile": "CLP",
            "peru": "PEN",
        }
        return raw.get("currency", country_currency.get(country, "USD"))

    @classmethod
    def _parse_price(cls, price_str) -> Optional[float]:
        if price_str is None:
            return None
        if isinstance(price_str, (int, float)):
            return float(price_str)
        # remove currency symbols and format
        cleaned = re.sub(r'[^\d.,]', '', str(price_str))
        # handle different decimal separators
        if ',' in cleaned and '.' in cleaned:
            # 1.234.567,89 format (BR/AR) or 1,234,567.89 format
            if cleaned.rindex(',') > cleaned.rindex('.'):
                cleaned = cleaned.replace('.', '').replace(',', '.')
            else:
                cleaned = cleaned.replace(',', '')
        elif ',' in cleaned:
            cleaned = cleaned.replace(',', '.')
        try:
            return float(cleaned)
        except ValueError:
            return None

    @classmethod
    def _convert_to_usd(cls, amount: Optional[float], currency: str) -> Optional[float]:
        if amount is None or currency == "USD":
            return amount
        rate = cls.USD_RATES.get(currency)
        if rate:
            return round(amount / rate, 2)
        return None

    @classmethod
    def _normalize_type(cls, raw_type: str) -> str:
        return cls.PROPERTY_TYPE_MAP.get(raw_type.lower(), raw_type.lower())

    @classmethod
    def _parse_int(cls, value) -> Optional[int]:
        if value is None:
            return None
        try:
            return int(float(value))
        except (ValueError, TypeError):
            return None
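
The decimal-separator handling in `_parse_price` is the trickiest part of the normalizer; here it is as a standalone function (the helper name is ours) so the edge cases can be checked in isolation:

```python
import re
from typing import Optional

def parse_latam_price(price_str) -> Optional[float]:
    """Standalone version of the _parse_price logic above."""
    if price_str is None:
        return None
    if isinstance(price_str, (int, float)):
        return float(price_str)
    # strip currency symbols, keep digits and separators
    cleaned = re.sub(r'[^\d.,]', '', str(price_str))
    if ',' in cleaned and '.' in cleaned:
        # whichever separator appears last is the decimal marker
        if cleaned.rindex(',') > cleaned.rindex('.'):
            cleaned = cleaned.replace('.', '').replace(',', '.')  # 1.234.567,89
        else:
            cleaned = cleaned.replace(',', '')                    # 1,234,567.89
    elif ',' in cleaned:
        cleaned = cleaned.replace(',', '.')  # lone comma treated as decimal
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Both `"R$ 1.234.567,89"` and the Anglo-formatted `"$1,234,567.89"` parse to `1234567.89`. One caveat: the lone-comma branch treats `"350,000"` as `350.0`, which is correct for LATAM decimal commas but wrong for Anglo thousands separators, so watch out if a source mixes formats.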

Exporting and Analyzing the Data

Export to CSV for Analysis

import csv
from dataclasses import fields, asdict

def export_to_csv(properties: list[NormalizedProperty], filename: str):
    """export normalized properties to CSV"""
    if not properties:
        print("no properties to export")
        return

    fieldnames = [f.name for f in fields(NormalizedProperty)]

    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for prop in properties:
            writer.writerow(asdict(prop))

    print(f"exported {len(properties)} properties to {filename}")

# usage
export_to_csv(normalized_properties, "latam_real_estate_2026.csv")

Quick Market Analysis

import statistics

def analyze_market(properties: list[NormalizedProperty], city: str) -> dict:
    """quick market analysis for a specific city"""
    city_props = [p for p in properties if p.city.lower() == city.lower() and p.price_usd]

    if not city_props:
        return {"city": city, "error": "no properties found"}

    prices = [p.price_usd for p in city_props]
    prices_per_m2 = [p.price_per_m2_usd for p in city_props if p.price_per_m2_usd]
    areas = [p.area_m2 for p in city_props if p.area_m2]

    return {
        "city": city,
        "total_listings": len(city_props),
        "median_price_usd": round(statistics.median(prices), 2),
        "mean_price_usd": round(statistics.mean(prices), 2),
        "min_price_usd": min(prices),
        "max_price_usd": max(prices),
        "median_price_per_m2_usd": round(statistics.median(prices_per_m2), 2) if prices_per_m2 else None,
        "avg_area_m2": round(statistics.mean(areas), 1) if areas else None,
    }

# compare cities
for city in ["sao-paulo", "buenos-aires", "mexico-city", "bogota"]:
    analysis = analyze_market(normalized_properties, city)
    print(f"\n{city}:")
    for key, value in analysis.items():
        print(f"  {key}: {value}")

Legal Considerations by Country

Each country has different data protection regulations:

  • Brazil (LGPD): similar to the GDPR. Scraping publicly listed property data is generally acceptable, but avoid scraping personal contact information of agents or owners.
  • Argentina: the personal data law (Ley 25.326) protects personal information. Property listing data itself is typically fine.
  • Mexico: the LFPDPPP protects personal data. Real estate platforms may have specific terms against scraping.
  • Colombia: Law 1581 on personal data protection applies. Publicly available listings are generally fair game.

General recommendations:

  • Only scrape publicly visible listing data
  • Do not scrape agent phone numbers or email addresses for marketing purposes
  • Respect robots.txt for each platform
  • Implement reasonable rate limits to avoid disrupting services
  • Keep scraped data for analysis purposes, not republication
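
The robots.txt check is easy to automate with the standard library’s `urllib.robotparser`; a minimal helper that works on already-fetched robots.txt content (the example rules below are illustrative, not any platform’s actual policy):

```python
from urllib import robotparser

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether user_agent may crawl url, given the text of a site's
    robots.txt (fetch https://<site>/robots.txt yourself beforehand)."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Allow: /
"""
```

With these example rules, `is_allowed(EXAMPLE_ROBOTS, "my-scraper", "https://example.com/venda")` is permitted while anything under `/admin/` is refused.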

Conclusion

Scraping Latin American real estate data requires a multi-platform, multi-country approach with geo-targeted proxies and careful data normalization. The key challenges are handling the different anti-bot protections across platforms, dealing with multiple languages and currencies, and maintaining consistent data quality.

Start with one platform and one country (ImovelWeb in Brazil is a good first target because of its relatively light anti-bot measures), get your normalization pipeline working, and then expand to additional platforms. The proxy investment pays for itself quickly, since most LATAM platforms block international IPs outright.

The code examples in this guide provide a solid foundation. Adapt the selectors to match current page structures (they change periodically) and always test with a small number of requests before scaling up.
