How to Scrape Latin American Real Estate: ImovelWeb, Mercado Libre, and More
Latin America is one of the fastest-growing regions for property investment, and its real estate data landscape is fragmented across dozens of country-specific platforms. Unlike the US market, where Zillow and Realtor.com dominate, LATAM real estate data is spread across platforms like ImovelWeb (Brazil), Mercado Libre Inmuebles (Argentina, Mexico, Colombia), Properati, Lamudi, and numerous local MLS systems.
This guide covers how to scrape the major Latin American real estate platforms, handle their anti-bot protections, structure the data for analysis, and use proxies to maintain reliable access across countries.
Why Scrape LATAM Real Estate Data
The business cases for Latin American real estate scraping include:
- Cross-market investment analysis: comparing property prices across countries to identify undervalued markets
- Rental yield calculations: scraping both sale prices and rental listings to compute yields
- Market trend monitoring: tracking price changes over time in specific neighborhoods
- Competitive intelligence for real estate agencies: understanding competitor listings and pricing strategies
- Academic research on urbanization: studying housing patterns across developing cities
- Proptech product development: building data-driven real estate tools for the LATAM market
Major Platforms by Country
Before diving into scraping techniques, here is a map of the key platforms:
| Country | Primary platforms | Secondary platforms |
|---|---|---|
| Brazil | ImovelWeb, ZAP Imoveis, OLX Brasil | VivaReal, Imoveis.com |
| Argentina | Mercado Libre Inmuebles, ZonaProp | Properati, ArgenProp |
| Mexico | Mercado Libre Inmuebles, Inmuebles24 | Lamudi, Vivanuncios |
| Colombia | Mercado Libre Inmuebles, FincaRaiz | Properati, Metrocuadrado |
| Chile | Portal Inmobiliario, Yapo.cl | Mercado Libre, TocToc |
| Peru | Urbania, AdondeVivir | OLX, Mercado Libre |
Scraping ImovelWeb (Brazil)
ImovelWeb is one of Brazil’s largest real estate platforms. It has moderate anti-bot protection and requires Brazilian IP addresses for some content.
Setting Up the Scraper
```python
import httpx
from selectolax.parser import HTMLParser
import json
import re
import time
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class Property:
    title: str
    price: Optional[str]
    currency: str
    location: str
    neighborhood: str
    city: str
    state: str
    area_m2: Optional[float]
    bedrooms: Optional[int]
    bathrooms: Optional[int]
    parking: Optional[int]
    property_type: str
    listing_url: str
    source: str
    scraped_at: str


class ImovelWebScraper:
    def __init__(self, proxy_url: Optional[str] = None):
        self.base_url = "https://www.imovelweb.com.br"
        self.proxy = proxy_url
        self.client = httpx.Client(
            proxy=self.proxy,
            headers={
                "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                              "AppleWebKit/537.36 (KHTML, like Gecko) "
                              "Chrome/120.0.0.0 Safari/537.36",
                "Accept-Language": "pt-BR,pt;q=0.9,en;q=0.8",
                "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            },
            follow_redirects=True,
            timeout=30.0,
        )

    def search_properties(self, city: str, property_type: str = "venda",
                          page: int = 1) -> list[Property]:
        """Search for properties in a given city.

        property_type: 'venda' (sale) or 'aluguel' (rent)
        """
        # ImovelWeb URL pattern
        url = f"{self.base_url}/{city}/{property_type}"
        if page > 1:
            url += f"/pagina-{page}"
        response = self.client.get(url)
        if response.status_code != 200:
            print(f"failed to fetch {url}: {response.status_code}")
            return []
        return self._parse_listing_page(response.text, city)

    def _parse_listing_page(self, html: str, city: str) -> list[Property]:
        """Parse a listing page and extract property data."""
        tree = HTMLParser(html)
        properties = []
        # ImovelWeb uses data attributes for listing cards
        cards = tree.css('[data-qa="posting PROPERTY"]')
        for card in cards:
            try:
                prop = self._parse_card(card, city)
                if prop:
                    properties.append(prop)
            except Exception as e:
                print(f"error parsing card: {e}")
                continue
        return properties

    def _parse_card(self, card, city: str) -> Optional[Property]:
        """Extract property data from a listing card."""
        # extract title
        title_el = card.css_first('[data-qa="POSTING_CARD_DESCRIPTION"]')
        title = title_el.text().strip() if title_el else "untitled"
        # extract price
        price_el = card.css_first('[data-qa="POSTING_CARD_PRICE"]')
        price_text = price_el.text().strip() if price_el else None
        # parse price and currency, e.g. "R$ 1.500.000" -> "1500000"
        price = None
        currency = "BRL"
        if price_text:
            price = price_text.replace("R$", "").replace(".", "").replace(",", ".").strip()
        # extract location
        location_el = card.css_first('[data-qa="POSTING_CARD_LOCATION"]')
        location = location_el.text().strip() if location_el else ""
        # extract features (area, bedrooms, etc.)
        area = self._extract_feature(card, "area")
        bedrooms = self._extract_feature_int(card, "bedrooms")
        bathrooms = self._extract_feature_int(card, "bathrooms")
        parking = self._extract_feature_int(card, "parking")
        # extract link
        link_el = card.css_first("a[href]")
        listing_url = ""
        if link_el:
            href = link_el.attributes.get("href", "")
            listing_url = href if href.startswith("http") else f"{self.base_url}{href}"
        return Property(
            title=title,
            price=price,
            currency=currency,
            location=location,
            neighborhood=self._extract_neighborhood(location),
            city=city,
            state=self._city_to_state(city),
            area_m2=area,
            bedrooms=bedrooms,
            bathrooms=bathrooms,
            parking=parking,
            property_type="sale",
            listing_url=listing_url,
            source="imovelweb",
            scraped_at=datetime.utcnow().isoformat(),
        )

    def _extract_feature(self, card, feature_type: str) -> Optional[float]:
        """Extract a numeric feature from a card."""
        el = card.css_first(f'[data-qa="POSTING_CARD_{feature_type.upper()}"]')
        if el:
            text = el.text().strip()
            # extract numbers from text like "120 m2"
            numbers = re.findall(r"[\d.,]+", text)
            if numbers:
                return float(numbers[0].replace(",", "."))
        return None

    def _extract_feature_int(self, card, feature_type: str) -> Optional[int]:
        """Extract an integer feature."""
        value = self._extract_feature(card, feature_type)
        return int(value) if value is not None else None

    def _extract_neighborhood(self, location: str) -> str:
        """Extract the neighborhood from a location string."""
        parts = location.split(",")
        return parts[0].strip() if parts else ""

    def _city_to_state(self, city: str) -> str:
        """Map a city slug to its Brazilian state."""
        city_state_map = {
            "sao-paulo": "SP",
            "rio-de-janeiro": "RJ",
            "belo-horizonte": "MG",
            "curitiba": "PR",
            "porto-alegre": "RS",
            "brasilia": "DF",
            "salvador": "BA",
            "fortaleza": "CE",
        }
        return city_state_map.get(city, "")

    def scrape_multiple_cities(self, cities: list[str],
                               pages_per_city: int = 5) -> list[Property]:
        """Scrape properties across multiple cities."""
        all_properties = []
        for city in cities:
            print(f"scraping {city}...")
            for page in range(1, pages_per_city + 1):
                properties = self.search_properties(city, page=page)
                all_properties.extend(properties)
                print(f"  page {page}: found {len(properties)} properties")
                time.sleep(2)  # respect rate limits
            time.sleep(5)  # longer delay between cities
        return all_properties
```
Running the ImovelWeb Scraper
```python
# usage with a Brazilian proxy
scraper = ImovelWebScraper(
    proxy_url="http://user-country-br:pass@gate.proxyservice.com:7777"
)

# scrape properties in Sao Paulo and Rio
properties = scraper.scrape_multiple_cities(
    cities=["sao-paulo", "rio-de-janeiro"],
    pages_per_city=3,
)

# export to JSON
import json
from dataclasses import asdict

with open("latam_properties.json", "w", encoding="utf-8") as f:
    json.dump([asdict(p) for p in properties], f, ensure_ascii=False, indent=2)

print(f"scraped {len(properties)} properties total")
```
Scraping Mercado Libre Real Estate
Mercado Libre operates across most of Latin America and has a dedicated real estate section (Inmuebles). Its anti-bot protection is more aggressive than ImovelWeb’s.
Handling Mercado Libre’s Protections
Mercado Libre uses several anti-bot techniques:
- JavaScript-rendered content that requires a headless browser
- Device fingerprinting
- Rate limiting tied to both IP and session
- CAPTCHAs triggered by suspicious patterns
```python
import asyncio
import json
import random
from datetime import datetime
from typing import Optional

from playwright.async_api import async_playwright


class MercadoLibreRealEstateScraper:
    def __init__(self, proxy_config: Optional[dict] = None):
        """
        proxy_config: {
            "server": "http://gate.proxyservice.com:7777",
            "username": "user-country-ar",
            "password": "pass"
        }
        """
        self.proxy_config = proxy_config
        self.base_urls = {
            "argentina": "https://inmuebles.mercadolibre.com.ar",
            "mexico": "https://inmuebles.mercadolibre.com.mx",
            "colombia": "https://inmuebles.mercadolibre.com.co",
        }

    async def scrape_country(self, country: str, max_pages: int = 5) -> list[dict]:
        """Scrape real estate listings for a specific country."""
        base_url = self.base_urls.get(country)
        if not base_url:
            raise ValueError(f"unsupported country: {country}")
        async with async_playwright() as p:
            browser_args = {
                "headless": True,
                "args": ["--disable-blink-features=AutomationControlled"],
            }
            if self.proxy_config:
                browser_args["proxy"] = self.proxy_config
            browser = await p.chromium.launch(**browser_args)
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                locale="es-AR" if country == "argentina" else "es-MX",
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                           "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
            )
            page = await context.new_page()
            all_listings = []
            for page_num in range(1, max_pages + 1):
                url = f"{base_url}/venta"
                if page_num > 1:
                    # Mercado Libre paginates in blocks of 48 results
                    offset = (page_num - 1) * 48
                    url += f"/_Desde_{offset + 1}"
                print(f"scraping {country} page {page_num}: {url}")
                try:
                    await page.goto(url, wait_until="networkidle", timeout=60000)
                    await page.wait_for_selector(".ui-search-result", timeout=10000)
                    listings = await self._extract_listings(page, country)
                    all_listings.extend(listings)
                    print(f"  found {len(listings)} listings")
                    # random delay between pages
                    await asyncio.sleep(random.uniform(3, 7))
                except Exception as e:
                    print(f"  error on page {page_num}: {e}")
                    continue
            await browser.close()
        return all_listings

    async def _extract_listings(self, page, country: str) -> list[dict]:
        """Extract listing data from the current page."""
        listings = await page.evaluate("""
            () => {
                const results = [];
                const cards = document.querySelectorAll('.ui-search-result');
                cards.forEach(card => {
                    const titleEl = card.querySelector('.ui-search-item__title');
                    const priceEl = card.querySelector('.andes-money-amount__fraction');
                    const currencyEl = card.querySelector('.andes-money-amount__currency-symbol');
                    const locationEl = card.querySelector('.ui-search-item__location');
                    const linkEl = card.querySelector('a.ui-search-link');
                    // extract attributes (bedrooms, area, etc.)
                    const attrs = {};
                    card.querySelectorAll('.ui-search-card-attributes__attribute').forEach(attr => {
                        attrs[attr.textContent.trim()] = true;
                    });
                    results.push({
                        title: titleEl ? titleEl.textContent.trim() : '',
                        price: priceEl ? priceEl.textContent.trim().replace(/\\./g, '') : null,
                        currency: currencyEl ? currencyEl.textContent.trim() : '',
                        location: locationEl ? locationEl.textContent.trim() : '',
                        url: linkEl ? linkEl.href : '',
                        attributes: Object.keys(attrs),
                    });
                });
                return results;
            }
        """)
        for listing in listings:
            listing["country"] = country
            listing["source"] = "mercadolibre"
            listing["scraped_at"] = datetime.utcnow().isoformat()
        return listings


# usage
async def main():
    scraper = MercadoLibreRealEstateScraper(
        proxy_config={
            "server": "http://gate.proxyservice.com:7777",
            "username": "user-country-ar",
            "password": "your_password",
        }
    )
    argentina_listings = await scraper.scrape_country("argentina", max_pages=3)
    print(f"total argentina listings: {len(argentina_listings)}")
    with open("mercadolibre_argentina.json", "w", encoding="utf-8") as f:
        json.dump(argentina_listings, f, ensure_ascii=False, indent=2)


asyncio.run(main())
```
Proxy Strategy for LATAM Scraping
Why Geo-Targeted Proxies Matter
Latin American real estate platforms serve different content based on the visitor’s location:
- ImovelWeb may block non-Brazilian IPs entirely
- Mercado Libre shows different listings depending on the country
- Some platforms redirect international visitors to a generic page
- Pricing may be displayed in USD instead of the local currency for foreign IPs
Recommended Proxy Configuration
```python
# proxy configuration for multi-country LATAM scraping
PROXY_CONFIG = {
    "gateway": "gate.proxyservice.com",
    "port": 7777,
    "username": "your_user",
    "password": "your_pass",
    "country_codes": {
        "brazil": "br",
        "argentina": "ar",
        "mexico": "mx",
        "colombia": "co",
        "chile": "cl",
        "peru": "pe",
    },
}


def get_proxy_url(country: str) -> str:
    """Get a geo-targeted proxy URL for a specific country."""
    config = PROXY_CONFIG
    country_code = config["country_codes"].get(country, "us")
    username = f"{config['username']}-country-{country_code}"
    return f"http://{username}:{config['password']}@{config['gateway']}:{config['port']}"


# example usage
br_proxy = get_proxy_url("brazil")
# http://your_user-country-br:your_pass@gate.proxyservice.com:7777
ar_proxy = get_proxy_url("argentina")
# http://your_user-country-ar:your_pass@gate.proxyservice.com:7777
```
Proxy Rotation Between Requests
```python
import random


class LatamProxyRotator:
    def __init__(self, proxy_gateway: str, username: str, password: str):
        # proxy_gateway should include the port, e.g. "gate.proxyservice.com:7777"
        self.gateway = proxy_gateway
        self.username = username
        self.password = password
        self.session_counter = 0

    def get_proxy(self, country: str) -> str:
        """Get a proxy with session rotation for sticky sessions."""
        self.session_counter += 1
        session_id = f"{self.session_counter}-{random.randint(1000, 9999)}"
        username = f"{self.username}-country-{country}-session-{session_id}"
        return f"http://{username}:{self.password}@{self.gateway}"

    def get_proxy_for_platform(self, platform: str) -> str:
        """Automatically select the right country proxy for a platform."""
        platform_country = {
            "imovelweb": "br",
            "zapimoveis": "br",
            "mercadolibre_ar": "ar",
            "mercadolibre_mx": "mx",
            "mercadolibre_co": "co",
            "portalinmobiliario": "cl",
            "fincaraiz": "co",
            "urbania": "pe",
        }
        country = platform_country.get(platform, "us")
        return self.get_proxy(country)
```
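To see what the resulting URLs look like without wiring up a client, here is a standalone sketch of the sticky-session username scheme (the gateway and credentials are placeholders; real providers each have their own parameter syntax, so treat this as illustrative):

```python
import random


def build_sticky_proxy(username: str, password: str, gateway: str,
                       country: str, counter: int) -> str:
    """Build a geo-targeted sticky-session proxy URL: the country and a unique
    session tag are encoded into the proxy username."""
    session_id = f"{counter}-{random.randint(1000, 9999)}"
    user = f"{username}-country-{country}-session-{session_id}"
    return f"http://{user}:{password}@{gateway}"


url = build_sticky_proxy("your_user", "your_pass",
                         "gate.proxyservice.com:7777", "br", 1)
print(url)
```

Because the session tag changes on every call, the provider assigns a fresh exit IP per request while keeping the same gateway endpoint.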
Data Normalization Across Platforms
One of the biggest challenges in multi-platform scraping is normalizing data that arrives in different formats and languages.
```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class NormalizedProperty:
    """Standardized property format across all LATAM platforms."""
    source_platform: str
    country: str
    city: str
    neighborhood: str
    property_type: str  # apartment, house, land, commercial
    listing_type: str   # sale, rent
    price_local: Optional[float]
    price_usd: Optional[float]
    local_currency: str
    area_m2: Optional[float]
    bedrooms: Optional[int]
    bathrooms: Optional[int]
    parking_spots: Optional[int]
    price_per_m2_local: Optional[float]
    price_per_m2_usd: Optional[float]
    listing_url: str
    scraped_at: str


class PropertyNormalizer:
    # approximate exchange rates (update regularly)
    USD_RATES = {
        "BRL": 5.0,
        "ARS": 900.0,
        "MXN": 17.0,
        "COP": 4000.0,
        "CLP": 900.0,
        "PEN": 3.7,
    }

    PROPERTY_TYPE_MAP = {
        # Portuguese
        "apartamento": "apartment",
        "casa": "house",
        "terreno": "land",
        "comercial": "commercial",
        "sala": "commercial",
        "cobertura": "penthouse",
        # Spanish
        "departamento": "apartment",
        "local": "commercial",
        "oficina": "office",
        "lote": "land",
    }

    @classmethod
    def normalize(cls, raw: dict, source: str, country: str) -> NormalizedProperty:
        """Normalize a raw property dict into the standard format."""
        currency = cls._detect_currency(raw, country)
        price_local = cls._parse_price(raw.get("price"))
        price_usd = cls._convert_to_usd(price_local, currency)
        area = cls._parse_area(raw.get("area_m2") or raw.get("area"))
        price_per_m2_local = None
        price_per_m2_usd = None
        if price_local and area and area > 0:
            price_per_m2_local = round(price_local / area, 2)
            if price_usd:
                price_per_m2_usd = round(price_usd / area, 2)
        return NormalizedProperty(
            source_platform=source,
            country=country,
            city=raw.get("city", ""),
            neighborhood=raw.get("neighborhood", ""),
            property_type=cls._normalize_type(raw.get("property_type", "")),
            listing_type=raw.get("listing_type", "sale"),
            price_local=price_local,
            price_usd=price_usd,
            local_currency=currency,
            area_m2=area,
            bedrooms=cls._parse_int(raw.get("bedrooms")),
            bathrooms=cls._parse_int(raw.get("bathrooms")),
            parking_spots=cls._parse_int(raw.get("parking")),
            price_per_m2_local=price_per_m2_local,
            price_per_m2_usd=price_per_m2_usd,
            listing_url=raw.get("listing_url", ""),
            scraped_at=datetime.utcnow().isoformat(),
        )

    @classmethod
    def _detect_currency(cls, raw: dict, country: str) -> str:
        country_currency = {
            "brazil": "BRL",
            "argentina": "ARS",
            "mexico": "MXN",
            "colombia": "COP",
            "chile": "CLP",
            "peru": "PEN",
        }
        return raw.get("currency", country_currency.get(country, "USD"))

    @classmethod
    def _parse_price(cls, price_str) -> Optional[float]:
        if price_str is None:
            return None
        if isinstance(price_str, (int, float)):
            return float(price_str)
        # remove currency symbols and whitespace
        cleaned = re.sub(r"[^\d.,]", "", str(price_str))
        # handle different decimal separators
        if "," in cleaned and "." in cleaned:
            # 1.234.567,89 (BR/AR) vs 1,234,567.89 (MX/US)
            if cleaned.rindex(",") > cleaned.rindex("."):
                cleaned = cleaned.replace(".", "").replace(",", ".")
            else:
                cleaned = cleaned.replace(",", "")
        elif cleaned.count(",") > 1:
            # comma-only thousands separators, e.g. "1,500,000"
            cleaned = cleaned.replace(",", "")
        elif "," in cleaned:
            # single comma as decimal separator, e.g. "1500,50"
            cleaned = cleaned.replace(",", ".")
        elif cleaned.count(".") > 1:
            # dot-only thousands separators, e.g. "1.500.000"
            cleaned = cleaned.replace(".", "")
        try:
            return float(cleaned)
        except ValueError:
            return None

    @classmethod
    def _parse_area(cls, value) -> Optional[float]:
        """Parse an area value such as 120, '120', or '120 m2'."""
        if value is None:
            return None
        if isinstance(value, (int, float)):
            return float(value)
        match = re.search(r"[\d.,]+", str(value))
        if not match:
            return None
        return cls._parse_price(match.group(0))

    @classmethod
    def _convert_to_usd(cls, amount: Optional[float], currency: str) -> Optional[float]:
        if amount is None or currency == "USD":
            return amount
        rate = cls.USD_RATES.get(currency)
        if rate:
            return round(amount / rate, 2)
        return None

    @classmethod
    def _normalize_type(cls, raw_type: str) -> str:
        return cls.PROPERTY_TYPE_MAP.get(raw_type.lower(), raw_type.lower())

    @classmethod
    def _parse_int(cls, value) -> Optional[int]:
        if value is None:
            return None
        try:
            return int(float(value))
        except (ValueError, TypeError):
            return None
```
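Price strings are the trickiest field to normalize, since BR/AR listings use dots for thousands and commas for decimals while MX listings do the reverse. A standalone sketch of the separator heuristics, easy to test in isolation (function name is illustrative):

```python
import re
from typing import Optional


def parse_latam_price(price_str) -> Optional[float]:
    """Parse LATAM price strings with mixed separator conventions."""
    if price_str is None:
        return None
    if isinstance(price_str, (int, float)):
        return float(price_str)
    cleaned = re.sub(r"[^\d.,]", "", str(price_str))
    if "," in cleaned and "." in cleaned:
        if cleaned.rindex(",") > cleaned.rindex("."):
            cleaned = cleaned.replace(".", "").replace(",", ".")  # 1.234.567,89
        else:
            cleaned = cleaned.replace(",", "")  # 1,234,567.89
    elif cleaned.count(",") > 1:
        cleaned = cleaned.replace(",", "")  # 1,500,000
    elif "," in cleaned:
        cleaned = cleaned.replace(",", ".")  # 1500,50
    elif cleaned.count(".") > 1:
        cleaned = cleaned.replace(".", "")  # 1.500.000
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_latam_price("R$ 1.500.000"))      # 1500000.0
print(parse_latam_price("US$ 1,250,000.50"))  # 1250000.5
```

Ambiguous inputs like "1.500" (fifteen hundred or one and a half?) cannot be resolved without knowing the source platform, so keep the raw string alongside the parsed value.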
Exporting and Analyzing the Data
Export to CSV for Analysis
```python
import csv
from dataclasses import fields, asdict


def export_to_csv(properties: list[NormalizedProperty], filename: str):
    """Export normalized properties to CSV."""
    if not properties:
        print("no properties to export")
        return
    fieldnames = [f.name for f in fields(NormalizedProperty)]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for prop in properties:
            writer.writerow(asdict(prop))
    print(f"exported {len(properties)} properties to {filename}")


# usage
export_to_csv(normalized_properties, "latam_real_estate_2026.csv")
```
Quick Market Analysis
```python
import statistics


def analyze_market(properties: list[NormalizedProperty], city: str) -> dict:
    """Quick market analysis for a specific city."""
    city_props = [p for p in properties
                  if p.city.lower() == city.lower() and p.price_usd]
    if not city_props:
        return {"city": city, "error": "no properties found"}
    prices = [p.price_usd for p in city_props]
    prices_per_m2 = [p.price_per_m2_usd for p in city_props if p.price_per_m2_usd]
    areas = [p.area_m2 for p in city_props if p.area_m2]
    return {
        "city": city,
        "total_listings": len(city_props),
        "median_price_usd": round(statistics.median(prices), 2),
        "mean_price_usd": round(statistics.mean(prices), 2),
        "min_price_usd": min(prices),
        "max_price_usd": max(prices),
        "median_price_per_m2_usd": round(statistics.median(prices_per_m2), 2) if prices_per_m2 else None,
        "avg_area_m2": round(statistics.mean(areas), 1) if areas else None,
    }


# compare cities
for city in ["sao-paulo", "buenos-aires", "mexico-city", "bogota"]:
    analysis = analyze_market(normalized_properties, city)
    print(f"\n{city}:")
    for key, value in analysis.items():
        print(f"  {key}: {value}")
```
Legal Considerations for LATAM Scraping
Each country has different data protection regulations:
- Brazil (LGPD): Similar to GDPR. Scraping publicly listed property data is generally acceptable, but avoid scraping personal contact information of agents or owners.
- Argentina: The personal data law (Ley 25.326) protects personal information. Property listing data itself is typically fine.
- Mexico: The LFPDPPP protects personal data. Real estate platforms may have specific terms against scraping.
- Colombia: Law 1581 governs personal data protection. Publicly available listings are generally fair game.
General recommendations:
- Only scrape publicly visible listing data
- Do not scrape agent phone numbers or email addresses for marketing purposes
- Respect robots.txt for each platform
- Implement reasonable rate limits to avoid disrupting services
- Keep scraped data for analysis, not republication
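The robots.txt check can be automated before each crawl. A minimal sketch using the standard library's `urllib.robotparser` (the robots.txt body below is illustrative, not ImovelWeb's actual policy):

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a fetched robots.txt body before scraping a URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


# illustrative robots.txt body
robots = """User-agent: *
Disallow: /admin
"""
print(is_allowed(robots, "https://www.imovelweb.com.br/sao-paulo/venda"))  # True
print(is_allowed(robots, "https://www.imovelweb.com.br/admin"))            # False
```

Fetch each platform's real robots.txt once per run and cache the parsed result rather than downloading it before every request.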
Conclusion
Scraping Latin American real estate data requires a multi-platform, multi-country approach with geo-targeted proxies and careful data normalization. The key challenges are handling the different anti-bot protections across platforms, dealing with multiple languages and currencies, and maintaining consistent data quality.
Start with one platform in one country (ImovelWeb in Brazil is a good first target thanks to its relatively light anti-bot measures), get your normalization pipeline working, and then expand to additional platforms. The proxy investment pays for itself quickly, since many LATAM platforms block international IPs outright.
The code examples in this guide provide a solid foundation. Adapt the selectors to match current page structures (they change periodically), and always test with a small number of requests before scaling up.