How to Scrape MercadoLibre Product Data in 2026
MercadoLibre is Latin America’s largest e-commerce and fintech platform, operating in 18 countries including Argentina, Brazil, Mexico, Colombia, and Chile. With over 148 million active users and a dominant market position across the region, MercadoLibre is the definitive data source for Latin American e-commerce intelligence, competitive pricing analysis, and regional market research.
This guide covers how to scrape MercadoLibre product data using Python, leverage their public API, and integrate proxies for reliable large-scale extraction.
What Data Can You Extract from MercadoLibre?
MercadoLibre provides extensive product and seller data:
- Product titles and descriptions
- Pricing (in local currencies across markets)
- Seller information (reputation, sales history, location)
- Product condition (new, used, refurbished)
- Shipping details (free shipping, fulfillment type)
- Customer ratings and reviews
- Product attributes and specifications
- Category hierarchy
- Listing type (auction, buy-it-now)
Example JSON Output
```json
{
  "id": "MLA1234567890",
  "title": "iPhone 15 Pro Max 256GB - Nuevo Sellado",
  "price": 1899999,
  "currency": "ARS",
  "condition": "new",
  "sold_quantity": 342,
  "available_quantity": 15,
  "seller": {
    "id": 987654321,
    "nickname": "TECNOSTORE_AR",
    "reputation": "platinum",
    "transactions": 15420,
    "positive_ratings": 99.2
  },
  "shipping": {
    "free_shipping": true,
    "fulfillment": "mercadoenvios"
  },
  "categories": ["Celulares y Smartphones", "iPhone"],
  "location": "Buenos Aires, Argentina",
  "url": "https://www.mercadolibre.com.ar/p/MLA1234567890"
}
```

Prerequisites

```shell
pip install requests beautifulsoup4 lxml fake-useragent
```

MercadoLibre has a public API that provides structured data access. For web scraping beyond API limits, residential proxies from Latin American IP pools are recommended.
Method 1: Using MercadoLibre’s Public API
MercadoLibre offers one of the most accessible public APIs among e-commerce platforms. No authentication is required for basic search and product data.
```python
import requests
import json
import time
import random


class MercadoLibreAPIScraper:
    def __init__(self, site_id="MLA", proxy_url=None):
        """
        site_id options:
        MLA = Argentina, MLB = Brazil, MLM = Mexico,
        MCO = Colombia, MLC = Chile, MLU = Uruguay
        """
        self.site_id = site_id
        self.api_url = "https://api.mercadolibre.com"
        self.proxy_url = proxy_url
        self.session = requests.Session()

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, limit=50, offset=0):
        """Search products using the MercadoLibre API."""
        url = f"{self.api_url}/sites/{self.site_id}/search"
        params = {
            "q": query,
            "limit": min(limit, 50),
            "offset": offset,
        }
        try:
            response = self.session.get(
                url,
                params=params,
                proxies=self._get_proxies(),
                timeout=30,
            )
            response.raise_for_status()
            data = response.json()
            products = []
            for item in data.get("results", []):
                product = {
                    "id": item.get("id"),
                    "title": item.get("title"),
                    "price": item.get("price"),
                    "currency": item.get("currency_id"),
                    "condition": item.get("condition"),
                    "sold_quantity": item.get("sold_quantity"),
                    "available_quantity": item.get("available_quantity"),
                    "permalink": item.get("permalink"),
                    "thumbnail": item.get("thumbnail"),
                    "seller_id": item.get("seller", {}).get("id"),
                    "free_shipping": item.get("shipping", {}).get("free_shipping"),
                    "category_id": item.get("category_id"),
                    "listing_type": item.get("listing_type_id"),
                }
                products.append(product)
            total_results = data.get("paging", {}).get("total", 0)
            return products, total_results
        except requests.RequestException as e:
            print(f"Error: {e}")
            return [], 0

    def search_all(self, query, max_results=500):
        """Paginate through all search results."""
        all_products = []
        offset = 0
        while offset < max_results:
            products, total = self.search_products(query, limit=50, offset=offset)
            if not products:
                break
            all_products.extend(products)
            offset += 50
            print(f"Fetched {len(all_products)}/{min(total, max_results)} products")
            time.sleep(random.uniform(0.5, 1.5))
        return all_products

    def get_product_detail(self, item_id):
        """Get detailed product information by item ID."""
        url = f"{self.api_url}/items/{item_id}"
        try:
            response = self.session.get(
                url,
                proxies=self._get_proxies(),
                timeout=30,
            )
            response.raise_for_status()
            data = response.json()
            return {
                "id": data.get("id"),
                "title": data.get("title"),
                "price": data.get("price"),
                "currency": data.get("currency_id"),
                "condition": data.get("condition"),
                "sold_quantity": data.get("sold_quantity"),
                "available_quantity": data.get("available_quantity"),
                "listing_type": data.get("listing_type_id"),
                "permalink": data.get("permalink"),
                "pictures": [pic.get("url") for pic in data.get("pictures", [])],
                "attributes": {
                    attr.get("name"): attr.get("value_name")
                    for attr in data.get("attributes", [])
                    if attr.get("value_name")
                },
                "warranty": data.get("warranty"),
                "seller_id": data.get("seller_id"),
                "date_created": data.get("date_created"),
                "last_updated": data.get("last_updated"),
            }
        except requests.RequestException as e:
            print(f"Error getting product {item_id}: {e}")
            return None

    def get_product_description(self, item_id):
        """Get product description text."""
        url = f"{self.api_url}/items/{item_id}/description"
        try:
            response = self.session.get(url, proxies=self._get_proxies(), timeout=30)
            response.raise_for_status()
            data = response.json()
            return data.get("plain_text") or data.get("text")
        except Exception:
            return None

    def get_seller_info(self, seller_id):
        """Get seller reputation and details."""
        url = f"{self.api_url}/users/{seller_id}"
        try:
            response = self.session.get(url, proxies=self._get_proxies(), timeout=30)
            response.raise_for_status()
            data = response.json()
            reputation = data.get("seller_reputation", {})
            return {
                "id": data.get("id"),
                "nickname": data.get("nickname"),
                "registration_date": data.get("registration_date"),
                "power_seller_status": reputation.get("power_seller_status"),
                "level_id": reputation.get("level_id"),
                "transactions_completed": reputation.get("transactions", {}).get("completed"),
                "positive_ratings": reputation.get("transactions", {}).get("ratings", {}).get("positive"),
            }
        except Exception as e:
            print(f"Error: {e}")
            return None

    def get_reviews(self, item_id, limit=50):
        """Get product reviews."""
        url = f"{self.api_url}/reviews/item/{item_id}"
        params = {"limit": limit}
        try:
            response = self.session.get(url, params=params, proxies=self._get_proxies(), timeout=30)
            response.raise_for_status()
            data = response.json()
            reviews = []
            for review in data.get("reviews", []):
                reviews.append({
                    "rating": review.get("rating"),
                    "title": review.get("title"),
                    "content": review.get("content"),
                    "date": review.get("date_created"),
                    "likes": review.get("likes"),
                    "dislikes": review.get("dislikes"),
                })
            return reviews
        except Exception as e:
            print(f"Error: {e}")
            return []

    def search_by_category(self, category_id, limit=50):
        """Search products within a specific category."""
        url = f"{self.api_url}/sites/{self.site_id}/search"
        params = {
            "category": category_id,
            "limit": limit,
        }
        try:
            response = self.session.get(url, params=params, proxies=self._get_proxies(), timeout=30)
            response.raise_for_status()
            return response.json().get("results", [])
        except Exception:
            return []


# Usage
if __name__ == "__main__":
    # Argentina market
    scraper = MercadoLibreAPIScraper(site_id="MLA", proxy_url="http://user:pass@proxy:port")

    # Search products
    products, total = scraper.search_products("iphone 15 pro", limit=10)
    print(f"Found {total} total results")
    for p in products[:3]:
        print(f"  {p['title']} - ${p['price']} {p['currency']}")

        # Get details
        detail = scraper.get_product_detail(p["id"])
        print(json.dumps(detail, indent=2, ensure_ascii=False))

        # Get reviews
        reviews = scraper.get_reviews(p["id"], limit=5)
        print(f"  Reviews: {len(reviews)}")

        time.sleep(1)
```

Method 2: Web Scraping MercadoLibre HTML
For data not available through the API, you can scrape the website directly.
```python
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import json
import time
import random


class MercadoLibreWebScraper:
    def __init__(self, country="com.ar", proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.base_url = f"https://www.mercadolibre.{country}"
        self.list_url = f"https://listado.mercadolibre.{country}"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "es-AR,es;q=0.9,en;q=0.8",
            "Referer": self.base_url,
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, max_pages=3):
        """Scrape search results from the website."""
        all_products = []
        slug = query.replace(" ", "-")
        for page in range(1, max_pages + 1):
            if page == 1:
                url = f"{self.list_url}/{slug}"
            else:
                # Page N starts at offset (N-1)*50 + 1, e.g. _Desde_51 for page 2
                offset = (page - 1) * 50 + 1
                url = f"{self.list_url}/{slug}_Desde_{offset}_NoIndex_True"
            try:
                response = self.session.get(
                    url,
                    headers=self._get_headers(),
                    proxies=self._get_proxies(),
                    timeout=30,
                )
                response.raise_for_status()
                soup = BeautifulSoup(response.text, "lxml")
                items = soup.select("li.ui-search-layout__item")
                for item in items:
                    try:
                        title_elem = item.select_one("h2.ui-search-item__title")
                        price_elem = item.select_one("span.andes-money-amount__fraction")
                        link_elem = item.select_one("a.ui-search-link")
                        shipping = item.select_one("p.ui-search-item__shipping")
                        product = {
                            "title": title_elem.get_text(strip=True) if title_elem else None,
                            "price": price_elem.get_text(strip=True) if price_elem else None,
                            "url": link_elem["href"] if link_elem else None,
                            "free_shipping": "gratis" in shipping.get_text(strip=True).lower() if shipping else False,
                        }
                        all_products.append(product)
                    except Exception:
                        continue
                print(f"Page {page}: Found {len(items)} products")
                time.sleep(random.uniform(2, 4))
            except requests.RequestException as e:
                print(f"Error on page {page}: {e}")
                continue
        return all_products


# Usage
scraper = MercadoLibreWebScraper(country="com.ar", proxy_url="http://user:pass@proxy:port")
results = scraper.search_products("samsung galaxy", max_pages=3)
print(json.dumps(results[:5], indent=2, ensure_ascii=False))
```

Proxy Recommendations for MercadoLibre
| Proxy Type | Success Rate | Best For |
|---|---|---|
| LATAM Residential | 85-95% | Regional market research |
| US Residential | 70-80% | General scraping |
| Mobile Proxies | 90%+ | High-volume extraction |
| Datacenter | 40-50% | API calls only |
For API access, datacenter proxies often work fine. For web scraping, use residential proxies from the target country for accurate regional data.
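As a concrete sketch of how a country-targeted proxy plugs into `requests`, the snippet below configures a per-request proxy dictionary. The gateway hostname, port, and credentials are placeholders; substitute your provider's actual endpoint.

```python
import requests

# Hypothetical residential proxy gateway -- replace with your provider's
# hostname, port, username, and password.
PROXY_URL = "http://user:pass@latam.proxy.example:8000"

# requests routes both plain and TLS traffic through the same endpoint
# when the dict maps both schemes to it.
proxies = {"http": PROXY_URL, "https": PROXY_URL}


def fetch(url):
    # A 30-second timeout avoids hanging on slow residential exit nodes.
    return requests.get(url, proxies=proxies, timeout=30)
```

The same dictionary can also be attached once to a `requests.Session` via `session.proxies.update(proxies)` so every request on that session is proxied.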
Legal Considerations
- Public API: MercadoLibre’s API is publicly available but has rate limits (varies by endpoint).
- Terms of Service: Automated scraping beyond the API is restricted in their ToS.
- Data Protection: Latin American data protection laws (Argentina's Personal Data Protection Law, Ley 25.326; Brazil's LGPD) apply.
- Commercial Use: The API terms allow limited commercial use. Consult legal counsel for large-scale operations.
See our web scraping compliance guide for more details.
Frequently Asked Questions
Does MercadoLibre have a public API?
Yes. MercadoLibre offers a comprehensive public API (api.mercadolibre.com) that provides access to search results, product details, seller information, reviews, and categories. Basic endpoints require no authentication.
What markets does MercadoLibre cover?
MercadoLibre operates in 18 Latin American countries. Key site IDs: MLA (Argentina), MLB (Brazil), MLM (Mexico), MCO (Colombia), MLC (Chile), MLU (Uruguay), MPE (Peru), MEC (Ecuador).
What are MercadoLibre API rate limits?
The public API allows approximately 10,000 requests per hour without authentication. With OAuth authentication, limits are higher. Implement 0.5-1 second delays between requests to stay within limits.
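One simple way to enforce those delays is a small throttle that guarantees a randomized minimum gap between consecutive calls. This is an illustrative helper, not part of any official client:

```python
import random
import time


class Throttle:
    """Enforce a randomized minimum delay between API calls."""

    def __init__(self, min_delay=0.5, max_delay=1.0):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self):
        # Sleep just long enough that consecutive calls are spaced by a
        # random interval in [min_delay, max_delay].
        gap = random.uniform(self.min_delay, self.max_delay)
        elapsed = time.monotonic() - self._last
        if elapsed < gap:
            time.sleep(gap - elapsed)
        self._last = time.monotonic()
```

Call `throttle.wait()` immediately before each `session.get(...)`; the first call returns instantly, and subsequent calls pause only as long as needed.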
Can I scrape MercadoLibre pricing across countries?
Yes. Use different site_id values (MLA, MLB, MLM, etc.) to access pricing in each country. Prices are displayed in local currencies (ARS, BRL, MXN, etc.).
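As a sketch, the site IDs above can drive one search per market against the public endpoint described earlier. The `SITE_IDS` mapping and helper functions below are illustrative, not part of any official client:

```python
from urllib.parse import quote_plus

import requests

# National marketplace site IDs (subset; see the FAQ above for more).
SITE_IDS = {
    "Argentina": "MLA",
    "Brazil": "MLB",
    "Mexico": "MLM",
    "Colombia": "MCO",
    "Chile": "MLC",
    "Uruguay": "MLU",
}


def search_url(site_id, query):
    """Build the public search endpoint URL for one national site."""
    return f"https://api.mercadolibre.com/sites/{site_id}/search?q={quote_plus(query)}"


def first_price(site_id, query):
    """Return (price, currency_id) of the first result, or None if empty."""
    data = requests.get(search_url(site_id, query), timeout=30).json()
    results = data.get("results", [])
    if not results:
        return None
    return results[0].get("price"), results[0].get("currency_id")
```

Looping `first_price(site_id, "iphone 15")` over `SITE_IDS.values()` gives a quick cross-market price snapshot, each in its local currency.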
Why do some API endpoints return 403 errors?
Some MercadoLibre API endpoints require OAuth authentication. For basic search and product data, no auth is needed. For user-specific data, order history, or higher rate limits, register as a developer and use OAuth tokens.
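For app-level access, the token exchange looks roughly like the sketch below (a client-credentials grant against the `oauth/token` endpoint). Endpoint and parameter names follow MercadoLibre's developer documentation at the time of writing; verify them against the current docs before relying on this:

```python
import requests


def get_app_token(client_id, client_secret):
    """Request an application access token via the client-credentials grant."""
    resp = requests.post(
        "https://api.mercadolibre.com/oauth/token",
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


# The token then goes in an Authorization header on subsequent calls:
#   headers = {"Authorization": f"Bearer {token}"}
```

Tokens are short-lived, so cache the token and re-request it when calls start returning 401.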
Advanced Techniques
Handling Pagination
Most websites paginate their results. Implement robust pagination handling:
```python
import random
import time


def scrape_all_pages(scraper, base_url, max_pages=20):
    all_data = []
    for page in range(1, max_pages + 1):
        url = f"{base_url}?page={page}"
        results = scraper.search(url)
        if not results:
            break
        all_data.extend(results)
        print(f"Page {page}: {len(results)} items (total: {len(all_data)})")
        time.sleep(random.uniform(2, 5))
    return all_data
```

Data Validation and Cleaning
Always validate scraped data before storage:
```python
import html
import re


def validate_data(item):
    required_fields = ["title", "url"]
    for field in required_fields:
        if not item.get(field):
            return False
    return True


def clean_text(text):
    if not text:
        return None
    # Collapse runs of whitespace
    text = re.sub(r"\s+", " ", text).strip()
    # Decode HTML entities
    text = html.unescape(text)
    return text


# Apply to results
cleaned = [item for item in results if validate_data(item)]
for item in cleaned:
    item["title"] = clean_text(item.get("title"))
```

Monitoring and Alerting
Build monitoring into your scraping pipeline:
```python
import logging
from datetime import datetime

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)


class ScrapingMonitor:
    def __init__(self):
        self.start_time = datetime.now()
        self.requests = 0
        self.errors = 0
        self.items = 0

    def log_request(self, success=True):
        self.requests += 1
        if not success:
            self.errors += 1
        if self.requests % 50 == 0:
            elapsed = (datetime.now() - self.start_time).total_seconds()
            rate = self.requests / max(elapsed, 1) * 60
            logger.info(f"Requests: {self.requests}, Errors: {self.errors}, "
                        f"Items: {self.items}, Rate: {rate:.1f}/min")

    def log_item(self, count=1):
        self.items += count
```

Error Handling and Retry Logic
Implement robust error handling:
```python
import time
from requests.exceptions import RequestException


def retry_request(func, max_retries=3, base_delay=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RequestException as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
            time.sleep(delay)
    return None
```

Data Storage Options
Choose the right storage for your scraping volume:
```python
import csv
import json
import sqlite3


class DataStorage:
    def __init__(self, db_path="scraped_data.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS items
            (id TEXT PRIMARY KEY, title TEXT, url TEXT, data JSON,
             scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')

    def save(self, item):
        self.conn.execute(
            "INSERT OR REPLACE INTO items (id, title, url, data) VALUES (?, ?, ?, ?)",
            (item.get("id"), item.get("title"), item.get("url"), json.dumps(item)),
        )
        self.conn.commit()

    def export_json(self, output_path):
        cursor = self.conn.execute("SELECT data FROM items")
        items = [json.loads(row[0]) for row in cursor.fetchall()]
        with open(output_path, "w") as f:
            json.dump(items, f, indent=2)

    def export_csv(self, output_path):
        cursor = self.conn.execute("SELECT * FROM items")
        rows = cursor.fetchall()
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "title", "url", "data", "scraped_at"])
            writer.writerows(rows)
```

Frequently Asked Questions
How often should I scrape data?
The optimal frequency depends on how often the source data changes. For real-time data (stock prices, news), scrape every few minutes. For product listings, daily or weekly is usually sufficient. For reviews, weekly scraping captures new feedback without excessive load.
What happens if my IP gets blocked?
If you receive 403 or 429 status codes, your IP is likely blocked. Switch to a different proxy, implement exponential backoff, and slow your request rate. Rotating residential proxies automatically switch IPs to prevent blocks.
Should I use headless browsers or HTTP requests?
Use HTTP requests (with BeautifulSoup or similar) whenever possible — they are faster and use fewer resources. Switch to headless browsers (Selenium, Playwright) only when JavaScript rendering is required for the data you need.
How do I handle CAPTCHAs?
CAPTCHAs indicate aggressive bot detection. To minimize them: use residential or mobile proxies, implement realistic delays, rotate user agents, and maintain consistent session behavior. For persistent CAPTCHAs, consider CAPTCHA-solving services as a last resort.
Can I scrape data commercially?
The legality of commercial scraping depends on the platform’s ToS, the type of data collected, and your jurisdiction. Public data is generally more permissible, but always consult legal counsel for commercial use cases. See our compliance guide.
Conclusion
MercadoLibre is one of the most accessible e-commerce platforms to scrape thanks to its comprehensive public API. For basic product research, the API provides clean, structured data without browser automation. For large-scale operations or data not available via API, combine web scraping with residential proxies from target Latin American markets.
Explore our e-commerce proxy guide for more platform-specific scraping strategies.
Related Reading
- How to Scrape AliExpress Product Data
- How to Scrape Amazon Product Reviews in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix