How to Scrape Target.com Product Data
Target is one of the largest retailers in the United States, with over 1,900 stores and a robust e-commerce platform. For competitive intelligence, price monitoring, and market research, Target.com is an essential data source that provides insights into pricing strategies, product assortments, and consumer trends.
This guide covers everything you need to scrape Target.com effectively using Python, including how to leverage their internal API endpoints and handle their anti-bot protections.
What Data Can You Extract from Target?
Target.com product listings provide extensive retail data:
- Product names and descriptions
- Pricing (regular, sale, and clearance prices)
- Inventory availability (in-store and online)
- Product ratings and review counts
- Brand information
- Category hierarchy
- Product images
- Specifications and dimensions
- Shipping and pickup options
- UPC/DPCI codes
Example JSON Output
```json
{
  "tcin": "54191097",
  "title": "KitchenAid Classic Stand Mixer - 4.5qt",
  "brand": "KitchenAid",
  "price": {
    "current": 249.99,
    "regular": 299.99,
    "savings": 50.00,
    "currency": "USD"
  },
  "rating": 4.7,
  "review_count": 3842,
  "availability": {
    "online": "in_stock",
    "store_pickup": true,
    "same_day_delivery": true
  },
  "categories": ["Kitchen & Dining", "Kitchen Appliances", "Stand Mixers"],
  "specifications": {
    "Wattage": "275 watts",
    "Bowl Capacity": "4.5 quarts",
    "Dimensions": "14.1\" x 8.7\" x 13.9\"",
    "Weight": "22 lbs"
  },
  "images": [
    "https://target.scene7.com/is/image/Target/54191097_main"
  ],
  "dpci": "072-04-0123",
  "upc": "883049123456",
  "url": "https://www.target.com/p/kitchenaid-classic-stand-mixer/-/A-54191097"
}
```
Prerequisites
```bash
pip install requests beautifulsoup4 selenium fake-useragent lxml
```
Target.com is moderately aggressive with anti-bot measures. Residential proxies are recommended for reliable scraping.
Method 1: Using Target’s Internal API (Redsky API)
Target uses an internal API called “Redsky” that powers their product pages. This is often more reliable than HTML scraping.
```python
import requests
from fake_useragent import UserAgent
import json
import time
import random


class TargetAPIScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.proxy_url = proxy_url
        self.api_key = "9f36aeafbe60771e321a7cc95a78140772ab3e96"  # Public API key
        self.base_api = "https://redsky.target.com"

    def _get_headers(self):
        return {
            "User-Agent": self.ua.random,
            "Accept": "application/json",
            "Accept-Language": "en-US,en;q=0.9",
            "Origin": "https://www.target.com",
            "Referer": "https://www.target.com/",
        }

    def _get_proxies(self):
        if self.proxy_url:
            return {"http": self.proxy_url, "https": self.proxy_url}
        return None

    def search_products(self, query, page=1, count=24):
        """Search Target products via API."""
        url = f"{self.base_api}/redsky_aggregations/v1/web/plp_search_v2"
        params = {
            "key": self.api_key,
            "channel": "WEB",
            "count": count,
            "default_purchasability_filter": "true",
            "keyword": query,
            "offset": (page - 1) * count,
            "page": f"/s/{query}",
            "pricing_store_id": "3991",
            "visitor_id": "0123456789ABCDEF",
        }
        try:
            response = self.session.get(
                url,
                params=params,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()
            data = response.json()
            products = self._parse_search_results(data)
            return products
        except requests.RequestException as e:
            print(f"Search error: {e}")
            return []

    def _parse_search_results(self, data):
        """Parse search API response."""
        products = []
        search_results = data.get("data", {}).get("search", {}).get("products", [])
        for item in search_results:
            product = {
                "tcin": item.get("tcin"),
                "title": item.get("item", {}).get("product_description", {}).get("title"),
                "brand": item.get("item", {}).get("primary_brand", {}).get("name"),
                "url": f"https://www.target.com{item.get('item', {}).get('enrichment', {}).get('buy_url', '')}",
            }
            # Pricing
            price_data = item.get("price", {})
            product["price"] = {
                "current": price_data.get("formatted_current_price"),
                "regular": price_data.get("formatted_current_price_default_message"),
            }
            # Rating
            rating_data = item.get("ratings_and_reviews", {}).get("statistics", {})
            product["rating"] = rating_data.get("rating", {}).get("average")
            product["review_count"] = rating_data.get("rating", {}).get("count")
            # Availability
            fulfillment = item.get("fulfillment", {})
            product["availability"] = {
                "online": fulfillment.get("is_out_of_stock_in_all_store_locations", True) is False,
                "shipping": fulfillment.get("shipping_options", {}).get("availability_status"),
            }
            # Image
            images = item.get("item", {}).get("enrichment", {}).get("images", {})
            product["image"] = images.get("primary_image_url")
            products.append(product)
        return products

    def get_product_details(self, tcin):
        """Get detailed product info by TCIN."""
        url = f"{self.base_api}/redsky_aggregations/v1/web/pdp_client_v1"
        params = {
            "key": self.api_key,
            "tcin": tcin,
            "pricing_store_id": "3991",
            "has_pricing_store_id": "true",
        }
        try:
            response = self.session.get(
                url,
                params=params,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()
            data = response.json()
            return self._parse_product_detail(data)
        except requests.RequestException as e:
            print(f"Product detail error: {e}")
            return None

    def _parse_product_detail(self, data):
        """Parse detailed product API response."""
        product_data = data.get("data", {}).get("product", {})
        item = product_data.get("item", {})
        price = product_data.get("price", {})
        ratings = product_data.get("ratings_and_reviews", {})
        product = {
            "tcin": product_data.get("tcin"),
            "title": item.get("product_description", {}).get("title"),
            "description": item.get("product_description", {}).get("downstream_description"),
            "brand": item.get("primary_brand", {}).get("name"),
            "dpci": item.get("dpci"),
            "upc": item.get("primary_barcode"),
            "price": {
                "current": price.get("formatted_current_price"),
                "regular": price.get("formatted_current_price_default_message"),
            },
            "rating": ratings.get("statistics", {}).get("rating", {}).get("average"),
            "review_count": ratings.get("statistics", {}).get("rating", {}).get("count"),
            "specifications": item.get("product_description", {}).get("bullet_descriptions", []),
            "categories": [
                bc.get("name")
                for bc in item.get("product_classification", {}).get("product_type_name_hierarchy", [])
            ],
        }
        return product

    def get_reviews(self, tcin, page=0, size=10):
        """Get product reviews."""
        url = "https://r2d2.target.com/ggc/reviews/v2/results"
        params = {
            "key": self.api_key,
            "reviewedId": tcin,
            "page": page,
            "size": size,
            "sortBy": "most_recent",
        }
        try:
            response = self.session.get(
                url,
                params=params,
                headers=self._get_headers(),
                proxies=self._get_proxies(),
                timeout=30
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException as e:
            print(f"Reviews error: {e}")
            return None


# Usage
if __name__ == "__main__":
    scraper = TargetAPIScraper(proxy_url="http://user:pass@proxy:port")

    # Search products
    results = scraper.search_products("air fryer", page=1)
    print(f"Found {len(results)} products")

    # Get product details
    for product in results[:3]:
        tcin = product.get("tcin")
        if tcin:
            details = scraper.get_product_details(tcin)
            print(json.dumps(details, indent=2))
        time.sleep(random.uniform(2, 4))
```
Method 2: HTML Scraping with Selenium
For when API endpoints change or for data not available via API.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import json
import time


class TargetSeleniumScraper:
    def __init__(self, proxy=None):
        chrome_options = Options()
        chrome_options.add_argument("--headless=new")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-blink-features=AutomationControlled")
        if proxy:
            chrome_options.add_argument(f"--proxy-server={proxy}")
        self.driver = webdriver.Chrome(options=chrome_options)

    def search_products(self, query):
        """Search Target.com and extract products."""
        url = f"https://www.target.com/s?searchTerm={query}"
        self.driver.get(url)
        WebDriverWait(self.driver, 15).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "[data-test='product-grid'] a")
            )
        )
        # Scroll to load products
        for _ in range(3):
            self.driver.execute_script("window.scrollBy(0, 1000);")
            time.sleep(1)
        products = self.driver.execute_script("""
            const results = [];
            const items = document.querySelectorAll('[data-test="product-grid"] section');
            items.forEach(item => {
                const link = item.querySelector('a[href*="/p/"]');
                const title = item.querySelector('[data-test="product-title"]');
                const price = item.querySelector('[data-test="current-price"]');
                const rating = item.querySelector('[data-test="ratings"]');
                results.push({
                    title: title ? title.innerText.trim() : null,
                    price: price ? price.innerText.trim() : null,
                    url: link ? link.href : null,
                    rating: rating ? rating.innerText.trim() : null
                });
            });
            return results;
        """)
        return products

    def close(self):
        self.driver.quit()
```
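A minimal usage sketch for the class above (the `data-test` selectors it relies on reflect Target's current markup and may change without notice):

```python
if __name__ == "__main__":
    scraper = TargetSeleniumScraper()  # pass proxy="host:port" to route through a proxy
    try:
        products = scraper.search_products("air fryer")
        print(json.dumps(products[:5], indent=2))
    finally:
        scraper.close()  # always release the browser process
```

Handling Target’s Anti-Bot Protections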
1. Akamai Bot Manager
Target uses Akamai for bot detection. Key strategies:
- Use residential proxies to avoid datacenter IP blacklists
- Maintain realistic browser fingerprints
- Include all expected headers in API requests (see the sketch below)
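As an illustration of the last point, a browser-like header set might look like the following. This is a sketch: the exact headers Akamai evaluates are not documented, and the Chrome version and `sec-fetch` values shown are assumptions you should refresh from a real browser session in DevTools.

```python
# Hypothetical browser-like headers for Redsky requests; capture a real
# request in your browser's network tab and mirror it. Values are illustrative.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"),
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://www.target.com",
    "Referer": "https://www.target.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
}
```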
2. API Key Rotation
Target’s API keys change periodically. Monitor for 403 errors and update accordingly:
```python
import re

def get_api_key(self):
    """Fetch current API key from Target homepage."""
    response = self.session.get("https://www.target.com/", headers=self._get_headers())
    match = re.search(r'"apiKey":"(\w+)"', response.text)
    return match.group(1) if match else None
```
3. Geographic Restrictions
Target only operates in the US. Use US-based proxies:
```python
# Use US residential proxies
proxy = "http://user:pass@us-residential.proxy:port"
```
Proxy Recommendations for Target
| Proxy Type | Success Rate | Best For |
|---|---|---|
| US Residential | 85-90% | Search and product pages |
| US ISP | 80-85% | API scraping |
| US Datacenter | 40-50% | API only, limited |
| Mobile (US) | 90-95% | High-volume scraping |
Since Target is US-only, you need US-based proxies. Residential proxies from US locations provide the most reliable access. Check our proxy setup guides for configuration details.
Legal Considerations
- Terms of Service: Target prohibits automated scraping in their Terms of Use.
- hiQ v. LinkedIn: While the hiQ rulings suggested that scraping publicly available data does not violate the CFAA, the case was ultimately settled, and Target may still assert other claims over its data.
- CFAA: Unauthorized access to Target’s systems could raise Computer Fraud and Abuse Act concerns.
- Price Data: Using price data for price-fixing or anti-competitive practices is illegal.
- Personal Data: Never scrape customer reviews for personal data extraction.
Review our web scraping legal guide for comprehensive compliance information.
Rate Limiting Best Practices
- API requests: Maximum 2-3 per second
- Search pages: 1 request every 3-5 seconds
- Product pages: 1 request every 4-6 seconds
- Session rotation: Every 100-200 requests
- Daily limits: Keep under 5,000 requests per IP
```python
import random
import time


class RateLimiter:
    def __init__(self, min_delay=2, max_delay=5):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.last_request = 0

    def wait(self):
        """Sleep just long enough to respect a randomized per-request delay."""
        elapsed = time.time() - self.last_request
        delay = random.uniform(self.min_delay, self.max_delay)
        if elapsed < delay:
            time.sleep(delay - elapsed)
        self.last_request = time.time()
```
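Here is one way the limiter might wrap the API scraper from Method 1, combined with the session rotation suggested above. The 150-request rotation threshold is an assumption chosen from the 100-200 range:

```python
import requests

limiter = RateLimiter(min_delay=2, max_delay=5)
scraper = TargetAPIScraper()
request_count = 0

for query in ["air fryer", "stand mixer", "blender"]:
    limiter.wait()  # enforce the randomized delay before each request
    results = scraper.search_products(query)
    request_count += 1
    if request_count % 150 == 0:  # assumed rotation threshold (100-200 requests)
        scraper.session = requests.Session()  # fresh cookies and connection pool
```

Advanced Techniques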
Handling Pagination
Target's search results are paginated; loop through pages and stop as soon as a page comes back empty:
```python
import random
import time

def scrape_all_pages(scraper, query, max_pages=20):
    """Collect search results page by page until a page returns nothing."""
    all_data = []
    for page in range(1, max_pages + 1):
        results = scraper.search_products(query, page=page)
        if not results:
            break
        all_data.extend(results)
        print(f"Page {page}: {len(results)} items (total: {len(all_data)})")
        time.sleep(random.uniform(2, 5))
    return all_data
```
Data Validation and Cleaning
Always validate scraped data before storage:
```python
import html
import re

def validate_data(item):
    """Require a minimum set of fields before keeping an item."""
    required_fields = ["title", "url"]
    for field in required_fields:
        if not item.get(field):
            return False
    return True

def clean_text(text):
    if not text:
        return None
    # Collapse extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    # Decode HTML entities
    text = html.unescape(text)
    return text

# Apply to results
cleaned = [item for item in results if validate_data(item)]
for item in cleaned:
    item["title"] = clean_text(item.get("title"))
```
Monitoring and Alerting
Build monitoring into your scraping pipeline:
```python
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class ScrapingMonitor:
    def __init__(self):
        self.start_time = datetime.now()
        self.requests = 0
        self.errors = 0
        self.items = 0

    def log_request(self, success=True):
        self.requests += 1
        if not success:
            self.errors += 1
        if self.requests % 50 == 0:
            elapsed = (datetime.now() - self.start_time).total_seconds()
            rate = self.requests / max(elapsed, 1) * 60
            logger.info(f"Requests: {self.requests}, Errors: {self.errors}, "
                        f"Items: {self.items}, Rate: {rate:.1f}/min")

    def log_item(self, count=1):
        self.items += count
```
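Wiring the monitor into a scraping loop is straightforward; a sketch using the Method 1 scraper:

```python
monitor = ScrapingMonitor()
scraper = TargetAPIScraper()

for query in ["air fryer", "toaster oven"]:
    results = scraper.search_products(query)
    monitor.log_request(success=bool(results))  # empty results counted as failures here
    monitor.log_item(len(results))
```

Error Handling and Retry Logic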
Implement robust error handling:
```python
import time
from requests.exceptions import RequestException


def retry_request(func, max_retries=3, base_delay=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RequestException as e:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)  # exponential backoff
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
            time.sleep(delay)
    return None
```
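Note that the Method 1 scraper catches RequestException internally, so to benefit from retry_request you wrap a call that lets exceptions propagate. A sketch against the product detail endpoint:

```python
def fetch_detail():
    response = scraper.session.get(
        "https://redsky.target.com/redsky_aggregations/v1/web/pdp_client_v1",
        params={"key": scraper.api_key, "tcin": "54191097",
                "pricing_store_id": "3991", "has_pricing_store_id": "true"},
        headers=scraper._get_headers(),
        timeout=30,
    )
    response.raise_for_status()  # raises a RequestException subclass on HTTP errors
    return response.json()

data = retry_request(fetch_detail)
```

Data Storage Options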
Choose the right storage for your scraping volume:
```python
import json
import csv
import sqlite3


class DataStorage:
    def __init__(self, db_path="scraped_data.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute('''CREATE TABLE IF NOT EXISTS items
            (id TEXT PRIMARY KEY, title TEXT, url TEXT, data JSON,
             scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)''')

    def save(self, item):
        self.conn.execute(
            "INSERT OR REPLACE INTO items (id, title, url, data) VALUES (?, ?, ?, ?)",
            (item.get("id"), item.get("title"), item.get("url"), json.dumps(item))
        )
        self.conn.commit()

    def export_json(self, output_path):
        cursor = self.conn.execute("SELECT data FROM items")
        items = [json.loads(row[0]) for row in cursor.fetchall()]
        with open(output_path, "w") as f:
            json.dump(items, f, indent=2)

    def export_csv(self, output_path):
        cursor = self.conn.execute("SELECT * FROM items")
        rows = cursor.fetchall()
        with open(output_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "title", "url", "data", "scraped_at"])
            writer.writerows(rows)
```
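A usage sketch tying the store to the Method 1 scraper, mapping the TCIN onto the generic `id` column:

```python
storage = DataStorage("target_products.db")
scraper = TargetAPIScraper()

for product in scraper.search_products("stand mixer"):
    product["id"] = product.get("tcin")  # use the TCIN as the primary key
    storage.save(product)

storage.export_json("target_products.json")
storage.export_csv("target_products.csv")
```

Frequently Asked Questions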
How often should I scrape data?
The optimal frequency depends on how often the source data changes. For real-time data (stock prices, news), scrape every few minutes. For product listings, daily or weekly is usually sufficient. For reviews, weekly scraping captures new feedback without excessive load.
What happens if my IP gets blocked?
If you receive 403 or 429 status codes, your IP is likely blocked. Switch to a different proxy, implement exponential backoff, and slow your request rate. Rotating residential proxies automatically switch IPs to prevent blocks.
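As a rough sketch, rotating on block status codes might look like the following; the proxy URLs are placeholders:

```python
import itertools
import time

PROXIES = itertools.cycle([
    "http://user:pass@proxy1:port",  # placeholder endpoints
    "http://user:pass@proxy2:port",
])

def get_with_rotation(session, url, max_attempts=3, **kwargs):
    """Retry a GET through a new proxy whenever the response looks blocked."""
    for attempt in range(max_attempts):
        proxy = next(PROXIES)
        response = session.get(url, proxies={"http": proxy, "https": proxy}, **kwargs)
        if response.status_code not in (403, 429):
            return response
        time.sleep(2 ** attempt * 5)  # exponential backoff between attempts
    return response
```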
Should I use headless browsers or HTTP requests?
Use HTTP requests (with BeautifulSoup or similar) whenever possible, since they are faster and use fewer resources. Switch to headless browsers (Selenium, Playwright) only when JavaScript rendering is required for the data you need.
How do I handle CAPTCHAs?
CAPTCHAs indicate aggressive bot detection. To minimize them: use residential or mobile proxies, implement realistic delays, rotate user agents, and maintain consistent session behavior. For persistent CAPTCHAs, consider CAPTCHA-solving services as a last resort.
Can I scrape data commercially?
The legality of commercial scraping depends on the platform’s ToS, the type of data collected, and your jurisdiction. Public data is generally more permissible, but always consult legal counsel for commercial use cases. See our compliance guide.
Conclusion
Target.com provides excellent e-commerce data through both its internal APIs and web pages. The Redsky API is particularly useful for structured data extraction, offering clean JSON responses with comprehensive product information.
For reliable Target scraping, use US residential proxies and respect their rate limits. Visit our e-commerce scraping hub for more retailer-specific guides and proxy recommendations.
Related Reading
- How to Scrape AliExpress Product Data
- How to Scrape Amazon Product Reviews in 2026
- aiohttp + BeautifulSoup: Async Python Scraping
- How Anti-Bot Systems Detect Scrapers (Cloudflare, Akamai, PerimeterX)
- API vs Web Scraping: When You Need Proxies (and When You Don’t)
- ASEAN Data Protection Laws: A Web Scraping Compliance Matrix