Commercial real estate operates on an entirely different data plane than residential. Deal sizes are larger, underwriting is more rigorous, and the information asymmetry between those who have data and those who do not can mean millions of dollars in missed opportunities. Platforms like LoopNet, CoStar, and CREXi hold the data that drives commercial deals — and scraping them with the right proxy setup gives you an analytical advantage that manual research simply cannot match. This guide breaks down the CRE data landscape, the technical challenges of scraping commercial platforms, and the proxy strategies that make large-scale data collection possible.
The Commercial Real Estate Data Landscape
Unlike residential real estate, where Zillow and Realtor.com provide free, relatively open access to listing data, the commercial real estate data ecosystem is heavily gated. CoStar Group — which also owns LoopNet — dominates the market and actively restricts data access to paying subscribers.
Key CRE Data Platforms
| Platform | Data Type | Access Level | Scraping Difficulty |
|---|---|---|---|
| LoopNet | Commercial listings (sale and lease) | Free basic, premium for details | Medium-High |
| CoStar | Comprehensive CRE analytics, comps, tenant data | Subscription only ($300-$1,000+/mo) | Very High (authenticated) |
| CREXi | Commercial listings, auction data, broker deal flow | Free basic, premium for analytics | Medium |
| Reonomy | Property ownership, debt, transaction history | Subscription only | Very High (authenticated) |
| CommercialCafe | Office and retail listings | Free basic | Medium |
| County assessor sites | Tax data, property details, ownership | Public / free | Low-Medium (per county) |
What CRE Data Is Worth Scraping
Commercial real estate analysis hinges on a few core metrics that determine property valuation and investment returns:
- Cap rate (capitalization rate): Net operating income divided by property price. Tells you the yield if you purchased the property outright.
- NOI (net operating income): Gross rental income minus operating expenses. The fundamental measure of property cash flow.
- Price per square foot: Allows comparison across properties of different sizes.
- Occupancy rates: Percentage of leasable space currently under lease.
- Lease terms: Remaining lease duration, tenant quality (credit rating), lease type (NNN, gross, modified gross).
- Comparable sales (comps): Recent sale prices for similar properties in the area.
Scraping these data points across hundreds of properties in a market lets you build valuation models, identify underpriced assets, and underwrite deals with far more confidence than relying on broker-provided information alone.
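The core metrics above reduce to a few one-line formulas, which is what makes bulk screening feasible once the data is scraped. A minimal sketch, with purely illustrative numbers:

```python
def noi(gross_income: float, operating_expenses: float) -> float:
    """Net operating income: gross rental income minus operating expenses."""
    return gross_income - operating_expenses

def cap_rate(net_operating_income: float, price: float) -> float:
    """Capitalization rate: NOI divided by purchase price."""
    return net_operating_income / price

def price_per_sf(price: float, square_feet: float) -> float:
    """Price per square foot, for comparing properties of different sizes."""
    return price / square_feet

# Illustrative example: a $2.5M property grossing $300k with $110k in expenses.
income = noi(300_000, 110_000)           # $190,000 NOI
rate = cap_rate(income, 2_500_000)       # 0.076, i.e. a 7.6% cap rate
ppsf = price_per_sf(2_500_000, 12_500)   # $200/SF
```

Run these across every listing in a market and you can rank the inventory by yield before a single broker call.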
Scraping LoopNet
What Is Accessible
LoopNet provides basic listing information for free: property type, location, asking price or lease rate, building size, and listing broker contact. Detailed information — financials, tenant data, historical occupancy, offering memorandums — requires a premium account.
For scraping purposes, the freely accessible data is still valuable for market mapping and initial screening. You can identify all available properties in a market, track listing activity over time, and build a database of asking prices and property characteristics.
Technical Challenges
LoopNet uses moderate anti-bot defenses:
- Rate limiting: Aggressive throttling after 50-100 requests per IP per hour.
- JavaScript rendering: Search results and listing details require JavaScript execution.
- CAPTCHA triggers: Rapid pagination triggers hCaptcha challenges.
- Session validation: The site tracks browsing patterns and flags sessions that navigate unnaturally (e.g., skipping search and going directly to listing URLs).
Use browser automation (Playwright with stealth plugins) rather than raw HTTP requests. Start from the search page, apply filters, and paginate through results — mimicking how a real user would browse. For ethical guidelines on this kind of data collection, refer to our piece on the legal and ethical considerations of price scraping with proxies.
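A sketch of that search-first browsing flow with Playwright is below. The URL pattern and CSS selectors are placeholders, not LoopNet's actual markup; inspect the live site to find the real ones, and layer a stealth plugin on top.

```python
import random
import time

# Hypothetical URL pattern for illustration only.
SEARCH_URL = "https://www.loopnet.com/search/commercial-real-estate/{market}/for-sale/"

def human_pause(low: float = 2.0, high: float = 6.0) -> float:
    """Return a randomized delay so page loads are not machine-regular."""
    return random.uniform(low, high)

def browse_market(market: str, max_pages: int = 5) -> list[str]:
    """Start from the search page and paginate the way a real user would.

    Requires `pip install playwright` (plus browser install); imported
    lazily so the rest of the module works without it.
    """
    from playwright.sync_api import sync_playwright

    urls: list[str] = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(SEARCH_URL.format(market=market))
        for _ in range(max_pages):
            time.sleep(human_pause())
            # Placeholder selector -- replace with the real listing-link selector.
            for link in page.query_selector_all("a.listing-link"):
                href = link.get_attribute("href")
                if href:
                    urls.append(href)
            nxt = page.query_selector("a[aria-label='Next']")
            if nxt is None:
                break
            nxt.click()
        browser.close()
    return urls
```

The point is the shape of the flow: land on search, dwell, collect, paginate. Never jump straight to deep listing URLs from a cold session.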
Recommended Proxy Setup for LoopNet
Residential rotating proxies are the minimum viable option. ISP proxies perform better due to LoopNet’s session-based anti-bot logic — the site tracks behavior across multiple page loads, and residential IPs that rotate mid-session trigger suspicion. Use sticky sessions of 10-15 minutes per IP, enough to complete a search-and-paginate cycle before rotating.
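Most providers pin a sticky session to one exit IP via a session ID embedded in the proxy username. A minimal rotation manager along those lines, with a generic credential format that you should swap for your provider's actual syntax:

```python
import itertools
import time

class StickySessionPool:
    """Hold each proxy session for a fixed window, then rotate.

    The proxy URL format below is a generic assumption (hypothetical
    gateway host and credentials) -- check your provider's docs.
    """

    def __init__(self, session_ids, hold_seconds=12 * 60, clock=time.monotonic):
        self._ids = itertools.cycle(session_ids)
        self._hold = hold_seconds
        self._clock = clock
        self._current = next(self._ids)
        self._started = clock()

    def proxy_url(self) -> str:
        # Rotate to a fresh session once the hold window expires.
        if self._clock() - self._started >= self._hold:
            self._current = next(self._ids)
            self._started = self._clock()
        return f"http://user-session-{self._current}:pass@gate.example-proxy.net:7777"
```

With a 12-minute hold, one session comfortably covers a full search-and-paginate cycle before the IP changes.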
Scraping CoStar
The Authentication Challenge
CoStar is the most data-rich CRE platform and also the most difficult to scrape. It requires paid authentication, and the platform actively monitors for automated access patterns among its subscribers. Scraping CoStar with a legitimate account risks account termination and potential legal action — CoStar has historically been aggressive about enforcing its terms of service.
What You Can Extract Without an Account
CoStar’s public-facing presence is limited. Some property data appears in CoStar-powered widgets on broker websites, and CoStar’s marketing pages occasionally expose aggregate market statistics. These fragments can be scraped, but they do not provide the comprehensive property-level data that makes CoStar valuable.
Alternative Approaches
Rather than scraping CoStar directly — which carries significant legal and account risk — consider these alternatives:
- Scrape the public sources that feed CoStar: County assessor records, deed filings, and permit data are all public. CoStar aggregates these sources; you can do the same.
- Use CoStar’s API legitimately: CoStar offers API access for enterprise customers. If your operation justifies the cost, an API subscription provides structured data without the legal risk of scraping.
- Scrape broker websites: Many brokers republish CoStar data on their own listing pages with less sophisticated anti-bot protection.
- Build from county records: County assessor sites provide ownership, tax, and property characteristic data. Combined with free listing data from LoopNet and CREXi, you can reconstruct much of what CoStar offers.
Scraping CREXi
Platform Overview
CREXi has emerged as a LoopNet competitor with a more modern platform and, critically, less aggressive anti-scraping measures. The site provides listing data, auction information, and basic analytics for free, with premium tiers for deeper insights.
Technical Approach
CREXi’s architecture is a modern single-page application (SPA) built on React. Data is loaded via API calls that return JSON, making it relatively straightforward to scrape once you identify the API endpoints. Key considerations:
- API endpoint discovery: Use browser developer tools to monitor network requests while browsing the site. CREXi’s API endpoints follow predictable patterns and return well-structured JSON.
- Pagination: API responses include pagination tokens. Iterate through all pages for a given search query.
- Rate limiting: CREXi’s rate limits are moderate — approximately 100-200 requests per IP per hour. Rotating residential proxies handle this easily.
- Data completeness: Free-tier API responses include most listing details but may omit seller contact information and financial documents. Premium data requires authentication.
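The pagination loop itself is simple once the endpoint is identified. In this sketch the `results` and `nextToken` field names are assumptions to confirm in your browser's network tab, and the HTTP call is injected as a function so the loop works with any client routed through your proxies:

```python
from typing import Callable, Iterator

def paginate(fetch: Callable[[dict], dict], params: dict) -> Iterator[dict]:
    """Walk a token-paginated JSON API, yielding each listing record.

    `fetch` performs one HTTP request (urllib, httpx, etc.) and returns
    the parsed JSON body for the given query parameters.
    """
    token = None
    while True:
        page = fetch({**params, **({"pageToken": token} if token else {})})
        yield from page.get("results", [])
        token = page.get("nextToken")
        if not token:
            break
```

Because `fetch` is pluggable, you can unit-test the loop against canned responses and swap in the real proxied client in production.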
Proxy Requirements for CREXi
CREXi’s defenses are lighter than LoopNet’s. Residential rotating proxies work reliably, and even quality datacenter proxies can achieve 50-60% success rates. For cost-effective large-scale scraping, a mixed approach works well: use datacenter proxies for the bulk of requests and fall back to residential proxies when datacenter IPs get blocked.
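That mixed approach can be sketched as a simple fallback wrapper: try the cheap datacenter IP first and retry through residential only when the response looks like a block. The `fetch(url, proxy)` signature here is an assumption standing in for whatever HTTP client you use:

```python
def fetch_with_fallback(url, fetch, datacenter_proxy, residential_proxy,
                        blocked_statuses=(403, 429)):
    """Try the cheap datacenter IP first; retry via residential on a block.

    `fetch(url, proxy)` should return (status_code, body). A sketch of the
    mixed-pool strategy, not tied to any particular HTTP client.
    """
    status, body = fetch(url, datacenter_proxy)
    if status in blocked_statuses:
        status, body = fetch(url, residential_proxy)
    return status, body
```

At a 50-60% datacenter success rate, roughly half your requests never touch the more expensive residential pool.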
Proxy Architecture for Multi-Platform CRE Scraping
Managing Multiple Proxy Providers
Scraping multiple CRE platforms simultaneously — LoopNet, CREXi, county assessor sites, broker websites — creates a complex proxy management challenge. Each platform has different anti-bot thresholds, different geographic sensitivities, and different session requirements. Using a single proxy provider for all targets is simple but suboptimal. For strategies on coordinating multiple proxy sources, see our guide on how to manage multiple proxy providers.
Recommended Architecture
| Target Platform | Proxy Type | Session Type | Rotation Interval | Concurrency |
|---|---|---|---|---|
| LoopNet | ISP or residential | Sticky (10-15 min) | Per search session | 3-5 concurrent |
| CREXi | Residential or datacenter | Rotating | Per request | 5-10 concurrent |
| County assessor sites | ISP or residential | Sticky (5-10 min) | Per parcel lookup | 2-3 concurrent |
| Broker websites | Residential or datacenter | Rotating | Per request | 5-10 concurrent |
| Auction platforms | Residential | Sticky (5 min) | Per auction listing | 3-5 concurrent |
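The table above translates directly into a config your scheduler can consume. The values mirror the recommendations; treat them as starting points to tune per target:

```python
# Per-target proxy plan: proxy type, session style, sticky-hold window
# in minutes (None for per-request rotation), and concurrency cap.
PROXY_PLAN = {
    "loopnet":  {"proxy": "isp",         "session": "sticky",   "hold_min": (10, 15), "concurrency": 4},
    "crexi":    {"proxy": "residential", "session": "rotating", "hold_min": None,     "concurrency": 8},
    "assessor": {"proxy": "isp",         "session": "sticky",   "hold_min": (5, 10),  "concurrency": 2},
    "broker":   {"proxy": "residential", "session": "rotating", "hold_min": None,     "concurrency": 8},
    "auction":  {"proxy": "residential", "session": "sticky",   "hold_min": (5, 5),   "concurrency": 4},
}

def concurrency_for(target: str) -> int:
    """Look up the concurrency cap for a scraping target."""
    return PROXY_PLAN[target]["concurrency"]
```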
Proxy Budget Planning for CRE Data
Commercial real estate scraping typically involves fewer total requests than residential scraping (fewer listings per market) but higher value per data point. A typical CRE market might have 200-1,000 active listings across platforms, compared to 5,000-50,000 residential listings.
Monthly proxy costs for a CRE scraping operation:
| Operation Scale | Markets Covered | Monthly Requests | Estimated Proxy Cost |
|---|---|---|---|
| Single market | 1 | 5,000-15,000 | $50-$150 |
| Regional | 5-10 | 25,000-100,000 | $150-$500 |
| National | 50+ | 250,000-1,000,000 | $500-$3,000 |
Building a CRE Data Pipeline
Data Normalization
CRE data from different sources uses inconsistent formats. LoopNet might list a property as “retail” while CREXi calls it “shopping center” and the county assessor codes it as “commercial — retail.” Build a normalization layer that maps platform-specific categories to your internal taxonomy. Key fields to normalize:
- Property type (office, retail, industrial, multifamily, hospitality, special purpose)
- Building class (A, B, C — not all platforms use this classification)
- Size units (square feet vs. acres for land)
- Price format (asking price vs. per-SF price vs. cap rate)
- Address formatting (for deduplication across platforms)
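The normalization layer for property types is essentially a lookup table over cleaned labels. The mappings below are illustrative examples, not an exhaustive list from any platform:

```python
# Platform-specific labels -> internal taxonomy (illustrative, not exhaustive).
PROPERTY_TYPE_MAP = {
    "retail": "retail",
    "shopping center": "retail",
    "commercial - retail": "retail",
    "office": "office",
    "office building": "office",
    "warehouse": "industrial",
    "industrial": "industrial",
    "apartments": "multifamily",
    "multifamily": "multifamily",
}

def normalize_property_type(raw: str) -> str:
    """Collapse a platform label to the internal taxonomy; 'other' if unmapped."""
    key = raw.strip().lower().replace("\u2014", "-").replace("--", "-")
    return PROPERTY_TYPE_MAP.get(key, "other")
```

Unmapped labels falling into an "other" bucket gives you a running list of new platform categories to triage rather than silently misfiled records.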
Cross-Platform Deduplication
The same property frequently appears on multiple platforms. Effective deduplication requires address normalization (standardizing street names, abbreviations, unit numbers) and fuzzy matching. A property listed at “100 Main St, Suite 200” on LoopNet and “100 Main Street #200” on CREXi is the same property. Use geocoding as a secondary match — properties within 50 meters with similar characteristics are likely duplicates.
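Both matching stages can be sketched with the standard library alone. The abbreviation table here is a small illustrative subset; a production version would use a full USPS-style suffix list:

```python
import math
import re

# Illustrative subset of address abbreviations; extend for production use.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "suite": "#", "ste": "#"}

def normalize_address(addr: str) -> str:
    """Lowercase, strip punctuation, and collapse common abbreviations."""
    tokens = re.sub(r"[.,]", "", addr.lower()).split()
    joined = " ".join(ABBREVIATIONS.get(t, t) for t in tokens)
    return re.sub(r"#\s+", "#", joined)  # "# 200" -> "#200"

def meters_apart(lat1, lon1, lat2, lon2) -> float:
    """Haversine distance between two coordinates, in meters."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def likely_duplicate(a: dict, b: dict, max_meters: float = 50.0) -> bool:
    """Match on normalized address, or fall back to the 50 m geocode rule."""
    if normalize_address(a["address"]) == normalize_address(b["address"]):
        return True
    return meters_apart(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_meters
```

With this, "100 Main St, Suite 200" and "100 Main Street #200" normalize to the same string, and near-identical coordinates catch the listings that address cleanup misses.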
Cap Rate Calculation from Scraped Data
Not all listings include cap rates, but you can estimate them from available data:
- If NOI and price are listed: Cap rate = NOI / price.
- If only rental income is listed: Estimate expenses at 30-45% of gross income (varies by property type) to calculate NOI, then divide by price.
- If only price per SF is listed: Use market-average rental rates (from other listings or lease comps) to estimate gross income, apply expense ratios, and calculate cap rate.
These estimates are rough but useful for initial screening. They help identify properties worth deeper due diligence before you request actual financials from the broker.
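The three estimation cases fall naturally into a cascade. The 38% default expense ratio below is a screening assumption, as is any market rent figure you pass in:

```python
def estimate_cap_rate(price, noi=None, gross_income=None,
                      expense_ratio=0.38, market_rent_per_sf=None, sqft=None):
    """Estimate a cap rate from whichever fields a listing exposes.

    Falls through the three cases in order: listed NOI, gross income with
    an assumed expense ratio, then market rent applied to building size.
    Returns None when the listing lacks enough data to estimate.
    """
    if noi is not None:
        return noi / price
    if gross_income is not None:
        return gross_income * (1 - expense_ratio) / price
    if market_rent_per_sf is not None and sqft is not None:
        return market_rent_per_sf * sqft * (1 - expense_ratio) / price
    return None
```

Tag each estimate with which branch produced it so downstream screens can weight a listed-NOI cap rate more heavily than a rent-inferred one.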
Legal and Ethical Considerations for CRE Scraping
Commercial real estate platform scraping carries higher legal risk than scraping public residential listing sites. Key factors:
- Terms of service: LoopNet and CoStar have explicit anti-scraping provisions. Violating ToS could result in account bans and potential legal action.
- Copyright: Listing descriptions, photos, and broker-prepared materials may be copyrighted. Scraping metadata (prices, sizes, locations) is legally different from copying creative content.
- Database rights: In some jurisdictions, the compilation of data itself has legal protection, even if individual data points are factual.
- Contract law: If you hold a paid account with a platform that prohibits scraping, you are contractually bound to comply, a stronger obligation than the browse-wrap ToS that governs an anonymous visitor.
The safest approach is to focus your scraping on public-facing, freely accessible data and supplement with legitimate API access or subscriptions where the data is behind authentication walls.
Frequently Asked Questions
Is scraping LoopNet worth the effort when the data is limited on free accounts?
Yes, even free-tier LoopNet data is valuable for market mapping and trend analysis. You can track listing volume, asking prices, days on market, and property types over time. This macro-level data helps identify market trends and opportunities without requiring the detailed financials available only to premium subscribers. The free data tells you where to focus; the premium data (obtained through legitimate subscriptions) informs specific deal underwriting.
Can I scrape CoStar without an account?
CoStar’s public-facing web presence is minimal. Without an account, you can only scrape marketing content and occasional market reports they publish publicly. The core property database, analytics, and comp data require authentication. Instead of trying to scrape CoStar directly, build equivalent datasets from county records, LoopNet, CREXi, and broker websites — which collectively cover much of the same raw data that CoStar aggregates and enhances.
How do I handle properties that appear on multiple platforms with different data?
Create a “golden record” system. When you identify duplicate listings across platforms, merge the data by taking the most complete or most recent value for each field. For example, CREXi might have a more recent price update while LoopNet has more detailed property characteristics. Your merged record combines the best data from each source. Track the source of each field for audit purposes.
What is the minimum proxy setup for a CRE scraping operation?
For a single-market operation scraping LoopNet and CREXi, you can start with a pool of 25-50 rotating residential proxies. This provides enough IP diversity for 5,000-10,000 monthly requests with comfortable rotation intervals. As you add more platforms and markets, scale your pool proportionally. Budget approximately $75-$150 per month for a starter proxy plan with a reputable residential provider.
Should I scrape CRE data myself or buy it from a data provider?
It depends on your scale and technical capabilities. Data providers like ATTOM, Reonomy, and CoStar offer clean, structured CRE data at premium prices ($500-$5,000+ per month). If you need data for a few markets and lack technical resources, buying is faster. If you need broad market coverage, custom data points, or higher refresh frequencies — or if you want to avoid ongoing subscription costs — building your own scraping pipeline pays for itself within a few months.