If you have ever researched real estate data, you have probably heard the term “MLS,” short for Multiple Listing Service. It is the backbone of real estate transactions in the United States, containing the most comprehensive and accurate listing data available anywhere. Naturally, anyone building a data-driven real estate tool wants access to it. But can you legally scrape MLS data? The answer is complicated, and getting it wrong can expose you to serious legal liability. This guide breaks down what MLS data is, the legal landscape surrounding it, and which legitimate alternatives exist, including where proxies fit in when you collect data from public-facing portals.
What Is the MLS and Why Does It Matter?
The Multiple Listing Service is not a single database; it is a network of approximately 580 regional databases across the United States. Each regional MLS is typically operated by a local or regional association of Realtors and serves as the authoritative repository of listing information for its area.
What Data Does the MLS Contain?
- Active listings: Every property listed for sale by an MLS member, with complete details (price, property specs, photos, listing agent, showing instructions)
- Pending and sold data: Transaction status updates, including final sale prices
- Historical transactions: Past sales, price changes, and listing history going back years or decades
- Agent and broker data: Contact information, transaction history, and commission structures
- Confidential remarks: Notes visible only to agents (showing instructions, seller motivations, etc.)
- Off-market data: Properties being marketed privately among agents before public listing
MLS vs. Public Real Estate Sites
| Feature | MLS (Direct) | Zillow / Redfin / Realtor.com |
|---|---|---|
| Data completeness | All listings entered by MLS members | 90-95% (some delayed or missing) |
| Data freshness | Real-time | Minutes to hours delay |
| Sold price accuracy | Exact (reported by the listing agent at closing) | Usually accurate, sometimes estimated |
| Confidential agent notes | Yes | No |
| Off-market listings | Some (pocket listings) | No |
| Commission data | Yes (before NAR settlement changes) | No |
| Access requirements | Licensed agent/broker or approved vendor | Public access |
| Cost | MLS membership fees ($200-$1,000+/year) | Free |
The Legal Landscape Around MLS Data Scraping
This is the section that matters most. MLS data scraping sits at the intersection of several legal frameworks, and the risks are significantly higher than scraping public real estate websites. For context on the broader legal landscape of web scraping, see our guide on legal and ethical considerations for price scraping with proxies.
Copyright Protection
MLS databases are compilations of data, and compilations can be protected by copyright (17 U.S.C. § 103). While individual facts (a property’s price or square footage) are not copyrightable, the selection, coordination, and arrangement of data in an MLS database may be. Key cases:
- Feist Publications v. Rural Telephone Service (1991): The Supreme Court held that facts themselves are not copyrightable, but a factual compilation can receive copyright protection if its selection, coordination, or arrangement shows at least a minimal degree of creativity. The directory at issue, an alphabetical white-pages listing, did not meet that bar, so purely factual databases with obvious arrangements (alphabetical, geographic) may not qualify.
- MLS-specific rulings: Several courts have found that MLS databases contain sufficient creative selection and arrangement to qualify for copyright protection, particularly the combination of fields, categories, and status designations that each MLS creates.
Terms of Service and Contractual Restrictions
Every MLS has Terms of Service that explicitly prohibit unauthorized access, scraping, redistribution, and commercial use of its data. Unlike public websites (where ToS enforcement is debated), MLS Terms of Service carry more weight because:
- MLS access requires an account, creating a clear contractual relationship
- Users must affirmatively agree to terms before accessing data
- The MLS can revoke access and membership for violations
- Violations can result in NAR disciplinary proceedings for licensed agents
Computer Fraud and Abuse Act (CFAA)
Accessing an MLS system without authorization or in excess of authorized access could violate the CFAA. Unlike scraping public websites (where the hiQ v. LinkedIn precedent provides some protection), MLS systems are access-controlled and require credentials. Unauthorized access to credentialed systems is much more clearly within the CFAA’s scope.
State Laws
Many states have their own computer access and data protection laws that may apply to MLS scraping. California’s Comprehensive Computer Data Access and Fraud Act (CDAFA, Penal Code § 502) and similar state statutes can create additional liability.
NAR Rules and Regulations
The National Association of Realtors sets rules governing MLS data use. NAR’s Handbook on Multiple Listing Policy explicitly covers data licensing, display, and redistribution. Violating these rules can result in:
- Fines from the local MLS
- Suspension or termination of MLS membership
- Loss of Realtor designation
- In extreme cases, license revocation by the state real estate commission
Public vs. Private MLS Data: A Critical Distinction
Not all MLS data is equally protected. Understanding the public/private distinction is crucial for assessing legal risk.
Public MLS Data
Some MLS data is intentionally made public through IDX (Internet Data Exchange) feeds, syndication agreements with portals like Zillow, and public-facing MLS websites. This data includes:
- Active listing details (price, photos, property specs)
- Listing agent contact information
- Open house schedules
- Basic property descriptions
Scraping publicly displayed MLS data from consumer-facing websites falls into the same legal gray area as scraping Zillow or Redfin. The hiQ precedent provides some protection, but the data’s origin in an MLS system adds complexity.
Private MLS Data
Other MLS data is intentionally restricted to authorized users:
- Confidential agent remarks
- Commission rates and buyer agent compensation
- Showing instructions and lockbox codes
- Seller contact information
- Detailed transaction documents
Accessing or scraping private MLS data without authorization is clearly high-risk from a legal standpoint. This crosses the line from “scraping public data” (legally ambiguous) to “unauthorized access to protected systems” (clearly problematic).
Legal Alternatives to Scraping MLS Data
Given the legal risks of directly scraping MLS systems, several legitimate alternatives exist for accessing MLS-quality data.
1. IDX Feeds
Internet Data Exchange (IDX) is a system that allows MLS members to display listing data on their own websites. If you partner with a licensed broker, you can access IDX feeds that provide real-time listing data in a structured format.
- Pros: Legitimate access, real-time data, structured format (legacy RETS or the newer RESO Web API)
- Cons: Requires broker partnership, display restrictions apply, limited to the broker’s MLS regions
- Cost: Typically $50-$200/month per MLS, plus broker cooperation
2. RESO Web API
The Real Estate Standards Organization (RESO) has developed a standardized Web API that many MLS systems now support. This provides a RESTful API for accessing listing data with proper authentication and authorization.
- Pros: Standardized format across MLS systems, clean API access, legitimate
- Cons: Requires vendor approval from each MLS, application process, compliance requirements
- Cost: Varies by MLS — typically $100-$500/month plus per-record fees
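To make the RESO Web API option more concrete: the standard is built on OData, so a query is expressed as URL parameters such as `$filter`, `$select`, and `$top` against a resource like `Property`, using field names from the RESO Data Dictionary (`ListingKey`, `ListPrice`, `StandardStatus`, `City`, `ModificationTimestamp`). The minimal sketch below only builds the request and never sends it. The base URL and access token are hypothetical placeholders; each MLS or vendor issues its own endpoint and bearer-token credentials after approval, and the fields you may query depend on your license.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical endpoint: every MLS or vendor publishes its own Web API base URL.
BASE_URL = "https://api.example-mls.com/reso/odata"


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_property_query(city: str, status: str = "Active", top: int = 25) -> str:
    """Build an OData query URL against the RESO Property resource."""
    params = {
        "$filter": f"StandardStatus eq {odata_string(status)} and City eq {odata_string(city)}",
        "$select": "ListingKey,ListPrice,City,StandardStatus,ModificationTimestamp",
        "$orderby": "ModificationTimestamp desc",
        "$top": str(top),
    }
    # Keep $, commas, and quotes readable; spaces become %20 rather than '+'.
    query = urlencode(params, safe="$,'", quote_via=quote)
    return f"{BASE_URL}/Property?{query}"


def build_request(url: str, access_token: str) -> Request:
    """Attach the bearer token issued by the MLS or vendor (nothing is sent here)."""
    return Request(url, headers={
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    })
```

Building the URL separately from sending it makes it easy to log and review exactly what you request, which helps when an MLS audits vendor usage.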
3. Data Aggregators
Companies like CoreLogic, ATTOM Data, and Black Knight aggregate MLS data from hundreds of sources and license it to businesses. This is the most comprehensive option but also the most expensive.
- Pros: Nationwide coverage, historical data, additional data layers (tax, mortgage, foreclosure)
- Cons: Very expensive, minimum contract requirements, restrictive licensing terms
- Cost: $500-$10,000+/month depending on data scope
4. Public Record Sources
County recorder offices, assessor databases, and other public record sources contain property data that is truly public. Scraping these sources carries much lower legal risk.
- Pros: Genuinely public data, minimal legal risk, includes tax and ownership data
- Cons: No active listing data (only recorded transactions), inconsistent formats across counties, often requires county-by-county access
- Cost: Free to low cost (some counties charge for bulk access)
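One way to cope with inconsistent county formats is to map each county's export onto a single shared schema before any analysis. The sketch below assumes two hypothetical counties whose column names (`APN`, `SalePrice`, `PARCEL_ID`, `CONSIDERATION`, and so on) and date formats are invented for illustration; real exports must be inspected county by county.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, Optional

# Hypothetical column names and date formats for two counties. Real exports
# differ by county and must be inspected before writing a mapping.
COUNTY_SCHEMAS: Dict[str, Dict[str, str]] = {
    "county_a": {"parcel": "APN", "price": "SalePrice",
                 "recorded": "RecordingDate", "date_format": "%m/%d/%Y"},
    "county_b": {"parcel": "PARCEL_ID", "price": "CONSIDERATION",
                 "recorded": "REC_DT", "date_format": "%Y-%m-%d"},
}


@dataclass(frozen=True)
class RecordedSale:
    county: str
    parcel_id: str
    sale_price: Optional[int]  # None when the price field is blank
    recorded_on: date


def normalize(county: str, row: Dict[str, str]) -> RecordedSale:
    """Map one county-specific row onto the shared RecordedSale schema."""
    schema = COUNTY_SCHEMAS[county]
    # Strip currency symbols and thousands separators before parsing.
    raw_price = row[schema["price"]].replace("$", "").replace(",", "").strip()
    recorded = datetime.strptime(row[schema["recorded"]].strip(), schema["date_format"])
    return RecordedSale(
        county=county,
        parcel_id=row[schema["parcel"]].strip(),
        sale_price=int(Decimal(raw_price)) if raw_price else None,
        recorded_on=recorded.date(),
    )
```

Adding a new county then means adding one mapping entry rather than writing a new parser.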
5. Scraping Public Real Estate Portals
Instead of scraping MLS systems directly, scrape the public-facing websites that display MLS data: Zillow, Redfin, Realtor.com, and Trulia. These sites source their data from MLS systems but display it publicly. While this approach has its own legal considerations, the risk profile is significantly lower than accessing MLS systems directly.
| Data Source | Legal Risk | Data Quality | Cost | Best For |
|---|---|---|---|---|
| Direct MLS scraping | High | Excellent | Low (proxy costs) | Not recommended |
| IDX feeds | Low | Excellent | Moderate | Licensed brokers/agents |
| RESO Web API | Low | Excellent | Moderate-High | Tech companies, startups |
| Data aggregators | Very low | Excellent | High | Enterprise, large businesses |
| Public records | Very low | Good (no active listings) | Low | Transaction/ownership data |
| Public portal scraping | Low-Moderate | Good | Low (proxy costs) | Most independent users |
If You Decide to Scrape Public MLS Portals
Many MLS organizations operate public-facing websites where consumers can search for listings. These are different from the member-only MLS portals. If you decide to scrape these public-facing MLS websites, keep the following proxy considerations in mind:
MLS Portal Technical Characteristics
Public MLS portals tend to be less technically sophisticated than major platforms like Zillow or Redfin. Many run on older web platforms and have basic anti-bot protections. However, some have adopted modern bot detection tools.
- Smaller sites: Many regional MLS public portals have minimal bot protection. Rotating residential proxies are usually sufficient.
- Larger portals: Major MLS portals (like those serving large metro areas) may use commercial bot detection. ISP proxies provide more reliable access.
- Rate sensitivity: MLS portals typically have less server capacity than Zillow or Redfin. Keep request rates low (roughly 5-15 requests per minute, scaled to portal size) to avoid overloading their infrastructure.
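The rotation and rate advice above can be combined in a small scheduler: it spaces requests at a fixed interval derived from a requests-per-minute budget and hands out proxies round-robin. This is a minimal sketch; the proxy URLs are placeholders, and the injectable `clock` and `sleep` parameters exist only so the timing can be tested without waiting.

```python
import itertools
import time
from typing import Callable, Iterable, Optional


class PoliteScheduler:
    """Space requests at a fixed rate and hand out proxies round-robin."""

    def __init__(self, requests_per_minute: float, proxies: Iterable[str],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self._proxies = itertools.cycle(proxy_list)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def next_slot(self) -> str:
        """Wait until the next request is allowed, then return the proxy to use."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return next(self._proxies)
```

In a crawler you would call `next_slot()` before each request and pass the returned proxy to your HTTP client, for example `requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)`. Fixed spacing is deliberately stricter than a token bucket: it never allows bursts, which suits portals with limited server capacity.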
Proxy Recommendations for MLS Portal Scraping
| MLS Portal Size | Recommended Proxy | Rate Limit |
|---|---|---|
| Small regional (under 10K listings) | Rotating Residential | 5-8 req/min |
| Mid-size metro (10K-100K listings) | ISP or Rotating Residential | 8-12 req/min |
| Large metro (100K+ listings) | ISP with Residential backup | 10-15 req/min |
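If you scrape several portals, the table can be encoded as a lookup that also converts requests per minute into the delay between requests. This sketch copies the table's values; one assumption is needed because the "10K-100K" and "100K+" rows overlap at exactly 100,000 listings, and here that boundary is treated as large metro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalPlan:
    proxy_type: str
    min_rpm: int  # conservative end of the table's range
    max_rpm: int  # upper end of the table's range

    def seconds_between_requests(self, conservative: bool = True) -> float:
        """Convert requests per minute into a delay between requests."""
        rpm = self.min_rpm if conservative else self.max_rpm
        return 60.0 / rpm


def plan_for_portal(listing_count: int) -> PortalPlan:
    """Look up the table row for a portal of the given size."""
    if listing_count < 0:
        raise ValueError("listing_count cannot be negative")
    if listing_count < 10_000:
        return PortalPlan("Rotating Residential", 5, 8)
    # Assumption: the table's ranges overlap at exactly 100K listings;
    # this sketch treats 100K and above as "large metro".
    if listing_count < 100_000:
        return PortalPlan("ISP or Rotating Residential", 8, 12)
    return PortalPlan("ISP with Residential backup", 10, 15)
```

For example, a 40,000-listing portal falls in the mid-size row, and its conservative rate of 8 requests per minute works out to one request every 7.5 seconds.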
The NAR Settlement and Its Impact on MLS Data
In 2024, the National Association of Realtors reached a landmark settlement that changed how MLS data is structured and shared. Key changes relevant to data access:
- Commission transparency: Buyer agent compensation offers are no longer published in MLS listings, removing a data point that was previously available.
- Data sharing evolution: The settlement has accelerated discussions about MLS data accessibility and whether the traditional gatekeeper model should evolve.
- New entrants: The settlement has opened doors for alternative listing platforms and data services, potentially creating new legitimate data sources.
The real estate data landscape is evolving rapidly, and what is available today may change significantly in the coming years. Building flexible data pipelines that can adapt to new sources and formats is more important than ever.
Building a Compliant Real Estate Data Strategy
Step 1: Identify What Data You Actually Need
Before deciding how to get data, define exactly what you need. Active listings? Sold prices? Tax records? You may find that public portal scraping covers your needs without touching MLS systems directly.
Step 2: Use the Lowest-Risk Source That Meets Your Needs
Start with public records and public real estate portals. If those are insufficient, explore IDX feeds through a broker partnership. Only consider direct MLS access (through legitimate vendor agreements) for enterprise-grade applications.
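Steps 1 and 2 amount to a simple selection rule: list the data you need, then walk the sources from lowest to highest risk and add one only if it covers something still missing. The sketch below follows Step 2's ordering; the data-type labels and coverage sets are illustrative simplifications of this guide's descriptions, not a promise about any specific county, portal, broker, or MLS.

```python
from typing import List, Set, Tuple

# Candidate sources ordered lowest-risk first, following Step 2. Coverage sets
# are illustrative assumptions; replace them with what your sources provide.
SOURCES: List[Tuple[str, Set[str]]] = [
    ("public records", {"sold_prices", "tax", "ownership"}),
    ("public portals", {"active_listings", "sold_prices", "listing_history"}),
    ("IDX feed via broker", {"active_listings", "real_time_updates"}),
    ("MLS vendor agreement", {"active_listings", "sold_prices",
                              "listing_history", "real_time_updates"}),
]


def plan_sources(needed: Set[str]) -> Tuple[List[str], Set[str]]:
    """Greedily pick sources in risk order until every need is covered.

    Returns the chosen sources and any needs that no listed source covers.
    """
    remaining = set(needed)
    chosen: List[str] = []
    for name, covers in SOURCES:
        if not remaining:
            break
        if remaining & covers:
            chosen.append(name)
            remaining -= covers
    return chosen, remaining
```

Because the search is greedy, it favors a lower-risk source even when a later source alone would cover everything. That mirrors the "start low, escalate only if needed" advice rather than minimizing the number of sources, and any leftover needs signal that you should talk to a broker or vendor instead of scraping.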
Step 3: Implement Proper Data Handling
Regardless of your data source, implement responsible data handling: do not republish copyrighted descriptions or photos, respect robots.txt preferences, keep request rates reasonable, and store only the data you need.
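Two of these practices translate directly into code: checking robots.txt before fetching, and dropping fields you do not need before storing anything. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already downloaded; the field whitelist is a hypothetical example, and it leaves out descriptions and photos because those may be copyrighted.

```python
from typing import Any, Dict
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: only the fields this project actually needs.
# Listing descriptions and photos are excluded because they may be copyrighted.
KEEP_FIELDS = {"address", "list_price", "beds", "baths", "sqft", "status"}


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text you have already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def minimize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every field that is not on the whitelist before storing."""
    return {key: value for key, value in record.items() if key in KEEP_FIELDS}
```

Before each request, call `parser.can_fetch(user_agent, url)` and skip the URL if it returns `False`; if `parser.crawl_delay(user_agent)` returns a value, use it as a floor for the delay between requests.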
Step 4: Consult a Lawyer
If your business model depends on real estate data, invest in legal counsel familiar with real estate data law. The cost of legal advice is far less than the cost of a lawsuit from an MLS organization.
FAQ
Is scraping MLS data illegal?
Scraping private MLS portals (those requiring member credentials) without authorization is legally risky and could violate the Computer Fraud and Abuse Act, state computer access laws, and MLS Terms of Service. Scraping public-facing MLS consumer portals falls into a gray area similar to scraping Zillow or Redfin. The legal risk depends heavily on how you access the data, what data you take, and how you use it. This is not a simple yes/no answer — consult an attorney for your specific situation.
What is the difference between MLS data and Zillow data?
MLS data is the original source — it is entered by listing agents and contains complete, real-time information including confidential notes and commission structures. Zillow receives a subset of MLS data through syndication agreements and IDX feeds, typically with a delay of minutes to hours. Zillow also adds its own data layers (Zestimate, page views, save counts) that are not available in MLS systems. For most analytical purposes, Zillow data is a reasonable proxy for MLS data, though it lacks the completeness and freshness of direct MLS access.
Can I access MLS data through a broker partnership?
Yes, this is one of the most common legitimate approaches. If you partner with a licensed broker who is an MLS member, they can provide you access to IDX feeds or arrange for your technology to be approved as an MLS vendor. The broker acts as your sponsor, and you must comply with the MLS’s data use policies. This approach requires a real relationship with a broker and compliance with display rules — you cannot simply use the data however you want.
Are there free alternatives to MLS data for real estate analysis?
Yes. Public county assessor and recorder websites provide free access to property tax records, ownership history, and recorded transaction prices. The Census Bureau provides demographic and housing data at the neighborhood level. Redfin offers free downloadable market data through its Data Center. And scraping public real estate portals (with appropriate proxies) can provide listing-level data at minimal cost. For building a system that tracks prices from these public sources, see our guide to building a real estate price tracker with rotating proxies.
What happens if I get caught scraping MLS data?
Consequences depend on the nature of the scraping and who pursues action. For licensed agents scraping their own MLS, the most common consequences are fines, suspension of MLS access, and potentially disciplinary proceedings through NAR. For unlicensed individuals or companies, the MLS organization may send a cease-and-desist letter, pursue civil litigation for copyright infringement or breach of contract, or report the access as a CFAA violation. The severity typically scales with the volume of data accessed and the commercial use made of it.