Scraping NFT collection floor prices and metadata in 2026

NFT data scraping in 2026 sits at the intersection of three different systems that all need to agree before you have a complete record: the marketplace API for current listings and floor price, the underlying blockchain for ownership and provenance, and IPFS or Arweave for the actual metadata and image. Pull from only one source and you have a partial picture; pull from all three and you spend most of your engineering time on rate limits, gateway timeouts, and CDN caching weirdness. The market has consolidated since the 2021-2022 peak. Three platforms (OpenSea, Blur, Magic Eden) handle the vast majority of liquidity, and the long tail of marketplaces is mostly dead. That consolidation makes the scraping problem more tractable than it was three years ago.

This guide covers the practical mechanics of building an NFT data pipeline in 2026: which marketplace APIs are public versus paid, how to reconcile on-chain truth against marketplace cache, and the rate-limit and proxy patterns that let a small operation track 10,000+ collections continuously.

What “floor price” actually means and why it is hard

The floor price of a collection is the lowest active listing price on a marketplace. It sounds simple but it has three quirks that ruin naive scrapers.

First, floor is per-marketplace. A collection might have a 0.5 ETH floor on OpenSea and a 0.45 ETH floor on Blur because of fee differences and platform-specific listings. The “true” floor is the minimum across all marketplaces where the collection trades.

Second, floor is sensitive to outliers. A single listing at a clearly broken price (1 wei, or accidentally priced in DAI instead of ETH) becomes the technical floor until it gets bought or canceled. Production trackers usually compute a “true floor” by sorting listings ascending and taking the price at the 1st percentile or after the first few listings, ignoring obvious outliers.
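One way to implement that outlier filtering is sketched below. The function name `robust_floor`, the 20-listing reference window, and the 50% cutoff are illustrative assumptions, not values any marketplace publishes; tune them against your own listing history.

```python
from statistics import median

def robust_floor(prices, min_ratio=0.5, window_size=20):
    """Return the lowest listing price that is not an obvious outlier.

    prices: active listing prices, all in the same currency (e.g. ETH).
    min_ratio: assumed cutoff; anything below min_ratio * reference median
               is treated as a broken listing and skipped.
    """
    cleaned = sorted(p for p in prices if p > 0)
    if not cleaned:
        return None
    # Use only the cheapest listings as the reference window so a few very
    # expensive listings do not drag the median upward.
    reference = median(cleaned[:window_size])
    cutoff = min_ratio * reference
    for price in cleaned:
        if price >= cutoff:
            return price
    return None
```

Running this per marketplace and then taking the minimum of the results gives a cross-venue floor that ignores a stray 1 wei listing.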

Third, floor changes constantly. During a popular mint or hype cycle, floor can move 5-10% in a single minute. Polling at 5-minute cadence will miss most of the movement. Real-time floor tracking requires websocket or webhook subscriptions where the marketplace offers them.

OpenSea API in 2026

OpenSea operates the most widely used NFT API. As of 2026, the v2 API requires an API key for almost everything. You can request a free key through their developer portal, but the free tier is limited to 4 requests per second and excludes the highest-value endpoints (real-time order book, historical sales). Paid plans start at around $200/month for higher rate limits and at $1,500/month for the data tier with full historical access.

The free tier is enough for tracking floor price across a few hundred collections at 5-minute cadence. Past that you either pay or you augment with on-chain data (which is free but has its own complexity).

import time
import requests

class OpenSeaClient:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.session = requests.Session()
        self.session.headers.update({
            "X-API-KEY": api_key,
            "Accept": "application/json",
        })

    def get_collection(self, slug: str):
        url = f"https://api.opensea.io/api/v2/collections/{slug}"
        return self._get(url)

    def get_listings(self, slug: str, limit: int = 50):
        url = f"https://api.opensea.io/api/v2/listings/collection/{slug}/all"
        return self._get(url, params={"limit": limit})

    def get_stats(self, slug: str):
        url = f"https://api.opensea.io/api/v2/collections/{slug}/stats"
        return self._get(url)

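    def _get(self, url, params=None, retries=3):
        for attempt in range(retries):
            resp = self.session.get(url, params=params, timeout=15)
            if resp.status_code == 429:
                # Rate limited: back off exponentially (1s, 2s, 4s) and retry.
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return resp.json()
        # Every attempt was rate limited; fail loudly instead of returning None.
        raise RuntimeError(f"rate limited after {retries} attempts: {url}")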

The stats endpoint is the cheapest way to track floor price because it returns the floor and total volume in a single call. The listings endpoint gives more detail but costs more rate-limit budget per collection.
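Backing off after a 429 is reactive. To avoid hitting the 4 requests per second limit in the first place, you can pace calls on the client side. The sketch below is a minimal fixed-interval throttle; the class name `Throttle` and the injectable `clock` and `sleep` parameters are illustrative choices, the latter two added only so the pacing can be tested without real waiting.

```python
import time

class Throttle:
    """Space calls at least 1/rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
            now += delay
        self.next_allowed = now + self.min_interval
        return max(delay, 0.0)
```

Create one `Throttle(4)` per API key and call `wait()` immediately before each request.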

Blur API: less documented but more powerful

Blur captured most of the professional trading volume in 2023 and has held it. Their API is technically not public but a usable endpoint exists at https://core-api.prod.blur.io/v1/. It requires a session token that you obtain by signing a wallet message. Tracking sites like Nansen and CryptoSlam use this endpoint. Blur tolerates it as long as you stay below roughly 100 requests per minute per session.

Blur’s particular value is the bid pool: a unified bidding mechanism where buyers commit ETH against an entire collection at a price. The sum of bids at each tier is a strong demand signal that does not exist on OpenSea. Scraping the Blur bid pool gives you data that is genuinely not available anywhere else without paying Blur for it directly.
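As a rough illustration of turning bid tiers into a demand signal, the sketch below sums the ETH committed within a few percent of the floor. The `(price_eth, bid_count)` input shape and the function name `bid_depth` are assumptions for the example, not Blur's response format; map whatever the endpoint returns into this shape first.

```python
def bid_depth(bid_tiers, floor_price, within_pct=5.0):
    """Sum the ETH committed in bid tiers within `within_pct` percent of floor.

    bid_tiers: iterable of (price_eth, bid_count) pairs (hypothetical shape).
    floor_price: current floor in ETH.
    """
    threshold = floor_price * (1 - within_pct / 100.0)
    return sum(
        price * count
        for price, count in bid_tiers
        if price >= threshold
    )
```

Tracking this number over time, rather than its absolute value, is usually what makes it useful: depth near the floor draining away tends to precede a floor drop.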

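def blur_collection_stats(slug: str, auth_token: str):
    url = f"https://core-api.prod.blur.io/v1/collections/{slug}"
    resp = requests.get(
        url,
        headers={
            "authToken": auth_token,  # session token from the wallet-signing login
            "User-Agent": "Mozilla/5.0",
        },
        timeout=10,
    )
    # Surface HTTP errors (expired token, rate limit) instead of parsing an error body.
    resp.raise_for_status()
    return resp.json()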

The auth token expires after about 24 hours. Production setups rotate the wallet signing automatically.
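A small cache that refreshes the token shortly before expiry keeps that rotation out of the request code. In this sketch, `fetch_token` is a placeholder for whatever callable performs your wallet-signing login, and the 23-hour default is an assumed safety margin under the roughly 24-hour lifetime.

```python
import time

class TokenCache:
    """Cache a session token and refresh it once the TTL has elapsed."""

    def __init__(self, fetch_token, ttl_seconds=23 * 3600, clock=time.time):
        self.fetch_token = fetch_token  # callable that signs in and returns a token
        self.ttl = ttl_seconds
        self.clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, logging in again only when needed."""
        if self._token is None or self.clock() >= self._expires_at:
            self._token = self.fetch_token()
            self._expires_at = self.clock() + self.ttl
        return self._token
```

Pass `cache.get()` as the `auth_token` argument on every call so an expired token is replaced transparently.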

Magic Eden for Solana and multichain

Magic Eden is the dominant Solana NFT marketplace and has expanded to Bitcoin Ordinals, Polygon, and Ethereum. Their public API at https://api-mainnet.magiceden.dev/v2/ does not require an API key for most endpoints and tolerates 2 requests per second per IP.

def magiceden_collection_stats(symbol: str):
    url = f"https://api-mainnet.magiceden.dev/v2/collections/{symbol}/stats"
    resp = requests.get(url, timeout=10)
    # Raise on 4xx/5xx so a rate-limit response is not mistaken for stats.
    resp.raise_for_status()
    return resp.json()

Magic Eden is the right tool when you care about Solana NFT data or Bitcoin Ordinals. For Ethereum NFTs, OpenSea and Blur have deeper liquidity and you should use them instead.

Marketplace API comparison

marketplace	API auth	free rate limit	floor price	full listing data	best for
OpenSea	API key (free + paid)	4 req/s free	yes	yes (paid for full)	Ethereum mainstream
Blur	session token	~100 req/min	yes	yes	Ethereum trading data
Magic Eden	none	2 req/s	yes	yes	Solana, Ordinals, multichain
LooksRare	none for public	5 req/s	yes	yes	Ethereum royalty-aware
X2Y2	API key (paid)	varies	yes	yes	Ethereum, declining
Tensor	none	aggressive	yes	yes	Solana professional
Reservoir	API key (free tier)	30k req/day free	yes	yes (aggregated)	aggregator across all ETH chains

Reservoir deserves special mention. They aggregate listings from OpenSea, Blur, LooksRare, and several others into a unified API. For most use cases, Reservoir is easier than running separate scrapers against each marketplace. The free tier is generous and covers most research work.

Marketplace decision matrix

Use this matrix when deciding which marketplace API to lean on for a given collection or use case:

use case	primary	fallback	notes
Top-100 Ethereum bluechip floor	Reservoir	OpenSea	Reservoir’s aggregated floor catches Blur and Sudoswap that OpenSea misses
Trader analytics on Ethereum	Blur	Reservoir	Blur’s bid pool data is unique and worth the auth pain
Solana NFT floor and trades	Magic Eden	Tensor	Magic Eden has full coverage; Tensor for execution-quality data
Bitcoin Ordinals	Magic Eden	Hiro	Magic Eden Ordinals support has matured; Hiro still useful for protocol-level inscriptions
L2 NFT collections (Base, Zora)	Reservoir	Native marketplace	Reservoir has Base and Zora; smaller chains are spotty
Royalty enforcement analysis	LooksRare + custom	Reservoir	Royalty data requires per-marketplace logic that aggregators flatten
Real-time floor tracking for trading bots	Reservoir webhooks	Marketplace websockets	Webhooks remove polling latency entirely

Pick the primary based on use case rather than alphabetical order. The fallback path is critical because every marketplace API has occasional outages and rate-limit surprises.
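The primary-then-fallback pattern can be sketched as an ordered list of fetchers. The function name `fetch_with_fallback` and the `(name, callable)` pair shape are illustrative; each callable would wrap one of the marketplace clients shown earlier.

```python
def fetch_with_fallback(slug, sources):
    """Try each (name, fetch) pair in order.

    Returns (name, result) from the first source that answers with data.
    Raises RuntimeError if every source fails or returns None.
    """
    errors = {}
    for name, fetch in sources:
        try:
            result = fetch(slug)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors[name] = exc
            continue
        if result is not None:
            return name, result
        errors[name] = LookupError("no data")
    raise RuntimeError(f"all sources failed for {slug}: {errors}")
```

Returning the source name alongside the result lets you record which venue each snapshot came from, which matters when you later compare floors across sources.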

On-chain truth: when the marketplace is wrong

Marketplaces cache aggressively. A listing might be canceled or filled on-chain but still showing as active on the marketplace API for 30-60 seconds. For research this is fine. For trading or arbitrage detection it is fatal.

The authoritative source for ownership and listing status is the blockchain itself. Each marketplace operates its own listing contract (Seaport for OpenSea, BlurExchange for Blur). You can read the contract state directly via RPC and confirm which listings are still active.

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://eth-mainnet.g.alchemy.com/v2/YOUR_KEY"))

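SEAPORT_ADDRESS = "0x00000000000000ADc04C56Bf30aC9d3c0aAF14dC"
# Minimal ABI fragment: only the getOrderStatus view is needed here.
SEAPORT_ABI = [{
    "name": "getOrderStatus", "type": "function", "stateMutability": "view",
    "inputs": [{"name": "orderHash", "type": "bytes32"}],
    "outputs": [{"name": "isValidated", "type": "bool"}, {"name": "isCancelled", "type": "bool"},
                {"name": "totalFilled", "type": "uint256"}, {"name": "totalSize", "type": "uint256"}],
}]
seaport = w3.eth.contract(address=SEAPORT_ADDRESS, abi=SEAPORT_ABI)

def is_listing_active(order_hash: bytes) -> bool:
    _is_validated, is_cancelled, total_filled, total_size = (
        seaport.functions.getOrderStatus(order_hash).call()
    )
    # Off-chain signed orders are usually never validated on-chain, and an
    # unfilled order reports totalFilled == totalSize == 0, so neither can gate.
    fully_filled = total_size != 0 and total_filled >= total_size
    return not is_cancelled and not fully_filled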

For ownership, the ERC-721 ownerOf(tokenId) view is the truth. Marketplaces show owner data that may be stale by minutes. If you are scoring rarity or computing wallet holdings, you must call the contract directly.
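With web3.py this is a one-line `contract.functions.ownerOf(token_id).call()`. If you want to batch lookups or avoid the dependency, the same call can be built as a raw `eth_call` JSON-RPC request. The sketch below only constructs the payload and decodes the reply; the helper names are illustrative, and sending the request to your RPC provider is left to your HTTP client.

```python
# First 4 bytes of keccak256("ownerOf(uint256)"), the ERC-721 selector.
OWNER_OF_SELECTOR = "0x6352211e"

def owner_of_request(contract_address: str, token_id: int, request_id: int = 1) -> dict:
    """Build an eth_call JSON-RPC payload for ownerOf(token_id)."""
    # ABI encoding: selector followed by the token id as a 32-byte big-endian word.
    data = OWNER_OF_SELECTOR + format(token_id, "064x")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": contract_address, "data": data}, "latest"],
    }

def decode_address(result_hex: str) -> str:
    """Extract the 20-byte address from a 32-byte ABI-encoded result."""
    raw = result_hex[2:] if result_hex.startswith("0x") else result_hex
    if len(raw) != 64:
        raise ValueError("expected a 32-byte word")
    return "0x" + raw[-40:]
```

Most RPC providers accept a JSON array of such payloads in one POST, so distinct `request_id` values let you match replies to tokens when batching.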

Reservoir-first architecture

A pragmatic 2026 architecture treats Reservoir as the primary read source and falls back to direct marketplace APIs only when Reservoir lacks coverage or freshness. The flow:

Reservoir’s tokens/v6 and orders/asks/v5 endpoints aggregate listings across OpenSea, Blur, LooksRare, X2Y2, Sudoswap, and a half-dozen newer venues. One request returns the cross-venue floor and the venue-by-venue breakdown.
For collections Reservoir does not cover (newer chains, niche L2s), call the native marketplace API directly.
For sub-second freshness on top-tier collections, subscribe to Reservoir’s webhook events instead of polling.
Cross-check Reservoir’s data once per day against Etherscan transfer counts to catch indexing lag, which appears occasionally during high-volume mint events.

The single-source pattern saves dozens of integration headaches and the multi-marketplace reconciliation work. The cost is dependency on Reservoir staying alive and pricing reasonably; have the direct-marketplace fallback paths shipped and tested even if you do not use them daily.

IPFS metadata fetching

NFT metadata for image, name, traits, and description is usually stored on IPFS or Arweave, with a token URI that resolves to a JSON document. The chain stores the URI; the URI points to the metadata; the metadata points to the image.
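Token URIs come back in several schemes, so the first step is normalizing them into something an HTTP client can fetch. This is a minimal sketch: the function name `resolve_token_uri` is illustrative, `ipfs.io` is just a default gateway choice, and `arweave.net` is assumed as the Arweave gateway.

```python
def resolve_token_uri(uri: str, ipfs_gateway: str = "https://ipfs.io/ipfs/") -> str:
    """Map an ipfs:// or ar:// token URI to an HTTP URL; pass others through."""
    if uri.startswith("ipfs://"):
        path = uri[len("ipfs://"):]
        # Some contracts emit ipfs://ipfs/<cid>; drop the duplicated prefix.
        if path.startswith("ipfs/"):
            path = path[len("ipfs/"):]
        return ipfs_gateway + path
    if uri.startswith("ar://"):
        return "https://arweave.net/" + uri[len("ar://"):]
    # Already http(s), or a scheme this sketch does not handle.
    return uri
```

The same function works for the `image` field inside the metadata document, which usually uses the same schemes as the token URI.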

IPFS gateways are notoriously unreliable. The default ipfs.io gateway frequently returns 504 timeouts. Production scrapers maintain a list of gateways and fall through to the next one whenever a request fails or times out.

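# Public gateways come and go (Cloudflare retired cloudflare-ipfs.com in 2024);
# re-check this list periodically.
GATEWAYS = [
    "https://ipfs.io/ipfs/",
    "https://dweb.link/ipfs/",
    "https://gateway.pinata.cloud/ipfs/",
    "https://nftstorage.link/ipfs/",
    "https://w3s.link/ipfs/",
]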

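class IPFSFetchError(Exception):
    """Raised when no gateway returns the requested CID."""

def fetch_ipfs(cid: str, timeout: int = 10):
    # Try each gateway in order and return the first successful response.
    for gateway in GATEWAYS:
        try:
            resp = requests.get(gateway + cid, timeout=timeout)
            if resp.status_code == 200:
                return resp.json() if "json" in resp.headers.get("Content-Type", "") else resp.content
        except (requests.RequestException, ValueError):
            # Timeout, connection error, or malformed JSON: try the next gateway.
            continue
    raise IPFSFetchError(cid)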

For projects you scrape repeatedly, pin the metadata to your own IPFS node or upload it to S3. This eliminates gateway flakiness and makes downstream queries instant.

Proxy and rate limit strategy

Marketplace APIs are the rate-limit bottleneck for NFT scraping. The mitigation pattern is multi-key rotation rather than IP rotation. Each developer account gets its own API key, and you round-robin across keys. OpenSea allows multiple API keys per account, and you can register multiple accounts (within their terms) for additional throughput.
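Round-robin key rotation needs very little code. The sketch below assumes the keys are already provisioned and uses the same `X-API-KEY` header as the OpenSea client earlier; the class name `KeyRotator` is illustrative.

```python
from itertools import cycle

class KeyRotator:
    """Hand out API keys in round-robin order."""

    def __init__(self, api_keys):
        api_keys = list(api_keys)
        if not api_keys:
            raise ValueError("at least one API key is required")
        self._keys = cycle(api_keys)

    def headers(self):
        """Return request headers carrying the next key in the rotation."""
        return {"X-API-KEY": next(self._keys), "Accept": "application/json"}
```

Pass `rotator.headers()` per request instead of setting the key once on the session. Rate limits are enforced per key, so each key still needs its own pacing.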

For unkeyed endpoints (Magic Eden, Blur with rotating tokens), proxies do help. Use residential proxies to avoid the “all your traffic from one AWS IP” pattern that gets fingerprinted. We compare options in our best residential proxy providers 2026 review.

For RPC calls to read on-chain data, you have your own rate limits with your RPC provider (Alchemy, Infura, QuickNode). Most providers have generous free tiers, and one provider key per worker process avoids cross-contamination of rate limits.

Storage schema

NFT data is multidimensional and most teams over-design the schema on day one. Start simple:

CREATE TABLE collections (
    slug TEXT PRIMARY KEY,
    chain TEXT NOT NULL,
    contract_address TEXT NOT NULL,
    name TEXT,
    total_supply INTEGER,
    UNIQUE (chain, contract_address)
);

CREATE TABLE collection_snapshots (
    slug TEXT NOT NULL REFERENCES collections(slug),
    captured_at TIMESTAMPTZ NOT NULL,
    floor_price_eth NUMERIC,
    volume_24h_eth NUMERIC,
    sales_24h INTEGER,
    listed_count INTEGER,
    owner_count INTEGER,
    PRIMARY KEY (slug, captured_at)
);

CREATE INDEX ON collection_snapshots (captured_at DESC);

CREATE TABLE token_listings (
    chain TEXT NOT NULL,
    contract_address TEXT NOT NULL,
    token_id NUMERIC NOT NULL,
    marketplace TEXT NOT NULL,
    price_eth NUMERIC,
    seller_address TEXT,
    listed_at TIMESTAMPTZ,
    expires_at TIMESTAMPTZ,
    order_hash TEXT,
    is_active BOOLEAN,
    PRIMARY KEY (chain, contract_address, token_id, marketplace, order_hash)
);

For 1000 collections at 5-minute snapshot cadence, you generate 288,000 collection snapshot rows per day plus listing-level data. PostgreSQL handles this comfortably for a year before you need to consider partitioning.

Snapshot cadence vs storage tradeoff

Cadence has compounding effects on storage and rate-limit budget. The right cadence depends on tier:

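Tier 1 (top 200 collections, mints, news-driven assets): 60 seconds. These move fast enough that a 5-minute gap loses real signal. At 200 collections this is 288,000 calls per day (200 collections × 1,440 polls), or about 3.3 requests per second on average. That nearly saturates a single free OpenSea key at 4 requests per second; with 4 OpenSea API keys you can absorb it comfortably.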
Tier 2 (next 2,000 collections): 5 minutes. Still useful for tracking trends; movement is slower so the lower cadence preserves more than 90% of meaningful signal.
Tier 3 (long tail, 10,000+ collections): 1 hour. Catches major liquidity changes without burning the rate budget. Many long-tail collections have zero listed items most days, so most calls return identical data.

A listing that has sat unchanged for two weeks can reasonably be polled once a day. Reservoir’s snapshot endpoint accepts batched collection lookups, which is the cheapest way to refresh long-tail tiers in bulk.
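The call budget for a tiered schedule is plain arithmetic and worth computing before you commit to a cadence. This sketch uses the tier sizes from the list above and assumes one API call per collection per poll, with no batching.

```python
SECONDS_PER_DAY = 86_400

def calls_per_day(collections: int, interval_seconds: int) -> int:
    """Polls per day for `collections` items refreshed every `interval_seconds`."""
    return collections * SECONDS_PER_DAY // interval_seconds

# (collection count, polling interval in seconds) per tier
TIERS = {
    "tier1": (200, 60),
    "tier2": (2_000, 300),
    "tier3": (10_000, 3_600),
}

budget = {name: calls_per_day(n, s) for name, (n, s) in TIERS.items()}
total = sum(budget.values())
avg_rps = total / SECONDS_PER_DAY  # average requests per second across all tiers

for name, calls in budget.items():
    print(f"{name}: {calls:,} calls/day")
print(f"total: {total:,} calls/day, {avg_rps:.1f} req/s average")
```

Comparing `avg_rps` with the per-key limit tells you how many keys, or how much batching, the schedule needs.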

Sales history and provenance

For provenance and sales history, the on-chain approach is more reliable than marketplace APIs. Every NFT transfer emits a Transfer event from the ERC-721 contract. Every marketplace sale emits a marketplace-specific event (Seaport’s OrderFulfilled, etc.) that includes price.
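Decoding a raw Transfer log takes only a few lines once you know the layout. The sketch below assumes the log dict shape returned by `eth_getLogs` (hex-string topics); the function name is illustrative.

```python
# keccak256("Transfer(address,address,uint256)")
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_erc721_transfer(log: dict):
    """Return (from, to, token_id) for an ERC-721 Transfer log, else None."""
    topics = log.get("topics", [])
    # ERC-721 indexes all three arguments, so there are 4 topics.
    # ERC-20 Transfer shares the signature hash but has only 3 topics.
    if len(topics) != 4 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    sender = "0x" + topics[1][-40:]      # address is the low 20 bytes of the word
    recipient = "0x" + topics[2][-40:]
    token_id = int(topics[3], 16)
    return sender, recipient, token_id
```

A transfer whose sender is the zero address is a mint, which gives you the start of each token's provenance chain.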

Indexing services like Goldsky, The Graph, Subsquid, and the previously mentioned Reservoir aggregate this data and expose it via GraphQL. For one-off queries, use a service. For continuous indexing of specific collections, run a node and subscribe directly to the events.

We cover the broader on-chain indexing patterns in our crypto-defi category hub and our deep dive on scraping crypto exchange order books.

External authoritative reference: the OpenSea API documentation covers the current endpoint catalog and rate-limit policy.

Cost worked example

A practical 2026 setup tracking 5,000 collections across Ethereum and Solana with mixed cadence costs roughly:

Reservoir API free tier ($0) plus one paid key for the higher rate limits ($150/mo)
OpenSea free key for floor stats; no paid tier needed if you reconcile via Reservoir
Magic Eden public API ($0)
Alchemy Growth tier for on-chain reads ($49/mo)
Pinata or NFT.Storage for self-pinned IPFS metadata ($20/mo, 50 GB)
$40/mo VPS for the collector (4 vCPU, 8 GB)
$25/mo Postgres on a small managed instance
30 IPs of residential proxy for Blur and unauth endpoints (~$50/mo on a starter pack)

Total: about $335/month. The same coverage purchased through a vendor (NFTGo Enterprise, Nansen Query) runs $1,500-5,000/month. Self-hosting becomes a clear win once you cross 200 collections and need historical depth.

Common failure modes

The most common failure mode in NFT scrapers is treating the marketplace API as ground truth. Always reconcile against on-chain state for ownership and listing status. The second most common failure is not handling IPFS gateway timeouts. Build retry logic across multiple gateways from day one.

The third failure mode is overreacting to outlier listings. A single 0.0001 ETH listing on a 1 ETH floor collection is almost always either a wash trade attempt or a scam. Production trackers ignore listings below the 1st percentile of recent floor history.

FAQ

Q: which API gives the best Ethereum coverage?
For aggregated Ethereum data, Reservoir is the easiest path. For raw OpenSea data, the official API. For trading depth, Blur. Most production setups combine all three.

Q: do I need a wallet to scrape NFT data?
Not for read-only scraping of public marketplace data. You need a wallet to access Blur’s authenticated endpoints and to interact directly with marketplace contracts on-chain. A throwaway wallet with no funds works fine for read-only authentication.

Q: how do I track floor price changes in real time?
OpenSea offers a streaming events API that pushes order events. Reservoir has webhooks. Without paid endpoints, polling at 60-second cadence is the practical floor for “near real time.”

Q: can I scrape rarity scores?
Rarity is computed from trait distribution, which you derive from the metadata of all tokens in a collection. Once you have the metadata table, rarity computation is trivial. Most rarity tools use the same algorithm and produce similar scores.
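As an example, here is a sketch of one common approach, the sum of inverse trait frequencies. It assumes a simple `{token_id: {trait_type: value}}` metadata shape and does not treat a missing trait as a trait of its own, which some tools do.

```python
from collections import Counter

def rarity_scores(tokens):
    """tokens: {token_id: {trait_type: value}}. Returns {token_id: score}.

    Each trait contributes total_supply / count_of_tokens_with_that_trait,
    so rarer traits add more to the score.
    """
    total = len(tokens)
    counts = Counter(
        (trait, value)
        for traits in tokens.values()
        for trait, value in traits.items()
    )
    return {
        token_id: sum(total / counts[(trait, value)] for trait, value in traits.items())
        for token_id, traits in tokens.items()
    }
```

Sorting the result descending by score gives the rarity rank for each token.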

Q: what about Bitcoin Ordinals?
Ordinals data lives on the Bitcoin chain and is indexed by services like Magic Eden, Hiro, and Ord.io. The standard Bitcoin RPC does not return Ordinal data directly; you need an indexer in front of a full node.

Q: how do I tell a wash trade from a real sale?
Wash trades typically loop between two related wallets (often funded from the same source within the previous 7 days), at suspiciously round prices, with no time between transfers. Cross-reference seller and buyer addresses against a wallet-cluster service like Arkham or against your own clustering on shared funding sources. Flagging is heuristic, not exact, but rules out 80-90% of obvious wash activity.
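Those heuristics translate into a simple flagging function. Everything specific in this sketch is an assumption: the `sale` field names, the `funding_source` mapping (the output of your own clustering), the 0.1 ETH definition of a round price, and the 60-second threshold.

```python
def wash_trade_flags(sale, funding_source, min_hold_seconds=60):
    """Return a list of heuristic flags for one sale.

    sale: dict with buyer, seller, price_eth, seconds_since_prev_transfer.
    funding_source: dict mapping wallet -> address that first funded it.
    """
    flags = []
    buyer_src = funding_source.get(sale["buyer"])
    seller_src = funding_source.get(sale["seller"])
    if buyer_src is not None and buyer_src == seller_src:
        flags.append("shared_funding_source")
    # Round price: a whole multiple of 0.1 ETH, checked in milli-ETH
    # with a small tolerance for floating-point error.
    milli = sale["price_eth"] * 1000
    if abs(milli - round(milli / 100) * 100) < 1e-6:
        flags.append("round_price")
    if sale["seconds_since_prev_transfer"] < min_hold_seconds:
        flags.append("rapid_flip")
    return flags
```

Treat a sale with two or more flags as a candidate for exclusion rather than as proof; a single flag such as a round price is common in legitimate sales.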

Q: do marketplaces ever sue scrapers?
Cease-and-desist letters happen, but lawsuits are rare and typically reserved for projects that resell the data as a competing product. Personal use, research, and non-commercial analytics have not historically attracted enforcement. Building a public dashboard that displays floor prices crosses into commercial territory and is where you should consult counsel.

Q: should I use The Graph subgraphs instead of direct RPC?
The Graph is excellent for queries that span many blocks and aggregate data, like “all sales of collection X in the last month.” Direct RPC is better for single-state lookups, like “is this listing currently active.” Use both: subgraphs for analytics, RPC for live truth.

Closing

NFT scraping in 2026 is a multi-source reconciliation problem more than a pure scraping problem. The marketplace APIs give you the user-facing view; the chain gives you the truth; IPFS gives you the content. Build pipelines that treat all three as inputs and reconcile them, and you can run a research-grade NFT data system at hobbyist cost. For broader infrastructure see our crypto-defi category hub.