Last.fm sits on one of the richest public music datasets on the web — over 700 million scrobbles logged per month, public listening histories, artist play counts, and tag-based genre metadata. if you want to scrape Last.fm for music trend analysis, recommendation datasets, or popularity signals, you’re in luck: Last.fm exposes most of this through a free, documented API, with HTML scraping as a fallback for the edges the API doesn’t cover.
why Last.fm data is worth collecting
Last.fm tracks actual listening behavior, not editorial curation. that’s a fundamentally different signal from Apple Music charts (covered in How to Scrape Apple Music Charts and Playlists (2026)), where chart positions reflect algorithmic promotion as much as listener preference. Last.fm scrobble counts come from real playback events submitted by users across Spotify, Winamp, and dozens of integrated clients.
use cases that justify the effort:
- building music recommendation training datasets
- tracking artist popularity trends over time using scrobble velocity
- academic research on listener behavior and genre clustering
- competitive benchmarking for music labels and A&R teams
- enriching artist records with listener counts and tag metadata
the public profile model also helps: unlike Spotify, which requires OAuth to access personalized listening data, Last.fm public profiles expose recent tracks, loved tracks, and listening stats without any login. you get more with a free API key than most platforms offer after authentication.
the Last.fm API: what you can access
register at last.fm/api to get a free API key. the base URL is https://ws.audioscrobbler.com/2.0/. the key endpoints:
| method | what it returns | notes |
|---|---|---|
| `user.getRecentTracks` | paginated scrobble history for any public user | max 200 per page, supports from/to unix timestamps |
| `user.getTopArtists` | top artists by play count for a user | period: 7day, 1month, 3month, 6month, 12month, overall |
| `artist.getInfo` | listeners, scrobbles, bio, similar artists, tags | no auth needed |
| `chart.getTopArtists` | site-wide weekly top artists | global trending signal |
| `tag.getTopTracks` | top tracks for a genre tag | useful for genre-specific datasets |
| `track.getSimilar` | similar tracks by Last.fm collaborative filtering | |
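a minimal sketch of calling one of these methods, using only the standard library so it runs without extra packages. `build_url`, `call_api`, and `summarize_artist` are illustrative names, not part of any Last.fm client, and the response shape described in the docstring is an assumption to verify against a live `artist.getInfo` response.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://ws.audioscrobbler.com/2.0/"


def build_url(method, api_key, **params):
    """Build a Last.fm API URL for `method`, asking for JSON output."""
    query = {"method": method, "api_key": api_key, "format": "json", **params}
    return BASE_URL + "?" + urllib.parse.urlencode(query)


def call_api(method, api_key, **params):
    """Send the GET request and decode the JSON body (needs network access)."""
    url = build_url(method, api_key, **params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_artist(payload):
    """Pull the headline numbers out of an artist.getInfo response.

    Assumed shape: {"artist": {"name": ..., "stats": {"listeners": "...",
    "playcount": "..."}, "tags": {"tag": [{"name": ...}, ...]}}}
    """
    artist = payload.get("artist", {})
    stats = artist.get("stats", {})
    tags = artist.get("tags", {}).get("tag", [])
    return {
        "name": artist.get("name"),
        "listeners": int(stats.get("listeners", 0)),
        "playcount": int(stats.get("playcount", 0)),
        "tags": [t["name"] for t in tags],
    }
```

with a real key, `summarize_artist(call_api("artist.getInfo", API_KEY, artist="Radiohead"))` would give one flat record per artist. keeping the parsing separate from the HTTP call makes it easy to test against saved responses.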
rate limits are generous: key-only requests (no user session) cap at 5 per second, and the API documentation references a soft ceiling around 5 million calls per day for API key holders. in practice, at 5 req/sec you can make about 432,000 calls per day (5 × 86,400 seconds) if you run continuously, which is enough for most research workloads.
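one way to stay under the per-second cap is a small throttle that spaces calls out. this is a sketch, not an official client feature: the `Throttle` class and its `clock`/`sleep` parameters are hypothetical names introduced here so the timing logic can be tested without actually waiting.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least 1/rate seconds apart."""

    def __init__(self, rate_per_sec=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

call `throttle.wait()` immediately before each request. unlike a fixed `time.sleep(0.2)`, this only sleeps for whatever part of the interval the request itself did not already use up.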
pulling user recent tracks in Python

    import time

    import requests

    API_KEY = "your_api_key_here"
    BASE_URL = "https://ws.audioscrobbler.com/2.0/"


    def get_recent_tracks(username, pages=5):
        """Fetch up to `pages` pages (200 scrobbles each) of a user's history."""
        tracks = []
        for page in range(1, pages + 1):
            params = {
                "method": "user.getRecentTracks",
                "user": username,
                "api_key": API_KEY,
                "format": "json",
                "limit": 200,
                "page": page,
            }
            resp = requests.get(BASE_URL, params=params, timeout=10)
            resp.raise_for_status()
            data = resp.json()
            batch = data.get("recenttracks", {}).get("track", [])
            tracks.extend(batch)
            # stop as soon as the last available page has been read
            total_pages = int(data["recenttracks"]["@attr"]["totalPages"])
            if page >= total_pages:
                break
            time.sleep(0.2)  # stay under 5 req/sec
        return tracks


    tracks = get_recent_tracks("rj")  # Last.fm founder, public profile
    print(f"fetched {len(tracks)} tracks")

pagination works via the `page` and `limit` params. always read `@attr.totalPages` from the response before assuming there’s more data, since some users have short histories.
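the raw track entries are nested, so a flattening step usually follows the fetch. the sketch below also shows how the from/to unix timestamps mentioned in the endpoint table could be built. `window_params` and `flatten_track` are illustrative helpers, and the nested field names (`#text`, `date.uts`) are an assumption about the JSON shape that should be checked against a real response.

```python
from datetime import datetime, timezone


def window_params(start, end):
    """Turn two timezone-aware datetimes into from/to unix-timestamp params."""
    return {"from": int(start.timestamp()), "to": int(end.timestamp())}


def flatten_track(item):
    """Reduce one raw track entry to a flat row.

    `played_at` is None when the entry has no `date` block.
    """
    date = item.get("date")
    return {
        "artist": item["artist"]["#text"],
        "track": item["name"],
        "album": item.get("album", {}).get("#text", ""),
        "played_at": int(date["uts"]) if date else None,
    }


# one-day window in UTC, to be merged into the request params
params = window_params(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(params)  # {'from': 1704067200, 'to': 1704153600}
```

fetching in fixed time windows makes a long crawl resumable: if a run dies, only the window in progress needs to be repeated.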
HTML scraping: when the API falls short
the API covers most structured data, but a few things only appear on the rendered page: the “similar artists” sidebar layout, listener count per country (inferred from profile data), and some tag-weight visualizations. for these, you need HTML scraping.
Last.fm artist pages follow a consistent structure. the listener count lives in a span with class header-metadata-display. similar artists appear in a grid that’s server-rendered, not loaded by JavaScript, so basic requests + BeautifulSoup works without a headless browser.
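as a rough illustration of pulling that span out of server-rendered HTML, here is a sketch that uses the standard library’s `html.parser` instead of BeautifulSoup, purely so it runs without third-party packages. `ClassTextExtractor` and `extract_by_class` are hypothetical names, and the sample markup in the usage note is invented for the example; only the class name `header-metadata-display` comes from the text above.

```python
from html.parser import HTMLParser

# elements that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text content of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.values = []
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif self.css_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html, css_class):
    """Return the text of each element whose class list contains css_class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.values
```

for example, `extract_by_class('<span class="header-metadata-display">1,234</span>', "header-metadata-display")` returns `["1,234"]`. with BeautifulSoup installed, the same lookup is a one-line class selector, which is the more common choice in practice.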
compare the two approaches:
| dimension | Last.fm API | HTML scraping |
|---|---|---|
| rate limit | 5 req/sec (key), documented | no stated limit, but blocks trigger around 100+ req/min per IP |
| data richness | structured JSON, all core fields | additional layout data, visual positioning |
| complexity | low — single HTTP call per endpoint | medium — CSS selectors break on redesigns |
| robustness | high — versioned, stable since 2005 | medium — depends on markup stability |
| auth required | API key only | none for public pages |
for scraping at scale beyond a few thousand pages, rotating residential proxies are worth using to avoid IP-level rate limiting. the same logic applies when you scrape SoundCloud artist and track data — both platforms tolerate modest crawl rates but block aggressive single-IP patterns.
building a dataset: scrobble history at scale
the most valuable Last.fm dataset for music ML is scrobble history across many users. the challenge is user discovery — the API has no “list all users” endpoint.
a practical approach:
- seed with known public usernames (note that `chart.getTopArtists` and `artist.getInfo` return aggregate listener counts rather than usernames, so seeds have to come from elsewhere, such as the friends of a known user via `user.getFriends`)
- pull each user’s top artists and recent tracks via `user.getTopArtists` and `user.getRecentTracks`
- extract artist names and use `artist.getInfo` to enrich with global play count, listener count, and tags
- store to a local SQLite or Postgres table keyed on `(username, artist_mbid)` to deduplicate, falling back to the normalized artist name when the MBID is empty
- expand the seed set by scraping the “listeners” section on popular artist pages
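the steps above can be sketched as a breadth-first walk with a deduplicating SQLite table. everything here is a simplification under stated assumptions: `crawl`, `fetch_top_artists`, and `fetch_neighbors` are hypothetical names, the two fetchers are passed in as plain callables so the API or scraping code can be swapped in, and the table is keyed on a lowercased artist name rather than an MBID.

```python
import sqlite3
from collections import deque


def crawl(seeds, fetch_top_artists, fetch_neighbors, max_users=100, db_path=":memory:"):
    """Breadth-first walk over users, storing each (username, artist) pair once.

    fetch_top_artists(username) -> list of artist names      (hypothetical)
    fetch_neighbors(username)   -> list of further usernames (hypothetical)
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_artist ("
        " username TEXT NOT NULL,"
        " artist_key TEXT NOT NULL,"
        " PRIMARY KEY (username, artist_key))"
    )
    queue = deque(seeds)
    seen = set(seeds)
    visited = 0
    while queue and visited < max_users:
        user = queue.popleft()
        visited += 1
        rows = [(user, name.strip().lower()) for name in fetch_top_artists(user)]
        # the composite primary key turns repeated inserts into no-ops
        conn.executemany("INSERT OR IGNORE INTO user_artist VALUES (?, ?)", rows)
        for other in fetch_neighbors(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    conn.commit()
    return conn
```

the `seen` set keeps the walk from revisiting users, and `max_users` bounds the crawl so a densely connected seed cannot run away with the request budget.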
this graph-walk approach surfaces active scrobblers quickly. a 10,000-user sample is achievable in under 48 hours at the API rate limit.
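a quick budget check behind that estimate, assuming the crawler runs continuously at the full 5 req/sec. `calls_per_user` is an illustrative helper, not something from the API.

```python
def calls_per_user(hours, rate_per_sec, users):
    """Return (total calls available, calls available per user)."""
    total_calls = int(hours * 3600 * rate_per_sec)
    return total_calls, total_calls / users


total, per_user = calls_per_user(48, 5, 10_000)
print(total, per_user)       # 864000 86.4
print(int(per_user) * 200)   # 17200 scrobbles per user at 200 per page
```

so each user gets roughly 86 calls, which leaves room for enrichment calls as well as several dozen pages of history.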
for comparison, scraping Spotify public data gets you play counts and editorial playlists but not individual listening histories — Spotify requires auth tokens that expire every hour. Last.fm’s persistent public profiles are genuinely more accessible for longitudinal research.
handling data quality and missing fields
a few common issues when working with Last.fm data:
- now playing tracks: `user.getRecentTracks` includes a `@attr.nowplaying` flag on the current track; filter these out if you need completed scrobbles only, since they have no timestamp
- corrected artist names: Last.fm applies artist name corrections silently; the `artistcorrected` field tells you when a correction was applied
- MusicBrainz IDs: `mbid` fields are often empty strings for smaller artists, so don’t rely on them as primary keys; use artist name with lowercase normalization instead
- deleted or private accounts: a 404 or `error: 6` response means the user doesn’t exist or the profile is private; cache these to avoid retrying
- Unicode normalization: artist names like “Sigur Rós” come back differently encoded depending on the endpoint, so normalize to NFC before storing
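a few of these checks can be sketched in a handful of lines. the helper names (`normalize_name`, `completed_scrobbles`, `MissingUserCache`) are hypothetical, and the code assumes the now-playing flag arrives as the string `"true"` under `@attr`, which should be confirmed against a real response.

```python
import unicodedata


def normalize_name(name):
    """NFC-normalize, trim, and lowercase an artist name for use as a key."""
    return unicodedata.normalize("NFC", name).strip().lower()


def completed_scrobbles(tracks):
    """Drop the "now playing" entry, which carries no timestamp."""
    return [t for t in tracks if t.get("@attr", {}).get("nowplaying") != "true"]


class MissingUserCache:
    """Remember usernames that came back missing or private, to skip retries."""

    def __init__(self):
        self._missing = set()

    def mark(self, username):
        self._missing.add(username)

    def __contains__(self, username):
        return username in self._missing


# the same name in decomposed and precomposed form maps to one key
print(normalize_name("Sigur Ro\u0301s") == normalize_name("Sigur R\u00f3s"))  # True
```

in a long crawl the cache would normally be persisted, for example as a table next to the scrobble data, so that restarts do not repeat the failed lookups.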
the same data quality mindset applies when scraping niche music platforms. if you’re also pulling from Bandcamp artist pages and sales data, expect even more inconsistency since Bandcamp artist pages are largely user-generated with no enforced schema. Last.fm’s API is comparatively clean.
it’s also worth noting that the public-data-first strategy used here — no fake credentials, no session hijacking — mirrors the approach covered in scraping ZoomInfo without an account. when a platform exposes data publicly, you’re almost always better off using sanctioned access patterns than scraping authenticated endpoints.
Bottom line
start with the Last.fm API for structured data — it’s free, stable, and covers 95% of what you need for music datasets and trend analysis. add HTML scraping only for the specific page elements the API doesn’t expose, and use rotating proxies if you’re running above a few hundred pages per hour. dataresearchtools.com covers the full stack of music platform scraping, including Spotify, Apple Music, SoundCloud, and Bandcamp — the same techniques and infrastructure apply across all of them.
Related guides on dataresearchtools.com
- How to Scrape Spotify Public Data (2026): Playlists, Artists, Charts
- How to Scrape Apple Music Charts and Playlists (2026)
- How to Scrape SoundCloud Artist + Track Data (2026)
- How to Scrape Bandcamp Artist Pages and Sales Data (2026)
- How to Scrape ZoomInfo Without Account: Public Data Strategies (2026)