How to Scrape Last.fm Listening Data and Artist Metadata (2026)

Last.fm sits on one of the richest public music datasets on the web — over 700 million scrobbles logged per month, public listening histories, artist play counts, and tag-based genre metadata. if you want to scrape Last.fm for music trend analysis, recommendation datasets, or popularity signals, you’re in luck: Last.fm exposes most of this through a free, documented API, with HTML scraping as a fallback for the edges the API doesn’t cover.

why Last.fm data is worth collecting

Last.fm tracks actual listening behavior, not editorial curation. that’s a fundamentally different signal from Apple Music charts and playlists (see How to Scrape Apple Music Charts and Playlists (2026)), where chart positions reflect algorithmic promotion as much as listener preference. Last.fm scrobble counts come from real playback events submitted by users across Spotify, Winamp, and dozens of integrated clients.

use cases that justify the effort:

building music recommendation training datasets
tracking artist popularity trends over time using scrobble velocity
academic research on listener behavior and genre clustering
competitive benchmarking for music labels and A&R teams
enriching artist records with listener counts and tag metadata

the public profile model also helps: unlike Spotify, which requires OAuth to access personalized listening data, Last.fm public profiles expose recent tracks, loved tracks, and listening stats without any login. you get more with a free API key than most platforms offer after authentication.

the Last.fm API: what you can access

| method | what it returns | notes |
| --- | --- | --- |
| `user.getRecentTracks` | paginated scrobble history for any public user | max 200 per page, supports `from`/`to` unix timestamps |
| `user.getTopArtists` | top artists by play count for a user | period: 7day, 1month, 3month, 6month, 12month, overall |
| `artist.getInfo` | listeners, scrobbles, bio, similar artists, tags | no auth needed |
| `chart.getTopArtists` | site-wide weekly top artists | global trending signal |
| `tag.getTopTracks` | top tracks for a genre tag | useful for genre-specific datasets |
| `track.getSimilar` | similar tracks by Last.fm collaborative filtering | |
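
as a rough sketch of how the methods above map onto requests, the helper below builds the query parameters for any method and pulls the headline fields out of an `artist.getInfo` payload. `build_params`, `as_list`, `summarize_artist`, and the sample payload are illustrative, not part of the API; the payload shape (`stats`, `tags.tag`, `similar.artist`) is an assumption worth checking against a live response.

```python
VALID_PERIODS = {"7day", "1month", "3month", "6month", "12month", "overall"}


def build_params(method, api_key, **extra):
    """Return the query parameters for one API call (hypothetical helper)."""
    if "period" in extra and extra["period"] not in VALID_PERIODS:
        raise ValueError(f"unknown period: {extra['period']!r}")
    params = {"method": method, "api_key": api_key, "format": "json"}
    params.update(extra)
    return params


def as_list(value):
    """Normalize a field that may be missing, a single object, or a list."""
    if not value:
        return []
    return value if isinstance(value, list) else [value]


def summarize_artist(payload):
    """Extract listeners, play count, tags, and similar artists."""
    artist = payload.get("artist") or {}
    stats = artist.get("stats") or {}
    tags = artist.get("tags") or {}
    similar = artist.get("similar") or {}
    return {
        "name": artist.get("name"),
        "listeners": int(stats.get("listeners", 0)),
        "playcount": int(stats.get("playcount", 0)),
        "tags": [t["name"] for t in as_list(tags.get("tag"))],
        "similar": [a["name"] for a in as_list(similar.get("artist"))],
    }


# made-up sample shaped like an artist.getInfo response; numbers are placeholders
sample = {
    "artist": {
        "name": "Sigur Rós",
        "stats": {"listeners": "1000", "playcount": "25000"},
        "tags": {"tag": [{"name": "post-rock"}, {"name": "ambient"}]},
        "similar": {"artist": {"name": "Jónsi"}},  # single object, not a list
    }
}

params = build_params("artist.getInfo", "your_api_key_here", artist="Sigur Rós")
summary = summarize_artist(sample)
print(params["method"], summary["listeners"], summary["tags"])
```

the numeric fields are converted with `int()` because this sketch assumes counts arrive as strings in the JSON.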

rate limits are generous: requests made with an API key alone (no user session) cap at 5 per second, and the API documentation references a soft ceiling around 5 million calls per day for API key holders. in practice, 5 req/sec sustained around the clock works out to 432,000 calls per day, which is enough for most research workloads.
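
to make the 5 req/sec budget concrete, here is a minimal throttle sketch. `Throttle` is a hypothetical helper, not something the API or `requests` provides; it only enforces a minimum gap between calls, and the clock and sleep functions are injectable so the logic can be exercised without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


SECONDS_PER_DAY = 24 * 60 * 60
daily_ceiling = 5 * SECONDS_PER_DAY  # theoretical maximum at 5 req/sec
print(daily_ceiling)  # 432000
```

calling `throttle.wait()` before each request keeps a single-threaded loop under the limit even when responses come back quickly.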

pulling user recent tracks in Python

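import time

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://ws.audioscrobbler.com/2.0/"

def get_recent_tracks(username, pages=5):
    """Fetch up to `pages` pages (200 tracks each) of a user's scrobbles."""
    tracks = []
    for page in range(1, pages + 1):
        params = {
            "method": "user.getRecentTracks",
            "user": username,
            "api_key": API_KEY,
            "format": "json",
            "limit": 200,
            "page": page,
        }
        resp = requests.get(BASE_URL, params=params, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        if "error" in data:
            # API-level failures carry an "error" code in the JSON body
            raise RuntimeError(f"Last.fm error {data['error']}: {data.get('message')}")
        recent = data.get("recenttracks", {})
        batch = recent.get("track", [])
        if isinstance(batch, dict):
            # a page holding a single track can come back as an object, not a list
            batch = [batch]
        tracks.extend(batch)
        # a missing @attr block is treated as "no further pages"
        total_pages = int(recent.get("@attr", {}).get("totalPages", 0))
        if page >= total_pages:
            break
        time.sleep(0.2)  # stay under 5 req/sec
    return tracks

tracks = get_recent_tracks("rj")  # Last.fm founder, public profile
print(f"fetched {len(tracks)} tracks")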

pagination works via the `page` and `limit` params. always read `@attr.totalPages` from the response before assuming there’s more data, since some users have short histories.
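
the pagination rule can also be factored into a generator that is independent of the HTTP layer. this is a sketch: `iter_pages` and the `fetch_page` callable are hypothetical names, and the two fake pages below only imitate the response shape.

```python
def iter_pages(fetch_page, max_pages=None):
    """Yield track batches page by page until totalPages is reached.

    fetch_page(page) is a caller-supplied function (hypothetical here) that
    returns the decoded JSON for one user.getRecentTracks page.
    """
    page = 1
    while max_pages is None or page <= max_pages:
        data = fetch_page(page)
        recent = data.get("recenttracks", {})
        batch = recent.get("track", [])
        if isinstance(batch, dict):  # a lone track may arrive as an object
            batch = [batch]
        yield batch
        total_pages = int(recent.get("@attr", {}).get("totalPages", 0))
        if page >= total_pages:
            break
        page += 1


# stand-in for the network: two fake pages shaped like API responses
fake = {
    1: {"recenttracks": {"track": [{"name": "a"}, {"name": "b"}],
                         "@attr": {"totalPages": "2"}}},
    2: {"recenttracks": {"track": {"name": "c"},
                         "@attr": {"totalPages": "2"}}},
}
names = [t["name"] for batch in iter_pages(fake.__getitem__) for t in batch]
print(names)  # ['a', 'b', 'c']
```

in real use, `fetch_page` would wrap the `requests.get` call and could add the `from`/`to` unix timestamps to restrict the window being paged through.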

HTML scraping: when the API falls short

the API covers most structured data, but a few things only appear on the rendered page: the “similar artists” sidebar layout, listener count per country (inferred from profile data), and some tag-weight visualizations. for these, you need HTML scraping.

Last.fm artist pages follow a consistent structure. the listener count lives in a span with class header-metadata-display. similar artists appear in a grid that’s server-rendered, not loaded by JavaScript, so basic requests + BeautifulSoup works without a headless browser.
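
as an illustration of class-based extraction, the sketch below collects the text of every element carrying the `header-metadata-display` class. it swaps BeautifulSoup for the standard library’s `html.parser` so it runs without extra installs; the HTML snippet is invented for the example, and the real markup may differ or change, so treat both the structure and the numbers as assumptions.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.values = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get a closing tag
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested element inside a match
        elif self.css_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


# invented markup for illustration only
sample_html = """
<div><h4>Listeners</h4><span class="header-metadata-display">1,234,567</span></div>
<div><h4>Scrobbles</h4><span class="header-metadata-display">98,765,432</span></div>
"""

parser = ClassTextExtractor("header-metadata-display")
parser.feed(sample_html)
listeners = int(parser.values[0].replace(",", ""))
print(listeners)  # 1234567
```

with BeautifulSoup installed, the equivalent lookup would be a `select("span.header-metadata-display")` call on the parsed page.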

compare the two approaches:

| dimension | Last.fm API | HTML scraping |
| --- | --- | --- |
| rate limit | 5 req/sec (key), documented | no stated limit, but blocks trigger around 100+ req/min per IP |
| data richness | structured JSON, all core fields | additional layout data, visual positioning |
| complexity | low: single HTTP call per endpoint | medium: CSS selectors break on redesigns |
| robustness | high: versioned, stable since 2005 | medium: depends on markup stability |
| auth required | API key only | none for public pages |

for scraping at scale beyond a few thousand pages, rotating residential proxies are worth using to avoid IP-level rate limiting. the same logic applies when you scrape SoundCloud artist and track data — both platforms tolerate modest crawl rates but block aggressive single-IP patterns.
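
a minimal round-robin rotation could look like the sketch below. `proxy_cycle` is a hypothetical helper and the proxy URLs are placeholders; the only real interface assumed is that `requests.get` accepts a `proxies` mapping keyed by scheme.

```python
import itertools


def proxy_cycle(proxy_urls):
    """Yield requests-style proxy mappings in round-robin order."""
    if not proxy_urls:
        raise ValueError("need at least one proxy URL")
    for url in itertools.cycle(proxy_urls):
        yield {"http": url, "https": url}


# placeholder endpoints; substitute the ones from your proxy provider
PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
]

rotation = proxy_cycle(PROXIES)
first, second, third = next(rotation), next(rotation), next(rotation)
print(first["https"] == third["https"])  # True: the pool wraps around
```

each page fetch would then pass `proxies=next(rotation)` to `requests.get`, ideally combined with a per-proxy delay so no single exit IP exceeds the crawl rate the site tolerates.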

building a dataset: scrobble history at scale

the most valuable Last.fm dataset for music ML is scrobble history across many users. the challenge is user discovery — the API has no “list all users” endpoint.

a practical approach:

seed with known public usernames (`chart.getTopArtists` and `artist.getInfo` return aggregate listener counts rather than usernames, so seeds have to come from elsewhere, such as profiles you already know or `user.getFriends` on a known profile)
pull each user’s top artists via `user.getTopArtists` and recent tracks via `user.getRecentTracks`
extract artist names and use artist.getInfo to enrich with global play count, listener count, and tags
store to a local SQLite or Postgres table keyed on (username, normalized artist name) to deduplicate; keep the MBID as a nullable column, since it is often empty
expand the seed set by scraping the “listeners” section on popular artist pages

this graph-walk approach surfaces active scrobblers quickly. a 10,000-user sample is achievable in under 48 hours at the API rate limit.
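
the steps above can be sketched as a breadth-first walk with deduplicated storage. everything network-related is injected: `fetch_top_artists` and `fetch_neighbors` are hypothetical callables standing in for the API and page-scraping calls, and the table layout is one possible choice. the sketch keys rows on username plus a case-folded artist name rather than the MBID, because MBIDs are often empty.

```python
import sqlite3
from collections import deque


def crawl(seeds, fetch_top_artists, fetch_neighbors, conn, max_users=100):
    """Breadth-first walk over users, storing one row per (user, artist)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_artist (
               username    TEXT NOT NULL,
               artist_key  TEXT NOT NULL,  -- case-folded artist name
               artist_mbid TEXT,           -- often empty, so not in the key
               playcount   INTEGER,
               PRIMARY KEY (username, artist_key)
           )"""
    )
    queue, seen = deque(seeds), set(seeds)
    visited = 0
    while queue and visited < max_users:
        user = queue.popleft()
        visited += 1
        for name, mbid, plays in fetch_top_artists(user):
            # INSERT OR REPLACE keeps a single row per key on re-crawls
            conn.execute(
                "INSERT OR REPLACE INTO user_artist VALUES (?, ?, ?, ?)",
                (user, name.casefold(), mbid or None, plays),
            )
        for other in fetch_neighbors(user):
            if other not in seen:  # never enqueue a user twice
                seen.add(other)
                queue.append(other)
    conn.commit()
    return visited


# in-memory stand-ins for the API; usernames and counts are made up
top = {
    "alice": [("Sigur Rós", "", 120)],
    "bob": [("sigur rós", "", 40), ("Low", "", 7)],
}
links = {"alice": ["bob"], "bob": ["alice"]}

conn = sqlite3.connect(":memory:")
visited = crawl(["alice"], lambda u: top.get(u, []), lambda u: links.get(u, []), conn)
rows = conn.execute("SELECT COUNT(*) FROM user_artist").fetchone()[0]
print(visited, rows)  # 2 3
```

as a budget check, 48 hours at 5 req/sec allows 864,000 calls, or roughly 86 calls per user for a 10,000-user sample, which bounds how many pages of history each user can receive.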

for comparison, scraping Spotify public data gets you play counts and editorial playlists but not individual listening histories — Spotify requires auth tokens that expire every hour. Last.fm’s persistent public profiles are genuinely more accessible for longitudinal research.

handling data quality and missing fields

a few common issues when working with Last.fm data:

now playing tracks: user.getRecentTracks includes a @attr.nowplaying flag on the current track — filter these out if you need completed scrobbles only, since they have no timestamp
corrected artist names: Last.fm applies artist name corrections silently; the artistcorrected field tells you when a correction was applied
MusicBrainz IDs: mbid fields are often empty strings for smaller artists — don’t rely on them as primary keys, use artist name with lowercase normalization instead
deleted or private accounts: a 404 or error: 6 response means the user doesn’t exist or the profile is private; cache these to avoid retrying
Unicode normalization: artist names like “Sigur Rós” come back differently encoded depending on the endpoint — normalize to NFC before storing
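
several of these checks fit into one cleaning pass. the sketch below drops now-playing entries, normalizes names to NFC, and derives a case-folded artist key. `clean_scrobbles` is a hypothetical helper, and the field names it reads (`artist.#text`, `date.uts`, `@attr.nowplaying`) are assumed from the usual shape of `user.getRecentTracks` JSON; the sample records are made up.

```python
import unicodedata


def clean_scrobbles(tracks):
    """Drop now-playing entries and normalize names for storage."""
    cleaned = []
    for track in tracks:
        if track.get("@attr", {}).get("nowplaying") == "true":
            continue  # still playing, so there is no timestamp yet
        date = track.get("date")
        if not date:
            continue  # no timestamp: treat as an incomplete scrobble
        artist = unicodedata.normalize("NFC", track["artist"]["#text"])
        cleaned.append({
            "artist": artist,
            "artist_key": artist.casefold(),  # use this, not the MBID, as key
            "mbid": track["artist"].get("mbid") or None,  # "" becomes None
            "track": unicodedata.normalize("NFC", track["name"]),
            "uts": int(date["uts"]),
        })
    return cleaned


# made-up records; the second artist name uses a decomposed "o" + accent
raw = [
    {"artist": {"#text": "Low", "mbid": ""}, "name": "Lullaby",
     "@attr": {"nowplaying": "true"}},
    {"artist": {"#text": "Sigur Ro\u0301s", "mbid": ""}, "name": "Hoppípolla",
     "date": {"uts": "1700000000"}},
]

cleaned = clean_scrobbles(raw)
print(len(cleaned), cleaned[0]["artist"] == "Sigur R\u00f3s")  # 1 True
```

keeping both the display name and the folded key means deduplication stays stable while the stored name still matches what Last.fm shows.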

the same data quality mindset applies when scraping niche music platforms. if you’re also pulling from Bandcamp artist pages and sales data, expect even more inconsistency since Bandcamp artist pages are largely user-generated with no enforced schema. Last.fm’s API is comparatively clean.

it’s also worth noting that the public-data-first strategy used here — no fake credentials, no session hijacking — mirrors the approach covered in scraping ZoomInfo without an account. when a platform exposes data publicly, you’re almost always better off using sanctioned access patterns than scraping authenticated endpoints.

Bottom line

start with the Last.fm API for structured data: it’s free, stable, and covers most of what you need for music datasets and trend analysis. add HTML scraping only for the specific page elements the API doesn’t expose, and use rotating proxies if you’re running above a few hundred pages per hour. dataresearchtools.com covers the full stack of music platform scraping, including Spotify, Apple Music, SoundCloud, and Bandcamp, and the same techniques and infrastructure apply across all of them.