How to Scrape Last.fm Listening Data and Artist Metadata (2026)

Last.fm sits on one of the richest public music datasets on the web — over 700 million scrobbles logged per month, public listening histories, artist play counts, and tag-based genre metadata. if you want to scrape Last.fm for music trend analysis, recommendation datasets, or popularity signals, you’re in luck: Last.fm exposes most of this through a free, documented API, with HTML scraping as a fallback for the edges the API doesn’t cover.

why Last.fm data is worth collecting

Last.fm tracks actual listening behavior, not editorial curation. that’s a fundamentally different signal from Apple Music charts (covered in How to Scrape Apple Music Charts and Playlists (2026)), where chart positions reflect algorithmic promotion as much as listener preference. Last.fm scrobble counts come from real playback events submitted by users across Spotify, Winamp, and dozens of integrated clients.

use cases that justify the effort:

  • building music recommendation training datasets
  • tracking artist popularity trends over time using scrobble velocity
  • academic research on listener behavior and genre clustering
  • competitive benchmarking for music labels and A&R teams
  • enriching artist records with listener counts and tag metadata

the public profile model also helps: unlike Spotify, which requires OAuth to access personalized listening data, Last.fm public profiles expose recent tracks, loved tracks, and listening stats without any login. you get more with a free API key than most platforms offer after authentication.

the Last.fm API: what you can access

register at last.fm/api to get a free API key. the base URL is https://ws.audioscrobbler.com/2.0/. the key endpoints:

method | what it returns | notes
user.getRecentTracks | paginated scrobble history for any public user | max 200 per page, supports from/to unix timestamps
user.getTopArtists | top artists by play count for a user | period: 7day, 1month, 3month, 6month, 12month, overall
artist.getInfo | listeners, scrobbles, bio, similar artists, tags | no auth needed
chart.getTopArtists | site-wide weekly top artists | global trending signal
tag.getTopTracks | top tracks for a genre tag | useful for genre-specific datasets
track.getSimilar | similar tracks by Last.fm collaborative filtering |
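
as a rough illustration of how these methods map onto requests, the sketch below builds a call URL with the standard library and pulls the headline numbers out of an artist.getInfo response. the helper names (build_url, summarize_artist) and the sample payload are invented for illustration, and the JSON field layout (artist.stats.listeners, artist.tags.tag) is an assumption worth checking against a live response.

```python
from urllib.parse import urlencode

BASE_URL = "https://ws.audioscrobbler.com/2.0/"

def build_url(method, api_key, **params):
    """Return a full request URL for an API method (hypothetical helper)."""
    query = {"method": method, "api_key": api_key, "format": "json", **params}
    return f"{BASE_URL}?{urlencode(query)}"

def summarize_artist(payload):
    """Flatten an artist.getInfo payload; the field layout is assumed, not guaranteed."""
    artist = payload.get("artist", {})
    stats = artist.get("stats", {})
    tags = artist.get("tags", {}).get("tag", [])
    if isinstance(tags, dict):  # tolerate a lone tag arriving as a dict
        tags = [tags]
    return {
        "name": artist.get("name"),
        "listeners": int(stats.get("listeners", 0)),
        "playcount": int(stats.get("playcount", 0)),
        "tags": [t.get("name") for t in tags],
    }

# invented sample payload, shaped the way this sketch assumes
sample = {
    "artist": {
        "name": "Example Artist",
        "stats": {"listeners": "1200", "playcount": "34000"},
        "tags": {"tag": [{"name": "post-rock"}, {"name": "ambient"}]},
    }
}

url = build_url("artist.getInfo", "your_api_key_here", artist="Example Artist")
summary = summarize_artist(sample)
```

counts are converted with int() here because the sample models them as strings; if a live response already returns numbers, the conversion is harmless.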

rate limits are generous: requests made with an API key but no user session cap at 5 per second, and the API documentation references a soft ceiling around 5 million calls per day for API key holders. in practice, 5 req/sec works out to 432,000 calls per day (5 × 86,400 seconds) if you run continuously, which is enough for most research workloads.
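
to make the budget concrete, here is a minimal sketch of the arithmetic plus a simple client-side throttle. the Throttle class is a hypothetical helper, not part of any Last.fm client library, and it assumes a single-threaded crawler; the clock and sleep functions are injectable only so the behavior can be checked without waiting.

```python
import time

def calls_per_day(requests_per_second):
    """Upper bound on daily calls when running nonstop at a fixed rate."""
    return requests_per_second * 60 * 60 * 24

class Throttle:
    """Space calls at least 1/max_per_second seconds apart (hypothetical helper)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now

daily_budget = calls_per_day(5)  # 5 * 86,400 seconds
```

in a crawler you would create one Throttle(5) and call its wait() method immediately before each HTTP request, in place of a fixed time.sleep.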

pulling user recent tracks in Python

import requests
import time

API_KEY = "your_api_key_here"
BASE_URL = "https://ws.audioscrobbler.com/2.0/"

def get_recent_tracks(username, pages=5):
    tracks = []
    for page in range(1, pages + 1):
        params = {
            "method": "user.getRecentTracks",
            "user": username,
            "api_key": API_KEY,
            "format": "json",
            "limit": 200,
            "page": page,
        }
        resp = requests.get(BASE_URL, params=params, timeout=10)
        resp.raise_for_status()
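        data = resp.json()
        if "error" in data:  # the API can report failures in the JSON body
            raise RuntimeError(f"Last.fm error {data['error']}: {data.get('message')}")
        recent = data.get("recenttracks", {})
        batch = recent.get("track", [])
        if isinstance(batch, dict):  # a lone track may arrive as a dict, not a list
            batch = [batch]
        tracks.extend(batch)
        # default to a single page if the pagination attributes are missing
        total_pages = int(recent.get("@attr", {}).get("totalPages", 1))
        if page >= total_pages:
            break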
        time.sleep(0.2)  # stay under 5 req/sec
    return tracks

tracks = get_recent_tracks("rj")  # Last.fm founder, public profile
print(f"fetched {len(tracks)} tracks")

pagination works via the page and limit params. always read @attr.totalPages from the response before assuming there’s more data — some users have short histories.
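
the from/to parameters listed in the endpoint table can narrow a pull to a time window, which helps with incremental collection. the sketch below only builds the request parameters; window_params is a hypothetical helper, and it assumes the API expects unix timestamps in seconds, UTC.

```python
from datetime import datetime, timezone

def window_params(username, api_key, start, end, page=1, limit=200):
    """Build user.getRecentTracks params for scrobbles between start and end."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid local-time drift")
    return {
        "method": "user.getRecentTracks",
        "user": username,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
        "from": int(start.timestamp()),  # unix seconds
        "to": int(end.timestamp()),
    }

january = window_params(
    "rj",
    "your_api_key_here",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
```

the resulting dict can be passed as params= to requests.get, with page incremented until totalPages is reached for that window.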

HTML scraping: when the API falls short

the API covers most structured data, but a few things only appear on the rendered page: the “similar artists” sidebar layout, listener count per country (inferred from profile data), and some tag-weight visualizations. for these, you need HTML scraping.

Last.fm artist pages follow a consistent structure. the listener count lives in a span with class header-metadata-display. similar artists appear in a grid that’s server-rendered, not loaded by JavaScript, so basic requests + BeautifulSoup works without a headless browser.
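
a minimal sketch of the extraction step follows. it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra dependencies, and it works on an invented HTML snippet: the class name comes from the paragraph above, but the surrounding markup and the number format are assumptions that need checking against a real page.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag

class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element just closed
            self.results.append("".join(self.buffer).strip())
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)

def extract_by_class(markup, target_class):
    """Return the text content of each element carrying target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(markup)
    parser.close()
    return parser.results

# invented snippet standing in for a fetched artist page
SAMPLE_HTML = '<div><span class="header-metadata-display other">1,234,567</span></div>'
counts = [int(text.replace(",", "")) for text in extract_by_class(SAMPLE_HTML, "header-metadata-display")]
```

with BeautifulSoup installed, the same lookup is a one-line select on the class; either way, keep the selector in one place so a redesign only breaks one line.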

compare the two approaches:

dimension | Last.fm API | HTML scraping
rate limit | 5 req/sec (key), documented | no stated limit, but blocks trigger around 100+ req/min per IP
data richness | structured JSON, all core fields | additional layout data, visual positioning
complexity | low: single HTTP call per endpoint | medium: CSS selectors break on redesigns
robustness | high: versioned, stable since 2005 | medium: depends on markup stability
auth required | API key only | none for public pages

for scraping at scale beyond a few thousand pages, rotating residential proxies are worth using to avoid IP-level rate limiting. the same logic applies when you scrape SoundCloud artist and track data — both platforms tolerate modest crawl rates but block aggressive single-IP patterns.
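
one way to wire in rotation is a round-robin generator that hands out a proxies mapping per request. this is a sketch under assumptions: the proxy URLs are placeholders, proxy_rotation is a hypothetical helper, and real providers often rotate for you behind a single gateway address.

```python
from itertools import cycle

# placeholder endpoints; substitute whatever your proxy provider issues
PROXY_URLS = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
]

def proxy_rotation(proxy_urls):
    """Yield proxies dicts in round-robin order, in the shape requests accepts."""
    for proxy in cycle(proxy_urls):
        yield {"http": proxy, "https": proxy}

rotation = proxy_rotation(PROXY_URLS)
first_four = [next(rotation)["https"] for _ in range(4)]  # wraps around on the 4th
```

each page fetch would then pass proxies=next(rotation) to requests.get, alongside the usual timeout.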

building a dataset: scrobble history at scale

the most valuable Last.fm dataset for music ML is scrobble history across many users. the challenge is user discovery — the API has no “list all users” endpoint.

a practical approach:

  1. seed with known public usernames (chart.getTopArtists and artist.getInfo return artist-level data rather than usernames, so take seeds from profiles you already know and from user.getFriends, which lists a public user’s friends)
  2. pull each user’s top artists via user.getTopArtists and recent tracks via user.getRecentTracks
  3. extract artist names and use artist.getInfo to enrich with global play count, listener count, and tags
  4. store to a local SQLite or Postgres table keyed on (username, normalized artist name) to deduplicate; mbid is often empty, so it makes a poor key
  5. expand the seed set by scraping the “listeners” section on popular artist pages
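
the storage step in the list above could look roughly like this with SQLite. the table and column names are invented for illustration, and the sketch keys on the lowercased artist name instead of mbid because, as the data-quality notes later in this guide point out, mbid is often an empty string.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the store; the schema here is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS user_artists (
            username   TEXT NOT NULL,
            artist_key TEXT NOT NULL,     -- lowercased artist name
            artist     TEXT NOT NULL,     -- name as returned by the API
            playcount  INTEGER NOT NULL,
            PRIMARY KEY (username, artist_key)
        )
        """
    )
    return conn

def upsert_artist(conn, username, artist, playcount):
    """Insert a row, replacing any earlier row with the same (username, artist_key)."""
    conn.execute(
        "INSERT OR REPLACE INTO user_artists (username, artist_key, artist, playcount) "
        "VALUES (?, ?, ?, ?)",
        (username, artist.strip().lower(), artist, playcount),
    )

conn = open_store()
upsert_artist(conn, "rj", "Sigur Rós", 10)
upsert_artist(conn, "rj", "sigur rós ", 12)  # same artist after normalization
conn.commit()
stored = conn.execute("SELECT username, artist_key, playcount FROM user_artists").fetchall()
```

re-running a crawl then overwrites rows instead of duplicating them; for raw scrobbles you would likely add the timestamp to the key as well.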

this graph-walk approach surfaces active scrobblers quickly. a 10,000-user sample is achievable in under 48 hours at the API rate limit: 48 hours at 5 req/sec is 864,000 calls, or roughly 86 calls per user.
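
the walk itself is a breadth-first traversal with a visited set. in the sketch below, discover is a stand-in for whatever returns related usernames for a user (an API call or a page scrape), and the toy graph replaces real network calls; both are assumptions for illustration.

```python
from collections import deque

def walk_users(seeds, discover, max_users=10_000):
    """Breadth-first walk over usernames; discover(user) returns related usernames."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_users:
        user = queue.popleft()
        visited.append(user)
        for neighbor in discover(user):
            if neighbor not in seen:  # never enqueue the same user twice
                seen.add(neighbor)
                queue.append(neighbor)
    return visited

# toy graph standing in for real discovery calls
TOY_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": [],
    "dave": ["erin"],
}
sample_walk = walk_users(["alice"], lambda user: TOY_GRAPH.get(user, []), max_users=4)
```

persisting the seen set and the queue between runs lets a long crawl resume after interruption without revisiting users.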

for comparison, scraping Spotify public data gets you play counts and editorial playlists but not individual listening histories — Spotify requires auth tokens that expire every hour. Last.fm’s persistent public profiles are genuinely more accessible for longitudinal research.

handling data quality and missing fields

a few common issues when working with Last.fm data:

  • now playing tracks: user.getRecentTracks includes a @attr.nowplaying flag on the current track — filter these out if you need completed scrobbles only, since they have no timestamp
  • corrected artist names: Last.fm applies artist name corrections silently; the artistcorrected field tells you when a correction was applied
  • MusicBrainz IDs: mbid fields are often empty strings for smaller artists — don’t rely on them as primary keys, use artist name with lowercase normalization instead
  • deleted or private accounts: a 404 or error: 6 response means the user doesn’t exist or the profile is private; cache these to avoid retrying
  • Unicode normalization: artist names like “Sigur Rós” come back differently encoded depending on the endpoint — normalize to NFC before storing
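
several of these checks fit in one cleaning pass. the sketch below drops now-playing entries, normalizes names to NFC, and derives a lowercase key; the track layout it reads (artist.#text, date.uts, @attr.nowplaying) is an assumption about the user.getRecentTracks JSON, and the sample records are invented.

```python
import unicodedata

def is_now_playing(track):
    """True when the entry carries the now-playing flag instead of a timestamp."""
    return track.get("@attr", {}).get("nowplaying") == "true"

def clean_scrobbles(tracks):
    """Keep completed scrobbles only and normalize their text fields."""
    cleaned = []
    for track in tracks:
        if is_now_playing(track) or "date" not in track:
            continue  # no timestamp means no completed scrobble
        artist = unicodedata.normalize("NFC", track["artist"]["#text"])
        cleaned.append({
            "artist": artist,
            "artist_key": artist.lower(),  # fallback key when mbid is empty
            "track": unicodedata.normalize("NFC", track["name"]),
            "uts": int(track["date"]["uts"]),
        })
    return cleaned

# invented records; the artist name uses a decomposed accent (o + U+0301)
sample_tracks = [
    {"artist": {"#text": "Sigur Ro\u0301s", "mbid": ""}, "name": "Hopp\u00edpolla",
     "@attr": {"nowplaying": "true"}},
    {"artist": {"#text": "Sigur Ro\u0301s", "mbid": ""}, "name": "Hopp\u00edpolla",
     "date": {"uts": "1700000000"}},
]
rows = clean_scrobbles(sample_tracks)
```

after this pass the decomposed and precomposed spellings of the same artist compare equal, so they collapse to one key downstream.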

the same data quality mindset applies when scraping niche music platforms. if you’re also pulling from Bandcamp artist pages and sales data, expect even more inconsistency since Bandcamp artist pages are largely user-generated with no enforced schema. Last.fm’s API is comparatively clean.

it’s also worth noting that the public-data-first strategy used here — no fake credentials, no session hijacking — mirrors the approach covered in scraping ZoomInfo without an account. when a platform exposes data publicly, you’re almost always better off using sanctioned access patterns than scraping authenticated endpoints.

Bottom line

start with the Last.fm API for structured data: it’s free, stable, and covers most of what you need for music datasets and trend analysis. add HTML scraping only for the specific page elements the API doesn’t expose, and use rotating proxies if you’re running above a few hundred pages per hour. dataresearchtools.com covers the full stack of music platform scraping, including Spotify, Apple Music, SoundCloud, and Bandcamp, and the same techniques and infrastructure apply across all of them.
