What Is Firecrawl? Features, Pricing, Tutorial
If you’ve been researching AI-powered web scraping tools, you’ve probably come across Firecrawl. Launched by Mendable and quickly adopted by thousands of developers, Firecrawl has become one of the most talked-about tools in the modern data extraction landscape.
But what exactly is Firecrawl, and should you use it for your next project? This article breaks down everything you need to know — what it does, how it works, what it costs, and how to get started with a hands-on tutorial.
Firecrawl in a Nutshell
Firecrawl is an API service that converts any website into clean, LLM-ready data. You give it a URL, and it returns the page content as structured markdown, cleaned HTML, or extracted JSON — with JavaScript fully rendered and boilerplate content removed.
Think of it as a bridge between the messy reality of the web and the clean, structured data that AI applications need.
The Problem Firecrawl Solves
Traditional web scraping involves:
- Sending HTTP requests to get raw HTML
- Parsing HTML with tools like BeautifulSoup
- Handling JavaScript rendering with Selenium or Playwright
- Writing CSS selectors or XPath queries for each site
- Dealing with anti-bot protections
- Cleaning extracted content (removing ads, navigation, footers)
- Maintaining all of this when websites change
Firecrawl handles the first six of those steps with a single API call. And because it uses AI for content extraction rather than brittle CSS selectors, the seventh (ongoing maintenance) largely disappears too.
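To make "a single API call" concrete, here is a hedged sketch of calling Firecrawl's v1 scrape endpoint directly with `requests`. The endpoint path and response shape (`data.markdown`) follow Firecrawl's public v1 API, but verify them against the current API reference before relying on this:

```python
# Hedged sketch: one POST replaces the request/render/parse/clean pipeline above.
# Assumes the v1 response shape {"success": true, "data": {"markdown": ...}}.
import requests

def scrape_markdown(url: str, api_key: str, api_url: str = "https://api.firecrawl.dev") -> str:
    """Fetch one page as LLM-ready markdown via Firecrawl's v1 scrape endpoint."""
    resp = requests.post(
        f"{api_url}/v1/scrape",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()  # surface HTTP-level failures early
    return resp.json()["data"]["markdown"]
```

The official SDKs wrap exactly this kind of call, as shown in the tutorial below.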
How Firecrawl Works
Under the hood, Firecrawl combines several technologies:
URL Input → Headless Browser → Page Rendering → Content Extraction → AI Cleaning → Clean Output

- Headless Browser Rendering — Firecrawl loads pages in a full Chromium browser, executing JavaScript just like a real user’s browser would
- Content Detection — AI identifies the main content area, separating it from navigation, ads, and boilerplate
- Markdown Conversion — HTML is converted to clean markdown, preserving structure (headings, lists, tables, code blocks) while removing clutter
- Optional LLM Extraction — For structured data needs, Firecrawl can use language models to extract specific fields according to a schema you define
Core Features
Four Operating Modes
| Mode | Purpose | Use Case |
|---|---|---|
| Scrape | Extract content from one URL | Get a single article, product page, or document |
| Crawl | Follow links and extract from multiple pages | Index an entire documentation site or blog |
| Map | Discover all URLs on a site | Plan a targeted scraping strategy |
| Extract | Pull structured data using LLM | Get product details, pricing, contact info as JSON |
Output Formats
Firecrawl can return data in multiple formats from a single request:
- Markdown — Clean, readable text with formatting preserved
- HTML — Cleaned HTML with boilerplate removed
- Raw HTML — Complete page source
- Links — All URLs found on the page
- Screenshots — Visual capture of the rendered page
- Extract — Structured JSON based on your schema
Built-In Capabilities
- JavaScript rendering — Handles React, Vue, Angular, and all SPA frameworks
- Anti-bot bypass — Stealth techniques for accessing protected sites
- Automatic pagination — Follow “next page” links automatically
- Mobile rendering — Render pages as they appear on mobile devices
- Wait conditions — Wait for specific elements or timeouts before extraction
- Custom headers — Send authentication tokens, cookies, or custom user agents
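Several of these capabilities are toggled through scrape options. A small sketch of assembling an options payload; the option names `waitFor`, `mobile`, and `headers` follow Firecrawl's v1 scrape API, but treat them as assumptions to check against the current docs:

```python
def build_scrape_options(wait_ms=None, mobile=False, headers=None):
    """Assemble a scrape-options payload; only set keys the caller asked for."""
    opts = {"formats": ["markdown"]}
    if wait_ms is not None:
        opts["waitFor"] = wait_ms   # wait this many ms before extraction
    if mobile:
        opts["mobile"] = True       # render with a mobile viewport
    if headers:
        opts["headers"] = headers   # auth tokens, cookies, custom user agents
    return opts

print(build_scrape_options(wait_ms=2000, headers={"Cookie": "session=abc"}))
```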
Firecrawl Pricing Breakdown
Understanding the pricing model is important before committing to any scraping platform.
Credit System
Firecrawl uses a credit-based system where 1 credit = 1 page operation:
| Plan | Monthly Credits | Price | Per-Credit Cost |
|---|---|---|---|
| Free | 500 | $0 | $0 |
| Hobby | 3,000 | $16 | $0.0053 |
| Standard | 100,000 | $83 | $0.00083 |
| Growth | 500,000 | $333 | $0.00067 |
| Enterprise | Custom | Custom | Negotiable |
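The per-credit column is just price divided by monthly credits, rounded to two significant figures. A quick sanity check:

```python
def per_credit_cost(price_usd: float, credits: int) -> float:
    """Effective cost of one page operation on a given plan."""
    return price_usd / credits

# Standard: $83 for 100,000 credits
print(round(per_credit_cost(83, 100_000), 5))   # 0.00083
# Growth: $333 for 500,000 credits
print(round(per_credit_cost(333, 500_000), 5))  # 0.00067
```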
What Counts as a Credit
- One `scrape` call = 1 credit
- One page in a `crawl` = 1 credit
- One `map` call = 1 credit
- One `extract` call = 1 credit (plus LLM costs on self-hosted)
Is Firecrawl Worth the Cost?
For context, manually building and maintaining equivalent scraping infrastructure typically costs:
- Developer time: 20-40 hours to build a robust scraper
- Infrastructure: $50-200/month for proxies, browser automation, and servers
- Maintenance: 5-10 hours/month to fix broken selectors and handle site changes
At $83/month for 100,000 pages, Firecrawl is often cheaper than the alternatives — especially when you factor in engineering time.
Self-Hosting: The Free Alternative
Firecrawl is open source. You can self-host it for free (minus your infrastructure costs):
```shell
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
docker compose up -d
```

Self-hosting eliminates credit limits but requires you to manage servers, proxies, and updates yourself.
Getting Started Tutorial
Let’s build a practical project: scraping a documentation site to create a local knowledge base.
Step 1: Install the SDK
```shell
pip install firecrawl-py
```

Step 2: Initialize the Client
```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-your-api-key")
```

Step 3: Scrape a Single Page
Start with a basic scrape to understand the output:
```python
result = app.scrape_url("https://docs.example.com/introduction", {
    "formats": ["markdown"]
})

print(result["markdown"][:500])
print(f"\nMetadata: {result['metadata']['title']}")
```

Step 4: Discover the Site Structure
Use Map mode to find all documentation pages:
```python
map_result = app.map_url("https://docs.example.com", {
    "limit": 1000
})

doc_urls = [url for url in map_result["links"] if "/docs/" in url]
print(f"Found {len(doc_urls)} documentation pages")
```

Step 5: Crawl the Documentation
```python
crawl_result = app.crawl_url("https://docs.example.com", {
    "limit": 200,
    "includePaths": ["/docs/*"],
    "excludePaths": ["/docs/changelog/*"],
    # Per-page output formats go under scrapeOptions in the v1 crawl API
    "scrapeOptions": {"formats": ["markdown"]}
})

print(f"Crawled {len(crawl_result['data'])} pages")
```

Step 6: Save as a Knowledge Base
```python
import os

output_dir = "knowledge_base"
os.makedirs(output_dir, exist_ok=True)

for page in crawl_result["data"]:
    # Create a filename from the last URL segment (falls back to "index")
    slug = page["metadata"]["url"].split("/")[-1] or "index"
    filepath = os.path.join(output_dir, f"{slug}.md")

    # Write markdown content with a title and source line
    with open(filepath, "w") as f:
        f.write(f"# {page['metadata']['title']}\n\n")
        f.write(f"Source: {page['metadata']['url']}\n\n")
        f.write(page["markdown"])
    print(f"Saved: {filepath}")

print(f"\nKnowledge base created with {len(crawl_result['data'])} documents")
```

Step 7: Extract Structured Data
For pages where you need specific fields rather than full content:
```python
from pydantic import BaseModel
from typing import List

class APIEndpoint(BaseModel):
    method: str
    path: str
    description: str
    parameters: List[str]

# Wrap the list in a container model so the schema describes all endpoints,
# not just a single one
class APIReference(BaseModel):
    endpoints: List[APIEndpoint]

result = app.scrape_url("https://docs.example.com/api-reference", {
    "formats": ["extract"],
    "extract": {
        "schema": APIReference.model_json_schema(),
        "prompt": "Extract all API endpoints documented on this page"
    }
})

for endpoint in result["extract"]["endpoints"]:
    print(f"{endpoint['method']} {endpoint['path']}: {endpoint['description']}")
```

Common Use Cases
1. RAG Pipeline Data Collection
Firecrawl is widely used to feed RAG (Retrieval-Augmented Generation) pipelines with fresh web data. The clean markdown output is ideal for chunking and embedding.
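As an illustration, a minimal fixed-size chunker for scraped markdown (the sizes are arbitrary; production RAG pipelines usually split on headings or sentences instead):

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into overlapping fixed-size chunks for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

print(len(chunk_text("word " * 500)))
```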
2. Competitive Intelligence
Monitor competitor websites for pricing changes, new features, and content updates. The Extract mode makes it easy to pull structured pricing data without writing custom parsers.
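For example, once Extract mode returns pricing as JSON, spotting changes between two snapshots is a simple dict comparison (the product and price fields are illustrative):

```python
def diff_prices(old, new):
    """Return {product: (old_price, new_price)} for changed or newly seen entries."""
    changes = {}
    for product, price in new.items():
        if old.get(product) != price:
            changes[product] = (old.get(product), price)
    return changes

print(diff_prices({"pro": 49, "basic": 9}, {"pro": 59, "basic": 9, "team": 99}))
```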
3. Content Aggregation
Build curated content feeds by crawling multiple sources and extracting article summaries, publish dates, and key topics.
4. Lead Generation
Scrape business directories, review sites, and company pages to extract contact information and company details for B2B lead generation.
5. AI Training Data
Collect diverse web content for fine-tuning language models or building training datasets, with Firecrawl handling the content cleaning automatically.
Limitations to Know About
No tool is perfect. Here are Firecrawl’s current limitations:
- Rate limits — Even paid plans have concurrency limits that may slow down very large crawls
- Credit costs add up — For millions of pages per month, self-hosting or traditional scrapers may be more cost-effective
- LLM extraction accuracy — Extract mode depends on LLM quality, which can occasionally misinterpret complex layouts
- No built-in scheduling — You need external tools (cron, n8n, Airflow) for recurring scrapes
- Anti-bot limitations — While good, some heavily protected sites may still block requests. In these cases, combining Firecrawl with residential proxies can help.
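Since scheduling is not built in, a plain cron entry is often enough for recurring scrapes (the script path and log location below are illustrative):

```shell
# Run a Firecrawl-based refresh script every night at 02:00
0 2 * * * /usr/bin/python3 /opt/scrapers/refresh_kb.py >> /var/log/refresh_kb.log 2>&1
```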
Firecrawl vs Other AI Scrapers
| Feature | Firecrawl | Crawl4ai | ScrapeGraphAI |
|---|---|---|---|
| Type | API + Self-host | Open source library | Open source library |
| Language | Multi-SDK | Python | Python |
| LLM Required | Only for Extract | Optional | Required |
| Output | Markdown/JSON | Markdown/JSON | JSON |
| Crawling | Built-in | Built-in | Limited |
| Anti-Bot | Built-in | Basic | None |
| Ease of Use | Easiest | Moderate | Moderate |
| Cost | Free tier + paid | Free | Free (+ LLM costs) |
For a detailed comparison, see our Crawl4ai vs Firecrawl breakdown or our best AI web scrapers roundup.
Frequently Asked Questions
What is Firecrawl used for?
Firecrawl is used for converting websites into clean data for AI applications. Common uses include building knowledge bases for RAG systems, collecting training data for LLMs, monitoring competitor websites, aggregating content from multiple sources, and extracting structured data like product details or pricing information.
Is Firecrawl open source?
Yes, Firecrawl’s core is open source under the AGPL license. You can self-host it using Docker for free with unlimited usage. The commercial cloud version adds managed infrastructure, higher reliability, better anti-bot capabilities, and premium support.
Do I need an LLM API key to use Firecrawl?
For the cloud version, no — LLM extraction is handled server-side. For the self-hosted version, you need an OpenAI API key (or compatible LLM endpoint) only if you want to use Extract mode. The Scrape, Crawl, and Map modes work without any LLM.
How many pages can I scrape with Firecrawl?
The free plan allows 500 pages per month. Paid plans range from 3,000 to 500,000+ pages monthly. Self-hosted Firecrawl has no credit limits — your throughput is limited only by your infrastructure capacity.
Can Firecrawl replace BeautifulSoup or Scrapy?
For most use cases, yes. Firecrawl handles JavaScript rendering, content cleaning, and anti-bot bypasses that BeautifulSoup cannot. However, BeautifulSoup and Scrapy remain better choices for very high-volume scraping of simple HTML pages where per-page costs matter and you don’t need AI-powered extraction.
Conclusion
Firecrawl has earned its popularity by solving the right problem at the right time. As AI applications increasingly need clean web data, tools that bridge the gap between messy websites and structured inputs become essential.
If you’re building anything that consumes web data — from chatbots to market research dashboards — Firecrawl is worth evaluating. Start with the free tier, try the Python tutorial above, and see how it compares to your current workflow.
For more AI-powered scraping tools and comparisons, explore our complete guide to AI web scrapers.
Related Reading
- AI Web Scraper with Python: Build Your Own
- Best AI Web Scrapers 2026: Complete Comparison
- Agentic Browsers Explained: Browserbase, Browser Use, and Proxy Infrastructure
- Agentic Browsers Explained: The Future of AI + Proxies in 2026
- How AI Agents Use Proxies for Real-Time Web Data Collection in 2026
- Mobile Proxies for AI Data Collection: Web Scraping for Training Data