What Is Firecrawl? Features, Pricing, Tutorial
If you’ve been researching AI-powered web scraping tools, you’ve probably come across Firecrawl. Launched by Mendable and quickly adopted by thousands of developers, Firecrawl has become one of the most talked-about tools in the modern data extraction landscape.
But what exactly is Firecrawl, and should you use it for your next project? This article breaks down everything you need to know — what it does, how it works, what it costs, and how to get started with a hands-on tutorial.
Firecrawl in a Nutshell
Firecrawl is an API service that converts any website into clean, LLM-ready data. You give it a URL, and it returns the page content as structured markdown, cleaned HTML, or extracted JSON — with JavaScript fully rendered and boilerplate content removed.
Think of it as a bridge between the messy reality of the web and the clean, structured data that AI applications need.
The Problem Firecrawl Solves
Traditional web scraping involves:
- Sending HTTP requests to get raw HTML
- Parsing HTML with tools like BeautifulSoup
- Handling JavaScript rendering with Selenium or Playwright
- Writing CSS selectors or XPath queries for each site
- Dealing with anti-bot protections
- Cleaning extracted content (removing ads, navigation, footers)
- Maintaining all of this when websites change
Firecrawl handles the first six of those steps with a single API call. And because it uses AI for content extraction rather than brittle CSS selectors, the seventh (ongoing maintenance) largely disappears too.
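To make "a single API call" concrete, here is a hedged sketch of calling Firecrawl's v1 scrape endpoint directly with `requests`. The endpoint path and response shape (`data.markdown`) follow Firecrawl's public v1 API, but verify them against the current API reference before relying on this:

```python
# Hedged sketch: one POST replaces the request/render/parse/clean pipeline above.
# Assumes the v1 response shape {"success": true, "data": {"markdown": ...}}.
import requests

def scrape_markdown(url: str, api_key: str, api_url: str = "https://api.firecrawl.dev") -> str:
    """Fetch one page as LLM-ready markdown via Firecrawl's v1 scrape endpoint."""
    resp = requests.post(
        f"{api_url}/v1/scrape",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": url, "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()  # surface HTTP-level failures early
    return resp.json()["data"]["markdown"]
```

The official SDKs wrap exactly this kind of call, as shown in the tutorial below.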
How Firecrawl Works
Under the hood, Firecrawl combines several technologies:
URL Input → Headless Browser → Page Rendering → Content Extraction → AI Cleaning → Clean Output

- Headless Browser Rendering — Firecrawl loads pages in a full Chromium browser, executing JavaScript just like a real user’s browser would
- Content Detection — AI identifies the main content area, separating it from navigation, ads, and boilerplate
- Markdown Conversion — HTML is converted to clean markdown, preserving structure (headings, lists, tables, code blocks) while removing clutter
- Optional LLM Extraction — For structured data needs, Firecrawl can use language models to extract specific fields according to a schema you define
Core Features
Four Operating Modes
| Mode | Purpose | Use Case |
|---|---|---|
| Scrape | Extract content from one URL | Get a single article, product page, or document |
| Crawl | Follow links and extract from multiple pages | Index an entire documentation site or blog |
| Map | Discover all URLs on a site | Plan a targeted scraping strategy |
| Extract | Pull structured data using LLM | Get product details, pricing, contact info as JSON |
Output Formats
Firecrawl can return data in multiple formats from a single request:
- Markdown — Clean, readable text with formatting preserved
- HTML — Cleaned HTML with boilerplate removed
- Raw HTML — Complete page source
- Links — All URLs found on the page
- Screenshots — Visual capture of the rendered page
- Extract — Structured JSON based on your schema
Built-In Capabilities
- JavaScript rendering — Handles React, Vue, Angular, and all SPA frameworks
- Anti-bot bypass — Stealth techniques for accessing protected sites
- Automatic pagination — Follow “next page” links automatically
- Mobile rendering — Render pages as they appear on mobile devices
- Wait conditions — Wait for specific elements or timeouts before extraction
- Custom headers — Send authentication tokens, cookies, or custom user agents
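Several of these capabilities are toggled through scrape options. A small sketch of assembling an options payload; the option names `waitFor`, `mobile`, and `headers` follow Firecrawl's v1 scrape API, but treat them as assumptions to check against the current docs:

```python
def build_scrape_options(wait_ms=None, mobile=False, headers=None):
    """Assemble a scrape-options payload; only set keys the caller asked for."""
    opts = {"formats": ["markdown"]}
    if wait_ms is not None:
        opts["waitFor"] = wait_ms   # wait this many ms before extraction
    if mobile:
        opts["mobile"] = True       # render with a mobile viewport
    if headers:
        opts["headers"] = headers   # auth tokens, cookies, custom user agents
    return opts

print(build_scrape_options(wait_ms=2000, headers={"Cookie": "session=abc"}))
```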
Firecrawl Pricing Breakdown
Understanding the pricing model is important before committing to any scraping platform.
Credit System
Firecrawl uses a credit-based system where 1 credit = 1 page operation:
| Plan | Monthly Credits | Price | Per-Credit Cost |
|---|---|---|---|
| Free | 500 | $0 | $0 |
| Hobby | 3,000 | $16 | $0.0053 |
| Standard | 100,000 | $83 | $0.00083 |
| Growth | 500,000 | $333 | $0.00067 |
| Enterprise | Custom | Custom | Negotiable |
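The per-credit column is just price divided by monthly credits, rounded to two significant figures. A quick sanity check:

```python
def per_credit_cost(price_usd: float, credits: int) -> float:
    """Effective cost of one page operation on a given plan."""
    return price_usd / credits

# Standard: $83 for 100,000 credits
print(round(per_credit_cost(83, 100_000), 5))   # 0.00083
# Growth: $333 for 500,000 credits
print(round(per_credit_cost(333, 500_000), 5))  # 0.00067
```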
What Counts as a Credit
- One `scrape` call = 1 credit
- One page in a `crawl` = 1 credit
- One `map` call = 1 credit
- One `extract` call = 1 credit (plus LLM costs on self-hosted)
Is Firecrawl Worth the Cost?
For context, manually building and maintaining equivalent scraping infrastructure typically costs:
- Developer time: 20-40 hours to build a robust scraper
- Infrastructure: $50-200/month for proxies, browser automation, and servers
- Maintenance: 5-10 hours/month to fix broken selectors and handle site changes
At $83/month for 100,000 pages, Firecrawl is often cheaper than the alternatives — especially when you factor in engineering time.
Self-Hosting: The Free Alternative
Firecrawl is open source. You can self-host it for free (minus your infrastructure costs):
```shell
git clone https://github.com/mendableai/firecrawl.git
cd firecrawl
docker compose up -d
```

Self-hosting eliminates credit limits but requires you to manage servers, proxies, and updates yourself.
Getting Started Tutorial
Let’s build a practical project: scraping a documentation site to create a local knowledge base.
Step 1: Install the SDK
```shell
pip install firecrawl-py
```

Step 2: Initialize the Client
```python
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-your-api-key")
```

Step 3: Scrape a Single Page
Start with a basic scrape to understand the output:
```python
result = app.scrape_url("https://docs.example.com/introduction", {
    "formats": ["markdown"]
})

print(result["markdown"][:500])
print(f"\nMetadata: {result['metadata']['title']}")
```

Step 4: Discover the Site Structure
Use Map mode to find all documentation pages:
```python
map_result = app.map_url("https://docs.example.com", {
    "limit": 1000
})

doc_urls = [url for url in map_result["links"] if "/docs/" in url]
print(f"Found {len(doc_urls)} documentation pages")
```

Step 5: Crawl the Documentation
```python
crawl_result = app.crawl_url("https://docs.example.com", {
    "limit": 200,
    "includePaths": ["/docs/*"],
    "excludePaths": ["/docs/changelog/*"],
    # Per-page output formats go under scrapeOptions in the v1 crawl API
    "scrapeOptions": {"formats": ["markdown"]}
})

print(f"Crawled {len(crawl_result['data'])} pages")
```

Step 6: Save as a Knowledge Base
```python
import os

output_dir = "knowledge_base"
os.makedirs(output_dir, exist_ok=True)

for page in crawl_result["data"]:
    # Create a filename from the last URL segment (falls back to "index")
    slug = page["metadata"]["url"].split("/")[-1] or "index"
    filepath = os.path.join(output_dir, f"{slug}.md")

    # Write markdown content with a title and source line
    with open(filepath, "w") as f:
        f.write(f"# {page['metadata']['title']}\n\n")
        f.write(f"Source: {page['metadata']['url']}\n\n")
        f.write(page["markdown"])
    print(f"Saved: {filepath}")

print(f"\nKnowledge base created with {len(crawl_result['data'])} documents")
```

Step 7: Extract Structured Data
For pages where you need specific fields rather than full content:
```python
from pydantic import BaseModel
from typing import List

class APIEndpoint(BaseModel):
    method: str
    path: str
    description: str
    parameters: List[str]

# Wrap the list in a container model so the schema describes all endpoints,
# not just a single one
class APIReference(BaseModel):
    endpoints: List[APIEndpoint]

result = app.scrape_url("https://docs.example.com/api-reference", {
    "formats": ["extract"],
    "extract": {
        "schema": APIReference.model_json_schema(),
        "prompt": "Extract all API endpoints documented on this page"
    }
})

for endpoint in result["extract"]["endpoints"]:
    print(f"{endpoint['method']} {endpoint['path']}: {endpoint['description']}")
```

Common Use Cases
1. RAG Pipeline Data Collection
Firecrawl is widely used to feed RAG (Retrieval-Augmented Generation) pipelines with fresh web data. The clean markdown output is ideal for chunking and embedding.
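As an illustration, a minimal fixed-size chunker for scraped markdown (the sizes are arbitrary; production RAG pipelines usually split on headings or sentences instead):

```python
def chunk_text(text, max_chars=1000, overlap=100):
    """Split text into overlapping fixed-size chunks for embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks

print(len(chunk_text("word " * 500)))
```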
2. Competitive Intelligence
Monitor competitor websites for pricing changes, new features, and content updates. The Extract mode makes it easy to pull structured pricing data without writing custom parsers.
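For example, once Extract mode returns pricing as JSON, spotting changes between two snapshots is a simple dict comparison (the product and price fields are illustrative):

```python
def diff_prices(old, new):
    """Return {product: (old_price, new_price)} for changed or newly seen entries."""
    changes = {}
    for product, price in new.items():
        if old.get(product) != price:
            changes[product] = (old.get(product), price)
    return changes

print(diff_prices({"pro": 49, "basic": 9}, {"pro": 59, "basic": 9, "team": 99}))
```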
3. Content Aggregation
Build curated content feeds by crawling multiple sources and extracting article summaries, publish dates, and key topics.
4. Lead Generation
Scrape business directories, review sites, and company pages to extract contact information and company details for B2B lead generation.
5. AI Training Data
Collect diverse web content for fine-tuning language models or building training datasets, with Firecrawl handling the content cleaning automatically.
Limitations to Know About
No tool is perfect. Here are Firecrawl’s current limitations:
- Rate limits — Even paid plans have concurrency limits that may slow down very large crawls
- Credit costs add up — For millions of pages per month, self-hosting or traditional scrapers may be more cost-effective
- LLM extraction accuracy — Extract mode depends on LLM quality, which can occasionally misinterpret complex layouts
- No built-in scheduling — You need external tools (cron, n8n, Airflow) for recurring scrapes
- Anti-bot limitations — While good, some heavily protected sites may still block requests. In these cases, combining Firecrawl with residential proxies can help.
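Since scheduling is not built in, a plain cron entry is often enough for recurring scrapes (the script path and log location below are illustrative):

```shell
# Run a Firecrawl-based refresh script every night at 02:00
0 2 * * * /usr/bin/python3 /opt/scrapers/refresh_kb.py >> /var/log/refresh_kb.log 2>&1
```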
Firecrawl vs Other AI Scrapers
| Feature | Firecrawl | Crawl4ai | ScrapeGraphAI |
|---|---|---|---|
| Type | API + Self-host | Open source library | Open source library |
| Language | Multi-SDK | Python | Python |
| LLM Required | Only for Extract | Optional | Required |
| Output | Markdown/JSON | Markdown/JSON | JSON |
| Crawling | Built-in | Built-in | Limited |
| Anti-Bot | Built-in | Basic | None |
| Ease of Use | Easiest | Moderate | Moderate |
| Cost | Free tier + paid | Free | Free (+ LLM costs) |
For a detailed comparison, see our Crawl4ai vs Firecrawl breakdown or our best AI web scrapers roundup.
Frequently Asked Questions
What is Firecrawl used for?
Firecrawl is used for converting websites into clean data for AI applications. Common uses include building knowledge bases for RAG systems, collecting training data for LLMs, monitoring competitor websites, aggregating content from multiple sources, and extracting structured data like product details or pricing information.
Is Firecrawl open source?
Yes, Firecrawl’s core is open source under the AGPL license. You can self-host it using Docker for free with unlimited usage. The commercial cloud version adds managed infrastructure, higher reliability, better anti-bot capabilities, and premium support.
Do I need an LLM API key to use Firecrawl?
For the cloud version, no — LLM extraction is handled server-side. For the self-hosted version, you need an OpenAI API key (or compatible LLM endpoint) only if you want to use Extract mode. The Scrape, Crawl, and Map modes work without any LLM.
How many pages can I scrape with Firecrawl?
The free plan allows 500 pages per month. Paid plans range from 3,000 to 500,000+ pages monthly. Self-hosted Firecrawl has no credit limits — your throughput is limited only by your infrastructure capacity.
Can Firecrawl replace BeautifulSoup or Scrapy?
For most use cases, yes. Firecrawl handles JavaScript rendering, content cleaning, and anti-bot bypasses that BeautifulSoup cannot. However, BeautifulSoup and Scrapy remain better choices for very high-volume scraping of simple HTML pages where per-page costs matter and you don’t need AI-powered extraction.
Conclusion
Firecrawl has earned its popularity by solving the right problem at the right time. As AI applications increasingly need clean web data, tools that bridge the gap between messy websites and structured inputs become essential.
If you’re building anything that consumes web data — from chatbots to market research dashboards — Firecrawl is worth evaluating. Start with the free tier, try the Python tutorial above, and see how it compares to your current workflow.
For more AI-powered scraping tools and comparisons, explore our complete guide to AI web scrapers.
Related Reading
- AI Web Scraper with Python: Build Your Own
- Best AI Web Scrapers 2026: Complete Comparison
- Agentic Browsers Explained: Browserbase, Browser Use, and Proxy Infrastructure
- Agentic Browsers Explained: The Future of AI + Proxies in 2026
- How AI Agents Use Proxies for Real-Time Web Data Collection in 2026
- Mobile Proxies for AI Data Collection: Web Scraping for Training Data