How to Scrape SEC EDGAR Filings Data in 2026

SEC EDGAR (Electronic Data Gathering, Analysis, and Retrieval) is the U.S. Securities and Exchange Commission’s free database of corporate filings, containing over 21 million filings from public companies. For financial analysts, compliance professionals, investment researchers, and fintech developers, EDGAR provides the most authoritative source of public company financial data in the United States.

Unlike most scraping targets, SEC EDGAR is explicitly designed for public data access and provides a well-documented API, making it one of the most scraper-friendly data sources available.

What Data Can You Extract?

SEC EDGAR contains comprehensive regulatory filings:

  • Annual reports (10-K) and quarterly reports (10-Q)
  • Current reports (8-K) for material events
  • Insider transaction reports (Forms 3, 4, and 5)
  • Proxy statements (DEF 14A)
  • Registration statements (S-1 for IPOs)
  • XBRL financial data (structured financial statements)
  • Company information (CIK, SIC codes, addresses)
  • Filing history and amendments

Example JSON Output

{
  "company": {
    "cik": "0000320193",
    "name": "Apple Inc.",
    "ticker": "AAPL",
    "sic": "3571",
    "state": "CA"
  },
  "recent_filing": {
    "form_type": "10-K",
    "filing_date": "2025-11-01",
    "accession_number": "0000320193-25-000123",
    "primary_document": "aapl-20250927.htm",
    "url": "https://www.sec.gov/Archives/edgar/data/320193/..."
  }
}
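
The "url" field in this output follows the same path pattern the scraper class in Method 1 uses: the CIK without leading zeros, the accession number with its dashes removed, then the primary document name. A minimal sketch of that composition, where build_filing_url is a hypothetical helper name and not part of any SEC library:

```python
def build_filing_url(cik, accession_number, primary_document):
    """Compose the Archives URL for a filing's primary document."""
    # Accept either 320193 or "0000320193"; the path uses the unpadded form.
    cik_unpadded = str(int(cik))
    # Folder names drop the dashes from the accession number.
    accession_clean = accession_number.replace("-", "")
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{cik_unpadded}/{accession_clean}/{primary_document}"
    )


url = build_filing_url("0000320193", "0000320193-25-000123", "aapl-20250927.htm")
print(url)
```

Normalizing the CIK with int() means the helper behaves the same whether the caller passes the padded string from the JSON or a plain integer.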

Prerequisites

pip install requests sec-edgar-downloader beautifulsoup4 pandas

Method 1: Using SEC EDGAR API (Recommended)

The SEC provides free, public JSON APIs at data.sec.gov for company submissions and XBRL financial data, alongside EDGAR Full-Text Search. None of them requires an API key.

import requests
import json
import time

class SECEdgarScraper:
    def __init__(self, user_agent="YourName your@email.com"):
        self.session = requests.Session()
        self.base_url = "https://efts.sec.gov/LATEST"
        self.data_url = "https://data.sec.gov"
        self.headers = {
            "User-Agent": user_agent,
            "Accept": "application/json",
        }

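    def search_companies(self, query):
        """Look up a company by CIK; fall back to full-text search otherwise."""
        query = str(query).strip()

        # The submissions endpoint is keyed by zero-padded CIK, so only try it
        # for numeric queries (names and tickers cannot be resolved here).
        if query.isdigit():
            url = f"{self.data_url}/submissions/CIK{query.zfill(10)}.json"
            try:
                response = self.session.get(url, headers=self.headers, timeout=30)
                if response.status_code == 200:
                    return response.json()
            except (requests.RequestException, ValueError) as e:
                print(f"Submissions lookup failed: {e}")

        # Fallback: full-text search; params= takes care of URL encoding
        params = {
            "q": query,
            "dateRange": "custom",
            "startdt": "2024-01-01",
            "enddt": "2026-12-31",
        }
        try:
            response = self.session.get(
                f"{self.base_url}/search-index",
                params=params, headers=self.headers, timeout=30
            )
            response.raise_for_status()
            return response.json()
        except (requests.RequestException, ValueError) as e:
            print(f"Error: {e}")
            return None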

    def get_company_filings(self, cik, form_type=None):
        """Get filings for a company by CIK number."""
        cik_padded = str(cik).zfill(10)
        url = f"{self.data_url}/submissions/CIK{cik_padded}.json"

        try:
            response = self.session.get(url, headers=self.headers, timeout=30)
            response.raise_for_status()
            data = response.json()

            filings = data.get("filings", {}).get("recent", {})
            results = []

            forms = filings.get("form", [])
            dates = filings.get("filingDate", [])
            accessions = filings.get("accessionNumber", [])
            documents = filings.get("primaryDocument", [])

            for i in range(len(forms)):
                if form_type and forms[i] != form_type:
                    continue

                accession_clean = accessions[i].replace("-", "")
                results.append({
                    "form_type": forms[i],
                    "filing_date": dates[i],
                    "accession_number": accessions[i],
                    "primary_document": documents[i],
                    # int() strips leading zeros if a padded CIK string was passed
                    "url": f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{accession_clean}/{documents[i]}",
                })

            return {
                "company_name": data.get("name"),
                "cik": cik,
                "ticker": data.get("tickers", [""])[0] if data.get("tickers") else None,
                "filings": results,
            }

        except requests.RequestException as e:
            print(f"Error: {e}")
            return None

    def get_xbrl_data(self, cik, taxonomy="us-gaap", tag="Revenues"):
        """Get structured XBRL financial data."""
        cik_padded = str(cik).zfill(10)
        url = f"{self.data_url}/api/xbrl/companyfacts/CIK{cik_padded}.json"

        try:
            response = self.session.get(url, headers=self.headers, timeout=30)
            response.raise_for_status()
            data = response.json()

            facts = data.get("facts", {}).get(taxonomy, {}).get(tag, {})
            units = facts.get("units", {})

            results = []
            for unit_type, values in units.items():
                for v in values:
                    results.append({
                        "value": v.get("val"),
                        "unit": unit_type,
                        "period_end": v.get("end"),
                        "period_start": v.get("start"),
                        "form": v.get("form"),
                        "filing_date": v.get("filed"),
                    })

            return results

        except Exception as e:
            print(f"Error: {e}")
            return []

    def get_insider_trading(self, cik):
        """Get insider trading filings (Form 4)."""
        return self.get_company_filings(cik, form_type="4")

    def search_filings(self, query, form_type=None, date_from=None, date_to=None):
        """Full-text search across all filings."""
        params = {"q": query, "from": 0, "size": 50}
        if form_type:
            params["forms"] = form_type
        if date_from:
            params["startdt"] = date_from
        if date_to:
            params["enddt"] = date_to

        try:
            response = self.session.get(
                f"{self.base_url}/search-index",
                params=params, headers=self.headers, timeout=30
            )
            response.raise_for_status()
            return response.json()
        except Exception as e:
            print(f"Error: {e}")
            return None


# Usage
scraper = SECEdgarScraper(user_agent="DataResearch admin@dataresearchtools.com")

# Get Apple filings (the method returns None if the request fails)
apple = scraper.get_company_filings(320193, form_type="10-K")
if apple:
    print(f"Company: {apple['company_name']}")
    print(f"10-K filings: {len(apple['filings'])}")

# Get revenue data (the tag a company reports under can vary by filer and period)
revenue = scraper.get_xbrl_data(320193, tag="Revenues")
print(f"Revenue data points: {len(revenue)}")
for r in revenue[-4:]:
    print(f"  {r['period_end']}: ${r['value']:,.0f}")

# Get insider trading
insider = scraper.get_insider_trading(320193)
if insider:
    print(f"Form 4 filings: {len(insider['filings'])}")

SEC EDGAR Access Rules

SEC EDGAR has specific access requirements:

  • User-Agent: Must include your name and email address
  • Rate Limit: Maximum 10 requests per second
  • No Authentication: All data is freely accessible
  • robots.txt: Allows broad scraping with reasonable rate limits

# Required User-Agent format
headers = {
    "User-Agent": "CompanyName admin@company.com"
}
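
The 10 requests per second ceiling is easiest to respect with a small throttle that every request passes through. The sketch below is one possible approach, not an official SEC client: RateLimiter is a hypothetical class, and the default of 8 requests per second is an arbitrary safety margin chosen for illustration.

```python
import time


class RateLimiter:
    """Spaces out calls so that at most `max_per_second` happen per second."""

    def __init__(self, max_per_second=8, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve the next slot."""
        delay = self.next_allowed - self.clock()
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = self.clock() + self.min_interval


limiter = RateLimiter(max_per_second=8)
for cik in (320193, 320193, 320193):
    limiter.wait()
    # A real loop would make its request here, for example:
    # scraper.get_company_filings(cik)
```

Calling limiter.wait() immediately before each session.get() keeps a batch job under the limit regardless of how fast the surrounding loop runs.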

Proxy Recommendations

Proxy Type    Necessity     Best For
None          Sufficient    Standard use
Datacenter    Optional      High-volume batch jobs

SEC EDGAR is designed for public access. Proxies are rarely needed. Just respect the 10 requests/second rate limit.

Legal Considerations

  1. Public Data: SEC filings are public records. No restrictions on accessing or using the data.
  2. Fair Access: SEC requests that users limit to 10 requests per second for fair access.
  3. Attribution: While not legally required, citing SEC as the data source is best practice.
  4. Redistribution: No restrictions on redistributing SEC filing data.

Frequently Asked Questions

Is SEC EDGAR data free?

Yes. All SEC EDGAR data is freely available to the public. No API key, registration, or authentication is required.

How do I find a company’s CIK number?

Search by company name or ticker at https://www.sec.gov/cgi-bin/browse-edgar?company=&CIK=AAPL. The CIK for Apple is 0000320193.
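
For bulk lookups, a ticker-to-CIK index is more convenient than searching one company at a time. The sketch below assumes the SEC's company_tickers.json file at https://www.sec.gov/files/company_tickers.json and assumes each record carries cik_str, ticker, and title fields; verify both against the live file before relying on them. The function names are hypothetical, and the sample dictionary is illustrative rather than live data.

```python
import json
import urllib.request

# Assumed location of the SEC's ticker list (check before use).
TICKERS_URL = "https://www.sec.gov/files/company_tickers.json"


def fetch_company_tickers(user_agent):
    """Download the ticker file; the SEC expects a descriptive User-Agent."""
    request = urllib.request.Request(TICKERS_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def build_ticker_index(company_tickers):
    """Map upper-case ticker -> zero-padded 10-digit CIK string."""
    index = {}
    for record in company_tickers.values():
        index[record["ticker"].upper()] = str(record["cik_str"]).zfill(10)
    return index


# Illustrative sample in the assumed shape of the file (not fetched data).
sample = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
}
index = build_ticker_index(sample)
print(index["AAPL"])
```

In practice you would call build_ticker_index(fetch_company_tickers("YourName your@email.com")) once and cache the result, since the file changes rarely compared with filings.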

Can I download full financial statements?

Yes. Use the XBRL API for structured financial data, or download complete filing documents (HTML, XML) from the filing URLs.
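
Raw XBRL facts tend to include the same figure several times, because later filings repeat earlier periods as comparatives, and they mix quarterly with full-year values. The sketch below shows one way to reduce the list returned by get_xbrl_data() to a single full-year value per period. annual_values is a hypothetical helper, the 350 to 380 day window is an assumption about what counts as a fiscal year, and the sample facts are made up for illustration.

```python
from datetime import date


def annual_values(facts, form="10-K"):
    """Keep one full-year value per period end, preferring the latest filing."""
    latest = {}
    for fact in facts:
        if fact.get("form") != form or not fact.get("period_start"):
            continue
        start = date.fromisoformat(fact["period_start"])
        end = date.fromisoformat(fact["period_end"])
        if not 350 <= (end - start).days <= 380:
            continue  # skip quarterly or other partial-period values
        seen = latest.get(fact["period_end"])
        # ISO dates compare correctly as strings, so the newest filing wins.
        if seen is None or fact["filing_date"] > seen["filing_date"]:
            latest[fact["period_end"]] = fact
    return [latest[key] for key in sorted(latest)]


# Made-up facts in the shape produced by get_xbrl_data() above.
sample_facts = [
    {"value": 100, "unit": "USD", "period_start": "2023-01-01",
     "period_end": "2023-12-31", "form": "10-K", "filing_date": "2024-02-01"},
    {"value": 105, "unit": "USD", "period_start": "2023-01-01",
     "period_end": "2023-12-31", "form": "10-K", "filing_date": "2025-02-01"},
    {"value": 30, "unit": "USD", "period_start": "2023-10-01",
     "period_end": "2023-12-31", "form": "10-K", "filing_date": "2024-02-01"},
    {"value": 25, "unit": "USD", "period_start": "2024-01-01",
     "period_end": "2024-03-31", "form": "10-Q", "filing_date": "2024-05-01"},
]
annual = annual_values(sample_facts)
print([fact["value"] for fact in annual])  # [105]
```

Preferring the latest filing means restated figures replace the originally reported ones; reverse the comparison if you need the as-first-reported numbers instead.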

How quickly are new filings available?

SEC filings typically appear on EDGAR within minutes of submission. Real-time filing notifications are available via the SEC’s RSS feeds.
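
The feeds can be read with the standard library alone. The sketch below assumes an Atom-format feed, such as the one EDGAR is understood to serve at https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=8-K&output=atom; treat that URL and the entry fields as assumptions to confirm against the live feed. parse_atom_entries is a hypothetical helper, and the sample XML is a hand-written placeholder, not real feed output.

```python
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def parse_atom_entries(xml_text):
    """Extract title, updated timestamp, and link from each Atom entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        link = entry.find("atom:link", ATOM)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM),
            "link": link.get("href") if link is not None else None,
        })
    return entries


# Hand-written placeholder feed for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>8-K - Example Corp (0000000000) (Filer)</title>
    <link rel="alternate" type="text/html" href="https://www.sec.gov/Archives/edgar/data/0/example-index.htm"/>
    <updated>2026-01-02T12:00:00-05:00</updated>
  </entry>
</feed>"""

for item in parse_atom_entries(SAMPLE_FEED):
    print(item["updated"], item["title"], item["link"])
```

When polling a live feed, fetch it with the same descriptive User-Agent and rate limit as any other EDGAR request, and track the entries you have already seen so each filing is processed once.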

Conclusion

SEC EDGAR is the gold standard for public company data access: free, well-documented, and explicitly designed for programmatic access. The API provides structured data for filings, financial statements, and company information with no authentication, as long as you send a descriptive User-Agent and stay under 10 requests per second. Focus on the XBRL API for structured financial data and the full-text search for research queries.

For more financial data guides, visit our web scraping proxy guide and proxy provider comparisons.


Last updated: May 11, 2026
