NEW: Access all datasets via MCP Server — no SDK required. Any AI agent, any MCP-compatible client.
The most complete standardized SEC EDGAR dataset for quantitative research
105M+ financial facts from 12,000+ companies spanning 30+ years. Point-in-time accurate. Zero survivorship bias. Delivered as 8 Parquet tables you can query with DuckDB, Python, or Excel.
Built on the authoritative source for US public company financials
The SEC requires every public company to file structured XBRL financial statements. These filings are published as the EDGAR Financial Statements Data Sets — quarterly ZIP archives containing machine-readable data for every 10-K, 10-Q, 8-K, and 20-F filed with the Commission.
Each quarterly release includes five core files: num.txt (numeric values), sub.txt (submission metadata), tag.txt (XBRL tag definitions), pre.txt (presentation linkbase), and cal.txt (calculation linkbase).
Source Details
From raw SEC filings to queryable Parquet in 10 steps
Every financial fact passes through a deterministic pipeline that standardizes, enriches, and validates it before export. No manual intervention. No estimation.
SEC EDGAR Ingestion
Quarterly XBRL bulk downloads from SEC EDGAR Financial Statements Data Sets are ingested automatically. Each release contains num.txt, sub.txt, tag.txt, pre.txt, and cal.txt files covering every public filing.
XBRL Submission Parsing
Every XBRL submission in the quarterly dump is parsed. Filing metadata, entity info, tagged numeric values, and calculation linkbases are extracted and validated.
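As a rough illustration of the parsing step, the sketch below reads a tab-delimited extract shaped like num.txt and keeps only rows with a usable numeric value. It is not Valuein's parser. The sample rows, accession numbers, and the `parse_num_rows` helper are hypothetical, and the column names are assumed to follow the SEC's published num.txt layout.

```python
import csv
import io

# Hypothetical three-row extract in the tab-delimited layout of num.txt.
# Accession numbers and values are made up for illustration.
SAMPLE_NUM_TXT = (
    "adsh\ttag\tversion\tddate\tqtrs\tuom\tvalue\n"
    "0000000000-20-000001\tRevenues\tus-gaap/2019\t20191231\t1\tUSD\t1000000\n"
    "0000000000-20-000001\tAssets\tus-gaap/2019\t20191231\t0\tUSD\t5000000\n"
    "0000000000-20-000001\tNetIncomeLoss\tus-gaap/2019\t20191231\t1\tUSD\t\n"
)


def parse_num_rows(text):
    """Yield one dict per numeric fact, skipping rows with a blank value."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if not row["value"]:
            continue  # nothing to validate as a number
        yield {
            "adsh": row["adsh"],           # accession number of the filing
            "tag": row["tag"],             # raw XBRL tag
            "period_end": row["ddate"],    # YYYYMMDD
            "quarters": int(row["qtrs"]),  # 0 means a point-in-time balance
            "unit": row["uom"],
            "value": float(row["value"]),
        }


facts = list(parse_num_rows(SAMPLE_NUM_TXT))
print(len(facts), "facts parsed")
```

Reading by header name rather than column position keeps the sketch tolerant of column reordering between releases.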
Entity & Security Normalization
CIK numbers are resolved to standardized entity records. SIC codes are mapped to sectors and industries. Exchange information, ticker symbols, and CUSIP identifiers are enriched from multiple sources.
Concept Standardization
12,000+ raw XBRL tags are mapped to approximately 200 standardized financial concepts. Revenue synonyms, debt variants, and custom extensions all resolve to a single canonical concept name.
Point-in-Time Indexing
Every fact receives a knowledge_at timestamp equal to the SEC acceptance date of the filing that introduced it. No backfilling, no estimation. What was known on any date is precisely queryable.
Amendment Reconciliation
10-K/A and 10-Q/A filings (restated financials) are tracked separately. Original values and restated values coexist in the dataset, each with their own knowledge_at timestamp.
Derived Quarterly Values
Q2 and Q3 cash flow statements in 10-Qs report year-to-date totals. The pipeline computes the incremental quarterly figure and stores it as derived_quarterly_value alongside the raw YTD number.
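The arithmetic behind derived_quarterly_value can be sketched as a running difference of year-to-date totals. This is a minimal illustration, not the pipeline's code. The `derive_quarterly` function and the cash flow figures are hypothetical.

```python
def derive_quarterly(ytd_by_period):
    """Convert year-to-date totals into single-quarter increments.

    ytd_by_period maps 'Q1'..'Q3' to the cumulative value reported in
    that period's 10-Q.
    """
    derived = {}
    previous_ytd = 0.0
    for period in ("Q1", "Q2", "Q3"):
        if period not in ytd_by_period:
            break  # a gap makes the later increments unrecoverable
        current_ytd = ytd_by_period[period]
        derived[period] = current_ytd - previous_ytd
        previous_ytd = current_ytd
    return derived


# Hypothetical operating cash flow in millions, reported year-to-date.
ytd = {"Q1": 120.0, "Q2": 260.0, "Q3": 390.0}
quarterly = derive_quarterly(ytd)
print(quarterly)  # {'Q1': 120.0, 'Q2': 140.0, 'Q3': 130.0}
```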
Index Membership Enrichment
S&P500 and Russell 2000 membership is tracked historically with effective and end dates. The references table carries an is_sp500 boolean for instant filtering.
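Historical membership with effective and end dates reduces to an interval check. The sketch below assumes rows shaped like the index membership columns (entity_id, index_name, effective_date, end_date), treats a missing end date as "still a member", and treats the end date as inclusive. All three are assumptions, and the rows are made up.

```python
from datetime import date

# Hypothetical rows: (entity_id, index_name, effective_date, end_date).
# end_date is None while the company is still a constituent.
MEMBERSHIP = [
    (1, "S&P500", date(2005, 3, 1), date(2012, 6, 30)),
    (2, "S&P500", date(2010, 1, 4), None),
    (3, "Russell 2000", date(2008, 7, 1), None),
]


def constituents_on(rows, index_name, as_of):
    """Return the sorted entity_ids that were in index_name on as_of."""
    members = []
    for entity_id, name, effective, end in rows:
        if name != index_name:
            continue
        # Assumption: end_date is the last day of membership (inclusive).
        if effective <= as_of and (end is None or as_of <= end):
            members.append(entity_id)
    return sorted(members)


print(constituents_on(MEMBERSHIP, "S&P500", date(2011, 1, 1)))  # [1, 2]
print(constituents_on(MEMBERSHIP, "S&P500", date(2013, 1, 1)))  # [2]
```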
Parquet Export
Column-oriented Parquet files with ZSTD compression are generated for each tier. Optimized for DuckDB, Polars, and Spark. Exported to Cloudflare R2 after every EDGAR quarterly release.
Manifest Update
manifest.json records the snapshot date, last_updated timestamp, and row counts for every table. SDKs and integrations use this to detect fresh data automatically.
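A client-side freshness check against the manifest might look like the sketch below. Only the last_updated field name comes from the description above. The snapshot_date and row_counts key names, the timestamp format, and both sample manifests are assumptions for illustration.

```python
import json
from datetime import datetime

# Hypothetical manifests: one cached locally, one just downloaded.
CACHED = json.loads("""{
  "snapshot_date": "2024-03-31",
  "last_updated": "2024-04-02T06:00:00+00:00",
  "row_counts": {"fact": 1000, "filing": 50}
}""")
REMOTE = json.loads("""{
  "snapshot_date": "2024-06-30",
  "last_updated": "2024-07-02T06:00:00+00:00",
  "row_counts": {"fact": 1100, "filing": 50}
}""")


def is_stale(cached, remote):
    """True when the remote manifest is newer than the cached copy."""
    cached_ts = datetime.fromisoformat(cached["last_updated"])
    remote_ts = datetime.fromisoformat(remote["last_updated"])
    return remote_ts > cached_ts


def changed_tables(cached, remote):
    """Names of tables whose row count differs between the two manifests."""
    old = cached["row_counts"]
    return sorted(t for t, n in remote["row_counts"].items() if old.get(t) != n)


print(is_stale(CACHED, REMOTE))        # True
print(changed_tables(CACHED, REMOTE))  # ['fact']
```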
8 Parquet tables, one complete financial universe
Each table is a column-oriented Parquet file with ZSTD compression. Query with DuckDB, Polars, Spark, or any engine that reads Parquet.
Every SEC-registered entity that has filed XBRL financial statements. Includes active, delisted, bankrupt, and acquired companies.
Columns: cik, name, sic_code, sector, industry, state_of_incorporation, fiscal_year_end, category
Ticker symbols and exchange listings. One entity may have multiple securities. Tracks active status and delisting dates.
Columns: entity_id, symbol, exchange, cusip, is_active, delisted_date, security_type
Every XBRL filing processed: 10-K, 10-Q, 8-K, 20-F, and their amendments. The knowledge_at field is the SEC acceptance timestamp.
Columns: accession_id, entity_id, form_type, filing_date, period_end, knowledge_at, is_amendment
The core table. Every standardized financial fact extracted from every filing. Supports point-in-time queries via knowledge_at and quarterly derivation via derived_quarterly_value.
Columns: entity_id, accession_id, standard_concept, numeric_value, derived_quarterly_value, unit, decimals, fiscal_year, fiscal_period, period_end, knowledge_at, form_type
Pre-computed valuation ratios and financial metrics. P/E, P/B, EV/EBITDA, FCF yield, ROE, ROIC, gross margin, and more. Updated with each filing cycle.
Columns: entity_id, valuation_date, model_type, per_share_value, current_price, margin_of_safety, valuation_label
Maps every raw XBRL tag to its standardized concept. Includes the original tag name, namespace, description, and financial statement category.
Columns: raw_tag, namespace, standard_concept, description, category, statement_type
Historical index constituents with effective and end dates. Know exactly which companies were in the S&P500 on any given date.
Columns: entity_id, index_name, effective_date, end_date
Derived flat join of entity + security + index_membership. One row per security. Eliminates 3-table joins for most queries. The starting point for any cross-company analysis.
Columns: cik, symbol, name, sector, industry, exchange, is_active, is_sp500, sic_code
Know exactly what was known, and when
Every fact in the dataset carries a knowledge_at timestamp — the exact date and time the SEC accepted the filing that introduced that fact.
This is critical for backtesting. Without point-in-time data, your 2015 backtest unknowingly uses data that was only available in 2016 (look-ahead bias). The result: inflated returns that evaporate in live trading.
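To make the look-ahead problem concrete, here is a small in-memory sketch of an as-of lookup. It is not the SDK. The `as_of` helper and every value in `FACTS` are hypothetical. The idea is simply to ignore any fact whose knowledge_at is later than the backtest date, then keep the most recently known value per period.

```python
from datetime import datetime

# Hypothetical facts for one company and one concept.
FACTS = [
    {"period_end": "2019-12-31", "knowledge_at": datetime(2020, 2, 14, 16, 5), "value": 100.0},
    # A later restatement of the same period.
    {"period_end": "2019-12-31", "knowledge_at": datetime(2020, 8, 3, 9, 30), "value": 97.0},
    {"period_end": "2020-03-31", "knowledge_at": datetime(2020, 5, 8, 17, 0), "value": 110.0},
]


def as_of(facts, cutoff):
    """Latest-known value per period_end, using only facts visible at cutoff."""
    visible = {}
    # Process in the order the facts became known, so later filings win.
    for fact in sorted(facts, key=lambda f: f["knowledge_at"]):
        if fact["knowledge_at"] <= cutoff:
            visible[fact["period_end"]] = fact["value"]
    return visible


print(as_of(FACTS, datetime(2020, 3, 1)))    # {'2019-12-31': 100.0}
print(as_of(FACTS, datetime(2020, 12, 31)))  # {'2019-12-31': 97.0, '2020-03-31': 110.0}
```

A backtest dated March 2020 sees only the original 100.0. Using the restated 97.0 at that date would be look-ahead bias.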
Concrete Example: AAPL Q1 FY2020
Point-in-time query with the Python SDK
Fetch AAPL revenue as it was known on a specific date
from valuein_sdk import ValueinClient, ValueinError

sql = """
    SELECT
        r.symbol,
        f.filing_date,
        f.period_end,
        f.knowledge_at,
        fa.numeric_value / 1e9 AS revenue_billions,
        f.form_type
    FROM references r
    JOIN filing f ON f.entity_id = r.cik
    JOIN fact fa ON fa.accession_id = f.accession_id
    WHERE r.symbol = 'AAPL'
      AND fa.standard_concept = 'Revenues'
      AND f.form_type IN ('10-Q', '10-Q/A')
      -- Only filings the SEC had accepted by the cutoff date
      AND f.knowledge_at <= '2020-05-01'
    ORDER BY f.period_end DESC, f.knowledge_at DESC
    LIMIT 5;
"""

try:
    with ValueinClient() as client:
        df = client.query(sql)
        print(df)
except ValueinError as e:
    print(f"Valuein error: {e}")

Why this matters
- Eliminates look-ahead bias in walk-forward backtests
- Supports event studies around filing dates
- Amendment tracking shows original vs. restated values
- Reproduces any historical research state exactly
Every company that ever filed. Including the ones that failed.
Most financial datasets only include companies that are still active today. That means your backtest never considers the Enrons, the Lehmans, the RadioShacks — companies that went bankrupt and dragged portfolios down.
The result? Inflated historical returns that don't replicate in live trading. Academic research estimates survivorship bias overstates annual returns by 1-2 percentage points.
Valuein includes every entity that ever filed XBRL financial statements with the SEC. Delisted, bankrupt, acquired, merged — they are all here, with their complete filing history up to the date they ceased operations.
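A toy calculation shows the mechanism. The four-company universe and its returns below are invented and deliberately exaggerated, so the size of the gap is not representative of real markets. Only the direction of the effect is the point.

```python
from statistics import mean

# Hypothetical universe: one of four companies went bankrupt during the period.
UNIVERSE = [
    {"symbol": "AAA", "annual_return": 0.12, "is_active": True},
    {"symbol": "BBB", "annual_return": 0.08, "is_active": True},
    {"symbol": "CCC", "annual_return": -0.60, "is_active": False},
    {"symbol": "DDD", "annual_return": 0.04, "is_active": True},
]

# What a survivors-only dataset would report.
survivors_only = mean(c["annual_return"] for c in UNIVERSE if c["is_active"])

# What an investor holding the whole universe actually earned.
full_universe = mean(c["annual_return"] for c in UNIVERSE)

bias = survivors_only - full_universe
print(f"survivors only: {survivors_only:.2%}")
print(f"full universe:  {full_universe:.2%}")
print(f"overstatement:  {bias:.2%}")
```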
Universe Composition
~50% of the full universe consists of companies that are no longer actively trading. Excluding them fundamentally distorts any historical analysis.
Notable companies in the full universe
All with complete financial statements through their final SEC filing.
12,000+ XBRL tags. ~200 standardized concepts.
The XBRL taxonomy is sprawling. Apple reports revenue as RevenueFromContractWithCustomerExcludingAssessedTax. Older filings use SalesRevenueNet. Some companies create custom extensions entirely. Cross-company analysis becomes impossible without standardization.
Valuein maps every raw tag to a canonical standard_concept — while preserving the original tag in the taxonomy_guide table. No black box. Full provenance.
| Raw XBRL Tag | Standardized Concept |
|---|---|
| us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax | Revenues |
| us-gaap:SalesRevenueNet | Revenues |
| us-gaap:Revenues | Revenues |
| us-gaap:SalesRevenueGoodsNet | Revenues |
| us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax | Revenues |
| custom:TotalNetRevenues | Revenues |
This is just one concept. Revenue alone has 80+ raw XBRL synonyms across the filing universe. The taxonomy_guide table documents every mapping — browse it in the Data Catalog.
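Conceptually, the mapping behaves like a lookup table that adds a standard_concept while leaving the raw tag in place. The sketch below uses only the six tags from the table above. The `standardize` helper and the choice to return `None` for unmapped tags are assumptions, not Valuein's implementation.

```python
# The six revenue synonyms listed in the table above.
TAG_TO_CONCEPT = {
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "Revenues",
    "us-gaap:SalesRevenueNet": "Revenues",
    "us-gaap:Revenues": "Revenues",
    "us-gaap:SalesRevenueGoodsNet": "Revenues",
    "us-gaap:RevenueFromContractWithCustomerIncludingAssessedTax": "Revenues",
    "custom:TotalNetRevenues": "Revenues",
}


def standardize(raw_fact):
    """Attach standard_concept to a fact while keeping raw_tag for provenance."""
    concept = TAG_TO_CONCEPT.get(raw_fact["raw_tag"])  # None if unmapped
    return {**raw_fact, "standard_concept": concept}


# Hypothetical facts from two filers that tag revenue differently.
fact_a = standardize({"raw_tag": "us-gaap:SalesRevenueNet", "value": 10.0})
fact_b = standardize({"raw_tag": "custom:TotalNetRevenues", "value": 12.0})
print(fact_a["standard_concept"] == fact_b["standard_concept"])  # True
```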
Original and restated values, side by side
When a company files a 10-K/A or 10-Q/A, it is restating previously reported financial data. Most datasets silently overwrite the original values. Valuein keeps both.
The original filing and the amendment each have their own knowledge_at timestamp. The is_amendment flag on the filing table distinguishes them. You can query original-only, amended-only, or compare both.
- 10-K/A: Amended annual report — restated annual financials
- 10-Q/A: Amended quarterly report — restated quarterly financials
- Both original and restated values stored with distinct knowledge_at
- is_amendment flag on the filing table for easy filtering
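Once both versions are retrievable, comparing them is a matter of splitting rows on is_amendment. The sketch below works on plain dictionaries rather than SDK results. The rows, values, and the `split_original_and_restated` helper are hypothetical.

```python
# Hypothetical rows for one company, one concept, one fiscal period.
ROWS = [
    {"form_type": "10-K", "is_amendment": False,
     "knowledge_at": "2024-02-20T16:10:00", "value": 500.0},
    {"form_type": "10-K/A", "is_amendment": True,
     "knowledge_at": "2024-06-05T08:45:00", "value": 480.0},
]


def split_original_and_restated(rows):
    """Return (original, latest_restatement); either may be None."""
    originals = [r for r in rows if not r["is_amendment"]]
    # ISO 8601 strings in one format sort chronologically.
    amendments = sorted(
        (r for r in rows if r["is_amendment"]), key=lambda r: r["knowledge_at"]
    )
    original = originals[0] if originals else None
    restated = amendments[-1] if amendments else None
    return original, restated


original, restated = split_original_and_restated(ROWS)
restatement_delta = restated["value"] - original["value"]
print(restatement_delta)  # -20.0
```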
Query both original and restated values
Compare a company's original 10-K with its amendment
from valuein_sdk import ValueinClient, ValueinError

sql = """
    SELECT
        r.symbol,
        f.form_type,
        f.filing_date,
        f.knowledge_at,
        f.is_amendment,
        fa.standard_concept,
        fa.numeric_value / 1e9 AS value_billions
    FROM references r
    JOIN filing f ON f.entity_id = r.cik
    JOIN fact fa ON fa.accession_id = f.accession_id
    WHERE r.symbol = 'XYZ'
      AND fa.standard_concept = 'Revenues'
      AND f.form_type IN ('10-K', '10-K/A')
      AND f.period_end = '2023-12-31'
    ORDER BY f.knowledge_at ASC;
"""

try:
    with ValueinClient() as client:
        df = client.query(sql)
        # Row 1: original 10-K
        # Row 2: amended 10-K/A (if filed)
        print(df)
except ValueinError as e:
    print(f"Valuein error: {e}")

Coverage at a glance
Filing Types Covered
- 10-K: Annual report
- 10-Q: Quarterly report
- 8-K: Current report (material events)
- 20-F: Annual report (foreign private issuers)
- 10-K/A: Annual report amendment
- 10-Q/A: Quarterly report amendment
Tier Breakdown
| Tier | Data Scope | Rate Limit | Price |
|---|---|---|---|
| S&P500 | S&P500 · 500+ tickers · 1994–present | 10,000 / day | Free |
| Institutional | Full universe · 12,000+ tickers incl. delisted · 1994–present | Unlimited | $200/mo |
| Custom | Custom · Negotiated universe · redistribution license | Unlimited + SLA | Contact sales |
Start querying 105M+ facts today
Register free to access the full S&P500 universe — no credit card required. Institutional full-universe access at $200/mo.
Also available via Excel Power Query and direct Parquet download.