VectorFin Apache Iceberg Financial Data
Access VectorFin data directly via pyiceberg — open format, no vendor lock-in, bitemporal time-travel built in.
5 min setup time · 7 Iceberg tables · 5K+ US tickers · Nightly updates
Prerequisites
📋 VectorFin Starter plan
🔑 API key from app.vectorfinancials.com
☁️ Apache Iceberg account
Connection Guide
1
Install pyiceberg and configure GCS catalog
Install dependencies and set up catalog connection to VectorFin's Polaris REST catalog.
bash
pip install "pyiceberg[gcs]" pandas numpy
2
Connect to the Polaris catalog
Initialize a pyiceberg catalog pointing to VectorFin's REST catalog endpoint.
python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "vectorfin",
    **{
        "type": "rest",
        "uri": "https://catalog.vectorfinancials.com",
        "credential": "client_id:client_secret",  # from your VectorFin dashboard
        "warehouse": "vectorfin_warehouse",
    },
)
# List available namespaces and tables
print(catalog.list_namespaces())
# [('embeddings',), ('signals',)]
print(catalog.list_tables("signals"))
# [('signals', 'whystock_score'), ('signals', 'regime'), ...]3
3
Load a table and scan data
Open a VectorFin Iceberg table and scan into a pandas DataFrame.
python
import pandas as pd
# Open the whystock_score signals table
table = catalog.load_table("signals.whystock_score")
# Scan with filters (pushdown predicates)
from pyiceberg.expressions import And, GreaterThanOrEqual, EqualTo
df = table.scan(
    row_filter=And(
        EqualTo("ticker", "AAPL"),
        GreaterThanOrEqual("date", "2024-01-01"),
    ),
    selected_fields=("ticker", "date", "score", "components"),
).to_pandas()
print(df.head(10))
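Once the scan lands in pandas, reshaping is ordinary DataFrame work. A minimal sketch using a small synthetic frame with the same ticker/date/score columns as the scan above (the real table requires catalog credentials), pivoting into a date × ticker score matrix, a convenient shape for backtesting:

```python
import pandas as pd

# Synthetic stand-in for the whystock_score scan result
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "date": ["2024-01-02", "2024-01-03", "2024-01-02", "2024-01-03"],
    "score": [0.61, 0.64, 0.55, 0.58],
})

# Pivot long rows into a date x ticker matrix
scores = df.pivot(index="date", columns="ticker", values="score")
print(scores)
```

Note that Iceberg's predicate pushdown (the `row_filter` above) is the right place to narrow tickers and dates; pivoting is cheap only after the scan has already pruned the data server-side.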
4
Time-travel: point-in-time query
Use the bitemporal knowledge_ts column to reconstruct what was known as of a specific date.
python
from pyiceberg.expressions import And, LessThanOrEqual, EqualTo
from datetime import datetime
# Load embeddings table
emb_table = catalog.load_table("embeddings.transcripts")
# What did we know about AAPL as of Jan 1, 2024?
df = emb_table.scan(
    row_filter=And(
        EqualTo("ticker", "AAPL"),
        LessThanOrEqual("knowledge_ts", datetime(2024, 1, 1).isoformat()),
    ),
    selected_fields=("ticker", "fiscal_period", "chunk_idx", "embedding"),
).to_pandas()
import numpy as np
E = np.stack(df["embedding"].values)
print(f"Loaded {len(E)} embeddings with shape {E.shape}")
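The knowledge_ts filter above returns every row version written before the cutoff; to get a single as-of view you typically also keep only the latest version per key. A pure-pandas sketch of that bitemporal semantics on synthetic data (column names follow the effective_ts/knowledge_ts convention; the revision behavior shown is an assumption about append-only tables, not VectorFin-specific logic):

```python
import pandas as pd

# Synthetic bitemporal rows: the 2023-Q4 score was later revised on 2024-02-01
df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "AAPL"],
    "fiscal_period": ["2023-Q4", "2023-Q4", "2024-Q1"],
    "score": [0.50, 0.55, 0.60],
    "knowledge_ts": ["2024-01-15", "2024-02-01", "2024-04-20"],
})

cutoff = "2024-01-20"  # reconstruct what was known as of this date

asof = (
    df[df["knowledge_ts"] <= cutoff]                 # drop rows not yet known
    .sort_values("knowledge_ts")
    .groupby(["ticker", "fiscal_period"], as_index=False)
    .last()                                          # keep latest version per key
)
print(asof)
# Only the original 0.50 score for 2023-Q4 survives; the revision and 2024-Q1 row are unseen
```

This is what makes the tables safe for backtesting: filtering on knowledge_ts and deduplicating per key prevents look-ahead bias from later revisions.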
Available Tables
All 7 VectorFin data tables — bitemporal (effective_ts + knowledge_ts), append-only, nightly updates.
embeddings.transcripts: Earnings call chunk embeddings
python
catalog.load_table("embeddings.transcripts").scan(row_filter=EqualTo("ticker", "AAPL")).to_pandas()

embeddings.filings: SEC filing section embeddings
python
catalog.load_table("embeddings.filings").scan(row_filter=EqualTo("filing_type", "10-K")).to_pandas()

signals.whystock_score: Composite quant score
python
catalog.load_table("signals.whystock_score").scan().to_pandas().sort_values("score", ascending=False)

signals.regime: Market regime classification
python
catalog.load_table("signals.regime").scan(row_filter=EqualTo("ticker", "NVDA")).to_pandas()

signals.volatility: GARCH volatility forecasts
python
catalog.load_table("signals.volatility").scan().to_pandas()

signals.sentiment_drift: Earnings sentiment drift
python
catalog.load_table("signals.sentiment_drift").scan().to_pandas()

signals.anomaly: Anomaly detection scores
python
# requires: from pyiceberg.expressions import GreaterThan
catalog.load_table("signals.anomaly").scan(row_filter=GreaterThan("anomaly_score", 0.8)).to_pandas()
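With an embedding matrix like the `E` built in step 4, a nearest-neighbor query is a few lines of numpy. A hedged sketch with random vectors standing in for real transcript embeddings (dimension and count are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(100, 8))   # stand-in for stacked transcript embeddings
q = rng.normal(size=8)          # stand-in for a query embedding

# Cosine similarity of the query against every row of E
sims = (E @ q) / (np.linalg.norm(E, axis=1) * np.linalg.norm(q))

# Indices of the 5 most similar chunks, best first
top5 = np.argsort(-sims)[:5]
print(top5, sims[top5])
```

For production-scale retrieval over all 5K+ tickers, the same scores can feed an approximate-nearest-neighbor index; the brute-force version above is fine for a single ticker's chunks.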
Start querying in 5 minutes
Sign up for VectorFin and get immediate API access.