VectorFin Apache Iceberg Catalog
Connect to the VectorFin Polaris REST catalog with pyiceberg or any other Iceberg client. Available on Pro and Enterprise plans: catalog authentication is self-serve, while scanning the underlying data files requires a one-time GCS grant from support.
Prerequisites
Connection Guide
Provision a Polaris credential (Pro / Enterprise)
Sign in to the dashboard, open Data Access, and click Provision. You receive a client_id and a client_secret. The secret is shown only once, so store it in your secret manager immediately.
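pyiceberg's REST catalog takes the pair as a single credential property of the form client_id:client_secret, built from the VF_POLARIS_CLIENT_ID and VF_POLARIS_CLIENT_SECRET variables exported in the commands that follow. A minimal sketch of a helper that assembles that string and fails fast with a readable message when either variable is missing or empty; the name polaris_credential is hypothetical and not part of pyiceberg or any VectorFin SDK.

```python
import os
from typing import Mapping


def polaris_credential(env: Mapping[str, str] = os.environ) -> str:
    """Return "client_id:client_secret" for pyiceberg's `credential` property.

    Hypothetical helper: reads the two exported variables and raises a clear
    error instead of sending an empty or half-filled credential.
    """
    required = ("VF_POLARIS_CLIENT_ID", "VF_POLARIS_CLIENT_SECRET")
    values = {name: env.get(name, "").strip() for name in required}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Set {', '.join(missing)} before connecting to the catalog")
    return f"{values['VF_POLARIS_CLIENT_ID']}:{values['VF_POLARIS_CLIENT_SECRET']}"
```

With this in place, the connection properties can use "credential": polaris_credential() instead of indexing os.environ directly.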
# From app.vectorfinancials.com/dashboard/data-access:
# Catalog URI: https://catalog.vectorfinancials.com/api/catalog
# Warehouse: vectorfin
# Client ID: <shown in dashboard>
# Client Secret: <shown ONCE — copy now>
export VF_POLARIS_CLIENT_ID="..."
export VF_POLARIS_CLIENT_SECRET="..."
Connect pyiceberg to the Polaris REST catalog
Catalog metadata operations (list_namespaces, list_tables, load_table) go over the REST API alone and need no GCS access.
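Because these calls never touch GCS, they are a cheap way to confirm that a new credential works before the data grant arrives. A minimal sketch that enumerates every table under the vectorfin namespace using only list_namespaces and list_tables; the function name list_all_tables is hypothetical, and it assumes, as the example output below suggests, that list_namespaces(parent) returns the full identifiers of the parent's child namespaces.

```python
def list_all_tables(catalog, root=("vectorfin",)):
    """Return sorted table identifiers under `root` using metadata calls only.

    `catalog` is any object with pyiceberg-style list_namespaces(namespace)
    and list_tables(namespace) methods, e.g. the catalog from load_catalog.
    """
    found = []
    seen = set()
    pending = [tuple(root)]
    while pending:  # iterative depth-first walk of the namespace tree
        namespace = pending.pop()
        if namespace in seen:
            continue  # guard against a catalog that reports a namespace twice
        seen.add(namespace)
        found.extend(tuple(ident) for ident in catalog.list_tables(namespace))
        # Child namespaces, e.g. ('vectorfin', 'signals') under ('vectorfin',)
        pending.extend(tuple(child) for child in catalog.list_namespaces(namespace))
    return sorted(found)
```

Once connected, for ident in list_all_tables(catalog): print(".".join(ident)) should list the data tables without any GCS access.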
# pip install "pyiceberg[pyarrow,gcsfs]" pandas
import os
from pyiceberg.catalog import load_catalog
catalog = load_catalog(
"vectorfin",
**{
"type": "rest",
"uri": "https://catalog.vectorfinancials.com/api/catalog",
"warehouse": "vectorfin",
"credential": f"{os.environ['VF_POLARIS_CLIENT_ID']}:{os.environ['VF_POLARIS_CLIENT_SECRET']}",
"scope": "PRINCIPAL_ROLE:ALL",
},
)
print(catalog.list_namespaces(("vectorfin",)))
# [('vectorfin', 'embeddings'), ('vectorfin', 'signals')]
print(catalog.list_tables(("vectorfin", "signals")))
# identifiers for whystock_score, regime, volatility, sentiment_drift, anomaly
Request GCS read access for your service account
pyiceberg reads Parquet data files and Iceberg metadata files directly from gs://vectorfinancials-data/warehouse/vectorfin/. We grant roles/storage.objectViewer, scoped to that prefix, to a service account you control: open a ticket with the service account's email address from your GCP project. Turnaround is one business day. We do not issue HMAC keys; access is an IAM grant to your own identity, so the credential stays under your control and auditable on your side.
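Because the grant covers only that prefix, any object outside it still returns a permission error after the grant is active. A minimal sketch of a check to run when a scan fails with a 403: it flags gs:// URIs that fall outside gs://vectorfinancials-data/warehouse/vectorfin/. The helper name outside_grant is hypothetical; a natural input is table.metadata_location, which pyiceberg exposes on a loaded table.

```python
from urllib.parse import urlsplit

GRANT_BUCKET = "vectorfinancials-data"
GRANT_PREFIX = "warehouse/vectorfin/"


def outside_grant(uris):
    """Return the URIs that the prefix-scoped objectViewer grant would not cover."""
    uncovered = []
    for uri in uris:
        # urlsplit("gs://bucket/a/b") -> scheme "gs", netloc "bucket", path "/a/b"
        parts = urlsplit(uri)
        key = parts.path.lstrip("/")
        covered = (
            parts.scheme == "gs"
            and parts.netloc == GRANT_BUCKET
            and key.startswith(GRANT_PREFIX)
        )
        if not covered:
            uncovered.append(uri)
    return uncovered
```

For example, outside_grant([table.metadata_location]) should return an empty list for a VectorFin table; a non-empty result means the path itself is not covered by the grant.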
# 1. In your own GCP project, create a service account (or pick an existing one).
# 2. Email the SA email address (e.g. iceberg@your-proj.iam.gserviceaccount.com)
# to support@vectorfinancials.com — subject "Iceberg GCS grant <org>".
# 3. We grant roles/storage.objectViewer to your SA, scoped to:
# gs://vectorfinancials-data/warehouse/vectorfin/*
# 4. Authenticate locally as that SA before running pyiceberg scans
#    (impersonation requires roles/iam.serviceAccountTokenCreator on the SA):
gcloud auth application-default login --impersonate-service-account=iceberg@your-proj.iam.gserviceaccount.com
Load a table and scan with pushdown filters
Once the GCS grant is active, table.scan() reads Parquet directly from GCS via the metadata location Polaris returns. Row filters and column projection are pushed down to the file scan.
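Iceberg manifests record per-file lower and upper bounds for columns, and pyiceberg checks the row filter against them (along with partition values) to skip files that cannot contain matching rows before any Parquet is downloaded. The toy sketch below illustrates that idea for the AAPL / 2024-01-01 filter used here; the file names and bounds are made up, and this is not pyiceberg's actual evaluator.

```python
from datetime import date

# Made-up per-file (lower, upper) bounds, in the spirit of Iceberg manifest stats.
FILE_BOUNDS = {
    "data-001.parquet": {"ticker": ("AAPL", "AMZN"), "date": (date(2023, 1, 3), date(2023, 12, 29))},
    "data-002.parquet": {"ticker": ("AAPL", "MSFT"), "date": (date(2024, 1, 2), date(2024, 6, 28))},
    "data-003.parquet": {"ticker": ("NVDA", "TSLA"), "date": (date(2024, 1, 2), date(2024, 6, 28))},
}


def might_match(bounds, ticker, min_date):
    """Inclusive check for ticker == `ticker` AND date >= `min_date`.

    False means no row in the file can satisfy the filter, so it is skipped;
    True only means the file must be read and filtered row by row.
    """
    ticker_lo, ticker_hi = bounds["ticker"]
    _, date_hi = bounds["date"]
    return ticker_lo <= ticker <= ticker_hi and date_hi >= min_date


def files_to_read(file_bounds, ticker, min_date):
    return [name for name, b in file_bounds.items() if might_match(b, ticker, min_date)]


print(files_to_read(FILE_BOUNDS, "AAPL", date(2024, 1, 1)))  # ['data-002.parquet']
```

Column projection works along the other axis: selected_fields limits which Parquet columns are decoded in the files that survive this pruning.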
from pyiceberg.expressions import And, EqualTo, GreaterThanOrEqual
table = catalog.load_table(("vectorfin", "signals", "whystock_score"))
df = table.scan(
row_filter=And(
EqualTo("ticker", "AAPL"),
GreaterThanOrEqual("date", "2024-01-01"),
),
selected_fields=("ticker", "date", "score", "components"),
).to_pandas()
print(df.head())
Bitemporal time-travel via knowledge_ts
Every VectorFin table is append-only, with effective_ts (when the fact applied) and knowledge_ts (when we learned it). Keeping only rows with knowledge_ts at or before a given time, then taking the latest version of each row, gives a true point-in-time view, so backtests carry no lookahead bias.
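The point-in-time rule is: drop rows whose knowledge_ts is after the as-of time, then keep the most recent remaining version of each (ticker, date). A minimal pure-Python sketch of that rule, using made-up rows in which one AAPL score is published and then restated two days later; as_of is a hypothetical helper, and the pandas code below applies the same logic to a real scan.

```python
from datetime import datetime

# Made-up rows: the 2024-01-02 AAPL score is published, then restated on
# 2024-01-04. Append-only storage keeps both versions.
ROWS = [
    {"ticker": "AAPL", "date": "2024-01-02", "score": 61.0, "knowledge_ts": datetime(2024, 1, 2, 22)},
    {"ticker": "AAPL", "date": "2024-01-03", "score": 63.0, "knowledge_ts": datetime(2024, 1, 3, 22)},
    {"ticker": "AAPL", "date": "2024-01-02", "score": 58.5, "knowledge_ts": datetime(2024, 1, 4, 22)},
]


def as_of(rows, as_of_ts):
    """Latest known score per (ticker, date), using only rows known by as_of_ts."""
    latest = {}
    for row in rows:
        if row["knowledge_ts"] > as_of_ts:
            continue  # learned later: using it would leak future information
        key = (row["ticker"], row["date"])
        if key not in latest or row["knowledge_ts"] > latest[key]["knowledge_ts"]:
            latest[key] = row
    return {key: latest[key]["score"] for key in sorted(latest)}


print(as_of(ROWS, datetime(2024, 1, 3, 23)))  # {('AAPL', '2024-01-02'): 61.0, ('AAPL', '2024-01-03'): 63.0}
print(as_of(ROWS, datetime(2024, 1, 5)))      # {('AAPL', '2024-01-02'): 58.5, ('AAPL', '2024-01-03'): 63.0}
```

A backtest that rebalances late on 2024-01-03 sees 61.0, the value it could actually have traded on, not the restated 58.5.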
from pyiceberg.expressions import And, EqualTo, LessThanOrEqual
from datetime import datetime
# What did we know about AAPL signals as of 2024-01-01?
df = table.scan(
row_filter=And(
EqualTo("ticker", "AAPL"),
LessThanOrEqual("knowledge_ts", datetime(2024, 1, 1).isoformat()),
),
selected_fields=("ticker", "date", "score", "effective_ts", "knowledge_ts"),
).to_pandas()
# Latest known row per (ticker, date)
df = (df.sort_values("knowledge_ts")
.drop_duplicates(["ticker", "date"], keep="last")
.sort_values("date"))
print(df.tail())
Available Tables
All seven VectorFin data tables are bitemporal (effective_ts + knowledge_ts), append-only, and updated nightly.
Examples assume the catalog object from the connection step and:
from pyiceberg.expressions import EqualTo, GreaterThanOrEqual
vectorfin.embeddings.transcripts: Earnings call chunk embeddings (768-dim)
catalog.load_table(("vectorfin", "embeddings", "transcripts")).scan(row_filter=EqualTo("ticker", "AAPL")).to_pandas()
vectorfin.embeddings.filings: SEC filing section embeddings
catalog.load_table(("vectorfin", "embeddings", "filings")).scan(row_filter=EqualTo("filing_type", "10-K")).to_pandas()
vectorfin.signals.whystock_score: Composite quant score (0–100)
catalog.load_table(("vectorfin", "signals", "whystock_score")).scan().to_pandas()
vectorfin.signals.regime: Market regime (trending/ranging/volatile)
catalog.load_table(("vectorfin", "signals", "regime")).scan(row_filter=EqualTo("ticker", "NVDA")).to_pandas()
vectorfin.signals.volatility: GARCH volatility forecasts (1d/5d/21d)
catalog.load_table(("vectorfin", "signals", "volatility")).scan().to_pandas()
vectorfin.signals.sentiment_drift: Earnings sentiment drift vectors
catalog.load_table(("vectorfin", "signals", "sentiment_drift")).scan().to_pandas()
vectorfin.signals.anomaly: Anomaly scores and flags
catalog.load_table(("vectorfin", "signals", "anomaly")).scan(row_filter=GreaterThanOrEqual("anomaly_score", 0.8)).to_pandas()
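Because the tables are append-only and refreshed nightly, a downstream job does not need to re-read a whole table each night: it can keep a knowledge_ts watermark and pull only rows learned since the last run. A minimal sketch of the watermark bookkeeping, assuming each nightly batch is published at once and no row arrives later with a knowledge_ts at or below a watermark already consumed; new_since is a hypothetical helper. Against the catalog, the same filter would be written as row_filter=GreaterThan("knowledge_ts", watermark.isoformat()), with GreaterThan from pyiceberg.expressions.

```python
from datetime import datetime


def new_since(rows, watermark=None):
    """Return (rows learned after `watermark`, advanced watermark).

    watermark=None means a first, full load. Under the assumption above,
    append-only storage means each row is picked up exactly once.
    """
    fresh = [r for r in rows if watermark is None or r["knowledge_ts"] > watermark]
    next_mark = max((r["knowledge_ts"] for r in fresh), default=watermark)
    return fresh, next_mark


# Made-up nightly appends
table_rows = [{"ticker": "NVDA", "knowledge_ts": datetime(2024, 1, 1, 22)}]
batch, mark = new_since(table_rows)        # first run: full load
table_rows.append({"ticker": "NVDA", "knowledge_ts": datetime(2024, 1, 2, 22)})
batch, mark = new_since(table_rows, mark)  # next night: only the new row
print(len(batch), mark)  # 1 2024-01-02 22:00:00
```

Persisting mark between runs (in a file or a state table) is left to the caller.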