Self-serve setup + 1× support grant

VectorFin Apache Iceberg Catalog

Connect to the VectorFin Polaris REST catalog with pyiceberg or any Iceberg client. Available on the Pro and Enterprise plans. Catalog auth is self-serve; raw data scans require a one-time GCS grant from support.

Setup time: 15 min
Iceberg tables: 7
US tickers: 5K+
Updates: Nightly

Prerequisites

VectorFin Pro or Enterprise plan
API key from app.vectorfinancials.com
An Iceberg client such as pyiceberg and, for raw data scans, a GCP service account you control

Connection Guide

1. Provision a Polaris credential (Pro / Enterprise)

Sign in to the dashboard, open Data Access, and click Provision. You'll receive a client_id and a client_secret — the secret is shown ONCE, so paste it into your secret manager immediately.

bash
# From app.vectorfinancials.com/dashboard/data-access:
#   Catalog URI:  https://catalog.vectorfinancials.com/api/catalog
#   Warehouse:    vectorfin
#   Client ID:    <shown in dashboard>
#   Client Secret: <shown ONCE — copy now>
export VF_POLARIS_CLIENT_ID="..."
export VF_POLARIS_CLIENT_SECRET="..."
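
Because the secret is shown only once, it can help to fail fast when either variable is missing rather than discover it at connection time. The following is a minimal sketch, not part of the VectorFin tooling: the helper name `polaris_credential` is hypothetical, and it only assumes the two environment variable names used above.

```python
import os


def polaris_credential(env=None):
    """Return "client_id:client_secret", or raise if either variable is unset.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    required = ("VF_POLARIS_CLIENT_ID", "VF_POLARIS_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return f"{env['VF_POLARIS_CLIENT_ID']}:{env['VF_POLARIS_CLIENT_SECRET']}"
```

The returned string has the same `client_id:client_secret` shape that the `credential` property of the catalog configuration expects.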

2. Connect pyiceberg to the Polaris REST catalog

Catalog metadata operations — list_namespaces, list_tables, load_table — work over the REST API alone. No GCS access required for these.

python
# pip install "pyiceberg[gcs]" pandas
from pyiceberg.catalog import load_catalog
import os

catalog = load_catalog(
    "vectorfin",
    **{
        "type": "rest",
        "uri": "https://catalog.vectorfinancials.com/api/catalog",
        "warehouse": "vectorfin",
        "credential": f"{os.environ['VF_POLARIS_CLIENT_ID']}:{os.environ['VF_POLARIS_CLIENT_SECRET']}",
        "scope": "PRINCIPAL_ROLE:ALL",
    },
)

print(catalog.list_namespaces(("vectorfin",)))
# [('vectorfin', 'embeddings'), ('vectorfin', 'signals')]

print(catalog.list_tables(("vectorfin", "signals")))
# whystock_score, regime, volatility, sentiment_drift, anomaly
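
Since these metadata calls need no GCS access, they are a convenient way to confirm that the credential works before requesting the storage grant. The sketch below walks the namespaces and flattens the result into dotted table names. `table_inventory` and `StubCatalog` are hypothetical names introduced for illustration; the stub only mimics the tuple shapes shown in the output above so the example runs without credentials.

```python
def table_inventory(catalog, root=("vectorfin",)):
    """Return sorted dotted table names under `root`, using metadata calls only."""
    names = []
    for namespace in catalog.list_namespaces(root):
        for identifier in catalog.list_tables(namespace):
            # Identifiers are tuples such as ("vectorfin", "signals", "regime").
            names.append(".".join(identifier))
    return sorted(names)


class StubCatalog:
    """Stand-in for a real catalog; holds a small made-up subset of tables."""

    _tables = {
        ("vectorfin", "embeddings"): ["transcripts", "filings"],
        ("vectorfin", "signals"): ["regime", "anomaly"],
    }

    def list_namespaces(self, parent):
        parent = tuple(parent)
        return [ns for ns in self._tables if ns[: len(parent)] == parent]

    def list_tables(self, namespace):
        namespace = tuple(namespace)
        return [namespace + (name,) for name in self._tables[namespace]]


inventory = table_inventory(StubCatalog())
for name in inventory:
    print(name)
```

Passing the real `catalog` object from the connection step instead of `StubCatalog()` should list the live tables, assuming the catalog returns identifiers in the same tuple form.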

3. Request GCS read access for your service account

pyiceberg needs to read Parquet and metadata files from gs://vectorfinancials-data/warehouse/vectorfin/. We grant prefix-scoped roles/storage.objectViewer to a service account you control: open a ticket with support that includes the SA email address from your GCP project. Turnaround is 1 business day. (We do not vend HMAC keys; we use IAM grants so the audit trail stays on your side.)

bash
# 1. In your own GCP project, create a service account (or pick an existing one).
# 2. Email the SA email address (e.g. iceberg@your-proj.iam.gserviceaccount.com)
#    to support@vectorfinancials.com — subject "Iceberg GCS grant <org>".
# 3. We run the equivalent of polaris/grant-snowflake-sa.sh for your SA, scoped to:
#       gs://vectorfinancials-data/warehouse/vectorfin/*
# 4. Authenticate locally as that SA before running pyiceberg scans:
gcloud auth application-default login --impersonate-service-account=iceberg@your-proj.iam.gserviceaccount.com
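
Because the grant is scoped to a single prefix, a permission error during a scan may simply mean that a file lives outside that prefix. The following is a rough diagnostic sketch using only the standard library. `within_grant` is a hypothetical helper, and the object paths in the usage lines are made-up examples rather than real file names.

```python
from urllib.parse import urlparse

# Prefix covered by the support grant, as stated in the step above.
GRANTED_PREFIX = "gs://vectorfinancials-data/warehouse/vectorfin/"


def within_grant(uri, prefix=GRANTED_PREFIX):
    """Return True if `uri` is a gs:// object under the granted bucket and path."""
    target, granted = urlparse(uri), urlparse(prefix)
    return (
        target.scheme == "gs"
        and target.netloc == granted.netloc          # same bucket
        and target.path.startswith(granted.path)     # same path prefix
    )


# Hypothetical paths, for illustration only.
print(within_grant("gs://vectorfinancials-data/warehouse/vectorfin/signals/regime/data/part-0.parquet"))
print(within_grant("gs://vectorfinancials-data/warehouse/other/part-0.parquet"))
```

With a loaded pyiceberg table, the same check could be applied to `table.metadata_location` to see whether the metadata file itself falls under the granted prefix.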

4. Load a table and scan with pushdown filters

Once GCS access is granted, table.scan() reads Parquet directly from GCS via the Polaris-issued metadata path. Filters and column projection are pushed down, so pyiceberg can skip data files that cannot match and reads only the selected columns.

python
from pyiceberg.expressions import And, EqualTo, GreaterThanOrEqual

table = catalog.load_table(("vectorfin", "signals", "whystock_score"))

df = table.scan(
    row_filter=And(
        EqualTo("ticker", "AAPL"),
        GreaterThanOrEqual("date", "2024-01-01"),
    ),
    selected_fields=("ticker", "date", "score", "components"),
).to_pandas()

print(df.head())
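
pyiceberg's `row_filter` also accepts a string expression in place of expression objects, which can be handier when the filter is assembled from parameters. The sketch below builds such a string for the same ticker and start-date filter. `ticker_since_filter` is a hypothetical helper, and the exact string syntax accepted (`==`, `>=`, `AND`, single-quoted literals) is an assumption worth confirming against the pyiceberg version you have installed.

```python
from datetime import date


def ticker_since_filter(ticker, start):
    """Build a string row filter: one ticker, rows on or after `start`."""
    # Refuse quotes rather than guess at the parser's escaping rules.
    if "'" in ticker:
        raise ValueError("ticker must not contain a single quote")
    return f"ticker == '{ticker}' AND date >= '{start.isoformat()}'"


row_filter = ticker_since_filter("AAPL", date(2024, 1, 1))
print(row_filter)
```

The result would be passed as `table.scan(row_filter=row_filter, selected_fields=(...))`, equivalent in intent to the `And(EqualTo(...), GreaterThanOrEqual(...))` form above.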

5. Bitemporal time-travel via knowledge_ts

Every VectorFin table is append-only with effective_ts (when the fact applied) and knowledge_ts (when we learned it). Filtering on knowledge_ts gives a true point-in-time view — no lookahead bias in backtests.

python
from pyiceberg.expressions import And, EqualTo, LessThanOrEqual
from datetime import datetime

# What did we know about AAPL signals as of 2024-01-01?
df = table.scan(
    row_filter=And(
        EqualTo("ticker", "AAPL"),
        LessThanOrEqual("knowledge_ts", datetime(2024, 1, 1).isoformat()),
    ),
    selected_fields=("ticker", "date", "score", "effective_ts", "knowledge_ts"),
).to_pandas()

# Latest known row per (ticker, date)
df = (df.sort_values("knowledge_ts")
        .drop_duplicates(["ticker", "date"], keep="last")
        .sort_values("date"))
print(df.tail())
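
To make the point-in-time logic concrete without touching the catalog, here is a small pure-Python sketch of the same two steps: drop rows learned after the cutoff, then keep the most recently learned row per (ticker, date). The function name `as_of` and the sample rows are made up for illustration and are not VectorFin data.

```python
from datetime import datetime


def as_of(rows, cutoff):
    """Latest known row per (ticker, date) among rows with knowledge_ts <= cutoff."""
    latest = {}
    for row in rows:
        if row["knowledge_ts"] > cutoff:
            continue  # not yet known at the cutoff, so excluded (no lookahead)
        key = (row["ticker"], row["date"])
        if key not in latest or row["knowledge_ts"] > latest[key]["knowledge_ts"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["date"])


# Made-up rows: the 2023-12-29 score is revised on 2024-01-05.
rows = [
    {"ticker": "AAPL", "date": "2023-12-28", "score": 58, "knowledge_ts": datetime(2023, 12, 29)},
    {"ticker": "AAPL", "date": "2023-12-29", "score": 61, "knowledge_ts": datetime(2023, 12, 30)},
    {"ticker": "AAPL", "date": "2023-12-29", "score": 64, "knowledge_ts": datetime(2024, 1, 5)},
]

snapshot = as_of(rows, datetime(2024, 1, 1))
print([(r["date"], r["score"]) for r in snapshot])
```

With a cutoff of 2024-01-01 the revision learned on 2024-01-05 is invisible, so the snapshot keeps the original score of 61; moving the cutoff past the revision date would return 64 instead.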

Available Tables

All 7 VectorFin data tables — bitemporal (effective_ts + knowledge_ts), append-only, nightly updates.

vectorfin.embeddings.transcripts: Earnings call chunk embeddings (768-dim)
python
catalog.load_table(("vectorfin","embeddings","transcripts")).scan(row_filter=EqualTo("ticker","AAPL")).to_pandas()

vectorfin.embeddings.filings: SEC filing section embeddings
python
catalog.load_table(("vectorfin","embeddings","filings")).scan(row_filter=EqualTo("filing_type","10-K")).to_pandas()

vectorfin.signals.whystock_score: Composite quant score (0–100)
python
catalog.load_table(("vectorfin","signals","whystock_score")).scan().to_pandas()

vectorfin.signals.regime: Market regime (trending/ranging/volatile)
python
catalog.load_table(("vectorfin","signals","regime")).scan(row_filter=EqualTo("ticker","NVDA")).to_pandas()

vectorfin.signals.volatility: GARCH volatility forecasts (1d/5d/21d)
python
catalog.load_table(("vectorfin","signals","volatility")).scan().to_pandas()

vectorfin.signals.sentiment_drift: Earnings sentiment drift vectors
python
catalog.load_table(("vectorfin","signals","sentiment_drift")).scan().to_pandas()

vectorfin.signals.anomaly: Anomaly scores and flags
python
catalog.load_table(("vectorfin","signals","anomaly")).scan(row_filter=GreaterThanOrEqual("anomaly_score",0.8)).to_pandas()
