+
VectorFin

Connect BigQuery to VectorFin Iceberg Data

Query VectorFin Iceberg tables directly from BigQuery with external table registration — no ETL, no copies.

5 min
Setup time
7
Iceberg tables
5K+
US tickers
Nightly
Updates

Prerequisites

📋VectorFin Pro plan
🔑API key from app.vectorfinancials.com
☁️BigQuery account

Connection Guide

1

Get API access and service account

Sign up at app.vectorfinancials.com (Pro tier required). Create a BigQuery Connection with your GCS service account that has read access to gs://vectorfinancials-data/.

bash
# Create BigQuery connection to GCS
bq mk --connection \
  --connection_type=CLOUD_RESOURCE \
  --location=US \
  your-project:US.vf_connection

# Grant storage access to the connection service account
gsutil iam ch serviceAccount:SERVICE_ACCOUNT@developer.gserviceaccount.com:roles/storage.objectViewer \
  gs://vectorfinancials-data
2

Register external Iceberg table

Create external tables pointing to VectorFin's GCS Iceberg warehouse. Run once; re-run after each nightly pipeline update to refresh metadata.

sql
-- Register transcript embeddings table
CREATE OR REPLACE EXTERNAL TABLE `your-project.your_dataset.vf_transcripts`
WITH CONNECTION `us-central1.vf_connection`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://vectorfinancials-data/warehouse/embeddings/transcripts/metadata/LATEST.metadata.json']
);

-- Register whystock score signals table
CREATE OR REPLACE EXTERNAL TABLE `your-project.your_dataset.vf_whystock_score`
WITH CONNECTION `us-central1.vf_connection`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://vectorfinancials-data/warehouse/signals/whystock_score/metadata/LATEST.metadata.json']
);
3

Verify with a count query

Run a simple COUNT to confirm the table is accessible and data is flowing.

sql
SELECT COUNT(*) as total_rows, COUNT(DISTINCT ticker) as tickers
FROM `your-project.your_dataset.vf_transcripts`;
-- Expected: ~500K rows, 5000+ tickers
4

Query embeddings and signals

Run your first real query — semantic embeddings for AAPL and a bitemporal point-in-time signal lookup.

sql
-- AAPL semantic embeddings for 2024
SELECT ticker, fiscal_period, chunk_idx, embedding
FROM `your-project.your_dataset.vf_transcripts`
WHERE ticker = 'AAPL' AND fiscal_period LIKE '2024%'
ORDER BY fiscal_period, chunk_idx;

-- Bitemporal point-in-time query (as-of Jan 1 2024)
SELECT ticker, date, score, components
FROM `your-project.your_dataset.vf_whystock_score`
WHERE ticker = 'AAPL'
  AND knowledge_ts <= TIMESTAMP('2024-01-01')
ORDER BY effective_ts DESC
LIMIT 1;

Available Tables

All 7 VectorFin data tables — bitemporal (effective_ts + knowledge_ts), append-only, nightly updates.

embeddings/transcriptsEarnings call chunk embeddings (768-dim)
sql
SELECT ticker, fiscal_period, chunk_idx, embedding FROM vf_transcripts WHERE ticker = 'NVDA' LIMIT 10
embeddings/filingsSEC filing section embeddings
sql
SELECT ticker, filing_type, section, embedding FROM vf_filings WHERE ticker = 'MSFT' AND filing_type = '10-K'
signals/whystock_scoreComposite quant score (0–100)
sql
SELECT ticker, date, score FROM vf_whystock_score WHERE date >= '2024-01-01' ORDER BY score DESC LIMIT 20
signals/regimeMarket regime (trending/ranging/volatile)
sql
SELECT ticker, date, regime, confidence FROM vf_regime WHERE ticker = 'AAPL'
signals/volatilityGARCH volatility forecasts (1d/5d/21d)
sql
SELECT ticker, date, garch_vol_1d, garch_vol_21d FROM vf_volatility WHERE ticker = 'TSLA'
signals/sentiment_driftEarnings sentiment drift vectors
sql
SELECT ticker, fiscal_period, drift_vector FROM vf_sentiment_drift WHERE ticker = 'AMZN'
signals/anomalyAnomaly scores and flags
sql
SELECT ticker, date, anomaly_score, flags FROM vf_anomaly WHERE anomaly_score > 0.8

Start querying in 5 minutes

Sign up for VectorFin and get immediate API access.