
VectorFin dbt Integration

Model VectorFin signals in dbt: define transforms on top of Iceberg tables, document lineage, and schedule with dbt Cloud.

20 min setup time · 7 Iceberg tables · 5K+ US tickers · Nightly updates

Prerequisites

📋 VectorFin Pro plan
🔑 API key from app.vectorfinancials.com
☁️ dbt account

Connection Guide

1. Configure dbt profile for BigQuery or Snowflake

Set up your dbt profiles.yml to connect to BigQuery (or Snowflake) where VectorFin tables are registered.

yaml
# ~/.dbt/profiles.yml
vectorfin_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: your-gcp-project
      dataset: vectorfin_models
      keyfile: /path/to/service-account.json
      threads: 4
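
If your VectorFin tables are registered in Snowflake instead, only the connection fields change. A minimal sketch; the account, role, warehouse, and database names below are placeholders, not values supplied by VectorFin:

yaml
# ~/.dbt/profiles.yml (Snowflake variant; account/role/warehouse are placeholders)
vectorfin_project:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your_account_locator
      user: your_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      warehouse: TRANSFORMING
      database: VECTORFIN
      schema: VECTORFIN_MODELS
      threads: 4

Reading the password from an environment variable keeps credentials out of version control.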
2. Define VectorFin sources

Declare VectorFin Iceberg tables as dbt sources in your project.

yaml
# models/sources.yml
version: 2

sources:
  - name: vectorfin
    database: your-gcp-project
    schema: vectorfin_raw
    tables:
      - name: vf_transcripts
        description: "Earnings call transcript embeddings (768-dim)"
        columns:
          - name: ticker
            description: "Equity ticker symbol"
          - name: fiscal_period
          - name: chunk_idx
          - name: embedding
            description: "768-dim float array from Gemini text-embedding-004"
          - name: effective_ts
          - name: knowledge_ts
      - name: vf_whystock_score
        description: "Composite quant score 0-100"
3. Create a signal model

Build a dbt model that transforms raw VectorFin signals into analysis-ready data.

sql
-- models/signals/daily_top_signals.sql
{{
  config(
    materialized='table',
    partition_by={'field': 'date', 'data_type': 'date'},
    cluster_by=['ticker']
  )
}}

with base as (
    select
        ticker,
        date,
        score,
        cast(json_extract_scalar(components, '$.regime_score') as float64) as regime_score,
        cast(json_extract_scalar(components, '$.volatility_score') as float64) as vol_score,
        cast(json_extract_scalar(components, '$.sentiment_score') as float64) as sentiment_score,
        effective_ts,
        knowledge_ts
    from {{ source('vectorfin', 'vf_whystock_score') }}
    where date >= '2024-01-01'
),

ranked as (
    select *,
        rank() over (partition by date order by score desc) as daily_rank
    from base
)

select * from ranked
where daily_rank <= 100
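
Because the source tables are bitemporal and append-only, a score can be restated after the fact. To reproduce what was known as of a given time, filter on knowledge_ts and keep only the latest restatement per key. A sketch; the cutoff timestamp is illustrative:

sql
-- models/signals/score_as_of.sql
-- Point-in-time view: only rows known before the cutoff,
-- deduplicated to the most recent restatement per (ticker, date)
with known as (
    select *
    from {{ source('vectorfin', 'vf_whystock_score') }}
    where knowledge_ts <= timestamp('2024-06-30')  -- illustrative cutoff
),

latest as (
    select *,
        row_number() over (
            partition by ticker, date
            order by knowledge_ts desc
        ) as rn
    from known
)

select ticker, date, score, effective_ts, knowledge_ts
from latest
where rn = 1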
4. Run and test your models

Execute dbt models and run data quality tests on VectorFin signal data.

bash
# Run all models
dbt run --select signals/

# Run tests (not_null, accepted_values, etc.)
dbt test --select signals/

# Generate and serve docs
dbt docs generate
dbt docs serve

# Schedule in dbt Cloud: set up a job that runs after
# VectorFin nightly pipeline completes (~2am UTC)
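
For `dbt test` to have anything to check, declare data tests in a schema file next to the model. A minimal sketch for the model built in step 3, using only the built-in not_null test:

yaml
# models/signals/schema.yml
version: 2

models:
  - name: daily_top_signals
    description: "Top 100 VectorFin signals per trading day"
    columns:
      - name: ticker
        tests:
          - not_null
      - name: score
        tests:
          - not_null
      - name: daily_rank
        tests:
          - not_null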

Available Tables

All 7 VectorFin data tables are bitemporal (effective_ts + knowledge_ts), append-only, and updated nightly. Example sources and models:

source(vectorfin, vf_transcripts): Raw transcript embeddings source
sql
select * from {{ source('vectorfin', 'vf_transcripts') }} where ticker = 'AAPL'
source(vectorfin, vf_whystock_score): Raw quant score source
sql
select * from {{ source('vectorfin', 'vf_whystock_score') }} where date >= '2024-01-01'
ref(daily_top_signals): Top 100 signals per day model
sql
select * from {{ ref('daily_top_signals') }} where date = current_date()
ref(regime_summary): Regime classification summary
sql
select ticker, regime, count(*) from {{ ref('regime_summary') }} group by 1, 2
ref(volatility_surface): Multi-horizon volatility model
sql
select ticker, date, garch_vol_1d, garch_vol_21d from {{ ref('volatility_surface') }}
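
These relations compose like any other dbt model. For example, a sketch joining the score and volatility models on the columns shown above (assuming volatility_surface shares the ticker/date grain):

sql
-- Ad-hoc analysis: do top-ranked names carry elevated short-dated vol?
select
    s.date,
    s.ticker,
    s.score,
    v.garch_vol_1d,
    v.garch_vol_21d,
    v.garch_vol_1d - v.garch_vol_21d as vol_term_spread
from {{ ref('daily_top_signals') }} s
join {{ ref('volatility_surface') }} v
  on s.ticker = v.ticker
 and s.date = v.date
where s.daily_rank <= 20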

Start querying in 20 minutes

Sign up for VectorFin and get immediate API access.