RAG

RAG QUICKSTART

Building a financial RAG pipeline on VectorFin

VectorFin is a drop-in retrieval layer for Retrieval-Augmented Generation (RAG). The API serves pre-computed embedding vectors (not raw text, and no server-side search): discover the available quarters with GET /v1/transcripts/{ticker}, pull the 768-dim vectors with GET /v1/embeddings/{ticker}?fiscal_period=..., then index them in your own vector store and rank by cosine similarity against your embedded query.

python

import os, requests, numpy as np, google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
VF = "https://api.vectorfinancials.com"
H = {"X-API-Key": os.environ["VECTORFIN_API_KEY"]}

question = "What did Apple management say about services margins in 2024?"

# 1. Discover which fiscal periods are available for the ticker.
periods = requests.get(f"{VF}/v1/transcripts/AAPL",
                       headers=H, params={"limit": 8}).json()

# 2. Pull the pre-computed embedding vectors for those periods.
records = []
for p in periods:
    records += requests.get(
        f"{VF}/v1/embeddings/AAPL",
        headers=H,
        params={"fiscal_period": p["fiscal_period"], "limit": 100},
    ).json()

# 3. Embed the query with the same model VectorFin uses, then rank
#    by cosine similarity in YOUR vector store (the "R" in RAG).
q = genai.embed_content(
    model="gemini-embedding-2-preview",
    content=question,
    task_type="retrieval_query",
)["embedding"]
q = np.asarray(q)

def cosine(v):
    v = np.asarray(v)
    return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

top = sorted(records, key=lambda r: cosine(r["embedding"]), reverse=True)[:8]

# 4. Ground the LLM with citations (the "AG" in RAG). Note: you join
#    against your own transcript text keyed by (fiscal_period, chunk_idx).
cites = "\n".join(
    f"[{r['ticker']} {r['fiscal_period']} chunk {r['chunk_idx']}]"
    for r in top
)
print("Top matches:\n" + cites)

Want point-in-time RAG (e.g. answer Q2 2022 questions without future information)? Every record is bitemporal, so filter the returned rows on knowledge_ts to drop anything we learned after your as-of date, the same way the underlying Apache Iceberg tables are stored.

BigQuery: preview

On Pro and Enterprise you can run the retrieval step inside BigQuery instead of a separate vector store. BigQuery embeds the query for you, so there is no out-of-band query vector to paste: register a remote model over the Gemini endpoint, embed the question with ML.GENERATE_EMBEDDING (the same RETRIEVAL_QUERY task type and 768 dims the corpus uses), and VECTOR_SEARCH the shared transcript embeddings by cosine distance. See the BigQuery integration for the full walkthrough.

sql

-- One-time: a remote model over the Vertex Gemini embedding endpoint.
CREATE OR REPLACE MODEL `[your_project].[your_dataset].gemini2_embed`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (ENDPOINT = 'gemini-embedding-2-preview');

-- Rank earnings calls by how closely their chunks match the question.
SELECT
  base.ticker        AS ticker,
  base.fiscal_period AS fiscal_period,
  MIN(distance)      AS best_chunk_distance,
  ARRAY_AGG(base.chunk_idx ORDER BY distance LIMIT 10) AS relevant_chunk_indexes
FROM VECTOR_SEARCH(
  TABLE `[your_project].[your_dataset].transcripts`,
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL `[your_project].[your_dataset].gemini2_embed`,
      (SELECT 'How are operating margins trending and what is driving the change?' AS content),
      STRUCT(TRUE AS flatten_json_output,
             'RETRIEVAL_QUERY' AS task_type,
             768 AS output_dimensionality)
    )
  ),
  top_k => 100,
  distance_type => 'COSINE'
)
GROUP BY ticker, fiscal_period
ORDER BY best_chunk_distance
LIMIT 20;

VECTOR_SEARCH returns chunk indexes, not prose. Join the ranked chunk_idx values back to the raw transcript text shared alongside the embeddings to read the matched paragraphs. UNNEST(... WITH OFFSET) keeps the rank order.