RAG
Building a financial RAG pipeline on VectorFin
VectorFin is a drop-in retrieval layer for Retrieval-Augmented Generation (RAG). The API serves pre-computed embedding vectors (not raw text, and no server-side search): discover the available quarters with GET /v1/transcripts/{ticker}, pull the 768-dim vectors with GET /v1/embeddings/{ticker}?fiscal_period=..., then index them in your own vector store and rank by cosine similarity against your embedded query.
import os, requests, numpy as np, google.generativeai as genai
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
VF = "https://api.vectorfinancials.com"
H = {"X-API-Key": os.environ["VECTORFIN_API_KEY"]}
question = "What did Apple management say about services margins in 2024?"
# 1. Discover which fiscal periods are available for the ticker.
periods = requests.get(f"{VF}/v1/transcripts/AAPL",
headers=H, params={"limit": 8}).json()
# 2. Pull the pre-computed embedding vectors for those periods.
records = []
for p in periods:
records += requests.get(
f"{VF}/v1/embeddings/AAPL",
headers=H,
params={"fiscal_period": p["fiscal_period"], "limit": 100},
).json()
# 3. Embed the query with the same model VectorFin uses, then rank
# by cosine similarity in YOUR vector store (the "R" in RAG).
q = genai.embed_content(
model="gemini-embedding-2-preview",
content=question,
task_type="retrieval_query",
)["embedding"]
q = np.asarray(q)
def cosine(v):
v = np.asarray(v)
return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
top = sorted(records, key=lambda r: cosine(r["embedding"]), reverse=True)[:8]
# 4. Ground the LLM with citations (the "AG" in RAG). Note: you join
# against your own transcript text keyed by (fiscal_period, chunk_idx).
cites = "\n".join(
f"[{r['ticker']} {r['fiscal_period']} chunk {r['chunk_idx']}]"
for r in top
)
print("Top matches:\n" + cites)Want point-in-time RAG (e.g. answer Q2 2022 questions without future information)? Every record is bitemporal, so filter the returned rows on knowledge_ts to drop anything we learned after your as-of date, the same way the underlying Apache Iceberg tables are stored.
On Pro and Enterprise you can run the retrieval step inside BigQuery instead of a separate vector store. BigQuery embeds the query for you, so there is no out-of-band query vector to paste: register a remote model over the Gemini endpoint, embed the question with ML.GENERATE_EMBEDDING (the same RETRIEVAL_QUERY task type and 768 dims the corpus uses), and VECTOR_SEARCH the shared transcript embeddings by cosine distance. See the BigQuery integration for the full walkthrough.
-- One-time: a remote model over the Vertex Gemini embedding endpoint.
CREATE OR REPLACE MODEL `[your_project].[your_dataset].gemini2_embed`
REMOTE WITH CONNECTION DEFAULT
OPTIONS (ENDPOINT = 'gemini-embedding-2-preview');
-- Rank earnings calls by how closely their chunks match the question.
SELECT
base.ticker AS ticker,
base.fiscal_period AS fiscal_period,
MIN(distance) AS best_chunk_distance,
ARRAY_AGG(base.chunk_idx ORDER BY distance LIMIT 10) AS relevant_chunk_indexes
FROM VECTOR_SEARCH(
TABLE `[your_project].[your_dataset].transcripts`,
'embedding',
(
SELECT ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
MODEL `[your_project].[your_dataset].gemini2_embed`,
(SELECT 'How are operating margins trending and what is driving the change?' AS content),
STRUCT(TRUE AS flatten_json_output,
'RETRIEVAL_QUERY' AS task_type,
768 AS output_dimensionality)
)
),
top_k => 100,
distance_type => 'COSINE'
)
GROUP BY ticker, fiscal_period
ORDER BY best_chunk_distance
LIMIT 20;VECTOR_SEARCH returns chunk indexes, not prose. Join the ranked chunk_idx values back to the raw transcript text shared alongside the embeddings to read the matched paragraphs. UNNEST(... WITH OFFSET) keeps the rank order.