A retrieval approach that encodes both queries and documents as dense vectors, finding relevant matches via nearest-neighbor search rather than keyword overlap.
In Plain English
There are two ways to find a document about "margin compression" in a filing. The old way: look for documents that contain the exact words "margin compression." The new way: encode your query as a vector of numbers, encode every document as a vector of numbers, and find documents whose vectors point in the same direction as your query vector. The second approach is dense retrieval.
The word "dense" contrasts with "sparse." Traditional keyword search (BM25, TF-IDF) represents documents as sparse vectors — long lists mostly filled with zeros, where each non-zero position corresponds to a word that appears. Dense retrieval produces short, dense vectors — hundreds of dimensions, all non-zero — that encode meaning rather than word counts.
This matters enormously in practice. Dense retrieval can match "gross margin declined" to "profitability deteriorated on the product side" even though they share no words. Sparse retrieval would give those documents a similarity score of zero.
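A toy sketch makes the contrast concrete. The two phrases below share no words, so their sparse bag-of-words vectors have zero overlap and a cosine similarity of exactly 0; the dense vectors shown are hypothetical 4-dimensional stand-ins for real embeddings (which have hundreds of dimensions), chosen to illustrate how semantically similar phrases can still point in the same direction.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Sparse bag-of-words over a tiny vocabulary:
# ["gross", "margin", "declined", "profitability", "deteriorated", "product", "side"]
sparse_a = [1, 1, 1, 0, 0, 0, 0]   # "gross margin declined"
sparse_b = [0, 0, 0, 1, 1, 1, 1]   # "profitability deteriorated on the product side"
print(cosine(sparse_a, sparse_b))  # 0.0 — no shared terms, so sparse similarity is zero

# Hypothetical dense embeddings of the same two phrases
dense_a = [0.61, -0.22, 0.48, 0.10]
dense_b = [0.58, -0.19, 0.52, 0.05]
print(round(cosine(dense_a, dense_b), 3))  # close to 1.0 despite zero word overlap
```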
The trade-off is computation. Sparse retrieval can be done with inverted indexes in microseconds. Dense retrieval requires embedding every document at index time and running nearest-neighbor search at query time. The computational cost has dropped dramatically as hardware and algorithms have improved, making dense retrieval the default in modern search and RAG systems.
In financial text, dense retrieval has an especially strong advantage: the jargon is vast and inconsistent. Different CFOs describe the same economic reality using entirely different vocabulary. Dense retrieval bridges that vocabulary gap automatically.
Technical Definition
Dense retrieval uses a bi-encoder architecture: a query encoder E_q and a document encoder E_d, both typically transformer models. At index time, each document d is encoded as v_d = E_d(d) ∈ ℝⁿ. At query time, the query is encoded as v_q = E_q(q) and the top-k documents are retrieved by:
D = argmax_k sim(v_q, v_d) for all v_d in the index
where sim is cosine similarity or inner product.
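The retrieval step above can be sketched in a few lines. This is a minimal in-memory version with hypothetical 3-dimensional vectors and inner-product similarity; a production system would use an approximate nearest-neighbor index rather than a full scan.

```python
# Top-k dense retrieval over an in-memory index of precomputed v_d = E_d(d).
def top_k(query_vec, index, k=2):
    # Score every document vector by inner product with the query vector,
    # then keep the k highest-scoring documents.
    scored = [(doc_id, sum(q * d for q, d in zip(query_vec, vec)))
              for doc_id, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

index = [
    ("doc_a", [0.9, 0.1, 0.0]),
    ("doc_b", [0.2, 0.8, 0.1]),
    ("doc_c", [0.7, 0.3, 0.2]),
]
v_q = [1.0, 0.0, 0.1]  # v_q = E_q(q), hypothetical query embedding
print(top_k(v_q, index))  # doc_a and doc_c score highest
```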
Contrast with sparse retrieval (BM25), where document and query representations are term-frequency vectors in vocabulary space (dimension = vocabulary size, ~50K-100K). BM25 scores use term frequency, inverse document frequency, and document length normalization but cannot handle synonymy or paraphrase.
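For reference, the per-term BM25 score combines exactly the three ingredients named above. A sketch with the standard free parameters k1 and b at their typical defaults (the corpus numbers below are made up for illustration):

```python
import math

def bm25_term(tf, doc_len, avg_doc_len, n_docs, docs_with_term, k1=1.2, b=0.75):
    # Inverse document frequency: rare terms get a large weight.
    idf = math.log((n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)
    # Document-length normalization: long documents are penalized via b.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component.
    return idf * tf * (k1 + 1) / (tf + norm)

# A rare term scores far higher than a common one at the same tf:
rare   = bm25_term(tf=2, doc_len=100, avg_doc_len=300, n_docs=10_000, docs_with_term=15)
common = bm25_term(tf=2, doc_len=100, avg_doc_len=300, n_docs=10_000, docs_with_term=6_000)
print(rare, common)
```

Note that every ingredient is a function of exact term occurrences, which is why BM25 cannot credit a synonym or paraphrase.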
Hybrid retrieval combines both: retrieve k₁ candidates with BM25 and k₂ with dense retrieval, then merge with reciprocal rank fusion or a learned combiner. Hybrid generally outperforms either alone, especially on out-of-domain queries.
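Reciprocal rank fusion is simple enough to sketch directly: each document's fused score is the sum of 1/(c + rank) over the rankings it appears in, with ranks starting at 1 and c = 60 as the commonly used constant. The document IDs below are hypothetical.

```python
def rrf(rankings, c=60):
    # rankings: list of ranked lists of doc IDs (e.g. one from BM25, one dense).
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["d3", "d1", "d7"]   # hypothetical k1 candidates from sparse retrieval
dense_hits = ["d1", "d9", "d3"]   # hypothetical k2 candidates from dense retrieval
print(rrf([bm25_hits, dense_hits]))  # docs ranked highly by both lists float to the top
```

Because RRF uses only ranks, not raw scores, it merges the two lists without having to calibrate BM25 scores against cosine similarities.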
Fine-tuned bi-encoders (e.g., DPR fine-tuned on financial QA pairs) substantially outperform general-purpose encoders on domain-specific retrieval tasks.
How VectorFin Uses This
VectorFin's embeddings are produced by the gemini-embedding-2-preview model, which functions as the document encoder in a bi-encoder dense retrieval system. Every earnings call transcript is chunked into 512-token segments (with 64-token overlap), encoded, and stored as rows in the Iceberg embeddings table.
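The chunking scheme described above (512-token windows with 64-token overlap) means each window advances by 512 − 64 = 448 tokens. A minimal sketch, using a placeholder token list; the real pipeline would tokenize with the embedding model's own tokenizer:

```python
def chunk_tokens(tokens, size=512, overlap=64):
    step = size - overlap  # 448: how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window reached the end of the transcript
    return chunks

transcript = ["tok"] * 1200  # stand-in for a tokenized earnings call
chunks = chunk_tokens(transcript)
print([len(c) for c in chunks])  # [512, 512, 304]
```

The 64-token overlap ensures a sentence straddling a chunk boundary appears whole in at least one chunk.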
The POST /v1/embeddings/search endpoint accepts either a text query (VectorFin encodes it server-side) or a raw vector (for customers who want to encode with their own model). The server runs ANN search over the pre-built index and returns ranked chunks with similarity scores.
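A sketch of the raw-vector variant, for customers encoding queries with their own model. The `vector` field name is an assumption here; consult the API reference for the exact request schema.

```python
import requests

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

def build_vector_payload(vector, tickers, top_k=8):
    # Raw query vector replaces the text "query" field (field name assumed);
    # the other filters are unchanged.
    return {"vector": vector, "tickers": tickers, "top_k": top_k}

def vector_search(vector, tickers, top_k=8):
    resp = requests.post(
        f"{API_BASE}/v1/embeddings/search",
        json=build_vector_payload(vector, tickers, top_k),
        headers={"X-API-Key": API_KEY},
    )
    resp.raise_for_status()
    return resp.json()["results"]
```

Note that the customer's encoder must produce vectors in the same embedding space (same model, same dimensionality) as the indexed documents, or the similarity scores are meaningless.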
For bulk research workflows — e.g., finding all earnings call passages across 500 tickers that discuss a specific theme — the batch search endpoint accepts up to 100 queries per request, making cross-ticker dense retrieval feasible at research scale.
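A sketch of driving the batch endpoint from a larger query set, splitting it into groups of 100 to respect the per-request limit stated above. The `/v1/embeddings/search/batch` path and the `queries` field are assumptions; check the API reference for the actual batch request shape.

```python
import requests

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

def batched(items, size=100):
    # Split a long list into request-sized groups (max 100 queries per request).
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_search(queries, tickers):
    results = []
    for group in batched(queries, 100):
        resp = requests.post(
            f"{API_BASE}/v1/embeddings/search/batch",  # assumed path
            json={"queries": group, "tickers": tickers},  # assumed field names
            headers={"X-API-Key": API_KEY},
        )
        resp.raise_for_status()
        results.extend(resp.json()["results"])
    return results
```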
Code Example
```python
import requests

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

# Dense retrieval: text query → VectorFin encodes and retrieves
def dense_search(query: str, tickers: list[str], period_start: str,
                 period_end: str, top_k: int = 8):
    resp = requests.post(
        f"{API_BASE}/v1/embeddings/search",
        json={
            "query": query,  # server encodes with gemini-embedding-2-preview
            "tickers": tickers,
            "period_start": period_start,
            "period_end": period_end,
            "top_k": top_k,
        },
        headers={"X-API-Key": API_KEY},
    )
    resp.raise_for_status()
    return resp.json()["results"]

# Find earnings call passages discussing pricing power across semiconductor tickers
results = dense_search(
    query="ability to raise prices and maintain margins in competitive environment",
    tickers=["NVDA", "AMD", "INTC", "QCOM", "AVGO"],
    period_start="2023-Q1",
    period_end="2024-Q4",
    top_k=10,
)
for r in results:
    print(f"{r['ticker']} {r['period']} (sim={r['score']:.3f}): {r['text'][:150]}")
```