VectorFin/Glossary/Transformer
ML & AI

What is a Transformer?

A neural network architecture built on self-attention that has become the foundation for virtually all major language models, including those used to embed financial text.

In Plain English

Before transformers, language models read text one word at a time, like someone listening to a sentence through a narrow tube. They struggled to connect a pronoun late in a sentence to the noun it referred to thirty words earlier. Transformers solved this by letting every word look at every other word simultaneously through a mechanism called self-attention.

Think of it like a conference room discussion where everyone can speak to everyone else at once, rather than a chain of telephone messages. The model learns which words are relevant to which other words for the task at hand. In the sentence "The company's revenue grew despite the headwinds," the transformer can directly link "headwinds" back to "revenue" and "company" without losing that context over distance.

This architecture, introduced in the 2017 paper "Attention Is All You Need," became the engine for BERT, GPT, and virtually every major language model since. BERT-style transformers, which read text bidirectionally (left-to-right and right-to-left simultaneously), are especially effective for producing embeddings — they create rich representations of meaning because each token's output encodes context from the entire surrounding passage.

For financial text, the bidirectional context is critical. A phrase like "we remain cautiously optimistic" means something very different depending on whether it precedes or follows a paragraph about supply chain disruptions. Transformers capture that context automatically.

Technical Definition

A transformer consists of stacked encoder and/or decoder blocks. Each block contains two sub-layers: a multi-head self-attention layer and a position-wise feed-forward network, each wrapped in a residual connection and followed by layer normalization.

Self-attention for a sequence of token embeddings X ∈ ℝ^(n×d) computes:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

where Q = XW_Q, K = XW_K, V = XW_V are projections via learned weight matrices. Multi-head attention runs h parallel attention heads and concatenates their outputs.
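The attention formula above can be sketched directly in NumPy. This is a minimal single-head illustration with random weights, not a trained model; the dimensions (n=5 tokens, d=16, d_k=8) are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d, d_k = 5, 16, 8            # sequence length, model dim, head dim

X = rng.normal(size=(n, d))     # token embeddings
W_Q = rng.normal(size=(d, d_k)) # learned projections (random here)
W_K = rng.normal(size=(d, d_k))
W_V = rng.normal(size=(d, d_k))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T / np.sqrt(d_k)     # (n, n): every token scores every other token
weights = softmax(scores, axis=-1)  # each row is a distribution over the sequence
out = weights @ V                   # (n, d_k): context-weighted mix of values

print(out.shape)        # (5, 8)
```

Each row of `weights` shows how much one token "attends" to every other token; multi-head attention simply runs h copies of this computation with independent weight matrices and concatenates the h outputs.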

Positional encodings (sinusoidal or learned) are added to token embeddings since self-attention is permutation-invariant. Encoder-only models (BERT, RoBERTa) produce contextualized token representations used for embedding and classification. Decoder-only models (GPT) generate text autoregressively.
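The sinusoidal variant mentioned above can be written in a few lines. A minimal sketch following the formulation in "Attention Is All You Need" (sin on even dimensions, cos on odd, with a 10000 frequency base); the sizes are arbitrary example values:

```python
import numpy as np

def sinusoidal_encoding(n_pos, d):
    """Sinusoidal positional encodings: shape (n_pos, d), values in [-1, 1]."""
    pos = np.arange(n_pos)[:, None]            # position index
    i = np.arange(d // 2)[None, :]             # dimension-pair index
    angles = pos / (10000 ** (2 * i / d))      # geometric frequency progression
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angles)               # even dims
    pe[:, 1::2] = np.cos(angles)               # odd dims
    return pe

pe = sinusoidal_encoding(50, 16)
# Added elementwise to the token embeddings before the first block,
# giving the otherwise permutation-invariant attention a sense of order.
```

Because each position maps to a unique pattern of phases, the model can recover relative and absolute order from the sum of token embedding and encoding.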

How VectorFin Uses This

VectorFin uses Google's gemini-embedding-2-preview, a transformer-based model, to encode every chunk of earnings call transcripts and SEC filings into 768-dimensional vectors. The transformer's bidirectional attention means the embedding for a single paragraph captures surrounding context — management tone, preceding topics, and the conversational structure of analyst Q&A sessions.

These embeddings are stored in the Iceberg tables at gs://vectorfinancials-data/warehouse/embeddings/ and served via:

GET https://api.vectorfinancials.com/v1/embeddings/{ticker}?period=2024-Q4
X-API-Key: <your-key>

Because transformers produce semantically rich embeddings, downstream signals like sentiment_drift and anomaly are far more informative than bag-of-words or TF-IDF representations of the same text.

Code Example

# Transformers produce embeddings — here's how to inspect what the model captured
import requests
import numpy as np

API_KEY = "vf_your_api_key_here"

resp = requests.get(
    "https://api.vectorfinancials.com/v1/embeddings/NVDA",
    params={"period": "2024-Q4"},
    headers={"X-API-Key": API_KEY},
)
resp.raise_for_status()
data = resp.json()

print(f"Model: {data['model']}")          # gemini-embedding-2-preview
print(f"Dimension: {data['dimension']}")  # 768
print(f"Chunks: {len(data['chunks'])}")   # number of transcript segments

# Each chunk = one self-contained passage the transformer encoded
for chunk in data["chunks"][:3]:
    vec = np.array(chunk["embedding"])
    print(f"  chunk {chunk['chunk_idx']}: norm={np.linalg.norm(vec):.3f}, "
          f"preview='{chunk['text'][:80]}...'")
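A natural next step is comparing chunks to each other. The helper below is a generic cosine-similarity sketch; the commented usage assumes the same `data["chunks"][i]["embedding"]` response shape as the example above:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# With the response fetched above, compare the first two transcript chunks:
# c0, c1 = data["chunks"][0], data["chunks"][1]
# print(cosine_similarity(c0["embedding"], c1["embedding"]))
```

High similarity between adjacent chunks typically reflects the transformer carrying shared context (same speaker, same topic) into both embeddings.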

Put Transformer to work in your pipeline

Access AI-ready financial data — embeddings, signals, Iceberg tables.