NLP for Finance

What is an Earnings Call?

The quarterly conference call where public company management presents financial results and takes analyst questions, one of the richest sources of unstructured financial information.

In Plain English

Four times a year, the CEO and CFO of every major public company get on a phone call with Wall Street. For 45-90 minutes, they explain the quarter's results, discuss what they expect going forward, and answer questions from analysts at Goldman Sachs, Morgan Stanley, and hundreds of other institutions. These calls are recorded, transcribed, and made publicly available. They're among the most valuable sources of information about a company you'll find.

The format is consistent: prepared remarks (15-30 minutes, scripted), followed by a question-and-answer session (30-60 minutes, unscripted). The prepared remarks are polished and carefully lawyer-reviewed, often revealing what management wants to emphasize. In Q&A, analysts probe for specifics, push back on guidance, and occasionally force management to address uncomfortable topics. The unscripted responses tend to be more revealing than the prepared text.

What makes earnings calls so valuable for NLP is the volume, consistency, and density of meaningful information. You have the same executives talking about the same company, once per quarter, for years. Language patterns shift in detectable ways. A CEO who has said "we're confident in our pipeline" every quarter for two years, then suddenly says "we're encouraged by early indicators," has said something meaningfully different. A shift like that escapes keyword filters but shows up when you compare embedding vectors.
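One way to make that comparison concrete is a quarter-over-quarter cosine similarity on the embedding of the same kind of statement. The sketch below uses tiny made-up 4-dimensional vectors in place of real 768-dimensional embeddings. The vectors, the `drift_flags` helper, and the 0.9 threshold are all illustrative assumptions, not part of any VectorFin API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def drift_flags(history, threshold=0.9):
    """Return periods whose vector falls below `threshold` similarity
    with the previous period's vector.

    `history` is a list of (period, vector) pairs in chronological order.
    """
    flagged = []
    for (_, prev_vec), (period, vec) in zip(history, history[1:]):
        if cosine(prev_vec, vec) < threshold:
            flagged.append(period)
    return flagged

# Toy vectors standing in for embeddings of the CEO's pipeline commentary.
history = [
    ("2023-Q3", [0.90, 0.10, 0.00, 0.10]),
    ("2023-Q4", [0.88, 0.12, 0.02, 0.10]),
    ("2024-Q1", [0.91, 0.09, 0.01, 0.11]),
    ("2024-Q2", [0.40, 0.70, 0.30, 0.10]),  # the wording shifts here
]

print(drift_flags(history))  # ['2024-Q2']
```

In practice the threshold would need tuning per company, since some executives vary their wording more than others from quarter to quarter.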

Financial analysts have long known that tone, word choice, and delivery matter. Extracting that signal at scale has only recently become practical. Transformer-based embeddings now make it feasible across thousands of companies.

Technical Definition

An earnings call transcript has a defined structure that VectorFin's pipeline uses for intelligent chunking:

1. Operator introduction: boilerplate, minimal signal
2. Safe harbor and forward-looking statement disclaimer: legal preamble
3. CEO prepared remarks: strategic context, highlights, narrative
4. CFO prepared remarks: detailed financial results, guidance
5. Q&A session: analyst name, question, management response
6. Operator closing: boilerplate

VectorFin segments transcripts into 512-token chunks with 64-token overlap. Speaker identification is preserved in chunk metadata. Q&A chunks include the speaker's name and role, allowing you to filter for CFO-only commentary or questions from specific analysts.
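The 512/64 windowing can be sketched as a simple sliding window with a stride of 448 tokens. This is a minimal illustration, not VectorFin's actual pipeline: the `chunk_tokens` helper, the placeholder tokens, the speaker name, and the choice to chunk one speaker turn at a time are all assumptions.

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split `tokens` into windows of `size` that share `overlap`
    tokens with the preceding window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    stride = size - overlap  # 448 new tokens per chunk with the defaults
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the turn
    return chunks

# Hypothetical speaker turn: 1,000 placeholder tokens from a CFO answer.
turn = {
    "speaker": "Jane Doe",
    "role": "CFO",
    "section": "qa",
    "tokens": [f"tok{i}" for i in range(1000)],
}

# Attach the speaker metadata to every chunk cut from the turn.
rows = [
    {
        "chunk_idx": i,
        "speaker": turn["speaker"],
        "role": turn["role"],
        "section": turn["section"],
        "tokens": chunk,
    }
    for i, chunk in enumerate(chunk_tokens(turn["tokens"]))
]

for row in rows:
    print(row["chunk_idx"], row["role"], len(row["tokens"]))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some tokens twice.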

Fiscal period format in VectorFin: {YYYY}-Q{N} where N is the company's fiscal quarter number (1-4). A company with a June 30 fiscal year end has its Q1 = July-September, which is calendar Q3. VectorFin stores both the fiscal period label and calendar quarter for cross-company comparison.
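The fiscal-to-calendar mapping is plain month arithmetic. The sketch below is a hypothetical helper, `fiscal_to_calendar`, that assumes the fiscal year is labeled by the calendar year in which it ends and that a fiscal quarter maps to the calendar quarter containing its final month. VectorFin already stores the calendar quarter, so this only illustrates the relationship between the two labels.

```python
import re

def fiscal_to_calendar(fiscal_period, fiscal_year_end_month):
    """Map a '{YYYY}-Q{N}' fiscal label to the (year, quarter) in which
    that fiscal quarter ends.

    Assumes the fiscal year is labeled by the calendar year in which it ends.
    """
    match = re.fullmatch(r"(\d{4})-Q([1-4])", fiscal_period)
    if match is None:
        raise ValueError(f"not a fiscal period label: {fiscal_period!r}")
    if not 1 <= fiscal_year_end_month <= 12:
        raise ValueError("fiscal_year_end_month must be between 1 and 12")
    fiscal_year, quarter = int(match.group(1)), int(match.group(2))

    # Count months from year 0 so year boundaries fall out of integer division.
    # Fiscal quarter N ends 3 * (4 - N) months before the fiscal year end.
    months = fiscal_year * 12 + (fiscal_year_end_month - 1) - 3 * (4 - quarter)
    end_year, end_month_index = divmod(months, 12)
    return end_year, end_month_index // 3 + 1

# June 30 fiscal year end: fiscal Q1 runs July-September.
print(fiscal_to_calendar("2025-Q1", 6))   # (2024, 3)
print(fiscal_to_calendar("2025-Q4", 6))   # (2025, 2)
# December fiscal year end: fiscal and calendar quarters coincide.
print(fiscal_to_calendar("2024-Q4", 12))  # (2024, 4)
```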

How VectorFin Uses This

VectorFin embeds every earnings call transcript for the S&P 500 (beta) and stores the embeddings at:

text

gs://vectorfinancials-data/warehouse/embeddings/transcripts/

Each chunk row: ticker, fiscal_period, chunk_idx, section (prepared/qa), speaker, effective_ts (fiscal period end), knowledge_ts (ingestion time), embedding (float[768]).
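Having both effective_ts and knowledge_ts makes point-in-time filtering possible: a backtest can restrict itself to rows that had actually been ingested by the simulated date. The sketch below models the row as a dataclass and applies that filter. The Python types, the `known_as_of` helper, the "XYZ" ticker, and the timestamps are illustrative assumptions, and the real column types may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChunkRow:
    ticker: str
    fiscal_period: str
    chunk_idx: int
    section: str             # "prepared" or "qa"
    speaker: str
    effective_ts: datetime   # fiscal period end
    knowledge_ts: datetime   # ingestion time
    embedding: tuple         # 768 floats in real rows

def known_as_of(rows, as_of):
    """Keep only rows that had been ingested on or before `as_of`."""
    return [r for r in rows if r.knowledge_ts <= as_of]

utc = timezone.utc
placeholder = (0.0,) * 768  # stand-in for a real embedding

# Made-up rows: the Q3 period ended before the backtest date,
# but its transcript was ingested afterwards.
rows = [
    ChunkRow("XYZ", "2024-Q2", 0, "prepared", "CEO",
             datetime(2024, 6, 30, tzinfo=utc),
             datetime(2024, 7, 25, tzinfo=utc), placeholder),
    ChunkRow("XYZ", "2024-Q3", 0, "prepared", "CEO",
             datetime(2024, 9, 30, tzinfo=utc),
             datetime(2024, 10, 24, tzinfo=utc), placeholder),
]

backtest_date = datetime(2024, 10, 1, tzinfo=utc)
visible = known_as_of(rows, backtest_date)
print([r.fiscal_period for r in visible])  # ['2024-Q2']
```

Filtering on effective_ts alone would have let the Q3 row through, even though nobody could have read that transcript on October 1.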

text

GET https://api.vectorfinancials.com/v1/embeddings/{ticker}?fiscal_period=2024-Q4

fiscal_period and limit are the only query params. Each returned record carries a section key (prepared/qa), allowing you to filter Q&A chunks client-side. The Q&A filter helps detect management evasion. When management consistently gives vague or deflecting answers to analyst questions about a specific topic, you've found something worth investigating.

Code Example

python

import numpy as np
import requests

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

# Analyze the Q&A section separately from prepared remarks:
# Q&A contains more candid, unscripted management language.

ticker = "TSLA"
fiscal_period = "2024-Q3"

# Fetch all chunks for the call
all_resp = requests.get(
    f"{API_BASE}/v1/embeddings/{ticker}",
    params={"fiscal_period": fiscal_period},
    headers={"X-API-Key": API_KEY},
    timeout=30,
)
all_resp.raise_for_status()  # fail loudly on auth or HTTP errors
all_chunks = all_resp.json()  # /v1/embeddings returns a JSON array of records

# Separate prepared remarks from Q&A
prepared = [c for c in all_chunks if c.get("section") == "prepared"]
qa = [c for c in all_chunks if c.get("section") == "qa"]

print(f"{ticker} {fiscal_period}: {len(prepared)} prepared chunks, {len(qa)} Q&A chunks")

# Rank the Q&A chunks against a topic vector. There is no server-side search:
# the chunks are already in hand, so do the cosine ranking locally.
# Placeholder: replace the ... with a 768-dim vector in the same embedding
# space as the chunks before running the ranking below.
topic_vector = np.array(...)  # encode "production capacity / supply chain" yourself
E = np.stack([c["embedding"] for c in qa])                   # (N, 768)
sims = (E @ topic_vector) / (
    np.linalg.norm(E, axis=1) * np.linalg.norm(topic_vector)
)

print("\nTop Q&A chunks about production challenges:")
for i in np.argsort(sims)[::-1][:5]:  # indices of the five highest similarities
    c = qa[i]
    speaker = c.get("speaker", "Unknown")
    # Records carry vectors and keys, not text. Join (ticker, fiscal_period, chunk_idx)
    # back to the transcript store to get the actual spoken words.
    print(f"[{speaker}] {ticker} {c['fiscal_period']} chunk {c['chunk_idx']} (sim: {sims[i]:.3f})")

Related Terms

fiscal period
