NLP for Finance

What is Sentiment Analysis?

The automated classification of text as positive, negative, or neutral (and sometimes on a finer-grained scale) to extract opinion and tone signals at scale.

In Plain English

Humans can read an earnings call transcript and immediately sense whether management sounds confident or worried, whether the tone is upbeat or guarded. But human readers can only process a limited number of calls, and their judgments are subjective and inconsistent. Sentiment analysis automates this process — teaching machines to detect tone and opinion in text at scale.

In its simplest form, sentiment analysis counts positive and negative words. Does this paragraph contain more words like "exceeded," "record," and "strong," or more words like "challenging," "decline," and "disappointed"? The Loughran-McDonald dictionary, developed specifically for financial text, is the classic tool for this: a list of words that carry a positive or negative connotation in a financial context. Interestingly, many words that seem negative in general English, like "liability," are neutral in finance.
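The word-counting idea can be sketched in a few lines of Python. This is a minimal illustration, not the Loughran-McDonald dictionary itself: the two word sets below are hypothetical stand-ins built from the example words in this entry, and `lexicon_score` is a name chosen here for the sketch.

```python
import re

# Hypothetical mini word lists for illustration only; the real
# Loughran-McDonald dictionary contains far more entries.
POSITIVE = {"exceeded", "record", "strong", "improved"}
NEGATIVE = {"challenging", "decline", "disappointed", "headwinds"}

def lexicon_score(text: str) -> float:
    """Return (#positive - #negative) / total_words, or 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

print(lexicon_score("Revenue exceeded guidance on strong demand"))
print(lexicon_score("Our margins improved despite headwinds"))
```

The second sentence nets out to zero because one positive word and one negative word cancel. A plain word count has no way to weigh them by context.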

Modern sentiment analysis uses transformer-based models rather than wordlists. FinBERT, a BERT model fine-tuned on financial text, understands context: it knows that "Our margins improved despite headwinds" is net positive, even though "headwinds" is a negative word. It understands negation, hedging, and the complex conditional language that characterizes financial communication.

The real power in financial sentiment analysis comes from tracking changes over time. A company whose management language shifted from cautiously optimistic to genuinely concerned across several consecutive earnings calls is sending a signal that no single call's sentiment score captures. This is the intuition behind VectorFin's sentiment_drift signal: measuring how tone is changing, not just what tone is present.

Technical Definition

Sentiment analysis approaches by sophistication:

Lexicon-based: Count positive/negative words from domain-specific dictionaries (Loughran-McDonald, Harvard GI). Score = (#positive − #negative) / total_words. Fast, interpretable, but ignores context and syntax.

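Machine learning: Train a classifier (logistic regression, SVM) on labeled financial text, typically using bag-of-words or TF-IDF features. Requires labeled training data. Generally more accurate than lexicons, but such features still miss context like word order and negation.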

Fine-tuned transformer: BERT/RoBERTa models fine-tuned on financial sentiment datasets (e.g. FinBERT). Captures context, negation, and domain-specific language. Reported accuracy is typically around 85-93% on financial sentence classification, depending on the benchmark.

Aspect-based sentiment analysis (ABSA): Identifies sentiment toward specific aspects (margins, revenue, guidance, competition) rather than a document-level score. More granular and actionable.

Embedding-based drift: Rather than classifying polarity, compute cosine distance between consecutive document embeddings to detect tone shifts. This is the approach VectorFin uses for sentiment_drift — no explicit positive/negative label, just geometric drift in the meaning-space.
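As a rough illustration of the embedding-based approach, the sketch below computes the cosine distance between each pair of consecutive document embeddings. It is not VectorFin's implementation: the function names and the toy 3-dimensional vectors are assumptions made for this example, and the result is an unsigned distance (0 for identical direction, up to 2 for opposite direction).

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def consecutive_drift(embeddings: list[list[float]]) -> list[float]:
    """Distance between each document embedding and the one before it."""
    return [cosine_distance(prev, curr)
            for prev, curr in zip(embeddings, embeddings[1:])]

# Toy embeddings for three consecutive documents (hypothetical values).
calls = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(consecutive_drift(calls))
```

In this toy series the first two documents point in the same direction, so the first distance is zero, and the third is orthogonal to the second, so the second distance is one.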

How VectorFin Uses This

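VectorFin's signals/sentiment_drift table quantifies the tone shift between consecutive earnings calls for each ticker. The drift score is a signed value based on the cosine distance between the mean embedding vectors of consecutive quarterly calls. Cosine distance on its own is never negative, so the magnitude reflects how far the language moved and the sign reflects the direction of the move: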

Positive drift ≈ management language becoming more optimistic or expansive
Negative drift ≈ language becoming more cautious, hedged, or pessimistic
Near-zero drift ≈ consistent messaging quarter over quarter
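The three cases above could be turned into labels with a small helper like the following. This is only a sketch: `describe_drift` is a hypothetical function, and the `tolerance` of 0.05 that defines "near-zero" is an arbitrary assumption, not a VectorFin threshold.

```python
def describe_drift(score: float, tolerance: float = 0.05) -> str:
    """Map a signed drift score to a coarse label.

    `tolerance` is an assumed cutoff for "near-zero"; tune it on real data.
    """
    if abs(score) <= tolerance:
        return "consistent"
    return "more optimistic" if score > 0 else "more cautious"

for score in (0.20, -0.20, 0.01):
    print(score, "->", describe_drift(score))
```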

This signal is available for 5,000+ tickers back to 2018:

GET https://api.vectorfinancials.com/v1/signals/sentiment-drift/{ticker}?period=2024-Q3

The anomaly signal uses a related approach: when the embedding of a specific earnings call chunk is far from historical norms for that company, it flags the passage as an anomalous communication — potential early warning of an undisclosed material change.
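One simple way to make "far from historical norms" concrete is to compare a chunk's distance from the centroid of past embeddings against the spread of past distances. The sketch below does that. It is an assumption-laden illustration, not VectorFin's method: the function names, the mean-plus-k-standard-deviations rule, the default `k=3.0`, and the toy 2-dimensional vectors are all invented for this example.

```python
import math
import statistics

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def is_anomalous(chunk: list[float], history: list[list[float]],
                 k: float = 3.0) -> bool:
    """Flag `chunk` if it sits farther from the historical centroid than
    mean + k * std of the historical distances (an assumed rule)."""
    center = centroid(history)
    past = [cosine_distance(v, center) for v in history]
    threshold = statistics.mean(past) + k * statistics.pstdev(past)
    return cosine_distance(chunk, center) > threshold

# Toy historical chunk embeddings for one company (hypothetical values).
history = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.05]]
print(is_anomalous([0.97, 0.06], history))  # close to past language
print(is_anomalous([0.0, 1.0], history))    # points in a new direction
```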

Code Example

import requests
import pandas as pd

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

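# Track sentiment drift over time for a company
def get_drift_history(ticker: str, periods: list[str]) -> pd.DataFrame:
    rows = []
    for period in periods:
        resp = requests.get(
            f"{API_BASE}/v1/signals/sentiment-drift/{ticker}",
            params={"period": period},
            headers={"X-API-Key": API_KEY},
            timeout=10,  # avoid hanging indefinitely on a stalled connection
        )
        if not resp.ok:
            continue  # skip periods the API could not serve
        # Parse the response body once instead of re-decoding it per field
        drift_score = resp.json().get("drift_score")
        if drift_score is not None:
            rows.append({
                "period": period,
                "drift_score": drift_score,
                "drift_magnitude": abs(drift_score),
            })
    # Fix the columns so later filtering works even when no rows came back
    return pd.DataFrame(rows, columns=["period", "drift_score", "drift_magnitude"])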

periods = ["2023-Q1", "2023-Q2", "2023-Q3", "2023-Q4",
           "2024-Q1", "2024-Q2", "2024-Q3", "2024-Q4"]

df = get_drift_history("NVDA", periods)
print("NVDA sentiment drift history:")
print(df.to_string(index=False))

# Flag large negative drifts as bearish warning signals
# (named `bearish` to avoid shadowing the standard-library `warnings` module)
bearish = df[df["drift_score"] < -0.15]
if not bearish.empty:
    print(f"\nBearish language shifts detected: {bearish['period'].tolist()}")
