NLP for Finance

What is Named Entity Recognition (NER)?

An NLP technique that automatically identifies and classifies named entities (companies, people, dates, dollar amounts, locations) in unstructured text.

In Plain English

When an analyst reads an earnings call transcript, they're automatically extracting structured information without thinking about it. "Management mentioned they're working with TSMC on the next-generation chips, expect to ship in Q3 2025, and see a $2 billion revenue opportunity." In that one sentence, the analyst noted a partner company (TSMC), a product event (next-generation chips), a timeline (Q3 2025), and a financial figure ($2 billion). Named Entity Recognition (NER) is the automated version of that mental parsing.

NER systems scan text and label spans with entity types. "TSMC" gets labeled as an ORGANIZATION. "Q3 2025" gets labeled as a DATE. "$2 billion" gets labeled as a MONEY entity. "San Jose" in a description of a new facility gets labeled as a LOCATION. The output is a structured record of who, when, how much, and where, extracted from prose that was never designed to be machine-readable.

In financial text, the entity types that matter most are often domain-specific and are not handled well by general-purpose NER systems. Recognizing that "10-K," "SEC," and "GAAP" are meaningful entities requires training on financial corpora. Distinguishing a company reporting its own revenue ("our revenue was $5 billion") from one mentioning a competitor's revenue ("while Amazon reported $20 billion") requires understanding the entity's role in the sentence, not just its label.
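The own-versus-competitor distinction can be approximated with a crude heuristic before reaching for a trained model. The sketch below uses a hypothetical `attribute_money` helper, written only for illustration: it assigns each dollar figure to whichever appeared last before it, either a first-person pronoun ("our", "we"), labeled SELF, or one of the organization names passed in. It assumes ORG entities were already extracted and ignores sentence structure, so treat it as a baseline, not a substitute for role-aware models.

```python
import re

# Hypothetical patterns for illustration: dollar amounts and first-person pronouns.
MONEY_RE = re.compile(r"\$\d+(?:\.\d+)?\s*(?:billion|million|thousand)?", re.IGNORECASE)
SELF_RE = re.compile(r"\b(?:our|we)\b", re.IGNORECASE)


def attribute_money(sentence: str, orgs: list[str]) -> list[tuple[str, str]]:
    """Return (money_text, owner) pairs; owner is 'SELF', an org name, or 'UNKNOWN'."""
    pairs = []
    for m in MONEY_RE.finditer(sentence):
        prefix = sentence[: m.start()]
        owner, owner_pos = None, -1
        # The last first-person pronoun before the figure is a SELF candidate.
        for s in SELF_RE.finditer(prefix):
            owner, owner_pos = "SELF", s.start()
        # An organization mentioned even closer to the figure takes precedence.
        for org in orgs:
            pos = prefix.rfind(org)
            if pos > owner_pos:
                owner, owner_pos = org, pos
        pairs.append((m.group().strip(), owner or "UNKNOWN"))
    return pairs


print(attribute_money("Our revenue was $5 billion, while Amazon reported $20 billion.", ["Amazon"]))
```

Nearest-mention attribution breaks on sentences like "Amazon's figure dwarfs our $5 billion", which is exactly why the text above calls for understanding the entity's role rather than its position.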

The output of NER becomes the foundation for knowledge graph construction: after extracting entities, you can build a network of relationships — which companies are mentioned in each other's earnings calls, which executives appear across multiple firms, which technologies or markets are co-mentioned. These graphs reveal competitive dynamics and industry relationships that are invisible from financial statements alone.
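A minimal sketch of the first layer of such a graph, assuming entity extraction and the mapping of names to tickers are already done. The input is toy data invented for illustration, and `mention_graph` and `mutual_mentions` are hypothetical helper names. The graph records which issuers name which companies in their calls; mutual pairs are companies that each mention the other.

```python
from collections import defaultdict


def mention_graph(calls):
    """calls: iterable of (issuer, [mentioned org, ...]).

    Returns {issuer: {mentioned_org: count}}, skipping self-mentions.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for issuer, orgs in calls:
        for org in orgs:
            if org != issuer:
                graph[issuer][org] += 1
    return {issuer: dict(targets) for issuer, targets in graph.items()}


def mutual_mentions(graph):
    """Sorted list of company pairs in which each side mentions the other."""
    pairs = set()
    for a, targets in graph.items():
        for b in targets:
            if a in graph.get(b, {}):
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Toy input, not real transcript data.
calls = [
    ("NVDA", ["TSMC", "AMD", "NVDA"]),
    ("AMD", ["NVDA", "TSMC"]),
    ("INTC", ["TSMC"]),
]
graph = mention_graph(calls)
print(graph)
print(mutual_mentions(graph))
```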

Technical Definition

NER is typically formulated as a sequence labeling problem. Given a tokenized text t = [t₁, …, tₙ], predict a label yᵢ ∈ {B-TYPE, I-TYPE, O} for each token tᵢ, where B = beginning of an entity, I = inside an entity, and O = outside any entity.
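Because the model emits one tag per token, a decoding step turns the tags back into entity spans. The sketch below assumes a hypothetical `bio_to_spans` helper and labels of the form `B-ORG`, `I-ORG`, `O`. It treats a stray `I-` tag (one with no matching open entity) as the start of a new entity, which is one common lenient convention rather than the only one.

```python
def bio_to_spans(tokens, labels):
    """Convert per-token BIO labels into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def close():
        # Emit the entity that is currently open, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, label in zip(tokens, labels):
        if label == "O":
            close()
            current_type, current_tokens = None, []
            continue
        prefix, entity_type = label.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            # A new entity starts: on B-, or on an I- that does not continue the open one.
            close()
            current_type, current_tokens = entity_type, [token]
        else:
            # I- tag continuing the open entity of the same type.
            current_tokens.append(token)
    close()
    return spans


tokens = ["We", "partner", "with", "TSMC", "and", "expect", "Q3", "2025",
          "shipments", "worth", "$2", "billion"]
labels = ["O", "O", "O", "B-ORG", "O", "O", "B-DATE", "I-DATE",
          "O", "O", "B-MONEY", "I-MONEY"]
print(bio_to_spans(tokens, labels))
```

Joining tokens with spaces is a simplification; keeping character offsets from the tokenizer reproduces the original text exactly.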

Common entity types in general NLP: PERSON, ORG (organization), GPE (geopolitical entity), DATE, TIME, MONEY, PERCENT, CARDINAL.

Finance-specific entity types (examples; exact label sets vary across financial NER datasets such as FiNER-139):

  • TICKER (stock symbol)
  • FINANCIAL_METRIC (revenue, EPS, gross margin)
  • REGULATORY_BODY (SEC, FASB, FTC)
  • FILING_TYPE (10-K, 8-K)
  • FISCAL_PERIOD (Q3 2024, FY2023)
  • PRODUCT_SERVICE (specific product names)

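Model architectures: BiLSTM-CRF (traditional), fine-tuned BERT-style encoders (currently dominant), and GLiNER (a generalist model that extracts arbitrary, user-specified entity types using a bidirectional transformer encoder rather than a generative LLM). Finance-specific models (e.g., FinBERT-NER) typically outperform general-purpose models on financial entity recognition tasks.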

The overall pipeline is NER → Relation Extraction → Knowledge Graph: extract entities, classify the relationship between each entity pair (e.g., company A is_competitor_of company B), then assemble the resulting triples into a graph.
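Relation extraction is usually done with a trained classifier over entity pairs. To show the shape of the pipeline's output, the sketch below swaps in a much simpler rule-based technique: for each pair of organizations in sentence order, it emits a (head, relation, tail) triple when a trigger word appears between them. It assumes ORG entities are already extracted, and the trigger words and relation labels in `TRIGGERS` are invented for illustration.

```python
import re

# Hypothetical trigger words for two relation types.
TRIGGERS = {
    "is_competitor_of": re.compile(r"\b(?:compet\w*|rival\w*)\b", re.IGNORECASE),
    "is_supplier_of": re.compile(r"\b(?:supplie[sd]|supplier|manufactur\w*)\b", re.IGNORECASE),
}


def extract_relations(sentence, orgs):
    """Return (head_org, relation, tail_org) triples found in one sentence."""
    # Position of each org's first mention, in sentence order.
    located = sorted((sentence.find(org), org) for org in orgs if org in sentence)
    triples = []
    for i, (pos_a, a) in enumerate(located):
        for pos_b, b in located[i + 1:]:
            between = sentence[pos_a + len(a):pos_b]
            for relation, trigger in TRIGGERS.items():
                if trigger.search(between):
                    triples.append((a, relation, b))
    return triples


print(extract_relations("TSMC manufactures the chips that NVIDIA sells.", ["NVIDIA", "TSMC"]))
print(extract_relations("AMD continues to compete with NVIDIA in data center GPUs.", ["AMD", "NVIDIA"]))
```

The triples can then be loaded as labeled edges into any graph store; the hard part in practice is the classifier, not the assembly.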

How VectorFin Uses This

VectorFin applies NER during the filing and transcript processing pipeline to extract structured entities that enrich the embedding metadata:

  • Competitor mentions: which companies does management name in their earnings calls? (competitor sentiment analysis)
  • Product mentions: new product names, technology partnerships, supplier names
  • Numerical entities: guidance figures extracted alongside the embedding for hybrid search
  • Person entities: track whether specific executives are mentioned in unusual contexts

The anomaly signal incorporates NER-derived features: a sudden increase in mentions of regulatory bodies (SEC, DOJ, FTC) or legal terminology in an earnings call or 8-K is flagged as a potential anomaly indicator.
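The exact anomaly features are not described here, so the following is only a hypothetical sketch of how a "sudden increase" could be quantified: count mentions of a few regulatory agencies and legal terms per 1,000 words, then compare the current document against the same company's earlier documents with a z-score. The term lists, the 3.0 threshold, and the minimum history length are all assumptions, not VectorFin's actual parameters.

```python
import re
from statistics import mean, stdev

# Hypothetical term lists: agencies matched case-sensitively, legal terms case-insensitively.
AGENCIES = re.compile(r"\b(?:SEC|DOJ|FTC)\b")
LEGAL_TERMS = re.compile(r"\b(?:subpoena|investigation|litigation)\b", re.IGNORECASE)


def regulatory_rate(text):
    """Regulatory/legal mentions per 1,000 words."""
    words = len(text.split())
    hits = len(AGENCIES.findall(text)) + len(LEGAL_TERMS.findall(text))
    return 1000 * hits / words if words else 0.0


def is_spike(history_rates, current_rate, z_threshold=3.0, min_history=4):
    """Flag current_rate if it sits more than z_threshold std devs above the history."""
    if len(history_rates) < min_history:
        return False  # not enough history to judge
    mu = mean(history_rates)
    sigma = stdev(history_rates)
    if sigma == 0:
        return current_rate > mu  # flat history: any increase stands out
    return (current_rate - mu) / sigma > z_threshold


print(regulatory_rate("The SEC opened an investigation."))
print(is_spike([1.0, 1.2, 0.8, 1.0], 6.0))
```

Normalizing by length keeps a long 10-K from looking anomalous simply because it contains more words than a short 8-K.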

GET https://api.vectorfinancials.com/v1/embeddings/{ticker}?period=2024-Q3
    # Response includes entities[] per chunk

Each embedding chunk in the API response includes an optional entities field with extracted named entities, enabling filtering for chunks that mention specific companies or dollar figures.
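The response schema is not spelled out here beyond the entities field, so the helper below assumes each chunk is a dict whose optional `entities` list holds objects with `text` and `type` keys, the same fields the search example on this page reads. `filter_chunks` is a hypothetical client-side helper, not part of the API, and the sample chunks are made up.

```python
def filter_chunks(chunks, entity_type=None, contains=None):
    """Keep chunks with at least one entity matching the type and/or text substring.

    The text match is a case-insensitive substring test; None means "any".
    """
    needle = contains.lower() if contains else None
    kept = []
    for chunk in chunks:
        for entity in chunk.get("entities") or []:  # entities is optional
            type_ok = entity_type is None or entity.get("type") == entity_type
            text_ok = needle is None or needle in entity.get("text", "").lower()
            if type_ok and text_ok:
                kept.append(chunk)
                break  # one matching entity is enough
    return kept


# Made-up chunks shaped like the assumed response.
chunks = [
    {"chunk_idx": 0, "entities": [{"text": "TSMC", "type": "ORG"}]},
    {"chunk_idx": 1, "entities": [{"text": "$2 billion", "type": "MONEY"}]},
    {"chunk_idx": 2},
]
print([c["chunk_idx"] for c in filter_chunks(chunks, entity_type="ORG", contains="tsmc")])
```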

Code Example

import requests

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"
HEADERS = {"X-API-Key": API_KEY}

# Find earnings call chunks that discuss competition, then read the NER entities
# attached to each chunk to see which organizations are actually named.
# Entity labels are more reliable than raw keyword matching.

ticker = "NVDA"
period = "2024-Q3"

# Semantic search for competitive landscape discussion
competitive_resp = requests.post(
    f"{API_BASE}/v1/embeddings/search",
    json={
        "query": "competitive landscape market share AMD Intel competition alternative GPU solutions",
        "tickers": [ticker],
        "period_start": period,
        "period_end": period,
        "top_k": 5,
    },
    headers=HEADERS,
    timeout=30,
)
competitive_resp.raise_for_status()

print(f"Competitive mentions in {ticker} {period} earnings call:")
for r in competitive_resp.json()["results"]:
    entities = r.get("entities") or []  # the entities field is optional
    org_entities = [e["text"] for e in entities if e.get("type") == "ORG"]
    print(f"\nChunk {r['chunk_idx']} (score: {r['score']:.3f})")
    print(f"  Organizations mentioned: {org_entities}")
    print(f"  Text: {r['text'][:200]}")

# Cross-reference: do competitors mention NVIDIA in their own calls?
print("\n\nDo AMD and Intel mention NVIDIA in their earnings calls?")
for comp_ticker in ["AMD", "INTC"]:
    resp = requests.post(
        f"{API_BASE}/v1/embeddings/search",
        json={
            "query": "NVIDIA competitive products data center GPU market",
            "tickers": [comp_ticker],
            "period_start": period,
            "period_end": period,
            "top_k": 2,
        },
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    # Semantic search always returns up to top_k chunks, so check the ORG
    # entities to confirm NVIDIA is actually named in them.
    named = [
        r for r in results
        if any(
            e.get("type") == "ORG" and "nvidia" in e.get("text", "").lower()
            for e in (r.get("entities") or [])
        )
    ]
    print(f"  {comp_ticker}: {len(named)} of {len(results)} top chunks name NVIDIA")
