A data modeling pattern that tracks two independent time axes — when something happened in the real world and when your system first learned about it — enabling rigorous point-in-time queries.
In Plain English
Most databases record what is true right now. Update a row and the old value is gone — overwritten. This is fine for many applications, but it's catastrophic for financial data analysis and backtesting. If you're trying to reconstruct what you knew about a company on October 1, 2022, you need more than just today's data. You need the data as it existed on that specific date, including all the revisions and corrections that have happened since.
Bitemporal data solves this by tracking two independent time dimensions for every piece of information. The first is called "effective time" — when did this fact become true in the real world? An earnings report that covers Q3 2024 has an effective date of the quarter end (September 30, 2024). The second is called "transaction time" or "knowledge time" — when did your data system first record this fact? The earnings report might be filed on October 25, 2024 and ingested by your system on October 26, 2024. That's the knowledge time.
Why do both matter? Consider a company that restates its earnings six months after the original release. A bitemporal database records both versions: the original numbers with their original knowledge time, and the restated numbers with a new knowledge time. When you backtest a strategy as of October 2024, you use only data with knowledge_ts ≤ October 2024 — you don't accidentally use the restated numbers that your system didn't know about yet.
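The restatement scenario above can be sketched in a few lines of plain Python. The numbers, dates, and the `as_of` helper below are illustrative assumptions, not real AAPL figures or a real schema; the point is that both versions coexist in one append-only collection and the knowledge-time filter selects between them:

```python
from datetime import date

# Hypothetical example: an EPS figure originally reported, then restated.
# Both versions live in the same append-only table; only knowledge_ts differs.
rows = [
    {"metric": "eps", "value": 1.64,
     "effective_ts": date(2024, 9, 30), "knowledge_ts": date(2024, 10, 26)},
    {"metric": "eps", "value": 1.21,   # restatement, learned months later
     "effective_ts": date(2024, 9, 30), "knowledge_ts": date(2025, 4, 15)},
]

def as_of(rows, t):
    """Per effective period, return the latest value known on or before t."""
    known = [r for r in rows if r["knowledge_ts"] <= t]
    latest = {}
    for r in sorted(known, key=lambda r: r["knowledge_ts"]):
        latest[r["effective_ts"]] = r  # later knowledge overwrites earlier
    return latest

# A backtest dated October 2024 sees only the original number...
print(as_of(rows, date(2024, 10, 31))[date(2024, 9, 30)]["value"])  # -> 1.64
# ...while a query run after the restatement sees the corrected figure.
print(as_of(rows, date(2025, 6, 1))[date(2024, 9, 30)]["value"])    # -> 1.21
```

The backtest query never needs to know that a restatement exists; filtering on `knowledge_ts` alone reconstructs the historical view.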
This pattern is especially important in finance because of look-ahead bias: the insidious error of using information in a backtest that wasn't actually available at the time of the simulated trade. Survivorship bias (using only companies that survived) is one form. Using restated financials as if they were the original reports is another. Bitemporal modeling eliminates this class of error at the data layer.
Technical Definition
A bitemporal table has two mandatory timestamp columns (in addition to any domain attributes):
- effective_ts (valid time / business time): when the fact became true in the real world
- knowledge_ts (transaction time / system time): when the database recorded the fact
A bitemporal query "as of time t" retrieves facts where knowledge_ts <= t — the universe of information that was available at time t.
For an earnings call embedding, the bitemporal tuple is:
- effective_ts = fiscal period end date (e.g., 2024-09-30)
- knowledge_ts = ingestion timestamp (e.g., 2024-10-26T03:15:00Z)
SQL point-in-time query pattern:
SELECT *
FROM embeddings_transcripts
WHERE ticker = 'AAPL'
AND knowledge_ts <= '2024-10-15 00:00:00' -- only data known by this date
ORDER BY effective_ts DESC

The table is append-only: records are never updated; new records are inserted with the current knowledge_ts instead. This preserves a full audit history and keeps the data immutable for compliance purposes.
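The append-only discipline can be demonstrated end to end with the standard-library sqlite3 module. The table and column names below are illustrative, not VectorFin's actual schema; a correction arrives as a new INSERT with a fresh knowledge_ts, never as an UPDATE:

```python
import sqlite3

# Minimal append-only bitemporal table (illustrative schema).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE facts (
        ticker TEXT, effective_ts TEXT, knowledge_ts TEXT, value REAL
    )
""")
# A correction is a new row with the current knowledge_ts -- never an UPDATE.
con.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", [
    ("AAPL", "2024-09-30", "2024-10-26T03:15:00Z", 1.64),  # original ingest
    ("AAPL", "2024-09-30", "2025-04-15T02:00:00Z", 1.21),  # later restatement
])

def value_as_of(con, ticker, effective_ts, t):
    """Latest value for a period, using only rows known by time t."""
    row = con.execute("""
        SELECT value FROM facts
        WHERE ticker = ? AND effective_ts = ? AND knowledge_ts <= ?
        ORDER BY knowledge_ts DESC LIMIT 1
    """, (ticker, effective_ts, t)).fetchone()
    return row[0] if row else None

print(value_as_of(con, "AAPL", "2024-09-30", "2024-11-01"))  # -> 1.64
print(value_as_of(con, "AAPL", "2024-09-30", "2025-06-01"))  # -> 1.21
```

Because no row is ever mutated, the query for any past knowledge time always returns exactly what the system knew then, which is what auditors and backtests both need.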
How VectorFin Uses This
Every table in VectorFin's data warehouse is bitemporal. The schema for every signal table includes both effective_ts and knowledge_ts:
-- Signals table structure
ticker VARCHAR
date DATE -- the business date (effective)
score DOUBLE
components STRUCT(...)
effective_ts TIMESTAMP -- = date at market close (e.g., 2024-10-01 21:00:00 UTC)
knowledge_ts TIMESTAMP -- = when signals_writer job ran (~02:00 UTC next day)

This gap between effective_ts and knowledge_ts — typically 5-8 hours for VectorFin signals — represents the processing pipeline latency and is critical for realistic backtesting: your strategy cannot trade at the close on the signal's effective date using a signal that was not computed until ~02:00 UTC the next morning.
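The latency rule can be made concrete with the example timestamps from the schema comments above. This is a sketch, assuming the 21:00 UTC close and ~02:00 UTC writer run described in the schema; the key invariant is that a simulated order must never be timestamped before knowledge_ts:

```python
from datetime import datetime, timezone

# Timestamps taken from the schema comments above (assumed for illustration).
effective_ts = datetime(2024, 10, 1, 21, 0, tzinfo=timezone.utc)  # market close
knowledge_ts = datetime(2024, 10, 2, 2, 0, tzinfo=timezone.utc)   # writer job ran

# Pipeline lag: how long after the fact the system actually knew the signal.
lag_hours = (knowledge_ts - effective_ts).total_seconds() / 3600
print(f"pipeline lag: {lag_hours:.0f}h")  # -> pipeline lag: 5h

# A realistic backtest may act no earlier than knowledge_ts.
order_ts = max(effective_ts, knowledge_ts)
assert order_ts >= knowledge_ts
```

A backtest that instead timestamps orders at effective_ts quietly trades on information that arrived hours later, which is exactly the look-ahead bias this pattern exists to prevent.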
GET https://api.vectorfinancials.com/v1/signals/whystock-score/AAPL?date=2024-10-01

The API applies the knowledge_ts filter automatically unless you explicitly request raw bitemporal data.
Code Example
import duckdb
import pandas as pd
conn = duckdb.connect()
conn.execute("INSTALL iceberg; LOAD iceberg;")
# Point-in-time query: what signals were available for a backtest on 2024-10-15?
# CRITICAL: use knowledge_ts <= backtest_date, not effective_ts
backtest_date = "2024-10-15"
df = conn.execute(f"""
SELECT
ticker,
date AS effective_date,
score,
knowledge_ts,
DATEDIFF('hour', effective_ts, knowledge_ts) AS pipeline_lag_hours
FROM vectorfinancials.signals.whystock_score
WHERE knowledge_ts <= TIMESTAMP '{backtest_date} 23:59:59'
AND date >= DATE '{backtest_date}' - INTERVAL 7 DAYS
AND date <= DATE '{backtest_date}'
ORDER BY date DESC, score DESC
LIMIT 20
""").df()
print(f"Signals available for backtest as of {backtest_date}:")
print(df.to_string(index=False))
# Verify: no rows have knowledge_ts after the backtest cutoff (look-ahead check).
# Compare against an explicit end-of-day Timestamp: comparing against the bare
# date string would falsely flag rows ingested during the day on 2024-10-15.
cutoff = pd.Timestamp(f"{backtest_date} 23:59:59")
future_data = df[df["knowledge_ts"] > cutoff]
assert len(future_data) == 0, f"Look-ahead bias detected! {len(future_data)} rows"
print("\nLook-ahead bias check: PASSED")