VectorFin/Glossary/Point-in-Time Query
Data Engineering

What is Point-in-Time Query?

A database query that retrieves the data exactly as it existed at a specific historical moment, used to reconstruct past states without contamination from later information.

In Plain English

A point-in-time query asks: "What did our database know on a specific past date?" It reconstructs the exact state of the data as it existed at that moment — not today's view of history, but the genuinely historical state.

This distinction matters enormously in financial analysis. Today's database might contain restated earnings, corrected company filings, and signals computed from data that was later revised. If you're backtesting a strategy as of January 2022, you need January 2022's data — the numbers that were actually available to investors on that date, including all their errors and incompleteness.

Imagine you're an archaeologist versus a historian. The historian can use everything we know today to reconstruct the past. The archaeologist must work only from evidence that existed at a given point in time. Financial backtesting requires the archaeologist's approach. Using the historian's approach — current knowledge of past events — generates artificially strong backtest results that collapse in live trading.

Point-in-time queries are implemented in two main ways. The simpler way is a filter on an ingestion (knowledge) timestamp: WHERE knowledge_ts <= backtest_date returns only the records that had been loaded into the database by the backtest date. The more sophisticated way uses table snapshots (Iceberg time travel): a query such as SELECT * FROM table TIMESTAMP AS OF '2022-01-01' (Spark SQL syntax) runs against the snapshot of the table that existed at that timestamp.

For the cleanest backtests, you want both filters: knowledge_ts <= as_of to exclude anything published after the backtest date (including restatements), and effective_ts to scope the query to the business-relevant date range.
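As a concrete sketch of the two-filter approach, here is a minimal pure-Python version over a toy bitemporal record set. The field names knowledge_ts and effective_ts match the definitions in this article; the point_in_time helper and the sample values are illustrative, not part of any API:

```python
from datetime import date

# Toy bitemporal table: AAPL's quarterly EPS was first reported as 1.40
# on 2022-01-28, then restated to 1.46 on 2022-03-01.
records = [
    {"ticker": "AAPL", "eps": 1.40,
     "effective_ts": date(2021, 12, 31), "knowledge_ts": date(2022, 1, 28)},
    {"ticker": "AAPL", "eps": 1.46,
     "effective_ts": date(2021, 12, 31), "knowledge_ts": date(2022, 3, 1)},
]

def point_in_time(records, t_obs, t_eff):
    """Latest record per (ticker, effective_ts) known by t_obs and effective by t_eff."""
    visible = [r for r in records
               if r["knowledge_ts"] <= t_obs and r["effective_ts"] <= t_eff]
    latest = {}
    for r in sorted(visible, key=lambda r: r["knowledge_ts"]):
        latest[(r["ticker"], r["effective_ts"])] = r  # later knowledge wins
    return list(latest.values())

# A backtest observing on 2022-02-15 sees the original 1.40;
# one observing on 2022-04-01 sees the restated 1.46.
as_of_feb = point_in_time(records, date(2022, 2, 15), date(2022, 1, 1))
as_of_apr = point_in_time(records, date(2022, 4, 1), date(2022, 1, 1))
```

The key move is that restatements never overwrite rows; they append new rows with a later knowledge_ts, so earlier observation dates remain reconstructable.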

Technical Definition

A point-in-time query over a bitemporal table T reconstructs the state of T as observed at time t_obs for facts effective at time t_eff:

SELECT *
FROM T
WHERE knowledge_ts <= t_obs     -- only data known at t_obs
  AND effective_ts <= t_eff     -- only facts effective by t_eff
-- For slowly-changing dimensions, additionally:
  AND (end_knowledge_ts IS NULL OR end_knowledge_ts > t_obs)

Iceberg time travel (Spark SQL syntax shown; Trino and Flink use slightly different forms):

-- By timestamp (reads the snapshot current as of this time)
SELECT * FROM table TIMESTAMP AS OF '2022-01-01 00:00:00'

-- By snapshot ID
SELECT * FROM table VERSION AS OF 1234567890

-- Incremental processing between two snapshots is done through read
-- options (e.g. start-snapshot-id / end-snapshot-id in Spark) rather
-- than a dedicated SQL clause.
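Under the hood, a timestamp-based time-travel query resolves to the latest snapshot committed at or before the requested time. A minimal sketch of that resolution logic, with a made-up snapshot history:

```python
from datetime import datetime

# Hypothetical snapshot history: (snapshot_id, commit_time)
snapshots = [
    (1001, datetime(2021, 12, 30, 9, 0)),
    (1002, datetime(2021, 12, 31, 18, 0)),
    (1003, datetime(2022, 1, 2, 6, 0)),
]

def snapshot_as_of(snapshots, ts):
    """Return the ID of the latest snapshot committed at or before ts."""
    eligible = [(sid, t) for sid, t in snapshots if t <= ts]
    if not eligible:
        raise ValueError(f"no snapshot exists at or before {ts}")
    return max(eligible, key=lambda s: s[1])[0]

# A query as of 2022-01-01 reads snapshot 1002, not 1003.
sid = snapshot_as_of(snapshots, datetime(2022, 1, 1))
```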

Point-in-time join (the hard part): joining two bitemporal tables where both must be as-of the same time. Example: joining a signals table (knowledge_ts, effective_ts) to a fundamentals table (knowledge_ts, fiscal_period) requires careful AS-OF matching to avoid look-ahead in either table.
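One way to implement such an AS-OF join in pandas is merge_asof, which matches each signal row to the most recent fundamentals row whose knowledge_ts does not exceed the signal's own. The table contents below are invented for illustration:

```python
import pandas as pd

signals = pd.DataFrame({
    "knowledge_ts": pd.to_datetime(["2022-01-10", "2022-02-10"]),
    "ticker": ["AAPL", "AAPL"],
    "score": [0.61, 0.58],
})
fundamentals = pd.DataFrame({
    "knowledge_ts": pd.to_datetime(["2022-01-05", "2022-02-01"]),
    "ticker": ["AAPL", "AAPL"],
    "eps": [1.40, 1.46],  # restated value published 2022-02-01
})

# Both frames must be sorted on the "on" key for merge_asof.
joined = pd.merge_asof(
    signals.sort_values("knowledge_ts"),
    fundamentals.sort_values("knowledge_ts"),
    on="knowledge_ts",
    by="ticker",
    direction="backward",  # only fundamentals already known at each signal's time
)
```

direction="backward" is what prevents look-ahead: the January signal joins to the original 1.40 EPS, and only the February signal sees the restated 1.46.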

How VectorFin Uses This

VectorFin's API implements point-in-time queries as a first-class feature via the as_of parameter:

GET https://api.vectorfinancials.com/v1/signals/whystock-score/AAPL
    ?date=2024-10-01         # effective date
    &as_of=2024-10-15        # knowledge_ts cutoff

Without as_of, the API returns the latest available data for the effective date. With as_of, it returns only data that had been ingested by the as_of timestamp — correctly simulating what would have been available on that date.

For bulk historical queries, the Iceberg tables support full time travel. Pro customers can reconstruct the complete state of any signal table at any point in the past using Apache Iceberg's snapshot history.

Code Example

import requests
import pandas as pd
from datetime import date, timedelta

API_BASE = "https://api.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

def backtest_signals(
    tickers: list[str],
    start_date: str,
    end_date: str,
    rebalance_frequency_days: int = 21,
) -> pd.DataFrame:
    """
    Collect point-in-time signal history for a universe of tickers.
    Correctly uses as_of parameter to prevent look-ahead bias.
    """
    current = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    results = []

    while current <= end:
        trade_date = current.isoformat()

        for ticker in tickers:
            resp = requests.get(
                f"{API_BASE}/v1/signals/whystock-score/{ticker}",
                params={
                    "date": trade_date,
                    "as_of": trade_date,  # point-in-time: only data known by trade_date
                },
                headers={"X-API-Key": API_KEY},
            )
            if not resp.ok:
                continue
            data = resp.json()
            if data.get("score") is None:
                continue  # no point-in-time value for this date
            results.append({
                "trade_date": trade_date,
                "ticker": ticker,
                "score": data["score"],
                "knowledge_ts": data["knowledge_ts"],
            })

        current += timedelta(days=rebalance_frequency_days)

    return pd.DataFrame(results)

# Run a point-in-time backtest over Q4 2024
signals_history = backtest_signals(
    tickers=["AAPL", "MSFT", "NVDA", "GOOGL", "META"],
    start_date="2024-10-01",
    end_date="2024-12-31",
    rebalance_frequency_days=21,  # roughly monthly (21-day step)
)

print(f"Collected {len(signals_history)} point-in-time observations")
print(signals_history.to_string(index=False))

Put Point-in-Time Query to work in your pipeline

Access AI-ready financial data — embeddings, signals, Iceberg tables.