DuckDB + VectorFin
Pull VectorFin signals into DuckDB via the REST API for fast local analytics. Works on every plan; no infrastructure required.
Prerequisites
Connection Guide
Install DuckDB and a HTTP client
DuckDB runs in-process from Python (or as a CLI). For the REST API path, you only need requests + duckdb.
pip install duckdb requests pandasPull signals via the REST API into a DataFrame
Fetch a ticker's signals as JSON, hand the DataFrame to DuckDB, and run SQL. Round-trip latency dominates; for repeated queries, materialize once.
import duckdb, requests, pandas as pd, os
VF_API_KEY = os.environ["VECTORFIN_API_KEY"]
def fetch_signals(ticker: str, date_from: str = "2024-01-01") -> pd.DataFrame:
r = requests.get(
f"https://api.vectorfinancials.com/v1/signals/{ticker}",
headers={"X-API-Key": VF_API_KEY},
params={"date_from": date_from, "limit": 365},
timeout=10,
)
r.raise_for_status()
return pd.DataFrame(r.json())
# Build a small basket
basket = pd.concat([
fetch_signals(t).assign(ticker=t)
for t in ["AAPL", "MSFT", "NVDA", "GOOGL", "META"]
])
con = duckdb.connect()
con.register("signals", basket)
con.sql("""
SELECT ticker, AVG(score) AS avg_score, COUNT(*) AS n_days
FROM signals
WHERE date >= '2024-06-01'
GROUP BY ticker
ORDER BY avg_score DESC
""").show()Materialize to a Parquet file for repeat workloads
API quotas matter. For repeated analysis, dump the basket to Parquet once and let DuckDB read from disk on subsequent runs.
# One-shot: pull and persist
con.sql("COPY signals TO 'vectorfin_signals.parquet' (FORMAT PARQUET)")
# Subsequent runs: zero API calls
con = duckdb.connect()
df = con.sql("SELECT * FROM 'vectorfin_signals.parquet' WHERE ticker = 'AAPL'").df()
print(df.tail())Embeddings: fetch by fiscal period, similarity in DuckDB
For semantic search across a small set of tickers, pull the chunks via the REST API and run cosine in DuckDB with list_dot_product / list_cosine_similarity.
# Pull embeddings for a basket and load into DuckDB
import numpy as np
rows = []
for ticker in ["AAPL", "MSFT", "NVDA"]:
r = requests.get(
f"https://api.vectorfinancials.com/v1/embeddings/{ticker}",
headers={"X-API-Key": VF_API_KEY},
params={"fiscal_period": "2024-Q3", "limit": 100},
timeout=10,
)
r.raise_for_status()
rows.extend(r.json())
embs = pd.DataFrame(rows)
con.register("embeddings", embs)
# Random query vector — replace with one from your embedding model
query = (np.random.randn(768)).tolist()
con.sql(f"""
WITH q AS (SELECT {query}::DOUBLE[] AS v)
SELECT ticker, fiscal_period, chunk_idx,
list_cosine_similarity(embedding, q.v) AS sim
FROM embeddings, q
ORDER BY sim DESC
LIMIT 10
""").show()Available Tables
All 7 VectorFin data tables — bitemporal (effective_ts + knowledge_ts), append-only, nightly updates.
GET /v1/signals/{ticker}Fetch signals → register as a DuckDB view▼
con.register("signals", pd.DataFrame(requests.get(url, headers=hdr).json()))GET /v1/embeddings/{ticker}Fetch chunks → run list_cosine_similarity in DuckDB▼
con.sql("SELECT list_cosine_similarity(embedding, ?::DOUBLE[]) FROM embeddings", [query])COPY ... TO ParquetPersist for repeat queries without burning API quota▼
con.sql("COPY signals TO 'vf.parquet' (FORMAT PARQUET)")Related Integrations
Start querying in 5 minutes
Sign up for VectorFin and get immediate API access.