VectorFin/Glossary/BigQuery Analytics Hub
Data Engineering

What is BigQuery Analytics Hub?

Google Cloud's data sharing marketplace that lets publishers list datasets and subscribers access them as native BigQuery tables without data movement.

In Plain English

Every data company faces the same distribution problem: how do you get your data into customers' hands without building custom integrations for each customer's environment, without making them copy and maintain a separate dataset, and without creating a support burden every time your schema evolves?

BigQuery Analytics Hub solves this for Google Cloud customers. A data publisher (like VectorFin) lists a dataset on the Hub. A subscriber (like a quant fund using BigQuery) subscribes to the listing with a few clicks and immediately has the dataset available as a native BigQuery table in their own GCP project — no data copying, no ETL, no maintenance. When VectorFin updates its data, subscribers automatically see the latest version.

The subscriber pays only their own BigQuery compute costs for queries. They don't have to manage pipelines, credentials, or schema migrations. For them, the dataset looks and behaves like any other table in their BigQuery environment — they can join it with their own data, build dashboards on it, schedule queries against it, and apply their own access controls.

For data publishers, Analytics Hub eliminates the need for custom data delivery infrastructure. Instead of building S3 push pipelines, SFTP transfers, or bespoke API integrations for each customer, VectorFin lists once on Analytics Hub and all subscribing BigQuery customers get automatic access.

The technology behind this is BigQuery's linked datasets and BigLake external table support for Apache Iceberg: the subscriber's query reads VectorFin's Parquet files on GCS in place, without physically copying the data into the subscriber's project.

Technical Definition

Analytics Hub consists of:

Exchanges: Containers for data listings (e.g., a "VectorFin Financial Data" exchange).

Listings: Individual dataset offerings within an exchange. Each listing points to a BigQuery dataset or BigLake Iceberg table. Listings can be public, private (specific subscribers), or restricted.

Linked datasets: When a subscriber subscribes to a listing, BigQuery creates a "linked dataset" in their project — a virtual dataset that references the publisher's actual data. Queries run against the linked dataset transparently read the publisher's data.

BigLake: BigQuery's managed external table framework. BigLake tables point to GCS Parquet or Iceberg files. Access control is managed at the table level, not the storage level — the publisher controls access without exposing GCS bucket permissions.
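As an illustrative sketch of the publisher side, a BigLake table over Iceberg metadata can be declared with DDL along these lines. All identifiers here (project, connection, bucket, metadata path) are hypothetical placeholders, and the DDL is built as a Python string so it can be submitted via the standard BigQuery client:

```python
# Hypothetical publisher-side DDL for a BigLake external table over
# Iceberg metadata on GCS. Every identifier below is illustrative.
PROJECT = "vectorfin-publisher"   # assumed publisher project
CONNECTION = "us.biglake-conn"    # assumed BigLake connection (region.name)
METADATA_URI = "gs://vectorfin-data/signals/whystock_score/metadata/v1.metadata.json"

ddl = f"""
CREATE EXTERNAL TABLE `{PROJECT}.signals.whystock_score`
WITH CONNECTION `{PROJECT}.{CONNECTION}`
OPTIONS (
  format = 'ICEBERG',
  uris = ['{METADATA_URI}']
)
"""

# Submitting it is a one-liner once credentials are configured:
#   from google.cloud import bigquery
#   bigquery.Client(project=PROJECT).query(ddl).result()
print(ddl.strip())
```

Because the connection, not the GCS bucket ACL, governs access, the publisher never has to hand out storage permissions.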

Pricing model: the publisher pays for GCS storage and any Analytics Hub listing fees; the subscriber pays for BigQuery slot (or on-demand) usage when querying. There are no per-row API costs for bulk queries.
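A back-of-envelope for the subscriber side: under on-demand pricing, query cost is bytes scanned times the on-demand rate. A minimal sketch; the rate below is an assumed placeholder, not an official figure, so check current GCP pricing:

```python
# Rough subscriber-side cost estimate for on-demand BigQuery pricing.
# The rate is an ASSUMED placeholder -- check current GCP pricing.
ON_DEMAND_USD_PER_TIB = 6.25  # assumption, not an official figure

def query_cost_usd(bytes_scanned: int, rate: float = ON_DEMAND_USD_PER_TIB) -> float:
    """Cost = bytes scanned / 1 TiB * rate. BigQuery also bills a small
    per-table minimum per query; that floor is ignored here."""
    tib = bytes_scanned / 2**40
    return tib * rate

# A dry run (QueryJobConfig(dry_run=True)) reports total_bytes_processed
# without running the query, so you can estimate cost before paying:
#   job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
#   print(query_cost_usd(job.total_bytes_processed))
print(f"Scanning 50 GiB costs about ${query_cost_usd(50 * 2**30):.4f}")
```

The dry-run pattern is useful before scheduling recurring queries against a subscribed dataset.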

Schema evolution: because the data is physically in GCS (not copied), schema changes by the publisher are immediately visible to all subscribers. Additive changes require no subscriber-side migration; removing or renaming columns can still break subscriber queries, so the publisher's evolution policy matters.

How VectorFin Uses This

VectorFin publishes its signal and embedding datasets via BigQuery Analytics Hub for Pro plan customers on GCP. The subscription workflow:

1. VectorFin grants the customer's GCP project access to the private listing
2. Customer subscribes via the Analytics Hub console (5 minutes)
3. A linked dataset appears in the customer's BigQuery project
4. Customer queries VectorFin data with standard BigQuery SQL

No custom integration, no data pipeline, no credential management. Just SQL.
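The console flow can also be automated with the Analytics Hub client library (`google-cloud-bigquery-analyticshub`). A sketch; the project, location, exchange, and listing IDs are hypothetical, the subscribe call is defined but not executed here, and the exact request shape should be checked against the client docs:

```python
# Sketch: subscribing to an Analytics Hub listing programmatically.
# Resource IDs below are hypothetical placeholders.

def listing_resource_name(project: str, location: str,
                          exchange: str, listing: str) -> str:
    """Analytics Hub listings are addressed by this resource path."""
    return (f"projects/{project}/locations/{location}"
            f"/dataExchanges/{exchange}/listings/{listing}")

def subscribe(listing_name: str) -> None:
    # Imported here so the name-building sketch above runs without
    # the client library installed.
    from google.cloud import bigquery_analyticshub_v1 as hub
    client = hub.AnalyticsHubServiceClient()
    # Creates the linked dataset in the caller's project.
    client.subscribe_listing(name=listing_name)

name = listing_resource_name(
    "vectorfin-publisher", "us", "vectorfin-financial-data", "whystock-score")
print(name)
```

Automating the subscription is mainly useful for onboarding many projects under one organization; for a single project, the console flow is faster.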

Pro customers get access to:

  • vectorfinancials.signals.whystock_score — composite signal, daily, 5,000+ tickers
  • vectorfinancials.signals.volatility — GARCH forecasts, daily
  • vectorfinancials.signals.regime — market regime, daily
  • vectorfinancials.signals.sentiment_drift — NLP drift, quarterly
  • vectorfinancials.signals.anomaly — anomaly flags, daily
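After subscribing, these tables surface under whatever linked dataset name the customer chose. A small helper for building fully qualified references; the dataset name `vectorfin_linked` is an example, not a fixed value:

```python
# Map VectorFin listing tables to fully qualified references in the
# subscriber's own project. The linked dataset name is whatever was
# chosen at subscribe time; "vectorfin_linked" is an example.
SIGNAL_TABLES = ["whystock_score", "volatility", "regime",
                 "sentiment_drift", "anomaly"]

def linked_table(project: str, dataset: str, table: str) -> str:
    if table not in SIGNAL_TABLES:
        raise ValueError(f"unknown VectorFin table: {table}")
    return f"`{project}.{dataset}.{table}`"

refs = [linked_table("your-gcp-project", "vectorfin_linked", t)
        for t in SIGNAL_TABLES]
print("\n".join(refs))
```

Interpolating references this way keeps project and dataset names out of hand-written SQL strings.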

Code Example

from google.cloud import bigquery

# After subscribing to VectorFin on Analytics Hub, the linked dataset
# appears in your own BQ project as if it were native data

client = bigquery.Client(project="your-gcp-project")

# Standard BigQuery SQL — joins VectorFin signals with your own portfolio
query = """
WITH portfolio AS (
    SELECT ticker, shares, avg_cost_basis
    FROM `your-project.your_dataset.holdings`
    WHERE date = CURRENT_DATE()
),
signals AS (
    SELECT
        ticker,
        score AS whystock_score,
        garch_vol_21d,
        regime,
        knowledge_ts
    FROM `your-project.vectorfin_linked.whystock_score`
    WHERE DATE(effective_ts) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
      AND knowledge_ts <= CURRENT_TIMESTAMP()
)
SELECT
    p.ticker,
    p.shares,
    s.whystock_score,
    s.garch_vol_21d,
    CASE
        WHEN s.whystock_score > 0.7 AND s.regime = 'bull' THEN 'OVERWEIGHT'
        WHEN s.whystock_score < 0.3 OR s.regime = 'bear' THEN 'UNDERWEIGHT'
        ELSE 'NEUTRAL'
    END AS recommendation
FROM portfolio p
LEFT JOIN signals s USING (ticker)
ORDER BY s.whystock_score DESC NULLS LAST
"""

result = client.query(query).to_dataframe()
print("Portfolio positions with VectorFin signals:")
print(result.to_string(index=False))

Put BigQuery Analytics Hub to work in your pipeline

Access AI-ready financial data — embeddings, signals, Iceberg tables.