VectorFin/Glossary/BigQuery Analytics Hub
Data Engineering

What is BigQuery Analytics Hub?

Google Cloud's data sharing marketplace that lets publishers list datasets and subscribers access them as native BigQuery tables without data movement.

In Plain English

Every data company faces the same distribution problem: how do you get your data into customers' hands without building custom integrations for each customer's environment, without making them copy and maintain a separate dataset, and without creating a support burden every time your schema evolves?

BigQuery Analytics Hub solves this for Google Cloud customers. A data publisher (like VectorFin) lists a dataset on the Hub. A subscriber (like a quant fund using BigQuery) subscribes to the listing with a few clicks and immediately has the dataset available as a native BigQuery table in their own GCP project — no data copying, no ETL, no maintenance. When VectorFin updates its data, subscribers automatically see the latest version.

The subscriber pays only their own BigQuery compute costs for queries. They don't have to manage pipelines, credentials, or schema migrations. For them, the dataset looks and behaves like any other table in their BigQuery environment — they can join it with their own data, build dashboards on it, schedule queries against it, and apply their own access controls.

For data publishers, Analytics Hub eliminates the need for custom data delivery infrastructure. Instead of building S3 push pipelines, SFTP transfers, or bespoke API integrations for each customer, VectorFin lists once on Analytics Hub and all subscribing BigQuery customers get automatic access.

The technology behind this is BigQuery's linked datasets and BigLake external table support for Apache Iceberg: the subscriber's query reads VectorFin's Parquet files on GCS in place, without physically copying the data into the subscriber's project.

Technical Definition

Analytics Hub consists of:

Exchanges: Containers for data listings (e.g., a "VectorFin Financial Data" exchange).

Listings: Individual dataset offerings within an exchange. Each listing points to a BigQuery dataset or BigLake Iceberg table. Listings can be public, private (specific subscribers), or restricted.

Linked datasets: When a subscriber subscribes to a listing, BigQuery creates a "linked dataset" in their project — a virtual dataset that references the publisher's actual data. Queries run against the linked dataset transparently read the publisher's data.

BigLake: BigQuery's managed external table framework. BigLake tables point to GCS Parquet or Iceberg files. Access control is managed at the table level, not the storage level — the publisher controls access without exposing GCS bucket permissions.
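As an illustrative sketch of the publisher side, a BigLake table over Iceberg metadata can be declared with DDL along these lines. All identifiers here (project, connection, bucket, metadata path) are hypothetical placeholders, and the DDL is built as a Python string so it can be submitted via the standard BigQuery client:

```python
# Hypothetical publisher-side DDL for a BigLake external table over
# Iceberg metadata on GCS. Every identifier below is illustrative.
PROJECT = "vectorfin-publisher"   # assumed publisher project
CONNECTION = "us.biglake-conn"    # assumed BigLake connection (region.name)
METADATA_URI = "gs://vectorfin-data/signals/whystock_score/metadata/v1.metadata.json"

ddl = f"""
CREATE EXTERNAL TABLE `{PROJECT}.signals.whystock_score`
WITH CONNECTION `{PROJECT}.{CONNECTION}`
OPTIONS (
  format = 'ICEBERG',
  uris = ['{METADATA_URI}']
)
"""

# Submitting it is a one-liner once credentials are configured:
#   from google.cloud import bigquery
#   bigquery.Client(project=PROJECT).query(ddl).result()
print(ddl.strip())
```

Because the connection, not the GCS bucket ACL, governs access, the publisher never has to hand out storage permissions.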

Pricing model: the publisher pays for GCS storage and any Analytics Hub listing fees; the subscriber pays for BigQuery slot (or on-demand) usage when querying. There are no per-row API costs for bulk queries.
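A back-of-envelope for the subscriber side: under on-demand pricing, query cost is bytes scanned times the on-demand rate. A minimal sketch; the rate below is an assumed placeholder, not an official figure, so check current GCP pricing:

```python
# Rough subscriber-side cost estimate for on-demand BigQuery pricing.
# The rate is an ASSUMED placeholder -- check current GCP pricing.
ON_DEMAND_USD_PER_TIB = 6.25  # assumption, not an official figure

def query_cost_usd(bytes_scanned: int, rate: float = ON_DEMAND_USD_PER_TIB) -> float:
    """Cost = bytes scanned / 1 TiB * rate. BigQuery also bills a small
    per-table minimum per query; that floor is ignored here."""
    tib = bytes_scanned / 2**40
    return tib * rate

# A dry run (QueryJobConfig(dry_run=True)) reports total_bytes_processed
# without running the query, so you can estimate cost before paying:
#   job = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
#   print(query_cost_usd(job.total_bytes_processed))
print(f"Scanning 50 GiB costs about ${query_cost_usd(50 * 2**30):.4f}")
```

The dry-run pattern is useful before scheduling recurring queries against a subscribed dataset.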

Schema evolution: because the data is physically in GCS (not copied), schema changes by the publisher are immediately visible to all subscribers. Additive changes require no subscriber-side migration; removing or renaming columns can still break subscriber queries, so the publisher's evolution policy matters.

How VectorFin Uses This

VectorFin publishes its signal and embedding datasets via BigQuery Analytics Hub for Pro plan customers on GCP. The subscription workflow:

1. VectorFin grants the customer's GCP project access to the private listing
2. Customer subscribes via the Analytics Hub console (5 minutes)
3. A linked dataset appears in the customer's BigQuery project
4. Customer queries VectorFin data with standard BigQuery SQL

No custom integration, no data pipeline, no credential management. Just SQL.
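The console flow can also be automated with the Analytics Hub client library (`google-cloud-bigquery-analyticshub`). A sketch; the project, location, exchange, and listing IDs are hypothetical, the subscribe call is defined but not executed here, and the exact request shape should be checked against the client docs:

```python
# Sketch: subscribing to an Analytics Hub listing programmatically.
# Resource IDs below are hypothetical placeholders.

def listing_resource_name(project: str, location: str,
                          exchange: str, listing: str) -> str:
    """Analytics Hub listings are addressed by this resource path."""
    return (f"projects/{project}/locations/{location}"
            f"/dataExchanges/{exchange}/listings/{listing}")

def subscribe(listing_name: str) -> None:
    # Imported here so the name-building sketch above runs without
    # the client library installed.
    from google.cloud import bigquery_analyticshub_v1 as hub
    client = hub.AnalyticsHubServiceClient()
    # Creates the linked dataset in the caller's project.
    client.subscribe_listing(name=listing_name)

name = listing_resource_name(
    "vectorfin-publisher", "us", "vectorfin-financial-data", "whystock-score")
print(name)
```

Automating the subscription is mainly useful for onboarding many projects under one organization; for a single project, the console flow is faster.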

Pro customers get access to:

  • vectorfinancials.signals.whystock_score — composite signal, daily, 5,000+ tickers
  • vectorfinancials.signals.volatility — GARCH forecasts, daily
  • vectorfinancials.signals.regime — market regime, daily
  • vectorfinancials.signals.sentiment_drift — NLP drift, quarterly
  • vectorfinancials.signals.anomaly — anomaly flags, daily
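After subscribing, these tables surface under whatever linked dataset name the customer chose. A small helper for building fully qualified references; the dataset name `vectorfin_linked` is an example, not a fixed value:

```python
# Map VectorFin listing tables to fully qualified references in the
# subscriber's own project. The linked dataset name is whatever was
# chosen at subscribe time; "vectorfin_linked" is an example.
SIGNAL_TABLES = ["whystock_score", "volatility", "regime",
                 "sentiment_drift", "anomaly"]

def linked_table(project: str, dataset: str, table: str) -> str:
    if table not in SIGNAL_TABLES:
        raise ValueError(f"unknown VectorFin table: {table}")
    return f"`{project}.{dataset}.{table}`"

refs = [linked_table("your-gcp-project", "vectorfin_linked", t)
        for t in SIGNAL_TABLES]
print("\n".join(refs))
```

Interpolating references this way keeps project and dataset names out of hand-written SQL strings.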

Code Example

from google.cloud import bigquery

# After subscribing to VectorFin on Analytics Hub, the linked dataset
# appears in your own BQ project as if it were native data

client = bigquery.Client(project="your-gcp-project")

# Standard BigQuery SQL — joins VectorFin signals with your own portfolio
query = """
WITH portfolio AS (
    SELECT ticker, shares, avg_cost_basis
    FROM `your-project.your_dataset.holdings`
    WHERE date = CURRENT_DATE()
),
signals AS (
    SELECT
        ticker,
        score AS whystock_score,
        garch_vol_21d,
        regime,
        knowledge_ts
    FROM `your-project.vectorfin_linked.whystock_score`
    WHERE DATE(effective_ts) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
      AND knowledge_ts <= CURRENT_TIMESTAMP()
)
SELECT
    p.ticker,
    p.shares,
    s.whystock_score,
    s.garch_vol_21d,
    CASE
        WHEN s.whystock_score > 0.7 AND s.regime = 'bull' THEN 'OVERWEIGHT'
        WHEN s.whystock_score < 0.3 OR s.regime = 'bear' THEN 'UNDERWEIGHT'
        ELSE 'NEUTRAL'
    END AS recommendation
FROM portfolio p
LEFT JOIN signals s USING (ticker)
ORDER BY s.whystock_score DESC NULLS LAST
"""

result = client.query(query).to_dataframe()
print("Portfolio positions with VectorFin signals:")
print(result.to_string(index=False))

Put BigQuery Analytics Hub to work in your pipeline

Access AI-ready financial data — embeddings, signals, Iceberg tables.