An open-source implementation of the Iceberg REST Catalog specification that enables multiple compute engines to read and write the same Iceberg tables through a standard API.
In Plain English
Apache Iceberg tables don't manage themselves. Someone has to keep track of which files make up each table, where those files live, what the current schema is, and what the current snapshot is. That's the catalog's job — it's the table of contents for your data lake.
Apache Polaris is an open-source catalog that implements the Iceberg REST Catalog specification: a standard HTTP API that any Iceberg-compatible engine can call to discover and interact with tables. Think of it as DNS for your tables. Snowflake asks Polaris, "Where is the signals.whystock_score table?" Polaris responds with the metadata file location. Snowflake then reads the data directly from GCS.
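That lookup-then-read flow can be sketched in a few lines of Python. The endpoint path follows the Iceberg REST spec, and the hostname and table name mirror the examples later on this page; the helper names here are illustrative, not part of any official client.

```python
import requests

POLARIS_URL = "https://catalog.vectorfinancials.com"  # VectorFin's catalog endpoint

def load_table_url(base: str, namespace: str, table: str) -> str:
    """Endpoint that resolves a table name to its current metadata file."""
    return f"{base}/v1/namespaces/{namespace}/tables/{table}"

def resolve_metadata_location(namespace: str, table: str, api_key: str) -> str:
    # Step 1: ask the catalog where the table's current metadata lives.
    resp = requests.get(
        load_table_url(POLARIS_URL, namespace, table),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    # Step 2 happens in the engine: it fetches this metadata JSON (and the
    # data files it references) directly from GCS; the catalog never serves data.
    return resp.json()["metadata-location"]

# Usage (requires a valid key):
#   resolve_metadata_location("signals", "whystock_score", "vf_your_api_key_here")
```

The catalog only hands out pointers; the bytes flow straight between the engine and object storage.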
Before standards like the Iceberg REST Catalog, each compute engine had its own catalog format. Your Spark cluster used Hive Metastore. Your Databricks cluster used Unity Catalog. Snowflake used its own proprietary table registry. Getting all three to read the same underlying data required complex ETL pipelines and constant synchronization.
Polaris eliminates this. One catalog, one source of truth, multiple engines. You create a table in Polaris, point it at a GCS location, and immediately both Snowflake and Databricks can query it as a native table in their respective SQL environments — with full ACID consistency, schema enforcement, and time travel.
The "open" in open-source matters here. Snowflake donated Polaris to the Apache Software Foundation in 2024, ensuring it remains vendor-neutral and that no single cloud provider controls the standard.
Technical Definition
The Iceberg REST Catalog API spec defines HTTP endpoints such as:

GET    /v1/namespaces                     # list namespaces
GET    /v1/namespaces/{ns}/tables         # list tables in namespace
GET    /v1/namespaces/{ns}/tables/{tbl}   # load table metadata
POST   /v1/namespaces/{ns}/tables         # create table
POST   /v1/namespaces/{ns}/tables/{tbl}   # commit table update
DELETE /v1/namespaces/{ns}/tables/{tbl}   # drop table

Authentication: OAuth2 token exchange or static credentials. Scopes: CATALOG, PRINCIPAL_ROLE:ALL.
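The OAuth2 token exchange can be sketched as follows. The `/v1/oauth/tokens` endpoint and the `client_credentials` form fields come from the Iceberg REST spec; the client ID and secret are placeholders, and the helper names are illustrative.

```python
import requests

POLARIS_URL = "https://catalog.vectorfinancials.com"

def token_request_payload(client_id: str, client_secret: str,
                          scope: str = "PRINCIPAL_ROLE:ALL") -> dict:
    """Form body for the OAuth2 client_credentials grant."""
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }

def fetch_token(client_id: str, client_secret: str) -> str:
    """Exchange client credentials for a short-lived bearer token."""
    resp = requests.post(
        f"{POLARIS_URL}/v1/oauth/tokens",
        data=token_request_payload(client_id, client_secret),  # form-encoded
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# Usage (requires real credentials):
#   token = fetch_token("my_client_id", "my_client_secret")
#   headers = {"Authorization": f"Bearer {token}"}
```

The returned bearer token is then sent on every subsequent catalog request.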
Key concepts:
- Warehouse: a GCS/S3 location prefix that Polaris manages
- Namespace: logical grouping of tables (analogous to a database schema)
- Principal role: access control — which clients can read/write which catalogs and namespaces
- Vended credentials: Polaris generates short-lived GCS credentials for the compute engine to directly access data files, without the engine needing long-term storage credentials
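Credential vending rides on the load-table call. A minimal sketch, assuming the spec-defined `X-Iceberg-Access-Delegation` header; the exact keys returned in the response's `config` map depend on the storage backend, so treat the parsing here as illustrative.

```python
import requests

POLARIS_URL = "https://catalog.vectorfinancials.com"

def delegated_headers(api_key: str) -> dict:
    """Headers asking the catalog to vend short-lived storage credentials."""
    return {
        "Authorization": f"Bearer {api_key}",
        # Spec-defined header: return temporary storage credentials for the
        # table's location alongside the table metadata.
        "X-Iceberg-Access-Delegation": "vended-credentials",
    }

def load_table_config(namespace: str, table: str, api_key: str) -> dict:
    resp = requests.get(
        f"{POLARIS_URL}/v1/namespaces/{namespace}/tables/{table}",
        headers=delegated_headers(api_key),
    )
    resp.raise_for_status()
    # Vended credentials arrive in the response's "config" map; the key names
    # are storage-specific (e.g. a GCS OAuth token for GCS-backed tables).
    return resp.json().get("config", {})

# Usage (requires a valid key):
#   cfg = load_table_config("signals", "whystock_score", "vf_your_api_key_here")
```

This is why the compute engine never needs long-term storage credentials of its own.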
Polaris vs alternatives: AWS Glue (AWS-specific), Hive Metastore (JVM-heavy, pre-dates the REST spec), Nessie (version-control-centric), Unity Catalog (Databricks-specific). Polaris is a fully open-source, vendor-neutral Iceberg REST Catalog implementation governed by the Apache Software Foundation.
How VectorFin Uses This
VectorFin runs Apache Polaris on Cloud Run at catalog.vectorfinancials.com. This catalog serves as the authoritative registry for all VectorFin Iceberg tables.
Pro plan customers are provisioned as principals in the Polaris catalog with read-only access to the vectorfinancials warehouse. They configure their Snowflake, Databricks, or BigQuery to use the catalog:
Snowflake external volume + catalog integration:
CREATE CATALOG INTEGRATION vectorfin_catalog
CATALOG_SOURCE = ICEBERG_REST
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = 'signals'
REST_CONFIG = ('CATALOG_URI' = 'https://catalog.vectorfinancials.com')
REST_AUTHENTICATION = ('TYPE' = 'BEARER', 'BEARER_TOKEN' = 'your_api_key');
CREATE ICEBERG TABLE signals_whystock_score
CATALOG = 'vectorfin_catalog'
CATALOG_TABLE_NAME = 'whystock_score';

Code Example
import requests

POLARIS_URL = "https://catalog.vectorfinancials.com"
API_KEY = "vf_your_api_key_here"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# List available namespaces
namespaces = requests.get(f"{POLARIS_URL}/v1/namespaces", headers=headers).json()
print("Available namespaces:")
for ns in namespaces.get("namespaces", []):
    print(f"  {'.'.join(ns)}")

# List tables in the signals namespace
tables = requests.get(
    f"{POLARIS_URL}/v1/namespaces/signals/tables",
    headers=headers,
).json()
print("\nSignal tables:")
for t in tables.get("identifiers", []):
    # Each identifier's namespace is a list of levels, so join it with dots
    print(f"  {'.'.join(t['namespace'])}.{t['name']}")

# Load table metadata (shows current snapshot, schema, partition spec)
table_meta = requests.get(
    f"{POLARIS_URL}/v1/namespaces/signals/tables/whystock_score",
    headers=headers,
).json()
print(f"\nwhystock_score current snapshot ID: {table_meta['metadata']['current-snapshot-id']}")
print(f"Schema columns: {[f['name'] for f in table_meta['metadata']['schemas'][-1]['fields']]}")