What is Format Switch?
A filing change that is large in words but small in meaning, such as a template swap or reformat, flagged so it can be down-weighted instead of mistaken for a real disclosure shift.
In Plain English
A format switch is the most common false positive in filing-change detection. An issuer changes the wrapper, not the message: a new document template, an HTML-to-text reflow, reordered subsections, or a paragraph reworded to say exactly the same thing. A word-level diff lights up; nothing of substance has changed.
These are noise, and they are frequent. Legal and reporting teams refresh boilerplate routinely. If you screen on raw lexical change alone, format switches will dominate your results and bury the handful of changes that actually matter.
VectorFin detects them by comparison. When a section's lexical change is high but its semantic change, measured over embeddings, stays low, the edit is almost certainly cosmetic. The Filing Change Signal sets a format_switch_suspected flag on exactly these cases so you can filter them out and keep your attention on substantive disclosure shifts.
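To make the lexical side of that comparison concrete, here is a minimal sketch of two word-level similarity measures applied to a reworded sentence. The tokenizer, the function names, and the sample sentences are illustrative assumptions; the source does not specify how VectorFin tokenizes text or computes its cosine and jaccard values.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_cosine(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that the two texts have in common."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Illustrative sentences (not from any real filing): same message, new wording.
old = "The Company may be adversely affected by changes in interest rates."
new = "Fluctuations in interest rates could harm our business."

print(round(lexical_cosine(old, new), 3), round(jaccard(old, new), 3))
```

Both scores come out low because the two sentences share only three tokens, even though a reader would say they mean much the same thing. An embedding-based similarity for the same pair would be expected to stay high, and that gap is what the flag looks for.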
Technical Definition
format_switch_suspected is a boolean derived from the relationship between lexical and semantic change for a section. Conceptually it is set when:
(1 − cosine) is high (large lexical change) AND (1 − cosine_embedding) is low (small semantic change).
Together these conditions imply that lex_sem_divergence = cosine_embedding − cosine is large and positive, since that quantity is simply lexical change minus semantic change: (1 − cosine) − (1 − cosine_embedding). The flag is the operational counterpart to the lexical-semantic divergence: the divergence is the continuous gap, the flag is the thresholded verdict.
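The conceptual rule above can be sketched as a small function. The two threshold constants are hypothetical values chosen only for illustration; the source does not publish the cut-offs VectorFin actually uses, and the real flag may be computed differently.

```python
# Hypothetical thresholds, for illustration only.
LEXICAL_CHANGE_MIN = 0.30   # "high" lexical change: 1 - cosine at or above this
SEMANTIC_CHANGE_MAX = 0.05  # "low" semantic change: 1 - cosine_embedding at or below this


def lex_sem_divergence(cosine: float, cosine_embedding: float) -> float:
    """Continuous gap: lexical change minus semantic change."""
    return cosine_embedding - cosine


def format_switch_suspected(
    cosine: float,
    cosine_embedding: float,
    lexical_min: float = LEXICAL_CHANGE_MIN,
    semantic_max: float = SEMANTIC_CHANGE_MAX,
) -> bool:
    """Thresholded verdict: many words changed, meaning barely moved."""
    lexical_change = 1.0 - cosine
    semantic_change = 1.0 - cosine_embedding
    return lexical_change >= lexical_min and semantic_change <= semantic_max


# Template swap: words changed a lot, meaning barely moved.
print(format_switch_suspected(cosine=0.55, cosine_embedding=0.98))
# Substantive rewrite: both words and meaning moved.
print(format_switch_suspected(cosine=0.55, cosine_embedding=0.70))
# Near-identical section: nothing much changed at all.
print(format_switch_suspected(cosine=0.97, cosine_embedding=0.99))
```

Checking the two conditions separately, rather than thresholding the divergence alone, keeps a substantive rewrite from being flagged when its lexical change is very large and its semantic change is only moderately smaller.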
How VectorFin Uses This
Every scored section of a FilingChangeRecord carries format_switch_suspected next to the raw cosine, jaccard, and cosine_embedding values. A typical screen drops rows where the flag is true before ranking the rest by semantic change or by change_pctile_universe.
Code Example
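import requests

# Fetch recent 10-K filing-change records for one ticker.
resp = requests.get(
    "https://api.vectorfinancials.com/v1/filings/AAPL/changes",
    params={"form_type": "10-K", "limit": 10},
    headers={"X-API-Key": "vf_sk_your_key_here"},
    timeout=30,
)
resp.raise_for_status()  # stop early on an HTTP error instead of parsing an error body

# Keep cleanly parsed sections that are not flagged as format switches,
# paired with their semantic change (1 - cosine_embedding).
substantive = [
    (r["accession"], s["section_id"], round(1 - s["cosine_embedding"], 3))
    for r in resp.json()
    for s in r["sections"]
    if s["parse_status"] == "ok" and not s["format_switch_suspected"]
]

# Rank by semantic change, largest first.
print(sorted(substantive, key=lambda x: -x[2]))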