Every company asks one question before choosing a warehouse: "which is best?" The question almost no one asks, but should, is "which is most expensive to abandon?" In 2026, with Databricks, Snowflake and BigQuery at broad technical parity, the difference that will matter over the next 5–10 years is lock-in. Each of the three locks clients in through a different pattern, with a different exit cost and a different marketing layer to obscure it.
This text identifies the three lock-in vectors that matter, scores each warehouse on each vector, and explains why an official partner's comparison is, by definition, partial. Not because the partner lies, but because they can only see clearly the technology they deliver.
The structural bias of the official partner
Before we get into the vectors, it's worth understanding why partner comparisons are problematic. A Databricks Gold partner earns revenue training teams on Spark, optimizing Delta Lake and selling Unity Catalog. A Snowflake Premier partner earns revenue structuring Snowflake accounts, sizing virtual warehouses and selling Streamlit. A Google Cloud partner earns revenue on BigQuery + Looker + Vertex AI.
None of the three can honestly recommend that the client leave their platform. Not out of bad faith, but because all of their expertise sits on one side of the choice. Asking a Databricks partner to compare it with Snowflake is like asking a soccer coach to recommend a swimming academy: the answer may be technically correct on the surface and still structurally biased in its conclusion.
Agnostic consultancy, with no resale incentive, is the only arrangement in which the comparison can be honest. That is not the norm in the market; hence this text.
The question "which warehouse is best?" is 80% answered by public comparisons. The 20% that decides the choice is lock-in, and no one with a resale incentive will give you that piece.
Vector 1 — Storage format
This is where the oldest and most serious lock-in lives. When data sits in a closed proprietary format, migrating means rewriting, reprocessing and validating everything.
Snowflake stores data in a proprietary internal format (internally called FDN). A pure Snowflake client has all of its data in a format only Snowflake reads. Snowflake has recently added support for Apache Iceberg tables (an open format), but the default is still internal storage. A full exit requires unloading every table to S3 as Parquet with COPY INTO and then re-ingesting it into another warehouse. At 50TB, a reverse-migration project can cost US$ 200k–500k in specialized consulting.
Databricks with Delta Lake uses an open format (Delta) on the client's own storage (S3/ADLS/GCS). The data lives in the client's cloud account as Parquet files plus a JSON transaction log, which Spark, DuckDB, Trino and others can read. Moving to another processing engine is comparatively simple: point the new engine at the same bucket. Databricks' lock-in is elsewhere (next vector), not in the format.
BigQuery uses a proprietary format (Capacitor). It supports export to Parquet in GCS, but its internal storage is closed. A full exit requires extracting everything through export jobs, as with Snowflake. The difference from Snowflake: the BigQuery Storage Read API lets external engines read tables directly, without an intermediate export. That makes a hybrid stack easier but doesn't eliminate lock-in.
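To make "open format" concrete: a Delta table is a directory of Parquet files plus a `_delta_log/` folder of JSON commits, so any program can work out which files form the current version. The sketch below is a simplification that replays only `add`/`remove` actions and ignores checkpoints, deletion vectors and other protocol features that real readers such as Spark or delta-rs handle. It builds a tiny hypothetical log in a temporary directory and replays it with the Python standard library alone.

```python
import json
import os
import re
import tempfile

# Commit files are named with a 20-digit zero-padded version, e.g. 00000000000000000000.json
COMMIT_FILE = re.compile(r"\d{20}\.json")


def active_data_files(table_path: str) -> set:
    """Return the Parquet files that make up the latest version of a Delta table.

    Simplified sketch: replays JSON commits only (no checkpoints,
    deletion vectors or other protocol features).
    """
    log_dir = os.path.join(table_path, "_delta_log")
    commits = sorted(f for f in os.listdir(log_dir) if COMMIT_FILE.fullmatch(f))
    files = set()
    for commit in commits:  # zero-padding makes lexical order == version order
        with open(os.path.join(log_dir, commit)) as fh:
            for line in fh:  # one JSON action per line
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


# Hypothetical two-commit log: version 0 writes two files,
# version 1 compacts them into a single file.
HYPOTHETICAL_COMMITS = [
    [{"add": {"path": "part-000.parquet"}}, {"add": {"path": "part-001.parquet"}}],
    [
        {"remove": {"path": "part-000.parquet"}},
        {"remove": {"path": "part-001.parquet"}},
        {"add": {"path": "part-002.parquet"}},
    ],
]

with tempfile.TemporaryDirectory() as table_dir:
    os.makedirs(os.path.join(table_dir, "_delta_log"))
    for version, actions in enumerate(HYPOTHETICAL_COMMITS):
        path = os.path.join(table_dir, "_delta_log", f"{version:020d}.json")
        with open(path, "w") as fh:
            fh.write("\n".join(json.dumps(a) for a in actions) + "\n")
    current_files = active_data_files(table_dir)

print(sorted(current_files))  # ['part-002.parquet']
```

Real `add` actions carry more fields (size, partition values, statistics); the point is only that nothing in this replay requires Databricks.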
Practical scoring:
- Databricks: low lock-in (open format).
- BigQuery: medium lock-in (proprietary, but with external read API).
- Snowflake: high lock-in (proprietary format, costly export, though Iceberg support is arriving).
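The asymmetry in exit paths can be sketched as generated statements. Below, a Snowflake table leaves via `COPY INTO <location>` (unloading to Parquet in S3) and a BigQuery table via `EXPORT DATA` (Parquet in GCS); a Delta table on the client's storage needs no export at all. The bucket, storage integration and table names are hypothetical placeholders, and the statements are a minimal starting point (no partitioning, file sizing or error handling), not a complete migration script.

```python
def snowflake_unload(table: str, bucket: str, integration: str) -> str:
    """COPY INTO <location>: unload one Snowflake table to Parquet files in S3."""
    prefix = table.lower().replace(".", "/")
    return (
        f"COPY INTO 's3://{bucket}/{prefix}/'\n"
        f"  FROM {table}\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        "  FILE_FORMAT = (TYPE = PARQUET)\n"
        "  HEADER = TRUE;"  # keep column names in the Parquet output
    )


def bigquery_export(table: str, bucket: str) -> str:
    """EXPORT DATA: write one BigQuery table to Parquet files in GCS."""
    prefix = table.lower().replace(".", "/")
    return (
        "EXPORT DATA OPTIONS (\n"
        f"  uri = 'gs://{bucket}/{prefix}/*.parquet',\n"  # exactly one wildcard
        "  format = 'PARQUET',\n"
        "  overwrite = true\n"
        f") AS SELECT * FROM `{table}`;"
    )


# Hypothetical inventories; a real exit repeats this for every table.
SNOWFLAKE_TABLES = ["ANALYTICS.PUBLIC.ORDERS", "ANALYTICS.PUBLIC.CUSTOMERS"]
BIGQUERY_TABLES = ["my-project.analytics.orders"]

for t in SNOWFLAKE_TABLES:
    print(snowflake_unload(t, bucket="exit-bucket", integration="EXIT_S3_INT"))
for t in BIGQUERY_TABLES:
    print(bigquery_export(t, bucket="exit-bucket"))
# Delta Lake: nothing to generate; point the new engine at the existing bucket.
```

Generating the statements is the cheap part. The cost lies in what follows: rewriting, reprocessing and validating everything that depended on the old tables.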
Vector 2 — Processing layer and specific SQL
Even with portable data, the compute layer and the SQL dialect create subtle dependencies.
Snowflake's SQL adheres closely to ANSI, but its proprietary extensions (Snowpark, JavaScript UDFs, Streamlit, Cortex AI) don't migrate. A company that adopts Cortex for generative analysis is locked into Snowflake for that function. Stored procedures written in Snowflake's JavaScript must be completely rewritten on another platform.
Databricks has multiple engines (Spark SQL, Photon, its own serverless SQL). Spark is portable (any cloud, open source). Photon is proprietary but, being an under-the-hood optimization, creates no syntactic lock-in. Python/Scala UDFs are portable (any Spark runs them). The real Databricks lock-in is notebooks, managed MLflow, Unity Catalog and Workflows: that is the exclusive path. A team used to Databricks Notebooks faces high friction in any other tool.
BigQuery's SQL is close to ANSI, with extensions (native ARRAY and STRUCT types, BigQuery ML). BigQuery ML (BQML) is powerful but exclusive: migrating to another engine means rewriting the models in Python/Spark. Its geo functions and ARRAY/STRUCT handling are more flexible than Snowflake's but, for the same reason, produce code that doesn't run elsewhere without refactoring.
Practical scoring:
- Databricks: medium lock-in (Spark is portable, but notebooks/MLflow/Unity Catalog stick).
- Snowflake: medium-high lock-in (SQL is portable; UDFs and Cortex are not).
- BigQuery: medium-high lock-in (SQL is close to standard; BQML and dialect-specific functions are not).
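One rough way to find where this vector bites is to scan existing SQL and notebook code for vendor-specific constructs. The marker lists below are hypothetical and deliberately short: a few examples per platform drawn from the features named above (Cortex, JavaScript procedures, BigQuery ML, notebook utilities) plus Snowflake's `LATERAL FLATTEN`. A real audit would need a much longer list, or a proper SQL parser rather than regular expressions.

```python
import re

# Hypothetical, non-exhaustive markers of platform-specific code.
VENDOR_MARKERS = {
    "snowflake": [
        r"\bSNOWFLAKE\.CORTEX\.",       # Cortex AI functions
        r"\bLANGUAGE\s+JAVASCRIPT\b",   # JavaScript UDFs / stored procedures
        r"\bLATERAL\s+FLATTEN\b",       # semi-structured data unnesting
    ],
    "bigquery": [
        r"\bCREATE\s+(OR\s+REPLACE\s+)?MODEL\b",  # BigQuery ML training
        r"\bML\.PREDICT\b",                        # BigQuery ML inference
    ],
    "databricks": [
        r"\bdbutils\.",                 # Databricks notebook utilities
    ],
}


def vendor_hits(code: str) -> dict:
    """Map each platform to the marker patterns found in `code`."""
    hits = {}
    for vendor, patterns in VENDOR_MARKERS.items():
        found = [p for p in patterns if re.search(p, code, flags=re.IGNORECASE)]
        if found:
            hits[vendor] = found
    return hits


queries = {
    "daily_revenue": "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date",
    "ticket_summary": "SELECT SNOWFLAKE.CORTEX.SUMMARIZE(body) FROM tickets",
    "churn_scores": "SELECT * FROM ML.PREDICT(MODEL `ds.churn`, TABLE `ds.customers`)",
}
for name, sql in queries.items():
    print(name, sorted(vendor_hits(sql)) or "portable (by this heuristic)")
```

A match means "needs rewriting on exit"; no match does not prove portability, since differences in functions, types and date handling slip past a list like this.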
Vector 3 — Integration with the cloud ecosystem
The most underestimated lock-in: how tightly the warehouse is stitched into other services from the same cloud.
BigQuery lives inside Google Cloud, with native integration with Looker, Vertex AI, Pub/Sub, Dataflow and Cloud Storage. Reverse migration is not just about the warehouse; it means renegotiating the entire data stack that grew around it. A company running Looker + BigQuery + Vertex AI has to migrate three products together. The exit cost compounds with every year spent inside GCP.
Snowflake runs multi-cloud (AWS, Azure, GCP). That is its main marketing argument: "Snowflake is neutral between clouds." True for compute. Not true for integrations: Snowflake Native Apps, Snowpark Container Services, Streamlit and Cortex are exclusive. A team adopting those layers re-creates lock-in at another level.
Databricks runs natively on AWS, Azure and GCP. It has deep integrations with each (especially Azure, via the Microsoft partnership), but the engine is portable across clouds: moving a workspace from AWS to Azure costs less than with the other options. The real lock-in is in Unity Catalog (governance) and Workflows (orchestration), which migrate through refactoring, not a simple export.
Practical scoring:
- Databricks: low-medium lock-in in ecosystem (multi-cloud + portable engine).
- Snowflake: medium lock-in (multi-cloud compute, but new features stick).
- BigQuery: high lock-in (lives within GCP, native integrations create compound dependency).
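The three scoring lists can be combined into a single figure per platform. The sketch below maps the ordinal levels to numbers and sums them; both the 1–3 scale and the equal default weights are assumptions of this sketch, not something the comparison above prescribes. The weights let a company reflect its own situation, for example down-weighting the ecosystem vector if it has already decided to stay in one cloud.

```python
# Assumed numeric scale for the ordinal levels used in the scoring lists.
LEVEL = {"low": 1.0, "low-medium": 1.5, "medium": 2.0, "medium-high": 2.5, "high": 3.0}

# Levels per vector, in order: storage format, processing/SQL, cloud ecosystem.
SCORES = {
    "Databricks": ["low", "medium", "low-medium"],
    "Snowflake": ["high", "medium-high", "medium"],
    "BigQuery": ["medium", "medium-high", "high"],
}


def lock_in_total(levels, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of lock-in levels; higher means harder to leave."""
    return sum(LEVEL[level] * w for level, w in zip(levels, weights))


for platform, levels in sorted(SCORES.items(), key=lambda kv: lock_in_total(kv[1])):
    print(f"{platform:<10} {lock_in_total(levels):.1f}")
```

With equal weights this gives Databricks 4.5 and a tie between Snowflake and BigQuery at 7.5. Passing `weights=(1.0, 1.0, 0.0)` drops the ecosystem vector and lowers BigQuery to 4.5, below Snowflake's 5.5.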
How to measure before signing
Four practical questions to ask before closing an annual contract.
- What share of the data lives in an open format? If less than 50%, expect significant technical lock-in. Negotiate explicit support for Iceberg/Delta as external tables.
- How much of the SQL code is portable (pure ANSI)? Audit the 50 most critical queries. If more than 30% use proprietary functions, migration means refactoring, not export.
- How many native integrations does it have with other products from the same cloud? If the entire stack lives in a single cloud, moving the warehouse means moving the entire stack.
- What is the estimated cost of a full export? Ask the partner for an estimate in US$/TB for COPY INTO/EXPORT. If they don't know or deflect, that's a red flag.
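The four questions translate directly into a pre-signing check. This is a minimal sketch: the `PreSigningAnswers` fields are hypothetical names for the answers, the 50% and 30% thresholds come from the questions above, and the example inputs are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreSigningAnswers:
    open_format_share: float                 # fraction of data in Iceberg/Delta, 0.0 to 1.0
    critical_queries_audited: int            # e.g. the 50 most critical queries
    proprietary_queries: int                 # how many of those use vendor-specific functions
    whole_stack_in_one_cloud: bool           # BI, ML, streaming etc. on the same cloud?
    export_cost_usd_per_tb: Optional[float]  # partner's estimate; None if they deflected


def red_flags(a: PreSigningAnswers) -> List[str]:
    """Return one message per question whose answer signals lock-in."""
    flags = []
    if a.open_format_share < 0.5:
        flags.append("Significant technical lock-in: negotiate Iceberg/Delta support.")
    if a.critical_queries_audited and a.proprietary_queries / a.critical_queries_audited > 0.3:
        flags.append("Migration would be refactoring, not export.")
    if a.whole_stack_in_one_cloud:
        flags.append("Moving the warehouse means moving the entire stack.")
    if a.export_cost_usd_per_tb is None:
        flags.append("No full-export estimate from the partner.")
    return flags


# Invented example: 20% open format, 18 of 50 critical queries proprietary (36%).
answers = PreSigningAnswers(
    open_format_share=0.2,
    critical_queries_audited=50,
    proprietary_queries=18,
    whole_stack_in_one_cloud=True,
    export_cost_usd_per_tb=None,
)
for flag in red_flags(answers):
    print("-", flag)
```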
A company that asks these four questions before signing pays less, negotiates better clauses, and rarely needs to migrate. A company that doesn't pays for three years on the wrong warehouse and discovers the exit cost when the budget doubles.
The honest choice in 2026
Without resale bias, the recommendation by context:
Greenfield with portability priority: Databricks with Delta Lake. Client's own storage, open format, multi-cloud engine. Minimum possible lock-in for a modern warehouse.
Company already on Azure or multi-cloud, with mixed workloads: Databricks or Snowflake. Decide by the existing team's strengths.
Company already on Google Cloud: BigQuery, without hesitation, but knowing that it is accepting high lock-in in exchange for deep native integration. It's not the wrong decision; it's an informed one.
Pure analytical use case, no heavy ML, small team: Snowflake. Simpler operation, pure SQL, no need to manage Spark.
Analytical + ML at scale use case: Databricks (or BigQuery + Vertex if already on Google). Snowflake with Snowpark only partially solves it.
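The context-based recommendations above can be read as a small decision function. The precedence between overlapping cases (cloud first, then portability, then ML, then team size) is an assumption of this sketch, and the `Context` fields are hypothetical; each branch encodes a lock-in trade-off from the list, not a benchmark result.

```python
from dataclasses import dataclass


@dataclass
class Context:
    cloud: str                        # "gcp", "azure", "aws", "multi" or "none" (greenfield)
    portability_priority: bool = False
    heavy_ml: bool = False
    small_team: bool = False


def recommend(ctx: Context) -> str:
    if ctx.cloud == "gcp":
        # Accepting high ecosystem lock-in in exchange for native integration.
        return "BigQuery + Vertex AI" if ctx.heavy_ml else "BigQuery"
    if ctx.cloud == "none" and ctx.portability_priority:
        return "Databricks with Delta Lake"  # open format on the client's own storage
    if ctx.heavy_ml:
        return "Databricks"                  # Snowpark only partially covers ML at scale
    if ctx.cloud in ("azure", "multi"):
        return "Databricks or Snowflake (decide by the team's strengths)"
    if ctx.small_team:
        return "Snowflake"                   # pure SQL, no Spark to manage
    return "No default: answer the four pre-signing questions first"


print(recommend(Context(cloud="aws", small_team=True)))
print(recommend(Context(cloud="gcp", heavy_ml=True)))
```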
Important: none of these recommendations come from generic market comparisons alone. They come from each platform's specific lock-in. In every case, "how much does it cost to leave?" matters as much as "how much does it cost to run?". Official partners answer the second. The first is for the client to research, or for an agnostic consultant to answer. There's a structural reason for that arrangement.