Every time a user submits a natural language query to an enterprise AI system backed by a cloud LLM, something invisible happens: a detailed description of that company's data — table names, column structures, sample values, business logic — is transmitted to a third-party server, processed, and used to generate a response.
Most enterprises have not fully reckoned with what this means. They are using cloud AI to answer sensitive operational questions while sending fragments of their most sensitive operational context to infrastructure they do not control.
"The question is not whether local LLMs are as capable as cloud models. The question is whether cloud models are an acceptable risk for enterprise data."
What a Query Actually Contains
Consider a typical enterprise query prompt sent to a cloud LLM for SQL generation. It contains the schema context for the relevant tables — column names, data types, relationships. It may contain sample values to ground the model. It contains the natural language question itself, which often includes entity names, department names, product names, and financial concepts.
A single query to generate sales-by-region SQL might include:
- The schema of your financial transaction table, including all column names and constraints
- Sample values from that table — potentially including real revenue figures
- The names of your regions, which may not be publicly known
- The specific business question, which reveals what your executive team is thinking about
- Your database dialect, version, and naming conventions — useful competitive intelligence
Multiplied across an organisation's daily query volume, this is an ongoing, low-level exfiltration of business context to external infrastructure. It is generally lawful, it is permitted under most cloud AI provider agreements, and it is almost certainly outside the mental model of the CISO who signed the enterprise agreement.
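To make the contents of such a prompt concrete, here is a minimal sketch of how a text-to-SQL prompt is commonly assembled. The table name, columns, sample row, and the `TableContext` and `build_prompt` helpers are all hypothetical, invented for illustration; real pipelines differ in format, but typically carry the same categories of information.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Schema context for one table (all names here are hypothetical)."""
    name: str
    columns: dict[str, str]  # column name -> SQL type
    sample_rows: list[dict] = field(default_factory=list)


def build_prompt(question: str, tables: list[TableContext], dialect: str) -> str:
    """Assemble the text that would be sent to the model for SQL generation."""
    parts = [f"-- Dialect: {dialect}"]
    for table in tables:
        cols = ", ".join(f"{col} {typ}" for col, typ in table.columns.items())
        parts.append(f"CREATE TABLE {table.name} ({cols});")
        for row in table.sample_rows:
            parts.append(f"-- sample: {row}")
    parts.append(f"-- Question: {question}")
    return "\n".join(parts)


# Illustrative data only: a made-up table, region code, and revenue figure.
transactions = TableContext(
    name="fin_transactions",
    columns={
        "txn_id": "BIGINT",
        "region_code": "VARCHAR(8)",
        "revenue_usd": "NUMERIC(14,2)",
    },
    sample_rows=[{"txn_id": 1, "region_code": "EMEA-N", "revenue_usd": 1250000.00}],
)
prompt = build_prompt(
    "What were sales by region last quarter?", [transactions], "PostgreSQL 15"
)
print(prompt)
```

Everything in the printed string (dialect, table structure, a revenue figure, a region code, and the question itself) is what leaves the network when the model is hosted externally.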
⚠ The Compliance Surface Most Teams Miss
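GDPR Article 28 requires a contract, usually called a Data Processing Agreement (DPA), with any processor that handles personal data on the controller's behalf, and it extends those obligations to sub-processors. When the schema context sent to a cloud LLM covers columns containing personal data (customer names, emails, addresses), and especially when it includes sample values from those columns, the transfer may require explicit DPA coverage. Standard enterprise AI agreements frequently do not provide that coverage.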
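One way a team might surface this exposure is to scan schema context for columns that look like personal data before any prompt is built. The sketch below is a rough heuristic under stated assumptions: the name patterns, the `flag_personal_data_columns` helper, and the example schema are all hypothetical, and matching on column names is no substitute for a proper data classification exercise.

```python
import re

# Hypothetical name patterns; a real deployment would maintain its own list
# and would not rely on column names alone.
PII_NAME_PATTERN = re.compile(
    r"(^|_)(name|email|phone|address|dob|ssn|postcode|zip)(_|$)", re.IGNORECASE
)


def flag_personal_data_columns(schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per table, the columns whose names suggest personal data."""
    flagged = {}
    for table, columns in schema.items():
        hits = [c for c in columns if PII_NAME_PATTERN.search(c)]
        if hits:
            flagged[table] = hits
    return flagged


# Illustrative schema only.
schema = {
    "customers": ["customer_id", "full_name", "email", "billing_address"],
    "fin_transactions": ["txn_id", "region_code", "revenue_usd"],
}
flagged = flag_personal_data_columns(schema)
print(flagged)
```

Tables that appear in the result are candidates for review against the DPA before their schema or sample values are included in any externally processed prompt.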
The Performance Inversion
The conventional wisdom holds that cloud LLMs outperform local models for complex reasoning tasks. For general-purpose reasoning, this is largely true. For the specific task of enterprise SQL generation, it holds less and less, and the remaining gap is narrowing rapidly.
SQL generation for a specific database is a narrow, structured task with a well-defined success criterion: the generated SQL executes successfully and returns the correct result. It does not require general reasoning about the world. It requires deep familiarity with the schema and correct application of the SQL dialect.
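The success criterion described above is often called execution accuracy, and it is simple to check mechanically. The following is a minimal sketch using Python's built-in `sqlite3` module; the `execution_match` helper, the `sales` table, and the example queries are hypothetical, and a real evaluation would run against the target database and dialect.

```python
import sqlite3
from collections import Counter


def execution_match(conn, generated_sql, reference_sql):
    """True if generated_sql runs and returns the same rows as reference_sql.

    Rows are compared as multisets, so row order is ignored.
    """
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that fails to execute counts as a miss
    expected = conn.execute(reference_sql).fetchall()
    return Counter(got) == Counter(expected)


# Tiny in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

reference = "SELECT region, SUM(amount) FROM sales GROUP BY region"
good = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total"
bad = "SELECT region, SUM(amnt) FROM sales GROUP BY region"  # misspelled column

print(execution_match(conn, good, reference))
print(execution_match(conn, bad, reference))
```

Because the check compares results rather than SQL text, a query that is written differently from the reference but returns the same rows still counts as correct.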
Local models fine-tuned specifically for SQL generation, such as SQLCoder, DeepSeek Coder, and Code Llama, can match or exceed the SQL accuracy of much larger general-purpose cloud models when evaluated on domain-specific schemas. The domain specificity of the task narrows the capability gap dramatically.
Latency tells a different story
Cloud LLMs introduce network round-trip latency plus queue wait time plus inference time. In practice, cloud LLM calls for SQL generation typically take 2–8 seconds per query, depending on prompt length and server load. This is acceptable for batch reporting. It is not acceptable for operational queries where users expect sub-second responses.
A local LLM running on enterprise GPU infrastructure, or even on CPU for smaller models, returns SQL in 400 ms to 2 seconds, with consistent latency unaffected by external server load. The user experience difference is significant enough to change adoption patterns.
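Latency figures like these are best confirmed on your own workload. The sketch below shows one way to collect median and 95th-percentile latency for any generation function; `measure_latency` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in that sleeps briefly instead of calling a real model.

```python
import statistics
import time


def measure_latency(generate, prompts):
    """Call generate(prompt) for each prompt; return median and p95 in seconds."""
    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    p95 = statistics.quantiles(samples, n=20)[-1]
    return {"median_s": statistics.median(samples), "p95_s": p95}


def fake_generate(prompt):
    """Stand-in for a real model call (hypothetical); sleeps briefly instead."""
    time.sleep(0.01)
    return "SELECT 1"


stats = measure_latency(fake_generate, ["q"] * 20)
print(stats)
```

Swapping `fake_generate` for a function that calls the cloud API, and then for one that calls the local model, gives a like-for-like comparison on the same prompts. The p95 figure matters as much as the median, since consistency is part of the claim.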
SQLCoder-7B
Purpose-built for text-to-SQL. Its developer, Defog, reports that it outperforms GPT-4 on Defog's own sql-eval benchmark, which is derived from Spider schemas. Runs on a single A10 GPU.
DeepSeek-Coder
Strong code generation across SQL dialects. Excellent schema grounding. Available in 6.7B and 33B variants, among others, for different hardware budgets.
Llama 3 — Fine-tuned
General-purpose base model with SQL fine-tuning. Good balance of reasoning and SQL accuracy. Openly available weights under Meta's community licence, enterprise-deployable.
The Infrastructure Reality
The objection to local LLMs is usually hardware: GPU infrastructure is expensive, specialised, and operationally complex. This was true three years ago. The picture is different today.
A single NVIDIA A10G GPU (offered on AWS G5 instances, with comparable A10-class GPUs available from several other providers) can serve a 7B parameter SQL model at 60–80 tokens per second. Because a generated SQL statement is short and requests can be batched, that is fast enough to serve hundreds of enterprise users. At moderate query volumes, the monthly cost of that GPU instance is comparable to the monthly bill for cloud LLM API calls, and each additional query adds almost no marginal cost.
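The cost comparison reduces to simple arithmetic: a fixed monthly GPU bill breaks even with per-query API pricing at a volume of (monthly GPU cost) / (API cost per query). The sketch below works through it. Every figure is a placeholder rather than a quoted price, and the function names, the 120-token query length, and the 50% utilisation factor are assumptions chosen for illustration.

```python
def break_even_queries(gpu_monthly_cost, api_cost_per_query):
    """Monthly query volume at which a fixed GPU bill equals the API bill."""
    return gpu_monthly_cost / api_cost_per_query


def max_queries_per_month(tokens_per_second, tokens_per_query, utilisation=0.5):
    """Rough ceiling on monthly queries for one sequential decode stream."""
    seconds_per_month = 30 * 24 * 3600
    return tokens_per_second * utilisation * seconds_per_month / tokens_per_query


# All figures below are placeholders, not quoted prices.
gpu_monthly_cost = 800.0    # hypothetical monthly cost of one GPU instance
api_cost_per_query = 0.02   # hypothetical API cost per SQL-generation call

print(break_even_queries(gpu_monthly_cost, api_cost_per_query))
print(max_queries_per_month(tokens_per_second=60, tokens_per_query=120))
```

Below the break-even volume the API is cheaper; above it, the self-hosted GPU is. The second function is a conservative capacity check, since it ignores batching, which would raise the ceiling.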
More importantly, self-hosted GPU deployment, whether on-premises or in a cloud tenancy the enterprise controls, means the model runs inside the enterprise network boundary. Schema context never leaves. Queries never leave. The entire inference stack is within the security perimeter of the organisation that owns the data.
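A boundary claim like this can be backed by a simple guard in the query pipeline. The sketch below refuses to treat an inference endpoint as internal unless every address it resolves to is private or loopback. The `endpoint_is_internal` helper and the example URLs are hypothetical. A private address is only a rough proxy for "inside the security perimeter", so this complements network-level egress controls rather than replacing them.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def endpoint_is_internal(url):
    """True if every address the endpoint's host resolves to is private or loopback."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as not verified
    # Drop any IPv6 scope suffix (e.g. "%eth0") before parsing the address.
    addresses = {ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos}
    return bool(addresses) and all(a.is_private or a.is_loopback for a in addresses)


# Example URLs are illustrative only.
print(endpoint_is_internal("http://10.0.0.5:8000/v1/completions"))
print(endpoint_is_internal("https://8.8.8.8/v1/completions"))
```

A pipeline could run this check at startup and decline to send schema context to any endpoint that fails it.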
Vertiscope AI's Architecture Decision
Vertiscope AI deploys entirely within the customer's infrastructure. The LLM runs on hardware the customer controls, in the network they control, with credentials they manage. The schema context that flows through the query pipeline — including table structures, sample values, and query history — never touches external infrastructure.
This is not a privacy feature. It is the only architecture that is compatible with the data governance requirements of regulated enterprises in financial services, healthcare, government, and critical infrastructure.
The organisations that will lead enterprise AI adoption in the next five years will not be those that moved fastest to cloud AI. They will be those that built the local infrastructure to run AI at scale within their own boundaries — and then used that infrastructure as a competitive moat.
Cloud AI is a starting point. Local AI is the destination. The enterprises winning this transition are the ones who understand that the journey is not optional.
Part of the Vertiscope AI research series on enterprise AI infrastructure.