Self-Correcting SQL — Vertiscope AI Research

Every text-to-SQL system generates incorrect queries. This is not a flaw to be eliminated — it is an inherent property of mapping imprecise natural language to formal query syntax against schemas that were not designed with natural language in mind. The question is not how to achieve zero SQL errors. The question is what happens next when an error occurs.

In a conventional text-to-SQL setup, the answer is: nothing automated. The error propagates to the user as an empty result, a database error message, or — most dangerously — a syntactically valid query that returns wrong data without any indication that something went wrong.

Self-correcting SQL is the architectural pattern that changes this. Rather than treating query generation as a one-shot operation, the system treats it as a feedback loop: generate, execute, evaluate, and if the evaluation fails, regenerate with the error context incorporated.

"The system that never fails is not the one that generates perfect SQL. It's the one that never lets an imperfect SQL reach the user unchecked."

The Three Categories of SQL Error

Understanding self-correction requires understanding what kinds of SQL errors actually occur in production text-to-SQL systems. They fall into three categories with distinct detection and correction strategies.

Structural errors

Structural errors are the easiest to detect and correct: the generated SQL does not parse or execute. The database returns an error message. Column names that don't exist, table names that are misspelled, syntax errors from dialect confusion (using LIMIT on SQL Server, TOP on MySQL), invalid function names, malformed JOIN conditions.

Detection is trivial — the database itself reports the error. Correction is achieved by feeding the error message back to the LLM as additional context: "The previous query failed with: column 'revenue' does not exist in table 'transactions'. Available columns are: amount, currency, transaction_date." The LLM can typically resolve structural errors in one retry.

Semantic errors

Semantic errors are more insidious: the query executes successfully but returns incorrect results. A GROUP BY on the wrong column. An aggregation that double-counts because of an implicit cross join. A date filter that is off by one timezone. A WHERE condition that uses OR where AND was intended.

These cannot be detected by simply checking whether the query ran. They require result validation — comparing the output against expectations derived from query intent and historical result patterns. A query for "total monthly revenue" that returns a single row instead of 12 is almost certainly wrong, even if it executed without error.

Precision errors

Precision errors are the hardest category: the query returns plausible results that are neither clearly correct nor clearly wrong. A revenue figure that is 3% off from the correct answer because of a subtle currency conversion omission. A count that is right for most records but misses one edge case in the data.

These require human validation to catch, which is why the finalization system exists: when a user confirms results are correct and finalises a query, the system caches that confirmed SQL and uses it for all future identical queries — bypassing the generation loop entirely.

The Correction Loop

Generate

LLM produces SQL from natural language query and schema context. Prompt includes table names, column types, sample values, and dialect instructions.

Validate (pre-execution)

Static analysis checks the generated SQL against known schema: table existence, column names, join key types, dialect compliance. Errors caught here avoid unnecessary database round-trips.

Execute

Validated SQL runs against the target database with a configurable timeout. Execution errors (syntax, permissions, resource limits) are captured with full error text.

Evaluate

Result set checked against intent: expected cardinality, column count, value distributions, null rates, aggregation sanity. Anomalies trigger the correction path.

Correct (if needed)

Error context appended to original prompt. LLM regenerates SQL with explicit failure information. Maximum retry depth: 3 attempts before surfacing to user with error context.

Deliver

Validated results delivered with provenance metadata. Generation attempt count logged. Corrections applied are visible in the execution trace for audit purposes.

What Goes Into a Correction Prompt

The effectiveness of the correction loop depends almost entirely on the quality of information fed back to the LLM. A correction prompt that only says "the query failed, try again" is nearly useless. A correction prompt that says exactly what failed and why gives the model enough context to fix the issue on the first retry.

A correction prompt includes:

The original natural language query
The full schema context (unchanged)
The previously generated SQL that failed
The exact error message from the database, verbatim
For semantic errors: the result that was returned and why it is suspected to be incorrect
Explicit instruction: "Fix only the identified error. Do not change parts of the query that were working correctly."

The final instruction matters more than it appears. Without it, LLMs tend to overfit corrections — rewriting the entire query when only a column name needed changing, introducing new errors while fixing the original one.

The Finalization Shortcut

The correction loop handles in-flight errors. But it has a ceiling: for genuinely ambiguous queries where the schema does not provide enough context to generate correct SQL reliably, even three retries may not be enough.

This is where query finalization becomes a performance primitive, not just a user experience feature. When a user confirms a result is correct and finalises the query, the system stores the confirmed SQL keyed to the query text. On all future runs of that query — regardless of rephrasing — the stored SQL executes directly, bypassing the generation and correction loop entirely.

~0ms

LLM time for finalised queries — generation bypassed entirely

100%

Accuracy for finalised queries — human-confirmed correct SQL

3×

Typical correction loop reduction after 30 days of active use

The practical effect is that a system in active use becomes progressively faster and more accurate over time. Common queries get finalised. The correction loop fires less frequently. The LLM handles only genuinely novel queries. The organisation builds a corpus of verified, production-tested SQL that belongs entirely to them — not to any external provider.

Hallucination Detection

LLMs generate confident-sounding SQL for tables and columns that do not exist. This is not rare — it is a consistent failure mode that occurs whenever the model's training data contains SQL patterns that don't match the current schema.

Pre-execution validation catches most hallucinations by checking every table and column reference in the generated SQL against the actual schema. A query referencing sales.revenue_usd when the column is named sales.amount_in_dollars fails validation before it ever reaches the database. The correction loop then provides the real column name and retries.

For cross-source queries, hallucination detection extends to source routing: a query that references a table from the wrong database system is detected and re-attributed to the correct source before execution. This prevents the class of error where valid SQL for MySQL runs against Oracle and produces a cryptic syntax error rather than a clear attribution failure.

Part of the Vertiscope AI research series on autonomous query intelligence.

Self-Correcting SQL: How AI Learns from Its Own Mistakes