New Research - Optimizing Text-to-SQL Performance: The Key Role of Metadata Configurations

Gareth Price

Introduction

Text-to-SQL systems are transforming the way businesses interact with their data, making it accessible to non-technical users. But how do we optimize these systems to ensure accurate, reliable, and efficient query performance?

At CorralData, we conducted a comprehensive empirical study analyzing the impact of different metadata configurations on Text-to-SQL system performance. The results offer clear guidance on how to structure metadata to maximize query accuracy and execution success.

In this post, we break down our findings, highlight the most effective metadata configurations, and provide actionable insights for organizations looking to improve their Text-to-SQL capabilities.

Why Metadata Matters in Text-to-SQL

Text-to-SQL models convert natural language queries into SQL code. However, even the most advanced models struggle without proper metadata support. Metadata provides essential context, allowing AI models to:

Understand database structure
Improve query accuracy
Resolve ambiguity in column and table relationships
Optimize execution performance

Key Findings from Our Study

We evaluated 128 different metadata configurations using OpenAI's GPT-4o model against a sample database. Our dataset included 19 queries of varying complexity, and every configuration was run against every query, for a total of 2,432 test cases (128 × 19).
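The 2,432 figure follows directly from the size of the test grid. Here is a minimal sketch in Python. It assumes that each of the seven components named in this post (the six in the best configuration plus Examples) acts as an independent on/off switch, so that every subset, including the empty one, counts as one configuration. That gives 2^7 = 128 configurations, each run against all 19 queries. The component identifiers and query IDs are hypothetical placeholders, not the study's actual inputs.

```python
from itertools import combinations

# The seven metadata components named in this post. Treating each one as an
# on/off switch is an assumption about how the 128 configurations were built.
COMPONENTS = [
    "schema_information",
    "column_descriptions",
    "table_descriptions",
    "foreign_keys",
    "primary_keys",
    "common_queries",
    "examples",
]


def all_configurations(components):
    """Return every subset of components (including the empty one) as a frozenset."""
    configs = []
    for size in range(len(components) + 1):
        for combo in combinations(components, size):
            configs.append(frozenset(combo))
    return configs


configs = all_configurations(COMPONENTS)
# Hypothetical placeholder IDs; the study's 19 questions are not reproduced here.
queries = [f"q{i:02d}" for i in range(1, 20)]

# One test case per (configuration, query) pair.
test_cases = [(config, query) for config in configs for query in queries]

print(len(configs))     # 128
print(len(test_cases))  # 2432
```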

1. The Best Metadata Configuration Achieved a 94.1% Correctness Score for Simple Queries

The highest-performing metadata combination included:

✅ Schema Information (database structure)
✅ Column Descriptions (natural language descriptions of columns)
✅ Table Descriptions (purpose and contents of tables)
✅ Foreign Keys (relationships between tables)
✅ Primary Keys (unique identifiers for each table)
✅ Common Queries (predefined queries with explanations)

Of the seven metadata components discussed in this post, this combination leaves out only Examples (sample data). It outperformed simpler configurations on both query execution success and result correctness.
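To make these components concrete, here is a sketch of how they could be rendered into prompt context for a model such as GPT-4o. The post does not describe the study's prompt format. The Table class, the render_context function, the two-table customers/orders schema, and the output layout are therefore all hypothetical. The sketch does not render Examples (sample rows) at all, because the best-performing combination omits them.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical per-table metadata; field names are illustrative only."""
    name: str
    columns: dict[str, str]  # column name -> SQL type
    description: str = ""
    column_descriptions: dict[str, str] = field(default_factory=dict)
    primary_key: list[str] = field(default_factory=list)
    foreign_keys: dict[str, str] = field(default_factory=dict)  # column -> "table(column)"


def render_context(tables, common_queries, include):
    """Render the metadata components named in `include` as plain prompt text."""
    lines = []
    for t in tables:
        if "schema_information" in include:
            cols = ", ".join(f"{col} {typ}" for col, typ in t.columns.items())
            lines.append(f"TABLE {t.name} ({cols})")
        if "table_descriptions" in include and t.description:
            lines.append(f"  -- {t.description}")
        if "primary_keys" in include and t.primary_key:
            lines.append(f"  PRIMARY KEY ({', '.join(t.primary_key)})")
        if "foreign_keys" in include:
            for col, target in t.foreign_keys.items():
                lines.append(f"  FOREIGN KEY ({col}) REFERENCES {target}")
        if "column_descriptions" in include:
            for col, desc in t.column_descriptions.items():
                lines.append(f"  {t.name}.{col}: {desc}")
    if "common_queries" in include:
        lines.append("COMMON QUERIES")
        for question, sql in common_queries:
            lines.append(f"  Q: {question}")
            lines.append(f"  SQL: {sql}")
    return "\n".join(lines)


# Hypothetical example database, not the study's sample database.
customers = Table(
    name="customers",
    columns={"id": "INTEGER", "name": "TEXT", "region": "TEXT"},
    description="One row per customer account.",
    column_descriptions={"region": "Sales region, e.g. 'EMEA' or 'NA'."},
    primary_key=["id"],
)
orders = Table(
    name="orders",
    columns={"id": "INTEGER", "customer_id": "INTEGER", "total": "NUMERIC"},
    description="One row per order placed.",
    column_descriptions={"total": "Order value in US dollars, after discounts."},
    primary_key=["id"],
    foreign_keys={"customer_id": "customers(id)"},
)
common = [(
    "Total revenue by region",
    "SELECT c.region, SUM(o.total) FROM orders o "
    "JOIN customers c ON o.customer_id = c.id GROUP BY c.region;",
)]

# The best-performing combination from the study: everything except Examples.
best = {"schema_information", "column_descriptions", "table_descriptions",
        "foreign_keys", "primary_keys", "common_queries"}
print(render_context([customers, orders], common, best))
```

Each component is gated by the include set. Passing a different set of names therefore produces any other configuration in the grid, except those containing Examples.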

2. Adding Metadata Improves Performance—But Only Up to a Point

We observed diminishing returns beyond three or four metadata components.

Moving from one to three components yielded an average 15% accuracy increase
Adding a fourth component provided only a 5% boost
Beyond four, additional metadata offered minimal improvements

In practice, organizations should select metadata strategically rather than overloading their Text-to-SQL models with every available component.
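One way to check for diminishing returns in your own results is to average the score at each component count and then measure the step from one count to the next. The minimal sketch below assumes results are stored as (components, score) pairs with scores between 0 and 1. The function names and toy numbers are hypothetical and are not the study's data. The sketch reports absolute differences in the mean score. The post does not say whether its 15% and 5% figures are absolute or relative changes.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_size(results):
    """Average score per number of metadata components.

    results: iterable of (components, score) pairs, one per test case, where
    components is a collection of component names and score is in [0, 1].
    """
    by_size = defaultdict(list)
    for components, score in results:
        by_size[len(components)].append(score)
    return {size: mean(scores) for size, scores in sorted(by_size.items())}


def marginal_gains(size_means):
    """Absolute change in mean score from each component count to the next, keyed by the larger count."""
    sizes = sorted(size_means)
    return {
        later: size_means[later] - size_means[earlier]
        for earlier, later in zip(sizes, sizes[1:])
    }


# Made-up scores for illustration only; these are not the study's data.
toy_results = [
    ({"common_queries"}, 0.5),
    ({"primary_keys"}, 0.4),
    ({"common_queries", "primary_keys"}, 0.7),
    ({"common_queries", "column_descriptions"}, 0.66),
    ({"common_queries", "primary_keys", "column_descriptions"}, 0.8),
]
print(marginal_gains(mean_score_by_size(toy_results)))
```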

3. Common Queries and Primary Keys Had the Most Impact

Not all metadata is equally valuable. Our study found that the most impactful metadata components were:

🔹 Common Queries (+74.6% improvement)
🔹 Primary Keys (+65.3% improvement)
🔹 Column Descriptions (+62.4% improvement)
🔹 Schema Information (+61.8% improvement)

Meanwhile, Examples (sample data) had the least impact (+41.0%), and in some cases, introduced noise that degraded performance.
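The post does not state how these percentages were computed. One straightforward way to estimate a component's impact over an exhaustive configuration grid is to compare the mean score of configurations that include it against those that do not, expressed as a relative change. The sketch below follows that assumption. The function names and the made-up scores are hypothetical.

```python
from statistics import mean


def component_impact(results, component):
    """Relative improvement in mean score when `component` is present.

    results: iterable of (components, score) pairs. Returns
    (mean_with - mean_without) / mean_without, e.g. 0.25 for +25%.
    This formula is an assumption; the study's own metric may differ.
    """
    results = list(results)  # allow a generator to be scanned twice
    with_c = [score for comps, score in results if component in comps]
    without_c = [score for comps, score in results if component not in comps]
    if not with_c or not without_c:
        raise ValueError(f"need results both with and without {component!r}")
    baseline = mean(without_c)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(with_c) - baseline) / baseline


def rank_components(results, components):
    """Sort components from most to least impactful under component_impact."""
    results = list(results)
    impacts = {c: component_impact(results, c) for c in components}
    return sorted(impacts.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only; these are not the study's data.
toy_results = [
    (set(), 0.4),
    ({"common_queries"}, 0.6),
    ({"examples"}, 0.44),
    ({"common_queries", "examples"}, 0.62),
]
for name, impact in rank_components(toy_results, ["common_queries", "examples"]):
    print(f"{name}: {impact:+.1%}")
```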

The Optimal Approach to Metadata Implementation

Based on our findings, we recommend the following metadata hierarchy for organizations implementing Text-to-SQL systems:

Essential Metadata (High ROI)

1️⃣ Common Queries – Foundation for query understanding
2️⃣ Primary Keys – Ensures accurate joins and aggregations
3️⃣ Column Descriptions – Improves field selection accuracy

Optional Enhancements (Contextual Value)

4️⃣ Schema Information – Helpful for table relationships
5️⃣ Foreign Keys – Useful for multi-table queries
6️⃣ Table Descriptions – Provides additional clarity

Use Sparingly (Low Impact)

❌ Examples – Can introduce noise, use only if necessary
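The hierarchy can be captured as plain configuration, so that a deployment adds components in priority order. Below is a minimal sketch. The tier names, component identifiers, and select_components helper are hypothetical, while the ordering follows the hierarchy above.

```python
# Priority tiers from the hierarchy above; identifiers are hypothetical.
METADATA_TIERS = {
    "essential": ["common_queries", "primary_keys", "column_descriptions"],
    "optional": ["schema_information", "foreign_keys", "table_descriptions"],
    "sparing": ["examples"],
}
TIER_ORDER = ["essential", "optional", "sparing"]
KNOWN = {name for names in METADATA_TIERS.values() for name in names}


def select_components(up_to_tier="essential", extra=()):
    """Return components in priority order, from the top tier down to `up_to_tier`.

    `extra` opts in to individual lower-tier components (for example
    "foreign_keys" for a schema with many multi-table queries) without
    taking the whole tier.
    """
    if up_to_tier not in METADATA_TIERS:
        raise ValueError(f"unknown tier: {up_to_tier!r}")
    selected = []
    for tier in TIER_ORDER[: TIER_ORDER.index(up_to_tier) + 1]:
        selected.extend(METADATA_TIERS[tier])
    for name in extra:
        if name not in KNOWN:
            raise ValueError(f"unknown component: {name!r}")
        if name not in selected:
            selected.append(name)
    return selected


print(select_components())
# ['common_queries', 'primary_keys', 'column_descriptions']
print(select_components(extra=["foreign_keys"]))
# ['common_queries', 'primary_keys', 'column_descriptions', 'foreign_keys']
```

Keeping the ordering in one place makes it easy to move a component between tiers if your own testing disagrees with these defaults.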

Practical Takeaways for Businesses

For organizations deploying Text-to-SQL systems, these insights provide clear implementation guidelines:

✅ Start simple – Implement common queries, primary keys, and column descriptions first
✅ Test and refine – Use empirical data to assess the need for additional metadata
✅ Avoid unnecessary complexity – More metadata isn’t always better
✅ Adapt to your database – If your schema has unclear naming, column descriptions become more critical
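Taken together, "start simple" and "test and refine" describe a forward-selection loop. Begin with the essential components, try each remaining one in priority order, and keep it only if it measurably improves results. The sketch below assumes you supply an evaluate function that scores a configuration on your own question set, for example the share of generated queries that execute and return the expected rows. The names refine_metadata and min_gain, and the toy evaluator, are hypothetical.

```python
from typing import Callable, Iterable


def refine_metadata(
    start: Iterable[str],
    candidates: Iterable[str],
    evaluate: Callable[[frozenset], float],
    min_gain: float = 0.01,
):
    """Greedy forward selection over metadata components.

    Starts from `start`, tries each candidate in the given order, and keeps it
    only if the evaluation score improves by at least `min_gain`.
    Returns (final configuration, its score, list of accepted steps).
    """
    current = frozenset(start)
    best_score = evaluate(current)
    history = [(current, best_score)]
    for component in candidates:
        if component in current:
            continue
        trial = current | {component}
        score = evaluate(trial)
        if score - best_score >= min_gain:
            current, best_score = trial, score
            history.append((current, best_score))
    return current, best_score, history


# Stand-in evaluator with made-up numbers; replace with a real test run.
TOY_GAINS = {"common_queries": 0.30, "primary_keys": 0.20,
             "column_descriptions": 0.15, "schema_information": 0.05,
             "foreign_keys": 0.004, "table_descriptions": 0.002}


def toy_evaluate(config):
    return 0.2 + sum(TOY_GAINS.get(c, 0.0) for c in config)


final, score, history = refine_metadata(
    start=["common_queries", "primary_keys", "column_descriptions"],
    candidates=["schema_information", "foreign_keys", "table_descriptions", "examples"],
    evaluate=toy_evaluate,
)
print(sorted(final), round(score, 3))
```

A greedy pass never revisits a rejected component, and real evaluations are noisy. Rerun the loop on a fresh set of questions before locking in a configuration.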

Conclusion

Metadata plays a crucial role in improving Text-to-SQL system performance. Our research shows that organizations can significantly enhance query accuracy and execution success by prioritizing the right metadata components while avoiding unnecessary complexity.

To dive deeper into our full research findings and methodology, download the complete study.