DBChat

Query your database with natural language

Project Overview (Problem Space & Impact)

DBChat is a revolutionary chat application that harnesses modern natural language processing (NLP) techniques to translate user queries into SQL, execute them on a database, and return user-friendly responses. Aimed at democratizing data insights for non-technical users, DBChat is designed to bridge the significant gap between complex database querying and user accessibility. This capability is especially crucial for staff within organizations who need data-driven insights but lack the technical expertise to query databases directly. By empowering users to make informed decisions quickly and independently, DBChat enhances operational efficiency and makes data insights accessible to staff of all backgrounds, transforming the way organizations leverage their data for strategic advantage.






The State of Text-To-SQL

Foundational Developments

The pre-LLM phase of Text-to-SQL research focused on the intrinsic challenges of translating natural language questions into SQL statements. As documented in "Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect" by Naihao Deng et al. (2022), the field has explored a wide range of datasets, techniques, and challenges. This foundational work categorized datasets into single-domain and cross-domain types, each addressing specific issues such as domain generalization and the complexity of SQL queries. Methodologies evolved from simple rule-based systems to complex deep learning frameworks that use graph-based encoding, attention mechanisms, and pre-trained language models to improve how questions and schemas are represented and how SQL queries are decoded (Deng et al., 2022).


Acceleration Through Large Language Models

In 2023, the paper "Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation" by Dawei Gao et al. demonstrated how LLMs like GPT-4 have rapidly accelerated the development of Text-to-SQL solutions (Gao et al., 2023). Their work highlighted the effectiveness of LLMs in improving execution accuracy and refining prompt engineering methods. By introducing DAIL-SQL, an integrated solution that achieved an execution accuracy of 86.6% on the Spider leaderboard, the research set a new benchmark for the field. This marked a notable improvement over previous models and underscored the potential of LLMs to substantially enhance Text-to-SQL processes.
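Execution accuracy, the metric cited above, scores a predicted query by running it and comparing its result with the result of a reference ("gold") query, rather than by comparing SQL text. The sketch below illustrates the idea on an in-memory SQLite database. It is a simplified illustration, not the official Spider evaluation script: the table, the query pairs, and the helper names `run_query` and `execution_accuracy` are assumptions made for this example, and results are compared as unordered multisets of rows.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        # A prediction that fails to execute never counts as a match.
        if gold_rows is not None and predicted_rows == gold_rows:
            hits += 1
    return hits / len(pairs)


# Hypothetical toy database and query pairs, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ana', 31), (2, 'Ben', 45), (3, 'Cy', 27);
    """
)
pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 30",
     "SELECT name FROM singer WHERE age >= 31"),
    ("SELECT COUNT(*) FROM singer", "SELECT COUNT(id) FROM singer"),
    # Misspelled table name: the query fails, counted as incorrect.
    ("SELECT name FROM singr", "SELECT name FROM singer"),
]
score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.3f}")
```

Because the comparison is on results, two differently written queries can both count as correct, which is why execution accuracy is usually more forgiving than exact string matching.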


Modern Day Applications

In 2024, the rapidly growing interest in leveraging LLMs for automation has created a prolific, competitive environment for Text-to-SQL model creation. While many recent Text-to-SQL models - often fine-tuned versions of the largest state-of-the-art general-purpose and instruct LLMs - have achieved accuracy scores above 90%, few productionized, deployed applications take a natural language question, translate it into a SQL query, run that query on a real database, and return data insights. DBChat is one of a select group of projects (along with pioneers like Defog and Borealis) leading the effort to democratize data insights for non-technical users by applying state-of-the-art Text-to-SQL models to real databases.






What Differentiates Us

DBChat distinguishes itself from other platforms through several innovative features that cater to a broad range of user backgrounds and needs. Central to its capabilities is the real-world application of Text-to-SQL technology, which allows users to interact with live databases using natural language queries. This intuitive interface is complemented by both single and multi-turn capabilities, enabling fluid, context-aware conversations that mimic human interaction. The evaluation of DBChat's performance is robust, utilizing diverse datasets (Spider, Defog, and DBChat's own original evaluation dataset) that help to maximize its generalizability across various domains. Furthermore, the platform's modularity is a key advantage; users can deploy any model that suits their specific requirements. Additionally, DBChat supports file uploads, enhancing its utility by enabling on-the-fly natural language interactions with data. Lastly, its comprehensive user interface features offer a seamless and robust user experience, making complex database schemas accessible and user-friendly.
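The modular flow described above (a swappable model that turns a question into SQL, followed by execution on a live database) can be sketched roughly as follows. This is a minimal illustration and not DBChat's actual implementation: the `Translator` interface, the `get_schema` and `answer` helpers, the stub model, and the `orders` table are all hypothetical names introduced for this example.

```python
import sqlite3
from typing import Callable, List, Tuple

# Hypothetical interface: any model can be plugged in as long as it maps
# (question, schema) to a SQL string.
Translator = Callable[[str, str], str]


def get_schema(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so a model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def answer(
    conn: sqlite3.Connection, question: str, translate: Translator
) -> Tuple[str, List[tuple]]:
    """Translate a question to SQL, run it, and return the SQL and rows."""
    sql = translate(question, get_schema(conn))
    # Assumption for this sketch: only read-only SELECT statements are run.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are executed in this sketch")
    return sql, conn.execute(sql).fetchall()


def stub_translator(question: str, schema: str) -> str:
    """Stand-in for a real Text-to-SQL model; always returns the same query."""
    return "SELECT COUNT(*) FROM orders"


# Hypothetical toy database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO orders VALUES (1, 19.99), (2, 5.00);
    """
)
sql, rows = answer(conn, "How many orders are there?", stub_translator)
print(sql, rows)
```

Swapping models then amounts to passing a different `translate` callable, while schema lookup and query execution stay unchanged.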



Watch Our Demo!

About the Team

This capstone project was designed and developed by five students from UC Berkeley's Master of Information and Data Science (MIDS) program who set out to streamline SQL database interactions. Our team is passionate about making complex technology accessible and easy to use.



Nico Loffreda

MIDS '24 | Senior Data Scientist at NVIDIA

I School Profile | LinkedIn | GitHub




Neal Vazquez

MIDS '24 | Senior Data Analyst at Microsoft

I School Profile | LinkedIn




Florencia Froebel

MIDS '24 | Data Integrations Engineer

I School Profile | LinkedIn | GitHub




Arjun Kandaswamy

MIDS '24 | Business Intelligence Analyst at Anaconda

I School Profile | LinkedIn




Richard Shim

MIDS '24 | Senior Reliability Engineer at Nuro

I School Profile | LinkedIn