Pandey, H., Gupta, A., Sarkar, S., Tomer, M., Johannes, S. and Gong, Y., 2025. GEMMA-SQL: A novel text-to-SQL model based on large language models. Applied Artificial Intelligence. (In Press)
Full text available as:
PDF: 2511.04710v1.pdf - Accepted Version (2MB). Restricted to Repository staff only. Available under License Creative Commons Attribution Non-commercial Share Alike.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
DOI: 10.48550/arXiv.2511.04710
Abstract
Text-to-SQL systems enable users to interact with structured databases using natural language, eliminating the need for specialized programming knowledge. In this work, we introduce GEMMA-SQL, a lightweight and efficient text-to-SQL model built upon the open-source Gemma 2B architecture. Unlike many large language models (LLMs), GEMMA-SQL is fine-tuned in a resource-efficient, iterative manner and can be deployed on low-cost hardware. Leveraging the SPIDER benchmark for training and evaluation, GEMMA-SQL combines multiple prompting strategies, including few-shot learning, to enhance SQL query generation accuracy. The instruction-tuned variant, GEMMA-SQL Instruct, achieves 66.8% Test-Suite accuracy and 63.3% Exact Set Match accuracy, outperforming several state-of-the-art baselines such as IRNet, RYANSQL, and CodeXDavinci. The proposed approach demonstrates that effective prompt design and targeted instruction tuning can significantly boost performance while maintaining high scalability and adaptability. These results position GEMMA-SQL as a practical, open-source alternative for robust and accessible text-to-SQL systems.
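For readers unfamiliar with few-shot prompting in a text-to-SQL setting, the following is a minimal illustrative sketch and not the authors' actual prompt template, which this record does not describe. The `Example` class, the `build_few_shot_prompt` function, the section labels, and the `singer` schema are all hypothetical names chosen for illustration. The idea is simply to place the database schema and a few worked question/SQL pairs ahead of the new question, so the model completes the final `SQL:` line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One worked demonstration: a natural-language question and its SQL."""
    question: str
    sql: str


def build_few_shot_prompt(schema: str, examples: list[Example], question: str) -> str:
    """Assemble a prompt: schema first, then worked examples, then the new question.

    The layout (labels, ordering) is an assumption made for this sketch.
    """
    parts = ["### Database schema", schema.strip(), ""]
    for ex in examples:
        parts += [f"Question: {ex.question}", f"SQL: {ex.sql}", ""]
    # Leave the final SQL line open for the model to complete.
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)


# Hypothetical schema and demonstrations, loosely in the style of SPIDER databases.
SCHEMA = "CREATE TABLE singer (singer_id INT, name TEXT, age INT);"
EXAMPLES = [
    Example("How many singers are there?", "SELECT COUNT(*) FROM singer;"),
    Example("List the names of all singers.", "SELECT name FROM singer;"),
]

prompt = build_few_shot_prompt(SCHEMA, EXAMPLES, "What is the average age of singers?")
print(prompt)
```

The resulting string would be passed to whichever language model is in use; how the model is loaded and how its output is scored are outside the scope of this sketch.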
| Item Type: | Article |
|---|---|
| ISSN: | 0883-9514 |
| Uncontrolled Keywords: | Domain-specific languages; generative AI; GEMMA; large language models; SPIDER; text-to-SQL |
| Group: | Faculty of Science & Technology |
| ID Code: | 41488 |
| Deposited By: | Symplectic RT2 |
| Deposited On: | 11 Nov 2025 12:46 |
| Last Modified: | 11 Nov 2025 12:46 |