Generate SQL Queries
Let’s Get Started! #
Structured Query Language (SQL) is complex. Without GenAI Studio, you would need an indepth understanding of databases and metadata. GenAI Studio aims to generate SQL queries from natural language datasets and convert the queries into semantically-correct SQL for you.
By following these step-by-step instructions, you’ll see how easy it is to build a plain-text-to-SQL solution that generates semantically-correct SQL queries for you from your dataset. All you have to do is create a new project, connect to a Hugging Face dataset, load a base model, and begin prompting in the playground. You can even fine-tune your model to make it even more customized!
Objectives #
When you have completed this tutorial, you will have created a project and prompted a chat model to generate SQL queries. We’ll cover all of the following topics:
- Creating a Project
- Importing a Pre-Loaded Hugging Face Dataset
- Linking the Dataset to Your Project
- Creating a Text-to-SQL Prompt
- Generating SQL Queries from Plain Text
GenAI Studio guides you through going from basic prompt design to pretty good model performance. Its built-in guardrails help prevent memory issues and help prevent you from choosing the wrong parameters.
Get Set Up #
- Ensure you’re signed in.
- Confirm your Hugging Face access token is set.
Create a New Project #
- From the GenAI Studio home page, select Create a new project.
- Name the project
Text to SQL Tutorial
. - Select Create Project.
- Select your new project to open it.
In the next section, you’ll import a pre-loaded Hugging Face dataset.
Import a Pre-Loaded Hugging Face Dataset #
We’ll import a dataset that is particularly useful for training a model how to create SQL queries from text because it was built for exactly that! From this dataset, you’ll see how to work with your own datasets because it is a good example.
- Navigate to the Datasets tab.
- Select New Dataset.
- Provide the following Hugging Face dataset inputs:
- Dataset name:
determined-ai/text-to-sql-easy
- Description:
text to sql queries
- Dataset name:
- Select Create Dataset.
The text-to-sql-easy
dataset is already split into train (10%), validation (10%), and test (10%). The dataset consists of the following features:
- Instruction: The
instruction
column contains requests or questions written in everyday language and serves as the basis for generating SQL queries. - Input: The
input
column is the blueprint of our dataset and is defined as a collection of CREATE TABLE statements. These statements lay out how our data is organized–what tables exist, and what columns they contain. - Response: The
response
is the end product or target. It contains the actual SQL queries that have been expertly crafted to fetch the information asked for in theinstruction
column.
Your screen should look similar to this:
You now have a dataset ready for linking to snapshots in your project.
Create a Snapshot and Generate SQL Queries #
Now that we have our project and dataset, let’s open the snapshot playground, create a snapshot and link our imported dataset.
Next, we’ll create a snapshot specific to our needs–creating SQL queries from text.
Open the Snapshot playground #
The snapshot playground is where you’ll interact with GenAI Studio’s base models to get an idea of how they perform with certain settings and prompts. Once you find a combination that works well, you can save the snapshot and use it to create a new model.
- Navigate to the Snapshots tab from your project dashboard.
- Select New Playground to open a new snapshot Playground.
Select a Foundation (Base) Model and Resource Pool #
First, you’ll need to select a foundation (base) model. You’ll find a curated list of foundation models to choose from. Selecting a foundation model is required because it serves as the underlying engine that drives the generative model’s performance.
- Choose Select model.
- In Model properties, select a chat model such as meta-llama/Llama-2-7b-chat-hf main.
- Select a Resource Pool, e.g.,
a100
. - Select Load.
When selecting a base model, you generally want to select the most appropriate model based on what you need the model to do in your specific project or task.
Link the Dataset #
With our base model loaded, we can link the dataset we uploaded from Hugging Face.
- Open the dataset drawer, then in Load dataset, choose the
determined-ai/text-to-sql-easy
dataset. - Preview the dataset samples and then select Load.
GenAI Studio let’s us easily make use of the columns, the columns instruction
, input
, and response
, to quickly build our prompt.
Create a Prompt #
Supply the model with the following prompt:
Instruction
You are a helpful programmer assistant that excels at SQL. When prompted with a task and a definition of an SQL table, you respond with a SQL query to retrieve information from the table. Don’t explain your reasoning, only provide the SQL query.
Examples
Below is an example:
Instruction: {{instruction}}
SQL table: {{input}}
SQL query: {{response}}
Input
Instruction: {{instruction}}
SQL table: {{input}}
SQL query:
Expected Output
{{response}}
Generate SQL Queries #
After you’ve put everything in, the model will give its response. This is where you see how well it understood your request.
-
Select Generate.
-
Review the model’s response in the output.
-
Select Save Snapshot and provide a meaningful name for your snapshot, such as
snapshot1
. -
Select Save.
Saving a snapshot of the playground lets us revisit this exact point in the training process without needing to start over. This is particularly useful for iterating on our model’s performance, because we can make adjustments and immediately see how they impact the results.
Recap #
- Creating a Project: You set up your own project, establishing a workspace where all your SQL generation activities are centralized.
- Importing and Linking Datasets: You imported the “determined-ai/text-to-sql-easy” dataset from Hugging Face, which is specifically tailored for generating SQL queries from plain text. You successfully linked this dataset to your project, setting the stage for practical, hands-on learning.
- Crafting and Testing Prompts: Using the snapshot playground, you crafted a prompt that guides the model to convert plain text instructions into precise SQL queries. You’ve seen firsthand how to formulate requests that the model can understand and execute.
- Generating SQL Queries: You initiated the generation of SQL queries by engaging with the base model, observing how the model interprets and responds to your inputs.
- Snapshot Management: You created and saved snapshots, a crucial feature that allows you to save your work and revisit it anytime. This facilitates iterative improvements and fine-tuning of the model’s performance.
- Iterative Learning and Improvement: Throughout the tutorial, GenAI Studio’s built-in guardrails helped you prevent common mistakes like memory overloads or suboptimal parameter selection, ensuring a smooth and productive learning experience.
Next, we’ll fine-tune our model.