Text to SQL:
Generate SQL Queries

Let’s Get Started!

Structured Query Language (SQL) is complex. Without GenAI Studio, you would need an indepth understanding of databases and metadata. GenAI Studio aims to generate SQL queries from natural language datasets and convert the queries into semantically-correct SQL for you.

By following these step-by-step instructions, you’ll see how easy it is to build a plain-text-to-SQL solution that generates semantically-correct SQL queries for you from your dataset. All you have to do is create a new project, connect to a Hugging Face dataset, load a base model, and begin prompting in the playground. You can even fine-tune your model to make it even more customized!

Objectives

When you have completed this tutorial, you will have created a project and prompted a chat model to generate SQL queries. We’ll cover all of the following topics:

  • Creating a Project
  • Importing a Pre-Loaded Hugging Face Dataset
  • Linking the Dataset to Your Project
  • Creating a Text-to-SQL Prompt
  • Generating SQL Queries from Plain Text

GenAI Studio guides you through going from basic prompt design to pretty good model performance. Its built-in guardrails help prevent memory issues and help prevent you from choosing the wrong parameters.

Get Set Up

Navigating to and from the Cluster
If you’re starting from the Machine Learning Development Environment cluster home page, selecting GenAI will open the Generative AI Studio home page in your browser. Similarly, from your project home page, selecting MLDE will open Machine Learning Development Environment in your browser.

Create a New Project

  1. From the GenAI Studio home page, select Create a new project.
  2. Name the project Text to SQL Tutorial.
  3. Select Create Project.
  4. Select your new project to open it.

In the next section, you’ll import a pre-loaded Hugging Face dataset.

Import a Pre-Loaded Hugging Face Dataset

We’ll import a dataset that is particularly useful for training a model how to create SQL queries from text because it was built for exactly that! From this dataset, you’ll see how to work with your own datasets because it is a good example.

  1. Navigate to the Datasets tab.
  2. Select New Dataset.
  3. Provide the following Hugging Face dataset inputs:
    • Dataset name: determined-ai/text-to-sql-easy
    • Description: text to sql queries
  4. Select Create Dataset.
About the Training Set
In machine learning, we use a training set to teach the model how to perform a task, a validation set to fine-tune the model and make sure it isn’t memorizing answers but actually learning, and a test set to evaluate the model’s performance on completely new, unseen data, ensuring it can effectively handle real-world requests to create SQL queries.

The text-to-sql-easy dataset is already split into train (10%), validation (10%), and test (10%). The dataset consists of the following features:

  • Instruction: The instruction column contains requests or questions written in everyday language and serves as the basis for generating SQL queries.
  • Input: The input column is the blueprint of our dataset and is defined as a collection of CREATE TABLE statements. These statements lay out how our data is organized–what tables exist, and what columns they contain.
  • Response: The response is the end product or target. It contains the actual SQL queries that have been expertly crafted to fetch the information asked for in the instruction column.

Your screen should look similar to this:

You now have a dataset ready for linking to snapshots in your project.

Create a Snapshot and Generate SQL Queries

Now that we have our project and dataset, let’s open the snapshot playground, create a snapshot and link our imported dataset.

Next, we’ll create a snapshot specific to our needs–creating SQL queries from text.

Open the Snapshot playground

The snapshot playground is where you’ll interact with GenAI Studio’s base models to get an idea of how they perform with certain settings and prompts. Once you find a combination that works well, you can save the snapshot and use it to create a new model.

  1. Navigate to the Snapshots tab from your project dashboard.
  2. Select New Playground to open a new snapshot Playground.

Select a Foundation (Base) Model and Resource Pool

First, you’ll need to select a foundation (base) model. You’ll find a curated list of foundation models to choose from. Selecting a foundation model is required because it serves as the underlying engine that drives the generative model’s performance.

  1. Choose Select model.
  2. In Model properties, select a chat model such as meta-llama/Llama-2-7b-chat-hf main.
  3. Select a Resource Pool, e.g., a100.
  4. Select Load.

When selecting a base model, you generally want to select the most appropriate model based on what you need the model to do in your specific project or task.

With our base model loaded, we can link the dataset we uploaded from Hugging Face.

  1. Open the dataset drawer, then in Load dataset, choose the determined-ai/text-to-sql-easy dataset.
  2. Preview the dataset samples and then select Load.

GenAI Studio let’s us easily make use of the columns, the columns instruction, input, and response, to quickly build our prompt.

Create a Prompt

Supply the model with the following prompt:

Instruction

You are a helpful programmer assistant that excels at SQL. When prompted with a task and a definition of an SQL table, you respond with a SQL query to retrieve information from the table. Don’t explain your reasoning, only provide the SQL query.

Examples

Below is an example:
Instruction: {{instruction}}
SQL table: {{input}}
SQL query: {{response}}

Input

Instruction: {{instruction}}
SQL table: {{input}}
SQL query:

Expected Output

{{response}}

Generate SQL Queries

After you’ve put everything in, the model will give its response. This is where you see how well it understood your request.

  1. Select Generate.

  2. Review the model’s response in the output.

  3. Select Save Snapshot and provide a meaningful name for your snapshot, such as snapshot1.

  4. Select Save.

Saving a snapshot of the playground lets us revisit this exact point in the training process without needing to start over. This is particularly useful for iterating on our model’s performance, because we can make adjustments and immediately see how they impact the results.

Congratulations!
Congratulations! You’ve just mastered the art of creating SQL queries with GenAI Studio!

Recap

  • Creating a Project: You set up your own project, establishing a workspace where all your SQL generation activities are centralized.
  • Importing and Linking Datasets: You imported the “determined-ai/text-to-sql-easy” dataset from Hugging Face, which is specifically tailored for generating SQL queries from plain text. You successfully linked this dataset to your project, setting the stage for practical, hands-on learning.
  • Crafting and Testing Prompts: Using the snapshot playground, you crafted a prompt that guides the model to convert plain text instructions into precise SQL queries. You’ve seen firsthand how to formulate requests that the model can understand and execute.
  • Generating SQL Queries: You initiated the generation of SQL queries by engaging with the base model, observing how the model interprets and responds to your inputs.
  • Snapshot Management: You created and saved snapshots, a crucial feature that allows you to save your work and revisit it anytime. This facilitates iterative improvements and fine-tuning of the model’s performance.
  • Iterative Learning and Improvement: Throughout the tutorial, GenAI Studio’s built-in guardrails helped you prevent common mistakes like memory overloads or suboptimal parameter selection, ensuring a smooth and productive learning experience.

Next, we’ll fine-tune our model.