Fine-Tuning Reference
GenAI Studio supports the following configurations for each model and hardware pair.
Mistral-7b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| T4 | 16 | 2048 | 1 | True | True | "float16" |
| V100 | 24 | 2048 | 1 | True | True | "float16" |
| A100 | 8 | 4096 | 1 | True | True | "float16" |
Llama-2-7b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| V100 | 16 | 2048 | 1 | True | True | "float16" |
| T4 | 16 | 2048 | 1 | True | True | "float16" |
| A100 | 4 | 2048 | 2 | True | True | "float16" |
| A100 | 8 | 4096 | 2 | True | True | "float16" |

Llama-2-13b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| V100 | 32 | 2048 | 1 | True | True | "float16" |
| T4 | 32 | 2048 | 1 | True | True | "float16" |
| A100 | 16 | 4096 | 1 | True | True | "float16" |
| A100 | 8 | 1024 | 1 | True | True | "float16" |
falcon-7b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| T4 | 16 | 2048 | 1 | True | True | "float16" |
| V100 | 16 | 2048 | 1 | True | True | "float16" |
| A100 | 4 | 2048 | 2 | True | True | "float16" |

falcon-40b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| V100 | 128 | 1024 | 2 | True | True | "float16" |
| A100 | 32 | 2048 | 4 | True | True | "float16" |
mpt-7b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| T4 | 16 | 2048 | 1 | True | True | "float16" |
| V100 | 16 | 2048 | 1 | True | True | "float16" |
| A100 | 4 | 2048 | 1 | True | True | "float16" |

mpt-30b

| Hardware | slots_per_trial | context_window | batch_size | deepspeed | gradient_checkpointing | torch_dtype |
|---|---|---|---|---|---|---|
| T4 | 64 | 1024 | 1 | True | True | "float16" |
| V100 | 128 | 1024 | 1 | True | True | "float16" |
| A100 | 16 | 1024 | 2 | True | True | "float16" |
| A100 | 32 | 2048 | 2 | True | True | "float16" |
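In code, a table like this is naturally a lookup keyed by the (model, hardware) pair. The sketch below is purely illustrative — the dictionary and function names are not part of GenAI Studio — but the values are taken directly from the tables above.

```python
# Hypothetical helper: look up a recommended fine-tuning configuration
# for a (model, hardware) pair. Names are illustrative only; the
# values mirror two rows from the tables above.

CONFIGS = {
    ("Mistral-7b", "A100"): {
        "slots_per_trial": 8,
        "context_window": 4096,
        "batch_size": 1,
        "deepspeed": True,
        "gradient_checkpointing": True,
        "torch_dtype": "float16",
    },
    ("Llama-2-13b", "V100"): {
        "slots_per_trial": 32,
        "context_window": 2048,
        "batch_size": 1,
        "deepspeed": True,
        "gradient_checkpointing": True,
        "torch_dtype": "float16",
    },
}

def get_config(model: str, hardware: str) -> dict:
    """Return the recommended configuration for a supported pair."""
    try:
        return CONFIGS[(model, hardware)]
    except KeyError:
        raise ValueError(f"unsupported pair: {model} on {hardware}") from None
```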
Configuration Description

- slots_per_trial: The number of slots (GPUs) each trial (such as one fine-tuning run) uses. For example, if `slots_per_trial` is set to 16 and the hardware type is `V100`, a single fine-tuning run requires 16 V100 GPUs.
- context_window: The maximum number of tokens the model can consider at once. For example, a `context_window` of `2048` means the model can handle sequences up to 2048 tokens long.
- batch_size: The number of training samples the model processes together in one step.
- deepspeed: When set to `True`, training uses DeepSpeed's optimizations, which reduce memory consumption and can improve training speed and efficiency.
- gradient_checkpointing: A technique that reduces memory usage by recomputing intermediate activations during the backward pass instead of storing them, allowing larger models to be trained on hardware with limited memory.
- torch_dtype: The data type used during training. For example, `float16` halves memory usage relative to `float32` and can speed up computation.
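To make these knobs concrete, here is a back-of-the-envelope sketch of how they interact with memory and throughput. The byte sizes are standard for the dtypes; treating `batch_size` as per-GPU is an assumption on our part, since the reference above does not specify it.

```python
# Rough numbers behind the settings above (a sketch, not GenAI Studio
# code). Dtype sizes: float16 = 2 bytes/value, float32 = 4 bytes/value.

def param_memory_gib(n_params: int, bytes_per_value: int) -> float:
    """GiB needed just to hold the model weights (ignoring optimizer
    state, gradients, and activations)."""
    return n_params * bytes_per_value / 2**30

# Weights alone for a 7B-parameter model:
fp16_gib = param_memory_gib(7_000_000_000, 2)  # ~13 GiB with torch_dtype "float16"
fp32_gib = param_memory_gib(7_000_000_000, 4)  # ~26 GiB with "float32"

def tokens_per_step(batch_size: int, context_window: int, slots_per_trial: int) -> int:
    """Maximum tokens processed per training step across the trial,
    assuming batch_size is per GPU (an assumption, not stated above)."""
    return batch_size * context_window * slots_per_trial

# e.g. the Llama-2-7b V100 row: 1 * 2048 * 16 = 32768 tokens per step.
```

This is why the larger models in the tables pair more slots with a smaller `context_window` or `batch_size`: total memory per GPU has to stay within the card's limit.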