Supported Hardware Reference

Consult this resource when setting up your cluster to determine how many concurrent users and models each instance can support.

Calculating Concurrent Models Per Instance Type

Prompt Engineering Requirements

This table shows how many models can run concurrently on a single instance. For example, one p4d.24xlarge instance (8 A100 GPUs) can support eight Llama-2-7b models running prompt engineering at the same time.

Model         8 A100 GPUs (p4d.24xlarge)   8 V100 GPUs (p3.16xlarge)   4 T4 GPUs (g4dn.12xlarge)
Mistral-7b    8 models                     4 models                    2 models
Llama-2-7b    8 models                     4 models                    2 models
Llama-2-13b   4 models                     2 models                    2 models
Llama-2-70b   2 models                     N/A                         N/A
falcon-7b     8 models                     N/A                         N/A
falcon-40b    2 models                     1 model                     N/A
mpt-7b        8 models                     4 models                    2 models
mpt-30b       2 models                     1 model                     N/A
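
For capacity planning, the table above can be encoded as a simple lookup. This is a minimal sketch: the dictionary and helper names are illustrative, not part of any shipped API, and the values simply transcribe the table (None marks an N/A combination).

```python
# Models-per-instance figures from the prompt-engineering table above.
# None means the model/instance combination is unsupported (N/A).
PROMPT_MODELS_PER_INSTANCE = {
    "Mistral-7b":  {"p4d.24xlarge": 8, "p3.16xlarge": 4,    "g4dn.12xlarge": 2},
    "Llama-2-7b":  {"p4d.24xlarge": 8, "p3.16xlarge": 4,    "g4dn.12xlarge": 2},
    "Llama-2-13b": {"p4d.24xlarge": 4, "p3.16xlarge": 2,    "g4dn.12xlarge": 2},
    "Llama-2-70b": {"p4d.24xlarge": 2, "p3.16xlarge": None, "g4dn.12xlarge": None},
    "falcon-7b":   {"p4d.24xlarge": 8, "p3.16xlarge": None, "g4dn.12xlarge": None},
    "falcon-40b":  {"p4d.24xlarge": 2, "p3.16xlarge": 1,    "g4dn.12xlarge": None},
    "mpt-7b":      {"p4d.24xlarge": 8, "p3.16xlarge": 4,    "g4dn.12xlarge": 2},
    "mpt-30b":     {"p4d.24xlarge": 2, "p3.16xlarge": 1,    "g4dn.12xlarge": None},
}

def max_concurrent_models(model: str, instance_type: str, num_instances: int = 1) -> int:
    """Return how many copies of `model` can run prompt engineering at once."""
    per_instance = PROMPT_MODELS_PER_INSTANCE[model][instance_type]
    if per_instance is None:
        return 0  # unsupported model/instance combination
    return per_instance * num_instances
```

For example, `max_concurrent_models("Mistral-7b", "g4dn.12xlarge", num_instances=3)` returns 6, since each g4dn.12xlarge supports two concurrent Mistral-7b models.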

Fine-Tuning Requirements

This table shows how many instances are required to fine-tune each model on different hardware types. For example, one p4d.24xlarge instance can support only one Llama-2-7b fine-tuning job. To fine-tune the same model on V100 hardware, two p3.16xlarge instances (16 V100 GPUs in total) are required.

Model         8 A100 GPUs (p4d.24xlarge)   8 V100 GPUs (p3.16xlarge)   4 T4 GPUs (g4dn.12xlarge)
Mistral-7b    1 instance / job             3 instances / job           4 instances / job
Llama-2-7b    1 instance / job             2 instances / job           4 instances / job
Llama-2-13b   1 or 2 instances / job *     4 instances / job           8 instances / job
Llama-2-70b   N/A                          N/A                         N/A
falcon-7b     0.5 instance / job           2 instances / job           4 instances / job
falcon-40b    4 instances / job            16 instances / job          N/A
mpt-7b        0.5 instance / job           2 instances / job           4 instances / job
mpt-30b       2 or 4 instances / job *     16 instances / job          4 instances / job

* Depends on context_window length.
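
The fine-tuning requirements can be turned into a lookup in the same way. A minimal sketch, with illustrative names only: fractional values mean one instance can host more than one job (0.5 instance / job means two jobs per instance), and rows whose requirement depends on context_window length are given their worst-case value here.

```python
import math

# Instances consumed per fine-tuning job, transcribed from the table above.
# None means the model cannot be fine-tuned on that instance type (N/A);
# Llama-2-70b is omitted because it supports no listed hardware.
FINETUNE_INSTANCES_PER_JOB = {
    "Mistral-7b":  {"p4d.24xlarge": 1,   "p3.16xlarge": 3,  "g4dn.12xlarge": 4},
    "Llama-2-7b":  {"p4d.24xlarge": 1,   "p3.16xlarge": 2,  "g4dn.12xlarge": 4},
    "Llama-2-13b": {"p4d.24xlarge": 2,   "p3.16xllarge": 4, "g4dn.12xlarge": 8},  # worst case on A100
    "falcon-7b":   {"p4d.24xlarge": 0.5, "p3.16xlarge": 2,  "g4dn.12xlarge": 4},
    "falcon-40b":  {"p4d.24xlarge": 4,   "p3.16xlarge": 16, "g4dn.12xlarge": None},
    "mpt-7b":      {"p4d.24xlarge": 0.5, "p3.16xlarge": 2,  "g4dn.12xlarge": 4},
    "mpt-30b":     {"p4d.24xlarge": 4,   "p3.16xlarge": 16, "g4dn.12xlarge": 4},  # worst case on A100
}

def instances_needed(model: str, instance_type: str, concurrent_jobs: int = 1) -> int:
    """Whole instances required to run `concurrent_jobs` fine-tuning jobs."""
    per_job = FINETUNE_INSTANCES_PER_JOB[model][instance_type]
    if per_job is None:
        raise ValueError(f"{model} cannot be fine-tuned on {instance_type}")
    return math.ceil(concurrent_jobs * per_job)
```

For example, `instances_needed("falcon-7b", "p4d.24xlarge", concurrent_jobs=2)` returns 1, since two falcon-7b jobs fit on a single p4d.24xlarge.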