Create Image (OpenLLM)
The following steps guide you through creating a containerized image for an inference service using the OpenLLM CLI. The result is a publicly accessible container image published at <user>/<model-name>, which can be referenced in an HPE Machine Learning Inferencing Software Model definition and deployed as a service.
Before You Start #
- Ensure you have completed Developer System Setup.
- Ensure you have Docker installed and running.
- Ensure you have deployed the HPE Machine Learning Inferencing Software controller and have the openllm CLI installed.
- Ensure you have an available GPU on the system.
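To quickly confirm the Docker and GPU prerequisites, you can run the following checks. This is a minimal sketch; nvidia-smi assumes an NVIDIA GPU with drivers installed.
# Verify the Docker daemon is installed and running (errors if it is not)
docker info > /dev/null && echo "Docker daemon is reachable"
# List available NVIDIA GPUs (other GPU vendors require different tools)
nvidia-smi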
How to Create a Containerized Image #
- Build the container using the openllm build command.
openllm build --backend vllm --containerize <user/model-name>
- Get the IMAGE ID of the resulting container. (Steps 2 through 5 are also combined into a single scripted sketch after this list.)
docker image ls
REPOSITORY                  TAG                                        IMAGE ID       CREATED         SIZE
tiiuae--falcon-7b-service   898df1396f35e447d5fe44e0a3ccaaaa69f30d36   4ac4bb1f2dce   3 minutes ago   24.9GB
- Tag the resulting container.
docker tag <image-id> <user>/<model-name>
- Push the container to a publicly accessible Docker repository.
docker push <user>/<model-name>
- Verify the container is accessible.
docker pull <user>/<model-name>
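As noted above, steps 2 through 5 can also be combined into a single script. The following is an illustrative sketch, not part of the product tooling; it assumes the freshly built image is the most recently created one on the system and that you have already run docker login for the target registry.
# Grab the IMAGE ID of the most recently created image (assumes it is the new build)
IMAGE_ID=$(docker images --format '{{.ID}}' | head -n 1)
# Tag, publish, and verify the image (replace <user>/<model-name> with your repository)
docker tag "$IMAGE_ID" <user>/<model-name>
docker push <user>/<model-name>
docker pull <user>/<model-name>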
You are now ready to reference this image in a packaged model definition in HPE Machine Learning Inferencing Software.
Model Testing #
You can serve a model for interactive testing before building the container by specifying the desired LLM name in the following command:
openllm start --backend vllm facebook/opt-125m
Once started, the LLM service will be listening on http://localhost:3000. You may interact with it via the Swagger UI web interface using a browser pointed to that URL. The Swagger UI shows the available REST API methods of the service.
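For example, you can exercise the REST API directly with curl. The sketch below assumes a /v1/generate endpoint accepting a JSON prompt, which is typical of OpenLLM services; confirm the exact route and payload shape in the Swagger UI for your version.
# Send a prompt to the running service (endpoint and payload are assumptions; check the Swagger UI)
curl -X POST http://localhost:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "What is an LLM?"}'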
You can also use the openllm query command:
openllm query "What is an LLM?"
What is an LLM?
A degree in engineering
(A rough answer like this is expected from a very small model such as facebook/opt-125m; it is used here only to demonstrate the workflow.)