From Registry (UI)
Before You Start #
- Set up a registry
- Confirm that your model is available in your chosen registry
- OpenLLM: Sign up for a Hugging Face account and create an access token.
  - Profile > Settings > Access Tokens
  - New Token
- NGC: Sign up for an NVIDIA NGC account and obtain the necessary API key.
  - Profile > Setup > Generate API Key
Basic Details #
- Sign in to HPE Machine Learning Inferencing Software.
- Navigate to Packaged Models.
- Select Add new model.
- Input details for the following:
- Name: The name of the model within HPE Machine Learning Inferencing Software.
- Description: A brief description of the model.
- Select Next.
Storage Details #
MLIS shows the NIMs currently available for the organization specified in the registry. Review the support matrix for your specific NVIDIA NIM for LLMs to properly configure the resources needed to run the model.
When the organization is not specified in your registry configuration, MLIS displays all available NIMs, including those not designed for LLMs. Non-LLM NIMs (such as ProteinMPNN or TTS FastPitch) are incompatible with the LLM configuration interface and may fail to launch using MLIS’s default settings. To run these specialized NIMs, you may need to use the custom model format and provide specific environment variables or arguments as detailed in each NIM’s documentation.
- Input details for the following:
- Registry: The registry where the model is stored.
- Model Format: Options are OpenLLM, Bento archive, NIM, and Custom.
- Image: The container image serving the model; must be the name of the image plus a release tag. For NIM, see the NGC catalog for the image options.
- URL/Path: The location of the model object in the registry.
| Prefix | URI Syntax | Description |
| --- | --- | --- |
| `openllm://` | `openllm://<model-ref>` | An OpenLLM model name from huggingface.co, dynamically loaded and executed with a vLLM backend. |
| `s3://` | `s3://<bucket-name>/<path-to-model>` | A model directory dynamically downloaded from an associated S3 registry bucket. Supported for the Bento archive, OpenLLM, and Custom model formats. |
| `pvc://` | See the From PVC setup guides | A PVC model path that can be used for pre-downloaded NIM and Custom models. |
| `pfs://` | `pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>]` | A PFS model path for models stored in HPE Machine Learning Data Management repositories. |
| `ngc://` | Not supported | |
- Optionally, enable the local caching toggle to cache the model on first use. This speeds up startup times for subsequent deployments. You must have the Admin user role to enable local caching.
- Select Next.
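The URL/Path prefixes above can be sketched with a short validation helper. This is an illustrative sketch only, not MLIS code; the bucket and model names in the example are placeholders, and MLIS performs its own validation server-side.

```python
from urllib.parse import urlsplit

# Prefixes accepted by the URL/Path field, per the table above.
SUPPORTED_SCHEMES = {"openllm", "s3", "pvc", "pfs"}

def parse_model_url(url: str) -> dict:
    """Split a model URL into its prefix (scheme) and location parts.

    Raises ValueError for prefixes the table marks unsupported (e.g. ngc://).
    """
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported prefix: {parts.scheme}://")
    # Everything after "<scheme>://" is the registry-specific location.
    return {"scheme": parts.scheme, "location": url[len(parts.scheme) + 3:]}

# Placeholder example; substitute your own bucket and model path.
print(parse_model_url("s3://my-bucket/models/my-model"))
```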
Resource Templates #
- Choose a Resource Template or define custom resources.

| Name | Description | Request CPU | Request Memory | Request GPU | Limit CPU | Limit Memory | Limit GPU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cpu-tiny | 1 CPU, 10Gi memory, no GPU per replica | 1 | 10Gi | | 1 | 10Gi | |
| cpu-small | 4 CPU, 20Gi memory, no GPU per replica | 4 | 20Gi | | 6 | 40Gi | |
| cpu-large | 8 CPU, 40Gi memory, no GPU per replica | 8 | 40Gi | | 10 | 60Gi | |
| gpu-tiny | 1 CPU, 10Gi memory, 1 GPU per replica | 1 | 10Gi | 1 | 1 | 10Gi | 1 |
| gpu-small | 2 CPU, 20Gi memory, 2 GPU per replica | 2 | 20Gi | 2 | 6 | 40Gi | 2 |
| gpu-large | 8 CPU, 40Gi memory, 4 GPU per replica | 8 | 40Gi | 4 | 10 | 60Gi | 4 |

GPU Type: Specifying a GPU type requires heterogeneous GPU support to be enabled.

- Select Next.
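The templates above follow the familiar Kubernetes requests/limits pattern. As an illustration only (not the exact manifest MLIS generates), the gpu-small template corresponds to a container resources block like:

```yaml
# Illustrative Kubernetes-style resources for the gpu-small template.
# The actual manifest MLIS generates may differ.
resources:
  requests:
    cpu: "2"
    memory: 20Gi
    nvidia.com/gpu: 2
  limits:
    cpu: "6"
    memory: 40Gi
    nvidia.com/gpu: 2
```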
Environment Variables & Arguments #
Environment variables and arguments are advanced configuration options that you can set for your packaged model. These inputs will vary based on your model’s requirements. For more information, see the Advanced Configuration reference article.
- Provide any needed Environment Variables.
- Provide any needed Arguments.
- Select Create model.
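As one hedged example of the environment variables step: an OpenLLM model pulled from a gated Hugging Face repository typically needs the access token you created in Before You Start. The exact variable names depend on your model's backend, so treat this as illustrative and consult the Advanced Configuration reference for what your model requires.

```
# Illustrative only; variable names depend on the model's backend.
HF_TOKEN=<your-hugging-face-access-token>
```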