From Registry (UI)
Before You Start #
- Set up a registry
- Confirm that your model is available in your chosen registry
- HuggingFace: Sign up for a Hugging Face account and create an access token.
  - Profile > Settings > Access Tokens
  - New Token
- OpenLLM: Sign up for a Hugging Face account and create an access token.
  - Profile > Settings > Access Tokens
  - New Token
- NGC: Sign up for an NVIDIA NGC account and obtain the necessary API key.
  - Profile > Setup > Generate API Key
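For Hugging Face models, one way to confirm that a model is visible to your access token before adding it in the UI is to query the Hub's model-info endpoint. The sketch below assumes the public `https://huggingface.co/api/models/<model-id>` endpoint with bearer-token authentication. The helper names are hypothetical and are not part of HPE Machine Learning Inferencing Software.

```python
import urllib.error
import urllib.request

# Public Hugging Face Hub endpoint for model metadata.
HUB_MODEL_API = "https://huggingface.co/api/models/"


def build_model_check_request(model_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a model's metadata."""
    return urllib.request.Request(
        HUB_MODEL_API + model_id,
        headers={"Authorization": f"Bearer {token}"},
    )


def model_is_available(model_id: str, token: str, timeout: float = 10.0) -> bool:
    """Return True if the Hub answers 200 for this model with this token."""
    request = build_model_check_request(model_id, token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # An HTTP error status (for example 401 or 404) means the token
        # cannot see the model, or the model does not exist.
        return False
```

Calling `model_is_available("<org>/<model>", token)` with your own model ID and token performs the network check. A `True` result only shows that the model's metadata is visible; it does not guarantee that gated model files can be downloaded.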
Basic Details #
- Sign in to HPE Machine Learning Inferencing Software.
- Navigate to Packaged Models.
- Select Add new model.
- Input details for the following:
- Name: The name of the model within HPE Machine Learning Inferencing Software.
- Description: A brief description of the model.
- Select Next.
Storage Details #
MLIS shows the currently available NIMs for the organization specified in the registry. Review the support matrix for a specific NVIDIA NIM for LLMs to properly configure the resources needed to run the model.
When the organization is not specified in your registry configuration, MLIS displays all available NIMs, including those not designed for LLMs. Non-LLM NIMs (such as ProteinMPNN or TTS FastPitch) are incompatible with the LLM configuration interface and may fail to launch using MLIS’s default settings. To run these specialized NIMs, you may need to use the custom model format and provide specific environment variables or arguments as detailed in each NIM’s documentation.
- Input details for the following:
- Registry: The registry where the model is stored.
- Model Format: Options are HuggingFace, OpenLLM, Bento archive, NIM, and Custom.
- Image: The container image serving the model; must be the image name followed by a release tag. For NIM, see the NGC catalog for the image options.
- URL/Path: The location of the model object in the registry.
| Prefix | URI Syntax | Description |
| --- | --- | --- |
| `hf://` | `hf://<model-ref>` | A vLLM-compatible model name from huggingface.co dynamically loaded and executed with a vLLM backend. |
| `openllm://` | `openllm://<model-ref>` | An OpenLLM model name from huggingface.co dynamically loaded and executed with an OpenLLM + vLLM backend. |
| `s3://` | `s3://<bucket-name>/<path-to-model>` | A model directory dynamically downloaded from an associated S3 registry bucket. This is supported for the bento-archive, openllm, and custom model formats. |
| `pvc://` | See From PVC setup guides | A PVC model path that can be used for pre-downloaded NIM and Custom models. |
| `pfs://` | `pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>]` | A PFS model path that can be used for models stored in HPE Machine Learning Data Management repositories. |
| `ngc://` | Not supported | |
- Optionally, enable the local caching toggle to cache the model on first use. This speeds up startup times for subsequent deployments. You must have the Admin user role to enable local caching.
- Select Next.
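Before pasting a value into the URL/Path field, it can help to check it against the prefixes listed in the table. The following is a minimal sketch of such a check, not an MLIS feature: the function name is hypothetical, and the character classes in the `pfs://` pattern are assumptions about what each field may contain.

```python
import re

# Prefixes listed in the URL/Path table; ngc:// is listed as not supported.
SUPPORTED_PREFIXES = ("hf://", "openllm://", "s3://", "pvc://", "pfs://")

# pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>]
# The allowed characters per field are an assumption made for this sketch.
PFS_PATTERN = re.compile(
    r"^pfs://(?P<project>[^/@:?]+)/(?P<repo>[^/@:?]+)@(?P<commit>[^:?]+)"
    r"(?::(?P<path>[^?]+))?"
    r"(?:\?containerPath=(?P<container_path>.+))?$"
)


def check_model_uri(uri: str) -> str:
    """Return the URI's prefix if the table lists it as supported.

    Raises ValueError for ngc://, an unknown prefix, an empty model
    reference, or a pfs:// URI that does not follow the documented syntax.
    """
    if uri.startswith("ngc://"):
        raise ValueError("ngc:// is listed as not supported")
    for prefix in SUPPORTED_PREFIXES:
        if uri.startswith(prefix):
            if len(uri) == len(prefix):
                raise ValueError(f"{prefix} URI has no model reference")
            if prefix == "pfs://" and not PFS_PATTERN.match(uri):
                raise ValueError("pfs:// URI does not match the documented syntax")
            return prefix
    raise ValueError(f"unrecognized prefix in {uri!r}")
```

For example, `check_model_uri("s3://my-bucket/models/demo")` returns the prefix, while a `pfs://` value without an `@<commit>` part raises `ValueError`. The check covers syntax only; it does not confirm that the model exists in the registry.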
Resource Templates #
- Choose a Resource Template or define custom resources.

| Name | Description | Request CPU | Request Memory | Request GPU | Limit CPU | Limit Memory | Limit GPU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| cpu-tiny | 1 cpu, 10Gi memory, no gpu per replica | 1 | 10Gi | | 1 | 10Gi | |
| cpu-small | 4 cpu, 20Gi memory, no gpu per replica | 4 | 20Gi | | 6 | 40Gi | |
| cpu-large | 8 cpu, 40Gi memory, no gpu per replica | 8 | 40Gi | | 10 | 60Gi | |
| gpu-tiny | 1 cpu, 10Gi, 1 gpu per replica | 1 | 10Gi | 1 | 1 | 10Gi | 1 |
| gpu-small | 2 cpu, 20Gi, 2 gpu per replica | 2 | 20Gi | 2 | 6 | 40Gi | 2 |
| gpu-large | 8 cpu, 40Gi, 4 gpu per replica | 8 | 40Gi | 4 | 10 | 60Gi | 4 |

GPU Type: Specifying a GPU type requires heterogeneous GPU support to be enabled.
- Select Next.
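When deciding between the built-in templates and custom resources, it can be useful to compare a model's needs against the template values. The sketch below copies the request and limit values from the template table and picks the smallest template whose requests cover a given need. The class, the function, and the choice to compare against requests rather than limits are assumptions made for illustration; MLIS does not expose this helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceTemplate:
    name: str
    request_cpu: int
    request_memory_gi: int
    request_gpu: int
    limit_cpu: int
    limit_memory_gi: int
    limit_gpu: int


# Values copied from the Resource Templates table; empty GPU cells become 0.
TEMPLATES = [
    ResourceTemplate("cpu-tiny", 1, 10, 0, 1, 10, 0),
    ResourceTemplate("cpu-small", 4, 20, 0, 6, 40, 0),
    ResourceTemplate("cpu-large", 8, 40, 0, 10, 60, 0),
    ResourceTemplate("gpu-tiny", 1, 10, 1, 1, 10, 1),
    ResourceTemplate("gpu-small", 2, 20, 2, 6, 40, 2),
    ResourceTemplate("gpu-large", 8, 40, 4, 10, 60, 4),
]


def smallest_fitting_template(cpu: int, memory_gi: int, gpu: int = 0):
    """Return the smallest template whose requests cover the need, or None.

    "Smallest" means lowest (GPU, CPU, memory) request, compared in that order.
    """
    candidates = [
        t for t in TEMPLATES
        if t.request_cpu >= cpu
        and t.request_memory_gi >= memory_gi
        and t.request_gpu >= gpu
    ]
    return min(
        candidates,
        key=lambda t: (t.request_gpu, t.request_cpu, t.request_memory_gi),
        default=None,
    )
```

If the function returns `None`, no built-in template covers the stated need, which suggests defining custom resources instead. The actual sizing should still follow the model's own documentation, such as the NIM support matrix.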
Environment Variables & Arguments #
Environment variables and arguments are advanced configuration options that you can set for your packaged model. These inputs will vary based on your model’s requirements. For more information, see the Advanced Configuration reference article.
- Provide any needed Environment Variables.
- Provide any needed Arguments.
- Select Create model.