From Registry (API)
Before You Start #
- Set up a registry.
- Confirm that your model is available in your chosen registry.
  - OpenLLM: Sign up for a Hugging Face account and create an access token (Profile > Settings > Access Tokens > New Token).
  - NGC: Sign up for an NVIDIA NGC account and generate an API key (Profile > Setup > Generate API Key).
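If you plan to reference these credentials in later commands, one convenient pattern is to export them as environment variables. This is a minimal sketch; the variable names are a local convention, not something the platform requires:

```
# Illustrative convenience only; variable names are not required by the platform.
export HF_TOKEN='<YOUR_HUGGING_FACE_TOKEN>'   # Hugging Face: Profile > Settings > Access Tokens > New Token
export NGC_API_KEY='<YOUR_NGC_API_KEY>'       # NGC: Profile > Setup > Generate API Key
```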
Supported Registry URI Syntax #
Review the supported registry URI syntax before adding a model from a registry. This URI is passed in using the `url` field.
Prefix | URI Syntax | Description |
---|---|---|
`openllm://` | `openllm://<model-ref>` | An OpenLLM model name from huggingface.co, dynamically loaded and executed with a vLLM backend. |
`s3://` | `s3://<bucket-name>/<path-to-model>` | A model directory dynamically downloaded from an associated S3 registry bucket. This is supported for the bento-archive, openllm, and custom model formats. |
`pvc://` | See the From PVC setup guides | A PVC model path that can be used for pre-downloaded NIM and custom models. |
`pfs://` | `pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>]` | A PFS model path that can be used for models stored in HPE Machine Learning Data Management repositories. |
`ngc://` | Not supported | |
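For illustration, here are sample `url` values for each supported prefix. The model reference, bucket name, and repo path below are all hypothetical:

```
openllm://facebook/opt-1.3b              # hypothetical Hugging Face model reference
s3://my-model-bucket/models/my-model/    # hypothetical bucket and model path
pfs://my-project/my-repo@master:/model   # hypothetical project, repo, branch, and path
```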
How to Add a Packaged Model From a Registry #
- Sign in to HPE Machine Learning Inferencing Software.
```
curl -X 'POST' \
  '<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "username": "<YOUR_USERNAME>",
    "password": "<YOUR_PASSWORD>"
  }'
```
- Obtain the Bearer token from the response.
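If `jq` is available, one way to capture the token for reuse is shown below. This assumes the login response returns the token in a top-level `token` field; confirm the field name against your actual response body:

```
# Assumption: the login response is JSON with a top-level "token" field.
TOKEN=$(curl -s -X 'POST' \
  '<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"username": "<YOUR_USERNAME>", "password": "<YOUR_PASSWORD>"}' \
  | jq -r '.token')
echo "$TOKEN"
```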
- Use the following cURL command to add a new packaged model.
```
curl -X 'POST' \
  '<YOUR_EXT_CLUSTER_IP>/api/v1/models' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "arguments": ["--debug"],
    "description": "<DESCRIPTION>",
    "environment": {
      "key": "value",
      "key2": "value2"
    },
    "image": "<USER_NAME>/<MODEL_NAME>:<TAG>",
    "modelFormat": "<MODEL_FORMAT>",
    "name": "<MODEL_NAME>",
    "registry": "<REGISTRY>",
    "resources": {
      "gpuType": "<GPU_TYPE>",
      "limits": {
        "cpu": "<CPU_LIMIT>",
        "gpu": "<GPU_LIMIT>",
        "memory": "<MEMORY_LIMIT>"
      },
      "requests": {
        "cpu": "<CPU_REQUEST>",
        "gpu": "<GPU_REQUEST>",
        "memory": "<MEMORY_REQUEST>"
      }
    },
    "url": "<OBJECT_URL>"
  }'
```
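As a worked example, the same request with illustrative values filled in for an OpenLLM model. Every value here is hypothetical: the `image` field is omitted on the assumption that it is not needed when the model is pulled from a registry, and the `HF_TOKEN` pass-through via `environment` is likewise an assumption, not a documented requirement.

```
# All values below are illustrative, not prescriptive.
curl -X 'POST' \
  '<YOUR_EXT_CLUSTER_IP>/api/v1/models' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
  -H 'Content-Type: application/json' \
  -d '{
    "arguments": ["--debug"],
    "description": "Example chat model served with a vLLM backend",
    "environment": { "HF_TOKEN": "<YOUR_HUGGING_FACE_TOKEN>" },
    "modelFormat": "openllm",
    "name": "example-chat-model",
    "registry": "<REGISTRY>",
    "resources": {
      "gpuType": "<GPU_TYPE>",
      "limits": { "cpu": "4", "gpu": "1", "memory": "16Gi" },
      "requests": { "cpu": "2", "gpu": "1", "memory": "8Gi" }
    },
    "url": "openllm://facebook/opt-1.3b"
  }'
```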