From Registry (API)

Before You Start

Supported Registry URI Syntax

Review the supported registry URI syntax before adding a model from a registry. Pass the URI in the url field of the request body.

| Prefix | URI Syntax | Description |
|---|---|---|
| hf:// | hf://<model-ref> | A vLLM-compatible model name from huggingface.co, dynamically loaded and executed with a vLLM backend. |
| openllm:// | openllm://<model-ref> | An OpenLLM model name from huggingface.co, dynamically loaded and executed with an OpenLLM + vLLM backend. |
| s3:// | s3://<bucket-name>/<path-to-model> | A model directory dynamically downloaded from an associated S3 registry bucket. Supported for the bento-archive, openllm, and custom model formats. |
| pvc:// | See the From PVC setup guides. | A PVC model path that can be used for pre-downloaded NIM and custom models. |
| pfs:// | pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>] | A PFS model path that can be used for models stored in HPE Machine Learning Data Management repositories. |
| ngc:// | Not supported | |
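As a pre-flight check before sending a request, the prefix rules in the table above can be encoded in a small shell sketch. The functions check_registry_uri and build_pfs_uri are hypothetical helpers, not part of HPE Machine Learning Inferencing Software. The check only inspects the URI prefix; it does not verify that the model, bucket, or repository exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a URI's prefix against the supported-syntax
# table before placing it in the "url" field. Prints "supported" on stdout,
# or an error on stderr with a non-zero return code.
check_registry_uri() {
  local uri="$1"
  case "$uri" in
    hf://?*|openllm://?*|s3://?*|pvc://?*|pfs://?*)
      echo "supported"
      ;;
    ngc://*)
      echo "unsupported: ngc:// is not supported" >&2
      return 1
      ;;
    *)
      echo "unsupported: unknown or empty prefix in '$uri'" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: assemble a URI of the form
#   pfs://<project>/<repo>@<commit>[:<path>][?containerPath=<path>]
# The 4th (path) and 5th (containerPath) arguments are optional.
build_pfs_uri() {
  local project="$1" repo="$2" commit="$3" path="${4:-}" container_path="${5:-}"
  local uri="pfs://${project}/${repo}@${commit}"
  if [ -n "$path" ]; then
    uri="${uri}:${path}"
  fi
  if [ -n "$container_path" ]; then
    uri="${uri}?containerPath=${container_path}"
  fi
  printf '%s\n' "$uri"
}
```

For example, build_pfs_uri my-project models abc123 llama /mnt/models prints pfs://my-project/models@abc123:llama?containerPath=/mnt/models, which check_registry_uri then reports as supported.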

How to Add a Packaged Model From a Registry

  1. Sign in to HPE Machine Learning Inferencing Software.
    curl -X 'POST' \
      '<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "username": "<YOUR_USERNAME>",
      "password": "<YOUR_PASSWORD>"
    }'
  2. Copy the bearer token from the response. You pass it in the Authorization header in the next step.
  3. Use the following cURL command to add a new packaged model.
    curl -X 'POST' \
      '<YOUR_EXT_CLUSTER_IP>/api/v1/models' \
      -H 'accept: application/json' \
      -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
      -H 'Content-Type: application/json' \
      -d '{
      "arguments": ["--debug"],
      "description": "<DESCRIPTION>",
      "environment": {
          "key": "value",
          "key2": "value2"
      },
      "image": "<USER_NAME>/<MODEL_NAME>:<TAG>",
      "modelFormat": "<MODEL_FORMAT>",
      "name": "<MODEL_NAME>",
      "registry": "<REGISTRY>",
      "resources": {
          "gpuType": "<GPU_TYPE>",
          "limits": {
              "cpu": "<CPU_LIMIT>",
              "gpu": "<GPU_LIMIT>",
              "memory": "<MEMORY_LIMIT>"
          },
          "requests": {
              "cpu": "<CPU_REQUEST>",
              "gpu": "<GPU_REQUEST>",
              "memory": "<MEMORY_REQUEST>"
          }
      },
      "url": "<OBJECT_URL>"
    }'
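The steps above can be chained into one shell script. This is a minimal sketch under stated assumptions, not an official client: the login response is assumed to carry the token in a top-level "token" string field (check the actual field name in your response), BASE_URL, MIS_USERNAME, and MIS_PASSWORD are hypothetical environment variable names, and the payload sends only a subset of the fields shown in step 3. Whether the API accepts this subset is not confirmed here; add arguments, environment, image, and resources as needed.

```shell
#!/usr/bin/env bash
# ASSUMPTIONS: the token is returned in a top-level "token" field;
# BASE_URL, MIS_USERNAME, and MIS_PASSWORD are hypothetical variable names.

# Extract a top-level "token" string from a JSON reply read on stdin.
# sed-based: handles simple responses only, not a full JSON parser.
extract_token() {
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print a minimal model payload from: name, modelFormat, registry, url.
# Values are not JSON-escaped, so they must not contain quotes or backslashes.
build_model_payload() {
  printf '{"name": "%s", "modelFormat": "%s", "registry": "%s", "url": "%s"}\n' \
    "$1" "$2" "$3" "$4"
}

# Log in, then POST the payload to /api/v1/models with the bearer token.
add_model() {
  local token payload
  token=$(curl -s -X POST "${BASE_URL}/api/v1/login" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "{\"username\": \"${MIS_USERNAME}\", \"password\": \"${MIS_PASSWORD}\"}" \
    | extract_token)
  if [ -z "$token" ]; then
    echo "login failed, or no \"token\" field in the response" >&2
    return 1
  fi
  payload=$(build_model_payload "$@")
  curl -s -X POST "${BASE_URL}/api/v1/models" \
    -H 'accept: application/json' \
    -H "Authorization: Bearer ${token}" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

A possible invocation, with placeholders left for you to fill in:

    export BASE_URL='<YOUR_EXT_CLUSTER_IP>' MIS_USERNAME='<YOUR_USERNAME>' MIS_PASSWORD='<YOUR_PASSWORD>'
    add_model '<MODEL_NAME>' openllm '<REGISTRY>' 'openllm://<model-ref>'

Keeping token extraction and payload construction in separate functions lets you test both offline before pointing the script at a live cluster.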