Release Notes Highlights for MLIS

1.2.0

October 31, 2024

Welcome to the 1.2.0 release of HPE Machine Learning Inferencing Software (MLIS).


Highlights

This release includes the following features:

Model Caching (PV/PVC)

As an admin, you can now enable model caching when installing MLIS. Model caching uses ReadWriteMany (RWX) PersistentVolumeClaims (PVCs) to improve inference service startup times and performance.

  • Automatically managed by the controller
  • Efficient access to cached models across multiple namespaces
  • Configurable caching behavior and storage options
  • Support for NFS and compatible storage classes
  • Ability to enable/disable caching for specific models and deployments
  • Tools for managing and cleaning up cached models

To enable, set modelsCacheStorage.enabled: true in Helm values during installation.
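A minimal sketch of the corresponding Helm values override, assuming a values.yaml file; only the modelsCacheStorage.enabled key is confirmed above, and the commented keys are illustrative assumptions (see the Model Caching documentation for the exact supported options):

    # values.yaml (excerpt)
    modelsCacheStorage:
      enabled: true
      # The keys below are illustrative assumptions, not confirmed option names:
      # storageClassName: nfs-client   # an RWX-capable (e.g., NFS) storage class
      # size: 100Gi                    # capacity reserved for cached models

Pass the file to helm install or helm upgrade with the --values flag as usual.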

For full details, see the Model Caching documentation.

Model Registries & Storage

Added support for a new registry type: HPE Machine Learning Data Management PFS repositories. Models can now be pulled using the pfs:// protocol. See the PFS Registry Setup Guide and Add Registry Guide for more information.
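As a purely illustrative example, a pfs:// model reference identifies a repository, a branch or commit, and a path within it; the URI structure shown below is an assumption, so consult the PFS Registry Setup Guide for the authoritative format:

    pfs://<repo>@<branch-or-commit>/<path/to/model>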

Manageable Auto Scaling & Resource Templates

MLIS now offers enhanced customization of auto scaling and resource templates. Default templates are still provided, and you can create and manage custom templates through the Settings page in the UI (an illustrative sketch follows the list below). Key features include:

  • Custom resource template creation for packaged models
  • Custom auto scaling template creation for deployments
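As an illustrative sketch only (templates are created through the Settings page, and the field names below are assumptions rather than the product's schema), a resource template typically captures Kubernetes-style requests and limits, while an auto scaling template captures replica bounds and a scaling target:

    # Illustrative only; field names are assumptions, not the MLIS schema.
    resourceTemplate:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 16Gi
        nvidia.com/gpu: 1
    autoScalingTemplate:
      minReplicas: 1
      maxReplicas: 4
      targetMetric: concurrency   # assumed example metric
      targetValue: 10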

Enhancements

  • Added tooltip to user icon displaying logged-in username on hover

Bug Fixes

  • Fixed UI issue preventing confirmation prompt closure when deleting registries, packaged models, or deployments
  • Corrected display of image field requirement based on packaged model type
  • Ensured that environment variables without values are saved correctly for packaged models and deployments
  • Fixed persistence of environment variable changes on paused deployments
  • Stabilized display of packaged models list from NGC registry in UI

1.1.0

August 7, 2024

Welcome to the 1.1.0 release of HPE Machine Learning Inferencing Software (MLIS).

Tip: REST API Docs
Reminder: You can access the REST API documentation from your MLIS instance by navigating to http://<your-mlis-url>/docs/rest-api/.

Highlights

This release includes the following features:

Deployments

Deployment Tokens

Create deployment tokens to control access to your inference service endpoint.

  • UI Feature: Create and manage deployment tokens via the UI.
  • CLI Feature: In addition to the full timestamp format, you can set an access token's expiration date using a simple date-time or simple date format (illustrated below).
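For illustration, the three expiration formats might look like the following; the exact accepted formats are defined by the CLI, so treat these examples as assumptions:

    2025-06-30T17:00:00Z   # timestamp
    2025-06-30 17:00       # simple date-time
    2025-06-30             # simple date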

Admin

  • Feature: Automatically collect and report anonymous customer telemetry data to improve product quality and support.

Known Issues

  • DB Connections: TLS/SSL connections are not supported for either built-in or external databases. This issue will be addressed in a future release.
  • LLM Streaming Response Payload Received All at Once: We have identified an issue where streaming responses are being received all at once, instead of as a continuous stream. See the No Streamed Responses Troubleshooting article for more information and a workaround.
  • UI Mislabels Deployment on Errors Tab: While a deployment is in progress, the Errors tab may attribute its error messages to a different deployment (for example, “Deployment B”); this is a mislabeling. In this case, ignore the initial column and instead focus on the “Type” and “Message” columns during troubleshooting.
  • Custom Grafana Base URLs: Deployments come with pre-built Grafana dashboard quicklinks. However, if you choose to manually configure grafana.deployment_dashboard_baseurl in values.yaml, the link cannot be set to point to a completely different Grafana instance with a full host path; it must point to a different path on the same Grafana instance (see the sketch after this list).
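A minimal sketch of the constraint in values.yaml; only the grafana.deployment_dashboard_baseurl key is taken from the note above, and the example values are illustrative:

    grafana:
      # Works: a different path on the same Grafana instance
      deployment_dashboard_baseurl: /custom-dashboards
      # Does not work: a full host path to a different Grafana instance
      # deployment_dashboard_baseurl: https://other-grafana.example.com/dashboards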

1.0.0

May 5, 2024

Welcome to the first Generally Available (GA) release of HPE Machine Learning Inferencing Software (MLIS)! We recommend that you follow the steps outlined in our Get Started section to install and configure this platform.

You can also refer to the object model, environment variable, and Helm chart references to learn more about the platform’s architecture and configuration options.

Tip: REST API Docs
You can access the REST API documentation from your MLIS instance by navigating to http://<your-mlis-url>/docs/rest-api/.

Highlights

This release includes the following features:

Registries

Create registries to reference your models from various sources. MLIS supports s3, OpenLLM, and NGC registries.

  • Feature: Perform registry operations via the UI, API, or CLI.
  • Feature: s3 registries include any s3-compatible storage service (e.g., AWS S3 or MinIO).

Packaged Models

Register packaged models to be used in deployments of inference services. These can be either models you’ve trained and uploaded to your registry or pre-existing models provided by your registry. Supported model types include Bento Archive, Custom (openllm or bentoml), bentoml, NIM, and OpenLLM.

  • Feature: Perform packaged model operations via the UI, API, or CLI.
  • Feature: MLIS provides default images to execute bentoml and openllm models from openllm:// and s3:// URLs.
  • Feature: MLIS enables you to pull and execute NIM models directly from the NGC catalog.
  • Feature: You can provide entirely custom bentoml or openllm container images, or build a new image from our default base container images.
  • Feature: Specify environment variables and arguments for the model container during packaged model creation (illustrated after this list).
  • Feature: Specify resource templates for the model container during packaged model creation.
  • Feature: Select GPU types for the model container to use during packaged model creation; this must be enabled by an admin.
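As a purely illustrative sketch of the kinds of values you might supply (MLIS collects these through the UI, API, or CLI; the field names and example values below are assumptions, not the product's schema):

    # Illustrative only; field names and values are assumptions.
    environment:
      - name: HF_HOME            # hypothetical cache location for model weights
        value: /tmp/hf-cache
    arguments:
      - "--max-model-len=4096"   # hypothetical serving argument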

Deployments

Create deployments to launch inference services. Deployments are created from packaged models and can be scaled horizontally. You can provide users access to the deployment via a generated URL.

  • Feature: Perform deployment operations via the UI, API, or CLI.
  • Feature: Choose a default autoscaling target template or define custom autoscaling targets for your deployment.
  • Feature: Provide environment variables and arguments to the deployment instance during deployment creation.
  • Feature: Require authentication for using a deployed inference service.
  • Feature: Initiate canary rollouts to test new model version performance before full deployment.
  • Feature: Monitor deployment performance and resource usage using pre-built Grafana dashboards. These dashboards include logs (via Loki) and metrics (via Prometheus).

Known Issues

  • Custom Grafana Base URLs: Deployments come with pre-built Grafana dashboard quicklinks. However, if you choose to manually configure grafana.deployment_dashboard_baseurl in values.yaml, the link provided on a deployment from the UI will not work.
  • Unsaved Changes: Changes made in a modal (e.g., editing a packaged model) are lost if you click outside the modal without saving.
  • SSO Sign-in Button: You must click on the text of the SSO sign-in button to sign in with SSO; clicking anywhere else on the button itself results in a “user not found” error.
  • NIM Model Dropdown: If you are adding a NIM packaged model for NGC, the dropdown for selecting a NIM model does not display any models due to a NIM v1.0.0 change. You can still manually enter the NIM model name from the NGC catalog in the image field.