Release Notes Highlights for MLIS

1.2.0

October 31, 2024

Welcome to the 1.2.0 release of HPE Machine Learning Inferencing Software (MLIS).


Highlights

This release includes the following features:

Model Caching (PV/PVC)

As an admin, you can now enable model caching when installing MLIS. Model caching uses ReadWriteMany (RWX) PersistentVolumeClaims (PVCs) to improve inference service startup times and performance.

  • Automatically managed by the controller
  • Efficient access to cached models across multiple namespaces
  • Configurable caching behavior and storage options
  • Support for NFS and compatible storage classes
  • Ability to enable/disable caching for specific models and deployments
  • Tools for managing and cleaning up cached models

To enable, set modelsCacheStorage.enabled: true in Helm values during installation.
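A minimal sketch of the corresponding Helm values override, assuming a values.yaml file; only the modelsCacheStorage.enabled key is confirmed above, and the commented keys are illustrative assumptions (see the Model Caching documentation for the exact supported options):

    # values.yaml (excerpt)
    modelsCacheStorage:
      enabled: true
      # The keys below are illustrative assumptions, not confirmed option names:
      # storageClassName: nfs-client   # an RWX-capable (e.g., NFS) storage class
      # size: 100Gi                    # capacity reserved for cached models

Pass the file to helm install or helm upgrade with the --values flag as usual.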

For full details, see the Model Caching documentation.

Model Registries & Storage

Added support for a new registry type: HPE Machine Learning Data Management PFS repositories. Models can now be pulled using the pfs:// protocol. See the PFS Registry Setup Guide and Add Registry Guide for more information.
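As a purely illustrative example, a pfs:// model reference identifies a repository, a branch or commit, and a path within it; the URI structure shown below is an assumption, so consult the PFS Registry Setup Guide for the authoritative format:

    pfs://<repo>@<branch-or-commit>/<path/to/model>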

Manageable Auto Scaling & Resource Templates

MLIS now offers enhanced customization of auto scaling and resource templates. Default templates are still provided, and you can create and manage custom templates through the Settings page in the UI (an illustrative sketch follows the list below). Key features include:

  • Custom resource template creation for packaged models
  • Custom auto scaling template creation for deployments
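As an illustrative sketch only (templates are created through the Settings page, and the field names below are assumptions rather than the product's schema), a resource template typically captures Kubernetes-style requests and limits, while an auto scaling template captures replica bounds and a scaling target:

    # Illustrative only; field names are assumptions, not the MLIS schema.
    resourceTemplate:
      requests:
        cpu: "2"
        memory: 8Gi
      limits:
        cpu: "4"
        memory: 16Gi
        nvidia.com/gpu: 1
    autoScalingTemplate:
      minReplicas: 1
      maxReplicas: 4
      targetMetric: concurrency   # assumed example metric
      targetValue: 10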

Enhancements

  • Added tooltip to user icon displaying logged-in username on hover

Bug Fixes

  • Fixed UI issue preventing confirmation prompt closure when deleting registries, packaged models, or deployments
  • Corrected display of image field requirement based on packaged model type
  • Ensured that environment variables without values are saved correctly for packaged models and deployments
  • Fixed persistence of environment variable changes on paused deployments
  • Stabilized display of packaged models list from NGC registry in UI

1.1.0

August 7, 2024

Welcome to the 1.1.0 release of HPE Machine Learning Inferencing Software (MLIS).

Tip: REST API Docs
Reminder: You can access the REST API documentation from your MLIS instance by navigating to http://<your-mlis-url>/docs/rest-api/.

Highlights

This release includes the following features:

Deployments

Deployment Tokens

Create deployment tokens to control access to your inference service endpoint.

  • UI Feature: Create and manage deployment tokens via the UI.
  • CLI Feature: In addition to the full timestamp format, you can set an access token's expiration date using a simple date-time or simple date format (illustrated below).
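For illustration, the three expiration formats might look like the following; the exact accepted formats are defined by the CLI, so treat these examples as assumptions:

    2025-06-30T17:00:00Z   # timestamp
    2025-06-30 17:00       # simple date-time
    2025-06-30             # simple date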

Admin

  • Feature: Automatically collect and report anonymous customer telemetry data to improve product quality and support.

Known Issues

  • DB Connections: TLS/SSL connections are not supported for either built-in or external databases. This issue will be addressed in a future release.
  • LLM Streaming Response Payload Received All at Once: We have identified an issue where streaming responses are being received all at once, instead of as a continuous stream. See the No Streamed Responses Troubleshooting article for more information and a workaround.
  • UI Mislabels Deployment on Errors Tab: While a deployment is in progress, the Errors tab may attribute its error messages to a different deployment (for example, “Deployment B”); this is a mislabeling. In this case, ignore the initial column and instead focus on the “Type” and “Message” columns during troubleshooting.
  • Custom Grafana Base URLs: Deployments come with pre-built Grafana dashboard quicklinks. However, if you choose to manually configure grafana.deployment_dashboard_baseurl in values.yaml, the link cannot be set to point to a completely different Grafana instance with a full host path; it must point to a different path on the same Grafana instance (see the sketch after this list).
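A minimal sketch of the constraint in values.yaml; only the grafana.deployment_dashboard_baseurl key is taken from the note above, and the example values are illustrative:

    grafana:
      # Works: a different path on the same Grafana instance
      deployment_dashboard_baseurl: /custom-dashboards
      # Does not work: a full host path to a different Grafana instance
      # deployment_dashboard_baseurl: https://other-grafana.example.com/dashboards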

1.0.0

May 5, 2024

Welcome to the first Generally Available (GA) release of HPE Machine Learning Inferencing Software (MLIS)! We recommend that you follow the steps outlined in our Get Started section to install and configure this platform.

You can also refer to the object model, environment variable, and Helm chart references to learn more about the platform’s architecture and configuration options.

Tip: REST API Docs
You can access the REST API documentation from your MLIS instance by navigating to http://<your-mlis-url>/docs/rest-api/.

Highlights

This release includes the following features:

Registries

Create registries to reference your models from various sources. MLIS supports s3, OpenLLM, and NGC registries.

  • Feature: Perform registry operations via the UI, API, or CLI.
  • Feature: s3 registries include any s3-compatible storage service (e.g., AWS S3 or MinIO).

Packaged Models

Register packaged models to be used in deployments of inference services. These can be either models you’ve trained and uploaded to your registry or pre-existing models provided by your registry. Supported model types include Bento Archive, Custom (openllm or bentoml), bentoml, NIM, and OpenLLM.

  • Feature: Perform packaged model operations via the UI, API, or CLI.
  • Feature: MLIS provides default images to execute bentoml and openllm models from openllm:// and s3:// URLs.
  • Feature: MLIS enables you to pull and execute NIM models directly from the NGC catalog.
  • Feature: You can provide entirely custom bentoml or openllm container images, or build a new image from our default base container images.
  • Feature: Specify environment variables and arguments for the model container during packaged model creation (illustrated after this list).
  • Feature: Specify resource templates for the model container during packaged model creation.
  • Feature: Select GPU types for the model container to use during packaged model creation; this must be enabled by an admin.
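As a purely illustrative sketch of the kinds of values you might supply (MLIS collects these through the UI, API, or CLI; the field names and example values below are assumptions, not the product's schema):

    # Illustrative only; field names and values are assumptions.
    environment:
      - name: HF_HOME            # hypothetical cache location for model weights
        value: /tmp/hf-cache
    arguments:
      - "--max-model-len=4096"   # hypothetical serving argument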

Deployments

Create deployments to launch inference services. Deployments are created from packaged models and can be scaled horizontally. You can provide users access to the deployment via a generated URL.

  • Feature: Perform deployment operations via the UI, API, or CLI.
  • Feature: Choose a default autoscaling target template or define custom autoscaling targets for your deployment.
  • Feature: Provide environment variables and arguments to the deployment instance during deployment creation.
  • Feature: Require authentication for using a deployed inference service.
  • Feature: Initiate canary rollouts to test new model version performance before full deployment.
  • Feature: Monitor deployment performance and resource usage using pre-built Grafana dashboards. These dashboards include logs (via Loki) and metrics (via Prometheus).

Known Issues

  • Custom Grafana Base URLs: Deployments come with pre-built Grafana dashboard quicklinks. However, if you choose to manually configure grafana.deployment_dashboard_baseurl in values.yaml, the link provided on a deployment from the UI will not work.
  • Unsaved Changes: Changes made in a modal (e.g., editing a packaged model) are lost if you click outside the modal without saving.
  • SSO Sign-in Button: You must click on the text of the SSO sign-in button to sign in with SSO; clicking anywhere else on the button itself results in a “user not found” error.
  • NIM Model Dropdown: If you are adding a NIM packaged model for NGC, the dropdown for selecting a NIM model does not display any models due to a NIM v1.0.0 change. You can still manually enter the NIM model name from the NGC catalog in the image field.