Manage Auto Scaling Templates
HPE Machine Learning Inferencing Software (MLIS) includes default auto scaling templates that users can select when adding or editing a deployment. You can manage these templates and create new ones using the MLIS UI, CLI, or API.
Before You Start #
- You must have the Admin or Maintainer user role to manage auto scaling templates.
Default Auto Scaling Templates #
The following table shows the default auto scaling templates available in MLIS.
name | description | autoscaling_min_replicas | autoscaling_max_replicas | autoscaling_metric | autoscaling_target |
---|---|---|---|---|---|
fixed-1 | One inference service replica, always available. | 1 | 1 | rps | 0 |
fixed-2 | Two inference service replicas, always available. | 2 | 2 | rps | 0 |
scale-0-to-1-concurrency-3 | Scale from 0 to 1 replicas with metric concurrency 3. | 0 | 1 | concurrency | 3 |
scale-0-to-4-rps-10 | Scale from 0 to 4 replicas with metric requests-per-second 10. | 0 | 4 | rps | 10 |
scale-0-to-8-rps-20 | Scale from 0 to 8 replicas with metric requests-per-second 20. | 0 | 8 | rps | 20 |
scale-1-to-4-rps-10 | Scale from 1 to 4 replicas with metric requests-per-second 10. | 1 | 4 | rps | 10 |
scale-1-to-8-concurrency-3 | Scale from 1 to 8 replicas with metric concurrency 3. | 1 | 8 | concurrency | 3 |
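Each default template is simply a named set of the same autoscaling fields you supply when creating your own template. For orientation, recreating the behavior of scale-0-to-4-rps-10 with the CLI command documented later on this page might look like the following sketch; the template name my-scale-0-to-4-rps-10 is illustrative, chosen to avoid colliding with the built-in template.

```bash
# Illustrative sketch: a template equivalent to scale-0-to-4-rps-10,
# created under a hypothetical name using the CLI command shown below.
aioli templates autoscaling create my-scale-0-to-4-rps-10 \
  --autoscaling-min-replicas 0 \
  --autoscaling-max-replicas 4 \
  --autoscaling-metric rps \
  --autoscaling-target 10
```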
How to Add Auto Scaling Templates #
Via the UI #
- In the MLIS UI, navigate to Settings > Auto scaling templates.
- Select Add new auto scaling template.
- Input the name, description, and autoscaling requirements for the template.

  | Field | Description |
  |---|---|
  | Template name | A unique identifier for the auto scaling template |
  | Description | A brief explanation of the template's purpose or characteristics |
  | Minimum instances | The lowest number of instances that will run, even during periods of low activity |
  | Maximum instances | The highest number of instances allowed to run during peak demand |
  | Auto scaling target | The metric and target value used to trigger scaling actions |
  | - Metric | The type of metric to monitor (e.g., concurrency, CPU utilization, memory usage) |
  | - Target | The threshold value for the chosen metric that triggers scaling actions |

  Available Metrics: The available metric types depend on the Autoscaler implementation:

  - KPA Autoscaler (default): Supports `concurrency` and `rps` metrics
  - HPA Autoscaler: Supports the `cpu` metric

- Select Create template.
The new template is now available from the Auto scaling targets template dropdown on the Scaling tab when adding or editing a deployment. To update this template, select the ellipsis icon next to the template name and choose Edit.
Via the CLI #
- Sign in via the CLI.
  ```bash
  aioli user login admin
  ```
- Add a new auto scaling template with the following command:

  ```bash
  aioli templates autoscaling create <TEMPLATE_NAME> \
    --autoscaling-min-replicas <MIN_REPLICAS> \
    --autoscaling-max-replicas <MAX_REPLICAS> \
    --autoscaling-metric <METRIC> \
    --autoscaling-target <TARGET>
  ```
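For example, a template that scales between 1 and 6 replicas on a concurrency target of 5 could be created as follows. The name and values are illustrative; choose bounds, metric, and target that suit your workload.

```bash
# Illustrative values: scale from 1 to 6 replicas, targeting 5 concurrent
# requests per replica (KPA Autoscaler metric).
aioli templates autoscaling create scale-1-to-6-concurrency-5 \
  --autoscaling-min-replicas 1 \
  --autoscaling-max-replicas 6 \
  --autoscaling-metric concurrency \
  --autoscaling-target 5
```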
Via the API #
- Sign in to MLIS.
  ```bash
  curl -X 'POST' \
    'https://<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
      "username": "<YOUR_USERNAME>",
      "password": "<YOUR_PASSWORD>"
    }'
  ```
- Obtain the Bearer token from the response.
- Use the following cURL command to add a new auto scaling template.
  ```bash
  curl -X 'POST' \
    'https://<YOUR_EXT_CLUSTER_IP>/api/v1/templates/autoscaling' \
    -H 'Accept: application/json' \
    -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
    -H 'Content-Type: application/json' \
    -d '{
      "autoScaling": {
        "maxReplicas": 1,
        "metric": "rps",
        "minReplicas": 0,
        "target": 1
      },
      "description": "An autoscaling template",
      "name": "my-template"
    }'
  ```
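Putting the steps together, the sketch below logs in, captures the token, and creates a template in one pass. It assumes jq is installed and that the login response returns the access token in a field named token; that field name is not shown on this page, so verify it against your MLIS API reference. The template name and values are illustrative.

```bash
# Assumption: the login response contains the access token in a ".token"
# field -- adjust the jq filter to match your actual response schema.
TOKEN=$(curl -s -X 'POST' \
  'https://<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
  -H 'Content-Type: application/json' \
  -d '{ "username": "<YOUR_USERNAME>", "password": "<YOUR_PASSWORD>" }' \
  | jq -r '.token')

# Create an illustrative template that scales from 0 to 4 replicas at
# 10 requests per second.
curl -X 'POST' \
  'https://<YOUR_EXT_CLUSTER_IP>/api/v1/templates/autoscaling' \
  -H 'Accept: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{
    "autoScaling": {
      "maxReplicas": 4,
      "metric": "rps",
      "minReplicas": 0,
      "target": 10
    },
    "description": "Scale from 0 to 4 replicas at 10 requests per second",
    "name": "my-scale-0-to-4-rps-10"
  }'
```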