Manage Auto Scaling Templates

HPE Machine Learning Inferencing Software (MLIS) includes default auto scaling templates that users can select when adding or editing a deployment. You can manage these templates and create new ones using the MLIS UI, CLI, or API.

Before You Start

  • You must have the Admin or Maintainer user role to manage auto scaling templates.

Default Auto Scaling Templates

The following table shows the default auto scaling templates available in MLIS.

name | description | autoscaling_min_replicas | autoscaling_max_replicas | autoscaling_metric | autoscaling_target
fixed-1 | One inference service replica, always available. | 1 | 1 | rps | 0
fixed-2 | Two inference service replicas, always available. | 2 | 2 | rps | 0
scale-0-to-1-concurrency-3 | Scale from 0 to 1 replica with metric concurrency 3. | 0 | 1 | concurrency | 3
scale-0-to-4-rps-10 | Scale from 0 to 4 replicas with metric requests-per-second 10. | 0 | 4 | rps | 10
scale-0-to-8-rps-20 | Scale from 0 to 8 replicas with metric requests-per-second 20. | 0 | 8 | rps | 20
scale-1-to-4-rps-10 | Scale from 1 to 4 replicas with metric requests-per-second 10. | 1 | 4 | rps | 10
scale-1-to-8-concurrency-3 | Scale from 1 to 8 replicas with metric concurrency 3. | 1 | 8 | concurrency | 3
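
For reference, this is how one of these defaults, scale-0-to-4-rps-10, maps onto the template fields when expressed as the JSON body accepted by the templates API shown later on this page (the defaults shipped with your MLIS version may differ slightly):

    {
      "name": "scale-0-to-4-rps-10",
      "description": "Scale from 0 to 4 replicas with metric requests-per-second 10.",
      "autoScaling": {
        "minReplicas": 0,
        "maxReplicas": 4,
        "metric": "rps",
        "target": 10
      }
    }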

How to Add Auto Scaling Templates

Via the UI

  1. In the MLIS UI, navigate to Settings > Auto scaling templates.

  2. Select Add new auto scaling template.

  3. Input the name, description, and autoscaling requirements for the template.

    • Template name: A unique identifier for the auto scaling template.
    • Description: A brief explanation of the template's purpose or characteristics.
    • Minimum instances: The lowest number of instances that will run, even during periods of low activity.
    • Maximum instances: The highest number of instances allowed to run during peak demand.
    • Auto scaling target: The metric and target value used to trigger scaling actions.
      • Metric: The type of metric to monitor (e.g., concurrency, CPU utilization, memory usage).
      • Target: The threshold value for the chosen metric that triggers scaling actions (see the worked example after these steps).
    Note: Available Metrics

    The available metric types depend on the Autoscaler implementation:

    • KPA Autoscaler (default): Supports concurrency and rps metrics
    • HPA Autoscaler: Supports the cpu metric
  4. Select Create template.

The new template is now available from the Auto scaling targets template dropdown on the Scaling tab when adding or editing a deployment. To update this template, select the ellipsis icon next to the template name and choose Edit.
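
As a rough guide, and assuming the Knative KPA-style behavior implied by the default KPA Autoscaler, the number of running instances tends toward the observed metric value divided by the target, rounded up and clamped between the minimum and maximum instances. For example, with metric rps and target 10 (as in scale-0-to-4-rps-10), a sustained load of about 35 requests per second would scale the deployment up to its maximum of 4 replicas, and the deployment would eventually scale back to 0 replicas once traffic stops.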

Via the CLI

  1. Sign in via the CLI.
    aioli user login admin
  2. Add a new auto scaling template with the following command:
    aioli templates autoscaling create <TEMPLATE_NAME> \
    --autoscaling-min-replicas <MIN_REPLICAS> \
    --autoscaling-max-replicas <MAX_REPLICAS> \
    --autoscaling-metric <METRIC> \
    --autoscaling-target <TARGET>
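
For example, the following command creates a template comparable to the scale-1-to-4-rps-10 default; the template name and values here are only illustrative, and only the flags documented above are used:

    aioli templates autoscaling create my-rps-template \
    --autoscaling-min-replicas 1 \
    --autoscaling-max-replicas 4 \
    --autoscaling-metric rps \
    --autoscaling-target 10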

Via the API

  1. Sign in to MLIS.
    curl -X 'POST' \
      'https://<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "username": "<YOUR_USERNAME>",
      "password": "<YOUR_PASSWORD>"
    }'
  2. Obtain the Bearer token from the response.
  3. Use the following cURL command to add a new auto scaling template.
    curl -X 'POST' \
      'https://<YOUR_EXT_CLUSTER_IP>/api/v1/templates/autoscaling' \
      -H 'Accept: application/json' \
      -H 'Authorization: Bearer <YOUR_ACCESS_TOKEN>' \
      -H 'Content-Type: application/json' \
      -d '{
        "autoScaling": {
          "maxReplicas": 1,
          "metric": "rps",
          "minReplicas": 0,
          "target": 1
        },
        "description": "An autoscaling template",
        "name": "my-template"
    }'
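
Putting the login and create steps together, the following is a minimal scripted sketch. It assumes the login response returns the access token in a token field and that jq is installed; adjust the field name to whatever your MLIS version actually returns.

    # Log in and capture the access token (assumes a "token" field in the response).
    TOKEN=$(curl -s -X 'POST' \
      'https://<YOUR_EXT_CLUSTER_IP>/api/v1/login' \
      -H 'Content-Type: application/json' \
      -d '{"username": "<YOUR_USERNAME>", "password": "<YOUR_PASSWORD>"}' | jq -r '.token')

    # Create the auto scaling template with the captured token.
    curl -X 'POST' \
      'https://<YOUR_EXT_CLUSTER_IP>/api/v1/templates/autoscaling' \
      -H 'Accept: application/json' \
      -H "Authorization: Bearer ${TOKEN}" \
      -H 'Content-Type: application/json' \
      -d '{
        "autoScaling": { "minReplicas": 0, "maxReplicas": 4, "metric": "rps", "target": 10 },
        "description": "Scale from 0 to 4 replicas with metric requests-per-second 10.",
        "name": "my-rps-template"
      }'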