Resource Requests PPS

Spec

This is a top-level attribute of the pipeline spec.

{
  "pipeline": {...},
  "transform": {...},
  "resourceRequests": {
    "cpu": number,
    "memory": string,
    "gpu": {
      "type": string,
      "number": int
      }
    "disk": string,
  },
  ...
}

Behavior

resourceRequests describes the amount of resources that the pipeline workers will consume. Knowing this in advance enables HPE Machine Learning Data Management to schedule big jobs on separate machines, so that they do not conflict, slow down, or terminate.

This parameter is optional, and if you do not explicitly add it in the pipeline spec, HPE Machine Learning Data Management creates Kubernetes containers with the following default resources:

  • The user and storage containers request 1 CPU, 0 disk space, and 256MB of memory.
  • The init container requests the same amount of CPU, memory, and disk space that is set for the user container.

The resourceRequests parameter enables you to overwrite these default values.

The memory field is a string that describes the amount of memory, in bytes, that each worker needs. Allowed SI suffixes include M, K, G, Mi, Ki, Gi, and other.

For example, a worker that needs to read a 1GB file into memory might set "memory": "1.2G" with a little extra for the code to use in addition to the file. Workers for this pipeline will be placed on machines with at least 1.2GB of free memory, and other large workers will be prevented from using it, if they also set their resourceRequests.

The cpu field is a number that describes the amount of CPU time in cpu seconds/real seconds that each worker needs. Setting "cpu": 0.5 indicates that the worker should get 500ms of CPU time per second. Setting "cpu": 2 indicates that the worker gets 2000ms of CPU time per second. In other words, it is using 2 CPUs, though worker threads might spend 500ms on four physical CPUs instead of one second on two physical CPUs.

The disk field is a string that describes the amount of ephemeral disk space, in bytes, that each worker needs. Allowed SI suffixes include M, K, G, Mi, Ki, Gi, and other.

In both cases, the resource requests are not upper bounds. If the worker uses more memory than it is requested, it does not mean that it will be shut down. However, if the whole node runs out of memory, Kubernetes starts deleting pods that have been placed on it and exceeded their memory request, to reclaim memory. To prevent deletion of your worker node, you must set your memory request to a sufficiently large value. However, if the total memory requested by all workers in the system is too large, Kubernetes cannot schedule new workers because no machine has enough unclaimed memory. cpu works similarly, but for CPU time.

For more information about resource requests and limits see the Kubernetes docs on the subject.