Advanced Configuration Options

While adding or editing a deployment, you can customize the default runtime configuration MLIS uses to start your model server. This is useful for models that require modifications to MLIS’s default settings or when using a custom image that needs specific configuration parameters.

Note

Before You Start

  • You should review your model’s framework documentation (e.g., OpenLLM, BentoML, NIM) and its CLI options.
  • You should already know the arguments and environment variables you’d like to set for your model based on previous testing.

Runtime Configuration Options

Environment Variables

Variable                            Description
AIOLI_LOGGER_PORT                   The port that the logger service listens on; default is 49160.
AIOLI_PROGRESS_DEADLINE             The deadline for downloading the model; default is 1500s.
AIOLI_READINESS_FAILURE_THRESHOLD   The number of readiness probe failures before the deployment is considered unhealthy; default is 100.
AIOLI_COMMAND_OVERRIDE              A custom deployment command that overrides the default deployment command within a predefined runtime.
AIOLI_SERVICE_PORT                  The inference service container port used for communication; default is 8080, except for NIMs, where it is 8000.
AIOLI_DISABLE_MODEL_CACHE           Disables automatic model caching for a deployment, even if it is enabled for the model; default is false.
AIOLI_DISABLE_LOGGER                A workaround for the KServe defect concerning streamed responses.
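
As a rough illustration of how the documented defaults and a deployment's own values relate, the following Python sketch resolves an effective set of environment variables. It is not MLIS code: the DEFAULTS dictionary, the effective_settings function, and the is_nim flag are hypothetical names used only for this example, and the default values are copied from the table above.

```python
# Defaults copied from the environment variable table; the dictionary
# layout itself is illustrative and not part of MLIS.
DEFAULTS = {
    "AIOLI_LOGGER_PORT": "49160",
    "AIOLI_PROGRESS_DEADLINE": "1500s",
    "AIOLI_READINESS_FAILURE_THRESHOLD": "100",
    "AIOLI_SERVICE_PORT": "8080",
    "AIOLI_DISABLE_MODEL_CACHE": "false",
}

# The table lists 8000 as the service port default for NIMs.
NIM_SERVICE_PORT = "8000"


def effective_settings(overrides, is_nim=False):
    """Return the defaults with any deployment-level values applied on top."""
    settings = dict(DEFAULTS)
    if is_nim:
        settings["AIOLI_SERVICE_PORT"] = NIM_SERVICE_PORT
    settings.update(overrides)  # explicitly set values win over defaults
    return settings


if __name__ == "__main__":
    # Hypothetical deployment that only extends the download deadline.
    for name, value in sorted(effective_settings({"AIOLI_PROGRESS_DEADLINE": "3000s"}).items()):
        print(f"{name}={value}")
```

In this sketch, any variable you do not set keeps its documented default, which is the behavior the "default is" wording in the table suggests.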

Command Override Arguments

MLIS executes a default command for your container runtime based on the type of packaged model you have selected. You can replace this command by setting the AIOLI_COMMAND_OVERRIDE environment variable. Any arguments from the packaged model are then appended to the end of the override command, followed by any arguments from the deployment. The resulting command has the form [CLI_COMMAND] [MODEL_ARGS] [DEPLOYMENT_ARGS], where [CLI_COMMAND] is the value of AIOLI_COMMAND_OVERRIDE.
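
The ordering described above can be sketched in a few lines of Python. This is only a model of the documented order (override command first, then packaged model arguments, then deployment arguments). The compose_command function and the sample argument lists are hypothetical, and using shlex to split the command is an assumption, since how MLIS tokenizes the command is not described here.

```python
import shlex


def compose_command(cli_command, model_args, deployment_args):
    """Build the argument list: override command, model args, deployment args."""
    return shlex.split(cli_command) + list(model_args) + list(deployment_args)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the ordering.
    argv = compose_command(
        "bentoml serve --port 8080",   # value of AIOLI_COMMAND_OVERRIDE
        ["--host", "0.0.0.0"],         # arguments set on the packaged model
        ["--production"],              # arguments set on the deployment
    )
    print(shlex.join(argv))
```

Because deployment arguments come last in this ordering, they are the final tokens on the command line. Whether a later flag overrides an earlier one depends on the framework's own CLI parser.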

Default Commands

The following table shows the default command for each packaged model’s framework type:

Framework: OpenLLM
Command: openllm start --port {{.containerPort}} {{.modelDir}}
Description: You can add any options from OpenLLM version 0.4.44 to your command (see openllm start -h).

Framework: Bento Archive
Command: bentoml serve ...
Description: You can add any options from BentoML version 1.1.11 to your command (see bentoml serve -h).

Framework: Custom
Command: none
Description: For custom models, the default entrypoint for the container is executed.

Framework: NVIDIA NIM
Command: none
Description: For NIM models, the default entrypoint for the container is executed. You must use environment variables; NIM containers do not honor CLI arguments.

Framework: vLLM
Command: --served-model-name {{.modelName}} --model {{.modelName}} --port {{.containerPort}} --download-dir {{.modelDir}}
Description: Arguments vary for S3/PVC/PFS URLs.

Template Variables

MLIS provides template variables for customizing the runtime command:

Named Argument      Description
{{.numGpus}}        The number of GPUs the model is requesting.
{{.modelName}}      The MLIS model name being deployed.
{{.modelDir}}       The directory into which the model will be downloaded. This is typically /mnt/models. This applies to NIM, OpenLLM, and S3 models.
{{.containerPort}}  The HTTP port that the container must listen on for inference requests and readiness checks.
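
To preview what a command looks like after the template variables are filled in, a small stand-in renderer can help during testing. The Python sketch below is an approximation, not the MLIS implementation: the render function, the PLACEHOLDER pattern, and the sample values (including the model name my-model) are hypothetical, and it handles only the simple {{.name}} form shown in the table.

```python
import re

# Matches simple placeholders such as {{.containerPort}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{.name}} placeholder with its value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown template variable: {name}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Sample values for illustration only; real values are supplied by MLIS.
    values = {
        "numGpus": 1,
        "modelName": "my-model",
        "modelDir": "/mnt/models",
        "containerPort": 8080,
    }
    print(render("openllm start --port {{.containerPort}} {{.modelDir}}", values))
    # prints: openllm start --port 8080 /mnt/models
```

Raising an error on an unknown name is a design choice for this sketch: it surfaces typos such as {{.modeldir}} before you deploy.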

Examples

AIOLI_COMMAND_OVERRIDE="openllm start {{.modelName}} --port {{.containerPort}} --gpu-memory-utilization 0.9 --max-total-tokens 4096"
AIOLI_COMMAND_OVERRIDE="bentoml serve {{.modelDir}}/bentofile.yaml --production --port {{.containerPort}} --host 0.0.0.0"