Advanced Configuration Options
While adding or editing a deployment, you can customize the default runtime configuration MLIS uses to start your model server. This is useful for models that require modifications to MLIS’s default settings or when using a custom image that needs specific configuration parameters.
Before You Start #
- You should review your model’s framework documentation (e.g., OpenLLM, BentoML, NIM) and its CLI options.
- You should already know the arguments and environment variables you’d like to set for your model based on previous testing.
Runtime Configuration Options #
Environment Variables #
| Variable | Description |
|---|---|
| `AIOLI_LOGGER_PORT` | The port that the logger service listens on; default is `49160`. |
| `AIOLI_PROGRESS_DEADLINE` | The deadline for downloading the model; default is `1500s`. |
| `AIOLI_READINESS_FAILURE_THRESHOLD` | The number of readiness probe failures before the deployment is considered unhealthy; default is `100`. |
| `AIOLI_COMMAND_OVERRIDE` | A customized deployment command that overrides the default deployment command within a predefined runtime. |
| `AIOLI_SERVICE_PORT` | The inference service container port used for communication; default is `8080`, except for NIMs, which use `8000`. |
| `AIOLI_DISABLE_MODEL_CACHE` | Disables automatic model caching for a deployment, even if caching is enabled for the model; default is `false`. |
| `AIOLI_DISABLE_LOGGER` | Disables the logger service as a workaround for a KServe defect concerning streamed responses. |
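For example, a NIM deployment with a large model might need a longer download window and its own service port. A minimal sketch, assuming variables are supplied as `KEY=VALUE` pairs in the deployment's configuration (values illustrative):

```shell
# Illustrative values only; variable names are taken from the table above.
AIOLI_SERVICE_PORT=8000         # NIM containers use port 8000 by default
AIOLI_PROGRESS_DEADLINE=3600s   # allow up to an hour for the model download
AIOLI_DISABLE_MODEL_CACHE=true  # bypass the model cache for this deployment
```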
Command Override Arguments #
MLIS executes a default command for your container runtime based on the type of packaged model you have selected. You can modify this command by setting the `AIOLI_COMMAND_OVERRIDE` environment variable. Any arguments from the packaged model are appended to the end of this command, followed by any arguments from the deployment (`AIOLI_COMMAND_OVERRIDE = [CLI_COMMAND] [MODEL_ARGS] [DEPLOYMENT_ARGS]`).
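For example, suppose the override below is set on an OpenLLM deployment, the packaged model supplies `--gpu-memory-utilization 0.9`, and the deployment supplies `--max-total-tokens 4096`. Assuming the typical values of `/mnt/models` for `{{.modelDir}}` and `8080` for `{{.containerPort}}`, MLIS would assemble the following command (values illustrative):

```shell
# AIOLI_COMMAND_OVERRIDE -> openllm start {{.modelDir}} --port {{.containerPort}}
# model arguments        -> --gpu-memory-utilization 0.9
# deployment arguments   -> --max-total-tokens 4096
# Resulting command executed in the container:
openllm start /mnt/models --port 8080 --gpu-memory-utilization 0.9 --max-total-tokens 4096
```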
Default Commands #
The following table shows the default command for each packaged model’s framework type:
| FRAMEWORK | COMMAND | DESCRIPTION |
|---|---|---|
| OpenLLM | `openllm start --port {{.containerPort}} {{.modelDir}}` | You can add any options from OpenLLM version 0.4.44 to your command (see `openllm start -h`). |
| Bento Archive | `bentoml serve ...` | You can add any options from BentoML version 1.1.11 to your command (see `bentoml serve -h`). |
| Custom | none | For custom models, the container's default entrypoint is executed. |
| NVIDIA NIM | none | For NIM models, the container's default entrypoint is executed. You must use environment variables; NIM containers do not honor CLI arguments. |
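Because NIM containers do not honor CLI arguments, any NIM-specific tuning must be expressed as environment variables on the deployment. A minimal sketch; the variable names shown are examples and should be checked against your NIM image's documentation:

```shell
# Hypothetical NIM configuration via environment variables.
NGC_API_KEY=<your-ngc-api-key>  # credential for pulling model assets from NGC
AIOLI_SERVICE_PORT=8000         # MLIS default service port for NIM containers
```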
Template Variables #
MLIS provides template variables for customizing the runtime command:
| Named Argument | Description |
|---|---|
| `{{.numGpus}}` | The number of GPUs the model is requesting. |
| `{{.modelName}}` | The MLIS model name being deployed. |
| `{{.modelDir}}` | The directory into which the model is downloaded, typically `/mnt/models`. This applies to NIM, OpenLLM, and S3 models. |
| `{{.containerPort}}` | The HTTP port on which the container must listen for inference requests and readiness checks. |
Examples #
An OpenLLM override that adds serving options to the start command:

```shell
AIOLI_COMMAND_OVERRIDE="openllm start {{.modelName}} --port {{.containerPort}} --gpu-memory-utilization 0.9 --max-total-tokens 4096"
```

A BentoML override that serves in production mode on all interfaces:

```shell
AIOLI_COMMAND_OVERRIDE="bentoml serve {{.modelDir}}/bentofile.yaml --production --port {{.containerPort}} --host 0.0.0.0"
```