Pre-loading Models

When deploying an inference service, reaching the Ready state can take a long time if the service must download and load a large AI model during startup. Pre-loading models in MLIS significantly reduces inference service startup times, especially for large models.

This section covers two approaches to pre-loading models: Model Caching and Manual Pre-loading.

| Feature | Model Caching | Manual Pre-loading |
| --- | --- | --- |
| Description | Automatic cache-on-first-use when enabled by an Administrator | Manual creation of a PersistentVolumeClaim (PVC) for each model version and referencing the PVC URL |
| Scope | Multiple deployments & namespaces | Individual deployments |
| Model Types | Bento-archive, OpenLLM, and NIM (custom models not supported); ideal for models with existing URLs (pfs://, openllm://, s3://) | Any; ideal for OpenLLM or bento-archive models |
| Storage | Shared PVC across all deployments | PVC can be shared or dedicated to a specific model |
| User Experience | Transparent; unused models are removed automatically | Must create and manage a PVC; must use the PVC syntax & URL to add the packaged model; the PVC must be in the same namespace as the model being deployed; unused models must be removed manually |
| Flexibility | Allows defining caching behavior | Allows mapping of arbitrary storage |

Both methods ensure models are readily available at service startup, improving MLIS deployment responsiveness and efficiency. Choose the method that best fits your specific use case and requirements.
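
For manual pre-loading, the PVC itself is a standard Kubernetes object. The following is a minimal sketch of such a claim; the name, namespace, size, and storage class are illustrative assumptions that must be adapted to your cluster.

```yaml
# Hypothetical PVC used to hold a pre-loaded model (all values are placeholders).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama2-7b-model-pvc          # referenced later when adding the packaged model
  namespace: my-inference-namespace  # must be the same namespace as the deployed model
spec:
  accessModes:
    - ReadWriteMany                  # lets several inference replicas mount the same model
  resources:
    requests:
      storage: 50Gi                  # size the claim for the model artifacts
  storageClassName: standard         # replace with a storage class available in your cluster
```

Once the claim holds the model files, the packaged model references it with a PVC URL, for example something of the form pvc://llama2-7b-model-pvc/models/llama2-7b; this form is illustrative, so consult the packaged-model reference for the exact URL syntax MLIS expects.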

Additional PVC Use Cases

Note that there are other uses for PVCs, such as:

  • Cache-on-first-use PVC: Using a PVC that automatically caches the model on first use when referenced by its PVC URL; this is only usable with Custom and NIM models, where the URL is not already in use.
  • Arbitrary PVC mounts: Mounting a PVC to add arbitrary storage to your inference service container; the PVC can contain a model or any other data (see the sketch below for one way to populate such a PVC).

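Whichever PVC-based approach you use, the claim must contain the model files before the inference service starts. One common way to populate it is a short-lived Pod that mounts the claim and downloads the model. The sketch below assumes the PVC from the earlier example; the image, download command, model name, and paths are placeholders rather than MLIS requirements.

```yaml
# Illustrative one-off Pod that fills the PVC with model files.
apiVersion: v1
kind: Pod
metadata:
  name: model-loader
  namespace: my-inference-namespace    # same namespace as the PVC and the deployment
spec:
  restartPolicy: Never
  containers:
    - name: loader
      image: python:3.11-slim
      command: ["/bin/sh", "-c"]
      args:
        - |
          pip install --quiet huggingface_hub
          python -c "from huggingface_hub import snapshot_download; snapshot_download('org/model-name', local_dir='/models/my-model')"
      volumeMounts:
        - name: model-store
          mountPath: /models           # model files land on the PVC under /models
  volumes:
    - name: model-store
      persistentVolumeClaim:
        claimName: llama2-7b-model-pvc # the claim created earlier
```

After the Pod completes it can be deleted; the files remain on the claim and are available to any deployment that mounts or references it.
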
Options