Model Caching (PV/PVC)
Model caching is a method for pre-loading models in MLIS to reduce inference service startup times and improve performance. This approach uses a ReadWriteMany (RWX) PersistentVolumeClaim (PVC) that can be accessed across multiple namespaces where you deploy your inference services.
During installation, when model caching is enabled, the MLIS controller automatically sets up and manages this PVC. When you deploy an inference service, the system creates copies of the underlying PersistentVolume (PV) and PVC in the target namespace, ensuring efficient access to cached models.
Benefits
- Improved startup time performance
- Automatic management by the MLIS controller
- Efficient access to models across multiple namespaces
Key features
- Utilizes a ReadWriteMany (RWX) PersistentVolumeClaim (PVC)
- Creates efficient PV and PVC copies in target namespaces upon deployment
- Ensures frequently used models are readily available
Model caching is particularly useful for optimizing resource usage and enhancing overall system efficiency in large-scale deployments.
Before You Start #
- You must be logged into MLIS as a user with an Admin role to enable model caching.
- You should be familiar with PersistentVolumeClaims, PersistentVolumes, and StorageClasses.
- You must have a ReadWriteMany StorageClass configured on your Kubernetes cluster (e.g., nfs-client if using nfs-subdir-external-provisioner’s default configuration; otherwise, the name is arbitrary and will vary).
Model Caching Options #
The full set of options for configuring model caching is provided in the default values.yaml and can be reviewed in the following section.
Check Available StorageClasses #
Verify that your Kubernetes cluster has a StorageClass that supports ReadWriteMany (RWX) access mode.
- Run the following command to list available StorageClasses:
kubectl get storageclasses.storage.k8s.io
NAME                     PROVISIONER                                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
nfs-client               cluster.local/nfs-subdir-external-provisioner   Delete          Immediate              true                   58m
premium-rwo              pd.csi.storage.gke.io                           Delete          WaitForFirstConsumer   true                   27h
standard                 kubernetes.io/gce-pd                            Delete          Immediate              true                   27h
standard-rwo (default)   pd.csi.storage.gke.io                           Delete          WaitForFirstConsumer   true
- Copy the StorageClass name you want to use when enabling model caching.
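The StorageClass listing does not show which access modes a class supports, so one way to confirm RWX support is to request a small ReadWriteMany PVC and see whether it binds. The following is a minimal sketch under that assumption; the helper name rwx_test_pvc and the PVC name rwx-check are hypothetical and not part of MLIS.

```shell
#!/bin/sh
# Hypothetical helper (not part of MLIS): print a small test PVC manifest
# that requests ReadWriteMany from the given StorageClass.
rwx_test_pvc() {
  storage_class="$1"   # StorageClass name copied from the listing
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-check
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: ${storage_class}
EOF
}

# Print the manifest for review.
rwx_test_pvc nfs-client

# Then, against your cluster:
#   rwx_test_pvc nfs-client | kubectl apply -f -
#   kubectl get pvc rwx-check     # STATUS should become Bound
#   kubectl delete pvc rwx-check
```

For a StorageClass whose VOLUMEBINDINGMODE is WaitForFirstConsumer, the test PVC stays Pending until a pod mounts it, so a Pending status alone does not indicate a failure for those classes.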
Manually Provisioned PV for RWX Access (Advanced) #
If you don’t have an appropriate ReadWriteMany StorageClass configured on your Kubernetes cluster, you can provide a manually configured PV that matches all of the following:
- Access mode: ReadWriteMany (RWX)
- Size: modelsCacheStorage.storageSize’s value in your Helm values
- Storage class: modelsCacheStorage.storageClassName’s value in your Helm values
- NFS: Your cluster’s NFS server IP address and the path to the NFS server’s filestore directory
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: <STORAGE_SIZE>
  accessModes:
    - ReadWriteMany
  nfs:
    path: <NFS_SERVER_PATH>
    server: <NFS_SERVER_IP>
  persistentVolumeReclaimPolicy: Retain
  storageClassName: <STORAGE_CLASS_NAME>
Enable Model Caching #
If you have a ReadWriteMany StorageClass that maps to NFS or a cluster filesystem using a hostPath volume, you can specify the StorageClass name in the Helm values during the install.
Example
modelsCacheStorage:
  enabled: true
  storageSize: 300Gi
  storageClassName: <NFS_STORAGE_CLASS_NAME> # nfs-client if using nfs-subdir-external-provisioner's default configuration.
When MLIS is installed with model caching enabled, a PVC is created with a name following the format aioli-models-cache-pvc-<release-name>. This PVC is bound to an appropriate PV based on the storageClassName you specified.
kubectl get pvc -A | grep -E "^NAMESPACE|aioli-models-cache-pvc-aioli"
NAMESPACE NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
default aioli-models-cache-pvc-aioli Bound pvc-faf71fe8-a2d3-4fb4-9ce2-506c76af08df 300Gi RWX <STORAGE_CLASS_NAME> <unset> 163m
NFS Storage Considerations #
Many Kubernetes clusters use NFS-based storage solutions, such as the nfs-subdir-external-provisioner, which provides automatic provisioning of PVCs to an NFS endpoint through the nfs-client StorageClass. This setup offers several advantages for model caching:
- Direct filesystem access: On Kubernetes systems with nfs-client storage, you can directly access the filesystem, making cleanup and management easier.
- Automatic provisioning: PVCs are automatically provisioned to an NFS endpoint, simplifying storage management.
- Flexible naming: Persistent volumes are typically provisioned with names following a defined format.
However, it’s important to be aware of some limitations when using NFS storage for model caching:
- Storage guarantees: The provisioned storage is not guaranteed. You may allocate more than the NFS share’s total size, and the share may not have enough storage space to accommodate all requests.
- Storage limits: The provisioned storage limit is not enforced. Applications can expand to use all available storage regardless of the provisioned size.
- Resizing limitations: Storage resize or expansion operations are not currently supported. Attempting to resize may result in an error state.
Before enabling model caching with NFS storage, ensure that:
- Your Kubernetes nodes can communicate with the NFS server at the host level.
- The NFS server is properly configured before using the nfs-client StorageClass and provisioner.
- You monitor storage usage to prevent overallocation or running out of space.
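Because the provisioned size is not enforced on NFS, monitoring has to look at the actual usage of the share. The following is a minimal sketch of a threshold check built on POSIX df -P output; the helper names and the CACHE_PATH variable are hypothetical, and the 80% threshold is only an example value.

```shell
#!/bin/sh
# Hypothetical helper: print the used-space percentage of the filesystem
# that holds the given path, based on POSIX `df -P` output.
usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return success (0) while usage is below the given threshold percentage.
below_threshold() {
  [ "$(usage_percent "$1")" -lt "$2" ]
}

# Example: warn when the filesystem is 80% full or more.
# CACHE_PATH is an assumed variable; point it at wherever the cache
# volume is mounted (for example /mnt/models inside a pod that mounts it).
cache_path="${CACHE_PATH:-/}"
if below_threshold "$cache_path" 80; then
  echo "ok: $cache_path is below 80% used"
else
  echo "warning: $cache_path is at or above 80% used"
fi
```

A check like this could run on the NFS server itself or inside any pod that mounts the cache PVC.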
Configure Caching Behavior #
Adjust the caching behavior using the following parameters:
- checkUnusedCachedModelsEvery: How frequently the controller checks for and deletes unused cached models.
- purgeUnusedCachedModelsAfter: How long the controller retains unused cached models.
The time unit can be any of the following: s (seconds), m (minutes), h (hours), d (days), or w (weeks).
modelsCacheStorage:
  enabled: true
  checkUnusedCachedModelsEvery: 1d
  purgeUnusedCachedModelsAfter: 1w
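To compare the two settings it can help to express them in the same unit. The following is a minimal sketch that converts a duration value to seconds. It assumes a single whole number followed by one of the units listed above; the helper name duration_to_seconds is hypothetical, and the parser MLIS actually uses may accept other forms.

```shell
#!/bin/sh
# Hypothetical helper: convert a duration such as 1d or 2w to seconds.
# Assumes one whole number followed by a single unit (s, m, h, d, w).
duration_to_seconds() {
  value="${1%?}"          # everything except the last character
  unit="${1#"$value"}"    # the last character
  case "$value" in
    ''|*[!0-9]*) echo "invalid duration: $1" >&2; return 1 ;;
  esac
  case "$unit" in
    s) echo "$value" ;;
    m) echo $((value * 60)) ;;
    h) echo $((value * 3600)) ;;
    d) echo $((value * 86400)) ;;
    w) echo $((value * 604800)) ;;
    *) echo "invalid unit in: $1" >&2; return 1 ;;
  esac
}

duration_to_seconds 1d   # check interval from the example: prints 86400
duration_to_seconds 1w   # retention period from the example: prints 604800
```

With the example values, the check interval is shorter than the retention period, so a model that has gone unused for a week would presumably be removed at the next daily check.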
Enable Caching for Specific Models #
Caching must be explicitly enabled for each model. Users with an Admin role can use the CLI or UI to enable caching for individual models. Once enabled, new versions of the model continue to have caching enabled, even if the new version is created by a user with the Maintainer role.
aioli model update <model-name> --enable-caching/--disable-caching
Disable Caching for Specific Deployments #
Model caching can be temporarily disabled for a particular deployment by setting the environment variable AIOLI_DISABLE_MODEL_CACHE=true in the deployment. This may be useful if the disk used for the shared network storage becomes full, causing deployments to fail. Disabling model caching for a particular deployment reverts to using the local disk on the pod, at the expense of having to download the model for each deployment replica.
Manage Cached Models #
- Models are automatically removed when deleted from the database
- Unused models are purged based on the configured time period
- The MLIS controller manages the cache content
Switching StorageClasses #
If you need to switch to a different StorageClass (e.g., to increase storage size or for performance reasons), you can use the pvcNameSuffix option to create a new PVC. Models will be re-cached on the new PVC.
- Upgrade MLIS with the new StorageClass and a unique PVC name suffix by adding the following to your upgrade command:
# my-values.yaml should also include all of the original values from your Helm install.
helm upgrade aioli aioli-1.3.0.tgz \
  -f my-values.yaml \
  --set modelsCacheStorage.enabled=true \
  --set modelsCacheStorage.pvcNameSuffix="<UNIQUE_SUFFIX>" \
  --set modelsCacheStorage.storageClassName="<NFS_STORAGE_CLASS_NAME>" \
  --set modelsCacheStorage.storageSize=1000Gi
- After the upgrade, perform a canary rollout of the model for a seamless transition. This rollout must include at least one update to your deployment or model settings to trigger the canary rollout.
aioli deployment update <DEPLOYMENT_NAME> \
  --model <PACKAGED_MODEL_NAME> \
  --canary-percentage 100
At this point, you can optionally perform the cleanup process to remove any leftover cached models from the old PV/PVC.
Disable PV/PVC Caching #
- To disable model caching, perform a Helm upgrade with modelsCacheStorage.enabled set to false.
# my-values.yaml should also include all of the original values from your Helm install.
helm upgrade aioli aioli-1.3.0.tgz \
  -f my-values.yaml \
  --set global.imagePullSecrets[0].name=regcred \
  --set modelsCacheStorage.enabled=false
- Models continue to use space on the PV/PVC until a canary rollout is performed. This rollout must include at least one update to your deployment or model settings to trigger the canary rollout.
aioli deployment update <DEPLOYMENT_NAME> \
  --model <PACKAGED_MODEL_NAME> \
  --canary-percentage 100
At this point, you can optionally perform the cleanup process to remove the cached models from the PV/PVC.
Clean Up Cached Models #
After extending the PV/PVC or before uninstalling MLIS, it’s important to clean up old models to free up storage space on the PV. This process should be done before deleting the old PVC, as PVC deletion doesn’t automatically clean up the storage used on the PV.
If you don’t have direct access to the PV server, you can create a temporary pod to access and clean up the models:
- Create a cleanup pod:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-cleanup
  namespace: default
spec:
  containers:
    - image: busybox
      name: pvc-inspector
      command: ["tail"]
      args: ["-f", "/dev/null"]
      volumeMounts:
        - mountPath: /mnt/models
          name: pvc-mount
  volumes:
    - name: pvc-mount
      persistentVolumeClaim:
        claimName: aioli-models-cache-pvc-aioli
EOF
- Access the pod and remove the models:
kubectl exec -it pvc-cleanup -- sh
# The following commands run inside the pod's shell.
cd /mnt/models/
ls -l # List contents to identify model directories
rm -rf <model-directory-to-remove>/
- Verify the space has been freed:
df -h /mnt/models
- Remove the cleanup pod when finished:
kubectl delete pod pvc-cleanup
- Delete the old PVC:
kubectl delete pvc aioli-models-cache-pvc-aioli
Troubleshooting #
- If the PVC remains in Pending state, check StorageClass availability.
- Ensure the PV supports cloning across namespaces (NFS or hostPath).
- For custom storage types, use modelsCacheStorage.bypassStorageCheck: true.
StorageClass Compatibility #
If you specify a storageClassName that results in a persistent volume type other than NFS or hostPath, it will be ignored and an ERROR will be logged in the MLIS controller log. If you are certain that your CSI driver or other storage type results in a PV that can be cloned between namespaces, you can bypass the type check by setting modelsCacheStorage.bypassStorageCheck: true in your Helm values.