Snapshots

This article explains how to create and manage snapshots of your HPE Machine Learning Data Management cluster. Create a snapshot before you upgrade a cluster so that you can recover the cluster state if needed.

Understanding Snapshots

An HPE Machine Learning Data Management snapshot is a complete backup of your cluster state, consisting of:

  • Postgres Snapshot: A pg_dump of the database containing cluster state information
  • ChunkSet: A collection of live data chunks from object storage at the time of backup

You can use a snapshot to restore your cluster to a previous state during a Helm upgrade. For example, to restore from the snapshot with ID 42:

helm upgrade pachd pachyderm/pachyderm \
  --set restoreSnapshot.enabled=true \
  --set restoreSnapshot.snapshot_id=42 \
  --reuse-values

How to Manage Snapshots

Create a Snapshot

  1. Run pachctl create snapshot.
  2. Obtain the ID of the snapshot by running pachctl list snapshot.
    ID CHUNKSET CREATED 
    2  2        3 seconds ago
    1  1        58 seconds ago
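
The two steps above can be combined in a script. The following is a minimal sketch, not part of the product: latest_snapshot_id is a hypothetical helper, and it assumes the listing has a header row followed by rows whose first column is a numeric ID, as in the sample output above, and that the newest snapshot has the highest ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a pachctl feature): reads text shaped like the
# `pachctl list snapshot` output on stdin and prints the highest snapshot ID.
# Assumption: row 1 is the header and the first column of each row is the ID.
latest_snapshot_id() {
  awk '
    NR > 1 && $1 ~ /^[0-9]+$/ {
      # Track the largest numeric ID seen so far.
      if (!found || $1 + 0 > max) max = $1 + 0
      found = 1
    }
    END {
      # Fail with a non-zero status when no snapshot rows were found.
      if (!found) exit 1
      print max
    }
  '
}

# Usage against a live cluster (requires a configured pachctl):
#   pachctl create snapshot
#   snapshot_id="$(pachctl list snapshot | latest_snapshot_id)"
```

The helper only parses text, so you can check it against saved output before relying on it in an upgrade script.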

List Available Snapshots

If you create snapshots routinely, you can list all available snapshots by running pachctl list snapshot.

pachctl list snapshot

Inspect a Snapshot

You can inspect snapshots to verify the contents by running pachctl inspect snapshot <SNAPSHOT_ID>.

pachctl inspect snapshot <SNAPSHOT_ID>
ID: 1
Chunkset: 1
Created: 2 minutes ago
Version: v2.12.0
Fileset: fa492645343dff8e0ed7f6685c8b6228.d724eb2b1bf365148b8f4ffa746a50cfc2446f7ca5fa713f4ef8a757b0a792fe
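
If you want to check a single field in a script, for example the version recorded in the snapshot, the output can be parsed. This is a sketch under the assumption that the output keeps the "Name: value" layout shown above; snapshot_field is a hypothetical helper, not a pachctl command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a pachctl feature): reads "Name: value" lines on
# stdin and prints the value of the field named by the first argument.
# Assumption: the inspect output uses one "Name: value" pair per line.
snapshot_field() {
  awk -v key="$1" -F': ' '
    $1 == key {
      # Print everything after "Name: " so values containing ": " stay whole.
      print substr($0, length($1) + 3)
      found = 1
      exit
    }
    END {
      # Fail with a non-zero status when the field is absent.
      if (!found) exit 1
    }
  '
}

# Usage against a live cluster (requires a configured pachctl):
#   pachctl inspect snapshot 1 | snapshot_field Version
```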

Delete a Snapshot

Typically, you should delete snapshots after successfully upgrading your cluster.

pachctl delete snapshot <SNAPSHOT_ID>
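
To clean up several snapshots after an upgrade, you can generate one delete command per listed snapshot and review the commands before running them. The sketch below only prints commands and does not delete anything; delete_commands is a hypothetical helper, and it assumes the listing layout shown in the Create a Snapshot section (a header row, then rows that start with a numeric ID).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a pachctl feature): reads text shaped like the
# `pachctl list snapshot` output on stdin and prints, without executing it,
# one `pachctl delete snapshot <ID>` command per snapshot row.
# Assumption: row 1 is the header and the first column of each row is the ID.
delete_commands() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print "pachctl delete snapshot " $1 }'
}

# Usage against a live cluster (requires a configured pachctl):
#   pachctl list snapshot | delete_commands
```

Printing the commands instead of running them keeps the deletion step a deliberate action.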