First-Time Setup

HPE Machine Learning Data Management can be deployed in Kubernetes using a wide variety of container orchestrators, but for a first-time setup we recommend Docker Desktop. This installation method is fast and provides everything you need to start the Beginner Tutorial.

For production deployments, we recommend following the dedicated deployment guides.

Before You Start

Hardened Security and Dependency Considerations

If you are deploying in a hardened security environment, such as within the DoD community or other regulated sectors, consider downloading and installing HPE Machine Learning Data Management from Iron Bank, a hardened container registry.

MLDM images can be pulled from Iron Bank by setting the global registry in the MLDM Helm chart values.yaml to registry1.dso.mil/, for example:

global:
  ...
  image:
    registry: registry1.dso.mil/
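
If you would rather not edit the chart's values.yaml directly, one option is a small override file passed to Helm with -f, which Helm merges over the chart defaults. The sketch below is only an illustration: the function name and the file name ironbank-values.yaml are hypothetical, and the file contains nothing beyond the registry setting shown above.

```shell
# Write a minimal Helm values override that points image pulls at Iron Bank.
# The function and file names are hypothetical; only the registry key
# and value come from this guide.
write_ironbank_values() {
  cat > "$1" <<'EOF'
global:
  image:
    registry: registry1.dso.mil/
EOF
}

# Example use (run from a directory you can write to):
#   write_ironbank_values ironbank-values.yaml
#   helm install pachyderm pachyderm/pachyderm -f ironbank-values.yaml
```

Keeping the override in its own file leaves the chart defaults untouched and makes the change easy to review.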

Additionally, note that the MLDM Helm chart relies on the Bitnami PostgreSQL image and its associated sub-chart. If the Bitnami image is unavailable, or if your available PostgreSQL image cannot be managed through the Bitnami sub-chart, you will need to install PostgreSQL separately. Refer to Global Helm Chart Values for details on specifying your separate PostgreSQL instance, and to Non-Bundled Database Setup for more detail on using your own PostgreSQL instance with MLDM.

If you have questions, please reach out to your Customer Support Engineer for assistance before proceeding.

On macOS, you must have Homebrew installed:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

On Windows, you must have Windows Subsystem for Linux (WSL) 2 enabled (wsl --install) and a Linux distribution installed. If Linux does not boot in your WSL terminal after you download it from the Microsoft Store, see the manual installation guide.

Manual Step Summary:

Open a PowerShell terminal.
Run each of the following:

dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart

dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart

Run each of the following:

wsl --update

wsl --set-default-version 2

wsl --install -d Ubuntu

Restart your machine.
Start a WSL terminal and set up your first Ubuntu user.
Update Ubuntu.

sudo apt update
sudo apt upgrade -y

Install Homebrew in Ubuntu so you can complete the rest of this guide:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

All installation steps after Step 1 (Install Docker Desktop) must be run in the WSL (Ubuntu) terminal, not in PowerShell.

You are now ready to continue to Step 1.
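
Because the later steps on Windows must run inside WSL, a quick check of the current shell can prevent mistakes. The sketch below relies on the common convention that WSL kernels mention "microsoft" in /proc/version; the function name is_wsl is hypothetical and not part of any HPE tooling.

```shell
# Succeed if the given kernel version file mentions Microsoft, which is how
# WSL kernels usually identify themselves. Defaults to /proc/version.
is_wsl() {
  grep -qi 'microsoft' "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
  echo "Running inside WSL: continue with the remaining steps here."
else
  echo "Not a WSL shell: on Windows, switch to the Ubuntu (WSL) terminal."
fi
```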

Kubernetes & OpenShift Version Support

Kubernetes: HPE Machine Learning Data Management supports the three most recent minor release versions of Kubernetes. If your Kubernetes version is not among these, it is End of Life (EOL) and unsupported. This policy ensures that HPE Machine Learning Data Management users have access to the latest Kubernetes features and bug fixes.
OpenShift: HPE Machine Learning Data Management is compatible with OpenShift versions within the “Full Support” window.
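
The "three most recent minor releases" rule can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, the version numbers in the demo are illustrative only, and the latest Kubernetes minor release is an input you need to look up yourself.

```shell
# Succeed if CURRENT (e.g. "1.28" or "v1.28.4") is one of the three most
# recent minor releases, given LATEST (e.g. "1.30").
# LATEST must be supplied by the caller; the script cannot discover it.
is_supported_minor() (
  current=${1#v}; latest=${2#v}
  current_major=${current%%.*}; current_rest=${current#*.}
  latest_major=${latest%%.*};   latest_rest=${latest#*.}
  current_minor=${current_rest%%.*}
  latest_minor=${latest_rest%%.*}
  [ "$current_major" -eq "$latest_major" ] || exit 1
  gap=$((latest_minor - current_minor))
  [ "$gap" -ge 0 ] && [ "$gap" -le 2 ]
)

# Illustrative version numbers only:
is_supported_minor 1.28 1.30 && echo "1.28 is within the window when 1.30 is the latest"
is_supported_minor 1.27 1.30 || echo "1.27 is outside the window when 1.30 is the latest"
```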

1. Install Docker Desktop

Install Docker Desktop for your machine.
Navigate to Settings for Mac, Windows, or Linux.
- Adjust your resources (~4 CPUs and ~12 GB of memory)
- Enable Kubernetes
- On Windows, enable Docker Desktop integration in Ubuntu if Ubuntu is not your default Linux distro.
Select Apply & Restart.
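
After Docker Desktop restarts, you may want to confirm that the new resource limits took effect. The sketch below compares a CPU count and a memory size in bytes against minimums; the function name and the 11 GiB default are assumptions (the default sits slightly below 12 GB on the assumption that the engine reports a little less than the configured value).

```shell
# Compare a CPU count and a memory size (in bytes) against minimums.
# Defaults: 4 CPUs and 11 GiB. Both defaults are assumptions derived from
# the approximate figures in this guide; pass your own as arguments 3 and 4.
meets_minimum() (
  cpus=$1; mem_bytes=$2
  min_cpus=${3:-4}; min_mem_gib=${4:-11}
  min_bytes=$((min_mem_gib * 1024 * 1024 * 1024))
  [ "$cpus" -ge "$min_cpus" ] && [ "$mem_bytes" -ge "$min_bytes" ]
)

# Typical use, once Docker Desktop is running:
#   set -- $(docker info --format '{{.NCPU}} {{.MemTotal}}')
#   meets_minimum "$1" "$2" || echo "Consider raising CPU or memory in Docker Desktop settings."
```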

2. Install Pachctl CLI

brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@2.12

Debian (AMD)

curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v2.12.0/pachctl_2.12.0_amd64.deb && sudo dpkg -i /tmp/pachctl.deb

Debian (ARM)

curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v2.12.0/pachctl_2.12.0_arm64.deb && sudo dpkg -i /tmp/pachctl.deb

Other Linux (AMD)

curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.12.0/pachctl_2.12.0_linux_amd64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin

Other Linux (ARM)

curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.12.0/pachctl_2.12.0_linux_arm64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin
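
If you are unsure which download applies to your machine, the choice can be derived from the output of uname -m. The following is a minimal sketch: the function names are hypothetical, and the URLs are simply the v2.12.0 release URLs listed above.

```shell
# Map `uname -m` output to the architecture suffix used in the release file names.
pachctl_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the download URL for a package kind ("deb" or "tar") and machine type.
pachctl_url() {
  arch=$(pachctl_arch "$2") || return 1
  base="https://github.com/pachyderm/pachyderm/releases/download/v2.12.0"
  case "$1" in
    deb) echo "$base/pachctl_2.12.0_${arch}.deb" ;;
    tar) echo "$base/pachctl_2.12.0_linux_${arch}.tar.gz" ;;
    *) echo "unknown package kind: $1" >&2; return 1 ;;
  esac
}

# Example with a fixed machine type; use "$(uname -m)" for the current machine.
pachctl_url deb x86_64
```

For instance, `curl -o /tmp/pachctl.deb -L "$(pachctl_url deb "$(uname -m)")"` would fetch the Debian package that matches the current machine.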

3. Install & Configure Helm

Install Helm:

brew install helm

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Add the Pachyderm repo to Helm:

helm repo add pachyderm https://helm.pachyderm.com
helm repo update

Install PachD:

Tip

Open your browser and check http://localhost before installing. If another tool is already using the same port as HPE Machine Learning Data Management, add the following argument to the helm install command below: --set proxy.service.httpPort=8080

The arguments passed to the helm install command will vary depending on how you wish to configure your installation.

If you are not deploying locally (--set deployTarget=LOCAL), you must specify a backend (--set pachd.storage.backend=<YOUR_BACKEND>) and a storage URL (--set pachd.storage.storageURL="s3://my-bucket").

helm install pachyderm pachyderm/pachyderm \
  --set deployTarget=LOCAL \
  --set proxy.enabled=true \
  --set proxy.service.type=LoadBalancer \
  --set proxy.host=localhost

Are you using an Enterprise trial key? If so, you can set up Enterprise Pachyderm locally by storing your trial key in a license.txt file and passing it into the following Helm command:

helm install pachyderm pachyderm/pachyderm \
  --set deployTarget=LOCAL \
  --set pachd.enterpriseLicenseKey="$(cat license.txt)" \
  --set proxy.enabled=true \
  --set proxy.service.type=LoadBalancer \
  --set proxy.host=localhost \
  --set pachd.storage.backend=<YOUR_BACKEND> \
  --set pachd.storage.storageURL="s3://my-bucket"

Replace the storage URL with the one for your backend, such as "gs://my-bucket" or "azblob://my-container".

This unlocks Enterprise features but also requires user authentication to access Console. A mock user is created by default to get you started, with the username admin and the password password.

The installation may take several minutes to complete.

4. Verify Installation

In a new terminal, run the following command to check the status of your pods:

kubectl get pods

NAME                                           READY   STATUS      RESTARTS   AGE
pod/console-5b67678df6-s4d8c                   1/1     Running     0          2m8s
pod/etcd-0                                     1/1     Running     0          2m8s
pod/pachd-c5848b5c7-zwb8p                      1/1     Running     0          2m8s
pod/pg-bouncer-7b855cb797-jqqpx                1/1     Running     0          2m8s
pod/postgres-0                                 1/1     Running     0          2m8s

Re-run this command after a few minutes if pachd is not ready.
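
Rather than scanning the table by eye, you can filter it for pods that are not yet ready. This is a sketch only: the function name is hypothetical, and it assumes the default column layout shown above, where READY is the second column. Pods that have run to completion (for example 0/1 with a Completed status) would also be listed.

```shell
# Read `kubectl get pods` output on stdin and print the name of every pod
# whose READY column (for example 0/1) shows fewer ready containers than expected.
not_ready_pods() {
  awk 'NR > 1 && NF >= 2 {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1
       }'
}

# Example use:
#   kubectl get pods | not_ready_pods
```

An empty result means every listed pod reports all of its containers as ready.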

5. Connect to Cluster

pachctl connect http://localhost:80

Warning

If you set the httpPort to a new value, such as 8080, use that value in the command:

pachctl connect http://localhost:8080
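
Since the port chosen at install time must match the port used when connecting, it can help to derive both from one variable. The sketch below uses hypothetical helper names; the Helm flag and the connect address are the ones given in this guide.

```shell
# Print the extra Helm argument needed for a non-default HTTP port
# (prints nothing for the default port 80).
http_port_flag() {
  [ "$1" = "80" ] || echo "--set proxy.service.httpPort=$1"
}

# Print the matching address for `pachctl connect`.
connect_url() {
  echo "http://localhost:$1"
}

PORT=8080   # hypothetical choice; use 80 if nothing else is listening there
echo "helm flag:   $(http_port_flag "$PORT")"
echo "connect URL: $(connect_url "$PORT")"
```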

Optionally open your browser and navigate to the Console UI.

Tip

You can check your Pachyderm version and connection to pachd at any time with the following command:

pachctl version

COMPONENT           VERSION
pachctl             2.12.0
pachd               2.12.0
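
To check the two versions in a script, the table can be parsed directly. This is a minimal sketch, assuming the two-column layout shown above; the function name is hypothetical.

```shell
# Read `pachctl version` output on stdin and succeed only if the pachctl and
# pachd rows are both present and report the same version string.
versions_match() {
  awk '$1 == "pachctl" { client = $2 }
       $1 == "pachd"   { server = $2 }
       END { exit !(client != "" && (client "") == (server "")) }'
}

# Example use:
#   pachctl version | versions_match || echo "pachctl and pachd versions differ or pachd is unreachable"
```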