Release Notes Highlights for MLDM

2.10.0

Feature: For enterprise customers using MLDM (Pachyderm) & MLDE (Determined) in a combined cluster environment, the MLDE Notebooks now include support for the Jupyter Pachyderm extension by default.
Feature: You can now set up and maintain metadata on your HPE Machine Learning Data Management artifacts. This includes clusters, projects, repos, branches, and commits.
Enhancement: The blob storage configuration attribute GOCDK_ENABLED is now set to true by default in your Helm chart values; in 2.11.0 the option to disable it will be removed.
Enhancement: The Console UI has undergone several improvements, including:
- Improved file browsing experience
- Improved DAG visualizations
  - Interactive DAG edge highlighting
  - Distinguishing colors and/or patterns based on pipeline types
  - Edges that convey more about each connection (for example, joins)
  - Visual indicators of parallelism when the pipeline spec calls for it and while the pipeline is running
- Pipeline and repo table paging

2.11.0

Feature: Users can now manage metadata (as key:value pairs) in Console for projects and repositories from the User Metadata tab of the details side panel.
Enhancement: Projects, pipelines, branches, and commits now include the following derived metadata by default: created_at, created_by, updated_at, and updated_by.
Feature: Pre-built Jsonnet templates are now available in Console when creating a pipeline:
- Snowflake Integration: Creates a cron pipeline that can execute a query against a Snowflake database and return the results in a single output file.
- Hugging Face Downloader: Creates a cron pipeline to download datasets or models from Hugging Face on demand.
Enhancement: Several enhancements have been made to improve the integration between HPE Machine Learning Data Management and Machine Learning Development Environment.
Feature: The Pachyderm SDK now has an extras package cdr that you can install (pip install pachyderm_sdk[cdr]) to make use of Common Data Refs (CDRs) in your user code. CDRs improve performance and speed by downloading version-controlled data directly from HPE Machine Learning Data Management’s underlying Object Storage bucket and caching that data locally on your machine, allowing datasets to be assembled entirely locally and incrementally updated.
Security: The HPE Machine Learning Data Management repository is now available at Iron Bank, a hardened container image repository owned and maintained by the U.S. Department of Defense (DoD) that supports the end-to-end lifecycle for modern software development. If you plan to download and install from Iron Bank, please reach out to ai-support@hpe.com or your Customer Success Engineer for assistance.
Notice: The gocdk_enabled attribute has been removed from the Helm Chart Values as it is now the default object storage driver.

2.12.2

Feature: Users can now snapshot and restore Pachyderm. A new Snapshot API lets you create, list, delete, and inspect snapshots.
Feature: Users can now use Reference Inputs to specify that changes to the files in a particular input should not result in datum reprocessing.
Feature: Users can now implement deferred processing with a mechanism called Conditional Propagation. This mechanism is intended to become more robust over time and ultimately replace branch triggers.

2.6.0

Feature: Datum Batching is now available. Datum Batching is a performance optimization process that enables processing multiple datums sequentially.
Feature: The JupyterLab Pipeline Extension (PPS Extension) is now available, allowing users to push notebook code directly into a pipeline to create and run it. This feature is in Alpha, so we encourage you to share your feedback with us as you use it.
Enhancement: New RBAC roles have been added to Projects: ProjectViewerRole, ProjectWriterRole, ProjectOwnerRole, and ProjectCreatorRole. See the documentation on these roles for details.
Enhancement: The Console UI has undergone some substantial improvements, including a revamped file browser and more detailed information about pipeline and job performance.
Enhancement: The Documentation site has undergone a substantial information architecture overhaul, making it easier to find the information you need. Content is now stored in top-level folders that follow the natural progression of learning about and using HPE Machine Learning Data Management.

2.7.0

Feature: The new Pachyderm SDK is now available. Check out the reference documentation, install guide, and example starter project.
Feature: Console now has a runtime visualization for jobs in your pipeline.
Feature: The documentation site now has a chatbot to help you find what you’re looking for. This feature is in beta, so please let us know if you have any feedback through our Slack community.
Feature: HPE Machine Learning Data Management’s Helm chart now has a section for preflight checks, allowing you to easily validate whether the upgrade/migrations will be successful. This section can be found at pachd.preflightchecks. Simply set enabled: true and set the image.tag to the new version you want to upgrade to. If the resulting pod, named pachyderm-preflight-check, shows a status of Completed, you are ready to perform the upgrade. See the Upgrade steps for more information.
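As a rough illustration of the preflight settings described above, the following Python sketch builds a values override and prints it as JSON (which is also valid YAML, so the output could be saved and used as a values file). The nesting shown here is an assumption based only on the wording of this note: the section path pachd.preflightchecks comes from the note, but placing image.tag under that section, and the exact key casing, should be checked against the values.yaml of your chart version. The helper name preflight_values is hypothetical.

```python
import json


def preflight_values(target_version: str) -> dict:
    """Build a Helm values override that enables the preflight check.

    Assumption: `image.tag` sits under `pachd.preflightchecks`; verify the
    exact key names and casing in your chart's values.yaml.
    """
    return {
        "pachd": {
            "preflightchecks": {
                "enabled": True,
                # The version you intend to upgrade to.
                "image": {"tag": target_version},
            }
        }
    }


values = preflight_values("2.7.0")
rendered = json.dumps(values, indent=2)
print(rendered)
```

After applying an override like this, the note above says to look for a pod named pachyderm-preflight-check with a status of Completed before proceeding with the upgrade.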
Enhancement: Console’s scalability has been improved to handle more concurrent users (50+) and power users who have many pipelines.
Enhancement: Console’s DAG visualization has been upgraded to include more information about the state of your pipelines.
Enhancement: The JupyterLab Pipeline Specification Extension now supports GPUs.
Refactor: The functionality of the Branch Cron Trigger has been refactored to work more intuitively. Previously, cron triggers functioned more like rate limiters; now, they enable you to set up a scheduled recurring event on a repo branch that evaluates and fires the trigger. If a Cron Trigger fires but no new data has been added, no new downstream commits or jobs are created. See our Cron glossary entry for more information on crons in HPE Machine Learning Data Management.
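The refactored trigger behaviour can be pictured with a small toy model. This is not the product's implementation or API: the Branch and CronTrigger classes below are hypothetical stand-ins, and the sketch only captures the rule stated above, that a scheduled firing with no new data produces no new downstream commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Toy stand-in for a repo branch: tracks only its head commit ID."""
    head: Optional[str] = None


@dataclass
class CronTrigger:
    """Toy model of a scheduled trigger from a source branch to a target."""
    source: Branch
    target: Branch
    last_seen: Optional[str] = None  # source head at the previous firing

    def fire(self) -> bool:
        """Run on each scheduled tick; return True if the target moved."""
        if self.source.head is None or self.source.head == self.last_seen:
            # No new data since the last firing: nothing happens downstream.
            return False
        self.target.head = self.source.head
        self.last_seen = self.source.head
        return True


staging, master = Branch(), Branch()
trigger = CronTrigger(source=staging, target=master)

staging.head = "commit-1"
first = trigger.fire()    # new data: the target branch moves
second = trigger.fire()   # same data: no new downstream commit
staging.head = "commit-2"
third = trigger.fire()    # new data again: the target branch moves
print(first, second, third)
```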
Deprecation: The original Python SDK (python-pachyderm) will be deprecated in 9 months (May 2024). We recommend that you start trying out the new Pachyderm SDK (pachyderm-sdk) and begin planning your transition.

2.8.0

Feature: You can now create and manage pipelines in Console! To showcase this, we’ve added Console steps to all of our tutorials.
Feature: You can now set global defaults for your cluster that are passed down to all pipeline specs. These defaults provide a consistent experience for your data scientists and help manage your cluster. You can manage defaults via the PachCTL CLI or within Console.
Beta: You can now try out a beta version of our Unified Deployment experience with Determined.
Update: Branch triggers now require the trigger branch to exist before adding a --trigger setting to the target branch.
Enhancement: All pipeline specification references have been standardized to use camelCase format; use this format going forward when creating pipeline specifications.
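If you have older specifications written with snake_case keys, a small script can help rewrite them. The sketch below is one possible approach, not an official migration tool: the functions to_camel and camelize are hypothetical helpers, the sample field names are illustrative, and the OPAQUE_KEYS set (keys whose contents are user-defined names and should be left alone) is an assumption you would need to adjust for your own specs.

```python
import re
from typing import Any

# Keys whose values hold user-defined names (for example, environment
# variables) and are therefore left untouched. This set is an assumption.
OPAQUE_KEYS = {"env"}


def to_camel(name: str) -> str:
    """Convert a snake_case key such as 'datum_tries' to 'datumTries'."""
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), name)


def camelize(node: Any) -> Any:
    """Recursively rename dictionary keys to camelCase."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            # Rename the key, but do not descend into opaque sections.
            out[to_camel(key)] = value if key in OPAQUE_KEYS else camelize(value)
        return out
    if isinstance(node, list):
        return [camelize(item) for item in node]
    return node


# Illustrative spec fragment with snake_case keys.
spec = {
    "pipeline": {"name": "edges"},
    "parallelism_spec": {"constant": 2},
    "datum_tries": 3,
    "transform": {"cmd": ["python3", "/edges.py"], "env": {"log_level": "debug"}},
    "input": {"pfs": {"repo": "images", "glob": "/*"}},
}
converted = camelize(spec)
print(converted)
```

Review the converted output by hand before using it, since any section that contains user-chosen keys and is not listed in OPAQUE_KEYS would also be renamed.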