Pipeline

About #

A pipeline is a HPE Machine Learning Data Management primitive responsible for reading data from a specified source, such as a HPE Machine Learning Data Management repo, transforming it according to the pipeline specification, and writing the result to an output repo.

Pipelines subscribe to a branch in one or more input repositories, and every time the branch has a new commit, the pipeline executes a job that runs user code to completion and writes the results to a commit in the output repository.

Pipelines are defined declaratively using a JSON or YAML file (the pipeline specification), which must include the name, input, and transform parameters at a minimum. Pipelines can be chained together to create a directed acyclic graph (DAG).