Jsonnet Pipeline Specifications

Jsonnet pipeline specifications let you parameterize your pipeline specification files, adding a dynamic component to their creation and maintenance. They use the open-source Jsonnet data templating language to wrap the baseline of a JSON pipeline spec in a function, so that parameters can be injected at pipeline creation time.
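
For illustration, here is a minimal sketch of the idea (the field names and values below are hypothetical): a fixed JSON spec becomes a Jsonnet function whose parameters are supplied at creation time.

// A static spec hard-codes its values; wrapping it in a function
// turns those values into parameters injected when the pipeline is created.
function(name, image)
{
  pipeline: { name: name },
  transform: { image: image },
}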

You can use Jsonnet pipeline specs to both create and update pipelines.

Benefits

  • Parameterization: Pass parameters to your pipeline specification files, making them dynamic and reusable.
  • Code Reuse: Reuse the baseline of a given pipeline spec while experimenting with various values of given fields.
  • Modularity: Create a library of pipeline specifications that can be instantiated with different parameters.
  • Readability: Write more concise and readable pipeline specifications.
  • Flexibility: Create multiple pipelines at once from a single file (see the sketch after this list).
  • Ease of Maintenance: Maintain a single file for multiple pipelines, reducing the number of files you need to manage.
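
As an illustration of the last two points, a single jsonnet file can evaluate to a list of pipeline specs. The sketch below is hypothetical (the stage names and shared image parameter are invented) and assumes your version accepts an array of specs:

// Evaluating to an array yields one pipeline spec per element.
function(image)
[
  {
    pipeline: { name: stage },
    transform: { image: image },  // the same image is shared across stages
  }
  for stage in ["preprocess", "train"]
]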

Use Cases

Common use cases include passing different values to the same pipeline specification file for:

  • Input repositories
  • Image tags
  • Pipeline names
  • Pipeline descriptions
  • Transform commands
  • Transform images
  • Input globs

Before You Start

Warning

All jsonnet pipeline specs have a .jsonnet extension. Read Jsonnet’s complete standard library documentation to learn about all of the variable types, string manipulation and mathematical functions, and assertions available to you.

At a minimum, your function should always have a parameter that acts as a pipeline.name modifier. Pipeline names in HPE Machine Learning Data Management are unique, so adding a prefix or a suffix to a generic name lets you quickly generate several pipelines from the same jsonnet pipeline specification file.
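
For example, a minimal sketch of such a function (the pipeline name here is hypothetical), using two standard library calls, std.length for an assertion and std.asciiLower for normalization:

function(suffix)
{
  // Fail early with a clear message if no suffix was supplied.
  assert std.length(suffix) > 0 : "suffix must not be empty",
  pipeline: { name: "example-" + std.asciiLower(suffix) },
}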

CLI

Creating pipelines from jsonnet pipeline specs in the CLI requires providing a function with named parameters that represent the values you want to pass to the pipeline spec.

// comments are arbitrary but recommended to describe the function
// arg1: description of arg1
// arg2: description of arg2

function(arg1, arg2, ... )
{
 ...
}
pachctl create pipeline --jsonnet jsonnet/example.jsonnet --arg arg1=foo --arg arg2=bar

Examples

Parameterizing Pipeline Name & Input Repo

The following example enables you to:

  • Pass in a value for the name attribute as the suffix parameter to create a unique pipeline name.
  • Pass in a value for the repo attribute as the src parameter to specify the input repository.
# edges.jsonnet
////
// Template arguments:
//
// suffix : An arbitrary suffix appended to the name of this pipeline, for
//          disambiguation when multiple instances are created.
// src : the repo from which this pipeline will read the images to which
//       it applies edge detection.
////
function(suffix, src)
{
  pipeline: { name: "edges-"+suffix },
  description: "OpenCV edge detection on "+src,
  input: {
    pfs: {
      name: "images",
      glob: "/*",
      repo: src,
    }
  },
  transform: {
    cmd: [ "python3", "/edges.py" ],
    image: "pachyderm/opencv:0.0.1"
  }
}
pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg src=images
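
Because pipeline names must be unique, varying only the suffix argument stamps out several pipelines from the same template. A hypothetical shell loop:

# Creates edges-1, edges-2, and edges-3, all reading from the images repo.
for i in 1 2 3; do
  pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=$i --arg src=images
done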

Console

Creating pipelines from jsonnet pipeline specs in Console requires a leading comment block that adheres to the following YAML template structure:

/*
title: Required title of the pipeline
description: "Optional description of the pipeline"
args: # Required array that tells console what fields to present to the user.
- name: arg1
  description: description of arg1
  type: string
- name: arg2
  description: description of arg2
  type: string
  default: Optional default value to display upon creating an instance of this pipeline
*/
function(arg1, arg2, ... )

Examples

Parameterizing Pipeline Name & Input Repo

  1. Create an edges.jsonnet file like the following:
    /*
    title: Image edges
    description: "Simple example pipeline."
    args:
    - name: suffix
      description: Pipeline name suffix
      type: string
      default: 1
    - name: src
      description: Input repo to pipeline.
      type: string
      default: test
    */
    function(suffix, src)
    {
      pipeline: { name: "edges-"+suffix },
      description: "OpenCV edge detection on "+src,
      input: {
        pfs: {
          name: "images",
          glob: "/*",
          repo: src,
        }
      },
      transform: {
        cmd: [ "python3", "/edges.py" ],
        image: "pachyderm/opencv:0.0.1"
      }
    }
  2. Save the jsonnet pipeline spec file at an accessible location.
  3. Authenticate to HPE Machine Learning Data Management or access Console via Localhost.
  4. Scroll through the project list to find a project you want to view.
  5. Select View Project.
  6. Select Create > Pipeline from template from the sidebar.
  7. Provide a valid path to the pipeline spec file.
  8. Select Continue.
  9. Fill out any populated fields from the pipeline spec file and verify that the default values are correct.
  10. Select Create Pipeline.