Built-in tasks

The otter.tasks package contains the built-in task types.
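The JSON schemas below embed the base Spec description. It says two things: the first word of a step's name determines its task type, and a DoSomething task class is referred to as do_something. The following sketch restates those two rules in plain Python. The helper names task_type_from_step_name and task_type_from_class_name are hypothetical, and otter's own conversion may handle edge cases (such as acronyms) differently.

```python
import re


def task_type_from_step_name(name: str) -> str:
    """Hypothetical helper: the first whitespace-separated word is the task type."""
    return name.split(maxsplit=1)[0]


def task_type_from_class_name(class_name: str) -> str:
    """Hypothetical helper: convert a CamelCase class name to snake_case."""
    # Insert an underscore before every capital letter except the first one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


print(task_type_from_step_name("do_something to create an example resource"))  # do_something
print(task_type_from_class_name("ExplodeGlob"))  # explode_glob
```

When applied to the classes documented on this page, the class-name rule in the sketch yields hello_world, copy, download, explode, explode_glob and find_latest. These are the same as the module names.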

tasks.hello_world module

Simple hello world example.

pydantic model otter.tasks.hello_world.HelloWorldSpec

Bases: Spec

Configuration fields for the hello_world task.

{
   "title": "HelloWorldSpec",
   "description": "Configuration fields for the hello_world task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "who": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "world",
         "title": "Who"
      }
   },
   "additionalProperties": true,
   "required": [
      "name"
   ]
}

Config:
  • extra: str = allow

Fields:

field who: str | None = 'world'

The person to greet.

class otter.tasks.hello_world.HelloWorld(spec: HelloWorldSpec, context: TaskContext)

Bases: Task

Simple hello world example.

run() → Self

Say hello, then create an artifact about it.

validate() → Self

Always pass.

If you don’t want to validate anything, this method can be omitted.
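The run() and validate() signatures above both return Self. This means a task instance can be driven with chained calls. The sketch below shows that shape without importing otter. HelloWorldSpecSketch and HelloWorldSketch are hypothetical stand-ins. The spec's defaults are copied from the JSON schema above, but the greeting text is illustrative only, and the sketch does not reproduce the artifact the real task creates.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HelloWorldSpecSketch:
    """Hypothetical stand-in for HelloWorldSpec; not the otter class."""

    name: str
    requires: List[str] = field(default_factory=list)
    scratchpad_ignore_missing: bool = False
    who: Optional[str] = "world"


class HelloWorldSketch:
    """Hypothetical stand-in mirroring the documented run()/validate() shape."""

    def __init__(self, spec: HelloWorldSpecSketch) -> None:
        self.spec = spec
        self.messages: List[str] = []

    def run(self) -> "HelloWorldSketch":
        # Illustrative greeting only; the real task also creates an artifact.
        self.messages.append(f"hello, {self.spec.who}")
        return self  # returning the instance lets calls be chained

    def validate(self) -> "HelloWorldSketch":
        # Like HelloWorld.validate(), this always passes.
        return self


task = HelloWorldSketch(HelloWorldSpecSketch(name="hello_world greet someone")).run().validate()
print(task.messages)  # ['hello, world']
```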

tasks.copy module

Copy a file.

pydantic model otter.tasks.copy.CopySpec

Bases: Spec

Configuration fields for the copy task.

{
   "title": "CopySpec",
   "description": "Configuration fields for the copy task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "source": {
         "title": "Source",
         "type": "string"
      },
      "destination": {
         "title": "Destination",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "source",
      "destination"
   ]
}

Config:
  • extra: str = allow

Fields:

field destination: str [Required]

The path, relative to release_uri, to upload the file to.

field source: str [Required]

The URL of the file to download.

class otter.tasks.copy.Copy(spec: CopySpec, context: TaskContext)

Bases: Task

Copy a file.

Downloads a file from source, then uploads it to destination.

Note

The otter.config.model.Config.release_uri config field will be prepended to destination.

If no release_uri is provided in the configuration, the file will only be downloaded locally. This is useful for local runs or debugging. The local path is built by prepending otter.config.model.Config.work_path to destination.
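The note above describes two possible targets for destination. The following sketch expresses that choice as a single function. copy_target is a hypothetical helper, not part of otter. It assumes release_uri and work_path have already been read from the configuration, and it joins the release URI with a plain slash.

```python
from pathlib import Path
from typing import Optional, Union


def copy_target(destination: str, release_uri: Optional[str], work_path: Path) -> Union[str, Path]:
    """Hypothetical helper: where Copy would place `destination`."""
    if release_uri:
        # Upload target: release_uri prepended to the relative destination.
        return f"{release_uri.rstrip('/')}/{destination.lstrip('/')}"
    # No release_uri: the file is only downloaded, under work_path.
    return work_path / destination


print(copy_target("genes/homo_sapiens.tsv", "gs://release-25", Path("/tmp/work")))
print(copy_target("genes/homo_sapiens.tsv", None, Path("/tmp/work")))
```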

validate() → Self

Check that the downloaded file exists and has a valid size.
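Copy.validate() and Download.validate() are both documented as checking that the downloaded file exists and has a valid size. The following is a minimal sketch of that kind of check. It assumes that "valid size" means "larger than zero bytes". The real methods may apply a different rule, and looks_valid is a hypothetical name.

```python
import tempfile
from pathlib import Path


def looks_valid(path: Path) -> bool:
    """Hypothetical check: the file exists and is not empty."""
    # Assumption: "valid size" is read here as "larger than zero bytes".
    return path.is_file() and path.stat().st_size > 0


with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "full.tsv"
    full.write_text("gene\tscore\n")
    empty = Path(tmp) / "empty.tsv"
    empty.touch()
    results = {
        "full": looks_valid(full),
        "empty": looks_valid(empty),
        "missing": looks_valid(Path(tmp) / "missing.tsv"),
    }

print(results)  # {'full': True, 'empty': False, 'missing': False}
```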

tasks.download module

Download a file.

pydantic model otter.tasks.download.DownloadSpec

Bases: Spec

Configuration fields for the download task.

{
   "title": "DownloadSpec",
   "description": "Configuration fields for the download task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "source": {
         "title": "Source",
         "type": "string"
      },
      "destination": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Destination"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "source"
   ]
}

Config:
  • extra: str = allow

Fields:

field destination: Path | None = None

The local path to download the file to. If omitted, the file will be downloaded to the same path as the source, under the work path.

field source: str [Required]

The URL of the file to download. If it looks like a relative path, release_uri will be prepended to it.

class otter.tasks.download.Download(spec: DownloadSpec, context: TaskContext)

Bases: Task

Download a file.

Downloads a file from source to destination. There are a few defaults and conveniences built into the task:

  • If source does not contain a protocol (:// not present), the release_uri will be prepended to the source.

  • If destination is not provided, the file will be downloaded to the same path as the source, prepending the work path.

Those two together are useful for downloading files from the release bucket.
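The two conveniences listed above can be combined into a single resolution step. This sketch shows how that might work. resolve_download is a hypothetical helper. For a source that already has a protocol and no destination, this page does not say what "the same path as the source" means, so the sketch reuses the URL's path component, and that part is an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple
from urllib.parse import urlparse


def resolve_download(
    source: str,
    destination: Optional[Path],
    release_uri: str,
    work_path: Path,
) -> Tuple[str, Path]:
    """Hypothetical helper applying the two Download conveniences."""
    if "://" in source:
        url = source
        # Assumption: for full URLs, the URL path is reused as the relative local path.
        relative = urlparse(source).path.lstrip("/")
    else:
        # No protocol: treat source as relative to the release bucket.
        url = f"{release_uri.rstrip('/')}/{source.lstrip('/')}"
        relative = source.lstrip("/")
    if destination is None:
        # Default destination: same relative path as the source, under work_path.
        destination = work_path / relative
    return url, destination


print(resolve_download("input/items/furniture/chair.json", None, "gs://release-25", Path("/tmp/work")))
```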

validate() → Self

Check that the downloaded file exists and has a valid size.

tasks.explode module

Generate more tasks based on a list.

pydantic model otter.tasks.explode.ExplodeSpec

Bases: Spec

Configuration fields for the explode task.

{
   "title": "ExplodeSpec",
   "description": "Configuration fields for the explode task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "do": {
         "items": {
            "$ref": "#/$defs/Spec"
         },
         "title": "Do",
         "type": "array"
      },
      "foreach": {
         "items": {
            "type": "string"
         },
         "title": "Foreach",
         "type": "array"
      }
   },
   "$defs": {
      "Spec": {
         "additionalProperties": true,
         "description": "Task Spec model.\n\nA `Spec` describes the properties and types for the config of a :py:class:`Task`.\n`Specs` are generated from the config file in :py:meth:`otter.task.load_specs`.\n\nThis is the base on which task `Specs` are built. Specific `Tasks` extend this\nclass to add custom attributes.\n\nThe first word in :py:attr:`name` determines the :py:attr:`task_type`. This is\nused to identify the :py:class:`Task` in the :py:class:`otter.task.TaskRegistry`\nand in the config file.\n\nFor example, for a ``DoSomething`` class defining a `Task`, the `task_type`\nwill be ``do_something``, and in the configuration file, it could be used\ninside a `Step` like this:\n\n.. code-block:: yaml\n\n    steps:\n        - do_something to create an example resource:\n            some_field: some_value\n            another_field: another_value",
         "properties": {
            "name": {
               "title": "Name",
               "type": "string"
            },
            "requires": {
               "default": [],
               "items": {
                  "type": "string"
               },
               "title": "Requires",
               "type": "array"
            },
            "scratchpad_ignore_missing": {
               "default": false,
               "title": "Scratchpad Ignore Missing",
               "type": "boolean"
            }
         },
         "required": [
            "name"
         ],
         "title": "Spec",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "do",
      "foreach"
   ]
}

Config:
  • extra: str = allow

Fields:

field do: list[Spec] [Required]

The tasks to explode. Each task in the list will be duplicated for each iteration of the foreach list.

field foreach: list[str] [Required]

The list to iterate over.

class otter.tasks.explode.Explode(spec: ExplodeSpec, context: TaskContext)

Bases: Task

Generate more tasks based on a list.

This task will duplicate the specs in the do list for each entry in the foreach list.

Inside the specs in the do list, the placeholder ${each} can be used as a sentinel that refers to the current iteration value.

Warning

The ${each} placeholder MUST be present in the otter.task.model.Spec.name of the new specs defined inside do. Otherwise, all of the exploded specs would share the same name, and spec names must be unique.

Example:

steps:
    - explode species:
        foreach:
            - homo_sapiens
            - mus_musculus
            - drosophila_melanogaster
        do:
            - name: copy ${each} genes
              source: https://example.com/genes/${each}/file.tsv
              destination: genes-${each}.tsv
            - name: copy ${each} proteins
              source: https://example.com/proteins/${each}/file.tsv
              destination: proteins-${each}.tsv

Keep in mind that ${each} is only replaced in string values, not inside lists or sub-objects.
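The following sketch models the expansion described above on plain dictionaries. It uses string.Template, which recognises the same ${each} syntax. The sketch makes three assumptions: specs are emitted foreach-major, only top-level string values are templated, and duplicate names are rejected to mirror the uniqueness warning. The explode function name is hypothetical, and the actual Explode task may order or validate specs differently.

```python
from string import Template
from typing import Any, Dict, List


def explode(foreach: List[str], do: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical sketch of the Explode expansion."""
    new_specs: List[Dict[str, Any]] = []
    for each in foreach:
        for spec in do:
            # Only top-level strings are templated; lists and sub-objects pass through unchanged.
            new_specs.append({
                key: Template(value).safe_substitute(each=each) if isinstance(value, str) else value
                for key, value in spec.items()
            })
    names = [spec["name"] for spec in new_specs]
    if len(names) != len(set(names)):
        raise ValueError("exploded spec names are not unique; add ${each} to each name")
    return new_specs


specs = explode(
    ["homo_sapiens", "mus_musculus"],
    [{
        "name": "copy ${each} genes",
        "source": "https://example.com/genes/${each}/file.tsv",
        "destination": "genes-${each}.tsv",
        "requires": ["download ${each}"],  # inside a list, so left untouched
    }],
)
for spec in specs:
    print(spec["name"], spec["requires"])
```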

tasks.explode_glob module

Generate more tasks based on a glob.

pydantic model otter.tasks.explode_glob.ExplodeGlobSpec

Bases: Spec

Configuration fields for the explode_glob task.

{
   "title": "ExplodeGlobSpec",
   "description": "Configuration fields for the explode task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "glob": {
         "title": "Glob",
         "type": "string"
      },
      "do": {
         "items": {
            "$ref": "#/$defs/Spec"
         },
         "title": "Do",
         "type": "array"
      }
   },
   "$defs": {
      "Spec": {
         "additionalProperties": true,
         "description": "Task Spec model.\n\nA `Spec` describes the properties and types for the config of a :py:class:`Task`.\n`Specs` are generated from the config file in :py:meth:`otter.task.load_specs`.\n\nThis is the base on which task `Specs` are built. Specific `Tasks` extend this\nclass to add custom attributes.\n\nThe first word in :py:attr:`name` determines the :py:attr:`task_type`. This is\nused to identify the :py:class:`Task` in the :py:class:`otter.task.TaskRegistry`\nand in the config file.\n\nFor example, for a ``DoSomething`` class defining a `Task`, the `task_type`\nwill be ``do_something``, and in the configuration file, it could be used\ninside a `Step` like this:\n\n.. code-block:: yaml\n\n    steps:\n        - do_something to create an example resource:\n            some_field: some_value\n            another_field: another_value",
         "properties": {
            "name": {
               "title": "Name",
               "type": "string"
            },
            "requires": {
               "default": [],
               "items": {
                  "type": "string"
               },
               "title": "Requires",
               "type": "array"
            },
            "scratchpad_ignore_missing": {
               "default": false,
               "title": "Scratchpad Ignore Missing",
               "type": "boolean"
            }
         },
         "required": [
            "name"
         ],
         "title": "Spec",
         "type": "object"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "glob",
      "do"
   ]
}

Config:
  • extra: str = allow

Fields:

field do: list[Spec] [Required]

The tasks to explode. Each task in the list will be duplicated for each file matched by the glob expression.

field glob: str [Required]

The glob expression.

class otter.tasks.explode_glob.ExplodeGlob(spec: ExplodeGlobSpec, context: TaskContext)

Bases: Task

Generate more tasks based on a glob.

This task will duplicate the specs in the do list for each entry in a list coming from a glob expression.

The task will add the following keys to a local scratchpad:

  • uri: the full file path.

  • match_prefix: the path up to the glob pattern. Where possible, this is made relative to otter.config.model.Config.release_uri.

  • match_path: the part of the path that the glob matched, without the file name. Note that this always ends with a slash, so do not add another slash after it when templating.

  • match_stem: the file name of the matched file without the extension.

  • match_ext: the file extension(s) of the matched file, including the dot.

  • uuid: a UUID4, in case it is needed to generate unique names.

Example:

- name: explode_glob things
  glob: 'gs://release-25/input/items/**/*.json'
  do:
    - name: transform ${match_stem} into parquet
      source: ${uri}
      destination: intermediate/${match_path}${match_stem}.parquet

For a bucket containing two files:

gs://release-25/input/items/furniture/chair.json
gs://release-25/input/items/furniture/table.json

and release_uri set to gs://release-25, the values will be:

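Scratchpad values for the first task

============  ================================================
key           value
============  ================================================
uri           gs://release-25/input/items/furniture/chair.json
match_prefix  input/items
match_path    furniture/
match_stem    chair
match_ext     .json
uuid          <uuid>
============  ================================================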

The task in do will be duplicated once per matched file (twice here), producing the following specs:

- name: transform chair into parquet
  source: gs://release-25/input/items/furniture/chair.json
  destination: intermediate/furniture/chair.parquet
- name: transform table into parquet
  source: gs://release-25/input/items/furniture/table.json
  destination: intermediate/furniture/table.parquet
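To make the scratchpad keys concrete, the following sketch rebuilds them for the worked example above and then renders the destination template. glob_prefix and scratchpad_values are hypothetical helpers, and the bucket listing is a hard-coded stand-in rather than a real storage call. The sketch relies on several assumptions:

  • fnmatch's "*" also matches "/", which only approximates the recursive "**".

  • Everything after the first dot in the file name counts as the extension.

  • match_path is empty when a file sits directly under the prefix.

```python
import fnmatch
import posixpath
import uuid
from string import Template
from typing import Dict, Optional


def glob_prefix(glob: str) -> str:
    """Everything before the first wildcard, cut back to the last slash."""
    first_wildcard = min((i for i, c in enumerate(glob) if c in "*?["), default=len(glob))
    return glob[: glob.rfind("/", 0, first_wildcard) + 1]


def scratchpad_values(glob: str, uri: str, release_uri: Optional[str]) -> Dict[str, str]:
    """Hypothetical reconstruction of the scratchpad keys described above."""
    prefix = glob_prefix(glob)
    match_prefix = prefix.rstrip("/")
    if release_uri:
        base = release_uri.rstrip("/") + "/"
        if prefix.startswith(base):
            # Make the prefix relative to release_uri when it lives under it.
            match_prefix = prefix[len(base):].rstrip("/")
    directory, filename = posixpath.split(uri[len(prefix):])
    # Assumption: everything after the first dot counts as the extension(s).
    stem, dot, extensions = filename.partition(".")
    return {
        "uri": uri,
        "match_prefix": match_prefix,
        # Assumption: empty when the file sits directly under the prefix.
        "match_path": f"{directory}/" if directory else "",
        "match_stem": stem,
        "match_ext": dot + extensions,
        "uuid": str(uuid.uuid4()),
    }


glob = "gs://release-25/input/items/**/*.json"
listing = [  # hypothetical bucket listing
    "gs://release-25/input/items/furniture/chair.json",
    "gs://release-25/input/items/furniture/table.json",
    "gs://release-25/input/items/README.md",
]
# fnmatch's "*" also matches "/", which roughly approximates "**" here.
matches = [uri for uri in listing if fnmatch.fnmatchcase(uri, glob)]
destinations = [
    Template("intermediate/${match_path}${match_stem}.parquet").substitute(
        scratchpad_values(glob, uri, "gs://release-25")
    )
    for uri in matches
]
print(destinations)  # ['intermediate/furniture/chair.parquet', 'intermediate/furniture/table.parquet']
```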

tasks.find_latest module

Find the last-modified file among those in a prefix URI.

pydantic model otter.tasks.find_latest.FindLatestSpec

Bases: Spec

Configuration fields for the find_latest task.

{
   "title": "FindLatestSpec",
   "description": "Configuration fields for the find_latest task.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "requires": {
         "default": [],
         "items": {
            "type": "string"
         },
         "title": "Requires",
         "type": "array"
      },
      "scratchpad_ignore_missing": {
         "default": false,
         "title": "Scratchpad Ignore Missing",
         "type": "boolean"
      },
      "source": {
         "title": "Source",
         "type": "string"
      },
      "pattern": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pattern"
      },
      "scratchpad_key": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Scratchpad Key"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "source"
   ]
}

Config:
  • extra: str = allow

Fields:

field pattern: str | None = None

The pattern to match files against. It is a simple substring match; prefix it with an exclamation mark to exclude matching files instead. For example, foo will match only files containing foo, while !foo will exclude all files containing foo.

field scratchpad_key: str | None = None

The scratchpad key where the path of the latest file will be stored. Defaults to the task name.

field source: str [Required]

The prefix under which to look for the file with the latest modification date.

class otter.tasks.find_latest.FindLatest(spec: FindLatestSpec, context: TaskContext)

Bases: Task

Find the last-modified file among those in a prefix URI.
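The fields above combine three pieces of behaviour:

  • the pattern filter, where a leading ! negates the match;

  • the choice of the most recently modified file;

  • the default scratchpad key, which is the task name.

The following sketch models all three over an in-memory listing of paths and modification times. It is a minimal sketch, not a description of the real task. matches_pattern and find_latest are hypothetical names. Real listings would come from storage, and the sketch assumes the pattern is tested against the full path rather than just the file name.

```python
from datetime import datetime
from typing import Dict, Optional


def matches_pattern(path: str, pattern: Optional[str]) -> bool:
    """Plain substring match; a leading "!" turns it into an exclusion."""
    if not pattern:
        return True
    if pattern.startswith("!"):
        return pattern[1:] not in path
    return pattern in path


def find_latest(files: Dict[str, datetime], pattern: Optional[str] = None) -> Optional[str]:
    """Return the most recently modified path that passes the pattern, if any."""
    candidates = {path: mtime for path, mtime in files.items() if matches_pattern(path, pattern)}
    if not candidates:
        return None
    return max(candidates, key=candidates.__getitem__)


listing = {  # hypothetical listing under the source prefix
    "gs://bucket/exports/foo-2024-01.tsv": datetime(2024, 1, 1),
    "gs://bucket/exports/foo-2024-03.tsv": datetime(2024, 3, 1),
    "gs://bucket/exports/bar-2024-05.tsv": datetime(2024, 5, 1),
}
scratchpad: Dict[str, Optional[str]] = {}
task_name = "find_latest exports"
scratchpad_key = None  # when unset, the task name is used as the key
scratchpad[scratchpad_key or task_name] = find_latest(listing, "!bar")
print(scratchpad)  # {'find_latest exports': 'gs://bucket/exports/foo-2024-03.tsv'}
```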

Module contents

Builtin tasks.