Built-in tasks
The otter.tasks package contains the built-in task types.
tasks.hello_world module
Simple hello world example.
- pydantic model otter.tasks.hello_world.HelloWorldSpec[source]
Bases:
Spec
Configuration fields for the hello_world task.
Show JSON schema
{ "title": "HelloWorldSpec", "description": "Configuration fields for the hello_world task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "who": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "world", "title": "Who" } }, "additionalProperties": true, "required": [ "name" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field who: str | None = 'world'
The person to greet.
- class otter.tasks.hello_world.HelloWorld(spec: HelloWorldSpec, context: TaskContext)[source]
Bases:
Task
Simple hello world example.
tasks.copy module
Copy a file.
- pydantic model otter.tasks.copy.CopySpec[source]
Bases:
Spec
Configuration fields for the copy task.
Show JSON schema
{ "title": "CopySpec", "description": "Configuration fields for the copy task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "source": { "title": "Source", "type": "string" }, "destination": { "title": "Destination", "type": "string" } }, "additionalProperties": true, "required": [ "name", "source", "destination" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field destination: str [Required]
The path, relative to release_uri, to upload the file to.
- field source: str [Required]
The URL of the file to download.
- class otter.tasks.copy.Copy(spec: CopySpec, context: TaskContext)[source]
Bases:
Task
Copy a file.
Downloads a file from source, then uploads it to destination.
Note
The otter.config.model.Config.release_uri config field is prepended to destination. If no release_uri is provided in the configuration, the file is only downloaded locally, which is useful for local runs or debugging. The local path is created by prepending otter.config.model.Config.work_path to the destination field.
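The note above can be sketched as a small path-resolution function. This is a minimal illustration, not the task's actual implementation: the helper name `resolve_copy_destination` and the way the pieces are joined with a single slash are assumptions.

```python
from __future__ import annotations


def resolve_copy_destination(
    destination: str,
    release_uri: str | None,
    work_path: str,
) -> str:
    """Return where a copied file would end up (illustrative helper only)."""
    # With a release_uri the file is uploaded under it; without one, the file
    # stays local under work_path. Joining with one slash is an assumption.
    base = release_uri if release_uri else work_path
    return f"{base.rstrip('/')}/{destination.lstrip('/')}"


remote = resolve_copy_destination("genes/file.tsv", "gs://release-25", "/tmp/work")
local = resolve_copy_destination("genes/file.tsv", None, "/tmp/work")
print(remote)
print(local)
```

The second call mirrors a local or debugging run, where no release_uri is configured.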
tasks.download module
Download a file.
- pydantic model otter.tasks.download.DownloadSpec[source]
Bases:
Spec
Configuration fields for the download task.
Show JSON schema
{ "title": "DownloadSpec", "description": "Configuration fields for the download task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "source": { "title": "Source", "type": "string" }, "destination": { "anyOf": [ { "format": "path", "type": "string" }, { "type": "null" } ], "default": null, "title": "Destination" } }, "additionalProperties": true, "required": [ "name", "source" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field destination: Path | None = None
The local path to download the file to. If omitted, the file will be downloaded to the same path as the source.
- field source: str [Required]
The URL of the file to download. If it looks like a relative path, the release_uri will be prepended to it.
- class otter.tasks.download.Download(spec: DownloadSpec, context: TaskContext)[source]
Bases:
Task
Download a file.
Downloads a file from source to destination. There are a few defaults and conveniences built in to the task:
- If source does not contain a protocol (:// not present), the release_uri will be prepended to the source.
- If destination is not provided, the file will be downloaded to the same path as the source, prepending the work path.
Those two together are useful for downloading files from the release bucket.
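The two defaults described above can be sketched as follows. This is an illustration under stated assumptions, not the task's code: the function name `resolve_download` is hypothetical, an explicit destination is returned unchanged, and the case of an absolute URL with no destination is left out because the text above does not specify it.

```python
from __future__ import annotations


def resolve_download(
    source: str,
    destination: str | None,
    release_uri: str,
    work_path: str,
) -> tuple[str, str]:
    """Return (remote source, local destination) for a download (sketch)."""
    has_protocol = "://" in source
    # Default 1: a source without a protocol is relative to release_uri.
    if has_protocol:
        full_source = source
    else:
        full_source = f"{release_uri.rstrip('/')}/{source.lstrip('/')}"
    # Default 2: without a destination, reuse the source path under work_path.
    if destination is None:
        if has_protocol:
            # Not covered by this sketch; the real behaviour may differ.
            raise ValueError("sketch only handles relative sources here")
        destination = f"{work_path.rstrip('/')}/{source.lstrip('/')}"
    return full_source, destination


pair = resolve_download("input/items/chair.json", None, "gs://release-25", "/tmp/work")
print(pair)
```

With both defaults in play, a single relative path is enough to fetch a file from the release bucket into the matching location under the work path.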
tasks.explode module
Generate more tasks based on a list.
- pydantic model otter.tasks.explode.ExplodeSpec[source]
Bases:
Spec
Configuration fields for the explode task.
Show JSON schema
{ "title": "ExplodeSpec", "description": "Configuration fields for the explode task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "do": { "items": { "$ref": "#/$defs/Spec" }, "title": "Do", "type": "array" }, "foreach": { "items": { "type": "string" }, "title": "Foreach", "type": "array" } }, "$defs": { "Spec": { "additionalProperties": true, "description": "Task Spec model.\n\nA `Spec` describes the properties and types for the config of a :py:class:`Task`.\n`Specs` are generated from the config file in :py:meth:`otter.task.load_specs`.\n\nThis is the base on which task `Specs` are built. Specific `Tasks` extend this\nclass to add custom attributes.\n\nThe first word in :py:attr:`name` determines the :py:attr:`task_type`. This is\nused to identify the :py:class:`Task` in the :py:class:`otter.task.TaskRegistry`\nand in the config file.\n\nFor example, for a ``DoSomething`` class defining a `Task`, the `task_type`\nwill be ``do_something``, and in the configuration file, it could be used\ninside a `Step` like this:\n\n.. code-block:: yaml\n\n steps:\n - do_something to create an example resource:\n some_field: some_value\n another_field: another_value", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" } }, "required": [ "name" ], "title": "Spec", "type": "object" } }, "additionalProperties": true, "required": [ "name", "do", "foreach" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field do: list[Spec] [Required]
The tasks to explode. Each task in the list will be duplicated for each iteration of the foreach list.
- field foreach: list[str] [Required]
The list to iterate over.
- class otter.tasks.explode.Explode(spec: ExplodeSpec, context: TaskContext)[source]
Bases:
Task
Generate more tasks based on a list.
This task will duplicate the specs in the do list for each entry in the foreach list.
Inside the specs in the do list, the string ${each} can be used as a sentinel to refer to the current iteration value.
Warning
The ${each} placeholder MUST be present in the otter.task.model.Spec.name of the new specs defined inside do; otherwise all of them would have the same name, and names must be unique.
Example:
steps:
  - explode species:
      foreach:
        - homo_sapiens
        - mus_musculus
        - drosophila_melanogaster
      do:
        - name: copy ${each} genes
          source: https://example.com/genes/${each}/file.tsv
          destination: genes-${each}.tsv
        - name: copy ${each} proteins
          source: https://example.com/proteins/${each}/file.tsv
          destination: proteins-${each}.tsv
Keep in mind that ${each} is only replaced in strings, not in lists or sub-objects.
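The expansion can be sketched in plain Python with `string.Template`, which understands the `${each}` syntax. This is an assumption-laden illustration: the function name `explode`, the use of plain dicts for specs, and the output order are hypothetical, and the real task may substitute differently.

```python
from __future__ import annotations

from string import Template


def explode(do: list[dict], foreach: list[str]) -> list[dict]:
    """Duplicate every spec in `do` once per value in `foreach` (sketch)."""
    new_specs = []
    for value in foreach:
        for spec in do:
            new_specs.append({
                # Only string fields are templated; lists and sub-objects
                # are copied as they are.
                key: Template(field).safe_substitute(each=value)
                if isinstance(field, str) else field
                for key, field in spec.items()
            })
    return new_specs


specs = explode(
    do=[{
        "name": "copy ${each} genes",
        "source": "https://example.com/genes/${each}/file.tsv",
        "tags": ["${each}"],  # hypothetical list field, left untouched
    }],
    foreach=["homo_sapiens", "mus_musculus"],
)
for spec in specs:
    print(spec["name"], spec["source"], spec["tags"])
```

The hypothetical `tags` field shows the limitation noted above: the placeholder inside the list stays as written.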
tasks.explode_glob module
Generate more tasks based on a glob.
- pydantic model otter.tasks.explode_glob.ExplodeGlobSpec[source]
Bases:
Spec
Configuration fields for the explode_glob task.
Show JSON schema
{ "title": "ExplodeGlobSpec", "description": "Configuration fields for the explode task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "glob": { "title": "Glob", "type": "string" }, "do": { "items": { "$ref": "#/$defs/Spec" }, "title": "Do", "type": "array" } }, "$defs": { "Spec": { "additionalProperties": true, "description": "Task Spec model.\n\nA `Spec` describes the properties and types for the config of a :py:class:`Task`.\n`Specs` are generated from the config file in :py:meth:`otter.task.load_specs`.\n\nThis is the base on which task `Specs` are built. Specific `Tasks` extend this\nclass to add custom attributes.\n\nThe first word in :py:attr:`name` determines the :py:attr:`task_type`. This is\nused to identify the :py:class:`Task` in the :py:class:`otter.task.TaskRegistry`\nand in the config file.\n\nFor example, for a ``DoSomething`` class defining a `Task`, the `task_type`\nwill be ``do_something``, and in the configuration file, it could be used\ninside a `Step` like this:\n\n.. code-block:: yaml\n\n steps:\n - do_something to create an example resource:\n some_field: some_value\n another_field: another_value", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" } }, "required": [ "name" ], "title": "Spec", "type": "object" } }, "additionalProperties": true, "required": [ "name", "glob", "do" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field do: list[Spec] [Required]
The tasks to explode. Each task in the list will be duplicated for each file matched by the glob expression.
- field glob: str [Required]
The glob expression.
- class otter.tasks.explode_glob.ExplodeGlob(spec: ExplodeGlobSpec, context: TaskContext)[source]
Bases:
Task
Generate more tasks based on a glob.
This task will duplicate the specs in the do list for each entry in a list coming from a glob expression.
The task will add the following keys to a local scratchpad:
- uri: the full file path.
- match_prefix: the path up to the glob pattern and, in cases where possible, relative to otter.config.model.Config.release_uri.
- match_path: the part of the path that the glob matched, without the file name. NOTE that this will always end with a slash, so do not include it in the templating.
- match_stem: the file name of the matched file without the extension.
- match_ext: the file extension of the matched file, with the dot.
- uuid: a UUID4, in case it is needed to generate unique names.
Example:
- name: explode_glob things
  glob: 'gs://release-25/input/items/**/*.json'
  do:
    - name: transform ${match_stem} into parquet
      source: ${uri}
      destination: intermediate/${match_path}${match_stem}.parquet
For a bucket containing two files:
- gs://release-25/input/items/furniture/chair.json
- gs://release-25/input/items/furniture/table.json
and release_uri set to gs://release-25, the values will be:
Scratchpad values for the first task

key            value
uri            gs://release-25/input/items/furniture/chair.json
match_prefix   input/items
match_path     furniture/
match_stem     chair
match_ext      .json
uuid           <uuid>
The first task will be duplicated twice, with the following specs:
- name: transform chair into parquet
  source: input/items/furniture/chair.json
  destination: intermediate/furniture/chair.parquet
- name: transform table into parquet
  source: input/items/furniture/table.json
  destination: intermediate/furniture/table.parquet
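As a rough illustration of how the scratchpad keys relate to one another, the sketch below derives them from a matched URI using only the standard library. It is not the task's implementation: the function name `glob_scratchpad`, the rule for finding the prefix (leading glob segments without wildcard characters), and the handling of multi-part extensions are all assumptions. It also assumes the matched file sits at least one directory below the prefix.

```python
from __future__ import annotations

import posixpath
import uuid
from string import Template


def glob_scratchpad(uri: str, glob: str, release_uri: str | None) -> dict[str, str]:
    """Build the scratchpad keys for one matched file (illustrative sketch)."""
    # Prefix: the leading segments of the glob that contain no wildcards.
    literal = []
    for segment in glob.split("/"):
        if any(ch in segment for ch in "*?["):
            break
        literal.append(segment)
    prefix = "/".join(literal)

    # What the glob matched: everything in the URI after the prefix.
    remainder = uri[len(prefix):].lstrip("/")
    directory, filename = posixpath.split(remainder)
    stem, ext = posixpath.splitext(filename)

    # Make the prefix relative to release_uri where possible.
    match_prefix = prefix
    if release_uri and prefix.startswith(release_uri):
        match_prefix = prefix[len(release_uri):].strip("/")

    return {
        "uri": uri,
        "match_prefix": match_prefix,
        "match_path": f"{directory}/",  # always ends with a slash
        "match_stem": stem,
        "match_ext": ext,
        "uuid": str(uuid.uuid4()),
    }


values = glob_scratchpad(
    "gs://release-25/input/items/furniture/chair.json",
    "gs://release-25/input/items/**/*.json",
    "gs://release-25",
)
destination = Template("intermediate/${match_path}${match_stem}.parquet").substitute(values)
print(values)
print(destination)
```

Because match_path already carries its trailing slash, the destination template concatenates it directly with match_stem, as in the example above.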
tasks.find_latest module
Find the last-modified file among those in a prefix URI.
- pydantic model otter.tasks.find_latest.FindLatestSpec[source]
Bases:
Spec
Configuration fields for the find_latest task.
Show JSON schema
{ "title": "FindLatestSpec", "description": "Configuration fields for the find_latest task.", "type": "object", "properties": { "name": { "title": "Name", "type": "string" }, "requires": { "default": [], "items": { "type": "string" }, "title": "Requires", "type": "array" }, "scratchpad_ignore_missing": { "default": false, "title": "Scratchpad Ignore Missing", "type": "boolean" }, "source": { "title": "Source", "type": "string" }, "pattern": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Pattern" }, "scratchpad_key": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": null, "title": "Scratchpad Key" } }, "additionalProperties": true, "required": [ "name", "source" ] }
- Config:
extra: str = allow
- Fields:
- Validators:
- field pattern: str | None = None
The pattern to match files against. The pattern should be a simple string match, preceded by an exclamation mark to exclude files. For example, foo will match only files containing foo, while !foo will exclude all files containing foo.
- field scratchpad_key: str | None = None
The scratchpad key where the path of the latest file will be stored. Defaults to the task name.
- field source: str [Required]
The prefix under which to look for the file with the latest modification date.
- class otter.tasks.find_latest.FindLatest(spec: FindLatestSpec, context: TaskContext)[source]
Bases:
Task
Find the last-modified file among those in a prefix URI.
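The pattern rule and the selection by modification date can be sketched as below. This is a minimal illustration, not the task's code: the helper names `matches` and `find_latest`, the in-memory listing of files, and returning None when nothing matches are all assumptions.

```python
from __future__ import annotations

from datetime import datetime, timezone


def matches(pattern: str | None, name: str) -> bool:
    """Apply the simple include/exclude rule described for `pattern`."""
    if pattern is None:
        return True
    if pattern.startswith("!"):
        # A leading exclamation mark excludes names containing the rest.
        return pattern[1:] not in name
    return pattern in name


def find_latest(files: dict[str, datetime], pattern: str | None = None) -> str | None:
    """Return the most recently modified matching path, or None (assumed)."""
    candidates = {name: mtime for name, mtime in files.items() if matches(pattern, name)}
    if not candidates:
        return None
    return max(candidates, key=candidates.__getitem__)


# Hypothetical listing of a prefix: path -> last-modified time.
listing = {
    "gs://bucket/data/foo-1.tsv": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "gs://bucket/data/foo-2.tsv": datetime(2024, 3, 1, tzinfo=timezone.utc),
    "gs://bucket/data/bar-1.tsv": datetime(2024, 6, 1, tzinfo=timezone.utc),
}
print(find_latest(listing))
print(find_latest(listing, "foo"))
print(find_latest(listing, "!bar-1"))
```

In the real task the resulting path is stored in the scratchpad under scratchpad_key, which defaults to the task name.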
Module contents
Builtin tasks.