task package

task.task module

Task classes for PIS.

class pis.task.task.Pretask(definition: BaseTaskDefinition)

Bases: Task

Base class for all pretasks.

Pretasks run before the main tasks and prepare the environment for them. Examples include fetching a list of files so that one or all of them can be downloaded later, or generating a new set of tasks based on some parameters.

Pretasks are defined by a PretaskDefinition object, which contains the configuration fields for the pretask. The Pretask class only needs to implement run. All the initialization, registration and reporting is handled internally by parent classes and PIS itself.

To implement a new pretask, create a new PretaskDefinition class that inherits from TaskDefinition and contains the required configuration fields. Then, create a new class that inherits from Pretask and implements the run method.

The name of the class, converted to snake_case, will be the ‘real name’ of the pretask, which is used to identify the pretask in the registry and in the configuration file.

class pis.task.task.Task(definition: BaseTaskDefinition)

Bases: TaskReporter

Base class for all tasks.

Tasks are the main building blocks of PIS. They are responsible for running the various operations that are needed to gather the data from the different sources.

Tasks are defined by a TaskDefinition object, which contains the configuration fields for the task. The Task class only needs to implement how it runs and validates itself. All the initialization, registration and reporting is handled internally by parent classes and PIS itself.

To implement a new task, create a new definition class that inherits from TaskDefinition and contains the required configuration fields. Then, create a new class that inherits from Task and implements the run and validate methods.

The name of the class, converted to snake_case, will be the real name of the task, which is used to identify the task in the registry and in the configuration file.

For example, for a DoSomething task, the real name will be do_something, and in the configuration file, it could be used inside a step like this:

steps:
    - do_something to create an example resource:
        some_field: some_value
        another_field: another_value
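
The naming rule above can be illustrated with a small helper. This is only a sketch: `to_snake_case` is a hypothetical function written for this example, and it is an assumption that PIS derives the name in a comparable way. It does not treat acronyms specially.

```python
import re


def to_snake_case(class_name: str) -> str:
    """Convert a CamelCase class name to snake_case (illustrative helper)."""
    # Insert an underscore before every uppercase letter except the first,
    # then lowercase the whole string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


class DoSomething:
    """Stand-in for a Task subclass; only its name matters here."""


real_name = to_snake_case(DoSomething.__name__)
print(real_name)  # do_something
```
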

Parameters:

definition (BaseTaskDefinition) – The definition of the task.

Variables:
  • definition (BaseTaskDefinition) – The definition of the task.

  • resource (Resource) – The resource object associated with the task.

run(*, abort: Event) → Self

Run the task.

This method contains the actual work of the task. All tasks must implement run, and it must download or generate a resource in the path stored in the destination field of the task definition.

Optionally, the abort event can be watched to stop the task if another task has failed. This is useful for long-running work that can be stopped midway once the run is deemed a failure.

Parameters:

abort (Event) – The event that will be set if another task has failed.

Returns:

The task instance itself must be returned.

Return type:

Self
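
A minimal sketch of a run implementation that honours the abort event follows. It is self-contained for illustration only: `WriteLinesDefinition` and `WriteLines` are hypothetical stand-ins (a real task would subclass Task and use a real definition class), the `lines` field and `written` attribute are invented for the example, and it is an assumption that the event behaves like `threading.Event`.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from threading import Event


@dataclass
class WriteLinesDefinition:
    """Hypothetical stand-in for a task definition."""

    destination: Path  # where the resource must be generated
    lines: int  # invented configuration field for this example


class WriteLines:
    """Hypothetical task; a real one would inherit from Task."""

    def __init__(self, definition: WriteLinesDefinition) -> None:
        self.definition = definition
        self.written = 0

    def run(self, *, abort: Event) -> "WriteLines":
        with open(self.definition.destination, "w") as f:
            for i in range(self.definition.lines):
                # Stop early once another task has reported a failure.
                if abort.is_set():
                    break
                f.write(f"line {i}\n")
                self.written += 1
        return self  # the task instance itself must be returned


# Normal run: the event is never set, so all lines are written.
with tempfile.TemporaryDirectory() as tmp:
    definition = WriteLinesDefinition(destination=Path(tmp) / "out.txt", lines=3)
    task = WriteLines(definition).run(abort=Event())
    content = definition.destination.read_text()

# Aborted run: the event is already set, so the loop exits immediately.
aborted = Event()
aborted.set()
with tempfile.TemporaryDirectory() as tmp:
    definition = WriteLinesDefinition(destination=Path(tmp) / "out.txt", lines=3)
    skipped = WriteLines(definition).run(abort=aborted)
```

Checking the event once per unit of work keeps the cost low while still letting a failed run stop promptly.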

upload(*, abort: Event) → Self

Upload the task.

This method uploads the file generated by the task to the remote URI. The destination field of the task definition is used as the path in the bucket.

There is no need to implement this method in the subclass unless the task needs some special handling for the upload, which is unlikely.

Parameters:

abort (Event) – The event that will be set if another task has failed.

Returns:

The task instance itself must be returned.

Return type:

Self

validate(*, abort: Event) → Self

Validate the task.

This method should be implemented by the task subclass to perform validation. If not implemented, the task resource will always be considered valid.

The validate method should use the v function from the validators module to invoke a series of validators. See pis.validators.v().

Parameters:

abort (Event) – The event that will be set if another task has failed.

Returns:

The task instance itself must be returned.

Return type:

Self

task.task_registry module

TaskRegistry class handles the registry of tasks.

class pis.task.task_registry.TaskRegistry

TaskRegistry contains the registry of tasks.

PIS instantiates tasks from the registry when it reads the configuration file. The registry maps task names to their respective classes, task definitions and manifests.

Variables:
  • tasks (dict[str, type[Task]]) – Mapping of task names to their respective classes.

  • task_definitions (dict[str, type[BaseTaskDefinition]]) – Mapping of task names to their respective task definition classes.

  • task_manifests (dict[str, type[TaskManifest]]) – Mapping of task names to their respective task manifest classes.

  • pre_tasks (list[str]) – List of names of the pretasks.

instantiate_p(pretask_definition: BaseTaskDefinition) → Pretask

Instantiate a pretask.

Parameters:

pretask_definition (BaseTaskDefinition) – The pretask definition to instantiate the pretask from.

Returns:

The instantiated pretask.

Return type:

Pretask

instantiate_t(task_definition: BaseTaskDefinition) → Task

Instantiate a task.

Parameters:

task_definition (BaseTaskDefinition) – The task definition to instantiate the task from.

Returns:

The instantiated task.

Return type:

Task

is_pretask(task_definition: BaseTaskDefinition) → bool

Return whether the task is a pretask.

Parameters:

task_definition (BaseTaskDefinition) – The task definition to check.

Returns:

Whether the task is a pretask.

Return type:

bool

register_tasks()

Register all tasks from the tasks directory.
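
The name-to-class mapping described for TaskRegistry can be sketched with a toy registry. Nothing here is the real implementation: `SketchRegistry`, `StepDefinition`, `register` and `real_name` are hypothetical names, and it is an assumption that the real name is the first word of the step name, as the configuration example for Task suggests.

```python
from dataclasses import dataclass


@dataclass
class StepDefinition:
    """Hypothetical stand-in for BaseTaskDefinition."""

    name: str  # full step name as written in the configuration file


class DoSomething:
    """Hypothetical main task."""

    def __init__(self, definition: StepDefinition) -> None:
        self.definition = definition


class ListFiles:
    """Hypothetical pretask."""

    def __init__(self, definition: StepDefinition) -> None:
        self.definition = definition


class SketchRegistry:
    """Toy registry; not the real TaskRegistry implementation."""

    def __init__(self) -> None:
        self.tasks: dict[str, type] = {}
        self.pre_tasks: list[str] = []

    def register(self, real_name: str, cls: type, *, pretask: bool = False) -> None:
        self.tasks[real_name] = cls
        if pretask:
            self.pre_tasks.append(real_name)

    @staticmethod
    def real_name(definition: StepDefinition) -> str:
        # Assumption: the real name is the first word of the step name.
        return definition.name.split(" ", 1)[0]

    def is_pretask(self, definition: StepDefinition) -> bool:
        return self.real_name(definition) in self.pre_tasks

    def instantiate_t(self, definition: StepDefinition):
        # Look up the class by real name and build it from the definition.
        return self.tasks[self.real_name(definition)](definition)


registry = SketchRegistry()
registry.register("do_something", DoSomething)
registry.register("list_files", ListFiles, pretask=True)

step = StepDefinition(name="do_something to create an example resource")
task = registry.instantiate_t(step)
```
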

Module contents

Task module.

pis.task.init_task_registry()

Initialize the task registry.

pis.task.task_registry()

Return the task registry.

The task registry must be initialized explicitly by calling the init_task_registry function before trying to access it. If it is not initialized, a PISError is raised.

Raises:

PISError – If the task registry is not initialized.

Returns:

The task registry.

Return type:

TaskRegistry
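
The initialize-then-access contract described above can be sketched as a module-level singleton. This is an illustration under stated assumptions, not the PIS source: `PISError` and `TaskRegistry` are redefined here as local stand-ins, the error message is invented, and it is an assumption that initialization also calls register_tasks.

```python
class PISError(Exception):
    """Local stand-in for the PIS error type."""


class TaskRegistry:
    """Local stand-in for the real registry."""

    def register_tasks(self) -> None:
        # The real method registers all tasks from the tasks directory.
        pass


_task_registry = None  # module-level singleton, unset until initialized


def init_task_registry() -> None:
    """Create the registry and populate it (assumed behaviour)."""
    global _task_registry
    _task_registry = TaskRegistry()
    _task_registry.register_tasks()


def task_registry() -> TaskRegistry:
    """Return the registry, failing loudly if it was never initialized."""
    if _task_registry is None:
        raise PISError("task registry not initialized")
    return _task_registry


# Accessing the registry before initialization raises PISError.
try:
    task_registry()
except PISError as e:
    error_before_init = str(e)

# After explicit initialization, the same instance is returned every time.
init_task_registry()
registry = task_registry()
```

Raising on early access, rather than initializing lazily, makes an ordering mistake in startup code visible immediately.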