config package

This package contains the configuration models, parsers and utilities for PIS.

config.config module

Main module in the config package.

class pis.config.config.Config

The configuration object.

This class loads the settings from different sources and merges them into a single settings object. The sources are, in order of precedence:

  1. Command line arguments

  2. Environment variables

  3. YAML configuration file

  4. Default settings
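The precedence order above can be pictured as a layered lookup. The sketch below is an illustration only, not the Config implementation: it assumes each source has already been parsed into a plain dict and that a value of None means "not set by this source" (as the None defaults of CliSettings, EnvSettings and YamlSettings suggest). The function name resolve_settings is hypothetical. It relies on collections.ChainMap, which returns the value from the first mapping that contains a key.

```python
from collections import ChainMap
from typing import Any


def resolve_settings(
    cli: dict[str, Any],
    env: dict[str, Any],
    yaml_cfg: dict[str, Any],
    defaults: dict[str, Any],
) -> dict[str, Any]:
    """Resolve settings by precedence: CLI > environment > YAML > defaults."""

    def set_only(source: dict[str, Any]) -> dict[str, Any]:
        # Assumption: None means "not provided by this source", so drop it.
        return {key: value for key, value in source.items() if value is not None}

    # ChainMap searches its maps left to right, so earlier maps win.
    layered = ChainMap(set_only(cli), set_only(env), set_only(yaml_cfg), defaults)
    return dict(layered)


if __name__ == "__main__":
    merged = resolve_settings(
        cli={"pool": None, "log_level": "DEBUG"},
        env={"pool": 8, "log_level": "WARNING"},
        yaml_cfg={"pool": 2, "work_dir": "yaml_output"},
        defaults={"pool": 5, "log_level": "INFO", "work_dir": "output"},
    )
    # pool comes from the environment, log_level from the CLI, work_dir from YAML.
    print(merged)
```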

Variables:

settings (Settings) – The settings object.

get_scratchpad_sentinel_dict() → dict[str, Any]

Return the sentinel dictionary for the scratchpad.

Returns:

The sentinel dictionary for the scratchpad.

Return type:

dict[str, Any]

get_task_definitions() → list[BaseTaskDefinition]

Validate and return the task definitions.

Checks that the task definitions specified in the configuration file for the step being run are valid.

Returns:

The list of task definitions.

Return type:

list[BaseTaskDefinition]
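A minimal sketch of what selecting and checking the definitions for one step could look like. It assumes the step definitions have been parsed into a dict mapping step names to lists of task-definition dicts (mirroring the dict[str, list[BaseTaskDefinition]] returned by get_yaml_stepdefs) and checks only the required name field; select_task_definitions and StepNotFoundError are hypothetical names, not part of PIS.

```python
from typing import Any


class StepNotFoundError(KeyError):
    """Hypothetical error raised when the requested step has no definitions."""


def select_task_definitions(
    stepdefs: dict[str, list[dict[str, Any]]], step: str
) -> list[dict[str, Any]]:
    """Return the task definitions for ``step`` after a basic sanity check."""
    if step not in stepdefs:
        raise StepNotFoundError(f"step {step!r} not found; available: {sorted(stepdefs)}")
    definitions = stepdefs[step]
    for index, definition in enumerate(definitions):
        # Every task definition needs at least a name (see BaseTaskDefinition).
        if "name" not in definition:
            raise ValueError(f"task #{index} in step {step!r} has no 'name'")
    return definitions
```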

config.models module

This module contains the models for the configuration settings.

pydantic model pis.config.models.BaseTaskDefinition

Bases: BaseModel

Base Task definition model.

This model is the base on which pis.config.models.TaskDefinition and pis.config.models.PretaskDefinition are built. Specific tasks or pretasks subclass those to add custom attributes.

JSON schema:
{
   "title": "BaseTaskDefinition",
   "description": "Base Task definition model.\n\nThis model is the base on which :class:`pis.config.models.TaskDefinition`\nand :class:`pis.config.models.PretaskDefinition` are built. Specific\ntasks or pretasks subclass those to add custom attributes.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "name"
   ]
}

Config:
  • extra: str = allow

Fields:
field name: str [Required]

pydantic model pis.config.models.CliSettings

Bases: BaseModel

CLI settings model.

JSON schema:
{
   "title": "CliSettings",
   "description": "CLI settings model.",
   "type": "object",
   "properties": {
      "step": {
         "default": "",
         "title": "Step",
         "type": "string"
      },
      "config_file": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Config File"
      },
      "work_dir": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Work Dir"
      },
      "remote_uri": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Remote Uri"
      },
      "pool": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pool"
      },
      "log_level": {
         "anyOf": [
            {
               "enum": [
                  "TRACE",
                  "DEBUG",
                  "INFO",
                  "SUCCESS",
                  "WARNING",
                  "ERROR",
                  "CRITICAL"
               ],
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Log Level"
      }
   }
}

Fields:
field config_file: Path | None = None
field log_level: Literal['TRACE', 'DEBUG', 'INFO', 'SUCCESS', 'WARNING', 'ERROR', 'CRITICAL'] | None = None
field pool: int | None = None
field remote_uri: Annotated[str, AfterValidator(func=remote_uri_is_valid)] | None = None
field step: str = ''
field work_dir: Path | None = None

pydantic model pis.config.models.EnvSettings

Bases: BaseModel

Environment settings model.

JSON schema:
{
   "title": "EnvSettings",
   "description": "Environment settings model.",
   "type": "object",
   "properties": {
      "step": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Step"
      },
      "config_file": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Config File"
      },
      "work_dir": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Work Dir"
      },
      "remote_uri": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Remote Uri"
      },
      "pool": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pool"
      },
      "log_level": {
         "anyOf": [
            {
               "enum": [
                  "TRACE",
                  "DEBUG",
                  "INFO",
                  "SUCCESS",
                  "WARNING",
                  "ERROR",
                  "CRITICAL"
               ],
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Log Level"
      }
   }
}

Fields:
field config_file: Path | None = None
field log_level: Literal['TRACE', 'DEBUG', 'INFO', 'SUCCESS', 'WARNING', 'ERROR', 'CRITICAL'] | None = None
field pool: int | None = None
field remote_uri: Annotated[str, AfterValidator(func=remote_uri_is_valid)] | None = None
field step: str | None = None
field work_dir: Path | None = None

pis.config.models.LOG_LEVELS

The log levels.

alias of Literal[‘TRACE’, ‘DEBUG’, ‘INFO’, ‘SUCCESS’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’]
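Because LOG_LEVELS is a typing.Literal alias, the allowed values can be recovered at runtime with typing.get_args, for example to validate a value or to build a list of choices. The alias is redefined below so the sketch is self-contained; check_log_level is a hypothetical helper, and it compares case-sensitively, matching the exact strings in the Literal.

```python
from typing import Literal, get_args

# Mirrors the documented alias so this sketch runs on its own.
LOG_LEVELS = Literal["TRACE", "DEBUG", "INFO", "SUCCESS", "WARNING", "ERROR", "CRITICAL"]


def check_log_level(level: str) -> str:
    """Return ``level`` unchanged if it is one of the LOG_LEVELS values."""
    allowed = get_args(LOG_LEVELS)  # tuple of the literal strings, in order
    if level not in allowed:
        raise ValueError(f"invalid log level {level!r}; expected one of {allowed}")
    return level
```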

pydantic model pis.config.models.PretaskDefinition

Bases: BaseTaskDefinition, BaseModel

Pretask definition model.

JSON schema:
{
   "title": "PretaskDefinition",
   "description": "Pretask definition model.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "name"
   ]
}

Config:
  • extra: str = allow

Fields:
No additional fields; name is inherited from BaseTaskDefinition.

pydantic model pis.config.models.Settings

Bases: BaseModel

Settings model.

This model is used to define the settings for the application.

It is constructed by merging the settings from the CLI, the environment and the YAML configuration file, in order of precedence: CLI settings take precedence over environment settings, which take precedence over YAML settings. All fields have defaults, so any field left unset after the merge still has a value.

The step field is required, but it defaults to an empty string so that objects can be created without setting it. Its validation is handled by the Config class.

JSON schema:
{
   "title": "Settings",
   "description": "Settings model.\n\nThis model is used to define the settings for the application.\n\nIt is constructed by merging the settings from the environment, CLI, and YAML\nconfiguration file. The fields are defined in order of precedence, with the\nenvironment settings taking precedence over the CLI settings, which take\nprecedence over the YAML settings. All fields have defaults so that any field\nleft unset after the merge will have a value.\n\nThe :attr:`step` field is required, but has an empty string as a default value,\nso objects can be created without setting it. Its validation is handled by\nConfig class.",
   "type": "object",
   "properties": {
      "step": {
         "default": "",
         "title": "Step",
         "type": "string"
      },
      "config_file": {
         "default": "config.yaml",
         "format": "path",
         "title": "Config File",
         "type": "string"
      },
      "work_dir": {
         "default": "output",
         "format": "path",
         "title": "Work Dir",
         "type": "string"
      },
      "remote_uri": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Remote Uri"
      },
      "pool": {
         "default": 5,
         "title": "Pool",
         "type": "integer"
      },
      "log_level": {
         "default": "INFO",
         "enum": [
            "TRACE",
            "DEBUG",
            "INFO",
            "SUCCESS",
            "WARNING",
            "ERROR",
            "CRITICAL"
         ],
         "title": "Log Level",
         "type": "string"
      }
   }
}

Fields:
field config_file: Path = PosixPath('config.yaml')
field log_level: Literal['TRACE', 'DEBUG', 'INFO', 'SUCCESS', 'WARNING', 'ERROR', 'CRITICAL'] = 'INFO'

See LOG_LEVELS.

field pool: int = 5

The number of workers in the pool where tasks will run.

field remote_uri: Annotated[str, AfterValidator(func=remote_uri_is_valid)] | None = None

The remote working URI. If present, resources, logs and the manifest will be uploaded there.

field step: str = ''

The step to run. This is a required field, and its validation is handled by pis.config.config.Config._validate_step().

field work_dir: Path = PosixPath('output')

The local working directory path. Resources are downloaded here, and the manifest and logs are written here before being uploaded to the GCS bucket.

merge_model(incoming: BaseModel)

Merge the fields of another model into this model.

Parameters:

incoming (BaseModel) – The incoming model.
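The merge rule is not spelled out above. A minimal sketch of one plausible behaviour, using standard-library dataclasses instead of pydantic so it runs without dependencies: every field of this object that the incoming object also provides is overwritten, except when the incoming value is None, which is treated as "not set" (an assumption based on the None defaults of the source-specific models). Both classes are simplified stand-ins, not the real models.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass
class Settings:
    """Simplified stand-in for pis.config.models.Settings (dataclass, not pydantic)."""

    step: str = ""
    work_dir: Path = Path("output")
    pool: int = 5
    log_level: str = "INFO"

    def merge_model(self, incoming: object) -> None:
        """Copy each field that ``incoming`` provides with a non-None value."""
        for spec in fields(self):
            # Fields missing from the incoming object are read as None and skipped.
            value = getattr(incoming, spec.name, None)
            # Assumption: None means "this source did not set the field".
            if value is not None:
                setattr(self, spec.name, value)


@dataclass
class YamlSettings:
    """Simplified stand-in for pis.config.models.YamlSettings."""

    work_dir: Optional[Path] = None
    pool: Optional[int] = None
    log_level: Optional[str] = None


settings = Settings()
settings.merge_model(YamlSettings(pool=2))             # pool becomes 2
settings.merge_model(YamlSettings(log_level="DEBUG"))  # pool stays 2
```

Since each call overwrites earlier values, calling merge_model from the lowest- to the highest-precedence source leaves the highest-precedence values in place.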

pydantic model pis.config.models.TaskDefinition

Bases: BaseTaskDefinition, BaseModel

Task definition model.

This model is used to define the tasks to be run by the application. It includes the destination as a required field, as the tasks are expected to create a resource.

Note

PIS is not intended for chaining tasks: there is no way to generate a dependency graph, so all tasks are sent to the worker pool at the same time.

JSON schema:
{
   "title": "TaskDefinition",
   "description": "Task definition model.\n\nThis model is used to define the tasks to be run by the application. It includes\nthe destination as a required field, as the tasks are expected to create a resource.\n\n.. note:: PIS is not intended to be used by chaining tasks, as there is no way to\n    generate a dependency graph, all tasks will be sent to the worker pool at the same\n    time.",
   "type": "object",
   "properties": {
      "name": {
         "title": "Name",
         "type": "string"
      },
      "destination": {
         "format": "path",
         "title": "Destination",
         "type": "string"
      }
   },
   "additionalProperties": true,
   "required": [
      "name",
      "destination"
   ]
}

Config:
  • extra: str = allow

Fields:
field destination: Path [Required]

pydantic model pis.config.models.YamlSettings

Bases: BaseModel

YAML settings model.

JSON schema:
{
   "title": "YamlSettings",
   "description": "YAML settings model.",
   "type": "object",
   "properties": {
      "work_dir": {
         "anyOf": [
            {
               "format": "path",
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Work Dir"
      },
      "remote_uri": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Remote Uri"
      },
      "pool": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Pool"
      },
      "log_level": {
         "anyOf": [
            {
               "enum": [
                  "TRACE",
                  "DEBUG",
                  "INFO",
                  "SUCCESS",
                  "WARNING",
                  "ERROR",
                  "CRITICAL"
               ],
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Log Level"
      }
   }
}

Fields:
field log_level: Literal['TRACE', 'DEBUG', 'INFO', 'SUCCESS', 'WARNING', 'ERROR', 'CRITICAL'] | None = None
field pool: int | None = None
field remote_uri: Annotated[str, AfterValidator(func=remote_uri_is_valid)] | None = None
field work_dir: Path | None = None

pis.config.models.remote_uri_is_valid(uri: str) → str

Validate a remote URI.

Parameters:

uri (str) – The URI to validate.

Returns:

The URI if it is valid.

Return type:

str

Raises:

AssertionError – If the URI does not contain a protocol.
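A minimal sketch of such a validator, assuming "protocol" means a non-empty scheme followed by "://" (as in gs://bucket/path); the real check may differ. In pydantic, an AssertionError raised inside an AfterValidator is reported as a validation error. The sketch raises AssertionError explicitly rather than using an assert statement, because assert statements are stripped when Python runs with -O.

```python
def remote_uri_is_valid(uri: str) -> str:
    """Return ``uri`` unchanged if it starts with a protocol such as ``gs://``."""
    scheme, separator, _rest = uri.partition("://")
    # Assumption: a valid URI has a non-empty scheme followed by "://".
    if not separator or not scheme:
        raise AssertionError(f"remote URI {uri!r} must include a protocol, e.g. gs://")
    return uri
```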

config.cli module

This module contains the functions to parse the command line arguments.

pis.config.cli.parse_cli() → CliSettings

Parses the command line arguments and returns a CliSettings object.

Returns:

The parsed command line arguments.

Return type:

CliSettings

pis.config.cli.to_env(var: str) → str

Converts a variable name to an environment variable name.

Parameters:

var (str) – The variable name to convert.

Returns:

The environment variable name.

Return type:

str
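The conversion rule is not documented above. A common convention, assumed here and not confirmed by the source, is to replace hyphens with underscores, upper-case the name and add an application prefix; PIS_ is a hypothetical prefix chosen for illustration.

```python
# Hypothetical prefix: the real prefix (if any) used by pis.config.cli is not documented here.
ENV_PREFIX = "PIS_"


def to_env(var: str) -> str:
    """Convert a setting name such as ``work_dir`` into ``PIS_WORK_DIR``."""
    # Normalise CLI-style hyphens to underscores, then upper-case the result.
    return ENV_PREFIX + var.replace("-", "_").upper()
```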

config.env module

This module contains the functions to parse the environment variables.

pis.config.env.parse_env() → EnvSettings

Parses the environment variables and returns an EnvSettings object.

Returns:

The parsed environment variables.

Return type:

EnvSettings

pis.config.env.to_setting(name: str) → str

Converts an environment variable name to a setting name.

Parameters:

name (str) – The environment variable name to convert.

Returns:

The setting name.

Return type:

str
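A minimal sketch of the reverse direction and of how environment parsing could use it, assuming a hypothetical PIS_ prefix and lower-case setting names. collect_env_settings is a simplified, hypothetical stand-in for parse_env: it only gathers raw strings, and converting them (for example pool to an integer) would be left to the EnvSettings model.

```python
import os
from collections.abc import Mapping
from typing import Optional

ENV_PREFIX = "PIS_"  # hypothetical prefix, not confirmed by the source


def to_setting(name: str) -> str:
    """Convert an environment variable name such as ``PIS_WORK_DIR`` into ``work_dir``."""
    return name.removeprefix(ENV_PREFIX).lower()


def collect_env_settings(environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Gather prefixed environment variables into a dict keyed by setting name."""
    source = os.environ if environ is None else environ
    return {
        to_setting(name): value
        for name, value in source.items()
        if name.startswith(ENV_PREFIX)
    }
```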

config.yaml module

This module contains the functions to parse YAML files.

pis.config.yaml.get_yaml_sentinel_dict(yaml_dict: dict[str, Any]) → dict[str, Any]

Get the YAML sentinel dictionary.

This function returns the sentinel dictionary for the scratchpad from the YAML settings. If the sentinel dictionary is not present, an empty dictionary is returned.

Parameters:

yaml_dict (dict[str, Any]) – The YAML settings.

Returns:

The sentinel dictionary.

Return type:

dict[str, Any]
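A minimal sketch of the "return the sentinel dictionary, or an empty one" behaviour described above. The top-level key name is not documented, so SCRATCHPAD_KEY = "scratchpad" is a hypothetical placeholder; the sketch also treats an explicit empty value (None in parsed YAML) as absent.

```python
from typing import Any

SCRATCHPAD_KEY = "scratchpad"  # hypothetical top-level key in the YAML file


def get_yaml_sentinel_dict(yaml_dict: dict[str, Any]) -> dict[str, Any]:
    """Return the scratchpad entries from parsed YAML, or {} if there are none."""
    sentinel = yaml_dict.get(SCRATCHPAD_KEY)
    # A missing key or an empty "scratchpad:" entry (parsed as None) both yield {}.
    return dict(sentinel) if sentinel else {}
```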

pis.config.yaml.get_yaml_settings(yaml_dict: dict[str, Any]) → YamlSettings

Validate the YAML settings.

This function validates the YAML settings against the YamlSettings model.

Warning

If the settings are invalid, the program will log an error and exit.

Parameters:

yaml_dict (dict[str, Any]) – The YAML settings.

Returns:

The validated YAML settings.

Return type:

YamlSettings

pis.config.yaml.get_yaml_stepdefs(yaml_dict: dict[str, Any]) → dict[str, list[BaseTaskDefinition]]

Validate the YAML step definitions.

This function validates the YAML step definitions against the BaseTaskDefinition model.

Warning

If the step definitions are invalid, the program will log an error and exit.

Parameters:

yaml_dict (dict[str, Any]) – The YAML settings.

Returns:

The validated YAML step definitions.

Return type:

dict[str, list[BaseTaskDefinition]]

pis.config.yaml.load_yaml_file(config_file: Path) → str

Load a YAML file.

Parameters:

config_file (Path) – The path to the YAML file.

Returns:

The contents of the YAML file.

Return type:

str

pis.config.yaml.parse_yaml(config_file: Path) → dict[str, Any]

Parse a YAML file.

This function loads a YAML file, parses its content, and returns it as a dictionary.

Warning

If the file cannot be read or the content cannot be parsed, the program will log an error and exit.

Parameters:

config_file (Path) – The path to the YAML file.

Returns:

The parsed YAML content.

Return type:

dict

pis.config.yaml.parse_yaml_string(yaml_string: str) → dict

Parse a YAML string.

Parameters:

yaml_string (str) – The YAML string to parse.

Returns:

The parsed YAML content.

Return type:

dict

config.scratchpad module

Scratchpad module.

This module defines the Scratchpad class, a centralized place to store key-value pairs in the application configuration. It provides utilities to perform template substitution.

class pis.config.scratchpad.Scratchpad(sentinel_dict: dict[str, Any] | None = None)

A class to store and replace placeholders in strings.

This class is used to store key-value pairs and replace placeholders in strings with the corresponding values. The placeholders are defined in the strings using the dollar sign followed by the placeholder name enclosed in curly braces, e.g., ${person.name}. The placeholders can have dots in their names to represent nested dictionaries or objects.

Example:
>>> scratchpad = Scratchpad()
>>> scratchpad.store('person.name', 'Alice')
>>> scratchpad.replace('Hello, ${person.name}!')
'Hello, Alice!'

Variables:

sentinel_dict (dict[str, Any]) – A dictionary to store the key-value pairs.

replace(sentinel: str | Path) → str

Replace placeholders in a string with the corresponding values.

Parameters:

sentinel (str | Path) – The string with placeholders to replace.

Returns:

The string with the placeholders replaced by their values.

Return type:

str

Raises:

ScratchpadError – If a placeholder in the string does not have a corresponding value in the scratchpad.

store(key: str, value: str | list[str])

Store a key-value pair in the scratchpad.

Both strings and lists of strings are accepted as values. Accepting dictionaries as values might be a useful extension.

Parameters:
  • key (str) – The key to store.

  • value (str | list[str]) – The value to store.
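A minimal sketch of how store() and replace() could fit together, built on a string.Template subclass that allows dots in placeholder names. It is not the PIS implementation: ScratchpadError is a local stand-in, dotted keys are stored flat (as in the example above, store('person.name', ...)) rather than as nested dictionaries, and list values are simply inserted via str(), which is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from string import Template
from typing import Any


class ScratchpadError(Exception):
    """Local stand-in for the documented ScratchpadError."""


class TemplateWithDots(Template):
    # Same as string.Template's default idpattern, plus dots after the first character.
    idpattern = r"(?a:[_a-z][._a-z0-9]*)"


class Scratchpad:
    def __init__(self, sentinel_dict: dict[str, Any] | None = None) -> None:
        # Copy so that later store() calls do not mutate the caller's dict.
        self.sentinel_dict: dict[str, Any] = dict(sentinel_dict or {})

    def store(self, key: str, value: str | list[str]) -> None:
        self.sentinel_dict[key] = value

    def replace(self, sentinel: str | Path) -> str:
        template = TemplateWithDots(str(sentinel))
        try:
            # substitute() raises KeyError for any placeholder without a value.
            return template.substitute(self.sentinel_dict)
        except KeyError as exc:
            raise ScratchpadError(f"no value stored for placeholder {exc.args[0]!r}") from exc
```

Usage mirrors the documented example: after pad.store('person.name', 'Alice'), pad.replace('Hello, ${person.name}!') returns 'Hello, Alice!', and Path arguments are converted to strings first.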

class pis.config.scratchpad.TemplateWithDots(template)

Bases: Template

A subclass of string.Template that allows dots in placeholders.

idpattern = '(?a:[_a-z][._a-z0-9]*)'
pattern = re.compile('\n            \\$(?:\n              (?P<escaped>\\$)  |   # Escape sequence of two delimiters\n              (?P<named>(?a:[_a-z][._a-z0-9]*))       |   # delimiter and a Python identifier\n         , re.IGNORECASE|re.VERBOSE)
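The effect of the widened idpattern is easiest to see side by side with the standard string.Template. The class below redefines the documented idpattern so the sketch is self-contained. Note the caveat in the last case: because dots are now legal in names, an unbraced placeholder followed by a period absorbs that period, so braces are the safer form.

```python
from string import Template


class TemplateWithDots(Template):
    """string.Template whose placeholder names may contain dots."""

    idpattern = r"(?a:[_a-z][._a-z0-9]*)"


values = {"person.name": "Alice", "person": "Bob"}

# Default Template: the identifier stops at the dot, so only "$person" is replaced.
print(Template("$person.name").substitute(values))  # Bob.name
# With dots allowed, the whole dotted name is looked up.
print(TemplateWithDots("$person.name").substitute(values))  # Alice
# Braces delimit the name, so a following period stays literal.
print(TemplateWithDots("${person.name}.").substitute(values))  # Alice.
```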

Module contents

Configuration package.

pis.config.init_config()

Initialize the global configuration object.

pis.config.scratchpad() → Scratchpad

Return the scratchpad.

If the scratchpad has not been initialized, it will be initialized. The scratchpad is stored for subsequent calls.

Returns:

The scratchpad.

Return type:

pis.config.scratchpad.Scratchpad
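The "initialize on first call, then reuse" behaviour described above is a lazy module-level singleton. A minimal sketch, with a simplified Scratchpad stand-in and a hypothetical load_sentinel_dict() loader (in PIS the sentinel dictionary presumably comes from Config.get_scratchpad_sentinel_dict()). The same pattern would fit steps() and task_definitions().

```python
from typing import Any, Optional


class Scratchpad:
    """Simplified stand-in for pis.config.scratchpad.Scratchpad."""

    def __init__(self, sentinel_dict: Optional[dict[str, Any]] = None) -> None:
        self.sentinel_dict = dict(sentinel_dict or {})


def load_sentinel_dict() -> dict[str, Any]:
    """Hypothetical loader; this sketch just returns an empty dict."""
    return {}


_scratchpad: Optional[Scratchpad] = None


def scratchpad() -> Scratchpad:
    """Return the shared Scratchpad, creating it on first use."""
    global _scratchpad
    if _scratchpad is None:
        _scratchpad = Scratchpad(load_sentinel_dict())
    return _scratchpad
```

Every later call returns the same object, so values stored in the scratchpad by one part of the application are visible everywhere else.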

pis.config.settings() → Settings

Return the application settings.

See pis.config.models.Settings.

Returns:

The settings object.

Return type:

pis.config.models.Settings

pis.config.steps() → list[str]

Return the steps.

If the steps have not been loaded, they will be loaded from the configuration file. The steps are stored for subsequent calls.

Returns:

The steps.

Return type:

list[str]

pis.config.task_definitions() → list[BaseTaskDefinition]

Return the task definitions.

If the task definitions have not been loaded, they will be loaded from the configuration file. The task definitions are stored for subsequent calls.

Returns:

The task definitions.

Return type:

list[BaseTaskDefinition]