
Documentation for ToMeDa

tomeda.params

This module is the central interface for configuration management in ToMeDa. It supports command line argument processing as well as the handling of environment variables and dotenv files.

Its main component is the TomedaParameter dataclass, which stores values from the various sources according to a fixed priority order. These values are validated through properties within the dataclass.

The get_parameters function creates an instance of TomedaParameter, populated with the appropriate values.

By merging command line arguments, environment variables, and configuration files into a single, prioritized collection of parameters, the module keeps configuration handling coherent and in one place.
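
As a minimal usage sketch (relying only on names documented on this page, namely get_parameters and the dry_run property), the module's interface boils down to a single call:

from tomeda.params import get_parameters

# Merge OS environment variables, the ENV file, and command line
# arguments (in increasing priority) and validate the result.
params = get_parameters()

if params.dry_run:
    print("Dry run requested; no changes will be made.")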

logger module-attribute

logger: TraceLogger = getLogger(__name__)

TomedaParameter dataclass

TomedaParameter()

A data class to store and manage parameters for the Tomeda application.

This class uses properties to enforce type checking and validation for each parameter. It also includes a method to validate interdependencies between different parameters.
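
The property pattern could look roughly like the following sketch. The names mirror the documented gatherer_file attribute, but this is an illustration only, not the actual implementation in tomeda/params.py.

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ExampleParameter:
    """Illustrative stand-in for a single TomedaParameter attribute."""

    _gatherer_file: Path | None = field(default=None)

    @property
    def gatherer_file(self) -> Path | None:
        """Path to the gatherer file."""
        return self._gatherer_file

    @gatherer_file.setter
    def gatherer_file(self, value: Path | str | None) -> None:
        # Coerce strings to Path and reject anything else.
        if value is None or isinstance(value, Path):
            self._gatherer_file = value
        elif isinstance(value, str):
            self._gatherer_file = Path(value)
        else:
            raise TypeError(f"gatherer_file must be a Path, got {type(value)!r}")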

Attributes:

Name Type Description
_gatherer_file Path | None

Path to the gatherer file.

_schema_module str | None

Python module to import the schema from.

_schema_class str | None

Python class name to import the schema from.

_read bool

Flag indicating whether to read a filled-in gatherer file.

_collector_root Path | None

Root directory (or file) of the Tomeda collector process.

_output Path | None

Path to output the metadata as JSON.

_force_overwrite bool

Flag to overwrite the existing configuration file if it exists.

_dry_run bool

Flag for a dry run, executing the script without making actual changes.

_create_gatherer bool

Flag to create an empty gatherer file.

_create_schema_documentation bool

Flag to create the schema documentation.

_create_dataset_table bool

Flag to create the dataset table.

_check_recommendations bool

Flag to check if the recommendations are met.

_extract_repository_metadata bool

Flag to extract metadata from the data repository.

_derive_repository_keys bool

Flag to derive keys from the data repository.

_derive_repository_tsv_from_mapping bool

Flag to derive TSV files from the mapping table.

_create_repository_compatible_metadata bool

Flag to create repository compatible metadata.

_tsv_dir list[Path]

Directories containing TSV files.

_matched_entries Path | None

Path to the matched entries file.

_schema_info_table Path | None

Path to the schema info table.

_new_keys list[Path]

List of paths to the new keys file.

_dataset_metadata Path | None

Path to the dataset metadata file.

_mapping_table Path | None

Mapping table, serving as an input.

_upload bool

Flag to upload the metadata JSON to the data repository server.

_upload_file Path | None

Path to the upload file.

_server_url str | None

URL of the data repository server.

_target_collection str | None

Name of the target collection in the data repository.

_query_fields list[str]

Fields used to check if the dataset already exists.

_api_token str | None

API token for authentication at the data repository server.

Methods:

Name Description
validate_interdependent_attributes

Validates the interdependencies between different parameters.

Example

params = TomedaParameter()
params.dry_run = '1'
params.validate_interdependent_attributes()

api_token property writable

api_token: str | None

API token for authentication at the data repository server.

check_recommendations property writable

check_recommendations: bool

Flag to check if the recommendations are met.

collector_root property writable

collector_root: Path | None

Root directory (or file) of the Tomeda collector process.

create_dataset_table property writable

create_dataset_table: bool

Flag to create the dataset table.

create_gatherer property writable

create_gatherer: bool

Flag to create an empty gatherer file.

create_repository_compatible_metadata property writable

create_repository_compatible_metadata: bool

Flag to create repository compatible metadata.

create_schema_documentation property writable

create_schema_documentation: bool

Flag to create the schema documentation.

dataset_metadata property writable

dataset_metadata: Path | None

Path to the dataset metadata file.

derive_repository_keys property writable

derive_repository_keys: bool

Flag to derive keys from the data repository.

derive_repository_tsv_from_mapping property writable

derive_repository_tsv_from_mapping: bool

Flag to derive TSV files from the mapping table.

dry_run property writable

dry_run: bool

Flag for dry run.

extract_repository_metadata property writable

extract_repository_metadata: bool

Flag to extract metadata from the data repository.

force_overwrite property writable

force_overwrite: bool

Flag to overwrite the existing configuration file if it exists.

gatherer_file property writable

gatherer_file: Path | None

Path to the gatherer file.

mapping_table property writable

mapping_table: Path | None

Mapping table, serving as an input.

matched_entries property writable

matched_entries: Path | None

Path to the matched entries file.

new_keys property writable

new_keys: list[Path]

List of paths to the new keys file.

output property writable

output: Path | None

Path to output the metadata as JSON.

query_fields property writable

query_fields: list[str]

Fields used to check if the dataset already exists.

read property writable

read: bool

Flag indicating whether to read a filled-in gatherer file.

schema_class property writable

schema_class: str | None

Python class name to import the schema from.

schema_info_table property writable

schema_info_table: Path | None

Path to the schema info table.

schema_module property writable

schema_module: str | None

Python module to import the schema from.

server_url property writable

server_url: str | None

URL of the data repository server.

target_collection property writable

target_collection: str | None

Name of the target collection in the data repository.

tsv_dir property writable

tsv_dir: list[Path]

Directories containing TSV files.

upload property writable

upload: bool

Flag to upload the metadata JSON to the data repository server.

upload_file property writable

upload_file: Path | None

Path to the upload file.

validate_interdependent_attributes

validate_interdependent_attributes() -> None

Validates the interdependencies between different parameters.

Raises:

Type Description
TomedaValueError

If the interdependencies are not met.
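
For example (assuming the boolean flags default to False and the optional values to None when TomedaParameter() is constructed without arguments), requesting an upload without the required server settings is rejected. The import path of TomedaValueError is not shown on this page, so the sketch catches it broadly:

from tomeda.params import TomedaParameter

params = TomedaParameter()
params.upload = True  # upload requested, but server_url, target_collection,
                      # and api_token are still unset

try:
    params.validate_interdependent_attributes()
except Exception as exc:  # raises TomedaValueError
    print(f"Configuration rejected: {exc}")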

Source code in tomeda/params.py
def validate_interdependent_attributes(self) -> None:
    """
    Validates the interdependencies between different parameters.

    Raises
    ------
    TomedaValueError
        If the interdependencies are not met.
    """
    # DOFF_00
    if self.create_schema_documentation and (
        self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating schema documentation, schema_module, "
            "schema_class, and output must be set.",
        )

    # DOFF_01
    if self.create_gatherer and (
        self.output is None
        or self.schema_module is None
        or self.schema_class is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating a gatherer file, gatherer_file, "
            "schema_module, and schema_class must be set.",
        )

    # DOFF_02
    if self.create_dataset_table and (
        self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating a dataset table, schema_module, "
            "schema_class, and output must be set.",
        )

    # DOFF_03
    if self.extract_repository_metadata and (
        self.tsv_dir is None or self.output is None
    ):
        raise TomedaValueError(
            f"tsv_dir: {self.tsv_dir}, output: {self.output}",
            "For extracting repository metadata, tsv_dir and output "
            "must be set.",
        )

    # DOFF_04
    if self.derive_repository_keys and (
        self.matched_entries is None or self.schema_info_table is None
    ):
        raise TomedaValueError(
            f"matched_entries: {self.matched_entries}, "
            f"schema_info_table: {self.schema_info_table}",
            "For deriving repository keys, matched_entries and "
            "schema_info_table must be set.",
        )

    # DOFF_05
    if self.derive_repository_tsv_from_mapping and (
        self.new_keys is None or self.schema_info_table is None
    ):
        raise TomedaValueError(
            f"new_keys: {self.new_keys}, "
            f"schema_info_table: {self.schema_info_table}",
            "For deriving repository TSV from mapping, new_keys and "
            "schema_info_table must be set.",
        )

    # USER_01
    if self.read and (  # pylint: disable=too-many-boolean-expressions
        self.gatherer_file is None
        or self.collector_root is None
        or self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"gatherer_file: {self.gatherer_file}, "
            f"collector_root: {self.collector_root}, "
            f"schema_module: {self.schema_module}, "
            f"schema_class: {self.schema_class}, "
            f"output: {self.output}",
            "For reading a gatherer file, gatherer_file, collector_root, "
            "schema_module, schema_class, and output must be set.",
        )

    # USER_02
    if self.create_repository_compatible_metadata and (
        self.mapping_table is None
        or self.tsv_dir is None
        or self.dataset_metadata is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"mapping_table: {self.mapping_table}, "
            f"tsv_dir: {self.tsv_dir}, "
            f"dataset_metadata: {self.dataset_metadata}, "
            f"output: {self.output}",
            "For creating repository compatible metadata, "
            "mapping_table, tsv_dir, dataset_metadata, and output "
            "must be set.",
        )

    # USER_03
    if self.upload and (
        self.server_url is None
        or self.target_collection is None
        or self.api_token is None
    ):
        raise TomedaValueError(
            f"server_url: {self.server_url}, "
            f"target_collection: {self.target_collection}, "
            f"api_token: {self.api_token}",
            "For uploading, server_url, target_collection, "
            "and api_token must be set. (Optional: query_fields)",
        )

get_parameters

get_parameters() -> TomedaParameter

Returns the parameters for the Tomeda application.

The parameters are read from the command line arguments, the ENV file, and the OS environment variables. The values are prioritized in the following order:

1. Command line arguments
2. ENV file
3. OS environment variables
4. Default values, if available

Returns:

Type Description
TomedaParameter

The parameters for the Tomeda application.
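
The following dictionary-merge sketch only illustrates the documented priority order; it is not the actual merge code (which uses _update_parameters, see the source below). Later sources override earlier ones:

defaults = {"dry_run": False}            # 4. default values
os_env   = {"dry_run": True}             # 3. OS environment variables
env_file = {}                            # 2. ENV file (no value set here)
cli_args = {"dry_run": False}            # 1. command line arguments

merged = {**defaults, **os_env, **env_file, **cli_args}
assert merged["dry_run"] is False        # the CLI value wins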

Source code in tomeda/params.py
def get_parameters() -> TomedaParameter:
    """
    Returns the parameters for the Tomeda application.

    The parameters are read from the environment variables and the command
    line arguments. The values are prioritized in the following order:
    1. Command line arguments
    2. ENV file
    3. OS environment variables
    4. Default values, if available

    Returns
    -------
    TomedaParameter
        The parameters for the Tomeda application.
    """
    logger.info("Loading parameters...")
    params = TomedaParameter()
    os_env = _load_os_env_variables()
    env_file = _load_env_file()
    cli_args = _get_cli_args().__dict__

    params = _update_parameters(params, os_env)
    params = _update_parameters(params, env_file)
    params = _update_parameters(params, cli_args)
    logger.debug("Parameters: %s", params)
    params.validate_interdependent_attributes()
    return params