
Documentation for ToMeDa

tomeda.params

This module is the central interface for configuration management in ToMeDa. It supports command line argument processing as well as the handling of environment variables and dotenv files.

Its main component is the TomedaParameter dataclass, which stores values from the various sources according to a fixed priority order. These values are validated through properties within the dataclass.

The get_parameters function creates an instance of TomedaParameter, populated with the appropriate values.

By merging command line arguments, environment variables, and configuration files into a single, prioritized collection of parameters, the module keeps configuration handling coherent and in one place.
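
As a minimal usage sketch (relying only on names documented on this page, namely get_parameters and the dry_run property), the module's interface boils down to a single call:

from tomeda.params import get_parameters

# Merge OS environment variables, the ENV file, and command line
# arguments (in increasing priority) and validate the result.
params = get_parameters()

if params.dry_run:
    print("Dry run requested; no changes will be made.")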

logger module-attribute

logger: TraceLogger = getLogger(__name__)

TomedaParameter dataclass

TomedaParameter()

A data class to store and manage parameters for the Tomeda application.

This class uses properties to enforce type checking and validation for each parameter. It also includes a method to validate interdependencies between different parameters.
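
The property pattern could look roughly like the following sketch. The names mirror the documented gatherer_file attribute, but this is an illustration only, not the actual implementation in tomeda/params.py.

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ExampleParameter:
    """Illustrative stand-in for a single TomedaParameter attribute."""

    _gatherer_file: Path | None = field(default=None)

    @property
    def gatherer_file(self) -> Path | None:
        """Path to the gatherer file."""
        return self._gatherer_file

    @gatherer_file.setter
    def gatherer_file(self, value: Path | str | None) -> None:
        # Coerce strings to Path and reject anything else.
        if value is None or isinstance(value, Path):
            self._gatherer_file = value
        elif isinstance(value, str):
            self._gatherer_file = Path(value)
        else:
            raise TypeError(f"gatherer_file must be a Path, got {type(value)!r}")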

Attributes:

Name Type Description
_gatherer_file Path | None

Path to the gatherer file.

_schema_module str | None

Python module to import the schema from.

_schema_class str | None

Python class name to import the schema from.

_read bool

Flag indicating whether to read a filled-in gatherer file.

_collector_root Path | None

Root directory (or file) of the Tomeda collector process.

_output Path | None

Path to output the metadata as JSON.

_force_overwrite bool

Flag to overwrite the existing configuration file if it exists.

_dry_run bool

Flag for a dry run, executing the script without making actual changes.

_create_gatherer bool

Flag to create an empty gatherer file.

_create_schema_documentation bool

Flag to create the schema documentation.

_create_dataset_table bool

Flag to create the dataset table.

_check_recommendations bool

Flag to check if the recommendations are met.

_extract_repository_metadata bool

Flag to extract metadata from the data repository.

_derive_repository_keys bool

Flag to derive keys from the data repository.

_derive_repository_tsv_from_mapping bool

Flag to derive TSV files from the mapping table.

_create_repository_compatible_metadata bool

Flag to create repository compatible metadata.

_tsv_dir list[Path]

Directories containing TSV files.

_matched_entries Path | None

Path to the matched entries file.

_schema_info_table Path | None

Path to the schema info table.

_new_keys list[Path]

List of paths to the new keys file.

_dataset_metadata Path | None

Path to the dataset metadata file.

_mapping_table Path | None

Mapping table, serving as an input.

_upload bool

Flag to upload the metadata JSON to the data repository server.

_upload_file Path | None

Path to the upload file.

_server_url str | None

URL of the data repository server.

_target_collection str | None

Name of the target collection in the data repository.

_query_fields list[str]

Fields used to check if the dataset already exists.

_api_token str | None

API token for authentication at the data repository server.

Methods:

Name Description
validate_interdependent_attributes

Validates the interdependencies between different parameters.

Example

params = TomedaParameter()
params.dry_run = '1'
params.validate_interdependent_attributes()

api_token property writable

api_token: str | None

API token for authentication at the data repository server.

check_recommendations property writable

check_recommendations: bool

Flag to check if the recommendations are met.

collector_root property writable

collector_root: Path | None

Root directory (or file) of the Tomeda collector process.

create_dataset_table property writable

create_dataset_table: bool

Flag to create the dataset table.

create_gatherer property writable

create_gatherer: bool

Flag to create an empty gatherer file.

create_repository_compatible_metadata property writable

create_repository_compatible_metadata: bool

Flag to create repository compatible metadata.

create_schema_documentation property writable

create_schema_documentation: bool

Flag to create the schema documentation.

dataset_metadata property writable

dataset_metadata: Path | None

Path to the dataset metadata file.

derive_repository_keys property writable

derive_repository_keys: bool

Flag to derive keys from the data repository.

derive_repository_tsv_from_mapping property writable

derive_repository_tsv_from_mapping: bool

Flag to derive TSV files from the mapping table.

dry_run property writable

dry_run: bool

Flag for dry run.

extract_repository_metadata property writable

extract_repository_metadata: bool

Flag to extract metadata from the data repository.

force_overwrite property writable

force_overwrite: bool

Flag to overwrite the existing configuration file if it exists.

gatherer_file property writable

gatherer_file: Path | None

Path to the gatherer file.

mapping_table property writable

mapping_table: Path | None

Mapping table, serving as an input.

matched_entries property writable

matched_entries: Path | None

Path to the matched entries file.

new_keys property writable

new_keys: list[Path]

List of paths to the new keys file.

output property writable

output: Path | None

Path to output the metadata as JSON.

query_fields property writable

query_fields: list[str]

Fields used to check if the dataset already exists.

read property writable

read: bool

Flag indicating whether to read a filled-in gatherer file.

schema_class property writable

schema_class: str | None

Python class name to import the schema from.

schema_info_table property writable

schema_info_table: Path | None

Path to the schema info table.

schema_module property writable

schema_module: str | None

Python module to import the schema from.

server_url property writable

server_url: str | None

URL of the data repository server.

target_collection property writable

target_collection: str | None

Name of the target collection in the data repository.

tsv_dir property writable

tsv_dir: list[Path]

Directories containing TSV files.

upload property writable

upload: bool

Flag to upload the metadata JSON to the data repository server.

upload_file property writable

upload_file: Path | None

Path to the upload file.

validate_interdependent_attributes

validate_interdependent_attributes() -> None

Validates the interdependencies between different parameters.

Raises:

Type Description
TomedaValueError

If the interdependencies are not met.
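
For example (assuming the boolean flags default to False and the optional values to None when TomedaParameter() is constructed without arguments), requesting an upload without the required server settings is rejected. The import path of TomedaValueError is not shown on this page, so the sketch catches it broadly:

from tomeda.params import TomedaParameter

params = TomedaParameter()
params.upload = True  # upload requested, but server_url, target_collection,
                      # and api_token are still unset

try:
    params.validate_interdependent_attributes()
except Exception as exc:  # raises TomedaValueError
    print(f"Configuration rejected: {exc}")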

Source code in tomeda/params.py
def validate_interdependent_attributes(self) -> None:
    """
    Validates the interdependencies between different parameters.

    Raises
    ------
    TomedaValueError
        If the interdependencies are not met.
    """
    # DOFF_00
    if self.create_schema_documentation and (
        self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating schema documentation, schema_module, "
            "schema_class, and output must be set.",
        )

    # DOFF_01
    if self.create_gatherer and (
        self.output is None
        or self.schema_module is None
        or self.schema_class is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating a gatherer file, gatherer_file, "
            "schema_module, and schema_class must be set.",
        )

    # DOFF_02
    if self.create_dataset_table and (
        self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"schema_module: {self.schema_module}, schema_class: "
            f"{self.schema_class}, output: {self.output}",
            "For creating a dataset table, schema_module, "
            "schema_class, and output must be set.",
        )

    # DOFF_03
    if self.extract_repository_metadata and (
        self.tsv_dir is None or self.output is None
    ):
        raise TomedaValueError(
            f"tsv_dir: {self.tsv_dir}, output: {self.output}",
            "For extracting repository metadata, tsv_dir and output "
            "must be set.",
        )

    # DOFF_04
    if self.derive_repository_keys and (
        self.matched_entries is None or self.schema_info_table is None
    ):
        raise TomedaValueError(
            f"matched_entries: {self.matched_entries}, "
            f"schema_info_table: {self.schema_info_table}",
            "For deriving repository keys, matched_entries and "
            "schema_info_table must be set.",
        )

    # DOFF_05
    if self.derive_repository_tsv_from_mapping and (
        self.new_keys is None or self.schema_info_table is None
    ):
        raise TomedaValueError(
            f"new_keys: {self.new_keys}, "
            f"schema_info_table: {self.schema_info_table}",
            "For deriving repository TSV from mapping, new_keys and "
            "schema_info_table must be set.",
        )

    # USER_01
    if self.read and (  # pylint: disable=too-many-boolean-expressions
        self.gatherer_file is None
        or self.collector_root is None
        or self.schema_module is None
        or self.schema_class is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"gatherer_file: {self.gatherer_file}, "
            f"collector_root: {self.collector_root}, "
            f"schema_module: {self.schema_module}, "
            f"schema_class: {self.schema_class}, "
            f"output: {self.output}",
            "For reading a gatherer file, gatherer_file, collector_root, "
            "schema_module, schema_class, and output must be set.",
        )

    # USER_02
    if self.create_repository_compatible_metadata and (
        self.mapping_table is None
        or self.tsv_dir is None
        or self.dataset_metadata is None
        or self.output is None
    ):
        raise TomedaValueError(
            f"mapping_table: {self.mapping_table}, "
            f"tsv_dir: {self.tsv_dir}, "
            f"dataset_metadata: {self.dataset_metadata}, "
            f"output: {self.output}",
            "For creating repository compatible metadata, "
            "mapping_table, tsv_dir, dataset_metadata, and output "
            "must be set.",
        )

    # USER_03
    if self.upload and (
        self.server_url is None
        or self.target_collection is None
        or self.api_token is None
    ):
        raise TomedaValueError(
            f"server_url: {self.server_url}, "
            f"target_collection: {self.target_collection}, "
            f"api_token: {self.api_token}",
            "For uploading, server_url, target_collection, "
            "and api_token must be set. (Optional: query_fields)",
        )

get_parameters

get_parameters() -> TomedaParameter

Returns the parameters for the Tomeda application.

The parameters are read from the command line arguments, the ENV file, and the OS environment variables. The values are prioritized in the following order:

1. Command line arguments
2. ENV file
3. OS environment variables
4. Default values, if available

Returns:

Type Description
TomedaParameter

The parameters for the Tomeda application.
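
The following dictionary-merge sketch only illustrates the documented priority order; it is not the actual merge code (which uses _update_parameters, see the source below). Later sources override earlier ones:

defaults = {"dry_run": False}            # 4. default values
os_env   = {"dry_run": True}             # 3. OS environment variables
env_file = {}                            # 2. ENV file (no value set here)
cli_args = {"dry_run": False}            # 1. command line arguments

merged = {**defaults, **os_env, **env_file, **cli_args}
assert merged["dry_run"] is False        # the CLI value wins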

Source code in tomeda/params.py
def get_parameters() -> TomedaParameter:
    """
    Returns the parameters for the Tomeda application.

    The parameters are read from the environment variables and the command
    line arguments. The values are prioritized in the following order:
    1. Command line arguments
    2. ENV file
    3. OS environment variables
    4. Default values, if available

    Returns
    -------
    TomedaParameter
        The parameters for the Tomeda application.
    """
    logger.info("Loading parameters...")
    params = TomedaParameter()
    os_env = _load_os_env_variables()
    env_file = _load_env_file()
    cli_args = _get_cli_args().__dict__

    params = _update_parameters(params, os_env)
    params = _update_parameters(params, env_file)
    params = _update_parameters(params, cli_args)
    logger.debug("Parameters: %s", params)
    params.validate_interdependent_attributes()
    return params