Documentation for ToMeDa

tomeda.file_handler

This module provides a comprehensive solution for file handling in ToMeDa. The main class, TomedaFileHandler, offers a set of methods for reading, writing, and managing files.

The read and write methods are the central functions of this module:

  • read() -> list[str]: By default, the read method returns the content of a file as a list of strings, one string per line. Leading and trailing whitespace is removed and empty lines are skipped. For special use cases, the entire file content can be read as a single string, or the removal of whitespace can be disabled.

  • write() -> None: The write method writes content to a file. When a string or a list of strings is provided, the strings are written to the file as they are. If no line breaks are present, line feeds (\n) are appended to the strings. If the content is an empty list or None, the file is created but nothing is written to it.

These two methods give fine-grained control over how files are read and written. For further details on these and the other methods of the TomedaFileHandler class, refer to the respective docstrings in the codebase.

Possible optimizations:

  • File locking: If ToMeDa processes many records in parallel processes or threads, concurrent file access can cause problems. File locking can prevent this. The standard-library modules fcntl (Unix) and msvcrt (Windows) can be used; the filelock package wraps both.
  • File system check: Some file systems have no permission concept, or only a partial one. File permission checks could take this into account.
  • Chunked file reading: If ToMeDa processes large files, it can be useful to read them in chunks. This can be done by passing a size to the read method of the file object.
  • File encoding: If ToMeDa processes files with different encodings, the 'chardet' library can be used to detect the encoding of a file.
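
The chunked-reading idea from the list above could look roughly like the following. This is a minimal sketch and not part of ToMeDa: the function name read_in_chunks and the default chunk size are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator


def read_in_chunks(
    file_path: Path, chunk_size: int = 64 * 1024, encoding: str = "utf8"
) -> Iterator[str]:
    """Yield the text of a file in pieces of at most `chunk_size` characters."""
    with file_path.open(mode="r", encoding=encoding) as file_:
        # iter() with a sentinel keeps calling file_.read(chunk_size) until it
        # returns the empty string, which signals the end of the file.
        for chunk in iter(lambda: file_.read(chunk_size), ""):
            yield chunk
```

Because the function is a generator, only one chunk is held in memory at a time; the caller decides whether to process chunks as they arrive or join them.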

logger module-attribute

logger: TraceLogger = getLogger(__name__)

TomedaFileHandler

TomedaFileHandler(
    file_path: Path,
    encoding: str = "utf8",
    overwrite: bool = False,
    dryrun: bool = False,
)

The file handler takes care of reading, writing, creating, and otherwise managing files.

Source code in tomeda/file_handler.py
def __init__(
    self,
    file_path: Path,
    encoding: str = "utf8",
    overwrite: bool = False,
    dryrun: bool = False,
) -> None:
    self.file_path = file_path
    self.encoding = encoding
    self.overwrite = overwrite
    self.dryrun = dryrun

MAX_WRITE_ATTEMPTS class-attribute instance-attribute

MAX_WRITE_ATTEMPTS = 5

WRITE_ATTEMPT_DELAY class-attribute instance-attribute

WRITE_ATTEMPT_DELAY = 0.1
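
The two attributes above point to a bounded retry loop around the write operation. The sketch below shows one way such a loop might be structured. It is an assumption, not the class's actual error handling: write_with_retry is a hypothetical function, retrying only on OSError is a choice made here, and treating the delay as seconds is also assumed.

```python
import time
from pathlib import Path

MAX_WRITE_ATTEMPTS = 5  # same value as the class attribute above
WRITE_ATTEMPT_DELAY = 0.1  # assumed to be seconds


def write_with_retry(file_path: Path, text: str, encoding: str = "utf8") -> int:
    """Write `text`, retrying on OSError. Returns the number of attempts used."""
    for attempt in range(1, MAX_WRITE_ATTEMPTS + 1):
        try:
            file_path.write_text(text, encoding=encoding)
            return attempt
        except OSError:
            if attempt == MAX_WRITE_ATTEMPTS:
                raise  # out of attempts: let the caller see the last error
            time.sleep(WRITE_ATTEMPT_DELAY)
    raise RuntimeError("MAX_WRITE_ATTEMPTS must be at least 1")
```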

dryrun instance-attribute

dryrun = dryrun

encoding instance-attribute

encoding = encoding

file_path instance-attribute

file_path = file_path

overwrite instance-attribute

overwrite = overwrite

create_file

create_file() -> None

Creates a file.

If the file already exists, nothing is done. If parent directories do not exist, they are created.

Raises:

Type                            Description
TomedaFileCreateError           If the file could not be created.
TomedaFileDirectoryCreateError  If the parent directory could not be created.
TomedaFilePermissionError       If the file or parent directory could not be created due to permission.

Source code in tomeda/file_handler.py
def create_file(self) -> None:
    """
    Creates a file.

    If the file already exists, nothing is done. If parent directories
    do not exist, they are created.

    Raises
    ------
    TomedaFileCreateError
        If the file could not be created.
    TomedaFileDirectoryCreateError
        If the parent directory could not be created.
    TomedaFilePermissionError
        If the file or parent directory could not be created due to
        permission.

    """
    if self.is_existing():
        logger.trace(
            "File '%s' exists. (%s.create_file)",
            self.file_path,
            self.__class__.__name__,
        )
    else:
        self.create_parent_dir()
        try:
            self.file_path.touch()
        except PermissionError as error:
            logger.trace(
                "File '%s' could not be created due to permission. "
                "(%s.create_file)",
                self.file_path,
                self.__class__.__name__,
                exc_info=True,
            )
            raise TomedaFilePermissionError(
                self.file_path,
                f"File could not be created due to permission: "
                f"{self.file_path}",
            ) from error
        except IOError as error:
            logger.trace(
                "File '%s' could not be created. (%s.create_file)",
                self.file_path,
                self.__class__.__name__,
                exc_info=True,
            )
            raise TomedaFileCreateError(
                self.file_path,
                f"File could not be created: {self.file_path}",
            ) from error
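
The order of the except clauses in the listing above matters. In Python 3, IOError is an alias of OSError and PermissionError is a subclass of it, so the permission case has to be handled first. A small sketch to illustrate this; the classify function is hypothetical and exists only for this demonstration.

```python
def classify(error: Exception) -> str:
    """Report which except clause would handle `error`."""
    try:
        raise error
    except PermissionError:
        # Must come before IOError, otherwise this branch is never reached.
        return "permission"
    except IOError:  # alias of OSError since Python 3.3
        return "io"
    except Exception:
        return "other"
```

With the two OSError clauses swapped, every PermissionError would be reported as a generic I/O failure.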

create_parent_dir

create_parent_dir() -> None

Creates the parent directory of a file.

Raises:

Type                            Description
TomedaFileDirectoryCreateError  If the parent directory could not be created.
TomedaFilePermissionError       If the parent directory could not be created due to permission.

Source code in tomeda/file_handler.py
def create_parent_dir(self) -> None:
    """
    Creates the parent directory of a file.

    Raises
    ------
    TomedaFileDirectoryCreateError
        If the parent directory could not be created.
    TomedaFilePermissionError
        If the parent directory could not be created due to permission.

    """
    if self.file_path.parent.exists():
        logger.trace(
            "Parent directory '%s' exists. (%s.create_parent_dir)",
            self.file_path.parent,
            self.__class__.__name__,
        )
        return

    try:
        self.file_path.parent.mkdir(parents=True, exist_ok=True)
        logger.debug(
            "Parent directory '%s' created.", self.file_path.parent
        )
    except PermissionError as error:
        logger.trace(
            "Parent directory '%s' could not be created. "
            "(%s.create_parent_dir)",
            self.file_path.parent,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFilePermissionError(
            self.file_path.parent,
            f"Parent directory could not be created: "
            f"{self.file_path.parent}",
        ) from error
    except IOError as error:
        logger.trace(
            "Parent directory '%s' could not be created. "
            "(%s.create_parent_dir)",
            self.file_path.parent,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFileDirectoryCreateError(
            self.file_path.parent,
            f"Parent directory could not be created: "
            f"{self.file_path.parent}",
        ) from error
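
Stripped of logging and error translation, create_file and create_parent_dir rest on two pathlib calls. The sketch below shows that core in isolation. The name ensure_file and its boolean return value are assumptions for illustration; the methods above return None.

```python
from pathlib import Path


def ensure_file(file_path: Path) -> bool:
    """Create `file_path` and any missing parents. True if newly created."""
    if file_path.exists():
        return False
    # parents=True creates every missing ancestor; exist_ok=True avoids an
    # error if the directory appears between the check and the call.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_path.touch()
    return True
```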

is_existing

is_existing() -> bool

Checks if a file exists.

Returns:

Type  Description
bool  True if file exists, False otherwise.

Source code in tomeda/file_handler.py
def is_existing(self) -> bool:
    """
    Checks if a file exists.

    Returns
    -------
    bool
        True if file exists, False otherwise.

    """
    result = self.file_path.exists()
    if not result:
        logger.trace(
            "File '%s' does not exist. (%s.is_existing)",
            self.file_path,
            self.__class__.__name__,
        )
    return result

is_readable

is_readable() -> bool

Checks if a file is readable.

Returns:

Type  Description
bool  True if the file is readable, False if it is not readable or does not exist.

Source code in tomeda/file_handler.py
def is_readable(self) -> bool:
    """
    Checks if a file is readable.

    Returns
    -------
    bool
        True if the file is readable, False if it is not readable or does
        not exist.

    """
    result = os.access(self.file_path, os.R_OK)
    if not result:
        logger.trace(
            "File '%s' is not readable (%s.is_readable)",
            self.file_path,
            self.__class__.__name__,
        )
    return result

is_writable

is_writable() -> bool

Checks if a file is writable.

Returns:

Type  Description
bool  True if the file is writable, False if it is not writable or does not exist.

Source code in tomeda/file_handler.py
def is_writable(self) -> bool:
    """
    Checks if a file is writable.

    Returns
    -------
    bool
        True if the file is writable, False if it is not writable or does
        not exist.

    """
    result = os.access(self.file_path, os.W_OK)
    if not result:
        logger.trace(
            "File '%s' is not writable or does not exist. (%s.is_writable)",
            self.file_path,
            self.__class__.__name__,
        )
    return result
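
Since os.access returns False for a path that does not exist, is_writable alone cannot tell whether a new file could be created at that path. One possible complementary check is sketched below. The function can_write_or_create is hypothetical, and how the handler's own _check_writable treats this case is not shown on this page.

```python
import os
from pathlib import Path


def can_write_or_create(file_path: Path) -> bool:
    """True if the file is writable, or absent with a writable parent."""
    if file_path.exists():
        return os.access(file_path, os.W_OK)
    # Creating a file needs write and search permission on its directory.
    return os.access(file_path.parent, os.W_OK | os.X_OK)
```

Like any os.access check, this is only a snapshot: permissions can change before the actual write, so the write itself still needs error handling.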

read

read(raw: bool = False, strip: bool = True) -> list[str]

Reads the content of a file and returns it as a list of strings. Line endings are set to Unix line endings (\n) unless they are removed anyway.

  • If raw is True, the content is returned as a list with a single string. strip is ignored in this case. Line endings are not removed, but set to Unix line endings (\n).
  • If raw is False, the content is returned as a list of strings, each string representing one line of the file.
  • If strip is True, line ending characters and leading and trailing whitespace are removed from each string. Empty lines are removed.

Parameters:

Name   Type  Description                                                         Default
raw    bool  If True, the content is returned as a single string.                False
strip  bool  If True, each line is stripped of leading and trailing whitespace.  True

Returns:

Type       Description
list[str]  The content of the file as a list of strings.

Raises:

Type                        Description
TomedaFileNotReadableError  If the file is not readable.
TomedaFilePermissionError   If the file is not readable due to permission error.
TomedaFileEncodingError     If the file encoding is not valid.

Source code in tomeda/file_handler.py
def read(self, raw: bool = False, strip: bool = True) -> list[str]:
    """
    Reads the content of a file and returns it as a list of strings. Line
    endings are set to Unix line endings (`\\n`) unless they are removed
    anyway.

    * If `raw` is True, the content is returned as a list with a single
        string. `strip` is ignored in this case. Line endings are not
        removed, but set to Unix line endings (`\\n`).
    * If `raw` is False, the content is returned as a list of
        strings, each string representing one line of the file.
    * If `strip` is True, line ending characters and leading and trailing
        whitespace are removed from each string. Empty lines are removed.

    Parameters
    ----------
    raw: bool
        If `True`, the content is returned as a single string.
    strip: bool
        If `True`, each line is stripped of leading and trailing
        whitespace.

    Returns
    -------
    list[str]
        The content of the file as a list of strings.

    Raises
    ------
    TomedaFileNotReadableError
        If the file is not readable.
    TomedaFilePermissionError
        If the file is not readable due to permission error.
    TomedaFileEncodingError
        If the file encoding is not valid.

    """
    if not self.is_readable():
        logger.trace(
            "File '%s' is not readable. (%s.read)",
            self.file_path,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFileNotReadableError(
            self.file_path,
            f"File is not readable: {self.file_path}",
        )

    try:
        if raw:
            with self.file_path.open(
                mode="r", encoding=self.encoding, newline="\n"
            ) as file_:
                return [file_.read()]

        else:
            with self.file_path.open(
                encoding=self.encoding, newline=""
            ) as file_:
                file_contents = file_.read().splitlines()
            if strip:
                result = [line.strip() for line in file_contents]
                result = [line for line in result if line]
            else:
                result = file_contents

            return result
    except PermissionError as error:
        logger.trace(
            "File '%s' is not readable due to permission error. (%s.read)",
            self.file_path,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFilePermissionError(
            self.file_path,
            f"File is not readable due to permission error: "
            f"{self.file_path}",
        ) from error
    except UnicodeDecodeError as error:
        logger.trace(
            "File '%s' is not readable due to encoding error. (%s.read)",
            self.file_path,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFileEncodingError(
            self.file_path,
            f"File is not readable due to encoding error: {self.file_path}",
        ) from error
    except IOError as error:
        logger.trace(
            "File '%s' is not readable due to IO error. (%s.read)",
            self.file_path,
            self.__class__.__name__,
            exc_info=True,
        )
        raise TomedaFileNotReadableError(
            self.file_path,
            f"File is not readable due to IO error: {self.file_path}",
        ) from error
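
How the newline argument affects reading can be checked with the standard library alone. In the sketch below, read_with_newline is a hypothetical helper that writes bytes to a temporary file and reads them back in text mode. According to the documentation of the built-in open, only newline=None translates \r\n to \n on reading; "" and "\n" return line endings untranslated. That suggests the raw branch above, which passes newline="\n", may hand back Windows line endings unchanged, which is worth verifying against the docstring's claim.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_with_newline(data: bytes, newline: Optional[str]) -> str:
    """Write `data` to a temporary file and read it back in text mode."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        file_path = Path(tmp_dir) / "sample.txt"
        file_path.write_bytes(data)
        with file_path.open(mode="r", encoding="utf8", newline=newline) as file_:
            return file_.read()


windows_text = b"first\r\nsecond\n"
translated = read_with_newline(windows_text, None)  # universal newlines
untranslated = read_with_newline(windows_text, "\n")  # as in the raw branch
```

In the non-raw branch this does not matter, because str.splitlines splits on \r\n as well as \n and drops the line endings.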

write

write(content: str | list[str] | None) -> None

Writes content to a file.

The content can be a string or a list of strings. If a list is given, each list item is written as a line. If a string is given, it is written as is, without any line ending characters or other modifications. If the content is None or an empty list, the file is created empty.

If the file exists and overwrite is True, the file is overwritten/replaced. If the file exists and overwrite is False, an exception is raised. If the file does not exist, it is created, along with parent directories if needed.

Parameters:

Name     Type                    Description                        Default
content  str | list[str] | None  The content to write to the file.  required

Raises:

Type                            Description
TomedaFileValueError            If the content is not a string or a list of strings.
TomedaFileNotWritableError      If the file is not writable.
TomedaFileExistsError           If the file exists and overwrite is False.
TomedaFileCreateError           If the file could not be created.
TomedaFileDirectoryCreateError  If the parent directory could not be created.
TomedaFileEncodingError         If the encoding is not valid for the file.
TomedaFilePermissionError       If the file is not writable due to permission error.
TomedaFileError                 If the file could not be written due to an unknown error.

Source code in tomeda/file_handler.py
def write(self, content: str | list[str] | None) -> None:
    """
    Writes content to a file.

    The content can be a string or a list of strings. If a list is given,
    each list item is written as a line. If a string is given, it is
    written as is without any line ending characters or other modifications.
    If the content is None or an empty list, the file is created empty. If
    the file exists and overwrite is `True`, the file is overwritten/replaced.
    If the file exists and overwrite is `False`, an exception is raised.
    If the file does not exist, it is created with parent directories if
    needed.

    Parameters
    ----------
    content: str | list[str] | None
        The content to write to the file.

    Raises
    ------
    TomedaFileValueError
        If the content is not a string or a list of strings.
    TomedaFileNotWritableError
        If the file is not writable.
    TomedaFileExistsError
        If the file exists and overwrite is False.
    TomedaFileCreateError
        If the file could not be created.
    TomedaFileDirectoryCreateError
        If the parent directory could not be created.
    TomedaFileEncodingError
        If the encoding is not valid for the file.
    TomedaFilePermissionError
        If the file is not writable due to permission error.
    TomedaFileError
        If the file could not be written due to an unknown error.

    """
    if self.dryrun:
        logger.info("Dryrun: Not writing to '%s'.", self.file_path)
        return

    self._check_overwrite()
    self._check_writable()

    text = self._convert_line_to_text(content)
    self.create_file()

    for attempt in range(self.MAX_WRITE_ATTEMPTS):
        try:
            self.file_path.write_text(
                text,
                encoding=self.encoding,
                errors="strict",
                newline="\n",
            )
            logger.debug(
                "Wrote to file '%s'. (%s.write)",
                self.file_path,
                self.__class__.__name__,
            )
            break
        except Exception as error:
            if not self._handle_write_error(error, attempt):
                raise TomedaFileError(
                    self.file_path,
                    f"Could not write to file (Unknown Error): "
                    f"{self.file_path}",
                ) from error
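
The helper _convert_line_to_text is called above but not listed on this page. The sketch below is one possible reading of the documented behavior (None or an empty list yields an empty file, a string is written as is, list items become lines), not the actual implementation. The name to_text and the use of TypeError in place of TomedaFileValueError are assumptions.

```python
from typing import List, Union


def to_text(content: Union[str, List[str], None]) -> str:
    """Hypothetical conversion of `content` into the text that gets written."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content  # a plain string is passed through unchanged
    if isinstance(content, list) and all(isinstance(item, str) for item in content):
        # Each list item becomes one line; add "\n" only where it is missing.
        return "".join(
            item if item.endswith("\n") else item + "\n" for item in content
        )
    raise TypeError(
        f"Expected str, list[str] or None, got {type(content).__name__}"
    )
```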