eformer.paths

Universal path utilities for local and cloud storage.

Provides a unified API for working with paths across storage backends, including the local filesystem and Google Cloud Storage (GCS).

Classes:

  • UniversalPath: Abstract base class for path operations

  • LocalPath: Local filesystem path implementation

  • GCSPath: Google Cloud Storage path implementation

  • PathManager: Factory for creating appropriate path objects

  • MLUtilPath: Extended path manager with ML utilities

Key Features:
  • Unified API for local and cloud storage

  • Transparent switching between storage backends

  • Support for JAX array and dictionary I/O

  • Recursive directory operations

  • Path manipulation and traversal

Example

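>>> import jax.numpy as jnp
>>> from eformer.paths import ePath
>>> data = b"serialized model bytes"
>>> # Local filesystem path
>>> local_path = ePath("data/model.pkl")
>>> local_path.write_bytes(data)
>>> # Google Cloud Storage path, same API
>>> gcs_path = ePath("gs://bucket/model.pkl")
>>> gcs_path.write_bytes(data)
>>> # Save and reload a JAX array
>>> array = jnp.ones((2, 3))
>>> ePath.save_jax_array(array, "gs://bucket/weights.npy")
>>> loaded = ePath.load_jax_array("gs://bucket/weights.npy")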

class eformer.paths.GCSPath(path: str, client: google.cloud.storage.client.Client | None = None)

Bases: UniversalPath

Google Cloud Storage path implementation.

Provides the UniversalPath interface for Google Cloud Storage, handling blob operations, bucket management, and directory emulation.

path

Full GCS path string (gs://bucket/path).

client

Google Cloud Storage client.

bucket_name

Name of the GCS bucket.

blob_name

Path within the bucket.
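
The bucket_name and blob_name attributes split a gs:// URL at the first slash after the scheme. The following is a minimal sketch of that split, assuming every GCS path has the form gs://<bucket>/<object path>; `split_gcs_path` is a hypothetical helper, not part of eformer.

```python
def split_gcs_path(path: str) -> tuple[str, str]:
    """Split 'gs://bucket/a/b' into ('bucket', 'a/b').

    Hypothetical helper; assumes the gs://<bucket>/<object path> layout.
    """
    scheme = "gs://"
    if not path.startswith(scheme):
        raise ValueError(f"not a GCS path: {path!r}")
    # Everything before the first "/" is the bucket; the remainder is the blob name.
    bucket_name, _, blob_name = path[len(scheme):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name: {path!r}")
    return bucket_name, blob_name


print(split_gcs_path("gs://my-bucket/data/model.pkl"))  # ('my-bucket', 'data/model.pkl')
```

A bucket-only URL such as gs://my-bucket yields an empty blob name, which a path class would typically treat as the bucket root.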

Example

>>> path = GCSPath("gs://my-bucket/data/model.pkl")
>>> path.exists()
True
>>> path.write_bytes(model_bytes)
>>> for item in path.parent.iterdir():
...     print(item.name)

as_posix() → str

Return the string representation with forward slashes.

Returns

Path string with forward slashes as separators.

property blob
property bucket
exists() → bool

Check if the path exists.

Returns

True if the path exists, False otherwise.

glob(pattern: str, recursive: bool = False) → Iterator[GCSPath]

Find paths matching a glob pattern.

Parameters
  • pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).

  • recursive – If True, search recursively through subdirectories.

Yields

UniversalPath objects for each matching path.
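
In a standard (flat-namespace) bucket, object names are plain strings, so a glob over a "directory" has to be emulated by filtering names under a prefix. A minimal sketch, assuming a non-recursive search matches only direct children and a recursive search matches the final name component at any depth (roughly pathlib's rglob); it does not handle patterns that contain "/". `glob_blob_names` is hypothetical and works on an in-memory list rather than a live bucket.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def glob_blob_names(
    blob_names: Iterable[str], prefix: str, pattern: str, recursive: bool = False
) -> Iterator[str]:
    """Yield object names under `prefix` whose final component matches `pattern`.

    Hypothetical sketch: `pattern` must not contain "/".
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rel = name[len(prefix):]
        if not rel or (not recursive and "/" in rel):
            # Skip the prefix placeholder itself and, when not recursive,
            # anything nested in a deeper "subdirectory".
            continue
        leaf = rel.rsplit("/", 1)[-1]
        if fnmatchcase(leaf, pattern):
            yield name


names = ["data/a.txt", "data/b.py", "data/sub/c.txt", "other/d.txt"]
print(list(glob_blob_names(names, "data", "*.txt")))                  # ['data/a.txt']
print(list(glob_blob_names(names, "data", "*.txt", recursive=True)))  # ['data/a.txt', 'data/sub/c.txt']
```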

is_absolute() → bool

Return True if the path is absolute.

Returns

True if the path is absolute, False otherwise.

is_dir() → bool

Check if the path is a directory.

Returns

True if the path is a directory, False otherwise.

is_file() → bool

Check if the path is a file.

Returns

True if the path is a file, False otherwise.

iterdir() → Iterator[GCSPath]

Iterate over the contents of a directory.

Yields

UniversalPath objects for each item in the directory.

Raises

NotADirectoryError – If the path is not a directory.
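
In a flat-namespace bucket, iterdir() and is_dir() cannot query a real directory; they have to infer one from object names that share a prefix. The google-cloud-storage client supports this through `Client.list_blobs(..., prefix=..., delimiter="/")`, which reports nested prefixes separately from objects. The sketch below shows the same grouping over an in-memory list; `list_children` is a hypothetical helper, and this is not a claim about how GCSPath is implemented.

```python
def list_children(blob_names: list[str], prefix: str) -> tuple[list[str], list[str]]:
    """Return (files, subdirectories) directly under `prefix`.

    Hypothetical sketch of directory emulation over flat object names.
    """
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    files: list[str] = []
    subdirs: list[str] = []
    for name in blob_names:
        if not name.startswith(prefix) or name == prefix:
            continue  # outside the "directory", or its own placeholder object
        head, sep, _ = name[len(prefix):].partition("/")
        if sep:
            # A deeper name implies a subdirectory; report each one once.
            subdir = prefix + head + "/"
            if subdir not in subdirs:
                subdirs.append(subdir)
        else:
            files.append(name)
    return files, subdirs


names = ["data/", "data/a.txt", "data/sub/c.txt", "data/sub/d.txt", "other/x.bin"]
print(list_children(names, "data"))  # (['data/a.txt'], ['data/sub/'])
```

Under this scheme a prefix counts as a directory when the listing returns at least one file or subdirectory, which is why an empty "directory" may only exist if a placeholder object such as data/ is kept.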

mkdir(parents: bool = True, exist_ok: bool = True) → None

Create directory at this path.

Parameters
  • parents – Create parent directories if needed.

  • exist_ok – Don’t raise an error if the directory already exists.

Raises

FileExistsError – If exist_ok is False and path exists.

property name: str
property parent: GCSPath
parts() → tuple[str, ...]

Return a tuple of the path components.

Returns

Tuple of individual path components.

read_bytes() → bytes

Read binary content from the path.

Returns

The binary content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

read_text(encoding: str = 'utf-8') → str

Read text content from the path.

Parameters

encoding – Text encoding to use.

Returns

The text content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

relative_to(other: GCSPath) → GCSPath

Return a relative path from other to this path.

Parameters

other – Base path to compute relative path from.

Returns

Relative path from other to this path.

Raises

ValueError – If this path is not relative to other.
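
Once the gs:// scheme is stripped, a GCS path behaves like a POSIX path whose first component is the bucket, so pathlib's pure path logic can compute the relative part. A minimal sketch under that assumption; `gcs_relative_to` is hypothetical and returns a plain string rather than a GCSPath.

```python
from pathlib import PurePosixPath


def gcs_relative_to(path: str, other: str) -> str:
    """Return `path` relative to `other`, both given as gs:// URLs.

    Hypothetical sketch; raises ValueError like the documented method.
    """
    scheme = "gs://"
    if not (path.startswith(scheme) and other.startswith(scheme)):
        raise ValueError("both paths must be gs:// URLs")
    # "bucket/data/model.pkl" relative to "bucket/data" -> "model.pkl".
    # PurePosixPath.relative_to raises ValueError if `path` is not under `other`,
    # which also covers paths in a different bucket.
    rel = PurePosixPath(path[len(scheme):]).relative_to(other[len(scheme):])
    return rel.as_posix()


print(gcs_relative_to("gs://bucket/data/model.pkl", "gs://bucket/data"))  # model.pkl
```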

rename(target: GCSPath) → GCSPath

Rename this path to the given target.

Parameters

target – New path name.

Returns

New path object pointing to target.

resolve() → GCSPath

Make the path absolute, resolving any symlinks.

Returns

Absolute path with symlinks resolved.

rmdir() → None

Remove this directory.

The directory must be empty.

Raises
  • OSError – If the directory is not empty.

  • NotADirectoryError – If the path is not a directory.

stat() → dict[str, Any]

Return file statistics.

Returns

Dictionary containing file metadata, such as size and modification time (mtime).

Raises

FileNotFoundError – If the path doesn’t exist.

stem() → str

Return the final path component without its suffix.

Returns

The stem of the final path component.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'

property suffix: str
suffixes() → list[str]

Return a list of the path’s file suffixes.

Returns

List of suffixes including the leading dots.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']

Remove this file or symbolic link.

Parameters

missing_ok – If True, don’t raise an error if the file doesn’t exist.

Raises

FileNotFoundError – If missing_ok is False and file doesn’t exist.

with_name(name: str) → GCSPath

Return a new path with the name changed.

Parameters

name – New name for the final path component.

Returns

New path with the name replaced.

with_stem(stem: str) → GCSPath

Return a new path with the stem changed.

Parameters

stem – New stem for the final path component.

Returns

New path with the stem replaced.

with_suffix(suffix: str) → GCSPath

Return a new path with the suffix changed.

Parameters

suffix – New suffix (including leading dot).

Returns

New path with the suffix replaced.

write_bytes(data: bytes) → None

Write binary content to the path.

Parameters

data – Binary data to write.

Raises

ValueError – If trying to write to a directory.

write_text(data: str, encoding: str = 'utf-8') → None

Write text content to the path.

Parameters
  • data – Text data to write.

  • encoding – Text encoding to use.

Raises

ValueError – If trying to write to a directory.

class eformer.paths.LocalPath(path: str | pathlib.Path)

Bases: UniversalPath

Local filesystem path implementation.

Wraps pathlib.Path to provide the UniversalPath interface for local filesystem operations.

path

The underlying pathlib.Path object.
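
Wrapping pathlib.Path means most LocalPath methods can delegate directly to the wrapped object, adding only the checks the interface documents (for example, ValueError when reading or writing a directory). A minimal sketch of that delegation pattern; `MiniLocalPath` is a hypothetical, stripped-down class, not eformer's LocalPath.

```python
import pathlib
import tempfile


class MiniLocalPath:
    """Hypothetical, stripped-down wrapper showing the delegation pattern."""

    def __init__(self, path: "str | pathlib.Path"):
        self.path = pathlib.Path(path)  # the underlying pathlib.Path object

    def __truediv__(self, other: str) -> "MiniLocalPath":
        return MiniLocalPath(self.path / other)

    def __repr__(self) -> str:
        return f"MiniLocalPath({self.path.as_posix()!r})"

    @property
    def parent(self) -> "MiniLocalPath":
        return MiniLocalPath(self.path.parent)

    def exists(self) -> bool:
        return self.path.exists()

    def mkdir(self, parents: bool = True, exist_ok: bool = True) -> None:
        # Same defaults as the documented mkdir(): create parents, tolerate existing dirs.
        self.path.mkdir(parents=parents, exist_ok=exist_ok)

    def write_text(self, data: str, encoding: str = "utf-8") -> None:
        if self.path.is_dir():
            raise ValueError(f"cannot write to a directory: {self.path}")
        self.path.write_text(data, encoding=encoding)

    def read_text(self, encoding: str = "utf-8") -> str:
        if self.path.is_dir():
            raise ValueError(f"cannot read from a directory: {self.path}")
        return self.path.read_text(encoding=encoding)  # FileNotFoundError if missing


with tempfile.TemporaryDirectory() as tmp:
    cfg = MiniLocalPath(tmp) / "configs" / "config.json"
    cfg.parent.mkdir()
    cfg.write_text('{"lr": 0.001}')
    print(cfg.exists(), cfg.read_text())  # True {"lr": 0.001}
```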

Example

>>> path = LocalPath("/data/model.pkl")
>>> path.exists()
True
>>> path.parent
LocalPath('/data')
>>> (path.parent / "config.json").write_text(config)

as_posix() → str

Return the string representation with forward slashes.

Returns

Path string with forward slashes as separators.

exists() → bool

Check if the path exists.

Returns

True if the path exists, False otherwise.

glob(pattern: str, recursive: bool = False) → Iterator[LocalPath]

Find paths matching a glob pattern.

Parameters
  • pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).

  • recursive – If True, search recursively through subdirectories.

Yields

UniversalPath objects for each matching path.
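
On the local filesystem the recursive flag maps naturally onto pathlib, where `Path.rglob(p)` behaves like `Path.glob("**/" + p)`. A minimal sketch, assuming LocalPath.glob(pattern, recursive=True) follows rglob semantics; `local_glob` is hypothetical and returns relative strings so the result is easy to inspect.

```python
import pathlib
import tempfile


def local_glob(root: str, pattern: str, recursive: bool = False) -> list[str]:
    """Return matches under `root` as sorted POSIX strings relative to `root`.

    Hypothetical sketch: assumes recursive=True behaves like pathlib's rglob().
    """
    base = pathlib.Path(root)
    # rglob(p) searches every depth, including `root` itself.
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    return sorted(p.relative_to(base).as_posix() for p in matches)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("a.txt", "b.py", "sub/c.txt"):
        f = pathlib.Path(tmp, rel)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_text("x")
    print(local_glob(tmp, "*.txt"))                  # ['a.txt']
    print(local_glob(tmp, "*.txt", recursive=True))  # ['a.txt', 'sub/c.txt']
```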

is_absolute() → bool

Return True if the path is absolute.

Returns

True if the path is absolute, False otherwise.

is_dir() → bool

Check if the path is a directory.

Returns

True if the path is a directory, False otherwise.

is_file() → bool

Check if the path is a file.

Returns

True if the path is a file, False otherwise.

iterdir() → Iterator[LocalPath]

Iterate over the contents of a directory.

Yields

UniversalPath objects for each item in the directory.

Raises

NotADirectoryError – If the path is not a directory.

mkdir(parents: bool = True, exist_ok: bool = True) → None

Create directory at this path.

Parameters
  • parents – Create parent directories if needed.

  • exist_ok – Don’t raise an error if the directory already exists.

Raises

FileExistsError – If exist_ok is False and path exists.

property name: str
property parent: LocalPath
parts() → tuple[str, ...]

Return a tuple of the path components.

Returns

Tuple of individual path components.

read_bytes() → bytes

Read binary content from the path.

Returns

The binary content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

read_text(encoding: str = 'utf-8') → str

Read text content from the path.

Parameters

encoding – Text encoding to use.

Returns

The text content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

relative_to(other: LocalPath) → LocalPath

Return a relative path from other to this path.

Parameters

other – Base path to compute relative path from.

Returns

Relative path from other to this path.

Raises

ValueError – If this path is not relative to other.

rename(target: LocalPath) → LocalPath

Rename this path to the given target.

Parameters

target – New path name.

Returns

New path object pointing to target.

resolve() → LocalPath

Make the path absolute, resolving any symlinks.

Returns

Absolute path with symlinks resolved.

rmdir() → None

Remove this directory.

The directory must be empty.

Raises
  • OSError – If the directory is not empty.

  • NotADirectoryError – If the path is not a directory.

stat() → dict[str, Any]

Return file statistics.

Returns

Dictionary containing file metadata, such as size and modification time (mtime).

Raises

FileNotFoundError – If the path doesn’t exist.
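
For local files, such a dictionary can be built from `os.stat()`, which already raises FileNotFoundError for a missing path. A minimal sketch; the key names "size" and "mtime" are assumptions taken from the docstring wording, and the keys LocalPath.stat() actually returns may differ. `stat_dict` is hypothetical.

```python
import os
import tempfile
from typing import Any


def stat_dict(path: str) -> dict[str, Any]:
    """Hypothetical sketch: expose a few os.stat() fields as a dict."""
    st = os.stat(path)  # raises FileNotFoundError if the path is missing
    return {"size": st.st_size, "mtime": st.st_mtime}


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "weights.bin")
    with open(target, "wb") as fh:
        fh.write(b"\x00" * 16)
    print(stat_dict(target)["size"])  # 16
```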

stem() → str

Return the final path component without its suffix.

Returns

The stem of the final path component.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'
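
Because LocalPath wraps pathlib.Path, the stem example above matches what pathlib itself reports. One difference worth noting: pathlib exposes stem and suffixes as properties, whereas this API documents stem() and suffixes() as methods (suffix is a property in both). The snippet uses only the standard library.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/model.tar.gz")
print(p.name)      # model.tar.gz
print(p.stem)      # model.tar  (only the last suffix is removed)
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
```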

property suffix: str
suffixes() → list[str]

Return a list of the path’s file suffixes.

Returns

List of suffixes including the leading dots.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']

Remove this file or symbolic link.

Parameters

missing_ok – If True, don’t raise an error if the file doesn’t exist.

Raises

FileNotFoundError – If missing_ok is False and file doesn’t exist.

with_name(name: str) → LocalPath

Return a new path with the name changed.

Parameters

name – New name for the final path component.

Returns

New path with the name replaced.

with_stem(stem: str) → LocalPath

Return a new path with the stem changed.

Parameters

stem – New stem for the final path component.

Returns

New path with the stem replaced.

with_suffix(suffix: str) → LocalPath

Return a new path with the suffix changed.

Parameters

suffix – New suffix (including leading dot).

Returns

New path with the suffix replaced.
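
Assuming these methods follow pathlib semantics (as the wrapped pathlib.Path suggests), the three with_* variants differ in how much of the final component they replace. pathlib's versions shown below only touch the last suffix; `with_stem` requires Python 3.9 or later.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/model.tar.gz")
print(p.with_name("config.json"))  # /data/config.json
print(p.with_stem("weights"))      # /data/weights.gz  (stem "model.tar" replaced, ".gz" kept)
print(p.with_suffix(".zip"))       # /data/model.tar.zip  (only ".gz" replaced)
print(p.with_suffix(""))           # /data/model.tar  (empty suffix removes it)
```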

write_bytes(data: bytes) → None

Write binary content to the path.

Parameters

data – Binary data to write.

Raises

ValueError – If trying to write to a directory.

write_text(data: str, encoding: str = 'utf-8') → None

Write text content to the path.

Parameters
  • data – Text data to write.

  • encoding – Text encoding to use.

Raises

ValueError – If trying to write to a directory.

class eformer.paths.MLUtilPath(gcs_client: google.cloud.storage.client.Client | None = None, gcs_credentials_path: str | None = None)

Bases: PathManager

Extended path manager with ML-specific utilities.

Adds JAX array and dictionary I/O operations to the base PathManager. Supports NPY, JSON, and pickle serialization and handles JAX/NumPy conversions.

Example

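>>> import jax.numpy as jnp
>>> path_manager = MLUtilPath()
>>> array = jnp.ones((4, 4))
>>> # Save a JAX array to GCS in .npy format
>>> path_manager.save_jax_array(array, "gs://bucket/weights.npy")
>>> # Load it back
>>> loaded = path_manager.load_jax_array("gs://bucket/weights.npy")
>>> # Save a dictionary as JSON (arrays are converted to lists)
>>> path_manager.save_dict({"weights": array}, "config.json")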

copy_tree(src: str | eformer.paths.UniversalPath, dst: str | eformer.paths.UniversalPath) → None

Copy entire directory tree between local and GCS.

Recursively copies all files and directories from source to destination. Works across different storage backends (local to GCS, GCS to local, etc.).

Parameters
  • src – Source path (directory or file).

  • dst – Destination path.

Example

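>>> manager = MLUtilPath()
>>> # Upload a local directory to GCS
>>> manager.copy_tree("data/", "gs://bucket/data/")
>>> # Download a GCS directory to the local filesystem
>>> manager.copy_tree("gs://bucket/model/", "local_model/")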
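
A cross-backend copy only needs a handful of operations that every backend provides: walk the source, rebuild each relative path under the destination, and move bytes with read_bytes()/write_bytes(). A minimal local-only sketch of that idea using pathlib; `copy_tree_sketch` is hypothetical and is not eformer's copy_tree.

```python
import pathlib
import tempfile


def copy_tree_sketch(src: pathlib.Path, dst: pathlib.Path) -> None:
    """Recursively copy `src` to `dst` using only read_bytes/write_bytes/mkdir.

    Hypothetical local-only sketch; not eformer's implementation.
    """
    if src.is_file():
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(src.read_bytes())
        return
    for item in src.rglob("*"):
        target = dst / item.relative_to(src)  # preserve the relative layout
        if item.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(item.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "data" / "shards").mkdir(parents=True)
    (root / "data" / "meta.json").write_text("{}")
    (root / "data" / "shards" / "0.bin").write_bytes(b"\x01\x02")
    copy_tree_sketch(root / "data", root / "backup")
    print(sorted(p.relative_to(root / "backup").as_posix() for p in (root / "backup").rglob("*")))
    # ['meta.json', 'shards', 'shards/0.bin']
```

Because the loop body touches nothing backend-specific beyond those few methods, the same shape would apply to any pair of UniversalPath implementations.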

load_dict(path: str | eformer.paths.UniversalPath, format: str = 'json') → dict[str, Any]

Load dictionary from various formats.

Parameters
  • path – Source path (local or GCS).

  • format – Serialization format (‘json’ or ‘pickle’).

Returns

Loaded dictionary.

Raises
  • ValueError – If format is not supported.

  • FileNotFoundError – If path doesn’t exist.

Example

>>> config = manager.load_dict("config.json")
>>> data = manager.load_dict("gs://bucket/data.pkl", "pickle")
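
The format parameter selects a decoder, and anything outside the supported set raises ValueError. A minimal sketch of that dispatch on raw bytes; `decode_dict` is hypothetical, and the parameter is named `fmt` here to avoid shadowing Python's built-in format().

```python
import json
import pickle
from typing import Any


def decode_dict(raw: bytes, fmt: str = "json") -> dict[str, Any]:
    """Decode bytes read from storage into a dict.

    Hypothetical sketch of dispatching on a format name.
    """
    if fmt == "json":
        return json.loads(raw.decode("utf-8"))
    if fmt == "pickle":
        # Only unpickle data you trust: pickle can execute arbitrary code on load.
        return pickle.loads(raw)
    raise ValueError(f"Unsupported format: {fmt!r}")


print(decode_dict(b'{"lr": 0.001}'))                      # {'lr': 0.001}
print(decode_dict(pickle.dumps({"step": 10}), "pickle"))  # {'step': 10}
```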

load_jax_array(path: str | eformer.paths.UniversalPath, format: str = 'npy') → Array

Load JAX array from various formats.

Parameters
  • path – Source path (local or GCS).

  • format – Serialization format (‘npy’ or ‘pickle’).

Returns

Loaded JAX array.

Raises
  • ValueError – If format is not supported.

  • FileNotFoundError – If path doesn’t exist.

Example

>>> weights = manager.load_jax_array("weights.npy")
>>> biases = manager.load_jax_array("gs://bucket/biases.pkl", "pickle")

save_dict(data: dict[str, Any], path: str | eformer.paths.UniversalPath, format: str = 'json') → None

Save dictionary in various formats.

Parameters
  • data – Dictionary to save. Values may include JAX arrays, which are converted to lists for the JSON format.

  • path – Destination path (local or GCS).

  • format – Serialization format (‘json’ or ‘pickle’).

Raises

ValueError – If format is not supported.

Example

>>> manager.save_dict({"weights": [1, 2, 3]}, "config.json")
>>> manager.save_dict(complex_data, "gs://bucket/data.pkl", "pickle")
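
JSON has no array type beyond nested lists, so array values must be converted before encoding. A minimal sketch of a recursive conversion, assuming any value exposing `__array__` (NumPy arrays and scalars, and jax.Array) should become a nested list; `to_jsonable` is hypothetical and uses NumPy only, so it runs without JAX installed.

```python
import json
from typing import Any

import numpy as np


def to_jsonable(value: Any) -> Any:
    """Recursively convert array values into nested Python lists.

    Hypothetical sketch; anything exposing __array__ (NumPy arrays and
    scalars, jax.Array) is converted through np.asarray(...).tolist().
    """
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    if hasattr(value, "__array__"):
        return np.asarray(value).tolist()
    return value


data = {"weights": np.arange(3), "meta": {"scale": np.float32(0.5), "name": "layer0"}}
print(json.dumps(to_jsonable(data)))
# {"weights": [0, 1, 2], "meta": {"scale": 0.5, "name": "layer0"}}
```

The conversion is one-way: loading the JSON back yields plain lists, so dtype and shape information is lost unless stored separately.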

save_jax_array(array: Array, path: str | eformer.paths.UniversalPath, format: str = 'npy') → None

Save JAX array in various formats.

Parameters
  • array – JAX array to save.

  • path – Destination path (local or GCS).

  • format – Serialization format (‘npy’ or ‘pickle’).

Raises

ValueError – If format is not supported.

Example

>>> manager.save_jax_array(weights, "weights.npy")
>>> manager.save_jax_array(biases, "gs://bucket/biases.pkl", "pickle")
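
The 'npy' format can be produced entirely in memory with NumPy, yielding bytes that a backend's write_bytes() could store and read_bytes() could return. A minimal sketch of that round trip; the helper names are hypothetical, NumPy stands in for JAX so the example runs without it, and whether save_jax_array works exactly this way is an assumption.

```python
import io

import numpy as np


def array_to_npy_bytes(array) -> bytes:
    """Serialize an array (NumPy or jax.Array) to .npy bytes."""
    buffer = io.BytesIO()
    np.save(buffer, np.asarray(array))  # np.asarray copies a jax.Array to host memory
    return buffer.getvalue()


def npy_bytes_to_array(data: bytes) -> np.ndarray:
    """Deserialize .npy bytes; wrap with jax.numpy.asarray() to get a JAX array."""
    return np.load(io.BytesIO(data))


weights = np.arange(6, dtype=np.float32).reshape(2, 3)
blob = array_to_npy_bytes(weights)
restored = npy_bytes_to_array(blob)
print(restored.dtype, restored.shape)  # float32 (2, 3)
```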

class eformer.paths.PathManager(gcs_client: google.cloud.storage.client.Client | None = None, gcs_credentials_path: str | None = None)

Bases: object

Factory for creating appropriate path objects.

Automatically creates LocalPath or GCSPath based on the path string. Manages GCS client creation and credential handling.

gcs_client

Cached GCS client instance.

Example

>>> manager = PathManager()
>>> local = manager("/data/file.txt")
>>> isinstance(local, LocalPath)
True
>>> gcs = manager("gs://bucket/file.txt")
>>> isinstance(gcs, GCSPath)
True
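
The factory pattern here comes down to inspecting the path string and choosing a backend, with the GCS client shared across every GCS path the manager creates. A minimal sketch, assuming the client is created lazily on first use and then cached; all class names are hypothetical stand-ins, and the factory would in practice return something like google.cloud.storage.Client().

```python
import pathlib
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class LocalSketch:
    path: pathlib.Path


@dataclass
class GCSSketch:
    path: str
    client: Any


class PathManagerSketch:
    """Hypothetical factory: choose a backend from the path string."""

    def __init__(self, client_factory: Optional[Callable[[], Any]] = None):
        self._client_factory = client_factory
        self._client: Any = None

    @property
    def gcs_client(self) -> Any:
        # Assumption: the client is created on first use and then cached.
        if self._client is None and self._client_factory is not None:
            self._client = self._client_factory()
        return self._client

    def __call__(self, path: Union[str, pathlib.Path]) -> Union[LocalSketch, GCSSketch]:
        text = str(path)
        if text.startswith("gs://"):
            return GCSSketch(text, self.gcs_client)
        return LocalSketch(pathlib.Path(text))


calls = []


def make_client() -> object:
    calls.append(1)
    return object()  # stand-in for google.cloud.storage.Client()


manager = PathManagerSketch(client_factory=make_client)
a = manager("gs://bucket/a.txt")
b = manager("gs://bucket/b.txt")
print(type(manager("/data/file.txt")).__name__, a.client is b.client, len(calls))
# LocalSketch True 1
```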

property gcs_client

class eformer.paths.UniversalPath

Bases: ABC

Abstract base class for universal path operations.

Defines the interface for path operations that work across different storage backends. All concrete implementations must provide these methods.

This class follows the pathlib.Path API where possible to provide a familiar interface for Python developers.
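
The value of an abstract interface is that code written against it works with any backend, and Python's `abc` module refuses to instantiate a class whose abstract methods are not all implemented. A minimal sketch with a hypothetical three-method slice of the interface and a toy in-memory backend; `MiniUniversalPath`, `MemoryPath`, and `copy_file` are illustrative names, not part of eformer.

```python
from abc import ABC, abstractmethod


class MiniUniversalPath(ABC):
    """Hypothetical three-method slice of a backend-agnostic path interface."""

    @abstractmethod
    def exists(self) -> bool: ...

    @abstractmethod
    def read_bytes(self) -> bytes: ...

    @abstractmethod
    def write_bytes(self, data: bytes) -> None: ...


class MemoryPath(MiniUniversalPath):
    """Toy backend that keeps file contents in a caller-supplied dict."""

    def __init__(self, store: dict[str, bytes], key: str):
        self.store = store
        self.key = key

    def exists(self) -> bool:
        return self.key in self.store

    def read_bytes(self) -> bytes:
        if self.key not in self.store:
            raise FileNotFoundError(self.key)
        return self.store[self.key]

    def write_bytes(self, data: bytes) -> None:
        self.store[self.key] = bytes(data)


def copy_file(src: MiniUniversalPath, dst: MiniUniversalPath) -> None:
    # Works for any backend because it relies only on the abstract interface.
    dst.write_bytes(src.read_bytes())


store: dict[str, bytes] = {}
src, dst = MemoryPath(store, "a.bin"), MemoryPath(store, "b.bin")
src.write_bytes(b"hello")
copy_file(src, dst)
print(dst.exists(), dst.read_bytes())  # True b'hello'
```

Calling `MiniUniversalPath()` directly raises TypeError, which is how an abstract base class enforces that concrete implementations provide every method.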

abstract as_posix() → str

Return the string representation with forward slashes.

Returns

Path string with forward slashes as separators.

abstract exists() → bool

Check if the path exists.

Returns

True if the path exists, False otherwise.

abstract glob(pattern: str, recursive: bool = False) → Iterator[UniversalPath]

Find paths matching a glob pattern.

Parameters
  • pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).

  • recursive – If True, search recursively through subdirectories.

Yields

UniversalPath objects for each matching path.

abstract is_absolute() → bool

Return True if the path is absolute.

Returns

True if the path is absolute, False otherwise.

abstract is_dir() → bool

Check if the path is a directory.

Returns

True if the path is a directory, False otherwise.

abstract is_file() → bool

Check if the path is a file.

Returns

True if the path is a file, False otherwise.

abstract iterdir() → Iterator[UniversalPath]

Iterate over the contents of a directory.

Yields

UniversalPath objects for each item in the directory.

Raises

NotADirectoryError – If the path is not a directory.

abstract mkdir(parents: bool = True, exist_ok: bool = True) → None

Create directory at this path.

Parameters
  • parents – Create parent directories if needed.

  • exist_ok – Don’t raise an error if the directory already exists.

Raises

FileExistsError – If exist_ok is False and path exists.

abstract parts() → tuple[str, ...]

Return a tuple of the path components.

Returns

Tuple of individual path components.

abstract read_bytes() → bytes

Read binary content from the path.

Returns

The binary content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

abstract read_text(encoding: str = 'utf-8') → str

Read text content from the path.

Parameters

encoding – Text encoding to use.

Returns

The text content of the file.

Raises
  • FileNotFoundError – If the path doesn’t exist.

  • ValueError – If trying to read from a directory.

abstract relative_to(other: UniversalPath) → UniversalPath

Return a relative path from other to this path.

Parameters

other – Base path to compute relative path from.

Returns

Relative path from other to this path.

Raises

ValueError – If this path is not relative to other.

abstract rename(target: UniversalPath) → UniversalPath

Rename this path to the given target.

Parameters

target – New path name.

Returns

New path object pointing to target.

abstract resolve() → UniversalPath

Make the path absolute, resolving any symlinks.

Returns

Absolute path with symlinks resolved.

abstract rmdir() → None

Remove this directory.

The directory must be empty.

Raises
  • OSError – If the directory is not empty.

  • NotADirectoryError – If the path is not a directory.

abstract stat() → dict[str, Any]

Return file statistics.

Returns

Dictionary containing file metadata, such as size and modification time (mtime).

Raises

FileNotFoundError – If the path doesn’t exist.

abstract stem() → str

Return the final path component without its suffix.

Returns

The stem of the final path component.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'

abstract suffixes() → list[str]

Return a list of the path’s file suffixes.

Returns

List of suffixes including the leading dots.

Example

>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']

Remove this file or symbolic link.

Parameters

missing_ok – If True, don’t raise an error if the file doesn’t exist.

Raises

FileNotFoundError – If missing_ok is False and file doesn’t exist.

abstract with_name(name: str) → UniversalPath

Return a new path with the name changed.

Parameters

name – New name for the final path component.

Returns

New path with the name replaced.

abstract with_stem(stem: str) → UniversalPath

Return a new path with the stem changed.

Parameters

stem – New stem for the final path component.

Returns

New path with the stem replaced.

abstract with_suffix(suffix: str) → UniversalPath

Return a new path with the suffix changed.

Parameters

suffix – New suffix (including leading dot).

Returns

New path with the suffix replaced.

abstract write_bytes(data: bytes) → None

Write binary content to the path.

Parameters

data – Binary data to write.

Raises

ValueError – If trying to write to a directory.

abstract write_text(data: str, encoding: str = 'utf-8') → None

Write text content to the path.

Parameters
  • data – Text data to write.

  • encoding – Text encoding to use.

Raises

ValueError – If trying to write to a directory.

eformer.paths.is_local_path(path: Union[str, Path, UniversalPath]) → bool

Return True when a path points at the local filesystem.

eformer.paths.is_remote_path(path: Union[str, Path, UniversalPath]) → bool

Return True when a path points at a non-local backend.

eformer.paths.path_protocol(path: Union[str, Path, UniversalPath]) → str

Return the normalized protocol for a path-like input.

Plain local paths and file:// URLs normalize to "file". Remote URLs such as gs:// and s3:// return their scheme.
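
The normalization described above can be approximated with the standard library's URL parser: no scheme or file:// means "file", and anything else returns its scheme. A minimal sketch; the `*_sketch` functions are hypothetical re-creations, they accept only str or os.PathLike inputs (not UniversalPath objects), and treating a one-letter scheme as a Windows drive letter is an added assumption.

```python
import os
from urllib.parse import urlsplit


def path_protocol_sketch(path: "str | os.PathLike[str]") -> str:
    """Hypothetical re-creation of the normalization described above."""
    scheme = urlsplit(os.fspath(path)).scheme.lower()
    # No scheme, file://, or a one-letter "scheme" (a Windows drive such as C:) means local.
    if not scheme or scheme == "file" or len(scheme) == 1:
        return "file"
    return scheme


def is_local_sketch(path) -> bool:
    return path_protocol_sketch(path) == "file"


def is_remote_sketch(path) -> bool:
    return not is_local_sketch(path)


for p in ["data/model.pkl", "file:///tmp/x.npy", "gs://bucket/w.npy", "s3://bucket/w.npy"]:
    print(p, path_protocol_sketch(p), is_remote_sketch(p))
```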