eformer.paths
Universal path utilities for local and cloud storage.
Provides a unified API for working with paths across storage backends, including the local filesystem and Google Cloud Storage (GCS).
- Classes:
UniversalPath: Abstract base class for path operations.
LocalPath: Local filesystem path implementation.
GCSPath: Google Cloud Storage path implementation.
PathManager: Factory for creating appropriate path objects.
MLUtilPath: Extended path manager with ML utilities.
- Key Features:
Unified API for local and cloud storage
Transparent switching between storage backends
Support for JAX array and dictionary I/O
Recursive directory operations
Path manipulation and traversal
Example
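>>> import jax.numpy as jnp
>>> from eformer.paths import ePath
>>> data = b"serialized model bytes"
>>> array = jnp.ones((4, 4))
>>> # Local filesystem
>>> local_path = ePath("data/model.pkl")
>>> local_path.write_bytes(data)
>>> # Google Cloud Storage, same API
>>> gcs_path = ePath("gs://bucket/model.pkl")
>>> gcs_path.write_bytes(data)
>>> # JAX array I/O on either backend
>>> ePath.save_jax_array(array, "gs://bucket/weights.npy")
>>> loaded = ePath.load_jax_array("gs://bucket/weights.npy")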
- class eformer.paths.GCSPath(path: str, client: google.cloud.storage.client.Client | None = None)
Bases: UniversalPath
Google Cloud Storage path implementation.
Provides UniversalPath interface for Google Cloud Storage operations. Handles blob operations, bucket management, and directory emulation.
- path
Full GCS path string (gs://bucket/path).
- client
Google Cloud Storage client.
- bucket_name
Name of the GCS bucket.
- blob_name
Path within the bucket.
Example
>>> path = GCSPath("gs://my-bucket/data/model.pkl")
>>> path.exists()
True
>>> path.write_bytes(model_bytes)
>>> for item in path.parent.iterdir():
...     print(item.name)
- as_posix() -> str
Return the string representation with forward slashes.
- Returns
Path string with forward slashes as separators.
- property blob
- property bucket
- glob(pattern: str, recursive: bool = False) -> Iterator[GCSPath]
Find paths matching a glob pattern.
- Parameters
pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).
recursive – If True, search recursively through subdirectories.
- Yields
UniversalPath objects for each matching path.
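GCS has no real directories, so a glob over a GCSPath has to be evaluated against flat object names. The sketch below is a hypothetical illustration, not eformer's implementation: `glob_names` is an invented helper that filters a list of object names under a prefix, and it assumes `recursive=True` means "match the pattern against the tail of the name at any depth", using `PurePosixPath.match`, which anchors relative patterns at the right end.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def glob_names(
    blob_names: list[str], prefix: str, pattern: str, recursive: bool = False
) -> list[str]:
    """Hypothetical helper: match flat object names under `prefix` against `pattern`."""
    prefix = prefix.rstrip("/") + "/" if prefix else ""
    pattern_depth = len(PurePosixPath(pattern).parts)
    matches = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue  # object lives outside the emulated "directory"
        rel = PurePosixPath(name[len(prefix):])
        if recursive:
            # PurePath.match anchors a relative pattern at the right end,
            # so "*.py" also matches "pkg/b.py" at any depth.
            ok = rel.match(pattern)
        else:
            # Direct matches only: same number of components as the pattern.
            ok = len(rel.parts) == pattern_depth and rel.match(pattern)
        if ok:
            matches.append(name)
    return matches


blobs = ["src/a.py", "src/pkg/b.py", "src/readme.txt", "other/c.py"]
print(glob_names(blobs, "src", "*.py"))                  # ['src/a.py']
print(glob_names(blobs, "src", "*.py", recursive=True))  # ['src/a.py', 'src/pkg/b.py']
```

A real GCS backend would list only the objects under the prefix server-side rather than filtering a full list in memory.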
- is_absolute() -> bool
Return True if the path is absolute.
- Returns
True if the path is absolute, False otherwise.
- is_dir() -> bool
Check if the path is a directory.
- Returns
True if the path is a directory, False otherwise.
- is_file() -> bool
Check if the path is a file.
- Returns
True if the path is a file, False otherwise.
- iterdir() -> Iterator[GCSPath]
Iterate over the contents of a directory.
- Yields
UniversalPath objects for each item in the directory.
- Raises
NotADirectoryError – If the path is not a directory.
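The "directory emulation" mentioned in the class description can be pictured as prefix listing: a directory is just a shared name prefix ending in "/". The google-cloud-storage client can do this server-side by listing blobs with a `prefix` and `delimiter="/"`; the hypothetical `list_children` helper below does the same client-side over a plain list of names so it runs without credentials. It is a sketch under that assumption, not eformer's code.

```python
from __future__ import annotations


def list_children(blob_names: list[str], prefix: str) -> list[str]:
    """Hypothetical helper: immediate children of an emulated GCS "directory"."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # a directory is just a name prefix ending in "/"
    if not any(name.startswith(prefix) for name in blob_names):
        raise NotADirectoryError(prefix)  # no object lives under this prefix
    children: set[str] = set()
    for name in blob_names:
        if not name.startswith(prefix) or name == prefix:
            continue  # outside the prefix, or a zero-byte "folder" placeholder
        rest = name[len(prefix):]
        # Keep only the first component: "ckpt/step_1.npy" -> "ckpt".
        children.add(rest.split("/", 1)[0])
    return sorted(children)


blobs = ["data/model.pkl", "data/ckpt/step_1.npy", "data/ckpt/step_2.npy", "logs/run.txt"]
print(list_children(blobs, "data"))  # ['ckpt', 'model.pkl']
```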
- mkdir(parents: bool = True, exist_ok: bool = True) -> None
Create a directory at this path.
- Parameters
parents – Create parent directories if needed.
exist_ok – Don’t raise an error if the directory already exists.
- Raises
FileExistsError – If exist_ok is False and the path already exists.
- property name: str
- parts() -> tuple[str, ...]
Return a tuple of the path components.
- Returns
Tuple of individual path components.
- read_bytes() -> bytes
Read binary content from the path.
- Returns
The binary content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- read_text(encoding: str = 'utf-8') -> str
Read text content from the path.
- Parameters
encoding – Text encoding to use.
- Returns
The text content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- relative_to(other: GCSPath) -> GCSPath
Return a relative path from other to this path.
- Parameters
other – Base path to compute relative path from.
- Returns
Relative path from other to this path.
- Raises
ValueError – If this path is not relative to other.
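For GCS paths, a relative path only makes sense within a single bucket. A minimal sketch, assuming relative_to compares the bucket first and then the object names with pathlib semantics; `gcs_relative_to` and `_split_gcs` are hypothetical helpers, not eformer functions.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def _split_gcs(url: str) -> tuple[str, PurePosixPath]:
    if not url.startswith("gs://"):
        raise ValueError(f"not a GCS URL: {url!r}")
    bucket, _, blob_name = url[len("gs://"):].partition("/")
    return bucket, PurePosixPath(blob_name)


def gcs_relative_to(path: str, other: str) -> str:
    """Hypothetical helper: object path of `path` relative to `other`."""
    bucket, blob = _split_gcs(path)
    other_bucket, other_blob = _split_gcs(other)
    if bucket != other_bucket:
        raise ValueError(f"{path!r} is not relative to {other!r}")
    # PurePosixPath.relative_to raises ValueError if `blob` is not under `other_blob`.
    return blob.relative_to(other_blob).as_posix()


print(gcs_relative_to("gs://bucket/data/ckpt/step_1.npy", "gs://bucket/data"))  # ckpt/step_1.npy
```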
- rename(target: GCSPath) -> GCSPath
Rename this path to the given target.
- Parameters
target – New path name.
- Returns
New path object pointing to target.
- resolve() -> GCSPath
Make the path absolute, resolving any symlinks.
- Returns
Absolute path with symlinks resolved.
- rmdir() -> None
Remove this directory.
The directory must be empty.
- Raises
OSError – If the directory is not empty.
NotADirectoryError – If the path is not a directory.
- stat() -> dict[str, Any]
Return file statistics.
- Returns
Dictionary containing file metadata such as size, mtime, etc.
- Raises
FileNotFoundError – If the path doesn’t exist.
- stem() -> str
Return the final path component without its suffix.
- Returns
The stem of the final path component.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'
- property suffix: str
- suffixes() -> list[str]
Return a list of the path’s file suffixes.
- Returns
List of suffixes including the leading dots.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']
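The stem and suffixes examples above use a LocalPath. For a GCS object, the same rules presumably apply to the object name, which always uses "/" separators. This sketch assumes pathlib semantics and uses a hypothetical `gcs_name_parts` helper rather than GCSPath itself.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def gcs_name_parts(gcs_url: str) -> dict[str, object]:
    """Hypothetical helper: pathlib-style name components of a gs:// object."""
    if not gcs_url.startswith("gs://"):
        raise ValueError(f"not a GCS URL: {gcs_url!r}")
    _bucket, _, blob_name = gcs_url[len("gs://"):].partition("/")
    p = PurePosixPath(blob_name)  # object names always use "/" separators
    return {"name": p.name, "stem": p.stem, "suffix": p.suffix, "suffixes": p.suffixes}


parts = gcs_name_parts("gs://bucket/data/model.tar.gz")
print(parts)  # {'name': 'model.tar.gz', 'stem': 'model.tar', 'suffix': '.gz', 'suffixes': ['.tar', '.gz']}
```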
- unlink(missing_ok: bool = False) -> None
Remove this file or symbolic link.
- Parameters
missing_ok – If True, don’t raise an error if the file doesn’t exist.
- Raises
FileNotFoundError – If missing_ok is False and the file doesn’t exist.
- with_name(name: str) -> GCSPath
Return a new path with the name changed.
- Parameters
name – New name for the final path component.
- Returns
New path with the name replaced.
- with_stem(stem: str) -> GCSPath
Return a new path with the stem changed.
- Parameters
stem – New stem for the final path component.
- Returns
New path with the stem replaced.
- with_suffix(suffix: str) -> GCSPath
Return a new path with the suffix changed.
- Parameters
suffix – New suffix (including leading dot).
- Returns
New path with the suffix replaced.
- class eformer.paths.LocalPath(path: str | pathlib.Path)
Bases: UniversalPath
Local filesystem path implementation.
Wraps pathlib.Path to provide the UniversalPath interface for local filesystem operations.
- path
The underlying pathlib.Path object.
Example
>>> path = LocalPath("/data/model.pkl")
>>> path.exists()
True
>>> path.parent
LocalPath('/data')
>>> (path.parent / "config.json").write_text(config)
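Wrapping pathlib.Path mostly means delegating each call and re-wrapping any returned path, so that `path.parent` and `/` keep producing wrapper objects, as in the example above. The class below is a hypothetical minimal wrapper (`LocalPathSketch`), not eformer's LocalPath; its repr imitates the `LocalPath('/data')` form shown above.

```python
from __future__ import annotations

import pathlib
import tempfile


class LocalPathSketch:
    """Hypothetical minimal wrapper that delegates to pathlib.Path."""

    def __init__(self, path: str | pathlib.Path):
        self.path = pathlib.Path(path)

    def __truediv__(self, other: str) -> LocalPathSketch:
        return LocalPathSketch(self.path / other)  # re-wrap the joined path

    @property
    def parent(self) -> LocalPathSketch:
        return LocalPathSketch(self.path.parent)

    def exists(self) -> bool:
        return self.path.exists()

    def write_text(self, data: str, encoding: str = "utf-8") -> int:
        return self.path.write_text(data, encoding=encoding)

    def read_text(self, encoding: str = "utf-8") -> str:
        return self.path.read_text(encoding=encoding)

    def as_posix(self) -> str:
        return self.path.as_posix()

    def __repr__(self) -> str:
        return f"LocalPath({self.as_posix()!r})"


with tempfile.TemporaryDirectory() as tmp:
    cfg = LocalPathSketch(tmp) / "config.json"
    cfg.write_text('{"lr": 0.001}')
    roundtrip = cfg.read_text()

print(LocalPathSketch("/data/model.pkl").parent)  # LocalPath('/data')
```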
- as_posix() -> str
Return the string representation with forward slashes.
- Returns
Path string with forward slashes as separators.
- glob(pattern: str, recursive: bool = False) -> Iterator[LocalPath]
Find paths matching a glob pattern.
- Parameters
pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).
recursive – If True, search recursively through subdirectories.
- Yields
UniversalPath objects for each matching path.
- is_absolute() -> bool
Return True if the path is absolute.
- Returns
True if the path is absolute, False otherwise.
- is_dir() -> bool
Check if the path is a directory.
- Returns
True if the path is a directory, False otherwise.
- is_file() -> bool
Check if the path is a file.
- Returns
True if the path is a file, False otherwise.
- iterdir() -> Iterator[LocalPath]
Iterate over the contents of a directory.
- Yields
UniversalPath objects for each item in the directory.
- Raises
NotADirectoryError – If the path is not a directory.
- mkdir(parents: bool = True, exist_ok: bool = True) -> None
Create a directory at this path.
- Parameters
parents – Create parent directories if needed.
exist_ok – Don’t raise an error if the directory already exists.
- Raises
FileExistsError – If exist_ok is False and the path already exists.
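The documented defaults (`parents=True`, `exist_ok=True`) are more permissive than `pathlib.Path.mkdir`, whose `parents` and `exist_ok` both default to False. A minimal sketch of delegating to pathlib while keeping the UniversalPath defaults; `mkdir_with_universal_defaults` is a hypothetical helper.

```python
import pathlib
import tempfile


def mkdir_with_universal_defaults(path, parents: bool = True, exist_ok: bool = True) -> None:
    """Hypothetical helper: pathlib.Path.mkdir with UniversalPath's defaults."""
    pathlib.Path(path).mkdir(parents=parents, exist_ok=exist_ok)


with tempfile.TemporaryDirectory() as tmp:
    nested = pathlib.Path(tmp) / "runs" / "exp1" / "ckpt"
    mkdir_with_universal_defaults(nested)  # creates the missing parents
    mkdir_with_universal_defaults(nested)  # second call is a no-op
    created = nested.is_dir()
    try:
        mkdir_with_universal_defaults(nested, exist_ok=False)
        strict_raised = False
    except FileExistsError:
        strict_raised = True  # strict mode refuses an existing directory
```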
- property name: str
- parts() -> tuple[str, ...]
Return a tuple of the path components.
- Returns
Tuple of individual path components.
- read_bytes() -> bytes
Read binary content from the path.
- Returns
The binary content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- read_text(encoding: str = 'utf-8') -> str
Read text content from the path.
- Parameters
encoding – Text encoding to use.
- Returns
The text content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- relative_to(other: LocalPath) -> LocalPath
Return a relative path from other to this path.
- Parameters
other – Base path to compute relative path from.
- Returns
Relative path from other to this path.
- Raises
ValueError – If this path is not relative to other.
- rename(target: LocalPath) -> LocalPath
Rename this path to the given target.
- Parameters
target – New path name.
- Returns
New path object pointing to target.
- resolve() -> LocalPath
Make the path absolute, resolving any symlinks.
- Returns
Absolute path with symlinks resolved.
- rmdir() -> None
Remove this directory.
The directory must be empty.
- Raises
OSError – If the directory is not empty.
NotADirectoryError – If the path is not a directory.
- stat() -> dict[str, Any]
Return file statistics.
- Returns
Dictionary containing file metadata such as size, mtime, etc.
- Raises
FileNotFoundError – If the path doesn’t exist.
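Unlike pathlib, stat() is documented to return a plain dictionary rather than an `os.stat_result`. A minimal sketch of that conversion for the local backend; the helper name `stat_dict` and the key names (`size`, `mtime`, `is_dir`) are assumptions, since the reference only says "such as size, mtime, etc."

```python
from __future__ import annotations

import os
import stat
import tempfile
from typing import Any


def stat_dict(path: str) -> dict[str, Any]:
    """Hypothetical helper: flatten os.stat_result into a plain dict."""
    st = os.stat(path)  # raises FileNotFoundError for a missing path
    return {
        "size": st.st_size,                  # bytes
        "mtime": st.st_mtime,                # seconds since the epoch
        "is_dir": stat.S_ISDIR(st.st_mode),  # directory bit from the mode
    }


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.bin")
    with open(target, "wb") as fh:
        fh.write(b"12345")
    info = stat_dict(target)
```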
- stem() -> str
Return the final path component without its suffix.
- Returns
The stem of the final path component.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'
- property suffix: str
- suffixes() -> list[str]
Return a list of the path’s file suffixes.
- Returns
List of suffixes including the leading dots.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']
- unlink(missing_ok: bool = False) -> None
Remove this file or symbolic link.
- Parameters
missing_ok – If True, don’t raise an error if the file doesn’t exist.
- Raises
FileNotFoundError – If missing_ok is False and the file doesn’t exist.
- with_name(name: str) -> LocalPath
Return a new path with the name changed.
- Parameters
name – New name for the final path component.
- Returns
New path with the name replaced.
- with_stem(stem: str) -> LocalPath
Return a new path with the stem changed.
- Parameters
stem – New stem for the final path component.
- Returns
New path with the stem replaced.
- with_suffix(suffix: str) -> LocalPath
Return a new path with the suffix changed.
- Parameters
suffix – New suffix (including leading dot).
- Returns
New path with the suffix replaced.
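Because LocalPath wraps pathlib.Path, with_name, with_stem, and with_suffix presumably follow pathlib's rules, where only the last extension counts as "the" suffix. That matters for multi-part extensions such as .tar.gz. The snippet below calls the standard library directly to show those rules; it does not call eformer.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/data/model.tar.gz")
renamed = p.with_name("config.json")  # whole final component replaced
restemmed = p.with_stem("weights")    # stem is "model.tar", so only ".gz" is kept
resuffixed = p.with_suffix(".zip")    # only the last suffix, ".gz", is replaced
print(renamed, restemmed, resuffixed)  # /data/config.json /data/weights.gz /data/model.tar.zip
```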
- class eformer.paths.MLUtilPath(gcs_client: google.cloud.storage.client.Client | None = None, gcs_credentials_path: str | None = None)
Bases: PathManager
Extended path manager with ML-specific utilities.
Adds JAX array and dictionary I/O operations to the base PathManager. Supports various serialization formats and handles JAX/NumPy conversions.
Example
>>> path_manager = MLUtilPath()
>>> path_manager.save_jax_array(array, "gs://bucket/weights.npy")
>>> loaded = path_manager.load_jax_array("gs://bucket/weights.npy")
>>> path_manager.save_dict({"weights": weights}, "config.json")
- copy_tree(src: str | eformer.paths.UniversalPath, dst: str | eformer.paths.UniversalPath) -> None
Copy entire directory tree between local and GCS.
Recursively copies all files and directories from source to destination. Works across different storage backends (local to GCS, GCS to local, etc.).
- Parameters
src – Source path (directory or file).
dst – Destination path.
Example
>>> manager = MLUtilPath()
>>> manager.copy_tree("data/", "gs://bucket/data/")
>>> manager.copy_tree("gs://bucket/model/", "local_model/")
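A recursive copy needs only a few operations: is_file, iterdir, mkdir, and a byte-level read and write. The sketch below uses pathlib on both ends so it runs without GCS credentials; in a cross-backend version, `src` and `dst` would be UniversalPath objects instead. It assumes a `write_bytes` method, which the module-level example uses but this reference does not list. `copy_tree_sketch` is hypothetical, and reading whole files into memory is a simplification that large checkpoints may not tolerate.

```python
from __future__ import annotations

import pathlib
import tempfile


def copy_tree_sketch(src: pathlib.Path, dst: pathlib.Path) -> None:
    """Hypothetical recursive copy built from is_file/iterdir/mkdir/read/write."""
    if src.is_file():
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(src.read_bytes())  # whole-file copy; fine for small files
        return
    dst.mkdir(parents=True, exist_ok=True)
    for child in src.iterdir():
        copy_tree_sketch(child, dst / child.name)  # recurse, preserving names


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "data" / "sub").mkdir(parents=True)
    (root / "data" / "a.txt").write_text("alpha")
    (root / "data" / "sub" / "b.txt").write_text("beta")
    copy_tree_sketch(root / "data", root / "backup")
    backup = root / "backup"
    copied = sorted(p.relative_to(backup).as_posix() for p in backup.rglob("*"))
    b_text = (backup / "sub" / "b.txt").read_text()
```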
- load_dict(path: str | eformer.paths.UniversalPath, format: str = 'json') -> dict[str, Any]
Load dictionary from various formats.
- Parameters
path – Source path (local or GCS).
format – Serialization format (‘json’ or ‘pickle’).
- Returns
Loaded dictionary.
- Raises
ValueError – If format is not supported.
FileNotFoundError – If path doesn’t exist.
Example
>>> config = manager.load_dict("config.json")
>>> data = manager.load_dict("gs://bucket/data.pkl", "pickle")
- load_jax_array(path: str | eformer.paths.UniversalPath, format: str = 'npy') -> Array
Load JAX array from various formats.
- Parameters
path – Source path (local or GCS).
format – Serialization format (‘npy’ or ‘pickle’).
- Returns
Loaded JAX array.
- Raises
ValueError – If format is not supported.
FileNotFoundError – If path doesn’t exist.
Example
>>> weights = manager.load_jax_array("weights.npy")
>>> biases = manager.load_jax_array("gs://bucket/biases.pkl", "pickle")
- save_dict(data: dict[str, Any], path: str | eformer.paths.UniversalPath, format: str = 'json') -> None
Save dictionary in various formats.
- Parameters
data – Dictionary to save. Values may include JAX arrays, which are converted to lists when saving as JSON.
path – Destination path (local or GCS).
format – Serialization format (‘json’ or ‘pickle’).
- Raises
ValueError – If format is not supported.
Example
>>> manager.save_dict({"weights": [1, 2, 3]}, "config.json")
>>> manager.save_dict(complex_data, "gs://bucket/data.pkl", "pickle")
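The format switch in save_dict and load_dict presumably happens on bytes before they are written to, or after they are read from, a local or GCS path. This sketch covers only that serialization half, with hypothetical helpers `dumps_dict` and `loads_dict`. It uses the standard library's `array.array` as a stand-in for a JAX array because both expose `.tolist()`; treating any object with `.tolist()` as array-like is an assumption, not documented behavior. JSON round-trips arrays as plain lists, while pickle preserves the original objects but should only be loaded from trusted sources.

```python
from __future__ import annotations

import json
import pickle
from array import array
from typing import Any


def _to_json_compatible(obj: Any) -> Any:
    # Array-like values (anything with .tolist(), e.g. NumPy or JAX arrays) become lists.
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dumps_dict(data: dict[str, Any], format: str = "json") -> bytes:
    if format == "json":
        return json.dumps(data, default=_to_json_compatible).encode("utf-8")
    if format == "pickle":
        return pickle.dumps(data)
    raise ValueError(f"Unsupported format: {format!r}")


def loads_dict(raw: bytes, format: str = "json") -> dict[str, Any]:
    if format == "json":
        return json.loads(raw.decode("utf-8"))
    if format == "pickle":
        return pickle.loads(raw)  # only unpickle data you trust
    raise ValueError(f"Unsupported format: {format!r}")


example = {"weights": array("i", [1, 2, 3]), "lr": 0.001}
restored = loads_dict(dumps_dict(example))
print(restored)  # {'weights': [1, 2, 3], 'lr': 0.001}
```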
- save_jax_array(array: Array, path: str | eformer.paths.UniversalPath, format: str = 'npy') -> None
Save JAX array in various formats.
- Parameters
array – JAX array to save.
path – Destination path (local or GCS).
format – Serialization format (‘npy’ or ‘pickle’).
- Raises
ValueError – If format is not supported.
Example
>>> manager.save_jax_array(weights, "weights.npy")
>>> manager.save_jax_array(biases, "gs://bucket/biases.pkl", "pickle")
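One plausible way to support both 'npy' and 'pickle' on any backend is to serialize the array to bytes in memory first, then hand the bytes to the path object. This sketch assumes that design and uses NumPy for the serialization step; `array_to_bytes` and `array_from_bytes` are hypothetical helpers. A `jax.Array` converts to a host NumPy array through `np.asarray`, and in a JAX program the loaded result would be wrapped back with `jax.numpy.asarray`, a final step omitted here so the sketch runs without JAX.

```python
from __future__ import annotations

import io
import pickle

import numpy as np


def array_to_bytes(arr, format: str = "npy") -> bytes:
    host = np.asarray(arr)  # a jax.Array is copied to a host NumPy array here
    if format == "npy":
        buf = io.BytesIO()
        np.save(buf, host, allow_pickle=False)  # .npy keeps dtype and shape
        return buf.getvalue()
    if format == "pickle":
        return pickle.dumps(host)
    raise ValueError(f"Unsupported format: {format!r}")


def array_from_bytes(raw: bytes, format: str = "npy") -> np.ndarray:
    if format == "npy":
        return np.load(io.BytesIO(raw), allow_pickle=False)
    if format == "pickle":
        return pickle.loads(raw)  # only unpickle data you trust
    raise ValueError(f"Unsupported format: {format!r}")


weights = np.arange(6, dtype=np.float32).reshape(2, 3)
restored = array_from_bytes(array_to_bytes(weights))
```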
- class eformer.paths.PathManager(gcs_client: google.cloud.storage.client.Client | None = None, gcs_credentials_path: str | None = None)
Bases: object
Factory for creating appropriate path objects.
Automatically creates LocalPath or GCSPath based on the path string. Manages GCS client creation and credential handling.
- gcs_client
Cached GCS client instance.
Example
>>> manager = PathManager()
>>> local = manager("/data/file.txt")
>>> isinstance(local, LocalPath)
True
>>> gcs = manager("gs://bucket/file.txt")
>>> isinstance(gcs, GCSPath)
True
- property gcs_client
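The factory behavior described above can be sketched as a callable that inspects the URL scheme and creates the GCS client lazily on first use, which is one plausible reading of "cached GCS client instance". `LocalStub`, `GCSStub`, and `PathManagerSketch` are hypothetical stand-ins; the real constructor accepts a google.cloud.storage Client or a credentials path, which the sketch replaces with an injected factory so it runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class LocalStub:
    path: str


@dataclass
class GCSStub:
    path: str
    client: Any = None


class PathManagerSketch:
    """Hypothetical factory: picks a path class from the URL scheme."""

    def __init__(self, client_factory: Optional[Callable[[], Any]] = None):
        self._client_factory = client_factory
        self._client = None

    @property
    def gcs_client(self) -> Any:
        # Create the client lazily on first GCS use, then reuse the cached one.
        if self._client is None and self._client_factory is not None:
            self._client = self._client_factory()
        return self._client

    def __call__(self, path: str) -> LocalStub | GCSStub:
        path = str(path)
        if path.startswith("gs://"):
            return GCSStub(path, client=self.gcs_client)
        return LocalStub(path)


created = []
manager = PathManagerSketch(client_factory=lambda: created.append("client") or object())
local = manager("/data/file.txt")
first = manager("gs://bucket/a.txt")
second = manager("gs://bucket/b.txt")
```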
- class eformer.paths.UniversalPath
Bases: ABC
Abstract base class for universal path operations.
Defines the interface for path operations that work across different storage backends. All concrete implementations must provide these methods.
This class follows the pathlib.Path API where possible to provide a familiar interface for Python developers.
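Because the base class is an ABC, a backend that forgets any abstract method cannot even be instantiated. The sketch below shows that contract on a hypothetical, trimmed-down interface with three of the documented methods (`MiniUniversalPath`), plus an invented in-memory backend (`MemoryPath`) of the kind that can be handy in tests. None of these classes are part of eformer.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class MiniUniversalPath(ABC):
    """Hypothetical, trimmed-down version of the interface (3 of its methods)."""

    @abstractmethod
    def exists(self) -> bool: ...

    @abstractmethod
    def read_bytes(self) -> bytes: ...

    @abstractmethod
    def as_posix(self) -> str: ...


class MemoryPath(MiniUniversalPath):
    """Hypothetical in-memory backend: "files" live in a shared dict."""

    def __init__(self, path: str, store: dict[str, bytes]):
        self.path = path
        self.store = store

    def exists(self) -> bool:
        return self.path in self.store

    def read_bytes(self) -> bytes:
        if self.path not in self.store:
            raise FileNotFoundError(self.path)
        return self.store[self.path]

    def as_posix(self) -> str:
        return self.path


class IncompletePath(MiniUniversalPath):
    def exists(self) -> bool:  # read_bytes and as_posix are missing
        return False


def instantiates_without_args(cls: type) -> bool:
    try:
        cls()
    except TypeError:  # ABCs refuse to instantiate while abstract methods remain
        return False
    return True


store = {"weights.bin": b"\x00\x01"}
p = MemoryPath("weights.bin", store)
```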
- abstract as_posix() -> str
Return the string representation with forward slashes.
- Returns
Path string with forward slashes as separators.
- abstract exists() -> bool
Check if the path exists.
- Returns
True if the path exists, False otherwise.
- abstract glob(pattern: str, recursive: bool = False) -> Iterator[UniversalPath]
Find paths matching a glob pattern.
- Parameters
pattern – Glob pattern to match (e.g., “*.txt”, “**/*.py”).
recursive – If True, search recursively through subdirectories.
- Yields
UniversalPath objects for each matching path.
- abstract is_absolute() -> bool
Return True if the path is absolute.
- Returns
True if the path is absolute, False otherwise.
- abstract is_dir() -> bool
Check if the path is a directory.
- Returns
True if the path is a directory, False otherwise.
- abstract is_file() -> bool
Check if the path is a file.
- Returns
True if the path is a file, False otherwise.
- abstract iterdir() -> Iterator[UniversalPath]
Iterate over the contents of a directory.
- Yields
UniversalPath objects for each item in the directory.
- Raises
NotADirectoryError – If the path is not a directory.
- abstract mkdir(parents: bool = True, exist_ok: bool = True) -> None
Create a directory at this path.
- Parameters
parents – Create parent directories if needed.
exist_ok – Don’t raise an error if the directory already exists.
- Raises
FileExistsError – If exist_ok is False and the path already exists.
- abstract parts() -> tuple[str, ...]
Return a tuple of the path components.
- Returns
Tuple of individual path components.
- abstract read_bytes() -> bytes
Read binary content from the path.
- Returns
The binary content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- abstract read_text(encoding: str = 'utf-8') -> str
Read text content from the path.
- Parameters
encoding – Text encoding to use.
- Returns
The text content of the file.
- Raises
FileNotFoundError – If the path doesn’t exist.
ValueError – If trying to read from a directory.
- abstract relative_to(other: UniversalPath) -> UniversalPath
Return a relative path from other to this path.
- Parameters
other – Base path to compute relative path from.
- Returns
Relative path from other to this path.
- Raises
ValueError – If this path is not relative to other.
- abstract rename(target: UniversalPath) -> UniversalPath
Rename this path to the given target.
- Parameters
target – New path name.
- Returns
New path object pointing to target.
- abstract resolve() -> UniversalPath
Make the path absolute, resolving any symlinks.
- Returns
Absolute path with symlinks resolved.
- abstract rmdir() -> None
Remove this directory.
The directory must be empty.
- Raises
OSError – If the directory is not empty.
NotADirectoryError – If the path is not a directory.
- abstract stat() -> dict[str, Any]
Return file statistics.
- Returns
Dictionary containing file metadata such as size, mtime, etc.
- Raises
FileNotFoundError – If the path doesn’t exist.
- abstract stem() -> str
Return the final path component without its suffix.
- Returns
The stem of the final path component.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.stem()
'model.tar'
- abstract suffixes() -> list[str]
Return a list of the path’s file suffixes.
- Returns
List of suffixes including the leading dots.
Example
>>> path = LocalPath("/data/model.tar.gz")
>>> path.suffixes()
['.tar', '.gz']
- abstract unlink(missing_ok: bool = False) -> None
Remove this file or symbolic link.
- Parameters
missing_ok – If True, don’t raise an error if the file doesn’t exist.
- Raises
FileNotFoundError – If missing_ok is False and the file doesn’t exist.
- abstract with_name(name: str) -> UniversalPath
Return a new path with the name changed.
- Parameters
name – New name for the final path component.
- Returns
New path with the name replaced.
- abstract with_stem(stem: str) -> UniversalPath
Return a new path with the stem changed.
- Parameters
stem – New stem for the final path component.
- Returns
New path with the stem replaced.
- abstract with_suffix(suffix: str) -> UniversalPath
Return a new path with the suffix changed.
- Parameters
suffix – New suffix (including leading dot).
- Returns
New path with the suffix replaced.
- eformer.paths.is_local_path(path: Union[str, Path, UniversalPath]) -> bool
Return True when a path points at the local filesystem.
- eformer.paths.is_remote_path(path: Union[str, Path, UniversalPath]) -> bool
Return True when a path points at a non-local backend.
- eformer.paths.path_protocol(path: Union[str, Path, UniversalPath]) -> str
Return the normalized protocol for a path-like input.
Plain local paths and file:// URLs normalize to "file". Remote URLs such as gs:// and s3:// return their scheme.
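The normalization rules above can be approximated with `urllib.parse.urlsplit`. This is a hypothetical re-implementation (`path_protocol_sketch`, `is_local_sketch`, `is_remote_sketch`), not eformer's code: it accepts strings and `os.PathLike` objects only, not UniversalPath instances, and treating a one-letter scheme as a Windows drive letter (so `C:\...` counts as local) is an assumption.

```python
from __future__ import annotations

import os
from urllib.parse import urlsplit


def path_protocol_sketch(path: str | os.PathLike[str]) -> str:
    """Hypothetical re-implementation of the documented normalization rules."""
    scheme = urlsplit(os.fspath(path)).scheme.lower()
    # No scheme, or a one-letter "scheme" that is really a Windows drive (C:\...).
    if len(scheme) <= 1 or scheme == "file":
        return "file"
    return scheme


def is_local_sketch(path: str | os.PathLike[str]) -> bool:
    return path_protocol_sketch(path) == "file"


def is_remote_sketch(path: str | os.PathLike[str]) -> bool:
    return not is_local_sketch(path)


print(path_protocol_sketch("gs://bucket/model.pkl"))  # gs
```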