cached-path 1.6.3
A file utility for accessing both local and remote files through a unified interface
pip install cached-path
Requires Python
>3.8
Dependencies
- requests <3.0,>=2.0
- rich <14.0,>=12.1
- filelock <3.14,>=3.4
- boto3 <2.0,>=1.0
- google-cloud-storage <3.0,>=1.32.0
- huggingface-hub <0.24.0,>=0.8.1
- beaker-py <2.0,>=1.13.2; extra == "dev"
- ruff; extra == "dev"
- mypy <2.0,>=1.6.0; extra == "dev"
- black <25.0,>=23.1.0; extra == "dev"
- isort <6.0,>=5.12.0; extra == "dev"
- pytest; extra == "dev"
- flaky; extra == "dev"
- twine >=1.11.0; extra == "dev"
- setuptools; extra == "dev"
- wheel; extra == "dev"
- build; extra == "dev"
- responses ==0.21.0; extra == "dev"
- Sphinx <8.0,>=6.0; extra == "dev"
- furo ==2024.1.29; extra == "dev"
- myst-parser <3.0,>=1.0.0; extra == "dev"
- sphinx-copybutton ==0.5.2; extra == "dev"
- sphinx-autobuild ==2021.3.14; extra == "dev"
- sphinx-autodoc-typehints; extra == "dev"
- packaging; extra == "dev"
cached-path
A file utility library that provides a unified, simple interface for accessing both local and remote files. This can be used behind other APIs that need to access files agnostic to where they are located.
Installation
cached-path requires Python 3.8 or later.
Installing with pip
cached-path is available on PyPI. Just run
pip install cached-path
Installing from source
To install cached-path from source, first clone the repository:
git clone https://github.com/allenai/cached_path.git
cd cached_path
Then run
pip install -e .
Usage
from cached_path import cached_path
Given something that might be a URL or local path, cached_path()
determines which.
If it's a remote resource, it downloads the file and caches it to the cache directory, and
then returns the path to the cached file. If it's already a local path,
it makes sure the file exists and returns the path.
For URLs, http://, https://, s3:// (AWS S3), gs:// (Google Cloud Storage), and hf:// (HuggingFace Hub) are all supported out-of-the-box.
beaker:// URLs in the form of beaker://{user_name}/{dataset_name}/{file_path} are optionally supported as well, provided beaker-py is installed.
For example, to download the PyTorch weights for the model epwalsh/bert-xsmall-dummy
on HuggingFace, you could do:
cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")
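The choice between the remote and local branches described above can be illustrated with a small standard-library sketch. This is not the library's implementation: is_remote() and describe() are hypothetical helpers, and deciding by URL scheme alone is an assumption made for illustration.

```python
from urllib.parse import urlparse

# Schemes the README lists as supported; beaker:// additionally needs beaker-py.
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "hf", "beaker"}


def is_remote(url_or_filename: str) -> bool:
    """Hypothetical helper: True if the string has one of the listed URL schemes."""
    return urlparse(url_or_filename).scheme in REMOTE_SCHEMES


def describe(url_or_filename: str) -> str:
    """Hypothetical helper: name the branch a resource would take."""
    if is_remote(url_or_filename):
        return "remote: download into the cache, return the cached path"
    return "local: check that the file exists, return the path"


if __name__ == "__main__":
    for resource in (
        "hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin",
        "https://example.com/data.txt",
        "model.tar.gz",
    ):
        print(resource, "->", describe(resource))
```

Anything without one of the listed schemes, including plain relative or absolute paths, falls through to the local branch in this sketch.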
For paths or URLs that point to a tarfile or zipfile, you can also append the path of a specific file inside the archive to url_or_filename, preceded by a "!".
Provided you set extract_archive to True, the archive is extracted automatically and the local path to that specific file is returned. For example:
cached_path("model.tar.gz!weights.th", extract_archive=True)
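To make the "!" syntax concrete, here is a minimal sketch of how such a string could be separated into the archive location and the file inside it. split_archive_member() is a hypothetical helper, and splitting on the first "!" is an assumption; the library's own parsing may differ.

```python
from typing import Optional, Tuple


def split_archive_member(url_or_filename: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'archive!member' into its two parts.

    Returns (archive, member); member is None when no "!" suffix is given.
    """
    archive, sep, member = url_or_filename.partition("!")
    if sep and member:
        return archive, member
    return archive, None


if __name__ == "__main__":
    print(split_archive_member("model.tar.gz!weights.th"))
    print(split_archive_member("model.tar.gz"))
```

The first element is what would be downloaded or located and extracted; the second is the path to look up inside the extracted directory.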
Cache directory
By default the cache directory is ~/.cache/cached_path/, however there are several ways to override this setting:
- set the environment variable CACHED_PATH_CACHE_ROOT,
- call set_cache_dir(), or
- set the cache_dir argument each time you call cached_path().
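As a rough illustration of how these overrides could interact, the sketch below resolves a cache directory from an explicit argument, then the environment variable, then the default. resolve_cache_dir() is a hypothetical helper, and the precedence order shown is an assumption; the README does not state how the library orders these settings, and set_cache_dir() is not modeled here.

```python
import os
from pathlib import Path
from typing import Optional, Union

# Default location stated in the README.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "cached_path"


def resolve_cache_dir(cache_dir: Optional[Union[str, Path]] = None) -> Path:
    """Hypothetical helper: choose a cache directory.

    Assumed precedence: explicit argument, then CACHED_PATH_CACHE_ROOT,
    then the default directory.
    """
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    env_value = os.environ.get("CACHED_PATH_CACHE_ROOT")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_CACHE_DIR


if __name__ == "__main__":
    print(resolve_cache_dir())
    print(resolve_cache_dir("/data/my_cache"))
```

Passing cache_dir on every call is the most explicit option, while the environment variable changes the location without touching code.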
Team
cached-path is developed and maintained by the AllenNLP team, backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.
License
cached-path is licensed under Apache 2.0. A full copy of the license can be found on GitHub.