cubed-xarray 0.0.8
Interface for using cubed with xarray for parallel computation.
Requires Python >=3.10
Note: this is a proof-of-concept, and many things are incomplete, untested, or don't work.
cubed-xarray
Interface for using cubed with xarray.
Requirements
- Cubed version >=0.23.0
- Xarray version >=2024.09.0
Installation
Install via pip
pip install cubed-xarray
or conda
conda install -c conda-forge cubed-xarray
Importing
You don't need to import this package in user code. Once properly installed, xarray should automatically become aware of this package via the magic of entrypoints.
Usage
Xarray objects backed by cubed arrays can be created in any of three ways:
1. Passing existing cubed.Array objects to the data argument of xarray constructors,
2. Calling .chunk on xarray objects,
3. Passing a chunks argument to xarray.open_dataset.
In (2) and (3) the choice to use cubed.Array instead of dask.array.Array is made by passing the keyword argument chunked_array_type='cubed'.
To pass arguments to the constructor of cubed.Array you should pass them via the dictionary from_array_kwargs, e.g. from_array_kwargs={'spec': cubed.Spec(allowed_mem='2GB')}.
If cubed and cubed-xarray are installed but dask is not, then specifying chunked_array_type is not necessary, as the entrypoints system will then default to the only chunked parallel backend available (i.e. cubed).
Sharp Edges 🔪
Some things almost certainly won't work yet:
- Certain operations called in xarray but not implemented in cubed, for instance pad (see https://github.com/tomwhite/cubed/issues/193)
- Array operations involving NaNs - for now use skipna=True to avoid eager loading (see https://github.com/pydata/xarray/issues/7243)
- Using parallel=True with xr.open_mfdataset won't work because cubed doesn't implement a version of dask.Delayed (see https://github.com/pydata/xarray/issues/7810)
- Groupby (see https://github.com/tomwhite/cubed/issues/223 and https://github.com/xarray-contrib/flox/issues/224)
- xarray.map_blocks does not actually dispatch to cubed.map_blocks yet, and will always use Dask.
- Certain operations using cumreduction (e.g. ffill and bfill) are not hooked up to the ChunkManager yet, so will attempt to call dask.
and some other things might work but have not yet been tried:
- Saving to formats other than zarr
In general a bug could take the form of an error, or of a silent attempt to coerce the array type to numpy by immediately computing the underlying array.
Tests
Integration tests for wrapping cubed with xarray also live in this repository.