Thursday, June 5th, 2025
TLDR: Xarray has been through a major refactoring of its internals that makes coordinate-based data selection and alignment more customizable, via built-in and/or third-party indexes! In this post we highlight a few examples that take advantage of this new superpower.
Xarray is a large project that is constantly evolving to meet the needs of users and to stay relevant for novel data formats and use cases. One area of improvement identified in the Development Roadmap is the ability to add new coordinate indexing capabilities beyond the original pandas.Index. Let's look at a few examples to understand what is now possible!
TODO: Insert Benoit's awesome schematic from indexing sprint :)
Generally-useful index alternatives are already part of Xarray!
By default, a pandas.Index calculates all coordinate values and holds them in memory. For 1-D coordinates there are many use cases where it is more efficient to store only the start, stop, and step and to calculate specific coordinate values on the fly. This is what RangeIndex accomplishes:
    import xarray as xr
    from xarray.indexes import RangeIndex

    # Build a range-style index along dimension 'x' from start, stop, and step
    index = RangeIndex.arange(0.0, 100_000, 0.1, dim='x')
    ds = xr.Dataset(coords=xr.Coordinates.from_xindex(index))
    ds
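To make the idea concrete, here is a minimal pure-Python sketch of what a range-style index needs to store and compute. The `LazyRange` class and its method names are hypothetical and chosen only for illustration. This is not how Xarray's RangeIndex is implemented, just the start/stop/step idea described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyRange:
    """Hypothetical stand-in for a range-style index (NOT Xarray's RangeIndex)."""

    start: float
    stop: float
    step: float

    def __len__(self) -> int:
        # Number of points in the half-open interval [start, stop)
        return max(0, math.ceil((self.stop - self.start) / self.step))

    def value_at(self, position: int) -> float:
        # Compute a single coordinate value on demand instead of storing it
        if not 0 <= position < len(self):
            raise IndexError(position)
        return self.start + position * self.step

    def nearest_position(self, label: float) -> int:
        # Invert label = start + position * step, then clamp to valid positions
        position = round((label - self.start) / self.step)
        return min(max(position, 0), len(self) - 1)


# Same start, stop, and step as the RangeIndex example above
coords = LazyRange(start=0.0, stop=100_000.0, step=0.1)
print(len(coords))
print(coords.value_at(5))
print(coords.nearest_position(1234.56))
```

However long the coordinate is, the object keeps only three numbers, whereas a materialized index needs memory for every value. Label lookup becomes simple arithmetic rather than a search through stored values.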
TODO: Not sure if this one is ready to highlight (https://github.com/pydata/xarray/pull/10296)
TODO: Highlight https://xvec.readthedocs.io/en/v0.2.0/generated/xvec.GeometryIndex.html
TODO: Highlight https://github.com/dcherian/rasterix
While we're extremely excited about what can already be accomplished with the new indexing capabilities, there are plenty of exciting ideas for future work. If you're interested in getting involved we recommend following this GitHub Issue!
This work would not have been possible without technical input from the Xarray core team and community! Several developers received essential funding from a CZI Essential Open Source Software for Science (EOSS) grant as well as NASA's Open Source Tools, Frameworks, and Libraries (OSTFL) grant 80NSSC22K0345.