Implement new "most common" regridder. (#46)
* Implement new "most common" regridder.

* Add 'regrid.stat' for statistical reductions other than the mode

* Add fill_value & ensure monotonic sorts only when not sorted

* Move from list[str] to list[Hashable]

* Refactor reduction methods w/ format_for_regrid, remove duplicate sortbys

* Rename expected_groups -> values

* Remove second sortby

* Add basic tests for regrid.stats

* Disable lat/lon coord formatting for stats-based methods

* Update demo notebooks

* Update docs, changelog

* test statistical padding, add extra longitude monotonicity

* fix dtype comparison

---------

Co-authored-by: Sam Levang <slevang@salientpredictions.com>
BSchilperoort and slevang authored Sep 25, 2024
1 parent bc7be5b commit d8b7b23
Showing 14 changed files with 1,891 additions and 335 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -6,10 +6,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/).

## Unreleased

Changed:
- the "most common" routine has been overhauled, thanks to [@dcherian](https://github.com/dcherian). It is now much more efficient, and can operate fully lazily on dask arrays. Users do need to provide the expected groups (i.e., unique labels in the data), and the regridder is only available for `xr.DataArray` currently ([#46](https://github.com/xarray-contrib/xarray-regrid/pull/46)).
- you can now use `None` as input to the `time_dim` kwarg in the regridding methods to force regridding over the time dimension (as long as it's numeric) ([#46](https://github.com/xarray-contrib/xarray-regrid/pull/46)).

Added:
- `.regrid.stat` for reducing datasets using statistical methods such as the variance or median ([#46](https://github.com/xarray-contrib/xarray-regrid/pull/46)).
- a "least common" routine (i.e. anti-mode), which is the inverse of the most common value ([#46](https://github.com/xarray-contrib/xarray-regrid/pull/46)).
- If latitude/longitude coordinates are detected and the domain is global, apply automatic padding at the boundaries, which gives behavior more consistent with common tools like ESMF and CDO ([#45](https://github.com/xarray-contrib/xarray-regrid/pull/45)).
- Conservative regridding weights are converted to sparse matrices if the optional [sparse](https://github.com/pydata/sparse) package is installed, which improves compute and memory performance in most cases ([#49](https://github.com/xarray-contrib/xarray-regrid/pull/49)).
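
As an illustration of the "most common" and "least common" reductions described in the entries above, here is a minimal pure-Python sketch. It is not the library's implementation: the function name `block_common`, the fixed integer coarsening `factor`, the tie-breaking rule, and the choice to ignore labels that do not occur in a block are all assumptions made for this example. The `values` argument plays the role of the expected groups that users have to provide.

```python
from collections import Counter


def block_common(labels, factor, values, least=False):
    """Reduce a 1-D sequence of categorical labels in blocks of `factor` cells.

    `values` lists the expected labels up front (hypothetical helper,
    not part of xarray-regrid).
    """
    reduced = []
    for start in range(0, len(labels), factor):
        counts = Counter(labels[start:start + factor])
        # Assumption: only expected labels that occur in this block compete.
        present = [v for v in values if counts[v] > 0]
        if not present:
            reduced.append(None)  # stand-in for a fill value
            continue
        pick = min if least else max
        # Ties resolve to the label listed first in `values`.
        reduced.append(pick(present, key=lambda v: counts[v]))
    return reduced


fine = [1, 1, 2, 3, 3, 3, 3, 2]
print(block_common(fine, 4, values=[1, 2, 3]))              # [1, 3]
print(block_common(fine, 4, values=[1, 2, 3], least=True))  # [2, 2]
```

Knowing the labels in advance means each block only needs one count per expected label, which is the kind of fixed-size computation that can be deferred on lazy arrays.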


## 0.3.0 (2024-09-05)

4 changes: 2 additions & 2 deletions README.md
@@ -8,9 +8,9 @@ With xarray-regrid it is possible to regrid between two rectilinear grids. The f
- Nearest-neighbor
- Conservative
- Cubic
- "Most common value" (zonal statistics)
- "Most common value", as well as other zonal statistics (e.g., variance or median).

All regridding methods, except for the "most common value" can operate lazily on [Dask arrays](https://docs.xarray.dev/en/latest/user-guide/dask.html).
All regridding methods can operate lazily on [Dask arrays](https://docs.xarray.dev/en/latest/user-guide/dask.html).

Note that "Most common value" is designed to regrid categorical data to a coarser resolution. For regridding categorical data to a finer resolution, please use the "nearest-neighbor" regridder.

8 changes: 5 additions & 3 deletions docs/getting_started.rst
@@ -32,7 +32,9 @@ Multiple regridding methods are available:
* `nearest-neighbor <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.nearest>`_ (``.regrid.nearest``)
* `cubic interpolation <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.cubic>`_ (``.regrid.cubic``)
* `conservative regridding <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.conservative>`_ (``.regrid.conservative``)
* `zonal statistics <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.stat>`_ (``.regrid.stat``) is available to compute statistics such as the maximum value, or variance.
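
To make the idea of a zonal statistic concrete, the following pure-Python sketch reduces a fine 1-D series in fixed blocks. It only illustrates the concept and is not how ``.regrid.stat`` is implemented: the helper name ``block_stat``, the integer ``factor``, the set of method names, and the use of the population variance are assumptions for this example.

```python
import statistics


def block_stat(data, factor, method):
    """Reduce `data` in consecutive blocks of `factor` cells (hypothetical helper)."""
    reducers = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "var": statistics.pvariance,  # assumption: population variance
        "max": max,
        "min": min,
    }
    reduce_block = reducers[method]
    return [reduce_block(data[i:i + factor]) for i in range(0, len(data), factor)]


fine = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
print(block_stat(fine, 4, "median"))  # [2.5, 25.0]
print(block_stat(fine, 4, "max"))     # [4.0, 40.0]
print(block_stat(fine, 4, "var"))     # [1.25, 125.0]
```

A real regridder assigns fine cells to target cells by their coordinates rather than by a fixed block size, so this sketch only matches the simplest case of an aligned, integer-factor coarsening.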

Additionally, a zonal statistics `method to compute the most common value <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.most_common>`_
is available (``.regrid.most_common``).
This can be used to upscale very fine categorical data to a coarser resolution.
Additionally, there are separate methods available to compute the
`most common value <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.most_common>`_
(``.regrid.most_common``) and `least common value <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.least_common>`_
(``.regrid.least_common``). These can be used to upscale very fine categorical data to a coarser resolution.
4 changes: 3 additions & 1 deletion docs/index.rst
@@ -37,9 +37,11 @@ The following methods are supported:
* `Nearest-neighbor <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.nearest>`_
* `Conservative <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.conservative>`_
* `Cubic <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.cubic>`_
* `Zonal statistics <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.stat>`_
* `"Most common value" (zonal statistics) <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.most_common>`_
* `"Least common value" (zonal statistics) <autoapi/xarray_regrid/regrid/index.html#xarray_regrid.regrid.Regridder.least_common>`_

Note that "Most common value" is designed to regrid categorical data to a coarse resolution. For regridding categorical data to a finer resolution, please use "nearest-neighbor" regridder.
Note that "Most/least common value" is designed to regrid categorical data to a coarser resolution. For regridding categorical data to a finer resolution, please use the "nearest-neighbor" regridder.

For usage examples, please refer to the `quickstart guide <getting_started>`_ and the `example notebooks <notebooks/index>`_.
