v0.75.0

@holukas released this 26 Apr 11:26
· 214 commits to main since this release
e648180

v0.75.0 | 26 Apr 2024

XGBoost gap-filling

XGBoost can now be used to fill gaps in time series data.
In diive, XGBoost is implemented in the class XGBoostTS, which adds options for easily including, e.g.,
lagged variants of feature variables, timestamp info (DOY, month, ...) and a continuous record number. It also allows
direct feature reduction by including a purely random feature (consisting of completely random numbers) and calculating
the 'permutation importance'. All features whose permutation importance is lower than that of the random feature can
then be removed from the dataset, i.e., from the list of features, before building the final model.
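
To illustrate the idea behind this feature reduction, here is a minimal sketch in plain Python. It does not use the diive or XGBoost API: a small least-squares model stands in for the regressor, importance is computed on the training data for brevity, and all names (reduce_features, ".RANDOM", TA, NOISE) are hypothetical.

```python
import random


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def mse(coef, X, y):
    """Mean squared error of the linear model on (X, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def permutation_importance(coef, X, y, col, rng, n_repeats=5):
    """Mean increase in MSE when the values of one column are shuffled."""
    base = mse(coef, X, y)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in X]
        rng.shuffle(shuffled)
        Xp = [r[:col] + [s] + r[col + 1:] for r, s in zip(X, shuffled)]
        increases.append(mse(coef, Xp, y) - base)
    return sum(increases) / n_repeats


def reduce_features(data, target, rng):
    """Keep only features that are more important than a purely random feature."""
    n = len(target)
    names = list(data) + [".RANDOM"]  # hypothetical name for the random feature
    cols = [data[name] for name in data] + [[rng.random() for _ in range(n)]]
    X = [[c[i] for c in cols] for i in range(n)]
    coef = fit_linear(X, target)
    imp = {name: permutation_importance(coef, X, target, j, rng)
           for j, name in enumerate(names)}
    kept = [name for name in data if imp[name] > imp[".RANDOM"]]
    return kept, imp


# Synthetic example: the target depends on TA only, NOISE is irrelevant.
rng = random.Random(42)
n = 200
ta = [rng.gauss(10, 5) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
flux = [3.0 * t + rng.gauss(0, 0.5) for t in ta]
kept, importances = reduce_features({"TA": ta, "NOISE": noise}, flux, rng)
print(kept)
```

In this sketch the final model would then be built from the features in kept only. An irrelevant feature such as NOISE may or may not survive a single run, because its importance and that of the random feature are both close to zero.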

XGBoostTS and RandomForestTS both use the same base class MlRegressorGapFillingBase. This base class will also
facilitate the implementation of other gap-filling algorithms in the future.
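
The following sketch shows how such a shared base class can be structured in principle. It is not the code of MlRegressorGapFillingBase: the class names, the fill method and the toy regressor are hypothetical and only illustrate how common logic can live in one place while each subclass supplies its own regressor.

```python
from abc import ABC, abstractmethod


class GapFillingBaseSketch(ABC):
    """Hypothetical stand-in for shared gap-filling logic; not the diive API."""

    def __init__(self, features, target):
        self.features = features  # list of feature rows
        self.target = target      # list of target values, None marks a gap
        self.model = None

    @abstractmethod
    def _make_model(self):
        """Return an object that provides fit(X, y) and predict(X)."""

    def fill(self):
        # Train only on records where the target is available.
        train = [(x, y) for x, y in zip(self.features, self.target) if y is not None]
        self.model = self._make_model()
        self.model.fit([x for x, _ in train], [y for _, y in train])
        # Predict for all records, but keep measured values where they exist.
        preds = self.model.predict(self.features)
        return [p if y is None else y for p, y in zip(preds, self.target)]


class MeanModel:
    """Toy regressor that always predicts the mean of the training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class MeanGapFiller(GapFillingBaseSketch):
    """Subclass that only decides which regressor to use."""

    def _make_model(self):
        return MeanModel()


filled = MeanGapFiller([[1], [2], [3], [4]], [10.0, None, 14.0, None]).fill()
print(filled)  # [10.0, 12.0, 14.0, 12.0]
```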

Another fun (for me) addition is the new class TimeSince. It calculates the time since the last occurrence of
specific conditions. One example where this class can be useful is the calculation of 'time since last precipitation',
expressed as number of records, which can be helpful in identifying dry conditions. More examples: 'time since freezing
conditions' based on air temperature; 'time since management' based on management info, e.g., fertilization events.
Please see the notebook for some illustrative examples.
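
As a rough illustration of the counting logic, here is a minimal sketch in plain Python. It is not the TimeSince implementation: the function name, the inclusive range limits, the reset to zero while the condition is met, and the use of None before the first occurrence are all assumptions made for this example.

```python
def records_since(values, lower, upper):
    """Count records since `values` was last inside [lower, upper]."""
    counts = []
    counter = None  # stays None until the condition has occurred at least once
    for v in values:
        if v is not None and lower <= v <= upper:
            counter = 0       # condition met: reset the counter
        elif counter is not None:
            counter += 1      # condition not met: one more record has passed
        counts.append(counter)
    return counts


# 'Time since last precipitation', with precipitation defined as >= 0.1
precip = [0.0, 1.2, 0.0, 0.0, 0.4, 0.0]
since_rain = records_since(precip, lower=0.1, upper=float("inf"))
print(since_rain)  # [None, 0, 1, 2, 0, 1]
```

The same function would cover 'time since freezing conditions' by passing air temperature with an upper limit of 0.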

Please note that diive is still under development and bugs can be expected.

New features

  • Added gap-filling class XGBoostTS for time series data,
    using XGBoost (diive.pkgs.gapfilling.xgboost_ts.XGBoostTS)
  • Added new class TimeSince: counts the number of records (incremental number / counter) since the last time a time
    series was inside a specified range, useful for e.g. counting the time since last precipitation, since last freezing
    temperature, etc. (diive.pkgs.createvar.timesince.TimeSince)
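
The feature options mentioned for XGBoostTS (lagged variants, timestamp info, continuous record number) can be pictured with the following plain-Python sketch. It does not reflect the XGBoostTS parameters: the function name, the column names, and the convention that a lag refers to an earlier record are assumptions for illustration.

```python
from datetime import datetime, timedelta


def add_features(timestamps, series, lags=(1, 2)):
    """Build one feature row per record: value, lagged values, DOY, month, record number."""
    rows = []
    for i, ts in enumerate(timestamps):
        row = {
            "value": series[i],
            "doy": ts.timetuple().tm_yday,  # day of year
            "month": ts.month,
            "record": i + 1,                # continuous record number
        }
        for lag in lags:
            # Value from `lag` records earlier; None where no earlier record exists
            row[f"value_lag{lag}"] = series[i - lag] if i >= lag else None
        rows.append(row)
    return rows


# Four half-hourly records starting on 26 Apr 2024
start = datetime(2024, 4, 26, 0, 0)
timestamps = [start + timedelta(minutes=30 * i) for i in range(4)]
rows = add_features(timestamps, [5.0, 5.5, 6.1, 6.4])
print(rows[2])
```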

Additions

  • Added base class for machine learning regressors, which is basically the code shared between the different
    methods. At the moment used by RandomForestTS and XGBoostTS. (diive.core.ml.common.MlRegressorGapFillingBase)
  • Added option to change line color directly in TimeSeries plots (diive.core.plotting.timeseries.TimeSeries.plot)

Notebooks

  • Added new notebook for gap-filling using XGBoostTS with minimal settings (notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb)
  • Added new notebook for gap-filling using XGBoostTS with more extensive settings (notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb)
  • Added new notebook for creating TimeSince variables (notebooks/CalculateVariable/TimeSince.ipynb)

Tests

  • Added test case for XGBoost gap-filling (tests.test_gapfilling.TestGapFilling.test_gapfilling_xgboost)
  • Updated test case for random forest gap-filling (tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest)
  • Harmonized test case for XGBoostTS with test case of RandomForestTS
  • Added test case for TimeSince variable creation (tests.test_createvar.TestCreateVar.test_timesince)

What's Changed

Full Changelog: v0.74.1...v0.75.0