Releases: holukas/diive
v0.79.1
v0.79.1 | 26 Aug 2024
Additions
- Added new function to apply quality flags to certain time periods only (`diive.pkgs.qaqc.flags.restrict_application`)
- Added option to restrict the application of the angle-of-attack flag to certain time periods (`diive.pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsEddyPro.angle_of_attack_test`)
Changes
- Test options in `FluxProcessingChain` are now always passed as a dict. The advantage is that, in addition to running the test by setting the dict key `apply` to `True`, various other test settings can be passed, for example the new parameter `application dates` for the angle-of-attack flag. (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
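The idea of dict-based test options with an optional date restriction can be illustrated with a minimal sketch. The helper names `restrict_flag` and `run_test` and the key `application_dates` are hypothetical and chosen for illustration only; the notes above name only the `apply` key, and the actual `diive` signatures may differ.

```python
from datetime import date

def restrict_flag(flags, periods):
    """Keep flag values inside the given (start, end) periods, reset to 0 elsewhere."""
    restricted = {}
    for day, flag in flags.items():
        inside = any(start <= day <= end for start, end in periods)
        restricted[day] = flag if inside else 0
    return restricted

def run_test(flags, options):
    """Run a (hypothetical) test only if the options dict sets 'apply' to True."""
    if not options.get("apply", False):
        return None  # test skipped
    periods = options.get("application_dates")  # hypothetical key name
    return restrict_flag(flags, periods) if periods else dict(flags)

# Illustrative flag values: 2 = bad data, 0 = good data
flags = {date(2024, 1, 1): 2, date(2024, 6, 1): 2, date(2024, 12, 1): 2}
options = {"apply": True,
           "application_dates": [(date(2024, 5, 1), date(2024, 7, 31))]}
result = run_test(flags, options)
skipped = run_test(flags, {"apply": False})
```

In this sketch only the flag on 1 June 2024 survives, because it is the only record inside the requested period.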
Tests
- Added unittest for Flux Processing Chain up to Level-2 (`tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain_level2`)
- 36/36 unittests ran successfully
What's Changed
Full Changelog: v0.79.0...v0.79.1
v0.79.0
v0.79.0 | 22 Aug 2024
This version introduces a histogram plot that has the option to display z-score as vertical lines superimposed on the
distribution, which helps in assessing z-score settings used by some outlier removal functions.
Histogram plot of half-hourly air temperature measurements at the ICOS Class 1 ecosystem
station Davos between 2013 and 2022, displayed in
20 equally-spaced bins. The dashed vertical lines show the z-score and the corresponding value calculated based on the
time series. The bin with most counts is highlighted orange.
New features
- Added new class `HistogramPlot` for plotting histograms, based on the Matplotlib implementation (`diive.core.plotting.histogram.HistogramPlot`)
- Added function to calculate the value for a specific z-score, e.g., the value in a time series where the z-score equals 3 (`diive.core.funcs.funcs.val_from_zscore`)
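Converting a z-score back into a data value amounts to `mean + z * standard deviation`. The following is a minimal sketch of that calculation, not the `diive` implementation; the function name `value_at_zscore` and the use of the population standard deviation are assumptions.

```python
import statistics

def value_at_zscore(values, z):
    """Return the data value that corresponds to z-score `z` for `values`."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption of this sketch
    return mean + z * sd

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population SD 2
upper = value_at_zscore(data, 3)
lower = value_at_zscore(data, -3)
```

Values like `upper` and `lower` are what the dashed vertical lines in the histogram plot would mark for a z-score of 3.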
Additions
- Added histogram plots to `FlagBase`, histograms are now shown for all outlier methods (`diive.core.base.flagbase.FlagBase.defaultplot`)
- Added daytime/nighttime histogram plots to `diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`
- Added daytime/nighttime histogram plots to `diive.pkgs.outlierdetection.zscore.zScoreDaytimeNighttime`
- Added daytime/nighttime histogram plots to `diive.pkgs.outlierdetection.lof.LocalOutlierFactorDaytimeNighttime`
- Added daytime/nighttime histogram plots to `diive.pkgs.outlierdetection.absolutelimits.AbsoluteLimitsDaytimeNighttime`
- Added option to calculate the z-score with sign instead of absolute (`diive.core.funcs.funcs.zscore`)
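The difference between a signed and an absolute z-score is small but matters when the direction of a deviation is of interest. A minimal sketch follows; the function name `zscore_sketch` and the parameter `absolute` are hypothetical and do not reflect the actual signature of `diive.core.funcs.funcs.zscore`.

```python
import statistics

def zscore_sketch(values, absolute=True):
    """Z-score per value; signed if `absolute` is False."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption of this sketch
    z = [(v - mean) / sd for v in values]
    return [abs(x) for x in z] if absolute else z

data = [2, 4, 4, 4, 5, 5, 7, 9]
z_abs = zscore_sketch(data, absolute=True)
z_signed = zscore_sketch(data, absolute=False)
```

With the signed variant, values below the mean get a negative z-score, so lower and upper tails can be told apart.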
Changes
- Improved daytime/nighttime outlier plot used by various outlier removal classes (`diive.core.base.flagbase.FlagBase.plot_outlier_daytime_nighttime`)
Notebooks
- Added notebook for plotting histograms (`notebooks/Plotting/Histogram.ipynb`)
- Added notebook for manual removal of data points (`notebooks/OutlierDetection/ManualRemoval.ipynb`)
- Added notebook for outlier detection using local outlier factor, separately during daytime and nighttime (`notebooks/OutlierDetection/LocalOutlierFactorDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/HampelDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb`)
Tests
- Added unittest for plotting histograms (`tests.test_plots.TestPlots.test_histogram`)
- Added unittest for calculating histograms (without plotting) (`tests.test_analyses.TestCreateVar.test_histogram`)
What's Changed
Full Changelog: v0.78.1.1...v0.79.0
v0.78.1.1
v0.78.1
v0.78.1 | 19 Aug 2024
Changes
- Added option to set different `n_sigma` for daytime and nighttime data in `HampelDaytimeNighttime` (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Updated `flag_outliers_hampel_dtnt_test` in step-wise outlier detection
- Updated `level32_flag_outliers_hampel_dtnt_test` in flux processing chain
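To illustrate what separate `n_sigma` values for daytime and nighttime mean in practice, here is a simplified Hampel-style sketch. It is not the `diive` implementation: the function name, the parameters `n_sigma_dt` and `n_sigma_nt`, the centered window and the shared window across day and night are all assumptions made for brevity.

```python
import statistics

def hampel_flags(values, is_daytime, half_window=2, n_sigma_dt=3.0, n_sigma_nt=3.0):
    """Flag values that deviate from the rolling median by more than n_sigma scaled MADs."""
    k = 1.4826  # scales the MAD to the SD for normally distributed data
    flags = []
    for i, x in enumerate(values):
        window = values[max(0, i - half_window): i + half_window + 1]
        median = statistics.median(window)
        mad = statistics.median([abs(v - median) for v in window])
        # Pick the threshold depending on whether the record is daytime or nighttime
        n_sigma = n_sigma_dt if is_daytime[i] else n_sigma_nt
        flags.append(abs(x - median) > n_sigma * k * mad)
    return flags

series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0]
all_day = [True] * len(series)
all_night = [False] * len(series)

strict = hampel_flags(series, all_day, n_sigma_dt=3.0)
lenient = hampel_flags(series, all_night, n_sigma_nt=100.0)
```

With the stricter daytime setting the spike at index 4 is flagged, while the very lenient nighttime setting lets it pass.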
Notebooks
- Updated notebook `HampelDaytimeNighttime`
- Updated notebook `FluxProcessingChain`
Tests
- Updated unittest `test_hampel_filter_daytime_nighttime`
What's Changed
Full Changelog: v0.78.0...v0.78.1
v0.78.0
v0.78.0 | 18 Aug 2024
New features
- Added new class for outlier removal, based on the rolling z-score. It can also be used in step-wise outlier detection and during meteoscreening from the database. (`diive.pkgs.outlierdetection.zscore.zScoreRolling`, `diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection`, `diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`)
- Added Hampel filter for outlier removal (`diive.pkgs.outlierdetection.hampel.Hampel`)
- Added Hampel filter (separate daytime, nighttime) for outlier removal (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Added function to plot daytime and nighttime outliers during outlier tests (`diive.core.plotting.outlier_dtnt.outlier_daytime_nighttime`)
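A rolling z-score compares each record with the mean and standard deviation of its local window rather than of the whole series. The sketch below shows the general idea only; the function name, the centered window, the population standard deviation and the default values are assumptions and not taken from `zScoreRolling`.

```python
import statistics

def rolling_zscore_flags(values, half_window=4, threshold=2.5):
    """Flag values whose z-score within a centered rolling window exceeds `threshold`."""
    flags = []
    for i, x in enumerate(values):
        window = values[max(0, i - half_window): i + half_window + 1]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        z = abs(x - mean) / sd if sd > 0 else 0.0  # constant window: no outlier
        flags.append(z > threshold)
    return flags

series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9, 1.0]
flags = rolling_zscore_flags(series)
```

Note that a single spike inside a window of `n` records can reach a population z-score of at most `sqrt(n - 1)`, so short windows require low thresholds.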
Changes
- Flux processing chain:
  - Several changes to the flux processing chain to make sure it can also work with data files not directly output by EddyPro. The class `FluxProcessingChain` can now handle files that have a different format than the two EddyPro output files `EDDYPRO-FLUXNET-CSV-30MIN` and `EDDYPRO-FULL-OUTPUT-CSV-30MIN`. See following notes.
  - Removed option to process EddyPro `_full_output_` files, since it is an older format and its variables do not follow FLUXNET conventions.
  - Removed keyword `filetype` in class `FluxProcessingChain`. It is now assumed that the variable names follow the FLUXNET convention. Variables used in FLUXNET are listed here (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
  - When detecting the base variable from which a flux variable was calculated, the variables defined for filetype `EDDYPRO-FLUXNET-CSV-30MIN` are now assumed by default. (`diive.pkgs.flux.common.detect_basevar`)
  - Renamed function that detects the base variable that was used to calculate the respective flux (`diive.pkgs.flux.common.detect_fluxbasevar`)
  - Renamed `gas` in functions related to completeness tests to `fluxbasevar` to better reflect that the completeness test does not necessarily require a gas (e.g. `T_SONIC` is used to calculate the completeness for sensible heat flux) (`flag_fluxbasevar_completeness_eddypro_test`)
- Removing the radiation offset now uses `0.001` (W m-2) instead of `50` as the threshold value to flag nighttime values for the correction (`diive.pkgs.corrections.offsetcorrection.remove_radiation_zero_offset`)
- The database tag for meteo data screened with `diive` is now `meteoscreening_diive` (`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample`)
- During noise generation, the function now uses the absolute values of the min/max of a series to calculate minimum noise and maximum noise (`diive.pkgs.createvar.noise.add_impulse_noise`)
Notebooks
- Added new notebook for outlier detection using class `zScore` (`notebooks/OutlierDetection/zScore.ipynb`)
- Added new notebook for outlier detection using class `zScoreDaytimeNighttime` (`notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb`)
- Added new notebook for outlier removal using trimming (`notebooks/OutlierDetection/TrimLow.ipynb`)
- Updated notebook (`notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase_v7.0.ipynb`)
- When uploading screened meteo data to the database using the notebook `StepwiseMeteoScreeningFromDatabase`, variables with the same name, measurement and data version as the screened variable(s) are now deleted from the database before the new data are uploaded. Implemented in the Python package `dbc-influxdb` to avoid duplicates in the database. Such duplicates can occur when one of the tags of an otherwise identical variable changed, e.g., when one of the tags of the originally uploaded data was wrong and needed correction. The database `InfluxDB` stores a new time series alongside the previous time series when one of the tags is different in an otherwise identical time series.
Tests
- Added test case for `Hampel` filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter`)
- Added test case for `HampelDaytimeNighttime` filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter_daytime_nighttime`)
- Added test case for `zScore` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore`)
- Added test case for `TrimLow` (`tests.test_outlierdetection.TestOutlierDetection.test_trim_low_nt`)
- Added test case for `zScoreDaytimeNighttime` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_daytime_nighttime`)
- 33/33 unittests ran successfully
Environment
- Added package sktime, a unified framework for machine learning with
time series.
What's Changed
Full Changelog: v0.77.0...v0.78.0
v0.77.0
v0.77.0 | 11 Jun 2024
Additions
- Plotting cumulatives with `CumulativeYear` now also shows the cumulative for the reference, i.e. for the mean over the reference years (`diive.core.plotting.cumulative.CumulativeYear`)
- Plotting `DielCycle` now accepts `ylim` parameter (`diive.core.plotting.dielcycle.DielCycle`)
- Added long-term dataset for local testing purposes (internal only) (`diive.configs.exampledata.load_exampledata_parquet_long`)
- Added several classes in preparation for long-term gap-filling for a future update
Changes
- Several updates and changes to the base class for regressor decision trees (`diive.core.ml.common.MlRegressorGapFillingBase`):
  - The data are now split into training set and test set at the very start of regressor setup. This test set is used to evaluate models on unseen data. The default split is 80% training and 20% test data.
  - Plotting (scores, importances etc.) is now generally separated from the method where they are calculated.
  - The same `random_state` is now used for all processing steps
  - Refactored code
  - Beautified console output
- When correcting for relative humidity values above 100%, the maximum of the corrected time series is now set to 100, after the (daily) offset was removed (`diive.pkgs.corrections.offsetcorrection.remove_relativehumidity_offset`)
- During feature reduction in machine learning regressors, features with permutation importance < 0 are now always removed (`diive.core.ml.common.MlRegressorGapFillingBase._remove_rejected_features`)
- Changed default parameters for quick random forest gap-filling (`diive.pkgs.gapfilling.randomforest_ts.QuickFillRFTS`)
- I tried to improve the console output (clarity) for several functions and methods
Environment
- Added package dtreeviz to visualize decision trees
Notebooks
- Updated notebook (`notebooks/GapFilling/RandomForestGapFilling.ipynb`)
- Updated notebook (`notebooks/GapFilling/LinearInterpolation.ipynb`)
- Updated notebook (`notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb`)
- Updated notebook (`notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb`)
- Updated notebook (`notebooks/GapFilling/RandomForestParamOptimization.ipynb`)
- Updated notebook (`notebooks/GapFilling/QuickRandomForestGapFilling.ipynb`)
Tests
- Updated and fixed test case (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments`)
- Updated and fixed test case (`tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
What's Changed
Full Changelog: v0.76.2...v0.77.0
v0.76.2
v0.76.2 | 23 May 2024
Additions
- Added function to calculate absolute double differences of a time series, which is the sum of absolute differences between a data record and its preceding and next record. Used in class `zScoreIncrements` for finding (isolated) outliers that are distant from neighboring records. (`diive.core.dfun.stats.double_diff_absolute`)
- Added small function to calculate z-score stats of a time series (`diive.core.dfun.stats.sstats_zscore`)
- Added small function to calculate stats for absolute double differences of a time series (`diive.core.dfun.stats.sstats_doublediff_abs`)
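The absolute double difference described above can be written down in a few lines. This is a minimal sketch using plain lists; the function name is hypothetical, and leaving the first and last record undefined (`None`) is an assumption about how edges are handled.

```python
def double_diff_absolute_sketch(values):
    """Sum of absolute differences to the preceding and the next record."""
    result = [None] * len(values)  # first and last record have only one neighbor
    for i in range(1, len(values) - 1):
        diff_prev = abs(values[i] - values[i - 1])
        diff_next = abs(values[i] - values[i + 1])
        result[i] = diff_prev + diff_next
    return result

series = [1, 2, 10, 3, 4]
dd = double_diff_absolute_sketch(series)
```

The isolated spike (`10`) yields the largest double difference because it is far from both of its neighbors.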
Changes
- Changed the algorithm for outlier detection when using `zScoreIncrements`. Data points are now flagged as outliers if the z-scores of three absolute differences (previous record, next record and the sum of both) all exceed a specified threshold. (`diive.pkgs.outlierdetection.incremental.zScoreIncrements`)
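The three-condition rule described above can be sketched as follows. This is an illustration under assumptions, not the code of `zScoreIncrements`: the function names are hypothetical, z-scores are signed and based on the population standard deviation, and the first and last record are skipped.

```python
import statistics

def zscores(values):
    """Signed z-scores, so only unusually large differences exceed a positive threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def flag_increment_outliers(values, threshold):
    """Return indices where all three difference z-scores exceed `threshold`."""
    idx = range(1, len(values) - 1)
    d_prev = [abs(values[i] - values[i - 1]) for i in idx]
    d_next = [abs(values[i] - values[i + 1]) for i in idx]
    d_sum = [p + n for p, n in zip(d_prev, d_next)]
    z_prev, z_next, z_sum = zscores(d_prev), zscores(d_next), zscores(d_sum)
    return [i for i, zp, zn, zs in zip(idx, z_prev, z_next, z_sum)
            if zp > threshold and zn > threshold and zs > threshold]

series = [10, 11, 10, 11, 10, 30, 10, 11, 10, 11, 10]
outliers = flag_increment_outliers(series, threshold=1.5)
```

Requiring all three z-scores to exceed the threshold keeps the neighbors of a spike from being flagged: each neighbor has one large difference (towards the spike) but a small difference on its other side.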
Notebooks
- Added new notebook for outlier detection using class `LocalOutlierFactorAllData` (`notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb`)
Tests
- Added new test case for `LocalOutlierFactorAllData` (`tests.test_outlierdetection.TestOutlierDetection.test_lof_alldata`)
What's Changed
Full Changelog: v0.76.1...v0.76.2
v0.76.1
v0.76.1 | 17 May 2024
Additions
- It is now possible to set a fixed random seed when creating impulse noise (`diive.pkgs.createvar.noise.add_impulse_noise`)
Changes
- In class `zScoreIncrements`, outliers are now detected by calculating the sum of the absolute differences between a data point and its respective preceding and next data point. Before, only the non-absolute difference to the preceding data point was considered. The sum of absolute differences is then used to calculate the z-score, which in turn is used to flag outliers. (`diive.pkgs.outlierdetection.incremental.zScoreIncrements`)
Notebooks
- Added new notebook for outlier detection using class `zScoreIncrements` (`notebooks/OutlierDetection/zScoreIncremental.ipynb`)
- Added new notebook for outlier detection using class `LocalSD` (`notebooks/OutlierDetection/LocalSD.ipynb`)
Tests
- Added new test case for `zScoreIncrements` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments`)
- Added new test case for `LocalSD` (`tests.test_outlierdetection.TestOutlierDetection.test_localsd`)
What's Changed
Full Changelog: v0.76.0...v0.76.1
v0.76.0
v0.76.0 | 14 May 2024
Diel cycle plot
The new class `DielCycle` allows plotting diel cycles per month or across all data for time series data. At the moment,
it plots the (monthly) diel cycles as means (+/- standard deviation). It makes use of the time info contained in the
datetime timestamp index of the data. All aggregates are calculated by grouping data by time and (optionally) separately
for each month. The diel cycles have the same time resolution as the time component of the timestamp index, e.g. hourly.
New features
- Added new class `DielCycle` for plotting diel cycles per month (`diive.core.plotting.dielcycle.DielCycle`)
- Added new function `diel_cycle` for calculating diel cycles per month. This function is also used by the plotting class `DielCycle` (`diive.core.times.resampling.diel_cycle`)
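Grouping by the time component of the timestamp, optionally per month, can be sketched with the standard library alone. The function below is a hypothetical stand-in and not the `diive` function `diel_cycle`; its name, its input format (a list of timestamp/value pairs) and the use of the population standard deviation are assumptions.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta
import statistics

def diel_cycle_sketch(records, by_month=False):
    """Mean and SD per time of day, optionally separately for each month."""
    groups = defaultdict(list)
    for timestamp, value in records:
        key = (timestamp.month, timestamp.time()) if by_month else timestamp.time()
        groups[key].append(value)
    return {key: (statistics.fmean(vals), statistics.pstdev(vals))
            for key, vals in sorted(groups.items())}

# Two days of made-up hourly data: value = hour of day, plus 2 on the second day
start = datetime(2024, 5, 1)
records = [(start + timedelta(days=d, hours=h), h + 2 * d)
           for d in range(2) for h in range(24)]

cycle = diel_cycle_sketch(records)
mean_3h, sd_3h = cycle[time(3, 0)]
monthly = diel_cycle_sketch(records, by_month=True)
```

The result has one entry per time of day (here 24 hourly entries), which mirrors the statement that the diel cycle keeps the time resolution of the timestamp index.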
Additions
- Added color scheme that contains 12 colors, one for each month. Not perfect, but better than before. (`diive.core.plotting.styles.LightTheme.colors_12_months`)
Notebooks
- Added new notebook for plotting diel cycles (per month) (`notebooks/Plotting/DielCycle.ipynb`)
- Added new notebook for calculating diel cycles (per month) (`notebooks/Resampling/ResamplingDielCycle.ipynb`)
Tests
- Added test case for new function `diel_cycle` (`tests.test_resampling.TestResampling.test_diel_cycle`)
What's Changed
Full Changelog: v0.75.0...v0.76.0
v0.75.0
v0.75.0 | 26 Apr 2024
XGBoost gap-filling
XGBoost can now be used to fill gaps in time series data.
In `diive`, XGBoost is implemented in class `XGBoostTS`, which adds additional options for easily including e.g.
lagged variants of feature variables, timestamp info (DOY, month, ...) and a continuous record number. It also allows
direct feature reduction by including a purely random feature (consisting of completely random numbers) and calculating
the 'permutation importance'. All features where the permutation importance is lower than for the random feature can
then be removed from the dataset, i.e., the list of features, before building the final model.
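The selection step of this feature reduction can be sketched independently of any model. The example assumes the permutation importances have already been calculated; the function name, the feature names, the name `.RANDOM` for the random feature and the importance values are all made up for illustration.

```python
def reduce_features(importances, random_feature=".RANDOM"):
    """Keep features whose permutation importance is not lower than the random feature's."""
    baseline = importances[random_feature]
    return [name for name, importance in importances.items()
            if name != random_feature and importance >= baseline]

# Made-up permutation importances for three features and the random feature
importances = {"TA": 0.42, "SW_IN": 0.31, "VPD": 0.002, ".RANDOM": 0.004}
kept = reduce_features(importances)
```

The random feature acts as a noise baseline: a feature that explains less than pure random numbers is unlikely to help the final model.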
`XGBoostTS` and `RandomForestTS` both use the same base class `MlRegressorGapFillingBase`. This base class will also
facilitate the implementation of other gap-filling algorithms in the future.
Another fun (for me) addition is the new class `TimeSince`. It allows calculating the time since the last occurrence of
specific conditions. One example where this class can be useful is the calculation of 'time since last precipitation',
expressed as number of records, which can be helpful in identifying dry conditions. More examples: 'time since freezing
conditions' based on air temperature; 'time since management' based on management info, e.g. fertilization events.
Please see the notebook for some illustrative examples.
Please note that `diive` is still under development and bugs can be expected.
New features
- Added gap-filling class `XGBoostTS` for time series data, using XGBoost (`diive.pkgs.gapfilling.xgboost_ts.XGBoostTS`)
- Added new class `TimeSince`: counts number of records (incremental number / counter) since the last time a time series was inside a specified range, useful for e.g. counting the time since last precipitation, since last freezing temperature, etc. (`diive.pkgs.createvar.timesince.TimeSince`)
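The counter logic behind a 'time since' variable can be sketched as follows. The function is a hypothetical stand-in for `TimeSince`; its name, its parameters `lower` and `upper`, and returning `None` before the first occurrence are assumptions.

```python
def time_since_sketch(values, lower, upper):
    """Number of records since the series was last inside [lower, upper]."""
    counts = []
    counter = None  # undefined until the condition occurs for the first time
    for value in values:
        if value is not None and lower <= value <= upper:
            counter = 0  # condition met: reset the counter
        elif counter is not None:
            counter += 1
        counts.append(counter)
    return counts

# Made-up precipitation records (mm); 'precipitation' means at least 0.1 mm
precip = [0, 0, 1.2, 0, 0, 0.4, 0]
since_rain = time_since_sketch(precip, lower=0.1, upper=float("inf"))
```

Large counter values in `since_rain` would indicate extended dry periods.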
Additions
- Added base class for machine learning regressors, which is basically the code shared between the different methods. At the moment used by `RandomForestTS` and `XGBoostTS`. (`diive.core.ml.common.MlRegressorGapFillingBase`)
- Added option to change line color directly in `TimeSeries` plots (`diive.core.plotting.timeseries.TimeSeries.plot`)
Notebooks
- Added new notebook for gap-filling using `XGBoostTS` with minimal settings (`notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb`)
- Added new notebook for gap-filling using `XGBoostTS` with more extensive settings (`notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb`)
- Added new notebook for creating `TimeSince` variables (`notebooks/CalculateVariable/TimeSince.ipynb`)
Tests
- Added test case for XGBoost gap-filling (`tests.test_gapfilling.TestGapFilling.test_gapfilling_xgboost`)
- Updated test case for random forest gap-filling (`tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Harmonized test case for XGBoostTS with test case of RandomForestTS
- Added test case for `TimeSince` variable creation (`tests.test_createvar.TestCreateVar.test_timesince`)
What's Changed
Full Changelog: v0.74.1...v0.75.0