- Adjusted version number to avoid publishing conflict
- Removed invalid imports
- Added test case for `diive` imports (`tests.test_imports.TestImports.test_imports`)
- 52/52 unittests ran successfully
- New class `BinFitterCP` for fitting a function to binned data, includes confidence interval and prediction interval (`diive.pkgs.fits.fitter.BinFitterCP`)
- Added small function to detect duplicate entries in lists (`diive.core.funcs.funcs.find_duplicates_in_list`)
- Added new filetype (`diive/configs/filetypes/ETH-MERCURY-CSV-20HZ.yml`)
- Added new filetype (`diive/configs/filetypes/GENERIC-CSV-HEADER-1ROW-TS-END-FULL-NS-20HZ.yml`)
- Not directly a bug fix, but when reading EddyPro fluxnet files with `LoadEddyProOutputFiles` (e.g., in the flux processing chain), duplicate columns are now automatically renamed by adding a numbered suffix. For example, if two variables are named `CUSTOM_CH4_MEAN` in the output file, they are automatically renamed to `CUSTOM_CH4_MEAN_1` and `CUSTOM_CH4_MEAN_2` (`diive.core.dfun.frames.compare_len_header_vs_data`)
- Added notebook example for `BinFitterCP` (`notebooks/Fits/BinFitterCP.ipynb`)
- Updated flux processing chain notebook to `v8.6`; the import for loading EddyPro fluxnet output files was missing
- Added test case for `BinFitterCP` (`tests.test_fits.TestFits.test_binfittercp`)
- 51/51 unittests ran successfully
From now on, Python version `3.11.10` is used for developing `diive` (up to now, version `3.9` was used). All unittests were successfully executed with this new Python version. In addition, all notebooks were re-run and all looked good. JupyterLab is now included in the environment, which makes it easier to quickly install `diive` (`pip install diive`) in an environment and directly use its notebooks, without the need to install JupyterLab separately.

- `diive` will now be developed using Python version `3.11.10`
- Added JupyterLab
- Added jupyter bokeh
- All notebooks were re-run and updated using Python version `3.11.10`
- 50/50 unittests ran successfully with Python version `3.11.10`
- Adjusted flags check in QCF flag report; the progressive flag must be the same as the previously calculated overall flag (`diive.pkgs.qaqc.qcf.FlagQCF.report_qcf_evolution`)
- When detecting the frequency from the time delta of records, the inferred frequency is accepted if the most frequent timedelta was found for more than 50% of records (`diive.core.times.times.timestamp_infer_freq_from_timedelta`)
- Storage terms are now gap-filled using the rolling median in an expanding time window (`FluxStorageCorrectionSinglePointEddyPro._gapfill_storage_term`)
- Added notebook example for using the flux processing chain for CH4 flux from a subcanopy eddy covariance station (`notebooks/Workbench/CH-DAS_2023_FluxProcessingChain/FluxProcessingChain_NEE_CH-DAS_2023.ipynb`)
- Fixed info for storage term correction report to account for cases when more storage terms than flux records are available (`FluxStorageCorrectionSinglePointEddyPro.report`)
- 50/50 unittests ran successfully
Finally it is possible to use the MDS (marginal distribution sampling) gap-filling method in `diive`. This method is the current default and widely used gap-filling method for eddy covariance ecosystem fluxes. For a detailed description of the method see Reichstein et al. (2005) and Pastorello et al. (2020; full references given below).

The implementation of MDS in `diive` (`FluxMDS`) follows the description in Reichstein et al. (2005) and should therefore yield results similar to other implementations of this algorithm. `FluxMDS` can also easily output model scores, such as r2 and error values.

At the moment it is not yet possible to use `FluxMDS` in the flux processing chain, but during the preparation of this update the flux processing chain code was already refactored and prepared to include `FluxMDS` in one of the next updates.

At the moment, `FluxMDS` is specifically tailored to gap-filling ecosystem fluxes; a more general implementation (e.g., to gap-fill meteorological data) will follow.
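
The core idea can be sketched in a few lines (illustrative only, not the `FluxMDS` implementation; the column names and the helper function are assumptions, with the similarity thresholds and the +/- 7-day starting window taken from Reichstein et al., 2005):

```python
import pandas as pd

# Illustrative MDS-style filling of a single gap (not the diive FluxMDS code).
# Similar conditions: SW_IN within 50 W m-2, TA within 2.5 degC, VPD within 5 hPa.
def mds_fill_one_gap(df: pd.DataFrame, gap_time: pd.Timestamp) -> float:
    window = df.loc[gap_time - pd.Timedelta(days=7): gap_time + pd.Timedelta(days=7)]
    target = df.loc[gap_time]
    similar = window[
        (window["SW_IN"].sub(target["SW_IN"]).abs() < 50)
        & (window["TA"].sub(target["TA"]).abs() < 2.5)
        & (window["VPD"].sub(target["VPD"]).abs() < 5)
    ]
    return similar["NEE"].mean()  # gap is filled with the mean flux under similar conditions
```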
- Added new gap-filling class `FluxMDS`: `MDS` stands for marginal distribution sampling. The method uses a time window to first identify meteorological conditions (short-wave incoming radiation, air temperature and VPD) similar to those when the missing data occurred. Gaps are then filled with the mean flux in the time window. `FluxMDS` cannot yet be used in the flux processing chain, but will be implemented there soon. (`diive.pkgs.gapfilling.mds.FluxMDS`)
- Storage correction: By default, values missing in the storage term are now filled with a rolling mean in an expanding time window. Testing showed that the (single point) storage term is missing for between 2-3% of the data, which I think is reason enough to make filling these gaps the default option. Previously, it was optional to fill the gaps using random forest; however, results were not great since only the timestamp info was used as model features. Plots generated during Level-3.1 were also updated, now better showing the storage terms (gap-filled and non-gap-filled) and the flag indicating filled values (`diive.pkgs.fluxprocessingchain.level31_storagecorrection.FluxStorageCorrectionSinglePointEddyPro`)
- Added notebook example for `FluxMDS` (`notebooks/GapFilling/FluxMDSGapFilling.ipynb`)
- Added test case for `FluxMDS` (`tests.test_gapfilling.TestGapFilling.test_fluxmds`)
- 50/50 unittests ran successfully
- Fixed bug: overall quality flag `QCF` was not created correctly for the different USTAR scenarios (`diive.core.base.identify.identify_flagcols`, `diive.pkgs.qaqc.qcf.FlagQCF`)
- Fixed bug: calculation of `QCF` flag sums is now strictly done on flag columns. Before, sums were calculated across all columns in the flags dataframe, which resulted in erroneous overall flags after USTAR filtering (`diive.pkgs.qaqc.qcf.FlagQCF._calculate_flagsums`)
- Added `polars`
- Pastorello, G. et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data, 7(1), 225. https://doi.org/10.1038/s41597-020-0534-3
- Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N., Gilmanov, T., Granier, A., Grunwald, T., Havrankova, K., Ilvesniemi, H., Janous, D., Knohl, A., Laurila, T., Lohila, A., Loustau, D., Matteucci, G., … Valentini, R. (2005). On the separation of net ecosystem exchange into assimilation and ecosystem respiration: Review and improved algorithm. Global Change Biology, 11(9), 1424–1439. https://doi.org/10.1111/j.1365-2486.2005.001002.x
- Added notebook showing an example for `LongTermGapFillingRandomForestTS` (`notebooks/GapFilling/LongTermRandomForestGapFilling.ipynb`)
- Added notebook example for `MeasurementOffset` (`notebooks/Corrections/MeasurementOffset.ipynb`)
- Added unittest for `LongTermGapFillingRandomForestTS` (`tests.test_gapfilling.TestGapFilling.test_gapfilling_longterm_randomforest`)
- Added unittest for `WindDirOffset` (`tests.test_corrections.TestCorrections.test_winddiroffset`)
- Added unittest for `DaytimeNighttimeFlag` (`tests.test_createvar.TestCreateVar.test_daytime_nighttime_flag`)
- Added unittest for `calc_vpd_from_ta_rh` (`tests.test_createvar.TestCreateVar.test_calc_vpd`)
- Added unittest for `percentiles101` (`tests.test_analyses.TestAnalyses.test_percentiles`)
- Added unittest for `GapFinder` (`tests.test_analyses.TestAnalyses.test_gapfinder`)
- Added unittest for `SortingBinsMethod` (`tests.test_analyses.TestAnalyses.test_sorting_bins_method`)
- Added unittest for `daily_correlation` (`tests.test_analyses.TestAnalyses.test_daily_correlation`)
- Added unittest for `QuantileXYAggZ` (`tests.test_analyses.TestCreateVar.test_quantilexyaggz`)
- 49/49 unittests ran successfully
- Fixed bug that caused results from long-term gap-filling to be inconsistent despite using a fixed random state. I found the following: when reducing features across years, the removal of duplicate features from a list of found features created a list where the order of elements changed each run. This in turn produced slightly different gap-filling results each time the long-term gap-filling was executed. The Python version where this issue occurred was `3.9.19`.
    - Here is a simplified example, where `input_list` is a list of elements with some duplicate elements:
    - Running `output_list = list(set(input_list))` generates `output_list`, where the elements have a different order each run. The elements were otherwise the same, only their order changed.
    - To keep the order of elements consistent it was necessary to call `output_list.sort()`, as shown in the sketch after this list.
    - (`diive.pkgs.gapfilling.longterm.LongTermGapFillingBase.reduce_features_across_years`)
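
A minimal, self-contained version of that example (the feature names are hypothetical):

```python
# Deduplicating with set() gives no stable order; due to hash randomization
# the order of string elements can change between interpreter runs.
input_list = ["TA", "SW_IN", "VPD", "TA", "SW_IN"]
output_list = list(set(input_list))

# Sorting restores a deterministic order, making downstream results reproducible.
output_list.sort()
print(output_list)  # ['SW_IN', 'TA', 'VPD'] on every run
```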
- Corrected wind direction could be 360°, but will now be 0° (`diive.pkgs.corrections.winddiroffset.WindDirOffset._correct_degrees`)
It is now possible to gap-fill multi-year datasets using the class `LongTermGapFillingRandomForestTS`. In this approach, data from neighboring years are pooled together before training the random forest model for gap-filling a specific year. This is especially useful for long-term, multi-year datasets where environmental conditions and drivers might change over years and decades.

Why random forest? Because it performed well, and to me it looks like the first choice for gap-filling ecosystem fluxes, at least at the moment.

Long-term gap-filling using random forest is now also built into the flux processing chain (Level-4.1). This makes it possible to quickly gap-fill the different USTAR scenarios and to create some useful plots (I hope). See the flux processing chain notebook for how this looks.

In a future update it will be possible to either directly switch to `XGBoost` for gap-filling, or to use it (and other machine-learning models) in combination with random forest in the flux processing chain.

Here is an example for a dataset containing CO2 flux (`NEE`) measurements from 2005 to 2023 (a sketch of the year-window logic follows the list):
- for gap-filling the year 2005, the model is trained on data from 2005, 2006 and 2007 (2005 has no previous year)
- for gap-filling the year 2006, the model is trained on data from 2005, 2006 and 2007 (same model as for 2005)
- for gap-filling the year 2007, the model is trained on data from 2006, 2007 and 2008
- ...
- for gap-filling the year 2012, the model is trained on data from 2011, 2012 and 2013
- for gap-filling the year 2013, the model is trained on data from 2012, 2013 and 2014
- for gap-filling the year 2014, the model is trained on data from 2013, 2014 and 2015
- ...
- for gap-filling the year 2021, the model is trained on data from 2020, 2021 and 2022
- for gap-filling the year 2022, the model is trained on data from 2021, 2022 and 2023 (same model as for 2023)
- for gap-filling the year 2023, the model is trained on data from 2021, 2022 and 2023 (2023 has no next year)
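
A minimal sketch of how this three-year training window can be derived for each year (a hypothetical helper, not the diive API):

```python
# Pool of training years for gap-filling one year: the year itself plus its
# nearest neighbors, clamped to a window of three years at the dataset edges.
def training_years(year: int, first_year: int, last_year: int) -> list[int]:
    start = max(first_year, min(year - 1, last_year - 2))
    end = min(last_year, max(year + 1, first_year + 2))
    return list(range(start, end + 1))

for y in (2005, 2006, 2012, 2023):
    print(y, training_years(y, 2005, 2023))
# 2005 [2005, 2006, 2007]
# 2006 [2005, 2006, 2007]
# 2012 [2011, 2012, 2013]
# 2023 [2021, 2022, 2023]
```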
- Added new method for long-term (multiple years) gap-filling using random forest to flux processing chain (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level41_gapfilling_longterm`)
- Added new class for long-term (multiple years) gap-filling using random forest (`diive.pkgs.gapfilling.longterm.LongTermGapFillingRandomForestTS`)
- Added class for plotting cumulative sums across all data, for multiple columns (`diive.core.plotting.cumulative.Cumulative`)
- Added class to detect a constant offset between two measurements (`diive.pkgs.corrections.measurementoffset.MeasurementOffset`)
- Creating lagged variants creates gaps, which then leads to incomplete features in machine learning models. Now, gaps are filled using simple forward and backward filling, limited to the number of values defined in the lag. For example, if variable TA is lagged by -2 records, this creates two missing values for this variant at the start of the time series, which are then gap-filled using a simple backward fill with `limit=2` (see the sketch below). (`diive.core.dfun.frames.lagged_variants`)
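
A minimal sketch of the fill step (hypothetical data, not the diive implementation; the sign convention of the lag is illustrative):

```python
import pandas as pd

# Creating a lagged variant of TA leaves NaNs at the start of the series;
# a backward fill limited to the lag size closes them so ML features stay complete.
ta = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="TA")
ta_lagged = ta.shift(2)                # two NaNs at the start of the variant
ta_filled = ta_lagged.bfill(limit=2)   # backward fill, limited to the lag size
print(ta_filled.tolist())  # [10.0, 10.0, 10.0, 11.0, 12.0]
```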
- Updated flux processing chain notebook to include long-term gap-filling using random forest (`notebooks/FluxProcessingChain/FluxProcessingChain.ipynb`)
- Added new notebook for plotting cumulative sums across all data, for multiple columns (`notebooks/Plotting/Cumulative.ipynb`)
- Unittest for flux processing chain now includes many more methods (`tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain`)
- 39/39 unittests ran successfully
- Fixed deprecation warning (`diive.core.ml.common.prediction_scores_regr`)
This update brings advancements for post-processing eddy covariance data in the context of the `FluxProcessingChain`. The goal is to offer a complete chain for post-processing ecosystem flux data, specifically designed to work seamlessly with the standardized `_fluxnet` output file from the widely used EddyPro software.

Now, diive offers the option for USTAR filtering based on known constant thresholds across the entire dataset (similar to the `CUT` scenarios in FLUXNET data). While seasonal (DJF, MAM, JJA, SON) thresholds are calculated internally, applying them on a seasonal basis or using variable thresholds per year (like FLUXNET's `VUT` scenarios) isn't yet implemented.

With this update, the `FluxProcessingChain` class can handle various data processing steps:
- Level-2: Quality flag expansion
- Level-3.1: Storage correction
- Level-3.2: Outlier removal
- Level-3.3: (new) USTAR filtering (with constant thresholds for now)
- (upcoming) Level-4.1: long-term gap-filling using random forest and XGBoost
- For info about the different flux levels see Swiss FluxNet flux processing chain
- Added class to apply multiple known constant USTAR (friction velocity) thresholds, creating flags that indicate time periods characterized by low turbulence for multiple USTAR scenarios. The constant thresholds must be known beforehand, e.g., from an earlier USTAR detection run, or from results from FLUXNET (`diive.pkgs.flux.ustarthreshold.FlagMultipleConstantUstarThresholds`)
- Added class to apply one single known constant USTAR threshold (`diive.pkgs.flux.ustarthreshold.FlagSingleConstantUstarThreshold`)
- Added `FlagMultipleConstantUstarThresholds` to the flux processing chain (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level33_constant_ustar`)
- Added USTAR detection algorithm based on Papale et al., 2006 (`diive.pkgs.flux.ustarthreshold.UstarDetectionMPT`)
- Added function to analyze high-quality ecosystem fluxes that helps in understanding the range of highest-quality data (`diive.pkgs.flux.hqflux.analyze_highest_quality_flux`)
`LocalSD` outlier detection can now use a constant SD:

- Added parameter to use the standard deviation across all data (constant) instead of the rolling SD to calculate the upper and lower limits that define outliers in the median rolling window (`diive.pkgs.outlierdetection.localsd.LocalSD`)
- Added to step-wise outlier detection (`diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.flag_outliers_localsd_test`)
- Added to meteoscreening from database (`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test`)
- Added to flux processing chain (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level32_flag_outliers_localsd_test`)
- Replaced `.plot_date()` from the Matplotlib library with `.plot()` due to deprecation
- Added notebook for plotting cumulative sums per year (`notebooks/Plotting/CumulativesPerYear.ipynb`)
- Added notebook for removing outliers based on the z-score in a rolling time window (`notebooks/OutlierDetection/zScoreRolling.ipynb`)
- Fixed bug when saving a pandas Series to parquet (`diive.core.io.files.save_parquet`)
- Fixed bug when plotting `doy_mean_cumulative`: no longer crashes when years defined in parameter `excl_years_from_reference` are not in the dataset (`diive.core.times.times.doy_mean_cumulative`)
- Fixed deprecation warning when plotting in `bokeh` (interactive plots)
- Added unittest for `LocalSD` using constant SD (`tests.test_outlierdetection.TestOutlierDetection.test_localsd_with_constantsd`)
- Added unittest for rolling z-score outlier removal (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_rolling`)
- Improved check whether figure and axis were created (`tests.test_plots.TestPlots.test_histogram`)
- 39/39 unittests ran successfully
- Added new package `scikit-optimize`
- Added new package `category_encoders`
- Added outlier tests to step-wise meteoscreening from database: `Hampel`, `HampelDaytimeNighttime` and `TrimLow` (`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`)
- Added parameter to control whether or not to output the middle timestamp when loading parquet files with `load_parquet()`. By default, `output_middle_timestamp=True`. (`diive.core.io.files.load_parquet`)
- Re-created environment and created new `lock` file
- Currently using Python 3.9.19
- Added new notebook for creating a flag that indicates missing values (`notebooks/OutlierDetection/MissingValues.ipynb`)
- Updated notebook for meteoscreening from database (`notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb`)
- Updated notebook for loading and saving parquet files (`notebooks/Formats/LoadSaveParquetFile.ipynb`)
- Added unittest for flagging missing values (`tests.test_outlierdetection.TestOutlierDetection.test_missing_values`)
- 37/37 unittests ran successfully
- Fixed links in README; absolute links to notebooks were needed
- Fixed issue with return list in (`diive.pkgs.analyses.histogram.Histogram.peakbins`)
- Added new function to apply quality flags to certain time periods only (`diive.pkgs.qaqc.flags.restrict_application`)
- Added option to restrict the application of the angle-of-attack flag to certain time periods (`diive.pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsEddyPro.angle_of_attack_test`)
- Test options in `FluxProcessingChain` are now always passed as a dict. This has the advantage that, in addition to running a test by setting the dict key `apply` to `True`, various other test settings can be passed, for example the new parameter `application dates` for the angle-of-attack flag. A hedged sketch of this pattern follows below. (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
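
A sketch of what such a dict of test options could look like (only the `apply` key is confirmed by the description above; the other key name and value format are assumptions for illustration):

```python
# Hypothetical test options for the angle-of-attack flag.
angle_of_attack_options = {
    "apply": True,
    "application_dates": [["2022-05-01", "2022-09-30"]],  # assumed parameter format
}
```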
- Added unittest for Flux Processing Chain up to Level-2 (
tests.test_fluxprocessingchain.TestFluxProcessingChain.test_fluxprocessingchain_level2
) - 36/36 unittests ran successfully
This version introduces a histogram plot that has the option to display z-score as vertical lines superimposed on the distribution, which helps in assessing z-score settings used by some outlier removal functions.
Histogram plot of half-hourly air temperature measurements at the ICOS Class 1 ecosystem station Davos between 2013 and 2022, displayed in 20 equally-spaced bins. The dashed vertical lines show the z-score and the corresponding value calculated based on the time series. The bin with most counts is highlighted orange.
- Added new class `HistogramPlot` for plotting histograms, based on the Matplotlib implementation (`diive.core.plotting.histogram.HistogramPlot`)
- Added function to calculate the value for a specific z-score, e.g., based on a time series it calculates the value where z-score = `3` etc. A short sketch of this calculation follows below. (`diive.core.funcs.funcs.val_from_zscore`)
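
The underlying calculation is simply the inverse of the z-score formula (a minimal sketch, not the diive function):

```python
import numpy as np

# Value corresponding to z-score z: value = mean + z * std.
series = np.random.default_rng(42).normal(loc=10.0, scale=2.0, size=1000)
z = 3
val_at_z = series.mean() + z * series.std()
print(val_at_z)  # approx. 10 + 3 * 2 = 16 for this series
```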
- Added histogram plots to `FlagBase`; histograms are now shown for all outlier methods (`diive.core.base.flagbase.FlagBase.defaultplot`)
- Added daytime/nighttime histogram plots to (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Added daytime/nighttime histogram plots to (`diive.pkgs.outlierdetection.zscore.zScoreDaytimeNighttime`)
- Added daytime/nighttime histogram plots to (`diive.pkgs.outlierdetection.lof.LocalOutlierFactorDaytimeNighttime`)
- Added daytime/nighttime histogram plots to (`diive.pkgs.outlierdetection.absolutelimits.AbsoluteLimitsDaytimeNighttime`)
- Added option to calculate the z-score with sign instead of absolute (`diive.core.funcs.funcs.zscore`)
- Improved daytime/nighttime outlier plot used by various outlier removal classes (`diive.core.base.flagbase.FlagBase.plot_outlier_daytime_nighttime`)
- Added notebook for plotting histograms (`notebooks/Plotting/Histogram.ipynb`)
- Added notebook for manual removal of data points (`notebooks/OutlierDetection/ManualRemoval.ipynb`)
- Added notebook for outlier detection using local outlier factor, separately during daytime and nighttime (`notebooks/OutlierDetection/LocalOutlierFactorDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/HampelDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb`)
- Updated notebook (`notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb`)
- Added unittest for plotting histograms (`tests.test_plots.TestPlots.test_histogram`)
- Added unittest for calculating histograms (without plotting) (`tests.test_analyses.TestCreateVar.test_histogram`)
- Added CITATIONS file
- Added option to set different `n_sigma` for daytime and nighttime data in `HampelDaytimeNighttime` (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Updated `flag_outliers_hampel_dtnt_test` in step-wise outlier detection
- Updated `level32_flag_outliers_hampel_dtnt_test` in flux processing chain
- Updated notebook `HampelDaytimeNighttime`
- Updated notebook `FluxProcessingChain`
- Updated unittest `test_hampel_filter_daytime_nighttime`
- 35/35 unittests ran successfully
- Added new class for outlier removal, based on the rolling z-score. It can also be used in step-wise outlier detection and during meteoscreening from the database. (`diive.pkgs.outlierdetection.zscore.zScoreRolling`, `diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection`, `diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`)
- Added Hampel filter for outlier removal (`diive.pkgs.outlierdetection.hampel.Hampel`)
- Added Hampel filter (separate daytime, nighttime) for outlier removal (`diive.pkgs.outlierdetection.hampel.HampelDaytimeNighttime`)
- Added function to plot daytime and nighttime outliers during outlier tests (`diive.core.plotting.outlier_dtnt.outlier_daytime_nighttime`)
- Flux processing chain:
    - Several changes to the flux processing chain to make sure it can also work with data files not directly output by EddyPro. The class `FluxProcessingChain` can now handle files that have a different format than the two EddyPro output files `EDDYPRO-FLUXNET-CSV-30MIN` and `EDDYPRO-FULL-OUTPUT-CSV-30MIN`. See the following notes.
    - Removed option to process EddyPro `_full_output_` files, since it is an older format and its variables do not follow FLUXNET conventions.
    - Removed keyword `filetype` in class `FluxProcessingChain`. It is now assumed that the variable names follow the FLUXNET convention. Variables used in FLUXNET are listed here (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
    - When detecting the base variable from which a flux variable was calculated, the variables defined for filetype `EDDYPRO-FLUXNET-CSV-30MIN` are now assumed by default. (`diive.pkgs.flux.common.detect_basevar`)
    - Renamed function that detects the base variable that was used to calculate the respective flux (`diive.pkgs.flux.common.detect_fluxbasevar`)
    - Renamed `gas` in functions related to completeness tests to `fluxbasevar` to better reflect that the completeness test does not necessarily require a gas (e.g., `T_SONIC` is used to calculate the completeness for sensible heat flux) (`flag_fluxbasevar_completeness_eddypro_test`)
- Removing the radiation offset now uses `0.001` (W m-2) instead of `50` as the threshold value to flag nighttime values for the correction (`diive.pkgs.corrections.offsetcorrection.remove_radiation_zero_offset`)
- The database tag for meteo data screened with `diive` is now `meteoscreening_diive` (`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample`)
- During noise generation, the function now uses the absolute values of the min/max of a series to calculate minimum noise and maximum noise (`diive.pkgs.createvar.noise.add_impulse_noise`)
- Added new notebook for outlier detection using class `zScore` (`notebooks/OutlierDetection/zScore.ipynb`)
- Added new notebook for outlier detection using class `zScoreDaytimeNighttime` (`notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb`)
- Added new notebook for outlier removal using trimming (`notebooks/OutlierDetection/TrimLow.ipynb`)
- Updated notebook (`notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase_v7.0.ipynb`)
- When uploading screened meteo data to the database using the notebook `StepwiseMeteoScreeningFromDatabase`, variables with the same name, measurement and data version as the screened variable(s) are now deleted from the database before the new data are uploaded. Implemented in the Python package `dbc-influxdb` to avoid duplicates in the database. Such duplicates can occur when one of the tags of an otherwise identical variable changed, e.g., when one of the tags of the originally uploaded data was wrong and needed correction. The database `InfluxDB` stores a new time series alongside the previous time series when one of the tags is different in an otherwise identical time series.
- Added test case for `Hampel` filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter`)
- Added test case for `HampelDaytimeNighttime` filter (`tests.test_outlierdetection.TestOutlierDetection.test_hampel_filter_daytime_nighttime`)
- Added test case for `zScore` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore`)
- Added test case for `TrimLow` (`tests.test_outlierdetection.TestOutlierDetection.test_trim_low_nt`)
- Added test case for `zScoreDaytimeNighttime` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_daytime_nighttime`)
- 33/33 unittests ran successfully
- Added package sktime, a unified framework for machine learning with time series.
- Plotting cumulatives with `CumulativeYear` now also shows the cumulative for the reference, i.e., for the mean over the reference years (`diive.core.plotting.cumulative.CumulativeYear`)
- Plotting `DielCycle` now accepts the `ylim` parameter (`diive.core.plotting.dielcycle.DielCycle`)
- Added long-term dataset for local testing purposes (internal only) (`diive.configs.exampledata.load_exampledata_parquet_long`)
- Added several classes in preparation for long-term gap-filling in a future update
- Several updates and changes to the base class for regressor decision trees (`diive.core.ml.common.MlRegressorGapFillingBase`):
    - The data are now split into a training set and a test set at the very start of regressor setup. This test set is used to evaluate models on unseen data. The default split is 80% training and 20% test data (see the sketch after this list).
    - Plotting (scores, importances etc.) is now generally separated from the method where they are calculated.
    - The same `random_state` is now used for all processing steps.
    - Refactored code.
    - Beautified console output.
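
A minimal sketch of such a split (assumed array shapes; the diive internals may differ):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix X and target y; an 80/20 split with a fixed
# random_state, done once at the very start of the regressor setup.
X = np.random.default_rng(42).normal(size=(1000, 5))
y = X[:, 0] + X[:, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 5) (200, 5)
```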
- When correcting for relative humidity values above 100%, the maximum of the corrected time series is now set to 100, after the (daily) offset was removed (`diive.pkgs.corrections.offsetcorrection.remove_relativehumidity_offset`)
- During feature reduction in machine learning regressors, features with permutation importance < 0 are now always removed (`diive.core.ml.common.MlRegressorGapFillingBase._remove_rejected_features`)
- Changed default parameters for quick random forest gap-filling (`diive.pkgs.gapfilling.randomforest_ts.QuickFillRFTS`)
- I tried to improve the console output (clarity) for several functions and methods
- Added package dtreeviz to visualize decision trees
- Updated notebook (`notebooks/GapFilling/RandomForestGapFilling.ipynb`)
- Updated notebook (`notebooks/GapFilling/LinearInterpolation.ipynb`)
- Updated notebook (`notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb`)
- Updated notebook (`notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb`)
- Updated notebook (`notebooks/GapFilling/RandomForestParamOptimization.ipynb`)
- Updated notebook (`notebooks/GapFilling/QuickRandomForestGapFilling.ipynb`)
- Updated and fixed test case (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments`)
- Updated and fixed test case (`tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Added function to calculate absolute double differences of a time series, which is the sum of absolute differences between a data record and its preceding and next record. Used in class `zScoreIncrements` for finding (isolated) outliers that are distant from neighboring records (see the sketch after this list). (`diive.core.dfun.stats.double_diff_absolute`)
- Added small function to calculate z-score stats of a time series (`diive.core.dfun.stats.sstats_zscore`)
- Added small function to calculate stats for absolute double differences of a time series (`diive.core.dfun.stats.sstats_doublediff_abs`)
- Changed the algorithm for outlier detection when using `zScoreIncrements`. Data points are now flagged as outliers if the z-scores of three absolute differences (previous record, next record and the sum of both) all exceed a specified threshold. (`diive.pkgs.outlierdetection.incremental.zScoreIncrements`)
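
A minimal sketch of the absolute double difference (not the diive implementation; example values are made up):

```python
import pandas as pd

# Absolute double difference: |x_t - x_(t-1)| + |x_t - x_(t+1)|.
# Large values indicate isolated spikes that are distant from both neighbors.
s = pd.Series([1.0, 1.1, 9.0, 1.2, 1.3])
double_diff_abs = (s - s.shift(1)).abs() + (s - s.shift(-1)).abs()
print(double_diff_abs.tolist())  # the spike at index 2 stands out (15.7)
```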
- Added new notebook for outlier detection using class `LocalOutlierFactorAllData` (`notebooks/OutlierDetection/LocalOutlierFactorAllData.ipynb`)
- Added new test case for `LocalOutlierFactorAllData` (`tests.test_outlierdetection.TestOutlierDetection.test_lof_alldata`)
- It is now possible to set a fixed random seed when creating impulse noise (`diive.pkgs.createvar.noise.add_impulse_noise`)
- In class `zScoreIncrements`, outliers are now detected by calculating the sum of the absolute differences between a data point and its respective preceding and next data point. Before, only the non-absolute difference to the preceding data point was considered. The sum of absolute differences is then used to calculate the z-score and in further consequence to flag outliers. (`diive.pkgs.outlierdetection.incremental.zScoreIncrements`)
- Added new notebook for outlier detection using class `zScoreIncrements` (`notebooks/OutlierDetection/zScoreIncremental.ipynb`)
- Added new notebook for outlier detection using class `LocalSD` (`notebooks/OutlierDetection/LocalSD.ipynb`)
- Added new test case for `zScoreIncrements` (`tests.test_outlierdetection.TestOutlierDetection.test_zscore_increments`)
- Added new test case for `LocalSD` (`tests.test_outlierdetection.TestOutlierDetection.test_localsd`)
The new class `DielCycle` makes it possible to plot diel cycles per month or across all data for time series data. At the moment, it plots the (monthly) diel cycles as means (+/- standard deviation). It makes use of the time info contained in the datetime timestamp index of the data. All aggregates are calculated by grouping data by time and (optionally) separately for each month. The diel cycles have the same time resolution as the time component of the timestamp index, e.g., hourly.
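
A minimal sketch of this grouping logic (illustrative pandas, not the `diel_cycle` implementation; data are random):

```python
import numpy as np
import pandas as pd

# Monthly diel cycles as mean +/- SD, grouped by month and time of day
# from the datetime index; the time resolution follows the index (here 30 min).
idx = pd.date_range("2023-01-01", "2023-12-31 23:30", freq="30min")
ta = pd.Series(np.random.default_rng(0).normal(10, 3, len(idx)), index=idx)
diel = ta.groupby([idx.month, idx.time]).agg(["mean", "std"])
print(diel.head())
```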
- Added new class `DielCycle` for plotting diel cycles per month (`diive.core.plotting.dielcycle.DielCycle`)
- Added new function `diel_cycle` for calculating diel cycles per month. This function is also used by the plotting class `DielCycle` (`diive.core.times.resampling.diel_cycle`)
- Added color scheme that contains 12 colors, one for each month. Not perfect, but better than before. (`diive.core.plotting.styles.LightTheme.colors_12_months`)
- Added new notebook for plotting diel cycles (per month) (`notebooks/Plotting/DielCycle.ipynb`)
- Added new notebook for calculating diel cycles (per month) (`notebooks/Resampling/ResamplingDielCycle.ipynb`)
- Added test case for new function `diel_cycle` (`tests.test_resampling.TestResampling.test_diel_cycle`)
XGBoost can now be used to fill gaps in time series data. In `diive`, `XGBoost` is implemented in class `XGBoostTS`, which adds additional options for easily including, e.g., lagged variants of feature variables, timestamp info (DOY, month, ...) and a continuous record number. It also allows direct feature reduction by including a purely random feature (consisting of completely random numbers) and calculating the 'permutation importance'. All features where the permutation importance is lower than for the random feature can then be removed from the dataset, i.e., the list of features, before building the final model.
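
A minimal sketch of this random-feature trick (not the diive internals; a random forest is used here as a stand-in regressor so the example runs without the xgboost package):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Add a purely random feature, then drop every feature whose permutation
# importance falls below that of the random feature.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.1, size=500)
X = np.column_stack([X, rng.normal(size=500)])  # last column: random feature

model = RandomForestRegressor(random_state=42).fit(X, y)
imp = permutation_importance(model, X, y, random_state=42).importances_mean
keep = imp[:-1] > imp[-1]  # keep features more important than the random one
print(keep)
```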
`XGBoostTS` and `RandomForestTS` both use the same base class `MlRegressorGapFillingBase`. This base class will also facilitate the implementation of other gap-filling algorithms in the future.
Another fun (for me) addition is the new class `TimeSince`. It makes it possible to calculate the time since the last occurrence of specific conditions. One example where this class can be useful is the calculation of 'time since last precipitation', expressed as number of records, which can be helpful in identifying dry conditions. More examples: 'time since freezing conditions' based on air temperature; 'time since management' based on management info, e.g., fertilization events. Please see the notebook for some illustrative examples.
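
A minimal sketch of the 'time since last precipitation' idea (illustrative pandas, not the `TimeSince` API; data are made up):

```python
import pandas as pd

# Number of records since the condition precip > 0 was last true.
precip = pd.Series([0.0, 1.2, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0])
event = precip > 0
groups = event.cumsum()                        # a new group starts at each event
time_since = event.groupby(groups).cumcount()  # records since last event
print(time_since.tolist())  # [0, 0, 1, 2, 0, 1, 2, 3]
```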
Please note that `diive` is still under development and bugs can be expected.
- Added gap-filling class `XGBoostTS` for time series data, using XGBoost (`diive.pkgs.gapfilling.xgboost_ts.XGBoostTS`)
- Added new class `TimeSince`: counts the number of records (incremental counter) since the last time a time series was inside a specified range, useful for e.g. counting the time since last precipitation, since last freezing temperature, etc. (`diive.pkgs.createvar.timesince.TimeSince`)
- Added base class for machine learning regressors, which is basically the code shared between the different methods. At the moment used by `RandomForestTS` and `XGBoostTS`. (`diive.core.ml.common.MlRegressorGapFillingBase`)
- Added option to change line color directly in `TimeSeries` plots (`diive.core.plotting.timeseries.TimeSeries.plot`)
- Added new notebook for gap-filling using `XGBoostTS` with minimal settings (`notebooks/GapFilling/XGBoostGapFillingMinimal.ipynb`)
- Added new notebook for gap-filling using `XGBoostTS` with more extensive settings (`notebooks/GapFilling/XGBoostGapFillingExtensive.ipynb`)
- Added new notebook for creating `TimeSince` variables (`notebooks/CalculateVariable/TimeSince.ipynb`)
- Added test case for XGBoost gap-filling (`tests.test_gapfilling.TestGapFilling.test_gapfilling_xgboost`)
- Updated test case for random forest gap-filling (`tests.test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Harmonized test case for `XGBoostTS` with test case of `RandomForestTS`
- Added test case for `TimeSince` variable creation (`tests.test_createvar.TestCreateVar.test_timesince`)
This update adds the first notebooks (and tests) for outlier detection methods. Only two tests are included so far and both tests are relatively simple, but both notebooks already show in principle how outlier removal is handled. An important aspect is that in `diive`, single outlier methods do not remove outliers by default; instead, a flag is created that shows where the outliers are located. The flag can then be used to remove the data points.

This update also includes the addition of a small function that creates artificial spikes in time series data and is therefore very useful for testing outlier detection methods.

More outlier removal notebooks will be added in the future, including a notebook that shows how to combine results from multiple outlier tests into one single overall outlier flag.
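
A minimal sketch of this flag-based workflow (the values and flag name are hypothetical): an outlier method returns a flag, and removal is a separate, user-controlled step.

```python
import pandas as pd

data = pd.Series([10.0, 10.2, 55.0, 10.1], name="TA")
flag = pd.Series([0, 0, 1, 0], name="FLAG_TA_OUTLIER")  # 1 = flagged as outlier
cleaned = data.where(flag == 0)  # flagged records become NaN
print(cleaned.tolist())  # [10.0, 10.2, nan, 10.1]
```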
- Added: new function to add impulse noise to time series (`diive.pkgs.createvar.noise.impulse`)
- Added: new notebook for outlier detection: absolute limits, separately for daytime and nighttime data (`notebooks/OutlierDetection/AbsoluteLimitsDaytimeNighttime.ipynb`)
- Added: new notebook for outlier detection: absolute limits (`notebooks/OutlierDetection/AbsoluteLimits.ipynb`)
- Added: test case for outlier detection: absolute limits, separately for daytime and nighttime data (`tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits`)
- Added: test case for outlier detection: absolute limits (`tests.test_outlierdetection.TestOutlierDetection.test_absolute_limits`)
- Added: new function to remove rows that do not have timestamp info (`NaT`) (`diive.core.times.times.remove_rows_nat` and `diive.core.times.times.TimestampSanitizer`)
- Added: new settings `VARNAMES_ROW` and `VARUNITS_ROW` in filetypes YAML files, allows better and more specific configuration when reading data files (`diive/configs/filetypes`)
- Added: many (small) example data files for various filetypes, e.g. `ETH-RECORD-TOA5-CSVGZ-20HZ`
- Added: new optional check in `TimestampSanitizer` that compares the detected time resolution of a time series with the nominal (expected) time resolution. Runs automatically when reading files with `ReadFileType`, in which case the `FREQUENCY` from the filetype configs is used as the nominal time resolution. (`diive.core.times.times.TimestampSanitizer`, `diive.core.io.filereader.ReadFileType`)
- Added: application of `TimestampSanitizer` after inserting a timestamp and setting it as index with function `insert_timestamp`; this makes sure the freq/freqstr info is available for the new timestamp index (`diive.core.times.times.insert_timestamp`)
- General: Ran all notebook examples to make sure they work with this version of `diive`
- Added: new notebook for reading EddyPro fluxnet output file with `DataFileReader` parameters (`notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_DataFileReader.ipynb`)
- Added: new notebook for reading EddyPro fluxnet output file with `ReadFileType` and pre-defined filetype `EDDYPRO-FLUXNET-CSV-30MIN` (`notebooks/ReadFiles/Read_single_EddyPro_fluxnet_output_file_with_ReadFileType.ipynb`)
- Added: new notebook for reading multiple EddyPro fluxnet output files with `MultiDataFileReader` and pre-defined filetype `EDDYPRO-FLUXNET-CSV-30MIN` (`notebooks/ReadFiles/Read_multiple_EddyPro_fluxnet_output_files_with_MultiDataFileReader.ipynb`)
- Renamed: function `get_len_header` to `parse_header` (`diive.core.dfun.frames.parse_header`)
- Renamed: exampledata files (`diive/configs/exampledata`)
- Renamed: filetypes YAML files to always include the file extension in the file name (`diive/configs/filetypes`)
- Reduced: file size for most example data files
- Added: various test cases for loading filetypes (`tests/test_loaddata.py`)
- Added: test case for loading and merging multiple files (`tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_multiple_EDDYPRO_FLUXNET_CSV_30MIN`)
- Added: test case for reading EddyPro fluxnet output file with `DataFileReader` parameters (`tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_EDDYPRO_FLUXNET_CSV_30MIN_datafilereader_parameters`)
- Added: test case for resampling series to 30MIN time resolution (`tests.test_time.TestTime.test_resampling_to_30MIN`)
- Added: test case for inserting timestamp with a different convention (middle, start, end) (`tests.test_time.TestTime.test_insert_timestamp`)
- Added: test case for inserting timestamp as index (`tests.test_time.TestTime.test_insert_timestamp_as_index`)
- Fixed: bug in class `DetectFrequency` when inferred frequency is `None` (`diive.core.times.times.DetectFrequency`)
- Fixed: bug in class `DetectFrequency` where `pd.Timedelta()` would crash if the input frequency does not have a number. `Timedelta` does not accept e.g. the frequency string `min` for minutely time resolution, even though e.g. `pd.infer_freq()` outputs `min` for data in 1-minute time resolution. `Timedelta` requires a number, in this case `1min`. Results from `infer_freq()` are now checked for whether they contain a number and if not, `1` is added at the beginning of the frequency string (see the sketch after this list). (`diive.core.times.times.DetectFrequency`)
- Fixed: bug in notebook `WindDirectionOffset`, related to frequency detection during heatmap plotting
- Fixed: bug in `TimestampSanitizer` where the script would crash if the timestamp contained an element that could not be converted to datetime, e.g., when there is a string mixed in with the regular timestamps. Data rows with invalid timestamps are now parsed as `NaT` by using `errors='coerce'` in `pd.to_datetime(data.index, errors='coerce')`. (`diive.core.times.times.convert_timestamp_to_datetime` and `diive.core.times.times.TimestampSanitizer`)
- Fixed: bug when plotting heatmap (`diive.core.plotting.heatmap_datetime.HeatmapDateTime`)
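
A minimal sketch of that frequency-string fix (not the diive code, same idea):

```python
import re
import pandas as pd

# Prepend "1" when an inferred frequency string (e.g. "min" from
# pd.infer_freq) lacks a leading number, so that pd.Timedelta accepts it.
freq = "min"
if not re.match(r"^\d", freq):
    freq = f"1{freq}"
print(pd.Timedelta(freq))  # 0 days 00:01:00
```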
- Added new function `trim_frame` that allows trimming the start and end of a dataframe based on available records of a variable (`diive.core.dfun.frames.trim_frame`)
- Added new option to export borderless heatmaps (`diive.core.plotting.heatmap_base.HeatmapBase.export_borderless_heatmap`)
- Added more info in comments of class `WindRotation2D` (`diive.pkgs.echires.windrotation.WindRotation2D`)
- Added example data for EddyPro `full_output` files (`diive.configs.exampledata.load_exampledata_eddypro_full_output_CSV_30MIN`)
- Added code in an attempt to harmonize frequency detection from data: in class `DetectFrequency` the detected frequency strings are now converted from `Timedelta` (pandas) to `offset` (pandas) to `.freqstr`. This will yield the frequency string as seen by (the current version of) pandas. The idea is to harmonize between different representations, e.g., `T` or `min` for minutes (see here). (`diive.core.times.times.DetectFrequency`)
- Updated class `DataFileReader` to comply with new `pandas` kwargs when using `.read_csv()` (`diive.core.io.filereader.DataFileReader._parse_file`)
- Environment: updated `pandas` to v2.2.2 and `pyarrow` to v15.0.2
- Updated date offsets in config filetypes to be compliant with `pandas` version 2.2+ (see here and here), e.g., `30T` was changed to `30min`. This seems to work without raising a warning; however, if frequency is inferred from available data, the resulting frequency string shows e.g. `30T`, i.e., still showing `T` for minutes instead of `min`. (`diive/configs/filetypes`)
- Changed variable names in `WindRotation2D` to be in line with the variable names given in the paper by Wilczak et al. (2001) https://doi.org/10.1023/A:1018966204465
- Removed function `timedelta_to_string` because this can be done with pandas `to_offset().freqstr`
- Removed function `generate_freq_str` (unused)
- Added test case for reading EddyPro `full_output` files (`tests.test_loaddata.TestLoadFiletypes.test_load_exampledata_eddypro_full_output_CSV_30MIN`)
- Updated test for frequency detection (`tests.test_timestamps.TestTime.test_detect_freq`)
- `pyproject.toml` now uses the inequality syntax `>=` instead of the caret syntax `^` because the version capping is restrictive and prevents compatibility in conda installations. See #74
- Added badges in `README.md`
- Smaller `diive` logo in `README.md`
- Added new heatmap plotting class `HeatmapYearMonth` that allows plotting a variable in year/month classes (`diive.core.plotting.heatmap_datetime.HeatmapYearMonth`)
- Refactored code for class `HeatmapDateTime` (`diive.core.plotting.heatmap_datetime.HeatmapDateTime`)
- Added new base class `HeatmapBase` for heatmap plots. Currently used by `HeatmapYearMonth` and `HeatmapDateTime` (`diive.core.plotting.heatmap_base.HeatmapBase`)
- Added new notebook for `HeatmapDateTime` (`notebooks/Plotting/HeatmapDateTime.ipynb`)
- Added new notebook for `HeatmapYearMonth` (`notebooks/Plotting/HeatmapYearMonth.ipynb`)
- Fixed bug in `HeatmapDateTime` where the last record of each day was not shown
- Added new notebook for `Percentiles` (`notebooks/Analyses/Percentiles.ipynb`)
- Added new notebook for `LinearInterpolation` (`notebooks/GapFilling/LinearInterpolation.ipynb`)
- Added new notebook for calculating z-aggregates in quantiles (classes) of x and y (`notebooks/Analyses/CalculateZaggregatesInQuantileClassesOfXY.ipynb`)
- Updated notebook for `DaytimeNighttimeFlag` (`notebooks/CalculateVariable/DaytimeNighttimeFlag.ipynb`)
- Updated notebook for `SortingBinsMethod` (`diive.pkgs.analyses.decoupling.SortingBinsMethod`)
Plot showing vapor pressure deficit (y) in 10 classes of short-wave incoming radiation (x), separate for 5 classes of air temperature (z). All values shown are medians of the respective variable. The shaded errorbars refer to the interquartile range for the respective class. Plot was generated using the class `SortingBinsMethod`.
- Refactored class `LongtermAnomaliesYear` (`diive.core.plotting.bar.LongtermAnomaliesYear`)
- Added new notebook for `LongtermAnomaliesYear` (`notebooks/Plotting/LongTermAnomalies.ipynb`)
- Refactored class `SortingBinsMethod`: Allows investigating binned aggregates of a variable z in binned classes of x and y. All bins now show medians and interquartile ranges. (`diive.pkgs.analyses.decoupling.SortingBinsMethod`)
- Added new notebook for `SortingBinsMethod`
- Added absolute links to example notebooks in `README.md`
- From now on, `diive` is officially published on PyPI
- Added new notebook for the `daily_correlation` function (`notebooks/Analyses/DailyCorrelation.ipynb`)
- Added new notebook for the `Histogram` class (`notebooks/Analyses/Histogram.ipynb`)
- Daily correlations are now returned with a daily (`1d`) timestamp index (`diive.pkgs.analyses.correlation.daily_correlation`)
- Updated README
- Environment: Added ruff to dev dependencies for linting
- Fixed: Replaced all references to old filetypes using the underscore with their respective new filetype names, e.g., all occurrences of `EDDYPRO_FLUXNET_30MIN` were replaced with the new name `EDDYPRO-FLUXNET-CSV-30MIN`.
- Environment: Python 3.11 is now allowed in `pyproject.toml`: `python = ">=3.9,<3.12"`
- Environment: Removed `fitter` library from dependencies, was not used.
- Docs: Testing documentation generation using Sphinx, although it looks very rough at the moment.
This update focuses on the implementation of several classes that work with high-resolution (20 Hz) data. The main motivation behind these implementations is the upcoming new version of another script, `dyco`, which will make direct use of these new classes. `dyco` makes it possible to detect and remove time lags from time series data and can also handle drifting lags, i.e., lags that are not constant over time. This is especially useful for eddy covariance data, where the detection of accurate time lags is of high importance for the calculation of ecosystem fluxes.

Plot showing the covariance between the turbulent departures of vertical wind and CO2 measurements. Maximum (absolute) covariance was found at record -26, which means that the CO2 signal has to be shifted by 26 records in relation to the wind data to obtain the maximum covariance between the two variables. Since the covariance was calculated on 20 Hz data, this corresponds to a time lag of 1.3 seconds between CO2 and wind (20 Hz = one measurement every 0.05 seconds, 26 * 0.05 = 1.3), or, to put it another way, the CO2 signal arrived 1.3 seconds later at the sensor than the wind signal. Maximum covariance was calculated using the `MaxCovariance` class.
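
A minimal sketch of the lag search by maximum covariance (not the `MaxCovariance` API; the data are synthetic and the sign convention is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=2000)  # vertical wind at 20 Hz, already demeaned
co2 = np.roll(w, 26) + rng.normal(scale=0.5, size=2000)  # CO2 lags w by 26 records

def lagged_cov(a: np.ndarray, b: np.ndarray, lag: int) -> float:
    """Covariance between a[t] and b[t + lag]."""
    a_seg = a[max(0, -lag): len(a) - max(0, lag)]
    b_seg = b[max(0, lag): len(b) - max(0, -lag)]
    return float(np.cov(a_seg, b_seg)[0, 1])

lags = np.arange(-50, 51)
covs = [lagged_cov(w, co2, int(lag)) for lag in lags]
best_lag = int(lags[np.argmax(np.abs(covs))])
print(best_lag, best_lag * 0.05)  # 26 records -> 1.3 seconds at 20 Hz
```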
- Added new class `MaxCovariance` to find the maximum covariance between two variables (`diive.pkgs.echires.lag.MaxCovariance`)
- Added new class `FileDetector` to detect expected and unexpected files from a list of files (`diive.core.io.filesdetector.FileDetector`)
- Added new class `FileSplitter` to split a file into multiple smaller parts and export them as multiple CSV files (`diive.core.io.filesplitter.FileSplitter`)
- Added new class `FileSplitterMulti` to split multiple files into multiple smaller parts and save them as CSV or compressed CSV files (`diive.core.io.filesplitter.FileSplitterMulti`)
- Added new function `create_timestamp` that calculates the timestamp for each record in a dataframe, based on the number of records in the file and the file duration (`diive.core.times.times.create_timestamp`)
- Added new filetype `ETH-SONICREAD-BICO-CSVGZ-20HZ`. These files contain data that were originally logged by the `sonicread` script, which has been in use in the ETH Grassland Sciences group since the early 2000s to record eddy covariance data within the Swiss FluxNet. Data were then converted to a regular format using the Python script bico, which also compressed the resulting CSV files to `gz` files (`gzipped`).
- Added new filetype `GENERIC-CSV-HEADER-1ROW-TS-MIDDLE-FULL-NS-20HZ`, which corresponds to a CSV file with one header row with variable names, and a timestamp that describes the middle of the averaging period, whereby the timestamp also includes nanoseconds. Time resolution of the file is 20 Hz.
- Renamed class `TurbFlux` to `WindRotation2D` and updated the code a bit, e.g., now it is possible to get rotated values for all three wind components (`u'`, `v'`, `w'`) in addition to the rotated scalar `c'`. (`diive.pkgs.echires.windrotation.WindRotation2D`)
- Renamed filetypes: all filetypes now use the dash instead of an underscore
- Renamed filetype to `ETH-RECORD-DAT-20HZ`: this filetype originates from the new eddy covariance real-time logging script `rECord` (currently not open source)
- Missing values are now defined for all files as: `NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]`
- Updated (and cleaned) notebook `StepwiseMeteoScreeningFromDatabase.ipynb`
- In `StepwiseOutlierDetection`, it is now possible to re-run an outlier detection method. The re-run(s) would produce flag(s) with the same name(s) as the first (original) run; therefore, an integer is added to the flag name. For example, if the test z-score daytime/nighttime is run a first time, it produces the flag with the name `FLAG_TA_T1_2_1_OUTLIER_ZSCOREDTNT_TEST`. When the test is run again (e.g., with different settings), the name of the flag of this second run is `FLAG_TA_T1_2_1_OUTLIER_ZSCOREDTNT_2_TEST`, etc. The script now checks whether a flag of the same name was already created, in which case an integer is added to the flag name. These re-runs are now available in addition to the `repeat=True` keyword. (`diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.addflag`) Example:
    - `METHOD` with `SETTINGS` is applied with `repeat=True` and therefore repeated until no more outliers were found with these settings. The name of the flag produced is `TEST_METHOD_FLAG`.
    - Next, `METHOD` is applied again with `repeat=True`, but this time with different `SETTINGS`. Like before, the test is repeated until no more outliers were found with the new settings. The name of the flag produced is `TEST_METHOD_2_FLAG`.
    - `METHOD` can be re-run any number of times, each time producing a new flag: `TEST_METHOD_3_FLAG`, `TEST_METHOD_4_FLAG`, ...
- Added new function to format timestamps to FLUXNET ISO format (`YYYYMMDDhhmm`) (`diive.core.times.times.format_timestamp_to_fluxnet_format`)
- Refactored and fixed class to reformat data for FLUXNET upload (`diive.pkgs.formats.fluxnet.FormatEddyProFluxnetFileForUpload`)
- Fixed `None` error when reading data files (`diive.core.io.filereader.DataFileReader._parse_file`)
- Updated notebook `FormatEddyProFluxnetFileForUpload.ipynb`
- Added new functions to extract info from a binary that was stored as an integer. These functions convert a subrange of bits from an integer or an integer series to floats, with an optional gain applied. See the docstrings of the respective functions for more info, and the sketch after this list for the general idea. (`diive.pkgs.binary.extract.get_encoded_value_from_int`) (`diive.pkgs.binary.extract.get_encoded_value_series`)
- Added new filetype `RECORD_DAT_20HZ` (`diive/configs/filetypes/RECORD_DAT_20HZ.yml`) for eddy covariance high-resolution (20Hz) raw data files recorded by the ETH `rECord` logging script.
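
A minimal sketch of the bit-extraction idea (a hypothetical helper, not the diive functions):

```python
# Decode a value stored in a subrange of bits of an integer,
# with an optional gain applied.
def decode_bits(value: int, start_bit: int, n_bits: int, gain: float = 1.0) -> float:
    mask = (1 << n_bits) - 1          # e.g. n_bits=6 -> 0b111111
    return ((value >> start_bit) & mask) * gain

raw = 0b110101101011
print(decode_bits(raw, start_bit=4, n_bits=6, gain=0.1))  # 22 * 0.1 = 2.2
```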
- Fixed bugs in `FluxProcessingChain`: flag creation for missing values did not work because of the missing `repeat` keyword (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain`)
Harmonized the way outlier flags are calculated. Outlier flags are all based on the same base class `diive.core.base.flagbase.FlagBase` like before, but the base class now includes more code that is shared by the different outlier detection methods. For example, `FlagBase` includes a method that enables repeated execution of a single outlier detection method multiple times until all outliers are removed. Results from all iterations are then combined into one single flag.

The class `StepwiseMeteoScreeningDb` that makes direct use of the stepwise outlier detection was adjusted accordingly.
- Updated notebook `StepwiseMeteoScreeningFromDatabase.ipynb`
- Removed outlier test based on seasonal-trend decomposition and z-score calculations (`OutlierSTLRZ`). The test worked in principle, but at the moment it is unclear how to set reliable parameters. In addition, the test is slow when used with multiple years of high-resolution data. De-activated for the moment.
- Updated: many docstrings.
The flux processing chain was updated in an attempt to make processing more streamlined and easier to follow. One of the biggest changes is the implementation of the `repeat` keyword for outlier tests. With this keyword set to `True`, the respective test is repeated until no more outliers can be found. How the flux processing chain can be used is shown in the updated `FluxProcessingChain` notebook (`notebooks/FluxProcessingChain/FluxProcessingChain.ipynb`).
- Added new class `QuickFluxProcessingChain`, which makes it possible to quickly execute a simplified version of the flux processing chain. This quick version runs with a lot of default values and thus not a lot of user input is needed, only some basic settings. (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.QuickFluxProcessingChain`)
- Added new repeater function for outlier detection: `repeater` is a wrapper that allows executing an outlier detection method multiple times, where each iteration gets its own outlier flag (see the sketch after this list). As an example: the simple z-score test is run a first time and then repeated until no more outliers are found. Each iteration outputs a flag. This is now used in the `StepwiseOutlierDetection` and thus the flux processing chain Level-3.2 (outlier detection) and the meteoscreening in `StepwiseMeteoScreeningDb` (not yet checked in this update). To repeat an outlier method, use the `repeat` keyword arg (see the `FluxProcessingChain` notebook for examples). (`diive.pkgs.outlierdetection.repeater.repeater`)
- Added new function `filter_strings_by_elements`: Returns a list of strings from list1 that contain all of the elements in list2. (`core.funcs.funcs.filter_strings_by_elements`)
- Added new function `flag_steadiness_horizontal_wind_eddypro_test`: Create flag for steadiness of horizontal wind u from the sonic anemometer. Makes direct use of the EddyPro output files and converts the flag to a standardized 0/1 flag. (`pkgs.qaqc.eddyproflags.flag_steadiness_horizontal_wind_eddypro_test`)
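
A minimal sketch of the repeat-until-clean idea (not the `repeater` implementation; the z-score test here is a simple stand-in):

```python
import pandas as pd

def run_zscore_test(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Simple z-score test returning a boolean outlier flag."""
    z = (series - series.mean()) / series.std()
    return z.abs() > threshold

def repeat_outlier_test(series: pd.Series, run_test) -> list[pd.Series]:
    """Repeat a test until no more outliers are found; one flag per iteration."""
    flags = []
    while True:
        flag = run_test(series)
        if not flag.any():
            break
        flags.append(flag)
        series = series.where(~flag)  # mask outliers before the next pass
    return flags
```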
- Added automatic calculation of daytime and nighttime flags whenever the flux processing chain is started (`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain._add_swinpot_dt_nt_flag`)
- Removed class `ThymeBoostOutlier` for outlier detection. At the moment it was not possible to get it to work properly.
- It appears that the kwarg `fmt` is used slightly differently for `plot_date` and `plot` in `matplotlib`. It seems it is always defined for `plot_date`, while it is optional for `plot`. Now using the `fmt` kwarg to avoid the warning: "UserWarning: marker is redundantly defined by the 'marker' keyword argument and the fmt string "o" (-> marker='o'). The keyword argument will take precedence. Therefore using 'fmt="X"' instead of 'marker="X"'." See also the answer here
- Removed `thymeboost`
- Added new class `ScatterXY`: a simple scatter plot that supports bins (`core.plotting.scatter.ScatterXY`)
- Added notebook `notebooks/Plotting/ScatterXY.ipynb`
- Added new class `DaytimeNighttimeFlag` to calculate daytime flag (1=daytime, 0=nighttime), nighttime flag (1=nighttime, 0=daytime) and potential radiation from latitude and longitude (`diive.pkgs.createvar.daynightflag.DaytimeNighttimeFlag`)
- Added support for N2O and CH4 fluxes during the calculation of the `QCF` quality flag in class `FlagQCF`
- Added first code for USTAR threshold detection for NEE
- Added new notebook `notebooks/CalculateVariable/Daytime_and_nighttime_flag.ipynb`
The `diive` repository is now hosted on GitHub.
- Added first code for XGBoost gap-filling, not production-ready yet
- Added check if enough columns for lagging features in class `RandomForestTS`
- Added more details in report for class `FluxStorageCorrectionSinglePointEddyPro`
- Fixed check in `RandomForestTS` for bug in `QuickFillRFTS`: number of available columns was checked too early
- Fixed `QuickFillRFTS` implementation in `OutlierSTLRZ`
- Fixed `QuickFillRFTS` implementation in `ThymeBoostOutlier`
- Added new package xgboost
- Updated all packages
- Implemented feature reduction (permutation importance) as separate method in
RandomForestTS
- Added new function to set values within specified time ranges to a constant
value(
pkgs.corrections.setto_value.setto_value
)- The function is now also implemented as method
in
StepwiseMeteoScreeningDb
(pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.correction_setto_value
)
- The function is now also implemented as method
in
- Updated notebook `notebooks/GapFilling/RandomForestGapFilling.ipynb`
- Updated notebook `notebooks/GapFilling/QuickRandomForestGapFilling.ipynb`
- Updated notebook `notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb`
- Updated testcase for gap-filling with random forest (`test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Re-implemented gap-filling of long-term time series spanning multiple years, where the model to gap-fill a specific year is built from data of the respective year and its two closest neighboring years (see the sketch below). (`pkgs.gapfilling.randomforest_ts.LongTermRandomForestTS`)
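The pooling idea can be sketched as follows; `yearpool` is a hypothetical helper assuming a datetime index, not the actual `LongTermRandomForestTS` code:

```python
import pandas as pd

def yearpool(df: pd.DataFrame, year: int, n_neighbors: int = 1) -> pd.DataFrame:
    """Select training data for gap-filling *year*: the year itself plus its
    closest neighboring years (year +/- 1 = its two closest neighbors)."""
    years = range(year - n_neighbors, year + n_neighbors + 1)
    return df[df.index.year.isin(years)]

# Example: the model for 2010 is built from data of 2009, 2010 and 2011
# subset = yearpool(df, year=2010)
```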
- Fixed bug in `StepwiseMeteoScreeningDb` where the position of a `return` statement during setup was incorrect
- Added function to calculate the daily correlation between two time series (`pkgs.analyses.correlation.daily_correlation`)
- Added function to calculate potential radiation (`pkgs.createvar.potentialradiation.potrad`)
- Fixed bug in `StepwiseMeteoScreeningDb` where the subclass `StepwiseOutlierDetection` did not use the already sanitized timestamp from the parent class, but sanitized the timestamp a second time, leading to potentially erroneous and irregular timestamps.
- `RandomForestTS` now has the following functions included as methods:
    - `steplagged_variants`: includes lagged variants of features
    - `include_timestamp_as_cols`: includes timestamp info as data columns
    - `add_continuous_record_number`: adds a continuous record number as new column
    - `sanitize`: validates and prepares timestamps for further processing
- `RandomForestTS` now outputs an additional predictions column where predictions from the full model and predictions from the fallback model are collected
- Renamed function `steplagged_variants` to `lagged_variants` (`core.dfun.frames.lagged_variants`)
- Updated function `lagged_variants`: it now accepts a list of lag times. This makes it possible to lag variables in both directions, i.e., the observed value can be paired with values before and after the actual time. For example, the variable `TA` is the observed value at the current timestamp, `TA-1` is the value from the preceding record, and `TA+1` is the value from the next record. Using values from the next record can be useful when modeling observations using data from a neighboring measurement location that has similar records but is lagged in time due to distance. A minimal sketch of this lagging logic is shown below.
- Updated README
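The lagging logic can be sketched with pandas `shift`; this is an illustration of the idea, not the code of `lagged_variants` itself:

```python
import pandas as pd

def add_lagged_variants(df: pd.DataFrame, col: str, lags: list) -> pd.DataFrame:
    """Add lagged copies of col: negative lags look back, positive lags look ahead."""
    for lag in lags:
        sign = "+" if lag > 0 else ""
        # shift(1) moves values one record forward, so TA-1 holds the preceding value
        df[f"{col}{sign}{lag}"] = df[col].shift(-lag)
    return df

# TA-1 = value from the preceding record, TA+1 = value from the next record
# df = add_lagged_variants(df, col="TA", lags=[-1, 1])
```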
- Updated testcase for gap-filling with random forest (`test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Updated `notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb`
- Added more args for better control of `TimestampSanitizer` (`core.times.times.TimestampSanitizer`)
- Refined various docstrings
- Added new class for optimizing random forest parameters (`pkgs.gapfilling.randomforest_ts.OptimizeParamsRFTS`)
- Added new plots for prediction error and residuals (`core.ml.common.plot_prediction_residuals_error_regr`)
- Added function that adds a continuous record number as new column in a dataframe. This could be useful as a feature in gap-filling models for long-term datasets spanning multiple years. (`core.dfun.frames.add_continuous_record_number`)
- When reading CSV files with pandas `.read_csv()`, the arg `mangle_dupe_cols=True` was removed because it is deprecated since pandas 2.0...
- ...therefore the check for duplicate column names in class `ColumnNamesSanitizer` has been refactored. In case of duplicate column names, an integer suffix is added to the column name. For example: `VAR` is renamed to `VAR.1` if it already exists in the dataframe. In case `VAR.1` also already exists, it is renamed to `VAR.2`, and so on; the integer suffix is increased until the variable name is unique (see the sketch below). (`core.io.filereader.ColumnNamesSanitizer`)
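The renaming logic can be sketched as follows; this is an illustration of the described behavior, not the `ColumnNamesSanitizer` code:

```python
def make_unique(columns: list) -> list:
    """Rename duplicate column names by appending .1, .2, ... until unique."""
    seen = set()
    unique = []
    for name in columns:
        new, suffix = name, 0
        while new in seen:  # increase the integer suffix until the name is unique
            suffix += 1
            new = f"{name}.{suffix}"
        seen.add(new)
        unique.append(new)
    return unique

print(make_unique(["VAR", "VAR", "VAR"]))  # ['VAR', 'VAR.1', 'VAR.2']
```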
- Similarly, when reading CSV files with pandas `.read_csv()`, the arg `date_parser` was removed because it is deprecated since pandas 2.0. When reading a CSV, the arg `date_format` is now used instead. The input format remains unchanged: it is still a string giving the datetime format, such as `"%Y%m%d%H%M"`.
- The random feature variable is now generated using the same random state as the model. (`pkgs.gapfilling.randomforest_ts.RandomForestTS`)
- Similarly, `train_test_split` now also uses the same random state as the model. (`pkgs.gapfilling.randomforest_ts.RandomForestTS`)
- Added new notebook `notebooks/GapFilling/RandomForestParamOptimization.ipynb`
- Added testcase for loading a dataframe from a parquet file (`test_loaddata.TestLoadFiletypes.test_exampledata_parquet`)
- Added testcase for gap-filling with random forest (`test_gapfilling.TestGapFilling.test_gapfilling_randomforest`)
- Updated `poetry` to latest version `1.6.1`
- Updated all packages to their latest versions
- Added new package yellowbrick
The class `StepwiseMeteoScreeningDb`, which is used for quality-screening of meteo data stored in the ETH Grassland Sciences database, has been refactored. It now uses the previously introduced class `StepwiseOutlierDetection` for outlier tests. (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`)
The following classes are no longer used and were removed from step-wise outlier detection:
- Removed z-score IQR test, too unreliable (`pkgs.outlierdetection.zscore.zScoreIQR`)
- Similarly, removed the seasonal trend decomposition that used the z-score IQR test, too unreliable (`pkgs.outlierdetection.seasonaltrend.OutlierSTLRIQRZ`)
- Updated notebook `notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb`
- Added new notebook `notebooks/GapFilling/RandomForestGapFilling.ipynb`
- Added new notebook `notebooks/GapFilling/QuickRandomForestGapFilling.ipynb`
- Added new notebook `notebooks/Workbench/Remove_unneeded_cols.ipynb`
The class `RandomForestTS` has been refactored. In essence, it still uses the same `RandomForestRegressor` as before, but now additionally outputs feature importances as computed by permutation. More details about permutation importance can be found in scikit-learn's official documentation: Permutation feature importance.
When the model is trained using `.trainmodel()`, a random variable is included as an additional feature. Permutation importances of all features, including the random variable, are then analyzed. Variables that yield a lower importance score than the random variable are removed from the dataset and are not used to build the model. Typically, the permutation importance for the random variable is very close to zero or even negative.
The built-in importance calculation in the `RandomForestRegressor` uses the Gini importance, an impurity-based feature importance that favors high-cardinality features over low-cardinality features. This is not ideal for time series data that is combined with categorical data. Permutation importance is therefore a better indicator of whether a variable included in the model is an important predictor or not. A sketch of this approach is shown below.
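The approach can be sketched with scikit-learn; the toy data and names below are assumptions for demonstration, not the `RandomForestTS` code:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data standing in for gap-filling features (names are illustrative only)
rng = np.random.default_rng(42)
df = pd.DataFrame({"TA": rng.normal(size=500), "SW_IN": rng.normal(size=500)})
df["NEE"] = 0.5 * df["TA"] - 0.8 * df["SW_IN"] + rng.normal(scale=0.1, size=500)
df["RANDOM"] = rng.normal(size=500)  # random variable as importance baseline

X_train, X_test, y_train, y_test = train_test_split(
    df[["TA", "SW_IN", "RANDOM"]], df["NEE"], test_size=0.25, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# Features scoring below the random variable would be dropped from the model
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
for name, score in zip(X_train.columns, result.importances_mean):
    print(f"{name}: {score:.3f}")
```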
The class now splits input data into training and testing datasets (holdout set). By default, the training set comprises 75% of the input data, the testing set 25%. After the model is trained, it is tested on the testing set. This should give a better indication of how well the model works on unseen data.
Once `.trainmodel()` is finished, the model is stored internally and can be used to gap-fill the target variable by calling `.fillgaps()`.
In addition, the class now offers improved output, with additional text output and plots that give more info about model training, testing and application during gap-filling.
`RandomForestTS` has also been streamlined. The option to include timestamp info as features (e.g., a column describing the season of the respective record) during model building is now its own function (`.include_timestamp_as_cols()`) and was removed from the class.
- New class `QuickFillRFTS` that uses `RandomForestTS` in the background to quickly fill time series data (`pkgs.gapfilling.randomforest_ts.QuickFillRFTS`)
- New function to include timestamp info as features, e.g. YEAR and DOY (`core.times.times.include_timestamp_as_cols`)
- New function to calculate various model scores, e.g. mean absolute error, R2 and more (`core.ml.common.prediction_scores_regr`)
- New function to insert the meteorological season (Northern hemisphere) as a variable (`core.times.times.insert_season`). For each record in the time series, the seasonal info from spring (March, April, May) to winter (December, January, February) is added as an integer (0=spring, 1=summer, 2=autumn, 3=winter); a minimal sketch of this mapping is shown below.
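A minimal sketch of this mapping, assuming a datetime index; this illustrates the idea and is not the actual `insert_season` code:

```python
import pandas as pd

def insert_season(df: pd.DataFrame) -> pd.DataFrame:
    """Add meteorological season (Northern hemisphere) as integer column."""
    month_to_season = {3: 0, 4: 0, 5: 0,    # spring
                       6: 1, 7: 1, 8: 1,    # summer
                       9: 2, 10: 2, 11: 2,  # autumn
                       12: 3, 1: 3, 2: 3}   # winter
    df["SEASON"] = df.index.month.map(month_to_season)
    return df
```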
- Added new example dataset, comprising ecosystem fluxes between 1997 and 2022 from the ICOS Class 1 Ecosystem station CH-Dav. This dataset will be used for testing code on long-term time series. The dataset is stored in the `parquet` file format, which allows fast loading and saving of data files in combination with good compression. The simplest way to load the dataset is:

```python
from diive.configs.exampledata import load_exampledata_parquet

df = load_exampledata_parquet()
```
- Updated README with installation details
- Updated notebook `notebooks/CalculateVariable/Calculate_VPD_from_TA_and_RH.ipynb`

Updates to class `FormatEddyProFluxnetFileForUpload`, for quickly formatting the EddyPro fluxnet output file to comply with FLUXNET requirements for uploading data.
- Formatting EddyPro fluxnet files for upload to FLUXNET: `FormatEddyProFluxnetFileForUpload`
    - Added new method to rename variables from the EddyPro fluxnet file to comply with FLUXNET variable codes: `._rename_to_variable_codes()`
    - Added new method to remove erroneous time periods from the dataset: `.remove_erroneous_data()`
    - Added new method to remove fluxes from time periods of insufficient signal strength / AGC: `.remove_low_signal_data()`
- Fixed bug: when data points are removed manually using class `ManualRemoval` and the data to be removed is a single datetime (e.g., `2005-07-05 23:15:00`), the removal now also works if the provided datetime is not found in the time series. Previously, the class raised an error that the provided datetime is not part of the index. (`pkgs.outlierdetection.manualremoval.ManualRemoval`)
- Updated notebook `notebooks/Formats/FormatEddyProFluxnetFileForUpload.ipynb` to version 3
- Relaxed conditions a bit when inferring the time resolution of time series (`core.times.times.timestamp_infer_freq_progressively`, `core.times.times.timestamp_infer_freq_from_timedelta`)
- When reading parquet files, the `TimestampSanitizer` is applied by default, e.g. to detect the time resolution of the time series. Parquet files do not store info on time resolution the way it is stored in pandas dataframes (e.g. `30T` for 30MIN time resolution), even if the dataframe containing that info was saved to a parquet file; see the sketch below.
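For example, the frequency can be re-inferred after loading; this sketches the idea only (`TimestampSanitizer` does more than this, and the filename is hypothetical):

```python
import pandas as pd

df = pd.read_parquet("exampledata.parquet")  # hypothetical file

# Parquet does not preserve df.index.freq, so it is re-inferred after loading
inferred = pd.infer_freq(df.index)  # e.g. '30T' for 30MIN time resolution
df = df.asfreq(inferred)            # re-attach the frequency to the index
print(df.index.freq)
```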
- Fixed bug where the interactive time series plot did not show in Jupyter notebooks (`core.plotting.timeseries.TimeSeries`)
- Fixed bug where certain parts of the flux processing chain could not be used for the sensible heat flux `H`. The issue was that `H` is calculated from sonic temperature (`T_SONIC` in EddyPro `_fluxnet_` output files), which was not considered in function `pkgs.flux.common.detect_flux_basevar`.
- Fixed bug: interactive plotting in notebooks using `bokeh` did not work. The reason was that the `bokeh` plot tools (controls) `ZoomInTool()` and `ZoomOutTool()` do not seem to work anymore. Both tools are now deactivated.
- Added new notebook for simple (interactive) time series plotting: `notebooks/Plotting/TimeSeries.ipynb`
- Updated notebook `notebooks/FluxProcessingChain/FluxProcessingChain.ipynb` to version 3
This update focuses on the flux processing chain, in particular the creation of the extended quality flags, the flux storage correction and the creation of the overall quality flag `QCF`.
- Added new class `StepwiseOutlierDetection` that can be used for general outlier detection in time series data. It is based on the `StepwiseMeteoScreeningDb` class introduced in v0.50.0, but aims to be more generally applicable to all sorts of time series data stored in files (`pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection`)
- Added new outlier detection class that identifies outliers based on seasonal-trend decomposition and z-score calculations (`pkgs.outlierdetection.seasonaltrend.OutlierSTLRZ`)
- Added new outlier detection class that flags values based on absolute limits that can be defined separately for daytime and nighttime (`pkgs.outlierdetection.absolutelimits.AbsoluteLimitsDaytimeNighttime`)
- Added small functions to directly save (`core.io.files.save_as_parquet`) and load (`core.io.files.load_parquet`) parquet files. Parquet files offer fast loading and saving in combination with good compression. For more information about the parquet format see here
- Angle-of-attack: the angle-of-attack test can now be used during QC flag creation (`pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsLevel2.angle_of_attack_test`)
- Various smaller additions
- Renamed class `FluxQualityFlagsLevel2` to `FluxQualityFlagsLevel2EddyPro` because it is directly based on the EddyPro output (`pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsLevel2EddyPro`)
- Renamed class `FluxStorageCorrectionSinglePoint` to `FluxStorageCorrectionSinglePointEddyPro` (`pkgs.fluxprocessingchain.level31_storagecorrection.FluxStorageCorrectionSinglePointEddyPro`)
- Refactored the creation of flux quality flags (`pkgs.fluxprocessingchain.level2_qualityflags.FluxQualityFlagsLevel2EddyPro`)
- Missing storage correction terms are now gap-filled using random forest before the storage terms are added to the flux. For some records, the calculated flux was available but the storage term was missing, resulting in a missing storage-corrected flux (example: 97% of fluxes had a storage term available, but for 3% it was missing). The gap-filling makes sure that each flux value has a corresponding storage term and thus more values are available for further processing. The gap-filling is done solely based on timestamp information, such as DOY and hour. (`pkgs.fluxprocessingchain.level31_storagecorrection.FluxStorageCorrectionSinglePoint`)
- The outlier detection using z-scores for daytime and nighttime data uses latitude/longitude settings to calculate daytime/nighttime via `pkgs.createvar.daynightflag.nighttime_flag_from_latlon`. Before the z-score calculation, the time resolution of the time series is now checked and assigned automatically. (`pkgs.outlierdetection.zscore.zScoreDaytimeNighttime`)
- Removed `pkgs.fluxprocessingchain.level32_outlierremoval.FluxOutlierRemovalLevel32` since flux outlier removal is now done in the generally applicable class `StepwiseOutlierDetection` (see new features)
- Various smaller changes and refactorings
- Updated `poetry` to newest version `v1.5.1`. The `lock` files have a new format since `v1.3.0`.
- Created new `lock` file for `poetry`.
- Added new package `pyarrow`.
- Added new package `pymannkendall` (see GitHub) to analyze time series data for trends. Functions of this package are not yet implemented in `diive`.
- Added new notebook for loading and saving parquet files in `notebooks/Formats/LoadSaveParquetFile.ipynb`
- Flux processing chain: added new notebook for flux post-processing in `notebooks/FluxProcessingChain/FluxProcessingChain.ipynb`.
- Identify critical heat days for the ecosystem flux NEE (net ecosystem exchange), based on air temperature and VPD (`pkgs.flux.criticalheatdays.FluxCriticalHeatDaysP95`)
- Calculate z-aggregates in classes of x and y (`pkgs.analyses.quantilexyaggz.QuantileXYAggZ`)
- Plot heatmap from a pivoted dataframe, using x, y, z values (`core.plotting.heatmap_xyz.HeatmapPivotXYZ`)
- Calculate stats for time series and store results in a dataframe (`core.dfun.stats.sstats`)
- New helper function to load and merge files of a specific filetype (`core.io.files.loadfiles`)
- Added more parameters when formatting the EddyPro fluxnet file for FLUXNET (`pkgs.formats.fluxnet.FormatEddyProFluxnetFileForUpload`)
- Removed left-over code
- Multiple smaller refactorings
- Added new notebook for calculating VPD in `notebooks/CalculateVariable/Calculate_VPD_from_TA_and_RH.ipynb`
- Added new notebook for calculating time series stats: `notebooks/Stats/TimeSeriesStats.ipynb`
- Added new notebook for formatting EddyPro output for upload to FLUXNET: `notebooks/Formats/FormatEddyProFluxnetFileForUpload.ipynb`
- Added new notebooks for reading data files (ICOS BM files)
- Added additional output to other notebooks
- Added new notebook section `Workbench` for practical use cases
- New filetype `configs/filetypes/ICOS_H1R_CSVZIP_1MIN.yml`
- Added more output for detecting frequency from the timeseries index (`core.times.times.DetectFrequency`)
    - The associated functions have been updated accordingly: `core.times.times.timestamp_infer_freq_from_fullset`, `core.times.times.timestamp_infer_freq_progressively`, `core.times.times.timestamp_infer_freq_from_timedelta`
- Added new notebook (`notebooks/TimeStamps/Detect_time_resolution.ipynb`)
- Added new unittest (`tests/test_timestamps.py`)
- `GapFinder` now gives sorted output by default, i.e. the output dataframe shows start and end date for the largest gaps first (`pkgs.analyses.gapfinder.GapFinder`)
- Added new notebook for finding gaps in time series in `notebooks/Analyses/GapFinder.ipynb`
- Added new notebook for time functions in `notebooks/TimeFunctions/times.ipynb`
- New repository branch `indev` is used as development branch from now on
- Branch `main` will contain code from the most recent release
This update focuses on wind direction time series and adds the first example notebooks to `diive`. From now on, new example notebooks will be added regularly.
- Wind direction offset correction: compare yearly wind direction histograms to a reference, detect the offset in comparison to the reference and correct wind directions for the offset per year (`pkgs.corrections.winddiroffset.WindDirOffset`)
- Wind direction aggregation: calculate the mean etc. of wind direction in degrees (`core.funcs.funcs.winddirection_agg_kanda`); a minimal sketch of circular averaging is shown below
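Wind direction is a circular quantity, so a simple arithmetic mean fails around north. The sketch below shows standard vector-averaging; whether this matches `winddirection_agg_kanda` in every detail is an assumption:

```python
import numpy as np

def winddir_mean(degrees: np.ndarray) -> float:
    """Circular (vector-averaging) mean of wind directions in degrees."""
    rad = np.deg2rad(degrees)
    mean = np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))
    return mean % 360  # wrap result into 0..360

print(winddir_mean(np.array([350, 10])))  # 0.0, not the arithmetic mean 180
```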
- Added new notebook for wind direction offset correction in `notebooks/Corrections/WindDirectionOffset.ipynb`
- Added new notebook for reading ICOS BM files in `notebooks/ReadFiles/Read_data_from_ICOS_BM_files.ipynb`
- Histogram analysis now accepts a pandas Series as input (`pkgs.analyses.histogram.Histogram`)
- Added unittests for reading (some) filetypes
- The `DataFileReader` can now directly read zipped files (`core.io.filereader.DataFileReader`)
- Interactive time series plot (`core.plotting.timeseries.TimeSeries.plot_interactive`):
    - added x- and y-axes to the plots
    - new parameters `width` and `height` allow controlling the size of the plot
    - more controls, such as undo/redo and zoom in/zoom out buttons, were added
- The filetypes defined in `diive/configs/filetypes` now accept the setting `COMPRESSION: "zip"`. In essence, this allows reading zipped files directly.
- New filetype `ICOS_H2R_CSVZIP_10S`
- Compression in filetypes is now given as `COMPRESSION: "None"` for no compression, and `COMPRESSION: "zip"` for zipped CSV files.
- `LocalSD` in `StepwiseMeteoScreeningDb` now accepts the parameter `winsize` to define the size of the rolling window (default `None`, in which case the window size is calculated automatically as 1/20 of the number of records). (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test`)
- Fixed bug: outlier test `LocalSD` did not consider the user input `n_sd` (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test`)
- Fixed bug: during resampling, the info for the tag `data_version` was incorrectly stored in the tag `freq`. (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.resample`)
- Added plotting library `bokeh` to dependencies
- When combining data of different time resolutions, the data are now combined using `.combine_first()` instead of `.concat()` to avoid duplicates during merging. This should work reliably because data of the highest resolution are available first, and then lower-resolution, upsampled (backfilled) data are added, filling gaps in the high-resolution data. Because gaps are filled, overlaps between the two resolutions are avoided. With `.concat()`, gaps were not filled; instead, timestamps were simply added as new records, and thus duplicates in the timestamp occurred (see the sketch below). (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb._harmonize_timeresolution`)
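A minimal sketch of the difference, using hypothetical toy data:

```python
import pandas as pd

hires = pd.Series([1.0, None, 3.0],
                  index=pd.date_range("2023-01-01 00:00", periods=3, freq="10T"))
lores_upsampled = pd.Series([9.0, 9.0, 9.0], index=hires.index)  # backfilled data

# combine_first keeps the high-resolution value where available and fills gaps,
# whereas concat would append overlapping timestamps as duplicate records
merged = hires.combine_first(lores_upsampled)
print(merged)  # 1.0, 9.0, 3.0 -- no duplicate timestamps
```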
- Updated dependencies to newest possible versions
- Removed the packages `jupyterlab` and `jupyter-bokeh` from dependencies, because the latter caused issues when trying to install `diive` in a `conda` environment on a shared machine. Both dependencies are still listed in the `pyproject.toml` file as `dev` dependencies. It makes sense to keep both packages separate from `diive`, because they are specifically for `jupyter` notebooks and not strictly related to `diive` functionality.
- In `StepwiseMeteoScreeningDb` the current cleaned timeseries can now be plotted with `showplot_current_cleaned`.
- Timeseries can now be plotted using the `bokeh` library. These plots are interactive and can be used directly in Jupyter notebooks. (`core.plotting.timeseries.TimeSeries`)
- Added new plotting package `jupyter_bokeh` for interactive plotting in JupyterLab.
- Added new plotting package `seaborn`.

`StepwiseMeteoScreeningDb` now works on a copy of the input data to avoid unintended overwriting of the input.
- Data formats: added new package `diive/pkgs/formats` that assists in converting data outputs to formats required e.g. for data sharing with FLUXNET.
    - Convert the EddyPro `_fluxnet_` output file to the FLUXNET data format for data upload (data sharing). (`pkgs.formats.fluxnet.ConvertEddyProFluxnetFileForUpload`)
- Insert timestamp column: insert a timestamp column that shows the START, END or MIDDLE time of the averaging interval (`core.times.times.insert_timestamp`)
- Manual removal of data points: flag manually defined data points as outliers. (`pkgs.outlierdetection.manualremoval.ManualRemoval`)
)
Added additional outlier detection algorithms
to StepwiseMeteoScreeningDb
(pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb
):
- Added local outlier factor test, across all data
(
pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_lof_test
) - Added local outlier factor test, separately for daytime and nighttime
(
pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_lof_dtnt_test
)
- Implemented random flux uncertainty calculation, based on Holliner and Richardson (2005)
and Pastorello et al. (2020). Calculations also include a first estimate of the error
propagation when summing up flux values to annual sums. See end of CHANGELOG for links to references.
(
pkgs.flux.uncertainty.RandomUncertaintyPAS20
)
- Added example data in
diive/configs/exampledata
, including functions to load the data.
- In
core.io.filereader
, the following classes now also acceptoutput_middle_timestamp
(boolean with defaultTrue
) as parameter:MultiDataFileReader
,ReadFileType
,DataFileReader
. This allows to keep the original timestamp of the data. - Some minor plotting adjustments
Stepwise quality-screening of meteorological data, directly from the database
In this update, the stepwise meteoscreening directly from the database, introduced in the previous update, was further refined and extended, with additional outlier tests and corrections implemented. The stepwise meteoscreening allows performing step-by-step quality tests on meteorological data. A preview plot after running a test is shown and the user can decide if the results are satisfactory or if the same test should be re-run with different parameters. Once results are satisfactory, the respective test flag is added to the data. After running the desired tests, an overall flag `QCF` is calculated from all individual tests.
In addition to the creation of quality flags, the stepwise screening allows correcting data for common issues. For example, short-wave radiation sensors often measure negative values during the night. These negative values are useful because they give info about the accuracy and precision of the sensor. In this case, values during the night should be zero. Instead of cutting off negative values, `diive` detects the nighttime offset for each day and then calculates a correction slope between individual days. This way, the daytime values are also corrected. A sketch of this idea is shown below.
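A minimal sketch of this idea; it is an illustration under assumptions (daily median as offset, linear interpolation between days), not the actual `diive` correction:

```python
import pandas as pd

def remove_radiation_zero_offset(rad: pd.Series, nighttime: pd.Series) -> pd.Series:
    """Sketch only: detect the nighttime offset per day and interpolate the
    correction between days so that daytime values are corrected as well.
    Assumes a regular datetime index and a 0/1 nighttime flag."""
    offset_per_day = rad[nighttime == 1].resample("D").median()  # daily nighttime offset
    # interpolate daily offsets to the original time resolution ("correction slope")
    offset = (offset_per_day
              .reindex(offset_per_day.index.union(rad.index))
              .interpolate(method="time")
              .reindex(rad.index))
    return rad - offset
```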
After quality-screening and corrections, data are resampled to 30MIN time resolution.
At the moment, the stepwise meteoscreening works for data downloaded from the InfluxDB database. The screening respects the database format (including tags) and prepares the screened, corrected and resampled data for direct database upload.
Due to its modular approach, the stepwise screening can be easily adjusted to work with any type of data files. This adjustment will be done in one of the next updates.
- Renamed class `MetScrDbMeasurementVars` to `StepwiseMeteoScreeningDb` (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb`)
- Stepwise MeteoScreening: added access to multiple methods for easy stepwise execution:
    - Added local SD outlier test (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test`)
    - Added absolute limits outlier test (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_abslim_test`)
    - Added correction to remove the radiation zero offset (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.correction_remove_radiation_zero_offset`)
    - Added correction to remove the relative humidity offset (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.correction_remove_relativehumidity_offset`)
    - Added correction to set values above a threshold to the threshold (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.correction_setto_max_threshold`)
    - Added correction to set values below a threshold to the threshold (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.correction_setto_min_threshold`)
    - Added comparison plot before/after QC and corrections (`pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.showplot_resampled`)
- Stepwise MeteoScreening (`pkgs.qaqc.meteoscreening.MetScrDbMeasurementVars`):
    - Helper class to screen time series of meteo variables directly from the database. The class is optimized to work in Jupyter notebooks. Various outlier detection methods can be called on demand. Outlier results are displayed and the user can accept the results and proceed, or repeat the step with adjusted method parameters. An unlimited number of tests can be chained together. At the end of the screening, an overall flag is calculated from ALL single flags. The overall flag is then used to filter the time series.
    - Variables: the class allows the simultaneous quality-screening of multiple variables from one single measurement, e.g., multiple air temperature variables.
    - Resampling: filtered time series are resampled to 30MIN time resolution.
    - Database tags: the class is optimized to work with the InfluxDB format of the ETH Grassland Sciences Group. It can handle database tags and updates tags after data screening and resampling.
    - Handling different time resolutions: one challenging aspect of the screening was the different time resolutions of the raw data. In some cases, the time resolution changed from e.g. 10MIN for older data to 1MIN for newer data. In cases of different time resolutions, the lower resolution is upsampled to the higher resolution and the emerging gaps are back-filled with available data. Back-filling is used because the timestamp in the database is always TIMESTAMP_END, i.e., it gives the end of the averaging interval (see the sketch below). The advantage of upsampling is that all outlier detection routines can be applied to the whole dataset. Since data are resampled to 30MIN after screening and since TIMESTAMP_END is respected, the upsampling itself has no impact on the resulting aggregates.
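A minimal sketch of this upsampling with back-filling, using toy data; the actual implementation additionally handles tags and more:

```python
import pandas as pd

# 10MIN data (TIMESTAMP_END convention: timestamp marks the end of the interval)
lores = pd.Series([3.0, 6.0],
                  index=pd.to_datetime(["2023-01-01 00:10", "2023-01-01 00:20"]))

# Upsample to 1MIN and back-fill: because the timestamp gives the END of the
# averaging interval, each 10MIN value is valid for the 1MIN records before it
hires = lores.resample("1T").bfill()
print(hires)
```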
- Generating the plot NEP penalty vs hours above threshold now requires a minimum of 2 bootstrap runs to calculate prediction intervals (`pkgs.flux.nep_penalty.NEPpenalty.plot_critical_hours`)
- Fixed bug in `BinFitter`: the parameter to set the number of predictions is now correctly named `n_predictions`; similarly `n_bins_x`.
- Fixed typos in functions `insert_aggregated_in_hires`, `SortingBinsMethod`, `FindOptimumRange`, `pkgs.analyses.optimumrange.FindOptimumRange._values_in_optimum_range` and others.
- Fixed other typos
- USTAR threshold (`pkgs.flux.ustarthreshold.UstarThresholdConstantScenarios`): calculates how many records of e.g. a flux variable are still available after the application of different USTAR thresholds. In essence, it gives an overview of the sensitivity of the variable to different thresholds.
- Outlier detection, LOF across all data (`pkgs.outlierdetection.lof.LocalOutlierFactorAllData`): calculation of the local outlier factor across all data, i.e., no differentiation between daytime and nighttime data.
- Outlier detection, increments (`pkgs.outlierdetection.incremental.zScoreIncremental`): based on the absolute change of one record in comparison to the previous record. These differences are stored as a timeseries, the z-score is calculated and outliers are removed based on the observed differences. Works well with data that do not have a diel cycle, e.g. soil water content.
- Outlier detection, LOF (local outlier factor) (`pkgs.outlierdetection.lof.LocalOutlierFactorDaytimeNighttime`): identify outliers based on the local outlier factor, done separately for daytime and nighttime data.
- Multiple z-score outlier detections (a minimal z-score flag is sketched below):
    - Simple outlier detection based on the z-score of observations, calculated from the mean and std of the complete timeseries (`pkgs.outlierdetection.zscore.zScore`)
    - z-score outlier detection separately for daytime and nighttime data (`pkgs.outlierdetection.zscore.zScoreDaytimeNighttime`)
    - Identify outliers based on the z-score of interquartile range data (`pkgs.outlierdetection.zscore.zScoreIQR`)
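A minimal sketch of the simplest variant; this illustrates the idea only and is not the `zScore` code itself (the threshold value is an assumption):

```python
import pandas as pd

def flag_zscore(series: pd.Series, threshold: float = 4.0) -> pd.Series:
    """Simple z-score outlier flag (0=OK, 1=outlier), computed from the mean
    and standard deviation of the complete time series."""
    zscore = (series - series.mean()) / series.std()
    return (zscore.abs() > threshold).astype(int)
```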
- Outlier detection (`pkgs.fluxprocessingchain.level32_outlierremoval.OutlierRemovalLevel32`): class that allows applying multiple methods for outlier detection as part of the flux processing chain
- Flux Processing Chain: worked on making the chain more accessible to users. The purpose of the modules in `pkgs/fluxprocessingchain` is to expose functionality to the user, i.e., they make functionality needed in the chain accessible. This should be as easy as possible and this update further simplified this access. At the moment there are three modules in `pkgs/fluxprocessingchain/`: `level2_qualityflags.py`, `level31_storagecorrection.py` and `level32_outlierremoval.py`. An example for the chain is given in `fluxprocessingchain.py`.
- QCF flag (`pkgs.qaqc.qcf.FlagQCF`): refactored code; the creation of the overall quality flag `QCF` is now done using the same code for flux and meteo data. The general logic of the `QCF` calculation is that results from multiple quality checks, stored as flags in the data, are combined into one single quality flag.
- Outlier removal using STL: module was renamed to `pkgs.outlierdetection.seasonaltrend.OutlierSTLRIQRZ`. It is not the most convenient name, I know, but it stands for Seasonal Trend decomposition using LOESS, based on Residual analysis of the InterQuartile Range using Z-scores.
- Search files can now search in subfolders of multiple base folders (`core.io.filereader.search_files`)
- Outlier removal using STL (`pkgs.outlierdetection.seasonaltrend.OutlierSTLIQR`): implemented first code to remove outliers using seasonal-trend decomposition using LOESS. This method divides a time series into seasonal, trend and residual components. `diive` uses the residuals to detect outliers based on z-score calculations.
- Overall quality flag for meteo data (`pkgs.qaqc.qcf.MeteoQCF`):
    - Combines the results from multiple flags into one single flag
    - Very similar to the calculation of the flux QCF flag
- MeteoScreening (`diive/pkgs/qaqc/meteoscreening.py`):
    - Refactored most of the code relating to the quality-screening of meteo data
    - Implemented the calculation of the overall quality flag QCF
    - Two overview figures are now created at the end of the screening
    - Flags for tests used during screening are now created using a base class (`core.base.flagbase.FlagBase`)
- Flux Processing Chain: all modules relating to the Swiss FluxNet flux processing chain are now collected in the dedicated package `fluxprocessingchain`. Relevant modules were moved to this package, some renamed:
    - `pkgs.fluxprocessingchain.level2_qualityflags.QualityFlagsLevel2`
    - `pkgs.fluxprocessingchain.level31_storagecorrection.StorageCorrectionSinglePoint`
    - `pkgs.fluxprocessingchain.qcf.QCF`
- Reading YAML files (`core.io.filereader.ConfigFileReader`): only filetype configuration files are validated, i.e. checked whether they follow the expected file structure. However, there can be other YAML files, such as the file `pipes_meteo.yaml` that defines the QA/QC steps for each meteo variable. For the moment, only the filetype files are validated and the validation is skipped for the pipes file.
- Refactored the calculation of the nighttime flag from sun altitude: the code is now vectorized and runs, unsurprisingly, much faster (`pkgs.createvar.nighttime_latlon.nighttime_flag_from_latlon`)
- Some smaller changes relating to text output to the console
- Flux storage correction (`pkgs.flux.storage.StorageCorrectionSinglePoint`):
    - Calculates storage-corrected fluxes
    - Creates Level-3.1 in the flux processing chain
- Overall quality flag (`pkgs.qaqc.qcf.QCF`): calculates the overall quality flag from multiple individual flags
- Flux quality-control (`pkgs.qaqc.fluxes.QualityFlagsLevel2`):
    - Flags now have the string `_L2_` in their name to identify them as flags created during Level-2 calculations in the Swiss FluxNet flux processing chain
    - All flags can now be returned to the main data
- Renamed `pkgs.qaqc.fluxes.FluxQualityControlFlag` to `pkgs.qaqc.fluxes.QualityFlagsLevel2`
- Flux quality-control (`pkgs.qaqc.fluxes.FluxQualityControlFlag`):
    - Added heatmap plots for before/after QC comparison
    - Improved code for the calculation of the overall flag `QCF`
    - Improved console output
- Flux quality-control (`pkgs.qaqc.fluxes.FluxQualityControlFlag`): first implementation of quality control of ecosystem fluxes. Generates one overall flag (`QCF` = quality control flag) from multiple quality test results in EddyPro's `fluxnet` output file. The resulting `QCF` is Level-2 in the Swiss FluxNet processing chain, described here. `QCF` is mostly based on the ICOS methodology, described by Sabbatini et al. (2018).
- Histogram (`pkgs.analyses.histogram.Histogram`): calculates a histogram from a time series and identifies the peak distribution
- Percentiles (`pkgs.analyses.quantiles.percentiles`): calculates percentiles (0-100) for a time series
- Scatter: implemented first version of `core.plotting.scatter.Scatter`, which will be used for scatter plots in the future
- Critical days (`pkgs.flux.criticaldays.CriticalDays`): renamed variables, now using Dcrit (instead of CRD) and nDcrit (instead of nCRD)
- NEP penalty (`pkgs.flux.nep_penalty.NEPpenalty`):
    - Code was refactored to work with NEP (net ecosystem productivity) instead of NEE (net ecosystem exchange)
    - CO2 penalty was renamed to the more descriptive NEP penalty
- Sanitize column names: implemented in `core.io.filereader.ColumnNamesSanitizer`. Column names are now checked for duplicates. Found duplicates are renamed by adding a suffix to the column name. Example: `co2_mean` and `co2_mean` are renamed to `co2_mean.1` and `co2_mean.2`. This check is now implemented during the reading of the data file in `core.io.filereader.DataFileReader`.
- Configuration files: when reading filetype configuration files in `core.io.filereader.ConfigFileReader`, the resulting dictionary that contains all configurations is now validated. The validation makes sure the parameters for `.read_csv()` are in the proper format.
- Updated all dependencies to their newest (possible) versions
- Added support for filetype `EDDYPRO_FLUXNET_30MIN` (`configs/filetypes/EDDYPRO_FLUXNET_30MIN.yml`)
- Frequency groups detection: data in long-term datasets are often characterized by changing time resolutions at which the data were recorded. `core.times.times.detect_freq_groups` detects changing time resolutions in datasets and adds a group identifier in a new column that gives info about the detected time resolution in seconds, e.g., `600` for 10MIN data records. This info allows addressing and processing the different time resolutions separately during later processing, which is needed e.g. during data quality-screening and resampling.
- Outlier removal using z-score: first version of `pkgs.outlierdetection.zscore.zscoreiqr`. Removes outliers based on the z-score of interquartile range data. Data are divided into 8 groups based on quantiles. The z-score is calculated for each data point in the respective group, based on the mean and SD of the respective group. The z-score threshold to identify outlier data is calculated as the max of z-scores found in IQR data, multiplied by a factor. z-scores above the threshold are marked as outliers.
- Outlier removal using local standard deviation: first version of `pkgs.outlierdetection.local3sd.localsd`. Calculates the mean and SD in a rolling window and marks data points outside a specified range; see the sketch below.
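A minimal sketch of this local SD idea; parameter names and defaults are assumptions, and this is not the `localsd` code itself:

```python
import pandas as pd

def flag_localsd(series: pd.Series, winsize: int = 480, n_sd: float = 3.0) -> pd.Series:
    """Flag values outside mean +/- n_sd * SD, both computed in a rolling window
    (0=OK, 1=outlier)."""
    rolling = series.rolling(window=winsize, center=True, min_periods=1)
    upper = rolling.mean() + n_sd * rolling.std()
    lower = rolling.mean() - n_sd * rolling.std()
    return ((series > upper) | (series < lower)).astype(int)
```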
- MeteoScreening: added the new parameter `resampling_aggregation` in the meteoscreening setting `diive/pkgs/qaqc/pipes_meteo.yaml`. For example, `TA` needs `mean`, `PRECIP` needs `sum`.
- MeteoScreening (`pkgs.qaqc.meteoscreening.MeteoScreeningFromDatabaseSingleVar`): refactored the merging of quality-controlled 30MIN data when more than one raw-data time resolution is involved.
- Resampling (`core.times.resampling.resample_series_to_30MIN`): the minimum number of required values for resampling is `1`. However, this is only relevant for lower-resolution data, e.g. 10MIN and 30MIN, because for higher resolutions the calculated minimum number of required values yields values > 1 anyway. In addition, if data are already in 30MIN resolution, they still go through the resampling processing although it would not be necessary, because the processing includes other steps relevant to all data resolutions, such as the change of the timestamp from TIMESTAMP_MIDDLE to TIMESTAMP_END.
- Fixed display bug when showing data after high-resolution meteoscreening in a heatmap: the plot showed original instead of meteoscreened data
- Decoupling: added first version of decoupling code (`pkgs.analyses.decoupling.SortingBinsMethod`). This allows the investigation of binned aggregates of a variable `z` in binned classes of `x` and `y`. For example: show mean GPP (`y`) in 5 classes of VPD (`x`), separately for 10 classes of air temperature (`z`).
- Time series plot: `core.plotting.timeseries.TimeSeries` plots a simple time series. This will be the default method to plot time series.
- Critical days: several changes in `pkgs.flux.criticaldays.CriticalDays`:
    - By default, daily aggregates are now calculated from 00:00 to 00:00 (before it was 07:00 to 07:00).
    - Added parameters for specifying the labels for the x- and y-axis in the output figure
    - Added parameter for setting the dpi of the output figure
    - Some smaller adjustments
- `pkgs.flux.co2penalty.CO2Penalty.plot_critical_hours`: 95% prediction bands are now smoothed (rolling mean)
- CO2 penalty (since v0.44.0 renamed to NEP penalty): some code refactoring in `pkgs.flux.co2penalty.CO2Penalty`, e.g. relating to plot appearance
- `pkgs.fits.binfitter.BinFitterBTS` fits a quadratic or linear equation to data.
    - This is a refactored version of the previous `BinFitter`, allowing more options.
    - Implemented `pkgs.fits.binfitter.PlotBinFitterBTS` for plotting `BinFitterBTS` results
    - `PlotBinFitterBTS` now allows plotting of confidence intervals for the upper and lower prediction bands
    - The updated `BinFitterBTS` is now implemented in `pkgs.flux.criticaldays.CriticalDays`, where it is now possible to show confidence intervals for the upper and lower prediction bands.
- `core.plotting.heatmap_datetime.HeatmapDateTime` now accepts `figsize`
- When reading a file using `core.io.filereader.ReadFileType`, the index column is now parsed to a temporarily named column. After reading the file data, the temporary column is renamed to the correct name. This was implemented to avoid duplicate issues regarding the index column when parsing the file, because a data column with the same name as the index column might be in the dataset.
- Fixed bug in `pkgs.gapfilling.randomforest_ts.RandomForestTS`: the fallback option for gap-filling was never used and some gaps would remain in the time series.
- New analysis: `pkgs.flux.co2penalty.CO2Penalty` calculates the CO2 penalty as the difference between the observed CO2 flux and the potential CO2 flux modelled from less extreme environmental conditions.
- New calculation: `pkgs.createvar.vpd.calc_vpd_from_ta_rh` calculates vapor pressure deficit (VPD) from air temperature and relative humidity; a sketch of the calculation is shown below.
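A common formulation uses the Magnus equation for saturation vapor pressure; the sketch below illustrates the calculation, while the exact formulation and units used by `calc_vpd_from_ta_rh` may differ:

```python
import numpy as np

def vpd_from_ta_rh(ta: np.ndarray, rh: np.ndarray) -> np.ndarray:
    """VPD (kPa) from air temperature (deg C) and relative humidity (%)."""
    es = 0.6108 * np.exp(17.27 * ta / (ta + 237.3))  # saturation vapor pressure (kPa)
    ea = es * rh / 100.0                             # actual vapor pressure (kPa)
    return es - ea

print(vpd_from_ta_rh(np.array([25.0]), np.array([60.0])))  # ~1.27 kPa
```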
- Fixed: `core.plotting.cumulative.CumulativeYear` now shows the zero line if needed
- Fixed: `core.plotting.cumulative.CumulativeYear` now shows proper axis labels
- New analysis: `pkgs.flux.criticaldays.CriticalDays` detects days in y that are above a detected x threshold. At the moment, this is implemented to work with half-hourly flux data as input and was tested with VPD (x) and NEE (y). In the example below, critical days are defined as the VPD daily max value where the daily sum of NEE (in g CO2 m-2 d-1) becomes positive (i.e., emission of CO2 from the ecosystem to the atmosphere).
- New analysis: `pkgs.analyses.optimumrange.FindOptimumRange` finds the optimum for a variable in a binned other variable. This is useful e.g. for detecting the VPD range where CO2 uptake was highest (= most negative).
- New plot: `core.plotting.cumulative.CumulativeYear` plots cumulative sums per year
- New plot: `core.plotting.bar.LongtermAnomaliesYear` plots yearly anomalies in relation to a reference period
- Refactored various code bits for plotting
- Refactored code for `pkgs/gapfilling/randomforest_ts.py`:
    - Implemented lagged variants of variables
    - Implemented long-term gap-filling, where the model to gap-fill a specific year is built from the respective year and its neighboring years
    - Implemented feature reduction using sklearn's RFECV
    - Implemented TimeSeriesSplit as the cross-validation splitting strategy during feature reduction
- Implemented `TimestampSanitizer` also when reading from file with `core.io.filereader.DataFileReader`
- Removed old code in `.core.dfun.files` and moved file logistics to `.core.io.files` instead
- Implemented saving and loading Python `pickles` in `.core.io.files`
- Added function `pkgs.corrections.offsetcorrection.remove_relativehumidity_offset` to correct humidity measurements for values > 100%
- Added first code for outlier detection via seasonal trends in `pkgs/outlierdetection/seasonaltrend.py`
- Prepared `pkgs/analyses/optimumrange.py` for future updates
- Implemented corrections and quality screening for radiation data in `pkgs.qaqc.meteoscreening`
Additions to `pkgs.corrections`:
- Added function `.offsetcorrection.remove_radiation_zero_offset` to correct radiation data for nighttime offsets
- Added function `.setto_threshold.setto_threshold` to set values above or below a specified threshold value to the threshold
- Added function `core.plotting.plotfuncs.quickplot` for quickly plotting pandas Series and DataFrame data
- Implemented `TimeSanitizer` in `core.times.resampling.resample_series_to_30MIN`
- Added decorator class `core.utils.prints.ConsoleOutputDecorator`, a wrapper to execute functions with additional info that is output to the console
- Added new class `core.times.times.TimestampSanitizer`: a class that handles timestamp checks and fixes, such as the creation of a continuous timestamp without date gaps
- Added `pkgs.createvar.nighttime_latlon.nighttime_flag_from_latlon`: function for the calculation of a nighttime flag (1=nighttime) from the latitude and longitude coordinates of a specific location
- Added `core.plotting.heatmap_datetime.HeatmapDateTime`: class to generate a heatmap plot from timeseries data
MeteoScreening uses a general settings file `pipes_meteo.yaml` that contains info on how specific `measurements` should be screened. Such `measurements` group similar variables together, e.g. different air temperatures belong to measurement `TA`.
Additions to module `pkgs.qaqc.meteoscreening`:
- Added class `ScreenVar`:
    - Performs quality screening of air temperature `TA`.
    - As a first check, I implemented outlier detection via the newly added package `ThymeBoost`, along with checks for absolute limits.
    - Screening applies the checks defined in the file `pipes_meteo.yaml` for the respective `measurement`, e.g. `TA` for air temperature.
    - The screening outputs a separate dataframe that contains `QCF` flags for each check.
    - The checks do not change the original time series. Instead, only the flags are generated.
    - Screening routines for more variables will be added over the next updates.
- Added class `MeteoScreeningFromDatabaseSingleVar`:
    - Performs quality screening and resampling to 30MIN of variables downloaded from the database.
    - It uses the `detailed` data when downloading data from the database using `dbc-influxdb`.
    - The `detailed` data contains the measurement of the variable, along with multiple tags that describe the data. The tags are needed for storage in the database.
    - After quality screening of the original high-resolution data, flagged values are removed and then the data are resampled.
    - It also handles the issue that data downloaded for a specific variable can have different time resolutions over the years, although I still need to test this.
    - After screening and resampling, data are in a format that can be directly uploaded to the database using `dbc-influxdb`.
- Added class `MeteoScreeningFromDatabaseMultipleVars`:
    - Wrapper where multiple variables can be screened in one run.
    - This should also work with a combination of different `measurements`, for example, screening radiation and temperature data in one run.
Additions to `pkgs.outlierdetection`:
- Added module `thymeboost`
- Added module `absolute_limits`
- This version introduces the code for calculating carbon cost and critical heat days.
- Added new package for flux-specific calculations: `diive.pkgs.flux`
- Added new module for calculating carbon cost: `diive.pkgs.flux.carboncost`
- Added new module for calculating critical heat days: `diive.pkgs.flux.criticalheatdays`
- None
- None
The `diive` library contains packages and modules that aim to facilitate working with time series data, in particular ecosystem data.
Previous versions of `diive` included a GUI. The GUI component will from now on be developed separately as `diive-gui`, which makes use of the `diive` library.
Previous versions of `diive` (up to v0.22.0) can be found in the separate repo diive-legacy.
This initial version of the `diive` library contains several first versions of packages that will be extended in the next versions.
A notable introduction in this version is the package `echires` for working with high-resolution eddy covariance data. This package contains the module `fluxdetectionlimit`, which allows the calculation of the flux detection limit following Langford et al. (2015).
- Added `common`: common functionality, e.g. reading data files
- Added `pkgs > analyses`: general analyses
- Added `pkgs > corrections`: calculate corrections for existing variables
- Added `pkgs > createflag`: create flag variables, e.g. for quality checks
- Added `pkgs > createvar`: calculate new variables, e.g. potential radiation
- Added `pkgs > echires`: calculations for eddy covariance high-resolution data, e.g. 20Hz data
- Added `pkgs > gapfilling`: gap-filling routines
- Added `pkgs > outlierdetection`: outlier detection
- Added `pkgs > qaqc`: quality screening for timeseries variables
- Added `optimumrange` in `pkgs > analyses`
- Added `gapfinder` in `pkgs > analyses`
- Added `offsetcorrection` in `pkgs > corrections`
- Added `setto_threshold` in `pkgs > corrections`
- Added `outsiderange` in `pkgs > createflag`
- Added `potentialradiation` in `pkgs > createvar`
- Added `fluxdetectionlimit` in `pkgs > echires`
- Added `interpolate` in `pkgs > gapfilling`
- Added `hampel` in `pkgs > outlierdetection`
- Added `meteoscreening` in `pkgs > qaqc`
- None
- None
- Hollinger, D. Y., & Richardson, A. D. (2005). Uncertainty in eddy covariance measurements and its application to physiological models. Tree Physiology, 25(7), 873–885. https://doi.org/10.1093/treephys/25.7.873
- Langford, B., Acton, W., Ammann, C., Valach, A., & Nemitz, E. (2015). Eddy-covariance data with low signal-to-noise ratio: Time-lag determination, uncertainties and limit of detection. Atmospheric Measurement Techniques, 8(10), 4197–4213. https://doi.org/10.5194/amt-8-4197-2015
- Papale, D., Reichstein, M., Aubinet, M., Canfora, E., Bernhofer, C., Kutsch, W., Longdoz, B., Rambal, S., Valentini, R., Vesala, T., & Yakir, D. (2006). Towards a standardized processing of Net Ecosystem Exchange measured with eddy covariance technique: Algorithms and uncertainty estimation. Biogeosciences, 3(4), 571–583. https://doi.org/10.5194/bg-3-571-2006
- Pastorello, G., et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data, 7(1), 225. https://doi.org/10.1038/s41597-020-0534-3
- Reichstein, M., Falge, E., Baldocchi, D., Papale, D., Aubinet, M., Berbigier, P., Bernhofer, C., Buchmann, N., Gilmanov, T., Granier, A., Grunwald, T., Havrankova, K., Ilvesniemi, H., Janous, D., Knohl, A., Laurila, T., Lohila, A., Loustau, D., Matteucci, G., … Valentini, R. (2005). On the separation of net ecosystem exchange into assimilation and ecosystem respiration: Review and improved algorithm. Global Change Biology, 11(9), 1424–1439. https://doi.org/10.1111/j.1365-2486.2005.001002.x