Skip to content

Commit

Permalink
Merge pull request #205 from holukas/indev
Browse files Browse the repository at this point in the history
Indev
  • Loading branch information
holukas authored Sep 11, 2024
2 parents e05ee15 + 47fe57c commit 0a9f1d7
Show file tree
Hide file tree
Showing 33 changed files with 6,841 additions and 4,825 deletions.
101 changes: 92 additions & 9 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,88 @@

![DIIVE](images/logo_diive1_256px.png)

## v0.81.0 | 11 Sep 2024

## Expanding Flux Processing Capabilities

This update brings advancements for post-processing eddy covariance data in the context of the `FluxProcessingChain`.
The goal is to offer a complete chain for post-processing ecosystem flux data, specifically designed to work seamlessly
with the standardized `_fluxnet` output file from the
widely-used [EddyPro](https://www.licor.com/env/products/eddy-covariance/eddypro) software.

Now, diive offers the option for USTAR filtering based on *known* constant thresholds across the entire dataset (similar
to the `CUT` scenarios in FLUXNET data). While seasonal (DJF, MAM, JJA, SON) thresholds are calculated internally,
applying them on a seasonal basis or using variable thresholds per year (like FLUXNET's `VUT` scenarios) isn't yet
implemented.

With this update, the `FluxProcessingChain` class can handle various data processing steps:

- Level-2: Quality flag expansion
- Level-3.1: Storage correction
- Level-3.2: Outlier removal
- Level-3.3: (new) USTAR filtering (with constant thresholds for now)
- (upcoming) Level-4.1: long-term gap-filling using random forest and XGBoost
- For info about the different flux levels
see [Swiss FluxNet flux processing chain](https://www.swissfluxnet.ethz.ch/index.php/data/ecosystem-fluxes/flux-processing-chain/)

### New features

- Added class to apply multiple known constant USTAR (friction velocity) thresholds, creating flags that indicate time
periods characterized by low turbulence for multiple USTAR scenarios. The constant thresholds must be known
beforehand, e.g., from an earlier USTAR detection run, or from results from FLUXNET (
`diive.pkgs.flux.ustarthreshold.FlagMultipleConstantUstarThresholds`)
- Added class to apply one single known constant USTAR thresholds (
`diive.pkgs.flux.ustarthreshold.FlagSingleConstantUstarThreshold`)
- Added `FlagMultipleConstantUstarThresholds` to the flux processing chain (
`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level33_constant_ustar`)
- Added USTAR detection algorithm based on Papale et al., 2006 (`diive.pkgs.flux.ustarthreshold.UstarDetectionMPT`)
- Added function to analyze high-quality ecosystem fluxes that helps in understanding the range of highest-quality data(
`diive.pkgs.flux.hqflux.analyze_highest_quality_flux`)

### Additions

- `LocalSD` outlier detection can now use a constant SD:
- Added parameter to use standard deviation across all data (constant) instead of the rolling SD to calculate the
upper and lower limits that define outliers in the median rolling window (
`diive.pkgs.outlierdetection.localsd.LocalSD`)
- Added to step-wise outlier detection (
`diive.pkgs.outlierdetection.stepwiseoutlierdetection.StepwiseOutlierDetection.flag_outliers_localsd_test`)
- Added to meteoscreening from database (
`diive.pkgs.qaqc.meteoscreening.StepwiseMeteoScreeningDb.flag_outliers_localsd_test`)
- Added to flux processing chain (
`diive.pkgs.fluxprocessingchain.fluxprocessingchain.FluxProcessingChain.level32_flag_outliers_localsd_test`)

### Changes

- Replaced `.plot_date()` from the Matplotlib library with `.plot()` due to deprecation

### Notebooks

- Added notebook for plotting cumulative sums per year (`notebooks/Plotting/CumulativesPerYear.ipynb`)
- Added notebook for removing outliers based on the z-score in rolling time window (
`notebooks/OutlierDetection/zScoreRolling.ipynb`)

### Bugfixes

- Fixed bug when saving a pandas Series to parquet (`diive.core.io.files.save_parquet`)
- Fixed bug when plotting `doy_mean_cumulative`: no longer crashes when years defined in parameter
`excl_years_from_reference` are not in dataset (`diive.core.times.times.doy_mean_cumulative`)
- Fixed deprecation warning when plotting in `bokeh` (interactive plots)

### Tests

- Added unittest for `LocalSD` using constant SD (
`tests.test_outlierdetection.TestOutlierDetection.test_localsd_with_constantsd`)
- Added unittest for rolling z-score outlier removal (
`tests.test_outlierdetection.TestOutlierDetection.test_zscore_rolling`)
- Improved check if figure and axis were created in (`tests.test_plots.TestPlots.test_histogram`)
- 39/39 unittests ran successfully

### Environment

- Added new package `scikit-optimize`
- Added new package `category_encoders`

## v0.80.0 | 28 Aug 2024

### Additions
Expand All @@ -18,7 +100,8 @@

### Notebooks

- Added new notebook for creating a flag that indicates missing values (`notebooks/OutlierDetection/MissingValues.ipynb`)
- Added new notebook for creating a flag that indicates missing values (
`notebooks/OutlierDetection/MissingValues.ipynb`)
- Updated notebook for meteoscreening from database (
`notebooks/MeteoScreening/StepwiseMeteoScreeningFromDatabase.ipynb`)
- Updated notebook for loading and saving parquet files (`notebooks/Formats/LoadSaveParquetFile.ipynb`)
Expand Down Expand Up @@ -2048,17 +2131,17 @@ which allows the calculation of the flux detection limit following Langford et a

- None

#### **REFERENCES**

Langford, B., Acton, W., Ammann, C., Valach, A., & Nemitz, E. (2015). Eddy-covariance data with low signal-to-noise
ratio: Time-lag determination, uncertainties and limit of detection. Atmospheric Measurement Techniques, 8(10),
4197–4213. https://doi.org/10.5194/amt-8-4197-2015

# References
## **REFERENCES**

- Hollinger, D. Y., & Richardson, A. D. (2005). Uncertainty in eddy covariance measurements
and its application to physiological models. Tree Physiology, 25(7),
873–885. https://doi.org/10.1093/treephys/25.7.873
- Langford, B., Acton, W., Ammann, C., Valach, A., & Nemitz, E. (2015). Eddy-covariance data with low signal-to-noise
ratio: Time-lag determination, uncertainties and limit of detection. Atmospheric Measurement Techniques, 8(10),
4197–4213. https://doi.org/10.5194/amt-8-4197-2015
- Papale, D., Reichstein, M., Aubinet, M., Canfora, E., Bernhofer, C., Kutsch, W., Longdoz, B., Rambal, S., Valentini,
R., Vesala, T., & Yakir, D. (2006). Towards a standardized processing of Net Ecosystem Exchange measured with eddy
covariance technique: Algorithms and uncertainty estimation. Biogeosciences, 3(4),
571–583. https://doi.org/10.5194/bg-3-571-2006
- Pastorello, G. et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline
for eddy covariance data. 27. https://doi.org/10.1038/s41597-020-0534-3

5 changes: 4 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@

[![DOI](https://zenodo.org/badge/708559210.svg)](https://zenodo.org/doi/10.5281/zenodo.10884017)

*`diive` is currently under active developement with frequent updates.*

# Time series data processing

`diive` is a Python library for time series processing, in particular ecosystem data. Originally developed
Expand Down Expand Up @@ -123,10 +125,11 @@ _Create single outlier flags where `0=OK` and `2=outlier`._
- **z-score**: Identify outliers based on the z-score across all time series data ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScore.ipynb))
- **z-score increments daytime/nighttime**: Identify outliers based on the z-score of double increments ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScoreIncremental.ipynb))
- **z-score daytime/nighttime**: Identify outliers based on the z-score, separately for daytime and nighttime ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScoreDaytimeNighttime.ipynb))
- **z-score rolling**: Identify outliers based on the rolling z-score
- **z-score rolling**: Identify outliers based on the rolling z-score ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/OutlierDetection/zScoreRolling.ipynb))

### Plotting

- **Cumulatives per year** ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/CumulativesPerYear.ipynb))
- **Diel cycle per month** ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/DielCycle.ipynb))
- **Heatmap date/time**: showing values (z) of time series as date (y) vs time (x) ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/HeatmapDateTime.ipynb))
- **Heatmap year/month**: showing values (z) of time series as year (y) vs month (x) ([notebook example](https://github.com/holukas/diive/blob/main/notebooks/Plotting/HeatmapYearMonth.ipynb))
Expand Down
63 changes: 32 additions & 31 deletions diive/core/base/flagbase.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,19 +165,19 @@ def defaultplot(self, n_iterations: int = 1):
ax_ok = fig.add_subplot(gs[1, 0], sharex=ax_series)
ax_ok_hist = fig.add_subplot(gs[1, 1])

ax_series.plot_date(self.series.index, self.series,
label=f"{self.series.name}", color="#607D8B",
alpha=.5, markersize=8, markeredgecolor='none')
ax_series.plot_date(self.series[rejected].index, self.series[rejected],
label="outlier (rejected)", color="#F44336", alpha=1,
markersize=12, markeredgecolor='none', fmt='X')
ax_series.plot(self.series.index, self.series,
label=f"{self.series.name}", color="#607D8B", linestyle='none', markeredgewidth=1,
marker='o', alpha=.5, markersize=6, markeredgecolor="#607D8B", fillstyle='none')
ax_series.plot(self.series[rejected].index, self.series[rejected],
label="outlier (rejected)", color="#F44336", alpha=1, linestyle='none',
markersize=12, markeredgecolor='none', marker='X')
hist_kwargs = dict(method='n_bins', n_bins=None, highlight_peak=True, show_zscores=True, show_info=False,
show_title=False, show_zscore_values=False, show_grid=False)
HistogramPlot(self.series, **hist_kwargs).plot(ax=ax_series_hist)

ax_ok.plot_date(self.series[ok].index, self.series[ok],
label="filtered series", alpha=.5,
markersize=8, markeredgecolor='none')
ax_ok.plot(self.series[ok].index, self.series[ok],
label="filtered series", alpha=.5, linestyle='none', markeredgewidth=1,
marker='o', markersize=6, markeredgecolor="#607D8B", fillstyle='none')
HistogramPlot(self.series[ok], **hist_kwargs).plot(ax=ax_ok_hist)

default_format(ax=ax_series)
Expand Down Expand Up @@ -220,56 +220,57 @@ def plot_outlier_daytime_nighttime(self, series: Series, flag_daytime: Series,
fig.suptitle(title, fontsize=24, fontweight='bold')

ax_series = fig.add_subplot(gs[0, 0])
ax_series.xaxis.axis_date()
ax_series_hist = fig.add_subplot(gs[0, 1])
ax_cleaned = fig.add_subplot(gs[0, 2], sharex=ax_series)
ax_cleaned.xaxis.axis_date()
ax_cleaned_hist = fig.add_subplot(gs[0, 3])

ax_series_dt = fig.add_subplot(gs[1, 0])
ax_series_dt.xaxis.axis_date()
ax_series_dt_hist = fig.add_subplot(gs[1, 1])
ax_cleaned_dt = fig.add_subplot(gs[1, 2], sharex=ax_series)
ax_cleaned_dt.xaxis.axis_date()
ax_cleaned_dt_hist = fig.add_subplot(gs[1, 3])

ax_series_nt = fig.add_subplot(gs[2, 0], sharex=ax_series)
ax_series_nt.xaxis.axis_date()
ax_series_nt_hist = fig.add_subplot(gs[2, 1])
ax_cleaned_nt = fig.add_subplot(gs[2, 2], sharex=ax_series)
ax_cleaned_nt.xaxis.axis_date()
ax_cleaned_nt_hist = fig.add_subplot(gs[2, 3])

axes_series = [ax_series, ax_cleaned, ax_series_dt, ax_cleaned_dt, ax_series_nt, ax_cleaned_nt]
axes_hist = [ax_series_hist, ax_cleaned_hist, ax_series_dt_hist,
ax_cleaned_dt_hist, ax_series_nt_hist, ax_cleaned_nt_hist]
hist_kwargs = dict(method='n_bins', n_bins=None, highlight_peak=True, show_zscores=True, show_info=False,
show_title=False, show_zscore_values=False, show_grid=False)
series_kwargs = dict(x=df.index, fmt='o', mec='none', alpha=.2, color='black')
series_kwargs = dict(marker='o', mec='black', markeredgewidth=1, alpha=.2, fillstyle='none', linestyle='none')

# Column 0
ax_series.plot_date(
y=df['CLEANED'], label=f"OK ({df['CLEANED'].count()} values)", **series_kwargs)
ax_series.plot_date(
x=df.index, y=df['OUTLIER'], fmt='X', ms=10, mec='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER'].count()} values)")
ax_series_dt.plot_date(
y=df['UNFILTERED_DT'], label=f"series ({df['UNFILTERED_DT'].count()} values)", **series_kwargs)
ax_series_dt.plot_date(
x=df.index, y=df['OUTLIER_DT'], fmt='X', ms=10, mec='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER_DT'].count()} values)")
ax_series_nt.plot_date(
y=df['UNFILTERED_NT'], label=f"series ({df['UNFILTERED_NT'].count()} values)", **series_kwargs)
ax_series_nt.plot_date(
x=df.index, y=df['OUTLIER_NT'], fmt='X', ms=10, mec='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER_NT'].count()} values)")
ax_series.plot(df.index, df['CLEANED'], label=f"OK ({df['CLEANED'].count()} values)", **series_kwargs)
ax_series.plot(df.index, df['OUTLIER'], marker='X', ms=10, mec='none', linestyle='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER'].count()} values)")
ax_series_dt.plot(df.index, df['UNFILTERED_DT'], label=f"series ({df['UNFILTERED_DT'].count()} values)",
**series_kwargs)
ax_series_dt.plot(df.index, df['OUTLIER_DT'], marker='X', ms=10, mec='none', linestyle='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER_DT'].count()} values)")
ax_series_nt.plot(df.index, df['UNFILTERED_NT'], label=f"series ({df['UNFILTERED_NT'].count()} values)",
**series_kwargs)
ax_series_nt.plot(df.index, df['OUTLIER_NT'], marker='X', ms=10, mec='none', linestyle='none',
alpha=.9, color='red', label=f"outlier ({df['OUTLIER_NT'].count()} values)")

# Column 1
HistogramPlot(s=df['UNFILTERED'], **hist_kwargs).plot(ax=ax_series_hist)
HistogramPlot(s=df['UNFILTERED_DT'], **hist_kwargs).plot(ax=ax_series_dt_hist)
HistogramPlot(s=df['UNFILTERED_NT'], **hist_kwargs).plot(ax=ax_series_nt_hist)

# Column 2
ax_cleaned.plot_date(
y=df['CLEANED'], label=f"cleaned ({df['CLEANED'].count()} values)", **series_kwargs)
ax_cleaned_dt.plot_date(
y=df['CLEANED_DT'], label=f"cleaned daytime ({df['CLEANED_DT'].count()} values)", **series_kwargs)
ax_cleaned_nt.plot_date(
y=df['CLEANED_NT'], label=f"cleaned nighttime ({df['CLEANED_NT'].count()} values)", **series_kwargs)
ax_cleaned.plot(df.index, df['CLEANED'], label=f"cleaned ({df['CLEANED'].count()} values)", **series_kwargs)
ax_cleaned_dt.plot(df.index, df['CLEANED_DT'], label=f"cleaned daytime ({df['CLEANED_DT'].count()} values)",
**series_kwargs)
ax_cleaned_nt.plot(df.index, df['CLEANED_NT'], label=f"cleaned nighttime ({df['CLEANED_NT'].count()} values)",
**series_kwargs)

# Column 3
HistogramPlot(s=df['CLEANED'], **hist_kwargs).plot(ax=ax_cleaned_hist)
Expand Down
2 changes: 2 additions & 0 deletions diive/core/io/files.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@ def save_parquet(filename: str, data: DataFrame or Series, outpath: str or None
"""
filepath = set_outpath(outpath=outpath, filename=filename, fileextension='parquet')
tic = time.time()
if isinstance(data, Series):
data = data.to_frame()
data.to_parquet(filepath)
toc = time.time() - tic
print(f"Saved file {filepath} ({toc:.3f} seconds).")
Expand Down
51 changes: 16 additions & 35 deletions diive/core/plotting/cumulative.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ def __init__(self,

# Create axis
self.fig, self.ax = pf.create_ax()
self.ax.xaxis.axis_date()

def _add_reference(self, digits_after_comma):

Expand All @@ -69,12 +70,12 @@ def _add_reference(self, digits_after_comma):

# label = f"{year}: {cumulative_df[year].dropna().iloc[-1]:.2f}"
mean_end = self.mean_doy_cumulative_df['MEAN_DOY_TIME'].iloc[-1]
self.ax.plot_date(x=self.mean_doy_cumulative_df.index.values,
y=self.mean_doy_cumulative_df['MEAN_DOY_TIME'].values,
color='black', alpha=1,
ls='-', lw=theme.WIDTH_LINE_WIDER,
marker='', markeredgecolor='none', ms=0,
zorder=99, label=f'mean {mean_end:.{digits_after_comma}f}')
self.ax.plot(self.mean_doy_cumulative_df.index.values,
self.mean_doy_cumulative_df['MEAN_DOY_TIME'].values,
color='black', alpha=1,
ls='-', lw=theme.WIDTH_LINE_WIDER,
marker='', markeredgecolor='none', ms=0,
zorder=99, label=f'mean {mean_end:.{digits_after_comma}f}')
# self.ax.fill_between(mean_cumulative_df.index.values,
# mean_cumulative_df['MEAN+1.96_SD'].values,
# mean_cumulative_df['MEAN-1.96_SD'].values,
Expand Down Expand Up @@ -125,12 +126,12 @@ def plot(self, showplot: bool = True, digits_after_comma: int = 2):
lw = theme.WIDTH_LINE_WIDER if year == self.highlight_year else theme.WIDTH_LINE_DEFAULT
color = self.highlight_year_color if year == self.highlight_year else color_list[ix]

self.ax.plot_date(x=self.cumulatives_per_year_df.index,
y=self.cumulatives_per_year_df[year],
color=color, alpha=1,
ls='-', lw=lw,
marker='', markeredgecolor='none', ms=0,
zorder=99, label=label)
self.ax.plot(self.cumulatives_per_year_df.index,
self.cumulatives_per_year_df[year],
color=color, alpha=1,
ls='-', lw=lw,
marker='', markeredgecolor='none', ms=0,
zorder=99, label=label)

# Show reference
if self.show_reference:
Expand All @@ -143,29 +144,9 @@ def plot(self, showplot: bool = True, digits_after_comma: int = 2):


def example():
# # Test data
# from diive.core.io.filereader import ReadFileType
# loaddatafile = ReadFileType(
# filetype='DIIVE-CSV-30MIN',
# filepath=r"M:\Downloads\_temp\CH_LAE_FP2021_2004-2020_ID20210607205711.diive.csv",
# # filepath=r"F:\Dropbox\luhk_work\_current\fp2022\7-14__IRGA627572__addingQCF0\CH-DAV_FP2022.1_1997-2022.08_ID20220826234456_30MIN.diive.csv",
# data_nrows=None)
# data_df, metadata_df = loaddatafile.get_filedata()
#
# from diive.core.io.files import save_as_pickle
# filepath = save_as_pickle(
# outpath=r"M:\Downloads\_temp",
# # outpath=r'F:\Dropbox\luhk_work\_current\fp2022\7-14__IRGA627572__addingQCF0',
# filename='CH_LAE_FP2021_2004-2020_ID20210607205711.diive.csv',
# # filename='CH-DAV_FP2022.1_1997-2022.08_ID20220826234456_30MIN.diive.csv',
# data=data_df)

# Test data
from diive.core.io.files import load_pickle
df_orig = load_pickle(
filepath=r"M:\Downloads\_temp\CH_LAE_FP2021_2004-2020_ID20210607205711.diive.csv.pickle"
# filepath=r'F:\Dropbox\luhk_work\_current\fp2022\7-14__IRGA627572__addingQCF0\CH-DAV_FP2022.1_1997-2022.08_ID20220826234456_30MIN.diive.csv.pickle'
)
from diive.configs.exampledata import load_exampledata_parquet
df_orig = load_exampledata_parquet()

df = df_orig.copy()

Expand All @@ -190,7 +171,7 @@ def example():
# series.index = pd.to_datetime(series.index)
# series = series.groupby(series.index.year).mean() # yearly mean

series = df['NEE_f'].copy()
series = df['NEE_CUT_REF_f'].copy()
# series = df['NEE_CUT_REF_f'].copy()
series = series.multiply(0.02161926) # umol CO2 m-2 s-1 --> g C m-2 30min-1
series_units = r'($\mathrm{gC\ m^{-2}}$)'
Expand Down
Loading

0 comments on commit 0a9f1d7

Please sign in to comment.