Commit

Merge pull request #66 from holukas/file-splitter
v0.71.0
holukas authored Mar 14, 2024
2 parents 390aa21 + a572955 commit f665651
Showing 37 changed files with 1,730 additions and 3,613 deletions.
56 changes: 56 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,62 @@

![DIIVE](images/logo_diive1_256px.png)

## v0.71.0 | 14 Mar 2024

### High-resolution update

This update focuses on the implementation of several classes that work with high-resolution (20 Hz) data.

The main motivation behind these implementations is the upcoming new version of another
script, [dyco](https://github.com/holukas/dyco), which will make direct use of these new classes. `dyco` detects
and removes time lags from time series data and can also handle drifting lags, i.e., lags that
are not constant over time. This is especially useful for eddy covariance data, where accurate time lag
detection is important for the calculation of ecosystem fluxes.

![DIIVE](images/lagMaxCovariance_diive_v0.71.0.png)
*Plot showing the covariance between the turbulent departures of vertical wind and CO2 measurements.
Maximum (absolute) covariance was found at record -26, which means that the CO2 signal has to be shifted
by 26 records in relation to the wind data to obtain the maximum covariance between the two variables.
Since the covariance was calculated on 20 Hz data, this corresponds to a time lag of 1.3 seconds
between CO2 and wind (20 Hz = measurement every 0.05 seconds, 26 * 0.05 = 1.3), or, to put it
another way, the CO2 signal arrived 1.3 seconds later at the sensor than the wind signal. Maximum
covariance was calculated using the `MaxCovariance` class.*
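
The lag search described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the implementation of the `MaxCovariance` class: the function name `max_covariance_lag`, its parameters, and the sign convention (a negative lag means the scalar arrives later than the wind) are assumptions chosen to match the caption.

```python
import numpy as np


def max_covariance_lag(w, c, max_lag):
    """Return (lag, covariance) with the largest absolute covariance.

    Hypothetical helper: for each lag, w[i] is paired with c[i - lag],
    so a negative lag pairs each wind record with a later scalar record.
    """
    w = np.asarray(w, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(w)
    best_lag, best_cov = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            w_seg, c_seg = w[: n + lag], c[-lag:]
        elif lag > 0:
            w_seg, c_seg = w[lag:], c[: n - lag]
        else:
            w_seg, c_seg = w, c
        # Covariance of the turbulent departures (deviations from the mean)
        cov = np.mean((w_seg - w_seg.mean()) * (c_seg - c_seg.mean()))
        if abs(cov) > abs(best_cov):
            best_lag, best_cov = lag, cov
    return best_lag, best_cov


# Synthetic 20 Hz example: the "CO2" series is the "wind" series delayed by 26 records
rng = np.random.default_rng(42)
w = rng.normal(size=6000)
c = np.roll(w, 26) + 0.1 * rng.normal(size=6000)

lag, cov = max_covariance_lag(w, c, max_lag=100)
lag_seconds = abs(lag) * (1 / 20)  # 20 Hz = one record every 0.05 seconds
print(lag, lag_seconds)
```

For this synthetic series the search recovers the built-in delay of 26 records, which corresponds to 26 * 0.05 = 1.3 seconds.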

### New features

- Added new class `MaxCovariance` to find the maximum covariance between two
variables (`diive.pkgs.echires.lag.MaxCovariance`)
- Added new class `FileDetector` to detect expected and unexpected files from a list of
files (`diive.core.io.filesdetector.FileDetector`)
- Added new class `FileSplitter` to split a file into multiple smaller parts and export them as multiple CSV
files (`diive.core.io.filesplitter.FileSplitter`)
- Added new class `FileSplitterMulti` to split multiple files into multiple smaller parts
and save them as CSV or compressed CSV files (`diive.core.io.filesplitter.FileSplitterMulti`)
- Added new function `create_timestamp` that calculates the timestamp for each record in a dataframe,
based on the number of records in the file and the file duration (`diive.core.times.times.create_timestamp`)
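
The idea behind deriving per-record timestamps from the record count and the file duration can be sketched with pandas as follows. This is not the signature of `create_timestamp`; the helper `add_record_timestamps`, its parameters, and the choice to let each timestamp mark the start of its record are assumptions made for illustration.

```python
import pandas as pd


def add_record_timestamps(df, file_start, file_duration):
    """Hypothetical helper: spread len(df) records evenly over file_duration.

    Assumes records are equally spaced and that each timestamp marks
    the start of its record.
    """
    n_records = len(df)
    if n_records == 0:
        raise ValueError("Cannot create timestamps for an empty dataframe.")
    # Time step per record, e.g. 30 min / 36000 records = 50 ms
    step = pd.Timedelta(file_duration) / n_records
    out = df.copy()
    out.index = pd.date_range(
        start=pd.Timestamp(file_start), periods=n_records, freq=step, name="TIMESTAMP"
    )
    return out


# A 30-minute file recorded at 20 Hz contains 36000 records
df = pd.DataFrame({"w": range(36000)})
stamped = add_record_timestamps(df, "2024-03-14 12:00:00", "30min")
print(stamped.index[0], stamped.index[-1])
```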

### Additions

- Added new filetype `ETH-SONICREAD-BICO-CSVGZ-20HZ`. These files contain data that were originally logged
by the `sonicread` script, which has been in use in the [ETH Grassland Sciences group](https://gl.ethz.ch/) since the early
2000s to record eddy covariance data within the [Swiss FluxNet](https://www.swissfluxnet.ethz.ch/). Data were
then converted to a regular format using the Python script [bico](https://github.com/holukas/bico), which
also compressed the resulting CSV files to `gz` files (`gzipped`).
- Added new filetype `GENERIC-CSV-HEADER-1ROW-TS-MIDDLE-FULL-NS-30MIN`, which corresponds to a CSV file with
one header row containing variable names and a timestamp that marks the middle of the averaging period
and includes nanoseconds. The time resolution of the file is 30 minutes.

### Changes

- Renamed class `TurbFlux` to `WindRotation2D` and updated its code, e.g., it is now possible to get
rotated values for all three wind components (`u'`, `v'`, `w'`) in addition to the rotated
scalar `c'` (`diive.pkgs.echires.windrotation.WindRotation2D`)
- Renamed filetypes: all filetype names now use dashes instead of underscores
- Renamed filetype to `ETH-RECORD-DAT-20HZ`: this filetype originates from the new eddy covariance real-time
logging script `rECord` (currently not open source)
- Missing values are now defined for all files
as: `NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]`
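
As a rough illustration of what such a list of missing-value markers does when a file is read, the sketch below passes the same values to `pandas.read_csv` via its `na_values` parameter. How `diive` applies `NA_VALUES` internally is not shown here; the CSV content and column names are made up for the example.

```python
import io

import pandas as pd

# Same markers as in the filetype configs
NA_VALUES = [-9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-"]

# Made-up example data with several different missing-value markers
csv_text = "TA,SW_IN\n12.5,-9999\nNA,350.0\n-,-6999\n13.1,410.2\n"

df = pd.read_csv(io.StringIO(csv_text), na_values=NA_VALUES)

# All markers are converted to NaN, so both columns stay numeric
print(df)
print(df.isna().sum())
```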

## v0.70.1 | 1 Mar 2024

- Updated (and cleaned) notebook `StepwiseMeteoScreeningFromDatabase.ipynb`
3 changes: 2 additions & 1 deletion README.md
@@ -2,7 +2,8 @@

# Time series data processing

-`diive` is a Python library for time series processing.
+`diive` is a Python library for time series processing, in particular ecosystem data. Originally developed
+for [Swiss FluxNet](https://www.swissfluxnet.ethz.ch/) by the [ETH Grassland Sciences group](https://gl.ethz.ch/).

Recent updates: [CHANGELOG](CHANGELOG.md)
Recent releases: [Releases](https://github.com/holukas/diive/releases)
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "CSV_10MIN"
+NAME: "CSV-10MIN"
DESCRIPTION: "Generic CSV format with 1-row header containing variable names and 1-column full timestamp."
TAGS: [ "GENERIC-CSV" ]

@@ -17,6 +17,7 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999, nan, NaN, NAN, -6999, '-' ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "10T"
DELIMITER: ","

@@ -1,5 +1,5 @@
GENERAL:
-NAME: "CSV_TS-FULL-MIDDLE_30MIN"
+NAME: "CSV-TS-FULL-MIDDLE_30MIN"
DESCRIPTION: "Generic CSV format with 1-row header containing variable names and 1-column full middle timestamp."
TAGS: [ "GENERIC-CSV" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "DIIVE_CSV_30MIN"
+NAME: "DIIVE-CSV-30MIN"
DESCRIPTION: "Default DIIVE format with 2-row header (variable name, units) and 1-column full timestamp."
TAGS: [ "DIIVE" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "EDDYPRO_FLUXNET_30MIN"
+NAME: "EDDYPRO-FLUXNET-30MIN"
DESCRIPTION: "The *_fluxnet_* file from EddyPro."
TAGS: [ "EDDYPRO" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "EDDYPRO_FULL_OUTPUT_30MIN"
+NAME: "EDDYPRO-FULL-OUTPUT-30MIN"
DESCRIPTION: "The *_full_output_* file from EddyPro."
TAGS: [ "EDDYPRO" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2 ]
SKIP_ROWS: [ 0 ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "RECORD_DAT_20HZ"
+NAME: "ETH-RECORD-DAT-20HZ"
DESCRIPTION: "TOA5 format with 4-row header including variable name and units row, and no timestamp."
TAGS: [ "TOA5" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2, 3 ]
SKIP_ROWS: [ 0, 3 ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "50ms"
DELIMITER: ","
22 changes: 22 additions & 0 deletions diive/configs/filetypes/ETH-SONICREAD-BICO-CSVGZ-20HZ.yml
@@ -0,0 +1,22 @@
GENERAL:
NAME: "ETH-SONICREAD-BICO-CSVGZ-20HZ"
DESCRIPTION: "TOA5 format with 4-row header including variable name and units row, and no timestamp."
TAGS: [ "sonicread", "bico" ]

FILE:
EXTENSION: "*.csv"
COMPRESSION: "gzip"

TIMESTAMP:
DESCRIPTION: "No timestamp in files."
INDEX_COLUMN: "-not-available-"
DATETIME_FORMAT: "-not-available-"
SHOWS_START_MIDDLE_OR_END_OF_RECORD: "-not-available-"

DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2 ]
SKIP_ROWS: [ 2 ]
HEADER_ROWS: [ 0, 1 ]
NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "50ms"
DELIMITER: ","
2 changes: 1 addition & 1 deletion diive/configs/filetypes/FLUXNET-CH4-HH-CSV-30MIN.yml
@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2 ]
SKIP_ROWS: [ 0, 1 ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
2 changes: 1 addition & 1 deletion diive/configs/filetypes/FLUXNET-FULLSET-HH-CSV-30MIN.yml
@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "GENERIC-CSV_HEADER-1ROW_TS-END-FULL_1MIN"
+NAME: "GENERIC-CSV-HEADER-1ROW-TS-END-FULL-1MIN"
DESCRIPTION: "Generic CSV format with 1-row header containing variable names and 1-column full timestamp."
TAGS: [ "GENERIC-CSV" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1T"
DELIMITER: ","
@@ -0,0 +1,22 @@
GENERAL:
NAME: "GENERIC-CSV-HEADER-1ROW-TS-MIDDLE-FULL-NS-30MIN"
DESCRIPTION: "Generic CSV format with 1-row header containing variable names and 1-column full timestamp with nanoseconds."
TAGS: [ "GENERIC-CSV" ]

FILE:
EXTENSION: "*.csv"
COMPRESSION: "None"

TIMESTAMP:
DESCRIPTION: "1 column with full timestamp with seconds and nanoseconds"
INDEX_COLUMN: [ 0 ]
DATETIME_FORMAT: "%Y-%m-%d %H:%M:%S.%f"
SHOWS_START_MIDDLE_OR_END_OF_RECORD: "middle"

DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "50ms"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "ICOS_CSV_1MIN"
+NAME: "ICOS-CSV-1MIN"
DESCRIPTION: "Uncompressed (not zipped) ICOS format with 1-row header and ISO timestamp."
TAGS: [ "ICOS" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "ICOS_H1R_CSVZIP_1MIN"
+NAME: "ICOS-H1R-CSVZIP-1MIN"
DESCRIPTION: "Compressed (zipped) ICOS format with 1-row header (variable names) and ISO timestamp."
TAGS: [ "ICOS" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1MIN"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "ICOS_H2R_CSVZIP_10S"
+NAME: "ICOS-H2R-CSVZIP-10S"
DESCRIPTION: "Compressed (zipped) ICOS format with 2-row header (variables, units) and ISO timestamp."
TAGS: [ "ICOS" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "10S"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "ICOS_H2R_CSVZIP_1MIN"
+NAME: "ICOS-H2R-CSVZIP-1MIN"
DESCRIPTION: "Compressed (zipped) ICOS format with 2-row header (variable names and units) and ISO timestamp."
TAGS: [ "ICOS" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1MIN"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "REDDYPROC_30MIN"
+NAME: "REDDYPROC-30MIN"
DESCRIPTION: "Output file from ReddyProc."
TAGS: [ "REDDYPROC" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ "NA", -9999, "inf", "-inf" ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "30T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "TOA5_CSV_10MIN"
+NAME: "TOA5-CSV-10MIN"
DESCRIPTION: "TOA5 format with 4-row header including variable name and units row, and 1-column full timestamp."
TAGS: [ "TOA5" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2, 3 ]
SKIP_ROWS: [ 0, 3 ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "10T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "TOA5_DAT_1MIN_EMPTYUNITS"
+NAME: "TOA5-DAT-1MIN-EMPTYUNITS"
DESCRIPTION: "TOA5 format with 4-row header including variable name and *empty* units row, and 1-column full timestamp."
TAGS: [ "TOA5" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2, 3 ]
SKIP_ROWS: [ 0, 2, 3 ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1T"
DELIMITER: ","
@@ -1,5 +1,5 @@
GENERAL:
-NAME: "TOA5_DAT_1MIN"
+NAME: "TOA5-DAT-1MIN"
DESCRIPTION: "TOA5 format with 4-row header including variable name and units row, and 1-column full timestamp."
TAGS: [ "TOA5" ]

@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0, 1, 2, 3 ]
SKIP_ROWS: [ 0, 3 ]
HEADER_ROWS: [ 0, 1 ]
-NA_VALUES: [ -9999 ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1T"
DELIMITER: ","
2 changes: 1 addition & 1 deletion diive/configs/filetypes/__diive_CSV_10MIN.yml
@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -9999, nan, NaN, NAN, -6999, '-' ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "10T"
DELIMITER: ","
2 changes: 1 addition & 1 deletion diive/configs/filetypes/__diive_CSV_1MIN.yml
@@ -17,6 +17,6 @@ DATA:
HEADER_SECTION_ROWS: [ 0 ]
SKIP_ROWS: [ ]
HEADER_ROWS: [ 0 ]
-NA_VALUES: [ -999, -9999, nan, NaN, NAN, -6999, '-' ]
+NA_VALUES: [ -9999, -6999, -999, "nan", "NaN", "NAN", "NA", "inf", "-inf", "-" ]
FREQUENCY: "1T"
DELIMITER: ","