timetk 2.6.1.9000 (Development Version)

Improvements

  • summarize_by_time(): Added a .week_start argument to allow specifying .week_start = 1 for a Monday start. The default is 7 for a Sunday start. The default can also be changed by setting the lubridate.week.start option used by lubridate.

  • Plot ACF Diagnostics (plot_acf_diagnostics()): Change default parameter to .show_white_noise_bars = TRUE. #85

  • Time Series CV (time_series_cv()): Add Label for tune_results
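
To illustrate what the week start changes when summarizing by week, here is a minimal base R sketch that floors dates to the first day of their week under both conventions. It is not timetk's implementation, and the helper name floor_week() is hypothetical, used only for this illustration.

```r
# Hypothetical helper: floor a Date to the first day of its week.
# week_start follows the convention above: 1 = Monday, 7 = Sunday.
floor_week <- function(d, week_start = 7) {
  wday <- as.POSIXlt(d)$wday                 # 0 = Sunday, ..., 6 = Saturday
  offset <- (wday - (week_start %% 7)) %% 7  # days elapsed since the week began
  d - offset
}

dates <- as.Date(c("2021-01-06", "2021-01-10"))  # a Wednesday and a Sunday

sunday_start <- floor_week(dates, week_start = 7)
monday_start <- floor_week(dates, week_start = 1)
```

With a Sunday start the two dates fall into different weeks; with a Monday start they fall into the same week, so a weekly summary would group them together.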

Bug Fixes

  • plot_time_series_regression(): Fixed an issue when lags are added to .formula. Pads lags with NA.

  • tk_tbl.zoo(): Fix an issue when readr::type_convert() produces warning messages about not having character columns in inputs. #89

timetk 2.6.1

Improvements

  • tk_augment_slidify(), tk_augment_lags(), tk_augment_leads(), tk_augment_differences(): Now works with multiple columns (passed via .value) and tidyselect (e.g. contains()).

Fixes

  • Reduce "New names" messages.
    #> New names:
    #> * NA -> ...1
  • Remove dependency on lazyeval. #24
  • Fix deprecated functions: select_() used with tk_xts_(). #52

timetk 2.6.0

New Functions

  • filter_period() (#64): Applies filtering expressions within time-based periods (windows).
  • slice_period() (#64): Applies slices within time-based periods (windows).
  • condense_period() (#64): Converts a periodicity from a higher (e.g. daily) to lower (e.g. monthly) frequency. Similar to xts::to.period() and tibbletime::as_period().
  • tk_augment_leads() and lead_vec() (#65): Added to make creating leads easier and more obvious.

Fixes

  • time_series_cv(): Fix bug with Panel Data. Train/Test Splits only returning 1st observation in final time stamp. Should return all observations.
  • future_frame() and tk_make_future_timeseries(): Now sort the incoming index to ensure dates returned go into the future.
  • tk_augment_lags() and tk_augment_slidify(): Now overwrite column names to match the behavior of tk_augment_fourier() and tk_augment_differences().

timetk 2.5.0

Improvements

  • time_series_cv(): Now works with time series groups. This is great for working with panel data.
  • future_frame(): Gets a new argument called .bind_data. When set to TRUE, it performs a data binding operation with the incoming data and the future frame.

Miscellaneous

  • Tune startup messages (#63)

timetk 2.4.0

  • step_slidify_augment() - A variant of step_slidify() that adds multiple rolling columns inside of a recipe.

Bug Fixes

  • Add warning when %+time% and %-time% return missing values
  • Fix issues with tk_make_timeseries() and tk_make_future_timeseries() providing odd results for regular time series. GitHub Issue 60

timetk 2.3.0

New Functionality

  • tk_time_series_cv_plan() - Now works with k-fold cross validation objects from vfold_cv() function.

  • pad_by_time() - Added new argument .fill_na_direction to specify a tidyr::fill() strategy for filling missing data.

Bug Fixes

  • Augment functions (e.g. tk_augment_lags()) - Fix bug with grouped functions not being exported
  • Vectorized Functions - Compatibility with ts class

timetk 2.2.1

New Functions

  • step_log_interval_vec() - Extends the log_interval_vec() for recipes preprocessing.

Parallel Processing

  • Parallel backend for use with tune and recipes

Bug Fixes

  • log_interval_vec() - Correct the messaging
  • complement.ts_cv_split - Helper to show time series cross validation splits in list explorer.

timetk 2.2.0

New Functions

  • mutate_by_time(): For applying mutates by time windows
  • log_interval_vec() & log_interval_inv_vec(): For constrained interval forecasting.

Improvements

  • plot_acf_diagnostics(): A new argument, .show_white_noise_bars for adding white noise bars to an ACF / PACF Plot.
  • pad_by_time(): New arguments .start_date and .end_date for expanding/contracting the padding windows.

timetk 2.1.0

New Functions

  • plot_time_series_regression(): Convenience function to visualize & explore features using Linear Regression (stats::lm() formula).
  • time_series_split(): A convenient way to return a single split from time_series_cv(). Returns the split in the same format as rsample::initial_time_split().

Improvements

  • Auto-detect date and date-time: Affects summarise_by_time(), filter_by_time(), tk_summary_diagnostics()
  • tk_time_series_cv_plan(): Allow a single resample from rsample::initial_time_split or timetk::time_series_split
  • Updated Vignette: The vignette, "Forecasting Using the Time Series Signature", has been updated with modeltime and tidymodels.

Plotting Improvements

  • All plotting functions now support Tab Completion (a minor breaking change was needed to do so, see breaking changes below)
  • plot_time_series():
    • Add .legend_show to toggle on/off legends.
    • Permit numeric index (fix issue with smoother failing)

Breaking Changes

  • Tab Completion: Replace ... with .facet_vars or .ccf_vars. This change is needed to improve tab-completion. It affects:
    • plot_time_series()
    • plot_acf_diagnostics()
    • plot_anomaly_diagnostics()
    • plot_seasonal_diagnostics()
    • plot_stl_diagnostics()

Bug Fixes

  • fourier_vec() and step_fourier_vec(): Add error if observations have zero difference. Issue #40.

timetk 2.0.0

New Interactive Plotting Functions

  • plot_anomaly_diagnostics(): Visualize Anomalies for One or More Time Series

New Data Wrangling Functions

  • future_frame(): Make a future tibble from an existing time-based tibble.

New Diagnostic / Data Processing Functions

  • tk_anomaly_diagnostics() - Group-wise anomaly detection and diagnostics. A wrapper for the anomalize R package functions without importing anomalize.

New Vectorized Functions:

  • ts_clean_vec() - Replace Outliers & Missing Values in a Time Series
  • standardize_vec() - Centers and scales a time series to mean 0, standard deviation 1
  • normalize_vec() - Normalizes a time series to Range: (0, 1)
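
The two rescaling operations described above can be sketched in base R. This only illustrates the arithmetic (centering and scaling, and min-max rescaling); it is not timetk's source code, and the variable names are illustrative.

```r
x <- c(10, 20, 30, 40, 50)

# Standardize: subtract the mean and divide by the standard deviation,
# giving a series with mean 0 and standard deviation 1.
standardized <- (x - mean(x)) / sd(x)

# Normalize: min-max rescaling, so the smallest value maps to 0
# and the largest value maps to 1.
normalized <- (x - min(x)) / (max(x) - min(x))
```

Both transformations are invertible as long as the original mean and standard deviation (or minimum and maximum) are kept.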

New Recipes Preprocessing Steps:

  • step_ts_pad() - Preprocessing for padding time series data. Adds rows to fill in gaps and can be used with step_ts_impute() to interpolate going from low to high frequency!
  • step_ts_clean() - Preprocessing step for cleaning outliers and imputing missing values in a time series.

New Parsing Functions

  • parse_date2() and parse_datetime2(): These are similar to readr::parse_date() and lubridate::as_date() in that they parse character vectors to dates and date-times. The key advantage is SPEED. parse_date2() uses the anytime package, which parses using the C++ Boost.Date_Time library.

Improvements:

  • plot_acf_diagnostics(): The .lags argument now handles time-based phrases (e.g. .lags = "1 month").
  • time_series_cv(): Implements time-based phrases (e.g. initial = "5 years" and assess = "1 year")
  • tk_make_future_timeseries(): The n_future argument has been deprecated for a new length_out argument that accepts both numeric input (e.g. length_out = 12) and time-based phrases (e.g. length_out = "12 months"). A major improvement is that numeric values define the number of timestamps returned even if weekends or holidays are removed. Thus, you can always anticipate the length. (Issue #19).
  • diff_vec(): Now reports the initial values used in the differencing calculation.
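
The fixed-length guarantee of a numeric length_out can be pictured with a small base R sketch: keep stepping forward until the requested number of timestamps has been collected, skipping weekends along the way. next_weekdays() is a hypothetical helper, not part of timetk, and it ignores holidays.

```r
# Hypothetical helper: return the next `n` weekdays after `last_date`.
next_weekdays <- function(last_date, n) {
  out <- as.Date(character(0))
  d <- last_date
  while (length(out) < n) {
    d <- d + 1
    is_weekend <- as.POSIXlt(d)$wday %in% c(0, 6)  # 0 = Sunday, 6 = Saturday
    if (!is_weekend) out <- c(out, d)
  }
  out
}

# 2021-01-08 is a Friday, so the weekend that follows is skipped.
future_idx <- next_weekdays(as.Date("2021-01-08"), n = 5)
```

The result always has exactly n elements, which is the behavior the entry above describes: removing weekends shifts the dates further out instead of shortening the sequence.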

Bug Fixes:

  • plot_time_series():
    • Fix name collision when .value = .value.
  • tk_make_future_timeseries():
    • Respect timezones
  • time_series_cv():
    • Fix incorrect calculation of starts/stops
    • Make skip = 1 default. skip = 0 does not make sense.
    • Fix issue with skip adding 1 to stops.
    • Fix printing method
  • plot_time_series_cv_plan() & tk_time_series_cv_plan():
    • Prevent name collisions when underlying data has column "id" or "splits"
  • tk_make_future_timeseries():
    • Fix bug when day of month doesn't exist. Lubridate period() returns NA. Fix implemented with ceiling_date().
  • pad_by_time():
    • Fix pad_value so that it only inserts pad values where a new row was inserted.
  • step_ts_clean(), step_ts_impute():
    • Fix issue with lambda = NULL

Breaking Changes:

These should not be of major impact since the 1.0.0 version was just released.

  • Renamed impute_ts_vec() to ts_impute_vec() for consistency with ts_clean_vec()
  • Renamed step_impute_ts() to step_ts_impute() for consistency with underlying function
  • Renamed roll_apply_vec() to slidify_vec() for consistency with slidify() & relationship to slider R package
  • Renamed step_roll_apply() to step_slidify() for consistency with slidify() & relationship to slider R package
  • Renamed tk_augment_roll_apply() to tk_augment_slidify() for consistency with slidify() & relationship to slider R package
  • plot_time_series_cv_plan() and tk_time_series_cv_plan(): Changed argument from .rset to .data.

timetk 1.0.0

New Interactive Plotting Functions:

  • plot_time_series() - A workhorse time-series plotting function that generates interactive plotly plots, consolidates 20+ lines of ggplot2 code, and scales well to many time series using dplyr groups.
  • plot_acf_diagnostics() - Visualize the ACF, PACF, and any number of CCFs in one plot for Multiple Time Series. Interactive plotly by default.
  • plot_seasonal_diagnostics() - Visualize Multiple Seasonality Features for One or More Time Series. Interactive plotly by default.
  • plot_stl_diagnostics() - Visualize STL Decomposition Features for One or More Time Series.
  • plot_time_series_cv_plan() - Visualize the Time Series Cross Validation plan made with time_series_cv().

New Time Series Data Wrangling:

  • summarise_by_time() - A time-based variant of dplyr::summarise() for flexible summarization using common time-based criteria.
  • filter_by_time() - A time-based variant of dplyr::filter() for flexible filtering by time-ranges.
  • pad_by_time() - Insert time series rows with regularly spaced timestamps.
  • slidify() - Make any function a rolling / sliding function.
  • between_time() - A time-based variant of dplyr::between() for flexible time-range detection.
  • add_time() - Addition for a time series index. Shifts an index by a period.
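
As an illustration of what padding means, the base R sketch below fills gaps in a daily series by joining the data onto a complete date sequence. It shows the idea behind pad_by_time() only; it is not how timetk implements it, and the data frame is made up for the example.

```r
# A daily series with a two-day gap (Jan 3 and Jan 4 are missing).
df <- data.frame(
  date  = as.Date(c("2021-01-01", "2021-01-02", "2021-01-05")),
  value = c(1, 2, 5)
)

# Build the complete, regularly spaced index, then left-join the data onto it.
full_index <- data.frame(date = seq(min(df$date), max(df$date), by = "day"))
padded <- merge(full_index, df, by = "date", all.x = TRUE)
```

The inserted rows carry NA in the value column, which is where a pad value or a fill strategy would then apply.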

New Recipe Functions:

Feature Generators:

  • step_holiday_signature() - New recipe step for adding 130 holiday features based on individual holidays, locales, and stock exchanges / business holidays.
  • step_fourier() - New recipe step for adding Fourier series terms as seasonal features to time series data
  • step_roll_apply() - New recipe step for adding rolling summary functions. Similar to recipes::step_window() but is more flexible by enabling application of any summary function.
  • step_smooth() - New recipe step for adding Local Polynomial Regression (LOESS) for smoothing noisy time series
  • step_diff() - New recipe step for adding multiple differenced columns. Similar to recipes::step_lag().
  • step_box_cox() - New recipe step for transforming predictors. Similar to step_BoxCox() with improvements for forecasting including "guerrero" method for lambda selection and handling of negative data.
  • step_impute_ts() - New recipe step for imputing a time series.

New Rsample Functions

  • time_series_cv() - Create rsample cross validation sets for time series. This function produces a sampling plan starting with the most recent time series observations, rolling backwards.
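
The "rolling backwards" sampling plan can be sketched with plain row indices. The names initial, assess, and skip mirror the arguments mentioned elsewhere in this changelog, but the index arithmetic below is an assumption made for illustration and is not taken from timetk's source.

```r
n       <- 20  # number of observations, ordered oldest to newest
initial <- 10  # training window size
assess  <- 3   # testing window size
skip    <- 3   # how far each successive split moves back in time

# The first split ends at the most recent observation; later splits step back.
test_ends <- seq(n, initial + assess, by = -skip)

splits <- lapply(test_ends, function(end) {
  list(
    train = (end - assess - initial + 1):(end - assess),
    test  = (end - assess + 1):end
  )
})
```

Here the first split tests on the three newest rows, and each later split shifts both windows three rows into the past until there is no longer room for a full training window.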

New Vector Functions:

These functions are useful on their own inside of mutate() and power many of the new plotting and recipes functions.

  • roll_apply_vec() - Vectorized rolling apply function - wraps slider::slide_vec()
  • smooth_vec() - Vectorized smoothing function - Applies Local Polynomial Regression (LOESS)
  • diff_vec() and diff_inv_vec() - Vectorized differencing function. Pads NA's by default (unlike base::diff()).
  • lag_vec() - Vectorized lag functions. Returns both lags and leads (negative lags) by adjusting the .lag argument.
  • box_cox_vec(), box_cox_inv_vec(), & auto_lambda() - Vectorized Box Cox transformation. Leverages forecast::BoxCox.lambda() for automatic lambda selection.
  • fourier_vec() - Vectorized Fourier Series calculation.
  • impute_ts_vec() - Vectorized imputation of missing values for time series. Leverages forecast::na.interp().
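
The NA padding and the lag/lead behavior described above matter inside mutate(), where the result must have the same length as the input column. The base R sketch below shows the shapes involved; it illustrates the behavior only and does not call timetk.

```r
x <- c(1, 3, 6, 10, 15)

# Base R differencing drops the first element, so the result is shorter than x.
raw_diff <- diff(x)

# Padding with a leading NA keeps the result aligned with x.
padded_diff <- c(NA, diff(x))

# A lag shifts values forward in time; a lead (negative lag) shifts them back.
lag_1  <- c(NA, head(x, -1))
lead_1 <- c(tail(x, -1), NA)
```

Because the padded versions have the same length as x, they can be assigned directly as new columns in a data frame.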

New Augment Functions:

All of the functions are designed for scale. They respect dplyr::group_by().

  • tk_augment_holiday_signature() - Add holiday features to a data.frame using only a time-series index.
  • tk_augment_roll_apply() - Add multiple columns of rolling window calculations to a data.frame.
  • tk_augment_differences() - Add multiple columns of differences to a data.frame.
  • tk_augment_lags() - Add multiple columns of lags to a data.frame.
  • tk_augment_fourier() - Add multiple columns of fourier series to a data.frame.

New Make Functions:

Make date and date-time sequences between start and end dates.

  • tk_make_timeseries() - Super flexible function for creating daily and sub-daily time series.
  • tk_make_weekday_sequence() - Weekday sequence that accounts for both stripping weekends and holidays
  • tk_make_holiday_sequence() - Makes a sequence of dates corresponding to business holidays in calendars from timeDate (common non-working days)
  • tk_make_weekend_sequence() - Weekend sequence of dates for Saturday and Sunday (common non-working days)

New Get Functions:

  • tk_get_holiday_signature() - Get 100+ holiday features using only a time-series index.
  • tk_get_frequency() and tk_get_trend() - Automatic frequency and trend calculation from a time series index.

New Diagnostic / Data Processing Functions

  • tk_summary_diagnostics() - Group-wise time series summary.
  • tk_acf_diagnostics() - The data preparation function for plot_acf_diagnostics()
  • tk_seasonal_diagnostics() - The data preparation function for plot_seasonal_diagnostics()
  • tk_stl_diagnostics() - Group-wise STL Decomposition (Season, Trend, Remainder). Data prep for plot_stl_diagnostics().
  • tk_time_series_cv_plan() - The data preparation function for plot_time_series_cv_plan()

New Datasets

  • M4 Competition - Sample "economic" datasets from hourly, daily, weekly, monthly, quarterly, and yearly.
  • Walmart Recruiting Retail Sales Forecasting Competition - Sample of 7 retail time series
  • Web Traffic Forecasting (Wikipedia) Competition - Sample of 10 website time series
  • Taylor's Energy Demand - Single time series with 30-minute interval of energy demand
  • UCI Bike Sharing Daily - A time series consisting of Capital Bikesharing Transaction Counts and related time-based features.

Improvements:

  • tk_make_future_timeseries() - Now accepts n_future as a time-based phrase like "12 seconds" or "1 year".

Bug Fixes:

  • Don't set timezone on date - Accommodate recent changes to lubridate::tz<- which now returns POSIXct when used on Date objects. Fixed in PR32 by @vspinu.

Potential Breaking Changes:

  • tk_augment_timeseries_signature() - Changed from data to .data to prevent name collisions when piping.

timetk 0.1.3

New Features:

  • recipes Integration - Ability to apply time series feature engineering in the tidymodels machine learning workflow.
    • step_timeseries_signature() - New step_timeseries_signature() for adding date and date-time features.
  • New Vignette - "Time Series Machine Learning" (previously forecasting using the time series signature)

Bug Fixes:

  • xts::indexTZ is deprecated. Use tzone instead.
  • Replace arrange_ with arrange.
  • Fix failing tests due to tidyquant 1.0.0 upgrade (single stocks now return an extra symbol column).

timetk 0.1.2

  • Compatibility with tidyquant v0.5.7 - Removed dependency on tidyverse
  • Dependency cleanup - removed devtools and other unnecessary dependencies.

timetk 0.1.1

  • Added timeSeries to Suggests to satisfy a CRAN issue.

timetk 0.1.0

  • Renamed package timetk. Was formerly timekit.
  • Improvements:
    • Fixed issue with back-ticked date columns
    • Update pkgdown
    • Support for robets