Merge pull request #50 from databricks-industry-solutions/foundation-model-notebooks

Added example notebooks for time series foundation models
ryuta-yoshimatsu authored Jun 4, 2024
2 parents 54fbdcc + d43969c commit 44967fb
Showing 16 changed files with 805 additions and 20 deletions.
README.md: 20 changes (10 additions, 10 deletions)
@@ -18,7 +18,7 @@ To run this solution on a public [M4](https://www.kaggle.com/datasets/yogesh94/m

Local models are used to model individual time series. We support models from [statsforecast](https://github.com/Nixtla/statsforecast), [R fable](https://cran.r-project.org/web/packages/fable/vignettes/fable.html) and [sktime](https://www.sktime.net/en/stable/). Covariates (i.e. exogenous regressors) are currently only supported for some statsforecast models.
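The "one model per series" idea behind local models can be sketched in plain Python. This is an illustration under stated assumptions: the data, the series IDs and the `naive_forecast` helper are hypothetical, and a last-value naive forecast is swapped in for the statsforecast, fable or sktime models MMF actually uses. On a cluster this grouping step is typically distributed (for example with PySpark's `groupBy(...).applyInPandas(...)`); we assume, without confirming it here, that MMF does something similar.

```python
from collections import defaultdict

# Hypothetical long-format data: (series_id, day_index, value).
rows = [
    ("store_1", 0, 10.0), ("store_1", 1, 12.0), ("store_1", 2, 14.0),
    ("store_2", 0, 100.0), ("store_2", 1, 90.0), ("store_2", 2, 80.0),
]

def naive_forecast(history, horizon):
    """Toy local model: repeat the last observed value for every future step."""
    return [history[-1]] * horizon

def forecast_each_series(rows, horizon):
    """Group rows by series, then forecast each series with its own independent model."""
    by_series = defaultdict(list)
    # Sort by (series_id, day_index) so each history is in time order.
    for series_id, _, value in sorted(rows, key=lambda r: (r[0], r[1])):
        by_series[series_id].append(value)
    return {sid: naive_forecast(hist, horizon) for sid, hist in by_series.items()}

print(forecast_each_series(rows, horizon=2))
```

Because no series sees another series' data, each group can be processed on a different worker without coordination.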

-To get started, attach the [notebooks/demo_local_univariate_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_local_univariate_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/14.3lts-ml.html) or a later runtime. The cluster can be either a single-node or multi-node CPU cluster. Make sure to set the following [Spark configurations](https://spark.apache.org/docs/latest/configuration.html) on the cluster before you start using MMF: ```spark.sql.execution.arrow.enabled true``` and ```spark.sql.adaptive.enabled false``` (more detailed explanation to follow).
+To get started, attach the [examples/local_univariate_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/local_univariate_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/14.3lts-ml.html) or a later runtime. The cluster can be either a single-node or multi-node CPU cluster. Make sure to set the following [Spark configurations](https://spark.apache.org/docs/latest/configuration.html) on the cluster before you start using MMF: ```spark.sql.execution.arrow.enabled true``` and ```spark.sql.adaptive.enabled false``` (more detailed explanation to follow).

In this notebook, we will apply 20+ models to 100 time series. You can specify the models to use in a list:

@@ -103,13 +103,13 @@ To modify the model hyperparameters, directly change the values in [mmf_sa/model

MMF is fully integrated with MLflow, so once the training kicks off, the experiments will be visible in the MLflow Tracking UI with the corresponding metrics and parameters (note that we do not log every local model to MLflow; instead, we store the model binaries in the ```evaluation_output``` and ```scoring_output``` tables). The metric you see in the MLflow Tracking UI is a simple mean over all backtesting trials across all time series.
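To make "a simple mean over backtesting trials over all time series" concrete, here is a minimal sketch. The error values and series names are hypothetical, and the choice of error metric is left open; the point is only that every (series, trial) score is pooled and averaged once, with no weighting.

```python
from statistics import mean

# Hypothetical backtest results: one error value per (time series, backtest trial).
backtest_errors = {
    "series_1": [10.0, 20.0, 30.0],
    "series_2": [40.0, 50.0, 60.0],
}

def aggregate_metric(errors_by_series):
    """Flatten every (series, trial) error and take one unweighted mean."""
    all_trials = [e for trials in errors_by_series.values() for e in trials]
    return mean(all_trials)

print(aggregate_metric(backtest_errors))
```

When every series has the same number of backtesting trials, this pooled mean equals the mean of the per-series means; if trial counts differ, series with more trials carry more weight.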

-Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in [notebooks/demo_local_univariate_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_local_univariate_monthly.py) and [notebooks/demo_local_univariate_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_local_univariate_external_regressors_daily.py), respectively.
+Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in [examples/local_univariate_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/local_univariate_monthly.py) and [examples/local_univariate_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/local_univariate_external_regressors_daily.py), respectively.

### Global Models

Global models leverage patterns across multiple time series, enabling shared learning and improved predictions for each series. You typically train a single large model on many or all of the time series. We support deep-learning-based models from [neuralforecast](https://nixtlaverse.nixtla.io/neuralforecast/index.html). Covariates (i.e. exogenous regressors) and hyperparameter tuning are both supported.
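The difference from local models is that a global model's parameters are estimated from all series at once. The sketch below shows only that pooling idea: it swaps in a one-parameter AR(1) model (no intercept, fitted by least squares) for the deep networks neuralforecast actually provides, and the series values are hypothetical.

```python
# Hypothetical series; a "global" model shares one parameter across all of them.
series = {"a": [1.0, 2.0, 4.0], "b": [3.0, 6.0]}

def fit_global_ar1(all_series):
    """Least-squares AR(1) coefficient (no intercept) from lag pairs pooled across every series."""
    num = den = 0.0
    for values in all_series.values():
        # Each consecutive (previous, current) pair contributes to the shared estimate.
        for prev, curr in zip(values[:-1], values[1:]):
            num += prev * curr
            den += prev * prev
    return num / den

phi = fit_global_ar1(series)
# The single shared coefficient produces a one-step forecast for every series.
next_step = {sid: phi * values[-1] for sid, values in series.items()}
print(phi, next_step)
```

Series "b" has only one lag pair, yet it still gets a forecast driven by patterns learned from "a" as well, which is the benefit of shared learning described above.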

-To get started, attach the [notebooks/demo_global_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_global_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/index.html) or a later runtime. We recommend using a single-node cluster on a multi-GPU instance such as [g4dn.12xlarge [T4]](https://aws.amazon.com/ec2/instance-types/g4/) on AWS or [Standard_NC64as_T4_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/nct4-v3-series) on Azure. Multi-node setups are currently not supported.
+To get started, attach the [examples/global_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/global_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/index.html) or a later runtime. We recommend using a single-node cluster on a multi-GPU instance such as [g4dn.12xlarge [T4]](https://aws.amazon.com/ec2/instance-types/g4/) on AWS or [Standard_NC64as_T4_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/nct4-v3-series) on Azure. Multi-node setups are currently not supported.

You can choose the models to train and put them in a list:

@@ -130,7 +130,7 @@ active_models = [

The models prefixed with "Auto" perform hyperparameter optimization within a specified range (see below for more detail). A comprehensive list of models currently supported by MMF is available in the [models_conf.yaml](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/mmf_sa/models/models_conf.yaml).

-Now, with the following command, we loop through the ```active_models``` list and run [notebooks/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/run_daily.py) for each model, which in turn calls the ```run_forecast``` function. We iterate through the models this way because once a neuralforecast model is loaded into memory, the Python kernel must be restarted before another model can be used.
+Now, with the following command, we loop through the ```active_models``` list and run [examples/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_daily.py) for each model, which in turn calls the ```run_forecast``` function. We iterate through the models this way because once a neuralforecast model is loaded into memory, the Python kernel must be restarted before another model can be used.

```python
for model in active_models:
@@ -140,7 +140,7 @@ for model in active_models:
arguments={"catalog": catalog, "db": db, "model": model})
```
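The loop above gives each model its own separate run (with `catalog`, `db` and `model` passed as arguments), so no two neuralforecast models share one Python process. Outside Databricks, a comparable isolation can be approximated by launching a fresh interpreter per model with the standard `subprocess` module. This is a swapped-in technique, not MMF code: the model names are hypothetical, and the inline `WORKER` script is a stand-in for run_daily.py that only echoes its arguments instead of calling `run_forecast`.

```python
import json
import subprocess
import sys

# Hypothetical model names and arguments, for illustration only.
active_models = ["ModelA", "ModelB"]
catalog, db = "my_catalog", "my_db"

# Stand-in for run_daily.py: a real worker would call run_forecast here.
WORKER = """
import json, sys
args = json.loads(sys.argv[1])
print(f"forecasting with {args['model']} in {args['catalog']}.{args['db']}")
"""

def run_in_fresh_process(model: str) -> str:
    """Run one model in a brand-new Python interpreter; its memory is released when it exits."""
    payload = json.dumps({"catalog": catalog, "db": db, "model": model})
    result = subprocess.run(
        [sys.executable, "-c", WORKER, payload],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

outputs = [run_in_fresh_process(m) for m in active_models]
for line in outputs:
    print(line)
```

Running models sequentially in child processes trades some startup overhead for a clean memory state per model, which is the same trade-off the per-model loop makes.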

-Inside the [notebooks/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/run_daily.py), we have the ```run_forecast``` function specified as:
+Inside the [examples/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_daily.py), we have the ```run_forecast``` function specified as:

```python
run_forecast(
@@ -178,13 +178,13 @@ To modify the model hyperparameters or reset the range of the hyperparameter opt

MMF is fully integrated with MLflow, so once the training kicks off, the experiments will be visible in the MLflow Tracking UI with the corresponding metrics and parameters. Once training is complete, the models are logged to MLflow and registered to Unity Catalog.

-Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in [notebooks/demo_global_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_global_monthly.py) and [notebooks/demo_global_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_global_external_regressors_daily.py) respectively.
+Other example notebooks for monthly forecasting and forecasting with exogenous regressors can be found in [examples/global_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/global_monthly.py) and [examples/global_external_regressors_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/global_external_regressors_daily.py) respectively.

### Foundation Models

Foundation time series models are large transformer-based models pretrained on millions or billions of time series. These models can perform analyses (e.g. forecasting, anomaly detection, classification) on previously unseen time series without training or tuning. We support open source models from multiple sources: [chronos](https://github.com/amazon-science/chronos-forecasting), [moirai](https://blog.salesforceairesearch.com/moirai/), and [moment](https://github.com/moment-timeseries-foundation-model/moment). Covariates (i.e. exogenous regressors) and fine-tuning are not yet supported. This is a rapidly changing field, and we are working on updating the supported models and features as the field evolves.

-To get started, attach the [notebooks/demo_foundation_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_foundation_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/index.html) or a later runtime. We recommend using a single-node cluster on a multi-GPU instance such as [g4dn.12xlarge [T4]](https://aws.amazon.com/ec2/instance-types/g4/) on AWS or [Standard_NC64as_T4_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/nct4-v3-series) on Azure. Multi-node setups are currently not supported.
+To get started, attach the [examples/foundation_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/foundation_daily.py) notebook to a cluster running [DBR 14.3 ML](https://docs.databricks.com/en/release-notes/runtime/index.html) or a later runtime. We recommend using a single-node cluster on a multi-GPU instance such as [g4dn.12xlarge [T4]](https://aws.amazon.com/ec2/instance-types/g4/) on AWS or [Standard_NC64as_T4_v3](https://learn.microsoft.com/en-us/azure/virtual-machines/nct4-v3-series) on Azure. Multi-node setups are currently not supported.

You can choose the models you want to evaluate and forecast by specifying them in a list:

@@ -204,7 +204,7 @@ active_models = [

A comprehensive list of models currently supported by MMF is available in the [models_conf.yaml](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/mmf_sa/models/models_conf.yaml).

-Now, with the following command, we run [notebooks/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/run_daily.py) for each model, which calls the ```run_forecast``` function. We loop through the ```active_models``` list for the same reason explained in the Global Models section.
+Now, with the following command, we run [examples/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_daily.py) for each model, which calls the ```run_forecast``` function. We loop through the ```active_models``` list for the same reason explained in the Global Models section.

```python
for model in active_models:
@@ -214,13 +214,13 @@ for model in active_models:
arguments={"catalog": catalog, "db": db, "model": model})
```

-Inside the [notebooks/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/run_daily.py), we have the same ```run_forecast``` function as above.
+Inside the [examples/run_daily.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/run_daily.py), we have the same ```run_forecast``` function as above.

To modify the model hyperparameters, directly change the values in [mmf_sa/models/models_conf.yaml](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/mmf_sa/models/models_conf.yaml) or overwrite these values in [mmf_sa/base_forecasting_conf.yaml](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/mmf_sa/base_forecasting_conf.yaml).
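One way to picture a value in base_forecasting_conf.yaml taking precedence over the default in models_conf.yaml is a recursive dictionary merge where override values win. This is a generic sketch under that assumption, not MMF's actual configuration loader; the keys and values are hypothetical, and plain dicts stand in for the parsed YAML files.

```python
from copy import deepcopy

def merge_conf(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where override values replace defaults, recursing into nested dicts."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key instead of replacing wholesale.
            merged[key] = merge_conf(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged

# Hypothetical entries standing in for models_conf.yaml and base_forecasting_conf.yaml.
models_conf = {"models": {"ExampleModel": {"max_steps": 100, "learning_rate": 0.001}}}
user_conf = {"models": {"ExampleModel": {"max_steps": 200}}}

conf = merge_conf(models_conf, user_conf)
print(conf["models"]["ExampleModel"])
```

Merging recursively means an override only needs to list the keys it changes; untouched hyperparameters keep their defaults, and the default dictionary itself is left unmodified.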

MMF is fully integrated with MLflow, so once the training kicks off, the experiments will be visible in the MLflow Tracking UI with the corresponding metrics and parameters. During evaluation, the models are logged and registered to Unity Catalog.

-An example notebook for monthly forecasting can be found in [notebooks/demo_foundation_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/notebooks/demo_foundation_monthly.py).
+An example notebook for monthly forecasting can be found in [examples/foundation_monthly.py](https://github.com/databricks-industry-solutions/many-model-forecasting/blob/main/examples/foundation_monthly.py).

## Project support
Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects. The source in this project is provided subject to the Databricks License. All included or referenced third party libraries are subject to the licenses set forth below.