Training HERON-Compatible ARMAs in RAVEN
One of the principal tools HERON leverages for stochastic technoeconomic analysis is the set of synthetic histories generated by RAVEN. These synthetic histories provide the stochastic boundary conditions for dispatch optimization.
However, HERON has been built around specific ARMA structures. This guide will help you set up your RAVEN workflow to train ARMA ROMs that HERON can use.
The synthetic histories produced by RAVEN and used in HERON currently originate in a trained ARMA ROM from a RAVEN training workflow. Examples of these training workflows can be found in HERON at HERON/tests/integration_tests/ARMA.
Synthetic histories for use in HERON are expected to be three-dimensional:
- Time, or the micro-step time evolution variable (e.g. hourly);
- Cluster, or the clustering identification variable (e.g. 20 clusters to represent a year of data);
- Year, or the macro-step time evolution variable (e.g. yearly).
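To make the three axes concrete, here is a minimal plain-Python sketch of a synthetic history indexed as history[year][cluster][time]. It is only an illustration under assumed sizes (two years, 20 clusters, 24 hourly steps per cluster); it does not reproduce HERON's or RAVEN's actual data objects, and the nesting order is chosen for convenience.

```python
import math

# Assumed sizes for illustration only: two macro steps (years), 20 clusters,
# and 24 hourly micro steps per cluster.
YEARS = [2020, 2025]
N_CLUSTERS = 20
N_HOURS = 24

def make_history(years, n_clusters, n_hours):
    """Build a nested stand-in: history[year][cluster][time] -> signal value."""
    history = {}
    for year in years:
        history[year] = [
            [1.0 + 0.1 * math.sin(2 * math.pi * t / n_hours + c) for t in range(n_hours)]
            for c in range(n_clusters)
        ]
    return history

def shape(history):
    """Return (n_years, n_clusters, n_time) for the nested structure."""
    first_year = next(iter(history.values()))
    return (len(history), len(first_year), len(first_year[0]))

history = make_history(YEARS, N_CLUSTERS, N_HOURS)
print(shape(history))  # (2, 20, 24)
```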
Both Time and Year can be renamed in the HERON input in the <Case> node. In this guide, we will refer to them as Time and Year for convenience.
This structure comes from an ARMA ROM that is trained using the <Segment grouping='interpolate'> node. For example, in the RAVEN workflow for training the ROM (based on train_sine.xml):
<Models>
<ROM name="arma" subType="ARMA">
<Target>Signal, Time</Target>
<Features>scaling</Features>
<pivotParameter>Time</pivotParameter>
<P>0</P>
<Q>0</Q>
<Fourier>10</Fourier>
<Segment grouping='interpolate'> <!-- note this node specifically -->
<macroParameter>Year</macroParameter>
<Classifier class='Models' type='PostProcessor'>classifier</Classifier>
<subspace divisions='365'>Time</subspace>
</Segment>
<reseedCopies>False</reseedCopies>
<seed>42</seed>
</ROM>
</Models>
The <Segment> node above divides each year of training data into 365 divisions for clustering. Note also that a <Classifier> is specified; it determines how the divisions will be clustered for analysis. The classifier for this case is as follows:
<PostProcessor name="classifier" subType="DataMining">
<KDD labelFeature="labels" lib="SciKitLearn">
<Features>Signal</Features>
<SKLtype>cluster|KMeans</SKLtype>
<n_clusters>20</n_clusters> <!-- note this node specifically -->
</KDD>
</PostProcessor>
Note that the <n_clusters> node requests 20 clusters for our data. Other clustering strategies can be used and are enumerated in the RAVEN documentation.
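Conceptually, the segmenting and clustering above can be sketched in plain Python as follows. This is a simplified illustration, not RAVEN's implementation: RAVEN hands the clustering to scikit-learn's KMeans (SKLtype cluster|KMeans), whereas this sketch uses a hand-written Lloyd's k-means and clusters the raw segment values directly; RAVEN's own choice of clustering features is not reproduced. The signal is hypothetical: 8760 hourly points, so 365 divisions yield 24-hour segments.

```python
import math
import random

def split_into_segments(signal, divisions):
    """Split a 1-D signal into `divisions` equal-length segments."""
    if len(signal) % divisions != 0:
        raise ValueError("signal length must be divisible by divisions")
    size = len(signal) // divisions
    return [signal[i * size:(i + 1) * size] for i in range(divisions)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(vectors, n_clusters, iterations=10, seed=42):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each segment goes to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its member segments.
        for c in range(n_clusters):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical hourly year (8760 = 365 x 24) with daily and annual cycles.
signal = [1.0 + 0.3 * math.sin(2 * math.pi * h / 24) + 0.5 * math.sin(2 * math.pi * h / 8760)
          for h in range(8760)]

segments = split_into_segments(signal, 365)  # analogous to <subspace divisions='365'>
labels = kmeans_labels(segments, 20)         # analogous to <n_clusters>20</n_clusters>
single = kmeans_labels(split_into_segments(signal, 1), 1)  # one division, one cluster
print(len(segments), len(segments[0]), single)
```

With one division and one cluster, every segment trivially shares label 0, which is why that combination behaves like no clustering at all.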
The input CSV for training an interpolated ARMA is a RAVEN-style HistorySet CSV, which consists of a "header" CSV with corresponding "auxiliary" CSVs. For example, for head.csv:
scaling,Year,filename
1,2020,data_2020.csv
1,2025,data_2025.csv
and corresponding data_2020.csv:
Time,Signal
0,1.01
1,1.005
...
and similarly for data_2025.csv. Note that head.csv associates each Year with the file containing the signal data for that year. RAVEN will train ARMA ROMs directly on 2020 and 2025 and interpolate for the years between them. More years can be included, and RAVEN will interpolate between each pair of adjacent training years. See the RAVEN manual for more information.
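If you generate training data from scripts, the HistorySet layout can be written with Python's standard csv module. The sketch below is one way to do it, assuming the column names from the example (scaling, Year, filename, Time, Signal); the helper names are hypothetical, and the 2025 signal values are made up for illustration. The reader function follows the filename column the same way the header/auxiliary relationship is described above.

```python
import csv
import os
import tempfile

def write_history_set(directory, years_to_signal, scaling=1):
    """Write head.csv plus one data_<Year>.csv per year (HistorySet layout)."""
    head_path = os.path.join(directory, "head.csv")
    with open(head_path, "w", newline="") as head:
        writer = csv.writer(head)
        writer.writerow(["scaling", "Year", "filename"])
        for year, rows in sorted(years_to_signal.items()):
            filename = f"data_{year}.csv"
            writer.writerow([scaling, year, filename])
            # Each auxiliary CSV holds the Time/Signal history for one Year.
            with open(os.path.join(directory, filename), "w", newline="") as data:
                data_writer = csv.writer(data)
                data_writer.writerow(["Time", "Signal"])
                data_writer.writerows(rows)
    return head_path

def read_history_set(head_path):
    """Follow the filename column: return {Year: [(Time, Signal), ...]}."""
    directory = os.path.dirname(head_path)
    histories = {}
    with open(head_path, newline="") as head:
        for entry in csv.DictReader(head):
            with open(os.path.join(directory, entry["filename"]), newline="") as data:
                histories[int(entry["Year"])] = [
                    (int(row["Time"]), float(row["Signal"])) for row in csv.DictReader(data)
                ]
    return histories

with tempfile.TemporaryDirectory() as tmp:
    # 2020 values follow the example above; 2025 values are made up.
    head_path = write_history_set(tmp, {
        2020: [(0, 1.01), (1, 1.005)],
        2025: [(0, 1.02), (1, 1.012)],
    })
    with open(head_path) as f:
        head_text = f.read()
    loaded = read_history_set(head_path)

print(head_text)
print(loaded[2020])
```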
What if you only want to run a single year without clustering? It is possible to "trick" the RAVEN ARMA into doing that with some minor modifications to the training data and the RAVEN training input file.
First, we assume you have a header CSV (for example, head.csv) that looks something like:
scaling,filename
1,mydata.csv
Modify this by inserting a Year column with whatever value you want, for example 1:
scaling,Year,filename
1,1,mydata.csv
Then copy the first entry after the header and paste it at the end, changing the Year value to the next integer:
scaling,Year,filename
1,1,mydata.csv
1,2,mydata.csv
Note that the filename stays the same; this convinces RAVEN that the two years are identical while still giving you the Year structure we're looking for.
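The two header edits can be scripted. Below is a minimal sketch using Python's csv module, assuming a header CSV with a filename column as in the example; the function name is hypothetical. It inserts a Year column just before filename and duplicates the first data row with the next Year value, keeping the same filename.

```python
import csv
import io

def make_fake_multiyear_header(header_text, first_year=1):
    """Insert a Year column and duplicate the first row as the following year."""
    reader = csv.DictReader(io.StringIO(header_text))
    rows = list(reader)
    if not rows:
        raise ValueError("header CSV has no data rows")
    # Keep the existing columns, placing Year just before filename.
    fieldnames = [name for name in reader.fieldnames if name != "filename"] + ["Year", "filename"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    first = rows[0]
    # Same filename for both years, so RAVEN sees two identical years.
    writer.writerow({**first, "Year": first_year})
    writer.writerow({**first, "Year": first_year + 1})
    return out.getvalue()

original = "scaling,filename\n1,mydata.csv\n"
print(make_fake_multiyear_header(original))
```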
We assume you already have a ROM training input for this data. It will need to be extended to include interpolated segmenting and clustering; however, just as we tricked RAVEN into seeing multiple years, we will also trick it into using a single cluster, which is equivalent to having no clustering. For example, adapting the train_sine.xml example:
<ROM name="arma" subType="ARMA">
<Target>Signal, Time</Target>
<Features>scaling</Features>
<pivotParameter>Time</pivotParameter>
<P>0</P>
<Q>0</Q>
<Fourier>10</Fourier>
<Segment grouping='interpolate'>
<macroParameter>Year</macroParameter>
<Classifier class='Models' type='PostProcessor'>classifier</Classifier>
<subspace divisions='1'>Time</subspace>
</Segment>
<reseedCopies>False</reseedCopies>
<seed>42</seed>
</ROM>
<PostProcessor name="classifier" subType="DataMining">
<KDD labelFeature="labels" lib="SciKitLearn">
<Features>Signal</Features>
<SKLtype>cluster|KMeans</SKLtype>
<n_clusters>1</n_clusters>
</KDD>
</PostProcessor>
Note that we use divisions='1' and <n_clusters>1</n_clusters> to get the clustering format correct. The trained ARMA will now have the (Time, Cluster, Year) structure HERON expects. Note that <ProjectTime> in HERON should still be at least 2 to run without issues.
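If you maintain several training inputs, the single-cluster edits can be applied with Python's standard xml.etree.ElementTree. This is a sketch only: the XML string is a hypothetical fragment trimmed down to the nodes being edited from the ROM and classifier snippets above, wrapped in <Models> so it has a single root. A real input file would be loaded with ET.parse instead and written back with tree.write.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mirroring the ROM and classifier snippets above,
# trimmed to the nodes that the single-cluster trick changes.
TRAINING_XML = """
<Models>
  <ROM name="arma" subType="ARMA">
    <Segment grouping='interpolate'>
      <macroParameter>Year</macroParameter>
      <Classifier class='Models' type='PostProcessor'>classifier</Classifier>
      <subspace divisions='365'>Time</subspace>
    </Segment>
  </ROM>
  <PostProcessor name="classifier" subType="DataMining">
    <KDD labelFeature="labels" lib="SciKitLearn">
      <n_clusters>20</n_clusters>
    </KDD>
  </PostProcessor>
</Models>
"""

def make_single_cluster(root):
    """Set every <subspace> to one division and every <n_clusters> to 1."""
    for subspace in root.iter("subspace"):
        subspace.set("divisions", "1")
    for n_clusters in root.iter("n_clusters"):
        n_clusters.text = "1"
    return root

root = make_single_cluster(ET.fromstring(TRAINING_XML.strip()))
print(ET.tostring(root, encoding="unicode"))
```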