Fix up pipeline vignette
katieb1 committed Oct 11, 2024
1 parent 53a4f36 commit bd71915
Showing 3 changed files with 103 additions and 58 deletions.
28 changes: 14 additions & 14 deletions vignettes/a01_rsyncrosim_vignette_basic.Rmd
@@ -289,9 +289,9 @@ Here, we are viewing the contents of a SyncroSim datasheet as an R data frame. A

### Configure model inputs using `datasheet()` and `addRow()`

-Currently our Inputs scenario datasheet is empty! We will need to add some values to the Inputs datasheet (`InputDatasheet`) so we can run our model.
+Currently our `Inputs` scenario datasheet is empty! We will need to add some values to the `Inputs` datasheet (`InputDatasheet`) so we can run our model.

-First, assign the Inputs datasheet to a new data frame variable.
+First, assign the `Inputs` datasheet to a new data frame variable.

```{r assign input data, warning = FALSE}
# Assign contents of the Inputs datasheet to an R data frame
@@ -306,12 +306,12 @@ Check the columns that need input values and the type of values these columns re
str(myInputDataframe)
```

-The Inputs datasheet requires 2 values:
+The `Inputs` datasheet requires 2 values:

* `m` : the slope of the linear equation.
* `b` : the intercept of the linear equation.
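As an aside, the model these two inputs drive is just the line `y = m * x + b`. A quick plain-R sanity check, independent of SyncroSim (the slope and intercept values below are illustrative, not taken from the vignette):

```r
# Illustrative values only (hypothetical, not from the vignette)
m <- 3   # slope
b <- 10  # intercept
x <- 0:5 # a few time steps
y <- m * x + b
y
#> [1] 10 13 16 19 22 25
```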

-Now, we will update the Inputs data frame. This can be done in many ways (e.g. using the `dplyr` package), but `rsyncrosim` also provides a helper function called `addRow()` for easily adding new rows to R data frames. The `addRow()` function takes the `targetDataframe` as the first value (in this case, our Inputs data frame that we want to update), and the data frame of new rows to append to the input data frame as the second value.
+Now, we will update the `Inputs` data frame. This can be done in many ways (e.g. using the `dplyr` package), but `rsyncrosim` also provides a helper function called `addRow()` for easily adding new rows to R data frames. The `addRow()` function takes the `targetDataframe` as the first value (in this case, our `Inputs` data frame that we want to update), and the data frame of new rows to append to the input data frame as the second value.

```{r add input data, warning = FALSE}
# Create input data and add it to the input data frame
@@ -324,7 +324,7 @@ myInputDataframe

### Saving modifications to datasheets using `saveDatasheet()`

-Now that we have a complete data frame of the Inputs, we will save this data frame to its respective SyncroSim datasheets using the `saveDatasheet()` function. Since this datasheet is scenario-scoped, we will save it at the scenario level by setting `ssimObject = myScenario`.
+Now that we have a complete data frame of the `Inputs`, we will save this data frame to its respective SyncroSim datasheets using the `saveDatasheet()` function. Since this datasheet is scenario-scoped, we will save it at the scenario level by setting `ssimObject = myScenario`.

```{r save input data, warning = FALSE}
# Save Inputs R data frame to a SyncroSim datasheet
@@ -334,9 +334,9 @@ saveDatasheet(ssimObject = myScenario, data = myInputDataframe,

### Configuring the `Pipeline` datasheet

-Next, we need to add data to the Pipeline datasheet. The Pipeline datasheet determines which transformers the scenarios will run and in which order. Use the code below to assign the Pipeline datasheet to a new data frame variable and check the values required by the datasheet.
+Next, we need to add data to the `Pipeline` datasheet. The `Pipeline` datasheet is a built-in SyncroSim datasheet, meaning that it comes with every SyncroSim library regardless of which packages that library uses. The `Pipeline` datasheet determines which transformer stages the scenarios will run and in which order. We use the term "transformers" because these constitute scripts that *transform* input data into output data. Use the code below to assign the `Pipeline` datasheet to a new data frame variable and check the values required by the datasheet.

-```{r assign pipeline data, warning = FALSE}
+```{r assign Pipeline data, warning = FALSE}
# Assign contents of the Pipeline datasheet to an R data frame
myPipeline <- datasheet(myScenario,
name = "core_Pipeline")
@@ -347,13 +347,13 @@ str(myPipeline)

The Pipeline datasheet requires 2 values:

-* `StageNameId` : the pipeline stage (transformer). This column is a factor that has only a single level: "Hello World Time (R)".
+* `StageNameId` : the pipeline transformer stage. This column is a factor that has only a single level: "Hello World Time (R)".
* `RunOrder` : the numerical order in which the stages will be run.

-Below, we use the `addRow()` and `saveDatasheet()` functions to update the Pipeline datasheet with the transformer(s) we want to run and the order in which we want to run them. In this case, there is only a single transformer available from the `helloworldTime` package, called "Hello World Time (R)", so we will add this transformer to the data frame and set the `RunOrder` to `1`.
+Below, we use the `addRow()` and `saveDatasheet()` functions to update the `Pipeline` datasheet with the transformer(s) we want to run and the order in which we want to run them. In this case, there is only a single transformer available from the `helloworldTime` package, called "Hello World Time (R)", so we will add this transformer to the data frame and set the `RunOrder` to `1`.

-```{r add pipeline data, warning = FALSE}
-# Create pipeline data and add it to the pipeline data frame
+```{r add Pipeline data, warning = FALSE}
+# Create Pipeline data and add it to the Pipeline data frame
myPipelineRow <- data.frame(StageNameId = "Hello World Time (R)", RunOrder = 1)
myPipeline <- addRow(myPipeline, myPipelineRow)
@@ -419,7 +419,7 @@ runLog(myResultScenario)

### Result scenarios

-A *result scenario* is generated when a scenario is run, and is an exact copy of the original scenario (i.e. it contains the original scenario's values for all Inputs datasheets). The result scenario is passed to the transformer in order to generate model output, with the results of the transformer's calculations then being added to the result scenario as output datasheets. In this way the result scenario contains both the output of the run and a snapshot record of all the model inputs.
+A *result scenario* is generated when a scenario is run, and is an exact copy of the original scenario (i.e. it contains the original scenario's values for all `Inputs` datasheets). The result scenario is passed to the transformer in order to generate model output, with the results of the transformer's calculations then being added to the result scenario as output datasheets. In this way the result scenario contains both the output of the run and a snapshot record of all the model inputs.

Check out the current scenarios in your library using the `scenario()` function.

@@ -445,7 +445,7 @@ Looking at the `data` column, the `Outputs` does not contain any data in the ori

### Viewing results with `datasheet()`

-The next step is to view the Outputs datasheet in the result scenario that was populated from running the original scenario. We can load the result table using the `datasheet()` function and setting the `name` parameter to the Outputs datasheet.
+The next step is to view the `Outputs` datasheet in the result scenario that was populated from running the original scenario. We can load the result table using the `datasheet()` function and setting the `name` parameter to the `Outputs` datasheet.


```{r view results datasheets, warning = FALSE}
@@ -474,7 +474,7 @@ myNewScenario <- scenario(ssimObject = myProject,
scenario(myLibrary)['Name']
```

-To edit the new scenario, we must first load the contents of the Inputs datasheet and assign it to a new R data frame using the `datasheet()` function. We will set the `empty` argument to `TRUE` so that instead of getting the values from the existing scenario, we can start with an empty data frame again.
+To edit the new scenario, we must first load the contents of the `Inputs` datasheet and assign it to a new R data frame using the `datasheet()` function. We will set the `empty` argument to `TRUE` so that instead of getting the values from the existing scenario, we can start with an empty data frame again.

```{r load input data from new Scenario, warning = FALSE}
# Load empty Inputs datasheets as an R data frame
32 changes: 17 additions & 15 deletions vignettes/a02_rsyncrosim_vignette_uncertainty.Rmd
@@ -140,7 +140,7 @@ View the datasheets associated with your new scenario using the `datasheet()` fu
datasheet(myScenario)
```

-From the list of datasheets above, we can see that there are three datasheets specific to the `helloworldUncertainty` package. Let's view the contents of the Inputs datasheet as an R data frame.
+From the list of datasheets above, we can see that there are three datasheets specific to the `helloworldUncertainty` package. Let's view the contents of the `Inputs` datasheet as an R data frame.

```{r view specific datasheet, warning = FALSE}
# View the contents of the Inputs datasheet for the scenario
@@ -151,7 +151,7 @@ datasheet(myScenario, name = "helloworldUncertainty_InputDatasheet")

**Inputs Datasheet**

-Currently our input scenario datasheet is empty! We need to add some values to our Inputs datasheet (`InputDatasheet`) so we can run our model. First, assign the contents of the Inputs datasheet to a new data frame variable using `datasheet()`, then check the columns that need input values.
+Currently our input scenario datasheet is empty! We need to add some values to our `Inputs` datasheet (`InputDatasheet`) so we can run our model. First, assign the contents of the `Inputs` datasheet to a new data frame variable using `datasheet()`, then check the columns that need input values.

```{r assign input data, warning = FALSE}
# Load the Inputs datasheet to an R data frame
@@ -162,7 +162,7 @@ myInputDataframe <- datasheet(myScenario,
str(myInputDataframe)
```

-The Inputs datasheet requires three values:
+The `Inputs` datasheet requires three values:

* `mMean` : the mean of the slope normal distribution.
* `mSD` : the standard deviation of the slope normal distribution.
@@ -220,30 +220,30 @@ saveDatasheet(ssimObject = myScenario, data = myPipeline,
name = "core_Pipeline")
```

-**RunControl Datasheet**
+**Run Control Datasheet**

-The `RunControl` datasheet provides information about how many time steps and iterations to use in the model. Here, we set the *number of iterations*, as well as the minimum and maximum time steps for our model. The number of iterations we set is equivalent to the number of Monte Carlo realizations, so the greater the number of iterations, the more accurate the range of output values we will obtain. Let's take a look at the columns that need input values.
+The `Run Control` datasheet provides information about how many time steps and iterations to use in the model. Here, we set the *number of iterations*, as well as the minimum and maximum time steps for our model. The number of iterations we set is equivalent to the number of Monte Carlo realizations, so the greater the number of iterations, the more accurate the range of output values we will obtain. Let's take a look at the columns that need input values.

```{r modify run control}
-# Load RunControl datasheet to a new R data frame
+# Load Run Control datasheet to a new R data frame
runSettings <- datasheet(myScenario, name = "helloworldUncertainty_RunControl")
-# Check the columns of the RunControl data frame
+# Check the columns of the Run Control data frame
str(runSettings)
```

-The RunControl datasheet requires the following 3 columns:
+The `Run Control` datasheet requires the following 3 columns:

* `MaximumIteration` : total number of iterations to run the model for.
* `MinimumTimestep` : the starting time point of the simulation.
* `MaximumTimestep` : the end time point of the simulation.

-*Note:* A fourth hidden column, `MinimumIteration`, also exists in the RunControl datasheet (default=1).
+*Note:* A fourth hidden column, `MinimumIteration`, also exists in the `Run Control` datasheet (default=1).

-We'll add this information to an R data frame and then add it to the Run Control data frame using `addRow()`. For this example, we will use only five iterations.
+We'll add this information to an R data frame and then add it to the `Run Control` data frame using `addRow()`. For this example, we will use only five iterations.

```{r}
-# Create run control data and add it to the run control data frame
+# Create Run Control data and add it to the Run Control data frame
runSettingsRow <- data.frame(MaximumIteration = 5,
MinimumTimestep = 1,
MaximumTimestep = 10)
@@ -256,7 +256,7 @@ runSettings
Finally, save the R data frame to a SyncroSim datasheet using `saveDatasheet()`.

```{r}
-# Save RunControl R data frame to a SyncroSim datasheet
+# Save Run Control R data frame to a SyncroSim datasheet
saveDatasheet(ssimObject = myScenario,
data = runSettings,
name = "helloworldUncertainty_RunControl")
@@ -267,7 +267,9 @@ saveDatasheet(ssimObject = myScenario,

### Setting run parameters with `run()`

-We will now run our scenario using the `run()` function in `rsyncrosim`. If we have a large model and we want to parallelize the run using multiprocessing, we can modify the library-scoped "core_Multiprocessing" datasheet. Since we are using five iterations in our model, we will set the number of jobs to five so each multiprocessing core will run a single iteration.
+We will now run our scenario using the `run()` function in `rsyncrosim`.
+
+If we have a large model and we want to parallelize the run using multiprocessing, we can modify the library-scoped "core_Multiprocessing" datasheet. Since we are using five iterations in our model, we will set the number of jobs to five so each multiprocessing core will run a single iteration.
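A minimal sketch of that configuration, following the same `data.frame()` plus `saveDatasheet()` pattern used elsewhere in this vignette. The column names `EnableMultiprocessing` and `MaximumJobs` are assumptions here; inspect the datasheet with `str()` to confirm the columns on your own installation:

```r
# Sketch only: column names are assumptions -- check with
# str(datasheet(myLibrary, name = "core_Multiprocessing")) first
multiprocess <- data.frame(EnableMultiprocessing = TRUE,
                           MaximumJobs = 5)  # one job per iteration
saveDatasheet(ssimObject = myLibrary, data = multiprocess,
              name = "core_Multiprocessing")
```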

```{r}
# Load list of available library-scoped datasheets
@@ -298,7 +300,7 @@ Now, when we run our scenario, it will use the desired multiprocessing configura
myResultScenario <- run(myScenario)
```

-Running the original scenario creates a new scenario object, known as a result scenario, that contains a read-only snapshot of the Inputs datasheets, as well as the Outputs datasheets filled with result data. We can view which scenarios are result scenarios using the `scenario()` function from `rsyncrosim`.
+Running the original scenario creates a new scenario object, known as a result scenario, that contains a read-only snapshot of the `Inputs` datasheets, as well as the `Outputs` datasheets filled with result data. We can view which scenarios are result scenarios using the `scenario()` function from `rsyncrosim`.

```{r}
# Check that we have two scenarios, and one is a result scenario
@@ -311,7 +313,7 @@ scenario(myLibrary)

### Viewing results with `datasheet()`

-The next step is to view the Outputs datasheets added to the result scenario when it was run. We can load the result tables using the `datasheet()` function. In this package, the datasheet containing the results is called "OutputDatasheet".
+The next step is to view the `Outputs` datasheets added to the result scenario when it was run. We can load the result tables using the `datasheet()` function. In this package, the datasheet containing the results is called "OutputDatasheet".
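A sketch of that call, assuming "OutputDatasheet" carries the same `helloworldUncertainty_` prefix used for the other datasheets in this vignette:

```r
# Load the Outputs datasheet from the result scenario (full datasheet name
# assumed from the package's prefix convention used elsewhere in this vignette)
myOutputDataframe <- datasheet(myResultScenario,
                               name = "helloworldUncertainty_OutputDatasheet")
head(myOutputDataframe)
```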


```{r view results datasheets, warning = FALSE}
