Fix up pipeline vignette
katieb1 committed Oct 11, 2024
1 parent 53a4f36 commit bd71915
Showing 3 changed files with 103 additions and 58 deletions.
28 changes: 14 additions & 14 deletions vignettes/a01_rsyncrosim_vignette_basic.Rmd
@@ -289,9 +289,9 @@ Here, we are viewing the contents of a SyncroSim datasheet as an R data frame. A

### Configure model inputs using `datasheet()` and `addRow()`

-Currently our Inputs scenario datasheet is empty! We will need to add some values to the Inputs datasheet (`InputDatasheet`) so we can run our model.
+Currently our `Inputs` scenario datasheet is empty! We will need to add some values to the `Inputs` datasheet (`InputDatasheet`) so we can run our model.

-First, assign the Inputs datasheet to a new data frame variable.
+First, assign the `Inputs` datasheet to a new data frame variable.

```{r assign input data, warning = FALSE}
# Assign contents of the Inputs datasheet to an R data frame
@@ -306,12 +306,12 @@ Check the columns that need input values and the type of values these columns re
str(myInputDataframe)
```

-The Inputs datasheet requires 2 values:
+The `Inputs` datasheet requires 2 values:

* `m` : the slope of the linear equation.
* `b` : the intercept of the linear equation.
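As an aside, the model these two inputs drive is just the line `y = m * x + b`. A quick plain-R sanity check, independent of SyncroSim (the slope and intercept values below are illustrative, not taken from the vignette):

```r
# Illustrative values only (hypothetical, not from the vignette)
m <- 3   # slope
b <- 10  # intercept
x <- 0:5 # a few time steps
y <- m * x + b
y
#> [1] 10 13 16 19 22 25
```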

-Now, we will update the Inputs data frame. This can be done in many ways (e.g. using the `dplyr` package), but `rsyncrosim` also provides a helper function called `addRow()` for easily adding new rows to R data frames. The `addRow()` function takes the `targetDataframe` as the first value (in this case, our Inputs data frame that we want to update), and the data frame of new rows to append to the input data frame as the second value.
+Now, we will update the `Inputs` data frame. This can be done in many ways (e.g. using the `dplyr` package), but `rsyncrosim` also provides a helper function called `addRow()` for easily adding new rows to R data frames. The `addRow()` function takes the `targetDataframe` as the first value (in this case, our `Inputs` data frame that we want to update), and the data frame of new rows to append to the input data frame as the second value.

```{r add input data, warning = FALSE}
# Create input data and add it to the input data frame
@@ -324,7 +324,7 @@ myInputDataframe

### Saving modifications to datasheets using `saveDatasheet()`

-Now that we have a complete data frame of the Inputs, we will save this data frame to its respective SyncroSim datasheets using the `saveDatasheet()` function. Since this datasheet is scenario-scoped, we will save it at the scenario level by setting `ssimObject = myScenario`.
+Now that we have a complete data frame of the `Inputs`, we will save this data frame to its respective SyncroSim datasheets using the `saveDatasheet()` function. Since this datasheet is scenario-scoped, we will save it at the scenario level by setting `ssimObject = myScenario`.

```{r save input data, warning = FALSE}
# Save Inputs R data frame to a SyncroSim datasheet
@@ -334,9 +334,9 @@ saveDatasheet(ssimObject = myScenario, data = myInputDataframe,

### Configuring the `Pipeline` datasheet

-Next, we need to add data to the Pipeline datasheet. The Pipeline datasheet determines which transformers the scenarios will run and in which order. Use the code below to assign the Pipeline datasheet to a new data frame variable and check the values required by the datasheet.
+Next, we need to add data to the `Pipeline` datasheet. The `Pipeline` datasheet is a built-in SyncroSim datasheet, meaning that it comes with every SyncroSim library regardless of which packages that library uses. The `Pipeline` datasheet determines which transformer stages the scenarios will run and in which order. We use the term "transformers" because these constitute scripts that *transform* input data into output data. Use the code below to assign the `Pipeline` datasheet to a new data frame variable and check the values required by the datasheet.

-```{r assign pipeline data, warning = FALSE}
+```{r assign Pipeline data, warning = FALSE}
# Assign contents of the Pipeline datasheet to an R data frame
myPipeline <- datasheet(myScenario,
name = "core_Pipeline")
@@ -347,13 +347,13 @@ str(myPipeline)

The Pipeline datasheet requires 2 values:

-* `StageNameId` : the pipeline stage (transformer). This column is a factor that has only a single level: "Hello World Time (R)".
+* `StageNameId` : the pipeline transformer stage. This column is a factor that has only a single level: "Hello World Time (R)".
* `RunOrder` : the numerical order in which the stages will be run.

-Below, we use the `addRow()` and `saveDatasheet()` functions to update the Pipeline datasheet with the transformer(s) we want to run and the order in which we want to run them. In this case, there is only a single transformer available from the `helloworldTime` package, called "Hello World Time (R)", so we will add this transformer to the data frame and set the `RunOrder` to `1`.
+Below, we use the `addRow()` and `saveDatasheet()` functions to update the `Pipeline` datasheet with the transformer(s) we want to run and the order in which we want to run them. In this case, there is only a single transformer available from the `helloworldTime` package, called "Hello World Time (R)", so we will add this transformer to the data frame and set the `RunOrder` to `1`.

-```{r add pipeline data, warning = FALSE}
-# Create pipeline data and add it to the pipeline data frame
+```{r add Pipeline data, warning = FALSE}
+# Create Pipeline data and add it to the Pipeline data frame
myPipelineRow <- data.frame(StageNameId = "Hello World Time (R)", RunOrder = 1)
myPipeline <- addRow(myPipeline, myPipelineRow)
@@ -419,7 +419,7 @@ runLog(myResultScenario)

### Result scenarios

-A *result scenario* is generated when a scenario is run, and is an exact copy of the original scenario (i.e. it contains the original scenario's values for all Inputs datasheets). The result scenario is passed to the transformer in order to generate model output, with the results of the transformer's calculations then being added to the result scenario as output datasheets. In this way the result scenario contains both the output of the run and a snapshot record of all the model inputs.
+A *result scenario* is generated when a scenario is run, and is an exact copy of the original scenario (i.e. it contains the original scenario's values for all `Inputs` datasheets). The result scenario is passed to the transformer in order to generate model output, with the results of the transformer's calculations then being added to the result scenario as output datasheets. In this way the result scenario contains both the output of the run and a snapshot record of all the model inputs.

Check out the current scenarios in your library using the `scenario()` function.

@@ -445,7 +445,7 @@ Looking at the `data` column, the `Outputs` does not contain any data in the ori

### Viewing results with `datasheet()`

-The next step is to view the Outputs datasheet in the result scenario that was populated from running the original scenario. We can load the result table using the `datasheet()` function and setting the `name` parameter to the Outputs datasheet.
+The next step is to view the `Outputs` datasheet in the result scenario that was populated from running the original scenario. We can load the result table using the `datasheet()` function and setting the `name` parameter to the `Outputs` datasheet.


```{r view results datasheets, warning = FALSE}
@@ -474,7 +474,7 @@ myNewScenario <- scenario(ssimObject = myProject,
scenario(myLibrary)['Name']
```

-To edit the new scenario, we must first load the contents of the Inputs datasheet and assign it to a new R data frame using the `datasheet()` function. We will set the `empty` argument to `TRUE` so that instead of getting the values from the existing scenario, we can start with an empty data frame again.
+To edit the new scenario, we must first load the contents of the `Inputs` datasheet and assign it to a new R data frame using the `datasheet()` function. We will set the `empty` argument to `TRUE` so that instead of getting the values from the existing scenario, we can start with an empty data frame again.

```{r load input data from new Scenario, warning = FALSE}
# Load empty Inputs datasheets as an R data frame
32 changes: 17 additions & 15 deletions vignettes/a02_rsyncrosim_vignette_uncertainty.Rmd
@@ -140,7 +140,7 @@ View the datasheets associated with your new scenario using the `datasheet()` fu
datasheet(myScenario)
```

-From the list of datasheets above, we can see that there are three datasheets specific to the `helloworldUncertainty` package. Let's view the contents of the Inputs datasheet as an R data frame.
+From the list of datasheets above, we can see that there are three datasheets specific to the `helloworldUncertainty` package. Let's view the contents of the `Inputs` datasheet as an R data frame.

```{r view specific datasheet, warning = FALSE}
# View the contents of the Inputs datasheet for the scenario
@@ -151,7 +151,7 @@ datasheet(myScenario, name = "helloworldUncertainty_InputDatasheet")

**Inputs Datasheet**

-Currently our input scenario datasheet is empty! We need to add some values to our Inputs datasheet (`InputDatasheet`) so we can run our model. First, assign the contents of the Inputs datasheet to a new data frame variable using `datasheet()`, then check the columns that need input values.
+Currently our input scenario datasheet is empty! We need to add some values to our `Inputs` datasheet (`InputDatasheet`) so we can run our model. First, assign the contents of the `Inputs` datasheet to a new data frame variable using `datasheet()`, then check the columns that need input values.

```{r assign input data, warning = FALSE}
# Load the Inputs datasheet to an R data frame
@@ -162,7 +162,7 @@ myInputDataframe <- datasheet(myScenario,
str(myInputDataframe)
```

-The Inputs datasheet requires three values:
+The `Inputs` datasheet requires three values:

* `mMean` : the mean of the slope normal distribution.
* `mSD` : the standard deviation of the slope normal distribution.
@@ -220,30 +220,30 @@ saveDatasheet(ssimObject = myScenario, data = myPipeline,
name = "core_Pipeline")
```

-**RunControl Datasheet**
+**Run Control Datasheet**

-The `RunControl` datasheet provides information about how many time steps and iterations to use in the model. Here, we set the *number of iterations*, as well as the minimum and maximum time steps for our model. The number of iterations we set is equivalent to the number of Monte Carlo realizations, so the greater the number of iterations, the more accurate the range of output values we will obtain. Let's take a look at the columns that need input values.
+The `Run Control` datasheet provides information about how many time steps and iterations to use in the model. Here, we set the *number of iterations*, as well as the minimum and maximum time steps for our model. The number of iterations we set is equivalent to the number of Monte Carlo realizations, so the greater the number of iterations, the more accurate the range of output values we will obtain. Let's take a look at the columns that need input values.

```{r modify run control}
-# Load RunControl datasheet to a new R data frame
+# Load Run Control datasheet to a new R data frame
runSettings <- datasheet(myScenario, name = "helloworldUncertainty_RunControl")
-# Check the columns of the RunControl data frame
+# Check the columns of the Run Control data frame
str(runSettings)
```

-The RunControl datasheet requires the following 3 columns:
+The `Run Control` datasheet requires the following 3 columns:

* `MaximumIteration` : total number of iterations to run the model for.
* `MinimumTimestep` : the starting time point of the simulation.
* `MaximumTimestep` : the end time point of the simulation.

-*Note:* A fourth hidden column, `MinimumIteration`, also exists in the RunControl datasheet (default=1).
+*Note:* A fourth hidden column, `MinimumIteration`, also exists in the `Run Control` datasheet (default=1).

-We'll add this information to an R data frame and then add it to the Run Control data frame using `addRow()`. For this example, we will use only five iterations.
+We'll add this information to an R data frame and then add it to the `Run Control` data frame using `addRow()`. For this example, we will use only five iterations.

```{r}
-# Create run control data and add it to the run control data frame
+# Create Run Control data and add it to the Run Control data frame
runSettingsRow <- data.frame(MaximumIteration = 5,
MinimumTimestep = 1,
MaximumTimestep = 10)
@@ -256,7 +256,7 @@ runSettings
Finally, save the R data frame to a SyncroSim datasheet using `saveDatasheet()`.

```{r}
-# Save RunControl R data frame to a SyncroSim datasheet
+# Save Run Control R data frame to a SyncroSim datasheet
saveDatasheet(ssimObject = myScenario,
data = runSettings,
name = "helloworldUncertainty_RunControl")
@@ -267,7 +267,9 @@ saveDatasheet(ssimObject = myScenario,

### Setting run parameters with `run()`

-We will now run our scenario using the `run()` function in `rsyncrosim`. If we have a large model and we want to parallelize the run using multiprocessing, we can modify the library-scoped "core_Multiprocessing" datasheet. Since we are using five iterations in our model, we will set the number of jobs to five so each multiprocessing core will run a single iteration.
+We will now run our scenario using the `run()` function in `rsyncrosim`.
+
+If we have a large model and we want to parallelize the run using multiprocessing, we can modify the library-scoped "core_Multiprocessing" datasheet. Since we are using five iterations in our model, we will set the number of jobs to five so each multiprocessing core will run a single iteration.
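A minimal sketch of that configuration, following the same `data.frame()` plus `saveDatasheet()` pattern used elsewhere in this vignette. The column names `EnableMultiprocessing` and `MaximumJobs` are assumptions here; inspect the datasheet with `str()` to confirm the columns on your own installation:

```r
# Sketch only: column names are assumptions -- check with
# str(datasheet(myLibrary, name = "core_Multiprocessing")) first
multiprocess <- data.frame(EnableMultiprocessing = TRUE,
                           MaximumJobs = 5)  # one job per iteration
saveDatasheet(ssimObject = myLibrary, data = multiprocess,
              name = "core_Multiprocessing")
```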

```{r}
# Load list of available library-scoped datasheets
@@ -298,7 +300,7 @@ Now, when we run our scenario, it will use the desired multiprocessing configura
myResultScenario <- run(myScenario)
```

-Running the original scenario creates a new scenario object, known as a result scenario, that contains a read-only snapshot of the Inputs datasheets, as well as the Outputs datasheets filled with result data. We can view which scenarios are result scenarios using the `scenario()` function from `rsyncrosim`.
+Running the original scenario creates a new scenario object, known as a result scenario, that contains a read-only snapshot of the `Inputs` datasheets, as well as the `Outputs` datasheets filled with result data. We can view which scenarios are result scenarios using the `scenario()` function from `rsyncrosim`.

```{r}
# Check that we have two scenarios, and one is a result scenario
@@ -311,7 +313,7 @@ scenario(myLibrary)

### Viewing results with `datasheet()`

-The next step is to view the Outputs datasheets added to the result scenario when it was run. We can load the result tables using the `datasheet()` function. In this package, the datasheet containing the results is called "OutputDatasheet".
+The next step is to view the `Outputs` datasheets added to the result scenario when it was run. We can load the result tables using the `datasheet()` function. In this package, the datasheet containing the results is called "OutputDatasheet".
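A sketch of that call, assuming "OutputDatasheet" carries the same `helloworldUncertainty_` prefix used for the other datasheets in this vignette:

```r
# Load the Outputs datasheet from the result scenario (full datasheet name
# assumed from the package's prefix convention used elsewhere in this vignette)
myOutputDataframe <- datasheet(myResultScenario,
                               name = "helloworldUncertainty_OutputDatasheet")
head(myOutputDataframe)
```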


```{r view results datasheets, warning = FALSE}
