diff --git a/episodes/cli.md b/episodes/cli.md
index a54fc74..f6be24b 100644
--- a/episodes/cli.md
+++ b/episodes/cli.md
@@ -22,7 +22,7 @@ exercises: 2
## Command line interfaces
-A command-line interface, usually abbreviated to CLI, is a terminal or prompt that accepts text input that instructs a computer what to do. They are used to start programs and perform actions within the computer's operating system.
+A command-line interface, usually abbreviated to CLI, is a terminal or prompt that accepts text input that instructs a computer what to do. They are used to start programs and perform actions within the computer's operating system.
In this section, we'll introduce the concept of providing a command-line interface to our research code to make it easier to use and provide a well-documented "entry point" to our software.
@@ -30,7 +30,7 @@ In this section, we'll introduce the concept of providing a command-line interfa
Command lines are a way of interacting with a digital system that go back to the early history of computing. They might seem old-fashioned because typing out commands means that there is no graphical component. It may seem restrictive because your mouse isn't used, but terminals have a lot of power because we can formulate our instructions to the computer by writing commands. We have a direct line to control our computer's operating system.
-It's a great way to talk to your computer because you can record the commands that you've run to provide a documented history of a research process. (We could record a video screen capture of your working procedure, but that's much less efficient.)
+It's a great way to "talk" to your computer because you can record the commands that you've run to provide a documented history of a research process. (We could record a video screen capture of your working procedure, but that's much less efficient.)
Terminals are more efficient for running repetitive tasks and provide extra functionality for advanced users. They are an cost-effective way to provide a user interface for research software, as research teams often lack the resources and know-how to produce sophisticated graphical user interfaces.
@@ -245,6 +245,29 @@ The output is a description of the `ls` command, instructions for using it, and
:::
+::::::::::::::::::::::::::::::::::::: challenge
+
+Try the command line statements described above.
+
+- How would you seek further help if you encounter an error?
+- What response does the terminal provide? Is this what you expect?
+
+:::::::::::::::::::::::::::::::::::::
+
+## CLIs in R
+
+The rest of this episode focuses on the Python programming language.
+
+Unlike Python, R doesn't have a built-in module specifically designed for creating CLIs, despite being a powerful statistical computing language. This means that you'll need to use external packages or write your own functions to handle command-line arguments and options.
+
+However, there are several packages that can help you to create CLIs in R:
+
+- [optparse](https://trevorldavis.com/R/optparse/dev/)
+- [cli](https://cli.r-lib.org/)
+- [Rapp](https://github.com/r-lib/Rapp)
+
+These packages create CLIs for your R scripts, making them easier to distribute for others to use.
+
## CLIs in Python
We can add a command-line interface to our Python code using the methods and tools that are included in the Python programming language.
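
As a minimal sketch of this idea, the standard-library `argparse` module can wrap a function in a command-line interface. The program name `add`, the helper functions `build_parser` and `main`, and the two positional arguments are assumptions made for illustration; they are not taken from the lesson's own code.

```python
import argparse


def build_parser():
    """Create the argument parser for a hypothetical 'add' program."""
    parser = argparse.ArgumentParser(
        prog="add",
        description="Calculate the sum of two numbers.",
    )
    # Positional arguments are converted from text to numbers by argparse.
    parser.add_argument("x", type=float, help="The first number to add.")
    parser.add_argument("y", type=float, help="The second number to add.")
    return parser


def main(argv=None):
    """Parse the arguments, print the sum, and return it."""
    # When argv is None, argparse reads the arguments from the command line.
    args = build_parser().parse_args(argv)
    total = args.x + args.y
    print(total)
    return total


# Simulate typing "add 1.5 2" at the terminal.
main(["1.5", "2"])  # prints 3.5
```

In a real script, you would typically call `main()` with no arguments inside an `if __name__ == "__main__":` guard, so that the values come from the terminal. Because `argparse` generates a help page from the `description` and `help` text, running the script with `--help` gives users a documented entry point.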
diff --git a/episodes/contributors.md b/episodes/contributors.md
index a6ee404..06ee933 100644
--- a/episodes/contributors.md
+++ b/episodes/contributors.md
@@ -21,14 +21,32 @@ exercises: 2
::::::::::::::::::::::::::::::::::::::::::::::::
-## Introduction
+## Collaborative research software development
-Most research software is written in a collaborative manner, involving multiple specialists from within a team or from multiple institutions. For the long-term health of a software package, it’s important to encourage potential contributors to get in touch and feel welcome to take part. Useful research software can take on a life of its own. For more information on planning the development of research software and project governance, see Module 1a.
+Much of today's research software is written in a **collaborative manner**, involving multiple specialists from within a team or from multiple institutions. For the long-term health of a software package, it’s important to encourage potential contributors to get in touch and feel welcome to take part. Useful research software can take on a life of its own.
+
+::::::::::::::::::::::::::::::::::::: callout
+
+## Research software project management
+
+For more information on planning the development of research software and project governance, see Module 1a.
+
+:::::::::::::::::::::::::::::::::::::
It’s often published using an open source licence, which means that all the code is publicly available and may be used and modified by anyone, within certain conditions (see module 1b to learn more about software licensing.)
There's a lot more creating and managing a sustainable community aorund a research software project, but having a central piece of documentation for contributors is a great start!
+::::::::::::::::::::::::::::::::::::: discussion
+
+Consider these questions amongst the group:
+
+- How can we effectively foster a collaborative environment for research software development?
+- How can barriers to participation be removed for a diverse range of individuals and institutions?
+- What strategies can be implemented to ensure that all contributors feel valued and included?
+
+:::::::::::::::::::::::::::::::::::::
+
## Contribution guides
Contribution guidelines help users and understand how they can help to improve the software, whether that’s by submitting bug reports, suggesting new features, or writing better code and documentation. All of these aspects are vital to produce reusable research software.
@@ -39,9 +57,19 @@ It’s important to explain how the project is managed so the process for evalua
Contribution guides will save you time in the long run, because it provides an on-ramp for people to get involved, prevents them from getting confused, and reduces the amount of incorrectly-submitted bug reports or requests for change, etc.
+::::::::::::::::::::::::::::::::::::: discussion
+
+Discuss these issues amongst the group:
+
+- What essential components should be included in comprehensive documentation for research software contributors?
+- How can we make onboarding new contributors a smooth and welcoming process, ensuring they have the necessary information and support to be successful?
+- How can we balance the need for clear guidelines with the desire to encourage creativity and innovation?
+
+:::::::::::::::::::::::::::::::::::::
+
### How to write contributor guidance
-The stanard practice for authoring a contribution guide for a software project is to create a file called `CONTRIBUTING.md` in the root folder of your project. This is a Markdown file that introduces new people to the project. It lets people know the ways they can take part in the research software project and what to do to get involved.
+The standard practice for authoring a contribution guide for a software project is to create a file called `CONTRIBUTING.md` in the root folder of your project. This is a Markdown file that introduces new people to the project. It lets people know the ways they can take part in the research software project and what to do to get involved.
The specific contents of this file depend upon the kind of research project, but some useful information to provide typically includes:
@@ -104,6 +132,15 @@ Many projects following programming standards to manage code quality. A coding s
This might include guidance and advice, or more strict rules as standards that are checked by a code linter. A code linter is an analysis tool that inspects code and checks for common errors and problems, producing a report for the developer to read and act upon. Common coding style standards include the PEP 8 style guide for the Python programming language and the tidyverse style guide in the R statistical language.
+::::::::::::::::::::::::::::::::::::: discussion
+
+Discuss these issues as a group:
+
+- Why are coding conventions important for collaborative research projects?
+- How can we establish and enforce coding style guidelines that promote consistency and readability?
+
+:::::::::::::::::::::::::::::::::::::
+
::::::::::::::::::::::::::::::::::::: keypoints
- **Encourage collaboration:** There are **many ways to contribute** to a research software project, including bug reoprts, feature suggests, design discussions, documentation, and software engineering.
diff --git a/episodes/docstrings.md b/episodes/docstrings.md
index 37a156a..e1e126d 100644
--- a/episodes/docstrings.md
+++ b/episodes/docstrings.md
@@ -27,7 +27,7 @@ If you’re publishing a research software package, one of the most common ways
We learned about _functions_ in an earlier module. Functions help us to break our code into smaller units that have a single purpose. By documenting those functions effectively, we aim to **explain their purpose** to future users and maintainers of that code. We also need to describe all the expected inputs and outputs of the function.
-### Documentation strings
+## Documentation strings
We describe functions by using a feature of many programming languages called documentation strings, usually abbreviated to **docstring**. A documentation string is a piece of text that describes that piece of code and helps people to use it.
@@ -49,7 +49,7 @@ def add(x, y):
### R
-In R, we use a comment with a single quote `#'` to specify a documentation string for a function.
+In R, we use the [roxygen2](https://roxygen2.r-lib.org/) package, where a comment followed by a single quote `#'` specifies a documentation string for a function.
```R
#' Calculate the sum of two numbers.
@@ -150,9 +150,18 @@ abs(x, /)
The most important thing to include in a docstrings is an explanation of the purpose of this piece of code. To write a useful docstring, put yourself in the shoes of someone who encounters your code for the first time and needs a simple introduction that doesn’t assume any implied knowledge. The explanation will be very basic and seem obvious to you, but it may help a new user greatly.
-Next, we must describe the inputs and outputs of the function.
+### Arguments
-We list all the input parameters, or arguments, like so:
+Next, we must describe the inputs of the function, its _arguments_.
+
+We list all the arguments, or input parameters, as shown in the code examples below.
+Each argument has a name and a brief description.
+
+::: group-tab
+
+### Python
+
+The argument name matches the variable name in the function signature, such as `add(x, y)` in this case.
```python
def add(x, y):
@@ -160,56 +169,103 @@ def add(x, y):
Calculate the sum of two numbers.
Args:
- x: The first number to add. (float or int)
- y: The second number to add. (float or int)
+ x: The first number to add.
+ y: The second number to add.
"""
return x + y
```
+### R
+
+The argument name matches the variable name in the function signature, such as `function(x, y)` in this case.
+
+```R
+#' Calculate the sum of two numbers.
+#'
+#' @param x The first number to add.
+#' @param y The second number to add.
+add <- function(x, y) {
+ return(x + y)
+}
+```
+
+:::
+
We have added an “arguments” (abbreviated to “args”) section to our docstring which lists the input parameters of the function and describes each one.
+::::::::::::::::::::::::::::::::: challenge
+
+Add a description of each argument to a function in your code.
+
+:::::::::::::::::::::::::::::::::
+
+### Return values
+
Finally, we describe the result of the function that is output by the return statement.
+::: group-tab
+
+### Python
+
```python
def add(x, y):
"""
Calculate the sum of two numbers.
Args:
- x: The first number to add. (float or int)
- y: The second number to add. (float or int)
+ x: The first number to add.
+ y: The second number to add.
Returns:
- The sum of x and y. (float or int)
+ The sum of x and y.
"""
return x + y
```
+### R
+
+```R
+#' Calculate the sum of two numbers.
+#'
+#' @param x The first number to add.
+#' @param y The second number to add.
+#' @return The sum of x and y.
+add <- function(x, y) {
+ return(x + y)
+}
+```
+
+:::
+
This will help the user to understand what the function does and what they can expect to receive back when they call it. It can also be useful to explain any potential errors or exceptions that the function will raise if the inputs aren’t as expected, and how to deal with them.
-Help function in R and Python e.g.
+::::::::::::::::::::::::::::::::: challenge
-```python
-help(print)
-```
+Describe the return value of a function in a documentation string.
+
+:::::::::::::::::::::::::::::::::
-## Usage examples
+### Usage examples
We can also include demonstrations of how to use our code by providing code snippets. To do this, we write a collection of sample code that demonstrate how to use functions effectively in different scenarios.
To do this, let's add an examples section to our documentation string. Each code example has a prefix of `>>>` which represents the input prompt on the Python interpreter. Some code editors will provide syntax highlighting of these code snippets.
+::: group-tab
+
+### Python
+
```python
def add(x, y):
"""
Calculate the sum of two numbers.
Args:
- x: The first number to add. (float or int)
- y: The second number to add. (float or int)
+ x: The first number to add.
+ y: The second number to add.
Returns:
- The sum of x and y. (float or int)
+ The sum of x and y.
Examples:
>>> add(1, 1)
@@ -220,12 +276,49 @@ def add(x, y):
return x + y
```
+### R
+
+For more information about writing R code examples within function documentation, please see the [Examples](https://r-pkgs.org/man.html#sec-man-examples) section in the book _R Packages_ by Hadley Wickham.
+
+```R
+#' Add two numbers.
+#'
+#' @param x The first number to add.
+#' @param y The second number to add.
+#' @return The sum of `x` and `y`.
+#'
+#' @examples
+#' add(1, 1)
+#' add(1.3, 5.3)
+add <- function(x, y) {
+ return(x + y)
+}
+```
+
+:::
+
+::::::::::::::::::::::::::::::::: challenge
+
+Write a brief code example within the documentation string of a function in your code.
+
+:::::::::::::::::::::::::::::::::
+
:::: spoiler
### Using docstrings to create automatic tests
+::: group-tab
+
+### Python
+
These code examples can be used as automatic tests using the [doctest](https://docs.python.org/3/library/doctest.html) module which is built into Python.
+### R
+
+In the R ecosystem, we can automatically test the examples in our documentation strings using the [doctest](https://cran.r-project.org/web/packages/doctest/vignettes/doctest.html) package.
+
+:::
+
::::
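
To illustrate how docstring examples can double as tests, here is a minimal Python sketch using the standard-library `doctest` module. The `add` function mirrors the example used earlier in this episode; running the examples programmatically with `DocTestParser` and `DocTestRunner` is just one possible approach, chosen here so that the snippet is self-contained.

```python
import doctest


def add(x, y):
    """
    Calculate the sum of two numbers.

    Examples:
        >>> add(1, 1)
        2
        >>> add(1.5, 2.5)
        4.0
    """
    return x + y


# Extract the ">>>" examples from the docstring and build a test from them.
test = doctest.DocTestParser().get_doctest(
    add.__doc__, {"add": add}, "add", None, 0
)

# Run each example and compare the real output with the documented output.
results = doctest.DocTestRunner(verbose=False).run(test)
print(results.failed, results.attempted)
```

Here, both examples are attempted and neither fails. In everyday use, it's usually simpler to run `python -m doctest` followed by the name of your script, which checks every docstring in that file.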
## Best practices
@@ -248,15 +341,16 @@ A list of documentation string standards in Python:
- The [Google Style Guide](https://google.github.io/styleguide/pyguide.html#381-docstrings) sets out a docstring format.
- [Sphinx docstring format](https://www.sphinx-doc.org/en/master/), which has a [NumpyDoc extension](https://numpydoc.readthedocs.io/en/latest/format.html) designed for scientific use.
-It doesn't _really_ matter which one you select, as long as it's used consistently across a project and it's clear what the syntax means. Some standards are better-supported by other tools such as IDEs and documentation generators.
+It doesn't matter which one you select, as long as it's used consistently across a project and it's clear what the syntax means. Some standards are better-supported by other tools such as IDEs and documentation generators.
::::
## Automatically generate docstrings
-Generative AI services such as Google Gemini can write docstrings automatically.
+Generative AI services such as [Google Gemini](https://gemini.google.com/) can read your code and write docstrings automatically, to a certain extent.
-To do this, ask the system to create a docstring and copy your code into the prompt text box. Below is an example prompt and the reply generated by the Google Gemini algorithm:
+To do this, ask the system to create a docstring and copy your code into the prompt text box.
+Below is an example prompt and the reply generated by the Google Gemini algorithm:
```
Please generate a docstring for this Python function:
@@ -266,7 +360,7 @@ def calculate_rectangle_area(width, height):
return area
```
-The result is the follow docstring, in addition to some helpful descriptions of the content that it generated.
+The result is the following docstring, in addition to some helpful descriptions of the content that it generated.
```python
def calculate_rectangle_area(width, height):
@@ -287,7 +381,27 @@ def calculate_rectangle_area(width, height):
return area
```
-This can save you a lot of time, but as with any AI-generated content, always check the output and ensure it's correct!
+This AI-generated content contains a function summary, argument descriptions, and explains the return value as we discussed previously.
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+Try asking a generative AI service such as Google Gemini to read your code.
+
+- Ask it to generate documentation of different kinds.
+- Request a review of your code. What does the bot think?
+- Can the chat-bot create a diagram to illustrate a concept that is relevant to your research software?
+
+:::::::::::::::::::::::::::::::::::::
+
+This can save you a lot of time, but as with any LLM-generated content, always check the output and ensure it's correct!
+
+::::::::::::::::::::::::::::::::::::: discussion
+
+What are the benefits and risks of using a large language model (LLM) service such as Google Gemini or OpenAI ChatGPT to interpret your code and produce content that you use in your research?
+
+How should we critically evaluate this material so that it can be used appropriately to improve the productivity of our research teams without jeopardising our ethics or integrity or causing security risks?
+
+:::::::::::::::::::::::::::::::::::::
::::::::::::::::::::::::::::::::::::: keypoints
diff --git a/episodes/readable.md b/episodes/readable.md
index d401a11..8cfb779 100644
--- a/episodes/readable.md
+++ b/episodes/readable.md
@@ -24,7 +24,7 @@ It’s a common trope in the software engineering world that code is **read much
## Syntax highlighting
-Many text editors use syntax highlighting to display parts of your source code using different colours or fonts to signify the meaning of each word or symbol.
+Many text editors use **syntax highlighting** to display parts of your source code using different colours or fonts to signify the meaning of each word or symbol.
For example, variable names may be given a bright blue colour, strings highighted in green, and numbers shown in a red font.
Let's take a look to see its benefits:
@@ -130,7 +130,17 @@ count_word_occurrences <- function(filename, word_to_count) {
Which bit of code is easier to read? What a difference a splash of colour makes! I know which development environment I'd rather work in.
-To work with our source code in a colourised way like this, use a text editor or IDE with a syntax highlighting feature such as Notepad++, VSCode, PyCharm, or RStudio.
+### Code editors
+
+To work with our source code in a colourised way like this, use a text editor or IDE with a syntax highlighting feature such as Notepad++, VSCode, PyCharm, or RStudio.
+
+::::::::::::::::::::::::::::::::::::: challenge
+
+Try using some code editing software to apply syntax highlighting to your code.
+
+If you don't have access to an IDE, you could try the Online syntax highlighting tool by Oleg Parashchenko which can colourise [R scripts](https://tohtml.com/r/) and [Python code](https://tohtml.com/python/).
+
+:::::::::::::::::::::::::::::::::::::
## Meaningful names
@@ -178,6 +188,15 @@ calculate_area <- function(width, height) {
:::
+::::::::::::::::::::::::::::::::::::: discussion
+
+Try modifying your example code by renaming the variables and functions.
+
+- How much meaning can you include in these object names?
+- What are the limitations of this approach?
+
+:::::::::::::::::::::::::::::::::::::
+
### Naming conventions
The communities of developers that use each programming language usually follow a conventional approach when naming objects in their code.
@@ -263,6 +282,16 @@ age = age + 3
It's best practice to use a very concise style when writing code comments. I recommend using active tense verbs.
+::::::::::::::::::::::::::::::::::::: discussion
+
+Try adding comments to your code.
+
+- Which parts of the code will most benefit from comments?
+- How long and detailed should comments be?
+- How would you refer someone to an external website for more information?
+
+:::::::::::::::::::::::::::::::::::::
+
## Type hints
Type hints display the expected *type* of each object in your code. They are a kind of "documentation as code" that annotate the code that's already there, rather than being written as separate documentation. While they don't change the way the software works, they can help to improve code clarity and may be used to catch errors early in the development process.
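
As a brief sketch of this point, the Python snippet below annotates a `calculate_area` function with type hints. The choice of `float` for each argument and for the return value is an assumption made for illustration.

```python
def calculate_area(width: float, height: float) -> float:
    """Calculate the area of a rectangle."""
    return width * height


# The hints are stored as metadata on the function object.
print(calculate_area.__annotations__)

# Python does not enforce the hints when the code runs, so passing
# integers still works and the function behaves as it did before.
print(calculate_area(3, 4))  # prints 12
```

Because the interpreter ignores the hints at run time, catching mistakes early relies on a separate static type checker, such as mypy, or on the checks built into your code editor.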
@@ -375,3 +404,10 @@ It will take some time and effort to write these labels, but it will pay off in
- Label functions and variables with *type hints* to tell the user what data types are expected.
::::::::::::::::::::::::::::::::::::::::::::::::
+
+## Further resources
+
+To find out more about the topics covered in this episode, please refer to the following pages:
+
+- _The Hitchhiker's Guide to Python_ [Code Style](https://docs.python-guide.org/writing/style/)
+- [The tidyverse style guide](https://style.tidyverse.org/) for R
diff --git a/episodes/sites.md b/episodes/sites.md
index e0a102c..3a9e524 100644
--- a/episodes/sites.md
+++ b/episodes/sites.md
@@ -22,7 +22,7 @@ exercises: 2
::::::::::::::::::::::::::::::::::::::::::::::::
-## Introduction
+## Documentation websites
A documentation website is a user guide and reference manual for a library of research code. Up to now, we've looked at ways to put helpful notes in our code, but now we'll learn how to write a longer, more complete guide to the research tools you create.
@@ -34,25 +34,44 @@ To get an idea of this, here are some links documentation websites for widely-us
- [ggplot2](https://ggplot2.tidyverse.org/index.html) is a plotting package for the R statistical language.
- [scikit-learn](https://scikit-learn.org/stable/user_guide.html) is a machine learning library for the Python programming language.
+::::::::::::::::::::::::::::::::::::: discussion
+
+Evaluate these documentation sites.
+
+- What do you like about them?
+- How approachable are they as a new user?
+- What do you find difficult to understand in this material?
+
+:::::::::::::::::::::::::::::::::::::
+
+## Why create a website?
+
+There are many advantages to building a documentation site to provide an information-rich resource for researchers who use your code at institutions all around the world.
+
### Advantages
-There are many advantages to building a documentation site to provide a information-rich resource for researchers who use your code at institutions all around the world. These sites can work as hubs for collaboration, sharing the latest updates, and encouraging people to take up your system and get involved in improving it. The effort of setting one up will be rewarded in the long run because you will have created a valuable asset that will foster collaboration and knowledge sharing in your research community.
+These sites can work as hubs for collaboration, sharing the latest updates, and encouraging people to take up your system and get involved in improving it. The effort of setting one up will be rewarded in the long run because you will have created a valuable asset that will foster collaboration and knowledge sharing in your research community.
A key foundation stone of modern digital research practices is the ability to replicate results by reproducing analysis workflows. Clear, thorough documentation of the research code ensures that researchers can repeat processes and verify results and other people's outputs.
-Documementation sites are really useful for introducing new users to your software. It makes it much easier and faster for new users to get started using your software to boost their research. It's one of the most effective ways to create a user base that has a sophisticated understanding of the research code, which is essential for them to adapt it to the complex problems that often raise in research contexts.
+Documentation sites are really useful for introducing new users to your software. They make it much easier and faster for new users to get started using your software to boost their research. They are one of the most effective ways to create a user base that has a sophisticated understanding of the research code, which is essential for them to adapt it to the complex problems that often arise in research contexts.
They're also a valuable resource for your existing user base, enabling them to look up reference material or search the manual to find new capabilities they weren't aware of before. This will increase the potential for your software to accellerate the productivity of other research teams and boost scientific progress.
### When to use one
-Although the advantages are numerous, not all software packages require a comprehensive documentation website. However, for any code project that is growing in the number of collaborators, users, and technicala complexity, consider coordinating the team to write one as soon as possible to help the project grow in a healthy manner.
+Although the advantages are numerous, not all software packages require a comprehensive documentation website. However, for any code project that is growing in the number of collaborators, users, and technical complexity, consider coordinating the team to write one as soon as possible to help the project grow in a healthy manner.
-### Writing style
+::::::::::::::::::::::::::::::::::::: discussion
-Strive to use everyday, jargon-free language. It helps to set an approachable tone that encourages others to use the software and get involved with the project. This will en sure that the code is accesible to the widest possible layers of the research community and foster collaboration.
+When is it appropriate to establish a documentation website?
+Consider the following factors:
-Always consider the target audience of your documentation, because your user base may be unaware of some of the unstated assumptions and technical backgroud knowledge that you take for granted.
+- How many resources will it take to write and maintain?
+- How many end-users need the information?
+- Is there a simpler format that can convey the same information?
+
+:::::::::::::::::::::::::::::::::::::
## Contents
@@ -86,6 +105,13 @@ This prevents a situation where potential solutions to common issues do exist, b
An appendix containing frequently asked questions (FAQs) is very useful to save yourself time in responding to common queries from the users of your code.
+
+## Writing style
+
+As we discussed in the [episode on READMEs](readmes.md), it's important to strive to use everyday, jargon-free language. It helps to set an approachable tone that encourages others to use the software and get involved with the project. This will ensure that the code is accessible to the widest possible range of people in the research community and foster collaboration.
+
+Always consider the target audience of your documentation, because your user base may be unaware of some of the unstated assumptions and technical background knowledge that you take for granted.
+
## Tools
There are various tools available to build documentation sites for your research software.
@@ -102,6 +128,8 @@ To create a wiki, which is a simple, easy-to-edit web site, go to the main page
::: callout
+## GitHub Wikis
+
For more information about the wiki feature on GitHub, see [Documenting your project with wikis](https://docs.github.com/en/communities/documenting-your-project-with-wikis) on the GitHub documentation.
:::