Commit

Merge pull request #45 from Joe-Heffer-Shef/44-add-exercises-throughout-the-course

Add exercises throughout the course
Joe-Heffer-Shef authored Sep 12, 2024
2 parents 8fa53d8 + 80a289c commit 42b82f4
Showing 5 changed files with 274 additions and 36 deletions.
27 changes: 25 additions & 2 deletions episodes/cli.md
@@ -22,15 +22,15 @@ exercises: 2

## Command line interfaces

A command-line interface, usually abbreviated to CLI, is a terminal or prompt that accepts text input that instructs a computer what to do. They are used to start programs and perform actions within the computer's operating system.
A command-line interface, usually abbreviated to <abbr title="Command-line interface">CLI</abbr>, is a terminal or prompt that accepts text input that instructs a computer what to do. They are used to start programs and perform actions within the computer's operating system.

In this section, we'll introduce the concept of providing a command-line interface to our research code to make it easier to use and provide a well-documented "entry point" to our software.

### Advantages of CLIs for research tools

Command lines are a way of interacting with a digital system that go back to the early history of computing. They might seem old-fashioned because typing out commands means that there is no graphical component. It may seem restrictive because your mouse isn't used, but terminals have a lot of power because we can formulate our instructions to the computer by writing commands. We have a direct line to control our computer's operating system.

It's a great way to talk to your computer because you can record the commands that you've run to provide a documented history of a research process. (We could record a video screen capture of your working procedure, but that's much less efficient.)
It's a great way to "talk" to your computer because you can record the commands that you've run to provide a documented history of a research process. (We could record a video screen capture of your working procedure, but that's much less efficient.)

Terminals are more efficient for running repetitive tasks and provide extra functionality for advanced users. They are a cost-effective way to provide a user interface for research software, as research teams often lack the resources and know-how to produce sophisticated graphical user interfaces.

@@ -245,6 +245,29 @@ The output is a description of the `ls` command, instructions for using it, and

:::

::::::::::::::::::::::::::::::::::::: challenge

Try the command line statements described above.

- How would you seek further help if you encounter an error?
- What response does the terminal provide? Is this what you expect?

:::::::::::::::::::::::::::::::::::::

## CLIs in R

The rest of this episode is focussed on the Python programming language.

R, while a powerful statistical computing language, doesn't have a built-in module specifically designed for creating <abbr title="Command-line interfaces">CLIs</abbr>, unlike Python. This means that you'll need to use external packages or write your own functions to handle command-line arguments and options.

However, there are several packages that can help you create CLIs in R:

- [optparse](https://trevorldavis.com/R/optparse/dev/)
- [cli](https://cli.r-lib.org/)
- [Rapp](https://github.com/r-lib/Rapp)

These packages help you build <abbr title="Command-line interfaces">CLIs</abbr> for your R scripts, making them easier to distribute and easier for others to use.

## CLIs in Python

We can add a command-line interface to our Python code using the methods and tools that are included in the Python programming language.
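As a minimal sketch of this idea, the example below uses `argparse`, the command-line parsing module in Python's standard library, to wrap the `add` function used elsewhere in this course. The script name `add.py` and the `main` function are illustrative assumptions rather than part of the lesson's own code.

```python
import argparse


def add(x, y):
    """Calculate the sum of two numbers."""
    return x + y


def main(argv=None):
    """Parse command-line arguments, add the two numbers and print the result."""
    # The description is shown in the help text that argparse generates for --help
    parser = argparse.ArgumentParser(description="Calculate the sum of two numbers.")
    # Positional arguments; argparse converts each string to a float for us
    parser.add_argument("x", type=float, help="The first number to add.")
    parser.add_argument("y", type=float, help="The second number to add.")
    # If argv is None, argparse reads the arguments from sys.argv
    args = parser.parse_args(argv)
    result = add(args.x, args.y)
    print(result)
    return result
```

To run this as a script, you would add the usual `if __name__ == "__main__": main()` guard at the end of the file. You could then call, for example, `python add.py 1 2`, which prints `3.0`. Running `python add.py --help` prints a usage summary built from the `description` and `help` strings, which is one way a CLI can provide a well-documented entry point.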
43 changes: 40 additions & 3 deletions episodes/contributors.md
@@ -21,14 +21,32 @@ exercises: 2

::::::::::::::::::::::::::::::::::::::::::::::::

## Introduction
## Collaborative research software development

Most research software is written in a collaborative manner, involving multiple specialists from within a team or from multiple institutions. For the long-term health of a software package, it’s important to encourage potential contributors to get in touch and feel welcome to take part. Useful research software can take on a life of its own. For more information on planning the development of research software and project governance, see Module 1a.
Often, in today's research environment, much analytics software is written in a **collaborative manner**, involving multiple specialists from within a team, or from multiple institutions. For the long-term health of a software package, it’s important to encourage potential contributors to get in touch and feel welcome to take part. Useful research software can take on a life of its own.

::::::::::::::::::::::::::::::::::::: callout

## Research software project management

For more information on planning the development of research software and project governance, see Module 1a.

:::::::::::::::::::::::::::::::::::::

It’s often published under an open source licence, which means that all the code is publicly available and may be used and modified by anyone, subject to certain conditions (see Module 1b to learn more about software licensing).

There's a lot more to creating and managing a sustainable community around a research software project, but having a central piece of documentation for contributors is a great start!

::::::::::::::::::::::::::::::::::::: discussion

Consider these questions amongst the group:

- How can we effectively foster a collaborative environment for research software development?
- How can barriers to participation be removed for a diverse range of individuals and institutions?
- What strategies can be implemented to ensure that all contributors feel valued and included?

:::::::::::::::::::::::::::::::::::::

## Contribution guides

Contribution guidelines help users understand how they can help to improve the software, whether that’s by submitting bug reports, suggesting new features, or writing better code and documentation. All of these aspects are vital to produce reusable research software.
@@ -39,9 +57,19 @@ It’s important to explain how the project is managed so the process for evalua

Contribution guides will save you time in the long run, because they provide an on-ramp for people to get involved, prevent them from getting confused, and reduce the number of incorrectly submitted bug reports or requests for change.

::::::::::::::::::::::::::::::::::::: discussion

Discuss these issues amongst the group:

- What essential components should be included in comprehensive documentation for research software contributors?
- How can we make onboarding new contributors a smooth and welcoming process, ensuring they have the necessary information and support to be successful?
- How can we balance the need for clear guidelines with the desire to encourage creativity and innovation?

:::::::::::::::::::::::::::::::::::::

### How to write contributor guidance

The stanard practice for authoring a contribution guide for a software project is to create a file called `CONTRIBUTING.md` in the root folder of your project. This is a Markdown file that introduces new people to the project. It lets people know the ways they can take part in the research software project and what to do to get involved.
The standard practice for authoring a contribution guide for a software project is to create a file called `CONTRIBUTING.md` in the root folder of your project. This is a Markdown file that introduces new people to the project. It lets people know the ways they can take part in the research software project and what to do to get involved.

The specific contents of this file depend upon the kind of research project, but some useful information to provide typically includes:

@@ -104,6 +132,15 @@ Many projects following programming standards to manage code quality. A coding s

This might include guidance and advice, or stricter rules that are checked by a code linter. A code linter is an analysis tool that inspects code and checks for common errors and problems, producing a report for the developer to read and act upon. Common coding style standards include the PEP 8 style guide for the Python programming language and the tidyverse style guide for the R statistical language.

::::::::::::::::::::::::::::::::::::: discussion

Discuss these issues as a group:

- Why are coding conventions important for collaborative research projects?
- How can we establish and enforce coding style guidelines that promote consistency and readability?

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- **Encourage collaboration:** There are **many ways to contribute** to a research software project, including bug reports, feature suggestions, design discussions, documentation, and software engineering.
158 changes: 136 additions & 22 deletions episodes/docstrings.md
@@ -27,7 +27,7 @@ If you’re publishing a research software package, one of the most common ways

We learned about _functions_ in an earlier module. Functions help us to break our code into smaller units that have a single purpose. By documenting those functions effectively, we aim to **explain their purpose** to future users and maintainers of that code. We also need to describe all the expected inputs and outputs of the function.

### Documentation strings
## Documentation strings

We describe functions by using a feature of many programming languages called documentation strings, usually abbreviated to **docstrings**. A documentation string is a piece of text that describes a piece of code and helps people to use it.
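Here is a small sketch in Python. The docstring is the string literal placed immediately after the `def` line. Python stores it on the function object, so it can be read back at run time. The `add` function here mirrors the lesson's example.

```python
import inspect


def add(x, y):
    """
    Calculate the sum of two numbers.
    """
    return x + y


# The raw docstring is stored on the function object as its __doc__ attribute
print(add.__doc__)

# inspect.getdoc() returns the docstring with its indentation and surrounding blank lines cleaned up
print(inspect.getdoc(add))
```

In an interactive Python session, calling `help(add)` displays the same text together with the function signature.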

@@ -49,7 +49,7 @@ def add(x, y):

### R

In R, we use a comment with a single quote `#'` to specify a documentation string for a function.
In R, we use the [roxygen2](https://roxygen2.r-lib.org/) package, in which a comment beginning with a hash and a single quote, `#'`, specifies a documentation string for a function.

```R
#' Calculate the sum of two numbers.
@@ -150,66 +150,122 @@ abs(x, /)

The most important thing to include in a docstring is an explanation of the purpose of this piece of code. To write a useful docstring, put yourself in the shoes of someone who encounters your code for the first time and needs a simple introduction that doesn’t assume any prior knowledge. The explanation may seem very basic and obvious to you, but it may help a new user greatly.

Next, we must describe the inputs and outputs of the function.
### Arguments

We list all the input parameters, or arguments, like so:
Next, we must describe the inputs to the function, its _arguments_.

We list all the arguments, or input parameters, as shown in the code examples below.
Each argument has a name and a brief description.

::: group-tab

### Python

The argument name matches the variable name in the function signature, such as `add(x, y)` in this case.

```python
def add(x, y):
    """
    Calculate the sum of two numbers.
    Args:
        x: The first number to add. (float or int)
        y: The second number to add. (float or int)
        x: The first number to add.
        y: The second number to add.
    """
    return x + y
```

### R

The argument name matches the variable name in the function signature, such as `function(x, y)` in this case.

```R
#' Calculate the sum of two numbers.
#'
#' @param x The first number to add.
#' @param y The second number to add.
add <- function(x, y) {
return(x + y)
}
```

:::

We have added an “arguments” (abbreviated to “args”) section to our docstring which lists the input parameters of the function and describes each one.

::::::::::::::::::::::::::::::::: challenge

Add a description of each argument to a function in your code.

:::::::::::::::::::::::::::::::::

### Return values

Finally, we describe the result of the function that is output by the return statement.

::: group-tab

### Python

```python
def add(x, y):
    """
    Calculate the sum of two numbers.
    Args:
        x: The first number to add. (float or int)
        y: The second number to add. (float or int)
        x: The first number to add.
        y: The second number to add.
    Returns:
        The sum of x and y. (float or int)
        The sum of x and y.
    """
    return x + y
```

### R

```R
#' Calculate the sum of two numbers.
#'
#' @param x The first number to add.
#' @param y The second number to add.
#' @return The sum of x and y.
add <- function(x, y) {
return(x + y)
}
```

:::

This will help the user to understand what the function does and what they can expect to receive back when they call it. It can also be useful to explain any potential errors or exceptions that the function will raise if the inputs aren’t as expected, and how to deal with them.
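One common way to document errors is a `Raises:` section. It belongs to the Google docstring style, which the `Args:` and `Returns:` headings above also follow. The sketch below reuses the `calculate_rectangle_area` name from later in this episode. The rule that negative sizes are rejected is an assumption made for illustration.

```python
def calculate_rectangle_area(width, height):
    """
    Calculate the area of a rectangle.

    Args:
        width: The width of the rectangle.
        height: The height of the rectangle.

    Returns:
        The area of the rectangle, equal to width multiplied by height.

    Raises:
        ValueError: If width or height is negative.
    """
    # Check the inputs so that the error described in the docstring is actually raised
    if width < 0 or height < 0:
        raise ValueError("width and height must not be negative")
    return width * height
```

A caller reading this docstring knows to expect a `ValueError` and can catch it with a `try`/`except ValueError` block.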

Help function in R and Python e.g.
::::::::::::::::::::::::::::::::: challenge

```python
help(print)
```
Describe the return value of a function in a documentation string.

:::::::::::::::::::::::::::::::::

## Usage examples
### Usage examples

We can also include demonstrations of how to use our code by providing code snippets. To do this, we write a collection of code samples that demonstrate how to use functions effectively in different scenarios.

Let's add an examples section to our documentation string. Each line of example code has a prefix of `>>>`, which represents the input prompt of the Python interpreter. Some code editors will provide syntax highlighting of these code snippets.

::: group-tab

### Python

```python
def add(x, y):
    """
    Calculate the sum of two numbers.
    Args:
        x: The first number to add. (float or int)
        y: The second number to add. (float or int)
        x: The first number to add.
        y: The second number to add.
    Returns:
        The sum of x and y. (float or int)
        The sum of x and y.
    Examples:
        >>> add(1, 1)
@@ -220,12 +276,49 @@ def add(x, y):
    return x + y
```

### R

For more information about writing R code examples within function documentation, please see the [Examples](https://r-pkgs.org/man.html#sec-man-examples) section in the book _R Packages_ by Hadley Wickham.

```R
#' Add two numbers.
#'
#' @param x The first number to add.
#' @param y The second number to add.
#' @return The sum of `x` and `y`.
#'
#' @examples
#' add(1, 1)
#' add(1.3, 5.3)
add <- function(x, y) {
return(x + y)
}
```

:::

::::::::::::::::::::::::::::::::: challenge

Write a brief code example in the documentation string of a function in your code.

:::::::::::::::::::::::::::::::::

:::: spoiler

### Using docstrings to create automatic tests

::: group-tab

### Python

These code examples can be used as automatic tests using the [doctest](https://docs.python.org/3/library/doctest.html) module which is built into Python.
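Here is a minimal sketch. The module-level `doctest.testmod()` function finds every `>>>` example in the docstrings of the current module and runs it. It then compares the result with the expected output written on the following line. The second example and its expected output are illustrative values, not taken from the lesson.

```python
import doctest


def add(x, y):
    """
    Calculate the sum of two numbers.

    Args:
        x: The first number to add.
        y: The second number to add.

    Returns:
        The sum of x and y.

    Examples:
        >>> add(1, 1)
        2
        >>> add(1.5, 2.5)
        4.0
    """
    return x + y


# Run every example found in this module's docstrings; only failures are reported
results = doctest.testmod()
print(results)
```

The same checks can be run from a terminal with `python -m doctest -v add.py`, where `add.py` is a hypothetical file containing the function. The `-v` flag lists each example as it runs.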

### R

In the R ecosystem, we can automatically test the examples in our documentation strings using the [doctest](https://cran.r-project.org/web/packages/doctest/vignettes/doctest.html) package.

:::

::::

## Best practices
@@ -248,15 +341,16 @@ A list of documentation string standards in Python:
- The [Google Style Guide](https://google.github.io/styleguide/pyguide.html#381-docstrings) sets out a docstring format.
- [Sphinx docstring format](https://www.sphinx-doc.org/en/master/), which has a [NumpyDoc extension](https://numpydoc.readthedocs.io/en/latest/format.html) designed for scientific use.

It doesn't _really_ matter which one you select, as long as it's used consistently across a project and it's clear what the syntax means. Some standards are better-supported by other tools such as <abbr title="Integrated development environments">IDEs</abbr> and documentation generators.
It doesn't matter which one you select, as long as it's used consistently across a project and it's clear what the syntax means. Some standards are better supported by tools such as <abbr title="Integrated development environments">IDEs</abbr> and documentation generators.
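For comparison, the sketch below documents the `add` function from earlier in the NumpyDoc format instead of the Google style. Section names are underlined with dashes, and each argument's type follows its name after a colon. The types shown are assumptions for illustration.

```python
def add(x, y):
    """
    Calculate the sum of two numbers.

    Parameters
    ----------
    x : float or int
        The first number to add.
    y : float or int
        The second number to add.

    Returns
    -------
    float or int
        The sum of x and y.
    """
    return x + y
```

Both layouts carry the same information about the arguments and the return value.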

::::

## Automatically generate docstrings

Generative AI services such as Google Gemini can write docstrings automatically.
Generative AI services such as [Google Gemini](https://gemini.google.com/) can read your code and write docstrings automatically, to a certain extent.

To do this, ask the system to create a docstring and copy your code into the prompt text box. Below is an example prompt and the reply generated by the Google Gemini algorithm:
To do this, ask the system to create a docstring and copy your code into the prompt text box.
Below is an example prompt and the reply generated by the Google Gemini algorithm:

```
Please generate a docstring for this Python function:
@@ -266,7 +360,7 @@ def calculate_rectangle_area(width, height):
    return area
```

The result is the follow docstring, in addition to some helpful descriptions of the content that it generated.
The result is the following docstring, in addition to some helpful descriptions of the content that it generated.

```python
def calculate_rectangle_area(width, height):
@@ -287,7 +381,27 @@ def calculate_rectangle_area(width, height):
    return area
```

This can save you a lot of time, but as with any AI-generated content, always check the output and ensure it's correct!
This <abbr title="artificial intelligence">AI</abbr>-generated content contains a function summary, argument descriptions, and explains the return value as we discussed previously.

::::::::::::::::::::::::::::::::::::: challenge

Try asking a generative AI service such as Google Gemini to read your code.

- Ask it to generate documentation of different kinds.
- Request a review of your code. What does the bot think?
- Can the chat-bot create a diagram to illustrate a concept that is relevant to your research software?

:::::::::::::::::::::::::::::::::::::

This can save you a lot of time, but as with any <abbr title="Large language model">LLM</abbr>-generated content, always check the output and ensure it's correct!

::::::::::::::::::::::::::::::::::::: discussion

What are the benefits and risks of using a large language model (LLM) service such as Google Gemini or OpenAI ChatGPT to interpret your code and produce content that you use in your research?

How should we critically evaluate this material so that it can be used appropriately to improve the productivity of our research teams without jeopardising our ethics or integrity or causing security risks?

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints
