Fix typos and normalise punctuation #99

Merged: 1 commit, Jun 20, 2022
10 changes: 5 additions & 5 deletions _episodes/01-introduction.md
@@ -11,7 +11,7 @@ objectives:
 - "Locate helpful resources to learn more about OpenRefine."
 keypoints:
 - "OpenRefine is a powerful, free, and open source tool that can be used for data cleaning."
-- "OpenRefine will automatically track any steps allowing you to backtrack as needed and providing a record of all work done"
+- "OpenRefine will automatically track any steps allowing you to backtrack as needed and providing a record of all work done."
 ---

# Lesson
@@ -46,10 +46,10 @@ If after installation and running OpenRefine, it does not automatically open for
## Getting help for OpenRefine


-You can find out a lot more about OpenRefine at [http://openrefine.org](http://openrefine.org) and check out some great introductory videos.
-These videos and others on OpenRefine can also be found on YouTube by searching under 'OpenRefine'. There is a [Google Group](https://groups.google.com/g/openrefine)
-that can answer a lot of beginner questions and problems. Information can also be found on [StackOverflow](https://stackoverflow.com/questions/tagged/openrefine)
-where you can find a lot of help. As with other programs of this type, OpenRefine libraries are available too, where you can find a script you need and copy it
+You can find out a lot more about OpenRefine at [http://openrefine.org](http://openrefine.org) and check out some great introductory videos.
+These videos and others on OpenRefine can also be found on YouTube by searching under 'OpenRefine'. There is a [Google Group](https://groups.google.com/g/openrefine)
+that can answer a lot of beginner questions and problems. Information can also be found on [StackOverflow](https://stackoverflow.com/questions/tagged/openrefine)
+where you can find a lot of help. As with other programs of this type, OpenRefine libraries are available too, where you can find a script you need and copy it
into your OpenRefine instance to run it on your dataset.


6 changes: 3 additions & 3 deletions _episodes/02-working-with-openrefine.md
@@ -127,7 +127,7 @@ In OpenRefine, clustering means "finding groups of different values that might b
3. Select the `key collision` method and `metaphone3` keying function. It should identify two clusters.
4. Click the `Merge?` box beside each cluster, then click `Merge Selected and Recluster` to apply the corrections to the dataset.
4. Try selecting different `Methods` and `Keying Functions` again, to see what new merges are suggested.
-5. You should find that using the default settings, no more clusters are found, for example to merge `Ruaca-Nhamuenda` with `Ruaca` or `Chirdozo` with `Chirodzo`. (Note that the `nearest neighbor` method with `ppm` distance, `radius` ≥ 4, and `block chars` ≤ 4 will find these clusters, as well as other settings with `levenshtein` distance)
+5. You should find that using the default settings, no more clusters are found, for example to merge `Ruaca-Nhamuenda` with `Ruaca` or `Chirdozo` with `Chirodzo`. (Note that the `nearest neighbor` method with `ppm` distance, `radius` ≥ 4, and `block chars` ≤ 4 will find these clusters, as well as other settings with `levenshtein` distance)
6. To merge these values we will hover over them in the village text facet, select edit, and manually change the names. Change `Chirdozo` to `Chirodzo` and `Ruaca-Nhamuenda` to `Ruaca`. You should now have four clusters: `Chirodzo`, `God`, `Ruaca` and `49`.
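
To illustrate why a key-collision pass can miss `Chirdozo` / `Chirodzo` while a distance-based method can catch it, here is a minimal Python sketch. It is not OpenRefine's implementation: `fingerprint_key` is a simplified, hypothetical stand-in for a fingerprint-style keying function, and `levenshtein` is a textbook edit-distance routine written for this example.

```python
import string


def fingerprint_key(value: str) -> str:
    """Simplified fingerprint-style key (an assumption, not OpenRefine's code):
    trim, lowercase, drop ASCII punctuation, then sort the unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Values that differ only in case or surrounding spaces share a key ...
same_key = fingerprint_key(" Ruaca ") == fingerprint_key("ruaca")
# ... but a transposed pair of letters produces two different keys,
# so a key-collision method of this kind cannot group them.
keys_differ = fingerprint_key("Chirdozo") != fingerprint_key("Chirodzo")
# A distance-based method sees the two spellings as close neighbours.
distance = levenshtein("Chirdozo", "Chirodzo")
```

Key collision only groups values whose keys are identical, whereas a nearest-neighbour method groups values whose distance falls under a chosen radius, which is why the settings mentioned in step 5 behave differently.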

Important: If you `Merge` using a different method or keying function, or more times than described in the instructions above,
@@ -206,7 +206,7 @@ You should now see a new text facet box in the left-hand pane.
> column.
{: .challenge}

-## Using undo and redo.
+## Using undo and redo

It's common while exploring and cleaning a dataset to discover after you've made a change that you really should have done something else first. OpenRefine provides `Undo` and `Redo` operations to make this easy.

@@ -221,7 +221,7 @@ It's common while exploring and cleaning a dataset to discover after you've made

## Trim Leading and Trailing Whitespace

-Words with spaces at the beginning or end are particularly hard for we humans to tell from strings without, but the blank characters will make a difference to the computer. We usually want to remove these. As of version 3.4 of OpenRefine, the option to trim leading and trailing whitespaces is present at the moment of importing the data (see image at the top of this page).
+Words with spaces at the beginning or end are particularly hard for we humans to tell from strings without, but the blank characters will make a difference to the computer. We usually want to remove these. As of version 3.4 of OpenRefine, the option to trim leading and trailing whitespaces is present at the moment of importing the data (see image at the top of this page).

If you unchecked that box when importing data, or if leading or trailing whitespaces were introduced while splitting columns, or other operations, OpenRefine also provides a tool to remove blank characters from the beginning and end of any entries that have them.

26 changes: 13 additions & 13 deletions _episodes/03-filter-sort.md
@@ -25,12 +25,12 @@ There are many entries in our data table. We can filter it to work on a subset o

> ## Exercise
>
-> 1. What roof types are selected by this procedure?
-> 2. How would you restrict this to only one of the roof types?
+> 1. What roof types are selected by this procedure?
+> 2. How would you restrict this to only one of the roof types?
>
> > ## Solution
> > 1. Do `Facet` > `Text facet` on the `respondent_roof_type` column after filtering. This will show that
-> > two names match your filter criteria. They are `mabatipitched` and `mabatisloping`.
+> > two names match your filter criteria. They are `mabatipitched` and `mabatisloping`.
> > 2. To restrict to only one of these two roof types, you could include more letters in your filter.
> >
> {: .solution}
@@ -74,10 +74,10 @@ If this is your first time sorting this table, then the drop-down menu for the s

> ## Exercise
>
-> Sort the data by `gps_Altitude`. Do you think the first few entries may have incorrect altitudes?.
+> Sort the data by `gps_Altitude`. Do you think the first few entries may have incorrect altitudes?
>
> > ## Solution
-> > In the `gps:Altitude` column, select `Sort...` > `numbers` and select `smallest first`. The first few values are all 0. The altitudes are more likely 'missing' than incorrect. The survey is delivered by Smartphone with the gps information added automatically by the app. The lack of an altitude value suggests that the smartphone was unable to provide it and it defaulted to 0.
+> > In the `gps_Altitude` column, select `Sort...` > `numbers` and select `smallest first`. The first few values are all 0. The altitudes are more likely 'missing' than incorrect. The survey is delivered by Smartphone with the gps information added automatically by the app. The lack of an altitude value suggests that the smartphone was unable to provide it and it defaulted to 0.
> {: .solution}
{: .challenge}

@@ -88,27 +88,27 @@ If you try to re-sort a column that you have already used, the drop-down menu ch
* `Sort` > `Reverse` - This option allows you to reverse the order of the sort.
* `Sort` > `Remove sort` - This option allows you to undo your sort.

-### Sorting by multiple columns.
+### Sorting by multiple columns

You can sort by multiple columns by performing sort on additional columns. The sort will depend on the order in which you select columns to sort. To restart the sorting process with a particular column, check the `sort by this column alone` box in the `Sort` pop-up menu.

If you go back to one of the already sorted columns and select > `Sort` > `Remove sort`, that column is removed from your multiple sort. If it is the only column sorted, then data reverts to its original order.
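
The idea that the result depends on the order in which sort columns are chosen can be sketched in Python. This is an illustration only: `sort_rows` is a hypothetical helper, and the coordinates in `rows` are invented for the example rather than taken from the dataset.

```python
from operator import itemgetter


def sort_rows(rows, columns):
    """Sort by the first column, breaking ties with each following column."""
    return sorted(rows, key=itemgetter(*columns))


# Made-up rows; only the column names mirror the lesson.
rows = [
    {"village": "God", "gps_Latitude": -19.12, "gps_Longitude": 33.49},
    {"village": "Chirodzo", "gps_Latitude": -19.12, "gps_Longitude": 33.48},
    {"village": "Ruaca", "gps_Latitude": -19.11, "gps_Longitude": 33.47},
]

# Latitude first, longitude as tie-breaker (smallest first).
by_lat_then_lon = sort_rows(rows, ["gps_Latitude", "gps_Longitude"])
# Same columns, chosen in the opposite order.
by_lon_then_lat = sort_rows(rows, ["gps_Longitude", "gps_Latitude"])

print([row["village"] for row in by_lat_then_lon])
print([row["village"] for row in by_lon_then_lat])
```

The second column only matters where the first column has ties, which is why the two orderings of the same columns can give different row orders.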

> ## Exercise
>
-> We discovered in an earlier lesson that the value for one of the `village` entries was given as 49. This is clearly wrong. By looking at the GPS coordinates for the entries of the other villages can we decide what village the data in that column was collected from?
-> 1. Sort on `gps_Latitude` as a number with the smallest first.
-> 2. Add a sort on `gps_Longitude` as a number with the smallest first.
-> 3. Using the drop down arrow on the `village` column, select `Edit column` > `Move column to end`. This will allow you to compare village names with GPS coordinates.
+> We discovered in an earlier lesson that the value for one of the `village` entries was given as 49. This is clearly wrong. By looking at the GPS coordinates for the entries of the other villages can we decide what village the data in that column was collected from?
+> 1. Sort on `gps_Latitude` as a number with the smallest first.
+> 2. Add a sort on `gps_Longitude` as a number with the smallest first.
+> 3. Using the drop down arrow on the `village` column, select `Edit column` > `Move column to end`. This will allow you to compare village names with GPS coordinates.
> 4. Scroll through the entries until you find village `49`. Can you tell from its GPS coordinates which village it belongs to?
> 5. Now sort only by `interview_date` as date. Move the `village` column to the start of the table. Does the row where village is `49` group with one particular village? Is it the same village as when comparing GPS coordinates?
>
> > ## Solution
> >
-> > The interview data for that row is in a small cluster of Chirodzo interviews when sorting by GPS coordinates. When sorting by interview date, it is also with Chirodzo interviews. In fact, only Chirodzo had interviews conducted on that date.
-> {: .solution}
+> > The interview data for that row is in a small cluster of Chirodzo interviews when sorting by GPS coordinates. When sorting by interview date, it is also with Chirodzo interviews. In fact, only Chirodzo had interviews conducted on that date.
+> {: .solution}
{: .challenge}

-Perform a text facet on the `village` column and change `49` to the village name that was determined in the previous exercise. You should now have only three village names.
+Perform a text facet on the `village` column and change `49` to the village name that was determined in the previous exercise. You should now have only three village names.

{% include links.md %}
2 changes: 1 addition & 1 deletion _episodes/04-numbers.md
@@ -24,7 +24,7 @@ To transform cells in the `years_farm` column to numbers, click the down arrow f

> ## Exercise
>
-> Transform three more columns, `no_members`, `years_liv`, and `buildings_in_compound`, from text to numbers. Can all columns be transformed to numbers? - Try it with `village` for example.
+> Transform three more columns, `no_membrs`, `years_liv`, and `buildings_in_compound`, from text to numbers. Can all columns be transformed to numbers? - Try it with `village` for example.
>
> > ## Solution
> >
2 changes: 1 addition & 1 deletion _episodes/07-resources.md
@@ -8,7 +8,7 @@ objectives:
 - "Understand that there are many online resources available for more information on OpenRefine."
 - "Identify other resources about OpenRefine."
 keypoints:
-- "Other examples and resources online are good for learning more about OpenRefine"
+- "Other examples and resources online are good for learning more about OpenRefine."
 ---

# Lesson