Added some corrections to discovery configuration and accompanying clarification
torstees committed Oct 5, 2023
1 parent 4abf7a1 commit 6b1e8c4
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions docs/the_setup.md
@@ -252,7 +252,7 @@ dataset:
   discovery:
     filename: data/discovery.csv
     code_harmonization: harmony/data-harmony.csv
-    key_columns: subject_id, sample_id
+    key_columns: subject_id, sample_id, chrom, pos
     data_dictionary:
       filename: data/discovery-dd.csv
       colnames:
@@ -262,6 +262,8 @@ If you look carefully, you'll notice one more property that hasn't been discusse
Subject is easy, since its key column happens to be the default, which we defined in the property *id_colname*. As a result, we don't need to provide a *key_columns* property for that table.
+It is worth noting that discovery has 4 *key_columns*. This is because its rows aren't unique even with subject_id and sample_id. There is a subject with 3 different variants at different chromosome/position in this file, so we had to add those to the key columns as well.
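The uniqueness problem described above can be checked before settling on a set of key columns. The following is a minimal sketch, not part of Whistler: the helper `duplicate_keys` and the sample rows are made up for illustration, and only the column names come from the configuration shown in this commit.

```python
import csv
import io
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return a dict of key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical data: subject S1 has three variants for the same sample.
SAMPLE_CSV = """subject_id,sample_id,chrom,pos
S1,A1,1,1000
S1,A1,2,2500
S1,A1,7,3100
S2,B1,1,1000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Two key columns collide for S1/A1; four key columns do not.
two_col = duplicate_keys(rows, ["subject_id", "sample_id"])
four_col = duplicate_keys(rows, ["subject_id", "sample_id", "chrom", "pos"])

print(two_col)
print(four_col)
```

An empty result for the chosen columns suggests they identify each row uniquely in that file.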
There is actually more that can be done with our dataset entries. For instance, if you want to merge one table inside another table's entries based on common key columns, you can have Whistler do that for you. Or, if you want to group entries of the same table together using a set of keys, Whistler can do that as well. These types of transformation can help simplify the whistle code and make compilation of resources faster. However, they are not expected to be necessary too often. Read more about configuring [dataset entries](https://nih-ncpi.github.io/ncpi-whistler/#/ref/project_config?id=the-dataset-list-dataset).
### Modular Configurations
@@ -277,13 +279,14 @@ This would be your normal, standard configuration, but let's say you are working
```yaml
active_tables:
   subject: true
-  family: false
+  family: true
   conditions: false
   sample: true
   sequencing: false
   discovery: false
   harmony: false
```
-For this, Whistler would only process the tables, *subject* and *sample*. The whistle code would never even see *family*, *conditions*, *sequencing* nor *discovery* and would not attempt to build any resources from those tables.
+For this, Whistler would only process the tables *family*, *subject* and *sample*. The whistle code would never even see *conditions*, *sequencing* or *discovery* and would not attempt to build any resources from those tables. Restricting the active tables this way lets you run tests more quickly to check that changes to your whistle code did what you want. Please note that in order for whistle to see *subject* with this configuration, you must also enable *family*, since the *subject* data is embedded inside the *family* table. Hiding *family* from Whistler also hides any tables embedded in it.
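The embedding rule described above can be modelled with a small sketch. It assumes a hypothetical `embedded_in` mapping from a table to the table that contains it. Neither that mapping nor the `visible_tables` helper is part of Whistler; they only illustrate the stated behaviour that hiding a parent table also hides its embedded tables.

```python
def visible_tables(active_tables, embedded_in):
    """List the tables the whistle code would see.

    A table is visible only if it is active and every table it is
    embedded in (directly or indirectly) is active as well.
    """
    def is_visible(name):
        seen = set()
        while name is not None:
            if name in seen:
                raise ValueError(f"cyclic embedding involving {name!r}")
            seen.add(name)
            if not active_tables.get(name, False):
                return False
            name = embedded_in.get(name)  # move up to the parent, if any
        return True

    return [name for name in active_tables if is_visible(name)]


# Values taken from the active_tables block in this commit.
active = {
    "subject": True, "family": True, "conditions": False, "sample": True,
    "sequencing": False, "discovery": False, "harmony": False,
}
# Assumption for illustration: subject rows are embedded in family.
embedded_in = {"subject": "family"}

with_family = visible_tables(active, embedded_in)
without_family = visible_tables({**active, "family": False}, embedded_in)

print(with_family)
print(without_family)
```

With *family* switched off, *subject* drops out of the result even though it is still set to true.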
For now, let's just assume we want Whistler to process all of the tables. But, rather than use the *ALL*, we'll simply set each individual table to true:
```yaml
@@ -399,7 +402,7 @@ dataset:
   discovery:
     filename: data/sequencing.csv
     code_harmonization: harmony/data-harmony.csv
-    key_columns: subject_id, sample_id
+    key_columns: subject_id, sample_id, chrom, pos
     data_dictionary:
       filename: data/discovery-dd.csv
       colnames: