File names must meet all three criteria (from "How to name files" by Jennifer Bryan):
- Machine-readable
- Human-readable
- Sortable
dataset-name_time-granularity_grouping_year.extension
- Lower case underscore naming convention
- Descriptive file names
- An underscore separates the different descriptors
- Hyphens are used in place of spaces within descriptors
Descriptors
- Order in which the file should be run (00-10)
- Dataset Name, Source, Location
- Time granularity (hourly, daily, minutely, yearly, etc.)
- Grouping category, e.g. ‘by-age’, ‘by-cd’, ‘by-cd-age’
- Date or Year
Examples:
- acs_unemployment_by-cd_2018.csv
- 😢 N_per_day_age_pop.csv → 🥳 nys_doc-population_daily_by-age
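The naming rules above can be checked mechanically. The following is a minimal sketch, assuming descriptors contain only lowercase letters, digits, and hyphens; the helper names `build_file_name` and `is_valid_file_name` are hypothetical and not part of any existing tool.

```python
import re

# Assumption: a descriptor is lowercase letters/digits, with hyphens
# standing in for spaces (e.g. "by-cd", "doc-population", "2018").
DESCRIPTOR = r"[a-z0-9]+(?:-[a-z0-9]+)*"

# Descriptors are joined by underscores and followed by a lowercase extension.
FILE_NAME_RE = re.compile(rf"{DESCRIPTOR}(?:_{DESCRIPTOR})*\.[a-z0-9]+")


def build_file_name(descriptors, extension):
    """Join descriptors into a file name that follows the convention."""
    cleaned = []
    for descriptor in descriptors:
        text = str(descriptor).strip().lower()
        # Spaces or underscores inside one descriptor become hyphens,
        # so underscores only ever separate descriptors.
        cleaned.append(re.sub(r"[\s_]+", "-", text))
    return "_".join(cleaned) + "." + extension.lower().lstrip(".")


def is_valid_file_name(file_name):
    """Return True if the whole file name matches the convention."""
    return FILE_NAME_RE.fullmatch(file_name) is not None
```

For example, `build_file_name(["acs", "unemployment", "by cd", 2018], "csv")` produces `acs_unemployment_by-cd_2018.csv`, while `is_valid_file_name("N_per_day_age_pop.csv")` is false because of the upper-case letter.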
- No dots in variable names (not Python friendly)
- Lower case underscore naming convention
- Descriptive
- Comment first use of variable (if not completely obvious)
- Functions and prior scripts should be referenced at the top of every file.
- Each file should do one thing.
- Functions or complicated cleaning or scraping should be put into its own script.
- Often a .R or .py script
- There can be a file that calls or references previous files. (One file to run it all or to produce final outputs)
- Often a .Rmd, .ipynb, or bash file
- Avoid hard-coding values into your code.
- Keep code DRY (don't repeat yourself): turn repeated code into a function that takes the parts that change as parameters.
- Files that depend on other files need a numbered file name that shows the run order.
- Delineate sections of code
- Explain parameter choices of functions or hard-coded values
- Explain first use of variable (if not completely obvious)
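Several of the points above (no hard-coded values, DRY functions, delineated sections, commented parameters) can be shown together. This is only an illustrative sketch: the data, the `MIN_AGE` cutoff, and the `count_by_group` function are all made up for the example.

```python
# ---- Parameters ------------------------------------------------------
# Hypothetical cutoff: defined once here instead of hard-coded in the loop.
MIN_AGE = 18
# Columns to group by; each one reuses the same function below.
GROUP_KEYS = ["age_group", "county"]


# ---- Functions -------------------------------------------------------
def count_by_group(records, group_key, min_age=MIN_AGE):
    """Count records per value of group_key, keeping rows with age >= min_age."""
    counts = {}
    for record in records:
        if record["age"] < min_age:
            continue
        group = record[group_key]
        counts[group] = counts.get(group, 0) + 1
    return counts


# ---- Analysis --------------------------------------------------------
# Made-up sample rows, one dict per person.
records = [
    {"age": 17, "age_group": "under-18", "county": "kings"},
    {"age": 25, "age_group": "18-34", "county": "kings"},
    {"age": 40, "age_group": "35-54", "county": "queens"},
]

# One call per grouping instead of copy-pasting the counting loop.
counts_by_group = {key: count_by_group(records, key) for key in GROUP_KEYS}
```

Changing the cutoff or adding a grouping then means editing one line in the parameters section rather than hunting through repeated code.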
There are two types of repositories: project and collection.
For project repositories:
- Each data request is its own repo.
- The Readme, About, and Tags are filled out.
- No passwords are ever shown on a repo.
- If there is no private data, the repo is intended to become public.
- When cleaning up a repo, old, unused code/data/visuals can be placed into a folder called 'archived'.
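One common way to keep passwords out of a repo is to read them from environment variables at run time. The sketch below assumes that approach; the helper `get_secret` and the variable name `DB_PASSWORD` are hypothetical.

```python
import os


def get_secret(name, environ=os.environ):
    """Return a secret from the environment, failing loudly if it is missing."""
    value = environ.get(name)
    if not value:
        # The message names the variable but never prints a secret value.
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value
```

A script would then call `get_secret("DB_PASSWORD")` instead of containing the password itself, so nothing sensitive is committed.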