Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A convention for functions #24

Open
wfmackey opened this issue Apr 19, 2021 · 8 comments
Open

A convention for functions #24

wfmackey opened this issue Apr 19, 2021 · 8 comments
Labels
question Further information is requested

Comments

@wfmackey
Copy link
Collaborator

wfmackey commented Apr 19, 2021

See: #23 (comment) from @mrjoh3

I think we have to make a decision on a prefix for data retrieval functions from abscorr, and develop a convention for what is attached to a function, what is included in an argument, etc.

The prefix can some something simple and consistent, maybe getdata_.

The convention should include standard positions for <connecting structure>(eg ASGS SA2) and <data name> (eg SEIFA scores), and then additional data options, such as <data subclass> (eg SEIFA IRSAD), <year>, etc.

From that, some options are:

  1. everything as arguments: getdata(data = <data name>, structure = <connecting structure>, data_subclass = <data subclass>); so: getdata(data = "seifa", structure = "sa2", data_subclass = "irsad")
  2. data name in function name, ala purrr family: getdata_<data name>(structure = <connecting structure>, data_subclass = <data subclass>); so: getdata_seifa(structure = "sa2", data_subclass = "irsad")
  3. structure + data name in function name: getdata_<connecting structure>_<data name>_(data_subclass = <data subclass>); so: getdata_sa2_seifa(data_subclass = "irsad")
  4. and etc etc any combinations of the above.

Maybe one way to start is to have a getdata() master function (option 1), and then build helper functions (2, 3) on top of them (if desirable).

@mrjoh3
Copy link
Collaborator

mrjoh3 commented Apr 20, 2021

ha, I have built an entire internal package for accessing data that I called getdata. Great minds and all that.

what about just "get_"? from the options above I think I prefer number 2.

 getdata_seifa(structure = "sa2", data_subclass = "irsad")

it matches best at least in the seifa case

@wfmackey
Copy link
Collaborator Author

Yep, let's run with that for now. And I am fine with get_ as the prefix, eg get_seifa. We'll re-assess once a few datasets have been built in, see how it's working etc.

More generally in terms of data, down the line we should also think about:

  1. what data the package should contain. For example, restricting it to data that can be matched to an existing structure. I also want to make sure the abscorr package doesn't unnecessarily overlap with readabs in terms of time-series data retrieval. Down the line, we'll probably need some sort of inclusion criteria.
  2. what cleaning standards there should be for data included. Minimalist, consistent and well-documented cleaning will be important for broad use.

@daviddiviny
Copy link
Collaborator

Also, I would suggest:

  • clean_ for the functions that standardise and returns the official title.
  • return_ for a family of functions that return the code or the level etc for a title - need to think about this one a bit more.

Should all this be documented in a wiki? @wfmackey

@wfmackey
Copy link
Collaborator Author

wfmackey commented May 30, 2021

A wiki is a great idea, @daviddiviny ! It could be a section in the larger 'how to contribute' document #4.

@MattCowgill
Copy link
Collaborator

FYI @daviddiviny I've opened a PR to merge the strayr package with this package. The strayr function has been renamed clean_state() (though strayr() still works as a wrapper around clean_state()).

@wfmackey wfmackey added the question Further information is requested label Jun 1, 2021
@wfmackey
Copy link
Collaborator Author

wfmackey commented Jun 7, 2021

Thinking about this some more recently. The package contains 'families' of functions denoted by their prefix. So far we have:

  • clean_: tools for converting a vector to an existing structure
  • parse_: tools extracting usable data contained in a character vector commonly seen in Australian data (eg income buckets)
  • getdata_: returning commonly-used data stored in the package that can be matched to structures

and then we have structures -- eg anzsco -- stored as objects rather than functions, and functions that exist outside those families (ie is_holiday).

Some questions/thoughts are:

  • should structures sit inside functions with the prefix get_? eg get_anzsco() would return the anzsco tibble. This has the benefits of being consistent with other elements of the package, and allows options (eg get_anzsco(long = TRUE) could return a long version of anzsco, similar to what @daviddiviny has suggested in the past: Consistent naming convention for structures?  #17 (comment)).

  • should we think about an additional family of helper functions that perform common tasks, such as inflating dollar figures using CPI? These could have the prefix run_ or make_ or something (neither of those suggestions are good)? This could incorporate the parse_ family of functions. So we could have run_income_parser, run_cpi_inflator, run_holiday_id etc.

  • should we force functions into those groups or is that being a bit pedantic about the whole thing? I like families with common prefixes because it allows people to 'tab' through to see what's available (this is a Hadley tip, too), and it helps quickly summarise the usages of the package. It also neatly defines what particular functions should do for all contributors. For example, the clean_ functions should take a vector and match it to an existing structure.

@MattCowgill
Copy link
Collaborator

is_holiday could become parse_holiday I reckon @bryceroney

@bryceroney
Copy link
Contributor

I am open to changing it, not sure if putting it in the parse_ family is the best option though as those current functions are more about cleaning text rather than returning a boolean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants