-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pb_read
and pb_write
functions
#115
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this strategy (specifically the guess_read_function()
) always makes me nervous. While all abstractions are a bit leaky, I've always found attempts to write a generic "read" function as an abstract method particularly leaky. There's just too much going on in read_
methods to abstract away (what about other data serializations, like spatial formats? what about lazy reads / remote reads etc).
I'm fine with a convenience wrapper that makes common use patterns more concise. Maybe all that's needed is a bit more documentation saying something like 'for common formats such as ..., along with advice about how to bypass this convenience if a user prefers their own read function (say, from
readr` package, or for some other format).
I believe this will fail on spatial formats since it would not be one of csv, tsv, rds, parquet? (unfamiliar with how geoparquet works and whether arrow::read_parquet will process geoparquet, but I assume yes?).
Yep, this will read eagerly by design/default, and maybe it's a bad thing for folks who should be thinking about optimizing - however uninformed users would currently do pb_download anyways so it's not necessarily much different than that? I agree with improving the docs, e.g.
|
3edf4e8
to
e3c07b8
Compare
cd41835
to
6ff7768
Compare
0042f61
to
21a4737
Compare
Flow state hit me like a bus, many apologies for this PR running away from me. Since your last review (diff), I:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good!
Closes #97.
I thought briefly about making this a wrapper around pb_download_url + read function that accepts URLs, but I don't think it had the flexibility I wanted plus I ran into issues downloading from private repositories that I later learned was around not being able to pass an auth token to it.
I think this is the most flexible approach to the problem but would love to hear any thoughts