Assetstore Import Tracker / Repeater #197
Comments
@dgutman Did I miss anything in our desired feature list here? I recognize that you would like a cron-like task to repeat imports at some point. I think we need hash-matching for that to actually do what we want, and I think it is too risky to ever automate deleting missing items. If we ever cron imports, then we should probably cron checking for missing files and report that somewhere (next to the imports list, maybe?) so that the admin can decide what to do. Ages ago I was involved in a project where we automatically added and removed files from a database as they came and went on NAS-like devices. Devices with intermittent availability (for instance, across any network) made auto removal very risky. |
This is obviously complicated and potentially expensive in terms of walking
gigantic filesystems....
I think an option to "hide" images based on inaccessibility may be
reasonable. These images usually still have cached thumbnails, and
NFS and other remote assetstores can be disconnected for many
reasons, so we obviously don't want to just delete these links. In many
cases I still have metadata associated with an item that I may want to
retrieve, even if the image is not currently online.
Perhaps the first thing to do is clean up how the DSA responds when it tries
(and fails) to access an image. It currently throws errors and/or the
server becomes generally unhappy. Similarly, it would be good to use
some sort of badge/decorator to annotate images that appear to be
"disconnected". There are also likely two cases to differentiate: a single
file going missing may merit a "badge" on that image, since it may suggest
that one file was moved/deleted. If an entire directory goes
"dark", we may want to handle that separately.
Finally, we may want to have the option to "hide" missing images depending
on user class. I imagine in production it may be useful if the admin and/or
collection OWNER sees files that have gone MIA, but we may want to hide
those files from other classes of users.
--
David A Gutman, M.D. Ph.D.
Associate Professor of Neurology
Emory University School of Medicine
|
The import endpoint supports include/exclude regular expressions. We don't expose that in the UI (we probably should). |
It sounds like when we check for missing files, we would just add some chunk of metadata to the file (and possibly to its parent item) that we could remove again if the file comes back. Then showing missing ones could trivially be done by a virtual folder that matches on that metadata. Since the check for something being present/missing is likely to be stale when we actually try to access something, then any actions we take that expect that flag to be one way or another would have to check again. Throwing errors when a file is missing is outside the scope of this plugin (and probably differs in the Girder interface versus the HistomicsUI interface). Let's address what we want to do about that in a different issue. |
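To make the flag-and-recheck idea above concrete, here is a minimal sketch in plain Python. It is not Girder's API: the metadata key `importTrackerMissing` and both function names are hypothetical, and metadata is modeled as an ordinary dict.

```python
from typing import Any, Callable, Dict

MISSING_KEY = 'importTrackerMissing'  # hypothetical metadata key


def update_missing_flag(meta: Dict[str, Any], is_present: bool,
                        checked_at: str) -> Dict[str, Any]:
    """Return a copy of meta with the missing flag set or cleared."""
    updated = dict(meta)
    if is_present:
        # The file came back, so remove the flag again.
        updated.pop(MISSING_KEY, None)
    else:
        # Keep the time the file was first seen missing across checks.
        previous = meta.get(MISSING_KEY, {})
        updated[MISSING_KEY] = {
            'since': previous.get('since', checked_at),
            'lastChecked': checked_at,
        }
    return updated


def act_if_still_missing(meta: Dict[str, Any],
                         is_present_now: Callable[[], bool],
                         action: Callable[[], None]) -> bool:
    """Run action only if the flag is set and a fresh check agrees.

    The stored flag may be stale, so presence is checked again first.
    """
    if MISSING_KEY not in meta or is_present_now():
        return False
    action()
    return True
```

A virtual folder could then match on the presence of that key, while anything destructive would go through the recheck path.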
I don't know enough about the Girder implementation to know whether this is sufficiently relevant, but just in case it is ... rsync -a farway:MySource/ 2022-03-01/ --link-dest=2022-02-28/ --link-dest=2022-02-27/ --link-dest=2022-02-26/ Although N.B. the last I checked, which was about 10 years ago, there was a limit of, maybe, 20 |
@Leengit We aren't copying anything in this -- we are just indexing files that exist somewhere -- it could be a filesystem or an S3 bucket or a GridFS server, etc. "Import" is an indexing operation, not a copy operation. |
I've begun work on this here: https://github.com/DigitalSlideArchive/import-tracker |
@AlmightyYakob We should move the individual parts of this task to issues on https://github.com/DigitalSlideArchive/import-tracker. |
I've moved all the details from this issue to separate issues in https://github.com/DigitalSlideArchive/import-tracker, so I'm closing this issue. |
This is a summary of a long-desired feature. Once a repo is created for such a feature, any issues related to it should be moved there (e.g., #193).
We'd like to have a Girder plugin that records when any Import action is done on an assetstore. This would record all of the options: path, destination, etc. for arbitrary assetstore types (probably by hooking the import endpoint event), plus the time that the import started.
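As a rough illustration of what one recorded import might contain, here is a sketch in plain Python. The class, the function, and the field names are all hypothetical, and the in-memory list stands in for whatever model the plugin would actually use. How the recorder gets called (for example from an event bound to the import endpoint) is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class ImportRecord:
    """One import action; every field name here is hypothetical."""
    assetstore_id: str
    assetstore_type: str
    params: Dict[str, Any]  # path, destination, etc., stored as given
    started: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


def record_import(log: List[ImportRecord], assetstore_id: str,
                  assetstore_type: str,
                  params: Dict[str, Any]) -> ImportRecord:
    """Append a record to an in-memory log (a stand-in for a database)."""
    # Copy params so later changes by the caller do not alter the record.
    rec = ImportRecord(assetstore_id, assetstore_type, dict(params))
    log.append(rec)
    return rec
```

Storing the parameters verbatim, rather than interpreting them, is what would let the same record work for arbitrary assetstore types and be replayed exactly.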
We want to show a list of import actions, sorted most-recent first with appropriate details and a button to repeat the import exactly as done before. This list would be accessible from a button somewhere on the assetstore list page and would probably need to be paged. For repeated imports with exactly the same options and assetstore, maybe instead of showing each import as a separate line, it would show a "number of times" and the most recent time? In the list, we want to show sensible names, not just Girder IDs, for collections and folders.
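The "number of times" grouping could look something like the following sketch. It assumes each import is a dict with hypothetical keys `assetstore_id`, `params` (JSON-serializable), and `started` (any sortable timestamp); none of these names come from Girder.

```python
import json
from typing import Any, Dict, List, Tuple


def group_imports(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse identical imports into one row with a count and latest time."""
    groups: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for rec in records:
        # sort_keys makes the key independent of the order of the options.
        key = (rec['assetstore_id'],
               json.dumps(rec['params'], sort_keys=True))
        row = groups.get(key)
        if row is None:
            groups[key] = {
                'assetstore_id': rec['assetstore_id'],
                'params': rec['params'],
                'count': 1,
                'last_started': rec['started'],
            }
        else:
            row['count'] += 1
            row['last_started'] = max(row['last_started'], rec['started'])
    # Most recent first, as in the proposed list view.
    return sorted(groups.values(),
                  key=lambda r: r['last_started'], reverse=True)
```

In a real plugin this grouping would more likely be done in the database query so that paging still works.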
As a bonus, it would be great if when we went to an assetstore import page we showed the last few (10?) imports that were done for that assetstore, so that the user could redo them or see how they wanted to do something differently.
A further feature would be optionally modifying how repeated imports are done: currently if a file doesn't exist in the expected target directory, it is created. We frequently import a directory-tree of files, then organize them in Girder so they are not conceptually in the original directory-tree. Reimporting makes duplicates of all of these files. It would be great if there were an option in import to "skip if file already is in Girder somewhere" -- this can be done by matching the import path. If the file size has changed, we would update the existing file. The more sophisticated method would be to use the computed hash and match on that -- the file might have been renamed either on the assetstore OR in Girder, and, if the hash matches, it would be nice to not have a duplicate. This would be slower, as the hash has to be computed.
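The skip/update/create decision described above can be sketched as below, for local files only. The function names and the two lookup dicts are hypothetical stand-ins for database queries, and SHA-512 is used purely for illustration.

```python
import hashlib
from typing import Dict, Optional


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are not read into memory."""
    h = hashlib.sha512()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def import_action(path: str, size: int, by_path: Dict[str, int],
                  by_hash: Optional[Dict[str, str]] = None) -> str:
    """Return 'skip', 'update', or 'create' for one candidate file.

    by_path maps already-imported paths to their recorded sizes.
    by_hash, if given, maps known content hashes to existing import
    paths; passing it turns on the slower hash comparison.
    """
    if path in by_path:
        # Same path: skip unless the size changed.
        return 'skip' if by_path[path] == size else 'update'
    if by_hash is not None and sha512_of(path) in by_hash:
        # Same content under a different name: avoid a duplicate.
        return 'skip'
    return 'create'
```

The hash is only computed when the cheap path match fails, which limits the extra cost to files that look new.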
It would be nice to have a feature to flag any file in Girder that is no longer available on an assetstore. For filesystem assetstores, this would confirm the path is reachable. For S3 assetstores, this would have to confirm the asset is still in the bucket (so would probably be slow). If we did this, we would probably want to show a list of such files (or only such files on a specific assetstore, or only such files from a specific import path) and then have an option to delete associated Girder items (and probably prune empty Girder folders, too).
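For the filesystem case, the reachability check and the "only from a specific import path" filter might be sketched as follows. The file records are assumed to be dicts with a hypothetical `path` key, and both function names are illustrative; an S3 check would need a different mechanism.

```python
import os
from typing import Dict, Iterable, List


def find_missing(files: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return records whose 'path' is not currently a readable regular file."""
    return [f for f in files
            if not (os.path.isfile(f['path'])
                    and os.access(f['path'], os.R_OK))]


def under_import_path(files: Iterable[Dict[str, str]],
                      import_path: str) -> List[Dict[str, str]]:
    """Keep only records located below import_path."""
    # The trailing separator stops '/data' from matching '/database/...'.
    root = os.path.join(os.path.normpath(import_path), '')
    return [f for f in files
            if os.path.normpath(f['path']).startswith(root)]
```

A result from `find_missing` only says the file was unreachable at check time, so it would feed a report for the admin, not an automatic delete.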