
WebDAV view to DANDI Archive #164

Closed
wants to merge 13 commits into from

Conversation

jwodder
Member

@jwodder jwodder commented Dec 13, 2023

Closes #166.

To do:

@jwodder jwodder marked this pull request as ready for review December 14, 2023 16:03
@jwodder jwodder requested a review from yarikoptic December 14, 2023 16:04
@yarikoptic
Member

yarikoptic commented Dec 16, 2023

This is awesome even if there are no redirections yet (or are there??)!

Some initial observations:

  • browsing it all in chromium is great -- fast and responsive when moving among folders.
  • Tried FUSE mounts
    • TL;DR summary: no ideal winner yet -- behavior is quite different across them! One bug spotted. Not sure yet why rclone mount does not provide any content.
    • webdavfs - should have RANGE requests support, built from Go
      • very slow navigation (under mc), in part (according to logs) because my shell checks each folder for the presence of .hg, .bzr, .git, .nols -- and that check goes up the hierarchy! I think we could safely assume that no such files would be there. Generally, by default I think we should just ignore all dotfiles and 404 them, but have an option to enable support for them. Edit: .zarr folders have dotfiles we must show by default, so I think it might be better to explicitly blacklist some names and add an option to override that, e.g. make it --black-list-regex defaulting to '\.(bzr|git|nols|svn)$'
        • an attempt to dandi ls an nwb failed -- requests were like
          15:38:28.743 - DEBUG : Raising DAVError 404 Not Found: /dandisets/000004/draft/sub-P10HMH/__editable__.dandi-0.56.2+11.ge51981d0.finder.__path_hook__
          which is, I guess, specific to webdavfs
    • davfs2
      • seems to be not very actively maintained but kept afloat (last release in 2022)
      • listing of folders is slow -- it does a sequential PROPFIND on every subfolder
      • DANDI_DEVEL=1 dandi ls --use-fake-digest *nwb under /tmp/dandiarchive-fuse2/dandisets/000004/draft/sub-P27HMH worked out ok: takes around 28 sec, but subsequent runs take just 3 sec due to our fscacher cache (can disable with DANDI_CACHE=ignore and then get repeated ~28 sec runs)
    • rclone mount
      • fast listing of folders
      • eyeballed the following exception being raised at this code level: AttributeError: 'Version' object has no attribute 'timestamp' -- it happened only when I got to `rclone`, and it seems to prevent it from operating:
16:19:53.889 - ERROR   : Caught HTTPRequestException(HTTP_INTERNAL_ERROR)
Traceback (most recent call last):
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/error_printer.py", line 50, in __call__
    for v in app_iter:
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/request_resolver.py", line 224, in __call__
    for v in app_iter:
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/request_server.py", line 126, in __call__
    app_iter = provider.custom_request_handler(environ, start_response, method)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/dav_provider.py", line 1620, in custom_request_handler
    return default_handler(environ, start_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/request_server.py", line 361, in do_PROPFIND
    propList = child.get_properties("allprop")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/dav_provider.py", line 602, in get_properties
    name_list = self.get_property_names(is_allprop=mode == "allprop")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/dav_provider.py", line 545, in get_property_names
    if self.get_creation_date() is not None:
       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/dandidav.py", line 349, in get_creation_date
    return self.dandiset.version.timestamp()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Version' object has no attribute 'timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yoh/proj/dandi/dandi-infrastructure/webdav/venvs/dev3/lib/python3.11/site-packages/wsgidav/error_printer.py", line 83, in __call__
    raise as_DAVError(e)
wsgidav.dav_error.DAVError: 500

If I patch it with a crude fix such as

diff --git a/webdav/dandidav.py b/webdav/dandidav.py
index e7429af..2aa3fcd 100644
--- a/webdav/dandidav.py
+++ b/webdav/dandidav.py
@@ -346,7 +346,10 @@ class VersionResource(AssetFolder):
         return False
 
     def get_creation_date(self) -> float:
-        return self.dandiset.version.timestamp()
+        try:
+            return self.dandiset.version.timestamp()
+        except:
+            return None
 
     def get_last_modified(self) -> float:
         return self.dandiset.version.modified.timestamp()

then

  • davfs2 (takes a while to cd /tmp/dandiarchive-fuse2/dandisets/000004/draft/sub-P10HMH since it still does a PROPFIND on every dandiset under dandisets/... uff) -- dandi ls still works fine in 4 GETs with timing similar to before
  • rclone mount -- cannot give content: even cat of dandiset.yaml comes out empty. dandi ls seems to do a similar 4 GETs but then gives no metadata; logs say "Problem obtaining metadata for sub-P10HMH_ses-20060901_ecephys+image.nwb: Unable to synchronously open file (message not aligned)".
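The --black-list-regex idea floated above could be sketched as follows. This is a hypothetical helper, not code from this PR; only the default pattern comes from the comment, everything else is illustrative:

```python
import re

# Default pattern from the discussion above: VCS helper names that shells
# commonly probe for but which will never exist in the archive.
DEFAULT_BLACKLIST_REGEX = r"\.(bzr|git|nols|svn)$"

def is_blacklisted(path: str, pattern: str = DEFAULT_BLACKLIST_REGEX) -> bool:
    """Return True if any component of `path` matches the blacklist,
    in which case the WebDAV layer could answer 404 immediately instead
    of querying the Archive API.
    """
    rgx = re.compile(pattern)
    return any(rgx.search(part) for part in path.strip("/").split("/"))
```

With this default, shell probes for .git/.svn get rejected cheaply, while .zarr internals such as .zattrs are still served, matching the "show .zarr dotfiles by default" requirement.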

jwodder and others added 2 commits December 18, 2023 08:19
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
@jwodder
Member Author

jwodder commented Dec 18, 2023

@yarikoptic

This is awesome even if there are no redirections yet (or are there??)!

Redirections are not implemented, and I'd rather not have to implement them the suggested way.

webdavfs - should have RANGE requests support

wsgidav disables Range support by default, but I think I can enable it.

@jwodder
Member Author

jwodder commented Dec 18, 2023

@yarikoptic I think I've enabled Range support; try it now.

@yarikoptic
Member

@jwodder could you please assess the performance of this WebDAV implementation (even without redirects) with davfs2 and webdavfs (rclone seems just not cooked enough) and compare it against datalad-fuse (fsspec-based), for the purpose of our https://github.com/dandi/dandisets-healthstatus ?

@jwodder
Member Author

jwodder commented Jan 3, 2024

@yarikoptic Exactly what operation do you want me to run with dandisets-healthstatus? Do you want me to run the healthcheck on every asset or just specific assets? What mode? Etc.

@yarikoptic
Member

@yarikoptic Exactly what operation do you want me to run with dandisets-healthstatus?

a sample run of

  • pynwb_open_load_ns
  • matnwb_nwbRead
  • dandi ls (to load metadata)

while having DANDI_CACHE=ignore set, I guess, to avoid any possible caching side effects from fscacher.

I also wonder if any of them has functionality for local caching, but that could be investigated later.

Do you want me to run the healthcheck on every asset or just specific assets?

just a sample asset of some "typical" size (a few GBs), so there is no full download. Maybe it would be good to run it on something on which we currently have datalad-fuse timing out. E.g., looking at https://github.com/dandi/dandisets-healthstatus?tab=readme-ov-file#summary, it could be the 000016 sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb (although that is just 440M, so not really in GBs, but the next 2 dandisets have even smaller ones)

What mode? Etc.

What do you mean by mode here specifically?

The overall goal here is to see whether we could make dandisets-healthstatus more efficient by relying not on our custom datalad-fuse but rather on this WebDAV service and some "more developed/tested" WebDAV FUSE solution.

@jwodder
Member Author

jwodder commented Jan 5, 2024

@yarikoptic

  • What exactly should dandi ls be run on? Should it be run once per asset on each of the same assets that the tests are run on, or something else?
  • By "mode", I meant dandisets-healthstatus's check mode, but if the tests are only to be run on certain assets, it's irrelevant.

@jwodder
Member Author

jwodder commented Jan 5, 2024

@yarikoptic How exactly should the timing of tests be done? dandisets-healthstatus has no facilities for timing, and measuring the runtime of the dandisets-healthstatus command will pick up time spent installing MatNWB.

@jwodder
Member Author

jwodder commented Jan 5, 2024

@yarikoptic Are you expecting me to run these tests on drogon (which is where dandisets-healthstatus is normally run and where its clone of dandi/dandisets is)? Neither davfs2 nor webdavfs is installed on drogon, and I don't believe my account has permission to install software on drogon via apt (unless they should be installed via conda instead?).

@yarikoptic
Member

yarikoptic commented Jan 5, 2024

  • What exactly should dandi ls be run on?

on the same asset(s) on which you run any other benchmark, e.g. the suggested "random" 000016 asset sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb

FWIW, I see us needing:

  • a script (or a command within a script), e.g. run_benchmarks, which would be given 1. a top-level path, e.g. /tmp/dandisets-fuse/; 2. a list of assets (e.g. as 000016/sub-mouse1-fni16/sub-mouse1-fni16_ses-161228151100.nwb). The script would produce a record of timings for the benchmarks across those assets, and a summary for the "total"
  • a script (or a command within a script), e.g. run_benchmarks_across_fuses, which would be given the list of assets and then take care of
    • looping through possible FUSE solutions: datalad-fuse, webdav+webdavfs, webdav+davfs2
      • starting a FUSE mount, running run_benchmarks on that mount and those assets
    • reporting the overall winner and the winner per each of the tests (we might need per-file results later)
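The two-script layout sketched above could be orchestrated roughly as follows. Everything here is hypothetical scaffolding: the mount command and the unmount helper are modeled on the smaug instructions later in this thread, and the benchmark callables would wrap pynwb_open_load_ns, matnwb_nwbRead, and dandi ls:

```python
import contextlib
import subprocess
import time

# Hypothetical table of FUSE solutions to loop over; the real list from
# this thread is datalad-fuse, webdav+webdavfs, and webdav+davfs2.
FUSE_MOUNT_COMMANDS = {
    "webdavfs": ["/opt/webdavfs/webdavfs", "http://localhost:8080",
                 "/tmp/dandisets-fuse"],
}

@contextlib.contextmanager
def fuse_mount(cmd):
    """Start a FUSE mount process; force-unmount on exit
    (per the thread, unmounting needs the root helper)."""
    proc = subprocess.Popen(cmd)
    try:
        yield
    finally:
        proc.terminate()
        subprocess.run(["sudo", "/usr/local/sbin/unmount-tmp-fuse"])

def run_benchmarks(benchmarks, assets, root="/tmp/dandisets-fuse"):
    """Time each benchmark on each asset under the given mount root;
    returns {(benchmark_name, asset): seconds}."""
    timings = {}
    for name, func in benchmarks.items():
        for asset in assets:
            t0 = time.perf_counter()
            func(f"{root}/{asset}")
            timings[(name, asset)] = time.perf_counter() - t0
    return timings
```

run_benchmarks_across_fuses would then just be an outer loop: for each entry in FUSE_MOUNT_COMMANDS, enter fuse_mount(...) and call run_benchmarks, collecting the timing dicts for a final per-test and overall comparison.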

Should it be run once per asset on each of the same assets that the tests are run on, or something else?

dandi ls is just one of the benchmarks -- how long it takes. All benchmarks are to be run on the same list of assets.

@yarikoptic How exactly should the timing of tests be done? dandisets-healthstatus has no facilities for timing, and measuring the runtime of the dandisets-healthstatus command will pick up time spent installing MatNWB.

hm, good concern -- setup of the environment should not be part of the benchmarking time. I can only hypothesize on how to accomplish the drill here, e.g.:

  • add timing within the records healthstatus produces, and measure exactly around running the test
  • have the benchmarking code just call out to code in healthstatus to set up the env, but otherwise do its own "running" of benchmarks and its own timing. This might be best, since we might eventually want to change how benchmarks are run, e.g. run them multiple times and have "cold" and "warm" times, which is nothing we would want to do in healthstatus
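The cold/warm idea in the last bullet could be captured by something like this (illustrative only; how caches get dropped between cold runs, e.g. via DANDI_CACHE=ignore, is left out):

```python
import statistics
import time

def time_cold_warm(func, warm_repeats: int = 3) -> dict:
    """Time one 'cold' run and several 'warm' repeats of a benchmark
    callable, so cache effects (fscacher, kernel page cache) are visible
    in the results rather than silently skewing a single measurement."""
    t0 = time.perf_counter()
    func()
    cold = time.perf_counter() - t0
    warm = []
    for _ in range(warm_repeats):
        t0 = time.perf_counter()
        func()
        warm.append(time.perf_counter() - t0)
    return {"cold": cold, "warm_median": statistics.median(warm)}
```

Reporting the warm median rather than the mean keeps one slow outlier run from distorting the comparison between FUSE solutions.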

@yarikoptic Are you expecting me to run these tests on drogon (which is where dandisets-healthstatus is normally run and where its clone of dandi/dandisets is)? Neither davfs2 nor webdavfs is installed on drogon, and I don't believe I my account has permission to install software on drogon via apt (unless they should be installed via conda instead?).

eh, drogon is probably consistently inconsistent in its load for running benchmarks. Maybe let's run it on smaug, which should generally be less loaded.

  • I have built webdavfs under /opt/webdavfs/webdavfs. That one seems to be entirely, magically unpicky and runs just fine under your account:
(base) smaug:~$ mkdir /tmp/dandisets-fuse
(base) smaug:~$ /opt/webdavfs/webdavfs http://localhost:8080 /tmp/dandisets-fuse
http://localhost:8080: no PUT Range support, mounting read-only
^Z
[1]+  Stopped                 /opt/webdavfs/webdavfs http://localhost:8080 /tmp/dandisets-fuse
[148]
(base) smaug:~$ bg
[1]+ /opt/webdavfs/webdavfs http://localhost:8080 /tmp/dandisets-fuse &
(base) smaug:~$ ls -ld /tmp/dandisets-fuse/dandisets/000003
drwx------ 1 jwodder jwodder 0 Nov  6  2020 /tmp/dandisets-fuse/dandisets/000003

unmounting, though, seems tricky -- just stopping that process is not enough and demands running umount as root :-/ You can now run sudo /usr/local/sbin/unmount-tmp-fuse, which force-unmounts /tmp/dandisets-fuse (just use that path for your mounts)

  • ... I will figure out davfs2 later; it is fighting me... installed it system-wide and also a more recent build under /opt/davfs2/DESTDIR/usr/local/sbin/, but it still wants root to mount... maybe you can see how

@jwodder
Member Author

jwodder commented Jan 5, 2024

@yarikoptic So you want all the mounting & benchmarking to be done by a script. Is there a reason for this script to be standalone versus a subcommand added to dandisets-healthstatus?

@yarikoptic
Member

yes -- by a script, so we could easily redo it on some other assets, extend the list of FUSE systems to try, etc.

if you see that it is easier to implement within dandisets-healthstatus somehow -- I totally do not mind.

@jwodder
Member Author

jwodder commented Jan 8, 2024

@yarikoptic

if you see that it is easier to implement within dandisets-healthstatus somehow -- I totally do not mind.

Well, you already said above:

have the benchmarking code just call out to code in healthstatus to set up the env, but otherwise do its own "running" of benchmarks and its own timing

Seeing as the benchmarks include tests currently implemented in dandisets-healthstatus, it seems absurd to use it for just environment setup but not for running those tests; hence, I can see the following options:

  1. Implement the benchmarking as one or more subcommands added to dandisets-healthstatus

    • But what subcommands? Should there just be one subcommand that does all the benchmarking at once (creates the mounts, runs & times the tests)? Do we need (as you suggested above) a run_benchmarks command that just runs & times the tests? Should there be dedicated subcommands for mounting each of the three mount types and unmounting once the user hits Ctrl-C? Perhaps one subcommand that mounts a single mount type specified on the command line, runs & times the tests, and then unmounts?

    • If the benchmarking is to be implemented as part of dandisets-healthstatus, I request that you create a new issue in that repository for this.

  2. Implement the benchmarking as a separate script that has dandisets-healthstatus as a dependency

    • Since the benchmarking script will be separate from dandisets-healthstatus, this comes with the risk that any future change to the latter will break the former. One option to address this would be to include a Git commit hash in the benchmarking script's requirements specifier for dandisets-healthstatus, but then the benchmarking script won't get any benefits that may come from future updates to dandisets-healthstatus.

    • If we do this, where should the benchmarking script be saved? I assume you would like it committed to some repository, but which one? This one (dandi/dandi-infrastructure)?
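The commit-hash pinning mentioned in option 2 would amount to a direct-reference requirement in the benchmarking script's dependencies; the hash below is a placeholder, not an actual commit:

```
dandisets-healthstatus @ git+https://github.com/dandi/dandisets-healthstatus@<commit-hash>
```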

jwodder added a commit to dandi/dandi-webdav that referenced this pull request Jan 8, 2024
@jwodder
Member Author

jwodder commented Jan 8, 2024

Code moved to https://github.com/dandi/dandi-webdav.

Successfully merging this pull request may close these issues.

Expose dandiarchive as webdav service