GC old inventory listings #198

Open
yarikoptic opened this issue Nov 15, 2024 · 0 comments


As "discovered" in

we might not really need historical records of inventory to achieve a "full backup" of S3. Inventory dumps themselves are quite large! I am still fetching them (to facilitate analysis etc., though I might stop doing that) and have fetched 14TB so far, which is a notable amount of storage. Here is how the per-day dump size grew through the years:

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandiarchive-inventory$ code/print-manifest-summary dump/202*-01-01T*/manifest.json
dump/2020-01-01T00-00Z/manifest.json : 1 entries,   197K total size
dump/2021-01-01T00-00Z/manifest.json : 1 entries,    3.8M total size
dump/2022-01-01T00-00Z/manifest.json : 1 entries,      17M total size
dump/2023-01-01T01-00Z/manifest.json : 384 entries,         36G total size
dump/2024-01-01T01-00Z/manifest.json : 406 entries,         38G total size

and this year it grew to 39G per day(!), which would amount to 14TB per year just for the dumps (so I expect to end up fetching about 40TB... maybe I should interrupt and fetch only specific days and their data).
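As a quick check of the arithmetic above, here is a minimal sketch that converts a per-day dump size into a per-year total. It assumes the "G" in the summary output means decimal gigabytes and uses a 365-day year; the function name `yearly_tb` and the `DAILY_GB` table are made up for illustration.

```python
# Per-day dump sizes in GB, copied from the manifest summaries above;
# 39 is the current-year figure quoted in the text.
DAILY_GB = {"2023": 36, "2024": 38, "current": 39}


def yearly_tb(daily_gb: float, days: int = 365) -> float:
    """Return the storage for one year of daily dumps in TB (assuming 1 TB = 1000 GB)."""
    return daily_gb * days / 1000


for label, gb in DAILY_GB.items():
    print(f"{label}: {gb} GB/day -> {yearly_tb(gb):.1f} TB/year")
```

At 39 GB per day this gives roughly 14.2 TB per year, in line with the 14TB estimate.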

This is mostly due to all the zarrs. Either way, we might want to prune some old inventory listings soonish. (attn @satra, with whom we briefly discussed some bucket GCing to do)
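To make the pruning idea concrete, here is a rough sketch of how prunable dumps could be selected by name. The retention policy (keep everything on or after a cutoff date, and keep only first-of-month dumps before it) is purely an assumption for illustration, not something decided in this issue; the helper names are hypothetical, and the only thing taken from the listing above is the directory name format, e.g. `2023-01-01T01-00Z`.

```python
from datetime import date, datetime


def parse_dump_date(name: str) -> date:
    """Parse the date from a dump directory name such as '2023-01-01T01-00Z'."""
    return datetime.strptime(name, "%Y-%m-%dT%H-%MZ").date()


def select_prunable(names, keep_after: date):
    """Return dump names dated before keep_after that are not first-of-month dumps.

    Hypothetical policy: recent dumps are always kept, and older ones are
    thinned to one per month.
    """
    prunable = []
    for name in sorted(names):
        d = parse_dump_date(name)
        if d < keep_after and d.day != 1:
            prunable.append(name)
    return prunable
```

For example, `select_prunable(["2023-01-01T01-00Z", "2023-01-02T01-00Z", "2024-06-15T01-00Z"], date(2024, 1, 1))` selects only `2023-01-02T01-00Z`. The function only lists candidates; actual deletion from the bucket would be a separate, deliberate step.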
