GC old inventory listings #198

yarikoptic · 2024-11-15T19:56:01Z

As "discovered" in

Inventory-based backup tool dandi-utils#3

we might not really need historical inventory records to achieve a "full backup" of S3. The inventory dumps themselves are quite large! I am still fetching them (to facilitate analysis etc., but might stop doing that) and have fetched 14TB so far, which is a notable amount of storage. Here is how the size of a daily dump grew over the years:

(dandisets-2) dandi@drogon:/mnt/backup/dandi/dandiarchive-inventory$ code/print-manifest-summary dump/202*-01-01T*/manifest.json
dump/2020-01-01T00-00Z/manifest.json :   1 entries, 197K total size
dump/2021-01-01T00-00Z/manifest.json :   1 entries, 3.8M total size
dump/2022-01-01T00-00Z/manifest.json :   1 entries,  17M total size
dump/2023-01-01T01-00Z/manifest.json : 384 entries,  36G total size
dump/2024-01-01T01-00Z/manifest.json : 406 entries,  38G total size
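
For anyone wanting to reproduce numbers like these without the `code/print-manifest-summary` helper (whose implementation is not shown here), below is a minimal sketch. It assumes the standard S3 Inventory `manifest.json` layout, where a top-level `files` list holds one record per listing file with a `size` in bytes. The function names `summarize_manifest` and `human_size` are made up for illustration, and the output formatting will not necessarily match the helper's.

```python
import json
from pathlib import Path


def summarize_manifest(path):
    """Return (number of listing files, total bytes) for one manifest.json.

    Assumes the S3 Inventory manifest layout: a top-level "files" list
    whose records each carry a "size" in bytes.
    """
    manifest = json.loads(Path(path).read_text())
    files = manifest.get("files", [])
    return len(files), sum(f.get("size", 0) for f in files)


def human_size(n_bytes):
    """Format a byte count with binary-prefix units (B, K, M, G, T)."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit that fits, or at T as the largest unit.
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

Calling `summarize_manifest` on each `dump/*/manifest.json` and passing the byte total through `human_size` should give a per-day summary comparable to the listing above.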

This year it grew to 39G per day(!), which would amount to 14TB per year just for the dumps (so I then expect to fetch 40TB... maybe I should interrupt and fetch only specific days and their data).
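
The yearly estimate is simple arithmetic, sketched below for checking. It assumes a constant 39G per day, 365 days, and decimal units (1TB = 1000G). With binary units the figure comes out slightly lower, at about 13.9TiB.

```python
DAYS_PER_YEAR = 365


def yearly_tb(gb_per_day, days=DAYS_PER_YEAR):
    """Project yearly dump volume in TB from a constant daily size in GB."""
    return gb_per_day * days / 1000  # decimal units assumed


print(f"{yearly_tb(39):.1f} TB/year")  # prints "14.2 TB/year"
```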

Mostly this is due to all the zarrs. Still, we might want to prune some old inventory listings soonish. (attn @satra, with whom we briefly discussed doing some bucket GCing)
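
As a starting point for the discussion, here is a minimal sketch of how pruning candidates could be selected. It assumes dump directories are named like `2020-01-01T00-00Z`, as in the listing above, and it uses a purely hypothetical cutoff-date policy. No retention policy has been decided, and the names `parse_dump_time` and `prune_candidates` are illustrative only. It only selects names and deletes nothing.

```python
from datetime import datetime, timezone


def parse_dump_time(dirname):
    """Parse a dump directory name such as '2020-01-01T00-00Z' as UTC."""
    parsed = datetime.strptime(dirname, "%Y-%m-%dT%H-%MZ")
    return parsed.replace(tzinfo=timezone.utc)


def prune_candidates(dirnames, cutoff):
    """Return dump names strictly older than `cutoff` (hypothetical policy)."""
    return sorted(d for d in dirnames if parse_dump_time(d) < cutoff)


dumps = ["2020-01-01T00-00Z", "2023-01-01T01-00Z", "2024-01-01T01-00Z"]
cutoff = datetime(2023, 6, 1, tzinfo=timezone.utc)
print(prune_candidates(dumps, cutoff))
```

A real policy would more likely keep a sparse subset of old dumps (e.g. one per month) rather than drop everything before a date, but that choice is open.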
