Implement quota tracking options per ObjectStore. #10221

jmchilton · 2020-09-15T00:45:50Z

Builds on #10212.

Overview

#6552 implemented the ability for admins to assign job outputs to different object stores at runtime (this could take into account tool/workflow injected parameters or just be based on user, tool, destination, cluster state, etc.). But all the stored data would consume the same quota, regardless of the object store selected.

This pull request allows different object stores or different groups of object stores to have different quotas or no quota at all. This enables use cases such as sending job outputs to cheaper storage when a user's quota is getting near full, or allowing admins to set up tool and/or workflow parameters to send job outputs to higher quality, more redundant storage based on user selected options or user preferences.

This is a substantial step forward toward allowing scratch-space histories. While I suspect we want to implement some higher level convenience functions and interface around that (per history preferences, object store preferences types) - I think that would all be based on these abstractions - abstractions that allow even more flexibility for admins who require it.

Implementation

This adds the quota tag to XML/YAML object store declarations, which allows specifying a "quota source label" for each objectstore in a nested objectstore or disabling quota altogether on objectstores.

The following quota block would assign all this storage to a quota source labelled with s3.

        <backend id="dynamic_s3" type="disk" weight="0">
            <quota source="s3" />
            <files_dir path="${temp_directory}/files_dynamic_s3"/>

Whereas this would disable quota usage for this object store altogether.

        <backend id="temp_disk" type="disk" weight="0">
            <quota enabled="false" />
            <files_dir path="${temp_directory}/files_cloud_scratch"/>
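As a rough illustration of how the two quota declarations above could be interpreted, here is a minimal sketch using only the Python standard library. The function name `parse_quota` and the returned `(source, enabled)` tuple are hypothetical and are not taken from this PR's code. It assumes a backend with no quota tag stays enabled and falls back to the default (unlabelled) quota.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_quota(backend_xml: str) -> Tuple[Optional[str], bool]:
    """Return (quota_source_label, quota_enabled) for one <backend> element.

    Hypothetical helper for illustration only. A missing <quota> tag is
    treated as "enabled, default source" (label None).
    """
    backend = ET.fromstring(backend_xml)
    quota = backend.find("quota")
    if quota is None:
        return None, True
    enabled = quota.get("enabled", "true").lower() != "false"
    # A disabled store has no meaningful source label.
    source = quota.get("source") if enabled else None
    return source, enabled


labelled = """
<backend id="dynamic_s3" type="disk" weight="0">
    <quota source="s3" />
    <files_dir path="${temp_directory}/files_dynamic_s3"/>
</backend>
"""
disabled = """
<backend id="temp_disk" type="disk" weight="0">
    <quota enabled="false" />
    <files_dir path="${temp_directory}/files_cloud_scratch"/>
</backend>
"""

print(parse_quota(labelled))
print(parse_quota(disabled))
```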

In order to implement this a new table/model has been added to track a user's usage per quota source label - namely UserQuotaSourceUsage. Object stores that did not have a source label are still tracked using the User model's disk_usage attribute. I've updated all the scripts that recalculate user usage.
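The bookkeeping described above can be sketched in plain Python. This is not the actual model code: `Dataset`, `QuotaSettings`, and `usage_by_source` are hypothetical names, and the real implementation lives in database models (UserQuotaSourceUsage and User.disk_usage). The sketch only shows the grouping rule: labelled stores accumulate under their label, unlabelled stores under the default bucket (key `None` here), and stores with quota disabled are skipped.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Dataset:
    size: int              # bytes on disk
    object_store_id: str   # which object store holds the data


@dataclass
class QuotaSettings:
    source: Optional[str] = None  # None means the default, unlabelled quota
    enabled: bool = True


def usage_by_source(
    datasets: Iterable[Dataset],
    store_settings: Dict[str, QuotaSettings],
) -> Dict[Optional[str], int]:
    """Sum dataset sizes per quota source label, skipping quota-disabled stores."""
    totals: Dict[Optional[str], int] = defaultdict(int)
    for dataset in datasets:
        # Unknown stores are assumed to use the default quota.
        settings = store_settings.get(dataset.object_store_id, QuotaSettings())
        if not settings.enabled:
            continue
        totals[settings.source] += dataset.size
    return dict(totals)


settings = {
    "dynamic_s3": QuotaSettings(source="s3"),
    "temp_disk": QuotaSettings(enabled=False),
    "default_disk": QuotaSettings(),
}
datasets = [
    Dataset(100, "dynamic_s3"),
    Dataset(50, "dynamic_s3"),
    Dataset(999, "temp_disk"),
    Dataset(10, "default_disk"),
]
usage = usage_by_source(datasets, settings)
```

In this sketch the `None` key plays the role of the existing disk_usage attribute, while every other key would correspond to one row per user and quota source label.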

UI + API

The quota dialog adds the option to pick a quota source label from those defined on the object stores, though this option only appears if quota source labels are configured.

Likewise, by default the quota meter is unaffected but when multiple quota source labels are configured the meter becomes a link that shows the usage of each quota source.

A new API /api/users/<user_id|current>/usage enables this.
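For reference, a client could reach this endpoint roughly as follows. This is a sketch with assumptions: the helper names `usage_url` and `fetch_usage` are hypothetical, the API key is passed as the `key` query parameter, and the shape of the returned JSON is not specified here, so the sketch simply returns whatever the server sends.

```python
import json
import urllib.parse
import urllib.request


def usage_url(base_url: str, user_id: str = "current") -> str:
    """Build the usage endpoint URL for a user id (or the literal 'current')."""
    return f"{base_url.rstrip('/')}/api/users/{user_id}/usage"


def fetch_usage(base_url: str, api_key: str, user_id: str = "current"):
    """Fetch and decode the usage payload; the response structure is not assumed."""
    query = urllib.parse.urlencode({"key": api_key})
    with urllib.request.urlopen(f"{usage_url(base_url, user_id)}?{query}") as response:
        return json.load(response)
```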

Abstractions for #4840

While this PR adds significant complexity related to recalculating a User's quota - it does reduce the duplication, adds tests (made more useful by having fewer paths through the quota recalculation code), and brings object store information into the calculation. I think this is all stuff that would be needed for #4840 and is currently missing.

Part of this establishes a pattern for how to exclude certain datasets from usage calculation both when it is being added (included in #4840) and when recalculated (not included in #4840).

The API endpoints for disk usage across object stores and the UI entry point for displaying that information will hopefully both enable a more robust implementation of #4840.

jmchilton · 2020-11-20T00:40:03Z

In a long conversation with @natefoo and @mvdbeek we decided this needs to go a bit further, at least before being rolled out onto main.

Longer term we need to have the ability to copy from one objectstore to another asynchronously, but until that is ready there are certain copies that are effectively just changing the object_store_id on a dataset and those should be implemented - with quota recalculation, a UI, etc.
Histories need to be filterable by datasets in a given objectstore - so users can see data scheduled for deletion. This can just piggyback on existing UI filtering plumbing.

I'd also love a little summary of objectstore, usage, etc. within a history - perhaps using the disk-usage-per-dataset widget Dannon demo'd years ago (@dannon do you have a link to that sitting in a branch somewhere?) - but that might be something that should be an iteration 2 type of thing.

dannon · 2020-11-20T13:28:09Z

@jmchilton I'll see if I can dig it up -- I know I have it somewhere and it'd be great for that to see use somewhere.

jmchilton added kind/enhancement area/objectstore labels Sep 15, 2020

galaxybot added the status/WIP label Sep 15, 2020

jmchilton force-pushed the quota_per_objectstore branch 6 times, most recently from 06842e5 to 090e02d Compare September 16, 2020 13:54

jmchilton mentioned this pull request Sep 16, 2020

User-facing objectstore metadata. #10233

Merged

jmchilton force-pushed the quota_per_objectstore branch 2 times, most recently from fc219a0 to b42fda2 Compare September 25, 2020 00:26

jmchilton mentioned this pull request Sep 25, 2020

Quota meter text v-center off #10290

Closed

jmchilton force-pushed the quota_per_objectstore branch 6 times, most recently from aa4c391 to 3f20af9 Compare September 29, 2020 01:05

jmchilton changed the title ~~[WIP] Implement quota tracking options per ObjectStore.~~ Implement quota tracking options per ObjectStore. Sep 29, 2020

jmchilton changed the title ~~Implement quota tracking options per ObjectStore.~~ [WIP] Implement quota tracking options per ObjectStore. Sep 29, 2020

jmchilton force-pushed the quota_per_objectstore branch from 3f20af9 to 7bb1dea Compare September 29, 2020 21:36

jmchilton changed the title ~~[WIP] Implement quota tracking options per ObjectStore.~~ Implement quota tracking options per ObjectStore. Sep 30, 2020

jmchilton removed the status/WIP label Sep 30, 2020

jmchilton force-pushed the quota_per_objectstore branch from 7bb1dea to 7bcb285 Compare September 30, 2020 15:02

galaxybot added this to the 21.01 milestone Sep 30, 2020

jmchilton force-pushed the quota_per_objectstore branch 3 times, most recently from 4c59300 to 1a0c743 Compare October 9, 2020 13:57

jmchilton force-pushed the quota_per_objectstore branch from 1a0c743 to 810b787 Compare November 18, 2020 17:53

jmchilton force-pushed the quota_per_objectstore branch from 810b787 to 69c86ec Compare November 30, 2020 14:29

jmchilton mentioned this pull request Dec 1, 2020

[WIP] Implement abstractions to annotate non-sharable datasets & objectstores. #10840

Closed

get_quota in SQL

aca732a

jmchilton force-pushed the quota_per_objectstore branch from 69c86ec to cf3b6ca Compare December 15, 2020 15:46

Implement quota tracking options per ObjectStore.

feaa31d

jmchilton force-pushed the quota_per_objectstore branch from cf3b6ca to feaa31d Compare December 15, 2020 16:47

jmchilton mentioned this pull request Dec 15, 2020

User-based ObjectStore #4840

Closed

jmchilton closed this Dec 21, 2020

jmchilton mentioned this pull request Dec 21, 2020

[WIP] Implement quota tracking options per ObjectStore. #10977

Closed

This was referenced Jun 9, 2022

[WIP] Implement abstractions to annotate non-sharable datasets & objectstores. #14044

Closed

Implement quota tracking options per ObjectStore. #14047

Closed

Empower Users to Select Storage Destination #14073

Merged
