Empower Users to Select Storage Destination #14073

Merged
merged 21 commits into from
Mar 3, 2023

Conversation

jmchilton
Member

@jmchilton jmchilton commented Jun 14, 2022

A long-time goal for hosted Galaxy options is bring-your-own storage. In my opinion, this paradigm doesn’t work well without abstractions at every level of Galaxy: the backend configs, database, object store, quota agent, API, and a front-end for displaying where data has been stored and selecting where new data will be stored. This pull request attempts to put these abstractions in place for future user-defined object stores in such a way that it already enables a large number of important new features, such as scratch histories, on the path forward.

The User Story

Once an admin has declared a nested object store to have selectable components (all backward compatible and described in detail below), the user will have new options available in the GUI. The options include selecting a default object store for the user account and selecting one at the history level. These options will not be displayed if the admin has not declared any nested object stores as “selectable”.

GUI

The user will be allowed to select a default preferred object store in the user preferences (User -> User Preferences -> Preferred Object Store).

Screen Shot 2022-06-14 at 3 39 49 PM

When this option is selected, a modal is displayed that lists the available object store selections. A visual language has been crafted to quickly and consistently summarize structured information annotated by the admin about the target object store using Font Awesome icon layers and Bootstrap Vue colors.

Screen Shot 2022-06-14 at 3 40 58 PM

On mouseover, this selection displays a lot more information, including the full Markdown description of the object store (added in #10233). This pull request overhauls that metadata to include information about the target quota (using variants of UI components integrated in #13113) and integrates the ideas and implementation of per-objectstore quotas (from #14047 / #10221 / #10977).

Screen Shot 2022-06-14 at 3 43 04 PM

The icons and quota allow users to quickly compare the object stores visually. In addition, much more information can be attached by Galaxy administrators via Markdown in Galaxy’s object store configuration (XML or YAML) that will be displayed on mouseover. The icon might show that this storage is faster or backed up and this other storage is not, but the mouseover allows admins to link out to institutional information, summarize hardware or policies, etc.

Screen Shot 2022-06-14 at 3 43 18 PM

(More information about the icons - where the information is coming from, what the icons mean, what the colors mean, and why… is provided in the next section of this PR.)

A preferred object store ID can be set at the user level, but it can also be set for individual histories and on a per tool execution basis (no UI currently for this latter option).
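To make the layering of user, history, and tool-execution preferences concrete, here is a minimal Python sketch. The function name and the precedence order (tool execution over history over user) are assumptions for illustration; the PR text only says a preference can be set at each of these levels.

```python
from typing import Optional


def resolve_preferred_object_store_id(
    user_preference: Optional[str] = None,
    history_preference: Optional[str] = None,
    tool_execution_preference: Optional[str] = None,
) -> Optional[str]:
    """Return the most specific preference that is set, or None.

    Assumed precedence (not confirmed by the PR text):
    tool execution > history > user. None means no preference was
    expressed and the object store's own default behavior applies.
    """
    for candidate in (tool_execution_preference, history_preference, user_preference):
        if candidate is not None:
            return candidate
    return None


# Only a user-level default is set.
print(resolve_preferred_object_store_id(user_preference="high_performance"))
# A history-level selection overrides the user-level default.
print(resolve_preferred_object_store_id("high_performance", history_preference="experimental"))
```

Returning None rather than a hard-coded store ID keeps the "no preference" case distinct from an explicit selection.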

When selectable object stores are configured, hovering over the data storage icon in the history shows the object store that will be used for that history, as well as whether it is set at the user or history level. Clicking the button allows the user to select a different target for that history.

Screen Shot 2022-06-14 at 3 43 36 PM

Screen Shot 2022-06-14 at 3 43 52 PM

In addition to setting this at the history level, it can be set at the tool level or at the workflow level. At the workflow level, different object stores can be set for tool outputs and intermediate datasets.

Here is the tool interface:

Screen Shot 2022-06-27 at 2 59 27 PM

The workflow interface is a bit rough but looks something like this:

Screen Shot 2022-06-27 at 3 14 53 PM

Screen Shot 2022-06-27 at 3 15 13 PM

Finally, overhauling the metadata and display of information about concrete object stores available for selection means much more information can also be displayed in the “Dataset Storage” section of the “Dataset Details” page of existing datasets - since these are using similar APIs and the same GUI components.

Screen Shot 2022-06-14 at 3 55 56 PM

Screen Shot 2022-06-14 at 3 56 08 PM

A Visual Language for Object Store Selection.

My thought processes and the current icons as well as documentation for how to set them can be found in this JS Fiddle.

https://jsfiddle.net/uw8jz2ry/5/

Hopefully the JS Fiddle empowers reviewers to fork and provide specific feedback on new badges, alterations to style, etc.

Creating a visual language around nested object store selections will help semi-technical users make decisions without getting into the nitty-gritty. Displaying additional information on mouseover will allow power users to dig into details and will allow institutional admins to link out to relevant hardware pages, SLAs, policy documents, etc.

The core of the visual language is a set of badges (i.e. icons) with specific meaning. These icons can be broken into two categories - ones determined by Galaxy from existing functional configuration options and ones specified by the Galaxy Administrator. All administrator specified badge options may include a markdown message that is displayed when the user digs into the relevant badge.

Rather than giving admins the ability to define arbitrary tags, icons, and meanings, keeping things structured means we can provide higher quality help, and in the future Galaxy can potentially provide even higher level selection options, wizard dialogs, or methods of dynamically determining which object store to use. For instance, users may be able to select "it is important that this analysis runs as fast as possible" vs "it is important that I can share and publish this analysis" and let Galaxy determine the appropriate object store. Currently though, a specific selection is still required.

Another aspect of the visual language is colors. Three different colors underpin the relevant badges. These colors correspond to user "Advantages" of the storage, user "Disadvantages" of the storage, and "Neutral" aspects of the storage. These are defined relative to the perspective of a researcher desiring resources on a shared infrastructure. Something like "quota enforced" is an advantage to the administrators or maintainers of the infrastructure but likely not perceived as an advantage to the researcher - so it is colored as a "disadvantage".
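The three-way color scheme can be sketched as a simple lookup. In the sketch below only the "quota enforced" classification is taken from the text above; the key `quota_enforced` is a hypothetical name, and the classification of every other badge is an assumption made for illustration (the badge names themselves are the ones used in the XML example in this PR).

```python
from enum import Enum


class BadgeKind(Enum):
    ADVANTAGE = "advantage"
    DISADVANTAGE = "disadvantage"
    NEUTRAL = "neutral"


BADGE_KINDS = {
    # Stated in the PR text: an advantage for admins, colored as a
    # disadvantage from the researcher's perspective. Key name is hypothetical.
    "quota_enforced": BadgeKind.DISADVANTAGE,
    # Everything below is an assumed classification, not taken from the PR.
    "faster": BadgeKind.ADVANTAGE,
    "more_stable": BadgeKind.ADVANTAGE,
    "more_secure": BadgeKind.ADVANTAGE,
    "backed_up": BadgeKind.ADVANTAGE,
    "slower": BadgeKind.DISADVANTAGE,
    "less_stable": BadgeKind.DISADVANTAGE,
    "less_secure": BadgeKind.DISADVANTAGE,
    "not_backed_up": BadgeKind.DISADVANTAGE,
    "short_term": BadgeKind.DISADVANTAGE,
}


def badge_kind(badge_type: str) -> BadgeKind:
    """Classify a badge; unknown badge types fall back to NEUTRAL (an assumption)."""
    return BADGE_KINDS.get(badge_type, BadgeKind.NEUTRAL)


print(badge_kind("quota_enforced").value)
```

Keeping the classification in one table means the front end and any future higher-level selection logic could share the same notion of advantage and disadvantage.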

The Icons (JS Fiddle Screenshots)

Screen Shot 2022-06-14 at 4 13 40 PM

Screen Shot 2022-06-14 at 4 13 54 PM

Screen Shot 2022-06-14 at 4 14 07 PM

Screen Shot 2022-06-14 at 4 14 18 PM

The API Story

The selectable object stores are available now via

/api/object_store?selectable=true

The information on a specific object store ID is available at

/api/object_store/<object_store_id>

The new APIs from per-objectstore quotas are included in this pull request:

/api/users/<user_id|current>/usage

and

/api/users/<user_id|current>/usage/<quota_source_label>
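As a rough illustration of how a client might address these endpoints, the following Python sketch only builds the URLs and makes no network requests. The helper names and the example base URL are hypothetical; the object store path uses the `/api/object_store` spelling from the `@router.get` decorator quoted in the review on this PR.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def object_store_index_url(base_url: str) -> str:
    """URL listing the object stores a user may select."""
    query = urlencode({"selectable": "true"})
    return f"{base_url.rstrip('/')}/api/object_store?{query}"


def object_store_url(base_url: str, object_store_id: str) -> str:
    """URL describing one concrete object store."""
    return f"{base_url.rstrip('/')}/api/object_store/{quote(object_store_id, safe='')}"


def usage_url(base_url: str, user_id: str = "current", quota_source_label: Optional[str] = None) -> str:
    """URL for a user's usage, optionally narrowed to one quota source label."""
    url = f"{base_url.rstrip('/')}/api/users/{quote(user_id, safe='')}/usage"
    if quota_source_label is not None:
        url += f"/{quote(quota_source_label, safe='')}"
    return url


base = "https://galaxy.example.org"  # hypothetical server
print(object_store_index_url(base))
print(object_store_url(base, "high_performance"))
print(usage_url(base, quota_source_label="s3"))
```

Appending to `base_url` rather than resolving an absolute path keeps any URL prefix the server is hosted under.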

The Admin Story

Per Object Store Quotas

(From #14047)

This pull request allows different object stores or different groups of object stores to have different quotas or no quota at all. This enables use cases such as sending jobs to cheaper storage when a user's quota is nearly full, or allowing admins to set up tool and/or workflow parameters that send job outputs to higher quality, more redundant storage based on user-selected options or user preferences.

This adds the quota tag to XML/YAML object store declarations, which allows specifying a "quota source label" for each objectstore in a nested objectstore or disabling quota altogether on objectstores.

The following quota block would assign all this storage to a quota source labelled with s3.

        <backend id="dynamic_s3" type="disk" weight="0">
            <quota source="s3" />
            <files_dir path="${temp_directory}/files_dynamic_s3"/>
        </backend>

Whereas this would disable quota usage for this object store altogether.

        <backend id="temp_disk" type="disk" weight="0">
            <quota enabled="false" />
            <files_dir path="${temp_directory}/files_cloud_scratch"/>
        </backend>
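To show how the two quota declarations above could be interpreted, here is a small Python sketch using the standard library XML parser. It is not Galaxy's parsing code: `QuotaConfig` and `parse_quota` are hypothetical names, and treating a missing `<quota>` element as "enabled with no source label" is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class QuotaConfig(NamedTuple):
    source: Optional[str]  # quota source label; None means the default quota
    enabled: bool


def parse_quota(backend: ET.Element) -> QuotaConfig:
    """Read the optional <quota> child of a <backend> element."""
    quota = backend.find("quota")
    if quota is None:
        # Assumption: no <quota> tag means usage counts against the default quota.
        return QuotaConfig(source=None, enabled=True)
    enabled = quota.get("enabled", "true").strip().lower() != "false"
    # A source label is meaningless when quota tracking is disabled.
    source = quota.get("source") if enabled else None
    return QuotaConfig(source=source, enabled=enabled)


s3 = ET.fromstring('<backend id="dynamic_s3" type="disk" weight="0"><quota source="s3" /></backend>')
scratch = ET.fromstring('<backend id="temp_disk" type="disk" weight="0"><quota enabled="false" /></backend>')
print(parse_quota(s3))
print(parse_quota(scratch))
```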

In order to implement this, a new table/model has been added to track a user's usage per quota source label - namely UserQuotaSourceUsage. Object stores that do not have a source label are still tracked using the User model's disk_usage attribute. I've updated all the scripts that recalculate user usage.
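The split between the default usage counter and per-label usage can be sketched in plain Python as follows. This is an in-memory illustration only, not the SQLAlchemy model added by the PR; the class `UserUsage` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserUsage:
    # Stands in for the User model's disk_usage attribute (default quota source).
    disk_usage: int = 0
    # Stands in for per-label rows such as those tracked by UserQuotaSourceUsage.
    usage_by_source: Dict[str, int] = field(default_factory=dict)

    def adjust(self, amount: int, quota_source_label: Optional[str] = None) -> None:
        """Add (or subtract, if negative) bytes for the given quota source."""
        if quota_source_label is None:
            self.disk_usage += amount
        else:
            current = self.usage_by_source.get(quota_source_label, 0)
            self.usage_by_source[quota_source_label] = current + amount

    def usage_for(self, quota_source_label: Optional[str] = None) -> int:
        """Return the bytes used under the given quota source."""
        if quota_source_label is None:
            return self.disk_usage
        return self.usage_by_source.get(quota_source_label, 0)


usage = UserUsage()
usage.adjust(1024)            # dataset in a store without a source label
usage.adjust(2048, "s3")      # dataset in a store labelled "s3"
print(usage.usage_for(), usage.usage_for("s3"))
```

Datasets in stores with quota disabled would simply never call `adjust` in this sketch.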

Private Object Stores

(From #14044)

  • Allow marking objectstores (in either XML or YAML) as private - indicating datasets stored in them should not be shared.
  • Add sharable property to model.Dataset that checks its object_store_id against the configured object store to determine if it is not stored in a private objectstore.
  • Add abstraction to security_agent to check if a dataset is restricted to a single user and augment galaxy.jobs.JobWrapper._set_object_store_ids and ObjectStorePopulator to prevent jobs that might create non-private datasets in private objectstores.
  • Model/security layer prevents copying non-sharable datasets into libraries or attaching private sharing roles to them.
  • The edit metadata form will display a message that the dataset is unsharable on the permissions page.
  • Integration test case to ensure public datasets cannot be uploaded to a private objectstore.
  • Integration test case to ensure access permissions of datasets stored in private objectstores cannot be modified.
  • Expose information about whether objectstores are private in the ObjectStore metadata display (User-facing objectstore metadata. #10233).
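The sharability check described in the list above might look roughly like this in isolation. The names `DatasetRecord`, `is_sharable`, and `ensure_sharable` are hypothetical stand-ins rather than Galaxy's actual model or security agent, and treating a dataset with no `object_store_id` as sharable is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class DatasetRecord:
    # Hypothetical stand-in for model.Dataset; only the field needed here.
    object_store_id: Optional[str]


def is_sharable(dataset: DatasetRecord, private_store_ids: Set[str]) -> bool:
    """A dataset is sharable unless it lives in a store marked private."""
    return dataset.object_store_id not in private_store_ids


def ensure_sharable(dataset: DatasetRecord, private_store_ids: Set[str]) -> None:
    """Raise before an operation such as copying into a library."""
    if not is_sharable(dataset, private_store_ids):
        raise PermissionError(
            f"Dataset is stored in private object store {dataset.object_store_id!r} and cannot be shared."
        )


private_ids = {"experimental", "surfs"}  # IDs marked private="true" in the example config
print(is_sharable(DatasetRecord("high_performance"), private_ids))
print(is_sharable(DatasetRecord("experimental"), private_ids))
```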

Badges and Pulling it All Together

Quota and private object store information is exposed as badges.

The following example is used for the screenshots above and will likely power future end-to-end testing. It demonstrates badge usage and how to attach Markdown descriptions to specific badges.

<?xml version="1.0"?>
<!--
    Huge chunks of text were stolen wholesale from MSI's data storage website
    (https://www.msi.umn.edu/content/data-storage). I've made large changes and adapted
    this for demonstration purposes - none of the text or policies or guarantees reflect
    actual current MSI or UMN policies.
-->
<object_store type="distributed">
    <backends>
        <backend id="high_performance" allow_selection="true" type="disk" weight="1" name="High Performance Storage">
            <description>All MSI researchers have access to a high-performance, high capacity primary storage platform. This system currently provides 3.5 PB (petabytes) of storage. The integrity of the data is protected by daily snapshots and tape backups. It has sustained read and write speeds of up to 25 GB/sec.

There is default access to this storage by any MSI group with an active account. Very large needs can be also met, but need to be approved by the MSI HPC Allocation Committee. More details are available on the [Storage Allocations](https://www.msi.umn.edu/content/storage-allocations) page.

More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
</description>
            <files_dir path="/Users/jxc755/workspace/galaxy/database/objects/deafult"/>
            <badges>
                <faster />
                <more_stable />
                <backed_up>Backed up to MSI's long term tape drive nightly. More information about our tape drive can be found on our [Archive Tier Storage](https://www.msi.umn.edu/content/archive-tier-storage) page.</backed_up>
            </badges>
        </backend>
        <backend id="second" allow_selection="true" type="disk" weight="0" name="Second Tier Storage">
            <quota source="second_tier" />
            <description>MSI first added a Ceph object storage system in November 2014 as a second tier storage option. The system currently has around 10 PB of usable storage installed.

MSI's second tier storage is designed to address the growing need for resources that support data-intensive research. It is tightly integrated with other MSI storage and computing resources in order to support a wide variety of research data life cycles and data analysis workflows. In addition, this object storage platform offers new access modes, such as Amazon’s S3 (Simple Storage Service) interface, so that researchers can better manage their data and more seamlessly share data with other researchers whether or not the other researcher has an MSI account or is at the University of Minnesota.

More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
</description>
            <files_dir path="/Users/jxc755/workspace/galaxy/database/objects/temp"/>
            <badges>
                <faster />
                <less_stable />
                <not_backed_up />
                <less_secure>MSI's enterprise level data security policies and monitoring have not yet been integrated with Ceph storage.</less_secure>
                <short_term>The data stored here is purged after a month.</short_term>
            </badges>
        </backend>
        <backend id="experimental" allow_selection="true" type="disk" weight="0" name="Experimental Scratch" private="true">
            <quota enabled="false" />
            <description>MSI Ceph storage that is purged more aggressively (weekly instead of monthly) and so it is only appropriate for short term methods development and such. The rapid deletion of stored data enables us to provide this storage without a quota.

More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).
            </description>
            <files_dir path="/Users/jxc755/workspace/galaxy/database/objects/temp"/>
            <badges>
                <faster />
                <less_stable />
                <not_backed_up />
                <less_secure>MSI's enterprise level data security policies and monitoring have not yet been integrated with Ceph storage.</less_secure>
                <short_term>The data stored here is purged after a week.</short_term>
            </badges>
        </backend>
        <backend id="surfs" allow_selection="true" type="disk" weight="0" name="SURFS" private="true">
            <quota source="umn_surfs" />
            <description>Much of the data analysis conducted on MSI’s high-performance computing resources uses data gathered from UMN shared research facilities (SRFs). In recognition of the need for short to medium term storage for this data, MSI provides a service, Shared User Research Facilities Storage (SURFS), enabling SRFs to deliver data directly to MSI users. By providing a designated location for this data, MSI can focus data backup and other processes to these key datasets.  As part of this service, MSI will provide the storage of the data for one year from its delivery date.

It's expected that the consumers of these data sets will be responsible for discerning which data they may wish to keep past the 1-year term, and finding an appropriate place to keep it. There are several possible storage options both at MSI and the wider university. You can explore your options using OIT’s digital [storage options chooser tool](https://it.umn.edu/services-technologies/comparisons/select-digital-storage-options).

More information about MSI Storage can be found [here](https://www.msi.umn.edu/content/data-storage).</description>
            <badges>
                <slower />
                <more_secure>University of Minnesota data security analysts have authorized this storage for the storage of human data.</more_secure>
                <more_stable />
                <backed_up />
            </badges>
        </backend>
    </backends>
</object_store>
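To illustrate how a configuration like this could be reduced to the information a selection UI needs, here is a Python sketch that reads a trimmed-down copy of the example. It is an assumption-laden illustration rather than Galaxy's config loader: the `internal` backend is hypothetical (added to show a non-selectable store), the badge messages are abbreviated, and the output dictionary layout is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

SAMPLE_CONFIG = """
<object_store type="distributed">
    <backends>
        <backend id="high_performance" allow_selection="true" type="disk" weight="1" name="High Performance Storage">
            <badges>
                <faster />
                <backed_up>Backed up nightly.</backed_up>
            </badges>
        </backend>
        <backend id="experimental" allow_selection="true" type="disk" weight="0" name="Experimental Scratch" private="true">
            <quota enabled="false" />
            <badges>
                <short_term>The data stored here is purged after a week.</short_term>
            </badges>
        </backend>
        <backend id="internal" type="disk" weight="0" />
    </backends>
</object_store>
"""


def selectable_backends(xml_text: str) -> List[Dict[str, Any]]:
    """Summarize backends declared with allow_selection="true"."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for backend in root.iter("backend"):
        if backend.get("allow_selection", "false").lower() != "true":
            continue
        badges = [
            # Badge text, when present, is the Markdown message for that badge.
            {"type": badge.tag, "message": (badge.text or "").strip() or None}
            for badge in backend.findall("./badges/*")
        ]
        result.append(
            {
                "id": backend.get("id"),
                "name": backend.get("name"),
                "private": backend.get("private", "false").lower() == "true",
                "badges": badges,
            }
        )
    return result


for store in selectable_backends(SAMPLE_CONFIG):
    print(store["id"], store["private"], [b["type"] for b in store["badges"]])
```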

In Comparison to Other Approaches

My concern about most alternative approaches that have been proposed or attempted is that they over-fit Galaxy to use cases that are too specific - and make more general approaches as outlined above more difficult.

For instance, Vahid’s user based object store #4840 approach worked only with the global quota in a very specific way and would have prevented user-based quota decisions, multiple quotas on different existing object stores, etc…

Nate has proposed simply adding a field to Dataset that dispatches between a tracked quota and untracked quota that gets deleted routinely. A UI for this could be rapidly prototyped and it would implement scratch histories very quickly for main, but it makes very rich solutions such as this more difficult - whereas this solution can readily be adapted with two lines of object store change to implement 85% of that functionality. Clearly, in the use case described above, multiple storage sources might be thought of as “temporary” or “scratch”. I invented certain aspects of the use case, but I think this illustrates the issues with simple binaries when it comes to storage and Galaxy.

Thinking about how much a solution is “over fitted” to this use case is a criterion we should apply when evaluating this approach as well. I think making users pick specific object stores requires perhaps too much work in complex scenarios, but I think I’ve argued how the structured information about badges above could readily enable higher level decisions - either on the front end or on the backend.

We may want to restrict certain options to certain groups of people - this feels like it would be easy to integrate with some added markup for groups, etc… on the object store selection or even more so by allowing dynamic selection ID criteria to be enabled.

I also designed this with an eye toward per-user object stores and I think it is a good set of abstractions to extend in that direction, but I will need to work through an implementation in order to prove it.

On the other end of the spectrum, essentially all of this is possible functionally with the inclusion of (#6552) years ago. A dynamic job destination can read user preferences, job resource parameters, etc. and pick an object store based on that. Additionally, the job destination can read tags, perhaps on a history, and dispatch based on that (as we have done for training days). If anything, this pull request is an approach on top of that which makes specific decisions that may be over fitted in some complex situations. Galaxy admins can fall back to this older mechanism in those cases, but I hope I’ve articulated why this mechanism works for a wide variety of existing use cases we wish to target.

TODO:

I'd really like to push all of this to a subsequent PR - it is fiddly stuff compared to the huge shifts in abstractions and such.

  • Test Case: Ensure a private objectstore isn’t selected for a public history.
  • Test Case: Ensure a private objectstore isn’t selected as the user default if the user default for new histories is public.
  • Pydantic for object store API. (1 hour)
  • FastAPI for new user APIs (1 hour)
  • Pydantic for new user APIs (1 hour)
  • Framework for ObjectStore config validation/linting… (4 hours)
    • Do a UI for it also?
  • Fine tuning “preferred” vs ... “selection” nomenclature in the UI
    • Automated default based on object store and job configuration stuff.
    • Allow admin to configure this if dynamic job runners are used.
  • Redo target on HistorySelect component…


@jmchilton jmchilton changed the title [WIP] Empower User Object Store Selection [WIP] Empower Users to Select Object Store Jun 14, 2022
@jmchilton jmchilton changed the title [WIP] Empower Users to Select Object Store [WIP] Empower Users to Storage Destination Jun 14, 2022
@jmchilton jmchilton changed the title [WIP] Empower Users to Storage Destination [WIP] Empower Users to Select Storage Destination Jun 14, 2022
@jmchilton jmchilton force-pushed the object_store_ui branch 8 times, most recently from 71ed0e0 to 69507d0 Compare June 18, 2022 23:18
@jmchilton jmchilton force-pushed the object_store_ui branch 2 times, most recently from 6aa086f to 6a6eb8c Compare June 20, 2022 16:37
@jmchilton jmchilton force-pushed the object_store_ui branch 4 times, most recently from 5aafbd1 to 6720d32 Compare June 28, 2022 20:09
@jmchilton
Member Author

Maybe it was porting the changes from sql-migrate to alembic but I was able to recreate the migration issue and I believe it should be fixed with d1a2eab.

@jmchilton jmchilton marked this pull request as ready for review February 21, 2023 20:43
object_store: BaseObjectStore = depends(BaseObjectStore)

@router.get(
    "/api/object_store",
Member

Suggested change
- "/api/object_store",
+ "/api/object_stores",

it seems like most (every?) collection in our API is in plural. I think that's a good convention. Is there a reason you're deviating from this ?

) -> List[Dict[str, Any]]:
    if not selectable:
        raise RequestParameterInvalidException(
            "The object store index query currently needs to be called with selectable=true"
Member

What's the reason for that? Seems a little odd. If there's a good reason, SelectableQueryParam should be marked as mandatory and be the only valid value.

Member Author

This is a narrow view of the object stores: it just returns the concrete object stores available for selection and what is needed for selection (pydantic models are implemented in the follow-up already, but namely description, badges, name, etc.). I don't want to think through what a nested store should return - we don't have an application requiring that currently. By having this parameter and making it clear we expect returns to be for selection, we are keeping this open for other applications that might require a more expansive view of the object stores.


@router.get(
    "/api/object_store/{object_store_id}",
    summary="Return boolean to indicate if Galaxy's default object store allows selection.",
Member

It seems like this should work for all object stores, and it also doesn't return a boolean.

Member Author

Fixed in the follow-up PR already, I think.

concrete_object_store = self.object_store.get_concrete_store_by_object_store_id(object_store_id)
if concrete_object_store is None:
    raise ObjectNotFound()
as_dict = concrete_object_store.to_dict()
Member

Probably quite important to have a pydantic model for this, so we're not accidentally leaking sensitive data from an object store. Also should this be filtered down by the user, so users can't list object stores they have no business knowing about ?

Member Author

@jmchilton jmchilton Mar 3, 2023

Yes - absolutely and this is already done in the follow up PR.

source: Optional[str] = Field(
    description="The quota source label corresponding to the object store the dataset is stored in (or would be stored in)"
)
enabled: bool = Field(
Member

Does the description match the enabled variable ? Given the description should this maybe be tracks_quota ?

subworkflow_output = subworkflow.workflow_output_for(step_output.output_name)
if subworkflow_output is not None:
    output_dict = EffectiveOutput(
        output_name=subworkflow_output.output_name, step_id=subworkflow_output.workflow_step_id
Member

the output_name and step_id don't necessarily make this unique if the same subworkflow is embedded more than once, is that going to be a problem ?

Member Author

I'll create an issue to write a test - I'm not sure off the top of my head.

    return self._call_method("_get_concrete_store_badges", obj, [], False)

def _is_private(self, obj):
    return self._call_method("_is_private", obj, ObjectNotFound, True)
Member

@mvdbeek mvdbeek Mar 3, 2023

ObjectNotFound leaks through the API, so as an unprivileged user you see the following for any deleted dataset:

{"err_msg":"objectstore, _call_method failed: _is_private on <galaxy.model.Dataset(212984) at 0x15c7ff760>, kwargs: {}","err_code":404001}

Screenshot 2023-03-03 at 13 48 45

@mvdbeek
Member

mvdbeek commented Mar 3, 2023

Apart from these issues it's working super well. I set up a scratch backend in the distributed object store and it's all working as it should. The popover stuff is a little annoying, but that's minor UX stuff, and I think you're addressing a bunch of things from my review in #15654 already.
