Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict AWS Sagemaker Instance Type Selection to Orchestrator Configuration #2214

Open
1 task
strickvl opened this issue Jan 3, 2024 · 9 comments
Open
1 task
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@strickvl
Copy link
Contributor

strickvl commented Jan 3, 2024

Open Source Contributors Welcomed!

Please comment below if you would like to work on this issue!

Contact Details [Optional]

support@zenml.io

What happened?

The configuration of the instance_type for AWS Sagemaker Orchestrator is currently determined by the developer/data scientist/ML engineer at the time of running the pipeline via the SagemakerOrchestratorSettings in code. This setup does not allow a DevOps Engineer or ML Engineer with an admin role to control or restrict the choice of instance types. This could lead to potential misuse, such as selecting excessively high-resource instances for trivial tasks or intentionally creating resource-intensive loops.

Task Description

Move the instance_type attribute from the SagemakerOrchestratorSettings in the code to the SagemakerOrchestrator config, which is set up during the component registration. This change will allow better control and governance over the resources used for running pipelines in AWS Sagemaker.

Expected Outcome

  • The instance_type should be configurable at the component registration level by an admin or a DevOps engineer.
  • Developers or data scientists should not be able to override the instance_type at the pipeline execution level.
  • The change should ensure better resource management and prevent potential misuse of AWS resources.

Steps to Implement

  • Update the SagemakerOrchestrator configuration to include the instance_type attribute.
  • Remove the instance_type option from the SagemakerOrchestratorSettings.
  • Ensure that the orchestrator respects the instance_type set during the component registration and does not allow overrides at runtime.
  • Update the documentation to reflect these changes.

Additional Context

This change is prompted by the need to enhance governance and control over resource utilization in cloud environments, particularly in team settings where multiple individuals have access to deploy pipelines.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@strickvl strickvl added enhancement New feature or request good first issue Good for newcomers labels Jan 3, 2024
@christianversloot
Copy link
Contributor

christianversloot commented Jan 3, 2024

Is there room for discussion regarding this idea?

We are currently benefiting greatly from the fact that it's allowed to provide instance_type via SagemakerOrchestratorSettings. Internally, we've defined a procedure where any AWS-based training run is discussed first, where at least four eyes give the approval that it can actually run in the cloud. Any other run is done with local resources. We're using a variety of instance types. In fact, if I am not incorrect, we will now need to define multiple stacks with multiple SageMaker Orchestrator components in order to use multiple instance types, which is quite cumbersome for us.

I do however understand the rationale for this issue very well especially for larger organizations, but could there be some middle ground? For example, something like this:

  • Use the SagemakerOrchestrator config to define a default instance_type.
  • Also add an other_instance_type_allowed: bool or similarly named option to the same configuration, which allows DevOps engineers / admins to decide whether people running ZenML pipelines can manually provide instance_type in SagemakerOrchestratorSettings.
  • If it's allowed, everything will keep working as is: use the default instance_type if none is provided; use the pipeline-specified one if another one is provided.
  • If it's disallowed, raise an error in case someone attempts to configure a disallowed instance_type (i.e. the case where SagemakerOrchestratorSettings.instance_type != SagemakerOrchestratorConfig.instance_type).

WDYT?

@strickvl
Copy link
Contributor Author

strickvl commented Jan 3, 2024

I think I like the suggestion! It's a nice middle ground between flexibility + control over resource usage. It would be a new approach we haven't taken so far in how we allow components to be configured, and I'd be interested in @schustmi's thoughts on the approach particularly in the light of RBAC / permissions work he's been doing recently. I'm wondering if we should / should not consider this scenario with that in mind?

@schustmi
Copy link
Contributor

schustmi commented Jan 5, 2024

I also like the suggestion, seems like a good compromise 👍

RBAC will control which users have permissions to update the stack component configuration but will not affect the Settings right now, so nothing important to consider there.

@AryaMoghaddam
Copy link

I would like to work on this issue, if possible

@AryaMoghaddam
Copy link

@strickvl Sorry, I picked up the other 2 issues wouldn't have time for this one

@aiakide
Copy link
Contributor

aiakide commented Sep 4, 2024

This sounds like a nice first issue to participate in the development of ZenML. I would like to participate and implement the changes suggested by @christianversloot. 🙂

Also, am I correct in assuming that the change does not need to be made in the SagemakerStepOperatorConfig since the instance type configuration can be traced back to a step there anyway, or does the customization need to be made there as well?

@schustmi
Copy link
Contributor

schustmi commented Sep 5, 2024

@aiakide I'm currently working on a PR #2984 that affects the settings of all stack components which are related to infrastructure resources (like the instance_type mentioned in this ticket). Is it okay if you wait a few more days until that is merged before starting to implement this suggestion?

@aiakide
Copy link
Contributor

aiakide commented Sep 5, 2024

@schustmi Sure, no problem. No hurry.

@schustmi
Copy link
Contributor

schustmi commented Sep 5, 2024

Thanks! I'll let you know once it's merged and also add some more details on how this should be implemented with the new settings structure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

5 participants