ShinyProxy pod spin up in Kubernetes cannot be drained #501
Comments
Hi,

The reason that we don't add the owner reference is that ShinyProxy needs full control over the pods. When using the Kubernetes operator (https://github.com/openanalytics/shinyproxy-operator), new ShinyProxy servers are started and removed automatically. If such a ShinyProxy pod owned the app pods, k8s would delete the running apps when a ShinyProxy server gets deleted, even if the app is still in use. In addition, apps can run in any namespace and it's not possible to have cross-namespace owner references.

The problem with draining nodes cannot be solved easily. ShinyProxy automatically removes app pods when the app is no longer in use. So if a node contains some app pods, those pods are still in use. If you drain that node, those pods will be removed and the user will notice that the app has crashed. Since these apps typically store their state in memory (as is the case with Shiny, for example), it's impossible to move the pod to a different node without losing the state of the app. We are looking into ways to improve this behavior, but have not yet found a way to fully prevent timeouts when draining nodes.

P.S. For autoscaling this should not be an issue. We advise using the cluster-autoscaler and not an autoscaler that is provided by the cloud provider (e.g. Azure provides such an option). The cluster-autoscaler is fully aware that it should not try to remove nodes that contain ShinyProxy pods and therefore does not attempt to remove these nodes.
First and foremost, thank you @LEDfan for providing an answer on this topic.
This might still time out if the pod is handling a connection and doesn't go down, although it would reduce the chances. Regarding autoscaling, that is correct for scaling out, but when scaling in (reducing the number of nodes), if there are ShinyProxy pods on all the nodes, you'll face a similar issue, or you would end up unable to properly optimize the infrastructure. Cheers!
Your proposal would work when the applications hosted by ShinyProxy are stateless. However, in our experience, the majority of applications deployed using ShinyProxy are in fact stateful. This is the case for Shiny apps, for example, but also when using IDEs like RStudio, Jupyter notebooks etc. If the pod of a user gets moved to another node, they will lose whatever they were doing in the app. Therefore, during an update of k8s, you either have to accept that users lose their work (in which case it's preferable to stop the apps of a user, so that they are aware their app was stopped, instead of silently resetting the state of the app) or you'll have to accept that some nodes will stay around until the user closes their work. What kind of applications are you deploying using ShinyProxy?

To ensure pods are not kept indefinitely (e.g. when the user never turns off their PC), you can have a look at the max-lifetime setting: https://shinyproxy.io/documentation/configuration/#max-lifetime . If you set this to e.g. 4 hours, your nodes will be removed after 4 hours.

Regarding the autoscaling, I think this is a matter of tuning the cluster configuration for the specific use case. We have multiple setups where the cluster regularly scales up and down and is not wasting node resources. However, I think this is a bit too much to explain here. Finally, if you want 100% efficiency, you could have a look at services like AWS Fargate, either using k8s or directly using the recently added ECS backend. We are always willing to implement additional backends for ShinyProxy, e.g. to use Container Instances on Azure.
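To make the max-lifetime reasoning concrete, here is a minimal sketch of the arithmetic. It assumes the node has been cordoned so no new app pods land on it, that every app is stopped once it reaches max-lifetime, and that the limit is expressed in minutes (check the linked documentation for the actual unit). The function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def latest_node_release(app_start_times, max_lifetime_minutes):
    """Return the latest time at which a cordoned node still hosts an app pod.

    Assumption: no new app pods are scheduled on the node, and each app is
    stopped when it reaches max-lifetime.
    """
    if not app_start_times:
        return None  # the node is already empty
    lifetime = timedelta(minutes=max_lifetime_minutes)
    # The most recently started app is the last one to hit its limit.
    return max(app_start_times) + lifetime


# Hypothetical start times of the app pods currently on the node.
starts = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30)]
print(latest_node_release(starts, 240))  # 4 hours = 240 minutes
```

In other words, the "4 hours" bound counts from the start of the newest app on the node, not from the moment the drain is requested.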
Hey Team!
When ShinyProxy spins up a pod in Kubernetes, the pod does not have an Owner Reference.
As it does not have an Owner Reference, Kubernetes cannot identify who should be in charge of that pod's lifecycle.
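As a rough illustration of what "no Owner Reference" looks like in practice, the sketch below scans a pod list in the shape returned by `kubectl get pods -o json` and reports pods whose `metadata.ownerReferences` is missing or empty. The helper name and the sample pod names are hypothetical.

```python
def pods_without_owner(pod_list):
    """Return (namespace, name) for pods that have no ownerReferences."""
    result = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # A missing or empty ownerReferences list means no controller owns the pod.
        if not meta.get("ownerReferences"):
            result.append((meta.get("namespace"), meta.get("name")))
    return result


# Hypothetical sample: one unowned app pod, one pod owned by a ReplicaSet.
sample = {
    "items": [
        {"metadata": {"namespace": "default", "name": "sp-pod-abc"}},
        {"metadata": {"namespace": "default", "name": "web-1",
                      "ownerReferences": [{"kind": "ReplicaSet", "name": "web"}]}},
    ]
}
print(pods_without_owner(sample))
```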
This could be solved by using a `label` on the pod and adding a `selector` to the `kubectl drain` command (source). But if you are running a managed Kubernetes cluster, such as AKS, EKS or GKE (Azure, AWS and GCP respectively), certain operations are carried out by the cloud provider, such as node auto-scaling or Kubernetes version upgrades of the nodes, and you won't be able to modify them so that they use that specific flag.
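For a self-managed drain, the label-plus-selector idea could look roughly like the sketch below, which only builds the `kubectl drain` argument list. `--ignore-daemonsets` and `--pod-selector` are existing `kubectl drain` flags; the label key and value (`app`, `shinyproxy-app`) and the node name are assumptions made up for illustration, not labels ShinyProxy is known to set.

```python
def build_drain_command(node, exclude_label=None):
    """Build a `kubectl drain` argument list.

    exclude_label is a hypothetical (key, value) pair; pods carrying that
    label are left out of the drain via --pod-selector.
    """
    cmd = ["kubectl", "drain", node, "--ignore-daemonsets"]
    if exclude_label is not None:
        key, value = exclude_label
        # "key!=value" also matches pods that do not have the label at all.
        cmd.append(f"--pod-selector={key}!={value}")
    return cmd


cmd = build_drain_command("node-1", ("app", "shinyproxy-app"))
print(" ".join(cmd))
# To actually run it (requires kubectl and cluster access):
#   import subprocess; subprocess.run(cmd, check=True)
```

Pods filtered out this way stay on the node, so the node is cordoned but not emptied until those apps stop on their own.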
I wanted to check with you, team, if there is any way for us to have a fixed controller if we are deploying the backend in Kubernetes.
Cheers!