Introduce Node Lifecycle WG #8396
base: master
Conversation
/hold
Looks like I'm not a member of the kubernetes org anymore. I was a few years back, but I didn't keep up with contributions recently. You can remove me as a lead, and I can reapply after some contributions to this WG.
Force-pushed from 75e1096 to a19a192
We have had impactful conversations with Ryan about this group and its goals. He has experience with cluster maintenance, and I look forward to his participation in the WG.
/cc
Force-pushed from a19a192 to 2d6ac13
@atiratree, I would like to be part of this WG. Please include me as well.
I have written a PoC that might interest this WG; sign me up.
/cc
@evrardjp: GitHub didn't allow me to request PR reviews from the following users: evrardjp. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc
@kaushik229: GitHub didn't allow me to request PR reviews from the following users: kaushik229. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
I like how this is going and am excited to see the WG formed. Thank you @atiratree!
I have been exploring the APIs in this area and would like to help with this initiative. Considering that, @atiratree, I would like to be part of this WG.
Force-pushed from 43ff1f5 to c627543
Thank you all for your interest! Just so we are on the same page for all visitors: this WG is open to everyone, and we will announce the weekly meetings on the dev@kubernetes.io mailing list as soon as the group is formed. If you are interested in helping organize/lead this group, please message me on Slack to discuss.
Force-pushed from c627543 to 48b4634
...) and other scenarios to use the new unified node draining approach.
- Explore possible scenarios behind the reason why the node was terminated/drained/killed and how to
  track and react to each of them. Consider past discussions/historical perspective
  (e.g. "tombstones").
In SIG node this week, we saw a presentation on node readiness from @ajaysundark, and we have also been discussing the idea of node capabilities, which @pravk03 (among others) is looking into. These seem like related but slightly different topics from each other as well as from the existing taints/tolerations functionality. Can we include these in the scope of this WG?
+1 to adding Node Readiness as one of the goals. I can only speak from the storage perspective, but one of the long-standing problems is making sure we only schedule pods that require a CSI driver when said CSI driver is available on the node.
Currently, we don't take this into account, and hence too many pods that require storage can be scheduled to a node; they remain stuck and require manual intervention to clean up. I have added my thoughts here - kubernetes/kubernetes#131208 (comment)
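To make the idea concrete, here is a purely hypothetical sketch of how such readiness gating could be expressed with the existing taints/tolerations API. The taint key (`example.com/csi-driver-not-ready`) is invented for illustration and is not an existing Kubernetes mechanism; the idea is that the taint stays on the node until the CSI driver registers, and only pods that do not need storage tolerate it:

```yaml
# Hypothetical sketch: the node carries a taint until the CSI driver is up.
# The taint key below is made up for illustration purposes.
apiVersion: v1
kind: Node
metadata:
  name: worker-1
spec:
  taints:
  - key: example.com/csi-driver-not-ready   # removed once the driver registers
    effect: NoSchedule
---
# A pod that does not need storage can opt in to scheduling anyway:
apiVersion: v1
kind: Pod
metadata:
  name: stateless-app
spec:
  tolerations:
  - key: example.com/csi-driver-not-ready
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```

Pods that do require storage would simply omit the toleration and stay unschedulable on that node until the taint is lifted, instead of getting stuck after placement.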
I agree that we should consider node readiness here, and I also want to call out the risk of scope creep for this WG. In the beginning, I think we should be very clear about what our definition of done is. This proposal began as a way to close the GNS KEP, and while I agree the expanded scope of reworking node API semantics would properly fix the problem, I want to be wary of biting off more than the community can collectively chew.
Agreed, we should explicitly mention Node Readiness.
@haircommander, I share your concern. We're not certain some of these deliverables are solvable, so the language is a little open-ended right now. I think we will need to provide clarity as time goes on.
Good idea, I have added both of these initiatives to the WG.
Thanks for including node readiness in the scope of wg-node-lifecycle.
I have the proposal ready for feedback: https://docs.google.com/document/d/11i2_rewvcbQkFFq1BIwHa7lgIefgZ8Mak-QMZjP7fFs/edit?tab=t.0#heading=h.jtg1hwbt8shd. Happy to collaborate to align on the direction. Please include me in the relevant channels/discussions.
Cool, I will take a look, and we can add it to the WG agenda.
Btw, the link above is wrong, but it is correct in the enhancements issue.
@@ -60,6 +60,12 @@ subprojects, and resolve cross-subproject technical issues and decisions.
- [@kubernetes/sig-cli-test-failures](https://github.com/orgs/kubernetes/teams/sig-cli-test-failures) - Test Failures and Triage
- Steering Committee Liaison: Paco Xu 徐俊杰 (**[@pacoxu](https://github.com/pacoxu)**)

## Working Groups

The following [working groups][working-group-definition] are sponsored by sig-cli:
Is SIG CLI the sponsor, or SIG Node?
It should be SIG Node. We'll fix it in the next update.
We are adding this to all the stakeholder SIGs, and they are in favor of participating in (or at least consulting on) this effort. Btw, this file is generated.
Force-pushed from 89b5a33 to 6c47dcc
Co-authored-by: Ryan Hallisey <rhallisey@nvidia.com>
Force-pushed from 6c47dcc to 497fa61
We have just opened voting for the WG meetings. PTAL: https://groups.google.com/a/kubernetes.io/g/dev/c/q37DZHUpemA/m/g1vDescLEAAJ
The Karpenter maintainer team has been looking forward to seeing something like this get rolling in the community! We've been keeping an eye on kubernetes/enhancements#4212 and kubernetes/enhancements#4563 for a while now and would love to see some form of these things go into the project. We would love to join and make sure some folks from the project participate!