feat: Add g7e instance types to health-monitoring-agent node affinity #381
Open
PremiumSpider wants to merge 1 commit into aws:main
Add ml.g7e.{2,4,8,12,24,48}xlarge to the health-monitoring-agent
DaemonSet node affinity allowlist so the agent runs on g7e instances.
Part of g7e instance type onboarding for HyperPod.
What's changing and why?
Adds the g7e instance types (ml.g7e.{2,4,8,12,24,48}xlarge) to the health-monitoring-agent DaemonSet node affinity allowlist. Without this change, the health monitoring agent is not scheduled on g7e nodes, so g7e instances get no health monitoring.
Part of g7e instance type onboarding for HyperPod.
Related PR: #380
Before/After UX
Before: Health monitoring agent pods are not scheduled on g7e nodes because the node affinity doesn't include g7e instance types. g7e nodes have no health monitoring coverage.
After: Health monitoring agent pods are correctly scheduled on all g7e nodes via node affinity matching.
How was this change tested?
Config-only change — added g7e instance types to the YAML node affinity values list. No logic changes.
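For reviewers who have not opened the diff, the shape of the change can be sketched as follows. This is an illustrative fragment, not the chart's actual contents: the label key `node.kubernetes.io/instance-type` (the standard Kubernetes instance-type label) and the use of a required rather than preferred affinity rule are assumptions. Only the six g7e values come from this PR.

```yaml
# Illustrative sketch only. The label key and the surrounding structure are
# assumptions; the real DaemonSet template in this repo may differ.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/instance-type  # assumed label key
              operator: In
              values:
                # existing allowlisted instance types stay unchanged above
                - ml.g7e.2xlarge
                - ml.g7e.4xlarge
                - ml.g7e.8xlarge
                - ml.g7e.12xlarge
                - ml.g7e.24xlarge
                - ml.g7e.48xlarge
```

With an `In` operator, a node is eligible only if its label value appears in the list, which is why instance types missing from the allowlist get no agent pod.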
Are unit tests added?
N/A — config-only change, no code logic modified.
Are integration tests added?
N/A — config-only change.
Reviewer Guidelines
One of the following must be true: