Skip to content

TRT-2506: Add OS version info to ClusterData#30827

Open
petr-muller wants to merge 1 commit intoopenshift:mainfrom
petr-muller:trt-2506-osversion-platformidentification
Open

TRT-2506: Add OS version info to ClusterData#30827
petr-muller wants to merge 1 commit intoopenshift:mainfrom
petr-muller:trt-2506-osversion-platformidentification

Conversation

@petr-muller
Copy link
Member

@petr-muller petr-muller commented Mar 3, 2026

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Changes:

  • Collect OS imagestream names from the cluster default, master MCP, worker MCP, and any additional MCPs
  • Preserve partial OS data even when getOSImageStreams returns errors (e.g. OSImageStream singleton failure still returns MCP data)
  • Skip empty stream names in additional MCPs to avoid bogus entries in Additional
  • Unit tests covering happy paths, error handling, and deduplication

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

    • Capture and expose cluster OS image streams (default, master, worker, additional) as part of cluster data.
  • Chores

    • Improved resilience and diagnostics during OS resolution: non‑fatal warnings are collected and reported to reduce failures and aid troubleshooting.
  • Tests

    • Added comprehensive unit tests covering OS stream gathering, error paths, deduplication, and edge cases.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 3, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 3, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from deads2k and p0lyn0mial March 3, 2026 12:28
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 3, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 3, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Added OSImageStreams and OS types; extended ClusterData with an OS field. BuildClusterData now initializes a machineconfig client, calls getOSImageStreams to populate ClusterData.OS, and aggregates non-fatal errors alongside fatal ones.

Changes

Cohort / File(s) Summary
Platform identification types & logic
pkg/monitortestlibrary/platformidentification/types.go
Added OSImageStreams and OS types; added OS OS \json:"os"`toClusterData; implemented getOSImageStreams(mc machineconfigclient.Interface) (OSImageStreams, error); updated BuildClusterDatato create a machineconfig client, callgetOSImageStreams, assign clusterData.OS`, and collect non-fatal errors; updated imports and error paths.
Unit tests for OS image streams
pkg/monitortestlibrary/platformidentification/types_test.go
New comprehensive tests for getOSImageStreams: fake machineconfig client and API surfaces, helpers to create MCPs and OSImageStream singleton, test cases covering default/master/worker/additional streams, deduplication, NotFound vs non-404 errors, MCP list errors, and combined error scenarios.

Sequence Diagram

sequenceDiagram
    participant BuildClusterData as BuildClusterData
    participant MCClient as MachineConfigClient
    participant OSIS as OSImageStreamSingleton
    participant MCPs as MachineConfigPools

    BuildClusterData->>MCClient: init client & call getOSImageStreams()
    MCClient->>OSIS: Get singleton OSImageStream
    OSIS-->>MCClient: Return Default stream or NotFound/error
    MCClient->>MCPs: List MachineConfigPools
    MCPs-->>MCClient: Return MCP list with OSImageURL/OSImageStream
    MCClient-->>BuildClusterData: Return OSImageStreams {Default, MasterMCP, WorkerMCP, Additional[]} + aggregated errors
    BuildClusterData->>BuildClusterData: Assign clusterData.OS and append non-fatal warnings/errors
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding OS version information (via OSImageStreams) to the ClusterData struct.
Stable And Deterministic Test Names ✅ Passed Test file uses Go's standard testing package with t.Run() subtests, not Ginkgo. All test case names are static descriptive strings with no dynamic information.
Test Structure And Quality ✅ Passed The pull request adds a standard Go unit test using the testing package with table-driven pattern, not Ginkgo-based code, making Ginkgo-specific quality checks inapplicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan for PR comments
  • Generate coding plan

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 3, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

Release Notes

  • New Features
  • Enhanced cluster monitoring to track operating system major version information. The system now automatically captures OS version details from cluster configurations, defaulting to version 9 when unavailable.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 145-151: The current branch using mcp.Spec.OSImageStream.Name and
strings.CutPrefix("rhel-") leaves clusterData.OS.MajorVersion empty for
non-empty names that don't start with "rhel-"; update the branch so that when
osStreamName != "" and CutPrefix returns ok==false you either (a) append a parse
error (e.g., to the existing error/validation collection used for this
component) referencing the raw osStreamName and that it didn't match the
expected "rhel-*" pattern, and set clusterData.OS.MajorVersion to a clear
fallback value (e.g., "unknown" or your chosen default), or (b) return an
explicit error up the call chain; modify the code around
mcp.Spec.OSImageStream.Name, strings.CutPrefix and clusterData.OS.MajorVersion
to implement one of these behaviors so unrecognized OSImageStream names are
surfaced instead of silently leaving MajorVersion empty.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 84df1a9 and a97cccd.

📒 Files selected for processing (1)
  • pkg/monitortestlibrary/platformidentification/types.go

@petr-muller
Copy link
Member Author

/test images

1 similar comment
@petr-muller
Copy link
Member Author

/test images

@petr-muller
Copy link
Member Author

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 4, 2026

@petr-muller: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a6117d40-17cb-11f1-9a08-1bb1ee783c51-0

@petr-muller
Copy link
Member Author

/pipeline required

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@petr-muller petr-muller force-pushed the trt-2506-osversion-platformidentification branch from a97cccd to 88e8f82 Compare March 4, 2026 16:20
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 4, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

  • Enhanced cluster monitoring to capture and surface OS major version information from cluster configuration, defaulting to "9" when unavailable.

  • Chores

  • Improved resilience and diagnostics: non‑fatal warnings are collected and reported during OS resolution to reduce failures and aid troubleshooting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@petr-muller
Copy link
Member Author

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 4, 2026

@petr-muller: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/55329010-17e6-11f1-9f67-eca718b8a1a5-0

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/monitortestlibrary/platformidentification/types.go (1)

194-216: ⚠️ Potential issue | 🟠 Major

Don’t coerce unrecognized configured streams to 9.

At Line [194] and Line [207], non-empty but unrecognized stream names are warned about, but execution still falls through to Line [216] and returns "9". That can mislabel custom/non-RHEL streams as RHEL 9 and skew downstream correlation.

🔧 Proposed fix
 	if err == nil {
 		osStreamName := mcp.Spec.OSImageStream.Name
 		if majorVersion, ok := strings.CutPrefix(osStreamName, "rhel-"); ok {
 			return majorVersion, nil
 		}
 		if osStreamName != "" {
 			warnings = append(warnings, fmt.Errorf("unrecognized worker MCP OSImageStream name %q, does not match expected rhel-* pattern", osStreamName))
+			return "", warnings
 		}
 	}
@@
 	if err == nil {
 		defaultStream := osImageStream.Status.DefaultStream
 		if majorVersion, ok := strings.CutPrefix(defaultStream, "rhel-"); ok {
 			return majorVersion, warnings
 		}
 		if defaultStream != "" {
 			warnings = append(warnings, fmt.Errorf("unrecognized OSImageStream default stream %q, does not match expected rhel-* pattern", defaultStream))
+			return "", warnings
 		}
 	} else if !kapierrs.IsNotFound(err) {
 		warnings = append(warnings, fmt.Errorf("error getting OSImageStream singleton: %w", err))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/monitortestlibrary/platformidentification/types.go` around lines 194 -
216, The code currently appends a warning for non-empty, unrecognized stream
names (osStreamName and defaultStream) but then falls through to return "9";
change this so that when a non-empty stream value fails the rhel-* check you
return an empty string (or a sentinel indicating unknown) together with the
accumulated warnings immediately instead of defaulting to "9". Specifically, in
the branches that check osStreamName and defaultStream (the blocks referencing
osStreamName and defaultStream variables and the strings.CutPrefix checks),
after appending the warning for an unrecognized non-empty stream, return "",
warnings so the function does not coerce custom/non-RHEL streams to "9".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 188-197: The GET of the worker MCP
(mcClient.MachineconfigurationV1().MachineConfigPools().Get) currently swallows
errors; instead, when err != nil capture and surface it by appending a
descriptive error to the existing warnings slice (e.g. fmt.Errorf("failed to get
worker MCP: %w", err)) so callers see API/RBAC/transport issues; keep the
existing mcp handling when err == nil (using mcp.Spec.OSImageStream.Name and
strings.CutPrefix) and only append the new warning when the Get fails.

---

Duplicate comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 194-216: The code currently appends a warning for non-empty,
unrecognized stream names (osStreamName and defaultStream) but then falls
through to return "9"; change this so that when a non-empty stream value fails
the rhel-* check you return an empty string (or a sentinel indicating unknown)
together with the accumulated warnings immediately instead of defaulting to "9".
Specifically, in the branches that check osStreamName and defaultStream (the
blocks referencing osStreamName and defaultStream variables and the
strings.CutPrefix checks), after appending the warning for an unrecognized
non-empty stream, return "", warnings so the function does not coerce
custom/non-RHEL streams to "9".

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c2d33c18-1aad-46fa-a14d-8d9bc794624f

📥 Commits

Reviewing files that changed from the base of the PR and between a97cccd and 88e8f82.

📒 Files selected for processing (1)
  • pkg/monitortestlibrary/platformidentification/types.go

@petr-muller petr-muller force-pushed the trt-2506-osversion-platformidentification branch from 88e8f82 to 938e28e Compare March 4, 2026 16:35
@petr-muller
Copy link
Member Author

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 4, 2026

@petr-muller: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/354ce780-17e8-11f1-8bef-f8eb82678be9-0

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 4, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

  • Enhanced cluster monitoring to capture and surface the OS major version for clusters; when unavailable it will default to "9".

  • Chores

  • Improved resilience and diagnostics during OS resolution: non‑fatal warnings are collected and reported to reduce failures and aid troubleshooting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 191-192: The code currently returns the entire suffix from
strings.CutPrefix(osStreamName, "rhel-") (e.g., "9.4" or "9-custom") for
MajorVersion; instead extract and validate only the numeric major component
before returning: split the suffix at the first non-digit/at the '.' (or use
strconv.Atoi on the leading token), ensure it parses as an integer, and return
that numeric string (or an error) instead of the full suffix. Apply the same
change to the other occurrence that also uses strings.CutPrefix for "rhel-" so
both return a validated numeric major version.
- Around line 217-218: The code currently appends a warning when falling back to
the default OS major version "9" which turns an expected fallback into an error;
remove the warnings = append(...) call for the default path so defaulting to "9"
does not produce a warning. Instead only append to warnings when an
OSImageStream or MCP value was present but could not be parsed (i.e., preserve
warnings on actual parse errors), updating the logic in the function that
computes/returns the OS major version (the block that currently does warnings =
append(warnings, fmt.Errorf(...)) and return "9", warnings) to simply return
"9", warnings without emitting a warning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa93e0f1-d0cd-4b2a-ae8b-1726a3e30d3b

📥 Commits

Reviewing files that changed from the base of the PR and between 88e8f82 and 938e28e.

📒 Files selected for processing (1)
  • pkg/monitortestlibrary/platformidentification/types.go

Comment on lines +191 to +192
if majorVersion, ok := strings.CutPrefix(osStreamName, "rhel-"); ok {
return majorVersion, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Normalize to an actual major version before returning.

Lines [191]-[192] and [206]-[207] return the full suffix after rhel-, which can include non-major content (for example, 9.4 or 9-custom). Since this field is MajorVersion, parse and validate just the numeric major component.

🔧 Proposed fix
 import (
 	"context"
 	"errors"
 	"fmt"
+	"strconv"
 	"strings"
@@
 func getOSMajorVersion(ctx context.Context, mcClient machineconfigclient.Interface) (string, []error) {
@@
 	if err == nil {
 		osStreamName := mcp.Spec.OSImageStream.Name
-		if majorVersion, ok := strings.CutPrefix(osStreamName, "rhel-"); ok {
-			return majorVersion, nil
+		if majorVersion, ok := parseRHELMajorVersion(osStreamName); ok {
+			return majorVersion, nil
 		}
 		if osStreamName != "" {
 			warnings = append(warnings, fmt.Errorf("unrecognized worker MCP OSImageStream name %q, does not match expected rhel-* pattern", osStreamName))
 		}
@@
 	if err == nil {
 		defaultStream := osImageStream.Status.DefaultStream
-		if majorVersion, ok := strings.CutPrefix(defaultStream, "rhel-"); ok {
+		if majorVersion, ok := parseRHELMajorVersion(defaultStream); ok {
 			return majorVersion, warnings
 		}
 		if defaultStream != "" {
 			warnings = append(warnings, fmt.Errorf("unrecognized OSImageStream default stream %q, does not match expected rhel-* pattern", defaultStream))
 		}
@@
 }
+
+func parseRHELMajorVersion(stream string) (string, bool) {
+	suffix, ok := strings.CutPrefix(stream, "rhel-")
+	if !ok || suffix == "" {
+		return "", false
+	}
+	major := strings.SplitN(suffix, ".", 2)[0]
+	if _, err := strconv.Atoi(major); err != nil {
+		return "", false
+	}
+	return major, true
+}

Also applies to: 206-207

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/monitortestlibrary/platformidentification/types.go` around lines 191 -
192, The code currently returns the entire suffix from
strings.CutPrefix(osStreamName, "rhel-") (e.g., "9.4" or "9-custom") for
MajorVersion; instead extract and validate only the numeric major component
before returning: split the suffix at the first non-digit/at the '.' (or use
strconv.Atoi on the leading token), ensure it parses as an integer, and return
that numeric string (or an error) instead of the full suffix. Apply the same
change to the other occurrence that also uses strings.CutPrefix for "rhel-" so
both return a validated numeric major version.

Comment on lines +217 to +218
warnings = append(warnings, fmt.Errorf("could not determine OS major version from MCP or OSImageStream, defaulting to 9"))
return "9", warnings
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not emit a warning for the expected default-to-9 path.

Line [217] turns a normal fallback (unset stream) into an error signal. That makes BuildClusterData return non-nil errors for healthy clusters where defaulting to "9" is intended behavior.

🔧 Proposed fix
-	// No stream information available from MCP or cluster default, assuming RHEL 9
-	warnings = append(warnings, fmt.Errorf("could not determine OS major version from MCP or OSImageStream, defaulting to 9"))
-	return "9", warnings
+	// No stream information available from MCP or cluster default, assuming RHEL 9
+	// This is expected when no explicit stream is configured.
+	return "9", warnings
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/monitortestlibrary/platformidentification/types.go` around lines 217 -
218, The code currently appends a warning when falling back to the default OS
major version "9" which turns an expected fallback into an error; remove the
warnings = append(...) call for the default path so defaulting to "9" does not
produce a warning. Instead only append to warnings when an OSImageStream or MCP
value was present but could not be parsed (i.e., preserve warnings on actual
parse errors), updating the logic in the function that computes/returns the OS
major version (the block that currently does warnings = append(warnings,
fmt.Errorf(...)) and return "9", warnings) to simply return "9", warnings
without emitting a warning.

@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

// Check the worker MCP for an explicit OS image stream override
mcp, err := mcClient.MachineconfigurationV1().MachineConfigPools().Get(ctx, "worker", metav1.GetOptions{})
if err == nil {
osStreamName := mcp.Spec.OSImageStream.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about collecting the set of raw distinct osStreamName values associated with workers / master (when possible) as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that sounds like a good direction. I was thinking about collecting the raw names (do not try parse them to minor/major versions when collecting into clusterdata) for 1) the default stream from cluster singleton 2) master MCP 3) worker MCP, and maybe also 4) list of any other streams in any other MCPs, if they are different from the ones seen in 1,2,3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variant identification would then process this semi-raw data any way we need

@petr-muller petr-muller force-pushed the trt-2506-osversion-platformidentification branch from 938e28e to 7cb94bb Compare March 12, 2026 12:09
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 12, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Includes unit tests for the getOSImageStreams function covering happy paths, error handling (404 vs non-404 on OSImageStream singleton, MCP list errors), and deduplication of additional MCP stream names.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 12, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Includes unit tests for the getOSImageStreams function covering happy paths, error handling (404 vs non-404 on OSImageStream singleton, MCP list errors), and deduplication of additional MCP stream names.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

  • Capture and expose OS image streams (default, master, worker, additional) and surface the cluster OS information; OS major version defaults to "9" when unavailable.

  • Chores

  • Improved resilience and diagnostics during OS resolution: non‑fatal warnings are collected and reported to reduce failures and aid troubleshooting.

  • Tests

  • Added comprehensive unit tests for OS image stream gathering, error paths, deduplication, and edge cases.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 212-226: The loop over mcpList.Items unconditionally inserts
mcp.Spec.OSImageStream.Name into seen, allowing empty names to end up in
osImageStreams.Additional; update the loop in the block that populates seen so
that before calling seen.Insert(...) you check the name is non-empty (e.g.,
capture name := mcp.Spec.OSImageStream.Name and if name != "" then
seen.Insert(name)), leaving the existing special-case handling for "master" and
"worker" unchanged and ensuring osImageStreams.Additional is built only from
non-empty stream names.
- Around line 152-160: The current branch only sets clusterData.OS when
getOSImageStreams returns nil error, which drops partial results; change the
logic so after calling getOSImageStreams (the call inside the
machineconfigclient.NewForConfig branch) you always assign clusterData.OS =
OS{OSImageStreams: osImageStreams} if osImageStreams is non-nil, and still
append any err to errors; specifically adjust the branch involving
machineconfigclient.NewForConfig, getOSImageStreams, clusterData.OS and the OS
type so partial-success values from getOSImageStreams are preserved while
keeping the existing error handling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: de0582fb-3dc1-4a0a-a19e-e554bc5652d7

📥 Commits

Reviewing files that changed from the base of the PR and between 938e28e and 7cb94bb.

📒 Files selected for processing (2)
  • pkg/monitortestlibrary/platformidentification/types.go
  • pkg/monitortestlibrary/platformidentification/types_test.go

@petr-muller petr-muller force-pushed the trt-2506-osversion-platformidentification branch from 7cb94bb to 540371d Compare March 12, 2026 12:25
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 12, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Changes:

  • Collect OS imagestream names from the cluster default, master MCP, worker MCP, and any additional MCPs
  • Preserve partial OS data even when getOSImageStreams returns errors (e.g. OSImageStream singleton failure still returns MCP data)
  • Skip empty stream names in additional MCPs to avoid bogus entries in Additional
  • Unit tests covering happy paths, error handling, and deduplication

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 12, 2026

@petr-muller: This pull request references TRT-2506 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Populate OS major version from the worker MachineConfigPool's OSImageStream, defaulting to "9" when unset. This enables downstream consumers (Sippy, risk analysis) to correlate test results with the node OS version.

Changes:

  • Collect OS imagestream names from the cluster default, master MCP, worker MCP, and any additional MCPs
  • Preserve partial OS data even when getOSImageStreams returns errors (e.g. OSImageStream singleton failure still returns MCP data)
  • Skip empty stream names in additional MCPs to avoid bogus entries in Additional
  • Unit tests covering happy paths, error handling, and deduplication

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Summary by CodeRabbit

  • New Features

  • Capture and expose cluster OS image streams (default, master, worker, additional) as part of cluster data.

  • Chores

  • Improved resilience and diagnostics during OS resolution: non‑fatal warnings are collected and reported to reduce failures and aid troubleshooting.

  • Tests

  • Added comprehensive unit tests covering OS stream gathering, error paths, deduplication, and edge cases.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/monitortestlibrary/platformidentification/types_test.go (1)

249-256: Assert both error contributions in the “joined” case.

The "both OSImageStream and MCP errors are joined" case currently only checks err != nil, so it still passes if one side gets dropped. Tighten this case to verify both expected error fragments are present.

🧪 Suggested test hardening
 import (
 	"context"
 	"fmt"
+	"strings"
 	"testing"
@@
 	tests := []struct {
 		name               string
 		client             *fakeMCClient
@@
 		wantErr            bool
+		wantErrContains    []string
 		wantAdditionalNil  bool
 	}{
@@
 		{
 			name: "both OSImageStream and MCP errors are joined",
 			client: newFakeMCClient(
 				nil,
 				fmt.Errorf("osimagestream error"),
 				nil,
 				fmt.Errorf("mcp error"),
 			),
-			wantErr: true,
+			wantErr:         true,
+			wantErrContains: []string{"osimagestream error", "mcp error"},
 		},
@@
 			if !tt.wantErr && err != nil {
 				t.Fatalf("unexpected error: %v", err)
 			}
+			for _, want := range tt.wantErrContains {
+				if err == nil || !strings.Contains(err.Error(), want) {
+					t.Fatalf("error %q missing from %v", want, err)
+				}
+			}

Also applies to: 307-312

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/monitortestlibrary/platformidentification/types_test.go` around lines 249
- 256, The test case "both OSImageStream and MCP errors are joined" only asserts
err != nil; update the test in
pkg/monitortestlibrary/platformidentification/types_test.go to assert that the
returned error string contains both expected fragments (e.g., "osimagestream
error" and "mcp error") after calling the code under test (using the existing
newFakeMCClient setup), and make the same tightening change for the similar case
around lines 307-312 so both error contributions are explicitly checked rather
than just non-nil.
pkg/monitortestlibrary/platformidentification/types.go (1)

196-210: Thread the caller context through the MachineConfig lookups.

These API calls use context.TODO(), so cancellation and deadlines from BuildClusterData stop before the OS lookup path. Line 155 can pass ctx straight through.

♻️ Proposed fix
-func getOSImageStreams(mc machineconfigclient.Interface) (OSImageStreams, error) {
+func getOSImageStreams(ctx context.Context, mc machineconfigclient.Interface) (OSImageStreams, error) {
 	var osImageStreams OSImageStreams
 	var errs []error

-	osImageStream, err := mc.MachineconfigurationV1alpha1().OSImageStreams().Get(context.TODO(), "cluster", metav1.GetOptions{})
+	osImageStream, err := mc.MachineconfigurationV1alpha1().OSImageStreams().Get(ctx, "cluster", metav1.GetOptions{})
 	if err != nil {
 		if !kapierrs.IsNotFound(err) {
 			errs = append(errs, fmt.Errorf("error getting OSImageStream singleton: %v", err))
@@
-	mcpList, err := mc.MachineconfigurationV1().MachineConfigPools().List(context.TODO(), metav1.ListOptions{})
+	mcpList, err := mc.MachineconfigurationV1().MachineConfigPools().List(ctx, metav1.ListOptions{})
-		osImageStreams, err := getOSImageStreams(mcClient)
+		osImageStreams, err := getOSImageStreams(ctx, mcClient)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/monitortestlibrary/platformidentification/types.go` around lines 196 -
210, The getOSImageStreams function uses context.TODO() for API calls which
prevents cancellation from BuildClusterData; change the signature of
getOSImageStreams to accept a context (e.g., func getOSImageStreams(ctx
context.Context, mc machineconfigclient.Interface) (OSImageStreams, error)),
replace the two context.TODO() uses
(mc.MachineconfigurationV1alpha1().OSImageStreams().Get and
mc.MachineconfigurationV1().MachineConfigPools().List) with the passed ctx, and
update every caller (notably BuildClusterData) to pass its ctx through to
getOSImageStreams so deadlines/cancellations propagate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 42-43: The JSON tag on the struct field Additional is wrong
(json:"omitempty") which sets the JSON key to "omitempty"; update the tag for
the Additional []string field to use the omit-empty option correctly by changing
it to json:",omitempty" so the serialized name remains "Additional" and the
field is omitted when empty.

---

Nitpick comments:
In `@pkg/monitortestlibrary/platformidentification/types_test.go`:
- Around line 249-256: The test case "both OSImageStream and MCP errors are
joined" only asserts err != nil; update the test in
pkg/monitortestlibrary/platformidentification/types_test.go to assert that the
returned error string contains both expected fragments (e.g., "osimagestream
error" and "mcp error") after calling the code under test (using the existing
newFakeMCClient setup), and make the same tightening change for the similar case
around lines 307-312 so both error contributions are explicitly checked rather
than just non-nil.

In `@pkg/monitortestlibrary/platformidentification/types.go`:
- Around line 196-210: The getOSImageStreams function uses context.TODO() for
API calls which prevents cancellation from BuildClusterData; change the
signature of getOSImageStreams to accept a context (e.g., func
getOSImageStreams(ctx context.Context, mc machineconfigclient.Interface)
(OSImageStreams, error)), replace the two context.TODO() uses
(mc.MachineconfigurationV1alpha1().OSImageStreams().Get and
mc.MachineconfigurationV1().MachineConfigPools().List) with the passed ctx, and
update every caller (notably BuildClusterData) to pass its ctx through to
getOSImageStreams so deadlines/cancellations propagate.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1210543f-9b54-4a53-b462-2167881b7bec

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb94bb and 540371d.

📒 Files selected for processing (2)
  • pkg/monitortestlibrary/platformidentification/types.go
  • pkg/monitortestlibrary/platformidentification/types_test.go

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 12, 2026

@petr-muller: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-serial-2of2 938e28e link true /test e2e-aws-ovn-serial-2of2

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

In cluster with nodes managed through machineconfig API, collect OS
imagestream names configured for cluster default, master MCP, worker MCP
and any other OS imagestream names seen on other MCPs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@petr-muller petr-muller force-pushed the trt-2506-osversion-platformidentification branch from 540371d to 2c9b689 Compare March 12, 2026 16:07
@openshift-ci-robot
Copy link

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@petr-muller
Copy link
Member Author

/payload-job periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-2of3 periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-3of3

@petr-muller
Copy link
Member Author

/payload-job periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-1of3 periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-2of3 periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-3of3 periodic-ci-openshift-release-main-ci-4.23-e2e-gcp-ovn-rhcos10-techpreview periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 12, 2026

@petr-muller: trigger 3 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-aws-mco-disruptive-techpreview-1of3
  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive-techpreview-2of3
  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-azure-mco-disruptive-techpreview-3of3

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/95e71b80-1e3f-11f1-8aee-95beaf313f3e-0

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 12, 2026

@petr-muller: trigger 5 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-1of3
  • periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-2of3
  • periodic-ci-openshift-release-main-ci-4.23-e2e-aws-ovn-rhcos10-techpreview-serial-3of3
  • periodic-ci-openshift-release-main-ci-4.23-e2e-gcp-ovn-rhcos10-techpreview
  • periodic-ci-openshift-release-main-ci-4.22-e2e-azure-ovn-rhcos10-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9ae9e9f0-1e3f-11f1-9118-ee6d70fe4dbb-0

@petr-muller
Copy link
Member Author

/test e2e-aws-ovn-single-node-upgrade
/test e2e-hypershift-conformance

@petr-muller
Copy link
Member Author

/test ?

@petr-muller
Copy link
Member Author

/test e2e-gcp-ovn-techpreview
/test e2e-aws-ovn-single-node-techpreview

@petr-muller
Copy link
Member Author

Collection looks good:

Current default jobs will not collect anything: e2e-aws-ovn-fips artifact

    "os": {
        "Default": "",
        "MasterMCP": "",
        "WorkerMCP": ""
    },

Current TP jobs will see a default collected as rhel-9: e2e-gcp-ovn-techpreview artifact

    "os": {
        "Default": "rhel-9",
        "MasterMCP": "",
        "WorkerMCP": ""
    },

And finally, RHCOS10 TP jobs will see the default rhel-9 but both master and worker pools are rhel-10: e2e-aws-ovn-rhcos10-techpreview-serial-3of3 artifact:

    "os": {
        "Default": "rhel-9",
        "MasterMCP": "rhel-10",
        "WorkerMCP": "rhel-10"
    },

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants