Add gcov-based test pruning with coverage cache by sbryngelson · Pull Request #1284 · MFlowCode/MFC

sbryngelson · 2026-03-02T00:58:01Z

Summary

File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp source files (gzip JSON, 11KB, committed to repo)
--only-changes flag prunes tests by intersecting PR-changed files against coverage cache
--build-coverage-cache flag + 3-phase parallel cache builder (prepare, run, gcov collect)
New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes
Dep-change detection greps PR/push diffs for added use/include statements
Conservative fallbacks: missing cache runs all, missing sim coverage includes test, ALWAYS_RUN_ALL files trigger full suite
Remove continue-on-error from github CI job (fixes auto-cancellation)
53 unit tests cover core coverage logic
TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI

Replaces #1283.

Test plan

CI lint checks pass
rebuild-cache job triggers (dep_changed detection)
Github test jobs download cache artifact and prune tests via --only-changes
Revert TEMP changes before merge

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds a gcov-derived, file-level coverage cache to enable pruning the MFC test suite to only those tests that exercise files changed in a PR, with CI support to rebuild and distribute the cache when it becomes stale.

Changes:

Introduces toolchain/mfc/test/coverage.py (cache build/load + changed-file detection + coverage-based test filtering) and commits a gzipped JSON cache.
Integrates new CLI flags --only-changes and --build-coverage-cache into the test runner and CLI schema.
Updates CI workflows/scripts to rebuild the cache on dependency graph changes and to run pruned test suites on PRs.

Reviewed changes

Copilot reviewed 14 out of 16 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`toolchain/mfc/test/coverage.py`	Implements coverage cache build/load + changed-file diffing + pruning logic (`ALWAYS_RUN_ALL` safeguards).
`toolchain/mfc/test/test_coverage_unit.py`	Adds offline unit tests covering diff parsing, gcov JSON parsing, cache normalization, and pruning behavior.
`toolchain/mfc/test/test.py`	Wires `--only-changes` pruning and `--build-coverage-cache` cache generation into the test command.
`toolchain/mfc/cli/commands.py`	Adds CLI arguments/examples for coverage cache building and pruning.
`toolchain/mfc/test/case.py`	Factors post-process output parameter mods into reusable constants/helpers used by the cache builder.
`toolchain/mfc/test/test_coverage_cache.json.gz`	Adds committed baseline coverage cache artifact.
`.github/workflows/test.yml`	Adds `rebuild-cache` job + dependency-change detection + artifact flow; enables pruning on PRs.
`.github/workflows/phoenix/rebuild-cache.sh`	SLURM-executed rebuild script: clean, gcov build, build cache in parallel.
`.github/workflows/phoenix/test.sh`	Enables pruning on PRs and scales CPU thread count based on allocation.
`.github/workflows/phoenix/submit.sh`	Adjusts CPU SLURM submission options for cache rebuild workload.
`.github/workflows/frontier/test.sh`	Enables pruning on PRs.
`.github/workflows/frontier_amd/test.sh`	Enables pruning on PRs.
`.github/file-filter.yml`	Adds `cases_py` filter to trigger cache rebuilds when test definitions change.
`CMakeLists.txt`	Improves gcov build reliability (disable `-march=native`/LTO for gcov; add Fypp line-marker flag).
`.gitignore`	Ignores legacy raw (non-gz) cache file.
`src/simulation/m_bubbles.fpp`	TEMP change: duplicated `use` for pipeline exercise.

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

src/simulation/m_bubbles.fpp (1)
19-19: Remove duplicate module import.

Line 19 repeats use m_helper_basic already declared on Line 18. This is redundant and can cause avoidable compiler warnings/noise across toolchains.
Proposed fix
-    use m_helper_basic
.github/workflows/phoenix/test.sh (1)
54-56: Potential issue with unset variable in arithmetic comparison.

When SLURM_CPUS_ON_NODE is unset, the comparison SLURM_CPUS_ON_NODE > 64 evaluates the unset variable as 0, but the default :-8 is only applied in the else branch. This works but is fragile. Consider applying the default first:
Suggested improvement
-# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
-# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
-n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))
+# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores).
+# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches.
+_cpus=${SLURM_CPUS_ON_NODE:-8}
+n_test_threads=$(( _cpus > 64 ? 64 : _cpus ))
toolchain/mfc/test/test_coverage_unit.py (1)
74-75: Use underscore-prefixed stub args to keep lint clean.

Optional cleanup to silence ARG005 without changing behavior.
♻️ Suggested tweak
-_case_stub.input_bubbles_lagrange = lambda case: None
-_case_stub.get_post_process_mods = lambda params: {}
+_case_stub.input_bubbles_lagrange = lambda _case: None
+_case_stub.get_post_process_mods = lambda _params: {}

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ee892c and dbe7098.

⛔ Files ignored due to path filters (1)

toolchain/mfc/test/test_coverage_cache.json.gz is excluded by !**/*.gz

📒 Files selected for processing (15)

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
src/simulation/m_bubbles.fpp
toolchain/mfc/cli/commands.py
toolchain/mfc/test/case.py
toolchain/mfc/test/coverage.py
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_unit.py

On GNR nodes (192 cores), $(nproc) returns 192 which overwhelms MPI daemons and causes SIGTERM (exit 143) during benchmarks. Master lands on a 24-core node and passes while PR lands on GNR and fails, making benchmarks appear broken by the PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Strip case.py and coverage.py from ALWAYS_RUN_ALL so CI exercises coverage-based pruning on this PR. - Add TEMP use statement to m_bubbles.fpp to demonstrate selective test execution (only ~69 bubble-related tests should run). - Set t_step_stop=1 in cache builder: one timestep exercises the same code paths while preventing heavy 3D tests from timing out under gcov instrumentation. Fixes 80 tests with missing sim coverage. - Update unit tests to match TEMP ALWAYS_RUN_ALL changes. Restore before merge: - coverage.py ALWAYS_RUN_ALL entries - m_bubbles.fpp use statement - unit test assertions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The --only-changes flag in self-hosted test scripts checks GITHUB_EVENT_NAME to decide whether to prune tests. But SLURM batch jobs run in a fresh shell without GitHub Actions env vars. Export the variable in the sbatch heredoc so it gets baked into the job script at submit time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New cache built from CI with t_step_stop=1 fix. All 459 non-example tests now have full simulation coverage (previously 80 were missing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ause Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…iagnostics The coverage cache Phase 2 was killed by OOM (exit code 137) when running 32 concurrent gcov-instrumented MPI processes. Reduce the cap from 32 to 16 workers. Also remove diagnostic logging added in the previous commit now that slug computation has been confirmed correct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Restore case.py and coverage.py to ALWAYS_RUN_ALL now that CI cache rebuild is confirmed working. Remove temporary 'use m_helper' from m_bubbles.fpp that was added to exercise coverage pruning. Fix unit tests to assert these files trigger the full suite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The qbmm example has x-coordinates from -1000 to 1000, which overflow the Fortran formatted write field width on gfortran 12.3.0, producing asterisks instead of numbers. This causes the packer to fail parsing the output. The test passes on newer gfortran (13+, 15) but fails on Phoenix GNR nodes with gfortran 12. Other qbmm tests still provide full coverage of the qbmm code paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rror handling - Fix __filter return type annotation to match actual tuple return - Detect removed use/include statements in dep_changed (match ^- in addition to ^+) - Wrap git subprocess calls in try/except for TimeoutExpired/OSError Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Auto-generated case-optimized files under build/staging/ never appear in PR diffs, so keeping them in the cache is noise. Filter at parse time and clean the existing cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Cut benchmark time steps from 60-70 to 20 (GPU) / 10 (CPU) — still sufficient for grind time measurement - Unify Frontier SLURM config: bench now uses CFD154/batch/normal like tests instead of ENG160/extended (2hr wall time vs 6hr) - Reduce CI timeout from 8hr to 4hr Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-05T14:09:36Z

Claude Code Review

Head SHA: 66c792a23ab4709c45979432efda463ad24e541a
Files changed: 25

Changed files

.github/file-filter.yml
.github/workflows/bench.yml
.github/workflows/frontier/bench.sh, submit.sh, test.sh
.github/workflows/phoenix/bench.sh, rebuild-cache.sh, submit.sh, test.sh
.github/workflows/test.yml
.gitignore, CMakeLists.txt
benchmarks/*/case.py (5 files — benchmark step fix)
toolchain/mfc/build.py, toolchain/mfc/cli/commands.py
toolchain/mfc/test/case.py, cases.py, coverage.py (new), test.py
toolchain/mfc/test/test_coverage_cache.json.gz (binary, new)
toolchain/mfc/test/test_coverage_unit.py (new, 664 lines)

Summary

Adds gcov-based test pruning: a file-level coverage cache maps each of 555 test UUIDs to the .fpp source files it exercises, and --only-changes intersects PR-changed files against the cache to skip irrelevant tests.
Three-phase parallel cache builder (prepare → run → gcov collect), new rebuild-cache CI job on Phoenix/SLURM, and conservative fallbacks (missing cache → run all; no sim coverage → include conservatively).
53 unit tests in test_coverage_unit.py cover the core filtering logic without requiring a build or network access.
Benchmarks refactored to use a ternary for t_step_stop/t_step_save when --steps is provided.

Findings

🔴 Security: Fork code runs with `contents: write` on self-hosted runner

.github/workflows/test.yml, rebuild-cache job (~line 274–306)

The rebuild-cache job checks out the PR head SHA from the fork (ref: github.event.pull_request.head.sha) and then executes:

bash .github/workflows/phoenix/submit.sh .github/workflows/phoenix/rebuild-cache.sh cpu none

Because the checkout fetches the fork's .github/workflows/phoenix/rebuild-cache.sh, a malicious fork can inject arbitrary commands into that script. The job runs on a persistent self-hosted Phoenix runner with contents: write permission, which allows pushing to refs/heads/master.

The trigger is pull_request (not pull_request_target), so the workflow file itself runs from the base branch — but the shell scripts invoked by that workflow come from the checked-out worktree, which is the fork's code. This is a supply-chain risk.

Mitigation options:

Only trigger rebuild-cache on push (master) and workflow_dispatch, never on pull_request.
Or: pin the scripts to run from the base-branch copy before checking out the PR head (two-step checkout, or use a separate step that downloads the scripts from the base).
Or: add explicit github.repository == 'MFlowCode/MFC' + github.event.pull_request.head.repo.full_name == 'MFlowCode/MFC' guards so the job is skipped entirely for forks.

🟡 TEMP changes must be reverted before merge (as noted in PR body)

toolchain/mfc/test/cases.py and toolchain/mfc/test/test_coverage_unit.py

The PR explicitly calls out two temporary changes that test the pipeline:

A duplicate use statement in m_bubbles.fpp to trigger dep-change detection.
Removal of CMakeLists.txt from ALWAYS_RUN_ALL, confirmed by the new unit test test_cmakelists_does_not_trigger_all which asserts should_run_all_tests({"CMakeLists.txt"}) is False.

The root-level CMakeLists.txt controls which source files are compiled and with what flags. A change there (e.g., adding a new .fpp to the build) can affect all tests without touching any .fpp file directly. Once the TEMP changes are reverted, CMakeLists.txt (and/or CMakeLists.txt prefix) should be back in ALWAYS_RUN_ALL.

🟡 Variable naming confusion in `build_coverage_cache`

toolchain/mfc/test/coverage.py, ~line 1189–1212

phase1_jobs = min(n_jobs, 16)
cons.print(f"... ({phase1_jobs} test workers, {n_jobs} gcov workers)...")
# ...
# Phase 1: single-threaded — no parallelism here
# Phase 2:
with ThreadPoolExecutor(max_workers=phase1_jobs) as pool:  # ← uses phase1_jobs
# Phase 3:
with ThreadPoolExecutor(max_workers=n_jobs) as pool:

Phase 1 is explicitly single-threaded; phase1_jobs is actually used as the Phase 2 worker count. The name is misleading — consider n_test_workers or max_concurrent_tests.

🟡 `cons.raw.file` redirect in `_prepare_test` is fragile

toolchain/mfc/test/coverage.py, ~lines 1143–1153

orig_file = cons.raw.file
cons.raw.file = io.StringIO()
try:
    ...
finally:
    cons.raw.file = orig_file

This suppresses Rich console output by swapping out its internal file handle. It depends on Rich's internal raw.file attribute being stable across Rich versions, which is not guaranteed. A cleaner approach would be to pass a quiet flag to get_inp() or capture at the subprocess level. The comment acknowledges it's not thread-safe, which is consistent with Phase 1 being single-threaded — just note this would be a silent regression if Phase 1 is ever parallelized.

🟢 Binary blob growth over time

toolchain/mfc/test/test_coverage_cache.json.gz

The gzip cache (11 KB now) is committed to the repo and regenerated on each push to master that touches cases.py or Fortran deps. Each regeneration creates a new git object. At ~11 KB per commit, this is negligible currently, but over hundreds of cache rebuilds it could accumulate. Git LFS or downloading the artifact separately (rather than committing to the repo) would avoid the accumulation. Not a blocker but worth tracking.

Minor nits

toolchain/mfc/test/coverage.py line 893: double blank line before _parse_gcov_json_output (style).
_prepare_test comment says # NOTE: not thread-safe — Phase 1 must remain single-threaded but doesn't explain why (the cons.raw.file swap). A one-liner cross-reference would help future readers.
_run_single_test_direct: ppn = test_info["ppn"] always resolves to 1 via getattr(case, 'ppn', 1) since TestCase objects don't have a ppn attribute. If multi-rank coverage matters, this should use the same ppn logic as the normal test runner.

Overall: The architecture is well thought-out and the conservative fallbacks are sound. The primary concern before merging is the fork-code execution issue in the rebuild-cache CI job, and ensuring the TEMP changes are reverted.

…rror handling

github-actions · 2026-03-05T17:55:44Z

Claude Code Review

Head SHA: 1bac876a65c0480ab4201bf795a07b00fd84148b
Files changed: 27

Changed files

.github/file-filter.yml
.github/scripts/retry-build.sh
.github/scripts/run-tests-with-retry.sh
.github/workflows/bench.yml
.github/workflows/frontier/{bench,submit,test}.sh
.github/workflows/phoenix/{bench,rebuild-cache,submit,test}.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
benchmarks/*/case.py  (5 files)
toolchain/mfc/build.py
toolchain/mfc/cli/commands.py
toolchain/mfc/test/{case,cases,test}.py
toolchain/mfc/test/coverage.py  (new, 767 lines)
toolchain/mfc/test/test_coverage_cache.json.gz  (new)
toolchain/mfc/test/test_coverage_unit.py  (new, 664 lines)

Summary

Adds gcov-based file-level coverage cache (555 test UUIDs -> exercised .fpp files, gzip JSON) committed to the repo
Introduces --only-changes flag to prune tests by intersecting changed .fpp files against the coverage cache; conservative fallbacks on missing/stale cache
Adds rebuild-cache CI job that builds MFC with --gcov, runs all tests to collect per-test coverage, and auto-commits the updated cache on push-to-master
Dep-change detection in the lint-gate step greps the PR/push diff for added/removed use/include statements to trigger a cache rebuild
53 unit tests cover core coverage-cache logic without requiring a real build or git state

Findings

1. TEMP code must be reverted before merge (blocker)

The PR description explicitly flags:

TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI

The test plan marks this as a required step ("Revert TEMP changes before merge"). The cases.py hunk also adds "1D_qbmm" to the example-skip list — it is unclear whether that is also TEMP. These must be cleaned up before merging.

2. Push-event dep-change detection silently fails on shallow clones

.github/workflows/test.yml — dep-check step:

DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null || echo "")

On the standard actions/checkout shallow clone (--depth 1), $BEFORE is not present in local history, so git diff fails, DIFF is empty, and dep_changed is set to false. The cache will not be rebuilt after a push that adds a use statement, silently leaving it stale.

For PR events the fallback is safe (dep_changed=true). The push-event path should match:

DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null) || {
  echo "git diff failed on push — defaulting to dep_changed=true for safety."
  echo "dep_changed=true" >> "$GITHUB_OUTPUT"
  exit 0
}

3. `--gcov` passthrough to `./mfc.sh test` may cause argument errors

.github/scripts/run-tests-with-retry.sh:

--test-all|--single|--debug|--gcov) PASSTHROUGH="$PASSTHROUGH $arg" ;;

--gcov is passed through to ./mfc.sh test, but test does not define a --gcov argument in cli/commands.py (only build does). The rebuild-cache.sh script also calls ./mfc.sh test --build-coverage-cache --gcov, which would fail if the test command rejects unknown flags. Verify that ARG("gcov") resolves correctly during a test run, or add a no-op --gcov argument to the test command definition.

4. Benchmark step count reductions change benchmark semantics

All 5 benchmark case.py files reduce t_step_stop/t_step_save from 7x(5*size+5) to 2x(5*size+5) (viscous from 6x to 2x). This 3-3.5x reduction in simulation length may reduce warmup time and steady-state sampling quality. If intentional for CI speed, add a comment to avoid confusion when interpreting benchmark results later.

5. Bare `SLURM_CPUS_ON_NODE` in arithmetic is fragile

.github/workflows/phoenix/test.sh:

n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))

The bare SLURM_CPUS_ON_NODE in the ternary guard evaluates to 0 when unset, so the fallback to 8 works by accident. Prefer consistent usage:

n_test_threads=$(( ${SLURM_CPUS_ON_NODE:-8} > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))

Positive observations

The github.repository == 'MFlowCode/MFC' guard on rebuild-cache correctly prevents untrusted fork code from running on the self-hosted Phoenix runner with write permissions.
Conservative fallbacks throughout coverage.py (missing cache -> full suite, UUID not in cache -> include test, missing sim coverage -> include test) are well thought out.
The _normalize_cache function for backwards compat with old line-level format is a nice touch.
Removing continue-on-error: true from the GitHub test job restores auto-cancellation correctly; Phoenix/Frontier already had retry logic.
664 lines of offline unit tests for coverage logic without requiring a real build or git state is good practice.

Generated by Claude Sonnet 4.6 via Claude Code

codecov · 2026-03-05T21:18:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.84%. Comparing base (2c3590c) to head (a84cabc).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1284      +/-   ##
==========================================
- Coverage   44.95%   44.84%   -0.12%     
==========================================
  Files          70       70              
  Lines       20503    20511       +8     
  Branches     1946     1946              
==========================================
- Hits         9217     9198      -19     
- Misses      10164    10191      +27     
  Partials     1122     1122

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…ogress counters - Remove SYSCHECK from cache-build targets (never built in this context, caused 459 spurious warnings per cache rebuild) - Break binary execution loop on missing binary or failed target to prevent downstream targets from producing init-only gcda files with garbage coverage - Warn with byte offset when gcov JSON parse truncates mid-stream - Fix Phase 2/3 progress counters to use len(test_infos)/len(test_results) instead of len(cases) so the final line prints correctly when tests are skipped - Include the actual exception in load_coverage_cache corrupt-cache warning - Add --only-changes to retry PASSTHROUGH so PR runs retry with same scope - Rename phase1_jobs -> phase2_jobs (Phase 1 is serial; cap is for Phase 2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The cache rebuild runs 459 tests which each write grid files, restart files, and simulation output to the shared scratch NFS filesystem. Previously these were never cleaned up, leaving several GB of data on NFS between the rebuild job finishing and the subsequent test jobs starting. This I/O pressure can destabilize the NFS mount and cause transient failures (missing silo files) in the main test jobs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T16:56:21Z

Claude Code Review

Head SHA: df97a8a
Files changed: 27

Files

.github/file-filter.yml
.github/scripts/retry-build.sh
.github/scripts/run-tests-with-retry.sh
.github/workflows/bench.yml
.github/workflows/frontier/bench.sh, submit.sh, test.sh
.github/workflows/phoenix/bench.sh, rebuild-cache.sh, submit.sh, test.sh
.github/workflows/test.yml
CMakeLists.txt
toolchain/mfc/cli/commands.py
toolchain/mfc/test/case.py, cases.py, coverage.py, test.py
toolchain/mfc/test/test_coverage_cache.json.gz (binary)
toolchain/mfc/test/test_coverage_unit.py
5 benchmark case.py files

Summary

Introduces file-level gcov coverage cache (555 tests → .fpp files) and --only-changes flag to prune CI test runs on PRs
New rebuild-cache CI job on Phoenix/SLURM runs a 3-phase parallel cache builder (prepare → run → gcov collect)
Dep-change detection greps PR/push diffs for use/include additions to detect stale coverage cache
53 unit tests cover core coverage logic with full offline mocking
Removes continue-on-error: true from the github matrix job; downstream jobs now depend on rebuild-cache via always() guard

Findings

[Must fix before merge] TEMP changes need revert (toolchain/mfc/test/cases.py, src/simulation/m_bubbles.fpp)
The PR description explicitly flags two TEMP changes that must be reverted before merge:

"duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI"

Neither the m_bubbles.fpp change nor the CMakeLists.txt removal from ALWAYS_RUN_ALL appears in the diff listing, suggesting they may already be reverted or were only in a prior iteration. Confirm both are cleaned up before merging. cases.py has a +1/-1 change (excluded test list tweak) that also warrants a second look.

[Security / CI] rebuild-cache runs PR head code on self-hosted runner with contents: write (.github/workflows/test.yml, lines ~305–358)

    permissions:
      contents: write   # Required for Commit Cache to Master on push events
    steps:
      - name: Clone
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

The job checks out github.event.pull_request.head.sha (untrusted fork code for external PRs) and executes it on the phoenix self-hosted runner. The contents: write permission is declared at the job level, so it's held even when triggered by pull requests. Because GITHUB_TOKEN permissions for pull_request from forks are automatically downgraded by GitHub, the write permission itself is inert for fork PRs — but untrusted code still runs on the self-hosted runner.

If this repo ever receives PRs from external contributors (not just org members), this is a supply-chain risk. Consider scoping contents: write to only the push/dispatch path, or adding a environment: restricted gate that requires maintainer approval before running the cache rebuild for fork PRs.

[Behavior change] continue-on-error removal from github job (.github/workflows/test.yml)

Previously the github matrix job (Ubuntu + macOS × compiler variants) had continue-on-error: true, which prevented a single flaky matrix entry from marking the overall workflow as failed. This is now removed. fail-fast: false is still set so all matrix entries still run, but the github job will now be marked as failed if any single matrix entry fails. If this job is a required status check for PR merging, intermittent macOS flakiness could start blocking merges. Intentional or worth a comment?

[Minor] phase2_jobs cap hardcoded at 16 (toolchain/mfc/test/coverage.py, line ~1264)

phase2_jobs = min(n_jobs, 16)

The 16-worker limit is reasonable for OOM protection but is not configurable. On nodes with large RAM (Phoenix GNR nodes have significant memory), the cap may be unnecessarily conservative. A comment explaining the memory estimate (e.g. "each MPI+gcov test uses ~2–5 GB") is present and helpful — consider making it an environment variable or CLI parameter for future tuning without a code change.

[Minor] Thread-safety note on console redirect in _prepare_test (toolchain/mfc/test/coverage.py, lines ~1216–1226)

orig_file = cons.raw.file
cons.raw.file = io.StringIO()
try:
    ...
finally:
    cons.raw.file = orig_file

The comment says "NOTE: not thread-safe — Phase 1 must remain single-threaded." This is correct and documented. Just flagging it as a fragility: if Phase 1 is ever parallelized (e.g., to speed up the 555-case prepare loop), this will produce garbled/lost console output. A thread-local or lock-based approach would be more robust long-term.

[Nit] Double space in YAML runner group (.github/workflows/test.yml, line ~321)

    runs-on:
      group:  phoenix   # two spaces before "phoenix"

Cosmetic, but inconsistent with the rest of the file.

Replace setup-build-cache.sh symlink mechanism with rm -rf build before each test run on Phoenix and Frontier. Benchmark jobs unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T18:45:05Z

Claude Code Review

Head SHA: a010a9a34d2afecb397ba0e432d6e203d28a6b59

Files changed: 28

Changed files

.github/file-filter.yml
.github/scripts/retry-build.sh
.github/scripts/run-tests-with-retry.sh
.github/workflows/bench.yml
.github/workflows/frontier/bench.sh, build.sh, submit.sh, test.sh
.github/workflows/phoenix/bench.sh, rebuild-cache.sh (new), submit.sh, test.sh
.github/workflows/test.yml
CMakeLists.txt
benchmarks/*/case.py (5 files)
toolchain/mfc/build.py, cli/commands.py
toolchain/mfc/test/case.py, cases.py, coverage.py (new), test.py, test_coverage_cache.json.gz (new), test_coverage_unit.py (new)

Summary

Adds gcov-based file-level coverage cache (11 KB gzip JSON, 555 test UUIDs) to prune CI test runs to only those touching changed .fpp files
New --only-changes CLI flag intersects PR-changed files against cached coverage map; conservative fallbacks (missing cache → run all, ALWAYS_RUN_ALL files → full suite)
New rebuild-cache CI job on Phoenix runs the full instrumented suite via SLURM and commits the updated cache to master on push events
Removes continue-on-error: true from the github matrix job (fixes auto-cancellation suppression); caps parallel jobs at 64 to avoid MPI daemon exhaustion on GNR nodes
53 unit tests cover the core coverage filter/parse logic in test_coverage_unit.py

Findings

1. TEMP changes must be reverted before merge (acknowledged in PR description)

The PR description explicitly lists two temporary items needed only to exercise the cache rebuild pipeline in CI:

Duplicate use statement in src/simulation/m_bubbles.fpp
Removal of CMakeLists.txt from ALWAYS_RUN_ALL

These must not be merged as-is. Consider using a separate test branch or a dummy file change that does not touch production Fortran source.

2. git diff "$BEFORE".."$AFTER" on push events has no safe fallback (test.yml, ~line 321)

The PR fallback is correctly dep_changed=true when gh pr diff fails on pull_request events. But for push events:

DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null || echo "")

\.event.before is the all-zeros SHA on the first push to a new branch or after a force push to master. git diff 0000000..HEAD fails silently and DIFF becomes empty, so dep_changed=false is emitted. If cases.py also did not change, rebuild-cache is skipped on master push — cache could remain stale after a force-push that restructures Fortran imports. Suggested fix: mirror the PR fallback:

DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null) || {
  echo "dep_changed=true" >> "$GITHUB_OUTPUT"; exit 0
}

3. rebuild-cache job checks out PR HEAD on a self-hosted runner (test.yml, rebuild-cache steps)

ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}

This runs code from the PR author's fork on the Phoenix self-hosted runner, which has contents: write permission. For pull_request events (not pull_request_target) from forks, GitHub restricts write-scope secrets automatically, so the push-to-master step is safely blocked. However, any collaborator with write access to MFlowCode/MFC who opens a PR could execute arbitrary code on Phoenix hardware. This is consistent with the existing self job pattern but is worth noting. Consider constraining this job to push/workflow_dispatch only (build the cache only post-merge, never from PR HEAD), or requiring a maintainer label before running.

4. github and self jobs now block on rebuild-cache (up to 4 hours) on any PR touching cases.py or Fortran imports (test.yml)

rebuild-cache has timeout-minutes: 240. When triggered, every downstream test job in the matrix must wait for it. For a PR that tweaks a test parameter in cases.py, this means up to 4 hours of additional CI delay before any tests run. Consider whether a shorter timeout or a pre-existing stale-but-valid cache is acceptable (i.e., only rebuild after merge, not on every PR).

5. Missing != 'failure' guard for rebuild-cache in downstream job conditions (test.yml, ~lines 395-455)

The conditions for github and self check:

needs.rebuild-cache.result != 'cancelled'

This correctly handles the skipped case ('skipped' != 'cancelled' → jobs run using the committed cache). However, it also allows the jobs to proceed when rebuild-cache has result == 'failure'. In that case, --only-changes is still passed and the stale committed cache is used. This is the intended conservative fallback, but it means a cache-rebuild failure is silent from the test job perspective — the jobs run but may run fewer tests than expected. A warning log step or an explicit comment to that effect would aid debugging.

Improvement Opportunities

GCOV_PREFIX_STRIP computation (coverage.py, _compute_gcov_prefix_strip): The comment says -1 excludes root '/' but the GCC manual counts the leading slash as one of the N stripped components. Validate on both Linux absolute paths and any unusual runner mount points (e.g., symlinked /home → /storage), as os.path.realpath is correctly used to resolve symlinks first.
export GITHUB_EVENT_NAME="$GITHUB_EVENT_NAME" in frontier/submit.sh and phoenix/submit.sh heredocs: the variable is already exported in the environment; the self-referential export is a no-op (though harmless). A comment clarifying that this propagates the variable into the SLURM job's inline script environment would improve readability.
Benchmark step-count reduction (5 benchmark case.py files, 7→2 multiplier): this is a 71% reduction in default benchmark steps. Confirm that the shortened runs still produce representative throughput numbers and that CI timing comparisons remain valid.

When the runner process is killed (exit 137) before the SLURM job completes, sacct is used to verify the job's final state. If the SLURM job completed with exit 0:0, the CI step passes regardless of the monitor's exit code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All three submit.sh scripts (phoenix, frontier, frontier_amd symlink) now call a single helper that wraps monitor_slurm_job.sh with sacct fallback: if the monitor is killed before the SLURM job completes, the helper re-checks the job's final state and exits 0 if it succeeded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When benchmarking master vs PR, submit_and_monitor_bench.sh was using the master directory's submit.sh for the master bench job. Master's submit.sh calls monitor_slurm_job.sh directly without SIGKILL recovery. When the monitor was killed (exit 137), the master bench YAML was never found. Fix: always use the PR's submit.sh (which calls run_monitored_slurm_job.sh with sacct fallback) for both master and PR bench submissions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T20:20:48Z

Claude Code Review

Head SHA: baa49bf4b8cca8b9d9102e4d547d5fc2d5be9f3c
Files changed: 30 | +1903 / -91

Files (30)

.github/file-filter.yml
.github/scripts/retry-build.sh
.github/scripts/run-tests-with-retry.sh
.github/scripts/run_monitored_slurm_job.sh  (new)
.github/workflows/bench.yml
.github/workflows/frontier/bench.sh
.github/workflows/frontier/build.sh
.github/workflows/frontier/submit.sh
.github/workflows/frontier/test.sh
.github/workflows/phoenix/bench.sh
.github/workflows/phoenix/rebuild-cache.sh  (new)
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
benchmarks/*/case.py  (5 files, t_step_stop multiplier 7x -> 2x)
toolchain/mfc/build.py
toolchain/mfc/cli/commands.py
toolchain/mfc/test/case.py
toolchain/mfc/test/cases.py
toolchain/mfc/test/coverage.py  (new, 795 lines)
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz  (binary, committed)
toolchain/mfc/test/test_coverage_unit.py  (new, 664 lines)

Summary

Introduces gcov-based file-level test pruning: build once with --gcov, map each of ~555 tests to the .fpp files it executes, cache that mapping, then intersect against PR diffs to skip unaffected tests.
Adds a rebuild-cache CI job (Phoenix SLURM, 240-min timeout) that regenerates the cache whenever cases.py or Fortran dependency graph (use/include diffs) changes.
Adds run_monitored_slurm_job.sh for resilient SLURM monitoring (handles runner SIGKILL before job completion).
Switches Phoenix CPU-test nodes to cpu-gnr (exclusive, 192-core Granite Rapids) and caps parallelism at 64 to avoid overwhelming OpenMPI daemons.
53 unit tests for core coverage logic (fully offline, no build required).

Findings

Must fix before merge

1. TEMP changes are committed — must be reverted (acknowledged in PR checklist)

The PR body explicitly marks two items as TEMP:

toolchain/mfc/test/cases.py — "1D_qbmm" added to casesToSkip to reduce cache-rebuild load in CI.
coverage.py — CMakeLists.txt removed from ALWAYS_RUN_ALL to trigger the dep-changed path during CI testing.

Both alter production behavior and must be reverted before merge.

2. Committed binary artifact built under TEMP conditions

toolchain/mfc/test/test_coverage_cache.json.gz was generated while 1D_qbmm was excluded and CMakeLists.txt was absent from ALWAYS_RUN_ALL. The cases_hash stored in its _meta will immediately mismatch the post-revert cases.py, causing load_coverage_cache to return None on every PR and silently fall back to the full test suite until a fresh rebuild-cache job runs. Consider committing a clean cache (built after reverting the TEMP changes) or documenting the expected cold-start window.

Notable issues

3. n_test_threads arithmetic inconsistency — phoenix/test.sh

# current (inconsistent references):
n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))

If SLURM_CPUS_ON_NODE is unset the condition evaluates 0 > 64 = false and the fallback :-8 fires correctly, so behavior is right. But the two references to the variable are inconsistent. Cleaner:

n_test_threads=$(( ${SLURM_CPUS_ON_NODE:-8} > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))

4. rebuild-cache.sh hardcodes -j 8 for the build — phoenix/rebuild-cache.sh line 19

retry_build ./mfc.sh build --gcov -j 8

On an exclusive 192-core GNR node the build phase uses only 8 of 192 cores. Using $NJOBS (already capped at 64) would cut build time significantly with no correctness risk for a CPU-only gfortran build.

5. Global state mutation to suppress console output in _prepare_test — coverage.py

orig_file = cons.raw.file
cons.raw.file = io.StringIO()
try:
    ...
finally:
    cons.raw.file = orig_file

This mutates a module-level singleton. Phase 1 is intentionally single-threaded so there is no race today, but redirecting cons.raw.file is undocumented Rich behavior. If cons.raw is ever replaced with a different Rich Console object the attribute may not exist. Consider rich.console.Console(quiet=True) as a context-local sink instead.

6. GITHUB_EVENT_NAME export in SLURM submit heredoc — phoenix/submit.sh line 80, frontier/submit.sh line 82

export GITHUB_EVENT_NAME="$GITHUB_EVENT_NAME"

Exporting the variable so SLURM propagates it into the batch job is correct (SLURM strips most CI env vars). If the variable is unset on a manual submit, an empty string is exported and test.sh conservatively runs the full suite — the right fallback. A brief inline comment explaining why the explicit re-export is needed would help future readers.

7. Post-revert CMakeLists.txt / ALWAYS_RUN_ALL alignment

After the TEMP revert, CMakeLists.txt presumably returns to ALWAYS_RUN_ALL. But the unit test test_cmakelists_does_not_trigger_all asserts the opposite. Either (a) revert the test to expect True, or (b) document the deliberate decision to exclude root CMakeLists.txt (covered by toolchain/cmake/ prefix) and keep the test as-is.

Minor observations

Phase 2 parallelism cap of 16 (coverage.py, phase2_jobs = min(n_jobs, 16)): with ~555 tests × 3 executables at ~10× gcov slowdown, validate empirically that the 240-minute SLURM timeout is sufficient after the TEMP revert restores the full test set.
Unit test import mechanism (importlib.util + exec fallback with string-substituted imports): functional but fragile if import lines in coverage.py are reformatted. A conftest.py that adds the repo root to sys.path would make the imports straightforward.
Benchmark t_step_stop reduction (7× → 2×): logical change, but unrelated to the coverage feature. Worth a separate commit/note if benchmark timing is being tracked historically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-03-06T20:26:34Z

Claude Code Review

Head SHA: a84cabc
Files changed: 30

Changed files

.github/file-filter.yml
.github/scripts/retry-build.sh
.github/scripts/run-tests-with-retry.sh
.github/scripts/run_monitored_slurm_job.sh (new)
.github/scripts/submit_and_monitor_bench.sh
.github/workflows/bench.yml
.github/workflows/frontier/{bench,build,submit,test}.sh
.github/workflows/phoenix/{bench,rebuild-cache,submit,test}.sh
.github/workflows/test.yml
CMakeLists.txt, .gitignore
benchmarks/*/case.py (5 files)
toolchain/mfc/build.py, toolchain/mfc/cli/commands.py
toolchain/mfc/test/{case,cases,coverage,test}.py + unit tests + cache gz

Summary

Introduces a 3-phase gcov-based coverage cache (prepare → run → collect) over 555 tests to enable PR-level test pruning via --only-changes.
Adds a rebuild-cache CI job on Phoenix that runs on cases.py or Fortran dep-graph changes; uploads artifact for PRs, commits to master on push.
Adds run_monitored_slurm_job.sh wrapping SLURM monitoring with sacct-based SIGKILL recovery.
Replaces build cache with clean rm -rf build on Phoenix/Frontier; adds GNR-partition SBATCH settings.
Ships 53-unit-test suite for the coverage module; includes 11 KB pre-built cache artifact.

Findings

🔴 Must fix before merge

1. TEMP changes not reverted — test plan checkbox unchecked

The PR body explicitly flags two TEMP changes that must be reverted before merge, and marks the step unchecked:

[ ] Revert TEMP changes before merge

Both are still present:

CMakeLists.txt missing from ALWAYS_RUN_ALL — The list in toolchain/mfc/test/coverage.py (added lines ~947–960) does not include CMakeLists.txt or toolchain/cmake/ entries. The PR body says this was removed temporarily to trigger the cache rebuild pipeline in CI. Without it, a PR that only changes CMakeLists.txt will skip all tests despite the build system changing.

Suggested fix: add "CMakeLists.txt" to ALWAYS_RUN_ALL (or add "toolchain/cmake/" to ALWAYS_RUN_ALL_PREFIXES):
```
ALWAYS_RUN_ALL = frozenset([
    ...
    "CMakeLists.txt",
])
```
"1D_qbmm" added to casesToSkip in cases.py:1074 — This silently suppresses the 1D_qbmm example test. If this was the duplicate-use TEMP change, it must be reverted; if it's a legitimate bug workaround it should be documented separately.

🟡 Design concerns

2. toolchain/mfc/test/test.py not in ALWAYS_RUN_ALL
toolchain/mfc/test/coverage.py (lines ~947–960) includes case.py, cases.py, coverage.py in ALWAYS_RUN_ALL, but not test.py. A PR that modifies the main test runner (test.py) while changing no .fpp files would skip all tests. Low-probability but worth closing.

3. Benchmark step counts reduced 3–3.5× without explanation
All 5 benchmark cases (5eq_rk3_weno3_hllc, hypo_hll, ibm, igr, viscous_weno5_sgb_acoustic) change the t_step_stop/t_step_save multiplier from 6–7 → 2. This is not mentioned in the PR summary. Fewer steps means less wall-clock time but weaker benchmark signal and potentially less statistical reliability. Is this intentional and permanent, or another temporary change?

4. rebuild-cache.sh: build parallelism hardcoded to -j 8
phoenix/rebuild-cache.sh line ~272 uses retry_build ./mfc.sh build --gcov -j 8 while NJOBS is computed from SLURM_CPUS_ON_NODE for tests. On GNR nodes with 192 cores, the build step uses only 8 threads. Consider using NJOBS (already capped at 64) or at least a higher fixed value to avoid a slow build step.

🟢 Minor / informational

5. GITHUB_EVENT_NAME propagation to SLURM jobs
Both phoenix/submit.sh and frontier/submit.sh add export GITHUB_EVENT_NAME="$GITHUB_EVENT_NAME" so that test.sh/rebuild-cache.sh can conditionally set prune_flag. If the variable is unset in the outer shell, it will be exported as empty, and prune_flag will remain "" (safe). The pattern works but relies on submit.sh being the entry point — direct SLURM invocations without it would silently fall back to full-suite (also safe).

6. retry-build.sh shellcheck disable: intentional
The change from eval "$clean_cmd" to $clean_cmd with a shellcheck disable=SC2086 comment is correct — word-splitting on RETRY_CLEAN_CMD is intended. The comment explains this clearly.

Before merging

Revert TEMP: add CMakeLists.txt back to ALWAYS_RUN_ALL (or an appropriate prefix)
Revert TEMP: remove "1D_qbmm" from casesToSkip (or document separately if intentional)
Clarify/confirm benchmark step-count reduction is intentional
Consider adding toolchain/mfc/test/test.py to ALWAYS_RUN_ALL

sbryngelson marked this pull request as ready for review March 2, 2026 03:49

Copilot AI review requested due to automatic review settings March 2, 2026 03:49

Copilot started reviewing on behalf of sbryngelson March 2, 2026 03:50 View session

Copilot AI reviewed Mar 2, 2026

View reviewed changes

coderabbitai bot reviewed Mar 2, 2026

View reviewed changes

sbryngelson force-pushed the gcov-test-pruning-v3 branch 2 times, most recently from fcfa309 to d6e922c Compare March 3, 2026 23:14

MFlowCode deleted a comment from github-actions bot Mar 5, 2026

MFlowCode deleted a comment from qodo-code-review bot Mar 5, 2026

MFlowCode deleted a comment from github-actions bot Mar 5, 2026

MFlowCode deleted a comment from coderabbitai bot Mar 5, 2026

MFlowCode deleted a comment from github-actions bot Mar 5, 2026

MFlowCode deleted a comment from codecov bot Mar 5, 2026

MFlowCode deleted a comment from github-actions bot Mar 5, 2026

sbryngelson and others added 12 commits March 5, 2026 09:03

Update coverage cache: 100% sim coverage with t_step_stop=1

187f3b8

New cache built from CI with t_step_stop=1 fix. All 459 non-example tests now have full simulation coverage (previously 80 were missing). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

DIAG: add slug diagnostic logging to identify binary-not-found root c…

3ab5bce

…ause Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Filter out build/staging paths from coverage cache

c4db356

Auto-generated case-optimized files under build/staging/ never appear in PR diffs, so keeping them in the cache is noise. Filter at parse time and clean the existing cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Harden cache push target and add safety comments

6d10fb7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sbryngelson force-pushed the gcov-test-pruning-v3 branch from b0e6e8a to 66c792a Compare March 5, 2026 14:05

sbryngelson added 2 commits March 5, 2026 11:21

Use retry_build in coverage cache rebuild for NFS resilience

b50e6ea

Address PR review feedback: fix type annotation, dep detection, and e…

1bac876

…rror handling

sbryngelson and others added 2 commits March 6, 2026 03:41

sbryngelson and others added 2 commits March 6, 2026 13:40

Remove persistent build cache for self-hosted test runners

44e15ef

Replace setup-build-cache.sh symlink mechanism with rm -rf build before each test run on Phoenix and Frontier. Benchmark jobs unaffected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Remove build cache from benchmark jobs on Phoenix and Frontier

a010a9a

sbryngelson and others added 3 commits March 6, 2026 15:16

Fix submit_and_monitor_bench.sh: define SCRIPT_DIR before use

a84cabc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add gcov-based test pruning with coverage cache#1284

Add gcov-based test pruning with coverage cache#1284
sbryngelson wants to merge 34 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v3

sbryngelson commented Mar 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

github-actions bot commented Mar 5, 2026

Uh oh!

codecov bot commented Mar 5, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

sbryngelson commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 5, 2026

Claude Code Review

Summary

Findings

🔴 Security: Fork code runs with contents: write on self-hosted runner

🟡 TEMP changes must be reverted before merge (as noted in PR body)

🟡 Variable naming confusion in build_coverage_cache

🟡 cons.raw.file redirect in _prepare_test is fragile

🟢 Binary blob growth over time

Minor nits

Uh oh!

github-actions bot commented Mar 5, 2026

Claude Code Review

Summary

Findings

1. TEMP code must be reverted before merge (blocker)

2. Push-event dep-change detection silently fails on shallow clones

3. --gcov passthrough to ./mfc.sh test may cause argument errors

4. Benchmark step count reductions change benchmark semantics

5. Bare SLURM_CPUS_ON_NODE in arithmetic is fragile

Positive observations

Uh oh!

codecov bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions bot commented Mar 6, 2026

Claude Code Review

Summary

Findings

Uh oh!

github-actions bot commented Mar 6, 2026

Claude Code Review

Summary

Findings

Improvement Opportunities

Uh oh!

github-actions bot commented Mar 6, 2026

Claude Code Review

Summary

Findings

Must fix before merge

Notable issues

Minor observations

Uh oh!

github-actions bot commented Mar 6, 2026

Claude Code Review

Summary

Findings

🔴 Must fix before merge

🟡 Design concerns

🟢 Minor / informational

Before merging

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

sbryngelson commented Mar 2, 2026 •

edited

Loading

🔴 Security: Fork code runs with `contents: write` on self-hosted runner

🟡 Variable naming confusion in `build_coverage_cache`

🟡 `cons.raw.file` redirect in `_prepare_test` is fragile

3. `--gcov` passthrough to `./mfc.sh test` may cause argument errors

5. Bare `SLURM_CPUS_ON_NODE` in arithmetic is fragile

codecov bot commented Mar 5, 2026 •

edited

Loading