Skip to content

fix: Strict raft state#3167

Merged
alpe merged 2 commits intomainfrom
alex/raft_constraint
Mar 16, 2026
Merged

fix: Strict raft state#3167
alpe merged 2 commits intomainfrom
alex/raft_constraint

Conversation

@alpe
Copy link
Contributor

@alpe alpe commented Mar 16, 2026

Overview

Ensure new raft state is increasing time and height. Not enforcing height + 1

Summary by CodeRabbit

  • Bug Fixes

    • Strengthened state transition validation logic to more strictly enforce correct progression sequences, improving detection of invalid state combinations and ensuring more robust state management.
  • Tests

    • Added comprehensive test suite for state validation covering multiple scenarios: normal progressions, equal values, decreasing sequences, and other edge cases.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 16, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedMar 16, 2026, 4:07 PM

@claude
Copy link
Contributor

claude bot commented Mar 16, 2026

Claude finished @alpe's task in 1m 29s —— View job


Code Review

  • Gather context and understand the PR
  • Read changed files
  • Review code quality and correctness
  • Check error handling and edge cases
  • Review tests
  • Post final review

The change is small and focused: assertValid now rejects non-increasing (equal or decreasing) height and timestamp transitions instead of only rejecting strictly decreasing ones. This is the correct semantics for a monotonically-advancing block state machine.

pkg/raft/types.go

Bug: Misleading error messages (lines 15, 18)

The conditions were tightened to >=, but the error message strings still use >:

if s.Height >= next.Height {
    return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)   // wrong: says > but fires on ==
}
if s.Timestamp >= next.Timestamp {
    return fmt.Errorf("invalid timestamp: %d > %d", s.Timestamp, next.Timestamp)  // same issue
}

When s.Height == next.Height == 10, the error reads "invalid height: 10 > 10", which is factually incorrect and will confuse anyone debugging. CodeRabbit already flagged this.

Suggested fix: Fix this →

if s.Height >= next.Height {
    return fmt.Errorf("invalid height: current %d >= next %d", s.Height, next.Height)
}
if s.Timestamp >= next.Timestamp {
    return fmt.Errorf("invalid timestamp: current %d >= next %d", s.Timestamp, next.Timestamp)
}

Caller context (node.go:315)

assertValid is called from FSM.Apply against f.state.Load(), which starts as &RaftBlockState{} (Height=0, Timestamp=0). This means the very first applied log entry must have Height > 0 and Timestamp > 0. That seems correct for a block-producing sequencer, but it's worth confirming this is always true — e.g., can a node ever apply a genesis-like entry with Timestamp=0?

Return value of Apply on validation failure

When assertValid returns an error, FSM.Apply returns that error. Per the hashicorp/raft documentation, returning a non-nil error from Apply does not stop the cluster — it is returned to the leader's raft.Apply().Response() and the log entry is still committed. If a follower rejects the entry here, its state will diverge silently (the entry is considered applied, but f.state is not updated). This is an existing concern, not introduced by this PR, but strengthening the guard makes it more likely to fire in practice, so it's worth noting.

pkg/raft/types_test.go

The table-driven test is well-structured and covers the key boundary cases introduced by this change. A few observations:

  1. Missing: initial state test. The FSM initialises with &RaftBlockState{} (Height=0, Timestamp=0). There is no test case verifying that a valid first block (e.g., Height=1, Timestamp=1) passes assertValid from the zero state. This is the most common entry point in production. Fix this →

  2. Missing: height gap test. The function comment says "does not ensure that no gaps exist", but there is no test confirming a gap (e.g., Height 10 → 15) is accepted. This would serve as documentation that gaps are intentional.

  3. Error message not asserted. Tests use require.Error(t, err) without checking the message content. Given the misleading message bug above, asserting require.ErrorContains(t, err, ">=") or similar would catch message regressions.

Summary

Severity Issue
Minor bug Error messages say > but condition is >= — misleading in logs and debugging
Test gap No test for FSM initial state (Height=0, Timestamp=0 → first block)
Test gap No test confirming height gaps are allowed (as stated in the comment)
Pre-existing FSM.Apply returning error on invalid state may silently diverge followers

The core logic change (>>=) is correct and the test suite is a solid foundation. The error message fix and two additional test cases would make this production-ready.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 16, 2026

Warning

Rate limit exceeded

@alpe has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 33 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa00e013-5198-4138-9623-10ac0dc25147

📥 Commits

Reviewing files that changed from the base of the PR and between f1b0dbe and 340a33b.

📒 Files selected for processing (1)
  • CHANGELOG.md
📝 Walkthrough

Walkthrough

Modified validation boundary checks in RaftBlockState transitions from strict (>) to non-strict (>=) comparisons for height and timestamp fields. Added comprehensive test coverage for the updated assertValid function with scenarios covering normal progression, equal states, and boundary conditions.

Changes

Cohort / File(s) Summary
Validation Logic Update
pkg/raft/types.go
Changed height and timestamp boundary checks from strict (>) to non-strict (>=), making equal or non-increasing sequences invalid for RaftBlockState transitions.
Test Coverage
pkg/raft/types_test.go
Added TestAssertValid with comprehensive test scenarios covering normal progression, equal current/next states, equal height, equal timestamp, lower timestamp, and lower height cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A hop, a skip, the bounds now tight,
No equal states in Raft's height,
Validation strict, timestamps true,
Tests confirm what we shall do! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'fix: Strict raft state' is vague and doesn't clearly convey the specific change made. It mentions 'strict' but the actual change relaxes constraints (>= instead of >), which is contradictory. Use a more descriptive title that accurately reflects the change, such as 'fix: Enforce monotonic raft state transitions' or 'fix: Require increasing height and timestamp in raft state'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The description is present but minimal, lacking detailed context, rationale, and reasoning for the specific validation changes made to boundary checks.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch alex/raft_constraint
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.24%. Comparing base (eabde1a) to head (340a33b).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3167      +/-   ##
==========================================
+ Coverage   60.20%   60.24%   +0.03%     
==========================================
  Files         115      115              
  Lines       11902    11902              
==========================================
+ Hits         7166     7170       +4     
+ Misses       3925     3921       -4     
  Partials      811      811              
Flag Coverage Δ
combined 60.24% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/raft/types_test.go (1)

9-44: Good test coverage; consider adding edge cases for initial state and height gaps.

The table-driven tests cover the core validation scenarios well. Given the context from node.go where FSM starts with an empty RaftBlockState{} (Height=0, Timestamp=0), and the PR description noting that height gaps are allowed, consider adding these cases:

Suggested additional test cases
 		"lower height": {
 			s:      &RaftBlockState{Height: 11, Timestamp: 100},
 			next:   &RaftBlockState{Height: 10, Timestamp: 101},
 			expErr: true,
 		},
+		"from initial state": {
+			s:    &RaftBlockState{Height: 0, Timestamp: 0},
+			next: &RaftBlockState{Height: 1, Timestamp: 100},
+		},
+		"height gap allowed": {
+			s:    &RaftBlockState{Height: 10, Timestamp: 100},
+			next: &RaftBlockState{Height: 15, Timestamp: 200},
+		},
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/raft/types_test.go` around lines 9 - 44, Add two table-driven cases to
TestAssertValid for the initial/zero-state and for allowed height gaps: include
a case where s is &RaftBlockState{} (Height:0, Timestamp:0) and next is a valid
non-zero state (e.g., Height>0, Timestamp>0) expecting no error, and a case
where s has Height N and next has Height M where M > N+1 (a height gap) also
expecting no error; update the specs map in TestAssertValid to reference
RaftBlockState for these scenarios so the test covers FSM initial state and
confirms height gaps are permitted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/raft/types.go`:
- Around line 14-18: The error messages are misleading because the checks use >=
but the messages say '>' for s.Height/next.Height and
s.Timestamp/next.Timestamp; update the fmt.Errorf strings in the comparisons
that use s.Height >= next.Height and s.Timestamp >= next.Timestamp to reflect >=
(e.g. "invalid height: %d >= %d" and "invalid timestamp: %d >= %d") or use
wording like "not less than" so the message accurately matches the condition for
the symbols s.Height, next.Height, s.Timestamp and next.Timestamp.

---

Nitpick comments:
In `@pkg/raft/types_test.go`:
- Around line 9-44: Add two table-driven cases to TestAssertValid for the
initial/zero-state and for allowed height gaps: include a case where s is
&RaftBlockState{} (Height:0, Timestamp:0) and next is a valid non-zero state
(e.g., Height>0, Timestamp>0) expecting no error, and a case where s has Height
N and next has Height M where M > N+1 (a height gap) also expecting no error;
update the specs map in TestAssertValid to reference RaftBlockState for these
scenarios so the test covers FSM initial state and confirms height gaps are
permitted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3087cb86-9ddc-4b55-baaf-0210abaccc0d

📥 Commits

Reviewing files that changed from the base of the PR and between eabde1a and f1b0dbe.

📒 Files selected for processing (2)
  • pkg/raft/types.go
  • pkg/raft/types_test.go

Comment on lines +14 to 18
if s.Height >= next.Height {
return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)
}
if s.Timestamp > next.Timestamp {
if s.Timestamp >= next.Timestamp {
return fmt.Errorf("invalid timestamp: %d > %d", s.Timestamp, next.Timestamp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Error messages are misleading for equality cases.

The conditions now use >= to reject equal values, but the error messages still show >. When s.Height == next.Height, the message "invalid height: 10 > 10" is confusing because 10 is not greater than 10.

Proposed fix to clarify error messages
 func assertValid(s *RaftBlockState, next *RaftBlockState) error {
 	if s.Height >= next.Height {
-		return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)
+		return fmt.Errorf("invalid height: current %d >= next %d", s.Height, next.Height)
 	}
 	if s.Timestamp >= next.Timestamp {
-		return fmt.Errorf("invalid timestamp: %d > %d", s.Timestamp, next.Timestamp)
+		return fmt.Errorf("invalid timestamp: current %d >= next %d", s.Timestamp, next.Timestamp)
 	}
 	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if s.Height >= next.Height {
return fmt.Errorf("invalid height: %d > %d", s.Height, next.Height)
}
if s.Timestamp > next.Timestamp {
if s.Timestamp >= next.Timestamp {
return fmt.Errorf("invalid timestamp: %d > %d", s.Timestamp, next.Timestamp)
if s.Height >= next.Height {
return fmt.Errorf("invalid height: current %d >= next %d", s.Height, next.Height)
}
if s.Timestamp >= next.Timestamp {
return fmt.Errorf("invalid timestamp: current %d >= next %d", s.Timestamp, next.Timestamp)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/raft/types.go` around lines 14 - 18, The error messages are misleading
because the checks use >= but the messages say '>' for s.Height/next.Height and
s.Timestamp/next.Timestamp; update the fmt.Errorf strings in the comparisons
that use s.Height >= next.Height and s.Timestamp >= next.Timestamp to reflect >=
(e.g. "invalid height: %d >= %d" and "invalid timestamp: %d >= %d") or use
wording like "not less than" so the message accurately matches the condition for
the symbols s.Height, next.Height, s.Timestamp and next.Timestamp.

@alpe alpe added this pull request to the merge queue Mar 16, 2026
Merged via the queue into main with commit 9dfa0d5 Mar 16, 2026
30 of 31 checks passed
@alpe alpe deleted the alex/raft_constraint branch March 16, 2026 17:00
alpe added a commit that referenced this pull request Mar 17, 2026
* main:
  chore: minor changes from audit tool (#3165)
  build(deps): Bump actions/download-artifact from 8.0.0 to 8.0.1 in the patch-updates group (#3169)
  fix: Strict raft state (#3167)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants