Skip to content

Fix global lock contention in DistClusterControllerStateModel caused by Optional.empty() singleton#3052

Merged
junkaixue merged 3 commits intoapache:masterfrom
LZD-PratyushBhatt:fix_global_lock_contention
Mar 7, 2026
Merged

Fix global lock contention in DistClusterControllerStateModel caused by Optional.empty() singleton#3052
junkaixue merged 3 commits intoapache:masterfrom
LZD-PratyushBhatt:fix_global_lock_contention

Conversation

@LZD-PratyushBhatt
Copy link
Contributor

@LZD-PratyushBhatt LZD-PratyushBhatt commented Jun 28, 2025

Issues

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Refer to _controllerOpt getting shared across resources #3050 for Problem description.
    Basically, all DistClusterControllerStateModel instances were synchronizing on the same global lock object due to Optional.empty() returning a singleton instance.
    Replaced synchronization on _controllerOpt (the shared singleton) with a dedicated per-instance lock object _controllerLock = new Object(). This eliminates cross-instance contention while preserving within-instance synchronization guarantees.
    (Write a concise description including what, why, how)

Tests

  • The following tests are written for this issue:
    Added a new test testNoSharedLockAcrossInstances
    (List the names of added unit/integration tests)

  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@LZD-PratyushBhatt LZD-PratyushBhatt changed the title Fix global lock contention in DistClusterControllerStateModel caused … Fix global lock contention in DistClusterControllerStateModel caused by Optional.empty() singleton Jun 28, 2025
@LZD-PratyushBhatt
Copy link
Contributor Author

@junkaixue @GrantPSpencer Can you take a look at this? Thanks!

Copy link
Contributor

@GrantPSpencer GrantPSpencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great catch on this, will greatly reduce topstate handoff when controller is processing multiple STs simultaneously

Just have 1 nit comment on test assertion

Copy link
Contributor

@GrantPSpencer GrantPSpencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
will still need committer approval, cc @junkaixue @xyuanlu

@LZD-PratyushBhatt
Copy link
Contributor Author

Hey @junkaixue can you take a look at this again if you get time, Thanks a lot!

@LZD-PratyushBhatt
Copy link
Contributor Author

Hey @junkaixue , @xyuanlu reminder on this one, Thanks for your time!

@junkaixue junkaixue merged commit c4b5d36 into apache:master Mar 7, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_controllerOpt getting shared across resources

4 participants