Helix fails to connect with Kerberos-enabled ZK #3102
@junkaixue, could you please review this PR?
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
The failure in testEvacuateWithDisabledPartition(org.apache.helix.integration.rebalancer.TestInstanceOperation) appears unrelated to this change. I will retrigger the build.
The build passed successfully this time.
@arshadmohammad, please follow the check-in steps. We need the author to confirm that the PR is ready to check in and that no more changes are planned.
I confirm that this PR is ready for check-in and no further changes are required.
Thanks, @junkaixue, for reviewing and merging the PR. Should we also merge this change into the helix-1.3.x branch? If so, I can raise a PR for it.
The changes in this PR are also fully applicable to the helix-1.3.x branch.
Thank you, @arshadmohammad, for the fix. Hopefully this helps resolve the issue observed in #3071.
@arshadmohammad @vishalsuvagia, if you feel there is a need for a 1.3.x release, please send a request to dev@helix.apache.org. We can process the release with the backported change.
Issues
Description
Refer to #3101 for details on the issue.
Tests
I enabled ZooKeeper Kerberos authentication in the Quickstart sample app and verified the fix.
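The PR does not show how Kerberos was enabled for the Quickstart run. As a rough sketch only, one common way to turn on SASL/Kerberos for a ZooKeeper client JVM is to set the standard JAAS and ZooKeeper client system properties before any client is created. The class name `ZkKerberosClientConfigSketch` and the JAAS file path are hypothetical; `Client` is ZooKeeper's default JAAS login context name.

```java
// Hypothetical helper, not part of Helix or of this PR.
final class ZkKerberosClientConfigSketch {

    // Must run before the first ZooKeeper/Helix client is constructed,
    // because the client reads these properties when it starts.
    static void configure(String jaasConfigPath) {
        // Standard JVM property pointing at the JAAS login configuration file.
        System.setProperty("java.security.auth.login.config", jaasConfigPath);
        // ZooKeeper client property that enables SASL authentication.
        System.setProperty("zookeeper.sasl.client", "true");
        // Name of the JAAS section the ZooKeeper client should use.
        System.setProperty("zookeeper.sasl.clientconfig", "Client");
    }

    public static void main(String[] args) {
        // Assumed path, for illustration only.
        configure("/tmp/zk-client-jaas.conf");
        System.out.println(System.getProperty("zookeeper.sasl.client"));
    }
}
```

With this configuration, a successful login causes the ZooKeeper client to report a `SaslAuthenticated` event in addition to `SyncConnected`, which is the situation the tests listed in this PR exercise.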
Quickstart Output Before Fix:
Creating cluster: HELIX_QUICKSTART
Adding 2 participants to the cluster
Added participant: localhost_12000
Added participant: localhost_12001
Configuring StateModel: MyStateModel with 1 Leader and 1 Standby
Adding a resource MyResource: with 6 partitions and 2 replicas
Starting Participants
ERROR ZKHelixManager zkClient is not connected after waiting 10000ms., clusterName: HELIX_QUICKSTART, zkAddress: sl73tskrapd1044.visa.com:2181
ERROR ZKHelixManager fail to createClient. retry 1
org.apache.helix.HelixException: HelixManager is not connected within retry timeout for cluster HELIX_QUICKSTART
at org.apache.helix.manager.zk.ZKHelixManager.checkConnected(ZKHelixManager.java:417)
at org.apache.helix.manager.zk.ZKHelixManager.getConfigAccessor(ZKHelixManager.java:688)
at org.apache.helix.manager.zk.ParticipantManager.<init>(ParticipantManager.java:118)
at org.apache.helix.manager.zk.ZKHelixManager.handleNewSessionAsParticipant(ZKHelixManager.java:1441)
at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1391)
at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:783)
at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:818)
at org.apache.helix.examples.Quickstart$MyProcess.start(Quickstart.java:247)
at org.apache.helix.examples.Quickstart.startNodes(Quickstart.java:146)
at org.apache.helix.examples.Quickstart.main(Quickstart.java:164)
ERROR ZKHelixManager fail to createClient. retry 2
org.apache.helix.zookeeper.zkclient.exception.ZkTimeoutException: Waiting to be connected to ZK server has timed out.
at org.apache.helix.zookeeper.zkclient.ZkClient.waitForEstablishedSession(ZkClient.java:2082)
at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:776)
at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:818)
at org.apache.helix.examples.Quickstart$MyProcess.start(Quickstart.java:247)
at org.apache.helix.examples.Quickstart.startNodes(Quickstart.java:146)
at org.apache.helix.examples.Quickstart.main(Quickstart.java:164)
Quickstart Output After Fix:
Creating cluster: HELIX_QUICKSTART
Adding 2 participants to the cluster
Added participant: localhost_12000
Added participant: localhost_12001
Configuring StateModel: MyStateModel with 1 Leader and 1 Standby
Adding a resource MyResource: with 6 partitions and 2 replicas
Starting Participants
Started Participant: localhost_12000
Started Participant: localhost_12001
Starting Helix Controller
LeaderStandbyStateModel.onBecomeStandbyFromOffline():localhost_12000 transitioning from OFFLINE to STANDBY for MyResource MyResource_1
LeaderStandbyStateModel.onBecomeStandbyFromOffline():localhost_12000 transitioning from OFFLINE to STANDBY for MyResource MyResource_4
LeaderStandbyStateModel.onBecomeStandbyFromOffline():localhost_12000 transitioning from OFFLINE to STANDBY for MyResource MyResource_3
LeaderStandbyStateModel.onBecomeStandbyFromOffline():localhost_12000 transitioning from OFFLINE to STANDBY for MyResource MyResource_5
org.apache.helix.zookeeper.impl.client.TestRawZkClient
#testWaitForKeeperStateWithSaslAuthenticated
#testWaitForKeeperStateWithConnectedReadOnly
#testWaitForKeeperStateWithOtherStates
#testWaitForKeeperStateExactMatchStillWorks
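The actual change to `ZkClient.java` is not reproduced on this page. Judging only from the test names above, the idea seems to be that a wait for `SyncConnected` should also be satisfied when the client reports `SaslAuthenticated` (and possibly `ConnectedReadOnly`), while any other state still requires an exact match. The following is a self-contained sketch of that matching rule under those assumptions. The `State` enum is a hypothetical stand-in for ZooKeeper's `Watcher.Event.KeeperState`, and `satisfies` is not a Helix method.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration; not the code merged in this PR.
final class KeeperStateMatchSketch {

    // Stand-in for org.apache.zookeeper.Watcher.Event.KeeperState.
    enum State {
        Disconnected, SyncConnected, AuthFailed,
        ConnectedReadOnly, SaslAuthenticated, Expired, Closed
    }

    // Assumption: states in which the session is usable when the caller
    // asked to wait for SyncConnected.
    private static final Set<State> CONNECTED_EQUIVALENTS =
        EnumSet.of(State.SyncConnected, State.ConnectedReadOnly, State.SaslAuthenticated);

    // Returns true if the current state should end a wait for the expected state.
    static boolean satisfies(State expected, State current) {
        if (expected == current) {
            return true; // an exact match always works
        }
        return expected == State.SyncConnected && CONNECTED_EQUIVALENTS.contains(current);
    }

    public static void main(String[] args) {
        System.out.println(satisfies(State.SyncConnected, State.SaslAuthenticated)); // true
        System.out.println(satisfies(State.SyncConnected, State.Disconnected));      // false
        System.out.println(satisfies(State.Expired, State.Expired));                 // true
    }
}
```

A strict equality check would explain the "before" output: with Kerberos enabled, the last state seen by the client can be `SaslAuthenticated` rather than `SyncConnected`, so a wait for `SyncConnected` would time out even though the session is established.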