feat: implement consecutive failures healthcheck #32

Open
nest-aka-swan wants to merge 1 commit into main from add-healthcheck-consecutive-failures

@nest-aka-swan

  • Adds a healthcheckConsecutiveFailures option that requires a connection to fail health checks N times in a row before being marked unhealthy, preventing flaps from removing healthy replicas from the pool

  • Introduces a setConnectionHealth helper on PGDispatcher that centralizes health state transitions and tracks consecutiveFailures per connection. When healthy = true, the counter resets; when healthy = false, the counter increments and only marks the connection unhealthy once the threshold is reached

  • The option is optional and backward-compatible: when unset (or 0), the behavior is the same as before
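A minimal sketch of the transition logic described above, written as a standalone function rather than the actual `PGDispatcher` method. The `ConnectionState` shape is hypothetical, and treating an unset or `0` threshold as "mark unhealthy on the first failure" is an assumption about what "the same as before" means:

```typescript
// Hypothetical per-connection state; the real PGDispatcher may store this differently.
interface ConnectionState {
  healthy: boolean;
  consecutiveFailures: number;
}

// Sketch of the helper described in this PR (standalone here for illustration).
function setConnectionHealth(
  conn: ConnectionState,
  healthy: boolean,
  healthcheckConsecutiveFailures?: number,
): void {
  // Assumption: unset or 0 behaves like a threshold of 1,
  // so a single failed check marks the connection unhealthy.
  const threshold = Math.max(1, healthcheckConsecutiveFailures ?? 1);

  if (healthy) {
    // A successful check resets the counter and restores the connection.
    conn.consecutiveFailures = 0;
    conn.healthy = true;
    return;
  }

  // A failed check increments the counter; the connection is only
  // marked unhealthy once the threshold is reached.
  conn.consecutiveFailures += 1;
  if (conn.consecutiveFailures >= threshold) {
    conn.healthy = false;
  }
}
```

With a threshold of 3, two failed checks in a row leave the connection in the pool, the third removes it, and any successful check in between starts the count over.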

@jhoncool
Collaborator

@nest-aka-swan
Thanks for the PR! The flapping problem is real.

However, I think before adding complexity to the logic, it's worth trying to tune healthcheckTimeout and healthcheckInterval first — flapping is often just a symptom of overly aggressive settings.

What real-world problem are you experiencing? Often flapping is caused by misconfigured connection limits rather than healthcheck sensitivity.

For example, if you have 3 database hosts (1 primary + 2 replicas), 20 Node.js pods, and pool.max = 10, you need at least 10 × 20 × 3 = 600 as the connection limit for your database user. If max_connections or the per-user connection limit in PostgreSQL is set lower than that, you'll get intermittent healthcheck failures that look like flapping.
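The arithmetic above can be sketched as a small helper. The `Topology` type and `requiredConnections` function are hypothetical names, and the sketch assumes every pod may open up to `pool.max` connections to each host, which is what the 10 × 20 × 3 figure implies. The per-host figure is included because `max_connections` is a per-server setting in PostgreSQL:

```typescript
// Hypothetical description of the deployment from the example above.
interface Topology {
  hosts: number;   // database hosts (primary + replicas)
  pods: number;    // application instances
  poolMax: number; // pool.max, assumed to apply per host within each pod
}

// Worst-case connection counts if every pod fills its pool on every host.
function requiredConnections(t: Topology): { perHost: number; total: number } {
  const perHost = t.pods * t.poolMax;
  return { perHost, total: perHost * t.hosts };
}

const budget = requiredConnections({ hosts: 3, pods: 20, poolMax: 10 });
console.log(budget); // { perHost: 200, total: 600 }
```

Which of the two numbers to compare against depends on where the limit is enforced in a given setup.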
