Backport router health checks #501

Serpentian · 2024-12-03T12:20:01Z

Backport of https://github.com/tarantool/vshard-ee/pull/20. No changes in the code. The commit message of the first commit was changed (the commit hash), as was the reason for NO_DOC in the second one.

This commit fixes the flakiness of the failover/cluster_changes test, which was caused by commit 9fc976d ("router: calls affect temporary prioritized replica"). That commit started checking `replica.net_sequential_ok` during `up_replica_priority`. However, while the instance is being configured, `net_sequential_ok` is 0, so when the failover fiber had not yet managed to ping the instance (e.g. the connection had not been created yet), the router raised the SUBOPTIMAL_REPLICA alert. Let's disable health checks during router configuration (when the prioritized replica is not set at all).

Closes tarantool#495

NO_DOC=bugfix
NO_TEST=<covered by failover/cluster_changes>

Before this patch the router didn't take the state of the storage's box.info.replication into account when routing requests to it. From now on the router automatically lowers the priority of a replica when it supposes that the connection from the master to the replica is dead (bad status or idle > 30 sec) or too slow (lag > 30 sec).

We also change REPLICA_NOACTIVITY_TIMEOUT from 5 minutes to 30 seconds. This is needed to speed up how quickly a replica notices a master change. Before the patch a non-master never knew where the master currently was. Now, since we check the status of the master's upstream, the replica needs to find the master in service_info via conn_manager. Since the replica doesn't make any further requests to the master after that, the connection is collected by conn_manager in collect_idle_conns after 30 seconds. Then the router's failover calls service_info one more time, and the non-master locates the master again, which may have changed by then.

This patch increases the consistency of read requests and decreases the probability of reading stale data.

Closes tarantool#453
Closes tarantool#487

NO_DOC=bugfix
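As a rough illustration of the health rule described in the second commit message, here is a minimal, language-neutral sketch in Python. It is not the vshard code: the `Upstream` type, the constant names, and the function names are hypothetical. Treating any status other than `follow` as "dead" is an assumption about what the status check means. Only the 30-second thresholds come from the description above.

```python
from dataclasses import dataclass

# Thresholds from the commit message (30 seconds each).
# The constant names are hypothetical, not taken from vshard.
MAX_IDLE_SEC = 30.0
MAX_LAG_SEC = 30.0


@dataclass
class Upstream:
    """Simplified view of a replica's upstream from the master."""
    status: str   # e.g. "follow", "disconnected", "stopped"
    idle: float   # seconds since the last event from the master
    lag: float    # replication lag in seconds


def is_upstream_healthy(upstream: Upstream) -> bool:
    """Return False when the link looks dead or too slow."""
    # Assumption: only the "follow" status counts as a live connection.
    if upstream.status != "follow":
        return False
    if upstream.idle > MAX_IDLE_SEC:
        return False
    if upstream.lag > MAX_LAG_SEC:
        return False
    return True


def order_by_health(replicas):
    """Order (name, priority, upstream) tuples so that healthy replicas
    come first; within each group a lower priority number wins."""
    return sorted(
        replicas,
        key=lambda r: (not is_upstream_healthy(r[2]), r[1]),
    )
```

In this sketch an unhealthy replica is not excluded, only moved behind the healthy ones, which mirrors "lowers the priority" rather than "stops routing to".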

Gerold103

Thanks for the port, and I am happy that you got this patch open-sourced!

Serpentian added 2 commits December 3, 2024 15:21

Serpentian force-pushed the gh-453-router-health-checks branch from aedd579 to 1cd42d6 Compare December 3, 2024 12:21

Serpentian requested a review from Gerold103 December 3, 2024 12:25

Serpentian assigned Gerold103 Dec 3, 2024

Gerold103 approved these changes Dec 3, 2024

Gerold103 merged commit e1c806e into tarantool:master Dec 3, 2024
8 checks passed
