[router] Improve HAR routing in Router #1414
Open
Update the Venice Router HAR least-loaded algorithm to take group latency into account, for the
following reasons:
a. When the total QPS is relatively low, load balancing based on pending request count doesn't
work well because the pending count is 0 most of the time. For example, if the average latency of
one group is 10ms and the group QPS is 10, then very likely there are pending requests only
10-20% of the time. Even if another group's latency is much lower (1-2ms), the faster group won't
be preferred. With this change, the latency-based LB algorithm kicks in whenever the pending count
is equal among all the groups and prefers the faster group.
b. Completely getting rid of pending-request-count-based LB would cause another issue: the relatively
slower group would get almost zero requests.
c. Combining pending request count and latency selects the faster groups when all groups are equally busy
or idle, while still sending some requests to a slower group when it is idle. This helps bring the
slower group back into the rotation once it has recovered from the slowness.
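
The selection rule described in (a) through (c) can be sketched roughly as follows. This is a minimal illustration, not the Venice Router code: it is written in Python for brevity, the names `GroupStats` and `select_group` are hypothetical, and it assumes the combination works as a strict tie-break (fewest pending requests first, lowest average latency second).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GroupStats:
    """Hypothetical per-group snapshot; field names are illustrative only."""
    group_id: int
    pending_requests: int
    avg_latency_ms: float


def select_group(groups: Sequence[GroupStats]) -> int:
    """Pick the group with the fewest pending requests.

    When pending counts are equal (common at low QPS, where they are
    mostly 0), fall back to the lowest average latency.
    """
    if not groups:
        raise ValueError("at least one group is required")
    # Tuples compare element by element, so latency only matters on a tie.
    best = min(groups, key=lambda g: (g.pending_requests, g.avg_latency_ms))
    return best.group_id


# Both groups idle: the faster group (1) wins the tie.
idle = [GroupStats(0, 0, 10.0), GroupStats(1, 0, 1.5)]
# Faster group busy, slower group idle: the slower group (0) still gets traffic.
mixed = [GroupStats(0, 0, 10.0), GroupStats(1, 2, 1.5)]
print(select_group(idle), select_group(mixed))
```

Under this assumption, a slow group keeps receiving requests whenever the faster groups have requests in flight, so its latency measurement stays current and it can win ties again after it recovers.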
How was this PR tested?
CI
Does this PR introduce any user-facing changes?