*: Implementation of weighted random shuffling (A113)#8864
*: Implementation of weighted random shuffling (A113)#8864easwars merged 8 commits intogrpc:masterfrom
Conversation
This is already taken care of by the xDS client
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8864 +/- ##
==========================================
+ Coverage 82.42% 83.17% +0.75%
==========================================
Files 414 414
Lines 32750 32755 +5
==========================================
+ Hits 26994 27245 +251
+ Misses 4096 4090 -6
+ Partials 1660 1420 -240
🚀 New features to boost your workflow:
|
| // will change the order of endpoints but not touch the order of the | ||
| // addresses within each endpoint. - A61 | ||
| if cfg.ShuffleAddressList { | ||
| endpoints = append([]resolver.Endpoint{}, endpoints...) |
There was a problem hiding this comment.
We should continue cloning the endpoints list here. Directly mutating the ResolverState can lead to data races if the caller of UpdateClientConnState reads the state concurrently. It would also be helpful to add a comment explaining this to prevent the accidental removal of the copy in the future.
There was a problem hiding this comment.
Argh. I feel like we need to either make resolver.State.Endpoints an immutable type (EndpointList) or guarantee it's deeply copied when resolver.State is copied.
arjan-bal
left a comment
There was a problem hiding this comment.
LGTM, with some minor comments.
| // Endpoints weights are the product of normalized locality weight and | ||
| // endpoint weight, represented as a fixed-point number in uQ1.31 format. | ||
| // Locality weights are normalized as: | ||
| // P1: locality 0: 80 / (100) = 0.8 |
There was a problem hiding this comment.
I believe locality 0 has weight 20 and locality 3 has weight 80. They seem to be swapped in this comment.
There was a problem hiding this comment.
OMG. This test is so messed up and I've made a copy of it, making it two problems instead of one. I've anyways filed #8868 to cleanup these tests.
| } | ||
| if diff := cmp.Diff(got, tt.wantConfig); diff != "" { | ||
| if diff := cmp.Diff(gotConfig, tt.wantConfig); diff != "" { | ||
| t.Errorf("localitiesToWeightedTarget() diff (-got +want) %v", diff) |
There was a problem hiding this comment.
The error message logs the wrong function name. Same issue a couple of lines below and in the corresponding test with weighted sorting enabled.
| } | ||
| resolverEndpoint = weight.Set(resolverEndpoint, weight.EndpointInfo{Weight: endpointWeight}) | ||
| } else { | ||
| resolverEndpoint = weight.Set(resolverEndpoint, weight.EndpointInfo{Weight: locality.Weight * endpoint.Weight}) |
There was a problem hiding this comment.
Not related to the changes introduced in this PR, but can't this multiplication result in an overflow since EndpointInfo.Weight is also a uint32?
There was a problem hiding this comment.
Yes, it definitely can overflow. But since no one has complained so far and we will get rid of this code path soonish (once we delete the env var), I'm Ok letting it be as-is.
| // will change the order of endpoints but not touch the order of the | ||
| // addresses within each endpoint. - A61 | ||
| if cfg.ShuffleAddressList { | ||
| endpoints = append([]resolver.Endpoint{}, endpoints...) |
There was a problem hiding this comment.
Argh. I feel like we need to either make resolver.State.Endpoints an immutable type (EndpointList) or guarantee it's deeply copied when resolver.State is copied.
| // Here, and in the "else" block below, we clone the endpoints | ||
| // slice to avoid mutating the resolver state. Doing the latter | ||
| // would lead to data races if the caller is accessing the same | ||
| // slice concurrently. |
There was a problem hiding this comment.
I think this is important even if it's not a concurrent update. We just shouldn't be mutating the incoming update's endpoints because that likely changes state in the parent.
| // in each locality, the probability of selecting the endpoint within the | ||
| // locality is 1. Therefore, the target fractions end up as 1/3 and 2/3 | ||
| // respectively. | ||
| const wantRPCs0 = float64(1) / float64(3) |
There was a problem hiding this comment.
FWIW you only need to float64 either the numerator or the denominator.
This PR implements the currently in-review gRFC A113: grpc/proposal#535. I've split the PR into logically separate commits to help with the review process. Summary of changes: - Commit 1: simplify the implementation of `groupLocalitiesByPriority` - Change the implementation to use newly added methods in the stdlib `maps` and `slices` package to significantly simplify the implementation (and get rid of an unnecessary test) - Commit 2: Remove code that handles localities and endpoints of weight 0 - Remove unnecessary checks for locality and endpoint weights of `0` in `cluster_resolver`. The xDS client already guarantees that these weights will never be set to `0`. - Commit 3: add the env var GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING - Commit 4: Weight computation changes in cluster_resolver LB policy - This performs the weight normalization and fixed-point arithmetic specified in A113 - The change here is guarded by the above env var - Ended up duplicating the tests that verify the weight computation behavior. This will make it easier to delete the old tests when the env var is removed. - Commit 5: Fix a broken test in ring_hash due to the new weight computation - Commit 6: Weighted shuffling in pick_first - Contains the changes specified in A113 for the pick_first LB policy - Changes are guarded by the env var RELEASE NOTES: - pickfirst: Add support for weighted random shuffling of endpoints, as described in gRFC A113
Original PR : #8864 RELEASE NOTES: - pickfirst: Add support for weighted random shuffling of endpoints, as described in gRFC A113 Co-authored-by: Easwar Swaminathan <easwars@google.com>
This PR implements the currently in-review gRFC A113: grpc/proposal#535. I've split the PR into logically separate commits to help with the review process. Summary of changes: - Commit 1: simplify the implementation of `groupLocalitiesByPriority` - Change the implementation to use newly added methods in the stdlib `maps` and `slices` package to significantly simplify the implementation (and get rid of an unnecessary test) - Commit 2: Remove code that handles localities and endpoints of weight 0 - Remove unnecessary checks for locality and endpoint weights of `0` in `cluster_resolver`. The xDS client already guarantees that these weights will never be set to `0`. - Commit 3: add the env var GRPC_EXPERIMENTAL_PF_WEIGHTED_SHUFFLING - Commit 4: Weight computation changes in cluster_resolver LB policy - This performs the weight normalization and fixed-point arithmetic specified in A113 - The change here is guarded by the above env var - Ended up duplicating the tests that verify the weight computation behavior. This will make it easier to delete the old tests when the env var is removed. - Commit 5: Fix a broken test in ring_hash due to the new weight computation - Commit 6: Weighted shuffling in pick_first - Contains the changes specified in A113 for the pick_first LB policy - Changes are guarded by the env var RELEASE NOTES: - pickfirst: Add support for weighted random shuffling of endpoints, as described in gRFC A113
This PR implements the currently in-review gRFC A113: grpc/proposal#535. I've split the PR into logically separate commits to help with the review process.
Summary of changes:
groupLocalitiesByPrioritymapsandslicespackage to significantly simplify the implementation (and get rid of an unnecessary test)0incluster_resolver. The xDS client already guarantees that these weights will never be set to0.RELEASE NOTES: