2. LeastResourceUsageWithWeight
2.1 Candidate Broker Pool
LeastResourceUsageWithWeight uses the same scoring algorithm as ThresholdShedder to score brokers, i.e., it calculates each broker's score with the historical weight algorithm based on resource usage and then computes the average score of all brokers. Finally, it selects candidate brokers based on the threshold configuration loadBalancerAverageResourceUsageDifferenceThresholdPercentage.
If a broker's score, when added to this threshold, is still not greater than the average score, then that broker will be added to the candidate broker list. After obtaining the candidate broker list, a broker is randomly selected from it; if there are no candidate brokers, a broker is randomly selected from all brokers.
For example, when the resource usage of Broker1 is 10%, Broker2 is 30%, and Broker3 is 80%, the average resource usage is 40%. In this case, the placement strategy can select Broker1 and Broker2 as candidates, because the threshold is 10 and both 10 + 10 ≤ 40 and 30 + 10 ≤ 40, which meets the screening criterion. As a result, the bundles unloaded from Broker3 can be distributed more evenly between Broker1 and Broker2, rather than being concentrated entirely on the lowest-loaded Broker1.
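To make this selection rule concrete, here is a minimal sketch (not the actual Pulsar implementation; the class and method names are made up for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the candidate-pool rule described above.
class CandidatePoolSketch {
    // threshold corresponds to loadBalancerAverageResourceUsageDifferenceThresholdPercentage
    static String selectBroker(Map<String, Double> brokerScores, double threshold) {
        double avg = brokerScores.values().stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);

        // A broker is a candidate if its score plus the threshold does not exceed the average.
        List<String> candidates = new ArrayList<>();
        brokerScores.forEach((broker, score) -> {
            if (score + threshold <= avg) {
                candidates.add(broker);
            }
        });

        // Fall back to a global random choice when the candidate pool is empty.
        List<String> pool = candidates.isEmpty()
                ? new ArrayList<>(brokerScores.keySet())
                : candidates;
        return pool.get(ThreadLocalRandom.current().nextInt(pool.size()));
    }

    public static void main(String[] args) {
        // The example above: scores 10, 30, 80 -> average 40, candidates are Broker1 and Broker2.
        Map<String, Double> scores = Map.of("Broker1", 10.0, "Broker2", 30.0, "Broker3", 80.0);
        System.out.println(selectBroker(scores, 10)); // prints Broker1 or Broker2
    }
}
```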
Recalling the over placement issue introduced earlier: since brokers are scored and ranked here based on resource usage, and it is not possible to estimate the load that a newly assigned bundle will add to a broker, always selecting the broker with the lowest score would trigger the over placement problem. To address this, LeastResourceUsageWithWeight introduces a new configuration, loadBalancerAverageResourceUsageDifferenceThresholdPercentage: it first selects a pool of candidate brokers and then randomly chooses one from the pool to avoid over placement.
However, in practice the parameter loadBalancerAverageResourceUsageDifferenceThresholdPercentage turns out to be very difficult to set to an appropriate value, so the global random-selection fallback is triggered frequently, which is undesirable. For example, if the current cluster has 6 brokers with scores of 40, 40, 40, 40, 69, and 70, the average score is 49.83. With the default configuration there are no candidate brokers, because 40 + 10 > 49.83. At this point the fallback global random selection is triggered, so a bundle may be unloaded from the high-load Broker5 to the equally high-load Broker6, or vice versa. Such incorrect load balancing decisions are precisely the root cause of clusters continuously executing load balancing operations; a good load balancing algorithm usually needs only one or two rounds of decisions to reach the desired state.
If you try to reduce the configuration value to expand the random pool, for example, by setting it to 0, some brokers in a high-load state may also be included in the candidate broker list. For example, if the current cluster has 5 brokers with scores of 10, 60, 70, 80, and 80, respectively, the average score is 60. With a configuration value of 0, Broker1 and Broker2 are both candidate brokers. In this case, if Broker2 takes on half of the unloaded traffic, it is highly likely to become overloaded.
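Feeding the two scenarios above into the same hypothetical helper makes both failure modes concrete:

```java
import java.util.Map;

// Builds on the hypothetical CandidatePoolSketch.selectBroker(...) from the earlier sketch.
class ThresholdScenarios {
    public static void main(String[] args) {
        // Scenario 1: scores 40, 40, 40, 40, 69, 70 with the default threshold of 10.
        // The average is 49.83 and 40 + 10 > 49.83, so the candidate pool is empty and
        // the global random fallback may pick even the heavily loaded Broker5 or Broker6.
        Map<String, Double> defaultThreshold = Map.of(
                "Broker1", 40.0, "Broker2", 40.0, "Broker3", 40.0,
                "Broker4", 40.0, "Broker5", 69.0, "Broker6", 70.0);
        System.out.println(CandidatePoolSketch.selectBroker(defaultThreshold, 10));

        // Scenario 2: scores 10, 60, 70, 80, 80 with the threshold lowered to 0.
        // The average is 60, so Broker1 (10) and Broker2 (60) both qualify as candidates,
        // even though Broker2 is already fairly loaded.
        Map<String, Double> zeroThreshold = Map.of(
                "Broker1", 10.0, "Broker2", 60.0, "Broker3", 70.0,
                "Broker4", 80.0, "Broker5", 80.0);
        System.out.println(CandidatePoolSketch.selectBroker(zeroThreshold, 0));
    }
}
```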
Therefore, the LeastResourceUsageWithWeight algorithm cannot avoid incorrect load balancing decisions (the over placement problem). Of course, if you are going to use the ThresholdShedder algorithm, the combination of ThresholdShedder + LeastResourceUsageWithWeight is still superior to ThresholdShedder + LeastLongTermMessageRate, because at least the scoring algorithm of LeastResourceUsageWithWeight is consistent with that of ThresholdShedder.
2.2 Historical Weight Algorithm
Finally, let's introduce the problems caused by the historical weight algorithm. The historical weight algorithm is used by both ThresholdShedder and LeastResourceUsageWithWeight, and the algorithm is as follows:
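The following is a minimal sketch of the update rule, reconstructed from the description in this section rather than copied from the Pulsar source; historyPercentage corresponds to loadBalancerHistoryResourcePercentage:

```java
// Sketch of the historical weight update used for broker scoring:
// an exponential moving average of the broker's highest resource usage.
class HistoricalWeightSketch {
    static double updatedScore(Double previousScore, double maxResourceUsage,
                               double historyPercentage) {
        if (previousScore == null) {
            // A broker with no history yet (e.g. freshly started) is scored by its current usage.
            return maxResourceUsage;
        }
        // The previous score dominates; current usage contributes only (1 - historyPercentage).
        return previousScore * historyPercentage + maxResourceUsage * (1 - historyPercentage);
    }
}
```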
The default value of historyPercentage is 0.9. It can be seen that the score calculated in the previous round has a significant impact on the current score, while the current maximum resource usage only accounts for a weight of 0.1. Short-term severe load fluctuations are therefore not immediately reflected in the load score, which solves the problem of load fluctuation. However, introducing this algorithm has significant side effects.
Consider, for example, the problem of over unloading: the current cluster contains one broker (Broker1) with a load of 90%. Since it has been running stably, its historical score is 90. Now a new broker (Broker2) is added with a current load of 10%. Then:
First round of shedding execution: Broker1 scores 90, Broker2 scores 10. For simplicity, assume the algorithm will move some bundles to make their scores the same. After the load migration is completed, the actual loads of Broker1 and Broker2 are both 50.
Second round of shedding execution: Broker1 scores 90*0.9 + 50*0.1 = 86, and Broker2 scores 10*0.9 + 50*0.1 = 14. Notice that Broker1's actual load is 50, but its load score is overestimated at 86, while Broker2's actual load is also 50 but is underestimated at 14! Although their actual loads are already the same, their scores still differ significantly, so traffic continues to be unloaded: 36 points of traffic are moved from Broker1 to Broker2, leaving Broker1 with a load of 14 and Broker2 with a load of 86. The loads are reversed!
Third round of shedding execution: Broker1 scores 86*0.9 + 14*0.1 = 78.8, and Broker2 scores 14*0.9 + 86*0.1 = 21.2. The algorithm still judges Broker1's load to be higher than Broker2's and continues to unload traffic from Broker1 to Broker2. As a result, all of Broker1's remaining load is unloaded, leaving Broker1 in the same state Broker2 was in when it first joined, carrying no bundles, while the entire load ends up on Broker2.
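These three rounds can be reproduced with a small simulation built on the sketch above. The shedding step here naively moves half of the score difference (capped by the source broker's actual load); that is a simplification for illustration, not Pulsar's exact shedding logic:

```java
// Reproduces the over-unloading example: Broker1 starts at 90% load with a stable
// history, Broker2 is new at 10% load. Scores follow the historical weight rule.
class OverUnloadingSimulation {
    public static void main(String[] args) {
        double history = 0.9;          // loadBalancerHistoryResourcePercentage
        double load1 = 90, load2 = 10; // actual loads of Broker1 / Broker2
        Double score1 = 90.0;          // Broker1 has been stable, so its history is 90
        Double score2 = null;          // Broker2 is new and has no history yet

        for (int round = 1; round <= 3; round++) {
            score1 = HistoricalWeightSketch.updatedScore(score1, load1, history);
            score2 = HistoricalWeightSketch.updatedScore(score2, load2, history);

            // Simplified shedding: move half of the score difference from Broker1 to Broker2,
            // but never more than Broker1 actually carries.
            double transfer = Math.min(load1, (score1 - score2) / 2);
            load1 -= transfer;
            load2 += transfer;

            System.out.printf("round %d: scores %.1f / %.1f, loads after shedding %.1f / %.1f%n",
                    round, score1, score2, load1, load2);
        }
        // Output: round 1 -> scores 90.0 / 10.0, loads 50.0 / 50.0
        //         round 2 -> scores 86.0 / 14.0, loads 14.0 / 86.0
        //         round 3 -> scores 78.8 / 21.2, loads  0.0 / 100.0
    }
}
```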
This example is an idealized theoretical analysis, but it still shows that the historical scoring algorithm severely misestimates a broker's actual load, especially when a broker's load increases or decreases sharply. Although it avoids the load-fluctuation problem, it introduces a more serious and widespread one: misestimating brokers' actual load and thereby triggering incorrect load balancing decisions. In this example, it leads to the problem of over unloading.
If you want to avoid this load-misestimation defect, you can set the configuration loadBalancerHistoryResourcePercentage to 0. However, the balancer will then no longer be able to smooth out load fluctuations. This is a dilemma between two desirable but incompatible options, and it is precisely the key reason why I developed the new algorithm AvgShedder, which achieves both.
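For reference, disabling the historical weighting is a single setting in broker.conf (with the fluctuation trade-off described above):

```properties
# broker.conf: score brokers by current resource usage only (default is 0.9)
loadBalancerHistoryResourcePercentage=0
```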
2.3 Algorithm Combinations
LeastResourceUsageWithWeight is specifically designed and implemented to be used in conjunction with ThresholdShedder. Therefore, if you want to use LeastResourceUsageWithWeight, it must be paired with ThresholdShedder.