1. LeastLongTermMessageRate
Last updated
Was this helpful?
Last updated
Was this helpful?
The LeastLongTermMessageRate
strategy was initially designed to be used in conjunction with OverloadShedder
. When calculating the load scores for brokers, this strategy does not consider weight configurations (e.g., loadBalancerCPUResourceWeight
). Instead, it directly uses the maximum resource usage rates for CPU, direct memory, and network as the initial scores for brokers. Additionally, it reuses the loadBalancerBrokerOverloadedThresholdPercentage
configuration from OverloadShedder
. If the initial score exceeds the default value of 85% for this configuration, the score is set to INF
(i.e., infinity). Otherwise, the final score for the broker is calculated based on the message rate.
Finally, a broker with the smallest final score is randomly selected and returned. If all brokers have a score of INF, a broker is randomly chosen and returned.
Note: Since the score is of type double
, it is highly unlikely that multiple brokers will have the same score. Therefore, it can be considered that the broker with the smallest final score is always selected and returned as the new owner of the bundle.
In essence, the LeastLongTermMessageRate
scoring algorithm is based on message rate. Although the initial score is derived from the maximum resource usage, its purpose is to initially filter out overloaded brokers, which will not be returned as candidate brokers (unless all brokers are overloaded). Ultimately, the final score is calculated based on the message rate, and brokers are ranked according to this final score.
Since the LeastLongTermMessageRate
scoring is based on message rate, it faces the same issue as UniformLoadShedder
regarding heterogeneous environments (i.e., adaptability issues).
Reminder: As defined in Chapter 1, adaptability issues refer to the inability of the old algorithm to adapt to heterogeneous environments, where higher-performance machines cannot handle more traffic load compared to lower-performance machines.
So, to address the issue of heterogeneous environments, can the LeastLongTermMessageRate
scoring algorithm be adjusted to score based on resource usage?
The answer is no. Before providing a detailed answer to this question, let's first introduce the issue of over placement (as discussed in Chapter 1).
For example, suppose there are currently 11 brokers, all with resource usage around 50%, and the threshold is set to 70%. Since the cluster is not under heavy pressure, an attempt is made to scale down by 3 brokers. Consequently, all bundles on these three brokers will be unloaded and then loaded onto the remaining brokers. Because load data is updated periodically and cannot keep up with the speed of topic lookup and bundle loading (and due to the performance limitations of Zookeeper, it is also impossible to speed up), the load data for each broker in the leader broker's memory can be considered as unupdated in the short term. Therefore, during this period, the broker with the lowest score will always be the same broker, and all bundles will be loaded onto this low-load broker, causing it to become overloaded.
LeastLongTermMessageRate
addresses this issue by using data from PreallocatedBundleData
.
When calculating the final score for a broker, it not only uses the long-term message rate aggregated by the broker itself but also adds the message rate of the bundles that have been allocated to the broker but have not yet been included in the broker's load data. The final score calculation formula is: the sum of the long-term aggregated message in and out rates of the broker, plus the sum of the message in and out rates of the preallocated bundles. That is:
For example, suppose there are two bundles with a rate of 20KB/s each, and broker1 and broker2 with rates of 100KB/s and 110KB/s, respectively. When the first bundle is allocated, it is assigned to broker1. However, the load data of broker1 will not be updated in the short term. When it comes to allocating the second bundle, even though the load data of `broker1` has not changed, its score has already become 100 + 20 = 120KB/s. Therefore, this time the bundle is allocated to `broker2`, thus avoiding the over placement issue.
Now we can answer the question posed earlier: It is not possible to change the scoring to be based on resource usage because bundles only have statistical data for message rate and throughput, and it is impossible to predict how much resource usage will increase for a broker when a bundle is loaded. Therefore, it is also impossible to use data from PreallocatedBundleData
for estimation.
Note: The over placement issue is precisely what the subsequent algorithm LeastResourceUsageWithWeight
has to deal with because it scores based on resource usage.
Next, let's analyze the effects by combining the LoadSheddingStrategy
and the ModularLoadManagerStrategy
.
LeastLongTermMessageRate + OverloadShedder
This was the initial combination, but due to some inherent flaws in OverloadShedder
, it is not recommended.
LeastLongTermMessageRate + ThresholdShedder
This combination is even worse than LeastLongTermMessageRate + OverloadShedder
and is not recommended, though it is the default combination. Since OverloadShedder
scores brokers based on resource usage, while LeastLongTermMessageRate
scores based on message rate, the inconsistency in unloading and placement criteria is highly likely to lead to repeated load balancing executions. Bundles are unloaded but placed on the wrong broker (brokerX), causing brokerX to be judged as overloaded again during the next shedding execution. This is also the motivation for proposing the new placement strategy LeastResourceUsageWithWeight
.
LeastLongTermMessageRate + UniformLoadShedder
This is a more suitable combination and is recommended. It successfully avoids the over placement issue. Both the shedding and placement strategies use message rate as the scoring criterion. However, using message rate naturally leads to issues in heterogeneous environments (i.e., adaptability issues). Moreover, UniformLoadShedder
cannot handle short-term traffic fluctuations and lacks stability.