3. Strategy Selection
Based on the above analysis, the three shedding strategies and two placement strategies can produce 3*2=6 combinations, but only two of them are recommended:
ThresholdShedder + LeastResourceUsageWithWeight
UniformLoadShedder + LeastLongTermMessageRate
Each choice has its own advantages and disadvantages, so choose according to the needs of your scenario. The following table summarizes the pros and cons of the two choices:
| Strategy | Adaptability to Heterogeneous Environments (Adaptability) | Adaptability to Load Fluctuations (Stability) | Over Placement (Correctness) | Over Unloading (Correctness) | Speed |
| --- | --- | --- | --- | --- | --- |
| ThresholdShedder + LeastResourceUsageWithWeight | Fair (note 1) | Good | Poor | Poor | Fair (note 3) |
| UniformLoadShedder + LeastLongTermMessageRate | Poor (note 2) | Poor | Good | Good | Fair (note 4) |
1. In terms of adaptability to heterogeneous environments, ThresholdShedder + LeastResourceUsageWithWeight can only be rated as fair. The reason is that ThresholdShedder cannot fully adapt to heterogeneous environments. Although it will not mistake a high-load broker for a low-load one, heterogeneity can still reduce its load balancing effectiveness.
For example, suppose the cluster has three brokers with resource usage rates of 10, 50, and 70, respectively. Broker1 and Broker2 are homogeneous, while Broker3 serves no traffic but has a resource usage rate of 70 because other processes are deployed on it. Ideally, Broker2 would shift some of its load to Broker1. However, the average load is 43.33, and with a shedding threshold of 10, 43.33 + 10 > 50, so Broker2 is not judged as overloaded. Broker3 is judged as overloaded but has no traffic to unload, so the load balancing algorithm ends up in a non-working state.
2. In the same scenario, the combination of UniformLoadShedder and LeastLongTermMessageRate makes the problem even more severe. Some load would be transferred from Broker2 to Broker3, resulting in a significant performance drop for all topics served by Broker3. Therefore, its adaptability is rated as poor.
Thus, it is not recommended to run Pulsar in a heterogeneous environment, as the current load balancing algorithms cannot adapt well.
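The three-broker example from notes 1 and 2 can be checked with a small sketch. This is a simplified model rather than Pulsar's actual implementation: it assumes a broker is judged overloaded when its usage exceeds the cluster average plus a threshold of 10, and that placement simply picks the broker with the lowest message rate. The function names and the message rates for Broker1 and Broker2 are hypothetical; only Broker3 being idle comes from the example.

```python
# Resource usage per broker, taken from the example above.
usage = {"Broker1": 10.0, "Broker2": 50.0, "Broker3": 70.0}

# Hypothetical message rates (msg/s). Only "Broker3 is idle" comes from the
# example; the other two values are made up for illustration.
msg_rate = {"Broker1": 1000.0, "Broker2": 5000.0, "Broker3": 0.0}

# Assumed shedding threshold, matching the "43.33 + 10" in the example.
THRESHOLD = 10.0


def overloaded_by_threshold(usage, threshold):
    """Simplified threshold check: overloaded if usage > average + threshold."""
    avg = sum(usage.values()) / len(usage)
    return [broker for broker, u in usage.items() if u > avg + threshold]


def least_message_rate(msg_rate):
    """Simplified placement: pick the broker with the lowest message rate."""
    return min(msg_rate, key=msg_rate.get)


avg = sum(usage.values()) / len(usage)
print(f"average usage: {avg:.2f}")
print("judged overloaded:", overloaded_by_threshold(usage, THRESHOLD))
print("placement target by message rate:", least_message_rate(msg_rate))
```

Under these assumptions, only Broker3 crosses the threshold even though it carries no traffic, and a placement based purely on message rate selects Broker3 as the target, which is the behavior the two notes describe.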
3. In terms of load balancing speed, although ThresholdShedder + LeastResourceUsageWithWeight can unload all high-load brokers at once, the historical weight algorithm seriously interferes with the accuracy of load balancing decisions. In practice, it takes multiple load balancing iterations to stabilize. Therefore, its load balancing speed is rated only as fair.
4. UniformLoadShedder + LeastLongTermMessageRate can only handle one overloaded broker at a time. Therefore, when there are many brokers, it takes a long time to complete load balancing. As a result, its load balancing speed is also rated as fair.
I have verified the above conclusions through experiments. You can refer to the AvgShedder PR, where I have posted the experimental process and results.