3. Strategy Selection

Based on the above analysis, although we have three shedding strategies and two placement strategies, which can produce 3*2=6 combinations, in fact, we only have two recommended choices:

  • ThresholdShedder + LeastResourceUsageWithWeight

  • UniformLoadShedder + LeastLongTermMessageRate

Both choices have their own advantages and disadvantages. You can choose according to the specific scenario needs. The following table summarizes the pros and cons of the two choices:

Strategy

Adaptability to Heterogeneous Environments (Adaptability)

Adaptability to Load Fluctuations (Stability)

Over Placement (Correctness)

Over Unloading (Correctness)

Speed

ThresholdShedder + LeastResourceUsageWithWeight

Fair 1

Good

Poor

Poor

Fair 3

UniformLoadShedder + LeastLongTermMessageRate

Poor 2

Poor

Good

Good

Fair 4

1. In terms of adaptability to heterogeneous environments, the performance of ThresholdShedder + LeastResourceUsageWithWeight can only be rated as fair. The reason is that ThresholdShedder cannot fully adapt to heterogeneous environments. Although it will not mistakenly judge a high-load broker as a low-load one, heterogeneous environments can still affect the load balancing effectiveness of ThresholdShedder.

For example, if the current cluster has three brokers with resource usage rates of 10, 50, and 70, respectively, and Broker1 and Broker2 are homogeneous, while Broker3 is idle but has a resource usage rate of 70 due to the deployment of other processes, we would hope that Broker3 could share some of the load with Broker1. However, since the average load is 43.33, and 43.33 + 10 > 50, Broker2 will not be judged as overloaded, and the overloaded Broker3 has no traffic to unload, thus putting the load balancing algorithm in a non-working state.

2. In the same scenario, if the combination of UniformLoadShedder and LeastLongTermMessageRate is used, the problem becomes even more severe. This would cause some load to be transferred from Broker2 to Broker3, resulting in a significant performance drop for all topics served by Broker3. Therefore, its adaptability is rated as poor.

Thus, it is not recommended to run Pulsar in a heterogeneous environment, as the current load balancing algorithms cannot adapt well.

3. In terms of load balancing speed, although ThresholdShedder + LeastResourceUsageWithWeight can unload the load of all high-load brokers at once, the historical weight algorithm seriously interferes with the accuracy of load balancing decisions. In practice, it requires multiple load balancing iterations to finally stabilize. Therefore, its load balancing speed score is only fair.

4. UniformLoadShedder + LeastLongTermMessageRate can only handle one overloaded broker at a time. Therefore, when there are many brokers, it takes a long time to complete load balancing. As a result, its load balancing speed score is also fair.

I have verified the above conclusions through experiments. You can refer to AvgShedder's PR content, where I have posted the process and results of the experiment.

[improve] [pip] PIP-364: Introduce a new load balance algorithm AvgShedder

Last updated

Was this helpful?