1. UniformLoadShedder + LeastLongTermMessageRate

1.1 Environment Configuration

A pulsar cluster was built with 4 brokers and 20 bookies. However, one machine, XXX.34, is heterogeneous and significantly more powerful than the other three machines.

The load balancing-related configurations are as follows:

loadBalancerLoadSheddingStrategy=org.apache.pulsar.broker.loadbalance.impl.UniformLoadShedder
loadBalancerLoadPlacementStrategy=org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate

Using the combination of UniformLoadShedder and LeastLongTermMessageRate, most of the configuration is set to default values. For instance, the maximum message rate can be up to 1.5 times the minimum message rate, while the maximum throughput can reach up to 4 times the minimum throughput.

Note: No evidence exists to indicate that, within the default settings, the throughput ratio threshold ought to be greater than the message rate ratio threshold. It is advisable for users to adjust these two thresholds according to actual scenarios.

The feature of bundle split and uniform distribution of bundles has been disabled:

loadBalancerDistributeBundlesEvenlyEnabled=false
loadBalancerAutoBundleSplitEnabled=false
loadBalancerAutoUnloadSplitBundlesEnabled=false

It is strongly recommended to disable the bundle even distribution feature! This feature can render the load-balancing algorithm nearly ineffective. The reason is that it enforces the number of bundles across different brokers to be equivalent. When filtering candidate brokers, it will exclude the majority of brokers, the result of which will be fed into the load - balancing algorithm. In small clusters, the input of load-balancing algorithm usually contains one single broker. As a result, the load - balancing algorithm becomes nearly ineffective.

1.2 Heterogeneous Environment

Two pressure testing tasks were launched:

To observe the execution of the UniformLoadShedder algorithm, two additional panels were added:

The maximum-to-minimum ratio of throughput (in and out):

max(sum(pulsar_throughput_in+pulsar_throughput_out) by (instance))/min(sum(pulsar_throughput_in+pulsar_throughput_out) by (instance))

The maximum-to-minimum ratio of message rates (in and out):

max(sum(pulsar_rate_in+pulsar_rate_out) by (instance))/min(sum(pulsar_rate_in+pulsar_rate_out) by (instance))

One can observe that following a single round of load balancing, both ratios decreased from 2.5 to approximately 1.2.

Regarding message rates and throughput, this round of load balancing proved highly successful. The message rates and throughput across the brokers within the cluster converged significantly, and the system attained a stable state within just 5 minutes.

Nevertheless, upon examining the resource utilization metrics, it becomes evident that the cluster is, in fact, in a rather unbalanced condition. Since the performance of XXX.34 is significantly better than that of the other brokers, its resource utilization is much lower than that of the other nodes. This undoubtedly leads to resource wastage. If the load on each low-performance broker were to increase further, it could easily lead to an overload situation, while the high-performance machine XXX.34 would still remain at a low load level. Evidently, this is not the optimal state we anticipate. We would like XXX.34 to take on more load tasks.

1.3 Load Fluctuation

To simulate abrupt load increases and decreases, an additional topic, persistent://public/default/testTxn, was created. The production throughput remains consistent with that of other tasks. However, the consumption throughput halts every minute, pauses for one minute, and subsequently resumes consumption.

Upon observing the monitoring data, it is evident that the load - balancing algorithm continuously unloads bundles. This is due to the fact that the sudden fluctuations (both increases and decreases) in consumption throughput cause the ratio of the maximum to the minimum message rates to exceed the configured threshold of 1.5, thereby triggering continuous bundle unloading.

PreviousChapter 4 Load Balancing Algorithm - Experimental Verification Next2. ThresholdShedder + LeastResourceUsageWithWeight

Last updated 1 month ago

Was this helpful?