1. UniformLoadShedder + LeastLongTermMessageRate
A Pulsar cluster was built with 4 brokers and 20 bookies. However, one machine, XXX.34, is heterogeneous and significantly more powerful than the other three machines.
The load balancing-related configurations are as follows:
The cluster uses the combination of UniformLoadShedder and LeastLongTermMessageRate, with most of the configuration left at default values. For instance, the maximum message rate can be up to 1.5 times the minimum message rate, while the maximum throughput can reach up to 4 times the minimum throughput.
Note: No evidence exists to indicate that, within the default settings, the throughput ratio threshold ought to be greater than the message rate ratio threshold. It is advisable for users to adjust these two thresholds according to actual scenarios.
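The two thresholds described above can be read as a simple trigger condition. The following is a minimal sketch of that condition only, not Pulsar's actual UniformLoadShedder code; the function names `max_min_ratio` and `needs_shedding` and the sample numbers are hypothetical and chosen for illustration.

```python
def max_min_ratio(values):
    """Ratio of the largest to the smallest per-broker value."""
    lowest = min(values)
    if lowest <= 0:
        # Assumption: an idle broker is treated as an unbounded ratio.
        return float("inf")
    return max(values) / lowest


def needs_shedding(msg_rates, throughputs, rate_limit=1.5, throughput_limit=4.0):
    """True when either max/min ratio exceeds its threshold.

    The defaults mirror the ratios quoted in the text (1.5 and 4).
    """
    return (max_min_ratio(msg_rates) > rate_limit
            or max_min_ratio(throughputs) > throughput_limit)


# Illustrative per-broker values: a max/min ratio of 2.5 triggers shedding,
# while a ratio of 1.2 does not.
unbalanced = needs_shedding([400, 1000, 600, 500], [40, 100, 60, 50])
balanced = needs_shedding([500, 600, 550, 520], [50, 60, 55, 52])
```

With these sample values, `unbalanced` is `True` and `balanced` is `False`. Note that the throughput check alone would not fire at a ratio of 2.5, which is why the two thresholds are worth tuning together.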
The bundle split feature and the uniform distribution of bundles have been disabled:
It is strongly recommended to disable the bundle even distribution feature, because it can render the load-balancing algorithm nearly ineffective. The feature forces the number of bundles on each broker to be equal. When filtering candidate brokers, it excludes most of them, and only the remaining candidates are passed to the load-balancing algorithm. In small clusters, that input usually contains a single broker, so the algorithm has almost nothing left to choose between.
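To see why the candidate list collapses, consider a deliberately simplified model of the filter. It assumes the filter keeps only the brokers that currently own the fewest bundles; the real Pulsar filter differs in detail, and the function name and broker names here are hypothetical.

```python
def filter_by_bundle_count(bundle_counts):
    """Keep only the brokers that own the fewest bundles.

    Simplified stand-in for the even-distribution filter: any broker with
    more bundles than the minimum is dropped from the candidate list.
    """
    fewest = min(bundle_counts.values())
    return sorted(b for b, n in bundle_counts.items() if n == fewest)


# Illustrative 4-broker cluster with nearly equal bundle counts.
candidates = filter_by_bundle_count(
    {"broker-1": 12, "broker-2": 11, "broker-3": 12, "broker-4": 13}
)
```

Here `candidates` contains only `broker-2`, so a placement strategy such as LeastLongTermMessageRate would receive a single option and could not take message rates into account at all.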
Two pressure testing tasks were launched:
To observe the execution of the UniformLoadShedder algorithm, two additional panels were added:
The maximum-to-minimum ratio of throughput (in and out):
max(sum(pulsar_throughput_in+pulsar_throughput_out) by (instance))/min(sum(pulsar_throughput_in+pulsar_throughput_out) by (instance))
The maximum-to-minimum ratio of message rates (in and out):
max(sum(pulsar_rate_in+pulsar_rate_out) by (instance))/min(sum(pulsar_rate_in+pulsar_rate_out) by (instance))
One can observe that following a single round of load balancing, both ratios decreased from 2.5 to approximately 1.2.
Regarding message rates and throughput, this round of load balancing proved highly successful. The message rates and throughput across the brokers within the cluster converged significantly, and the system attained a stable state within just 5 minutes.
Nevertheless, the resource utilization metrics show that the cluster is in fact rather unbalanced. Because XXX.34 is significantly more powerful than the other brokers, its resource utilization is much lower than that of the other nodes, which wastes resources. If the load on each low-performance broker were to increase further, those brokers could easily become overloaded while the high-performance machine XXX.34 remained lightly loaded. This is not the desired state: XXX.34 should take on more of the load.
To simulate abrupt load increases and decreases, an additional topic, persistent://public/default/testTxn, was created. Its production throughput is the same as that of the other tasks. However, its consumption alternates: it runs for one minute, pauses for one minute, and then resumes.
Upon observing the monitoring data, it is evident that the load-balancing algorithm continuously unloads bundles. This is due to the fact that the sudden fluctuations (both increases and decreases) in consumption throughput cause the ratio of the maximum to the minimum message rates to exceed the configured threshold of 1.5, thereby triggering continuous bundle unloading.
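The effect can be reproduced with a toy model. The sketch below assumes four brokers with identical steady message rates and one topic whose consumer is active only on even minutes; all names and numbers are hypothetical and are not taken from the test above.

```python
def rate_ratio_over_time(base_rates, fluctuating_broker, consume_rate, minutes):
    """Max/min message-rate ratio per minute for an on/off consumer.

    The consumer on `fluctuating_broker` adds `consume_rate` on even
    minutes and is paused on odd minutes.
    """
    ratios = []
    for minute in range(minutes):
        rates = dict(base_rates)
        if minute % 2 == 0:  # consumer is active during even minutes
            rates[fluctuating_broker] += consume_rate
        ratios.append(max(rates.values()) / min(rates.values()))
    return ratios


# Illustrative rates (messages/s): every broker is steady at 1000.
ratios = rate_ratio_over_time(
    {"b1": 1000, "b2": 1000, "b3": 1000, "b4": 1000}, "b1", 600, 4
)
# Minutes in which the ratio crosses the 1.5 threshold.
triggered = [minute for minute, r in enumerate(ratios) if r > 1.5]
```

In this model the ratio alternates between 1.6 and 1.0, so the threshold is crossed every other minute (`triggered` is `[0, 2]`) even though the average load is well balanced.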