Kafka controller election principle

Posted Jun 28, 2020 · 4 min read

1. Introduction to Kafka Controller

A Kafka cluster consists of one or more brokers, one of which is elected as the controller (Kafka Controller). The controller is responsible for managing the state of all partitions and replicas in the cluster. When the leader replica of a partition fails (a partition has multiple replicas, of which only the leader serves client reads and writes), the controller elects a new leader replica for that partition. When it detects that a partition's ISR set has changed, the controller notifies all brokers to update their metadata. When the number of partitions for a topic is increased, the controller handles the assignment of the new partitions.
_Partition replica sets: all replicas of a partition are collectively called the AR (Assigned Replicas). The replicas that stay sufficiently in sync with the leader replica (including the leader itself) form the ISR (In-Sync Replicas). The replicas that lag too far behind the leader (excluding the leader) form the OSR (Out-of-Sync Replicas)._
_The leader replica maintains and tracks the lag of every follower replica in the ISR. When a follower lags too much or fails, the leader removes it from the ISR; when a follower in the OSR "catches up" with the leader, the leader moves it from the OSR back into the ISR. By default, when the leader replica fails, only replicas in the ISR are eligible to be elected as the new leader._
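The ISR/OSR split above can be illustrated with a small sketch. Note this is a simplified, offset-based simulation for intuition only; the function name and lag threshold are hypothetical, and real Kafka actually uses a time-based criterion (`replica.lag.time.max.ms`) rather than an offset gap:

```python
def classify_replicas(leader_end_offset, follower_offsets, max_lag):
    """Split follower replicas into in-sync (ISR) and out-of-sync (OSR) sets.

    Illustrative only: uses a simple offset-lag threshold, whereas Kafka
    itself judges sync status by how long a follower has been behind.
    """
    isr, osr = [], []
    for broker_id, offset in follower_offsets.items():
        if leader_end_offset - offset <= max_lag:
            isr.append(broker_id)   # close enough to the leader: in sync
        else:
            osr.append(broker_id)   # lagging too far: out of sync
    return sorted(isr), sorted(osr)

# Leader at offset 100; follower 2 is 50 offsets behind and falls into the OSR.
isr, osr = classify_replicas(100, {1: 98, 2: 50, 3: 100}, max_lag=10)
```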

2. Kafka controller election principle

Controller election in Kafka relies on ZooKeeper. The broker that wins the election creates an ephemeral /controller node in ZooKeeper. The content of this node looks like:

{"version":1,"brokerid":0,"timestamp":"1593330804078"}

version is tied to the Kafka version and is a fixed value for a given Kafka release. brokerid is the id of the broker that became the controller, and timestamp is the time (in milliseconds) at which it won the election.
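The node content is plain JSON, so decoding it is straightforward. A small sketch of reading the fields shown above (variable names are mine, not Kafka's):

```python
import json

# The /controller znode content from the example above.
controller_znode = '{"version":1,"brokerid":0,"timestamp":"1593330804078"}'

data = json.loads(controller_znode)
broker_id = data["brokerid"]          # id of the current controller
elected_at_ms = int(data["timestamp"])  # note: stored as a string of epoch millis
```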
At any given moment there is exactly one controller in the cluster. When a broker starts, it tries to read the brokerid value of the /controller node. If the brokerid it reads is not -1, another broker has already won the election, so the current broker gives up. If the /controller node does not exist in ZooKeeper, or its data is corrupt, the broker tries to create the node itself. Several brokers may attempt the creation at the same time, but only the one that succeeds becomes the controller. Every broker caches the current controller's brokerid in memory, usually referred to as activeControllerId.
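The create-node race can be sketched with a minimal in-memory stand-in for ZooKeeper. This is a simulation for intuition only; the class and function names are hypothetical and this is not the ZooKeeper client API:

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper's node-creation semantics.

    Only the first create of a path succeeds, mirroring how only one
    broker wins the /controller race.
    """
    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        if path in self.nodes:
            return False          # node already exists: another broker won
        self.nodes[path] = data
        return True

    def get(self, path):
        return self.nodes.get(path)

def try_elect(zk, broker_id):
    """Sketch of a broker's startup check: read /controller, create if absent."""
    existing = zk.get("/controller")
    if existing is not None and existing["brokerid"] != -1:
        return existing["brokerid"]           # someone already campaigned successfully
    zk.create("/controller", {"brokerid": broker_id})
    return zk.get("/controller")["brokerid"]  # read back the actual winner

zk = FakeZooKeeper()
winner_seen_by_0 = try_elect(zk, 0)  # broker 0 creates the node and wins
winner_seen_by_1 = try_elect(zk, 1)  # broker 1 finds the node and gives up
```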
ZooKeeper also holds a controller-related /controller_epoch node. This is a persistent node that stores an integer, controller_epoch, which records how many times the controller has changed, i.e. the generation of the current controller, sometimes called the "controller era".
The initial value of controller_epoch is 1, i.e. the first controller in the cluster has epoch 1. Each time a new controller is elected, the value is incremented by 1. Every request that interacts with the controller carries a controller_epoch field. If the controller_epoch in a request is smaller than the value a broker holds in memory, the request was sent by an expired controller and is treated as invalid. If it is larger, a new controller has been elected (meaning the broker that received the request is no longer the controller). In this way Kafka guarantees the uniqueness of the controller through controller_epoch, and thereby the consistency of related operations.
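The epoch comparison above amounts to a simple three-way check. A sketch (function and result names are illustrative, not Kafka's actual code):

```python
def validate_request_epoch(request_epoch, in_memory_epoch):
    """Decide how a broker treats a request based on its controller_epoch.

    - smaller epoch: the sender is an expired controller, reject the request
    - larger epoch:  a newer controller exists; this broker's view is stale
    - equal epoch:   the request comes from the current controller
    """
    if request_epoch < in_memory_epoch:
        return "reject-expired"
    if request_epoch > in_memory_epoch:
        return "newer-controller-elected"
    return "accept"
```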
The broker acting as controller takes on more responsibilities than an ordinary broker. In detail:

  • Monitor partition changes.

  • Monitor topic changes.

  • Monitor broker-related changes.

  • Read all current topic, partition, and broker information from ZooKeeper and manage it accordingly.

  • Start and manage the partition state machine and the replica state machine.

  • Update the metadata information of the cluster.

When the data of the /controller node changes, each broker updates the activeControllerId it holds in memory. If a broker was the controller before the change and its own brokerid no longer matches the new activeControllerId, it must "back off": close the corresponding resources, such as shutting down its state machines and deregistering its listeners. The controller may go offline because of a failure, causing the ephemeral /controller node to be deleted automatically, or the node may be deleted for other reasons.
When the /controller node is deleted, a new election takes place among the brokers. A broker that was the controller before the deletion also performs the back-off steps before taking part. If you have a special need, you can manually delete the /controller node to trigger a new round of election; shutting down the broker that currently acts as controller, or manually writing a new brokerid into the /controller node, triggers re-election as well.
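The back-off check a broker performs when /controller changes can be sketched as follows. The function and action names are hypothetical labels for the behaviors described above, not Kafka's internal API:

```python
def on_controller_change(my_broker_id, was_controller, new_active_controller_id):
    """Sketch of what a broker does when the /controller node's data changes.

    Every broker refreshes its cached activeControllerId; a broker that was
    the controller but lost the role additionally backs off by releasing
    controller-only resources.
    """
    actions = ["update activeControllerId"]
    if was_controller and new_active_controller_id != my_broker_id:
        # This broker is no longer the controller: back off.
        actions += ["close state machines", "deregister listeners"]
    return actions

# Broker 0 was the controller, but broker 2 has just taken over.
demoted_actions = on_controller_change(0, was_controller=True, new_active_controller_id=2)
# An ordinary broker only refreshes its cached value.
ordinary_actions = on_controller_change(1, was_controller=False, new_active_controller_id=2)
```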

3. Summary

The Kafka controller election process is not complicated, but it handles the various boundary conditions thoughtfully, which makes the implementation robust. When we need to design or tune a distributed cluster for a particular business scenario, Kafka's controller election is a good reference, especially the design of /controller_epoch: by accounting for the possibility that a controller has become stale, it guarantees the uniqueness of the controller.

4. References

"In-depth Understanding of Kafka: Core Design and Practical Principles"