Zookeeper & Eureka

Posted Jun 28, 2020 · 3 min read

Zookeeper

Zookeeper is designed around CP: any read against Zookeeper returns a consistent view of the data, and the system tolerates network partitions, but it cannot guarantee that every request will be served. In practice, when a client tries to fetch the service list while Zookeeper is in the middle of a leader election, or when more than half of the machines in the Zookeeper cluster are unavailable, the data simply cannot be obtained. So Zookeeper cannot guarantee the availability of service discovery.
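For illustration, here is a minimal sketch of fetching a service list from ZooKeeper with Apache Curator; the connect string and the /services/order-service path are assumptions for this example, not something from the original setup. If the ensemble has no quorum (or is mid-election), the read cannot complete and eventually throws instead of returning possibly stale data, which is exactly the CP behavior described above.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.List;

public class ZkServiceListDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connect string and znode layout, just for the sketch.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Each child znode represents one registered instance of the service.
        // Without a quorum this call cannot succeed and throws after its retries,
        // so the consumer gets no data at all rather than stale data.
        List<String> instances = client.getChildren().forPath("/services/order-service");
        instances.forEach(System.out::println);

        client.close();
    }
}
```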

In most distributed environments, especially those involving data storage, data consistency should come first, which is why Zookeeper is designed as a CP system. But service discovery is different. Even if different nodes of the registry hold slightly different provider information for the same service, the consequences are not catastrophic, because what matters most to a service consumer is being able to make the call at all. Trying to call with possibly stale instance information is better than not calling because no instance information could be obtained: a dead instance fails fast, and the consumer can refresh the list and try again later. Therefore, for service discovery, availability matters more than strict consistency. AP beats CP.
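A small sketch of that fail-fast-and-retry idea; the names here (cachedInstances, refreshInstances, callInstance) are hypothetical helpers invented for this example, not Eureka or ZooKeeper APIs.

```java
import java.util.List;

public class StaleListConsumer {
    // Possibly stale snapshot of provider addresses obtained from the registry.
    private List<String> cachedInstances;

    String callWithRetry(String request) {
        for (String instance : cachedInstances) {
            try {
                // A dead or deregistered instance fails quickly here...
                return callInstance(instance, request);
            } catch (Exception connectFailed) {
                // ...so we simply move on to the next cached candidate.
            }
        }
        // All cached candidates failed: refresh from the registry and try once more.
        cachedInstances = refreshInstances();
        return callInstance(cachedInstances.get(0), request);
    }

    // Hypothetical helpers standing in for real HTTP/RPC and registry calls.
    String callInstance(String instance, String request) { throw new UnsupportedOperationException(); }
    List<String> refreshInstances() { throw new UnsupportedOperationException(); }
}
```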

Eureka

Spring Cloud Netflix followed the AP principle when designing Eureka. Eureka Server can also run as a cluster of multiple instances to avoid a single point of failure, but unlike ZooKeeper with its leader election, Eureka Server uses peer-to-peer replication. The architecture is decentralized: there is no master/slave distinction, and every peer is equal. Nodes register with each other to improve availability; each node is configured with one or more serviceUrls pointing at its peers, and every node can be regarded as a replica of the others.
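A rough sketch of what one node of such a peer setup might look like when the serviceUrls are supplied programmatically; the peer hostnames and ports are assumptions, and in a real deployment these properties would normally live in application.yml instead.

```java
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer
public class PeerOneEurekaServer {
    public static void main(String[] args) {
        // This node (peer1) points its serviceUrl at the other two peers,
        // so the three servers replicate registrations to one another.
        new SpringApplicationBuilder(PeerOneEurekaServer.class)
                .properties(
                        "spring.application.name=eureka-server",
                        "server.port=8761",
                        "eureka.instance.hostname=peer1",
                        "eureka.client.serviceUrl.defaultZone="
                                + "http://peer2:8762/eureka/,http://peer3:8763/eureka/")
                .run(args);
    }
}
```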

If one Eureka Server goes down, Eureka Client requests are automatically switched to another Eureka Server node, and when the failed server recovers, Eureka brings it back into cluster management. Once a node is accepting client requests, every write operation triggers a replicateToPeer (inter-node replication) call that forwards the request to all other Eureka Server nodes it currently knows about.
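On the client side, that failover comes from listing several serviceUrls. The sketch below registers a client against two peers and looks instances up through Spring's DiscoveryClient; the service name and URLs are assumptions for illustration.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;
import org.springframework.cloud.client.discovery.DiscoveryClient;

@SpringBootApplication
public class OrderClientApplication implements CommandLineRunner {

    @Autowired
    private DiscoveryClient discoveryClient;

    public static void main(String[] args) {
        // Listing both peers lets the client switch over if one Eureka Server dies.
        new SpringApplicationBuilder(OrderClientApplication.class)
                .properties(
                        "spring.application.name=order-client",
                        "eureka.client.serviceUrl.defaultZone="
                                + "http://peer1:8761/eureka/,http://peer2:8762/eureka/")
                .run(args);
    }

    @Override
    public void run(String... args) {
        // Instances come from whichever Eureka node answered, possibly from its own copy.
        discoveryClient.getInstances("order-service")
                .forEach(i -> System.out.println(i.getHost() + ":" + i.getPort()));
    }
}
```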

When a new Eureka Server node starts, it first tries to fetch the full instance registry from its neighboring nodes to complete initialization. Eureka Server resolves the list of peer nodes through the getEurekaServiceUrls() method and keeps registrations fresh through regular heartbeat renewals. With the default configuration, if Eureka Server does not receive a heartbeat from a service instance within a certain period (90 seconds by default, configured by eureka.instance.lease-expiration-duration-in-seconds), it evicts that instance. If a node loses too many heartbeats in a short time (for example, because of a network partition), it enters self-preservation mode and stops evicting instances.
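For a local experiment you might shorten those intervals on the service instance side; the values below are illustrative assumptions, not recommended production settings.

```java
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.builder.SpringApplicationBuilder;

@SpringBootApplication
public class TunedOrderService {
    public static void main(String[] args) {
        new SpringApplicationBuilder(TunedOrderService.class)
                .properties(
                        "spring.application.name=order-service",
                        // Send a heartbeat every 10s instead of the default 30s (illustrative value).
                        "eureka.instance.lease-renewal-interval-in-seconds=10",
                        // Evict after 30s without a heartbeat instead of the default 90s.
                        "eureka.instance.lease-expiration-duration-in-seconds=30")
                .run(args);
    }
}
```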

To sum up

ZooKeeper is based on CP and does not guarantee high availability: while it is electing a leader, or when more than half of the machines in the cluster are down, no data can be served. Eureka is based on AP and favors availability: even if every registry node is down, clients can still fall back on locally cached data. In practice the data in a registry does not change often, only when a new version is released or a machine fails. For such infrequently changing data, strict CP is a poor fit, while AP can sacrifice consistency to keep serving when problems occur, returning old or cached data instead of nothing.

So in theory Eureka is the better fit as a registry. In practice, many projects still use ZooKeeper: their clusters are small, and it is rare for more than half of the registry machines to be down at the same time, so it usually works out fine.