Dare to write "proficient in zookeeper"? Then these basic concepts must be mastered
Posted May 27, 2020 • 12 min read
This article covers some basic concepts of zookeeper. Before getting into the topic, I would like to share an interview experience from when I was just starting out. Looking back, my answers were naively straightforward.
Interviewer: Have you ever used zookeeper?
Me: Yes, for service registration and discovery with dubbo.
Interviewer: Do you know what zookeeper is?
Me: I know, it is a registration center.
Interviewer: How do you use zookeeper in your project?
Me: Just add a zookeeper service address to the application.properties configuration file in springboot...
The dialogue above seems fine, yet something is clearly not right: every time I answered like this, I was rejected.
Why was I asked about zookeeper? Because the projects on my resume claimed proficient use of zookeeper. What an interviewer understands by "proficient" is more than adding a line of configuration and seeing the project start without errors. So it is still necessary to properly understand zookeeper.
Zookeeper, which began as an open source subproject of Hadoop, is a classic distributed data consistency solution: a distributed coordination service dedicated to providing distributed applications with high performance, high availability, and strictly ordered access control.
1. The zookeeper data model
zookeeper maintains a data structure similar to a file system. Each directory entry (for example /WeChat or /WeChat/public number) is called a znode, or simply a node. Just as in a file system, we can freely add and delete znodes, and add and delete child znodes under a znode. The difference from a file system is that a znode itself can store data (strictly speaking, it always stores data; the default is empty).
zookeeper is a tree of directory nodes. A path must start with "/" when getting or creating a node; otherwise the client reports the error "Path must start with / character":
    [zk: localhost:2181(CONNECTED) 13] get test
    Command failed: java.lang.IllegalArgumentException: Path must start with / character
A top-level node must be named in the form "/XXX", and a child node must be created with its full path including the parent, such as "/XXX/AAA".
For example, to get the node "programmer's internal affairs" under /WeChat/public number, the complete path must be spelled out:
    get /WeChat/public number/programmer's internal affairs
A znode is meant to store KB-level data; the maximum amount of data it can store is 1MB. Please note that the data volume of a node includes more than its own stored data: the names of all its child nodes are also counted in bytes, so the number of child nodes of a znode is not unlimited. Although the size limit can be changed manually, this is generally not recommended.
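To make the path rules above concrete, here is a minimal sketch that models the znode tree as an in-memory map. It is a toy model and not the zookeeper client API: the class ZnodeTreeSketch, its methods, and the node names are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of the znode tree (hypothetical, not the zookeeper API).
class ZnodeTreeSketch {
    // Full path -> data. Every znode holds a byte array, possibly empty.
    private final Map<String, byte[]> nodes = new TreeMap<>();

    ZnodeTreeSketch() {
        nodes.put("/", new byte[0]); // the root always exists
    }

    // Same rule the zkCli error message describes: paths start with "/".
    static void checkPath(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("Path must start with / character");
        }
    }

    // "/WeChat/account" -> "/WeChat", "/WeChat" -> "/"
    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    void create(String path, String data) {
        checkPath(path);
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("Node already exists: " + path);
        }
        if (!nodes.containsKey(parentOf(path))) {
            throw new IllegalStateException("Parent does not exist: " + parentOf(path));
        }
        nodes.put(path, data.getBytes(StandardCharsets.UTF_8));
    }

    String get(String path) {
        checkPath(path);
        byte[] data = nodes.get(path);
        if (data == null) {
            throw new IllegalStateException("Node does not exist: " + path);
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ZnodeTreeSketch tree = new ZnodeTreeSketch();
        tree.create("/WeChat", "");               // a znode with empty data
        tree.create("/WeChat/account", "hello");  // a child needs its full path
        System.out.println(tree.get("/WeChat/account"));
        try {
            tree.get("test");                     // no leading "/": rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that every node is addressed by its full path and carries its own data, unlike a plain directory.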
2. znode node properties
A znode not only stores data but also has some other special properties. Next we create a /test node and analyze the meaning of its attributes.
    [zk: localhost:2181(CONNECTED) 6] get /test
    456
    cZxid = 0x59ac
    ctime = Mon Mar 30 15:20:08 CST 2020
    mZxid = 0x59ad
    mtime = Mon Mar 30 15:22:25 CST 2020
    pZxid = 0x59ac
    cversion = 0
    dataVersion = 2
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 3
    numChildren = 0
|Attribute|Description|
|---|---|
|cZxid|Transaction ID of the change that created the node|
|mZxid|Transaction ID of the change that last modified the node's data|
|pZxid|Transaction ID of the change that last added or removed a child of the node|
|ctime|Creation time of the node|
|mtime|Last modification time of the node|
|dataVersion|Data version number of the node (incremented by 1 each time the data is modified)|
|cversion|Child version number (incremented by 1 each time the node's children change)|
|aclVersion|ACL version number of the node (incremented by 1 each time the node's ACL permissions are modified)|
|ephemeralOwner|Owner of a temporary node: the session ID (sessionId) of the creator if the node is temporary, otherwise 0|
|dataLength|Length of the data stored in the node|
|numChildren|Number of children under the node|
We see that a znode has many attributes, but the three main ones are zxid, version, and acl.
Every change to the state of a znode is stamped with a zxid (a transaction ID). This stamp is globally ordered, and every creation or update of a znode generates a new one. If the value of zxid1 is less than the value of zxid2, then the change zxid2 happened after zxid1. Each znode has 3 zxid attributes: cZxid (the transaction that created the node), mZxid (the transaction that last modified the node's data, unrelated to its child nodes), and pZxid (the transaction that last created or deleted a direct child of the node, unrelated to grandchildren).
The zxid attributes are mainly used in the zookeeper cluster, which is described in detail later when introducing the cluster.
A znode has three version numbers: dataVersion (data version number), cversion (child node version number), and aclVersion (version number of the node's ACL permissions).
zookeeper keeps only the latest data of a znode, but every modification of the data increments the node's dataVersion. When a client reads a znode, it gets back both the node data and the version number, and it can send that version number along with a later update or delete so that the operation succeeds only if the node has not changed in between. When the version is given as -1, the version check is skipped. Setting permissions on a node increments its aclVersion; ACL permission control is described in detail below.
To verify, we modify the data of the /test node and look at dataVersion: the attribute becomes 3, so the version number was incremented.
    [zk: localhost:2181(CONNECTED) 10] set /test 8888
    cZxid = 0x59ac
    ctime = Mon Mar 30 15:20:08 CST 2020
    mZxid = 0x59b6
    mtime = Mon Mar 30 16:58:08 CST 2020
    pZxid = 0x59ac
    cversion = 0
    dataVersion = 3
    aclVersion = 0
    ephemeralOwner = 0x0
    dataLength = 4
    numChildren = 0
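The version number is most useful as a guard for updates. Below is a minimal sketch of that idea: an update carries the version the client last saw, and -1 skips the check. It is an in-memory toy, not the zookeeper client; VersionedZnodeSketch and its methods are hypothetical names.

```java
// Toy model of one znode's data and dataVersion (hypothetical, not the zookeeper API).
class VersionedZnodeSketch {
    private String data;
    private int dataVersion = 0;

    VersionedZnodeSketch(String initialData) {
        this.data = initialData;
    }

    String getData() {
        return data;
    }

    int getDataVersion() {
        return dataVersion;
    }

    // Conditional update: succeeds only if expectedVersion equals the current
    // dataVersion, or if expectedVersion is -1 (version check skipped).
    // Returns the new dataVersion.
    int setData(String newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != dataVersion) {
            throw new IllegalStateException(
                "Bad version: expected " + expectedVersion + ", node is at " + dataVersion);
        }
        data = newData;
        dataVersion++;
        return dataVersion;
    }

    public static void main(String[] args) {
        VersionedZnodeSketch node = new VersionedZnodeSketch("456");
        int seen = node.getDataVersion();   // two clients both read this version
        node.setData("8888", seen);         // the first update wins
        try {
            node.setData("stale", seen);    // the second still holds the old version
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        node.setData("forced", -1);         // -1 ignores the version
        System.out.println(node.getData() + " @ version " + node.getDataVersion());
    }
}
```

This is plain optimistic locking: a writer that worked from stale data is rejected instead of silently overwriting someone else's change.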
3. Types of znode
zookeeper has four types of znode. The type must be specified when a node is created through the client:
    zookeeper.create("/public number/programmer's internal affairs", "".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
- PERSISTENT (persistent node): The node outlives the client that created it. After the client disconnects from zookeeper, the node still exists when a client connects again.
- PERSISTENT_SEQUENTIAL (persistent sequential node): Like PERSISTENT, the node still exists after the client disconnects. In addition, zookeeper appends an increasing sequence number to the node name, for example: /lock/0000000001, /lock/0000000002, /lock/0000000003.
- EPHEMERAL (temporary node): After the client is disconnected from zookeeper, the node is deleted.
- EPHEMERAL_SEQUENTIAL (temporary sequential node): After the client is disconnected from zookeeper, the node is deleted. The node name is also given a sequence number, for example: /lock/0000000001, /lock/0000000002, /lock/0000000003.
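The difference between the four types can be sketched with a small in-memory model: sequential modes append a zero-padded counter to the name, and temporary modes tie the node to a session. This is only an illustration under simplifying assumptions (one global counter starting at 1, numeric session IDs); ZnodeTypesSketch is a made-up class, and in real zookeeper the server maintains the counter.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of the four znode types (hypothetical, not the zookeeper API).
class ZnodeTypesSketch {
    enum Mode {
        PERSISTENT(false, false),
        PERSISTENT_SEQUENTIAL(false, true),
        EPHEMERAL(true, false),
        EPHEMERAL_SEQUENTIAL(true, true);

        final boolean ephemeral;
        final boolean sequential;

        Mode(boolean ephemeral, boolean sequential) {
            this.ephemeral = ephemeral;
            this.sequential = sequential;
        }
    }

    // Actual path -> owning session ID (0 for persistent nodes, like ephemeralOwner = 0x0).
    private final Map<String, Long> owners = new LinkedHashMap<>();
    // Simplification: a single global counter that starts at 1.
    private int nextSequence = 1;

    // Returns the actual path, which differs from the requested one for sequential modes.
    String create(String path, Mode mode, long sessionId) {
        String actual = mode.sequential
            ? path + String.format(Locale.ROOT, "%010d", nextSequence++)
            : path;
        owners.put(actual, mode.ephemeral ? sessionId : 0L);
        return actual;
    }

    // When a session ends, every temporary node it owns disappears.
    // Session IDs are assumed to be non-zero.
    void closeSession(long sessionId) {
        owners.values().removeIf(owner -> owner == sessionId);
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }

    public static void main(String[] args) {
        ZnodeTypesSketch zk = new ZnodeTypesSketch();
        String temp = zk.create("/lock/", Mode.EPHEMERAL_SEQUENTIAL, 7L);
        String kept = zk.create("/lock/", Mode.PERSISTENT_SEQUENTIAL, 7L);
        zk.closeSession(7L);
        System.out.println(temp + " exists: " + zk.exists(temp));
        System.out.println(kept + " exists: " + zk.exists(kept));
    }
}
```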
Second, ACL permission control for nodes
ACL stands for Access Control List (node permission control). The ACL mechanism solves the access permission problem for znodes. Note that zookeeper controls permissions at the level of the individual znode: permissions are not inherited between nodes, that is, a child node does not inherit the permissions of its parent node.
An ACL in zookeeper is written in three parts, scheme:id:acl:
- scheme: the means of authorization
  - world: anyone can access
  - auth: only authenticated users can access
  - digest: uses username:password, hashed with SHA-1 and Base64-encoded, as the authentication ID
  - host/ip: uses the client host's IP address for authentication
- id: the scope of the permission, used to identify the identity; its form depends on which scheme is chosen.
- acl: the permissions granted on the node. They are create, delete, write, read, and admin, collectively abbreviated as cdwra.
world: anyone can access
Let's use the getAcl command to look at the default permissions of a znode on which no permissions have been set.
    [zk: localhost:2181(CONNECTED) 12] getAcl /test
    'world,'anyone
    : cdrwa
For a node that has no ACL set, the default scheme is world, the scope is anyone, and the node permission is cdwra, which means anyone can access it.
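The permission letters are just a compact way to write a set of flags. As a rough sketch, the string can be parsed into a bitmask like this. AclPermsSketch is a made-up helper; the bit values are chosen to follow zookeeper's ZooDefs.Perms constants as far as I know, so treat them as an assumption.

```java
// Parses a permission string such as "cdrwa" into a bitmask (hypothetical helper).
class AclPermsSketch {
    // Assumed to follow ZooDefs.Perms: READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16.
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;

    // The order of the letters does not matter: "cdrwa" and "crdwa" give the same mask.
    static int parse(String perms) {
        int mask = 0;
        for (char ch : perms.toCharArray()) {
            switch (ch) {
                case 'r': mask |= READ; break;
                case 'w': mask |= WRITE; break;
                case 'c': mask |= CREATE; break;
                case 'd': mask |= DELETE; break;
                case 'a': mask |= ADMIN; break;
                default:
                    throw new IllegalArgumentException("Unknown permission: " + ch);
            }
        }
        return mask;
    }

    static boolean allows(int mask, int permission) {
        return (mask & permission) != 0;
    }

    public static void main(String[] args) {
        int all = parse("cdrwa");
        int readOnly = parse("r");
        System.out.println("cdrwa allows delete: " + allows(all, DELETE));
        System.out.println("r allows write: " + allows(readOnly, WRITE));
    }
}
```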
What if we want to use a scheme other than world on a node?
auth: only authenticated users can access
With the auth scheme, only authenticated users can access the node. You first need to add an authenticated user, and then set the ACL permissions for that user.
    addauth digest test:password(clear text)
Note that the password given when adding an authenticated user is in clear text.
    [zk: localhost:2181(CONNECTED) 2] addauth digest user:user  //user name:password
    [zk: localhost:2181(CONNECTED) 5] setAcl /test auth:user:crdwa
    [zk: localhost:2181(CONNECTED) 6] getAcl /test
    'digest,'user:ben+k/3JomjGj4mfd4fYsfM6p0A=
    : cdrwa
In fact, an ACL set this way opens the node to all authenticated users: setAcl /test auth:user:crdwa is equivalent to setAcl /test auth::crdwa.
digest: username:password verification
Authorization in username:password mode targets a single specific user. This method does not require adding an authenticated user first.
If the ACL is set in code through the zookeeper client, the password is in clear text, but in a command-line client such as zkCli the password must first be hashed with SHA-1 and Base64-encoded.
    setAcl <path> digest:<user>:<password(ciphertext)>:<acl>
    setAcl /test digest:user:jalRr+knv/6L2uXdenC93dEDNuE=:crdwa
So how is the password hashed? There are several ways.
With openssl on the command line:
    echo -n <user>:<password> | openssl dgst -binary -sha1 | openssl base64
With zookeeper's built-in class org.apache.zookeeper.server.auth.DigestAuthenticationProvider:
    java -cp /zookeeper-3.4.13/zookeeper-3.4.13.jar:/zookeeper-3.4.13/lib/slf4j-api-1.7.25.jar \
      org.apache.zookeeper.server.auth.DigestAuthenticationProvider \
      root:root
    root:root->root:qiTlqPLK7XM2ht3HMn02qRpkKIE=
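The same digest can also be produced with nothing but the JDK. The sketch below mirrors the openssl pipeline above: SHA-1 over the whole user:password string, then Base64, prefixed with the user name. DigestSketch is a made-up class name; I believe this matches what the zookeeper provider does, but it is written from the pipeline above rather than copied from zookeeper, so compare its output with one of the commands before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Builds "<user>:<base64(sha1(user:password))>" using only the JDK (hypothetical helper).
class DigestSketch {
    static String generateDigest(String userAndPassword) {
        // Everything before the first ":" is the user name.
        String user = userAndPassword.split(":", 2)[0];
        try {
            // The hash covers the whole "user:password" string, not just the password.
            byte[] sha1 = MessageDigest.getInstance("SHA-1")
                    .digest(userAndPassword.getBytes(StandardCharsets.UTF_8));
            return user + ":" + Base64.getEncoder().encodeToString(sha1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints the value to paste into: setAcl <path> digest:<printed value>:<acl>
        System.out.println(generateDigest("root:root"));
    }
}
```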
host/ip: use the client host's IP address for authentication
This method is easier to understand: access is granted to a specific IP address, which can also be an IP segment.
    [zk: localhost:2181(CONNECTED) 3] setAcl /test0000000014 ip:127.0.0.1:crdwa
    cZxid = 0x59ac
    ctime = Mon Mar 30 15:20:08 CST 2020
    mZxid = 0x59b6
    mtime = Mon Mar 30 16:58:08 CST 2020
    pZxid = 0x59ac
    cversion = 0
    dataVersion = 3
    aclVersion = 3  //This version has been increasing
    ephemeralOwner = 0x0
    dataLength = 4
    numChildren = 0
Third, watcher: the soul of zookeeper
We said at the beginning that zookeeper can provide service registration and discovery for dubbo, acting as a registration center. But have you ever thought about why zookeeper can achieve service registration and discovery? For that we have to talk about the watcher, the soul of zookeeper.
1. What is a watcher?
The watcher is a core feature of zookeeper. A client can set a watcher to monitor the data changes of a node and the changes of its child nodes. Once these states change, the zookeeper server notifies all clients that have set a watcher on the node, so that each client quickly perceives the change in the node it is monitoring and reacts with the corresponding logic.
After this brief introduction to the watcher, let's analyze how zookeeper implements the registration and discovery of services.
Service registration and discovery with zookeeper mainly relies on the znode data model and the watcher mechanism. The general process is as follows:
- Service registration: When a service provider (Provider) starts, it registers its service information with the zookeeper server, that is, it creates a node, for example for the user registration service com.xxx.user.register, and stores service-related data (such as the service provider's IP address and port) on the node.
- Service discovery: When a service consumer (Consumer) starts, it obtains the registered service information from the zookeeper server according to the dependent services it is configured with, and sets a watch on it. After obtaining the registered service information, it caches the service provider's information locally and makes the service call.
- Service notification: Once a service provider goes down for some reason and no longer provides services, its client disconnects from the zookeeper server, and the provider's service node on the zookeeper server (for example, the user registration service com.xxx.user.register) is deleted. The zookeeper server then asynchronously notifies all service consumers that set a watch on com.xxx.user.register that the node was deleted. Based on the notification, each consumer pulls the latest service list and updates its locally cached service list.
The process above is the general principle by which zookeeper realizes service registration and discovery.
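The three steps above can be played through in a small in-memory simulation. This is only a sketch of the idea, not how dubbo or the zookeeper client is actually written: RegistrySketch, CachingConsumer, the session IDs, and the addresses are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy registration center (hypothetical, not the zookeeper or dubbo API).
class RegistrySketch {
    // Service name -> (provider address -> owning session ID).
    private final Map<String, Map<String, Long>> providers = new HashMap<>();
    // Service name -> one-time watches left by consumers.
    private final Map<String, List<Runnable>> watches = new HashMap<>();

    // Service registration: a provider adds a temporary node under the service.
    void register(String service, String address, long sessionId) {
        providers.computeIfAbsent(service, s -> new TreeMap<>()).put(address, sessionId);
        fireWatches(service);
    }

    // Service discovery: list the providers and optionally leave a one-time watch.
    List<String> getProviders(String service, Runnable watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(service, s -> new ArrayList<>()).add(watcher);
        }
        Map<String, Long> nodes = providers.get(service);
        return nodes == null ? new ArrayList<>() : new ArrayList<>(nodes.keySet());
    }

    // Service notification: a closed session loses its nodes, and watchers are told.
    void sessionClosed(long sessionId) {
        for (Map.Entry<String, Map<String, Long>> e : providers.entrySet()) {
            if (e.getValue().values().removeIf(owner -> owner == sessionId)) {
                fireWatches(e.getKey());
            }
        }
    }

    private void fireWatches(String service) {
        // Remove first, then notify: each watch fires at most once.
        List<Runnable> fired = watches.remove(service);
        if (fired != null) {
            fired.forEach(Runnable::run);
        }
    }

    // A consumer that caches the provider list and refreshes it on every notification.
    static class CachingConsumer implements Runnable {
        private final RegistrySketch registry;
        private final String service;
        List<String> cache = new ArrayList<>();

        CachingConsumer(RegistrySketch registry, String service) {
            this.registry = registry;
            this.service = service;
            run(); // initial pull, which also leaves the first watch
        }

        @Override
        public void run() {
            // Pull the latest list and watch again for the next change.
            cache = registry.getProviders(service, this);
        }
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        CachingConsumer consumer = new CachingConsumer(registry, "com.xxx.user.register");
        registry.register("com.xxx.user.register", "10.0.0.1:20880", 1L);
        registry.register("com.xxx.user.register", "10.0.0.2:20880", 2L);
        System.out.println(consumer.cache);
        registry.sessionClosed(1L); // the first provider goes down
        System.out.println(consumer.cache);
    }
}
```

Note that the consumer never polls: it only re-reads the list when it is told that something changed.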
2. Watcher types
Two types of watch can be set on a znode.
One is Data Watches, which fire on data changes of the znode itself. They are set by getData() and exists(), and the trigger conditions are: the node is created, its data is set with setData(), or the node is deleted.
The other is Child Watches, which fire on changes to the child nodes of the znode. They are set by getChildren(), and the trigger conditions are: a child node is created or deleted.
Calling the delete() method to delete a znode triggers both its Data Watches and its Child Watches. If the deleted node has a parent node, the parent node triggers a Child Watch as well.
3. Watcher features
A watch set on a node is one-time! Suppose a client has set a watch on a node: once the data of the node changes and the client has been notified once, the client's watch on that node is no longer in effect.
To keep monitoring the node, the client must set the watch again in its callback (passing watch as true when it reads the node again). Otherwise, the client receives the change notification of the node only once.
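The one-time behavior is easy to get wrong, so here is a minimal sketch of it. It is an in-memory toy, not the zookeeper client: OneTimeWatchSketch and ReRegisteringWatcher are hypothetical names, and a watcher is simply a callback that receives the path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of one-time data watches (hypothetical, not the zookeeper API).
class OneTimeWatchSketch {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> dataWatches = new HashMap<>();

    // Reads a node and optionally leaves a one-time watch on it.
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            dataWatches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    void setData(String path, String value) {
        data.put(path, value);
        // The watches are removed before they are notified, so each fires at most once.
        List<Consumer<String>> fired = dataWatches.remove(path);
        if (fired != null) {
            for (Consumer<String> watcher : fired) {
                watcher.accept(path);
            }
        }
    }

    // A watcher that sets the watch again in its callback, so it keeps receiving events.
    static class ReRegisteringWatcher implements Consumer<String> {
        private final OneTimeWatchSketch zk;
        final List<String> seen = new ArrayList<>();

        ReRegisteringWatcher(OneTimeWatchSketch zk) {
            this.zk = zk;
        }

        @Override
        public void accept(String path) {
            // Read the new value and watch again in the same call.
            seen.add(zk.getData(path, this));
        }
    }

    public static void main(String[] args) {
        OneTimeWatchSketch zk = new OneTimeWatchSketch();
        List<String> once = new ArrayList<>();
        zk.getData("/config", path -> once.add(path));  // plain one-time watch
        ReRegisteringWatcher again = new ReRegisteringWatcher(zk);
        zk.getData("/config", again);                   // watch that renews itself
        zk.setData("/config", "v1");
        zk.setData("/config", "v2");
        System.out.println("one-time watcher notifications: " + once.size());
        System.out.println("re-registering watcher saw: " + again.seen);
    }
}
```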
Fourth, what can zookeeper do?
Service registration and discovery is just the tip of the iceberg of what zookeeper can do. It can also implement distributed locks, queues, configuration centers, and more. Below we only analyze the principles; complete implementations are easy to find online.
1. Distributed lock
With the watcher mechanism and the sequential nodes of znode, zookeeper is a natural fit for distributed locks. First create a parent node /test/lock to represent the lock, preferably a persistent node (PERSISTENT type). Each client that tries to acquire the lock creates a temporary sequential node under this parent node.
Because the sequence numbers are increasing, we stipulate that the node with the smallest sequence number holds the lock. For example: the first client to acquire the lock creates the node /test/lock/seq-00000001 under /test/lock. It is the smallest, so it gets the lock first; the other nodes wait for a notification and then try to acquire the lock again. After executing its own logic, /test/lock/seq-00000001 deletes its node and thereby releases the lock.
So whose notification should the node /test/lock/seq-00000002 wait for in order to obtain the lock?
Here we let the /test/lock/seq-00000002 node watch the /test/lock/seq-00000001 node. Once /test/lock/seq-00000001 is deleted, /test/lock/seq-00000002 is notified and checks again whether it is now the smallest node. If it is, it gets the lock; if not, it keeps waiting for notifications.
By analogy, the /test/lock/seq-00000003 node watches the /test/lock/seq-00000002 node. Always let each node watch the one just before it, and do not let all nodes watch the smallest node. This avoids unnecessary watches and a flood of useless notifications, the so-called "herd effect".
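The decision each client makes after listing the children of /test/lock can be sketched as one small function: am I the smallest, and if not, which node comes right before me? LockOrderSketch is a made-up helper and assumes that all children share the same prefix and zero-padded numbering, so that plain string sorting gives the sequence order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Decides whether a client holds the lock or which node it should watch
// (hypothetical helper, not part of zookeeper).
class LockOrderSketch {
    // Returns an empty Optional if myNode is the smallest child (it holds the lock),
    // otherwise the child immediately before it, which is the only node to watch.
    static Optional<String> nodeToWatch(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        // Zero-padded sequence numbers with a common prefix sort correctly as strings.
        Collections.sort(sorted);
        int index = sorted.indexOf(myNode);
        if (index < 0) {
            throw new IllegalArgumentException("Not a child of the lock node: " + myNode);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>();
        children.add("seq-00000003");
        children.add("seq-00000001");
        children.add("seq-00000002");
        for (String node : children) {
            Optional<String> watch = nodeToWatch(children, node);
            System.out.println(node + (watch.isPresent()
                ? " waits for " + watch.get()
                : " holds the lock"));
        }
    }
}
```

Because every waiter watches exactly one node, a release wakes up exactly one client instead of all of them.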
Compared with a redis distributed lock, the zookeeper distributed lock is not recommended, because creating and deleting a large number of nodes performs poorly.
2. Distributed queue
Implementing a distributed queue with Zookeeper is also simple. The sequential nodes of znode are naturally "first in, first out": a node created later always has the largest sequence number, and the node with the smallest sequence number is always the next one to dequeue.
3. Configuration management
Many open source projects use Zookeeper to maintain configuration. For example, the message queue Kafka uses Zookeeper to maintain broker information, and dubbo uses it to manage service configuration information. The principle is again based on the watcher mechanism. For example: create a /config node to store some configuration and let the clients watch this node. Whenever the configuration on the /config node is modified, each client is notified of the data change and pulls the configuration again.
4. Naming Service
The zookeeper naming service is what we usually call service registration and discovery: obtaining the address of a resource or service, its provider, and other information by a specified name. Using the characteristics of znodes and the watcher mechanism, zookeeper serves as a center for dynamically registering and obtaining service information and for centrally managing service names and their server lists, so that we can perceive the status of back-end servers (online, offline, and down) in near real time.
Summary
The purpose of this article is to introduce the basics of zookeeper. Topics such as zookeeper cluster election, which comes up more often in interviews, are not covered in this issue, because the cluster material is also fairly large and I was afraid the article would get so long that nobody would have the patience to finish it (in fact, I was also a bit lazy, ha ha ha!). Interested friends can follow along; we will look at the zookeeper cluster next time.