Elasticsearch cluster setup (three nodes)

Posted May 25, 2020

Elasticsearch performance
Elasticsearch performance depends heavily on how much memory the machine has and whether it uses SSDs. The main reason is that searches should touch the disk as little as possible: ideally, nearly all of the data being searched is held in memory, and SSDs make the remaining disk reads and writes cheaper.

Elasticsearch is not well suited to extremely large data volumes, because it uses a great deal of memory. If the data volume is very large, consider big-data technologies instead. Either way, before deploying, estimate the volume of business data and the number of machines and the amount of memory it will need.
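As a back-of-the-envelope sketch of that estimate, the Python below counts how many nodes would be needed to keep every shard copy in the OS file-system cache. It assumes half of each node's RAM goes to the JVM heap (the rule of thumb discussed further down) and ignores index overhead and the 32 GB heap cap; the function name and the sample figures are hypothetical.

```python
import math


def estimate_nodes(data_gb: float, replicas: int, ram_per_node_gb: float) -> int:
    """Rough number of nodes needed to keep all shard copies in memory.

    Assumption: half of each node's RAM is the JVM heap, and the other half
    is available to the OS file-system cache that Lucene reads from.
    """
    total_gb = data_gb * (1 + replicas)      # primaries plus replica copies
    cache_per_node_gb = ram_per_node_gb / 2  # RAM left after the heap
    return math.ceil(total_gb / cache_per_node_gb)


# Hypothetical example: 100 GB of data, one replica, 64 GB machines
print(estimate_nodes(100, 1, 64))  # → 7
```

Treat the result as a lower bound for planning, not a sizing guarantee.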
Basic configuration
# Must be identical on every node that should join the cluster
cluster.name: xxxx
# Must be unique for each node
node.name: yyy
# Bind to the machine's own IP; otherwise the node cannot be reached remotely
network.host: 192.168.1.10
# Timeout for pings during discovery and master election (tune it to your network)
discovery.zen.ping_timeout: 3s
Cluster discovery mechanism
# Unicast host list: a few nodes act as contact points that newly joining nodes ping to discover the cluster. You do not need to list every node.
discovery.zen.ping.unicast.hosts: ["192.168.1.6", "192.168.1.8", "192.168.1.11"]
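To keep these per-node settings consistent, one option is to generate each node's elasticsearch.yml from a single list. The sketch below assumes the three unicast hosts above are the three nodes of the cluster; the cluster name `my-cluster`, the node names and the helper `render_config` are hypothetical placeholders.

```python
# Hypothetical placeholders: cluster name, node names and IPs
CLUSTER_NAME = "my-cluster"
NODES = {
    "node-1": "192.168.1.6",
    "node-2": "192.168.1.8",
    "node-3": "192.168.1.11",
}


def render_config(node_name: str) -> str:
    """Build the cluster/discovery part of elasticsearch.yml for one node."""
    hosts = ", ".join(f'"{ip}"' for ip in NODES.values())
    lines = [
        f"cluster.name: {CLUSTER_NAME}",      # same on every node
        f"node.name: {node_name}",            # unique per node
        f"network.host: {NODES[node_name]}",  # this node's own address
        "discovery.zen.ping_timeout: 3s",
        f"discovery.zen.ping.unicast.hosts: [{hosts}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name in NODES:
        print(f"--- {name} ---")
        print(render_config(name))
```

Generating the files from one source keeps `cluster.name` identical and `node.name` unique without editing three files by hand.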
Cluster fault detection
Fault detection works in both directions:
1. The master pings every other node to check that it is still alive.
2. Every other node pings the master.

Configuration items
# How often to ping a node
discovery.zen.fd.ping_interval: 30s
# How long to wait for each ping response
discovery.zen.fd.ping_timeout: 120s
# How many consecutive failed pings before a node is considered failed
discovery.zen.fd.ping_retries: 6
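To see what these values mean in practice, here is a rough estimate of how long a dead node could go unnoticed. It assumes that after a failed ping the next retry is sent right away and every ping waits the full timeout; this is a simplified model with a hypothetical function name, not Elasticsearch's actual code.

```python
def worst_case_detection_s(interval_s: float, timeout_s: float, retries: int) -> float:
    """Rough upper bound on how long a dead node can go unnoticed.

    Simplified model: wait one ping interval, then `retries` pings in a row
    each time out after `timeout_s` before the node is declared failed.
    """
    return interval_s + retries * timeout_s


print(worst_case_detection_s(30, 120, 6))  # → 750 (12.5 minutes)
print(worst_case_detection_s(1, 30, 3))    # → 91
```

The second call uses the defaults (1s, 30s, 3). Under this model, the settings above tolerate slow responses far better, at the cost of noticing a failed node much later.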
How cluster state is replicated between nodes
# Similar to ZooKeeper's two-phase commit, but ZooKeeper is strongly consistent, whereas
# Elasticsearch lets you configure how many master-eligible nodes must acknowledge; normally this is a quorum of floor(n/2) + 1
discovery.zen.minimum_master_nodes: 2
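The quorum is plain integer arithmetic, where n is the number of master-eligible nodes. A minimal sketch in Python (the function name is hypothetical):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible // 2 + 1


for n in (2, 3, 5):
    print(n, minimum_master_nodes(n))
# prints "2 2", "3 2", "5 3", one pair per line
```

For the three-node cluster in this article the result is 2, matching the setting above.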
Data and log paths
# Commonly recommended directories
path.logs: /var/log/elasticsearch
path.data: /var/data/elasticsearch
Log configuration
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.action.type = Delete
appender.rolling.strategy.action.basepath = ${sys:es.logs.base_path}
appender.rolling.strategy.action.condition.type = IfLastModified
appender.rolling.strategy.action.condition.age = 7D
appender.rolling.strategy.action.PathConditions.type = IfFileName
appender.rolling.strategy.action.PathConditions.glob = ${sys:es.logs.cluster_name}-*
The first line sets the rollover strategy to DefaultRolloverStrategy.
The second line adds a Delete action, so old files are deleted when a rollover happens.
The third line sets the base path of the Elasticsearch logs.
The fourth line adds a deletion condition based on the file's last-modified time (IfLastModified).
The fifth line sets the retention age for that condition, here 7 days.
The sixth line adds a second condition based on the file name (IfFileName); a file is deleted only if it satisfies both conditions.
The seventh line sets the file-name glob, so that only rolled-over Elasticsearch logs are deleted and the slow logs are kept.
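The combined effect of the two conditions can be modelled in a few lines of Python. This is only a sketch of the matching logic, not log4j2 itself; the cluster name and file names are hypothetical, assumed to follow the `<cluster_name>-<date>-<index>.log.gz` naming of rolled logs and the `<cluster_name>_index_search_slowlog.log` naming of the search slow log.

```python
import fnmatch
import time

SEVEN_DAYS_S = 7 * 24 * 3600  # age = 7D


def files_to_delete(files, cluster_name, now, max_age_s=SEVEN_DAYS_S):
    """Return the names a Delete action with both conditions would remove.

    files: iterable of (file_name, last_modified_epoch_seconds) pairs.
    A file must match the glob AND be old enough; otherwise it is kept.
    """
    pattern = f"{cluster_name}-*"  # IfFileName glob: ${sys:es.logs.cluster_name}-*
    return [
        name
        for name, mtime in files
        if fnmatch.fnmatchcase(name, pattern)  # IfFileName condition
        and now - mtime >= max_age_s           # IfLastModified condition
    ]


now = time.time()
day = 24 * 3600
sample = [
    ("my-cluster-2020-05-01-1.log.gz", now - 10 * day),       # old rolled log
    ("my-cluster-2020-05-24-1.log.gz", now - 1 * day),        # recent rolled log
    ("my-cluster_index_search_slowlog.log", now - 30 * day),  # slow log
]
print(files_to_delete(sample, "my-cluster", now))
# → ['my-cluster-2020-05-01-1.log.gz']
```

The slow log survives even though it is 30 days old, because `_` after the cluster name does not match the `-*` glob.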
Split-brain prevention
Cause:
Split brain is usually caused by network problems. If a network partition divides the cluster into two groups of nodes that cannot communicate with each other, each group may elect its own master. When the network recovers, the two masters hold conflicting cluster state, which can lead to data inconsistency or data loss.

# Set this to floor(n/2) + 1, where n is the number of master-eligible nodes. With only two master-eligible nodes no value is safe: 1 allows split brain, and 2 leaves the cluster without a master if either node fails.
discovery.zen.minimum_master_nodes: 2
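The sketch below shows why the quorum prevents split brain: after a partition, only a side holding at least `minimum_master_nodes` master-eligible nodes can elect a master. It is a simplified model with hypothetical helper names.

```python
def sides_with_master(partition_sizes, minimum_master_nodes):
    """For each side of a partition, report whether it can elect a master.

    partition_sizes: number of master-eligible nodes on each side.
    """
    return [size >= minimum_master_nodes for size in partition_sizes]


def split_brain(partition_sizes, minimum_master_nodes):
    """Split brain means more than one side elects its own master."""
    return sum(sides_with_master(partition_sizes, minimum_master_nodes)) > 1


print(sides_with_master([2, 1], 2))  # three nodes, quorum 2 → [True, False]
print(sides_with_master([1, 1], 1))  # two nodes, setting 1 → [True, True] (split brain)
print(sides_with_master([1, 1], 2))  # two nodes, setting 2 → [False, False] (no master)
```

The two-node cases reproduce the dilemma described in the comment above: either both halves elect a master, or neither does.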
Avoiding unnecessary shard reallocation on cluster restart
Reason: suppose a cluster has ten nodes, five of which start quickly while the other five start very slowly. The first five form a cluster on their own and allocate the primary and replica shards among themselves. When the remaining five nodes join, the shards are reallocated again, which costs a lot of resources and time.

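Configuration items:
# Minimum number of nodes that must join before recovery can start
gateway.recover_after_nodes: 2
# Number of nodes expected in the cluster; once this many have joined, recovery starts immediately
gateway.expected_nodes: 3
# If expected_nodes has not been reached, how long to wait (once recover_after_nodes is met) before starting recovery anyway
gateway.recover_after_time: 1m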
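A simplified model of how these three settings interact, assuming `elapsed_s` counts the seconds since `recover_after_nodes` was first satisfied. The function is hypothetical and does not mirror Elasticsearch's internal code.

```python
def should_start_recovery(
    joined: int,
    elapsed_s: float,
    recover_after_nodes: int = 2,
    expected_nodes: int = 3,
    recover_after_time_s: float = 60,
) -> bool:
    """Simplified model of the gateway settings above (1m = 60 s)."""
    if joined < recover_after_nodes:
        return False  # not enough nodes: keep waiting
    if joined >= expected_nodes:
        return True   # every expected node is here: recover immediately
    # Some nodes are still missing: recover only after the timeout expires
    return elapsed_s >= recover_after_time_s


print(should_start_recovery(1, 300))  # → False
print(should_start_recovery(2, 30))   # → False
print(should_start_recovery(2, 60))   # → True
print(should_start_recovery(3, 0))    # → True
```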
General approach to JVM heap sizing

A common rule is to give half of the machine's memory to the JVM heap, because the machine needs memory for more than the JVM: Lucene relies on the operating system's file-system cache, and the OS itself needs resources. If you don't use many aggregations or much fielddata, you can make the heap smaller. Either way, the heap should not exceed 32 GB: above roughly that size the JVM can no longer use compressed object pointers, so a larger heap actually wastes memory.
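Applied to a concrete machine, the rule looks like this. The sketch caps the heap at 31 GB to stay below the 32 GB compressed-pointer threshold (an assumed safety margin; the exact cutoff depends on the JVM) and produces matching -Xms/-Xmx values for config/jvm.options, since setting both to the same size avoids resizing the heap at runtime. The function names are hypothetical.

```python
def heap_size_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the machine's RAM, capped below the compressed-oops limit.

    cap_gb=31 is an assumed safety margin under 32 GB.
    """
    return min(ram_gb // 2, cap_gb)


def jvm_heap_options(ram_gb: int) -> list:
    """Matching -Xms/-Xmx lines for config/jvm.options."""
    heap = heap_size_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_options(16))  # → ['-Xms8g', '-Xmx8g']
print(jvm_heap_options(64))  # → ['-Xms31g', '-Xmx31g']
```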

Starting Elasticsearch in the background
# Elasticsearch refuses to run as root, so create a dedicated user
adduser elasticsearch
passwd elasticsearch
# Give the new user ownership of the Elasticsearch directories
chown -R elasticsearch /usr/local/...
# Append this line to /etc/sysctl.conf to raise the maximum number of memory-map areas per process (apply it with sysctl -p)
vm.max_map_count = 262144
# Edit /etc/security/limits.conf and add the following lines
# soft: the limit actually enforced; a user may raise it, but only up to the hard limit
# hard: the ceiling, which cannot be exceeded
# nproc: maximum number of processes each user can create
# nofile: maximum number of files each process can open
* soft nofile 65536
* hard nofile 65536
* soft nproc 4096
* hard nproc 4096
# Switch to the new user, then start Elasticsearch in the background (daemon mode)
su elasticsearch
./bin/elasticsearch -d
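Before starting the node, it can help to verify that the system settings above actually took effect. The following Linux-only sketch reads vm.max_map_count from /proc and the open-file limit through Python's resource module; the thresholds are the values used above and the helper names are hypothetical. Run it as the elasticsearch user from a new login session, because limits.conf changes only apply to new sessions.

```python
import resource
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value set in /etc/sysctl.conf above
REQUIRED_NOFILE = 65536          # nofile value set in limits.conf above


def check_limits(max_map_count: int, nofile_soft: int) -> list:
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if max_map_count < REQUIRED_MAX_MAP_COUNT:
        problems.append(f"vm.max_map_count is {max_map_count}, need {REQUIRED_MAX_MAP_COUNT}")
    # RLIM_INFINITY means "unlimited", which is fine
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < REQUIRED_NOFILE:
        problems.append(f"open-file limit is {nofile_soft}, need {REQUIRED_NOFILE}")
    return problems


def current_limits():
    """Read the live values for the current process (Linux only)."""
    max_map_count = int(Path("/proc/sys/vm/max_map_count").read_text())
    nofile_soft, _nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return max_map_count, nofile_soft


# Usage, as the elasticsearch user:
#   print(check_limits(*current_limits()) or "limits look OK")
```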