Automatically registering node_exporter with Consul and Prometheus to monitor OpenStack VMs

Posted Jun 16, 2020

1. The problem

At work, the VMs in our OpenStack cluster need monitoring of their basic performance metrics. Starting each VM, installing node_exporter on it by hand, and then editing prometheus.yml would be a nightmare for us lazy programmers, so we adopted a Prometheus + Consul monitoring scheme.

2. Solution

1. Package node_exporter into the VM image so that it is deployed automatically
2. Register node_exporter with Consul automatically through a small program, which is packaged into the image alongside node_exporter
3. Configure Prometheus to discover node_exporter through Consul

3. Deploy Consul cluster

3.1 Cluster planning

System       Hostname      IP
CentOS 7.7   compute-7-1   172.16.100.71
CentOS 7.7   compute-7-2   172.16.100.72
CentOS 7.7   compute-7-3   172.16.100.73

3.1 Download Consul and install it yourself

Consul v1.7.2

3.1.1 Obtain the master token

$ curl \
    --request PUT \
    http://172.16.100.71:8500/v1/acl/bootstrap
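
The bootstrap call answers with a JSON document, and the token to copy into the configuration is one field of it. The sketch below shows one way to pull it out. It assumes the response carries the secret in a SecretID field (with ID as a fallback for the older response shape); the function name and the sample response are made up for illustration.

```python
import json


def extract_master_token(response_body: str) -> str:
    """Return the secret token from an ACL bootstrap response body."""
    data = json.loads(response_body)
    # Assumption: the secret is in "SecretID"; "ID" is tried as a fallback.
    token = data.get("SecretID") or data.get("ID")
    if not token:
        raise ValueError("no token found in bootstrap response")
    return token


# Made-up sample; a real response contains more fields.
sample = (
    '{"AccessorID": "00000000-0000-0000-0000-000000000001", '
    '"SecretID": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}'
)
print(extract_master_token(sample))
```

The returned value is what goes into acl.tokens.master in the next step.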

3.1.2 Configure the obtained master token

compute-7-1:

{
    "bootstrap_expect":1,
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-1",
    "bind_addr":"172.16.100.71",
    "advertise_addr":"172.16.100.71",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-2:

{
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-2",
    "bind_addr":"172.16.100.72",
    "advertise_addr":"172.16.100.72",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "acl_datacenter":"sibat_consul",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}

compute-7-3:

{
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-3",
    "bind_addr":"172.16.100.73",
    "advertise_addr":"172.16.100.73",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "acl_datacenter":"sibat_consul",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"
        }
    }
}
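
The three files differ only in the node name, the IP addresses, and the bootstrap and join settings on compute-7-1, so they can also be generated instead of written by hand. The following is a minimal sketch of that idea, not part of the original setup; the function name server_config and the SERVERS table are assumptions introduced here.

```python
import json

# Hostname -> IP, taken from the cluster planning table.
SERVERS = {
    "compute-7-1": "172.16.100.71",
    "compute-7-2": "172.16.100.72",
    "compute-7-3": "172.16.100.73",
}


def server_config(node_name, master_token, encrypt_key):
    """Build the Consul server configuration for one node as a dict."""
    ip = SERVERS[node_name]
    config = {
        "datacenter": "sibat_consul",
        "primary_datacenter": "sibat_consul",
        "data_dir": "/data/consul",
        "connect": {"enabled": True},
        "server": True,
        "client_addr": "0.0.0.0",
        "ui": True,
        "node_name": node_name,
        "bind_addr": ip,
        "advertise_addr": ip,
        "enable_script_checks": False,
        "enable_local_script_checks": True,
        "log_file": "/var/log",
        "log_rotate_bytes": 300000000,
        "log_rotate_duration": "360h",
        "log_level": "info",
        "encrypt": encrypt_key,
        "acl": {
            "enabled": True,
            "default_policy": "deny",
            "enable_token_persistence": True,
            "tokens": {"master": master_token},
        },
    }
    if node_name == "compute-7-1":
        # Only compute-7-1 bootstraps and joins the other two servers.
        peers = [addr for name, addr in SERVERS.items() if name != node_name]
        config.update(bootstrap_expect=1, start_join=peers, retry_join=peers)
    else:
        config["acl_datacenter"] = "sibat_consul"
    return config


print(json.dumps(server_config("compute-7-2", "<master token>", "<gossip key>"), indent=4))
```

Writing the result of json.dumps to /etc/consul.d/consul_config.json on each node would reproduce the files above, with the placeholders replaced by the real token and gossip key.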

Start Consul on the three nodes.

3.1.3 Run on all three nodes

$ sudo useradd consul

$ sudo vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul agent
Documentation=https://www.consul.io/

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-file /etc/consul.d/consul_config.json
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

$ sudo systemctl daemon-reload

3.1.4 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.5 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

After startup, the server logs show permission-related errors. According to the official documentation, this is because no agent token has been configured, so one must be created:

$ curl \
    --request PUT \
    --header "X-Consul-Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c" \
    --data \
    '{
    "Name":"Agent Token",
    "Type":"client",
    "Rules":"node \"\" {policy = \"write\"} service \"\" {policy = \"read\" }" }' \
    http://172.16.100.71:8500/v1/acl/create

3.1.6 Configure the obtained agent token

compute-7-1:

{
    "bootstrap_expect":1,
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "start_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "retry_join":[
        "172.16.100.72",
        "172.16.100.73"
    ],
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-1",
    "bind_addr":"172.16.100.71",
    "advertise_addr":"172.16.100.71",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent":"883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-2:

{
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-2",
    "bind_addr":"172.16.100.72",
    "advertise_addr":"172.16.100.72",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "acl_datacenter":"sibat_consul",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent":"883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}

compute-7-3:

{
    "datacenter":"sibat_consul",
    "primary_datacenter":"sibat_consul",
    "data_dir":"/data/consul",
    "connect":{
        "enabled":true
    },
    "server":true,
    "client_addr":"0.0.0.0",
    "ui":true,
    "node_name":"compute-7-3",
    "bind_addr":"172.16.100.73",
    "advertise_addr":"172.16.100.73",
    "enable_script_checks":false,
    "enable_local_script_checks":true,
    "log_file":"/var/log",
    "log_rotate_bytes":300000000,
    "log_rotate_duration":"360h",
    "log_level":"info",
    "acl_datacenter":"sibat_consul",
    "encrypt":"gEjZMbDxnA5UDS5DJRI3Nn5KvOwdVa46jneHK0gFDa8=",
    "acl":{
        "enabled":true,
        "default_policy":"deny",
        "enable_token_persistence":true,
        "tokens":{
            "master":"8dc1eb67-1f5f-4e10-ad9d-5e58b047647c",
            "agent":"883efc94-0c59-c46f-67cf-4644ac4adad2"
        }
    }
}
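
The only change in this step is the extra "agent" entry under acl.tokens, so the existing files can be patched instead of retyped. The sketch below shows one way to do that; the function name add_agent_token is an assumption made for illustration, and the sample configuration is shortened.

```python
import json


def add_agent_token(config_text, agent_token):
    """Return the configuration JSON with acl.tokens.agent set."""
    config = json.loads(config_text)
    # Create the acl/tokens objects if they are missing, then set the token.
    config.setdefault("acl", {}).setdefault("tokens", {})["agent"] = agent_token
    return json.dumps(config, indent=4)


# Shortened sample configuration, for illustration only.
existing = """
{
    "node_name": "compute-7-2",
    "acl": {
        "enabled": true,
        "tokens": {"master": "8dc1eb67-1f5f-4e10-ad9d-5e58b047647c"}
    }
}
"""
print(add_agent_token(existing, "883efc94-0c59-c46f-67cf-4644ac4adad2"))
```

All other keys are left untouched, so the same call works for each of the three nodes.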

3.1.7 Run on compute-7-2 and compute-7-3

$ sudo systemctl restart consul && sudo systemctl enable consul

3.1.8 Run on compute-7-1

$ sudo systemctl restart consul && sudo systemctl enable consul

Once the cluster is stable, you can access the UI at http://172.16.100.71:8500

4. Integrate Prometheus

$ sudo vim /etc/prometheus/prometheus.yml
...
  - job_name: 'OpenStack-vms'
    consul_sd_configs:
      - server: "172.16.100.71:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.72:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
      - server: "172.16.100.73:8500"
        token: '8dc1eb67-1f5f-4e10-ad9d-5e58b047647c'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: ".*OpenStack-vms.*"
        replacement: OpenStack-vms
        action: keep
        target_label: env
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
...

$ sudo systemctl restart prometheus
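
To see what the two relabel rules do to a discovered target, the sketch below imitates them in plain Python. It is a simplified model, not Prometheus code: the keep rule only tests its fully anchored regex against __meta_consul_tags (replacement and target_label play no part in a keep rule), and the labelmap rule copies every matching label to the name captured by the regex. The tag list and the hostname metadata key in the example are hypothetical.

```python
import re

KEEP_REGEX = ".*OpenStack-vms.*"
LABELMAP_REGEX = "__meta_consul_service_metadata_(.+)"


def relabel(labels):
    """Apply both rules; return None when the target is dropped."""
    # Rule 1 (action: keep): Prometheus anchors the regex, hence fullmatch.
    if not re.fullmatch(KEEP_REGEX, labels.get("__meta_consul_tags", "")):
        return None
    # Rule 2 (action: labelmap): copy each matching label to the captured name.
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(LABELMAP_REGEX, name)
        if match:
            result[match.group(1)] = value
    return result


# Consul SD joins the service tags with "," and wraps them in the separator.
vm = {
    "__meta_consul_tags": ",OpenStack-vms,node-exporter,",
    "__meta_consul_service_metadata_hostname": "vm-01",  # hypothetical key
}
print(relabel(vm))
print(relabel({"__meta_consul_tags": ",node-exporter,"}))
```

In this model a service is kept only if one of its Consul tags contains OpenStack-vms, so the tag a VM registers with has to match the regex in the keep rule.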

After the restart, the job_name just configured appears in the Prometheus UI:
[Screenshot: the OpenStack-vms job in the Prometheus UI]

5. Automatic registration of VMs

Problem: the native components offer no good solution for automatic registration. At first I registered each VM with curl from a shell script called in rc.local, but sometimes a VM still ended up unregistered. I also found that Consul is not a strongly consistent registry: sometimes the same service ID was registered on different nodes at the same time:
[Screenshot: the same service ID registered on different Consul nodes]
So I wrote a small program in Go that registers node_exporter automatically and is started at boot by systemd. It uses a set of algorithms to avoid duplicate registration and to balance registrations across the Consul nodes.
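
The post does not describe the algorithms the program uses, so the following is only a sketch of one possible approach, not the author's implementation. It assumes two ideas: a service ID derived from the VM's IP and port, so that registering again overwrites the earlier entry instead of adding a second one, and a hash of that ID to pick exactly one Consul server, which spreads VMs roughly evenly. The function names and the /metrics health check are assumptions; the payload fields follow Consul's agent service registration API.

```python
import hashlib
import json

CONSUL_SERVERS = [
    "172.16.100.71:8500",
    "172.16.100.72:8500",
    "172.16.100.73:8500",
]


def service_id(ip, port):
    """Derive a stable ID so the same VM always registers under one name."""
    return f"node-exporter-{ip}-{port}"


def pick_server(sid, servers):
    """Map a service ID to exactly one server by hashing it."""
    digest = hashlib.sha256(sid.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def registration_payload(ip, port=9100):
    """Build the JSON body for PUT /v1/agent/service/register."""
    return {
        "ID": service_id(ip, port),
        "Name": "node-exporter",
        "Tags": ["node-exporter"],
        "Address": ip,
        "Port": port,
        "Check": {
            "HTTP": f"http://{ip}:{port}/metrics",  # assumed health check
            "Interval": "5s",
            "Timeout": "5s",
            "DeregisterCriticalServiceAfter": "5s",
        },
    }


payload = registration_payload("10.0.0.15")  # hypothetical VM address
target = pick_server(payload["ID"], CONSUL_SERVERS)
print(target)
print(json.dumps(payload, indent=2))
```

Because the server is chosen from the ID alone, a VM that reboots and registers again lands on the same Consul node, which is one way to avoid the duplicate entries described above.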

$ wget https://github.com/FrankenFuncc/consul-registy-service/releases/download/202006161758/consulR.zip
$ unzip consulR.zip
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.0.0/node_exporter-1.0.0.linux-amd64.tar.gz
$ tar -zxvf node_exporter-1.0.0.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/node_exporter-1.0.0.linux-amd64 /usr/local/node_exporter

Install node_exporter and start it at boot

$ vim
[Unit]
Description=node_exporter:the monitoring system
Documentation=http://prometheus.io/docs/

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=always
StartLimitInterval=0
RestartSec=10

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start node_exporter && systemctl enable node_exporter

Install the Consul registration program and start it at boot

$ vim /etc/consul/consul.yaml
System:
  ServiceName: consul-registy-service
  ListenAddress: 0.0.0.0
  Port: 9984
  # IP and port used to find the IP address of the outbound network interface
  FindAddress: 8.8.8.8:80
Logs:
  LogFilePath: /data/consul/consul.log
  LogLevel: info
Consul:
  Address: 172.16.100.71:8500,172.16.100.72:8500,172.16.100.73:8500
  Token: 8dc1eb67-1f5f-4e10-ad9d-5e58b047647c
  CheckTimeout: 5s
  CheckInterval: 5s
  CheckDeregisterCriticalServiceAfter: true
  CheckDeregisterCriticalServiceAfterTime: 5s
Service:
  Tag: node-exporter
  # If Address is empty, the outbound interface IP found through FindAddress is used
  Address:
  Port: 9100
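
The FindAddress setting suggests a common trick for finding the address a machine uses for outbound traffic: open a UDP socket, "connect" it to a remote address, and read back the local address the kernel chose. Whether the registration program does exactly this is an assumption; the sketch below only illustrates the technique, and the function name outbound_ip is made up.

```python
import socket


def outbound_ip(find_address="8.8.8.8:80"):
    """Return the local IP the kernel would use to reach find_address."""
    host, port = find_address.rsplit(":", 1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only selects a route.
        sock.connect((host, int(port)))
        return sock.getsockname()[0]
```

On a VM with a default route, outbound_ip() returns the address of the interface that carries that route. The call raises OSError when there is no route to the target.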


$ vim /usr/lib/systemd/system/consul.service
[Unit]
Description=Consul
After=network-online.target

[Service]
User=nobody
ExecStart=/usr/local/consul --confpath=/etc/consul/consul.yaml
Restart=on-failure
RestartSec=1

[Install]
WantedBy=multi-user.target
$ systemctl daemon-reload && systemctl start consul && systemctl enable consul

After you create the image, every VM launched from it is discovered by Prometheus automatically.