Issue 02: Prometheus data collection (1)

Posted May 25, 2020


The previous article (Issue 01: Detailed Prometheus column opening) introduced the architecture of Prometheus. This article covers Prometheus data collection: it first describes the format and classification of the collected data, and then gives some recommendations for use.

  1. Format and classification of collected data

1.1 Format of collected data

Prometheus uses a metric to represent a monitored quantity. A metric consists of a metric name and a set of label pairs:

<metric name>{<label name>=<label value>, ...}

The metric name describes the general characteristic being monitored. For example, http_requests_total represents the total number of HTTP requests received. The metric name must consist of letters, numbers, underscores, or colons. The colon is reserved for recording rules and should not be used directly in instrumented metric names.
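The naming rule can be checked mechanically. The sketch below assumes the classic pattern from the Prometheus data model documentation, `[a-zA-Z_:][a-zA-Z0-9_:]*`, which additionally forbids a leading digit. The helper names `is_valid_metric_name` and `uses_reserved_colon` are hypothetical and exist only for this illustration.

```python
import re

# Classic metric name pattern from the Prometheus data model documentation.
# Assumption: newer Prometheus versions may accept a wider character set.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the classic pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def uses_reserved_colon(name: str) -> bool:
    """Colons are reserved for recording rules, so flag them in instrumented names."""
    return ":" in name


for candidate in ["http_requests_total", "2xx_responses", "job:http_requests:rate5m"]:
    print(candidate, is_valid_metric_name(candidate), uses_reserved_colon(candidate))
```

A name such as job:http_requests:rate5m is valid but contains colons, so it would normally be produced by a recording rule rather than by application code.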

Labels capture the dimensions of a metric. For example, http_requests_total{method="POST", status="200"} represents the total number of POST requests that returned status 200. Adding a label to a metric is an easy way to add a dimension to it, and labels also make filtering and aggregation convenient at query time. For example, to get the total number of requests with a 200 response, you only need to select http_requests_total{status="200"} and sum the matching series.
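To make label-based filtering and aggregation concrete, here is a minimal in-memory sketch. It is not how Prometheus stores or queries data; the `SERIES` values are invented, and `select` and `total` are hypothetical helpers that mimic an equality label matcher followed by a sum.

```python
from typing import Dict, List, Tuple

# Each series is (metric name, labels, current value). The values are made up.
Series = Tuple[str, Dict[str, str], float]

SERIES: List[Series] = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 120.0),
    ("http_requests_total", {"method": "POST", "status": "200"}, 30.0),
    ("http_requests_total", {"method": "POST", "status": "500"}, 4.0),
]


def select(series: List[Series], name: str, matchers: Dict[str, str]) -> List[Series]:
    """Keep series with the given name whose labels equal every matcher."""
    return [
        s for s in series
        if s[0] == name and all(s[1].get(k) == v for k, v in matchers.items())
    ]


def total(series: List[Series]) -> float:
    """Aggregate the selected series by summing their values."""
    return sum(value for _, _, value in series)


ok_requests = select(SERIES, "http_requests_total", {"status": "200"})
print(total(ok_requests))  # 150.0
```

The matcher on status keeps both the GET and the POST series, which is why the method dimension has to be summed away to get a single total.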

Prometheus calls the sequence of values that a metric produces over time a time series. The data at a single point in time is called a sample, which consists of a float64 value and a millisecond-precision timestamp.

1.2 Classification of collected data

After understanding the format of the data collected by Prometheus, let's take a look at its classification. Prometheus divides the collected data into four types: Counter, Gauge, Histogram, and Summary.

Note that this is only a logical classification. The Prometheus server does not yet make use of the type information and treats all collected data as untyped time series. This may change in the future.

The four types are described below.

Counter

Counter is a counter type, suitable for monotonically increasing quantities, such as the total number of requests, the total number of tasks completed, or the total number of errors. Its value only ever increases, except that it starts again from 0 when the process restarts.
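A counter exposed by a process starts again from 0 when that process restarts, so a consumer has to treat any drop in value as a reset rather than as a negative change. The sketch below shows that idea in simplified form. `counter_increase` is a hypothetical helper, and it deliberately omits the extrapolation that the real PromQL functions rate() and increase() perform.

```python
from typing import Iterable


def counter_increase(samples: Iterable[float]) -> float:
    """Total increase across samples, treating any drop as a reset to 0."""
    increase = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                increase += value - previous
            else:
                # The counter restarted from 0, so the whole current value is new.
                increase += value
        previous = value
    return increase


# The counter restarts between the third and fourth sample.
print(counter_increase([100, 110, 125, 5, 20]))  # 45.0
```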

Gauge

Gauge represents a value that can go up or down, such as CPU usage, memory usage, or I/O size.

Histogram

Histogram is a cumulative histogram, usually used to describe the long-tail behavior of a monitored quantity.

For example:

Suppose you use a Histogram to analyze the response time of API calls, and use the boundaries [30ms, 100ms, 300ms, 1s, 3s, 5s, 10s] to divide the response time into 8 buckets (the eighth covers everything above 10s). Each time a response time is observed, for example 200ms, it falls into the interval (100ms, 300ms]. Because the histogram is cumulative, the count of every bucket whose upper bound is at least 200ms (300ms, 1s, 3s, 5s, 10s, and +Inf) increases by 1. Plotting the response time on the horizontal axis and the count of each bucket on the vertical axis gives the cumulative histogram of API call response time.
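The bucketing in this example can be sketched as follows, assuming the Prometheus convention that each bucket counts observations less than or equal to its upper bound and that a final +Inf bucket catches everything else. `CumulativeHistogram` is a hypothetical class written for this illustration, not a client library type.

```python
import bisect
import math
from typing import List, Tuple

# Upper bounds in seconds: the seven boundaries from the text plus +Inf.
BOUNDS = [0.03, 0.1, 0.3, 1.0, 3.0, 5.0, 10.0, math.inf]


class CumulativeHistogram:
    def __init__(self, bounds: List[float]) -> None:
        self.bounds = list(bounds)
        self.counts = [0] * len(self.bounds)  # per-bucket, non-cumulative
        self.total = 0.0  # sum of all observed values

    def observe(self, value: float) -> None:
        # bisect_left finds the first bound that is >= value ("le" semantics).
        index = bisect.bisect_left(self.bounds, value)
        self.counts[index] += 1
        self.total += value

    def cumulative(self) -> List[Tuple[float, int]]:
        """Return (upper bound, count of observations <= bound) pairs."""
        running = 0
        result = []
        for bound, count in zip(self.bounds, self.counts):
            running += count
            result.append((bound, running))
        return result


histogram = CumulativeHistogram(BOUNDS)
histogram.observe(0.2)  # 200ms lands in the (100ms, 300ms] interval
for bound, count in histogram.cumulative():
    print(bound, count)
```

Storing per-interval counts and accumulating on read is a design choice of this sketch; the observable result is the same as incrementing every bucket whose bound is at least the observed value.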

Summary

Summary is similar to Histogram, but it records quantiles of the monitored quantity. What is a quantile? For example, suppose an HTTP endpoint is called 100 times and 100 response times are recorded. Sort the 100 values in ascending order; the 0.9 quantile (the 90% position) is then the 90th value.
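The definition above corresponds to what is often called the nearest-rank method. The following sketch assumes that method; `nearest_rank_quantile` is a hypothetical helper, and real client libraries typically use streaming approximations instead of sorting every observation.

```python
import math
from typing import Sequence


def nearest_rank_quantile(values: Sequence[float], q: float) -> float:
    """Return the value at the q-quantile position of the sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))  # 1-based position in the sorted list
    return ordered[rank - 1]


# 100 made-up response times: 1ms, 2ms, ..., 100ms.
response_times_ms = list(range(1, 101))
print(nearest_rank_quantile(response_times_ms, 0.9))  # 90
```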

A Histogram can be used to approximate quantiles, but the result is not exact. A Summary computes quantiles on the client, which is more accurate than a Histogram. However, computing a Summary consumes more client resources, and the precomputed quantiles cannot be averaged or aggregated with those of other instances, so a Summary is usually used on its own.
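To see why a quantile derived from a Histogram is only an estimate, consider the sketch below. It interpolates linearly inside the bucket that contains the target rank, in the spirit of PromQL's histogram_quantile but not a copy of its implementation. `estimate_quantile` and the bucket counts are assumptions made for this illustration.

```python
import math
from typing import List, Tuple


def estimate_quantile(q: float, buckets: List[Tuple[float, int]]) -> float:
    """Estimate a quantile from (upper bound, cumulative count) pairs.

    The buckets must be sorted by upper bound and end with +Inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Nothing to interpolate towards; fall back to the last finite bound.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Made-up cumulative counts for bounds of 100ms, 300ms, 1s, and +Inf.
buckets = [(0.1, 50), (0.3, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.9, buckets))
print(estimate_quantile(0.95, buckets))
```

The estimate can never be more precise than the bucket layout allows: every observation inside a bucket is assumed to be evenly spread between its two bounds, whatever the real distribution was.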

  2. Recommendations for use

2.1 Metric naming

  • The metric name should start with the domain it belongs to. For example, a metric about a process starts with process: process_cpu_seconds_total.
  • Metric names should end with a descriptive base unit in plural form. If the metric is an accumulating total, add total as a suffix, for example: http_requests_total.

2.2 Selection of Label

Labels should describe the typical characteristics of a metric, for example operation="create|update|delete" to distinguish different types of HTTP requests. Pay special attention to cardinality: values with a very wide range, such as user IDs and e-mail addresses, should not be used as label values, because every distinct value creates a new time series and significantly increases the amount of data stored. The number of labels on a metric should also stay small, ideally no more than 10 for a single metric.
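The storage cost comes from the fact that each distinct combination of label values is its own time series. A rough upper bound is therefore the product of the number of values per label, as the sketch below illustrates. `max_series_count` is a hypothetical helper, and the label sets are invented.

```python
from math import prod
from typing import Dict, List


def max_series_count(label_values: Dict[str, List[str]]) -> int:
    """Upper bound on the number of series for one metric name."""
    return prod(len(values) for values in label_values.values())


bounded = {
    "operation": ["create", "update", "delete"],
    "status": ["200", "400", "500"],
}
print(max_series_count(bounded))  # 9

# Adding a user_id label with 10,000 distinct users multiplies the bound.
unbounded = dict(bounded, user_id=[str(i) for i in range(10_000)])
print(max_series_count(unbounded))  # 90000
```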

2.3 Selection of Histogram and Summary

  • If you need to aggregate (for example across instances), use Histogram
  • If you have a rough expectation of the distribution of observations, use Histogram; otherwise use Summary

2.4 What should I monitor?

  • In terms of service types, all kinds of services should be monitored: online services, offline services, and batch jobs
  • In terms of the implementation of a single service, its key logic should be monitored, such as the total number of executions, the number of failures, and the number of retries
  • In terms of service quality, the total number of requests, the request error rate, and the request response time should be monitored
  • In terms of system resources, utilization, saturation, and errors should be monitored

Further reading:
[Recording rules] https://prometheus.io/docs/pr ...
