Data index system construction practice
Posted May 25, 2020 • 11 min read
Why build a data indicator system?
Because we hope that time will be spent solving problems instead of finding them.
What we need is not data, but insight into the mapping behind the data.
In the process of business development, a large amount of data will be generated.It is of no value to look at the data alone.Only when it is combined with the business and transformed into information, then it can be reflected after processing.
For business data, it is usually divided into two items:one is the dimension, and the other is the measurement, or the indicator.These two constitute the basis of our data analysis.
For structured data, it can be understood as a multi-dimensional cube(cube), which contains dimensions and measures
Here is an example of a three-dimensional cube model:
Imagine the mode in which you operate the pivot table.You can perform multi-dimensional observations on this cube by drilling, slicing, dicing, etc., and record and compare multiple metrics to obtain the business status depicted by the data reduction, and then pass The comparison yields insights into business development to make corresponding decisions.
However, in actual business operations, it is not possible to fully understand the status of business development from just a few dimensions.We need to observe the business from a deeper level, and we need to quickly locate the cause when problems occur in the business indicators. Build a suitable indicator system to achieve.
1.1 Why build an indicator system
Monitoring the status of business development through an indicator system, the greatest value is to use time efficiently and spend time solving problems instead of looking for problems, thereby improving the overall human efficiency.
The output of the indicator system should be an indicator dictionary and the corresponding Dashboard display, which must meet at least the following requirements:
- Systematic index monitoring system, able to understand the current status of business development from multiple dimensions
- Can quickly locate the problem when there is a problem in business development
- Efficiently provide data support for the team
This is actually the first two layers of business data analysis capabilities, namely what happened and why.
Indicator type and naming
In the process of building the indicator system, the first action is to clarify the classification of the indicator and the naming method of the constrained indicator, so that each indicator can be known and reduce the communication cost. The division and naming of standard indicators:
The indicators are divided into atomic indicators and derived indicators
Atomic indicators are measurements based on the behavior of a business event. They are indivisible indicators in the business definition. They are nouns with clear business meanings, reflecting clear business statistics and calculation logic, such as payment amounts.
- Atomic index = business process + measurement
- Derived index = time period + modifier + atomic index. Derived index can be understood as the delineation of the statistical scope of the atomic index business.
The following figure is the relationship between the basic concepts:
Refer to Ali's definition of the above basic concepts:
Business segment:A higher-dimensional business division method than the data domain, which is suitable for particularly huge business systems.
Business process:refers to the business activity event of an enterprise. The following orders, payment, and refunds are all business processes. Please note that the business process is an inseparable behavioral event. In layman's terms, the business process is an event in an enterprise activity.
Modification type:It is an abstract division of modifiers. The modification type belongs to a certain service domain. For example, the access terminal type of the log domain covers modifiers such as wireless and PC.
Modifier:It points out that the business scenes of the indicators other than the statistical dimension limit abstraction. The modifier belongs to a modifier type. If there are access terminals in the log domain, there are modifiers such as PC and wireless.
Time period:It is used to clarify the time range or time point of data statistics, such as the last 30 days, the natural week, and the end of the day.
Metric/Atomic Index:Atomic index and metric have the same meaning. Based on the measurement of a business event, it is an indivisible indicator in the business definition. It is a noun with a clear business meaning, usually a combination of business process and measurement. , Such as the payment amount.
Dimension:A dimension is a measurement environment, used to reflect a type of attribute of a business. The collection of such attributes constitutes a dimension and can also become an entity object. Dimensions belong to a data domain, such as geographic latitude and time dimensions. For example, when analyzing the transaction process, the environment in which the transaction takes place can be described through dimensions such as buyer, seller, commodity, and time.
Derived indicators:derived indicators = one atomic indicator + multiple modifiers(optional) + time period. It can be understood as the delineation of the statistical scope of atomic indicators. For example, the atomic indicator:the payment amount, the payment amount of the overseas buyer in the last day is the derived indicator(the last day is the time period, the overseas is the modifier, the buyer is the dimension, not the modifier)
Results and process indicators
Resultive indicator: For example, GMV or order volume in the e-commerce scenario, which is usually the bottom of the business funnel and is an unchangeable, posterior indicator.
Procedural indicators: It can be simply understood as the path I took before reaching this result, and the process of measuring the quality of conversion through this path. It is intervenable and is usually "user behavior."
In the actual business operation process, we should not only pay attention to the result indicators, but also pay attention to the process indicators. By optimizing the process indicators , we can more effectively achieve the result indicators
After understanding the types of indicators, we can start to build our indicator system. First of all, what is the core indicator that we are concerned about?
The core index should be a result index, then dismantle the process index on the basis of the core index and divide the levels vertically, and then divide the relationship between the levels, and through the level division, we finally achieve what we need effect
Index system construction
When building an indicator system, use the OSM model horizontally and classify three levels of indicators vertically
3.1 Horizontal selection data indicators
The selection of data indicators needs to be directional, and the indicators that best represent the development of the business need to be selected according to the current status of the business. In this regard, there are mature models to refer to. Here, the OSM model is used to select indicators
OSM model(Obejective, Strategy, Measurement)
Respectively represent business goals, business strategy, business metrics
O:What is the user s goal for using the product? What do the products meet the needs of users? What is the core goal of the business
S:What is the strategy to achieve the above goals?
M:What are the changes in data indicators that these strategies bring?
(Reference) According to the core business objectives, the key indicators finally selected are as follows:
3.2 Vertical division of data indicator levels
Based on the data indicators selected above, the data indicators are hierarchically divided. The division of the indicator layer can help the company build a complete data monitoring indicator system, so as to discover the increase or decrease of performance in a timely manner, as well as the causes, saving money in finding problems On time.
Index grading can help us locate problems more efficiently and verify your methodology without having to think about which indexes to look at every time.
- First-level indicators:Company's strategic level indicators, which must be the core indicators recognized by the entire company to measure performance. It can directly guide the company's strategic goals and measure the company's business achievements. Essentially, it requires two-way understanding and recognition of management and subordinate employees, and it must be easy to communicate. Such as the company s sales, or the activity of social products
- Secondary indicators:Business strategy level indicators. The secondary indicators are the path indicators of the primary indicators. When the primary indicators change, we can quickly locate the cause of the problem by viewing the secondary indicators. For example, uv, conversion rate, and customer unit price, these three indicators can quickly locate the reason for the decrease in sales.
- Three-level indicators:business execution level indicators, the third-level indicators are the dismantling of the path of the second-level indicators, that is, the process indicators of the second-level indicators. Through the third-level indicators, you can efficiently locate the causes of the fluctuation of the second-level indicators, and you can quickly make corresponding actions. This step will be dismantled based on historical experience. When dismantling, you can try to ask yourself what do I need to do to achieve the secondary indicator? What are the indicators corresponding to these things?
Build a number of warehouses
The indicator system has been built, and the required indicators have been sorted out, but how can we calculate the indicators we need more reasonably and effectively?
This requires construction of data warehouses, multi-dimensional calculation and summary of data through the ETL process, and then visual display through Bi products, and finally we produce the indicator system we need.
And this data bin needs to satisfy idempotency, that is, the results of multiple review calculations should be the same.
During the construction process, only the offline data warehouse construction in T + 1 mode is considered, and other situations are not considered for the time being.
4.1 Divide the data domain
- Data warehouse is a subject-oriented(data synthesis, classification and abstraction for analysis and application) application. In addition to horizontal layering, the design of the data warehouse model usually needs to divide the data domain vertically according to the business situation. Data domain is a collection of more closely related data topics, which is a high-level conceptual classification of business objects, the purpose is to facilitate the management and application of data.
- Data domain refers to a collection that abstracts business processes or dimensions for business analysis. To protect the vitality of the entire system, the data domain needs to be abstracted and refined, and maintained and updated for a long time. When dividing the data domain, it can not only cover all current business needs, but also allow new services to be included in the existing data domain or expand the new data domain when entering. The division of the data domain can be carried out after business research, and it is necessary to analyze what business activities are in each business module.
- The data domain can be divided according to the department of the user company, or according to the business process or the function module in the business sector.
For example, the division of the following product domains(according to functional modules):
4.2 Constructing the bus matrix
After clarifying which business processes exist in each data domain, the bus matrix can be constructed.
At the same time, it is necessary to clarify which dimensions the business process is related to, and define the business process and dimensions under each data domain.
The key to constructing a bus matrix is:
- Clarify the data domain to which the business process belongs
- Clarify the relationship between business processes and dimensions
Based on the above two points, macroscopically construct the relationship between business topics and data domains, microscopically construct the relationship between business processes and dimensions in business topics
1 . Macro business matrix
The macro matrix is the relationship between business topics and data topics, and only the business processes involved in the community are placed here.
2. Micro business matrix
Micromatrix is the relationship between data subject and dimension.
When constructing the micro business matrix, we need to combine the analysis of the business process to define the dimension. According to the different forms of the business, we need to analyze from different dimensions. The definition of this dimension needs to be combined with the business scenario and analysis indicators. The final definition is as follows:
Based on the bus matrix obtained above, we can design the following model:
- Detailed model design:design consistency dimension table DIM and consistency fact table DWD
- Aggregation model design:design the common aggregation layer DWS and application aggregation layer ADS
4.3 Determination of statistical indicators
This step needs to calculate the data, standardize the naming according to the index system previously constructed using the OSM model and the index layering, and then calculate the indexes involved.
For example, by accessing the fact table, calculate the number of users in the community, the number of users in each sub-sector of the community, and split the platform, version, and user type.
4.4 Layering of data warehouse
The purpose of the data warehouse layering is that we hope that the flow of data can be more orderly and controllable, reduce repeated development, unify the data caliber, and respond to diverse data needs in a timely and effective manner. Refer to the following structure to organize the data:
DWD layer:detail fact layer DWS layer:topic summary layer, this step can be split into two layers:
The DWM layer summarizes daily data based on the dimensions of the detailed fact table, and the DWS layer summarizes data based on commonly used analysis dimensions
Perform hierarchical calculation according to the following hierarchical calling standards
- Prohibit reverse call
- Avoid calling at the same level
- Preferential use of public layer
- Avoid cross-layer references
Output indicator dictionary
The data calculates the indicators we need after the ETL, but in the calculation process of the data, we will encounter a lot of caliber calculation problems, requiring us to clear the caliber many times together with operations, technology, and products.
For example, the likes that are deleted during user interaction are not counted as likes, and the deleted comments are not considered as interactions or the like. Therefore, in the final stage, we need to output an index dictionary for the calculation caliber of each indicator in the calculation process, the handling of outliers, etc., so that we can communicate with the operation.
Of course, this step can also be output after the construction of the vertical indicator level, but since we cannot predict the problems in the calculation process in advance, it is recommended to output an accurate indicator dictionary at the final stage of the indicator system construction. A
The three elements that the indicator dictionary output must be clear are:indicator name, indicator description, calculation method, the structure can refer to the following indicator dictionary:
BI layer display
There are many ways to display the BI layer. Currently, the most common and common way to display is through BI tools.
Mainstream BI tools include foreign tableau and powerbi; domestic representative BI products include NetEase, Alibaba s QuickBI and smartBI. However, the above BI platform is charged, and the customization is not so strong.
Open source BI tools There are superset, redash, metabase in foreign countries, and CBoard and Davinci in China. For a brief introduction of these products, you can view this article to visualize BI tools. Here, superset is used to display BI layer.
Considering that we want to output an indicator system rather than a simple data display, it is necessary to perform basic cascading refresh and filter operations during the display so that operations can find the cause of the problem through the display layer.
The presentation layer needs to meet the following functions:
- Ability to describe the current operational status of the business from multiple angles
- Ability to analyze business in multiple dimensions from time, platform, and crowd
- Able to quickly obtain historical data through excel flexible analysis
- Able to understand the meaning of corresponding indicators
- Ability to focus on the core link
In addition, it is definitely unrealistic to display all data on the same DashBoard, so it is necessary to plan DashBoard to the following level according to the analysis theme
- Overall business level:This level focuses on the core theme of the current business status, namely the first and second level data in the indicator system
- Business execution level:This level focuses on the result indicators and path indicators of specific business topics, that is, the second and third level data in the indicator system