Linux OOM killer

Posted May 27, 2020

As a Linux programmer, I sometimes run into a situation where system memory is exhausted. What does the kernel do when a process then asks it for more memory? Will malloc in the program return NULL?

To deal with memory shortage, the Linux kernel provides a mechanism called the OOM (Out Of Memory) killer, which can be configured to control what the kernel does when memory runs out.

OOM killer

When both physical memory and swap space are used up and a process requests more memory, the kernel triggers the OOM killer, which behaves as follows:

  1. Check /proc/sys/vm/panic_on_oom: if its value is 2, the system always panics
  2. If the value of /proc/sys/vm/panic_on_oom is 1, the system may panic (see below)
  3. If the value of /proc/sys/vm/panic_on_oom is 0, or no panic was triggered in the previous step, the kernel goes on to check /proc/sys/vm/oom_kill_allocating_task
  4. If /proc/sys/vm/oom_kill_allocating_task is 1, the kernel kills the process that is currently requesting memory
  5. If /proc/sys/vm/oom_kill_allocating_task is 0, the kernel checks the score of every process and kills the one with the highest score (see below)

If /proc/sys/vm/oom_dump_tasks is 1, the kernel also writes a dump of all tasks (excluding kernel threads) to the kernel log when the OOM killer runs, including pid, uid, tgid, vm size, rss, nr_ptes, nr_pmds, swapents, oom_score_adj score and name. With this information you can analyze why a particular process was chosen to be killed. The victim itself is killed with SIGKILL, which never produces a core dump, so core file settings such as the core size rlimit and /proc/sys/kernel/core_pattern do not come into play for OOM kills.

Here is the default configuration on Ubuntu:

#Do not panic on OOM
dev@ubuntu:~$ cat /proc/sys/vm/panic_on_oom
0

#On OOM, kill the process with the highest score
dev@ubuntu:~$ cat /proc/sys/vm/oom_kill_allocating_task
0

#On OOM, dump the task list to the kernel log
dev@ubuntu:~$ cat /proc/sys/vm/oom_dump_tasks
1

#The default max core file size is 0, so the system will not generate core files
dev@ubuntu:~$ prlimit | grep CORE
CORE max core file size 0 unlimited blocks

#Core dumps (when enabled) are handed to apport; see the apport documentation for related settings
dev@ubuntu:~$ cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %P

Reference: apport

panic_on_oom

As described above, the value of this file can be 0, 1 or 2: 0 means never panic, 2 means always panic, and with 1 the outcome depends on [mempolicy](https://www.kernel.org/doc/Documentation/vm/numa_memory_policy.txt) and cpusets: if the memory shortage is confined to nodes restricted by a mempolicy or cpuset, the kernel kills a process instead of panicking. This article does not go further into these topics.

By default, the kernel just hangs after a panic, to give developers a chance to debug. That is of little use to most application developers, who would rather the system restart quickly. To make the kernel reboot after a panic, modify /proc/sys/kernel/panic, which specifies how many seconds after a panic the system reboots. Its default value is 0, which means never reboot.

#Reboot the system 3 seconds after a panic
dev@ubuntu:~$ sudo sh -c "echo 3 > /proc/sys/kernel/panic"

Adjust score

When oom_kill_allocating_task is 0 (the system default), the kernel kills the process with the highest score. Where does the score come from? It is maintained by the kernel and exposed in each process's /proc/<pid>/oom_score file.

The score of each process is affected by many factors. For example, older kernels took into account how long a process had been running: the longer it ran, the more important it was assumed to be, so the lower its score. Memory use also matters: the more memory a process has allocated since it started, the higher its score (current kernels base the score mainly on memory use). These are just one or two of the factors; the real calculation is much more complicated, and you need to read the kernel code for the details. For reference, see [Taming the OOM killer](https://lwn.net/Articles/317814/).

Because the score calculation is complicated and hard to control, the kernel provides another file for adjusting the score: /proc/<pid>/oom_adj. Its default value is 0, and it can be set to any value from -17 to 15. After computing a process's score, the kernel combines it with the value in this file and writes the result to /proc/<pid>/oom_score as the process's final score. The calculation is roughly as follows:

  • If the value of /proc/<pid>/oom_adj is a positive number n, the score is multiplied by 2^n
  • If the value of /proc/<pid>/oom_adj is a negative number -n, the score is divided by 2^n

The value -17 is special (OOM_DISABLE in the kernel): a process with this setting always ends up with a score of 0, which means it is never killed.

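Admittedly, this control is not very precise, but it is far better than nothing. On newer kernels, /proc/<pid>/oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, which takes values from -1000 to 1000; -1000 likewise exempts the process from the OOM killer.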

Change setting

The files above can be modified in the following three ways; panic_on_oom is used as the example:

  • Write the file directly (lost after reboot)

      dev@ubuntu:~$ sudo sh -c "echo 2 > /proc/sys/vm/panic_on_oom"
  • Use the sysctl command (lost after reboot)

      dev@dev:~$ sudo sysctl vm.panic_on_oom=2
  • Edit the configuration file (persists across reboots)

      #Add vm.panic_on_oom = 2 to /etc/sysctl.conf with an editor (if the entry already exists, modify it)
      dev@dev:~$ sudo vim /etc/sysctl.conf
    
      #Reload sysctl.conf so the change takes effect immediately
      dev@dev:~$ sudo sysctl -p

Logs

Once the OOM killer is triggered, the kernel writes a corresponding log entry, which can usually be found in /var/log/messages; if syslog is configured, the log may be in /var/log/syslog instead. Here is a sample log from Ubuntu:

dev@dev:~$ grep oom /var/log/syslog
Jan 23 21:30:29 dev kernel: [490.006836] eat_memory invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
Jan 23 21:30:29 dev kernel: [490.006871] [<ffffffff81191442>] oom_kill_process+0x202/0x3c0

cgroup's OOM killer

In addition to the system-wide OOM killer, if memory cgroups are configured, a process is also constrained by the memory cgroup it belongs to. When the cgroup's limit is exceeded, the cgroup's OOM killer is triggered, and its behavior differs slightly from that of the system OOM killer; see Linux Cgroup Series (04): Restrict cgroup memory usage.

malloc

malloc is a libc function that C/C++ programmers should know well. Under the hood it calls the kernel's [sbrk](http://man7.org/linux/man-pages/man2/brk.2.html) and mmap; to avoid calling into the kernel too often and to improve performance, it implements its own memory management on top of these kernel functions.

Since the OOM killer kills a process for us when memory runs out, can malloc still return NULL to the application at that point? The answer is no, because there are only two possible outcomes:

  1. The process requesting memory is killed: it is gone, so returning anything would be meaningless
  2. Another process is killed: its memory is freed, so the kernel can satisfy the current process's request

So when does malloc return NULL? According to the malloc manual page, NULL is returned in the following two cases:

  • The virtual address space in use exceeds the RLIMIT_AS limit
  • The data space in use exceeds the RLIMIT_DATA limit; data space here includes the program's data segment, BSS segment and heap

For an introduction to virtual address space and the heap, see Memory Usage of Linux Process. Both limits default to unlimited, so as long as they are left unchanged they will never be hit. One extreme case is worth noting: a bug in the code may make it request more than the system's entire virtual address space (a 32-bit system, for example, has only 4 GB). In that case it is not certain how the system will report the error.

rlimit

Both RLIMIT_AS and RLIMIT_DATA mentioned above can be read and set with the functions getrlimit and setrlimit, and Linux also provides the prlimit program for reading and setting rlimit values.

prlimit can be used in place of ulimit. Besides the two limits above, it can also set other parameters, such as the maximum core file size. For usage, see its Help File.

#By default, RLIMIT_AS and RLIMIT_DATA are unlimited
dev@dev:~$ prlimit | egrep "DATA|AS"
AS address space limit unlimited unlimited bytes
DATA max data size unlimited unlimited bytes

Test code

C programs are affected by libc, and a segmentation fault may occur before the OOM killer is triggered. If you want to test the OOM killer with a C program, keep in mind that malloc's behavior depends on MMAP_THRESHOLD (128 kB by default in glibc): for larger requests, malloc maps memory with mmap instead, which does not necessarily trigger the OOM killer. The exact details are not yet clear to me. Here is an example that triggers the OOM killer, for reference:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define M (1024 * 1024)
#define K 1024

int main(void)
{
    char *p;
    size_t size = 0;   /* size_t avoids int overflow beyond 2 GB */

    while (1) {
        /* 1 KB blocks stay below MMAP_THRESHOLD; they are never freed on purpose */
        p = malloc(K);
        if (p == NULL) {
            printf("memory allocate failed!\n");
            return -1;
        }
        /* Touch the memory so the kernel has to back it with real pages */
        memset(p, 0, K);
        size += K;
        if (size % (100 * M) == 0) {
            printf("%zu00M memory allocated\n", size / (100 * M));
            sleep(1);
        }
    }

    return 0;
}

Conclusion

A process's memory use is constrained by many factors: it may hit its rlimit or memory cgroup limit before the system runs out of memory, and it may also be affected by the memory management libraries of the programming language it uses. Even when the system is short of memory, a new allocation does not necessarily trigger the OOM killer; each problem needs to be analyzed on its own terms.

References