V8's garbage collection mechanism

Posted May 26, 20207 min read

Garbage collection strategy

Generally, garbage data collection is divided into two strategies:manual collection and automatic collection.

  • Manual recycling-when memory is allocated and when memory is destroyed is controlled by code. If the data is no longer needed, but is not actively destroyed, then this situation is called a memory leak.
  • Automatic collection-The generated garbage data is released by the garbage collector, and does not need to be manually released by code. Languages such as JavaScript, Java, Python, etc.

data storage

We know that the original data type in JavaScript is stored in the stack space, and the data of the reference type is stored in the heap space. Through this allocation method, we solved the problem of data memory allocation.
However, after some data is used, it may not be needed anymore. We call this data junk data. If these garbage data are kept in memory, the memory will be used more and more, so we need to recycle these garbage data to free up limited memory space. Next, we will talk about the recycling of stack data and heap data, respectively.

Call stack data recovery

Let's take the following code as an example

function outterFn() {
    var a = 'out'
    var b = {value:"out"}
    function innerFn() {
      var c = "in"
      var d = {value:"in"}
    }
    innerFn()
}
foo()

When the program runs to innerFn(), the snapshot of the call stack and heap space is as shown below.
It can be seen from the figure that the data of the original type is allocated to the stack, and the data of the reference type is allocated to the heap. When the execution of the outterFn function ends, the execution context of the outterFn function will be destroyed from the heap, so how is it destroyed? Let's analyze it below.

image.png

In the previous article, we introduced that if the function is executed, the JavaScript engine creates the execution context of the function and pushes the execution context of the function into the call stack, as shown in the above call stack. At the same time, there is a pointer(called ESP) that records the current execution state, pointing to the current function innerFn in the call stack. After the execution of the innerFn function is completed, the function execution flow enters the next function outterFn, then the execution context of the innerFn function needs to be destroyed. ESP will help you at this time. JavaScript will move ESP down to the execution context of the outerFn function. This down operation is the process of destroying the execution context of the innerFn function.

Heap data recycling

Through the above analysis, we know that after the execution of the outterFn function of the above code ends, ESP should point to the global execution context. In this case, the execution context of the outterFn function and innerFn function is in an invalid state, but it is saved in the heap The two objects in still occupy space.

To collect garbage data in the heap, you need to use the garbage collector in JavaScript. So, next we will analyze how the garbage data in the next heap is recycled.

Intergenerational Hypothesis

However, before officially introducing recycling, we need to learn the content of The Generational Hypothesis, which is an important term in the field of garbage collection. Subsequent garbage collection strategies are based on this hypothesis. So it is very important.
The intergenerational hypothesis has the following two characteristics:

  • The first is that most objects exist in memory for a short time. In short, many objects become inaccessible soon after they are allocated memory;
  • The second is an undead object, which will live longer.

With the foundation of the intergenerational hypothesis, we can discuss how V8 implements garbage collection. In V8, the heap will be divided into two regions:young generation and old generation

  • In the new generation, objects with short survival time are stored, and only support 1 ~ 8M capacity. Recycling using the secondary garbage collector
  • Objects that have survived for a long time in the old generation. Compared with the new generation area, the capacity is much larger, and the main garbage collector is used for recycling

work process

In fact, no matter what type of garbage collector, they all have a common execution process.

  • The first step is to mark active and inactive objects in the space. The so-called active objects are the objects still in use, and the inactive objects are the objects that can be garbage collected.
  • The second step is to reclaim the memory occupied by inactive objects. That is, after all the marking is completed, all objects marked as recyclable in the memory are cleaned up uniformly.
  • The third step is to organize the memory. Generally speaking, after frequently recycling objects, there will be a lot of discontinuous space in memory, we call these discontinuous memory spaces memory fragmentation. After a large amount of memory fragmentation occurs in the memory, if large contiguous memory needs to be allocated, there may be insufficient memory. So the last step needs to sort these memory fragments, but this step is actually optional, because some garbage collectors will not generate memory fragments, such as the secondary garbage collector we will introduce next.

Then next, we will follow this process to analyze how the new generation garbage collector(secondary garbage collector) and the old generation garbage collector(main garbage collector) handle garbage collection.

Deputy Garbage Collector

Normally, most small objects will be allocated to the new area, so although this area is not large, garbage collection is still relatively frequent. The new generation uses the Scavenge algorithm to deal with. The so-called Scavenge algorithm divides the Cenozoic space into two areas, half of which is the target area and half is the free area.
The newly added objects will be stored in the object area. When the object area is almost full, you need to perform a garbage cleaning operation. In the garbage collection process, first mark the garbage in the object area; after the marking is completed, enter the garbage cleanup phase, the secondary garbage collector will copy these live objects to the free area, and it will also these objects Arranged in an orderly manner, so this copying process is equivalent to completing the memory sorting operation. After copying, there is no memory fragmentation in the free area. After the copying is completed, the object area and the idle area are reversed, that is, the original object area becomes the idle area, and the original idle area becomes the object area. This completes the garbage object recycling operation, and at the same time, this role flip operation also allows the two areas in the new generation to be reused indefinitely. Because of the Scavenge algorithm used in the new generation, every time a cleanup operation is performed, the surviving objects need to be copied from the object area to the free area. However, the copying operation requires time cost. If the space in the freshman area is set too large, the time for each cleanup will be too long. Therefore, in order to perform efficiency, the space in the freshman area is generally set to be relatively small. It is precisely because the space in the newborn area is not large, so it is easy to fill the entire area with surviving objects. In order to solve this problem, the JavaScript engine uses an object promotion strategy, that is, objects that are still alive after two garbage collections will be moved to the old area.

Main garbage collector

In addition to those who are promoted in the new district, some large objects will be directly allocated to the old district. Therefore, the objects in the old area have two characteristics, one is that the object occupies a large space, and the other is that the object has a long survival time. Because the objects in the old area are relatively large, if you want to use the Scavenge algorithm for garbage collection in the old area, it will take more time to copy these large objects, resulting in inefficient recycling and half the space. Therefore, the main garbage collector uses the Mark-Sweep algorithm for garbage collection.
The first is the marking process stage. The marking phase starts from a set of root elements and recursively traverses the set of root elements. During this traversal, the elements that can be reached are called active objects, and the elements that do not arrive can be judged as garbage data .

From the previous picture, we can roughly see the marking process of garbage data. After the execution of the innerFn function ends, ESP moves down and points to the execution context of the outerFn function. At this time, if you traverse the call stack, you will not find the reference 1003 The variable of the address means that the data of 1003 is garbage data. Since the piece of data 1005 is referenced by the variable b, the piece of data will be marked as the active object. This is the general marking process.

The next step is the garbage removal process. It is completely different from the garbage removal process of the secondary garbage collector. You can roughly understand the removal process by referring to the following figure:the marking process and the cleaning process are the marking-clearing algorithm, but after the marking-clearing algorithm is executed multiple times on a block of memory, Large amounts of discontinuous memory fragmentation. And too much fragmentation will cause large objects to be unable to allocate enough continuous memory, so another algorithm-Mark-Compact(Mark-Compact), this marking process is still the same as the Mark-Clear algorithm, But the next step is not to directly clean up the recyclable objects, but to move all the surviving objects to one end, and then directly clean up the memory beyond the end boundary.
image.png

Full pause

Now we know that V8 uses the secondary garbage collector and the main garbage collector to handle garbage collection. However, because JavaScript runs on the main thread, once the garbage collection algorithm is executed, it is necessary to pause the JavaScript script that is being executed. Script execution resumes after garbage collection is completed. We call this behavior Stop-The-World. For example, the data in the heap is 1.5GB, and it takes more than 1 second for V8 to implement a complete garbage collection. This is also the time when the JavaScript thread is suspended due to garbage collection. If such time is spent, then the performance and response of the application Ability will plummet.
In the garbage collection of the new generation of V8, due to its small space and fewer surviving objects, the effect of full pause is not large, but the old generation is different. If the main thread takes too long during the garbage collection process, the main thread cannot do other things. For example, the page is executing a JavaScript animation, because the garbage collector is working, it will cause this animation to not be executed during this period, which will cause the page to freeze. In order to reduce the stall caused by garbage collection in the old generation, V8 divides the marking process into sub-marking processes, and at the same time allows garbage collection marking and JavaScript application logic to alternate until the marking phase is completed. We call this algorithm incremental. Incremental marking(Incremental Marking) algorithm. As shown below:

image.png

Using the incremental marking algorithm, a complete garbage collection task can be split into many small tasks. The execution time of these small tasks is relatively short, and they can be interspersed among other JavaScript tasks to be executed, so that when the above animation effect is executed, It will not let users feel that the page is stuck because of the garbage collection task.