Is the JVM difficult? I don't think so. No bragging: this mind map is enough to learn it

Posted May 28, 2020 (15 min read)

In keeping with my usual approach to learning and to technical articles, let's start with a mind map.

Technology is not that mysterious. Once you dig deeper, it comes down to a handful of things; what matters is whether you really put in the effort while learning. Personally, I think that is the difference between learning and dabbling.

Now let's look at the technical details.


One, JVM memory model and garbage collection algorithm

1. The HotSpot JVM divides memory into the following generations (the Java Virtual Machine Specification itself does not mandate this layout):

  • New (young generation)
  • Tenured (old generation)
  • Permanent generation (Perm)

New and Tenured belong to the heap, whose size is set by JVM startup parameters (for example, -Xmx3G). Perm is not part of the heap and is allocated directly by the virtual machine, but its size can be adjusted with parameters such as -XX:PermSize and -XX:MaxPermSize.

  • Young generation (New): stores the Java objects that the JVM has just allocated.
  • Old generation (Tenured): objects that survive garbage collection in the young generation are copied to the old generation.
  • Permanent generation (Perm): stores class and method metadata. Its size depends on the size of the project and on the number of classes and methods. A typical setting is 128M; the guideline is to reserve about 30% of free space.

New is divided into several parts:

  • Eden: stores objects that the JVM has just allocated.
  • Survivor1
  • Survivor2: the two Survivor spaces are the same size. Objects in Eden that survive a garbage collection are copied back and forth between the two Survivor spaces; once a condition is met, such as a number of copies, they are copied to Tenured. In effect, the Survivor spaces extend an object's stay in the young generation and increase the chance that it is collected there.
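The regions described above can be inspected from inside a running JVM. The following is a minimal sketch using the standard java.lang.management API; the class name MemoryPoolReport is made up for illustration, and the pool names that are printed depend on the JVM version and the collector in use (many collectors report a single Survivor pool rather than two).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the memory pools the running JVM exposes.
final class MemoryPoolReport {

    /** Returns one line per memory pool: kind, pool name, used bytes, max bytes. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            // getMax() returns -1 when no maximum is defined for the pool.
            lines.add(String.format("%-8s %-32s used=%d max=%d",
                    kind, pool.getName(), usage.getUsed(), usage.getMax()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

Running it with different collector flags is a quick way to see how the young and old generation pools are named on a given JVM.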

2. Garbage collection algorithm

Garbage collection algorithms can be divided into three categories, all of them based on the mark-sweep (copy) algorithm:

  • Serial algorithm (single thread)
  • Parallel algorithm
  • Concurrent algorithm

The JVM selects a suitable collection algorithm for each generation according to the machine's hardware configuration. For example, if the machine has more than one core, the parallel algorithm is selected for the young generation. For details of the selection, refer to the JVM tuning documentation.

A brief explanation: the parallel algorithm collects garbage with multiple threads, and the program is paused during collection. The concurrent algorithm also collects with multiple threads, but the application keeps running for most of the collection. The concurrent algorithm is therefore suited to highly interactive programs. From observation, the concurrent algorithm shrinks the young generation and in effect uses a large old generation, which gives it lower throughput than the parallel algorithm.

Another question is: when is garbage collection performed?

  • When the young generation is full, an ordinary GC is triggered, which collects only the young generation. Note that "full" here means that Eden is full; a full Survivor space does not trigger a GC
  • A Full GC is triggered when the old generation is full; it collects the young and old generations together
  • A Full GC is also triggered when the permanent generation is full, which leads to the unloading of class and method metadata
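How often each kind of collection has actually run can be read from the JVM itself. This is a small sketch using GarbageCollectorMXBean from the standard library; GcCounters is an illustrative name, and the collector names returned (one for the young generation, one for the old generation on most collectors) vary with the JVM and its flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads collection counts and accumulated times.
final class GcCounters {

    /** Maps each collector name to {collection count, accumulated time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are -1 if the collector does not report them.
            result.put(gc.getName(),
                    new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return result;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
                System.out.println(name + ": count=" + v[0] + " timeMs=" + v[1]));
    }
}
```

Taking two snapshots a few minutes apart and subtracting them gives the GC frequency and average pause over that interval.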

Another question is when an OutOfMemoryError is thrown. It is not thrown only when memory is completely exhausted; it is thrown when both of the following hold:

  • 98% of the JVM's time is spent in garbage collection
  • Less than 2% of the memory is recovered by each collection

When both conditions are met, an OutOfMemoryError is thrown. This leaves the system a small window to do some work before it goes down, such as manually writing a heap dump.


Two, memory leaks and solutions

  1. Some symptoms before the system crashes:
  • Each garbage collection takes longer and longer: ordinary GC goes from about 10ms to about 50ms, and Full GC from about 0.5s to 4 or 5s
  • Full GC happens more and more often, at worst less than 1 minute apart
  • The old generation keeps growing, and no old-generation memory is released after each Full GC

After that, the system can no longer respond to new requests and gradually approaches the point of OutOfMemoryError.

  2. Generate heap dump files

The current heap information was generated through a JMX MBean as an hprof file of about 3G (the size of the entire heap). If JMX is not enabled, the file can be generated with the JDK's jmap command.
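As an illustration of the MBean route, here is a minimal sketch that triggers a dump from inside the same JVM through HotSpotDiagnosticMXBean. It assumes a HotSpot-based JDK (the bean lives in com.sun.management); the class HeapDumper and the temporary file location are made up for the example, and the article does not say which MBean client was actually used.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes an hprof heap dump of the current JVM.
final class HeapDumper {

    /** Writes an hprof dump to target; liveOnly=true dumps only reachable objects. */
    static void dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The target file must not exist yet; recent JDKs also require the .hprof suffix.
        bean.dumpHeap(target.toString(), liveOnly);
    }

    /** Demo: dumps to a fresh temp directory and returns the dump size in bytes. */
    static long dumpToTemp() {
        try {
            Path file = Files.createTempDirectory("heapdump").resolve("heap.hprof");
            dump(file, true);
            long size = Files.size(file);
            Files.delete(file); // the demo does not keep the dump
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("dump size in bytes: " + dumpToTemp());
    }
}
```

The same operation is exposed remotely under the MBean name com.sun.management:type=HotSpotDiagnostic, which is what JMX consoles call.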

  3. Analyze the dump file

The next thing to consider is how to open this 3G heap file. An ordinary Windows machine does not have enough memory, so a high-spec Linux machine is needed. X Window can then be used to display the Linux graphical interface on Windows. We considered the following tools for opening the file:

  1. Visual VM
  2. IBM HeapAnalyzer
  3. The Hprof tool that comes with JDK

To keep loading fast with these tools, it is recommended to give them a maximum heap of 6G. In use, we found that none of these tools shows memory leaks intuitively. Visual VM can show object sizes but not the call stack; HeapAnalyzer can show the call stack but cannot open a 3G file correctly. We therefore chose Eclipse's dedicated static memory analysis tool, MAT (Memory Analyzer Tool).

  4. Analysis of memory leaks

With MAT we can clearly see which objects are suspected of leaking, which objects occupy the most space, and how objects reference each other. In this case, there were many JbpmContext instances held in ThreadLocal. Investigation showed that the JBPM context had not been closed.
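The general shape of the fix for this kind of leak can be sketched as follows. This is not the application's actual code: Context is a hypothetical stand-in for a per-thread resource such as JbpmContext, and ContextCleanup is an illustrative name. The point is that on pooled threads both the resource and the ThreadLocal entry have to be released in a finally block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of closing a per-thread context reliably; all names are hypothetical.
final class ContextCleanup {

    /** Stand-in for a per-thread context such as JbpmContext. */
    static final class Context implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        Context() { OPEN.incrementAndGet(); }

        @Override
        public void close() { OPEN.decrementAndGet(); }
    }

    private static final ThreadLocal<Context> CURRENT = new ThreadLocal<>();

    /** Runs work with a context bound to the current thread, then always cleans up. */
    static void handleRequest(Runnable work) {
        Context ctx = new Context();
        CURRENT.set(ctx);
        try {
            work.run();
        } finally {
            ctx.close();      // release the context's own resources
            CURRENT.remove(); // drop the reference kept by the (pooled) thread
        }
    }

    /** True if the current thread still holds a context. */
    static boolean hasContext() {
        return CURRENT.get() != null;
    }
}
```

Without the finally block, an exception in work.run() would leave one more context reachable from the thread for as long as the thread lives, which matches the pattern MAT reported.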

In addition, thread status can be analyzed through MAT or JMX to see which object a thread is blocked on, and thus locate the system's bottleneck.
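The JMX side of this can be sketched with ThreadMXBean, which is the bean JMX consoles read thread states from. BlockedThreads and its demo method are illustrative names; the demo deliberately blocks a helper thread on a monitor so that there is something to report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports which threads are blocked and on which lock.
final class BlockedThreads {

    /** Returns "thread -> lock (held by owner)" for every BLOCKED thread. */
    static List<String> find() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> blocked = new ArrayList<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked.add(info.getThreadName() + " -> " + info.getLockName()
                        + " (held by " + info.getLockOwnerName() + ")");
            }
        }
        return blocked;
    }

    /** Demo: blocks a helper thread on a monitor and returns what find() sees. */
    static List<String> demo() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) { /* acquired once the caller releases the lock */ }
        }, "demo-waiter");
        List<String> seen;
        synchronized (lock) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield(); // wait until the helper is really blocked
            }
            seen = find();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

If many request threads show the same lock name, that lock is a likely bottleneck, which is how the connection pool problem described later in this article was spotted.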

  5. Back to the original questions

Q: Why does garbage collection take longer and longer before the crash?

A: According to the memory model and the garbage collection algorithm, garbage collection has two parts: marking memory and cleaning (copying). The marking time stays roughly constant as long as the memory size is fixed, but the copying time varies: because some memory cannot be reclaimed in each collection, the amount to copy grows, and so does the time. Garbage collection time can therefore also be used as an indicator of a memory leak.

Q: Why are there more and more Full GCs?

A: The accumulated memory gradually exhausts the old generation, leaving no room for new object allocation, which results in frequent garbage collection.

Q: Why does the old generation occupy more and more memory?

A: Because objects in the young generation cannot be reclaimed, more and more of them are copied to the old generation.


Three, performance tuning

Besides the memory leak above, we also found that CPU usage stayed below 3% for long periods and that system throughput was insufficient. For an 8-core, 16G, 64-bit Linux server, this is a serious waste of resources.

While the CPU was underloaded, users occasionally reported that requests took too long, so we realized that the program and the JVM had to be tuned, in the following areas:

  • Thread pool: solve the problem of long user response times
  • Connection pool
  • JVM startup parameters: adjust the memory ratio of each generation and the garbage collection algorithm to improve throughput
  • Program algorithms: improve the program logic to improve performance

1. Java thread pool (java.util.concurrent.ThreadPoolExecutor)

Most applications on Java 6 use the thread pool that ships with the JDK. The reason for explaining such a mature component is that its behavior is slightly different from what we expected. The Java thread pool has several important configuration parameters:

  • corePoolSize: the number of core threads (the minimum number of threads)
  • maximumPoolSize: the maximum number of threads; tasks beyond this number are rejected, and the handling can be customized through the RejectedExecutionHandler interface
  • keepAliveTime: how long an idle thread is kept alive
  • workQueue: the work queue, which stores tasks waiting to be executed

The Java thread pool takes a queue parameter (workQueue) to store the tasks to be executed, and the pool behaves completely differently depending on which queue is chosen:

  • SynchronousQueue: a handoff queue with no capacity; an insert by one thread must wait for a remove by another thread. With this queue, the thread pool allocates a new thread for each task whenever no thread is idle, up to maximumPoolSize
  • LinkedBlockingQueue: an unbounded queue (when no capacity is given). With this queue, the thread pool ignores maximumPoolSize and uses only corePoolSize threads to process all tasks; unprocessed tasks wait in the LinkedBlockingQueue
  • ArrayBlockingQueue: a bounded queue. With a bounded queue and maximumPoolSize together, the program is hard to tune: a large queue with a small maximumPoolSize leads to low CPU load, while with a small queue and a large pool the queue hardly plays its role.

In fact, our requirements are very simple. We want the thread pool to have a minimum and a maximum number of threads, like a connection pool: when minimum < tasks < maximum, new threads should be allocated to process the tasks; when tasks > maximum, the task should wait for an idle thread.

However, the thread pool is designed so that, once corePoolSize threads are running, a task is first placed in the queue; only when the queue cannot accept it does the pool consider starting a new thread, and if the queue is full and no new thread can be started, the task is rejected. In short: queue first and wait for execution, start a thread only when the queue is full, and reject rather than wait. Therefore, depending on the queue, blindly increasing maximumPoolSize does not necessarily improve throughput.
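The three queue behaviors can be observed directly. The sketch below (QueueBehaviour is an illustrative name, and the sizes 2, 4 and 10 are arbitrary) submits ten tasks that all block until released, to a pool with corePoolSize=2 and maximumPoolSize=4, and then reports how many threads were created, how many tasks are queued and how many were rejected.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Demonstrates how the work queue changes ThreadPoolExecutor behavior.
final class QueueBehaviour {

    /** Returns {threads created, tasks queued, tasks rejected} for core=2, max=4. */
    static int[] run(BlockingQueue<Runnable> queue, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 4, 60, TimeUnit.SECONDS, queue);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // keep the worker busy until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // default policy: reject, do not wait
            }
        }
        int[] result = { pool.getPoolSize(), pool.getQueue().size(), rejected };
        release.countDown();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        // Unbounded queue: never grows past corePoolSize.
        System.out.println(Arrays.toString(run(new LinkedBlockingQueue<>(), 10)));   // [2, 8, 0]
        // No capacity: grows to maximumPoolSize, then rejects.
        System.out.println(Arrays.toString(run(new SynchronousQueue<>(), 10)));      // [4, 0, 6]
        // Bounded queue: fills the queue first, then grows, then rejects.
        System.out.println(Arrays.toString(run(new ArrayBlockingQueue<>(3), 10)));   // [4, 3, 3]
    }
}
```

In none of the three cases does a submitted task wait for an idle thread once the pool is at its maximum, which is the gap the wrapper described next is meant to close.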

To achieve our goal, we have to wrap the thread pool. Fortunately, ThreadPoolExecutor leaves enough extension points for this. Our wrapper works as follows:

  • Use SynchronousQueue as the work queue so that maximumPoolSize takes effect; this prevents threads from being allocated without limit, and throughput can be raised by increasing maximumPoolSize
  • Provide a custom RejectedExecutionHandler to handle the case where the number of threads exceeds maximumPoolSize. It checks at intervals whether the thread pool can accept a new task and, if so, puts the rejected task back into the pool; the check interval depends on keepAliveTime.
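One possible shape of such a wrapper is sketched below. It is not the original implementation, which the article does not show: WaitingPool and RetryHandler are illustrative names, and using the keep-alive time as the retry interval is an assumption based on the description above. The handler relies on the fact that a timed offer to a SynchronousQueue succeeds only when an idle worker is waiting to take a task.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a pool where submitters wait for an idle thread instead of being rejected.
final class WaitingPool {

    /** Retries a rejected task at a fixed interval until an idle worker takes it. */
    static final class RetryHandler implements RejectedExecutionHandler {
        private final long retryMillis;

        RetryHandler(long retryMillis) { this.retryMillis = retryMillis; }

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            try {
                while (!pool.isShutdown()) {
                    // Succeeds only when a worker is idle and polling the queue.
                    if (pool.getQueue().offer(task, retryMillis, TimeUnit.MILLISECONDS)) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            throw new RejectedExecutionException("pool shut down or caller interrupted");
        }
    }

    static ThreadPoolExecutor create(int min, int max, long keepAliveMillis) {
        return new ThreadPoolExecutor(min, max, keepAliveMillis, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new RetryHandler(keepAliveMillis));
    }

    /** Demo: 6 short tasks on a pool capped at 2 threads; returns {completed, largest pool size}. */
    static int[] demo() {
        ThreadPoolExecutor pool = create(1, 2, 100);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 6; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { done.get(), pool.getLargestPoolSize() };
    }
}
```

In the demo all six tasks complete and the pool never grows beyond two threads. The trade-off is that the submitting thread blocks while it waits, so this suits callers that can tolerate back-pressure.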

2. Connection pool (org.apache.commons.dbcp.BasicDataSource)

We currently use org.apache.commons.dbcp.BasicDataSource. Because the default configuration was used at first, under heavy traffic JMX showed many Tomcat threads blocked on the lock of the Apache ObjectPool used by BasicDataSource. The direct cause was that the maximum number of connections in the BasicDataSource pool was too small: the default configuration allows at most 8 connections.

We also observed another problem. When the system is not accessed for a long time, such as 2 days, MySQL on the database server closes all the connections, leaving the connections in the pool unusable. To solve these problems, we studied BasicDataSource thoroughly and found some optimization points:

  • MySQL supports 100 connections by default, so each connection pool should be sized according to the number of machines in the cluster. If there are 2 servers, each can be set to 60
  • initialSize: the number of connections opened when the pool starts
  • minEvictableIdleTimeMillis: how long a connection may sit idle before it becomes eligible to be closed
  • timeBetweenEvictionRunsMillis: the run interval of the background thread that detects expired connections
  • maxActive: the maximum number of connections that can be allocated
  • maxIdle: the maximum number of idle connections. If, when a connection is returned after use, the number of connections is greater than maxIdle, the connection is closed directly. Only connections with initialSize < x < maxIdle are periodically checked for expiration. This parameter mainly serves to increase throughput during peak access.
  • How is initialSize maintained? Reading the code shows that BasicDataSource closes all expired connections and then opens initialSize connections again. Together with minEvictableIdleTimeMillis and timeBetweenEvictionRunsMillis, this ensures that all expired initialSize connections are reconnected, which avoids the problem of MySQL dropping connections after long inactivity.
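As a rough illustration of how these settings fit together, the sketch below collects them in a java.util.Properties object. The keys are the Commons DBCP 1.x property names listed above, which can be handed to BasicDataSourceFactory.createDataSource(Properties) or applied through the matching setters; the class PoolSettings and all numeric values are assumptions for the example, not the values used in the original system.

```java
import java.util.Properties;

// Hypothetical helper: builds DBCP 1.x style pool settings for one node.
final class PoolSettings {

    /** All values are illustrative, not recommendations. */
    static Properties forNode(int maxActive, int initialSize) {
        Properties p = new Properties();
        p.setProperty("initialSize", String.valueOf(initialSize));
        p.setProperty("maxActive", String.valueOf(maxActive));
        // Allow every connection to stay idle in the pool during peaks.
        p.setProperty("maxIdle", String.valueOf(maxActive));
        // A connection idle for 30 minutes becomes eligible for eviction.
        p.setProperty("minEvictableIdleTimeMillis", String.valueOf(30 * 60 * 1000L));
        // The evictor thread runs once a minute; it does not run unless this is positive.
        p.setProperty("timeBetweenEvictionRunsMillis", String.valueOf(60 * 1000L));
        return p;
    }

    public static void main(String[] args) {
        forNode(60, 10).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Keeping the eviction interval well below the database's own idle timeout is what prevents the pool from handing out connections that the server has already closed.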

3. JVM parameters

The JVM startup parameters include settings for memory and garbage collection. With no settings at all, the JVM works well by default, but well-equipped servers and specific applications must be tuned carefully to get the best performance. With these settings we hope to achieve several goals:

  • GC time is short enough
  • The number of GCs is low enough
  • The interval between Full GCs is long enough

The first two goals conflict. For GC time to be short, a smaller heap is needed; for the number of GCs to be low, a larger heap is needed. We can only strike a balance.

(1) For the JVM heap, the minimum and maximum sizes can generally be limited with -Xms and -Xmx. To prevent the garbage collector from resizing the heap between the minimum and the maximum, which costs extra time, we usually set the maximum and minimum to the same value.

(2) The young and old generations share the heap according to a default ratio (1:2). The ratio can be adjusted with -XX:NewRatio, or a generation can be given an absolute size; for the young generation this is done with -XX:NewSize and -XX:MaxNewSize. Likewise, to prevent the young generation from being resized, we usually set -XX:NewSize and -XX:MaxNewSize to the same value.
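The arithmetic behind the ratio is simple: -XX:NewRatio=N means old:young = N:1, so the young generation gets 1/(N+1) of the heap. The sketch below works this out; GenerationSizing is an illustrative name, and the result is approximate because the JVM rounds generation sizes to its own alignment.

```java
// Approximate generation sizes implied by -XX:NewRatio.
final class GenerationSizing {

    /** Young generation size when old:young = newRatio:1. */
    static long youngBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1);
    }

    /** Old generation size: whatever the young generation does not take. */
    static long oldBytes(long heapBytes, int newRatio) {
        return heapBytes - youngBytes(heapBytes, newRatio);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long heap = 3 * gib;
        // Default ratio 1:2 on a 3G heap: 1G young, 2G old.
        System.out.println(youngBytes(heap, 2) / gib + "G young, " + oldBytes(heap, 2) / gib + "G old");
    }
}
```

With a 3G heap and the default ratio, the young generation comes to 1G, which is the same size that -XX:NewSize=1G -XX:MaxNewSize=1G would pin explicitly.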

(3) How large should the young and old generations be? There is no single answer to this question; otherwise there would be no tuning. Let's look at the effect of changing the size of each:

  • A larger young generation inevitably means a smaller old generation. A larger young generation lengthens the interval between ordinary GCs but increases the time of each GC; a smaller old generation leads to more frequent Full GCs
  • A smaller young generation inevitably means a larger old generation. A smaller young generation makes ordinary GCs frequent, but each GC is shorter; a larger old generation reduces the frequency of Full GCs
  • The choice should depend on the distribution of object lifetimes in the application: if the application has many temporary objects, choose a larger young generation; if it has relatively many long-lived objects, enlarge the old generation accordingly. However, many applications have no such obvious characteristics, and the decision should then rest on two points: (A) following the principle of as few Full GCs as possible, let the old generation hold as many long-lived objects as it can, using the JVM's default ratio of 1:2; (B) observe the application for a while to see how much memory the old generation occupies at peak, and, as long as Full GC is not affected, enlarge the young generation as the situation allows, for example to a ratio of 1:1. At least 1/3 of the old generation should still be reserved for growth

(4) On better-equipped machines (multi-core, large memory), a parallel collection algorithm can be chosen for the old generation: -XX:+UseParallelOldGC. The default is serial collection.

(5) Thread stack size: each thread gets a 1M stack by default, used for stack frames, call parameters, local variables, and so on. For most applications this default is too large; 256K is generally enough (set with -Xss). In theory, with memory unchanged, a smaller stack per thread allows more threads, but in practice this is also limited by the operating system.

(6) GC and heap dump information can be output with the following parameters:

  • -XX:HeapDumpPath
  • -XX:+PrintGCDetails
  • -XX:+PrintGCTimeStamps
  • -Xloggc:/usr/aaa/dump/heap_trace.txt

The following parameter makes the JVM write a heap dump when an OutOfMemoryError occurs:

  • -XX:+HeapDumpOnOutOfMemoryError

Here is the Java parameter configuration we used at one point (server: Linux 64-bit, 8 cores, 16G):

JAVA_OPTS="$JAVA_OPTS -server -Xms3G -Xmx3G -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseParallelOldGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/aaa/dump -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/usr/aaa/dump/heap_trace.txt -XX:NewSize=1G -XX:MaxNewSize=1G"
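After changing startup parameters it is worth checking that the JVM actually received them, since a wrapper script can silently drop or override options. The following sketch reads the arguments back through RuntimeMXBean; StartupFlags is an illustrative name and the prefix matching is deliberately simple.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: looks up JVM options in the list of startup arguments.
final class StartupFlags {

    /** Returns the text following a prefix such as "-Xmx", or null if absent. */
    static String valueOf(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length()); // keep the last occurrence
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-Xms         = " + valueOf(jvmArgs, "-Xms"));
        System.out.println("-Xmx         = " + valueOf(jvmArgs, "-Xmx"));
        System.out.println("-XX:NewSize  = " + valueOf(jvmArgs, "-XX:NewSize="));
        System.out.println("max heap (bytes) = " + Runtime.getRuntime().maxMemory());
    }
}
```

Run inside the tuned application (for example from a diagnostic page), it prints null for any option that did not reach the JVM.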

Observation showed this configuration to be very stable: each ordinary GC takes about 10ms, and Full GC basically never happens, or happens only once in a long while.

Analysis of the dump file showed that a Full GC occurs once every hour. Repeated verification confirmed that whenever the JMX service is enabled in the JVM, JMX performs a Full GC once an hour to clear references. For details, refer to the attached document.

4. Tuning of program algorithms: not the focus this time

================================================================================

Tuning Method

Everything so far leads up to this step: tuning. Before tuning, keep the following principles in mind:

  1. Most Java applications do not need GC optimization on the server;

  2. Most GC problems in Java applications are caused not by wrong parameter settings but by problems in the code;

  3. Before the application goes live, first consider setting the machine's JVM parameters to the most suitable values;

  4. Reduce the number of objects created;

  5. Reduce the use of global variables and large objects;

  6. GC optimization is the last resort;

  7. In practice, analyzing GC behavior leads to optimizing code far more often than to optimizing GC parameters.

GC optimization has two goals:

1. Minimize the number of objects promoted to the old generation;

2. Reduce the execution time of Full GC.

To achieve these goals, you generally need to:

  1. Reduce the use of global variables and large objects;

  2. Adjust the young generation to the most appropriate size;

  3. Set the old generation to the most appropriate size;

  4. Select an appropriate GC collector.

The list above uses the word "appropriate" several times. What exactly is appropriate? In general, refer to the suggestions in the "collector matching" and "startup memory allocation" sections above. These suggestions are not universal, however, and need to be adapted to your machine and application. In practice, you can give two machines different GC parameters and compare them, then choose the parameters that really improve performance or reduce GC time.

Real skill in GC tuning comes from hands-on experience of monitoring and tuning GC many times. The general steps for monitoring and tuning are:

1. Monitor the GC status

Use the various JVM tools to view the current logs, analyze the current JVM parameter settings, the current heap snapshot, and the GC log, and judge from the actual memory division of each area and the GC execution time whether optimization is needed;

2. Analyze the results and determine whether optimization is required

If the parameter settings are reasonable, the system has no timeout logs, the GC frequency is not high, and the GC time is not long, then GC optimization is unnecessary; if a GC takes more than 1-3 seconds, or GC is frequent, optimization is required;

Note: if the following indicators are met, GC optimization is generally not required:

Minor GC execution time is less than 50ms;

Minor GC is not executed frequently, about once every 10 seconds;

Full GC execution time is less than 1s;

Full GC is not executed frequently, no more often than once every 10 minutes;
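These four rules of thumb are easy to turn into an automated check. The sketch below encodes them literally; GcHealth and its parameter names are illustrative, and the thresholds are the article's guideline values rather than hard limits.

```java
// Encodes the four rule-of-thumb GC indicators as a single check.
final class GcHealth {

    /**
     * Returns true if any indicator is violated.
     *
     * @param minorGcMillis        average Minor GC duration in milliseconds
     * @param minorIntervalSeconds average time between Minor GCs in seconds
     * @param fullGcSeconds        average Full GC duration in seconds
     * @param fullIntervalMinutes  average time between Full GCs in minutes
     */
    static boolean needsTuning(double minorGcMillis, double minorIntervalSeconds,
                               double fullGcSeconds, double fullIntervalMinutes) {
        boolean minorOk = minorGcMillis < 50 && minorIntervalSeconds >= 10;
        boolean fullOk = fullGcSeconds < 1 && fullIntervalMinutes >= 10;
        return !(minorOk && fullOk);
    }

    public static void main(String[] args) {
        System.out.println(needsTuning(10, 12, 0.5, 60));  // false: all indicators met
        System.out.println(needsTuning(38, 15, 1.4, 30));  // true: Full GC takes over 1s
    }
}
```

The inputs can be derived from the GC log or from two jstat samples taken some time apart.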

3. Adjust the GC type and memory allocation

If the memory allocation is too large or too small, or the GC collector in use is slow, adjust these parameters first. Start by trying the change on one or a few machines, then compare the performance of the optimized machines with that of the unoptimized ones and make a targeted final choice;

4. Keep analyzing and adjusting

Through continued trial and error, analyze and find the most suitable parameters;

5. Apply the parameters everywhere

Once the most suitable parameters are found, apply them to all servers and follow up.

Tuning example

The above is all theory. Let's illustrate with some real examples:

Example 1:

Yesterday the author found that some development and test machines were throwing the exception java.lang.OutOfMemoryError: GC overhead limit exceeded. This exception means:

GC takes too much time to free very little space. There are generally two causes: 1. the heap is too small; 2. there are infinite loops or large objects.

The author first ruled out the second cause, because the same application is also running in production; if it had such a problem, it would have gone down long ago. So the suspicion was that the heap on this machine was set too small.

Running ps -ef | grep "java" showed:

[Figure: output of the ps command; image not available]

The heap of this application was only 768m, while the machine has 2g of memory, runs only this one Java application, and has nothing else that needs memory. In addition, the application is fairly large and needs a lot of memory.

From this, the author concluded that only the sizes of the heap areas needed to change, and changed them as follows:

[Figure: adjusted heap settings; image not available]

Follow-up monitoring showed that the exception did not reappear.

Example 2:

A service system frequently froze. Analysis showed that the Full GC time was too long:

jstat -gcutil:

   S0     S1     E      O      P    YGC    YGCT   FGC    FGCT     GCT
 12.16   0.00   5.18  63.78  20.32   54   2.047     5   6.946   8.993

Analysis of this data shows that Young GC ran 54 times and took 2.047 seconds in total, about 38ms per Young GC, which is in the normal range. Full GC ran 5 times and took 6.946 seconds in total, an average of 1.389s each. The data shows that the problem is the long Full GC time. Inspecting the system's parameters showed NewRatio=9, meaning the ratio of the young generation to the old generation is 1:9. This is the cause of the problem:

  1. The young generation is too small, so objects are promoted to the old generation prematurely, triggering Full GC in the old generation;

  2. The old generation is large, so each Full GC takes longer.

The fix was to change NewRatio to 4. After that, Full GC no longer occurred and only Young GC ran; objects are cleaned up in the young generation and never enter the old generation. (This approach works well for some applications, but not for all.)
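The averages quoted in Example 2 can be reproduced from the jstat row with a few lines of code. The sketch below assumes the ten-column -gcutil layout shown in this article (S0 S1 E O P YGC YGCT FGC FGCT GCT); newer JDKs print different columns, so the indexes would need adjusting there. JstatAverages is an illustrative name.

```java
// Computes average Young GC and Full GC durations from one jstat -gcutil row.
final class JstatAverages {

    /** Returns {average Young GC seconds, average Full GC seconds}. */
    static double[] averages(String row) {
        String[] f = row.trim().split("\\s+");
        if (f.length != 10) {
            throw new IllegalArgumentException("expected 10 columns, got " + f.length);
        }
        long ygc = Long.parseLong(f[5]);        // YGC: number of young collections
        double ygct = Double.parseDouble(f[6]); // YGCT: total young collection time (s)
        long fgc = Long.parseLong(f[7]);        // FGC: number of full collections
        double fgct = Double.parseDouble(f[8]); // FGCT: total full collection time (s)
        return new double[] {
            ygc == 0 ? 0 : ygct / ygc,
            fgc == 0 ? 0 : fgct / fgc
        };
    }

    public static void main(String[] args) {
        double[] avg = averages("12.16 0.00 5.18 63.78 20.32 54 2.047 5 6.946 8.993");
        System.out.printf("young: %.1f ms, full: %.3f s%n", avg[0] * 1000, avg[1]);
    }
}
```

For the row above this gives roughly 37.9 ms per Young GC and 1.389 s per Full GC, matching the analysis.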

Example 3:

During a performance test of an application, memory usage was found to be high and Full GC frequent. We dumped the memory with sudo -u admin -H jmap -dump:format=b,file=filename.hprof pid to generate a dump file, analyzed it with MAT under Eclipse, and found:

[Figure: MAT analysis result; image not available]

As the figure shows, this thread is the problem. A large number of objects referenced by the LinkedBlockingQueue were not released, so the thread as a whole held up to 378m of memory. The developers were then notified to optimize the code and release the objects.

That is my modest take on the JVM. If anything is wrong, please point it out. Thank you.
