Chanler

"Dark Horse JVM" 1. Basics

Introduction to JVM

What is JVM

JVM stands for Java Virtual Machine, which is essentially a program that runs on a computer, responsible for executing Java bytecode files.

First, use javac to compile the source code .java into .class bytecode files, and then use the java command to start the JVM to run the bytecode. The JVM will use a combination of interpretation and JIT compilation to execute the code on the computer.

Execution flow of Java program|500

JVM Functions

  1. Interpretation and Execution:
    1. Interprets the instructions in the bytecode file into machine code in real-time for execution by the computer.
  2. Memory Management:
    1. Automatically allocates memory for objects, methods, etc.
    2. Automatic garbage collection mechanism to reclaim unused objects.
  3. Just-In-Time Compilation (JIT):
    1. Optimizes hot code to improve execution efficiency.

Computers cannot recognize bytecode directly; the JVM must interpret it into machine code for execution. Compared with C, where .c source code is compiled directly into machine code, interpretation is significantly less efficient. JIT compilation was therefore added: it compiles hot bytecode instructions into optimized machine code and keeps that code in memory so later calls can use it directly, eliminating repeated interpretation.

image.png|500

Java Virtual Machine Specification

The specification defines the requirements that any JVM implementation of the current version must meet, including JVMs developed on top of existing ones: the definition of class bytecode files, the loading and initialization of classes and interfaces, the instruction set, and more.

The "Java Virtual Machine Specification" sets requirements for the design of the virtual machine, not for the Java language itself. This means a virtual machine can also run class bytecode files generated from other languages, such as Groovy and Scala.

| Name | Author | Supported Versions | Community Activity (GitHub Stars) | Features | Applicable Scenarios |
|---|---|---|---|---|---|
| HotSpot (Oracle JDK) | Oracle | All versions | High (closed source) | Most widely used, stable and reliable, active community, JIT support; default virtual machine of Oracle JDK | Default |
| HotSpot (OpenJDK) | Oracle | All versions | Medium (16.1k) | Same as above, open source; default virtual machine of OpenJDK | Default for secondary development on the JDK |
| GraalVM | Oracle | 11, 17, 19, 8 (enterprise versions) | High (18.7k) | Multi-language support, high performance, JIT and AOT support | Microservices and cloud-native architectures that mix multiple languages |
| Dragonwell JDK | Alibaba | Standard version: 8, 11, 17; extended version: 11, 17 | Low (3.9k) | Built on OpenJDK with performance enhancements, bug fixes and security improvements; supports JWarmup, ElasticHeap, Wisp | E-commerce, logistics and finance with high performance requirements |
| Eclipse OpenJ9 (originally IBM J9) | IBM | 8, 11, 17, 19, 20 | Low (3.1k) | High performance, scalable, JIT and AOT support | Microservices, cloud-native architecture |

HotSpot is the most widely used.

HotSpot Development History|500

Detailed Explanation of Bytecode Files

Composition of the Java Virtual Machine

image.png|500

ClassLoader: The core component responsible for loading the contents of bytecode files into memory.

Runtime Data Area: Manages the memory used by the JVM, where created objects, class information, and other content are stored.

Execution Engine: Includes the JIT compiler, interpreter, and garbage collector; the execution engine uses the interpreter to convert bytecode instructions into machine code, optimizes performance with the JIT compiler, and uses the garbage collector to reclaim unused objects.

Native Interface: Calls methods compiled from C/C++, which are declared in Java with the native keyword, such as Thread's public static native void sleep(long millis) throws InterruptedException;

Composition of Bytecode Files

Viewing Bytecode

The bytecode file viewer jclasslib displays the contents of a class file.

image.png|500

Composition of Bytecode Files

Components of bytecode files:

  • Basic Information: Magic number, Java version number corresponding to the bytecode file, access flags (public, final, etc.), parent class and interface information.
  • Constant Pool: Stores string constants, class or interface names, field names, mainly used in bytecode instructions.
  • Fields: Information about fields declared in the current class or interface.
  • Methods: Information about methods declared in the current class or interface, with the core content being the bytecode instructions of the methods.
  • Attributes: Attributes of the class, such as the source file name, list of inner classes, etc.

Basic Information

A file's type cannot be determined by its extension alone, since the extension can be changed arbitrarily without affecting the file's content. Software verifies the file type by checking the first few bytes of the file (the file header); if it does not support that type, it reports an error.

In Java bytecode files, the file header is called the magic number. The JVM checks whether the first four bytes of the bytecode file are 0xCAFEBABE; if they are not, the file cannot be used, and the JVM throws an error (java.lang.ClassFormatError).

The major version number is used to determine whether the current bytecode version is compatible with the JVM.

image.png|500

The major and minor version numbers correspond to the JDK version targeted when the bytecode file was compiled. The major version identifies the major release, while the minor version distinguishes minor releases.
JDK 1.0 - 1.1 used 45.0 - 45.3.
From JDK 1.2 onwards, the JDK version is the major version number minus 44; for example, major version 52 corresponds to JDK 8.

If the versions are incompatible, for example a bytecode file with major version 52 (JDK 8) running on a JVM that supports only up to 50 (JDK 6), the JVM throws java.lang.UnsupportedClassVersionError. There are two fixes:

  1. Upgrade the JDK.
  2. Lower the class file version that is required: downgrade the dependency to a version compiled for the older JDK, or replace the dependency.

Generally, option 2 is chosen to adjust dependencies, as upgrading the JDK is a significant action that may cause compatibility issues.
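The header check described above can be sketched in Java. This is a minimal illustration, not how the JVM itself does it: the class ClassHeader and its describe method are hypothetical names, and the input is assumed to be the first 8 bytes of a class file (4-byte magic number, 2-byte minor version, 2-byte major version, all big-endian).

```java
import java.nio.ByteBuffer;

// Sketch: interpret the first 8 bytes of a class file (magic + minor + major version).
final class ClassHeader {
    static final int MAGIC = 0xCAFEBABE;

    static String describe(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like the class file format
        int magic = buf.getInt();
        if (magic != MAGIC) {
            // The JVM reports this case as ClassFormatError
            throw new IllegalArgumentException(String.format("bad magic 0x%08X", magic));
        }
        int minor = Short.toUnsignedInt(buf.getShort());
        int major = Short.toUnsignedInt(buf.getShort());
        int v = major - 44; // JDK 1.2 and later: JDK version = major - 44
        String jdk = major < 46 ? "JDK 1.0/1.1" : (v <= 4 ? "JDK 1." + v : "JDK " + v);
        return jdk + " (major " + major + ", minor " + minor + ")";
    }

    public static void main(String[] args) {
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(describe(java8Header)); // JDK 8 (major 52, minor 0)
    }
}
```

To try it on a real file, read the bytes with java.nio.file.Files.readAllBytes and pass them in.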

image.png|500

Constant Pool

The constant pool avoids redundant data by storing each constant only once. A string literal is stored as a String constant that points to a UTF-8 constant holding the actual characters.

Example: for a = "abc" and abc = "abc", there is only one UTF-8 constant "abc". It is referenced by the String constant (the value of both fields) and is also used as the name of the field abc.

image.png|500
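A small check of the a = "abc"; abc = "abc" example, using a hypothetical class LiteralSharing. Both fields refer to the same String constant in the class file, and at runtime string literals are interned, so the two fields hold the very same object.

```java
// Two fields initialized with the same literal, as in the constant pool example
class LiteralSharing {
    static String a = "abc";
    static String abc = "abc"; // the field name "abc" reuses the same UTF-8 constant

    public static void main(String[] args) {
        // Same String constant, resolved to one interned instance
        System.out.println(a == abc); // true
    }
}
```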

Methods

Introduction: int i=0; i=i++; What is the final value of i?

image.png|500

The local variable table assigns indexes in order of declaration: the argument args is at index 0, and i and j are at 1 and 2, respectively. The operand stack holds the operands used by the instructions being executed.

Walkthrough of the bytecode instructions for int i=0; int j=i+1;

  1. iconst_0, pushes the constant 0 onto the operand stack, so the stack only contains 0.
    image.png|500

  2. istore_1 pops the top of the operand stack and stores it in the local variable table at index 1.
    image.png|500

  3. iload_1 pushes the number at local variable table index 1 onto the operand stack, which is 0.
    image.png|500

  4. iconst_1 pushes the constant 1 onto the operand stack.

  5. iadd pops the top two numbers on the operand stack (0 and 1), adds them, and pushes the result back, leaving only the result 1 on the operand stack.
    image.png|500

  6. istore_2 pops the top element 1 from the operand stack and stores it in the local variable table at index 2, completing the assignment of variable j.

  7. The return statement executes, ending the method and returning.
    image.png|500
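The seven steps above can be replayed with a toy interpreter. This is a simplified model, not the real JVM: the enum Op and the class MiniFrame are hypothetical, the local variable table is an int array, and the operand stack is an ArrayDeque.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one stack frame executing: int i = 0; int j = i + 1;
final class MiniFrame {
    enum Op { ICONST_0, ICONST_1, ILOAD_1, ISTORE_1, ISTORE_2, IADD, RETURN }

    static final Op[] PROGRAM = {
        Op.ICONST_0, Op.ISTORE_1, Op.ILOAD_1, Op.ICONST_1, Op.IADD, Op.ISTORE_2, Op.RETURN
    };

    // Executes the instructions and returns the local variable table at RETURN
    static int[] run(Op[] code) {
        int[] locals = new int[3];                 // slot 0 = args (unused here), 1 = i, 2 = j
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case ICONST_0: stack.push(0); break;
                case ICONST_1: stack.push(1); break;
                case ILOAD_1:  stack.push(locals[1]); break;
                case ISTORE_1: locals[1] = stack.pop(); break;
                case ISTORE_2: locals[2] = stack.pop(); break;
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case RETURN:   return locals;
            }
            System.out.println(code[pc] + " -> stack " + stack + ", i=" + locals[1] + ", j=" + locals[2]);
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = run(PROGRAM);
        System.out.println("i=" + locals[1] + ", j=" + locals[2]); // i=0, j=1
    }
}
```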

Bytecode for int i=0; i=i++;
image.png|500

Bytecode for int i=0; i=++i;
image.png|500
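The two puzzles can be checked directly. The comments follow the instruction pattern shown in the figures (load, increment the local, store); the exact slot numbers javac chooses depend on the method, so they are omitted. The class name IncrementPuzzle is hypothetical.

```java
class IncrementPuzzle {
    static int postIncrement() {
        int i = 0;
        // iload  -> pushes the old value 0 onto the operand stack
        // iinc   -> increments the local variable i to 1
        // istore -> pops 0 and writes it back into i, overwriting the 1
        i = i++;
        return i;
    }

    static int preIncrement() {
        int i = 0;
        // iinc first (i becomes 1), then iload pushes 1 and istore writes 1 back
        i = ++i;
        return i;
    }

    public static void main(String[] args) {
        System.out.println(postIncrement()); // 0
        System.out.println(preIncrement());  // 1
    }
}
```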

Generally, the more bytecode instructions, the worse the performance. How do the following three ways of adding 1 compare?

int i=0,j=0,k=0;
i++;
j = j + 1;
k += 1;

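javac typically generates the following bytecode: i++ and k += 1 on a local int both compile to a single iinc instruction, while j = j + 1 needs four instructions. For hot code, the JIT compiler turns all three into equivalent machine code, so in practice the difference rarely matters:

  • i++; (iinc)
  • j = j + 1; (iload, iconst_1, iadd, istore)
  • k += 1; (iinc)

Fields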

Fields store the information of fields declared in the current class or interface. As shown in the figure, two fields a1 and a2 are defined, and these two fields will appear in this part of the content, along with the field names, descriptors (field types), and access flags (public/private static final, etc.).

image.png|500

Attributes

Attributes mainly refer to the properties of the class, such as the source file name, list of inner classes, etc.

image.png|500

Common Bytecode Tools

javap is a disassembler that ships with the JDK and displays the contents of bytecode files in the console.

Run javap without arguments to list all options, and run javap -v xxx.class to view detailed bytecode information. For a jar package, first extract it with the jar -xvf command.

image.png|500

jclasslib also has an IDEA plugin version that can view the contents of bytecode files compiled from code.

New tool: Alibaba Arthas

Arthas is an online monitoring and diagnostic tool that shows application load, memory, GC, and thread status in real time from a global perspective. It attaches to a running JVM and can diagnose business issues without modifying application code.

image.png|500

Download the Arthas jar package as described in its documentation and run it with java -jar.

Related commands:

  • dump -d /tmp/output java.lang.String saves the bytecode file to the local machine.
  • jad --source-only demo.MathGame decompiles the class bytecode into source code for verification.

Class Lifecycle

Overview of Lifecycle

Loading, Linking (Verification, Preparation, Resolution), Initialization, Usage, Unloading

Loading Phase

  1. In the first step of the loading phase, the class loader obtains the bytecode as a binary stream, through one of several channels, based on the fully qualified name of the class. Programmers can extend these channels in Java code:

    • Retrieve files from local disk.
    • Generate dynamically at runtime, such as through the Spring framework.
    • Obtain bytecode files over the network using Applet technology.
  2. After loading the class, the JVM saves the information from the bytecode into the method area, generating an InstanceKlass object in the method area to store all class information, which also includes information for implementing specific functionalities such as polymorphism.
    image.png|500

  3. The JVM also creates a java.lang.Class object on the heap that mirrors part of the data in the method area. Java code uses this object to retrieve class information (for example, through reflection), and from JDK 8 onwards it also stores the class's static fields.
    image.png|500

Linking Phase

The linking phase consists of three sub-phases:

  1. Verification: Checks whether the content complies with the "Java Virtual Machine Specification."
  2. Preparation: Allocates memory for static variables and sets initial values.
  3. Resolution: Replaces symbolic references in the constant pool with direct references pointing to memory.

Linking Phase - Verification

The main purpose of verification is to check whether the Java bytecode file adheres to the constraints in the "Java Virtual Machine Specification." This phase generally does not require programmer involvement:

  1. File format verification: for example, whether the file starts with 0xCAFEBABE and whether the major and minor version numbers are supported by the current JVM (the JVM's version must not be lower than the class file's version).
  2. Metadata verification: for example, whether the class has a superclass (super must not be empty).
  3. Semantic verification of the instructions: for example, whether jump instructions in a method target invalid locations.
  4. Symbolic reference verification: for example, whether the code accesses private methods of other classes.

Linking Phase - Preparation

The preparation phase allocates memory for static variables and sets initial values. Each primitive type and reference type has its own initial value. Note: the initial value here is the type's default value, not the value assigned in the code.

| Data Type | Initial Value |
|---|---|
| int | 0 |
| long | 0L |
| short | 0 |
| char | '\u0000' |
| byte | 0 |
| boolean | false |
| float | 0.0f |
| double | 0.0 |
| Reference data type | null |

In the example below, memory will be allocated for value and initialized to 0 during the linking phase - preparation sub-phase, and the value will only be modified to 1 during the initialization phase.

public class Student {
	public static int value = 1;
}
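The default value assigned during preparation can be observed from inside clinit, because static initializers run in source order. In this sketch (PrepDemo is a hypothetical class), observed is initialized by calling a method that reads value before value's own initializer has run.

```java
class PrepDemo {
    // <clinit> runs these two initializers in source order
    static int observed = peek(); // runs first: value still holds its preparation-phase default
    static int value = 1;         // only now is value set to 1

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(observed + " " + value); // 0 1
    }
}
```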

An exception is a static variable declared final whose initializer is a compile-time constant: its value can never change, so it is assigned during the preparation phase.

Linking Phase - Resolution

The resolution phase replaces symbolic references in the constant pool with direct references. A symbolic reference refers to content in the constant pool by its index number in the bytecode file.

A direct reference accesses the data directly through its memory address.

Initialization Phase

The initialization phase executes the bytecode instructions of the clinit (class init) method, which assigns values to static variables and runs static blocks.

The clinit method executes these statements in the same order as they appear in the source code.

The putstatic instruction pops the value from the operand stack and stores it in the static variable's location on the heap. Its operand #2 refers to the constant pool entry for the static variable value; during the resolution phase this symbolic reference is replaced with the variable's address.

image.png|500

The following methods can trigger class initialization:

  1. Accessing a class's static variable or static method. Note that if the variable is static final and its initializer is a compile-time constant, the access does not trigger class initialization, because the value was already assigned during the preparation phase of linking.
  2. Calling Class.forName(String className). The overload Class.forName(String name, boolean initialize, ClassLoader loader) lets you control whether the class is initialized.
  3. Creating an object of that class.
  4. Executing the class's main method (the class containing main is initialized before main runs).

Adding the -XX:+TraceClassLoading parameter to the Java startup arguments prints the classes as they are loaded (on JDK 9 and later, -Xlog:class+load=info is the equivalent).
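The trigger rules above can be observed with a small program that records when a static block runs. The classes InitLog, Target and InitTriggers are hypothetical names for this sketch; it uses the three-argument Class.forName overload with initialize set to false.

```java
import java.util.ArrayList;
import java.util.List;

final class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

class Target {
    static final int CONSTANT = 42; // compile-time constant: inlined by javac, set during preparation
    static int counter = 0;         // ordinary static: assigned in <clinit>

    static {
        InitLog.EVENTS.add("Target initialized");
    }
}

class InitTriggers {
    static List<String> run() {
        int c = Target.CONSTANT;                     // constant access: no initialization
        InitLog.EVENTS.add("read CONSTANT=" + c);
        try {
            // initialize = false: the class is loaded but <clinit> does not run
            Class.forName(Target.class.getName(), false, Target.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        InitLog.EVENTS.add("forName(initialize=false) done");
        int n = Target.counter;                      // static field access: initializes Target
        InitLog.EVENTS.add("read counter=" + n);
        return InitLog.EVENTS;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // [read CONSTANT=42, forName(initialize=false) done, Target initialized, read counter=0]
    }
}
```

Note that the class literal Target.class also loads the class without initializing it.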

Example questions:

image.png|500

image.png|500

The clinit method may not be generated in certain situations, such as:

  1. There are no static blocks and no static variable assignment statements.
  2. Static variables are declared without assignment statements.
  3. Static variables are declared final with compile-time constant values; these are assigned directly during the preparation phase of linking.

Inheritance situations:

  1. Accessing a superclass's static variable directly (even through the subclass name) does not trigger the initialization of the subclass.
  2. Before a subclass is initialized, the JVM first initializes its superclass, so the superclass's clinit runs before the subclass's clinit.

Example questions:

Initializing the subclass first triggers initialization of the superclass.
image.png|500

Directly accessing the superclass static variable will not trigger subclass initialization.
image.png|500
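Both inheritance rules can be checked with two pairs of hypothetical classes: reading Parent's field through the Child name initializes only Parent, while creating a Derived object initializes Base first. All class names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrder {
    static final List<String> LOG = new ArrayList<>();
}

class Parent {
    static int parentValue = 1;
    static { InitOrder.LOG.add("Parent <clinit>"); }
}

class Child extends Parent {
    static { InitOrder.LOG.add("Child <clinit>"); }
}

class Base {
    static { InitOrder.LOG.add("Base <clinit>"); }
}

class Derived extends Base {
    static { InitOrder.LOG.add("Derived <clinit>"); }
}

class InheritanceInitDemo {
    static List<String> run() {
        int v = Child.parentValue; // field declared in Parent: only Parent is initialized
        InitOrder.LOG.add("read parentValue=" + v);
        new Derived();             // Derived needs initialization, so Base is initialized first
        return InitOrder.LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // [Parent <clinit>, read parentValue=1, Base <clinit>, Derived <clinit>]
    }
}
```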

Creating an array does not trigger the initialization of the elements' class in the array.

public class Test2 {
    public static void main(String[] args) {
        Test2_A[] arr = new Test2_A[10];
    }
}

class Test2_A {
    static {
        System.out.println("Test2 A's static block runs");
    }
}

If a static final variable's value has to be computed by executing instructions (it is not a compile-time constant), it is assigned in clinit, so accessing it triggers class initialization.

public class Test4 {
    public static void main(String[] args) {
        System.out.println(Test4_A.a);
    }
}

class Test4_A {
    public static final int a = Integer.valueOf(1);

    static {
        System.out.println("Test4_A's static block runs");
    }
}

Class Loader

The ClassLoader is the mechanism the JVM provides for applications to obtain the bytecode of classes and interfaces. The class loader takes part only in the loading phase: it obtains the bytecode and hands it over to be loaded into memory.

image.png|500

The class loader obtains the contents of the bytecode file in binary stream form and then hands the obtained data to the JVM, which generates corresponding objects in the method area and heap to store bytecode information.

Classification of Class Loaders

Class loaders fall into two categories: those implemented in Java code and those implemented in the JVM's underlying source code.

JDK 8 and Earlier

image.png|500

The Bootstrap class loader loads the core JRE jar packages (such as rt.jar). It cannot be accessed from Java code: getClassLoader() returns null for the classes it loads.

image.png|500

The Extension class loader and the Application class loader are static inner classes of sun.misc.Launcher (ExtClassLoader and AppClassLoader). Both extend URLClassLoader, so they can load bytecode files into memory from directories or specified jar packages. By default, the Extension class loader loads jars from jre/lib/ext.

image.png|500

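The -Djava.ext.dirs=<directory> parameter changes the directories from which extension jar packages are loaded. Separate multiple directories with ; (Windows) or : (macOS/Linux). Because this setting replaces the default, include the original extension directory as well if its jars are still needed.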

image.png|500

The application class loader loads class files under the classpath, primarily loading classes from the project and classes from third-party jar packages introduced via Maven.

Parent Delegation Mechanism of Class Loaders

Because the JVM has multiple class loaders, the parent delegation mechanism is what decides which class loader loads a given class.

Mechanism functions:

  1. Prevents malicious code from replacing core libraries in the JDK, such as java.lang.String, ensuring the integrity and security of core libraries.
  2. Prevents duplicate loading, ensuring that a class is loaded only once by a class loader.

The parent delegation mechanism means that when a class loader receives a request to load a class, it first checks from the bottom up whether the class has already been loaded; if it has not, loading is attempted from the top down.

Loading from the top down establishes a priority: the bootstrap class loader tries first, then each loader below it, and a loader succeeds only if the class is found in its own load path.

image.png|500

Example: dev.chanler.my.C is on the classpath. Checking upward from the Application class loader shows that it has not been loaded yet. Trying downward from Bootstrap, neither Bootstrap nor Extension finds it in its load path, so only the Application class loader can load it, because C is on the classpath.
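The check-upward, load-downward steps can be sketched as an override of loadClass(String, boolean). This is a simplified restatement, not the JDK source: the real ClassLoader.loadClass also asks the bootstrap loader directly when the parent is null, which this hypothetical DelegatingLoader does not attempt (it requires a non-null parent). The trace list exists only to make the steps visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified restatement of parent delegation (not the JDK's actual source)
class DelegatingLoader extends ClassLoader {
    final List<String> trace = new ArrayList<>();

    DelegatingLoader(ClassLoader parent) {
        super(Objects.requireNonNull(parent, "this sketch needs a non-null parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Has this loader already loaded the class?
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate upward; the parent repeats the same steps up to the bootstrap loader
                    trace.add("delegate " + name);
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException notFoundByAncestors) {
                    // 3. Only if no ancestor can load it does this loader try its own source
                    trace.add("findClass " + name);
                    c = findClass(name); // the inherited findClass throws ClassNotFoundException
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        DelegatingLoader loader = new DelegatingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.loadClass("java.lang.String") == String.class); // true
        try {
            loader.loadClass("no.such.Clazz");
        } catch (ClassNotFoundException e) {
            System.out.println("not found by any loader: " + e.getMessage());
        }
        System.out.println(loader.trace);
    }
}
```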

Questions:

  1. If a class appears in the loading locations of three class loaders, who should load it?
    • The bootstrap class loader, because it has the highest priority under the parent delegation mechanism.
  2. Can the String class be overridden? If a java.lang.String class is created in your project, will it be loaded?
    • No. The bootstrap class loader has already loaded java.lang.String from rt.jar, so that class is returned instead.
  3. What is the parent delegation mechanism of classes?
    • When a class loader attempts to load a class, it checks upwards to see if it has already been loaded. If it has, it returns directly; if not, it checks downwards to load it.
    • The application class loader's parent class loader is the extension class loader, and the extension class loader's parent class loader is the bootstrap class loader, but in code, it is null because Bootstrap cannot be accessed.
    • The benefits of the parent delegation mechanism are twofold: first, it prevents malicious code from replacing core libraries in the JDK, such as java.lang.String, ensuring the integrity and security of core libraries; second, it prevents a class from being loaded multiple times.
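To see the hierarchy from the answers above, walk the getParent() chain. The class LoaderChain is hypothetical; on JDK 8 the chain for a class on the classpath typically reads AppClassLoader, ExtClassLoader, then null for the bootstrap loader, while on JDK 9 and later the loader names differ (see the JDK 9 section).

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from a class's loader up through getParent(); null stands for the bootstrap loader
    static List<String> chain(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = clazz.getClassLoader(); l != null; l = l.getParent()) {
            names.add(l.getClass().getName());
        }
        names.add("bootstrap (null)");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(chain(LoaderChain.class)); // application loader, its parents, bootstrap
        System.out.println(chain(String.class));      // [bootstrap (null)]
    }
}
```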

Breaking the Parent Delegation Mechanism

There are three ways to break the parent delegation mechanism, but essentially only the first truly breaks it:

  • Custom class loaders: a custom class loader overrides the loadClass method; Tomcat uses this to isolate classes between applications.
  • Thread context class loaders: classes are loaded through the thread context class loader, as in JDBC and JNDI.
  • OSGi framework class loaders: historically, the OSGi framework implemented a new class loading mechanism that allows peer-to-peer delegation; it is rarely used now.

Breaking the Parent Delegation Mechanism - Custom Class Loader

A single Tomcat instance can run multiple web applications. If two applications each contain a class with the same fully qualified name, such as a Servlet class, Tomcat must load both and keep them as different classes. Under the parent delegation mechanism, the application class loader would find that the first class has already been loaded and would never load the second one.

image.png|500

Tomcat uses custom class loaders to achieve class isolation between applications, where each application has an independent class loader to load the corresponding classes.

image.png|500

Four core methods of ClassLoader

public Class<?> loadClass(String name)
Entry point for class loading; implements the parent delegation mechanism and internally calls findClass. (Important)

protected Class<?> findClass(String name)
Implemented by class loader subclasses to obtain the binary data and call defineClass. For example, URLClassLoader reads the binary data from class files based on file paths. (Important)

protected final Class<?> defineClass(String name, byte[] b, int off, int len)
Performs some class name validation and then calls the underlying virtual machine method to load the bytecode information into the virtual machine's memory.

protected final void resolveClass(Class<?> c)
Executes the linking phase of the class lifecycle.

loadClass(String name) passes resolve = false, so it does not perform the linking and initialization phases itself. Class.forName performs loading, linking, and initialization.
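The difference between loadClass and Class.forName can be observed with a nested class whose static block increments a counter. LoadVsForName and Lazy are hypothetical names; the sketch assumes it runs in a fresh JVM, since a class is initialized only once.

```java
class LoadVsForName {
    static int initCount = 0;

    static class Lazy {
        static {
            initCount++; // runs only when Lazy is initialized
        }
    }

    // Binary name of the nested class, e.g. "LoadVsForName$Lazy"
    static final String LAZY = LoadVsForName.class.getName() + "$Lazy";

    static int afterLoadClass() throws ClassNotFoundException {
        // loadClass(String) uses resolve = false: the class is loaded but not initialized
        LoadVsForName.class.getClassLoader().loadClass(LAZY);
        return initCount;
    }

    static int afterForName() throws ClassNotFoundException {
        // Class.forName(String) loads, links and initializes the class
        Class.forName(LAZY);
        return initCount;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println("after loadClass: " + afterLoadClass()); // 0
        System.out.println("after forName:   " + afterForName());   // 1
    }
}
```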

To break the parent delegation mechanism, the core logic inside loadClass must be re-implemented.

image.png|500

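The parent of a custom class loader defaults to the application class loader (AppClassLoader), because the no-argument ClassLoader constructor uses ClassLoader.getSystemClassLoader() as the parent.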

import org.apache.commons.io.IOUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.regex.Matcher;

/**
 * Breaking the parent delegation mechanism - Custom Class Loader
 */
public class BreakClassLoader1 extends ClassLoader {

    private String basePath;
    private final static String FILE_EXT = ".class";

    // Set the loading directory
    public void setBasePath(String basePath) {
        this.basePath = basePath;
    }

    // Load files from the specified directory using commons io
    private byte[] loadClassData(String name) {
        // com.itheima.my.A -> com\itheima\my\A on Windows (File.separator)
        String tempName = name.replaceAll("\\.", Matcher.quoteReplacement(File.separator));
        try (InputStream fis = new FileInputStream(basePath + tempName + FILE_EXT)) {
            return IOUtils.toByteArray(fis);
        } catch (IOException e) {
            System.out.println("Custom class loader failed to load, error reason: " + e.getMessage());
            return null;
        }
    }

    // Override loadClass method
    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        // If it's in the java package, still use the parent delegation mechanism
        // (the JVM refuses to define java.* classes in a custom loader anyway)
        if (name.startsWith("java.")) {
            return super.loadClass(name);
        }
        // Return the class if this loader already defined it; defining it twice throws LinkageError
        Class<?> loaded = findLoadedClass(name);
        if (loaded != null) {
            return loaded;
        }
        // Load from the specified directory on disk
        byte[] data = loadClassData(name);
        if (data == null) {
            throw new ClassNotFoundException(name);
        }
        // Call the underlying method of the virtual machine to create objects in the method area and heap
        return defineClass(name, data, 0, data.length);
    }

    public static void main(String[] args) throws ClassNotFoundException, IOException {
        // First custom class loader object
        BreakClassLoader1 classLoader1 = new BreakClassLoader1();
        classLoader1.setBasePath("D:\\lib\\");

        Class<?> clazz1 = classLoader1.loadClass("com.itheima.my.A");
        // Second custom class loader object
        BreakClassLoader1 classLoader2 = new BreakClassLoader1();
        classLoader2.setBasePath("D:\\lib\\");

        Class<?> clazz2 = classLoader2.loadClass("com.itheima.my.A");

        // false: same name but different class loaders, so they are different classes
        System.out.println(clazz1 == clazz2);

        Thread.currentThread().setContextClassLoader(classLoader1);

        System.out.println(Thread.currentThread().getContextClassLoader());

        // Keep the JVM alive so the loaded classes can be inspected, e.g. with Arthas
        System.in.read();
    }
}

Question: Will two custom class loaders loading a class with the same fully qualified name conflict?

  • No. Within one JVM, two classes are the same only if they have the same class loader and the same fully qualified name.
  • In Arthas, the sc -d <class name> command shows the details, including the class loader of each loaded class.

The parent delegation mechanism lives in loadClass, and loadClass calls findClass. To load bytecode from other channels without breaking parent delegation, override findClass instead: for example, read class data from a database, convert it into a byte array, and call defineClass to store it in memory.
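A sketch of the findClass approach, which keeps parent delegation intact because loadClass is inherited unchanged. The in-memory map stands in for a database or other custom source; MapClassLoader and its put method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Keeps parent delegation: only findClass is overridden
class MapClassLoader extends ClassLoader {
    // Stand-in for a database or other source: binary class name -> class file bytes
    private final Map<String, byte[]> classBytes = new HashMap<>();

    MapClassLoader(ClassLoader parent) {
        super(parent);
    }

    void put(String name, byte[] bytecode) {
        classBytes.put(name, bytecode);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Called by the inherited loadClass only after the parent chain failed to find the class
        byte[] data = classBytes.get(name);
        if (data == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, data, 0, data.length);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        MapClassLoader loader = new MapClassLoader(ClassLoader.getSystemClassLoader());
        // java.lang.String is still served by the bootstrap loader through delegation
        System.out.println(loader.loadClass("java.lang.String").getClassLoader()); // null
    }
}
```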

Breaking the Parent Delegation Mechanism - Thread Context Class Loader JDBC Example

The DriverManager class manages different database drivers.

image.png|500

The DriverManager class is located in rt.jar, so it is loaded by the bootstrap class loader. The driver jar packages, however, are on the classpath, so DriverManager has to delegate loading the driver classes to the application class loader.

image.png|500

Question: How does DriverManager know which driver class to load from the jar package?
Through the Service Provider Interface (SPI), a service discovery mechanism built into the JDK. A driver jar lists its implementation class in the file META-INF/services/java.sql.Driver, and DriverManager uses ServiceLoader to find and load it.

ServiceLoader loads these classes with the thread context class loader, which by default is the application class loader.

image.png|500
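A minimal look at the SPI lookup described above, assuming the java.sql module is available (it is part of the standard JDK). ServiceLoader.load(Class) uses the thread context class loader implicitly; this sketch passes it explicitly to make that visible. SpiDemo is a hypothetical name, and the list is empty unless a JDBC driver jar is on the classpath.

```java
import java.sql.Driver;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class SpiDemo {
    static List<String> findDrivers() {
        // The loader that ServiceLoader.load(Driver.class) would use implicitly
        ClassLoader tccl = Thread.currentThread().getContextClassLoader();
        List<String> names = new ArrayList<>();
        // Reads META-INF/services/java.sql.Driver entries visible to tccl and instantiates each provider
        for (Driver driver : ServiceLoader.load(Driver.class, tccl)) {
            names.add(driver.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("context class loader: " + Thread.currentThread().getContextClassLoader());
        System.out.println("drivers found: " + findDrivers());
    }
}
```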

Perspective: Does the JDBC example really break the parent delegation mechanism?

  • Yes: DriverManager, loaded by the bootstrap class loader, delegates loading the driver class to the application class loader, which breaks parent delegation.
  • No: JDBC triggers loading of the driver class only after DriverManager has been loaded, and that loading still follows parent delegation, because the application class loader loads it through loadClass, which contains the parent delegation logic.

It is fair to say that at the macro level a parent loader delegates to a child loader, while at the micro level the application class loader's loading logic still follows parent delegation; the parent loaders simply cannot find the driver class.

Breaking the Parent Delegation Mechanism - OSGi Modular Framework

image.png|500

Using Arthas Hot Deployment to Solve Online Bugs

image.png|500

Notes:

  1. The replacement happens only in memory, so the original bytecode comes back after the program restarts unless the class files in the jar package are updated as well.
  2. retransform cannot add methods or fields, and it cannot update a method that is currently executing.

Class Loaders in JDK 9 and Later

In JDK 8 and earlier, the Extension and Application class loaders were static inner classes of sun.misc.Launcher that extended URLClassLoader.

image.png|500

Starting with JDK 9, the concept of modules was introduced, and the design of class loaders changed significantly.

On the Java side, the bootstrap class loader is represented by BootClassLoader in jdk.internal.loader.ClassLoaders. BootClassLoader extends BuiltinClassLoader, which finds the bytecode resources to load from modules. The bootstrap class loader still cannot be obtained from Java code; getClassLoader() still returns null, which keeps the behavior consistent.

image.png|500

The platform class loader, which replaces the extension class loader, loads bytecode files from modules, so its superclass changes from URLClassLoader to BuiltinClassLoader. It exists mainly for backward compatibility with older versions and has no special logic of its own.
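On JDK 9 and later, the built-in loaders can be inspected through public APIs. This sketch (Jdk9Loaders is a hypothetical name) assumes the default system class loader has not been replaced with -Djava.system.class.loader.

```java
class Jdk9Loaders {
    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        ClassLoader platform = ClassLoader.getPlatformClassLoader(); // added in JDK 9

        System.out.println(app.getName() + " -> " + platform.getName()); // typically "app -> platform"
        System.out.println(app.getParent() == platform);   // true with the default system loader
        System.out.println(platform.getParent());          // null: the bootstrap loader stays hidden
        System.out.println(String.class.getClassLoader()); // null, as in JDK 8
        // Typically jdk.internal.loader.BuiltinClassLoader
        System.out.println(app.getClass().getSuperclass().getName());
    }
}
```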

JVM Memory Area - Runtime Data Area

The runtime data area is responsible for managing the memory used by the JVM, such as creating and destroying objects.

The "Java Virtual Machine Specification" defines the role of each part, divided into two major blocks: thread-private and thread-shared.

Thread-private: program counter, Java virtual machine stack, native method stack.
Thread-shared: method area, heap.

image.png|500

Program Counter

The Program Counter Register (PC Register) records the address of the bytecode instruction currently being executed for each thread.

image.png|500

Example:

During the loading phase, the virtual machine reads the instructions from the bytecode file into memory and converts the offsets from the original file into memory addresses. Each bytecode instruction has a memory address.

image.png|500

During code execution, the program counter records the address of the next bytecode instruction. After the current instruction finishes, the execution engine uses the program counter to find and execute the next instruction. For simplicity, offsets are shown here instead of actual memory addresses.

image.png|500

Continuing down to the last line, the return statement indicates the end of the current method execution, and the program counter will hold the address of the method's exit, which is the address to return to the calling method.

image.png|500

Thus, the program counter can control the flow of program instructions, implementing branching, jumping, exceptions, and other logic by simply placing the address of the next instruction to be executed in the program counter.

With multiple threads, each thread's program counter records the address of the next instruction to interpret when the CPU switches away, so the thread can resume execution from the right place when it is scheduled again.

image.png|500
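The claim that branching only requires placing a new address in the program counter can be shown with a toy interpreter. This is not real JVM bytecode: the opcodes, the int[][] encoding and the class PcDemo are hypothetical. The program computes 1 + 2 + 3 with a loop built from a conditional jump and a goto.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter: control flow is nothing more than assigning a new value to pc
final class PcDemo {
    static final int PUSH = 0, LOAD = 1, STORE = 2, ADD = 3, INC = 4, IF_GT = 5, GOTO = 6, RETURN = 7;

    // sum = 0; for (n = 1; n <= 3; n++) sum += n; return sum;   locals: 0 = n, 1 = sum
    static final int[][] PROGRAM = {
        {PUSH, 1}, {STORE, 0},          // 0-1:  n = 1
        {PUSH, 0}, {STORE, 1},          // 2-3:  sum = 0
        {LOAD, 0}, {PUSH, 3},           // 4-5:  push n, push 3
        {IF_GT, 13},                    // 6:    if n > 3, jump to 13
        {LOAD, 1}, {LOAD, 0}, {ADD, 0}, // 7-9:  sum + n
        {STORE, 1},                     // 10:   sum = sum + n
        {INC, 0},                       // 11:   n++ (like iinc)
        {GOTO, 4},                      // 12:   back to the loop test
        {RETURN, 1},                    // 13:   return sum
    };

    static int run(int[][] code) {
        int[] locals = new int[2];
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                             // address of the next instruction
        while (true) {
            int op = code[pc][0], arg = code[pc][1];
            pc++;                               // default: fall through to the next instruction
            switch (op) {
                case PUSH:   stack.push(arg); break;
                case LOAD:   stack.push(locals[arg]); break;
                case STORE:  locals[arg] = stack.pop(); break;
                case ADD:    stack.push(stack.pop() + stack.pop()); break;
                case INC:    locals[arg]++; break;
                case IF_GT: {
                    int right = stack.pop(), left = stack.pop();
                    if (left > right) pc = arg; // branch: overwrite pc
                    break;
                }
                case GOTO:   pc = arg; break;   // unconditional jump
                case RETURN: return locals[arg];
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(run(PROGRAM)); // 6
    }
}
```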

Question: Can the program counter experience memory overflow during operation?

  • Memory overflow refers to a situation where the data stored in a certain memory area exceeds the maximum memory limit that the virtual machine can provide.
  • Since each thread only stores a fixed-length memory address, the program counter will not experience memory overflow.
  • Programmers do not need to handle the program counter.

JVM Stack

The Java Virtual Machine Stack uses a stack data structure to manage basic data during method calls, following a First In Last Out (FILO) principle. Each method call uses a stack frame to save its information.

image.png|500
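Since every call pushes a new stack frame, unbounded recursion eventually exhausts the thread's JVM stack and the JVM throws StackOverflowError. This sketch (StackDepth is a hypothetical name) counts how many frames fit; the number depends on the stack size (for example -Xss) and on the JVM, so no fixed value is shown.

```java
class StackDepth {
    private static int depth = 0;

    private static void recurse() {
        depth++;   // each call pushes one more stack frame
        recurse();
    }

    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // the stack has no room left for another frame
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measure());
    }
}
```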

The stack frame in the Java Virtual Machine Stack mainly contains three aspects:

  • Local Variable Table: Stores all local variables during method execution.
  • Operand Stack: A region in the stack frame used to store temporary data during instruction execution.
  • Frame Data: Mainly contains dynamic linking, method exit, and references to the exception table.

Local Variable Table

The local variable table is used to store all local variables during method execution.

There are two types of local variable tables:

  • One is in the bytecode file.
  • The other is in the stack frame, stored in memory. The local variable table in the stack frame is generated based on the contents of the bytecode file.

Valid range: each local variable recorded in the bytecode file is valid only within a certain range of bytecode, where it can be accessed. Start PC is the offset from which the variable can be accessed, which ensures it has already been initialized; Length is how far the valid range extends from Start PC. For example, j remains valid up to the return instruction on line 4.

The local variable table in the stack frame is an array in which each element is one slot. Values of type long and double occupy two slots.

image.png|500

The this reference of instance objects and method parameters are also stored at the beginning of the local variable table in the order of their definitions.

image.png|500

Question: How many slots does the following code occupy?

public void test4(int k,int m){
    {
        int a = 1;
        int b = 2;
    }
    {
        int c = 1;
    }
    int i = 0;
    long j = 1;
}

Is this, k, m, a, b, c, i, j a total of 9 slots? Not necessarily.

To save space, the slots in the local variable table can be reused. Once a local variable is no longer in use, the current slot can be reused. In this case, a, b, and c will not be used later and can be reused by i and j. However, the this reference and method parameters persist throughout the method's lifecycle, and the slots they occupy will not be reused.

Thus, the number of slots in the local variable table should be the minimum required at runtime, which can be determined at compile time. During execution, the stack frame simply creates a local variable table array of the corresponding length.

image.png|500
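The slot count for test4 can be worked out with a toy model of the compiler's bookkeeping. SlotAllocator is a hypothetical class, not javac's actual code: slots are handed out in order, the next free slot is remembered when a { } block starts and restored when it ends, and the high-water mark is the table size. Under this model test4 needs 6 slots: this, k and m take slots 0 to 2, a, c and i all reuse slot 3, and b and j share slot 4 (j also takes slot 5).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of local-variable slot allocation with scope-based reuse
// (a hypothetical sketch, not javac's real implementation).
class SlotAllocator {
    private int next = 0;   // next free slot index
    private int max = 0;    // high-water mark = size of the local variable table
    private final Deque<Integer> scopes = new ArrayDeque<>();

    // Allocates a variable; long and double take two slots.
    int declare(String type) {
        int slot = next;
        next += (type.equals("long") || type.equals("double")) ? 2 : 1;
        max = Math.max(max, next);
        return slot;
    }

    void enterBlock() { scopes.push(next); }    // remember where the block started
    void exitBlock()  { next = scopes.pop(); }  // block variables are dead: reuse their slots

    int maxSlots() { return max; }

    // Replays the declarations of test4(int k, int m).
    static int test4Slots() {
        SlotAllocator s = new SlotAllocator();
        s.declare("this");   // slot 0
        s.declare("int");    // k -> slot 1
        s.declare("int");    // m -> slot 2
        s.enterBlock();
        s.declare("int");    // a -> slot 3
        s.declare("int");    // b -> slot 4
        s.exitBlock();
        s.enterBlock();
        s.declare("int");    // c -> slot 3 (reused)
        s.exitBlock();
        s.declare("int");    // i -> slot 3 (reused)
        s.declare("long");   // j -> slots 4 and 5
        return s.maxSlots();
    }

    public static void main(String[] args) {
        System.out.println(test4Slots()); // 6
    }
}
```

Compiling test4 and inspecting it with javap -v should likewise show locals=6 rather than 9.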

Operand Stack#

The operand stack is a region in the stack frame used by the virtual machine to store intermediate data during instruction execution, following a stack structure.

The maximum depth of the operand stack can be determined at compile time, allowing for correct memory allocation during execution.

Example: The maximum depth of the operand stack is 2.

image.png|500
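To see where such a maximum depth comes from, the sketch below runs a hypothetical mini-interpreter over a few simplified int instructions (written as "iconst 0" rather than javac's iconst_0) for int i = 0; int j = i + 1; and records the operand stack depth after each instruction. It is only a model: the compiler computes the maximum depth statically, without executing the code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical mini-interpreter for a handful of int instructions,
// tracking the operand stack depth while it runs.
class OperandStackDemo {
    // Executes the instructions and returns the maximum operand stack depth.
    static int run(String[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int maxDepth = 0;
        for (String insn : code) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "iconst": stack.push(Integer.parseInt(parts[1])); break;           // push a constant
                case "iload":  stack.push(locals[Integer.parseInt(parts[1])]); break;   // local -> stack
                case "istore": locals[Integer.parseInt(parts[1])] = stack.pop(); break; // stack -> local
                case "iadd":   stack.push(stack.pop() + stack.pop()); break;            // pop two, push the sum
                default: throw new IllegalArgumentException(insn);
            }
            maxDepth = Math.max(maxDepth, stack.size());
        }
        return maxDepth;
    }

    public static void main(String[] args) {
        // int i = 0; int j = i + 1;   (slot 0 = this, slot 1 = i, slot 2 = j)
        String[] code = {"iconst 0", "istore 1", "iload 1", "iconst 1", "iadd", "istore 2"};
        int[] locals = new int[3];
        System.out.println(run(code, locals)); // 2
        System.out.println(locals[2]);         // 1
    }
}
```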

Frame Data#

Frame data mainly includes dynamic linking, method exit, and references to the exception table.

Dynamic Linking#

When the bytecode instructions of the current class reference fields or methods of other classes, the symbolic references (numbers in the constant pool) must be converted into the corresponding memory addresses.

Dynamic linking holds the mapping from these numbers to memory addresses in the runtime constant pool.

image.png|500

Method Exit#

Method exit: when a method ends, either normally or with an exception, its stack frame is popped, and the program counter must point to the next instruction in the previous stack frame, that is, the next instruction in the caller.

image.png|500

Exception Table#

The exception table stores information about exception handling in the code, including the effective range of exception capture and the bytecode instruction positions to jump to after an exception occurs.

Example: In this exception table, the protected range starts at offset 2 (inclusive) and ends at offset 4 (exclusive). If a java.lang.Exception or any of its subclasses is thrown while the instructions in this range are executing, it is caught and execution jumps to the handler instruction at offset 7.

image.png|500
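The lookup the JVM performs against such a table can be sketched as follows. This is a hypothetical model, not HotSpot's code: rows are checked in order, the end offset is treated as exclusive, catch types also match subclasses via Class.isAssignableFrom, and a null catch type stands for "any" (as used for finally blocks).

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of an exception table and the handler lookup.
class ExceptionTableDemo {
    static final class Entry {
        final int startPc, endPc, handlerPc;        // protected range is [startPc, endPc)
        final Class<? extends Throwable> catchType; // null means "any exception"

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    // Returns the handler offset, or -1 if no row matches (the frame is then
    // popped and the exception propagates to the caller).
    static int findHandler(List<Entry> table, int pc, Class<? extends Throwable> thrown) {
        for (Entry e : table) {  // rows are searched in order
            boolean inRange = pc >= e.startPc && pc < e.endPc;
            boolean typeMatches = e.catchType == null || e.catchType.isAssignableFrom(thrown);
            if (inRange && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The row from the example: from 2, to 4, target 7, type java.lang.Exception
        List<Entry> table = Collections.singletonList(new Entry(2, 4, 7, Exception.class));
        System.out.println(findHandler(table, 3, RuntimeException.class)); // 7 (subclass matches)
        System.out.println(findHandler(table, 4, Exception.class));        // -1 (end offset excluded)
    }
}
```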

Stack Memory Overflow#

If the JVM stack holds too many stack frames and exceeds the maximum size allocated to stack memory, the stack overflows and a StackOverflowError is thrown.

image.png|500

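You can set the stack size of each thread with the virtual machine parameter -Xss, for example -Xss1m or -Xss1024K. In this test, a 1M JVM stack held 10,676 stack frames; the exact number depends on how large each frame is.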

Each JVM version also places limits on the stack size. For example, HotSpot in JDK 8 on 64-bit Windows requires at least 180K and at most 1024M.
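A StackOverflowError is easy to reproduce with unbounded recursion. This sketch (the class name StackDepthDemo is hypothetical) counts how many calls succeed before the error; the count depends on -Xss and on the frame size of the recursive method, so it will not match the figure above exactly and can vary between runs.

```java
// Recurses until the thread stack is exhausted and reports how many frames fit.
class StackDepthDemo {
    private static int depth = 0;

    private static void recurse() {
        depth++;      // one more stack frame has been pushed
        recurse();    // no exit condition: the stack eventually overflows
    }

    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // the frames have been unwound by the time the error reaches here
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measure());
    }
}
```

Running it as java -Xss1m StackDepthDemo and then java -Xss512k StackDepthDemo should show a noticeably smaller count for the smaller stack.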

Native Method Stack#

In the HotSpot JVM, the Java Virtual Machine stack and the native method stack use the same stack space. The native method stack stores information such as parameters, local variables, and return values for native methods.

Native methods are methods implemented in native code (typically C/C++) inside the JVM and declared in Java with the native keyword so that Java code can call them.

Heap Memory#

The heap is usually the largest memory area in a Java program, and it is shared by all threads.

All created objects reside in the heap, while the local variable table on the stack can store references to objects in the heap. Static variables can also store references to heap objects, allowing objects to be shared between threads.

image.png|500

Heap Memory Overflow#

The heap memory size has an upper limit. When continuously adding objects to the heap reaches this limit, an OutOfMemoryError (OOM) will be thrown. In this code, continuously creating 100M-sized byte arrays and placing them in an ArrayList will eventually exceed the heap memory limit, throwing an OOM error.

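import java.io.IOException;
import java.util.ArrayList;

/**
 * Usage and recovery of heap memory
 */
public class Demo1 {
    public static void main(String[] args) throws InterruptedException, IOException {
        ArrayList<Object> objects = new ArrayList<Object>();
        System.in.read(); // wait for Enter, giving time to attach a tool such as Arthas
        while (true){
            objects.add(new byte[1024 * 1024 * 100]); // 100MB per iteration, never released
            Thread.sleep(1000);
        }
    }
}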

Three Important Values#

There are three values to pay attention to in heap space: used, total, and max.
used refers to the currently used heap memory, total is the available heap memory allocated by the JVM, and max is the maximum heap memory allowed by the JVM, meaning total can expand to a maximum size of max.

In Arthas, the command dashboard -i <refresh interval in ms> (for example, dashboard -i 5000) displays used, total, and max.

If no virtual machine parameters are set, max defaults to 1/4 of the system memory, and total defaults to 1/64 of the system memory.

As more objects are created in the heap, used grows. When the memory in total is no longer sufficient, the JVM requests more memory, so total keeps growing up to the limit of max.

Question: So, does the heap memory overflow when used = max = total?
No, the conditions for heap memory overflow are more complex and will be detailed in the GC explanation.

Setting Heap Size#

To modify the heap size, use the virtual machine parameters -Xmx (sets max) and -Xms (sets the initial total).
Syntax: -Xmx<value> -Xms<value>
Units: bytes (default, must be a multiple of 1024), k or K (KB), m or M (MB), g or G (GB).
Limitations: -Xmx (max) must be greater than 2 MB, and -Xms (total) must be greater than 1 MB.

Suggestion: Set -Xmx and -Xms to the same value. This reduces the overhead of requesting and releasing memory, and avoids the heap shrinking once memory is no longer needed.
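The same three values can be read from inside a program through java.lang.Runtime, which offers totalMemory(), freeMemory() and maxMemory(); there is no direct "used" method, so used is computed as total minus free. A minimal sketch (the class name HeapValues is hypothetical):

```java
// Reads the JVM's view of heap usage through java.lang.Runtime.
class HeapValues {
    // Returns {used, total, max} in bytes.
    static long[] read() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory();          // heap currently reserved by the JVM ("total")
        long used = total - rt.freeMemory();    // bytes occupied within total ("used")
        long max = rt.maxMemory();              // upper limit total may grow to ("max")
        return new long[] {used, total, max};
    }

    public static void main(String[] args) {
        long[] v = read();
        long mb = 1024 * 1024;
        System.out.printf("used=%dMB total=%dMB max=%dMB%n", v[0] / mb, v[1] / mb, v[2] / mb);
    }
}
```

Starting it with, for example, java -Xms64m -Xmx64m HeapValues should report total and max close to 64MB; depending on the collector, max may come out somewhat below the -Xmx value.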

Method Area#

The method area stores basic class-related information and is shared among threads. It contains:

  • Class metadata, which stores basic information about all classes.
  • Runtime constant pool, which stores the contents of the constant pool in the bytecode file.
  • String constant pool, which stores string constants.

Class Metadata#

The method area stores the basic information of each class, also known as metadata, in the InstanceKlass object.

This is completed during the class loading phase and includes the fields, methods, and other contents in the bytecode file, as well as information needed during runtime, such as the virtual method table (the basis for implementing polymorphism).

image.png|500

Runtime Constant Pool#

In addition to storing class metadata, the method area also stores the runtime constant pool, which contains the contents of the constant pool in the bytecode.

In the bytecode file, constants are found using a lookup table by number. This constant pool is called the static constant pool. Once the constant pool is loaded into memory, it can be quickly located by memory address, and this is called the runtime constant pool.

image.png|500

Implementation of Method Area#

The method area is a logical concept defined in the "Java Virtual Machine Specification," and each Java virtual machine implements it differently. HotSpot's design is as follows:

  • In JDK 7 and earlier, the method area was stored in the permanent generation space within the heap, whose size is limited by virtual machine parameters (such as -XX:MaxPermSize).
  • In JDK 8 and later, the method area is stored in the metaspace, which lives in direct (native) memory managed by the operating system. By default, it can keep allocating memory as long as it does not exceed the operating system's limit; -XX:MaxMetaspaceSize sets an explicit upper bound.

Method Area Overflow#

Dynamically generating bytecode data using the ByteBuddy tool and continuously loading it into memory can simulate the overflow of the method area.

String Constant Pool#

In addition to class metadata and the runtime constant pool, the method area also has a region called the string constant pool (StringTable).

The string constant pool stores string constants defined in the code; for example, the literal "123" is placed in the string constant pool.

String objects created with new, by contrast, are stored in ordinary heap memory.

image.png|500

In earlier designs (JDK 6 and before), the string constant pool was part of the runtime constant pool and was stored in the same place. The two were later separated, and from JDK 7 on the string constant pool is in heap memory.

image.png|500

Question: Are the addresses equal?

/**
 * String constant pool example
 */
public class Demo2 {
    public static void main(String[] args) {
        String a = "1";
        String b = "2";
        String c = "12";
        String d = a + b;
        System.out.println(c == d);
    }
}

No: c == d prints false, so c and d do not point to the same object.

image.png|500

Question: Do the addresses point to the same object?

package chapter03.stringtable;

/**
 * String constant pool example
 */
public class Demo3 {
    public static void main(String[] args) {
        String a = "1";
        String b = "2";
        String c = "12";
        String d = "1" + "2";
        System.out.println(c == d);
    }
}

The bytecode file shows that the compiler concatenates "1" and "2" into "12" at compile time, so c and d both point to the same object in the string constant pool.

image.png|500

Summary of the two questions:

Concatenating string variables happens at runtime with StringBuilder (in JDK 8) and produces a new object in heap memory, while concatenating constants is done directly at compile time.

image.png|500

Since JDK 7, String.intern() returns the string from the string constant pool if an equal string is already there; otherwise, it places a reference to this string object into the string constant pool and returns it.

Here, the string constant pool is automatically placed by the JVM.

image.png|500
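Both intern() cases can be checked directly. A minimal sketch, assuming JDK 7 or later (the class name InternDemo is hypothetical); a random UUID makes the second string's content unique, so an equal string cannot already be in the pool.

```java
import java.util.UUID;

// Shows String.intern() behaviour on JDK 7 and later.
class InternDemo {
    // An equal string is already in the pool: intern() returns the pooled object.
    static boolean internReturnsPooled() {
        String a = "1";
        String b = "2";
        String d = a + b;                     // new heap object, not the pooled "12"
        return d != "12" && d.intern() == "12";
    }

    // No equal string in the pool yet: intern() records this very object and returns it.
    static boolean internAddsReference() {
        String s = new StringBuilder("intern-demo-").append(UUID.randomUUID()).toString();
        return s.intern() == s;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooled());  // true
        System.out.println(internAddsReference());  // true on JDK 7+ (JDK 6 copied the string instead)
    }
}
```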

Question: Where are static variables stored?

  • In versions 6 and earlier, static variables were stored in the method area, which is the permanent generation.
  • In versions 7 and later, static variables are stored in the heap within the Class object, separating them from the permanent generation.

Direct Memory#

Direct memory does not exist in the "Java Virtual Machine Specification" and is not part of the Java runtime memory area. It was introduced in JDK 1.4 with the NIO mechanism, using direct memory primarily to solve the following two problems:

  1. Objects in the Java heap that are no longer used must be reclaimed, and this reclamation can affect the creation and use of other objects.
  2. IO operations, such as reading files, require first reading the file into direct memory (buffer) and then copying the data to the Java heap.

Files can be placed in direct memory, and the heap can maintain references to direct memory, avoiding the overhead of data copying and the creation and reclamation of file objects.

image.png|500

You can limit the maximum size of direct memory with the parameter -XX:MaxDirectMemorySize=size.
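In code, NIO's ByteBuffer.allocateDirect is the usual way to obtain direct memory: the small ByteBuffer object lives in the heap and refers to memory outside it. A minimal sketch (the class name DirectMemoryDemo is hypothetical):

```java
import java.nio.ByteBuffer;

// Allocates a buffer in direct memory through NIO.
class DirectMemoryDemo {
    static ByteBuffer allocate(int bytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes); // counted against -XX:MaxDirectMemorySize
        buffer.putInt(42);   // written through the heap-side handle into direct memory
        buffer.flip();       // switch from writing to reading
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(1024 * 1024);
        System.out.println(buf.isDirect()); // true
        System.out.println(buf.getInt());   // 42
    }
}
```

Direct allocations that together exceed -XX:MaxDirectMemorySize typically fail with OutOfMemoryError: Direct buffer memory.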

JVM Garbage Collection#

In languages like C/C++ that do not have an automatic garbage collection mechanism, an object that is no longer used needs to be manually released; otherwise, memory leaks will occur. Memory leaks refer to unused objects that are not reclaimed in the system, and the accumulation of memory leaks can lead to memory overflow.

The process of releasing objects is called garbage collection. To simplify object release, Java introduced an automatic garbage collection (GC) mechanism, with the garbage collector primarily responsible for reclaiming memory on the heap.

image.png|500

Question: Which parts of memory does the garbage collector need to manage?
For thread-private parts, they are created with thread creation and destroyed with thread destruction; method stack frames automatically pop from the stack and release memory after method execution, so they do not require garbage collection. Therefore, the parts that need garbage collection are the method area and heap, which are shared among threads.

Method Area Reclamation#

The content that can be reclaimed in the method area mainly consists of classes that are no longer in use.

To determine whether a class can be unloaded, it must meet the following conditions:

  1. All instance objects of this class have been reclaimed, meaning there are no instances of this class or its subclasses in the heap.
Class<?> clazz = loader.loadClass("com.itheima.my.A");
Object o = clazz.newInstance();
o = null; // no instance of A remains
  2. The class loader that loaded this class has been reclaimed.
  3. The java.lang.Class object corresponding to this class is not referenced anywhere.

The two virtual machine parameters -XX:+TraceClassLoading and -XX:+TraceClassUnloading can show logs of class loading and unloading (i.e., reclamation).

If manual garbage collection is needed, the System.gc() method can be called, but it does not guarantee immediate garbage collection; it merely sends a request for garbage collection to the Java virtual machine, which will determine whether garbage collection is necessary.

Heap Reclamation#

Reference Counting Method and Reachability Analysis Method#

The GC determines whether an object can be reclaimed based on whether it is referenced. If an object is referenced, it indicates that the object is still in use and cannot be reclaimed.

Question: Do A and B need to remove their mutual references?
No, because there is no way to access the A and B objects through references in the code.

image.png|500

Common methods for determining whether an object can be reclaimed include reference counting and reachability analysis.

Reference Counting Method#

The reference counting method maintains a reference counter for each object. When an object is referenced, the counter increases by 1; when the reference is canceled, it decreases by 1.

In this case, canceling two references can make the reference counter return to 0, allowing it to be reclaimed.

image.png|500

However, in the following situation, objects A and B reference each other, so both counters are 1. No local variable references either object anymore, so they should be reclaimable, yet under reference counting they are never reclaimed.
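The cycle problem can be reproduced with a toy reference counter. This is a hypothetical model written for illustration, not how any JVM manages memory: counts are adjusted by hand with addRef and release, and an object counts as "reclaimed" when its counter reaches 0.

```java
// Toy reference counting (not how HotSpot works): shows why cycles leak.
class RcObject {
    final String name;
    int count = 0;      // how many references currently point at this object
    RcObject field;     // one outgoing reference to another object

    RcObject(String name) { this.name = name; }

    static void addRef(RcObject o) { if (o != null) o.count++; }

    // Removes one reference; at 0 the object is "reclaimed" and releases its own field.
    static void release(RcObject o) {
        if (o == null) return;
        if (--o.count == 0) {
            System.out.println(o.name + " reclaimed");
            RcObject next = o.field;
            o.field = null;
            release(next);
        }
    }

    // A and B point at each other; then both local variables are dropped.
    static int[] cycleDemo() {
        RcObject a = new RcObject("A"); addRef(a);   // local a -> A   (A: 1)
        RcObject b = new RcObject("B"); addRef(b);   // local b -> B   (B: 1)
        a.field = b; addRef(b);                      // A -> B         (B: 2)
        b.field = a; addRef(a);                      // B -> A         (A: 2)
        release(a);                                  // simulates a = null (A: 1)
        release(b);                                  // simulates b = null (B: 1)
        // The Java variables are used here only to inspect the counters.
        return new int[] {a.count, b.count};
    }

    public static void main(String[] args) {
        int[] counts = cycleDemo();
        System.out.println(counts[0] + " " + counts[1]); // 1 1: neither is ever reclaimed
    }
}
```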

Reachability Analysis Method#

Java uses the reachability analysis algorithm to determine whether an object can be reclaimed.

Reachability analysis divides objects into two categories: garbage collection roots (GC Roots) and ordinary objects; there are reference relationships between objects.

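If an object can be reached from a root object (GC Root) through a chain of references, it is considered alive and cannot be reclaimed; only objects that are unreachable from every GC Root can be reclaimed. GC Root objects themselves are not reclaimed.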

image.png|500
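Reachability analysis amounts to a graph traversal starting from the GC Roots. A minimal sketch over a hypothetical object graph, given as a map from each object name to the names it references: root leads to X and then Y, while A and B only reference each other, so they are not marked even though each is referenced.

```java
import java.util.*;

// Sketch of reachability analysis: everything reachable from the roots is
// alive; everything else (including the A <-> B cycle) is garbage.
class Reachability {
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (marked.add(obj)) {   // first visit: follow its references
                for (String target : refs.getOrDefault(obj, Collections.emptyList())) {
                    work.push(target);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("X"));  // e.g. a local variable in a stack frame
        refs.put("X", Arrays.asList("Y"));
        refs.put("A", Arrays.asList("B"));     // A and B reference each other,
        refs.put("B", Arrays.asList("A"));     // but no root reaches them
        System.out.println(reachable(refs, Arrays.asList("root"))); // root, X, Y (order may vary)
    }
}
```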

GC Root objects include:

  • Thread objects that reference method parameters and local variables in the thread stack frame.
  • java.lang.Class objects loaded by the system class loader.
  • Monitor objects, that is, objects held as synchronized locks.
  • Global objects used during native method calls.

Example: Thread object.
image.png|500

Five Types of Object References#

The object references described in the reachability algorithm generally refer to strong references, meaning that if a GC Root object has a reference to an ordinary object, that ordinary object cannot be reclaimed.

Java defines five types of references:

  • Strong Reference
  • Soft Reference
  • Weak Reference
  • Phantom Reference
  • Finalizer Reference
Soft Reference#

Soft references are weaker than strong references. If an object is only associated with a soft reference, it will be reclaimed when the program runs out of memory.

image.png|500

The execution process of soft references is as follows:

  1. Wrap the object using a soft reference: new SoftReference<ObjectType>(object).
  2. When memory is insufficient, the virtual machine attempts garbage collection.
  3. If garbage collection still does not resolve the memory shortage, the objects in the soft reference will be reclaimed.
  4. If memory is still insufficient, an OutOfMemoryError is thrown.

In the code below, 100M of data is placed in a soft reference, and bytes = null; removes the strong reference to it, leaving only the SoftReference. If the maximum heap is set with -Xmx200m, the second softReference.get() returns null: when the second 100M array is created, memory is still insufficient even after GC, so the object in the soft reference is reclaimed, which frees enough memory for the new 100M array.

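import java.lang.ref.SoftReference;

public class SoftReferenceDemo {
    public static void main(String[] args) {
        byte[] bytes = new byte[1024 * 1024 * 100];
        SoftReference<byte[]> softReference = new SoftReference<byte[]>(bytes);
        bytes = null; // drop the strong reference; only the soft reference remains
        System.out.println(softReference.get()); // the array is still there

        byte[] bytes2 = new byte[1024 * 1024 * 100]; // with -Xmx200m this forces the soft reference to be cleared
        System.out.println(softReference.get()); // null
    }
}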

Once the object wrapped by a soft reference has been reclaimed due to insufficient memory, the SoftReference object itself is no longer useful and should be reclaimed as well. SoftReference provides a queue mechanism for this:

  1. When creating a soft reference, it passes a reference queue through the constructor.
  2. When the object wrapped in the soft reference is reclaimed, that soft reference object will be placed in the reference queue.
  3. By traversing the reference queue in code, the strong reference to the SoftReference can be removed.

image.png|500

In the code below, the SoftReference objects are kept alive by strong references in an ArrayList, and each one is created with a ReferenceQueue. When the object wrapped in a soft reference is reclaimed, the SoftReference itself is placed into that queue; polling the queue finds these SoftReference objects so that their strong references can be removed, making them reclaimable.

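ArrayList<SoftReference<byte[]>> softReferences = new ArrayList<>();
ReferenceQueue<byte[]> queues = new ReferenceQueue<byte[]>();
for (int i = 0; i < 10; i++) {
	byte[] bytes = new byte[1024 * 1024 * 100];
	// register each soft reference with the queue
	SoftReference<byte[]> studentRef = new SoftReference<byte[]>(bytes, queues);
	softReferences.add(studentRef);
}

SoftReference<byte[]> ref = null;
int count = 0;
// soft references whose arrays were reclaimed have been placed in the queue
while ((ref = (SoftReference<byte[]>) queues.poll()) != null) {
	softReferences.remove(ref); // drop the strong reference so the SoftReference can be reclaimed
	count++;
}
System.out.println(count);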

You can extend SoftReference, storing _key in the constructor to clean up the key in the HashMap when the soft reference object is reclaimed.

image.png|500

private void cleanCache() {
	StudentRef ref = null;
	while ((ref = (StudentRef) q.poll()) != null) {
		StudentRefs.remove(ref._key);
	}
}
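The cleanCache method above belongs to a cache class that is not shown in full. A self-contained sketch of such a cache, assuming a simple Student type (Student, StudentCache and the field names are hypothetical). It uses Map.remove(key, value) so that a stale StudentRef cannot remove a newer entry stored under the same key.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class Student {
    final int id;
    final String name;
    Student(int id, String name) { this.id = id; this.name = name; }
}

// Soft-reference cache: values may be reclaimed under memory pressure, and the
// map entries of reclaimed values are cleaned up via the reference queue.
class StudentCache {
    // Remembers its map key so the entry can be found after the Student is gone.
    private static class StudentRef extends SoftReference<Student> {
        final Integer _key;
        StudentRef(Student s, ReferenceQueue<Student> q) {
            super(s, q);      // enqueued in q after s has been reclaimed
            this._key = s.id;
        }
    }

    private final Map<Integer, StudentRef> studentRefs = new HashMap<>();
    private final ReferenceQueue<Student> q = new ReferenceQueue<>();

    void put(Student s) {
        cleanCache();
        studentRefs.put(s.id, new StudentRef(s, q));
    }

    Student get(int id) {
        StudentRef ref = studentRefs.get(id);
        return ref == null ? null : ref.get();   // null if absent or already reclaimed
    }

    int size() { return studentRefs.size(); }

    // Removes entries whose Student was reclaimed, so the StudentRef objects
    // themselves become unreachable and can be collected too.
    private void cleanCache() {
        StudentRef ref;
        while ((ref = (StudentRef) q.poll()) != null) {
            studentRefs.remove(ref._key, ref);   // only if the entry still holds this ref
        }
    }
}
```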
Weak Reference#

Weak references are similar to soft references, with one difference: an object reachable only through weak references is reclaimed at the next garbage collection, regardless of whether memory is sufficient. The implementation class is WeakReference, mainly used in ThreadLocal. Weak references also support a reference queue: when the wrapped data is reclaimed, the weak reference is placed into the queue.

image.png|500

After a manual GC, the data wrapped by the weak reference is reclaimed, so the second get() returns null.

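import java.lang.ref.WeakReference;

public class WeakReferenceDemo {
    public static void main(String[] args) {
        byte[] bytes = new byte[1024 * 1024 * 100];
        WeakReference<byte[]> weakReference = new WeakReference<byte[]>(bytes);
        bytes = null; // only the weak reference remains
        System.out.println(weakReference.get()); // the array is still there

        System.gc(); // a request; with HotSpot's default settings it normally runs a full GC

        System.out.println(weakReference.get()); // null once the array has been reclaimed
    }
}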
Phantom Reference and Finalizer Reference#

These two types of references are generally not used in regular development.

Phantom references, also known as ghost references, cannot be used to access the referenced object. Their only purpose is to provide a notification when the object is reclaimed by the garbage collector. Java implements them with PhantomReference; the JDK uses it for direct memory, to learn when a direct memory object is no longer in use so that its memory can be released.

Finalizer references: when an object that needs to be reclaimed has a finalize method, it is first placed in the reference queue of the Finalizer class and later taken out by the FinalizerThread, which executes the object's finalize method. During this process the object could be re-associated with a strong reference, but this is not recommended: if finalize takes too long, it can delay the reclamation of other objects.
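A short sketch of the phantom reference behavior described above: get() always returns null, and the only signal is the reference showing up in its ReferenceQueue. Whether that notification arrives within the one-second wait depends on System.gc() actually running a collection, which is not guaranteed. The class name PhantomDemo is hypothetical.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomDemo {
    // get() on a phantom reference returns null even while the object is alive.
    static Object peek() {
        Object data = new Object();
        PhantomReference<Object> phantom = new PhantomReference<Object>(data, new ReferenceQueue<Object>());
        return phantom.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object data = new Object();
        PhantomReference<Object> phantom = new PhantomReference<>(data, queue);
        System.out.println(phantom.get());         // null

        data = null;                               // drop the only strong reference
        System.gc();                               // a request; collection is not guaranteed
        Reference<?> ref = queue.remove(1000);     // wait up to 1 s for the notification
        System.out.println(ref == phantom ? "object reclaimed: notification received"
                                          : "no notification within 1 s");
    }
}
```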

Garbage Collection Algorithms#

Introduction#

For garbage collection, there are only two steps:

  1. Identify living objects in memory.
  2. Release the memory of non-living objects, allowing the program to reuse this space.

In 1960, John McCarthy published the first GC algorithm: the mark-and-sweep algorithm.
In 1963, Marvin L. Minsky published the copying algorithm.
Subsequent garbage collection algorithms, such as mark-and-compact and generational GC, are optimizations based on these two algorithms.

Standard Evaluation#

The Java garbage collection process is completed by a separate GC thread. However, regardless of which GC algorithm is used, there will be stages that require stopping all user threads, a process known as Stop The World (STW). If the STW time is too long, it will affect user experience.

User code execution and garbage collection execution alternate, causing user threads to stop during STW. The evaluation of whether a GC algorithm is excellent is based on three aspects:

  1. Throughput: Throughput refers to the ratio of CPU time spent executing user code to the total CPU execution time, i.e., throughput = execution time of user code / (execution time of user code + GC time). The higher the throughput value, the more efficient the garbage collection.
  2. Maximum Pause Time: The maximum pause time refers to the maximum value of all STW times during the garbage collection process.
  3. Heap Utilization Efficiency: Different garbage collection algorithms use heap memory differently. For example, the mark-and-sweep algorithm can use the entire heap memory, while the copying algorithm divides the heap memory in half, using only one half at a time. From the perspective of heap utilization efficiency, the mark-and-sweep algorithm is superior to the copying algorithm.
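The first two criteria are simple to compute from timings. A sketch with hypothetical numbers (99 s of user code and 1 s of GC, split into three pauses); GcMetrics is a hypothetical name.

```java
// Computes throughput and maximum pause time from (hypothetical) timings.
class GcMetrics {
    // throughput = user code time / (user code time + GC time)
    static double throughput(long userMs, long gcMs) {
        return (double) userMs / (userMs + gcMs);
    }

    // maximum pause time = the longest single STW pause
    static long maxPause(long[] pausesMs) {
        long max = 0;
        for (long p : pausesMs) max = Math.max(max, p);
        return max;
    }

    public static void main(String[] args) {
        System.out.println(throughput(99_000, 1_000));            // 0.99
        System.out.println(maxPause(new long[] {200, 500, 300})); // 500
    }
}
```

This gives a throughput of 99% and a maximum pause time of 500 ms; spreading the same 1 s of GC over more, shorter pauses would lower the maximum pause without changing throughput.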

In general, the larger the heap, the longer the maximum pause time, and reducing the maximum pause time usually lowers throughput. No algorithm can optimize heap utilization efficiency, throughput, and maximum pause time all at once.

Mark-and-Sweep Algorithm#

The mark-and-sweep algorithm consists of two core phases:

  1. Marking Phase: Marks all living objects. In Java, the reachability analysis algorithm is used to traverse all living objects starting from GC Roots.
  2. Sweeping Phase: Deletes unmarked (non-living) objects from memory.

For example, if object D is unmarked, it will be cleared.
image.png|500

Advantages: Simple implementation, requiring only a flag for each object in the first phase and deleting objects in the second phase.
Disadvantages:

  1. Fragmentation Problem: Memory is allocated contiguously, but the objects that get deleted are not necessarily adjacent. After reclamation there may be many small free blocks that are too small to be used. For example, 9 bytes may be reclaimed in total, yet an object of only 5 bytes still cannot be allocated because no single free block is large enough.
  2. Slow Allocation Speed: A free list must be maintained to record the memory fragments, and traversing it to find a fragment of suitable size can take a long time.
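A toy model of the two phases on a hypothetical heap of five 3-byte objects, A to E, where a GC Root references A, A references C, and C references E. Sweeping frees B and D but leaves their holes where they were, which reproduces the fragmentation problem in small form: 6 bytes are free, yet no 5-byte object fits. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy mark-and-sweep: objects keep their addresses, so freed space is scattered.
class MarkSweep {
    static final class Obj {
        final String name;
        final int addr, size;                     // position and size in the toy heap
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name, int addr, int size) { this.name = name; this.addr = addr; this.size = size; }
    }

    final List<Obj> heap = new ArrayList<>();       // objects in address order
    final List<int[]> freeList = new ArrayList<>(); // free blocks as {addr, size}

    // Marking phase: depth-first traversal starting from a root.
    void mark(Obj o) {
        if (o.marked) return;
        o.marked = true;
        for (Obj r : o.refs) mark(r);
    }

    // Sweeping phase: unmarked objects are deleted; their space goes on the free list.
    void sweep() {
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;                    // reset for the next GC
            } else {
                freeList.add(new int[] {o.addr, o.size});
                it.remove();
            }
        }
    }

    int totalFree()   { int t = 0; for (int[] b : freeList) t += b[1]; return t; }
    int largestFree() { int m = 0; for (int[] b : freeList) m = Math.max(m, b[1]); return m; }

    static MarkSweep demo() {
        MarkSweep gc = new MarkSweep();
        String[] names = {"A", "B", "C", "D", "E"};
        for (int i = 0; i < names.length; i++) gc.heap.add(new Obj(names[i], i * 3, 3));
        gc.heap.get(0).refs.add(gc.heap.get(2));   // A -> C
        gc.heap.get(2).refs.add(gc.heap.get(4));   // C -> E
        gc.mark(gc.heap.get(0));                   // A is referenced by a GC Root
        gc.sweep();
        return gc;
    }

    public static void main(String[] args) {
        MarkSweep gc = demo();
        // two separate 3-byte holes where B and D used to be
        System.out.println(gc.totalFree() + " bytes free, largest block " + gc.largestFree());
    }
}
```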
Copying Algorithm#

The copying algorithm's core idea is:

  1. Prepare two spaces: From space and To space. During object allocation, only the From space is used.
  2. During garbage collection (GC), living objects from the From space are copied to the To space.
  3. The names of the two spaces are swapped, ensuring that the From space is always the allocated and used space.

image.png|500

image.png|500
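A toy model of the copying idea, with two Java lists standing in for the From and To halves (class and method names are hypothetical). "Copying" here just means appending the object to To; a real collector copies the bytes and updates every reference to the new address.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy copying collector: live objects go from From to To, then the names swap.
class CopyingGc {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> from = new ArrayList<>();  // space used for allocation
    List<Obj> to = new ArrayList<>();    // stays empty between collections

    Obj allocate(String name) {
        Obj o = new Obj(name);
        from.add(o);
        return o;
    }

    void gc(List<Obj> roots) {
        Set<Obj> copied = new HashSet<>();        // Obj uses identity equals/hashCode
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {              // visits live objects only, once each
            Obj o = pending.poll();
            if (copied.add(o)) {
                to.add(o);                        // "copy" into To, packed one after another
                pending.addAll(o.refs);
            }
        }
        from.clear();                             // whatever is left in From is garbage
        List<Obj> tmp = from;                     // swap: From is always the space in use
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        CopyingGc gc = new CopyingGc();
        Obj a = gc.allocate("A");
        gc.allocate("B");                         // never referenced: garbage
        Obj c = gc.allocate("C");
        a.refs.add(c);                            // A -> C
        List<Obj> roots = new ArrayList<>();
        roots.add(a);                             // A is referenced by a GC Root
        gc.gc(roots);
        for (Obj o : gc.from) System.out.print(o.name + " "); // A C
    }
}
```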

Advantages:

  • High throughput: it only needs to traverse the living objects once and copy them to the To space. That is one traversal fewer than the mark-and-compact algorithm, so performance is good, but it is less efficient than the mark-and-sweep algorithm, which does not need to move objects at all.
  • No fragmentation: Objects are placed in order.

Disadvantages:

  • Only half of the memory space can be used at a time.
Mark-and-Compact Algorithm#

The mark-and-compact algorithm, also known as the mark-compression algorithm, solves the fragmentation problem that arises from the mark-and-sweep algorithm.

The mark-and-compact algorithm's core idea is:

  1. Marking Phase: Marks all living objects, using the reachability analysis algorithm from GC Roots.
  2. Compaction Phase: Moves living objects to one end of the heap and clears the memory space of non-living objects.

image.png|500
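A toy sketch of the compaction phase, assuming the marking phase has already set a live flag on each object (names hypothetical). Live objects slide toward address 0 in their original order, so all free space ends up as one contiguous block. A real collector, such as one using the Lisp2 approach, must also copy the data and update every reference, which is where its extra passes over the heap come from.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy sliding compaction over a list of objects kept in address order.
class MarkCompact {
    static final class Obj {
        final String name;
        int addr;                 // updated when the object is moved
        final int size;
        final boolean live;       // assumed to be set by the marking phase
        Obj(String name, int addr, int size, boolean live) {
            this.name = name; this.addr = addr; this.size = size; this.live = live;
        }
    }

    // Slides live objects toward address 0, keeping their order.
    // Returns the start address of the single free block that remains.
    static int compact(List<Obj> heap) {
        int next = 0;
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.live) {
                o.addr = next;    // a real collector also copies the bytes and fixes references
                next += o.size;
            } else {
                it.remove();      // space of non-living objects is reclaimed
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<Obj> heap = new ArrayList<>();
        String[] names = {"A", "B", "C", "D", "E"};
        for (int i = 0; i < names.length; i++) {
            heap.add(new Obj(names[i], i * 3, 3, i % 2 == 0)); // A, C, E are live
        }
        int top = compact(heap);
        for (Obj o : heap) System.out.print(o.name + "@" + o.addr + " "); // A@0 C@3 E@6
        System.out.println("| free block: " + top + " to 15");
    }
}
```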

Advantages:

  • High memory utilization efficiency: The entire heap memory can be used, unlike the copying algorithm, which can only use half of the heap memory.
  • No fragmentation: In the compaction phase, objects can be moved to one side of memory, leaving the remaining space as valid space for allocating objects.

Disadvantages:

  • The efficiency of the compaction phase is not high. There are many types of compaction algorithms, such as the Lisp2 compaction algorithm, which requires searching through the entire heap for objects three times, resulting in poor overall performance.
Generational Garbage Collection Algorithm#

The generational garbage collection algorithm combines the ideas of the above algorithms, dividing the entire memory area into young and old generations.

image.png|500

In JDK 8, you can use the JVM parameter -XX:+UseSerialGC to run the program with generational GC. You can view the three areas using the memory command in Arthas.

Eden space + Survivor space form the young generation; tenured_gen refers to the promotion area, i.e., the old generation.

Related JVM Parameters:

| Parameter | Meaning | Example |
| --- | --- | --- |
| -Xms | Sets the minimum and initial size of the heap; must be a multiple of 1024 and greater than 1MB | Initial size 6MB: -Xms6291456, -Xms6144k, -Xms6m |
| -Xmx | Sets the maximum size of the heap; must be a multiple of 1024 and greater than 2MB | Maximum heap 80MB: -Xmx83886080, -Xmx81920k, -Xmx80m |
| -Xmn | Size of the young generation | Young generation 256MB: -Xmn256m, -Xmn262144k, -Xmn268435456 |
| -XX:SurvivorRatio | Ratio of Eden to one Survivor space, default 8 (with 1g of young generation memory: 800MB for Eden, 100MB each for S0 and S1) | To set the ratio to 4: -XX:SurvivorRatio=4 |
| -XX:+PrintGCDetails, -verbose:gc | Prints GC logs | None |

In the memory output, heap shows the usable heap size; since only one survivor space can be used at a time, only one survivor space is counted.

Note that the SurvivorRatio ratio means eden:s0:s1 = SurvivorRatio:1:1.
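The SurvivorRatio arithmetic can be checked with a small helper. SurvivorSizes.split is a hypothetical name, and the sketch ignores the alignment and rounding HotSpot applies to real generation sizes; it treats 1g as 1000MB, as the table does.

```java
// Splits a young generation size according to SurvivorRatio (eden:s0:s1 = ratio:1:1).
class SurvivorSizes {
    // Returns {eden, s0, s1} in the same unit as youngSize.
    static long[] split(long youngSize, int survivorRatio) {
        long survivor = youngSize / (survivorRatio + 2);  // the young gen has ratio + 2 equal parts
        long eden = youngSize - 2 * survivor;             // Eden gets the rest
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] s = split(1000, 8);  // 1000MB young generation, default ratio 8
        System.out.println(s[0] + " " + s[1] + " " + s[2]); // 800 100 100
    }
}
```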

Execution Process:

  1. In the generational garbage collection algorithm, newly created objects are first placed in the Eden area.
  2. As more objects accumulate in the Eden area, when it becomes full and cannot accommodate new objects, it triggers a young generation GC, known as Minor GC or Young GC. Minor GC will reclaim objects from Eden and the From space that need to be reclaimed, placing the remaining objects into the To space. Here, From and To refer to the two survivor areas, using the copying algorithm's idea.
    image.png|500
  3. Next, the two survivor areas, From and To, are swapped, with S1 becoming From and S0 becoming To. When the Eden area is full, Minor GC will still occur, reclaiming objects from Eden and From S1 that need to be reclaimed, placing the remaining objects into To space, which is S0. Note: Each Minor GC will record the age of objects, starting at 0, and incrementing by 1 after each GC.
    image.png|500
  4. If the age of an object reaches the threshold (maximum 15, default value depends on the garbage collector), the object will be promoted to the old generation.
    image.png|500
  5. (G1 only) In the G1 collector, which divides the heap into equal-sized Regions, an object larger than half a Region is placed directly into the old generation, in what is called a Humongous region. For example, if the heap memory is 4G and each Region is 2M, any object larger than 1M is placed in a Humongous region, and a very large object may span multiple Regions.
    image.png|500
  6. (G1 only) After multiple collections, many old generation Regions may appear. If the total heap occupancy reaches the threshold -XX:InitiatingHeapOccupancyPercent (default 45%), a mixed GC is triggered, which reclaims the young generation together with some old generation Regions using the copying algorithm.
    image.png|500
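Steps 1 to 4 can be simulated with a toy model. Everything here is hypothetical and simplified: Eden holds a fixed number of objects rather than bytes, survivor overflow is ignored, and liveness is a flag set by hand. An object that stays live survives each Minor GC, ages by 1, and moves to the old generation once its age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Toy young generation: Eden, two survivor spaces (From/To) and the old generation.
class YoungGenSim {
    static final class Obj {
        final String name;
        int age = 0;          // number of Minor GCs survived
        boolean live = true;  // set to false by hand once the object becomes garbage
        Obj(String name) { this.name = name; }
    }

    final int edenCapacity;       // Eden size, counted in objects for simplicity
    final int tenuringThreshold;  // age at which an object is promoted
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // survivor space currently in use
    List<Obj> to = new ArrayList<>();     // empty survivor space
    final List<Obj> old = new ArrayList<>();
    int minorGcCount = 0;

    YoungGenSim(int edenCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        if (eden.size() == edenCapacity) {
            minorGc();                    // step 2: Eden is full, so run a Minor GC
        }
        Obj o = new Obj(name);
        eden.add(o);                      // step 1: new objects start in Eden
        return o;
    }

    void minorGc() {
        minorGcCount++;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) continue;        // garbage is simply not copied
            o.age++;                      // survived one more GC
            if (o.age >= tenuringThreshold) {
                old.add(o);               // step 4: promote to the old generation
            } else {
                to.add(o);                // copy into the To survivor space
            }
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from;             // step 3: swap From and To
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        YoungGenSim sim = new YoungGenSim(2, 3);
        Obj keep = sim.allocate("keep");              // stays live, like a long-lived bean
        for (int i = 0; i < 20; i++) {
            sim.allocate("tmp" + i).live = false;     // short-lived objects
        }
        System.out.println(sim.old.contains(keep) + " after " + sim.minorGcCount + " Minor GCs");
    }
}
```

Real collectors size Eden in bytes and may promote early when a survivor space overflows; the model keeps only the aging and swapping logic.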
Why does the generational GC algorithm divide the heap into young and old generations?

The characteristics of objects in heap memory:

  • Most objects in the system are created and quickly become unused and can be reclaimed, such as user order data that can be released after being returned to the user.
  • The old generation stores long-lived objects, such as most Spring bean objects that will not be reclaimed after the program starts.
  • In the default settings of the virtual machine, the size of the young generation is much smaller than that of the old generation.

The reasons for dividing the heap into young and old generations in the generational GC algorithm are:

  1. It allows adjusting the ratio of the young and old generations to adapt to different types of applications, improving memory utilization and performance.
  2. The young and old generations use different garbage collection algorithms. The young generation typically uses the copying algorithm, while the old generation can use mark-and-sweep or mark-and-compact algorithms, providing flexibility for programmers.
  3. The design of generations allows for only reclaiming the young generation (Minor GC). If it meets the allocation requirements for objects, there is no need to perform a full heap reclamation (Full GC), reducing STW time.

Garbage Collectors#

Garbage collectors are the concrete implementations of garbage collection algorithms. Because most collectors handle either the young generation or the old generation, all collectors except G1 must be used in pairs.

image.png|500

Serial - Serial Old#

Use the JVM parameter -XX:+UseSerialGC to use this pair of GC.

Young Generation - Serial Garbage Collector

Serial is a single-threaded garbage collector for reclaiming the young generation.

image.png|500

Old Generation - Serial Old Garbage Collector

SerialOld is the old generation version of the Serial garbage collector, using single-threaded serial reclamation.

image.png|500

ParNew - CMS#

Young Generation - ParNew Garbage Collector

Use the JVM parameter -XX:+UseParNewGC to use ParNew GC.

ParNew is essentially an optimization of Serial for multi-CPU environments, using multi-threading for garbage collection.

image.png|500

Old Generation - CMS Concurrent Mark Sweep Garbage Collector

Use the JVM parameter -XX:+UseConcMarkSweepGC to use CMS GC.

CMS focuses on the system's pause time, allowing user threads and garbage collection threads to execute concurrently during certain steps, reducing user thread wait times.

image.png|500

CMS execution steps:

  1. Initial marking: Quickly marks objects directly associated with GC Roots in a very short time.
  2. Concurrent marking: Marks all objects without pausing user threads.
  3. Final marking: Due to changes in the concurrent marking phase, some objects may have been incorrectly marked or missed, requiring re-marking.
  4. Concurrent cleanup: Cleans up dead objects without pausing user threads.

Disadvantages:

  1. CMS uses the mark-and-sweep algorithm, which leaves significant memory fragmentation after garbage collection. To deal with this, CMS compacts memory during Full GC, which pauses user threads; the parameter -XX:CMSFullGCsBeforeCompaction=N (default 0) makes CMS compact only after N Full GCs.
  2. It cannot handle "floating garbage": objects that become garbage while concurrent cleanup is running are not reclaimed in the current cycle, so collection is never complete.
  3. If the old generation runs out of memory and cannot allocate objects, CMS degrades to Serial Old, which reclaims the old generation with a single thread.
Parallel Scavenge - Parallel Old#

Young Generation - Parallel Scavenge Garbage Collector

Parallel Scavenge is the default young generation garbage collector in JDK 8, using multi-threaded parallel reclamation and focusing on system throughput, automatically adjusting heap memory size.

image.png|500

Old Generation - Parallel Old Garbage Collector

You can use the JVM parameters -XX:+UseParallelGC or -XX:+UseParallelOldGC to use the combination of Parallel Scavenge + Parallel Old.

Parallel Old is the old generation version of the Parallel Scavenge garbage collector, using multi-threaded parallel reclamation.

image.png|500

G1 Garbage Collector#

The default garbage collector after JDK 9 is the G1 (Garbage First) garbage collector.

Parallel Scavenge focuses on throughput and allows users to set maximum pause times, but it reduces the available space in the young generation. CMS focuses on pause times but decreases throughput.

G1 aims to combine the advantages of the above two garbage collectors:

  1. Supports reclaiming large heap spaces with high throughput.
  2. Supports multi-CPU parallel garbage collection.
  3. Allows users to set maximum pause times.

Before G1, young and old generations were generally continuous.

image.png|500

G1 divides the entire heap into multiple equal-sized regions, referred to as Regions. The regions do not need to be continuous and are divided into Eden, Survivor, and Old areas. The size of a Region is calculated as heap space size / 2048, and it can also be specified using the JVM parameter `-
