Exploring the InliningPhase of Graal compiler

Before we dive deeper, let’s cover some basic concepts first.

The Java compiler is crucial for converting Java code into machine code. It generally operates in two modes:

JIT (Just-in-Time), which compiles code as it’s needed during execution.
AOT (Ahead-of-Time), which compiles all the code before execution starts.

The Java compiler in the JDK is called HotSpot. It includes three types:

C0: The simplest form, a bytecode interpreter.
C1: The client compiler, it compiles quickly but with less optimization.
C2: The server compiler, which optimizes thoroughly but compiles slowly.

When a program runs, it starts with C0 interpreting the bytecode. Then, to speed up execution, C1 quickly compiles the code. For code that’s run often, C2 takes over, producing highly optimized machine code.

C2 performs aggressive optimizations by analyzing execution data, such as determining not to compile functions that are not used from begining of execution. However, these aggressive optimizations can sometimes lead to issues. For instance, if C2 chooses not to compile a function but it’s needed later, the VM has to rely on machine code generated by C1 or revert to using C0 to interpret it. This stage is called deoptimzation, which can result in performance drawbacks.

If a piece of code, usually a function, is executed a certain number of times, it’s recognized as a HotSpot and then compiled into machine code.

Graal compiler written in Java

The Java compiler in the JDK is written in C++, which can be challenging to read for some programmers(myself at the moment). Additionally, C++ can lead to memory and security issues, such as stack overflows and memory leaks.

The good news is there’s a Java compiler called Graal, which is written in Java itself. You can use the Graal compiler to replace the one inside the JDK, provided that the JDK version supports JVMCI (JVM Compiler Interface). Any JIT compiler that implements this interface can be used by the JVM.

GraalVM is a JVM developed by Oracle that uses Graal as its JIT compiler. It supports not only Java but also JavaScript, Ruby, R, and Python through a framework called Truffle layered on top of Graal. Additionally, it supports C, C++, and Rust by adding another layer on top of Truffle called Sulong.
GraalVM supports Ahead-of-Time (AOT) compilation. You can compile your project into a native image using GraalVM, which can significantly reduce startup time.

Understanding the Graal Compiler by building, modifying, and debugging it

Clone graal source code from Github

1	`git clone git@github.com:oracle/graal.git`

Clone mx, the command-line tool used for the development of Graal projects

1	`git clone git@github.com:graalvm/mx.git`

Set mx in $PATH

1	`export PATH="/path/to/mx:$PATH"`

Install JDK supports JVMCI by using mx

1	`mx fetch-jdk`

Set JAVA_HOME in $PATH

1	`export JAVA_HOME=/path/to/.mx/jdks/labsjdk-ce-21-jvmci-23.1-b33/Contents/Home`

Move to graal compiler and compile

1 2	`cd graal/compiler/ mx build`

Before using Graal to compile and debug code, we need to configure the IDE. This article covers the setup for IntelliJ. For instructions on configuring other IDEs, you can refer to this document.

For configuring IntelliJ, install the Python plugin and then run the following command:

1	`mx intellijinit`

Now, we can use IntelliJ to open the graal/compiler project and proceed with further steps.

Let’s dive into the source code to understand what happens when you use Graal to compile. When a method is recognized as a HotSpot, the VM calls compileMethod (Just-In-Time compilation) to compile it. This compileMethod is located in the jdk.graal.compiler.hotspot package.

Add a prinln function to see if it will be called.

public CompilationRequestResult compileMethod(CompilationRequest request) {
    System.out.println("Begin to compile method: " + request.getMethod().getName() + "\nbytecode: " + java.util.Arrays.toString(request.getMethod().getCode()));
    return compileMethod(this, request);
}

We can write a simple Java program to test the Graal compiler. In the code below, we create an add method and call it repeatedly in a for-loop, which will turn it into a HotSpot method. Graal will then compile this method into machine code.

// Foo.java
public class Foo{
    public static void main(String args[]){
        int i = 0;
        while(true){
           if(i%1000==0){
                System.out.println(i);
                try{
                    Thread.sleep(100);
                }catch(Exception e){}
            }
            i++;
            add(i,i+1);   
        }
    }


    public static int add(int x, int y){
        return x + y;
    }
}

Using Graal to compile Java code:

javac Foo.java
mx vm \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+EnableJVMCI \
  -XX:+UseJVMCICompiler \
  -XX:-TieredCompilation \
  -XX:CompileOnly=Foo.add \
Foo

vm: the first time you run the command mx vm, it will copy the Java HotSpot VM from the JDK to Graal and create a new GraalVM instance
-XX:+UnlockExperimentalVMOptions: Enale experimental features
-XX:+EnableJVMCI: Enable JVMCI
-XX:+UseJVMCICompiler: Using Graal to compile
-XX:-TieredCompilation: Disable tiered compilation, we will talk about it later
-XX:CompileOnly=Foo.add: Only compiles the add method in Foo class

When the add method has been executed many times, it will be recognized as a HotSpot and compiled. You can use the -XX:CompileThreshold option to control the number of executions after which a method should be compiled.

-d option in mx allows you to debug the compilation process:

mx -d vm \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+EnableJVMCI \
  -XX:+UseJVMCICompiler \
  -XX:-TieredCompilation \
  -XX:CompileOnly=Foo.add \
Foo

After executing the above command, set a breakpoint in IntelliJ IDEA and click the debug button to start debugging.

We can now debug the compiler and analyze its behavior. :)

Sea of Nodes

I recommend reading Graal IR: An Extensible Declarative Intermediate Representation and Understanding Basic Graal Graphs, as I only provide a summary of them in this blog.

Graal converts bytecode into a graph Intermediate Representation (IR). It bridges the huge semantic gap between high-level languages and machine languages. We can use IGV or seafoam to display the generated IR graph.

Recompile Foo.java with two new parameters: -Dgraal.Dump to dump IR and -Xcomp to compile every function the compiler encounters for the first time.

mx vm \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+EnableJVMCI \
  -XX:+UseJVMCICompiler \
  -XX:-TieredCompilation \
  -XX:CompileOnly=Foo.add \
  -Dgraal.Dump \
  -Xcomp \
Foo

Then we use seafoam to generate a graph:

1	`seafoam graal_dumps/2024.06.01.12.15.39.071/HotSpotCompilation-115\[Foo.add$int,_int$int\].bgv:0 render`

A Graal graph has nodes(instructions) and edges(data-flow or control-flow)

A green data-flow edge measn that the result of a node(instruction) is used as input to another node.
A red control-flow edge means that control passes from one node, which is executed first, to another, which is executed next.
P(0) the first parameter
P(1) the second parameter
x, y addition labels have nothing to do with source code
The biggest advantage of Sea of Nodes is that it can reflect control flow and data flow simultaneously with a data structure, and minimize their interdependence as much as possible.

When using idealgraphvisualizer, I encountered the issue of java.lang.NullPointerException: Cannot invoke "org.graalvm.visualizer.script.UserScriptEngine.prepare(). As a result, I had to switch to a different JDK.

1	`mx -p graal/compiler igv --jdkhome /Users/badbubble/Library/Java/JavaVirtualMachines/corretto-18.0.2/Contents/Home`

idealGraphVisualizer

When you run idealGraphVisualizer, it will listen on port 4445 for messages from GraalVM. Therefore, you need to compile the program by running mx vm after starting idealGraphVisualizer.

Data-flow

In the Intermediate Representation (IR) below, blue edges indicate the flow of data. Parameter 0 and parameter 1 flow to the + node before continuing to the Return node.
data-flow

The Add node has two inputs, which are x and y, these two inputs are the two attributes of the AddNode. Note that the direction of the arrow in this diagram represents data dependency, that is, the Add node maintains references to its two input nodes, which is actually consistent with AST. The flow of data, on the other hand, is reversed from x and y flowing towards the Add node.

class AddNode extends Node { 
    @Input Node left;
    @Input Node right;
}

add node

Control-flow

control-flow
Red edges indicate control flow in the graph. You can navigate through the graph starting from the Start node following the control flow. To reach the Return node, you need to calculate the data (x+y).

FloatingNode and FixedNode

The main advantage of Sea of Nods is that it allows you to use a single data structure to represent both control-flow and data-flow, while minimizing their interdependence. For example, in the following loop function we can remove int b = x * 2 from this for-loop since it does not use variable i or the for loop itself. In other representations, it might be necessary to analyze that int b is not related to the for-loop. However, in Sea of Nodes, int b does not belong to any blocks, nodes with flexibility in execution time, we say they are floating.

public static int loop(int x) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        // here
        int b = x * 2;
        sum += b;
    }
}

In addition to floating nodes, there are also some nodes that are fixed in the control flow and cannot be out of order. These nodes are called fixed nodes. Apart from those process control type of nodes (such as IfNode).

FrameState

It saved the state of the stack frame.

deoptimization: C2 aggressively optimizes code and may require deoptimization during execution.
debug

the processing steps of Graal IR

By adding -Dgraal.Dump=:5, GraalVM will dump every processing steps, we can see phase inlining in it:

mx vm \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+EnableJVMCI \
  -XX:+UseJVMCICompiler \
  -XX:-TieredCompilation \
  -XX:CompileOnly=Foo.add \
  -Dgraal.Dump=:5 \
  -Xcomp \
Foo

processing-step

Inlining

Inline optimization is a very important optimization strategy for the Java JIT compiler. Simply put, inlining is to expand the body of the called method at the place of call. The biggest advantage of doing this is to save the overhead of function calls. For frequently called functions, inline optimization can greatly improve program performance.

Inline optimization has certain conditions:
- the method to be inlined must be a hotspot method
- the method to be inlined cannot be too large

There are many getter and setter methods in a program. If we do not optimize them inline, it could affect the program’s performance.

We write the test code and build it using the -XX:-Inline parameter to disable inlining:

public class Foo{
    private static int X = 0;
    private static int Y = 0;

    public static void main(String args[]){
        int i = 0;
        while(true){
            if(i%1000==0){
                System.out.println(i);
                try{
                    Thread.sleep(100);
                }catch(Exception e){}
            }
            i++;
            add2();
        }
    }

    public static int getX(){
        return X;
    }

    public static int getY(){
        return Y;
    }

    public static int add2(){
        return getX() + getY();
    }
    
}

mx vm \
  -XX:+UnlockExperimentalVMOptions \
  -XX:+EnableJVMCI \
  -XX:+UseJVMCICompiler \
  -XX:-TieredCompilation \
  -XX:CompileOnly=Foo.add2 \
  -XX:-Inline \
  -Dgraal.Dump=:5 \
  -Xcomp \
Foo

before inlining

then we remove -XX:-Inline to enable inlining:
after inlining

InliningPolicy

public interface InliningPolicy {
    final class Decision {
        public static final Decision YES = new Decision(true, "(unknown reason)");
        public static final Decision NO = new Decision(false, "(unknown reason)");

        private final boolean shouldInline;
        private final String reason;

        private Decision(boolean shouldInline, String reason) {
            this.shouldInline = shouldInline;
            this.reason = reason;
        }

        public boolean shouldInline() {
            return shouldInline;
        }

        public String getReason() {
            return reason;
        }

        /**
         * This constructor avoids the need to box arguments when the message is a simple string.
         */
        public Decision withReason(boolean isTracing, String newReason) {
            if (isTracing) {
                return new Decision(shouldInline, newReason);
            } else {
                return this;
            }
        }

        public Decision withReason(boolean isTracing, String newReason, Object... args) {
            if (isTracing) {
                return new Decision(shouldInline, String.format(newReason, args));
            } else {
                return this;
            }
        }
    }

    boolean continueInlining(StructuredGraph graph);

    Decision isWorthInlining(Replacements replacements, MethodInvocation invocation, InlineInfo calleeInfo, int inliningDepth, boolean fullyProcessed);
}

GreedyInliningPolicy

InlineEverythingPolicy

AbstractInliningPolicy

InlineMethodSubstitutionsPolicy

References

Exploring the InliningPhase of Graal compiler

http://badbub.com/2024/05/08/explore_inline/

Author

BadBubble

Posted on

May 8, 2024

Licensed under