Exploring the InliningPhase of Graal compiler
Before we dive deeper, let’s cover some basic concepts first.
The Java compiler is crucial for converting Java code into machine code. It generally operates in two modes:
- JIT (Just-in-Time), which compiles code as it’s needed during execution.
- AOT (Ahead-of-Time), which compiles all the code before execution starts.
The Java compiler in the JDK is called HotSpot. It includes three types:
- C0: The simplest form, a bytecode interpreter.
- C1: The client compiler, it compiles quickly but with less optimization.
- C2: The server compiler, which optimizes thoroughly but compiles slowly.
When a program runs, it starts with C0 interpreting the bytecode. Then, to speed up execution, C1 quickly compiles the code. For code that’s run often, C2 takes over, producing highly optimized machine code.
C2 performs aggressive optimizations by analyzing execution data, such as determining not to compile functions that are not used from begining of execution. However, these aggressive optimizations can sometimes lead to issues. For instance, if C2 chooses not to compile a function but it’s needed later, the VM has to rely on machine code generated by C1 or revert to using C0 to interpret it. This stage is called deoptimzation, which can result in performance drawbacks.
If a piece of code, usually a function, is executed a certain number of times, it’s recognized as a HotSpot and then compiled into machine code.
Graal compiler written in Java
The Java compiler in the JDK is written in C++, which can be challenging to read for some programmers(myself at the moment). Additionally, C++ can lead to memory and security issues, such as stack overflows and memory leaks.
The good news is there’s a Java compiler called Graal, which is written in Java itself. You can use the Graal compiler to replace the one inside the JDK, provided that the JDK version supports JVMCI (JVM Compiler Interface). Any JIT compiler that implements this interface can be used by the JVM.
GraalVM is a JVM developed by Oracle that uses Graal as its JIT compiler. It supports not only Java but also JavaScript, Ruby, R, and Python through a framework called Truffle layered on top of Graal. Additionally, it supports C, C++, and Rust by adding another layer on top of Truffle called Sulong.
GraalVM supports Ahead-of-Time (AOT) compilation. You can compile your project into a native image using GraalVM, which can significantly reduce startup time.
Understanding the Graal Compiler by building, modifying, and debugging it
Clone graal source code from Github
1 |
|
Clone mx, the command-line tool used for the development of Graal projects
1 |
|
Set mx in $PATH
1 |
|
Install JDK supports JVMCI by using mx
1 |
|
Set JAVA_HOME in $PATH
1 |
|
Move to graal compiler and compile
1 |
|
Before using Graal to compile and debug code, we need to configure the IDE. This article covers the setup for IntelliJ. For instructions on configuring other IDEs, you can refer to this document.
For configuring IntelliJ, install the Python plugin and then run the following command:
1 |
|
Now, we can use IntelliJ to open the graal/compiler project and proceed with further steps.
Let’s dive into the source code to understand what happens when you use Graal to compile. When a method is recognized as a HotSpot, the VM calls compileMethod (Just-In-Time compilation) to compile it. This compileMethod is located in the jdk.graal.compiler.hotspot package
.
Add a prinln function to see if it will be called.
1 |
|
We can write a simple Java program to test the Graal compiler. In the code below, we create an add method and call it repeatedly in a for-loop, which will turn it into a HotSpot method. Graal will then compile this method into machine code.
1 |
|
Using Graal to compile Java code:
1 |
|
- vm: the first time you run the command mx vm, it will copy the Java HotSpot VM from the JDK to Graal and create a new GraalVM instance
- -XX:+UnlockExperimentalVMOptions: Enale experimental features
- -XX:+EnableJVMCI: Enable JVMCI
- -XX:+UseJVMCICompiler: Using Graal to compile
- -XX:-TieredCompilation: Disable tiered compilation, we will talk about it later
- -XX:CompileOnly=Foo.add: Only compiles the add method in Foo class
When the add method has been executed many times, it will be recognized as a HotSpot and compiled. You can use the -XX:CompileThreshold option to control the number of executions after which a method should be compiled.
-d
option in mx
allows you to debug the compilation process:
1 |
|
After executing the above command, set a breakpoint in IntelliJ IDEA and click the debug button to start debugging.
We can now debug the compiler and analyze its behavior. :)
Sea of Nodes
I recommend reading Graal IR: An Extensible Declarative Intermediate Representation and Understanding Basic Graal Graphs, as I only provide a summary of them in this blog.
Graal converts bytecode into a graph Intermediate Representation (IR). It bridges the huge semantic gap between high-level languages and machine languages. We can use IGV or seafoam to display the generated IR graph.
Recompile Foo.java
with two new parameters: -Dgraal.Dump
to dump IR and -Xcomp
to compile every function the compiler encounters for the first time.
1 |
|
Then we use seafoam
to generate a graph:
1 |
|
A Graal graph has nodes(instructions) and edges(data-flow or control-flow)
- A green data-flow edge measn that the result of a node(instruction) is used as input to another node.
- A red control-flow edge means that control passes from one node, which is executed first, to another, which is executed next.
- P(0) the first parameter
- P(1) the second parameter
- x, y addition labels have nothing to do with source code
The biggest advantage of Sea of Nodes is that it can reflect control flow and data flow simultaneously with a data structure, and minimize their interdependence as much as possible.
When using idealgraphvisualizer
, I encountered the issue of java.lang.NullPointerException: Cannot invoke "org.graalvm.visualizer.script.UserScriptEngine.prepare()
. As a result, I had to switch to a different JDK.
1 |
|
When you run idealGraphVisualizer
, it will listen on port 4445 for messages from GraalVM
. Therefore, you need to compile the program by running mx vm
after starting idealGraphVisualizer
.
Data-flow
In the Intermediate Representation (IR) below, blue edges indicate the flow of data. Parameter 0 and parameter 1 flow to the +
node before continuing to the Return
node.
The Add node has two inputs, which are x and y, these two inputs are the two attributes of the AddNode. Note that the direction of the arrow in this diagram represents data dependency, that is, the Add node maintains references to its two input nodes, which is actually consistent with AST. The flow of data, on the other hand, is reversed from x and y flowing towards the Add node.
1 |
|
Control-flow
Red edges indicate control flow in the graph. You can navigate through the graph starting from the Start node following the control flow. To reach the Return node, you need to calculate the data (x+y).
FloatingNode and FixedNode
The main advantage of Sea of Nods is that it allows you to use a single data structure to represent both control-flow and data-flow, while minimizing their interdependence. For example, in the following loop
function we can remove int b = x * 2
from this for-loop since it does not use variable i or the for loop itself. In other representations, it might be necessary to analyze that int b
is not related to the for-loop. However, in Sea of Nodes
, int b
does not belong to any blocks, nodes with flexibility in execution time, we say they are floating.
1 |
|
In addition to floating nodes, there are also some nodes that are fixed in the control flow and cannot be out of order. These nodes are called fixed nodes. Apart from those process control type of nodes (such as IfNode).
FrameState
It saved the state of the stack frame.
- deoptimization: C2 aggressively optimizes code and may require deoptimization during execution.
- debug
the processing steps of Graal IR
By adding -Dgraal.Dump=:5
, GraalVM will dump every processing steps, we can see phase inlining in it:
1 |
|
Inlining
Inline optimization is a very important optimization strategy for the Java JIT compiler. Simply put, inlining is to expand the body of the called method at the place of call. The biggest advantage of doing this is to save the overhead of function calls. For frequently called functions, inline optimization can greatly improve program performance.
- Inline optimization has certain conditions:
- the method to be inlined must be a hotspot method
- the method to be inlined cannot be too large
There are many getter and setter methods in a program. If we do not optimize them inline, it could affect the program’s performance.
We write the test code and build it using the -XX:-Inline
parameter to disable inlining:
1 |
|
1 |
|
then we remove -XX:-Inline
to enable inlining:
InliningPolicy
1 |
|