Writing Micro-Benchmarks for Java Hotspot JVM

Thomas Wang, July 2001
last update August 2001

Abstract

In this article, we will examine some tips and pitfalls of writing micro-benchmarks for the Java Hotspot JVM.

We will present a sample benchmarking program structure that is quite suitable for benchmarking the Java Hotspot JVM.

Introduction

Software developers often write micro-benchmarks to examine the impact of alternative algorithms on program performance. Micro-benchmarks are also used to estimate the required capacity for a large server system.

The Java Hotspot JVM is known to be tricky to benchmark. Because methods are compiled into native code on-the-fly, naively written benchmarks often give misleading results.

If a method is run an insufficient number of times, it may not get compiled, or it may get compiled while under measurement. Either case will result in an incorrect measurement.

A Naive Benchmark Program

public class test1
{
  public static void main(String[] args)
  {
    long starttime = System.currentTimeMillis();
    long sum = 0;
    for (int indx=200000; --indx >= 0; ) sum += 42;
    long endtime = System.currentTimeMillis();
    System.out.println("elapsed milli-seconds: " + (endtime - starttime));
  }
}

This benchmark program has all the benchmark code within the main method, which will produce an incorrect benchmark result. The Hotspot JVM decides whether to compile a method according to how many times it has been run. Since the main method is run exactly once, it probably will not get compiled.

There is an advanced JVM feature called "On Stack Replacement". If this feature is available, then the main method may become compiled sometime during the measurement loop. However, there is no way to tell how many iterations are interpreted versus compiled. This just turns an incorrect measurement into an uncertain measurement.

The sharp-eyed will find another thing wrong with this benchmark. A benchmark should contain test sequences that cannot be optimized away. A smart JVM may replace the loop with a simple assignment of "sum = 8400000;" (200000 iterations times 42). This may be fine if we are measuring the optimizer, but it is not fine if we are measuring long arithmetic performance in a loop.
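As a minimal sketch of one way to keep such a loop from being folded into a constant, the variant below makes every iteration depend on the previous one and on a seed that is unknown at compile time, and then prints the result. The class name Test2 and the helper method work are hypothetical and not part of the original program. The sketch addresses only the constant-folding problem; the code is still run too few times to be compiled predictably.

```java
final class Test2
{
  // Each iteration depends on the running sum and on the loop index,
  // and the starting value is supplied by the caller, so the loop
  // cannot simply be replaced by a precomputed constant.
  static long work(long seed, int loopcount)
  {
    long sum = seed;
    for (int indx = loopcount; --indx >= 0; )
      sum += (sum ^ indx) & 0xFF;
    return sum;
  }

  public static void main(String[] args)
  {
    long seed = System.currentTimeMillis(); // not a compile-time constant
    long starttime = System.currentTimeMillis();
    long sum = work(seed, 200000);
    long endtime = System.currentTimeMillis();
    // Printing the result keeps the computation live.
    System.out.println("sum: " + sum);
    System.out.println("elapsed milli-seconds: " + (endtime - starttime));
  }
}
```

Note that this measures a different arithmetic sequence than the naive loop. The point is only that the result is data-dependent and is actually used.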

Warm-up Phase and Measurement Phase

After a method has been run about 10000 times (the default threshold of the Hotspot server compiler; other configurations may use a different value), the Hotspot compiler will compile it into native code. Periodically, the Hotspot compiler may recompile the method. After an unspecified amount of time, the compilation system should become quiet.
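One way to observe whether the compilation system has become quiet is to watch the accumulated compilation time reported by the JVM. The sketch below assumes a JVM that provides the java.lang.management API (JDK 5 and later, so newer than this article). The class name CompileWatch and the work-load method work are hypothetical and serve only to give the compiler something to do.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class CompileWatch
{
  // Returns the accumulated compilation time in milliseconds, or -1
  // if the JVM has no compiler or does not report this value.
  static long compileMillis()
  {
    CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
    if (jit == null || !jit.isCompilationTimeMonitoringSupported())
      return -1;
    return jit.getTotalCompilationTime();
  }

  // Hypothetical work-load, present only for illustration.
  static long work(int loopcount)
  {
    long sum = 0;
    for (int indx = loopcount; --indx >= 0; ) sum += indx % 7;
    return sum;
  }

  public static void main(String[] args)
  {
    long sink = 0;
    long before = compileMillis();
    for (int round = 0; round < 10; ++round)
    {
      for (int indx = 50000; --indx >= 0; ) sink += work(3);
      long now = compileMillis();
      // When the increase stays at zero, the compiler has gone quiet.
      System.out.println("round " + round + ": compiler ms so far " + now
        + " (+" + (now - before) + ")");
      before = now;
    }
    System.out.println("sink: " + sink); // keep the work-load result live
  }
}
```

If the reported increase drops to zero for several consecutive rounds, that suggests (but does not prove) that the methods involved are no longer being compiled.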

As a rule of thumb, a micro-benchmark program should call the test subroutine 50000 times to make up the warm-up phase. Each time the test subroutine is called, a small work-load should be supplied. The Hotspot compiler will only compile code that is actually being run. If you bypass the work-load code during warm-up, then the work-load code will remain un-compiled!

After the warm-up phase, the benchmark program should start the measurement phase. In the measurement phase, the real work-load should be supplied to the test subroutine, and the timing information printed.

Because it will take a while for the compilation system to settle down, the benchmark program should print a series of timing runs, so that the output can be examined to make sure the measurements have indeed reached a steady state.
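Examining a series of timings by eye works well, but the check can also be sketched in code. The example below treats the measurements as steady when the last few timings all lie within a small fraction of their own mean. The class name SteadyState, the method isSteady, the window size, and the 5% tolerance are all assumptions chosen for illustration, and the timings in main are made-up numbers, not measured results.

```java
final class SteadyState
{
  // True when the last `window` timings all lie within `tolerance`
  // (a fraction, such as 0.05 for 5%) of their own mean.
  static boolean isSteady(long[] timings, int window, double tolerance)
  {
    if (window <= 0 || timings.length < window) return false;
    int from = timings.length - window;
    double mean = 0;
    for (int i = from; i < timings.length; ++i) mean += timings[i];
    mean /= window;
    for (int i = from; i < timings.length; ++i)
      if (Math.abs(timings[i] - mean) > tolerance * mean) return false;
    return true;
  }

  public static void main(String[] args)
  {
    // Made-up timings in milliseconds: slow early runs, then a plateau.
    long[] timings = { 41, 19, 12, 10, 10, 10 };
    System.out.println("last 3 steady: " + isSteady(timings, 3, 0.05));
    System.out.println("last 5 steady: " + isSteady(timings, 5, 0.05));
  }
}
```

With these illustrative numbers the last three runs count as steady, while the last five do not, because the earlier runs were still slowed by interpretation or compilation.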

The Test Subroutine

The test subroutine is the method that contains the code to be tested.

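  // Assumed declarations: the fragment stores its results in two
  // static fields of the enclosing class.
  static int div_result;
  static int mod_result;

  public static void run(boolean direct, int loopcount)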
  {
    long starttime = System.currentTimeMillis();
    for(int indx = loopcount; --indx >= 0; )
    {
      if (direct)
      {
        div_result = loopcount / 60;
        mod_result = loopcount % 60;
      }
      else
      {
        div_result = loopcount / 60;
        mod_result = loopcount - div_result * 60;
      }
    }
    long endtime = System.currentTimeMillis();
    if (loopcount > 1)
      System.out.println("direct:"+direct+" "+(endtime - starttime)+" ms");
  }

It is very useful to be able to specify the number of iterations a test is to be run, as well as the type of test that is to be run. By varying the iteration count, we can change the work-load.

By including the timing measurement code in the test subroutine, the timing measurement code gets compiled as well. Be careful to exercise the printing code at least once during the warm-up phase; otherwise the Hotspot compiler will compile the printing code during the measurement phase.

A Sample Benchmark Application

import java.util.*;

public class mathbench
{
  public static void run(boolean strict, int loopcount)
  {
    double incr = 1.7d;
    double curnum = 21.4d;
    long t1 = System.currentTimeMillis();
    for (int indx = loopcount; --indx >= 0; )
    {
      double num
        = strict ? StrictMath.ceil(curnum) : Math.ceil(curnum);
      curnum = - (Math.abs(num) + incr);
    }
    long t2 = System.currentTimeMillis();
    if (loopcount > 3)
    {
      System.out.println("strict:" + strict + " result:" + curnum);
      System.out.println((t2 - t1) + " ms");
    }
  }

  public static void main(String argv[])
  {
    int indx;
    System.out.println("warm-up phase");
    run(false, 4); // exercise printing code
    for (indx=50000; indx > 0; --indx)
    {
      run(false, 3);
      run(true, 3);
    }
    System.out.println("measurement phase");
    for (indx=10; --indx >= 0; )
    {
      run(true,  100000);
      run(false, 100000);
    }
  }
}

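To recap what was discussed in this article, notice the specific features of this example benchmark program. The code under test lives in its own method, run, rather than in main. The printing code is exercised once at the start of the warm-up phase. The warm-up phase calls run 50000 times for each variant, each time with a small work-load of 3 iterations. The measurement phase prints a series of 10 timing runs for each variant, so that the output can be checked for a steady state. The final value of curnum is printed, so the result of the loop is actually used.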

References

"The Java HotSpot(tm) Server Compiler", Michael Paleczny, Christopher Vick, and Cliff Click, Proceedings of the Java Virtual Machine Research and Technology Symposium (JVM '01)