Selecting Hash Table Size in Java

Thomas Wang, July 1998
last update July 1998

Abstract

In this article, we recommend the number 89 as the default initial size for Java built-in hash table. This initial size allows the table size to remain prime after 5 expansions. During expansions up to 2^32 bit range, there are no divisors found smaller than 11.

Introduction

The Java built-in hash table has an initial capacity that can be specified by the programmer. If the number of entries exceeds the threshold, the capacity is multiplied by the formula of 2n+1.

It is a common practice to recommend a prime number to be the size of a hash table. This way, a funnel will occur only if the keys are multiples of the prime number. On the other hand, if the hash table size is not a prime number, keys that are multiples of any of the factors of the hash table size will form a funnel.

Since Java's hash table expansion uses formula of 2n+1. It is natural to expect the best candidate for initial size to remain a prime under a number of table expansions. We will search for a number X with the following properties:

Elimination

Large prime numbers end with the decimal digit 1, 3, 7 or 9. Any number ending with digit 0, 2, 4, 5, 6, 8 would not be a prime (eg. 10, 22, 34, 65, 78). Under the expansion of 2n+1, it can be shown that any number with digit ending in 1, 3, or 7 will become divisible by 5 after four expansions. (11 -> 23 -> 47 -> 95) Any hash table size that is divisible by 5 is very bad, as keys in multiples of 10 are common.

The Search

The search would be based on finding a prime number ending in the digit 9 that have the properties we desire.

Primes under 100 with ending digit of 9 consist of 19, 29, 59, 79, and 89. The expansion of 19 is 39, which is divisible by 3 - too small. The expansion of 29 is 59, a prime number. This is good, however the expansion of 59 is 119, which is divisible by 7 - somewhat small. We looked next at the number 79. The expansion of 79 is 159, which is again divisible by 3.

Lastly we look at the number 89. Fortunately 89 turns out to be a good choice. The following section lists the expansion of 89.

89
179
359
719
1439
2879
5759 = 13 * 443
11519
23039
46079 = 11 * 59 * 71
92159 = 157 * 587
184319 = 19 * 89 * 109
368639 = 43 * 8573
737279
1474559
2949119
5898239 = 643 * 9173
11796479 = 41 * 149 * 1931
23592959 = 13 * 1814843
47185919 = 11 * 4289629
94371839 = 881 * 107119
188743679
377487359 = 53 * 79 * 89 * 1013
754974719 = 29 * 1129 * 23059
1509949439 = 107 * 14111677

You can run this test yourself with this C program.

#include 

int main()
{
  int val = 89;
  int indx;
  char buf[80];

  for (indx=0; indx < 26; ++indx)
  {
    sprintf(buf, "factor %d", val);
    system(buf);
    val += val + 1;
  }
  return 0;
}

Conclusion

89 is a good size. There might be better sizes for hash tables that are known to be of large size. The above test program can be used to find them.

What does the Java library use as a default hash table size if you do not specify anything - is it 89? No, the source code indicates 101. This is a number we rejected in the initial step.


HOME

Give feedbacks to wang@cup.hp.com