In this article, we will examine the construction of a hash function based on object memory address. Address based hash functions are suitable for transient mutable objects whose underlying implementation does not move objects in memory.
In a class library, mutable objects are often considered to be self identifying. For example, two objects that happen to contain the same data value are still considered to be different objects. A future update function call can cause their values to diverge.
The construction of a hash function for mutable objects therefore cannot be based on the data content of an object.
If the underlying memory system does not move objects, then there is a simple mechanism to generate a hash value for mutable objects. We can use an object's memory address to generate a hash value.
In the previous article named Integer Hash Function, we discussed Knuth's Multiplicative Method. Basically the key value is multiplied by a large prime number to obtain the hash value.
Knuth's Multiplicative Method has a flaw that variations of upper key bits are not propagated to lower hash bits. However, in memory layout, objects are laid out in sequential orders. Therefore the upper bits of the address do not vary in the first place. As a result, we can safely use the multiplicative method for our hash function construction. The hashes of a group of consecutively layed out objects would be spread out appropriately in the 2^32 range. A general observation is that if object keys are chosen in a monotonic increasing sequence, then Knuth's Multiplicative Method can be safely used for hashing.
uint32 address_hash(char* addr)
{
register uint32 key;
key = (uint32) addr;
return (key >> 3) * 2654435761;
}
2654435761 is the golden ratio of 2^32. The right shift of 3 bits assumes that all objects are aligned on the 8 byte boundary. If a system aligns on the 4 byte boundary, then a right shift of 2 bits should be done.
In this article, we have shown the method to construct a hash function based on an object's address. This is a high speed hash function suitable for use with various hash table implementations.