We will find out what are Hash functions, Hashcode, Collision and Rehashing.
Since HashMap and HashSet both uses Hashing Technique we will also be able to answer
- Internal working of HashMap
- Internal working of HashSet
Hash functions are used in hash tables, to quickly locate a data record (e.g., a dictionary definition) given its search key.
Hash function has the following properties
- it always returns a number for an object.
- two equal objects will always have the same number
- two unequal objects not always have different numbers
Precedure of storing objets using a hash function
- Create an array of size M.
- Choose a hash function h, that is a mapping from objects into integers 0, 1, …, M-1.
- Put these objects into an array at indexes computed via the hash function index = h(object). Such array is called a hash table.
How to choose a hash function?
- The hashCode() method is implemented in the Object class and therefore each class in Java inherits it.
- The hash code provides a numeric representation of an object.
The method hashCode has different implementation in different classes. In the String class, hashCode is computed by the following formula
s.charAt(0) * 31n-1 + s.charAt(1) * 31n-2 + ... + s.charAt(n-1)
where s is a string and n is its length. An example
"ABC" = 'A' * 312 + 'B' * 31 + 'C' = 65 * 312 + 66 * 31 + 67 = 64578
Note that Java’s hashCode method might return a negative integer.
If a string is long enough, its hashcode will be bigger than the largest integer we can store on 32 bits CPU.
In this case, due to integer overflow, the value returned by hashCode can be negative.
When we put objects into a hashtable, it is possible that different objects (by the equals() method) might have the same hashcode. This is called a collision.
How to resolve collisions?
- One way is by putting the keys that collide in a linked list.
- A linked list is formed at that array index location and new entry is stored as next node.
- A hash table then is an array of lists.
Why Hash Table?
- The big advantage of using a hash table is a constant-time performance for the basic operations add, remove, contains, size.
- Though, because of collisions, we cannot guarantee the constant runtime in the worst-case.
- Why? Imagine that all our objects collide into the same index.
- Then searching for one of them will be equivalent to searching in a list, that takes a liner runtime.
- However, we can guarantee an expected constant runtime, if we make sure that our lists won’t become too long.
What is Rehashing?
- Above problem of too long list is usually implemented by maintaining a load factor that keeps a track of the average length of lists.
- If a load factor approaches a set in advanced threshold, we create a bigger array and rehash all elements from the old table into the new one. This is called Rehashing.