Let’s learn all about HashMap in Java

Prerna Jain
10 min read · Sep 10, 2018


The Collection interface (java.util.Collection) and the Map interface (java.util.Map) are the two main “root” interfaces of the Java collection classes.

Hierarchy of Collection Framework

Map: Contains key-value pairs. Doesn't allow duplicate keys.
Example implementations are HashMap and TreeMap.
TreeMap implements SortedMap.

HashMap

HashMap has been a part of Java’s collections since Java 1.2. It provides the basic implementation of the Map interface and stores data in (K, V) pairs. HashMap gets its name from the technique it uses, called hashing.

A few important features of HashMap are:

  • HashMap is a part of the java.util package.
  • HashMap extends the abstract class AbstractMap, which provides a skeletal (incomplete) implementation of the Map interface.
  • It also implements the Cloneable and Serializable interfaces. K and V in the HashMap definition represent Key and Value respectively.
  • HashMap doesn’t allow duplicate keys but allows duplicate values. That means a single key can’t map to more than one value, but more than one key can map to the same value.
  • HashMap allows a single null key and any number of null values.
  • This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time. It is roughly similar to Hashtable but is unsynchronized and permits nulls.
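The key and null rules listed above can be checked with a short program. This is a minimal sketch; the class name HashMapFeaturesDemo and the sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapFeaturesDemo {
    // Builds a small map that exercises duplicate keys, duplicate values, and nulls.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);     // same key again: the old value 1 is replaced
        map.put("b", 2);     // the same value under a different key is fine
        map.put(null, 10);   // one null key is allowed
        map.put(null, 11);   // putting the null key again replaces its value
        map.put("c", null);  // null values are allowed
        map.put("d", null);  // ... any number of times
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());            // 5 keys: a, b, null, c, d
        System.out.println(map.get("a"));          // 2
        System.out.println(map.get(null));         // 11
        System.out.println(map.containsKey("c"));  // true, although the value is null
    }
}
```

Note that get() returns null both for a missing key and for a key mapped to null, so containsKey() is the way to tell the two apart.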

Hashing
Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection. For example, if we have a list of 10,000 English words and we want to check if a given word is in the list, it would be inefficient to compare the word with all 10,000 items one after another until we find a match. Hashing makes this more efficient by transforming a string of characters into a usually shorter, fixed-length value or key that represents the original string.

Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.

Hashing in terms of Java
Hashing means using some function or algorithm to map object data to some representative integer value. This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item in the map.
The function used here is hashCode(). It is necessary to write the hashCode() method properly for good HashMap performance.

Here I use a key class of my own so that I can override the hashCode() method to show different scenarios. My Key class is

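// Custom Key class that overrides hashCode() and equals()
class Key
{
    String key;

    Key(String key)
    {
        this.key = key;
    }

    @Override
    public int hashCode()
    {
        // Demo only: the hash is the code of the first character
        return (int) key.charAt(0);
    }

    @Override
    public boolean equals(Object obj)
    {
        // Compare against another Key (casting to String would throw ClassCastException)
        return obj instanceof Key && key.equals(((Key) obj).key);
    }
}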

Here the overridden hashCode() method returns the first character’s ASCII value as the hash code. So whenever the first character of two keys is the same, their hash codes will be the same. You should not take this approach in your own programs. It is just for demo purposes.
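For comparison with the demo class above, here is a sketch of how a key class is more commonly written: hashCode() combines every field that equals() compares, so equal keys always produce equal hash codes. PersonKey and its fields are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class PersonKeyDemo {
    // Hypothetical immutable key with two fields (illustration only).
    static final class PersonKey {
        private final String name;
        private final int birthYear;

        PersonKey(String name, int birthYear) {
            this.name = name;
            this.birthYear = birthYear;
        }

        @Override
        public int hashCode() {
            // Combine the same fields that equals() looks at.
            return Objects.hash(name, birthYear);
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof PersonKey)) return false;
            PersonKey other = (PersonKey) obj;
            return birthYear == other.birthYear && Objects.equals(name, other.name);
        }
    }

    public static void main(String[] args) {
        Map<PersonKey, String> map = new HashMap<>();
        map.put(new PersonKey("vishal", 1990), "first");
        // A different but equal key object finds the same entry.
        System.out.println(map.get(new PersonKey("vishal", 1990))); // first
        System.out.println(map.get(new PersonKey("vishal", 1991))); // null
    }
}
```

Making the key immutable matters here: if a field changed after insertion, the key's hash code would change and the entry could no longer be found in its bucket.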

hashCode() method
The hashCode() method is used to get the hash code of an object. The hashCode() method of the Object class returns an integer tied to the object's identity. It is often described as the memory reference of the object in integer form, but the specification does not require that. Its declaration is public native int hashCode(). The native keyword indicates that the implementation is supplied by the JVM, because there is no direct way in Java to fetch the reference of an object. It is possible to provide your own implementation of hashCode().
In HashMap, hashCode() is used to determine the bucket, that is, to calculate the index.

equals() method
The equals() method is used to check whether two objects are equal. This method is provided by the Object class. You can override it in your class to provide your own implementation.
HashMap uses equals() to compare keys and decide whether they are equal. If equals() returns true, the keys are equal; otherwise they are not.

A bucket is one element of the HashMap array. It is used to store nodes. Two or more nodes can share the same bucket; in that case a linked list structure is used to connect the nodes. Each little list is generally called a bucket. The number of buckets is the capacity of the map. Capacity and load factor together determine the threshold, the number of entries beyond which the map is resized:

threshold = capacity * load factor

A Node can be represented as a class with the following fields:

  1. int hash
  2. K key
  3. V value
  4. Node next

A single bucket can hold more than one node; how many depends on the hashCode() method. The better your hashCode() method is, the more evenly your buckets will be used.

Index Calculation in Hashmap
The hash code of a key can be any int value, so it is too large to use directly as an array index: an array covering the whole int range would easily cause an OutOfMemoryError. So an index is derived from the hash code to keep the array small. Basically, the following operation is performed to calculate the index.

// the two forms give the same result when n is a power of two
// and the hash code is non-negative
index = hashCode(key) & (n-1)
index = hashCode(key) % n

where n is number of buckets or the size of array. In this example, I will consider n as default size that is 16.
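The index formula can be tried out directly. The sketch below is not the library code itself: indexFor and spread are names chosen here, and the extra spread step (XOR-ing the high 16 bits into the low 16 bits before masking) reflects how OpenJDK 8 and later treat the hash code, as far as I know. For small hash codes such as 118 and 115 it changes nothing.

```java
class IndexDemo {
    // Mixes the high bits into the low bits so they also influence the bucket choice.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table of n buckets; n is assumed to be a power of two.
    static int indexFor(int hashCode, int n) {
        return spread(hashCode) & (n - 1);
    }

    public static void main(String[] args) {
        int n = 16;
        System.out.println(indexFor('v', n)); // 'v' is 118, index 6
        System.out.println(indexFor('s', n)); // 's' is 115, index 3

        // For a power-of-two n, masking matches a non-negative modulo,
        // even when the hash code itself is negative.
        int h = -12345;
        System.out.println((h & (n - 1)) == Math.floorMod(h, n)); // true
    }
}
```

Masking is preferred over the % operator because it is cheap and never yields a negative index, which % would for negative hash codes.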

Initially empty HashMap: Here, the size of the HashMap (its number of buckets) is taken as 16.

HashMap<Key, Integer> map = new HashMap<>();

HashMap :

Inserting Key-Value Pair: Putting one key-value pair in above HashMap

map.put(new Key("vishal"), 20);

Steps:

  1. Calculate hash code of Key {“vishal”}. It will be generated as 118.
  2. Calculate the index using the index formula; it will be 6.
  3. Create a node object as :
{
int hash = 118

// {"vishal"} is not a string but
// an object of class Key
Key key = {"vishal"}

Integer value = 20
Node next = null
}

  4. Place this object at index 6 if no other object is present there.

HashMap after insertion

Inserting another Key-Value Pair: Now, putting other pair that is,

map.put(new Key("sachin"), 30);

Steps:

  1. Calculate hashCode of Key {“sachin”}. It will be generated as 115.
  2. Calculate the index using the index formula; it will be 3.
  3. Create a node object as :
{
int hash = 115
Key key = {"sachin"}
Integer value = 30
Node next = null
}

  4. Place this object at index 3 if no other object is present there.
Now HashMap becomes :

In Case of collision: Now, putting another pair that is,

map.put(new Key("vaibhav"), 40);

Steps:

  1. Calculate hash code of Key {“vaibhav”}. It will be generated as 118.
  2. Calculate the index using the index formula; it will be 6.
  3. Create a node object as :
{
int hash = 118
Key key = {"vaibhav"}
Integer value = 40
Node next = null
}

  4. Place this object at index 6 if no other object is present there.
  5. In this case a node object is already present at index 6, so this is a collision.
  6. Compare the hash codes and then call equals() to check whether the two keys are the same.
  7. If the keys are the same, replace the existing value with the new value.
  8. Otherwise, link the new node to the existing node through its next field, so that both are stored in the linked list at index 6.

Now HashMap becomes :
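The insertion steps can be condensed into a toy map. This is a simplified sketch, not the java.util.HashMap source: ToyHashMap is a made-up class, it uses String keys with the same first-character hash as the Key class above, it never resizes, and it assumes keys are non-empty.

```java
class ToyHashMap {
    // One entry in a bucket's linked list.
    static final class Node {
        final int hash;
        final String key;
        int value;
        Node next;

        Node(int hash, String key, int value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    final Node[] table = new Node[16]; // fixed at 16 buckets for the demo
    int size;

    // Demo hash: the code of the first character of the key.
    static int hash(String key) {
        return key.charAt(0);
    }

    void put(String key, int value) {
        int h = hash(key);
        int index = h & (table.length - 1);
        Node node = table[index];
        if (node == null) {                // empty bucket: place the node directly
            table[index] = new Node(h, key, value);
            size++;
            return;
        }
        while (true) {                     // collision: walk the chain
            if (node.hash == h && node.key.equals(key)) {
                node.value = value;        // same key: replace the value
                return;
            }
            if (node.next == null) {       // end of chain: link the new node
                node.next = new Node(h, key, value);
                size++;
                return;
            }
            node = node.next;
        }
    }

    public static void main(String[] args) {
        ToyHashMap map = new ToyHashMap();
        map.put("vishal", 20);
        map.put("sachin", 30);
        map.put("vaibhav", 40);
        System.out.println(map.table[6].key);      // vishal
        System.out.println(map.table[6].next.key); // vaibhav
        System.out.println(map.table[3].key);      // sachin
    }
}
```

Comparing the stored hash before calling equals() is a cheap filter: equals() only runs for nodes whose hash codes already match.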

Using get method()

The get(Object key) method is used to get a value by its key. If you don’t know the key, it is not possible to fetch the value directly.

Fetch the data for key sachin:

map.get(new Key("sachin"));

Steps:

  1. Calculate hash code of Key {“sachin”}. It will be generated as 115.
  2. Calculate the index using the index formula; it will be 3.
  3. Go to index 3 of the array and compare the first element’s key with the given key. If they are equal, return the value; otherwise check the next element if it exists.
  4. In our case it is found as first element and returned value is 30.

Fetch the data for key vaibhav:

map.get(new Key("vaibhav"));

Steps:

  1. Calculate hash code of Key {“vaibhav”}. It will be generated as 118.
  2. Calculate the index using the index formula; it will be 6.
  3. Go to index 6 of the array and compare the first element’s key with the given key. If they are equal, return the value; otherwise check the next element if it exists.
  4. In our case the key is not found at the first element, and the next field of that node is not null.
  5. If the next field of a node is null, return null.
  6. If the next field is not null, move to the next element and repeat step 3 until the key is found or next is null. Here the second element’s key matches, so the value 40 is returned.
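The lookup steps amount to a walk along one bucket's chain. The sketch below builds bucket 6 from the walkthrough by hand (vishal followed by vaibhav) and searches it; ChainLookupDemo, Node, and find are illustrative names, not library API.

```java
class ChainLookupDemo {
    // One entry in a bucket's linked list.
    static final class Node {
        final int hash;
        final String key;
        final int value;
        final Node next;

        Node(int hash, String key, int value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Returns the value stored for key in the chain starting at head, or null if absent.
    static Integer find(Node head, int hash, String key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.hash == hash && n.key.equals(key)) {
                return n.value;
            }
        }
        return null; // reached the end of the chain without a match
    }

    public static void main(String[] args) {
        Node vaibhav = new Node(118, "vaibhav", 40, null);
        Node vishal = new Node(118, "vishal", 20, vaibhav);
        System.out.println(find(vishal, 118, "vaibhav")); // 40
        System.out.println(find(vishal, 118, "virat"));   // null
    }
}
```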
// Java program to illustrate internal working of HashMap
import java.util.HashMap;

class Key
{
    String key;

    Key(String key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        int hash = (int) key.charAt(0);
        System.out.println("hashCode for key: " + key + " = " + hash);
        return hash;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof Key && key.equals(((Key) obj).key);
    }
}

// Driver class
public class GFG {
    public static void main(String[] args) {
        HashMap<Key, Integer> map = new HashMap<>();
        map.put(new Key("vishal"), 20);
        map.put(new Key("sachin"), 30);
        map.put(new Key("vaibhav"), 40);
        System.out.println();
        System.out.println("Value for key sachin: " + map.get(new Key("sachin")));
        System.out.println("Value for key vaibhav: " + map.get(new Key("vaibhav")));
    }
}
Output:
hashCode for key: vishal = 118
hashCode for key: sachin = 115
hashCode for key: vaibhav = 118

hashCode for key: sachin = 115
Value for key sachin: 30
hashCode for key: vaibhav = 118
Value for key vaibhav: 40

HashMap Changes in Java 8

As we now know, in case of a hash collision the entries are stored as nodes in a linked list, and the equals() method is used to compare keys. Finding the correct key within a linked list is a linear operation, so in the worst case the complexity becomes O(n).
To address this issue, Java 8 switches a bucket from a linked list to a balanced tree after a certain threshold is reached. HashMap starts by storing nodes in a linked list, but once a single bucket grows past the threshold (TREEIFY_THRESHOLD, which is 8 in the OpenJDK implementation, and only when the table has at least 64 buckets), that bucket is converted to a balanced tree. This improves the worst-case performance from O(n) to O(log n).

Constructors in HashMap

HashMap provides 4 constructors, and the access modifier of each is public:

HashMap() : The default constructor, which creates an instance of HashMap with initial capacity 16 and load factor 0.75.
HashMap(int initialCapacity) : Creates a HashMap instance with the specified initial capacity and load factor 0.75.
HashMap(int initialCapacity, float loadFactor) : Creates a HashMap instance with the specified initial capacity and the specified load factor.
HashMap(Map map) : Creates an instance of HashMap with the same mappings as the specified map.
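A short sketch of the four constructors in use. ConstructorDemo and copyOf are made-up names, and the capacities 64 and 0.5f are arbitrary values picked for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ConstructorDemo {
    // Uses the copy constructor: the new map gets the same mappings as the source.
    static Map<String, Integer> copyOf(Map<String, Integer> source) {
        return new HashMap<>(source);
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>();          // capacity 16, load factor 0.75
        Map<String, Integer> b = new HashMap<>(64);        // capacity 64, load factor 0.75
        Map<String, Integer> c = new HashMap<>(64, 0.5f);  // capacity 64, load factor 0.5

        a.put("one", 1);
        Map<String, Integer> d = copyOf(a);                // same mappings as a
        d.put("two", 2);                                   // does not affect a

        System.out.println(a.size()); // 1
        System.out.println(d.size()); // 2
        System.out.println(b.isEmpty() && c.isEmpty()); // true
    }
}
```

The copy is independent of the original map, but it is shallow: both maps refer to the same key and value objects.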

Synchronized HashMap: As mentioned, HashMap is unsynchronized, i.e. nothing stops multiple threads from accessing it at the same time. If multiple threads access a HashMap concurrently and at least one of them modifies it structurally, it must be synchronized externally. This is typically done by synchronizing on some object that encapsulates the map. If no such object exists, the map can be wrapped using Collections.synchronizedMap() to make it synchronized and avoid accidental unsynchronized access, as in the following example:

Map m = Collections.synchronizedMap(new HashMap(...));
// Now the Map m is synchronized.
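As a rough illustration, two threads can fill one wrapped map without losing entries. SynchronizedMapDemo and fillFromTwoThreads are names invented for this sketch. One detail worth knowing from the Collections.synchronizedMap documentation: individual calls such as put() are synchronized, but iteration still has to be done while holding the map's own lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Fills one shared map from two threads and returns how many keys it ends up with.
    static int fillFromTwoThreads(int perThread) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Thread first = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(i, i);
        });
        Thread second = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(perThread + i, i);
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }

        // Iteration is a sequence of calls, so hold the map's lock for the whole loop.
        int count = 0;
        synchronized (map) {
            for (Integer key : map.keySet()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(1000)); // 2000
    }
}
```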

Performance of HashMap depends on 2 parameters:

  1. Initial Capacity
  2. Load Factor

Initial Capacity: Capacity is simply the number of buckets, and the initial capacity is the number of buckets in the HashMap instance when it is created. The number of buckets is increased automatically once the map becomes too full, as determined by the load factor.

Load Factor: The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the HashMap exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt). Rehashing increases the capacity: in HashMap the capacity is multiplied by 2, so that the hash table has approximately twice the number of buckets.

Example: If the internal capacity is 16 and the load factor is 0.75, the threshold is 16 * 0.75 = 12, so the number of buckets is increased automatically when the 13th entry is added.
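The same arithmetic for the first few capacities, as a small sketch. ResizeThresholdDemo and threshold are illustrative names; the calculation simply follows the capacity times load factor rule described above.

```java
class ResizeThresholdDemo {
    // Number of entries the table can hold before the next resize is triggered.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int i = 0; i < 4; i++) {
            System.out.println(capacity + " buckets: resize once size exceeds "
                    + threshold(capacity, 0.75f));
            capacity *= 2; // each resize doubles the number of buckets
        }
        // Thresholds printed: 12, 24, 48, 96
    }
}
```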

If the initial capacity is greater than the maximum number of entries divided by the load factor, rehashing will never be done. But a higher capacity increases the cost of iteration, which depends on the number of buckets as well as the number of entries. So it should be chosen carefully to get good performance. The expected number of entries should be taken into account when setting the initial capacity. The generally preferred load factor is 0.75, which provides a good trade-off between time and space costs. The load factor must be a positive number and is usually kept between 0 and 1.
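One way to apply this advice is to derive the initial capacity from the expected number of entries. The helper below is a sketch under that assumption; PresizeDemo and capacityFor are hypothetical names, not part of the HashMap API.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {
    // Smallest initial capacity that holds `expected` entries without a resize.
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        int expected = 1000;
        int capacity = capacityFor(expected, 0.75f);
        System.out.println(capacity); // 1334

        // HashMap rounds the requested capacity up to a power of two internally.
        Map<Integer, String> map = new HashMap<>(capacity);
        for (int i = 0; i < expected; i++) {
            map.put(i, "value" + i);
        }
        System.out.println(map.size()); // 1000
    }
}
```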

Important points:

  • Time complexity of put and get is constant on average, provided the hash function spreads the keys evenly across buckets; a put that triggers rehashing costs more.
  • In case of a collision, i.e. when two or more nodes have the same index, the nodes are joined in a linked list. In Java 8 and later the new node is appended at the tail of the list (earlier versions inserted it at the head). The first node references the second, the second references the third, and so on.
  • If the given key already exists in the HashMap, its value is replaced with the new value.
  • The hash code of the null key is 0.
  • When getting an object by its key, the linked list is traversed until the key matches or the next field is null.
