In this post we'll see how HashSet internally works in Java, which is also a favourite Java Collections interview question but before going into internal implementation of HashSet in Java it is important to know two points about HashSet.
- HashSet in Java only stores unique values i.e. no duplicates are allowed.
-
HashSet works on the concept of hashing just like HashMap in Java but its working differs from the HashMap in the following way-
- In HashMap a (Key, Value) pair is added and the hash function is calculated using key.
- Where as in the HashSet hash function is calculated using the value itself. Note that in HashSet we have add(E e) method which takes just the element to be added as parameter.
HashSet internally uses HashMap
Now coming back to internal implementation of HashSet in Java the most important point is HashSet class implementation internally uses HashMap to store it's elements.
Within the HashSet there are many constructors one without any parameter and several more with initial capacity or load factor but each one of these constructor creates a HashMap. Since HashSet internally uses HashMap so knowing how HashMap works internally in Java will help you to understand how HashSet works internally in Java.
HashSet Constructor snippets
In the HashSet class in Java you can see that constructors of the class do create a HashMap.
/** * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has * default initial capacity (16) and load factor (0.75). */ public HashSet() { map = new HashMap<>(); }
public HashSet(int initialCapacity, float loadFactor) { map = new HashMap<>(initialCapacity, loadFactor); }
And the map, which is used for storing values, is defined as
private transient HashMap<E,Object> map;
In the constructor, if you have noticed, there are parameters named initial capacity and load factor. For HashSet, default initial capacity is 16, that is an array (or bucket) of length 16 would be created and default load factor is 0.75. Where load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
How elements are added - HashSet internal implementation
I stated in the point 2 above that HashSet calculates the hash function using value itself and there is no (Key, Value) pair in HashSet and then came the statement that HashSet internally uses HashMap to store objects. These two statements may sound contradictory as HashMap stores (key, value) pair so let's see how these these two contradictory statements hold true.
Actually from add method of HashSet class put() method of HashMap is called where the value, which has to be added in the Set, becomes Key and a constant object "PRESENT" is used as value.
That's how PRESENT is defined in HashSet implementation-
// Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object();
And that's how add method is implemented in HashSet class -
public boolean add(E e) { return map.put(e, PRESENT)==null; }
So you can see with in the internal implementation of the HashSet it's a (key, value) pair which is actually getting added. It's just that the actual value (which is added to the HashSet) becomes the key and a dummy value "PRESENT" is added as value when storing it in the backing HashMap.
For example a statement for adding an element to HashSet- set.add("Mumbai"); internally translates into map.put("Mumbai", PRESENT); and then added to the backing HashMap instance.
One thing to note here is, in HashMap value may be duplicate but Key should be unique. That's how HashSet makes sure that only unique values are stored in it, since the value which is to be stored in the HashSet becomes the key while storing it in HashMap.
How element is removed - HashSet internal implementation
When we need to remove an element from the HashSet, internally again remove method of HashSet calls remove(Object key) method of the HashMap.
That is how it is implemented in HashSet class.
public boolean remove(Object o) { return map.remove(o)==PRESENT; }
Here note that remove(Object key) method of the HashMap returns the Value associated with the key. Whereas the remove(Object o) method of the HashSet returns Boolean value. Also we know that for every value added in HashSet, internally when it is added to the associated HashMap, value becomes Key and the value is always an object called PRESENT. Therefore the value that is returned from the remove(Object key) method of the HashMap is always PRESENT thus the condition map.remove(o)==PRESENT.
How elements are retrieved from HashSet in Java
In HashSet there is no get method as provided in Map or List. In HashSet iterator is there which will iterate through the values of the Set. Internally it will call the keyset of the HashMap, as values are stored as keys in the HashMap so what we'll get is the values stored in the HashSet.
That's how iterator is internally implemented in the HashSet in Java.
/** * Returns an iterator over the elements in this set. The elements * are returned in no particular order. * * @return an Iterator over the elements in this set * @see ConcurrentModificationException */ public Iterator<E> iterator() { return map.keySet().iterator(); }
Points to note
- Unlike HashMap where hash function is calculated using key HashSet uses the value itself to calculate the hash function.
- Since hash function is calculated using value that is why only unique values are stored in the HashSet.
- HashSet internally uses HashMap to store its elements.
- When element is added to HashSet using add(E e) method internally HashSet calls put() method of the HashMap where the value passed in the add method becomes key in the put() method. A dummy value “PRESENT” is passed as value in the put() method.
Recommendations for learning (Udemy Courses)
- Java Programming Masterclass Course
- Java In-Depth: Become a Complete Java Engineer!
- Spring Framework Master Class Course
- Complete Python Bootcamp Course
- Python for Data Science and Machine Learning
That's all for this topic How HashSet Works Internally in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!
Related Topics
You may also like-
very nicely explained..kudos !!
ReplyDeleteHere is the doubt, you said "HashSet, from add method, put method of HashMap is called where the value which has to be added in the Set becomes Key and a constant object PRESENT is used as value", but in Hashmap it works on hashing concept change it key into hashcode and save it in a bucket, we can also have same hashcode for two different keys in hashmap but in hashmap if 2 different key has same hashmap, then it use equals() method to find exact value, what in case of HashSet if we have same hashcode for 2 different values even if it try to use equals() method, it save PRESENT as value.
ReplyDeletewhat will happen in this case. Could you please clarify.
Well there is no get() method in HashSet so your scenario will never occur, please read "How Object is retrieved from HashSet" section of the post. All you can get is a iterator which will call keyset of the HashMap so you will get set of keys in the HashMap which incidentally means elements of the HashSet.
DeleteYour explanation directly hits the brain...Thanq :)
ReplyDeleteVery good tutorial, for all implementations. Suggestions: include comparison of all collections, feature like when to use which collection, like arrays should be used for faster access and is slow during insertion etc.
ReplyDeleteThanks for the feedback!!
ReplyDeleteAll topics are very well explained.Should have search option in this blog.
ReplyDeleteI have a doubt. When we say hashset cannot have duplicates because hashcode is called on the value itself instead of key, then what about hash collision. What if 2 values have the same hashcode? Since it is a possible case in HashMap, where 2 different key’s can have same hashcode, can’t this happen for same 2 different values in case of HashSet?
ReplyDeleteYes that could be a possible scenario for a poorly implemented hashcode. If two different values have the same hashcode then latest value addition will overwrite the old one despite values being different.
Delete