JAVA Collections
Arrays :
Arrays are collections of elements of the same type. Arrays can store both primitives and object references. We have to define the size of an array while initialising it, like below.
int[] intArray = new int[3];
Where 3 is the size of the array.
To check the length of an array we can use the .length field.
There is no way to remove an element from an array; we can only set a reference element to null, and we cannot change the size of an array. When we try to access an index that is greater than or equal to the size of the array, Java throws an ArrayIndexOutOfBoundsException.
So the main disadvantages of arrays are that they don't provide methods for common operations and their size cannot change dynamically.
The advantage is that arrays are more efficient than lists, because arrays can hold primitives directly, which is not possible in an ArrayList; wrapping and unwrapping (boxing and unboxing) objects takes extra time.
Why ArrayList will not accept primitives in JAVA?
There are two main reasons why ArrayList won't directly accept primitive data types in Java:
- Generics: Since Java 5, collections like ArrayList use generics. Generics allow you to specify the data type that the collection can hold. This ensures type safety and prevents errors at compile time. Primitive data types like int, double, etc., are not objects in Java, and generics work with objects.
- Collection Framework Design: The Java Collection Framework (JCF) is designed to work with objects. Objects have methods and behaviour that collections can leverage. For example, you can call methods like equals() and hashCode() on objects to compare them within the collection. Primitive data types don't have these methods.
While primitive data types offer efficiency, using wrapper classes with collections like ArrayList provides type safety, consistency, and leverages the object-oriented features of Java.
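As a small illustration (a minimal sketch; the class and variable names are just for this example), adding an int to an ArrayList<Integer> works only because the compiler autoboxes it into an Integer object:

import java.util.ArrayList;
import java.util.List;

public class AutoboxingDemo {
    public static void main(String[] args) {
        // List<int> would not compile: generics require reference types.
        List<Integer> numbers = new ArrayList<>();

        numbers.add(42);              // autoboxed to Integer.valueOf(42)
        int first = numbers.get(0);   // unboxed back to a primitive int

        System.out.println(first);    // prints 42
    }
}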
To add an element to an array we specify the index and assign the value to that position, like below:
intArray[0] = 1;
intArray[2] = 2;
Java arrays do not provide any built-in method for removing an element, so we can follow the approaches below.
Choosing the Right Approach:
- Manual Removal (Shifting): This approach is efficient if you don't need to return a new array and want to modify the original array in place. However, it can be less readable and it changes the order of the original array's elements.
- Creating a New Array: This approach is more readable and creates a new array without modifying the original one. It's preferred if you need to preserve the original array or want to return a new array without the removed element (see the sketch below).
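A minimal sketch of the second approach (the method name removeAt is just for this example); it copies every element except the one at the given index into a new, smaller array:

import java.util.Arrays;

public class ArrayRemoveDemo {
    // Returns a new array with the element at 'index' removed.
    static int[] removeAt(int[] source, int index) {
        int[] result = new int[source.length - 1];
        // Copy the part before the index, then the part after it.
        System.arraycopy(source, 0, result, 0, index);
        System.arraycopy(source, index + 1, result, index, source.length - index - 1);
        return result;
    }

    public static void main(String[] args) {
        int[] original = {10, 20, 30, 40};
        int[] shorter = removeAt(original, 1);
        System.out.println(Arrays.toString(shorter)); // [10, 30, 40]
    }
}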
Collections:
The Collections Framework lives in the java.util package.
As seen above with arrays, arrays have many limitations, such as no dynamic resizing and no built-in operations on elements. To address this, Java introduced the Collections Framework in Java 1.2 (generics support was added in Java 5). Collection is an interface in Java which is extended by many collection interfaces such as List and Set. The root interface of the framework is the Collection interface, which represents a group of elements. Some collections are sorted and some are not; some are ordered and some are unordered. Collections that have a defined encounter order are generally subtypes of the SequencedCollection interface. The JDK does not provide any direct implementations of the Collection interface: it provides implementations of more specific subinterfaces like Set and List.
Generics :
The Collection interface is a generic interface, meaning it is type-checked at compile time. A generic class or interface works on a type parameter, conventionally written as E, as in Collection<E>.
Iterable:
The Iterable<E> interface was introduced in Java 5 and is generic, taking a type parameter. The Collection<E> interface extends Iterable<E>. Iterable lives in the java.lang package. Implementing this interface allows a collection to be traversed with the enhanced for (for-each) loop.
This interface declares the iterator() method and, since Java 8, the default method forEach.
Iterator:
Iterator is an interface used for traversing collections. It has methods such as hasNext(), next(), and remove().
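A minimal sketch of traversing a collection with an explicit Iterator, including the optional remove() operation:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorDemo {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("Ann", "Bob", "Carl"));

        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            String name = it.next();
            if (name.startsWith("B")) {
                it.remove(); // safe removal while iterating
            }
        }
        System.out.println(names); // [Ann, Carl]
    }
}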
Certain methods are specified to be optional. If a collection implementation doesn't implement a particular operation, it should define the corresponding method to throw UnsupportedOperationException. Such methods are marked "optional operation" in the method specifications of the collection interfaces. Many methods in the Collection interface depend on the equals and hashCode methods from the Object class; for example, contains uses them to check whether an element is present in the collection. The Collection interface declares many abstract and default methods such as add, addAll, remove, removeAll, stream, contains, etc.
SequencedCollection:
This interface was introduced in Java 21. Collections that maintain a defined encounter order implement SequencedCollection. It has several methods related to the sequencing of elements, such as addFirst, removeFirst, addLast, removeLast, getFirst, and getLast.
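A minimal sketch (requires JDK 21 or later) showing the SequencedCollection methods through a List, which is a sequenced collection:

import java.util.ArrayList;
import java.util.List;

public class SequencedDemo {
    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(List.of("b", "c"));

        letters.addFirst("a");                  // [a, b, c]
        letters.addLast("d");                   // [a, b, c, d]
        System.out.println(letters.getFirst()); // a
        System.out.println(letters.getLast());  // d
        System.out.println(letters.reversed()); // [d, c, b, a]
    }
}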
List :
List is an interface which extends the SequencedCollection interface. A List is an ordered collection of elements where the user has control over where elements are inserted, and any element can be accessed through its index. A List allows duplicate elements and, typically, null values, although some implementations and factory methods throw an exception when null is added. As the List interface indirectly extends the Collection interface, it contains all the Collection methods such as add, remove, etc. Additionally, the List interface has four methods for positional access to elements and two methods for adding a collection of elements at a time. Lists also provide unmodifiable lists through factory methods like List.of and List.copyOf; no mutating operations are allowed on these lists and they do not accept null values. The extra methods added by the List interface include the following:
sort, which takes a Comparator as an argument; indexOf, which locates an object; ListIterator<E> listIterator(); and many others, for which we can refer to the API documentation.
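A minimal sketch of some of these List-specific methods, plus an unmodifiable list created with List.of:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ListMethodsDemo {
    public static void main(String[] args) {
        List<String> cities = new ArrayList<>(List.of("Paris", "Oslo", "Lima"));

        cities.sort(Comparator.naturalOrder());     // [Lima, Oslo, Paris]
        System.out.println(cities.indexOf("Oslo")); // 1
        System.out.println(cities.get(2));          // Paris

        List<String> frozen = List.of("a", "b");
        // frozen.add("c");  // would throw UnsupportedOperationException
    }
}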
The List interface is implemented by the following classes:
ArrayList :
ArrayList behaves like an array with the disadvantages above solved. ArrayList is a class in Java which implements the List interface. Its main advantage is that it grows its size dynamically at runtime. It is an ordinary object with state and behaviour, and it provides many methods for operating on the collection.
To define new array list we can write like below:
ArrayList<Integer> intArray = new ArrayList<Integer>();
In the example above we declared a variable of type ArrayList of Integer and assigned it a new ArrayList object without specifying any size; the collections framework automatically takes care of the capacity at runtime. If we want an ArrayList with a particular initial capacity, we can specify it like below.
ArrayList<Integer> intArray = new ArrayList<Integer>(5);
Here 5 is the initial capacity of the list, for which Java allocates memory up front; we can still add more items at runtime and Java takes care of growing the list dynamically.
To check the size of the arrayList we can use the below method:
intArray.size();
As ArrayList is an ordinary Java class, it provides many methods.
To add an element in arrayList we can do below
intArray.add(2);
intArray.add(2,3);
In the example above, add(2, 3) means the value 3 will be inserted at index 2.
To remove an element in arrayList we can do below
intArray.remove(object); or intArray.remove(index); (note that for a list of Integer, remove(int) removes by index, so to remove by value we must pass an Integer object, e.g. intArray.remove(Integer.valueOf(2))).
To get an element in arrayList we can do below
intArray.get(index);
Java's ArrayList also provides many other methods such as contains, isEmpty, ensureCapacity, trimToSize, etc.
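Putting the snippets above together, a minimal runnable sketch of the basic ArrayList operations:

import java.util.ArrayList;

public class ArrayListDemo {
    public static void main(String[] args) {
        ArrayList<Integer> intArray = new ArrayList<>(5); // initial capacity 5

        intArray.add(2);        // [2]
        intArray.add(7);        // [2, 7]
        intArray.add(1, 3);     // insert 3 at index 1 -> [2, 3, 7]

        System.out.println(intArray.get(2));      // 7
        System.out.println(intArray.size());      // 3
        System.out.println(intArray.contains(3)); // true

        intArray.remove(Integer.valueOf(7)); // remove by value -> [2, 3]
        intArray.remove(0);                  // remove by index -> [3]
        System.out.println(intArray);        // [3]
    }
}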
Vector :
Both Vector and ArrayList are used to represent ordered collections of elements in Java, but they have some key differences:
Synchronization:
- Vector: Synchronized. This means that only one thread can access a Vector's methods at a time. This ensures thread safety in multithreaded environments, preventing data corruption caused by concurrent access. However, synchronization comes at a performance cost.
- ArrayList: Not synchronized. This makes ArrayList faster for single-threaded operations because there's no overhead for thread synchronization. However, it can lead to data corruption issues if accessed by multiple threads simultaneously.
Resizing Behavior:
- Vector: When a Vector reaches its capacity (initial size), it doubles its size to accommodate new elements. This can lead to some wasted memory if the collection doesn't grow significantly.
- ArrayList: When an ArrayList reaches its capacity, it increases its size by roughly 50%. This can be more memory-efficient than doubling the size every time. (ArrayList does not expose a growth factor; Vector, by contrast, accepts a capacityIncrement in one of its constructors.)
Legacy vs. Modern:
- Vector: Legacy class. It was introduced in earlier versions of Java and is considered less preferred in modern development due to its synchronization overhead and less efficient resizing behavior.
- ArrayList: Part of the Java Collections Framework. It's a more modern and recommended choice for most scenarios due to its better performance and flexibility.
Additional Differences:
- Vector: Provides some legacy methods like addElement and firstElement that are not present in ArrayList.
- ArrayList: Offers methods like trimToSize that allow you to manually reduce the size of the internal array to avoid wasting memory.
Queue :
A queue is generally a linear data structure which follows the first-in first-out (FIFO) principle. The element inserted first moves to the head; the element inserted last has to wait until all earlier elements are removed. It is like a line of people waiting for something.
In Java, Queue is an interface, so we cannot use it directly; we use its implementations such as LinkedList and PriorityQueue.
Queue offers two forms of each operation: one throws an exception if the operation fails, while the other returns a special value. add(e), remove(), and element() throw exceptions; offer(e), poll(), and peek() return special values (false or null). Every Queue implementation must specify its ordering properties.
Choosing the Right Queue Implementation:
- If you need a simple FIFO queue with efficient insertions and removals at both ends, use LinkedList.
- If you need to prioritize elements based on a specific order, use PriorityQueue.
We can implement a LIFO queue in Java using a stack or a linked list in reversed order.
The primary difference between add/remove/element and offer/poll/peek in Queue implementations lies in their behaviour when the operation cannot be performed, for example when the queue is empty or, for capacity-bounded queues, full: the first group throws an exception, while the second returns a special value, as shown below.
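A minimal sketch contrasting the two families of Queue methods on an empty queue (using a LinkedList as the implementation):

import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<String> queue = new LinkedList<>();

        queue.offer("first");
        queue.offer("second");

        System.out.println(queue.peek()); // first (head, not removed)
        System.out.println(queue.poll()); // first (head, removed)
        System.out.println(queue.poll()); // second
        System.out.println(queue.poll()); // null - special value on an empty queue

        try {
            queue.remove();               // throws on an empty queue
        } catch (NoSuchElementException e) {
            System.out.println("remove() threw NoSuchElementException");
        }
    }
}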
Deque :
The Deque ( The name deque is short for "double ended queue" and is usually pronounced "deck") interface in Java is part of the java.util package and represents a Double Ended Queue. It extends the Queue interface, providing functionalities for adding and removing elements from both ends (front and back) of the collection.
A Deque can act as both a stack and a queue.
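A minimal sketch using ArrayDeque (a common Deque implementation) first as a stack (LIFO) and then as a queue (FIFO):

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);                    // push/pop work on the head -> LIFO
        stack.push(2);
        System.out.println(stack.pop());  // 2

        Deque<Integer> queue = new ArrayDeque<>();
        queue.offerLast(1);               // enqueue at the tail
        queue.offerLast(2);
        System.out.println(queue.pollFirst()); // 1 -> FIFO
    }
}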
PriorityQueue :
An unbounded priority queue based on a priority heap. The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used. A priority queue does not permit null elements. A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in ClassCastException).
Note that this implementation is not synchronized. Multiple threads should not access a PriorityQueue instance concurrently if any of the threads modifies the queue. Instead, use the thread-safe java.util.concurrent.PriorityBlockingQueue class.
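A minimal sketch of a PriorityQueue, ordered once by natural ordering and once by a Comparator supplied at construction time:

import java.util.Comparator;
import java.util.PriorityQueue;

public class PriorityQueueDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> natural = new PriorityQueue<>();
        natural.offer(5);
        natural.offer(1);
        natural.offer(3);
        System.out.println(natural.poll()); // 1 - smallest first (natural ordering)

        PriorityQueue<String> byLength =
                new PriorityQueue<>(Comparator.comparingInt(String::length));
        byLength.offer("banana");
        byLength.offer("fig");
        System.out.println(byLength.poll()); // fig - shortest string first
    }
}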
LinkedList :
Doubly-linked list implementation of the List and Deque interfaces. Implements all optional list operations, and permits all elements (including null).
LinkedList is not synchronized. If multiple threads access a linked list concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new LinkedList(...))
The LinkedList class contains an inner static class called Node, which has item, next, and prev fields and an all-args constructor.
The class itself has three fields: size, first (a Node), and last (a Node). All three are declared transient, meaning they are not written by default serialization.
It also has two constructors: a no-arg constructor and one that takes a Collection of elements as an argument.
Methods in LinkedList class :
private void linkFirst(E e) :
linkFirst takes an element of type E and returns void.
This method links a new node holding the given element in front of the current first node: the list's first field is set to the new node, the old first node's prev points to the new node, and if the list was empty, last also points to the new node, as below.
private void linkFirst(E e) {
    final Node<E> f = first;
    final Node<E> newNode = new Node<>(null, e, f);
    first = newNode;
    if (f == null)
        last = newNode;
    else
        f.prev = newNode;
    size++;
    modCount++;
}
void linkLast(E e) :
This method links a new node holding the given element after the current last node: the list's last field is set to the new node, the old last node's next points to the new node, and if the list was empty, first also points to the new node, as below.
void linkLast(E e) {
    final Node<E> l = last;
    final Node<E> newNode = new Node<>(l, e, null);
    last = newNode;
    if (l == null)
        first = newNode;
    else
        l.next = newNode;
    size++;
    modCount++;
}
The add method of LinkedList calls linkLast internally, so new elements are appended at the end.
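A minimal sketch showing LinkedList used through both its List and Deque faces:

import java.util.LinkedList;

public class LinkedListDemo {
    public static void main(String[] args) {
        LinkedList<String> list = new LinkedList<>();

        list.add("b");        // calls linkLast internally -> [b]
        list.addFirst("a");   // Deque-style insert at the head -> [a, b]
        list.addLast("c");    // [a, b, c]

        System.out.println(list.getFirst()); // a
        System.out.println(list.getLast());  // c

        list.removeFirst();                  // [b, c]
        System.out.println(list);            // [b, c]
    }
}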
Set :
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
Some set implementations have restrictions on the elements that they may contain. For example, some implementations prohibit null elements, and some have restrictions on the types of their elements. Attempting to add an ineligible element throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible element may throw an exception, or it may simply return false; some implementations will exhibit the former behavior and some will exhibit the latter. More generally, attempting an operation on an ineligible element whose completion would not result in the insertion of an ineligible element into the set may throw an exception or it may succeed, at the option of the implementation. Such exceptions are marked as "optional" in the specification for this interface.
Unmodifiable Sets :
The Set.of and Set.copyOf static factory methods provide a convenient way to create unmodifiable sets. The Set instances created by these methods have the following characteristics:
They are unmodifiable. Elements cannot be added or removed. Calling any mutator method on the Set will always cause UnsupportedOperationException to be thrown. However, if the contained elements are themselves mutable, this may cause the Set to behave inconsistently or its contents to appear to change.
They disallow null elements. Attempts to create them with null elements result in NullPointerException.
They are serializable if all elements are serializable.
They reject duplicate elements at creation time. Duplicate elements passed to a static factory method result in IllegalArgumentException.
The iteration order of set elements is unspecified and is subject to change.
They are value-based. Programmers should treat instances that are equal as interchangeable and should not use them for synchronization, or unpredictable behavior may occur. For example, in a future release, synchronization may fail. Callers should make no assumptions about the identity of the returned instances. Factories are free to create new instances or reuse existing ones.
They are serialized as specified on the Serialized Form page
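A minimal sketch of an unmodifiable set created with Set.of, illustrating the characteristics above:

import java.util.Set;

public class UnmodifiableSetDemo {
    public static void main(String[] args) {
        Set<String> colors = Set.of("red", "green", "blue");

        System.out.println(colors.contains("red")); // true

        try {
            colors.add("yellow");                    // mutator -> always fails
        } catch (UnsupportedOperationException e) {
            System.out.println("add() threw UnsupportedOperationException");
        }

        // Set.of("red", "red")  -> IllegalArgumentException (duplicate element)
        // Set.of("red", null)   -> NullPointerException (null element)
    }
}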
SequencedSet :
This is an interface which extends SequencedCollection and adds only one method, reversed(), which returns a SequencedSet<E>.
SortedSet:
A Set that further provides a total ordering on its elements. The elements are ordered using their natural ordering, or by a Comparator typically provided at sorted set creation time. The set's iterator will traverse the set in ascending element order. Several additional operations are provided to take advantage of the ordering. (This interface is the set analogue of SortedMap.)
All elements inserted into a sorted set must implement the Comparable interface (or be accepted by the specified comparator). Furthermore, all such elements must be mutually comparable: e1.compareTo(e2) (or comparator.compare(e1, e2)) must not throw a ClassCastException for any elements e1 and e2 in the sorted set. Attempts to violate this restriction will cause the offending method or constructor invocation to throw a ClassCastException.
All general-purpose sorted set implementation classes should provide four "standard" constructors: 1) A void (no arguments) constructor, which creates an empty sorted set sorted according to the natural ordering of its elements. 2) A constructor with a single argument of type Comparator, which creates an empty sorted set sorted according to the specified comparator. 3) A constructor with a single argument of type Collection, which creates a new sorted set with the same elements as its argument, sorted according to the natural ordering of the elements. 4) A constructor with a single argument of type SortedSet, which creates a new sorted set with the same elements and the same ordering as the input sorted set. There is no way to enforce this recommendation, as interfaces cannot contain constructors
NavigableSet:
It inherits all functionality from SortedSet and offers additional methods for navigating through the set based on specific elements (see the sketch after this list):
- lower(element): Returns the element in the set that is strictly less than the given element, or null if there is no such element.
- floor(element): Returns the element in the set that is less than or equal to the given element, or null if there is no such element.
- ceiling(element): Returns the element in the set that is greater than or equal to the given element, or null if there is no such element.
- higher(element): Returns the element in the set that is strictly greater than the given element, or null if there is no such element.
- pollFirst(): Removes and returns the first element from the set, or null if the set is empty.
- pollLast(): Removes and returns the last element from the set, or null if the set is empty.
- subSet(fromElement, toElement): Returns a view of the portion of this set whose elements range from fromElement to toElement (inclusive or exclusive depending on the implementation).
- headSet(toElement): Returns a view of the portion of this set whose elements are less than toElement.
- tailSet(fromElement): Returns a view of the portion of this set whose elements are greater than or equal to fromElement.
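A minimal sketch of these navigation methods using TreeSet, the JDK's standard NavigableSet implementation:

import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class NavigableSetDemo {
    public static void main(String[] args) {
        NavigableSet<Integer> set = new TreeSet<>(List.of(10, 20, 30, 40));

        System.out.println(set.lower(20));   // 10   (strictly less)
        System.out.println(set.floor(20));   // 20   (less than or equal)
        System.out.println(set.ceiling(25)); // 30   (greater than or equal)
        System.out.println(set.higher(40));  // null (nothing strictly greater)

        System.out.println(set.headSet(30)); // [10, 20]
        System.out.println(set.tailSet(30)); // [30, 40]

        System.out.println(set.pollFirst()); // 10 (removed)
        System.out.println(set);             // [20, 30, 40]
    }
}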
HashSet:
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new HashSet(…));
Internally HashSet uses a HashMap as a class member. We can pass the initialCapacity and loadFactor to a constructor, which passes them on to the internal HashMap, like below:
public HashSet(int initialCapacity, float loadFactor) {
    map = new HashMap<>(initialCapacity, loadFactor);
}
HashMap instance has default initial capacity (16) and load factor (0.75).
Below is the dummy object that HashSet uses internally as a class member and passes as the value in the backing HashMap:
// Dummy value to associate with an Object in the backing Map
static final Object PRESENT = new Object();
TreeSet :
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
Note that this implementation is not synchronized. If multiple threads access a tree set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSortedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
SortedSet s = Collections.synchronizedSortedSet(new TreeSet(...));
TreeSet does not allow null values. This is because TreeSet relies on the compareTo method for ordering its elements, and null has a special behavior when compared with other objects.
LinkedHashSet :
Hash table and linked list implementation of the Set interface, with well-defined encounter order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the encounter order (iteration order), which is the order in which elements were inserted into the set (insertion-order). The least recently inserted element (the eldest) is first, and the youngest element is last. Note that encounter order is not affected if an element is re-inserted into the set with the add method. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.) The reverse-ordered view of this set is in the opposite order, with the youngest element appearing first and the eldest element appearing last. The encounter order of elements already in the set can be changed by using the addFirst and addLast methods.
This class provides all of the optional Set and SequencedSet operations, and it permits null elements. Like HashSet, it provides constant-time performance for the basic operations ( add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.
Note that this implementation is not synchronized. If multiple threads access a linked hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new LinkedHashSet(…));
EnumSet:
EnumSet is an abstract sealed class that extends AbstractSet and permits only the JumboEnumSet and RegularEnumSet classes to extend it.
A specialized Set implementation for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created. Enum sets are represented internally as bit vectors. This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags." Even bulk operations (such as containsAll and retainAll) should run very quickly if their argument is also an enum set.
Null elements are not permitted. Attempts to insert a null element will throw NullPointerException. Attempts to test for the presence of a null element or to remove one will, however, function properly.
Like most collection implementations, EnumSet is not synchronized. If multiple threads access an enum set concurrently, and at least one of the threads modifies the set, it should be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the enum set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access:
Set<MyEnum> s = Collections.synchronizedSet(EnumSet.noneOf(MyEnum.class))
Use EnumSet when you specifically work with sets of enum constants and require a memory-efficient solution.
JumboEnumSet is a private implementation class for EnumSet, for "jumbo" enum types (i.e., those with more than 64 elements).
RegularEnumSet is a private implementation class for EnumSet, for "regular sized" enum types (i.e., those with 64 or fewer enum constants).
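A minimal sketch of EnumSet (the Day enum here is just for this example); with 7 constants it will be backed by RegularEnumSet internally:

import java.util.EnumSet;

public class EnumSetDemo {
    enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

    public static void main(String[] args) {
        EnumSet<Day> weekend = EnumSet.of(Day.SAT, Day.SUN);
        EnumSet<Day> weekdays = EnumSet.complementOf(weekend);

        System.out.println(weekend);                    // [SAT, SUN]
        System.out.println(weekdays);                   // [MON, TUE, WED, THU, FRI]
        System.out.println(weekdays.contains(Day.WED)); // true

        EnumSet<Day> all = EnumSet.allOf(Day.class);
        System.out.println(all.size());                 // 7
    }
}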
Map:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
This interface takes the place of the Dictionary class, which was a totally abstract class rather than an interface.
The Map interface does not extend the Collection interface because of structural differences, which is why Map does not provide an iterator directly. To iterate a map's elements we use a collection view such as entrySet(), which returns a Set of entries that can then be iterated. The Map interface contains an inner interface called Entry, which has methods such as getKey, getValue, setValue, comparingByKey, etc.
Map also provides static factory methods that create immutable maps, such as of, entry, ofEntries, and copyOf.
The Map interface provides three collection views, which allow a map's contents to be viewed as a set of keys, collection of values, or set of key-value mappings. The order of a map is defined as the order in which the iterators on the map's collection views return their elements. Some map implementations, like the TreeMap class, make specific guarantees as to their encounter order; others, like the HashMap class, do not. Maps with a defined encounter order are generally subtypes of the SequencedMap interface.
Great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.
All general-purpose map implementation classes should provide two "standard" constructors: a void (no arguments) constructor which creates an empty map, and a constructor with a single argument of type Map, which creates a new map with the same key-value mappings as its argument. In effect, the latter constructor allows the user to copy any map, producing an equivalent map of the desired class. There is no way to enforce this recommendation (as interfaces cannot contain constructors) but all of the general-purpose map implementations in the JDK comply.
The "destructive" methods contained in this interface, that is, the methods that modify the map on which they operate, are specified to throw UnsupportedOperationException if this map does not support the operation. If this is the case, these methods may, but are not required to, throw an UnsupportedOperationException if the invocation would have no effect on the map. For example, invoking the putAll(Map) method on an unmodifiable map may, but is not required to, throw the exception if the map whose mappings are to be "superimposed" is empty.
Some map implementations have restrictions on the keys and values they may contain. For example, some implementations prohibit null keys and values, and some have restrictions on the types of their keys. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible key or value may throw an exception, or it may simply return false; some implementations will exhibit the former behavior and some will exhibit the latter. More generally, attempting an operation on an ineligible key or value whose completion would not result in the insertion of an ineligible element into the map may throw an exception or it may succeed, at the option of the implementation. Such exceptions are marked as "optional" in the specification for this interface.
Some map operations which perform recursive traversal of the map may fail with an exception for self-referential instances where the map directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.
Unmodifiable Maps
The Map.of, Map.ofEntries, and Map.copyOf static factory methods provide a convenient way to create unmodifiable maps. The Map instances created by these methods have the following characteristics:
1. They are unmodifiable. Keys and values cannot be added, removed, or updated. Calling any mutator method on the Map will always cause UnsupportedOperationException to be thrown. However, if the contained keys or values are themselves mutable, this may cause the Map to behave inconsistently or its contents to appear to change.
2. They disallow null keys and values. Attempts to create them with null keys or values result in NullPointerException.
3. They are serializable if all keys and values are serializable.
4. They reject duplicate keys at creation time. Duplicate keys passed to a static factory method result in IllegalArgumentException.
5. The iteration order of mappings is unspecified and is subject to change.
6. They are value-based. Programmers should treat instances that are equal as interchangeable and should not use them for synchronization, or unpredictable behavior may occur. For example, in a future release, synchronization may fail. Callers should make no assumptions about the identity of the returned instances. Factories are free to create new instances or reuse existing ones.
7. They are serialized as specified on the Serialized Form page.
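A minimal sketch of an unmodifiable map created with Map.of:

import java.util.Map;

public class UnmodifiableMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = Map.of("Ann", 30, "Bob", 25);

        System.out.println(ages.get("Ann"));         // 30
        System.out.println(ages.containsKey("Bob")); // true

        try {
            ages.put("Carl", 40);                    // mutator -> always fails
        } catch (UnsupportedOperationException e) {
            System.out.println("put() threw UnsupportedOperationException");
        }

        // Map.of("Ann", 1, "Ann", 2) -> IllegalArgumentException (duplicate key)
        // Map.of("Ann", null)        -> NullPointerException (null value)
    }
}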
Dictionary :
Dictionary is a legacy abstract class which stores key-value pairs like a map. Dictionary has been around since Java 1.0, and after the Collections Framework was introduced, Java recommends using Map instead of Dictionary. The Hashtable class extends this abstract class. It has methods such as keys() and elements(), which return an Enumeration of the data; put and remove are the mutating methods. Dictionary does not accept null keys, and any class extending Dictionary should not accept null keys either.
Hashtable :
Hashtable is a class introduced in Java 1.0 that originally only extended Dictionary; after the Collections Framework was released, the Java team retrofitted Hashtable into the framework, which is why Hashtable extends Dictionary and implements the Map interface. Hashtable contains the members below, which are used by its methods to implement the functionality.
count - an int that stores the number of key-value pairs in the table.
table - a transient variable of type Entry<?,?>[], an array of entry objects. Entry is an inner interface defined in the Map interface, and Hashtable implements it with a private class like below.
private static class Entry<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Entry<K,V> next;

    protected Entry(int hash, K key, V value, Entry<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    @SuppressWarnings("unchecked")
    protected Object clone() {
        return new Entry<>(hash, key, value,
                (next==null ? null : (Entry<K,V>) next.clone()));
    }

    // Map.Entry Ops

    public K getKey() {
        return key;
    }

    public V getValue() {
        return value;
    }

    public V setValue(V value) {
        if (value == null)
            throw new NullPointerException();
        V oldValue = this.value;
        this.value = value;
        return oldValue;
    }

    public boolean equals(Object o) {
        if (!(o instanceof Map.Entry<?, ?> e))
            return false;
        return (key==null ? e.getKey()==null : key.equals(e.getKey())) &&
               (value==null ? e.getValue()==null : value.equals(e.getValue()));
    }

    public int hashCode() {
        return hash ^ Objects.hashCode(value);
    }

    public String toString() {
        return key.toString()+"="+value.toString();
    }
}
threshold - an int that holds the limit at which the table will be rehashed; when the number of entries reaches the threshold, the table is resized.
loadFactor - a float that stores the load factor of the hash table.
modCount - an int that stores the number of times this Hashtable has been structurally modified. Structural modifications are those that change the number of entries in the Hashtable or otherwise modify its internal structure (e.g., rehash). This field is used to make iterators on the collection views of the Hashtable fail-fast (see ConcurrentModificationException).
Hashtable is synchronized, which makes it thread-safe.
Architecture in any Map Implementations :
Every Map implementation class has two main parameters: initial capacity and load factor. Using these two values the data structure computes a threshold; once the number of entries reaches the threshold, the storage is resized (roughly doubled). For example, if we initialise a Hashtable with an initial capacity of 100 and a load factor of 0.75, the threshold is calculated by the following formula:
threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
MAX_ARRAY_SIZE = 0x7fffffff;
Threshold = 100*0.75 = 75.
So when the Hashtable size reaches 75, it rehashes by roughly doubling the size of the array using the left shift operator, like below:
int newCapacity = (oldCapacity << 1) + 1;
It creates an array of size initialCapacity while initialising the Hashtable in the constructor, like below:
table = new Entry<?,?>[initialCapacity];
The default initial capacity for Hashtable is 11, and the default load factor is 0.75.
An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The initial capacity and load factor parameters are merely hints to the implementation. The exact details as to when and whether the rehash method is invoked are implementation-dependent.
Generally, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the time cost to look up an entry (which is reflected in most Hashtable operations, including get and put).
The initial capacity controls a tradeoff between wasted space and the need for rehash operations, which are time-consuming. No rehash operations will ever occur if the initial capacity is greater than the maximum number of entries the Hashtable will contain divided by its load factor. However, setting the initial capacity too high can waste space.
If many entries are to be made into a Hashtable, creating it with a sufficiently large capacity may allow the entries to be inserted more efficiently than letting it perform automatic rehashing as needed to grow the table.
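A small sketch of this sizing advice (the expectedEntries value is just for this example): choosing an initial capacity of at least the expected number of entries divided by the load factor avoids rehashing while populating the table:

import java.util.Hashtable;
import java.util.Map;

public class CapacityDemo {
    public static void main(String[] args) {
        int expectedEntries = 1000;
        float loadFactor = 0.75f;

        // Capacity large enough that expectedEntries stays below the threshold,
        // so no rehash should be needed while inserting the expected entries.
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

        Map<Integer, String> table = new Hashtable<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            table.put(i, "value-" + i);
        }
        System.out.println(table.size()); // 1000
    }
}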
Map implementations use buckets to store data. The implementation computes the hash code of the key and stores the entry at a particular index of the array. If two different keys hash to the same index, the result is a hash collision. Hashtable allows multiple entries to be stored in a single bucket when two keys compute the same index; this is called chaining, and a linked list is used to store the entries within a bucket.
Below is the implementation invoked when we put a pair into a Hashtable.
public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    for(; entry != null ; entry = entry.next) {
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }

    addEntry(hash, key, value, index);
    return null;
}
In the put implementation of Hashtable above, it first checks for a null value. If the value is not null, it computes the hash code and derives the index, fetches the bucket at that index if one exists, and checks whether the key already exists. If the key already exists it replaces the value for that key; otherwise it adds a new entry to the Hashtable like below:
private void addEntry(int hash, K key, V value, int index) {
    Entry<?,?> tab[] = table;
    if (count >= threshold) {
        // Rehash the table if the threshold is exceeded
        rehash();

        tab = table;
        hash = key.hashCode();
        index = (hash & 0x7FFFFFFF) % tab.length;
    }

    // Creates the new entry.
    @SuppressWarnings("unchecked")
    Entry<K,V> e = (Entry<K,V>) tab[index];
    tab[index] = new Entry<>(hash, key, value, e);
    count++;
    modCount++;
}
While adding an entry it checks whether the Hashtable has reached the threshold; if it has, the table is roughly doubled in size using the rehash method and only then is the new entry created. The rehash implementation of Hashtable is below:
protected void rehash() {
    int oldCapacity = table.length;
    Entry<?,?>[] oldMap = table;

    // overflow-conscious code
    int newCapacity = (oldCapacity << 1) + 1;
    if (newCapacity - MAX_ARRAY_SIZE > 0) {
        if (oldCapacity == MAX_ARRAY_SIZE)
            // Keep running with MAX_ARRAY_SIZE buckets
            return;
        newCapacity = MAX_ARRAY_SIZE;
    }
    Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];

    modCount++;
    threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
    table = newMap;

    for (int i = oldCapacity ; i-- > 0 ;) {
        for (Entry<K,V> old = (Entry<K,V>)oldMap[i] ; old != null ; ) {
            Entry<K,V> e = old;
            old = old.next;

            int index = (e.hash & 0x7FFFFFFF) % newCapacity;
            e.next = (Entry<K,V>)newMap[index];
            newMap[index] = e;
        }
    }
}
HashMap :
Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
HashMap buckets may hold TreeNodes: if the number of nodes in a bucket grows beyond a threshold (8 by default), HashMap automatically converts the linked nodes in that bucket into a TreeNode structure, which makes lookups faster because the TreeNode implementation follows a red-black tree (a balanced binary search tree). TreeNode is an inner class of HashMap which extends LinkedHashMap.Entry, which in turn extends HashMap.Node. A TreeNode holds references to its parent, left, and right children. Tree data structures generally allow fast traversal. TreeNode contains many methods such as root(), find(), getTreeNode(), etc.
TreeNode :
TreeNode is a data structure that stores references to its parent, left, and right children. It has a boolean field red to indicate whether the node is red or black. A red-black tree is a type of binary search tree that follows certain rules.
The rules that determine whether a tree is a red-black tree are as follows:
1. The tree must be a binary search tree (the remaining rules keep it approximately balanced).
2. Each node in the tree is either red or black.
3. The root node is black.
4. Red nodes cannot have red children; in other words, no two adjacent nodes can be red.
5. Every path from a node to its descendant null nodes (leaves) must have the same number of black nodes. This property ensures that the tree remains approximately balanced.
When performing insertion or deletion operations on a Red-Black Tree, additional rules are applied to maintain its properties:
Insertion:
- If a violation occurs during insertion (e.g., a red node with a red parent), the tree is rebalanced through rotations and color adjustments to restore the Red-Black properties.
Deletion:
- When deleting a node, if it has two black children, the tree may violate the Black Height Property. To fix this, additional steps such as rotations, color adjustments, and node replacements are performed to restore the Red-Black properties
The TreeNode class has many methods for keeping the tree balanced as a red-black tree.
root() - This method returns the root element, i.e., the topmost parent of this TreeNode. The implementation of root() is as follows:
final TreeNode<K,V> root() {
    for (TreeNode<K,V> r = this, p;;) {
        if ((p = r.parent) == null)
            return r;
        r = p;
    }
}
moveRootToFront - This method moves the given root node to the front of its bin. It takes two arguments: the array of nodes (the table) and the node to be moved to the front. The implementation is as follows:
static <K,V> void moveRootToFront(Node<K,V>[] tab, TreeNode<K,V> root) {
    int n;
    if (root != null && tab != null && (n = tab.length) > 0) {
        int index = (n - 1) & root.hash;
        TreeNode<K,V> first = (TreeNode<K,V>)tab[index];
        if (root != first) {
            Node<K,V> rn;
            tab[index] = root;
            TreeNode<K,V> rp = root.prev;
            if ((rn = root.next) != null)
                ((TreeNode<K,V>)rn).prev = rp;
            if (rp != null)
                rp.next = rn;
            if (first != null)
                first.prev = root;
            root.next = first;
            root.prev = null;
        }
        assert checkInvariants(root);
    }
}
The method above moves the root to the front of the bucket by rewiring the links: the root's previous and next neighbours are connected to each other, the old first node becomes the root's next, the root's prev is set to null, and the old first node's prev points to the root. Finally, it asserts that the tree is still a valid red-black tree using the checkInvariants method.
find — Finds the node starting at root p with the given hash and key. The kc argument caches comparableClassFor(key) upon first use comparing keys.
final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
    TreeNode<K,V> p = this;
    do {
        int ph, dir; K pk;
        TreeNode<K,V> pl = p.left, pr = p.right, q;
        if ((ph = p.hash) > h)
            p = pl;
        else if (ph < h)
            p = pr;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if (pl == null)
            p = pr;
        else if (pr == null)
            p = pl;
        else if ((kc != null ||
                  (kc = comparableClassFor(k)) != null) &&
                 (dir = compareComparables(kc, k, pk)) != 0)
            p = (dir < 0) ? pl : pr;
        else if ((q = pr.find(h, k, kc)) != null)
            return q;
        else
            p = pl;
    } while (p != null);
    return null;
}
There are many other methods, such as treeify, which forms a tree of the nodes linked from this node, and untreeify, which returns a list of non-TreeNodes replacing those linked from this node.
We can refer to the documentation for all the TreeNode methods.
HashMap Methods :
hash - This method takes the key and returns hash as int value.
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
In Java, the >>> operator is the unsigned right shift operator. It shifts the bits of the operand to the right by a specified number of positions, filling the leftmost bits with zeros. The >>> operator treats the value as an unsigned integer, so it always fills the leftmost bits with zeros, regardless of the sign of the operand.
For example, if h is an integer variable, h >>> 16 will shift the bits of h 16 positions to the right, and the leftmost 16 bits will be filled with zeros.
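A small sketch (not HashMap source, just an illustration) of why the high bits are mixed in: HashMap derives the bucket index as (n - 1) & hash, which only looks at the low bits, so XOR-ing in the upper 16 bits lets them influence bucket selection. The hash values below are arbitrary examples.

public class HashSpreadDemo {
    // Same spreading idea as HashMap.hash(): XOR the top 16 bits into the bottom 16.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int tableSize = 16;        // bucket count, always a power of two in HashMap
        int rawHash = 0x7A3B0001;  // two hashes differing only in their high bits
        int otherHash = 0x1C450001;

        // Without spreading, both land in the same bucket (index 1).
        System.out.println((tableSize - 1) & rawHash);           // 1
        System.out.println((tableSize - 1) & otherHash);         // 1

        // With spreading, the high bits change the chosen bucket.
        System.out.println((tableSize - 1) & spread(rawHash));   // 10
        System.out.println((tableSize - 1) & spread(otherHash)); // 4
    }
}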