IT Tips & Insights: Learn all about data structures, and when to use different classes of collections.
By Fernando Neto, Software Engineer
When working with data, we can use several different structures so that the data can be treated more efficiently and manipulated as a collection.
One of the most basic types of structures we can use is the System.Array type or the classes from the System.Collections and System.Collections.Generic namespaces to add, remove, and modify individual elements or a range of elements in a collection.
All collections using .Net provide methods to add, remove or locate items in the collection. However, there are collections that work like a dictionary, so we have a unique key to access the data, unlike other collections such as lists, queues and stacks.
In general, you should use generic collections when handling large amounts of data. The following table describes some common collection scenarios and the collection classes you can use for these scenarios.
Need | Collection |
---|---|
Store items as key/value pairs for quick lookup by key | Dictionary<TKey,TValue> |
Access Items by Index | List<T> |
First in, First Out (FIFO) | Queue<T> |
Last in, First Out (LIFO) | Stack<T> |
Access items in sequence | LinkedList<T> |
Receive notifications when items are removed from or added to the collection. | ObservableCollection<T> |
Classified collection | SortedList<TKey,TValue> |
A set of math functions | HashSet<T>
SortedSet<T> |
It is important to understand how each data structure works for each need.Â
Collections and Data Structure Complexity
When choosing a collection type, we need to consider potential performance tradeoffs. Below is an analysis of the complexity of using each data structure:
Action | Complexity | Worst Case |
---|---|---|
Stack<T>.Push | O(1) | O(n) |
Queue<T>.Enqueue | O(1) | O(n) |
List<T>.Add | O(1) | O(n) |
List<T>.Item[Int32] | O(1) | O(1) |
List<T>.Enumerator | O(n) | O(n) |
Search on List<T> | O(1) | O(n) |
SortedSet<T>.Add | O(log n) | O(n) |
Dictionary<T>.Add | O(1) | O(n) |
Search on Dictionary<T> | O(1) | O(1) |
SortedDictionary<T>.Add | O(log n) | O(n log n) |
As you can see from the list above, we have different complexities for different actions. Some collections are specific, like stacks and queues, or linked lists. However, when we talk about collections as a list of data, we can treat this data as a List or Dictionaries.Â
When dealing with a list of users, we can create a List or a Dictionary where the Dictionary Key is the user’s ID. This will greatly reduce the complexity of returning a user, rather than using a List.
When working with data, we need to be aware of the complexity of accessing information. We often use framework functions that can perform very long queries on lists, causing the application to not perform as expected.
Benchmark of Collections
Six tests were created with the following types of collections: List, HashSet and Dictionary. The tests and their results are described below:
Test 1 – Add 1.000.000 value types.
Results:
————————————————
HashSet.Add : 48 Elapsed Milliseconds
List.Add : 14 Elapsed Milliseconds
Dictionary.Add : 48 Elapsed Milliseconds
Dictionary[n] = n : 42 Elapsed Milliseconds
Test 2 – Add 1.000.000 reference types.
Results:
————————————————
HashSet.Add : 268 Elapsed Milliseconds
List.Add : 23 Elapsed Milliseconds
Dictionary.Add : 208 Elapsed Milliseconds
Dictionary[n] = n : 150 Elapsed Milliseconds
Test 3 – Run Contains() on half of 10000 value types.
Results:
————————————————
HashSet.Contains : 0 Elapsed Milliseconds
List.Contains : 8 Elapsed Milliseconds
Dictionary.ContainsKey : 0 Elapsed Milliseconds
Dictionary.ContainsValue : 48 Elapsed Milliseconds
Test 4 – Run Contains() on half of 10000 reference types.
Results:
————————————————
HashSet.Contains : 0 Elapsed Milliseconds
List.Contains : 126 Elapsed Milliseconds
Dictionary.ContainsKey : 0 Elapsed Milliseconds
Dictionary.ContainsValue : 207 Elapsed Milliseconds
Test 5 – Remove half of 10000 value types.
Results:
————————————————
HashSet.Remove : 0 Elapsed Milliseconds
List.Remove : 5 Elapsed Milliseconds
Dictionary.Remove : 0 Elapsed Milliseconds
Test 6 – Remove half of 10000 reference types.
Results:
————————————————
HashSet.Remove : 0 Elapsed Milliseconds
List.Remove : 68 Elapsed Milliseconds
Dictionary.Remove : 0 Elapsed Milliseconds
These tests were created using .Net 6.
The HashSet and Dictionary types are, in general, better performing than List, in which the faster speed adding new items is greatly offset by deficits in other common operations. However, it’s important to remember that based on your use case, the type of collection you should use normally picks itself. Use a list if you just need a List to keep track of items. Use a Dictionary if you require hash lookup against some value. Use a hash set if you need to perform set operations.
References
Collection and data structure
https://learn.microsoft.com/en-us/dotnet/standard/collections/
Benchmark unit test
https://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/
*The tests are disponible on the link above, I created them using .Net 6.
About
Hello, my name is Fernando. I’m from Sao Paulo, Brazil. I have been working with .Net since 2006, with all versions and technologies, since version 2.0 of Framework. Currently I work as a Software Engineer at Softensity, always working with best practices and the latest technologies.