C++: Deques and sets
It takes longer to remove a randomly selected element from a deque of a given size than it takes to remove a randomly selected element from a set of the same size. True or False? Explain.
I cannot comment too authoritatively on such things as I am not an expert in the design and implementation of data structures.
However, in theory, yes: removing an element from a set is quite possibly quicker, unless you are removing from the beginning or end of the deque (for which removal is fast).
Sets are often node-based containers built on a tree structure, so removing a given node should be quite quick. However, if the removal causes additional work, such as re-balancing a balanced binary tree, then it may prove slower. A set could also be implemented using a hash table, using the set values as the hashed values; in such a case removing an item could be similar to removing an item from a deque or vector. Note that the C++ std::set complexity constraints imply an implementation using a balanced binary tree or similar (e.g. a red-black tree, as they are good both for varying the number of elements and for searching for elements).
Deques are often implemented as an array of arrays, so removing an element that is not at an end means shifting subsequent values along, at least for part of the structure.
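To make this concrete, here is a minimal sketch of my own (not taken from any particular implementation or text) of removing a randomly selected element from each container; the removeRandom overloads and the use of std::rand are purely illustrative assumptions:

#include <cstdlib>
#include <deque>
#include <iterator>
#include <set>

// Remove a randomly selected element from a deque. Erasing from the
// middle forces subsequent elements to be shuffled along in at least
// part of the structure - linear time.
void removeRandom(std::deque<int>& d)
{
    if (d.empty()) return;
    std::deque<int>::iterator p = d.begin();
    std::advance(p, std::rand() % d.size());
    d.erase(p);                 // O(n) unless p is at either end
}

// Remove a randomly selected element from a set. The erase itself only
// unlinks (and possibly re-balances around) a tree node - amortised
// constant time once we have the iterator. Walking to the iterator is
// a separate cost and is not counted here.
void removeRandom(std::set<int>& s)
{
    if (s.empty()) return;
    std::set<int>::iterator p = s.begin();
    std::advance(p, std::rand() % s.size());
    s.erase(p);                 // amortised O(1)
}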
However, it really depends on the implementations of the deque and set types, the size and state of the collections, and possibly even the system you are running on, as cache memory is much faster than main memory and main memory is very much faster than memory paged to and from disk.
As an illustration of this last point, consider how much additional memory the collection type requires for housekeeping. If the per-element overhead of a set is quite high compared to that of a deque, then the set could cause memory to be paged to disk where the deque does not. This is a real possibility.
Cache memory in particular rewards locality of data items, so, for example, operations on a vector may be faster than the same operations on a set if the vector's contiguous data fits in fast processor cache memory. I suspect a similar argument could hold for a deque, in certain situations at least.
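If you want to see the locality effect for yourself, a rough and entirely unscientific timing sketch along the following lines can be instructive; the element count and the use of std::clock and std::accumulate are arbitrary choices of mine:

#include <cstddef>
#include <ctime>
#include <iostream>
#include <numeric>
#include <set>
#include <vector>

// Traverse the same values stored contiguously (vector) and as tree
// nodes (set) and compare the processor time taken. Results depend
// heavily on cache sizes, the library implementation and optimiser
// settings, so treat this as a sketch, not a proof.
int main()
{
    const std::size_t count = 5000000;

    std::vector<int> v;
    std::set<int>    s;
    for (std::size_t i = 0; i != count; ++i)
    {
        v.push_back(static_cast<int>(i));
        s.insert(static_cast<int>(i));
    }

    std::clock_t start = std::clock();
    double vectorSum = std::accumulate(v.begin(), v.end(), 0.0);
    std::clock_t vectorTicks = std::clock() - start;

    start = std::clock();
    double setSum = std::accumulate(s.begin(), s.end(), 0.0);
    std::clock_t setTicks = std::clock() - start;

    std::cout << "vector sum " << vectorSum << " in " << vectorTicks << " clock ticks\n"
              << "set    sum " << setSum    << " in " << setTicks    << " clock ticks\n";
}

Expect the numbers to vary considerably from system to system, for the reasons given above.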
OK, having given you a bunch of ifs and buts above, the time complexity constraints for the standard C++ library std::deque and std::set erase operations are as follows.
In my descriptions below, p is an iterator that points to the element to be erased. This means the time complexity is _only_ for the erasure of a single element; no time is included for searching for the element with a given value, and no considerations for the removal of additional elements are included. (A short code sketch of these erase forms follows the list.)

std::deque::erase(p):

- if p points to an element at the beginning or end of the deque
  then the erase operation occurs in constant time (O(1));
- if p points to an element in the middle of the deque then
  the erase operation occurs in linear time (O(n)).

See the ISO C++ standard section 23.2.1 paragraph 1.

std::set::erase(p):

- erasure of the element pointed to by p occurs in amortised
  constant time (O(1)).

See the ISO C++ standard section 23.1.2 table 69.
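Here is the short sketch of those erase(p) forms promised above. The container contents are arbitrary and the code is only intended to label each call with the complexity that applies to it:

#include <deque>
#include <set>

int main()
{
    std::deque<int> d;
    std::set<int>   s;
    for (int i = 0; i != 10; ++i)
    {
        d.push_back(i);
        s.insert(i);
    }

    // Deque: erasing at either end is constant time...
    d.erase(d.begin());                    // O(1)
    d.erase(d.end() - 1);                  // O(1) - deque iterators are random access

    // ...but erasing in the middle is linear time.
    std::deque<int>::iterator dp = d.begin() + d.size() / 2;
    d.erase(dp);                           // O(n)

    // Set: erasing via an iterator is amortised constant time, wherever
    // the element happens to live in the tree. Note that the find used
    // to obtain the iterator is a separate, logarithmic-time, cost.
    std::set<int>::iterator sp = s.find(5);
    if (sp != s.end())
        s.erase(sp);                       // amortised O(1)
}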
Amortised constant time means that the time is constant on average (i.e. amortised over many operations), but any given single operation may take a longer amount of time.
The O(1) and O(n) values are constant and linear time complexity expressed using what is called Big-O notation. Other complexity values are logarithmic ( O(log(n)) ), n-log-n ( O(n*log(n)) ) and quadratic ( O(n*n) - note this should be n raised to the power 2 but the AllExperts answer formatting does not support superscripted characters as far as I can tell! ).
To get a handle on the effect on runtime of these complexity values, check out the following table (view the table using a mono-spaced font such as Courier or the formatting will be messed up):

Complexity  |             |           Number of Elements
Type        | Notation    |    1|    2|    5|   10|    50|    100|    1000
------------+-------------+-----+-----+-----+-----+------+-------+--------
Constant    | O(1)        |    1|    1|    1|    1|     1|      1|       1
Logarithmic | O(log(n))   |    1|    2|    3|    4|     6|      7|      10
Linear      | O(n)        |    1|    2|    5|   10|    50|    100|    1000
n-log-n     | O(n*log(n)) |    1|    4|   15|   40|   300|    700|   10000
Quadratic   | O(n*n)      |    1|    4|   25|  100|  2500|  10000| 1000000
The above was borrowed from the excellent book "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis (page 22).
As you can see, you do not want to use algorithms or operations with a complexity worse than linear for anything but the smallest of datasets, as the time to perform them rises alarmingly with the number of elements!
Note that the absolute time taken to perform these operations is not given by such complexity types and notations. They only specify how much more time it would take to perform the operation on a collection of n items than it takes to perform the operation on a collection of one item.
Thus, if a constant-time look-up on some collection of a million items takes 2 weeks, it would still be slower in real elapsed time than a quadratic-time look-up on a collection of a million items where the cost for a single-item collection is only a millionth of a second: by my calculations the quadratic look-up would take 1000000 * 1000000 / 1000000 seconds, or 1000000 seconds, which is approximately 278 hours or 11.6 days. However, if both collections had ten million items then the constant-time operation would still take 2 weeks, but the quadratic-time operation would take 100 times longer (approximately 1157.4 days).
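If you want to check that arithmetic, it drops out of a few lines of code; the one-microsecond cost for a single-item collection is just the assumed figure from the example above:

#include <iostream>

int main()
{
    const double perItemSeconds = 1.0 / 1000000.0;  // assumed cost when n == 1

    // Quadratic operation on a million items: n * n * perItemSeconds.
    double million = 1000000.0 * 1000000.0 * perItemSeconds;      // 1000000 seconds
    std::cout << million / 3600.0          << " hours\n";         // ~277.8 hours
    std::cout << million / (3600.0 * 24.0) << " days\n";          // ~11.6 days

    // Quadratic operation on ten million items: 100 times longer.
    double tenMillion = 10000000.0 * 10000000.0 * perItemSeconds; // 100000000 seconds
    std::cout << tenMillion / (3600.0 * 24.0) << " days\n";       // ~1157.4 days
}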
Sorry if this is not the nice clean answer you were looking for but we do not live in a nice clean world!