- C++: Deques and sets


It takes longer to remove a randomly selected element from a deque of a given size than it takes to remove a randomly selected element from a set of the same size. True or False? Explain.

Thank you

I cannot comment too authoritatively on such things as I am not an expert in the design and implementation of data structures.

However, in theory, yes: removing an element from a set is quite possibly quicker, unless you are removing from the beginning or end of the deque (for which removal is fast).

Sets are often node-based containers built on a tree structure, so removing a given node should be quite quick. However, if the removal causes additional work, such as re-balancing of a balanced binary tree, then it may prove slower. A set could also be implemented using a hash table, with the set values as the hashed keys; in that case removing an item could be similar to removing an item from a deque or vector. Note that the C++ std::set complexity constraints imply an implementation using a balanced binary tree or similar (e.g. a red-black tree, as these are good both for varying the number of elements and for searching for elements).

Deques are often implemented as an array of arrays, so removing an element that is not at an end means that some re-jigging of subsequent values is required, at least for part of the structure.

However, it really depends on the implementations of the deque and set types, the size and state of the collections, and possibly even the system you are running on: cache memory is much faster than main memory, and main memory is very much faster than memory paged to and from disk.

As an illustration of this last point, consider how much additional memory the collection type requires for housekeeping. If the per-element overhead of a set is quite high compared to a deque, then the set could cause memory to be paged where the deque does not. This is a real possibility - see for example:

http://www.gotw.ca/publications/mill14.htm.

Cache memory in particular rewards locality of data items, so for example operations on a vector may be faster than using a set if the contiguous data in the vector fits in fast processor cache memory. I suspect a similar argument could hold for a deque in certain situations at least.

OK, having given you a bunch of ifs and buts above, the time complexity constraints for the standard C++ library std::deque and std::set types are as follows.

In my descriptions below, p is an iterator that points to the element to be erased. This means the time complexity is _only_ for the erasure of a single element: no time is included for searching for an element with a given value, and no considerations for removal of additional elements are included.

std::deque<T>::erase(p):

- if p points to an element at the beginning or end of the deque,
  then the erase operation occurs in constant time (O(1)).
- if p points to an element in the middle of the deque,
  then the erase operation occurs in linear time (O(n)).

See the ISO C++ standard section 23.2.1 paragraph 1.

std::set<T>::erase(p):

- erasure of the element pointed to by p occurs in amortised
  constant time (O(1)).

See the ISO C++ standard section 23.1.2 table 69.

Amortised constant time means that the time is constant on average (i.e. amortised over many operations), but any given single operation may take longer.

The O(1) and O(n) values are constant and linear time complexity expressed using what is called Big-O notation. Other complexity values are logarithmic ( O(log(n)) ), n-log-n ( O(n*log(n)) ) and quadratic ( O(n*n) - note this should be n raised to the power 2 but the AllExperts answer formatting does not support superscripted characters as far as I can tell! ).

To get a handle on the effect on runtime of these complexity values, check out the following table (view it using a mono-spaced font such as Courier or the formatting will be messed up):

Complexity |           |      Number of Elements
Type       | Notation  |  1|  2|  5| 10|  50|  100|   1000
-----------+-----------+---+---+---+---+----+-----+-------
Constant   |O(1)       |  1|  1|  1|  1|   1|    1|      1
Logarithmic|O(log(n))  |  1|  2|  3|  4|   6|    7|     10
Linear     |O(n)       |  1|  2|  5| 10|  50|  100|   1000
n-log-n    |O(n*log(n))|  1|  4| 15| 40| 300|  700|  10000
Quadratic  |O(n*n)     |  1|  4| 25|100|2500|10000|1000000

The above was borrowed from the excellent book "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis (page 22).

As you can see you do not want to use algorithms or operations that have a complexity worse than linear for anything but the smallest of datasets as the increase in time to perform them rises alarmingly with the number of elements!

Note that such complexity types and notations do not give the absolute time taken to perform these operations. They only specify how much more time it takes to perform the operation on a collection of n items than on a collection of one item.

Thus, if it takes 2 weeks to look up an item in constant time in some collection of a million items, then this would still be slower in real elapsed time than looking up an item in a collection of a million items using an operation with quadratic time complexity, if the quadratic lookup takes only a millionth of a second on a collection of one item (by my calculations the quadratic lookup would take 1000000 * 1000000 / 1000000 seconds, or 1000000 seconds, which is approximately 278 hours or 11.6 days). However, if both collections had ten million items, then the constant time operation would still take 2 weeks, but the quadratic time operation would take 100 times longer (approximately 1157.4 days).

Sorry if this is not the nice clean answer you were looking for but we do not live in a nice clean world!

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines.
Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is).
When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD.
My more recent contracts have been more traditionally IT based, working predominantly in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes.
I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.