You are here:

C++/Delete duplicate elements in vector

Advertisement


Question
Hi,

vector<string> str;

I have the vector filled with strings "a", "b", "c", "b", "c", "d", "a".
i.e
str[0] = a; str[1] = b; str[2] = c;.....and so on.

Now I would like to delete the duplicate elements in the above vector such that the vector only contains distinct elemets "a", "b", "c" and "d".

How can I do that??

Please help

Thanks!!

Answer
Answer:
-------

First I think your example line:

   str[0] = a; str[1] = b; str[2] = c;.....and so on.

is _not_ what you intended to say unless there are variables a, b, c, etc. that contain the string values "a", "b", "c", etc. I get the point however...

The solution is quite easy _if_ your vector is sorted: use the std::unique algorithm to remove consecutive duplicate values. So if you do not mind sorting the values first then that is the easiest solution:

   std::sort( str.begin(), str.end() );

   std::vector<std::string>::iterator new_end_pos;
   new_end_pos = std::unique( str.begin(), str.end() );
   
   // The elements between new_end_pos and str.end() hold
   // hold their old values, and must be erased to complete
   // the operation:

   str.erase( new_end_pos, str.end() );

You will note that unique returns a position. This position is the new end position in the collection. Like other C++ library algorithms that remove elements it does not really remove any values, but copies over removed values with values that follow, and everything shuffles up, and the "removed" values are left at the end. So for example in this case we start with:

   "a", "b", "c", "b", "c", "d", "a"

Which when sorted becomes:

   "a", "a", "b", "b", "c", "c", "d"

Then after unique has run the vector contains:

   "a", "b", "c", "d", "c", "c", "d"

And the new_end_pos points to the second "c" at position index 4 (counting from zero as normal for C/C++ array indexes), which as is usual for all C++ standard library end positions is in fact the one-past-the-end position value.

To really erase these values we use the returned new end position and the existing end position as arguments to the vector's erase operation, which leaves the vector as we would expect:

   "a", "b", "c", "d"

With a new size of 4.

If sorting the vector is unacceptable then things are a little more complex. We can use the std::remove algorithm to remove elements of a specific value from a collection. So we would have to repeatedly call remove, using the value of the preceding element on the remaining elements of the vector.

So initially we remove all elements from the second element to the end of the vector that have a value equal to that of the first element. Like std::unique, std::remove returns the new end position, so next we call remove to remove values equal to the second element's value from the third element to the new end position. We continue in this fashion until the range we are checking has no elements in it - that is we start at the latest end position.

The code would look something like this:

   // initially the end of the range we are removing from
   // is the end of the str vector:
   std::vector<std::string>::iterator end_pos( str.end() );

   // We define a position start_pos, initially the position of the
   // first element. This can be used to obtain the value to remove
   // and define the start position of the range when incremented:
   std::vector<std::string>::iterator start_pos( str.begin() );

   std::string value( *start_pos ); // set initial value to remove

   // Increment start position, then see if we still have work to
   // do by seeing if the remove range has any items in it:
   while (++start_pos != end_pos)
   {
       end_pos = std::remove( start_pos, end_pos, value );
       value = *start_pos;
   }

   // As before we need to call erase to really erase the
   // removed elements
   str.erase( end_pos, str.end() );

Notice that we only have to call erase after all the calls to std::remove have been made. The ways remove and erase work make a little more sense when seen like this...

Note that you should include the standard header <algorithm> to make use of std::sort, std::unique, std::erase, std::remove and most of the other algorithms supplied with the C++ standard library that work on sequences.

For more information on the C++ standard library please refer to a decent reference such as Nicolai M. Josuttis' book "The C++ Standard Library A Tutorial and Reference" - my copy is one of my most used C++ references.

For you to play with, here are the methods above placed into full functions:

   void SortUnqiueErase()
   {
       std::vector<std::string> str;
       str.push_back("a");
       str.push_back("b");
       str.push_back("c");
       str.push_back("b");
       str.push_back("c");
       str.push_back("d");
       str.push_back("a");
       PrintElements( str, "Initial sequence: " );

       std::sort( str.begin(), str.end() );
       PrintElements( str, "Sorted sequence: " );

       std::vector<std::string>::iterator new_end_pos;
       new_end_pos = std::unique( str.begin(), str.end() );
       PrintElements( str, "Sequence after std::unique: " );

       str.erase( new_end_pos, str.end() );
       PrintElements( str, "Sequence after str.erase: " );
   }

   void RemoveErase()
   {
       std::vector<std::string> str;
       str.push_back("a");
       str.push_back("b");
       str.push_back("c");
       str.push_back("b");
       str.push_back("c");
       str.push_back("d");
       str.push_back("a");
       PrintElements( str, "Initial sequence: " );

       std::vector<std::string>::iterator end_pos( str.end() );

       std::vector<std::string>::iterator start_pos( str.begin() );

       std::string value( *start_pos );
       while (++start_pos != end_pos)
       {
         end_pos = std::remove( start_pos, end_pos, value );
         value = *start_pos;
       }
       PrintElements( str, "Sequence after std::remove calls: " );

       str.erase( end_pos, str.end() );
       PrintElements( str, "Sequence after str.erase: " );
   }

   int main()
   {
       SortUnqiueErase();
       RemoveErase();
   }

The PrintElements function is a useful utility function template that takes a collection and outputs each element to std::cout, it uses the std::for_each algorithm with a simple additional function template to print each element:

   template <typename T>
   void PrintItem( T const & v )
   {
       std::cout << v << ' ';
   }

   template <class CT>
   void PrintElements( CT const & coll, char const * optMsg="" )
   {
       std::cout << optMsg;
       std::for_each( coll.begin()
         , coll.end()
         , PrintItem<typename CT::value_type>
         );
       std::cout << '\n';
   }

Do not worry if you do not get all the syntax - the tricky bits are just a result of wanting to make the function generic so it can handle any collection of any type that can be output using the IOStream stream insertion operator (operator<<).

I built and ran the code on Windows Vista using the MSVC++ 2005 (a.k.a. version 8.0) and on Ubuntu 64-bit Linux with g++ 4.1.2.

Hope this is of use, and have fun!  

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000. XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

Education/Credentials

©2016 About.com. All rights reserved.