C++/How to remove duplicates in a vector
Expert: Ralph McArdell - 4/6/2011
QuestionI am writing a program that is suppose to remove duplicates in a vector. I am having trouble writing the remove_duplicates function. I am suppose to go through all the elements and check to see if the same value occurs at a lower index. If true, remove the element.
Here is my code so far..
#include <iostream>
#include <vector>
using namespace std;
void remove_element(int pos, vector<int>&a)
{
for(int i = pos; i < a.size() - 1; i++)
{
a[i] = a[i + 1];
a.pop_back();
}
}
void remove_duplicates(vector<int>& a,vector<int>& b)
{
for(int k = 0; k < a.size()-1; k++)
{
if(a[k] == a[k+1])
}
}
void printVector(vector<int> a)
{
for (int i = 0; i < a.size(); i++)
cout << a[i] << " ";
cout << "\n";
}
int main()
{
vector<int> empty;
vector<int> v1(9);
v1[0] = 1; v1[1] = 4; v1[2] = 9; v1[3] = 16;
v1[4] = 9; v1[5] = 7; v1[6] = 4; v1[7] = 9; v1[8] = 11;
vector<int> v2(7);
v2[0] = 11; v2[1] = 11; v2[2] = 7; v2[3] = 9; v2[4] = 16;
v2[5] = 4; v2[6] = 1;
remove_duplicates(empty);
printVector(empty);
printVector(v1);
remove_duplicates(v1);
printVector(v1);
printVector(v2);
remove_duplicates(v2);
printVector(v2);
}
AnswerWell your remove_duplicates function only checks for this element equalling the next element in the sequence not all other elements in the sequence.
It is also incorrect when k==a.size()-1, then element k+1 is accessed but there is _no_ element k+1, as elements are indexed from 0 to a.size()-1.
Your approach will require an inner loop, from k+1 to != a.size():
Your outer loop only needs to have k to go up to k != a.size() - 2 as the last element has no elements further along to check:
for( unsigned int k = 0; k != a.size()-2; ++k)
{
for ( unsigned int l = k; l != a.size()-1; ++l )
{
if ( a[l] == a[k] )
{
// remove the element...
}
}
}
Now the std::vector class template has an erase member that can erase a single element or a range of elements.
The C++ standard library also has an algorithm function template that can remove all instances of a specific value from a sequence of values. However it does not really remove the values - it just copies the values later along the sequence into the 'holes' left by the removed values and returns the position of the new supposed end. We can use this together with the erase member function of vector to 'remove' all the various duplicate elements from the vector then properly erase them by passing the new supposed end of the vector to erase as the first element to erase and the end of the vector as one-past-the-end to stop at (all sequence ranges are specified as first to one-past-last in C++ standard library ranges, they are also specified using things called iterators which can be thought of as positions or generalised pointers to the underlying elements).
Thus to remove and erase a single known value from all elements of a vector we would write something like so:
a.erase( std::remove(a.begin(), a.end(), the_value_to_remove), a.end() );
To see what is going on suppose the vector a contained the values:
3 5 2 7 6 4 5 9 1 5 0 :end:
where :end: represents a one-past-the-end position placeholder returned by a.end().
If the_value_to_remove == 5 then after executing:
std::remove(a.begin(), a.end(), the_value_to_remove)
all the elements having the value 5 will be removed and the other elements will be copied around to fill the holes viz:
3 2 7 6 4 9 1 0 :1: 5 0 :end:
The returned position will be where the new one-past-the-end end point should be - which will point to the element I showed between colons: :1:.
Passing this element's position and the current a.end() position to a.erase() will erase these now unneeded elements, thus after executing:
a.erase( std::remove(a.begin(), a.end(), the_value_to_remove), a.end() );
we have a vector with elements:
3 2 7 6 4 9 1 0 :end:
Now this is all very interesting I hear you say, if a bit over the top, but how does this help you to remove duplicates?
Well first the sequence range passed to remove does not have to start at the first element of a container ( such as. a.begin() ), nor finish at the end ( such as a.end() ).
Second we do not have to call erase after every use of remove - we can continue removing elements until we are done.
For a fuller example please read through my answer to a similar question:
http://en.allexperts.com/q/C-1040/Delete-duplicate-elements-vector.htm
Of particular relevance might be the last part from:
"If sorting the vector is unacceptable then things are a little more complex...."
Hope this helps
Oh, and if you are using the C++ standard library containers and hopefully the algorithms that work with them, then you might find a decent book on the subject of use. For all the C++ standard library a decent general purpose reference and tutorial is "The C++ Standard Library a Tutorial and Reference" by Nicolai M. Josuttis.