C++/STL
Expert: Ralph McArdell - 8/30/2009
Questioni can't understand the code:
std::sort(theList.begin(), theList.end(), compare);
the compare function get just two strings(that are our keys)then if the first is greater, the compare is true in this function:
int compare(const string& s1, const string& s2)
{
string a1 = s1.substr(0, s1.find(' ') );
string a2 = s2.substr(0, s2.find(' ') );
return a1 > a2;
}
the sort function just sorts these two strings?
i don't know after sorting the first two strings what does the functiosort n do?
does it then get the next two stings and sort them?
but in my example i don't understand how it works,plz just check it out:
7 , 5 , 3 , 2
( 5 , 7) , 3 , 2 <---------- the first two strings
5 , 7 , (2 , 3) <----------the next two strings
but as we see it the numbers didn't sort.
so how does the sort function work?
and could you plz tell me what exactly does compare do in the code below?
int x = std::binary_search(theList.begin(), theList.end(), value, compare);
Thanx
Bita
AnswerThe std::sort algorithm works with random access iterators - so if theList is an actual linked list - e.g. std::list<std::string> - then you cannot use std::sort with it.Instead you have to use the specially provided std::list-types' sort member functions.
Secondly you show code that presumes, and talk about, elements in theList being std::string instances but then show a set of elements that appear not to be strings but integers. As your compare function seems to only use part of a string and a space is used in determining these substrings (wrongly - see below) and you show your set of elements like so:
7 , 5 , 3 , 2
I cannot determine which spaces are part of the strings and which parts of the question text. Please place such strings in quotes as per C/C++ literal string style to make such ambiguities clear, e.g. do you mean:
"7" , "5" , "3" , "2"
or maybe:
" 7 "," 5 ", " 3 ", " 2 "
You will I hope appreciate that the two (and more) possible sets of string values from the values you showed will in this case have an effect on _which_ substrings are compared.
Now as far as I can work out your compare function is trying to compare the first 'word' of each string passed to it against the other - specifically whether the first word of the first string is greater than the first word of the second. Point: the operation functions passed to std::sort are binary predicates - that is they take two values and return a true/false Boolean value. Why then does your function return int rather than bool?
Now to the part you have wrong:
Oddly while the std::string find functions return index position values, the std::string substr functions take start position and _length_ of substring to return. They do not as you are doing here take start position and end position. Also remember that the find functions can return std::string::npos if the searched for item(s) is/are not found.
Now if no sort criteria (i.e. binary predicate function) are given to std::sort then std::less is used which returns true if item 1 is less than item 2, which in the case of the sequence of values "7" , "5" , "3" , "2", will yield the sequence "2", "3", "5", "7".
If then we reduce your compare function to:
bool compare(const std::string& s1, const std::string& s2)
{
return s1 > s2;
}
Then the sequence "7" , "5" , "3" , "2" would yield "7" , "5" , "3" , "2", that is the values are _already_ in the required order.
As to which values in the sequence std::sort presents to the comparison predicate function (and in what order) - well that depends on the exact sort algorithm(s) employed (the C++ standard guarantees that on average n-log-n comparisons performance and that std::sort not necessarily stable, see
http://en.wikipedia.org/wiki/Sorting_algorithm for some possible sort algorithms that fulfil these criteria). In short your compare function will be called with values where ever in a sort algorithm you see some sort of comparison on element values. For example if we look at the merge sort algorithm given at
http://en.wikipedia.org/wiki/Merge_sort specifically the initial pseudo code for the merge_sort function then we see the line:
if left.last_item > right.first_item
In this case, as we work with less rather than greater as the default we would be using:
if right.first_item < left.last_item
And in the case of actually requiring a less than comparison we would expect something equivalent to:
if ( std::less(right.first_item, left.last_item) )
To appear in the sort algorithm code.
In the case of providing your compare function then we would end up with something akin to:
if ( compare(right.first_item, left.last_item) )
In fact it would probably look more like:
if ( fn(right.first_item, left.last_item) )
Where fn is the function parameter passed to the std::sort algorithm.
You might like to look at the Wikipedia pages for some of the other search algorithms for more examples.
Looking at a real example in the Microsoft Visual C++ 2008 standard library I see that this implementation seems to use a combination of quick sort, heap sort and insertion sort depending on data set size, and the first use of the predicate comparison function is during the quick sort divide and conquer logic in an internal function for sorting the median of three values - fist, mid, last - where, in the case of compare, it would be called with the values:
compare( mid, first );
compare( last, mid );
compare( mid, first);
If any results are true then those values are swapped. Are you any the wiser? No, neither am I - other than to appreciate that this sort of thing occurs in well optimised library code, that it is not going to be that easy to reverse engineer and that if I'd revise the quick sort partitioning logic it would probably make sense. If you really wish to then I suggest you read the contents of your std::sort function and follow around the logic - doing so would probably be instructive for you. You might like to try to obtain more than one implementation (e.g. for different C++ compilers, or maybe third party implementations such as STLport at
http://www.stlport.org/).
As to your use of std::binary_search - first, again, std::binary_search returns a bool value so why use an int?
The binary search algorithm works on sorted data sets. For best results the algorithm requires random access to the data sets - but the std::binary_search implementation will work with just a forward iterator (one that can only move forward one element at a time).
Once again std::less is used if no operation function is provided. Once again the operation is a binary predicate that takes two values and returns a bool.
In this case one of the values will be the value you pass to std::binary_search. The other will be one of the elements in the sequence. A binary search starts in (approximately) the central element. It then compares the value of this element to the value being searched for. If it is not equal to the value searched for (a not before b and b not before a) then the next element checked is either at around position 1/4 or 3/4 of the number of elements, depending on whether the value searched for is before (comparison predicate returns true) or after (comparison predicate returns false, and switching the arguments returns true). Again the result from the tests is used to determine whether the value has been found or weather to search nearer the start of the sequence or nearer the end. At each step the change in position is halved thus the steps are (starting at the begriming) : +(number of elements)/2, +/- (number of elements)/4, +/- (number of elements)/8... until the step size is less than 0.5 which leads to some complexity when working with integer index and offset values; the +/- means plus or minus - i.e, add or subtract the amount to the current position. Here is a simple example using std::less (as your compare function seem broken I see little point using it for explanatory purposes):
Given a sequence of sorted integers using less than as the is before criteria:
1 3 5 7 9 11 13 15 17 19 21
In which we have 11 values.
We are looking to see if 19 is in the (sorted) sequence. To take care of the 0.5 requirement we will use a roundabout way of adding the 0.5 to integers. We want to do this:
truncate(n / d + 0.5)
Instead we shall do this using integer operations:
(2*n/d + 1)/2
That is we multiply the whole expression inside the truncate by 2 then divide result down by 2 again.
We have 11 elements so 2*n is 22.
The first element we shall look at is (22/2 + 1)/2 = 6, counting along we see this has the value 11:
19 < 11 is false so 19 is not before 11
11 < 19 is true so 11 is before 19 (or 19 is after 11)
Hence we need to increase our position. We add (22/4+1)/2 = 3 to our position, which moves us to the element at position 9 from the start, which is 17
19 < 17 is false so 19 is not before 17
17 < 19 is true so 17 is before 19 (or 19 is after 17)
Hence we need to increase out position. We add (22/8+1)/2 = 1 to our position which moves us to the element at position 10 from the start, which is 19
19 (searched for value) < 19 (element value) is false so 19 is not before 19
19 (element value) < 19 (searched for value) is false so 19 is not before 19
Hence we have found the element we are looking for, return true.
Now let's search for 23:
The first element we shall look at again is (22/2 + 1)/2 = 6, the element with the value 11:
23 < 11 is false so 23 is not before 11
11 < 23 is true so 11 is before 23 (or 23 is after 11)
Hence we need to increase out position. Adding (22/4+1)/2 = 3 to our position, which moves us again to the element at position 9 from the start, which is 17
23 < 17 is false so 23 is not before 17
17 < 23 is true so 17 is before 23 (or 23 is after 17)
Hence we need to increase out position. Adding (22/8+1)/2 = 1 to our position moves us again to the element at position 10 from the start, which is 19
23 < 19 is false so 23 is not before 19
19 < 23 is true so 19 is before 23 (or 23 is after 19)
Hence we need to increase out position. Adding (22/16+1)/2 = 1 to our position moves us to the element at position 11 from the start, which is 21
23 < 21 is false so 23 is not before 21
21 < 23 is true so 21 is before 23 (or 23 is after 21)
Hence we need to increase out position. We would want to add (22/32+1)/2 = 0 to our position, but the change is zero so we reached the end of the search without locating the value we were looking for and false is returned.
Notionally, where you see a less than comparison std::less is called [as a function is required, the operator < cannot be used directly, so it is wrapped, as with other operators, in a function (template) std::less]. In fact an implementation may provide a totally separate function (and related internal functions) for the default case and use the < operator directly.
To see how it would work with your compare function just replace the sequence of values with those from your sorted sequence, then replace the parts of above of the form "v1 < v2 is" with "compare(v1, v2) returns".
However I think you need to fix the bugs in compare first.
Hope this has given you some insight into the sorts of things going on.