You are here:

C++/About file handling in C++

Advertisement


Question
Kindly requested that can you tell me How to sort the 100 records in the file in C++

Answer
Not definitively. This is because they are many ways one could approach this problem. The exact solution would depend on the specific circumstances - what other code were available, how big the records were, etc.

However that having been said the most obvious solution would be to read all the 100 records into memory, sort them, then write them out to file again - now in sorted order.

The other technique would require sorting them in place in the file. (Then there would be combinations of the two...).

So, using the first method, and using the facilities in the C++ standard library:

   - Create a (sequence) collection of records
   - Open the file containing the record data
   - Read the data into the collection (unsorted)
   - Sort the data in the collection
   - Write the data back to file (sorted).
     Depending on your requirements this could mean overwriting
     the same file as was read from or writing to a completely
     different file.

A variation on this theme would be:

   - Create a (associative) collection of records
   - Open the file containing the record data
   - Read the data into the collection, (sorted)
   - Write the data back to file.
     Depending on your requirements this could mean overwriting
     the same file as was read from or writing to a completely
     different file.

The difference between the two is the type of container used. In the first scheme we use a sequence container like a vector or list. The data in such a container would be initially arranged in the unsorted order the records were read from the file, and then would be re-arranged by sorting explicitly.

In the second scheme we use an associative container such as a set or map. Such containers keep their member items in a sorted order as items are inserted in them.

In both cases we require a record type that allows record objects to be read from and written to file and can be copy constructable and assignable and can be destroyed by a destructor (which may be the compiler supplied default), and can have the records compared to determine which of two records comes before the other (e.g. by defining a less operation or operator<). Additionally it would probably be useful if objects of the record type could be default constructable and define an equality operation (operator==).

OK, here is an example class:

   #include <string>
   #include <istream>
   #include <ostream>

   class Record
   {
   friend std::ostream & operator<<(std::ostream& out, Record const & rec);
   friend std::istream & operator>>(std::istream & in, Record & rec);
   friend bool RecordLessById(Record const & rec1, Record const & Rec2);
   friend bool RecordLessByName(Record const & rec1, Record const & Rec2);

       unsigned int iId;
       std::string  iName;

   public:
       Record() : iId(0) {}  // define default constructor

       Record(unsigned int id, std::string const &  name)
       : iId(id), iName(name)
       {}  

   // Rely on compiler defaults for copy, assignment and destruction

   // Two records are equal if both the id and name are equal
       bool operator==( Record const & rhs ) const
       {
         return iId==rhs.iId && iName == rhs.iName;
       }

   // if can compare equal then we should allow for compare not equal
       bool operator!=( Record const & rhs ) const
       {
         return ! (*this == rhs);
       }
   };

   std::ostream & operator<<(std::ostream & out, Record const & rec)
   {
       out << rec.iId << ' ' << rec.iName;
       return out;
   }

   std::istream & operator>>(std::istream & in, Record & rec)
   {
       in >> rec.iId;
       in >> rec.iName;
       return in;
   }

   bool RecordLessById( Record const & rec1, Record const & rec2 )
   {
       return rec1.iId < rec2.iId;
   }

   bool RecordLessByName( Record const & rec1, Record const & rec2 )
   {
       return rec1.iName < rec2.iName;
   }

In this simple example I have used a really simple record having 2 fields: an integer id and a textual name.

I provide very simple operator<< and operator>> to write and read record objects. I have left out all error checking for simplicity of the example.

I have not provided an operator<. Instead I provide two functions RecordLessById and RecordLessByName, both of which perform a less than comparison of data in the two records passed in. However, as is common when sorting records, sometimes you wish to sort by one field and sometimes by another field. This is what these two functions achieve: RecordLessById compares using < the ids (iId members) of two records, while RecordLessByName compares the name fields (iName members) of two records. These when used with a sort operation will sort ascending. To sort descending use similar functions that compare using > (greater than) instead.

We can use the class to create some data:

   #include <boost/lexical_cast.hpp>
   #include <fstream>

   void CreateDataSet( char const * pathName, unsigned int nRecs )
   {
       char const * Names[]={ "Zoe", "Bob", "Joe", "Kev", "Ben", "Kim",
         "Amy", "Pen", "Tim", "Ria", "Ali", "Viv"
         };

       unsigned int id(67);

       unsigned int const NumberOfNames
         (sizeof(Names)/sizeof(char const *));

       std::string name;
       std::ofstream out(pathName);

       for (unsigned int i(0); i<nRecs; ++i, id = (id + 1) % nRecs )
       {
         name = Names[i%NumberOfNames];
         name += "-";
         name += boost::lexical_cast<std::string>
         ((id)%(NumberOfNames-3));
         Record rec( id, name );
         out << rec << '\n';
       }
   }

Some of this code is a bit arbitrary and odd but produces a file like so:

67 Zoe-4
68 Bob-5
69 Joe-6
70 Kev-7
71 Ben-8
72 Kim-0
73 Amy-1
74 Pen-2
75 Tim-3
76 Ria-4
77 Ali-5
78 Viv-6
79 Zoe-7
   .
   .
   .
57 Amy-3
58 Pen-4
59 Tim-5
60 Ria-6
61 Ali-7
62 Viv-8
63 Zoe-0
64 Bob-1
65 Joe-2
66 Kev-3

The point being that neither ids or names are in order. The boost::lexicial_cast is a useful utility from the Boost libraries (see http://www.boost.org/, http://www.boost.org/libs/libraries.htm, and particularly http://www.boost.org/libs/conversion/index.html) that allows converting between string types and other types using operator>> and operator<<. Thus I use it as a quick way to convert a number to text.

It can be called like so to write 100 records to a file called records.txt:

       CreateDataSet("records.txt", 100);

Next we need to use some sort of sequence container to hold the records in memory while they are being sorted. I chose to use a std::vector for this job, in fact a std::vector<Record>. I define a type alias for this type, and use the alias thereafter:

   #include <vector>

   typedef std::vector<Record>  RecordSequence;

We can fill in such a sequence from a file like so:

   void ReadRecordSet(  char const * pathName, RecordSequence & recSet )
   {
       std::ifstream in(pathName);

       Record rec;

       for ( in >> rec; in; in >> rec )
       {
         recSet.push_back( rec );
       }
   }

Basically we open the requested file then keep trying to read records into the local rec object until there is some problem with the stream, hopefully this will be end of file, again for this example I have kept error checking to a minimum for clarity and brevity.

Once a rec object has been read I add it to the end of the items in the recSet collection. We have to use push_back to create new items in the collection.

ReadRecordSet is used like so:

       RecordSequence recordSet;
       ReadRecordSet("records.txt", recordSet);
       DisplayRecords(recordSet);

I use another short function to display the current contents and order of items in the record set:

   void DisplayRecords( RecordSequence const & recSet )
   {
       std::cout << "Record Set size: " << recSet.size() << '\n';
       for ( unsigned int i(0); i < recSet.size(); ++i )
       {
         std::cout << recSet[i] << '\n';
       }
   }

This is useful to see the immediate effect of operations on the data!

Next we need to sort the data. Luckily for us the C++ standard library comes with various sorting functions, of which I show just the basic sort here:

   #include <algorithm>

   // ...

       std::sort( recordSet.begin(), recordSet.end(), RecordLessById );
       DisplayRecords(recordSet);

For the reasons stated earlier I did not provide the Record class with an operator<. If I had then we could use the simpler:

       std::sort( recordSet.begin(), recordSet.end() );

As it is I provided the two functions RecordLessById and RecordLessByName instead. As these are different from the default we have to specify their use explicitly as a third argument to std::sort. Note the use of recordSet.begin() and recordSet.end(), this defines the range of items we wish sorted. In this case the whole collection. Note that end, by convention returns one past the end. All such sequence ranges in the C++ standard library are specified from the first item's position to a position of one past the last item in the sequence.

I call DisplayRecords after sorting so we can see that the data is in fact sorted.

Now we need to write the sorted data back to a file:

   void WriteDataSet
        (char const * pathName, RecordSequence const & recSet)
   {
       std::ofstream out(pathName);

       RecordSequence::const_iterator pos(recSet.begin() );
       RecordSequence::const_iterator finish( recSet.end() );
       for (; pos!=finish; ++pos )
       {
         out << *pos << '\n';
       }
   }

This is much like the reverse of the read case. A file is opened for writing and we iterate from the beginning to the end of the sequence writing records to the file. Note that positions in the C++ standard library are represented by things called iterators. They can be considered a generalisation of pointers. Like pointers the can be de-referenced to obtain the item at the position they represent.

This can be used like so:

       std::sort( recordSet.begin(), recordSet.end(), RecordLessById );
       DisplayRecords(recordSet);
       WriteDataSet( "records_by_id.txt", recordSet );

Now we can re-sort the records by name instead if we like:

       std::sort(recordSet.begin(), recordSet.end(), RecordLessByName);
       DisplayRecords(recordSet);
       WriteDataSet( "records_by_name.txt", recordSet );

OK, so what about the case of using an associative container like a std::set? In fact in this case multiset is better because it allows items having duplicate sort positions (keys in effect), whereas set does not, and some names are repeated here. If I wanted to ensure we only had one entry for each name then I would use std::set.

Well this is a variation on a theme. Starting as before with a typedef alias for the multiset type, we might try the following:

   #include <set>

   typedef std::multiset<Record>  RecordSet;

Only to run into some probably quite nasty errors from the compiler. This is because associative containers are sorted, so they require the sorting comparison predicate operation up front when we create the type. The above cannot be used as the Record type has no operator< that is required by this default usage. OK next we try specifying one of the two function names we do have for comparison as the operation to use by default:

   typedef std::multiset<Record, RecordLessByName>  RecordSet;

Only to get more nasty errors! This is because the template parameters for the std::set/std::multiset class templates specify a type not a function or pointer to function for this template parameter. This forces us to either go with the flow and define an operator< or to provide our own type as glue-code around RecordLessByName:

   struct CmpRecordsByName
   : public std::binary_function<Record, Record, bool>
   {   
       bool operator()(const Record& left, const Record& right) const
       {   
         return RecordLessByName(left, right);
       }
   };

Basically all the type needs to provide is a member operator() of the correct type. Such types are called functors and operator() is the call operator. This is because objects of such types can be used like functions:

   CmpRecordsByName cmpObject;
   if ( cmpObject( rec1, rec2 ) )
   {
   // do something
   }
   else
   {
   // do something else
   }

OK so that provides enough glue to define and use our (multi-)set type alias:

   typedef std::multiset<Record, CmpRecordsByName>  RecordSet;

Next we need revised versions of the read, write and display functions:

   void ReadRecordSet(  char const * pathName, RecordSet & recSet )
   {
       std::ifstream in(pathName);

       Record rec;

       for ( in >> rec; in; in >> rec )
       {
         recSet.insert( rec );
       }
   }

   void WriteDataSet( char const * pathName, RecordSet const & recSet )
   {
       std::ofstream out(pathName);

       RecordSet::const_iterator pos(recSet.begin() );
       RecordSet::const_iterator finish( recSet.end() );
       for (; pos!=finish; ++pos )
       {
         out << *pos << '\n';
       }
   }

   void DisplayRecords( RecordSet const & recSet )
   {
       std::cout << "Record Set size: " << recSet.size() << '\n';
       RecordSet::const_iterator pos(recSet.begin() );
       RecordSet::const_iterator finish( recSet.end() );
       for (; pos!=finish; ++pos )
       {
         std::cout << *pos << '\n';
       }
   }

As you can see these are pretty similar to the previous versions, although I switched to using iterators in DisplayRecords. We can now read and sort records from file into a multiset and display and write out the results:

       RecordSet recordSet;
       ReadRecordSet("records.txt", recordSet);
       DisplayRecords(recordSet);
       WriteDataSet( "records_by_name.txt", recordSet );

One thing you might observe is that the order of the records when sorted by name differs slightly between the sequence case and the set case. This only affects items that have the same name. Records having the same name value may end up in different positions:

49 Ali-4
13 Ali-4

Or:

13 Ali-4
49 Ali-4

For example. This can be fixed by: 1: sorting the sequence case _only_ by name, and not by id first. 2/ using stable_sort rather than sort, which preserves the relative order of equal-items, for the sequence case.

Finally a note about sorting in-file without reading and writing to memory first. I think it could be done using similar techniques to those presented here but you would need a random access file-iterator that matched the criteria of random access iterators used by functions such as std::sort. This is not impossible, but the stream iterators provided with the standard library assume very basic stream operations - and are not therefore random access iterators! (Random access means being able to randomly move from one position straight to another position without having to go to every position in between). Hence you would have to write your own random access file iterator.

To learn more about all the capabilities of C++ standard library containers, iterators, algorithms (such as std::sort), streams, etc, etc, ... I suggest you obtain a good text on the subject such as the excellent "The C++ Standard Library A Tutorial and Reference" by Nicolai M. Josuttis.


Hope this is of use.  

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000. XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

Education/Credentials

©2016 About.com. All rights reserved.