Question
QUESTION: Hi Ralph McArdell,

I am reading a text file in C++. There are 13 columns in a row and 3370 lines. I need only the 4th and 8th columns to write to an output file. I am using getline to read a line. Let me show you how I am reading it.

for(int LINE=0; LINE<=0;)
{
if(firstfile.eof() !=0)
{          
exit(1);
}
firstfile.getline(line,80);          
istream&seekg(ios::cur);          
firstfile.getline(line,31);          
fout.write(line,31);          
{
istream&seekg(ios::cur);          
firstfile.getline(line,17);          
{
istream&seekg(ios::cur);          
firstfile.getline(line,4,',');          
firstfile.getline(line,63);          
{
istream&seekg(ios::cur);          
firstfile.getline(line,31,',');          
firstfile.getline(line,SIZE);          
fout.write(line,31);          
   }
 }
}
 
It works properly in some places but not everywhere.
Here is sample of file.

Universal      ,General          ,System Wide          ,Database_Schema_Data_Version   ,Char          ,NULL,TimesTen Database schema and provisioned data version number   ,4.20.38.0          ,NULL,NULL,NULL,0,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfNodes          ,Integer        ,NULL,Max number of nodes per type          ,63          ,64          ,5          ,128          ,1,NULL
Universal      ,General          ,System Wide          ,CaleaFlag          ,String         ,NULL,CALEA Falg for Backward Comp          ,1.6          ,64          ,5          ,128          ,1,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfMscNodes          ,Integer        ,NULL,Max number of nodes per type          ,63          ,NULL,NULL,NULL,1,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfMsfNodes          ,Integer        ,NULL,Max number of nodes per type          ,64          ,NULL,NULL,NULL,1,NULL

It works properly up to here, but when this type of line comes it shows the wrong output:

CPSNodes       ,Call Processing          ,Call Manager          ,defaultSilenceToneId          ,Integer        ,NULL,NULL,0          ,NULL,NULL,NULL,1,NULL


Please help me.
Thanks


ANSWER: This is a rather odd piece of code. For a start, what are all these supposed calls to the stream seekg operation?

   istream&seekg(ios::cur);

Surely these should be something like:

   firstfile.seekg(position, ios::cur);

where position is the required position relative to the current file get position.

As it stands these lines are erroneous and I cannot see the need for them.
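For illustration, here is a minimal sketch of a relative seekg in the form described above; charAfterSkip is a hypothetical helper, not part of the posted code:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: skip `offset` characters forward from the current
// get position using a relative seek, then return the next character read.
char charAfterSkip(std::istream& in, std::streamoff offset)
{
    in.seekg(offset, std::ios::cur); // seekg takes an offset AND a direction
    return static_cast<char>(in.get());
}
```

For example, given std::istringstream s("abcdef"), charAfterSkip(s, 2) skips "ab" and reads 'c'.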

Secondly, in your data file your first line's 8th column (if I counted correctly) has the odd value:

   4.20.38.0

which is not a number. I shall assume this is a typo.

You seem to be trying to guess where the column data is on each line based on magic values such as 80, 31, 17, 4 and 63 - what are these? Are these values you worked out by just looking at the file or are they actual supposed maximum field character widths? In either case you should make them named constants - if for no other reason than to help anyone else understand your code - me for example!

   std::streamsize const WidthOfFirstThreeFields(80);

   // ...

       firstfile.getline(line, WidthOfFirstThreeFields);

etc.

This sort of processing, as you have found, is very fragile. You only need one line that does not follow the assumed format to mess things up. In this case that happens with lines of the format you show failing. In your processing you read 63 characters for the 7th field. That covers the case where you actually have a value in this field, such as:

   "TimesTen Database schema and provisioned data version number   "
     
However, the failed line has:

   "NULL"

for this field, so reading 63 characters overshoots into the following field's data that you are interested in.

A quick examination of this line, compared with the others, would have shown this to you. Why you needed me to point out where the good and bad line data differ, and thus which part of the field-reading sequence was causing problems, I cannot fathom.
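To see the overshoot concretely, here is a small sketch; readFixed is a hypothetical helper mimicking the fixed-count getline calls in the question, and the line content is abbreviated from the tail of the failing record:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read up to width-1 characters, or to end of line,
// just as the fixed-width getline calls in the question do.
std::string readFixed(std::istream& in, std::streamsize width)
{
    char buf[64];        // large enough for this demonstration
    in.getline(buf, width);
    return std::string(buf);
}
```

Given the tail of the failing record, "NULL,0          ,NULL", a read of up to 63 characters consumes the short 7th field, the comma, and the 8th field's value 0 in one go, rather than stopping after "NULL".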

Also I cannot fathom why you insist on specifying a comma as the field delimiter for some fields but not others. It shows you know how to use getline to read up to a maximum number of characters unless some character other than end of line is reached first, so why do you _not_ specify all field reads in this fashion? Granted, for the first read you read three fields at once as far as I can tell, but you could save a lot of bother by reading with a large maximum field width for every field while specifying a comma as the delimiter for every field (except the last, of course). You then have to read all the fields and ignore the 1st, 2nd, 3rd, 5th, 6th, and 7th fields' data, but at least it is likely to work for all sensible variations of field widths:

   size_t const     LineSize(1024);
   streamsize const OutputFieldWidth(31);
   char const       FieldDelimiter( ',' );
   char const       RecordDelimiter( '\n' );

   char line[LineSize];

   for(;;)
   {
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #1 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #2 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #3 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #4 - keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.write( line, OutputFieldWidth );          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #5 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #6 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #7 - ignore          
       firstfile.getline(line,LineSize,FieldDelimiter);// Field #8 keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.write( line, OutputFieldWidth );          
       firstfile.getline(line,LineSize,RecordDelimiter); // Rest of record
         // - ignore
    }

Note the simplicity of the code. I have added minimal checking of stream state, only checking the streams are OK before writing any results. I have been a little belt-and-braces in breaking out of the loop if either a stream has failed or firstfile is at end of file, as usually reading past the end so that end of file is detected also causes a stream failure anyway!

You might like to consider writing out an end of line after each iteration, however it is your output data format so I'll leave that up to you to decide.

Also note that you can specify looping forever using a for-loop with empty loop control statements:

   for(;;)

and that eof and the other stream-state member functions return a bool and so do not require checking against zero.

I have used ISO standard C++, and included <fstream> and specified:

   using namespace std;

in keeping with the names used in your posted code.

I built the above code with MSVC++ 2005 and ran it with your example good and bad data lines. The output was as follows:

   Database_Schema_Data_Version   4.20.38.0          
   MaxNumberOfNodes          63          
   CaleaFlag          1.6          
   MaxNumberOfMscNodes          63          
   MaxNumberOfMsfNodes          64          
   defaultSilenceToneId          0          

I have wrapped the data onto one line per record, as if you had added the write of a newline at the end of each record, since that is easier for us humans to read. The spaces at the beginning of each line are primitive formatting I added for the purposes of my answer's text layout only; they do not form part of the output data.

Note that we can use the ignore std::istream operation to skip over data. This works just like getline except that it discards data up to the delimiter character. To specify reading up to the end of the file or to the next delimiter character we can use the value std::numeric_limits<int>::max() (include <limits>).

If you would rather not play around with fixed-length char buffers then you can use the std::getline functions that work with a stream and a std::string.

Here is a revised version using std::istream::ignore and std::getline and std::string:

   streamsize const OutputFieldWidth(31);
   char const       FieldDelimiter( ',' );
   char const       RecordDelimiter( '\n' );
   int  const       IgnoreMax( std::numeric_limits<int>::max() );

   string line;

   for(;;)
   {
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #1 - ignore          
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #2 - ignore          
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #3 - ignore          
       getline( firstfile, line, FieldDelimiter);   // Field #4 - keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.write( line.c_str(), OutputFieldWidth );          
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #5 - ignore          
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #6 - ignore          
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #7 - ignore          
       getline( firstfile, line, FieldDelimiter);   // Field #8 keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.write( line.c_str(), OutputFieldWidth );          
       firstfile.ignore(IgnoreMax, RecordDelimiter); // Rest of record          
         // - ignore
   }

In this case of course we have to include <string> and <limits>, and the line buffer is a std::string rather than an array of char, which will resize itself as appropriate. As we are using a std::string for line, we have to access its data as a character buffer, so I call c_str on line in the calls to fout.write. As we are writing fixed-length raw data rather than strings we could have used line.data() instead.

All ignored data is now passed over using calls to the std::istream::ignore member function on firstfile.

Now we are assuming the written data will always be exactly 31 characters in length and that this amount or more will always have been read. This seems a little fragile as well. So maybe we should write the data out as formatted text and specify the required field width instead:

   streamsize const OutputFieldWidth(31);
   char const       FieldDelimiter( ',' );
   char const       RecordDelimiter( '\n' );
   int  const       IgnoreMax( std::numeric_limits<int>::max() );

   string line;

   for(;;)
   {
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #1 - ignore
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #2 - ignore
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #3 - ignore
       getline( firstfile, line, FieldDelimiter);   // Field #4 - keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.width( OutputFieldWidth ); // width applies to the next insertion only
       fout << line;
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #5 - ignore
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #6 - ignore
       firstfile.ignore(IgnoreMax, FieldDelimiter); // Field #7 - ignore
       getline( firstfile, line, FieldDelimiter);   // Field #8 - keep
       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.width( OutputFieldWidth ); // set again - it was reset by the insertion
       fout << line;
       firstfile.ignore(IgnoreMax, RecordDelimiter); // Rest of record
         // - ignore
   }

If you find you have to do a lot of similar extractions of data like this from files of similar structure then you might like to consider writing a convenient function that helps extract the fields you want from a record. If all such files have the same delimiters then you can leave the delimiter characters as constants as in my examples above. However if they vary in their delimiters then maybe these need to be passed into such a function. Other information passed in would be which fields of a record you want and data passed out would be the extracted data for those fields. A good choice here would be std::vectors (include <vector>):

   typedef std::vector<int>          FieldNumbersType;
   typedef std::vector<std::string>   FieldValuesType;

   FieldValuesType ExtractFieldsFromRecord
   ( std::istream & in
   , FieldNumbersType const & fieldsRequired
   );

Using such a function would reduce the code above to something like so:

   std::ifstream firstfile( "file1.dat");
   std::ofstream fout("outfile.txt" );

   std::streamsize const OutputFieldWidth(31);

   int const ItemNameField(4);
   int const ItemValueField(8);

   FieldNumbersType requiredFields;
   requiredFields.push_back(ItemNameField);
   requiredFields.push_back(ItemValueField);

   for(;;)
   {
       FieldValuesType fields( ExtractFieldsFromRecord( firstfile
         , requiredFields
         )
         );

       if ( !firstfile || firstfile.eof() || !fout )
       {
         break;
       }
       fout.width( OutputFieldWidth ); // width resets after each insertion,
       fout << fields[0];              // so set it before each field
       fout.width( OutputFieldWidth );
       fout << fields[1] << '\n';
   }

Note that in these latter examples I have not assumed using namespace std or equivalent has been specified. I also assume that the ExtractFieldsFromRecord function consumes a whole record, and thus on a successful return (i.e. the stream is still in a good state) the stream is positioned ready for processing the subsequent record. I gave the field numbers 4 and 8 some (guessed at) meaningful names and used these named constants instead of the magic values 4 and 8. Oh, and I added that newline to each written-out record. You might also consider whether it would be easier to number fields from 0 rather than 1 (in which case ItemNameField and ItemValueField should be 3 and 7 respectively).

I shall leave it to you to ponder how such a function might be implemented should you feel it would be useful.
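Should it help, here is one possible sketch of such a function. It is my own, not the answerer's: it assumes comma field delimiters, newline record delimiters, and that fieldsRequired holds 1-based field numbers sorted in ascending order.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<int>         FieldNumbersType;
typedef std::vector<std::string> FieldValuesType;

FieldValuesType ExtractFieldsFromRecord
( std::istream & in
, FieldNumbersType const & fieldsRequired
)
{
    FieldValuesType values;
    std::string record;
    if ( !std::getline(in, record) )  // consume one whole record; the
    {                                 // stream state informs the caller
        return values;
    }
    std::istringstream fields(record);
    std::string field;
    int fieldNumber(1);
    FieldNumbersType::const_iterator want(fieldsRequired.begin());
    while ( want != fieldsRequired.end()
            && std::getline(fields, field, ',') )
    {
        if ( fieldNumber == *want )   // a field the caller asked for
        {
            values.push_back(field);
            ++want;
        }
        ++fieldNumber;
    }
    return values;
}
```

Reading the whole record first, then splitting it, means a malformed or short record cannot push the parse out of step with record boundaries, which was the root of the original problem.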

Hope this has been useful to you and good luck with implementing your application.





---------- FOLLOW-UP ----------

QUESTION: Thanks for your kind help, but I am still having some problems. When I use this code it shows a lot of errors. I am using Turbo C++, Version 3.0, 1992. Because of the version of my compiler I am having problems. As you mention in your answer, 80, 31, 17, 4 and 63 are the actual maximum field character widths. OK! Let me show you the file from the start.


##ttBulkCp
#
# SPATIAL.CONFIGPARAMS, 13 columns, dumped Tue Sep 18 00:23:29 2007
# columns:
#      1. NODENAME     CHAR(15)
#      2. SUBSYSTEMNAME CHAR(31)
#      3. MANAGERNAME  CHAR(31)
#      4. PARAMNAME    CHAR(31)
#      5. PARAMTYPE    CHAR(15)
#      6. PARAMLENGTH  INTEGER
#      7. PARAMDESCRIPTION CHAR(63)
#      8. PARAMVALUE   CHAR(31)
#      9. DEFAULTVALUE CHAR(31)
#     10. LOWVALUE     CHAR(31)
#     11. HIGHVALUE    CHAR(31)
#     12. ISMODIFIABLE TINYINT
#     13. MANAGERLIST  CHAR(128)
# end
#

Universal      ,General          ,System Wide          ,Database_Schema_Data_Version   ,Char          ,NULL,TimesTen Database schema and provisioned data version number   ,4.20.38.0          ,NULL,NULL,NULL,0,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfNodes          ,Integer        ,NULL,Max number of nodes per type          ,63          ,64          ,5          ,128          ,1,NULL
Universal      ,General          ,System Wide          ,CaleaFlag          ,String         ,NULL,CALEA Falg for Backward Comp          ,1.6          ,64          ,5          ,128          ,1,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfMscNodes          ,Integer        ,NULL,Max number of nodes per type          ,63          ,NULL,NULL,NULL,1,NULL
Universal      ,General          ,System Wide          ,MaxNumberOfMsfNodes          ,Integer        ,NULL,Max number of nodes per type          ,64          ,NULL,NULL,NULL,1,NULL
Universal      ,General          ,System Wide          ,MaxProtectionGroups          ,Integer        ,NULL,Max number of protection groups          ,64          ,NULL,NULL,NULL,1,NULL
Universal      ,General          ,System Wide          ,ExternalNetworkMgr          ,Char          ,NULL,External Network Manager IP address          ,          ,NULL,NULL,NULL,1,NULL
Universal      ,General          ,System Wide          ,TrafficSndPort          ,Integer        ,NULL,Reliable xport send port          ,1024          ,NULL,NULL,NULL,0,NULL
Universal      ,General          ,System Wide          ,TrafficRcvPort          ,Integer        ,NULL,Reliable xport recieve port          ,1025          ,NULL,NULL,NULL,0,NULL
Universal      ,General          ,System Wide          ,SNMPCommandPort          ,Integer        ,NULL,SNMP port for commands          ,161          ,NULL,NULL,NULL,0,NULL
Universal      ,General          ,System Wide          ,SNMPTrapPort          ,Integer        ,NULL,SNMP port for sending traps          ,162          ,NULL,NULL,NULL,0,NULL


Every column has a fixed length except the 6th and 12th.
Can you show me the full code you made?

Thanks.

Answer
Sorry but you have almost all of the code I used. The only parts missing are:

- the header includes (primarily <fstream> in my case, <fstream.h> in yours; also see comments below).

- the using namespace std directive, which, as you are using an ancient pre-standard compiler, you will not require

- the definition of main (or other function) around the code:

   int main()
   {
   }

- the file stream definitions for firstfile and fout (which also open the files I was using) before the code I show, which I presume you have in some form already.

I am sorry but my memory of what C++ features were around at what time and the details of things like old C++ library functionality are very hazy these days. 1992 is 15 years ago, which in computing terms is several eons ago! Work on the next ISO standard for C++ is well underway (targeted for release in 2009 at present), and the current standard has been around since 1998, had a bug-fix update in 2003 (TC1) and a technical report (TR1) optional library update around 2005.

Unfortunately I think your compiler is too old to even handle a cross platform implementation of the C++ standard library such as STLPort (see http://www.stlport.org/), but you could try if you are really interested.

Hence I think all my examples except the first version which uses:

   char line[LineSize];

are quite probably beyond the scope of your compiler/library. Certainly you will not have access to <limits>, <string> and <vector>, i.e. no std::numeric_limits, no std::string and no std::vector. However you may have alternate string and collection types available. Unfortunately I do not know what shipped with the Turbo compiler as I never used it (I came mainly down the MSC route and then to C++ via MSC 7. Note that is C not C++, and not Visual).

Further, the implementation of the IOStreams around in 1992 (sometimes now, post ISO standard C++, quaintly called traditional IOStreams) was different to the ISO standard version and not so standardised across different implementations of C++.

The only problems I can see offhand with this first version of the code are the use of the streamsize type and possibly the tests for the stream being in a bad state. You can replace streamsize with some other integer type; check the documentation of your IOStream implementation's getline to see what type is used for this parameter, as unsurprisingly I do not have access to that information! Also check the documentation to see how to test for bad stream states other than eof. Alternatively, just check for eof on firstfile and hope nothing else goes wrong:

       if ( firstfile.eof() )
       {
         break;
       }


Oh, and I think you should be able to use the formatted output variations for writing the results. As far as I remember you are quite likely to have a width function for your ostreams, and you should have operator<< for the C-style strings that you use with istream::getline.
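One behaviour of width worth remembering (this is how ISO C++ specifies it, and traditional IOStreams implementations I recall behave the same way) is that it applies only to the next formatted insertion and then resets. The hypothetical helper below demonstrates this with a string stream:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: set the field width, then insert two strings.
// Only the first insertion is padded; width resets to 0 after it.
std::string padNext(std::string const& a, std::string const& b,
                    std::streamsize width)
{
    std::ostringstream out;
    out.width(width);   // affects the next formatted insertion only
    out << a << b;
    return out.str();
}
```

For example, padNext("abc", "def", 5) yields "  abcdef": "abc" is right-justified in a field of 5, while "def" is inserted unpadded. This is why a single width call before a loop does not pad every field written inside it.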

Unless you are stuck using actual MS-DOS, PC-DOS, DR-DOS or similar (or maybe a 16-bit version of MS Windows) then I would suggest you update your C++ implementation to a more modern set of tools. If you are using a modern version of MS Windows say MS Windows 2000 or later then you should be able to use one of several free compiler products see thefreecountry (http://www.thefreecountry.com/), specifically the free C/C++ compilers page for details (http://www.thefreecountry.com/compilers/cpp.shtml). Note that I have not as yet managed to access the Turbo C++ explorer site, however you might get better results from the main Borland site (http://www.borland.com/). I found Turbo product information at http://www.codegear.com/products/turbo, the (current) Turbo C++ datasheet lists Windows 2000 SP4 and later as system requirements.

The other listed free compilers of particular interest are the current Microsoft Visual C++ Express edition (although I suspect this may have an even more up-to-date set of system requirements than the Borland offering), and the Bloodshed Dev-C++ compiler and IDE, which uses the MinGW MS Windows port of the GNU C++ compiler. The system requirements for Dev-C++ are Windows 95/98/NT/2000/XP (for the current beta of version 5).

Hope this helps you to get up and running. Good luck.  
