C++/reading from a file


Question
I'm taking an intro C++ course and am trying to find some usefulness for it.  I'm trying to write a program to manipulate a text file and put it into Excel.  I know how to do this, but I'm running into a problem when reading in the data.  The text document is essentially 7 columns (group#, name, ssn, plan1, plan2, cost, cobra) separated by tabs.  I can easily read in the group#, ssn, plan1, plan2, and cost, but I'm having trouble reading the name and cobra columns.  The reason for this is that the name "field" contains spaces and the cobra "field" is blank for most rows.  Here are two examples of what the text file looks like (I've separated the fields with brackets):

[5569433] [Employee A] [555555555] [RIDE V01] [ZHE1 ZSV1] [$4.27]   [COBRA Dt : 08/01/2010]

[5569433] [Employee B] [111111111] [RIDE V01] [ZHE1 ZSV1] [$4.27]

How can I get the entire name into one variable?  I thought of using the getline function, but that would read the entire row into the variable which I do not want.  Also how do I tell the program to store whatever is in the cobra column to that variable, whether the column is blank or has a value?  I have no idea how to do that.  Once I get those into the variables, the manipulation and output would not be an issue.  Thanks.

Answer
Hmm, well, reading such data can be problematic. However, if your fields are always separated by tabs, and no field contains tabs (only spaces), then you could use the form of getline that takes a delimiter character, specifying a tab as the delimiter for all but the last field (which requires some care anyway as it is optional). The idea would be something like this:

   std::string groupFieldValue;
   std::getline(fileInputStream, groupFieldValue, '\t');

   std::string nameFieldValue;
   std::getline(fileInputStream, nameFieldValue, '\t');

       .       .       .
       .       .       .
       .       .       .

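   std::string costFieldValue;
   // Assumes a tab follows the cost field on every row, even when the cobra field is blank.
   // Without that tab, this call reads past the end of the line into the next row.
   std::getline(fileInputStream, costFieldValue, '\t');

   std::string cobraFieldValue;
   std::getline(fileInputStream, cobraFieldValue); // read up to end of line; empty if the field is blank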

I have left out all error checking for brevity, but you should check the state of the input stream after each I/O operation to make sure it is still in a good state. Refer to a good C++ library reference for details of IOStream state and error conditions, e.g. "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis.
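As a rough sketch of what that checking might look like, the function below reads one record and tests the stream after every getline call. It is a variation on the snippet above rather than a drop-in completion: it first reads the whole row into a string and then splits that row on tabs through a std::istringstream, so a row with no cobra field (and no trailing tab) cannot run on into the next row. The names Record and readRecord are hypothetical, and the sketch assumes that no field contains a tab.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type; the members follow the column list in the question.
struct Record {
    std::string group, name, ssn, plan1, plan2, cost, cobra;
};

// Reads one tab-separated record from 'in'.
// Returns false at end of file, on a stream error, or if any of the
// six mandatory fields is missing from the row.
bool readRecord(std::istream& in, Record& rec)
{
    std::string line;
    if (!std::getline(in, line))          // nothing read: EOF or error
        return false;

    std::istringstream fields(line);      // split this one row only
    if (!std::getline(fields, rec.group, '\t')) return false;
    if (!std::getline(fields, rec.name,  '\t')) return false;
    if (!std::getline(fields, rec.ssn,   '\t')) return false;
    if (!std::getline(fields, rec.plan1, '\t')) return false;
    if (!std::getline(fields, rec.plan2, '\t')) return false;
    if (!std::getline(fields, rec.cost,  '\t')) return false;

    // Optional last field: clear it first, because a failed getline
    // is not guaranteed to reset the string from the previous row.
    rec.cobra.clear();
    std::getline(fields, rec.cobra);
    return true;
}
```

A caller would typically open a std::ifstream, confirm that it opened, and then loop with while (readRecord(file, rec)), processing rec inside the loop body.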

You will notice that I am using C++ library std::strings rather than C-style zero-terminated arrays of char, and using the often overlooked standalone getline functions that read into a std::string rather than a (fixed size) char array buffer. One reason for this is that you do not need to worry (too much) about the size of std::strings: they will handle resizing themselves as required. Another reason is that std::string objects have a lot of member functions, including operations for finding positions of interest within strings and extracting substrings, which can be of use if you need to clean up any of the raw field values extracted using std::getline.

Include <string> to use std::string and std::getline.

Again, refer to a good C++ library reference for more details on std::string and std::getline.
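To illustrate the kind of clean-up the std::string member functions make possible, here is a minimal sketch of a helper that strips leading and trailing whitespace from a raw field value. The name trim is hypothetical, and the choice of characters to strip (space, tab, carriage return) is an assumption about what stray characters the file might contain.

```cpp
#include <string>

// Hypothetical helper: returns a copy of 's' without leading or
// trailing spaces, tabs or carriage returns.
std::string trim(const std::string& s)
{
    const char* const whitespace = " \t\r";

    // Position of the first character that is NOT whitespace.
    const std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos)
        return std::string();             // empty or all whitespace

    // Position of the last character that is NOT whitespace.
    const std::string::size_type last = s.find_last_not_of(whitespace);

    // substr takes a start position and a character count.
    return s.substr(first, last - first + 1);
}
```

It could be applied to each field after it has been read, for example nameFieldValue = trim(nameFieldValue).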

If some of the field values do contain tabs within them then you have a much trickier task ahead of you. In this case I would suggest reading each record as a whole line into a std::string then using the operations on a std::string to determine where the fields' start and end points are and extract them as substrings into your field value holding objects:

   std::string record;
   std::getline(fileInputStream, record); // getline returns the stream, not the string read

   std::string::size_type groupEndPos( record.find( '\t') );
   std::string groupFieldValue( record.substr(0, groupEndPos) ); // 0 is the start position
         // groupEndPos is the number of characters in the substring.
         // In general it should be groupEndPos - startPos,
         // but in this case startPos is 0 so just
         // groupEndPos will do.

   std::string::size_type nameStartPos( groupEndPos+1 );  // Name field starts after tab between group and name fields

   // The position of the tab between the name and ssn fields is one before the first digit character of the ssn field
   std::string::size_type nameEndPos( record.find_first_of( "0123456789", nameStartPos)-1 );
   std::string nameFieldValue( record.substr(nameStartPos, nameEndPos-nameStartPos) );
       .          .          .
       .          .          .
       .          .          .

Note that again I have elided any error checking for brevity. In particular:

   - if a find or find_xxx operation cannot locate a matching character or substring at any position in the string
     then the value std::string::npos is returned
   - the start position for a field should always be less than or equal to the end position for that field, otherwise the
     number of characters in a substring calculation goes wrong: std::string::size_type is unsigned, so a result that
     should be negative wraps around to a very large value
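The two checks listed above can be sketched as follows for the group and name fields. The function name extractGroupAndName is hypothetical. Like the snippet above, it assumes that the name contains no digits, so that the first digit after the group field marks the start of the ssn field.

```cpp
#include <string>

// Hypothetical helper: extracts the group and name fields from one
// record line. Returns false if the expected separators are missing.
bool extractGroupAndName(const std::string& record,
                         std::string& group,
                         std::string& name)
{
    // The group field ends at the first tab.
    const std::string::size_type groupEndPos = record.find('\t');
    if (groupEndPos == std::string::npos)
        return false;                      // no tab at all
    group = record.substr(0, groupEndPos);

    // The name field starts just after that tab.
    const std::string::size_type nameStartPos = groupEndPos + 1;

    // Assumption: the first digit after the name starts the ssn field.
    const std::string::size_type ssnStartPos =
        record.find_first_of("0123456789", nameStartPos);
    if (ssnStartPos == std::string::npos || ssnStartPos <= nameStartPos)
        return false;                      // no ssn, or no room for a tab

    // The tab between name and ssn is one before the first ssn digit.
    const std::string::size_type nameEndPos = ssnStartPos - 1;
    name = record.substr(nameStartPos, nameEndPos - nameStartPos);
    return true;
}
```

Testing ssnStartPos before subtracting from it guards against both problems: an npos result is never used as a position, and the unsigned subtraction can never wrap around.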

Finally, please note that the above code snippets are all straight out of my head and have not been compiled, so they may contain errors or typos. If this is the case then I apologise.

Hope this gives you some pointers.

Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

©2016 About.com. All rights reserved.