C++/file handling

Question
How do I read a text file line by line and differentiate between the character data and the integer data present in each line?

Answer
First I am going to point out that _all_ data in a _text_ file is _character data_. Some sequences of characters can be interpreted as integer values; the problem is defining which ones. Let's look at some examples:

123456
-123456
+123456
123,456
123.456
abcdef
ABCDEF
0xabcdef
0xABCDEF
01234567
08888888

All of the above could be interpreted as integer values, but some of them will not be in certain situations.

For example, if we are reading values from a std::istream object using the stream extraction operator (operator>>) with a locale that follows US or UK number punctuation conventions, then 123.456 is not a valid integer. However, if we were using, say, German conventions then 123.456 would be a valid integer value, as the roles of the comma and full stop as thousands separator and decimal point are reversed.
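To make the locale point concrete, here is a minimal sketch. Named locales such as a German one may not be installed on every system, so I assume a hand-written facet instead: DotGroupingPunct and readInteger are hypothetical names of my own, not anything from the standard library.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' groups thousands and ',' is the decimal point,
// roughly mimicking German number punctuation.
struct DotGroupingPunct : std::numpunct<char>
{
    char do_thousands_sep() const override { return '.'; }
    char do_decimal_point() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Reads one integer from text using the locale loc and stores whatever
// characters are left over in rest. Returns the value that was read.
long readInteger(const std::string& text, const std::locale& loc,
                 std::string& rest)
{
    std::istringstream in(text);
    in.imbue(loc);
    long value = 0;
    in >> value;
    in.clear();          // allow the remainder to be read after EOF or failure
    rest.clear();
    std::getline(in, rest);
    return value;
}
```

With std::locale::classic() the call readInteger("123.456", ...) should stop at the full stop, giving 123 and leaving ".456" unread. With std::locale(std::locale::classic(), new DotGroupingPunct) the full stop is treated as a thousands separator and the whole sequence should be read as 123456.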

Likewise if we are reading data in hexadecimal format (i.e. the std::ios::hex flag is set in the std::ios::basefield group) then abcdef and ABCDEF would be valid integers. On the other hand if we were reading data in octal format (i.e. the std::ios::oct flag is set) then 08888888 is not a valid integer value, because 8 is not an octal digit. Note that a stream reads decimal by default, so a leading 0 is then just a leading zero. Only when all the basefield flags are cleared is the base taken from the prefix: values starting with 0 are interpreted as octal values and values starting with 0x or 0X are interpreted as hexadecimal values.
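The effect of the base flags can be checked with a small sketch like the following. parseWhole is a hypothetical helper of mine; it reports success only if the entire sequence was consumed as one integer.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Tries to read the whole of text as one integer using the given base flag:
// std::ios::dec, std::ios::hex, std::ios::oct, or an empty fmtflags value
// to let a 0 / 0x prefix decide. Returns true only if every character of
// text was used up by the extraction.
bool parseWhole(const std::string& text, std::ios_base::fmtflags base,
                long& value)
{
    std::istringstream in(text);
    in.setf(base, std::ios::basefield);  // replaces the current base flag
    value = 0;
    if (!(in >> value))
        return false;
    // peek() returns EOF when nothing is left in the stream
    return in.peek() == std::istringstream::traits_type::eof();
}
```

For instance, parseWhole("abcdef", std::ios::hex, v) should succeed while the same text with std::ios::dec should fail, and parseWhole("01234567", std::ios_base::fmtflags(), v) should read the digits as octal.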

So having said all that, the easiest way is probably to assume initially that the character data represents an integer, check whether this assumption causes a stream error, and if it does, interpret the data as character data instead. However, care must be taken if you have very unstructured data such as that below:

Ajdlk;jas;ldja2131lkJS;LKJ A123lkjkla sjd8a sd_1lkja23;ljasd3 nms lksw

This is because string data when read using operator>> is normally delimited by white space (although you can turn this off) and characters which could be interpreted as integers will in such a case be interpreted as string data. This would mean that the sequence:

ldja 2131lkJS;LKJ A123lkjkla

would be interpreted as:

   Text: ldja
Integer: 2131
   Text: lkJS;LKJ
   Text: A123lkjkla

(assuming we are not reading integers in hexadecimal format). To prevent this we should extract non-integer data from the stream one character at a time, then see if the next sequence forms an integer. We can do this using one of the std::istream::get functions.
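For comparison, the whitespace-delimited behaviour described above (the approach to avoid for unstructured data) can be reproduced with a sketch along these lines. classifyTokens is a hypothetical name; it tries an integer first and falls back to reading a whole whitespace-delimited word.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits line into items, trying an integer first and otherwise taking a
// whole whitespace-delimited word. Digits embedded in a word stay part of
// that word, which is exactly the problem being illustrated.
std::vector<std::string> classifyTokens(const std::string& line)
{
    std::vector<std::string> out;
    std::istringstream in(line);
    for (;;)
    {
        int number = 0;
        if (in >> number)
        {
            out.push_back("Integer: " + std::to_string(number));
            continue;
        }
        in.clear();                 // integer read failed, try a word instead
        std::string word;
        if (!(in >> word))
            break;                  // nothing left to read
        out.push_back("Text: " + word);
    }
    return out;
}
```

Given "ldja 2131lkJS;LKJ A123lkjkla" this should produce four items, with the 123 in the last word never reported as an integer.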

Of course there may still be problems if the integer read is too large for the data type it is put into. This will cause a stream failure for the large number, effectively causing it to be ignored.
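If you would rather detect an out-of-range number than silently lose it, something like the sketch below may help. It assumes a C++11 or later library, where a failed extraction stores 0 if no number was found and the nearest limit of the type if the number was too large. ReadResult, readInt and classify are hypothetical names.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

enum class ReadResult { Ok, NotANumber, OutOfRange };

// Attempts to read an int from in and reports why a failure happened.
// Assumption: C++11 or later semantics for the value stored on failure.
ReadResult readInt(std::istream& in, int& value)
{
    value = 0;
    if (in >> value)
        return ReadResult::Ok;
    const bool atLimit = value == std::numeric_limits<int>::max()
                      || value == std::numeric_limits<int>::min();
    return atLimit ? ReadResult::OutOfRange : ReadResult::NotANumber;
}

// Convenience wrapper for classifying a single piece of text.
ReadResult classify(const std::string& text)
{
    std::istringstream in(text);
    int value = 0;
    return readInt(in, value);
}
```

Note that readInt leaves the stream in its failed state, so the caller still has to call clear() before reading anything else.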

Here is some example code:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
 // Get a line of data.
 // Here I read from std::cin and then process the data
 // read using a std::istringstream.
 // You would read from your file stream instead.
   std::cout << "Mixed data: ";
   std::string line;
   std::getline( std::cin, line );
   std::istringstream lnStrm(line);

 // If you wish to keep whitespace turn off the skipws flag
   lnStrm.unsetf( std::ios::skipws );

 // Define values for current processing mode
 // This will help us to group the output together.
   enum ModeT { Init, Text, Num };
   ModeT mode = Init;

 // While we have some characters in the string stream...
 // (peek() returns the end of file value once all have been consumed;
 // testing eof() alone would allow one extra pass after the last
 // character and print a spurious character.)
   while ( lnStrm.peek() != std::istringstream::traits_type::eof() )
     {
     // Try reading data as an integer...
       int integer(0);
       lnStrm >> integer;

     // Oh dear! Stream is in a fatally bad state, stop processing.
     // This must be tested first because fail() is also true
     // whenever bad() is true.
       if ( lnStrm.bad() )
         {
         std::cerr << "\nFatal stream error!" << std::endl;
         break;
         }
     // If the read failed assume we have text character data
       else if ( lnStrm.fail() )
         {
         // If this is a change of data type from the last
         // item processed print the new type of data to the
         // console and update the data mode value.
         if ( mode != Text )
           {
           mode = Text;
           std::cout << "\n   Text: ";
           }

         // Clear the error flags from the stream
         lnStrm.clear();

         // Output a single character from the stream.
         // This will be the character that caused the integer
         // parsing to fail. There may be none left if the failure
         // was caused by an over-large number at the end of the line.
         const int ch( lnStrm.get() );
         if ( ch != std::istringstream::traits_type::eof() )
           {
           std::cout << char(ch);
           }
         }
     // Otherwise the stream is OK and some characters parsed as an integer.
       else
         {
         // If this is a change of data type from the last
         // item processed print the new type of data to the
         // console and update the data mode value.
         if ( mode != Num )
           {
           mode = Num;
           std::cout << "\nInteger: ";
           }

         // Output the integer value processed
         std::cout << integer << " ";
         }
     }

   std::cout << std::endl;
}
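Since the question was about reading a file line by line, here is a rough sketch of how the same technique might be wrapped up for a file stream. Piece, splitLine and splitStream are hypothetical names of mine, and the sketch simply collects the pieces rather than printing them.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One run of text or one integer found in a line (hypothetical type).
struct Piece
{
    bool isInteger;
    std::string text;
};

// Splits one line into alternating text and integer pieces, reading
// non-integer data one character at a time.
std::vector<Piece> splitLine(const std::string& line)
{
    const int endOfData = std::istringstream::traits_type::eof();
    std::vector<Piece> pieces;
    std::istringstream in(line);
    in.unsetf(std::ios::skipws);            // keep whitespace as text
    while (in.peek() != endOfData)
    {
        int number = 0;
        if (in >> number)
        {
            pieces.push_back({true, std::to_string(number)});
            continue;
        }
        in.clear();                         // not an integer here
        const int ch = in.get();            // take one character as text
        if (ch == endOfData)
            break;
        if (pieces.empty() || pieces.back().isInteger)
            pieces.push_back({false, std::string()});
        pieces.back().text += static_cast<char>(ch);
    }
    return pieces;
}

// Applies splitLine to every line of input, e.g. a std::ifstream.
std::vector<std::vector<Piece>> splitStream(std::istream& input)
{
    std::vector<std::vector<Piece>> lines;
    std::string line;
    while (std::getline(input, line))
        lines.push_back(splitLine(line));
    return lines;
}
```

You might use it as std::ifstream file("data.txt"); followed by splitStream(file), where data.txt stands in for your own file name. One limitation to be aware of: a + or - that is not followed by a digit may be consumed by the failed integer read and so be missing from the text pieces.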

Hope this sets you on your way to a solution. I cannot be of much more help without knowing more about the details of the sort of data you are trying to cope with.

Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part-task train simulator using C and the Intel DVI (Digital Video Interactive), the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia: I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominantly in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

©2016 About.com. All rights reserved.