
C++: What is the internal difference if I read a data as bytes, characters, string and stream?



Question
Hi again,

What is the internal difference if I read a data as bytes, characters, string and stream?

Thanks,
lzzzz


Answer
It depends on what you mean and in what context. Also, "data" is plural, so you cannot read "a data"; the singular is "datum".

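If you mean in C and C++, then char is an integer type whose size is one byte by definition: sizeof(char) is always 1, and whether plain char is signed or unsigned is left to the implementation. It is also the smallest built-in C and C++ type. However, a byte need not be 8 bits: the standard only requires at least 8, and the CHAR_BIT macro in <climits> (<limits.h> in C) gives the actual number. For C++ the sizes of the integer types must satisfy the following, with the added requirement that short and int hold at least 16 bits and long at least 32 bits: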

       size of char  = 1,
       size of short >= size of char,
       size of int   >= size of short,
       size of long  >= size of int.

Notice that each integer type is at least as large as the one before it. For a typical 32-bit compiler the following are reasonable:

       size of char  = 1 =  8 bits,
       size of short = 2 = 16 bits,
       size of int   = 4 = 32 bits,
       size of long  = 4 = 32 bits.

On some digital signal processors, however, where the smallest addressable unit is 24 bits, the following might be true:

       size of char  = 1 = 24 bits,
       size of short = 1 = 24 bits,
       size of int   = 1 = 24 bits,
       size of long  = 2 = 48 bits.

Here long needs two of these 24-bit bytes because it must hold at least 32 bits.
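The guarantees above can be checked by the compiler. The sketch below, assuming a C++11 or later compiler, uses static_assert for the rules every conforming implementation must obey, plus a small helper (bitsOf and sizeReport are names made up for this example) that reports the actual widths on your machine.

```cpp
#include <cassert>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <string>    // std::string, std::to_string

// Rules every conforming C++ implementation must satisfy.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte holds at least 8 bits");
static_assert(sizeof(short) >= sizeof(char) && sizeof(int) >= sizeof(short) &&
                  sizeof(long) >= sizeof(int),
              "each integer type is at least as large as the previous one");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long holds at least 32 bits");

// Storage width of T in bits on this implementation (any padding bits included).
template <typename T>
constexpr std::size_t bitsOf() {
    return sizeof(T) * CHAR_BIT;
}

// Human-readable summary of the widths, e.g. for printing with std::cout.
std::string sizeReport() {
    return "char=" + std::to_string(bitsOf<char>()) +
           " short=" + std::to_string(bitsOf<short>()) +
           " int=" + std::to_string(bitsOf<int>()) +
           " long=" + std::to_string(bitsOf<long>()) + " bits";
}
```

On a typical 64-bit Linux compiler the report shows long as 64 bits, while 64-bit Windows compilers keep it at 32; both are allowed.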

The original intended use for char was obviously storing byte-sized character data, but it can just as easily store small integer values that have other meanings, such as state values or raw I/O port data, which is often only a byte in size. As an aside, the C++ bool type is usually implemented as a byte-sized object (sizeof(bool) is implementation-defined) as a compromise between the required size (1 bit), wasted space and speed of use: most modern architectures can address a byte as the smallest unit, but accessing anything smaller, such as a single bit, is cumbersome.
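As a sketch of char used for something other than text, the helper below treats an unsigned char as a hypothetical 8-bit I/O port reading and tests individual bits, and a small state enumeration is stored in a single byte. The names portBit and PumpState are invented for this example; the enum syntax with an explicit underlying type assumes C++11.

```cpp
#include <cassert>

// Returns true if bit number `bit` (0 = least significant) is set in an
// 8-bit port reading stored in an unsigned char.
bool portBit(unsigned char portValue, int bit) {
    assert(bit >= 0 && bit < 8);       // only 8 bits are meaningful here
    return (portValue >> bit) & 1u;    // portValue is promoted to int first
}

// Small state values fit comfortably in a char-sized type too.
enum class PumpState : unsigned char { Off = 0, Priming = 1, Running = 2, Fault = 3 };

static_assert(sizeof(PumpState) == 1, "the enum is stored in a single byte");
```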

Note that the values stored in these integer types only have a meaning when used with the appropriate operations. A 32-bit value can be interpreted as a single 32-bit integer (signed or unsigned), a single 32-bit floating point value, two 16-bit short integers or 16-bit characters, or four 8-bit integers or characters. What the numbers represent is again open to interpretation – are they counters, or id values or characters or colour channel values or ...
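To see one 32-bit pattern read in different ways, the sketch below copies the same four bytes into a std::uint32_t, an array of unsigned char and a float using std::memcpy, which (unlike casting between pointer types) is well defined in C++. It assumes 8-bit bytes and a 32-bit IEEE 754 float, as on most desktop platforms; the order of the individual bytes depends on the machine's endianness. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>   // std::memcpy

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

// Reinterpret the bits of a 32-bit unsigned integer as a float.
float bitsToFloat(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Split a 32-bit value into its four bytes in memory order
// (least significant byte first on little-endian machines such as x86).
void toBytes(std::uint32_t value, unsigned char out[4]) {
    std::memcpy(out, &value, 4);
}

// Reassemble four bytes, again in memory order, into a 32-bit value.
std::uint32_t fromBytes(const unsigned char in[4]) {
    std::uint32_t value;
    std::memcpy(&value, in, 4);
    return value;
}
```

With IEEE 754 floats, the bit pattern 0x3F800000 reads back as the float 1.0.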

The only reason an array of char is interpreted as a C-string is that this is how functions such as strcpy, strlen and printf treat the data, and how they expect it to be arranged: the characters run up to the first zero value. Try using one of these functions on a char array that contains no zero value at all! The function will run off the end of the array, which is undefined behaviour.
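When a char buffer may not contain a terminating zero (for example raw data read from a device or a file), a safer approach than calling strlen is to search only within the buffer's known size. The helpers below, safeLength and fromBuffer, are hypothetical names; a minimal sketch that behaves like strlen but never looks past the end of the array:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // std::memchr, std::strlen
#include <string>

// Length of the C-string in buf, but never reading beyond bufSize bytes.
// If no zero byte is found, the whole buffer is treated as the string.
std::size_t safeLength(const char* buf, std::size_t bufSize) {
    const void* zero = std::memchr(buf, '\0', bufSize);
    return zero ? static_cast<std::size_t>(static_cast<const char*>(zero) - buf)
                : bufSize;
}

// Build a std::string from a buffer that may or may not be zero-terminated.
std::string fromBuffer(const char* buf, std::size_t bufSize) {
    return std::string(buf, safeLength(buf, bufSize));
}
```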

Note that C and C++ also support the wchar_t type. In C this is a typedef alias for some other integer type, such as unsigned short on Windows or int on many Unix-like systems, hence the _t suffix on the name. In C++ the committee kept the name but made wchar_t a distinct built-in type, so that the compiler can tell apart overloaded functions taking, say, an unsigned short and a wchar_t. The wchar_t type is meant for wide characters and character strings, such as 16- or 32-bit encodings of Unicode characters.
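A minimal sketch of why the distinct type matters: the two overloads below (describe is a made-up name) can coexist in C++ because wchar_t and unsigned short are different types, and a wide character literal selects the wchar_t version.

```cpp
#include <cassert>
#include <string>

// If wchar_t were merely a typedef for unsigned short, as it can be in C,
// these two declarations would name the same function and fail to compile.
std::string describe(unsigned short) { return "unsigned short"; }
std::string describe(wchar_t) { return "wchar_t"; }
```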

A string, then, is essentially an array of (usually) character data, which again comes in narrow (char) and wide (wchar_t) formats. In C, strings were exactly that: an array of char or wchar_t, with the convention that they are terminated by a zero value, making them take one character more to store than the number of characters in the string.
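The extra element for the terminating zero is easy to see by comparing sizeof, which counts the whole array, with strlen or wcslen, which count characters up to the zero. A minimal sketch:

```cpp
#include <cassert>
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// A string literal is an array that includes the terminating zero.
const char narrow[] = "abc";     // 4 elements: 'a', 'b', 'c', '\0'
const wchar_t wide[] = L"abc";   // 4 wchar_t elements, including L'\0'

static_assert(sizeof narrow == 4, "three characters plus the terminator");
static_assert(sizeof wide == 4 * sizeof(wchar_t), "same count, wider elements");
```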

In C++ we have the std::string and std::wstring classes. These represent strings whose data are collections of char and wchar_t respectively. How exactly they store these characters is up to the class implementation, although c_str() always gives you the contents as a zero-terminated array, and a std::string, unlike a C-string, keeps track of its own length.
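Because a std::string records its own length, it can even hold a zero character in the middle, something a C-string cannot represent; C functions applied to c_str() still stop at the first zero. A small sketch (withEmbeddedZero is a hypothetical name):

```cpp
#include <cassert>
#include <cstring>   // std::strlen
#include <string>

// The (pointer, length) constructor keeps all five characters,
// including the embedded '\0'.
std::string withEmbeddedZero() {
    return std::string("ab\0cd", 5);
}
```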

Note that for most purposes you could have a string whose characters are any integer type. If, for example, char is 8-bit and wchar_t is 16-bit but you have to handle a 32-bit character encoding, you might use an array of unsigned int or unsigned long (both are 32 bits on some popular 32-bit compilers, although unsigned long is 64 bits on many 64-bit Unix-like compilers; std::uint32_t from <cstdint> is exactly 32 bits wherever it exists). In fact I would probably define a typedef alias for the required integer type, such as

       typedef unsigned long    wchar32;

Then I can create an array of these:

       wchar32  veryWideString[NumberOfCharacters + 1];

and can fill the array with the wide characters. Unfortunately there is no support for literal characters of this size, as there is for char characters and strings ('c' and "A char string") and for wchar_t ones (L'w' and L"A wchar_t string"; note the L prefix, for "long"). Nor is there any C library support for these extra-wide characters and strings, so again you would have to write your own. (C++11 later added the char32_t type, with U'x' and U"..." literals, for exactly this purpose.)
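As an example of writing such support yourself, here is a hypothetical w32len, the equivalent of strlen for zero-terminated wchar32 arrays. This sketch defines wchar32 as std::uint32_t rather than unsigned long, an assumption made so the type is exactly 32 bits on 64-bit compilers too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint32_t wchar32;   // exactly 32 bits where the type exists

// Counts characters up to (not including) the terminating zero, like strlen.
std::size_t w32len(const wchar32* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Without literal syntax, the array has to be filled element by element, for example `wchar32 text[4] = {0x48, 0x1F600, 0x69, 0};`, for which w32len returns 3.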

However, in C++ you can define a new string class based on the std::basic_string<> class template (of which std::string and std::wstring are already predefined aliases), using the new wchar32 character type:

       typedef std::basic_string<wchar32> w32string;

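Now we have a type that provides all the usual std::string operations on 32-bit character data. One caveat: the standard library only specializes std::char_traits for the built-in character types (char and wchar_t, plus char16_t and char32_t since C++11), so std::basic_string<wchar32> compiles only if your library supplies a generic char_traits. Some libraries do, but recent versions of libc++ have removed it; in that case you must write your own traits class or, since C++11, use std::u32string (std::basic_string<char32_t>) instead.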
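With a C++11 or later compiler, the standard std::u32string gives the same convenience with literal support included. A minimal sketch (greeting and countOf are names made up for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// std::u32string is a typedef for std::basic_string<char32_t>.
// U"..." is a char32_t string literal: one element per Unicode code point.
const std::u32string greeting = U"Hi \U0001F600";   // 4 characters

// Count occurrences of a character using the ordinary string interface.
std::size_t countOf(const std::u32string& s, char32_t c) {
    std::size_t n = 0;
    for (char32_t ch : s) {
        if (ch == c) ++n;
    }
    return n;
}
```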

The other area where characters and integers differ is in how they are formatted to a standard C++ stream. char (including signed char and unsigned char), char strings, and wchar_t on a wide stream such as std::wcout are displayed as characters. Other integer types are formatted as a sequence of characters that shows the value as a number. This can be annoying if your char or char array contains numeric data rather than character data. In this case, casting the character(s) to int fixes the problem:

       #include <iostream>

       int main() {
           char smallInteger( 65 );
           std::cout << smallInteger << '\n';                    // outputs A (on ASCII systems)
           std::cout << static_cast<int>(smallInteger) << '\n';  // outputs 65
       }
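The same surprise catches std::int8_t and std::uint8_t, which on common platforms are typedefs for signed char and unsigned char and so are also written as characters. The sketch below formats into a std::ostringstream so the result can be inspected; formatted is a hypothetical helper, and the "A" result assumes an ASCII-based character set. Unary + promotes the value to int, a common shorthand for the static_cast.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Returns the text operator<< would produce for the given value.
template <typename T>
std::string formatted(T value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

For example, with `std::uint8_t small = 65;`, `formatted(small)` gives "A" while `formatted(+small)` gives "65".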

Now to reading values. How the data is interpreted depends on how it is read. If you use C++ streams and read the data from a file using the extraction operator (>>, the opposite of the << used above), then the data should be formatted as if it had been written to a C++ stream in the complementary way. However, you can read raw chars using the get, getline, read and readsome member functions of std::istream. All of them read data as single chars or into a char* buffer. The forms that take a char* and a length either terminate the stored data with a zero (get, getline) or do not (read, readsome), and gcount() tells you how many characters the last call extracted. You can look up the small print on std::istream for yourself; I do not intend to reproduce it here as it is a waste of my time. Nor will I go into the various other ways of reading data (such as the C library fread function).
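A small sketch of the difference between get and read, using a std::istringstream in place of a file so it is self-contained: get(buf, n) stops before a newline and zero-terminates what it stored, while read(buf, n) copies up to n characters, newlines included, and adds no terminator, so gcount() is needed to know how many actually arrived. The function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Uses get(): stores at most 15 chars, stops before '\n' (leaving it in the
// stream) and always appends '\0'.
std::string firstLineWithGet(std::istream& in) {
    char buf[16];
    in.get(buf, sizeof buf);
    return std::string(buf);   // safe: get() zero-terminated buf
}

// Uses read(): copies up to `count` raw chars and adds no '\0';
// gcount() reports how many were actually extracted.
std::string rawCharsWithRead(std::istream& in, std::size_t count) {
    std::string buf(count, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}
```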


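If you are just reading binary file data, open the file in binary mode (std::ios::binary) for reading and read it using read. readsome only extracts characters already sitting in the stream's buffer, so on a file it may return fewer characters than you asked for, or none at all.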
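Reading a whole binary file might look like the sketch below. readBinaryFile is a hypothetical helper; it uses read rather than readsome, because read keeps going until it has the requested number of bytes or reaches end of file, and it collects the result in a std::vector<char>.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads every byte of a file into memory. Opening with std::ios::binary stops
// the library translating line endings (e.g. "\r\n" on Windows).
std::vector<char> readBinaryFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> data;
    if (!in) return data;   // could not open: return an empty result

    char chunk[4096];
    // A short final read fails the stream but still leaves gcount() > 0.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        data.insert(data.end(), chunk, chunk + in.gcount());
    }
    return data;
}
```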

If you wish to manually read chars or C char strings use get or getline.
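For line-by-line text input the two getline forms differ in where the characters go: the member function std::istream::getline fills a char array and zero-terminates it, while the free function std::getline fills a std::string with no fixed size limit. Both extract and discard the newline, unlike get(buf, n). A minimal sketch with a string stream standing in for a file; readLines and readLineInto are made-up names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits input into lines with the free std::getline (no fixed buffer size).
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Reads one line into a char array with the member getline; the result is
// always zero-terminated. Returns false at end of input or if the line did
// not fit in the buffer (both set failbit).
bool readLineInto(std::istream& in, char* buf, std::streamsize size) {
    return static_cast<bool>(in.getline(buf, size));
}
```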

If you have a text file with strings and numbers in some format, use >> to read in the individual fields as the types they represent. If you are using chars as small integers, read the data into a temporary int and then copy it to the required char.
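A sketch of reading a line of mixed fields with >>, including a small integer destined for a char: extracting straight into a char would store the first character ('7'), not the value 7, so the number goes through a temporary int and is range-checked before narrowing. The Record struct, its "name price level" layout and readRecord are invented for this example.

```cpp
#include <cassert>
#include <climits>   // SCHAR_MIN, SCHAR_MAX
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record: a name, a price and a small integer level.
struct Record {
    std::string name;
    double price = 0.0;
    signed char level = 0;   // small integer stored in a char-sized type
};

// Reads "name price level" with formatted extraction. Returns false on bad input.
bool readRecord(std::istream& in, Record& r) {
    int level = 0;   // read the number as an int first...
    if (!(in >> r.name >> r.price >> level)) return false;
    if (level < SCHAR_MIN || level > SCHAR_MAX) return false;   // ...check it fits...
    r.level = static_cast<signed char>(level);                  // ...then narrow it
    return true;
}
```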

Hope this helps, but once again you have posted a quick question with little context to help me answer quickly. Again, you are starting to annoy me, as I have to answer at length and hope I have answered your question. Any more like this and I shall just refuse your questions. Now can I have my dinner please?


Answers by Expert:




Ralph McArdell

Expertise

I am a software developer with more than 15 years of C++ experience and over 25 years of experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write pure C much, if at all, these days, so maybe ask in the C section about purely C matters), software development and many platform-specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16- and 8-bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part-task train simulator using C and the Intel DVI (Digital Video Interactive), the hardware-based predecessor to Indeo. Other projects included a CGI-based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia: I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominantly in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide-ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.


©2016 About.com. All rights reserved.