C++/What is the internally difference if I read a data as bytes, characters, string and stream?
Expert: Ralph McArdell - 10/19/2005
Question
Hi again,
What is the internally difference if I read a data as bytes, characters, string and stream?
Thanks,
lzzzz
Answer
Depends what you mean and in what context. Also, data is plural, so you cannot read "a data"; the singular is datum.
If you mean in C and C++, then the char type is a signed or unsigned integer that is one byte in size, and on most platforms that byte is 8 bits. It is also the smallest built-in C and C++ type. A byte need not be 8 bits at all, though: the language only requires it to be at least 8 bits, and the CHAR_BIT macro gives the actual figure. If I remember correctly, for C++ the sizes of the integer types, as reported by sizeof, have to be as follows:
size of char = 1,
size of short >= size of char,
size of int >= size of short,
size of long >= size of int.
Notice that each integer type is at least as large as the previous one. So, for a 32-bit compiler, the following are reasonable:
size of char = 1 = 8 bits,
size of short = 2 = 16 bits,
size of int = 4 = 32 bits,
size of long = 4 = 32 bits.
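For some weird signal processor that can only address 32-bit words the following might be true (long has to hold at least 32 bits, so every type ends up the same size):
size of char = 1 = 32 bits,
size of short = 1 = 32 bits,
size of int = 1 = 32 bits,
size of long = 1 = 32 bits.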
The original use intended for char was obviously for storing byte-sized character data, but it can just as easily be used to store small integer values that have other meanings, such as state values or raw I/O port data, which is often only a byte in size. As an aside, the C++ bool type is often implemented as a byte-sized object as a compromise between required size (1 bit), wasted space and speed of use: most modern architectures can address a byte as the smallest unit, but accessing anything smaller, such as a bit, becomes cumbersome.
Note that the values stored in these integer types only have a meaning when used with the appropriate operations. A 32-bit value can be interpreted as a single 32-bit integer (signed or unsigned), a single 32-bit floating point value, two 16-bit short integers or 16-bit characters, or four 8-bit integers or characters. What the numbers represent is again open to interpretation: are they counters, or id values, or characters, or colour channel values, or ...?
The only reason an array of bytes (say, of char) is interpreted as a C-string is because that is how functions such as strcpy, strlen, printf etc. treat the data and expect the data to be arranged. Try using one of these functions on a char array that does not have a zero value in its last element (or anywhere before it)!
Note that C and C++ support the wchar_t type. In C this is a typedef alias for some other integer type such as an unsigned short, hence the _t postfix to the name. In C++ they decided to keep the name but make wchar_t a full built-in type so that the compiler can differentiate overloaded functions taking as parameters, say, an unsigned short and a wchar_t. The wchar_t type is meant for use with wide characters and character strings, such as 16-bit or 32-bit encodings for Unicode symbols.
Now a string is a sort of array of (usually) character data, which again comes in narrow character (char) and wide character (wchar_t) formats. In C, of course, strings were exactly that: an array of char or wchar_t, with the convention that they were terminated by a zero value, making them one character longer to store than the number of characters in the string.
In C++ we have the std::string and std::wstring classes. These are classes that represent strings, and the data they hold are collections of, again, char and wchar_t respectively. However, how exactly they store these characters is up to the class implementation.
Note that for most purposes you could have a string in which the characters are any integer type. If, for example, char is 8-bit and wchar_t is 16-bit but you have to handle a character encoding that is 32-bit, then maybe you would use an array of unsigned int or unsigned long (for some popular 32-bit compilers). In fact I would probably define a typedef alias for the required integer type such as:
typedef unsigned long wchar32;
Then I can create an array of these:
wchar32 veryWideString[NumberOfCharacters + 1];
and can fill the array with the wide characters. Unfortunately there is of course no support for literal characters of this size, as there is for char characters and strings: 'c' and "A char string", and for wchar_t characters and strings: L'w' and L"A wchar_t string" (note the L, for long, prefix). Also there is no C library support for these extra-wide characters and strings, so again you would have to write your own.
However, for C++ you can define a new string type based on the std::basic_string<> class template (which already has the predefined aliases std::string and std::wstring), using our new wchar32 character type:
typedef std::basic_string<wchar32> w32string;
Now we have a type that provides us with all the usual std::string operations but on character data that is 32 bits in size (on compilers where unsigned long is 32 bits).
Now the other area where characters and integers differ is in how they are formatted to a standard C++ stream. char and wchar_t and the various string types will display characters. Other integer types will be formatted to represent the value as some sequence of characters that displays the value as a number. This can be annoying if your char or char array contain numeric data rather than character data. In this case casting the character(s) to ints will fix the problem:
char smallInteger( 65 );
std::cout << smallInteger << '\n'; // outputs A on a line (assuming ASCII)
std::cout << static_cast<int>(smallInteger)
          << '\n'; // outputs 65 on a line.
Now to reading values. How the data is interpreted depends on how the data is read. If you use the C++ streams and read the data from a file using the extraction operator (>>, the opposite of the << insertion operator used in the above example) then the data should be formatted as if it had been written via a C++ stream in the complementary way. However, you can read raw chars using the get, read and readsome member functions of the std::istream type. All read data as char or char*. Those forms that take a char* and a length either terminate the read data with a zero (get) or do not (read, readsome). You can look up the small print on std::istream for yourself; I do not intend to reproduce it for you here as it is a waste of my time. I also will not go into the various other ways of reading data here (such as the C library fread function).
If you are just reading binary file data, open the file as binary for reading and read it using read or readsome.
If you wish to manually read chars or C char strings, use get or getline.
If you have a text file with strings and numbers in some format, use >> to read in individual fields for the types they represent. If using chars as small integers, read the data into a temporary int and then copy it to the required char.
Hope this helps, but once again you have posted a quick question with little context to help me answer quickly. Again, you are starting to annoy me, as I have to answer at length and hope I answered your question. Any more like this and I shall just refuse your questions. Now can I have my dinner please?