C++/signed vs unsigned char?
When I compile:
using namespace std;
char ch1 = '«';
unsigned char ch2 = '«';
char ch3 = 128;
«Press any key to continue . . .
Please tell me why my '«' became char -57 or 199; it should be char 128 or -128. I input this '«' from the Character Map in Windows.
As far as I know, every character outside the range [0-127] actually has two values for display, one for signed and one for unsigned char.
Is that true ?
Could you tell me briefly what the main difference is between signed and unsigned char?
Thank you very much !
First, the character you chose from the Windows Character Map is obviously not the same as the one with character code 128 (0x80). In fact, when I check it on my system it has a value of 0xC7 = 192 + 7 = 199. (This information comes from the hint text that appears when you hover over a character in the MS Windows Character Map application; the text reads "U+00C7: Latin Capital Letter C With Cedilla".)
199 represented as an 8-bit 2s complement signed value is -57:
199 == 1100 0111 binary.
The top bit is 1 so if the value is signed it will be negative.
To get the magnitude (i.e. the positive equivalent) we flip the bits (1s complement) and add 1 (2s complement):
Flipping 1100 0111 gives 0011 1000; then 0011 1000 + 1 = 0011 1001 = 0x39 = 57.
Thus the bit pattern 1100 0111 represents an 8-bit unsigned value of 199 or a signed 8-bit 2s complement value of -57.
In my locale, using a console (CMD.EXE) under Windows Vista, the output is:
(Those are A with tilde characters on the first line)
I am not sure where the ╟ ╟ comes from in your output, but I suspect it is due to translation of the question text through AllExperts. I would expect something similar to happen to my A-with-tilde characters shown above when I post this answer text back to you...
The reason you get the characters for the lines:
is that you are passing char values to the cout ostream, which just passes each value to the console for display as a character.
On the other hand you get the values of the characters with the line:
because adding a literal zero adds an int value; thus the results of the expressions ch1+0 and ch2+0 are int values, not char values. int values sent to ostreams are formatted as a sequence of digit characters representing the integer's value. You could also convert them explicitly to int with a cast.
The reason the character output differs between the console and the main Windows environment is that the code page used differs - and no, I do not like it much either! Windows in the main uses Unicode encoding (UTF-16) internally. This encoding can handle over a million character codes: 65536 in a single 16-bit value, and the rest using special pairs of 16-bit values. The console, however, is resolutely 8-bit character based, with the upper 128 characters determined by the choice of code page (see the chcp command). It cannot even handle UTF-8 - the multi-byte 8-bit encoding for Unicode characters - at least not as far as I can tell, and I recently had occasion to try!
On the other hand, the value 128 does appear to be the character you were originally after in the console code page used on both our systems by default.
Remember that which value produces which character is a matter of the mappings and fonts in use at the time, and these may well differ between applications such as consoles and GUI-based editors, and of course the keyboard. For example, the Visual Studio editor does accept Unicode characters - try it in main code and see. A good way is to quote something in MS Word and copy and paste the quoted text, including the quotes, into a program in a Visual Studio edit window. The editor will accept the fancy quote characters used by Word, but the compiler will complain when you try to build the program!
Now on to the char types. I am not sure exactly what you mean by characters having two values - if you mean that a given bit pattern can be interpreted in different ways, then yes.
Unlike the other integer types, char does not automatically mean a signed character; there are in fact three distinct types: char, signed char and unsigned char. Whether plain char is signed or unsigned depends on the compiler (and quite possibly on compiler flags - gcc has the -fsigned-char and -funsigned-char options, and MS VC++ has /J, which sets the default char to unsigned). The reason for this is probably historic: once you have some products interpreting char as signed and others interpreting it as unsigned, getting either side to change is an uphill struggle, as all code written for those compilers will assume signed (or unsigned) char by default. Thus a change one way or the other is likely to break people's code - and there is a lot of old C and C++ code out there.
This is also why compilers often have options to override their preferred default signedness for char: to help when porting code from a system/compiler that defaults to the opposite signedness.
The difference between signed and unsigned char is the same as the difference between the signed and unsigned versions of the other integer types - other than the fact that, as noted above, to be sure of getting a signed char you have to write signed char and not just char. It comes down to how the bit pattern is interpreted when performing arithmetic. The other use for char is as a small integer rather than as a character: many I/O devices use byte-sized registers, for example, and char is commonly byte sized. Also, when manipulating character codes - for example when converting from one character encoding scheme to another - it is often useful to modify bit patterns and add offsets and the like. Adding an offset to a signed char will probably not get the expected result: although the bit pattern may be correct, tests on the value may fail. For example 127 + 72 = 199, but interpreted as signed the result is -57, so a test such as > 32 will fail. Another area of confusion with signed values is right shifting, because the sign bit is often copied down into the vacated bit positions (a sign-extending, arithmetic shift), but it may not be (a logical shift). The standard says:
"The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined."
So it is usually best to make the values explicitly unsigned in such cases.
Exactly how signed types are represented is up to the compiler/system. The most common on desktop systems is of course 2s complement, as noted at the beginning of this answer. An alternative is sign-magnitude representation, in which the highest bit just indicates + or - and the other bits are a plain binary representation of the magnitude. Thus in sign-magnitude representation -57 and +57 differ only in the value of the top (sign) bit.
For more on various signed number representations see the Wikipedia article at http://en.wikipedia.org/wiki/Signed_number_representations
Hope this has been of some use. Please ask further questions if you have further queries.