C++/Pre & post increment
Expert: Ralph McArdell - 8/1/2007
Question
Sir,
Read the question below.
#include<iostream.h>
#include<conio.h>
void main()
{
int a,add;
a=5;
add = (++a)+(++a);
cout<<add;
getch();
}
I think the answer should be 13, but the compiler gives 14. Why is that?
Answer
Because what you are doing is illegal! You are modifying the value of a more than once in the expression:
add = (++a)+(++a);
Under the rules for expressions (section 5, paragraph 4 of the ISO C++ standard):
"Except where noted, the order of evaluation of operands of
individual operators and subexpressions of individual
expressions, and the order in which side effects take place,
is unspecified. Between the previous and next sequence
point a scalar object shall have its stored value modified at
most once by the evaluation of an expression. Furthermore,
the prior value shall be accessed only to determine the value
to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
[Example:
i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented
-end example]"
In particular, notice the final words "otherwise the behavior is undefined" together with "a scalar object shall have its stored value modified at most once by the evaluation of an expression". I would also be wary of "the prior value shall be accessed only to determine the value to be stored". The reference to sequence points is needed because some operators, such as the comma operator, introduce a point within a single expression at which all previous evaluations must be complete. The restrictions of this clause do not apply across such operators (as in the second example), which is why the standard has the term sequence point.
The standard introduces side effects and sequence points thus (section 1.9, paragraph 7):
"Accessing an object designated by a volatile lvalue,
modifying an object, calling a library I/O function, or
calling a function that does any of those operations are
all side effects, which are changes in the state of the
execution environment. Evaluation of an expression might
produce side effects. At certain specified points in the
execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects
of subsequent evaluations shall have taken place."
There are sequence points:
- At the completion of evaluation of a full expression
- When calling a function, after evaluation of function arguments
- After copying a function return value before execution of expressions outside the function
- When evaluating each of the following expressions, there is a sequence point after evaluation of the first expression (called a in each expression below):
a && b
a || b
a ? b : c
a , b
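As a small illustration of the operators listed above, here is a sketch in which a variable is modified more than once within one full expression, yet each modification is separated from the next by a sequence point, so the results are well defined. The function names are hypothetical and chosen only for this example.

```cpp
// Comma operator: the left operand, including its side effects, is
// complete before the right operand is evaluated.
int comma_example() {
    int i = 0;
    i = 7, i++, i++;   // parsed as (i = 7), (i++), (i++)
    return i;          // 9, as in the standard's second example
}

// Logical AND: the first increment is complete before the second begins.
bool and_example() {
    int i = 5;
    return (++i == 6) && (++i == 7);   // both comparisons hold
}

// Logical OR: the right operand is skipped when the left one is true.
int or_example() {
    int i = 5;
    if ((++i == 6) || (++i == 7)) {
        return i;      // 6, because the second increment never ran
    }
    return -1;         // not reached in this sketch
}

// Conditional operator: the condition is complete before the chosen
// branch is evaluated.
int conditional_example() {
    int i = 5;
    return (++i > 5) ? ++i : --i;      // 7
}
```

None of these operators appears between the two increments in (++a)+(++a), which is why that expression gets no such guarantee.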
In your case the expression falls only into the first case, where there is a sequence point at the end of the full expression. As it modifies a twice, and twice is more than once, the expression falls into the area of undefined behaviour, which means anything could happen. For example, your program could give inconsistent results (between builds made with other compilers or compiler switches, or between executions of the same build), it could crash, or it could even cause your hard disk to be reformatted. The first is most likely: you get results that differ between compilers, or between builds made with the same compiler using different switches.
What is happening here (for this specific compiler) is that one ++a is performed, making a equal to 6. Next the second ++a is performed, making a equal to 7. Then a is added to a, and as a is 7 at this point, 7 + 7 equals 14. Despite the expression being illegal, this is arguably the most reasonable result, given the precedence of pre-increment over binary addition and the grouping you imposed with parentheses.
Look at it like this:
int a = 5;
++a;
++a;
int add = a + a;
It may seem that nothing can go wrong. However, consider what sort of machine instructions a compiler could generate for your expression:
First one that works as above:
Load value of a from memory into register1
Increment value in register1
Increment value in register1
Store value of register1 back to memory address for a
Add value in register1 to value in register1
Store value in register1 to memory address for add
Here the compiler remembers that the value of a is in register1 while it is being worked on so uses register1 as if it were a for most of the expression evaluation. Only at the end does register1 not contain the value of a, so the compiler stores the updated value of a back to memory before it is destroyed.
Here is a sequence that produces 12 rather than 14:
Load value of a from memory into register1
Increment value in register1
Load value of a from memory into register2
Increment value in register2
Store value of register2 back to memory address for a
Add value in register2 to value in register1
Store value in register1 to memory address for add
Here the compiler loads a from memory into different registers and increments each one separately - as a starts at 5, both times 5 is loaded into a register, thus when register2 is added to register1 both were incremented from 5 and contain 6. I am not convinced this is a sequence that a C++ compiler should produce. It may have to store register1 back to memory after incrementing the value (see next example).
Here is a sequence that produces the 13 you thought you would get rather than 14:
Load value of a from memory into register1
Increment value in register1
Store value of register1 back to memory address for a
Load value of a from memory into register2
Increment value in register2
Store value of register2 back to memory address for a
Add value in register2 to value in register1
Store value in register1 to memory address for add
Here the compiler loads and stores a from / to memory around each operation. So register1 is loaded with the starting value of a, 5, is incremented to 6 and stored back to memory. Register2 is then loaded with the updated value of a, 6, is incremented to 7 and stored back to memory. Register2 is then added to register1, thus producing 6 + 7, i.e. 13, which is stored to add's memory location.
Now loading and storing data to and from memory is expensive, even with cache memory; the fastest storage is the CPU registers (assuming the processor has them, and most do). Thus of the three possibilities above the first sequence would seem the most efficient, with one load and two stores. The second is next, with two loads and two stores, and the third is last, with two loads and three stores.
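To make the three sequences and their load and store counts concrete, here is a minimal sketch that models them in ordinary, well-defined C++. It is not what any particular compiler emits. The names Result, mem_a, r1, r2 and the three functions are hypothetical: mem_a stands for the memory location of a, and r1 and r2 stand for the registers. The third function follows the prose description of the third sequence, in which register2 is incremented and then stored back to a.

```cpp
// Outcome of one modelled instruction sequence.
struct Result {
    int add;     // value stored to add
    int a;       // final value of a in memory
    int loads;   // number of loads from memory
    int stores;  // number of stores to memory
};

// First sequence: one load, both increments applied to the same register.
Result sequence_one(int mem_a) {
    Result res{0, 0, 0, 0};
    int r1 = mem_a;  ++res.loads;    // load a into register1
    ++r1;                            // increment register1
    ++r1;                            // increment register1
    mem_a = r1;      ++res.stores;   // store register1 back to a
    r1 = r1 + r1;                    // add register1 to register1
    res.add = r1;    ++res.stores;   // store register1 to add
    res.a = mem_a;
    return res;
}

// Second sequence: a is loaded twice before anything is stored back.
Result sequence_two(int mem_a) {
    Result res{0, 0, 0, 0};
    int r1 = mem_a;  ++res.loads;    // load a into register1
    ++r1;                            // increment register1
    int r2 = mem_a;  ++res.loads;    // load the unchanged a into register2
    ++r2;                            // increment register2
    mem_a = r2;      ++res.stores;   // store register2 back to a
    r1 = r1 + r2;                    // add register2 to register1
    res.add = r1;    ++res.stores;   // store register1 to add
    res.a = mem_a;
    return res;
}

// Third sequence: a is stored back after each increment.
Result sequence_three(int mem_a) {
    Result res{0, 0, 0, 0};
    int r1 = mem_a;  ++res.loads;    // load a into register1
    ++r1;                            // increment register1
    mem_a = r1;      ++res.stores;   // store register1 back to a
    int r2 = mem_a;  ++res.loads;    // load the updated a into register2
    ++r2;                            // increment register2
    mem_a = r2;      ++res.stores;   // store register2 back to a
    r1 = r1 + r2;                    // add register2 to register1
    res.add = r1;    ++res.stores;   // store register1 to add
    res.a = mem_a;
    return res;
}
```

Called with 5, the three functions give add values of 14, 12 and 13 respectively. Note that the second sequence also leaves a at 6 rather than 7.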
Of course exactly what code choices a compiler makes depends on many things - the architecture and instruction set of the processor, other code around the site of the expression in question, what optimisations are in effect (if any) etc. The three examples I show here are just that - examples - in all three I have assumed that operations are performed on values in CPU registers and that increment is a separate operation of its own. This is just one possibility, although a reasonable one. As I mentioned I am not sure a C++ compiler should produce code like that of the second example - it may be restricted by the letter of the standard. Thus the standard limits the scope for optimisations in some cases, and therefore restricts the available code sequence choices that a compiler writer could choose.
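Whichever result was intended, it can be written so that a is modified only once per statement, which leaves the compiler no freedom to pick a different answer. The following is a sketch with hypothetical function names. The parameter a is passed by value, so only the local copy changes.

```cpp
// Both increments finish before either operand is read.
// For a == 5 this computes 7 + 7.
int sum_after_both_increments(int a) {
    ++a;
    ++a;
    return a + a;
}

// Each operand is captured straight after its own increment.
// For a == 5 this computes 6 + 7.
int sum_of_successive_increments(int a) {
    const int first = ++a;
    const int second = ++a;
    return first + second;
}
```

With 5 as the argument, the first function returns 14 and the second returns 13, on any conforming compiler.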
I hope this gives you some idea as to why your expression did not perform as expected, and why the C++ language standard makes such usage illegal, leading to undefined behaviour.