# C++/c++

Question
Is there any way to prevent a number from changing? In C++ I get the answer 2999 / 10 = 299.899994, but it should be 299.9. The answer is very, very close, but this is causing a problem in the new program I am trying to make. Please help.

This is not so much a C++ problem as it is a problem with floating point representation and arithmetic in general - i.e. it is a computer science problem. I am not an expert in the field of floating point arithmetic as I do not have to use it all that often, but have picked up the basics over the years, so here is a very brief and simplistic introduction.

Floating point representation of real values is, in most cases, an approximation of those values. The format stores a value as a binary mantissa and a binary exponent. That is, the value is:

binary-mantissa * (2 ^ binary-exponent)

Where * is multiplication and ^ is exponent (to the power of).
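As a rough illustration of the mantissa-and-exponent form, the standard library function std::frexp (from <cmath>) splits a value into those two parts and std::ldexp puts them back together. The Parts struct and the Decompose and Recompose helpers are names made up for this sketch. Note also that std::frexp reports the mantissa scaled into the range [0.5, 1), which is not necessarily how the bits are laid out in memory:

```cpp
#include <cmath>   // for std::frexp, std::ldexp

// Hypothetical helper type for this sketch: the two parts of a double.
struct Parts
{
    double mantissa;  // in the range [0.5, 1) for non-zero finite values
    int    exponent;  // power of 2
};

// Splits v so that v == mantissa * (2 ^ exponent).
Parts Decompose( double v )
{
    Parts p;
    p.mantissa = std::frexp(v, &p.exponent);
    return p;
}

// Rebuilds the value from its parts: mantissa * (2 ^ exponent).
double Recompose( Parts p )
{
    return std::ldexp(p.mantissa, p.exponent);
}
```

For example Decompose(299.5) gives a mantissa of 0.5849609375 and an exponent of 9 (0.5849609375 * 512 is 299.5), and Recompose gets back exactly the value that was split.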

So many values can only be stored as an approximation: those that are not integers, and those that are integers but are too large to be represented in the number of bits available for the mantissa portion of a floating point value. A non-integer value that is a sum of positive and negative powers of 2 can be represented exactly, provided there are enough bits to hold all parts of the value.
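The point about large integers can be checked with a small sketch. It assumes float is the usual 32-bit IEEE single precision type with a 24-bit mantissa, and FitsInFloat is a name invented for the example:

```cpp
// Returns true if the integer n survives a trip through a float unchanged.
// The float is converted back via long long so that a value rounded up
// past the largest int cannot overflow the conversion.
bool FitsInFloat( int n )
{
    float f = static_cast<float>(n);
    return static_cast<long long>(f) == n;
}
```

Under that assumption FitsInFloat(16777216) is true (16777216 is 2 ^ 24) but FitsInFloat(16777217) is false: the odd value needs 25 bits, so the nearest float is stored instead.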

Just as in base 10 arithmetic there are values that cannot be represented in a finite number of digits (1/3 for example), so there are in base 2 (i.e. binary). Unfortunately 1/10 (as far as I know) is one such value [it has the repeating binary fraction 0.000110011001100... (1/16+1/32+1/256+1/512+1/4096+1/8192+..., maybe more easily thought of as 3/32 + 3/512 + 3/8192+...)]. So just as dividing by 3 in decimal gives results with an infinite number of digits after the decimal point for 2 out of 3 values (dividing a multiple of 3 by 3 of course yields a finite, whole number), so dividing by 10 in binary gives a result with an infinite number of digits after the binary point in 4 out of 5 cases (values divisible by 10 and by 5 generally give the expected results).
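The repeating pattern can be generated by long division in base 2 using only integer arithmetic. This is just a sketch to show where the digits come from, and BinaryFraction is a made-up name:

```cpp
#include <string>

// Returns the first 'count' binary digits after the binary point
// of numerator/denominator (assumes 0 <= numerator < denominator).
std::string BinaryFraction( unsigned numerator, unsigned denominator, int count )
{
    std::string digits;
    unsigned remainder = numerator;
    for (int i = 0; i != count; ++i)
    {
        remainder *= 2;                 // move one binary place to the right
        if (remainder >= denominator)
        {
            digits += '1';
            remainder -= denominator;
        }
        else
        {
            digits += '0';
        }
    }
    return digits;
}
```

BinaryFraction(1, 10, 16) returns "0001100110011001", and asking for more digits just repeats the 0011 group, whereas BinaryFraction(3, 4, 4) returns "1100" because 3/4 is 1/2 + 1/4 and terminates.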

Of course computers hold finite values so what we get is an approximation of these infinite quantities, and thus they cannot be 100% accurate. You might like to try the following program that calculates the values from 2990 to 3000 divided by 10 for the three floating point types supported by C++ (float, double, and long double). Note that I am using ISO standard C++ and C++ library features (which is not unreasonable: as the ISO C++ standard was first published in 1998, most implementations around today should be able to cope with such a program):

#include <iostream> // for std::cout
#include <iomanip>  // for std::setprecision
#include <limits>   // for std::numeric_limits

// Function template to output a (floating point) value to std::cout
// using a precision based on the type of the passed value.
template <typename T>
void Print( T v )
{
    std::cout << std::setprecision(std::numeric_limits<T>::digits10+2) << v
              << " (precision=" << std::numeric_limits<T>::digits10+2 << ")\n"
              ;
}

// Main entry point function.
// Iterates from 2990 to 3000.
// Calculates the loop value divided by 10
// for float, double and long double precision
// and prints the results to std::cout.
int main()
{
    for (int v(2990); v != 3001; ++v )
    {
        float f(v/10.0F);
        double d(v/10.0);
        long double ld(v/10.0L);

        Print(f);
        Print(d);
        Print(ld);
        std::cout << '\n';
    }
    return 0;
}

Depending on your compiler, long double may have the same precision as double or it may have more precision. Of the two compilers I tried the code with, MS Visual C++ 2005 had the same precision and g++ 4.2.3 for 64-bit Ubuntu 8.04 Linux had slightly more precision. The results should look something like:

299 (precision=8)
299 (precision=17)
299 (precision=17)

299.10001 (precision=8)
299.10000000000002 (precision=17)
299.10000000000002 (precision=17)

299.20001 (precision=8)
299.19999999999999 (precision=17)
299.19999999999999 (precision=17)

299.29999 (precision=8)
299.30000000000001 (precision=17)
299.30000000000001 (precision=17)

299.39999 (precision=8)
299.39999999999998 (precision=17)
299.39999999999998 (precision=17)

299.5 (precision=8)
299.5 (precision=17)
299.5 (precision=17)

299.60001 (precision=8)
299.60000000000002 (precision=17)
299.60000000000002 (precision=17)

299.70001 (precision=8)
299.69999999999999 (precision=17)
299.69999999999999 (precision=17)

299.79999 (precision=8)
299.80000000000001 (precision=17)
299.80000000000001 (precision=17)

299.89999 (precision=8)
299.89999999999998 (precision=17)
299.89999999999998 (precision=17)

300 (precision=8)
300 (precision=17)
300 (precision=17)

(from a run of the version built with the MS VC++ 2005)
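To see what your own compiler does with long double you can ask std::numeric_limits directly. The two function templates below are hypothetical wrappers written for this sketch, and the figures mentioned afterwards assume the usual IEEE formats for float and double:

```cpp
#include <limits>   // for std::numeric_limits

// Number of decimal digits type T can hold without change.
template <typename T>
int DecimalDigits()
{
    return std::numeric_limits<T>::digits10;
}

// Number of binary digits (bits) in the mantissa of type T.
template <typename T>
int MantissaBits()
{
    return std::numeric_limits<T>::digits;
}
```

With those formats DecimalDigits<float>() is 6 and DecimalDigits<double>() is 15, which is where the precision=8 and precision=17 in the output above come from (digits10+2). If DecimalDigits<long double>() also returns 15 then long double is no more precise than double on that compiler.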

OK so that is a brief introduction as to why - now what can you do to mitigate the problem?

Well you could try using a floating point type with more precision. The value you show in the question seems to indicate a single precision 32-bit floating point type. Usually this is used for the float type in C/C++. So try using double rather than float. The error when using greater precision will be smaller and this may be enough for your calculations - especially if the results only require the precision of a float value. That is, all inputs and outputs are floats but all internal calculations are done using doubles.
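Here is a small sketch of the floats-outside, doubles-inside idea. The repeated addition is only a stand-in for whatever your real calculation is, and both function names are invented for the example:

```cpp
// Adds 'value' to itself 'count' times using a float accumulator.
float SumUsingFloat( float value, int count )
{
    float total = 0.0f;
    for (int i = 0; i != count; ++i)
    {
        total += value;   // each addition is rounded to float precision
    }
    return total;
}

// Same input and output types, but the running total is a double.
float SumUsingDouble( float value, int count )
{
    double total = 0.0;
    for (int i = 0; i != count; ++i)
    {
        total += value;   // each addition is rounded to double precision
    }
    return static_cast<float>(total);   // round once, at the very end
}
```

Adding 0.1f one million times should give 100000. On a typical implementation with IEEE single and double precision, SumUsingDouble returns a value within a small fraction of that, while SumUsingFloat ends up out by far more than 1 because the rounding error of every addition accumulates.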

If that does not work then try re-factoring the calculations e.g. so that divisions are done once only.

So rather than say

y1 = x1/10.0;
y2 = x2/20.0;
y3 = x3/65.6;
z = y1*y2*y3;

you could try:

z = x1*x2*x3;
y = 10.0 * 20.0 * 65.6;
z /= y;

However you have to be careful with such transformations to avoid introducing additional overflows (and possibly underflows).
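To make the overflow risk concrete, here is a sketch of both forms using float, where the limit (about 3.4e38 for IEEE single precision) is easy to hit. The function names are made up and the divisors are simply the ones from the example above:

```cpp
// Divides each input separately, then multiplies the results.
float DivideEach( float x1, float x2, float x3 )
{
    float y1 = x1 / 10.0f;
    float y2 = x2 / 20.0f;
    float y3 = x3 / 65.6f;
    return y1 * y2 * y3;
}

// Multiplies the inputs first, then divides once.
float DivideOnce( float x1, float x2, float x3 )
{
    float z = x1 * x2 * x3;            // may overflow before the division
    float y = 10.0f * 20.0f * 65.6f;
    return z / y;
}
```

For modest inputs the two agree closely, but with all three inputs set to 1e13f the product in DivideOnce is about 1e39, which overflows to infinity on a typical IEEE implementation, whereas DivideEach still returns a finite result of about 7.6e34.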

Of course not being an expert on floating point operations I am not sure if this really mitigates the problem.

You could also check to see if your compiler has switches to control how floating point values are handled. Sometimes you have a choice between fast but not so accurate and slower but more accurate options.

One thing you will have to be aware of when using floating point values is that, because of the inexact nature of the floating point representation and operations, tests on floating point values nearly always need an error value of a size relevant to your application. Such a value is sometimes called epsilon or similar.

So rather than saying:

if (fp_value == exact_value) ...

we have to say something like:

if ( fabs(fp_value - exact_value) < epsilon ) ...

That is, if the absolute value of the difference between our value and the value we are testing for is less than the expected (or acceptable) error due to inexactitudes in floating point representation and arithmetic, then consider the condition true. The fabs function (or std::fabs for hardcore C++ users <g>) can be found in <cmath>, or <math.h> for C.
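Wrapped up as a function, the test might look like the sketch below. NearlyEqual is just a name I have picked, and the caller supplies an epsilon suited to the application:

```cpp
#include <cmath>   // for std::fabs

// Returns true if a and b differ by less than epsilon.
bool NearlyEqual( double a, double b, double epsilon )
{
    return std::fabs(a - b) < epsilon;
}
```

Taking the value from the question, float q = 2999 / 10.0f is about 0.000006 away from 299.9, so NearlyEqual(q, 299.9, 0.001) is true even though q == 299.9 is false, while NearlyEqual(q, 299.9, 0.000001) is false because that epsilon is smaller than the float's error.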

So if in your application values are accurate to say three decimal places use an epsilon value one or two orders of magnitude smaller e.g.

double const Epsilon(0.00001);

or even:

long double const Epsilon(0.00001L);

Note that using a float is probably not good enough if the values of epsilon and the expected values cannot be represented with the required precision - then we are right back where we started!
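One way to see why is to look at the gap between neighbouring float values near the numbers in question. The sketch below uses std::nextafter, which assumes a C++11 or later library, and SpacingAbove is a made-up name:

```cpp
#include <cmath>    // for std::nextafter (C++11)
#include <limits>   // for std::numeric_limits

// Gap between x and the next representable float above it.
float SpacingAbove( float x )
{
    float next = std::nextafter(x, std::numeric_limits<float>::infinity());
    return next - x;
}
```

With IEEE single precision SpacingAbove(299.9f) is about 0.00003, which is larger than the 0.00001 epsilon above. Two different float values near 299.9 can therefore never be closer together than epsilon, so the test would have to behave like an exact comparison.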

For example we could modify the Print function from the previous example like so:

#include <cmath>    // for std::fabs (needed in addition to the earlier includes)

typedef long double   ftest_type;

template <typename T>
void Print( T v, ftest_type expected )
{
    std::cout << std::setprecision(std::numeric_limits<T>::digits10+2) << v
              << " (precision=" << std::numeric_limits<T>::digits10+2 << ')'
              ;

    ftest_type const Epsilon(0.000001L);

    if ( std::fabs(v - expected) < Epsilon )
    {
        std::cout << " Within required accuracy\n";
    }
    else
    {
        std::cout << " Outside required accuracy\n";
    }
}

Print now takes a second parameter - an expected result value. This is of type ftest_type which is an alias for long double.

Now, after printing the value of the calculation and the precision, a note is also displayed indicating whether or not the value was within the expected error tolerance, using the test shown previously that compares the difference to an epsilon value.

Main is updated like so:

int main()
{
    ftest_type const ExpectedValues[]
        =  { 299.0L, 299.1L, 299.2L, 299.3L
           , 299.4L, 299.5L, 299.6L, 299.7L
           , 299.8L, 299.9L, 300.0L, 300.1L
           };
    int const Begin(2990);
    for (int i(0); i != 11; ++i )
    {
        int v(Begin+i);
        float f(v/10.0F);
        double d(v/10.0);
        long double ld(v/10.0L);

        Print(f, ExpectedValues[i]);
        Print(d, ExpectedValues[i]);
        Print(ld, ExpectedValues[i]);

        std::cout << '\n';
    }
    return 0;
}

Here a set of expected result values is set up and each one is passed to the Print function calls. The loop now iterates from 0 to 10 and calculates v based on the loop iteration value and a beginning value set to 2990. This should produce the same set of values for v as previously but allows the loop iteration value, i, to be used as an index into the ExpectedValues array.

The results now look like so (from a MS VC++ 2005 build):

299 (precision=8) Within required accuracy
299 (precision=17) Within required accuracy
299 (precision=17) Within required accuracy

299.10001 (precision=8) Outside required accuracy
299.10000000000002 (precision=17) Within required accuracy
299.10000000000002 (precision=17) Within required accuracy

299.20001 (precision=8) Outside required accuracy
299.19999999999999 (precision=17) Within required accuracy
299.19999999999999 (precision=17) Within required accuracy

299.29999 (precision=8) Outside required accuracy
299.30000000000001 (precision=17) Within required accuracy
299.30000000000001 (precision=17) Within required accuracy

299.39999 (precision=8) Outside required accuracy
299.39999999999998 (precision=17) Within required accuracy
299.39999999999998 (precision=17) Within required accuracy

299.5 (precision=8) Within required accuracy
299.5 (precision=17) Within required accuracy
299.5 (precision=17) Within required accuracy

299.60001 (precision=8) Outside required accuracy
299.60000000000002 (precision=17) Within required accuracy
299.60000000000002 (precision=17) Within required accuracy

299.70001 (precision=8) Outside required accuracy
299.69999999999999 (precision=17) Within required accuracy
299.69999999999999 (precision=17) Within required accuracy

299.79999 (precision=8) Outside required accuracy
299.80000000000001 (precision=17) Within required accuracy
299.80000000000001 (precision=17) Within required accuracy

299.89999 (precision=8) Outside required accuracy
299.89999999999998 (precision=17) Within required accuracy
299.89999999999998 (precision=17) Within required accuracy

300 (precision=8) Within required accuracy
300 (precision=17) Within required accuracy
300 (precision=17) Within required accuracy

You will notice that only some of the float result values are outside of our required tolerance.

However if we change the ftest_type to float:

typedef float         ftest_type;

And rebuild we will probably get a bunch of warnings. These should be due to us specifying long double values for the Epsilon and ExpectedValues array values, so we can update them like so:

ftest_type const Epsilon(0.000001F);

...

ftest_type const ExpectedValues[]
=  { 299.0F, 299.1F, 299.2F, 299.3F
, 299.4F, 299.5F, 299.6F, 299.7F
, 299.8F, 299.9F, 300.0F, 300.1F
};

to stop the compiler complaining.

Now if we re-build and run the program the output looks like so (again from a MS VC++ 2005 build):

299 (precision=8) Within required accuracy
299 (precision=17) Within required accuracy
299 (precision=17) Within required accuracy

299.10001 (precision=8) Within required accuracy
299.10000000000002 (precision=17) Outside required accuracy
299.10000000000002 (precision=17) Outside required accuracy

299.20001 (precision=8) Within required accuracy
299.19999999999999 (precision=17) Outside required accuracy
299.19999999999999 (precision=17) Outside required accuracy

299.29999 (precision=8) Within required accuracy
299.30000000000001 (precision=17) Outside required accuracy
299.30000000000001 (precision=17) Outside required accuracy

299.39999 (precision=8) Within required accuracy
299.39999999999998 (precision=17) Outside required accuracy
299.39999999999998 (precision=17) Outside required accuracy

299.5 (precision=8) Within required accuracy
299.5 (precision=17) Within required accuracy
299.5 (precision=17) Within required accuracy

299.60001 (precision=8) Within required accuracy
299.60000000000002 (precision=17) Outside required accuracy
299.60000000000002 (precision=17) Outside required accuracy

299.70001 (precision=8) Within required accuracy
299.69999999999999 (precision=17) Outside required accuracy
299.69999999999999 (precision=17) Outside required accuracy

299.79999 (precision=8) Within required accuracy
299.80000000000001 (precision=17) Outside required accuracy
299.80000000000001 (precision=17) Outside required accuracy

299.89999 (precision=8) Within required accuracy
299.89999999999998 (precision=17) Outside required accuracy
299.89999999999998 (precision=17) Outside required accuracy

300 (precision=8) Within required accuracy
300 (precision=17) Within required accuracy
300 (precision=17) Within required accuracy

Now we see an odd thing. All those values other than .0 and .5 fractions that were within tolerance last time are now outside tolerance and those that were outside are now within tolerance.

This is due to the fact that the expected values are now floats and are thus outside the required tolerance to start with - their values cannot be represented accurately enough as floats for our specified tolerance. However, the float results of the calculations have exactly the same error as the float expected values, so in these cases the errors cancel out!
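The cancelling effect can be reduced to a pair of exact comparisons. This is only a sketch, with function names invented for the purpose, and it assumes the compiler rounds both the division and the literals to the nearest float as IEEE implementations normally do:

```cpp
// The quotient and the expected value are both floats,
// so both carry the same rounding error.
bool MatchesAsFloat( int v, float expected )
{
    float q = v / 10.0f;
    return q == expected;
}

// The float quotient is compared with a more accurate double value.
bool MatchesAsDouble( int v, double expected )
{
    float q = v / 10.0f;
    return static_cast<double>(q) == expected;
}
```

MatchesAsFloat(2999, 299.9f) is true because both sides are the same nearest float to 299.9, whereas MatchesAsDouble(2999, 299.9) is false because the double holds a closer approximation. For 2995 and 299.5 both versions are true, since 299.5 can be represented exactly.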

For more on floating point try an Internet search. You might like to start with:

http://en.wikipedia.org/wiki/Floating_point

and:

http://docs.sun.com/source/806-3568/ncg_goldberg.html

This second article, called "What Every Computer Scientist Should Know About Floating-Point Arithmetic", is very long and gets quite technical. I have not read it all, but it does have some information on rounding and errors and the like before things get too technical.

And then there are these from the C++ FAQ, which I refer to in point 2/ of my Instructions for Questioners, at

http://www.parashift.com/c++-faq-lite

In this case you need to refer to the newbie section:

http://www.parashift.com/c++-faq-lite/newbie.html

Specifically 29.16 and 29.17:

http://www.parashift.com/c++-faq-lite/newbie.html#faq-29.16
http://www.parashift.com/c++-faq-lite/newbie.html#faq-29.17

Hope this has given you some idea about some of the pitfalls of using floating point values and operations and hopefully given you some pointers to possible ways to mitigate your problems. I cannot give more specific information as you provide no specific details as to how it is causing problems with your program.

If you would like further help then please ask further questions - but please remember that I am not a floating point expert!

In fact I think I may add something to this effect to my Instructions to Questioners - I have repeated it enough times now...

Questioner's Rating
Rating (1-10): Knowledgeability = 9, Clarity of Response = 8, Politeness = 10
Comment: worked. Actually I was not searching for more approximate but the method to find it is exactly same as it should be or not and i found it in answer. Thanks!!

C++

Volunteer

#### Ralph McArdell

##### Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

##### Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.
