## Tuesday, September 15, 2009

### Careful with Floating Point Sums

Floating point computations are imperfect in the computing world. Often times we overlook this fact, so it's good to refresh our memories every now and then.

A simple summation loop such as this one can produce results with large errors.

int i;
float fSumFloat = 0.0f;
double fSumDouble = 0.0;
const float fIncrementFloat = 0.0001f;
const double fIncrementDouble = 0.0001;
const int nIterations = 25000000;

for( i = 0; i < nIterations; i++ )
{
fSumFloat += fIncrementFloat;
fSumDouble += fIncrementDouble;
}

You would expect that summing 25 million 0.0001's would result in 2500, but depending on your machine the results will vary. Here are results of running the above on my PC (with all optimizations turned off). You'll see that there is some error even when using double precision floating point.

Sum (float) = 2048.0000000000
Sum (double) = 2500.0000004718
Expected Sum = 2500.0000000000

Now that I have your attention, there are ways to combat the roundoff errors that ultimately kill us. One popular method is the Kahan Summation method. Another method is known as pairwise summation, where you continually sum pairs of terms until all you have left is the final sum. For example, start with 1 + 2 + 3 + 4. The first pass yields you (1+2) = 3 and (3+4) = 7. The second pass gives you your desired sum of (3+7) = 10.

Both of these summation methods are slower than naive summation, but they are available if needed.

One thing that I found a bit interesting is that the results of these two methods were different on my PC as compared to a Linux box. I'm pretty sure that it has nothing to do with the OS, instead it's probably due to CPU and compiler differences.

On my computer at home:
Sum (float) = 2048.0000000000
Sum (double) = 2500.0000004718
Actual Kahan Sum (float) = 2048.0000000000
Pairwise Sum (float) = 2499.9998779297
Expected Sum = 2500.0000000000

On a Linux box:
Actual Sum (float) = 2048.0000000000
Actual Sum (double) = 2500.0000004718
Actual Kahan Sum (float) = 2500.0000000000
Pairwise Sum (float) = 2500.0000000000

I also noticed that when I had compiler optimizations turned on, the results coming from my PC were definitely better.

Actual Sum (float) = 2499.9999368447 [Compiler Optimizations Turned On]

Anyway, the bottom line is if you're going to be dealing with floating point calculations (and, summations in particular), be careful and run lots of experiments. Even when the code you've written is highly portable, you will want to make sure that results will be as you expect on your target environments. 