The Black Art of Compiler Mojo
Well, here I am working on a piece of code on two systems, the same code I was working on last week. I’m testing it on a dataset on two machines simultaneously: an old SGI Onyx340, and a fairly new Red Hat Linux Opteron system (x64). After a few runs, I notice I’m getting wildly different answers on the two systems.
Well, not wildly. They both determine the same starting points, with the same values. They both show the same data statistics (ranges, sizes, etc.). But on the Linux machine, the code seems to terminate much earlier, having generated only 1/3 to 1/2 of the points the SGI version does. So what’s going on?
It seems to be doing the same thing in both cases. Best I can figure, after a while the rounding errors start to accumulate differently, and the results from the two builds diverge. This is where we separate the men from the boys and delve deep into the black magic of compiler optimization switches. I initially thought that perhaps a compiler optimization was kicking in somewhere and reordering things in a way that lost some precision.
Well, after a lot of head-scratching, I came to a realization that I hadn’t thought of before: all the SGI stuff I’m compiling in n64 mode (64-bit), but I’m not setting any options at all on the Linux machine. So I reconfigure the app to use "-O -m64" for all the compile stages. Much to my surprise, it now matches the SGI version. I’m still not entirely sure what’s going on, but I have a hunch that it’s now using doubles instead of floats for some of the computations, increasing the number of significant figures and removing some of the roundoff error that might have been giving me trouble. And as an added bonus, the -O flag has significantly improved performance.
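To make the roundoff hunch a bit more concrete, here is a minimal sketch in Python. It is not the application code (which isn't shown here), just an illustration of how the same loop can stop at a different iteration depending on the precision of the intermediate values. The helper names `to_single`, `keep_double` and `steps_until` are made up for this example, and single precision is emulated by round-tripping each value through a 4-byte float, since Python floats are doubles.

```python
import struct


def to_single(x):
    """Round a Python float (a C double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def keep_double(x):
    """Leave the value in double precision (no extra rounding)."""
    return x


def steps_until(target, step, round_fn):
    """Add `step` repeatedly until the running total reaches `target`.

    `round_fn` is applied after every addition to emulate the precision
    in which the intermediate result is stored.
    Returns (number of additions, final total).
    """
    total = round_fn(0.0)
    step = round_fn(step)
    count = 0
    while total < target:
        total = round_fn(total + step)
        count += 1
    return count, total


# Same loop, same inputs, different storage precision.
single_steps, single_total = steps_until(100000.0, 0.1, to_single)
double_steps, double_total = steps_until(100000.0, 0.1, keep_double)

print("single precision:", single_steps, "steps, total", single_total)
print("double precision:", double_steps, "steps, total", double_total)
```

On an IEEE 754 machine the single-precision loop should cross the threshold thousands of iterations before the double-precision one, because each rounded addition contributes slightly more than 0.1 once the total gets large. That is the same flavor of "terminates earlier" behavior described above, though it doesn't prove this is what is happening in the real code.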
During this last run, with both systems using -O and 64-bit builds, the Linux system generated 700+ cores with 18-point tubes in the time it took the SGI machine to generate 75 with 10-point tubes. Woohoo for commodity hardware!
Update (Nov 22nd, 8:00am): Well, looks like I spoke too soon. For some reason, the Linux machine still returns "nan" for some computations that the SGI solves accurately. Guess it’s back to the drawing board...

