One of the machines we have here at the lab is an older Compaq SC45 named Emerald. Given that the machine has pretty much reached end-of-life (about four years for a government supercomputer), and that Compaq has since been acquired by HP, this machine is basically “orphaned”. But since it’s still operational, I have to keep my code running on it. For better or worse.
For those of you who have never had the “privilege” of working on a Compaq SC, they run Tru64 UNIX. It descends from the Open Software Foundation’s OSF/1 (it was previously sold as DEC OSF/1 and then Digital UNIX), and despite the HP connection it is a different system from HP-UX. I’m not sure how many of my problems come from that heritage and how many come from security and age-related issues, but it’s been an absolute nightmare. At first it wasn’t so bad. The GCC 3.4.0 compilers that are installed don’t work (IOT/Abort trap every time), and a lot of the utilities I’m used to were missing (top, vim, dir), but I was eventually able to get everything set up.
One of the things that really complicates my application is the number of dependencies it requires. Getting it running requires working installations of (in order) Mesa, VTK, Xdmf, HDF5, and Xmdf, and to make it even more complicated, some of those require CMake to build. You can imagine how difficult it is to keep all these changing codebases in working order. Now multiply that by the work it takes to keep everything running on our SGI Origin 3000 running IRIX, our Cray XT3 running a variant of Linux, and various other workstations running Red Hat Enterprise Linux. It’s made even harder by the fact that each of these systems has its own compilers and libraries optimized for its architecture. None of them are compatible with one another, and each has its own problems.
This week I’ve been fighting with Emerald and Compaq’s (now HP’s) Tru64 compilers. I had previously gotten all my code to compile just fine, so I thought this would be an easy task. In classic Murphy fashion, it was anything but. The problem was an interesting one, because everything compiled and linked perfectly, with no errors at all. I’d install all the dependencies, then compile and install my program without a hitch. Then I tried to run it:
```
emerald0> ./main
resolve_symbols: loader error: dlopen: libvtkVolumeRendering.so: symbol "__array_new2" unresolved
```

This led me on a two-day adventure into the internals of dynamically linked shared libraries and how they work. Specifically, how they don’t work. After two days of learning all about ldd and nm, and discovering that /usr/bin/cxx is actually a symlink to a driver script in /usr/ccs/lib/cmplrs/cxx/V7.1-006… I finally figured it all out. As shown above, the problem was this mysterious “__array_new2” function, which isn’t called anywhere in the source code I compiled (neither in my code nor in any of its dependencies). I figured it had to be something the compiler generates behind the scenes, probably tied to C++’s new operator. Try as I might, though, I couldn’t find where it was defined. With a tip from some colleagues, I finally found it in the libcxx.so library in the compiler’s directory. But why couldn’t the program find it?
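The symbol hunt itself can be scripted once you have a list of candidate libraries. The sketch below is an illustration, not what I actually ran: it uses Python’s ctypes, which wraps dlopen() and dlsym() on Unix-like systems, and the candidate paths are hypothetical placeholders. Running nm over each file answers the same question from the symbol table without loading anything.

```python
import ctypes
import ctypes.util
from typing import Iterable, List


def libraries_defining(symbol: str, candidates: Iterable[str]) -> List[str]:
    """Return the candidates through which the dynamic loader can resolve `symbol`.

    Each library is opened with dlopen() (via ctypes.CDLL) and the name is
    looked up with dlsym(). dlsym() on a handle also searches that library's
    own dependencies, so a hit means "resolvable through this library", not
    necessarily "defined in this file"; nm on the file gives the stricter answer.
    """
    found = []
    for path in candidates:
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            # Missing file, wrong architecture, or unresolved dependencies.
            continue
        try:
            lib[symbol]  # ctypes raises AttributeError when dlsym() fails
        except AttributeError:
            continue
        found.append(path)
    return found


if __name__ == "__main__":
    # Hypothetical paths: substitute every libcxx.so you can find on the system.
    candidates = ["/usr/shlib/libcxx.so", "/path/to/compiler/lib/libcxx.so"]
    print(libraries_defining("__array_new2", candidates))
```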
Well, I was somewhat surprised to find out that even though these libraries are “dynamically” linked, and the loader will find libraries in the directories listed in the LD_LIBRARY_PATH environment variable, an initial search path is evidently hard-coded into the program at link time. What was happening was that the program’s libcxx.so requirement was being satisfied by a different libcxx.so in another directory, one that didn’t have the __array_new2 function. Adding the compiler’s directory to my LD_LIBRARY_PATH wasn’t enough, because the loader always looked in the hard-coded “/usr/shlib” location first and found the old copy there. Finally, after manually adding the compiler’s directory to my compile commands and recompiling, I got it all to work.
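A toy model makes the failure mode easier to see. This is a minimal sketch that encodes the search order as observed on Emerald (directories recorded at link time, including /usr/shlib, are tried before LD_LIBRARY_PATH). That order is an assumption drawn from this experience, not the documented Tru64 loader algorithm, and the directory names in the demo are made up.

```python
import os
import tempfile
from typing import List, Optional, Sequence


def find_first(libname: str, search_dirs: Sequence[str]) -> Optional[str]:
    """Return the first file named `libname` along search_dirs, like a loader would."""
    for d in search_dirs:
        candidate = os.path.join(d, libname)
        if os.path.isfile(candidate):
            return candidate
    return None


def search_order(linked_in: List[str], ld_library_path: str) -> List[str]:
    """ASSUMED order, based on the behavior described in this post: directories
    recorded in the binary at link time come first, then LD_LIBRARY_PATH."""
    env_dirs = [d for d in ld_library_path.split(os.pathsep) if d]
    return list(linked_in) + env_dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for /usr/shlib and the compiler's library directory.
        shlib = os.path.join(root, "shlib")
        compiler = os.path.join(root, "compiler")
        for d in (shlib, compiler):
            os.makedirs(d)
            open(os.path.join(d, "libcxx.so"), "w").close()

        # LD_LIBRARY_PATH alone: the stale copy in "shlib" still wins.
        print(find_first("libcxx.so", search_order([shlib], compiler)))
        # Compiler directory added at link time: it is searched first.
        print(find_first("libcxx.so", search_order([compiler, shlib], "")))
```

The second case corresponds to the fix that finally worked here: putting the compiler’s directory on the link line so it is searched ahead of /usr/shlib.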
So what’s the point of this lengthy post? Honestly, I don’t know; I guess it’s just to whine and vent a bit. That one stupid library cost me (and a user who was waiting on me) two days of productive work. So far the only reasonable explanation is that someone upgraded the compiler but didn’t remove some of the old libraries, hence this old cxx library in an odd location. On the positive side, I learned a few things:
- Always make sure you’re linking against what you think you’re linking against.
- Always compile and install into your home directory first, before pushing anything out for public use. I broke a perfectly good working installation by installing too quickly.
- This also helped us fix another problem that’s been bugging us for a long time: how to build static libraries. We would always get hundreds of errors about libraries not having a table of contents. It turns out that the ar and ranlib in /usr/local/bin don’t work with the object files the Tru64 compilers produce, and you have to use the system ar and ranlib in /usr/bin instead.
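The ar/ranlib problem is the same lesson as the first one: know which copy you are actually getting. As an illustration (not part of the original build), this sketch lists every executable with a given name along PATH in lookup order, similar to `which -a` where that is available. The first entry is the one a bare `ar` or `ranlib` will run.

```python
import os
from typing import List


def all_on_path(tool: str, path: str) -> List[str]:
    """Return every executable named `tool` along `path`, in lookup order.

    The first entry is what a bare command name resolves to; any later
    entries are shadowed by it.
    """
    hits = []
    for d in path.split(os.pathsep):
        if not d:
            continue  # skip empty PATH components
        candidate = os.path.join(d, tool)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for tool in ("ar", "ranlib"):
        print(tool, all_on_path(tool, os.environ.get("PATH", "")))
```

If /usr/local/bin shows up first, one option is to put /usr/bin ahead of it in PATH for the build. For the CMake-based dependencies, the CMAKE_AR and CMAKE_RANLIB cache variables can point the build at /usr/bin/ar and /usr/bin/ranlib directly.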
So in short, it’s been a very frustrating couple of days. But at least I finally accomplished what I originally set out to do (upgrade my app on Emerald), and learned a few new things along the way. Guess you can’t ask for more… except for it not to take a week.