Yeraze's Domain 3.0

Supercomputers, Programming, and Life in Mississippi

Entries for the ‘Source Code’ Category

Process Limiting in Unix with Semaphores

One common trick in the Wide World of Windows (Also known as the WWW, not to be confused with the World Wide Web or Weasley’s Wizard Wheezes) is to prevent the user from running more than 1 instance of an application.  Try it sometime, open up something like Microsoft Word.  Then, try to open it again and you’ll find that it doesn’t work, it simply refocuses the previous session.  It’s a pretty useful trick and Windows gives you lots of ways to make this happen.

However, on a real multi-user operating system like Linux or Unix, such behavior is shunned.  You expect multiple copies of everything to be open at once since you may have multiple users running the program simultaneously.  Every now and then, however, it’s advantageous to get this similar behavior.  I ran into a case-in-point this week.  A user was running ezViz, my pride and joy, on one of the supercomputers.  He submitted several jobs, each one running several instances of ezViz serially.  Unfortunately, they all wound up on the same node, and he wound up running 8 versions simultaneously, each one wanted 3.5G of a 16G machine.  It wasn’t pretty. 

After thinking about it for a while, I thought it might be helpful to give the user a way to specify a maximum number of simultaneous runs.  Runs beyond that number would simply wait their turn.  First attempts were to implement something like ‘ps‘ to check the running processes and see how many were running.  Not only is this difficult, but there are alot of race conditions and such from multiple versions trying to check simultaneously. 

I did some research and a friend of mine suggested using a shared memory block with shmget to have each process ‘register’ itself in the shared space.  Each process could register and know how many other processes were going, and then decide whether to wait or continue.  While that’s definately an option, a similar but far better method is Semaphores.  Come on inside for more..
[tag:linux][tag:unix][tag:source][tag:semaphore]

SWFC & Flash Animations

Last week, at work, they decided that we need another site redesign for the DAAC Website.  The new frontpage is significantly smaller than before, so one idea early one was to incorporate a Flash Menu to scroll between different items.  That way we increase exposure of various things, without dedicating any more screen space.  It was a good idea, and pretty quickly we decided on something like the widget at www.steampowered.com and gamespot.com.  By "we" I mean that the group decided I needed to make it :)   So I broke down the requirements like so:

  1. It should smoothly transition between 5 images, about once every 3 seconds.
  2. Buttons across the bottom that indicate which of the 5 images is shown
  3. When they click on a button, go directly to that image and disable the automatic transition
  4. Each image should be accompanied by a small text Description
  5. Clicking on the Image should go to a URL

Seemed simple enough, right?  Unfortunately, I’ve never done anything in Flash before, and we don’t have the Adobe Flash CS2 package in the office.  Being the proponent of Open-Source that I am (and having a deadline of "On the Website" within 4 weeks), I hit the net to see what I could find.  It wasn’t long before I stumbled upon the SWFTools.
[tag:flash][tag:swftools][tag:swfc]

C Loop Optimization: Duff’s Device

Yesterday I stumbled across an interesting article on a code snippet known as "Duff’s Device".  It’s a clever little C Macro to partially unroll a loop, offering pretty significant performance improvements in certain scenarios.  The code is relatively simple, consisting of the single define shown here:

#define DUFF_DEVICE_8(aCount, aAction) \
{ \
  int count_ = (aCount); \
  int times_ = (count_ + 7) >> 3; \
  switch (count_ & 7){ \
    case 0: do { aAction; \
      case 7: aAction; \
      case 6: aAction; \
      case 5: aAction; \
      case 4: aAction; \
      case 3: aAction; \
      case 2: aAction; \
      case 1: aAction; \
    } while (–times_ > 0); \
  } \
}

The code itself is a little strange, placing a Do/While loop inside a Switch is something that most programmers have never seen before (Myself included).  Upon closer inspection tho, there are several little optimizations in here aside from the main one:

  1. The loop iterator –times_ is a pre-decrement.  This is a minor detail, but pre-decrement (and increment) are faster than post-decrement (and increment).  Post requires the compiler to create a duplicate of the variable and copy the updated value back at the end, while a pre-operation can be done in-place immediately. 
  2. Use of Shifts instead of Division at the beginning.  This is a huge improvement when working with integers and powers of 2 (>>3 is equivalent to division by 8).
  3. Comparison against 0.  This is a special-case in most hardware that is faster to evaluate than any other comparison.

These three optimizations are often done by the compiler for you, so specifying them won’t usually do much for you.  But the entire "Duff’s Device" is somewhat mysterious.  How does it work? Click inside for details…
[tag:c][tag:source][tag:optimization][tag:duffdevice]

TCL vs Python

Today I spent some time rewriting a lengthy Tcl script of mine in Python.  Why, you may ask?  Well, the script is a bit unwieldy in Tcl and I could see where the additional structure of Python may help to clean it up.  The script is a simple log parser to analyze log file and generate pretty HTML documents with the results.  Performance was starting to become a problem as the Tcl version was using alot of memory by loading the entire contents of the files into memory before analyzing them.  I really needed to rework the script to analyze them line-by-line, which would be a major refactoring of the code so I figured I’ld try Python (It’s not really a rewrite, I’ld just need to interlace the Read & Parse portions instead of having them in 2 separate loops).

It took me about 2 hours to do it.  The starting TCL script is 566 lines and the resulting Python script is 456 lines, a net saving of about 100 lines of code.  But that’s not particularly important.  I verified that the Tcl & Python versions generated identical HTML, and then set out doing some basic benchmarks.  For starters, here’s some simple benchmarks of processing all the logfiles I have right now:

Tcl Python
49.3s 27.63s

So that’s almost 50% time saved.. ButI had expected an improvement but that’s more than I ever hoped.  I had expected Python to be more optimized and efficient than TCL, but was there any other reasons why this might be such an improvement?  Click inside for some of my thoughts…[tag:tcl][tag:python][tag:code][tag:programming]

InvSqrt

Now this is simply an amazing little snippet of math here. This function performs a 1/sqrt(x) operation in a fraction of the time, while maintaining accuracy. This operation is performed alot in graphics, and seeing it distilled down to such a simple and fast function… Well, it’s amazing.

It’s basically a Newton-Raphson approximation that exploits the X86 memory to perform some fancy floating-point match in Integer registers (eg. much much faster), with a very clever first approximation (0x5f3759df ). Very impressive stuff.

Found this via Beyond3d.
Update: Just curious of the accuracy of this snippet, so I coded it up so see what happened.. Click through for the results…

Update 12/20: I added some more results. Now I have some Timing Figures.  Also, a friend just sent me a link to This PDF Paper by Chris Lomont describing the “Magic number” and how the entire process works.  It’s an interesting read, but gonna be a while before I can fully understand it.

[tag:algorithm][tag:sqrt][tag:snippet]

PNG Transparency in Internet Explorer

It’s a long-held belief that the folks at Microsoft really don’t have the slightest idea what people want from a Browser.  Issues like CSS incompatibility, javascript incompatibilities, DOM problems, and PNG behavior have been problems for as long as I can remember.  Microsoft has repeatedly said that “people don’t care about that stuff” and gone on to develop other features that people don’t want like VBScript and ActiveX.  Even IE7 suffers from most of those problems, but is getting better bit by bit.

In my day-to-day use, the one that most commonly bites me is the PNG Transparency problem.  Like GIF, PNG supports a transparency channel so that you can have regions of an image “see-through”.  Unlike GIF, however, PNG supports a full 8-bit transparency so that you can semi-transparent regions rather than just binary transparency.  This lets you do alot of really nice effects, as well as use Antialiasing and feathering to make your images smoother.  Unfortunately, IE doesn’t support this and always places PNG’s on a solid-color background.  This makes them pretty much useless for web use unless you’re designing exclusively for FireFox, Opera and Safari.

In some work this week, tho, I found an incredible little snippet of code that actually fixes PNG Transparency in IE. That’s right, it fixes it.  It seems that in IE5.5 they added a (custom, of course) stylesheet tag for “behaviors” where you can write code to change how things behave.  I think it was meant to allow a means of embed-ing code (via “htc” files) into CSS, so you can have code attached to CSS tags instead of HTML tags.  It’s a neat idea but I’ve never seen it in use, until today.  On the WebFX Website they have an example for PNG Behavior that re-enables the expected transparency with some clever code.  It’s a fantastic hack, just drop the blank.gif and pngbehavior.htc on the server and 1 line of CSS and viola, PNG’s work on that page.

It’s as simple as adding the following to every page:

<style type="text/css">img {   behavior: url("pngbehavior.htc");}</style>

I’m using it on the stuff I’m developing now, and it makes things so much easier. 
[tag:explorer][tag:ie][tag:png][tag:code][tag:html][tag:browser]

Python: The Right Way and the Wrong Way

As you’ve probably noticed by now, I’ve been trying really hard to learn Python and work it into my daily coding rituals.  It’s a popular language with alot of potential, but I’ve unfortunately ran into some problems with it.  Most of my issues have been related to the poor execution speed.  I expected, since Python is interpreted, that it would be slightly slower than native C code but not terribly so.  Sadly, I’ve seen performance hits of 3x-10x of Python code over C code (using VTK).

It’s been a bit disheartening.  It’s so quick to develop that I’ve got a nice collection of little scripts to work with VTK.  Since VTK is all C, the performance penalty is negligible.  But start to do anything that’s pure python and suddenly the performance plummets to simply unworkable levels.  Just yesterday I was presented with another one of these, but thanks to Viraj I learned a valuable lesson:  While there may be more than 1 way to skin a cat, there is 1 way that is far superior to the others.  Unfortunately, it’s the most obscure way as well.

Read on for the details…
[tag:python][tag:sourcecode][tag:vtk][tag:performance]

Python with a Vengeance

Yesterday I had my first.. well, disappointment in Python. Let me explain.

I have some data from a researcher that is written in a very simple ASCII format. On each line I have X,Y,Z particle location, U,V,W particle velocity, and a few other paramters. They are all space separated, and I just needed to load them into a VTK file so I could look at them with other code. The datasets, however, are pretty large (20 million lines in the file = 20 million particles). The easiest thing to do seemed to be to use my new VTK Python code and see how it went.

Without too much trouble I threw together some quick Python code to do the following

  1. Initialize & Open the file
  2. Read the line & split it into the parts
  3. Add a “vtkVertex” to the “vtkPolyData” at the desired location.
  4. Convert the UVW Velocity to its cylindrical coordinate Theta & R
  5. Save the Theta & R
  6. Repeat 2-5 for every line in the file
  7. Combine them all into a single vtkPolyData
  8. Write the results to Disk in a Legacy VTK file.

I had the code written and debugged in about 10 minutes. Then I started to notice something.
Note: This story is broken into multiple pages because of it’s length.
[tag:python][tag:vtk][tag:performance][tag:comparison][tag:programming]

Python: First Impressions

Well, I’ve finished the first half of O’Reilley’s Learning Python book and I’m impressed. The language is clean and not too hard to learn, especially given my experience with similar languages like Tcl/Tk. There are however a few things that intrigue and concern me:

  1. No Magic.Python seems really big on the “No Magic” idea, meaning that if the result of an operation is ambiguous then it simply doesn’t do it. This was particularly frustrating for me with TclTk as it was frequently difficult to determine if a string would be considered a Number (for addition) or a string (for Concatenation). This most frequently was a problem when you were trying to do string processing operations on numbers, you wound up with a string like “042″ that it started interpreting as Octal 42. That Python warns you & forces you to typecast is somewhat nice.
  2. Immutable vs Mutable: In C you can spend alot of time getting the hang of “Pass by Reference” vs “Pass by Value”. In Python it seems you spend alot of time trying to get the hang of Mutable vs Immutable. The reference is similar when it comes to function arguments, but this distinction effects the code at all levels. From what I see, it’s mostly a theoretical problem that in practice doesn’t really happen much. I mean, honestly, how often have you ever had to modify a number “in place”. I’m betting never. “2″ is “2″, and you probably want it to stay that way. Where I figure I’ll get burned is with “tuples”, the immutable form of a list.
  3. Generators/Iterators: This is just a neat gizmo here. Having functions that “yield” a value instead of “return” seems a bit awkward, but I can already see the huge benefits you can reap from this.

And, of course, there’s plenty more. I’m just getting started. VTK has Python Bindings (I believe the new Paraview 3.0 is Python-based even) so I’ve spent the last few days trying to get VTK+Python setup on my Windows Workstation at the office and on our SGI Irix system. It’s proven to be difficult, partly because of the wierd architectures and partly because of VTK, but it’s done. So I present to you, my first functional VTK+Python script.
 [tag:python][tag:source][tag:vtk][tag:script][tag:software][tag:programming]

Swapping Extents

Working in graphics, one thing you find yourself doing quite often is taking two points and then computing a set of extents. Usually I do this with 3 if statements (one for each axis) to ensure I get the Min & Max swapped if necessary.

Well, I was watching the VTK CVS tree today and found one clever little code change like so:

int swapXBounds = (spacing[0] < 0); // 1 if true, 0 if false
int swapYBounds = (spacing[1] < 0); // 1 if true, 0 if false
int swapZBounds = (spacing[2] < 0); // 1 if true, 0 if false
this->Bounds[0] = origin[0] + (extent[0+swapXBounds] * spacing[0]);
this->Bounds[2] = origin[1] + (extent[2+swapYBounds] * spacing[1]);
this->Bounds[4] = origin[2] + (extent[4+swapZBounds] * spacing[2]);
this->Bounds[1] = origin[0] + (extent[1-swapXBounds] * spacing[0]);
this->Bounds[3] = origin[1] + (extent[3-swapYBounds] * spacing[1]);
this->Bounds[5] = origin[2] + (extent[5-swapZBounds] * spacing[2]);

This accomplishes something very similar but with alot less code & branching. So a simple extents checker could be written like so:

int swapPoints = (range[0] > range[1]);
min = range[0 + swapPoints];
max = range[1 - swapPoints];

I just thought this was a really neat way to avoid all the branching If statements that typically accompany a piece of code like this.
[tag:sourcecode][tag:graphics][tag:programming]