Python: The Right Way and the Wrong Way
As you’ve probably noticed by now, I’ve been trying really hard to learn Python and work it into my daily coding rituals. It’s a popular language with alot of potential, but I’ve unfortunately ran into some problems with it. Most of my issues have been related to the poor execution speed. I expected, since Python is interpreted, that it would be slightly slower than native C code but not terribly so. Sadly, I’ve seen performance hits of 3x-10x of Python code over C code (using VTK).
It’s been a bit disheartening. It’s so quick to develop that I’ve got a nice collection of little scripts to work with VTK. Since VTK is all C, the performance penalty is negligible. But start to do anything that’s pure python and suddenly the performance plummets to simply unworkable levels. Just yesterday I was presented with another one of these, but thanks to Viraj I learned a valuable lesson: While there may be more than 1 way to skin a cat, there is 1 way that is far superior to the others. Unfortunately, it’s the most obscure way as well.
Read on for the details…
[tag:python][tag:sourcecode][tag:vtk][tag:performance]
For the first attempt, I used a List of Lists. I started with a list of 200 entries, each entry being an empty list. These empty lists would represent my bins and I would simply append the new values into the lists. The relevant code looked something like this:
| count = []*200 for i in xrange(0, NewC.GetNumberOfTuples()) : if (i % 2**20 == 0) : print “t.. %s” % i point = data.GetPoint(i) val = NewC.GetValue(i) id = int((val – min) / step) if (id < 0) : id = 0 if (id > bins-1): id = bins count[id].append(” %s %s %s” % (point[0] * 100.0, point[1] * 100.0, point[2] * 100.0) |
(There is other code before this to Load the VTK Dataset, and other code after this to write the results to disk)
So to describe the code briefly:
- I first construct my list of 200 empty lists.
- For every “tuple” (point) in the data
- Compute “id” from the NewC “val”
- Constrain it within (0,200)
- Append the XYZ location to the appropriate list
And it worked. Kinda. From the debugging I did (on smaller datasets), it worked. But when I ran it on the full dataset, I finally had to kill it after 65Minutes. VTK Can load the dataset in about 1G of ram, but during this stage the vtkpython process ballooned to 5G (before I killed it).
Surely python can’t be this slow, can it? Well, I didn’t know honestly. That was when I got Viraj involved. We discussed the problem in detail and Viraj was a confused as I was. I could write the results straight to disk faster than I could store them in memory, except I couldn’t maintain 200 simultaneous open file handles. Not to mention that was just a really nasty hack. Viraj suggested that I try using a Dict of Lists, instead of a List of Lists. Neither of us really thought much would change but we were curious. A quick experiment revealed no improvement.
After some digging around, Viraj sent me a link to this google groups post on comp.lang.python . With the information there, I rewrote the code like so:
| count = [] sizes = [] for i in xrange(0,bins): count.append(cStringIO.StringIO()) sizes.append(0) for i in xrange(0, NewC.GetNumberOfTuples()) : |
Basically, I now maintain a list of “cStringIO” objects and append to those. The code is functionally the same, I wind up with 200 strings that are my bins.
Interestingly, tho, this code runs in 30 minutes flat. It also runs in only 3G of Ram, half of the previous version. I could do better in C, but it would be a real mess of linked lists and text formatting stuff (for the later portions). This is a good workable solution.
So the moral of the story: There is a right way and a wrong way to everything. Not that the wrong way is “broken” necessarily, but the right way is far superior. This little exercise has reaffirmed my faith in python and given me another reason to keep studying it. Hopefully, this little article will prove equally useful to someone else.

