Yesterday I had my first.. well, disappointment in Python. Let me explain.

I have some data from a researcher that is written in a very simple ASCII format. On each line I have X,Y,Z particle location, U,V,W particle velocity, and a few other paramters. They are all space separated, and I just needed to load them into a VTK file so I could look at them with other code. The datasets, however, are pretty large (20 million lines in the file = 20 million particles). The easiest thing to do seemed to be to use my new VTK Python code and see how it went.

Without too much trouble I threw together some quick Python code to do the following

  1. Initialize & Open the file
  2. Read the line & split it into the parts
  3. Add a “vtkVertex” to the “vtkPolyData” at the desired location.
  4. Convert the UVW Velocity to its cylindrical coordinate Theta & R
  5. Save the Theta & R
  6. Repeat 2-5 for every line in the file
  7. Combine them all into a single vtkPolyData
  8. Write the results to Disk in a Legacy VTK file.

I had the code written and debugged in about 10 minutes. Then I started to notice something.
Note: This story is broken into multiple pages because of it’s length.
[tag:python][tag:vtk][tag:performance][tag:comparison][tag:programming]

The code looked like this:

from vtk import *

points = vtkPoints()
points.Allocate(100000,4096)

grid = vtkPolyData()
grid.Allocate(100000,4096)

cell = vtkVertex()
pointList = vtkIdList()

Vtheta = vtkFloatArray()
Vtheta.SetName(“Vtheta”)

Vr = vtkFloatArray()
Vr.SetName(“Vr”)

num = 0
print “Reading file…”
breakpoint = 2**14
for line in open(r’part_all/Mpart_all.53132′):
    num += 1
    if num % breakpoint == 0:
        print ” %s..” % num,

    parts = line.split()
    (x,y,z) = map(float, parts[0:3])
    pointList.Reset()
    pointList.InsertNextId(points.InsertNextPoint(x,y,z))

    grid.InsertNextCell(cell.GetCellType(), pointList)
    (u,v,w) = map(float, parts[3:6])
    ir = 1.0 / ((x*x + y*y)**(1/2))
    cos = x * ir;
    sin = y * ir;
    Vr.InsertNextValue(u*cos + v*sin)
    Vtheta.InsertNextValue(-u*sin + v*cos)

print ‘Compositing…’
grid.SetPoints(points)
grid.GetPointData().AddArray(Vr)
grid.GetPointData().AddArray(Vtheta)

print ‘Writing VTK file’
writer = vtkDataSetWriter()
writer.SetFileName(‘particles.vtk’)
writer.SetFileTypeToBinary()
writer.SetInput(grid)
writer.Write()

Not too complicated. It ran like a champ, and generated perfect results. The problem: It took 45 minutes. That’s just too long, even for VTK. 20Million points isn’t really all that much, especially when I’m not really doing any computation on them. Something didn’t seem right. So for a comparison, I converted the code to C.
[page_break]
The C Code looked like this:

#include <vtkPoints.h>
#include <vtkVertex.h>
#include <vtkFloatArray.h>
#include <vtkPolyData.h>
#include <vtkPointData.h>
#include <vtkIdList.h>
#include <vtkDataSetWriter.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    vtkPoints *points = vtkPoints::New();
    points->Allocate(100000,4096);

    vtkPolyData *grid = vtkPolyData::New();
    grid->Allocate(100000,4096);

    vtkVertex *cell = vtkVertex::New();
    vtkIdList *pointList = vtkIdList::New();

    vtkFloatArray *Vtheta = vtkFloatArray::New();
    Vtheta->SetName(“Vtheta”);

    vtkFloatArray *Vr = vtkFloatArray::New();
    Vr->SetName(“Vr”);

    int num = 0;;
    printf(“Reading file “%s”…n”, argv[1]);;
    int breakpoint = 1 << 14;;

    FILE *fptr = fopen(argv[1], “r”);;
    float x, y, z, u, v, w;;
    float cos, sin, ir;
    char line[1024];;
    while(!feof(fptr)) {
        fgets(line, 1024, fptr);;
        num += 1;
        if ((num % breakpoint) == 0) {
            printf(“t%i…n”, num);;
        }

        sscanf(line, “%f %f %f %f %f %f”, &x, &y, &z, &u, &v, &w);;
        pointList->Reset();
        pointList->InsertNextId(points->InsertNextPoint(x,y,z));

        grid->InsertNextCell(cell->GetCellType(), pointList);
        ir = 1.0 / sqrt(x*x + y*y);
        cos = x * ir;;
        sin = y * ir;;
        Vr->InsertNextValue(u*cos + v*sin);
       Vtheta->InsertNextValue(-u*sin + v*cos);
    }

    printf(“Compositing…n”);;
    grid->SetPoints(points) ;
    grid->GetPointData()->AddArray(Vr);
    grid->GetPointData()->AddArray(Vtheta);

    printf(“Writing VTK file “%s”n”, argv[2]);;
    vtkDataSetWriter *writer = vtkDataSetWriter::New();
    writer->SetFileName(argv[2]);
    writer->SetFileTypeToBinary();
    writer->SetInput(grid);
    writer->Write();
}

It’s pretty obvious that this code is longer, but from closer inspection you can see that’s it’s just C-specific stuff (Include headers, variable declarations, etc). Because I already had the Python code, I was able to start with that and some global Search-and-replace’s and be most of the way there. I had to rewrite the file I/O and parsing (split vs sscanf) but that wasn’t too bad.

The real surprise: It runs in 6 minutes.
[page_break]

So, the conclusion here….

  RunTime Memory Usage Code Size
Python 45m 985M 50 lines
C 6m 985M 65 lines, compiles to 236Kb

The two codes are virtually identical.  They perform the exact same tasks and output the exact same results.  But the C code runs in 7x faster.  Why?  Well, I can think of a few reasons.

  1. How Python Handles Numbers.  Python seems to handle alot of numeric data as strings internally (or something like strings, byte arrays basically) so that it can manage the Long Int type and the other various other intricacies.  I think this makes the slight bit of computation I had to do in the loop significantly slower. 
  2. Generic vs Specific functions Also, Python has no square-root function meaning I have to replace it with a power-of-1-half.  This change from a specific routine (square root) to a generic interface (raise to the 1/2 power) I think also contributes to a significant slowdown.
  3. API Translation.  Even though there is a Python Wrapping for VTK, I think it internally adds an extra layer of function calls to every VTK function you use.  This seems minor, but since I’m using 6 VTK functions inside the loop, the extra overhead from pushing and popping stack and function parameters could become a problem.  Especially given that these must be translated from Python “objects” to C pointers.
  4. Compiled vs Interpreted  Of course this is an issue with all Scripting languages.  C is compiled to machine-code while Python is run-time interpreted.

I think #2 and #3 were the major culprits in my experienced performance hits.  If python had a build-in square-root function then that might alleviate some of my problems, but I think it would still remain in some form.  Personally, I still find the Python code useful for several reasons:

  1. No compile step – Not having to fight with a makefile (a CMake file actually) saves some startup time.
  2. Quick to develop – Being able to develop with all of C’s semicolons, curly braces, and forced variable declaration makes code quick to develop, albeit a bit sloppy at times.
  3. Easy to “tweak” – This is a major pain in C code.  Every tweak means a few syntax errors usually, and a few compile & link stages.  In Python it was generally as simple as “save and run”.

Plus, once I had the Python code working fairly well it was easy to convert it to a C program to benefit from the improved performance.  Also, the Python code will be alot more portable since it will run with any properly configured “vtkpython”, rather than having to be recompiled and relinked which will require a new Makefile on every platform.

It was an interesting experiment and I wanted to share the results.  I’m going to keep using Python for some stuff here, but now I’ve got some concrete evidence of the performance gains from the extra time spent converting to C.