Today I spent some time rewriting a lengthy Tcl script of mine in Python.  Why, you may ask?  Well, the script had become a bit unwieldy in Tcl, and I could see where the additional structure of Python might help to clean it up.  The script is a simple log parser that analyzes log files and generates pretty HTML documents with the results.  Performance was starting to become a problem because the Tcl version used a lot of memory: it loaded the entire contents of the files into memory before analyzing them.  I really needed to rework the script to analyze the files line by line, which would be a major refactoring of the code, so I figured I’d try Python.  (It’s not really a rewrite; I’d just need to interleave the read and parse portions instead of having them in two separate loops.)

It took me about 2 hours to do it.  The starting Tcl script is 566 lines and the resulting Python script is 456 lines, a net saving of 110 lines of code.  But that’s not particularly important.  I verified that the Tcl & Python versions generated identical HTML, and then set out to do some basic benchmarks.  For starters, here are some simple benchmarks of processing all the logfiles I have right now:

Tcl Python
49.3s 27.63s

So that’s about 44% of the run time saved.  I had expected an improvement, but that’s more than I ever hoped for.  I had expected Python to be more optimized and efficient than Tcl, but were there any other reasons why this might be such an improvement?  Here are some of my thoughts.

I attribute it mostly to the improved efficiency of the following snippet:
TCL Version
        if {![info exists users($user)]} {
            set users($user) 0
        }
        incr users($user)
Python Version
        users[user] = users.get(user,0) + 1
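
For anyone who wants to try the counting idiom in isolation, here is a minimal, self-contained sketch.  The `count_users` helper and the sample names are hypothetical and only there for illustration; they are not taken from the actual script.

```python
from collections import Counter


def count_users(names):
    """Count how often each name occurs, using the dict.get idiom."""
    users = {}
    for user in names:
        # get() returns 0 for a user not seen yet, so no explicit "if" is needed
        users[user] = users.get(user, 0) + 1
    return users


sample = ["alice", "bob", "alice", "carol", "alice"]  # hypothetical data
counts = count_users(sample)
print(counts)

# collections.Counter from the standard library does the same bookkeeping
assert counts == dict(Counter(sample))
```

If the counts are all that is needed, `collections.Counter` is probably the shorter route; the `get` form is handy when the dictionary is updated alongside other per-line work.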

Not only does it go from 4 interpreted lines to a single line, but it also removes a branching statement (or at least buries it in the C implementation of the dictionary’s “get” method).  That little code snippet appears about 15 times inside my program, usually within a loop.  I was also able to compress a lot of code like this:

TCL Version
if {[string range [lindex $entry 1] 0 1] == "P:"} {
    set p [string range [lindex $entry 1] 2 end]
}
Python Version
if entry[1][0:2] == "P:":
    p = entry[1][2:]

This doesn’t result in any net reduction in code size, but it significantly reduces the amount of parsing required (if I recall, pretty much every set of []s in Tcl is a command substitution that has to be parsed and evaluated as its own nested command).  I think Python’s slicing really helped the performance of most of my script, since it’s fundamentally a big string-processing script.
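
As a small runnable sketch of the slicing check above: the `extract_p` helper and the sample entry are hypothetical, and `str.startswith` is shown only as an equivalent spelling of the same test.

```python
def extract_p(field):
    """Return the text after a leading 'P:' marker, or None if it is absent."""
    # Slicing never raises on short strings: "" [0:2] is just ""
    if field[0:2] == "P:":
        return field[2:]
    return None


entry = ["10:15", "P:checkout", "ok"]  # hypothetical parsed log entry
print(extract_p(entry[1]))

# startswith() performs the same prefix test as the slice comparison
assert entry[1].startswith("P:") == (entry[1][0:2] == "P:")
```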

Another piece of code I’m proud of:

TCL Version
set line [string range $line [expr {$pos + 6}] end]
set parts [split $line "/"]
set cleanparts {}
foreach p $parts {
    lappend cleanparts [string trim $p]
}
Python Version
newline = line[pos+6:]
entry = [part.strip() for part in newline.split('/')]

That little snippet splits a line into a list (using the ‘/’ character as a separator) and then strips all the leading & trailing spaces off each element.  That’s the first part of parsing my logfiles. 
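
Here is a self-contained sketch of that split-and-strip step.  The `parse_fields` helper, the `ENTRY` marker, and the sample line are all made up for illustration; the real log format is not shown in this post.  One thing worth knowing: in Python 3, `map()` returns a lazy iterator rather than a list, so a list comprehension (or wrapping the call in `list()`) is needed if the result will be indexed later.

```python
def parse_fields(line, pos):
    """Drop everything before pos + 6, split on '/', and strip each field."""
    rest = line[pos + 6:]
    return [part.strip() for part in rest.split("/")]


# Hypothetical log line; "ENTRY " is a made-up 6-character marker
line = "10:15 ENTRY alice / P:checkout /  ok "
pos = line.find("ENTRY")
entry = parse_fields(line, pos)
print(entry)
```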

Another thing that helped a lot was Python’s triple-quoted strings.  Since I’m generating HTML & JavaScript output, I have a lot of square brackets, curly brackets, quotes, and more.  Tcl makes these a bit difficult to work with, requiring me to manually escape all of these characters.  With Python’s triple quotes I was able to condense groups of 30-40 “puts” calls down to a single “write”.
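
To give a feel for it, here is a minimal sketch of a triple-quoted template written out with a single `write`.  The template contents, the `write_page` helper, and the sample values are hypothetical, not the HTML my script actually produces.  It assumes old-style `%` formatting, which leaves brackets, braces, and quotes alone; only a literal `%` would need doubling.

```python
import io

# Brackets, braces, and double quotes need no escaping inside triple quotes
PAGE_TEMPLATE = """<html>
<head>
<script type="text/javascript">
  var rows = [%(rows)s];
  function show(i) { alert("row " + rows[i]); }
</script>
</head>
<body><h1>%(title)s</h1></body>
</html>
"""


def write_page(out, title, rows):
    """Fill in the template and emit it with a single write() call."""
    values = {"title": title, "rows": ", ".join(str(r) for r in rows)}
    out.write(PAGE_TEMPLATE % values)


buf = io.StringIO()  # stands in for the real output file
write_page(buf, "Log summary", [3, 1, 4])
print(buf.getvalue())
```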

So I’m happy.  The new Python script is easier to read, easier to maintain, and faster to run.  I’ve also been able to add a few new features, mostly safety stuff using try/except blocks around most of the I/O.  I really wonder what kind of improvement such a change (Tcl to Python) would make on the code I used to write for Z-Kat, err Mako..
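
For reference, the line-by-line reading and the try/except safety net can be combined roughly like this.  This is only a sketch: the `count_lines_matching` helper, the marker, and the file name are hypothetical, and the real script does much more per line.

```python
def count_lines_matching(path, marker):
    """Stream a log file line by line instead of loading it whole."""
    matches = 0
    try:
        with open(path, "r", errors="replace") as handle:
            # Iterating over the file object yields one line at a time,
            # so memory use stays flat no matter how big the log is
            for line in handle:
                if marker in line:
                    matches += 1
    except OSError as exc:
        # A missing or unreadable file is reported and skipped, not fatal
        print("skipping %s: %s" % (path, exc))
        return None
    return matches


print(count_lines_matching("no-such-file.log", "P:"))
```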