Tcl vs Python
Today I spent some time rewriting a lengthy Tcl script of mine in Python. Why, you may ask? Well, the script is a bit unwieldy in Tcl, and I could see where the additional structure of Python might help clean it up. The script is a simple log parser that analyzes log files and generates pretty HTML documents with the results. Performance was starting to become a problem because the Tcl version used a lot of memory: it loaded the entire contents of the files into memory before analyzing them. I really needed to rework the script to analyze the files line by line, which would be a major refactoring of the code, so I figured I’d try Python. (It’s not really a rewrite; I just needed to interleave the Read and Parse portions instead of having them in two separate loops.)
It took me about 2 hours to do it. The starting Tcl script is 566 lines and the resulting Python script is 456 lines, a net saving of 110 lines of code. But that’s not particularly important. I verified that the Tcl and Python versions generated identical HTML, and then set out to do some basic benchmarks. For starters, here are some simple benchmarks of processing all the logfiles I have right now:
| Tcl | Python |
| 49.3s | 27.63s |
| Tcl Version |
| if {![info exists users($user)]} { |
|     set users($user) 0 |
| } |
| incr users($user) |
| Python Version |
| users[user] = users.get(user,0) + 1 |
Not only does it go from 4 interpreted lines to a single line, but it also removes a branching statement (or at least buries it in the C implementation of the dictionary’s “get” method). That little code snippet appears about 15 times inside my program, usually within a loop. I was also able to compress a lot of code like this:
| Tcl Version |
| if {[string range [lindex $entry 1] 0 1] == "P:"} { set p [string range [lindex $entry 1] 2 end] } |
| Python Version |
| if entry[1][0:2] == "P:": p = entry[1][2:] |
This doesn’t result in any net reduction in code size, but it significantly reduces the amount of parsing required (if I recall correctly, every set of []s in Tcl is a command substitution that the interpreter has to parse and evaluate as a nested script). Python’s slicing, I think, really helped the performance of most of my script, since it’s fundamentally one big string processor.
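To make the slicing behaviour concrete, here is a minimal sketch of the "P:" check wrapped in a function. The function name and the sample entries are made up for illustration; only the `entry[1][0:2]` and `entry[1][2:]` slices come from the snippet above.

```python
def extract_p_field(entry):
    """Return the text after a leading "P:" in entry[1], or None if absent."""
    field = entry[1]
    # Slicing never raises IndexError, even when the field is shorter
    # than two characters, so no length check is needed first.
    if field[0:2] == "P:":
        return field[2:]
    return None


# Hypothetical entries; the real log format is not shown in this post.
print(extract_p_field(["alice", "P:printer-3"]))
print(extract_p_field(["bob", "X"]))
```

`field.startswith("P:")` would express the same test and is arguably easier to read; the slice form is kept here because it mirrors the original one-liner.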
Another piece of code I’m proud of:
| Tcl Version |
| set line [string range $line [expr {$pos + 6}] end] |
| set parts [split $line "/"] |
| set cleanparts "" |
| foreach p $parts { lappend cleanparts [string trim $p] } |
| Python Version |
| newline = line[pos+6:] |
| entry = [x.strip() for x in newline.split('/')] |
That little snippet splits a line into a list (using the ‘/’ character as a separator) and then strips all the leading and trailing whitespace off each element. That’s the first part of parsing my logfiles.
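As a rough sketch of how the pieces might fit together, the block below reads one line at a time, applies the split-and-strip step, and counts users with `users.get`. It rests on several assumptions: the 6-character marker `ENTRY:`, the idea that the user name is the first field, and the sample lines are all invented here, since the real log format isn't shown.

```python
import io

MARKER = "ENTRY:"  # hypothetical 6-character marker, standing in for "pos + 6"


def parse_line(line):
    """Split the text after MARKER on '/' and strip each field."""
    pos = line.find(MARKER)
    if pos == -1:
        return None  # not a line we care about
    rest = line[pos + len(MARKER):]
    return [part.strip() for part in rest.split("/")]


def count_users(lines):
    """Count entries per user while reading one line at a time."""
    users = {}
    for line in lines:  # a file object yields lines lazily, so memory stays flat
        entry = parse_line(line)
        if not entry or not entry[0]:
            continue
        user = entry[0]  # assumption: the first field is the user name
        users[user] = users.get(user, 0) + 1
    return users


# Invented sample data; an open file object could be passed in the same way.
sample = io.StringIO(
    "12:00 ENTRY: alice / P:printer-3 / ok\n"
    "12:01 ENTRY: bob / P:printer-1 / ok\n"
    "noise line without a marker\n"
    "12:02 ENTRY: alice / P:printer-3 / failed\n"
)
print(count_users(sample))
```

Because `count_users` only iterates over whatever it is given, reading and parsing happen in the same loop and no more than one line needs to be held in memory at a time.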
Another thing that helped a lot was Python’s triple-quoted strings. Since I’m generating HTML and JavaScript output, I have a lot of square brackets, curly brackets, quotes, and more. Tcl makes these a bit difficult to work with, requiring me to manually escape all of these characters. With Python’s triple quotes I was able to condense groups of 30-40 “puts” calls down to a single “write”.
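For illustration, a triple-quoted block of HTML and JavaScript can go out in a single `write` call, with no escaping of brackets, braces, or double quotes. The markup and the `toggle` function below are invented for this sketch and are not taken from the actual script.

```python
import io

# Brackets, braces, and double quotes all appear literally; nothing is escaped.
PAGE_HEAD = """<html>
<head>
<script type="text/javascript">
function toggle(id) {
    var rows = document.getElementsByTagName("tr");
    if (rows[id]) { rows[id].style.display = "none"; }
}
</script>
</head>
<body>
"""


def write_head(out):
    """Emit the whole page header with one write call."""
    out.write(PAGE_HEAD)


buf = io.StringIO()  # stands in for the real output file
write_head(buf)
print(buf.getvalue())
```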
So I’m happy. The new Python script is easier to read, easier to maintain, and faster to run. I’ve also been able to add a few new features, mostly safety stuff using try/except blocks around most of the I/O. I really wonder what kind of improvement such a change (Tcl to Python) would make on the code I used to write for Z-Kat, err, Mako.
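The try/except safety net around the I/O might look something like the sketch below. The helper name `read_lines_safely` and the choice to skip unreadable files are assumptions made for this example, not a description of the actual script.

```python
import os
import sys
import tempfile


def read_lines_safely(path):
    """Yield lines from path; report an unreadable file and yield nothing."""
    try:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                yield line
    except OSError as err:
        # Missing file, permission problem, and similar: warn and move on.
        sys.stderr.write("skipping %s: %s\n" % (path, err))


# Demo with a throwaway file so the example is self-contained.
fd, demo_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "w") as handle:
    handle.write("first line\nsecond line\n")

good = list(read_lines_safely(demo_path))
os.remove(demo_path)
missing = list(read_lines_safely(demo_path))  # the file is gone now
print(len(good), len(missing))
```

Catching `OSError` rather than a bare `except` keeps genuine bugs in the parsing code visible while still letting one bad logfile pass without stopping the whole run.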

