Yeraze's Domain 3.0

Supercomputers, Programming, and Life in Mississippi

Entries for the ‘Source Code’ Category

Purging IE’s SSL Cache

Still working on Freezerburn 2.0, getting a lot of help from Kevin. We’ve integrated AJAX page refreshes so that the webpage updates every 60s or so on its own, without requiring a full page reload. It adds a lot to usability and makes it a lot friendlier to use, but with our work requirement of CAC [...]

Python’s ‘with’ statement – my new best friend

While working on FreezerBurn, I’ve had a lot of difficulty figuring out how to manage SQLite database connection objects.  The story so far has been:

  1. Write a “MasterDBConnection” function that either creates a connection, or returns the existing connection (maintained in a global variable).
  2. #1 didn’t work because I use multiple threads, so improve it to use threading.local()  [...]
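The excerpt stops before the ‘with’ part, so the following is only a guess at the general shape: a minimal sketch (written for Python 3) of a context manager that opens a SQLite connection, commits on success, rolls back on error, and always closes. The name db_connection and the table in the usage note are hypothetical, not taken from FreezerBurn.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_connection(path):
    """Hypothetical helper: open a connection for the duration of a 'with' block."""
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()      # reached only if the block finished without raising
    except Exception:
        conn.rollback()    # undo partial work, then let the error propagate
        raise
    finally:
        conn.close()       # always release the connection
```

Usage would look like `with db_connection("freezerburn.db") as conn: conn.execute(...)`. Note that a sqlite3 connection used directly in a ‘with’ statement commits or rolls back but does not close itself, which is why the sketch closes it explicitly.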

Python: SQLite & Multiple Threads

Still working on Freezerburn & SQLite, and I ran into an interesting quirk yesterday involving SQLite’s interoperation with threads.

My application uses 3 threads:

  1. The Main Application (not really in a thread, but separate from the other 2)
  2. The Web Server
  3. The Communications Server

Establishing a Database connection can take a little bit of time, so I wrote a simple wrapper function to check if a connection was already established and use that one instead.  That was where my problems began. [tag:python][tag:sqlite]
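For context, Python’s sqlite3 module by default refuses to let a connection be used from any thread other than the one that created it, so a single cached connection shared by all three threads will fail. A minimal sketch (written for Python 3) of a wrapper that caches one connection per thread with threading.local() might look like this; get_connection is a hypothetical name, and the excerpt does not show the author’s actual fix.

```python
import sqlite3
import threading

# Each thread sees its own independent attributes on this object.
_local = threading.local()


def get_connection(path):
    """Hypothetical wrapper: reuse this thread's connection, creating it on first use.

    Assumes a single database per process; 'path' is only used the first
    time a given thread asks for a connection.
    """
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn
```

Repeated calls from the same thread return the same object, while the web server and communications server threads would each get their own.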

 

SQLite Performance optimizations

For the last week or two I’ve been working on rewriting FreezerBurn to use a SQLite database instead of the scattered INI files it currently uses.  I’m hoping it will be faster, more reliable, and significantly reduce some of the complicated code I’m having to use to manage huge lists & dicts of Jobs, Nodes, Frames, and more.

In doing this, though, I’ve found a few interesting performance quirks of SQLite that I thought I would share. Specifically two things:

  • Transactions vs Immediate commits
  • executescript vs executemany

So come on inside and read up if you’re interested. [tag:python][tag:sqlite][tag:code][tag:optimization]
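The excerpt doesn’t include the measurements, so this sketch (written for Python 3) only shows what the two comparisons look like in sqlite3; it makes no claim about the author’s numbers. The frames table and all function names are hypothetical. As a general rule, committing after every row makes an on-disk database flush to disk each time, which is usually far slower than wrapping the whole batch in one transaction.

```python
import sqlite3


def make_db(path=":memory:"):
    """Create a throwaway table to insert into (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE frames (job_id INTEGER, frame INTEGER)")
    return conn


def insert_commit_each(conn, rows):
    """Immediate commits: one transaction per row."""
    for row in rows:
        conn.execute("INSERT INTO frames (job_id, frame) VALUES (?, ?)", row)
        conn.commit()


def insert_executemany(conn, rows):
    """One transaction for the batch, with bound parameters."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO frames (job_id, frame) VALUES (?, ?)", rows)


def insert_executescript(conn, rows):
    """One SQL script string; values are inlined, so only integers are accepted."""
    statements = ["INSERT INTO frames (job_id, frame) VALUES (%d, %d);" % (int(j), int(f))
                  for j, f in rows]
    conn.executescript("BEGIN;\n" + "\n".join(statements) + "\nCOMMIT;")


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM frames").fetchone()[0]
```

Timing each function with time.perf_counter() against a database file (not :memory:) is the simplest way to see the difference on a given machine. executemany keeps parameter binding, whereas executescript needs the values pasted into the SQL text, which is only safe for trusted, well-typed data.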

Webservers with Python – SSL & CAC Authentication

Working in the DOD, there are a few things you just come to accept.  Webservers require security (SSL), and SSL requires Common Access Card Authentication.  I had hoped that when I implemented the HTTP Monitor for Freezerburn, I could ignore all the security aspects and simply say that "It’s behind all the government firewalls" and "It requires an account on the fileserver".  I wasn’t so lucky.

After a lot of research, I finally came up with a way to make it work that required very little variation from Python’s built-in HTTPServer object.  Starting with the ASPN Cookbook recipe for SecureHTTPServer, I came up with the following:

import socket, os
from SocketServer import BaseServer
from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
from SocketServer import ThreadingMixIn
from OpenSSL import SSL
import sys

class SecureHTTPServer(HTTPServer):
    def __init__(self, server_address, HandlerClass,
                dodcerts, serverkey, servercert):
        BaseServer.__init__(self, server_address, HandlerClass)
        #  Based on online Documentation, the v23 actually enables TLS1 as well.
        ctx = SSL.Context(SSL.SSLv23_METHOD)
        #ctx = SSL.Context(SSL.TLSv1_METHOD)

        print "Loading Private Key from %s" % serverkey
        ctx.use_privatekey_file (serverkey)
        print "Loading Certificate from %s" % servercert
        ctx.use_certificate_file(servercert)
        print "Loading DOD Certifications from %s" % dodcerts
        ctx.set_verify_depth(2)
        ctx.load_client_ca(dodcerts)
        ctx.load_verify_locations(dodcerts)

        print "Creating SSL socket"
        callback = lambda conn,cert,errno,depth,retcode: retcode
        ctx.set_verify( SSL.VERIFY_FAIL_IF_NO_PEER_CERT | SSL.VERIFY_PEER, callback)
        ctx.set_session_id('Freezerburn')
        self.socket = SSL.Connection(ctx, socket.socket(self.address_family,
                                                        self.socket_type))
        self.server_bind()
        self.server_activate()

class SecureHTTPRequestHandler(SimpleHTTPRequestHandler):
    def setup(self):
        self.connection = self.request
        self.rfile = socket._fileobject(self.request, "rb", self.rbufsize)
        self.wfile = socket._fileobject(self.request, "wb", self.wbufsize)

This creates a new object called the "SecureHTTPServer" that acts just like the regular HTTPServer, except it allows you to specify the location of the DOD Root Certificates, the Server Private Key, and the Server SSL Certificate.  The only real difference from the ASPN one is that it turns on Client Certificate Verification, which is the core of the CAC Authentication scheme.  With that little snippet of code, SSL & CAC were enabled in one fell swoop!
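For comparison, the same idea can be sketched with the standard library’s ssl module on Python 3 instead of pyOpenSSL. This is not the author’s code: the function names are hypothetical, and the three file arguments simply mirror the servercert, serverkey, and dodcerts parameters above. Setting verify_mode to ssl.CERT_REQUIRED on a server-side context is what makes the handshake fail unless the client presents a certificate that chains to the loaded CA file.

```python
import ssl
from http.server import HTTPServer, SimpleHTTPRequestHandler


def require_client_cert(ctx):
    """Reject any client that does not present a verifiable certificate."""
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def make_context(servercert, serverkey, dodcerts):
    """Hypothetical helper: server-side TLS context with client verification on."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=servercert, keyfile=serverkey)  # server identity
    ctx.load_verify_locations(cafile=dodcerts)                   # CAs trusted for clients
    return require_client_cert(ctx)


def make_secure_server(address, ctx, handler=SimpleHTTPRequestHandler):
    """Wrap a plain HTTPServer's listening socket with the TLS context."""
    httpd = HTTPServer(address, handler)
    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
    return httpd
```

A server would then be started with something like `make_secure_server(("", 8443), make_context("server.crt", "server.key", "dod_roots.pem")).serve_forever()`, where the port and file names are placeholders.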
[tag:ssl][tag:python][tag:cac]

Webservers with Python – Caching

One thing I decided very early on with FreezerBurn was the need for a Web-Based monitoring tool.  I didn’t want to have to deploy a Python tool with a full user-interface to every desktop and then deal with all the Security implications of that.  With a web interface, I could simply let everyone use Firefox or IE or Safari and load it however they wanted.  At least, that was the initial idea.

Very quickly I came across Python’s HTTPServer class, which handles a good 90% of what I needed.  After some experiments, though, I found that it tended to die under heavy loads (e.g., loading my page with external JS, CSS, and Images).  I needed it to be fairly reliable, and thus discovered a lot more than I ever intended to about HTTP Caching.

HTTP implements caching 2 ways: ETags & "Last-Modified" dates.  ETags are opaque version tags (often checksums): the Browser says "The version I have has this tag", and the server checks it and replies "That’s right" or "That’s old, here’s a new one".  In my case, I chose to instead implement the "Last-Modified" version, where I could simply check the last modified date on the files.  The resulting code looks like this:

import datetime
import mimetypes
import os
import urlparse
from stat import ST_MTIME, ST_SIZE

class MonitorServer(HTTPRequestHandler):
    def do_GET(self):
        global jobQueue
        global servers
        global mQueue
        global initTime
        # process self.path
        (scm, netloc, path, params, query, fragment) = urlparse.urlparse(self.path, 'http')
        if scm != 'http' or fragment:
            self.send_error(400, "bad url %s" % self.path)
            return
        print 'HTTP: Serving %s' % path
        # Write to self.wfile
        if path == "/":
            # Raw Index Page
            self.wfile.write(""" blah blah blabh """)
        elif path == "/job":
            pass  # blah blah blah
        else:
            # Serve files from the system; keep only the last path component
            # so a request cannot reach outside the monitor directory
            path = path.split('/')[-1]
            fullpath = constants.MonitorPath + '\\' + path
            if os.path.exists(fullpath):
                (ctype, enc) = mimetypes.guess_type(path)
                info = os.stat(fullpath)
                # Use UTC so the value matches the "GMT" label sent in Last-Modified
                lastmod = datetime.datetime.utcfromtimestamp(info[ST_MTIME])
                if self.headers.get('If-None-Match', ''):
                    self.send_response(304)
                    self.end_headers()
                    return
                if self.headers.get('If-Modified-Since', ''):
                    dt = self.headers.get('If-Modified-Since').split(';')[0]
                    try:
                        modsince = datetime.datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z")
                        if modsince >= lastmod:
                            print "HTTP: No new version of %s" % path
                            self.send_response(304)
                            self.end_headers()
                            return
                    except ValueError:
                        # Unparseable date: fall through and send the full file
                        pass

                self.send_response(200)
                self.send_header('Cache-Control', 'max-age=864000')
                self.send_header('Expires', "Sat, 30 Jan 2010 12:00:00 GMT")
                self.send_header('Content-Length', info[ST_SIZE])
                self.send_header('Last-Modified', lastmod.strftime("%a, %d %b %Y %H:%M:%S GMT"))
                self.send_header('Content-Type', ctype or 'application/octet-stream')
                self.end_headers()
                self.copyfile(open(fullpath, 'rb'), self.wfile)
            else:
                print "MONITOR: Can't find %s" % fullpath
                self.send_error(404, "Unknown url %s" % self.path)

      

So, this piece of code defines a "MonitorServer" class and its "GET" handler.  In there it parses the requested URL, and if it’s in one of a few selected forms it sends the dynamically generated content.  Otherwise, it sends the requested file directly to the User.  As a security precaution, I don’t allow the user to specify a path (the path is stripped and the file must exist within 1 specific directory).  If the browser provides an "If-None-Match" header, then I return a 304 code, which indicates the Cache is up-to-date.  If they provide an "If-Modified-Since", then I parse the date and compare it against the file, and return a 304 if appropriate.  One interesting thing I learned from this is that the Browser doesn’t actually return the date of the file, but rather returns the "Last-Modified" you send it, in identical formatting.
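The "If-Modified-Since" comparison can be isolated into two small helpers. This is a sketch written for Python 3 using the standard email.utils date functions, not the code from the post; http_date and not_modified are hypothetical names. Both sides are handled in UTC and at whole-second resolution, since HTTP dates carry no fractional seconds.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def http_date(mtime):
    """Format a file's mtime (seconds since the epoch) as an HTTP date in GMT."""
    return formatdate(mtime, usegmt=True)


def not_modified(if_modified_since, mtime):
    """Return True if the browser's cached copy is at least as new as the file."""
    if not if_modified_since:
        return False
    try:
        # Some old browsers append "; length=..." after the date
        since = parsedate_to_datetime(if_modified_since.split(";")[0])
    except (TypeError, ValueError):
        return False  # unparseable header: just send the file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    lastmod = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return since >= lastmod
```

A handler would send `http_date(os.stat(path).st_mtime)` as "Last-Modified" and answer 304 whenever `not_modified(headers.get("If-Modified-Since"), mtime)` is true.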

So far, this works great!  It significantly improved the response of the server and has greatly reduced load-times (even though it’s on a local network).
[tag:python][tag:webserver][tag:caching]

Rookie Mistakes: Overmodularization & "Black-Box" design

A friend of mine has been roped into doing some web-design for a project with some friends of his from school.  It’s a relatively simple project, and they’re paying him decent money (not good, but decent).  Over the last few weeks, though, he’s slowly started to realize just how big of a mess this project is.  It seems to be some students that are doing most of the coding, and they’re doing it all in ASP and C-sharp, while they’re not particularly proficient with either.  The fact that they held up development for 2 days while they tried to get Visual Studio to work should indicate the caliber of development we’re talking here.

Today he was telling me some of the horrors he’s seen in the code (I’m still trying to convince him to submit some to WorseThanFailure.com), and it started to dawn on me what’s really going on here.  It’s a classic problem that pretty much every programmer falls into in his first big project: Overmodularization leading to "Black-Box" designs.

Let me explain.  All through school & training, we’re continuously told to "Design for Code-Reuse" and that "Modular design is good", and they’re right.  A piece of code that you’re going to use over and over again needs to go in a function, rather than suffer the ravages of cut-n-paste.  But, it does have a limit.  Nobody tells you that part.  I’ve fallen into this trap myself; everyone has at least once.  Here’s an example (from my friend’s project):  I’m going to use a table-less design, all divs, in this webpage.  So rather than scattering the div tags all through the code, I’ll write a "writeDiv" function that accepts a string and automatically puts the div tags around it.  Sounds good, right?

It’s a trivial example, but it’s a classic starting point for the type of failure we’re talking about.  You start with a basic "writeDiv", and scatter it all through your code.  Then you realize that not all divs are the same, some need CSS class specifiers.  What should happen is that you would modify the original writeDiv to accept a class argument, but we think it’ll be more efficient if we have "writeTitleBarDiv" and "writeBorderedBoxDiv" functions instead.  After all, we’re just gonna be passing the same argument to all those anyway, right?  This goes on and on until you’ve achieved maximum modularity, but absolute zero code reuse.  The entire application is encapsulated in functions that are each called a single time because they’re too specialized to be used more than once.
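To make the contrast concrete, here is a small sketch of the "what should happen" version. The friend’s project is ASP and C#, so this Python rendering and the names in it are purely illustrative: one function with an optional class argument covers every case that the specialized variants would each handle once.

```python
from html import escape


def write_div(content, css_class=None):
    """Wrap content in a div, optionally with a CSS class (hypothetical helper)."""
    if css_class:
        return '<div class="%s">%s</div>' % (escape(css_class, quote=True), content)
    return "<div>%s</div>" % content


# The same function serves what would otherwise be writeTitleBarDiv,
# writeBorderedBoxDiv, and so on.
title = write_div("Job Status", css_class="title-bar")
box = write_div("3 jobs running", css_class="bordered-box")
plain = write_div("Last updated a minute ago")
```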

That alone is bad enough, and a perfect example of Over-modularization. But, for a lot of new programmers it doesn’t stop there.  (I’ve done this myself, I’m ashamed).  Since the resulting system that you’ve designed is so incredibly "modular", it’s a bit difficult to work with.  So, we’ll just encapsulate it as an engine that you can feed a short input to and get your desired output.  This is exceptionally prevalent in web-page systems.  Rather than writing HTML, you write some small script that somehow describes the page, and then the parser engine takes over and generates the actual page.  This could be a text file full of strange Wiki-like symbols, or it could be an ASP & C# script that uses no C# functions, just calls hundreds of methods of the "ParserEngine" object.  However, because of how the system is designed, writing this weird input language is easily several orders of magnitude more complicated than writing the raw HTML would be.  Not to mention, when the existing "Engine" fails in some way, good luck trying to debug it.

Such systems are typical of new "rookie" programmers, and there’s no telling how many of these poorly designed systems wind up becoming production tools.  Experienced programmers can see these things coming and make changes to avoid them.  It takes experience to know when to modularize and when to just write it there.  Be it in DSP microcode or web-page design, it takes experience.  Modular design is good from a code-reuse and maintainability perspective, but it does bring overhead in increased function calls and push’ing/pop’ing stack arguments.  Where to put the line is something that only comes with experience.

So, how many of these "Black-Boxes" have you built in your day? How many programmers did it take to "fix" it?
[tag:badcode][tag:programming][tag:modular]

vPIP & MediaWiki

Note: Please don’t cut-n-paste the following code.  If you want this plugin, then download the necessary files as a Zip File from This Page.

A friend of mine told me yesterday about vPIP.  It’s a collection of JavaScript code to allow for "Videos Playing in Place".  Basically, you can present an image on your page that, when people click on it, suddenly springs to life to play a movie.  It’s similar to the technology behind YouTube and all those other embedded video sites.  The Videos take a lot of bandwidth and client-side resources, and there’s no reason to send them to the user until they explicitly ask for them.  It makes the browsing experience faster, while keeping it pretty clean and transparent to the user.

After a bit of research, vPIP seemed like a great thing to put on the DAAC wiki.  We have a fair number (an increasing number too) of videos on the website in our Wiki, and right now we show an image and a link to download the movie to the user.  It works, but it’s a bit clunky, especially since MediaWiki makes it so friggin difficult to have an Image as a Link to something else.  I first attempted to make a vPIP template that would generate the necessary HTML & JavaScript code, but MediaWiki kept removing the additional tags, breaking it.  It’s a safety feature, I understand, and one that I couldn’t find any way around (which is probably a good thing). 

So my next attempt was to create a Plugin for MediaWiki.  I already had one or two installed, and was able to use one of them as a template.  Continue inside for the details….
[tag:mediawiki][tag:vpip][tag:plugin]

Process Limiting in Unix with Semaphores

One common trick in the Wide World of Windows (Also known as the WWW, not to be confused with the World Wide Web or Weasley’s Wizard Wheezes) is to prevent the user from running more than 1 instance of an application.  Try it sometime, open up something like Microsoft Word.  Then, try to open it again and you’ll find that it doesn’t work, it simply refocuses the previous session.  It’s a pretty useful trick and Windows gives you lots of ways to make this happen.

However, on a real multi-user operating system like Linux or Unix, such behavior is shunned.  You expect multiple copies of everything to be open at once since you may have multiple users running the program simultaneously.  Every now and then, however, it’s advantageous to get this similar behavior.  I ran into a case-in-point this week.  A user was running ezViz, my pride and joy, on one of the supercomputers.  He submitted several jobs, each one running several instances of ezViz serially.  Unfortunately, they all wound up on the same node, and he wound up running 8 instances simultaneously, each one wanting 3.5G on a 16G machine.  It wasn’t pretty.

After thinking about it for a while, I thought it might be helpful to give the user a way to specify a maximum number of simultaneous runs.  Runs beyond that number would simply wait their turn.  First attempts were to implement something like ‘ps‘ to check the running processes and see how many were running.  Not only is this difficult, but there are a lot of race conditions and such from multiple versions trying to check simultaneously.

I did some research and a friend of mine suggested using a shared memory block with shmget to have each process ‘register’ itself in the shared space.  Each process could register and know how many other processes were going, and then decide whether to wait or continue.  While that’s definitely an option, a similar but far better method is Semaphores.  Come on inside for more..
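The full post is not included here, so the following is only an illustration of the idea, not the ezViz implementation. Unrelated processes on Unix would normally share a System V or POSIX semaphore; this sketch (Python 3, Unix only) stands in for that with a multiprocessing semaphore shared between child processes. MAX_RUNS, run_job, and run_all are hypothetical names. The semaphore starts at the allowed number of simultaneous runs, each run takes one unit before starting and gives it back when done, and any run that finds the count at zero simply blocks until a slot frees up.

```python
import multiprocessing as mp
import time

MAX_RUNS = 2  # hypothetical limit on simultaneous runs


def run_job(sem, active, peak, seconds=0.05):
    """Stand-in for one ezViz run: wait for a slot, work, release the slot."""
    with sem:  # blocks while MAX_RUNS jobs already hold the semaphore
        with active.get_lock():
            active.value += 1
            peak.value = max(peak.value, active.value)  # record highest concurrency seen
        time.sleep(seconds)  # pretend to render something
        with active.get_lock():
            active.value -= 1


def run_all(n_jobs=6, max_runs=MAX_RUNS):
    """Start n_jobs processes and return the peak number that ran at once."""
    ctx = mp.get_context("fork")          # assumption: a Unix system with fork()
    sem = ctx.BoundedSemaphore(max_runs)  # counting semaphore shared by all children
    active = ctx.Value("i", 0)
    peak = ctx.Value("i", 0)
    procs = [ctx.Process(target=run_job, args=(sem, active, peak))
             for _ in range(n_jobs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return peak.value
```

Because the kernel performs the decrement-or-wait step atomically, there is no window in which two processes can both decide a slot is free, which is the race that checking ‘ps‘ output cannot avoid.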
[tag:linux][tag:unix][tag:source][tag:semaphore]

SWFC & Flash Animations

Last week, at work, they decided that we need another site redesign for the DAAC Website.  The new frontpage is significantly smaller than before, so one idea early on was to incorporate a Flash Menu to scroll between different items.  That way we increase exposure of various things, without dedicating any more screen space.  It was a good idea, and pretty quickly we decided on something like the widget at www.steampowered.com and gamespot.com.  By "we" I mean that the group decided I needed to make it :)   So I broke down the requirements like so:

  1. It should smoothly transition between 5 images, about once every 3 seconds.
  2. Buttons across the bottom that indicate which of the 5 images is shown
  3. When they click on a button, go directly to that image and disable the automatic transition
  4. Each image should be accompanied by a small text Description
  5. Clicking on the Image should go to a URL

Seemed simple enough, right?  Unfortunately, I’ve never done anything in Flash before, and we don’t have the Adobe Flash CS2 package in the office.  Being the proponent of Open-Source that I am (and having a deadline of "On the Website" within 4 weeks), I hit the net to see what I could find.  It wasn’t long before I stumbled upon the SWFTools.
[tag:flash][tag:swftools][tag:swfc]