Webservers with Python - Caching

Wednesday, March 19 2008 @ 03:32 PM CDT

Contributed by: Yeraze

One thing I decided very early on with FreezerBurn was the need for a web-based monitoring tool.  I didn't want to have to deploy a Python tool with a full user interface to every desktop and then deal with all the security implications of that.  With a web interface, I could simply let everyone use Firefox or IE or Safari and load it however they wanted.  At least, that was the initial idea.

Very quickly I came across Python's HTTPServer class, which handles a good 90% of what I needed.  After some experiments, though, I found that it tended to die under heavy loads (e.g., loading my page with external JS, CSS, and images).  I needed it to be fairly reliable, and thus discovered a lot more than I ever intended to about HTTP caching.

HTTP lets a browser revalidate its cache in two ways: ETags and "Last-Modified" dates.  ETags are opaque version identifiers (typically checksums): the browser says "The version I have has this ETag", and the server checks it and answers "That's right" or "That's old, here's a new one".  In my case, I chose instead to implement the "Last-Modified" version, where I could simply check the last-modified date on the files.  The resulting code looks like this:

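import os
import datetime
import mimetypes
import urlparse
from stat import ST_MTIME, ST_SIZE
# Assumption: copyfile() below is a SimpleHTTPRequestHandler method, so that is used as the base class
from SimpleHTTPServer import SimpleHTTPRequestHandler
import constants  # project module (not shown) that defines MonitorPath

class MonitorServer(SimpleHTTPRequestHandler):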
    def do_GET(self):
        global jobQueue
        global servers
        global mQueue
        global initTime
        # process self.path
        (scm, netloc, path, params, query, fragment) = urlparse.urlparse(self.path, 'http')
        if scm != 'http' or fragment:
            self.send_error(400, "bad url %s" % self.path)
            return
        print 'HTTP: Serving %s' % path
        # Write to self.wfile
        if path == "/":
            # Raw Index Page
            self.wfile.write(""" blah blah blabh """)
        elif path == "/job":
            blah blah blah
        else:
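            # Serve files from the system; keep only the final path component
            # (backslashes count too, since the directory is joined with '\\' below)
            path = path.replace('\\', '/').split('/')[-1]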
            if os.path.exists(constants.MonitorPath + '\\' + path):
                (ctype, enc) = mimetypes.guess_type(path)
                info = os.stat(constants.MonitorPath + '\\' + path)
                # Use UTC here, because the Last-Modified header is labeled GMT
                lastmod = datetime.datetime.utcfromtimestamp(info[ST_MTIME])
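                if self.headers.get('If-None-Match',''):
                    # This handler never sends an ETag, so any If-None-Match is answered with 304
                    self.send_response(304)
                    self.end_headers()
                    return
                if self.headers.get('If-Modified-Since',''):
                    # Some browsers append extras such as "; length=1234" to the date
                    dt = self.headers.get('If-Modified-Since').split(';')[0]
                    try:
                        modsince = datetime.datetime.strptime(dt, "%a, %d %b %Y %H:%M:%S %Z")
                        if modsince >= lastmod:
                            print "HTTP: No new version of %s" % path
                            self.send_response(304)
                            self.end_headers()
                            return
                    except ValueError:
                        # Unparseable date: fall through and send the full file
                        pass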

                self.send_response(200)
                self.send_header('Cache-Control', 'max-age=864000')
                self.send_header('Expires', "Fri, 30 Jan 2010 12:00:00 GMT")
                self.send_header('Content-Length', info[ST_SIZE])
                self.send_header('Last-Modified', lastmod.strftime("%a, %d %b %Y %H:%M:%S GMT"))
                # guess_type() returns None for unknown extensions
                self.send_header('Content-Type', ctype or 'application/octet-stream')
                self.end_headers()
                self.copyfile(open(constants.MonitorPath + '\\' + path, 'rb'), self.wfile)
            else:
                print "MONITOR:Can't find %s" % (constants.MonitorPath + '\\' + path)
                self.send_error(404, "Unknown url %s" % self.path)

      
So, this piece of code defines a "MonitorServer" request handler class and its "do_GET" method.  In there it parses the requested URL, and if the path is one of a few selected forms it sends the dynamically generated content.  Otherwise, it sends the requested file directly to the user.  As a security precaution, I don't allow the user to specify a path (the directory part is stripped and the file must exist within one specific directory).  If the browser provides an "If-None-Match" header, then I return a 304 code, which indicates the cache is up to date.  If it provides an "If-Modified-Since" header, then I parse the date and compare it against the file, and return a 304 if appropriate.  One interesting thing I learned from this is that the browser doesn't actually return the date of the file, but rather returns the "Last-Modified" value you sent it, in identical formatting.
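
To make that date round trip concrete, here is a minimal sketch of the same comparison.  It is written for Python 3 (the listing above is Python 2), and the helper names http_date and not_modified are made up for illustration; they are not part of the server above.  It leans on the standard library's email.utils to format and parse the dates instead of hand-written strftime/strptime patterns.

```python
import datetime
from email.utils import formatdate, parsedate_to_datetime


def http_date(mtime):
    """Format a POSIX timestamp as an HTTP date, always in GMT."""
    return formatdate(mtime, usegmt=True)


def not_modified(if_modified_since, mtime):
    """Return True if a 304 can be sent for a file with this mtime."""
    # Some browsers append extras such as "; length=1234" to the date.
    value = if_modified_since.split(';')[0].strip()
    try:
        since = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to sending the full response.
        return False
    if since.tzinfo is None:
        # Treat a date without zone information as UTC.
        since = since.replace(tzinfo=datetime.timezone.utc)
    # HTTP dates have one-second resolution, so drop fractional seconds.
    last_mod = datetime.datetime.fromtimestamp(int(mtime), datetime.timezone.utc)
    return since >= last_mod


if __name__ == "__main__":
    header = http_date(1000)              # what the server would send
    print(not_modified(header, 1000.5))   # browser echoes it back unchanged
    print(not_modified(header, 2000))     # file changed after that date
```

Formatting with usegmt=True keeps the header in GMT no matter what the machine's local timezone is, which matters because the browser simply echoes that string back and the server has to compare it against the file time on the same clock.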

So far, this works great!  It significantly improved the responsiveness of the server and has greatly reduced load times (even though it's on a local network).
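
For comparison, the ETag approach mentioned earlier could look roughly like the sketch below.  This is only an illustration in Python 3, not code from the server: make_etag and etag_matches are hypothetical helper names, hashing the whole body with SHA-1 is just one possible way to build a tag, and the simple comma split assumes tags that contain no commas themselves.

```python
import hashlib


def make_etag(data):
    """Build an ETag (a quoted string) from the response body bytes."""
    return '"%s"' % hashlib.sha1(data).hexdigest()


def etag_matches(if_none_match, etag):
    """Return True if the If-None-Match header value lists this ETag."""
    if if_none_match.strip() == '*':
        # "*" matches any existing representation of the resource.
        return True
    candidates = [c.strip() for c in if_none_match.split(',')]
    # If-None-Match uses weak comparison, so ignore a leading "W/" prefix.
    candidates = [c[2:] if c.startswith('W/') else c for c in candidates]
    return etag in candidates


if __name__ == "__main__":
    tag = make_etag(b"body { color: red; }")
    # The server would send tag in an "ETag" header with the 200 response;
    # the browser sends it back later in "If-None-Match".
    print(etag_matches(tag, tag))                            # unchanged file
    print(etag_matches(make_etag(b"something else"), tag))   # stale copy
```

The trade-off is that a checksum means reading the file on every request (or caching the tag somewhere), whereas the Last-Modified check only needs a stat call.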



Yeraze's Domain
http://www.yeraze.com/article.php/webservers_with_python_caching