For the last week or two I’ve been working on rewriting FreezerBurn to use SQLite database instead of the scattered INI files it currently uses.  I’m hoping it will be faster, more reliable, and significantly reduce some of the complicated code I’m having to use to manage huge lists & dict’s of Jobs, Nodes, Frames, and more.

In doing this, tho, I’ve found a few interesting performance quirks of SQLite that I thought I would share. Specifically two things:

  • Transactions vs Immediate commits
  • executescript vs executemany

So come on inside and read up if you’re interested.

In my project, one of the main relationships is the “Jobs” table and the “Tasks” table.  Each Job can contain several Tasks, and the tasks are individually assigned.  Whenever a new job is created, I need to initialize the appropriate Tasks records, which means creation of several hundred database rows.  For my test cases, I’m using the “Long Shot” case of 5000 rows.

My current query looks something like this:

cur.executemany(“insert into tasks (job_id, task_id) values (%i,?)” % jobid, map((lambda x:(x,)), range(1,TaskCount)))

Which inserts “TaskCount” (5000 in this example) tasks into the database.  But I was worried about performance.  This will happen often, and other stuff is happening simultaneously.  I didn’t want to lockup the system while this is going on, and I didn’t need to lock the database for an extended period or it would interfere with the other threads. So, how long does this take?

Initially, I disabled Transactions (establishing the connection with isolation_level=None) because I thought immediate commits would be safer & use less overhead.  Inserting the 5000 records took 177 seconds, nearly 3 minutes.  This was unacceptable.

In researching optimizations, I found that many people recommended turning on the transactions to improve performance, contrary to my beliefs.  I removed the “isolation_level=None” line, and added a “connection.commit()” right after the executemany command.  The resulting code took 0.39 seconds.

I repeated these tests 4 times to try and reduce the effects of system load, caching, and other effects.  The result:

Isolation_level Run1 Run2 Run3 Run4
=None 177s 80s 76s 76s
default 0.39s 0.43s 0.40s 0.35s

Note: The Tasks table as a Primary Unique Key of (job_id, task_id) so I incremented the job_id betwen each run.

It’s a time savings almost impossible to Fathom.  From 72s to 0.35s, that’s over 2 orders of magnitude of savings.  Moral of the story: Transactions are your friend.

The other part of managing the “Tasks” table is that I need to be able to quickly update the entries within, since they contain Log & Status information that comes in every 20s or so.  Also, in certain cases I need to be able to mass-update ALL of the tasks entries in 1 shot, fast.

Unfortunately, I can’t use a single naive “update” command since each record will be updated uniquely.  My first thought was to build a giant string containing all of the update commands, separated by semicolons, and execute it with “executescript“. As a test script, I used the following code:

GiantQuery = “”
for N in range(1,5000):
GiantQuery = GiantQuery + “UPDATE tasks set status=1,logmessage=’Test Update message for frame %i’ where job_id=2 and task_id=%i;” % (N,N)

cur.executescript(GiantQuery)

Simple enough.  I ran this, and it took 139 seconds.  Again, unacceptable.

While I didn’t think it would make much difference, I thought I should try instead using the “executemany” function.  This changed my code like so:

GiantQuery = []
for N in range(1,5000):
GiantQuery.append((1,’Text update for %i’ % N, N))

cur.executemany(“UPDATE tasks SET status=?,logmessage=? where job_id=2 and task_id=?”, GiantQuery)

This query, much to my surprise, executes in 0.46 seconds.

Again, to eliminate external effects, I executed these several times:

method Run1 Run2 Run3 Run4
executescript 139s 91 125 137
executemany 0.46s 0.48 0.45 0.40

Why is the executemany so much faster?  The results are identical, and they’re executing identical commands.  My guess is in how the SQLite parse examines these strings.  SQLite internally compiles SQL commands into a form of machine-code called VDBE that’s optimized and much faster to parse.  The first method has to parse 5000 individual commands, but the second method may be smart enough to only parse the command once, and then switch out the bound parameters for each update.

So there you have it.. Two non-intuitive (to me at least) methods of significantly speeding up SQLite queries.  Maybe someone else out there can find them as useful as I have.