Storing data between runs

Sometimes there will be a need for your Python scripts to store information between runs. While configuration information is best kept in human readable files (like ini of yaml files) if it is data used internally then a better approach is needed. Unless the data to be stored is just plain text it is usually better long term to use a database for this. On first read, that last statement might seem overkill but a few years down the line when the script has stopped working or you need to extract the data stored within you will curse the decision to go with your own home-rolled file format.

We are only looking for ease of use here, not concurrent access or performance. With this in mind, SQLite is a very mature product and the DLL has been included in the Python standard library since v2.5 so there is nothing to install. It is powerful enough to be used as the data store for the Skype client. A good introduction to the SQLite module in Python (assuming you know a little SQL) can be found on Python Central.

Putting the snippets into a slight more complete example, we can create the cliched task list example with surprisingly little code. I’ve tried to choose the SQL statements to show off a few features you would probably want with your CRUD operations but this is not meant as a SQL tutorial.

import sqlite3

sqlcmds = ["CREATE TABLE IF NOT EXISTS persons (pid INTEGER PRIMARY KEY, givennames VARCHAR, surname VARCHAR);",
    "CREATE TABLE IF NOT EXISTS tasks (tid INTEGER PRIMARY KEY, pid INTEGER, name VARCHAR, percentcomplete INTEGER DEFAULT 0);",
    "INSERT INTO persons (pid,givennames,surname) VALUES (1,'Tom','Smith'),(2,'Dick','Jones'),(3,'Harry','Hat');",
    "INSERT INTO tasks (pid,name) VALUES (1,'Checktest'),(1,'Report Checktest'),(3,'Submit Report');",
    "UPDATE tasks SET percentcomplete = 25 WHERE name = 'Checktest';"]
sqlselect = """SELECT tasks.name, persons.givennames, persons.surname, tasks.percentcomplete
    FROM tasks inner join persons on tasks.pid = persons.pid"""

with sqlite3.connect(r"c:\path\to\sqlite.db") as dbconnection:
    cursor = dbconnection.cursor()
    for statement in sqlcmds:
        cursor.execute(statement)
        db.commit()
    cursor.execute(sqlselect)
    tasklist = ['"%s" with %s %s (%d%% complete)' % row for row in cursor]

print('\n'.join(tasklist))

Why is using SQLite an advantage? For starters you can easily view and execute arbitary SQL with the DB Browser for SQLite utility. When unexpected characters or a dodgy record breaks the code now you have a fighting chance at fixing it. Also SQLite doesn’t create large files, the example above is a mere 3Kb file.

An interesting take on this is y_serial, which basically pickles your Python object and stores it with a label in the SQLite table. I’ve not tested this, so make no recommendation for using it, but it is a novel way to store objects between runs without any SQL knowledge or installing any additional modules.

When I get chance to finish testing and uploading there will be a bigger example of this. I want to monitor the scheduled tasks for failures. There is a command line interface for this called schtasks which I can run using the code for running shell commands. As I am only interested in cases where the task has just failed I need to store the state on the last run and this is where SQLite will come in.

TBC

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s