Month: June 2015

Unzip a file in memory

The zipfile module is fairly flexible, but there are occasions when you cannot pass it a filename (as a string) or a file-like object; for example, the open method on AWS S3 buckets does not return a suitable object. What do you do if you can read the zip file into memory? Writing it to disk just to read it back in again seems a waste.

Python, as is often the case, already has a module to solve this problem, in this case StringIO. This lets you treat a string, or in this case the entire file in memory, as if it were a file. (StringIO is a Python 2 module; in Python 3 the equivalent for binary data is io.BytesIO.)

This allows us to write our unzip procedure compactly as

# module imports and S3 connection omitted for brevity (and beyond scope)
s3file = s3connection.get_bucket(bucketname).get_key(filename)
if s3file:
    s3file.open()
    zf = s3file.read()  # read the whole zip file into memory
    s3file.close()
    # wrap the in-memory data so zipfile can treat it as a file
    archive = zipfile.ZipFile(StringIO.StringIO(zf))
    archive.extractall()
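
For anyone on Python 3, here is a minimal sketch of the same idea using io.BytesIO. It is not the S3 code above: the helper names unzip_bytes and make_sample_zip are made up for illustration, and the sample archive is built in memory to stand in for the data you would read from S3.

```python
import io
import zipfile


def unzip_bytes(data, target_dir=None):
    """Extract a zip archive held in memory as bytes and return the member names.

    target_dir=None extracts into the current working directory.
    """
    # BytesIO wraps the bytes in a seekable file-like object, which zipfile accepts.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(path=target_dir)
        return archive.namelist()


def make_sample_zip():
    """Build a tiny archive in memory (hypothetical stand-in for the S3 download)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("hello.txt", "hello from memory")
    return buffer.getvalue()
```

Calling unzip_bytes(make_sample_zip(), "some_directory") writes hello.txt into that directory without the archive itself ever touching the disk.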

Visual Studio and Python

While IDLE is good enough to get you started with Python, if you are going beyond basic scripting a full IDE is more productive. There are lots of IDEs available that support Python and the choice is often a matter of personal taste, but an old-timer, Visual Studio, has recently become Python friendly.

While cut-down versions of Visual Studio have been available as the Express editions for some time, Microsoft released a Community edition starting with Visual Studio 2013. This is the full professional version, aimed at open source projects, academic research, training, education and small professional teams. There is now a Visual Studio 2015 Community edition available for those wanting the latest and greatest.

The Python Tools for VS (v2.1 for 2013 and v2.2 for 2015) have also matured to the point where Python finally feels like a first-class citizen in VS. This gives you code completion, the ability to select the version of Python to run against (if you have more than one version installed) and all the other features you expect from VS.

Python Tools installs directly from within VS 2015, so no more hunting around the web for the right version. It comes with some templates to get you up and running (for Bottle, Django and Flask) and also supports IronPython.

Where VS really comes into its own is when you are writing in more than one language. I only stumbled across the Community edition because I was doing web work. VS supports JavaScript natively, can mark up HTML and CSS, and lets you see the results in the browser of your choice using the internal web server in VS. Put all this together and you can write and debug the back end server in Python, modules in C and the client side in HTML, CSS and JavaScript, all from within Visual Studio. The code completion alone should justify the hour or two spent learning to use VS.

If you are compiling C modules for Python, note that Python 3.5 was built with Visual Studio 2015, so the Community edition gives you everything you need to compile them or to embed Python in your C/C++ programs.

For those who are signed up to the Microsoft Virtual Academy, look out for a Python and Django jump start course. You can also download from Microsoft a Windows 10 VM pre-installed with Visual Studio and the related SDKs.

Threading

If you’ve come to Python from another language (certainly a low-level language) then you know that threading is hard. You have to deal with a mutex or a semaphore, which may be achievable with the synchronized keyword if the language supports it.

So it should come as no surprise to find in Python you initialize a Thread class with the function you want to run (myfunc in the example below) and start it.

t = threading.Thread(target=myfunc)
t.start()

I could put a fully working example in with a few more lines but SaltyCrane already has a simple Threading example on his blog which I cannot beat for clarity.

The problem with this simplistic method is that there is no way to interact with the thread. That is fine if you want to split out a long-running I/O operation or a finite background task, but what if you need to stop the thread or query its status? You could work around this by passing in a mutable object, but really you want to create your own class.
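
To make the mutable object workaround concrete, here is a rough sketch in Python 3. The names worker, stop_requested and results are my own for illustration: a threading.Event acts as the stop signal and a plain dict carries status back to the caller.

```python
import threading
import time


def worker(stop_requested, results):
    """Count iterations until asked to stop; both arguments are shared with the caller."""
    while not stop_requested.is_set():
        results["ticks"] += 1
        time.sleep(0.01)


stop_requested = threading.Event()   # shared stop flag
results = {"ticks": 0}               # shared, mutable status

t = threading.Thread(target=worker, args=(stop_requested, results))
t.start()
time.sleep(0.1)        # let the thread do some work
stop_requested.set()   # ask the thread to finish
t.join()               # wait until it has
```

It works, but the thread, its stop flag and its status are three separate objects that the caller has to keep together, which is the argument for wrapping them in a class.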

In the Thread class, when you call start it passes control over to the run method to actually execute your function. So your class just needs to override this run method. You will probably want to override the __init__ method as well, in which case don’t forget to call the parent initialization.

To demonstrate, I have created a simple threading example. You initialise the class with a name and the number of seconds to sleep, and it just writes the name to the console then sleeps for the specified time. The testing code creates 5 instances with different names and sleep times, starts them, lets them run for 5 minutes so you can see the different output, and then stops them.
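
As a rough sketch of what such a class can look like in Python 3 (this is an illustration, not the linked example itself; the names Sleeper, stop and demo are assumptions, and the run time is cut to a second so it finishes quickly):

```python
import threading
import time


class Sleeper(threading.Thread):
    """Print our name, sleep for a fixed time, and repeat until stopped."""

    def __init__(self, name, seconds):
        super().__init__(name=name)   # don't forget the parent initialization
        self.seconds = seconds
        self.wakeups = 0              # status the caller can query
        self._stop_requested = threading.Event()

    def run(self):
        # start() arranges for run() to execute in the new thread.
        while not self._stop_requested.is_set():
            print(self.name)
            self.wakeups += 1
            # wait() sleeps for the given time but returns early once stop() is called
            self._stop_requested.wait(self.seconds)

    def stop(self):
        self._stop_requested.set()


def demo(run_for=1.0):
    """Start 5 sleepers with different sleep times, let them run, then stop them."""
    sleepers = [Sleeper("thread-%d" % i, 0.1 * (i + 1)) for i in range(5)]
    for s in sleepers:
        s.start()
    time.sleep(run_for)
    for s in sleepers:
        s.stop()
    for s in sleepers:
        s.join()
    return sleepers
```

Calling demo() prints the thread names interleaved at different rates. Using Event.wait rather than time.sleep inside run means a stop request is honoured straight away instead of after the current sleep finishes.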