Core language

Language features and explanations

GIL: Who, what, why?

For most posts I concentrate on using Python to solve tasks (mostly system administration based). Apart from fringe cases, these are not multi-threaded, so I can safely ignore Python’s Global Interpreter Lock (normally shortened to GIL). Even when running a web server, it is usually left up to the web framework to handle any multi-threading, so again the GIL is safely ignored.

What is the GIL? Firstly, this applies to CPython, the implementation of Python you are most likely to be running; other implementations like Jython, PyPy or IronPython do things differently. The GIL is just a mechanism to marshal access to interpreter internals (variables mostly) from different threads. It makes coding involving multiple threads straightforward but means threads in CPython are generally only useful for handling blocking I/O.
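As a quick sketch of where threads do still help: time.sleep releases the GIL just as a real blocking I/O call would, so the three threads below should finish in roughly one second in total rather than three.

import threading, time

def blocking_task(name):
    # time.sleep releases the GIL, just like a real blocking I/O call
    time.sleep(1)
    print("%s finished" % name)

threads = [threading.Thread(target=blocking_task, args=("thread-%d" % i,))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()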

For a brief but more technical explanation, Vinay Sajip posted a good single-paragraph description of the GIL:

“Python’s GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can’t effectively make use of multiple cores. (If the GIL didn’t lead to this problem, most people wouldn’t care about the GIL – it’s only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides.”

Why my interest? I have just finished reading an article by A. Jesse Jiryu Davis which goes into far more detail about the GIL. If you are planning a C extension, looking at multi-threading code which shares data, or are just curious, try his Grok the GIL article as a starting point.

Python SQL Server driver on Linux

So you have packaged your SQL monitoring and maintenance routines into a web server and demonstrated it all works from your computer. Impressed, they ask for it to be put on a proper server – a Linux box. Five years ago this would have involved using unsupported third-party drivers (and who ran internal Linux servers anyway?). Now the request seems almost reasonable, although you will have to jump through more hoops than you would with Windows.

First off I’ll assume you are using the pyodbc module. On Linux this will require a C compiler and development headers; if you have chosen a minimal install then you’ll need to install them first. This can be done with the following commands (depending upon the flavour):

Redhat (Centos/Fedora)
sudo yum groupinstall 'Development Tools' -y
sudo yum install python-devel unixODBC-devel -y

Debian (Ubuntu)
sudo apt-get install build-essential -y
sudo apt-get install python-dev unixodbc-dev -y

With this done you can now pip install pyodbc. The pyodbc module is a wrapper around the native system drivers, so you will also need to install a suitable unixODBC driver. Microsoft have produced an official unixODBC driver since 2012 and it has been regularly maintained since. Installation instructions for v13 can be found on this blog post.
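For reference, the pip install itself is just:

pip install pyodbc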

With pyodbc and unixODBC set up, all you need to change in your actual code is the driver in the ODBC connection string to ‘ODBC Driver 13 for SQL Server’ and away you go. As a quick test, the following example will establish a connection and return the server name through a SQL query.

import pyodbc

# The angle-bracketed values are placeholders for your own settings
cnxnstr = "Driver={ODBC Driver 13 for SQL Server};Server=<yourserver>;Uid=<yourusername>;Pwd=<yourpassword>;Database=<yourdatabase>"
cnxn = pyodbc.connect(cnxnstr)
cursor = cnxn.cursor()
cursor.execute("SELECT @@SERVERNAME")
result = cursor.fetchall()
for row in result:
    print(row)
cursor.close()
cnxn.close()

Virtual environments in Visual Studio

A virtual environment in Python is a folder with everything needed to set up local configuration isolated from the rest of the system. This allows you to have modules installed locally which are different from, or do not exist in, the global Python configuration. If you have used Node.js then you can think of virtual environments as npm’s default way of working – creating a local install of a package rather than a global one (pip’s default).

If you have multiple versions of Python installed on your machine then you can also specify which version of Python the virtual environment should use. This gives you the ability to test your code against multiple versions of Python just by creating multiple virtual environments.

There are already plenty of good posts out there on virtual environments, so the aim of this blog post is not to rehash why you should use them (see here for a good introductory blog post) or to be a quick setup guide (see the Hitchhiker’s Guide to Python post). It is a quick guide to using virtual environments within Visual Studio. If you have not used virtual environments before, it is worth giving these posts a quick read before continuing.

As an aside, Python 3.3 introduced the venv module as an alternative for creating lightweight virtual environments (although the original wrapper, pyvenv, has already been deprecated in Python 3.6). While this is the correct way going forward, Visual Studio uses the older virtualenv method, which is what I am concentrating on here.
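For reference, the venv equivalent of the virtualenv command used later in this post is along the lines of:

python -m venv env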

Once you have created your Python solution, expand it until you get to Python Environments. Right-click on this and choose Add Virtual Environment… from the menu list as shown below.

You can change the name of the folder (which defaults to env and is also used as the name of the virtual environment) and the version of Python to use. Click Create to finish and you are ready to go (easy, wasn’t it?). If you expand the Python Environments node you should see the virtual environment appear.

In the background this has created a folder (the virtual environment) in your working directory with the name given. In case you are unsure, your working directory is the location of the solution, which defaults to X:\Users\me\Documents\VS20xx\Projects\Project Name\Solution Name\ (tip: change the default location). This could have been done manually by changing into the working directory and entering the following command, where X:\Python_xx is the installation directory for the version of Python you want to use and env is the name of the folder / virtual environment (if you just want your default version of Python then just pass the name of the folder).

virtualenv -p X:\Python_xx\python.exe env

To install a module into the virtual environment from Visual Studio, right-click on the virtual environment and select Install Python Package… from the menu, or if you have a requirements.txt file you can select Install from requirements.txt. If you expand the virtual environment node you will see the modules installed. Once you have all the modules installed you can generate the requirements.txt file from the same menu, which adds requirements.txt to your project for portability.
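Generating the file this way is broadly equivalent to running pip freeze from within the activated environment:

pip freeze > requirements.txt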

What if you want to use this virtual environment from the command line? Inside the virtual environment is a Scripts directory with a script to make the necessary changes; the trick is to run the correct script from the working directory. The script to run depends upon whether you are running inside a PowerShell console (my recommendation) or from a command prompt. Change into the working directory and type in the following command (where env is the virtual environment folder):

PowerShell: .\env\Scripts\activate.ps1
Command prompt: env\Scripts\activate.bat

The prompt will change to show the name of the virtual environment, confirming activation has succeeded. You can do everything you would normally do from the command line but now you are running against the virtual environment. To confirm the modules installed are only those you have specified, type ‘pip list’; to confirm the version of Python is the one you specified, type ‘python -V’ (capital V – lowercase -v starts Python in verbose mode).
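A typical session (with hypothetical paths) would look something like this:

PS X:\work\project> .\env\Scripts\activate.ps1
(env) PS X:\work\project> pip list
(env) PS X:\work\project> python -V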

Update: It appears I’m not the only one to be looking at virtual environments today, see this article if you want a similar introduction but from the command prompt only.

Python 3.6

Almost like a Christmas present, Python 3.6 has been released, just fifteen months after 3.5 (compared to an 18-month average for the 3.x branch). You can see the official what’s new page here, or if dry lists of features are not your thing, try this summary of the improvements. If nothing else the speed improvements might end one of the arguments for staying on v2.

What will be interesting to see is the take-up of the asynchronous additions of Python 3.5, which have been further improved in 3.6. Node.js has shown just how efficient asynchronous programming can be and hopefully async / await can make this just as accessible in Python. If you’ve never seen these new keywords before, see this blog post for a decent introduction.
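As a minimal sketch of the syntax, the two simulated requests below run concurrently on the event loop and complete in about a second in total:

import asyncio

async def fetch(name, delay):
    # await hands control back to the event loop while "waiting"
    await asyncio.sleep(delay)
    return "%s done" % name

loop = asyncio.get_event_loop()
results = loop.run_until_complete(asyncio.gather(fetch("a", 1), fetch("b", 1)))
print(results)
loop.close()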

I’m hoping to investigate asynchronous programming in the New Year so there should be a post on here in the near future.

Pip requirements

You should be used to installing new modules using pip, and you have probably used a requirements.txt file to install multiple modules together with the command:

pip install -r requirements.txt

But what if you need more flexibility? Why would you ever need more flexibility? If you look at my introduction to YAML post, the code supports either the yaml or the ruamel.yaml module. There is no way to add conditional logic to a requirements.txt file, so a different strategy is needed.

pip is just a module so it can be imported like any other module. This not only gives you access to the main method, which takes an argument list just as if you were calling pip from the command line, but also to its various methods and classes. One of these is the WorkingSet class (found in pip’s vendored copy of pkg_resources), which creates a collection of the installed modules (or active distributions as the documentation calls them). Using this we can create the conditional logic needed to ensure one of the yaml modules is installed, as below.

import pip

# WorkingSet iterates over the active distributions; project_name is the
# distribution name (e.g. PyYAML rather than the import name yaml)
package_names = [ws.project_name for ws in pip._vendor.pkg_resources.WorkingSet()]
if ('PyYAML' not in package_names) and ('ruamel.yaml' not in package_names):
    # note: pip.main was removed in pip 10, so this only works on older pip versions
    pip.main(['install', 'ruamel.yaml'])

WorkingSet provides a few other useful properties and methods apart from project_name. The location property returns the path where the module is installed and the version property naturally returns the version installed. The requires method returns a list of dependencies.
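A quick sketch of these (using the same vendored pkg_resources as above; in later pip releases these internals have moved around):

import pip

for dist in pip._vendor.pkg_resources.WorkingSet():
    # each distribution exposes project_name, version and location
    print(dist.project_name, dist.version, dist.location)
    # requires() returns the list of declared dependencies
    print("  requires:", [str(req) for req in dist.requires()])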

As with most modules, if you’re interested in finding out more dig around in the source code.

Yielding files

Time for a challenge, so I’m going to try the 12 blogs of Christmas. The aim is to write 12 blog entries in December (or at least by 5th January, which is the 12th day of Christmas) – that is one blog entry every 3 days. It’s a catchy title for a challenge (always helps, think Movember) and could be used for any challenge; I’ve twisted my ankle so I doubt I’ll be running, though the 12 runs of Christmas does sound nice. Yes, it is the 4th already, so not a good start.

After the last post I’ve been thinking of other examples of where a generator would be useful that are more in keeping with the theme of this blog (sys administration with Python, in case you’ve forgotten). Iterating through system calls or an API would be a good candidate, but I’ve not been using anything recently that fitted the bill. Another case that sprang to mind was file searching.

A reasonable way to do this would be to return a list, but why use the memory to build a list when the caller is unlikely to need one (and can use a list comprehension to create one anyway)? So this should make a good generator example.

Some of the work is done already by os.walk; this will iterate through each directory giving you a list of files and folders. Normally when you are looking for files you would specify a wildcard pattern, so I’m going to use regular expressions and return any file that matches using yield. I’ve covered regular expressions a few times before, so I’ll skip any explanation and just present the code, which takes a directory and a file pattern and returns all the matching files.

import os, re

def filesearch(root, pattern, exact=True):
    searchre = re.compile(pattern)
    for parent, dirs, files in os.walk(root):
        for filename in files:
            if exact:
                # match: the pattern must match from the start of the filename
                res = searchre.match(filename)
            else:
                # search: the pattern can match anywhere in the filename
                res = searchre.search(filename)
            if res:
                yield os.path.join(parent, filename)

for filename in filesearch(r"C:\Temp",r".*\.exe"):
    print("%s has size %d" % (filename,os.path.getsize(filename)))

The only thing to note is that I added a third parameter so you can do a match (the regular expression must match from the start of the filename) or a search (the regular expression only needs to match part of the filename). This defaults to True, which gives the match behaviour.

The example should find any executables in the C:\Temp folder. Regular expressions are very powerful but not quite as simple as using *.exe. Instead, the asterisk becomes .* (match any character zero or more times) and the dot has to be escaped as it is a special character. I’ve just printed the filename and size out, but you could equally delete the file if it was bigger than a certain size, etc.
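Incidentally, the standard library can do this wildcard-to-regex conversion for you via fnmatch.translate (the exact output varies between Python versions):

import fnmatch
print(fnmatch.translate("*.exe"))   # something like '(?s:.*\.exe)\Z'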

And that’s my first post of the 12 blogs of Christmas. Let’s see if I can get all 12 done in time.

Generators and yield

The yield keyword is a source of confusion for a lot of people new to Python, and for anyone who has not used generators for a while. As this must be the third time I’ve had to relearn generators, I thought I’d make a few notes.

My way of visualizing a generator is as a function that returns (or should that be generates?) an iterator. Having the yield keyword in a function is enough to turn it into a generator. Once you have called the generator to get the iterator, you can use it as you would any other iterator. Jeff Knupp has a much fuller explanation on his blog, so give it a read and then return.

For an example, I created a Fibonacci number generator with the following code, along with a few examples of using the iterator.

def fibonacci(a=0, b=1, maxiter=-1):
    while True:
        yield a
        a, b = b, a + b
        if maxiter > 0:
            maxiter -= 1
            if maxiter < 1:
                return

print([f for f in fibonacci(maxiter=10)])
for f in fibonacci(3,5,20):
    print(f)

First, a few notes on the generator itself. You can specify the starting numbers a and b (naturally defaulting to 0 and 1) and a maximum number of Fibonacci numbers (or iterations to perform) with maxiter when you call the generator. Without setting maxiter the iterator will continue indefinitely: it could not be used for the list comprehension (the first example) and the for loop would be an infinite loop.

The use of return in a generator ends the iteration and is equivalent to raising a StopIteration exception (see PEP 255). Replace the return keyword with raise StopIteration if you want to prove it. (Note that under PEP 479, which becomes the default behaviour in Python 3.7, a StopIteration raised inside a generator is turned into a RuntimeError, so return is the safer choice going forward.)

Lastly, if you are wondering about the line a,b=b,a+b it is just a compact (and I think elegant) way of writing:

temp = a + b
a = b
b = temp

Behind the scenes the loop is calling the iterator’s next method to get the next value. There is nothing to stop you calling next manually, as shown below. Also, the generator will create a new iterator each time it is called, and each iterator encapsulates its own values for a, b and maxiter.

i = fibonacci()
j = fibonacci(13, 21)
print("variable i is %s\nvariable j is %s" % (i, j))
print("First 3 from i: %d , %d , %d" % (next(i), next(i), next(i)))
print("First 3 from j: %d , %d , %d" % (next(j), next(j), next(j)))

Hopefully Jeff’s explanation and my example above go some way to explaining how generators work.

Executing arbitrary code

I last touched on executing code with the interactive console post, but I’ve revisited the idea of entering Python code and executing it for a project I’ve been working on. Before we start, a warning: allowing users to execute code on your server is very dangerous. There is no way to make it 100% safe. If there is another way to achieve your aim without executing code then do it.

For an overview of what executing code (and importing modules) does behind the scenes, look at Armin Ronacher’s post on the subject. I will be following his suggestion of separating the compilation and execution steps. So, given the code we wish to run in a string, we pass this to the compile function, which returns bytecode. This bytecode can then be passed to the exec function to actually execute it. Just to be clear, we don’t have to split the process in two; you can run Python code directly from a string as follows.

exec("print('Hello')")

So, having decided to split the process into separate compile and exec steps, the first issue we need to solve is capturing errors (whether compile-time or runtime) and passing these back to the user. Placing the compile and exec calls inside a try block is enough to detect an error, and we can use the format_exc function from the traceback module to send a formatted exception message back to the user if the code causes an exception.

Now we have a robust compile-and-execute sequence, we just need to get the results of a successful run back to the user. We could create a log-type function, but Jochen Ritzel posted a simple context manager to redirect the standard output. With this we can redirect stdout to a string (actually a StringIO) and return this, assuming everything runs fine.

Putting all this together gives us the final code. The function exec_code works like exec but returns exceptions or the stdout of the executed code to you as well. The last three lines are just test code to show the difference between compile-time errors, runtime errors and code with no errors.
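The original listing is not reproduced here, but a minimal sketch along the lines described would look something like this (using the standard library’s contextlib.redirect_stdout in place of Jochen Ritzel’s recipe):

import traceback
from contextlib import redirect_stdout
from io import StringIO

def exec_code(code_string):
    # compile first so compile-time errors are caught separately
    try:
        bytecode = compile(code_string, '<string>', 'exec')
    except Exception:
        return traceback.format_exc()
    # capture anything the code prints to stdout
    buffer = StringIO()
    try:
        with redirect_stdout(buffer):
            exec(bytecode)
    except Exception:
        return traceback.format_exc()
    return buffer.getvalue()

print(exec_code("print('Hello'"))          # compile-time error (SyntaxError)
print(exec_code("1 / 0"))                  # runtime error (ZeroDivisionError)
print(exec_code("print('Hello world')"))   # no errors, returns captured output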

Learning the Python library

My motivation for starting this blog was to make a note of useful Python code I’ve discovered or written so I could easily find it again. It also has the added benefit of forcing me to understand the libraries I’m using so I can explain them and write concise examples.

Another person who used a blog to start something is Doug Hellmann. He created the Python Module of the Week blog to get in the habit of writing something on a regular basis. The blog focuses on Python’s standard library and includes some very good examples.

Unzip a file in memory

The zipfile module is fairly flexible, but there are occasions when you cannot pass it a filename (as a string) or a file-like object; for example, the open method on AWS S3 buckets does not return a suitable object. What do you do if you can read the zip file into memory? Writing it to disk just to read it back in again seems a waste.

Python, as is often the case, already has a module to solve this problem, in this case StringIO (use io.BytesIO on Python 3, as zip data is binary). This allows you to treat a string, or in this case the entire file in memory, as if it was a file.

We can then write our unzip procedure compactly as:

# module imports and S3 connection omitted for brevity (and beyond scope)
s3file = s3connection.get_bucket(bucketname).get_key(filename)
if s3file:
    s3file.open()
    zf = s3file.read()    # read the entire zip file into memory
    s3file.close()
    # wrap the in-memory data so zipfile can treat it as a file
    zip = zipfile.ZipFile(StringIO.StringIO(zf))
    zip.extractall()