Popularity

Last month saw a record number of visitors to the site. Given the lack of updates recently (nothing I have been working on has been worth writing about) this was a little surprising. Hopefully I will be back to more regular posts as I have more exciting work in the pipeline.

If you need digital intelligence in an app (far beyond what WordPress’ Site Stats give you) there are commercial offerings like New Relic, Datadog and AppDynamics that give you an amazing level of detail. When I get a chance I want to see how much you can achieve using boomerang.js as part of your data gathering. I’ll update this post if I get anything useful.

GIL: Who, what, why?

For most posts I concentrate on using Python to solve tasks (mostly system administration based). Apart from fringe cases these are not multi-threaded so I can safely ignore Python’s Global Interpreter Lock (normally shortened to GIL). Even when running a web server, it is usually left up to the web framework to handle any multi-threading so again the GIL is safely ignored.

What is the GIL? Firstly this is for CPython, the version of Python you are most likely to be running. Other implementations like Jython, PyPy or IronPython do things differently. It is just a mechanism to marshal access to interpreter internals (variables mostly) from different threads. It makes coding involving multiple threads straightforward but means threads in CPython are generally only good for work that blocks on I/O.
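The point about blocking I/O can be seen in a minimal sketch using only the standard library. Here time.sleep stands in for a blocking call (an assumption made for illustration; it releases the GIL while waiting, as real blocking I/O does), and the function names are made up for this example.

```python
import threading
import time


def blocking_task(delay):
    # time.sleep releases the GIL while it waits, much like a blocking read
    time.sleep(delay)


def timed_threads(count, delay):
    """Run `count` threads that each block for `delay` seconds; return elapsed time."""
    threads = [threading.Thread(target=blocking_task, args=(delay,))
               for _ in range(count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = timed_threads(count=4, delay=0.25)
print("4 blocking tasks finished in {:.2f}s".format(elapsed))
```

The four waits overlap, so the total should be close to a single delay rather than four of them. Swap the sleep for a pure Python calculation and, on a standard CPython build, the times would be expected to add up instead because only one thread runs bytecode at a time.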

For a brief but more technical explanation, Vinay Sajip posted a good single paragraph description of the GIL:

“Python’s GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can’t effectively make use of multiple cores. (If the GIL didn’t lead to this problem, most people wouldn’t care about the GIL – it’s only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides.”

Why my interest? I have just finished reading an article by A. Jesse Jiryu Davis which goes into far more detail about the GIL. If you are planning a C extension, looking at multi-threading some code which shares data, or are just curious, try his Grok the GIL article as a starting point.

Calling a SOAP API

While REST (and not so RESTful) APIs have come to dominate, you will still occasionally find an API based on SOAP (especially if it uses a Microsoft back-end). For those interested in the different merits of the two technologies, there is an interesting infographic here.

While REST APIs can be handled with just the standard library, SOAP really needs a module to hide a lot of the complexities and boilerplate. The original SOAP modules (like SOAPy) no longer appear to be maintained (and hence do not work on later versions of Python). However, if you just need the client (rather than a server), a new module called Zeep has appeared which is being actively maintained and has recently reached v1 level maturity. Zeep can be installed with pip in the usual way.
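To give a feel for the boilerplate a SOAP module hides, here is a rough sketch that builds a SOAP 1.1 request envelope by hand with the standard library. The service namespace and the parameter names are hypothetical placeholders; the real values are defined by the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.com/length"  # hypothetical; the real one comes from the WSDL


def build_envelope(operation, params):
    """Return a SOAP 1.1 request envelope as a string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params:
        # each parameter becomes a child element of the operation element
        child = ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name))
        child.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Parameter names below are assumptions made for illustration only
xml_text = build_envelope("ChangeLengthUnit", [
    ("LengthValue", 1),
    ("fromLengthUnit", "Inches"),
    ("toLengthUnit", "Millimeters"),
])
print(xml_text)
```

This only covers the request body. You would still have to post it with the right headers and then parse and type-convert the response, which is the work a client module takes off your hands.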

An alternative to Zeep is the Suds-Jurko fork of Suds which may be a better choice if you need more options or find something Zeep does not support.

SOAP is XML based and uses a schema called Web Services Description Language (WSDL) to completely describe the operations (methods in Python) and how they are called (no dynamic bindings here). Zeep creates a client from the WSDL passed as the first parameter in the constructor (the only required parameter). All of the operations are then exposed through the client’s service object.

An example will hopefully make this clear. I am going to use one of the web services listed at WebServiceX.NET, the one for converting length units. This has a ChangeLengthUnit operation which takes 3 parameters: the length, the unit the length is in and the unit to convert to. Putting this together gives us the following code.

import zeep
soap = zeep.Client('http://www.webservicex.net/length.asmx?wsdl')
print(soap.service.ChangeLengthUnit(1,'Inches','Millimeters'))

A few points (apart from the coders must be American due to the way they spell metres). All Microsoft ASP.NET web services (.asmx) should expose the WSDL if you append ?wsdl to the URI.

It doesn’t have to be a URI passed into the Client constructor. If the string begins with http (or https obviously) it will be treated as a URI. Otherwise it will assume it has been given a filepath to the WSDL file and try to open that. This is useful if you have to edit the WSDL file, say to fix a binding issue.

Once you have the client, you can get a sanitized version of the WSDL with the method soap.wsdl.dump() which should help you establish which operations are available and how you call them. The WSDL link above should help explain the terms but as you can see, even this simple example with just one operation has a lengthy WSDL.

Because SOAP uses standard types, zeep can convert the input and output correctly. If you look at the WSDL dump from above you will see the prefix xsd: http://www.w3.org/2001/XMLSchema listed – this schema defines all the standard data types and structures. If you check the return type of ChangeLengthUnit() you will notice it is a float.

Testing with alternative OS and browsers

Having covered Selenium testing in the previous post, the next step would be to test in multiple OSes. This is where having a WebDriver API really comes into its own; as long as there is a suitable WebDriver on the OS for the browser you want to test on then your tests should execute without change (or with minimal change to the initialization of the tests).

Instead the problem is moved to maintaining all the infrastructure. If setting up and maintaining multiple browsers and different OSes (including mobiles) seems like a lot of work, there are several companies that offer this as a service. I am going to show the changes needed to get the previous test working on BrowserStack. Once you have signed up go to the Account > Settings page and make note of the username and access key.

You access a WebDriver running on another server with webdriver.Remote. This requires an endpoint to connect to (which contains the username and access key from above) and a dictionary with the OS and browser you want to do the testing on as shown below. The line commented out is the one that needs to be replaced. Once the connection is established, the rest of the test remains the same.

# driver = webdriver.Firefox()
desired = {'os': 'Windows', 'os_version': 'xp',
           'browser': 'IE', 'browser_version': '7.0'}
driver = webdriver.Remote(
    command_executor='http://username:accesskey@hub.browserstack.com:80/wd/hub',
    desired_capabilities=desired)
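Rather than hard-coding the credentials in the endpoint, one option is to build it from environment variables. This is only a sketch: the variable names BROWSERSTACK_USERNAME and BROWSERSTACK_ACCESS_KEY and the helper function are made up for illustration, and the URL layout simply mirrors the one used above.

```python
import os
from urllib.parse import quote


def browserstack_endpoint(username, access_key):
    """Build the hub URL, escaping characters that are unsafe in a URL."""
    return "http://{}:{}@hub.browserstack.com:80/wd/hub".format(
        quote(username, safe=""), quote(access_key, safe=""))


# Hypothetical variable names; fall back to placeholders when they are not set
username = os.environ.get("BROWSERSTACK_USERNAME", "username")
access_key = os.environ.get("BROWSERSTACK_ACCESS_KEY", "accesskey")
endpoint = browserstack_endpoint(username, access_key)
```

The resulting endpoint string can then be passed as the command_executor argument, which keeps the access key out of version control.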

The BrowserStack documentation has Python examples which explain the above commands and more. If you go to http://www.browserstack.com/automate (log in if necessary) you should now be able to see a log of the test you just ran and even watch a video of it, just to be sure the test is running in your specified environment.

There are alternatives to BrowserStack: Sauce Labs, Cross Browser Testing and TestingBot all offer similar functionality. TestingBot offers by far the cheapest entry into automated testing as their personal plan includes 400 minutes of automated testing, whereas all the others require you to purchase a more expensive plan before you get any automated testing minutes.

In addition to automated testing, all the above offer manual testing of the different browsers / OSes so you can troubleshoot problems that occur on other platforms.

Testing websites

Testing a static website was fairly simple. At the most basic you could just curl the pages to make sure you were getting the page expected. A more comprehensive test could be written in Python without any additional modules, although requests and a testing framework make life easier.
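As a rough idea of what a standard-library-only check might look like, the sketch below pulls the title out of a page so it can be compared with the one you expect. The helper names are made up for this example, and the demonstration runs against an inline HTML string rather than a live site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url):
    # Fetch a live page; not exercised below to keep the demo offline
    with urlopen(url, timeout=10) as response:
        return extract_title(response.read().decode("utf-8", errors="replace"))


sample = "<html><head><title> My Blog </title></head><body></body></html>"
print(extract_title(sample))
```

Against a real site you would call fetch_title with the page address and compare the result with the title you expect.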

Dynamic websites, where a large part of the page is generated by JavaScript, open up lots more possibilities for problems. Subtle differences between browsers’ DOMs, JavaScript methods and supported features mean you really have to test the website in a browser. Manual testing soon becomes laborious and time-consuming. It is also completely unsuitable for continuous deployment (CD).

This is where Selenium comes in. This allows you to control a browser programmatically, recreating the steps a tester would do manually. The Python language bindings can be installed with pip as usual. To control a browser it uses an API known as WebDriver. To actually support the different browsers you need the appropriate WebDriver application for that browser, which can be downloaded from the following locations

Where you have 32- and 64-bit versions, you need the one that matches the browser you wish to control, not your OS. So the 32-bit Internet Explorer WebDriver will control the 32-bit version of Internet Explorer. These need to be placed in a folder that is included in your PATH environment variable so that Selenium can find and run them; alternatively you can add the folder to os.environ['PATH'] from within your script before use. Each WebDriver has a different name so they can all go in the same folder. Although the Microsoft Edge WebDriver is an msi (the others are zip files) it does not add the installation folder to your path. I’ve also found the WebDriver for the last two, Internet Explorer and Opera, to be buggy but your mileage may vary.
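A quick way to confirm the drivers can be found is to look them up on the path from Python. This is a small sketch using shutil.which; the executable names listed are the ones the drivers are commonly shipped with, so treat them as assumptions and adjust them to match your downloads.

```python
import shutil

# Commonly used executable names; assumptions, check against what you downloaded
DRIVER_NAMES = ["geckodriver", "chromedriver", "IEDriverServer",
                "MicrosoftWebDriver", "operadriver"]


def locate_drivers(names):
    """Map each driver name to its full path, or None if it is not on the path."""
    return {name: shutil.which(name) for name in names}


for name, location in locate_drivers(DRIVER_NAMES).items():
    print("{:20} {}".format(name, location or "not found on path"))
```

Any driver reported as not found needs its folder adding to the path before the matching browser can be started.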

Now for a quick test. To demonstrate a few features I’ve uploaded a script to BitBucket that simply checks if this blog is on the first page returned when searched for. This should always be the case but it does demonstrate starting the driver, opening a page, finding an element and inputting to it, then checking the result contains an element you were expecting.

Hopefully this should give you some idea of how to make a start. The official documentation has examples in Python of each call.

Python SQL Server driver on Linux

So you have packaged your SQL monitoring and maintenance routines into a web server and demonstrated it all works from your computer. Impressed, they ask for it to be put on a proper server – a Linux box. 5 years ago this would have involved using unsupported 3rd party drivers, and who ran internal Linux servers anyway? Now the request seems almost reasonable although you will have to jump through more hoops than you would with Windows.

First off I’ll assume you are using the pyodbc module. On Linux this will require a C compiler and the development headers. If you have chosen a minimal install then you’ll need to install them. This can be done with the following commands (depending upon the flavour)

Redhat (Centos/Fedora)
sudo yum groupinstall 'Development Tools' -y
sudo yum install python-devel unixODBC-devel -y

Debian (Ubuntu)
sudo apt-get install build-essential -y
sudo apt-get install python-dev unixodbc-dev -y

With this done you can now pip install pyodbc. The pyodbc module is a wrapper around the native system drivers so you will need to install a suitable unixODBC driver. Microsoft have produced an official unixODBC driver since 2012 and it has been regularly maintained since. Installation instructions for v13 can be found in this blog post.

With pyodbc and unixodbc set up all you need to change in your actual code is the driver on the ODBC connection string to ‘ODBC Driver 13 for SQL Server’ and away you go. As a quick test, the following example will establish a connection and return the servername through a SQL query.

import pyodbc
cnxnstr = "Driver=ODBC Driver 13 for SQL Server;Server=<yourserver>;Uid=<yourusername>;Pwd=<yourpassword>;database=<yourdatabase>"
cnxn = pyodbc.connect(cnxnstr)
cursor = cnxn.cursor()
cursor.execute("SELECT @@SERVERNAME")
result = cursor.fetchall()
for row in result:
    print(row)
cursor.close()
cnxn.close()

Raspberry Pi projects – Raspi Boy

Python and the Raspberry Pi go hand-in-hand, and if you want to learn Python from something other than sys admin and server jobs then a Raspberry Pi is a good choice. Being a board only, others have also used it as the brains behind their projects.

One of these Kickstarter projects that has caught my eye is the Raspi Boy – basically a screen, battery and case that looks like a GameBoy. As well as a great way to play all those games from my childhood it will also be a fully portable Python system. I have made my pledge on Kickstarter and I’ll give you a review of it when it turns up.

Markdown

I’ve successfully completed the 12 blog challenge I set myself (with just hours to go) even if 4 of those blogs were editorial rather than code. It has reminded me how difficult coming up with regular content is.

Another problem related to content creation is how to format it. The most universal format is of course the plain text file. This has great portability and diff tools work well to see version changes but it is hardly pleasing on the eye. A universal format which allows formatting would naturally be HTML. HTML files retain the portability of text files but are far more difficult to write by hand and don’t work well with diff tools.

There are of course plenty of platforms (wikis and blogging for example) that give you a near WYSIWYG (or visual) editor to hide the HTML code and cleverly store the pages so as to be able to generate a diff of versions. However, this can lock you in to the platform, losing the portability.

One solution that tries to give portability and basic formatting while still working with diff tools is Markdown. Those of you like me who grew up with email in the 90s will recognise Markdown as the way text only email was formatted. Its main advantage is it doesn’t require any markup as such (formatting is mostly contextual) but has a direct relationship to basic HTML.
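To illustrate that direct relationship to HTML, here is a deliberately tiny toy converter. It is not how any real Markdown library works and it only understands headings, paragraphs separated by blank lines, and asterisk emphasis; the function name is made up for this sketch.

```python
import html
import re


def tiny_markdown(text):
    """Convert a very small subset of Markdown to HTML (toy example only)."""
    out = []
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        block = html.escape(block.strip(), quote=False)
        heading = re.match(r"(#{1,6})\s+(.*)", block)
        if heading:
            level = len(heading.group(1))  # number of # characters gives the level
            out.append("<h{0}>{1}</h{0}>".format(level, heading.group(2)))
        else:
            block = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", block)
            block = re.sub(r"\*(.+?)\*", r"<em>\1</em>", block)
            out.append("<p>{}</p>".format(block))
    return "\n".join(out)


print(tiny_markdown("# Hello\n\nWelcome to a *Markdown* world!"))
```

Each piece of plain-text formatting maps onto one HTML element, which is why the source stays readable and diffs cleanly.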

With these advantages it is no wonder Markdown has found a home in version control systems which are used to dealing with text files and displaying changes between versions. If you create a readme file, sites like BitBucket and GitHub will automatically display this file along with the directory contents. Make this a readme.md (md is the common extension for a Markdown document) and this will be formatted correctly.

There is a Python Markdown module which takes Markdown text and converts it to HTML. It also supports extensions to add extra functionality. With this module installed (pip install markdown) conversion is just a method call away.

import markdown
md = markdown.Markdown()
print(md.convert("""# Hello
Welcome to a Markdown world!"""))

As a more useful example I have used this with the bottle web micro framework to create a program that allows you to view all the Markdown documents in a folder through a web browser. If you go to the root it will list all the Markdown documents and you view one by clicking on it. Simply run the Python program from the directory you wish to view.

Testing coverage

Python already has a unittest module based on the Java JUnit library. You can create a series of tests by creating a class that inherits from unittest.TestCase – each method in this class then becomes a test. You can check a condition is true using the standard assert command. To run the tests just call the unittest.main method.
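A minimal sketch of that pattern is below. The function under test, celsius_to_fahrenheit, is a made-up stand-in. A script would normally finish with unittest.main(); here the tests are loaded and run explicitly so the result object can be inspected afterwards.

```python
import unittest


def celsius_to_fahrenheit(celsius):
    """Hypothetical function under test."""
    return celsius * 9 / 5 + 32


class TemperatureTests(unittest.TestCase):
    # Every method whose name starts with "test" is run as a separate test
    def test_freezing_point(self):
        assert celsius_to_fahrenheit(0) == 32

    def test_boiling_point(self):
        assert celsius_to_fahrenheit(100) == 212


# Load and run the tests explicitly instead of calling unittest.main()
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TemperatureTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

TestCase also provides helper methods such as assertEqual, which report the differing values when a test fails.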

Straightforward, but this gets difficult to manage when the tests are split over multiple files. To help with this there is the nose module which can be installed with pip in the usual way. This removes the need for the boilerplate code. It will search through all the Python code looking for not only classes derived from unittest.TestCase but also for any method or class that matches the regex – basically contains test at a word boundary and is in a module called test.

As a simple demonstration I have created this repository. It contains stack.py – the most primitive stack implementation I could come up with. I want to ensure this works as expected so let’s come up with some tests. I don’t want test code littering the main code so I’ve created a directory called test for all my testing. In here I’ve created a file called test-stack.py which contains two methods, one for testing the stack when empty and one to test that when I push to the stack, I get the correct values back in the correct order.

Even though I’ve not created any boilerplate code to run these tests, if I enter the following command from the main directory it will indeed find and run these two tests. Hopefully both should pass.

nosetests

Obviously this is a trivial example but hopefully it shows how quickly unit tests can be set up. There is a lot more to be said on testing which will have to wait for another blog post.

So you have written some tests for your code. How do you know your tests are testing all of the code? This is known as code coverage and there is another good module for this called coverage.py which can be installed with pip as usual. The reason for choosing these two is they work together. Once installed I can include code coverage just by adding the following parameters to nosetests

nosetests --with-coverage --cover-erase

Now as well as running the tests it will show me the amount of code the tests have executed. This is fine for a metric but if the code coverage is not 100% how do you know what the tests are missing? Add another parameter to the command, --cover-html, and coverage will create an HTML report (inside the cover sub-directory).

Load index.html into a browser to see a summary similar to what is displayed on the screen at the end of the tests. Click on the module name and this shows you the module code along with which lines were executed by the tests and which were not. A thin green bar at the start of the line of code indicates the line was executed; a red bar indicates it was not.

Typing all these parameters in each time will get a little tedious. Thankfully nose supports an ini file for configuration. For some reason I could not get this to be automatically detected so I had to specify it at the command line with
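As a guide to what goes in the file: nose reads its options from a [nosetests] section, using the long command line names without the leading dashes and giving simple switches a value of 1. The sketch below generates that text with configparser. The choice of options is only an example, and writing the result to nose.cfg is left as a comment.

```python
import configparser
import io

config = configparser.ConfigParser()
# Long option names without the leading "--"; switches take the value 1
config["nosetests"] = {
    "with-coverage": "1",
    "cover-erase": "1",
    "cover-html": "1",
}

buffer = io.StringIO()
config.write(buffer)
cfg_text = buffer.getvalue()
print(cfg_text)

# To save it for use with "nosetests -c nose.cfg":
# with open("nose.cfg", "w") as handle:
#     handle.write(cfg_text)
```

The printed text is what the configuration file should contain; you can just as easily type it into an editor by hand.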

nosetests -c nose.cfg

As a final note, coverage is just a metric on how much of your code is being tested. It does not imply anything about the quality of the tests. You can have 100% coverage with worthless tests just the same as you can have really thorough tests that only test a small section of the code. At least if you follow the above you will know what your tests are missing.
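The difference between coverage and quality can be shown with a small, made-up example. The absolute function below has a deliberate bug; the first check executes every line of it (so coverage would report 100%) yet verifies nothing, while the second actually looks at the results.

```python
def absolute(value):
    """Hypothetical function with a deliberate bug in the negative branch."""
    if value < 0:
        return value  # bug: should return -value
    return value


def shallow_check():
    # Executes both branches, so every line is covered, but asserts nothing
    absolute(-5)
    absolute(5)


def thorough_check():
    assert absolute(5) == 5
    assert absolute(-5) == 5  # fails because of the bug above


def passes(check):
    """Return True if the check runs without an assertion failing."""
    try:
        check()
    except AssertionError:
        return False
    return True


print("shallow check passes:", passes(shallow_check))
print("thorough check passes:", passes(thorough_check))
```

Both checks give the same coverage figure, but only the second one notices the bug.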

Virtual environments in Visual Studio

A virtual environment in Python is a folder with everything needed to set up local configuration isolated from the rest of the system. This allows you to have modules installed locally which are different from, or do not exist in, the global Python configuration. If you have used Node.js then you can think of virtual environments as npm’s default way of working: creating a local install of a package rather than a global one (pip’s default).

If you have multiple versions of Python installed on your machine then you can also specify which version of Python the virtual environment should use. This gives you the ability to test your code against multiple versions of Python just by creating multiple virtual environments.

There are already plenty of good posts out there on virtual environments so the aim of this blog post is not to rehash why you should use virtual environments (see here for a good introductory blog post) or to act as a quick setup guide (see the Hitchhiker’s Guide to Python post). It is a quick guide to using virtual environments within Visual Studio. If you have not used virtual environments before it is worth giving these posts a quick read before continuing.

As an aside, Python 3.3 introduced the venv module as an alternative for creating lightweight virtual environments (although the original wrapper pyvenv has already been deprecated in Python 3.6). While this is the correct way going forward, Visual Studio uses the older virtualenv method which is what I am concentrating on here.

Once you have created your Python solution expand it until you get to Python Environments. Right-click on this and choose Add Virtual Environment… from the menu list.

You can change the name of the folder (defaults to env), which is also used as the name of the virtual environment, and the version of Python to use. Click Create to finish and you are ready to go (easy, wasn’t it?). If you expand the Python Environments node you should see the virtual environment appear.

In the background this has created a folder (the virtual environment) in your working directory with the name given. In case you are unsure, your working directory is the location of the solution, which defaults to X:\Users\me\Documents\VS20xx\Projects\Project Name\Solution Name\ (tip: change the default location). This could have been done manually by changing into the working directory and entering the following command (where X:\Python_xx is the installation directory for the version of Python you want to use and env is the name of the folder / virtual environment – if you just want your default version of Python then just pass the name of the folder).

virtualenv -p X:\Python_xx\python.exe env

To install a module into the virtual environment from Visual Studio just right-click on the virtual environment and select Install Python Package… from the menu or if you have a requirements.txt file you can select Install from requirements.txt. If you expand the virtual environment node you will see the modules installed. Once you have all the modules installed you can generate the requirements.txt file from the same menu and it will add the requirements.txt to your project for portability.

What if you want to use this virtual environment from the command line? Inside of the virtual environment is a Scripts directory with a script to make the necessary changes; the trick is to run the correct script from the working directory. The script to run depends upon whether you are running inside a PowerShell console (my recommendation) or from a command prompt. Change into the working directory and type in the following command (where env is the virtual environment folder)

PowerShell: .\env\Scripts\activate.ps1
Command prompt: env\Scripts\activate.bat

The prompt will change to the name of the virtual environment to show activation has succeeded. You can do everything you would normally do from the command line but now you are running against the virtual environment. To confirm the modules installed are only those you have specified type in ‘pip list’, and to confirm the version of Python is the one you specified type in ‘python -V’.
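You can also check from inside Python whether the interpreter is running in a virtual environment. The helper below is a small sketch (the function name is made up): older virtualenv releases set sys.real_prefix, while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtual_environment():
    """Best-effort check for an active virtual environment."""
    # Older virtualenv releases add sys.real_prefix to the interpreter
    if hasattr(sys, "real_prefix"):
        return True
    # venv (and newer virtualenv) point sys.prefix at the environment folder
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print("Running in a virtual environment:", in_virtual_environment())
print("Interpreter location:", sys.prefix)
```

Run it before and after activating the environment and the answer should flip from False to True.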

Update: It appears I’m not the only one to be looking at virtual environments today, see this article if you want a similar introduction but from the command prompt only.