Mr Popularity

David Robinson has done some good analysis on searches on Stack Overflow on the popularity of languages. This shows that Python is on track to be the most searched for programming language. He followed this up with further analysis to show the increase appears to be coming from data science and machine learning. This follows on from IEEE Spectrum putting Python as the most popular programming language for 2017 among developers.

Apart from giving me an excuse to put lots of links in the first paragraph what does this show. Probably little more than Python is flexible which we knew already (it’s why we’ve been using it). You can learn to program with it, produce a web service with it, do data analysis with it as well as all of automate your system administration jobs (which this blog mostly deals with).

Rather than starting a flame war over which language is the most popular or useful, a more useful takeaway from this is Python really is a first class language. It is not just an alternative to PERL, you can use it as your go-to language for everything and only change if a reason to do so appears.

Advertisements

AsciiDoc and DocBook

I’ve covered Markdown (.md) in other posts but another text format gaining popularity is AsciiDoc which is a plain text interpretation of DocBook XML. These files generally use a verbose .asciidoc file extension but you can sometimes see them using the text file (.txt) extension.

The main AsciiDoc program is written in Python, but there is no pip install method. Instead clone the directory using the Mercurial hg command or if you have not got a Mercurial client visit the Google Code website to download and unpack the zip file.

Once downloaded you can build the documentation for the AsciiDoc program with the following command:

python asciidoc.py doc\asciidoc.txt

This also acts as a way to test the basic setup. If all goes well you should see no error messages and it should create a doc\asciidoc.html file which you can open with any web browser.

To get most other formats, AsciiDoc converts the file to the DocBook XML format and then acts as a wrapper around DocBook to create the necessary file. DocBook is not aimed at Windows users so getting it installed is not straight forward. Thankfully combining this blog post and this SO post gives us the installation steps below.

First go to DocBooks SourceForge site and download the zip file. Unpack this to the C: drive (or wherever you want it) and optionally rename the directory to docbook-xsl, that is remove the version number from the folder. Add this folder to your path environment variable.

Now you need libxml2, libxslt, libxmlsec , zlib and iconv. A Windows build of all these can be obtained from ftp://ftp.zlatkovic.com/libxml/. Download the latest version of the zip file for each library and extract the contents of the bin directory of each zip file into the docbook-xsl directory created above.

So avoid calling several programs to create the other files, there is also a2x.py provided. This is a wrapper around the various software programs that need to be called. To create an epub ebook of the instructions above the command becomes:

python a2x.py -L -f epub -v doc\asciidoc.txt

Testing websites (headless)

I covered using Selenium to test website in previous posts (starting with this one which covers install and first test). Using a full browser ensures real world testing and can be done interactively. Using a full web browser does come with a performance penalty and may make CD integration tricky.

Partially for these reasons, PhantomJS was developed. It is based on WebKit and as well as offering a JavaScript API it can be controlled using Selenium.

Installation is easy. Download the windows zip file from the downloads page. There are no dependencies, you just need the phantomjs.exe file from the bin folder. Move this to a directory that exists on your path, like you did for the other webdrivers. You can then use this as apposed to one of the other drivers simply by calling the PhantomJS webdriver with

driver = webdriver.PhantomJS()

All of the Selenium examples will work with this one change.

Async requests

The advantages of asynchronous network requests (or any high latency requests) over synchronous is easy to explain with a little example. However as I’ll explain at the end, my little example hit a problem.

Doing the requests in a synchronous manner, one after the other, means you make the first one, wait for its response before moving on to the next and so on until all the requests are completed. The time taken is the sum of all the requests. If you did 20 requests where the longest took 1 second and the average took half a second then doing this synchronously would take 10 seconds.

Doing the same in an asynchronous manner you create a listener or a callback to handle the response then start the first request followed immediately by the second and so on without waiting for a response (as you have already created another piece of code to handle the responses). This should only take as long as the longest response, plus a little overhead, or about 1 second in our example above.

So why don’t we do everything asynchronously? Well JavaScript does, it’s one of the reasons Node became popular. However if you have done any serious amount of coding in JavaScript you will know the added complexity that this brings; because you do not wait for a response you either have to get the callback to update you, which gets tricky once callbacks get nested, or have some sort of polling mechanism to wait for all the responses before continuing. There are solutions to these issues, promises in JavaScript is one, and such are the advantages that from Python 3.5 onwards async and await keywords were added (similar to other languages, see PEP 492).

However if you just have a batch of HTTP based requests you want to run asynchronously, Kenneth Reitz, who wrote the excellent requests module I used in the Posting to Slack blog entry, has released grequests; basicly a monkey patch for requests to use the gevent module to make asynchronous calls.

You use the same get, post, put, head and delete request functions and returns the same response object. The difference the asynchronous way has an extra line to set up a tuple for handling the requests before calling grequests.map (or imap) to poll until all the requests are complete whereas the synchronous way just maps the get calls directly to the urls. I created a little program to demonstrate this and uploaded it to BitBucket. It makes 10 get requests asynchronously first and then synchronously and displays the timings. Putting the async first should eliminate any possibility requests being faster skewing the results.

So to the problem. The asynchronous calls on my Windows machine did not end up faster, if anything the average was slower. Confused, I tested the same code on a Linux box which produced the expected results; the async completing in a quarter of the time. At a guess it seems there may be a problem either with gevents or the greenlets module grequests depends upon for performance. I will do some more investigation and let you know.

Popularity

Last month saw the record number of visitors to the site. Given the lack of updates recently (nothing I have been working on has been worth writing about) this was a little surprising.  Hopefully will be back to more regular posts as I have more exciting work in the pipeline.

If you need digital intelligence in an app (far beyond what WordPress’ Site Stats give you) that are commercial offering like New Relic, Datadog and AppDynamics that give you an amazing level of detail. When I get a chance I want to see how much you can achieve using boomerang.js as part of your data gathering. I’ll update this post if I get anything useful.

GIL: Who, what, why?

For most posts I concentrate on using Python to solve tasks (mostly system administration based). Apart from fringe cases these are not multi-threaded so I can safely ignore Python’s Global Interpreter Lock (normally shortened to GIL). Even when running a web server, it is usually left up to the web framework to handle any multi-threading so again the GIL is safely ignored.

What is the GIL? Firstly this is for CPython, the version of Python you are most likely to be running. Other implementations like Jython, PyPy or IronPython do things differently. It is just a mechanism to marshall access to internals (variables mostly) from different threads. It makes coding involving multiple threads straight forward but means threads in CPython are generally only good for solving blocking I/O.

For a brief but more technical explanation, Vinay Sajip posted a good single paragraph description to the GIL:

“Python’s GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can’t effectively make use of multiple cores. (If the GIL didn’t lead to this problem, most people wouldn’t care about the GIL – it’s only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides.”

Why my interest? I have just finished reading an article by A. Jesse Jiryu Davis which goes into far more detail about the GIL. If you are planning a C extension, looking at multi-threading some code which shares data or just curious try his Grok the GIL article as a starting point.

Calling a SOAP API

While REST (and not so RESTful) APIs have come to dominate there is still the occasionally find an API based on SOAP (especially if it uses a Microsoft back-end). For those interest in the different merits of the two technologies, there is an interesting infographic here.

While REST APIs can be handled with just the standard library, SOAP really needs a module to hide a lot of the complexities and boilerplate. The original SOAP modules (like SOAPy) no longer appear to be maintained (and hence do not work on later versions of Python). However if you just need the client (rather than a server), a new module called Zeep has appeared which is being actively maintained and has recently reached v1 level maturity. Zeep can be installed with with pip in the usual way.

An alternative to Zeep is the Suds-Jurko fork of Suds which may be a better choice if you need more options or find something Zeep does not support.

SOAP is XML based and uses a schema called Web Services Description Language (WSDL) to completely describe the operations, methods in Python, and how they are called (no dynamic bindings here). Zeep creates a client by from the WSDL passed as the first parameter in the constructor (the only required parameter). All of the operations are then exposed through the clients service class.

An example will hopefully make this clear. I am going to use one of the web services list at WebServiceX.NET, the one for distance type converting. This has an ChangeLengthUnit operation which takes 3 parameters; the length, the unit the length is in and the unit to convert to. Putting this together gives us the following code.

import zeep
soap = zeep.Client('http://www.webservicex.net/length.asmx?wsdl')
print(soap.service.ChangeLengthUnit(1,'Inches','Millimeters'))

A few of points (apart from the coders must be American due to the way they spell metres). All Microsoft active server methods (.asmx) should expose the WSDL if you append ?wsdl to the URI.

It doesn’t have to be a URI passed into the Client constructor. If the string begins with http (or https obviously) it will be treated as a URI. Otherwise it will assume it has been given a filepath to the WSDL file and try to open that. This is useful if you have to edit the WSDL file, say to fix a binding issue.

Once you have the client, you can get a sanitized version of the WSDL with the method soap.wsdl.dump() which should help you establish which operations are available and how you call them. The WSDL link above should help explain the terms but as you can see, even this simple example with just one operation has a lengthy WSDL.

Because SOAP uses standard types, zeep can convert the input and output correctly. If you look at the WSDL dump from above you will see the prefix xsd: http://www.w3.org/2001/XMLSchema listed – this schema defines all the standard data types and structures. If you check the return type of ChangeLengthUnit() you will notice it is a float.

Testing with alternative OS and browsers

Having covered Selenium testing in the previous post, the next step would be to test in multiple OSes. This is where having a WebDriver API really comes into its own; as long as there is a suitable WebDriver on the OS for the browser you want to test on then your tests should execute without change (or with minimal change to the initialization of the tests).

Instead the problem is moved to maintaining all the infrastructure. If setting up and maintaining multiple browsers and different OSes (including mobiles) seems like a lot of work there are several companies that offer this as a service. I am going to going to show the changes needed to get the previous test working on BrowserStack. Once you have signed up go to the Account > Settings page and make note of the username and access key.

You access a WebDriver running on another server with the remote method. This method requires an endpoint to connect to (which contains the username and access key from above) and a dictionary with the OS and browser you want to do the testing on as shown below. The line commented out is the one that needs to be replaced. Once connection is established, the rest of the test remains the same.

# driver = webdriver.Firefox()
desired = {'os': 'Windows', 'os_version': 'xp',
           'browser': 'IE', 'browser_version': '7.0'}
driver = webdriver.Remote(
    command_executor='http://username:accesskey@hub.browserstack.com:80/wd/hub',
    desired_capabilities=desired)

The BrowserStack documentation has Python examples which explain the above commands and more. Now if you go to http://www.browserstack.com/automate (login if necessary) you should now be able to see a log of the test just ran and even watch a video of it just to be sure the test is running in your specified environment.

There are alternatives to BrowserStack: Sauce Labs, Cross Browser Testing and TestingBot all offer similar functionality. TestingBot offers by far the cheapest entry into automated testing option as their personal plan includes 400 minutes of automated testing whereas all the others require you to purchase a more expensive plan before you get any automated testing minutes.

In addition to automated testing, all the above offer manual testing of the different browsers / OSes so you can troubleshoot problems that occur on other platforms.

Testing websites

Testing a static website was fairly simple. At the most basic you could just curl the pages to make sure you was getting the page expected. A more comprehensive test could be written in Python without any additional modules; although requests and a testing framework makes life easier.

Dynamic websites, where a large part of the page is generated by JavaScript, opens up lots more possibilities for problems. Subtle differences between browsers DOM, JavaScript methods and supported features mean you really have to test the website in a browser. Manual testing soon before laborious and time expensive. You also cannot integrate it into a continuous deployment (CD) process.

This is where Selenium comes in. This allows you to control a browser programatically, recreating the steps a tester would do manually. The python language bindings can be installed with pip as usual. To control a browser it uses an API known as WebDriver. To actually support the different browsers you need the appropriate WebDriver application for that browser, which can be downloaded from the following locations

Where you have 32- and 64-bit versions, you need the same one as the browser you wish to control, not your OS. So the 32-bit Internet Explorer WebDriver will control the 32-bit version of Internet Explorer. These need to be placed in a folder that is included in your path environment variable so that Python can find and run them; alternatively you can add the folder to sys.path before use. Each WebDriver has a different name so they can all go in the same folder. Although the Microsoft Edge WebDriver is an msi (the others are zip files) it does not add the installation folder to your path. I’ve also found the WebDriver for the last two, Internet Explorer and Opera, to be buggy but your mileage may vary.

Now for a quick test. To demonstrate a few features I’ve uploaded a script to BitBucket that simply checks if this blog is on the first page returned when searched for. This should always be the case but it does demonstrate starting the driver, opening a page, finding an element and inputting to it, then checking the result contains an element you was expecting.

Hopefully this should give you some idea of how to make a start. The official documentation has examples in Python of each call.

Python SQL Server driver on Linux

So you have packaged your SQL monitoring and maintenance routines into a web server and demonstrated it all works from your computer. Impressed they ask for it to be put on a proper server – a Linux box. 5 years ago this would have involved using unsupported 3rd party drivers and who ran internal Linux servers anyway. Now the request seems almost reasonable although you will have to jump through more hoops than you would with Windows.

First off I’ll assume you are using the pyodbc module. On Linux this will require a C compiler. If you have chosen a minimal install then you’ll need to install them. This can be done with the following command (depending upon the flavour)

Redhat (Centos/Fedora)
sudo yum groupinstall 'Development Tools' -y
sudo yum install python-devel unixODBC-devel -y

Debian (Ubuntu)
sudo apt-get install build-essential -y
sudo apt-get install python-dev unixodbc-dev -y

With this done you can now pip install pyodbc. The pyodbc module is a wrapper around the native system drivers so you will need to install a suitable unixodbc driver. Microsoft have produced an official unixODBC driver since 2012 and it has been regularly maintained since. Installation instructions for v13 can be found on this blog post.

With pyodbc and unixodbc set up all you need to change in your actual code is the driver on the ODBC connection string to ‘ODBC Driver 13 for SQL Server’ and away you go. As a quick test, the following example will establish a connection and return the servername through a SQL query.

import pyodbc
cnxnstr = "Driver=ODBC Driver 13 for SQL Server;Server=<yourserver>;Uid=<yourusername>;Pwd=<yourpassword>;database=<yourdatabase>"
cnxn = pyodbc.connect(cnxnstr)
cursor = cnxn.cursor()
cursor.execute("SELECT @@SERVERNAME")
result = cursor.fetchall()
for row in result:
    print(row)
cursor.close()
cnxn.close()