Web

Testing websites (headless)

I covered using Selenium to test websites in previous posts (starting with this one, which covers installation and a first test). Using a full browser ensures real-world testing and can be done interactively. However, a full web browser comes with a performance penalty and may make CD integration tricky.

Partially for these reasons, PhantomJS was developed. It is based on WebKit and as well as offering a JavaScript API it can be controlled using Selenium.

Installation is easy. Download the Windows zip file from the downloads page. There are no dependencies; you just need the phantomjs.exe file from the bin folder. Move this to a directory that exists on your path, like you did for the other webdrivers. You can then use this as opposed to one of the other drivers simply by calling the PhantomJS webdriver with

driver = webdriver.PhantomJS()

All of the Selenium examples will work with this one change.

Async requests

The advantages of asynchronous network requests (or any high latency requests) over synchronous ones are easy to explain with a little example. However, as I’ll explain at the end, my little example hit a problem.

Doing the requests in a synchronous manner, one after the other, means you make the first one, wait for its response before moving on to the next and so on until all the requests are completed. The time taken is the sum of all the requests. If you did 20 requests where the longest took 1 second and the average took half a second then doing this synchronously would take 10 seconds.

Doing the same in an asynchronous manner you create a listener or a callback to handle the response then start the first request followed immediately by the second and so on without waiting for a response (as you have already created another piece of code to handle the responses). This should only take as long as the longest response, plus a little overhead, or about 1 second in our example above.
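The difference in total time can be sketched with simulated requests. This is an illustration only: it uses the standard library asyncio module, with sleeps standing in for network latency, and the LATENCIES values and function names are made up for the demonstration.

```python
import asyncio
import time

# Hypothetical latencies in seconds, standing in for real network requests.
LATENCIES = [0.05, 0.1, 0.02, 0.08]


def fetch_sync(delay):
    """Pretend to make a blocking request that takes `delay` seconds."""
    time.sleep(delay)
    return delay


async def fetch_async(delay):
    """Pretend to make a non-blocking request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


def run_sync(latencies):
    """One request after the other: total time is roughly the sum."""
    start = time.perf_counter()
    results = [fetch_sync(d) for d in latencies]
    return results, time.perf_counter() - start


async def _gather(latencies):
    # Start every request straight away, then wait for all of them.
    return await asyncio.gather(*(fetch_async(d) for d in latencies))


def run_async(latencies):
    """All requests in flight at once: total time is roughly the longest."""
    start = time.perf_counter()
    results = asyncio.run(_gather(latencies))
    return list(results), time.perf_counter() - start


if __name__ == '__main__':
    print('sync : %.2fs' % run_sync(LATENCIES)[1])
    print('async: %.2fs' % run_async(LATENCIES)[1])
```

With these made-up latencies the synchronous run should take about the sum of the delays and the asynchronous run about the longest single delay, plus a little overhead.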

So why don’t we do everything asynchronously? Well JavaScript does, it’s one of the reasons Node became popular. However, if you have done any serious amount of coding in JavaScript you will know the added complexity that this brings; because you do not wait for a response you either have to get the callback to update you, which gets tricky once callbacks get nested, or have some sort of polling mechanism to wait for all the responses before continuing. There are solutions to these issues, promises in JavaScript being one, and such are the advantages that from Python 3.5 onwards async and await keywords were added (similar to other languages, see PEP 492).

However, if you just have a batch of HTTP based requests you want to run asynchronously, Kenneth Reitz, who wrote the excellent requests module I used in the Posting to Slack blog entry, has released grequests; basically a monkey patch for requests to use the gevent module to make asynchronous calls.

You use the same get, post, put, head and delete request functions and they return the same response object. The difference is that the asynchronous way has an extra line to set up a tuple of unsent requests before calling grequests.map (or imap) to wait until all the requests are complete, whereas the synchronous way just maps the get calls directly to the URLs. I created a little program to demonstrate this and uploaded it to BitBucket. It makes 10 get requests asynchronously first and then synchronously and displays the timings. Putting the async first should eliminate any possibility of the second batch of requests being faster and skewing the results in favour of async.

So to the problem. The asynchronous calls on my Windows machine did not end up faster; if anything the average was slower. Confused, I tested the same code on a Linux box which produced the expected results, with the async calls completing in a quarter of the time. At a guess there may be a problem either with gevent or with the greenlet module that grequests depends upon for performance. I will do some more investigation and let you know.

Calling a SOAP API

While REST (and not so RESTful) APIs have come to dominate, you will still occasionally find an API based on SOAP (especially if it uses a Microsoft back-end). For those interested in the different merits of the two technologies, there is an interesting infographic here.

While REST APIs can be handled with just the standard library, SOAP really needs a module to hide a lot of the complexities and boilerplate. The original SOAP modules (like SOAPy) no longer appear to be maintained (and hence do not work on later versions of Python). However, if you just need the client (rather than a server), a new module called Zeep has appeared which is being actively maintained and has recently reached v1 level maturity. Zeep can be installed with pip in the usual way.

An alternative to Zeep is the Suds-Jurko fork of Suds which may be a better choice if you need more options or find something Zeep does not support.

SOAP is XML based and uses a schema called Web Services Description Language (WSDL) to completely describe the operations (methods in Python) and how they are called (no dynamic bindings here). Zeep creates a client from the WSDL passed as the first parameter in the constructor (the only required parameter). All of the operations are then exposed through the client’s service class.

An example will hopefully make this clear. I am going to use one of the web services listed at WebServiceX.NET, the one for converting distance units. This has a ChangeLengthUnit operation which takes 3 parameters: the length, the unit the length is in and the unit to convert to. Putting this together gives us the following code.

import zeep
soap = zeep.Client('http://www.webservicex.net/length.asmx?wsdl')
print(soap.service.ChangeLengthUnit(1,'Inches','Millimeters'))

A few points (apart from the fact that the coders must be American due to the way they spell metres). All Microsoft active server methods (.asmx) should expose the WSDL if you append ?wsdl to the URI.

It doesn’t have to be a URI passed into the Client constructor. If the string begins with http (or https obviously) it will be treated as a URI. Otherwise it will assume it has been given a filepath to the WSDL file and try to open that. This is useful if you have to edit the WSDL file, say to fix a binding issue.

Once you have the client, you can get a sanitized version of the WSDL with the method soap.wsdl.dump() which should help you establish which operations are available and how you call them. The WSDL link above should help explain the terms but as you can see, even this simple example with just one operation has a lengthy WSDL.

Because SOAP uses standard types, zeep can convert the input and output correctly. If you look at the WSDL dump from above you will see the prefix xsd: http://www.w3.org/2001/XMLSchema listed – this schema defines all the standard data types and structures. If you check the return type of ChangeLengthUnit() you will notice it is a float.

Testing with alternative OS and browsers

Having covered Selenium testing in the previous post, the next step would be to test in multiple OSes. This is where having a WebDriver API really comes into its own; as long as there is a suitable WebDriver on the OS for the browser you want to test on then your tests should execute without change (or with minimal change to the initialization of the tests).

Instead the problem is moved to maintaining all the infrastructure. If setting up and maintaining multiple browsers and different OSes (including mobiles) seems like a lot of work, there are several companies that offer this as a service. I am going to show the changes needed to get the previous test working on BrowserStack. Once you have signed up, go to the Account > Settings page and make a note of the username and access key.

You access a WebDriver running on another server with the remote method. This method requires an endpoint to connect to (which contains the username and access key from above) and a dictionary with the OS and browser you want to do the testing on as shown below. The line commented out is the one that needs to be replaced. Once connection is established, the rest of the test remains the same.

# driver = webdriver.Firefox()
desired = {'os': 'Windows', 'os_version': 'xp',
           'browser': 'IE', 'browser_version': '7.0'}
driver = webdriver.Remote(
    command_executor='http://username:accesskey@hub.browserstack.com:80/wd/hub',
    desired_capabilities=desired)
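Rather than hardcoding the username and access key in the test, the endpoint could be assembled from values supplied at run time. The following is only a sketch: the helper names and the two environment variable names are assumptions for illustration, and the URL layout is simply the one shown above.

```python
import os
from urllib.parse import quote


def hub_url(username, access_key, host='hub.browserstack.com', port=80):
    """Build the remote WebDriver endpoint to pass as command_executor."""
    # Percent-encode the credentials in case they contain reserved characters.
    return 'http://{}:{}@{}:{}/wd/hub'.format(
        quote(username, safe=''), quote(access_key, safe=''), host, port)


def hub_url_from_env(env=os.environ):
    """Read the credentials from environment variables (hypothetical names)."""
    return hub_url(env['BROWSERSTACK_USERNAME'], env['BROWSERSTACK_ACCESS_KEY'])
```

The result of hub_url_from_env() can then be passed as the command_executor argument, which keeps the access key out of version control.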

The BrowserStack documentation has Python examples which explain the above commands and more. If you now go to http://www.browserstack.com/automate (log in if necessary) you should be able to see a log of the test you just ran and even watch a video of it, just to be sure the test is running in your specified environment.

There are alternatives to BrowserStack: Sauce Labs, Cross Browser Testing and TestingBot all offer similar functionality. TestingBot offers by far the cheapest entry into automated testing, as their personal plan includes 400 minutes of automated testing whereas all the others require you to purchase a more expensive plan before you get any automated testing minutes.

In addition to automated testing, all the above offer manual testing of the different browsers / OSes so you can troubleshoot problems that occur on other platforms.

Testing websites

Testing a static website was fairly simple. At the most basic you could just curl the pages to make sure you were getting the page expected. A more comprehensive test could be written in Python without any additional modules, although requests and a testing framework make life easier.
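As a rough idea of what such a check might look like using only the standard library, here is a minimal sketch. The function name and its parameters are my own invention for illustration; a real test suite would wrap calls like this in a testing framework.

```python
import urllib.error
import urllib.request


def page_contains(url, expected, timeout=10):
    """Return True if url responds with status 200 and the body contains expected."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or 'utf-8'
            body = resp.read().decode(charset, errors='replace')
            return resp.status == 200 and expected in body
    except urllib.error.URLError:
        # Covers HTTP errors such as 404 as well as connection failures.
        return False
```

You would call it with the address of a page and a piece of text that should appear on it, for example a heading you know is in the page.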

Dynamic websites, where a large part of the page is generated by JavaScript, open up lots more possibilities for problems. Subtle differences between browsers’ DOM, JavaScript methods and supported features mean you really have to test the website in a browser. Manual testing soon becomes laborious and time consuming. You also cannot integrate it into a continuous deployment (CD) process.

This is where Selenium comes in. This allows you to control a browser programmatically, recreating the steps a tester would do manually. The Python language bindings can be installed with pip as usual. To control a browser it uses an API known as WebDriver. To actually support the different browsers you need the appropriate WebDriver application for that browser, which can be downloaded from the following locations

Where you have 32- and 64-bit versions, you need the same one as the browser you wish to control, not your OS. So the 32-bit Internet Explorer WebDriver will control the 32-bit version of Internet Explorer. These need to be placed in a folder that is included in your path environment variable so that Python can find and run them; alternatively you can add the folder to the PATH environment variable from within Python before use (through os.environ, not sys.path, which only affects module imports). Each WebDriver has a different name so they can all go in the same folder. Although the Microsoft Edge WebDriver is an msi (the others are zip files) it does not add the installation folder to your path. I’ve also found the WebDriver for the last two, Internet Explorer and Opera, to be buggy but your mileage may vary.
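If you would rather not change the system settings, one option is to extend the PATH environment variable for the current process only. The sketch below assumes the drivers live in a folder of your choosing; the helper names are made up for illustration. Note that it is os.environ['PATH'] that controls where executables are looked up, whereas sys.path only affects Python imports.

```python
import os
import shutil


def add_to_path(folder):
    """Prepend folder to PATH for this process and any child processes it starts."""
    folder = os.path.abspath(folder)
    parts = [p for p in os.environ.get('PATH', '').split(os.pathsep) if p]
    if folder not in parts:
        os.environ['PATH'] = os.pathsep.join([folder] + parts)
    return folder


def find_driver(name):
    """Return the full path of an executable found on PATH, or None if missing."""
    return shutil.which(name)
```

Calling add_to_path with the driver folder before creating the webdriver, and then find_driver with the driver's file name, gives a quick way to confirm the executable can be found.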

Now for a quick test. To demonstrate a few features I’ve uploaded a script to BitBucket that simply checks if this blog is on the first page returned when searched for. This should always be the case but it does demonstrate starting the driver, opening a page, finding an element and inputting to it, then checking the result contains an element you were expecting.

Hopefully this should give you some idea of how to make a start. The official documentation has examples in Python of each call.

Markdown

I’ve successfully completed the 12 blog challenge I set myself (with just hours to go) even if 4 of those blogs were editorial rather than code. It has reminded me how difficult coming up with regular content is.

Another problem related to content creation is how to format it. The most universal format is of course the plain text file. This has great portability and diff tools work well to see version changes, but it is hardly pleasing on the eye. A universal format which allows formatting would naturally be HTML. HTML files retain the portability of text files but are far more difficult to write by hand and don’t work well with diff tools.

There are of course plenty of platforms (wikis and blogging for example) that give you a near WYSIWYG (or visual) editor to hide the HTML code and cleverly store the pages so as to be able to generate a diff of versions. However, this can lock you in to the platform, losing the portability.

One solution that tries to give portability and basic formatting while still working with diff tools is Markdown. Those of you like me who grew up with email in the 90s will recognise Markdown as the way text only email was formatted. Its main advantage is that it doesn’t require any markup as such (formatting is mostly contextual) but has a direct relationship to basic HTML.

With these advantages it is no wonder Markdown has found a home in version control systems which are used to dealing with text files and displaying changes between versions. If you create a readme file, sites like BitBucket and GitHub will automatically display this file along with the directory contents. Make this a readme.md (md is the common extension for a Markdown document) and this will be formatted correctly.

There is a Python Markdown module which takes Markdown text and converts it to HTML. It also supports extensions to add extra functionality. With this module installed (pip install markdown) conversion is just a method call away.

import markdown
md = markdown.Markdown()
print(md.convert("""# Hello
Welcome to a Markdown world!"""))

As a more useful example I have used this with the bottle web micro framework to create a program that allows you to view all the Markdown documents in a folder through a web browser. If you go to the root it will list all the Markdown documents and you view one by clicking on it. Simply run the Python program from the directory you wish to view.

Decrypting AWS Windows passwords

With Linux instances, the public key of the key pair you specify when creating the instance is placed in the .ssh/authorized_keys file. When you SSH in it encrypts the initial communication details with your public key so that only someone with the corresponding private key can decrypt the details and complete the connection.

Windows instances do not work in the same way. Instead, when the instance is created a random password is generated. This password is then encrypted with the public key. You can request this encrypted password but you then need the private key to decrypt it. This can be done through the AWS console but if you are going to use boto to automate AWS then you really want a Python solution.

I have seen a couple of solutions using the PyCrypto module but I wanted a pure Python solution. Luckily there is an rsa module (pip install rsa) which is written in pure Python. With that and the boto module you can decrypt the password with the following code.

import rsa, boto, base64
instance_id = 'i-0123456789abcdef'
key_path = r'C:\path\to\private.pem'

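ec2 = boto.connect_ec2()  # optionally pass access_key, secret_key
# the password data comes back encrypted and base64 encoded
passwd = base64.b64decode(ec2.get_password_data(instance_id))
if passwd:
    # load the private key and use it to decrypt the password
    with open(key_path, 'r') as privkeyfile:
        priv = rsa.PrivateKey.load_pkcs1(privkeyfile.read())
    key = rsa.decrypt(passwd, priv)
else:
    # an empty result means the password has not been generated yet
    key = 'Wait at least 4 minutes after creation before the admin password is available'

print(key)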

You just need to specify the instance_id and the private key file location (2nd and 3rd lines). The connect_ec2 method will use the credentials in the .aws file in your home directory if it exists. Alternatively you can specify the access key and secret key as parameters to the method. Assuming you haven’t jumped the gun this should print out the admin password.

Email configuration

Sending an email when a system breaks, a warning level is reached or simply when a job completes is standard stuff. The smtplib module handles the sending and the email module makes building even a multipart MIME email straightforward. If you have not used these libraries to send an email before then there are lots of other articles on the Internet with examples; see Matt’s blog post for a quick overview on sending plain text and then text + HTML multipart emails.
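For reference, building a plain text or text + HTML message with the standard library can be sketched as follows. This is not the code from the post mentioned above; the function name and the example addresses are placeholders of my own.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, text, html=None):
    """Build a plain text email, or a text + HTML multipart one if html is given."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    msg.set_content(text)  # the plain text part
    if html is not None:
        # Adding an alternative turns the message into multipart/alternative.
        msg.add_alternative(html, subtype='html')
    return msg


if __name__ == '__main__':
    # Placeholder addresses, for illustration only.
    print(build_message('jobs@example.com', 'me@example.com',
                        'Job complete', 'The nightly job finished.',
                        '<p>The nightly job <b>finished</b>.</p>'))
```

The resulting object can be handed to the send_message method of an smtplib.SMTP connection once you have connected with the settings for your gateway.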

What can be surprising is the slight variations in what different SMTP gateways will require in order to work. The following table gives you the different settings needed to get the most common gateways working.

Service       Server                               Port          SSL (TLS)  Username
Exchange      your dns entry                       25            no         n/a *
Exchange      your dns entry                       587           yes        Windows username w/ domain
Office 365    smtp.office365.com                   587           yes        email address
Outlook.com   smtp.live.com                        587           yes        email address
Gmail         smtp.gmail.com                       587           yes        email address !
Yahoo mail    smtp.mail.yahoo.com                  587           yes        email address !
AWS SES       email-smtp.us-east-1.amazonaws.com   587 or 2587   yes        Access key +
              email-smtp.us-west-2.amazonaws.com
              email-smtp.eu-west-1.amazonaws.com

Notes:
* Exchange will use the credentials of the user running the Python command to determine what rights they have to send email. You do not need to log in when using this method.
! You will have to allow access to less secure apps. See here for Gmail and here for YahooMail.
+ Password is the secret access key. Verify email address before sending with this guide.

For a local Exchange server where you control the network (top option in the table above) you can use the code from Matt’s blog post above without change. However, where the server is remote or you want to specify login credentials for Exchange you should look to using the Extended SMTP commands. This can be done by replacing the sendmail code with the following.

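s = smtplib.SMTP(Server, Port)
s.ehlo()  # Extended Hello; also records the extensions the gateway supports
if s.has_extn('STARTTLS'):  # only log in over an encrypted connection
    s.starttls()
    s.login(Username, Password)
    s.sendmail(...)
s.quit()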

As a final note, I’ve included the ehlo command (Extended Hello) in the example above although all of the gateways listed will work without this. This is best practice as it informs the gateway you want to use extended commands and will also return a list of commands accepted. I’ve used this to check the gateway supports STARTTLS (I don’t want my username and password sent unencrypted to the server).

WinError 10048

While testing a simple tornado app I managed to get a WinError 10048. The rest of the error message made no sense which is why I’ve made a note of it here.

Basically it was caused by the port I had tried binding to already being in use. In my case a previous run had failed to exit. When I tried to run the program again I got this message. A simple mistake and easy to fix.

If you do need to check what applications are listening on ports, use the netstat command as follows

netstat -ab
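The same check can be made from Python before starting a server. This is a small sketch using only the standard library; the function name is hypothetical, and it simply tries to bind to the port and reports whether the operating system says the address is already in use (the condition Windows reports as error 10048).

```python
import errno
import socket


def port_in_use(port, host='127.0.0.1'):
    """Return True if binding to (host, port) fails because it is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other problem, such as a permissions error
        return False
```

Calling port_in_use with the port your app wants, before it tries to listen, lets you print a friendlier message than the raw error.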

Your own web Python IDE

A few years back, when I first got a Chromebook, I was wondering about editing Python in the cloud. At the time web based IDEs were only just starting to go mainstream; they have flourished since. How much better would it be if you could set up your own server? Writing a code editor in JavaScript is a huge undertaking but there are tools out there to do this and once you have the code the rest should be straightforward.

I wanted to try out the Ace code editor for a few reasons: it is mature, supports Python (along with every other language I’m bothered about), is customizable and extensible, is licensed under a BSD licence, has an API and seemed easy to set up. So I set about creating a test rig to investigate the code editor with. Just getting the code editor to display without an absolute window position proved tricky (it’s always the little things that take the time) but eventually I discovered the div tag needed a height property to work the way I wanted it to.

With the editor now displaying, I really wanted to get my rig to run the code and display the result so I knew everything was working. This prompted the executing code post a couple of days ago (now you know the project I was hinting at). With a working execute function that returned the output as a string I just needed to get the text out of Ace using the method provided and back to the server.
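The execute function itself is in the earlier post and is not repeated here. Purely as a stand-in, a function with the same shape (code in, output string out) could be sketched like this; it runs the code in a separate Python process, which is my assumption rather than necessarily how the original works, and it offers no protection against malicious code.

```python
import subprocess
import sys


def execute(code, timeout=10):
    """Run code in a separate Python process and return its output as a string."""
    result = subprocess.run(
        [sys.executable, '-c', code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the same output
        universal_newlines=True,   # return text rather than bytes
        timeout=timeout,
    )
    return result.stdout
```

Because errors are folded into the output, a traceback from the submitted code comes back as text and can be shown in the results area just like normal output.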

I originally tried to do this without using any other JavaScript libraries but soon realised this was adding a lot of unnecessary and distracting script to the web page so went back to using the ever useful jQuery library to deal with the post and the updating of the results div. Surprisingly it worked on the first go.

The code (in BitBucket due to the length) uses all of the executing arbitrary code lines from the previous post but with the test now replaced with a bottle web server and a 35 line single page web app. I mixed CSS and JavaScript in with the HTML to simplify the server side code, not because I recommend this or would do this in reality.

Running the code should give you an editor similar to the one below.

pyedit