Core language

Language features and explanations

Dates and time

Python has a reasonably good standard library module for handling dates and times, but it can be a little confusing to a beginner, probably because the first code they encounter will look something like the below with very little explanation.

import datetime
print("Running on %s" % (datetime.date.today()))
myDate = datetime.datetime(2018,6,18,16,13,0)

Why is it datetime.datetime? There is a simple explanation, but one I’ve rarely seen included.

All of Python’s classes for handling dates and times live in the module called datetime (naturally enough). This module contains a class for dates with no time element (datetime.date), a class for times (datetime.time) and a class for when you need both, called unsurprisingly (but a little unfortunately) datetime.datetime, hence the code above.

It also contains two more classes: datetime.timedelta, which is the interval between two dates / datetimes (the result of subtracting one datetime from another), and tzinfo, short for time zone info, which is used to handle time zones in the time and datetime classes.

To add to the confusion, if you want to get the date / time / datetime as of now, there is no standard across the three classes: datetime uses the now() method, date uses the today() method and time does not have one at all! You have to use datetime and extract the time part, as below.

import datetime
# Get the date and time as of now as a datetime
print(datetime.datetime.now())
# Get the date as of now (today)
print(datetime.date.today())
# Get the time as of now - have to use datetime!
print(datetime.datetime.now().time())

The confusion does not end there. If you want to format the date / time / datetime in a particular way you can use the strftime() method – probably short for string format time. The same method exists in all three classes. Why it is called time and not date or something more generic is beyond me; datetime.date.strftime() makes little sense.

If you are reading in strings and need them parsed into a date / time / datetime there is the strptime() method – probably short for string parse time – but this only exists in the datetime class. So you have to use a similar trick to the one above: create a datetime and extract just the date or time part.
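For example, parsing a date-only string still goes through datetime.datetime (the date string here is just an example):

```python
import datetime

# strptime only exists on datetime.datetime, so parse to a
# datetime first and then extract the date part
d = datetime.datetime.strptime("2018-06-18", "%Y-%m-%d").date()
print(d)                       # → 2018-06-18

# strftime, on the other hand, exists on all three classes
print(d.strftime("%d %B %Y"))  # → 18 June 2018
```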

Once you get past the quirks above, you should find the datetime module straightforward to use. However, if you do find yourself needing a library with more power, try the dateutil library. It can be installed with the usual pip install python-dateutil command.


pip + virtualenv = pipenv

I have long argued that one of the reasons Node took off so quickly was the inclusion of npm for package management. For all its faults, it allows anyone to quickly get up and working with a project and to build powerful applications by utilising other libraries. What’s more, by being local first, it avoids some of the dependency problems caused by different applications requiring different versions of the same library (at the expense of disk space and a little RAM).

Python did not initially have a package manager, but pip has evolved into the de facto standard and is now included with the installer. By default all packages are installed globally on the machine; this makes sense given Python’s history but is not ideal. To have local packages just for your app you needed virtualenv or a similar tool.

The obvious next step to close the gap with npm would be a single tool that sets up a local environment and installs the modules into it. That is exactly what pipenv does. It was created by Kenneth Reitz (the author of the requests module, which I’ve used in several posts) and has quickly gained popularity in the last year.

Lacey has done a good write-up of the history that led to pipenv on this blog post and there is a full guide available here, but it is just as simple to show you with an example. First install pipenv with pip install pipenv

Then you can create a project, complete with a virtual environment, and install the requests module with the following

mkdir pipenvproject
cd pipenvproject
pipenv install requests

That’s it (although personally I would have liked to see a pipenv init command). To prove there is a virtual environment, use the shell option to switch into it (no more remembering the path to the activation script). Try the following

pipenv shell
pip list
exit
pip list

The first pip list should just show requests and its dependencies. After exiting out of the virtual environment shell, the second pip list will list all of the packages installed on your system.
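Behind the scenes, pipenv records the top-level dependency in a Pipfile in the project directory (with exact versions pinned in Pipfile.lock). For the example above the generated Pipfile will look roughly like this – the python_version value depends on your interpreter:

```toml
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
requests = "*"

[dev-packages]

[requires]
python_version = "3.6"
```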

Mr Popularity

David Robinson has done some good analysis of Stack Overflow searches on the popularity of languages. This shows that Python is on track to be the most searched-for programming language. He followed this up with further analysis showing the increase appears to be coming from data science and machine learning. This follows on from IEEE Spectrum putting Python as the most popular programming language for 2017 among developers.

Apart from giving me an excuse to put lots of links in the first paragraph, what does this show? Probably little more than that Python is flexible, which we knew already (it’s why we’ve been using it). You can learn to program with it, produce a web service with it, do data analysis with it, as well as automate your system administration jobs (which this blog mostly deals with).

Rather than starting a flame war over which language is the most popular or useful, a better takeaway is that Python really is a first-class language. It is not just an alternative to Perl; you can use it as your go-to language for everything and only change if a reason to do so appears.

GIL: Who, what, why?

For most posts I concentrate on using Python to solve tasks (mostly system administration based). Apart from fringe cases these are not multi-threaded, so I can safely ignore Python’s Global Interpreter Lock (normally shortened to GIL). Even when running a web server, it is usually left up to the web framework to handle any multi-threading, so again the GIL is safely ignored.

What is the GIL? Firstly, this applies to CPython, the implementation of Python you are most likely to be running; other implementations like Jython, PyPy or IronPython do things differently. The GIL is simply a mechanism to marshal access to interpreter internals (variables mostly) from different threads. It makes coding with multiple threads straightforward, but it means threads in CPython are generally only useful for working around blocking I/O.
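A quick way to see this is to time a CPU-bound function run twice sequentially and then in two threads. This is just a sketch – the exact numbers will vary by machine – but on a multi-core box the threaded version is typically no faster, because the GIL only lets one thread execute Python bytecode at a time:

```python
import threading
import time

def count(n):
    # CPU-bound busy loop; the GIL stops two of these from
    # executing Python bytecode simultaneously on different cores
    while n > 0:
        n -= 1

N = 2_000_000

# run twice, one after the other
start = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - start

# run the same work in two threads
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print("sequential: %.2fs, threaded: %.2fs" % (sequential, threaded))
```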

For a brief but more technical explanation, Vinay Sajip posted a good single-paragraph description of the GIL:

“Python’s GIL is intended to serialize access to interpreter internals from different threads. On multi-core systems, it means that multiple threads can’t effectively make use of multiple cores. (If the GIL didn’t lead to this problem, most people wouldn’t care about the GIL – it’s only being raised as an issue because of the increasing prevalence of multi-core systems.) If you want to understand it in detail, you can view this video or look at this set of slides.”

Why my interest? I have just finished reading an article by A. Jesse Jiryu Davis which goes into far more detail about the GIL. If you are planning a C extension, looking at multi-threading some code which shares data, or are just curious, try his Grok the GIL article as a starting point.

Python SQL Server driver on Linux

So you have packaged your SQL monitoring and maintenance routines into a web server and demonstrated it all works from your computer. Impressed, they ask for it to be put on a proper server – a Linux box. Five years ago this would have involved using unsupported third-party drivers – and who ran internal Linux servers anyway? Now the request seems almost reasonable, although you will have to jump through more hoops than you would with Windows.

First off, I’ll assume you are using the pyodbc module. On Linux this will require a C compiler. If you have chosen a minimal install then you’ll need to install the build tools and headers first. This can be done with the following commands (depending upon the flavour)

Red Hat (CentOS/Fedora)
sudo yum groupinstall 'Development Tools' -y
sudo yum install python-devel unixODBC-devel -y

Debian (Ubuntu)
sudo apt-get install build-essential -y
sudo apt-get install python-dev unixodbc-dev -y

With this done you can now pip install pyodbc. The pyodbc module is a wrapper around the native system drivers, so you will need to install a suitable unixODBC driver. Microsoft have produced an official unixODBC driver since 2012 and it has been regularly maintained since. Installation instructions for v13 can be found on this blog post.

With pyodbc and unixODBC set up, all you need to change in your actual code is the driver in the ODBC connection string to ‘ODBC Driver 13 for SQL Server’ and away you go. As a quick test, the following example will establish a connection and return the server name through a SQL query.

import pyodbc
cnxnstr = "Driver={ODBC Driver 13 for SQL Server};Server=<yourserver>;Uid=<yourusername>;Pwd=<yourpassword>;Database=<yourdatabase>"
cnxn = pyodbc.connect(cnxnstr)
cursor = cnxn.cursor()
cursor.execute("SELECT @@SERVERNAME")
result = cursor.fetchall()
for row in result:
    print(row)
cursor.close()
cnxn.close()
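If you would rather not hard-code the connection string, a small helper can assemble it from a dict so the values can come from configuration. This is a hypothetical sketch – the server name and credentials below are placeholders:

```python
# hypothetical helper: build the ODBC connection string from a dict;
# every value here is a placeholder, not a real server or credential
parts = {
    "Driver": "{ODBC Driver 13 for SQL Server}",
    "Server": "myserver",
    "Uid": "myuser",
    "Pwd": "mypassword",
    "Database": "mydb",
}
# join each key=value pair with semicolons, as ODBC expects
cnxnstr = ";".join("%s=%s" % (k, v) for k, v in parts.items())
print(cnxnstr)
```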

Virtual environments in Visual Studio

A virtual environment in Python is a folder containing everything needed to set up local configuration isolated from the rest of the system. This allows you to have modules installed locally which are different from, or do not exist in, the global Python configuration. If you have used Node.js then you can think of virtual environments as npm’s default way of working – creating a local install of a package rather than a global one (pip’s default).

If you have multiple versions of Python installed on your machine then you can also specify which version of Python the virtual environment should use. This gives you the ability to test your code against multiple versions of Python just by creating multiple virtual environments.

There are already plenty of good posts out there on virtual environments, so the aim of this blog post is not to rehash why you should use them (see here for a good introductory post) or to be a quick setup guide (see the Hitchhiker’s Guide to Python post). It is a quick guide to using virtual environments within Visual Studio. If you have not used virtual environments before it is worth giving those posts a quick read before continuing.

As an aside, Python 3.3 introduced the venv module as an alternative for creating lightweight virtual environments (although the original wrapper, pyvenv, has already been deprecated as of Python 3.6). While this is the correct way going forward, Visual Studio uses the older virtualenv method, which is what I am concentrating on here.
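For comparison, creating an environment with the stdlib venv module only takes a couple of lines. A minimal sketch – the target folder name is arbitrary, and with_pip=False just skips bootstrapping pip to keep it quick:

```python
import os
import tempfile
import venv

# create a throwaway virtual environment in a temporary folder
target = os.path.join(tempfile.mkdtemp(), "env")
venv.EnvBuilder(with_pip=False).create(target)

# the environment gets a Scripts folder on Windows, bin elsewhere
bindir = "Scripts" if os.name == "nt" else "bin"
print(os.path.isdir(os.path.join(target, bindir)))  # → True
```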

Once you have created your Python solution, expand it until you get to Python Environments. Right-click on this and choose Add Virtual Environment… from the menu as shown below.


You can change the name of the folder (it defaults to env), which is also used as the name of the virtual environment, and the version of Python to use. Click Create to finish and you are ready to go (easy, wasn’t it?). If you expand the Python Environments node you should see the virtual environment appear.

In the background this has created a folder (the virtual environment) in your working directory with the name given. In case you are unsure, your working directory is the location of the solution, which defaults to X:\Users\me\Documents\VS20xx\Projects\Project Name\Solution Name\ (tip: change the default location). This could have been done manually by changing into the working directory and entering the following command, where X:\Python_xx is the installation directory for the version of Python you want to use and env is the name of the folder / virtual environment – if you just want your default version of Python then just pass the name of the folder.

virtualenv -p X:\Python_xx\python.exe env

To install a module into the virtual environment from Visual Studio just right-click on the virtual environment and select Install Python Package… from the menu or if you have a requirements.txt file you can select Install from requirements.txt. If you expand the virtual environment node you will see the modules installed. Once you have all the modules installed you can generate the requirements.txt file from the same menu and it will add the requirements.txt to your project for portability.

What if you want to use this virtual environment from the command line? Inside the virtual environment is a Scripts directory with a script to make the necessary changes; the trick is to run the correct script from the working directory. The script to run depends upon whether you are inside a PowerShell console (my recommendation) or a command prompt. Change into the working directory and type the following command (where env is the virtual environment folder)

PowerShell: .\env\Scripts\activate.ps1
Command prompt: env\Scripts\activate.bat

The prompt will change to show the name of the virtual environment, confirming activation has succeeded. You can do everything you would normally do from the command line, but now you are running against the virtual environment. To confirm that the only modules installed are those you have specified, type ‘pip list’; to confirm the version of Python is the one you specified, type ‘python -V’.

Update: It appears I’m not the only one to be looking at virtual environments today, see this article if you want a similar introduction but from the command prompt only.

Python 3.6

Almost like a Christmas present, Python 3.6 has been released, just fifteen months after 3.5 (compared to an 18-month average for the 3.x branch). You can see the official what’s new page here, or if dry lists of features are not your thing, try this summary of the improvements. If nothing else the speed improvements might end one of the arguments for staying on v2.

What will be interesting to see is the uptake of the asynchronous additions from Python 3.5, which have been further improved in 3.6. Node.js has shown just how efficient asynchronous programming can be, and hopefully async / await can make it just as accessible in Python. If you’ve not seen these new keywords before, see this blog post for a decent introduction.
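As a taster, here is a minimal sketch of two coroutines running concurrently. One caveat: asyncio.run was only added in 3.7; on 3.6 you would use asyncio.get_event_loop().run_until_complete instead.

```python
import asyncio

async def greet(name, delay):
    # await suspends this coroutine without blocking the others
    await asyncio.sleep(delay)
    return "Hello, %s" % name

async def main():
    # gather runs both coroutines concurrently and preserves order
    return await asyncio.gather(greet("async", 0.1), greet("await", 0.05))

results = asyncio.run(main())
print(results)  # → ['Hello, async', 'Hello, await']
```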

I’m hoping to investigate asynchronous programming in the New Year so there should be a post on here in the near future.

Pip requirements

You should be used to installing new modules using pip, and you have probably used a requirements.txt file to install multiple modules together with the command

pip install -r requirements.txt

But what if you need more flexibility? Why would you ever need more flexibility? If you look at my introduction to YAML post, the code supports either the yaml or ruamel.yaml module. There is no way to add conditional logic to a requirements.txt file, so a different strategy is needed.

pip is just a module, so it can be imported like any other, and older versions exposed a main method which took an argument list just as if you were calling pip from the command line. However, pip.main was removed in pip 10 and pip’s internals are private, so the supported approach is to inspect installed packages with the pkg_resources module – its working_set is a collection of the installed modules (or active distributions as the documentation calls them) – and to invoke pip through a subprocess. Using this we can create the conditional logic needed to ensure one of the yaml modules is installed, as below (note that the yaml module is distributed under the name PyYAML, which is what pip sees).

import pkg_resources
import subprocess, sys

# the yaml module's distribution is named PyYAML
package_names = [ws.project_name for ws in pkg_resources.working_set]
if ('PyYAML' not in package_names) and ('ruamel.yaml' not in package_names):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'ruamel.yaml'])

The entries in the working set expose a few other useful properties and methods apart from project_name. The location property returns the path to where the module is installed and the version property naturally returns the version installed. The requires() method returns a list of dependencies.
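On Python 3.8+ the stdlib importlib.metadata module offers the same information without depending on setuptools or pip internals. A quick sketch, querying pip itself on the assumption it is installed:

```python
from importlib import metadata

# look up an installed distribution's version without importing it
print(metadata.version("pip"))

# list every installed distribution name - the equivalent of
# iterating over the working set
names = sorted(dist.metadata["Name"] for dist in metadata.distributions()
               if dist.metadata["Name"])
print(len(names), "distributions installed")
```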

As with most modules, if you’re interested in finding out more dig around in the source code.

Yielding files

Time for a challenge, so I’m going to try 12 blogs of Christmas. The aim is to write 12 blog entries in December (or at least by 5th January, which is the 12th day of Christmas) – that is one blog entry every 3 days. It’s a catchy title for a challenge (always helps – think Movember) which could be used for any challenge; I’ve twisted my ankle so I doubt I’ll be running, although the 12 runs of Christmas does sound nice. Yes, it is the 4th already, so not a good start.

After the last post I’ve been thinking of other examples where a generator would be useful that are more in keeping with the theme of this blog (sys administration with Python, in case you’ve forgotten). Iterating through system calls or an API would be a good candidate, but I’ve not been using anything recently that fitted the bill. Another case that sprang to mind was file searching.

A reasonable way to do this would be to build a list, but why use the memory to create the list when the caller is unlikely to need one, and they can use a list comprehension to create a list anyway? So this should make a good generator example.

Some of the work is done already by os.walk; this will iterate through each directory, giving you a list of files and folders. Normally when you’re looking for files you would specify a wildcard pattern, so I’m going to use regular expressions and return any file that matches using yield. I’ve covered regular expressions a few times before so I’ll skip any explanation and just present the code, which takes a directory and a file pattern and returns all the matching files.

import os, re

def filesearch(root, pattern, exact=True):
    searchre = re.compile(pattern)
    for parent, dirs, files in os.walk(root):
        for filename in files:
            if exact:
                res = searchre.match(filename)
            else:
                res = searchre.search(filename)
            if res:
                yield os.path.join(parent, filename)

for filename in filesearch(r"C:\Temp",r".*\.exe"):
    print("%s has size %d" % (filename,os.path.getsize(filename)))

The only thing to note is that I added a third parameter so you can do a match (the regular expression must match from the start of the filename) or a search (the regular expression only needs to match part of the filename). This defaults to True, which is a match.

The example should find any executables in the C:\Temp folder. Regular expressions are very powerful but not quite as simple as using *.exe: the asterisk becomes .* (match any character zero or more times) and the dot has to be escaped as it is a special character. I’ve just printed the filename and size, but you could equally delete the file if it was bigger than a certain size, and so on.
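If you would rather let callers keep writing shell-style wildcards, the stdlib fnmatch module can translate them into regular expressions for you – a small sketch:

```python
import fnmatch
import re

# fnmatch.translate turns a shell wildcard into an equivalent regex,
# so "*.exe" no longer needs to be written as ".*\.exe" by hand
pattern = fnmatch.translate("*.exe")
print(pattern)
assert re.match(pattern, "setup.exe")
assert not re.match(pattern, "notes.txt")
```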

And that’s my first post of 12 blogs of Christmas. Let’s see if I can get all 12 done in time.



Generators and yield

A source of confusion for a lot of people new to Python, and for anyone who has not used them for a while, is the yield keyword. As this must be the third time I’ve had to relearn generators, I thought I’d make a few notes.

My way of visualizing a generator is as a function that returns (or should that be generates?) an iterator. Having the yield keyword in a function is enough to turn it into a generator. Once you have called the generator to get the iterator, you can use it as you would any other iterator. Jeff Knupp has a much fuller explanation on his blog, so give it a read and then return.

For an example I created a Fibonacci number generator with the following code along with a few examples using an iterator.

def fibonacci(a=0, b=1, maxiter=-1):
    while True:
        yield a
        a, b = b, a + b
        if maxiter > 0:
            maxiter -= 1
            if maxiter < 1:
                return

print([f for f in fibonacci(maxiter=10)])
for f in fibonacci(3,5,20):
    print(f)

First, a few notes on the generator itself. You can specify the starting numbers a and b (naturally defaulting to 0 and 1) and a maximum number of Fibonacci numbers (or iterations to perform) with maxiter when you call the generator. Without setting maxiter the iterator will continue indefinitely, so it could not be used for the list comprehension (the first example) and the for loop would be infinite.

The use of return in a generator ends the iteration; under the original PEP 255 it was described as equivalent to raising a StopIteration exception. (Note that since PEP 479, which became the default behaviour in Python 3.7, explicitly raising StopIteration inside a generator is converted to a RuntimeError, so stick with return.)

Lastly, if you are wondering about the line a,b = b,a+b, it is just a compact (and I think elegant) way of writing:

temp = a + b
a = b
b = temp

Behind the scenes the loop calls next() to get each value from the iterator, and there is nothing to stop you calling next() manually, as shown below. Also, the generator creates a new iterator each time it is called; each iterator encapsulates its own values for a, b and maxiter, as shown below.

i = fibonacci()
j = fibonacci(13,21)
print("variable i is %s\nvariable j is %s" % (i,j))
print("First 3 from i: %d , %d , %d" % (next(i),next(i),next(i)))
print("First 3 from j: %d , %d , %d" % (next(j),next(j),next(j)))

Hopefully Jeff’s explanation and my example above go some way to explaining how generators work.