Yielding files

Time for a challenge, so I’m going to try 12 blogs of Christmas. The aim is to write 12 blog entries in December (or at least by 5th January which is the 12th day of Christmas). That is one blog entry every 3 days. It’s a catchy title for a challenge (always helps, think Movember) which could be used for any challenge and I’ve twisted my ankle so I doubt I’ll be running; the 12 runs of Christmas does sound nice anyway. Yes it is the 4th already so not a good start.

After the last post I’ve been thinking of other examples of where a generator would be useful that was more in keeping with the theme of this blog (sys administration with Python in case you’ve forgotten). Iterating through system calls or an API would be a good candidate but I’ve not been using anything recently that fitted the bill. Another case that sprang to mind was file searching.

A reasonable way to do this would be to create a list but why use the memory to create the list if the caller is unlikely to need a list and they can use list comprehension to create a list anyway. So this should make a good generator example.

Some of the work is done already by os.walk; this will iterate through each directory giving you a list of files and folders. Normally when you looking for files you would specify a wildcard pattern so I’m going to use regular expressions and return any file that matches using yield. I’ve covered regular expressions a few times before so I’ll skip any explanation and just present the code which takes a directory and a file pattern and returns all the matching files.

import os, re</pre>
<pre>def filesearch (root, pattern, exact=True):
    searchre = re.compile(pattern)
    for parent, dirs, files in os.walk(root):
        for filename in files:
            if exact:
                res = searchre.match(filename)
                res = searchre.search(filename)
            if res:
                yield os.path.join(parent,filename)

for filename in filesearch(r"C:\Temp",r".*\.exe"):
    print("%s has size %d" % (filename,os.path.getsize(filename)))

The only thing to note is I added a third option so you can do a match (the regular expression must match the whole filename) or a search (the regular expression only needs to match part of the filename). This defaults to true which is an exact match.

The example should find any executables in the C:\temp folder. Regular expressions are very powerful but not quite as simple using *.exe. Instead the asterisk becomes .* (match any character 0 or more times) and the dot has to be escaped as it is a special character. I’ve just printed the filename and size out but you could equally delete the file if it was bigger than a certain size etc.

And that’s my first post of 12 blogs of Christmas. Lets see if I can get all 12 done in time.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s