directory

Yielding files

Time for a challenge, so I’m going to try 12 blogs of Christmas. The aim is to write 12 blog entries in December (or at least by 5th January which is the 12th day of Christmas). That is one blog entry every 3 days. It’s a catchy title for a challenge (always helps, think Movember) which could be used for any challenge and I’ve twisted my ankle so I doubt I’ll be running; the 12 runs of Christmas does sound nice anyway. Yes it is the 4th already so not a good start.

After the last post I’ve been thinking of other examples where a generator would be useful and that are more in keeping with the theme of this blog (sys administration with Python in case you’ve forgotten). Iterating through system calls or an API would be a good candidate, but I’ve not been using anything recently that fitted the bill. Another case that sprang to mind was file searching.

A reasonable way to do this would be to build and return a list, but why use the memory for that if the caller is unlikely to need a list? If they do want one, they can pass the generator to list() or use a list comprehension. So this should make a good generator example.

Some of the work is done already by os.walk; this will iterate through each directory giving you a list of files and folders. Normally when you are looking for files you would specify a wildcard pattern, so I’m going to use regular expressions and return any file that matches using yield. I’ve covered regular expressions a few times before so I’ll skip any explanation and just present the code, which takes a directory and a file pattern and yields all the matching files.

import os, re

def filesearch(root, pattern, exact=True):
    # compile once, reuse for every filename
    searchre = re.compile(pattern)
    for parent, dirs, files in os.walk(root):
        for filename in files:
            if exact:
                # match: anchored at the start of the filename
                res = searchre.match(filename)
            else:
                # search: may match anywhere in the filename
                res = searchre.search(filename)
            if res:
                yield os.path.join(parent, filename)

for filename in filesearch(r"C:\Temp",r".*\.exe"):
    print("%s has size %d" % (filename,os.path.getsize(filename)))

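The only thing to note is that I added a third option so you can do a match (the regular expression must match from the start of the filename) or a search (the regular expression can match anywhere in the filename). This defaults to True, which is a match. Be aware that match does not check the end of the name, so finish the pattern with $ if the whole filename has to match.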
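A quick sketch of how the anchoring styles behave on a few made-up filenames. re.fullmatch is included for comparison and needs Python 3.4 or later; it is not used by filesearch itself.

```python
import re

pattern = re.compile(r"set.*\.exe")
# hypothetical filenames, purely for illustration
names = ["setup.exe", "setup.exe.bak", "my_setup.exe"]

# match: the pattern must match from the start of the name
matched = [n for n in names if pattern.match(n)]
# search: the pattern may match anywhere in the name
searched = [n for n in names if pattern.search(n)]
# fullmatch: the pattern must account for the entire name
full = [n for n in names if pattern.fullmatch(n)]

print(matched)
print(searched)
print(full)
```

Only fullmatch rejects setup.exe.bak, because match and search are both satisfied once the pattern has been found and do not care what follows it.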

The example should find any executables in the C:\Temp folder. Regular expressions are very powerful but not quite as simple as using *.exe. Instead the asterisk becomes .* (match any character 0 or more times) and the dot has to be escaped as it is a special character. I’ve just printed the filename and size out but you could equally delete the file if it was bigger than a certain size etc.
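If you would rather keep typing *.exe, the standard library function fnmatch.translate turns a shell-style wildcard into a regular expression string. This is a side note rather than part of the function above, and the filenames are made up.

```python
import fnmatch
import re

# translate() converts a shell-style wildcard into a regular expression string
regex = fnmatch.translate("*.exe")
wildcard_re = re.compile(regex)

print(regex)  # the exact text varies between Python versions
print(bool(wildcard_re.match("setup.exe")))
print(bool(wildcard_re.match("setup.exe.bak")))
```

The translated expression is anchored at the end, so passing it as the pattern with the default match behaviour should only return names that fit the whole wildcard.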

And that’s my first post of 12 blogs of Christmas. Let’s see if I can get all 12 done in time.


		

Processing file names

I seem to have spent a lot of time working with files this month. My task today was to summarise the output from an inventory tool. This tool (well, a VBScript) had created a lot of text files with the computer name and a date serial in the name. What I wanted was this information in a CSV file to compare to our asset list.

There are lots of ways of doing this, but as the computer name is variable length while the rest of the file name follows a known format, I used a regular expression. Regular expressions can get complicated but in this case I’m just looking for any run of letters, numbers, underscores or dashes followed by .example.com (for my computer name), then an underscore followed by a 12 digit datetime serial. The rest of the name is irrelevant.

I am also using grouping by enclosing the computer name and datetime serial in parentheses. This allows me to return the matched details using group().

For the code below to work you would need to define a function formatdate to turn the datetime serial into something more readable. I’ve left this out to improve the clarity of the example.

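import os, re, csv
# dots are escaped so they only match a literal "."
prog = re.compile(r"([a-zA-Z0-9_\-]+)\.example\.com_([0-9]{12})")

def getinfo(instr):
    res = prog.match(instr)
    if res:
        # group(1) is the computer name, group(2) the datetime serial
        return (res.group(1), formatdate(res.group(2)))
    # no match: falls through and returns None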

with open('names.csv','w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(['Computer','Date']) # header
    for file in os.listdir(r'\\server\path\to\inventories'):
        fileinfo = getinfo(file)
        if fileinfo:
            csvwriter.writerow(fileinfo)
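The post does not say how the 12 digit serial is laid out, so the following formatdate is only a sketch under the assumption that it reads YYYYMMDDHHMM. Adjust the format string if your tool writes the digits in a different order.

```python
from datetime import datetime

def formatdate(serial):
    # ASSUMPTION: the serial is laid out as YYYYMMDDHHMM; the real layout
    # used by the inventory tool is not given in the post
    stamp = datetime.strptime(serial, "%Y%m%d%H%M")
    return stamp.strftime("%Y-%m-%d %H:%M")

print(formatdate("201112041530"))
```

A serial that does not fit the assumed layout makes strptime raise ValueError, which is a reasonable early warning that the assumption is wrong.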

The natural progression would be to use a list comprehension to create a list and then write this all out in one go using writerows instead. However, getting this to handle cases where the match failed resulted in ugly code. If I work out a way around this I’ll include it.
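One possible way around the failed matches, as a sketch: run every name through getinfo and drop the None results inside the comprehension, then hand the list to writerows. To stay self-contained the example uses made-up file names, writes to an in-memory buffer instead of a file, and returns the raw serial rather than calling formatdate.

```python
import csv
import io
import re

prog = re.compile(r"([a-zA-Z0-9_\-]+)\.example\.com_([0-9]{12})")

def getinfo(instr):
    res = prog.match(instr)
    if res:
        # raw serial returned here; the post passes it through formatdate
        return (res.group(1), res.group(2))

# hypothetical file names standing in for the os.listdir() output
names = [
    "pc-001.example.com_201112041530_inventory.txt",
    "notes.txt",
    "lab_7.example.com_201112050915_inventory.txt",
]

# getinfo returns None for a failed match, so filter those out
rows = [info for info in map(getinfo, names) if info]

buffer = io.StringIO()  # stands in for the open CSV file
csvwriter = csv.writer(buffer)
csvwriter.writerow(["Computer", "Date"])  # header
csvwriter.writerows(rows)
print(buffer.getvalue())
```

Swapping the square brackets for round ones turns rows into a generator expression, which writerows accepts just as happily and which avoids building the list at all.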

If you needed to include sub-directories as well then you could use os.walk instead of os.listdir, then loop over the files list it returns for each directory.
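A sketch of that os.walk variation. The helper name walknames is made up for illustration, and the demonstration builds a throwaway directory tree so that it runs anywhere.

```python
import os
import tempfile

def walknames(root):
    # os.walk yields (parent, dirs, files) for root and every sub-directory
    for parent, dirs, files in os.walk(root):
        for filename in files:
            yield filename

# build a small throwaway tree to walk (hypothetical file names)
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "site_a"))
for path in ("top.txt", os.path.join("site_a", "nested.txt")):
    open(os.path.join(root, path), "w").close()

found = sorted(walknames(root))
print(found)
```

In the CSV loop, iterating over walknames(r'\\server\path\to\inventories') would take the place of the os.listdir call.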

Modules

At the risk of simply repeating the documentation on modules, to run a method from a different python file you can use either of the following code snippets. The first gives you access to all the methods and variables inside the file (note you don’t need the file extension) from within its own namespace. The second imports just the method you requested into your own namespace.

# Want access to all of the file's methods
import file
file.method_name()

# Just want a single method without anything else
from file import method_name
method_name()

Both statements allow an optional as clause to change the name. In the first case this changes the namespace (more on this later). In the second it changes the name by which you refer to the method or variable. You will see the from … import form a lot in code on the Internet although I tend to stay clear of it. There are a few things to be aware of:

# Imports everything from file (caveat: names starting with an underscore
# are skipped unless the module lists them in __all__)
# overwrites any object with the same name you already had
from file import *

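# does not rename two methods to two new names; "as" applies only to the
# name directly before it, so this imports method_name, another_method
# (as new_name) and a third object called another_name
from file import method_name,another_method as new_name,another_name

# new_name refers to another_method, method_name keeps its own name
from file import method_name,another_method as new_name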
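The behaviour of as can be checked against a standard library module. A small sketch using os.path, where pjoin is just an alias chosen for the example:

```python
import os.path

# "as" renames only the name immediately before it
from os.path import basename, join as pjoin

# basename keeps its own name, join is reachable through the alias
path = pjoin("modules", "local", "file.py")
print(basename(path))

# the alias is simply another reference to the same function
print(pjoin is os.path.join)
```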

However, how do you get access to a python file that is not in the same directory as the calling python script, or in a directory listed in the PYTHONPATH environment variable / registry value? There are two variations.

The paths searched are held in a list object called sys.path, which can be manipulated at runtime. Just add your required path to this list. Don’t replace the list or you’ll lose access to all your standard libraries. As an example, the following code allows you to import any python file from either C:\PythonModules or the modules directory off the current working directory.

import sys,os
sys.path.append(r"C:\PythonModules")
# getcwd gets the current working directory; add its modules sub-directory
sys.path.append(os.path.join(os.getcwd(),"modules"))
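To see the effect without touching a real folder, here is a sketch that writes a tiny module into a temporary directory and then imports it. The module name demo_helpers and its greet function are invented for the example.

```python
import importlib
import os
import sys
import tempfile

# create a throwaway directory holding one module (hypothetical names)
moddir = tempfile.mkdtemp()
with open(os.path.join(moddir, "demo_helpers.py"), "w") as f:
    f.write("def greet():\n    return 'hello from demo_helpers'\n")

# append to the existing list rather than replacing it
sys.path.append(moddir)
importlib.invalidate_caches()  # make sure the newly written file is noticed

import demo_helpers
print(demo_helpers.greet())
```

Because the path is appended, a module of the same name earlier in sys.path would still win; sys.path.insert(0, ...) is the usual alternative when yours should take priority.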

If the file you wanted to import is in a sub-directory of a directory that is already in your search path, you can use the package notation. This works no matter how deep inside the directory structure the file is. So you could import a file from the sub-directory modules\local\custom with the following code. Notice that as gives you a shortcut rather than typing in the full namespace each time.

import modules.local.custom.file as mymod
mymod.mymethod()

The limitation of this method is that each directory along the way needs an __init__.py file. In the above example there would have to be an __init__.py file in the modules, local and custom directories. This file can be empty or can contain initialisation code where required. Without it, Python 2 and Python 3 before 3.3 will not search the directory; later versions fall back to treating it as a namespace package, but adding the file remains the conventional approach.
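A sketch that builds such a layout in a temporary directory and imports from it. The names demo_pkg, tools and mymethod are invented for the example, and only the top-level directory is added to the search path.

```python
import importlib
import os
import sys
import tempfile

# build demo_pkg/local/custom/ under a temporary root (hypothetical names)
root = tempfile.mkdtemp()
pkgdir = os.path.join(root, "demo_pkg", "local", "custom")
os.makedirs(pkgdir)

# an empty __init__.py in each directory marks it as a package
current = root
for part in ("demo_pkg", "local", "custom"):
    current = os.path.join(current, part)
    open(os.path.join(current, "__init__.py"), "w").close()

with open(os.path.join(pkgdir, "tools.py"), "w") as f:
    f.write("def mymethod():\n    return 'called mymethod'\n")

sys.path.append(root)  # only the directory above demo_pkg goes on the path
importlib.invalidate_caches()

import demo_pkg.local.custom.tools as mymod
print(mymod.mymethod())
```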

Python 3 users should also note that importing a module creates a __pycache__ directory in the file’s location, where it stores the compiled .pyc file, rather than storing it in the same directory as the file, which is what happened previously. So in Python 3 the above would create __pycache__ directories in modules, local and custom.
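If you want to know where the compiled file for a given source file would end up, importlib.util.cache_from_source will tell you without importing anything. A small sketch, reusing the directory names from the package example as a hypothetical path:

```python
import importlib.util
import os

# hypothetical source path; the file does not need to exist
source = os.path.join("modules", "local", "custom", "file.py")

# cache_from_source reports where the compiled file for a source file would go
cached = importlib.util.cache_from_source(source)
print(cached)
```

The file name includes a tag for the interpreter version, so the printed path differs between Python builds.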