# Pre-processing a file with a generator

While answering a forum post about a function that processed a list, I got thinking about how it would run in a real-life situation. Rather than a list, the input would probably be a file. Passing the file straight in almost worked, except that each line kept its line return and I needed those stripped out. I was hoping to find an elegant solution, and I did: a generator.

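If you have not used generators before, this wiki post is a good starting point. If you have used list comprehensions, a generator expression has the same syntax with parentheses instead of square brackets; the difference is that it produces items one at a time rather than building the whole list in memory. To demonstrate, I'll use a counter in place of the function from the forum post. collections.Counter() does this job, but it is only available from Python 2.7, so on earlier versions you will need to create your own function.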
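
To make the bracket difference concrete, here is a minimal sketch. The `words` list is made-up sample data, not something from the forum post.

```python
# Sample data, invented for this illustration
words = [' apple\n', 'pear\n', ' fig \n']

# List comprehension: square brackets, builds the whole list immediately
stripped_list = [w.strip() for w in words]

# Generator expression: parentheses, produces each item only when asked
stripped_gen = (w.strip() for w in words)

first_pass = list(stripped_gen)   # consuming the generator gives the same items
second_pass = list(stripped_gen)  # a generator can only be consumed once

print(stripped_list)
print(first_pass)
print(second_pass)
```

The one-shot behaviour is worth remembering: if you need the stripped lines more than once, build a list instead.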

First, an example with a list, which acts as the starting point:

    def basicCounter(mylist):
        # Python 2.7+ users could use collections.Counter instead
        retdic = dict()
        for item in mylist:
            # Start each new item at 0, then add one for this occurrence
            retdic[item] = retdic.get(item, 0) + 1
        return retdic

    mylist = ['1', '2', '2', '3', '3', '3']
    counted = basicCounter(mylist)
    # The parenthesised form works in both Python 2 and Python 3
    print(counted)
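
For readers on Python 2.7 or later, the following sketch checks that collections.Counter gives the same counts as the hand-written function. basicCounter is repeated here only so that the snippet runs on its own.

```python
from collections import Counter  # available from Python 2.7

def basicCounter(mylist):
    # Same logic as the function above, repeated so this snippet is self-contained
    retdic = dict()
    for item in mylist:
        retdic[item] = retdic.get(item, 0) + 1
    return retdic

mylist = ['1', '2', '2', '3', '3', '3']

# Convert the Counter to a plain dict so the two results compare like for like
from_counter = dict(Counter(mylist))
from_function = basicCounter(mylist)

print(from_counter == from_function)
```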


Now let's create a generator to process the lines in a file and remove the leading and trailing whitespace, including the line returns. The strip() method does this for a single string; we just need to apply it to every line in the file. This gives us our generator: (line.strip() for line in myfile), where myfile is the open file.
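
To see what the generator changes, here is a small sketch that uses io.StringIO as a stand-in for an open file, so it can be run without a file on disk. The name `fake_file` and its contents are assumptions made for the illustration.

```python
import io

# Stand-in for an open file; iterating it yields lines with their
# line returns attached, just as a real file object does
fake_file = io.StringIO(u"1\n2\n2\n")

raw = list(fake_file)
fake_file.seek(0)  # rewind so the lines can be read a second time
cleaned = list(line.strip() for line in fake_file)

print(raw)      # each item still ends with a line return
print(cleaned)  # line returns and surrounding whitespace removed
```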

Add a bit of code to open the file and we have a version of the above that takes its input from the contents of a file instead.

    # basicCounter as before
    # Python 2.5 users need the following line
    # from __future__ import with_statement
    with open(r'C:\path\to\file.txt') as myfile:
        # Each line is stripped as basicCounter pulls it from the file
        counted = basicCounter(line.strip() for line in myfile)
    print(counted)


There is nothing to stop you making the processing much more complex; simply create your function and replace line.strip() with yourfunction(line). You can also make the processing conditional by adding an if clause at the end.
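
As a rough sketch of both ideas together, the example below swaps line.strip() for a custom function and adds an if clause to skip some lines. The `normalise` function, the rule for skipping blank and comment lines, and the sample data are all hypothetical; io.StringIO again stands in for a real file.

```python
import io

def basicCounter(mylist):
    # Same counting function as earlier, repeated so this snippet runs on its own
    retdic = dict()
    for item in mylist:
        retdic[item] = retdic.get(item, 0) + 1
    return retdic

def normalise(line):
    # Hypothetical processing step: trim whitespace and ignore case
    return line.strip().lower()

# Stand-in for an open file, with a blank line and a comment line to filter out
fake_file = io.StringIO(u"Apple\n\napple \n# a comment\nPear\n")

counted = basicCounter(
    normalise(line)
    for line in fake_file
    if line.strip() and not line.startswith('#')  # skip blank and comment lines
)
print(counted)
```

The if clause is evaluated before the processing function, so filtered lines are never passed to `normalise` at all.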