Regular expression substitutions

Following on from my introduction to regular expressions in Python, it is time to substitute the match with something more useful. This is done with the sub function, which takes at least three parameters: the regular expression, the replacement and the text to search. At its most basic you have the following:

re.sub("PERL","Python","I program in PERL!")
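As a small runnable sketch of that call, the snippet below stores the result and compares it with the plain string replace method. The variable names (text, with_regex, with_replace) are only illustrative.

```python
import re

text = "I program in PERL!"

# re.sub takes the pattern first, then the replacement, then the text to search.
with_regex = re.sub("PERL", "Python", text)

# For a literal pattern like this one, str.replace gives the same string.
with_replace = text.replace("PERL", "Python")

print(with_regex)
print(with_regex == with_replace)
```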

This is not very exciting; the replace method on a string does exactly the same. But this basic example hides two powerful features: the first parameter is a regular expression, and the second parameter can also be a function. Put this together with the example I used when introducing regular expressions and we have:

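import re

def toupper(matchobj):
    # group() with no arguments returns the entire matched string
    return matchobj.group().upper()

text = "Welcome to Python's Regular Expressions. I hope you enjoy what you F1nD."
regex = r"([A-Z])\w+"  # a capital letter followed by one or more word characters
print(re.sub(regex, toupper, text))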

This matches the same words as previously but this time changes them to uppercase. I’ve covered the regex in some detail, but the function parameter needs a bit more explanation. The function is passed the match object for each match, and whatever the function returns is what is substituted into the text.
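To see what the function actually receives, here is a minimal sketch that records each matched string before returning the uppercase version. The seen list and the name toupper_logged are hypothetical additions for illustration only.

```python
import re

seen = []  # hypothetical list, used only to record what the callback receives

def toupper_logged(matchobj):
    # matchobj is a match object; group() with no arguments is the whole match.
    seen.append(matchobj.group())
    # Whatever is returned here replaces the matched text.
    return matchobj.group().upper()

text = "Welcome to Python's Regular Expressions. I hope you enjoy what you F1nD."
result = re.sub(r"([A-Z])\w+", toupper_logged, text)

print(seen)
print(result)
```

The function is called once per match, in order. The lone "I" is not passed in, because the pattern needs at least one more word character after the capital letter.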

In the example above I’ve used the group method with no parameters to return the entire string that was matched. I simply turned this to uppercase, so you can see something happening, before returning it to sub. It is not much of a stretch to go from this to basic template functionality.

I am going to look through the template for any substitution variables enclosed in double braces, {{ and }}, and replace each one with the result of a function. My first decision is how to get the name out of the matched string. I know it is two characters in from both ends, so I could use matchobj.group()[2:-2], but this would hard-code the pattern. Instead I’ll use the grouping option of regular expressions: enclose the variable name in parentheses and get the variable name using matchobj.group(1). This way, if I want to change the double braces to something else, I can just change the regex pattern.
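The difference between the two approaches can be sketched as follows. The pattern shown is an assumption (a run of word characters between double braces), and the alternative <% %> delimiters are purely hypothetical.

```python
import re

# Assumed pattern: a word-like name between double braces, captured as group 1.
pattern = r"\{\{(\w+)\}\}"
match = re.search(pattern, "Hello {{name}}, welcome!")

whole = match.group()            # the whole match, braces included
by_group = match.group(1)        # only the captured variable name
by_slice = match.group()[2:-2]   # same text, but tied to two-character delimiters

# With different (hypothetical) delimiters only the pattern changes;
# group(1) still returns the name.
alt_match = re.search(r"<%(\w+)%>", "Hello <%name%>, welcome!")
alt_name = alt_match.group(1)

print(whole, by_group, by_slice, alt_name)
```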

Then I need a way to map the variable name to the output I want. For this example I will just create a dictionary with the variable names as the keys and the functions to call as the values. This way, if the variable name exists in the dictionary, I can simply return the result of the function.

To demonstrate, I’ve created this example. I’ve included the template as a variable to make the example self-contained. It should be self-explanatory from the text it contains what is happening. The only other thing to mention is that I change the matched string to lowercase to make the substitution case-insensitive.
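A minimal sketch of such a template example might look like the following. The function names (get_user, get_language), the dictionary contents and the template text are all assumptions made for illustration, as is the choice to leave unknown variables untouched.

```python
import re

# Hypothetical functions that supply the replacement values.
def get_user():
    return "World"

def get_language():
    return "Python"

# Variable name (in lowercase) -> function that produces its replacement text.
variables = {"user": get_user, "language": get_language}

def substitute(matchobj):
    # group(1) is the name between the braces; lowercase it so that the
    # lookup is case-insensitive.
    name = matchobj.group(1).lower()
    if name in variables:
        return variables[name]()
    # Assumption: names not in the dictionary are left exactly as they were.
    return matchobj.group()

template = ("Hello {{USER}}! This template was filled in with {{language}}. "
            "{{unknown}} is left alone.")
regex = r"\{\{(\w+)\}\}"

output = re.sub(regex, substitute, template)
print(output)
```

Because the name is taken from group(1), switching to different delimiters should only require changing regex; substitute and the dictionary stay the same.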
