Web hooks from GitHub repo

Following on from an earlier post on the GitHub API, I needed to add web hooks to a group of repos so that every pull request raised, completed or closed would be posted to the web hook. All the repos share a common prefix, and the PyGithub documentation describes how to create a web hook, so I just need to put the two together.

As I want this to be re-runnable, I first need to check that the web hook does not already exist. For this I created the function has_webhook, which iterates through the existing hooks returned by get_hooks and checks whether any of them reference the passed-in URL.
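A minimal sketch of what such a check could look like. It assumes each hook object exposes a config dictionary with the target stored under the "url" key, which is how PyGithub presents web hooks; the exact body of my has_webhook may differ.

```python
def has_webhook(repo, url):
    """Return True if any existing hook on the repo targets the given URL."""
    for hook in repo.get_hooks():
        # Assumption: hook.config is a dict and web hooks keep their target under "url".
        if hook.config.get("url") == url:
            return True
    return False
```

Because the function only relies on get_hooks and config, it can be exercised with simple stand-in objects before pointing it at real repos.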

With that done, I just need to add a little extra code to my githubrepos.py template. First, an option to the parameters so I can specify the web hook URL. Next, a hook config dictionary for creating any new hooks, and a list of events. As these are the same for every repo, I’ll build them once at the start and, for the sake of simplicity, hardcode the list. Finally, in my repo loop I call has_webhook and, for any repo which does not already have the web hook, create it using the create_hook function.
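The create step might be sketched as follows. The helper names ensure_webhook and hook_config are hypothetical, as is the choice of "json" for the content type; the hook name "web", the "pull_request" event and the create_hook arguments follow the PyGithub documentation. The existence check is inlined so the snippet stands alone.

```python
# Hardcoded event list: "pull_request" covers PRs being opened, merged and closed.
EVENTS = ["pull_request"]


def hook_config(url):
    """Build the config dictionary passed to create_hook (content type is an assumption)."""
    return {"url": url, "content_type": "json"}


def ensure_webhook(repo, url, events=EVENTS):
    """Create the web hook unless one already targets url. Returns True if created."""
    if any(hook.config.get("url") == url for hook in repo.get_hooks()):
        return False
    # GitHub expects the name "web" for ordinary web hooks.
    repo.create_hook(name="web", config=hook_config(url), events=events, active=True)
    return True
```

Returning a boolean makes it easy for the calling loop to report which repos were changed and which were skipped on a re-run.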

This gives me the final webhooks.py.

As a final note, if you are looking to integrate GitHub into Slack or MS Teams, both offer apps which largely automate the process. Ultimately what they are doing is setting up a web hook in the same way as above.

GitHub API

In the old days of centralised version control systems, the number of repositories tended to be small, as creating a new one usually involved convincing the admin to provide (usually expensive database) space for it. In these decentralised days, where anyone can create or clone a repo, the number of repos the admin is responsible for has ballooned to hundreds or sometimes even thousands.

At this scale, manually configuring repos is time consuming and error prone. Thankfully GitHub provides an API for management, and there is a Python wrapper around the API called PyGithub. I’ve had a few tasks to do in GitHub recently, so I’ve come up with a few automation scripts.

From the documentation, you first need to authenticate using either a username and password or, preferably, an access token. There is also a variation if you have an enterprise account with its own domain name; as I haven’t, I’ve ignored this option, but I wanted to write the scripts in such a way as to make adding it easy.

Once authenticated, you are most likely to want to limit the repos to those within your organisation using the get_organization method. That is already four combinations (ignoring enterprise accounts) just to list the repos. As I intend to have several scripts, it makes sense to standardise this with the following three functions:

githuboptparse
Create an optparse (parameter) parser to read in all the standard options. Returns the parser so additional options can be added.

githublogin
Authenticate with GitHub. The different methods are provided by githuboptparse, so the parsed options need to be passed into this function.

get_filtered_repos
Return all the repos in the organisation (if an organisation was specified at the command line), otherwise return all repos the user has access to. The organisation is set by githuboptparse, so the parsed options need to be passed into this function. You also need to be authenticated, so the return value of githublogin needs to be passed in as well. Returns an iterator just like get_repos() does.
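The three functions could be sketched roughly as below. The option names (--token, --username, --password, --organisation) are assumptions for illustration, not necessarily those in my _utils.py, and the positional Github(...) calls follow the older PyGithub login style described above; newer PyGithub releases may prefer a different authentication form.

```python
from optparse import OptionParser


def githuboptparse():
    """Build a parser holding the options every script shares (names are assumed)."""
    parser = OptionParser()
    parser.add_option("-t", "--token", dest="token", help="personal access token")
    parser.add_option("-u", "--username", dest="username", help="GitHub username")
    parser.add_option("-p", "--password", dest="password", help="GitHub password")
    parser.add_option("-o", "--organisation", dest="organisation",
                      help="limit repos to this organisation")
    return parser


def githublogin(options):
    """Return an authenticated client, preferring the token if one was given."""
    # Imported here so the parser can be used even where PyGithub is not installed.
    from github import Github
    if options.token:
        return Github(options.token)
    return Github(options.username, options.password)


def get_filtered_repos(gh, options):
    """Yield the organisation's repos, or every repo the user can access."""
    if options.organisation:
        source = gh.get_organization(options.organisation).get_repos()
    else:
        source = gh.get_user().get_repos()
    for repo in source:
        yield repo
```

Writing get_filtered_repos as a generator keeps the interface identical to get_repos() while leaving a single place to add further filtering later.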

In order to share these between the scripts, I created a _utils.py file. With the standard options now abstracted, the boilerplate code to list all the repos reduces to just six lines: import _utils, create the parser with githuboptparse, parse the options, authenticate with githublogin, iterate through the repos with get_filtered_repos, and print each repo name.

Most organisations will have a naming convention, so it is likely you are going to want to filter the repos further based on some criteria. This involves modifying what the iterator returns, which sounds tricky but is in fact fairly easy to achieve, as this contrived example shows:

names = ('title','first','middle','last','suffixes')
for n in names: # default iterator returns all elements
    print(n)

def no_suffixes ( iteratee ):
    # ensure no suffixes are returned by iterator
    for i in iteratee:
        if i != 'suffixes':
            yield i

for n in no_suffixes(names):
    print(n) # look, suffixes has gone

Using this principle, I added the ability to include only repos whose names match a given regex and to exclude repos whose names match a given regex. I added the options to the githuboptparse function and then filtered the iterator returned from get_filtered_repos in the same way as above. Use githubrepos.py to test out this filtering.
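Applied to repos, the same generator pattern might look like this. The function name filter_repos and its include/exclude parameters are hypothetical, and it assumes each repo object has a name attribute, as PyGithub repositories do. It uses re.search, so a pattern matches anywhere in the name unless anchored.

```python
import re


def filter_repos(repos, include=None, exclude=None):
    """Yield only repos whose name matches include and does not match exclude."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    for repo in repos:
        if include_re and not include_re.search(repo.name):
            continue  # an include pattern was given and this name does not match it
        if exclude_re and exclude_re.search(repo.name):
            continue  # name matches the exclude pattern
        yield repo
```

With a common prefix convention, something like filter_repos(repos, include="^team-", exclude="-archive$") would keep the team's repos while skipping archived ones.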