Repositories

GitHub API

In the old days of centralised version control systems, the number of repositories tended to be small, as creating a new one usually involved convincing the admin to provide (usually expensive database) space for it. Now that version control is decentralised, anyone can create or clone a repo, and the rise of automation has usually been underpinned by one. As a result, the number of repos an admin is responsible for has ballooned to tens or sometimes even hundreds.

At this scale, manually configuring repos is time-consuming and error-prone. Thankfully, GitHub provides an API for management, and there is a Python wrapper around the API called PyGithub. I've had a few tasks to do in GitHub recently, so I've come up with a few automation scripts.

According to the documentation, you first need to authenticate using either a username and password or, preferably, an access token. There is also a variation for enterprise accounts with their own domain name. As I don't have one I've ignored this option, but I wanted to write the scripts in such a way that adding it would be easy.
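As a sketch of how these authentication options might map onto PyGithub (assuming PyGithub 1.59 or later, which provides the `github.Auth` module), the hypothetical helpers below prefer a token, fall back to a username and password, and accept an optional `base_url` so an enterprise server could be added later. The credential choice is kept in a separate pure function so it can be checked without PyGithub or network access.

```python
from typing import Optional, Tuple

DEFAULT_BASE_URL = "https://api.github.com"


def choose_credentials(token: Optional[str] = None,
                       user: Optional[str] = None,
                       password: Optional[str] = None) -> Tuple[str, tuple]:
    """Pick an authentication method, preferring an access token.

    Hypothetical helper: returns ("token", (token,)) or
    ("login", (user, password)), and raises ValueError if neither
    complete set of credentials was supplied.
    """
    if token:
        return "token", (token,)
    if user and password:
        return "login", (user, password)
    raise ValueError("supply an access token, or a username and password")


def make_github(token: Optional[str] = None,
                user: Optional[str] = None,
                password: Optional[str] = None,
                base_url: str = DEFAULT_BASE_URL):
    """Return an authenticated PyGithub client (hypothetical helper).

    The import is deferred so choose_credentials() can be used without
    PyGithub installed. For an enterprise server, base_url would typically
    be "https://<hostname>/api/v3".
    """
    from github import Auth, Github  # assumes PyGithub >= 1.59

    kind, args = choose_credentials(token, user, password)
    auth = Auth.Token(*args) if kind == "token" else Auth.Login(*args)
    return Github(auth=auth, base_url=base_url)
```

Note that GitHub no longer accepts account passwords for API calls, so in practice the token branch is the one that will work; the login branch is kept only to mirror the options described above.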

Once authenticated, you will most likely want to limit the repos to those within your organisation using the get_organization method. That is already four options (ignoring enterprise accounts) just to list the repos. As I intend to have several scripts, it makes sense to standardise this with the following three functions:

githuboptparse
Create an optparse (parameter) parser to read in all the standard options. Returns the parser so that additional options can be added.

githublogin
Authenticate with GitHub. The different methods are provided by githuboptparse, so the options need to be passed into this function.

get_filtered_repos
Return all the repos in the organisation if an organisation was specified on the command line, otherwise return all the repos the user has access to. This is set by githuboptparse, so the options need to be passed into this function. You also need to be authenticated, so the return value of githublogin also needs to be passed in. Returns an iterator, just like get_repos() does.
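A minimal sketch of two of these helpers might look like the following. The option names (`--token`, `--user`, `--password`, `--org`) are hypothetical rather than those of the original scripts, and `gh` is assumed to be an authenticated PyGithub `Github` object. Without an organisation the sketch falls back to `get_user().get_repos()`, which lists the repositories the authenticated user can access; `Github.get_repos()` on its own would list all public repositories instead.

```python
import optparse


def githuboptparse(usage: str = "usage: %prog [options]") -> optparse.OptionParser:
    """Create a parser holding the standard GitHub options.

    The parser is returned unparsed so each script can add its own options.
    Option names are illustrative, not taken from the original scripts.
    """
    parser = optparse.OptionParser(usage=usage)
    parser.add_option("-t", "--token", dest="token",
                      help="personal access token (preferred)")
    parser.add_option("-u", "--user", dest="user", help="GitHub username")
    parser.add_option("-p", "--password", dest="password",
                      help="GitHub password")
    parser.add_option("-o", "--org", dest="org",
                      help="only list repos in this organisation")
    return parser


def get_filtered_repos(options, gh):
    """Yield repos in options.org if one was given, else every repo the
    authenticated user can access. gh is an authenticated Github client."""
    if options.org:
        repos = gh.get_organization(options.org).get_repos()
    else:
        repos = gh.get_user().get_repos()
    yield from repos
```

Because get_filtered_repos only calls get_organization, get_user and get_repos, it can be exercised with stand-in objects, and a script can add its own options to the returned parser before calling parse_args().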

To share these between the scripts, I created a _utils.py file. With the standard options abstracted away, the boilerplate code to list all the repos reduces to just six lines: import _utils, create a parser with githuboptparse, parse the options, authenticate with githublogin, iterate through the repos with get_filtered_repos, and print each repo name.

Most organisations will have a naming convention, so you will probably want to filter the repos further based on some criteria. This involves modifying the iterator returned, which sounds tricky but is in fact fairly easy to achieve, as this contrived example shows:

names = ('title','first','middle','last','suffixes')
for n in names: # default iterator returns all elements
    print(n)

def no_suffixes(iteratee):
    # ensure no suffixes are returned by iterator
    for i in iteratee:
        if i != 'suffixes':
            yield i

for n in no_suffixes(names):
    print(n) # look, suffixes has gone

Using this principle, I added the ability to include only repos whose names contain a match for a given regex and to exclude repos whose names match another regex. I added the options to the githuboptparse function and then filtered the iterator returned from get_filtered_repos in the same way as above. Use githubrepos.py to test out this filtering.
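The include/exclude filtering could be sketched as a generator that wraps any repo iterator, in the same way as no_suffixes. The option names `--include` and `--exclude`, and the helpers add_filter_options and filter_repos, are hypothetical; each repo is assumed to expose a `name` attribute, as PyGithub `Repository` objects do. Both patterns use `re.search`, so they match anywhere in the name unless anchored with `^` or `$`.

```python
import optparse
import re
from typing import Iterable, Iterator, Optional


def add_filter_options(parser: optparse.OptionParser) -> optparse.OptionParser:
    """Add hypothetical --include/--exclude regex options to a parser."""
    parser.add_option("--include", dest="include", metavar="REGEX",
                      help="only keep repos whose name matches REGEX")
    parser.add_option("--exclude", dest="exclude", metavar="REGEX",
                      help="drop repos whose name matches REGEX")
    return parser


def filter_repos(repos: Iterable, include: Optional[str] = None,
                 exclude: Optional[str] = None) -> Iterator:
    """Wrap a repo iterator, yielding only repos that pass both regexes.

    A repo is kept if include is unset or matches its name, and exclude is
    unset or does not match its name.
    """
    inc = re.compile(include) if include else None
    exc = re.compile(exclude) if exclude else None
    for repo in repos:
        name = repo.name
        if inc and not inc.search(name):
            continue  # fails the include pattern
        if exc and exc.search(name):
            continue  # matches the exclude pattern
        yield repo
```

In a script, the wrapped iterator would be something like `filter_repos(get_filtered_repos(options, gh), options.include, options.exclude)`, so each repo is checked one at a time as the iterator is consumed.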
