At VendAsta we frequently share libraries of code between projects. To make it easier to share this code I’ve developed a small package manager that downloads code within a directory from Github to be copied in to your current project. It’s a quick and dirty alternative to cloning an entire repository, grabbing the set of files you want and placing them in your project.
We’ll use the PyGithub Python library to interact with the Github API.
Logging in to Github
The first step is to log in to Github using our credentials. To do this we
instantiate a new Github object given our username and password and access the
associated user by calling get_user
.
from github import Github
github = Github('soofaloofa', 'password')
user = github.get_user()
This is equivalent to making a basic authentication request to get the currently authenticated user and storing the result in a local representation.
curl -u soofaloofa https://api.github.com/user
Accessing a repository
Now that we have a user we can get a repository for that user by name. To get the repository for this website we make a request to get a repo by owner.
repository = user.get_repo('soofaloofa.github.io')
Downloading a single file
To download a single file from a repository we make a call to get the contents of a file.
file_content = repository.get_contents('README.md')
Referencing commits
We have all the building blocks to download a resource from Github. The next step is to download a resource referenced by a specific commit. The Github API expects SHA values to reference a commit. To make this a bit more user friendly we can write a function that will search for a SHA given a git tag or branch name.
def get_sha_for_tag(repository, tag):
"""
Returns a commit PyGithub object for the specified repository and tag.
"""
branches = repository.get_branches()
matched_branches = [match for match in branches if match.name == tag]
if matched_branches:
return matched_branches[0].commit.sha
tags = repository.get_tags()
matched_tags = [match for match in tags if match.name == tag]
if not matched_tags:
raise ValueError('No Tag or Branch exists with that name')
return matched_tags[0].commit.sha
Now we can pass this SHA to the get_contents
function to get a file for that
specific commit.
sha = get_sha_for_tag(repository, 'develop')
file_content = repository.get_contents('README.md', ref=sha)
Putting it all together
By putting a bit more polish on this we can easily download entire directories of code that reference a single tag or branch and copy them to our local environment. The basic workflow is:
- Choose a repository.
- Choose a branch or tag.
- Choose a directory.
- Iteratively download all the files in that directory.
Let’s make that happen.
For this code I’ll assume that the Github user belongs to a single organization and that this organization is sharing code between repositories.
from github import Github
import getpass
username = raw_input("Github username: ")
password = getpass.getpass("Github password: ")
github = Github(username, password)
organization = github.get_user().get_orgs()[0]
repository_name = raw_input("Github repository: ")
repository = organization.get_repo(repository_name)
branch_or_tag_to_download = raw_input("Branch or tag to download: ")
sha = get_sha_for_tag(repository, branch_or_tag_to_download)
directory_to_download = raw_input("Directory to download: ")
download_directory(repository, sha, directory_to_download)
This piece of code is fairly simple and relies on a couple of helper functions:
get_sha_for_tag
and download_directory
. get_sha_for_tag
will return the
SHA commit hash given a branch or tag and download_directory
will recursively
download the files in the given directory.
def get_sha_for_tag(repository, tag):
"""
Returns a commit PyGithub object for the specified repository and tag.
"""
branches = repository.get_branches()
matched_branches = [match for match in branches if match.name == tag]
if matched_branches:
return matched_branches[0].commit.sha
tags = repository.get_tags()
matched_tags = [match for match in tags if match.name == tag]
if not matched_tags:
raise ValueError('No Tag or Branch exists with that name')
return matched_tags[0].commit.sha
def download_directory(repository, sha, server_path):
"""
Download all contents at server_path with commit tag sha in
the repository.
"""
contents = repository.get_dir_contents(server_path, ref=sha)
for content in contents:
print "Processing %s" % content.path
if content.type == 'dir':
download_directory(repository, sha, content.path)
else:
try:
path = content.path
file_content = repository.get_contents(path, ref=sha)
file_data = base64.b64decode(file_content.content)
file_out = open(content.name, "w")
file_out.write(file_data)
file_out.close()
except (GithubException, IOError) as exc:
logging.error('Error processing %s: %s', content.path, exc)
We’ve been using a variation of this simple script to share code between Github repositories and appreciate it’s flexibility and ease of use. Let me know if you find it useful!