7.2.1. Creating a GitHub API Connection
Like other social web properties, GitHub implements OAuth, and the steps to gaining API access involve creating an account followed by one of two possibilities: creating an application to use as the consumer of the API or creating a “personal” access token that will be linked directly to your account. In this chapter, we’ll opt to use a personal access token, which is as easy as clicking a button in the Personal Access API Tokens section of your account’s Applications menu, as shown in Figure 7-1. (See Appendix B for a more extensive overview of OAuth.)
Figure 7-1. Create a “Personal API Access Token” from the Applications menu in your account and provide a meaningful note so that you’ll remember its purpose
282 | Chapter 7: Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
A programmatic option for obtaining an access token, as opposed to creating one within the GitHub user interface, is shown in Example 7-1 as an adaptation of “Creating an OAuth token for command-line use” from GitHub’s help site. (If you are not taking advantage of the virtual machine experience for this book, as described in Appendix A, you’ll need to type pip install requests in a terminal prior to running this example.)
Example 7-1. Programmatically obtaining a personal API access token for accessing GitHub’s API
import requests
from getpass import getpass
import json

username = '' # Your GitHub username
password = '' # Your GitHub password

# Note that credentials will be transmitted over a secure SSL connection
url = 'https://api.github.com/authorizations'
note = 'Mining the Social Web, 2nd Ed.'
post_data = {'scopes': ['repo'], 'note': note}

response = requests.post(
    url,
    auth=(username, password),
    data=json.dumps(post_data),
)

print("API response: " + response.text)
print("")
print("Your OAuth token is " + response.json()['token'])

# Go to https://github.com/settings/applications to revoke this token
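Example 7-1 indexes the decoded response with ['token'] directly, which raises a KeyError if the request fails (for instance, because of mistyped credentials). The following is a minimal sketch of a more defensive approach. The extract_token helper is hypothetical and not part of requests or GitHub's API, and it assumes that an error payload carries a human-readable "message" field. Simulated payloads stand in for real API responses.

```python
def extract_token(status_code, payload):
    """Return the OAuth token from a decoded JSON payload, or raise ValueError.

    Hypothetical helper: status_code would come from response.status_code
    and payload from response.json() in Example 7-1.
    """
    if 200 <= status_code < 300 and 'token' in payload:
        return payload['token']
    # Assumption: error payloads carry a human-readable "message" field
    message = payload.get('message', 'no message provided')
    raise ValueError('Token request failed (HTTP %d): %s' % (status_code, message))


# Simulated payloads stand in for real API responses
print(extract_token(201, {'token': 'abc123', 'note': 'demo'}))

try:
    extract_token(401, {'message': 'Bad credentials'})
except ValueError as e:
    print(e)
```

In a real session, you would call extract_token(response.status_code, response.json()) in place of the final print of Example 7-1.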
As is the case with many other social web properties, GitHub’s API is built on top of HTTP and accessible through any programming language in which you can make an HTTP request, including command-line tools in a terminal. Following the precedents set by previous chapters, however, we’ll opt to take advantage of a Python library so that we can avoid some of the tedious details involved in making requests, parsing responses, and handling pagination. In this particular case, we’ll use PyGithub, which can be installed with the somewhat predictable pip install PyGithub. We’ll start by taking a look at a couple of examples of how to make GitHub API requests before transitioning into a discussion of graphical models.
Let’s seed an interest graph in this chapter from the Mining-the-Social-Web GitHub repository and create connections between it and its stargazers. Listing the stargazers for a repository is possible with the List Stargazers API. You could try out an API request to get an idea of what the response type looks like by copying and pasting the following URL in your web browser: https://api.github.com/repos/ptwobrussell/Mining-the-Social-Web/stargazers.
Although you are reading Mining the Social Web, 2nd Edition, at the time of this writing the source code repository for the first edition still has much more activity than the second edition, so the first edition repository will serve as the basis of examples for this chapter. Analysis of any repository, including the repository for the second edition of this book, is easy enough to accomplish by simply changing the name of the initial project as introduced in Example 7-3.
The ability to issue an unauthenticated request in this manner is quite convenient as you are exploring the API, and the rate limit of 60 unauthenticated requests per hour is more than adequate for tinkering and exploring. You could, however, append a query string of the form ?access_token=xxx, where xxx specifies your access token, to make the same request in an authenticated fashion. GitHub’s authenticated rate limits are a generous 5,000 requests per hour, as described in the developer documentation for rate limiting. Example 7-2 illustrates a sample request and response. (Keep in mind that this is requesting only the first page of results and, as described in the developer documentation for pagination, metadata information for navigating the pages of results is included in the HTTP headers.)
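As a minimal sketch of the query string form just described, the following snippet appends an access_token parameter to a URL. The authenticated_url helper is hypothetical and introduced only for illustration, and 'xxx' is a placeholder for a real token.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def authenticated_url(url, access_token):
    """Append an access_token query parameter to a URL (hypothetical helper)."""
    # Use '&' if the URL already carries a query string, '?' otherwise
    separator = '&' if '?' in url else '?'
    return url + separator + urlencode({'access_token': access_token})


base = 'https://api.github.com/repos/ptwobrussell/Mining-the-Social-Web/stargazers'
print(authenticated_url(base, 'xxx'))
print(authenticated_url(base + '?page=2', 'xxx'))
```

When making the request with the requests library, passing params={'access_token': token} to requests.get should produce an equivalent URL without any manual string handling.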
Example 7-2. Making direct HTTP requests to GitHub’s API
import json
import requests

# An unauthenticated request that doesn't contain an ?access_token=xxx query string
url = "https://api.github.com/repos/ptwobrussell/Mining-the-Social-Web/stargazers"
response = requests.get(url)

# Display one stargazer
print(json.dumps(response.json()[0], indent=1))
print("")

# Display headers
for (k, v) in response.headers.items():
    print("%s => %s" % (k, v))
Sample output follows:
{
 "following_url": "https://api.github.com/users/rdempsey/following{/other_user}",
 "events_url": "https://api.github.com/users/rdempsey/events{/privacy}",
 "organizations_url": "https://api.github.com/users/rdempsey/orgs",
 "url": "https://api.github.com/users/rdempsey",
 "gists_url": "https://api.github.com/users/rdempsey/gists{/gist_id}",
 "html_url": "https://github.com/rdempsey",
 "subscriptions_url": "https://api.github.com/users/rdempsey/subscriptions",
 "avatar_url": "https://1.gravatar.com/avatar/8234a5ea3e56fca09c5549ee...png",
 "repos_url": "https://api.github.com/users/rdempsey/repos",
 "received_events_url": "https://api.github.com/users/rdempsey/received_events",
 "gravatar_id": "8234a5ea3e56fca09c5549ee5e23e3e1",
 "starred_url": "https://api.github.com/users/rdempsey/starred{/owner}{/repo}",
 "login": "rdempsey",
 "type": "User",
 "id": 224,
 "followers_url": "https://api.github.com/users/rdempsey/followers"
}
status => 200 OK
access-control-allow-credentials => true
x-ratelimit-remaining => 58
x-github-media-type => github.beta
x-content-type-options => nosniff
access-control-expose-headers => ETag, Link, X-RateLimit-Limit,
    X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes
transfer-encoding => chunked
x-github-request-id => 73f42421-ea0d-448c-9c90-c2d79c5b1fed
content-encoding => gzip
vary => Accept, Accept-Encoding
server => GitHub.com
last-modified => Sun, 08 Sep 2013 17:01:27 GMT
x-ratelimit-limit => 60
link => <https://api.github.com/repositories/1040700/stargazers?page=2>;
    rel="next",
    <https://api.github.com/repositories/1040700/stargazers?page=30>;
    rel="last"
etag => "ca10cd4edc1a44e91f7b28d3fdb05b10"
cache-control => public, max-age=60, s-maxage=60
date => Sun, 08 Sep 2013 19:05:32 GMT
access-control-allow-origin => *
content-type => application/json; charset=utf-8
x-ratelimit-reset => 1378670725
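The rate limit headers in the sample output can be interpreted with a few lines of code. The following is a minimal sketch that assumes x-ratelimit-reset is expressed in seconds since the Unix epoch (UTC). The describe_rate_limit helper is hypothetical, and the header values are copied from the sample output rather than fetched live.

```python
import time


def describe_rate_limit(headers, now=None):
    """Summarize rate-limit headers (hypothetical helper).

    headers: mapping with lowercase x-ratelimit-* keys, as in the sample output
    now: current time in seconds since the epoch; defaults to time.time()
    """
    if now is None:
        now = time.time()
    limit = int(headers['x-ratelimit-limit'])
    remaining = int(headers['x-ratelimit-remaining'])
    reset = int(headers['x-ratelimit-reset'])
    return {
        'used': limit - remaining,
        'remaining': remaining,
        'resets_at_utc': time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(reset)),
        'seconds_until_reset': max(0, int(reset - now)),
    }


sample = {
    'x-ratelimit-limit': '60',
    'x-ratelimit-remaining': '58',
    'x-ratelimit-reset': '1378670725',
}
# 1378667132 corresponds to the sample's date header (Sun, 08 Sep 2013 19:05:32 GMT)
print(describe_rate_limit(sample, now=1378667132))
```

For the sample values, the reset time works out to roughly one hour after the date header, which is consistent with an hourly rate limit window.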
As you can see, GitHub returns a lot of useful information that is not in the body of the HTTP response and is instead conveyed as HTTP headers, as outlined in the developer documentation. You should skim and understand what all of the various headers mean, but a few of note include the status header, which tells us that the request was OK with a 200 response; headers that involve the rate limit, such as x-ratelimit-remaining; and the link header, which contains a value such as the following:
<https://api.github.com/repositories/1040700/stargazers?page=2>; rel="next", <https://api.github.com/repositories/1040700/stargazers?page=30>; rel="last".
7.2. Exploring GitHub’s API | 285
The link header’s value is giving us a preconstructed URL that could be used to fetch the next page of results as well as an indication of how many total pages of results there are.
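To make the pagination metadata concrete, here is a minimal sketch that pulls the next and last URLs out of a link header value and reads the total page count from the last URL. The parse_link_header and total_pages helpers are hypothetical names introduced for illustration, the regular expressions assume the <url>; rel="name" layout shown in the sample output, and the header value is copied from that output rather than fetched live.

```python
import re


def parse_link_header(value):
    """Map each rel name in a Link header value to its URL."""
    links = {}
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', value):
        links[rel] = url
    return links


def total_pages(links):
    """Read the page number from the rel="last" URL, or None if absent."""
    match = re.search(r'[?&]page=(\d+)', links.get('last', ''))
    return int(match.group(1)) if match else None


link = ('<https://api.github.com/repositories/1040700/stargazers?page=2>; '
        'rel="next", '
        '<https://api.github.com/repositories/1040700/stargazers?page=30>; '
        'rel="last"')
links = parse_link_header(link)
print(links['next'])
print(total_pages(links))
```

The requests library also exposes a parsed form of this header as response.links, which may be preferable to hand-rolled parsing in practice.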