Computing Graph Centrality Measures

Part of the document Mining the social web, 2nd edition (pages 322 - 325)


7.4.2. Computing Graph Centrality Measures

A centrality measure is a fundamental graph analytic that provides insight into the relative importance of a particular node in a graph. Let’s consider the following centrality measures, which will help us more carefully examine graphs to gain insights about networks:

Degree centrality

The degree centrality of a node in the graph is a measure of the number of incident edges upon it. Think of this centrality measure as a way of tabulating the frequency of incident edges on nodes for the purpose of measuring uniformity among them, finding the nodes with the highest or lowest numbers of incident edges, or otherwise trying to discover patterns that provide insight into the network topology based on number of connections as a primary motivation. The degree centrality of a node is just one facet that is useful in reasoning about its role in a network, and it provides a good starting point for identifying outliers or anomalies with respect to connectedness relative to other nodes in the graph. In aggregate, we also know from our earlier discussion that the average degree centrality tells us something about the density of an overall graph. NetworkX provides networkx.degree_centrality as a built-in function to compute the degree centrality of a graph. It returns a dictionary that maps the ID of each node to its degree centrality.

2. In the current discussion, the term “energy” is used to generically describe flow within an abstract graph.
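As a minimal sketch of what the returned dictionary contains, the snippet below compares networkx.degree_centrality against a hand computation. The five-node graph and its node names are hypothetical and chosen only for illustration; the point it shows is that NetworkX reports each node's degree divided by n - 1 (the largest degree possible in a simple graph) rather than the raw edge count.

```python
import networkx as nx

# A small hypothetical graph, used only for illustration
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])

n = G.number_of_nodes()

# Hand computation: raw degree scaled by the maximum possible degree, n - 1
manual = {node: degree / (n - 1) for node, degree in G.degree()}

# Built-in computation
builtin = nx.degree_centrality(G)

for node in sorted(G):
    print(node, G.degree(node), manual[node], builtin[node])
```

Because of this scaling, the values always fall between 0 and 1, which makes them comparable across graphs of different sizes.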

Betweenness centrality

The betweenness centrality of a node is a measure of how often it lies between other nodes in the graph, in the sense of sitting on the shortest paths that connect them. You might think about betweenness centrality as a measure of how critical a node is in connecting other nodes as a broker or gateway. Although not necessarily the case, the loss of nodes with a high betweenness centrality measure could be quite disruptive to the flow of energy (see footnote 2) in a graph, and in some circumstances removing nodes with high betweenness centrality can disintegrate a graph into smaller subgraphs. NetworkX provides networkx.betweenness_centrality as a built-in function to compute the betweenness centrality of a graph. It returns a dictionary that maps the ID of each node to its betweenness centrality.
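The broker role described above can be sketched on a toy graph. The graph below is hypothetical (two triangles joined through a single bridging node) and is not taken from the GitHub data; it simply shows that the node with the highest betweenness centrality is the one whose removal splits the graph into separate components.

```python
import networkx as nx

# Hypothetical graph: two triangles connected only through node 3
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),    # left triangle
                  (2, 3), (3, 4),            # node 3 bridges the two sides
                  (4, 5), (5, 6), (4, 6)])   # right triangle

bc = nx.betweenness_centrality(G)

# The node with the highest score acts as the broker
broker = max(bc, key=bc.get)

# Remove the broker from a copy and count the connected components
H = G.copy()
H.remove_node(broker)
components_before = nx.number_connected_components(G)
components_after = nx.number_connected_components(H)

print(broker, round(bc[broker], 2), components_before, components_after)
```

In this particular graph the effect is clear-cut; as the text notes, a high score does not guarantee that removal will disconnect a graph in general.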

Closeness centrality

The closeness centrality of a node is a measure of how highly connected (“close”) it is to all other nodes in the graph. This centrality measure is also predicated on the notion of shortest paths in the graph and offers insight into how well connected a particular node is in the graph. Unlike a node’s betweenness centrality, which tells you something about how integral it is in connecting nodes as a broker or gateway, a node’s closeness centrality accounts more for direct connections. Think of closeness in terms of a node’s ability to spread energy to all other nodes in a graph. NetworkX provides networkx.closeness_centrality as a built-in function to compute the closeness centrality of a graph. It returns a dictionary that maps the ID of each node to its closeness centrality.
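To make the link to shortest paths concrete, here is a minimal sketch that recomputes closeness centrality by hand on a five-node path graph (a hypothetical example, not part of the book's data). It assumes a connected, undirected graph, for which the closeness of a node is n - 1 divided by the sum of its shortest-path distances to every other node.

```python
import networkx as nx

# A simple chain 0 - 1 - 2 - 3 - 4, used only for illustration
G = nx.path_graph(5)
n = G.number_of_nodes()

manual = {}
for node in G:
    # Shortest-path distance from this node to every node (itself included, at 0)
    distances = nx.shortest_path_length(G, source=node)
    manual[node] = (n - 1) / sum(distances.values())

builtin = nx.closeness_centrality(G)

for node in G:
    print(node, round(manual[node], 3), round(builtin[node], 3))
```

The middle node of the chain scores highest because its total distance to the others is smallest, while the two endpoints score lowest.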

NetworkX implements a number of other powerful centrality measures, which are described in its online documentation.

Figure 7-4 shows the Krackhardt kite graph, a well-studied graph in social network analysis that illustrates the differences among the centrality measures introduced in this section. It’s called a “kite graph” because when rendered visually, it has the appearance of a kite.


Figure 7-4. The Krackhardt kite graph that will be used to illustrate degree, betweenness, and closeness centrality measures

Example 7-7 shows some code that loads this graph from NetworkX and calculates centrality measures on it, which are reproduced in Table 7-1. Although it has no bearing on the calculations, note that this particular graph is commonly used as a reference in social network analysis. As such, the edges are not directed, since a connection in a social network implies mutual acceptance. In NetworkX, it is an instance of networkx.Graph as opposed to networkx.DiGraph.
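The undirected nature of the kite graph can be checked directly. The short sketch below is an illustration rather than part of the book's example: it loads the graph, confirms that it is not directed, and shows that an edge can be looked up from either endpoint. It uses is_directed() rather than an isinstance test because networkx.DiGraph is itself a subclass of networkx.Graph.

```python
import networkx as nx

kkg = nx.krackhardt_kite_graph()

# The generator returns a plain, undirected Graph
print(type(kkg).__name__, kkg.is_directed())

# An undirected edge can be queried in either direction
print(kkg.has_edge(0, 1), kkg.has_edge(1, 0))

# Converting to a directed graph yields one edge per direction
dkg = kkg.to_directed()
print(kkg.number_of_edges(), dkg.number_of_edges())
```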

Example 7-7. Calculating degree, betweenness, and closeness centrality measures on the Krackhardt kite graph

import networkx as nx
from operator import itemgetter
from IPython.display import HTML, display

# Show the rendered kite graph (the image path assumes the book's notebook layout)
display(HTML('<img src="files/resources/ch07-github/kite-graph.png" width="400px">'))

# The classic Krackhardt kite graph
kkg = nx.generators.small.krackhardt_kite_graph()

# For each measure, print (node, score) pairs sorted from highest to lowest score
print("Degree Centrality")
print(sorted(nx.degree_centrality(kkg).items(), key=itemgetter(1), reverse=True))
print()

print("Betweenness Centrality")
print(sorted(nx.betweenness_centrality(kkg).items(), key=itemgetter(1), reverse=True))
print()

print("Closeness Centrality")
print(sorted(nx.closeness_centrality(kkg).items(), key=itemgetter(1), reverse=True))

Table 7-1. Degree, betweenness, and closeness centrality measures for the Krackhardt kite graph (the maximum value in each column is marked with an asterisk so that you can easily test your intuition against the graph presented in Figure 7-4)

Node  Degree centrality  Betweenness centrality  Closeness centrality
0     0.44               0.02                    0.53
1     0.44               0.02                    0.53
2     0.33               0.00                    0.50
3     0.67*              0.10                    0.60
4     0.33               0.00                    0.50
5     0.55               0.23                    0.64*
6     0.55               0.23                    0.64*
7     0.33               0.39*                   0.60
8     0.22               0.22                    0.43
9     0.11               0.00                    0.31
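One way to test your intuition against Table 7-1 is to regenerate it. The following is a sketch under the assumption that a current NetworkX release is installed; the variable names (measures, top) are illustrative and not from the book. It prints the three measures per node to two decimal places and then lists the nodes that attain the maximum of each measure.

```python
import networkx as nx

kkg = nx.krackhardt_kite_graph()

# One dictionary of node -> score per centrality measure
measures = {
    "degree": nx.degree_centrality(kkg),
    "betweenness": nx.betweenness_centrality(kkg),
    "closeness": nx.closeness_centrality(kkg),
}

print("node  degree  betweenness  closeness")
for node in sorted(kkg):
    print("{:>4}  {:>6.2f}  {:>11.2f}  {:>9.2f}".format(
        node,
        measures["degree"][node],
        measures["betweenness"][node],
        measures["closeness"][node]))

# Nodes that attain the maximum value of each measure (ties are all listed)
top = {}
for name, scores in measures.items():
    best = max(scores.values())
    top[name] = sorted(n for n, s in scores.items() if abs(s - best) < 1e-9)

print(top)
```

Note that the three measures single out different nodes, which is exactly why the kite graph is a popular teaching example. Printed values may differ from the table in the last digit depending on rounding.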

Spend a few moments studying the Krackhardt kite graph and the centrality measures associated with it before moving on to the next section. These centrality measures will remain in our toolbox moving forward through this chapter.
