How to compute betweenness? How to select the number of clusters?
J Leskovec, A Rajaraman, J Ullman: Mining of Massive Datasets, http://www.mmds.org

• Want to compute the betweenness of paths starting at node A
• Breadth-first search starting from A
• Count the number of shortest paths from A to all other nodes of the network
• Compute betweenness by working up the tree: if there are multiple paths, count them fractionally

The algorithm:
• Add edge flows: node flow = 1 + ∑ child edges; split the flow up among the edges to the parents in proportion to each parent's shortest-path count
• Repeat the BFS procedure for each starting node
[Figure: BFS tree rooted at A with edge flows. Annotations: "1+1 paths to H, split evenly"; "1+0.5 paths to J, split 1:2"; "path to K, split evenly".]

Graph Partitioning
• Methods to break a network into sets of connected components called regions
• Many general approaches
  § Divisive methods: repeatedly identify and remove edges connecting densely connected regions
  § Agglomerative methods: repeatedly identify and merge nodes that likely belong in the same region

[Girvan-Newman ‘02]
Divisive hierarchical clustering based on the notion of edge betweenness: the number of shortest paths passing through the edge.
Girvan-Newman Algorithm:
§ Undirected unweighted networks
§ Repeat until no edges are left:
  § Calculate betweenness of edges
  § Remove edges with highest betweenness
§ Connected components are communities
§ Gives a hierarchical decomposition of the network

Girvan-Newman Algorithm
• Divisive method proposed by Girvan and Newman in 2002
• Uses edge betweenness to identify edges to remove
• Edge betweenness: total amount of “flow” an edge carries between all pairs of nodes, where a single unit of flow between two nodes divides itself evenly among all shortest paths between the nodes (1/k units flow along each of k shortest paths)

Girvan-Newman Algorithm
1. Calculate betweenness of all edges
2. Remove the edge(s) with highest betweenness
3. Repeat steps 1 and 2 until the graph is partitioned into as many regions as desired

[Figure sequence: flow computation on an example graph with nodes A to K, showing in turn how much flow B, C, D and E each keep and pass along; edge labels include 1, 2 and ½. One frame is annotated "No flow yet…".]

Computing Edge Betweenness Efficiently
For each node N in the graph (repeat for B, C, etc.):
1. Perform a breadth-first search of the graph starting at node N
2. Determine the number of shortest paths from N to every other node
3. Based on these numbers, determine the amount of flow from N to all other nodes that uses each edge
Divide the summed flow on every edge by 2, since the sum includes flow from A to B and from B to A, etc.

Example
Compute the number of geodesics from every node to g.
Breadth-first search: a means for doing many things.
[Figure sequence: BFS from g on an example graph with nodes a to g. Each node is labeled with d (distance from g) and w (number of geodesics to g): g has d=0, w=1; nodes at d=1 have w=1; nodes at d=2 have w=2; the node at d=3 has w=4. Once every node is labeled: "Have all info we need for edge betweenness now". Later frames add edge flows such as 1/2, 2/4 and ½(1+2/4).]
Note: a and f are like leaves: no geodesic to g from other nodes passes through them.
Edge flow shown in the final frame: $\frac{1}{1}\left[1 + \tfrac{1}{2}\left(1 + \tfrac{2}{4}\right) + \tfrac{1}{2}\left(1 + \tfrac{2}{4}\right) + \tfrac{1}{2}\right]$

Edge betweenness for all edges can be computed in time $O(mn)$ ($m$ = #edges, $n$ = #nodes) [Newman 2001] (details soon).
Recalculating betweenness after every edge removal makes the full algorithm $O(m^2 n)$, so it is not feasible for large networks.
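The BFS-based procedure above (count shortest paths from each root, work up the tree splitting flow by parent path counts, halve the totals, then repeatedly remove the highest-betweenness edges) can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own code: it assumes an undirected, unweighted graph without self-loops, stored as a dict mapping each node to a set of neighbors. The function names `edge_betweenness`, `components` and `girvan_newman`, and the six-node example graph, are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph (dict: node -> set)."""
    flow = {frozenset((a, b)): 0.0 for a in adj for b in adj[a]}
    for root in adj:
        # Step 1: BFS from root, recording distance and shortest-path count.
        dist, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    paths[u] = 0
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    paths[u] += paths[v]  # every parent adds its path count
        # Step 2: work up the tree. Node flow = 1 + flow from child edges,
        # split among parents in proportion to their shortest-path counts.
        credit = {v: 1.0 for v in order}
        for v in reversed(order):
            for u in adj[v]:
                if dist[u] == dist[v] - 1:  # u is a parent of v
                    share = credit[v] * paths[u] / paths[v]
                    flow[frozenset((u, v))] += share
                    credit[u] += share
    # Each pair of nodes was counted once from each endpoint, so divide by 2.
    return {edge: f / 2 for edge, f in flow.items()}

def components(adj):
    """Connected components, in order of first appearance in adj."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj, k):
    """Remove highest-betweenness edges until there are at least k components."""
    adj = {v: set(nb) for v, nb in adj.items()}  # work on a copy
    while len(components(adj)) < k and any(adj.values()):
        bet = edge_betweenness(adj)  # recalculated after every removal round
        top = max(bet.values())
        for edge, value in bet.items():
            if top - value < 1e-9:  # remove all edges tied for the maximum
                u, v = tuple(edge)
                adj[u].discard(v)
                adj[v].discard(u)
    return components(adj)

# Hypothetical example: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bet = edge_betweenness(graph)
communities = girvan_newman(graph, 2)
```

In this example the bridge edge 2-3 lies on the shortest path of every pair that spans the two triangles, so it gets the highest betweenness and is removed first, leaving the two triangles as communities. The full recalculation inside the loop is what drives the cost toward $O(m^2 n)$.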