
02 Measuring Networks and the Random Graph Model


DOCUMENT INFORMATION

Basic information

Format
Pages: 60
File size: 33.15 MB

Content

CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

- Degree distribution: P(k)
- Path length: h
- Clustering coefficient: C
- Connected components: s

10/3/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

- Degree distribution P(k): probability that a randomly chosen node has degree k
  - $N_k$ = number of nodes with degree k
- Normalized histogram: $P(k) = N_k / N$, then plot P(k) against k

[Figure: histogram of $N_k$ and the normalized distribution P(k), both plotted against degree k]

- A path is a sequence of nodes in which each node is linked to the next one:
  - $P_n = \{i_0, i_1, i_2, \dots, i_n\}$
  - $P_n = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \dots, (i_{n-1}, i_n)\}$
- A path can intersect itself and pass through the same edge multiple times
  - E.g.: ACBDCDEG
  - In a directed graph a path can only follow the direction of the "arrow"

[Figure: example graph with nodes A, B, C, D, E, F, G, H, and X]

Distance (shortest path, geodesic)
- The distance between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes
  - If the two nodes are not connected, the distance is usually defined as infinite
  - Example from the figure: $h_{A,X} = \infty$ [the value of $h_{B,D}$ is missing in the source]
- In directed graphs paths need to follow the direction of the arrows
  - Example from the figure: $h_{B,C} = 1$ [the value of $h_{C,B}$ is missing in the source]
  - Consequence: distance is not symmetric, $h_{B,C} \neq h_{C,B}$

- Diameter: the maximum (shortest path) distance between any pair of nodes in a graph
- Average path length for a connected graph (component) or a strongly connected (component of a) directed graph:
  $\bar{h} = \frac{1}{2 E_{max}} \sum_{i, j \neq i} h_{ij}$
  - $h_{ij}$ is the distance from node i to node j
  - $E_{max}$ is the maximum number of edges (total number of node pairs), $E_{max} = n(n-1)/2$
  - The sum runs over ordered pairs, so it has $n(n-1) = 2 E_{max}$ terms
  - Many times we compute the average only over the connected pairs of nodes (that is, we ignore "infinite" length paths)
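The measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the slides: the adjacency-dict representation, the function names, and the four-node path graph are all hypothetical. The average is taken over connected ordered pairs, which matches the formula above when the graph is connected.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Return {k: P(k)} with P(k) = N_k / N for an undirected graph."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


def bfs_distances(adj, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_stats(adj):
    """Return (diameter, average path length) over connected ordered pairs.

    Unreachable pairs ("infinite" distance) are ignored.
    """
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, (total / pairs if pairs else 0.0)


# Hypothetical toy graph: the path A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(degree_distribution(adj))
print(path_stats(adj))
```

Running one BFS per node costs O(N(N + E)) in total, which is fine for small graphs; large networks usually need sampling of source nodes instead.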
- Clustering coefficient: what portion of i's neighbors are connected?
  - Node i with degree $k_i$
  - $C_i \in [0, 1]$
  - $C_i = \frac{2 e_i}{k_i (k_i - 1)}$, where $e_i$ is the number of edges between the neighbors of node i
- Average clustering coefficient: $C = \frac{1}{N} \sum_{i}^{N} C_i$

[Figure: example graph with nodes A, B, C, D, E, F, G, H]
- $k_B = 2$, $e_B = 1$, $C_B = 2/2 = 1$
- $k_D = 4$, $e_D = 2$, $C_D = 4/12 = 1/3$
- Average clustering: C = 0.33

- Size of the largest connected component
  - Largest set where any two vertices can be joined by a path
- Largest component = giant component
- How to find connected components:
  - Start from a random node and perform Breadth First Search (BFS)
  - Label the nodes that BFS visited
  - If all nodes are visited, the network is connected
  - Otherwise find an unvisited node and repeat BFS

[Figure: example graph with nodes A, B, C, D, F, G, H]

- Assume each human is connected to 100 other people. Then:
  - Step 1: reach 100 people
  - Step 2: reach 100*100 = 10,000 people
  - Step 3: reach 100*100*100 = 1,000,000 people
  - Step 4: reach 100*100*100*100 = 100M people
  - In 5 steps we can reach 10 billion people
- What's wrong here? We ignore clustering!
  - Not all edges point to new people
  - 92% of FB friendships happen through a friend-of-a-friend

The MSN network has clustering that is orders of magnitude larger than that of the corresponding $G_{np}$!
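The clustering-coefficient definition and the BFS procedure for connected components described above can be sketched as follows. The adjacency-dict representation, the function names, and the toy graph are assumptions made for illustration. Treating nodes of degree below 2 as having $C_i = 0$ is also a convention assumed here, since the slides do not say how such nodes are handled.

```python
from collections import deque


def local_clustering(adj, i):
    """C_i = 2 * e_i / (k_i * (k_i - 1)); taken to be 0 when k_i < 2 (assumed convention)."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # e_i: number of edges among the neighbors of i.
    e = sum(
        1
        for a in range(k)
        for b in range(a + 1, k)
        if nbrs[b] in adj[nbrs[a]]
    )
    return 2 * e / (k * (k - 1))


def average_clustering(adj):
    """C = (1/N) * sum of C_i over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


def connected_components(adj):
    """Repeated BFS: start at an unvisited node, label everything it reaches, repeat."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    component.append(v)
                    queue.append(v)
        components.append(component)
    return components


# Hypothetical toy graph: triangle A-B-C, pendant node D on C, separate edge E-F.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C"}, "E": {"F"}, "F": {"E"},
}
largest = max(connected_components(adj), key=len)
print(local_clustering(adj, "C"), average_clustering(adj), sorted(largest))
```

The size of the largest list returned by `connected_components` is the size of the largest component.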
- Other examples:
  - Actor collaborations (IMDB): N = 225,226 nodes, average degree k = 61
  - Electrical power grid: N = 4,941 nodes, k = 2.67
  - Network of neurons: N = 282 nodes, k = 14

Network       h_actual   h_random   C_random
Film actors   3.65       2.99       0.00027
Power grid    18.70      12.40      0.005
C. elegans    2.65       2.25       0.05

- h: average shortest path length
- C: average clustering coefficient
- "actual": the real network
- "random": a random graph with the same average degree

- Consequence of expansion:
  - Short paths: O(log n)
  - This is the smallest diameter we can get if we have a constant degree
  - But clustering is low!
- But networks have "local" structure:
  - Triadic closure: a friend of a friend is my friend
  - High clustering, but the diameter is also high
- How can we have both?

[Figure: two example networks, one with low diameter and low clustering coefficient, one with high clustering coefficient and high diameter]

- Could a network with high clustering also be a small world ($\log n$ diameter)?
  - How can we at the same time have high clustering and small diameter?
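As a rough cross-check of the "random" columns for the three networks listed above, the sketch below evaluates two standard back-of-the-envelope approximations for a random graph with N nodes and average degree k: clustering of about k / N and average path length of about log N / log k. Using these approximations here is an assumption of this example, and the function names are hypothetical. The estimates are coarse and need not reproduce the tabulated values: they land close for the film-actor network, while for the power grid they come out noticeably different.

```python
import math


def c_random(n, k):
    """Approximate clustering of a random graph: C ~ k / N."""
    return k / n


def h_random(n, k):
    """Approximate average path length of a random graph: h ~ log N / log k."""
    return math.log(n) / math.log(k)


# (N, average degree k) as listed on the slide.
networks = {
    "Film actors": (225_226, 61),
    "Power grid": (4_941, 2.67),
    "C. elegans": (282, 14),
}

for name, (n, k) in networks.items():
    print(f"{name}: C_random ~ {c_random(n, k):.5f}, h_random ~ {h_random(n, k):.2f}")
```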
[Figure: two example networks, one with high clustering and high diameter, one with low clustering and low diameter]
  - Clustering implies edge "locality"
  - Randomness enables "shortcuts"

Small-World Model [Watts-Strogatz '98]
Two components to the model:
- (1) Start with a low-dimensional regular lattice
  - (In our case we are using a ring as a lattice)
  - Has high clustering coefficient
- Now introduce randomness ("shortcuts")
- (2) Rewire:
  - Add/remove edges to create shortcuts to join remote parts of the lattice
  - For each edge, with probability p move the other end to a random node

[Figure: three networks, from regular lattice to random graph]
- Regular lattice: high clustering, high diameter; $h = \frac{N}{2k}$, $C = \frac{3}{4}$
- Small world: high clustering, low diameter
- Random graph: low clustering, low diameter; $h = \frac{\log N}{\log \alpha}$, $C = \frac{k}{N}$
Rewiring allows us to "interpolate" between a regular lattice and a random graph.

[Figure: (scaled) average path length and clustering coefficient $C = \frac{1}{N} \sum C_i$ plotted against the probability of rewiring p, with the parameter region of high clustering and low path length marked]
Intuition: it takes a lot of randomness to ruin the clustering, but a very small amount to create shortcuts.

- Could a network with high clustering be at the same time a small world?
  - Yes! You don't need more than a few random links
- The Watts-Strogatz model:
  - Provides insight on the interplay between clustering and the small-world
  - Captures the structure of many realistic networks
  - Accounts for the high clustering of real networks
  - Does not lead to the correct degree distribution

- What mechanisms do people use to navigate networks and find the target?
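The two steps of the Watts-Strogatz model described above (ring lattice, then rewiring each edge with probability p) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name, the parameters `n`, `k`, `p`, `seed`, and the choice to forbid self-loops and duplicate edges when picking the new endpoint are assumptions of this example.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring of n nodes, each joined to its k nearest neighbors (k even, k < n),
    then each lattice edge is rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}

    # Step 1: regular ring lattice.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)

    # Step 2: for each lattice edge (i, j), with probability p keep i
    # and move the other end to a random node.
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                # Assumed rule: no self-loops, no duplicate edges.
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].remove(j)
                    adj[j].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


lattice = watts_strogatz(20, 4, 0.0, seed=1)       # p = 0: pure ring lattice
small_world = watts_strogatz(20, 4, 0.1, seed=1)   # a few shortcuts
randomized = watts_strogatz(20, 4, 1.0, seed=1)    # every edge rewired
```

Rewiring preserves the number of edges, so the average degree stays at k for every p; only the placement of edges changes.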
The setting:
- s only knows the locations of its friends and the location of the target t
- s does not know the links of anyone else but itself
- Geographic navigation: s "navigates" to a node geographically closest to t
- Search time T: number of steps to reach t

[Figure: source node s and target node t]

- Searchable: search time $T = O((\log n)^{\beta})$
  - Kleinberg's model: $O((\log n)^{2})$
- Not searchable: search time $T = O(n^{\alpha})$
  - Watts-Strogatz: $O(n^{\alpha})$ [exponent not recoverable from the source]
  - Erdős–Rényi: $O(n)$
Note: we know these graphs have diameter O(log n). So in Kleinberg's model search time is polynomial in log n, while in Watts-Strogatz it is exponential (in log n).

- Watts-Strogatz graphs are not searchable
- How do we make a searchable small-world graph?
- Intuition:
  - Our long range links are not random
  - They follow geography!

[Figure: Saul Steinberg, "View of the World from 9th Avenue"]

Model [Kleinberg, Nature '01]
- Nodes still on a grid
- Each node has one long range link
- Probability of a long link to node v: $P(u \to v) \sim d(u,v)^{-\alpha}$
  - d(u,v): grid distance between u and v
  - α: parameter, $\alpha \geq 0$
- Normalized: $P(u \to v) = \frac{d(u,v)^{-\alpha}}{\sum_{w \neq u} d(u,w)^{-\alpha}}$

[Figure: $P(u \to v)$ as a function of d for α = 0, α = 1, and α >> d]

- We know:
  - α = 0 (i.e., Watts-Strogatz): the number of steps is polynomial in n [exact bound not recoverable from the source]
  - α = 1: we need $O(\log(n)^2)$ steps

[Figure 20.6, from Chapter 20, "The Small-World Phenomenon": simulation of decentralized search in the grid-based model with clustering exponent q. Each point is the average of 1000 runs on (a slight variant of) a grid with 400 [caption truncated in source]. Axes: exponent α (q) against search time ln T.]

- Small α: too many long links
- Big α: too many short links
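The long-link distribution and the geographic navigation rule described above can be sketched on a small two-dimensional grid. This is an illustration under stated assumptions rather than Kleinberg's exact construction: the function names, the use of Manhattan distance as the grid distance, a grid without wrap-around, and the choice of four local neighbors per node are all assumptions of this example.

```python
import random


def grid_distance(u, v):
    """Manhattan distance between grid points u and v (assumed metric)."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_link_probs(u, nodes, alpha):
    """P(u -> v) = d(u,v)^(-alpha) / sum over w != u of d(u,w)^(-alpha)."""
    weights = {w: grid_distance(u, w) ** (-alpha) for w in nodes if w != u}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def build_grid(side, alpha, seed=None):
    """Each node gets its grid neighbors plus one long range link."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(side) for y in range(side)]
    links = {}
    for u in nodes:
        x, y = u
        local = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < side and 0 <= y + dy < side
        ]
        probs = long_link_probs(u, nodes, alpha)
        targets = list(probs)
        long_link = rng.choices(targets, weights=[probs[w] for w in targets])[0]
        links[u] = local + [long_link]
    return links


def greedy_route(links, s, t):
    """Geographic navigation: always step to the known contact closest to t.

    Returns the search time T (number of steps). A grid neighbor is always
    strictly closer to t, so the loop terminates.
    """
    steps, current = 0, s
    while current != t:
        current = min(links[current], key=lambda v: grid_distance(v, t))
        steps += 1
    return steps


links = build_grid(side=10, alpha=1.0, seed=0)
print(greedy_route(links, (0, 0), (9, 9)))
```

Averaging `greedy_route` over many random source and target pairs for several values of `alpha` gives a small-scale version of the search-time experiment summarized in the figure, although a 10 by 10 grid is far too small to show the asymptotic behavior.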

Posted: 26/07/2023, 19:35
