08 Link Prediction (SBM)

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

[LibenNowell-Kleinberg '03]
- The link prediction task:
  - Given $G[t_0, t_0']$, a graph on edges up to time $t_0'$, output a ranked list $L$ of links (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$
- Evaluation:
  - $n = |E_{new}|$: number of new edges that appear during the test period $[t_1, t_1']$
  - Take the top $n$ elements of $L$ and count the correct edges

10/18/18 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

[LibenNowell-Kleinberg '03]
- Predict links in an evolving collaboration network
- Core: because network data is very sparse
  - Consider only nodes with degree of at least 3
  - We don't know enough about nodes with fewer than 3 edges to make good inferences

- Methodology:
  - For each pair of nodes $(x, y)$, compute a score $c(x, y)$
  - For example, $c(x, y)$ could be the number of common neighbors of $x$ and $y$
  - Sort the pairs $(x, y)$ by decreasing score $c(x, y)$
  - Note: only consider/predict edges where both endpoints are in the core (deg ≥ 3)
  - Predict the top $n$ pairs as new links
  - See which of these links actually appear in $G[t_1, t_1']$

[LibenNowell-Kleinberg '03]
- Different scoring functions $c(x, y) =$
  - Graph distance: (negated) shortest path length
  - Common neighbors: $|\Gamma(x) \cap \Gamma(y)|$
  - Jaccard's coefficient: $|\Gamma(x) \cap \Gamma(y)| / |\Gamma(x) \cup \Gamma(y)|$
  - Adamic/Adar: $\sum_{z \in \Gamma(x) \cap \Gamma(y)} 1 / \log |\Gamma(z)|$
  - Preferential attachment: $|\Gamma(x)| \cdot |\Gamma(y)|$
  - PageRank: $r_x(y) + r_y(x)$
    - $r_x(y)$ … stationary distribution score of $y$ under the random walk:
    - with prob. 0.15, jump to $x$
    - with prob. 0.85, go to a random neighbor of the current node
  - $\Gamma(x)$ … neighbors of node $x$
- Then, for a particular choice of $c(\cdot)$
  - For every pair of nodes $(x, y)$, compute $c(x, y)$
  - Sort the pairs $(x, y)$ by decreasing score $c(x, y)$
  - Predict the top $n$ pairs as new links

[LibenNowell-Kleinberg '03]
Performance score: fraction of new edges that are guessed correctly

- Link prediction
  - Local structure: link prediction via proximity
  - Global structure: Stochastic Blockmodels
  - Another way to predict links is to identify communities
  - We can then calculate link probabilities within and between communities

- We often think of networks as being organized into modules, clusters, communities
- Blockmodels:
  - Divide the nodes of the network into distinct sets, or "blocks", where all nodes in the same block have the same pattern of connection to nodes in other blocks

J Leskovec, A Rajaraman, J Ullman: Mining of Massive Datasets, http://www.mmds.org

[Abbe'17]
- Weak recovery:
  - Weak recovery is not solvable if the graph does not have a giant component
  - The Erdős-Rényi graph $G(n, p = c/n)$ has a giant component (i.e., a component of size linear in $n$) if and only if $c > 1$
  - For $c > 1$, $G(n, c/n)$ will almost surely have a unique giant component containing a positive fraction of the vertices
  - No other component will contain more than $O(\log n)$ vertices

[Abbe'17]
- In $SSBM(n, p = 1/k, a/n, b/n)$
  - The expected size of each group is $n/k$
  - Each vertex has in expectation
    - $a/k$ neighbors in its own group, and
    - $b/k$ in each of the other groups
  - Expected degree $= \frac{a + (k-1)b}{k}$
- $SSBM(n, k, a\log(n)/n, b\log(n)/n)$ is connected with high probability if and only if $d = (a + (k-1)b)/k > 1$
  - $SBM(n, p, Q\log(n)/n)$ is connected with high probability if $\min_{i \in [k]} \|(\mathrm{diag}(p)Q)_i\|_1 > 1$
- Weak recovery:
  - $SSBM(n, k, a/n, b/n)$ has a giant component (i.e., a component of size linear in $n$) if and only if $d := (a + (k-1)b)/k > 1$

[Abbe'17]
- The fundamental limit for exact recovery
  - Exact recovery in $SSBM(n, 1/2, a\ln(n)/n, b\ln(n)/n)$ is solvable and efficiently so if $|\sqrt{a} - \sqrt{b}| > \sqrt{2}$
  - Note that $|\sqrt{a} - \sqrt{b}| > \sqrt{2} \implies \frac{a+b}{2} > 1 + \sqrt{ab}$
  - Recall that $\frac{a+b}{2} > 1$ is the connectivity requirement in SSBM
  - $\sqrt{ab}$ is required to go from connectivity to exact recovery
  - Figure labels: $Q_{11} = a\ln(n)/n$, $Q_{22} = a\ln(n)/n$, $Q_{12} = b\ln(n)/n$
  - Exact recovery needs: $\frac{a+b}{2} > 1 + \sqrt{ab}$

[Abbe'17]
- The fundamental limit for exact recovery
  - Exact recovery in $SBM(n, p, \ln(n)\,Q/n)$ is solvable and efficiently so if
    $I_+(p, Q) := \min_{i < j} D_+\big((\mathrm{diag}(p)Q)_i \,\|\, (\mathrm{diag}(p)Q)_j\big) > 1$
  - $D_+ = \max_{t \in [0,1]} D_t$
  - Chernoff-Hellinger (CH) divergence:
    $D_+(\mu \| \nu) = \max_{t \in [0,1]} \sum_x \nu(x)\, f_t(\mu(x)/\nu(x))$, where $f_t(y) := 1 - t + ty - y^t$
  - $D_+$ is a distance notion between communities
  - Intuitively: the further "apart" the community profiles are, the easier it should be to distinguish the communities

[Abbe'17]
- The fundamental limit for weak recovery
  - It is possible to detect communities if $SNR > 1$ (Kesten-Stigum (KS) threshold)
  - SNR: expected number of in-block edges divided by the expected number of out-block edges
  - $SSBM(n, 1/k, a/n, b/n)$
    - $SNR = \frac{(a-b)^2}{k(a + (k-1)b)}$
  - $SBM(n, p, Q/n)$
    - Let $|\lambda_1| \ge |\lambda_2| \ge |\lambda_3| \ge \dots$ be the eigenvalues of diag(
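
The threshold conditions on the [Abbe'17] slides reduce to simple arithmetic on a, b, and k. The following is a minimal Python sketch, assuming the symmetric SBM with k equal-sized communities and the standard forms of the thresholds (expected degree (a + (k-1)b)/k, exact-recovery condition |sqrt(a) - sqrt(b)| > sqrt(2) for two communities, and SNR = (a-b)^2 / (k(a + (k-1)b))). The function names are hypothetical and do not come from the lecture.

```python
import math


def ssbm_expected_degree(a: float, b: float, k: int) -> float:
    """Expected degree d = (a + (k-1)b)/k in SSBM(n, k, a/n, b/n)."""
    return (a + (k - 1) * b) / k


def ssbm_has_giant_component(a: float, b: float, k: int) -> bool:
    """Sparse regime (edge probabilities a/n, b/n): giant component iff d > 1."""
    return ssbm_expected_degree(a, b, k) > 1


def ssbm_connected_whp(a: float, b: float, k: int) -> bool:
    """Log regime (a log(n)/n, b log(n)/n): connected w.h.p. iff d > 1."""
    return ssbm_expected_degree(a, b, k) > 1


def ssbm2_exact_recovery(a: float, b: float) -> bool:
    """Two communities, log regime: solvable if |sqrt(a) - sqrt(b)| > sqrt(2)."""
    return abs(math.sqrt(a) - math.sqrt(b)) > math.sqrt(2)


def ssbm_snr(a: float, b: float, k: int) -> float:
    """Signal-to-noise ratio for weak recovery in SSBM(n, k, a/n, b/n)."""
    return (a - b) ** 2 / (k * (a + (k - 1) * b))


def ssbm_detectable(a: float, b: float, k: int) -> bool:
    """Kesten-Stigum threshold: detection is possible if SNR > 1."""
    return ssbm_snr(a, b, k) > 1


# Illustrative parameter values (assumed, not from the slides).
print(ssbm_expected_degree(5, 1, 2), ssbm_detectable(5, 1, 2))
print(ssbm2_exact_recovery(9, 1), ssbm2_exact_recovery(4, 1))
```

Squaring the exact-recovery condition gives a + b - 2 sqrt(ab) > 2, which is the same as (a+b)/2 > 1 + sqrt(ab), so the check can be written in either form.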

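
The proximity-based methodology from [LibenNowell-Kleinberg '03] in these notes (score every candidate pair, sort by decreasing score, predict the top n, then count hits) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the graph is assumed to be simple and undirected, stored as a dict of neighbor sets, and all function names and the toy edge list are hypothetical.

```python
import math
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    # A common neighbor z touches both x and y, so deg(z) >= 2 and log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    return len(adj[x]) * len(adj[y])


def predict_links(adj, score, n, min_degree=3):
    """Rank non-adjacent core pairs by score and return the top n."""
    core = sorted(v for v in adj if len(adj[v]) >= min_degree)
    candidates = [(x, y) for x, y in combinations(core, 2) if y not in adj[x]]
    candidates.sort(key=lambda pair: score(adj, *pair), reverse=True)
    return candidates[:n]


def fraction_correct(predicted, new_edges):
    """Fraction of the new edges that appear among the predictions."""
    truth = {frozenset(e) for e in new_edges}
    if not truth:
        return 0.0
    hits = sum(frozenset(p) in truth for p in predicted)
    return hits / len(truth)


# Toy training graph and test-period edges (hypothetical data).
train_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
               ("b", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
new_edges = [("c", "d")]

adj = build_adjacency(train_edges)
top = predict_links(adj, common_neighbors, n=len(new_edges))
print(top, fraction_correct(top, new_edges))
```

Passing the scoring function as a parameter mirrors the slides' "for a particular choice of c(·)" step: swapping `common_neighbors` for `jaccard` or `adamic_adar` changes only the ranking, not the pipeline.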