
08 link prediction (sbm)




Document information

Pages: 52
Size: 41.5 MB

Contents

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

[Liben-Nowell & Kleinberg '03]
- The link prediction task:
  - Given $G[t_0, t_0']$, a graph on edges up to time $t_0'$, output a ranked list $L$ of links (not in $G[t_0, t_0']$) that are predicted to appear in $G[t_1, t_1']$.
- Evaluation:
  - $n = |E_{new}|$: the number of new edges that appear during the test period $[t_1, t_1']$.
  - Take the top $n$ elements of $L$ and count the correct edges.

10/18/18 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

[Liben-Nowell & Kleinberg '03]
- Predict links in an evolving collaboration network.
- Core: because network data is very sparse,
  - consider only nodes with degree of at least 3;
  - we don't know enough about nodes with fewer than 3 edges to make good inferences.
- Methodology:
  - For each pair of nodes $(x, y)$, compute a score $c(x, y)$.
    - For example, $c(x, y)$ could be the number of common neighbors of $x$ and $y$.
  - Sort pairs $(x, y)$ by decreasing score $c(x, y)$.
    - Note: only consider/predict edges where both endpoints are in the core ($\deg \ge 3$).
  - Predict the top $n$ pairs as new links.
  - See which of these links actually appear in $G[t_1, t_1']$.

[Liben-Nowell & Kleinberg '03]
- Different scoring functions $c(x, y)$, where $\Gamma(x)$ denotes the set of neighbors of node $x$:
  - Graph distance: (negated) shortest path length
  - Common neighbors: $|\Gamma(x) \cap \Gamma(y)|$
  - Jaccard's coefficient: $|\Gamma(x) \cap \Gamma(y)| \,/\, |\Gamma(x) \cup \Gamma(y)|$
  - Adamic/Adar: $\sum_{z \in \Gamma(x) \cap \Gamma(y)} 1 / \log |\Gamma(z)|$
  - Preferential attachment: $|\Gamma(x)| \cdot |\Gamma(y)|$
  - PageRank: $r_x(y) + r_y(x)$, where $r_x(y)$ is the stationary distribution score of $y$ under the random walk that:
    - with probability 0.15, jumps to $x$;
    - with probability 0.85, goes to a random neighbor of the current node.
- Then, for a particular choice of $c(\cdot)$:
  - For every pair of nodes $(x, y)$, compute $c(x, y)$.
  - Sort pairs $(x, y)$ by decreasing score $c(x, y)$.
  - Predict the top $n$ pairs as new links.

[Liben-Nowell & Kleinberg '03]
Performance score: the fraction of new edges that are guessed correctly.

- Link prediction:
  - Local structure: link prediction via proximity
  - Global structure: Stochastic Blockmodels
    - Another way to predict links is to identify communities.
    - We can then calculate link probabilities within and between communities.

- We often think of networks as being organized into modules, clusters, or communities.
- Blockmodels:
  - Divide the nodes of the network into distinct sets, or "blocks", where all nodes in the same block have the same pattern of connection to nodes in other blocks.

J Leskovec, A Rajaraman, J Ullman: Mining of Massive Datasets, http://www.mmds.org

[Abbe'17]
- Weak recovery:
  - Weak recovery is not solvable if the graph does not have a giant component.
  - The Erdős-Rényi graph $G(n, p = c/n)$ has a giant component (i.e., a component of size linear in $n$) if and only if $c > 1$.
  - For $c > 1$, $G(n, c/n)$ will almost surely have a unique giant component containing a positive fraction of the vertices.
  - No other component will contain more than $O(\log n)$ vertices.

[Abbe'17]
- In $\mathrm{SSBM}(n, p = 1/k, a/n, b/n)$:
  - The expected size of each group is $n/k$.
  - Each vertex has in expectation
    - $a/k$ neighbors in its own group, and
    - $b/k$ neighbors in each of the other groups.
  - Expected degree $= \frac{a}{k} + (k-1)\frac{b}{k} = \frac{a + (k-1)b}{k}$.
- Connectivity:
  - $\mathrm{SSBM}(n, 1/k, a \log(n)/n, b \log(n)/n)$ is connected with high probability if and only if $\frac{a + (k-1)b}{k} > 1$.
  - $\mathrm{SBM}(n, p, W \log(n)/n)$ is connected with high probability if $\min_i \sum_j p_j W_{ij} > 1$, i.e., if every community's expected degree exceeds $\log n$.
- Weak recovery:
  - $\mathrm{SSBM}(n, 1/k, a/n, b/n)$ has a giant component (i.e., a component of size linear in $n$) if and only if $d := \frac{a + (k-1)b}{k} > 1$.

[Abbe'17]
- The fundamental limit for exact recovery:
  - Exact recovery in $\mathrm{SSBM}(n, 1/2, a \ln(n)/n, b \ln(n)/n)$ is solvable, and efficiently so, if $|\sqrt{a} - \sqrt{b}| > \sqrt{2}$.
  - Note that $|\sqrt{a} - \sqrt{b}| > \sqrt{2} \iff \frac{a+b}{2} > 1 + \sqrt{ab}$.
  - Recall that $\frac{a+b}{2} > 1$ is the connectivity requirement in SSBM; the extra $\sqrt{ab}$ is required to go from connectivity to exact recovery.
- Figure: two communities with $W_{11} = W_{22} = a\ln(n)/n$ and $W_{12} = b\ln(n)/n$; exact recovery needs $\frac{a+b}{2} > 1 + \sqrt{ab}$.

[Abbe'17]
- The fundamental limit for exact recovery:
  - Exact recovery in $\mathrm{SBM}(n, p, \ln(n)\,Q/n)$ is solvable, and efficiently so, if
    $I_+(p, Q) := \min_{i < j} D_+\big((\mathrm{diag}(p)Q)_i \,\|\, (\mathrm{diag}(p)Q)_j\big) > 1$.
  - $D_+ = \max_{t \in [0,1]} D_t$ is the Chernoff-Hellinger (CH) divergence, where
    $D_t(\mu \| \nu) = \sum_x \nu(x)\, f_t\big(\mu(x)/\nu(x)\big)$ and $f_t(y) := 1 - t + t y - y^t$.
  - $D_+$ is a distance notion between communities.
  - Intuitively, the further "apart" the community profiles are, the easier it should be to distinguish the communities.

[Abbe'17]
- The fundamental limit for weak recovery:
  - It is possible to detect communities if SNR > 1 (the Kesten-Stigum (KS) threshold).
  - SNR: expected number of in-block edges divided by the expected number of out-block edges.
  - $\mathrm{SSBM}(n, 1/k, a/n, b/n)$: $\mathrm{SNR} = \frac{(a-b)^2}{k(a + (k-1)b)}$
  - $\mathrm{SBM}(n, p, Q/n)$: let $|\lambda_1| \ge |\lambda_2| \ge |\lambda_3| \ge \dots$ be eigenvalues of diag(
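The Liben-Nowell & Kleinberg methodology (restrict to the degree-3 core, score every non-adjacent pair with some $c(x, y)$, sort, keep the top $n$, then count hits against $E_{new}$) can be sketched in plain Python. This is a minimal sketch. It assumes an undirected graph stored as a dict of neighbor sets with sortable node ids. The helper names (`build_adj`, `predict_links`, `fraction_correct`, and so on) are hypothetical and do not come from the slides or from any library.

```python
import math
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets Γ(v) from an edge list (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    # Every common neighbor z is adjacent to both x and y, so |Γ(z)| >= 2 and log|Γ(z)| > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    return len(adj[x]) * len(adj[y])


def neg_distance(adj, x, y):
    """Negated shortest-path length via BFS; -inf if y is unreachable."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return -dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return float("-inf")


def predict_links(adj, score, n, min_degree=3):
    """Score non-adjacent pairs inside the core, sort by decreasing score, keep the top n."""
    core = sorted(v for v in adj if len(adj[v]) >= min_degree)
    pairs = [(x, y) for x, y in combinations(core, 2) if y not in adj[x]]
    pairs.sort(key=lambda xy: score(adj, *xy), reverse=True)
    return pairs[:n]


def fraction_correct(predicted, new_edges):
    """Fraction of the new edges E_new that appear among the predictions."""
    truth = {frozenset(e) for e in new_edges}
    hits = sum(frozenset(p) in truth for p in predicted)
    return hits / len(truth) if truth else 0.0


if __name__ == "__main__":
    adj = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (2, 5)])
    print(predict_links(adj, common_neighbors, n=1))  # [(2, 3)]
```

Node 5 has degree 2, so it is left out of the core. Among the remaining non-adjacent pairs, (2, 3) shares three neighbors and is ranked first. Any of the other scoring functions can be passed as `score` in the same way.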
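The PageRank score $r_x(y) + r_y(x)$ in the scoring-function list uses a walk that restarts at $x$ with probability 0.15 and otherwise moves to a random neighbor. A minimal power-iteration sketch is shown below. It assumes the same dict-of-neighbor-sets graph representation and a fixed iteration count in place of a convergence test. The handling of isolated nodes (sending their mass back to $x$) is our own assumption, since the slides do not cover that case.

```python
def rooted_pagerank(adj, x, restart=0.15, iters=100):
    """Power iteration for r_x: with probability `restart` jump to x,
    otherwise move to a uniformly random neighbor of the current node."""
    r = {v: 0.0 for v in adj}
    r[x] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[x] += restart  # total mass is 1, so the jump contributes `restart` to x
        for u, mass in r.items():
            if adj[u]:
                share = (1.0 - restart) * mass / len(adj[u])
                for w in adj[u]:
                    nxt[w] += share
            else:
                # Assumption: an isolated node sends its walk back to x.
                nxt[x] += (1.0 - restart) * mass
        r = nxt
    return r


def pagerank_score(adj, x, y, restart=0.15, iters=100):
    """Symmetric link score c(x, y) = r_x(y) + r_y(x)."""
    return (rooted_pagerank(adj, x, restart, iters)[y]
            + rooted_pagerank(adj, y, restart, iters)[x])
```

Each step contracts the error by a factor of 0.85, so 100 iterations leave an error below about 1e-7. On a single edge {0, 1}, solving $r_0 = 0.15 + 0.85\,r_1$ and $r_1 = 0.85\,r_0$ gives $r_0(1) = 0.1275/0.2775 \approx 0.459$.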
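The slides note that once communities are identified, "we can then calculate link probabilities within and between communities." One simple way to do that is to estimate each block-pair probability as observed edges divided by possible node pairs (the maximum-likelihood estimate for a Bernoulli blockmodel). Absent pairs are then ranked by the estimate for their block pair. This sketch assumes the block assignment is already given, so community detection itself is out of scope. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def block_probabilities(edges, block):
    """Estimated edge probability for each block pair (r, s), r <= s:
    observed edges between r and s divided by the number of possible node pairs."""
    sizes = Counter(block.values())
    observed = Counter()
    for u, v in edges:
        r, s = sorted((block[u], block[v]))
        observed[(r, s)] += 1
    probs = {}
    for r, s in combinations(sorted(sizes), 2):
        probs[(r, s)] = observed[(r, s)] / (sizes[r] * sizes[s])
    for r in sizes:
        pairs = sizes[r] * (sizes[r] - 1) // 2
        probs[(r, r)] = observed[(r, r)] / pairs if pairs else 0.0
    return probs


def rank_non_edges(edges, block):
    """Score every absent pair by its block-pair probability, highest first."""
    probs = block_probabilities(edges, block)
    present = {frozenset(e) for e in edges}
    scored = []
    for x, y in combinations(sorted(block), 2):
        if frozenset((x, y)) not in present:
            r, s = sorted((block[x], block[y]))
            scored.append(((x, y), probs[(r, s)]))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Unlike the proximity scores, this model assigns the same probability to every pair in a given block pair. It predicts links from the global community structure rather than from local neighborhoods.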
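The giant-component threshold $c > 1$ for $G(n, c/n)$ can be checked numerically. The sketch below pairs a simulation (union-find over a sampled graph) with the standard fixed-point equation $S = 1 - e^{-cS}$ for the limiting giant-component fraction. That equation is a well-known result but does not appear on the slides. The function names, the fixed seed, and the graph sizes are illustrative assumptions.

```python
import math
import random


def giant_fraction_theory(c, iters=500):
    """Limiting fraction S of vertices in the giant component of G(n, c/n),
    via fixed-point iteration on S = 1 - exp(-c * S) starting from S = 1."""
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-c * s)
    return s  # tends to 0 for c <= 1 (slowly at c = 1), positive root for c > 1


def largest_component_fraction(n, c, seed=0):
    """Sample G(n, p = c/n) and return |largest component| / n using union-find."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                parent[find(u)] = find(v)

    sizes = {}
    for u in range(n):
        root = find(u)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n
```

For $c = 2$ the fixed point is about 0.797, and a sampled graph with $n = 1000$ lands near that value. For $c = 0.5$ the largest component stays a tiny fraction of the vertices.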
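The symmetric-SBM conditions on the slides are closed-form inequalities in $a$, $b$, and $k$. Evaluating them side by side shows how they relate:

- expected degree $(a + (k-1)b)/k$ (giant component or connectivity, depending on the scaling);
- $|\sqrt{a} - \sqrt{b}| > \sqrt{2}$ (exact recovery, two communities);
- $\mathrm{SNR} = (a-b)^2 / (k(a + (k-1)b)) > 1$ (KS threshold).

These are asymptotic statements, so the helpers below (hypothetical names) only evaluate the inequalities and do not simulate anything.

```python
import math


def ssbm_expected_degree(a, b, k):
    """Expected degree (a + (k - 1) b) / k in SSBM(n, 1/k, a/n, b/n)."""
    return (a + (k - 1) * b) / k


def has_giant_component(a, b, k):
    """Sparse regime a/n, b/n: giant component iff d = (a + (k - 1) b) / k > 1.
    In the a log(n)/n, b log(n)/n regime the same inequality is the connectivity condition."""
    return ssbm_expected_degree(a, b, k) > 1


def exact_recovery_two_blocks(a, b):
    """SSBM(n, 1/2, a ln(n)/n, b ln(n)/n): exact recovery iff |sqrt(a) - sqrt(b)| > sqrt(2)."""
    return abs(math.sqrt(a) - math.sqrt(b)) > math.sqrt(2)


def ks_snr(a, b, k):
    """Kesten-Stigum SNR (a - b)^2 / (k (a + (k - 1) b)) for SSBM(n, 1/k, a/n, b/n)."""
    return (a - b) ** 2 / (k * (a + (k - 1) * b))


def above_ks_threshold(a, b, k):
    """Communities are detectable when SNR > 1."""
    return ks_snr(a, b, k) > 1
```

Take $a = 4, b = 1$ in the logarithmic regime. Here $(a + b)/2 = 2.5 > 1$, so the graph is connected. However, $|\sqrt{4} - \sqrt{1}| = 1 < \sqrt{2}$, so exact recovery fails. This is the gap that the extra $\sqrt{ab}$ term accounts for.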
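The Chernoff-Hellinger condition $I_+(p, Q) > 1$ can be evaluated numerically. This sketch makes three assumptions:

- $(\mathrm{diag}(p)Q)_i$ denotes the profile of community $i$, i.e., the vector $(p_j Q_{ij})_j$ (the $i$-th column of $\mathrm{diag}(p)Q$ for symmetric $Q$);
- all profile entries are strictly positive;
- the maximum over $t$ is approximated on a grid, which is our simplification rather than anything on the slides. Because $D_t$ is concave in $t$, a fine grid gets close to the true maximum.

The function names are hypothetical.

```python
def ch_divergence(mu, nu, grid=1000):
    """D_+(mu || nu) = max over t in [0, 1] of sum_x nu(x) f_t(mu(x) / nu(x)),
    with f_t(y) = 1 - t + t*y - y**t, maximized over t = 0, 1/grid, ..., 1."""
    best = 0.0  # D_0 = 0 for strictly positive entries
    for i in range(grid + 1):
        t = i / grid
        # nu * f_t(mu / nu) expands to (1 - t) nu + t mu - mu^t nu^(1 - t)
        d_t = sum((1 - t) * v + t * m - m ** t * v ** (1 - t) for m, v in zip(mu, nu))
        best = max(best, d_t)
    return best


def community_profiles(p, Q):
    """Profile of community i: entries p_j * Q[i][j] over communities j."""
    k = len(p)
    return [[p[j] * Q[i][j] for j in range(k)] for i in range(k)]


def exact_recovery_margin(p, Q, grid=1000):
    """I_+(p, Q) = min over i < j of D_+ between community profiles; > 1 means solvable."""
    prof = community_profiles(p, Q)
    k = len(p)
    return min(ch_divergence(prof[i], prof[j], grid)
               for i in range(k) for j in range(i + 1, k))
```

Consider two equal communities with $Q = [[a, b], [b, a]]$. Here the maximum is at $t = 1/2$ and $I_+ = (a+b)/2 - \sqrt{ab} = (\sqrt{a} - \sqrt{b})^2/2$. So $I_+ > 1$ reproduces the SSBM exact-recovery condition $|\sqrt{a} - \sqrt{b}| > \sqrt{2}$. For example, $a = 9, b = 1$ gives $I_+ = 2$.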

Posted: 26/07/2023, 19:36
