
Chapter 7: LINK PREDICTION

CONTENTS
I. Link prediction based on similarity
II. Predicting positive and negative links in online social networks

Link Prediction
• Will nodes 33 and 28 become friends in the future? What about nodes 27 and 4?
• Does network structure contain enough information to predict what new links will form in the future?
[Figure: “Who to follow”]

Strength of social ties (review)
• Strong ties
  – Surrounded by many mutual friends
  – Characterized by lots of shared time together
• Weak ties
  – Have few mutual friends
  – Serve as bridges to diverse parts of the network
  – Provide access to novel information

The Link-Prediction Problem for Social Networks (Liben-Nowell & Kleinberg)
To what extent can the evolution of a social network be modeled using features intrinsic to the network itself?
• Formalize the link prediction problem
  – Given a snapshot of a network, infer which new interactions between nodes are likely to occur in the future
• Propose link prediction heuristics based on measures for analyzing the “proximity” of nodes in a network
• Evaluate link prediction heuristics on large coauthorship networks
Future coauthorships can be extracted from network topology

The intuition
• In many networks, people who are “close” belong to the same social circles and will inevitably encounter one another and become linked themselves
• Link prediction heuristics measure how “close” people are
[Figure: two example graphs with nodes x and y; in one the red nodes are close to each other, in the other the red nodes are more distant]

CONTENTS
I. Link prediction based on similarity

Link prediction heuristics
• Local
  – Common neighbors (CN)
  – Jaccard (JC)
  – Adamic-Adar (AA)
  – Preferential attachment (PA)
  – …
• Global
  – Katz score
  – Hitting time
  – PageRank
  – …

CONTENTS
• Link prediction based on similarity
  – Local similarity measures

Theory of Structural Balance
• Consider edges as undirected
• Start with intuition [Heider ’46]:
  – Friend of my friend is my friend
  – Enemy of enemy is my friend
  – Enemy of friend is my enemy
• Look at connected triples of nodes that are consistent with this logic:
[Figure: three triangles with edge signs (+ + +), (– + –), and (+ – –)]

Theory of Status
• Status theory [Davis-Leinhardt ’68, Guha et al. ’04, Leskovec et al. ’10]
  – A positive link u → v means: v has higher status than u
  – A negative link u → v means: v has lower status than u
  – Make a prediction based on the signs and directions of the links from/to node x

Status theory is another approach to predicting links in social networks.
• “A positive link from A to B means that B is A's friend, or that B has higher status than A. Similarly, a negative link from A to B means that B is A's enemy, or that B has lower status than A.”

Theory of Status
• Status and balance can make different predictions:
[Figure: three example triads on nodes u, v, and x, each with the predicted sign of the edge between u and v]
  – Triad 1: Balance: +, Status: –, LogReg: –
  – Triad 2: Balance: +, Status: –, LogReg: –
  – Triad 3: Balance: –, Status: –, LogReg: –

Balance and Status: Complete model
[Figure: signed triads of the complete model]

Balance and Status: Observations
• Both theories agree well with learned models
• Further observations:
  – Backward-backward triads have smaller weights than forward and mixed-direction triads
  – Balance is in better agreement with Epinions and Slashdot, while Status is in better agreement with Wikipedia
  – Balance consistently disagrees with the learned models on “enemy of my enemy is my friend”
[Figure: triad on nodes x, u, v]

Balance theory
• Balance-based and learned coefficients:

  Feature   Balance theory   Epin    Slashdot   Wiki
  const                       0.43    1.49       0.04
  + +                         0.05    0.04       0.05
  + –       -1               -0.11   -0.24      -0.16
  – +       -1               -0.21   -0.35      -0.14
  – –                        -0.01   -0.03      -0.05

  “Balance theory” column: the model if signs were created purely based on Balance theory

Status theory
• Status-based and learned coefficients:

  Feature   Status theory   Epin    Slashdot   Wiki
  const                     -0.68   -1.39      -0.30
  [triad]   -1              -0.10   -0.11      -0.19

[Figure: signed, directed triads on nodes u, x, and v, labeled by status ordering such as u > x > v]

Status theory
[Figure: signed triads on nodes u, x, and v]
Link prediction based on features?
Triads where u > x > v

Learned vs Deterministic models
• Deterministic models compare well to Learned models
• Epinions and Slashdot:
  – More embedded edges are easier to predict
• Wikipedia:
  – Status outperforms balance
• Learned balance performs nearly as well as the full model
[Figure: panels for Epin, Slash, and Wiki]

Generalization
• Do people use these very different linking systems by obeying the same principles?
  – How generalizable are the results across the datasets?
    • Train on row “dataset”, predict on “column”
• Almost perfect generalization of the models even though networks come from very different applications

Does negative information help?
• Suppose we are only interested in predicting whether there is a trust edge or no edge
• Does knowing negative edges help? YES!
[Figure: a network with “+” and “–” edges and an unknown edge “?”, vs. a network with only “+” edges]

From Local to Global Structure
• Both theories make predictions about the global structure of the network
• Structural balance: Factions
  – Put nodes into groups such that the number of in-group “+” and between-group “–” edges is maximized
• Status theory: Global Status
  – Flip direction and sign of negative edges
  – Assign each node a unique status value so that most edges point from low to high

From Local to Global Structure
• What fraction of the edges of the network satisfy Balance and Status?
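One way to make this question concrete is sketched below in Python. It assumes a group assignment (for Balance) and a status value per node (for Status) are already given; finding the best assignment is a separate optimization problem that is not shown. The function names, the edge format `(source, target, sign)`, and the toy data are all illustrative assumptions, not taken from the slides.

```python
def balance_fraction(edges, group):
    """Fraction of edges consistent with a faction assignment.

    An edge is consistent if it is "+" inside a group or "-" between groups.
    """
    ok = sum(1 for a, b, sign in edges if (group[a] == group[b]) == (sign > 0))
    return ok / len(edges)


def status_fraction(edges, status):
    """Fraction of edges consistent with a global status assignment.

    Negative edges are flipped in direction (and sign), then every edge
    should point from the lower-status node to the higher-status node.
    """
    ok = 0
    for a, b, sign in edges:
        low, high = (a, b) if sign > 0 else (b, a)  # flip negative edges
        ok += status[low] < status[high]
    return ok / len(edges)


# Hypothetical toy network: (source, target, sign)
edges = [("a", "b", +1), ("b", "c", +1), ("c", "a", -1), ("a", "d", -1)]
group = {"a": 0, "b": 0, "c": 0, "d": 1}   # assumed factions
status = {"a": 1, "b": 2, "c": 3, "d": 0}  # assumed status values

print("balance:", balance_fraction(edges, group))
print("status: ", status_fraction(edges, status))
```

On real data these fractions only become meaningful when compared against a random baseline.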
• Observations:
  – No evidence for global balance beyond the random baselines
    • Real data is 80% consistent, the same as the 80% consistency under the random baseline
  – Evidence for global status beyond the random baselines
    • Real data is 80% consistent, compared with only 50% consistency under the random baseline

Conclusion
• Signed networks provide insight into how social computing systems are used:
  – Status vs Balance
• The sign of a relationship can be reliably predicted from the local network context
  – ~90% accuracy in predicting the sign of an edge
• More evidence that networks are globally organized based on status
• People use signed edges consistently regardless of the particular application
  – Near-perfect generalization of models across datasets
• Negative information helps in predicting positive edges

Jure Leskovec
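As a supplement to the “Link prediction heuristics” slide, here is a minimal Python sketch of the four local scores it names (CN, JC, AA, PA). It assumes an undirected graph without self-loops, stored as a dictionary of neighbor sets, and uses the standard textbook definitions of each score. The helper names and the toy edge list are illustrative assumptions.

```python
import math
from collections import defaultdict


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def common_neighbors(adj, x, y):
    """CN: number of shared neighbors."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """JC: shared neighbors divided by all neighbors of either node."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """AA: shared neighbors weighted by 1 / log(degree).

    For distinct x and y, every shared neighbor has degree >= 2,
    so the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA: product of the two degrees."""
    return len(adj[x]) * len(adj[y])


# Hypothetical toy graph
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"),
                       ("b", "d"), ("c", "d"), ("d", "e")])

for name, score in [("CN", common_neighbors), ("JC", jaccard),
                    ("AA", adamic_adar), ("PA", preferential_attachment)]:
    print(name, score(adj, "a", "d"))
```

To rank candidate links, compute a score for every currently unlinked pair and sort in descending order.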
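The contrast between Balance and Status on a single triad can also be sketched in code. This is a simplified Python reading of the two theories, not the learned (LogReg) model from the slides. Balance multiplies the two edge signs and ignores direction. Status adds up the implied status differences along u, x, v. The edge format `(source, target, sign)` and the function names are assumptions made for illustration.

```python
def balance_prediction(edge_ux, edge_xv):
    """Balance: the predicted sign is the product of the two signs."""
    return edge_ux[2] * edge_xv[2]


def status_delta(edge, a, b):
    """Return status(b) - status(a) implied by a signed edge joining a and b.

    A "+" edge src -> dst says dst is higher; a "-" edge says dst is lower.
    """
    src, dst, sign = edge
    if (src, dst) == (a, b):
        return sign
    if (src, dst) == (b, a):
        return -sign
    raise ValueError("edge does not join the given nodes")


def status_prediction(edge_ux, edge_xv, u, x, v):
    """Status: +1 if v ends up above u, -1 if below, 0 if undecided."""
    total = status_delta(edge_ux, u, x) + status_delta(edge_xv, x, v)
    return (total > 0) - (total < 0)


# Hypothetical triad: u -> x is "-", x -> v is "-"
edge_ux = ("u", "x", -1)
edge_xv = ("x", "v", -1)

# Balance says "enemy of my enemy is my friend" (+);
# Status says v is two steps below u (-).
print("balance:", balance_prediction(edge_ux, edge_xv))
print("status: ", status_prediction(edge_ux, edge_xv, "u", "x", "v"))
```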
