13 outbreak detection in networks

Document information
Pages: 48
File size: 28.59 MB

Content

CS224W: Analysis of Networks
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

- (1) New problem: outbreak detection
- (2) Develop an approximation algorithm
  - It is a submodular optimization problem!
- (3) Speed up greedy hill-climbing
  - Valid for optimizing general submodular functions (i.e., it also works for influence maximization)
- (4) Prove a new “data-dependent” bound on the solution quality
  - Valid for optimizing any submodular function (i.e., it also works for influence maximization)

11/7/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu

- Given a real city water distribution network
- And data on how contaminants spread in the network
- Detect the contaminant as quickly as possible
- Problem posed by the US Environmental Protection Agency

[Figure: posts by users/blogs, linked by time-ordered hyperlinks, form an information cascade]
- Which users/news sites should one follow to detect cascades as effectively as possible?

[Figure: we want to read things before others; one choice detects the blue and yellow stories soon but misses the red story, another detects all stories, but late]

- Both of these are instances of the same underlying problem!
- Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
- Many other applications:
  - Epidemics
  - Influence propagation
  - Network security

- Utility of placing sensors:
  - Depends on water flow dynamics, demands of households, …
- For each subset S ⊆ V, compute the utility f(S)
[Figure: the set V of all network junctions with high-, medium-, and low-impact contamination outbreaks; one sensor placement has high sensing “quality” (e.g., f(S) = 0.9), another has low sensing “quality” (e.g., f(S) = 0.01). A sensor reduces impact through early detection!]

Given:
- Graph G(V, E)
- Data about how outbreaks spread over G:
  - For each outbreak i we know the time T(u, i) when outbreak i contaminates node u

- Water distribution network (physical pipes and junctions)
- Simulator of water consumption & flow (built by Mech Eng people)
- We simulate the contamination spread for every possible location

[Figure: the network of news media (nodes a, b, c) and traces of the information flow]
- Collect lots of articles and trace them to obtain data about the information flow from a given news site
- From the traces of the information flow, identify influence sets

Goal: Select a subset of nodes S that maximizes the expected reward for detecting outbreaks:

$$\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S) \quad \text{subject to} \quad \mathrm{cost}(S) \le B$$

where P(i) is the probability of outbreak i occurring and f_i(S) is the reward for detecting outbreak i using sensors S.

- Real metropolitan area water network
  - V = 21,000 nodes
  - E = 25,000 pipes
- Use a cluster of 50 machines for a month
- Simulate 3.6 million epidemic scenarios (random locations, random days, random time of the day)

[Figure: solution quality F(A) (higher is better) vs. number of sensors placed, for hill climbing, the “offline” (1-1/e) bound, and the data-dependent bound]
- The data-dependent bound is much tighter (it gives a more accurate estimate of the algorithm's performance)

Battle of Water Sensor Networks competition [w/ Ostfeld et al., J of Water Resource Planning]:
- Placement heuristics perform much worse

Author            Score
CELF              26
Sandia            21
U Exeter          20
Bentley systems   19
Technion (1)      14
Bordeaux          12
U Cyprus          11
Also listed: U Guelph, U Michigan, Michigan Tech U, Malcolm, Proteo, Technion (2)

- Different objective functions give different sensor placements
[Figure: sensor placements optimized for population affected vs. detection likelihood]

- Here CELF is much faster than greedy hill-climbing!
  - (But there might be datasets/inputs where CELF has the same running time as greedy hill-climbing)

- I have 10 minutes. Which news sites should I read to be most up to date?
- Who are the most influential news sites?
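The expected-reward objective f(S) = Σ_i P(i) f_i(S) can be made concrete with a minimal sketch. It assumes a hypothetical data layout: a dict T[i][u] holding the time when outbreak i contaminates node u, with unreached nodes left out. It shows two illustrative reward choices. The first is detection likelihood, which is 1 if any sensor in S is ever contaminated. The second is a time-based reward measured against an assumed horizon t_max, which is our own choice and not the lecture's exact definition. Each reward is a maximum over sensors of a non-negative per-node value, so each is monotone and submodular in S. The budget constraint cost(S) ≤ B is left to the optimizer.

```python
from typing import Callable, Collection, Dict, Hashable

Node = Hashable
Outbreak = Hashable
# Hypothetical layout: T[i][u] = time when outbreak i contaminates node u.
# Nodes that outbreak i never reaches are simply absent from T[i].
Times = Dict[Outbreak, Dict[Node, float]]
Reward = Callable[[Dict[Node, float], Collection[Node]], float]
INF = float("inf")


def detection_time(T_i: Dict[Node, float], S: Collection[Node]) -> float:
    """T(S, i): earliest time any sensor in S is contaminated (inf if never)."""
    return min((T_i[u] for u in S if u in T_i), default=INF)


def detection_likelihood(T_i: Dict[Node, float], S: Collection[Node]) -> float:
    """f_i(S) = 1 if outbreak i is detected by some sensor in S, else 0."""
    return 1.0 if detection_time(T_i, S) < INF else 0.0


def time_saved(t_max: float) -> Reward:
    """Assumed time-based reward: how long before horizon t_max i is detected."""
    def reward(T_i: Dict[Node, float], S: Collection[Node]) -> float:
        return t_max - min(detection_time(T_i, S), t_max)
    return reward


def expected_reward(S: Collection[Node], T: Times, P: Dict[Outbreak, float],
                    reward: Reward = detection_likelihood) -> float:
    """f(S) = sum over outbreaks i of P(i) * f_i(S)."""
    return sum(P[i] * reward(T[i], S) for i in T)


if __name__ == "__main__":
    T = {"i1": {"a": 0.0, "b": 2.0}, "i2": {"c": 1.0}}
    P = {"i1": 0.5, "i2": 0.5}
    print(expected_reward({"a"}, T, P))                          # 0.5
    print(expected_reward({"b"}, T, P, reward=time_saved(5.0)))  # 1.5
```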
- Crawled 45,000 blogs for a year
- Obtained 10 million news posts
- And identified 350,000 cascades
- Cost of a blog is the number of posts it has

- The online (data-dependent) bound turns out to be much tighter!
  - Based on the plot: 87% instead of 32.5%
[Figure: old bound vs. our bound vs. CELF]

- Heuristics perform much worse!
- One really needs to perform the optimization

- CELF has two sub-algorithms. Which wins?
- Unit cost:
  - CELF picks large popular blogs
- Cost-benefit:
  - Cost is proportional to the number of posts
- We can do much better when considering costs

- Problem (cost-benefit): CELF then picks lots of small blogs that participate in few cascades
- We pick the best solution that interpolates between the costs
- We can get good solutions with few blogs and few posts
[Figure: score curves for f(S) = 0.2, 0.3, 0.4; each curve represents a set of solutions S with the same final reward f(S)]

- We want to generalize well to future (unknown) cascades
- Limiting selection to bigger blogs improves generalization!
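The unit-cost and cost-benefit sub-algorithms can be sketched under our own assumptions. We assume f is a monotone submodular set function given as a Python callable, all costs are positive, and ties are broken by input order. `lazy_greedy` keeps the nodes in a max-heap keyed by marginal gain (or gain per unit cost). By submodularity, a gain computed against a smaller S is an upper bound, so only the stale top entry needs recomputing. `celf` runs both variants and keeps the better set. The names `lazy_greedy` and `celf` are hypothetical, and this is an illustration, not the authors' reference code.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, Set

Node = Hashable
SetFn = Callable[[Set[Node]], float]


def lazy_greedy(f: SetFn, nodes: Iterable[Node], cost: Dict[Node, float],
                budget: float, cost_benefit: bool) -> Set[Node]:
    """Budgeted greedy hill-climbing with lazy re-evaluation of marginal gains.

    cost_benefit=False ranks nodes by marginal gain ("unit cost" variant);
    cost_benefit=True ranks them by marginal gain divided by cost.
    """
    S: Set[Node] = set()
    spent, f_S, rnd = 0.0, f(S), 0

    def priority(v: Node) -> float:
        gain = f(S | {v}) - f_S
        return gain / cost[v] if cost_benefit else gain

    # Heap entries: (-priority, tie-breaker, node, round the priority was computed in).
    heap = [(-priority(v), k, v, 0) for k, v in enumerate(nodes)]
    heapq.heapify(heap)
    while heap:
        neg_p, k, v, stamp = heapq.heappop(heap)
        if spent + cost[v] > budget:
            continue  # spending only grows, so v can never fit again
        if stamp == rnd:
            # Fresh value on top: every other entry is an upper bound, so v is best.
            if -neg_p <= 0:
                break  # no remaining node improves f
            S.add(v)
            spent += cost[v]
            f_S = f(S)
            rnd += 1
        else:
            # Stale upper bound: recompute against the current S and push it back.
            heapq.heappush(heap, (-priority(v), k, v, rnd))
    return S


def celf(f: SetFn, nodes: Iterable[Node], cost: Dict[Node, float],
         budget: float) -> Set[Node]:
    """Run both variants and return whichever set has the higher reward."""
    nodes = list(nodes)  # iterated twice below
    s_uc = lazy_greedy(f, nodes, cost, budget, cost_benefit=False)
    s_cb = lazy_greedy(f, nodes, cost, budget, cost_benefit=True)
    return s_uc if f(s_uc) >= f(s_cb) else s_cb


if __name__ == "__main__":
    # Toy coverage objective: the reward is the number of cascades detected.
    covers = {"x": set(range(1, 11)), "y": {11, 12}}

    def f(S):
        return len(set().union(*(covers[v] for v in S)))

    print(celf(f, covers, {"x": 10, "y": 1}, budget=10))  # {'x'}
```

In this toy example the cost-benefit variant takes the cheap node `y` and can then no longer afford `x`, so CELF returns the unit-cost solution. With many cheap, useful nodes (as in the blog data, where cost is the number of posts) the comparison can go the other way.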
[Leskovec et al., KDD ’07]
- CELF runs 700 times faster than the simple hill-climbing algorithm

Summary:
- Outbreak detection problem in networks
- Different ways to formalize objective functions
  - All are submodular
- Lazy-Greedy algorithm for optimizing submodular functions
- CELF algorithm that combines versions of Lazy-Greedy
- Data-dependent bound on the solution quality
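The data-dependent bound can be computed after any algorithm has produced a set S. Below is a minimal sketch for the unit-cost case with at most k sensors, assuming f is monotone submodular. For any S, f(OPT) ≤ f(S ∪ OPT) ≤ f(S) + Σ_{v ∈ OPT∖S} δ(v), where δ(v) = f(S ∪ {v}) - f(S). Because OPT has at most k nodes and every δ(v) ≥ 0, the sum is at most the k largest marginal gains. The function name is hypothetical, and the budgeted (non-unit-cost) version needs a knapsack-style variant that is not shown here.

```python
import heapq
import math
from typing import Callable, Collection, Hashable, Set

Node = Hashable


def data_dependent_bound(f: Callable[[Set[Node]], float], nodes: Collection[Node],
                         S: Set[Node], k: int) -> float:
    """Upper bound on max over |A| <= k of f(A), for monotone submodular f, given any S."""
    f_S = f(S)
    # Marginal gain delta(v) = f(S ∪ {v}) - f(S) for every node not yet in S.
    gains = [f(S | {v}) - f_S for v in nodes if v not in S]
    # OPT has at most k nodes, so f(OPT) <= f(S) + (sum of the k largest gains).
    return f_S + sum(heapq.nlargest(k, gains))


if __name__ == "__main__":
    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5}}

    def f(S):
        return len(set().union(*(covers[v] for v in S)))

    S = {"a", "b"}  # one possible greedy solution for k = 2
    print(f(S), data_dependent_bound(f, covers, S, k=2))  # 4 6
    # The offline guarantee only says f(OPT) <= f(S) / (1 - 1/e) for a greedy S.
    print(round(f(S) / (1 - math.exp(-1)), 2))  # 6.33
```

On this toy instance the data-dependent bound (6) is tighter than the offline (1-1/e) bound (about 6.33). This mirrors, on a tiny scale, the gap shown in the plots above.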

Posted: 26/07/2023, 19:36