
Advanced Databases (Do Phuc), Lecture 5: Graph Databases (cuuduongthancong.com)



Mining, Indexing and Searching Graph Databases
Presenter: A/Prof. Do Phuc
Source: Jiawei Han, Vladimir Lipets
July 22, 2010

Graph, Graph, Everywhere
[Figures: aspirin; yeast protein interaction network; an Internet web; co-author network. Credit line: from H. Jeong et al., Nature 411, 41 (2001)]

Why Graph Mining and Searching?
- Graphs are ubiquitous
  - Chemical compounds (cheminformatics)
  - Protein structures, biological pathways/networks (bioinformatics)
  - Program control flow, traffic flow, and workflow analysis
  - XML databases, Web, and social network analysis
- Graph is a general model
  - Trees, lattices, sequences, and items are degenerate graphs
- Diversity of graphs
  - Directed vs. undirected, labeled vs. unlabeled (edges & vertices), weighted, with angles & geometry (topological vs. 2-D/3-D)
- Complexity of algorithms: many problems are of high complexity!

Outline
- Graph isomorphism, subgraph isomorphism
- Mining frequent graph patterns
- Graph indexing methods
- Similarity search in graph databases
- Biological network analysis

Motivation
- Graph and subgraph isomorphism is an important and very general form of pattern matching that finds practical application in areas such as:
  - pattern recognition and computer vision,
  - image processing,
  - computer-aided design, graph grammars,
  - graph transformation,
  - biocomputing,
  - search operations in chemical databases,
  - numerous others

A hierarchy of pattern matching problems
- Graph isomorphism
- Subgraph isomorphism
- Maximum common subgraph
- Approximate subgraph isomorphism
- Graph edit distance

Isomorphic Graphs

Graph Isomorphism

Subgraph of a given graph

Subgraph Isomorphism

Framework (cont.)
Step: Query Processing
- Use the feature-graph matrix to calculate the difference in the number of features between graph G and query Q, FG - FQ
- If FG - FQ > J, discard G. The remaining graphs constitute a candidate answer set

Biological Networks
- Protein-protein interaction network
- Metabolic network
- Transcriptional regulatory network
- Co-expression network
- Genetic interaction network
- …

Data Mining Across Multiple Networks
[Figure: several networks drawn over the vertices a to k]

Identify Frequent Co-expression Clusters across Multiple Microarray Data Sets
[Figure: microarray expression matrices (genes g1, g2, … by conditions c1, c2, …, cm) shown with networks over the vertices a to k]

CODENSE: Mine Coherent Dense Subgraphs
(1) Builds a summary graph by eliminating infrequent edges
[Figure: graphs G1 to G6 and the summary graph Ĝ]
(2) Identify dense subgraphs of the summary graph
[Figure: summary graph Ĝ, a step labeled MODES, and Sub(Ĝ)]
Observation: If a frequent subgraph is dense, it must be a dense subgraph in the summary graph. However, the reverse is not true.

Applying CoDense to 39 Yeast Microarray Data Sets
[Figure: microarray expression matrices shown with networks over the vertices a to k]
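The first CODENSE step (building a summary graph by eliminating infrequent edges) could be sketched roughly as below. This is only an illustration under stated assumptions, not the CODENSE implementation: graphs are taken to be undirected edge lists, and the function name `build_summary_graph` and the threshold `min_support` are hypothetical names introduced here. The second step (finding dense subgraphs of the summary graph) is not covered.

```python
from collections import Counter


def build_summary_graph(graphs, min_support):
    """Return the edges that occur in at least `min_support` input graphs.

    `graphs` is a list of undirected edge lists; `min_support` is a
    hypothetical frequency threshold (the slides do not give one).
    """
    edge_support = Counter()
    for edges in graphs:
        # Normalise (u, v) and (v, u) to one key, and use a set so that
        # each graph contributes at most one count per edge.
        edge_support.update({tuple(sorted(edge)) for edge in edges})
    return {edge for edge, support in edge_support.items()
            if support >= min_support}


# Toy data (made up for illustration only).
graphs = [
    [("a", "b"), ("b", "c"), ("c", "e")],
    [("a", "b"), ("c", "b"), ("e", "f")],
    [("b", "a"), ("c", "e"), ("g", "h")],
]
summary = build_summary_graph(graphs, min_support=2)
print(sorted(summary))
```

With these toy graphs, the edges (e, f) and (g, h) each appear in a single graph and are dropped, while the three edges seen at least twice survive into the summary graph.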
Discovery of New Genes Based on Similar Genes
[Figure: network over the genes YDR115W, MRP49, PHB1, MRPL51, PET100, ATP12, ATP17, MRPL37, ACN9, MRPL38, MRPL39, MRPL32, MRPS18, FMC1]

Network of Known Similar Genes
[Figure: the gene network with a group of genes highlighted in brown]
Brown: YDR115W, FMC1, ATP12, MRPL37, MRPS18
GO:0019538 (protein metabolism; p-value = 0.001122)

Network Involved in the New Genes
[Figure: the gene network with a group of genes highlighted in red]
Red: PHB1, ATP17, MRPL51, MRPL39, MRPL49, MRPL51, PET100
GO:0006091 (generation of precursor metabolites and energy; p-value = 0.001339)

Conclusions
- Graph mining has wide applications
- Frequent and closed subgraph mining methods
  - gSpan and CloseGraph: pattern-growth depth-first search approach
- Graph indexing techniques
  - Frequent and discriminative subgraphs as indexing features
- Similarity search in graph databases
  - Indexing and approximate matching help similar subgraph search
- Biological network analysis
  - Mining coherent, dense, multiple biological networks
- Many new developments along the line of graph pattern mining

Thanks and Questions

... structures
- Program control flow analysis
- Mining XML structures or Web communities
- Building blocks for graph classification, clustering, comparison, and correlation analysis...

... ICML'05)

Structure Similarity Search
- Chemical compounds: (a) caffeine, (b) diurobromine, (c) viagra
- Query graph

Some "Straightforward" Methods
- Method 1: Directly compute...
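The query-processing filter described in the Framework slide (compare feature counts of graph G and query Q, discard G when the difference exceeds a threshold) might look roughly like the sketch below. Several things here are assumptions rather than the authors' method: the feature-graph matrix is modelled as a dict of per-graph feature counts, the "difference" is taken to be the number of query feature occurrences that G cannot account for, and the names `feature_difference`, `filter_candidates`, and `max_difference` (standing in for the slide's threshold) are hypothetical.

```python
from collections import Counter


def feature_difference(query_features, graph_features):
    """Count query feature occurrences that are missing from the graph.

    Assumption: the slide's "difference in the number of features" is
    read as the total shortfall of G relative to Q, feature by feature.
    """
    missing = 0
    for feature, query_count in query_features.items():
        graph_count = graph_features.get(feature, 0)
        missing += max(0, query_count - graph_count)
    return missing


def filter_candidates(feature_graph_matrix, query_features, max_difference):
    """Keep the ids of graphs whose feature shortfall is within the threshold."""
    return [graph_id
            for graph_id, graph_features in feature_graph_matrix.items()
            if feature_difference(query_features, graph_features) <= max_difference]


# Toy feature-graph matrix (made up): feature -> occurrence count per graph.
feature_graph_matrix = {
    "G1": Counter({"C-C": 3, "C-O": 1, "C=O": 1}),
    "G2": Counter({"C-C": 1}),
    "G3": Counter({"C-C": 2, "C-O": 1}),
}
query_features = Counter({"C-C": 2, "C-O": 1, "C=O": 1})

candidates = filter_candidates(feature_graph_matrix, query_features,
                               max_difference=1)
print(candidates)
```

The filter only prunes: graphs that survive form a candidate answer set and would still need to be verified against the query by an exact or approximate matching step.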
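To make the subgraph isomorphism problem from the pattern matching hierarchy concrete, here is a brute-force sketch. It assumes unlabeled, undirected graphs and checks the non-induced variant (every pattern edge must map to a target edge; extra target edges are allowed). The function name `find_subgraph_embedding` is hypothetical, and the exhaustive search is exponential, so it is only meant for tiny examples, not as a practical algorithm.

```python
from itertools import permutations


def find_subgraph_embedding(pattern_nodes, pattern_edges,
                            target_nodes, target_edges):
    """Return an injective node mapping pattern -> target that preserves
    every pattern edge, or None if no such mapping exists."""
    target_edge_set = {frozenset(edge) for edge in target_edges}
    # Try every injective assignment of pattern nodes to target nodes.
    for image in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in target_edge_set
               for u, v in pattern_edges):
            return mapping
    return None


# Pattern: a triangle.
triangle_nodes = ["x", "y", "z"]
triangle_edges = [("x", "y"), ("y", "z"), ("z", "x")]

# Target 1: a 4-cycle with one diagonal (contains a triangle).
square_nodes = [1, 2, 3, 4]
square_with_diagonal = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]

# Target 2: a plain 4-cycle (contains no triangle).
plain_square = [(1, 2), (2, 3), (3, 4), (4, 1)]

match = find_subgraph_embedding(triangle_nodes, triangle_edges,
                                square_nodes, square_with_diagonal)
no_match = find_subgraph_embedding(triangle_nodes, triangle_edges,
                                   square_nodes, plain_square)
print(match, no_match)
```

The cost of trying all injective mappings illustrates why the slides call these problems high-complexity and why filtering and indexing are used to reduce the number of graphs that need this kind of check.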

Date posted: 16/12/2022, 22:43
