Doctoral thesis: Ranking and predicting disease-associated genes using algorithms based on biological networks

DECLARATION

I declare that this is my own research work. Results that were written jointly with other authors have been included in this thesis with their consent. The results presented in this thesis are truthful and have not been published in any other work.

Author: Đặng Vũ Tùng

ACKNOWLEDGMENTS

A doctoral thesis is the outcome of theoretical research and experimentation that is full of challenges and difficulties, and that demands perseverance and intense concentration. I am truly happy with the results achieved in this research. These results come not only from my own effort but also from the support and help of my supervisors, the training institution, my employing organization, my colleagues, and my family. I would like to express my gratitude to all of them.

First, I would like to express my deep gratitude to Assoc. Prof. Dr. Từ Minh Phương and Assoc. Prof. Dr. Lê Đức Hậu. Working with them was a great opportunity for me to learn research methods, perseverance, and a serious, scientific way of working.

I sincerely thank the Faculty of International and Postgraduate Studies, the Faculty of Information Technology, and the Board of Directors of the Posts and Telecommunications Institute of Technology for providing favorable conditions throughout the work on this thesis.

I thank the Board of Directors of the Vietnam Youth Academy, together with my friends and colleagues, for their encouragement and for providing favorable conditions for my study and research.

Finally, I would like to express my gratitude to my family, who kindled our family tradition and have always stood by me, supporting, helping, and sharing with me in difficult times.

Thank you sincerely!
TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
Necessity of the thesis
Objectives of the thesis
Contributions of the thesis
Outline of the thesis
Chapter 1 - OVERVIEW OF RANKING AND PREDICTING DISEASE-ASSOCIATED GENES
1.1 FUNDAMENTALS OF MOLECULAR BIOLOGY
1.1.1 The cell
1.1.2 DNA
1.1.3 Genes
1.1.4 How genes control protein synthesis
1.2 GENE RANKING AND APPROACHES TO SOLVING IT
1.2.1 The gene ranking problem
1.2.2 Approaches to the gene ranking problem
1.3 BIOLOGICAL DATABASES AND NETWORKS
1.3.1 Biological databases
1.3.2 Biological networks
1.3.3 Gene/protein interaction networks
1.4 METHODS FOR RANKING AND PREDICTING DISEASE-ASSOCIATED GENES BASED ON BIOLOGICAL NETWORKS
1.4.1 Methods based on gene/protein proximity
1.4.2 Methods based on large-scale integration of gene data
1.4.3 Methods based on integrating phenotype information
1.4.4 Methods that construct disease modules
1.5 METHODS FOR EVALUATING RANKING ALGORITHMS
1.5.1 Cross-validation
1.5.2 Measuring the performance of ranking methods
1.6 CHAPTER CONCLUSION
Chapter 2 - RANKING AND PREDICTING DISEASE-ASSOCIATED GENES BASED ON GENE/PROTEIN INTERACTION NETWORKS
2.1 PROBLEM STATEMENT
2.1.1 The problem of ranking nodes in a graph
2.1.2 The PageRank algorithm combined with prior probabilities
2.1.3 The reinforcement-learning-based ranking algorithm
2.1.4 The random walk with restart algorithm
2.2 GENE RANKING BY REINFORCEMENT LEARNING COMBINED WITH PRIOR PROBABILITIES
2.2.1 The reinforcement-learning-based ranking algorithm combined with prior probabilities
2.2.2 Experimental data
2.2.3 Experiments and results
2.3 GENE RANKING BY SUMMING CONNECTION PROBABILITIES IN THE GENE/PROTEIN INTERACTION NETWORK
2.3.1 The algorithm based on connection probabilities
2.3.2 Experimental data
2.3.3 Experiments and results
2.4 COMPARISON OF THE PROPOSED GENE RANKING METHODS
2.4.1 Principles, advantages and disadvantages, and scope of application
2.4.2 Experiments
2.5 CHAPTER CONCLUSION
Chapter 3 - RANKING AND PREDICTING DISEASE-CAUSING GENES BASED ON HETEROGENEOUS NETWORKS
3.1 PROBLEM STATEMENT
3.2 THE DISEASE-GENE HETEROGENEOUS NETWORK
3.2.1 Overview of the method for constructing the heterogeneous network
3.2.2 Gene/protein networks
3.2.3 Disease similarity networks
3.2.4 The bipartite network
3.3 THE RANDOM WALK WITH RESTART ALGORITHM ON THE HETEROGENEOUS NETWORK
3.4 EXPERIMENTS AND RESULTS
3.4.1 Performance comparison with methods of the same class
3.4.2 Predicting genes associated with Alzheimer's disease
3.5 CHAPTER CONCLUSION
CONCLUSION
LIST OF PUBLICATIONS
REFERENCES

LIST OF ABBREVIATIONS

SYMBOL: ENGLISH TERM; MEANING
AUC: Area Under ROC Curve; the area under the ROC curve
BIND: Biomolecular Interaction Network Database; a database of biomolecular interaction networks
BioGRID: Biological General Repository for Interaction Datasets; a public biological database containing many interaction data sets
CANDID: A flexible method for prioritizing candidate genes for complex human traits; a gene ranking method introduced by Hutz et al.
CIPHER: Correlating protein Interaction network and PHEnotype network to pRedict disease genes; a gene ranking method introduced by Wu et al.
DNA: DeoxyriboNucleic Acid
DO: Disease Ontology
EST: Expressed Sequence Tag
eVOC: A controlled vocabulary for unifying gene expression data
FN: False Negative; a positive-labeled sample wrongly classified as negative
FP: False Positive; a negative-labeled sample wrongly classified as positive
GO: Gene Ontology
GWAS: Genome-Wide Association Studies; association studies across the whole genome
HITS: Hypertext Induced Topic Search; a Web search algorithm
HPO: Human Phenotype Ontology
HPRD: Human Protein Reference Database; a database of human protein interactions
KEGG: Kyoto Encyclopedia of Genes and Genomes
LOOCV: Leave-one-out cross-validation
MeSH: Medical Subject Headings; a database of medical subject headings
MINT: Molecular Interaction Database; a database of molecular interactions
MPO: Mammalian Phenotype Ontology
NCBI: National Center for Biotechnology Information
OMIM: Online Mendelian Inheritance in Man; an online database of Mendelian inheritance in humans
PRINCE: PRIoritizatioN and Complex Elucidation; a gene ranking method introduced by Vanunu et al.
ROC: Receiver Operating Characteristic / Receiver Operating Curve; the receiver operating characteristic curve, used to determine whether a signal or only noise is present
RWR: Random Walk with Restart; the random walk with restart algorithm
RWRH: Random Walk with Restart on Heterogeneous network; the random walk with restart algorithm on a heterogeneous network
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins; a tool for searching gene/protein interactions
TN: True Negative; a negative-labeled sample correctly classified as negative
TP: True Positive; a positive-labeled sample correctly classified as positive
UMLS: Unified Medical Language System
Y2H: Yeast Two-Hybrid System; a method used to identify protein interactions

LIST OF FIGURES

Figure 1.1 DNA structure
Figure 1.2 Diagram of protein synthesis from a gene
Figure 1.3 Alternative exon splicing patterns allow a cell to produce different proteins from a single gene
Figure 1.4 Overview diagram of gene ranking
Figure 1.5 Diagram of disease gene prediction based on a machine learning model [59]
Figure 1.6 Diagram of the network-based gene ranking method
Figure 1.7 Illustration of disruptions in biological networks as causes of human disease
Figure 1.8 Method for evaluating gene ranking algorithms
Figure 1.9 Method for plotting the ROC curve
Figure 2.1 The RL_Rank with priors algorithm
Figure 2.2 Average AUC over 398 diseases with β = 0.8 and γ increasing from 0.1 to 0.9
Figure 2.3 Average AUC over 398 diseases with β = 0.7 and γ increasing from 0.1 to 0.9
Figure 2.4 Average AUC over 398 diseases with γ = 0.5 and β increasing from 0.1 to 0.9
Figure 2.5 ROC curves for RL_Rank with priors (γ = 0.5, β = 0.7) and PageRank with priors (β = 0.7)
Figure 2.6 Example of computing the probability of a path in a graph
Figure 2.7 The SigPathSum procedure for computing the relevance of a node to the query nodes
Figure 2.8 The gene ranking algorithm based on connection probabilities
Figure 2.9 Average AUC as the value of f varies
Figure 2.10 ROC curves of SigPathSum and RWR
Figure 2.11 ROC curves of RL_Rank with Priors, SigPathSum, and RWR
Figure 3.1 Diagram of the construction of the integrated disease-gene heterogeneous network
Figure 3.2 Operating diagram of the RWRH algorithm
Figure 3.3 The RWRH algorithm
Figure 3.4 ROC curves of prediction results on networks based on HPO and OMIM

LIST OF PUBLICATIONS

[T1] Đặng Vũ Tùng, Dương Anh Trà, Lê Đức Hậu, Từ Minh Phương, "Ranking disease-causing genes using reinforcement learning combined with prior probabilities" (in Vietnamese), Chuyên san Các công trình nghiên cứu, ứng dụng phát triển CNTT&TT, Vol. V-1, no. 13 (33), June 2015
[T2] Duc-Hau Le, Vu-Tung Dang, "Ontology-based disease similarity network for disease gene prediction", Vietnam Journal of Computer Science, Springer, DOI 10.1007/s40595-016-0063-3
[T3] Nguyễn Đại Phong, Đặng Vũ Tùng, Lê Đức Hậu, Từ Minh Phương, "An improved method for the problem of identifying disease-associated genes" (in Vietnamese), Proceedings of the FAIR 2016 conference
[T4] Đặng Vũ Tùng, Nguyễn Đại Phong, Lê Đức Hậu, Từ Minh Phương, "A method for ranking disease-causing genes based on the sum of connection probabilities in a protein interaction network" (in Vietnamese), Chuyên san Các công trình nghiên cứu, ứng dụng phát triển CNTT&TT, Vol. V-2, no. 16 (36), December 2016

REFERENCES

[1] Acland A and Agarwala R., (2014), Database resources of the National Center for Biotechnology Information, Nucleic Acids Research, vol 42, pp 7-17
[2] Adie E., Adams R., Evans K., Porteous D., and Pickard B., (2005), Speeding disease gene discovery by sequence based candidate prioritization, BMC Bioinformatics, vol 6, p 55
[3] Adie E., Adams R., Evans K., Porteous D., and Pickard B., (2006), Suspects: enabling fast and effective prioritization of positional candidates, Bioinformatics, vol 22, pp 773-774
[4] Aerts S., Lambrechts D., Maity S., Loo P V., Coessens B., Smet F D., et al., (2006), Gene prioritization through genomic data fusion, Nature Biotechnology, vol 24, pp 537-544
[5] Amberger J., Bocchini C A., Scott A F., and Hamosh A., (2009), McKusick's Online Mendelian Inheritance in Man (OMIM®), Nucleic Acids Research, vol 37, pp D793-D796
[6] Ashburner M., Ball C A., Blake J A., Botstein D., Butler H., Cherry J M., et al., (2000), Gene Ontology: tool for the unification of biology, Nat Genet, vol 25, pp 25-29
[7] Bader G., Donaldson I., and Wolting C., (2001), BIND: The Biomolecular Interaction Network Database, Nucleic Acids Res, vol 29, pp 242-245
[8] Barabasi A and Albert R., (1999), Emergence of scaling in random networks, Science, vol 286, pp 509-512
[9] Barabasi A., Gulbahce N., and Loscalzo J., (2011), Network medicine: a network-based approach to human disease, Nat Rev Genet, vol 12, pp 56-68
[10] Barrell D., Dimmer E., Huntley R P., Binns D., O'Donovan C., and Apweiler R., (2009), The GOA database in 2009 - an integrated Gene Ontology Annotation resource, Nucleic Acids Res, vol 37, pp D396-D403
[11] Berg J., Lassig M., and Wagner A., (2004), Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications, BMC Evol Biol, vol
[12] Bodenreider O., (2004), The Unified Medical Language System (UMLS): integrating biomedical terminology, Nucleic Acids Res, vol 32, pp D267-D270
[13] Bodhini D., Sandhiya M., Ghosh S., Majumder P., Rao M., Mohan V., et al., (2012), Association of His1085His INSR gene polymorphism with type 2 diabetes in South Indians, Diabetes Technol Ther, vol 14, pp 696-700
[14] Brunner H and Driel M v., (2004), From syndrome families to functional genomics, Nat Rev Genet, vol 5, pp 545-551
[15] Calvo S., Jain M., Xie X., Sheth S A., Chang B., Goldberger O A., et al., (2006), Systematic identification of human mitochondrial disease genes through integrative genomics, Nat Genet, vol 38, pp 576-582
[16] Care M., Bradford J., and Needham C., (2009), Combining the interactome and deleterious SNP predictions to improve disease gene identification, Hum Mutat, vol 30, pp 485-492
[17] Charbonnier S., Gallego O., and Gavin A., (2008), The social network of a cell: recent advances in interactome mapping, Biotechnol Annu Rev, vol 14, pp 1-28
[18] Chatr-aryamontri A., Ceol A., and Palazzi L., (2007), MINT: the Molecular INTeraction database, Nucleic Acids Research, vol 35, pp D572-D574
[19] Chen J., Aronow B J., and Jegga A G., (2009), Disease candidate gene identification and prioritization using protein interaction networks, BMC Bioinformatics, vol 10, p 73
[20] Chen J., Xu H., Aronow B J., and Jegga A G., (2007), Improved human disease candidate gene prioritization using mouse phenotype, BMC Bioinformatics, vol 8, p 392
[21] Chen X., Liu M.-X., and Yan G.-Y., (2012), Drug-target interaction prediction by random walk on the heterogeneous network, Molecular BioSystems, vol 8, pp 1970-1978
[22] Chen Y., Zhu J., Lum P., Yang X., Pinto S., MacNeil D., et al., (2008), Variations in DNA elucidate molecular networks that cause disease, Nature, vol 452, pp 429-435
[23] Consortium U., (2010), The Universal Protein Resource (UniProt) in 2010, Nucleic Acids Res, vol 38, pp D142-D148
[24] Derhami V., Khodadadian E., Ghasemzadeh M., and Bidoki A M Z., (2013), Applying reinforcement learning for web pages ranking algorithms, Applied Soft Computing, vol 13, pp 1686-1692
[25] Dezso Z., Nikolsky Y., Nikolskaya T., Miller J., Cherba D., Webb C., et al., (2009), Identifying disease-specific genes based on their topological significance in protein networks, BMC Syst Biol, vol
[26] Dobrin R., Zhu J., Molony C., Argman C., Parrish M L., Carlson S., et al., (2009), Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease, Genome Biol, vol 10, p R55
[27] Driel M v., Bruggeman J., Vriend G., Brunner H., and Leunissen J., (2006), A text-mining analysis of the human phenome, Eur J Hum Genet, vol 14, pp 535-542
[28] Driel M v., Cuelenaere K., Kemmeren P., Leunissen J., Brunner H., and Vriend G., (2005), GeneSeeker: extraction and integration of human disease-related information from web-based genetic databases, Nucleic Acids Research, vol 33, pp W758-W761
[29] Driel v., Cuelenaere, Kemmeren, Leunissen, and Brunner, (2003), A new web-based data mining tool for the identification of candidate genes for human genetic disorders, European Journal of Human Genetics, vol 11, pp 57-63
[30] Eisen M B., Spellman P T., Brown P O., and Botstein D., (1998), Cluster analysis and display of genome-wide expression patterns, Proc Natl Acad Sci USA, vol 95, pp 14863-14868
[31] Eisenberg E and Levanon E., (2003), Preferential attachment in the protein network evolution, Phys Rev Lett, vol 91
[32] Ewing R M., Chu P., Elisma F., Li H., Taylor P., Climie S., et al., (2007), Large-scale mapping of human protein-protein interactions by mass spectrometry, Mol Syst Biol, vol
[33] Franke L., Bakel H v., and Fokkens L., (2006), Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes, Am J Hum Genet, vol 78, pp 1011-1025
[34] Goehler H., Lalowski M., Stelzl U., Waelter S., Stroedicke M., Worm U., et al., (2004), A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease, Mol Cell, vol 15, pp 853-865
[35] Goh K., Cusick M., and Valle D., (2007), The human disease network, Proc Natl Acad Sci USA, vol 104
[36] Guala D and Sonnhammer E L L., (2017), A large-scale benchmark of gene prioritization methods, Scientific Reports, vol 7, p 46598
[37] Haveliwala T H., (2002), Topic-sensitive PageRank, Proceedings of the 11th International World Wide Web Conference, Honolulu, Hawaii, pp 517-526
[38] Hidalgo C., Blumm N., and Barabasi A., (2009), A dynamic network approach for the study of human phenotypes, PLoS Comput Biol, vol
[39] Hutz J., Kraja A., McLeod H., and Province M., (2008), CANDID: a flexible method for prioritizing candidate genes for complex human traits, Genetic Epidemiology, vol 32, pp 779-790
[40] Hwang S., Son S., and Kim S., (2008), A protein interaction network associated with asthma, J Theor Biol, vol 252, pp 722-731
[41] Jansen R., Yu H., Greenbaum D., Kluger Y., Krogan N., Chung S., et al., (2003), A Bayesian networks approach for predicting protein-protein interactions from genomic data, Science, vol 302, pp 449-453
[42] Jeh G and Widom J., (2002), Scaling personalized Web search, Stanford University, Computer Science Department Technical Report
[43] Jensen L J., Kuhn M., and Stark M., (2009), STRING: a global view on proteins and their functional interactions in 630 organisms, Nucleic Acids Res, vol 37, pp D412-D416
[44] Jiang R., Gan M., and He P., (2011), Constructing a gene semantic similarity network for the inference of disease genes, BMC Systems Biology, vol 5(Suppl 2)
[45] Junker B H., Koschützki D., and Schreiber F., (2006), Exploration of biological network centralities with CentiBiN, BMC Bioinformatics, vol 7, p 219
[46] Kanehisa M., Araki M., Goto S., Hattori M., Hirakawa M., Itoh M., et al., (2008), KEGG for linking genomes to life and the environment, Nucleic Acids Res, vol 36, pp D480-D484
[47] Karni S., Soreq H., and Sharan R., (2009), A Network-Based Method for Predicting Disease-Causing Genes, Journal of Computational Biology, vol 16, pp 181-189
[48] Kazemi B., Seyed N., Moslemi E., Bandehpour M., Torbati M B., Saadat N., et al., (2009), Insulin Receptor Gene Mutations in Iranian Patients with Type II Diabetes Mellitus, Iranian Biomedical Journal, vol 13, pp 161-168
[49] Keerthikumar S., Bhadra S., Kandasamy K., Raju R., Ramachandra Y L., Bhattacharyya C., et al., (2009), Prediction of candidate primary immunodeficiency disease genes using a support vector machine learning approach, DNA Res, vol 16, pp 345-351
[50] Kelso J., Visagie J., Theiler G., Christoffels A., Bardien S., Smedley D., et al., (2003), eVOC: a controlled vocabulary for unifying gene expression data, Genome Res, vol 13, pp 1222-1230
[51] Kerrien S., Alam-Faruque Y., and Aranda B., (2007), IntAct: open source resource for molecular interaction data, Nucleic Acids Res, vol 35, pp D561-D566
[52] Khodadadian E., Ghasemzadeh M., Derhami V., and Mirsoleimani S A., (2012), A Novel Ranking Algorithm Based on Reinforcement Learning, Artificial Intelligence and Signal Processing (AISP), 2012 16th CSI International Symposium on, pp 546-551
[53] Kohler S., Bauer S., Horn D., and Robinson P N., (2008), Walking the Interactome for Prioritization of Candidate Disease Genes, The American Journal of Human Genetics, vol 82, pp 949-958
[54] Kohler S., Doelken S C., Mungall C J., Bauer S., Firth H V., Forestier I B., et al., (2014), The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data, Nucleic Acids Research, vol 42, pp D966-D974
[55] Krauthammer M., Kaufmann C., and Gilliam T., (2004), Molecular triangulation: bridging linkage and molecular network information for identifying candidate genes in Alzheimer's disease, Proc Natl Acad Sci USA, vol 101, pp 15148-15153
[56] Lage K., Karlberg E O., Storling Z M., Olason P I., Pedersen A G., Rigina O., et al., (2007), A human phenome-interactome network of protein complexes implicated in genetic disorders, Nat Biotechnol, vol 25, pp 309-316
[57] Le D.-H., (2015), Network-based ranking methods for prediction of novel disease associated microRNAs, Computational Biology and Chemistry, vol 58, pp 139-148
[58] Le D.-H., (2015), A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks, Algorithms for Molecular Biology, vol 10
[59] Le D.-H., Hoai N X., and Kwon Y.-K., (2015), A Comparative study of classification-based machine learning methods for novel disease gene prediction, Knowledge and Systems Engineering, vol 326, pp 577-588
[60] Le D.-H and Kwon Y.-K., (2012), GPEC: A Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection, Computational Biology and Chemistry, vol 37, pp 17-23
[61] Le D.-H and Kwon Y.-K., (2013), Neighbor-favoring weight reinforcement to improve random walk-based disease gene prioritization, Computational Biology and Chemistry, vol 44, pp 1-8
[62] Le D.-H and Nguyen M.-H., (2015), Towards more realistic machine learning techniques for prediction of disease-associated genes, In: Proceedings of the sixth international symposium on information and communication technology, Hue City, 2833269, ACM, pp 116-120
[63] Le D.-H., (2015), Disease phenotype similarity improves the prediction of novel disease-associated microRNAs, In: 2015 2nd National Foundation for Science and Technology Development conference on information and computer science (NICS), pp 76-81
[64] Lee D., Park J., Kay K., Christakis N., Oltvai Z., and Barabasi A., (2008), The implications of human metabolic network topology for disease comorbidity, Proc Natl Acad Sci, vol 105, pp 9880-9885
[65] Li J., Gong B., Chen X., Liu T., Wu C., Zhang F., et al., (2011), DOSim: an R package for similarity between diseases based on disease ontology, BMC Bioinformatics, vol 12
[66] Li Y and Patra J., (2010), Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network, Bioinformatics, vol 26, pp 1219-1224
[67] Linghu B., Snitkin E S., Hu Z., Xia Y., and DeLisi C., (2009), Genome-wide prioritization of disease genes and identification of disease-disease associations from an integrated human functional linkage network, Genome Biology, vol 10
[68] Liu M., Liberzon A., and Kong S., (2007), Network-based analysis of affected biological processes in type 2 diabetes models, PLoS Genet, vol 3, p e96
[69] López-Bigas N and Ouzounis C., (2004), Genome-wide identification of genes likely to be involved in human genetic disease, Nucleic Acids Research, vol 32, pp 3108-3114
[70] Lovász L., (1996), Random walks on graphs: A survey, Combinatorics, Paul Erdős is Eighty, vol 2, pp 353-398
[71] Lu M., Zhang Q., Deng M., Miao J., Guo Y., Gao W., et al., (2008), An analysis of human microRNA and disease associations, PLoS ONE, vol
[72] Lubovac Z., Gamalielsson J., and Olsson B., (2006), Combining functional and topological properties to identify core modules in protein interaction networks, Proteins, vol 64, pp 948-959
[73] Maglott D., Ostell J., Pruitt K D., and Tatusova T., (2011), Entrez gene: gene-centered information at NCBI, Nucleic Acids Res, vol 39(suppl 1), pp D52-D57
[74] Markou M and Singh S., (2003), Novelty detection: a review, part 2: neural network based approaches, Signal Process, vol 83, pp 2499-2521
[75] Masoudi-Nejad A and Meshkin A., (2014), RETRACTED CHAPTER: Gene Prioritization Resources and the Evaluation Method, in Gene Prioritization: Rationale, Methodologies and Algorithms, Cham: Springer International Publishing, pp 9-23
[76] Myers S A., Nield A., and Myers M., (2012), Zinc Transporters, Mechanisms of Action and Therapeutic Utility: Implications for Type 2 Diabetes Mellitus, Journal of Nutrition and Metabolism, vol 2012, p 13
[77] Nabieva E., Jim K., Agarwal A., Chazelle B., and Singh M., (2005), Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps, Bioinformatics, vol 21, pp 302-310
[78] Navlakha S and Kingsford C., (2010), The power of protein interaction networks for associating genes with diseases, Bioinformatics, vol 26, pp 1057-1063
[79] Neduva V., Linding R., Su-Angrand I., Stark A., Masi F d., Gibson T., et al., (2005), Systematic discovery of new recognition peptides mediating protein interaction network, PLoS Biol, vol 3, p e405
[80] Obayashi T and Kinoshita K., (2011), COXPRESdb: a database to compare gene coexpression in seven model animals, Nucleic Acids Res, vol 39, pp D1016-D1022
[81] Obayashi T., Kinoshita K., Nakai K., Shibaoka M., Hayashi S., Saeki M., et al., (2006), ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis, Nucleic Acids Res, vol 35, pp D863-D869
[82] Osborne J D., Flatow J., Holko M., Lin S M., Kibbe W A., Zhu L J., et al., (2009), Annotating the human genome with Disease Ontology, BMC Genomics, vol 10, p S6
[83] Oti M and Brunner H., (2007), The modular nature of genetic diseases, Clin Genet, vol 71, pp 1-11
[84] Oti M., Snel B., Huynen M A., and Brunner H G., (2006), Predicting disease genes using protein-protein interactions, J Med Genet, vol 43, pp 691-699
[85] Perez-Iratxeta C., Bork P., and Andrade M A., (2002), Association of genes to genetically inherited diseases using data mining, Nature Genetics, vol 31, pp 316-319
[86] Peri S., Navarro J., Amanchy R., and Kristiansen T., (2003), Development of human protein reference database as an initial platform for approaching systems biology in humans, Genome Res, vol 13, pp 2363-2371
[87] Pers T., Hansen N., Lage K., Koefoed P., Dworzynski P., Miller M., et al., (2011), Meta-analysis of heterogeneous data sources for genome-scale identification of risk genes in complex phenotypes, Genetic Epidemiology, vol 35, pp 318-332
[88] Pesquita C., Faria D., Falcão A O., Lord P., and Couto F M., (2009), Semantic Similarity in Biomedical Ontologies, PLoS Comput Biol, vol 5(7), p e1000443
[89] Piro R., Molineris I., Ala U., Provero P., and Cunto F D., (2010), Candidate gene prioritization based on spatially mapped gene expression: an application to XLMR, Bioinformatics, vol 26, pp 618-624
[90] Piro R M and Cunto F D., (2012), Computational approaches to disease-gene prediction: rationale, classification and successes, FEBS, vol 279, pp 678-696
[91] Poretsky L., (2010), Principles of Diabetes Mellitus, Springer, New York Dordrecht Heidelberg London
[92] Prasad T K., Goel R., Kandasamy K., Keerthikumar S., Kumar S., Mathivanan S., et al., (2009), Human Protein Reference Database: 2009 update, Nucleic Acids Res, vol 37, pp D767-D772
[93] Quackenbush J., (2001), Computational analysis of microarray data, Nat Rev Genet, vol 2, pp 418-427
[94] Radivojac P., Peng K., Clark W T., Peters B J., Mohan A., Boyle S M., et al., (2008), An integrated approach to inferring gene-disease associations in humans, Proteins Struct Funct Bioinform, vol 72, pp 1030-1037
[95] Rende D., Baysal N., and Kirdar B., (2013), Complex Disease Interventions from a Network Model for Type 2 Diabetes, PLoS One, vol
[96] Resnik P., (1995), Using information content to evaluate semantic similarity in a taxonomy, Paper presented at the 14th international joint conference on artificial intelligence, vol 1, Montreal
[97] Richard I H., Cockram C S., Flyvbjerg A., and Goldstein B J., (2010), Textbook of Diabetes, Wiley-Blackwell
[98] Rual J., Venkatesan K., and Hao T., (2005), Towards a proteome-scale map of the human protein-protein interaction network, Nature, vol 437, pp 1173-1178
[99] Ruffner H., Bauer A., and Bouwmeester T., (2007), Human protein-protein interaction networks and the value for drug discovery, Drug Discov Today, vol 12, pp 709-716
[100] Rzhetsky A and Gomez S., (2001), Birth of scale-free molecular networks and the number of distinct DNA and protein domains per genome, Bioinformatics, vol 17, pp 988-996
[101] Rzhetsky A., Wajngurt D., Park N., and Zheng T., (2007), Probing genetic overlap among complex human phenotypes, Proc Natl Acad Sci USA, vol 104, pp 11694-11699
[102] Sam L., Liu Y., Li J., Friedman C., and Lussier Y., (2007), Discovery of protein interaction networks shared by diseases, Pac Symp Biocomput, pp 76-87
[103] Schlicker A., (2010), Ontology-based Similarity Measures and their Application in Bioinformatics, Universität des Saarlandes, p 166
[104] Seebacher J and Gavin A., (2011), SnapShot: Protein-protein interaction networks, Cell, vol 144, p 1000
[105] Seelow D., Schwarz J., and Schuelke M., (2008), Genedistiller - distilling candidate genes from linkage intervals, PLoS ONE, vol 3, p e3874
[106] Sharan R and Ideker T., (2006), Modeling cellular machinery through biological network comparison, Nat Biotechnol, vol 24, pp 427-433
[107] Smalter A., Lei S F., and Chen X.-w., (2007), Human Disease-Gene Classification with Integrative Sequence-Based and Topological Features of Protein-Protein Interaction Networks, In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 209-216
[108] Smith C L., Goldsmith C.-A W., and Eppig J T., (2004), The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information, Genome Biol, vol 6, p R7
[109] Stark C., Breitkreutz B., Reguly T., Boucher L., Breitkreutz A., and Tyers M., (2006), BioGRID: a general repository for interaction datasets, Nucleic Acids Res, vol 34, pp 535-539
[110] Stelzl U., Worm U., Lalowski M., Haenig C., Brembeck F., Goehler H., et al., (2005), A human protein-protein interaction network: a resource for annotating the proteome, Cell, vol 122, pp 957-968
[111] Sun J., Patra J C., and Li Y., (2009), Functional link artificial neural network-based disease gene prediction, In: International joint conference on neural networks (IJCNN), 14-19 June 2009
[112] Taylor I., Linding R., and Warde-Farley D., (2009), Dynamic modularity in protein interaction networks predicts breast cancer outcome, Nat Biotechnol, vol 27, pp 199-204
[113] Taylor R., (2012), Insulin Resistance and Type 2 Diabetes, Diabetes, vol 61, pp 778-779
[114] Tiffin N., Kelso F J., Powell R A., et al., (2005), Integration of text- and data-mining using ontologies successfully selects disease gene candidates, Nucleic Acids Research, vol 33, pp 1544-1552
[115] Tranchevent L C., Barriot R., Yu S., Vooren S V., Loo P V., Coessens B., et al., (2008), Endeavour update: a web resource for gene prioritization in multiple species, Nucleic Acids Research, vol 36, pp W377-W384
[116] Vanunu O., Magger O., Ruppin E., Shlomi T., and Sharan R., (2010), Associating genes and protein complexes with disease via network propagation, PLoS Comput Biol, vol 6(1), p e1000641
[117] Vidal M., (2009), A unifying view of 21st century systems biology, FEBS Lett, vol 583, pp 3891-3894
[118] Vidal M., Cusick M., and Barabasi A., (2011), Interactome networks and human disease, Cell, vol 144, pp 986-998
[119] Wagner A and Fell D., (2001), The small world inside large metabolic networks, Proc Biol Sci, vol 268, pp 1803-1810
[120] Wang H., Chang C K., Yang H.-I., and Chen Y., (2013), Estimating the Relative Importance of Nodes in Social Networks, Journal of Information Processing Society of Japan, vol 21(3), pp 414-422
[121] Wang X., Gulbahce N., and Yu H., (2011), Network-based methods for human disease gene prediction, Briefings in Functional Genomics, vol 10, pp 280-293
[122] Watts D J and Strogatz S H., (1998), Collective dynamics of small-world networks, Nature, vol 393, pp 440-442
[123] Wong S., Zhang L., Tong A., Li Z., Goldberg D., King O., et al., (2004), Combining biological networks to predict genetic interactions, Proc Natl Acad Sci USA, vol 101, pp 15682-15687
[124] Wu X., Jiang R., Zhang M Q., and Li S., (2008), Network-based global inference of human disease genes, Mol Syst Biol, vol
[125] Wu X., Liu Q., and Jiang R., (2009), Align human interactome with phenome to identify causative genes and networks underlying disease families, Bioinformatics, vol 25, pp 98-104
[126] Yu H., Tardivo L., and Tam S., (2011), Next-generation sequencing to generate interactome datasets, Nat Methods, vol 8, pp 478-480
[127] Zhang W., Sun F., and Jiang R., (2011), Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach, The Ninth Asia Pacific Bioinformatics Conference
[128] Zhou M., Wang X., Li J., Hao D., Wang Z., Shi H., et al., (2015), Prioritizing candidate disease-related long non-coding RNAs by walking on the heterogeneous lncRNA and disease network, Mol Biosyst, vol 11, pp 760-769
[129] Zhu M and Zhao S., (2007), Candidate gene identification approach: progress and challenges, Int J Biol Sci, vol 3, pp 420-427
