DOCUMENT INFORMATION
Pages: 87
File size: 11.01 MB
CONTENTS
MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

MASTER'S THESIS

HOÀNG VŨ

PARALLEL SOLUTION FOR THE METAGENOMIC SEQUENCE CLUSTERING PROBLEM

MAJOR: COMPUTER SCIENCE - 8480101
Scientific advisor: Dr. LÊ VĂN VINH
SKC007257
Ho Chi Minh City, April 2021

SCIENTIFIC CURRICULUM VITAE

I. BRIEF BIOGRAPHY
Full name: HOÀNG VŨ
Gender: Male
Date of birth: 26 / / 1983
Place of birth: Kiên Giang
Hometown: Thái Bình
Ethnicity: Kinh
Residence or contact address: 2111 Quảng Lộc hamlet, Quảng Tiến commune, Trảng Bom district, Đồng Nai province
Home phone: 0989216882
E-mail: hvu267@gmail.com

II. EDUCATION
Undergraduate:
Mode of study: Full-time
Period of study: 9/2001 to 3/2006
Institution (university, city): Ho Chi Minh City University of Technology
Major: Information Technology
Graduation thesis title: Timetabling software for Ho Chi Minh City University of Technology
Defense date and place: 12/2005, Ho Chi Minh City University of Technology
Advisor: Dr. Nguyễn Thanh Sơn
Master's:
Mode of study: Full-time
Period of study: 10/2019 to 5/2021
Institution (university, city): Ho Chi Minh City University of Technology and Education
Major: Computer Science
Thesis title: Parallel solution for the metagenomic sequence clustering problem

REFERENCES
[1] J. Handelsman et al. The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. Washington (DC): National Academies Press (US); 2007: 12-32.
[2] Fiers, Walter, et al. Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature 260.5551 (1976): 500-507.
[3] Sanger, F., Coulson, A. R., Friedmann, T., Air, G. M., Barrell, B. G., Brown, N. L., & Smith, M. (1978). The nucleotide sequence of bacteriophage φX174. Journal of Molecular Biology, 125(2), 225-246.
[4] Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269.5223 (1995): 496-512.
[5] Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM. Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology 5.10 (1998): R245-R249.
[6] Rondon MR, August PR, Bettermann AD, Brady SF, Grossman TH, et al. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Applied and Environmental Microbiology 66.6 (2000): 2541-2547.
[7] Wooley, John C., Adam Godzik, and Iddo Friedberg. A primer on metagenomics. PLoS Comput Biol 6.2 (2010): e1000667.
[8] Qin, J., Li, R., Raes, J., et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464.7285 (2010): 59-65.
[9] Thomas, T., Gilbert, J., & Meyer, F. Metagenomics - a guide from sampling to data analysis. Microbial Informatics and Experimentation 2.1 (2012): 52.
[10] L. Panawala. "Difference Between Gene and Genome", Feb 2017. Internet: https://www.researchgate.net/publication/313839958_Difference_Between_Gene_and_Genome, accessed 02/2021.
[11] Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. The structure and function of DNA. In Molecular Biology of the Cell, 4th edition. Garland Science, 2002.
[12] Woese, C. R., Kandler, O., & Wheelis, M. L. (1990). Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences, 87(12), 4576-4579.
[13] Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74.12 (1977): 5463-5467.
[14] Shendure, J., Ji, H. Next-generation DNA sequencing. Nature Biotechnology 26.10 (2008): 1135-1145.
[15] Metzker, M. L. Sequencing technologies - the next generation. Nature Reviews Genetics 11.1 (2010): 31-46.
[16] J. G. Black. Microbiology, 8th ed. US: Wiley, January 2012.
[17] Bohlin J. Genomic signatures in microbes - properties and applications. The Scientific World Journal 2011;11:715-725.
[18] Mesbah, M. K., Premachandran, U., & Whitman, W. B. (1989). Precise Measurement of the G+C Content of Deoxyribonucleic Acid by High-Performance Liquid Chromatography. International Journal of Systematic and Evolutionary Microbiology, 39, 159-167, April 2011.
[19] Muto A, Osawa S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA 1987;84(1):166-169.
[20] Sueoka N. On the genetic basis of variation and heterogeneity of DNA base composition. Proc Natl Acad Sci USA 1962 Apr 15;48:582-592.
[21] Gori, F., Mavroedis, D., Jetten, M. S., & Marchiori, E. Genomic signatures for metagenomic data analysis: Exploiting the reverse complementarity of tetranucleotides. In 2011 IEEE International Conference on Systems Biology (ISB), Sep 2011: 149-154.
[22] Jeffrey, H. J. Chaos game representation of gene structure. Nucleic Acids Research, 18(8) (1990): 2163-2170.
[23] Saeed, I., & Halgamuge, S. K. The oligonucleotide frequency derived error gradient and its application to the binning of metagenome fragments. BMC Genomics, Vol. 10, No. S3, 1-13. BioMed Central, 2009.
[24] Dalevi D, Dubhashi D, Hermansson M. Bayesian classifiers for detecting HGT using fixed and variable order Markov models of genomic signatures. Bioinformatics 2006;22(5):517-522.
[25] Bohlin, J., Skjerve, E., & Ussery, D. W. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes. BMC Genomics 9.1 (2008): 104.
[26] Kelley, D. R., Salzberg, S. L. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics 11.1 (2010): 544.
[27] Pengyu N., Yun X., Wenhua C., and Weihua P. Metabinning: Hybrid metagenomic fragments binning using l-mers repeats and composition. In The 6th International Conference on Bioinformatics and Biomedical Engineering (iCBBE 2012), China, pp. 375-378.
[28] Wu YW, Ye Y. A novel abundance-based algorithm for binning metagenomic sequences using l-tuples. J Comput Biol 2011;18(3):523-534.
[29] Tanaseichuk O., Borneman J., and Jiang T. Separating metagenomic short reads into genomes via clustering. Algorithms Mol Biol 7.1 (2012): 27.
[30] Wang Y., Leung H. C., Yiu S. M., and Chin F. Y. Metacluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics, 28(18) (2012), pp. i356-i362.
[31] Wang Y., Leung H. C., Yiu S. M., and Chin F. Y. Metacluster 4.0: a novel binning algorithm for NGS reads and huge number of species. Journal of Computational Biology, 19(2) (2012), pp. 241-249.
[32] Olga T., James B., and Tao J. A probabilistic approach to accurate abundance-based binning of metagenomic reads. Algorithms in Bioinformatics, 7534 (2012), pp. 404-416.
[33] Patterson, David A., John L. Hennessy (1998). Computer Organization and Design, Second Edition. Morgan Kaufmann Publishers, p. 715. ISBN 1-55860-428-6.
[34] Czarnul, P., Proficz, J., & Drypczewski, K. Survey of methodologies, approaches, and challenges in parallel programming using high-performance computing systems. Scientific Programming, 2020.
[35] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 3.0, Sep 2012.
[36] Flynn, Michael J. (September 1972). Some Computer Organizations and Their Effectiveness. IEEE Transactions on Computers C-21 (9): 948-960.
[37] Singh, I. Review on Parallel and Distributed Computing (review article). Scholars Journal of Engineering and Technology (SJET), 2013, 218-225.
[38] Amdahl, G. M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of AFIPS Spring Joint Computer Conference, April 1967: 483-485.
[39] Gustafson, J. L. Reevaluating Amdahl's law. Communications of the ACM, Vol. 31(5), pp. 532-533, 1988.
[40] Andrey K., Srijak B., Jonathan D., and Joshua S. W. Unsupervised statistical clustering of environmental shotgun sequences. BMC Bioinformatics, 10.1 (2009): 316.
[41] T. C. & Z. D. Nguyen. Markovbin: An algorithm to cluster metagenomic reads using a mixture modeling of hierarchical distributions. In Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics 2013 (pp. 115-123).
[42] Wang, Y., Hu, H., & Li, X. MBBC: an efficient approach for metagenomic binning based on clustering. BMC Bioinformatics, 16(1) (2015): 36.
[43] Yang B, Peng Y, Qin J, Chin FYL. Metacluster: unsupervised binning of environmental genomic fragments and taxonomic annotation. In Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology 2010: 170-179.
[44] Liao R, Zhang R, Guan J, Zhou S. A new unsupervised binning approach for metagenomic sequences based on n-grams and automatic feature weighting. IEEE/ACM Trans Comput Biol Bioinform 2014;11(1):42-54.
[45] Wu, Y. W., Simmons, B. A., & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics, 32(4) 2016: 605-607.
[46] Vinh, L. V., Lang, T. V., Binh, L. T., et al. A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads. Algorithms Mol Biol 10 (2015).
[47] Fiannaca, A., La Paglia, L., La Rosa, M., Renda, G., Rizzo, R., Gaglio, S., & Urso, A. Deep learning models for bacteria taxonomic classification of metagenomic data. BMC Bioinformatics, 19(7), 2018, 61-76.
[48] Le, V. V., Van Lang, T., & Van Hoai, T. MetaAB - A Novel Abundance-Based Binning Approach for Metagenomic Sequences. In International Conference on Nature of Computation and Communication, Nov 2014: 132-141.
[49] Kang, D. D., Froula, J., Egan, R., & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ (2015): e1165.
[50] Alneberg, J., Bjarnason, B. S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U. Z., & Quince, C. CONCOCT: clustering contigs on coverage and composition. arXiv preprint arXiv:1312.4038 (2013).
[51] Herath, D., Tang, S. L., Tandon, K., Ackland, D., & Halgamuge, S. K. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision. BMC Bioinformatics, 18(16) (2017): 571.
[52] Lu, Y. Y., Chen, T., Fuhrman, J. A., & Sun, F. COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, COalignment and paired-end read LinkAge. Bioinformatics, 33(6) (2017): 791-798.
[53] Wood, D. E., Lu, J., & Langmead, B. Improved metagenomic analysis with Kraken. Genome Biology, 20(1), 2019, 257.
[54] Liang, Q., Bible, P. W., Liu, Y., Zou, B., & Wei, L. DeepMicrobes: taxonomic classification for metagenomics with deep learning. NAR Genomics and Bioinformatics, 2(1), 2020, lqaa009.
[55] Van Le, V., Van Tran, H., Duong, H. N., Bui, G. X., & Van Tran, L. Taxonomic assignment for large-scale metagenomic data on high-performance systems. Journal of Computer Science and Cybernetics, 33(2) (2017): 119-130.
[56] Rasheed, Z., & Rangwala, H. A map-reduce framework for clustering metagenomes. In 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (May 2013): 549-558.
[57] Yang, X., Zola, J., & Aluru, S. Large-scale metagenomic sequence clustering on map-reduce clusters. Journal of Bioinformatics and Computational Biology, 11(01) (2013): 1340001.
[58] Su, X., Xu, J., & Ning, K. Parallel-META: efficient metagenomic data analysis based on high-performance computation. BMC Systems Biology, 6(S1) (2012): S16.
[59] Su, X., Pan, W., Song, B., Xu, J., & Ning, K. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization. PLoS ONE, 9(3) (2014): e89323.
[60] Zhou F, Olman V, Xu Y. Barcodes for genomes and applications. BMC Bioinformatics 9.1 (2008): 1-11.
[61] Chor B, Horn D, Goldman N, Levy Y, Massingham T. Genomic DNA k-mer spectra: models and modalities. Genome Biol 2009;10(10):R108.
[62] Pham, D. T., Dimov, S. S., & Nguyen, C. D. (2005). Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 219(1), 103-119.
[63] Richter DC, Ott F, Auch AF, Schmid R, Huson DH. MetaSim - a sequencing simulator for genomics and metagenomics. PLoS ONE 2008;3(10):e3373.
[64] Girotto, S., Pizzi, C., & Comin, M. MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures. Bioinformatics, 32(17) (2016): i567-i575.
[65] Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 2004;428(6978):37-43.
[66] Masud, M. A., Rahman, M. M., Bhadra, S., & Saha, S. Improved k-means Algorithm using Density Estimation. In 2019 International Conference on Sustainable Technologies for Industry 4.0 (STI) (Dec 2019): 1-6.
[67] Krishna, K., & Murty, M. N. Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3) (1999): 433-439.

LIST OF PUBLISHED WORKS
Vu Hoang, Vinh Le Van, Hoai Tran Van, Lang Tran Van and Bao Huynh Quang. Parallel algorithm for the unsupervised binning of metagenomic sequences. ICMLSC 2021, The 5th International Conference on Machine Learning and Soft Computing (ACM Conference Proceedings), Sanya, China, January 2021.

... considerable for researchers.
1.2 The metagenomic sequence clustering problem
The metagenomic sequence clustering problem is an important problem to be solved in metagenomic data analysis. The goal of the problem is to partition the sequences (called ...
... for the sequence clustering problem. Another example is the solution of Yang et al. [57], which uses the map-reduce model for clustering and classifying sequences while also building a cloud-based metagenomic solution. The solution ...
... thesis: Parallel solution for the metagenomic sequence clustering problem
Thesis defense date and place: 22/4/2021, Ho Chi Minh City University of Technology and Education
Advisor: Dr. Lê Văn Vinh
III. ...