
Efficient Processing Models for Gene Expression Data



MINISTRY OF EDUCATION AND TRAINING
CẦN THƠ UNIVERSITY

HUỲNH PHƯỚC HẢI

EFFICIENT PROCESSING MODELS FOR GENE EXPRESSION DATA
(MÔ HÌNH XỬ LÝ HIỆU QUẢ DỮ LIỆU BIỂU HIỆN GEN)

DOCTORAL DISSERTATION
MAJOR: INFORMATION SYSTEMS
MAJOR CODE: 62480104

SCIENTIFIC SUPERVISORS
Assoc. Prof. Dr. ĐỖ THANH NGHỊ
Dr. NGUYỄN VĂN HÒA

CẦN THƠ, 2019

ACKNOWLEDGMENTS

In completing this dissertation I received guidance, attention, and wholehearted help from my teachers, friends, and family. I would like to express my sincere thanks to:

Assoc. Prof. Dr. Đỗ Thanh Nghị and Dr. Nguyễn Văn Hòa, who devotedly advised, guided, and encouraged me, and who created the best possible conditions for me throughout my studies and research.

The lecturers and staff of the Faculty of Information and Communication Technology, Cần Thơ University, who gave me additional knowledge, made every arrangement for me, and cared for and supported me during my studies.

The Rectorate of An Giang University and the heads of its Faculty of Information Technology, who made it possible for me to pursue further professional training, and my colleagues, who constantly encouraged and helped me throughout my studies.

Finally, my deepest thanks go to my family and relatives, who helped and encouraged me throughout my studies and created the best conditions for me to complete this dissertation.

PhD candidate Huỳnh Phước Hải

SUMMARY

In recent years, cancer has been one of the leading causes of death worldwide. More and more studies are therefore being carried out to find effective solutions for diagnosing and treating cancer. Many challenges remain, however, because the causes of cancer involve genetic disorders or changes in the natural development of cells. Analyzing gene expression with machine learning models is a powerful tool for identifying changes in cells under different environmental conditions. Machine learning models provide useful information for diagnosing and treating cancer. However, machine learning models that classify gene expression data overfit easily, because the data have a very large number of dimensions and a small number of samples. Classifying high-dimensional data is one of the 10 challenges of modern machine learning. In this dissertation, we address these problems with the following contributions.

First, we propose a new feature extraction model that learns the latent features of gene expression data with a deep convolutional neural network (DCNN). The new features extracted by the DCNN improve the accuracy of classifying gene expression data from the DNA microarray and RNA-Seq technologies. Experimental results show that classifier accuracy improves when the DCNN is used to extract features from gene expression data. In addition, we propose a method that addresses both challenges of gene expression classification by applying the SMOTE data augmentation algorithm to the features extracted by the DCNN. SMOTE is used to generate synthetic data from those features. The synthetic data are added to the training data, and modern classification algorithms are then used for classification.

Second, we propose a data augmentation model for gene expression classification based on a generative adversarial network (GAN). The GAN is designed to suit gene expression data and generates synthetic data from the original data. The model is combined with classification algorithms to classify gene expression data. Experimental results show that the proposed model improves the accuracy of k nearest neighbors, decision trees, support vector machines, and random forests.

Third, we propose an ensemble of random oblique decision stumps (RODS) based on support vector machines (SVM) for efficiently classifying gene expression data. The main idea is to combine many random oblique decision stumps following the Bagging and Boosting approaches. We build the ensemble from the optimal hyperplanes obtained by training SVMs. Experimental results show that the proposed model is more effective than other algorithms, including k nearest neighbors, decision trees, support vector machines, random forests, bagging, and AdaBoost, when classifying directly in the original dimensions. Moreover, the proposed model also improves classification accuracy when combined with GAN-based data augmentation and DCNN-based feature extraction.

Keywords: gene expression data, classification models, deep convolutional neural network, generative adversarial network, random ensemble oblique decision stumps, support vector machines

ABSTRACT

In recent years, cancer has been a leading cause of death worldwide. More and more studies have therefore been conducted with the aim of discovering cancers earlier and diagnosing them more accurately than was possible only a few years ago. However, many challenges remain in cancer treatment because the most common causes of cancer are genetic disorders and epigenetic alterations in cells. Gene expression analysis is an exceptionally powerful tool for identifying changes in cells between different environmental conditions or developmental stages. It provides useful information for exploring and diagnosing disease. Gene expression classification models play a key role in addressing fundamental problems relating to cancer. Nevertheless, these models overfit easily because the data are very high-dimensional and the sample size is small. Classifying gene expression data is therefore a challenge in the field of machine learning. In this dissertation we tackle these issues with the following contributions.

Firstly, we propose a new feature extraction model that learns latent features from gene expression data using a deep convolutional neural network (DCNN). This model improves the classification accuracy of gene expression data on both RNA-Seq and DNA microarray platforms. Experimental results show that the DCNN is effective at extracting features from gene expression data. We also propose a combined enhancing and extraction method to address both challenges of classification models for gene expression data. In this approach, the SMOTE algorithm generates new data from features extracted by the DCNN. These models are used in conjunction with various classifiers that efficiently classify gene expression data.

Secondly, we propose a new model for enhancing gene expression data with a generative adversarial network (GAN). The GAN generates synthetic data from the original training datasets, which are used in conjunction with various classifiers to predict gene expression data. Numerical test results show that our proposed model improves the classification accuracy of algorithms including support vector machines, k nearest neighbors, and random forests.

Finally, we investigate random ensemble oblique decision stumps (RODS) based on linear support vector machines (SVM), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms (called Bag-RODS and Boost-RODS) learn multiple oblique decision stumps by bagging and boosting to form an ensemble of classifiers that is more accurate than a single model. Numerical test results show that our proposed algorithms are more accurate than state-of-the-art classification models, including k nearest neighbors, support vector machines, decision trees, and ensembles of decision trees such as random forests, bagging, and AdaBoost. In addition, these models also improve classification accuracy when combined with the GAN-based data enhancement model and the DCNN-based feature extraction model.

Keywords: gene expression data, classification, deep convolutional neural network, generative adversarial network, random ensemble oblique decision stumps, support vector machines

TABLE OF CONTENTS

ACKNOWLEDGMENTS
SUMMARY
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES

CHAPTER 1. INTRODUCTION
1.1 Urgency of the dissertation
1.2 Objectives, subjects, scope, and research methods
1.3 Tasks and approach of the dissertation
1.3.1 Building a feature extraction model for gene expression data
1.3.2 Building a data augmentation model for gene expression data
1.3.3 Building an efficient classification model for gene expression data
1.4 Contributions of the dissertation
1.5 Structure of the dissertation

CHAPTER 2. THEORETICAL BACKGROUND AND RELATED WORK
2.1 Gene expression data
2.2 Classification models for gene expression data
2.2.1 Problem statement
2.2.2 Model evaluation
2.2.3 Experimental data
2.3 Related studies
2.3.1 k nearest neighbors
2.3.2 Decision trees
2.3.3 Support vector machines
2.3.4 Ensemble methods
2.3.5 Artificial neural networks
2.3.6 Deep learning models
2.4 Discussion of related studies
2.5 Chapter summary

CHAPTER 3. A FEATURE EXTRACTION MODEL FOR GENE EXPRESSION DATA
3.1 Introduction
3.2 A deep convolutional neural network for extracting features from gene expression data
3.2.1 Architecture of the deep convolutional neural network for feature extraction from gene expression data
3.2.2 The feature extraction process
3.2.3 Classification algorithms for the extracted features
3.3 Experimental results
3.3.1 Classification results on DNA microarray gene expression data
3.3.2 Classification results on RNA-Seq gene expression data
3.3.3 Classification results on large RNA-Seq gene expression data
3.4 Chapter summary

CHAPTER 4. AUGMENTING EXTRACTED FEATURE SAMPLES WITH SMOTE
4.1 Introduction
4.2 Sample augmentation with SMOTE based on features extracted from gene expression data
4.3 Experimental results
4.3.1 Experimental data
4.3.2 Model parameter settings
4.3.3 Classification results
4.4 Chapter summary

CHAPTER 5. A DATA AUGMENTATION MODEL FOR GENE EXPRESSION DATA
5.1 Introduction
5.2 A sample augmentation model for gene expression data

PUBLISHED WORKS

[CT1] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Novel hybrid DCNN-SVM model for classifying RNA-Sequencing gene expression data", Journal of Information and Telecommunication (JIT), Taylor & Francis, vol. 3, no. 4, pp. 533-547, 2019.

[CT2] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Enhancing gene expression classification of support vector machines with generative adversarial networks", Journal of Information and Communication Convergence Engineering (JICCE), vol. 17, pp. 14-20, 2019 (SCOPUS).

[CT3] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Comparing deep learning models with other machine learning techniques for classifying microarray gene expression data" (in Vietnamese), in Proc. of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR'10), pp. 841-850, 2017.

[CT4] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "A Coupling Support Vector Machines with the Feature Learning of Deep Convolutional Neural Networks for Classifying Microarray Gene Expression Data", in Proc. of the 10th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Springer, pp. 233-243, 2018.

[CT5] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Random ensemble oblique decision stumps for classifying gene expression data", in Proc. of the International Symposium on Information and Communication Technology 2018 (SoICT), Association for Computing Machinery, pp. 137-144, 2018.

[CT6] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "A combined enhancing and feature extraction algorithm to improve learning accuracy for gene expression classification", in Proc. of the 6th International Conference on Future Data and Security Engineering 2019 (FDSE), Springer, pp. 255-273, 2019.

[CT7] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Improvements in the Large p, Small n Classification Issue", SN Computer Science, Springer, vol. 1, art. 207, 2020.
F Bray, Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012, International journal of cancer, vol 136, no 5, 2015 [4] L Rahib, B D Smith, R Aizenberg, A B Rosenzweig, J M Fleshman, and L M Matrisian, Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States, Cancer research, vol 74, no 11, pp 2913 2921, 2014 [5] C M Perou and et al., Molecular portraits of human breast tumours, Nature, vol 406, no 6797, p 747, 2000 [6] J S Reis-Filho and L Pusztai, Gene expression profiling in breast cancer: classifica-tion, prognostication, and prediction, The Lancet, vol 378, no 9805, pp 1812 1823, 2011 [7] R Van’t Veer, Laura J & Bernards, Enabling personalized cancer medicine through analysis of gene-expression patterns, Nature, vol 452, no 7187, p 564, 2008 [8] M Snyder, Genomics and personalized medicine: what everyone needs to know Ox-ford University Press, 2016 [9] F H Crick, On protein synthesis, in Symp Soc Exp Biol, vol 12, no 138-63, 1958, p [10] V E Velculescu, L Zhang, B Vogelstein, and K W Kinzler, Serial analysis of gene expression, Science (New York, N.Y.), vol 270, no 5235, pp 484 487, Oct 1995 [11] S Selvaraj and J Natarajan, Microarray data analysis and mining tools, Bioinformation, vol 6, no 3, p 95, 2011 [12] T R Golub and et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, science, vol 286, no 5439, pp 531 537, 1999 139 [13] J Khan and et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nature medicine, vol 7, no 6, p 673, 2001 [14] L J Van’t Veer and et al., Gene expression profiling predicts clinical outcome of breast cancer, nature, vol 415, no 6871, p 530, 2002 [15] Y Saeys, I Inza, and P Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics, vol 23, no 19, pp 2507 2517, 2007 [16] 
A Sturn, J Quackenbush, and Z Trajanoski, Genesis: cluster analysis of microarray data , Bioinformatics, vol 18, no 1, pp 207 208, 2002 [17] M P Brown and et al, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proceedings of the National Academy of Sciences, vol 97, no 1, pp 262 267, 2000 [18] I Guyon, J Weston, S Barnhill, and V Vapnik, Gene selection for cancer classifica-tion using support vector machines, Machine learning, vol 46, no 1-3, pp 389 422, 2002 [19] M Grzes and M Kretowski, Decision tree approach to microarray data analysis, Biocybernetics and Biomedical Engineering, vol 27, no 3, pp 29 42, 2007 [20] R Parry, W Jones, T Stokes, J Phan, R Moffitt, H Fang, L Shi, A Oberthuer, M Fischer, W Tong, and others, k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction, The pharmacogenomics journal, vol 10, no 4, p 292, 2010 [21] M Pirooznia, J Y Yang, M Q Yang, and Y Deng, A comparative study of different machine learning methods on microarray gene expression data, BMC genomics, vol 9, no 1, p S13, 2008 [22] J Lu and et al., Microrna expression profiles classify human cancers, nature, vol 435, no 7043, p 834, 2005 [23] F Gao, W Wang, M Tan, L Zhu, Y Zhang, E Fessler, L Vermeulen, and X Wang, Deepcc: a novel deep learning-based framework for cancer molecular subtype classi-fication, Oncogenesis, vol 8, no 9, pp 12, 2019 [24] Q Yang and X Wu, 10 challenging problems in data mining research, International Journal of Information Technology & Decision Making, vol 5, no 04, pp 597 604, 2006 [25] A Kalantari and et al., Computational intelligence approaches for classification of medical data: State-of-the-art, future challenges and research directions, Neurocom-puting, vol 276, pp 22, 2018 [26] R Bellman, Dynamic programming treatment of the travelling salesman problem, Journal of the ACM (JACM), vol 9, no 1, pp 61 63, 1962 [27] A Brazma and J Vilo, Gene expression data analysis, 
FEBS letters, vol 480, no 1, pp 17 24, 2000 140 [28] P Ferroni, F M Zanzotto, S Riondino, N Scarpato, F Guadagni, and M Roselli, Breast cancer prognosis using a machine learning approach, Cancers, vol 11, no 3, p 328, 2019 [29] M D Ganggayah, N A Taib, Y C Har, P Lio, and S K Dhillon, Predicting factors for survival of breast cancer patients using machine learning techniques, BMC medical informatics and decision making, vol 19, no 1, p 48, 2019 [30] L Jinyan and L Huiqing, Kent ridge bio-medical data set repository Technical report, 2002 [31] A Brazma and et al., ArrayExpress a public repository for microarray gene expression data at the EBI, Nucleic acids research, vol 31, no 1, pp 68 71, 2003 [32] Broad Institute TCGA Genome Data Analysis Center, Analysis Overview for 28 January 2016, Broad Institute of MIT and Harvard, Tech Rep., 2016 [33] Z M Hira and D F Gillies, A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data, Advances in Bioinformatics, vol 2015, pp 13, 2015 [34] U Alon, N Barkai, D A Notterman, K Gish, S Ybarra, D Mack, and A J Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences, vol 96, no 12, pp 6745 6750, 1999 [35] A D Roses, Pharmacogenetics and the practice of medicine, Nature, vol 405, no 6788, p 857, 2000 [36] P PARK, M Pagano, and M Bonetti, A nonparametric scoring algorithm for identi-fying informative genes from microarray data, Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing, vol 6, pp 52 63, 02 2001 [37] C S Tan, W S Ting, M S Mohamad, W H Chan, S Deris, and Z Ali Shah, A review of feature extraction software for microarray gene expression data, BioMed research international, vol 2014, 2014 [38] J Landgrebe, W Wurst, and G Welzl, Permutation-validated principal components analysis of microarray data, Genome Biology, vol 3, no 4, pp research0019 1, 
2002 [39] A Wang and E A Gehan, Gene selection for microarray data analysis using principal component analysis, Statistics in medicine, vol 24, no 13, pp 2069 2087, 2005 [40] S Jonnalagadda and R Srinivasan, Principal components analysis based methodol-ogy to identify differentially expressed genes in time-course microarray data, BMC Bioinformatics, vol 9, no 1, p 267, Dec 2008 [41] C.-H Zheng and et al., Gene Expression Data Classification Using Consensus Independent Component Analysis, Genomics, Proteomics & Bioinformatics, vol 6, no 2, pp 74 82, 2008 [42] V Nikulin and G J McLachlan, Penalized principal component analysis of microarray data, in International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics Springer, 2009, pp 82 96 141 [43] S.-Y Kim, Effects of sample size on robustness and prediction accuracy of a prog-nostic gene signature, BMC Bioinformatics, vol 10, no 1, Dec 2009 [44] C Stretch, S Khan, N Asgarian, R Eisner, S Vaisipour, S Damaraju, K Graham, O F Bathe, H Steed, R Greiner et al., Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature, PloS one, vol 8, no 6, p e65380, 2013 [45] M.-L Antonie, O R Zaiane, and A Coman, Application of data mining techniques for medical image classification, in Proceedings of the Second International Confer-ence on Multimedia Data Mining Springer-Verlag, 2001, pp 94 101 [46] F Zhang, X Xu, and Y Qiao, Deep classification of vehicle makers and models: The effectiveness of pre-training and data enhancement, in 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO) IEEE, 2015, pp 231 236 [47] K Nithya, P D Kalaivaani, and R Thangarajan, An enhanced data mining model for text classification, in 2012 International Conference on Computing, Communication and Applications IEEE, 2012, pp [48] R R Bhat, V Viswanath, and X Li, Deepcancer: Detecting cancer via deep genera-tive learning through gene expressions, in 2017 
IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 2017, pp 901 908 [49] E Fix and J Hodges, Discriminatory analysis-nonparametric discrimination: Small sample performance, California Univ Berkeley, Tech Rep., 1952 [50] L Breiman, J Friedman, R Olshen, and C Stone, Classification and Regression T rees (Monterey, California: Wadsworth) Inc, 1984 [51] J R Quinlan, C4.5: Programs for Machine Learning San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993 [52] Vapnik, The nature of statistical learning theory Springer science & business media, 1995 [53] L Breiman, Random forests, Machine learning, vol 45, no 1, pp 32, 2001 [54] A Statnikov, I Tsamardinos, Y Dosbayev, and C F Aliferis, GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data, International journal of medical informatics, vol 74, no 7-8, pp 491 503, 2005 [55] R D‰az-Uriarte and S A De Andres, Gene selection and classification of microarray data using random forest, BMC bioinformatics, vol 7, no 1, p 3, 2006 [56] P.-H Huynh, V.-H Nguyen, and T.-N Do, Novel hybrid dcnn svm model for classifying rna-sequencing gene expression data, Journal of Information and Telecommu-nication, vol 3, no 4, pp 533 547, 2019 [57] P H Huynh, V H Nguyen, and T.-N Do, So sĂnh mổ hnh hồc sƠu vợi cĂc phữỡng phĂp hồc tỹ ng khĂc phƠn lợp d liằu biu hiằn gen, in proc of the 10th National Conf on Fundamental and Applied Information Technology Research (FAIR’10), 2017, pp 841 850 142 [58] P H Huynh, V H Nguyen, and T N Do, A Coupling Support Vector Machines with the Feature Learning of Deep Convolutional Neural Networks for Classifying Microarray Gene Expression Data, in Modern Approaches for Intelligent Information and Database Systems Springer, 2018, pp 233 243 [59] P.-H Huynh, V.-H Nguyen, and T.-N Do, A combined enhancing and feature extraction algorithm to improve learning accuracy for gene expression classification, in Future Data and Security Engineering, 
T K Dang, J Kung, M Takizawa, and S H Bui, Eds Cham: Springer International Publishing, 2019, pp 255 273 [60] P.-H Huynh, V H Nguyen, and T.-N Do, Improvements in the large p, small n classification issue, Journal of SN Computer Science, Mar 2020 [61] I Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, A Courville, and Y Bengio, Generative adversarial nets, in Advances in neural information processing systems, 2014, pp 2672 2680 [62] P.-H Huynh, V H Nguyen, and T.-N Do, Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks, Journal of information and communication convergence engineering, vol 17, no 1, pp 14 20, Mar 2019 [63] L Breiman, Bagging predictors, Machine learning, vol 24, no 2, pp 123 140, 1996 [64] Freund and Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of computer and system sciences, vol 55, no 1, pp 119 139, 1995 [65] P.-H Huynh, V H Nguyen, and T.-N Do, Random ensemble oblique decision stumps for classifying gene expression data, in Proceedings of the Ninth International Symposium on Information and Communication Technology, ser SoICT 2018 New York, NY, USA: Association for Computing Machinery, 2018, p 137 144 [Online] Available: https://doi.org/10.1145/3287921.3287987 [66] C A Kaiser, M Krieger, H Lodish, and A Berk, Molecular cell biology WH Freeman, 2007 [67] J D Watson, F H Crick, and others, Molecular structure of nucleic acids, Nature, vol 171, no 4356, pp 737 738, 1953 [68] T K Paul and H Iba, Extraction of informative genes from microarray data, in Proceedings of the 7th annual conference on Genetic and evolutionary computation ACM, 2005, pp 453 460 [69] Y.-F Guan and et al., Application of next-generation sequencing in clinical oncology to advance personalized treatment of cancer, Chinese Journal of Cancer, vol 31, no 10, pp 463 470, Oct 2012 [70] R Sandberg, Entering the era of single-cell transcriptomics in 
biology and medicine, Nature Methods, vol 11, no 1, pp 22 24, Jan 2014 [71] M Mele and et al., The human transcriptome across tissues and individuals, Science, vol 348, no 6235, pp 660 665, May 2015 143 [72] A Kolodziejczyk, J K Kim, V Svensson, J Marioni, and S Teichmann, The Tech-nology and Biology of Single-Cell RNA Sequencing, Molecular Cell, vol 58, no 4, pp 610 620, May 2015 [73] J C Alwine, D J Kemp, and G R Stark, Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes Proceedings of the National Academy of Sciences, vol 74, no 12, pp 5350 5354, 1977 [74] T J Aitman, DNA microarrays in medical practice, Bmj, vol 323, no 7313, pp 611 615, 2001 [75] M Schena, D Shalon, R W Davis, and P O Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science, vol 270, no 5235, pp 467 470, 1995 [76] R J Lipshutz, S P Fodor, T R Gingeras, and D J Lockhart, High density synthetic oligonucleotide arrays, Nature Genetics, vol 21, no Suppl, pp 20 24, Jan 1999 [77] Wang, RNA-Seq: a revolutionary tool for transcriptomics, Nature reviews genetics, vol 10, no 1, p 57, 2009 [78] R Edgar, M Domrachev, and A E Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository, Nucleic Acids Research, vol 30, no 1, pp 207 210, Jan 2002 [79] Weinstein and et al., The cancer genome atlas pan-cancer analysis project, Nature genetics, vol 45, no 10, p 1113, 2013 [80] K Kourou and et al., Machine learning applications in cancer prognosis and prediction, Computational and structural biotechnology journal, vol 13, pp 17, 2015 [81] Li, Yuanyuan, and et al., A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data, BMC genomics, vol 18, no 1, p 508, 2017 [82] A C Berger and et al., A comprehensive Pan-Cancer molecular study of gynecologic and breast cancers, Cancer Cell, 2018 [83] D H Wolpert and W G Macready, 
No free lunch theorems for optimization, IEEE transactions on evolutionary computation, vol 1, no 1, pp 67 82, 1997 [84] P W Novianti, K C B Roes, and M J C Eijkemans, Evaluation of Gene Expression Classification Studies: Factors Associated with Classification Performance, PLoS ONE, vol 9, no 4, p e96063, 2014 [85] X Wu and et al., Top 10 algorithms in data mining, Knowledge and information systems, vol 14, no 1, pp 37, 2008 [86] Li, Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method, Bioinformatics, vol 17, no 12, pp 1131 1142, 2001 144 [87] S Dudoit, J Fridlyand, and T P Speed, Comparison of discrimination methods for the classification of tumors using gene expression data, Journal of the American statistical association, vol 97, no 457, pp 77 87, 2002 [88] S Deegalla and H Bostrom, Classification of microarrays with knn: Comparison of dimensionality reduction methods, in International Conference on Intelligent Data Engineering and Automated Learning Springer, 2007, pp 800 809 [89] J Sun, K Passi, and C Jain, Increasing efficiency of microarray analysis by pca and machine learning methods, in Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP), 2016, p 56 [90] A Chu and et al., Large-scale profiling of microRNAs for the cancer genome atlas, Nucleic acids research, vol 44, no 1, pp e3 e3, 2015 [91] T.-N Do, N.-K Pham, H H Nguyen, and M T Nguyen, GiÊi thut rng ngÔu nhiản vợi lut gĂn nhÂn cửc b cho phƠn lợp, in Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR’9), 2015, pp 277 284 [92] N V Chawla, K W Bowyer, L O Hall, and W P Kegelmeyer, SMOTE: Synthetic Minority Over-sampling Technique J Artif Intell Res., vol 16, pp 321 357, 2002 [93] Netto, Applying decision trees to gene expression data from dna microarrays: A leukemia case study, in XXX congress of the Brazilian computer 
society, X workshop on medical informatics, 2010, p 10 [94] M B Al Snousy, H M El-Deeb, K Badran, and I A Al Khlil, Suite of decision tree-based classification algorithms on cancer gene expression data, Egyptian Informatics Journal, vol 12, no 2, pp 73 82, 2011 [95] B Ulfenborg, K Klinga-Levan, and B Olsson, Classification of tumor samples from expression data using decision trunks, Cancer informatics, vol 12, pp CIN S10 356, 2013 [96] S A Ludwig, D Jakobovic, and S Picek, Analyzing gene expression data: Fuzzy decision tree algorithm applied to the classification of cancer data, in Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference on IEEE, 2015, pp [97] N Cristianini and J Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods Cambridge university press, 2000 [98] J Weston and C Watkins, Multi-class support vector machines, Citeseer, Tech Rep., 1998 [99] U H.-G Kre el, Pairwise Classification and Support Vector Machines, B Scholkopf, C J C Burges, and A J Smola, Eds Cambridge, MA, USA: MIT Press, 1999, pp 255 268 [100] V Vural and J G Dy, A hierarchical method for multi-class support vector machines, in Proceedings of the twenty-first international conference on Machine learn-ing ACM, 2004, p 105 145 [101] C.-C Chang and C.-J Lin, LIBSVM: a library for support vector machines, ACM transactions on intelligent systems and technology (TIST), vol 2, no 3, p 27, 2011 [102] R.-E Fan and et al., LIBLINEAR: A library for large linear classification, Journal of machine learning research, vol 9, no Aug, pp 1871 1874, 2008 [103] A Statnikov, L Wang, and C F Aliferis, A comprehensive comparison of random forests and SVM for microarray-based cancer classification, BMC bioinformatics, vol 9, no 1, p 319, 2008 [104] T.-N Do, Gi¡o tr…nh c¡c h» tri thøc v khai th¡c dœ li»u, in Gi¡o tr…nh c¡c h» tri thøc v khai th¡c dœ li»u Nh xu§t b£n ⁄i håc Cƒn Thì, 2012, pp 800 809 [105] E.-J Yeoh and et al., Classification, subtype discovery, 
and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer cell, vol 1, no 2, pp 133 143, 2002 [106] S Mukherjee, Classifying Microarray Data Using Support Vector Machines Boston: Kluwer Academic Publishers, 2003, pp 166 185 [107] T S Furey, N Cristianini, N Duffy, D W Bednarski, M Schummer, and D Haussler, Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics, vol 16, no 10, pp 906 914, 2000 [108] S.-H Hsieh and et al., Leukemia cancer classification based on Support Vector Ma-chine, in Industrial Informatics (INDIN), 2010 8th IEEE International Conference on IEEE, 2010, pp 819 824 [109] C D A Vanitha, D Devaraj, and M Venkatesulu, Gene expression data classifi- cation using support vector machine and mutual information-based gene selection, procedia computer science, vol 47, pp 13 21, 2015 [110] S Cogill and L Wang, Support vector machine model of developmental brain gene expression data for prioritization of Autism risk gene candidates, Bioinformatics, vol 32, no 23, pp 3611 3618, 2016 [111] S S Hameed, R Hassan, and F F Muhammad, Selection and classification of gene expression in autism disorder: Use of a combination of statistical filters and a GBPSO-SVM algorithm, PloS one, vol 12, no 11, p e0187371, 2017 [112] L Gao, M Ye, and C Wu, Cancer Classification Based on Support Vector Machine Optimized by Particle Swarm Optimization and Artificial Bee Colony, Molecules, vol 22, no 12, p 2086, 2017 [113] R Rifkin and et al., An analytical method for multiclass molecular cancer classifi- cation, Siam Review, vol 45, no 4, pp 706 723, 2003 [114] L Shen and E C Tan, Reducing multiclass cancer classification to binary by output coding and SVM, Computational biology and chemistry, vol 30, no 1, pp 63 71, 2006 [115] A Statnikov and et al., A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, 
Bioinformatics, vol. 21, no. 5, pp. 631–643, 2004.
[116] C. Kim and H.-Y. Kim, An Improved SVM-T-RFE Based on Intensity-Dependent Normalization for Feature Selection in Gene Expression of Big-Data, in IT Convergence and Security 2017. Springer, 2018, pp. 44–51.
[117] O. Hamzeh, A. Alkhateeb, and L. Rueda, Predicting Tumor Locations in Prostate Cancer Tissue Using Gene Expression, in International Conference on Bioinformatics and Biomedical Engineering. Springer, 2018, pp. 343–351.
[118] T. N. Do, P. Lenca, S. Lallich, and N.-K. Pham, Classifying very-high-dimensional data with random forests of oblique decision trees, in Advances in Knowledge Discovery and Management. Springer, 2010, pp. 39–55.
[119] L. Breiman, Bias, variance, and arcing classifiers, 1996.
[120] K. Moorthy and M. S. Mohamad, Random forest for gene selection and microarray data classification, Bioinformation, vol. 7, no. 3, p. 142, 2011.
[121] X. Chen and H. Ishwaran, Random forests for genomic data analysis, Genomics, vol. 99, no. 6, pp. 323–329, 2012.
[122] K. Nishiwaki, K. Kanamori, and H. Ohwada, Gene Selection from Microarray Data for Alzheimer's Disease Using Random Forest, International Journal of Software Science and Computational Intelligence (IJSSCI), vol. 9, no. 2, pp. 14–30, 2017.
[123] J. R. Ummadi, B. V. R. Reddy, and B. E. Reddy, A Novel Statistical Feature Selection Measure for Decision Tree Models on Microarray Cancer Detection, in Proceedings of International Conference on Computational Intelligence and Data Engineering. Springer, 2018, pp. 229–245.
[124] T.-T. Nguyen, J. Z. Huang, and T. T. Nguyen, Unbiased feature selection in learning random forests for high-dimensional data, The Scientific World Journal, vol. 2015, 2015.
[125] Q. Wang, T.-T. Nguyen, J. Z. Huang, and T. T. Nguyen, An efficient random forests algorithm for high dimensional data classification, Advances in Data Analysis and Classification, pp. 20, 2018.
[126] A. C. Tan and D. Gilbert, Ensemble machine learning on gene expression data for cancer classification, Applied
Bioinformatics, vol. 2, no. Suppl, pp. S75–83, 2003.
[127] M. Dettling and P. Bühlmann, Boosting for tumor classification with gene expression data, Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[128] M. Dettling, BagBoosting for tumor classification with gene expression data, Bioinformatics, vol. 20, no. 18, pp. 3583–3593, 2004.
[129] A. Osareh and B. Shadgar, An efficient ensemble learning method for gene microarray classification, BioMed Research International, vol. 2013, 2013.
[130] G. Zararsiz, D. Goksuluk, S. Korkmaz, V. Eldem, I. P. Duru, A. Ozturk, and T. Unver, Classification of RNA-Seq data via bagging support vector machines, bioRxiv, p. 007526, 2014.
[131] J. S. Wei et al., Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma, Cancer Research, vol. 64, no. 19, pp. 6883–6891, 2004.
[132] T. Ayer, O. Alagoz, J. Chhatwal, J. W. Shavlik, C. E. Kahn, and E. S. Burnside, Breast cancer risk estimation with artificial neural networks re, Cancer, vol. 116, no. 14, pp. 3310–3321, 2010.
[133] C. Ferreira, Designing neural networks using gene expression programming, in Applied Soft Computing Technologies: The Challenge of Complexity. Springer, 2006, pp. 517–535.
[134] S. Min, B. Lee, and S. Yoon, Deep learning in bioinformatics, Briefings in Bioinformatics, vol. 18, no. 5, pp. 851–869, 2017.
[135] Y. LeCun, Y. Bengio et al., Convolutional networks for images, speech, and time series, The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
[136] D. H. Hubel and T. Wiesel, Shape and arrangement of columns in cat's striate cortex, The Journal of Physiology, vol. 165, no. 3, pp. 559–568, 1963.
[137] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, p. 436, 2015.
[138] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[139] Y. Kim, Convolutional neural networks
for sentence classification, in Proceedings of the 2014 Conference on Empirical Methods in NLP (EMNLP), 2014, pp. 1746–1751.
[140] J. Hou, B. Adhikari, and J. Cheng, DeepSF: deep convolutional neural network for mapping protein sequences to folds, Bioinformatics, 2017.
[141] O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[142] P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim, Adversarial training and dilated convolutions for brain MRI segmentation, in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2017, pp. 56–64.
[143] T. Araújo et al., Classification of breast cancer histology images using Convolutional Neural Networks, PLoS ONE, vol. 12, no. 6, p. e0177544, 2017.
[144] E. I. Zacharaki, Prediction of protein function using a deep convolutional neural network ensemble, PeerJ Computer Science, vol. 3, p. e124, 2017.
[145] J. F. Nash et al., Equilibrium points in n-person games, Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[146] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, Generative adversarial text to image synthesis, in International Conference on Machine Learning, 2016, pp. 1060–1069.
[147] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[148] A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, and T. Brox, Learning to generate chairs, tables and cars with convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 692–705, 2017.
[149] E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun, Generating
Multilabel Discrete Electronic Health Records using Generative Adversarial Networks, CoRR, vol. abs/1703.06490, 2017.
[150] T. Li, C. Zhang, and M. Ogihara, A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression, Bioinformatics, vol. 20, no. 15, pp. 2429–2437, 2004.
[151] K.-B. Duan, J. C. Rajapakse, H. Wang, and F. Azuaje, Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Transactions on Nanobioscience, vol. 4, no. 3, pp. 228–234, 2005.
[152] P. W. Novianti, V. L. Jong, K. C. B. Roes, and M. J. C. Eijkemans, Factors affecting the accuracy of a class prediction model in gene expression data, BMC Bioinformatics, vol. 16, no. 1, p. 199, 2015.
[153] L. Lusa et al., SMOTE for high-dimensional class-imbalanced data, BMC Bioinformatics, vol. 14, no. 1, p. 106, 2013.
[154] M. A. Haidar and M. Rezagholizadeh, TextKD-GAN: text generation using knowledge distillation and generative adversarial networks, in Canadian Conference on Artificial Intelligence. Springer, 2019, pp. 107–118.
[155] Y. Saeys, I. Inza, and P. Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
[156] A. Gupta, H. Wang, and M. Ganapathiraju, Learning structure in gene expression data using deep architectures, with an application to gene clustering, in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2015, pp. 1328–1335.
[157] Y. Li, C. Huang, L. Ding, Z. Li, Y. Pan, and X. Gao, Deep learning in bioinformatics: Introduction, application, and perspective in the big data era, Methods, vol. 166, pp. 21, 2019.
[158] B. Tang, Z. Pan, K. Yin, and A. Khateeb, Recent advances of deep learning in bioinformatics and computational biology, Frontiers in Genetics, vol. 10, 2019.
[159] M. Joseph, M. Devaraj, and C. K. Leung, DeepGx: Deep learning using gene expression for cancer classification, 2019.
[160] X. Glorot and Y. Bengio, Understanding the difficulty of training
deep feedforward neural networks, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 249–256.
[161] L. Zhang, W. Zhou, V. E. Velculescu, S. E. Kern, R. H. Hruban, S. R. Hamilton, B. Vogelstein, and K. W. Kinzler, Gene expression profiles in normal and cancer cells, Science, vol. 276, no. 5316, pp. 1268–1272, 1997.
[162] J. A. Sparano et al., Adjuvant Chemotherapy Guided by a 21-Gene Expression Assay in Breast Cancer, New England Journal of Medicine, vol. 379, no. 2, pp. 111–121, 2018.
[163] X. Han, Improving gene expression cancer molecular pattern discovery using non-negative principal component analysis, in Genome Informatics 2008: Genome Informatics Series Vol. 21. World Scientific, 2008, pp. 200–211.
[164] W. Astuti et al., Support vector machine and principal component analysis for microarray data classification, in Journal of Physics: Conference Series, vol. 971. IOP Publishing, 2018, p. 012003.
[165] M. Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015.
[166] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[167] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014.
[168] D. Soydaner, A comparison of optimization algorithms for deep learning, International Journal of Pattern Recognition and Artificial Intelligence, Feb. 2020.
[169] P. Danaee, R. Ghaeini, and D. A. Hendrix, A deep learning approach for cancer detection and relevant gene identification, in Pacific Symposium on Biocomputing 2017. World Scientific, 2017, pp. 219–229.
[170] C. J. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[171] X. Liu, E. Campanac, H.-H. Cheung, M. N. Ziats, L. Canterel-Thouennon, M. Raygada, V. Baxendale, A. L.-Y. Pang, L. Yang, S. Swedo et al., Idiopathic
autism: cellular and molecular phenotypes in pluripotent stem cell-derived neurons, Molecular Neurobiology, vol. 54, no. 6, pp. 4507–4523, 2017.
[172] M. Kabbout, M. M. Garcia, J. Fujimoto, D. D. Liu, D. Woods, C.-W. Chow, G. Mendoza, A. A. Momin, B. P. James, L. Solis et al., Ets2 mediated tumor suppressive function and met oncogene inhibition in human non small cell lung cancer, Clinical Cancer Research, vol. 19, no. 13, pp. 3383–3395, 2013.
[173] G. Wang, N. Hu, H. H. Yang, L. Wang, H. Su, C. Wang, R. Clifford, E. M. Dawsey, J.-M. Li, T. Ding et al., Comparison of global gene expression of gastric cardia and noncardia cancers from a high-risk population in China, PLoS ONE, vol. 8, no. 5, p. e63826, 2013.
[174] K. R. Kukurba and S. B. Montgomery, RNA sequencing and analysis, Cold Spring Harbor Protocols, vol. 2015, no. 11, p. pdb.top084970, 2015.
[175] S. Zhao, W.-P. Fung-Leung, A. Bittner, K. Ngo, and X. Liu, Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells, PLoS ONE, vol. 9, no. 1, p. e78644, 2014.
[176] S. Chakraborty and T. Rahman, The difficulties in cancer treatment, Ecancermedicalscience, vol. 6, p. ed16, 2012.
[177] J. Van Hulse, T. M. Khoshgoftaar, and A. Napolitano, Experimental perspectives on learning from imbalanced data, in Proceedings of the 24th International Conference on Machine Learning - ICML '07. Corvalis, Oregon: ACM Press, 2007, pp. 935–942.
[178] R. Blagus and L. Lusa, SMOTE for high-dimensional class-imbalanced data, BMC Bioinformatics, vol. 14, no. 1, p. 106, Dec. 2013.
[179] A. Butte, The use and analysis of microarray data, Nature Reviews Drug Discovery, vol. 1, no. 12, p. 951, 2002.
[180] V. Popovici et al., Effect of training-sample size and classification difficulty on the accuracy of genomic predictors, Breast Cancer Research, vol. 12, no. 1, p. R5, 2010.
[181] E. M. Karabulut and T. Ibrikci, Discriminative deep belief networks for microarray based cancer classification, Biomedical Research, vol. 28, no. 3, 2017.
[182] R. Fakoor, F. Ladhak, A. Nazi, and M. Huber, Using deep
learning to enhance cancer diagnosis and classification, in Proceedings of the International Conference on Machine Learning, vol. 28, 2013.
[183] R. Maier, R. Zimmer, and R. Küffner, A Turing test for artificial expression data, Bioinformatics, vol. 29, no. 20, pp. 2603–2609, Oct. 2013.
[184] A. Ghahramani, F. M. Watt, and N. M. Luscombe, Generative adversarial networks uncover epidermal regulators and predict single cell perturbations, bioRxiv, p. 262501, 2018.
[185] S. Ioffe and C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, in International Conference on Machine Learning, 2015, pp. 448–456.
[186] S. Xiang and H. Li, On the effects of batch and weight normalization in generative adversarial networks, 2017.
[187] B. Xu, N. Wang, T. Chen, and M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv preprint arXiv:1505.00853, 2015.
[188] V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[189] H. Liu, J. Li, and L. Wong, A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Informatics, vol. 13, pp. 51–60, 2002.
[190] C. M. Bishop, Pattern Recognition and Machine Learning. Springer-Verlag New York, 2016.
[191] W. Buntine, Learning classification trees, Statistics and Computing, vol. 2, no. 2, pp. 63–73, Jun. 1992.
[192] D. H. Wolpert, Stacked generalization, Neural Networks, vol. 5, no. 2, pp. 241–259, 1992.
[193] W. Iba and P. Langley, Induction of One-Level Decision Trees, in Proceedings of the Ninth International Workshop on Machine Learning, ser. ML '92. San Francisco, CA, USA, 1992, pp. 233–240.
[194] P. A. Futreal et al., A census of human cancer genes, Nature Reviews Cancer, vol. 4, no. 3, p. 177, 2004.
[195] S. K. Murthy, S. Kasif, S. Salzberg, and R. Beigel, OC1: A randomized algorithm for building oblique decision
trees, in Proceedings of AAAI, vol. 93. Citeseer, 1993, pp. 322–327.
[196] R. Maclin and D. Opitz, An empirical evaluation of bagging and boosting, AAAI/IAAI, vol. 1997, pp. 546–551, 1997.
[197] M. Skurichina and R. P. Duin, Bagging, boosting and the random subspace method for linear classifiers, Pattern Analysis & Applications, vol. 5, no. 2, pp. 121–135, 2002.
[198] I.-J. Kim, H. C. Kang, and J.-G. Park, Microarray Applications in Cancer Research, Cancer Research and Treatment, vol. 36, no. 4, p. 207, 2004.

... especially the Affymetrix GeneChip Human Gene 1.0 ST Array (A-AFFY-141, 33297 dimensions), the Affymetrix GeneChip Human Genome U133 Plus 2.0 (A-AFFY-44, 54675 dimensions), and the Affymetrix GeneChip Human Genome HG-U133A ...

... gene expression data with a generative adversarial network (GAN [61]) to address the problem of having few samples. The GAN is used to generate new synthetic data from the gene expression data, and these data are labeled ...

Date posted: 30/07/2020, 06:24
