(Doctoral dissertation) Models for efficient processing of gene expression data

MINISTRY OF EDUCATION AND TRAINING
CAN THO UNIVERSITY

HUỲNH PHƯỚC HẢI

MODELS FOR EFFICIENT PROCESSING OF GENE EXPRESSION DATA

DOCTORAL DISSERTATION
MAJOR: INFORMATION SYSTEMS
MAJOR CODE: 62480104

SCIENTIFIC SUPERVISORS:
Assoc. Prof. Dr. ĐỖ THANH NGHỊ
Dr. NGUYỄN VĂN HÒA

CAN THO, 2019

ACKNOWLEDGEMENTS

In completing this dissertation, I received guidance, attention and wholehearted support from my teachers, friends and family. I would like to express my sincere gratitude to:

Assoc. Prof. Dr. Đỗ Thanh Nghị and Dr. Nguyễn Văn Hòa, who devotedly advised, guided and encouraged me and gave me the best possible conditions for my study and research.

The lecturers and staff of the College of Information and Communication Technology, Can Tho University, who gave me additional knowledge, created favorable conditions for me, and supported me throughout my studies.

The Board of Rectors of An Giang University and the Board of Management of the Faculty of Information Technology, who enabled me to pursue further study to improve my professional qualifications, and my colleagues, who continually encouraged and helped me throughout my studies.

Finally, I am deeply grateful to my family and relatives, who helped and encouraged me throughout my studies and gave me the best possible conditions to complete this dissertation.

PhD candidate Huỳnh Phước Hải

SUMMARY

In recent years, cancer has been a leading cause of death worldwide. Consequently, a growing number of studies have sought effective solutions for diagnosing and treating cancer. Many challenges remain, however, because the causes of cancer involve genetic disorders and alterations in the natural development of cells. Gene expression analysis with machine learning models is a powerful tool for identifying changes in cells under different environmental conditions, and such models provide useful information for cancer diagnosis and treatment. However, machine learning models for classifying gene expression data are prone to overfitting, because gene expression data have a very large number of dimensions and a small number of samples. Classifying high-dimensional data is one of the 10 challenging problems of modern machine learning. In this dissertation, we address these problems with the following contributions.

First, we propose a feature extraction model that learns latent features from gene expression data with a deep convolutional neural network (DCNN). The features extracted by the DCNN improve the classification accuracy on gene expression data from both DNA Microarray and RNA-Seq technologies, and experimental results show that classification accuracy improves when the DCNN is used to extract features from gene expression data. We also propose a method that addresses both challenges of gene expression classification by applying the SMOTE data augmentation algorithm to the features extracted by the DCNN. SMOTE generates synthetic samples from these extracted features. The synthetic samples are added to the training data, which the classification algorithms then use for classification.

Second, we propose a data augmentation model for gene expression classification based on a generative adversarial network (GAN). The GAN is designed to suit gene expression data and generates synthetic data from the original data. The model is combined with classification algorithms to classify gene expression data. Experimental results show that the proposed model improves the accuracy of several algorithms: k-nearest neighbors, decision trees, support vector machines and random forests.

Third, we propose random ensemble oblique decision stumps (RODS), based on support vector machines (SVM), to classify gene expression data efficiently. The idea is to combine many random oblique decision stumps following the Bagging and Boosting approaches. The oblique decision stumps in the ensemble are built from the optimal hyperplanes obtained by training SVMs. Experimental results show that the proposed model is more effective than k-nearest neighbors, decision trees, support vector machines, random forests, bagging and AdaBoost when the data are classified directly in the original dimensions. Moreover, the proposed model further improves classification accuracy when combined with GAN-based data augmentation and DCNN-based feature extraction.

Keywords: gene expression data, classification model, deep convolutional neural network, generative adversarial network, random ensemble oblique decision stumps, support vector machines

ABSTRACT

In recent years, cancer has been a leading cause of death worldwide. Therefore, more and more studies have been conducted that aim to improve the ability to discover cancers
earlier and to diagnose them more accurately than was possible only a few years ago. However, many challenges remain in cancer treatment, because the most common causes of cancer are genetic disorders and epigenetic alterations in cells. Gene expression analysis is an exceptionally powerful tool for identifying changes in cells across different environmental conditions or developmental stages, and it provides valuable information for exploring and diagnosing disease. Gene expression classification models therefore play a key role in addressing fundamental problems related to cancer. Nevertheless, these models overfit easily, because gene expression data are very high-dimensional and have small sample sizes. Classifying such data is a recognized challenge in machine learning. In this dissertation, we tackle these issues with the following contributions.

Firstly, we propose a new feature extraction model that learns latent features from gene expression data using a deep convolutional neural network (DCNN). This model improves classification accuracy on gene expression data from both RNA-Seq and DNA Microarray platforms, and experimental results show that the DCNN extracts features from gene expression data effectively. We also propose a method that combines data enhancement with feature extraction to address both challenges of gene expression classification: the SMOTE algorithm generates new samples from the features extracted by the DCNN. These models are used with various classifiers to classify gene expression data efficiently.

Secondly, we propose a new data enhancement model for gene expression data based on a generative adversarial network (GAN). The GAN generates synthetic data from the original training datasets, and the enlarged training data are used with various classifiers to classify gene expression data. Numerical test results show that the proposed model improves the classification accuracy of algorithms including support vector
machines, k-nearest neighbors and random forests.

Finally, we investigate random ensemble oblique decision stumps (RODS) based on linear support vector machines (SVM), which are suitable for classifying very-high-dimensional microarray gene expression data. Our classification algorithms, called Bag-RODS and Boost-RODS, use bagging and boosting to learn multiple oblique decision stumps. Together, these stumps form an ensemble that is more accurate than a single model. Numerical test results show that the proposed algorithms are more accurate than state-of-the-art classification models, including k-nearest neighbors, support vector machines, decision trees, and ensembles of decision trees such as random forests, bagging and AdaBoost. In addition, these models further improve classification accuracy when combined with the GAN-based data enhancement model and the DCNN-based feature extraction model.

Keywords: gene expression data, classification, deep convolutional neural network, generative adversarial network, random ensemble oblique decision stumps, support vector machines

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
SUMMARY
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Rationale of the dissertation
1.2 Objectives, subjects, scope and research methods
1.3 Tasks and approaches of the dissertation
1.3.1 Building a feature extraction model for gene expression data
1.3.2 Building a data augmentation model for gene expression data
1.3.3 Building efficient classification models for gene expression data
1.4 Contributions of the dissertation
1.5 Structure of the dissertation
CHAPTER 2 THEORETICAL BACKGROUND AND RELATED WORK
2.1 Gene expression data
2.2 Gene expression data classification models
2.2.1 Problem statement
2.2.2 Model evaluation
2.2.3 Experimental data
2.3 Related work
2.3.1 k-nearest neighbors
2.3.2 Decision trees
2.3.3 Support vector machines
2.3.4 Ensemble methods
2.3.5 Artificial neural network models
2.3.6 Deep learning models
2.4 Discussion of related work
2.5 Chapter summary
CHAPTER 3 FEATURE EXTRACTION MODEL FOR GENE EXPRESSION DATA
3.1 Introduction
3.2 Deep convolutional neural network model for extracting features from gene expression data
3.2.1 Architecture of the deep convolutional neural network for extracting features from gene expression data
3.2.2 Feature extraction process
3.2.3 Classification algorithms on the extracted features
3.3 Experimental results
3.3.1 Classification results on DNA Microarray gene expression data
3.3.2 Classification results on RNA-Seq gene expression data
3.3.3 Classification results on a large RNA-Seq gene expression dataset
3.4 Chapter summary
CHAPTER 4 SAMPLE AUGMENTATION OF EXTRACTED FEATURES WITH SMOTE
4.1 Introduction
4.2 SMOTE sample augmentation based on features extracted from gene expression data
4.3 Experimental results
4.3.1 Experimental data
4.3.2 Model parameter settings
4.3.3 Classification results
4.4 Chapter summary
CHAPTER 5 DATA AUGMENTATION MODEL FOR GENE EXPRESSION DATA
5.1 Introduction
5.2 Sample augmentation model for gene expression data

LIST OF PUBLICATIONS
[CT1] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Novel hybrid DCNN-SVM model for classifying RNA-Sequencing gene expression data", Journal of Information and Telecommunication (JIT), Taylor & Francis, 3:4, pp 533-547, 2019
[CT2] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Enhancing gene expression classification of support vector machines with generative adversarial networks", Journal of Information and Communication Convergence Engineering (JICCE), Vol 17, pp 14-20, 2019 (SCOPUS)
[CT3] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Comparing deep learning models with other automatic learning techniques for classifying microarray gene expression data" (in Vietnamese), in proc of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR'10), pp 841-850, 2017
[CT4] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "A Coupling Support Vector Machines with the Feature Learning of Deep Convolutional Neural Networks for Classifying Microarray Gene Expression
Data", in proc of the 10th Asian Conference on Intelligent Information and Database Systems (ACIIDS), Springer, pp 233-243, 2018 [CT5] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Random ensemble oblique decision stumps for classifying gene expression data",in proc of International Symposium on Information and Communication Technology 2018, Association for Computing Machinery (SoICT), pp 137-144, 2018 [CT6] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do,"A combined enhancing and feature extraction algorithm to improve learning accuracy for gene expression classification", in proc of 6th International Conference on Future Data and Security Engineering 2019 (FDSE), Springer, pp 255-273, 2019 [CT7] Phuoc-Hai Huynh, Van Hoa Nguyen, Thanh-Nghi Do, "Improvements in the Large p, Small n Classification Issue", in Journal of SN Computer Science, Springer, 1, 207, 2020 138 luan an Tài liệu tham khảo [1] U Alon, N Barkai, D Notterman, K Gish, S Ybarra, D Mack, and A Levine, “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences, vol 96, no 12, pp 6745–6750, Jun 1999 [2] F Bray and et al., “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: a cancer journal for clinicians, 2018 [3] J Ferlay, I Soerjomataram, R Dikshit, S Eser, C Mathers, M Rebelo, D M Parkin, D Forman, and F Bray, “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International journal of cancer, vol 136, no 5, 2015 [4] L Rahib, B D Smith, R Aizenberg, A B Rosenzweig, J M Fleshman, and L M Matrisian, “Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States,” Cancer research, vol 74, no 11, pp 2913–2921, 2014 [5] C M Perou and et al., “Molecular portraits of human breast tumours,” Nature, 
vol 406, no 6797, p 747, 2000 [6] J S Reis-Filho and L Pusztai, “Gene expression profiling in breast cancer: classification, prognostication, and prediction,” The Lancet, vol 378, no 9805, pp 1812–1823, 2011 [7] R Van’t Veer, Laura J & Bernards, “Enabling personalized cancer medicine through analysis of gene-expression patterns,” Nature, vol 452, no 7187, p 564, 2008 [8] M Snyder, Genomics and personalized medicine: what everyone needs to know Oxford University Press, 2016 [9] F H Crick, “On protein synthesis,” in Symp Soc Exp Biol, vol 12, no 138-63, 1958, p [10] V E Velculescu, L Zhang, B Vogelstein, and K W Kinzler, “Serial analysis of gene expression,” Science (New York, N.Y.), vol 270, no 5235, pp 484–487, Oct 1995 [11] S Selvaraj and J Natarajan, “Microarray data analysis and mining tools,” Bioinformation, vol 6, no 3, p 95, 2011 [12] T R Golub and et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” science, vol 286, no 5439, pp 531–537, 1999 139 luan an [13] J Khan and et al., “Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,” Nature medicine, vol 7, no 6, p 673, 2001 [14] L J Van’t Veer and et al., “Gene expression profiling predicts clinical outcome of breast cancer,” nature, vol 415, no 6871, p 530, 2002 [15] Y Saeys, I Inza, and P Larra˜ naga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol 23, no 19, pp 2507–2517, 2007 [16] A Sturn, J Quackenbush, and Z Trajanoski, “Genesis: cluster analysis of microarray data ,” Bioinformatics, vol 18, no 1, pp 207–208, 2002 [17] M P Brown and et al, “Knowledge-based analysis of microarray gene expression data by using support vector machines,” Proceedings of the National Academy of Sciences, vol 97, no 1, pp 262–267, 2000 [18] I Guyon, J Weston, S Barnhill, and V Vapnik, “Gene selection for cancer classification using support vector machines,” Machine 
learning, vol 46, no 1-3, pp 389–422, 2002 [19] M Grze´s and M Kretowski, “Decision tree approach to microarray data analysis,” Biocybernetics and Biomedical Engineering, vol 27, no 3, pp 29–42, 2007 [20] R Parry, W Jones, T Stokes, J Phan, R Moffitt, H Fang, L Shi, A Oberthuer, M Fischer, W Tong, and others, “k-Nearest neighbor models for microarray gene expression analysis and clinical outcome prediction,” The pharmacogenomics journal, vol 10, no 4, p 292, 2010 [21] M Pirooznia, J Y Yang, M Q Yang, and Y Deng, “A comparative study of different machine learning methods on microarray gene expression data,” BMC genomics, vol 9, no 1, p S13, 2008 [22] J Lu and et al., “Microrna expression profiles classify human cancers,” nature, vol 435, no 7043, p 834, 2005 [23] F Gao, W Wang, M Tan, L Zhu, Y Zhang, E Fessler, L Vermeulen, and X Wang, “Deepcc: a novel deep learning-based framework for cancer molecular subtype classification,” Oncogenesis, vol 8, no 9, pp 1–12, 2019 [24] Q Yang and X Wu, “10 challenging problems in data mining research,” International Journal of Information Technology & Decision Making, vol 5, no 04, pp 597–604, 2006 [25] A Kalantari and et al., “Computational intelligence approaches for classification of medical data: State-of-the-art, future challenges and research directions,” Neurocomputing, vol 276, pp 2–22, 2018 [26] R Bellman, “Dynamic programming treatment of the travelling salesman problem,” Journal of the ACM (JACM), vol 9, no 1, pp 61–63, 1962 [27] A Brazma and J Vilo, “Gene expression data analysis,” FEBS letters, vol 480, no 1, pp 17–24, 2000 140 luan an [28] P Ferroni, F M Zanzotto, S Riondino, N Scarpato, F Guadagni, and M Roselli, “Breast cancer prognosis using a machine learning approach,” Cancers, vol 11, no 3, p 328, 2019 [29] M D Ganggayah, N A Taib, Y C Har, P Lio, and S K Dhillon, “Predicting factors for survival of breast cancer patients using machine learning techniques,” BMC medical informatics and decision making, vol 19, 
no 1, p 48, 2019 [30] L Jinyan and L Huiqing, Kent ridge bio-medical data set repository Technical report, 2002 [31] A Brazma and et al., “ArrayExpress a public repository for microarray gene expression data at the EBI,” Nucleic acids research, vol 31, no 1, pp 68–71, 2003 [32] Broad Institute TCGA Genome Data Analysis Center, “Analysis Overview for 28 January 2016,” Broad Institute of MIT and Harvard, Tech Rep., 2016 [33] Z M Hira and D F Gillies, “A Review of Feature Selection and Feature Extraction Methods Applied on Microarray Data,” Advances in Bioinformatics, vol 2015, pp 1–13, 2015 [34] U Alon, N Barkai, D A Notterman, K Gish, S Ybarra, D Mack, and A J Levine, “Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays,” Proceedings of the National Academy of Sciences, vol 96, no 12, pp 6745–6750, 1999 [35] A D Roses, “Pharmacogenetics and the practice of medicine,” Nature, vol 405, no 6788, p 857, 2000 [36] P PARK, M Pagano, and M Bonetti, “A nonparametric scoring algorithm for identifying informative genes from microarray data,” Pacific Symposium on Biocomputing Pacific Symposium on Biocomputing, vol 6, pp 52–63, 02 2001 [37] C S Tan, W S Ting, M S Mohamad, W H Chan, S Deris, and Z Ali Shah, “A review of feature extraction software for microarray gene expression data,” BioMed research international, vol 2014, 2014 [38] J Landgrebe, W Wurst, and G Welzl, “Permutation-validated principal components analysis of microarray data,” Genome Biology, vol 3, no 4, pp research0019–1, 2002 [39] A Wang and E A Gehan, “Gene selection for microarray data analysis using principal component analysis,” Statistics in medicine, vol 24, no 13, pp 2069–2087, 2005 [40] S Jonnalagadda and R Srinivasan, “Principal components analysis based methodology to identify differentially expressed genes in time-course microarray data,” BMC Bioinformatics, vol 9, no 1, p 267, Dec 2008 [41] C.-H Zheng and et al., “Gene 
Expression Data Classification Using Consensus Independent Component Analysis,” Genomics, Proteomics & Bioinformatics, vol 6, no 2, pp 74–82, 2008 [42] V Nikulin and G J McLachlan, “Penalized principal component analysis of microarray data,” in International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics Springer, 2009, pp 82–96 141 luan an [43] S.-Y Kim, “Effects of sample size on robustness and prediction accuracy of a prognostic gene signature,” BMC Bioinformatics, vol 10, no 1, Dec 2009 [44] C Stretch, S Khan, N Asgarian, R Eisner, S Vaisipour, S Damaraju, K Graham, O F Bathe, H Steed, R Greiner et al., “Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature,” PloS one, vol 8, no 6, p e65380, 2013 [45] M.-L Antonie, O R Zaiane, and A Coman, “Application of data mining techniques for medical image classification,” in Proceedings of the Second International Conference on Multimedia Data Mining Springer-Verlag, 2001, pp 94–101 [46] F Zhang, X Xu, and Y Qiao, “Deep classification of vehicle makers and models: The effectiveness of pre-training and data enhancement,” in 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO) IEEE, 2015, pp 231–236 [47] K Nithya, P D Kalaivaani, and R Thangarajan, “An enhanced data mining model for text classification,” in 2012 International Conference on Computing, Communication and Applications IEEE, 2012, pp 1–4 [48] R R Bhat, V Viswanath, and X Li, “Deepcancer: Detecting cancer via deep generative learning through gene expressions,” in 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 2017, pp 901–908 [49] E Fix and J Hodges, “Discriminatory analysis-nonparametric discrimination: Small sample performance,” California Univ Berkeley, Tech Rep., 1952 [50] L Breiman, J Friedman, R Olshen, and C Stone, Classification and Regression T rees (Monterey, California: Wadsworth) Inc, 1984 [51] J R Quinlan, C4.5: 
Programs for Machine Learning San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993 [52] Vapnik, The nature of statistical learning theory Springer science & business media, 1995 [53] L Breiman, “Random forests,” Machine learning, vol 45, no 1, pp 5–32, 2001 [54] A Statnikov, I Tsamardinos, Y Dosbayev, and C F Aliferis, “GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data,” International journal of medical informatics, vol 74, no 7-8, pp 491–503, 2005 [55] R Díaz-Uriarte and S A De Andres, “Gene selection and classification of microarray data using random forest,” BMC bioinformatics, vol 7, no 1, p 3, 2006 [56] P.-H Huynh, V.-H Nguyen, and T.-N Do, “Novel hybrid dcnn–svm model for classifying rna-sequencing gene expression data,” Journal of Information and Telecommunication, vol 3, no 4, pp 533–547, 2019 [57] P H Huynh, V H Nguyen, and T.-N Do, “So sánh mơ hình học sâu với phương pháp học tự động khác phân lớp liệu biểu gen,” in proc of the 10th National Conf on Fundamental and Applied Information Technology Research (FAIR’10), 2017, pp 841–850 142 luan an [58] P H Huynh, V H Nguyen, and T N Do, “A Coupling Support Vector Machines with the Feature Learning of Deep Convolutional Neural Networks for Classifying Microarray Gene Expression Data,” in Modern Approaches for Intelligent Information and Database Systems Springer, 2018, pp 233–243 [59] P.-H Huynh, V.-H Nguyen, and T.-N Do, “A combined enhancing and feature extraction algorithm to improve learning accuracy for gene expression classification,” in Future Data and Security Engineering, T K Dang, J Kă ung, M Takizawa, and S H Bui, Eds Cham: Springer International Publishing, 2019, pp 255–273 [60] P.-H Huynh, V H Nguyen, and T.-N Do, “Improvements in the large p, small n classification issue,” Journal of SN Computer Science, Mar 2020 [61] I Goodfellow, J Pouget-Abadie, M Mirza, B Xu, D Warde-Farley, S Ozair, A Courville, and Y Bengio, “Generative 
adversarial nets,” in Advances in neural information processing systems, 2014, pp 2672–2680 [62] P.-H Huynh, V H Nguyen, and T.-N Do, “Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks,” Journal of information and communication convergence engineering, vol 17, no 1, pp 14–20, Mar 2019 [63] L Breiman, “Bagging predictors,” Machine learning, vol 24, no 2, pp 123–140, 1996 [64] Freund and Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of computer and system sciences, vol 55, no 1, pp 119–139, 1995 [65] P.-H Huynh, V H Nguyen, and T.-N Do, “Random ensemble oblique decision stumps for classifying gene expression data,” in Proceedings of the Ninth International Symposium on Information and Communication Technology, ser SoICT 2018 New York, NY, USA: Association for Computing Machinery, 2018, p 137–144 [Online] Available: https://doi.org/10.1145/3287921.3287987 [66] C A Kaiser, M Krieger, H Lodish, and A Berk, Molecular cell biology WH Freeman, 2007 [67] J D Watson, F H Crick, and others, “Molecular structure of nucleic acids,” Nature, vol 171, no 4356, pp 737–738, 1953 [68] T K Paul and H Iba, “Extraction of informative genes from microarray data,” in Proceedings of the 7th annual conference on Genetic and evolutionary computation ACM, 2005, pp 453–460 [69] Y.-F Guan and et al., “Application of next-generation sequencing in clinical oncology to advance personalized treatment of cancer,” Chinese Journal of Cancer, vol 31, no 10, pp 463–470, Oct 2012 [70] R Sandberg, “Entering the era of single-cell transcriptomics in biology and medicine,” Nature Methods, vol 11, no 1, pp 22–24, Jan 2014 [71] M Mele and et al., “The human transcriptome across tissues and individuals,” Science, vol 348, no 6235, pp 660–665, May 2015 143 luan an [72] A Kolodziejczyk, J K Kim, V Svensson, J Marioni, and S Teichmann, “The Technology and Biology of Single-Cell RNA Sequencing,” 
Molecular Cell, vol 58, no 4, pp 610–620, May 2015 [73] J C Alwine, D J Kemp, and G R Stark, “Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes.” Proceedings of the National Academy of Sciences, vol 74, no 12, pp 5350–5354, 1977 [74] T J Aitman, “DNA microarrays in medical practice,” Bmj, vol 323, no 7313, pp 611–615, 2001 [75] M Schena, D Shalon, R W Davis, and P O Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science, vol 270, no 5235, pp 467–470, 1995 [76] R J Lipshutz, S P Fodor, T R Gingeras, and D J Lockhart, “High density synthetic oligonucleotide arrays,” Nature Genetics, vol 21, no Suppl, pp 20–24, Jan 1999 [77] Wang, “RNA-Seq: a revolutionary tool for transcriptomics,” Nature reviews genetics, vol 10, no 1, p 57, 2009 [78] R Edgar, M Domrachev, and A E Lash, “Gene Expression Omnibus: NCBI gene expression and hybridization array data repository,” Nucleic Acids Research, vol 30, no 1, pp 207–210, Jan 2002 [79] Weinstein and et al., “The cancer genome atlas pan-cancer analysis project,” Nature genetics, vol 45, no 10, p 1113, 2013 [80] K Kourou and et al., “Machine learning applications in cancer prognosis and prediction,” Computational and structural biotechnology journal, vol 13, pp 8–17, 2015 [81] Li, Yuanyuan, and et al., “A comprehensive genomic pan-cancer classification using The Cancer Genome Atlas gene expression data,” BMC genomics, vol 18, no 1, p 508, 2017 [82] A C Berger and et al., “A comprehensive Pan-Cancer molecular study of gynecologic and breast cancers,” Cancer Cell, 2018 [83] D H Wolpert and W G Macready, “No free lunch theorems for optimization,” IEEE transactions on evolutionary computation, vol 1, no 1, pp 67–82, 1997 [84] P W Novianti, K C B Roes, and M J C Eijkemans, “Evaluation of Gene Expression Classification Studies: Factors Associated with Classification Performance,” PLoS ONE, vol 9, no 4, p 
e96063, 2014 [85] X Wu and et al., “Top 10 algorithms in data mining,” Knowledge and information systems, vol 14, no 1, pp 1–37, 2008 [86] Li, “Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method,” Bioinformatics, vol 17, no 12, pp 1131–1142, 2001 144 luan an [87] S Dudoit, J Fridlyand, and T P Speed, “Comparison of discrimination methods for the classification of tumors using gene expression data,” Journal of the American statistical association, vol 97, no 457, pp 77–87, 2002 [88] S Deegalla and H Bostră om, Classification of microarrays with knn: Comparison of dimensionality reduction methods,” in International Conference on Intelligent Data Engineering and Automated Learning Springer, 2007, pp 800–809 [89] J Sun, K Passi, and C Jain, “Increasing efficiency of microarray analysis by pca and machine learning methods,” in Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP), 2016, p 56 [90] A Chu and et al., “Large-scale profiling of microRNAs for the cancer genome atlas,” Nucleic acids research, vol 44, no 1, pp e3–e3, 2015 [91] T.-N Do, N.-K Pham, H H Nguyen, and M T Nguyen, “Giải thuật rừng ngẫu nhiên với luật gán nhãn cục cho phân lớp,” in Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR’9), 2015, pp 277–284 [92] N V Chawla, K W Bowyer, L O Hall, and W P Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique.” J Artif Intell Res., vol 16, pp 321–357, 2002 [93] Netto, “Applying decision trees to gene expression data from dna microarrays: A leukemia case study,” in XXX congress of the Brazilian computer society, X workshop on medical informatics, 2010, p 10 [94] M B Al Snousy, H M El-Deeb, K Badran, and I A Al Khlil, “Suite of decision treebased classification algorithms on cancer gene expression data,” Egyptian Informatics Journal, vol 12, no 2, pp 73–82, 2011 [95] B 
Ulfenborg, K Klinga-Levan, and B Olsson, “Classification of tumor samples from expression data using decision trunks,” Cancer informatics, vol 12, pp CIN–S10 356, 2013 [96] S A Ludwig, D Jakobovic, and S Picek, “Analyzing gene expression data: Fuzzy decision tree algorithm applied to the classification of cancer data,” in Fuzzy Systems (FUZZ-IEEE), 2015 IEEE International Conference on IEEE, 2015, pp 1–8 [97] N Cristianini and J Shawe-Taylor, An introduction to support vector machines and other kernel-based learning methods Cambridge university press, 2000 [98] J Weston and C Watkins, “Multi-class support vector machines,” Citeseer, Tech Rep., 1998 [99] U H.-G Kreßel, “Pairwise Classification and Support Vector Machines, B Schă olkopf, C J C Burges, and A J Smola, Eds Cambridge, MA, USA: MIT Press, 1999, pp 255–268 [100] V Vural and J G Dy, “A hierarchical method for multi-class support vector machines,” in Proceedings of the twenty-first international conference on Machine learning ACM, 2004, p 105 145 luan an [101] C.-C Chang and C.-J Lin, “LIBSVM: a library for support vector machines,” ACM transactions on intelligent systems and technology (TIST), vol 2, no 3, p 27, 2011 [102] R.-E Fan and et al., “LIBLINEAR: A library for large linear classification,” Journal of machine learning research, vol 9, no Aug, pp 1871–1874, 2008 [103] A Statnikov, L Wang, and C F Aliferis, “A comprehensive comparison of random forests and SVM for microarray-based cancer classification,” BMC bioinformatics, vol 9, no 1, p 319, 2008 [104] T.-N Do, “Giáo trình hệ tri thức khai thác liệu,” in Giáo trình hệ tri thức khai thác liệu Nhà xuất Đại học Cần Thơ, 2012, pp 800–809 [105] E.-J Yeoh and et al., “Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling,” Cancer cell, vol 1, no 2, pp 133–143, 2002 [106] S Mukherjee, “Classifying Microarray Data Using Support Vector Machines.” Boston: Kluwer Academic Publishers, 
2003, pp 166–185 [107] T S Furey, N Cristianini, N Duffy, D W Bednarski, M Schummer, and D Haussler, “Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, vol 16, no 10, pp 906–914, 2000 [108] S.-H Hsieh and et al., “Leukemia cancer classification based on Support Vector Machine,” in Industrial Informatics (INDIN), 2010 8th IEEE International Conference on IEEE, 2010, pp 819–824 [109] C D A Vanitha, D Devaraj, and M Venkatesulu, “Gene expression data classification using support vector machine and mutual information-based gene selection,” procedia computer science, vol 47, pp 13–21, 2015 [110] S Cogill and L Wang, “Support vector machine model of developmental brain gene expression data for prioritization of Autism risk gene candidates,” Bioinformatics, vol 32, no 23, pp 3611–3618, 2016 [111] S S Hameed, R Hassan, and F F Muhammad, “Selection and classification of gene expression in autism disorder: Use of a combination of statistical filters and a GBPSO-SVM algorithm,” PloS one, vol 12, no 11, p e0187371, 2017 [112] L Gao, M Ye, and C Wu, “Cancer Classification Based on Support Vector Machine Optimized by Particle Swarm Optimization and Artificial Bee Colony,” Molecules, vol 22, no 12, p 2086, 2017 [113] R Rifkin and et al., “An analytical method for multiclass molecular cancer classification,” Siam Review, vol 45, no 4, pp 706–723, 2003 [114] L Shen and E C Tan, “Reducing multiclass cancer classification to binary by output coding and SVM,” Computational biology and chemistry, vol 30, no 1, pp 63–71, 2006 [115] A Statnikov and et al., “A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis,” Bioinformatics, vol 21, no 5, pp 631–643, 2004 146 luan an [116] C Kim and H.-y Kim, “An Improved SVM-T-RFE Based on Intensity-Dependent Normalization for Feature Selection in Gene Expression of Big-Data,” in IT Convergence and Security 2017 
Springer, 2018, pp. 44–51.
[117] O. Hamzeh, A. Alkhateeb, and L. Rueda, “Predicting Tumor Locations in Prostate Cancer Tissue Using Gene Expression,” in International Conference on Bioinformatics and Biomedical Engineering. Springer, 2018, pp. 343–351.
[118] T. N. Do, P. Lenca, S. Lallich, and N.-K. Pham, “Classifying very-high-dimensional data with random forests of oblique decision trees,” in Advances in Knowledge Discovery and Management. Springer, 2010, pp. 39–55.
[119] L. Breiman, “Bias, variance, and arcing classifiers,” 1996.
[120] K. Moorthy and M. S. Mohamad, “Random forest for gene selection and microarray data classification,” Bioinformation, vol. 7, no. 3, p. 142, 2011.
[121] X. Chen and H. Ishwaran, “Random forests for genomic data analysis,” Genomics, vol. 99, no. 6, pp. 323–329, 2012.
[122] K. Nishiwaki, K. Kanamori, and H. Ohwada, “Gene Selection from Microarray Data for Alzheimer’s Disease Using Random Forest,” International Journal of Software Science and Computational Intelligence (IJSSCI), vol. 9, no. 2, pp. 14–30, 2017.
[123] J. R. Ummadi, B. V. R. Reddy, and B. E. Reddy, “A Novel Statistical Feature Selection Measure for Decision Tree Models on Microarray Cancer Detection,” in Proceedings of International Conference on Computational Intelligence and Data Engineering. Springer, 2018, pp. 229–245.
[124] T.-T. Nguyen, J. Z. Huang, and T. T. Nguyen, “Unbiased feature selection in learning random forests for high-dimensional data,” The Scientific World Journal, vol. 2015, 2015.
[125] Q. Wang, T.-T. Nguyen, J. Z. Huang, and T. T. Nguyen, “An efficient random forests algorithm for high dimensional data classification,” Advances in Data Analysis and Classification, pp. 1–20, 2018.
[126] A. C. Tan and D. Gilbert, “Ensemble machine learning on gene expression data for cancer classification,” Applied Bioinformatics, vol. 2, no. Suppl., pp. S75–S83, 2003.
[127] M. Dettling and P. Bühlmann, “Boosting for tumor classification with gene expression data,” Bioinformatics, vol. 19, no. 9, pp. 1061–1069, 2003.
[128] M. Dettling,
“BagBoosting for tumor classification with gene expression data,” Bioinformatics, vol. 20, no. 18, pp. 3583–3593, 2004.
[129] A. Osareh and B. Shadgar, “An efficient ensemble learning method for gene microarray classification,” BioMed Research International, vol. 2013, 2013.
[130] G. Zararsiz, D. Goksuluk, S. Korkmaz, V. Eldem, I. P. Duru, A. Ozturk, and T. Unver, “Classification of RNA-Seq data via bagging support vector machines,” bioRxiv, p. 007526, 2014.
[131] J. S. Wei et al., “Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma,” Cancer Research, vol. 64, no. 19, pp. 6883–6891, 2004.
[132] T. Ayer, O. Alagoz, J. Chhatwal, J. W. Shavlik, C. E. Kahn, and E. S. Burnside, “Breast cancer risk estimation with artificial neural networks revisited: Discrimination and calibration,” Cancer, vol. 116, no. 14, pp. 3310–3321, 2010.
[133] C. Ferreira, “Designing neural networks using gene expression programming,” in Applied Soft Computing Technologies: The Challenge of Complexity. Springer, 2006, pp. 517–535.
[134] S. Min, B. Lee, and S. Yoon, “Deep learning in bioinformatics,” Briefings in Bioinformatics, vol. 18, no. 5, pp. 851–869, 2017.
[135] Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, p. 1995, 1995.
[136] D. H. Hubel and T. Wiesel, “Shape and arrangement of columns in cat’s striate cortex,” The Journal of Physiology, vol. 165, no. 3, pp. 559–568, 1963.
[137] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
[138] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[139] Y. Kim, “Convolutional neural networks for sentence classification,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1746–1751.
[140] J. Hou, B. Adhikari, and J. Cheng, “DeepSF: deep
convolutional neural network for mapping protein sequences to folds,” Bioinformatics, 2017.
[141] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[142] P. Moeskops, M. Veta, M. W. Lafarge, K. A. Eppenhof, and J. P. Pluim, “Adversarial training and dilated convolutions for brain MRI segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2017, pp. 56–64.
[143] T. Araújo et al., “Classification of breast cancer histology images using Convolutional Neural Networks,” PLoS ONE, vol. 12, no. 6, p. e0177544, 2017.
[144] E. I. Zacharaki, “Prediction of protein function using a deep convolutional neural network ensemble,” PeerJ Computer Science, vol. 3, p. e124, 2017.
[145] J. F. Nash, “Equilibrium points in n-person games,” Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48–49, 1950.
[146] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” in International Conference on Machine Learning, 2016, pp. 1060–1069.
[147] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[148] A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, and T. Brox, “Learning to generate chairs, tables and cars with convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 692–705, 2017.
[149] E. Choi, S. Biswal, B. Malin, J. Duke, W. F. Stewart, and J. Sun, “Generating Multilabel Discrete Electronic Health Records using Generative Adversarial Networks,” CoRR, vol. abs/1703.06490, 2017.
[150] T. Li, C. Zhang, and M.
Ogihara, “A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression,” Bioinformatics, vol. 20, no. 15, pp. 2429–2437, 2004.
[151] K.-B. Duan, J. C. Rajapakse, H. Wang, and F. Azuaje, “Multiple SVM-RFE for gene selection in cancer classification with expression data,” IEEE Transactions on NanoBioscience, vol. 4, no. 3, pp. 228–234, 2005.
[152] P. W. Novianti, V. L. Jong, K. C. B. Roes, and M. J. C. Eijkemans, “Factors affecting the accuracy of a class prediction model in gene expression data,” BMC Bioinformatics, vol. 16, no. 1, p. 199, 2015.
[153] R. Blagus and L. Lusa, “SMOTE for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 14, no. 1, p. 106, 2013.
[154] M. A. Haidar and M. Rezagholizadeh, “TextKD-GAN: Text generation using knowledge distillation and generative adversarial networks,” in Canadian Conference on Artificial Intelligence. Springer, 2019, pp. 107–118.
[155] Y. Saeys, I. Inza, and P. Larrañaga, “A review of feature selection techniques in bioinformatics,” Bioinformatics, vol. 23, no. 19, pp. 2507–2517, 2007.
[156] A. Gupta, H. Wang, and M. Ganapathiraju, “Learning structure in gene expression data using deep architectures, with an application to gene clustering,” in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2015, pp. 1328–1335.
[157] Y. Li, C. Huang, L. Ding, Z. Li, Y. Pan, and X. Gao, “Deep learning in bioinformatics: Introduction, application, and perspective in the big data era,” Methods, vol. 166, pp. 4–21, 2019.
[158] B. Tang, Z. Pan, K. Yin, and A. Khateeb, “Recent advances of deep learning in bioinformatics and computational biology,” Frontiers in Genetics, vol. 10, 2019.
[159] M. Joseph, M. Devaraj, and C. K. Leung, “DeepGx: Deep learning using gene expression for cancer classification,” 2019.
[160] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference on Artificial
Intelligence and Statistics, 2010, pp. 249–256.
[161] L. Zhang, W. Zhou, V. E. Velculescu, S. E. Kern, R. H. Hruban, S. R. Hamilton, B. Vogelstein, and K. W. Kinzler, “Gene expression profiles in normal and cancer cells,” Science, vol. 276, no. 5316, pp. 1268–1272, 1997.
[162] J. A. Sparano et al., “Adjuvant Chemotherapy Guided by a 21-Gene Expression Assay in Breast Cancer,” New England Journal of Medicine, vol. 379, no. 2, pp. 111–121, 2018.
[163] X. Han, “Improving gene expression cancer molecular pattern discovery using nonnegative principal component analysis,” in Genome Informatics 2008: Genome Informatics Series Vol. 21. World Scientific, 2008, pp. 200–211.
[164] W. Astuti et al., “Support vector machine and principal component analysis for microarray data classification,” in Journal of Physics: Conference Series, vol. 971. IOP Publishing, 2018, p. 012003.
[165] M. Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems, 2015.
[166] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[167] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014.
[168] D. Soydaner, “A comparison of optimization algorithms for deep learning,” International Journal of Pattern Recognition and Artificial Intelligence, Feb. 2020.
[169] P. Danaee, R. Ghaeini, and D. A. Hendrix, “A deep learning approach for cancer detection and relevant gene identification,” in Pacific Symposium on Biocomputing 2017. World Scientific, 2017, pp. 219–229.
[170] C. J. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.
[171] X. Liu, E. Campanac, H.-H. Cheung, M. N. Ziats, L. Canterel-Thouennon, M. Raygada, V. Baxendale, A. L.-Y. Pang, L. Yang, S. Swedo et al., “Idiopathic autism: cellular and molecular phenotypes in pluripotent stem cell-derived neurons,”
Molecular Neurobiology, vol. 54, no. 6, pp. 4507–4523, 2017.
[172] M. Kabbout, M. M. Garcia, J. Fujimoto, D. D. Liu, D. Woods, C.-W. Chow, G. Mendoza, A. A. Momin, B. P. James, L. Solis et al., “ETS2 mediated tumor suppressive function and MET oncogene inhibition in human non–small cell lung cancer,” Clinical Cancer Research, vol. 19, no. 13, pp. 3383–3395, 2013.
[173] G. Wang, N. Hu, H. H. Yang, L. Wang, H. Su, C. Wang, R. Clifford, E. M. Dawsey, J.-M. Li, T. Ding et al., “Comparison of global gene expression of gastric cardia and noncardia cancers from a high-risk population in China,” PLoS ONE, vol. 8, no. 5, p. e63826, 2013.
[174] K. R. Kukurba and S. B. Montgomery, “RNA sequencing and analysis,” Cold Spring Harbor Protocols, vol. 2015, no. 11, p. pdb.top084970, 2015.
[175] S. Zhao, W.-P. Fung-Leung, A. Bittner, K. Ngo, and X. Liu, “Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells,” PLoS ONE, vol. 9, no. 1, p. e78644, 2014.
[176] S. Chakraborty and T. Rahman, “The difficulties in cancer treatment,” Ecancermedicalscience, vol. 6, p. ed16, 2012.
[177] J. Van Hulse, T. M. Khoshgoftaar, and A. Napolitano, “Experimental perspectives on learning from imbalanced data,” in Proceedings of the 24th International Conference on Machine Learning (ICML ’07). Corvallis, Oregon: ACM Press, 2007, pp. 935–942.
[178] R. Blagus and L. Lusa, “SMOTE for high-dimensional class-imbalanced data,” BMC Bioinformatics, vol. 14, no. 1, p. 106, Dec. 2013.
[179] A. Butte, “The use and analysis of microarray data,” Nature Reviews Drug Discovery, vol. 1, no. 12, p. 951, 2002.
[180] V. Popovici et al., “Effect of training-sample size and classification difficulty on the accuracy of genomic predictors,” Breast Cancer Research, vol. 12, no. 1, p. R5, 2010.
[181] E. M. Karabulut and T. Ibrikci, “Discriminative deep belief networks for microarray based cancer classification,” Biomedical Research, vol. 28, no. 3, 2017.
[182] R. Fakoor, F. Ladhak, A. Nazi, and M. Huber, “Using deep learning to enhance cancer diagnosis and classification,” in
Proceedings of the International Conference on Machine Learning, vol. 28, 2013.
[183] R. Maier, R. Zimmer, and R. Küffner, “A Turing test for artificial expression data,” Bioinformatics, vol. 29, no. 20, pp. 2603–2609, Oct. 2013.
[184] A. Ghahramani, F. M. Watt, and N. M. Luscombe, “Generative adversarial networks uncover epidermal regulators and predict single cell perturbations,” bioRxiv, p. 262501, 2018.
[185] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
[186] S. Xiang and H. Li, “On the effects of batch and weight normalization in generative adversarial networks,” 2017.
[187] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015.
[188] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807–814.
[189] H. Liu, J. Li, and L. Wong, “A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns,” Genome Informatics, vol. 13, pp. 51–60, 2002.
[190] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.
[191] W. Buntine, “Learning classification trees,” Statistics and Computing, vol. 2, no. 2, pp. 63–73, Jun. 1992.
[192] D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, no. 2, pp. 241–259, 1992.
[193] W. Iba and P. Langley, “Induction of One-Level Decision Trees,” in Proceedings of the Ninth International Workshop on Machine Learning, ser. ML ’92. San Francisco, CA, USA, 1992, pp. 233–240.
[194] P. A. Futreal et al., “A census of human cancer genes,” Nature Reviews Cancer, vol. 4, no. 3, p. 177, 2004.
[195] S. K. Murthy, S. Kasif, S. Salzberg, and R. Beigel, “OC1: A randomized algorithm for building oblique decision trees,” in Proceedings of AAAI,
vol. 93. Citeseer, 1993, pp. 322–327.
[196] R. Maclin and D. Opitz, “An empirical evaluation of bagging and boosting,” AAAI/IAAI, vol. 1997, pp. 546–551, 1997.
[197] M. Skurichina and R. P. Duin, “Bagging, boosting and the random subspace method for linear classifiers,” Pattern Analysis & Applications, vol. 5, no. 2, pp. 121–135, 2002.
[198] I.-J. Kim, H. C. Kang, and J.-G. Park, “Microarray Applications in Cancer Research,” Cancer Research and Treatment, vol. 36, no. 4, p. 207, 2004.

...

Figure 2.3: Structure of the gene expression data matrix after normalization
2.2 Gene expression data classification model
2.2.1 Problem statement
The gene expression data classification model aims to predict the label of a new data point (new input)...
...features for gene expression data in order to reduce the dimensionality of the data and improve the accuracy of the classification model.
1.3.2 Research on building a data augmentation model for gene expression data
Because the experiments that generate gene expression data carry a cost...
...model evaluation.
[Diagram: Gene expression data → Data augmentation → Classification algorithm → Model evaluation]
Figure 1.2: Classification model using the data augmentation method
Therefore, the second task of the thesis is to propose a model