Supervised feature selection techniques for rice seed image classification

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY

Lâm Trần Tuấn Dzi

SUPERVISED FEATURE SELECTION TECHNIQUES FOR RICE SEED IMAGE CLASSIFICATION

Major: Computer Science
Major code: 60 48 01 01

MASTER'S THESIS IN COMPUTER SCIENCE

Scientific supervisor: Dr. Trương Hoàng Vinh

Ho Chi Minh City, 2020

DECLARATION
I declare that the thesis "Supervised feature selection techniques for rice seed image classification" is my own research. Except for the references cited in this thesis, I declare that no part of this thesis, in whole or in part, has been used to obtain a master's degree elsewhere. No product or research of others has been used in this thesis without being cited as required. This thesis has never been submitted for any degree at any other university or training institution.
Ho Chi Minh City, 2020
Lâm Trần Tuấn Dzi

ACKNOWLEDGMENTS
During my studies and the research for this thesis, I received dedicated guidance and help from my teachers, friends, and colleagues. I would like to express my deep gratitude to Dr. Trương Hoàng Vinh, who enthusiastically guided and supported me throughout the research and writing of this thesis. I was fortunate that he agreed to supervise me, and once again I sincerely thank him. I would also like to thank the leaders, lecturers, staff, and library of Ho Chi Minh City Open University for providing valuable knowledge and materials and for supporting me throughout my studies. I sincerely thank the members of my supervisor's group for their discussions, guidance, support, and valuable comments, which helped me complete this thesis. Finally, I thank my beloved family for standing by me, supporting and encouraging me, and creating favorable conditions for me to complete this thesis and the study program.

SUMMARY
Feature selection is a technique that removes redundant features, reducing dimensionality and increasing classifier accuracy, since classifier performance usually degrades when the data contain many noisy features. This thesis applies feature selection techniques to image data of several rice varieties in Vietnam, with the following proposals:
- Extracting features with three descriptors: LBP, GIST and HOG;
- Concatenating the descriptors, giving seven feature sets;
- Evaluating six common feature selection methods (mRMR, ReliefF, Ilfs, CFS, Fisher score, Lasso) on the seven feature sets;
- Classifying the seven feature sets before and after the feature selection step, with two classifiers, 1-NN and SVM;
- Proposing an improved method to increase accuracy: first, the best-performing feature selection methods above are chosen; the useful features returned by these methods are concatenated; one feature selection method is then chosen to "select and re-rank" the features one last time; finally, the re-ranked feature set is classified. This procedure is entirely new compared with previous techniques for combining feature selection methods.
The experimental results show that concatenating descriptors is more effective than using the original individual descriptors; that some feature selection methods perform better than expected, raising accuracy by a few percent while reducing the size of the data; and that the proposed method works well on the experimental data and increases accuracy, although some difficulties arise when applying it, which are presented in the thesis.
Overall, the proposed method is effective on the experimental data. The proposal opens a new research direction for Vietnamese rice variety data, and the method can be applied to data other than the experimental data. In particular, it contributes a new technique for combining feature selection methods to ensemble feature selection.

ABSTRACT
Feature selection is a method that eliminates irrelevant features and decreases the dimension of the feature space, because classification performance is sensitive to high-dimensional feature vectors. In this thesis, we apply feature selection techniques to rice seed image classification on an image database of common rice varieties in Vietnam. There are several
propositions, as follows:
- Extracting features with three local image descriptors: LBP, GIST and HOG;
- Fusing the descriptors to create seven feature sets;
- Using six common feature selection methods (mRMR, ReliefF, Ilfs, CFS, Fisher score, Lasso) for experimental evaluation on the seven feature sets;
- Using 1-NN and SVM classifiers for classification;
- Proposing a new selection method that combines multiple selection methods to create a compact representation of rice seed images. This method finds the most useful features returned by each feature selection method; these are then combined and re-ranked one last time before being fed into the classifier. However, this approach has several limitations, which are presented in this thesis.
In general, the results show the efficiency of the proposed method in terms of accuracy. This method opens a new research direction for rice varieties in Vietnam, and it can be verified on data other than those used in this thesis. Additionally, we contribute a new ensemble approach that combines multiple feature selection methods.

TABLE OF CONTENTS
Declaration
Acknowledgments
Summary
Abstract
Table of contents
List of figures and charts
List of tables
List of symbols
List of abbreviations
1 Overview of image processing and seed recognition
  1.1 Introduction
  1.2 Objectives of the thesis
  1.3 Contributions of the thesis
    1.3.1 Scientific value
    1.3.2 Practical value
    1.3.3 Socio-economic value
  1.4 Structure of the thesis
  1.5 Method
2 Theoretical background
  2.1 Color spaces
    2.1.1 RGB color space
    2.1.2 Color depth
  2.2 Features
    2.2.1 Features
    2.2.2 SIFT features
    2.2.3 LBP features
    2.2.4 GIST features
    2.2.5 HOG features
  2.3 Classifiers
    2.3.1 Random Forest
    2.3.2 K-nearest neighbor
    2.3.3 Support vector machine
  2.4 Feature selection
    2.4.1 Least Absolute Shrinkage and Selection Operator
    2.4.2 Maximum Relevance and Minimum Redundancy
    2.4.3 ReliefF
    2.4.4 Correlation Feature Selection
    2.4.5 Fisher score
    2.4.6 Infinite Latent Feature Selection
  2.5 Conclusion
3 Proposed method
  3.1 Preprocessing
    3.1.1 Size normalization
    3.1.2 Removing noisy regions outside the object in the image
  3.2 Data normalization
  3.3 Concatenating feature descriptors
    3.3.1 Extracting the LBP, GIST and HOG descriptors
    3.3.2 Concatenating the descriptors
  3.4 Data splitting and classification
    3.4.1 Data splitting
    3.4.2 Classification
  3.5 Using feature selection methods with classifiers and reading the results
  3.6 A proposal to increase accuracy on the data under study
  3.7 Conclusion
4 Experimental results
  4.1 Experimental data
  4.2 Comparison, analysis and evaluation of results
    4.2.1 Accuracy of single and concatenated descriptors
    4.2.2 Experiments with feature selection methods
    4.2.3 Experimental results of the proposed method for improving accuracy
  4.3 Conclusion
5 Future work and conclusion
  5.1 Contributions and limitations of the thesis
    5.1.1 Contributions
    5.1.2 Limitations
  5.2 Future work and conclusion
Appendix

APPENDIX
This thesis has produced two papers: one accepted for presentation at the Springer RICE 2020 conference [P1] (indexed in Scopus and Web of Science), and one under review at the journal Advances in Electrical and Electronic Engineering [P2].

PUBLICATIONS
[P1] D. Lam Tran Tuan, V. Truong Hoang, "Using feature selection based on multi-view for rice seed images classification", Fifth International Conference on Research in Intelligent and Computing in Engineering
(Springer RICE 2020), June 3-4, 2020, Thu Dau Mot, Vietnam (Accepted).
[P2] D. Lam Tran Tuan, V. Truong Hoang, "Ensemble feature selection approach based on feature ranking for rice seed images classification", Advances in Electrical and Electronic Engineering (Submitted, under review).

Using feature selection based on multi-view for rice seed images classification

Dzi Lam Tran Tuan and Vinh Truong Hoang
Ho Chi Minh City Open University, Vietnam
e-mail: dziltt.178i@ou.edu.vn; vinh.th@ou.edu.vn

Abstract. Rice is one of the primary foods for humans, and various rice varieties are cultivated in many countries. An automatic rice variety inspection system based on computer vision is needed to replace inspection by human technical experts. In this paper, we investigate three types of local descriptor, namely Local Binary Pattern, Histogram of Oriented Gradients and GIST, to characterize rice seed images. To enhance robustness, a multi-view approach is considered to concatenate these features. However, this leads to a high-dimensional feature vector, so the relevant features must be selected to obtain a compact and better model. We apply different feature selection methods, in a filter manner, to eliminate the irrelevant features. The experimental results on the VNRICE dataset show the efficiency of our proposition.

Keywords: LBP · HOG · GIST · feature selection · feature ranking · multi-view

1 Introduction
In many countries, rice is the main crop and an essential source of food, consumed by nearly half of the world's population. One of the most important factors for a high-yield crop is the purity of the rice seeds. The quality of rice depends entirely on the genetic properties of the rice variety. In practice, seeds of different rice varieties may be mixed together, which can affect productivity. More recently, ST25, a rice variety developed in Soc Trang province in the south of Vietnam, was named the best rice in the world.
After that, a large amount of fake ST25 was produced and sold in other regions. It is extremely difficult for a non-expert to distinguish or recognize different rice varieties. At present, unwanted seeds are selected and identified manually in Vietnamese companies, based on visual inspection by technical experts. This process is time-consuming and can degrade the quality of the seeds. An inspection process based on a computer vision system is therefore required to replace the human task. A huge number of vision systems for real-life problems have been developed and implemented [7,20]. In agriculture, various works use machine vision for automatic inspection and quality control of agricultural products.
Recently, technology for characterizing features of multimedia data has advanced rapidly. Various attribute descriptors have been introduced to represent images in the last decade [11]. Each kind of attribute represents the data in a specific space and has a precise spatial meaning and statistical properties. Nhat and Hoang [17] present a method that fuses the features extracted by three descriptors (Local Binary Pattern, Histogram of Oriented Gradients and GIST) based on block division for the face recognition task. Canonical correlation analysis is then applied to the concatenated features to obtain a compact representation before feeding them into the classifier. Van and Hoang [21] propose to reduce noisy and irrelevant Local Ternary Pattern (LTP) and Histogram of Oriented Gradients (HOG) features coded in different color spaces for face analysis. Mebatsion et al. [15] fuse Fourier descriptors and three geometrical features for cereal grain recognition. Duong and Hoang [6] extract features from rice seed images coded in multiple color spaces using the HOG descriptor. Multi-view learning was introduced to exploit complementary information between different views. Different local descriptors are extracted and
combined to create a multi-view image representation. When different feature sets are concatenated, not all features contribute equally to the learning task, and some features may decrease performance. Feature selection is used to select relevant features and reduce the high dimension of the original data. It increases the efficiency of learning in two ways: (1) it represents the data better by eliminating noisy or irrelevant features; (2) it makes classification or clustering more efficient, since the dimension of the selected feature space is reduced. More recently, Zhang et al. presented a complete review of attribute selection and feature-level fusion methods [23].
In this paper, we propose to extract multi-view feature sets based on the LBP, HOG and GIST descriptors from rice seed images for the classification task. Before the features are fed to the classifier, various feature selection approaches are investigated to reduce the dimension of the feature space and increase classification performance. This paper is organized as follows. Section 2 introduces the feature extraction methods based on three local image descriptors. Sections 3 and 4 present the proposed approach and the experimental results. Finally, the conclusion is given in Section 5.

2 The feature extracting methods
This section briefly reviews the three local image descriptors used in the experiments for extracting features.

2.1 Local binary pattern
The $\mathrm{LBP}_{P,R}(x_c, y_c)$ code of each pixel $(x_c, y_c)$ is calculated by comparing the gray value $g_c$ of the central pixel with the gray values $\{g_p\}_{p=0}^{P-1}$ of its $P$ neighbors, as follows [18]:
\[ \mathrm{LBP}_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} \omega(g_p - g_c)\, 2^p \tag{1} \]
where $g_c$ is the gray value of the central pixel, $g_p$ is the gray value of the $p$-th neighbor, $P$ is the number of neighbors, $R$ is the radius of the circle, and $\omega(g_p - g_c)$ is defined as
\[ \omega(g_p - g_c) = \begin{cases} 1 & \text{if } g_p - g_c \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

2.2 GIST
GIST was first
proposed by Oliva and Torralba [19] to classify objects; it represents the shape of the object. The primary idea of this approach is based on the Gabor filter:
\[ h(x, y) = e^{-\frac{1}{2}\left(\frac{x^2}{\delta_x^2} + \frac{y^2}{\delta_y^2}\right)} e^{-j 2\pi (u_0 x + v_0 y)} \tag{3} \]
For each pair $(\delta_x, \delta_y)$, filtering the image with the Gabor filter extracts the image components close to the frequency point $(u_0, v_0)$. The resulting GIST vector has many dimensions. To reduce its size, we average the filter responses over each cell of a 4 × 4 grid. Each image is filtered by a Gabor filter bank with 4 scales and 8 orientations, creating 32 feature maps of the same size.

2.3 Histograms of Oriented Gradient
The Histograms of Oriented Gradient (HOG) descriptor is applied to different tasks in machine vision [5], such as human detection [4]. HOG features are extracted by counting the occurrences of gradient orientations, based on the gradient angle and the gradient magnitude in local patches of an image. The gradient angle and magnitude at each pixel are computed in an 8 × 8 pixel patch. Next, the 64 gradient feature vectors are divided into 9 angular bins over 0–180° (20° each). The gradient magnitude $T$ and angle $K$ at each position $(k, h)$ of an image $J$ are computed as follows:
\[ \Delta_k = |J(k-1, h) - J(k+1, h)| \tag{4} \]
\[ \Delta_h = |J(k, h-1) - J(k, h+1)| \tag{5} \]
\[ T(k, h) = \sqrt{\Delta_k^2 + \Delta_h^2} \tag{6} \]
\[ K(k, h) = \tan^{-1}\frac{\Delta_k}{\Delta_h} \tag{7} \]

3 Feature Selection
Based on the availability of supervised information (i.e. labels), feature selection techniques can be grouped into two large categories: supervised and unsupervised [1]. Additionally, different feature selection strategies have been proposed according to the evaluation process, such as filter, wrapper and hybrid methods [8]. Hybrid approaches incorporate both filter and wrapper methods into a single structure in order to give an effective solution for dimensionality reduction [3]. To study the contribution of feature selection approaches to rice seed image
classification, we apply several selection approaches to images represented by multi-view descriptors. In the following, we briefly present the common feature selection methods applied in the supervised learning context.
LASSO (Least Absolute Shrinkage and Selection Operator) performs feature selection under the assumption of a linear dependency between the input features and the output values. Lasso minimizes the sum of squared residuals subject to the sum of the absolute values of the regression coefficients being less than a constant, which forces some regression coefficients to be exactly 0 [3,22].
mRMR (Maximum Relevance and Minimum Redundancy) is a feature selection criterion based on mutual information, or on distance/similarity scores. Its aim is to penalize a feature's relevancy by its redundancy with respect to the other selected features [24].
ReliefF [13] extends Relief [12] to support multiclass problems. ReliefF appears to be a promising heuristic function that may overcome the myopia of current inductive learning algorithms. Kira and Rendell used Relief as a preprocessor to eliminate irrelevant attributes from the data description before learning. ReliefF is general, relatively efficient, and reliable enough to guide the search in the learning process [14].
CFS (Correlation Feature Selection) mainly applies heuristic methods to evaluate the effect of each single feature with respect to each group, in order to obtain an optimal subset of attributes.
The Fisher score [2] identifies a subset of features such that the distances between samples in different classes are as large as possible, while the distances between samples in the same class are as small as possible. The top-ranked features are selected according to their scores.
Ilfs (Infinite Latent Feature Selection) is a technique that consists of three steps: preprocessing, then feature weighting based on a fully connected graph whose nodes represent the features. Finally, energy scores of the path
length are calculated, and the features are ranked accordingly [16].

4 Experimental Results
4.1 Experimental setup
The rice seed image database is composed of six rice seed varieties from northern Vietnam [10,6]. We apply the 1-NN and SVM classifiers to evaluate classification performance. Half of the database is selected for the training set and the rest is used for the testing set: we use the hold-out method with a 1/2 : 1/2 ratio and split the training and testing sets in a chessboard-like decomposition. All experiments are implemented in Matlab 2019a and conducted on a PC with a 3.08 GHz Xeon CPU and 64 GB of RAM.

4.2 Results
Table 1 shows the accuracy obtained by the 1-NN and SVM classifiers when no feature selection is applied. The first column indicates the features used to represent the images. We use three individual local descriptors, namely LBP, GIST and HOG, and the concatenations of these descriptors: LBP + GIST, LBP + HOG, GIST + HOG and LBP + GIST + HOG. The second column indicates the number of features (dimension) of each feature type. The third and fourth columns show the accuracy obtained by the 1-NN and SVM classifiers. We observe that the multi-view representation obtained by concatenating multiple descriptors gives better performance, although it increases the dimension. Moreover, the SVM classifier outperforms the 1-NN classifier, reaching 95.7% accuracy.

Table 1: Classification performance (accuracy, %) without feature selection for different types of features.
Features            Dimension   1-NN   SVM
LBP                       768   53.0   77.0
GIST                      512   69.4   88.3
HOG                    21,384   71.5   94.7
LBP + GIST              1,280   70.5   91.7
LBP + HOG              22,152   72.0   95.5
GIST + HOG             21,896   72.1   94.9
LBP + GIST + HOG       22,664   72.7   95.7

Table 2 presents the classification performance on LBP + HOG features with different feature selection approaches. Each row corresponds to one feature selection method. Both classifiers are used
for each feature-ranking approach. For each classifier, the ACC group has three sub-columns: (1) the 100% column gives the accuracy obtained when using all features; (2) the id% column gives the minimum percentage of features needed to reach the same accuracy as with all features; (3) the Dim column gives the corresponding number of selected features. The Max ACC group gives the maximal accuracy obtained by each feature selection method, followed by the percentage of features used and the corresponding dimension. We analyze this table by comparing the accuracy obtained with the reduction in the number of selected features. For example, with the Fisher selection method and the 1-NN classifier, only 21% of the features are needed to obtain the same accuracy as with all features.

Table 2: LBP + HOG features (22,152 dimensions): classification performance of the different feature selection methods with the 1-NN and SVM classifiers. ACC: accuracy (%); Dim: number of selected features; id%: percentage of selected features (in the ACC group, the smallest percentage reaching the accuracy obtained with all features).
1-NN
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    72.0        21    4,652   73.3       23    5,095
mRMR      72.0         5    1,108   76.4       11    2,437
ReliefF   72.0         2      443   75.7        3      665
Ilfs      72.0        98   21,709   72.2       98   21,709
Cfs       72.0        12    2,658   73.4       38    8,418
Lasso     72.0         9    1,994   75.8       19    4,209
SVM
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    95.5       100   22,152   95.5      100   22,152
mRMR      95.5        93   20,601   95.6       98   21,709
ReliefF   95.5        87   19,272   95.5       88   19,494
Ilfs      95.5        99   21,930   95.5       99   21,930
Cfs       95.5        55   12,184   95.6       88   19,494
Lasso     95.5       100   22,152   95.5      100   22,152

Figure 1 illustrates in detail the performance of the six feature selection approaches as a function of the percentage of features selected, for GIST + HOG features and LBP + GIST + HOG features. The performance depends strongly on the kind of features and on the selection method. Tables 3 and 4 lead to the same observation for the other multi-view strategies.

Table 3: GIST + HOG features (21,896 dimensions): classification performance of the different feature selection methods with the 1-NN and SVM classifiers. ACC: accuracy (%); Dim: number of selected features; id%: percentage of selected features (in the ACC group, the smallest percentage reaching the accuracy obtained with all features).
1-NN
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    72.1        20    4,379   73.4       27    5,912
mRMR      72.1         8    1,752   74.9       13    2,846
ReliefF   72.1         2      438   75.0        3      657
Ilfs      72.1        16    3,503   72.3       16    3,503
Cfs       72.1         9    1,971   73.4       66   14,451
Lasso     72.1         9    1,971   76.9       20    4,379
SVM
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    94.9        93   20,363   95.0       95   20,801
mRMR      94.9        85   18,612   95.1       93   20,363
ReliefF   94.9        83   18,174   95.1       85   18,612
Ilfs      94.9        98   21,458   95.0       99   21,677
Cfs       94.9        42    9,196   95.3       85   18,612
Lasso     94.9       100   21,896   94.9      100   21,896

According to the results obtained with the different strategies, we conclude that there is no single best approach for selecting an optimal subset of features, as confirmed in [9]; the outcome mainly depends on the feature type. The multi-view representation improves classification performance, since it gives the best accuracy of 95.7% (with the SVM classifier and without feature selection). With the Cfs method, the dimension is reduced from 22,664 features to 16,318 features (see Table 4), while the accuracy improves slightly, by 0.2%. In every case, feature selection gives an accuracy at least equal to that obtained when no selection method is applied.

Fig. 1: Accuracy of the 1-NN and SVM classifiers on GIST + HOG features and LBP + GIST + HOG features for the six feature selection methods (CFS, Fisher, Ilfs, Lasso, mRMR, ReliefF). Panels: (a) 1-NN classifier on GIST + HOG features; (b) SVM classifier on GIST + HOG features; (c) 1-NN classifier on LBP + GIST + HOG features; (d) SVM classifier on LBP + GIST + HOG features. Horizontal axis: number of selected features (10 to 100); vertical axis: accuracy.

Table 4: LBP + GIST + HOG features (22,664 dimensions): classification performance of the different feature selection methods with the 1-NN and SVM classifiers. ACC: accuracy (%); Dim: number of selected features; id%: percentage of selected features (in the ACC group, the smallest percentage reaching the accuracy obtained with all features).
1-NN
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    72.7        20    4,533   74.3       24    5,439
mRMR      72.7         6    1,360   77.2       11    2,493
ReliefF   72.7         2      453   75.6        3      680
Ilfs      72.7       100   22,664   72.8      100   22,664
Cfs       72.7        10    2,266   73.9       29    6,573
Lasso     72.7         9    2,040   76.4       18    4,080
SVM
Method    ACC 100%   id%   Dim      Max ACC   id%   Dim
Fisher    95.7        48   10,879   95.8       81   18,358
mRMR      95.7        82   18,584   95.8       85   19,264
ReliefF   95.7       100   22,664   95.7      100   22,664
Ilfs      95.7       100   22,664   95.7      100   22,664
Cfs       95.7        44    9,972   95.9       72   16,318
Lasso     95.7       100   22,664   95.7      100   22,664

5 Conclusion
This paper presents a rice seed image classification approach based on multi-view descriptors and feature selection. We use three common local image descriptors: LBP, HOG and GIST. Six feature selection methods in a supervised learning context, with a hybrid evaluation strategy, are then applied to remove irrelevant features under different view strategies. The proposed approach is evaluated on a rice seed image database and shows its efficiency by improving classification performance and reducing the dimension of the feature space. This work is now being extended to ensemble feature selection, which combines multiple selection methods to achieve a more compact and better representation.

References
1. K. Benabdeslem and M. Hindawi. Constrained Laplacian score for
semi-supervised feature selection In Machine Learning and Knowledge Discovery in Databases, pages 204–218 Springer, 2011 Christopher M Bishop et al Neural networks for pattern recognition Oxford university press, 1995 Jie Cai, Jiawei Luo, Shulin Wang, and Sheng Yang Feature selection in machine learning: A new perspective Neurocomputing, 300:70–79, July 2018 N Dalal and B Triggs Histograms of oriented gradients for human detection In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), volume 1, pages 886–893 IEEE Oscar Deniz, Gloria Bueno, Jes´ us Salido, and Fernando De la Torre Face recognition using histograms of oriented gradients Pattern Recognition Letters, 32:1598– 1603, 09 2011 H Duong and V T Hoang Dimensionality reduction based on feature selection for rice varieties recognition In 2019 4th International Conference on Information Technology (InCIT), pages 199–202, Oct 2019 Juliana Gomes and Fabiana Leta Applications of computer vision techniques in the agriculture and food industry: A review European Food Research and Technology, 235:989–1000, 09 2014 I Guyon and A Elisseeff An introduction to variable and feature selection The Journal of Machine Learning Research, 3:1157–1182, 2003 V T Hoang Multi color space LBP-based feature selection for texture classification PhD thesis, Universit´e du Littoral Cˆ ote d’Opale, 2018 10 Phan Thi Thu Hong, Tran Thi Thanh Hai, Le Thi Lan, Vo Ta Hoang, Vu Hai, and Thuy Thi Nguyen Comparative Study on Vision Based Rice Seed Varieties Identification In 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), pages 377–382, Ho Chi Minh City, Vietnam, October 2015 IEEE 11 A Humeau-Heurtier Texture feature extraction methods: A survey IEEE Access, 7:8975–9000, 2019 12 Kenji Kira and Larry A Rendell A Practical Approach to Feature Selection In Machine Learning Proceedings 1992, pages 249–256 Elsevier, 1992 13 Igor Kononenko Estimating attributes: Analysis and 
extensions of RELIEF In J G Carbonell, J Siekmann, G Goos, J Hartmanis, Francesco Bergadano, and Using feature selection based on multi-view for rice seed images classification 14 15 16 17 18 19 20 21 22 23 24 Luc Raedt, editors, Machine Learning: ECML-94, volume 784, pages 171–182 Springer Berlin Heidelberg, Berlin, Heidelberg, 1994 Igor Kononenko, Edvard Simec, and Marko Robnik-Sikonja Overcoming the myopia of inductive learning algorithms with RELIEFF page 17 H.K Mebatsion, J Paliwal, and D.S Jayas Automatic classification of nontouching cereal grains in digital images using limited morphological and color features Computers and Electronics in Agriculture, 90:99–105, January 2013 Tajul Miftahushudur, Chaeriah Bin Ali Wael, and Teguh Praludi Infinite Latent Feature Selection Technique for Hyperspectral Image Classification Jurnal Elektronika dan Telekomunikasi, 19(1):32, August 2019 H T M Nhat and V T Hoang Feature fusion by using lbp, hog, gist descriptors and canonical correlation analysis for face recognition In 2019 26th International Conference on Telecommunications (ICT), pages 371–375, April 2019 T Ojala, M Pietikă ainen, and T Mă aenpă aa ă A generalized local binary pattern operator for multiresolution gray scale and rotation invariant texture classification In Proceedings of the Second International Conference on Advances in Pattern Recognition, pages 397–406 Springer-Verlag, 2001 Aude Oliva and Antonio Torralba Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope page 31, 2001 Diego In´ acio Patr´ıcio and Rafael Rieder Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review Computers and Electronics in Agriculture, 153:69–81, 2018 Tien Nguyen Van and Vinh Truong Hoang Kinship Verification based on Local Binary Pattern features coding in different color space In 2019 26th International Conference on Telecommunications (ICT) (ICT 2019), Hanoi, Vietnam, April 2019 Makoto 
Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P Xing, and Masashi Sugiyama High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso Neural Computation, 26(1):185–207, January 2019 arXiv: 1202.0515 Rui Zhang, Feiping Nie, Xuelong Li, and Xian Wei Feature selection with multiview data: A survey Information Fusion, 50:158–167, October 2019 Zhenyu Zhao, Radhika Anand, and Mallory Wang Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform arXiv:1908.05376 [cs, stat], August 2019 arXiv: 1908.05376 ... sánh 06 phương pháp lựa chọn đặc trưng khác học có giám sát; kỹ thuật lựa chọn đặc trưng phù hợp để có độ xác cao chưa áp dụng kỹ thuật lựa chọn đặc trưng, mà giảm số đặc trưng gốc hay gọi giảm... thường, có hai loại đặc trưng có liên quan đặc trưng không liên quan Trong khuôn khổ phân loại, đặc trưng phù hợp đặc trưng có chứa thơng tin phân biệt lớp (học có giám sát) cụm (học khơng có giám sát) ... pháp lựa chọn đặc trưng này, chọn phương pháp lựa chọn đặc trưng để tiến hành ? ?lựa chọn xếp lại” đặc trưng lần cuối, cuối phân lớp tập đặc trưng vừa xếp Cách thực hoàn toàn so với kỹ thuật ghép lựa
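The columns reported in Table 4 can be derived from an accuracy-versus-percentage curve such as the ones plotted for each selection method. The following is a minimal sketch of that bookkeeping, not the authors' code. The function name `summarize_curve`, the dictionary keys and the example curve are hypothetical; only the 20%, 24% and 100% points of the example are taken from the Fisher / 1-NN row of Table 4, and the other points are invented for illustration. The sketch also assumes that id% is the smallest percentage whose accuracy is at least the all-features accuracy, and that Dim is the rounded share of the 22,664 features.

```python
from typing import Dict, List, Tuple

TOTAL_DIM = 22_664  # length of the concatenated LBP + GIST + HOG vector (Table 4)


def summarize_curve(curve: List[Tuple[int, float]],
                    total_dim: int = TOTAL_DIM) -> Dict[str, float]:
    """Summarize one accuracy curve as a Table 4 style row (hypothetical helper).

    `curve` holds (percentage of selected features, accuracy) pairs and
    must contain the 100% point, i.e. the accuracy with all features.
    """
    acc_all = dict(curve)[100]

    # Assumption: id% is the smallest percentage that already reaches
    # the accuracy obtained with all features.
    id_pct = min(pct for pct, acc in curve if acc >= acc_all)

    # Best accuracy and the smallest percentage at which it is reached.
    max_acc = max(acc for _, acc in curve)
    max_pct = min(pct for pct, acc in curve if acc == max_acc)

    def to_dim(pct: int) -> int:
        # Assumption: Dim is the rounded share of the full feature vector.
        return round(pct * total_dim / 100)

    return {
        "acc_all": acc_all,
        "id_pct": id_pct,
        "id_dim": to_dim(id_pct),
        "max_acc": max_acc,
        "max_pct": max_pct,
        "max_dim": to_dim(max_pct),
    }


# Illustrative curve: the 10% and 50% points are made up.
example_curve = [(10, 70.1), (20, 72.7), (24, 74.3), (50, 73.5), (100, 72.7)]
fisher_row = summarize_curve(example_curve)
print(fisher_row)
```

With this example curve the helper returns id% = 20 with Dim = 4,533 and a maximum accuracy of 74.3 at 24% with Dim = 5,439, which agrees with the Fisher / 1-NN row of Table 4. Whether the original experiments used "at least" or "exactly equal" when determining id% is not stated in the text.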
