Supervised feature selection techniques for rice seed image classification

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY

Lâm Trần Tuấn Dzi

SUPERVISED FEATURE SELECTION TECHNIQUES FOR RICE SEED IMAGE CLASSIFICATION

MASTER'S THESIS IN COMPUTER SCIENCE

Major: Computer Science
Major code: 60 48 01 01
Scientific supervisor: Dr. Trương Hoàng Vinh
Ho Chi Minh City, 2020

DECLARATION

I declare that the thesis "Supervised feature selection techniques for rice seed image classification" is my own research. Except for the references cited in this thesis, I declare that no part of it, in whole or in small part, has been used to obtain a master's degree elsewhere. No product or research of others has been used in this thesis without being cited as required. This thesis has not been submitted for a degree at any other university or training institution.

Ho Chi Minh City, 2020
Lâm Trần Tuấn Dzi

ACKNOWLEDGEMENTS

During my studies and the research for this thesis, I received dedicated guidance and help from my teachers, friends, and colleagues. I express my deep gratitude to Dr. Trương Hoàng Vinh, who guided and supported me wholeheartedly throughout the research and writing of this thesis. I was fortunate that he agreed to supervise me, and I thank him sincerely once again. I also thank the leadership, lecturers, staff, and library of Ho Chi Minh City Open University for the knowledge, valuable materials, and support they provided during my studies. I sincerely thank the members of my supervisor's group for the discussions, guidance, support, and valuable comments that helped me complete the thesis. Finally, I thank my beloved family, who always stood by me, supported and encouraged me, and made it possible for me to complete this thesis and the study programme.

SUMMARY

Feature selection is a technique that removes redundant features, which reduces dimensionality and increases classification accuracy, because classifiers often perform worse when much of the data is noisy. This thesis applies feature selection techniques to an image dataset of several Vietnamese rice varieties, with the following proposals:
• Extract features using three descriptors: LBP, GIST, and HOG;
• Concatenate the descriptors, giving seven descriptor sets;
• Apply six common feature selection methods (mRMR, ReliefF, Ilfs, CFS, Fisher score, and Lasso) to the seven descriptor sets;
• Classify the seven descriptor sets before and after feature selection with two classifiers, 1-NN and SVM;
• Propose an improved method to increase accuracy: first choose the feature selection methods that performed best above, concatenate the useful features returned by these methods, then use one feature selection method to "select and re-rank" the features a final time, and finally classify the re-ranked feature set. This procedure is entirely new compared with previous techniques for combining feature selection methods.

The experimental results show that concatenating descriptors performs better than the original individual descriptors. Some feature selection methods performed better than expected, raising accuracy by a few percent while reducing the data dimension. The proposed method works well on the experimental data and increases accuracy further, although it runs into some difficulties in practice, which are discussed in the thesis.

Overall, the proposed method is effective on the experimental data. The proposal opens a research direction for Vietnamese rice variety data, and the method can be applied to data other than the experimental data. In particular, it contributes a new technique for combining feature selection methods to ensemble feature selection.

ABSTRACT

Feature selection eliminates irrelevant features and reduces the dimension of the feature space, because classification performance is sensitive to high-dimensional feature vectors. In this thesis, we apply feature selection techniques to rice seed image classification on an image database of common rice varieties in Vietnam.
The propositions are:
- Extracting features with three local image descriptors: LBP, GIST, and HOG;
- Fusing the features to create seven feature sets;
- Using six common feature selection methods (mRMR, ReliefF, Ilfs, CFS, Fisher score, and Lasso) for experimental evaluation on the seven feature sets;
- Using 1-NN and SVM classifiers for classification;
- Proposing a new selection method that combines multiple feature selection methods to create a compact representation of rice seed images. The method finds the most useful features of each feature selection method; these are then combined and re-ranked a final time before being fed to the classifier. However, this approach has several limitations, which are presented in this thesis.

In general, the results show the efficiency of the proposed method in terms of accuracy. The method opens a new research direction for rice varieties in Vietnam and can be verified on data other than those used in this thesis. Additionally, we contribute a new ensemble approach that combines multiple feature selection methods.

TABLE OF CONTENTS

Declaration
Acknowledgements
Summary
Abstract
Table of contents
List of figures and charts
List of tables
List of symbols
List of abbreviations
1 Overview of image processing and seed recognition
  1.1 Introduction
  1.2 Objectives of the thesis
  1.3 Contributions of the thesis
    1.3.1 Scientific value
    1.3.2 Practical value
    1.3.3 Socio-economic value
  1.4 Structure of the thesis
  1.5 Method
2 Theoretical background
  2.1 Color spaces
    2.1.1 RGB color space
    2.1.2 Color depth
  2.2 Features
    2.2.1 Features
    2.2.2 SIFT features
    2.2.3 LBP features
    2.2.4 GIST features
    2.2.5 HOG features
  2.3 Classifiers
    2.3.1 Random Forest
    2.3.2 K-nearest neighbor
    2.3.3 Support vector machine
  2.4 Feature selection
    2.4.1 Least Absolute Shrinkage and Selection Operator
    2.4.2 Maximum Relevance and Minimum Redundancy
    2.4.3 ReliefF
    2.4.4 Correlation Feature Selection
    2.4.5 Fisher score
    2.4.6 Infinite Latent Feature Selection
  2.5 Conclusion
3 Proposed method
  3.1 Preprocessing
    3.1.1 Size normalization
    3.1.2 Removing images and noisy regions inside images
  3.2 Data normalization
  3.3 Descriptor concatenation
    3.3.1 Extracting the LBP, GIST, and HOG descriptors
    3.3.2 Concatenating descriptors
  3.4 Data splitting and classification
    3.4.1 Data splitting
    3.4.2 Classification
  3.5 Using feature selection methods with classifiers and reading the results
  3.6 A proposal to increase accuracy on the data under study
  3.7 Conclusion
4 Experimental results
  4.1 Experimental data
  4.2 Comparison, analysis, and evaluation of results
    4.2.1 Accuracy of single and concatenated descriptors
    4.2.2 Experiments with feature selection methods
    4.2.3 Experimental results of the proposed method for improving Acc
  4.3 Conclusion
5 Future work and conclusion
  5.1 Contributions and limitations of the thesis
    5.1.1 Contributions
    5.1.2 Limitations
  5.2 Future work and conclusion
Appendix

... whereas the other approaches in this study reduce the number of dimensions. A further limitation is that when the feature sets of several feature selection methods are concatenated, the data grows, in some cases to two or three times its size. The machine must therefore be powerful enough to run the experiments; with the machine configuration used here, the method could be run on only some of the datasets above, not all of them (for lack of RAM). Another point is that the method uses the results of the feature selection methods, so it has to wait for them to finish.

4.3 Conclusion

The experimental results in this chapter show that the proposed methods are effective on the experimental data. Several issues remain, however: the results are not yet optimal, and the experiments have some limitations. Most notably, the proposed method for increasing Acc is not yet complete and needs further research to improve its performance and (or) limit its use of machine resources. The next chapter presents the contributions and limitations of the thesis and directions for developing the proposals further.

Chapter 5 FUTURE WORK AND CONCLUSION

This chapter presents the contributions and limitations of the thesis (5.1) and directions for future work (5.2).

5.1 Contributions and limitations of the thesis

5.1.1 Contributions

First, the descriptor concatenation technique works well on the experimental rice dataset: it improves on the accuracy of the unconcatenated descriptors and works well with the 20 rice varieties used for testing.

Second, the feature selection techniques are effective on the proposed descriptor sets, and the feature selection method can be chosen according to practical conditions, giving priority either to fast processing and storage space or to high accuracy.

Third, the thesis contributes a proposal for combining feature selection methods that further increases accuracy on descriptor datasets without additional data preprocessing or additional descriptors. The method opens a research direction in feature selection, not only for the dataset in this thesis but also for many other kinds of data.

Fourth, the proposals increase accuracy, reduce the data dimension, reduce storage space, and improve computation time, and the techniques can be applied to similar image datasets.

Fifth, the thesis contributes a rice seed recognition method for researchers working on seed recognition techniques in computer vision.

5.1.2 Limitations

First, the methods need a sufficiently powerful computer and enough time to run all the descriptor concatenation and feature selection techniques, especially the FS combination method that increases accuracy.

Second, the experiments cover only three descriptors (LBP, GIST, and HOG) because there was not enough time for further study and experiments. Similarly, only six common FS methods were tested, and many other methods could not be tried.

Third, although the results are higher than those of previous methods, accuracy peaks at 95.93% and is not yet optimal.

Fourth, the FS combination method aims to increase Acc, not to reduce the data or shorten computation time. This is also a limitation of the method: it increases the data dimension, whereas the other methods in the thesis reduce it. When the feature sets of several feature selection methods are concatenated, the data grows, in some cases to two or three times its size, or more when more than three are combined. In addition, the method uses the results of the feature selection methods, so it has to wait for them before FS combination can be applied.

Finally, no application has yet been built for experiments in the author's home province, because the research and experiments took more time than planned in the thesis proposal. Moreover, the rice varieties of the province are not in the thesis dataset, so doing this would require collecting a full set of images of the rice varieties commonly grown locally.

Overcoming these limitations requires time, funding, and further research in the future.

5.2 Future work and conclusion

The contributions and limitations above show that doing this research well requires considerable time and funding. In addition, the researcher needs to work full time in order to finish sooner and deliver a complete product. Directions for future work:
• If funding can be secured, collect images of the rice varieties common in Vietnam and continue the research to complete an application that makes recognizing and distinguishing rice seed varieties easier.
• Continue to study and improve the FS combination technique so that it extends to other image data, not only the rice seed dataset. In feature selection, each feature is normally used once; the experimental results show that a feature used several times, at different positions, gives better results than a feature used once. In previous ensemble techniques, each feature is used only once after combination (to the best of the author's knowledge at the time of writing). This may therefore lead to new research in feature selection and in other areas of feature combination.
• Study combining this method with deep learning, using its output as input data, to increase performance.

REFERENCES

[1] R. B. Marimont and M. B. Shapiro. Nearest Neighbour Searches and the Curse of Dimensionality. IMA Journal of Applied Mathematics, 24(1):59–70, August 1979.
[2] Vinh Truong Hoang. Multi color space LBP-based feature selection for texture classification. Page 192, 2018.
[3] Rumaisah Munir and Rizwan Ahmed Khan. An extensive review on spectral imaging in biometric systems: Challenges & advancements. Journal of Visual Communication and Image Representation, 65:102660, December 2019.
[4] Antitza Dantcheva, Petros Elia, and Arun Ross. What Else Does Your Biometric Data Reveal?
A Survey on Soft Biometrics. IEEE Transactions on Information Forensics and Security, 11(3):441–467, March 2016.
[5] Shaveta Dargan and Munish Kumar. A comprehensive survey on the biometric recognition systems based on physiological and behavioral modalities. Expert Systems with Applications, 143:113114, April 2020.
[6] Phan Thi Thu Hong, Tran Thi Thanh Hai, Le Thi Lan, Vo Ta Hoang, Vu Hai, and Thuy Thi Nguyen. Comparative Study on Vision Based Rice Seed Varieties Identification. In 2015 Seventh International Conference on Knowledge and Systems Engineering (KSE), pages 377–382, Ho Chi Minh City, Vietnam, October 2015. IEEE.
[7] Itthi Chatnuntawech, Kittipong Tantisantisom, Paisan Khanchaitit, Thitikorn Boonkoom, Berkin Bilgic, and Ekapol Chuangsuwanich. Rice Classification Using Spatio-Spectral Deep Convolutional Neural Network. Page 18, 2018.
[8] Nicolas Vandenbroucke, Laurent Busin, and Ludovic Macaire. Unsupervised color-image segmentation by multicolor space iterative pixel classification. Journal of Electronic Imaging, 24(2):023032, April 2015.
[9] Juwei Lu and K.N. Plataniotis. On conversion from color to gray-scale images for face detection. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 114–119, Miami, FL, June 2009. IEEE.
[10] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.
[11] D.G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, pages 1150–1157 vol. 2, Kerkyra, Greece, 1999. IEEE.
[12] Timo Ojala, Matti Pietikäinen, and Topi Mäenpää. A Generalized Local Binary Pattern Operator for Multiresolution Gray Scale and Rotation Invariant Texture Classification. In Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, Sameer Singh, Nabeel Murshed, and Walter Kropatsch, editors, Advances in Pattern Recognition - ICAPR 2001, volume 2013, pages 399–408.
Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.
[13] Li Liu, Paul Fieguth, Yulan Guo, Xiaogang Wang, and Matti Pietikäinen. Local binary features for texture classification: Taxonomy and experimental study. Pattern Recognition, 62:135–160, February 2017.
[14] Aude Oliva and Antonio Torralba. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. Page 31, 2001.
[15] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893, San Diego, CA, USA, 2005. IEEE.
[16] Jason Brownlee. A Tour of Machine Learning Algorithms. December 2019.
[17] Manuel Fernandez-Delgado, Eva Cernadas, Senen Barro, and Dinani Amorim. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Page 49, 2014.
[18] Advances in computational toxicology. Springer Berlin Heidelberg, New York, NY, 2019.
[19] Verónica Bolón-Canedo and Amparo Alonso-Betanzos. Ensembles for feature selection: A review and future trends. Information Fusion, 52:1–12, December 2019.
[20] Jie Cai, Jiawei Luo, Shulin Wang, and Sheng Yang. Feature selection in machine learning: A new perspective. Neurocomputing, 300:70–79, July 2018.
[21] Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, and Masashi Sugiyama. High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso. Neural Computation, 26(1):185–207, January 2014. arXiv: 1202.0515.
[22] Zhenyu Zhao, Radhika Anand, and Mallory Wang. Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform. arXiv:1908.05376 [cs, stat], August 2019.
[23] Sergio Ramírez-Gallego, Iago Lastra, David Martínez-Rego, Verónica Bolón-Canedo, José Manuel Benítez, Francisco Herrera, and Amparo Alonso-Betanzos. Fast-mRMR: Fast Minimum Redundancy Maximum Relevance Algorithm for High-Dimensional Big Data. International
Journal of Intelligent Systems, 32(2):134–152, February 2017.
[24] Igor Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In J. G. Carbonell, J. Siekmann, G. Goos, J. Hartmanis, Francesco Bergadano, and Luc Raedt, editors, Machine Learning: ECML-94, volume 784, pages 171–182. Springer Berlin Heidelberg, Berlin, Heidelberg, 1994.
[25] Kenji Kira and Larry A. Rendell. A Practical Approach to Feature Selection. In Machine Learning Proceedings 1992, pages 249–256. Elsevier, 1992.
[26] Igor Kononenko, Edvard Šimec, and Marko Robnik-Šikonja. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Applied Intelligence, 7(1):39–55, January 1997.
[27] Marko Robnik-Šikonja and Igor Kononenko. Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning, 53(1):23–69, October 2003.
[28] Mark Hall. Correlation-Based Feature Selection for Machine Learning. Department of Computer Science, 19, June 2000.
[29] Quanquan Gu, Zhenhui Li, and Jiawei Han. Generalized Fisher Score for Feature Selection. Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011, February 2012.
[30] Kathryn Laskey and Henri Prade. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (1999). arXiv:1304.3843 [cs], August 2014.
[31] Tajul Miftahushudur, Chaeriah Bin Ali Wael, and Teguh Praludi. Infinite Latent Feature Selection Technique for Hyperspectral Image Classification. Jurnal Elektronika dan Telekomunikasi, 19(1):32, August 2019.
[32] Isha Vats. Otsu Image Segmentation Algorithm: A Review. 5(6):4, 2007.

APPENDIX

The thesis resulted in two papers: one accepted for presentation at the Springer RICE 2020 conference [P1] (Scopus & Web of Science), and one awaiting review results at the journal Advances in Electrical and Electronic Engineering [P2].

PUBLICATIONS

[P1] D. Lam Tran Tuan, V. Truong Hoang, "Using feature selection based on multi-view for rice seed images classification", Fifth International Conference on Research in Intelligent and Computing in Engineering (Springer RICE 2020), June 3-4, 2020, Thu Dau Mot, Vietnam (accepted).
[P2] D. Lam Tran Tuan, V. Truong Hoang, "Ensemble feature selection approach based on feature ranking for rice seed images classification", Advances in Electrical and Electronic Engineering (submitted, under review).

Using feature selection based on multi-view for rice seed images classification

Dzi Lam Tran Tuan and Vinh Truong Hoang
Ho Chi Minh City Open University, Vietnam
e-mail: dziltt.178i@ou.edu.vn; vinh.th@ou.edu.vn

Abstract. Rice is one of the primary foods for humans, and many rice varieties are cultivated in many countries. An automatic rice variety inspection system based on computer vision is needed to replace inspection by human technical experts. In this paper, we investigate three local descriptors, Local Binary Pattern, Histogram of Oriented Gradients, and GIST, to characterize rice seed images. To enhance robustness, a multi-view approach is used to concatenate these features. However, this produces a high-dimensional feature vector, and the relevant features need to be selected to obtain a compact and better model. We apply different feature selection methods in a filter manner to eliminate the irrelevant features. The experimental results on the VNRICE dataset show the efficiency of our proposition.

Keywords: LBP · HOG · GIST · feature selection · feature ranking · multi-view

1 Introduction

In many countries, rice is the main crop and an essential source of food, consumed by nearly half the world's population. One of the most important factors for a high-yield crop is that the rice seeds are pure. The quality of rice depends entirely on the genetic properties of the rice variety. In practice, seeds of different rice varieties may be mixed, which can affect productivity. Recently, ST25, a rice variety developed in Soc Trang province in the south of Vietnam, was recognized as the best rice in the world.
Since then, a large amount of fake ST25 has been produced and sold in other regions, and it is extremely difficult for a non-expert to distinguish or recognize different rice varieties. At present, Vietnamese companies select and identify unwanted seeds manually, based on visual inspection by technical experts. This process is time consuming and can degrade the quality of the seeds. An inspection process based on a computer vision system is therefore required to replace the human task. A large number of vision systems for real-life problems have been developed and implemented [7,20]. In agriculture, various works use machine vision for automatic inspection and quality control of agricultural products.

Technology for characterizing the features of multimedia data has advanced rapidly in recent years. Various attribute descriptors have been introduced to represent images in the last decade [11]. Each kind of attribute represents the data in a specific space and has a precise spatial meaning and statistical properties. Nhat and Hoang [17] present a method that fuses the features extracted by three descriptors (Local Binary Pattern, Histogram of Oriented Gradients, and GIST) based on block division for face recognition. Canonical correlation analysis is then applied to the concatenated features to obtain a compact representation before feeding them into the classifier. Van and Hoang [21] propose to reduce noisy and irrelevant Local Ternary Pattern (LTP) and Histogram of Oriented Gradients (HOG) features coded in different color spaces for face analysis. Mebatsion et al. [15] fuse Fourier descriptors and three geometrical features for cereal grain recognition. Duong and Hoang [6] extract features coded in multiple color spaces from rice seed images using the HOG descriptor. Multi-view learning was introduced so that different views complement each other's information. Different local descriptors are extracted and
combined to create a multi-view image representation. When different feature sets are concatenated, the features do not all contribute equally to the learning task, and some features may decrease the performance. Feature selection is used to select relevant features and reduce the high dimension of the original data. It increases the efficiency of learning in two ways: (1) it represents the data better by eliminating noisy or irrelevant features, and (2) it makes classification or clustering more efficient because the dimension of the selected feature space is reduced. More recently, Zhang et al. presented a complete review of attribute selection and feature-level fusion methods [23]. In this paper, we propose to extract a multi-view feature set based on the LBP, HOG, and GIST descriptors from rice seed images for classification. Before the features are fed to the classifier, various feature selection approaches are investigated to reduce the dimension and increase the classification performance. This paper is organized as follows. Section 2 introduces the feature extraction methods based on three local image descriptors. Sections 3 and 4 present the proposed approach and the experimental results. Finally, the conclusion is given in Section 5.

2 The feature extracting methods

This section briefly reviews the three local image descriptors used in the experiments.

2.1 Local binary pattern

The $LBP_{P,R}(x_c, y_c)$ code of each pixel $(x_c, y_c)$ is calculated by comparing the gray value $g_c$ of the central pixel with the gray values $\{g_p\}_{p=0}^{P-1}$ of its $P$ neighbors, as follows [18]:

\[ LBP_{P,R} = \sum_{p=0}^{P-1} \omega(g_p - g_c)\, 2^p \tag{1} \]

where $g_c$ is the gray value of the central pixel, $g_p$ is the gray value of the $p$-th of the $P$ neighbors, $R$ is the radius of the circle, and $\omega(g_p - g_c)$ is defined as

\[ \omega(g_p - g_c) = \begin{cases} 1 & \text{if } g_p - g_c \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2} \]

2.2 GIST

GIST was first proposed by Oliva and Torralba [19] to classify objects; it represents the shape of the object. The primary idea of this approach is based on the Gabor filter:

\[ h(x, y) = e^{-\frac{1}{2}\left(\frac{x^2}{\delta_x^2} + \frac{y^2}{\delta_y^2}\right)}\, e^{-j 2\pi (u_0 x + v_0 y)} \tag{3} \]

For each $(\delta_x, \delta_y)$, passing the image through the Gabor filter yields the elements of the image that are close to the point color $(u_0 x + v_0 y)$. The resulting GIST vector has many dimensions. To reduce its size, we average the results over each cell of a 4 × 4 grid (32 maps × 16 cells gives the 512 dimensions reported for GIST in Table 1). Each image is filtered by a bank of Gabor filters at several scales and orientations, creating 32 feature maps of the same size.

2.3 Histograms of Oriented Gradient

The Histograms of Oriented Gradient (HOG) descriptor is applied to different tasks in machine vision [5], such as human detection [4]. HOG features are extracted by counting the occurrences of gradient orientations, based on the gradient angle and gradient magnitude, in local patches of an image. The gradient angle and magnitude at each pixel are computed in an 8 × 8 pixel patch. Next, the 64 gradient feature vectors are divided into 9 angular bins over 0–180° (20° each). The gradient magnitude $T$ and angle $K$ at each position $(k, h)$ of an image $J$ are computed as follows:

\[ \Delta_k = |J(k-1, h) - J(k+1, h)| \tag{4} \]
\[ \Delta_h = |J(k, h-1) - J(k, h+1)| \tag{5} \]
\[ T(k, h) = \sqrt{\Delta_k^2 + \Delta_h^2} \tag{6} \]
\[ K(k, h) = \tan^{-1}\frac{\Delta_k}{\Delta_h} \tag{7} \]

3 Feature Selection

Based on the availability of supervised information (i.e., labels), feature selection techniques can be grouped into two broad categories: supervised and unsupervised [1]. Additionally, different feature selection strategies have been proposed based on the evaluation process, such as filter, wrapper, and hybrid methods [8]. Hybrid approaches incorporate both filter and wrapper methods into a single structure in order to give an effective solution for dimensionality reduction [3]. To study the contribution of feature selection approaches to rice seed
image classification, we apply several selection approaches to images represented by multi-view descriptors. In the following, we briefly present the common feature selection methods applied in the supervised learning context.

LASSO (Least Absolute Shrinkage and Selection Operator) performs feature selection under the assumption of a linear dependency between the input features and the output values. Lasso minimizes the sum of squared residuals subject to the sum of the absolute values of the regression coefficients being less than a constant, which forces certain regression coefficients to be exactly 0 [3,22].

mRMR (Maximum Relevance and Minimum Redundancy) is a feature selection criterion based on mutual information or on distance/similarity scores. The aim is to penalize a feature's relevancy by its redundancy in the presence of the other selected features [24].

ReliefF [13] extends Relief [12] to support multiclass problems. ReliefF seems to be a promising heuristic function that may overcome the myopia of current inductive learning algorithms. Kira and Rendell used Relief as a preprocessor to eliminate irrelevant attributes from the data description before learning. ReliefF is general, relatively efficient, and reliable enough to guide the search in the learning process [14].

CFS (Correlation Feature Selection) mainly applies heuristic methods that evaluate the effect of individual features with respect to each group in order to obtain an optimal subset of attributes.

Fisher [2] identifies a subset of features so that the distances between samples in different classes are as large as possible, while the distances between samples in the same class are as small as possible. Fisher selects the top-ranked features according to their scores.

Ilfs (Infinite Latent Feature Selection) is a technique that consists of three steps: preprocessing, then feature weighting based on a fully connected graph whose nodes represent the features. Finally, energy scores of
the path lengths are calculated and used to rank the corresponding features [16].

4 Experimental Results

4.1 Experimental setup

The rice seed image database is composed of six rice seed varieties from the north of Vietnam [10,6]. We apply the 1-NN and SVM classifiers to evaluate the classification performance. Half of the database is selected as the training set and the rest is used as the testing set. We use the hold-out method with a 1/2 : 1/2 ratio and split the training and testing sets like a chessboard decomposition. All experiments are implemented in Matlab 2019a and conducted on a PC with a Xeon 3.08 GHz CPU and 64 GB of RAM.

4.2 Results

Table 1 shows the accuracy obtained by the 1-NN and SVM classifiers when no feature selection approach is applied. The first column indicates the features used to represent the images. We use three individual local descriptors, namely LBP, GIST, and HOG, and the concatenations of these descriptors: LBP + GIST, LBP + HOG, GIST + HOG, and LBP + GIST + HOG. The second column indicates the number of features (the dimension) of each feature type. The third and fourth columns show the accuracy obtained by the 1-NN and SVM classifiers. We observe that the multi-view representation obtained by concatenating multiple features gives better performance, but it increases the dimension. The SVM classifier also performs better than the 1-NN classifier, reaching 95.7% accuracy.

Table 1: Classification performance (accuracy, %) without feature selection for different types of features

Features            Dimension   1-NN   SVM
LBP                       768   53.0   77.0
GIST                      512   69.4   88.3
HOG                    21,384   71.5   94.7
LBP + GIST              1,280   70.5   91.7
LBP + HOG              22,152   72.0   95.5
GIST + HOG             21,896   72.1   94.9
LBP + GIST + HOG       22,664   72.7   95.7

Table 2 presents the classification performance based on LBP + HOG features with different feature selection approaches. The second to sixth rows give the names of the feature selection methods, and both classifiers are used with each feature ranking approach. The ACC column has three sub-columns: (1) the 100% column gives the accuracy obtained when using all features; (2) the > id% column gives the minimum percentage of features needed to reach the same accuracy as with all features; (3) the Dim column gives the number of selected features used to obtain that accuracy. MAX ACC is the second sub-column under each classifier. It shows the maximal accuracy obtained with each feature selection method via three sub-columns, including the percentage of features used and the number of dimensions. We analyze this table by comparing the accuracy obtained with the reduction in the number of selected features. For example, with the Fisher selection method, using only 21% of the features gives the same accuracy as using all features.
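
The way the "> id%" and "Dim" columns are read can be illustrated with a minimal sketch. It is not the authors' Matlab code: it is pure Python on synthetic toy data, it uses the common between-class over within-class scatter form of the Fisher score, it interprets the chessboard split as taking alternating samples, and it scans percentages in steps of 10 rather than 1. All function names (`fisher_scores`, `one_nn_accuracy`, `smallest_matching_percentage`, `make_toy_data`) are hypothetical and introduced only for illustration.

```python
import random


def fisher_scores(X, y):
    """Fisher score of each feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean_all = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            between += len(values) * (mean_c - mean_all) ** 2
            within += sum((v - mean_c) ** 2 for v in values)
        # Simplification: a feature with zero within-class scatter gets score 0.
        scores.append(between / within if within > 0 else 0.0)
    return scores


def one_nn_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of 1-NN (squared Euclidean distance) restricted to `features`."""
    correct = 0
    for x, label in zip(X_test, y_test):
        nearest = min(
            range(len(X_train)),
            key=lambda i: sum((X_train[i][j] - x[j]) ** 2 for j in features),
        )
        correct += y_train[nearest] == label
    return correct / len(X_test)


def smallest_matching_percentage(X_train, y_train, X_test, y_test,
                                 percentages=range(10, 101, 10)):
    """Smallest percentage of top-ranked features whose 1-NN accuracy
    reaches the accuracy obtained with all features."""
    scores = fisher_scores(X_train, y_train)
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    full_acc = one_nn_accuracy(X_train, y_train, X_test, y_test, ranking)
    for pct in percentages:
        dim = max(1, len(ranking) * pct // 100)
        acc = one_nn_accuracy(X_train, y_train, X_test, y_test, ranking[:dim])
        if acc >= full_acc:
            return pct, dim, acc, full_acc
    return 100, len(ranking), full_acc, full_acc


def make_toy_data(n_per_class=20, n_noise=18, seed=0):
    """Synthetic data (an assumption): feature 0 separates 3 classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for _ in range(n_per_class):
            informative = 5.0 * c + rng.uniform(-0.5, 0.5)
            X.append([informative] + [rng.uniform(0.0, 10.0) for _ in range(n_noise)])
            y.append(c)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # Alternating samples: even indices for training, odd indices for testing.
    X_train, y_train = X[0::2], y[0::2]
    X_test, y_test = X[1::2], y[1::2]
    pct, dim, acc, full_acc = smallest_matching_percentage(X_train, y_train, X_test, y_test)
    print(f"all features: {full_acc:.3f}; top {pct}% ({dim} features): {acc:.3f}")
```

In this sketch the returned percentage plays the role of the "> id%" column and the returned feature count plays the role of "Dim"; swapping `fisher_scores` for another ranking function would give the corresponding row for a different selection method.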
