(Master's thesis) Research on attack detection solutions for the Internet of Things using machine learning

POSTS AND TELECOMMUNICATIONS INSTITUTE OF TECHNOLOGY

Nguyễn Thanh Hoàng

RESEARCH ON ATTACK DETECTION SOLUTIONS FOR THE INTERNET OF THINGS USING MACHINE LEARNING

Major: Telecommunications Engineering
Code: 8.52.02.08

MASTER OF ENGINEERING GRADUATION PROJECT (Application-oriented)

SCIENTIFIC SUPERVISOR: Dr. Nguyễn Minh Tuấn

HANOI - 2023

DECLARATION

All content of this project, "Research on attack detection solutions for the Internet of Things using machine learning", is my own independent research, carried out under the supervision, guidance, and feedback of Dr. Nguyễn Minh Tuấn, lecturer at the Faculty of Telecommunications 1, Posts and Telecommunications Institute of Technology. The results presented in this project are truthful and have not been copied in any form. All references used in this project are cited fully and clearly. I take full responsibility for the content and form of this project.

Author of the project
Nguyễn Thanh Hoàng

ACKNOWLEDGEMENTS

First, I sincerely thank Dr. Nguyễn Minh Tuấn and the lecturers of the Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, for their support, feedback, and guidance throughout this project. This project is the result of my own effort to learn and to overcome difficulties, together with the dedicated guidance and help of my supervisor. I also thank my family and friends for creating favorable conditions for me to complete the program. Finally, I thank my fellow students in class M21CQTE01-B for accompanying me during my studies and the completion of this project.

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF TERMS AND ABBREVIATIONS
INTRODUCTION
CHAPTER 1: INTRODUCTION TO INFORMATION SECURITY IN IoT
1.1 Overview of IoT technology and systems
1.1.1 IoT technology
1.1.2 IoT systems
1.2 Information security issues in IoT systems
1.2.1 Security vulnerabilities in software and firmware
1.2.2 Insecure communication
1.2.3 Data leakage from IoT systems
1.2.4 Risks from malware
1.2.5 Cyberattacks
1.3 Attack detection solutions for IoT systems
1.3.1 Trust management
1.3.2 Authentication
1.3.3 Intrusion detection systems
1.3.3.1 Components of an IDS
1.3.3.2 Classification of intrusion detection systems
1.4 Chapter conclusion
CHAPTER 2: NETWORK ATTACK DETECTION SOLUTIONS
2.1 Overview of machine-learning-based IDS solutions for IoT
2.2 Machine learning and deep learning algorithms used in this research
2.2.1 Random forest
2.2.2 Bagging
2.2.3 Boosting
2.2.4 k-nearest neighbors
2.3 Proposed overall architecture for attack detection (IDS) using machine learning
2.4 The NSL-KDD dataset used to build and design the IDS
2.5 Chapter conclusion
CHAPTER 3: EXPERIMENTS AND PROPOSED IDS FOR IoT APPLICATIONS
3.1 Data processing and preprocessing
3.1.1 Data preprocessing
3.1.2 Min-max normalization
3.2 Selecting the optimal features of the dataset
3.2.1 Recursive feature elimination
3.2.2 k-fold cross-validation
3.2.3 Implementing optimal feature selection on the dataset
3.3 Setup, simulation, and testing of the machine-learning-based IDS
3.4 Result analysis and performance comparison of IDSs using different machine learning algorithms
3.4.1 Recursive feature elimination
3.4.2 Model evaluation
3.4.3 Comparison of the proposed IDS with other machine learning and deep learning studies
3.5 Selecting the optimal IDS for IoT systems
3.6 Limitations of the proposed method
3.7 Chapter conclusion
CONCLUSION
REFERENCES
APPENDIX

LIST OF TABLES

Table 2.1 Number of records in KDDTrain+, KDDTest+, and KDDTest-21
Table 2.2 Attack types in each class
Table 2.3 Data distribution in the training and test sets
Table 2.4 Protocols used by the attack types
Table 3.1 Feature ranking by importance
Table 3.2 Highest validation performance of the ML models on the validation dataset
Table 3.3 Confusion matrix of the Random Forest classifier
Table 3.4 Comparison of the proposed model's results with other studies

LIST OF FIGURES

Figure 1.1 Common IoT applications
Figure 1.2 Brief history of IoT development
Figure 1.3 Architecture of the IoT environment
Figure 1.4 Some challenges facing IoT
Figure 1.5 Security issues in IoT systems
Figure 1.6 Common types of cyberattacks on IoT systems
Figure 1.7 Some common attacks on the different layers of an IoT system
Figure 1.8 Types of trust management in IoT
Figure 1.9 Types of authentication in IoT
Figure 1.10 Components of an IDS
Figure 1.11 Classification of IDSs
Figure 2.1 Pseudocode of the RF algorithm for classification and regression
Figure 2.2 Illustration of the RF classifier
Figure 2.3 Illustration of the BG algorithm
Figure 2.4 Illustration of the BS algorithm
Figure 2.5 Illustration of the k-nearest-neighbor rule
Figure 2.6 Overview of the steps in the proposed method
Figure 2.7 Attacks by protocol in the KDDTrain+ dataset
Figure 3.1 Tasks in the data preprocessing stage
Figure 3.2 Some dimensionality-reduction methods for feature selection
Figure 3.3 Operation of k-fold cross-validation
Figure 3.4 The RFE procedure as performed
Figure 3.5 AUC-ROC curve
Figure 3.6 ROC curve of the proposed algorithm

LIST OF TERMS AND ABBREVIATIONS

Abbreviation	English term
ANN	Artificial Neural Network
ARM	Advanced RISC Machine
AUC	Area Under the Curve
AV	Adaptive Voting
BG	Bagging
BS	Boosting
CNN	Convolutional Neural Network
CSP	Cloud Service Provider
CV	Cross-Validation
DNN	Deep Neural Network
DoS	Denial of Service
DoSL	Denial of Sleep
DT	Decision Tree
FN	False Negative
FP	False Positive
FPR	False Positive Rate
GA	Genetic Algorithms
HIDS	Host-based Intrusion Detection System
ICMP	Internet Control Message Protocol
IDS	Intrusion Detection System
IoT	Internet of Things
KNN	K-Nearest Neighbors
LSTM	Long Short-Term Memory
M2M	Machine to Machine
MAC	Media Access Control
MIPS	Microprocessor without Interlocked Pipeline Stages
MitM	Man-in-the-Middle
MQTT	Message Queuing Telemetry Transport
MT	Multi Tree
NIDS	Network-based Intrusion Detection System
PCA	Principal Component Analysis
PLC	Programmable Logic Controller
Probe	Probing
QoS	Quality of Service
R2L	Remote to Local
REST	REpresentational State Transfer
RF	Random Forest
RFE	Recursive Feature Elimination
RISC	Reduced Instruction Set Computer
RNN	Recurrent Neural Network
ROC	Receiver Operating Characteristic
RTOS	Real-Time Operating System
SVM	Support Vector Machine

REFERENCES

[1] A. V. Dastjerdi and R. Buyya, "Fog computing: Helping the Internet of Things realize its potential," Computer, vol. 49, no. 8, pp. 112-116, 2016.
[2] Z. Yan, P. Zhang, and A. V. Vasilakos, "A survey on trust management for Internet of Things," Journal of Network and Computer Applications, vol. 42, pp. 120-134, 2014.
[3] D. Evans, "The internet of things: How the next evolution of the internet is changing everything," CISCO white paper, vol. 1, no. 2011, pp. 1-11, 2011.
[4] S. Ray, Y. Jin, and A. Raychowdhury, "The changing computing paradigm with internet of things: A tutorial introduction," IEEE Design & Test, vol. 33, no. 2, pp. 76-96, 2016.
[5] M. Abomhara, "Cyber security and the internet of things: vulnerabilities, threats, intruders and attacks," Journal of Cyber Security and Mobility, vol. 4, no. 1, pp. 65-88, 2015.
[6] D. Serpanos, "The Cyber-Physical Systems Revolution," Computer, vol. 51, no. 3, pp. 70-73, 2018.
[7] E. Bertino and N. Islam, "Botnets and internet of things
security," Computer, vol. 50, no. 2, pp. 76-79, 2017.
[8] S. Raza, L. Wallgren, and T. Voigt, "SVELTE: Real-time intrusion detection in the Internet of Things," Ad Hoc Networks, vol. 11, no. 8, pp. 2661-2674, 2013.
[9] W. Kong, J. Shen, P. Vijayakumar, Y. Cho, and V. Chang, "A practical group blind signature scheme for privacy protection in smart grid," J. Parallel Distrib. Comput., vol. 136, pp. 29-39, 2020.
[10] M. Eskandari, Z. H. Janjua, M. Vecchio, and F. Antonelli, "Passban IDS: An Intelligent Anomaly-Based Intrusion Detection System for IoT Edge Devices," IEEE Internet of Things Journal, vol. 7, no. 8, pp. 6882-6897, Aug. 2020.
[11] M. Abdel-Basset et al., "Internet of things in smart education environment: Supportive framework in the decision-making process," Concurrency and Computation: Practice and Experience, vol. 31, 2019.
[12] S. B. Baker, W. Xiang, and I. Atkinson, "Internet of Things for Smart Healthcare: Technologies, Challenges, and Opportunities," IEEE Access, vol. 5, pp. 26521-26544, 2017.
[13] N. Gondchawar and R. S. Kawitkar, "IoT based smart agriculture," International Journal of Advanced Research in Computer and Communication Engineering, vol. 5, no. 6, pp. 838-842, 2016.
[14] N. Mishra and S. Pandya, "Internet of things applications, security challenges, attacks, intrusion detection, and future visions: A systematic review," IEEE Access, vol. 9, pp. 59353-59377, 2021.
[15] W. H. Hassan, "Current research on Internet of Things (IoT) security: A survey," Computer Networks, vol. 148, pp. 283-294, 2019.
[16] V. Hassija, V. Chamola, V. Saxena, D. Jain, P. Goyal, and B. Sikdar, "A Survey on IoT Security: Application Areas, Security Threats, and Solution Architectures," IEEE Access, vol. 7, pp. 82721-82743, 2019.
[17] N. A. M. Alhammadi and K. H. Zaboon, "A Review of IoT Applications, Attacks and Its Recent Defense Methods,"
Journal of Global Scientific Research, vol. 7, no. 3, pp. 2128-2134, 2022.
[18] H. Liu and B. Lang, "Machine learning and deep learning methods for intrusion detection systems: A survey," Applied Sciences, vol. 9, no. 20, p. 4396, 2019.
[19] X. Gao, C. Shan, C. Hu, Z. Niu, and Z. Liu, "An adaptive ensemble machine learning model for intrusion detection," IEEE Access, vol. 7, pp. 82512-82521, 2019.
[20] B. Selvakumar and K. Muneeswaran, "Firefly algorithm based feature selection for network intrusion detection," Comput. Secur., vol. 81, pp. 148-155, 2019.
[21] A. Binbusayyis and T. Vaiyapuri, "Identifying and benchmarking key features for cyber intrusion detection: An ensemble approach," IEEE Access, vol. 7, pp. 106495-106513, 2019.
[22] M. T. Nguyen and K. Kim, "Genetic convolutional neural network for intrusion detection systems," Future Gener. Comput. Syst., vol. 113, pp. 418-427, 2020.
[23] A. S. Khan, Z. Ahmad, J. Abdullah, and F. Ahmad, "A Spectrogram Image-Based Network Anomaly Detection System Using Deep Convolutional Neural Network," IEEE Access, vol. 9, pp. 87979-87093, 2021.
[24] G. Andresini, A. Appice, and D. Malerba, "Nearest cluster-based intrusion detection through convolutional neural networks," Knowledge-Based Systems, vol. 216, pp. 115, 2021.
[25] T. A. Tchakoucht and M. Ezziyyani, "Multilayered echo-state machine: A novel architecture for efficient intrusion detection," IEEE Access, vol. 6, pp. 72458-72468, 2018.
[26] A. A. Diro and N. Chilamkurti, "Distributed attack detection scheme using deep learning approach for Internet of Things," Future Gener. Comput. Syst., vol. 82, pp. 761-768, 2018.
[27] S. M. Kasongo and Y. Sun, "A deep learning method with filter based feature engineering for wireless intrusion detection system," IEEE Access, vol. 7, pp. 38597-38607, 2019.
[28] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, "Deep learning approach for intelligent intrusion detection system," IEEE Access, vol. 7, pp. 41525-41550, 2019.
[29] T.-T.-H. Le, Y. Kim, and H. Kim, "Network intrusion detection
based on novel feature selection model and various recurrent neural networks," Appl. Sci., vol. 9, pp. 1-29, 2019.
[30] Q. Chen, Z. Meng, X. Liu, Q. Jin, and R. Su, "Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE," Genes (Basel), vol. 6, pp. 1-13, 2018.
[31] E. Kabir, J. Hu, H. Wang, and G. Zhuo, "A novel statistical technique for intrusion detection systems," Future Gener. Comput. Syst., vol. 79, pp. 303-318, 2018.
[32] K. Peng, V. C. M. Leung, and Q. Huang, "Clustering approach based on mini batch Kmeans for intrusion detection system over big data," IEEE Access, vol. 6, pp. 11897-11906, 2018.
[33] S. Vargaftik, I. Keslassy, and Y. Ben-Itzhak, "RADE: Resource-efficient supervised anomaly detection using decision tree-based ensemble methods," arXiv preprint arXiv:1909.11877, 2019.
[34] B. Lypa, O. Iver, and V. Kifer, "Application of machine learning methods for network intrusion detection system," Proceeding of Processing, Transmission and Security of Information, pp. 233-240, 2019.
[35] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York, NY, USA: Springer, 2001.
[36] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York, NY, USA: Wiley, 2001.
[37] H. H. Aghdam and E. J. Heravi, Guide to Convolutional Neural Networks. New York, NY, USA: Springer, 2017.
[38] L. Dhanabal and S. P. Shantharajah, "A study on NSL-KDD dataset for intrusion detection system based on classification algorithms," International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 6, pp. 446-452, 2015.
[39] M. L. Samb et al., "A novel RFE-SVM-based feature selection approach for classification," International Journal of Advanced Science and Technology, vol. 43, no. 1, pp. 27-36, 2012.
[40] N. Abdalgawad et al., "Generative deep learning to detect cyberattacks for the IoT-23 dataset,"
IEEE Access, vol. 10, pp. 6430-6441, 2021.
[41] I. Ullah and Q. H. Mahmoud, "Design and development of a deep learning-based model for anomaly detection in IoT networks," IEEE Access, vol. 9, pp. 103906-103926, 2021.
[42] I. Ullah and Q. H. Mahmoud, "A technique for generating a botnet dataset for anomalous activity detection in IoT networks," in 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2020.
[43] J. Liu, B. Kantarci, and C. Adams, "Machine learning-driven intrusion detection for Contiki-NG-based IoT networks exposed to NSL-KDD dataset," in Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning, 2020.
[44] S. M. Taghavinejad et al., "Intrusion detection in IoT-based smart grid using hybrid decision tree," in 2020 6th International Conference on Web Research (ICWR), IEEE, 2020.
[45] D. Rani and N. C. Kaushal, "Supervised machine learning based network intrusion detection system for Internet of Things," in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), IEEE, 2020.
[46] Y. Li et al., "Robust detection for network intrusion of industrial IoT based on multi-CNN fusion,"
Measurement, vol. 154, 107450, 2020.

APPENDIX

%==========================================================
% This file implements the validation of every feature set using BG model
%==========================================================
clear; close; clc;
path_res = 'D:/my_folder/';
filename = {'BG_RFE_val_F_1'};
[KDD_val] = BG_RFE_validate;
file_save = sprintf('%s%s',path_res,filename{1});
save(file_save,'KDD_val');

function [BG_val] = BG_RFE_validate
% Loads KDD0 (features ranked by importance). BG_para is never defined in
% this listing; it is assumed to be stored in the same .mat file.
load('BS_RFE_rank_feature.mat'); %#ok
num_tr = size(KDD0.teblabel,1);
num_fold = 5;
tr_indices = crossvalind('Kfold',num_tr,num_fold);
% One row per feature subset (index l), one column per fold (index k)
BG_val_acc = zeros(41,5);
BG_val_fpr = zeros(41,5);
BG_val_F1 = zeros(41,5);
BG_val_tpr = zeros(41,5);
for k = 1:num_fold
    teInd = (tr_indices == k);
    trInd = ~teInd;
    tr_folds = KDD0.tedata(trInd,:);
    te_folds = KDD0.tedata(teInd,:);
    tr_label_folds = KDD0.teblabel(trInd,1);
    te_label_folds = KDD0.teblabel(teInd,1);
    targ = unique(te_label_folds);
    nhan = unique(tr_label_folds);
    % Only evaluate folds in which both classes (0 and 1) occur in both splits
    if any(targ(:)==0) && any(targ(:)==1) && any(nhan(:)==0) && any(nhan(:)==1)
        for l = 6:41
            fprintf('Testing data is fold %d, loop %d\n',k,l);
            % BG model trained on the ranked features l..41
            Model_BG = TreeBagger(BG_para.BG.num_tree,tr_folds(:,l:41),tr_label_folds, ...
                'Method','classification','MinLeafSize',BG_para.BG.num_leaf, ...
                'NumPredictorsToSample','all','prior','Empirical');
            predict_BG = predict(Model_BG,te_folds(:,l:41));
            % TreeBagger returns labels as a cell array of char; convert to numeric
            S = sprintf('%s*', predict_BG{:});
            predict_BG = sscanf(S, '%f*');
            [~,Result_BG,~] = Matrix_construction(te_label_folds,predict_BG,0);
            BG_val_acc(l,k) = Result_BG.Accuracy;
            BG_val_fpr(l,k) = Result_BG.FalsePositiveRate;
            BG_val_tpr(l,k) = Result_BG.Sensitivity;
            BG_val_F1(l,k) = Result_BG.F1_score;
        end
    end
end
% Average every metric over the folds
BG_val.acc = mean(BG_val_acc,2);
BG_val.fpr = mean(BG_val_fpr,2);
BG_val.F1_score = mean(BG_val_F1,2);
BG_val.tpr = mean(BG_val_tpr,2);
end

% The third argument (display flag) is accepted because every caller passes
% it, but this binary-only version does not use it.
function [conf_matrix,Result,RefereceResult] = Matrix_construction(true,pred,~)
if (nargin < 2)
    error('Not enough input arguments. Need at least two vectors as input.');
elseif (nargin > 3)
    error('Too many input arguments.');
end
true = true(:);
pred = pred(:);
if length(true) ~= length(pred)
    error('Input have different lengths')
end
x_true = unique(true);
x_pred = unique(pred);
condition = length(x_true)==length(x_pred);
if ~condition
    error('Class List is not same in given inputs')
end
condition = (sum(x_true==x_pred)==length(x_true));
if ~condition
    error('Class List in given inputs are different')
end
class_list = x_true;
B_true_class_list = {'normal';'attack'};
n_class = length(x_true);
conf_matrix = zeros(n_class);
pred_class = cell(1,n_class);
class_ref = cell(n_class,1);
row_name = cell(1,n_class);
% Rows are true classes, columns are predicted classes
for i = 1:n_class
    class_ref{i,1} = strcat('class',num2str(i-1),'==>',B_true_class_list(i));
    for j = 1:n_class
        val = (true == class_list(i)) & (pred == class_list(j));
        conf_matrix(i,j) = sum(val);
        pred_class{i,j} = sum(val);
    end
    row_name{i} = strcat('True_class_',B_true_class_list{i});
end
confusion_matrix_table = cell2table(pred_class,'VariableNames', ...
    {'Predicted_class_normal' 'Predicted_class_attack'});
confusion_matrix_table.Properties.RowNames = row_name;
disp('Confusion Matrix for ML classifier of classes')
disp(confusion_matrix_table)
[Result,RefereceResult] = getValues(conf_matrix);
RefereceResult.Class = [class_ref{:}]';
end
% ----------------------------------------------------------
function [Result,RefereceResult] = getValues(c_matrix)
if (nargin < 1)
    error('Not enough input arguments. Need a confusion matrix as input.');
elseif (nargin > 1)
    error('Too many input arguments.');
end
[row,col] = size(c_matrix);
if row ~= col
    error('Confusion matrix dimension is wrong')
end
% Binary case only: row/column 1 (class 0) is treated as the positive class
TP = c_matrix(1,1); FN = c_matrix(1,2);
FP = c_matrix(2,1); TN = c_matrix(2,2);
P = TP + FN;
N = FP + TN;
accuracy = (TP+TN)/(P+N);
Sensitivity = TP./P;
Specificity = TN./N;
Precision = TP./(TP+FP);
FPR = 1-Specificity;
F1_score = 2*(Sensitivity.*Precision)./(Precision+Sensitivity);
RefereceResult.AccuracyInTotal = accuracy';
RefereceResult.Sensitivity = Sensitivity';
RefereceResult.FalsePositiveRate = FPR';
RefereceResult.F1_score = F1_score';
Result.Accuracy = (accuracy);
Result.Sensitivity = mean(Sensitivity);
Result.FalsePositiveRate = mean(FPR);
Result.F1_score = mean(F1_score);
end

%=========================================================
% This file computes the feature importance and then ranks the features by
% importance in ascending order: the least important features come first, so
% the validation scripts can drop them by selecting columns l:41
%=========================================================
clear; close; clc;
path_res = 'D:/my_folder/';
filename = {'BS_RFE_rank_feature'};
[KDD0] = BS_RFE;
file_save = sprintf('%s%s',path_res,filename{1});
save(file_save,'KDD0');
% ----------------------------------------------------------
function [KDD0] = BS_RFE
addpath('D:/my_folder/');
load('KDD41MM.mat'); %#ok  provides the struct KDD
num_tr = size(KDD.trblabel,1);
num_fold = 5;
tr_indices = crossvalind('Kfold',num_tr,num_fold);
for k = 1:num_fold
    fprintf('Testing data is fold: %d\n',k);
    teInd = (tr_indices == k);
    trInd = ~teInd;
    tr_folds = KDD.trdata(trInd,:);
    te_folds = KDD.trdata(teInd,:);
    tr_label_folds = KDD.trblabel(trInd,1);
    te_label_folds = KDD.trblabel(teInd,1);
    targ = unique(te_label_folds);
    nhan = unique(tr_label_folds);
    % Only evaluate folds in which both classes (0 and 1) occur in both splits
    if any(targ(:)==0) && any(targ(:)==1) && any(nhan(:)==0) && any(nhan(:)==1)
        tree = templateTree('Minleaf',5);
        Model_BS = fitensemble(tr_folds,tr_label_folds,'RUSBoost',205,tree, ...
            'LearnRate',0.8,'type','classification');
        predict_BS = predict(Model_BS,te_folds);
        % Per-fold importance of each of the 41 features
        imp(k,:) = predictorImportance(Model_BS);
        [~,Result_BS,~] = Matrix_construction(te_label_folds,predict_BS,0);
        BS_Result_acc(k) = Result_BS.Accuracy;
        BS_Result_fpr(k) = Result_BS.FalsePositiveRate;
        BS_Result_tpr(k) = Result_BS.Sensitivity;
    end
end
accuracy = mean(BS_Result_acc);
false_positive_rate = mean(BS_Result_fpr);
true_positive_rate = mean(BS_Result_tpr);
important_feature = mean(imp,1);
% Ascending sort: B(1) is the least important feature, B(41) the most important
[B,I] = sort(important_feature,2);
for i = 1:41
    fna{i} = KDD.feature_names{I(i)};
end
fnames = fna.';
% Reorder every data matrix so its columns follow the importance ranking
KDD0.fname = fnames;
KDD0.trdata = KDD.trdata(:,I);
KDD0.tedata = KDD.tedata(:,I);
KDD0.tedata21 = KDD.tedata21(:,I);
KDD0.trblabel = ...
    KDD.trblabel;
KDD0.teblabel = KDD.teblabel;
KDD0.teblabel21 = KDD.teblabel21;
KDD0.imp = B.';
KDD0.imp_ind = I.';
KDD0.acc = accuracy;
KDD0.fpr = false_positive_rate;
KDD0.tpr = true_positive_rate;
end

%====================================================
% This file implements the validation of every feature set using BS model
%====================================================
clear; close; clc;
path_res = 'D:/my_folder/';
filename = {'BS_RFE_val_F_1'};
[KDD_val] = BS_RFE_validate;
file_save = sprintf('%s%s',path_res,filename{1});
save(file_save,'KDD_val');
% ----------------------------------------------------------
function [BS_val] = BS_RFE_validate
load('BS_RFE_rank_feature.mat'); %#ok
num_tr = size(KDD0.teblabel,1);
num_fold = 5;
tr_indices = crossvalind('Kfold',num_tr,num_fold);
% One row per feature subset (index l), one column per fold (index k)
BS_val_acc = zeros(41,5);
BS_val_fpr = zeros(41,5);
BS_val_F1 = zeros(41,5);
BS_val_tpr = zeros(41,5);
for k = 1:num_fold
    teInd = (tr_indices == k);
    trInd = ~teInd;
    tr_folds = KDD0.tedata(trInd,:);
    te_folds = KDD0.tedata(teInd,:);
    tr_label_folds = KDD0.teblabel(trInd,1);
    te_label_folds = KDD0.teblabel(teInd,1);
    targ = unique(te_label_folds);
    nhan = unique(tr_label_folds);
    % Only evaluate folds in which both classes (0 and 1) occur in both splits
    if any(targ(:)==0) && any(targ(:)==1) && any(nhan(:)==0) && any(nhan(:)==1)
        for l = 6:41
            fprintf('Testing data is fold %d, loop %d\n',k,l);
            % BS model trained on the ranked features l..41
            tree = templateTree('Minleaf',13);
            Model_BS = fitensemble(tr_folds(:,l:41),tr_label_folds,'RUSBoost', ...
                76,tree,'LearnRate',0.8,'type','classification');
            predict_BS = predict(Model_BS,te_folds(:,l:41));
            [~,Result_BS,~] = Matrix_construction(te_label_folds,predict_BS,0);
            BS_val_acc(l,k) = Result_BS.Accuracy;
            BS_val_fpr(l,k) = Result_BS.FalsePositiveRate;
            BS_val_tpr(l,k) = Result_BS.Sensitivity;
            BS_val_F1(l,k) = Result_BS.F1_score;
        end
    end
end
% Average every metric over the folds
BS_val.acc = mean(BS_val_acc,2);
BS_val.fpr = mean(BS_val_fpr,2);
BS_val.F1_score = mean(BS_val_F1,2);
BS_val.tpr = mean(BS_val_tpr,2);
end

% This file shows the confusion matrix and ROC curve of the proposed model
clear; close; clc;
load('BS_RFE_rank_feature.mat');
trdata = KDD0.trdata(:,1:41);
trblabel = KDD0.trblabel;
Model_RF = ...
    TreeBagger(175,trdata,trblabel,'Method','classification', ...
    'MinLeafSize',5,'prior','Empirical');
[predict_RF, score] = predict(Model_RF,KDD0.tedata21(:,1:41));
% TreeBagger returns labels as a cell array of char; convert to numeric
S = sprintf('%s*', predict_RF{:});
predict_RF = sscanf(S, '%f*');
[confusion_matrix_table,~,~] = Matrix_construction(KDD0.teblabel21,predict_RF,0)
% score(:,2) is the score of class 1; AUC is the area under the ROC curve
[X,Y,~,AUC] = perfcurve(KDD0.teblabel21,score(:,2),1);
figure
hold on
title('ROC Curves','FontSize',18)
xlabel('False positive rate','FontSize',18,'FontName','Times New Roman')
ylabel('True positive rate','FontSize',18,'FontName','Times New Roman')
plot(X,Y,'b','LineWidth',2);
set(gca,'XTick',[0 0.2 0.4 0.6 0.8 1],'FontSize',18,'FontName','Times New Roman');
set(gca,'YTick',[0 0.2 0.4 0.6 0.8 1],'FontSize',18,'FontName','Times New Roman');
ylim([0 1]);
xlim([0 1]);
hold off;

% This file implements the validation of every feature set using KNN model
clear; close; clc;
path_res = 'D:/my_folder/';
filename = {'KNN_RFE_val_F_1'};
[KDD_val] = KNN_RFE_validate;
file_save = sprintf('%s%s',path_res,filename{1});
save(file_save,'KDD_val');
% ----------------------------------------------------------
function [KNN_val] = KNN_RFE_validate
load('BS_RFE_rank_feature.mat'); %#ok
num_tr = size(KDD0.teblabel,1);
num_fold = 5;
tr_indices = crossvalind('Kfold',num_tr,num_fold);
% One row per feature subset (index l), one column per fold (index k)
KNN_val_acc = zeros(41,5);
KNN_val_fpr = zeros(41,5);
KNN_val_F1 = zeros(41,5);
KNN_val_tpr = zeros(41,5);
for k = 1:num_fold
    teInd = (tr_indices == k);
    trInd = ~teInd;
    tr_folds = KDD0.tedata(trInd,:);
    te_folds = KDD0.tedata(teInd,:);
    tr_label_folds = KDD0.teblabel(trInd,1);
    te_label_folds = KDD0.teblabel(teInd,1);
    targ = unique(te_label_folds);
    nhan = unique(tr_label_folds);
    % Only evaluate folds in which both classes (0 and 1) occur in both splits
    if any(targ(:)==0) && any(targ(:)==1) && any(nhan(:)==0) && any(nhan(:)==1)
        for l = 6:41
            fprintf('Testing data is fold %d, loop %d\n',k,l);
            % KNN model trained on the ranked features l..41
            Model_KNN = fitcknn(tr_folds(:,l:41),tr_label_folds,'NumNeighbors',5, ...
                'NSMethod','exhaustive','Distance','euclidean','Standardize',1);
            predict_KNN = predict(Model_KNN,te_folds(:,l:41));
            [~,Result_KNN,~] = ...
                Matrix_construction(te_label_folds,predict_KNN,0);
            KNN_val_acc(l,k) = Result_KNN.Accuracy;
            KNN_val_fpr(l,k) = Result_KNN.FalsePositiveRate;
            KNN_val_tpr(l,k) = Result_KNN.Sensitivity;
            KNN_val_F1(l,k) = Result_KNN.F1_score;
        end
    end
end
% Average every metric over the folds
KNN_val.acc = mean(KNN_val_acc,2);
KNN_val.fpr = mean(KNN_val_fpr,2);
KNN_val.F1_score = mean(KNN_val_F1,2);
KNN_val.tpr = mean(KNN_val_tpr,2);
end

function [conf_matrix,Result,RefereceResult] = Matrix_construction(true,pred,Disp)
if (nargin < 2)
    error('Not enough input arguments. Need at least two vectors as input.');
elseif (nargin > 3)
    error('Too many input arguments.');
end
% Display flag defaults to off when the caller omits it
if nargin < 3
    Disp = 0;
end
true = true(:);
pred = pred(:);
if length(true) ~= length(pred)
    error('Input have different lengths')
end
x_true = unique(true);
x_pred = unique(pred);
condition = length(x_true)==length(x_pred);
if ~condition
    error('Class List is not same in given inputs')
end
condition = (sum(x_true==x_pred)==length(x_true));
if ~condition
    error('Class List in given inputs are different')
end
class_list = x_true;
M_true_class_list = {'normal';'dos';'probe';'r2l';'u2r'};
B_true_class_list = {'normal';'attack'};
n_class = length(x_true);
conf_matrix = zeros(n_class);
pred_class = cell(1,n_class);
class_ref = cell(n_class,1);
row_name = cell(1,n_class);
% Calculate confusion for all classes (rows: true class, columns: predicted)
switch n_class
    case 2
        for i = 1:n_class
            class_ref{i,1} = strcat('class',num2str(i-1),'==>',B_true_class_list(i));
            for j = 1:n_class
                val = (true == class_list(i)) & (pred == class_list(j));
                conf_matrix(i,j) = sum(val);
                pred_class{i,j} = sum(val);
            end
            % Row labels, as in the binary-only version of this function
            row_name{i} = strcat('True_class_',B_true_class_list{i});
        end
        confusion_matrix_table = cell2table(pred_class,'VariableNames', ...
            {'Predicted_class_normal' 'Predicted_class_attack'});
        confusion_matrix_table.Properties.RowNames = row_name;
        disp('Confusion Matrix for ML classifier of classes')
        disp(confusion_matrix_table)
        [Result,RefereceResult] = getValues(conf_matrix);
        % Output struct for individual classes
        RefereceResult.Class = [class_ref{:}]';
    otherwise
        for i = 1:n_class
            class_ref{i,1} = strcat('class',num2str(i-1),'==>',M_true_class_list(i));
            for j = 1:n_class
                val = (true == class_list(i)) & (pred == class_list(j));
                conf_matrix(i,j) = sum(val);
                pred_class{i,j} = sum(val);
            end
            % Row labels for the five NSL-KDD classes
            row_name{i} = strcat('True_class_',M_true_class_list{i});
        end
        confusion_matrix_table = cell2table(pred_class,'VariableNames', ...
            {'Predicted_class_normal' 'Predicted_class_dos' 'Predicted_class_probe' ...
            'Predicted_class_r2l' 'Predicted_class_u2r'});
        confusion_matrix_table.Properties.RowNames = row_name;
        disp('Confusion Matrix for ML classifier of multiple classes')
        disp(confusion_matrix_table)
        [Result,RefereceResult] = getValues(conf_matrix);
        % Output struct for individual classes
        RefereceResult.Class = [class_ref{:}]';
end
if Disp
    if n_class > 2
        disp('Multi-Class Confusion Matrix Output')
        TruePositive = RefereceResult.TruePositive;
        FalsePositive = RefereceResult.FalsePositive;
        FalseNegative = RefereceResult.FalseNegative;
        TrueNegative = RefereceResult.TrueNegative;
        TFPN = table(TruePositive,FalsePositive,FalseNegative,TrueNegative, ...
            'RowNames',row_name);
        disp(TFPN);
        Param = struct2table(RefereceResult);
        disp(Param)
    else
        disp('Two-Class Confusion Matrix')
        param = {'','TruePositive','FalsePositive'; ...
            'FalseNegative',conf_matrix(1,1),conf_matrix(1,2); ...
            'TrueNegative=TN',conf_matrix(2,1),conf_matrix(2,2)};
        disp(param)
    end
    disp('Over all values')
    disp(Result)
end
end
% ----------------------------------------------------------
function [Result,RefereceResult] = getValues(c_matrix)
if (nargin < 1)
    error('Not enough input arguments. Need a confusion matrix as input.');
elseif (nargin > 1)
    error('Too many input arguments.');
end
[row,col] = size(c_matrix);
if row ~= col
    error('Confusion matrix dimension is wrong')
end
n_class = row;
switch n_class
    case 2
        % Binary case: row/column 1 (class 0) is treated as the positive class
        TP = c_matrix(1,1); FN = c_matrix(1,2);
        FP = c_matrix(2,1); TN = c_matrix(2,2);
    otherwise
        % Multi-class case: one-vs-rest counts for every class
        TP = zeros(1,n_class);
        FN = zeros(1,n_class);
        FP = zeros(1,n_class);
        TN = zeros(1,n_class);
        for i = 1:n_class
            TP(i) = c_matrix(i,i);
            FN(i) = sum(c_matrix(i,:))-c_matrix(i,i);
            FP(i) = sum(c_matrix(:,i))-c_matrix(i,i);
            TN(i) = ...
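                sum(c_matrix(:))-TP(i)-FP(i)-FN(i);
        end
end
P = TP + FN;
N = FP + TN;
switch n_class
    case 2
        accuracy = (TP+TN)/(P+N);
        Error = 1 - accuracy;
        Result.Accuracy = (accuracy);
        Result.Error = (Error);
    otherwise
        accuracy = (TP)./(P+N);
        Error = (FP)./(P+N);
        Result.Accuracy = sum(accuracy);
        Result.Error = sum(Error);
end
RefereceResult.AccuracyOfSingle = (TP./P)';
RefereceResult.ErrorOfSingle = 1 - RefereceResult.AccuracyOfSingle;
Sensitivity = TP./P;      % true positive rate (metric of interest)
Specificity = TN./N;
Precision = TP./(TP+FP);
FPR = 1-Specificity;      % false positive rate (metric of interest)
beta = 1;
% Element-wise division so that every class gets its own score
F1_score = ( (1+(beta^2))*(Sensitivity.*Precision) ) ./ ( (beta^2)*(Precision+Sensitivity) );
MCC = [( TP.*TN - FP.*FN ) ./ ( ( (TP+FP).*P.*N.*(TN+FN) ).^(0.5) ); ...
       ( FP.*FN - TP.*TN ) ./ ( ( (TP+FP).*P.*N.*(TN+FN) ).^(0.5) )];
MCC = max(MCC);
% NOTE: the source listing is cut off at this point. The assignments below
% are reconstructed from the binary-only getValues listed earlier in this
% appendix and from the fields that Matrix_construction reads; they are an
% assumption, not the author's verified code.
RefereceResult.TruePositive = TP';
RefereceResult.FalsePositive = FP';
RefereceResult.FalseNegative = FN';
RefereceResult.TrueNegative = TN';
RefereceResult.Sensitivity = Sensitivity';
RefereceResult.FalsePositiveRate = FPR';
RefereceResult.F1_score = F1_score';
Result.Sensitivity = mean(Sensitivity);
Result.FalsePositiveRate = mean(FPR);
Result.F1_score = mean(F1_score);
end

% This file implements the validation of every feature set using RF model
clear; close; clc;
path_res = 'D:/my_folder/';
filename = {'RF_RFE_val_F_1'};
[KDD_val] = RF_RFE_validate;
file_save = sprintf('%s%s',path_res,filename{1});
save(file_save,'KDD_val');
% ----------------------------------------------------------
function [RF_val] = RF_RFE_validate
load('BS_RFE_rank_feature.mat'); %#ok
num_tr = size(KDD0.teblabel,1);
num_fold = 5;
tr_indices = crossvalind('Kfold',num_tr,num_fold);
% One row per feature subset (index l), one column per fold (index k)
RF_val_acc = zeros(41,5);
RF_val_fpr = zeros(41,5);
RF_val_F1 = zeros(41,5);
RF_val_tpr = zeros(41,5);
for k = 1:num_fold
    teInd = (tr_indices == k);
    trInd = ~teInd;
    tr_folds = KDD0.tedata(trInd,:);
    te_folds = KDD0.tedata(teInd,:);
    tr_label_folds = KDD0.teblabel(trInd,1);
    te_label_folds = KDD0.teblabel(teInd,1);
    targ = unique(te_label_folds);
    nhan = unique(tr_label_folds);
    % Only evaluate folds in which both classes (0 and 1) occur in both splits
    if any(targ(:)==0) && any(targ(:)==1) && any(nhan(:)==0) && any(nhan(:)==1)
        for l = 6:41
            fprintf('Testing data is fold %d, loop %d\n',k,l);
            % RF model trained on the ranked features l..41
            Model_RF = TreeBagger(25,tr_folds(:,l:41),tr_label_folds,'Method','classification', ...
                'MinLeafSize',3,'prior','Empirical');
            predict_RF = predict(Model_RF,te_folds(:,l:41));
            % TreeBagger returns labels as a cell array of char; convert to numeric
            S = sprintf('%s*', predict_RF{:});
            predict_RF = ...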
                sscanf(S, '%f*');
            [~,Result_RF,~] = Matrix_construction(te_label_folds,predict_RF,0);
            RF_val_acc(l,k) = Result_RF.Accuracy;
            RF_val_fpr(l,k) = Result_RF.FalsePositiveRate;
            RF_val_tpr(l,k) = Result_RF.Sensitivity;
            RF_val_F1(l,k) = Result_RF.F1_score;
        end
    end
end
% Average every metric over the folds
RF_val.acc = mean(RF_val_acc,2);
RF_val.fpr = mean(RF_val_fpr,2);
RF_val.F1_score = mean(RF_val_F1,2);
RF_val.tpr = mean(RF_val_tpr,2);
end

% The third argument (display flag) is accepted because every caller passes
% it, but this binary-only version does not use it.
function [conf_matrix,Result,RefereceResult] = Matrix_construction(true,pred,~)
if (nargin < 2)
    error('Not enough input arguments. Need at least two vectors as input.');
elseif (nargin > 3)
    error('Too many input arguments.');
end
true = true(:);
pred = pred(:);
if length(true) ~= length(pred)
    error('Input have different lengths')
end
x_true = unique(true);
x_pred = unique(pred);
condition = length(x_true)==length(x_pred);
if ~condition
    error('Class List is not same in given inputs')
end
condition = (sum(x_true==x_pred)==length(x_true));
if ~condition
    error('Class List in given inputs are different')
end
class_list = x_true;
B_true_class_list = {'normal';'attack'};
n_class = length(x_true);
conf_matrix = zeros(n_class);
pred_class = cell(1,n_class);
class_ref = cell(n_class,1);
row_name = cell(1,n_class);
% Rows are true classes, columns are predicted classes
for i = 1:n_class
    class_ref{i,1} = strcat('class',num2str(i-1),'==>',B_true_class_list(i));
    for j = 1:n_class
        val = (true == class_list(i)) & (pred == class_list(j));
        conf_matrix(i,j) = sum(val);
        pred_class{i,j} = sum(val);
    end
    row_name{i} = strcat('True_class_',B_true_class_list{i});
end
confusion_matrix_table = cell2table(pred_class,'VariableNames', ...
    {'Predicted_class_normal' 'Predicted_class_attack'});
confusion_matrix_table.Properties.RowNames = row_name;
disp('Confusion Matrix for ML classifier of classes')
disp(confusion_matrix_table)
[Result,RefereceResult] = getValues(conf_matrix);
RefereceResult.Class = [class_ref{:}]';
end
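
The metric definitions used throughout this appendix (accuracy, sensitivity, false positive rate, F1 score) can be checked by hand on a small example. The following is a minimal sketch in base MATLAB, with no toolbox calls, that builds a 2x2 confusion matrix and applies the same formulas as getValues. The label vectors y_true and y_pred are hypothetical toy data, not NSL-KDD records, and the variable names are illustrative only. As in the listings above, row and column 1 (class 0, "normal") are treated as the positive class.

```matlab
% Hypothetical toy labels (0 = normal, 1 = attack); not taken from NSL-KDD
y_true = [0 0 0 0 1 1 1 1 1 1]';
y_pred = [0 0 0 1 1 1 1 1 0 0]';

% Confusion matrix: rows are true classes, columns are predicted classes
classes = unique(y_true);
n = numel(classes);
C = zeros(n);
for i = 1:n
    for j = 1:n
        C(i,j) = sum(y_true == classes(i) & y_pred == classes(j));
    end
end

% Same convention as getValues: class 0 (row/column 1) is "positive"
TP = C(1,1); FN = C(1,2);
FP = C(2,1); TN = C(2,2);

accuracy    = (TP + TN) / sum(C(:));
sensitivity = TP / (TP + FN);          % true positive rate
specificity = TN / (FP + TN);
precision   = TP / (TP + FP);
fpr         = 1 - specificity;         % false positive rate
f1          = 2 * sensitivity * precision / (sensitivity + precision);

fprintf('acc=%.4f tpr=%.4f fpr=%.4f f1=%.4f\n', accuracy, sensitivity, fpr, f1);
% prints: acc=0.7000 tpr=0.7500 fpr=0.3333 f1=0.6667
```

Here C is [3 1; 2 4], so TP = 3, FN = 1, FP = 2, and TN = 4. Because class 0 is the positive class under this convention, the reported sensitivity is the detection rate of normal traffic; if the attack class should count as positive, the roles of rows and columns 1 and 2 would need to be swapped.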
