Ensemble machine learning techniques and data preprocessing for improving the classification quality of network intrusion detection systems.

DOCUMENT INFORMATION

Basic information

Title	Ensemble Machine Learning Techniques and Data Preprocessing for Improving the Classification Quality of Network Intrusion Detection Systems
Author	Hoàng Ngọc Thanh
Supervisor	Assoc. Prof. Dr. Trần Văn Lăng
Institution	Lac Hong University (Trường Đại học Lạc Hồng)
Major	Computer Science
Document type	Doctoral thesis
Year of publication	2022
City	Đồng Nai

Format
Pages	202
File size	4.98 MB

Contents

MINISTRY OF EDUCATION AND TRAINING
LAC HONG UNIVERSITY

HOÀNG NGỌC THANH

ENSEMBLE MACHINE LEARNING TECHNIQUES AND DATA PREPROCESSING FOR IMPROVING THE CLASSIFICATION QUALITY OF NETWORK INTRUSION DETECTION SYSTEMS

DOCTORAL THESIS IN COMPUTER SCIENCE
Major: Computer Science
Major code: 9480101

SCIENTIFIC SUPERVISOR: Assoc. Prof. Dr. TRẦN VĂN LĂNG

Đồng Nai, 2022

DECLARATION

My name is Hoàng Ngọc Thanh, born on 13/11/1969 in Bình Định. I am a doctoral candidate in Computer Science, 2015 intake, at Lac Hong University.

I declare that the doctoral thesis "Ensemble machine learning techniques and data preprocessing for improving the classification quality of network intrusion detection systems" is my own research, carried out under the guidance of my scientific supervisor, Assoc. Prof. Dr. Trần Văn Lăng. The algorithms, data, and results presented in the thesis come entirely from experiments, are truthful, and are not copied.

Doctoral candidate
Hoàng Ngọc Thanh

ACKNOWLEDGEMENTS

First of all, with the deepest gratitude, I would like to thank Assoc. Prof. Dr. Trần Văn Lăng, my scientific supervisor, who passed on to me his knowledge and his passion for scientific research, and who wholeheartedly guided me, helped me, and created the best conditions for me to complete this thesis.

I sincerely thank the teachers of the Board of Rectors, the Faculty of Information Technology, and the Faculty of Postgraduate Studies of Lac Hong University for their teaching and for creating favorable conditions throughout my doctoral studies.

I would like to thank the Board of Rectors, the Faculty of Engineering and Computer Science, and the Center for Foreign Languages and Information Technology of the Saigon International University (Trường Đại học Quốc tế Sài Gòn), where I work, for their support.

I also sincerely thank my colleagues and friends, who have cared for and encouraged me throughout this time.

Finally, I reserve special affection for my family and relatives, who have always trusted me, encouraged me, and given me the strength to move forward steadily and overcome difficulties.

Author
Hoàng Ngọc Thanh

ABSTRACT

Stream-based intrusion detection is a growing problem in computer network security. Many previous studies have applied machine learning to improve attack detection in Network Intrusion Detection Systems (NIDS). However, recent studies show that NIDS still face challenges in improving accuracy, reducing the false alarm rate, and detecting new attacks. This thesis proposes several solutions that use ensemble machine learning techniques and improved data preprocessing techniques to raise the classification quality of NIDS. The work is motivated by three observations: (1) The training datasets used for NIDS contain heavily class-imbalanced data. (2) Machine learning algorithms may use all features, including features that are irrelevant to the classification goal, which reduces classification quality and increases computation time. (3) Ensemble classifiers outperform single classifiers in classification accuracy, and this advantage is particularly evident in network intrusion detection.

To address these problems, the thesis improves two solutions in the data preprocessing stage: (1) it proposes feature selection algorithms that improve on the known FFC and BFE feature selection algorithms; (2) it improves the oversampling and undersampling techniques applied to the training dataset. The preprocessed data is then used to train ensemble classifiers with both homogeneous (Bagging, Boosting, Stacking and Decorate) and heterogeneous (Voting, Stacking and RF) ensemble machine learning algorithms. Experimental results on the full training and testing sets of the UNSW-NB15 dataset show that the proposed solutions improve the classification quality of NIDS.

In addition to the results achieved, the research leaves some shortcomings and directions for future work: (1) The training time of the proposed
classification models is still long, and choosing the right combination of algorithms to build a hybrid, multi-label classification model that responds in real time needs further research. (2) Processing capacity plays an important role in exploiting machine learning algorithms. Improving processing efficiency through parallel processing, as well as optimizing the parameters of machine learning techniques, remains an open issue.

TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION
1.1 Intrusion detection systems
1.1.1 Introduction to IDS
1.1.2 Classification of IDS
1.1.3 IDS using machine learning techniques
1.2 Urgency of the thesis topic
1.3 Research objectives
1.4 Research subjects and scope
1.4.1 Research subjects
1.4.2 Research scope
1.5 Research methods
1.6 Scientific and practical significance
1.6.1 Scientific significance
1.6.2 Practical significance
1.7 New contributions
1.8 Structure of the thesis
CHAPTER 2. RELATED WORK
2.1 Theoretical background
2.1.1 Feature selection
2.1.2 Resampling the dataset
2.1.3 Machine learning techniques
2.1.4 Datasets used for IDS
2.1.5 Metrics for evaluating IDS performance
2.2 Related research on machine learning for IDS
2.2.1 Feature selection
2.2.2 Resampling the dataset
2.2.3 Machine learning models for IDS
2.2.4 Remarks
CHAPTER 3. FEATURE SELECTION SOLUTION
3.1 Proposed feature selection solution
3.1.1 Information measures
3.1.2 Backward feature elimination algorithm (BFE)
3.1.3 Forward feature selection algorithm (FFC)
3.1.4 Proposed feature selection algorithms
3.2 Experimental results
3.2.1 Feature selection for the Worms attack type
3.2.2 Feature selection for the Shellcode attack type
3.2.3 Feature selection for the Backdoor attack type
3.2.4 Feature selection for the Analysis attack type
3.2.5 Feature selection for the Recce attack type
3.2.6 Feature selection for the DoS attack type
3.2.7 Feature selection for the Fuzzers attack type
3.2.8 Feature selection for the Exploits attack type
3.2.9 Feature selection for the Generic attack type
3.3 Comparison, remarks, and evaluation of the proposed feature selection solution
CHAPTER 4. DATASET RESAMPLING SOLUTION
4.1 Proposed dataset resampling solution
4.1.1 Oversampling solution
4.1.2 Undersampling solution
4.2 Experimental results
4.2.1 Oversampling the dataset
4.2.2 Undersampling the dataset
4.3 Summary of results and remarks on the dataset resampling solution
CHAPTER 5. ENSEMBLE TECHNIQUES FOR THE IDS MODEL
5.1 Proposed ensemble techniques
5.2 Experimental results
5.2.1 Ensemble techniques for the Worms attack type
5.2.2 Ensemble techniques for the Shellcode attack type
5.2.3 Ensemble techniques for the Backdoor attack type
5.2.4 Ensemble techniques for the Analysis attack type
5.2.5 Ensemble techniques for the Recce attack type
5.2.6 Ensemble techniques for the DoS attack type
5.2.7 Ensemble techniques for the Fuzzers attack type
5.2.8 Ensemble techniques for the Exploits attack type
5.2.9 Ensemble techniques for the Generic attack type
5.3 Summary of results and remarks on ensemble techniques
5.4 Proposed hybrid classification model
CHAPTER 6. CONCLUSIONS AND FUTURE WORK
6.1 Evaluation of results achieved, limitations, and future directions
6.2 Evaluation of the academic and practical significance of the thesis

LIST OF SYMBOLS AND ABBREVIATIONS

Abbreviation	Full form
ABC	Artificial Bee Colony
ADASYN	Adaptive Synthetic Sampling
ANN	Artificial Neural Network
AUC	Area Under the Curve
Bagging	Bootstrap Aggregation
BFE	Backward Feature Elimination
BFS	Best First Search
BN	Bayesian Network
CA	Correlation Attribute
CART	Classification and Regression Trees
CFS	Correlation-based Feature Selection
CNN	Convolutional Neural Network
CSE	Consistency Subset Evaluator
CV	Cross Validation
DoS	Denial of Service
DT	Decision Tree
FFC	Forward Feature Construction
ELM	Extreme Learning Machines
ENN	Edited Nearest Neighbors
FPR	False Positive Rate
GA	Genetic Algorithm
GAR	GRASP with Annealed Randomness
GC	Global Competence
GP	Genetic Programming
GR	Gain Ratio
ICA	Independent Component Analysis
IDS	Intrusion Detection System
IG	Information Gain
KNN	K Nearest Neighbours
KNNCF	K Nearest Neighbor Collaborative Filtering
LC	Local Competence
LDA	Linear Discriminant Analysis
LOO	Leave One Out
LR	Logistic Regression
LSTM	Long Short-Term Memory
MARS	Multivariate Adaptive Regression Splines
ML	Machine Learning
MLP	Multi Layer Perceptron
MV	Majority Voting
NB	Naïve Bayes
NCR	Neighborhood Cleaning Rule
NSGA	Non-dominated Sorting Genetic Algorithm
OAR	One Against Rest
OSELM	Sequential Extreme Learning Machine
PART	Partial Decision Tree
PCA	Principal Component Analysis
PSO	Particle Swarm Optimization
R2L	Remote to Local
RBF	Radial Basis Function
RF	Random Forest
RMV	Rigged Majority Voting
RNN	Recurrent Neural Network
ROC	Receiver Operating Characteristics
RT	Random Tree
SMOTE	Synthetic Minority Over-Sampling Technique
SSV	Separability Split Value
SU	Symmetrical Uncertainty
SVM	Support Vector Machine
TPR	True Positive Rate
U2R	User to Root
WLC	Weighted Local Competence
WMV	Weighted Majority Voting
WRMV	Weighted Rigged Majority Voting
WTA	Winner Takes All

From the experimental results above, the following observations can be made:

(1) Ensemble classifiers give better classification quality than single classifiers on the important evaluation criteria F-Measure and G-Means. The exception is the Shellcode attack type, where the G-Means of the single classifier is higher than that of the ensemble: the Sensitivity of that single classifier is high, but its Precision is low and its false alarm rate is high. This is consistent with previous studies [34], because deploying many base classifiers reduces the overall error rate of the ensemble. These advantages are even clearer in intrusion detection, where there are many different kinds of intrusion and different classifiers are needed to detect them [CT4], [CT6], [CT7].

(2) The training and testing times of ensemble classifiers are longer than those of single classifiers, especially with heterogeneous Stacking and with Decorate [CT4], [CT6], [CT7].
(3) Three ensemble techniques are effective on the UNSW-NB15 dataset: AdaBoost with DT as the base classifier for the Worms, Backdoor, Recce, Fuzzers, and Generic attack types; Bagging with RT as the base classifier for the Analysis and Exploits attack types; and MixStacking for the Shellcode and DoS attack types. In general, homogeneous ensembles give better results than heterogeneous ensembles in most cases. For the Shellcode and DoS attack types, the heterogeneous ensemble gives a better F-Measure, but the gain is insignificant while the execution time is longer.

(4) The homogeneous AdaBoost ensemble is better than the other ensembles in most cases (5 of the 9 attack types, compared with 2/9 for Bagging and 2/9 for Stacking). This shows that adjusting parameters to the data, based on the actual performance in each iteration of the learning algorithm, is effective.

(5) Decorate improves classification quality on small training datasets such as NSL-KDD and KDDCup99 [CT4]. However, on a large dataset such as UNSW-NB15 the technique is not effective [CT7].

(6) Using F-Measure to evaluate classification quality helps balance Precision and Recall.

5.4 Proposed hybrid classification model

Based on the experimental results for the feature selection techniques, dataset resampling, and ensemble techniques described above, the thesis proposes a hybrid classification model for detecting attack types. The model combines these solutions to improve the classification quality of the IDS, as shown in Figure 5.8. Before being put into use, the ensemble classifier of the IDS goes through a training process. The UNSW-NB15 training dataset is first resampled with the oversampling or undersampling technique that suits each attack type. The data is then normalized, and features are selected once more with the proposed feature selection technique. The resulting training dataset is used to train the ensemble classifier with the ensemble technique that best suits each attack type. Table 5.21 lists the feature selection, dataset resampling, and ensemble techniques that best suit each attack type.

Figure 5.8. Proposed IDS model for detecting network attack types. [Flowchart blocks: incoming network traffic; feature extraction; UNSW-NB15 training dataset; resampling of the training dataset; data normalization; feature selection; ensemble classifier; decision "Attack?" (Yes: alert the network administrator; No: traffic passes through); UNSW-NB15 testing dataset.]

Table 5.21. Proposed techniques for each attack type
Attack type	Feature selection algorithm	Oversampling	Undersampling	Ensemble classifier
Worms	mBFE-IG	BL-SMOTE1+mBFE	ENN+mFFC	AdaBoost with DT
Shellcode	mBFE-GR	BL-SMOTE1+mFFC	NCR+mBFE	Mix Stacking
Backdoor	mBFE-CA	ADASYN+mBFE	NCR+mFFC	AdaBoost with DT
Analysis	mBFE-CA	ADASYN+mFFC	ENT+mBFE	Bagging with RT
Recce	mBFE-CA	CL-SMOTE+mBFE	ENN+mBFE	AdaBoost with DT
DoS	mBFE-CA	CL-SMOTE+mBFE	ENT+mFFC	Mix Stacking
Fuzzers	mBFE-IG	CL-SMOTE+mFFC	NCR+mFFC	AdaBoost with DT
Exploits	mBFE-CA	ADASYN	ENT+mFFC	Bagging with RT
Generic	mBFE-IG	BL-SMOTE2+mFFC	ENT+mFFC	AdaBoost with DT

To verify the results, the proposed hybrid classifier is used to classify the UNSW-NB15 dataset, which has 43 features and 82,332 records, of which 37,000 are labeled normal and 45,332 are labeled as attacks. The resulting confusion matrix is shown in Table 5.22, and the evaluation metrics for each attack type in the dataset are shown in Table 5.23.

Table 5.22. Confusion matrix of the proposed hybrid classifier
Columns: Worms, Shellcode, Backdoor, Analysis, Recce, DoS, Fuzzers, Exploits, Generic, Normal. Each row is an actual class. Several cells are missing from the source text, so the values of each row are listed in their original order without column alignment.
Worms	33 1 0 0 1 1
Shellcode	365 0 0 0 0
Backdoor	0 0 413 123 0 0 13 0 0 34
Analysis	192 336 17 125
Recce	14 114 201 101 2989 15 31 31
DoS	29 165 623 1593 59 1124 90 225 181
Fuzzers	14 637 492 479 4394 33 0 0
Exploits	137 196 827 782 224 38 59 8529 340
Generic	13 75 30 15 37 96 392 18143 63
Normal	172 219 0 0 669 0 0 35940

Table 5.23. Evaluation metrics of the proposed hybrid classifier
Attack	TP	FP	TN	FN	Accuracy	Sensitivity	Specificity	FPR	FNR
Worms	33	208	82080	11	0.9973	0.7500	0.9975	0.0025	0.2500
Shellcode	365	1367	80587	13	0.9832	0.9656	0.9833	0.0167	0.0344
Backdoor	413	2366	79383	170	0.9692	0.7084	0.9711	0.0289	0.2916
Analysis	336	3313	78342	341	0.9556	0.4963	0.9594	0.0406	0.5037
Recce	2989	299	78537	507	0.9902	0.8550	0.9962	0.0038	0.1450
DoS	1124	83	78160	2965	0.9630	0.2749	0.9989	0.0011	0.7251
Fuzzers	4394	966	75304	1668	0.9680	0.7248	0.9873	0.0127	0.2752
Exploits	8529	683	70517	2603	0.9601	0.7662	0.9904	0.0096	0.2338
Generic	18143	0	63461	728	0.9912	0.9614	1.0000	0.0000	0.0386
Normal	35940	781	44551	1060	0.9776	0.9714	0.9828	0.0172	0.0286

The classification results are compared with several recent studies on IDS that use the UNSW-NB15 dataset. Table 5.24 compares Accuracy with the results reported in [84], [130], and [131]. Table 5.25 compares Sensitivity with the results reported in [84] and [141]. In the original tables, cells shaded red mark the best result. The proposed hybrid classifier achieves better Accuracy than the other papers for the Recce, Exploits, and Generic attack types. In terms of Sensitivity, the proposed hybrid classifier is better than the other papers for most attack types, except Generic and Normal.

Table 5.24. Comparison of Accuracy with some recent studies
Attack	Proposed hybrid classifier	[84]	[130]	[131]
Worms	0.9973	0.9978	0.9992	0.9728
Shellcode	0.9832	0.9833	0.9940	0.9992
Backdoor	0.9692	0.9793	0.9911	0.9906
Analysis	0.9556	0.9930	0.9926	0.9944
Recce	0.9902	0.9618	0.9533	0.9874
DoS	0.9630	0.9571	0.9490	0.9814
Fuzzers	0.9680	0.9504	0.9147	0.9892
Exploits	0.9601	0.9358	0.9012	0.9391
Generic	0.9912	0.9870	0.9823	0.9834
Normal	0.9776	0.9459	0.9354	0.9816

Table 5.25. Comparison of Sensitivity with some recent studies
Attack	Proposed hybrid classifier	[84]
Worms	0.7500	0.1837
Shellcode	0.9656	0.3639
Backdoor	0.7084	0.6732
Analysis	0.4963	0.2045
Recce	0.8550	0.4604
DoS	0.2749	0.1429
Fuzzers	0.7248	0.6442
Exploits	0.7662	0.7622
Generic	0.9614	0.8137
Normal	0.9714	0.9739
[141] (row assignment not recoverable from the source text): 0.7170, 0.0500, 0.5464, 0.9672, 0.9800

CHAPTER 6. CONCLUSIONS AND FUTURE WORK

This chapter summarizes the results achieved and clarifies the problem and the way the thesis solves the issues it raises. The chapter also reviews the study of related scientific work and the method used to pose and solve the problem of the thesis. For each proposed technique and solution, it clarifies the problem description, method, experiments, strengths, novelty or continuous improvement, new points, limitations, and directions for future work. The chapter has two main sections: Section 6.1 evaluates the results achieved, the limitations, and future directions; Section 6.2 evaluates the academic and practical significance of the thesis.

6.1 Evaluation of results achieved, limitations, and future directions

The literature review of the thesis surveyed the work related to the research direction and the stated problem, giving an overall view of the research issue; studied the foundational techniques of the research issue; analyzed the strengths and weaknesses of related studies in order to define the problem and the approach to solving it; compared solutions that use the same approach in order to identify the advantages and limitations of each; and continuously updated the related studies over time, showing the continuous development of the research direction.

The research results of the thesis are reflected in the following important contributions:

- Feature selection: the thesis proposes the mFFC and mBFE algorithms, which improve on the known feature selection algorithms FFC and BFE. The improvements are: (1) FFC and BFE are combined with feature ranking to reduce computation time and cost, which is especially suitable for large datasets. As a result, the time complexity of the feature selection algorithm drops from $O(N!)$ to $O\left(\frac{N \times (N-1)}{2}\right)$. (2) The improved mFFC and mBFE algorithms take into account the correlation between features in the training dataset. This addresses the limitation of BFE and FFC on datasets whose features are not independent of one another. (3) The order in which features are added or removed follows the ranking of the features in the set. Features are ranked by their relevance to the class label. The thesis proposes three measures for ranking features: information gain, gain ratio, and the correlation coefficient between the feature and the class label.

- Resampling the training dataset: training datasets used for IDS are characteristically imbalanced, and records labeled as attacks (the minority class) make up a very small share compared with records labeled as normal (the majority class). It is therefore necessary to oversample the minority class and undersample the majority class. However, oversampling and undersampling techniques that rely on the k-nearest-neighbors algorithm use all features, including irrelevant or noisy ones, which reduces their effectiveness. The thesis proposes using the improved feature selection algorithms mFFC and mBFE described above to remove irrelevant or noisy features when computing distances in the k-nearest-neighbors search of the oversampling and undersampling techniques. The results show that the proposed solution improves the quality of oversampling and undersampling compared with the traditional techniques. However, the improved techniques add the computational cost of feature selection, with time complexity $O\left(\frac{N \times (N-1)}{2}\right)$.

- Ensemble techniques: the thesis proposes a hybrid solution for building classifiers for IDS that combines the mFFC and mBFE feature selection techniques, oversampling and undersampling techniques combined with mFFC and mBFE, and homogeneous and heterogeneous ensemble techniques. In the experiments, the UNSW-NB15 dataset is used for training, testing, and evaluation; this dataset contains many contemporary synthetic attack records and has not yet been used by many researchers. In addition, the thesis uses the F-Measure metric to evaluate the classification quality of the IDS in its experiments. This suits class-imbalanced training datasets and is important for making the evaluation more effective.

Besides the results achieved, the research leaves the following shortcomings, limitations, and directions for future work: (1) The training time of the hybrid classification model is still long, especially when heterogeneous ensemble classifiers are used in the model. Choosing the right combination of algorithms to build a hybrid, multi-label classification model that responds in real time needs further research. (2) The data processing and computing capacity of the system plays an important role in exploiting machine learning algorithms. Improving processing efficiency through parallel processing, as well as optimizing the parameters and hyperparameters of machine learning techniques, remains an open issue.

6.2 Evaluation of the academic and practical significance of the thesis

Academic significance: the research results of the thesis have scientific value. Experimental results on the UNSW-NB15 network security dataset show that, compared with existing studies by many researchers, the proposed solutions improve classification quality when building IDS. Specifically, the thesis: (1) proposes feature selection algorithms that improve on the known FFC and BFE feature selection algorithms; (2) improves the oversampling and undersampling techniques applied to the training dataset; (3) builds a hybrid classifier that combines the improved data preprocessing techniques above with techniques for building ensemble classifiers.

Practical significance: improving the classification quality of IDS is an urgent requirement today, as network traffic keeps growing and network attacks become increasingly diverse and complex, with serious consequences for many aspects of life, from the economy and society to security and national defense. The research results of the thesis are an important basis for helping network administrators in agencies and enterprises receive early warnings quickly and accurately, so that they can respond appropriately and reduce the consequences of network attacks.

LIST OF PUBLICATIONS RELATED TO THE THESIS

[CT1] Hoàng Ngọc Thanh, Trần Văn Lăng, Hoàng Tùng, "A machine learning approach to classifying attack types in network intrusion detection systems" (in Vietnamese), Proceedings of the 9th National Conference on Fundamental and Applied Information Technology Research (FAIR'9), pp. 502-508, DOI: 10.15625/vap.2016.00061, 2016.
[CT2] Hoàng Ngọc Thanh, Trần Văn Lăng, "Feature reduction using information gain to enhance the performance of network intrusion detection systems" (in Vietnamese), Proceedings of the 10th National Conference on Fundamental and Applied Information Technology Research (FAIR'10), pp. 823-831, DOI: 10.15625/vap.2017.00097, 2017.
[CT3] Hoàng Ngọc Thanh, Trần Văn Lăng, "An approach to reducing data dimensionality in building effective network intrusion detection systems" (in Vietnamese), 2nd Symposium: Selected Issues in Information Safety and Security, Ho Chi Minh City, 12/2017.
[CT4] Hoàng Ngọc Thanh, Trần Văn Lăng, "Generating firewall rules using decision-tree-based ensemble techniques" (in Vietnamese), Proceedings of the 11th National Conference on Fundamental and Applied Information Technology Research (FAIR'11), pp. 489-496, DOI: 10.15625/vap.2018.00064, 2018.
[CT5] Hoang Ngoc Thanh, Tran Van Lang, "An approach to reduce data dimension in building effective network intrusion detection systems," EAI Endorsed Transactions on Context-aware Systems and Applications, 6(18), 2019.
[CT6] Hoang Ngoc Thanh, Tran Van Lang, "Use the ensemble methods when detecting DoS attacks in network intrusion detection systems," EAI Endorsed Transactions on Context-aware Systems and Applications, 6(19), 11/2019.
[CT7] Hoang Ngoc Thanh, Tran Van Lang, "Evaluating effectiveness of ensemble classifiers when detecting Fuzzers attacks on the UNSW-NB15 dataset," Journal of Computer Science and Cybernetics, 36(2), pp. 173-185, DOI: 10.15625/1813-9663/36/2/14786, 2020.

REFERENCES

[1] S. M. Othman, F. M. Ba-Alwi, N. T. Alsohybe and A. Y. Al-Hashida, "Intrusion detection model using machine learning algorithm on Big Data environment," J. Big Data, vol. 5, no. 34, https://doi.org/10.1186/s40537-018-0145-4, 2018.
[2] A. Thakkar and R. Lohiya, "A survey on intrusion detection system: feature selection, model,
performance measures, application perspective, challenges, and future research directions," Artificial Intelligence Review, vol. 55, pp. 453-563, 2022.
[3] A. Khraisat, I. Gondal, P. Vamplew and J. Kamruzzaman, "Survey of intrusion detection systems: techniques, datasets and challenges," Cybersecurity, vol. 2, no. 1, pp. 1-22, 2019.
[4] H. I. Alsaadi, R. M. Almuttairi, O. Bayat and O. N. Ucani, "Computational intelligence algorithms to handle dimensionality reduction for enhancing intrusion detection system," J. Inf. Sci. Eng., vol. 36, no. 2, pp. 293-308, 2020.
[5] O. Almomani, "A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms," Symmetry (Basel), vol. 12, no. 6, pp. 1-20, 2020.
[6] M. S. Bonab, A. Ghaffari, F. S. Gharehchopogh and P. Alemi, "A wrapper-based feature selection for improving performance of intrusion detection systems," Int. J. Commun. Syst., vol. 33, no. 12, pp. 1-25, 2020.
[7] R. Roberto, R. José and A.-R. Jesús, "Heuristic Search over a Ranking for Feature Selection," Lecture Notes in Computer Science, vol. 3512, pp. 742-749, 2005.
[8] N. Junsomboon, "Combining Over-Sampling and Under-Sampling Techniques for Imbalance Dataset," in Proceedings of the 9th International Conference on Machine Learning and Computing, 2017.
[9] S. Bagui and K. Li, "Resampling imbalanced data for network intrusion detection datasets," Journal of Big Data, vol. 8, no. 6, 2021.
[10] H. Ahmed, A. Hameed and N. Bawany, "Network intrusion detection using oversampling technique and machine learning algorithms," PeerJ Computer Science, 8:e820, DOI: 10.7717/peerj-cs.820, 2022.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, pp. 321-357, 2002.
[12] F. Last, G. Douzas and F. Bação, "Oversampling for Imbalanced Learning Based on K-Means and SMOTE," CoRR abs/1711.00837, 2017.
[13] Y. Pristyanto, A. F. Nugraha, A. Dahlan, L. A. Wirasakti, A. A. Zein and I. Pratama, "Multiclass Imbalanced Handling using ADASYN Oversampling and Stacking Algorithm," in 2022 16th International Conference on Ubiquitous Information Management and Communication, doi: 10.1109/IMCOM53663.2022.9721632, 2022.
[14] A. Pathak, "Analysis of Different SMOTE based Algorithms on Imbalanced Datasets," International Research Journal of Engineering and Technology (IRJET), vol. 8, no. 8, pp. 4111-4114, 2021.
[15] T. Elhassan, M. Aljurf, F. Al-Mohanna and M. Shoukri, "Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-Sampling (RUS) as a Data Reduction Method," Journal of Informatics and Data Mining, vol. 1, 2016.
[16] D. Guan, W. Yuan, Y.-K. Lee and S. Lee, "Nearest neighbor editing aided by unlabeled data," Information Sciences, vol. 179, pp. 2273-2282, 2009.
[17] Y. Chali, H. Sadid and M. Mojahid, "Complex question answering: homogeneous or heterogeneous," in 19th International Conference on Application of Natural Language, Montpellier, France, 2014.
[18] Y. Seth, "Bootstrapping - A Powerful Resampling Method in Statistics," 12 2017. [Online]. Available: https://yashuseth.wordpress.com/2017/12/02/bootstrapping-a-resampling-method-in-statistics/ [Accessed 22 04 2022].
[19] I. Syarif, E. Zaluska, A. Prugel-Bennett and G. Wills, "Application of Bagging, Boosting and Stacking to Intrusion Detection," in International Workshop on Machine Learning and Data Mining in Pattern Recognition, 2012.
[20] N. Sadki, "Understand different types of Boosting Algorithms." [Online]. Available: https://iq.opengenus.org/types-of-boosting-algorithms/ [Accessed 22 04 2022].
[21] A. J. Malik, W. Shahzad and F. A. Khan, "Binary PSO and Random Forests algorithm for probe attacks detection in a network," in Evolutionary Computation (CEC), 2011 IEEE Congress on, IEEE, 2011.
[22] S. H. Kok, A. B. Abdullah, N. Z. Jhanjhi and M. Supramaniam, "A Review of Intrusion Detection System using Machine Learning Approach," International Journal of Engineering Research and Technology, vol. 12, no. 1, pp. 8-15, 2019.
[23] A. Özgür and H. Erdem, "A review of KDD99 dataset usage in intrusion detection and machine
learning between 2010 and 2015," PeerJ PrePrints, 2016.
[24] M. Tavallaee, E. Bagheri, W. Lu and A.-A. Ghorbani, "A detailed analysis of the KDDCup99 data set," in Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications 2009, 2009.
[25] E. K. Viegas, A. O. Santin and L. S. Oliveira, "Toward a reliable anomaly-based intrusion detection in real-world environments," Computer Networks, vol. 127, pp. 200-216, 2017.
[26] S. Aljawarneh, M. Aldwairi and M. B. Yassein, "Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model," Journal of Computational Science, vol. 25, pp. 152-160, 2018.
[27] I. Sharafaldin, A. H. Lashkari and A. A. Ghorbani, "Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization," in Proceeding of 4th International Conference on Information Systems Security and Privacy, 2018.
[28] N. Moustafa and J. Slay, "UNSW-NB15: A comprehensive data set for network intrusion detection systems," in Conference on Military Communications and Information Systems, 2015.
[29] F. Melo, "Area under the ROC Curve," in Encyclopedia of Systems Biology, Springer New York, 2013, pp. 38-39.
[30] M. Ring, S. Wunderlich, D. Scheuring, D. Landes and A. Hotho, "A Survey of Network-based Intrusion Detection Data Sets," Computers & Security, vol. 86, pp. 147-167, 2019.
[31] S. Raschka, "Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning," arXiv:1811.12808v2 [cs.LG], Dec. 2018.
[32] M. Bekkar, H. K. Djemaa and T. A. Alitouche, "Evaluation Measures for Models Assessment over Imbalanced Data Sets," Journal of Information Engineering and Applications, vol. 3, no. 10, pp. 27-38, 2013.
[33] J. Brownlee, "Machine Learning Mastery," 08 01 2020. [Online]. Available: https://machinelearningmastery.com/tour-of-evaluation-metrics-for-imbalanced-classification/ [Accessed 22 04 2022].
[34] M. Torabi, N. I. Udzir, M. T. Abdullah and R. Yaakob, "A Review on Feature Selection and Ensemble Techniques for Intrusion Detection System," International Journal of Advanced Computer Science and Applications, vol. 15, no. 5, pp. 538-553, 2021.
[35] Z. Liu, R. Wang, M. Tao and X. Cai, "A class-oriented feature selection approach for multi-class imbalanced network traffic datasets based on local and global metrics fusion," Neurocomputing, vol. 168, pp. 365-381, 2015.
[36] Y. Zhu, J. Liang, J. Chen and Z. Ming, "An improved NSGA-III algorithm for feature selection used in intrusion detection," Knowledge-Based Systems, vol. 116, pp. 74-85, 2017.
[37] V. B. Vaghela, K. H. Vandra and N. K. Modi, "Entropy Based Feature Selection For Multi-Relational Naïve Bayesian Classifier," Journal of International Technology and Information Management, vol. 23, no. 1, pp. 13-26, 2014.
[38] Z. Weidong, F. Jingyu and L. Yongmin, "Using Gini-Index for Feature Selection in Text Categorization," in 3rd International Conference on Information, Business and Education Technology, 2014.
[39] A. R. A. Yusof, N. I. Udzir, A. Selamat, H. Hamdan and M. T. Abdullah, "Adaptive feature selection for denial of services (DoS) attack," in 2017 IEEE Conference on Application, Information and Network Security (AINS), 2017.
[40] N. Sharma, P. Verlekar, R. Ashary and S. Zhiquan, "Regularization and feature selection for large dimensional data," Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control, arXiv:1712.01975, pp. 1-12, 2019.
[41] K. Chen, F. Y. Zhou and X. F. Yuan, "Hybrid particle swarm optimization with spiral-shaped mechanism for feature selection," Expert Systems with Applications, vol. 128, pp. 140-156, 2019.
[42] B. Ma and Y. Xia, "A tribe competition-based genetic algorithm for feature selection in pattern classification," Applied Soft Computing, vol. 58, pp. 328-338, 2017.
[43] T. Mehmod and H. B. M. Rais, "Ant colony optimization and feature selection for intrusion detection," in Advances in Machine Learning and Signal Processing, vol. 387, Springer International Publishing, 2016, pp. 305-312.
[44] F. Kuang, S. Zhang, Z. Jin and
W Xu, "A novel SVM by combining kernel principal component analysis and improved chaotic particle swarm optimization for intrusion detection," Soft Computing, vol 19, no 5, pp 1187-1199, 2015 [45] M R G Raman, N Somu, K Kirthivasan, R Liscano and V S S Sriram, "An efficient intrusion detection system based on hypergraph - Genetic algorithm for parameter optimization and feature selection in support vector machine," Knowledge-Based Systems, vol 134, pp 1-12, 2017 [46] M H Ali, B A D A Mohammed, A Ismail and M F Zolkipli, "A New Intrusion Detection System Based on Fast Learning Network and Particle Swarm Optimization," IEEE Access, vol 6, pp 20255-20261, 2018 [47] P T T Hồng and N T Thủy, "Đánh giá kỹ thuật lựa chọn đặc trưng cho toán phân loại biểu gen," Tạp chí khoa học nông nghiệp Việt Nam, vol 14, no 3, pp 461- 468, 2016 [48] J Leevy, T Khoshgoftaar, R Bauder and N Seliya, "A survey on addressing high-class imbalance in big data," Journal of Big Data, vol 5, no 1, 2018 [49] J Johnson and T Khoshgoftaar, "Survey on deep learning with class imbalance," Journal of Big Data, vol 6, no 1, 2019 [50] B Raghuwanshi and S Shukla, "SMOTE based class-specific extreme learning machine for imbalanced learning," Knowledge-Based Systems, vol 187, p 104814, 2020 [51] A Luque, A Carrasco and A Martín, "The impact of class imbalance in classification performance metrics based on the binary confusion matrix," Pattern Recognition, vol 91, pp 216-231, 2019 [52] G Douzas and F Bacao, "Effective data generation for imbalanced learning using conditional generative adversarial networks," Expert Systems with Applications, vol 91, pp 464-471, 2018 [53] G Douzas, F Bacao and F Last, "Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE," Information Sciences, vol 465, pp 1-20, 2018 [54] A Amin, S Anwar, A Adnan, M Nawaz, N Howard, J Quadir, A Havalah and A Hussain, "Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A 
Customer Churn Prediction Case Study," IEEE Access, vol 4, pp 7940-7957, 2016 [55] L Abdi and S Hashemi, "To Combat Multi-Class Imbalanced Problems by Means of Over-Sampling Techniques," IEEE Transactions on Knowledge and Data Engineering, vol 28, no 1, pp 238-251, 2016 [56] A Fernández, S Río, N Chawla and F Herrera, "An insight into imbalanced Big Data classification: outcomes and challenges," Complex & Intelligent Systems, vol 3, 2017 [57] M Basgall, W Hasperué, M Naiouf, A Fernández and F Herrera, "SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data," Journal of Computer Science and Technology, vol 18, p e23, 2018 [58] D Terzi and S Sagiroglu, "A new big data model using distributed cluster-based resampling for class-imbalance problem," Applied Computer Systems, vol 24, pp 104110, 2019 [59] P Gutiérrez, M Lastra, J Benítez and F Herrara, "SMOTE-GPU: Big Data preprocessing on commodity hardware for imbalanced classification," Progress in Artificial Intelligence, vol 6, 2017 [60] I Triguero, M Galar, D Merino, J Maillo, H Bustince and F Herrera, "Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark," in 2016 IEEE congress on evolutionary computation, Vancouver, 2016 [61] N T L Anh, "Thuật toán HMU toán phân lớp liệu mất cân bằng," Tạp chí khoa học và giáo dục, vol 02, no 42, pp 101-108, 2017 [62] A Verma and V Ranga, "Statistical analysis of CIDDS-001 dataset for Network Intrusion Detection Systems using Distance-based Machine Learning," Procedia Computer Science, vol 125, pp 709-716, 2018 [63] T Hamed, R Dara and S C Kremer, "Network intrusion detection system based on recursive feature addition and bigram technique," Computers & Security, vol 73, pp 137- 155, 2018 [64] C R Wang, R F Xu, S J Lee and C H Lee, "Network intrusion detection using equality constrained-optimization-based extreme learning machines," Knowledge-Based Systems, vol 147, pp 68-80, 2018 [65] G 
Fernandes, L F Carvalho, J J P C Rodrigues and M L Proenỗa, "Network anomaly detection using IP flows with Principal Component Analysis and Ant Colony Optimization," Journal of Network and Computer Applications, vol 64, pp 1-11, 2016 [66] A H Hamamoto, L F Carvalho, L D H Sampaio, T Abróo and M L Proenỗa, "Network Anomaly Detection System using Genetic Algorithm and Fuzzy Logic," Expert Systems with Applications, vol 92, pp 390-402, 2018 [67] W L Al-Yaseen, Z A Othman and M Z A Nazri, "Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system," Expert Systems with Applications, vol 67, pp 296-303, 2017 [68] I S Thaseen and C A Kumar, "Intrusion detection model using fusion of chi-square feature selection and multi class SVM," Journal of King Saud University Computer and Information Sciences, vol 29, pp 462-472, 2017 [69] R A R Ashfaq, X Z Wang, J Z Huang, H Abbas and Y L He, "Fuzziness based semi- supervised learning approach for intrusion detection system," Information Sciences, vol 378, pp 484-497, 2017 [70] U Ravale, N Marathe and P Padiya, "Feature selection based hybrid anomaly intrusion detection system using K Means and RBF kernel function," Procedia Computer Science, vol 45, pp 428-435, 2015 [71] V Hajisalem and S Babaie, "A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection," Computer Networks, vol 136, pp 37-50, 2018 [72] C Khammassi and S Krichen, "A GA-LR wrapper approach for feature selection in network intrusion detection," Computers & Security, vol 70, pp 255-277, 2017 [73] M R G Raman, N Somu, K Kirthivasan, R Liscano and V S S Sriram, "An efficient intrusion detection system based on hypergraph - Genetic algorithm for parameter optimization and feature selection in support vector machine," Knowledge-Based Systems, vol 134, pp 1-12, 2017 [74] S Shitharth and D P Winston, "An enhanced optimization based algorithm for intrusion detection in 
SCADA network," Computers & Security, vol 70, pp 16-26, 2017 [75] S M H Bamakan, H Wang, T Yingjie and Y Shi, "An effective intrusion detection framework based on MCLP/SVM optimized by time-varying chaos particle swarm optimization," Neurocomputing, vol 199, pp 90-102, 2016 [76] H Wang, J Gu and S Wang, "An effective intrusion detection framework based on SVM with feature augmentation," Knowledge-Based Systems, vol 136, pp 130-139, 2017 [77] S Roshan, Y Miche, A Akusok and A Lendasse, "Adaptive and online network intrusion detection system using clustering and Extreme Learning Machines," Journal of The Franklin Institute, vol 355, pp 1752-1779, 2018 [78] C Guo, Y Ping, N Liu and S S Luo, "A two-level hybrid approach for intrusion detection," Neurocomputing, vol 214, pp 391-400, 2016 [79] S Y Ji, B K Jeong, S Choi and D H Jeong, "A multi-level intrusion detection method for abnormal network behaviors," Journal of Network and Computer Applications, vol 62, pp 9-17, 2016 [80] A A Aburomman and M B I Reaz, "A novel weighted support vector machines multiclass classifier based on differential evolution for intrusion detection systems," Information Sciences, vol 414, pp 225-246, 2017 [81] A S Amira, S E O Hanafi and A E Hassanien, "Comparison of classification techniques applied for network intrusion detection and classification," Journal of Applied Logic, vol 24, pp 109-118, 2017 [82] M Mazini, B Shirazi and I Mahdavi, "Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms," Journal of King Saud University Computer and Information Sciences, 2018 [83] W C Lin, S W Ke and C F Tsai, "CANN: An intrusion detection system based on combining cluster centers and nearest neighbors," Knowledge-Based Systems, vol 78, pp 13-21, 2015 [84] D Papamartzivanos, F G Mármol and G Kambourakis, "Dendron: Genetic trees driven rule induction for network intrusion detection systems," Future Generation Computer Systems, vol 79, pp 
558-574, 2018 [85] G Folino, C Pizzuti and G Spezzano, "An ensemble-based evolutionary framework for coping with distributed intrusion detection," Genetic Programming and Evolvable Machines, vol 11, pp 131-146, 2010 [86] M Gudadhe, P Prasad and K Wankhade, "A new data mining based network intrusion detection model," in Computer and Communication Technology (ICCCT), International Conference on, IEEE, 2010 [87] P P Angelov and X Zhou, "Evolving fuzzy-rule-based classifiers from data streams," Fuzzy Systems, IEEE Transactions on, vol 16, pp 1462-1475, 2008 [88] E Bahri, N Harbi and H N Huu, "Approach based ensemble methods for better and faster intrusion detection," Computational Intelligence in Security for Information Systems, Springer, pp 17-24, 2011 [89] D Gaikwad and R C Thool, "Intrusion detection system using bagging with partial decision treebase classifier," Procedia Computer Science, vol 49, pp 92-98, 2015 [90] L Lin, R Zuo, S Yang and Z Zhang, "SVM ensemble for anomaly detection based on rotation forest," in Intelligent Control and Information Processing (ICICIP), 2012 Third International Conference on, IEEE, 2012 [91] G Kumar and K Kumar, "Design of an evolutionary approach for intrusion detection," The Scientific World Journal, 2013 [92] V Bukhtoyarov and V Zhukov, "Ensemble-distributed approach in classification problem solution for intrusion detection systems," in Intelligent Data Engineering and Automated Learning-IDEAL 2014, Springer, 2014 [93] S Masarat, H Taheri and S Sharifian, "A novel framework, based on fuzzy ensemble of classifiers for intrusion detection systems," in Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on, IEEE, 2014 [94] S Mukkamala, A H Sung and A Abraham, "Intrusion detection using an ensemble of intelligent paradigms," Journal of network and computer applications, vol 28, pp 167182, 2005 [95] M Govindarajan and R Chandrasekaran, "Intrusion detection using an ensemble of classification methods," in 
World Congress on Engineering and Computer Science, 2012 [96] Y Meng and L F Kwok, "Enhancing false alarm reduction using voted ensemble selection in intrusion detection," International Journal of Computational Intelligence Systems, vol 6, pp 626-638, 2013 [97] N F Haq, A R Onik and F M Shah, "An ensemble framework of anomaly detection using hybridized feature selection approach (HFSA)," in SAI Intelligent Systems Conference (IntelliSys), 2015 [98] Y Gu, B Zhou and J Zhao, "PCA-ICA ensembled intrusion detection system by paretooptimal optimization," The Journal of Information Technology, vol 7, pp 510-515, 2008 [99] A P F Chan, W W Y Ng, D S Yeung and E C C Tsang, "Comparison of different fusion approaches for network intrusion detection using ensemble of RBFNN," in 2005 International Conference on Machine Learning and Cybernetics, 2005 [100] A Borji, Berlin and Heidelberg, "Combining Heterogeneous Classifiers for Network Intrusion Detection," in Advances in Computer Science - ASIAN 2007, 2007 [101] B A Tama and K H Rhee, "A combination of PSO-based feature selection and treebased classifiers ensemble for intrusion detection systems," Advances in Computer Science and Ubiquitous Computing, Springer, pp 489-495, 2015 [102] J Kim, H L T Thu and H Kim, "Long Short Term Memory Recurrent Neural Network Classifier for Intrusion Detection," in International Conference on Platform Technology and Service, 2016 [103] B Abolhasanzadeh, "Nonlinear dimensionality reduction for intrusion detection using auto-encoder bottleneck features," in 7th Conference on Information and Knowledge Technology, 2015 [104] U Fiore, F Palmieri, A Castiglione and A D Santis, "Network anomaly detection with the restricted Boltzmann machine," Neurocomputing, vol 122, pp 13-23, 2013 [105] N Gao, L Gao, Q Gao and H Wang, "An Intrusion Detection Model Based on Deep Belief Networks," in Second International Conference on Advanced Cloud and Big Data, 2014 [106] M Z Alom, V Bontupalli and T M Taha, 
"Intrusion detection using deep belief networks," in National Aerospace and Electronics Conference, 2015 [107] H Hota and A K Shrivas, "Data mining approach for developing various models based on types of attack and feature selection as intrusion detection systems (IDS)," Intelligent Computing, Networking, and Informatics, Springer, pp 845-851, 2014 [108] M S Pervez and D M Farid, "Feature selection and intrusion classification in NSLKDDCup99 dataset employing SVMs," in Software, Knowledge, Information Management and Applications (SKIMA), 2014 8th International Conference on, 2014 [109] A C Enache and V V Patriciu, "Intrusions detection based on support vector machine optimized with swarm intelligence," in Applied Computational Intelligence and Informatics (SACI), 2014 IEEE 9th International Symposium on, 2014 [110] N Jankowski and K Gra˛bczewski, "Heterogenous committees with competence analysis," in Hybrid Intelligent Systems HIS’05 Fifth International Conference on, IEEE, 2005 [111] Y Chen and Y Zhao, "A novel ensemble of classifiers for microarray data classification," Applied soft computing, vol 8, pp 1664-1669, 2008 [112] A Eleyan, H Özkaramanli and H Demirel, "Weighted majority voting for face recognition from low resolution video sequences," in Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009 ICSCCW 2009 Fifth International Conference on, IEEE, 2009 [113] J Richiardi and A Drygajlo, "Reliability-based voting schemes using modalityindependent features in multi-classifier biometric authentication," Multiple Classifier Systems, Springer, pp 377-386, 2007 [114] A Kausar, M Ishtiaq, M A Jaffar and A M Mirza, "Optimization of ensemble based decision using PSO," in Proceedings of the World Congress on Engineering, WCE, 2010 [115] M A Tahir, J Kittler and A Bouridane, "Multilabel classification using heterogeneous ensemble of multi-label classifiers," Pattern Recognition Letters, vol 33, pp 513-523, 2012 [116] X 
Zhang, P Wang, L Du and H Liu, "New method for radar HRRP recognition and rejection based on weighted majority voting combination of multiple classifiers," in Signal Processing, Communications and Computing (ICSPCC), 2011 IEEE International Conference on, IEEE, 2011 [117] S Gu and Y Jin, "Heterogeneous classifier ensembles for EEG-based motor imaginary detection," in 2012 12th UK Workshop on Computational Intelligence (UKCI), IEEE, 2012 [118] L I Kuncheva and J J Rodríguez, "A weighted voting framework for classifiers ensembles," Knowledge and Information Systems, vol 38, pp 259-275, 2014 [119] H Toman, L Kovacs, A Jonas, L Hajdu and A Hajdu, "Generalized weighted majority voting with an application to algorithms having spatial output," in International Conference on Hybrid Artificial Intelligence Systems, Springer, 2012 [120] F Ye, Z Zhang, K Chakrabarty and X Gu, "Board-level functional fault diagnosis using artificial neural networks, support-vector machines, and weighted-majority voting," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol 32, pp 723-736, 2013 [121] J Cheng and L Chen, "A weighted regional voting based ensemble of multiple classifiers for face recognition," in International Symposium on Visual Computing, Springer, 2014 [122] K Remya and J Ramya, "Using weighted majority voting classifier combination for relation classification in biomedical texts," in Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2014 International Conference on, IEEE, 2014 [123] H F Eid, A Darwish, A E Hassanien and T.-H Kim, "Intelligent hybrid anomaly network intrusion detection system," Communication and Networking, Springer, pp 209218, 2011 [124] E D Hoz, E Hoz, A Ortiz, J Ortega and A Martínez-Álvarez, "Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps," Knowledge-Based Systems, vol 71, pp 322-338, 2014 [125] S Rastegari, P 
Hingston and C.-P Lam, "Evolving statistical rulesets for network intrusion detection," Applied Soft Computing, vol 33, pp 348-359, 2015 [126] R Singh, H Kumar and R Singla, "An intrusion detection system using network traffic profiling and online sequential extreme learning machine," Expert Systems with Applications, vol 42, pp 8609-8624, 2015 [127] N K Kanakarajan and K Muniasamy, "Improving the accuracy of intrusion detection using gar-forest with feature selection," in Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015, Springer, 2016 [128] A E Hassanien, T.-H Kim, J Kacprzyk and A I Awad, "Bio-inspiring Cyber Security and Cloud Services: Trends and Innovations," Springer, vol 70, 2014 [129] H H Pajouh, G Dastghaibyfard and S Hashemi, "Two-tier network anomaly detection model: a machine learning approach," Journal of Intelligent Information Systems, pp 114, 2015 [130] J Sharma, C Giri, O.-C Granmo and M Goodwin, "Multi-layer intrusion detection system with ExtraTrees feature selection, extreme learning machine ensemble, and softmax aggregation," EURASIP Journal on Information Security, vol 2019, pp 1-15, 2019 [131] S Moualla, K Khorzom and A Jafar, "Improving the Performance of Machine Learning- Based Network Intrusion Detection Systems on the UNSW-NB15 Dataset," Computational Intelligence and Neuroscience, vol 2021, pp 1-13, 2021 [132] V V Cảnh, "Phát xâm nhập mạng sử dụng kỹ thuật học máy," Tạp chí nghiên cứu khoa học và công nghệ quân sự, pp 105-120, 05-2017 [133] P V Huong, L D Thuan, L T H Van and D V Hung, "Intrusion Detection in IoT Systems Based on Deep Learning Using Convolutional Neural Network," in 6th NAFOSTED Conference on Information and Computer Science (NICS), Ha Noi - Viet Nam, 2019 [134] L H Hiep, L X Hieu, H T Tuyen and D T Quy, "Studying a solution for early detection of DDoS attacks based on machine learning algorithms," vol 227, no 11, p 137 – 144, 2022 [135] T 
M Tuấn, P H Hảo and T T Nam, "Hệ thống phát xâm nhập hai tầng cho mạng IoT sử dụng máy học," Tạp chí Khoa học Trường Đại học Cần Thơ, vol 58, no 2, pp 43-50, 2022 [136] C Sergio, D S Javier, L Ibai, O Ignacio, S Javier, J S Javier and I T Ana, "Chapter - Big Data in Road Transport and Mobility Research," in Intelligent Vehicles, Butterworth- Heinemann, 2018, pp 175-205 [137] Kotthoff, Lars, C Thornton, H H Hoos, F Hutter and K Leyton-Brown, "Auto-WEKA: Automatic model selection and hyperparameter optimization in WEKA," Automated Machine Learning - Springer, pp 81-95, 2019 [138] M Artur, "Review the performance of the Bernoulli Naïve Bayes Classifier in Intrusion Detection Systems using Recursive Feature Elimination with Cross-validated selection of the best number of features," in The 2020 Annual International Conference on BrainInspired Cognitive Architectures for Artificial Intelligence: Eleventh Annual Meeting of the BICA Society, 2021 [139] W Lian, G Nie, B Jia, D Shi, Q Fan and Y Liang, "An Intrusion Detection Method Based on Decision Tree-Recursive Feature Elimination in Ensemble Learning," Mathematical Problems in Engineering, vol 2020, pp 1-15, 2020 [140] B Neal, S Mittal, A Baratin, V Tantia, M Scicluna, S Lacoste-Julien and I Mitliagkas, "A modern take on the bias-variance tradeoﬀ in neural networks," ArXiv, vol abs/1810.08591, 2018 [141] V Kumar, D Sinha, A K Das, S C Pandey and R T Goswami, "An integrated rule based intrusion detection system: analysis on UNSW-NB15 data set and the real time online dataset," Cluster Computing, vol 23, p 1397–1418, 2019 ... GIÁO DỤC VÀ ĐÀO TẠO TRƯỜNG ĐẠI HỌC LẠC HỒNG HOÀNG NGỌC THANH KỸ THUẬT HỌC MÁY PHỐI HỢP VÀ TIỀN XỬ LÝ DỮ LIỆU TRONG VIỆC NÂNG CAO CHẤT LƯỢNG PHÂN LỚP CỦA CÁC HỆ THỐNG PHÁT HIỆN XÂM NHẬP MẠNG... ngành Khoa học máy tính, khóa 2015, Trường đại học Lạc Hồng Tơi xin cam đoan luận án tiến sĩ ? 
?Kỹ thuật học máy phối hợp tiền xử lý liệu việc nâng cao chất lượng phân lớp hệ thống phát xâm nhập mạng”... thức việc cải thiện độ xác, giảm tỷ lệ cảnh báo sai phát tấn công mới Nội dung luận án đề xuất số giải pháp sử dụng kỹ thuật học máy phối hợp cải tiến kỹ thuật tiền xử lý liệu việc nâng cao chất

Date posted: 02/12/2022, 19:05

Nguồn tham khảo

Tài liệu tham khảo

Loại

Chi tiết

[1] S. M. Othman, F. M. Ba-Alwi, N. T. Alsohybe and A. Y. Al-Hashida, "Intrusion detection model using machine learning algorithm on Big Data environment," J Big Data, vol. 5, no. 34 https://doi.org/10.1186/s40537-018-0145-4, 2018

Sách, tạp chí

Tiêu đề:	Intrusiondetection model using machine learning algorithm on Big Data environment

[2] A. Thakkar and R. Lohiya, "A survey on intrusion detection system: feature selection, model, performance measures, application perspective, challenges, and future research directions," Artificial Intelligence Review, vol. 55, p. 453–563, 2022

Sách, tạp chí

Tiêu đề:	A survey on intrusion detection system: feature selection,model, performance measures, application perspective, challenges, and future researchdirections

[3] A. Khraisat, I. Gondal, P. Vamplew and J. Kamruzzaman, "Survey of intrusion detection systems: techniques, datasets and challenges," Cybersecurity, vol. 2, no. 1, pp. 1-22, 2019

Sách, tạp chí

Tiêu đề:	Survey of intrusion detection systems: techniques, datasets and challenges

[4] H. I. Alsaadi, R. M. Almuttairi, O. Bayat and a. O. N. Ucani, "Computational intelligence algorithms to handle dimensionality reduction for enhancing intrusion detection system," J. Inf. Sci. Eng., vol. 36, no. 2, pp. 293-308, 2020

Sách, tạp chí

Tiêu đề:	Computationalintelligence algorithms to handle dimensionality reduction for enhancing intrusiondetection system

[5] O. Almomani, "A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms," Symmetry (Basel), vol. 12, no. 6, pp. 1-20, 2020

Sách, tạp chí

Tiêu đề:	A feature selection model for network intrusion detection system based on PSO, GWO, FFA and GA algorithms

[6] M. S. Bonab, A. Ghaffari, F. S. Gharehchopogh and P. Alemi, "A wrapper-based feature selection for improving performance of intrusion detection systems," Int. J. Commun.Syst., vol. 33, no. 12, pp. 1-25, 2020

Sách, tạp chí

Tiêu đề:	A wrapper-based featureselection for improving performance of intrusion detection systems

[7] R. Roberto, R. José and A.-R. Jesús, "Heuristic Search over a Ranking for Feature Selection," Lecture Notes in Computer Science, vol. 3512, pp. 742-749, 2005

Sách, tạp chí

Tiêu đề:	Heuristic Search over a Ranking for Feature Selection

[8] N. Junsomboon, "Combining Over-Sampling and Under-Sampling Techniques for Imbalance Dataset," in Proceedings of the 9th International Conference on Machine Learning and Computing, 2017

Sách, tạp chí

Tiêu đề:	Combining Over-Sampling and Under-Sampling Techniques forImbalance Dataset

[9] S. Bagui and K. Li, "Resampling imbalanced data for network intrusion detection datasets," Journal of Big Data, vol. 8, no. 6, 2021

Sách, tạp chí

Tiêu đề:	Resampling imbalanced data for network intrusion detection datasets

[10] H. Ahmed, A. Hameed and N. Bawany, "Network intrusion detection using oversampling technique and machine learning algorithms," PeerJ Computer Science 8:e820 DOI 10.7717/peerj-cs.820, 2022

Sách, tạp chí

Tiêu đề:	Network intrusion detection usingoversampling technique and machine learning algorithms

[11] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, p. 321–357, 2002

Sách, tạp chí

Tiêu đề:	SMOTE: SyntheticMinority Over-sampling Technique

[12] F. Last, G. Douzas and F. Baỗóo, "Oversampling for Imbalanced Learning Based on K- Means and SMOTE," CoRR abs/1711.00837, 2017

Sách, tạp chí

Tiêu đề:	Oversampling for Imbalanced Learning Based on K- Means and SMOTE

[13] Y. Pristyanto, A. F. Nugraha, A. Dahlan, L. A. Wirasakti, A. A. Zein and I. Pratama,"Multiclass Imbalanced Handling using ADASYN Oversampling and Stacking Algorithm," 2022," in 2022 16th International Conference on Ubiquitous Information Management and Communication, doi: 10.1109/IMCOM53663.2022.9721632, 2022

Sách, tạp chí

Tiêu đề:	Multiclass Imbalanced Handling using ADASYN Oversampling and StackingAlgorithm," 2022

[14] A. Pathak, "Analysis of Different SMOTE based Algorithms on Imbalanced Datasets,"International Research Journal of Engineering and Technology (IRJET), vol. 8, no. 8, pp. 4111-4114, 2021

Sách, tạp chí

Tiêu đề:	Analysis of Different SMOTE based Algorithms on Imbalanced Datasets

[15] T. Elhassan, M. Aljurf, F. Al-Mohanna and M. Shoukri, "Classification of Imbalance Data using Tomek Link (T-Link) Combined with Random Under-Sampling (RUS) as a Data Reduction Method," Journal of Informatics and Data Mining, vol. 1, 2016

Sách, tạp chí

Tiêu đề:	Classification of ImbalanceData using Tomek Link (T-Link) Combined with Random Under-Sampling (RUS) as aData Reduction Method

[16] D. Guan, W. Yuan, Y.-K. Lee and S. Lee, "Nearest neighbor editing aided by unlabeled data," Information Sciences, vol. 179, pp. 2273-2282, 2009

Sách, tạp chí

Tiêu đề:	Nearest neighbor editing aided by unlabeled data

[17] Y. Chali, H. Sadid and M. Mojahid, "Complex question answering: homogeneous or heterogeneous," in 19th International Conference on Application of Natural Language, Montpellier, France, 2014

Sách, tạp chí

Tiêu đề:	Complex question answering: homogeneous orheterogeneous

[18] Y. Seth, "Bootstrapping - A Powerful Resampling Method in Statistics," 2 12 2017.[Online]. Available: https://yashuseth.wordpress.com/2017/12/02/bootstrapping-a-resampling-method-in-statistics/. [Accessed 22 04 2022]

Sách, tạp chí

Tiêu đề:	Bootstrapping - A Powerful Resampling Method in Statistics

[19] I. Syarif, E. Zaluska, A. Prugel-Bennett and G. Wills, "Application of Bagging, Boosting and Stacking to Intrusion Detection," in International Workshop on Machine Learning and Data Mining in Pattern Recognition, 2012

Sách, tạp chí

Tiêu đề:	Application of Bagging, Boostingand Stacking to Intrusion Detection

[20] N. Sadki, "Understand different types of Boosting Algorithms," [Online]. Available: https://iq.opengenus.org/types-of-boosting-algorithms/. [Accessed 22 04 2022]

Sách, tạp chí

Tiêu đề:	Understand different types of Boosting Algorithms