Research on building a V-Sandbox system for the analysis and detection of IoT Botnet malware.
MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY

LE HAI VIET

RESEARCH ON BUILDING A V-SANDBOX SYSTEM FOR ANALYSIS AND DETECTION OF IOT BOTNETS

Major: Information System
Major code: 48 01 04

SUMMARY OF COMPUTER DOCTORAL THESIS

Ha Noi – 2022

The thesis has been completed at: Graduate University of Science and Technology - Vietnam Academy of Science and Technology
Supervisor 1: Dr. Ngo Quoc Dung
Supervisor 2: Prof. Dr. Vu Duc Thi
Reviewer 1: …
Reviewer 2: …
Reviewer 3: …
The thesis shall be defended in front of the Thesis Committee at Vietnam Academy of Science and Technology - Graduate University of Science and Technology, at … hour …, date … month … year 2022.
This thesis can be found at:
- The National Library of Vietnam
- The Library of the Graduate University of Science and Technology

INTRODUCTION
The urgency of the thesis
By exploiting critical security vulnerabilities in increasingly common IoT devices, attackers have launched large-scale denial of service attacks. IoT botnet malware differs from traditional botnets in characteristics such as its propagation method and attack efficiency. In this situation, researching solutions to detect IoT Botnet malware on resource-constrained IoT devices is an urgent requirement.
Research objectives of the thesis
Research and build a system, based on dynamic analysis, that collects behavioral data and detects malware with machine learning models, in order to improve accuracy and reduce complexity in IoT Botnet malware detection on resource-limited IoT devices.
The main contents of the thesis
- Survey the characteristics of resource-limited IoT devices and, from there, choose a method to detect the IoT Botnet malware that appears on this type of device.
- Research and build a sandbox environment that ensures the conditions for fully collecting behavioral data from IoT Botnet malware.
- Propose the DSCG (Directed System Call Graph), a method that efficiently extracts features for IoT botnet detection. The proposed method has low complexity but still ensures high accuracy in IoT Botnet detection.
- Research and propose a machine learning model that combines suitable features for the early detection of IoT botnets.
- Assess the proposed features and machine learning model for accuracy and efficiency in IoT Botnet malware detection on a sufficiently large and reliable dataset, and compare them with related studies to highlight the scientific contribution of the thesis.

CHAPTER 1. OVERVIEW OF IOT DEVICES AND IOT BOTNETS
1.1 Overview of IoT devices
1.1.1 The definition of IoT devices
Definition 1.1. IoT devices are devices capable of connecting and sharing data and resources based on existing and evolving interoperable information and communication technologies, reacting autonomously to changes in the environment to achieve a certain goal.
1.1.2 The classification of IoT devices
IoT devices are divided into two main categories: resource-constrained devices and high-capacity devices [9]. The thesis uses the following concept of resource-constrained IoT devices:
Definition 1.2. Resource-constrained IoT devices are IoT devices whose resources (such as data processing capacity, memory capacity, and data transmission bandwidth) are limited.
1.1.3 Security issues on resource-constrained IoT devices
Because of security problems stemming from the resource-limited nature of IoT devices, the use of botnets for denial of service attacks is becoming increasingly common and causes increasingly severe consequences [19]. Given its characteristics, IoT Botnet malware requires new detection and prevention mechanisms. Therefore, the Ph.D student chose IoT Botnets as the subject of study.
1.2 Overview of IoT Botnets
1.2.1 The definition of IoT Botnets
Definition 1.3. An IoT botnet is malware capable of infiltrating and infecting resource-limited IoT devices in order to build a botnet.
1.2.2 Characteristics of IoT Botnets
Table 1.1. Comparison of the characteristics of traditional botnets and IoT botnets

| Characteristics | Traditional botnets | IoT botnets |
|---|---|---|
| Processor architecture, operating system | x86, x64 (Intel, AMD); Windows OS | MIPS, ARM, SPARC, PowerPC, …; Linux kernel 2.6/3.2 |
| Source code obfuscation technique | Uses complex techniques | Less use of complicated techniques |
| Aim | DDoS, spam, crypto mining, … | DDoS |
| Detectability | Relatively easy to detect | Hard to detect |
| Storage location | HDD, SSD, Flash, … | RAM |
| Prevents other malware | No | Yes |

1.3 IoT Botnet detection procedure
1.3.1 Overview
The two main methods for detecting IoT botnets are static analysis and dynamic analysis. Dynamic analysis detects malicious behavior by monitoring, collecting, and classifying the sample's interactions with the target environment. Because it observes behavior at run time, dynamic analysis can overcome the code obfuscation techniques commonly encountered in static analysis. However, the challenge in performing dynamic analysis is building an environment that allows the malware to fully expose its behaviors and that can fully monitor those behaviors. In addition, analyzing and detecting malicious behavior in the large amount of collected behavioral data is also a challenge. To achieve the goal of the thesis, the Ph.D student chooses the direction of dynamic analysis and proposes a plan to overcome the weaknesses of this direction.
1.3.2 Data collection
According to the survey results, the main groups of dynamic data collected by monitoring the execution environment are network flows [32–35], system calls [36, 37], and interactions with device resources [38]. This execution environment can be a real environment built on actual hardware [43, 44] or an IoT sandbox [43, 46–48]. The thesis points out the main limitations of these IoT sandboxes: the dynamic data sources are not fully collected, and the sandbox environments are not yet capable of executing IoT Botnet malware. Therefore, the Ph.D student builds an effective IoT sandbox that overcomes these disadvantages in Chapter 2 of the thesis.
1.3.3 Data preprocessing
1.3.3.1 Network data preprocessing
Network data preprocessing methods are usually based on the frequency or sequence characteristics of network data, extracted into feature tables such as KDD99 [55], NSL-KDD [56], UNSW-NB15 [41], CSE-CIC-IDS2018 [57], and N-BaIoT [58]. Many published studies [65–68] have demonstrated the effectiveness of the CSE-CIC-IDS2018 feature set in particular, which has 80 features. Therefore, this method is selected for preprocessing network flow data in Chapter 4 of the thesis.
1.3.3.2 System call data preprocessing
System call data preprocessing follows two main trends: applying processing methods that treat system calls as discrete features [75], and applying processing methods that preserve their sequential nature when training the classification model. Based on the survey results, the Ph.D student has chosen to treat system calls as sequential data to avoid losing the important information carried by the order of the calls. Therefore, to increase the efficiency of IoT Botnet malware detection, in Chapter 3 the researcher proposes a directed system call graph feature that has low complexity and is easy to apply with simple machine learning algorithms.
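To illustrate what an ordered representation keeps and a bag of discrete system calls loses, the following Python sketch turns a trace into a directed graph in the spirit of the DSCG defined in Chapter 3. It is a minimal sketch under stated assumptions, not Algorithm 3.1: it assumes a strace-style text trace (the format V-Sandbox actually emits is not specified here), treats each distinct (name, arguments) pair as one vertex, and assumes an edge links each call to the call that immediately follows it. The function names `simplify_trace`, `build_dscg`, and `out_degree_histogram` are hypothetical, the regular expression is a simplification, and the degree histogram is only a placeholder for the graph embeddings (FEATHER, LDP, Graph2vec) used in the thesis.

```python
import re
from collections import Counter

# Hypothetical trace format: one strace-style line per call, e.g.
#   101 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
# The optional leading PID and the return value are treated as redundant.
CALL_RE = re.compile(r"^(?:\d+\s+)?(\w+)\((.*)\)\s*=\s*.*$")


def simplify_trace(lines):
    """Reduce raw trace lines to an ordered list of (name, args) tuples."""
    calls = []
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match:  # lines that are not completed calls (e.g. "+++ exited ... +++") are skipped
            calls.append((match.group(1), match.group(2)))
    return calls


def build_dscg(calls):
    """Build a directed system call graph from an ordered call sequence.

    Each distinct (name, args) pair becomes one vertex; an edge (u, v) is
    added when call v directly follows call u (an assumption of this sketch).
    Edges are kept in a set, so repeated transitions and self-loops count once.
    """
    vertex_id = {}
    edges = set()
    prev = None
    for call in calls:
        current = vertex_id.setdefault(call, len(vertex_id))
        if prev is not None:
            edges.add((prev, current))
        prev = current
    return vertex_id, edges


def out_degree_histogram(num_vertices, edges, bins=8):
    """Hypothetical fixed-length graph vector (a simplified stand-in, not
    FEATHER, LDP or Graph2vec): number of vertices per out-degree, with
    degrees >= bins - 1 merged into the last bin."""
    out_degree = Counter(src for src, _ in edges)
    hist = [0] * bins
    for v in range(num_vertices):
        hist[min(out_degree[v], bins - 1)] += 1
    return hist


if __name__ == "__main__":
    trace = [
        "101 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3",
        "101 connect(3, {sa_family=AF_INET}, 16) = 0",
        '101 send(3, "PING", 4, 0) = 4',
        '101 send(3, "PING", 4, 0) = 4',
        "101 close(3) = 0",
    ]
    vertices, edges = build_dscg(simplify_trace(trace))
    print(len(vertices), sorted(edges))  # 4 [(0, 1), (1, 2), (2, 2), (2, 3)]
    print(out_degree_histogram(len(vertices), edges))  # [1, 2, 1, 0, 0, 0, 0, 0]
```

On this five-line example the two identical `send` calls collapse into one vertex with a self-loop, so the graph has 4 vertices and the edges (0, 1), (1, 2), (2, 2), (2, 3). A bag-of-calls count would record the same four call types but not the socket, connect, send, close ordering that the graph preserves.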
1.3.3.3 Preprocessing of device resource interaction data
Collecting network data and system calls is difficult on small IoT devices. Therefore, many researchers [38, 44, 45] have proposed using data on interactions with system resources to summarize the impact of IoT Botnet malware on the target. Such resource interaction data has been shown to be capable of detecting IoT Botnet malware. This is also the approach the Ph.D student takes to the problem of combining features for the early detection of IoT Botnet malware, presented in Chapter 4 of the thesis.
1.3.4 Malware analysis and detection
1.3.4.1 Using machine learning to detect IoT botnets
Research results [38, 45, 76–78] show that the popular machine learning algorithms used include K-Nearest Neighbors, Support Vector Machines, Decision Trees, and Random Forest. The advantage of these models is that they require few resources and execute quickly. However, they are often less accurate, with a high false negative rate.
1.3.4.2 Using deep learning to detect IoT botnets
Deep learning networks commonly used in IoT botnet malware detection include CNN [44], RNN, deep autoencoders [58], and DNN [80]. Although deep learning networks achieve high accuracy, their high computational complexity makes them difficult to deploy in real time. Therefore, the research problem is to build an IoT Botnet malware detection model that requires fewer resources and executes quickly while still ensuring accuracy.
1.4 Conclusion
In Chapter 1, the Ph.D student presented an overview of IoT devices, IoT Botnet malware, and the IoT Botnet malware detection process. Accordingly, the issues that must be solved to achieve the research goal of the thesis are: building an IoT sandbox environment that allows IoT Botnet malware to execute its full life cycle, can fully collect the malware's operating data, and has a higher execution success rate than
other tools on the same dataset; a low-complexity system call data preprocessing method that is easy to apply with simple machine learning algorithms; and a machine learning model that combines features from multiple data sources for the early detection of IoT Botnet malware. These three issues are addressed in turn in the following chapters. The results of the survey, analysis, evaluation, and experiments in Chapter 1 have been published in the following journals and conference proceedings:
- “Building a model for detecting malicious code on routers by an agent”, Proceedings of the 20th National Conference: The Vietnam conference of selected Information and Communications Technology, 2017
- “Building a model for collecting and detecting network attacks using IoT devices”, Proceedings of the 2nd National Conference: Symposium on Information Security (SoIS), 2017
- “Building a network intrusion detection system for civil IoT devices in smart homes”, Proceedings of the 21st National Conference: The Vietnam conference of selected Information and Communications Technology, 2018

CHAPTER 2. BUILDING A SANDBOX ENVIRONMENT THAT EFFECTIVELY COLLECTS DATA FROM IOT BOTNETS
2.1 Problem statement
The research problem of this chapter is to build a sandbox environment that fully simulates the conditions IoT Botnet malware requires to execute its full life cycle. This sandbox must not only fully collect common malware behavior data but also achieve a higher simulation success rate than other simulation tools on the same dataset.
2.2 The architecture of the proposed model
The proposed V-Sandbox architecture includes eight main components, as depicted in Figure 2.1. A detailed description of each component is given in the thesis.
Figure 2.1. The architecture of V-Sandbox
Comparisons between V-Sandbox and existing IoT sandboxes are listed in Tables 2.3–2.5. According to the comparison results, the proposed V-Sandbox creates a suitable
environment for malicious code to reveal more of its behavior than existing IoT sandbox environments, as evidenced by the data in Table 2.6.

Table 2.1. Comparison of the functions of IoT sandboxes

| Sandbox | Multi-CPU | Multi-OS | C&C server | Dynamic libraries | Network | System calls | File activity | Host performance | Auto report |
|---|---|---|---|---|---|---|---|---|---|
| DroidScope [97] | N | N | N | N | N | Y | Y | N | N |
| AASandbox [98] | N | N | N | N | N | Y | N | N | N |
| Cuckoo [49] | Y | Y | N | N | Y | NF | Y | N | Y |
| IoTBOX [43] | Y | Y | N | N | Y | N | N | N | NS |
| Limon [52] | N | N | N | N | Y | Y | Y | N | Y |
| REMnux [48] | N | N | N | N | N | Y | N | N | Y |
| Detux [54] | N | N | N | N | Y | N | N | N | Y |
| Padawan [53] | Y | Y | N | N | N | Y | Y | N | Y |
| LiSa [51] | Y | Y | N | NF | Y | Y | Y | N | Y |
| V-Sandbox | Y | Y | Y | Y | Y | Y | Y | Y | Y |

* N: Not yet, Y: Yes, NF: Not fully, NS: Not sure (not open source)

2.4 Conclusion
In this chapter, the thesis has built the V-Sandbox environment, which ensures the conditions for fully collecting behavioral data from IoT botnets. This environment is fully automated, open source, and easy to install. The idea and experimental results of the proposed method have been published in:
- “V-Sandbox for Dynamic Analysis IoT Botnet”, IEEE Access, vol. 8, pp. 145768–145786, 2020 (SCIE index, Q1), ISSN: 2169-3536, DOI: 10.1109/ACCESS.2020.3014891
- “Building a system to detect malware in routers based on simulation”, Journal of Science and Technology on Information Security, 1.CS (05), 2017

CHAPTER 3. A DIRECTED SYSTEM CALL GRAPH FEATURE FOR DETECTING IOT BOTNETS
3.1 Problem statement
3.1.1 Selection of dynamic data sources for preprocessing and analysis
In the IoT Botnet detection problem, when network flow data and device resource usage information cannot be obtained effectively, researchers must rely on system calls as the dynamic data source [36, 37, 73, 75, 103]. In Chapter 3, the thesis proposes a system call data preprocessing method that is effectively applied to the IoT Botnet malware detection problem.
3.1.2 The problem of building features from system calls
The problem addressed in this chapter is stated as follows. Let E be a set of n
executable files for resource-constrained IoT devices, denoted $E = \{e_1, e_2, \dots, e_n\}$, where each $e_i$ is either a malware sample or a benign file. Let $F = \{f_1, f_2, \dots, f_m\}$ be the set of features extracted from system calls in the IoT Botnet detection problem. Each feature $f \in F$ is a mapping $f: E \to V_f$, $e_i \mapsto f(e_i)$, so it produces $n$ feature values, one for each executable. For example, a feature $f_h \in F$ yields the set of feature values $f_h(E) = \{e_1 \mapsto v_1^h, e_2 \mapsto v_2^h, \dots, e_n \mapsto v_n^h\}$. We need to find a new feature $f_{new} \notin F$ such that $\forall e_i \in E, \exists\, e_i \mapsto f_{new}(e_i)$, and $f_{new}$ is more efficient than every $f_j \in F$, as quantified by common machine learning evaluation metrics on the same dataset.
To solve this problem, the researcher proposes the directed system call graph feature, which sequentially structures the system calls obtained from the V-Sandbox environment. The proposed feature has low complexity and is easy to apply with simple machine learning algorithms.
3.1.3 Overview and idea of the proposed method
The proposed method has four main steps:
- Step 1: The ELF file is executed in V-Sandbox to collect its system calls. Redundant information is then removed from the system call data by a simple preprocessing function, yielding a minimized system call sequence for the input ELF file.
- Step 2: The DSCG is built from the simplified system call sequence.
- Step 3: The DSCG data is preprocessed by graph embedding before being fed to the classifiers. This efficiently extracts the features that characterize the DSCG as a vector and reduces the dimension of that vector.
- Step 4: The extracted feature set is used to train and evaluate IoT Botnet malware detection with popular machine learning algorithms.
3.2 A directed system call graph
3.2.1 Definition of a directed system
call graph (DSCG)
Definition 2.1. A DSCG is a directed graph $G_{DSC} = (V, E)$, where $V$ is a set of vertices $v_i$, each representing the system calls that have the same name and arguments, and $E \subseteq V \times V$ is a set of edges $e_k$, each connecting a vertex $v_i$ to a vertex $v_j$ of the graph. A loop counts as only one edge.
3.2.2 Constructing the directed system call graph DSCG
In this step, the researcher constructs a DSCG for each input executable from the system call sequence obtained from V-Sandbox. The construction algorithm is given as pseudocode in Algorithm 3.1.
3.3 Graph data preprocessing
The Ph.D student uses graph embedding techniques to preprocess the DSCG data. The techniques evaluated in this thesis are FEATHER [106], LDP [107], and Graph2vec [108].
3.4 Experiment and evaluation
3.4.1 Dataset
To evaluate the performance of the proposed feature, 8911 executable samples that ran successfully in V-Sandbox, comprising 5023 IoT Botnet samples and 3888 cross-platform benign samples for various microarchitectures (including MIPS, ARM, X86, PowerPC, etc.)
were collected and used for the experiment.
3.4.2 Implementation
The Ph.D student uses three scenarios to divide the dataset for training and validating the proposed feature. Each scenario holds out one malware family for testing, so the classifier is evaluated on a family it has not seen during training.
Table 3.1. Scenarios for dividing the dataset

| Scenario | Training set (type: amount) | Testing set (type: amount) |
|---|---|---|
| 1 | Bashlite: 2786; Other IoT Botnet: 727; Benign: 3088 | Mirai: 1510; Benign: 800 |
| 2 | Bashlite: 2786; Mirai: 1389; Benign: 3088 | Other IoT Botnet: 727; Benign: 800 |
| 3 | Mirai: 1510; Other IoT Botnet: 727; Benign: 3088 | Bashlite: 2786; Benign: 800 |

3.4.3 Metrics for evaluating classification models
The thesis uses four metrics to evaluate the classification models: Accuracy (ACC), True Positive Rate (TPR), False Positive Rate (FPR), and Area Under the ROC Curve (AUC).
3.4.4 Experimental results and discussion
The experimental results are presented in Table 3.4. The features extracted from the DSCG achieved good results for IoT Botnet malware detection (averaged over the three scenarios and three graph embedding algorithms: ACC ≈ 96.89%, TPR ≈ 94.97%, FPR ≈ 1.4%, AUC ≈ 0.989). The feature works well with simple, popular machine learning classifiers such as KNN, SVM, Decision Tree, and Random Forest. The dimension of the feature vector extracted from the graph (128) is also smaller than in several published studies (140 in [74], 184 in [73], and 400 in [71]), which helps reduce computational complexity when the feature is applied in IoT Botnet malware detection and classification models. Specific comparisons with related studies are presented in Table 3.5.

Table 3.2. Evaluation metrics of the proposed model

| Scenario | Training set | Testing set | Graph embedding algorithm | ACC | TPR | FPR | AUC | Best MLC |
|---|---|---|---|---|---|---|---|---|
| 1 | Bashlite + Other malware + Benign | Mirai + Benign | Graph2vec | 0.9649 | 0.9474 | 0.0087 | 0.9895 | SVM |
| | | | Feather | 0.9627 | 0.9453 | 0.0109 | 0.9923 | RF |
| | | | LDP | 0.9757 | 0.9669 | 0.0109 | 0.9792 | DT |
| 2 | Bashlite + Mirai + Benign | Other malware + Benign | Graph2vec | 0.9809 | 0.9944 | 0.0294 | 0.9971 | RF |
| | | | Feather | 0.9355 | 0.863 | 0.0087 | 0.9932 | RF |
| | | | LDP | 0.933 | 0.8573 | 0.0087 | 0.9632 | KNN |
| 3 | Other malware + Mirai + Benign | Bashlite + Benign | Graph2vec | 0.9854 | 0.9896 | 0.0272 | 0.9961 | RF |
| | | | Feather | 0.99 | 0.9906 | 0.012 | 0.9972 | RF |
| | | | LDP | 0.9919 | 0.9924 | 0.0098 | 0.9981 | RF |

Table 3.3. Comparison of the proposed model with related research

| Authors | Dataset / Research object | Data preprocessing | Feature vector dimension | MLC | ACC (%) | FPR (%) | TPR (%) | AUC |
|---|---|---|---|---|---|---|---|---|
| Alhaidari [70] | NSL-KDD, IoTPOT [40], UNSW-NB15 / IoT Botnet | Feature pruning method | 31 | HMM | 94.67 | 1.88 | 47.86 | - |
| Alhanahnah [71] | Kaspersky, IoTPOT [40] / IoT Botnet | N-gram string features | 400 | K-means clustering | 85.20 | - | - | - |
| Karanja [72] | IoTPOT [40] / IoT Botnet | Haralick image texture features | 20 | RF | 95.38 | - | - | 0.97 |
| Meidan [58] | IoTPOT [40] / IoT Botnet | Extracting traffic statistics | 115 | Deep autoencoder | - | 1.7 | - | - |
| Shobana [73] | IoTPOT [40] / IoT Botnet | N-gram, TF-IDF | 184 | RNN | 98.31 | - | - | - |
| Nguyen [74] | IoTPOT [40], VirusTotal [96], VirusShare [106] / IoT Botnet | Subgraph2Vec | 140 | RF | 97.00 | - | - | 0.96 |
| Proposed model | IoTPOT [40], VirusTotal [96], VirusShare [106] / IoT Botnet | DSCG | 128 | SVM, DT, Random Forest, KNN | 96.89 | 1.4 | 94.97 | 0.989 |

3.5 Conclusion
In this chapter, to sequentially structure the obtained system calls, the Ph.D student proposes the DSCG (directed system call graph) feature, which has low complexity and is easy to apply with simple machine learning algorithms. The ideas and experimental results of this chapter have been published in:
- “IoT Botnet Detection Using System Call Graphs and One-Class CNN Classification”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8, no. 10, pp. 937–942, Aug. 2019 (SCOPUS index), ISSN: 2278-3075, DOI: 10.35940/ijitee.J9091.0881019
- “Toward an effective IoT botnet detection method based on system calls”, Proceedings of the 23rd National Conference: The Vietnam conference of selected Information and Communications Technology, 2020

CHAPTER 4. EARLY IOT BOTNET DETECTION USING A COLLABORATIVE MACHINE LEARNING MODEL
4.1 Problem statement
4.1.1 Early detection
Definition 4.1. Early detection is the ability to determine whether an executable is benign or malicious based on the minimum necessary data collected by dynamic analysis.
4.1.2 A
collaborative machine learning model for the early detection of malware
Collaborative learning falls into three main groups: early fusion, late fusion, and intermediate fusion. Each has its own advantages and disadvantages. For the early detection of IoT Botnet malware, however, the late fusion model is suitable because it can combine different input characteristics of the malware while optimizing detection time. The Ph.D student has demonstrated this claim through theoretical and experimental research.
4.1.3 Early detection of IoT botnets
The research problem of this chapter is stated as follows: “Building a collaborative machine learning model that improves the efficiency of IoT Botnet malware detection with simple machine learning algorithms, focusing on early detection based on the minimum necessary data obtained from dynamic analysis.”
4.2 Proposed model
4.2.1 Architecture overview
Unlike existing methods, this method requires only a minimal initial portion of the data collected from V-Sandbox to produce highly accurate detection results. This is what enables the model to detect IoT Botnet malware early.
Figure 4.1. Architecture of the proposed model
The proposed architecture is presented in Figure 4.4. Its main components are a sandbox environment (SC), a data preprocessing component (PPDC), a data normalization component (DNC), a feature extraction component, machine learning classifiers (MLC), and a fusion component (FC).
4.2.2 Sandbox environment (SC)
To effectively collect information on the behavior of IoT Botnet malware, the Ph.D student chose V-Sandbox as the execution environment for input ELF files.
4.2.3 Data preprocessing component (PPDC)
To detect IoT Botnet malware early and effectively, a minimum data length threshold must be chosen for the input to classification. Based on statistics from the dataset, the Ph.D student selected minimum thresholds of the first 300 system calls, the first 20 device resource change behaviors, and the first 50 network flow packets collected from V-Sandbox as input to the machine learning models.
4.2.4 Data normalization component (DNC)
- For system call data, the features of the directed system call graph (DSCG) are used.
- For network data, the feature set of the CSE-CIC-IDS2018 dataset [139] is used.
- For device resource usage data, the output characteristics of V-Sandbox [105] are used.
4.2.5 Feature extraction component
The Ph.D student considered several feature selection approaches (Filter, Wrapper, Embedded, and Ensemble methods) to extract suitable features. Based on the survey results, the Wrapper method was chosen.
4.2.6 Machine learning classifier (MLC)
The Ph.D student tests popular single machine learning algorithms (such as KNN, SVM, Decision Tree, and Random Forest) to choose the best-performing one.
4.2.7 Fusion component (FC)
To combine the prediction results of the different machine learning classifiers, the researcher tests fusion functions such as Voting and Logistic Regression.
4.3 Experiment and evaluation
4.3.1 Dataset
To evaluate the performance of the proposed model, a dataset of 8911 samples, including 5023 IoT botnet samples and 3888 benign samples, was collected and used for the experiment.
4.3.2 Implementation
Popular machine learning algorithms (KNN, Decision Tree, Random Forest, and SVM) were implemented and tested, with the parameters in Table 4.3, on the input feature sets of the proposed model.
4.3.3 Experimental results
Table 4.5. Optimization of machine learning models

| Model | ACC | ROC AUC | FPR | Malware precision | Malware recall | Malware F1 | Benign precision | Benign recall | Benign F1 |
|---|---|---|---|---|---|---|---|---|---|
| Network (k-NN) | 0.8978 | 0.8901 | 0.1270 | 0.9500 | 0.9071 | 0.9280 | 0.7795 | 0.8730 | 0.8236 |
| Performance (Random Forest) | 0.9904 | 0.9846 | 0.0282 | 0.9895 | 0.9973 | 0.9934 | 0.9928 | 0.9718 | 0.9822 |
| System-Call (k-NN) | 0.9822 | 0.9715 | 0.0370 | 0.9860 | 0.9801 | 0.9830 | 0.9479 | 0.9630 | 0.9554 |
| Proposed model | 0.9937 | 0.9896 | 0.0194 | 0.9927 | 0.9987 | 0.9957 | 0.9964 | 0.9806 | 0.9884 |

4.3.4 Evaluation
From the evaluation results on the dataset,
the proposed collaborative machine learning model achieves high accuracy, with ACC = 99.37% and AUC = 0.9896. The model takes approximately … seconds to make a prediction, which is faster than published studies on the early detection of malware on IoT devices. In addition, using only a small portion of the malware's execution behavior data, the proposed model produced accurate detections without waiting for the malware to complete its behavior. This is the outstanding contribution of this model.
4.4 Conclusion
In this chapter, the Ph.D student proposed a new collaborative machine learning model (CMED) for the effective early detection of IoT botnets based on the minimum necessary dynamic data. The effectiveness of the proposed model has been demonstrated by test results on a dataset of 8911 samples. The ideas and experimental results of the proposed method in this chapter have been published in:
- “A collaborative approach to early detection of IoT Botnet”, Computers & Electrical Engineering Journal, Oct. 2021 (SCIE index, Q1), ISSN: 0045-7906

CONCLUSION
In this thesis, the Ph.D student focuses on understanding the features that distinguish IoT Botnet malware from traditional malware, which serves as a basis for researching and building machine learning models, based on dynamic analysis, that improve accuracy and reduce complexity in IoT Botnet malware detection on resource-constrained IoT devices. Accordingly, the thesis has focused on researching IoT botnet malware detection methods and evaluating the advantages and disadvantages of existing methods. From there, the thesis offers a solution for building a machine learning model with high accuracy and low complexity for IoT Botnet malware detection.
The proposed method is practical because it can be deployed as an application model that integrates agents into resource-limited IoT devices to collect and send information about the device's
operating behavior to a central processing module, where it serves as input for the IoT Botnet malware analysis, detection, and warning module. In this deployment, the DSCG feature extraction method and the Ph.D student's collaborative machine learning model for early IoT Botnet detection are applied to classify benign and malicious files. This work is part of the national technology development and application research project “Research and build a system to automatically detect, warn, and prevent network attacks targeting IoT devices” (KC-4.0-05/19-25), in which the Ph.D student is a member.
Although important results have been achieved in both scientific theory and practice in IoT Botnet malware detection, the thesis still leaves several issues for future research and improvement:
- The proposed method has so far been evaluated on a dataset that mainly contains IoT Botnet malware, excluding other types of malware. Recently, many new malware variants, such as ransomware, Trojans, and spyware, are being developed to spread on resource-limited IoT devices. This is a potential information security risk that needs to be researched and detected. Therefore, the proposed method should be tested and improved with these new types of malware soon.
- The total time needed to initialize, execute, monitor, and generate behavioral reports for input samples in the V-Sandbox environment is still long, which constrains the early IoT Botnet malware detection solution. In addition, the success rate of running the dataset samples in V-Sandbox is 80.5%, so further research is needed to increase the success rate for the remaining samples in the collected dataset. In the future, the Ph.D student will continue to improve and optimize V-Sandbox to overcome these disadvantages.
- The use of dynamic analysis, as in the proposed method, has achieved high efficiency in
the IoT Botnet malware detection experiments from a theoretical standpoint. However, in practice, signature-based malware detection is simple and saves system resources in actual deployments. Therefore, researching solutions that automatically and flexibly convert the detection results of the proposed model into signatures for IDSs is also a topic for future applied research.

NEW CONTRIBUTIONS OF THE THESIS
This thesis makes three key contributions:
1- Building the V-Sandbox environment, which ensures the conditions for fully collecting behavioral data from IoT botnets. This sandbox environment is fully automated, open source, and easy to use.
2- Proposing the DSCG (Directed System Call Graph) method, which efficiently extracts features for the detection of IoT botnets. The proposed method has low complexity but still ensures high accuracy in IoT Botnet detection, especially for newly appearing IoT Botnet malware families.
3- Proposing an IoT Botnet malware detection model that combines many different features to detect IoT botnets early. The model uses the minimum necessary dynamic data and still provides highly accurate predictions, contributing to reducing IoT Botnet malware detection time.

A LIST OF PUBLISHED PAPERS
1) “Building a system to detect malware in routers based on simulation”, Journal of Science and Technology on Information Security, 1.CS (05), 2017
2) “V-Sandbox for Dynamic Analysis IoT Botnet”, IEEE Access, vol. 8, pp. 145768–145786, 2020 (SCIE index, Q1), ISSN: 2169-3536, DOI: 10.1109/ACCESS.2020.3014891
3) “IoT Botnet Detection Using System Call Graphs and One-Class CNN Classification”, International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8, no. 10, pp. 937–942, Aug. 2019 (SCOPUS index), ISSN: 2278-3075, DOI: 10.35940/ijitee.J9091.0881019
4) “A collaborative approach to early detection of IoT Botnet”, Computers & Electrical Engineering Journal, Oct. 2021 (SCIE index, Q1), ISSN: 0045-7906
5)
“Building a model for detecting malicious code on routers by an agent”, Proceedings of the 20th National Conference: The Vietnam conference of selected Information and Communications Technology, 2017
6) “Building a model for collecting and detecting network attacks using IoT devices”, Proceedings of the 2nd National Conference: Symposium on Information Security (SoIS), 2017
7) “Building a network intrusion detection system for civil IoT devices in smart homes”, Proceedings of the 21st National Conference: The Vietnam conference of selected Information and Communications Technology, 2018
8) “Using CNN and LSTM together improves the network attack detection performance of HIDS with the ADFA dataset”, Proceedings of the 3rd National Conference: Symposium on Information Security (SoIS), 2018. Published in the December 2018 issue of Information and Communication Journal (ISSN 1859-3550)
9) “Toward an effective IoT botnet detection method based on system calls”, Proceedings of the 23rd National Conference: The Vietnam conference of selected Information and Communications Technology, 2020