A Compact Trace Representation Using Deep Neural Networks for Process Mining

1st Hong-Nhung Bui, Vietnam National University (VNU), VNU-University of Engineering and Technology (UET); Banking Academy of Vietnam, Hanoi, Vietnam, nhungbth@hvnh.edu.vn
2nd Trong-Sinh Vu, School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan, sinhvtr@jaist.ac.jp
3rd Tri-Thanh Nguyen, Vietnam National University (VNU), VNU-University of Engineering and Technology (UET), Hanoi, Vietnam, ntthanh@vnu.edu.vn
4th Thi-Cham Nguyen, Vietnam National University (VNU), VNU-University of Engineering and Technology (UET), Hanoi, Vietnam; Hai Phong University of Medicine and Pharmacy, Haiphong, Vietnam, nthicham@hpmu.edu.vn
5th Quang-Thuy Ha, Vietnam National University (VNU), VNU-University of Engineering and Technology (UET), Hanoi, Vietnam, thuyhq@vnu.edu.vn

Abstract— In process mining, trace representation has a significant effect on the process discovery problem. The challenge is to obtain a highly informative but low-dimensional vector space from event logs. This is required to improve the quality of trace clustering, which generates process models clear enough to inspect. Though traditional trace representation methods have specific advantages, their vector spaces often have a large number of dimensions. In this paper, we address this problem by proposing a new trace representation method based on deep neural networks. Experimental results show that our proposal not only outperforms the alternatives but also significantly reduces the dimension of the trace representation.

Keywords—event logs, trace clustering, trace representation, deep neural networks, compact trace representation

I. INTRODUCTION

Process mining, with its three main tasks, i.e., process discovery, conformance checking, and process enhancement, plays a vital role in most of today's companies. Many enterprises need process models generated from the event logs of their software systems to assist them in monitoring their employees' process behaviors as well as in process optimization. However, generating process models from the whole raw event log often results in complicated models that cannot be used for inspection. Trace clustering is an effective solution which separates an input event log into groups (clusters) containing similar traces. The model generated from an event cluster has much lower complexity [1, 3, 4, 5, 6, 7, 8, 9] and is easier to inspect than the model generated from the whole log.

The trace representation method is one of the most important factors affecting the quality of trace clustering. Traditional trace representations, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc., create vector space models with high dimension [2, 10]. This increases the computational complexity, the execution time, as well as the storage space of clustering algorithms. Motivated by a study in natural language processing [18], this article proposes a new method for trace representation, i.e., a compact representation, via deep neural networks. Deep learning methods are based on several layers of neural networks and are able to learn at multiple layers of the network; thus, the knowledge at a higher layer is richer. This motivates us to train a deep neural network and take the data at the last hidden layer, i.e., the one before the output layer of the network, to use as the trace representation. This representation has low dimension (hence it is called compact) yet rich information; thus, it helps to reduce the complexity of clustering algorithms and improve the results. We also propose a method to transform unlabeled traces into labeled traces for training the deep neural network in order to obtain the compact trace representation. The results of the experiments on three real event logs indicate the effectiveness of our method: it not only increases the quality of the generated process models, but also reduces the dimension of the vector space of the trace representation.

The rest of this paper is organized as follows: in Section II, we introduce some traditional trace representation methods; in Section III, the application of deep learning with neural networks to trace representation is presented; Section IV provides the experimental results; finally, Section V presents some conclusions and future work.
II. TRACE REPRESENTATION METHODS IN PROCESS DISCOVERY

A. The process discovery task

Process discovery has the role of reconstructing an actual business process model by extracting information from an event log recorded by transactional systems. In this task, an event log is taken as the input and a business model is produced without using any prior information.

Fig. 1. A fragment of the airline compensation requests event log [19]

To discover the process models, the α-algorithm can be utilized [12], and the obtained models can be represented by Petri nets. The input of process discovery is an event log consisting of a list of events. An event is considered as an actual action accompanied by some information, such as event id, activity, timestamp, resources (e.g., the involved person and/or device), and cost. Fig. 1 provides a snippet of an event log. In this task, we only consider the activity of events. A set of events, ordered by timestamp and having the same "case id", forms a case which can be represented as a "trace" of the actual process, as depicted in Fig. 2 [10].

Fig. 2. The trace in an event log, where a = "register request", b = "examine thoroughly", c = "examine casually", d = "check ticket", e = "decide", f = "reinitiate request", g = "pay compensation", h = "reject request"

B. Traditional trace representation methods

In the process discovery problem, the quality of the discovered model depends not only on the complexity of the event log but also on the trace representation method. Different methods exploit different relationships/characteristics between the activities. Similar to document representation, a trace can be represented by the vector space model: the relationships/characteristics of the activities are converted into numerical values that form the elements of a vector. There are two objectives of trace representation: one is to capture as much of the relationship between the activities as possible; the other is to reduce the dimension of the vector space model. Existing work suggests many different approaches for trace representation, such as bag-of-activities, k-grams, maximal repeats, distance graphs, etc. [2, 10], as briefly described below:

1) Bag-of-activities: This is the most common trace representation method for clustering. A trace is transformed into a vector over the distinct activities appearing in the event log, in the form of the binary vector space model [2]: if an activity appears in the trace, its corresponding element in the vector is 1, otherwise 0. For example, for an event log L of four traces whose set of distinct activities has seven elements, the bag-of-activities representation is the set of binary vectors {(1,1,0,1,1,0,1), (1,1,1,1,1,1,1), (1,1,0,1,1,0,1), (1,0,1,1,1,1,1)}. The dimension of the vector space model is 7.
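For illustration, the following minimal Python sketch (ours, not the paper's implementation; the function name and the sample traces are hypothetical, chosen only so that two of the vectors above are reproduced) builds binary bag-of-activities vectors:

```python
def bag_of_activities(traces):
    # Fix a stable ordering of the distinct activities of the whole log.
    activities = sorted({act for trace in traces for act in trace})
    index = {act: i for i, act in enumerate(activities)}
    vectors = []
    for trace in traces:
        vec = [0] * len(activities)
        for act in trace:
            vec[index[act]] = 1  # presence only; counts and order are discarded
        vectors.append(vec)
    return activities, vectors

# Hypothetical traces over the activities of Fig. 2:
acts, vecs = bag_of_activities([["a", "b", "d", "e", "h"],
                                ["a", "c", "d", "e", "f", "h"]])
print(acts)  # ['a', 'b', 'c', 'd', 'e', 'f', 'h']
print(vecs)  # [[1, 1, 0, 1, 1, 0, 1], [1, 0, 1, 1, 1, 1, 1]]
```

Note that only the presence of an activity is recorded; frequencies and ordering are discarded, which is precisely the information the following representations try to recover.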
2) k-grams: A k-gram refers to a sequence of k consecutive activities in a trace. The set of 1-grams of a trace corresponds to its distinct single activities, while its set of 2-grams contains its pairs of adjacent activities, etc. Each distinct k-gram is mapped into a feature in the vector space (a short sketch of this construction follows the last item below).

3) Maximal Repeat: The maximal repeat is defined via maximal pairs as follows [2]: a maximal pair in a sequence s is a subsequence α that occurs at two distinct positions i and j such that the element to the immediate left (right) of the occurrence of α at position i is different from the element to the immediate left (right) of the occurrence of α at position j, i.e., s(i, i + |α| − 1) = s(j, j + |α| − 1) = α, with s(i − 1) ≠ s(j − 1) and s(i + |α|) ≠ s(j + |α|), for 1 ≤ i < j ≤ |s| (s(0) and s(|s| + 1) are considered as null). A maximal repeat is a subsequence that occurs in a maximal pair. Given an event log L, all the traces in L are concatenated to form a single sequence, and each maximal repeat of this sequence becomes a feature of the vector space. For the example event log above, there are 12 maximal-repeat features, so the dimension is 12.

4) Distance graph: Given a corpus C, the distance graph of order k of a document D generated from C is defined as the graph G(C; D; k) = (V(C); E(D; k)), where V(C) is the set of nodes, i.e., the set of distinct words in the entire corpus C, and E(D; k) is the set of edges in the graph. The set E(D; k) contains a directed edge from node i to node j if the word i precedes the word j by at most k positions. Each edge in the graph is mapped into a feature in a vector space [13]. To apply the theory of distance graphs to the trace representation problem, the set of activities in the event log is considered as the set of "distinct words" in the corpus C, and a trace in the event log is considered as a document D; thus, the distance graphs for an event log can be constructed [10]. For the example event log above, the feature sets of the distance graphs of orders 0, 1, and 2 have the corresponding dimensions 7, 15, and 21.
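Returning to the k-gram construction in item 2, a minimal sketch (again ours, with a hypothetical sample trace) that enumerates the distinct k-grams which become the features of the vector space:

```python
def kgram_features(traces, k):
    # Enumerate the distinct k-grams (k consecutive activities) of a log.
    features = set()
    for trace in traces:
        for i in range(len(trace) - k + 1):
            features.add(tuple(trace[i:i + k]))
    return sorted(features)

trace = ["a", "b", "d", "e", "h"]  # hypothetical trace
print(kgram_features([trace], 1))  # 5 distinct 1-grams (single activities)
print(kgram_features([trace], 2))  # [('a','b'), ('b','d'), ('d','e'), ('e','h')]
```

A trace's vector then marks which of the log's distinct k-grams it contains, analogously to bag-of-activities.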
For a small event log with only four traces, the dimension of the vector space is already rather large, depending on the representation method. For real-life event logs, the dimension of the vector space usually reaches thousands or even tens of thousands. This greatly affects the performance of clustering algorithms in terms of execution time and storage space. Therefore, reducing the number of dimensions of the vector space model is a significant problem. We present our solution to this problem in the next section.

III. DEEP NEURAL NETWORKS IN TRACE REPRESENTATION

A. Deep neural networks

Deep Neural Networks (DNN) are a class of machine learning algorithms. Based on Artificial Neural Networks (ANN), DNNs allow computers to "learn" at different levels of abstraction. With the foundation of artificial neural networks with multiple layers between the input and output layers, deep neural networks were designed to imitate the human brain's activities by using a great number of neurons connected to each other to process information [14, 15, 16]. The structure of a deep neural network includes layers as described in Fig. 3, where:

• Input layer: consists of neurons receiving the input values.
• Hidden layers: include neurons performing transformations; the output of a layer is the input of the next layer.
• Output layer: contains neurons returning the desired output data.

Fig. 3. Deep neural networks model

The neurons are connected to each other by the following formulas in the hidden layers and the output layer:

h = f(W·x + b)   (1)
y = f(W·h + b)   (2)

where x, h, y are the input, hidden and output values of a neuron, correspondingly; f is an activation function (common activation functions are sigmoid, tanh, and ReLU); and W, b are the parameters of the network, in which the connection weights W are very important in a DNN, representing the importance of each input in the information conversion process from one layer to another. Learning in a DNN can be described as the process of adjusting the weights until the expected results are obtained. The bias value b permits the activation function to be shifted up or down to better fit the prediction to the data.

Supervised learning and unsupervised learning are the two basic techniques by which a DNN is trained. Supervised learning uses labeled data, and the learning process is repeated until the output value reaches the desired value. In this work, the supervised learning technique is applied with three steps: (1) the output value is calculated; (2) the output is compared with the desired value; (3) if the desired value is not reached, the weights and biases are adjusted and the output is recalculated by going back to step (1).

In the process of training, the initial weights W and biases b of a deep neural network are initialized randomly, with dimensions depending on the dimensions of the input value and the desired value. Assume that the input x is an n × 1 matrix and the desired output z is a k × 1 matrix. At the hidden layer, W is initialized as an m × n matrix and b as an m × 1 matrix; at the output layer, W is initialized as a k × m matrix and b as a k × 1 matrix, where m is a value defined by the user that is much smaller than n. After applying formula (1), we obtain a matrix h of dimension m × 1; after applying formula (2), we obtain a matrix y of dimension k × 1.
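To fix the dimensions involved, the following NumPy sketch traces formulas (1) and (2); the sizes n = 7, m = 3, k = 2 and the random initialization are illustrative assumptions only (training, not shown here, would adjust W and b):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 7, 3, 2          # input dim, hidden dim (m << n), output dim

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.integers(0, 2, size=(n, 1)).astype(float)        # binary trace vector
W1, b1 = rng.standard_normal((m, n)), np.zeros((m, 1))   # hidden-layer parameters
W2, b2 = rng.standard_normal((k, m)), np.zeros((k, 1))   # output-layer parameters

h = sigmoid(W1 @ x + b1)   # formula (1): m x 1 hidden value
y = sigmoid(W2 @ h + b2)   # formula (2): k x 1 output value
print(h.shape, y.shape)    # (3, 1) (2, 1)
```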
B. Trace representation based on deep neural networks

One of the purposes of deep neural networks is to transform the input value into a compact intermediate representation (the hidden value h), i.e., a new and better representation from which the output value can be accurately predicted. In this paper, we apply this idea of the supervised learning technique in deep neural networks to improve the efficiency of the trace representation of the event log: instead of the original trace representation, the compact trace representation will be used for clustering.

For instance, a credit process has some procedures, i.e., personal loan, corporate loan, home loan, and consumer loan, where each procedure shares common characteristics or activities. Such a common characteristic is defined as a trace context in [17]. In other words, each procedure may contain a common sequence of activities, which is defined as a trace context. Let L = {t1, t2, …} be an event log, where ti is a trace. Let p be the longest common prefix of a trace subset T, i.e., T = {t ∈ L | t = p^d}, such that |T| > 1, where d is an activity sequence that can be empty, and the notation '^' in p^d denotes the sequence concatenation operation; then p is called a trace context [17]. For example, an event log of six traces whose longest common prefixes form three such groups has a set of three trace contexts.

In order to apply supervised learning with deep neural networks, the set of traces in the event log is considered as the input data, and the trace context set is considered as the labeled data Z. We design a deep neural network consisting of one input layer that receives the traces of the event log, two hidden layers, and one output layer that predicts the trace context, as depicted in Fig. 4.

Fig. 4. The idea of using deep neural networks for trace representation

At the input layer, we represent a trace by the binary bag-of-activities model: each input neuron receives a trace as an n-dimensional binary vector x = [x1 … xn], where each xi is either 0 or 1. For the labeled data Z, the trace context set is represented by one-hot vectors. Suppose an event log has k different trace contexts; each trace context ci is represented by a one-hot vector zi = [z1, z2, …, zk] with zj = 1 if j = i, otherwise zj = 0 (1 ≤ i, j ≤ k). For example, if an event log has 3 different trace contexts {c1, c2, c3}, then c1 = [1,0,0], c2 = [0,1,0], and c3 = [0,0,1].

At the hidden layers, the hidden values are calculated according to the following formula:

h_l = f(W_l·h_{l−1} + b_l)   (3)

where h_0 = x. The final vector h obtained at the last hidden layer has dimension m and will be used as the compact trace representation: any input trace vector of dimension n is transformed into a vector of dimension m, and we expect m to be much smaller than n. Moreover, since the value of h is adjusted during the training process, it carries richer information than the input vector. The value of each element of h is a real number in (0, 1] instead of the two discrete values 0 and 1 of the input vectors; thus, it contains finer information. This characteristic is another reason why we select this vector as the input for the clustering task: its richer information helps to improve the clustering performance.

The output layer receives the hidden values h, and the output values are calculated by the following formula:

y = f(W·h + b)   (4)

where W, b are the model parameters, which are adjusted repeatedly during the training process. The sigmoid function is used as the activation function, as in (5):

f(x) = 1 / (1 + e^(−x))   (5)

The training process finishes when the output value reaches the trace context within an allowable error.
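The labels Z can be derived by grouping the traces by their longest common prefixes. The sketch below is our reading of the definition above (the helper names are hypothetical; [17] describes the original algorithm): each trace is assigned the longest prefix it shares with at least one other trace, and the distinct contexts are then one-hot encoded:

```python
from collections import Counter

def trace_contexts(traces):
    # Count how many traces of the log start with each prefix.
    prefix_count = Counter()
    for t in traces:
        for i in range(1, len(t) + 1):
            prefix_count[tuple(t[:i])] += 1
    # Label each trace with its longest prefix shared by >= 2 traces
    # (an empty tuple if the trace shares no prefix with any other trace).
    contexts = []
    for t in traces:
        ctx = ()
        for i in range(1, len(t) + 1):
            if prefix_count[tuple(t[:i])] > 1:
                ctx = tuple(t[:i])
        contexts.append(ctx)
    return contexts

def one_hot_labels(contexts):
    # Map each distinct context to a one-hot vector z, as in Section III.B.
    distinct = sorted(set(contexts))
    return {c: [1 if j == i else 0 for j in range(len(distinct))]
            for i, c in enumerate(distinct)}

log = [["a", "b", "d"], ["a", "b", "e"], ["c", "d"], ["c", "e"]]
print(trace_contexts(log))  # [('a', 'b'), ('a', 'b'), ('c',), ('c',)]
print(one_hot_labels(trace_contexts(log)))  # {('a', 'b'): [1, 0], ('c',): [0, 1]}
```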
IV. EXPERIMENTAL RESULTS

A. Experimental Method

For the experiments, we used the three-phase process discovery framework [10], i.e., "Trace Processing and Clustering", "Discovering Process Model", and "Evaluating Model", as depicted in Fig. 5.

Fig. 5. A three-phase framework of process discovery

For evaluation, in the Trace Processing step we implemented some other trace representation methods as the baselines, i.e., bag-of-activities, k-grams, maximal repeats, and the distance graph model. The experimental platform is Ubuntu 16.04, Python 2.7, and Tensorflow (https://www.tensorflow.org/) 1.2. In the Clustering step, the K-means clustering algorithm and the data mining tool RapidMiner Studio (https://rapidminer.com) were used. In the Discovering Process Model phase, we use the α-algorithm (a plug-in of the process mining tool ProM (http://www.promtools.org/) 6.6) to obtain the process models from the event clusters. The other hyperparameters of the model are set as follows: learning rate = 0.1, iterations = 10,000, and the softmax_cross_entropy_with_logits_v2 loss of Tensorflow is used.

The Evaluating Model phase determines the quality of the generated process models using two main measures, Fitness and Precision [17]. These two measures are in the range [0, 1], and the bigger the better. We use the "conformance checker" plug-in in ProM 6.6 to calculate the fitness and precision measures of each sub-model. Since there is more than one sub-model, we calculate the weighted average of the fitness and precision of all sub-models for comparison as follows:

F = Σ_{i=1}^{k} (n_i / n) · f_i   (6)

where F is the average value of the fitness or precision measure; k is the number of models; n is the number of traces in the event log; n_i is the number of traces in the i-th cluster; and f_i is the value of the fitness or precision measure of the i-th model, correspondingly.
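Formula (6) amounts to a size-weighted mean over the sub-models, as in the following sketch (the cluster sizes and measure values are invented for illustration):

```python
def weighted_average(cluster_sizes, measures):
    # Formula (6): F = sum_i (n_i / n) * f_i, where n = total number of traces.
    n = sum(cluster_sizes)
    return sum(n_i * f_i for n_i, f_i in zip(cluster_sizes, measures)) / n

# e.g., the fitness of three sub-models discovered from three clusters:
print(weighted_average([500, 400, 100], [0.99, 0.95, 0.90]))  # 0.965
```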
B. Experimental Data

For the objectivity of the experiments, we used three event logs from the process mining community: Lfull (www.processmining.org/event_logs_and_models_used_in_book), and prAm6 and prHm6 (https://data.4tu.nl/repository/uuid:44c32783-15d0-4dbd-af8a-78b97be3de49), with the characteristics given in Table I.

TABLE I. THE CHARACTERISTICS OF THREE EVENT LOGS

Event log | #cases | #events | #contexts | Characteristics
Lfull     | 1391   | 7539    | 20        | Duplicated traces, repeated activities in a trace
prAm6     | 1200   | 49792   | 11        | Few duplicated traces, no repeated activities
prHm6     | 1155   | 1720    | —         | No duplicated traces, no repeated activities

C. Experimental Results

We set the parameter m to each value in [30, 40, 50, 60, 70, 80] to evaluate its influence. The best experimental results, i.e., the dimension (Dim) of the traces and the Time (s: seconds, h: hours) needed to create a trace representation, as well as the Fitness and Precision of the resulting process models, are described in Table II. For Lfull, when m goes from 30 to 80, the best result is at 50; the other results are almost the same. For the other datasets, i.e., prAm6 and prHm6, the results are almost the same when this parameter is changed. In Table II, the first four methods of each event log form Scenario 1 (traditional trace representations) and the compact trace forms Scenario 2 (deep neural networks).

TABLE II. THE RESULTS OF TRADITIONAL AND COMPACT TRACE REPRESENTATIONS

Event log | Method of Trace Representation | Dim  | Time | Fitness | Precision
Lfull     | Bag-of-activities              | —    | 0.1s | 0.991   | 0.754
Lfull     | k-grams                        | 23   | 1.9s | 0.955   | 0.962
Lfull     | Maximal Repeats                | 50   | 2s   | 0.950   | —
Lfull     | Distance Graphs                | 43   | 1.9s | 0.992   | —
Lfull     | Compact trace                  | 50   | 17s  | 0.99995 | 0.794
prAm6     | Bag-of-activities              | 317  | 0.3s | 0.968   | 0.809
prAm6     | k-grams                        | 2467 | 76s  | 0.968   | 0.809
prAm6     | Maximal Repeats                | 9493 | 8h   | 0.968   | 0.332
prAm6     | Distance Graphs                | 1927 | 93s  | 0.968   | 0.809
prAm6     | Compact trace                  | 30   | 43s  | 0.973   | 0.911
prHm6     | Bag-of-activities              | 321  | 0.2s | 0.902   | 0.660
prHm6     | k-grams                        | 730  | 9.8s | 0.902   | 0.660
prHm6     | Maximal Repeats                | 592  | 59s  | 0.897   | 0.730
prHm6     | Distance Graphs                | 1841 | 54s  | 0.902   | 0.660
prHm6     | Compact trace                  | 30   | 37s  | 0.902   | 0.762

The experimental results show that, with the compact trace representation, the fitness is higher than the precision for all datasets, and the training time of the DNN is not excessive. Except for the Lfull dataset, which has a small number of activities, on the datasets with a large number of activities the compact trace representation always achieves the best results. The deep neural network is able to learn the relations among the activities of the input vector to generate the compact representation, which may contain richer information than the input; thanks to this, the clustering can produce better results. In particular, the dimension of the compact trace representation is very small in comparison with the other representation methods. This indicates that our method is a good choice for complex event logs with a large number of activities. For the two complex datasets in the experiments, the dimension of the compact trace representation is reduced about ten times, i.e., its dimension is 30 versus the input trace dimensions of 317 and 321. This is exactly the reduction of the feature space dimension that we aimed for.

V. CONCLUSIONS AND FUTURE WORK

This paper proposes a new trace representation method using deep neural networks. The output vectors at the last hidden layer of the deep neural network are used as the trace representation for the later clustering phase. The compactness of this representation helps to reduce the clustering complexity, while its richness of information helps to improve the performance of clustering. The experimental results indicate that this method is quite suitable for complex event logs which contain a large number of activities: the dimension of the representation is reduced about ten times, while the precision and fitness are improved. As a possible future direction, we will try advanced deep learning methods based on Recurrent Neural Networks (RNN) or Long Short-Term Memory (LSTM), an improved deep learning method based on recurrent neural networks, to investigate whether they can improve the performance of trace clustering.

REFERENCES

[1] R.P.J. Chandra Bose, W.M.P. van der Aalst, "Trace Clustering Based on Conserved Patterns: Towards Achieving Better Process Models", Business Process Management Workshops, pp. 170 (2009).
[2] R.P.J. Chandra Bose, "Process Mining in the Large: Preprocessing, Discovery, and Diagnostics", PhD Thesis, Eindhoven University of Technology (2012).
[3] Gianluigi Greco, Antonella Guzzo, Luigi Pontieri, Domenico Saccà, "Discovering Expressive Process Models by Clustering Log Traces", IEEE Trans. Knowl. Data Eng., pp. 1010 (2006).
[4] A.K.A. de Medeiros, A. Guzzo, G. Greco, W.M.P. van der Aalst, A.J.M.M. Weijters, Boudewijn F. van Dongen, Domenico Saccà, "Process Mining Based on Clustering: A Quest for Precision", BPM Workshops, pp. 17 (2007).
[5] M. Song, Christian W. Günther, W.M.P. van der Aalst, "Trace Clustering in Process Mining", Business Process Management Workshops, pp. 109 (2008).
[6] J. De Weerdt, S.K.L.M. van den Broucke, J. Vanthienen, B. Baesens, "Leveraging process discovery with trace clustering and text mining for intelligent analysis of incident management processes", IEEE Congress on Evolutionary Computation, pp. 1 (2012).
[7] J. De Weerdt, Seppe K.L.M. vanden Broucke, Jan Vanthienen, Bart Baesens, "Active Trace Clustering for Improved Process Discovery", IEEE Trans. Knowl. Data Eng. 25(12), pp. 2708 (2013).
[8] Igor Fischer, Jan Poland, "New Methods for Spectral Clustering", in Proc. ISDIA (2004).
[9] Joerg Evermann, Tom Thaler, Peter Fettke, "Clustering Traces using Sequence Alignment", Business Process Management Workshops, pp. 179-190 (2015).
[10] Quang-Thuy Ha, Hong-Nhung Bui, Tri-Thanh Nguyen, "A trace clustering solution based on using the distance graph model", in Proceedings of ICCCI, pp. 313-322 (2016).
[11] T. Thaler, Simon Felix Ternis, Peter Fettke, Peter Loos, "A Comparative Analysis of Process Instance Cluster Techniques", Wirtschaftsinformatik, pp. 423 (2015).
[12] W.M.P. van der Aalst, "Process Mining - Data Science in Action", Springer, 2nd edition (2016).
[13] Charu C. Aggarwal, Peixiang Zhao, "Towards graphical models for text processing", Knowl. Inf. Syst. 36(1), pp. 1-21 (2013).
[14] Li Deng, Dong Yu, "Deep Learning: Methods and Applications", NOW Publishers (2014).
[15] Weibo Liu, Zidong Wang, Xiaohui Liu, Nianyin Zeng, Yurong Liu, Fuad E. Alsaadi, "A survey of deep neural network architectures and their applications", Neurocomputing 234: 11-26 (2017).
[16] Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Mahmudul Hasan, Brian C. Van Essen, Abdul A.S. Awwal, Vijayan K. Asari, "A State-of-the-Art Survey on Deep Learning Theory and Architectures", Electronics 8(3), 292 (2019).
[17] Hong-Nhung Bui, Tri-Thanh Nguyen, Thi-Cham Nguyen, Quang-Thuy Ha, "A New Trace Clustering Algorithm Based on Context in Process Mining", in Proceedings of IJCRS, pp. 644-657 (2018).
[18] Tom Young, Devamanyu Hazarika, Soujanya Poria, Erik Cambria, "Recent Trends in Deep Learning Based Natural Language Processing", IEEE Comp. Int. Mag. 13(3), pp. 55-75 (2018).
[19] W.M.P. van der Aalst, "Process Mining: Discovery, Conformance and Enhancement of Business Processes", Springer (2011).
