DSpace at VNU: A novel clustering method for animal trajectory analysis using Wireless Sensor Network


A Novel Clustering Method for Animal Trajectory Analysis using Wireless Sensor Network

Quang Hiep Vu
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
hiep88@dblab.chungbuk.ac.kr

Thi Hong Nhan Vu
Human Machine Interaction Laboratory, UET, Vietnam National University, Hanoi, Vietnam
vthnhan@gmail.com

Meijing Li
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
mjlee@dblab.chungbuk.ac.kr

Keun Ho Ryu
Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
khryu@dblab.chungbuk.ac.kr

Abstract—Animals play an important role on Earth, and studying their movements helps us conserve rare and precious species and supports food exploration. In this paper, we employ Wireless Sensor Networks (WSNs), which offer the potential for greatly increased spatial and temporal resolution of measurement data. WSNs therefore promise enhanced tracking of animals without human intervention. To help experts make better species and habitat assessments as well as conservation strategies, we propose an Extended Hierarchical Path Clustering (eHPC1) method for analyzing the mobility of wild animals. A predictive mobility algorithm is also presented, which helps experts solve problems in data allocation and management. A system that simulates the mobility of animals is implemented. Finally, the performance of the proposed method is evaluated in terms of running time and estimation accuracy.

Keywords—Clustering methods, animal trajectory analysis, wireless sensor network

I. INTRODUCTION

The animal kingdom is very large, with a variety of animals, big and small, found in water, in the air, and on land. They have different shapes and sizes. Animals are a natural resource, and that resource is not infinite, so we should do our best to conserve them. They need our care and love. To that end, animal tracking is very useful: it helps us understand how individuals and populations move within local areas, migrate across oceans
and continents, and evolve through millennia. This information is being used to address environmental challenges such as climate and land use change, biodiversity loss, invasive species, and the spread of infectious diseases.

Wireless Sensor Networks (WSNs) provide an advanced solution for tracking animals. A WSN is composed of relay nodes, sensor nodes, and base stations. Cellular networks can also be used, considering the difficulty of achieving the necessary radio range coverage. The WSN reports precise animal locations and movements. Using the Received Signal Strength Indicator (RSSI), trilateration can locate the animals precisely, and GPS devices attached to the animals give accurate position information that can be stored on the sensor node. The sensing data are transmitted from the distributed relay nodes to the base stations, which can use satellites or cellular networks to forward the data to the researcher [7, 8]. In this research, we assume that animals always move within the coverage region of the WSN and that we can install devices in that region to monitor them.

In this paper, we employ WSNs to track animals. WSN technology is more effective than other technologies for obtaining the required information, with a considerable reduction in researcher intervention. The volume of automatically collected data is enormous; it is analyzed and used in long-term decision making [13, 14, 15]. Understanding the movement patterns of animals helps us devise strategies for rare wildlife conservation and efficient food exploration. For that purpose, many data analysis techniques have been proposed, and clustering is one of them. Conventional methods such as K-means and DBSCAN cannot be directly applied to discover hidden trajectory patterns, since they were originally proposed for objects in the form of points, not for objects in the form of time series [1]. An algorithm called HierCluster was recently proposed in [10] for finding clusters
of user trajectories using cell phones in cellular networks. This work uses the edit distance metric to determine the similarity between two trajectories. However, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient.

In this paper, we introduce an Extended Hierarchical Path Clustering (eHPC1) method for the mobility paths of animals. Similar to HierCluster, eHPC1 works in a bottom-up fashion, but the similarity or dissimilarity of clusters is determined by the Hamming distance. The closest clusters are merged until the number of clusters equals a predefined threshold. In addition, when managing animal mobility, we wish to know the movement direction of an animal in advance. To this end, we introduce an algorithm for Prediction of Directional Movement (PDM).

Finally, a simulator system for animal mobility is developed. The performance of eHPC1 is evaluated with respect to the length of trajectories and the number of objects. The prediction accuracy of the PDM algorithm is also assessed, based on the deviation between moving points. The mobility patterns and predicted positions can be used in animal management applications.

The rest of the paper is organized as follows. Section II reviews work related to clustering methods, followed by an animal path model in Section III. Section IV explains the algorithms for finding clusters of mobility paths and predicting positions. Section V presents the experimental results. Conclusions and future work are presented in Section VI.

II. RELATED WORK

Fast mining of information from a data warehouse is always a significant issue in data analysis. A variety of methods have been developed for this purpose, and clustering is one of them. Clustering is a way of grouping a set of physical or abstract objects into classes of similar objects. Many clustering approaches are available, and each may produce a different grouping of a dataset. In general, clustering methods
may be divided into two categories based on the cluster structure they produce: hierarchical clustering and partitioning clustering [1, 11, 12]. Partitioning methods (K-means, Bisecting K-means, PAM, DBSCAN) produce mutually exclusive classes, whereas the less common clumping methods allow overlap. Each object is a member of the cluster to which it is most similar; however, the similarity threshold has to be defined.

The hierarchical approaches can be divided into agglomerative and divisive [1, 11]. Divisive (top-down) methods begin with a single cluster that contains all the sample data. This cluster is then split into two or more clusters with higher dissimilarity between them, until the number of clusters specified by the user is obtained. In contrast, agglomerative (bottom-up) methods build the hierarchy through a series of N-1 agglomerations, or fusions of pairs of objects, beginning with the unclustered dataset. For N samples, agglomerative algorithms begin with N clusters, each containing a single sample or point. The two most similar clusters are then merged repeatedly until the number of clusters becomes one or reaches the number specified by the user. In this research, we extend the latter approach to moving animals.

Previous methods have mainly dealt with clustering of point data. Recent improvements in WSNs and tracking facilities have made it possible to collect large amounts of path data for moving objects. There is increasing interest in performing data analysis over these path data. A typical data analysis task is to find objects that have moved in a similar way. Thus, an efficient clustering algorithm for paths is essential for such tasks. The works in [3, 6] proposed a model-based clustering algorithm for paths.

Recently, there has been a lot of research on mobility management. Compared to the amount of work on location update, little has been done in the area of mobility prediction. These works
have the following weaknesses. To collect such information, most of the works [3, 5, 6] use highly sophisticated and expensive tools such as GPS, whose very frequent readings drain battery power faster and which cannot re-task the network. The works in [2, 3, 5] assume that the mobility patterns are already available. These patterns are then used for mobility prediction, with no attempt to discover them, and prediction is based on the probability distribution of the speed and direction of the objects.

This paper studies a path clustering method using previously collected data. The algorithm builds on the idea of the hierarchical clustering approach HierCluster in [10], which employs the edit distance between two strings, defined as the minimum number of label changes, insertions, and deletions needed to map one string to another. Unfortunately, calculating edit distances for hundreds of sequences, which is often the case, is extremely inefficient. To solve this problem, we apply the Hamming distance [4], a measure that is more appropriate for comparing series of labels associated with timestamps. The trajectory patterns discovered are then used for the problem of predictive mobility.

III. ANIMAL MOBILITY MODEL

In this paper, we assume that the animals move in a space in which a wireless sensor network (WSN) is installed. The coverage region of the WSN is partitioned into smaller areas called cells. Each cell has a base station (BS), which is capable of broadcasting and receiving information. The base stations are connected to each other via a fixed wired network. A base station receives the sensing data from distributed relay nodes. The coverage area consists of a number of location areas. Each location area may consist of one or more cells, but in our work we assume that each location area consists of only one cell. Base stations regularly broadcast the ID of the cell in which they are located. Therefore, the animals in a cell would be
picked up by listening to the broadcast channel transmitting the signal. The movement of animals from one cell to another is recorded in a database called the home location register. In addition, every base station keeps a database in which the profiles of the animals located in its cell are recorded. This database is called the visitor location register. Therefore, in our system it is possible to obtain the movement history of an animal from the logs in its home location register.

The mobility path of an animal is defined in the form Tr = <(id1, t1), (id2, t2), …, (idk, tk)>, where (idk, tk) denotes the ID number of the cell that the animal enters at timestamp tk. In this recording, two consecutive ID numbers must be the ID numbers of two neighboring cells in the network [8, 9].

We call the original data recorded from WSNs the Animal Actual Paths (AAPs). They are a valuable source of information because the mobility of the animals contains both regular and random patterns. Therefore, based on the AAPs, we may be able to extract the regular patterns. If needed, the future movement direction of an animal can be estimated based on the mobility patterns. We assume that an AAP is represented as Tr = (p1, p2, …, pn), in which each pi is a moving point, as shown in Fig. 1. A moving point is represented by a tuple pi = (xi, yi, ti), in which ti is the timestamp at the moment the point (xi, yi) is sampled.

Fig. 1. Mobility path of an animal in 3D space

We call the frequently followed mobility paths Animal Mobility Patterns (AMPs). Understanding AMPs helps us understand the mobility rules of the animals. This is useful in making decisions about food resource allocation for animals, as well as conservation and exploitation strategies. In addition, we sometimes wish to estimate the movement direction of an animal based on the mobility rules when we know its trajectory up to the current moment. We can predict the next inter-cell movement of the animal by matching its actual current path to one of the existing mobility patterns.

IV. APPROACH TO CLUSTERING ANIMAL MOVING PATHS AND MOVEMENT PREDICTION

This section presents a method for clustering animal trajectories that extends the hierarchical clustering approach. The mobility patterns discovered are then applied to estimate the directional movement of the animals.

A. Method for clustering animal trajectories with a predefined number of clusters

The idea of our algorithm is based on hierarchical agglomerative clustering (HAC); in other words, it works in a bottom-up fashion. Each cluster AMP of animal mobility paths has a representative, rep_AMP. The representative is the path that has the minimum total distance to the rest of the paths in the same group. Figure 2 shows a pair of paths.

Owing to the uncertainty of the mobility, and because we sometimes do not need to know the exact coordinates of the animal, we can transform the absolute position (xi, yi) into a relative position. The smallest unit of the relative position of the animal is a cell of the area covered by the WSN. Accordingly, an AAP can be represented by a series of relative positions. To do that, the AAP is mapped onto the horizontal plane, which is divided into cells. Each cell c is a square, as shown in Figure 2. As a result, the mobility path can be represented by Tr = (c1, c2, …, cn), in which ck denotes the ID of cell k in the coverage region. In this paper, we use the format Tr = (c1, c2, …, cn) in the proposed algorithm to represent an animal path, where ci is the label of the cell in the map into which pi falls.

Fig. 2. The distance between a pair of paths

After mapping the two paths onto the plane, we obtain a series of labels for each path. To determine the distance between two sequences, the Hamming distance metric is used. Assume we have two strings Ta = {a1, a2, …, an} and Tb = {b1, b2, …, bn}. The distance between Ta and Tb is determined by the following equation (we assume the sequences are compared at the same timestamps):

$$\mathrm{hammingDistance}(T_a, T_b) = \sum_{i=1}^{n} \delta(a_i, b_i) \qquad (1)$$

where

$$\delta(a_i, b_i) = \begin{cases} 0, & a_i = b_i \\ 1, & a_i \neq b_i \end{cases} \qquad (2)$$
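The mapping from moving points to cell labels and the Hamming distance of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `to_cell_path`, `hamming_distance`, `CELL_SIZE`, and `GRID_COLUMNS` are hypothetical, the 60-unit cell edge is borrowed from the simulation setting, and the row-major cell numbering with 10 cells per row is an assumption made only so that each cell gets a unique integer label.

```python
from typing import List, Sequence, Tuple

CELL_SIZE = 60     # assumed cell edge length (the simulation uses 60x60-pixel cells)
GRID_COLUMNS = 10  # hypothetical: number of cells per row, used to build a cell label


def to_cell_path(points: Sequence[Tuple[float, float, float]]) -> List[int]:
    """Map moving points (x, y, t) to the labels of the cells they fall in."""
    path = []
    for x, y, _t in points:
        col = int(x // CELL_SIZE)
        row = int(y // CELL_SIZE)
        path.append(row * GRID_COLUMNS + col)  # row-major labeling (assumption)
    return path


def hamming_distance(ta: Sequence[int], tb: Sequence[int]) -> int:
    """Count the positions at which two equal-length label sequences differ."""
    if len(ta) != len(tb):
        raise ValueError("paths must have the same length")
    return sum(1 for a, b in zip(ta, tb) if a != b)
```

Two paths that share no label at any position get the largest possible distance, namely the path length, while identical paths get a distance of zero. Requiring equal lengths mirrors the assumption that the sequences are compared at the same timestamps.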
Applying the Hamming distance to the paths in Figure 2, hammingDistance(A, B) takes its largest value because the two paths have no labels in common, whereas hammingDistance(A, C) is smaller because A and C have labels in common.

Algorithm eHPC1()
Input:
  + D: a set of n animal actual paths (AAPs)
  + k: the number of clusters (AMPs)
Output: k clusters of animal paths
Method:
Begin
  Step 1: Initialize clusters
    Create n clusters AMPs with the corresponding n mobility paths AAPs in D as their representatives
  Step 2: Repeatedly merge the closest clusters
    while (n > k) {
      + Find the two clusters i ≠ j for which hammingDistance(AMPi.rep, AMPj.rep) is minimum among the existing clusters;
      // Merge the two closest clusters
      + AAPs of AMP' ← AAPs of AMPi ∪ AAPs of AMPj;
      + Calculate the representative path of the cluster AMP';
      + n = n – 1;
    }
  return k clusters with their representative paths;
End

Algorithm 1. Extended Hierarchical Path Clustering algorithm eHPC1() with a predefined number k of clusters

Algorithm 1 explains the mechanism for clustering the animal trajectories. It takes as input a set D of n animal actual paths and a predefined number k of clusters. Initially, every animal actual path (AAP) forms a cluster AMP by itself, and this single element also serves as the cluster's representative. At each iteration of eHPC1(), the two closest clusters AMPi and AMPj are merged to form a new cluster AMP' (i.e., the AAPs of AMP' are the union of the AAPs of AMPi and the AAPs of AMPj). After each merge operation, the representative of the new cluster AMP' must be determined. The merge operation is repeated until the number of AMPs reaches the predefined threshold k.

B. Algorithm for prediction of directional movement

With the mobility patterns (AMPs) returned by eHPC1(), we can estimate the path an animal is likely to follow in the near future. An animal's next movement is predicted by finding the AMP that best matches the trajectory the animal has followed up to the current time. The best match is the one with the minimum distance to the current trajectory. If more than one cluster matches, we randomly choose one of them for prediction. Figure 3 shows the predicted movement of an animal whose trajectory up to the current moment is Tr1. The path Tr1 (blue) is closest to the third cluster (rep3, black), so its future mobility is estimated based on the representative rep3.

Fig. 3. Example of predicted directional movement based on clusters of mobility paths

Algorithm 2 shows the process of Prediction of the Directional Movement, PDM(), of an animal when its trajectory Tr1 up to the current moment is known. Given the set of mobility patterns (AMPs), the one whose representative is closest to Tr1 is found, say with representative P = {P1, P2, …, Pm}. The next mobility path will be Tr2 = {Pm+1, Pm+2, …, Pm+r}. Clearly, if the movement Tr1 is random and very different from the mobility rules the animal usually follows, it is impossible to estimate the future mobility. To handle this case, a constraint maxdistance is used: the next movement of the animal can only be estimated if the distance between its path Tr1 and the closest cluster representative is less than that threshold.

Algorithm PDM()
Input:
  + S: set of clusters (AMPs)
  + m: length of the current trajectory Tr1
  + r: length of the next moving path
  + maxdistance: distance threshold
Output: future path Tr2 of the animal
Method:
Begin
  Step 1: Find the pattern AMPu whose representative rep is closest to Tr1
    For (i ∈ set of patterns AMPs)
      u ← the AMPi.rep with minimum hammingDistance(AMPi.rep, Tr1);
  Step 2:
    If (Distance(Tr1, u) < maxdistance)
      // Predict the future movement of Tr1
      Tr2 ← (um+1, …, um+r);   (r is the length of the future movement of Tr1)
    Else
      // Cannot predict
      Tr2 ← ∅;
  Return Tr2;
End

Algorithm 2. Predicting the future mobility path of the animal

V. EXPERIMENT AND RESULTS

This section presents the experimental settings as well as the performance evaluation of the proposed algorithms.

A. Simulation Setting

To assess the performance of the proposed algorithms, we first build a system that simulates the mobility of the animals. Without loss of generality, it is assumed that the animals travel on a 10 by 15 grid of square cells, which gives a total of 150 cells (the area of each cell corresponds to 60x60 pixels). To generate the Animal Actual Paths (AAPs), a number of Animal Mobility Patterns (AMPs) are first defined. The length of an AMP is drawn from a uniform distribution with average length l. Each AMP is generated as a random walk over the grid. The AAPs are then produced by a function that randomly selects AMPs to use as templates. We also use a corruption mechanism to make the AAPs differ from their corresponding AMPs: we insert random cells between the consecutive cells of the AMP. To accomplish this step, we define a deviation ratio θ, which denotes the ratio of the number of such random cells to the number of cells in the corresponding AMP. The total number of AAPs is defined by the user, and from this set we construct the training and test sets: the training set is 90% of the AAPs and the test set is the remaining 10%.

The corresponding figure indicates that the execution time of eHPC1 increases as the average trajectory length l increases. This is consistent with the theoretical analysis of the time complexity of eHPC1 discussed above.

2) Impact of the number of mobility paths

In this test, we investigate the effect of the number of paths. We assign the default value (l = 6) to the average path length of the data sets. The corresponding figure shows that as the number c of mobility paths increases, the running time of eHPC1 also increases.

To evaluate the PDM() algorithm, we used several paths from the original AAPs to test accuracy. Let us assume that there are n original AAPs, and x number of AAPs paths (x
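The two procedures evaluated in this section, eHPC1() and PDM(), can be sketched together in Python as follows. This is an illustrative reading of the pseudocode in Section IV, not the authors' implementation. Several details are assumptions: the function and variable names are hypothetical, all paths are taken to have the same length so that the Hamming distance is defined, ties are broken by taking the first minimum rather than a random choice, and PDM compares Tr1 with the first m labels of each representative.

```python
from typing import List, Optional, Sequence, Tuple

Path = Sequence[int]  # a mobility path as a sequence of cell labels


def hamming_distance(ta: Path, tb: Path) -> int:
    """Number of positions at which two equal-length paths differ."""
    if len(ta) != len(tb):
        raise ValueError("paths must have the same length")
    return sum(1 for a, b in zip(ta, tb) if a != b)


def representative(paths: List[Path]) -> Path:
    """Member path with the smallest total distance to the other members."""
    return min(paths, key=lambda p: sum(hamming_distance(p, q) for q in paths))


def ehpc1(paths: List[Path], k: int) -> Tuple[List[List[Path]], List[Path]]:
    """Merge the two closest clusters until only k clusters remain."""
    if not 1 <= k <= len(paths):
        raise ValueError("k must be between 1 and the number of paths")
    clusters = [[p] for p in paths]  # Step 1: one cluster per path
    reps = list(paths)               # each path is its own representative
    while len(clusters) > k:         # Step 2: repeat merging
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # closest pair of clusters, measured between their representatives
        i, j = min(pairs, key=lambda ij: hamming_distance(reps[ij[0]], reps[ij[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]              # j > i, so index i is unaffected
        del reps[j]
        clusters[i] = merged
        reps[i] = representative(merged)
    return clusters, reps


def pdm(reps: List[Path], tr1: Path, r: int, max_distance: int) -> Optional[List[int]]:
    """Predict the next r cells after tr1, or None if no pattern is close enough."""
    m = len(tr1)
    best_rep, best_dist = None, None
    for rep in reps:
        if len(rep) < m:
            continue  # assumption: skip patterns shorter than the current path
        dist = hamming_distance(rep[:m], tr1)  # assumption: compare with the prefix
        if best_dist is None or dist < best_dist:
            best_rep, best_dist = rep, dist
    if best_rep is None or best_dist >= max_distance:
        return None  # cannot predict: Tr1 is too far from every pattern
    return list(best_rep[m:m + r])
```

Scanning every pair of clusters at each merge keeps the sketch close to the pseudocode and makes its cost visible: the work per merge grows with both the number of clusters and the path length, which matches the trend reported for running time. Returning `None` plays the role of the empty result Tr2 ← ∅ when the maxdistance constraint is not met.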

Date posted: 16/12/2017, 06:28