Engineering Applications of Artificial Intelligence 29 (2014) 33–42
Journal homepage: www.elsevier.com/locate/engappai
doi: 10.1016/j.engappai.2013.12.002

A lossless DEM compression for fast retrieval method using fuzzy clustering and MANFIS neural network

Le Hoang Son a,*, Nguyen Duy Linh a, Hoang Viet Long b

a VNU University of Science, Vietnam National University, 334 Nguyen Trai, Thanh Xuan, Hanoi, Vietnam
b Faculty of Basic Sciences, University of Transport and Communications, Vietnam

* Corresponding author. Tel.: +84 904171284; fax: +84 0438623938. E-mail addresses: sonlh@vnu.edu.vn, chinhson2002@gmail.com (L.H. Son).

Article history: Received July 2013; received in revised form 27 November 2013; accepted December 2013; available online 16 December 2013.

Abstract

In this paper, we propose an integrated approach between fuzzy C-means (FCM) and the multi-active neuro fuzzy inference system (MANFIS) for the lossless DEM compression for fast retrieval (DCR) problem, which aims to compress digital elevation model (DEM) data with the priority of fast retrieval from the client machine over the Internet. Previous research on this problem either used float wavelet transforms integrated with SPIHT coding or constructed a predictor model using the statistical correlation of DEM data in local neighborhoods, thus giving large-sized compressed data and slow data transfer between the server and the client. Based on the observation that different non-linear transforms for the predictive values in the sliding windows may increase the compression ratio, we herein present a novel approach for the DCR problem and validate it experimentally on benchmark DEM datasets. The comparative results show that our method produces a better compression ratio than the relevant ones. © 2013 Elsevier Ltd. All rights reserved.

Keywords: Data compression; DCR problem; DEM; Fuzzy clustering; MANFIS

1. Introduction

A digital elevation model (DEM) (a.k.a. terrain) is the most popular source among all kinds of terrain data; it reflects the vertical and horizontal dimensions of the land surface and is expressed in terms of the elevation, slope, and orientation of terrain features. It has been applied in various applications such as flood or drainage modeling, land-use studies, geological applications, rendering of 3D visualizations, rectification of aerial photography or satellite imagery, line-of-sight analysis, etc. According to Li et al. (2005) and Son et al. (2012, 2013), a DEM is commonly built using remote sensing techniques, e.g., photogrammetry, LiDAR, IfSAR, or from land surveying, so that its size is often large, for instance approximately several gigabytes for a medium-resolution terrain.

Transferring DEMs over the Internet is necessary not only for sharing new knowledge about terrain features but also for storing and retrieving relevant information about those terrains. Nevertheless, since the sizes of DEMs are large, compressing such a dataset is a must before sending it over the Internet. As such, the lossless DEM compression for fast retrieval (DCR) problem was designed to compress a DEM with the priority of fast retrieval from the client machine over the Internet.

Several works concerning the DCR problem have been presented.
Boucheron and Creusere (2003, 2005) used float wavelet transforms such as the 5/3, 9/7, and max wavelets to decompose a terrain into sub-bands, such that lower sub-bands correspond to higher terrain frequencies and higher sub-bands correspond to lower terrain frequencies, where most of the image energy is concentrated. The coefficients are then quantified before encoding. After quantification, the coefficients can be encoded by SPIHT coding, which transmits the most important image information first. The searching process is fast; however, the compression process is slow, since much computation is required by the SPIHT and wavelet transforms.

Kidner and Smith (2003) presented an algorithm for the DCR problem using the statistical correlation of terrain data in local neighborhoods. Elevation data are pre-processed by simple linear prediction algorithms such as the 3-point, 8-point, 12-point, or 24-point Lagrange. The differences between the predictions and the real elevations are then compressed by Arithmetic Coding. This algorithm is simple and obtains fast compression times. Moreover, the compressed files are less than half the size of DEMs encoded by GZIP. However, the algorithm does not support the retrieval process.

Inanc (2008) introduced ODETCOM, which is a predictor model using a causal template of size eight (a linear predictor). An over-determined system of linear equations, corresponding to the prediction and consisting of the eight elevations in the causal template and a constant term, was used to find a compromise for the best set of coefficients. The compression ratio of ODETCOM is better than those of JPEG 2000 (Marcellin et al., 2000) (lossless mode) and JPEG-LS (Rane and Sapiro, 2001). Nonetheless, ODETCOM has high computational complexity, depending on the time needed to solve the over-determined linear equation systems.

Zheng et al. (2009) focused on DEM multi-scale representation, progressive compression, and transmission based on the integer lifting wavelet. Through a series of experiments on different wavelet transforms, the 2/6 integer wavelet was found to be the most suitable transform for DEM multi-scale progressive compression among the 14 reversible integer wavelet transforms compared (Adams and Kossentini, 2000).

Our previous work in Son et al. (2011) presented an algorithm named DCRA for the DCR problem using sliding windows and the modified Kidner and Smith (2003) method, with the 1-point Lagrange being applied to all sliding windows. Motivated by the multi-resolution mechanism, the client machine can access some parts of a DEM equivalent to a specific level of resolution, thus accelerating the retrieval process from the client.

Fig. 1. Two levels of resolutions.

A summary of the relevant works for the DCR problem is highlighted below:

- The relevant studies either used float wavelet transforms integrated with SPIHT coding or constructed a predictor model using the statistical correlation of the DEM in local neighborhoods, thus giving large-sized compressed data (or a low compression ratio) and slow data transfer between the server and the client.
- The comparison in Son et al. (2011) showed that DCRA obtains faster compression and retrieval times than the Kidner and Smith (2003) method and other relevant ones, whilst its compression ratio is approximate to those of the other methods.

From those remarks, we clearly recognize that a low compression ratio is the major problem of all the algorithms, affecting the total processing time, which includes the compression, transferring, and retrieval times, and that DCRA is the best method for the DCR problem.
Thus, our objective in this study is to enhance the compression ratio of DCRA by using an integrated approach between fuzzy C-means (FCM) and the multi-active neuro fuzzy inference system (MANFIS). It is motivated by the fact that different non-linear transforms for the predictive values in the sliding windows of DCRA may increase the compression ratio. Our contribution in this work is the introduction of a novel method named F-MANFIS that incorporates FCM (Bezdek et al., 1984) with MANFIS to determine similar sliding windows and to find the coefficients of the non-linear transforms for those sliding windows. F-MANFIS will be compared with DCRA and other methods in terms of the compression ratio on benchmark DEM datasets to verify the effectiveness of the proposed method.

The rest of the paper is organized as follows. Section 2 analyzes the DCR problem and the DCRA method. The ideas and details of F-MANFIS are introduced in Section 3. Section 4 validates the proposed approach through a set of experiments involving benchmark DEM data. Finally, Section 5 draws the conclusions and delineates future research directions.

2. Analyses of the DCR problem and the DCRA method

In this section, we describe the DCR problem and the DCRA method. Let us briefly present some basic notations from Son et al. (2011) as follows (Figs. 1–3):

Fig. 2. A set of moving steps.
Fig. 3. The DCR problem.

Definition 1. A sliding window (SW) with an original point $(x_0, y_0)$ and sizes $(w, h)$ is defined as

$$SW(x_0, y_0) = \{(x_0 + u, y_0 + v) \mid u \in [0, w],\ v \in [0, h]\}. \quad (1)$$

Definition 2. A set of moving steps for the point $(x, y)$ with arguments $(i, j)$ is denoted as

$$\Delta_{i,j}(x, y) = \{(x + \delta_1, y + \delta_2) \mid \delta_1 = -i, 0, i;\ \delta_2 = -j, 0, j\}. \quad (2)$$

Definition 3. The set of original points and their sizes at level $k$ of resolutions is specified below:

$$(x_{0i}, y_{0j}) = \left(\frac{W_0}{2^k}\, i,\ \frac{H_0}{2^k}\, j\right), \quad i, j = 1, \ldots, 2^k - 1,\ k \ge 1, \quad (3)$$

$$(w, h) = \left(\frac{W_0}{2^k},\ \frac{H_0}{2^k}\right), \quad (4)$$

where $(W_0, H_0)$ are the sizes of a DEM. From now on, we denote $SW^k(x_{0i}, y_{0j})$ as a sliding window with the original point $(x_{0i}, y_{0j})$ at level $k$.
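To make Definitions 1–3 concrete, the sketch below enumerates the sliding-window origins and sizes at a given level k according to Eqs. (3) and (4). It is a minimal illustration; the function and variable names are ours and not from the original paper.

```python
# Sketch: enumerating sliding-window origins and sizes at resolution
# level k, following Eqs. (3)-(4). Names are illustrative only.

def sliding_windows(W0, H0, k):
    """Return the origins (x0i, y0j) and the common size (w, h) at level k."""
    w, h = W0 / 2**k, H0 / 2**k               # Eq. (4)
    origins = [(W0 / 2**k * i, H0 / 2**k * j)  # Eq. (3)
               for i in range(1, 2**k)
               for j in range(1, 2**k)]
    return origins, (w, h)

origins, size = sliding_windows(1200, 1200, k=2)
print(len(origins), size)   # 9 origins, each window of size (300.0, 300.0)
```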
From those definitions, the DCR problem can be interpreted as the compression of a DEM at the server machine that allows the fast display of a sliding window at a specific level of resolutions, $SW^k(x_{0i}, y_{0j})$, from the client over the Internet without extracting the whole compressed terrain.

According to Section 1, DCRA, which is influenced by the multi-resolution mechanism, is the best method for the DCR problem. This method consists of two phases, compression and retrieval, and the effectiveness of the compression process decides the speed of retrieval from the client machine. The mechanism of the compression in DCRA is depicted in Fig. 4. Sliding windows, generated from a DEM by the splitting mechanism at a specific level of resolutions, are transformed by the modified Lagrange 1-point. The results of that procedure, including the template point and the prediction errors, are compressed by Arithmetic Coding and stored at the server machine. At the client machine, a sliding window and its neighborhoods, specified by the set of moving steps, can be retrieved from the equivalent compressed ones at the server. This improves the compression and retrieval times in comparison with the Kidner and Smith method and other relevant ones, since a small number of compressed sliding windows is invoked and transferred to the client instead of the whole DEM.

Fig. 4. The compression process in DCRA.

Despite the fact that processing time is the advantage of DCRA, its compression ratio is still approximate to those of the other relevant methods. The reason for this is the use of the modified Lagrange 1-point in the compression process, described below:

$$z = x. \quad (5)$$

In Eq. (5), $x$ is a template point in a sliding window and $z$ is its predictive value. Besides the Lagrange 1-point, there exist other Lagrange transforms, such as the 3-point (JPEG encoder), 8-point, 12-point, and 24-point, that use multiple template points in the sliding window to predict a value in that window. Now, let us discuss some remarks on the Lagrange transforms:

- The compression ratio can be enhanced by a multiple-template-point Lagrange transform: Kidner and Smith (2003) argued that the Lagrange 1-point is often used since it requires less computation than the other transforms. Nevertheless, the more template points in the sliding window the transform uses to predict a value, e.g., 8, 12, or 24 points, the better the accuracy of the result. Thus, a multiple-template-point Lagrange transform should be considered for the improvement of the compression ratio in DCRA.
- A non-linear Lagrange transform should be used instead of a linear one: the above Lagrange transforms are linear and expressed by Eq. (6). The advantages of linear transforms are their simplicity and fast processing time. However, linear transforms do not minimize the prediction errors as expected. Indeed, it is better to use a non-linear Lagrange transform instead of a linear one:
  $$z = \sum_{i=1}^{n} \alpha_i x_i. \quad (6)$$
- Different transforms should be used for different predictive values: in Kidner and Smith (2003) and Son et al. (2011), the authors used one transform function for all predictive values. Even though this reduces the computational time of the algorithms, the prediction errors are not minimal, since each predictive value has a different correlation with its neighborhood. As such, various transform functions should be used for the predictive values in order to achieve a high compression ratio (see the sketch after this list).
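As a concrete illustration of Eqs. (5) and (6), the sketch below compares the prediction residuals of the 1-point predictor with those of a linear multiple-template-point predictor on a toy elevation profile. The data and coefficient values are made up for illustration and are not from the paper.

```python
import numpy as np

# Toy elevation profile (illustrative values, not real DEM data).
elev = np.array([100.0, 101.5, 103.0, 104.2, 105.1, 105.9, 106.4])

# Eq. (5): 1-point Lagrange -- predict each value by its predecessor.
res_1pt = elev[1:] - elev[:-1]

# Eq. (6): linear multi-point predictor z = sum(alpha_i * x_i) over the
# two previous samples; alpha chosen here as a simple linear extrapolation.
alpha = np.array([-1.0, 2.0])            # z = 2*x_{t-1} - x_{t-2}
res_multi = elev[2:] - (alpha[0] * elev[:-2] + alpha[1] * elev[1:-1])

# Smaller residuals compress better under Arithmetic Coding.
print(np.abs(res_1pt).mean(), np.abs(res_multi).mean())   # ~1.07 vs ~0.20
```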
Based upon those remarks, we will design a novel compression algorithm, so-called F-MANFIS, that uses fuzzy C-means (FCM) (Bezdek et al., 1984) to determine similar sliding windows in a DEM and a MANFIS neural network to find the coefficients of the non-linear transforms for those sliding windows.

3. The F-MANFIS method

3.1. The algorithm

From Section 2, we clearly recognize that using various non-linear multiple-template-point Lagrange transforms for the predictive values in a sliding window enhances the compression ratio of the algorithm. Nonetheless, the number of predictive values in all sliding windows is large, and we cannot apply the strategy above to every predictive value individually. Instead, the same Lagrange transform can be applied to all predictive values in a sliding window and to those in neighboring sliding windows, since nearby sliding windows in a DEM share similar characteristics. This observation is influenced by the spatial interaction principle (Birkin and Clarke, 1991) that dominates the creation of DEMs and other geographic sources. The usage of Lagrange transforms in this work differs from that in the DCRA method, which applies one transform to all sliding windows. Consequently, a fuzzy clustering method is required to determine similar sliding windows.

Fig. 5 describes the flowchart of the F-MANFIS method. According to this figure, based upon a desired level of resolution, a DEM is divided into sliding windows by the splitting mechanism (Son et al., 2011). These sliding windows are then classified into groups by FCM (Bezdek et al., 1984), which is a widely known fuzzy clustering algorithm for classification problems, e.g., image segmentation, fingerprint and gesture recognition, etc.

Fig. 5. The F-MANFIS flowchart.

Once we have the groups of sliding windows, the MANFIS neural network is applied to the basic sliding windows of those groups. MANFIS is a generalization of CANFIS (Jang, 1993), which is a fuzzy-neural network designed to construct a multiple-output model with non-linear fuzzy rules such as the first-order Sugeno fuzzy rule. By integrating a modular neural network, CANFIS is able to rapidly and accurately approximate complex functions. According to Parthiban and Subramanian (2007), CANFIS can solve problems more efficiently than other types of neural networks when the underlying function to model is highly variable or locally extreme. Nonetheless, in order to find a non-linear transform function that is best suited to a sliding window, an extension of CANFIS is required. Thus, we propose a novel fuzzy-neural network, the so-called multi-active neuro fuzzy inference system (MANFIS), which is able to solve the MIMO (multiple inputs–multiple outputs) model by various first-order Sugeno fuzzy rules.

Since the basic sliding window of a group contains principal elevations that are closely related to those of the other related sliding windows, we first apply the MANFIS neural network to the basic sliding window and receive the template points, the prediction errors, and the MANFIS parameters used to restore the original basic sliding window from the compressed one. This step consists of several time-consuming training iterations of the MANFIS parameters in order to obtain the optimal ones that minimize the prediction errors. Next, those parameters (a.k.a. coefficients) are applied to the other related sliding windows, e.g., R-1. This reduces the computational complexity of the algorithm, since the basic sliding window and the other related sliding windows of a group have a strong spatial correlation. The outputted results are the template points and the prediction errors of the related sliding windows. All outputs of the group are compressed by the Arithmetic Coding algorithm and stored at the server as in Son et al. (2011). Similar activities are applied to the other groups of sliding windows. The F-MANFIS algorithm stops when all groups are compressed and stored at the server.

The pseudo-code below shows how FCM groups the sliding windows.

Input:
  - A set of sliding windows
  - Number of groups (P)
Output:
  - P groups of sliding windows, including centers
FCM algorithm:
  1: For each sliding window, calculate the mean value of all data points
  2: Use the FCM algorithm (Bezdek et al., 1984) to classify the set of mean values into P groups
  3: For each group
  4:   Assign the data member that is nearest to the center as the new center; its sliding window is called the basic sliding window
  5:   The sliding windows holding the other data members are called related sliding windows (R-)
  6: End For

Example 1. In Fig. 6, we illustrate the activities of the FCM algorithm. A DEM is divided into 16 sliding windows at level 2 of resolutions. These sliding windows are classified into three groups with the centers marked in red, green, and blue (basic sliding windows). Sliding windows marked R-1 belong to the cluster whose center is green. Similarly, R-2 (R-3) relates to the blue (red) center.

Fig. 6. Activities of FCM. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
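The grouping procedure above can be sketched as follows, clustering the per-window means with FCM. This is a minimal sketch assuming a library FCM implementation is available (here scikit-fuzzy's `fuzz.cluster.cmeans`); the surrounding names are ours.

```python
import numpy as np
import skfuzzy as fuzz  # assumed available; any FCM implementation works

def group_windows(windows, P):
    """Group sliding windows by FCM on their mean elevations (steps 1-2),
    then pick the member nearest each center as the basic window (steps 3-5)."""
    means = np.array([w.mean() for w in windows])              # step 1
    cntr, u, *_ = fuzz.cluster.cmeans(means[None, :], c=P, m=2.0,
                                      error=1e-5, maxiter=100)  # step 2
    labels = u.argmax(axis=0)                                  # hard assignment
    groups = []
    for g in range(P):
        members = np.where(labels == g)[0]
        basic = members[np.abs(means[members] - cntr[g, 0]).argmin()]  # steps 3-4
        related = [m for m in members if m != basic]           # step 5
        groups.append((basic, related))
    return groups
```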
3.2. Using MANFIS for basic sliding windows

As we can recognize from Fig. 5, the most important part of F-MANFIS is using the MANFIS neural network for the basic sliding window, which generates the optimal MANFIS parameters for the other related sliding windows. This sub-section describes in detail the structure, the working mechanism, and the training phase of MANFIS.

The structure of MANFIS is depicted in Fig. 7. The working mechanism of MANFIS is described as follows. The two input variables $x$ and $y$ are the template points in the basic sliding window. They are modeled by fuzzy sets having the domain of values {"High", "Low"}. MANFIS is capable of processing the MIMO model by first-order Sugeno fuzzy rules of the following form:

Rule 1: IF ($x$ is $A_1$) and ($y$ is $B_1$) THEN ($u_1$ is $C_{11}$) and ($u_2$ is $C_{12}$) and ($u_3$ is $C_{13}$) ... ($u_n$ is $C_{1n}$). (7)

MANFIS consists of four layers, as described below:

Layer 1 (fuzzification): each node in this layer is the membership grade of a fuzzy set and specifies the degree to which the given input belongs to one of the fuzzy sets. The membership function is

$$\mu_i(x) = \frac{1}{1 + \exp(-\lambda_i(x - c_i))}, \quad i = 1, \ldots, 4, \quad (8)$$

where $(\lambda_i, c_i)$, $i = 1, \ldots, 4$, are the premise parameters.

Layer 2: each node in this layer is the product of all incoming signals:

$$w_j = \mu_{A_j}(x) \times \mu_{B_j}(y), \quad j = 1, 2. \quad (9)$$

Layer 3 (synthesis): each node in this layer is the sum of the output values from the previous layer:

$$O^3_j = \sum_{i=1}^{2} w_i \times C_{ij}, \quad j = 1, \ldots, n, \quad (10)$$

where $n$ is the number of predictive values except the two template points, and $C_{ij} \in [0, 1]$, $j = 1, \ldots, n$, $i = 1, 2$, are the consequence parameters. The last node in the layer is calculated as the direct sum of the output values:

$$O^3_{n+1} = w_1 + w_2. \quad (11)$$

Layer 4 (output): the output results are specified as follows:

$$u_j = O^3_j / O^3_{n+1}, \quad j = 1, \ldots, n. \quad (12)$$

Obviously, based on the premise and consequence parameters, the predictive values $u_j$ ($j = 1, \ldots, n$) can be calculated from the two template points $x$ and $y$.

Fig. 7. The structure of MANFIS.
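The four layers of Eqs. (8)–(12) amount to a short forward pass, sketched below. The pairing of membership nodes into the two rule strengths is our reading of Eq. (9), and parameter values are picked arbitrarily for illustration.

```python
import numpy as np

def manfis_forward(x, y, lam, c, C):
    """Forward pass of MANFIS per Eqs. (8)-(12).
    lam, c: premise parameters of the four membership nodes (Eq. (8));
    C: 2 x n consequence parameters in [0, 1] (Eq. (10))."""
    mu = 1.0 / (1.0 + np.exp(-lam * (np.array([x, x, y, y]) - c)))  # Layer 1
    w = np.array([mu[0] * mu[2], mu[1] * mu[3]])                    # Layer 2, Eq. (9)
    O3 = w @ C                                                      # Layer 3, Eq. (10)
    return O3 / w.sum()                                             # Layer 4, Eqs. (11)-(12)

rng = np.random.default_rng(0)
n = 14                                  # predictive values besides the 2 template points
u = manfis_forward(0.42, 0.58,
                   lam=rng.uniform(1, 5, 4), c=rng.uniform(0, 1, 4),
                   C=rng.uniform(0, 1, (2, n)))
print(u.shape)                          # (14,): predictions for one sliding window
```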
The mean square error (MSE) is calculated as the difference between the basic sliding window and its predictive sliding window:

$$\mathrm{MSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (u_j - x_j)^2}, \quad (13)$$

where $x_j$ ($j = 1, \ldots, n$) is the original value of $u_j$ in the basic sliding window.

In order to minimize the MSE value, the training phase of MANFIS is required. In this part, we use a hybrid algorithm between gradient descent (GD) and particle swarm optimization (PSO) (Kennedy and Eberhart, 1995) to train the MANFIS network. The training consists of two sub-phases: forward and backward. In the forward sub-phase, we fix the premise parameters and use PSO to train the consequence ones. Then, in the backward sub-phase, we use the optimal consequence parameters, corresponding to the global best swarm particle in PSO, to train the premise ones by the GD method. Those premise parameters are again used in the forward sub-phase of the next iteration step. The training phase is performed repeatedly until the MSE is smaller than a pre-defined threshold (ε).

The reasons for using the hybrid method between GD and PSO for the training are as follows. Since we have two kinds of parameters to be trained, the premise and the consequence, different training methods are needed to adapt to their characteristics. More specifically, the number of consequence parameters is equivalent to the number of elements in a basic sliding window, and they contribute greatly to the calculation of the MSE value between the basic sliding window and its predictive one. Thus, a meta-heuristic method that can handle a large number of particles and converge quickly to the optima, such as PSO, should be chosen. Conversely, because the number of premise parameters is small and their values are strongly affected by the consequence parameters, the fast GD training method is used for them. Another advantage of the hybrid method between GD and PSO is the balance between the computational time and the compression ratio. This explains why we combine the GD and PSO methods to train the parameters.

The GD training algorithm is well known and carefully described in Ooyen and Nienhuis (1992); hence, we omit its description and concentrate on the PSO algorithm. PSO is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995, inspired by the social behavior of bird flocking and fish schooling. Generally, it is based on the principle: "the best strategy to find the food is to follow the bird which is nearest to it". Thus, each single solution in PSO is a "bird", or "particle", in the search space. All particles have fitness values, which are evaluated by the fitness function to be optimized, and velocities, which direct the flight of the particles. The particles fly through the problem space by following the current optimum particles. The training of the forward sub-phase using PSO is described below:

- Encode/decode: the initial population is initiated with Q particles, where Q is a design parameter. Each particle is a vector $v = (C_{11}, C_{21}, C_{12}, C_{22}, \ldots, C_{1n}, C_{2n})$, where $C_{ij}$ ($j = 1, \ldots, n$; $i = 1, 2$) is a consequence parameter. These particles are randomly initiated in [0, 1].
- The fitness function:
  $$f(v) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{w_1 C_{1i} + w_2 C_{2i}}{w_1 + w_2} - x_i \right)^2, \quad (14)$$
  where $x_i$ ($i = 1, \ldots, n$) is the original value of $u_i$ in the basic sliding window.
- The criterion to choose the best values in PSO:
  $$f(v) \to \min. \quad (15)$$

After a number of training iterations (Train_Iter), the optimal solution of {premise, consequence} parameters is found. We then need to perform the backward sub-phase to converge to the global solution.
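To make Eqs. (14)–(15) concrete, the sketch below runs the forward sub-phase: a standard global-best PSO over the consequence vector $v = (C_{11}, C_{21}, \ldots, C_{1n}, C_{2n})$ with the fitness of Eq. (14), for fixed firing strengths $w_1, w_2$. The inertia and acceleration constants here are ours, since the paper's exact velocity-update form is not fully specified; all other names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward sub-phase (Eqs. (14)-(15)): PSO over the consequence parameters
# with the premise-derived weights w1, w2 held fixed.
n, Q, iters = 14, 100, 100
w1, w2 = 0.6, 0.4                      # firing strengths from Layer 2, fixed here
x = rng.uniform(0, 1, n)               # original values of the basic window

def fitness(v):                        # Eq. (14); v = (C_11, C_21, C_12, C_22, ...)
    C1, C2 = v[0::2], v[1::2]
    u = (w1 * C1 + w2 * C2) / (w1 + w2)
    return np.mean((u - x)**2)

swarm = rng.uniform(0, 1, (Q, 2 * n))  # particles initiated in [0, 1]
vel = np.zeros_like(swarm)
pbest = swarm.copy()
gbest = min(swarm, key=fitness).copy()
for _ in range(iters):
    r1, r2 = rng.random((2, Q, 1))
    vel = 0.5 * vel + 1.5 * r1 * (pbest - swarm) + 1.5 * r2 * (gbest - swarm)
    swarm = np.clip(swarm + vel, 0, 1)
    improved = [fitness(s) < fitness(p) for s, p in zip(swarm, pbest)]
    pbest[improved] = swarm[improved]
    gbest = min(pbest, key=fitness).copy()
print(fitness(gbest))                  # Eq. (15): f(v) -> min
```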
Now F-MANFIS is fully described. But before we evaluate the proposed method, let us raise an important question. In the flowchart of the algorithm in Fig. 5, we used FCM to classify similar sliding windows into groups. However, how many groups of sliding windows are enough for F-MANFIS? In other words, we have to specify a suitable value of the parameter P in order to balance the compression ratio against the computational complexity of the algorithm. The following theorem helps us answer this question.

Theorem 1. The acceptable range of the parameter P is $[2, N/3]$, where N is the number of sliding windows.

Proof. The number of premise and consequence parameters of MANFIS is $(2n + 8)$, where $(n + 2)$ is the number of values in a sliding window. The upper bound of the parameter P is therefore

$$\frac{N(n+2)}{2(n+2) + (n+2)} \approx \frac{N}{3}. \quad (16)$$

Using this result, we can choose a suitable value of the parameter P for a given level of resolutions and a given terrain.

4. Results

4.1. Experimental environment

In this part, we describe the experimental environment:

- Experimental tools: we implemented the proposed F-MANFIS method, in addition to Kidner and Smith (2003) and DCRA (Son et al., 2011), in the Java programming language and executed them on a PC with an Intel Pentium 4 CPU at 2.66 GHz and an 80 GB HDD.
- Parameter settings: according to Kennedy and Eberhart (1995), the following values are assigned to the PSO parameters in order to achieve the best results:
  ○ The number of particles in PSO: Q = 100.
  ○ The maximal number of iterations in PSO: 100.
  ○ The velocity and position parameters in PSO: (c1, c2, c3) = (0.2, 0.3, 0.5).
  ○ The threshold ε = 0.01 and the number of groups P = 2.
  ○ The total number of iterations in the training phase: Train_Iter = 50.
- Experimental datasets: a benchmark set of 24 USGS 1:250,000-scale DEMs on a 1201 × 1201 grid (see http://dds.cr.usgs.gov/pub/data/DEM/250/). Each DEM is the first in each higher-level USGS directory (ordered alphabetically) for which there is a DEM entry (there are no DEMs for X or Y). Summary statistics of these data can be found in Kidner and Smith (2003), including the elevation ranges, mean elevations, standard deviations, and the entropies or information contents of the original data.
- Objective: we evaluate the compression ratios and compression times of the algorithms. Some experiments to choose a suitable value of the parameter P, and a comparison of the training methods in F-MANFIS, are also considered.

4.2. The comparison of compression ratio

In this section, we use the three algorithms Kidner & Smith, DCRA, and F-MANFIS to compress the 24 DEM datasets and record their compressed sizes. The comparison of the compressed sizes is given in Table 1.

Table 1. The comparison of compressed sizes of algorithms (kB).

No | Dataset      | Original size | F-MANFIS | DCRA     | Kidner & Smith
 1 | Aberdeen     | 5636.84 | 143.36 | 141.9023 | 140.85
 2 | Baker        | 6634.1  | 245.76 | 760.57   | 720.41
 3 | Caliente     | 6929.23 | 217.09 | 653.85   | 610.37
 4 | Dalhart      | 7045.44 | 200.7  | 252.94   | 242.12
 5 | Eagle Pass   | 5636.84 | 147.46 | 249.29   | 236.21
 6 | Fairmont     | 5636.84 | 143.36 | 213.73   | 200.47
 7 | Gadsden      | 5636.84 | 188.42 | 514.33   | 479.43
 8 | Hailey       | 7044.76 | 172.03 | 803.64   | 759.57
 9 | Idaho Falls  | 7045.44 | 237.57 | 241.52   | 239.22
10 | Jacksonville | 3058.48 | 135.17 | 81.73    | 93.02
11 | Kalispell    | 6797.47 | 192.51 | 787.45   | 741.81
12 | La Crosse    | 5636.84 | 147.46 | 669.01   | 624.02
13 | Macon        | 4466.25 | 253.95 | 320.35   | 297.12
14 | Nashville    | 5636.84 | 151.55 | 479.86   | 444.17
15 | O'Neill      | 5636.84 | 163.84 | 329.22   | 310.19
16 | Paducah      | 5578.93 | 151.55 | 332.72   | 309.13
17 | Quebec       | 5469.94 | 143.36 | 406.32   | 375.54
18 | Racine       | 5636.84 | 151.55 | 77.54    | 85.63
19 | Sacramento   | 6293.56 | 217.09 | 870.08   | 826.87
20 | Tallahassee  | 3984.84 | 151.55 | 267.56   | 256.96
21 | Ukiah        | 5521.38 | 143.36 | 668.62   | 639.96
22 | Valdosta     | 4223.39 | 151.55 | 137.36   | 136
23 | Waco         | 5504.55 | 151.55 | 320.56   | 294.48
24 | Yakima       | 5842.81 | 143.36 | 587.99   | 551.16

Obviously, the compressed sizes of F-MANFIS are mostly smaller than those of DCRA and Kidner & Smith. For example, the original size of dataset no. 4 (Dalhart) is 7045.44 kilobytes (kB), i.e., 6.88 megabytes (MB). The compressed size of this dataset using Kidner & Smith (DCRA) is 242.12 (252.94) kB. The corresponding value for F-MANFIS is 200.7 kB, which is approximately 82.9% (79.3%) of that of Kidner & Smith (DCRA). Similarly, the compressed sizes of F-MANFIS, DCRA, and Kidner & Smith for dataset no. 24 (Yakima) are 143.36, 587.99, and 551.16 kB, respectively. The compressed size of F-MANFIS in this case is approximately 26% (24.4%) of that of Kidner & Smith (DCRA).

Nonetheless, there exist some cases that show the inferiority of F-MANFIS to the other algorithms, such as dataset no. 1 (Aberdeen), no. 10 (Jacksonville), no. 18 (Racine), and no. 22 (Valdosta). In those cases, the compressed sizes of F-MANFIS are larger than those of DCRA and Kidner & Smith. However, the size differences between the algorithms are not large, and the number of such cases is small in comparison with the rest. Therefore, F-MANFIS achieves better compressed sizes than the other relevant algorithms.

From Table 1, we find that the maximal differences of compressed sizes between F-MANFIS and the other algorithms are recorded for dataset no. 19 (Sacramento). In this case, the compressed sizes of F-MANFIS, DCRA, and Kidner & Smith are 217.09, 870.08, and 826.87 kB, respectively. The difference of compressed sizes between F-MANFIS and DCRA (Kidner & Smith) is 652.99 (609.78) kB. The compressed size of F-MANFIS in this case is approximately 26.2% (24.95%) of that of Kidner & Smith (DCRA). Similarly, the minimal differences of compressed sizes between F-MANFIS and the other algorithms are found for dataset no. 9 (Idaho Falls). The difference of compressed sizes between F-MANFIS and DCRA (Kidner & Smith) in this case is 3.95 (1.65) kB. The compressed size of F-MANFIS in this case is approximately 99.3% (98.3%) of that of Kidner & Smith (DCRA).

Fig. 8 visualizes the compressed sizes of the three algorithms on the 24 DEM datasets. As we can recognize from this figure, the compressed-size line of F-MANFIS lies mostly below the lines of DCRA and Kidner & Smith. The compressed size of F-MANFIS is approximately 60.9% (61.6%) of that of DCRA (Kidner & Smith). Thus, the hybrid approach between FCM and MANFIS in the proposed F-MANFIS yields better compressed sizes of data.

Fig. 8. The compressed sizes of algorithms on 24 benchmark DEM datasets (kB).

In Fig. 9, we depict the compression ratios of all algorithms, calculated as the percentage of the compressed size over the original size of a DEM dataset. The results show that the compression ratio of F-MANFIS is better than those of DCRA and Kidner & Smith. The average compression ratio of F-MANFIS on the 24 DEM datasets is 3.09, which means that the compressed size of a dataset is approximately 3.09% of the original size. Meanwhile, the average compression ratios of DCRA and Kidner & Smith are 7.22 and 6.84, respectively.

Fig. 9. The compression ratios of algorithms.

In Fig. 10, the performance of all algorithms, calculated by subtracting the compression ratio from 100%, is described. This figure shows how many percent of the data can be compressed away by a compression algorithm. It is clear that F-MANFIS is quite stable across the various cases. The performance of F-MANFIS is 96.9%, whilst those of DCRA and Kidner & Smith are 92.8% and 93.2%, respectively. This re-confirms that F-MANFIS is better than DCRA and Kidner & Smith in terms of compression ratio.

Fig. 10. The performance of algorithms.
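For reference, the compression-ratio and performance figures quoted above follow directly from Table 1; a sketch of the computation (our code, not the paper's) is:

```python
# Compression ratio (%) = compressed size / original size * 100;
# performance (%) = 100 - compression ratio, as used for Figs. 9 and 10.
def ratio_and_performance(original_kb, compressed_kb):
    ratio = 100.0 * compressed_kb / original_kb
    return ratio, 100.0 - ratio

# Example with the Dalhart row of Table 1 (no. 4).
print(ratio_and_performance(7045.44, 200.7))   # ~ (2.85, 97.15)
```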
We also investigated the compressed times of all algorithms. The results in Table 2 show that F-MANFIS takes more time to compress the datasets than the other algorithms. Nonetheless, the compressed time of F-MANFIS is around 303 s (≈5 min) per DEM. This is small and acceptable in our context. Moreover, the experimental results also affirm a remark in Son et al. (2011) that proves the advantage of DCRA over the other algorithms in terms of processing time.

Table 2. The comparison of compressed time of algorithms (s).

No | Dataset      | F-MANFIS | DCRA | Kidner & Smith
 1 | Aberdeen     | 293.6 | 5.8  | 5.8
 2 | Baker        | 289.1 | 20.5 | 25.7
 3 | Caliente     | 289.1 | 14.6 | 19.6
 4 | Dalhart      | 297.7 | 6.4  | 7.0
 5 | Eagle Pass   | 318.7 | 6.6  | 6.9
 6 | Fairmont     | 318.6 | 6.0  | 6.03
 7 | Gadsden      | 321.7 | 11.3 | 14.8
 8 | Hailey       | 310.3 | 20   | 25.6
 9 | Idaho Falls  | 337.7 | 6.4  | 7.02
10 | Jacksonville | 350.1 | 4.9  | 4.7
11 | Kalispell    | 304.5 | 19.3 | 24.5
12 | La Crosse    | 308.5 | 15.6 | 19.6
13 | Macon        | 298.8 | 7.4  | 8.3
14 | Nashville    | 301.1 | 10.1 | 12.6
15 | O'Neill      | 285.9 | 7.9  | 9.0
16 | Paducah      | 289.2 | 7.9  | 9.3
17 | Quebec       | 294.6 | 10.6 | 11.3
18 | Racine       | 306.5 | 5.3  | 5.0
19 | Sacramento   | 300.1 | 21.7 | 31.1
20 | Tallahassee  | 298.3 | 6.4  | 7.8
21 | Ukiah        | 301.8 | 17.4 | 20.7
22 | Valdosta     | 295.3 | 5.03 | 5.3
23 | Waco         | 291.8 | 7.4  | 10.1
24 | Yakima       | 280.3 | 12.8 | 17.01

Some remarks found from Section 4.2 are the following:

- The compression ratio of F-MANFIS is better than those of the other relevant algorithms.
- The performance of F-MANFIS is 96.9%.
- The compressed time of F-MANFIS is slower than those of the other algorithms.

4.3. Choosing an optimal value for the number of groups

In this section, we made some experiments to determine the optimal number of groups of sliding windows (P) for our experimental datasets. Even though we have a theoretical analysis of this determination through Theorem 1, experimentation to specify the exact value of P is still necessary. We re-ran the F-MANFIS algorithm with various numbers of groups and on different datasets, especially the bad cases from Section 4.2: dataset no. 1 (Aberdeen), no. 10 (Jacksonville), no. 18 (Racine), and no. 22 (Valdosta). The level of resolutions used for the experiments, both in the current section and in the previous one, is 2. Thus, we have N = 4² = 16 sliding windows. According to Theorem 1, P ≤ ⌈N/3⌉ = 6. Therefore, the possible values of P belong to the interval [2, 6]. The experimental results are shown in Table 3.

Table 3. The compressed size of F-MANFIS through various numbers of groups (kB).

Dataset      | P=2    | P=3    | P=4    | P=5    | P=6
Aberdeen     | 143.36 | 155.65 | 172.03 | 188.42 | 204.80
Baker        | 245.76 | 204.80 | 221.18 | 225.28 | 237.57
Jacksonville | 135.17 | 151.55 | 172.03 | 159.74 | 180.22
Racine       | 151.55 | 167.94 | 188.42 | 208.89 | 229.38
Valdosta     | 151.55 | 169.96 | 192.46 | 218.96 | 240.62

Fig. 11 visualizes the results in Table 3. From those results, we recognize that using a large number of groups increases the compressed size of the data in F-MANFIS. For example, the compressed sizes of Aberdeen with P from 2 to 6 are 143.36, 155.65, 172.03, 188.42, and 204.80 kB. Each time a group is added, the compressed size increases by 9.3% of the previous size. Similarly, the incremental ratios of Baker, Jacksonville, Racine, and Valdosta are 5.1%, 7.8%, 10.9%, and 12.3%, respectively. The reason for this fact is that more MANFIS parameters need to be stored at the server when the number of groups increases. In most datasets, the value P = 2 minimizes the compressed sizes of the data. Thus, it is our recommended number of groups of sliding windows when compressing a DEM dataset.

Fig. 11. The compressed size of F-MANFIS by the number of groups.
The remark found from Section 4.3 is the following:

- The optimal number of groups of sliding windows that should be used is two.

4.4. The comparison of training methods

In Section 3.2, we explained the reasons for using the hybrid method between GD and PSO in the training phase. However, is the hybrid method more effective than stand-alone PSO or GD for training? This section aims to answer this question by comparing the original F-MANFIS (a.k.a. GD-PSO) with modified versions of F-MANFIS whose training methods are stand-alone PSO (a.k.a. PSO*) and stand-alone GD (a.k.a. GD*).

In PSO*, we randomly generate the premise parameters and use them to create the consequence ones. The new-born parameters are trained by the PSO algorithm. We repeat this process, from the random generation of premise parameters to the training of consequence parameters, several times and choose the final result having the minimal MSE value among all. In GD*, we also randomly generate the premise parameters and use them to create the consequence ones. However, contrary to PSO*, those parameters are not trained but are used to generate the next premise ones by the GD method. This process is performed repeatedly until the total number of iterations of the training phase (Train_Iter) is reached.

The evaluation criteria for this experiment are the compressed sizes and times of the algorithms.

Table 4. The compressed sizes of algorithms using various training methods (kB).

No | Dataset      | Original size | GD-PSO | GD*    | PSO*
 1 | Aberdeen     | 5636.84 | 143.36 | 481.09 | 250.18
 2 | Baker        | 6634.1  | 245.76 | 536.19 | 515.15
 3 | Caliente     | 6929.23 | 217.09 | 495.13 | 367.75
 4 | Dalhart      | 7045.44 | 200.7  | 212.21 | 205.65
 5 | Eagle Pass   | 5636.84 | 147.46 | 195.32 | 167.28
 6 | Fairmont     | 5636.84 | 143.36 | 187.45 | 171.81
 7 | Gadsden      | 5636.84 | 188.42 | 387.94 | 229.63
 8 | Hailey       | 7044.76 | 172.03 | 453.34 | 342.64
 9 | Idaho Falls  | 7045.44 | 237.57 | 239.12 | 236.02
10 | Jacksonville | 3058.48 | 135.17 | 140.23 | 122.93
11 | Kalispell    | 6797.47 | 192.51 | 419.84 | 362.47
12 | La Crosse    | 5636.84 | 147.46 | 338.01 | 222.80
13 | Macon        | 4466.25 | 253.95 | 260.12 | 254.24
14 | Nashville    | 5636.84 | 151.55 | 212.54 | 245.86
15 | O'Neill      | 5636.84 | 163.84 | 238.78 | 204.65
16 | Paducah      | 5578.93 | 151.55 | 233.43 | 220.43
17 | Quebec       | 5469.94 | 143.36 | 300.12 | 250.41
18 | Racine       | 5636.84 | 151.55 | 140.21 | 132.25
19 | Sacramento   | 6293.56 | 217.09 | 409.21 | 310.32
20 | Tallahassee  | 3984.84 | 151.55 | 167.65 | 155.24
21 | Ukiah        | 5521.38 | 143.36 | 312.22 | 300.32
22 | Valdosta     | 4223.39 | 151.55 | 152.19 | 151.67
23 | Waco         | 5504.55 | 151.55 | 166.23 | 160.21
24 | Yakima       | 5842.81 | 143.36 | 356.55 | 373.32

The results in Table 4 show that the compressed sizes of GD-PSO are smaller than those of GD* and PSO*. For example, the compressed size of GD-PSO on the Aberdeen dataset is 143.36 kB, whilst those of GD* and PSO* are 481.09 and 250.18 kB, respectively. Similarly, the results on datasets such as Dalhart, Eagle Pass, Fairmont, Macon, Tallahassee, and Waco confirm this finding, even though the compressed sizes of the three algorithms are close to one another in those cases. Thus, using the hybrid method between GD and PSO results in better compressed sizes than using stand-alone GD or PSO. The explanation for this fact is that, in PSO*, the consequence parameters generated by PSO are just the best values for a given set of premise parameters. This does not guarantee the optimal value, since other premise parameters may yield better values for the consequence parameters. In other words, the solution found by PSO* is only a local optimum.
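The three training schedules differ only in which parameter block gets optimized. The runnable toy below (our formulation, with a synthetic coupled objective standing in for the MSE of Eq. (13), and a simplified swarm search standing in for PSO) illustrates why alternating both blocks, as GD-PSO does, improves on a single forward pass over the consequences alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy coupled objective standing in for Eq. (13): the best consequence
# block depends on the premise block, and vice versa (illustrative only).
A = rng.uniform(-1, 1, (4, 8))
def f(premise, consequence):          # premise: 8-dim, consequence: 4-dim
    return np.sum((consequence - A @ premise)**2) + np.sum((premise - 0.5)**2)

def pso_block(premise, Q=40, iters=60):
    """Simplified swarm search over the consequence block, premise frozen."""
    swarm = rng.uniform(0, 1, (Q, 4))
    best = swarm[0]
    for _ in range(iters):
        vals = [f(premise, s) for s in swarm]
        best = swarm[int(np.argmin(vals))].copy()
        swarm = np.clip(best + 0.2 * rng.standard_normal(swarm.shape), 0, 1)
    return best

def gd_block(premise, consequence, lr=0.05, steps=60):
    """GD over the premise block, consequence block frozen."""
    for _ in range(steps):
        grad = -2 * A.T @ (consequence - A @ premise) + 2 * (premise - 0.5)
        premise = premise - lr * grad
    return premise

premise = rng.uniform(0, 1, 8)
cons = pso_block(premise)                      # "PSO*": one forward pass only
print("PSO*  :", f(premise, cons))
for _ in range(5):                             # "GD-PSO": alternate both blocks
    cons = pso_block(premise)
    premise = gd_block(premise, cons)
print("GD-PSO:", f(premise, cons))             # lower objective after alternation
```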
Similarly, the solution of GD* is a local optimum too. By combining both the GD and PSO methods, the final solution is optimized over both the premise and consequence parameters, so that it can be considered a global optimum. The experimental results show that the achieved global optimum is better than the local optima of GD* and PSO* in terms of the compressed sizes of the algorithms.

Nonetheless, there are some exceptions in Table 4 for which the compressed sizes of GD-PSO are larger than those of PSO* and GD*. For instance, the results on the Idaho Falls, Jacksonville, and Racine datasets clearly show that the compressed sizes of PSO* are smaller than those of GD-PSO; on the Racine dataset, both GD* and PSO* are better than GD-PSO. This fact can be explained by the reverse impact of the consequence parameters on the premise ones. Even in cases where the achieved consequence parameters are the global optima, they are still used to generate the premise parameters and the next consequence ones, thus increasing the MSE values, since the next consequence parameters are no longer the global optima. Nevertheless, those cases are rare, and most of the time the GD-PSO method still produces better results than PSO* and GD*, as stated in Table 4.

Another finding from Table 4 is the superiority of PSO* over GD*. The results show that the compressed sizes of PSO* are mostly smaller than those of GD*. The differences are quite obvious, e.g., 230 kB (Aberdeen) or 110 kB (Hailey). There are also some exceptions in which the values of PSO* are larger than those of GD*, such as on the Nashville and Yakima datasets. However, those exceptions are few, and PSO* can be considered better than GD*. This clearly confirms the significant role of the consequence parameters in the system: using a meta-heuristic algorithm such as PSO to train this kind of parameters achieves better MSE values and compressed sizes than using the GD algorithm to train the premise parameters only.

Fig. 12 describes the compressed times of GD-PSO, PSO*, and GD*. The results show that using the GD method for the training of the premise parameters in GD* takes less computational time than using the PSO method for the consequence parameters in PSO*. Both GD* and PSO* are faster than GD-PSO. The average compressed time of GD-PSO is 303 s, whilst those of GD* and PSO* are 150 and 256 s, respectively. Even though the compressed time of GD-PSO is the largest among all, it is assumed to be acceptable for our context and objectives.

Fig. 12. The compressed time of F-MANFIS using various training methods.

The remarks found from Section 4.4 are the following:

- The hybrid GD-PSO training in F-MANFIS obtains better compressed sizes than the stand-alone GD and PSO methods.
- PSO* is better than GD* in terms of compressed sizes.
- GD-PSO has a large compressed time, but it is acceptable.

5. Conclusions

In this paper, we concentrated on the DCR problem and presented a hybrid approach between fuzzy clustering (FCM) and the MANFIS neural network to improve the compression ratio of the state-of-the-art compression algorithm, DCRA. The proposed method (F-MANFIS) uses FCM to determine similar sliding windows in a DEM and MANFIS to find the coefficients of the non-linear transforms for those sliding windows. F-MANFIS was compared with some of the best compression algorithms for DCR, such as DCRA and Kidner & Smith, on benchmark DEM datasets. The results showed that the compression ratio of F-MANFIS is better than those of the other algorithms.
Theoretical and experimental analyses of the optimal number of groups of sliding windows used in F-MANFIS, and of the efficiency of the hybrid GD and PSO training phase over the stand-alone methods, were also conducted. Further works on this theme aim to answer the following questions: (i) how to choose the two template points in a sliding window so as to minimize the objective function; and (ii) how many template points are optimal?

Acknowledgment

The authors are greatly indebted to the editor-in-chief, Prof. B. Grabot, and the anonymous reviewers for their comments and valuable suggestions, which improved the quality and clarity of the paper. Further thanks go to Mr. Tran Van Huong and Ms. Bui Thi Cuc for some experimental works. This work is sponsored by NAFOSTED under Contract no. 102.01-2012.14.

References

Adams, M.D., Kossentini, F., 2000. Reversible integer-to-integer wavelet transforms for image compression: performance evaluation and analysis. IEEE Trans. Image Process. 9 (6), 1010–1024.
Bezdek, J.C., Ehrlich, R., et al., 1984. FCM: the fuzzy C-means clustering algorithm. Comput. Geosci. 10, 191–203.
Birkin, M., Clarke, G.P., 1991. Spatial interaction in geography. Geogr. Rev. (5), 16–24.
Boucheron, L.E., Creusere, C.D., 2003. Compression of digital elevation maps for fast and efficient search and retrieval. In: Proceedings of the International Conference on Image Processing (ICIP 2003), Barcelona, Spain, vol. 1, pp. 629–632.
Boucheron, L.E., Creusere, C.D., 2005. Lossless wavelet-based compression of digital elevation maps for fast and efficient search and retrieval. IEEE Trans. Geosci. Remote Sens. 43 (5), 1210–1214.
Inanc, M., 2008. Compressing Terrain Elevation Datasets. Rensselaer Polytechnic Institute, New York.
Jang, J.S.R., 1993. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23 (3), 665–685.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, vol. IV, pp. 1942–1948.
Kidner, D.B., Smith, D.H., 2003. Advances in the data compression of digital elevation models. Comput. Geosci. 29, 985–1002.
Li, Z., Zhu, Q., Gold, C., 2005. Digital Terrain Modeling: Principles and Methodology. CRC Press, Boca Raton.
Marcellin, M.W., Gormish, M.J., Bilgin, A., Boliek, M.P., 2000. An overview of JPEG-2000. In: Proceedings of the Data Compression Conference (DCC 2000), pp. 523–541.
Ooyen, A.V., Nienhuis, B., 1992. Improving the convergence of the back-propagation algorithm. Neural Networks 5 (3), 465–471.
Parthiban, L., Subramanian, R., 2007. Intelligent heart disease prediction system using CANFIS and genetic algorithm. Int. J. Biol. Life Sci. (3), 157–160.
Rane, S.D., Sapiro, G., 2001. Evaluation of JPEG-LS, the new lossless and controlled-lossy still image compression standard, for compression of high-resolution elevation data. IEEE Trans. Geosci. Remote Sens. 39 (10), 2298–2306.
Son, L.H., Linh, N.D., Huong, T.V., Dien, N.H., 2011. A lossless effective method for the digital elevation model compression for fast retrieval problem. Int. J. Comput. Sci. Network Security 11 (6), 35–44.
Son, L.H., Cuong, B.C., Lanzi, P.L., Thong, N.T., 2012. A novel intuitionistic fuzzy clustering method for geo-demographic analysis. Expert Syst. Appl. 39 (10), 9848–9859.
Son, L.H., Cuong, B.C., Long, H.V., 2013. Spatial interaction–modification model and applications to geo-demographic analysis. Knowledge-Based Syst. 49, 152–170.
Zheng, J.J., Fang, J.Y., Han, C.D., 2009. Reversible integer wavelet evaluation for DEM progressive compression. In: Proceedings of the IGARSS 2009, vol. 5, pp. 52–55.