A detection for positioning sensor node based on multilayer perceptron

Thi-Kien Dao, Shi-Jie Jiang, Truong-Giang Ngo2(B), Thi-Thanh-Tan Nguyen, and Trong-The Nguyen1,4

1 Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou 350014, China
vnthe@hpu.edu.vn
2 Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
giangnt@tlu.edu.vn
3 Information Technology Faculty, Electric Power University, Hanoi, Vietnam
4 Haiphong University of Management and Technology, Haiphong 180000, Vietnam

Abstract. Node positioning accuracy and the impact of the environment on devices in wireless sensor networks (WSN) have recently received much attention from scholars. This paper proposes a prediction method for sensor node positioning based on a multilayer perceptron neural network. Node locations, characterized by their signal strength, are captured to form a dataset. The signal strength features of a node are extracted from a large number of noisy signal strength samples and estimated with the nearest neighbor method as inputs to the scheme. Experimental results, compared with other methods in the literature, show that the proposed scheme provides higher positioning accuracy and lower average error than the competitors.

Keywords: Wireless sensor networks · Multilayer perceptron · Indoor positioning node · Predictive positioning

1 Introduction
Thanks to the development of computer technology and smartphones, sensor-equipped smart devices have become popular in daily life, e.g., in health care with positioning services and in environment monitoring [1–3]. Node localization finds the position of a device by an estimation technique in the indoor environment [4, 5]. Several factors can influence node localization accuracy, e.g., the complex indoor radio transmission environment, the indoor building layout, and personnel mobility [6]. The indoor signal fading model cannot be
established accurately, so indoor positioning lags far behind outdoor positioning technology [7]. The global positioning system (GPS) and cellular base stations are widely deployed and applied in outdoor positioning. Low-cost, high-precision indoor node positioning solutions have received growing attention from scholars. Wireless communication technologies, e.g., WiFi, Zigbee, cellular, and Bluetooth, can effectively overcome the signal-blocking problem that affects GPS, an outdoor technology [8]. However, localization accuracy is affected by obstacles, non-line-of-sight propagation, and noise due to the complex indoor environment [9]. Traditional methods, such as continuous positioning, cannot achieve high accuracy because of the complex environment or the overfitting problem. Studying indoor positioning algorithms for smart node devices is therefore of primary practical significance.

This paper considers an indoor positioning method for node devices based on multilayer perceptron learning, in which the hidden structural features of the data are extracted by direct learning. The generalization ability of the network is used to avoid the overfitting problem. A deep learning model with input, hidden, and output layers is trained with the output set approximately equal to the input; the network weights are learned, and the resulting encoding model is applied to continuous position prediction. The locations of the nodes are estimated from the captured signal strength with the nearest neighbor algorithm. The signal strength data produced by a stacked coding scheme are used to build a position database for fitting and generalization.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. J.-S. Pan et al. (eds.), Advances in Intelligent Information Hiding and Multimedia Signal Processing, Smart Innovation, Systems and Technologies 212, https://doi.org/10.1007/978-981-33-6757-9_34

2 Related Work
A multilayer perceptron (MLP) is a multilayer neural
network that usually contains multiple hidden layers, which improve the expressive ability of the network for prediction. Its structure is similar to the three-layer structure of the traditional neural network, which includes an input layer, a hidden layer, and an output layer. The learning gradient is transmitted effectively through layer-by-layer training. The MLP is a feedforward network and a typical deep learning model. It consists of multiple layers of nodes, where each layer is fully connected to the next, and each node in a hidden layer applies a nonlinear activation function. Common activation functions include the Sigmoid, Tanh, and Relu functions.

Sigmoid function: an S-shaped function that compresses real numbers into the interval [0, 1], which gives it strong explanatory power. However, when the neuron output approaches 0 or 1, saturation occurs, leading to gradient dispersion. Therefore, the weights should be initialized carefully.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

Tanh function: this function has good data control ability and maps real numbers to the interval [−1, 1], but it still has a saturation problem.

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$

Relu function: a rectified linear unit, which outputs 0 when x < 0 and x when x > 0. Relu converges faster, but it is also more fragile: a large gradient flow may lead to the permanent failure of neurons, which can be avoided by selecting an appropriate learning rate or inter-layer batch regularization. The formula is as follows:

$$f(x) = \max(0, x) \tag{3}$$

The network model is trained with back-propagation. The training sample set is $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, where $m$ is the number of samples, and it is used to train the neural network. The loss function in the experiment is expressed as follows:

$$J(W, b; x, y) = \left\| h_{W,b}(x) - y \right\|^{2} \tag{4}$$

The critical step of the gradient descent method is to calculate the partial derivatives. The iterative formulas are given as follows:

$$W_{ij}^{(1)} = W_{ij}^{(1)} - \alpha \frac{\partial}{\partial W_{ij}^{(1)}} J(W, b) \tag{5}$$

$$b_{i}^{(1)} = b_{i}^{(1)} - \alpha \frac{\partial}{\partial b_{i}^{(1)}} J(W, b) \tag{6}$$

where $W$ and $b$ are the weight and the bias term of the network, respectively, and $\alpha$ is the learning rate.

A deep learning regression prediction model with an indoor location scheme can predict and estimate discrete points. For more accurate continuous location prediction, a regression prediction model is used, with the dataset built through meaningful learning. The linear regression model can be expressed as follows:

$$f(x) = w^{T} x + b \tag{7}$$

where $x$ represents the input, $w$ the weight, and $b$ the bias; $w$ and $b$ are trained by minimizing the objective function. The model first processes the input data and then performs pre-training. When the output layer is reached, the model propagates back. The algorithm stops when it converges.

3 Proposed Scheme
The location information of the sensor nodes in the deployed network environment is estimated by a deep learning indoor location algorithm from the signal strength of the various signal sources. The dataset of position points is built on the principle of extracting features and reducing noise, and it is trained and tested with a multilayer perceptron. The signal strength features are matched against the node position dataset, and the nearest neighbor algorithm is used to estimate the location of the measured points and choose the best matching position.

3.1 The Positioning Algorithm
Related features can be extracted from the high-dimensional collected data, which reduces the data dimension. The input layer, the hidden layer, and the output layer are computed as follows:

$$h = \frac{1}{1 + \exp(-w v - b)} \tag{8}$$

Here $h$ is the hidden layer and $v$ is the input layer; $h$ is used to calculate the reconstructed output layer $u'$ as follows:

$$u' = \frac{1}{1 + \exp(-w' h - b')} \tag{9}$$

where $w$ and $w'$
are, respectively, the connection weights between the input layer and the hidden layer and between the hidden layer and the reconstructed output layer. The weight matrix $w'$ is constrained to be the transpose of the weight matrix $w$, that is, $w' = w^{T}$; $b$ and $b'$ are the bias units of the hidden layer and the reconstructed output layer, respectively; and $h$ is the hidden layer unit data. The autoencoder is trained to minimize the reconstruction error between $u$ and $u'$ obtained through the input layer $v$. The smaller the error, the closer the reconstructed output layer is to the input layer, and the better the hidden layer expresses the information of the input layer, which achieves the purpose of feature extraction.

The $K$-dimensional vector $v = \{v_i \mid i = 1, 2, \ldots, K\}$ is the input layer, and $N$ is the number of hidden layer neurons. The offline data $v^{\text{offline}} = \{v_{ji} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, K\}$ are fed to the input of the stacked autoencoder for training, where $J$ is the number of data records collected in the offline phase, and each dimension of each record corresponds to the RSS of a fixed AP or iBeacon. Training yields the new dataset $DATA^{\text{offline}} = \{h^{3\,\text{offline}}_{ji} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, n\} = \{data_{ji} \mid j = 1, 2, \ldots, J;\ i = 1, 2, \ldots, n\}$, where $n$ represents the dimension of the data.

3.2 Nearest Neighbor Technique
In the online phase, the data $v^{\text{online}} = \{v_i \mid i = 1, 2, \ldots, K\}$ are fed to the input layer and propagated forward through the structure, where the parameters $w$ and $b$ are those trained in the offline phase. The result, $DATA^{\text{online}} = \{h^{3\,\text{online}}_{i} \mid i = 1, 2, \ldots, n\} = \{DATA_i \mid i = 1, 2, \ldots, n\}$, is the input data of the nearest neighbor classifier. The AP or iBeacon corresponding to the RSS [10] of each dimension is the same in the original fingerprint database and in the online-phase data $v^{\text{online}}$, so the information expressed by each dimension of the new fingerprint database and of the online DATA also corresponds. For the offline dataset and the online DATA, the nearest neighbor method is used
to calculate the Euclidean distance between the online-phase data and the $j$-th record in the new dataset:

$$d_j = \sqrt{\sum_{i=1}^{n} \left( DATA_i - data_{ji} \right)^{2}} \tag{10}$$

where $data_{ji}$ represents the $i$-th dimension of the $j$-th record in the new fingerprint database, $DATA_i$ represents the $i$-th dimension of the online-phase data, and $n$ represents the dimension of the data processed by the stacked autoencoder. Finally, the distances $d_j$ are sorted from small to large (the shorter the distance, the higher the similarity of the two records), and the coordinate of the sampling point with the smallest distance is the positioning result.

4 Experimental Results
A deployed network area is used for node device localization to verify the effectiveness of the proposed scheme. The environment of the covered area includes corridors and offices equipped with desks, chairs, bookcases, and other office items. Signal strength data are collected in groups at each location at a constant time interval, measured in seconds. The multilayer neural network is tested in simulation with a set of 100 samples. The number of hidden layers in the multilayer neural network classifier is set to L (L = 3, 5, 10, 20, 50); the Relu activation function is adopted for the hidden layers, and the weights are initialized. Regression fitting is carried out on the test set to predict the coordinate points.

Table 1 compares the prediction effect of multiple hidden layers with that of a single hidden layer. The prediction effect of multiple hidden layers is clearly better than that of a single layer, but the positioning error is still substantial.

Table 1 Comparison of results of multi-hidden layers with the
single hidden layer

Hidden layer setting                              A single hidden layer   Multi-hidden layers
Mean positioning error /m                         0.336                   0.268
Registration points with error below 0.25 m /%    19.5                    51.8

Table 2 shows the results for the selected hidden-layer activation functions, i.e., Relu, Sigmoid, and Tanh. The Relu function provides adequate positioning accuracy and performs well in the classification task. However, the Sigmoid and other functions are less practical when the data distribution is uneven because of their weak ability to control data.

Table 2 Comparison of positioning accuracy of selected activation functions for the hidden layers

The activation function                           Relu     Sigmoid   Tanh
Mean positioning error /m                         0.2716   0.2158    0.1796
Registration points with error below 0.25 m /%    37       54        71

The proposed method is compared with other techniques, e.g., the grey wolf optimizer (GWO) [4], the firefly algorithm (FA) [5], pigeon-inspired optimization (PIO) [9], and ion motion optimization (IMO) [11], for constructing the relationship between positioning features and positioning coordinates under the same experimental conditions. Fig. 1 compares the values obtained by the proposed method with those of the GWO, FA, IMO, and PIO algorithms for node positioning. Subfigure (a) compares the average positioning error, and subfigure (b) shows the cumulative probability distribution of the average positioning error. The proposed method yields smaller errors in the device positioning problem.

Table 3 compares the time consumption (ms) of the proposed method with that of the GWO, FA, IMO, and PIO approaches to the node positioning problem for various numbers of nodes in the deployed network. In most cases, the proposed method runs in less time than the competitors.

Table 3 Comparison of the time consumption of the proposed
method with the GWO, FA, IMO, and PIO approaches for the node positioning problem with different numbers of nodes in the deployed networks

Algorithms        N = 20    N = 50    N = 80    N = 110   N = 130   N = 160
Proposed method   297.476   350.721   398.991   451.081   502.766   552.004
GWO               301.350   357.109   408.329   459.231   513.221   565.171
FA                302.701   353.281   405.421   458.341   508.217   557.124
IMO               6.213     102.334   166.210   217.662   275.329   327.371
PIO               60.001    101.219   159.296   241.002   266.737   319.534

Generally, the comparison of location accuracy and calculation time shows that the proposed algorithm can achieve better performance than the competitors.

5 Conclusion
In this study, we proposed a prediction scheme for sensor node positioning based on a multilayer perceptron neural network. The nearest neighbor method was used to estimate the inputs of the scheme. The signal strength at the node locations was used as the input parameters of the classification system. The features of the node signal strength were extracted from a large number of signal strength samples, even those that included noise. The experimental results, compared with other approaches in the literature, e.g., the GWO, FA, IMO, and PIO methods, show that the proposed scheme provides higher positioning accuracy and lower average error than the competitors.

Fig. 1 Comparison of the values obtained by the proposed method with several node positioning methods, e.g., the GWO, FA, IMO, and PIO algorithms. Subfigure a is the average positioning error comparison, and subfigure b is the cumulative probability distribution of the average positioning error

References
1. Clemensen, J., Larsen, S.B., Kyng, M., Kirkevold, M.: Participatory design in health sciences: using cooperative experimental methods in developing health services and computer technology. Qual. Health Res. 17, 122–130 (2007)
2. Dao, T., Nguyen, T., Pan, J., Qiao, Y., Lai, Q.: Identification failure data for cluster heads aggregation in WSN based on
improving classification of SVM. IEEE Access 8, 61070–61084 (2020). https://doi.org/10.1109/ACCESS.2020.2983219
3. Nguyen, T.-T., Qiao, Y., Pan, J.-S., Chu, S.-C., Chang, K.-C., Xue, X., Dao, T.-K.: A hybridized parallel bats algorithm for combinatorial problem of traveling salesman. J. Intell. Fuzzy Syst. Preprint, 1–10 (2020). https://doi.org/10.3233/JIFS-179668
4. Nguyen, T.-T., Thom, H.T.H., Dao, T.-K.: Estimation localization in wireless sensor network based on multi-objective grey wolf optimizer. In: Akagi, M., Nguyen, T.-T., Vu, D.-T., Phung, T.-N., Huynh, V.-N. (eds.) Adv. Inf. Commun. Technol. Proc. Int. Conf. ICTA 2016, Springer International Publishing, Cham, pp. 228–237 (2017). https://doi.org/10.1007/978-3-319-49073-1_25
5. Nguyen, T.-T., Pan, J.-S., Chu, S.-C., Roddick, J.F., Dao, T.-K.: Optimization localization in wireless sensor network based on multi-objective firefly algorithm. J. Netw. Intell. 1, 130–138 (2016)
6. Nguyen, T.T., Pan, J.S., Dao, T.K.: An improved flower pollination algorithm for optimizing layouts of nodes in wireless sensor network. IEEE Access 7, 75985–75998 (2019). https://doi.org/10.1109/ACCESS.2019.2921721
7. Pei, L., Chen, R., Chen, Y., Leppäkoski, H., Perttula, A.: Indoor/outdoor seamless positioning technologies integrated on smart phone. In: 2009 First Int. Conf. Adv. Satell. Sp. Commun., IEEE, pp. 141–145 (2009)
8. Yeh, S.-C., Hsu, W.-H., Su, M.-Y., Chen, C.-H., Liu, K.-H.: A study on outdoor positioning technology using GPS and WiFi networks. In: 2009 Int. Conf. Networking, Sens. Control, IEEE, pp. 597–601 (2009)
9. Nguyen, T.-T., Pan, J.-S., Dao, T.-K., Sung, T.-W., Ngo, T.-G.: Pigeon-inspired optimization for node location in wireless sensor network. In: Sattler, K.-U., Nguyen, D.C., Vu, N.P., Tien Long, B., Puta, H. (eds.) Advances in Engineering Research and Application.
Springer International Publishing, Cham, pp. 589–598 (2020)
10. Vaghefi, R.M., Gholami, M.R., Ström, E.G.: RSS-based sensor localization with unknown transmit power. In: 2011 IEEE Int. Conf. Acoust. Speech Signal Process., pp. 2480–2483 (2011). https://doi.org/10.1109/ICASSP.2011.5946987
11. Pan, J.-S., Nguyen, T.-T., Chu, S.-C., Dao, T.-K., Ngo, T.-G.: Diversity enhanced ion motion optimization for localization in wireless sensor network. J. Inf. Hiding Multimed. Signal Process. 10, 221–229 (2019)

Pork Price Prediction Using Topic Modeling and Feature Scoring Method

Tserenpurev Chuluunsaikhan1, Kwan-Hee Yoo1, HyungChul Rah2, and Aziz Nasridinov1(B)

1 Department of Computer Science, Chungbuk National University, Cheongju, South Korea
{teo,khyoo,aziz}@chungbuk.ac.kr
2 Department of Management Information System, Chungbuk National University, Cheongju, South Korea
hrah@chungbuk.ac.kr

Abstract. A large amount of text data may hide a numeric connection to some other subject, for example, price. In this paper, we aim to predict pork prices based on topic modeling and a word scoring method. The study consists of four steps: feature extraction, word scoring, feature selection, and prediction. Any prediction model has inputs (features) and an output. We extracted our features from online news data using the topic modeling technique LDA and selected the daily pork price as the output. We then created a word scoring corpus from the LDA result and the price movements. Because our features and output are numeric values, we applied Pearson's correlation for feature selection. To check our word scoring method, we built a pork price prediction model using LSTM. We evaluated the model without and with feature selection and used RMSE, MAE, and MAPE to measure its accuracy. The results show that our model can be used for price prediction of pork and other agricultural commodities.

Keywords: Price prediction · Topic modeling ·
Word scoring · LSTM

1 Introduction
Agriculture is an important sector that has always accompanied human growth. It is the process of producing food by cultivating grain and raising domesticated animals (livestock). All people participate in this process, for example, as governors, farmers, and consumers. What connects them is the price of agricultural commodities. High prices benefit farmers but not consumers, who always want low prices. In this situation, governors attempt to keep the price at a proper level. Predicting the price helps governors make future decisions. If farmers know the price in advance, they can also regulate the production of agricultural commodities, which helps them avoid economic loss.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. J.-S. Pan et al. (eds.), Advances in Intelligent Information Hiding and Multimedia Signal Processing, Smart Innovation, Systems and Technologies 212, https://doi.org/10.1007/978-981-33-6757-9_35

Many research works attempt to predict agricultural commodity prices [1–3]. Notably, online text data (online news, Twitter, and others) contribute significantly to predicting agricultural commodity prices [2, 3], because online text data can include the reasons (words) for price changes. We therefore need to extract good keywords to obtain good agricultural price predictions. Pork is one of the agricultural commodities most consumed in Korea. As with any other product, market supply and demand determine the price of pork. Unexpected and unplanned events can change supply and demand. For example, African Swine Fever is a hot topic in the Korean pork market. During an outbreak of this disease, people consume less pork, and the pork price falls as a result. When demand decreases continuously, governors begin to take action to support consumption. Consumers usually know this
kind of information from the news and regulate their consumption accordingly. That is why we believe that news can affect the price of pork. Price prediction is the process of trying to calculate the future price using inputs (features) that can affect the price. The inputs can differ according to the sector and the goals. In this paper, we propose a pork price prediction model using topic modeling and a word scoring method. First, we applied a topic modeling technique to obtain the most relevant words from online news data. Next, we compared each day's price with that of the previous day to determine whether it rose or fell. Finally, we scored each word from topic modeling using the price changes. We evaluated our model using the LSTM algorithm. To increase the accuracy of the model, we also applied a feature selection method. We measured accuracy using root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).

2 Methodology
We explain our proposed method in this section, which consists of feature extraction (2.1), feature scoring (2.2), feature selection (2.3), and price prediction (2.4). Figure 1 shows an overview of our proposed method. The proposed method mainly consists of feature extraction, feature scoring, and feature selection. Additionally, we applied some other data preprocessing techniques to increase the accuracy of our model. We explain them in detail in the following subsections.

2.1 Feature Extraction
Feature extraction is the initial step of the methodology. We collected online news data from PigTimes [4], a web portal that publishes news about pigs. Since the portal publishes only pig-related news, we can be confident that our dataset relates to pigs, pork, pork farms, and the pork market. Topic modeling is an important technique in natural language processing (NLP). It extracts relevant topics from a large amount of text data. There are many approaches to obtaining topics from text data. For example, LDA is one of the popular topic modeling techniques.
We extracted the input features of our prediction model using the LDA technique. Our output feature is the daily price of pork. We collected the retail price of pork from KAMIS [5], which provides various information related to the distribution of agricultural and livestock products.

Fig. 1 Overview of our methodology

2.2 Feature Scoring
Feature scoring is the main idea of our study. We set a score for each feature extracted from online news data. First, we counted all cases in which the price increased or decreased. Then, we counted the cases of price increase and decrease for each feature. Using these counts, we created the feature scoring method. The score ranges over [−1, 1], where −1 represents the strongest association with a price decrease and 1 represents the strongest association with a price increase.

$$S_w = \frac{FI}{TI} - \frac{FD}{TD} \tag{1}$$

Formula (1) shows the feature scoring method. Its terms have the following meanings:
• $S_w$: the score of the feature
• FI: the number of cases of price increase for the feature
• TI: the total number of cases of price increase
• FD: the number of cases of price decrease for the feature
• TD: the total number of cases of price decrease

For example, assume that the word "African Swine Fever" usually occurs when the price decreases. In that case, the score of the word will be close to −1, and we can infer that it is one reason for price decreases.

2.3 Feature Selection
Feature selection is one of the core concepts in price prediction. Choosing the right feature selection method depends on the input and output data of the model. In our case, we use numerical input (feature scores) and numerical output (price). That is why we chose Pearson's correlation to select optimal features. Pearson's correlation expresses whether two variables are linearly related as a number in [−1, 1]. The values −1 and 1 mean that the two variables have a perfect negative or positive linear correlation, respectively, whereas 0 means there is no linear correlation at all. We compared
each feature with the price and picked features that have correlated with the price by value 0.1 ...
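
The feature scoring of Formula (1) and the Pearson-based feature selection described in Sects. 2.2 and 2.3 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names (`feature_score`, `pearson`, `select_features`), the toy data, and the rule of keeping a feature when the absolute value of its correlation with the price is at least 0.1 are hypothetical choices, since the text only mentions the value 0.1.

```python
from math import sqrt


def feature_score(fi, ti, fd, td):
    """Score S_w = FI/TI - FD/TD from Formula (1).

    fi, fd: cases in which the feature occurred and the price rose / fell.
    ti, td: total cases in which the price rose / fell (assumed non-zero).
    """
    return fi / ti - fd / td


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(var_x * var_y)


def select_features(features, prices, threshold=0.1):
    """Keep the names of features whose |r| with the price is >= threshold.

    The 0.1 default and the use of the absolute value are assumptions.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, prices)) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical counts: a word seen in 1 of 10 rising and 8 of 10 falling cases.
    print(feature_score(fi=1, ti=10, fd=8, td=10))
    # Hypothetical daily feature scores and prices.
    features = {"word_a": [0.1, 0.4, 0.5, 0.9], "word_b": [0.3, 0.3, 0.3, 0.3]}
    prices = [100.0, 130.0, 140.0, 180.0]
    print(select_features(features, prices))
```

In this sketch, a word that appears mostly in falling-price cases receives a negative score, and a feature whose values never change is dropped because its correlation with the price is treated as zero.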
