Improving data transmission efficiency in wireless sensor networks based on data correlation


MINISTRY OF EDUCATION AND TRAINING

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Thi Thanh Nga

EFFICIENT DATA COMMUNICATION FOR WIRELESS SENSOR NETWORK BASED ON DATA CORRELATION

Major: Computer Engineering Code No.: 9480106

COMPUTER ENGINEERING DISSERTATION

SUPERVISORS:

1. Dr. Nguyen Kim Khanh

2. Assoc. Prof. Ngo Hong Son

Hanoi - 2018

COMMITMENT

I assure that this is my own research. All the data and results in the thesis are completely true and are used in this thesis with the agreement of the co-authors. This research has not been published by any author other than me.

Hanoi, 17th December 2018

Assoc. Prof. Ngo Hong Son

ACKNOWLEDGMENTS

This Ph.D. thesis has been carried out at the Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology. The research has been completed under the supervision of Dr. Nguyen Kim Khanh and Assoc. Prof. Dr. Ngo Hong Son.

Firstly, I would like to express my sincere gratitude to my advisors, Dr. Nguyen Kim Khanh and Assoc. Prof. Dr. Ngo Hong Son, for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their valuable guidance, unceasing encouragement, and support have helped me throughout the research and the writing of this thesis.

Besides my advisors, I would like to thank all my colleagues in the Department of Computer Engineering for their insightful comments and encouragement, and for the hard questions that motivated me to widen my research from various perspectives. I would like to express my appreciation to Prof. Dr. Trinh Van Loan for his time and patience in helping me to correct the whole thesis, as well as for his valuable comments during the process of pursuing my doctoral degree.

I want to thank all my colleagues in the School of Information and Communication Technology for their support and help in my work.

I gratefully acknowledge the receipt of grants from the 911 project of the Ministry of Education and Training, which enabled me to carry out this research.

Finally, I would like to thank my family, my sisters, my father and mother, my husband, and my two children for their endless love, encouragement, and unconditional support throughout the writing of this thesis.

Nguyen Thi Thanh Nga


TABLE OF CONTENTS

COMMITMENT

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF ABBREVIATIONS

LIST OF FIGURES

LIST OF TABLES

PREFACE

1 INTRODUCTION

1.1 Overviews

1.2 Energy conservation in WSNs

1.2.1 Radio optimization

1.2.2 Sleep/wake-up schemes

1.2.3 Energy efficient routing

1.2.4 Data reduction

1.2.5 Charging solution

1.3 Data correlation and energy conservation in WSNs

1.4 Problem statements and contributions

2 CORRELATION IN WIRELESS SENSOR NETWORK

2.1 Correlation model survey

2.2 Information entropy theory

2.2.1 Overview

2.2.2 Entropy concept

2.2.3 Joint entropy

2.3 Correlation and entropy

2.3.1 Correlation of two variables

2.3.1.1 Mutual information

2.3.1.2 Entropy correlation coefficient

2.3.2 Correlation of more than two variables

2.4 Conclusions

3 ENTROPY-BASED CORRELATION CLUSTERING

3.1 Joint entropy estimation

3.1.1 Determining the upper bound of joint entropy

3.1.2 Determining the lower bound of joint entropy

3.1.3 Validating entropy estimation

3.2 Correlation region and correlation clustering algorithm

3.2.1 Estimated joint entropy and correlation

3.2.2 Correlation region definition

3.2.3 Correlation clustering algorithm

3.2.4 Validation

3.3 Conclusions

4 ENTROPY CORRELATION BASED DATA AGGREGATIONS

4.1 Compression aggregation

4.1.1 Comparison of compression schemes

4.1.2 Compression based routing scheme in a correlated region

4.1.2.1 1-D analysis

4.1.2.2 2-D analysis

4.1.2.3 General topology model analysis

4.1.3 Optimal routing scheme in correlation networks

4.2 Representative aggregation

4.2.1 Distortion function

4.2.2 Number of representative nodes

4.2.3 Representative node selection

4.2.4 Practical validation

4.3 Conclusions

5 ENTROPY CORRELATION BASED DATA AGGREGATION PROTOCOL (ECODA)

5.1 Network model

5.2 Radio model

5.3 Outline of ECODA

5.3.1 Set-up phase

5.3.2 Steady-state phase

5.4 Performance evaluation

5.4.1 Simulation models

5.4.1.1 Simulation parameters

5.4.1.2 Simulation setups

5.4.1.3 Dissipated energy calculation

5.4.2 Simulation results and discussions

5.4.2.1 Compression aggregation-based routing protocol

5.4.2.2 Representative aggregation-based routing protocol

5.4.3 Evaluations and comparison

5.4.3.1 The case of ECODA with compression aggregation

5.4.3.2 The case of ECODA with representative aggregation

5.5 Conclusions

6 CONCLUSIONS AND FUTURE STUDY

6.1 Summary of Contributions

6.2 Limitations

6.3 Future work

PUBLICATION LIST

REFERENCES

APPENDIX

LIST OF ABBREVIATIONS

CSMA Carrier Sense Multiple Access

ECODA Entropy COrrelation clustering for Data Aggregation

LEACH-C Low Energy Adaptive Clustering Hierarchy-Centralized

MEMS Micro Electro Mechanical Systems

RSSI Received Signal Strength Indication

TDMA Time Division Multiple Access

VLSI Very Large-Scale Integration

WSN(s) Wireless Sensor Network(s)

LIST OF FIGURES

Figure 1.1 Wireless Sensor Network

Figure 1.2 Wireless Sensor Network Applications

Figure 2.1 The layout of sensor nodes in an environment with two different conditions area

Figure 2.2 The relations between entropies, joint entropy, and mutual information

Figure 2.3 Relation between correlation and joint entropy

Figure 3.1 Joint entropy calculation principle

Figure 3.2 Sensor layout in Intel Berkeley Research Lab

Figure 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1

Figure 3.4 Estimated joint entropy with different values of entropy correlation coefficients using upper bound function (with Hmax = 2 [bits])

Figure 3.5 Estimated joint entropy (by upper bound) and practical joint entropy of dataset 1

Figure 3.6 Correlation-based clustering algorithm

Figure 3.7 Temperature data measured at 11 nodes in the dataset 1

Figure 3.8 Derivative of estimated joint entropy and calculated joint entropy of the selected group

Figure 4.1 Routing paths for three schemes: (a) DSC, (b) RDC, and (c) CDR [122]

Figure 4.2 Energy consumptions for the DSC, RDC and CDR schemes with respect to entropy correlation coefficients

Figure 4.3 Routing pattern of 1-D network

Figure 4.4 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression along SPT to the cluster head

Figure 4.5 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 1-D with compression at the cluster head only

Figure 4.6 Routing pattern of the 2-D network [122]

Figure 4.7 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression along SPT to the cluster head

Figure 4.8 Total bit-hop cost Es that corresponds to cluster size with different values of entropy correlation coefficient in the case of 2-D with compression at the cluster head only

Figure 4.9 Illustration of clustering for a general topology model

Figure 4.10 Total transmission cost that corresponds to cluster size with different values of entropy correlation coefficient with compression along SPT to the cluster head

Figure 4.11 Total transmission cost that corresponds to cluster size with different values of entropy correlation coefficient with compression at the cluster head only

Figure 4.12 The relation between distortion and the number of representative nodes with N = 10

Figure 4.13 The relation between distortion and the number of representative nodes with N = 15

Figure 4.14 The relation between distortion and the number of representative nodes with N = 20

Figure 4.15 Maximizing obtained information based representative node selection algorithm

Figure 5.1 Radio energy dissipation model

Figure 5.2 Time scheduling for one round

Figure 5.3 Sensor node distribution in the 200 m x 200 m sensing area

Figure 5.4 Routing path of compression-based routing protocol

Figure 5.5 Total energy in each round in case of compression along SPT to the CH

Figure 5.6 Number of alive nodes in each round in case of compression along SPT to the CH

Figure 5.7 Total energy in each round in case of compression at the CH only

Figure 5.8 Number of alive nodes in each round in case of compression at the CH only

Figure 5.9 Total energy in each round in case of representative aggregation with compression with 16 correlation clusters

Figure 5.10 Number of alive nodes in each round in case of representative aggregation with compression with 16 correlation clusters

Figure 5.11 Total energy in each round in the case of representative aggregation without compression with 16 correlation clusters

Figure 5.12 Number of alive nodes in each round in the case of representative aggregation without compression with 16 correlation clusters

Figure 5.13 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters

Figure 5.14 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters

Figure 5.15 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters

Figure 5.16 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters

Figure 5.17 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters

Figure 5.18 Total energy comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters

Figure 5.19 Total energy comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters

Figure 5.20 Number of alive nodes comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters

LIST OF TABLES

Table 3.1 Node’s entropy of the dataset 1

Table 3.2 Entropy correlation coefficient of each pair from the dataset 1

Table 3.3 Practical, upper bound and lower bound joint entropy (JE) of subsets of the dataset 1

Table 3.4 Clustering results of 48 nodes

Table 4.1 Number of representative nodes with distortion D = 0.05

Table 4.4 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 1 with N = 11 nodes)

Table 4.5 Selection of representative nodes and the actual distortion based on practical calculation (dataset 1 with N = 11 nodes)

Table 4.6 Entropy values of 10 nodes in the correlation region (dataset 2 with N = 10 nodes)

Table 4.7 Selection of representative nodes and the actual distortion based on theoretical calculation (dataset 2 with N = 10 nodes)

Table 4.8 Selection of representative nodes and the actual distortion based on practical calculation (dataset 2 with N = 10 nodes)

Table 5.1 Simulation parameters

Table 5.2 Simulation results in case of compression along SPT to the CH

Table 5.3 Simulation results in case of compression at the CH only

Table 5.4 Simulation results in the case of representative aggregation with compression at the CH

Table 5.5 Simulation results in the case of representative aggregation without compression at the CH

Table 5.6 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 16 correlation clusters

Table 5.7 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 8 correlation clusters

Table 5.8 Comparison between distance-based protocol and ECODA with compression aggregation in the case of 4 correlation clusters

Table 5.9 Comparison between distance-based protocol and ECODA with representative aggregation in the case of 16 correlation clusters

PREFACE

In typical WSN applications, sensors must be densely deployed to achieve satisfactory coverage. As a result, multiple sensors record information about a single event in the sensing field, i.e., the sensed data are correlated with each other. The existence of this correlation brings significant potential advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, depending on the degree of correlation, data in a correlated region can be compressed with a high ratio to reduce the amount of data sent and thus save energy. When the correlation is high enough, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station. Instead, a smaller number of sensor measurements (representatives) might be adequate to communicate the event features to the base station within a certain reliability/fidelity level.

From this point of view, many studies have focused on discovering and exploiting the correlation of sensed data in WSNs. Early studies used traditional probability and statistics theory to describe the correlation among data. However, these approaches restrict the correlation to a linear relation, which may not be appropriate for the general, nonlinear cases found in practice. Therefore, the information entropy approach has been considered to obtain generality. However, most approaches, whether they use traditional probability and statistics theory or information entropy theory, treat the correlation as a distance-dependent feature. In general, the correlation of data may be independent of external factors such as sensor location and environmental conditions, so it is better to concentrate on the information contained in the data itself rather than considering only attribute metadata such as location and time.

This thesis concentrates on discovering and exploiting the general correlation in WSNs, using information entropy theory to look at the sensed data itself. First, a novel distance-independent, entropy-based correlation model for describing correlation characteristics in a wireless sensor network is proposed. From this entropy correlation model, an energy-efficient routing protocol with correlation-based data aggregation is developed.

To discover the correlation property, an estimation of the joint entropy of a data group is first established. From this estimation, a definition of the correlation group is proposed, and then the correlation model that is used to calculate the joint entropy of the correlated data group is developed. To exploit the correlation characteristic, two main data aggregation schemes are analyzed and evaluated using the proposed correlation model. Finally, these schemes are used to develop data aggregation routing protocols. With the proposed routing protocols, the data transferred in the network is reduced, so the dissipated energy decreases.

The thesis structure is as follows:

Chapter 1: Introduction

This chapter introduces WSNs, energy conservation schemes, and data correlation problems. The main contributions of the thesis are also briefly presented in this chapter.

Chapter 2: Correlation in Wireless Sensor Network

This chapter presents a survey of correlation models in WSNs and examines correlation from the point of view of information entropy. Then, the idea of establishing a new correlation model is described.

Chapter 3: Entropy-based Correlation Clustering

Based on the factors analyzed in Chapter 2, we propose an approximate estimation of joint entropy. From this approximation method, we define the correlation region and propose the correlation clustering scheme. We also validate the proposed estimation and the correlation clustering scheme in this chapter.

Chapter 4: Entropy-based Data Aggregations

In this chapter, we exploit data correlation through entropy-based data aggregation, including entropy-based representative aggregation and entropy-based data compression.

In entropy-based representative aggregation, the distortion of the data in the group when some nodes are put into the sleep state is evaluated using the proposed correlation model. The representative nodes are then determined based on the desired distortion, and a representative aggregation routing protocol is developed.

In entropy-based data compression, several data compression schemes are evaluated using the proposed correlation model. Then a compression-based routing protocol is developed.

Chapter 5: Entropy Correlation based Data Aggregation Protocol (ECODA)

In this chapter, we outline an Entropy COrrelation-based Data Aggregation protocol (ECODA) using the clustering scheme proposed in Chapter 3 and the data aggregation schemes from Chapter 4. Simulations are also carried out to validate the effectiveness of the proposed clustering and aggregation schemes.

Chapter 6: Conclusions and Future study

This chapter summarizes the results of the thesis with careful evaluations and points out the remaining problems, which are left for future work.

1 INTRODUCTION

1.1 Overviews

Figure 1.1 Wireless Sensor Network

A Wireless Sensor Network (WSN) is a collection of sensor nodes which cooperatively monitor surrounding phenomena over large physical areas [1]–[4]. These sensor nodes can sense, observe or measure, gather information from the environment, and transmit the sensed data to the user based on some local decision process. A typical sensor node is composed of a sensing unit equipped with one or more sensors, a processing unit, a power unit, and a transceiver unit. The sensing unit can carry various sensors, such as thermal, biological, chemical, optical, and magnetic sensors, to measure properties of the environment. A sensor node acquires data through the sensing unit, processes the sensed data in the processing unit, and finally transmits the processed data using the transceiver unit. Because of their limited memory, sensor nodes use wireless communication to transfer data to a base station, allowing them to disseminate their sensor data to remote processing, visualization, analysis, and storage systems.

Figure 1.2 Wireless Sensor Network Applications

Source: https://www.researchgate.net/publication/220505150_Energy_Saving_Mechanisms_for_MAC_Protocols_in_Wireless_Sensor_Networks/figures?lo=1

There are five types of WSNs: terrestrial, underground, underwater, multi-media, and mobile WSNs [3]. In terrestrial WSNs [1], hundreds to thousands of inexpensive wireless sensor nodes are deployed in a given area, either in an ad hoc or in a pre-planned manner. Reliable communication in a dense environment is very important in this WSN type. Battery power in terrestrial sensor nodes is limited and may not be rechargeable; however, the nodes can be equipped with a secondary power source such as solar cells. In a terrestrial WSN, energy can be conserved with multi-hop optimal routing, short transmission range, in-network data aggregation, eliminating data redundancy, minimizing delays, and using low duty-cycle operations.

In underground WSNs [5], sensor nodes are buried underground or placed in a cave or mine to monitor underground conditions. An underground WSN is more expensive than a terrestrial WSN in terms of equipment, deployment, and maintenance. In addition, wireless communication is more difficult in the underground environment due to signal losses and high levels of attenuation.

In contrast to the dense deployment of sensor nodes in a terrestrial WSN, underwater WSNs [6] consist of sensor nodes and vehicles deployed underwater. Because of their special working environment, underwater sensor nodes are more expensive, and fewer of them are deployed than in terrestrial WSNs. Autonomous underwater vehicles are used for exploration or for gathering data from sensor nodes. Underwater wireless communication is typically established through the transmission of acoustic waves, with limited bandwidth, long propagation delay, and signal fading issues. In addition, underwater sensor nodes must be able to self-configure and adapt to the harsh ocean environment.

Multi-media WSNs [7] have been developed to enable the monitoring and tracking of events using multimedia such as video, audio, and imaging. Multi-media WSNs consist of various low-cost sensor nodes equipped with cameras and microphones. They are usually deployed in a pre-planned manner to guarantee coverage. Multi-media sensor nodes interconnect with each other over a wireless connection for data retrieval, processing, correlation, and compression. Because of the high volume of data transmitted, challenges in multi-media WSNs include high bandwidth demand, high energy consumption, quality of service (QoS) provisioning, data processing and compression techniques, and cross-layer design.

Mobile WSNs [8] [9] consist of a collection of sensor nodes that can move on their own and interact with the physical environment. Same as in static WSNs, nodes in [...] mobile nodes can reposition and organize themselves in the network. This mobility requires dynamic routing in a mobile WSN. Challenges in mobile WSNs include deployment, localization, self-organization, navigation and control, coverage, energy, maintenance, and data processing.

The features of WSNs described above give them great potential for many applications [10]–[14]. The development of WSNs was motivated by military applications [15]–[19], and WSNs were then widely used in various fields such as industrial monitoring [20]–[25], environment monitoring [26]–[33], agriculture [34]–[37], forest fire detection [38]–[40], animal tracking [41] [42], healthcare [43]–[50], security [51]–[53], home automation [54] [55], power utility distribution [56], logistics [57], intelligent traffic systems [58], etc.

In Vietnam, WSNs have been studied over the last two decades. The topics attracting the most attention are energy saving and load balancing in WSNs, considering base station position [59], delay constraints [60], 3D WSNs [61], WSNs with holes [62], and k-means clustering [63]. Applications of WSNs are also widely considered, such as landslide monitoring [64], smart grid [65], target tracking [66], logistics [57], and healthcare monitoring [67].

1.2 Energy conservation in WSNs

In most cases, energy for activities in WSNs comes from a limited battery supply. However, in many applications, it is very hard or impossible to recharge the batteries, either because the nodes are deployed in difficult and hostile terrain or because a large number of nodes are deployed in the environment [68] [69]. For those reasons, energy conservation is commonly recognized as the key challenge in designing and operating WSNs, because individual sensor nodes are expected to be low-cost, small, and powered by a non-replaceable battery.

In recent years, numerous energy-saving approaches have been proposed [70] [71]. They can mainly be classified into five categories: radio optimization, sleep/wake-up schemes, energy-efficient routing, data reduction, and charging solutions. The following subsections present these five categories of energy-saving approaches.

1.2.1 Radio optimization

In radio optimization, radio parameters such as coding and modulation schemes, transmission power, and antenna direction are optimized to save energy. Radio optimization approaches can be further divided into four schemes: modulation optimization, cooperative communication, transmission power control, and directional antennas.

Modulation optimization tries to find the modulation parameters that result in minimum radio energy consumption. A good trade-off among the constellation size, the information rate, the transmission time, the distance between nodes, and the noise is considered [72] [73].

Cooperative communication schemes try to improve the quality of the received signal by having several single-antenna devices collaborate to create a virtual multi-antenna transmitter [74] [75].

Transmission power control schemes enhance energy efficiency at the physical layer by adjusting the radio transmission power. The idea is that a shorter communication range between nodes requires less power from the radio [76] [77]. Another idea is that a node with higher remaining energy may increase its transmission power, which enables other nodes to decrease their transmission power [78].
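The effect of a shorter communication range on radio energy can be illustrated with a small calculation. The sketch below assumes a simple first-order radio model in which the amplifier energy grows with the square of the distance; the function name `tx_energy` and the constants `E_ELEC` and `EPS_AMP` are illustrative assumptions, not values taken from this chapter.

```python
# Illustrative first-order radio model; all constants are assumptions.
E_ELEC = 50e-9     # J/bit spent in the transmitter electronics (assumed)
EPS_AMP = 100e-12  # J/bit/m^2 spent in the power amplifier (assumed)


def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to send `bits` over `distance_m`, assuming d^2 path loss."""
    return bits * (E_ELEC + EPS_AMP * distance_m ** 2)


if __name__ == "__main__":
    # Print the cost of a 1000-bit packet for a few transmission ranges.
    for d in (10, 20, 40, 80):
        print(f"{d:3d} m: {tx_energy(1000, d) * 1e6:.1f} uJ per 1000-bit packet")
```

Under these assumptions, halving the range cuts the amplifier term to a quarter while the electronics term stays the same, which is why short hops are attractive only up to a point.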

Directional antenna schemes allow the signal to be sent and received in one direction at a time, which improves transmission range and throughput [79] [80]. To take advantage of directional antennas, new MAC protocols have been proposed in [81] [82]. In addition, some specific problems also have to be considered [83].

1.2.2 Sleep/wake-up schemes

Sleep/wake-up schemes adapt node activity to save energy by putting the radio in sleep mode. The main idea of this approach is duty cycling, which schedules the node radio state according to network activity to minimize idle listening and favor the sleep mode. Duty cycling schemes are the most energy-efficient but suffer from sleep latency. In some cases, a node cannot broadcast information to all its neighbors because they are not active simultaneously. In addition, parameters such as the listening/sleeping period, preamble length, and slot time must be fixed carefully because they affect system performance. A detailed survey of duty cycling can be found in [84].
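To make the benefit of duty cycling concrete, the following sketch estimates the average radio power and the resulting battery lifetime for a given duty cycle. The power figures and the battery capacity are hypothetical placeholders chosen only for illustration, and the model ignores wake-up transitions and sleep latency.

```python
def average_power_mw(duty_cycle: float,
                     p_active_mw: float = 60.0,   # assumed radio power when awake
                     p_sleep_mw: float = 0.03) -> float:  # assumed power when asleep
    """Average radio power for the fraction of time `duty_cycle` spent awake."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw


def lifetime_days(battery_mwh: float, duty_cycle: float) -> float:
    """Idealized lifetime in days for a battery of `battery_mwh` milliwatt-hours."""
    return battery_mwh / average_power_mw(duty_cycle) / 24.0


if __name__ == "__main__":
    for dc in (1.0, 0.1, 0.01):
        print(f"duty cycle {dc:>5.0%}: {lifetime_days(5000.0, dc):8.1f} days")
```

The model is deliberately simple: it shows only that the sleep-state power dominates the budget once the duty cycle becomes small.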

1.2.3 Energy efficient routing

Routing is also a burden that can seriously drain energy reserves. In general, there are various routing paradigms. In this research area, the main paradigms considered are cluster architecture, energy as a routing metric, multipath routing, and relay node placement. An extensive review of energy-aware routing protocols can be found in [85] [86].

Cluster architecture organizes the network into clusters, each managed by a cluster head. This technique has been proposed to enhance energy efficiency because it limits energy consumption in several ways: reducing the communication range inside the cluster, which requires less transmission power; limiting the number of transmissions through data fusion at the cluster head; delegating energy-intensive computation to the cluster head; enabling some nodes inside the cluster to be powered off while the cluster head takes on forwarding responsibilities; and balancing energy consumption through cluster head rotation [87].

Energy can also be considered as a metric in the path setup phase to extend the lifetime of sensor networks. In this case, routing algorithms not only focus on the shortest paths but can also select the next hop based on its residual energy [88].
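One simple way to combine both criteria is sketched below: among the neighbors that bring a packet closer to the sink, pick the one with the most residual energy. This is a hypothetical selection rule written for illustration, not the algorithm of [88]; the `Neighbor` fields and the function name are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    node_id: str
    hops_to_sink: int         # hop count advertised by the neighbor (assumed known)
    residual_energy_j: float  # remaining battery energy in joules


def choose_next_hop(my_hops: int, neighbors: List[Neighbor]) -> Optional[Neighbor]:
    """Pick the neighbor closer to the sink that has the most energy left."""
    # Keep only neighbors that make forward progress toward the sink.
    candidates = [n for n in neighbors if n.hops_to_sink < my_hops]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.residual_energy_j)


if __name__ == "__main__":
    table = [Neighbor("A", 2, 0.2), Neighbor("B", 2, 0.9), Neighbor("C", 4, 2.0)]
    print(choose_next_hop(3, table))
```

Node C has the most energy in the example but lies farther from the sink, so it is filtered out before the energy comparison.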

Multipath routing is, in general, more complex than single-path routing. However, single-path routing can rapidly drain the energy of nodes on the selected path, whereas multipath routing can balance the energy among nodes by alternating forwarding nodes [89] [90]. More surveys on multipath routing protocols can be found in [91].

The premature depletion of nodes in a region can create energy holes or partition the network. This situation can be avoided by optimizing node placement or by adding relay nodes with enhanced capabilities. This helps to improve energy balance, avoid sensor hot-spots, and ensure coverage [92]–[94].

1.2.4 Data reduction

Energy consumption depends on data transmission. Thus, reducing the amount of data to be delivered can save energy. Data reduction approaches can be divided into three types: data aggregation, adaptive sampling, and network coding.

Data aggregation techniques involve different ways of routing data packets so that they can be combined by exploiting the extracted features and statistics of datasets coming from different sensor nodes. There are several aggregation techniques with different aggregation functions for different application requirements. The first type of aggregation function extracts the maximum, minimum, or average value of the aggregated data [95] [96]. This reduces the amount of data communicated in the network and therefore the power consumption. However, this technique can lose much of the original structure of the data.
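The first type of aggregation function can be sketched in a few lines. The example below is a minimal illustration in which an aggregating node replaces all the readings it has collected with three summary values; the function name and the sample readings are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def aggregate(readings: Sequence[float]) -> Dict[str, float]:
    """Summarize the collected readings as minimum, maximum, and average."""
    if not readings:
        raise ValueError("no readings to aggregate")
    return {"min": min(readings), "max": max(readings), "avg": mean(readings)}


if __name__ == "__main__":
    # Eight hypothetical temperature readings collected from member nodes.
    collected = [21.5, 21.7, 21.6, 22.0, 21.4, 21.8, 21.9, 21.6]
    summary = aggregate(collected)
    print(f"{len(collected)} values reduced to {len(summary)}: {summary}")
```

The summary is much smaller than the raw data, but the individual readings cannot be recovered from it, which is the loss of structure mentioned above.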

The second type of aggregation technique is data compression. Data compression techniques are further divided into distributed data compression [97] [98] and local data compression [99] [100]. Distributed data compression techniques achieve the best compression, but they are much more complicated than local data compression, which achieves a smaller compression rate. A detailed survey of data compression in WSNs can be found in [101]. It is important to note that data compression techniques are only effective with correlated data. Therefore, correlation is usually required when using these techniques.

The third type of aggregation technique is the representative type [102], in which some nodes are chosen to represent a group of nodes. The other nodes in the group can be put to sleep to save energy. The number of sleeping nodes, which affects the power consumption, is decided by a specified distortion. As with data compression, these techniques require correlated data.

Adaptive sampling techniques adjust the sampling rate at each sensor while ensuring that application needs are met in terms of coverage or information precision, by exploiting spatio-temporal correlations between data. Reducing the number of samples reduces the amount of transmitted data and thus saves node energy. Temporal analysis of sensed data is used in [103], and spatial correlation is used in [104]. More details about adaptive sampling can be found in [105].
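As a rough illustration of how temporal correlation can drive the sampling rate, the sketch below lengthens the sampling interval while consecutive readings stay close and shortens it when they change quickly. The halving/doubling rule, the threshold, and the bounds are hypothetical choices and do not reproduce the methods of [103] or [104].

```python
def next_interval(current_s: float, prev_value: float, new_value: float,
                  threshold: float = 0.5,  # assumed "significant change" level
                  min_s: float = 1.0, max_s: float = 60.0) -> float:
    """Return the next sampling interval in seconds (bounded to [min_s, max_s])."""
    if abs(new_value - prev_value) > threshold:
        # The signal is changing quickly: sample twice as often.
        return max(min_s, current_s / 2)
    # Consecutive readings are strongly correlated: sample half as often.
    return min(max_s, current_s * 2)


if __name__ == "__main__":
    interval, previous = 8.0, 20.0
    for reading in (20.1, 20.2, 22.5, 24.9, 25.0):
        interval = next_interval(interval, previous, reading)
        print(f"reading {reading:5.1f} -> next sample in {interval:4.1f} s")
        previous = reading
```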

Network coding is used to reduce the traffic in broadcast scenarios by sending a linear combination of several packets instead of a copy of each packet. At the destination nodes, the data can be decoded by solving the linear equations [84] [106]. Network coding exploits the trade-off between computation and communication, since communication is slower than computation and consumes more energy.
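The simplest linear combination is a bitwise XOR, which is addition over GF(2). The sketch below assumes a hypothetical relay scenario in which two nodes exchange packets through a common relay: the relay broadcasts one coded packet instead of forwarding two, and each node recovers the other's packet by XOR-ing the broadcast with its own. The packet contents are made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets (a linear combination over GF(2))."""
    if len(a) != len(b):
        raise ValueError("packets must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    packet_a = b"\x17\x2a\x03"  # hypothetical packet held by node A
    packet_b = b"\x18\x2b\x00"  # hypothetical packet held by node B

    coded = xor_bytes(packet_a, packet_b)  # single broadcast by the relay

    # Each node solves coded = a XOR b for the packet it does not yet have.
    recovered_at_a = xor_bytes(coded, packet_a)
    recovered_at_b = xor_bytes(coded, packet_b)
    print(recovered_at_a == packet_b, recovered_at_b == packet_a)
```

The relay saves one transmission per exchange at the cost of an extra XOR at each node, which is the computation-for-communication trade described above.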

1.2.5 Charging solution

Several recent studies address energy harvesting and wireless charging techniques as promising solutions for WSNs because they allow recharging without human intervention.

Energy harvesting techniques have been developed to enable sensors to harvest energy from their surrounding environment, such as solar, wind, or kinetic energy [107]. Energy harvesting schemes often require energy prediction to manage the available power efficiently. It is important to note that, because the energy remaining between two harvesting opportunities is limited, energy-saving mechanisms are still necessary.

The breakthrough in wireless power transfer is expected to enable wireless charging for WSNs. Wireless charging can be done in two ways: electromagnetic radiation and magnetic resonant coupling. It has been shown that omnidirectional electromagnetic radiation technology is only applicable to ultra-low power requirements and low sensing activity [108]. The reason is that electromagnetic waves suffer from a rapid drop in power efficiency over distance, and active radiation technology may pose safety concerns to humans. In contrast, magnetic resonant coupling appears to be the most promising technique, with higher efficiency and better safety. However, the charging range is still a big concern [108].

Data correlation and energy conservation in WSNs

In typical WSN applications, sensors must be deployed with high spatial density to achieve satisfactory coverage [1]. Consequently, multiple sensors record information about a single event in the sensing field, i.e. their sensed data strongly depend on each other. For example, temperature sensors in the same room record nearly the same temperature, and several cameras that monitor the same area record many frames with similar information. In other words, they are correlated with each other. The existence of correlation brings significant potential advantages for the development of efficient communication protocols well suited to the WSN paradigm. For example, data in a highly correlated region can be compressed with a high ratio, which reduces the amount of data sent [109]. With high enough correlation, it may not even be necessary for every sensor node in a correlation group to transmit its data to the base station; instead, a smaller number of sensor measurements might be adequate to communicate the event features to the base station within a certain reliability/fidelity level [110].

In addition, the power breakdown in WSNs heavily depends on the specific node. However, the following remarks generally hold [109], [111]:

• The radio energy consumption is of the same order of magnitude in the reception, transmission, and idle states, while the power consumption drops by at least one order of magnitude in the sleep state. Therefore, the radio should be put to sleep (or turned off) whenever possible.

• The communication activity has a much higher energy consumption than the computation activity. It has been shown that transmitting one bit may consume as much energy as executing a few thousand instructions [112]. Therefore, communication should be traded for computation.

Data correlation allows us to reduce the amount of data transferred, or even to put some sensor nodes to sleep. Thus, it can help WSNs conserve energy significantly.


Problem statements and contributions

The main problems in this research are “How to recognize the correlation among datasets by looking at the data itself, and how to exploit the correlation characteristic for energy conservation in WSNs”. In this research, we focus on WSNs working in a high correlation environment. A high correlation environment can be divided into groups called correlation regions, in which measured data strongly depend on each other. By clustering sensor nodes into correlation regions, data aggregation can be done to conserve energy in WSNs. In this thesis, we focus on two data aggregation schemes: data compression and representative aggregation. The main contributions of the thesis are:

Developing an entropy correlation clustering algorithm and entropy correlation model to describe the correlation characteristics of a correlation cluster.

This algorithm can divide a correlation environment into several correlation regions using the entropy values of measured data and the entropy correlation coefficients of measured data pairs in the environment. At the same time, this algorithm uses only the data itself and does not depend on distance information. The correlation model describes the relationship between the joint entropy of a dataset and the number of data series in the dataset, taking into account the entropy correlation coefficient of the data.

Analyzing and evaluating the impact of the correlation characteristic on data aggregation schemes.

With the proposed correlation clustering and model, it is necessary to evaluate their impact on data aggregation schemes. With data compression aggregation, several compression schemes and network structures are considered to find the most appropriate compression routing for WSNs. With representative aggregation, a distortion function that measures the required ratio of data loss is used. The number of representative nodes is then evaluated, and a representative node selection algorithm is proposed.

Developing an entropy correlation-based data aggregation protocol for WSNs to exploit the correlation characteristic of the sensed environment.

The developed protocol includes two phases: one phase for data collection to identify the correlation characteristic, and the other for data aggregation. The protocol uses the proposed clustering algorithm and data aggregation schemes. In addition, the design of the protocol is proposed. The feasibility of the developed protocol is then demonstrated using simulation.


CHAPTER 2

As mentioned in chapter 1, the correlation characteristic has many significant potential advantages for the development of energy-efficient communication protocols for WSNs. To evaluate and exploit the correlation characteristic, it is necessary to build a correlation model. This chapter concentrates on a survey of the existing correlation models. From the advantages and limitations of the previous correlation models, the methodology for developing a new correlation model will be pointed out.

Correlation model survey

Correlation represents the relationship between quantitative or categorical variables. In other words, it is a measure of how things are related. Data correlation is a measure of how data are related to each other.

To exploit the correlation in WSNs, it is necessary to recognize the correlation among data in the network by establishing correlation models. There have been many research efforts to study correlation models in WSNs. In [111], correlated nodes are supposed to observe the same source $S$, and the observed data $X_i(n)$ at the $i$-th node is the sum of a correlated version of the source, $S_i(n)$, and the observation noise $N_i(n)$:

$$X_i(n) = S_i(n) + N_i(n). \quad (2.1)$$

The correlation model is a covariance function (correlation coefficient) that is chosen to be distance-dependent and can be classified into four groups, including:

Rational quadratic:


Magnitude -dissimilarity: Two-time series { , , … , and


Trend -dissimilarity: Two-time series { 1 , 2 , … , } and { 1 , 2 , … , } are trend -dissimilarity if

defined as the sum of absolute differences of their components:

$$d(X, Y) = \sum_{i=1}^{n} |x_i - y_i|. \quad (2.9)$$

The smaller the Manhattan distance is, the more similar the two vectors are. Manhattan distance is also used to define dissimilarity in [119].
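The Manhattan distance can be sketched in a few lines. The function name and the sample readings below are illustrative assumptions, not data from the cited works.

```python
from typing import Sequence


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of absolute component-wise differences of two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical temperature readings from two nearby sensors.
readings_a = [20.0, 21.0, 23.0]
readings_b = [19.5, 21.5, 22.0]
distance = manhattan_distance(readings_a, readings_b)  # 0.5 + 0.5 + 1.0
```

A threshold on this distance would then decide whether two series count as similar; the threshold itself is application-dependent.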

Some research efforts define the correlation model in different ways, such as a linear predictive model [120], node weight [121], and data density correlation degree [102]. In [120], a set of sensor nodes is a correlation set if a reading at a node can be predicted using a linear combination of readings from the other nodes. Let $S = \{s_1, s_2, \dots, s_k\}$ be a set of sensor nodes. Then, the predicted value of a node $s$, $x'[t]$, can be presented as a linear combination of $x_1[t], x_2[t], \dots, x_k[t]$ for all $t$:

$$x'[t] = \sum_{i=1}^{k} a_i x_i[t], \quad (2.10)$$

in which $a_i$ are weighting coefficients. The set $(S, s)$ is a correlation set if the difference between the actual value $x[t]$ and the predicted value $x'[t]$ is within a certain application-dependent bound. The prediction error over $T$ readings is

$$E = \sum_{t=1}^{T} (x[t] - x'[t])^2.$$

The weighting coefficients $a_i$ are determined such that $E$ is minimized.
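Minimizing the squared prediction error is an ordinary least-squares problem. The sketch below solves the normal equations with Gauss-Jordan elimination in plain Python; it illustrates the idea and is not the procedure of [120]. The function names and readings are hypothetical.

```python
from typing import List, Sequence


def fit_weights(neighbor_readings: Sequence[Sequence[float]],
                target: Sequence[float]) -> List[float]:
    """Least-squares weights minimizing sum_t (target[t] - sum_i a_i * x_i[t])**2.

    neighbor_readings[t][i] is the reading of neighbor i at time t.
    """
    k = len(neighbor_readings[0])
    # Augmented normal equations [A^T A | A^T x].
    m = [
        [sum(row[i] * row[j] for row in neighbor_readings) for j in range(k)]
        + [sum(row[i] * x for row, x in zip(neighbor_readings, target))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("neighbor readings are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def prediction_error(neighbor_readings, target, weights) -> float:
    """Sum of squared differences between actual and predicted readings."""
    return sum((x - sum(a * v for a, v in zip(weights, row))) ** 2
               for row, x in zip(neighbor_readings, target))


# Hypothetical readings: the target node is the average of its two neighbors.
neighbors = [[20.0, 22.0], [21.0, 22.5], [23.0, 24.0], [22.0, 25.0]]
target = [0.5 * a + 0.5 * b for a, b in neighbors]
weights = fit_weights(neighbors, target)
error = prediction_error(neighbors, target, weights)
```

In practice the fitted error would be compared against the application-dependent bound to decide whether the nodes form a correlation set.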

In [121], the correlation of a node with its neighbors is evaluated using correlated weight The definition of the Spatial Correlated Weight considers the average spatial distance deviation between each node and its neighbors within a predefined communication range For

correlated weight of node is defined as:

in which is Euclidean distance between node and node , ̅ is expected the value of , ̅ is the number of nodes in ( ), and ( ) is deviation of Large value of (or small value of ( )) implies the small distance variation between the node and its neighbors, i.e a high spatial correlation with its neighbors.

In [102], in order to evaluate the correlation, the data density correlation degree of a node is used Let sensor node has neighboring sensor nodes within the cycle of the communication radius of Those neighboring sensor nodes are called 1, 2, … , The data object of is , and its neighboring sensor nodes’ data is 1, 2, … , respectively Among these data objects, there are data objects whose distances to are less than and ≤ ≤ Then the data density correlation degree of sensor node to the sensor nodes whose data objects are in - neighborhood of is calculated as follows:


; is the average distance between the data objects and ; 1 , 2 and 3 are weights, 1 + 2 + 3 = 1.

Like the previously mentioned approaches, all the above models are distance-based and consider only the linear correlation between data. To handle more general correlation, entropy-based correlation models are considered [122]–[126]. In [122], the joint entropy of a group of nodes is calculated using a real dataset, and then a distance-based joint entropy function is built by approximating the calculated joint entropy. The joint entropy $H_n$ of a set of nodes $\{s_1, s_2, \dots, s_n\}$ is calculated based on the joint entropy $H_{n-1}$ of the set of $n-1$ nodes $\{s_1, s_2, \dots, s_{n-1}\}$ and the shortest distance $d$ from $s_n$ to any node in $\{s_1, s_2, \dots, s_{n-1}\}$ as follows:

$$H_n = H_{n-1} + \left[1 - \frac{1}{\frac{d}{c} + 1}\right] H_1, \quad (2.14)$$

in which $H_1$ is the entropy of a single node and $c$ is a constant that characterizes the extent of spatial correlation in the data.
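A short sketch of this incremental, distance-based model, assuming the recursion $H_n = H_{n-1} + [1 - 1/(d/c + 1)] H_1$ with $H_1$ the entropy of a single node. The function and parameter names are illustrative.

```python
from typing import Sequence


def joint_entropy_by_distance(h1: float,
                              shortest_distances: Sequence[float],
                              c: float) -> float:
    """Iterate H_n = H_{n-1} + (1 - 1 / (d / c + 1)) * H_1.

    h1: entropy of a single node, in bits.
    shortest_distances: for each added node, its shortest distance to the
        nodes already in the set.
    c: correlation constant (a larger c means stronger spatial correlation).
    """
    if c <= 0:
        raise ValueError("c must be positive")
    h = h1
    for d in shortest_distances:
        h += (1.0 - 1.0 / (d / c + 1.0)) * h1
    return h


co_located = joint_entropy_by_distance(4.0, [0.0, 0.0], c=1.0)  # d = 0 adds nothing
at_distance_c = joint_entropy_by_distance(4.0, [1.0], c=1.0)    # d = c adds half of H_1
far_apart = joint_entropy_by_distance(4.0, [1e12], c=1.0)       # approaches 2 * H_1
```

The three cases show the limits of the model: co-located nodes add no information, while very distant nodes behave as if independent.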

= 1 −

(2.17)

An entropy-based correlation model is also used in [125]. There, the entropy correlation coefficient is chosen to be the Pearson linear correlation coefficient, which reduces the computational complexity but also reduces the generality of using entropy.


It can be seen that the correlation models in the above works are all based on the distance between nodes: the smaller the distance between nodes, the higher their correlation. However, this assumption may not always be true because of physical barriers among nodes. For example, in Figure 2.1, some sensor nodes are placed in two rooms next to each other, in which room 1 is equipped with an air conditioner while room 2 is not. Nodes A and B are placed close to each other, but they are in different rooms with independent conditions, so their sensed data may be independent of each other. The sensed data at node A is correlated with that at node C because they are placed in the same room with the same conditions, even though their distance is larger than the distance between nodes A and B. Therefore, it is necessary to establish a correlation model that is independent of the positions of sensor nodes. In addition, when observing the readings over time of 54 sensors deployed at Intel Berkeley Lab [127], [128], it is found that the correlation of data may be independent of external factors such as sensor location and environmental conditions. Therefore, it is better to look at the information contained in the data itself rather than considering only attribute meta-data such as location and time [126].

Figure 2.1 The layout of sensor nodes in an environment with two areas under different conditions (figure labels: air conditioner, Node A, Node B, Node C)

To address the above problem, in [129], entropy is calculated from real data, and then the joint entropy of a set of nodes $\{s_1, s_2, \dots, s_n\}$ is approximated by a function of the number of nodes $n$ in the set as follows:

(2.18)

in which the two parameters are constants determined from real data. The advantage of this model is that it is distance-independent; the disadvantage is that it can only be obtained once the correlation set has been established. The calculation for determining the correlation group using this model is very complicated with huge


It can be found that most of the correlation models are distance-based, which may not hold in some cases, such as the examples shown in [126]. In that paper, the authors found that sensors in similar environmental conditions that are not necessarily spatially correlated can report correlated data, and that the correlation of data may be independent of external factors such as sensor location and environmental conditions. Therefore, it is necessary to develop a model that is distance-independent and practically applicable to wireless sensor networks.

A correlation model can be established mainly by using traditional probability and statistics theory or by information entropy theory. Correlation from the point of view of information entropy is more general but also more challenging. With the purpose of finding a novel and general correlation model, this thesis uses information entropy theory to discover the correlation characteristic by looking at the data itself.

Information entropy theory

2.2.1 Overview

Information entropy, or Shannon’s entropy, is a foundational concept of information theory. It quantifies the amount of information in a variable, thus providing the foundation for a theory around the notion of information.

At a conceptual level, information entropy is simply the "amount of information" in a variable. More intuitively, it corresponds to the amount of storage (e.g. the number of bits) required to store the variable. However, calculating this number of bits, and therefore the amount of information in a variable, is more involved than it might appear at first sight. It is not simply the number of bits required to represent all the different values that a variable might take on, which is just the raw data. For example, a variable may take on any of 8 different values. In digital storage, 3 bits are enough to uniquely represent the 8 different values, and thus the variable can be stored in 3 bits.

However, this is an upper limit on the required storage; it is the amount of storage required to store the raw “data” of the variable, not the “information” in that data. Less storage might be enough to store the information, depending on the process by which the variable takes on different values. For example, suppose a coin is completely biased and always comes up heads when tossed. Then the random variable representing the outcome of the coin toss has probability 1 of coming up heads (in other words, it is a constant). It is not necessary to store that variable, as it can be trivially guessed at any time. Thus, the amount of information in that variable is zero. On the other hand, if we have a fair coin with equal chances of coming up heads or tails, then we can guess the outcome of a toss with only 50% accuracy (probability 0.5), so it is necessary to store the actual value of the outcome to know it with better than 50% accuracy. The amount of information in this second random variable is much higher than in the first case.

In a more sophisticated representation of the variable, if a variable is easier to guess, then we can use that fact to reduce the number of bits needed to store that information. If the value of the variable is easier to guess, the variable is less “surprising” and contains less information. Thus, an alternative way of considering entropy is as a measure of the “compressibility” of the data, i.e., a compression metric that expresses how much the raw data of a variable can be compressed without losing the information in the variable.

2.2.2 Entropy concept

In information theory, the entropy of a random variable is a function that attempts to characterize the “unpredictability” or “uncertainty” of the random variable [130]. In other words, the more uncertain or unpredictable the event is, the more information it contains and the larger its entropy is.

If a random variable $X$ takes on values in a set $\mathcal{X} = \{x_1, x_2, \dots, x_n\}$ and is defined by a probability distribution $P(X)$, then the entropy $H(X)$ of the discrete random variable $X$ is written as:

$$H(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x). \quad (2.19)$$

The units of entropy are “bits” or “nats”, depending on whether $\log(\cdot)$ is the base 2 logarithm or the natural logarithm.

In this context, the base 2 logarithm is used instead of the natural logarithm; hence, entropy is defined as the expected number of bits of information contained in each event, taken over all possibilities.
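The earlier examples (a variable with 8 equally likely values, a fair coin, and a completely biased coin) can be checked with a small sketch that estimates entropy in bits from observed samples. Estimating probabilities by relative frequency is an assumption of this sketch, and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(samples: Iterable[Hashable]) -> float:
    """Empirical entropy in bits: sum of p(x) * log2(1 / p(x)) over observed values."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one sample is required")
    return sum((n / total) * math.log2(total / n) for n in counts.values())


uniform_8 = shannon_entropy(range(8))            # eight equally likely values
fair_coin = shannon_entropy(["H", "T"] * 50)     # heads and tails equally often
biased_coin = shannon_entropy(["H"] * 100)       # always heads
```

The uniform 8-valued variable needs the full 3 bits, the fair coin needs 1 bit, and the constant coin carries no information.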

2.2.3 Joint entropy

In the case of several random variables, the information is calculated by joint entropy, which is the entropy of a joint probability distribution, or a multi-valued random variable [130]. By definition, if $X$ and $Y$ are jointly distributed according to the joint probability distribution $P(X, Y)$, the joint entropy $H(X, Y)$ is:

$$H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x, y) \log_2 P(x, y), \quad (2.20)$$

in which $x$ and $y$ are particular values of $X$ and $Y$.

For more than two random variables $X_1, X_2, \dots, X_n$, the joint entropy expands to:

$$H(X_1, \dots, X_n) = -\sum_{x_1 \in \mathcal{X}_1} \cdots \sum_{x_n \in \mathcal{X}_n} P(x_1, \dots, x_n) \log_2 P(x_1, \dots, x_n), \quad (2.21)$$

in which $x_1, x_2, \dots, x_n$ are values of $X_1, X_2, \dots, X_n$, respectively, and $P(x_1, \dots, x_n)$ is the probability of these values occurring together.
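A minimal sketch of the joint entropy of several data series, estimated from the relative frequency of joint outcomes. The estimator and the sample series are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def joint_entropy(*series: Sequence[Hashable]) -> float:
    """Empirical joint entropy in bits of one or more equally long series."""
    if not series:
        raise ValueError("at least one series is required")
    length = len(series[0])
    if length == 0 or any(len(s) != length for s in series):
        raise ValueError("series must be non-empty and equally long")
    counts = Counter(zip(*series))  # each tuple is one joint outcome
    return sum((n / length) * math.log2(length / n) for n in counts.values())


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]              # independent of x in this sample
h_x = joint_entropy(x)        # entropy of a single series
h_xy = joint_entropy(x, y)    # four equally likely pairs
h_xx = joint_entropy(x, x)    # a fully dependent pair adds no information
```

With one argument the function reduces to the ordinary entropy, so the same code covers both definitions.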

Correlation and entropy

2.3.1 Correlation of two variables

Correlation is a measure of the relation or dependence between variables. In entropy theory [130], the relation between two variables can be described by the concepts of mutual information and correlation coefficient.

2.3.1.1 Mutual information

Figure 2.2 The relations between entropies, joint entropy, and mutual information

Figure 2.2 describes the relationship between the entropies of random variables, their joint entropy, and their mutual information. The relation between entropy and joint entropy is shown by inequality (2.22), with equality if $X$ and $Y$ are independent:

$$H(X, Y) \le H(X) + H(Y). \quad (2.22)$$


The above inequality shows that when the two random variables are independent, i.e. they share no information, their joint entropy exactly equals the sum of their entropies. Otherwise, the joint entropy is smaller than the sum of the individual entropies. Knowing the joint entropy of random variables therefore tells us how much knowing some of the variables reduces uncertainty about the others. For given individual entropies, the smaller the joint entropy is, the higher the correlation of the random variables is.

Another metric used for measuring the mutual dependence between two variables is mutual information. While the entropy of a random variable measures the information about the event itself, mutual information measures the relationship between two random variables that are sampled simultaneously.

For example, let $X$ represent the weather and $Y$ the humidity on a given day in a specific city. The value of $X$ tells us something about the value of $Y$ and vice versa: if the weather is rainy, then the humidity is very likely to be high. That is, these variables share mutual information: knowing that a day is rainy tells us something about the humidity of that day. On the other hand, if $X$ represents the weather of one day and $Y$ represents the humidity of another day, then $X$ and $Y$ share no mutual information, because the weather of one day does not contain any information about the humidity of the other day.

In general, mutual information measures how much information is communicated, on average, in one random variable about another. The mutual information between two variables is 0 if and only if the two variables are statistically independent. The formal definition of the

2.3.1.2 Entropy correlation coefficient

It is found that mutual information can be used to measure the correlation between two sets of data: the larger the mutual information of two variables, the stronger the correlation between them. However, it is difficult to compare the correlation level between two pairs of random variables using mutual information or joint entropy, because their values depend on the entropy of each individual data series in the pair. To overcome this problem, a normalized measure of mutual information called the entropy correlation coefficient, introduced in [124] and [131], is used to evaluate the correlation.

The relation between mutual information and entropy is given by:

$$I(X, Y) = H(X) + H(Y) - H(X, Y).$$

The entropy correlation coefficient normalizes the mutual information by the individual entropies:

$$\rho(X, Y) = 2\,\frac{I(X, Y)}{H(X) + H(Y)}.$$

Then $\rho(X, Y)$ can be calculated by:

$$\rho(X, Y) = 2\,\frac{H(X) + H(Y) - H(X, Y)}{H(X) + H(Y)} = 2\left[1 - \frac{H(X, Y)}{H(X) + H(Y)}\right].$$

The coefficient $\rho(X, Y)$ is called the entropy correlation coefficient of the two random variables $X$ and $Y$, expressed in terms of either the mutual information $I(X, Y)$ or the joint entropy $H(X, Y)$. The entropy correlation coefficient expresses the relative relationship within a pair of data series, independent of the values of the individual entropies, and therefore it can be used to compare the correlation levels of two pairs of data.

The entropy correlation coefficient varies from 0 to 1, depending on the correlation between the two variables. The larger the value of $\rho$ is, the higher the correlation is. If $\rho = 1$ (in the case $H(X) = H(Y) = H(X, Y)$), the two sets of data totally depend on each other. If $\rho = 0$ (in the case $H(X, Y) = H(X) + H(Y)$), they are independent.
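The two boundary cases can be checked numerically. The sketch below assumes the coefficient has the form $2\,(H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))$ and estimates entropies from relative frequencies. The helper names and sample series are illustrative.

```python
import math
from collections import Counter


def entropy_bits(*series) -> float:
    """Empirical (joint) entropy in bits of one or more equally long series."""
    total = len(series[0])
    counts = Counter(zip(*series))
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_correlation_coefficient(x, y) -> float:
    """rho = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    h_x, h_y, h_xy = entropy_bits(x), entropy_bits(y), entropy_bits(x, y)
    if h_x + h_y == 0:
        raise ValueError("both series are constant; the coefficient is undefined")
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
rho_dependent = entropy_correlation_coefficient(a, a)    # H(X) = H(Y) = H(X, Y)
rho_independent = entropy_correlation_coefficient(a, b)  # H(X, Y) = H(X) + H(Y)
```

Because the coefficient is normalized, values from different pairs of sensors can be compared directly, which raw mutual information does not allow.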

2.3.2 Correlation of more than two variables

Information entropy theory directly provides a correlation measure only for two variables. For more than two variables, the concepts of mutual information or entropy correlation coefficient can be extended, but such extensions are inefficient and of little practical use.

When working with correlation, two requirements must be addressed. The first is how to recognize the correlation between variables, and the second is how to evaluate the correlation level. For two variables, the entropy correlation coefficient can be used for both recognition and evaluation of correlation. For more than two variables, there has not been any efficient way of using entropy theory directly to meet these two requirements. Instead, the correlation is recognized using distance information, and the correlation level is evaluated using distance-dependent functions [122]–[124].

Since this thesis aims to look at the data itself, distance-dependent approaches are not applicable. To meet these requirements, joint entropy is a natural object to consider. In information entropy theory, the joint entropy of a group of variables represents the amount of information needed to specify the values of the variables in the group. Joint entropy is never larger than the total entropy of the individual variables. The more correlated the variables in the group are, the larger the difference between the joint entropy and the total entropy of the individual variables is. However, it is difficult to use the comparison between the joint entropy and the total entropy directly. Instead, the increase in the joint entropy of a group when one variable is added to the group is considered. If the added variable is highly correlated with the variables in the group, i.e. it strongly depends on them, the increase in the joint entropy of the group caused by adding the variable is small.

Figure 2.3 Relation between correlation and joint entropy

In other words, only a small amount of additional information is needed to specify the added variable. Therefore, if we consider the relation between the joint entropy and the number of variables in a group, we find that the growth rate of the joint entropy gradually decreases and approaches zero. In other words, the joint entropy approaches a “saturation” state as the number of considered variables increases. Nodes with higher correlation approach the saturation state faster. This phenomenon is described in Figure 2.3 and was discovered by the authors of [129]. The speed of approaching the “saturation” state can be used to characterize the correlation level.
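The saturation behaviour can be illustrated with a toy example. This is not the model of [129]; the data series are synthetic and the helper names are assumptions of this sketch. The joint entropy is recomputed as series are added one by one, for a highly correlated group and for an independent group.

```python
import math
from collections import Counter


def entropy_bits(*series) -> float:
    """Empirical (joint) entropy in bits of one or more equally long series."""
    total = len(series[0])
    counts = Counter(zip(*series))
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def joint_entropy_growth(series_list):
    """Joint entropy of the first 1, 2, ..., n series, in the given order."""
    return [entropy_bits(*series_list[:k]) for k in range(1, len(series_list) + 1)]


base = [0, 1, 2, 3, 0, 1, 2, 3]
# Each added series is a function of the first one, so it adds no information.
correlated_group = [base, [v % 2 for v in base], [v // 2 for v in base]]
# Three mutually independent binary series: every addition contributes one bit.
independent_group = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

correlated_growth = joint_entropy_growth(correlated_group)
independent_growth = joint_entropy_growth(independent_group)
```

The correlated group saturates immediately at the entropy of its first member, while the independent group keeps growing linearly with the number of series.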

However, it is difficult to use this joint entropy characteristic to recognize a correlation group. To find a correlation subgroup within a considered group, the relation between joint entropy and the number of variables must be established and checked for all possible subgroups. Therefore, we need to find an efficient way to use the joint entropy characteristic of a correlation group to recognize the correlation and evaluate the correlation level.

Conclusions

In this chapter, a survey of correlation models in WSNs, covering several types of correlation model, has been presented. The survey results show that the models which use traditional probability and statistics theory only describe linear correlation, whereas information entropy-based correlation models can describe general correlation. Most of the correlation models are distance-dependent. However, it is necessary to investigate the correlation characteristic by concentrating on the data itself instead of using distance information. When working with the data itself, one can use the relationship between joint entropy and the number of considered variables to recognize the correlation characteristic. However, among the models discussed above, there is not yet any efficient way to employ this relationship. This problem will be solved in the next chapter.


CHAPTER 3

In chapter 2, we have shown that correlation is related to the relation between joint entropy and the number of considered variables. To determine whether a group of nodes is correlated or not, it is necessary to know the entropy of each node and the joint entropy of all subgroups of the considered group. However, calculating the joint entropy for a group of more than two nodes is time-consuming and requires huge computational resources. As a result, it is necessary to find a simple method to estimate joint entropy. To solve this problem, in this chapter, we estimate the joint entropy of a node group from the entropy of the individual nodes in the group and the entropy correlation coefficients of all pairs in the group. From this estimation, the correlation characteristic can be recognized, and the correlation clustering can be established.

Joint entropy estimation

satisfies the following conditions:

of hierarchical clustering [132] as described in the following sections

3.1.1 Determining the upper bound of joint entropy

With a group that has only one node, we have the entropy of the node defined by equation (3.1):

(3.3)

in which 1 = 1.

With a pair of nodes, and , from the definition of entropy correlation

coefficient in equation (2.33), we have:


( , ) = min{ ( , ), ( , )}}. (3.10)

In addition, ( , ) ≥ ∀ ≠ ; 1 , , then ( , ) ≥

We have
