Neural Wireless Sensor Networks

Frank Oldewurtel and Petri Mähönen
RWTH Aachen University, Department of Wireless Networks
Kackertstrasse 9, D-52072 Aachen, Germany
Email: Frank.Oldewurtel@mobnets.rwth-aachen.de

Abstract— We present an overview of embedded network applications and discuss the requirements arising from this analysis. Furthermore, we discuss selected in-network processing techniques and point out the analogy between neural and sensor networks. Neural networks are then introduced in the sensor network context. We describe the motivation and the practical case for neural networks in sensor networks, and evaluate early results achieved with our test implementation. We argue that these paradigms have high potential and promise a strong impact on future research, especially if applied as a hybrid technology.

I. INTRODUCTION ON SENSOR NETWORKS

Wireless Sensor Networks (WSNs) are an emerging technology which holds the potential to revolutionize everyday life [1], [2], [3], [4]. Continuous advances in semiconductor technology have enabled the miniaturization of radios and mechanical structures and the deployment of very cost-efficient wireless sensor nodes. A sensor node basically consists of a microcontroller (processor and memory), sensors, an analog-to-digital converter (ADC), a transceiver (sender and receiver) and a power supply. The abstract architecture of a sensor node and the dependencies of its units are depicted in fig. 1.

Fig. 1. The abstract architecture of a sensor node (units: power supply, sensors, ADC, processor, memory, sender, receiver).

The sensor itself is a transducer that converts a physical phenomenon such as heat into electrical signals that may be further manipulated.
The purpose of the sensors is to gather information about the physical world, for instance temperature, light, acceleration, velocity, vibration, pressure, magnetism and humidity. They are analogous to organs such as the eyes and ears of biological organisms. The individual sensor nodes are inherently resource-constrained due to limitations in size (form factor), price and power consumption. Hence, a single sensor node is not very powerful, but in combination with others, i.e. as part of the WSN, it forms a powerful entity as a result of synergy effects. Sensing a physical phenomenon, processing the extracted information, communicating with other nodes and coordinating the intended actions together create a high potential that should be exploited.

In this paper we propose yet another sensor network paradigm, namely neural networks embedded in WSNs. The paper is organised as follows. In section II we give an overview of real-world applications and their requirements in the embedded network context. In section III we give a rough overview of classical distributed information processing. In section IV we introduce neural networks and show the analogy between neurons and sensor nodes; the Hopfield network (HN), its implementation, its test results and hierarchical approaches are presented in this section as well. Section V introduces the neural network paradigm in the sensor network context. In section VI we discuss selected in-network processing techniques according to our focus. Finally, we conclude in section VII.

II. COMMON EMBEDDED NETWORK APPLICATIONS

In this section we give an overview of some of the envisaged application scenarios. These scenarios come with highly different requirements, which makes it difficult to define separate, non-overlapping classes of scenarios.
In the following we roughly define classes of scenarios with respect to requirements and constraints [1], [2], [3], [4].

A. Industrial scenarios

Industrial scenarios typically have less need for power conservation, because in most cases fixed power supplies can be used or batteries can be changed easily. On the other hand, these scenarios typically require highly reliable and robust functionality. Delay, throughput and interoperability can be crucial for some applications, while the sensor nodes will generally be deployed by hand. Members of this class are inventory management, distributed robotics, teleoperation of autonomous systems (e.g. exploration of unknown and dangerous terrain), production and delivery, and automotive telematics. Additionally, it is possible to disseminate traffic information and to gather continuous car data from embedded sensors for many applications.

Power monitoring is also an interesting application area. Sensor nodes can be attached to large buildings to detect locations or devices which consume a lot of power, thus providing indications on how to reduce power consumption. In that case the sensor nodes can be placed manually and powered by a permanent power supply.

Finally, precision agriculture has been considered in some projects. For example, wine production will benefit from technologies which control soil moisture and the use of fertilizers or pesticides. Frost protection and monitoring of the conditions that influence plant growth in vineyards might be enabled. The sensor nodes may be deployed in a regular grid across a vineyard to measure, e.g., light conditions, humidity and temperature.

B. Smart life scenarios

This group of scenarios will influence the everyday life of people using future embedded networks. Sensor devices and communicating embedded nodes will be ubiquitous in the environment and able to measure and convey diverse data about the user and the environment.
In general these scenarios fall under the terms ubiquitous or pervasive computing, sometimes called ambient intelligence in Europe.

Within this class of scenarios we face different requirements on reliability. First, habitat monitoring, interactive toys and interactive museums are classical smart life scenarios where unreliable applications are perhaps inconvenient, but will not compromise the security or lives of humans. Second, strong reliability and, additionally, higher accuracy are mandatory for home security and healthcare applications. With respect to healthcare, one might think of drug administration in hospitals and vital sign monitoring. Such applications will dramatically improve the quality of life and safety of patients due to non-intrusive, real-time assessment. The strong requirement present in all these scenarios is "zero-configurability", since these applications are used by home users or personnel untrained in the intricacies of embedded networking. Scenarios like home security or healthcare might need a personalized first-time configuration, but in general everything (in particular a toy) should work just by switching it on.

From the smart life applications we move into the domain of massive sensor networks designed to monitor large environmental spaces. These application scenarios have a strong need for power conservation, since the system lifetime is supposed to be long, typically at least several years. On the one hand, requirements such as reliability, accuracy, delay and throughput are not considered extremely important. On the other hand, scalability and deployability are crucial, because these networks will consist of large numbers of devices or will be deployed over huge areas. In this respect, issues like cost-effectiveness and cheap (random) deployment play an important role.
Another application worth mentioning is the measurement of the incremental shift of ice and snow in alpine mountains. For such applications, pressure and temperature sensors as well as tilt sensors measuring the orientation of the node can be used to gather the information. System lifetimes of many months to years are envisioned. In water monitoring, sensors are used to observe temperature and salinity, and the measurement data should be available in real time. While many of the nodes will be dropped, i.e. deployed from planes or ships, some might be deployed manually.

C. Disaster monitoring and management scenarios

Applications in this category are naturally closely related to environmental monitoring at the outset. However, since disasters occur unpredictably, deployment considerations become vital. Less stringent requirements are placed on network lifetime, while accuracy requirements gain in importance. Because of their significant impact on humans and the environment, these scenarios address most of the requirements mentioned before, namely reliability, robustness, scalability, accuracy, delay and rapid random deployment. The effective transfer of accurate and timely information is important in the very early stages of an incident to coordinate the rescue of victims and a large number of resources such as food, water and medicine. Typical disasters considered in this context are earthquakes, floods, avalanches, tidal waves and tornados, as well as the tracking of chemical plumes.

In general, most disasters will disrupt the existing communication infrastructure. The sensor nodes should be rapidly deployable in an ad hoc fashion, e.g., from planes or through a series of aerial platforms. Some sensor nodes might be predeployed in, say, known avalanche areas, but we mostly envisage ad hoc deployment. It is important that the information transport and processing capability does not become disconnected; thus, connectivity is also a critical requirement.
The system should also ensure that information is not completely lost in case of, e.g., an aftershock following an earthquake. Information replication and robust distributed database techniques might be necessary to maintain availability and improve reliability. The number and location of the replicas should be chosen carefully with respect to power consumption, which often leads to a trade-off.

III. CLASSICAL DISTRIBUTED INFORMATION PROCESSING

Taking a classical computer networking viewpoint, WSNs can be seen as a collection of processors with low computational capabilities deployed over an area of interest. The associated sensor nodes maintain communication links to each other and sense the surrounding environment. If an event occurs within the sensing area of the WSN, local sensors start to gather data and forward their information to a central location. The huge amount of high-dimensional sensor data often hampers traditional algorithms. In decentralised processing, data is preprocessed at each sensor node, which basically leads to a directed diffusion type of network [5], [6]. The in-network processing can be either data aggregation or data fusion, where copies of similar data are replaced by a single message [7]. This is more efficient, since sensor data has significant redundancy both in time and space due to, e.g., correlation over different modalities and slowly changing environments.

Similarly, WSNs can be seen as distributed computing platforms like parallel computers, albeit ones with very little computational capability and only poor-quality communication links. From this point of view, we envision that lightweight algorithms such as neural networks can be useful.
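To make the aggregation idea concrete, the following minimal Python sketch shows an intermediate node merging partial aggregates from its children into a single message, so that one packet instead of several readings travels towards the sink. It is an illustration under stated assumptions, not the method of [7]: the message fields (`count`, `total`, `minimum`, `maximum`) and the names `PartialAggregate` and `merge` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialAggregate:
    """Hypothetical summary message: enough state for count, mean, min and max."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def from_reading(value: float) -> "PartialAggregate":
        # A single local sensor reading is a partial aggregate of size one.
        return PartialAggregate(1, value, value, value)

    @property
    def mean(self) -> float:
        return self.total / self.count


def merge(parts: Iterable[PartialAggregate]) -> PartialAggregate:
    """Combine partial aggregates received from children into one message."""
    parts = list(parts)
    if not parts:
        raise ValueError("nothing to merge")
    return PartialAggregate(
        count=sum(p.count for p in parts),
        total=sum(p.total for p in parts),
        minimum=min(p.minimum for p in parts),
        maximum=max(p.maximum for p in parts),
    )


# An intermediate node with its own reading and two children that already
# aggregated their subtrees forwards one message instead of four readings.
own = PartialAggregate.from_reading(21.5)
child_a = merge(PartialAggregate.from_reading(v) for v in (21.0, 22.0))
child_b = PartialAggregate.from_reading(23.5)
forwarded = merge([own, child_a, child_b])
print(forwarded.count, forwarded.mean, forwarded.minimum, forwarded.maximum)  # 4 22.0 21.0 23.5
```

Because the merge is associative, each node can forward a fixed-size summary regardless of how many readings lie below it in the routing tree.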
From a data storage point of view, WSNs can be seen as distributed databases which collect information about the physical world, index it, and then handle queries within the network [8], [9]. The advantage of the database approach is that it separates the logical view (naming, access, operations) of the information held by the WSN from its actual implementation on the network. Sensor nodes may process sensor readings to produce abstract and typical representations of various states or contexts, for instance for detection and classification purposes. Furthermore, a sensor node contains descriptions of its characteristics, e.g., its location and type. A sensor network database contains all the features of every sensor. An alternative to the centralized approach is to store the data within the network itself and to assume that queries are generated anywhere in the WSN. It is important to stress that in-network storage enables data to be aggregated or fused before it is sent as an answer to a query. That is, the overall amount of data the network has to transmit can be reduced by combining data at intermediate sensor nodes. Thus, communication is reduced, which extends network lifetime due to energy savings.

IV. NEURAL NETWORKS

In general, Artificial Neural Networks (ANNs) show characteristics such as distributed representation and processing, massive parallelism, learning and generalization ability, adaptivity, inherent contextual information processing, fault tolerance and low computation. Many of these characteristics are either inherent or desirable for WSNs as well.

As shown in fig. 2, we transfer the concept of a biological neuron via an artificial neuron to the sensor node. The biological neuron is composed of dendrites, an axon and a cell body called the soma. Each neuron can form a connection to another neuron via a synapse, which is a junction of an axon and a dendrite.
So-called postsynaptic potentials, which are generated within the synapses, are received via the dendrites and chemically transformed within the soma. The axon carries the action potential sent out by the soma to the next synapse [10]. The analogy to artificial neurons is easy to see. The incoming signals are weighted, as is done in the synapses, and further processed. The function h(x) is basically a weighted sum over all inputs, and the output corresponds to the axon. The next step, the transfer to the sensor node, is depicted in fig. 2 as well. The sensor converts the physical world into an electrical signal, which is filtered or preprocessed; this corresponds to the weighting in the synapse. The subsequent processing within the processor corresponds to the chemical processing accomplished by the soma, or to applying the function h(x), respectively. Finally, the sensor node sends out the modified sensor reading via the radio link.

Fig. 2. The analogy of the biological neuron (abstract), the artificial neuron and the sensor node (labels: dendrites, synapse, soma, axon; inputs, weight, h(x), f(x), output; sensor, signal, filtering, processing, radio link).

This strong analogy shows that the sensor node itself can be seen as a biological and an artificial neuron, respectively. Hence, we can easily extend our horizon and see some WSNs as large-scale ANNs, although we are fully aware of the inherent dangers of analogies. Nevertheless, analogies are typically a good inspiration, as the way biological neural networks have inspired neural network research shows, even though ANNs are, strictly speaking, not direct counterparts of their biological brethren.
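The artificial neuron of fig. 2 can be written down in a few lines. The sketch below is a minimal illustration: `h` is the weighted sum named in the text, while choosing the bipolar sign function as the output function f(x), and mapping sgn(0) to +1, are assumptions made here for concreteness.

```python
from typing import Callable, Sequence


def h(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum over all inputs (the role of the synapses and the soma)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have equal length")
    return sum(x * w for x, w in zip(inputs, weights))


def sgn(x: float) -> int:
    """Bipolar sign activation; sgn(0) = +1 is an arbitrary convention here."""
    return 1 if x >= 0 else -1


def neuron(inputs: Sequence[float], weights: Sequence[float],
           f: Callable[[float], float] = sgn) -> float:
    """Output f(h(x)), i.e. what is 'sent along the axon' (or over the radio link)."""
    return f(h(inputs, weights))


# Three filtered sensor values: h = 0.5 - 0.5 + 0.5 = 0.5, so the output is +1.
print(neuron([0.5, -1.0, 2.0], [1.0, 0.5, 0.25]))  # 1
```

On a sensor node, `inputs` would play the role of the filtered sensor signals and the returned value that of the modified reading sent over the radio link.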
ANNs exchange information between neurons frequently and can be divided into two classes based on their connectivity: feedforward and feedback networks. Feedback networks in particular exchange a lot of information due to their iterative nature. In terms of WSNs, this means that sensor nodes need to communicate with each other frequently. As a rule of thumb, the energy consumed by transmitting one packet equals that of 1000 arithmetic operations. Hence, sensor nodes will be depleted faster, which will reduce network lifetime significantly. Feedforward networks are better suited, since their connections are directed from the input layer to the output layer.

We can see the whole sensor network as a neural network, and within each sensor node of the WSN a neural network could also run to decide on the output action. Thus, it is possible to envision a two-level ANN architecture for WSNs. Moreover, implementing full ANNs on each single sensor node is beneficial, since the inherent characteristics discussed above, such as parallelism and low computation, apply there as well. Therefore, efficient neural network implementations using simple computations can replace traditional signal processing algorithms and enable sensor nodes to process data using fewer resources.

Fig. 3. The Hopfield network applied to the single sensor node (sensor input pattern, optimized output pattern).

A. Hopfield network applied to the single sensor node

Wireless communication often suffers from bad channel conditions. Erroneous or even lost data packets have to be dealt with by signal processing algorithms or other techniques such as retransmission. In this context the HN shows promising characteristics such as associative memory, robustness and error correction capability to overcome this drawback [11], [12], [13].
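The rule of thumb that one packet costs as much energy as 1000 arithmetic operations also explains why the HN is run inside a single node rather than across the network. The back-of-the-envelope Python sketch below compares both options under loudly stated assumptions that are not from the measurements: each neuron update costs one multiply-accumulate per neuron, the distributed variant broadcasts one packet per update, and the local variant sends only the final pattern.

```python
OPS_PER_PACKET = 1000  # rule of thumb from the text: one packet ~ 1000 arithmetic operations


def op_equivalents(packets: int, arithmetic_ops: int) -> int:
    """Total energy cost expressed in arithmetic-operation equivalents."""
    return packets * OPS_PER_PACKET + arithmetic_ops


n_neurons = 48              # neurons in the HN implementation described in section IV-B
updates = 10 * n_neurons    # minimum number of asynchronous neuron updates (480)
compute = updates * n_neurons  # assumption: one multiply-accumulate per weight per update

# Assumption A: one neuron per node, every neuron update broadcast as one packet.
distributed = op_equivalents(packets=updates, arithmetic_ops=compute)

# Assumption B: the whole HN runs inside one node; only the final pattern is sent.
local = op_equivalents(packets=1, arithmetic_ops=compute)

print(distributed, local, round(distributed / local, 1))  # 503040 24040 20.9
```

Under these assumptions the communication term dominates the distributed variant, which is the intuition behind applying the feedback network within a single sensor node.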
Associative memory means that a pattern is not stored in any individual neuron but is a property of the whole network. The weights of the HN store the average correlations between all pattern components across all patterns. Presented with a partial or corrupted pattern, the network can then use these correlations to recreate the entire pattern. The HN itself is robust, since its association ability allows it to perform pattern completion in case of missing data and pattern correction in case of corrupted data.

The HN is a single-layer, fully connected feedback network without self-connections, i.e. no neuron is connected directly to itself. Furthermore, its weights are symmetric (bidirectional), i.e. the weight between any two neurons is the same in either direction. Figure 3 depicts the HN presented with a sensor input pattern containing readings of three sensors. After iterative processing, the optimized, i.e. completed or corrected, pattern can be used to build a data packet, which is represented by the dashed box.

B. Hopfield network implementation

Because of these promising characteristics we decided to implement the HN on real hardware. We used Telos sensor nodes, which provide an 8 MHz microcontroller with 10 kB RAM and 48 kB Flash [14]. Each node is equipped with a temperature sensor, a humidity sensor and two light sensors, which measure the total solar radiation (TSR) and the photosynthetically active radiation (PAR), respectively. The basic HN implementation, consisting of 48 neurons, was trained according to the extended "Hebb rule" [11]. That is, if there is a positive or negative correlation between two neurons, the related weight increases or decreases, respectively. Only one learning cycle is needed due to the combination of the Hebb rule and the signum activation function sgn(x). The number of neurons equals the total number of bits we use from all sensors.
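The associative-memory behaviour can be reproduced on a toy scale. The pure-Python sketch below uses the plain one-shot Hebb rule (not necessarily the extended variant of [11]) with a 1/N normalisation, zero diagonal and symmetric weights, and recalls patterns with sign updates of one neuron at a time. The 8-neuron size, the two orthogonal patterns and the function names `hebb_train` and `recall` are illustrative assumptions, not the 48-neuron sensor setup.

```python
import random
from typing import List

Pattern = List[int]  # bipolar components: +1, -1, or 0 for "missing data"


def hebb_train(patterns: List[Pattern]) -> List[List[float]]:
    """One-shot Hebbian learning: w_ij = (1/N) * sum_p x_i^p * x_j^p for i != j.

    The matrix is symmetric (w_ij == w_ji) and has no self-connections (w_ii == 0).
    """
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: List[List[float]], probe: Pattern, sweeps: int = 5, seed: int = 0) -> Pattern:
    """Update one neuron at a time (random order) with the sign of its weighted input."""
    rng = random.Random(seed)
    s = list(probe)
    n = len(s)
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):
            h = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if h >= 0 else -1
    return s


a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, -1, 1, -1, 1, -1, 1, -1]
w = hebb_train([a, b])

corrupted = [-a[0]] + a[1:]   # pattern correction: one inverted component
incomplete = [0, 0] + a[2:]   # pattern completion: two missing components
print(recall(w, corrupted) == a, recall(w, incomplete) == a)  # True True
```

Neither pattern is stored in a single weight; the correction comes from the correlations spread over the whole matrix, which is the property described above.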
To be more precise, we use the following number of bits per sensor: 14 bits for temperature, 12 for humidity, and 11 for each light sensor (TSR and PAR), i.e. 14 + 12 + 11 + 11 = 48 bits. This implementation uses TinyOS/nesC and occupies approximately 20 kB of the 48 kB Flash [15], [16].

After learning, the HN processes the input pattern iteratively using the asynchronous (sequential) update scheme. That is, neurons are selected at random and one neuron updates its activation at a time. To detect a stable HN state (convergence), we require "10 times the number of neurons" consecutive update steps during which none of the neurons changes its activation; whenever a neuron changes, we start counting again. This also ensures that all neurons are updated at approximately the same rate. Therefore, the minimum number of neuron updates to reach convergence in our implementation with 48 neurons is 480. On the other hand, one has to understand that there is no fixed upper bound: since neurons are selected at random, convergence could in theory require an arbitrarily large number of neuron updates.

After converging to a stable HN state, we derive three recognition ratios by comparison with the original events. Correct recognition counts the events that were recognised correctly, i.e. as the valid event that was actually presented. False recognition counts the events that were recognised incorrectly but still as a valid (learned) event. The last category, spurious recognition, accounts for unintentionally learned events: the HN might converge to spurious states that were never presented to the network during learning. Therefore, spurious recognition counts the events that were recognised incorrectly as invalid patterns. Spurious patterns are an inherent HN property and have a significant impact on the correct recognition ratio. One promising approach is to separate events based on their energy profile [17].
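The convergence rule and the three recognition categories can be sketched as follows. This is a minimal Python illustration, not the TinyOS/nesC code: the 8-neuron toy events, the tie-breaking sgn(0) = +1 and the names `run_until_stable` and `classify` are assumptions. With `stable_factor = 10` a stored pattern converges after exactly 10·N updates, which for N = 48 gives the minimum of 480 quoted above.

```python
import random
from typing import List, Tuple

Pattern = List[int]  # bipolar states: +1 / -1


def hebb_train(patterns: List[Pattern]) -> List[List[float]]:
    """Hebbian weights: symmetric, zero diagonal, normalised by N."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]


def run_until_stable(w: List[List[float]], probe: Pattern,
                     stable_factor: int = 10, seed: int = 0) -> Tuple[Pattern, int]:
    """Asynchronous updates of randomly selected neurons.

    The state counts as converged once `stable_factor * N` consecutive updates
    leave every activation unchanged; any change restarts the count. Returns the
    final state and the total number of neuron updates performed.
    """
    rng = random.Random(seed)
    s = list(probe)
    n = len(s)
    unchanged, updates = 0, 0
    while unchanged < stable_factor * n:
        i = rng.randrange(n)
        h = sum(w[i][j] * s[j] for j in range(n))
        new = 1 if h >= 0 else -1  # sgn(x), with sgn(0) taken as +1
        updates += 1
        if new == s[i]:
            unchanged += 1
        else:
            s[i] = new
            unchanged = 0  # a change resets the stability counter
    return s, updates


def classify(state: Pattern, presented: int, events: List[Pattern]) -> str:
    """Correct: the presented event; false: another learned event; spurious: neither."""
    if state == events[presented]:
        return "correct"
    if state in events:
        return "false"
    return "spurious"


events = [[1, 1, 1, 1, -1, -1, -1, -1], [1, -1, 1, -1, 1, -1, 1, -1]]
w = hebb_train(events)
state, n_updates = run_until_stable(w, events[0])
print(classify(state, 0, events), n_updates)  # correct 80
```

The inverted pattern of a stored event is a typical spurious attractor of the HN, which `classify` reports as "spurious".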
Spurious memories and methods for their determination and suppression are well known in associative neural models and can be used to optimise our implementation [18], [19].

Numerous parameters can be exploited and adjusted dynamically using our graphical user interface. This degree of freedom allows features such as preprocessing, the coding type, the use of the input pattern, modification of the learned weight matrix, the use of "missing data", and increasing the number of neurons or the "number of to-be-learned events" to be enabled.

The simple preprocessing is based on a distance measure between the actual input pattern and all learned events. The input pattern is adjusted towards the learned event which fits best (minimum distance). The reading of each sensor is processed independently. As coding type, one can choose between binary and Gray coding of the raw sensor readings. In either case, after coding the data is converted into a bipolar representation, which is preferable [11]. The use of the input pattern means that one can decide to include the input pattern in the complete iterative computation process or only in the first iteration. Furthermore, the weight matrix can be modified after learning: in order to reduce complexity, we set the minimum weight of the matrix to zero.

If one of the sensors delivers obviously wrong, out-of-range or no data at all, we use the "missing data" feature. In that case we set all inputs belonging to the faulty sensor to zero. It is important to stress that the corresponding HN inputs are set to zero (after coding), not the sensor reading itself. Here we identify another advantage of the bipolar data representation: instead of using ±1 exclusively, we use 0 for missing data (pseudo-ternary). This avoids setting all the missing bits to either +1 or −1, which would push the network in a wrong direction.
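The coding pipeline described above (binary or Gray coding, conversion to bipolar values, zeros for a faulty sensor) can be sketched in Python as follows. The bit widths come from the text; the sensor keys, the function names and the raw ADC values in the example are hypothetical.

```python
from typing import Dict, List, Optional

# Bit widths per sensor as given in the text (14 + 12 + 11 + 11 = 48 neurons).
BIT_WIDTHS = {"temperature": 14, "humidity": 12, "tsr": 11, "par": 11}


def gray(value: int) -> int:
    """Reflected binary (Gray) code of a non-negative integer."""
    if value < 0:
        raise ValueError("raw readings must be non-negative")
    return value ^ (value >> 1)


def to_bits(value: int, width: int) -> List[int]:
    """Most-significant bit first; the value must fit into `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit into {width} bits")
    return [(value >> k) & 1 for k in range(width - 1, -1, -1)]


def encode(readings: Dict[str, Optional[int]], use_gray: bool = False) -> List[int]:
    """Concatenate all sensors into one bipolar HN input pattern.

    Bits map to +1/-1; a missing or faulty sensor (None) contributes zeros,
    i.e. the pseudo-ternary "missing data" representation.
    """
    pattern: List[int] = []
    for name, width in BIT_WIDTHS.items():
        raw = readings.get(name)
        if raw is None:
            pattern.extend([0] * width)
            continue
        code = gray(raw) if use_gray else raw
        pattern.extend(1 if bit else -1 for bit in to_bits(code, width))
    return pattern


# Hypothetical raw ADC values; the PAR sensor is flagged as faulty.
x = encode({"temperature": 6500, "humidity": 1800, "tsr": 300, "par": None}, use_gray=True)
print(len(x), x[-11:].count(0))  # 48 11
```

A plausible reason to offer Gray coding is that neighbouring raw values differ in exactly one bit (for example 7 and 8), so small changes in a reading flip few HN inputs; with plain binary coding, 7 (0111) and 8 (1000) differ in four bits.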
Additionally, one can modify the number of neurons, which affects the capacity of the HN, and the "number of to-be-learned events"; both influence the performance in terms of recognition ratio.

C. Hopfield network test results

Since we aim to show that ANNs can support various WSN applications, real measurements were carried out using all the sensors mentioned above. The sensor node was used to sense different locations on demand in order to measure so-called events, which basically consist of a combination of sensor readings. The events, such as sensor node in office, outdoor, in a fridge, in a server room, held in hand, and underneath a lamp, inherently differ in their sensor readings due to environmental changes. The naming of the events indicates the location where the real measurements were conducted. The "number of to-be-learned events" has to be determined in advance. Hence, we assume changes in the environment but a fixed event set. Changes in the event set could be addressed with the ART neural network, which is beyond the scope of this document [11]; in that case, one faces the problem of interpreting unknown and new events. The HN can store and retrieve a number of events equal to 13.8% of the number of neurons [12].

Using our graphical user interface, we can send the "start learning" command to the wireless sensor node. After executing the transmitted command, the sensor node learns the six reference events mentioned above by itself. After learning was completed, the HN was presented with different scenarios. Since all original events are obviously recognised correctly after the minimum number of neuron updates (verified as a consistency check), we used two additional scenarios. First, so-called "fuzzy events" are based on original events which were fuzzified in order to model varying sensor readings due to environmental changes.
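Applied to the 48-neuron implementation, the 13.8% capacity figure quoted above gives the following rough bound on the number of storable events. The factor is an estimate for uncorrelated patterns, so the result should be read as a guideline rather than a hard limit:

```latex
p_{\max} \approx 0.138\,N = 0.138 \times 48 \approx 6.6
\quad\Longrightarrow\quad p \le 6 \text{ events}
```

The six reference events used in the tests therefore lie just below this estimate.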
Hence, we added a significant deviation to all sensor readings: light (+500 lux), temperature (+3 °C) and humidity (+5%). As a result of these modifications, we observed 46-54% inverted bits with respect to the original events. Thus, on average about 50% of the bits of each sensor reading were falsified on purpose. Second, so-called "events with bit errors" are mutated original events which model noisy or corrupted sensor readings due to failures and wireless errors. Here, about 10% of the bits of the raw sensor readings were inverted at random.

TABLE I
HOPFIELD NETWORK TEST RESULTS WITH CORRECT (+), SPURIOUS (?) AND FALSE (-) RECOGNITIONS

Scenario / setting                    Updates (min/max)   Recognitions (+/?/-)
Fuzzy events
  input pattern used, preproc.        522 / 720           3/0/3
  input pattern used, no preproc.     541 / 746           0/4/2
  no input pattern, preproc.          536 / 645           3/1/2
  no input pattern, no preproc.       544 / 861           0/5/1
Events with bit errors
  input pattern used                  506 / 594           6/0/0
  input pattern unused                516 / 725           6/0/0
  weight matrix modified              514 / 611           4/2/0
  missing data                        557 / 648           5/1/0
  +2 events                           572 / 703           6/0/0
Original events
  missing data, preproc.              480 / 575           6/0/0
  missing data, no preproc.           520 / 643           6/0/0
  weight matrix modified              480 / 484           5/1/0
  +2 events                           480 / 533           6/0/0

In the following we discuss the performance of the sensor node with the HN implementation. We gain insight by evaluating the test results, i.e. the number of neuron updates and the recognition ratios listed in table I. By default, unless mentioned otherwise, we include the input pattern in the complete iterative process, use binary coding, and apply neither preprocessing nor modification of the learned weight matrix.

First, we analyse the fuzzy events scenario with the default settings mentioned above. By enabling preprocessing, one third of the "fuzzy events" could be associated with the original events and thus recognised correctly. Without preprocessing, none was retrieved correctly.
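The two perturbations described above can be mimicked in software. The Python sketch below inverts a randomly chosen subset of about 10% of the components of a bipolar pattern and measures the share of inverted bits, the quantity reported as 46-54% for the fuzzy events. Flipping an exact count (rather than each bit independently with probability 0.1) and the randomly generated stand-in event are assumptions for illustration.

```python
import random
from typing import List


def inject_bit_errors(bits: List[int], rate: float = 0.10, seed: int = 0) -> List[int]:
    """Invert a randomly chosen subset of round(rate * len(bits)) bipolar components."""
    rng = random.Random(seed)
    n_flip = round(rate * len(bits))
    flipped = set(rng.sample(range(len(bits)), n_flip))
    return [-b if i in flipped else b for i, b in enumerate(bits)]


def inverted_fraction(a: List[int], b: List[int]) -> float:
    """Share of positions where two equally long patterns disagree."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
event = [rng.choice([-1, 1]) for _ in range(48)]  # stand-in for a 48-bit learned event
noisy = inject_bit_errors(event)
print(inverted_fraction(event, noisy))  # 5 of 48 components inverted, about 10%
```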
This result proves the robustness of the HN against significant variations. As we can see, preprocessing also reduces the necessary number of neuron updates by 3.5% (min/max). If we use no preprocessing and apply the input pattern only once instead of throughout the complete iterative process, the false recognition ratio is slightly reduced. More importantly, using the input pattern throughout the iterative process reduces the number of neuron updates by 0.6% (min) and 13.4% (max), respectively. These results are reasonable, since we took significant deviations (on average 50%) from the original events into account, which is a tough task for the network. The reason for the faster convergence is that the network is pushed in a good direction if the input signal is used.

Second, we analyse the "events with bit errors" scenario, which shows good results in every case. We achieve a 100% correct recognition ratio regardless of the use of the input pattern. The ratio stays perfect even if we increase the "number of to-be-learned events" by one third, i.e. to 8. Again, the use of the input pattern affects the convergence speed: the number of neuron updates is reduced by 1.9% (min) and 8.1% (max), respectively. If we introduce bit errors and set the minimum weight of the matrix to zero, the correct recognition ratio decreases to two thirds, while still no pattern is recognised falsely. Using the "missing data" feature, i.e. setting all 11 bits of the PAR (light) sensor to zero before the pattern is presented to the network, we obtain an impressive correct recognition ratio of 83%. It is important to stress that about one third of the bits are then no longer original, since the ordinary bit errors were still introduced.

Finally, we discuss the "original events" tests with particular settings, which show good results as well. In the case of "missing data", a perfect correct recognition ratio is achieved regardless of preprocessing.
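Several of the percentages above are relative reductions with respect to the slower configuration. The small helper below reproduces, for instance, the fuzzy-event figures and the bit-error minimum from Table I; the function name is hypothetical.

```python
def relative_reduction(baseline: int, improved: int) -> float:
    """Percentage by which `improved` undercuts `baseline`, rounded to 0.1 %."""
    return round(100 * (baseline - improved) / baseline, 1)


# Neuron updates (min / max) taken from Table I.
print(relative_reduction(541, 522), relative_reduction(746, 720))  # 3.5 3.5  preprocessing, fuzzy events
print(relative_reduction(544, 541), relative_reduction(861, 746))  # 0.6 13.4 input pattern used, fuzzy events
print(relative_reduction(516, 506))                                # 1.9      input pattern used, bit errors
```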
This even holds when the "number of to-be-learned events" is increased by 2, while the maximum number of neuron updates increases by 11%. If we modify the weight matrix as described above, we obtain 83% correct recognition, while the number of neuron updates stays approximately the same.

In conclusion, we deduce that associating "fuzzy events" is a difficult task, for which preprocessing and the input pattern should be used to achieve better results. Bit errors and missing data, as well as an increased "number of to-be-learned events", are handled very well by the HN.

Worth mentioning are HN tests using "orthogonal events" for theoretical purposes. These particular events were derived from Walsh sequences [20], which are all mutually orthogonal. "Orthogonal events" can be used to exploit the full capacity of HNs. We do not go into more detail here and refer to [11], since we restrict ourselves to presenting the "real world" tests.

D. Hierarchical approach using the Self-Organizing Map

The key to extending network lifetime and decreasing power consumption is to reduce the number of transmissions, i.e. the less data the better. Hence, the dimension of the processed events or patterns is crucial and has a significant impact. Another well-known type of ANN, the Self-Organizing Map (SOM), provides characteristics such as dimension reduction and clustering of similar patterns [21]. In the following we introduce a hierarchical approach which uses a sensor node with a SOM implementation on top of a group of sensor nodes with HN implementations. We call the group of sensor nodes the "sensor node cluster" and the particular sensor node on top the "clusterhead" or "fusion center". Figure 4 depicts a SOM consisting of 5 × 4 non-interconnected neurons (two-dimensional grid) applied to the sensor node. Additionally, squares and dashed boxes represent other sensor nodes with HN implementations and their output patterns (compare with fig. 3).
The SOM is presented with all output patterns obtained from the sensor node cluster, one after another. Each neuron receives the entire pattern via weighted connections, indicated by curves. Each neuron has its own reference vector, in the form of a local copy of one particular input pattern that has been presented to the neural network. The self-organizing behaviour results in a topographic map where similar input patterns are mapped onto neurons in a particular region of the map.

Fig. 4. The Self-Organizing Map applied to the single sensor node collects and classifies events (labels: sensor nodes, output pattern, SOM, association cluster, classification, reference vector).

In general, each n-dimensional input pattern is mapped to the 2-dimensional SOM, i.e. the SOM performs an n-to-2 dimension reduction. In our case, each 48-bit pattern is converted to a 2-dimensional representation (48-to-2). The dimension reduction of input patterns can be seen as a sort of data aggregation or data fusion within the sensor node. While reducing the amount of data to be transmitted, the SOM performs clustering of similar patterns. This characteristic enables the determination of relations between patterns, which leads to their classification.

Let us consider, as an example, an environmental monitoring application where a cluster of sensor nodes senses an event. We assume that each output pattern of the sensor node cluster was optimized by an HN. In order to classify the event and increase the reliability of the decision, we present the HN output patterns of all cluster members to the clusterhead with the SOM implementation. The HN output patterns are mapped to the SOM such that similar patterns end up in close proximity (see the patterns within the "association cluster" depicted in fig. 4).
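A clusterhead SOM of the size shown in fig. 4 can be sketched with a classic Kohonen update. The Python code below is an illustration under assumptions: random initialisation, a Gaussian neighbourhood, linearly decaying learning rate and radius, and synthetic 48-component training patterns; none of these choices are taken from our implementation, and the class name `SOM` is hypothetical. The method `bmu` returns the grid position of the best-matching unit, i.e. the 48-to-2 reduction.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


class SOM:
    """Minimal rectangular Kohonen map with a Gaussian neighbourhood function."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One reference vector per map neuron, initialised at random.
        self.ref: Dict[Cell, List[float]] = {
            (r, c): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for r in range(rows) for c in range(cols)
        }

    def bmu(self, x: Sequence[float]) -> Cell:
        """Grid position of the best-matching unit: the n-to-2 dimension reduction."""
        return min(self.ref, key=lambda cell: sum((w - v) ** 2 for w, v in zip(self.ref[cell], x)))

    def train(self, data: Sequence[Sequence[float]], epochs: int = 30,
              lr0: float = 0.5, sigma0: float = 2.0) -> None:
        """Online training with linearly decaying learning rate and neighbourhood radius."""
        total = epochs * len(data)
        step = 0
        for _ in range(epochs):
            for x in data:
                frac = step / total
                lr = lr0 * (1.0 - frac)
                sigma = max(sigma0 * (1.0 - frac), 0.5)
                br, bc = self.bmu(x)
                for (r, c), w in self.ref.items():
                    g = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                    for k in range(len(w)):
                        w[k] += lr * g * (x[k] - w[k])  # move towards the sample
                step += 1


rng = random.Random(1)
events = [[rng.choice([-1.0, 1.0]) for _ in range(48)] for _ in range(2)]
# Five noisy copies of each event, roughly 10% of components inverted.
data = [[-v if rng.random() < 0.1 else v for v in e] for e in events for _ in range(5)]

som = SOM(rows=5, cols=4, dim=48)
som.train(data)
print([som.bmu(x) for x in data])  # each 48-component pattern -> one (row, col) cell
```

Only the two grid coordinates need to be reported further, instead of the 48 components of each pattern.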
Patterns which are mapped to a different area can be interpreted as still corrupted or as related to another event (see, for instance, the black pattern in fig. 4). These can be ignored, since the majority of the patterns will be mapped within a certain region which can be represented by a reference vector. The reference vector essentially classifies the event and thus avoids further processing of redundant or high-dimensional data. This approach inherently reduces the uncertainty of sensor readings caused by noise or malfunction. Furthermore, it can be used to validate the behaviour of sensor nodes.

V. NEURAL NETWORK PARADIGM

We have shown that the analogy between WSNs and ANNs is not only theoretical, but can also be used to implement this paradigm efficiently in real WSNs. It is important to stress that ANNs exhibit exactly the same architecture as WSNs: neurons correspond to sensor nodes and connections correspond to radio links. Hence, we can exploit the neural network paradigm in the context of sensor networks to gain a deeper understanding and further insights. ANNs are robust and inherently fault-tolerant: they still provide reliable information even if single neurons are broken or malfunctioning. Our HN test results show this characteristic with the missing data option. The same behaviour can be observed in WSNs. When a sensor node is depleted or malfunctioning, others can take over its responsibility for the given task without being specifically programmed to do so. This fail-safe property can be seen as a sort of “self-programming” in an ANN-WSN. Redundancy in WSNs leads to robust systems, since faulty sensors have little effect on the output. Sensor nodes in an area usually form a sensing cluster and work together in a distributed and parallel way, similar to a layer of neurons. The sensing cluster provides data within the same context but from different points of view, which can be compressed by in-network processing techniques.
Such contextual information processing is carried out by ANNs as well.

VI. IN-NETWORK PROCESSING TECHNIQUES

As shown, the HN efficiently combines event recognition and error correction. Another hierarchical approach, which will be studied further, performs joint classification and compression using a Learning Vector Quantization (LVQ) type of network. The idea is to map the HN output patterns to only ld(number of events) neurons, where ld denotes the binary logarithm (here a 48-to-3 mapping). That is, each bit combination of these output neurons represents one event explicitly. This approach can be seen as a very simple binary coding strategy that is much less complex than the SOM or the original LVQ network.

In general, there are different perspectives on classification: traditional statistical approaches, mainly based on Bayesian analysis; machine learning techniques such as decision trees or rule-based approaches; and neural networks. Statistical approaches are inherently based on an underlying probability model, which yields class probabilities rather than a classification. Classification using automatic computation procedures such as decision trees results from a sequence of logical steps, while neural networks perform the classification task on a more unconscious level. The Bayes classifier is based on the hypothesis space and a priori knowledge [22]. Although the Bayes classifier is optimal with respect to successful classification, it is not practical for our envisaged application space. The reason is that we practically cannot determine the probability model, since we have no a priori knowledge about the occurrence of events due to insufficient training data. In fact, incorrectly assumed knowledge could strongly influence the results, since they may depend more on the a priori knowledge than on the measurement data itself. Another reason is the complexity of the Bayes classifier, which can be quite costly to apply.
The expense results from computing the posterior probability of every hypothesis and combining their predictions to classify each new event.

The decision-tree method is a widely used and practical method for inductive inference [22]. It can be seen as an approximation of a target function which is iteratively encoded in the tree/subtree structure. In general, decision trees classify events by sorting them down the tree from the root, and the leaf that is reached provides the classification. The events need to be described by attribute-value pairs from a fixed set, which is determined by the number and modality of the sensors. The advantages are that one does not need to derive any probabilistic model, that missing data can be handled, and that one only needs to encode an appropriate classification rule. A missing attribute can be estimated or averaged based on examples for which this attribute has a known value. In contrast to statistical approaches, decision-tree algorithms such as ID3 or C4.5 do not perform a probabilistic hypothesis analysis but search for the attribute which is the best classifier at each tree/subtree. A practical quantitative measure of the worth of an attribute is based on entropy (e.g., the information gain used by ID3, i.e., the expected reduction in entropy obtained by splitting on the attribute). Furthermore, one could stop the classification process at a certain depth of the tree, since decision trees divide the attribute space into regions of self-similarity. That is, one can exploit the association ability in order to reduce the tree size and thereby the complexity. Noisy data might lead to large trees, which inherently imply higher complexity. Preprocessing and recognition using the HN avoid noisy data and thus enable a reliable classification process.

Additional in-network processing techniques such as coding, compression and prediction play an important role in significantly reducing the amount of data and extending network lifetime. Selected methods which should be considered are, e.g., linear predictive coding and distributed source coding.
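Linear predictive coding, the first of the methods just named, can be illustrated with a minimal lossless Python sketch, assuming integer sensor readings and fixed predictor coefficients. The encoder predicts each reading as a linear combination of the previous readings and keeps only the prediction residual; the decoder repeats the same prediction and adds the residual back. The coefficients `(1,)` (previous-value predictor) and `(2, -1)` (linear extrapolation), the example readings and the sign-magnitude bit-cost model are all assumptions for illustration, not values from the paper.

```python
from typing import List, Sequence


def lpc_encode(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Return residuals e[n] = x[n] - sum_k coeffs[k] * x[n-1-k].

    The first len(coeffs) samples have no full history, so they are sent verbatim.
    """
    p = len(coeffs)
    residuals = list(samples[:p])
    for n in range(p, len(samples)):
        prediction = sum(a * samples[n - 1 - k] for k, a in enumerate(coeffs))
        residuals.append(samples[n] - prediction)
    return residuals


def lpc_decode(residuals: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Invert lpc_encode by repeating the prediction on already decoded values."""
    p = len(coeffs)
    samples = list(residuals[:p])
    for n in range(p, len(residuals)):
        prediction = sum(a * samples[n - 1 - k] for k, a in enumerate(coeffs))
        samples.append(residuals[n] + prediction)
    return samples


def cost_bits(values: Sequence[int]) -> int:
    """Illustrative cost model: sign-magnitude bits per value, summed."""
    return sum(abs(v).bit_length() + 1 for v in values)


# Hypothetical temperature readings in hundredths of a degree Celsius.
readings = [2150, 2152, 2155, 2157, 2160, 2161, 2163, 2166]

for coeffs in [(1,), (2, -1)]:
    res = lpc_encode(readings, coeffs)
    assert lpc_decode(res, coeffs) == readings  # lossless round trip
    print(coeffs, res, cost_bits(readings), "->", cost_bits(res))
```

The printed costs allow predictor orders to be compared on a given series; on very short or noisy series, the extra warm-up samples of a higher-order predictor can outweigh its gain.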
The rationale behind these techniques is that sensor readings within a particular area are highly correlated in space, in time and across modalities. For example, temperature will be quite constant in a particular region and will not vary much in common environments. This correlation implies redundancy in the raw data, which can be exploited. A powerful way to de-correlate the data and reduce its dynamic range is to apply predictive coding algorithms, which can be implemented as neural networks [23]. The general idea is to estimate the current sensor reading from a number of past readings and to transmit only the estimation error. This approach leads to reduced bit rates and thus energy conservation. The simplest algorithm is the linear predictor, which faces the following trade-off: the higher the degree (complexity) of the predictor, the higher both the compression ratio and the delay. Another promising approach to higher energy efficiency is source coding. In the case of sensor networks, the basic idea needs to be extended to distributed source coding, where correlated sources must be taken into account (Slepian-Wolf limit) [24]. According to the sensor network paradigm, one has to apply codes with low encoding complexity and high coding gain. The decoder complexity is less critical, since one can assume higher computational capability at higher hierarchical levels or in the destination node itself, but it should still be taken into account. Many powerful codes are reported in the literature. Low-Density Parity-Check (LDPC) codes have received much attention in the coding community and have recently been applied to sensor networks [25]. Difference-Set Cyclic codes deserve particular mention, since they are even more powerful than LDPC codes at short block lengths and enable linear encoding complexity (linear in time and number of bits) [26], [27]. These codes are applicable in the sensor network context due to the characteristics discussed above.

VII.
CONCLUSION

This document presents the scope of and requirements on embedded network applications. Furthermore, it points out the similarity between sensor networks and ANNs, in particular the strong analogy between sensor nodes and neurons. We have shown that the neural network paradigm can be successfully applied to WSNs, which can consequently be seen as ANNs. In order to push ANNs forward into the application domain of sensor networks, an HN was implemented on a sensor node. The implementation using TinyOS/nesC requires approximately 20 kB of the available 48 kB. The test results show that the association of “fuzzy events” is a difficult, but doable, task where preprocessing should be used to achieve better results. In the case of “events with bit errors”, our implementation shows very good performance, which confirms the strong error correction capability of the HN. The hierarchical approach, i.e., the use of a sensor node with a SOM implementation on top of a group of sensor nodes with HN implementations, was introduced. This approach shows promising possibilities for increasing reliability, dimension reduction, error correction and sensor validation.

Future work is clearly warranted. The avoidance or detection of spurious patterns and the hierarchical SOM approach will help to find a more robust solution. The selected in-network processing techniques need to be evaluated in more detail to prove their applicability. In a simulated environment, we tested an ANN-WSN against destroyed sensor nodes and destroyed clusters of nodes. The simulations show promising results in this domain as well. As we foresee, the exploitation of ANNs in the context of sensor networks promises a strong impact on future research, especially if applied as a hybrid technology.

ACKNOWLEDGMENT

This work was financially supported in part by the European Union (RUNES IST-004536) and Ericsson.

REFERENCES

[1] K. Römer and F.
Mattern, “The design space of wireless sensor networks,” IEEE Wireless Comm., vol. 11, no. 6, pp. 54–61, Dec. 2004.
[2] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “A survey on sensor networks,” IEEE Comm. Mag., vol. 40, no. 8, pp. 102–114, Aug. 2002.
[3] D. Culler, D. Estrin, and M. Srivastava, “Overview of sensor networks,” IEEE Computer, vol. 37, no. 8, pp. 41–49, Aug. 2004.
[4] W. B. Heinzelman, A. L. Murphy, H. S. Carvalho, and M. A. Perillo, “Middleware to support sensor network applications,” IEEE Network, vol. 18, no. 1, pp. 6–14, Jan./Feb. 2004.
[5] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: a scalable and robust communication paradigm for sensor networks,” in Proc. of ACM/IEEE MobiCom, 2000.
[6] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Directed diffusion for wireless sensor networking,” IEEE/ACM Trans. on Networking, vol. 11, no. 1, pp. 2–16, Feb. 2003.
[7] B. Krishnamachari, D. Estrin, and S. Wicker, “The impact of data aggregation in wireless sensor networks,” in Proc. of Int. Conf. on Distributed Computing Systems Workshop, 2002.
[8] P. Bonnet, J. Gehrke, and P. Seshadri, “Querying the physical world,” IEEE Personal Comm., vol. 7, no. 5, pp. 10–15, Oct. 2000.
[9] J. Gehrke and S. Madden, “Query processing in sensor networks,” IEEE Pervasive Computing, vol. 3, no. 1, pp. 46–55, 2004.
[10] E. R. Kandel, J. H. Schwartz, and T. M. Jessell, Principles of Neural Science. McGraw-Hill, 2000.
[11] L. Fausett, Fundamentals of Neural Networks. Prentice Hall, 1994.
[12] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
[13] J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. of National Academy of Sciences, 1982.
[14] Telos B datasheet, http://www.xbow.com.
[15] P. Levis, S. Madden, D. Gay, J. Polastre, R. Szewczyk, A. Woo, E. Brewer, and D.
Culler, “The emergence of networking abstractions and techniques in TinyOS,” in Symp. on Networked Systems Design and Implementation, 2004.
[16] D. Gay, P. Levis, R. von Behren, M. Welsh, E. Brewer, and D. Culler, “The nesC language: A holistic approach to networked embedded systems,” in ACM SIGPLAN Conf. on Programming Language Design and Implementation, 2003.
[17] A. V. Robins and S. J. R. McCallum, “A robust method for distinguishing between learned and spurious attractors,” Elsevier Neural Networks, vol. 17, no. 3, pp. 313–326, 2004.
[18] G. Athithan and C. Dasgupta, “On the problem of spurious patterns in neural associative memory models,” IEEE Trans. on Neural Networks, vol. 8, no. 6, pp. 1483–1491, 1997.
[19] S. Abe, “Global convergence and suppression of spurious states of the Hopfield neural networks,” IEEE Trans. on Circuits and Systems, vol. 40, no. 4, pp. 246–257, 1993.
[20] H. D. Lüke, Korrelationssignale. Springer, 1992.
[21] T. Kohonen, Self-Organizing Maps. Springer, 1992.
[22] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[23] F.-L. Luo and R. Unbehauen, Applied Neural Networks for Signal Processing. Cambridge University Press, 1998.
[24] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. on Information Theory, vol. 19, no. 4, pp. 471–480, 1973.
[25] M. Sartipi and F. Fekri, “Source and channel coding in wireless sensor networks using LDPC codes,” in IEEE Conf. on Sensor and Ad Hoc Communications and Networks, 2004.
[26] T. Clevorn, F. Oldewurtel, and P. Vary, “Combined iterative demodulation and decoding using very short LDPC codes and rate-1 convolutional codes,” in Conf. on Information Science and Systems, 2005.
[27] T. Clevorn, F. Oldewurtel, S. Godtmann, and P. Vary, “Iterative demodulation for DVB-S2,” in IEEE Int. Symp. on Personal Indoor and Mobile Radio Communications, 2005.