DISTRIBUTED OPTIMISATION IN WIRELESS SENSOR NETWORKS:
A HIERARCHICAL LEARNING APPROACH

YEOW WAI LEONG
(B.Eng. (Hons.), NUS)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

NATIONAL UNIVERSITY OF SINGAPORE
2007

To my wife, Yuan

Acknowledgements

I wish to express my deep and sincere appreciation to my supervisors, Professors Lawrence Wong Wai-Choong and Tham Chen-Khong, for their guidance, help and support. It is Professor Wong who planted the seed for exciting research in Wireless Sensor Networks and Professor Tham who introduced me to the interesting world of reinforcement learning and Markov Decision Processes. This thesis would not have been possible without their valuable insights and comments throughout the course of this candidature.

I would also like to thank the other member of the thesis committee, Professor Chua Kee Chaing, who kindly reviewed and supported my work throughout, and provided insightful comments for improvement.

It is a pleasure and an honour to have Professors Bharadwaj Veeravalli, Leslie Pack Kaelbling and Marimuthu Palaniswami as the thesis examiners. My gratitude goes to them for their in-depth reviews, valuable suggestions and corrections to this work, which greatly helped me to improve this thesis in various aspects.

During the course of my candidature, I have had the chance to work with Professors Mehul Motani and Vikram Srinivasan on other interesting research projects. I have learnt much from them, and the collaboration experience has certainly been enriching and gratifying.

Special thanks to fellow graduate students and members of the Computer Networks and Distributed Systems Laboratory. The frequent discussions over tough research problems and the mutual encouragement with Yap Kok-Kiong and Rob Hoes have sparked much excitement in the course of my research and have made my life as a graduate student more colourful and joyful. Thank you to Luo Tie, Zhao Qun, Wang Wei, Hu Zhengqing and Ai Xin.

I am deeply indebted to the person with a special place in my heart, my wife Yuan. I thank her for her patience, her encouragement, and her continued support throughout, without which this thesis would not have been completed. I also owe a great deal to my parents for being supportive of my studies.

My Ph.D. candidature is supported by the A*STAR Graduate Scholarship Program.

Table of Contents

Acknowledgements
Table of Contents
Summary
List of Tables
List of Figures
List of Symbols
List of Abbreviations

1 Introduction
   1.1 Wireless Sensor Networks
   1.2 The Learning Approach
   1.3 Organisation of the thesis

2 Issues in Wireless Sensor Networks
   2.1 Topology Control and Routing
   2.2 Data Aggregation
   2.3 Target Tracking

3 Stochastic Planning
   3.1 Markov Decision Process
      3.1.1 Markov Chains
      3.1.2 Markov Decision Process
      3.1.3 Bellman's Equations
   3.2 MDP solutions
      3.2.1 Dynamic Programming: Value Iteration
      3.2.2 Reinforcement Learning: Q-learning
   3.3 Function Approximation: CMAC to speed up learning
   3.4 Semi-MDP and Constrained MDP
   3.5 Hierarchical Reinforcement Learning
      3.5.1 Sutton's Options Formulation
      3.5.2 MAXQ Value Function Decomposition

4 Hard/Soft Constrained semi-MDP
   4.1 Motivation
   4.2 Mathematical Notations and Related Work
      4.2.1 Mathematical Notations
      4.2.2 Related Work
   4.3 HCsMDP
      4.3.1 Finite Horizon HCsMDP
      4.3.2 Infinite Horizon HCsMDP
      4.3.3 Solving HCsMDP
   4.4 SCsMDP
      4.4.1 Optimal Policy Structure
      4.4.2 SCsMDP Solution
   4.5 Simulations
      4.5.1 HCsMDP Experiments
      4.5.2 Special case: deadline-sensitive sMDP
      4.5.3 Taxi driver's problem
      4.5.4 SCsMDP Experiments
   4.6 Discussion
      4.6.1 Transient MDPs
      4.6.2 Risk-sensitive (utility) functions on total cost
      4.6.3 Two-sided soft constraints on total cost
      4.6.4 Curse of dimensionality
   4.7 Summary

5 Distributed Performance Optimisation in WSN
   5.1 Motivation
   5.2 Problem definition
      5.2.1 Aggregated Data Quality and End-to-end Delay
      5.2.2 A Soft Constrained Markov Decision Process
   5.3 A Distributed Learning Algorithm with Soft Constraints
      5.3.1 An overview
      5.3.2 Derivation of Rewards and Costs
      5.3.3 Distributed Q-learning for SCsMDP
   5.4 Aggregating Feedback to Reduce Overhead
      5.4.1 Design Specification of a MDP Aggregated Feedback Mechanism
      5.4.2 A Feedback Aggregation Solution
      5.4.3 Experiments on Aggregated Feedback Mechanism
   5.5 Encouraging Cooperation with a Leader
   5.6 Reducing Policy Search Space with Rings Topology
   5.7 ARCQ: putting everything together
   5.8 Simulation Results
      5.8.1 A Simple Three-node Network
      5.8.2 Random Networks of Various Densities and Sizes
   5.9 Summary

6 WSN Performance Optimisation with Hierarchical Learning
   6.1 An Overview
   6.2 HARCQ: a Hierarchical Model
      6.2.1 A Hierarchical Model
      6.2.2 Reduction of State Space through Abstraction
      6.2.3 The HARCQ Algorithm
   6.3 Space Complexity Analysis of ARCQ and HARCQ
   6.4 Simulation Results
      6.4.1 Average Data Quality
      6.4.2 End-to-end Delay Performance
      6.4.3 Lost packets
      6.4.4 Average Energy Consumed Per Node
      6.4.5 Overhead caused by Feedback
   6.5 Summary

7 Multiple Target Tracking using Hierarchical Learning
   7.1 Motivation
   7.2 An Overview of the Target Tracking Problem
   7.3 A Target Tracking WSN Model
      7.3.1 Mobility Model
      7.3.2 Tracking and Prediction Model
   7.4 Problem Formulation and System Design
   7.5 The HMTT Algorithm
      7.5.1 Architecture
      7.5.2 Q(λ) algorithm
      7.5.3 Higher Level Agent
      7.5.4 Lower Level Agent (Trajectory Predictor)
   7.6 Analysis of HMTT
      7.6.1 Convergence of Hierarchical Q(λ)
      7.6.2 Theoretical Bounds
      7.6.3 Time and Space Complexity of Q(λ) using a CMAC
      7.6.4 Relationship with other Hierarchical MDP formulations
   7.7 Simulation Results
   7.8 Summary

8 Conclusion and Future Research
   8.1 Conclusion
   8.2 Future Work

A Multi-criteria Knapsack problem
B Publications
Bibliography

8.2 Future Work

… expanded state space, and what are the gains over model-learning?

2. The cooperation method used in ARCQ is based on the use of carefully crafted common rewards among competing nodes.
This eliminates the Prisoner's Dilemma problem and paves the way for leader-based coordination. Unfortunately, such a method may not be readily applicable in other situations. For example, if there are multiple costs and constraints, it is challenging to ensure that the resultant reward is common among all nodes. In light of this, other cooperation techniques should be looked into.

3. The hierarchical learning framework in HARCQ and HMTT thus far is based on hierarchical decomposition of tasks. This idea of hierarchical decomposition can be extended across nodes. In ARCQ and HARCQ, the infrastructure between leader nodes and the other nodes is already established; in HMTT, clusters of nodes are already formed. Perhaps learning with hierarchical decomposition of the infrastructure could lead to better cooperation and coordination, and even better performance than hierarchical learning with subtasks.

Appendix A

Multi-criteria Knapsack problem

The (0,1) multi-criteria knapsack problem is a variant of the KNAPSACK problem, one of the first known NP-complete problems (Garey and Johnson, 1979). Alanne (2004) gives a detailed description of the problem.

Definition A.1 ((0,1) Multi-criteria Knapsack). A (0,1) multi-criteria knapsack problem is comprised of decision variables $a_1, \ldots, a_i, \ldots, a_n$, where $a_i = 1$ if item $i$ is selected, else $a_i = 0$. The objective function is

\[ \max \sum_{i=1}^{n} a_i v_i \tag{A.1a} \]

where $v_i$ is the utility score awarded by selecting item $i$. The objective function is subject to the constraints

\[ \sum_{i=1}^{n} a_i \mathbf{c}_i < \mathbf{C} \tag{A.1b} \]

where $\mathbf{c}_i$ is the vector of costs of selecting item $i$ and $\mathbf{C}$ is the maximum total cost vector allowed.
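To make Definition A.1 concrete, the following sketch (not from the thesis; the item values, cost vectors and cost ceiling are made-up illustrative data) enumerates all 2^n selections of a small instance and keeps the best one satisfying the strict constraint (A.1b). Brute force is viable only for tiny n, consistent with the problem being NP-complete.

```python
from itertools import product

# Illustrative (0,1) multi-criteria knapsack instance (made-up data):
# each item has one utility score v_i and a vector of costs c_i over
# two criteria; the total cost in every criterion must stay below C.
values = [10, 7, 5, 9]                      # v_i
costs = [(4, 2), (3, 3), (2, 1), (5, 4)]    # c_i
C = (8, 6)                                  # maximum total cost vector

best_value, best_selection = 0, None
for a in product((0, 1), repeat=len(values)):      # all 2^n selections
    total = [sum(a[i] * costs[i][k] for i in range(len(values)))
             for k in range(len(C))]
    if all(t < c for t, c in zip(total, C)):       # strict constraint (A.1b)
        value = sum(a[i] * values[i] for i in range(len(values)))  # (A.1a)
        if value > best_value:
            best_value, best_selection = value, a

print(best_selection, best_value)   # (1, 1, 0, 0) with value 17 here
```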
Appendix B

Publications

The following is a list of publications produced over the duration of my candidature, in chronological order.

Chen-Khong Tham, Daniel B. Yagan, Wai-Leong Yeow, and W.-C. Wong. Collaborative Processing in Sensor Networks with Assured QoS. In Workshop on Mobile, Wireless and Sensor Networks (MOBWISER), March 2004. Poster session.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. A Novel Target Movement Model and Energy Efficient Target Tracking in Sensor Networks. In Proc. IEEE Semiannual Vehicular Technology Conference (VTC-Spring), Stockholm, Sweden, May/June 2005.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Energy Efficient Multiple Target Tracking in Sensor Networks. In Proc. IEEE Global Communications Conference (GLOBECOM), St. Louis, MO, November/December 2005.

Kok-Kiong Yap, Wai-Leong Yeow, Mehul Motani, and Chen-Khong Tham. Simple Directional Antennas: Improving Performance in Wireless Multihop Networks. In Proc. IEEE International Conference on Computer Communications (INFOCOM), April 2006.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Hard Constrained Markov Decision Processes. In Proc. AAAI Conference on Artificial Intelligence (AAAI), 2006.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Energy Efficient Multiple Target Tracking in Sensor Networks. IEEE Transactions on Vehicular Technology, 56(2):918–928, March 2007.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Optimizing Application Performance through Learning and Cooperation in a Wireless Sensor Network. In Proc. IEEE Workshop on Situation Management (SIMA), October 2007.

Wai-Leong Yeow, Kok-Kiong Yap, Mehul Motani, and Chen-Khong Tham. Wireless Multi-hop Networking with Static Directionality. Submitted to IEEE/ACM Transactions on Networking, 2007.

Rob Hoes, Wai-Leong Yeow, Vikram Srinivasan, and Chen-Khong Tham. DynaMoS: Dynamic Topology Maintenance in Wireless Sensor Networks with a Mobile Sink. Submitted to ACM Transactions on Sensor Networks, 2006.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Hard and Soft Constrained Semi-Markov Decision Processes. Submitted to Journal of Artificial Intelligence Research, 2007.

Bibliography

Ian F. Akyildiz, W. Su, Yogesh Sankarasubramaniam, and Erdal Cayirci. Wireless Sensor Networks: a Survey. Computer Networks, 38(4):393–422, March 2002.

Kari Alanne. Selection of renovation actions using multi-criteria "knapsack" model. Automation in Construction, 13(3):377–391, May 2004.

Erin L. Allwein, Robert E. Schapire, and Yoram Singer. Reducing Multiclass to Binary: A Unifying Approach. Journal of Machine Learning Research, 1:113–141, 2000.

Eitan Altman. Constrained Markov Decision Processes. Chapman & Hall/CRC, 1999. ISBN 0849303826.

Andrew G. Barto and Sridhar Mahadevan. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 13(1/2):41–77, January/April 2003.

Andrew G. Barto, Steven J. Bradtke, and Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1–2):81–138, 1995.

Pritam Baruah, Rahul Urgaonkar, and Bhaskar Krishnamachari. Learning Enforced Time Domain Routing to Mobile Sinks in Wireless Sensor Fields. In Proc. IEEE Workshop on Embedded Networked Sensors (EmNetS-I), Tampa, FL, November 2004.

R. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, 2nd edition. Athena Scientific, 2000. ISBN 1-886529-09-4.

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996. ISBN 1886529108.

F. J. Beutler and K. W. Ross. Time-Average Optimal Constrained Semi-Markov Decision Processes. Advances in Applied Probability, 18(2):341–359, 1986.

Sangeeta Bhattacharya, Guoliang Xing, Chenyang Lu, Gruia-Catalin Roman, Brandon Harris, and Octav Chipara. Dynamic Wake-up and Topology Maintenance Protocols with Spatiotemporal Guarantees. In Proc. Information Processing in Sensor Networks (IPSN), Los Angeles, CA, April 2005.

Doron Blatt and Alfred Hero. Distributed maximum likelihood estimation for sensor networks. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, pages 929–932, May 2004.

R. R. Brooks, P. Ramanathan, and A. M. Sayeed. Distributed Target Classification and Tracking in Sensor Networks. Proceedings of the IEEE, 91(8):1163–1171, August 2003.

BTnodes. BTnodes – A Distributed Environment for Prototyping Ad Hoc Networks. http://www.btnode.ethz.ch/, 2007.

A. Cerpa and D. Estrin. ASCENT: Adaptive Self-Configuring sEnsor Networks Topologies. In Proc. IEEE International Conference on Computer Communications (INFOCOM), volume 3, pages 23–27, June 2002.

Hyeong Soo Chang, P. J. Fard, S. I. Marcus, and M. Shayman. Multi-time scale Markov decision processes. IEEE Transactions on Automatic Control, 48(6):976–987, June 2003.

B. Chen and P. K. Varshney. A Bayesian sampling approach to decision fusion. IEEE Transactions on Signal Processing, 50(8):1809–1818, August 2002.

Mung Chiang, Steven H. Low, A. Robert Calderbank, and John C. Doyle. Layering as Optimization Decomposition: A Mathematical Theory of Network Architectures. Proceedings of the IEEE, 95(1):255–312, January 2007.

Chee-Yee Chong and Srikanta P. Kumar. Sensor networks: Evolution, Opportunities, and Challenges. Proceedings of the IEEE, 91(8):1247–1256, August 2003.

Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press, Cambridge, MA, 2nd edition, 2001. ISBN 0-262-03293-7.

Crossbow Technology, Inc. MICA2 Datasheet. http://www.xbow.com/Products/Product_pdf_files/Wireless_pdf/MICA2_Datasheet.pdf, 2007a.

Crossbow Technology, Inc. MICAz Datasheet. http://www.xbow.com/Products/Product_pdf_files/Wireless_pdf/MICAz_Datasheet.pdf, 2007b.

Crossbow Technology, Inc. http://www.xbow.com/, 2007c.

Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 13:227–303, November 2000.

Thomas G. Dietterich. Machine Learning for Real-Time Decision Making. Technical Report F49620-98-1-0375, Oregon State University, June 2001.

D. Dolgov and E. Durfee. Approximating Optimal Policies for Agents with Limited Execution Resources. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), pages 1107–1112, August 2003.

D. Dolgov and E. Durfee. Approximating Probabilistic Constraints and Risk-Sensitive Optimization Criteria in Markov Decision Processes. In Proc. International Symposium on Artificial Intelligence and Mathematics (AI&M), January 2004.

Marco Duarte and Yu-Hen Hu. Optimal Decision Fusion With Applications to Target Detection in Wireless Ad Hoc Sensor Networks. In Proc. IEEE International Conference on Multimedia and Expo (ICME), pages 1803–1806, June 2004.

Vijay Erramilli, Ibrahim Matta, and Azer Bestavros. On the interaction between data aggregation and topology control in wireless sensor networks. In Proc. IEEE SECON, October 2004.

Deborah Estrin, Ramesh Govindan, John Heidemann, and Satish Kumar. Next Century Challenges: Scalable Coordination in Sensor Networks. In Proc. ACM International Conference on Mobile Computing and Networking (MobiCom), pages 263–270, Seattle, WA, August 1999.

R. Evans, Vikram Krishnamurthy, and G. Nair. Networked Sensor Management and Data Rate Control for Tracking Maneuvering Targets. IEEE Transactions on Signal Processing, 53(6):1979–1991, June 2005.

E. Feinberg and A. Shwartz. Constrained Discounted Dynamic Programming. Mathematics of Operations Research, 21:922–945, 1996.

E. A. Feinberg. Constrained Semi-Markov Decision Processes with Average Rewards. Mathematical Methods of Operations Research, 39(3):257–288, October 1994.

Eugene A. Feinberg and Adam Shwartz. Mixed Criteria. In Eugene A. Feinberg and Adam Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, pages 209–230. Kluwer Academic Publishers, Norwell, MA, 2002.

Zoltán Gábor, Z. Kalmár, and C. Szepesvári. Multi-criteria Reinforcement Learning. In Proc. International Conference on Machine Learning (ICML), pages 197–205, July 1998.

Shashidhar Rao Gandham, Milind Dawande, Ravi Prakash, and S. Venkatesan. Energy Efficient Schemes for Wireless Sensor Networks with Multiple Mobile Base Stations. In Proc. IEEE Global Communications Conference (GLOBECOM), pages 377–381, December 2003.

Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.

Peter Geibel and Fritz Wysotzki. Risk-Sensitive Reinforcement Learning Applied to Control under Constraints. Journal of Artificial Intelligence Research, 24:81–108, July 2005.

Mohammad Ghavamzadeh and Sridhar Mahadevan. Learning to Communicate and Act Using Hierarchical Reinforcement Learning. In Proc. International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 1114–1121, August 2004.

Soheil Ghiasi, Ankur Srivastava, Xiaojian Yang, and Majid Sarrafzadeh. Optimal Energy Aware Clustering in Sensor Networks. Sensors, 2:258–269, July 2002.

Claudia V. Goldman and Shlomo Zilberstein. Optimizing information exchange in cooperative multi-agent systems. In Proc. International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 137–144, Melbourne, Australia, July 2003.

J. Goldsmith and M. Mundhenk. Complexity Issues in Markov Decision Processes. In Proc. IEEE Conference on Computational Complexity (CCC), pages 272–280, June 1999.

Abhijit Gosavi. Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning. Kluwer Academic Publishers, 2003. ISBN 1402074549.

Carlos Guestrin. Planning Under Uncertainty in Complex Structured Environments. PhD thesis, Stanford University, Stanford, CA, August 2003.

Tian He, Chengdu Huang, Brian M. Blum, John A. Stankovic, and Tarek Abdelzaher. Range-free localization schemes for large scale sensor networks. In Proc. ACM International Conference on Mobile Computing and Networking (MobiCom), pages 81–85, San Diego, CA, September 2003.

Wendi Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan. Energy-Efficient Communication Protocols for Wireless Microsensor Networks. In Proc. Hawaii International Conference on System Sciences (HICSS), volume 33, January 2000.

Rob Hoes, Wai-Leong Yeow, Vikram Srinivasan, and Chen-Khong Tham. DynaMoS: Dynamic Topology Maintenance in Wireless Sensor Networks with a Mobile Sink. Submitted to ACM Transactions on Sensor Networks, 2006.

M. Horiguchi. Markov decision processes with a stopping time constraint. Mathematical Methods of Operations Research, 53(2):279–295, 2001.

Joe M. Kahn, R. H. Katz, and K. S. J. Pister. Mobile Networking for Smart Dust. In Proc. ACM International Conference on Mobile Computing and Networking (MobiCom), 1999.

Vikram Krishnamurthy. Algorithms for Optimal Scheduling and Management of Hidden Markov Model Sensors. IEEE Transactions on Signal Processing, 50(6):1382–1397, June 2002.

Joanna Kulik, Wendi Heinzelman, and Hari Balakrishnan. Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks. ACM Wireless Networks, 8(2/3):169–185, March/May 2002.

Michail G. Lagoudakis and Ronald Parr. Least-Squares Policy Iteration. Journal of Machine Learning Research, 4:1107–1149, December 2003.

Martin Lauer and Martin Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proc. International Conference on Machine Learning (ICML), July 2000.

Ben Liang and Zygmunt J. Haas. Predictive Distance-Based Mobility Management for PCS Networks. In Proc. IEEE International Conference on Computer Communications (INFOCOM), New York, NY, March 1999.

Maxim Likhachev, Geoff Gordon, and Sebastian Thrun. Planning for Markov Decision Processes with Sparse Stochasticity. In Proc. Neural Information Processing Systems Conference (NIPS), pages 785–792, December 2005.

Tong Liu, Paramvir Bahl, and Imrich Chlamtac. Mobility Modeling, Location Tracking, and Trajectory Prediction in Wireless ATM Networks. IEEE Journal on Selected Areas in Communications, 16(6):922–935, August 1998.

Y. Liu, C. K. Tham, and Y. M. Jiang. Conformance Analysis in Networks with Service Level Agreements. Computer Networks, 47(6):885–906, April 2005.

J. Luo and J. P. Hubaux. Joint Mobility and Routing for Lifetime Elongation in Wireless Sensor Networks. In Proc. IEEE International Conference on Computer Communications (INFOCOM), Miami, FL, March 2005.

M. H. MacDougall. SMPL: A Simple Portable Simulation Language. Technical Report 820377-700A, Amdahl Corporation, Sunnyvale, CA, April 1980.

Alan Mainwaring, David Culler, Joseph Polastre, Robert Szewczyk, and John Anderson. Wireless sensor networks for habitat monitoring. In Proc. ACM International Workshop on Wireless Sensor Networks and Applications (WSNA), pages 88–97, 2002.

Steven I. Marcus, Emmanuel Fernández-Gaucherand, Daniel Hernández-Hernández, Stefano Coraluppi, and Pedram Fard. Risk Sensitive Markov Decision Processes. In Christopher I. Byrnes, editor, Systems and Control in the Twenty-First Century, pages 263–279. Birkhäuser, Boston, MA, 1997.

J. Mirkovic, G. P. Venkataramani, S. Lu, and L. Zhang. A Self-Organizing Approach to Data Forwarding in Large-Scale Sensor Networks. In Proc. IEEE International Conference on Communications (ICC), volume 5, pages 1357–1361, June 2001.

The Mulle. The Mulle – A node for Bluetooth Sensor Networks / Development information. http://www.csee.ltu.se/~jench/mulle.html, 2007.

Suman Nath, Phillip B. Gibbons, Srinivasan Seshan, and Zachary R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In Proc. ACM Conference on Embedded Networked Sensor Systems (SenSys), November 2004.

Athanasios Papoulis. Probability, Random Variables and Stochastic Processes. McGraw-Hill, 2002. ISBN 0070484775.

Ronald Parr and Stuart J. Russell. Reinforcement Learning with Hierarchies of Machines. In Proc. Advances in Neural Information Processing Systems (NIPS), December 1997.

Sriram Pemmaraju and Steven Skiena. Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. Cambridge University Press, 2003. ISBN 0521806860.

Gokul Poduval. Integrated Computational and Network QoS in Grid Computing. Master's thesis, National University of Singapore, August 2005.

Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, New York, NY, 1994.

D. Salmond. Target Tracking: Introduction and Kalman Tracking Filters. In IEE Workshop on Target Tracking: Algorithms and Applications (Ref. No. 2001/174), volume 2, pages 1–16, October 2001.

Curt Schurgers, Vlasios Tsiatsis, and Mani B. Srivastava. STEM: Topology Management for Energy Efficient Sensor Networks. In Proc. IEEE Aerospace Conference, March 2002.

N. Z. Shor, Krzysztof C. Kiwiel, and Andrzej Ruszczyński. Minimization Methods for Non-differentiable Functions. Springer-Verlag, New York, NY, 1985. ISBN 0-387-12763-1.

R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 112(1/2):181–211, August 1999.

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. ISBN 0262193981.

Y. C. Tay, Kyle Jamieson, and Hari Balakrishnan. Collision-Minimizing CSMA and its Applications to Wireless Sensor Networks. IEEE Journal on Selected Areas in Communications, 22(6):1048–1057, August 2004.

Chen-Khong Tham. Modular On-line Function Approximation for Scaling Up Reinforcement Learning. PhD thesis, University of Cambridge, Cambridge, England, October 1994.

Chen-Khong Tham and Richard W. Prager. A Modular Q-Learning Architecture for Manipulator Task Decomposition. In Proc. International Conference on Machine Learning (ICML), pages 309–317, New Jersey, USA, July 1994. Morgan Kaufmann Publishers.

Chen-Khong Tham, Daniel B. Yagan, Wai-Leong Yeow, and W.-C. Wong. Collaborative Processing in Sensor Networks with Assured QoS. In Workshop on Mobile, Wireless and Sensor Networks (MOBWISER), March 2004. Poster session.

The New York Times. New Model Army Soldier Rolls Closer to Battle. February 16, 2005.

András Varga. OMNeT++: Discrete Event Simulation System. http://www.omnetpp.org/, 2006.

Edward L. Waltz, James Llinas, and Franklin E. White. Multisensor Data Fusion. Artech House, Inc., 1990. ISBN 0890062773.

Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, May 1992.

Matt Welsh. CitySense. http://www.citysense.net/, April 2007.

WINS. WINS project. http://www.janet.ucla.edu/WINS/, 1999.

Jin-Jun Xiao, Alejandro Ribeiro, Zhi-Quan Luo, and Georgios B. Giannakis. Distributed compression-estimation using wireless sensor networks. IEEE Signal Processing Magazine, 23(4):27–41, July 2006.

Lin Xiao, Stephen Boyd, and Sanjay Lall. A Scheme for Robust Distributed Sensor Fusion Based on Average Consensus. In Proc. International Conference on Information Processing in Sensor Networks (IPSN), 2005.

Yingqi Xu, Julian Winter, and Wang-Chien Lee. Prediction-based Strategies for Energy Saving in Object Tracking Sensor Networks. In Proc. IEEE International Conference on Mobile Data Management (MDM), pages 346–357, Berkeley, CA, January 2004.

Daniel Yagan and Chen-Khong Tham. Adaptive QoS Provisioning in Wireless Ad Hoc Networks: A Semi-MDP Approach. In Proc. IEEE Wireless Communications and Networking Conference (WCNC), March 2005.

Kok-Kiong Yap, Wai-Leong Yeow, Mehul Motani, and Chen-Khong Tham. Simple Directional Antennas: Improving Performance in Wireless Multihop Networks. In Proc. IEEE International Conference on Computer Communications (INFOCOM), April 2006.

Fan Ye, Gary Zhong, Songwu Lu, and Lixia Zhang. PEAS: A Robust Energy Conserving Protocol for Long-lived Sensor Networks. In Proc. IEEE International Conference on Distributed Computing Systems (ICDCS), Providence, RI, May 2003.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Energy Efficient Multiple Target Tracking in Sensor Networks. In Proc. IEEE Global Communications Conference (GLOBECOM), St. Louis, MO, November/December 2005a.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. A Novel Target Movement Model and Energy Efficient Target Tracking in Sensor Networks. In Proc. IEEE Semiannual Vehicular Technology Conference (VTC-Spring), Stockholm, Sweden, May/June 2005b.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Hard Constrained Markov Decision Processes. In Proc. AAAI Conference on Artificial Intelligence (AAAI), 2006.

Wai-Leong Yeow, Chen-Khong Tham, and Wai-Choong Wong. Energy Efficient Multiple Target Tracking in Sensor Networks. IEEE Transactions on Vehicular Technology, 56(2):918–928, March 2007a.

Wai-Leong Yeow, Kok-Kiong Yap, Mehul Motani, and Chen-Khong Tham. Wireless Multi-hop Networking with Static Directionality. Submitted to IEEE/ACM Transactions on Networking, 2007b.

Feng Zhao, Jie Liu, Juan Liu, Leonidas Guibas, and James Reich. Collaborative Signal and Information Processing: An Information-Directed Approach. Proceedings of the IEEE, 91(8):1199–1209, August 2003.

Rong Zheng, Jennifer C. Hou, and Lui Sha. Asynchronous Wakeup for Ad Hoc Networks. In Proc. ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), pages 35–45, 2003.
[...]

…approach: Hierarchical Reinforcement Learning.

1.2 The Learning Approach

Reinforcement Learning (RL) can be seen as a class of optimisation methods that has its roots in Markov Decision Processes (MDPs) (Puterman, 1994), a popular framework for solving sequential decision problems. Since 1957, the MDP (Bellman, 1957) has found applications in a variety of optimisation problems: target tracking (Evans et al., …

…is based on both the MAXQ value function decomposition method and a state abstraction method that will be introduced in Chapter 3. Chapter 7 subsequently looks at another canonical WSN application, target tracking, using another hierarchical RL method. A hierarchical RL structure is developed to conduct prediction-based tracking, which dynamically adjusts the sampling rate of the sensors to maintain high…

…communications. The subsequent section describes data aggregation, a mechanism for combining several data packets into one short packet, thereby saving on communication costs. The last section describes target tracking, a canonical WSN application in which tracking algorithms are adapted with energy efficiency in mind.

2.1 Topology Control and Routing

Since sensors…

…all possible actions A gives the value function V^(π)(x).

h(x, a) — occupation measure; the stationary probability of the system being in state x with the control agent executing action a.
r — the reward received by a control agent after taking an action and entering a new system state.
c — the vector of costs incurred by a control agent after taking an action.
Q^(π)(x, a | ă) — Q-function of MAXQ subtask ă.
Z^(π)(x, a | ă) — completion function of MAXQ subtask ă after the agent chooses action a in state x.
V^(π)(x | ă) — value of MAXQ subtask ă.
V̂ — an estimate of some…
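As a reading aid, these MAXQ quantities fit together through Dietterich's (2000) value function decomposition. The relation below is a sketch restated in the above notation, assuming the completion function Z here plays the role of Dietterich's C:

\[
Q^{(\pi)}(x, a \mid \breve{a}) \;=\; V^{(\pi)}(x \mid a) \;+\; Z^{(\pi)}(x, a \mid \breve{a})
\]

That is, the value of invoking child subtask a within parent subtask ă is the expected reward accumulated while a itself executes, plus the expected reward for completing ă afterwards; for a composite subtask under a greedy (recursively optimal) policy, V^(π)(x | ă) = max_a Q^(π)(x, a | ă).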
…complexity. A variant of data fusion, known as decision fusion (Duarte and Hu, 2004), is a classification technique that does not require a high computational load compared with data fusion. It involves the use of a posteriori probabilities, aggregating information into a concise value–likelihood pair for comparison with the data downstream. Chen and Varshney (2002), Brooks et al. (2003) and Xiao et al. (2005)…

…or, as it is commonly known, the sink. This increases the time data packets take to reach the sink, and also increases the number of packets lost to collisions. Chapter 5 looks at how to achieve the best quality data whilst ensuring receipt of up-to-date information at the sink. A hierarchical RL structure is further developed in Chapter 6 which achieves the same goal as Chapter 5 but with…

…hierarchical learning is used to optimise the performance of a sensor network from an application point of view. We first look into how soft delay constraints can be incorporated into a Markov Decision Process paradigm, and suggest a reinforcement learning solution to such constraints. We further consider a scenario where densely deployed sensors undergo a reporting storm: the sink should receive up-to-date data packets with maximum accuracy despite a heavily congested network. A distributed and cooperative learning algorithm is developed and its effectiveness is shown through simulations. We further develop a hierarchical solution and demonstrate similar performance with significant memory savings. The hierarchical learning paradigm is further explored in a multiple-target tracking problem and shown to demonstrate significant energy savings with uncompromised tracking accuracy. In…
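For readers new to the learning approach referred to above, the sketch below shows the plain tabular Q-learning update (Watkins and Dayan, 1992) on which the thesis builds; it is illustrative only and is not the ARCQ or HMTT algorithm. The toy MDP, its step() transition function and all numeric parameters are invented for the example.

```python
import random

# Toy 2-state, 2-action MDP invented for illustration; the thesis's own
# problems (ARCQ, HMTT) have far richer state and action spaces.
states, actions = [0, 1], [0, 1]

def step(x, a):
    """Hypothetical transition: returns (next_state, reward)."""
    x_next = (x + a) % 2
    reward = 1.0 if (x == 1 and a == 1) else 0.0
    return x_next, reward

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
Q = {(x, a): 0.0 for x in states for a in actions}

x = 0
for _ in range(10_000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda u: Q[(x, u)])
    x_next, r = step(x, a)
    # Watkins' Q-learning update:
    #   Q(x,a) <- Q(x,a) + alpha * [r + gamma * max_a' Q(x',a') - Q(x,a)]
    best_next = max(Q[(x_next, u)] for u in actions)
    Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
    x = x_next

print(Q)
```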