
Information Theory and Statistical Learning

Frank Emmert-Streib • Matthias Dehmer (Eds.)

Frank Emmert-Streib
University of Washington, Department of Biostatistics and Department of Genome Sciences, 1705 NE Pacific St., Box 357730, Seattle, WA 98195, USA
and
Queen's University Belfast, Computational Biology and Machine Learning, Center for Cancer Research and Cell Biology, School of Biomedical Sciences, 97 Lisburn Road, Belfast BT9 7BL, UK
v@bio-complexity.com

Matthias Dehmer
Vienna University of Technology, Institute of Discrete Mathematics and Geometry, Wiedner Hauptstr. 8-10, 1040 Vienna, Austria
and
University of Coimbra, Center for Mathematics, Probability and Statistics, Apartado 3008, 3001-454 Coimbra, Portugal
matthias@dehmer.org

ISBN: 978-0-387-84815-0
e-ISBN: 978-0-387-84816-7
DOI: 10.1007/978-0-387-84816-7
Library of Congress Control Number: 2008932107

© Springer Science+Business Media, LLC 2009. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper. springer.com

Preface

This book presents theoretical and practical results of information theoretic methods used in the context of statistical learning. Its major goal is to advocate and promote the importance and usefulness of information theoretic concepts for understanding and developing the sophisticated machine learning methods necessary not only to cope with the challenges of modern data analysis but also to gain further insights into their theoretical foundations. Here Statistical Learning is loosely defined as a synonym for, e.g., Applied Statistics, Artificial Intelligence or Machine Learning.

Over the last decades, many approaches and algorithms have been suggested in the fields mentioned above, for which information theoretic concepts constitute core ingredients. For this reason we present a selected collection of some of the finest concepts and applications thereof from the perspective of information theory as the underlying guiding principle. We consider such a perspective as very insightful and expect an even greater appreciation for it over the next years.

The book is intended for interdisciplinary use, ranging from Applied Statistics, Artificial Intelligence, Applied Discrete Mathematics, Computer Science, Information Theory and Machine Learning to Physics. In addition, people working in the hybrid fields of Bioinformatics, Biostatistics, Computational Biology, Computational Linguistics, Medical Bioinformatics, Neuroinformatics or Web Mining might profit tremendously from the presented results, because these data-driven areas are in permanent need of new approaches to cope with the increasing flood of high-dimensional, noisy data that poses seemingly never-ending challenges for its analysis.

Many colleagues, whether consciously or unconsciously, have provided us with input, help and support before and during the writing of this book. In particular we would like to thank Shun-ichi Amari, Hamid Arabnia, Gökhan Bakır, Alexandru T. Balaban, Teodor Silviu Balaban, Frank J. Balbach, João Barros, Igor Bass, Matthias Beck, Danail Bonchev, Stefan Borgert, Mieczyslaw Borowiecki, Rudi L. Cilibrasi, Mike Coleman, Malcolm Cook, Pham Dinh-Tuan, Michael Drmota, Shinto Eguchi, B. Roy Frieden, Bernhard Gittenberger, Galina Glazko, Martin Grabner, Earl Glynn, Peter Grassberger, Peter Hamilton, Kateřina Hlaváčková-Schindler, Lucas R. Hope, Jinjie Huang, Robert Jenssen, Attila Kertész-Farkas, András Kocsor, Elena Konstantinova, Kevin B. Korb, Alexander Kraskov, Tyll Krüger, Ming Li, J.F. McCann, Alexander Mehler, Marco Möller, Abbe Mowshowitz, Max Mühlhäuser, Markus Müller, Noboru Murata, Arcady Mushegian, Erik P. Nyberg, Paulo Eduardo Oliveira, Hyeyoung Park, Judea Pearl, Daniel Polani, Sándor Pongor, William Reeves, Jorma Rissanen, Panxiang Rong, Reuven Rubinstein, Rainer Siegmund-Schulze, Heinz Georg Schuster, Helmut Schwegler, Chris Seidel, Fred Sobik, Ray J. Solomonoff, Doru Stefanescu, Thomas Stoll, John Storey, Milan Studený, Ulrich Tamm, Naftali Tishby, Paul M.B. Vitányi, José Miguel Urbano, Kazuho Watanabe, Dongxiao Zhu and Vadim Zverovich, and we apologize to all those who have been missed inadvertently. We would also like to thank our editor Amy Brais from Springer, who has always been available and helpful. Last but not least we would like to thank our families for support and encouragement during all the time of preparing the book for publication.

We hope this book will help to spread the enthusiasm we have for this field and inspire people to tackle their own practical or theoretical research problems.

Belfast and Coimbra, June 2008
Frank Emmert-Streib
Matthias Dehmer

Contents

1. Algorithmic Probability: Theory and Applications (Ray J. Solomonoff)
2. Model Selection and Testing by the MDL Principle (Jorma Rissanen)
3. Normalized Information Distance (Paul M.B. Vitányi, Frank J. Balbach, Rudi L. Cilibrasi, and Ming Li)
4. The Application of Data Compression-Based Distances to Biological Sequences (Attila Kertész-Farkas, András Kocsor, and Sándor Pongor)
5. MIC: Mutual Information Based Hierarchical Clustering (Alexander Kraskov and Peter Grassberger)
6. A Hybrid Genetic Algorithm for Feature Selection Based on Mutual Information (Jinjie Huang and Panxiang Rong)
7. Information Approach to Blind Source Separation and Deconvolution (Pham Dinh-Tuan)
8. Causality in Time Series: Its Detection and Quantification by Means of Information Theory (Kateřina Hlaváčková-Schindler)
9. Information Theoretic Learning and Kernel Methods (Robert Jenssen)
10. Information-Theoretic Causal Power (Kevin B. Korb, Lucas R. Hope, and Erik P. Nyberg)
11. Information Flows in Complex Networks (João Barros)
12. Models of Information Processing in the Sensorimotor Loop (Daniel Polani and Marco Möller)
13. Information Divergence Geometry and the Application to Statistical Machine Learning (Shinto Eguchi)
14. Model Selection and Information Criterion (Noboru Murata and Hyeyoung Park)
15. Extreme Physical Information as a Principle of Universal Stability (B. Roy Frieden)
16. Entropy and Cloning Methods for Combinatorial Optimization, Sampling and Counting Using the Gibbs Sampler (Reuven Rubinstein)
Index

Contributors

Frank J. Balbach, University of Waterloo, Waterloo, ON, Canada, fbalbach@uwaterloo.ca
João Barros, Instituto de Telecomunicações, Universidade Porto, Porto, Portugal, barros@dcc.fc.up.pt
Rudi L. Cilibrasi, CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands, cilibrar@cilibrar.com
Pham Dinh-Tuan, Laboratory Jean Kuntzmann, CNRS-INPG-UJF BP 53, 38041 Grenoble Cedex, France, Dinh-Tuan.Pham@imag.fr
Shinto Eguchi, Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106-8569, Japan, eguchi@ism.ac.jp
B. Roy Frieden, College of Optical Sciences, University of Arizona, Tucson, AZ 85721, USA, roy.frieden@optics.Arizona.edu
Peter Grassberger, Department of Physics and Astronomy and Institute for Biocomplexity and Informatics, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4, pgrassbe@ucalgary.ca
Lucas R. Hope, Bayesian Intelligence Pty Ltd., lhope@bayesian-intelligence.com
Kateřina Hlaváčková-Schindler, Commission for Scientific Visualization, Austrian Academy of Sciences, Donau-City Str. 1, 1220 Vienna, Austria, and Institute of Information Theory and Automation of the Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 4, 18208 Praha 8, Czech Republic, katerina.schindler@assoc.oeaw.ac.at
Jinjie Huang, Department of Automation, Harbin University of Science and Technology, Xuefu Road 52, Harbin 150080, China, jinjiehyh@yahoo.com.cn
Robert Jenssen, Department of Physics and Technology, University of Tromsø, 9037 Tromsø, Norway, robert.jenssen@phys.uit.no
Attila Kertész-Farkas, Research Group on Artificial Intelligence, Aradi vértanúk tere 1, 6720 Szeged, Hungary, kfa@inf.u-szeged.hu
András Kocsor, Research Group on Artificial Intelligence, Aradi vértanúk tere 1, 6720 Szeged, Hungary, kocsor@inf.u-szeged.hu
Kevin B. Korb, Clayton School of IT, Monash University, Clayton 3600, Australia, kevin.korb@infotech.monash.edu.au
Alexander Kraskov, UCL Institute of Neurology, Queen Square, London WC1N 3BG, UK, akraskov@ion.ucl.ac.uk
Ming Li, University of Waterloo, Waterloo, ON, Canada, mli@uwaterloo.ca
Marco Möller, Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire, Hatfield, UK, XXX@herts.ac.uk
Noboru Murata, Waseda University, Tokyo 169-8555, Japan, noboru.murata@eb.waseda.ac.jp
Erik P. Nyberg, School of Philosophy, University of Melbourne, Parkville 3052, Australia, e.nyberg@pgrad.unimelb.edu.au
Hyeyoung Park, Kyungpook National University, Daegu 702-701, Korea, hypark@knu.ac.kr
Daniel Polani, Adaptive Systems Research Group, School of Computer Science, University of Hertfordshire, Hatfield, UK, d.polani@herts.ac.uk
Sándor Pongor, Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34012 Trieste, Italy, and Bioinformatics Group, Biological Research Centre, Hungarian Academy of Sciences, Temesvári krt. 62, 6701 Szeged, Hungary, pongor@icgeb.org
Jorma Rissanen, Helsinki Institute for Information Technology, Technical Universities of Tampere and Helsinki, and CLRC, Royal Holloway, University of London, London, UK, jorma.rissanen@hiit.fi
Panxiang Rong, Department of Automation, Harbin University of Science and Technology, Xuefu Road 52, Harbin 150080, China, pxrong@hrbust.edu.cn
Reuven Rubinstein, Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa 32000, Israel, ierrr01@ie.technion.ac.il
Ray J. Solomonoff, Visiting Professor, Computer Learning Research Centre, Royal Holloway, University of London, London, UK, rjsolo@ieee.org
Paul M.B. Vitányi, CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands, paulv@cwi.nl

Chapter 1
Algorithmic Probability: Theory and Applications
Ray J. Solomonoff

Abstract. We first define Algorithmic Probability, an extremely powerful method of inductive inference. We discuss its completeness, incomputability, diversity and subjectivity, and show that its incomputability in no way inhibits its use for practical prediction. Applications to Bernoulli sequence prediction and grammar discovery are described. We conclude with a note on its employment in a very strong AI system for very general problem solving.

1.1 Introduction

Ever since probability was invented, there has been much controversy as to just what it meant, how it should be defined and, above all, what is the best way to predict the future from the known past. Algorithmic Probability is a relatively recent definition of probability that attempts to solve these problems.

We begin with a simple discussion of prediction and its relationship to probability. This soon leads to a definition of Algorithmic Probability (ALP) and its properties. The best-known properties of ALP are its incomputability and its completeness (in that order). Completeness means that if there is any regularity (i.e. property useful for prediction) in a batch of data, ALP will eventually find it, using a surprisingly small amount of data. The incomputability means that in the search for regularities, at no point can we make a useful estimate of how close we are to finding the most important ones. We will show, however, that this incomputability is of a very benign kind, so that in no way does it inhibit the use of ALP for good prediction. One of the important properties of ALP is subjectivity, the amount of personal experiential information that the statistician must put into the system. We will show that this

R.J. Solomonoff, Visiting Professor, Computer Learning Research Centre, Royal Holloway, University of London, London, UK, http://world.std.com/~rjs, e-mail: rjsolo@ieee.org

F. Emmert-Streib, M. Dehmer (eds.), Information Theory and Statistical Learning, DOI: 10.1007/978-0-387-84816-7_1, © Springer Science+Business Media LLC 2009
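The prediction scheme sketched in the introduction can be made concrete in miniature. ALP assigns a string x probability proportional to the summed weights 2^(-|p|) of all programs p whose output begins with x, and predicts the next symbol by the ratio of weights of the consistent continuations. The toy below is NOT Solomonoff's universal machine: its only "programs" are bit patterns repeated forever, and all function names are ours. It nonetheless shows the characteristic behaviour that the shortest consistent program dominates the prediction.

```python
# Toy illustration of Algorithmic Probability (ALP) style prediction.
# A "program" is a bit pattern of length L, weighted 2^-L, whose
# "output" is the pattern repeated forever. This is a stand-in for a
# universal machine, so the probabilities are only illustrative.

from itertools import product


def repeating_outputs(max_len, horizon):
    """Yield (weight, output) for every repeating-pattern 'program'."""
    for length in range(1, max_len + 1):
        for pattern in product("01", repeat=length):
            out = ("".join(pattern) * horizon)[:horizon]
            yield 2.0 ** -length, out


def alp_predict(observed, max_len=8):
    """P(next bit = '1' | observed) under the toy 2^-|p| prior.

    Weights are unnormalized; only their ratio matters here.
    """
    horizon = len(observed) + 1
    mass = {"0": 0.0, "1": 0.0}
    for weight, out in repeating_outputs(max_len, horizon):
        if out.startswith(observed):          # program agrees with the data
            mass[out[len(observed)]] += weight  # credit its next bit
    total = mass["0"] + mass["1"]
    return mass["1"] / total if total else 0.5


# The short program "01" dominates all longer consistent programs,
# so after observing "010101" the next bit is predicted to be "0".
print(alp_predict("010101"))
```

Note how completeness shows up even in this toy: any alternating sequence is eventually captured by the two-bit program "01", whose weight 2^-2 swamps the ad hoc longer programs that disagree with it.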

