Elements of Artificial Neural Networks
Kishan Mehrotra, Chilukuri K. Mohan, and Sanjay Ranka

Preface
Introduction
1.1 History of Neural Networks
1.2 Structure and Function of a Single Neuron
1.2.1 Biological neurons
1.2.2 Artificial neuron models
1.3 Neural Net Architectures
1.3.1 Fully connected networks
1.3.2 Layered networks
1.3.3 Acyclic networks
1.3.4 Feedforward networks
1.3.5 Modular neural networks
1.4 Neural Learning
1.4.1 Correlation learning
1.4.2 Competitive learning
1.4.3 Feedback-based weight adaptation
1.5 What Can Neural Networks Be Used for?
1.5.1 Classification
1.5.2 Clustering
1.5.3 Vector quantization
1.5.4 Pattern association
1.5.5 Function approximation
1.5.6 Forecasting
1.5.7 Control applications
1.5.8 Optimization
1.5.9 Search
1.6 Evaluation of Networks
1.6.1 Quality of results
1.6.2 Generalizability
1.6.3 Computational resources
1.7 Implementation
1.8 Conclusion
1.9 Exercises
Supervised Learning: Single-Layer Networks
2.1 Perceptrons
2.4 Guarantee of Success
2.5 Modifications
2.5.1 Pocket algorithm
2.5.2 Adalines
2.5.3 Multiclass discrimination
2.6 Conclusion
2.7 Exercises
Supervised Learning: Multilayer Networks I
3.1 Multilevel Discrimination
3.2 Preliminaries
3.2.1 Architecture
3.2.2 Objectives
3.3 Backpropagation Algorithm
3.4 Setting the Parameter Values
3.4.1 Initialization of weights
3.4.2 Frequency of weight updates
3.4.3 Choice of learning rate
3.4.4 Momentum
3.4.5 Generalizability
3.4.6 Number of hidden layers and nodes
3.4.7 Number of samples
3.5 Theoretical Results*
3.5.1 Cover's theorem
3.5.2 Representations of functions
3.5.3 Approximations of functions
3.6 Accelerating the Learning Process
3.6.1 Quickprop algorithm
3.6.2 Conjugate gradient
3.7 Applications
3.7.1 Weaning from mechanically assisted ventilation
3.7.2 Classification of myoelectric signals
3.7.3 Forecasting commodity prices
3.7.4 Controlling a gantry crane
3.8 Conclusion
3.9 Exercises
Supervised Learning: Multilayer Networks II
4.1 Madalines
4.2 Adaptive Multilayer Networks
4.2.6 Tiling algorithm
4.3 Prediction Networks
4.3.1 Recurrent networks
4.3.2 Feedforward networks for forecasting
4.4 Radial Basis Functions
4.5 Polynomial Networks
4.6 Regularization
4.7 Conclusion
4.8 Exercises
Unsupervised Learning
5.1 Winner-Take-All Networks
5.1.1 Hamming networks
5.1.2 Maxnet
5.1.3 Simple competitive learning
5.2 Learning Vector Quantizers
5.3 Counterpropagation Networks
5.4 Adaptive Resonance Theory
5.5 Topologically Organized Networks
5.5.1 Self-organizing maps
5.5.2 Convergence*
5.5.3 Extensions
5.6 Distance-Based Learning
5.6.1 Maximum entropy
5.6.2 Neural gas
5.7 Neocognitron
5.8 Principal Component Analysis Networks
5.9 Conclusion
5.10 Exercises
Associative Models
6.1 Non-iterative Procedures for Association
6.2 Hopfield Networks
6.2.1 Discrete Hopfield networks
6.2.2 Storage capacity of Hopfield networks*
6.2.3 Continuous Hopfield networks
6.3 Brain-State-in-a-Box Network
6.4 Boltzmann Machines
6.4.1 Mean field annealing
6.5 Hetero-associators
Optimization Methods
7.1.2 Solving simultaneous linear equations
7.1.3 Allocating documents to multiprocessors
    Discrete Hopfield network
    Continuous Hopfield network
    Performance
7.2 Iterated Gradient Descent
7.3 Simulated Annealing
7.4 Random Search
7.5 Evolutionary Computation
7.5.1 Evolutionary algorithms
7.5.2 Initialization
7.5.3 Termination criterion
7.5.4 Reproduction
7.5.5 Operators
    Mutation
    Crossover
7.5.6 Replacement
7.5.7 Schema Theorem*
7.6 Conclusion
7.7 Exercises
Appendix A: A Little Math
A.1 Calculus
A.2 Linear Algebra
A.3 Statistics
Appendix B: Data
B.1 Iris Data
B.2 Classification of Myoelectric Signals
B.3 Gold Prices
B.4 Clustering Animal Features
B.5 3-D Corners, Grid and Approximation
B.6 Eleven-City Traveling Salesperson Problem (Distances)
B.7 Daily Stock Prices of Three Companies, over the Same Period
B.8 Spiral Data
Bibliography
Index

Preface

This book is intended as an introduction to the subject of artificial neural networks for readers at the senior undergraduate or beginning graduate levels, as well as professional engineers and scientists. The background presumed is roughly a year of college-level mathematics, and some amount of exposure to the task of developing algorithms and computer programs. For completeness, some of the chapters contain theoretical sections that discuss issues such as the capabilities of algorithms presented. These sections, identified by an asterisk in the section name, require greater mathematical sophistication and may be skipped by readers who are willing to assume the existence of theoretical results about neural network algorithms.

Many off-the-shelf neural network toolkits are available, including some on the Internet, and some that make source code available for experimentation. Toolkits with user-friendly interfaces are useful in attacking large applications; for a deeper understanding, we recommend that the reader be willing to modify computer programs, rather than remain a user of code written elsewhere.

The authors of this book have used the material in teaching courses at Syracuse University, covering various chapters in the same sequence as in the book. The book is organized so that the most frequently used neural network algorithms (such as error backpropagation) are introduced very early, so that these can form the basis for initiating course projects. Chapters 2, 3, and 4 have a linear dependency and, thus, should be covered in the same sequence. However, chapters 5 and 6 are essentially independent of each other and earlier chapters, so these may be covered in any relative order. If the emphasis in a course is to be on associative networks, for instance, then chapter 6 may be covered before chapters 2, 3, and 4. Chapter 6 should be discussed before chapter 7. If the "non-neural" parts of chapter 7 (sections 7.2 to 7.5) are not covered in a short course, then discussion of section 7.1 may immediately follow chapter 6. The inter-chapter dependency rules are roughly as follows.

1 → 2 → 3 → 4
1 → 5
1 → 6
3 → 5.3
6 → 7.1

Within each chapter, it is best to cover most sections in the same sequence as the text; this is not logically necessary for parts of chapters 4, 5, and 7, but minimizes student confusion.

Material for transparencies may be obtained from the authors. We welcome suggestions for improvements and corrections. Instructors who plan to use the book in a course should send electronic mail to one of the authors, so that we can indicate any last-minute corrections needed (if errors are found after book production). New theoretical and practical developments continue to be reported in the neural network literature, and some of these are relevant even for newcomers to the field; we hope to communicate some such results to instructors who contact us.

The authors of this book have arrived at neural networks through different paths (statistics, artificial
intelligence, and parallel computing) and have developed the material through teaching courses in Computer and Information Science. Some of our biases may show through the text, while perspectives found in other books may be missing; for instance, we do not discount the importance of neurobiological issues, although these consume little ink in the book. It is hoped that this book will help newcomers understand the rationale, advantages, and limitations of various neural network models. For details regarding some of the more mathematical and technical material, the reader is referred to more advanced texts such as those by Hertz, Krogh, and Palmer (1990) and Haykin (1994).

We express our gratitude to all the researchers who have worked on and written about neural networks, and whose work has made this book possible. We thank Syracuse University and the University of Florida, Gainesville, for supporting us during the process of writing this book. We thank Li-Min Fu, Joydeep Ghosh, and Lockwood Morris for many useful suggestions that have helped improve the presentation. We thank all the students who have suffered through earlier drafts of this book, and whose comments have improved this book, especially S. K. Bolazar, M. Gunwani, A. R. Menon, and Z. Zeng. We thank Elaine Weinman, who has contributed much to the development of the text. Harry Stanton of the MIT Press has been an excellent editor to work with. Suggestions on an early draft of the book, by various reviewers, have helped correct many errors. Finally, our families have been the source of much needed support during the many months of work this book has entailed.

We expect that some errors remain in the text, and welcome comments and corrections from readers. In particular, there has been so much recent research in neural networks that we may have mistakenly failed to mention the names of researchers who have developed some of the ideas discussed in this book. Errata, computer programs, and data files will be made accessible over the Internet. The authors may be reached by electronic mail at mehrotra@syr.edu, ckmohan@syr.edu, and ranka@cis.ufl.edu.

1 Introduction

If we could first know where we are, and whither we are tending, we could better judge what to do, and how to do it.
—Abraham Lincoln

Many tasks involving intelligence or pattern recognition are extremely difficult to automate, but appear to be performed very easily by animals. For instance, animals recognize various objects and make sense out of the large amount of visual information in their surroundings, apparently requiring very little effort. It stands to reason that computing systems that attempt similar tasks will profit enormously from understanding how animals perform these tasks, and simulating these processes to the extent allowed by physical limitations. This necessitates the study and simulation of Neural Networks.

The neural network of an animal is part of its nervous system, containing a large number of interconnected neurons (nerve cells). "Neural" is an adjective for neuron, and "network" denotes a graph-like structure. Artificial neural networks refer to computing systems whose central theme is borrowed from the analogy of biological neural networks. Bowing to common practice, we omit the prefix "artificial."
There is potential for confusing the (artificial) poor imitation for the (biological) real thing; in this text, non-biological words and names are used as far as possible. Artificial neural networks are also referred to as "neural nets," "artificial neural systems," "parallel distributed processing systems," and "connectionist systems." For a computing system to be called by these pretty names, it is necessary for the system to have a labeled directed graph structure where nodes perform some simple computations. From elementary graph theory we recall that a "directed graph" consists of a set of "nodes" (vertices) and a set of "connections" (edges/links/arcs) connecting pairs of nodes. A graph is a "labeled graph" if each connection is associated with a label to identify some property of the connection. In a neural network, each node performs some simple computations, and each connection conveys a signal from one node to another, labeled by a number called the "connection strength" or "weight," indicating the extent to which a signal is amplified or diminished by a connection. Not every such graph can be called a neural network, as illustrated in example 1.1 using a simple labeled directed graph that conducts an elementary computation.

EXAMPLE 1.1 The "AND" of two binary inputs is an elementary logical operation, implemented in hardware using an "AND gate." If the inputs to the AND gate are x1 ∈ {0, 1} and x2 ∈ {0, 1}, the desired output is 1 if x1 = x2 = 1, and 0 otherwise. A graph representing this computation is shown in figure 1.1, with one node at which computation (multiplication) is carried out, two nodes that hold the inputs (x1, x2), and one node that holds one output. However, this graph cannot be considered a neural network since the connections between the nodes are fixed and appear to play no other role than carrying the inputs to the node that computes their conjunction.

[Figure 1.1: AND gate graph. Inputs x1, x2 ∈ {0, 1} feed a multiplier node whose output is o = x1 AND x2.]

[Figure 1.2: AND gate network. The node computes the product of its weighted inputs, (w1 x1)(w2 x2) = o = x1 AND x2.]

We may modify the graph in figure 1.1 to obtain a network containing weights (connection strengths), as shown in figure 1.2. Different choices for the weights result in different functions being evaluated by the network. Given a network whose weights are initially random, and given that we know the task to be accomplished by the network, a "learning algorithm" must be used to determine the values of the weights that will achieve the desired task. The graph structure, with connection weights modifiable using a learning algorithm, qualifies the computing system to be called an artificial neural network.

EXAMPLE 1.2 For the network shown in figure 1.2, the following is an example of a learning algorithm that will allow learning the AND function, starting from arbitrary values of w1 and w2. The trainer uses the following four examples to modify the weights: {(x1 = 1, x2 = 1, d = 1), (x1 = 1, x2 = 0, d = 0), (x1 = 0, x2 = 1, d = 0), (x1 = 0, x2 = 0, d = 0)}, where d is the desired output. Increments (or decrements) of 0.1 to w1 or w2 may be repeated until the final result is satisfactory, with weights such as w1 = 5.0, w2 = 0.2.

Can the weights of such a net be modified so that the system performs a different task? For instance, is there a set of values for w1 and w2 such that a net otherwise identical to that shown in figure 1.2 can compute the OR of its inputs?
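Before turning to that question, the scheme of example 1.2 can be made concrete in a few lines of code. The sketch below is ours, not the book's: it assumes a greedy choice among the four possible ±0.1 weight changes and stops when no single change reduces the error further; the text specifies only that w1 or w2 is changed in steps of 0.1 until the result is satisfactory.

```python
# Sketch of the network of figure 1.2 and the weight adjustment of
# example 1.2. The greedy search and stopping rule are our assumptions.

AND_SAMPLES = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0)]  # (x1, x2, desired)

def net_output(w1, w2, x1, x2):
    # The node of figure 1.2 multiplies its weighted inputs.
    return (w1 * x1) * (w2 * x2)

def total_error(w1, w2):
    return sum(abs(net_output(w1, w2, x1, x2) - d) for x1, x2, d in AND_SAMPLES)

def train(w1, w2, step=0.1, max_iters=1000):
    # Repeatedly apply whichever single +/-0.1 change to w1 or w2 most
    # reduces the error on the four samples; stop when no change helps.
    for _ in range(max_iters):
        best_err, best_w = total_error(w1, w2), (w1, w2)
        for dw1, dw2 in ((step, 0), (-step, 0), (0, step), (0, -step)):
            err = total_error(w1 + dw1, w2 + dw2)
            if err < best_err:
                best_err, best_w = err, (w1 + dw1, w2 + dw2)
        if best_w == (w1, w2):
            break
        w1, w2 = best_w
    return w1, w2

w1, w2 = train(0.5, 0.2)
print(w1, w2)  # the product w1 * w2 ends up (very close to) 1
```

Any weights whose product is 1 compute AND exactly on binary inputs; the book's w1 = 5.0, w2 = 0.2 is one such solution, and the greedy loop above finds another (near w1 = w2 = 1.0 from this starting point).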
Unfortunately, there is no possible choice of weights w1 and w2 such that (w1 · x1) · (w2 · x2) will compute the OR of x1 and x2. For instance, whenever x1 = 0, the output value (w1 · x1) · (w2 · x2) = 0, irrespective of whether x2 = 0 or x2 = 1. The node function was predetermined to multiply weighted inputs, imposing a fundamental limitation on the capabilities of the network shown in figure 1.2, although it was adequate for the task of computing the AND function and for functions described by the mathematical expression o = w1 w2 x1 x2. A different node function is needed if there is to be some chance of learning the OR function. An example of such a node function is (x1 + x2 − x1 · x2), which evaluates to 1 if x1 = 1 or x2 = 1, and to 0 if x1 = 0 and x2 = 0 (assuming that each input can take only a 0 or 1 value). But this network cannot be used to compute the AND function.

Sometimes, a network may be capable of computing a function, but the learning algorithm may not be powerful enough to find a satisfactory set of weight values, and the final result may be constrained due to the initial (random) choice of weights. For instance, the AND function cannot be learnt accurately using the learning algorithm described above if we started from initial weight values w1 = w2 = 0.3, since the solution w1 = 1/0.3 cannot be reached by repeatedly incrementing (or decrementing) the initial choice of w1 by 0.1.

We seem to be stuck with one node function for AND and another for OR. What if we did not know beforehand whether the desired function was AND or OR? Is there some node function such that we can simulate AND as well as OR by using different weight values? Is there a different network that is powerful enough to learn every conceivable function of its inputs? Fortunately, the answer is yes; networks can be built with sufficiently general node functions so that a large number of different problems can be solved, using a different set of weight values for each task.

The AND gate example has served as a takeoff point for several important questions: what are neural networks, what can they accomplish, how can they be modified, and what are their limitations?
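As a small illustration of such a sufficiently general node function (ours, anticipating the threshold units introduced in later chapters, not a construction from this section), a node that compares a weighted sum of its inputs against a threshold can compute AND as well as OR, using different parameter values:

```python
# Illustration: one node function, two tasks. A node that thresholds a
# weighted sum computes AND or OR depending only on its parameters.

def threshold_node(w1, w2, theta, x1, x2):
    # Output 1 if the weighted sum of inputs reaches the threshold theta.
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        and_out = threshold_node(1.0, 1.0, 1.5, x1, x2)  # both inputs needed
        or_out = threshold_node(1.0, 1.0, 0.5, x1, x2)   # either input suffices
        print(x1, x2, and_out, or_out)
```

Here the two functions share the same connection weights and differ only in the threshold, which can itself be treated as one more adjustable weight.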
In the rest of this chapter, we review the history of research in neural networks, and address four important questions regarding neural network systems.

Appendix B: Data

In this appendix, we present several example data files to serve as fodder for the neural network algorithms and programs discussed in this text.

B.1 Iris Data

This classic data of Anderson and Fisher (1936) pertains to a four-input, three-class classification problem. The first four columns of the table indicate petal and sepal widths and lengths of various iris flowers, and the fifth column indicates the appropriate class (Setosa, Versicolor, and Virginica).

[Table B.1: 150 rows of four measurements followed by a class label (1, 2, or 3); the printed columns were scrambled in extraction, so the rows are omitted here.]

B.2 Classification of Myoelectric Signals

The following data are extracted from a problem in discriminating between electrical signals observed at the human skin surface. The first four columns correspond to input dimensions, and the last column indicates the class to which the signal belongs.

[Table B.2: rows of four signal features followed by a class label; the table was garbled in extraction and is omitted here.]
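Both of the preceding tables share one layout: four input values followed by an integer class label on each line. A minimal reader for this layout (a sketch of ours; the function and file names are hypothetical, not from the book) could be:

```python
# Minimal reader for the five-column format of tables B.1 and B.2:
# four input values followed by an integer class label per line.

def load_samples(path):
    samples = []  # list of (inputs, label) pairs
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:
                continue  # skip blank or malformed lines
            inputs = [float(v) for v in fields[:4]]
            samples.append((inputs, int(fields[4])))
    return samples

# e.g., samples = load_samples("iris.dat")
```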
B.3 Gold Prices

The following numbers indicate daily gold prices on international markets; the task is to develop a neural network that can forecast these prices. Prices on successive days are given on the same line, and the first entry in a line pertains to a date that follows the last entry on the preceding line.

329.95 329.60 329.75 328.50 328.05 327.80 327.95 328.30
331.55 331.25 328.55 328.80 331.85 330.10 330.95 329.90
329.45 333.50 329.85 329.50 329.15 328.35 330.05 328.80
329.65 329.60 327.35 326.75 326.75 327.00 328.05 329.05
328.55 328.80 330.65 331.50 331.75 331.60 332.65 332.50
332.05 332.00 337.25 336.75 339.45 340.00 338.05 337.50
337.85 337.00 337.85 336.50 337.25 338.50 337.25 338.50
341.15 339.90 329.95 328.65 328.15 328.50 331.80 328.55
330.30 330.65 329.60 329.55 328.55 329.45 330.15 326.45
327.10 328.65 329.35 331.30 331.75 332.15 332.05 338.35
340.35 338.40 337.30 337.30 339.20 339.05 340.55 328.20
324.75 327.34 329.00 329.84 329.09 331.87 331.20 328.55
329.31 327.63 330.40 331.41 328.47 327.11 327.71 329.10
331.94 331.90 332.78 331.92 338.34 338.28 338.85 336.94
336.94 338.66 338.23 341.08 330.15 328.45 328.15 328.85
330.95 329.25 330.15 329.25 330.55 329.35 328.95 328.45
328.75 326.75 327.05 329.15 329.45 332.45 331.25 332.65
333.15 336.95 339.45 337.35 337.85 337.15 339.55 340.55
339.15 329.55 328.35 327.75 331.80 328.80 332.00 330.00
327.85 329.20 328.15 329.40 328.75 327.00 326.15 327.90
328.85 330.25 332.70 332.20 332.25 336.35 338.20 339.10
337.75 337.00 336.95 336.60 339.90 339.25 329.75 328.15
328.05 330.15 328.35 331.65 330.00 330.00 330.15 328.75
330.10 328.25 327.35 326.55 327.15 329.00 330.45 332.60
331.35 332.50 335.75 336.75 339.30 336.60 337.25 336.65
338.65 339.85 338.85 330.51 326.32 327.28 328.92 329.66
330.26 327.90 331.20 330.73 327.78 328.57 330.40 326.87
326.03 325.86 329.10 330.00 331.97 330.90 333.81 332.20
336.93 338.52 337.47 336.94 336.32 339.71 340.19 339.01
337.15 348.75 349.75 354.25 339.60 352.00 353.00 355.80
339.70 350.10 351.60 354.95 339.09 349.04 349.64 354.22
341.15 352.25 354.65 346.50 350.00 355.90 334.00 350.75
354.90 340.87 352.28 355.49

B.4 Clustering Animal Features

In the following data, each line describes features such as teeth that characterize one animal. A clustering algorithm may be applied to determine which animals resemble each other the most, using eight-dimensional input vectors.

[Table B.4: one eight-dimensional feature vector per animal; the row boundaries were lost in extraction, so the table is omitted here.]

Reproduced with permission: SAS® User's Guide: Statistics, 1982 Edition, Cary, NC: SAS Institute, Inc., © 1982.

B.5 3-D Corners, Grid and Approximation

The first three columns of each line in the following data correspond to input dimensions. The data may be used for clustering, using the first three columns alone. Some of the points fall in the vicinity of a corner of the unit cube,
indicated by the fourth column, which may be used as desired output for a classification problem. The same three-dimensional input data may also be used for a three-dimensional chessboard or "grid" problem.

[The remainder of this section, including table B.5 itself, was too garbled in extraction to reproduce here.]
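Although the table itself could not be reproduced, the corner labeling described above is easy to state in code. The sketch below is our reading of the description, under the assumption (ours, not the book's) that the nearest cube corner is encoded as an integer in {0, ..., 7}:

```python
# Sketch of how a class label like the fourth column of table B.5 could
# be derived: each 3-D point is labeled by the nearest corner of the
# unit cube, with corner (b1, b2, b3) encoded as the integer 4*b1+2*b2+b3.

def nearest_corner_label(x, y, z):
    bits = [1 if v >= 0.5 else 0 for v in (x, y, z)]
    return bits[0] * 4 + bits[1] * 2 + bits[2]

print(nearest_corner_label(0.9, 0.1, 0.2))  # near corner (1, 0, 0) -> label 4
```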