List of Contributors

L.F. Abbott, Department of Physiology and Cellular Biophysics, Center for Neurobiology and Behavior, Columbia University College of Physicians and Surgeons, New York, NY 10032-2695, USA
A.P.L. Abdala, Department of Physiology, School of Medical Sciences, University of Bristol, Bristol BS8 1TD, UK
D.E. Angelaki, Department of Anatomy and Neurobiology, Box 8108, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO 63110, USA
J. Beck, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
Y. Bengio, Department IRO, Université de Montréal, P.O. Box 6128, Downtown Branch, Montreal, QC H3C 3J7, Canada
C. Cadieu, Redwood Center for Theoretical Neuroscience and Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
C.E. Carr, Department of Biology, University of Maryland, College Park, MD 20742, USA
P. Cisek, Groupe de Recherche sur le Système Nerveux Central, Département de Physiologie, Université de Montréal, Montréal, QC H3C 3J7, Canada
C.M. Colbert, Biology and Biochemistry, University of Houston, Houston, TX, USA
E.P. Cook, Department of Physiology, McGill University, 3655 Sir William Osler, Montreal, QC H3G 1Y6, Canada
P. Dario, CRIM Laboratory, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, 56025 Pontedera (Pisa), Italy
A.G. Feldman, Center for Interdisciplinary Research in Rehabilitation (CRIR), Rehabilitation Institute of Montreal, and Jewish Rehabilitation Hospital, Laval, 6300 Darlington, Montreal, QC H3S 2J4, Canada
M.S. Fine, Department of Biomedical Engineering, Washington University, Brookings Dr., St. Louis, MO 63130, USA
D.W. Franklin, Kobe Advanced ICT Research Center, NiCT, and ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan
W.J. Freeman, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720-3206, USA
T. Gisiger, Récepteurs et Cognition, Institut Pasteur, 25 rue du Docteur Roux, 75015 Paris Cedex 15, France
S. Giszter, Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA
V. Goussev, Center for Interdisciplinary Research in Rehabilitation (CRIR), Rehabilitation Institute of Montreal, and Jewish Rehabilitation Hospital, Laval, 6300 Darlington, Montreal, QC H3S 2J4, Canada
R. Grashow, Volen Center MS 013, Brandeis University, 415 South St., Waltham, MA 02454-9110, USA
A.M. Green, Département de Physiologie, Université de Montréal, 2960 Chemin de la Tour, Rm 2140, Montréal, QC H3T 1J4, Canada
S. Grillner, Nobel Institute for Neurophysiology, Department of Neuroscience, Karolinska Institutet, Retzius väg 8, SE-171 77 Stockholm, Sweden
S. Grossberg, Department of Cognitive and Neural Systems, Center for Adaptive Systems, and Center for Excellence for Learning in Education, Science and Technology, Boston University, 677 Beacon Street, Boston, MA 02215, USA
J.A. Guest, Biology and Biochemistry, University of Houston, Houston, TX, USA
C. Hart, Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA
M. Hawken, Center for Neural Science, New York University, Washington Place, New York, NY 10003, USA
M.R. Hinder, Perception and Motor Systems Laboratory, School of Human Movement Studies, University of Queensland, Brisbane, Queensland 4072, Australia
G.E. Hinton, Department of Computer Science, University of Toronto, 10 Kings College Road, Toronto, M5S 3G4, Canada
A. Ijspeert, School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Station 14, CH-1015 Lausanne, Switzerland
J.F. Kalaska, GRSNC, Département de Physiologie, Faculté de Médecine, Pavillon Paul-G. Desmarais, Université de Montréal, C.P. 6128, Succursale Centre-ville, Montréal, QC H3C 3J7, Canada
M. Kerszberg, Université Pierre et Marie Curie, Modélisation Dynamique des Systèmes Intégrés, UMR CNRS 7138, Systématique, Adaptation, Évolution, Quai Saint Bernard, 75252 Paris Cedex 05, France
U. Knoblich, Center for Biological and Computational Learning, McGovern Institute for Brain Research, Computer Science and Artificial Intelligence Laboratory, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, 43 Vassar Street #46-5155B, Cambridge, MA 02139, USA
C. Koch, Division of Biology, California Institute of Technology, MC 216-76, Pasadena, CA 91125, USA
J.H. Kotaleski, Computational Biology and Neurocomputing, School of Computer Science and Communication, Royal Institute of Technology, SE 10044 Stockholm, Sweden
M. Kouh, Center for Biological and Computational Learning, McGovern Institute for Brain Research, Computer Science and Artificial Intelligence Laboratory, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, 43 Vassar Street #46-5155B, Cambridge, MA 02139, USA
A. Kozlov, Computational Biology and Neurocomputing, School of Computer Science and Communication, Royal Institute of Technology, SE 10044 Stockholm, Sweden
J.W. Krakauer, The Motor Performance Laboratory, Department of Neurology, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
G. Kreiman, Department of Ophthalmology and Neuroscience, Children's Hospital Boston, Harvard Medical School and Center for Brain Science, Harvard University
N.I. Krouchev, GRSNC, Département de Physiologie, Faculté de Médecine, Pavillon Paul-G. Desmarais, Université de Montréal, C.P. 6128, Succursale Centre-ville, Montréal, QC H3C 3J7, Canada
I. Kurtzer, Centre for Neuroscience Studies, Queen's University, Kingston, ON K7L 3N6, Canada
A. Lansner, Computational Biology and Neurocomputing, School of Computer Science and Communication, Royal Institute of Technology, SE 10044 Stockholm, Sweden
P.E. Latham, Gatsby Computational Neuroscience Unit, London WC1N 3AR, UK
M.F. Levin, Center for Interdisciplinary Research in Rehabilitation, Rehabilitation Institute of Montreal and Jewish Rehabilitation Hospital, Laval, QC, Canada
J. Lewi, Georgia Institute of Technology, Atlanta, GA, USA
Y. Liang, Biology and Biochemistry, University of Houston, Houston, TX, USA
W.J. Ma, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA
K.M. MacLeod, Department of Biology, University of Maryland, College Park, MD 20742, USA
L. Maler, Department of Cell and Molecular Medicine and Center for Neural Dynamics, University of Ottawa, 451 Smyth Rd, Ottawa, ON K1H 8M5, Canada
E. Marder, Volen Center MS 013, Brandeis University, 415 South St., Waltham, MA 02454-9110, USA
S.N. Markin, Department of Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA
N.Y. Masse, Department of Physiology, McGill University, 3655 Sir William Osler, Montreal, QC H3G 1Y6, Canada
D.A. McCrea, Spinal Cord Research Centre and Department of Physiology, University of Manitoba, 730 William Avenue, Winnipeg, MB R3E 3J7, Canada
A. Menciassi, CRIM Laboratory, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, 56025 Pontedera (Pisa), Italy
T. Mergner, Neurological University Clinic, Neurocenter, Breisacher Street 64, 79106 Freiburg, Germany
T.E. Milner, School of Kinesiology, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
P. Mohajerian, Computer Science and Neuroscience, University of Southern California, Los Angeles, CA 90089-2905, USA
L. Paninski, Department of Statistics and Center for Theoretical Neuroscience, Columbia University, New York, NY 10027, USA
V. Patil, Neurobiology and Anatomy, Drexel University College of Medicine, 2900 Queen Lane, Philadelphia, PA 19129, USA
J.F.R. Paton, Department of Physiology, School of Medical Sciences, University of Bristol, Bristol BS8 1TD, UK
J. Pillow, Gatsby Computational Neuroscience Unit, University College London, Alexandra House, 17 Queen Square, London WC1N 3AR, UK
T. Poggio, Center for Biological and Computational Learning, McGovern Institute for Brain Research, Computer Science and Artificial Intelligence Laboratory, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, 43 Vassar Street #46-5155B, Cambridge, MA 02139, USA
A. Pouget, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA
A. Prochazka, Centre for Neuroscience, 507 HMRC, University of Alberta, Edmonton, AB T6G 2S2, Canada
R. Rohrkemper, Physics Department, Institute of Neuroinformatics, Swiss Federal Institute of Technology, Zürich CH-8057, Switzerland
I.A. Rybak, Department of Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA 19129, USA
A. Sangole, Center for Interdisciplinary Research in Rehabilitation, Rehabilitation Institute of Montreal and Jewish Rehabilitation Hospital, Laval, QC, Canada
S. Schaal, Computer Science and Neuroscience, University of Southern California, Los Angeles, CA 90089-2905, USA
S.H. Scott, Centre for Neuroscience Studies, Department of Anatomy and Cell Biology, Queen's University, Botterell Hall, Kingston, ON K7L 3N6, Canada
T. Serre, Center for Biological and Computational Learning, McGovern Institute for Brain Research, Computer Science and Artificial Intelligence Laboratory, Brain and Cognitive Sciences Department, Massachusetts Institute of Technology, 43 Vassar Street #46-5155B, Cambridge, MA 02139, USA
R. Shadmehr, Laboratory for Computational Motor Control, Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
R. Shapley, Center for Neural Science, New York University, Washington Place, New York, NY 10003, USA
J.C. Smith, Cellular and Systems Neurobiology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892-4455, USA
C. Stefanini, CRIM Laboratory, Scuola Superiore Sant'Anna, Viale Rinaldo Piaggio 34, 56025 Pontedera (Pisa), Italy
J.A. Taylor, Department of Biomedical Engineering, Washington University, Brookings Dr., St. Louis, MO 63130, USA
K.A. Thoroughman, Department of Biomedical Engineering, Washington University, Brookings Dr., St. Louis, MO 63130, USA
L.H. Ting, The Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University, 313 Ferst Drive, Atlanta, GA 30332-0535, USA
A.-E. Tobin, Volen Center MS 013, Brandeis University, 415 South St., Waltham, MA 02454-9110, USA
E. Torres, Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA
J.Z. Tsien, Center for Systems Neurobiology, Departments of Pharmacology and Biomedical Engineering, Boston University, Boston, MA 02118, USA
D. Tweed, Departments of Physiology and Medicine, University of Toronto, King's College Circle, Toronto, ON M5S 1A8, Canada
D.B. Walther, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801, USA
A.C. Wilhelm, Department of Physiology, McGill University, 3655 Sir William Osler, Montreal, QC H3G 1Y6, Canada
D. Xing, Center for Neural Science, New York University, Washington Place, New York, NY 10003, USA
S. Yakovenko, Département de Physiologie, Université de Montréal, Pavillon Paul-G. Desmarais, C.P. 6128, Succ. Centre-ville, Montréal, QC H3C 3J7, Canada
D. Zipser, Department of Cognitive Science, UCSD 0515, 9500 Gilman Drive, San Diego, CA 92093, USA

Preface

In recent years, computational approaches have become an increasingly prominent and influential part of neuroscience research. From the cellular mechanisms of synaptic transmission and the generation of action potentials, to interactions among networks of neurons, to the high-level processes of perception and memory, computational models provide new sources of insight into the complex machinery which underlies our behaviour. These models are not merely mathematical surrogates for experimental data. More importantly, they help us to clarify our understanding of a particular nervous system process or function, and to guide the design of our experiments by obliging us to express our hypotheses in a language of mathematical formalisms. A mathematical model is an explicit hypothesis, in which we must incorporate all of our beliefs and assumptions in a rigorous and coherent conceptual framework that is subject to falsification and modification. Furthermore, a successful computational model is a rich source of predictions for future experiments. Even a simplified computational model can offer insights that unify phenomena across different levels of analysis, linking cells to networks and networks to behaviour. Over the last few decades, more and more experimental data have been interpreted from computational perspectives, new courses and graduate programs have been developed to teach computational neuroscience methods, and a multitude of interdisciplinary conferences and symposia have been organized to bring mathematical theorists and experimental neuroscientists together.

This book is the result of one such symposium, held at the Université de Montréal on May 8–9, 2006 (see: http://www.grsnc.umontreal.ca/XXVIIIs). It was organized by the Groupe de Recherche sur le Système Nerveux Central (GRSNC) as one of a series of annual international symposia held on a different topic each year. This was the first symposium in that annual series that focused on computational neuroscience, and it included presentations by some of the pioneers of computational neuroscience as well as prominent experimental neuroscientists whose research is increasingly integrated with computational modelling. The symposium was a resounding success, and it made clear to us that computational models have become a major and very exciting aspect of neuroscience research. Many of the participants at that meeting have contributed chapters to this book, including symposium speakers and poster presenters. In addition, we invited a number of other well-known computational neuroscientists, who could not participate in the symposium itself, to also submit chapters. Of course, a collection of 34 chapters cannot cover more than a fraction of the vast range of computational approaches which exist.
We have done our best to include work pertaining to a variety of neural systems, at many different levels of analysis, from the cellular to the behavioural, from approaches intimately tied with neural data to more abstract algorithms of machine learning. The result is a collection which includes models of signal transduction along dendrites, circuit models of visual processing, computational analyses of vestibular processing, theories of motor control and learning, machine algorithms for pattern recognition, as well as many other topics. We asked all of our contributors to address their chapters to a broad audience of neuroscientists, psychologists, and mathematicians, and to focus on the broad theoretical issues which tie these fields together.

The conference, and this book, would not have been possible without the generous support of the GRSNC, the Canadian Institute of Advanced Research (CIAR), the Institute of Neuroscience, Mental Health and Addiction (INMHA) of the Canadian Institutes of Health Research (CIHR), the Fonds de la Recherche en Santé Québec (FRSQ), and the Université de Montréal. We gratefully acknowledge these sponsors as well as our contributing authors who dedicated their time to present their perspectives on the computational principles which underlie our sensations, thoughts, and actions.

Paul Cisek
Trevor Drew
John F. Kalaska

P. Cisek, T. Drew & J.F. Kalaska (Eds.)
Progress in Brain Research, Vol. 165
ISSN 0079-6123
Copyright © 2007 Elsevier B.V. All rights reserved

CHAPTER 1

The neuronal transfer function: contributions from voltage- and time-dependent mechanisms

Erik P. Cook (1,*), Aude C. Wilhelm (1), Jennifer A. Guest (2), Yong Liang (2), Nicolas Y. Masse (1) and Costa M. Colbert (2)

(1) Department of Physiology, McGill University, 3655 Sir William Osler, Montreal, QC H3G 1Y6, Canada
(2) Biology and Biochemistry, University of Houston, Houston, TX, USA

Abstract: The discovery that an array of voltage- and time-dependent channels is present in both the dendrites and soma of neurons has led to a variety of models for single-neuron computation. Most of these models, however, are based on experimental techniques that use simplified inputs of either single synaptic events or brief current injections. In this study, we used a more complex time-varying input to mimic the continuous barrage of synaptic input that neurons are likely to receive in vivo. Using dual whole-cell recordings of CA1 pyramidal neurons, we injected long-duration white-noise current into the dendrites. The amplitude variance of this stimulus was adjusted to produce either low subthreshold or high suprathreshold fluctuations of the somatic membrane potential. Somatic action potentials were produced in the high variance input condition. Applying a rigorous system-identification approach, we discovered that the neuronal input/output function was extremely well described by a model containing a linear bandpass filter followed by a nonlinear static gain. Using computer models, we found that a range of voltage-dependent channel properties can readily account for the experimentally observed filtering in the neuronal input/output function. In addition, the bandpass signal processing of the neuronal input/output function was determined by the time-dependence of the channels. A simple active channel, however, could not account for the experimentally observed change in gain. These results suggest that nonlinear voltage- and time-dependent channels contribute to the linear filtering of the neuronal input/output function and that channel kinetics shape temporal signal processing in dendrites.

Keywords: dendrite; integration; hippocampus; CA1; channel; system-identification; white noise
*Corresponding author. Tel.: +1 514 398 7691; Fax: +1 514 398 8241; E-mail: erik.cook@mcgill.ca
DOI: 10.1016/S0079-6123(06)65001-2

The neuronal input/output function

What are the rules that single neurons use to process synaptic input? Put another way, what is the neuronal input/output function? Revealing the answer to this question is central to the larger task of understanding information processing in the brain. The past two decades of research have significantly increased our knowledge of how neurons integrate synaptic input, including the finding that dendrites contain nonlinear voltage- and time-dependent mechanisms (for review, see Johnston et al., 1996). However, there is still no consensus on the precise structure of the rules for synaptic integration.

Early theoretical models of neuronal computation described the neuronal input/output function as a static summation of the synaptic inputs (McCulloch and Pitts, 1943). Rall later proposed that cable theory could account for the passive electrotonic properties of dendritic processing (Rall, 1959). This passive theory of dendritic integration has been extremely useful because it encompasses both the spatial and temporal aspects of the neuronal input/output function using a single quantitative framework. For example, the passive model predicts that the temporal characteristics of dendrites are described by a lowpass filter with a cutoff frequency that is inversely related to the distance from the soma.
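As a rough illustration of this prediction (not part of the original chapter), the following Python sketch passes the same white-noise current through first-order lowpass filters with progressively lower cutoff frequencies, standing in for dendritic sites progressively farther from the soma. The time step, distances and cutoff values are invented placeholders, not measured quantities.

```python
import numpy as np

def lowpass_filter(signal, cutoff_hz, dt):
    """First-order (RC-style) lowpass filter, the temporal behaviour
    predicted by passive cable theory for a dendritic site."""
    alpha = dt / (dt + 1.0 / (2.0 * np.pi * cutoff_hz))
    out = np.zeros_like(signal)
    for t in range(1, len(signal)):
        out[t] = out[t - 1] + alpha * (signal[t] - out[t - 1])
    return out

dt = 1e-3                                   # 1 ms time step
rng = np.random.default_rng(0)
current = rng.normal(0.0, 1.0, size=5000)   # zero-mean white-noise current

# Hypothetical cutoff frequencies: farther sites -> lower cutoff.
for distance_um, cutoff in [(50, 40.0), (150, 15.0), (250, 5.0)]:
    v = lowpass_filter(current, cutoff, dt)
    print(f"{distance_um} um from soma: cutoff {cutoff} Hz, output SD {v.std():.3f}")
```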
The recent discovery that dendrites contain a rich collection of time- and voltage-dependent channels has renewed and intensified the study of dendritic signal processing at the electrophysiological level (for reviews, see Hausser et al., 2000; Magee, 2000; Segev and London, 2000; Reyes, 2001; London and Hausser, 2005). The central goal of this effort has been to understand how these active mechanisms augment the passive properties of dendrites. These studies, however, have produced somewhat conflicting results as to whether dendrites integrate synaptic inputs in a linear or nonlinear fashion (Urban and Barrionuevo, 1998; Cash and Yuste, 1999; Nettleton and Spain, 2000; Larkum et al., 2001; Wei et al., 2001; Tamas et al., 2002; Williams and Stuart, 2002). The focus of past electrophysiological studies has also been to identify the conditions in which dendrites initiate action potentials (Stuart et al., 1997; Golding and Spruston, 1998; Larkum and Zhu, 2002; Ariav et al., 2003; Gasparini et al., 2004; Womack and Khodakhah, 2004), to understand how dendrites spatially and temporally integrate inputs (Magee, 1999; Polsky et al., 2004; Williams, 2004; Gasparini and Magee, 2006; Nevian et al., 2007), and to reveal the extent of local dendritic computation (Mel, 1993; Hausser and Mel, 2003; Williams and Stuart, 2003).

Although these past studies have shed light on many aspects of single-neuron computation, most have focused on quiescent neurons in vitro. A common experimental technique is to observe how dendrites process brief "single-shock" inputs, either a single EPSP or the equivalent dendritic current injection, applied with no background activity present (but see Larkum et al., 2001; Oviedo and Reyes, 2002; Ulrich, 2002; Oviedo and Reyes, 2005; Gasparini and Magee, 2006). Based on the average spike rate of central neurons, it is unlikely that dendrites receive single synaptic inputs in isolation. A more likely scenario is that dendrites receive constant time-varying excitatory and inhibitory synaptic input that together produce random fluctuations in the membrane potential (Ferster and Jagadeesh, 1992; Destexhe and Pare, 1999; Chance et al., 2002; Destexhe et al., 2003; Williams, 2004). The challenge is to incorporate this type of temporally varying input into our study of the neuronal input/output function. Fortunately, system-identification theory provides us with several useful tools for addressing this question.

Using a white-noise input to reveal the neuronal input/output function

The field of system-identification theory has developed rigorous methods for describing the input/output relationships of unknown systems (for reviews, see Marmarelis and Marmarelis, 1978; Sakai, 1992; Westwick and Kearney, 2003) and has been used to describe the relationship between external sensory inputs and neuronal responses in a variety of brain areas (for reviews, see Chichilnisky, 2001; Wu et al., 2006). A prominent tool in system identification is the use of a "white-noise" stimulus to characterize the system. Such an input theoretically contains all temporal correlations and power at all frequencies. If the unknown system is linear, or only slightly nonlinear, it is a straightforward process to extract a description of the system by correlating the output with the random input stimulus. If the unknown system is highly nonlinear, however, this approach is much more difficult.
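The sketch below (Python/NumPy, not the analysis code used in the study) illustrates that extraction step: for a Gaussian white-noise input, the input–output cross-correlation of a linear system is proportional to its impulse response, so correlating the two recovers the filter. The toy filter shape, record length and noise level are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_lags = 50_000, 80            # number of samples and filter length (arbitrary)

# "Unknown" linear system: a damped-oscillation impulse response.
t = np.arange(n_lags)
true_h = np.exp(-t / 20.0) * np.sin(2 * np.pi * t / 25.0)

x = rng.normal(0.0, 1.0, n)                # white-noise input
y = np.convolve(x, true_h)[:n]             # output of the linear system
y += 0.05 * rng.normal(0.0, 1.0, n)        # measurement noise

# For white noise, E[x(t - k) * y(t)] is proportional to h(k), so the
# cross-correlation at positive lags recovers the filter.
est_h = np.array([np.dot(x[: n - k], y[k:]) / (n - k) for k in range(n_lags)])
est_h /= x.var()                           # normalize by input power

print("correlation between true and estimated filter:",
      np.corrcoef(true_h, est_h)[0, 1])
```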
One difficulty of describing the input/output function of a single neuron is that we lack precise statistical descriptions of the inputs neurons receive over time. Given that a typical pyramidal neuron has over ten thousand synaptic contacts, one might reasonably estimate that an input arrives on the dendrites every millisecond or less, producing membrane fluctuations that are constantly varying in time. Thus, using a white-noise input has two advantages: (1) it affords the use of quantitative methods for identifying the dendrite input/output function, and (2) it may represent a stimulus that is statistically closer to the type of input dendrites receive in vivo.

We applied a system-identification approach to reveal the input/output function of hippocampal CA1 pyramidal neurons in vitro (Fig. 1). We used standard techniques to perform dual whole-cell patch clamp recordings in brain slices (Colbert and Pan, 2002). More specifically, we injected 50 s of white-noise current (Id) into the dendrites with one electrode and measured the membrane potential at the soma (Vs) with a second electrode. The amplitude distribution of the injected current was Gaussian with zero mean. Electrode separation ranged from 125 to 210 μm, with the dendrite electrode placed on the main proximal apical dendritic branch. Figure 1 illustrates a short segment of the white-noise stimulus and the corresponding somatic membrane potentials.

To examine how the input/output function changed with different input conditions, we alternately changed the variance of the input current between low and high values. The low-variance input produced small subthreshold fluctuations in the somatic membrane potential. In contrast, the high-variance input produced large fluctuations that caused the neurons to fire action potentials at an average rate of 0.9 spikes/s. This rate of firing was chosen because it is similar to the average firing rate of CA1 hippocampal neurons in vivo (Markus et al., 1995; Yoganarasimha et al., 2006). Thus, we examined the dendrite-to-soma input/output function under physiologically reasonable subthreshold and suprathreshold operating regimes.

The LN model

Fig. 1. Using a system-identification approach to characterize the dendrite-to-soma input/output function. (A) Fifty seconds of zero-mean, Gaussian-distributed random current (Id) was injected into the proximal apical dendrites of CA1 pyramidal neurons and the membrane potential (Vs) was recorded at the soma. The variance of the injected current was switched between low (bottom traces) and high (top traces) on alternate trials. Action potentials were produced with the high-variance input. (B) An LN model was fit to the somatic potential. The input to the model was the injected current and the output of the model was the predicted soma potential (V̂s). The LN model was composed of a linear filter that was convolved with the input current, followed by a static-gain function. The output of the linear filter, F (arbitrary units), was scaled by the static-gain function to produce the predicted somatic potential. The static-gain function was modeled as a quadratic function of F.

Figure 1 illustrates our approach for describing the input/output function of the neuron using an LN model (Hunter and Korenberg, 1986). This is a functional model that provides an intuitive description of the system under study and has been particularly useful for capturing temporal processing in the retina in response to random visual inputs (for reviews, see Meister and Berry, 1999; Chichilnisky, 2001) and the processing of current injected at the soma of neurons (Bryant and Segundo, 1976; Poliakov et al., 1997; Binder et al., 1999; Slee et al., 2005). The LN model is a cascade of two processing stages. The first stage is a filter (the "L" stage) that linearly convolves the input current Id. The output of the linear filter, F, is the input to the nonlinear second stage (the "N" stage), which converts the output of the linear filter into the predicted somatic potential (V̂s). This second stage is static and can be viewed as capturing the gain of the system. The two stages of the LN model are represented mathematically as

    F = H * Id
    V̂s = G(F)                                    (1)

where H is a linear filter, * is the convolution operator, and G is a quadratic static-gain function. Having two stages of processing is an important aspect of the model because it allows us to separate temporal processing from gain control. The linear filter describes the temporal processing, while the nonlinear static gain captures amplitude-dependent changes in gain. Thus, this functional model permits us to describe the neuronal input/output function using quantitatively precise terms such as filtering and gain control. In contrast, highly detailed biophysical models of single neurons, with their large number of nonlinear free parameters, are less likely to provide such a functionally clear description of single-neuron computation.
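A minimal sketch of the computation in Eq. (1), and of one simple way to fit the quadratic static gain by least squares, is given below. It is not the Hunter–Korenberg fitting procedure used in the study, and the filter shape, gain coefficients and noise level are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n, n_lags = 1e-3, 50_000, 100

# Toy linear stage H: a bandpass-like impulse response (placeholder shape).
t = np.arange(n_lags) * dt
H = np.exp(-t / 0.02) - 0.5 * np.exp(-t / 0.08)

def ln_model(i_d, H, gain_coeffs):
    """L stage: convolve the input current with H.  N stage: static quadratic gain."""
    F = np.convolve(i_d, H)[: len(i_d)]
    a, b, c = gain_coeffs
    return F, a * F**2 + b * F + c

# Simulated experiment: the "true" neuron is an LN model plus recording noise.
i_d = rng.normal(0.0, 1.0, n)                       # white-noise current
F, v_true = ln_model(i_d, H, (0.05, 1.0, 0.0))
v_rec = v_true + 0.1 * rng.normal(0.0, 1.0, n)      # recorded somatic potential

# Fit the static gain G as a quadratic function of the filter output F.
A = np.column_stack([F**2, F, np.ones_like(F)])
coeffs, *_ = np.linalg.lstsq(A, v_rec, rcond=None)
v_hat = A @ coeffs

r2 = np.corrcoef(v_hat, v_rec)[0, 1] ** 2
print("fitted gain coefficients:", np.round(coeffs, 3))
print("fraction of variance explained (R^2):", round(r2, 3))
```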
It is important to note that we did not seek to describe the production of action potentials in the dendrite-to-soma input/output function. Action potentials are extremely nonlinear events and would not be captured by the LN model. We instead focused on explaining the subthreshold fluctuations of the somatic potential. Thus, action potentials were removed from the somatic potential before the data were analyzed. This was accomplished by linearly interpolating the somatic potential from shortly before the occurrence of the action potential to up to 10 ms after the action potential. Because action potentials make up a very small part of the 50 s of data (typically less than 2%), our results were not qualitatively affected when the spikes were left in place during the analysis.

The LN model accounts for the dendrite-to-soma input/output function

Using standard techniques, we fit the LN model to reproduce the recorded somatic potential in response to the injected dendritic current (Hunter and Korenberg, 1986). We wanted to know how the low and high variance input conditions affected the components of the LN model; therefore, these conditions were fit separately. An example of the LN model's ability to account for the neuronal input/output function is shown in Fig. 2. For this neuron, the LN model's predicted somatic membrane voltage (V̂s, dashed line) almost perfectly overlapped the neuron's actual somatic potential (Vs, thick gray line) for both input conditions (Fig. 2A and B). The LN model was able to fully describe the somatic potentials in response to the random input current with very little error. Computing the Pearson correlation coefficient over the entire 50 s of data, the LN model accounted for greater than 97% of the variance of this neuron's somatic potential. Repeating this experiment in 11 CA1 neurons, the LN model accounted for practically all of the somatic membrane potential (average R² > 0.97). Both the low and high variance input conditions were captured equally well by the LN model. Thus, the LN model is a functional model that describes the neuronal input/output function over a range of input regimes, from low-variance subthreshold to high-variance suprathreshold stimulation.

Gain but not filtering adapts to the input variance

The LN model's linear filters and nonlinear static-gain functions are shown for our example neuron in Fig. 2C and D. The impulse-response functions of the linear filters (Fig. 2C) for both the low (solid line) and high (dashed line) variance inputs had pronounced negativities corresponding to a bandpass in the 1–10 Hz frequency range (inset). Although the two input conditions were significantly different, the filters for the low- and high-variance inputs were very similar. Across our population of neurons, we found no systematic change in the linear filters as the input variance was varied between low and high levels. Therefore, the temporal processing performed by CA1 pyramidal neurons on inputs arriving at the proximal apical dendrites does not change with the input variance.

In contrast to the filtering properties of CA1 neurons, the static-gain function changed as a function of input variance. Figure 2D illustrates the static-gain function for both input conditions. In this plot, the resting membrane potential corresponds to 0 mV and the units for the output of the linear filter (F) are arbitrary. The static-gain function for the

Fig. The receptive fields of some feature detectors. Each gray square shows the incoming weights to one feature detector from all the pixels. Pure white means a positive weight of at least 3 and pure black means a negative weight of at least −3. Most of the feature detectors learn highly localized receptive fields.

The learned weights and biases of the features implicitly define a probability distribution over all possible binary images. Sampling from this distribution is difficult, but it can be done by using "alternating Gibbs sampling." This starts with a random image and then alternates between updating all of the features in parallel using Eq. (5) and updating all of the pixels in parallel using Eq. (6). After Gibbs sampling for sufficiently long, the network reaches "thermal equilibrium." The states of pixels and feature detectors still change, but the probability of finding the system in any particular binary configuration does not.
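Eqs. (5) and (6) are not reproduced in this excerpt; the sketch below assumes they are the usual logistic update rules of a restricted Boltzmann machine, in which each feature detector turns on with probability sigmoid(b_j + sum_i v_i w_ij) and each pixel with probability sigmoid(a_i + sum_j h_j w_ij). The weights here are random placeholders rather than learned ones, so the samples are meaningless; the point is only the alternating, fully parallel updates.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pixels, n_features = 784, 500            # a 28 x 28 image and one feature layer

# Placeholder parameters; in the chapter these would be the learned ones.
W = 0.01 * rng.normal(size=(n_pixels, n_features))
a = np.zeros(n_pixels)                     # pixel biases
b = np.zeros(n_features)                   # feature biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alternating_gibbs(n_steps=1000):
    """Run the Markov chain: features given pixels, then pixels given features."""
    v = (rng.random(n_pixels) < 0.5).astype(float)    # start from a random image
    for _ in range(n_steps):
        p_h = sigmoid(b + v @ W)                      # update all features in parallel
        h = (rng.random(n_features) < p_h).astype(float)
        p_v = sigmoid(a + W @ h)                      # update all pixels in parallel
        v = (rng.random(n_pixels) < p_v).astype(float)
    return v, h                                       # a sample near thermal equilibrium

sample_image, sample_features = alternating_gibbs()
print("pixels on:", int(sample_image.sum()), " features on:", int(sample_features.sum()))
```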
A greedy learning algorithm for multiple hidden layers

A single layer of binary features is not the best way to model the structure in a set of images. After learning the first layer of feature detectors, a second layer can be learned in just the same way by treating the existing feature detectors, when they are being driven by training images, as if they were data. To reduce noise in the learning signal, the binary states of feature detectors (or pixels) in the "data" layer are replaced by their real-valued probabilities of activation when learning the next layer of feature detectors, but the new feature detectors have binary states to limit the amount of information they can convey. This greedy, layer-by-layer learning can be repeated as many times as desired.

To justify this layer-by-layer approach, it would be good to show that adding an extra layer of feature detectors always increases the probability that the overall generative model would generate the training data. This is almost true: provided the number of feature detectors does not decrease and their weights are initialized correctly, adding an extra layer is guaranteed to raise a lower bound on the log probability of the training data (Hinton et al., 2006). So after learning several layers, there is good reason to believe that the feature detectors will have captured many of the statistical regularities in the set of training images, and we can now test the hypothesis that these feature detectors will be useful for classification.
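A compact sketch of this greedy, layer-by-layer procedure is shown below. It is not the code used for the experiments: the inner routine uses a crude one-step contrastive-divergence update (in the spirit of Hinton, 2002) purely to make the example self-contained, and the layer sizes, learning rate and toy data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, n_epochs=2, lr=0.05):
    """Very crude one-step contrastive-divergence (CD-1) training of a binary RBM."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    a, b = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(n_epochs):
        for v0 in data:
            p_h0 = sigmoid(b + v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            p_v1 = sigmoid(a + W @ h0)                 # one-step "reconstruction"
            p_h1 = sigmoid(b + p_v1 @ W)
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            a += lr * (v0 - p_v1)
            b += lr * (p_h0 - p_h1)
    return W, a, b

def greedy_pretrain(images, layer_sizes=(500, 500, 2000)):
    """Learn a stack of feature layers, one greedy layer at a time."""
    layers, data = [], images
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(data, n_hidden)
        layers.append((W, a, b))
        # The real-valued activation probabilities of the layer just learned
        # become the "data" for the next layer (less noisy than binary states).
        data = sigmoid(b + data @ W)
    return layers

toy_images = (rng.random((100, 784)) < 0.1).astype(float)   # stand-in for MNIST
stack = greedy_pretrain(toy_images)
print([w.shape for w, _, _ in stack])
```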
Using backpropagation for discriminative fine-tuning

After greedily learning layers of 500, 500 and 2000 feature detectors without using any information about the class labels, gentle backpropagation was used to fine-tune the weights for discrimination. This produced much better classification performance on test data than using backpropagation without the initial, unsupervised phase of learning. The MNIST dataset used for these experiments has been used as a benchmark for many years, and many different researchers have tried using many different learning methods, including variations of backpropagation in nets with different numbers of hidden layers and different numbers of hidden units per layer.

There are several different versions of the MNIST learning task. In the most difficult version, the learning algorithm is not given any prior knowledge of the geometry of images and it is forbidden to increase the size of the training set by using small affine or elastic distortions of the training images. Consequently, if the same random permutation is applied to the pixels of every training and test image, the performance of the learning algorithm will be unaffected. For this reason, this is called the "permutation-invariant" version of the task. So far as the learning algorithm is concerned, each 28 × 28 pixel image is just a vector of 784 numbers that has to be given one of 10 labels. The best published backpropagation error rate for this version of the task is 1.6% (Simard et al., 2003). Support vector machines can achieve 1.4% (Decoste and Schoelkopf, 2002). Table 1 shows that the error rate of backpropagation can be reduced to about 1.12% if it is only used for fine-tuning features that are originally discovered by layer-by-layer pretraining.

Details of the discriminative fine-tuning procedure

Using three different splits of the 60,000-image training set into 50,000 training examples and 10,000 validation examples, the greedy learning algorithm was used to initialize the weights and biases, and gentle backpropagation was then used to fine-tune the weights. After each sweep through the training set (which is called an "epoch"), the classification error rate was measured on the validation set. Training was continued until two conditions were satisfied. The first condition involved the average cross-entropy error on the validation set. This is the quantity that is being minimized by the learning algorithm, so it always falls on the training data. On the validation data, however, it starts rising as soon as overfitting occurs. There is a strong tendency for the number of classification errors to continue to fall after the cross-entropy has bottomed out on the validation data, so the first condition is that the learning must have already gone past the minimum of the cross-entropy on the validation set. It is easy to detect when this condition is satisfied because the cross-entropy changes very smoothly during the learning. The second condition involved the number of errors on the validation set. This quantity fluctuates unpredictably, so the criterion was that the minimum value observed so far should have occurred at least 10 epochs ago.
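The two stopping conditions can be expressed as a small test over the per-epoch validation histories, as in the sketch below; the toy histories are invented and the sketch omits the bookkeeping that saves and restores the network weights.

```python
def should_stop(val_cross_entropy, val_errors, patience=10):
    """The two stopping conditions described in the text:
    (1) the validation cross-entropy has already passed its minimum, and
    (2) the lowest validation error count occurred at least `patience` epochs ago."""
    past_ce_minimum = val_cross_entropy[-1] > min(val_cross_entropy)
    best_error_epoch = val_errors.index(min(val_errors))
    errors_stale = (len(val_errors) - 1 - best_error_epoch) >= patience
    return past_ce_minimum and errors_stale

# Toy histories: cross-entropy bottoms out at epoch 5, errors at epoch 8.
ce_history = [2.0, 1.2, 0.9, 0.7, 0.65, 0.63, 0.66, 0.70, 0.74, 0.78,
              0.83, 0.88, 0.93, 0.99, 1.05, 1.10, 1.16, 1.22, 1.28]
err_history = [300, 220, 180, 160, 150, 145, 140, 138, 135, 136,
               137, 139, 140, 141, 142, 143, 144, 145, 146]

for epoch in range(1, len(ce_history) + 1):
    if should_stop(ce_history[:epoch], err_history[:epoch]):
        best_epoch = err_history[:epoch].index(min(err_history[:epoch]))
        print(f"stop at epoch {epoch - 1}; restore weights from epoch {best_epoch}")
        break
```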
Once both conditions were satisfied, the weights and biases were restored to the values they had when the number of validation set errors was at its minimum, and performance on the 10,000 test cases was measured. As shown in Table 1, this gave test error rates of 1.22, 1.16 and 1.24% on the three different splits. The fourth line of the table shows that these error rates can be reduced to 1.10% by multiplying together the three probabilities that the three nets predict for each digit class and picking the class with the maximum product.

Once the performance on the validation set has been used to find a good set of weights, the cross-entropy error on the training set is recorded. Performance on the test data can then be further improved by adding the validation set to the training set and continuing the training until the cross-entropy error on the expanded training set has fallen to the value it had on the original training set for the weights selected by the validation procedure. As shown in Table 1, this eliminates about 8% of the errors. Combining the predictions of all three models produces less improvement than before because each model has now seen all of the training data. The final line of Table 1 shows that backpropagation in this relatively large network gives much worse results if no pretraining is used. For this last experiment, the stopping criterion was set to be the average of the stopping criteria from the previous experiments.

Table 1. Neta, Netb and Netc were greedily pretrained on different, unlabeled, subsets of the training data that were obtained by removing disjoint validation sets of 10,000 images.

                    Backprop       Train    Train cost  Train    Valid cost  Valid errors  Test cost  Test errors
  Network           training set   epochs   per 100     errors   per 100     (of 10,000)   per 100    (of 10,000)
  Pretrained:
    Neta            50,000         33       0.12        0        6.49        129           6.22       122
    Netb            50,000         56       0.04        0        7.81        118           6.21       116
    Netc            50,000         63       0.03        0        8.12        118           6.73       124
    Combined                                                                               5.75       110
    Neta            60,000         33+16    <0.12       0                                  5.81       113
    Netb            60,000         56+28    <0.04       0                                  5.90       106
    Netc            60,000         63+31    <0.03       0                                  5.93       118
    Combined                                                                               5.40       106
  Not pretrained    60,000         119      <0.063                                         18.43      227

Note: After pretraining, the nets were trained on those same subsets using backpropagation. Then the training was continued on the full training set until the cross-entropy error reached the criterion explained in the text.

To avoid making large changes to the weights found by the pretraining, the backpropagation stage of learning used a very small learning rate, which made it very slow, so a new trick was introduced which sped up the learning by about a factor of three. Most of the computational effort is expended computing the almost non-existent gradients for "easy" training cases that the network can already classify confidently and correctly. It is tempting to make a note of these easy cases and then just ignore them, checking every few epochs to see if the cross-entropy error on any of the ignored cases has become significant. This can be done without changing the expected value of the overall gradient by using a method called importance sampling. Instead of being completely ignored, easy cases are selected with a probability of 0.1, but when they are selected, the computed gradients are multiplied by 10. Using more extreme values like 0.01 and 100 is dangerous, because a case that used to be easy might have developed a large gradient while it was being ignored, and multiplying this gradient by 100 could give the network a shock. When using importance sampling, an "epoch" was redefined to be the time it takes to sample as many training cases as the total number in the training set. So an epoch typically involves several sweeps through the whole set of training examples, but it is the same amount of computation as one sweep without importance sampling.
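The sketch below illustrates the reweighting idea; it is a schematic rather than the procedure actually used, and the decision about which cases count as "easy" is assumed to have been made elsewhere.

```python
import numpy as np

rng = np.random.default_rng(5)

def importance_sampled_sweep(examples, is_easy, keep_prob=0.1):
    """Select 'easy' cases with probability keep_prob and reweight their
    gradients by 1/keep_prob, so the expected gradient is unchanged.
    Hard cases are always used with weight 1."""
    chosen, weights = [], []
    for idx in examples:
        if is_easy[idx]:
            if rng.random() < keep_prob:
                chosen.append(idx)
                weights.append(1.0 / keep_prob)   # i.e., multiply the gradient by 10
        else:
            chosen.append(idx)
            weights.append(1.0)
    return chosen, weights

# Toy demonstration: 1000 cases, 90% of them already classified confidently.
is_easy = rng.random(1000) < 0.9
chosen, weights = importance_sampled_sweep(range(1000), is_easy)
print(f"{len(chosen)} of 1000 cases actually processed this sweep")
```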
After the results in Table 1 were obtained using the rather complicated version of backpropagation described above, Ruslan Salakhutdinov discovered that similar results can be obtained using a standard method called "conjugate gradient," which takes the gradients delivered by backpropagation and uses them in a more intelligent way than simply changing each weight in proportion to its gradient (Hinton and Salakhutdinov, 2006). The MNIST data, together with the Matlab code required for pretraining and fine-tuning the network, are available at http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html.

Using extra unlabeled data

Since the greedy pretraining algorithm does not require any labeled data, it should be a very effective way to make use of unlabeled examples to improve performance on a small labeled dataset. Learning with only a few labeled examples is much more characteristic of human learning. We see many instances of many different types of object, but we are very rarely told the name of an object. Preliminary experiments confirm that pretraining on unlabeled data helps a lot, but for a proper comparison it will be necessary to use networks of the appropriate size. When the number of labeled examples is small, it is unfair to compare the performance of a large network that makes use of unlabeled examples with a network of the same size that does not make use of the unlabeled examples.

Using geometric prior knowledge

The greedy pretraining improves the error rate of backpropagation by about the same amount as methods that make use of prior knowledge about the geometry of images, such as weight-sharing (LeCun et al., 1998) or enlarging the training set by using small affine or elastic distortions of the training images. But pretraining can also be combined with these other methods. If translations of up to two pixels are used to create 12 extra versions of each training image, the error rate of the best support vector machine falls from 1.4% to 0.56% (Decoste and Schoelkopf, 2002). The average error rate of the pretrained neural net falls from 1.12% to 0.65%. The translated data is presumably less helpful to the multilayer neural net because the pretraining can already capture some of the geometrical structure even without the translations. The best published result for a single method is currently 0.4%, which was obtained using backpropagation in a multilayer neural net that uses both weight-sharing and sophisticated, elastic distortions (Simard et al., 2003). The idea of using unsupervised pretraining to improve the performance of backpropagation has recently been applied to networks that use weight-sharing, and it consistently reduces the error rate by about 0.1% even when the error rate is already very low (Ranzato et al., 2007).
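The excerpt does not say exactly which 12 offsets were used, so the sketch below simply enumerates one plausible set of shifts of up to two pixels and produces the translated copies; it is illustrative only, and the toy "digit" is made up.

```python
import numpy as np

def shifted_copies(image_28x28, offsets):
    """Return translated copies of a 28 x 28 image, shifting by (dy, dx)
    pixels and padding the exposed border with zeros (background)."""
    copies = []
    for dy, dx in offsets:
        shifted = np.zeros_like(image_28x28)
        src = image_28x28[max(0, -dy): 28 - max(0, dy),
                          max(0, -dx): 28 - max(0, dx)]
        shifted[max(0, dy): max(0, dy) + src.shape[0],
                max(0, dx): max(0, dx) + src.shape[1]] = src
        copies.append(shifted)
    return copies

# One plausible set of 12 offsets within two pixels (an assumption, not the
# exact set used in the cited experiments).
offsets = [(0, 1), (0, -1), (1, 0), (-1, 0),
           (0, 2), (0, -2), (2, 0), (-2, 0),
           (1, 1), (1, -1), (-1, 1), (-1, -1)]

image = np.zeros((28, 28)); image[10:18, 12:16] = 1.0   # toy "digit"
augmented = shifted_copies(image, offsets)
print(len(augmented), "extra versions of the training image")
```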
Using contrastive wake–sleep for generative fine-tuning

The figure below shows a multilayer generative model in which the top two layers interact via undirected connections and form an associative memory.

Fig. A multilayer neural network that learns to model the joint distribution of digit images and digit labels. The top two layers have symmetric connections and form an associative memory. The layers below have directed, top-down, generative connections that can be used to map a state of the associative memory to an image. There are also directed, bottom-up, recognition connections that are used to infer a factorial representation in one layer from the binary activities in the layer below. (Layer sizes, from top to bottom: 2000 top-level units; 10 label units, which could be the top level of another sensory pathway; 500 units; 500 units; a 28 × 28 pixel image.)

At the start of learning, all configurations of this top-level associative memory have roughly equal energy. Learning sculpts the energy landscape, and after learning, the associative memory will settle into low-energy states that represent images of digits. Valleys in the high-dimensional energy landscape represent digit classes. Directions along the valley floor represent the allowable variations of a digit, and directions up the side of a valley represent implausible variations that make the image surprising to the network. Turning on one of the 10 label units lowers one whole valley and raises the other valleys. The number of valleys and the dimensionality of each valley floor are determined by the set of training examples.

The states of the associative memory are just binary activity vectors that look nothing like the images they represent, but it is easy to see what the associative memory has in mind. First, the 500 hidden units that form part of the associative memory are used to stochastically activate some of the units in the layer below via the top-down, generative connections. Then these activated units are used to provide top-down input to the pixels. The next figure shows some fantasies produced by the trained network when the top-level associative memory is allowed to wander stochastically between low-energy states, but with one of the label units clamped so that it tends to stay in the same valley. The fact that it can generate a wide variety of slightly implausible versions of each type of digit makes it very good at recognizing poorly written digits. A demonstration that shows the network generating and recognizing digit images is available at http://www.cs.toronto.edu/~hinton/digits.html.

Fig. Each row shows 10 samples from the generative model with a particular label clamped on. The top-level associative memory is run for 1000 iterations of alternating Gibbs sampling between samples.

In this chapter, each training case consists of an image and an explicit class label, but the same learning algorithm can be used if the "labels" are replaced by a multilayer pathway whose inputs are spectrograms from multiple different speakers saying isolated digits (Kaganov et al., 2007). The network then learns to generate pairs that consist of an image and a spectrogram of the same digit class.

The network was trained in two stages: pretraining and fine-tuning. The layer-by-layer pretraining was the same as in the previous section, except that when training the top layer of 2000 feature detectors, each "data" vector had 510 components. The first 500 were the activation probabilities of the 500 feature detectors in the penultimate layer, and the last 10 were the label values. The value of the correct label was set to 1 and the remainder were set to 0. So the top layer of feature detectors learns to model the joint distribution of the 500 penultimate features and the 10 labels.
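Concretely, each training case for the top-level RBM can be assembled as in the short sketch below; the helper name and the made-up feature probabilities are illustrative, not taken from the original code.

```python
import numpy as np

def top_level_data_vector(penultimate_probs, label, n_labels=10):
    """Build the 510-component 'data' vector used to train the top-level RBM:
    500 activation probabilities of the penultimate feature detectors,
    followed by a one-of-10 label vector (correct label = 1, others = 0)."""
    label_vec = np.zeros(n_labels)
    label_vec[label] = 1.0
    return np.concatenate([penultimate_probs, label_vec])

# Toy example with made-up feature probabilities for a '3'.
rng = np.random.default_rng(6)
features = rng.random(500)
v = top_level_data_vector(features, label=3)
print(v.shape, v[500:])          # (510,) and the one-of-10 label block
```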
At the end of the layer-by-layer pretraining, the weight between any two units in adjacent layers is the same in both directions, and we can view the result of the pretraining as a set of three different RBMs whose only interaction is that the data for the higher RBMs are provided by the feature activations of the lower RBMs. It is possible, however, to take a very different view of exactly the same system (Hinton et al., 2006). We can view it as a single generative model that generates data by first letting the top-level RBM settle to thermal equilibrium, which may take a very long time, and then performing a single top-down pass to convert the 500 binary feature activations in the penultimate layer into an image. When it is viewed as a single generative model, the weights between the top two layers need to be symmetric, but the weights between lower layers do not. In the top-down, generative direction, these weights form part of the overall generative model, but in the bottom-up, recognition direction they are not part of the model. They are merely an efficient way of inferring what hidden states probably caused the observed image.

If the whole system is viewed as a single generative model, we can ask whether it is possible to fine-tune the weights produced by the pretraining to make the overall generative model more likely to generate the set of image-label pairs in the training data. The answer is that the generative model can be significantly improved by using a contrastive form of the wake–sleep algorithm. In the lower layers, this makes the recognition weights differ from the generative weights. In addition to improving the overall generative model, the generative fine-tuning makes the model much better at assigning labels to test images using a method which will be described later.

In the standard wake–sleep algorithm, the network generates fantasies by starting with a pattern of activation of the top-level units that is chosen stochastically, using only the generative bias of each top-level unit to influence its probability of being on. This way of initiating fantasies cannot be used if the top two layers of the generative model form an associative memory, because it will not produce samples from the generative model. The obvious alternative is to use prolonged Gibbs sampling in the top two layers to sample from the energy landscape defined by the associative memory, but this is much too slow. A very effective alternative is to use the bottom-up recognition connections to convert an image-label pair from the training set into a state of the associative memory, and then to perform brief alternating Gibbs sampling, which allows the associative memory to produce a "confabulation" that it prefers to its initial representation of the training pair. The top-level associative memory is then trained as an RBM by using Eq. (4) to lower the energy of the initial representation of the training pair and raise the energy of the confabulation. The confabulation in the associative memory is also used to drive the system top-down, and the states of all the hidden units that are produced by this generative, top-down pass are used as targets to train the bottom-up recognition connections. The "wake" phase is just the same as in the standard wake–sleep algorithm: after the initial bottom-up pass, the top-down, generative connections in the bottom two layers are trained, using Eq. (2), to reconstruct the activities in the layer below from the activities in the layer above. The details are given in Hinton et al. (2006).

Fine-tuning with the contrastive wake–sleep algorithm is about an order of magnitude slower than fine-tuning with backpropagation, partly because it has a more ambitious goal. The network shown in the figure takes a week to train on a GHz machine. The examples shown in the figure of generated samples were all classified correctly by this network, which gets a test error rate of 1.25%. This is slightly worse than pretrained networks with the same architecture that are fine-tuned with backpropagation, but it is better than the 1.4% achieved by the best support vector machine on the permutation-invariant version of the MNIST task. It is rare for a generative model to outperform a good discriminative model at discrimination.

There are several different ways of using the generative model for discrimination. If time were not an issue, it would be possible to use sampling methods to measure the relative probabilities of generating each of the ten image-label pairs that are obtained by pairing the test image with each of the 10 possible labels. A fast and accurate approximation can be obtained by first performing a bottom-up pass in which the activation probabilities of the first layer of hidden units are used to compute activation probabilities for the penultimate hidden layer. Using probabilities rather than stochastic binary states suppresses the noise due to sampling. Then the vector of activation probabilities of the feature detectors in the penultimate layer is paired with each of the 10 labels in turn, and the "free energy" of the associative memory is computed. Each of the top-level units contributes additively to this free energy, so it is easy to calculate exactly (Hinton et al., 2006). The label that gives the lowest free energy is the network's guess.
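The sketch below shows this free-energy comparison, assuming the standard closed form for the free energy of an RBM with binary hidden units, F(v) = -a·v - sum_j log(1 + exp(b_j + v·W_j)). The weights are random placeholders rather than the trained ones, so the printed label is arbitrary; only the procedure is meaningful.

```python
import numpy as np

rng = np.random.default_rng(7)
n_vis, n_top = 510, 2000                  # 500 features + 10 labels; 2000 top-level units

W = 0.01 * rng.normal(size=(n_vis, n_top))   # placeholder weights of the top-level RBM
a = np.zeros(n_vis)                          # visible biases
b = np.zeros(n_top)                          # top-level (hidden) biases

def free_energy(v):
    """Free energy of a binary RBM; each top-level unit contributes one
    additive softplus term, so it is exact and cheap to compute."""
    return -v @ a - np.sum(np.logaddexp(0.0, b + v @ W))

def classify(penultimate_probs, n_labels=10):
    energies = []
    for label in range(n_labels):
        label_vec = np.zeros(n_labels)
        label_vec[label] = 1.0
        v = np.concatenate([penultimate_probs, label_vec])
        energies.append(free_energy(v))
    return int(np.argmin(energies))          # the lowest free energy wins

features = rng.random(500)                   # stand-in for a deterministic bottom-up pass
print("guessed label:", classify(features))
```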
Fitting a generative model constrains the weights of the network far more strongly than fitting a discriminative model, but if the ultimate objective is discrimination, it also wastes a lot of the discriminative capacity. This waste shows up in the fact that after fine-tuning the generative model, its discriminative performance on the training data is about the same as its discriminative performance on the test data: there is almost no overfitting. This suggests one final experiment. After first using contrastive wake–sleep for fine-tuning, further fine-tuning can be performed using a weighted average of the gradients computed by backpropagation and by contrastive wake–sleep. Using a validation set, the coefficient controlling the contribution of the backpropagation gradient to the weighted average was gradually increased to find the coefficient value at which the error rate on the validation set was minimized. Using this value of the coefficient, the test error rate was 0.97%, which is the current record for the permutation-invariant MNIST task. It is also possible to combine the gradient from backpropagation with the gradient computed by the pretraining (Bengio et al., 2007). This is much less computational effort than using contrastive wake–sleep, but it does not perform as well.

Acknowledgments

I thank Yoshua Bengio, Yann LeCun, Peter Dayan, David MacKay, Sam Roweis, Terry Sejnowski, Max Welling and my past and present graduate students for their numerous contributions to these ideas. The research was supported by NSERC, CFI and OIT. GEH is a fellow of the Canadian Institute for Advanced Research and holds a Canada Research Chair in Machine Learning.

References

Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H. (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems, 19. MIT Press, Cambridge, MA, pp. 153–160.
Bryson, A. and Ho, Y. (1975) Applied Optimal Control. Wiley, New York.
Decoste, D. and Schoelkopf, B. (2002) Training invariant support vector machines. Machine Learn., 46: 161–190.
Hinton, G.E. (2002) Training products of experts by minimizing contrastive divergence. Neural Comput., 14: 1771–1800.
Hinton, G.E., Dayan, P., Frey, B.J. and Neal, R. (1995) The wake-sleep algorithm for self-organizing neural networks. Science, 268: 1158–1161.
Hinton, G.E., Osindero, S. and Teh, Y.W. (2006) A fast learning algorithm for deep belief nets. Neural Comput., 18: 1527–1554.
Hinton, G.E. and Salakhutdinov, R.R. (2006) Reducing the dimensionality of data with neural networks. Science, 313: 504–507.
Jabri, M. and Flower, B. (1992) Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. IEEE Trans. Neural Netw., 3(1): 154–157.
Kaganov, A., Osindero, S. and Hinton, G.E. (2007) Learning the relationship between spoken digits and digit images. Technical Report, Department of Computer Science, University of Toronto.
Karni, A., Tanne, D., Rubenstein, B., Askenasy, J. and Sagi, D. (1994) Dependence on REM sleep of overnight improvement of a perceptual skill. Science, 265(5172): 679.
LeCun, Y. (1985) Une procédure d'apprentissage pour réseau à seuil asymétrique (a learning scheme for asymmetric threshold networks). In: Proceedings of Cognitiva 85, Paris, France, pp. 599–604.
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient based learning applied to document recognition. Proc. IEEE, 86(11): 2278–2324.
LeCun, Y., Huang, F.-J. and Bottou, L. (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of CVPR'04. IEEE Press, New York.
Mazzoni, P., Andersen, R. and Jordan, M. (1991) A more biologically plausible learning rule for neural networks. Proc. Natl. Acad. Sci., 88(10): 4433–4437.
Merzenich, M., Kaas, J., Wall, J., Nelson, R., Sur, M. and Felleman, D. (1983) Topographic reorganization of somatosensory cortical areas 3b and 1 in adult monkeys following restricted deafferentation. Neuroscience, 8(1): 33–55.
Minsky, M. and Papert, S. (1969) Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA.
Parker, D. (1985) Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA.
Ranzato, M., Poultney, C., Chopra, S. and LeCun, Y. (2007) Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.
Rosenblatt, F. (1962) Principles of Neurodynamics. Spartan Books, New York.
Acknowledgments

I thank Yoshua Bengio, Yann LeCun, Peter Dayan, David MacKay, Sam Roweis, Terry Sejnowski, Max Welling and my past and present graduate students for their numerous contributions to these ideas. The research was supported by NSERC, CFI and OIT. GEH is a fellow of the Canadian Institute for Advanced Research and holds a Canada Research Chair in Machine Learning.

References

Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H. (2007) Greedy layer-wise training of deep networks. In: Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, MA, pp. 153–160.
Bryson, A. and Ho, Y. (1975) Applied Optimal Control. Wiley, New York.
Decoste, D. and Schoelkopf, B. (2002) Training invariant support vector machines. Machine Learn., 46: 161–190.
Hinton, G.E. (2002) Training products of experts by minimizing contrastive divergence. Neural Comput., 14: 1771–1800.
Hinton, G.E., Dayan, P., Frey, B.J. and Neal, R. (1995) The wake-sleep algorithm for self-organizing neural networks. Science, 268: 1158–1161.
Hinton, G.E., Osindero, S. and Teh, Y.W. (2006) A fast learning algorithm for deep belief nets. Neural Comput., 18: 1527–1554.
Hinton, G.E. and Salakhutdinov, R.R. (2006) Reducing the dimensionality of data with neural networks. Science, 313: 504–507.
Jabri, M. and Flower, B. (1992) Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. IEEE Trans. Neural Netw., 3(1): 154–157.
Kaganov, A., Osindero, S. and Hinton, G.E. (2007) Learning the relationship between spoken digits and digit images. Technical Report, Department of Computer Science, University of Toronto.
Karni, A., Tanne, D., Rubenstein, B., Askenasy, J. and Sagi, D. (1994) Dependence on REM sleep of overnight improvement of a perceptual skill. Science, 265(5172): 679.
LeCun, Y. (1985) Une procédure d'apprentissage pour réseau à seuil asymétrique (a learning scheme for asymmetric threshold networks). In: Proceedings of Cognitiva 85, Paris, France, pp. 599–604.
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-based learning applied to document recognition. Proc. IEEE, 86(11): 2278–2324.
LeCun, Y., Huang, F.-J. and Bottou, L. (2004) Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of CVPR'04. IEEE Press, New York.
Mazzoni, P., Andersen, R. and Jordan, M. (1991) A more biologically plausible learning rule for neural networks. Proc. Natl. Acad. Sci., 88(10): 4433–4437.
Merzenich, M., Kaas, J., Wall, J., Nelson, R., Sur, M. and Felleman, D. (1983) Topographic reorganization of somatosensory cortical areas 3b and 1 in adult monkeys following restricted deafferentation. Neuroscience, 8(1): 33–55.
Minsky, M. and Papert, S. (1969) Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA.
Parker, D. (1985) Learning logic. Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, MA.
Ranzato, M., Poultney, C., Chopra, S. and LeCun, Y. (2007) Efficient learning of sparse representations with an energy-based model. In: Advances in Neural Information Processing Systems 17. MIT Press, Cambridge, MA.
Rosenblatt, F. (1962) Principles of Neurodynamics. Spartan Books, New York.
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986) Learning representations by back-propagating errors. Nature, 323: 533–536.
Selfridge, O.G. (1958) Pandemonium: a paradigm for learning. In: Mechanisation of Thought Processes: Proceedings of a Symposium Held at the National Physical Laboratory. HMSO, London.
Seung, H. (2003) Learning in spiking neural networks by reinforcement of stochastic synaptic transmission. Neuron, 40(6): 1063–1073.
Sharma, J., Angelucci, A. and Sur, M. (2000) Induction of visual orientation modules in auditory cortex. Nature, 404(6780): 841–847.
Simard, P.Y., Steinkraus, D. and Platt, J. (2003) Best practices for convolutional neural networks applied to visual document analysis. In: International Conference on Document Analysis and Recognition (ICDAR). IEEE Computer Society, Los Alamitos, pp. 958–962.
Vapnik, V.N. (2000) The Nature of Statistical Learning Theory. Springer, New York.
Werbos, P. (1974) Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University.