Speech and audio signal processing

DOCUMENT INFORMATION

Basic information

Format
Pages	547
File size	16.6 MB

Content

SPEECH AND AUDIO SIGNAL PROCESSING
Processing and Perception of Speech and Music

BEN GOLD
Massachusetts Institute of Technology Lincoln Laboratory

NELSON MORGAN
University of California at Berkeley
International Computer Science Institute

with contributions from Herve Bourlard, Eric Fosler-Lussier, and Jeff Gilbert

Acquisitions Editor: Bill Zobrist
Marketing Manager: Katherine Hepburn
Senior Production Editor: Robin Factor
Senior Designer: Laura Boucher
Illustration Editor: Gene Aiello
Electronic Illustrations: Radiant

This book was set in 10/12 Times Roman by TechBooks, and printed and bound by Quebecor/Fairfield. The cover was printed by Lehigh Press. The book is printed on acid-free paper.

Copyright © 2000 by John Wiley & Sons, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMRFQ@WILEY.COM. To order books please call 1(800)-225-5945.

Library of Congress Cataloging in Publication Data:
Gold, Ben, 1923-
Speech and audio signal processing : processing and perception of speech and music / Ben Gold, Nelson Morgan ; with contributions from Herve Bourlard, Eric Fosler-Lussier, and Jeff Gilbert.
p. cm.
Includes index.
ISBN 0-471-35154-7 (alk. paper)
Speech processing systems. Signal processing-Digital techniques. Electronic music. Morgan, Nelson.
TK7882.S65G65 1999
621.382'2-dc21
99-16025 CIP

Printed in the United States of America

This book is dedicated to our families and our students

CONTENTS

6.9 Concluding Comments
6.10 Exercises

CHAPTER 7 DIGITAL FILTERS AND DISCRETE FOURIER TRANSFORM
7.1 Introduction
7.2 Filtering Concepts
7.3 Useful Filter Functions
7.4 Transformations for Digital Filter Design
7.5 Digital Filter Design with Bilinear Transformation
7.6 The Discrete Fourier Transform
7.7 Fast Fourier Transform Methods
7.8 Relation Between the DFT and Digital Filters
7.9 Exercises

CHAPTER 8 PATTERN CLASSIFICATION
8.1 Introduction
8.2 Feature Extraction
    8.2.1 Some Opinions
8.3 Pattern-Classification Methods
    8.3.1 Minimum Distance Classifiers
    8.3.2 Discriminant Functions
    8.3.3 Generalized Discriminators
8.4 Exercises
8.5 Appendix: Multilayer Perceptron Training
    8.5.1 Definitions
    8.5.2 Derivation

CHAPTER 9 STATISTICAL PATTERN CLASSIFICATION
9.1 Introduction
9.2 A Few Definitions
9.3 Class-Related Probability Functions
9.4 Minimum Error Classification
9.5 Likelihood-Based MAP Classification
9.6 Approximating a Bayes Classifier
9.7 Statistically Based Linear Discriminants
    9.7.1 Discussion
9.8 Iterative Training: The EM Algorithm
    9.8.1 Discussion
9.9 Exercises

CHAPTER 10 WAVE BASICS
10.1 Introduction
10.2 The Wave Equation for the Vibrating String
10.3 Discrete-Time Traveling Waves
10.4 Boundary Conditions and Discrete Traveling Waves
10.5 Standing Waves
10.6 Discrete-Time Models of Acoustic Tubes
10.7 Acoustic Tube Resonances
10.8 Relation of Acoustic Tube Resonances to Observed Formant Frequencies
10.9 Exercises

CHAPTER 11 ACOUSTIC TUBE MODELING OF SPEECH PRODUCTION
11.1 Introduction
11.2 Acoustic Tube Models of English Phonemes
11.3 Excitation Mechanisms in Speech Production
11.4 Exercises

CHAPTER 12 MUSIC PRODUCTION
12.1 Introduction
12.2 Sequence of Steps in a Plucked or Bowed String Instrument
12.3 Vibrations of the Bowed String
12.4 Frequency-Response Measurements of the Bridge of a Violin
12.5 Vibrations of the Body of String Instruments: Measurement Methods
12.6 Radiation Pattern of Bowed String Instruments
12.7 Some Considerations in Piano Design
12.8 Brief Discussion of the Trumpet, Trombone, French Horn, and Tuba
12.9 Exercises

CHAPTER 13 ROOM ACOUSTICS
13.1 Sound Waves
    13.1.1 One-Dimensional Wave Equation
    13.1.2 Spherical Wave Equation
    13.1.3 Intensity
    13.1.4 Decibel Sound Levels
    13.1.5 Typical Power Sources
13.2 Sound Waves in Rooms
    13.2.1 Acoustic Reverberation
    13.2.2 Early Reflections
13.3 Room Acoustics as a Component in Speech Systems
13.4 Exercises

CHAPTER 14 EAR PHYSIOLOGY
14.1 Introduction
14.2 Anatomical Pathways from the Ear to the Perception of Sound
14.3 The Peripheral Auditory System
14.4 Hair Cell and Auditory Nerve Functions
14.5 Properties of the Auditory Nerve
14.6 Summary and Block Diagram of the Peripheral Auditory System
14.7 Exercises

CHAPTER 15 PSYCHOACOUSTICS
15.1 Introduction
15.2 Sound-Pressure Level and Loudness
15.3 Frequency Analysis and Critical Bands
15.4 Masking
15.5 Summary
15.6 Exercises

CHAPTER 16 MODELS OF PITCH PERCEPTION
16.1 Introduction
16.2 Historical Review of Pitch-Perception Models
16.3 Physiological Exploration of Place Versus Periodicity
16.4 Results from Psychoacoustic Testing and Models
16.5 Summary
16.6 Exercises

CHAPTER 17 SPEECH PERCEPTION
17.1 Introduction
17.2 Vowel Perception: Psychoacoustics and Physiology
17.3 The Confusion Matrix
17.4 Perceptual Cues for Plosives
17.5 Physiological Studies of Two Voiced Plosives
17.6 Motor Theories of Speech Perception
17.7 Neural Firing Patterns for Connected Speech Stimuli
17.8 Concluding Thoughts
17.9 Exercises

CHAPTER 18 HUMAN SPEECH RECOGNITION
18.1 Introduction
18.2 The Articulation Index and Human Recognition
    18.2.1 The Big Idea
    18.2.2 The Experiments
    18.2.3 Discussion
18.3 Comparisons between Human and Machine Speech Recognizers
18.4 Concluding Thoughts
18.5 Exercises

CHAPTER 19 THE AUDITORY SYSTEM AS A FILTER BANK
19.1 Introduction
19.2 Review of Fletcher's Critical Band Experiments
19.3 Relation Between Threshold Measurements and Hypothesized Filter Shapes
19.4 Gamma-Tone Filters, Roex Filters, and Auditory Models
19.5 Other Considerations in Filter-Bank Design
19.6 Speech Spectrum Analysis Using the FFT
19.7 Conclusions
19.8 Exercises

CHAPTER 20 THE CEPSTRUM AS A SPECTRAL ANALYZER
20.1 Introduction
20.2 A Historical Note
20.3 The Real Cepstrum
20.4 The Complex Cepstrum
20.5 Application of Cepstral Analysis to Speech Signals
20.6 Concluding Thoughts
20.7 Exercises

... development of speech and music synthesizers, speech transmission systems, and automatic speech recognition (ASR) systems. Hand in hand with this progress has come an enhanced understanding of how... people produce and perceive speech and music. In fact, the processing of speech and music by devices and the perception of these sounds by humans are areas that inherently interact with and enhance...

Date posted: 08/03/2018, 15:30