Hall, J.L. “Auditory Psychophysics for Coding Applications”
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999
© 1999 by CRC Press LLC
39
Auditory Psychophysics for Coding Applications
Joseph L. Hall
Bell Laboratories
Lucent Technologies
39.1 Introduction
39.2 Definitions
Loudness • Pitch • Threshold of Hearing • Differential Threshold • Masked Threshold • Critical Bands and Peripheral Auditory Filters
39.3 Summary of Relevant Psychophysical Data
Loudness • Differential Thresholds • Masking
39.4 Conclusions
References
In this chapter we review properties of auditory perception that are relevant to the
design of coders for acoustic signals. The chapter begins with a general definition
of a perceptual coder, then considers what the “ideal” psychophysical model would
consist of and what use a coder could be expected to make of this model. We then
present some basic definitions and concepts. The chapter continues with a review of
relevant psychophysical data, including results on threshold, just-noticeable differences,
masking, and loudness. Finally, we attempt to summarize the present state of the art,
the capabilities and limitations of present-day perceptual coders for audio and speech,
and what areas most need work.
39.1 Introduction
A coded signal differs in some respect from the original signal. One task in designing a coder is to
minimize some measure of this difference under the constraints imposed by bit rate, complexity,
or cost. What is the appropriate measure of difference? The most straightforward approach is to
minimize some physical measure of the difference between original and coded signal. The designer
might attempt to minimize RMS difference between the original and coded waveform, or perhaps
the difference between original and coded power spectra on a frame-by-frame basis. However, if the
purpose of the coder is to encode acoustic signals that are eventually to be listened to by people,¹
these physical measures do not directly address the appropriate issue. For signals that are to be
listened to by people, the “best” coder is the one that sounds the best. There is a very clear distinction
between physical and perceptual measures of a signal (frequency vs. pitch, intensity vs. loudness,
for example). A perceptual coder can be defined as a coder that minimizes some measure of the
difference between original and coded signal so as to minimize the perceptual impact of the coding
noise. We can define the best coder given a particular set of constraints as the one in which the coding
noise is least objectionable.
¹ Perceptual coding is not limited to speech and audio. It can be applied also to image and video [16]. In this paper we
consider only coders for acoustic signals.
It follows that the designer of a perceptual coder needs some way to determine the perceptual
quality of a coded signal. “Perceptual quality” is a poorly defined concept, and it will be seen that in
some sense it cannot be uniquely defined. We can, however, attempt to provide a partial answer to
the question of how it can be determined. We can present something of what is known about human
auditory perception from psychophysical listening experiments and show how these phenomena
relate to the design of a coder.
One requirement for successful design of a perceptual coder is a satisfactory model for the
signal-dependent sensitivity of the auditory system. Present-day models are incomplete, but we can attempt
to specify what the properties of a complete model would be. One possible specification is that, for
any given waveform (the signal), it accurately predicts the loudness, as a function of pitch and of time,
of any added waveform (the noise). If we had such a complete model, then we would in principle
be able to build a transparent coder, defined as one in which the coded signal is indistinguishable
from the original signal, or at least we would be able to determine whether or not a given coder
was transparent. It is relatively simple to design a psychophysical listening experiment to determine
whether the coding noise is audible, or equivalently, whether the subject can distinguish between
original and coded signal. Any subject with normal hearing could be expected to give similar results
to this experiment. While present-day models are far from complete, we can at least describe the
properties of a complete model.
There is a second requirement that is more difficult to satisfy. This is the need to be able to determine
which of two coded samples, each of which has audible coding noise, is preferable. While a satisfactory
model for the signal-dependent sensitivity of the auditory system is in principle sufficient for the
design of a transparent coder, the question of how to build the best nontransparent coder does not
have a unique answer. Often, design constraints preclude building a transparent coder. Even the best
coder built under these constraints will result in audible coding noise, and it is under some conditions
impossible to specify uniquely how best to distribute this noise. One listener may prefer the more
intelligible version, while another may prefer the more natural sounding version. The preferences
of even a single listener might very well depend on the application. In the absence of any better
criterion, we can attempt to minimize the loudness of the coding noise, but it must be understood
that this is an incomplete solution.
Our purpose in this paper is to present something of what is known about human auditory
perception in a form that may be useful to the designer of a perceptual coder. We do not attempt
to answer the question of how this knowledge is to be utilized, how to build a coder. Present-day
perceptual coders for the most part utilize a feedforward paradigm: analysis of the signal to be coded
produces specifications for allowable coding noise. Perhaps a more general method is a feedback
paradigm, in which the perceptual model somehow makes possible a decision as to which of two
coded signals is “better”. This decision process can then be iterated to arrive at some optimum solution.
It will be seen that for proper exploitation of some aspects of auditory perception the feedforward
paradigm may be inadequate and the potentially more time-consuming feedback paradigm may be
required. How this is to be done is part of the challenge facing the designer.
39.2 Definitions
In this section we define some fundamental terms and concepts and clarify the distinction between
physical and perceptual measures.
39.2.1 Loudness
When we increase the intensity of a stimulus its loudness increases, but that does not mean that
intensity and loudness are the same thing. Intensity is a physical measure. We can measure the
intensity of a signal with an appropriate measuring instrument, and if the measuring instrument
is standardized and calibrated correctly anyone else anywhere in the world can measure the same
signal and get the same result. Loudness is perceptual magnitude. It can be defined as “that attribute
of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to
loud” ([23], p.47). We cannot measure it directly. All we can do is ask questions of a subject and
from the responses attempt to infer something about loudness. Furthermore, we have no guarantee
that a particular stimulus will be as loud for one subject as for another. The best we can do is assume
that, for a particular stimulus, loudness judgments for one group of normal-hearing people will be
similar to loudness judgments for another group.
There are two commonly used measures of loudness. One is loudness level (unit phon) and the
other is loudness (unit sone). These two measures differ in what they describe and how they are
obtained. The phon is defined as the intensity, in dB SPL, of an equally loud 1-kHz tone. The sone
is defined in terms of subjectively measured loudness ratios. A stimulus half as loud as a one-sone
stimulus has a loudness of 0.5 sones, a stimulus ten times as loud has a loudness of 10 sones, etc. A
1-kHz tone at 40 dB SPL is arbitrarily defined to have a loudness of one sone.
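The two units can be related numerically by a rule of thumb that is not stated in this chapter and is assumed here: above roughly 40 phons, loudness in sones approximately doubles for every 10-phon increase in loudness level. The following is a minimal sketch under that assumption; the function names are illustrative, and the approximation should not be trusted near threshold.

```python
import math

def phons_to_sones(phons):
    """Approximate loudness in sones from loudness level in phons.

    Assumes loudness doubles per 10 phons (a common approximation,
    reasonable only above roughly 40 phons). 40 phons maps to 1 sone,
    matching the definition of the sone given in the text.
    """
    return 2.0 ** ((phons - 40.0) / 10.0)

def sones_to_phons(sones):
    """Inverse of phons_to_sones under the same assumption."""
    return 40.0 + 10.0 * math.log2(sones)

if __name__ == "__main__":
    for level in (40, 50, 60, 70):
        print(level, "phons ->", phons_to_sones(level), "sones")
```

Under this assumption a 1-kHz tone at 40 dB SPL (40 phons) has a loudness of one sone, and a stimulus judged twice as loud sits about 10 phons higher.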
The argument can be made that loudness matching, the procedure used to obtain the phon scale,
is a less subjective procedure than loudness scaling, the procedure used to obtain the sone scale. This
argument would lead to the conclusion that the phon is the more objective of the two measures and
that the sone is more subject to individual variability. This argument breaks down on two counts:
first, for dissimilar stimuli even the supposedly straightforward loudness-matching task is subject to
large and poorly understood order and bias effects that can only be described as subjective. While
loudness matching of two equal-frequency tone bursts generally gives stable and repeatable results,
the task becomes more difficult when the frequencies of the two tone bursts differ. Loudness matching
between two dissimilar stimuli, as for example between a pure tone and a multicomponent complex
signal, is even more difficult and yields less stable results. Loudness-matching experiments have to be
designed carefully, and results from these experiments have to be interpreted with caution. Second,
it is possible to measure loudness in sones, at least approximately, by means of a loudness-matching
procedure. Fletcher [6] states that under some conditions loudness adds. Binaural presentation of a
stimulus results in loudness doubling; and two equally-loud stimuli, far enough apart in frequency
that they do not mask each other, are twice as loud as one. If loudness additivity holds, then it follows
that the sone scale can be generated by matching loudness of a test stimulus to binaural stimuli or
to pairs of tones. This approach must be treated with caution. As Fletcher states, “However, this
method [scaling] is related more directly to the scale we are seeking (the sone scale) than the two
preceding ones (binaural or monaural loudness additivity)” ([6], p. 278). The loudness additivity
approach relies on the assumption that loudness summation is perfect, and there is some more recent
evidence [28, 33] that loudness summation, at least for binaural vs. monaural presentation, is not
perfect.
39.2.2 Pitch
The American Standards Association defines pitch as “that attribute of auditory sensation in which
sounds may be ordered on a musical scale”. Pitch bears much the same relationship to frequency
as loudness does to intensity: frequency is an objective physical measure, while pitch is a subjective
perceptual measure. Just as there is not a one-to-one relationship between intensity and loudness, so
also there is not a one-to-one relationship between frequency and pitch. Under some conditions, for
example, loudness can be shown to decrease with decreasing frequency with intensity held constant,
and pitch can be shown to decrease with increasing intensity with frequency held constant ([40], p.
409).
39.2.3 Threshold of Hearing
Since the concept of threshold is basic to much of what follows, it is worthwhile at this point to
discuss it in some detail. It will be seen that thresholds are determined not only by the stimulus and
the observer but also by the method of measurement. While this discussion is phrased in terms of
threshold of hearing, much of what follows applies as well to differential thresholds (just-noticeable
differences) discussed in the next subsection.
By the simplest definition, the threshold of hearing (equivalently, auditory threshold) is the lowest
intensity that the listener can hear. This definition is inadequate because we cannot directly measure
the listener’s perception. A first-order correction, therefore, is that the threshold of hearing is the
lowest intensity that elicits from the listener the response that the sound is audible. Given this
definition, we can present a stimulus to the listener and ask whether he or she can hear it. If we
do this, we soon find that identical stimuli do not always elicit identical responses. In general, the
probability of a positive response increases with increasing stimulus intensity and can be described
by a psychometric function such as that shown for a hypothetical experiment in Fig. 39.1. Here the
stimulus intensity (in dB) appears on the abscissa and the probability P(C) of a positive response
appears on the ordinate. The yes-no experiment could be described by a psychometric function that
ranges from zero to one, and threshold could be defined as the stimulus intensity that elicits a positive
response in 50% of the trials.
FIGURE 39.1: Idealized psychometric functions for hypothetical yes-no experiment (zero to one)
and for hypothetical two-interval forced-choice experiment (0.5 to one).
A difficulty with the simple yes-no experiment is that we have no control over the subject’s criterion
level. The subject may be using a strict criterion (“yes” only if the signal is definitely present) or a lax
criterion (“yes” if the signal might be present). The subject can respond correctly either by a positive
response in the presence of a stimulus (hit) or by a negative response in the absence of a stimulus
(correct rejection). Similarly the subject can respond incorrectly either by a negative response in the
presence of a stimulus (miss) or by a positive response in the absence of a stimulus (false alarm).
Unless the experimenter is willing to use an elaborate and time-consuming procedure that involves
assigning rewards to correct responses and penalties to incorrect responses, the criterion level is
uncontrolled.
The field of psychophysics that deals with this complication is called detection theory. The field of
psychophysical detection theory is highly developed [12] and a complete description is far beyond
the scope of this paper. Very briefly, the subject’s response is considered to be based on an internal
decision variable, a random variable drawn from a distribution with mean and standard deviation
that depend on the stimulus. If we assume that the decision variable is normally distributed with a
fixed standard deviation $\sigma$ and a mean that depends only on stimulus intensity, then we can define
an index of sensitivity $d'$ for a given stimulus intensity as the difference between $m_0$ (the mean in
the absence of the stimulus) and $m_s$ (the mean in the presence of the stimulus), divided by $\sigma$;
that is, $d' = (m_s - m_0)/\sigma$. An
ideal observer (a hypothetical subject who does the best possible job for the task at hand) gives a
positive response if and only if the decision variable exceeds an internal criterion level. An increase
in criterion level decreases the probability of a false alarm and increases the probability of a miss.
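The effect of criterion level in the equal-variance Gaussian model described above can be illustrated numerically. The following is a minimal sketch, assuming a decision variable that is normal with standard deviation sigma and mean m0 (no stimulus) or ms (stimulus present); the function names and the example values are illustrative, not taken from the chapter.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def yes_no_rates(m0, ms, sigma, criterion):
    """Return (d_prime, p_hit, p_miss, p_false_alarm) for a yes-no task.

    The observer responds "yes" when the decision variable exceeds
    the criterion. Equal-variance Gaussian model assumed.
    """
    d_prime = (ms - m0) / sigma
    p_false_alarm = 1.0 - norm_cdf((criterion - m0) / sigma)
    p_hit = 1.0 - norm_cdf((criterion - ms) / sigma)
    p_miss = 1.0 - p_hit
    return d_prime, p_hit, p_miss, p_false_alarm

if __name__ == "__main__":
    # Same stimulus (same d'), three hypothetical criterion levels.
    for criterion in (0.0, 0.5, 1.0):
        d, hit, miss, fa = yes_no_rates(0.0, 1.0, 1.0, criterion)
        print(f"criterion={criterion:.1f} d'={d:.2f} "
              f"hit={hit:.3f} miss={miss:.3f} false alarm={fa:.3f}")
```

The sensitivity index stays fixed while the hit and false-alarm rates move together with the criterion, which is why a yes-no threshold depends on how strict the subject chooses to be.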
A simple and satisfactory way to deal with the problem of uncontrolled criterion level is to use a
criterion-free experimental paradigm. The simplest is perhaps the two-interval forced choice (2IFC)
paradigm, in which the stimulus is presented at random in one of two observation intervals. The
subject’s task is to determine which of the two intervals contained the stimulus. The ideal observer
selects the interval that elicits the larger decision variable, and criterion level is no longer a factor.
Now the subject has a 50% chance of choosing the correct interval even in the absence of any stimulus,
so the psychometric function goes from 0.5 to 1.0 as shown in Fig. 39.1. A reasonable definition of
threshold is $P(C) = 0.75$, halfway between the chance level of 0.5 and one. If the decision variable is
normally distributed with a fixed standard deviation, it can be shown that this definition of threshold
corresponds to a $d'$ of 0.95.
The number of intervals can be increased beyond two. In this case, the ideal observer responds
correctly if the decision variable for the interval containing the stimulus is larger than the largest of
the N-1 decision variables for the intervals not containing the stimulus. A common practice is, for
an N-interval forced choice paradigm (NIFC), to define threshold as the point halfway between the
chance level of 1/N and one. This is a perfectly acceptable practice so long as it is recognized that the
measured threshold is influenced by the number of alternatives. For a 3IFC paradigm this definition
of threshold corresponds to a $d'$ of 1.12 and for a 4IFC paradigm it corresponds to a $d'$ of 1.24.
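The threshold values of the sensitivity index quoted above can be checked numerically. The sketch below assumes the equal-variance Gaussian model with unit standard deviation: the ideal observer is correct when the decision variable for the signal interval, distributed as N(d', 1), exceeds all N-1 noise-interval variables, each distributed as N(0, 1). The integration limits, step count, and function names are choices made for this illustration.

```python
import math

def norm_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_correct_nifc(d_prime, n_intervals, steps=4000):
    """P(C) for an ideal observer in an N-interval forced-choice task.

    Trapezoidal integration of pdf(x - d') * cdf(x)**(N - 1) over x.
    """
    lo, hi = -8.0, d_prime + 8.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * norm_pdf(x - d_prime) * norm_cdf(x) ** (n_intervals - 1)
    return total * h

def threshold_d_prime(n_intervals):
    """d' at which P(C) is halfway between chance (1/N) and one."""
    target = 0.5 * (1.0 / n_intervals + 1.0)
    lo, hi = 0.0, 5.0
    for _ in range(40):  # bisection; P(C) increases monotonically with d'
        mid = 0.5 * (lo + hi)
        if p_correct_nifc(mid, n_intervals) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n}IFC threshold d' = {threshold_d_prime(n):.2f}")
```

The same stimulus therefore yields a different measured threshold depending on the number of alternatives, which is the caution raised in the text.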
39.2.4 Differential Threshold
The differential threshold is conceptually similar to the auditory threshold discussed above, and many
of the same comments apply. The differential threshold, or just-noticeable difference (JND), is the
amount by which some attribute of a signal has to change in order for the observer to be able to
detect the change. A tone burst, for example, can be specified in terms of frequency, intensity, and
duration, and a differential threshold for any of these three attributes can be defined and measured.
The first quantitative description of differential thresholds was provided by
the German physiologist E. H. Weber in the first half of the 19th century. According to Weber’s law,
the just-noticeable difference $\Delta I$ is proportional to the stimulus intensity $I$, or $\Delta I/I = K$, where the
constant of proportionality $\Delta I/I$ is known as the Weber fraction. This was supposed to be a general
description of sensitivity to changes of intensity for a variety of sensory modalities, not limited just
to hearing, and it has since been applied to perception of nonintensive variables such as frequency.
It was recognized at an early stage that this law breaks down at near-threshold intensities, and in the
latter half of the 19th century the German physicist G. T. Fechner suggested the modification that is
now known as the modified Weber law, $\Delta I/(I + I_0) = K$, where $I_0$ is a constant. While Weber’s law
provides a reasonable first-order description of intensity and frequency discrimination in hearing,
in general it does not hold exactly, as will be seen below.
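The difference between the two forms of the law is easiest to see numerically. The following is a minimal sketch; the values chosen for the Weber fraction K and the constant I0 are hypothetical, picked only to show the behavior, and the function names are illustrative.

```python
import math

def weber_jnd(intensity, k):
    """Just-noticeable intensity increment under Weber's law."""
    return k * intensity

def modified_weber_jnd(intensity, k, i0):
    """Just-noticeable increment under the modified Weber law."""
    return k * (intensity + i0)

def level_change_db(intensity, delta_i):
    """Level difference in dB between I and I + delta_i."""
    return 10.0 * math.log10((intensity + delta_i) / intensity)

if __name__ == "__main__":
    K, I0 = 0.1, 1.0  # hypothetical constants, arbitrary intensity units
    for intensity in (0.1, 1.0, 10.0, 1000.0):
        plain = level_change_db(intensity, weber_jnd(intensity, K))
        modified = level_change_db(intensity, modified_weber_jnd(intensity, K, I0))
        print(f"I={intensity:8.1f}  Weber: {plain:.3f} dB  modified: {modified:.3f} dB")
```

Under Weber's law the just-noticeable level change is the same number of decibels at every intensity; under the modified law it grows as the intensity approaches and falls below I0, and converges to the Weber value at high intensities.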
As with the threshold of hearing, the differential threshold can be measured in different ways, and
the result depends to some extent on how it is measured. The simplest method is a same-different
paradigm, in which two stimuli are presented and the subject’s task is to judge whether or not they
are the same. This method suffers from the same drawback as the yes-no paradigm for auditory
threshold: we do not have control over the subject’s criterion level.
If the physical attribute being measured is simply related to some perceptual attribute, then the
differential threshold can be measured by requiring the subject to judge which of two stimuli has
more of that perceptual attribute. A just-noticeable difference for frequency, for example, could be
measured by requiring the subject to judge which of two stimuli is of higher pitch; or a just-noticeable
difference for intensity could be measured by requiring the subject to judge which of two stimuli is
louder. As with the 2IFC paradigm discussed above for auditory threshold, this method removes the
problem of uncontrolled criterion level.
There are more general methods that do not assume a knowledge of the relationship between
the physical attribute being measured and a perceptual attribute. The most useful, perhaps, is the
N-interval forced choice method: N stimuli are presented, one of which differs from the other N-1
along the dimension being measured. The subject’s task is to specify which one of the N stimuli is
different from the other N-1.
Note that there is a close parallel between the differential threshold and the auditory threshold
described in the previous subsection. The auditory threshold can be regarded as a special case of the
just-noticeable difference for intensity, where the question is by how much the intensity has to differ
from zero in order to be detectable.
39.2.5 Masked Threshold
The masked threshold of a signal is defined as the threshold of that signal (the probe) in the presence
of another signal (the masker). A related term is masking, which is the elevation of threshold of the
probe by the masker: it is the difference between masked and absolute threshold. More generally,
the reduction of loudness of a supra-threshold signal is also referred to as masking. It will be seen
that masking can appear in many forms, depending on spectral and temporal relationships between
probe and masker.
Many of the comments that applied to measurement of absolute and differential thresholds also
apply to measurement of masked threshold. The simplest method is to present masker plus probe
and ask the subject whether or not the probe is present. Once again there is a problem with criterion
level. Another method is to present stimuli in two intervals and ask the subject which one contains
the probe. This method can give useful results but can, under some conditions, give misleading
results. Suppose, for example, that the probe and masker are both pure tones at 1 kHz, but that the
two signals are 180° out of phase. As the intensity of the probe is increased from zero, the intensity
of the composite signal will first decrease, then increase. The two signals, masker alone and masker
plus probe, may be easily distinguishable, but in the absence of additional information the subject
has no way of telling which is which.
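The antiphase example can be made concrete with a short calculation. This is a minimal sketch that assumes two equal-frequency sinusoids exactly 180° out of phase, so that their amplitudes subtract, and takes intensity as proportional to squared amplitude; the function name and the amplitude values are illustrative.

```python
def composite_intensity_antiphase(masker_amp, probe_amp):
    """Intensity (squared amplitude) of masker plus antiphase probe."""
    return (masker_amp - probe_amp) ** 2

if __name__ == "__main__":
    masker = 1.0
    for probe in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        total = composite_intensity_antiphase(masker, probe)
        print(f"probe amplitude {probe:.1f} -> composite intensity {total:.2f}")
```

As the probe amplitude rises from zero, the composite intensity falls to zero when the amplitudes match, returns to the masker-alone value when the probe amplitude is twice the masker amplitude, and only then exceeds it. A rule such as "choose the louder interval" therefore points to the wrong interval over part of this range.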
A more robust method for measuring masked threshold is the N-interval forced choice method
described above, in which the subject specifies which of the N stimuli differs from the other N-1.
Subjective percepts in masking experiments can be quite complex and can differ from one observer
to another. In the N-interval forced choice method the observer has the freedom to base judgments
on whatever attribute is most easily detected, and it is not necessary to instruct the observer what to
listen for.
Note that the differential threshold for intensity can be regarded as a special case of the masked
threshold in which the probe is an intensity-scaled version of the masker.
A note on terminology: suppose two signals, $x_1(t)$ and $[x_1(t) + x_2(t)]$, are just distinguishable.
If $x_2(t)$ is a scaled version of $x_1(t)$, then we are dealing with intensity discrimination. If $x_1(t)$ and
$x_2(t)$ are two different signals, then we are dealing with masking, with $x_1(t)$ the masker and $x_2(t)$
the probe. In either case, the difference can be described in several ways. These ways include (1) the
intensity increment between $x_1(t)$ and $[x_1(t) + x_2(t)]$, $\Delta I$; (2) the intensity increment relative to
$x_1(t)$, $\Delta I/I$; (3) the intensity ratio between $x_1(t)$ and $[x_1(t) + x_2(t)]$, $(I + \Delta I)/I$; (4) the intensity
increment in dB, $10 \times \log_{10}(\Delta I/I)$; and (5) the intensity ratio in dB, $10 \times \log_{10}[(I + \Delta I)/I]$. These
ways are equivalent in that they show the same information, although for a particular application
one way may be preferable to another for presentation purposes. Another measure that is often used,
particularly in the design of perceptual coders, is the intensity of the probe $x_2(t)$. This measure is
subject to misinterpretation and must be used with caution. Depending on the coherence between
$x_1(t)$ and $x_2(t)$, a given probe intensity can result in a wide range of intensity increments $\Delta I$. The
resulting ambiguity has been responsible for some confusion.
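The five descriptions, and the ambiguity of quoting probe intensity alone, can be sketched as follows. The sketch assumes that the intensity of the sum of two signals is I1 + I2 + 2*rho*sqrt(I1*I2), where rho is the normalized correlation between them (1 for in-phase coherent signals, 0 for uncorrelated signals); the function names, dictionary keys, and example intensities are illustrative.

```python
import math

def composite_intensity(i_masker, i_probe, rho):
    """Intensity of masker plus probe for normalized correlation rho."""
    return i_masker + i_probe + 2.0 * rho * math.sqrt(i_masker * i_probe)

def difference_measures(i, delta_i):
    """The five equivalent descriptions listed in the text (delta_i > 0)."""
    return {
        "increment": delta_i,                                   # (1)
        "relative_increment": delta_i / i,                      # (2)
        "ratio": (i + delta_i) / i,                             # (3)
        "increment_db": 10.0 * math.log10(delta_i / i),         # (4)
        "ratio_db": 10.0 * math.log10((i + delta_i) / i),       # (5)
    }

if __name__ == "__main__":
    i_masker, i_probe = 1.0, 0.01  # probe 20 dB below masker
    for label, rho in (("in phase", 1.0), ("uncorrelated", 0.0)):
        delta_i = composite_intensity(i_masker, i_probe, rho) - i_masker
        print(label, difference_measures(i_masker, delta_i))
```

With the probe intensity held 20 dB below the masker, the intensity increment in this example differs by a factor of about 21 between the in-phase and uncorrelated cases, which is the kind of ambiguity the text warns about.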
39.2.6 Critical Bands and Peripheral Auditory Filters
The concepts of critical bands and peripheral auditory filters are central to much of the auditory
modeling work that is used in present-day perceptual coders. Scharf, in a classic review article [33],
defines the empirical critical bandwidth as “that bandwidth at which subjective responses rather
abruptly change”. Simply put, for some psychophysical tasks the auditory system behaves as if it
consisted of a bank of bandpass filters (the critical bands) followed by energy detectors. Examples of
critical-band behavior that are particularly relevant for the designer of a coder include the relationship
between bandwidth and loudness (Fig. 39.5) and the relationship between bandwidth and masking
(Fig. 39.10). Another example of critical-band behavior is phase sensitivity: in experiments measuring
the detectability of amplitude and of frequency modulation, the auditory system appears to be
sensitive to the relative phase of the components of a complex sound only so long as the components
are within a critical band [9, 45].
The concept of the critical band was introduced more than a half-century ago by Fletcher [6], and
since that time it has been studied extensively. Fletcher’s pioneering contribution is ably documented
by Allen [1], and Scharf’s 1970 review article [33] gives references to some later work. More recently,
Moore and his co-workers have made extensive measurements of peripheral auditory filters [24].
The value of critical bandwidths has been the subject of some discussion, because of questions
of definition and method of measurement. Figure 39.2 ([31], Fig. 1) shows critical bandwidth as a
function of frequency for Scharf’s empirical definition (the bandwidth at which subjective responses
undergo some sort of change). Results from several experiments are superimposed here, and they
are in substantial agreement with each other. Moore and Glasberg [26] argue that the bandwidths
shown in Fig. 39.2 are determined not only by the bandwidth of peripheral auditory filters but also
by changes in processing efficiency. By their argument, the bandwidth of peripheral auditory filters
is somewhat smaller than the values shown in Fig. 39.2 at frequencies above 1 kHz and substantially
smaller, by as much as an octave, at lower frequencies.
39.3 Summary of Relevant Psychophysical Data
In Section 39.2, we introduced some basic concepts and definitions. In this section, we review some
relevant psychophysical results. There are several excellent books and book chapters that have been
written on this subject, and we have neither the space nor the inclination to duplicate material found
in these other sources. Our attempt here is to make the reader aware of some relevant results and to
refer him or her to sources where more extensive treatments may be found.
FIGURE 39.2: Empirical critical bandwidth. (Source: Scharf, B., Critical bands, ch. 5 in Foundations
of Modern Auditory Theory, Vol. 1, Tobias, J.V., ed., Academic Press, NY, 1970. With permission.)
39.3.1 Loudness
Loudness Level and Frequency
For pure tones, loudness depends on both intensity and frequency. Figure 39.3 (modified
from [37], p. 124) shows loudness level contours. The curves are labeled in phons and, in parentheses,
sones. These curves have been remeasured many times since, with some variation in the results, but
the basic conclusions remain unchanged. The most sensitive region is around 2-3 kHz. The
low-frequency slope of the loudness level contours is flatter at high loudness levels than at low. It follows
that loudness level grows more rapidly with intensity at low frequencies than at high. The 38- and
48-phon contours are (by definition) separated by 10 dB at 1 kHz, but they are only about 5 dB apart
at 100 Hz.
This figure also shows contours that specify the dynamic range of hearing. Tones below the 8-phon
contour are inaudible, and tones above the dotted line are uncomfortable. The dynamic range of
hearing, the distance between these two contours, is greatest around 2 to 3 kHz and decreases at
lower and higher frequencies. In practice, the useful dynamic range is substantially less. We know
today that extended exposure to sounds at much lower levels than the dotted line in Fig. 39.3 can
result in temporary or permanent damage to the ear. It has been suggested that extended exposure
to sounds as low as 70 to 75 dB(A) may produce permanent high-frequency threshold shifts in some
[...]... problems with signal definition rather than a difference in processing by the auditory system, and it may be that an effective way of dealing with it will result not from an improved understanding of auditory perception but rather from changes in the coder A feedforward prediction of acceptable coding noise based on the energy of the signal does not take into account phase relationships between signal and... made between the original signal and the proposed coded signal This approach would require a more complex encoder but leave the decoder complexity unchanged [27] The difference between narrow-band and wide-band coding noise, on the other hand, appears to call for a basic change in models of c 1999 by CRC Press LLC auditory perception For largely historical reasons, the idea of signal energy as a perceptual... Criteria for noise and vibration exposure, in Handbook of Acoustical Measurements and Noise Control, 3rd ed., ch 26, Harris, C.M., Ed., McGraw-Hill, New York, 1991 [40] Ward, W.D., Musical perception, in Foundations of Modern Auditory Theory, Vol 1, ch 11, Tobias, J.V., Ed., Academic Press, New York, 1970 [41] Watson, C.S and Gengel, R.W., Signal duration and signal frequency in relation to auditory sensitivity,... critical-band transformation of the signal and arrive at a prediction of acceptable coding noise These models are essentially refinements of models first described by Fletcher and his co-workers and further developed by Zwicker and others [34] These models do a good job describing masking and loudness for steady-state bands of noise, but they are less satisfactory for other signals We can identify two areas... 
this difference is not handled well by present-day perceptual models Presentday coders first compute a measure of tonality of the signal and then use this measure empirically to obtain an estimate of masking This empirical approach has been applied successfully to a variety of signals, but it is possible that an approach that is less empirical and more based on a comprehensive model of auditory perception... band [49] The loudness growth function is very steep near threshold, so that dividing the total energy of the signal into two or more critical bands results in a reduction of total loudness The loudness growth function well above threshold is less steep, so that dividing the total energy of the signal into two or more critical bands results in an increase of total loudness Loudness and Duration Everything... bands, and the results shown here are similar to Fletcher’s results ([7], Fig 124) The closed symbols show threshold level of probe signals at frequencies ranging from 500 Hz to 8 kHz in dB relative to the intensity of a masking band of noise centered at the frequency of the test signal and with the bandwidth shown on the abscissa The intensity of the masking noise is 60 dB SPL per 1/3 octave Note that for... increasing realization that under some conditions signal energy is not the relevant measure but that some envelope-based measure may be required A second area in which additional research may prove fruitful is in the area of temporal aspects of masking As is discussed in Section 39.3.3 (Masking: Temporal Aspects of Masking), the situation with time-varying signal and noise is more complex than the steady-state... 
tone, Perception and Psychophysics, 11: 241-246, 1972 [16] Jayant, N., Johnston, J., and Safranek, R., Signal compression based on models of human perception, Proc IEEE, 81: 1385-1422, 1993 [17] Jesteadt, W., Bacon, S.P., and Lehman, J.R., Forward masking as a function of frequency, masker level, and signal delay, J Acoust Soc Am., 71: 950-962, 1982 [18] Jesteadt, W., Wier, C.C., and Green, D.M., Intensity... Academic Press, New York, 1970 [32] Scharf, B., Loudness, in Handbook of Perception, Vol IV, Hearing, ch 6, Carterette, E.C and Friedman, M.P., Eds., Academic Press, New York, 1978 [33] Scharf, B and Fishkin, D., Binaural summation of loudness: reconsidered, J Exp Psychol., 86: 374-379, 1970 [34] Schroeder, M.R., Atal, B.S., and Hall, J.L., Optimizing digital speech coders by exploiting masking properties ...