IX
Digital Audio Communications
Nikil Jayant
Bell Laboratories, Lucent Technologies

39 Auditory Psychophysics for Coding Applications   Joseph L. Hall
Introduction • Definitions • Summary of Relevant Psychophysical Data • Conclusions

40 MPEG Digital Audio Coding Standards   Peter Noll
Introduction • Key Technologies in Audio Coding • MPEG-1/Audio Coding • MPEG-2/Audio Multichannel Coding • MPEG-4/Audio Coding • Applications • Conclusions

41 Digital Audio Coding: Dolby AC-3   Grant A. Davidson
Overview • Bit Stream Syntax • Analysis/Synthesis Filterbank • Spectral Envelope • Multichannel Coding • Parametric Bit Allocation • Quantization and Coding • Error Detection

42 The Perceptual Audio Coder (PAC)   Deepen Sinha, James D. Johnston, Sean Dorward, and Schuyler R. Quackenbush
Introduction • Applications and Test Results • Perceptual Coding • Multichannel PAC • Bitstream Formatter • Decoder Complexity • Conclusions

43 Sony Systems   Kenzo Akagiri, M. Katakura, H. Yamauchi, E. Saito, M. Kohut, Masayuki Nishiguchi, and K. Tsutsui
Introduction • Oversampling AD and DA Conversion Principle • The SDDS System for Digitizing Film Sound • Switched Predictive Coding of Audio Signals for the CD-I and CD-ROM XA Format • ATRAC (Adaptive Transform Acoustic Coding) and ATRAC2

As we enter the 21st century, digital audio communications will have become nearly as prevalent as digital speech communications. In particular, new technologies for audio storage and transmission will make available music and wideband signals in a flexible variety of standard formats.

The fundamental underpinning for these technologies is audio compression based on perceptually-tuned shaping of the quantization noise. The next chapter in this section describes psychoacoustics knowledge that suggests the general principles of perceptual audio coding. Succeeding chapters in this section are devoted to descriptions of established examples of perceptual audio coders. These include MPEG standards, and coders developed by Dolby, Sony, and Bell Laboratories.

© 1999 by CRC Press LLC

The dimensions of coder performance are quality, bit rate, delay, and complexity. The quality vs. bit rate tradeoffs are particularly important.

Audio Quality

The three parameters of digital audio quality are signal bandwidth, fidelity, and spatial realism.

Compact-disk (CD) signals have a bandwidth of 20–20,000 Hz, while traditional telephone speech has a bandwidth of 200–3400 Hz. Intermediate bandwidths characterize various grades of wideband speech and audio, including roughly defined ranges of quality referred to as AM radio and FM radio quality (bandwidths on the order of 7–10 and 12–15 kHz, respectively).

In the context of digital coding, fidelity refers to the level of perceptibility of quantization or reconstruction noise. The highest level of fidelity is one where the noise is imperceptible in formal listening tests. Lower levels of fidelity are acceptable in some applications if they are not annoying, although in general it is good practice to sacrifice some bandwidth in the interest of greater fidelity, for a given bit rate in coding. Five-point scales of signal fidelity are common in both speech and audio coding.

Spatial realism is generally provided by increasing the number of coded (and reproduced) spatial channels. Common formats are 1-channel (mono), 2-channel (stereo), 5-channel (3 front, 2 rear), 5.1-channel (5-channel plus subwoofer), and 8-channel (6 front, 2 rear). For given constraints on bandwidth and fidelity, the required bit rate in coding increases as a function of the number of channels; but the increase is slower than linear, because of the presence of interchannel redundancy. The notion of perceptual coding originally developed for exploiting the perceptual irrelevancies of a single-channel audio signal extends also to the methods used in exploiting interchannel redundancy.

Bit Rate

The CD-stereo signal has a digital representation rate of 1406 kilobits per second (kb/s). Current technology for perceptual audio coding reproduces CD-stereo with perfect fidelity at bit rates as low as 128 kb/s, depending on the input signal.
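
To make the scale of these rates concrete, the arithmetic can be sketched as follows. This is only an illustration under stated assumptions: the CD parameters used here (44.1 kHz sampling, 16 bits per sample, 2 channels) are the standard nominal values and are not given in the text, and the function names are hypothetical. The nominal product comes out a little above the figure quoted in the text; the text does not say how its figure was derived.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM bit rate in kb/s, taking 1 kb = 1000 bits."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(source_kbps, coded_kbps):
    """How many times smaller the coded rate is than the source rate."""
    return source_kbps / coded_kbps


# Assumed nominal CD parameters (not stated in the text):
# 44.1 kHz sampling, 16-bit samples, 2 channels.
cd_rate = pcm_bit_rate_kbps(44_100, 16, 2)

# The 128 kb/s coded rate is the figure quoted in the text.
ratio = compression_ratio(cd_rate, 128)

print(f"Nominal CD-stereo PCM rate: {cd_rate:.1f} kb/s")
print(f"Compression ratio at 128 kb/s: about {ratio:.1f} to 1")
```

Under these assumptions, coding CD-stereo at 128 kb/s corresponds to a reduction of roughly eleven to one relative to the uncompressed representation.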
CD-like reproduction is possible at bit rates as low as 64 kb/s for stereo. Single-channel reproduction of FM-radio-like music is possible at 32 kb/s. Single-channel reproduction of AM-radio-like music and wideband speech is possible at rates approaching 16 kb/s for all but the most demanding signals. Techniques for so-called “pseudo-stereo” can provide additional enhancement of digital single-channel audio.

Applications of Digital Audio

The capabilities of audio compression have combined with increasingly affordable implementations on platforms for digital signal processing (DSP), native signal processing (NSP) in a computer’s (native) processor, and application-specific integrated circuits (ASICs) to create revolutionary applications of digital audio. International and national standards have contributed immensely to this revolution. Some of these standards only specify the bit-stream syntax and decoder, leaving room for future, sometimes proprietary, enhancements of the encoding algorithm.

The domains of applications include transmission (for example, digital audio broadcasting), storage (for example, the minidisk and the digital versatile disk, DVD), and networking (music preview, distribution, and publishing). The networking applications will make digital audio communications as commonplace as digital telephony.

The Future of Digital Audio

Remarkable as the capabilities and applications mentioned above are, there are even greater challenges and opportunities for the practitioners of digital audio technology. It is unlikely that we have reached or even approached the fundamental limits of performance in terms of audio quality at a given bit rate. Newer capabilities in this technology (in terms of audio fidelity, bandwidth, and spatial realism) will continue to lead to newer classes of applications in audio communications.
New technologies for embedded coding and universal coding will create interesting new options for digital networking and seamless communication of speech and music signals. Finally, co-designs of audio processing with image and video processing will lead to currently unavailable capabilities for multimedia networking games, computer agents, and personal communication services. These scenarios will call upon our best capabilities in signal compression as well as advances in the sister disciplines of signal synthesis and recognition by machine.