Multimedia Engineering Lecture: Basics of Digital Audio
Lecturer: Dr Đỗ Văn Tuấn
Department of Electronics and Telecommunications
Email: tuandv@epu.edu.vn

Lecture contents:
- Digitization of Sound
- Quantization and Transmission of Audio

What is Sound?

Sound is a wave phenomenon like light: it involves molecules of air being compressed and expanded under the action of some physical device. For example, a speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound. Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones. Even though such pressure waves are longitudinal, they still have ordinary wave properties and behaviors, such as reflection (bouncing), refraction (change of angle when entering a medium with a different density), and diffraction (bending around an obstacle). If we wish to use a digital version of sound waves, we must form digitized representations of audio information.

Digitization

Digitization means conversion to a stream of numbers, preferably integers for efficiency. The figure below shows the one-dimensional nature of sound: amplitude values depend on a single variable, time. (Note that images depend instead on a 2D set of variables, x and y.)

Figure: An analog signal: continuous measurement of a pressure wave.

Sampling and Quantization

The graph in the figure above has to be made digital in both time and amplitude. To digitize, the signal must be sampled in each dimension: in time, and in amplitude.

Sampling means measuring the quantity we are interested in, usually at evenly spaced intervals. The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling; the rate at which it is performed is called the sampling frequency. For audio, typical sampling rates range from 8 kHz (8,000 samples per second) to 48 kHz; this range is determined by the Nyquist theorem, discussed below. Sampling in the amplitude or voltage dimension is called quantization.

Thus, to decide how to digitize audio data, we need to answer the following questions:
- What is the sampling rate?
- How finely is the data to be quantized, and is the quantization uniform?
- How is the audio data formatted (what is the file format)?

Nyquist Theorem

Signals can be decomposed into a sum of sinusoids.

Figure: Building up a complex signal by superposing sinusoids.

Frequency is an absolute measure; pitch is generally relative, a perceptual and subjective quality of sound. Pitch and frequency are linked by setting the note A4 to exactly 440 Hz. An octave above that note takes us to another A note, and an octave corresponds to doubling the frequency. Thus with the middle "A" on a piano ("A4" or "A440") set to 440 Hz, the next "A" up is at 880 Hz, one octave above.

Harmonics: any series of musical tones whose frequencies are integral multiples of the frequency of a fundamental tone. If we allow non-integer multiples of the base frequency, we allow non-"A" notes and obtain a more complex resulting sound.

The Nyquist theorem states how frequently we must sample in time to be able to recover the original sound. For correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal; this rate is called the Nyquist rate.

Nyquist Theorem: if a signal is band-limited, i.e., there is a lower limit f1 and an upper limit f2 on its frequency components, then the sampling rate should be at least 2(f2 − f1).

The Nyquist frequency is half of the Nyquist rate. Since it would be impossible to recover frequencies higher than the Nyquist frequency in any event, most systems include an anti-aliasing filter that restricts the frequency content of the sampler's input to a range at or below the Nyquist frequency.

Signal-to-Noise Ratio

The ratio of the power of the correct signal to the power of the noise is called the signal-to-noise ratio (SNR): a measure of signal quality.
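Before moving on, the aliasing behavior behind the Nyquist theorem is easy to verify numerically: a tone above the Nyquist frequency produces exactly the same samples as a lower-frequency alias. A minimal sketch (the 5 kHz tone and 8 kHz sampling rate are illustrative choices, not from the lecture):

```python
import math

def sample_tone(freq_hz, fs_hz, n_samples):
    """Sample a unit-amplitude sine tone at sampling rate fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 8000                            # sampling rate: 8 kHz, Nyquist frequency 4 kHz
above = sample_tone(5000, fs, 16)    # 5 kHz tone, above the Nyquist frequency
alias = sample_tone(-3000, fs, 16)   # its alias: 5000 - 8000 = -3000 Hz

# The two sequences of samples are indistinguishable.
assert all(abs(a - b) < 1e-9 for a, b in zip(above, alias))
```

Because the sample sequences are identical, no reconstruction method can tell the 5 kHz tone from its alias, which is why the anti-aliasing filter must remove such content before the sampler sees it.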
The SNR is usually measured in decibels (dB), where one dB is a tenth of a bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages:

  SNR = 10 log10 (V_signal² / V_noise²) = 20 log10 (V_signal / V_noise)

The power in a signal is proportional to the square of the voltage. For example, if the signal voltage V_signal is 10 times the noise voltage, then the SNR is 20 log10(10) = 20 dB. In terms of power, if the power from ten violins is ten times that from one violin playing, then the ratio of powers is 10 dB, or 1 B.

Signal-to-Quantization-Noise Ratio

Aside from any noise that may have been present in the original analog signal, there is an additional error that results from quantization. If voltages are actually in the range 0 to 1 but we have only 8 bits in which to store values, then effectively we force all continuous values of voltage into only 256 different values. This introduces a round-off error. It is not really "noise", but nevertheless it is called quantization noise (or quantization error). The quality of the quantization is characterized by the signal-to-quantization-noise ratio (SQNR).

Quantization noise: the difference between the actual value of the analog signal, at the particular sampling time, and the nearest quantization interval value. At most, this error can be as much as half of the interval. For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:

  SQNR = 20 log10 (V_signal / V_quan_noise) = 20 log10 (2^(N−1) / (1/2)) = 20 N log10 2 ≈ 6.02 N (dB)

Quantization and Transmission of Audio

Coding of audio: quantization and transformation of data are collectively known as coding of the data. For audio, the μ-law technique for companding (compressing and expanding) audio signals is usually combined with an algorithm that exploits the temporal redundancy present in audio signals. Differences in signals between the present and a past time can reduce the size of signal values and also concentrate the histogram of sample values (differences, now) into a much smaller range.
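The range-shrinking effect of taking differences can be seen with a small sketch; the slowly varying integer signal below is an assumed example, not lecture data:

```python
import math

# A slowly varying integer "signal": samples of a low-frequency sine.
samples = [round(100 * math.sin(2 * math.pi * n / 200)) for n in range(200)]
diffs = [samples[n] - samples[n - 1] for n in range(1, len(samples))]

span = max(samples) - min(samples)    # range occupied by the raw sample values
diff_span = max(diffs) - min(diffs)   # range occupied by the differences

# Raw samples sweep the full dynamic range; consecutive differences
# cluster tightly near zero.
assert span == 200
assert diff_span < span // 10
```

The raw samples span the full dynamic range while the differences stay near zero, which is what later makes short codewords for common values effective.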
The result of reducing the variance of values is that lossless compression methods can produce a bit-stream with shorter bit lengths for the more likely values.

In general, producing quantized sampled output for audio is called PCM (pulse code modulation). The differences version is called DPCM (and a crude but efficient variant is called DM). The adaptive version is called ADPCM.

Pulse Code Modulation

The basic techniques for creating digital signals from analog signals are sampling and quantization. Quantization consists of selecting breakpoints in magnitude and then re-mapping any value within an interval to one of the representative output levels. The set of interval boundaries is called the decision boundaries, and the representative values are called the reconstruction levels. The boundaries for quantizer input intervals that will all be mapped into the same output level form a coder mapping; the representative values that are the output values from a quantizer form a decoder mapping. Finally, we may wish to compress the data by assigning a bit-stream that uses fewer bits for the most prevalent signal values.

Every compression scheme has three stages:
- Transformation: the input data is transformed to a new representation that is easier or more efficient to compress.
- Quantization: we may introduce loss of information; quantization is the main lossy step. We use a limited number of reconstruction levels, fewer than in the original signal.
- Coding: assign a codeword (thus forming a binary bit-stream) to each output level or symbol. This could be a fixed-length code, or a variable-length code such as Huffman coding.

For audio signals, we first consider PCM for digitization. This leads to Lossless Predictive Coding as well as the DPCM scheme; both methods use differential coding. We also look at the adaptive version, ADPCM, which can provide better compression.

PCM in Speech Compression

Assuming a bandwidth for speech from about 50 Hz to about 10 kHz, the Nyquist theorem would dictate a sampling rate of 20 kHz.
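The decision-boundary and reconstruction-level machinery can be sketched for a uniform N-bit quantizer, and the measured SQNR lands near the rule-of-thumb figures given earlier. A minimal illustration; the full-scale sine test signal is an assumed input, not from the lecture:

```python
import math

def quantize_uniform(x, n_bits, lo=-1.0, hi=1.0):
    """Map x in [lo, hi] to the reconstruction level (interval midpoint)
    of the uniform interval it falls into."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels                 # decision boundaries are step apart
    idx = min(int((x - lo) / step), levels - 1)
    return lo + (idx + 0.5) * step            # reconstruction level: midpoint

# Quantize a full-scale sine and measure the SQNR.
n_bits = 8
samples = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
noise = [s - quantize_uniform(s, n_bits) for s in samples]
p_signal = sum(s * s for s in samples) / len(samples)
p_noise = sum(e * e for e in noise) / len(noise)
sqnr_db = 10 * math.log10(p_signal / p_noise)

# For a full-scale sinusoid the theoretical figure is 6.02*N + 1.76 dB.
print(f"measured SQNR at {n_bits} bits: {sqnr_db:.1f} dB")
```

Each extra bit of quantization accuracy buys roughly 6 dB of SQNR, matching the 6.02 N relationship derived above.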
Using uniform quantization without companding, the minimum sample size we could get away with would likely be about 12 bits; hence, for mono speech transmission, the bit-rate would be 240 kbps (20 kHz × 12 bits). With companding, we can reduce the sample size to about 8 bits with the same perceived level of quality, and thus reduce the bit-rate to 160 kbps.

However, the standard approach to telephony in fact assumes that the highest-frequency audio signal we want to reproduce is only about 4 kHz. Therefore the sampling rate is only 8 kHz, and companding reduces the bit-rate to 64 kbps.

There are two small wrinkles we must also address:
- Since only sounds up to 4 kHz are to be considered, all other frequency content must be noise, so we should remove this high-frequency content from the analog input signal. This is done using a band-limiting filter that blocks out high, as well as very low, frequencies.
- A discontinuous signal contains not just frequency components due to the original signal but also a theoretically infinite set of higher-frequency components (a result from the theory of Fourier analysis in signal processing). These higher frequencies are extraneous, so the output of the digital-to-analog converter goes to a low-pass filter that allows only frequencies up to the original maximum to be retained.

The complete scheme for encoding and decoding telephony signals is shown as a schematic in the figure below. As a result of the low-pass filtering, the output becomes smoothed.

Figure: PCM signal encoding and decoding.

Differential Coding in Audio

Audio is often stored not in simple PCM but in a form that exploits differences, which are generally smaller numbers and so offer the possibility of using fewer bits to store. If a time-dependent signal has some consistency over time ("temporal redundancy"), the difference signal, subtracting the current sample from the previous one, will have a more peaked histogram, with a maximum around zero.
For example, as an extreme case, the histogram for a linear ramp signal that has constant slope is flat, whereas the histogram for the derivative of the signal (i.e., the differences from sampling point to sampling point) consists of a spike at the slope value. So if we then go on to assign bit-string codewords to differences, we can assign short codes to prevalent values and long codewords to rarely occurring ones.

Lossless Predictive Coding

Predictive coding simply means transmitting differences: we predict the next sample as being equal to the current sample, and send not the sample itself but the difference between the previous and the next. Predictive coding thus consists of finding differences and transmitting them using a PCM system. Note that differences of integers will be integers.

Denote the integer input signal as the set of values f_n. Then we predict values f̂_n as simply the previous value, and define the error e_n as the difference between the actual and the predicted signal:

  f̂_n = f_(n−1)
  e_n = f_n − f̂_n

But it is often the case that some function of a few of the previous values f_(n−1), f_(n−2), f_(n−3), ... provides a better prediction. Typically, a linear predictor function is used:

  f̂_n = Σ_(k=1..K) a_(n−k) f_(n−k),  with K small (typically 2 to 4)

The idea of forming differences is to make the histogram of sample values more peaked. For example, the first figure below plots one second of sampled speech at 8 kHz, with a magnitude resolution of 8 bits per sample; a histogram of these values is centered around zero. The last figure shows the histogram for the corresponding speech-signal differences: difference values are much more clustered around zero than are the sample values themselves. As a result, a method that assigns short codewords to frequently occurring symbols will assign a short code to zero and do rather well: such a coding scheme will code sample differences much more efficiently than the samples themselves.

Figure: Differencing concentrates the histogram.
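The previous-sample predictor and its error signal can be sketched end-to-end; because only integer differences are transmitted, the decoder reconstructs the input exactly. A minimal sketch, using the sequence 21, 22, 27, 25, 22 from the lecture's worked example:

```python
def encode(samples):
    """Lossless predictive coding: predict each sample as the previous one
    and emit the prediction errors e_n = f_n - f_{n-1}."""
    prev = samples[0]          # f0 := f1, transmitted uncoded to start the decoder
    errors = []
    for f in samples:
        errors.append(f - prev)
        prev = f
    return samples[0], errors

def decode(f0, errors):
    """Rebuild the signal by accumulating the transmitted errors."""
    out, prev = [], f0
    for e in errors:
        prev = prev + e
        out.append(prev)
    return out

f = [21, 22, 27, 25, 22]             # the lecture's example sequence f1..f5
f0, errors = encode(f)
assert errors == [0, 1, 5, -2, -3]   # small values clustered around zero
assert decode(f0, errors) == f       # exact reconstruction: lossless
```

The error sequence hugs zero, so a variable-length code that gives zero and its neighbors the shortest codewords compresses it well.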
Panels of the differencing figure: (a) the digital speech signal; (b) the histogram of digital speech signal values; (c) the histogram of digital speech signal differences.

One problem: suppose our integer sample values are in the range 0..255. Then differences could be as much as −255..255: we have increased our dynamic range (the ratio of maximum to minimum) by a factor of two, and may need more bits to transmit some differences.

A clever solution: define two new codes, denoted SU and SD, standing for Shift-Up and Shift-Down, and reserve some special code values for them. Define SU and SD as shifts by 32. Then we can use codewords for only a limited set of signal differences, say only the range −15..16. Differences that lie in the limited range can be coded as is, but with the extra two values for SU and SD, a value outside the range −15..16 can be transmitted as a series of shifts followed by a value that is indeed inside the range. For example, 100 is transmitted as SU, SU, SU, 4, since 100 = 3 × 32 + 4; (the codes for) SU and for 4 are what are transmitted or stored.

The decoder produces the same signal values as the encoder. Let's consider an explicit example: suppose we wish to code the sequence f1, f2, f3, f4, f5 = 21, 22, 27, 25, 22. For the purposes of the predictor, we invent an extra signal value f0, equal to f1, and first transmit this initial value, uncoded. The error then does center around zero, and coding (assigning bit-string codewords) will be efficient.

The figure below shows a typical schematic diagram used to encapsulate this type of system:

Figure: Schematic diagram for the predictive coding encoder and decoder.

Further topics for investigation: DPCM, DM, and ADPCM.

End of the lecture.
Additional Notes

Signal-to-Quantization-Noise Ratio (continued): for a full-scale sinusoidal input signal, the expression for the SQNR becomes SQNR = 6.02 N + 1.76 (dB).

Audio Filtering: prior to sampling and A-to-D conversion, the audio signal is usually filtered to remove unwanted frequencies; a corresponding low-pass filter is applied after the D-to-A circuit.

Audio Quality vs. Data Rate: the uncompressed data rate increases as more bits are used for quantization, and stereo doubles the bandwidth needed to transmit a digital audio signal.

Table: audio quality versus data rate (truncated in the source).
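The data-rate relationship just stated reduces to a one-line calculation. The CD figures below are standard values assumed for illustration, not taken from the truncated table:

```python
def pcm_data_rate_kbps(sampling_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit-rate in kbps: rate x sample size x channel count."""
    return sampling_rate_hz * bits_per_sample * channels / 1000

# Standard telephony: 8 kHz, 8 bits, mono -> 64 kbps.
assert pcm_data_rate_kbps(8_000, 8, 1) == 64
# CD audio: 44.1 kHz, 16 bits, stereo -> 1411.2 kbps (stereo doubles the mono rate).
assert pcm_data_rate_kbps(44_100, 16, 2) == 1411.2
```

More quantization bits or a second channel scale the rate linearly, which is why stereo transmission needs double the bandwidth.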