CS 450 Compression

Coding
All information in digital form must be encoded somehow. Examples:
• Binary numbers
• ASCII
• IEEE floating-point standard
Coding can be lossless or lossy. Coding can be efficient or inefficient. Compression is nothing but efficient coding.

Information
Information is something unknown. More probabilistically, it is something unexpected.

Example: Information
What's the missing letter? H E L
Now, what do you think is missing? A Y O
After "H E L" the next letter is largely predictable, so it carries little information; after "A Y O" it is not.

Information vs Data
Data: the actual bits, bytes, letters, numbers, etc.
Information: the content.
Redundancy: the difference between data and information:
    redundancy = data − information

Quantifying Information: Entropy
The information content of symbol w, measured in bits, is
    info_w = −log2 p(w)
where p(w) is the probability of symbol w occurring.
The average bits per symbol for a language with n symbols is
    information = Σ_{i=1..n} p(w_i) · info_{w_i} = −Σ_{i=1..n} p(w_i) log2 p(w_i)
where p(w_i) is the probability of symbol w_i occurring.

Context
Information is based on expectation; expectation is based on context. Examples of contexts:
• the last n letters
• the last n values of a time-sampled sequence
• neighboring pixels in an image

Calculating Information in Context
Without considering context, a string of n symbols has
    information = −Σ_{i=1..n} log2 p(w_i)
where p(w_i) is the probability of symbol w_i occurring.
Considering context, a string of n symbols may have
    information = −Σ_{i=1..n} log2 p(w_i | w_0 … w_{i−1})
where p(w_i | w_0 … w_{i−1}) is the probability of symbol w_i occurring immediately after symbols w_0 … w_{i−1}.
Can do better by considering multiple symbols as one.

Types of Redundancy
Remember: redundancy = data − information. For images, there are three types of redundancy:
• Coding
• Interpixel
• Visual

Entropy Coding
Entropy coding allocates bits per symbol (or per group of symbols) according to information content (entropy).

Huffman Coding:
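The entropy formula and Huffman coding can be illustrated together. A minimal sketch in Python, assuming made-up sample text; the helper `huffman_code_lengths` is illustrative, not part of the slides:

```python
import heapq
from collections import Counter
from math import log2

def huffman_code_lengths(freqs):
    """Build a Huffman tree over {symbol: count}; return {symbol: code length}."""
    # Heap entries: (weight, tiebreak, {symbol: depth}); the unique tiebreak
    # keeps the dicts from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # take the two least-probable subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # merge: one level deeper
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

text = "HELLO WORLD"
freqs = Counter(text)
lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
entropy = -sum((c / total) * log2(c / total) for c in freqs.values())
print(f"entropy = {entropy:.3f} bits/symbol, Huffman average = {avg_bits:.3f}")
```

As the theory predicts, the Huffman average code length lands between the entropy and the entropy plus one bit.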
Optimal (uniquely decodable) coding on a per-symbol basis.

Arithmetic Coding:
Encodes a whole sequence of symbols as one arbitrary-precision real number (about 2:1 better than Huffman).

The theoretical compression limit of entropy coding is the entropy of the language (set of symbols) itself.

Vector Quantization
Divide the signal into sets of n values and treat each set as one collective vector value.
Quantize the vectors to some finite number of representatives; this causes loss but reduces the space of possible values.
Use a unique code for each vector and transmit that code, not the vector.
Most VQ systems intelligently select the quantization based on the signal content, which means the codebook must be sent along with the codes.
Can extend to images using n × m blocks.
Example: 4 × 4 blocks with 256 levels per pixel gives 256^16 possibilities = 128 bits per block. Choosing 4096 representative vectors instead gives 12 bits per block.

Lempel-Ziv: Basic Idea
    sequence = empty
    get(symbol)
    while (symbol)
        if sequence + symbol starts a previous sequence
            sequence += symbol
            get(symbol)
        else if empty(sequence)
            write symbol
            get(symbol)
        else
            write pointer to start of previous sequence
            sequence = empty

Lempel-Ziv: Actual Algorithm
    codebook = initialized with all single symbols
    sequence = empty
    while (get(symbol))
        if sequence + symbol is in codebook
            sequence += symbol
        else
            output(code for sequence)
            add sequence + symbol to codebook
            sequence = symbol
    output(code for sequence)

Example: Lempel-Ziv
Mary had a little lamb, little lamb, little lamb, Mary had a little lamb, its fleece was white as snow

Predictive Coding
Use one set of pixels to predict another. Predictors:
• Next pixel is like the last one
• Next scanline is like the last one
• Next frame is like the last one
• Next pixel is the average of the already-known neighbors
The error from the prediction (the residual) hopefully has smaller entropy than the original signal. The information used to make the prediction is the context.

Predictive Coding
Sender and receiver use the
same predictive model.
Sender:
• Make the prediction
• Send the entropy-encoded difference (the residual)
Receiver:
• Make the same prediction
• Add the residual

Delta Modulation
Prediction: the next signal value is the same as the last.
The residual is the difference (delta) from the previous value, encoded in a smaller number of bits than the original. Often used in audio systems (phones).
Problem: a limited-precision delta can cause undershoot and overshoot.

Predictive Image Coding
Prediction: the next pixel is a weighted average of the neighbors previously seen in scanline order. Can use a larger context. Used in DPCM and in the lossless part of the JPEG standard. Newer algorithms (CALIC, LOCO-I) use multiple contexts.

Visual Redundancy
The eye is less sensitive to
• Color
• High frequencies
So,
• Allocate more bits to intensity than to chromaticity
• Allocate more bits to low frequencies than to high frequencies

Transform Coding
Use some transform to convert from the spatial domain to another (e.g., a frequency-based one). Quantize the coefficients according to information content (e.g., quantize high frequencies more coarsely than low ones).
Problem: artifacts caused by imperfect approximation in one place get spread across the entire image.
Solution: independently transform and quantize blocks of the image: block transform encoding.

JPEG
Intensity/chromaticity separation: can allocate more bits to intensity than to chromaticity.
8 × 8 block DCT: energy compaction.
Predictively encode the DC coefficients: takes advantage of redundancy in the block averages.
Quantize the AC coefficients: many high frequencies become zero!
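A toy illustration of that quantization step, with made-up coefficient and step values (not the JPEG standard's tables): dividing each coefficient by a step that grows with frequency, then rounding, sends most high-frequency coefficients to zero.

```python
# One row of made-up DCT coefficients (low to high frequency) and
# quantization steps that grow with frequency (illustrative values only).
coeffs = [300, -45, 18, 7, 3, 1, 0, 1]
steps = [16, 24, 40, 51, 61, 68, 80, 103]

# Quantize: divide by the step and round; high frequencies collapse to zero.
quantized = [round(c / q) for c, q in zip(coeffs, steps)]
print(quantized)

# Dequantize: multiply back; the reconstruction is lossy (300 comes back as 304).
dequantized = [v * q for v, q in zip(quantized, steps)]
```

The zeros produced here are what make the zig-zag ordering and run-length encoding of the next steps pay off.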
Zig-zag ordering: changes 2-D to 1-D and puts similar frequencies together.
Run-length encoding: collapses long runs of zeroes.
Entropy encode what's left: to more efficiently encode the RLE sequences.

Evaluating Compression
Compression rate is usually expressed in bits per symbol (bps). Compression rate isn't the only thing! Must also consider any distortion (error or loss) introduced: rate-distortion curves.

Typical Compression Rates

  Application                                              Uncompressed   Compressed
  Voice (8 ksamples/s, bits/sample)                        64 kbps        2–4 kbps
  Audio conference (8 ksamples/s, bits/sample)             64 kbps        16–64 kbps
  Digital audio (stereo, 44.1 ksamples/s, 16 bits/sample)  1.5 Mbps       128 kbps–1.5 Mbps
  Slow-motion video (10 fps, 176×120 frames, bits/pixel)   5.07 Mbps      8–16 kbps

Typical Compression Rates (continued)

  Application                                                   Uncompressed   Compressed
  Video conference (15 fps, 352×240 frames, bits/pixel)         30.41 Mbps     64–768 kbps
  Video file transfer (15 fps, 352×240 frames, bits/pixel)      30.41 Mbps     384 kbps
  Digital video on CD-ROM (30 fps, 352×240 frames, bits/pixel)  60.83 Mbps     1.5–4 Mbps
  Broadcast video (30 fps, 720×480 frames, bits/pixel)          248.83 Mbps    3–8 Mbps
  HDTV (59.94 fps, 1280×720 frames, bits/pixel)                 1.33 Gbps      20 Mbps

Progressive Methods
Interlaced GIF: send every 8th scanline, then every 4th, then every 2nd, then all; interpolate the intermediate lines until they arrive.
Progressive JPEG: send the DC coefficients, then all the lowest-frequency AC coefficients, then successively higher AC coefficients.

Other Compression Methods
Wavelets: similar in principle to block-transform methods; greater compression/fidelity by exploiting both spatial and frequency information.
Fractals: blocks of the image are represented as rotated, translated, and rescaled copies of other blocks. Probably the highest claimed compression rates; useful in some applications, very lossy in others.
Some people are researching hybrid methods
based on fractal compression.

Interpixel Redundancy
The basis of interpixel redundancy is
• repetition
• prediction

Run-Length Encoding
Encode
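The run-length encoding idea described earlier (collapsing long runs of repeated values into value–count pairs) can be sketched as follows; the function names and sample data are illustrative:

```python
def rle_encode(data):
    """Collapse runs of repeated values into (value, run_length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]

data = [0, 0, 0, 0, 5, 5, 0, 0, 0, 7]
encoded = rle_encode(data)
print(encoded)  # each run of zeroes collapses to a single pair
assert rle_decode(encoded) == data  # lossless round trip
```

This pays off exactly when the signal has long runs, such as the zeroed high-frequency coefficients after JPEG quantization and zig-zag ordering.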