Electronics and Telecommunications, Lecture 3: Lossless Compression


Multimedia Engineering
Lecture 3: Lossless Compression Techniques
Lecturer: Dr. Đỗ Văn Tuấn, Department of Electronics and Telecommunications
Email: tuandv@epu.edu.vn

Lecture contents
• Introduction
• Basics of Information Theory
• Run-Length Coding
• Variable-Length Coding (VLC)
• Arithmetic Coding
• Dictionary-based Coding

Introduction
• Compression is the process of coding that effectively reduces the total number of bits needed to represent certain information.
• If the compression and decompression processes induce no information loss, the compression scheme is lossless; otherwise, it is lossy.
• Compression ratio: compression ratio = B0 / B1, where B0 is the number of bits before compression and B1 is the number of bits after compression.

Basics of Information Theory
• The entropy η of an information source with alphabet S = {s1, s2, ..., sn} is
  η = Σ(i = 1..n) pi · log2(1/pi) = −Σ(i = 1..n) pi · log2 pi
  where pi is the probability that symbol si will occur in S.
• log2(1/pi) indicates the amount of information (self-information) contained in si, which corresponds to the number of bits needed to encode si.

Distribution of Grey-Level Intensities
• For an image with a uniform distribution of gray-level intensities, i.e., pi = 1/256 (i = 1..256), the entropy is log2 256 = 8.
• Figure: histograms for two gray-level images (figure not included in the extracted text).

Entropy and Code Length
• The entropy η is a weighted sum of the terms log2(1/pi); hence it represents the average amount of information contained per symbol in the source S.
• The entropy η specifies the lower bound for the average number of bits needed to code each symbol in S, i.e., η ≤ ave(len), where ave(len) is the average length (measured in bits) of the codewords produced by the encoder.
• For the uniform image above, η = 8, so the minimum average number of bits to represent each gray-level intensity is at least 8.
• For the second, more peaked histogram, η = (1/3)·log2 3 + (2/3)·log2(3/2) ≈ 0.92 (see the entropy sketch at the end of this part).
• The entropy is greater when the probability distribution is flat and smaller when it is more peaked.

Run-Length Coding
• Rationale for RLC: if the information source has the property that symbols tend to form continuous groups, then such a symbol and the length of the group can be coded.
• Run-length encoding (RLE) is a very simple form of data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.
• Example (a short coding sketch follows at the end of this part). Text to be coded (length 67):
  WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
  Text after coding (length 18): 12W1B12W3B24W1B14W

Arithmetic Coding
• Example: coding in arithmetic coding. Table: the low, high, and range values generated at each step (table not included in the extracted text).
• Arithmetic Coding Decoder (decoding procedure shown on the slides; not included in the extracted text).
• If the alphabet is [A, B, C] and the probability distribution is pA = 0.5, pB = 0.4, pC = 0.1, then for sending BBB, Huffman coding needs 6 bits while arithmetic coding needs only about 4 bits (recomputed in the sketch below).
• Arithmetic coding can treat the whole message as one unit. In practice, the input data is usually broken up into chunks to avoid error propagation.
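The two entropy figures above (8 bits per symbol for the flat histogram, about 0.92 for the peaked one) are easy to verify. The short Python sketch below is an illustration added here, not part of the original slides.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: eta = sum over i of p_i * log2(1 / p_i)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform grey-level histogram: p_i = 1/256 for each of the 256 intensities.
print(entropy([1.0 / 256] * 256))         # 8.0

# Peaked two-level distribution from the slides: probabilities 1/3 and 2/3.
print(round(entropy([1 / 3, 2 / 3]), 2))  # 0.92
```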
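The run-length example above (67 characters reduced to 18) can be reproduced with a minimal count-plus-symbol coder. This sketch illustrates the idea; it is not the exact scheme used in the lecture.

```python
from itertools import groupby

def rle_encode(text):
    """Encode each run as <count><symbol>, e.g. 'WWWWB' -> '4W1B'."""
    return "".join(f"{sum(1 for _ in run)}{ch}" for ch, run in groupby(text))

def rle_decode(code):
    """Expand each <count><symbol> pair back into the original run."""
    out, digits = [], ""
    for ch in code:
        if ch.isdigit():
            digits += ch
        else:
            out.append(ch * int(digits))
            digits = ""
    return "".join(out)

text = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(text)
print(len(text), len(encoded), encoded)   # 67 18 12W1B12W3B24W1B14W
assert rle_decode(encoded) == text
```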
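To put numbers on the Huffman-versus-arithmetic comparison above, the sketch below builds a Huffman code for the given distribution and estimates the arithmetic-coding cost from the width of the final probability interval. It is a back-of-the-envelope illustration (the lecture's own encoder details are not in this preview) and ignores termination overhead, which can add a bit or two in practice.

```python
import heapq, math

def huffman_code_lengths(probs):
    """Build a Huffman code and return the codeword length (bits) per symbol."""
    # Heap items: (probability, tie-breaker, {symbol: code-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return {s: len(c) for s, c in heap[0][2].items()}

probs = {"A": 0.5, "B": 0.4, "C": 0.1}
message = "BBB"

lengths = huffman_code_lengths(probs)
huffman_bits = sum(lengths[s] for s in message)   # 2 bits per 'B' -> 6

# An arithmetic coder narrows [0, 1) to an interval whose width is the product
# of the symbol probabilities; roughly ceil(-log2(width)) bits pick a point in it.
width = math.prod(probs[s] for s in message)      # 0.4 ** 3 = 0.064
arithmetic_bits = math.ceil(-math.log2(width))    # 4

print(huffman_bits, arithmetic_bits)              # 6 4
```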
Lempel-Ziv-Welch (LZW) Algorithm
• LZW uses fixed-length codewords to represent variable-length strings of symbols/characters that commonly occur together, e.g., words in English text.
• The LZW encoder and decoder build up the same dictionary dynamically while receiving the data.
• LZW places longer and longer repeated entries into a dictionary, and then emits the code for an element, rather than the string itself, if the element has already been placed in the dictionary.
• The predecessors of LZW are LZ77 and LZ78, due to Jacob Ziv and Abraham Lempel in 1977 and 1978; Terry Welch improved the technique in 1984.
• LZW is used in many applications, such as UNIX compress, GIF for images, V.42bis for modems, and others.

Example: LZW compression of the string "ABABBABCABABBA"
• Start with a very simple dictionary (also referred to as a "string table"), initially containing only the single characters and their codes (table not included in the extracted text).
• For the input string "ABABBABCABABBA", the LZW compression algorithm builds the table step by step as shown on the slides (a Python sketch of the encoder follows below). Instead of sending 14 characters, only 9 codes need to be sent (compression ratio = 14/9 ≈ 1.56).

Example: LZW decompression of the string "ABABBABCABABBA"
• The input codes to the decoder are the codes produced by the encoder; the initial string table is identical to the one used by the encoder.
• The LZW decompression algorithm then works through the same table-building steps (a matching decoder sketch follows below). The output string is "ABABBABCABABBA", a truly lossless result!

A potential problem
• The simple version above reveals a potential problem: in adaptively updating the dictionaries, the encoder is sometimes ahead of the decoder.
• For example, after ABABB the encoder outputs the code for AB and creates a dictionary entry for the new string ABB. After receiving the code 4, the decoder outputs AB and updates its dictionary with a code for the new string BA, so it is one entry behind the encoder.
• Welch points out that the simple version will break down when the following scenario occurs. Input = ABABBABCABBABBAX. Whenever the sequence is Character, String, Character, String, Character, and so on, the encoder will create a new code to represent Character + String + Character and use it right away, before the decoder has had a chance to create it!
• This is the only case that will fail. When it occurs, the variable s = Character + String. A modified version handles this exceptional case by checking whether the input code has already been defined in the decoder's dictionary; if not, it simply assumes that the code represents s + s[0], that is, Character + String + Character.
• In real applications, the code length l is kept in a range [l0, lmax]. The dictionary initially has a size of 2^l0; when it is filled up, the code length is increased by 1, and this is allowed to repeat until l = lmax.
• When lmax is reached and the dictionary is filled up, it needs to be flushed (as in UNIX compress) or to have the LRU (least recently used) entries removed.
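As an illustration of the encoding walk-through above, here is a compact LZW encoder sketch. It assumes the toy starting dictionary {A: 1, B: 2, C: 3} (the slides' string table is not reproduced in this text, so the exact code values are an assumption); real implementations start from all 256 byte values and emit fixed-length binary codes.

```python
def lzw_encode(text, alphabet=("A", "B", "C")):
    """LZW compression sketch: emit dictionary codes for the longest known strings."""
    table = {ch: i for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    s, output = "", []
    for ch in text:
        if s + ch in table:             # extend the current string while it is known
            s += ch
        else:
            output.append(table[s])     # emit code for the longest known prefix
            table[s + ch] = next_code   # add the new string to the dictionary
            next_code += 1
            s = ch
    if s:
        output.append(table[s])
    return output

print(lzw_encode("ABABBABCABABBA"))     # [1, 2, 4, 5, 2, 3, 4, 6, 1]: 9 codes for 14 characters
```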
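A matching decoder sketch, under the same assumed starting table, rebuilds the dictionary on the fly and includes the check for the exceptional Character + String + Character case described above: a code not yet in the decoder's table must stand for s + s[0].

```python
def lzw_decode(codes, alphabet=("A", "B", "C")):
    """LZW decompression sketch, rebuilding the dictionary while decoding."""
    table = {i: ch for i, ch in enumerate(alphabet, start=1)}
    next_code = len(table) + 1
    s = table[codes[0]]
    output = [s]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                              # exceptional case: code not yet defined here
            entry = s + s[0]
        output.append(entry)
        table[next_code] = s + entry[0]    # the entry the encoder created one step earlier
        next_code += 1
        s = entry
    return "".join(output)

print(lzw_decode([1, 2, 4, 5, 2, 3, 4, 6, 1]))  # ABABBABCABABBA
```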
End of the lecture


Table of contents

  • Multimedia Engineering - Lecture 3: Lossless Compression Techniques

  • Lecture contents

  • Introduction

  • Slide 4

  • Basics of Information Theory

  • Distribution of Grey Level Intensities

  • Entropy and Code Length

  • Slide 8

  • Run-Length Coding

  • Slide 10

  • Shannon – Fano Algorithm

  • Slide 12

  • Shannon – Fano – Example (wiki)

  • Huffman Coding

  • Huffman Coding – Example (wiki)

  • Slide 16

  • Arithmetic Coding

  • Arithmetic Coding Encoder

  • Example: Coding in Arithmetic Coding

  • Slide 20
