Lecture 4: Dictionary Methods
Data Compression
Lecture: Dictionary Methods
Alexander Kolesnikov

Dictionary-based methods
• Statistical compression methods use a statistical model of the data, and the quality of compression they achieve depends on how good that model is
• Dictionary-based methods do not use a statistical model of the data. Instead, they select strings of symbols and encode each string as a token using a dictionary
• The dictionary holds strings of symbols, and it may be static or dynamic (adaptive)
• To encode text, one can use a (static) dictionary of the English (Finnish, etc.) language
• What about images?

Dictionary-based methods: Main idea
• Encoder: As the input is processed, build a dictionary and transmit the indices of the strings found in the dictionary
• Decoder: As the code is processed, reconstruct the dictionary to invert the encoding process

Dictionary-based methods: History
• 1977: LZ77 [Lempel, Ziv]
  * 1982: LZSS [Storer, Szymanski]
  * LZR, LZH, etc.
  * Gzip, PKZip, LHarc, Zoo for files
  * PNG image format
• 1978: LZ78 [Lempel, Ziv]
  * 1984: LZW [Welch] algorithm
  * GIF image format
  * V.42bis data compression standard for modems

LZ77: Introduction
• LZ77 was introduced by Jacob Ziv and Abraham Lempel in 1977
• LZ77 is a dictionary-based compression algorithm
• Algorithm: a pointer moves along the input string. Initially the pointer is set to its starting position
• Convention: everything before the pointer is called the past, and everything from the pointer onward is called the future
• To keep the past from growing without bound, a window of size w is fixed for the past, and the encoder searches it for the longest string that matches the characters in the future

LZ77: Compression algorithm
• Pos: the position currently being examined, set to its initial value at the start
• Match: the longest string found in the window
• Char: the character just read
• Output: written in the form (i, j) C, where (i, j) gives the position of the match and the length of the matched string, and C is the literal character from the buffer

LZ77: Example
Input: sir_sid_eastman_easily_teases_sea_sick_seals

search buffer | look-ahead buffer ⇒ token
     | sir_sid_eastman_ ⇒ (0,0,’s’)
s    | ir_sid_eastman_e ⇒ (0,0,’i’)
si   | r_sid_eastman_ea ⇒ (0,0,’r’)
sir  | _sid_eastman_eas ⇒ (0,0,’_’)
sir_ | sid_eastman_easi ⇒ (4,2,’d’)

LZ77: Decoding
Codes: (0,0,’s’) (0,0,’i’) (0,0,’r’) (0,0,’_’) (4,2,’d’)
Message:
(0,0,’s’): s
(0,0,’i’): s+i=si
(0,0,’r’): si+r=sir
(0,0,’_’): sir+_=sir_
(4,2,’d’): sir_+si+d=sir_sid
For the last token, copy the 2 symbols ’si’ starting 4 positions back from the end of the decoded text, then append the symbol ’d’.

LZW: Decoding
Initialize Dictionary
Input code c
Decode code c (index) to w
Output decoded string w
Put w? in Dictionary
REPEAT
  a) Input code c
     Decode the 1st symbol s1 of the code c
     Complete the previous Dictionary entry with s1
  b) Finish decoding the remainder of the code c
     Output decoded string w
     Put w? in Dictionary
UNTIL no more codes

Here w? denotes an incomplete entry: the string w followed by one symbol that is not yet known.

LZW: Example (0)
Dictionary: 0:a 1:b
Codes: 0 1 2 4 3 6
Message:
Initialize Dictionary with the alphabet

LZW: Example (1)
Dictionary: 0:a 1:b 2:a?
Message: a
Input code c=0
Decode code c to w=a
Output decoded string w=a
Put w?=a? in Dictionary

LZW: Example (2a)
Dictionary: 0:a 1:b 2:ab
Message: a b+
a) Input code c=1
   Decode the 1st symbol s1=b of the code c=1
   Complete the previous Dictionary entry with s1=b

LZW: Example (2b)
Dictionary: 0:a 1:b 2:ab 3:b?
Message: a b
b) Finish decoding the remainder of the code c=1
   Output decoded string w=b
   Put w?=b? in Dictionary

LZW: Example (3a)
Dictionary: 0:a 1:b 2:ab 3:ba
Message: a b a+
a) Input code c=2
   Decode the 1st symbol s1=a of the code c=2
   Complete the previous Dictionary entry with s1=a

LZW: Example (3b)
Dictionary: 0:a 1:b 2:ab 3:ba 4:ab?
Message: a b ab
b) Finish decoding the remainder of the code c=2
   Output decoded string w=ab
   Put w?=ab? in Dictionary

LZW: Example (4a)
Dictionary before: 0:a 1:b 2:ab 3:ba 4:ab?
Dictionary after: 0:a 1:b 2:ab 3:ba 4:aba
Message: a b ab a+
a) Input code c=4
   Decode the 1st symbol s1=a of the code c=4
   Complete the previous Dictionary entry with s1=a
Code 4 refers to the entry that is still incomplete (ab?). Its first symbol a is already known, and that is exactly the symbol needed to complete it to aba.

LZW: Example (4b)
Dictionary: 0:a 1:b 2:ab 3:ba 4:aba 5:aba?
Codes: 4 3 6
Message: a b ab aba
b) Finish decoding the remainder of the code c=4
   Output decoded string w=aba
   Put w?=aba? in Dictionary

LZW: Example (5a)
Dictionary: 0:a 1:b 2:ab 3:ba 4:aba 5:abab
Message: a b ab aba b+
a) Input code c=3
   Decode the 1st symbol s1=b of the code c=3
   Complete the previous Dictionary entry with s1=b

LZW: Example (5b)
Dictionary: 0:a 1:b 2:ab 3:ba 4:aba 5:abab 6:ba?
Message: a b ab aba ba
b) Finish decoding the remainder of the code c=3
   Output decoded string w=ba
   Put w?=ba? in Dictionary

LZW: Example (6a)
Dictionary before: 0:a 1:b 2:ab 3:ba 4:aba 5:abab 6:ba?
Dictionary after: 0:a 1:b 2:ab 3:ba 4:aba 5:abab 6:bab
Message: a b ab aba ba b+
a) Input code c=6
   Decode the 1st symbol s1=b of the code c=6
   Complete the previous Dictionary entry with s1=b
As in step (4a), code 6 refers to the entry that is still incomplete (ba?), and its known first symbol b completes it to bab.

LZW: Example (6b)
Dictionary: 0:a 1:b 2:ab 3:ba 4:aba 5:abab 6:bab 7:bab?
Message: a b ab aba ba bab
b) Finish decoding the remainder of the code c=6
   Output decoded string w=bab
   Put w?=bab? in Dictionary
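
The two decoding walkthroughs above (LZ77 tokens and LZW codes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's own implementation: the function names `lz77_decode` and `lzw_decode`, the `Token` alias, and the choice to pass the initial alphabet as a string are all hypothetical. The LZW sketch uses the common formulation in which a new entry is added only once its last symbol is known. This should be equivalent to the slides' "put w? in the dictionary, then complete it" bookkeeping, with the incomplete-entry case (codes 4 and 6 in the example) handled by the `code == len(dictionary)` branch.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical token layout: (offset back from the end, match length, next symbol)
Token = Tuple[int, int, str]


def lz77_decode(tokens: Iterable[Token]) -> str:
    """Rebuild a message from LZ77 (offset, length, symbol) tokens."""
    out: List[str] = []
    for offset, length, symbol in tokens:
        start = len(out) - offset  # where the match begins in the decoded text
        if length and (offset <= 0 or start < 0):
            raise ValueError(f"bad token {(offset, length, symbol)!r}")
        for k in range(length):
            # Copy symbol by symbol so that a match may overlap the text
            # that is being produced (length > offset).
            out.append(out[start + k])
        out.append(symbol)  # the literal symbol that follows the match
    return "".join(out)


def lzw_decode(codes: Sequence[int], alphabet: str) -> str:
    """Rebuild a message from LZW codes, given the initial alphabet."""
    dictionary: List[str] = list(alphabet)  # code i -> alphabet[i]
    if not codes:
        return ""
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(dictionary):
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the entry still being built ("w?"):
            # its missing last symbol equals its own first symbol.
            entry = prev + prev[0]
        else:
            raise ValueError(f"invalid code {code}")
        # Complete the previous entry with the first symbol of this one.
        dictionary.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)
```

Applied to the examples above, `lz77_decode([(0, 0, "s"), (0, 0, "i"), (0, 0, "r"), (0, 0, "_"), (4, 2, "d")])` yields `sir_sid`, and `lzw_decode([0, 1, 2, 4, 3, 6], "ab")` yields `abababababab`, which is the message a b ab aba ba bab written without spaces.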