Compression Codes
Data Compression
Lecture 2: Arithmetic Code
Alexander Kolesnikov

Arithmetic code
• Alphabet extension (blocking symbols) can improve coding efficiency.
• How about treating the entire sequence as one symbol?
• This is not practical with Huffman coding.
• Arithmetic coding allows you to do precisely this.
• Basic idea: map data sequences to subintervals of [0, 1) whose lengths equal the probabilities of the corresponding sequences.
1) Huffman coder: H ≤ R ≤ H + 1 bit/(symbol, pel)
2) Arithmetic code: H ≤ R ≤ H + 1 bit/message (!)

Arithmetic code: History
Rissanen [1976]: arithmetic code
Pasco [1976]: arithmetic code

Arithmetic code: Algorithm (1)
0) Start by defining the current interval as [0, 1).
1) REPEAT for each symbol s in the input stream:
   a) Divide the current interval [L, H) into subintervals whose sizes are proportional to the symbols' probabilities.
   b) Select the subinterval for the symbol s and define it as the new current interval [L, H).
2) When the entire input stream has been processed, the output should be any number V that uniquely identifies the current interval [L, H).

Arithmetic code: Algorithm (2)
[Figure not recovered; surviving label: 0.70]

Arithmetic code: Algorithm (3)
Probabilities: p_1, p_2, …, p_N.
Cumulants: C_1 = 0; C_2 = C_1 + p_1 = p_1; C_3 = C_2 + p_2 = p_1 + p_2; etc. C_N = p_1 + p_2 + … + p_{N-1}; C_{N+1} = 1.
0) Current interval [L, H) = [0.0, 1.0).
1) REPEAT for each symbol s_i in the input stream:
   H ← L + (H − L) * C(s_{i+1}),
   L ← L + (H − L) * C(s_i),
   where [C(s_i), C(s_{i+1})) is the cumulant interval of symbol s_i, and both right-hand sides use the values of L and H from before the update.
2) UNTIL the entire input stream has been processed.
The output code V is any number that uniquely identifies the current interval [L, H).
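The interval update of Algorithm (3) can be sketched in a few lines of Python. This is a minimal illustration and not the lecture's own implementation: the function names `build_intervals` and `encode` are hypothetical, exact `Fraction` arithmetic is an assumption made to avoid floating-point rounding, and the symbol model is the one from Example 1 ('SWISS_MISS').

```python
from fractions import Fraction

def build_intervals(probs):
    """Assign each symbol its cumulant interval [C_lo, C_hi), in dict order."""
    intervals, c = {}, Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (c, c + p)
        c += p
    return intervals

def encode(message, intervals):
    """Return the final interval [L, H) that represents the whole message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low  # taken from the old L and H, before either changes
        c_lo, c_hi = intervals[symbol]
        low, high = low + width * c_lo, low + width * c_hi
    return low, high

# Model of Example 1, ordered so that '_' gets [0.0, 0.1) and 'S' gets [0.5, 1.0).
PROBS = {
    "_": Fraction(1, 10),
    "M": Fraction(1, 10),
    "I": Fraction(2, 10),
    "W": Fraction(1, 10),
    "S": Fraction(5, 10),
}
INTERVALS = build_intervals(PROBS)

low, high = encode("SWISS_MISS", INTERVALS)
print(float(low), float(high))  # 0.71753375 0.717535
```

The width of the final interval, 1/800000, equals the product of the ten symbol probabilities, which is the property the basic idea relies on. Computing `width` once per symbol is what keeps the two assignments consistent; updating H first and then reusing it for L would give a wrong interval.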
Example 1: Statistics
Message: 'SWISS_MISS'
Char  Freq  Prob         [C(s_i), C(s_{i+1}))
S     5     5/10 = 0.5   [0.5, 1.0)
W     1     1/10 = 0.1   [0.4, 0.5)
I     2     2/10 = 0.2   [0.2, 0.4)
M     1     1/10 = 0.1   [0.1, 0.2)
_     1     1/10 = 0.1   [0.0, 0.1)

Example 1: Encoding
Symbol intervals as in the Statistics table: S [0.5, 1.0), W [0.4, 0.5), I [0.2, 0.4), M [0.1, 0.2), _ [0.0, 0.1).

Example 1: Decoding
Symbol intervals as in the Statistics table: S [0.5, 1.0), W [0.4, 0.5), I [0.2, 0.4), M [0.1, 0.2), _ [0.0, 0.1).
V ∈ [0.71753375, 0.71753500)

Example 1: Compression?
V ∈ [0.71753375, 0.71753500)
• How many bits do we need to encode a number V in the final interval [L, H)? For example, m = 4 bits give 16 = 2^4 intervals of size Δ = 1/16.
• The number of bits m needed to represent a value in an interval of size Δ: m = ⌈−log2(Δ)⌉ bits.
[Figure: [0, 1) subdivided into dyadic intervals labeled with binary codewords, from 0 and 1 up to 0000 … 1111]
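The decoding and bit-count steps of Example 1 can be checked with a short Python sketch. It is an illustration under stated assumptions, not the lecture's code: `decode` and `bits_needed` are hypothetical names, exact `Fraction` arithmetic is assumed, and the decoder is told the message length in advance (the slides do not say how the end of the message is signalled).

```python
from fractions import Fraction
from math import ceil, log2

# Cumulant intervals of Example 1, copied from the Statistics table.
INTERVALS = {
    "_": (Fraction(0), Fraction(1, 10)),
    "M": (Fraction(1, 10), Fraction(2, 10)),
    "I": (Fraction(2, 10), Fraction(4, 10)),
    "W": (Fraction(4, 10), Fraction(5, 10)),
    "S": (Fraction(5, 10), Fraction(1)),
}

def decode(value, intervals, length):
    """Recover `length` symbols from a number inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    symbols = []
    for _ in range(length):
        width = high - low
        position = (value - low) / width  # where the value sits within [L, H)
        for symbol, (c_lo, c_hi) in intervals.items():
            if c_lo <= position < c_hi:
                symbols.append(symbol)
                # Narrow the interval exactly as the encoder did.
                low, high = low + width * c_lo, low + width * c_hi
                break
    return "".join(symbols)

def bits_needed(low, high):
    """Bits m such that some multiple of 2**-m lies in [low, high)."""
    return ceil(-log2(float(high - low)))

L, H = Fraction("0.71753375"), Fraction("0.717535")
print(decode(Fraction("0.717534"), INTERVALS, 10))  # SWISS_MISS
print(bits_needed(L, H))                            # 20
```

Under these assumptions, the ten-symbol message fits in about 20 bits, because the final interval has width 1.25e-6 and −log2(1.25e-6) ≈ 19.6.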