Basic concepts
- Source coding and channel coding
- Lossless encoding
  - Prefix codes, entropy codes: Huffman, Exp-Golomb, arithmetic, LZW
- Predictive encoding: PCM, DPCM
- Transformation: DCT
- Sampling and quantization
- Forward error correction

Reminder: Random Variables
- Variables vs. random variables
  - A random variable is associated with random events (measurements, observations) that produce random outcomes (values)
  - Examples: tossing a coin, tossing a die, drawing cards, throwing a needle
- Discrete random variables vs. continuous random variables

Reminder: cdf, pdf
- Cumulative distribution function (cdf)
- Probability density function (pdf)

Information measurement
- A symbol x that occurs with probability p(x) carries the information content:
    I(x) = −log₂ p(x)
- The information content does not depend on the symbol's value, only on its probability of occurrence
- With the base-2 logarithm, information is measured in bits

Entropy
- Entropy measures the information of a source
- Entropy = the average information content of the symbols in the source alphabet (see the entropy sketch at the end of this section):
    H(X) = Σₓ p(x) I(x) = −Σₓ p(x) log₂ p(x)

Prefix code
- Variable code length
- No valid codeword is a prefix of another valid codeword
- Example (exercise): prefix code or not a prefix code?

Huffman coding
- Alphabet A = {s1, s2, …, sk} with associated "frequencies" f1, f2, …, fk
  - "frequency": the probability of occurrence of each symbol
- Algorithm (a sketch follows at the end of this section):
  1. Sort the symbols in ascending order of frequency
  2. Pick (and remove from the list) the two smallest symbols: smallest → 0, 2nd smallest → 1
  3. Create a new "pseudo-symbol" whose frequency is the sum of the two frequencies
  4. Insert it into the list
  5. Repeat from step 1

Nyquist-Shannon Sampling Theorem
- When sampling a signal at discrete intervals, the sampling frequency must be ≥ 2·fmax
  - fmax = the maximum frequency of the input signal
- This allows the original signal to be reconstructed perfectly from the sampled version (see the aliasing demo at the end of this section)

Anti-aliasing
- Sample more often, or
- Get rid of all frequencies greater than half the new sampling frequency
  - Apply a smoothing (low-pass) filter
  - This loses information, but it is better than aliasing

Quantization
- A quantizer is a mapping Q(u)
  - u: a continuous variable
  - Q(u) ∈ {r1, r2, …, rk}
- How to implement a quantizer:
  - Segment the real axis into k segments
  - A value u in the i-th segment is mapped to ri

Quantizer example
- Q(u) ∈ {rk | k = 1…L}: reconstruction levels
- tk, k = 1…L+1: transition levels
- Δk = tk+1 − tk: quantization step
- u − Q(u): quantization error
- Uniform quantizer vs. non-uniform quantizer

Designing a quantizer
- Quantization causes information loss
- Rate-distortion optimized quantization problem:
  - Given a signal u with a (known) pdf p(u), design an L-level quantizer with minimum information loss
  - Input: the pdf p(u)
  - Output: {rk | k = 1…L} and {tk | k = 1…L+1}

Lloyd-Max quantization
- Lloyd-Max quantizer – centroid condition:
  - If the transition levels {tk} are known, the mean squared error (MSE) is minimized when each rk is the centroid of its quantization interval [tk, tk+1]:
      rk = (∫ u·p(u) du) / (∫ p(u) du), both integrals taken over [tk, tk+1]
- (an iterative design sketch follows at the end of this section)

Channel coding
- Channel coding (Error Control Coding, Forward Error Correction) adds extra bits to the source stream
  - Redundant data: consumes more bandwidth
- The channel may lose or corrupt information, delivering (source info + redundancy − noise)
- The receiving site reconstructs the source info from (source info + redundancy − noise)
- Shannon channel coding theorem: for any transmission rate below the channel capacity, there exists an ECC scheme whose error probability is arbitrarily small

Digital Communication System
- Source encoder – decoder: data compression/decompression
- Channel encoder – decoder: the channel decoder uses the redundant bits to detect (and/or correct) errors
- Communication channels:
  - Transmission media
  - Storage
  - Any process that causes errors!
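To make the information-content and entropy formulas concrete, here is a minimal Python sketch (function names such as information_content are our own, not from the slides):

```python
import math

def information_content(p):
    """Information carried by a symbol with occurrence probability p, in bits."""
    return -math.log2(p)

def entropy(probs):
    """Entropy of a source: the average information content of its symbols."""
    return sum(p * information_content(p) for p in probs if p > 0)

# A 4-symbol source: a skewed distribution carries less information per symbol
# than a uniform one.
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```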
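The Huffman procedure above can be sketched with a min-heap. This is a minimal illustration under the slide's rules (smallest entry gets bit 0, second smallest gets bit 1), not a reference implementation:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code table for a dict {symbol: frequency}."""
    tie = count()  # tie-breaker so heapq never has to compare code tables
    heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)  # smallest      -> prefix bit 0
        f1, _, c1 = heapq.heappop(heap)  # 2nd smallest  -> prefix bit 1
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        # the "pseudo-symbol": its frequency is the sum of the two
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]

print(huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} -- note it is a prefix code:
# no codeword is a prefix of another.
```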
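A quick numeric illustration of the sampling theorem: a 3 Hz cosine sampled at 4 Hz (below the Nyquist rate of 6 Hz) produces exactly the same samples as a 1 Hz cosine, so after sampling the two signals cannot be told apart. This aliasing demo is ours, not from the slides:

```python
import numpy as np

fs = 4.0                              # sampling rate (Hz), below 2 * 3 Hz
t = np.arange(16) / fs                # sampling instants
high = np.cos(2 * np.pi * 3.0 * t)    # 3 Hz signal, undersampled
low  = np.cos(2 * np.pi * 1.0 * t)    # 1 Hz alias of the 3 Hz signal
print(np.allclose(high, low))         # True: identical sample sequences
```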
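The Lloyd-Max design can be run as a fixed-point iteration that alternates the centroid condition with the midpoint rule for the transition levels. The sketch below assumes the pdf is supplied as weights on a discrete grid; the function name and interface are illustrative assumptions:

```python
import numpy as np

def lloyd_max(u, p, L, iters=50):
    """Design an L-level Lloyd-Max quantizer for a pdf given as sample
    weights p at grid points u (both 1-D arrays). Alternates:
      centroid condition:          r_k = E[u | u in k-th interval]
      nearest-neighbour condition: t_k = (r_{k-1} + r_k) / 2
    Returns reconstruction levels r (L) and transition levels t (L+1)."""
    t = np.linspace(u.min(), u.max(), L + 1)   # start from a uniform quantizer
    r = 0.5 * (t[:-1] + t[1:])
    for _ in range(iters):
        bins = np.digitize(u, t[1:-1])         # interval index of each grid point
        for k in range(L):
            w = p[bins == k]
            if w.sum() > 0:
                r[k] = (u[bins == k] * w).sum() / w.sum()  # centroid of interval k
        t[1:-1] = 0.5 * (r[:-1] + r[1:])       # midpoints between adjacent levels
    return r, t

# Example: 4-level quantizer for a Gaussian pdf (an unnormalized pdf is fine,
# since centroids are unaffected by a constant scale factor).
u = np.linspace(-4, 4, 2001)
p = np.exp(-u**2 / 2)
r, t = lloyd_max(u, p, L=4)
print(np.round(r, 3))   # approx [-1.51, -0.453, 0.453, 1.51] for a unit Gaussian
```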
Two kinds of Error Control Codes
- Block codes: algebraic approach
- Convolutional codes: probabilistic approach
- Erasure codes (Forward Error Correction – FEC) are block codes

Erasure code
- Original blocks: m blocks x1, x2, x3, …, xm
- Coded blocks: n linear combinations of the original blocks, e.g.
  - X1 = x1 + x2
  - X2 = x2 + x3 + x5
  - …
  - Xn = x1 + x5 + xm
- The receiver gets any k of the coded packets Xi (n > k > m)
- Encoding algorithm: knowing x1, x2, …, xm, compute X1, X2, …, Xn
- Decoding algorithm: knowing the received X1, X2, X3, …, recover x1, x2, x3, …, xm
  - Decoding = solving a system of linear equations (a sketch follows below)
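As a concrete reading of "decoding = solving a system of linear equations", the sketch below takes "+" to be XOR (addition in GF(2)), as is common for erasure codes, and recovers the original blocks by Gauss-Jordan elimination. All names here are illustrative assumptions, not from the slides:

```python
import random

def encode(blocks, n, seed=1):
    """Produce n coded packets from m original blocks (ints). Each packet is
    a random GF(2) linear combination: (coefficient row, XOR of the blocks)."""
    rng = random.Random(seed)
    m = len(blocks)
    packets = []
    while len(packets) < n:
        row = [rng.randint(0, 1) for _ in range(m)]
        if any(row):                           # skip the all-zero combination
            payload = 0
            for coef, block in zip(row, blocks):
                if coef:
                    payload ^= block
            packets.append((row, payload))
    return packets

def decode(packets, m):
    """Recover the m original blocks from received packets by Gauss-Jordan
    elimination over GF(2). Returns None if the received packets do not
    contain m independent combinations."""
    rows = [(row[:], val) for row, val in packets]
    pivots = []
    for col in range(m):
        i = next((i for i, (r, _) in enumerate(rows) if r[col]), None)
        if i is None:
            return None                        # system is under-determined
        prow, pval = rows.pop(i)
        def clear(rv):                         # XOR the pivot row into rows with a 1 in col
            r, v = rv
            return ([a ^ b for a, b in zip(r, prow)], v ^ pval) if r[col] else rv
        rows = [clear(rv) for rv in rows]
        pivots = [clear(rv) for rv in pivots]
        pivots.append((prow, pval))
    # after full elimination each pivot row is a unit vector e_k -> x_k
    x = [0] * m
    for r, v in pivots:
        x[r.index(1)] = v
    return x

blocks = [0x12, 0x34, 0x56, 0x78]                 # m = 4 original blocks
packets = encode(blocks, n=8)
received = random.Random(7).sample(packets, 6)    # only k = 6 of n = 8 arrive
print(decode(received, 4) == blocks)              # True if 4 independent rows survived
```

If decode returns None, too few independent combinations survived; in practice n and k are chosen so that this is very unlikely.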