Vector Quantization

Contents
I. Introduction
II. Preliminaries
III. Design Problem
IV. Optimality Criteria
V. LBG Design Algorithm
VI. Two-Dimensional Animation
VII. Performance
VIII. References

I. Introduction

Vector quantization (VQ) is a lossy data compression method based on the principle of block coding. It is a fixed-to-fixed length algorithm. In the early days, the design of a vector quantizer (VQ) was considered a challenging problem due to the need for multi-dimensional integration. In 1980, Linde, Buzo, and Gray (LBG) proposed a VQ design algorithm based on a training sequence. The use of a training sequence bypasses the need for multi-dimensional integration. A VQ that is designed using this algorithm is referred to in the literature as an LBG-VQ.

II. Preliminaries

A VQ is nothing more than an approximator. The idea is similar to that of "rounding off" (say, to the nearest integer). An example of a 1-dimensional VQ is shown below. Here, every number less than -2 is approximated by -3, every number between -2 and 0 is approximated by -1, every number between 0 and 2 is approximated by +1, and every number greater than 2 is approximated by +3. Note that the approximate values are uniquely represented by 2 bits. This is a 1-dimensional, 2-bit VQ; it has a rate of 2 bits/dimension.

An example of a 2-dimensional VQ is shown below. Here, every pair of numbers falling in a particular region is approximated by the red star associated with that region. Note that there are 16 regions and 16 red stars, each of which can be uniquely represented by 4 bits. Thus, this is a 2-dimensional, 4-bit VQ, and its rate is also 2 bits/dimension.

In the above two examples, the red stars are called codevectors and the regions defined by the blue borders are called encoding regions. The set of all codevectors is called the codebook, and the set of all encoding regions is called the partition of the space.

III. Design Problem

The VQ design problem can be stated as follows. Given a vector source with known statistical properties, given a distortion measure, and given the number of codevectors, find a codebook (the set of all red stars) and a partition (the set of blue lines) which result in the smallest average distortion.

We assume that there is a training sequence consisting of M source vectors:

    T = \{ x_1, x_2, \ldots, x_M \}.

This training sequence can be obtained from some large database. For example, if the source is a speech signal, then the training sequence can be obtained by recording several long telephone conversations. M is assumed to be sufficiently large so that all the statistical properties of the source are captured by the training sequence. We assume that the source vectors are k-dimensional, e.g.,

    x_m = (x_{m,1}, x_{m,2}, \ldots, x_{m,k}), \qquad m = 1, 2, \ldots, M.

Let N be the number of codevectors and let

    C = \{ c_1, c_2, \ldots, c_N \}

represent the codebook. Each codevector is k-dimensional, e.g.,

    c_n = (c_{n,1}, c_{n,2}, \ldots, c_{n,k}), \qquad n = 1, 2, \ldots, N.

Let S_n be the encoding region associated with codevector c_n and let

    P = \{ S_1, S_2, \ldots, S_N \}

denote the partition of the space. If the source vector x_m is in the encoding region S_n, then its approximation (denoted by Q(x_m)) is c_n:

    Q(x_m) = c_n \quad \text{if } x_m \in S_n.

Assuming a squared-error distortion measure, the average distortion is given by

    D_{\mathrm{ave}} = \frac{1}{Mk} \sum_{m=1}^{M} \lVert x_m - Q(x_m) \rVert^2,

where \lVert e \rVert^2 = e_1^2 + e_2^2 + \cdots + e_k^2. The design problem can be succinctly stated as follows: given T and N, find C and P such that D_ave is minimized.

IV. Optimality Criteria

If C and P are a solution to the above minimization problem, then they must satisfy the following two criteria.

• Nearest Neighbor Condition:

    S_n \supseteq \{ x : \lVert x - c_n \rVert^2 \le \lVert x - c_{n'} \rVert^2 \ \text{for all } n' = 1, 2, \ldots, N \}.

This condition says that the encoding region S_n should consist of all vectors that are closer to c_n than to any of the other codevectors. For those vectors lying on the boundary (blue lines), any tie-breaking procedure will do.

• Centroid Condition:

    c_n = \frac{\sum_{x_m \in S_n} x_m}{\sum_{x_m \in S_n} 1}, \qquad n = 1, 2, \ldots, N.

This condition says that the codevector c_n should be the average of all training vectors that are in encoding region S_n. In implementation, one should ensure that at least one training vector belongs to each encoding region (so that the denominator in the above equation is never 0).

V. LBG Design Algorithm

The LBG VQ design algorithm is an iterative algorithm which alternately enforces the above two optimality criteria. The algorithm requires an initial codebook, which is obtained by the splitting method. In this method, an initial codevector is set as the average of the entire training sequence. This codevector is then split into two, and the iterative algorithm is run with these two vectors as the initial codebook. The final two codevectors are split into four, and the process is repeated until the desired number of codevectors is obtained. The algorithm is summarized below.

LBG Design Algorithm

1. Given T. Fix \epsilon > 0 to be a "small" number.

2. Let N = 1 and

       c_1^{*} = \frac{1}{M} \sum_{m=1}^{M} x_m.

   Calculate

       D_{\mathrm{ave}}^{*} = \frac{1}{Mk} \sum_{m=1}^{M} \lVert x_m - c_1^{*} \rVert^2.

3. Splitting: For i = 1, 2, \ldots, N, set

       c_i^{(0)} = (1 + \epsilon)\, c_i^{*}, \qquad c_{N+i}^{(0)} = (1 - \epsilon)\, c_i^{*}.

   Set N = 2N.

4. Iteration: Let D_{\mathrm{ave}}^{(0)} = D_{\mathrm{ave}}^{*}. Set the iteration index i = 0.

   i.   For m = 1, 2, \ldots, M, find the minimum value of \lVert x_m - c_n^{(i)} \rVert^2 over all n = 1, 2, \ldots, N. Let n^{*} be the index which achieves the minimum. Set Q(x_m) = c_{n^{*}}^{(i)}.

   ii.  For n = 1, 2, \ldots, N, update the codevector:

            c_n^{(i+1)} = \frac{\sum_{Q(x_m) = c_n^{(i)}} x_m}{\sum_{Q(x_m) = c_n^{(i)}} 1}.

   iii. Set i = i + 1.

   iv.  Calculate

            D_{\mathrm{ave}}^{(i)} = \frac{1}{Mk} \sum_{m=1}^{M} \lVert x_m - Q(x_m) \rVert^2.

   v.   If (D_{\mathrm{ave}}^{(i-1)} - D_{\mathrm{ave}}^{(i)}) / D_{\mathrm{ave}}^{(i-1)} > \epsilon, go back to Step (i).

   vi.  Set D_{\mathrm{ave}}^{*} = D_{\mathrm{ave}}^{(i)}. For n = 1, 2, \ldots, N, set c_n^{*} = c_n^{(i)} as the final codevectors.

5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained.
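The following is a minimal NumPy sketch of the design procedure above, added here only as an illustration. The function name lbg, the default value of eps, and the handling of empty cells (keeping the old codevector) are assumptions, not part of the original algorithm statement.

```python
import numpy as np

def lbg(training, n_codevectors, eps=1e-3):
    """Sketch of LBG codebook design by splitting.

    training: (M, k) array of training vectors.
    n_codevectors: desired codebook size N (a power of 2 in this sketch).
    eps: the "small" number used for both splitting and the stopping rule.
    Returns (codebook, average squared-error distortion per dimension).
    """
    M, k = training.shape

    # Step 2: start with one codevector, the average of the training sequence.
    codebook = training.mean(axis=0, keepdims=True)
    dist = np.mean((training - codebook[0]) ** 2)

    while codebook.shape[0] < n_codevectors:
        # Step 3: splitting -- perturb every codevector into two.
        codebook = np.concatenate([(1 + eps) * codebook, (1 - eps) * codebook])

        # Step 4: iterate the two optimality criteria on the doubled codebook.
        while True:
            # (i) nearest neighbor condition: assign each training vector
            #     to the closest codevector.
            d2 = ((training[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            nearest = d2.argmin(axis=1)

            # (ii) centroid condition: each codevector becomes the average of
            #      the training vectors assigned to it (empty cells keep the
            #      previous codevector in this sketch).
            for n in range(codebook.shape[0]):
                members = training[nearest == n]
                if len(members) > 0:
                    codebook[n] = members.mean(axis=0)

            # (iv)-(v) stop when the relative drop in distortion is small.
            new_dist = np.mean((training - codebook[nearest]) ** 2)
            if dist - new_dist <= eps * dist:
                dist = new_dist
                break
            dist = new_dist

    return codebook, dist
```

A call such as lbg(np.random.randn(4096, 2), 16) would design a codebook for 4096 two-dimensional Gaussian training vectors like those in Section VI below; the codebook size 16 is only an illustrative choice, not one taken from the text.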
VI. Two-Dimensional Animation

Click on the figure above to begin the animation.

• If the animation appears to be stuck, try moving up or down the page in your browser.
• The source for the above is a memoryless Gaussian source with zero mean and unit variance.
• The tiny green dots are training vectors; there are 4096 of them.
• The LBG design algorithm is run on this training sequence.
• The algorithm guarantees only a locally optimal solution.
• The size of the training sequence should be sufficiently large; it is recommended that there be many training vectors per codevector.

VII. Performance

The performance of VQ is typically given in terms of the signal-to-distortion ratio (SDR):

    \mathrm{SDR} = 10 \log_{10} \frac{\sigma_x^2}{D_{\mathrm{ave}}} \quad \text{(in dB)},

where \sigma_x^2 is the variance of the source and D_ave is the average squared-error distortion. The higher the SDR, the better the performance. The following tables show the performance of the LBG-VQ for the memoryless Gaussian source and the first-order Gauss-Markov source with correlation coefficient 0.9. Comparisons are made with the optimal performance theoretically attainable, SDRopt, which is obtained by evaluating the rate-distortion function.

Rate                 SDR (in dB), vector dimension k =                         SDRopt
(bits/dimension)      1     2     3     4     5     6     7     8             (in dB)
1                    4.4   4.4   4.5   4.7   4.8   4.8   4.9   5.0              6.0
2                    9.3   9.6   9.9  10.2  10.3                               12.0
3                   14.6  15.3  15.7                                           18.1
4                   20.2  21.1                                                 24.1
5                   26.0  27.0                                                 30.1

Memoryless Gaussian Source

Rate                 SDR (in dB), vector dimension k =                         SDRopt
(bits/dimension)      1     2     3     4     5     6     7     8             (in dB)
1                    4.4   7.8   9.4  10.2  10.7  11.0  11.4  11.6             13.2
2                    9.3  13.6  15.0  15.8  16.2                               19.3
3                   14.6  19.0  20.6                                           25.3
4                   20.2  24.8                                                 31.3
5                   26.0  30.7                                                 37.3

First-Order Gauss-Markov Source with Correlation Coefficient 0.9
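The SDRopt column can be checked directly from the rate-distortion functions of the two sources; the short derivation below is an added illustration and is not part of the original tables.

    D(R) = \sigma_x^2\, 2^{-2R} \;\Rightarrow\; \mathrm{SDR}_{\mathrm{opt}} = 10 \log_{10} \frac{\sigma_x^2}{D(R)} = 20 R \log_{10} 2 \approx 6.02\, R \ \text{dB}

for the memoryless Gaussian source, giving 6.0, 12.0, 18.1, 24.1, and 30.1 dB at R = 1, ..., 5 bits/dimension. For the first-order Gauss-Markov source with correlation \rho, in the small-distortion region,

    \mathrm{SDR}_{\mathrm{opt}} \approx 6.02\, R + 10 \log_{10} \frac{1}{1 - \rho^2} \ \text{dB},

and with \rho = 0.9 the offset is about 7.2 dB, giving 13.2, 19.3, 25.3, 31.3, and 37.3 dB at R = 1, ..., 5.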
VIII. References

A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
H. Abut, Vector Quantization.
R. M. Gray, "Vector Quantization," IEEE ASSP Magazine, pp. 4-29, April 1984.
Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, pp. 702-710, January 1980.


Image Compression with Vector Quantization
by Ivan-Assen Ivanov
[Programming] April 16, 2001

The famous Moore's law, which states in rough terms that every 18 months the speed of computers doubles, has an evil twin: every 18 months software becomes twice as slow. A similar relationship can be formulated for RAM and game data: no matter how big the memory budget of your next-generation game may seem, your art team can probably fill it up faster than you can say "disk thrashing." The appetite for art megabytes grows faster than the publisher's willingness to raise the minimum platform requirements.

Until we start seeing games with a serious amount of geometry, the greatest slice of the memory pie will belong to textures. Nobody wants to ship a game with small, blurry, obviously tiling textures, and it's up to the programmers to alleviate texture limitations. The hundreds of megabytes of stuff coming from the art quarters must be compressed.

Conventional image-compression algorithms are not very well suited to the specific requirements of art storage in games. They are designed for relatively fast compression, which is not an issue, since art assets are preprocessed offline; their decompression speed, however, leaves much to be desired. Also, it is usually hard to access a specific portion of the image. For fixed textures used in hardware-rendered games, texture compression schemes such as DXTn present a solution; however, for supporting older hardware, for (gasp!) software renderers, and for doing more complicated stuff with textures, they aren't perfect. Sure, you could decompress DXTn in software and process it, but those formats aren't really meant for this; it would probably be quite slow.

There is a better solution in terms of both decompression speed and image quality. Image-compression algorithms based on vector quantization (VQ) techniques have been researched for years. Recently, such algorithms have been implemented in hardware by several graphics chip vendors. Unlike DXTn, VQ decompression is as easy to do in software as it is in hardware, and might be just what you need to slash the memory requirements of your project in half. This article provides an introduction to the field of VQ, presents two algorithms for performing VQ, and goes into the details of a successful real-world application of VQ texture compression.
What Is Vector Quantization?

Strictly speaking, quantization is the procedure of approximating continuous values with discrete ones; in practice, the input values to the quantization procedure are often themselves discrete, but with a much finer resolution than that of the output values. The goal of quantization is usually to produce a more compact representation of the data while maintaining its usefulness for a certain purpose. For example, to store color intensities you can quantize floating-point values in the range [0.0, 1.0] to integer values in the range 0-255, representing them with 8 bits, which is considered a sufficient resolution for many applications dealing with color. In this example, the spacing of possible values is the same over the entire discrete set, so we speak of uniform quantization; often, a nonuniform spacing is more appropriate when better resolution is needed over some part of the range of values. Floating-point number representation is an example of nonuniform quantization: you have about as many possible FP values between 0.1 and 1 as you have between 10 and 100.

Both of these are examples of scalar quantization: the input and output values are scalars, or single numbers. You can do vector quantization (VQ) too, replacing vectors from a continuous (or dense discrete) input set with vectors from a much sparser set (note that here by "vector" we mean an ordered set of N numbers, not just the special case of points in 3D space). For example, if we have the colors of the pixels in an image represented by triples of red, green, and blue intensities in the [0.0, 1.0] range, we could quantize them uniformly by quantizing each of the three intensities to an 8-bit number; this leads us to the traditional 24-bit representation. By quantizing each component of the vector by itself, we gain nothing over standard scalar quantization; however, if we quantize the entire vectors, replacing them with vectors from a carefully chosen sparse nonuniform set and storing just indices into that set, we can get a much more compact representation of the image. This is nothing but the familiar paletted image representation. In VQ literature the "palette," or the set of possible quantized values for the vectors, is called a "codebook," because you need it to "decode" the indices into actual vector values.
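Here is a small NumPy sketch contrasting the two ideas just described: uniform scalar quantization of intensities to 8 bits, and a paletted (codebook-indexed) representation of whole RGB triples. The array names, the random data, and the 16-entry palette are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar quantization: float intensities in [0.0, 1.0] -> 8-bit integers.
intensities = rng.random(10)                      # continuous input values
quantized = np.round(intensities * 255).astype(np.uint8)
restored = quantized / 255.0                      # decode back into [0.0, 1.0]

# Vector quantization of whole RGB triples: replace each pixel with the index
# of its nearest entry in a small codebook (a palette), storing only indices.
pixels = rng.random((100, 3))                     # 100 RGB pixels in [0.0, 1.0]
palette = rng.random((16, 3))                     # illustrative 16-entry codebook

# Nearest codebook entry per pixel (squared Euclidean distance in RGB space).
d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
indices = d2.argmin(axis=1).astype(np.uint8)      # 16 entries -> 4 bits of
                                                  # information per pixel,
                                                  # stored in a uint8 here

decoded = palette[indices]                        # "decode" indices via the codebook
```

Storing one small index per pixel plus the codebook is what makes the representation compact; the resulting quality then depends entirely on how well the codebook is chosen, which is exactly what design algorithms such as LBG address.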
Why Does VQ Work?

It turns out that VQ is a powerful method for lossy compression of data such as sounds or images, because their vector representations often occupy only small fractions of their vector spaces. We can illustrate this distribution in the case of a simple representation of a grayscale image in a 2D vector space. The vectors will be composed by taking the values of adjacent pixels in pairs. If the input image has 256 shades of gray, we can visualize the vector space as the [0,0]-[255,255] square in the plane. We can then take the two components of each vector as XY coordinates and plot a dot for each vector found in the input image.

Figure 2 shows the result of this procedure applied to a grayscale version of the famous "Lena" (Figure 1), a traditional benchmark for image-compression algorithms. The diagonal line along which the density of the input vectors is concentrated is the x = y line; the reason for this clustering is that "Lena," like most photographic images, consists predominantly of smooth gradients. Adjacent pixels from a smooth gradient have similar values, and the corresponding dot on the diagram is close to the x = y line. The areas of the diagram which would represent abrupt intensity changes from one pixel to the next are sparsely populated.

FIGURE 2. Distribution of pairs of adjacent pixels from grayscale Lena.

If we decide to reduce this image to 2 bits/pixel via scalar quantization, this would mean reducing the pixels to four possible values. If we interpret this as VQ on the 2D vector distribution diagram, we get a picture like Figure 3.

FIGURE 3. Scalar quantization to 2 bits/pixel, interpreted as 2D VQ.

The big red dots on the figure represent the 16 evenly spaced possible values of pairs of pixels. Every pair from the input image would be mapped to one of these dots during the quantization. The red lines delimit the "zones of influence," or cells, of the vectors: all vectors inside a cell get quantized to the same codebook vector. Now we see why this quantization is very inefficient: two of the cells are completely empty and four other cells are very sparsely populated. The codebook vectors in the six cells adjacent to the x = y diagonal are shifted away from the density maxima in their cells, which means that the average quantization error in these cells will be unnecessarily high. In other words, six of the 16 possible pairs of pixel values are wasted, six more are not used efficiently, and only four are O.K.

Let's perform an equivalent (in terms of size of the resulting quantized image) vector quantization. Instead of 2 bits/pixel, we'll allocate 4 bits per 2D vector, but now we can take the freedom to place the 16 vectors of the codebook anywhere in the diagram. To minimize the mean quantization error, we'll place all of these vectors inside the dense cloud around the x = y diagonal.

FIGURE 4. Vector quantization to 4 bits per 2D vector.

Figure 4 shows how things look with VQ. As in Figure 3, the codebook vectors are represented as big red dots, and the red lines delimit their zones of influence. (This partitioning of a vector space into cells around a predefined set of "special" vectors, such that for all vectors inside a cell the same "special" vector is closest to them, is called a Voronoi diagram; the cells are called Voronoi cells. You can find a lot of resources on Voronoi diagrams on the Internet, since they have some interesting properties besides being a good illustration of the merits of VQ.)

You can see that in the case of VQ the cells are smaller (that is, the quantization introduces smaller errors) where it matters the most: in the areas of the vector space where the input vectors are dense. No codebook vectors are wasted on unpopulated regions, and inside each cell the codebook vector is optimally placed with respect to the local input vector density.
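The pairing construction above is easy to reproduce. The sketch below forms 2D vectors from horizontally adjacent pixels of a grayscale image and quantizes each pair to its nearest codebook vector, which is exactly the Voronoi-cell assignment of Figures 3 and 4. The random "image" and the random 16-entry codebook are stand-ins for Lena and for a properly trained codebook placed along the dense x = y cloud.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a 256-shade grayscale image (Lena in the article's example).
image = rng.integers(0, 256, size=(256, 256))

# Form 2D vectors from pairs of horizontally adjacent pixels.
pairs = image.reshape(-1, 2).astype(float)        # shape (32768, 2), each row (x, y)

# A 16-entry codebook of 2D codevectors: 4 bits per pair, i.e. 2 bits/pixel.
# Random here; in practice it would come from a design algorithm such as LBG.
codebook = rng.integers(0, 256, size=(16, 2)).astype(float)

# Quantization = nearest codevector; the set of pairs mapped to one codevector
# is its Voronoi cell in the [0,255] x [0,255] square.
d2 = ((pairs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
indices = d2.argmin(axis=1)

# Reconstruct the quantized image from the codebook and the stored indices.
quantized_pairs = codebook[indices]
quantized_image = quantized_pairs.reshape(image.shape).astype(np.uint8)
```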
When you go to higher dimensions (for example, taking 4-tuples of pixels instead of pairs), VQ gets more and more efficient, up to a certain point. How to determine the optimal vector size for a given set of input data is a rather complicated question beyond the scope of this article; basically, to answer it, you need to study the autocorrelation properties of the data. It suffices to say that for images of the type and resolution commonly used in games, four is a good choice for the vector size. For other applications, such as voice compression, vectors of size 40-50 are used.
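To make the "vector size four" remark concrete, here is a small added sketch that gathers non-overlapping 2x2 pixel blocks of a grayscale image into 4-dimensional vectors ready for a codebook design step. Taking 2x2 blocks rather than runs of four horizontal pixels is one reasonable choice and an assumption here, not a detail stated in the text above.

```python
import numpy as np

def blocks_as_vectors(image: np.ndarray, block: int = 2) -> np.ndarray:
    """Gather non-overlapping (block x block) pixel blocks into row vectors.

    For block=2, each 2x2 block of a grayscale image becomes one 4-dimensional
    vector, the kind of training vector a VQ codebook design step would use.
    """
    h, w = image.shape
    h -= h % block            # crop so the image tiles evenly
    w -= w % block
    img = image[:h, :w]
    vectors = (img.reshape(h // block, block, w // block, block)
                  .swapaxes(1, 2)
                  .reshape(-1, block * block))
    return vectors.astype(float)

# Example: a random stand-in image; real use would pass actual texture data.
rng = np.random.default_rng(2)
vecs = blocks_as_vectors(rng.integers(0, 256, size=(64, 64)))
print(vecs.shape)             # (1024, 4): 1024 four-dimensional training vectors
```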