Handbook of Approximation Algorithms and Metaheuristics (Gonzalez, 2007-05-15)

CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES
Series Editor: Sartaj Sahni

PUBLISHED TITLES
ADVERSARIAL REASONING: COMPUTATIONAL APPROACHES TO READING THE OPPONENT'S MIND, Alexander Kott and William M. McEneaney
DISTRIBUTED SENSOR NETWORKS, S. Sitharama Iyengar and Richard R. Brooks
DISTRIBUTED SYSTEMS: AN ALGORITHMIC APPROACH, Sukumar Ghosh
FUNDAMENTALS OF NATURAL COMPUTING: BASIC CONCEPTS, ALGORITHMS, AND APPLICATIONS, Leandro Nunes de Castro
HANDBOOK OF ALGORITHMS FOR WIRELESS NETWORKING AND MOBILE COMPUTING, Azzedine Boukerche
HANDBOOK OF APPROXIMATION ALGORITHMS AND METAHEURISTICS, Teofilo F. Gonzalez
HANDBOOK OF BIOINSPIRED ALGORITHMS AND APPLICATIONS, Stephan Olariu and Albert Y. Zomaya
HANDBOOK OF COMPUTATIONAL MOLECULAR BIOLOGY, Srinivas Aluru
HANDBOOK OF DATA STRUCTURES AND APPLICATIONS, Dinesh P. Mehta and Sartaj Sahni
HANDBOOK OF SCHEDULING: ALGORITHMS, MODELS, AND PERFORMANCE ANALYSIS, Joseph Y.-T. Leung
THE PRACTICAL HANDBOOK OF INTERNET COMPUTING, Munindar P. Singh
SCALABLE AND SECURE INTERNET SERVICES AND ARCHITECTURE, Cheng-Zhong Xu
SPECULATIVE EXECUTION IN HIGH PERFORMANCE COMPUTER ARCHITECTURES, David Kaeli and Pen-Chung Yew

Handbook of Approximation Algorithms and Metaheuristics
Edited by Teofilo F. Gonzalez, University of California, Santa Barbara, USA

Chapman & Hall/CRC, Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

© 2007 by Taylor & Francis Group, LLC. Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business.
No claim to original U.S. Government works. Printed in the United States of America on acid-free paper.
International Standard Book Number-10: 1-58488-550-5 (Hardcover)
International Standard Book Number-13: 978-1-58488-550-4 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Handbook of approximation algorithms and metaheuristics / edited by Teofilo F. Gonzalez.
p. cm. (Chapman & Hall/CRC computer & information science ; 10)
Includes bibliographical references and index.
ISBN-13: 978-1-58488-550-4
ISBN-10: 1-58488-550-5
1. Computer algorithms. 2. Mathematical optimization. I. Gonzalez, Teofilo F. II. Title. III. Series.
QA76.9.A43H36 2007
005.1 dc22    2007002478

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

DEDICATED
To my wife Dorothy, and my children Jeanmarie, Alexis, Julia, Teofilo, and Paolo

Preface

Forty years ago (1966), Ronald L. Graham formally introduced approximation algorithms. The idea was to generate near-optimal solutions to optimization problems that could not be solved efficiently by the computational techniques available at that time. With the advent of the theory of NP-completeness in the early 1970s, the area became more prominent as the need to generate near-optimal solutions for NP-hard optimization problems became the most important avenue for dealing with computational intractability. As it was established in the 1970s, for some problems one can generate near-optimal solutions quickly, while for other problems generating provably good suboptimal solutions is as difficult as generating optimal ones. Other approaches based on probabilistic analysis and randomized algorithms became popular in the 1980s. The introduction of new techniques to solve linear programming problems started a new wave for developing approximation algorithms that matured and saw tremendous growth in the 1990s. To deal, in a practical sense, with the inapproximable problems, a few techniques were introduced in the 1980s and 1990s. These methodologies have been referred to as metaheuristics, and there has been a tremendous amount of research in metaheuristics during the past two decades. During the last 15 or so years approximation algorithms have attracted considerably more attention. This was a result of a stronger inapproximability methodology that could be applied to a wider range of problems, and of the development of new approximation algorithms for problems in traditional and emerging application areas. As we have witnessed, there has been tremendous growth in the field of approximation algorithms and metaheuristics.

The basic methodologies are presented in Parts I–III. Specifically, Part I covers the basic methodologies to design and analyze efficient approximation algorithms for a large class of problems, and to establish inapproximability results for another class of problems. Part II discusses local search, neural networks, and metaheuristics. In Part III, multiobjective problems, sensitivity analysis, and stability are discussed. Parts IV–VI discuss the application of the methodologies to classical problems in combinatorial optimization, computational geometry, and graph problems, as well as to large-scale and emerging applications. The approximation algorithms discussed in the handbook have primary applications in computer science, operations research, computer engineering, applied mathematics, and bioinformatics, as well as in engineering, geography, economics, and other research areas with a quantitative analysis component.

Chapters 1 and 2 present an overview of the field and the handbook. These chapters also cover basic definitions and notation, as well as an introduction to the basic methodologies and inapproximability. Chapters 1–8 discuss methodologies to develop approximation algorithms for a large class of problems.
These methodologies include restriction (of the solution space), greedy methods, relaxation (LP and SDP) and rounding (deterministic and randomized), and primal-dual methods. For a minimization problem P these methodologies provide, for every problem instance I, a solution with objective function value that is at most (1 + ε) · f*(I), where ε is a positive constant (or a function that depends on the instance size) and f*(I) is the optimal solution value for instance I. These algorithms take polynomial time with respect to the size of the instance I being solved. These techniques also apply to maximization problems, but the guarantees are different. Given as input a value for ε and any instance I for a given problem P, an approximation scheme finds a solution with objective function value at most (1 + ε) · f*(I). Chapter 9 discusses techniques that have been used to design approximation schemes. These approximation schemes take polynomial time with respect to the size of the instance I (PTAS). Chapter 10 discusses different methodologies for designing fully polynomial approximation schemes (FPTAS). These schemes take polynomial time with respect to the size of the instance I and 1/ε. Chapters 11–13 discuss asymptotic and randomized approximation schemes, as well as distributed and randomized approximation algorithms. Empirical analysis is covered in Chapter 14, as well as in chapters in Parts IV–VI. Chapters 15–17 discuss performance measures, reductions that preserve approximability, and inapproximability results.

Part II discusses deterministic and stochastic local search as well as very large neighborhood search. Chapters 21 and 22 present reactive search and neural networks. Tabu search, evolutionary computation, simulated annealing, ant colony optimization, and memetic algorithms are covered in Chapters 23–27. In Part III, I discuss multiobjective optimization problems, sensitivity analysis, and stability of approximations.

Part IV covers traditional applications. These applications include bin packing and extensions, packing problems, facility location and dispersion, traveling salesperson and generalizations, Steiner trees, scheduling, planning, generalized assignment, and satisfiability. Computational geometry and graph applications are discussed in Part V. The problems discussed in this part include triangulations, connectivity problems in geometric graphs and networks, dilation and detours, pair decompositions, partitioning (points, grids, graphs, and hypergraphs), maximum planar subgraphs, edge-disjoint paths and unsplittable flow, connectivity problems, communication spanning trees, most vital edges, and metaheuristics for coloring and maximum disjoint paths. Large-scale and emerging applications (Part VI) include chapters on wireless ad hoc networks, sensor networks, topology inference, multicast congestion, QoS multimedia routing, peer-to-peer networks, data broadcasting, bioinformatics, CAD and VLSI applications, game-theoretic approximation, approximating data streams, digital reputation, and color quantization.

Readers who are not familiar with approximation algorithms and metaheuristics should begin with Chapters 1–6, 9–10, 18–21, and 23–27. Experienced researchers will also find useful material in these basic chapters. We have collected in this volume a large amount of this material with the goal of making it as complete as possible.
I apologize in advance for omissions and would like to invite all of you to suggest to me chapters (for future editions of this handbook) to keep up with future developments in the area. I am confident that research in the field of approximation algorithms and metaheuristics will continue to flourish for a few more decades.

Teofilo F. Gonzalez
Santa Barbara, California

About the Cover

The four objects in the bottom part of the cover represent scheduling, bin packing, traveling salesperson, and Steiner tree problems. A large number of approximation algorithms and metaheuristics have been designed for these four fundamental problems and their generalizations. The seven objects in the middle portion of the cover represent the basic methodologies. Of these seven, the object in the top center represents a problem by its solution space. The object to its left represents its solution via restriction, and the one to its right represents relaxation techniques. The objects in the row below represent local search and metaheuristics, problem transformation, rounding, and primal-dual methods. The points in the top portion of the cover represent solutions to a problem, and their height represents their objective function value. For a minimization problem, the possible solutions generated by an approximation scheme are the ones inside the bottommost rectangle. The ones inside the next rectangle represent the solutions generated by a constant-ratio approximation algorithm. The top rectangle represents the possible solutions generated by a polynomial-time algorithm for inapproximable problems (under some complexity-theoretic hypothesis).

About the Editor

Dr. Teofilo F. Gonzalez received the B.S. degree in computer science from the Instituto Tecnológico de Monterrey (1972). He was one of the first handful of students to receive a computer science degree in Mexico. He received his Ph.D. degree from the University of Minnesota, Minneapolis (1975). He has been a member of the faculty at Oklahoma University, Penn State, and the University of Texas at Dallas, and has spent sabbatical leaves at Utrecht University (Netherlands) and the Instituto Tecnológico de Monterrey (ITESM, Mexico). Currently he is professor of computer science at the University of California, Santa Barbara. Professor Gonzalez's main area of research activity is the design and analysis of efficient exact and approximation algorithms for fundamental problems arising in several disciplines. His main research contributions fall in the areas of resource allocation and job scheduling, message dissemination in parallel and distributed computing, computational geometry, graph theory, and VLSI placement and wire routing. His professional activities include chairing conference program committees and membership in journal editorial boards. He has served as an accreditation evaluator and has been a reviewer for numerous journals and conferences, as well as CS programs and funding agencies.

Contributors

Emile Aarts, Philips Research Laboratories, Eindhoven, The Netherlands
Ravindra K. Ahuja, University of Florida, Gainesville, Florida
Enrique Alba, University of Málaga, Málaga, Spain
Christoph Albrecht, Cadence Berkeley Labs, Berkeley, California
Eric Angel, University of Evry Val d'Essonne, Evry, France
Abdullah N. Arslan, University of Vermont, Burlington, Vermont
Giorgio Ausiello, University of Rome "La Sapienza", Rome, Italy
Sudha Balla, University of Connecticut, Storrs, Connecticut
Evripidis Bampis, University of Evry Val d'Essonne, Evry, France
Roberto Battiti, University of Trento, Trento, Italy
Alan A. Bertossi, University of Bologna, Bologna, Italy
Maria J. Blesa, Technical University of Catalonia, Barcelona, Spain
Christian Blum, Technical University of Catalonia, Barcelona, Spain
Hans-Joachim Böckenhauer, Swiss Federal Institute of Technology (ETH) Zürich, Zürich, Switzerland
Vincenzo Bonifaci, University of Rome "La Sapienza", Rome, Italy
Mauro Brunato, University of Trento, Povo, Italy
Gruia Călinescu, Illinois Institute of Technology, Chicago, Illinois
Peter Cappello, University of California, Santa Barbara, Santa Barbara, California
Kun-Mao Chao, National Taiwan University, Taiwan, Republic of China
Danny Z. Chen, University of Notre Dame, Notre Dame, Indiana
Ting Chen, University of Southern California, Los Angeles, California
Marco Chiarandini, University of Southern Denmark, Odense, Denmark
Francis Y. L. Chin, The University of Hong Kong, Hong Kong, China
Christopher James Coakley, University of California, Santa Barbara, Santa Barbara, California
Edward G. Coffman, Jr., Columbia University, New York, New York
Jason Cong, University of California, Los Angeles, California
Carlos Cotta, University of Málaga, Málaga, Spain
János Csirik, University of Szeged, Szeged, Hungary
Artur Czumaj, University of Warwick, Coventry, United Kingdom
Bhaskar DasGupta, University of Illinois at Chicago, Chicago, Illinois
Jaime Davila, University of Connecticut, Storrs, Connecticut
Chapter 86: Color Quantization

FIGURE 86.2: Categorization of color quantization methods (image-independent; image-dependent, context-free; image-dependent, context-sensitive).

... certain statistical criteria; and (3) image-dependent, context-sensitive methods that make use of additional contextual information, beyond original colors and their frequencies, that can be derived from the input image (e.g., the spatial relationship between the pixels) as heuristics to help better restrain visible quantization artifacts.

Many algorithms are essentially color-space-neutral by design, leaving it a separate issue for the pixel colors to be represented in an appropriate color space. However, the RGB color space is frequently used in experimental implementations.

Three common options exist for the mapping of original colors to quantized colors, or more broadly, the assignment of quantized colors to pixels: (1) replace each original color with the closest counterpart in the color map (mostly image-independent methods); (2) replace each original color with a specific quantized color whose association with the original color has already been determined during the construction of the color map (mostly image-dependent methods); and (3) make use of such techniques as error diffusion [6–9] to select the appropriate quantized color for each pixel. The latter option helps to smooth out some of the visible quantization artifacts, with the trade-off being that it may also degrade sharp edges and fine details.¹

¹ Edge detection has been used to suppress error diffusion across edges to preserve image sharpness [28]; edge enhancement may also be incorporated into the error diffusion process [54].
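Error diffusion [6–9] is cited above only in general terms; the minimal sketch below uses the common Floyd–Steinberg weights as one concrete instance of option (3). It is illustrative only: the function names and the assumption of a precomputed k × 3 palette are not from the chapter.

```python
# Illustrative sketch: quantize to a fixed palette with Floyd-Steinberg error diffusion.
import numpy as np

def nearest_palette_color(c, palette):
    """Index of the palette entry closest to color c (Euclidean distance)."""
    return int(np.argmin(np.sum((palette - c) ** 2, axis=1)))

def error_diffuse(image, palette):
    """Quantize an H x W x 3 float image to the given palette, diffusing each pixel's
    quantization error to its yet-unvisited neighbors."""
    img = image.astype(np.float64).copy()
    h, w, _ = img.shape
    out = np.zeros((h, w), dtype=np.int32)          # palette indices
    for y in range(h):
        for x in range(w):
            idx = nearest_palette_color(img[y, x], palette)
            out[y, x] = idx
            err = img[y, x] - palette[idx]
            # Floyd-Steinberg weights: 7/16 right, 3/16 down-left, 5/16 down, 1/16 down-right
            if x + 1 < w:               img[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:     img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:               img[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w: img[y + 1, x + 1] += err * 1 / 16
    return out
```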
86.2 Color Spaces for Quantization

The CIELUV and CIELAB are two prominent candidate spaces that are derivatives of the CIE 1931 XYZ color model, which is device-independent but nonuniform in terms of perceived color differences [10]. Both are defined as nonlinear transformations of XYZ with respect to a reference white point, which may be the standard illuminant D50 for reflective reproduction, with XYZ coordinates (0.9642, 1.0, 0.8249), or D65 for emissive display, with XYZ coordinates (0.9504, 1.0, 1.0889).

The CIELUV or CIE 1976 L*u*v* color space is defined by

$$
L^* = \begin{cases} 116\,(Y/Y_w)^{1/3} - 16, & Y/Y_w > 0.008856 \\ 903.3\,(Y/Y_w), & \text{otherwise} \end{cases}
\qquad
u^* = 13L^*(u' - u'_w), \qquad v^* = 13L^*(v' - v'_w)
$$

where

$$
u' = \frac{4X}{X + 15Y + 3Z}, \quad u'_w = \frac{4X_w}{X_w + 15Y_w + 3Z_w}, \qquad
v' = \frac{9Y}{X + 15Y + 3Z}, \quad v'_w = \frac{9Y_w}{X_w + 15Y_w + 3Z_w}
$$

and $X_w$, $Y_w$, and $Z_w$ are determined from the reference white point.

The CIELAB or CIE 1976 L*a*b* color space is defined by

$$
L^* = \begin{cases} 116\,(Y/Y_w)^{1/3} - 16, & Y/Y_w > 0.008856 \\ 903.3\,(Y/Y_w), & \text{otherwise} \end{cases}
\qquad
a^* = 500\left[f\!\left(\tfrac{X}{X_w}\right) - f\!\left(\tfrac{Y}{Y_w}\right)\right], \qquad
b^* = 200\left[f\!\left(\tfrac{Y}{Y_w}\right) - f\!\left(\tfrac{Z}{Z_w}\right)\right]
$$

where

$$
f(t) = \begin{cases} t^{1/3}, & t > 0.008856 \\ 7.787\,t + \tfrac{16}{116}, & \text{otherwise} \end{cases}
$$

and $X_w$, $Y_w$, and $Z_w$ are determined from the reference white point.

The L* component in both trivariant models is designed to carry luminance, and the remaining two specify chrominance. Although often referred to as being perceptually uniform, these two color spaces still depart from uniformity over the visible gamut, with variations that may reach as high as 6:1 when color differences are measured by the Euclidean distances $\Delta E_{uv} = \sqrt{\Delta L^{*2} + \Delta u^{*2} + \Delta v^{*2}}$ and $\Delta E_{ab} = \sqrt{\Delta L^{*2} + \Delta a^{*2} + \Delta b^{*2}}$ [11,12]. An improved color difference formula was introduced in 1994 [11]:

$$
\Delta E^*_{94} = \sqrt{\left(\frac{\Delta L^*}{k_L S_L}\right)^2 + \left(\frac{\Delta C^*_{ab}}{k_C S_C}\right)^2 + \left(\frac{\Delta H^*_{ab}}{k_H S_H}\right)^2}
$$

which is based on using polar coordinates to address color points in the CIELAB space in terms of perceived lightness $L^*$, chroma $C^*_{ab} = \sqrt{a^{*2} + b^{*2}}$, and hue angle $H_{ab} = \tan^{-1}(b^*/a^*)$. Standard reference values for the formula are $k_L = k_C = k_H = 1$, $S_L = 1$, $S_C = 1 + 0.045\,C^*_{ab}$, and $S_H = 1 + 0.015\,C^*_{ab}$.

The conversion between RGB (assumed to be linear, without γ correction for cathode ray tubes) and XYZ may be carried out by the following standard transformation, with white illuminant D65:

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix} 3.0651 & -1.3942 & -0.4761 \\ -0.9690 & 1.8755 & 0.0415 \\ 0.0679 & -0.2290 & 1.0698 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
$$
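As a concrete reading of the formulas above, here is a minimal sketch that converts a linear RGB color to CIELAB, assuming linear RGB components in [0, 1] and the D65 white point quoted in the text; the XYZ-to-RGB matrix given above is inverted numerically to obtain the RGB-to-XYZ direction. Function names are illustrative.

```python
import numpy as np

XYZ_TO_RGB = np.array([[ 3.0651, -1.3942, -0.4761],
                       [-0.9690,  1.8755,  0.0415],
                       [ 0.0679, -0.2290,  1.0698]])
RGB_TO_XYZ = np.linalg.inv(XYZ_TO_RGB)
WHITE_D65 = np.array([0.9504, 1.0, 1.0889])      # (Xw, Yw, Zw)

def _f(t):
    # piecewise cube-root used by CIELAB
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def rgb_to_lab(rgb):
    """rgb: three linear RGB components in [0, 1] -> (L*, a*, b*)."""
    xyz = RGB_TO_XYZ @ np.asarray(rgb, dtype=float)
    x, y, z = xyz / WHITE_D65                     # normalize by the white point
    L = 116.0 * np.cbrt(y) - 16.0 if y > 0.008856 else 903.3 * y
    a = 500.0 * (_f(x) - _f(y))
    b = 200.0 * (_f(y) - _f(z))
    return L, a, b

# Example: a mid-gray pixel
print(rgb_to_lab([0.5, 0.5, 0.5]))
```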
86.3 Image-Independent Quantization

In contrast with the image-dependent methods that choose quantized colors in an image-specific fashion, quantization may also be carried out on an image-independent basis, with a fixed/universal palette for all images. These image-independent methods enjoy high computational efficiency, since they avoid the need to analyze each original image to determine the quantized colors for that image, and there is no overhead for the storage and transmission of an individualized color map for each image. The trade-off is that a set of quantized colors that is specifically tailored to the distribution of the original colors in a given image tends to do a better job in approximating those original colors and lowering quantization errors.

Uniform quantization. In this approach we preselect k colors that are uniformly distributed in the chosen color space (preferably a perceptually uniform one). A ready example would be the 6 × 6 × 6 browser/web-safe palette, with integer values 0, 51, 102, 153, 204, and 255 for each primary, for a total of 216 RGB colors [13]. The quantization of an image now entails mapping each pixel to a preselected color (e.g., one that is the closest to the pixel's original color).

An easy and fast implementation of uniform quantization involves the truncation of a few least-significant bits from each component of an original color, rounding the original color down to a quantized color. For example, we may truncate 3 bits from each component of a 24-bit RGB color to arrive at its counterpart in a set of 32 × 32 × 32 quantized colors. Alternatively, aiming to better preserve luminance (a key ingredient that conveys details) and taking a hint from the standard formula for computing luminance from RGB values, Y = 0.299R + 0.587G + 0.114B, we may truncate 3 bits from the red component, 2 bits from the green component, and 4 bits from the blue component to partially compensate for the nonuniform nature of the RGB color space. This bit-cutting technique effectively places all quantized colors below the maximum intensity level in each dimension of the color space, and causes a downward shift in intensity (as well as a hue shift) across the entire image. These are often unacceptable when a relatively high number of bits are truncated.
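A minimal sketch of the bit-cutting implementation just described; the 3-2-4 red-green-blue split shown is one luminance-motivated choice and is passed in as a parameter, so other splits (e.g., 3-3-3) can be tried.

```python
import numpy as np

def bit_cut(image, cut=(3, 2, 4)):
    """image: H x W x 3 uint8 array; cut: bits to truncate per R, G, B channel.
    Each component is rounded down to its bucket (the quantized color)."""
    img = image.astype(np.uint8).copy()
    for ch, bits in enumerate(cut):
        mask = (0xFF << bits) & 0xFF     # keep only the surviving high-order bits
        img[..., ch] &= mask
    return img

# Example: quantize a random image to a 32 x 64 x 16 grid of RGB colors
demo = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(bit_cut(demo)[0, 0])
```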
FIGURE 86.3: Trellis-coded quantization.

Trellis-coded quantization. Consider the case of uniform quantization using one byte for the direct encoding of pixel colors, for example, 3-3-2 for red-green-blue; we would have a rather coarse grid of 8 × 8 × 4 quantized colors. Now if we can "extend" the capacity of the limited number of bits used for each primary to specify intensity values at a higher resolution, we will be able to approach the effect of uniform quantization within a finer grid. This is made possible by the application of the Viterbi algorithm [14,15] in trellis-coded quantization [16–18]. Take, for example, the encoding of one of the primaries with x = 3 bits, which normally yields 2^x = 2^3 = 8 intensity levels. In contrast, we may have two color maps (see Figure 86.3), each of which consists of eight equally spaced intensity levels. The values in one map can be obtained by offsetting the values in the other map by half the distance between two adjacent levels. We further partition the intensity values in each map into two subsets. Given a specific map, only x = 3 bits are necessary to identify one of the two subsets (1 bit needed) and the particular intensity level within the chosen subset (x − 1 = 2 bits needed). Operating as a finite state machine, described by a trellis, the algorithm uses the bit that identifies the subset within the current map to determine the next state of the trellis, which in turn determines the choice of color map for the next input bit-string. This approximates the effect of quantizing with 2^{x+1} = 2^4 = 16 intensity levels.

Sampling by Fibonacci lattice. Unlike scalar values on the gray axis, points in a multidimensional color space do not lend themselves to easy manipulation and ordering. In this variation of uniform quantization [19], the universal color palette is constructed by sampling within a series of cross planes along the luminance axis of the CIELAB color space. Each cross plane is a complex plane centered at the luminance axis, and sample points $z_j$ in the plane (with equal luminance) are determined by the Fibonacci spiral lattice (see Figure 86.4):

$$z_j = j^{\delta} e^{i\theta}, \qquad \theta = 2\pi j \tau + \alpha_0$$

where the parameter δ controls the radial distribution of the points (a higher value produces greater dispersion), τ determines the overall pattern of distribution (a Markoff irrational number yields the most uniform distribution), and α₀ denotes an initial angle of rotation to better align the sample points with the boundaries of the color space.

FIGURE 86.4: Points on the Fibonacci spiral lattice.

The values δ = 0.5 and τ = (√5 − 1)/2 (the golden mean) are used in implementation, along with an additional scaling factor to help adjust the sample points' coverage of the color space within each cross plane. To produce a universal palette of a certain size, the number of luminance levels and the number of sample points per level need to be carefully chosen, and the set of luminance values determined through image-dependent quantization of luminance values from a large set of training images. A unique aspect of this sampling method comes from the Fibonacci lattice. Each sample point $z_j$ in the spiral lattice is uniquely determined by its scalar index j, and two neighboring points are always some Fibonacci number apart in their indices. These plus other useful properties of the Fibonacci lattice make the resulting color palette amenable to fast quantization and ordered dither. In addition, a number of gray-scale image processing operations such as gradient-based edge detection can be readily applied to the quantized color images.
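The following sketch generates sample points of the Fibonacci spiral lattice on one constant-luminance cross plane, using the δ, τ, and α₀ described above. The scaling factor and the per-level luminance values would come from training data, so a unit scale is assumed here; the function name is illustrative.

```python
import numpy as np

def fibonacci_lattice_points(n, delta=0.5, tau=(np.sqrt(5.0) - 1.0) / 2.0,
                             alpha0=0.0, scale=1.0):
    """n sample points z_j = j^delta * e^{i(2*pi*j*tau + alpha0)}, returned as
    (a*, b*) chroma coordinates on one cross plane of the CIELAB space."""
    j = np.arange(1, n + 1)
    theta = 2.0 * np.pi * j * tau + alpha0
    r = scale * j ** delta
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

# Example: 60 chroma samples for one luminance level (cf. Figure 86.4)
print(fibonacci_lattice_points(60)[:3])
```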
86.4 Image-Dependent, Context-Free Quantization

Image-dependent, context-free quantization methods select quantized colors based solely on original colors and their frequencies, without regard to the spatial relationship of the pixels and the context of the visual information that the image conveys. A basic strategy shared by numerous quantization algorithms in this category is to proceed in two steps. The first partitions the n original image colors into k disjoint clusters $S_1, S_2, \ldots, S_k$ based on a certain numerical criterion (this makes color quantization a part of the broader area of data clustering [20]); the second computes a representative (i.e., a quantized color) for each cluster. The quantized image may then be constructed by recoloring each pixel with the representative of the cluster that contains the pixel's original color, or with the application of such techniques as error diffusion using the resultant color map.

Intuitively, these methods differ in how they balance two interrelated and competing objectives: the preservation of popular colors versus the minimization of the maximum quantization error (see Figure 86.5). The former may be characterized as achieving an error-free mapping for the k most popular original colors, whereas the latter is the minimization of the upper bound on all $d(c, q_i)$, $1 \le i \le k$, where c is an original color in the i-th cluster $S_i$, $q_i$ the representative of $S_i$, and $d(c, q_i)$ the nonnegative quantization error, typically the Euclidean distance between c and $q_i$. A classic approach to strike a balance between the two objectives is to minimize the sum of squared errors $\sum_{1\le i\le k}\sum_{c\in S_i} P(c)\, d^2(c, q_i)$ across the entire image, where P(c) is the frequency (pixel population) of color c. As an alternative, we may try to minimize the total quantization error $\sum_{1\le i\le k}\sum_{c\in S_i} P(c)\, d(c, q_i)$, which represents a lesser bias towards capping the maximum quantization error. Such statistical criteria can trace their origin to the quantization of a continuous-tone black-and-white signal [21,22], and the associated optimization problems have been proven to be NP-complete [23–26].

FIGURE 86.5: An intuitive scale for comparing statistical criteria (from error-free mapping for popular colors, through minimum total quantization error and minimum sum of squared errors, to minimum maximum quantization error).

Regardless of the operational principle (e.g., limiting the spatial extent of each cluster) for the clustering step of an approximation algorithm, the frequency-weighted mean of the original colors in each resulting cluster is almost always used as the cluster's representative, often called the cluster's centroid but sometimes referred to as the center of gravity (the two notions are equivalent in this context). This reflects a common consensus on minimizing intracluster variance.

The popularity algorithm. This early quantization method aims at the preservation of popular colors [27]. It creates a histogram of the colors in the original image and selects the k most popular ones as quantized colors. Pixels in other colors are simply mapped to the closest quantized colors. The advantage here is that relatively large and similarly colored image areas are kept little changed after quantization; however, smaller areas, some of which may carry crucial information (e.g., a uniquely colored signal light), can take on significant distortion as a result of their being mapped to popular colors. An implementation technique that may alleviate this problem preprocesses the 24-bit original colors by truncating a few least-significant bits from each color component (i.e., performing a uniform quantization), effectively combining several popular colors that are very similar to each other into a single quantized color, thus allowing some of the less popular colors to be selected as quantized colors. This preprocessing step (e.g., 3-3-3 or 3-2-4 bit-cutting for red-green-blue) can also be used to achieve color reduction and to alter color granularity for other algorithms. The downside here is that bit-cutting itself can cause false contours to appear on smoothly shaded surfaces. One may also avoid having several popular colors that are neighbors of each other as quantized colors by choosing one quantized color at a time, and artificially reducing the pixel count of the remaining colors in the vicinity of the chosen color (currently the most popular), with the reduction being based on a spherically symmetric exponential function of the form $1 - e^{Kr}$ (note that this sets the pixel count of the chosen color to 0, so it will never be selected again in subsequent iterations), where r is the radius of the sphere that is centered at the chosen color and K is an experimentally determined constant [28].
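An illustrative sketch of the popularity algorithm as described (not the implementation of Ref. [27]): build the histogram, keep the k most frequent colors, and map every pixel to its nearest palette entry. A bit-cutting preprocessing step, as discussed above, could be applied to the image first; the function name is an assumption.

```python
import numpy as np
from collections import Counter

def popularity_palette(image, k):
    """image: H x W x 3 uint8 array. Returns (palette, index_image)."""
    pixels = image.reshape(-1, 3)
    counts = Counter(map(tuple, pixels))                    # color histogram
    palette = np.array([c for c, _ in counts.most_common(k)], dtype=np.float64)
    # map every pixel to the closest of the k popular colors
    d = np.linalg.norm(pixels[:, None, :].astype(np.float64) - palette[None, :, :], axis=2)
    idx = np.argmin(d, axis=1).reshape(image.shape[:2])
    return palette.astype(np.uint8), idx
```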
Detecting peaks in the histogram. Instead of choosing the k most popular original colors for the color map, we may find peaks in the histogram and use the colors at the peaks as quantized colors. The peaks may be identified by a multiscale clustering scheme based on the discrete wavelet transform (DWT), where computational efficiency comes from carrying out the three-dimensional DWT as a series of independent one-dimensional transforms followed by downsampling [29]. The quantizer can determine the value of k from the number of detected peaks, or it may be adjusted to produce a preset number of quantized colors.

Peano scan. In another technique to lessen the difficulty associated with multidimensional data processing, a recursively defined space-filling curve, referred to as a Peano curve, is used to traverse the space (e.g., the RGB color cube), creating a one-to-one mapping between points in space and their counterparts along the curve. Subject to the spatial relationships that are preserved by the mapping, certain spatially oriented operations may now be carried out along a single dimension. For example, since points close on the Peano curve are also close in space, given a specific color we may easily find some of its neighbors by searching along the curve [30,31]. The shortfall of this approach comes from the fact that points close in space are only likely, but not guaranteed, to be close on the curve.

The median-cut algorithm. This two-step algorithm conducts a hierarchical subdivision of clusters that have high pixel populations, attempting to achieve an even distribution of pixels among the quantized colors [27]. We first fit a rectangular box over an initial cluster containing all original colors, and split the box into two with a plane that is orthogonal to its longest dimension, so as to bisect the cluster in such a way that each new cluster is now responsible for half of the pixel population of the original cluster. The new clusters are then treated the same way repeatedly until we have k clusters. The criterion for selecting the next cluster to split is based on pixel count. By splitting the most popular cluster in each step, the algorithm will eventually produce k clusters, each of which is responsible for roughly 1/k of the image's pixel population. In comparison with the popularity algorithm, this alternative for resource distribution often brings about better quantization results. However, having the same number of pixels mapped to each quantized color does not necessarily lead to effective control of quantization errors.

The center-cut algorithm. As a variation of the median-cut algorithm, this method bisects a cluster at the midpoint of the longest dimension of its bounding box, without regard to pixel population [32]. And it ranks candidate clusters for subdivision based on the longest dimension of their bounding boxes; the longest one is split first. These changes put more emphasis on restraining the spatial extent of the clusters, and do a better job of keeping grossly distinct colors from being grouped into the same cluster and mapped to the same quantized color.
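A compact sketch of the median-cut idea (again illustrative rather than the exact algorithm of Ref. [27]): repeatedly split the most populous box at the weighted median of its longest dimension until k boxes remain, then return the frequency-weighted centroid of each box.

```python
import numpy as np

def median_cut(colors, freqs, k):
    """colors: n x 3 array; freqs: length-n array of pixel counts. Returns k x 3 palette."""
    colors = np.asarray(colors, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    boxes = [np.arange(len(colors))]
    while len(boxes) < k:
        splittable = [i for i in range(len(boxes)) if len(boxes[i]) > 1]
        if not splittable:
            break
        b = max(splittable, key=lambda i: freqs[boxes[i]].sum())   # most pixels
        idx = boxes.pop(b)
        sub = colors[idx]
        dim = np.argmax(sub.max(axis=0) - sub.min(axis=0))         # longest dimension
        order = idx[np.argsort(sub[:, dim])]
        cum = np.cumsum(freqs[order])
        split = int(np.searchsorted(cum, cum[-1] / 2.0)) + 1       # weighted median
        split = min(max(split, 1), len(order) - 1)                 # keep both halves nonempty
        boxes.extend([order[:split], order[split:]])
    # representative of each box: the frequency-weighted mean (centroid)
    return np.array([np.average(colors[b], axis=0, weights=freqs[b]) for b in boxes])
```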
Both the median-cut and the center-cut algorithms take a top-down approach to partitioning a single cluster into k clusters. Alternatively, we may follow a bottom-up strategy that merges the n original colors into the desired number of clusters. To this end the octree data structure [33] can be used to provide a predetermined hierarchical subdivision of the RGB color space for merging clusters.

Octree quantization. With the entire RGB color cube represented by the root node and each octant of the color cube by a child node descending from the root, an individual 24-bit RGB color corresponds to a leaf node at depth 8 of the octree [34]. Conceptually, once we populate an octree with pixel colors from an input image, we may start from the bottom of the octree (greatest depth) and recursively merge leaf nodes that have the same parent into the parent node, transforming the parent node into a leaf node at the next level, until we reduce the number of leaf nodes from n to k. Each remaining leaf node now represents a cluster of original colors that inhabit the spatial extent of the node. In an actual implementation, only an octree structure with no more than k leaf nodes needs to be maintained, where each leaf node has a color accumulator and a pixel counter for the eventual calculation of its centroid. As we scan an original image, the color of each pixel is processed as follows. If the color falls within the spatial extent of an existing leaf node, then add it to the node's color accumulator and increase the node's pixel count by 1. Otherwise, use the color to initialize the color accumulator of a new leaf node and set the node's pixel counter to 1. If this increases the number of leaf nodes to k + 1, merge some of the existing leaf nodes (leaves with greatest depth first) into their parent node, which becomes a new leaf node whose color accumulator takes on the sum of the accumulated color values from the children, and whose pixel counter gets the total of the children's pixel counts.

Note that each splitting operation in median-cut or center-cut is performed either at the median or the midpoint of the longest dimension of a bounding box, and the merging operation in the octree algorithm is along predetermined spatial boundaries. This leaves the possibility of separating color points that are close to each other in a naturally forming cluster into different clusters.

Agglomerative clustering. There are other bottom-up approaches where we start with n clusters, each of which contains one original color, and merge clusters without regard to preset spatial boundaries. In a method that relies on a three-dimensional representation of all 24-bit original colors as well as the clusters they belong to [35], we begin with n initial clusters that have the smallest bounding boxes. By gradually increasing the size limit on bounding boxes (with increments 2, 1, and … for red, green, and blue, respectively, to partially compensate for the nonuniform nature of the RGB color space), we merge neighboring clusters into larger ones to reduce the number of clusters. For a given size limit, we search the vicinity of each existing cluster S in the three-dimensional data structure to find candidates to merge, that is, clusters that can fit into a new bounding box for S that satisfies the size limit. The process terminates when k clusters remain.

In addition to limiting the size of bounding boxes, the criterion for merging clusters may also be based on variance or on the distance between centroids (see below).

Variance-based methods. The two-step top-down or bottom-up methods we have discussed so far decouple the formation of clusters and the computation of a representative for each cluster (the centroid), in the sense that the two steps are designed to achieve different numerical objectives: evenly distributed pixel population or size-restricted bounding boxes for clustering, and minimum variance for selecting cluster representatives after clustering. Several approximation algorithms are devised with variance-based criteria for the clustering step as well.

A K-means algorithm starts with an initial selection of quantized colors $q_1, q_2, \ldots, q_k$, which may simply be evenly spaced points in the color space, or the result of some other algorithm. It then partitions the n original colors into k clusters $S_1, S_2, \ldots, S_k$ such that $c \in S_i$ if $d(c, q_i) \le d(c, q_j)$ for all j, $1 \le j \le k$. After the partition it calculates the centroid of each cluster $S_i$ as the cluster's new representative $q_i'$. The algorithm terminates when the relative reduction in overall quantization error from the previous choice of quantized colors is below a preset threshold. This relative reduction may be defined as $(E - E')/E'$, where the previous overall quantization error is $E = \sum_{1\le i\le k}\sum_{c\in S_i} P(c)\, d^2(c, q_i)$ and the current overall quantization error is $E' = \sum_{1\le i\le k}\sum_{c\in S_i} P(c)\, d^2(c, q_i')$. Otherwise, the algorithm reiterates the partitioning step (followed by the recalculation of centroids) using the newly selected quantized colors. This quantization method is rather time-consuming (with O(nk) work for each iteration), and its convergence at best leads to a locally optimal solution that is influenced by the initial selection of quantized colors [36–38].
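A minimal K-means sketch following the description above: colors and frequencies are given as arrays, the initial palette is supplied by the caller (e.g., evenly spaced colors or another algorithm's output), and iteration stops when the relative error reduction (E − E')/E' drops below a threshold. The function name and defaults are assumptions.

```python
import numpy as np

def kmeans_quantize(colors, freqs, palette, eps=1e-3, max_iter=50):
    """colors: n x 3 array, freqs: length-n array, palette: k x 3 initial colors."""
    colors = np.asarray(colors, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    palette = np.asarray(palette, dtype=float).copy()
    prev_err = np.inf
    for _ in range(max_iter):
        d2 = ((colors[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d2, axis=1)                       # nearest q_i per color
        err = float((freqs * d2[np.arange(len(colors)), assign]).sum())
        if prev_err < np.inf and (prev_err - err) / max(err, 1e-12) < eps:
            break
        prev_err = err
        for i in range(len(palette)):                        # recompute centroids
            members = assign == i
            if members.any():
                palette[i] = np.average(colors[members], axis=0, weights=freqs[members])
            # empty clusters keep their previous representative
    return palette, assign
```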
In one of the bottom-up approaches we merge clusters under the notion of pairwise nearest neighbors [39]. Each iteration of the algorithm entails searching among the current clusters to find two candidates, viz., $S_i$ and $S_j$, that are the closest neighbors, that is, two that, when merged together into $S_{ij} = S_i \cup S_j$, will result in the minimum sum of squared errors for $S_{ij}$: $\sum_{c\in S_{ij}} P(c)\, d^2(c, \mu_{ij})$, where $\mu_{ij}$ is the centroid of $S_{ij}$. A full implementation of this method is rather time-consuming, since it would take at least O(n log n) just for the first iteration. To this end a k-d tree, in which existing clusters (each cluster is spatially located at its centroid) are grouped into buckets (with a roughly equal number of clusters in each bucket), is used to restrict the search for pairwise nearest neighbors to within each bucket (one pair per bucket). The pair that will result in the lowest sum of squared errors is merged first, then the pair in another bucket that yields the second lowest error sum, and so on. The tree is rebalanced to account for the merged clusters when a certain percentage (e.g., 50%) of the identified pairs have been merged.

Another bottom-up method [40] randomly samples the input image for original colors and their frequencies; sorts the list of sampled original colors by their frequencies in ascending order; and merges each color $c_i$, starting from the top of the list (i.e., lowest frequency first), with its nearest neighbor $c_j$, chosen based on a weighted squared Euclidean distance $\frac{P(c_i)P(c_j)}{P(c_i)+P(c_j)}\, d^2(c_i, c_j)$ to favor the merging of pairs of low-frequency colors. Each pair of merged colors is removed from the current list and replaced by $c_{ij} = \frac{P(c_i)\,c_i + P(c_j)\,c_j}{P(c_i)+P(c_j)}$, with $P(c_{ij}) = P(c_i) + P(c_j)$, which will be handled as an ordinary color during the next iteration of sorting and pairwise merging. The algorithm terminates when k colors remain on the list, which are used as the quantized colors.
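The merge step shared by the two bottom-up methods above can be captured by two small helpers. The cost below, $\frac{w_i w_j}{w_i + w_j}\,\lVert\mu_i - \mu_j\rVert^2$, equals the increase in the sum of squared errors caused by merging two clusters with centroids $\mu_i, \mu_j$ and pixel populations $w_i, w_j$; it is exactly the weighted distance used in Ref. [40] (there applied to individual colors with their frequencies) and is closely related to, though not identical with, the pairwise-nearest-neighbor criterion, which minimizes the total sum of squared errors of the merged cluster. The helper names are illustrative.

```python
import numpy as np

def merge_cost(mu_i, w_i, mu_j, w_j):
    """Increase in sum of squared errors if the two clusters are merged."""
    diff = np.asarray(mu_i, dtype=float) - np.asarray(mu_j, dtype=float)
    return (w_i * w_j) / (w_i + w_j) * float(diff @ diff)

def merge(mu_i, w_i, mu_j, w_j):
    """Centroid and pixel population of the merged cluster."""
    w = w_i + w_j
    mu = (w_i * np.asarray(mu_i, dtype=float) + w_j * np.asarray(mu_j, dtype=float)) / w
    return mu, w

# Example: two small clusters on the red axis
print(merge_cost([10, 0, 0], 3, [14, 0, 0], 1), merge([10, 0, 0], 3, [14, 0, 0], 1))
```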
In a couple of approaches that follow the strategy of hierarchical subdivision, we start with a single cluster containing all original colors, and repeatedly partition the cluster S whose sum of squared errors $\sum_{c\in S} P(c)\, d^2(c, \mu)$, also termed the weighted variance, with μ being the centroid of S, is the highest. We move an orthogonal cutting plane along each of the three dimensions of the RGB color cube to search for a position that divides the chosen cluster into two.

One way to determine the orientation and position of the cutting plane is to project the color points in the cluster, bounded by $r_1 \le r \le r_2$, $g_1 \le g \le g_2$, and $b_1 \le b \le b_2$, onto each of the three color axes; find the threshold that minimizes the weighted sum of projected variances of the two intervals adjoining at the threshold for each axis; and run the cutting plane perpendicular to and through the threshold on the axis that gives the minimum sum of projected variances [41]. More specifically, the frequency of a projected point on the r-axis is $P(r, 0, 0) = \sum_{g_1\le g\le g_2}\sum_{b_1\le b\le b_2} P(r, g, b)$. Likewise, we have $P(0, g, 0) = \sum_{r_1\le r\le r_2}\sum_{b_1\le b\le b_2} P(r, g, b)$ for the g-axis and $P(0, 0, b) = \sum_{r_1\le r\le r_2}\sum_{g_1\le g\le g_2} P(r, g, b)$ for the b-axis. Given an axis along with a series of projected points between l and m, a threshold $l < t \le m$ partitions the points into two intervals $[l, t-1]$ and $[t, m]$, with the resulting weighted sum of projected variances being

$$E_t = \sum_{l\le i\le t-1} P_i\,(i - \mu_1)^2 + \sum_{t\le i\le m} P_i\,(i - \mu_2)^2,$$

where $\mu_1$ and $\mu_2$ are the means of the two intervals, respectively, and $P_i = P(i, 0, 0)$, $P(0, i, 0)$, or $P(0, 0, i)$. The optimal threshold value that minimizes $E_t$ lies in the range $[(l + \mu)/2, (\mu + m)/2]$ and maximizes $\frac{w_1}{w_2}(\mu - \mu_1)^2$, where μ is the mean of the projected points in $[l, m]$, and $w_1 = \sum_{l\le i\le t-1} P_i$ and $w_2 = \sum_{t\le i\le m} P_i$ are the weights of the two respective intervals [42].

Another way to determine the cutting plane is to minimize the sum of weighted variances (without projecting points onto the three color axes) on both sides of the plane [43]. A rectangular bounding box is now defined by $r_1 < r \le r_2$, $g_1 < g \le g_2$, and $b_1 < b \le b_2$, and it is denoted by $(c_l, c_m]$, where $c_l = (r_1, g_1, b_1)$ and $c_m = (r_2, g_2, b_2)$. And we define $M_d(c_t) = \sum_{c\in(o, c_t]} c^d P(c)$, with d = 0, 1, 2, $c^0 = 1$, $c^2 = c\,c^T$, and o being a reference point such that $\sum_{c\in(-\infty, o]} P(c) = 0$. We precompute and store $M_d(c)$, d = 0, 1, 2, for each grid point in the RGB space to facilitate efficient computation of the pixel population $w(c_l, c_m]$, mean $\mu(c_l, c_m]$, and weighted variance $E(c_l, c_m]$ of any cluster of image colors bounded by $(c_l, c_m]$:

$$w(c_l, c_m] = \sum_{c\in(c_l, c_m]} P(c), \qquad \mu(c_l, c_m] = \frac{\sum_{c\in(c_l, c_m]} c\,P(c)}{w(c_l, c_m]},$$

$$E(c_l, c_m] = \sum_{c\in(c_l, c_m]} P(c)\, d^2\!\big(c, \mu(c_l, c_m]\big) = \sum_{c\in(c_l, c_m]} c^2 P(c) - \frac{\big(\sum_{c\in(c_l, c_m]} c\,P(c)\big)^2}{w(c_l, c_m]}.$$

The evaluation of these items in O(1) time is made possible by designating the remaining six corners of the bounding box as $c_a = (r_1, g_2, b_1)$, $c_b = (r_2, g_1, b_1)$, $c_c = (r_1, g_1, b_2)$, $c_d = (r_1, g_2, b_2)$, $c_e = (r_2, g_1, b_2)$, and $c_f = (r_2, g_2, b_1)$, and applying the rule of inclusion-exclusion to obtain

$$\sum_{c\in(c_l, c_m]} f(c)P(c) = \Bigg(\sum_{c\in(o, c_m]} + \sum_{c\in(o, c_a]} + \sum_{c\in(o, c_b]} + \sum_{c\in(o, c_c]} - \sum_{c\in(o, c_d]} - \sum_{c\in(o, c_e]} - \sum_{c\in(o, c_f]} - \sum_{c\in(o, c_l]}\Bigg) f(c)P(c)$$

where f(c) may be 1, c, or $c^2$. Furthermore, to determine a cutting plane for $(c_l, c_m]$, we need to minimize $E(c_l, c_t] + E(c_t, c_m]$, with $c_t = (r, g_2, b_2)$, r …
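A minimal sketch of the precomputed-moment idea just described, under two simplifying assumptions: the colors live on a coarse cubic grid (e.g., 32 levels per axis after bit-cutting), and the scalar second moment Σ|c|²P(c) is stored instead of the outer product c cᵀ, which is all the weighted variance needs. The cumulative arrays play the role of the moments M_d, and box_sum applies the inclusion-exclusion rule above; all names are illustrative.

```python
import numpy as np

def build_moments(hist):
    """hist: G x G x G array of pixel counts on a coarse RGB grid.
    Returns cumulative moments (M0, M1, M2) over boxes (o, c_t]."""
    g = hist.shape[0]
    coords = np.stack(np.meshgrid(np.arange(g), np.arange(g), np.arange(g),
                                  indexing="ij"), axis=-1).astype(float)
    M0 = hist.astype(float)                       # sum of P(c)
    M1 = M0[..., None] * coords                   # sum of c * P(c)   (vector per cell)
    M2 = M0 * (coords ** 2).sum(axis=-1)          # sum of |c|^2 * P(c)
    for M in (M0, M1, M2):                        # cumulative sums along r, g, b
        for axis in range(3):
            np.cumsum(M, axis=axis, out=M)
    return M0, M1, M2

def box_sum(M, lo, hi):
    """Inclusion-exclusion sum of M's cell values over the half-open box (lo, hi];
    lo entries of -1 mean the box starts at grid index 0 on that axis."""
    r1, g1, b1 = lo
    r2, g2, b2 = hi
    def S(r, g, b):
        return M[r, g, b] if min(r, g, b) >= 0 else 0.0
    return (S(r2, g2, b2) - S(r1, g2, b2) - S(r2, g1, b2) - S(r2, g2, b1)
            + S(r1, g1, b2) + S(r1, g2, b1) + S(r2, g1, b1) - S(r1, g1, b1))

def box_stats(M0, M1, M2, lo, hi):
    """Pixel population w, mean mu, and weighted variance E of a nonempty box (lo, hi]."""
    w = box_sum(M0, lo, hi)
    s1 = box_sum(M1, lo, hi)
    s2 = box_sum(M2, lo, hi)
    mu = s1 / w
    var = s2 - float(np.dot(s1, s1)) / w
    return w, mu, var
```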
