17 IMAGE SEGMENTATION

Segmentation of an image entails the division or separation of the image into regions of similar attribute. The most basic attribute for segmentation is image luminance amplitude for a monochrome image and color components for a color image. Image edges and texture are also useful attributes for segmentation.

The definition of segmentation adopted in this chapter is deliberately restrictive; no contextual information is utilized in the segmentation. Furthermore, segmentation does not involve classifying each segment. The segmenter only subdivides an image; it does not attempt to recognize the individual segments or their relationships to one another.

There is no theory of image segmentation. As a consequence, no single standard method of image segmentation has emerged. Rather, there is a collection of ad hoc methods that have received some degree of popularity. Because the methods are ad hoc, it would be useful to have some means of assessing their performance. Haralick and Shapiro (1) have established the following qualitative guideline for a good image segmentation: “Regions of an image segmentation should be uniform and homogeneous with respect to some characteristic such as gray tone or texture. Region interiors should be simple and without many small holes. Adjacent regions of a segmentation should have significantly different values with respect to the characteristic on which they are uniform. Boundaries of each segment should be simple, not ragged, and must be spatially accurate.” Unfortunately, no quantitative image segmentation performance metric has been developed.

Several generic methods of image segmentation are described in the following sections. Because of their complexity, it is not feasible to describe all the details of the various algorithms. Surveys of image segmentation methods are given in References 1 to 6.

Digital Image Processing: PIKS Inside, Third Edition. William K.
Pratt Copyright © 2001 John Wiley & Sons, Inc. ISBNs: 0-471-37407-5 (Hardback); 0-471-22132-5 (Electronic)

17.1. AMPLITUDE SEGMENTATION METHODS

This section considers several image segmentation methods based on the thresholding of luminance or color components of an image. An amplitude projection segmentation technique is also discussed.

17.1.1. Bilevel Luminance Thresholding

Many images can be characterized as containing some object of interest of reasonably uniform brightness placed against a background of differing brightness. Typical examples include handwritten and typewritten text, microscope biomedical samples, and airplanes on a runway. For such images, luminance is a distinguishing feature that can be utilized to segment the object from its background. If an object of interest is white against a black background, or vice versa, it is a trivial task to set a midgray threshold to segment the object from the background. Practical problems occur, however, when the observed image is subject to noise and when both the object and background assume some broad range of gray scales. Another frequent difficulty is that the background may be nonuniform.

Figure 17.1-1a shows a digitized typewritten text consisting of dark letters against a lighter background. A gray scale histogram of the text is presented in Figure 17.1-1b. The expected bimodality of the histogram is masked by the relatively large percentage of background pixels. Figure 17.1-1c to e are threshold displays in which all pixels brighter than the threshold are mapped to unity display luminance and all the remaining pixels below the threshold are mapped to the zero level of display luminance. The photographs illustrate a common problem associated with image thresholding. If the threshold is set too low, portions of the letters are deleted (the stem of the letter “p” is fragmented).
Conversely, if the threshold is set too high, object artifacts result (the loop of the letter “e” is filled in).

Several analytic approaches to the setting of a luminance threshold have been proposed (7,8). One method is to set the gray scale threshold at a level such that the cumulative gray scale count matches an a priori assumption of the gray scale probability distribution (9). For example, it may be known that black characters cover 25% of the area of a typewritten page. Thus, the threshold level on the image might be set such that the quartile of pixels with the lowest luminance are judged to be black.

Another approach to luminance threshold selection is to set the threshold at the minimum point of the histogram between its bimodal peaks (10). Determination of the minimum is often difficult because of the jaggedness of the histogram. A solution to this problem is to fit the histogram values between the peaks with some analytic function and then obtain its minimum by differentiation. For example, let y and x represent the histogram ordinate and abscissa, respectively. Then the quadratic curve

$$y = ax^2 + bx + c \tag{17.1-1}$$

where a, b, and c are constants, provides a simple histogram approximation in the vicinity of the histogram valley. Setting $dy/dx = 2ax + b = 0$ shows that the minimum of the histogram valley occurs at $x = -b/(2a)$, provided that $a > 0$. Papamarkos and Gatos (11) have extended this concept for threshold selection.

FIGURE 17.1-1. Luminance thresholding segmentation of typewritten text. (a) Gray scale text. (b) Histogram. (c) High threshold, T = 0.67. (d) Medium threshold, T = 0.50. (e) Low threshold, T = 0.10. (f) Histogram, Laplacian mask.

Weska et al. (12) have suggested the use of a Laplacian operator to aid in luminance threshold selection. As defined in Eq. 15.3-1, the Laplacian forms the spatial second partial derivative of an image.
Consider an image region in the vicinity of an object in which the luminance increases from a low plateau level to a higher plateau level in a smooth ramplike fashion. In the flat regions and along the ramp, the Laplacian is zero. Large positive values of the Laplacian will occur in the transition region from the low plateau to the ramp; large negative values will be produced in the transition from the ramp to the high plateau. A gray scale histogram formed from only those pixels of the original image that lie at coordinates corresponding to very high or very low values of the Laplacian tends to be bimodal with a distinctive valley between the peaks. Figure 17.1-1f shows the histogram of the text image of Figure 17.1-1a after the Laplacian mask operation.

If the background of an image is nonuniform, it is often necessary to adapt the luminance threshold to the mean luminance level (13,14). This can be accomplished by subdividing the image into small blocks and determining the best threshold level for each block by the methods discussed previously. Threshold levels for each pixel may then be determined by interpolation between the block centers. Yankowitz and Bruckstein (15) have proposed an adaptive thresholding method in which a threshold surface is obtained by interpolating an image only at points where its gradient is large.

17.1.2. Multilevel Luminance Thresholding

Effective segmentation can be achieved in some classes of images by a recursive multilevel thresholding method suggested by Tomita et al. (16). In the first stage of the process, the image is thresholded to separate brighter regions from darker regions by locating a minimum between luminance modes of the histogram. Then histograms are formed of each of the segmented parts. If these histograms are not unimodal, the parts are thresholded again. The process continues until the histogram of a part becomes unimodal.
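The recursive procedure just described can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the method of Tomita et al. as published. The function names `valley_threshold` and `multilevel_threshold` and the parameters `bins`, `smooth`, `min_peak_frac`, `valley_ratio`, `max_depth` and `min_pixels` are all assumptions introduced here. So are the unimodality test (fewer than two significant peaks, or a shallow valley) and the assumption that luminance is scaled to the range 0 to 1.

```python
import numpy as np


def valley_threshold(values, bins=64, smooth=5, min_peak_frac=0.05,
                     valley_ratio=0.5):
    """Return a threshold at the valley between the two tallest histogram
    modes, or None if the histogram is judged to be unimodal.

    Assumes luminance values in [0, 1]. All parameter defaults are
    illustrative choices, not values taken from the text.
    """
    hist, edges = np.histogram(values, bins=bins, range=(0.0, 1.0))
    # Box-filter the histogram to reduce its jaggedness before mode search.
    h = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    floor = min_peak_frac * h.max()
    # Interior local maxima; the left edge of a flat top is counted once.
    peaks = [i for i in range(1, bins - 1)
             if h[i] > h[i - 1] and h[i] >= h[i + 1] and h[i] >= floor]
    if len(peaks) < 2:
        return None
    # Keep the two tallest peaks and locate the minimum between them.
    lo, hi = sorted(sorted(peaks, key=lambda i: h[i])[-2:])
    valley = lo + int(np.argmin(h[lo:hi + 1]))
    # A shallow valley is treated as evidence of a single mode.
    if h[valley] > valley_ratio * min(h[lo], h[hi]):
        return None
    return 0.5 * (edges[valley] + edges[valley + 1])


def multilevel_threshold(image, mask=None, label="", max_depth=3,
                         min_pixels=50):
    """Recursively split `image` until each part looks unimodal.

    Returns a dict mapping segment labels ("0", "1", "00", ...) to boolean
    masks; "0" marks the darker part of a split and "1" the brighter part.
    """
    if mask is None:
        mask = np.ones(image.shape, dtype=bool)
    t = None
    if len(label) < max_depth and mask.sum() >= min_pixels:
        t = valley_threshold(image[mask])
    if t is None:
        return {label: mask}  # unimodal (or too small / too deep): a leaf
    segments = {}
    segments.update(multilevel_threshold(
        image, mask & (image <= t), label + "0", max_depth, min_pixels))
    segments.update(multilevel_threshold(
        image, mask & (image > t), label + "1", max_depth, min_pixels))
    return segments
```

Calling `multilevel_threshold(img)` on a two-dimensional array of luminance values returns one boolean mask per leaf segment. The binary labels follow the segment naming used in the figures for this section (Segment 0, Segment 00, Segment 01, and so on). In practice the smoothing width and the valley depth test would need tuning for each class of images.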
Figures 17.1-2 to 17.1-4 provide an example of this form of amplitude segmentation in which the peppers image is segmented into four gray scale segments.

FIGURE 17.1-2. Multilevel luminance thresholding image segmentation of the peppers_mon image; first-level segmentation. (a) Original. (b) Original histogram. (c) Segment 0. (d) Segment 0 histogram. (e) Segment 1. (f) Segment 1 histogram.

FIGURE 17.1-3. Multilevel luminance thresholding image segmentation of the peppers_mon image; second-level segmentation, 0 branch. (a) Segment 00. (b) Segment 00 histogram. (c) Segment 01. (d) Segment 01 histogram.

FIGURE 17.1-4. Multilevel luminance thresholding image segmentation of the peppers_mon image; second-level segmentation, 1 branch. (a) Segment 10. (b) Segment 10 histogram. (c) Segment 11. (d) Segment 11 histogram.

17.1.3. Multilevel Color Component Thresholding

The multilevel luminance thresholding concept can be extended to the segmentation of color and multispectral images. Ohlander et al. (17, 18) have developed a segmentation scheme for natural color images based on multidimensional thresholding of color images represented by their RGB color components, their luma/chroma YIQ components, and by a set of nonstandard color components, loosely called intensity, hue, and saturation. Figure 17.1-5 provides an example of the property histograms of these nine color components for a scene. The histograms have been measured over those parts of the original scene that are relatively devoid of texture: the nonbusy parts of the scene. This important step of the segmentation process is necessary to avoid false segmentation of homogeneous textured regions into many isolated parts. If the property histograms are not all unimodal, an ad hoc procedure is invoked to determine the best property and the best level for thresholding of that property. The first candidate is image intensity. Other candidates are selected on a priority basis, depending on contrast level and location of the histogram modes. After a threshold level has been determined, the image is subdivided into its segmented parts. The procedure is then repeated on each part until the resulting property histograms become unimodal or the segmentation reaches a reasonable stage of separation under manual surveillance. Ohlander's segmentation technique using multidimensional thresholding aided by texture discrimination has proved quite effective in simulation tests. However, a large part of the segmentation control has been performed by a human operator; human judgment, predicated on trial threshold setting results, is required for guidance.

In Ohlander's segmentation method, the nine property values are obviously interdependent. The YIQ and intensity components are linear combinations of RGB; the hue and saturation measurements are nonlinear functions of RGB. This observation raises several questions. What types of linear and nonlinear transformations of RGB are best for segmentation? Ohta et al. (19) suggest an approximation to the spectral Karhunen–Loeve transform. How many property values should be used? What is the best form of property thresholding? Perhaps answers to these last two questions may be forthcoming from a study of clustering techniques in pattern recognition (20). Property value histograms are really the marginal histograms of a joint histogram of property values. Clustering methods can be utilized to specify multidimensional decision boundaries for segmentation. This approach permits utilization of all the property values for segmentation and inherently recognizes their respective cross correlation. The following section discusses clustering methods of image segmentation.

FIGURE 17.1-5.
Typical property histograms for color image segmentation.

17.1.4. Amplitude Projection

Image segments can sometimes be effectively isolated by forming the average amplitude projections of an image along its rows and columns (21,22). The horizontal (row) and vertical (column) projections are defined as

$$H(k) = \frac{1}{J} \sum_{j=1}^{J} F(j,k) \tag{17.1-2}$$

and

$$V(j) = \frac{1}{K} \sum_{k=1}^{K} F(j,k) \tag{17.1-3}$$

Figure 17.1-6 illustrates an application of gray scale projection segmentation of an image. The rectangularly shaped segment can be further delimited by taking projections over oblique angles.

FIGURE 17.1-6. Gray scale projection image segmentation of a toy tank image. (a) Row projection. (b) Original. (c) Segmentation. (d) Column projection.

17.2. CLUSTERING SEGMENTATION METHODS

One of the earliest examples of image segmentation using data clustering, by Haralick and Kelly (23), was the subdivision of multispectral aerial images of agricultural land into regions containing the same type of land cover. The clustering segmentation concept is simple; however, it is usually computationally intensive.

Consider a vector $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ of measurements at each pixel coordinate (j, k) in an image. The measurements could be point multispectral values, point color components, and derived color components, as in the Ohlander approach described previously, or they could be neighborhood feature measurements such as the moving window mean, standard deviation, and mode, as discussed in Section 16.2. If the measurement set is to be effective for image segmentation, data collected at various pixels within a segment of common attribute should be similar. That is, the data should be tightly clustered in an N-dimensional measurement space.
If this condition holds, the segmenter design task becomes one of subdividing the N-dimensional measurement space into mutually exclusive compartments, each of which envelops typical data clusters for each image segment. Figure 17.2-1 illustrates the concept for two features. In the segmentation process, if a measurement vector for a pixel falls within a measurement space compartment, the pixel is assigned the segment name or label of that compartment.

Coleman and Andrews (24) have developed a robust and relatively efficient image segmentation clustering algorithm. Figure 17.2-2 is a flowchart that describes a simplified version of the algorithm for segmentation of monochrome images. The first stage of the algorithm involves feature computation. In one set of experiments, Coleman and Andrews used 12 mode measurements in square windows of size 1, 3, 7, and 15 pixels. The next step in the algorithm is the clustering stage, in which the optimum number of clusters is determined along with the feature space center of each cluster. In the segmenter, a given feature vector is assigned to its closest cluster center.

FIGURE 17.2-1. Data clustering for two feature measurements. Axes: X1, X2. Labels: Class 1, Class 2, Class 3; linear classification boundary.
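The final assignment step, in which each pixel's feature vector is given the label of its closest cluster center, can be illustrated with a short Python and NumPy sketch. It is not the Coleman and Andrews algorithm. The number of clusters is fixed by the caller rather than optimized, the centers are refined by a plain K-means style iteration, and "closest" is taken to mean Euclidean distance. All three are assumptions made here for illustration, as are the function names `assign_to_centers`, `update_centers` and `cluster_segment`.

```python
import numpy as np


def assign_to_centers(features, centers):
    """Label each pixel with the index of its nearest cluster center.

    features: array of shape (J, K, N), one N-vector per pixel.
    centers:  array of shape (C, N), one row per cluster center.
    Euclidean distance is an assumption; the text only says "closest".
    """
    diff = features[:, :, None, :] - centers[None, None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)      # shape (J, K, C)
    return np.argmin(dist2, axis=-1)        # shape (J, K)


def update_centers(features, labels, centers):
    """Move each center to the mean of the feature vectors assigned to it."""
    new_centers = centers.copy()
    for c in range(len(centers)):
        members = features[labels == c]     # shape (M, N)
        if len(members) > 0:                # leave empty clusters unchanged
            new_centers[c] = members.mean(axis=0)
    return new_centers


def cluster_segment(features, initial_centers, iterations=10):
    """Alternate assignment and center update for a fixed number of passes.

    The cluster count is fixed by `initial_centers`; choosing it
    automatically, as the text describes, is outside this sketch.
    """
    features = np.asarray(features, dtype=float)
    centers = np.asarray(initial_centers, dtype=float)
    for _ in range(iterations):
        labels = assign_to_centers(features, centers)
        centers = update_centers(features, labels, centers)
    return assign_to_centers(features, centers), centers
```

For a monochrome image, `features` could hold, for example, moving window statistics stacked along the last axis, and `cluster_segment(features, initial_centers)` then returns a label map the same size as the image together with the refined centers. The broadcasted distance computation trades memory for brevity; for large images or many clusters it would be processed in blocks.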