PhD dissertation: Texture- and Structure-based Image Representation with Applications to Image Retrieval and Compression

DOCUMENT INFORMATION

Basic information

Title: Texture- and Structure-based Image Representation with Applications to Image Retrieval and Compression
Author: Zhihua He
Advisors: Maja Bystrom, Ph.D., W. Clem Karl, Ph.D., Janusz Konrad, Ph.D., Stan Sclaroff, Ph.D.
Institution: Boston University, College of Engineering
Field: Electrical and Computer Engineering
Type: Dissertation
Year: 2007
City: Boston
Pages: 209
Size: 9.01 MB

Structure

  • 3.4 Applications to Texture-Based Retrieval
    • 3.4.1 Non-Rotation-Invariant Texture-Based Retrieval
    • 3.4.2 Rotation-Invariant Texture-Based Retrieval
  • 4.1 Structure Representation Using Directional Projection
    • 4.1.1 DP-SR Framework
    • 4.1.2 Directional Filtering
    • 4.1.3 Directional Projection
    • 4.1.4 Low-Pass Filtering of 1D Profiles
  • 4.2 Application to Sketch-Based Image Retrieval
    • 4.2.1 DP-SR-I: Piece-wise Linear Profile Modeling and Similarity Measurement

Content

Finally, a new multi-scale curve representation framework, the chordlet, is constructed for succinct curve-based image structure representation.

Applications to Texture-Based Retrieval

3.4.1 Non-Rotation-Invariant Texture-Based Retrieval

To give an illustration of an application of the proposed texture representation systems, the image retrieval problem based on textures is considered. The retrieval systems discussed in this chapter consist of two typical stages: first, texture features characterizing the images' visual contents are extracted using the proposed texture representation system; then, the similarities between the query images' and the database images' feature vectors are measured. As discussed in Section 3.2, by using the proposed texture representation systems, the texture features are extracted through statistical modeling in the contourlet domain for non-rotation-invariant representations. The parameters for the distributions of all subbands of an image are used as the texture features, and the Kullback-Leibler distance (KLD) is adopted as the similarity measure between different distributions. Fig. 3-13 gives an illustration of a retrieval system framework employing the models discussed in this chapter.

[Figure 3-13 block diagram: color/luminance components of the query and database images pass through feature extraction (contourlet decomposition into subbands at each scale and orientation, followed by Markov estimation of each subband's distribution statistics), and the Kullback-Leibler distance between feature sets produces the candidate images.]

Figure 3-13: An illustration of a texture retrieval system exploiting contourlet domain-based HMMs. (a) A general retrieval system; N = 1 when only the luminance component is adopted for feature extraction, and N = 3 when color components are adopted. (b) The feature extraction sub-system using the proposed texture representations.

Similar to the practice in (Manjunath et al., 2000; Do and Vetterli, 2002b; Po and Do, 2006), a texture database generated from the MIT VisTex database (MIT Vision and Modeling Group, 2002) is adopted to evaluate the performance of the texture retrieval systems using different contourlet-based HMMs. A group of 163 texture images of size 512 × 512 from the VisTex database was divided into four sub-images each, creating a new texture database with 652 images of size 256 × 256. The four texture images derived from the same original image are considered the same texture, and each can serve as a ground truth for the other three.

Figure 3-14: Sample texture images from the MIT VisTex database.

Considering the size of the texture images, a three-level contourlet decomposition is adopted: 16 directional subbands are generated on both the first and second levels, while 8 directional subbands are generated on the last level. This results in subband images with 512 coefficients on the last level, which is assumed to be adequate for parameter training. A contourlet pyramid with more levels and higher resolutions may be adopted for images of larger size.

Texture features estimated from the HMMs are compared through the KLD, estimated by a Monte-Carlo method:

D(P_q(x) ‖ P_d(x)) ≈ (1/N) Σ_{n=1}^{N} [log P_q(xⁿ) − log P_d(xⁿ)],   (3.7)

where the xⁿ are randomly generated data following the query image distribution P_q(·).
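As a concrete sketch of the Monte-Carlo estimate in (3.7): the helper below draws samples from the query-side model and averages the log-density ratio. The function names and the closing toy check with two unit-variance Gaussians (whose true KLD is 0.5 in closed form) are illustrative, not from the thesis.

```python
import numpy as np

def mc_kld(sample_from_q, log_pq, log_pd, n_samples, rng=None):
    """Monte-Carlo estimate of D(P_q || P_d) = E_q[log P_q(x) - log P_d(x)]."""
    rng = np.random.default_rng(rng)
    x = sample_from_q(n_samples, rng)        # x^n drawn from the query model P_q
    return np.mean(log_pq(x) - log_pd(x))    # (1/N) * sum of log-density ratios

# Toy check: D(N(0,1) || N(1,1)) = (mu_q - mu_d)^2 / 2 = 0.5 in closed form.
log_n01 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_n11 = lambda x: -0.5 * (x - 1)**2 - 0.5 * np.log(2 * np.pi)
est = mc_kld(lambda n, rng: rng.standard_normal(n), log_n01, log_n11,
             n_samples=200_000, rng=0)
```

With 200,000 samples the estimator's standard error is about 0.002 here, so the estimate lands close to the analytic value 0.5.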

In these experiments, each image in the database is used as a query image for retrieval. Three metrics, namely the recall, the precision, and the normalized modified retrieval rank (NMRR), are usually used to evaluate the performance of texture retrieval systems (Long et al., 2003). In this work, the recall is adopted as the definition of the retrieval rate, namely,

R_M = (1/1956) Σ_{i=1}^{652} N_i^M,   (3.8)

where N_i^M is the number of ground-truth texture images retrieved among the M candidate images preserved for the i-th query image. The highest retrieval rate, R_M = 1, is obtained when for each query image the system retrieves the images from the same texture class among the top M candidate images.
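The rate in (3.8) can be computed from a pairwise distance matrix roughly as follows; `retrieval_rate` and its arguments are illustrative names, and in the VisTex setting above Q = 652 with three ground-truth images per query, giving the 1956 in the denominator.

```python
import numpy as np

def retrieval_rate(dist, labels, M):
    """Recall-style retrieval rate in the spirit of Eq. (3.8).

    dist   : (Q, Q) matrix of pairwise feature distances
    labels : length-Q array of texture-class ids
    M      : number of candidate images preserved per query
    """
    Q = len(labels)
    hits, total = 0, 0
    for i in range(Q):
        order = np.argsort(dist[i])
        order = order[order != i][:M]                 # top-M candidates, query excluded
        hits += int((labels[order] == labels[i]).sum())
        total += int((labels == labels[i]).sum()) - 1  # ground truths for this query
    return hits / total
```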

Table 3.1: Brief description of the five proposed HMM-based texture representation systems and of the method in (Po, 2003)

  • HMM using luminance in-band spatial dependencies (Sec. 3.2.1)
  • HMM using luminance in-band and cross-scale dependencies (Sec. 3.2.2)
  • HMM using color in-band dependencies (Sec. 3.2.3)
  • HMM using color cross-scale dependencies (Sec. 3.2.4)
  • Combination of features from color in-band and color cross-scale dependencies (Sec. 3.2.4)
  • HMM using luminance cross-scale dependencies of (Po, 2003)

A brief description of each model is given in Table 3.1, and the retrieval results for the five proposed approaches are listed in Table 3.2 and compared with those of (Po, 2003) and (Han and Ma, 2002). The HMM that exploits cross-scale dependencies is adopted in (Po, 2003); it is included to compare HMMs exploiting different dependencies. Han adopts a fuzzy color histogram method in (Han and Ma, 2002) to perform retrieval based on color information; it is included in the simulations to compare the proposed methods with one exploiting color information in another manner. The number of texture features required for each query in each retrieval system is also included in the last column of Table 3.2; these numbers reflect the storage cost and, in part, the computation cost.

Table 3.2: Retrieval results of the proposed retrieval systems as compared with (Po, 2003)

  Method                               | R5     | R10    | R15    | R20    | Number of features
  Luminance Cross-scale of (Po, 2003)  | 0.7730 | 0.8712 | 0.9080 | 0.9387 | 240
  Luminance Dual                       | 0.8037 | 0.8773 | 0.9202 | 0.9387 | 360
  Color In-band                        | 0.9417 | 0.9678 | 0.9770 | 0.9877 | 240
  Color Cross-scale                    | 0.9417 | 0.9682 | 0.9761 | 0.9850 | 800
  (Han and Ma, 2002)                   | 0.5798 | 0.7117 | 0.7807 | 0.8221 | 64

It is observed that, in general, the color-based models yield higher retrieval rates than the luminance-based models. In particular, when only the top 5 and top 10 candidate images are considered, rates over 10% higher are observed for the color-based approaches. Among the three color-based models, the color dual model gives a slightly higher retrieval rate at the cost of a significantly larger number of features. The color in-band and color cross-scale methods yield similar retrieval rates, with fewer than 1/3 the number of features needed for the former as compared with the latter.

Among the three luminance-based models, the luminance dual model gives slightly better performance than the other two, with more features needed for each query during retrieval. In particular, the luminance in-band model exhibits performance similar to that of the luminance cross-scale model constructed by Po and Do (Po, 2003), with fewer than 1/3 the number of features needed for retrieval. In summary, the color in-band model yields significantly higher retrieval rates than the luminance-based models while requiring the same number of features as the luminance cross-scale model. The luminance in-band model requires the fewest features for retrieval and yields retrieval rates comparable to all luminance-based models.

The top six retrieval results for texture image Bark are presented in Figure 3-15, employing color-based and intensity-based HMMs and the approach proposed by (Po, 2003) Subfigures (a) through (d) showcase retrieval results using the proposed color HMM, leveraging cross-scale and in-band dependencies, and exploiting luminance in-band and cross-scale dependencies Subfigure (e) displays results using the method of (Po, 2003) Each subfigure features the query image on the left and the top six retrieved images on the right.

Figure 3-16: Top six retrieval results using the color in-band model, the luminance in-band model and the luminance cross-scale model of (Po, 2003). (a) Query image Flower with three other flower images generated from the same original image. (b) Retrieval results using the luminance in-band model. (c) Retrieval results using the luminance cross-scale model of (Po, 2003). (d) Retrieval results using the color in-band model. In each case the query image is shown on the left and the top six retrieved images on the right.


Rotation-Invariant Texture-Based Retrieval

Figure 3-17: Sample texture images from the Brodatz texture database

(University of Southern California, Brodatz texture image database, 2004)

The performance of the constructed DHMM-based rotation-invariant texture representation is evaluated in an application to rotation-invariant texture-based retrieval. As described in Section 3.4.1, the Kullback-Leibler distance is used to measure the distance between the texture representations of two images. For simplicity, the coefficient sets {T⁰} and {T¹} are not directly combined to form a single distribution of {T}; instead, the original image is treated as two sub-images, so that f(T⁰) and f(T¹) are treated separately.

To evaluate the performance of the proposed rotation-invariant texture-based retrieval system, the Brodatz texture database (University of Southern California, Brodatz texture image database, 2004) and the MIT VisTex database (MIT Vision and Modeling Group, 2002) are used in simulations. The first set of texture images is generated from thirteen classes of texture images in the Brodatz database; each class has four 64 × 64 texture images generated from the non-overlapping central region of the original (non-rotated) 512 × 512 image, and five rotated 64 × 64 texture images generated from the central region of the original image rotated by angles 30°, 60°, 90°, 120°, and 150°. A total of 117 images from the Brodatz database forms the first part of the simulation dataset. Fig. 3-17 shows some sample original texture images used in the experiments, and Fig. 3-18 shows images formed by rotations of a Brodatz texture image with rotation angles 0°, 30°, 60°, 90°, 120°, and 150°. To include more texture images in the simulation, eighteen classes of texture images from the MIT VisTex database are considered; selecting four texture images from each class yields a total of 72 texture images, each of size 64 × 64, forming the second part of the simulation dataset.

Figure 3-18: An illustration of rotated texture images. From top left to bottom right, the rotation angles are 0° (original image), 30°, 60°, 90°, 120°, and 150°.

In the simulations, the rotated images, a total of 65, are used as query images, and the corresponding images generated from the original images in the Brodatz database are used as the ground-truth images to be retrieved. The inclusion of the images from the MIT VisTex database provides the simulation with a larger dataset to validate the robustness of the retrieval system. The two-level two-orientation steerable pyramid is used in the simulation, and the retrieval rate defined in (3.8) is adopted, which is given here as

R_M = (1/260) Σ_{i=1}^{65} N_i^M,   (3.9)

where N_i^M is the number of successfully retrieved images among the top M images preserved for the i-th query. The proposed directional in-band based rotation-invariant method is tested against the cross-scale based rotation-invariant method (Do and Vetterli, 2002a). Table 3.3 lists the retrieval rates for the two methods when the top 5, 10, 15 and 20 candidate images are preserved. It is observed that both retrieval methods yield high retrieval rates, and the proposed method consistently yields higher retrieval rates than the method of (Do and Vetterli, 2002a), at the cost of additional features.

Table 3.3: Retrieval results of the proposed rotation-invariant retrieval system and that in (Do and Vetterli, 2002a)

  Method                                     | R5     | R10    | R15    | R20    | Number of features
  Method exploiting in-band dependencies     |        |        |        |        |
  Method exploiting cross-scale dependencies | 0.8308 | 0.9538 | 0.9846 | 0.9846 | 6

In this chapter, texture representation frameworks are described. Motivated by (Crouse et al., 1998; Do and Vetterli, 2003; Fan and Xia, 2003; Po, 2003), texture is represented using statistical modeling in a multi-scale directional pyramid. Texture features are extracted by estimating probability density functions of wavelet coefficients in subbands using HMMs. Three Markov dependencies are observed in the multi-scale pyramid, namely cross-scale, cross-band, and directional in-band dependencies.

Leveraging only first-level features in the proposed method has demonstrated superior retrieval rates in experiments Despite the prominence of cross-scale and cross-band dependencies in HMM-based retrieval, this study emphasizes the importance of directional in-band dependencies in HMM construction for texture representation The proposed texture representations are evaluated for texture-based retrieval, and Table 3.2 compares their performance to other methods.

The work on the rotation-invariant texture representation is motivated by (Do and Vetterli, 2002a). In (Do and Vetterli, 2002a) the cross-scale and cross-band dependencies are used for the HMM construction, while in this work the directional in-band and cross-band dependencies are used. The performance of the rotation-invariant texture representation applied to texture-based retrieval is evaluated, and the results are listed in Table 3.3. The discussion of the retrieval results in Sections 3.4.1 and 3.4.2 shows that the HMM-based systems exploiting in-band directional dependencies generally outperform the HMM-based methods exploiting only cross-scale dependencies. The intuition behind this observation is as follows:

The texture representation systems in (Do and Vetterli, 2003; Fan and Xia, 2003; Po, 2003) and the ones developed in this chapter make three assumptions:

  • First, texture can be compared through characterization of the distributions of subbands in a multi-scale decomposition. This is supported by physiological studies of texture perception, which reveal that human observers find it difficult to discriminate between two texture images if the distributions of each subband in a wavelet pyramid are similar (Bergen and Adelson, 1991).
  • Second, the distribution of a subband can be sufficiently estimated from the coefficients within the subband, and approximated by the histogram of the subband. When Gaussian mixture models are used, as in many HMM-based schemes, the histogram of a subband can be approximated by a sum of weighted Gaussians.
  • Third, the estimation of Gaussian mixture models through the EM algorithm is robust.

The second and third assumptions are based on the first assumption. While the first assumption is mainly true for homogeneous textures, there are cases where two visually different textures can share similar distributions, and two visually similar textures may have distributions that differ. To compensate for these circumstances, we need to take additional properties of the human visual system into consideration. Physiological studies suggest that the HVS is localized, oriented, and bandpass, and that human observers are sensitive to edges with different orientations (Valois and Valois, 1988). While the multi-scale directional pyramid naturally provides a framework which is localized, oriented and bandpass, the spatial-oriented dependencies in each subband, in particular, provide information on local edge orientations. Thus, with the inclusion of spatial Markov dependencies in the construction of HMMs, the extracted features better fit the properties of the HVS and can provide a more robust and accurate texture representation.

Given the discussion of the roles of different Markov dependencies in the construction of HMMs for texture representation, we developed three additional representation systems that include color information in the construction of HMMs. The features extracted from these color-based HMMs carry both texture and color information. Experimental results for the texture-based retrieval application given in Table 3.2 show that a retrieval system based on a combination of in-band dependencies and color information yields good performance in terms of both retrieval rate and the number of features required for retrieval. While color features can be separately extracted for retrieval applications like color matching, in our work we illustrate how a straightforward extension of a luminance-based texture representation system to employ color increases performance.

We further show that a slight improvement in rotation-invariant image retrieval can be obtained at the cost of additional features. However, the number of features required for an HMM exploiting in-band dependencies is so small that this results in negligible additional computation cost over the previous method (Do and Vetterli, 2002a).

In Chapter 3, a framework for representing textures is proposed, along with its application to retrieving textures Texture features provide information about the properties and contents of objects, while structural features capture their shapes and spatial relationships To represent structures, a directional-projection-based framework known as DP-SR is constructed.

To clarify the use of the term structure: the directional projection based structure representation described in this chapter is based on the analysis of line edges in an image, where the strengths and spatial locations of line edges convey shape and spatial-relationship information. It is a class of local yet global features: local because edge segments are the building blocks for structure features and no explicit geometrical shape information is incorporated, but also global because of the projection operation in the system, so that information with respect to more than one object may be included in the structure features. As compared to the curve representation framework, the chordlet, given in Chapter 5, the DP-SR framework focuses more on global structures, yielding a more compact representation by discarding some spatial relationships between structures.

Among the many projection-based structure representation systems, the Hough transform is widely used to capture shape information through line integrals/projections (Tipwai and Madarasmi, 2002; Franti et al., 2000; Fung et al., 1996). The Hough transform performs direct integration over all possible lines in an image, with each point on the integration lines treated equally. Kadyrov and Petrou (Kadyrov and Petrou, 2001) proposed the trace transform as a generalization of the Hough transform, where selected functionals along the straight lines are utilized to characterize the shape information. A related method with applications to image retrieval uses histograms to represent edge segments in the spatial domain (Abdel-Mottaleb, 2000).

In the proposed DP-SR framework, the structure features are extracted from the directional projections within a directional multi-scale pyramid, the contourlet. Two structure-representation methods, DP-SR-I and DP-SR-II, are constructed in Sections 4.2.1 and 4.2.2, based on the DP-SR framework. The key difference between the two systems is that while DP-SR-I adopts a piece-wise linear approximation for the projection profiles, DP-SR-II exploits a non-linear Gaussian mixture approximation for each projection profile. For structure-based retrieval applications, two different similarity measurements are used for the two systems. Experimental results are presented in Section 4.2.3, followed by conclusions in Section 4.3.

Structure Representation Using Directional Projection

DP-SR Framework

In the DP-SR framework, an image is first decomposed into directional subbands through a contourlet decomposition, via the multi-scale directional filter bank constructed by Do and Vetterli (Do and Vetterli, 2003) (details on the contourlet are given in Section 2.5.4). Through this transform, edges with different orientations are captured in different subbands. By projecting each subband onto its principal and orthogonal axes, edges with different orientations, locations and strengths in an image are reduced to a set of 1D projection profiles. Fig. 4-1 gives an illustration of the DP-SR framework. This framework consists of four components, namely multi-scale directional filtering, directional projection to form 1D profiles, low-pass filtering of the 1D profiles, and profile modeling. The application of DP-SR to retrieval systems is illustrated in Fig. 4-2. Different profile modelings and similarity measurements are employed for DP-SR-I and DP-SR-II; these are discussed in detail in Sections 4.2.1 and 4.2.2.

[Figure 4-1 block diagram of the DP-SR framework: input image → directional filter banks → subband images → 1D projection → profiles → low-pass filtering → smooth profiles → profile modeling → shape features.]

Figure 4-2: A retrieval system based on DP-SR

Directional Filtering

In the DP-SR framework, the contourlet transform, an efficient iterative multi-scale directional filter bank (Do and Vetterli, 2003), is used to decompose each sketch/image edge map into local directional expansions, so that edges with different orientations are captured in different directional subbands. The discussion in Chapter 2 shows that the contourlet transform provides an efficient representation for images whose edges can be approximated by curves u(v) ∈ C² (Do, 2001),¹ which is an appropriate assumption for natural images. Fig. 4-3(b) gives an example of the horizontal subband of the edge map (Fig. 4-3(c)) of the image in Fig. 4-3(a).

¹Using the Taylor series expansion u(v) ≈ u(0) + u′(0)v + (u″(0)/2)v², when v ≈ 0.

Figure 4-3: (a) The original image. (b) Its horizontal subband, highlighting horizontal features. (c) The edge map, identifying image boundaries. (d) A sketch, giving a simplified representation of the image's structure.

Directional Projection

To efficiently capture shape information at different orientations, projections along each subband's principal and orthogonal directions are performed to yield a set of profile pairs. For example, given a vertical subband image, projections along the vertical (principal) direction and the horizontal (orthogonal) direction are performed, yielding a pair of projection profiles. This process captures the locations and lengths of edges at different orientations. Fig. 4-3(b) shows an example of the projection of the horizontal subband in the horizontal and vertical directions. These profiles are also shown in Figs. 4-4(a) and (b). It is observed that the vertical locations and strengths of edges are mainly captured by the location and amplitude of peaks in Fig. 4-4(a), the principal projection, while the strengths and relative horizontal spatial relationships of the edges are characterized in the orthogonal projection profile of Fig. 4-4(b).
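The projection step itself can be sketched with plain array sums; using coefficient magnitudes and treating the two array axes as the principal/orthogonal directions is an assumption made here for illustration, not the thesis's exact operator.

```python
import numpy as np

def project_subband(subband, principal_axis=0):
    """Reduce a 2-D directional subband to a pair of 1-D profiles.

    Summing coefficient magnitudes across one axis gives the profile along
    the other: peak locations then encode edge positions, and peak
    amplitudes encode edge strengths.
    """
    mag = np.abs(subband)
    principal = mag.sum(axis=1 - principal_axis)   # profile along the principal axis
    orthogonal = mag.sum(axis=principal_axis)      # profile along the orthogonal axis
    return principal, orthogonal

# A single horizontal edge on row 5 of a toy 16 x 16 subband:
img = np.zeros((16, 16))
img[5, 2:14] = 1.0
p, o = project_subband(img)
```

In this toy case the principal profile peaks exactly at the edge's row index, with an amplitude equal to the edge's length.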

Each profile can be represented as a mixture of peaks with different amplitudes and widths at different locations. Given an edge e_θ(j) with orientation θ in the original image, the location and strength of e_θ(j) are characterized by the location and amplitude of the j-th peak p¹_θ(j) in the principal projection profile of the subband with orientation θ (the superscript "1" of p¹_θ(j) indicates a peak in the principal profile). The relative spatial relationship between two edges e_θ(m) and e_θ(n) can be represented by d_θ(m,n) and d_{θ+π/2}(m,n), the displacements along the principal direction θ and the orthogonal direction θ + π/2, respectively. The distance between two peaks p¹_θ(m) and p¹_θ(n) characterizes d_θ(m,n), and d_{θ+π/2}(m,n) is embedded in the peaks p²_θ of the orthogonal projection profile.

Figure 4-4: Image profiles of Fig. 4-3(b). (a) Profile 1: projection along the principal direction (0°) of the vertical subband. (b) Profile 2: projection along the orthogonal direction (90°) of the vertical subband. (c) Low-pass filtered version of (a). (d) Low-pass filtered version of (b). (e) Peak-valley approximation of (a). (f) Peak-valley approximation of (b).

Low-Pass Filtering of 1D Profiles

Edge isolation through directional filtering is not ideal; thus, any edges not in the direction of a subband contribute noise to the 1D profiles. In the DP-SR framework, a two-stage low-pass filtering is adopted. First, a Gaussian low-pass filter is employed to suppress noise, and then a fuzzy median filter (Nie and Barner, 2002) is adopted to further remove noise while preserving the true profile peaks. Figs. 4-4(c) and (d) show the filtered versions of the profiles of Figs. 4-4(a) and (b). It is observed that the filtered profiles are much smoother, and the peaks in each profile, which preserve the edge structure information, are more easily identified.
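A minimal sketch of the two-stage smoothing, substituting a plain running median for the fuzzy median filter of (Nie and Barner, 2002); the kernel width and window size are illustrative choices, not values from the thesis.

```python
import numpy as np

def smooth_profile(profile, sigma=2.0, med_size=5):
    """Stage 1: Gaussian low-pass to suppress broadband noise.
    Stage 2: running median (odd window) to remove residual spikes while
    keeping the true peaks."""
    # Gaussian kernel truncated at 3 sigma, normalised to unit sum.
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    stage1 = np.convolve(profile, k, mode="same")
    # Centred running median; edges are padded by reflection.
    pad = med_size // 2
    padded = np.pad(stage1, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, med_size)
    return np.median(windows, axis=1)
```

Applied to a broad peak plus a narrow noise spike, the Gaussian stage spreads the spike out and the median stage suppresses what remains, leaving the true peak dominant.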

Application to Sketch-Based Image Retrieval

In the application of DP-SR to sketch-based image retrieval, as shown in Fig. 4-2, both DP-SR-I and DP-SR-II consist of two parts: the DP-SR framework to capture structure features, and a similarity measure to compute the distance between the structures. The query image sketch and the edge map of an image in a database are first separately passed through the DP-SR framework; then the shape features of the two sets of resulting profiles are compared to obtain similarity values. The profile modelings and similarity measurements for DP-SR-I and DP-SR-II are addressed in the following sections.

4.2.1 DP-SR-I: Piece-wise Linear Profile Modeling and Similarity Measurement

The first three components in DP-SR-I have been described above; the profile modeling and the similarity measurement are presented below.

• Piece-wise linear peak-valley approximation of 1D profile

Shape structure information is embedded in each projection profile. The discussion in Section 4.1.1 on the relationship between the profile peaks and edge structure information suggests that a good peak-valley approximation of projection profiles can characterize shape information. In DP-SR-I, a piece-wise linear fit is adopted for the peak-valley approximation, where only the locations and amplitudes of peaks and valleys are preserved. To detect peaks and valleys, the zero-crossing points of the derivative of the profile are first detected. The points with positive right derivatives are valleys, and the others are regarded as peaks. Each peak with its two neighboring valleys describes one true peak in the original profile. While most of the noise is removed through filtering of the profiles, there are cases where peaks with small height and small extent are detected. These have no correspondence to the shape structure information in the original image, and thus need to be discarded.

This piece-wise linear peak-valley approximation is compact and convenient both for the representation of edges and for the computation of the similarity metrics. Figs. 4-4(e) and (f) give the peak-valley approximations of the profiles of Figs. 4-4(a) and (b).
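The zero-crossing detection just described can be sketched as follows; the relative height threshold and the use of the mean valley level as a baseline are illustrative choices for discarding small noise peaks, not values from the thesis.

```python
import numpy as np

def peak_valley(profile, min_height=0.05):
    """Detect peaks and valleys as zero-crossings of the discrete derivative.

    A crossing with a positive right-hand derivative is a valley; the
    others are peaks. Peaks rising less than min_height * max(profile)
    above the mean valley level are discarded as noise.
    """
    d = np.diff(profile)
    sign = np.sign(d)
    idx = np.where(sign[:-1] != sign[1:])[0] + 1   # zero-crossings of the derivative
    valleys = [i for i in idx if d[i] > 0]         # positive right derivative
    peaks = [i for i in idx if d[i] <= 0]
    base = profile[valleys].mean() if valleys else profile.min()
    peaks = [i for i in peaks if profile[i] - base >= min_height * profile.max()]
    return peaks, valleys
```

On a profile with one strong peak and one faint one, the faint peak falls below the threshold and is dropped, which mirrors the discarding of small spurious peaks described above.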

• Similarity measurement

The shape features of the query image sketch are compared with those of each candidate image in the database. The similarity between two images is defined as the distance between the locations and strengths of the aligned edges from the two images, a measure consistent with properties of the human visual system. First, the correspondence between the edges in the two images needs to be determined. In the feature space of the DP-SR-I system, one needs to determine the correspondence between the shape features, i.e., the locations of peaks and valleys. The profile with the smaller number of peaks is used as an anchor, a, and the target profile, t, is aligned with the anchor by minimizing the Euclidean distance between the locations of peaks in the anchor and those in the target profile.

To determine the optimal alignment between an anchor profile and a target profile, equation (4.1) is employed. This equation minimizes the distance between the k-th peak of the anchor profile, denoted L_a(k), and the corresponding k-th peak selected from a subset of the target profile, denoted L_t(k). The objective is to find, over all K peaks in the anchor profile, the alignment yielding the smallest distance.

In this work, an exhaustive search is used to determine {L_t}.² Note that each set of locations and heights for a given profile is first normalized by the largest location and height, respectively. The unmatched peaks in the non-anchor profile, whose total area is denoted as s, also provide information about the distance between two profiles.
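The exhaustive-search alignment can be sketched as follows (an illustrative implementation; function and variable names are our own, and peak locations are assumed already normalized):

```python
from itertools import combinations
import numpy as np

def align_peaks(anchor_locs, target_locs):
    """Exhaustively match the K anchor peaks to an ordered size-K subset of
    the target peaks, minimizing the Euclidean distance between matched
    locations.  Returns the chosen target indices, the unmatched target
    indices (whose peak areas contribute the term s), and the distance.
    """
    K = len(anchor_locs)
    a = np.asarray(anchor_locs, dtype=float)
    best_subset, best_dist = None, np.inf
    # combinations() preserves order, so the k-th anchor peak is always
    # matched with the k-th selected target peak
    for subset in combinations(range(len(target_locs)), K):
        t = np.asarray([target_locs[i] for i in subset], dtype=float)
        dist = float(np.linalg.norm(a - t))
        if dist < best_dist:
            best_dist, best_subset = dist, subset
    unmatched = [i for i in range(len(target_locs)) if i not in best_subset]
    return best_subset, unmatched, best_dist
```

As the footnote suggests, dynamic programming could replace this exhaustive search when the number of peaks grows.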

Given matched profiles, a distance metric for the j-th projection of the i-th subband is defined as

E_ij = ((3 − j) · L_P + 1) × (L_V + 1) × (H_P + 1) × (H_V + 1) × (s + 1) − 1.

The parameters L_P, L_V, H_P and H_V are the differences between the locations of peaks, the locations of valleys, the heights of peaks and the heights of valleys of the two matched profiles, L_a and L_t, respectively. The final similarity measure for the i-th subband is E_i = w_1 · E_{i1} + E_{i2}. Because the locations of edges are more salient and robust than the strengths of edges in sketch-based image retrieval, a higher weight w_1 = 2 is experimentally determined for the profile along the principal direction, namely E_{i1}.

Finally, the distances from the subbands are combined to yield the final distortion value E = Σ_{i=1}^{⌈αN⌉} Ẽ_i, where the subband feature distances E_i are sorted in ascending order to form the ordered set Ẽ_i, and N is the total number of directional subbands. Only a percentage, α, of the lowest distances is used, where α is experimentally determined to be α = 0.75.
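The per-profile distance and the α-trimmed subband combination can be sketched as follows (the product form follows our reading of the garbled E_ij formula, so treat it as an assumption about the original notation; j = 1 denotes the principal projection):

```python
import numpy as np

def subband_distance(Lp, Lv, Hp, Hv, s, j):
    """Distance for the j-th projection of a subband: differences in
    peak/valley locations (Lp, Lv) and heights (Hp, Hv), plus the
    unmatched-peak area s, combined multiplicatively.  The (3 - j)
    factor weighs the principal projection (j = 1) more heavily."""
    return ((3 - j) * Lp + 1) * (Lv + 1) * (Hp + 1) * (Hv + 1) * (s + 1) - 1

def combine_subbands(E, alpha=0.75):
    """Sort the subband distances and sum only the lowest alpha fraction,
    discarding the largest (least reliable) distances."""
    E_sorted = np.sort(np.asarray(E, dtype=float))
    keep = int(np.ceil(alpha * len(E_sorted)))
    return float(E_sorted[:keep].sum())
```

With identical matched profiles (all differences and s equal to zero) the distance reduces to zero, as one would expect of a distortion measure.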

4.2.2 DP-SR-II: Non-linear Gaussian Mixture Profile Modeling and Similarity Measurement

The shape structure information embedded in the directional profiles is characterized by the location, amplitude and width of the peaks in each profile. While the piece-wise linear approximation of the peaks and valleys of the previous section is easy to implement, a more accurate profile approximation scheme may better capture shape information. Similarly, even though the comparison based on the Euclidean distance between different peaks and valleys, as adopted in DP-SR-I, is straightforward, it captures the variations between different users' sketches and an original image only deterministically; such variations may instead be more gracefully described by probabilistic models.

²Note that a dynamic programming method might be adopted to reduce the computation cost.

Based on these observations, we describe a second directional projection-based retrieval system, DP-SR-II, which inherits the DP-SR framework, as shown in Fig. 4-1. Compared with DP-SR-I, a different profile modeling scheme is adopted for the shape descriptor, followed by a different similarity measurement in which a probabilistic model is used. The details are discussed below.

• Non-linear Gaussian mixture approximation of 1D profiles

Following the same steps as in DP-SR-I, the sketch/edge-map is first passed through a contourlet decomposition. Then projections are formed and smoothed, yielding a set of smoothed directional profiles.
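The projection-and-smoothing step can be illustrated for an axis-aligned subband (a simplification: the actual system projects each directional subband along its own orientation, and the moving-average window is a hypothetical choice standing in for the low-pass filtering of the profiles):

```python
import numpy as np

def project_and_smooth(subband, axis=0, window=5):
    """Project a subband onto an axis by summing coefficient magnitudes,
    then smooth the resulting 1D profile with a moving average (a simple
    stand-in for the profile low-pass filtering)."""
    profile = np.abs(subband).sum(axis=axis)
    kernel = np.ones(window) / window
    return np.convolve(profile, kernel, mode='same')
```

An isolated vertical edge in the subband yields a single broadened peak in the smoothed profile, which is exactly the structure the later peak modeling exploits.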

The shape structure information is embedded in each smoothed profile. More specifically, the location, amplitude and width of each peak in the profile give the structure information. While earlier a piece-wise linear approximation was adopted for representation of the projection profile, now a mixture model is used. A projection profile Γ_θ(x) for the subband with orientation θ can be approximated by a Gaussian mixture

Γ_θ(x) ≈ G(x) = Σ_{j=1}^{n} g_j(x) = Σ_{j=1}^{n} (A_j / √(2π σ_j²)) exp( −(x − m_j)² / (2σ_j²) ),   (4.2)

where each of the n peaks, p_θ(j), in Γ_θ(x) is characterized by a Gaussian function g_j(x). The adoption of a Gaussian mixture for the approximation of profiles preserves the smooth contour of the profile in a compact parameter set {A_j, σ_j², m_j}. More importantly, for an edge e_θ with orientation θ in a given image, each point on the edge has the same probability of falling in the center region of e_θ or in the side regions. Assuming the location of each point on an edge follows a Gaussian distribution, the principal projection profile of the subband with orientation θ, which can be viewed as a histogram of the locations of the points of edges with orientation θ at different locations, can thus be approximated by a Gaussian mixture. While this argument does not necessarily hold for the orthogonal profiles, a Gaussian mixture provides a good approximation of those profiles as well, where each Gaussian component corresponds to a peak in the profile. On the other hand, the Gaussian mixture representation can be viewed as an approximation of a Gabor expansion.

The shape descriptor in DP-SR-II adopts nonlinear fitting, where each profile is approximated by a Gaussian mixture. Three sets of features, A_j, σ_j² and m_j, denoted as {f_i, i = 1, 2, 3}, are extracted for each profile; they correspond to the amplitude, width and location of each peak in the profile. Thus, the shape structure information is embedded in these features, as suggested by the relationship between the edge structure and the profile peaks. In the construction of the similarity measurement in DP-SR-II, the variance between different users' sketches is considered.
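The non-linear fit of a profile to a Gaussian mixture (Eq. 4.2) can be sketched with SciPy's least-squares fitter, assuming n = 2 components and initial guesses taken from detected peak locations and heights (an illustration under these assumptions, not the dissertation's implementation):

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussian(x, a1, m1, s1, a2, m2, s2):
    """A two-component Gaussian mixture g1 + g2 (n = 2 in Eq. 4.2),
    with amplitudes a, locations m, and widths s per component."""
    g1 = a1 / np.sqrt(2 * np.pi * s1 ** 2) * np.exp(-(x - m1) ** 2 / (2 * s1 ** 2))
    g2 = a2 / np.sqrt(2 * np.pi * s2 ** 2) * np.exp(-(x - m2) ** 2 / (2 * s2 ** 2))
    return g1 + g2

def fit_profile(x, profile, init):
    """Non-linear least-squares fit of the smoothed profile to the
    mixture; `init` holds initial guesses (e.g. from detected peaks)."""
    params, _ = curve_fit(two_gaussian, x, profile, p0=init)
    return params
```

The fitted parameters directly populate the feature sets {A_j, σ_j², m_j} used by the shape descriptor.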

Two assumptions are adopted in DP-SR-II:

- The edge-map of the original image serves as the ground-truth sketch for all the users.
