Chapter 2

Background and Preliminaries

2.1 Overview

Image alignment and registration is a cornerstone of computer vision and image processing. Early image alignment techniques can be found in work on optical flow estimation for video [43], and are associated with various video processing techniques such as video compression [31, 35], video summarization [12, 67, 5] and stabilization [4, 40, 10]. These techniques can also be applied to register medical and remote sensing images (e.g. [14, 72, 29]). Our work focuses on image alignment for constructing wide-angle panoramas from regular hand-held cameras. This problem was popularized in the mid-1990s [44, 65, 20] and drew the attention of the research community. Following these early works, a wide range of work has addressed different challenges in the panorama construction pipeline, such as globally consistent alignment [66, 55, 70], feature point detection and matching for transformation estimation [45, 15, 16], post-processing for removing object misalignment artifacts [70, 68, 3], and dealing with varying exposures [68, 36]. The area has matured to the point that image stitching for panorama construction has been integrated into many commercial software packages [50, 1, 2].

Given the diversity of work in this area, it is outside the scope of this thesis to review all work related to image mosaicing. Moreover, thorough surveys on this topic are already available, most notably that by Szeliski [61]. Instead, we review the key techniques that are applied in our proposed methods and that are related to the traditional image mosaicing pipeline described in the previous chapter. In particular, we discuss feature extraction and matching, transformation estimation and image warping onto a canonical canvas, and post-processing techniques to hide misalignment artifacts.

2.2 Feature Registering

Image registration and alignment approaches can be classified into two main groups: pixel-based alignment [44, 65, 20, 66] and feature-based alignment [73, 19, 32, 7, 45, 15, 16]. As the name implies, pixel-based alignment methods aim to find an alignment transformation that minimizes the pixel differences over the whole overlapping region. This type of method is time consuming, but it provided more accurate results in the early days, when feature-based methods were weak at matching local content undergoing different motions. More recently, with the development of view-invariant feature extraction methods [11, 57, 42, 21], the popularity of feature-based alignment has overtaken pixel-based methods, owing to its efficiency and its tolerance of disturbing objects in the overlapping regions. In the survey of Mikolajczyk and Schmid [46], it is observed that among local feature descriptors, Lowe's Scale-Invariant Feature Transform (SIFT) [42] generally provides the best performance, followed by Freeman and Adelson's steerable filters [25]. Figure 2.1 shows one of their comparisons of the performance of different feature descriptors.

[Figure 2.1: A comparison of the performance of different feature descriptors (computed on Harris-Affine and Harris-Laplace points) under a viewpoint change in which the camera has rotated by 60°. This figure is from [46].]

In this figure, the detection rate is the number of correctly matched points with respect to the number of all possible matches for the input pair of images, while the false positive rate represents the probability that a descriptor has a false match in a conventional descriptor database. From the figure we can see that SIFT generally provides the best detection rates over all other descriptors. Therefore, in the implementations of the works in this thesis, we use the SIFT feature-based method to register the images.
2.2.1 SIFT Feature Matching

The scale-invariant feature transform (SIFT) [42] is a powerful feature extraction method that achieves a scale-, rotation- and illumination-invariant feature descriptor. First, scale invariance is achieved by looking for scale-space maxima of the Difference of Gaussians (DoG): a DoG pyramid is built over different scales, and features are located at the extrema of the DoG function in this scale space. The descriptor of each feature point is then computed by accumulating 8-direction histograms of oriented gradients over the local patch. Essentially, there are two advantages of using gradients instead of intensity values: first, gradients are more shift-tolerant, since slightly moving edges do not change their overall direction; second, using image gradients discounts lighting changes in the local patch and hence achieves illumination invariance. At the same time, every local descriptor vector is aligned with the dominant gradient orientation, which yields rotation invariance. For each point of interest, the approach computes orientation histograms over a 4 × 4 grid of neighboring subpatches and uses the resulting 4 × 4 × 8 = 128-dimensional vector as the feature descriptor. Thus, for each of the input images, all features are extracted and matched to features in neighboring images based on the closest Euclidean distance. Figure 2.2(a) shows an example of registered SIFT points detected for a pair of neighboring images.
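To make this concrete, the following is a minimal sketch of SIFT extraction and nearest-neighbor matching using OpenCV; the file names are placeholders, and the ratio test is Lowe's standard trick for rejecting ambiguous nearest neighbors rather than a detail prescribed by this thesis.

```python
import cv2

# Load a pair of neighboring images (placeholder file names).
img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Detect DoG keypoints and compute 128-dimensional SIFT descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match each descriptor to its nearest neighbor under Euclidean (L2)
# distance; keep a match only if it is clearly better than the runner-up.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# Matched coordinates, to be fed to the RANSAC stage described next.
pts1 = [kp1[m.queryIdx].pt for m in good]
pts2 = [kp2[m.trainIdx].pt for m in good]
print(f"{len(good)} putative correspondences")
```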
2.2.2 RANSAC Process

Although computing the registered correspondences by finding the nearest neighbor of SIFT features provides locally accurate matching results, incorrect matches remain, for reasons such as moving objects or repeated content inside the scene. It is therefore necessary to find a set of correspondences that share the same transformation. This can be done by applying a random sample consensus (RANSAC) [24] process to the correspondence set of each image pair. RANSAC is a robust estimation procedure that finds the solution with the most consensus. As pointed out by Fischler and Bolles [24], unlike conventional techniques that use as much of the data as possible to obtain an initial solution and then prune outliers, RANSAC generates candidate solutions from the minimum number of observations (data points) required to estimate the underlying model parameters, and then proceeds to enlarge this set with consistent data points. A typical RANSAC procedure can be summarized as follows:

1. Randomly select the minimum number of points required to determine the model parameters.
2. Solve for the parameters of the model.
3. Determine how many points from the set of all points fit the model within a predefined tolerance ε.
4. If the fraction of inliers over the total number of points exceeds a predefined threshold τ, re-estimate the model parameters using all the identified inliers and terminate.
5. Otherwise, repeat steps 1 through 4 (up to a maximum of N times).

In our case, the consensus solution corresponds to the set of features that share the same homography. It is obtained by repeatedly drawing a random minimal set of correspondences, computing the sampled transformation, and keeping the solution with the most consensus within the data. Specifically, in each trial, k correspondences (k = 4 being the minimum necessary to estimate a homography) are randomly drawn to estimate a homography, and all other correspondences that fit this homography within ε pixels are counted as inliers. The random selection process is repeated N times, and the sample set with the largest number of inliers is kept as the final solution. To ensure that a true set of inliers is selected, a sufficient number of trials N must be run. The total probability of success P is

P = 1 − (1 − p^k)^N,   (2.1)

where p is the probability that a single correspondence is an inlier. In our works, we fix p = 0.5, k = 4 and N = 200, so the probability of failing to find a correct set of correspondences, (1 − p^k)^N, is about 2.5 × 10^−6. Figure 2.2(b) shows an example of the computed feature correspondences after the RANSAC process.

[Figure 2.2: An example of SIFT feature detection and RANSAC filtering: (a) initial registered SIFT features; (b) the SIFT features left after RANSAC filtering. After the RANSAC filtering process, highly reliable registered correspondences are obtained.]
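The loop below is a minimal sketch of this procedure for homography fitting, using the trial count quoted above and an assumed tolerance of 3 pixels; it illustrates the algorithm rather than reproducing our exact implementation.

```python
import numpy as np
import cv2

def ransac_homography(pts1, pts2, n_trials=200, eps=3.0):
    """Fit a homography to noisy correspondences with a basic RANSAC loop.

    pts1, pts2: (M, 2) arrays of matched point coordinates.
    Returns the best homography and a boolean inlier mask.
    """
    pts1 = np.asarray(pts1, dtype=np.float32)
    pts2 = np.asarray(pts2, dtype=np.float32)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1), np.float32)])

    best_H, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(n_trials):
        # Steps 1-2: draw the minimal sample (k = 4) and solve the model.
        # (A degenerate, collinear sample yields a useless H that simply
        # gathers few inliers and is discarded by the consensus test.)
        idx = rng.choice(len(pts1), size=4, replace=False)
        H = cv2.getPerspectiveTransform(pts1[idx], pts2[idx])
        # Step 3: count correspondences within eps pixels of the prediction.
        proj = pts1_h @ H.T
        proj = proj[:, :2] / proj[:, 2:3]
        inliers = np.linalg.norm(proj - pts2, axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

In practice the same result can be obtained with a single call to cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0), which bundles the sampling loop with a final least-squares refit over the inliers.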
2.3 Transformation Estimation

2.3.1 Homography Model Estimation

Once the corresponding pairs are obtained, we need to compute the transformation based on these correspondences. Early image mosaicing applications (e.g. satellite photos) usually had known motion parameters and telephoto-like images; hence, the images could be directly stitched using simple motions such as translation, rotation and so on. Table 2.1 illustrates the hierarchy of 2D coordinate transformations discussed in [30]. From the table we can see that every transformation in the hierarchy can be represented as a 3 × 3 matrix, and that a transformation preserves more properties of the coordinates when its matrix has fewer degrees of freedom (D.O.F.).

Table 2.1: An illustration of the different types of 2D motions, after Table 1 of [61]. The 2 × 3 matrices are extended with a third row [0^T 1] to form full 3 × 3 matrices for homogeneous coordinate transformations; each transformation preserves the named property in addition to the properties of the rows below it.

    Name                Matrix       Size     # D.O.F.   Preserves
    translation         [ I | t ]    2 × 3    2          orientation + ...
    rigid (Euclidean)   [ R | t ]    2 × 3    3          lengths + ...
    similarity          [ sR | t ]   2 × 3    4          angles + ...
    affine              [ A ]        2 × 3    6          parallelism + ...
    projective          [ H̃ ]       3 × 3    8          straight lines

The question now is whether the input image pairs can be aligned using any of these 2D transformations. Recall the image-capturing assumptions described in Chapter 1: essentially, both scenarios guarantee that all the objects can be projected onto a virtual plane. It has been proved that under this assumption the transformation between each pair of images can be represented as a projective transform, commonly called a homography. This is done by treating the virtual plane as a reference plane, which fixes the scene depth to a constant so that depth information can be ignored during the transformation; for more details, the reader can refer to Szeliski et al.'s work [62, 63].

Thus, for each pair of registered points p and p′ in the two images, we use p̃ and p̃′ to denote their homogeneous coordinates. Then

p̃′ ∼ H p̃,   (2.2)

where ∼ denotes equality up to scale. Since all correspondences should follow one single homography, we can directly estimate the homography from the correspondences in a least-squares manner.
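The standard way to do this is the direct linear transform (DLT); the sketch below is the textbook formulation applied to the RANSAC inliers, not necessarily the exact solver in our pipeline (in practice the coordinates are usually normalized first for numerical stability).

```python
import numpy as np

def estimate_homography_dlt(pts1, pts2):
    """Least-squares homography via the direct linear transform.

    Each correspondence (x, y) -> (x', y') contributes two rows to A so
    that A h = 0, where h is the flattened 3x3 homography of Equation 2.2.
    The least-squares solution is the right singular vector of A with the
    smallest singular value.
    """
    A = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        A.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=np.float64))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # resolve the scale ambiguity of Equation 2.2
```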
2.3.2 Rotation Model Estimation

A more constrained case arises when the camera rotates about its center of projection. In this case, the homography can be decomposed as

H10 = K1 R1 R0^{−1} K0^{−1} = K1 R10 K0^{−1},   (2.3)

where Ki = diag(fi, fi, 1) is the camera intrinsic matrix, which maps pixels onto the plane at infinity, and R10 is a rotation matrix representing the motion of the camera. Here we can see that there is only one unknown in each K, the focal length f of the camera, and three unknowns inside the general rotation matrix R. Thus, instead of the homography transformation, we obtain a 3-, 4- or 5-parameter rotation model, depending on whether the focal length is known, unknown but shared by the two views, or different between them. These parameters can be estimated using the Levenberg-Marquardt algorithm (LMA), as described in Brown and Lowe's work [15, 16]. The 3D rotation model is intrinsically more stable than a full 8-parameter homography [66]. A straightforward benefit of this model is that the warped images suffer less distortion after applying the cylinder warping described in the following section.
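To illustrate Equation 2.3, the sketch below builds the homography induced by a pure camera rotation; the shared focal length of 700 pixels and the 10° pan are made-up values for the example, and the principal point is assumed to sit at the origin.

```python
import numpy as np

def rotation_homography(f0, f1, R10):
    """Homography induced by a pure rotation R10 between two views with
    focal lengths f0 and f1 (Equation 2.3)."""
    K0 = np.diag([f0, f0, 1.0])
    K1 = np.diag([f1, f1, 1.0])
    return K1 @ R10 @ np.linalg.inv(K0)

# Example: both views share f = 700 and the camera pans 10 degrees about
# the vertical (y) axis.
theta = np.deg2rad(10.0)
R10 = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                [ 0.0,           1.0, 0.0          ],
                [-np.sin(theta), 0.0, np.cos(theta)]])
H10 = rotation_homography(700.0, 700.0, R10)

# The pixel at the principal point of view 0 maps to view 1 as:
p = np.array([0.0, 0.0, 1.0])   # homogeneous coordinates
q = H10 @ p
print(q[:2] / q[2])             # ~[123.4, 0]: f * tan(10 deg) to the right
```

Only the focal length and the three rotation parameters appear here, which is why Brown and Lowe [15, 16] can refine them with Levenberg-Marquardt instead of fitting all eight homography parameters.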
2.3.3 Cylinder Mapping

Once the homographies for each pair of input images are computed, the images can be directly aligned by warping them according to the estimated homographies. In that case, one image in the input set (in most cases the centered image) is chosen as the reference image, and all other images are transformed into the coordinate system of the reference image by concatenating the pair-to-pair homographies. As described in Section 2.3.1, a homography preserves the perspective property that straight lines remain straight after the transformation, and sometimes this is a key requirement of an application. However, in stitched results with a large field of view (FOV), the content near the border of the panorama suffers severe stretching. In practice, this problem is usually solved by applying a cylinder mapping [62, 20]. Figure 2.3 shows an illustration of the cylinder mapping.

[Figure 2.3: An example of homography warping versus cylinder warping. The result using the homography transformation preserves the straight-line structures of the scene; however, because it produces a large field of view, the content near the border is stretched. This artifact can be relieved using cylinder mapping.]

The idea of cylinder mapping is to project the reference plane onto a cylindrical surface. For a given radius r of the cylinder, the mapped coordinates (x′, y′) of a point p = (x, y, f) are

x′ = r θ = r tan^{−1}(x / f),   y′ = r h = r y / √(x^2 + f^2),   (2.4)

where r is usually set to f̄, the average focal length of all input images, in our implementation. Figure 2.3 shows an example of cylinder mapping: the distortion at the border of the panorama is greatly relieved.
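A compact way to realize Equation 2.4 is to invert it and resample the source image, so that every output pixel receives a value; the sketch below assumes r = f and a principal point at the image center, per the conventions above.

```python
import numpy as np
import cv2

def cylindrical_warp(img, f):
    """Project an image onto a cylinder of radius r = f (Equation 2.4),
    using inverse warping: for each cylinder pixel, sample the source."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
    theta = (xx - cx) / f        # x' = r * theta, with r = f
    hgt = (yy - cy) / f          # y' = r * h
    # Inverting Equation 2.4: x = f tan(theta), y = h * sqrt(x^2 + f^2).
    x = f * np.tan(theta)
    y = hgt * np.sqrt(x ** 2 + f ** 2)
    return cv2.remap(img, (x + cx).astype(np.float32),
                     (y + cy).astype(np.float32),
                     cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)

# Usage, with an illustrative focal length:
# warped = cylindrical_warp(cv2.imread("frame.jpg"), f=700.0)
```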
2.4 Post-Processing Techniques

As discussed in Chapter 1, in practice it is difficult to capture images that meet the imaging assumptions required by homography-based alignment. As a result, there almost always remains some degree of misalignment after warping with the estimated transformations, and post-processing techniques are applied to remove these artifacts. There are two main categories of post-processing: blending and seam-cutting. Blending techniques generally perform some type of weighted pixel averaging in the overlapped regions; this helps to reduce noticeable artifacts and to blend images with different exposures. Seam-cutting techniques, on the other hand, do not perform pixel blending; instead, they compute a binary mask for each image in an overlapping region such that alignment artifacts are minimized. This can at times have the side effect of removing undesirable objects that appear in the overlapped region. We describe both of these techniques in the following; note that the research in this thesis uses the seam-cutting technique.

2.4.1 Blending

Blending techniques have played an important role since the emergence of image mosaicing. Notable works include gradient-domain blending [49, 36] and Laplacian pyramid blending [17, 16].

Gradient-domain blending assumes that the visual content of an image can be represented by its gradients, so that compositing the images in the gradient domain alone generates a naturally seamless result. A representative gradient-domain technique is Pérez et al.'s Poisson Image Editing (PIE) [49], which transfers the gradients of the source region to the target image by solving a Poisson equation while obeying the Dirichlet condition along the boundary. Later, this solution was simplified and extended to multiple images in Agarwala's work [3], and accelerated using multi-grid solvers [23] or hierarchical basis preconditioned conjugate gradient descent [60, 64].

An alternative is Laplacian pyramid blending (a.k.a. multi-band blending), popularized by its adoption in Brown et al.'s AutoStitch software [1]. This approach assumes that exposure-difference artifacts live in the low-frequency bands of the overlapping regions, while tearing artifacts live in the high-frequency bands. Hence, it decomposes the overlapping region into different frequency bands and blends each band with a coarse-to-fine schedule of Gaussian kernels: the low-frequency bands undergo comparatively fuzzier blending (large standard deviation σ of the Gaussian kernel), while the high-frequency bands undergo sharper blending (small σ). Figure 2.4(a) and (b) show an example of a result produced by multi-band blending, illustrating how the images are blended at the different bands. Although the final blended result still has noticeable artifacts, it provides a smooth transition between the images.
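As a rough sketch of the multi-band idea (not AutoStitch's exact implementation), a Laplacian pyramid blend of two pre-aligned color images under a binary seam mask can be written as follows; repeatedly downsampling the mask is what gives the wide, fuzzy transitions at the coarse bands and the sharp transition at the fine bands.

```python
import numpy as np
import cv2

def multiband_blend(img1, img2, mask, levels=5):
    """Blend two aligned 3-channel images with Laplacian pyramid blending.

    mask: float32 in [0, 1], 1 where img1 should dominate.
    """
    g1, g2 = img1.astype(np.float32), img2.astype(np.float32)
    gm = mask.astype(np.float32)
    lap1, lap2, masks = [], [], []
    for _ in range(levels):
        d1, d2 = cv2.pyrDown(g1), cv2.pyrDown(g2)
        # Laplacian band = current level minus upsampled coarser level.
        lap1.append(g1 - cv2.pyrUp(d1, dstsize=(g1.shape[1], g1.shape[0])))
        lap2.append(g2 - cv2.pyrUp(d2, dstsize=(g2.shape[1], g2.shape[0])))
        masks.append(gm)
        g1, g2, gm = d1, d2, cv2.pyrDown(gm)  # mask gets blurrier per level
    out = g1 * gm[..., None] + g2 * (1 - gm[..., None])  # coarsest band
    for l1, l2, m in zip(reversed(lap1), reversed(lap2), reversed(masks)):
        out = cv2.pyrUp(out, dstsize=(l1.shape[1], l1.shape[0]))
        out += l1 * m[..., None] + l2 * (1 - m[..., None])
    return np.clip(out, 0, 255).astype(np.uint8)
```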
2.4.2 Seam-cut

Seam-cutting is a recently popular technique applied in image composition and segmentation; representative works include [3, 39, 52]. The advantage of the seam-cut approach is that the seam passes through only the most unnoticeable parts of the image, ignoring the misaligned parts that do not lie along the seam. A comparison of seam-cut and blending approaches is shown in Figure 2.4. This technique has been used in commercial image mosaicing software such as Photoshop [50] and Microsoft ICE [2]. In our system, we develop a seam-cut approach based on Agarwala et al.'s work [3].

[Figure 2.4: An example of post-processing techniques for image mosaicing, comparing multi-band blending and seam-cut results: (a) the result of multi-band blending, generated by AutoStitch [1]; (b) an illustration of the different bands of the image and the alpha blending map between the bands; (c) the seam-cut result using our data term; (d) the seam-cut result using the negative of our data term.]

Figure 2.5 shows an illustration of the seam-cutting process. Seam-cutting is applied to the overlapping regions of a pair of images (I1 and I2) aligned with the candidate homographies. The seam computation can be formulated as a labeling problem on a Markov random field (MRF) that minimizes a global energy of the form

E = Σ_p Ed(p, lp) + λ Σ_{(p,q)∈N} Es(p, lp, q, lq),   (2.5)

where Ed is the data-cost energy reflecting the saliency of a pixel p with label lp, and the smoothness energy Es measures the discontinuity of adjacent pixels p and q over a 4-connected neighborhood N. The label lp decides which image, I1 or I2, appears in the overlapped region at pixel p.

[Figure 2.5: An illustration of the seam-cut formulation. The two overlapped regions are represented by two labels in the MRF; a data term is defined at each pixel position p, and for each pair of neighboring positions p and q the smoothness term is the sum of the differences D(p) and D(q) when they are assigned different labels. The seam cut is performed by assigning a label to each pixel position so as to minimize the total energy.]

Following the formulation introduced by [3], the data cost of each pixel is the negative gradient magnitude at that location:

Ed(p, lp) = −‖∇I_{lp}(p)‖,   (2.6)

where lp decides which image gradient (i.e. ∇I1 or ∇I2) to use at position p. Essentially, the data cost decides which side is preferred to be shown in the final result. Using the negative gradient magnitude of the pixel as the data term, an object in a homogeneous region or an object that is highly textured is more likely to be revealed in the final result. On the contrary, if we use Ed(p, lp) = ‖∇I_{lp}(p)‖ as the data term, the result tends to show more of the homogeneous regions. Figure 2.4(c) and (d) compare the two data terms: the highly textured object (the car) is shown in the result using the data term of Equation 2.6.

The smoothness cost between two pixels p and q is defined as

Es(p, lp, q, lq) = |lp − lq| · (D(p) + D(q)),   (2.7)

which penalizes discontinuities between each pair of neighboring pixels. If lp = lq, the smoothness cost is 0; if lp ≠ lq, the smoothness cost is the difference D between the overlapped images summed at p and q, where

D(ν) = ‖I1(ν) − I2(ν)‖ + α ‖∇I1(ν) − ∇I2(ν)‖,   (2.8)

with α = 2.

After the data term and smoothness term are defined, graph-cut optimization is used to assign the labels and perform the segmentation [13]. This is done by building a weighted graph based on the MRF and iteratively applying the graph-cut algorithm. In each iteration, a target node (t) and a source node (s) are created to represent the current testing label (the background or foreground label in our segmentation problem) and the remaining label, respectively. A grid of nodes with the same resolution as the images is created to represent the pixels; the links between the source/target nodes and each pixel node correspond to the data terms, and the links between pixel nodes correspond to the smoothness terms. A min-cut/max-flow algorithm is applied to this graph to produce a labeling estimate [13]. Figure 2.7 shows an illustration of this graph. The whole procedure repeats this process, switching the current testing label between the background and foreground labels, and halts when the global energy of Equation 2.5 converges to a minimum.

[Figure 2.7: An illustration of graph-cut labeling. The nodes t and s represent the current testing label and the remaining label, respectively. After one iteration of the graph-cut algorithm, the graph is optimally segmented into two parts such that the remaining links have the minimum sum of weights; all nodes still connected to t are re-assigned the current testing label.]

Seam-cutting with blending. While seam-cutting produces an image with no overlaps, color discontinuities may still be noticeable. To reduce these, we introduce a method that expands the seam by 16 pixels and performs simple linear alpha blending [58] on the pixels in this expanded seam, as shown in Figure 2.6. We found that this combined approach of seam-cutting and local seam blending produces better results than seam-cutting alone.

[Figure 2.6: Our blending procedure: a seam-cut is first computed, followed by blending about the seam.]
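Our pipeline minimizes Equation 2.5 with graph cuts [13]; as a lighter-weight illustration of the cost in Equation 2.8, the sketch below computes D over an overlap region and then extracts a single monotone seam by dynamic programming, a seam-carving-style simplification rather than the graph-cut solver itself.

```python
import numpy as np
import cv2

def seam_cost(I1, I2, alpha=2.0):
    """Per-pixel cost D of Equation 2.8 over the overlap region.
    The gradient term uses an L1 approximation of the norm."""
    g1 = cv2.cvtColor(I1, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g2 = cv2.cvtColor(I2, cv2.COLOR_BGR2GRAY).astype(np.float32)
    color = np.linalg.norm(I1.astype(np.float32) - I2.astype(np.float32), axis=2)
    grad = (np.abs(cv2.Sobel(g1, cv2.CV_32F, 1, 0) - cv2.Sobel(g2, cv2.CV_32F, 1, 0))
            + np.abs(cv2.Sobel(g1, cv2.CV_32F, 0, 1) - cv2.Sobel(g2, cv2.CV_32F, 0, 1)))
    return color + alpha * grad

def vertical_seam(D):
    """Minimum-cost top-to-bottom seam through D by dynamic programming."""
    h, w = D.shape
    cost = D.astype(np.float64).copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        lo = max(0, seam[y + 1] - 1)
        seam[y] = lo + int(np.argmin(cost[y, lo:seam[y + 1] + 2]))
    return seam  # seam[y]: column where the cut crosses row y; pixels to
                 # the left take I1, pixels to the right take I2.
```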
2.5 Challenges for Imperfect Image Stitching and Its Related Works

In the previous sections we reviewed the traditional image stitching pipeline. As introduced in Section 2.3.1, the transformation model for current image mosaicing is built on the rotation assumption described in Chapter 1. When taking a panoramic image series with a hand-held camera, this assumption is easily broken, because the center of projection of the camera is unknown and it is hard to rotate the camera properly. As a result, noticeable parallax can occur in the images, meaning that objects at different depths follow different transformation models. Since traditional image mosaicing estimates only one homography as the transformation model, displacements occur for objects not on the estimated plane, which results in breaking artifacts after applying the seam-cut post-processing. As such, the current mosaicing pipeline aims at providing a perceptually seamless result rather than a geometrically correct one. However, many input image series deviate too far from the assumptions and break down even with post-processing to remove artifacts. To address this, we examine three places where this process may be improved. The first is more flexible warping for certain types of imaged scenes, which we call dual-homography scene stitching. The second, which we call seam-driven image stitching, explores whether suboptimal geometric alignment may result in better seam-cut results. Lastly, we describe an interactive panorama correction tool intended to expedite manual corrections for scenes that still have artifacts after post-processing. Work closely related to these three contributions is described in the following.

Related work to dual-homography. The idea of fixing misalignments (sometimes called deghosting [58]) in the image registration and transformation phase is not new. Various approaches use local alignment matching [58] together with scattered-point interpolation [18], or other nonlinear warping methods [33], to correct problems in the overlapped regions. These approaches, however, assume that the input images can be reasonably aligned using an initial global alignment and that the misalignments are relatively small. Methods with more flexible imaging models include ReliefMosaic [37], which uses dense matches to perform view-morphing and produce a 2.5D light field that can be rendered to a mosaic; this approach requires a scene with sufficient texture for estimating the quasi-dense image disparity. Work on manifold mosaicing [48] also allows a general imaging framework, but requires dense input in order to select image strips in the fashion of a strip camera. Our dual-homography work, with its more constrained scene type, lies somewhere between traditional mosaicing with strong imaging assumptions and these general approaches that require either dense matching or dense video input. As far as we are aware, the idea of the dual homography has not been used before.

Related work to seam-driven image stitching. Our seam-driven image stitching work uses the quality of the seam-cut result to guide the selection among all possible transformation models. Without knowing the correct underlying geometry of the scene, the goal of this work is to synthesize a final stitched result by choosing the mosaic with the least noticeable artifacts. Applications related to this problem include texture synthesis techniques [22, 51, 8] and the recently popularized image retargeting problem [6, 53, 54]. These techniques achieve their goals by finding the most unnoticeable seam along which to manipulate the image into a visually plausible result. We also utilize this mechanism to solve the image stitching problem; unique to our seam-driven work, we propose a random homography generation method that provides the pool of candidate transformations.

Related works to interactive panorama correction. Our interactive panorama correction tool relates to two main applications, interactive segmentation and image warping. These are described in the following.

Interactive segmentation. Undesirable visual artifacts generally occur in panoramas when scene objects or scene structure have either been cut or are clearly misaligned. As part of the post-processing correction effort, users often need to select an object in the scene that they wish to preserve in the final panorama. This task is related to object segmentation. There are a variety of techniques to perform object segmentation based on various interaction methods, such as paint-like interfaces [41], scribble-based interfaces [39, 13, 52], or boundary selection [47]. As previously mentioned, the work by Agarwala et al.
[3] used graph-cut in a manner similar to segmentation, but for the purpose of image compositing for a variety of photo-editing tasks, including mosaicing. Our approach adopts this idea to allow the user to manually perform seam editing in a localized manner.

Image warping. When the visual errors are due to large structural misalignments, it is often necessary to perform local image warping instead of seam adjustment. There are several strategies for interactive local warping, including moving least squares [56], radial basis functions and thin-plate splines [9], and various other options (see [28] for a survey). Our approach is based on the Liquify tool in Photoshop. The key difference is that our warping tool takes into consideration the overlapped image in the mosaiced sequence to help "snap" the warped content and assist in the image registration.

2.6 Summary

This chapter has reviewed key techniques of image stitching that represent the benchmark classical image mosaicing systems. Many of these approaches are used in current commercial software [50, 2, 1]. We also discussed the challenges of imperfect image stitching and how they relate to the contributions of this thesis.