Ebook Object detection and recognition in digital images: Part 2

(BQ) Part 2 of the book Object Detection and Recognition in Digital Images covers: object detection and tracking, and object recognition, including recognition based on tensor decompositions, recognition from deformable models, and template based recognition.

4 Object Detection and Tracking

Object Detection and Recognition in Digital Images: Theory and Practice, First Edition. Bogusław Cyganek. © 2013 John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd.

4.1 Introduction

This chapter is devoted to selected problems in object detection and tracking. Objects in this context are characterized by their salient features, such as color, shape, texture, or other traits. The problem is then to tell whether an image contains a defined object and, if so, to indicate its position in the image. If a video sequence is processed instead of a single image, the task can be to track, i.e. to follow, the position and size of an object seen in the previous frames. This assumes high correlation between consecutive frames of the sequence, which is usually the case. Eventually an object will disappear from the sequence, and the detection task can be started again.

Detection can be viewed as a classification problem in which the task is to tell the presence or absence of a specific object in an image. If it is present, then the position of the object should be provided. Classification within a group of already detected objects is usually stated separately, however; in that case the question is what particular object is observed. Although the two tasks are similar, recognition methods are left to the next chapter. Examples of object detection are thus detection of human faces, hand gestures, cars, and road signs in traffic scenes, or just ellipses in images. On the other hand, if we were to spot a particular person or a particular road sign, we would call this recognition.

Since detection relies heavily on classification, one of the methods discussed in the previous chapter can be used for this task. However, no less important is the proper selection of features that define an object. The main goal is to choose features that are the most characteristic of the searched object or, in other words, that are highly discriminative, thus allowing an accurate response of a classifier. Finally, the computational complexity of the methods is also essential due to the usually high dimensions of the feature and search spaces. All these issues are addressed in this chapter, with special stress on automotive applications.

4.2 Direct Pixel Classification

Color conveys important information about the contents of an environment. A very appealing natural example is a coral reef: dozens of species adapt the colors of their skin so as to be as indistinguishable from the background as possible, to gain protection from predators. The latter do the same to outwit their prey, and so on. Thus, objects can be segmented out from a scene based exclusively on their characteristic colors. This can be achieved with direct pixel classification into one of two classes: object and background. An object, or pixels potentially belonging to an object, is defined by providing a set or range of its allowable colors. The background, on the other hand, is either also defined explicitly or is understood as "all other values." Such a method is usually applied first in the processing chain of a computer vision system, to sieve out the pixels of one object from all the others. For example, Phung et al. proposed a method for skin segmentation using direct color pixel classification [1]. Road signs are detected by direct pixel segmentation in the system proposed by Cyganek [2]. Features other than color can also be used.
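The idea of direct pixel classification is simple enough to show in a few lines. The sketch below assumes a NumPy image array; the color bounds are purely illustrative and not taken from the book.

```python
import numpy as np

def classify_pixels(image, lower, upper):
    """Direct pixel classification: flag pixels whose color falls inside an
    axis-aligned box [lower, upper] of the chosen color space; all other
    values are treated as background."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    in_range = (image >= lower) & (image <= upper)   # per-channel range tests
    return in_range.all(axis=2)                      # (H, W) boolean object mask

# Illustrative call with made-up RGB bounds for a reddish object:
# mask = classify_pixels(img, lower=(120, 0, 0), upper=(255, 80, 80))
```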
For instance, Viola and Jones propose using Haar wavelets in a chain of simple classifiers to select from the background those pixels which can belong to human faces [3]. Although not perfect, the methods in this group have the very useful property of reducing dimensionality. Last but not least, many of them allow very fast image pre-processing.

4.2.1 Ground-Truth Data Collection

Ground-truth data allow verification of the performance of machine learning methods. However, the process of acquiring such data is tedious and time consuming because of the high quality requirements placed on this type of data. Acquisition of ground-truth data can be facilitated by an application built for this purpose [4, 5]. It allows different modes of point selection, such as individual point positions, as well as rectangular and polygonal outlines of visible objects, as shown in Figure 4.1. An example of its operation for points marked inside the border of a road sign is depicted in Figure 4.2. Only the positions of the points are saved, as meta-data attached to the original image. These can then be processed to obtain the requested image features, i.e. in this case color in the chosen color space. This tool was used to gather point samples for the pixel-based classification for human skin detection and road sign recognition, as will be discussed in the next sections.

Figure 4.1 A road sign manually outlined by a polygon defined by the points marked by an operator. This allows selection of simple (a) and more complicated shapes (b). Selected points are saved as meta-data to an image with the help of a context menu. Color versions of this and subsequent images are available at www.wiley.com/go/cyganekobject

Figure 4.2 View of the application for manual point marking in images. Only the positions of the selected points are saved in the form of meta-data to the original image. These can be used to obtain image features, such as color, in the indicated places.

4.2.2 CASE STUDY – Human Skin Detection

Human skin detection receives much attention in computer vision due to its numerous applications. The most obvious are detection of human faces for their further recognition, of human hands for gesture recognition,¹ or of naked bodies for parental control systems [6, 7]. Detection of human skin regions in images requires the definition of characteristic parameters, such as color and texture, as well as the choice of proper methods of analysis, such as the color space used, classifiers, etc. There is still ongoing research in this respect. As already discussed, a method for human skin segmentation based on a mixture of Gaussians was proposed by Jones and Rehg [8]. Their model contains J = 16 Gaussians which were trained from almost one billion labeled pixels from RGB images gathered mostly from the Internet. The reported detection rate is 80% with about 9% false positives. A similar method based on MoG was undertaken by Yang and Ahuja in [9].

¹ A method for gesture recognition is presented in Section 5.2.
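For concreteness, the sketch below shows how such a trained mixture-of-Gaussians skin score could be evaluated. It is not the Jones–Rehg model itself: their J = 16 components come from training on labeled pixels, and all parameter names here are placeholders. Diagonal covariances are assumed to keep the example short.

```python
import numpy as np

def mog_likelihood(pixels, weights, means, variances):
    """Evaluate a diagonal-covariance Gaussian mixture at the given RGB
    points. pixels: (N, 3); weights: (J,); means, variances: (J, 3)."""
    pixels = np.asarray(pixels, dtype=float)
    total = np.zeros(len(pixels))
    for w, mu, var in zip(weights, means, variances):
        norm = 1.0 / np.sqrt((2.0 * np.pi) ** 3 * np.prod(var))
        maha = ((pixels - mu) ** 2 / var).sum(axis=1)  # squared Mahalanobis distance
        total += w * norm * np.exp(-0.5 * maha)
    return total

# Skin is then declared by a likelihood-ratio test against a non-skin mixture:
# skin = mog_likelihood(p, w_s, mu_s, v_s) > t * mog_likelihood(p, w_n, mu_n, v_n)
```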
On the other hand, Jayaram et al. [10] report that the best results are obtained with histogram methods rather than with Gaussian models. They also point out that different color spaces improve performance, but not consistently; a fair trade-off in this respect is the direct use of the RGB space. A final observation is that in all color spaces directly partitioned into achromatic and chromatic components, performance was significantly better if the luminance component was employed in detection. Similar results, which indicate the positive influence of the illumination component and the poor performance of Gaussian modeling, were reported by Phung et al. [1]. They also found that the Bayesian classifier with the histogram technique, as well as the multilayer perceptron, perform best. The Bayes classifier operates in accordance with Equation (3.77), in which x is a color vector, ω0 denotes the "skin" class and ω1 the "non-skin" class, as described in Section 3.4.5. However, the Bayes classifier requires much more memory than, for example, a mixture of Gaussians. Therefore there is no unique "winner," and the choice of a specific detector can be driven by other factors, such as the computational capabilities of the target platform.

With respect to the color space, some authors advocate using perceptually uniform color spaces for object detection based on pixel classification. Such an approach was undertaken by Wu et al. [11] in their fuzzy face detection method. The front end of their detector is skin segmentation operating in the Farnsworth color space. The perceptual uniformity of this color space makes the classification process resemble the subjective classification made by humans, due to a similar sensitivity to changes of color.

Surveys on pixel based skin detection are provided in the papers by Vezhnevets et al. [12], by Phung et al. [1], and in the recent one by Khan et al. [13]. Conclusions reported in the latter publication indicate that the best results were obtained with cylindrical color spaces and with tree based classifiers (Random forest, J48). Khan et al. also indicate the importance of the luminance component in the feature data, which agrees with the results of Jayaram et al. [10] and Phung et al. [1].

In this section a fuzzy based approach is presented, with an explicit formulation of the human skin color model, as proposed by Peer et al. [14]. Although simple, the conversion of the histogram to a membership function greatly reduces memory requirements, while the fuzzy inference rules allow real-time operation. A similar approach was also applied to road sign detection based on characteristic colors, which is discussed in Section 4.2.3. The method consists of a series of fuzzy IF ... THEN rules, presented in Table 4.1 for daylight conditions and in Table 4.2 for artificial (flash) lighting, respectively. These were designed based on expert knowledge from the data provided in the paper by Peer et al. [14], although other models or modifications can easily be adapted.

Table 4.1 Fuzzy rules for skin detection in sun lighting.

R1: Range of skin color components in daily conditions found in experiments:
    IF R > 95 AND G > 40 AND B > 20 THEN T0 = high;
R2: Sufficient separation of the RGB components; elimination of gray areas:
    IF max(R,G,B) − min(R,G,B) > 15 THEN T1 = high;
R3: R and G should not be close together:
    IF |R − G| > 15 THEN T2 = high;
R4: R must be the greatest component:
    IF R > G AND R > B THEN T3 = high;

Table 4.2 Fuzzy rules for flash lighting.

R5: Skin color values for flash illumination:
    IF R > 220 AND G > 210 AND B > 170 THEN T4 = high;
R6: R and G components should be close enough:
    IF |R − G| ≤ 15 THEN T5 = high;
R7: B component has to be the smallest one:
    IF B < R AND B < G THEN T6 = high;

The combined (aggregated) fuzzy rule for human skin detection directly in the RGB space is as follows:

RHS: IF T0–3 are high OR T4–6 are high THEN H = high;   (4.1)
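The crisp skeleton of these rules is easy to code. A minimal sketch follows, using the constants of Tables 4.1 and 4.2; the fuzzy variant discussed next replaces each hard comparison with a membership value.

```python
def skin_rule_crisp(r, g, b):
    """Crisp evaluation of the aggregated skin rule (4.1), built from the
    rules of Tables 4.1 and 4.2 (constants after Peer et al. [14])."""
    daylight = (r > 95 and g > 40 and b > 20 and        # R1: component ranges
                max(r, g, b) - min(r, g, b) > 15 and    # R2: not a gray area
                abs(r - g) > 15 and                     # R3: R, G separated
                r > g and r > b)                        # R4: R is the greatest
    flash = (r > 220 and g > 210 and b > 170 and        # R5: flash ranges
             abs(r - g) <= 15 and                       # R6: R, G close together
             b < r and b < g)                           # R7: B is the smallest
    return daylight or flash                            # RHS, Equation (4.1)
```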
The advantage of the fuzzy formulation (4.1) over its crisp version is that the influence of each particular rule can be controlled separately. Also, new rules can easily be added if necessary. For instance, in rule R1, when checking the condition for the component R being greater than 95, values other than the simple "true" or "false" of the classical formulation can be assigned. Thus, given the linear membership function presented in Figure 4.3, the relation R > 95 can be evaluated gradually (in the range from 0 to 1) depending on the value of R. Certainly, the type of membership function can be chosen with additional "expert" knowledge. Here we assume a margin of noise in the measurement of R, which in this example spans 90–105. Outside this region we reach the two extremes: for R "significantly lower" the membership function spans 0–0.1, and for R "significantly greater" the corresponding membership function spans 0.9–1. Such a fuzzy formulation has been shown to offer much more control than a crisp formulation. Therefore it can be recommended for tasks which are based on empirical or heuristic observations. A similar methodology was undertaken in fuzzy image matching, discussed in the book by Cyganek and Siebert [15], and in the task of figure detection, discussed in Section 4.4.

Figure 4.3 A piecewise linear membership function for the fuzzy evaluation of the relation R > 95.

The fuzzy AND operation can be defined with the multiplication or the minimum rule of the membership functions [16], as already formulated in Equations (3.162) and (3.163), respectively. On the other hand, for fuzzy implication reasoning the two common methods of Mamdani and Larsen,

μ_P⇒C(x, y) = min( μ_P(x), μ_C(y) ),
μ_P⇒C(x, y) = μ_P(x) · μ_C(y),   (4.2)

can be used [17, 18]. In practice the Mamdani (minimum) rule is usually preferred, since it avoids multiplication. It is worth noting that the above inference rules are conceptually different from the definition of implication in traditional logic. Rules (4.2) convey the intuitive idea that the truth value of the conclusion C should not be larger than that of the premise P. In traditional implication, if P is false and C is true, then P ⇒ C is defined also to be true. Thus, assuming an approximately 5% transient region as in Figure 4.3, rule R1 in Table 4.1 for the exemplary values R = 94, G = 50, and B = 55 would evaluate to 0.4 × 0.95 × 0.96 ≈ 0.36 in accordance with the Larsen rule in (4.2); for Mamdani this would be 0.4. On the other hand, the logical AND in the traditional formulation would produce false, while the result of the implication would be true, since false ⇒ true evaluates to true. Thus neither crisp false nor crisp true reflects the insight into the nature of the real phenomenon or the expert knowledge (in our case the heuristic values found empirically by Peer et al. [14] and used in Equation (4.1)).

The rule RHS in (4.1) is an aggregation of the rules R1–R7. The common method of fuzzy aggregation is the maximum rule, i.e. the maximum of the output membership functions of the rules which "fired." Thus, having the output fuzzy sets of the rules, the aggregated response can be inferred as

μ_H = max( μ¹_P⇒C, …, μⁿ_P⇒C ),   (4.3)

where the μ_P⇒C are obtained from (4.2). Finally, from μ_H the "crisp" answer can be obtained after defuzzification. In our case the simplest method for this purpose is also the maximum rule (we need a "false" or "true" output), although in practice the centroid method is very popular.
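A sketch of this fuzzy evaluation follows, with a simple symmetric ramp standing in for the membership function of Figure 4.3. The exact shape (including the 0–0.1 and 0.9–1 extreme bands) is a modeling choice not fully specified here, so the numbers below only approximate the 0.36/0.4 worked example.

```python
import numpy as np

def mu_greater(x, threshold, width=0.05 * 255):
    """Piecewise-linear membership of the fuzzy relation x > threshold,
    with a transient region of about 5% of the 0-255 range around it."""
    lo = threshold - width / 2.0
    return float(np.clip((x - lo) / width, 0.0, 1.0))

# Rule R1 for R = 94, G = 50, B = 55:
ts = [mu_greater(94, 95), mu_greater(50, 40), mu_greater(55, 20)]
larsen = np.prod(ts)   # product combination, cf. the Larsen rule in (4.2)
mamdani = min(ts)      # minimum combination, cf. the Mamdani rule in (4.2)
```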
The presented fuzzy rules were then incorporated into a system for automatic human face detection and tracking in video sequences. For face detection the abovementioned method by Viola and Jones was applied [3]; for the tests the OpenCV implementation was used [19, 20]. In many practical examples, however, it showed a high rate of false positives. These can be suppressed, but at the cost of the recall factor. Therefore, to improve the former without sacrificing the latter, the method was augmented with a human skin segmentation module, to take advantage of color images when they are available. Faces found this way can then be tracked, for example with the method discussed in Section 4.6. The system is a simple cascade of a prefilter, which partitions a color image into areas-of-interest (i.e. areas with human skin), and a cascade for face detection in monochrome images, as developed by Viola and Jones. Thus, the prefilter realizes the already mentioned dimensionality reduction, improving speed of execution and increasing accuracy. This shows the great power of a cascade of simple classifiers, which can be recommended for many tasks in computer vision. The technique can be seen as an ensemble of cooperating classifiers which can be arranged in a serial, parallel, or mixed fashion. These issues are further discussed in Section 5.6. The system is depicted in Figure 4.4.

Figure 4.4 A cascade of classifiers for human face detection. The first classifier does dimensionality reduction, selecting only pixels-of-interest based on a fuzzy-rule model of the color of human skin.

In a cascade, simple classifiers are usually employed, for which speed is preferred over accuracy. Therefore one of the requirements is that each preceding classifier should have a high recall factor. In other words, it is better if such a classification module has a high ratio of false positives rather than too high a false negative ratio, i.e. it should pass even possibly wrong answers on to the next classification stage rather than reject too many. If this is done, then the next expert module in the chain still has a chance to correctly classify an object, and so on. Thus, in the system of Figure 4.4 the human skin detector operates in accordance with the fuzzy method (4.1). For all relations in the particular rules of (4.1), the fuzzy margin of 5% was set, as presented in Figure 4.3.

Summarizing, this method was chosen for three reasons. Firstly, as found in comparative experiments, it has the desirable property of a high recall factor, for the reasons discussed, at the cost of slightly lower precision when compared with other methods. Secondly, it does not require any training and is very fast, allowing run-time operation. Thirdly, it is simple to implement.

Figure 4.5 Results of face detection with the system in Figure 4.4. A color skin map (a). A test image with outlined face areas (b).

Results of human skin detection computed in accordance with (4.1) are shown in Figure 4.5(a), and the faces detected in a test color image by the system of Figure 4.4 are outlined in Figure 4.5(b). The advantage of this approach is a reduction in computation, whose amount depends on the contents of an image, since classifiers further along the chain process only the pixels passed by the preceding classifiers. This reduction reached up to 62% in experiments with different images downloaded from the Internet from the links provided in the paper by Hsu [21].
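A sketch of such a prefiltered cascade is given below, using OpenCV's stock Haar face detector as the second stage. This stands in for the original implementation: skin_mask is any pixel classifier (e.g. rules (4.1)), and the blob-size cutoff is an arbitrary illustrative value.

```python
import cv2
import numpy as np

def detect_faces(bgr, skin_mask):
    """Two-stage cascade in the spirit of Figure 4.4: a cheap skin prefilter
    selects regions-of-interest; the Viola-Jones detector runs only there."""
    mask = skin_mask(bgr).astype(np.uint8)           # (H, W) 0/1 skin map
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = []
    _, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    for x, y, w, h, area in stats[1:]:               # stats[0] is the background
        if area < 400:                               # skip tiny skin blobs
            continue
        for fx, fy, fw, fh in cascade.detectMultiScale(gray[y:y+h, x:x+w]):
            faces.append((x + fx, y + fy, fw, fh))   # back to image coordinates
    return faces
```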
4.2.3 CASE STUDY – Pixel Based Road Signs Detection

In this application the task was to segment out image regions which could belong to road signs. Although shapes and basic colors are well defined for these objects, in real situations there can be high variation in the observed colors due to many factors, such as the materials and paint used in manufacturing the signs, their wear, lighting and weather conditions, and many others. Two methods were developed, based on samples collected manually from a few dozen images of real traffic scenes. In the first approach a fuzzy classifier was built from the color histograms. In the second, the one-class SVM method, discussed in Section 3.8.4, was employed. These are discussed in the following sections.

4.2.3.1 Fuzzy Approach

For each of the characteristic colors of each group of signs, color histograms were created based on the few thousand samples gathered. An example for the red component in the HSV color space, for two groups of signs, is presented in Figure 4.6.

Figure 4.6 HSV histograms of the red color of warning signs (a,b,c) and of the prohibitive signs in daylight conditions (d,e,f).

Histograms allow assessment of the distributions of the different colors of road signs in different color spaces. They also allow derivation of border values for segmentation based on simple thresholding. Although not perfect, this method is very fast and can be considered for many other machine vision tasks (e.g. due to its simple implementation) [22]. Based on the histograms it was observed that threshold values could be derived in the HSV space, which gives good insight into the color representation; however, this usually requires prior conversion from the RGB space. From these histograms the empirical range values of the H and S channels were determined for all characteristic colors encountered in Polish road signs of each group [23]. These are given in Table 4.3.

Table 4.3 Empirical crisp threshold values for different colors encountered in Polish road signs. The values refer to the normalized [0–255] HSV space.

          H                       S
Red       [0–10] ∪ [223–255]      [50–255]
Blue      [120–165]               [80–175]
Yellow    [15–43]                 [95–255]
Green     [13–110]                [60–170]

In the simplest approach these can be used as threshold values for segmentation. However, for many applications the accuracy of such a method is not satisfactory: the main problem with crisp threshold based segmentation is usually the high rate of false positives, which can lower the recognition rate of the whole system. On the other hand, the method is one of the fastest. Better adapted to the actual shape of the histograms are piecewise linear fuzzy membership functions. At the same time they do not require storage of the whole histogram, which is a desirable feature especially for higher dimensional histograms, such as 2D or 3D. Table 4.4 presents the piecewise linear membership functions for the characteristic colors of the Polish road signs, obtained from the empirical histograms of Figure 4.7.

Table 4.4 Piecewise linear membership functions for the red, blue, and yellow colors of Polish road signs, given as lists of (x, y) coordinates.

Red H (HR):     (0, 1) – (7, 0.9) – (8, 0) – (245, 0) – (249, 0.5) – (255, 0)
Red S (SR):     (75, 0.4) – (80, 1) – (180, 1) – (183, 0)
Blue H (HB):    (125, 0) – (145, 1) – (150, 1) – (160, 0)
Blue S (SB):    (100, 0) – (145, 1) – (152, 1) – (180, 0)
Yellow H (HY):  (20, 0) – (23, 1) – (33, 1) – (39, 0)
Yellow S (SY):  (80, 0) – (95, 0.22) – (115, 0.22) – (125, 0) – (128, 0) – (150, 0.48) – (155, 0.48) – (175, 0.18) – (200, 0.22) – (225, 1) – (249, 0.95) – (251, 0)

Due to specific Polish conditions it was found that detection of the warning signs (group "A") is more reliable when based on their yellow background rather than on their red border, which is thin and usually greatly deteriorated. Experimental results of segmentation of real traffic scenes with different signs are presented in Figures 4.8 and 4.9. In these experiments the fuzzy membership functions of Table 4.4 were used. In comparison to crisp thresholding, the fuzzy approach allows more flexibility in assigning a pixel to one of the classes. In the presented experiments the acceptance threshold was set experimentally to 0.25. Thus if, for instance, for a pixel p we have min(μHR(p), μSR(p)) ≥ 0.25, then p is classified as possibly belonging to the red rim of a sign.
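Both variants are straightforward to express in code. Below is a minimal sketch of the fuzzy variant for the blue signs, assuming the image is already in an HSV representation with all channels scaled to [0, 255] as in the tables; the 0.25 threshold is the one quoted above.

```python
import numpy as np

# Piecewise-linear membership functions from Table 4.4 (blue signs),
# evaluated by linear interpolation between the listed (x, y) nodes.
HB_X, HB_Y = [125, 145, 150, 160], [0.0, 1.0, 1.0, 0.0]
SB_X, SB_Y = [100, 145, 152, 180], [0.0, 1.0, 1.0, 0.0]

def blue_sign_mask(hsv, threshold=0.25):
    """Fuzzy segmentation of blue sign pixels. `hsv` is an (H, W, 3) array
    with all channels normalized to the [0, 255] range of Tables 4.3/4.4
    (note: OpenCV's 8-bit hue spans 0-179 and would need rescaling first)."""
    h = hsv[..., 0].astype(float)
    s = hsv[..., 1].astype(float)
    mu_h = np.interp(h, HB_X, HB_Y)              # membership in "blue hue"
    mu_s = np.interp(s, SB_X, SB_Y)              # membership in "blue saturation"
    return np.minimum(mu_h, mu_s) >= threshold   # minimum rule as fuzzy AND

# The crisp variant of Table 4.3 is just a pair of range tests, e.g. for blue:
# crisp = (h >= 120) & (h <= 165) & (s >= 80) & (s <= 175)
```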
It is worth noting that direct application of the Bayes classification rule (3.77) requires evaluation of the class probabilities. Estimating these with, for instance, a 3D histogram can occupy a matrix of up to 255 × 255 × 255 entries (which makes about 16 MB of memory, assuming only one byte per counter). This could be reduced to 3 × 255 if channel independence were assumed. However, such an assumption does not seem justified, especially in the RGB color space, and usually leads to a higher false positive rate. On the other hand, the parametric methods which evaluate the PDF with a MoG do not fit some recognition tasks well, which results in poor accuracy, as frequently reported in the literature [10, 1].

4.2.3.2 SVM Based Approach

Problems with direct application of the Bayes method, as well as the sometimes insufficient precision of the fuzzy approach presented in the previous section, have encouraged the search for other classifiers. In this case the idea is to use the one-class SVM, operating in one of the color spaces, to select pixels with the characteristic colors of the road signs. Once again, the main objectives of the method are accuracy and speed, since it is intended for real-time applications. Operation of OC-SVMs is discussed in Section 3.12; in this section we outline the main properties and extensions of this method [2]. The idea is to train the OC-SVM with color values taken from example pixels of images of real road signs. This fits the OC-SVM well, since significantly large amounts of low dimensional data from a single class are available. Thus, a small number of SVs is usually sufficient to outline the boundaries of the data clusters. A small number of SVs means faster computation of the decision function, which is one of the preconditions for automotive applications. For this purpose, and to avoid conversion, the RGB color space is used. During operation, each pixel of a test image is checked to see whether it belongs to the class or not, with the help of formulas (3.286) and (3.287). The Gaussian kernel (3.211) was found to provide the best results. A single OC-SVM was trained in a 10-fold fashion, and its accuracy was then measured in terms of ROC curves, discussed in Appendix A.5. However, speed of execution – the second important parameter of this system – is directly related to the number of support vectors, which define a hypersphere encompassing the data and are used in the classification of a test point, as discussed in Section 3.12. These, in turn, are related to the parameter γ of the Gaussian kernel (3.211), as depicted in Figure 4.10. For γ ≤ 10 the processing time of the software implementation is on the order of 15–25 ms per frame of resolution 320 × 240.
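The sketch below illustrates the one-class approach using scikit-learn's OC-SVM in place of the book's own implementation; the gamma and nu values are illustrative only. Larger gamma tends to produce more support vectors (cf. Figure 4.10) and hence slower per-pixel classification; the resulting SV count can be read from len(model.support_vectors_).

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_color_model(sign_rgb_samples, gamma=1e-4, nu=0.05):
    """Fit an RBF-kernel one-class SVM to RGB samples of sign pixels only
    (a stand-in for the book's OC-SVM; gamma and nu are illustrative)."""
    return OneClassSVM(kernel="rbf", gamma=gamma, nu=nu).fit(sign_rgb_samples)

def segment(model, image):
    """Label every pixel of an (H, W, 3) image as sign color or not."""
    pixels = image.reshape(-1, 3).astype(float)
    labels = model.predict(pixels)           # +1 inside the class, -1 outside
    return (labels == 1).reshape(image.shape[:2])
```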
[…] fulfills the condition (3.240), i.e.

0 < Σ_{i=1..Nm} w_mi ≤ Nm.   (4.4)

Therefore, combining condition (3.288) with (4.4), the following is obtained:

1 ≤ C Σ_{i=1..Nm} w_mi ≤ Nm.   (4.5)
