Context-based Visual Object Segmentation

Wei Xia
(B. Eng, Huazhong University of Science and Technology)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
Department of Electrical and Computer Engineering
National University of Singapore
2014

Declaration
I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.
Wei Xia
Jul. 2014

Acknowledgement
Throughout the four years of my PhD study, there are many people to thank for the help and support they have provided. First and foremost, I'd like to express my great gratitude to my two supervisors, Prof Loong Fah Cheong and Prof Shuicheng Yan. In the first semester, Prof Cheong helped me lay a solid theoretical foundation and find my research interest among a wide range of topics in computer vision and machine learning. Then, under the patient guidance of Prof Yan, I completed the work on object semantic segmentation that forms the main body of this thesis. I enjoyed working with them; their passion and professionalism in research, dedication to details, complete commitment and great personality have significantly inspired me and will keep benefiting me in my future life. Next, I would like to thank my seniors Ju Sun and Jiashi Feng for their patient guidance when I was struggling at the beginning of my PhD study. Special thanks also go to Dr. Csaba Domokos, whose great professionalism and perfectionism helped me win the PASCAL VOC Challenge and publish top-tier papers. I learned a lot from the great experience of collaborating with him. I also want to thank Jian Dong and Junshi Huang, who were both my room-mates and lab-mates and gave me a lot of help in both academic work and daily life.
I will always remember the days when we discussed until late at night. Besides, I met many great friends here in the Learning and Vision Lab, including Qiang Chen, Zheng Song, Luoqi Liu, Min Lin, Si Liu and Mengdi Xu. Furthermore, I'd like to express my sincere gratitude to Mr. Zhongyang Huang for providing me the opportunity of an internship in Panasonic Singapore Laboratory. Under his guidance, I learned a lot about industrial research from the interesting projects we did together. Last but not least, I want to thank my parents for their everlasting support and care. Finally, I want to express my appreciation to my wife, Chong Chen. Without her love, companionship and encouragement during the difficult times, I would not have been able to achieve this goal along the long journey of PhD study. This thesis is dedicated to her.

Contents
1 Introduction
1.1 Historical Background
1.1.1 Image Classification
1.1.2 Object Detection
1.1.3 Image Segmentation
1.1.4 Semantic Segmentation
  Bottom-up Approaches
  Top-down Approaches
  Integrative Approaches
1.1.5 Obtaining Contextual Information
1.2 Thesis Focus and Contributions
1.3 Organization of the Thesis
1.3.1 Relevant Publications
2 Segmentation over Detection via Optimal Sparse Reconstructions
2.1 Introduction
2.1.1 Motivation and Contributions
2.1.2 Related Work
2.2 Proposed Solution
2.2.1 Figure-ground Segmentation
2.2.2 Coupled Global and Local Reconstruction
2.3 Optimization Procedure
2.3.1 Optimization with respect to x and x̃_1, ..., x̃_r
  Sub-problem 1: x
  Sub-problem 2: x̃_1, ..., x̃_r
2.3.2 Optimization with respect to m
2.4 Numerical Implementation
2.5 Experiments
2.5.1 Convergence Analysis
2.5.2 Effects of the Size of Local Patches and Super-pixels
2.5.3 Effects of the Detection Results
2.5.4 Parameter Estimation
2.5.5 Proof-of-Concept Experiments
2.5.6 Comparison on the PASCAL VOC Datasets
  Performance gain from mask refinement
  Comparison on VOC'10, VOC'11 and VOC'12
2.5.7 Comparison on the Weizmann-Horses Dataset
2.6 Chapter Summary
3 Semantic Segmentation without Annotating Segments
3.1 Introduction
3.2 Related Work
3.3 Proposed Solution
3.3.1 Bounding Box Score Normalization
3.3.2 Object Shape Guidance Estimation
3.3.3 Graph-cut Based Segmentation
3.3.4 Merging and Post-processing
3.4 Experimental Results
3.4.1 Proof of the Concept
3.4.2 Comparison with the State-of-the-arts
3.5 Chapter Summary
4 Background Context Augmented Hypothesis Graph for Object Segmentation
4.1 Introduction
4.2 Related Work
4.3 Proposed Solution
4.3.1 CRF-based Formulation
4.3.2 Background Context Modeling
4.3.3 Merging and Post-processing
4.4 Implementation Details
4.5 Experimental Results
4.5.1 Proof-of-concept Experiments
  Effects of the sub-category number t
  Effects of the post-processing parameters
  Effects of the CRF model
  Effects of different contextual cues
4.5.2 Comparison with the State-of-the-arts
  VOC 2012
  MSRC-21 dataset
4.6 Chapter Summary
5 Conclusion and Future Work
5.1 Thesis Conclusion
5.2 Discussion of the Future Directions
Summary
Visual object recognition is one of the most fundamental problems in artificial intelligence. It mainly comprises three tasks: object classification, object detection and object segmentation. Classification tells which objects the image contains; detection predicts the bounding box location of each object; and segmentation assigns a category label from a predefined label set to every pixel in the image. In this thesis, we aim to solve the problem of object segmentation. It has been shown that the three tasks are strongly correlated, so that both classification and detection can provide useful contextual information to guide the segmentation process.

We first propose a detection-based method that formulates the segmentation task as pursuing the optimal latent mask inside the predicted bounding box in a nonparametric manner, via sparse reconstruction over the ground-truth masks of the training set. Taking both global and local constraints into account, we propose a coupled convex optimization framework. By alternately optimizing the sparse reconstruction coefficients and the latent optimal mask using Lasso and Accelerated Proximal Gradient methods, a globally optimal solution can be achieved.

Furthermore, ground-truth segment annotations are generally very difficult to obtain, while object bounding boxes can be obtained much more easily. We therefore propose a segmentation approach based on detected bounding boxes that needs no additional segment annotation from either the training set or user interaction. Based on a set of segment hypotheses, a simple voting scheme is introduced to estimate the shape guidance for each bounding box. The derived shape guidance is used in the subsequent graph-cut-based figure-ground segmentation, and the final segmentation result is obtained by merging the segmentation results in the bounding boxes.
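The sparse reconstruction step summarized above can be illustrated on its simplest piece. The following is a minimal sketch, not the thesis implementation: it keeps only a single global term, min_x ½‖y − Ux‖² + λ‖x‖₁, where the columns of U are training feature vectors, y is the feature vector of the test object and x holds the sparse coefficients, and it solves this Lasso problem with FISTA, one standard accelerated proximal gradient method. The local patch terms and the latent-mask update of the coupled framework are omitted, and the final 0.5 threshold is an assumption; all function names, λ and the toy data are hypothetical.

```python
import math


def matvec(A, v):
    """Return A v for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def rmatvec(A, v):
    """Return A^T v for a matrix given as a list of rows."""
    out = [0.0] * len(A[0])
    for row, vi in zip(A, v):
        for j, a in enumerate(row):
            out[j] += a * vi
    return out


def soft_threshold(z, tau):
    """Elementwise proximal operator of tau * ||.||_1."""
    return [math.copysign(max(abs(zi) - tau, 0.0), zi) for zi in z]


def fista_lasso(U, y, lam, n_iter=500):
    """Approximately solve min_x 0.5*||y - U x||^2 + lam*||x||_1 with FISTA.

    The step size 1/L uses L = ||U||_F^2, an upper bound on the Lipschitz
    constant ||U||_2^2 of the gradient of the smooth term.
    """
    n = len(U[0])
    L = sum(a * a for row in U for a in row)
    x = [0.0] * n
    z = x[:]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        residual = [uz - yi for uz, yi in zip(matvec(U, z), y)]  # U z - y
        grad = rmatvec(U, residual)                               # U^T (U z - y)
        x_new = soft_threshold([zi - gi / L for zi, gi in zip(z, grad)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + (t - 1.0) / t_new * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


def reconstruct_mask(M, x, threshold=0.5):
    """Combine training masks (columns of M) with weights x, then binarize."""
    soft_mask = matvec(M, x)
    return [1 if v >= threshold else 0 for v in soft_mask]


if __name__ == "__main__":
    U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy feature codebook, columns u_1, u_2
    y = [1.0, 0.0, 0.0]                        # toy test feature vector
    M = [[1, 0], [1, 0], [0, 1], [0, 1]]       # toy mask codebook, columns m_1, m_2
    x = fista_lasso(U, y, lam=0.1)
    print([round(v, 4) for v in x])            # [0.9, 0.0]
    print(reconstruct_mask(M, x))              # [1, 1, 0, 0]
```

Using the Frobenius norm as the Lipschitz bound avoids an eigenvalue computation at the cost of smaller steps; with a real codebook, a power iteration on UᵀU would give a tighter step size.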
Finally, inspired by the significant role of context information, we explore, besides global classification and detection, the contextual cues from unlabeled background regions, which are usually ignored. A fully connected CRF model is built over a set of overlapping hypotheses generated by CPMC; the background contextual cues are learned from the unlabeled background regions and applied in the unary terms of the corresponding foreground regions. The final segmentation result is obtained via maximum-a-posteriori (MAP) inference, where the segments are merged in a sequential aggregation manner. The proposed model also generalizes well: other contextual cues, such as global classification and detection, can easily be integrated into the model to further boost the performance.

To evaluate the effectiveness of the proposed algorithms, extensive experiments are conducted on various benchmark datasets, ranging from the challenging PASCAL VOC to the Weizmann-Horses dataset, GrabCut-50 and the MSRC-21 dataset, among others. The proposed approaches achieve new state-of-the-art performance. Based on the above methods, we won the segmentation competition of the PASCAL VOC Challenge 2012.

List of Tables
2.1 List of notations.
2.2 Segmentation IoU accuracy defined in (2.14) on the VOC'10 trainVal dataset [1] when changing the size of the local patches and super-pixels.
2.3 Segmentation IoU accuracy defined in (2.14) on the VOC'10 trainVal dataset [1] based on different object detectors. BA is the baseline algorithm based on coarse masks, while Proposed is our sparse reconstruction framework.
2.4 Study of the effects of different parts of the proposed method on the VOC'10 trainVal dataset [1] in IoU accuracy defined in (2.14).
2.5 Comparison of the segmentation accuracies, in terms of the IoU measure defined in (2.14), of previous methods on the VOC'07, VOC'10, VOC'11 and VOC'12 test datasets [1]. Methods marked with * use extra annotation to train the model, while all other methods are trained using the VOC annotations only.
2.6 Comparison with the state-of-the-art methods on the Weizmann-Horses Dataset [2]. Accuracy is measured by the percentage of correctly labeled pixels.
2.7 Statistics of the segmentation accuracy (δ) obtained by the proposed method on the Weizmann-Horses Dataset [2].
3.1 Comparison of segmentation accuracy provided by previous methods on the VOC 2011 test dataset [1].
[...] objects, ranging from rigid transportation tools and articulated animals to indoor objects. Based on these results, it is fair to say that the proposed method handles different scenarios well in most cases, as long as the detection is accurate enough.
However, some failure cases remain, as shown in Fig. 2.9. They are mainly due to three causes: inaccurate sparse reconstruction, mis-detection and wrong labelling. In some extreme cases, the object detector does not predict any bounding box at all, so the whole image is wrongly classified as background.

2.5.7 Comparison on the Weizmann-Horses Dataset
To further verify the proposed algorithm, we conduct experiments on the Weizmann-Horses Dataset [2], which contains 328 side-view horse images of size 150 × 100 pixels. Most of the horses are positioned at the center of the images and face left, so this dataset can be considered over-complete in terms of pose variance, which fits the assumption of the sparse reconstruction algorithm well.

Table 2.6: Comparison with the state-of-the-art methods on the Weizmann-Horses Dataset [2]. Accuracy is measured by the percentage of correctly labeled pixels.
Method                    Number of test images    Segmentation accuracy δ (%)
Borenstein et al. [100]   328                      93.0
Cour et al. [101]         328                      94.2
OBJ CUT [92]              [...]                    96.0
Levin et al. [102]        N/A                      95.0
LOCUS [103]               200                      93.1
Zhu-HDT [87]              228                      94.7
Proposed                  82                       95.9

Table 2.7: Statistics of the segmentation accuracy δ (%) obtained by the proposed method on the Weizmann-Horses Dataset [2].
Mean     Minimum    Maximum    Median
95.91    93.76      98.13      96.09

To compare with other state-of-the-art algorithms on this dataset, we use, instead of the IoU measure, the same accuracy measure as the competing methods, denoted by δ: the percentage of correctly labeled pixels. The baseline accuracy on this test set, obtained by classifying all pixels as background, is 76.36%. Among the competing methods in Table 2.6, Borenstein et al. [100], Cour et al. [101] and OBJ CUT [92] are unsupervised methods, while the others, including the proposed one, are supervised. Borenstein et al. [100] constructed a Bayesian model to integrate top-down and bottom-up information.
Cour et al. [101] proposed an algorithm to fit a set of local super-pixels to pre-annotated template parts. OBJ CUT [92] utilized an object-category-specific MRF model to obtain segmentations. Levin et al. [102] utilized a CRF model. LOCUS [103] uses a generative probabilistic model to combine bottom-up cues of color and edge with top-down cues of shape and pose. Zhu-HDT [87] introduced a probabilistic model called the hierarchical deformable template (HDT), which represents the object by state variables defined over a hierarchy, and uses a bottom-up inference algorithm called compositional inference to obtain the final state-variable results.

[Figure 2.10: Some exemplar results on the Weizmann-Horses Dataset [2] obtained by the proposed method, overlaid on the images in yellow with white boundaries. δ is the percentage of correctly labeled pixels. Panels: (a) δ = 94.46%, (b) δ = 96.95%, (c) δ = 96.36%, (d) δ = 98.06%, (e) δ = 98%, (f) δ = 96.67%.]

Since there is no training set for the unsupervised methods and no standard trainval/test split for the supervised ones, it is difficult to follow the exact experimental setting. To make a fair comparison and avoid random split noise, we used 10 different random splits and took the average as the final performance. In every split, the dataset is randomly divided into subsets of 164 images for training, 82 for validation and 82 for testing. The whole image is considered as the bounding box, since every image contains only a single object. Table 2.6 shows the comparison with the previous methods: our method outperforms all the competing methods except OBJ CUT [92]. Note that although OBJ CUT [92] has the highest average accuracy of 96%, it was tested on only [...] images, which makes the comparison less persuasive. Table 2.7 summarizes the statistics of the accuracy (δ) obtained by the proposed method.
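The evaluation protocol described above (the pixel accuracy δ, the all-background baseline, and ten random 164/82/82 splits whose test accuracies are averaged) can be sketched as follows. This is an illustrative sketch under stated assumptions: masks are flattened lists with label 0 for background and 1 for the horse, the seed is arbitrary, and `evaluate` is a hypothetical callback standing in for training and testing the segmentation model, which is not reproduced here.

```python
import random
import statistics


def pixel_accuracy(pred, gt):
    """delta: percentage of pixels whose predicted label equals the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("masks must be non-empty and of equal size")
    correct = sum(1 for p, g in zip(pred, gt) if p == g)
    return 100.0 * correct / len(gt)


def all_background_baseline(gt):
    """Accuracy obtained by labeling every pixel as background (label 0)."""
    return pixel_accuracy([0] * len(gt), gt)


def random_splits(n_images, sizes=(164, 82, 82), n_splits=10, seed=0):
    """Yield (train, val, test) index lists for repeated random splits."""
    if sum(sizes) != n_images:
        raise ValueError("split sizes must add up to the number of images")
    rng = random.Random(seed)
    for _ in range(n_splits):
        idx = list(range(n_images))
        rng.shuffle(idx)
        a, b = sizes[0], sizes[0] + sizes[1]
        yield idx[:a], idx[a:b], idx[b:]


def average_over_splits(evaluate, n_images=328, n_splits=10, seed=0):
    """Average a per-split score; `evaluate` is a hypothetical callback that
    trains on (train, val) and returns the mean delta on the test indices."""
    scores = [evaluate(tr, va, te)
              for tr, va, te in random_splits(n_images, n_splits=n_splits, seed=seed)]
    return statistics.mean(scores)


def summarize(deltas):
    """Mean, minimum, maximum and median of per-image accuracies (cf. Table 2.7)."""
    return {
        "mean": statistics.mean(deltas),
        "min": min(deltas),
        "max": max(deltas),
        "median": statistics.median(deltas),
    }


if __name__ == "__main__":
    gt = [0, 0, 0, 1, 1, 0, 0, 0]    # toy flattened ground-truth mask
    pred = [0, 0, 1, 1, 1, 0, 0, 0]  # toy prediction with one wrong pixel
    print(pixel_accuracy(pred, gt))     # 87.5
    print(all_background_baseline(gt))  # 75.0
    splits = list(random_splits(328))
    print(len(splits), [len(s) for s in splits[0]])  # 10 [164, 82, 82]
```

Fixing the seed makes the ten splits reproducible, which matters when the point of averaging is to reduce random split noise rather than to add new noise between runs.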
The median accuracy is 96.09%, which means that on at least 41 of the 82 test images the proposed method exceeds the 96% average accuracy of OBJ CUT [92]. Some qualitative results with their corresponding segmentation accuracy are shown in Fig. 2.10. The proposed method predicts almost perfect results apart from some small local errors; even the worst result has an accuracy of 93.76%. Note that this dataset [2] is much easier than the VOC dataset [1], and the performance is almost saturated. The results also suggest that, with an over-complete training set covering large within-class variation, the proposed sparse reconstruction framework can significantly improve the segmentation results.

2.6 Chapter Summary
In this chapter, we presented an approach for semantic segmentation based on object detection via coupled global and local sparse representations. Unlike previous methods, we formulate segmentation as the reconstruction of a mask from the coarse masks predicted by object detectors. Global sparse reconstruction generally selects the most similar training masks, while local reconstruction handles local spatial deformation. The proposed algorithm achieves results competitive with the state-of-the-art algorithms on the PASCAL VOC and Weizmann-Horses benchmarks and outperforms other detection-based semantic segmentation algorithms. The current performance of the proposed method on the PASCAL VOC'12 benchmark is 48.4%, which leaves considerable room for improvement. Furthermore, the algorithm can be naturally extended to handle more categories in more complicated scenes, as long as there is enough annotated data to train the corresponding object detectors. One key property of our algorithm is that it relies heavily on object detectors. To overcome possible failure cases for interacting objects, layered information beyond the current simple scheme, which relies only on object detection scores, can be utilized.
For example, the pairwise depth information between different categories of objects, such as human and bicycle or human and sofa, can be learned and applied as a depth constraint to help resolve the ambiguity of interacting regions. Moreover, with better object detectors in the future, such as ones that can handle partial objects and occlusions well, significant improvement in object segmentation performance can be expected. Another promising direction is to adaptively model the local deformation and update the dictionary correspondingly. Furthermore, [...]

[...] foreground objects, while FullAvg is the average performance over all 21 categories.

List of Figures
1.1 Different sub-fields of visual recognition. Classification predicts the object category labels at the image level, detection localizes the objects by bounding boxes, while segmentation assigns every pixel the object category it belongs to.

[...] then we present the historical background of image segmentation and semantic segmentation; finally, we introduce contextualization, including various methods to obtain and utilize context information.

1.1.1 Image Classification
Image classification methods mainly fall into two categories: bag-of-words (BoW) based [10, 17–19] and deep learning based [20–23]. Traditional BoW models usually follow [...]

[...] and machine learning [11, 66, 81, 87]. The core subtasks of this area are object classification, detection and segmentation [10, 11, 87]. Classification tells whether an image contains a certain object or not, detection localizes the object by providing its bounding box, while segmentation aims to assign class labels to each pixel. A special case of segmentation is called semantic segmentation, where the [...]

[...] recognition and object recognition; therefore, how to emulate such capabilities by computer has been the focus of many visual recognition researchers.

[Figure 1.1: Different sub-fields of visual recognition.]

Visual recognition [...] detection and segmentation (see Fig. 1.1). As shown in Fig. 1.1, given a test image, object classification aims to predict the presence or absence of an example of a certain category within the image; object detection predicts not only the category labels of the objects but also their exact locations, constrained by bounding boxes; object segmentation goes further by distinguishing different objects into [...]

[...] datasets with enough data diversity and category distribution, can be transferred to extract features for other image datasets, possibly without enough training data [21–23, 32].

1.1.2 Object Detection
Similar to image classification, mainstream object detection methods also fall into two categories: sliding-window based [14, 33–36] and deep learning based [37–39]. Since the object of interest can be [...]

[...] learning problem. The above contextualization methods are mostly based on detection and classification. In this work, we comprehensively explore how to utilize different context cues to help the object segmentation task.

1.2 Thesis Focus and Contributions
Since the topic of this thesis is semantic object segmentation, and it has been shown that image classification, object detection and semantic segmentation are highly [...]
[...] representation [9, 10], and region labeling inference [11, 12]. A test image is first over-segmented into coherent regions such as super-pixels [6, 8]; then local or mid-level features, such as SIFT [13], HOG [14] and LBP [15], are extracted to represent these regions; finally, all of the features are fed into classification models such as SVM or CRF for the final labeling inference [11, 16]. Though [...]

Table 2.1: List of notations.
[...] the i-th object (1 ≤ i ≤ k)
Bounding box of the i-th object on I^t
Number of pixels within B_i
The unknown segmentation mask in B_i
Length of feature representation
Feature vector of I^t over B_i
Number of training objects
Mask of the j-th training object (1 ≤ j ≤ n)
Feature vector of the j-th training object
Codebook for training feature vectors {u_j}_{j=1}^n
Codebook for training masks {m_j}_{j=1}^n
Sparse [...]
[...] super-pixel in B_i (1 ≤ j ≤ r)
Mask of the j-th super-pixel in the l-th patch

[...] the proposed method is shown in Fig. 2.1. Next, we introduce the formulation of the figure-ground segmentation within an object bounding box B_i (see Table 2.1 for the list of notations).

2.2.1 Figure-ground Segmentation
Let us now consider the i-th object only. We exploit the detected object bounding box; thus I^t is cropped based on the [...]