
Summarization from Multiple User Generated Videos in Geo-space




DOCUMENT INFORMATION

Structure

  • Summary

  • List of Tables

  • List of Figures

  • 1 Introduction

    • 1.1 Background and Motivation

    • 1.2 Research Challenges and Contributions

  • 2 Literature Review and Preliminaries

    • 2.1 Literature Review

      • 2.1.1 Video Summarization

      • 2.1.2 Visual Exploration in Geo-space

      • 2.1.3 Visual Recommendation

    • 2.2 Preliminaries

      • 2.2.1 Symbolic Notations and Abbreviation

      • 2.2.2 Description of Viewable Scene in Videos

      • 2.2.3 Geo-referenced Video Dataset Description

  • 3 Static Summarization From Multiple Geo-referenced Videos

    • 3.1 Introduction

    • 3.2 Single Video Summarization

    • 3.3 Multi-video Summarization for a Landmark

      • 3.3.1 Subshot Selection

      • 3.3.2 Subshot Assembly

    • 3.4 Multi-video Summarization for an Area

    • 3.5 Experiments

      • 3.5.1 Classification Accuracy

      • 3.5.2 Keyframe Summary for Single Video

      • 3.5.3 Landmark Multi-Video Summary

      • 3.5.4 Multi-Landmark Multi-Video Summary

    • 3.6 Summary

  • 4 Region of Interest Detection and Summarization from Geo-referenced Videos

    • 4.1 Introduction

    • 4.2 Regions of Interest Detection from Multiple Videos

      • 4.2.1 Probabilistic Model for ROI Detection

      • 4.2.2 Capture Intention Distribution for Video Frames

      • 4.2.3 ROI Detection from Multiple Videos

    • 4.3 Summarization From Multiple Geo-referenced Videos

    • 4.4 Experiments

      • 4.4.1 ROI Detection

      • 4.4.2 Multi-Video Summarization

      • 4.4.3 Algorithm Robustness and Efficiency

    • 4.5 Summary

  • 5 Quality-Guided Multi-Video Summarization by Graph Formulation

    • 5.1 Introduction

    • 5.2 Aesthetics Assessment for User Generated Videos

    • 5.3 Multiple Video Summarization

      • 5.3.1 Graph Construction

      • 5.3.2 Dynamic Programming Solution

    • 5.4 Experiments

      • 5.4.1 Aesthetics Evaluation for User Generated Videos

      • 5.4.2 Aesthetics-Guided Multi-Video Summarization

    • 5.5 Summary

  • 6 Interactive and Dynamic Exploration Among Multiple Geo-referenced Videos

    • 6.1 Introduction

    • 6.2 Video Segmentation

      • 6.2.1 Landmark Popularity

      • 6.2.2 Landmark Completeness

    • 6.3 Video Summarization

    • 6.4 Online Query Process

      • 6.4.1 Data Structure Design

      • 6.4.2 Data Indexing

      • 6.4.3 Online Query Processing

    • 6.5 Experiments

      • 6.5.1 Offline Video Summarization

      • 6.5.2 Online Query

    • 6.6 Demo System

    • 6.7 Summary

  • 7 Camera Shooting Location Recommendation for Objects in Geo-Space

    • 7.1 Introduction

    • 7.2 Relevance Filter for Landmark Photos

    • 7.3 Photo Aesthetic Quality Measurement

      • 7.3.1 Color Features

      • 7.3.2 Texture Features

      • 7.3.3 Spatial Distribution Features

      • 7.3.4 Feature Classification

    • 7.4 Camera Location Recommendation

      • 7.4.1 GMM-based Camera Location Clustering

      • 7.4.2 Camera Location Recommendation

      • 7.4.3 Spatio-temporal Camera Spot Recommendations

    • 7.5 Experiments

      • 7.5.1 Data Setup

      • 7.5.2 Image Filtering

      • 7.5.3 Image Quality Measurement

      • 7.5.4 Viewing Statistics

      • 7.5.5 Camera Shooting Location Recommendations

    • 7.6 Demo System

    • 7.7 Summary

  • 8 Conclusions and Future Work

    • 8.1 Conclusions

    • 8.2 Future Work

      • 8.2.1 Adaptation to Updated Video Set

      • 8.2.2 Summarizations According to Video Categories

      • 8.2.3 Audio Quality Evaluation

      • 8.2.4 Summary with Crowdsourcing Knowledge

  • Bibliography

Content

SUMMARIZATION FROM MULTIPLE USER GENERATED VIDEOS IN GEO-SPACE

ZHANG YING
(B.Eng., NPU, CHINA)

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF SINGAPORE
2014

DECLARATION

I hereby declare that this thesis is my original work and that it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

ZHANG YING, July 2014

ACKNOWLEDGEMENTS

Foremost, I would like to express my sincere gratitude to my supervisor, Professor Roger Zimmermann, for his patience, motivation and knowledge. Without his guidance and constant feedback this Ph.D. would not have been achievable. I would also like to thank my committee members, Prof. Michael S. Brown, Prof. Mohan Kankanhalli and Prof. Wei Tsang Ooi, for their encouragement and insightful comments. My sincere thanks also go to Prof. Beomjoo Seo, Prof. Yu-Ling Hsueh, Prof. He Ma, Dr. David A. Shamma and Dr. Luming Zhang for discussions, suggestions and ideas for improvement. I thank my labmates in the Multimedia Management Lab and the Media Research Lab, Jia Hao, Haiyang Ma, Zhijie Shen, Xiangyu Wang, Xiaohong Xiang, Yangyang Xiang, Tian Gan, Guanfeng Wang, and Yifang Yin, for their support of my research work. I am also very grateful to all my friends in Singapore, who are always so helpful in numerous ways.

Last but not least, I would like to thank my family for all their love and encouragement: my parents, Shusheng Zhang and Li Yu, who raised me with a love of science and supported me in all my pursuits, and my beloved husband, Haitao Zhao, whose faithful support during my Ph.D. is deeply appreciated. Thank you.
SUMMARY

In recent years, we have witnessed an overwhelming number of user-generated videos being captured on a daily basis. An essential reason is the rapid development of camera technology: videos are now easily recorded on many portable devices, especially mobile smartphones. This flexibility also allows modern videos to be tagged with a variety of additional sensor properties. In this thesis, we are interested in geo-referenced videos, whose meta-data is closely tied to geographic identifiers. Such videos have great appeal for prospective travelers and visitors who are unfamiliar with a region, an area or a city. For example, before someone visits a place, a geo-referenced video search engine can quickly retrieve a list of videos captured there, so that visitors can obtain an overall visual impression conveniently and quickly. However, users face an ever increasing viewing burden as these video repositories grow and more videos become relevant to a search query. To manage these retrieval results and provide viewers with an efficient way to browse, we introduce a novel solution that automatically generates a summarization from multiple user generated videos and presents their salient content to viewers in an enjoyable manner.

This thesis consists of three major parts. In the first part, we introduce three pieces of work that produce a preview video summarizing a sub-area in geo-space from multiple videos. Several metrics are proposed to evaluate summary quality, and a heuristic method is used to determine the selection and connection of video segments. One of the key features of our technique is that it leverages geographic contexts to create a satisfactory summarization result automatically, robustly and efficiently. We also propose a graph-based model that formulates this summarization problem and can be applied to general videos. In the second part, an interactive and dynamic video exploration system is built in which people can issue personalized summary queries through direct map-based manipulations. In the third part, we investigate whether external crowdsourced databases contribute to improving the summary quality. Proposing a GMM-based model and integrating visual and social knowledge, we recommend a list of locations to be preferentially selected in a summarization, as they have the potential to yield appealing photos.

CHAPTER 8
Conclusions and Future Work

8.1 Conclusions

In this dissertation, we create summarizations from multiple user generated videos that are tagged with rich geographic sensor properties. One of the key features of our technique is that it leverages these geographic contexts to create a satisfactory summarization result automatically, robustly and efficiently.
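The geographic contexts mentioned here follow the viewable-scene model described in Section 2.2.2 (after Arslan Ay et al. [21]), in which each frame carries the camera position, compass direction, viewable angle and maximum visible distance. The snippet below is a minimal illustrative sketch, not code from the thesis, of how such a field-of-view description can be tested against an object location; the function name, the equirectangular distance approximation and the sample coordinates are assumptions made for illustration.

```python
import math

def point_in_fov(cam_lat, cam_lng, heading_deg, view_angle_deg, max_dist_m,
                 obj_lat, obj_lng):
    """Check whether an object location falls inside a camera's pie-shaped
    field of view (position, compass heading, viewable angle, visible distance).
    A simple equirectangular approximation is used for short distances."""
    # Approximate metric offsets of the object relative to the camera.
    lat_scale = 111_320.0                                   # meters per degree latitude
    lng_scale = 111_320.0 * math.cos(math.radians(cam_lat))
    dy = (obj_lat - cam_lat) * lat_scale                    # northward offset
    dx = (obj_lng - cam_lng) * lng_scale                    # eastward offset

    # Distance test: the object must lie within the visible distance.
    if math.hypot(dx, dy) > max_dist_m:
        return False

    # Angle test: the bearing to the object must lie within half the
    # viewable angle of the camera's compass heading.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0      # 0 deg = north, clockwise
    diff = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= view_angle_deg / 2.0

# Example: a frame shot facing north-east with a 60-degree lens and 300 m range.
print(point_in_fov(1.2966, 103.7764, 45.0, 60.0, 300.0, 1.2975, 103.7773))  # True
```

A check of this kind is what lets videos and subshots be associated with landmarks or query regions purely from sensor meta-data, before any content analysis is applied.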
The main contributions of this dissertation are summarized as follows:

Static Summarization from Multiple Geo-referenced Videos: We proposed a method to automatically generate a summarization for a queried space from multiple geo-referenced videos. A video skim generation model is introduced to cover each landmark comprehensively and to combine multiple videos in a pleasant and usable way. A route generator is used to determine a traveling sequence across summaries from multiple landmarks in a visually coherent manner.

Region of Interest Detection and Summarization from Geo-referenced Videos: We propose a method to detect regions of interest (ROIs) from multiple videos automatically according to their geographic properties, using the wisdom of the crowd. Compared with existing solutions, the average error distance is reduced from a few dozen meters to a few meters. Furthermore, we refine the summarization problem so that segments that cover the detected ROIs informatively are preferentially included in the skim.

Quality-Guided Multi-Video Summarization by Graph Formulation: The visual quality of user generated videos varies widely, so we refine the static summarization problem by including a quality factor and represent it with a graph model. Desirable summarization criteria are incorporated as graph attributes, and the problem is solved by a dynamic programming based framework. This is a general solution that can be applied to the summarization of any category of videos.

Interactive and Dynamic Exploration Among Geo-Referenced Videos: To provide users with a convenient way to explore videos, we build a real system in which people can interactively and dynamically explore videos through direct map manipulations. Given a start point and an end point, our system quickly generates a route trajectory between the two points and retrieves its summarization in real time. We proposed an R-tree based index to efficiently store the summarizations for each route section and a back-tracking algorithm to speed up summary retrieval.

Camera Shooting Location Recommendation for Objects in Geo-Space: To investigate whether external databases contribute to improving summarization quality, we proposed a GMM-based model that recommends to users a list of locations from which they may be able to capture appealing landmark photos themselves. We built a web-based demo system to visualize the recommended camera shooting locations for a user-selected landmark. Using the system, users gain a clear awareness of where to produce a nice landmark view and what visual perspectives can be created.
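The camera location recommendation described above clusters the capture positions of landmark photos with a Gaussian mixture model and then ranks the resulting clusters using the photos' estimated aesthetic quality. The sketch below illustrates that cluster-then-rank idea with scikit-learn; the score-sum ranking, the function name and the toy data are assumptions made for illustration rather than the exact formulation used in Chapter 7.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def recommend_camera_spots(locations, aesthetic_scores, n_components=5, top_k=3):
    """Cluster photo capture positions (lat, lng) with a GMM and rank the
    cluster centers by the total aesthetic score of the photos they attract."""
    X = np.asarray(locations, dtype=float)            # shape (n_photos, 2)
    scores = np.asarray(aesthetic_scores, dtype=float)

    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0).fit(X)
    labels = gmm.predict(X)

    # Score each mixture component by the aesthetic mass assigned to it.
    cluster_scores = np.zeros(n_components)
    for k in range(n_components):
        cluster_scores[k] = scores[labels == k].sum()

    ranked = np.argsort(cluster_scores)[::-1][:top_k]
    return [(tuple(gmm.means_[k]), float(cluster_scores[k])) for k in ranked]

# Toy usage: photo capture positions around a landmark with quality scores.
locs = [(1.2860, 103.8545), (1.2862, 103.8547), (1.2905, 103.8520),
        (1.2904, 103.8523), (1.2861, 103.8544), (1.2950, 103.8500)]
quality = [0.9, 0.8, 0.6, 0.7, 0.95, 0.3]
print(recommend_camera_spots(locs, quality, n_components=3, top_k=2))
```

Chapter 7 additionally filters irrelevant photos, integrates social knowledge and supports spatio-temporal recommendations, but the clustering-then-ranking structure sketched here is the core.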
8.2 Future Work

8.2.1 Adaptation to Updated Video Set

Our current work ignores the issue of how to update a summarization when new videos are continuously added, so when the database is updated it is very likely that the summarization must be recomputed. We will look for a more elegant design so that, when new clips are included, the system can efficiently update the summarization results accordingly.

8.2.2 Summarizations According to Video Categories

From recent observations, the volume of our geo-referenced videos is increasing rapidly. Initially, the majority of our collected videos were outdoor videos, recorded for example along streets or in open spaces, so the video content mostly captures the external appearance of the objects in these regions and hence we define video salience in terms of the landmarks in geo-space. However, as more and more users adopt our applications, their purposes for recording a video may differ. For example, we observe that some recent videos are captured while driving so that they can be used for navigation, and some record popular events such as entertainment activities. This implies that the semantic categories of these videos are different. Although we can conduct content-based analysis to determine video categories, we are interested in whether such categories can be inferred from sensor data (not limited to geographic sensor data). In the future, we will investigate 1) whether sensor data can contribute to video categorization and 2) whether we can provide different summarization strategies according to video category.

8.2.3 Audio Quality Evaluation

Our current work leverages the geographic contexts and visual contents of videos to create a summary but ignores the contribution of audio. Audio quality, however, may also affect summary quality, as noise can seriously degrade the viewing experience. In the future, audio quality should therefore also be taken into consideration.

8.2.4 Summary with Crowdsourcing Knowledge

In our last work, we recommend photography spots by mining crowdsourced geo-tagged photo collections. Such recommendation results implicitly indicate where users like to take a good capture of an object, and they suggest that videos captured at the top recommended spots may be selected with higher priority when we create an object summary. In the future, we will investigate how to incorporate such results into summarization generation. One research challenge is that some top locations are distant from each other, which might introduce visual inconsistency when the corresponding candidates are concatenated into the final summarization. We will look into the balance between these factors and try to improve the overall summarization quality.

Bibliography

[1] Camera Brands used in the Flickr Community. www.flickr.com/cameras/.
[2] Cisco Visual Networking Index: Forecast and Methodology, 2013–2018. www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.pdf/.
[3] DPChallenge dataset. www.ritendra.weebly.com/aesthetics-datasets.html/.
[4] Flowplayer. flowplayer.org/.
[5] GeoVid. geovid.org/.
[6] Global Internet Phenomena Report: 1H 2013. www.sandvine.com/news/global_broadband_trends.asp/.
[7] Google Maps API. developers.google.com/maps/.
[8] Multi-Video Summarization based on Video-MMR.
[9] OpenStreetMap. www.openstreetmap.org/.
[10] Photobucket Survey: Video Uploads from Mobile Devices on the Rise. www.businesswire.com/news/home/20110829005068/en/Photobucket-Survey-Video-Uploads-Mobile-Devices-Rise#.U6D9a9Jmh8E/.
[11] TIGER/Line. www.census.gov/geo/www/tiger/.
[12] Traveling salesman API. wiki.openstreetmap.org/wiki/Traveling_salesman.
[13] Why Online Video Is Vital For Your 2013 Content Marketing Objectives. www.forbes.com/sites/seanrosensteel/2013/01/28/why-online-video-is-vital-for-your-2013-content-marketing-objectives/.
[14] Worldwide Camera Phone Industry Continues to Flourish: A Little Advice for Camera Phone & Traditional Camera Vendors. blog.infotrends.com/?p=13801/.
[15] YouTube Press Statistics, 2014. www.youtube.com/t/press_statistics/.
[16] G. Abdollahian, C. Taskiran, Z. Pizlo, and E. Delp. Camera Motion-Based Analysis of User Generated Video. IEEE Transactions on Multimedia, 12(1):28–41, 2010.
[17] S. Ahern, M. Naaman, R. Nair, and J. H.-I. Yang. World Explorer: Visualizing Aggregate Data From Unstructured Text in Geo-Referenced Collections. In ACM/IEEE-CS Joint Conference on Digital Libraries, 2007.
[18] K. Aizawa, K. Ishijima, and M. Shiina. Summarizing Wearable Video. In International Conference on Image Processing, 2001.
[19] K. Aizawa, D. Tancharoen, S. Kawasaki, and T. Yamasaki. Efficient Retrieval of Life Log based on Context and Content. In ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, 2004.
[20] Y. Arase, X. Xie, T. Hara, and S. Nishio. Mining People's Trips from Large Scale Geo-tagged Photos. In ACM International Conference on Multimedia, 2010.
[21] S. Arslan Ay, R. Zimmermann, and S. H. Kim. Viewable Scene Modeling for Geospatial Video Search. In ACM International Conference on Multimedia, 2008.
[22] T. L. Berg and D. A. Forsyth. Animals on the Web. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
[23] S. Bhattacharya, R. Sukthankar, and M. Shah. A Framework for Photo-Quality Assessment and Enhancement Based on Visual Aesthetics. In ACM International Conference on Multimedia, 2010.
[24] Z. Botev, D. Kroese, and T. Taimre. Generalized Cross-entropy Methods with Applications to Rare-event Simulation and Optimization. Simulation, 83(11):785–806, 2007.
[25] K. Bradley and B. Smyth. Improving Recommendation Diversity. In National Conference in Artificial Intelligence and Cognitive Science, 2001.
[26] A. Brahmachari and S. Sarkar. View Clustering of Wide-Baseline N-views for Photo Tourism. In Conference on Graphics, Patterns and Images, 2011.
[27] M. Buchin, A. Driemel, M. van Kreveld, and V. Sacristán. An Algorithmic Framework for Segmenting Trajectories based on Spatio-temporal Criteria. In SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2010.
[28] D. Cai and X. He. Manifold Adaptive Experimental Design for Text Categorization. IEEE Transactions on Knowledge and Data Engineering, 24(4):707–719, 2012.
[29] M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon. I Tube, You Tube, Everybody Tubes: Analyzing the World's Largest User Generated Content Video System. In ACM SIGCOMM Conference on Internet Measurement, 2007.
[30] Y.-M. Chen and I. Bajic. A Joint Approach to Global Motion Estimation and Motion Segmentation From a Coarsely Sampled Motion Vector Field. IEEE Transactions on Circuits and Systems for Video Technology, 21(9):1316–1328, 2011.
[31] B. Cheng, B. Ni, S. Yan, and Q. Tian. Learning to Photograph. In ACM International Conference on Multimedia, 2010.
[32] P. Chippendale, M. Zanin, and C. Andreatta. Collective Photography. In Conference for Visual Media Production, 2009.
[33] Y. Cong, J. Yuan, and J. Luo. Towards Scalable Summarization of Consumer Videos Via Sparse Dictionary Selection. IEEE Transactions on Multimedia, 14(1):66–75, 2012.
[34] M. Cooper, J. Foote, A. Girgensohn, and L. Wilcox. Temporal Event Clustering for Digital Photo Collections. In ACM International Conference on Multimedia, 2003.
[35] G. Cornuejols, M. L. Fisher, and G. L. Nemhauser. Location of Bank Accounts to Optimize Float: an Analytic Study of Exact and Approximate Algorithms. Management Science, 23(8):789–810, 1977.
[36] N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005.
[37] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Studying Aesthetics in Photographic Images using a Computational Approach. In European Conference on Computer Vision, 2006.
[38] G. Douglas. R-Tree, Templated C++ Implementation. superliminal.com/sources/RTreeTemplate.zip/, 2010.
[39] A. Ekin, A. Tekalp, and R. Mehrotra. Automatic Soccer Video Analysis and Summarization. IEEE Transactions on Image Processing, 12(7):796–807, 2003.
[40] B. Epshtein, E. Ofek, Y. Wexler, and P. Zhang. Hierarchical Photo Organization Using Geo-Relevance. In ACM International Symposium on Advances in Geographic Information Systems, 2007.
[41] R. Fergus, P. Perona, and A. Zisserman. A Visual Category Filter for Google Images. In European Conference on Computer Vision, 2004.
[42] K. Gavric, D. Culibrk, P. Lugonja, M. Mirkovic, and V. Crnojevic. Detecting Attractive Locations and Tourists' Dynamics Using Geo-Referenced Images. In International Conference on Telecommunication in Modern Satellite Cable and Broadcasting Services, 2011.
[43] Y. Gong and X. Liu. Summarizing Video by Minimizing Visual Content Redundancies. In IEEE International Conference on Multimedia and Expo, 2001.
[44] C. Graham. Vision and Visual Perception. Wiley Inc., 1965.
[45] J. Hao, G. Wang, B. Seo, and R. Zimmermann. Keyframe Presentation for Browsing of User-generated Videos on Map Interfaces. In ACM International Conference on Multimedia, 2011.
[46] J. Harel, C. Koch, and P. Perona. Graph-Based Visual Saliency. In Neural Information Processing Systems, 2006.
[47] L. He, E. Sanocki, A. Gupta, and J. Grudin. Auto-Summarization of Audio-Video Presentations. In ACM International Conference on Multimedia, 1999.
[48] T. Hori and K. Aizawa. Context-Based Video Retrieval System for the Life-log Applications. In ACM SIGMM International Workshop on Multimedia Information Retrieval, 2003.
[49] X.-S. Hua, L. Lu, and H.-J. Zhang. Photo2Video: A System for Automatically Converting Photographic Series Into Video. IEEE Transactions on Circuits and Systems for Video Technology, 16(7):803–819, 2006.
[50] A. Jaffe, M. Naaman, T. Tassa, and M. Davis. Generating Summaries and Visualization for Large Collections of Geo-Referenced Photographs. In ACM International Workshop on Multimedia Information Retrieval, 2006.
[51] F. Jing, L. Zhang, and W. Y. Ma. VirtualTour: an Online Travel Assistant Based on High Quality Images. In ACM International Conference on Multimedia, 2006.
[52] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to Predict Where Humans Look. In IEEE International Conference on Computer Vision, 2009.
[53] L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr Helps Us Make Sense of the World: Context and Content in Community-Contributed Media Collections. In ACM International Conference on Multimedia, 2007.
[54] L. S. Kennedy and M. Naaman. Generating Diverse and Representative Image Search Results for Landmarks. In International Conference on World Wide Web, 2008.
[55] J.-G. Lee, J. Han, and K.-Y. Whang. Trajectory Clustering: A Partition-and-group Framework. In ACM SIGMOD International Conference on Management of Data, 2007.
[56] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain. Content-Based Multimedia Information Retrieval: State of the Art and Challenges. ACM Transactions on Multimedia Computing, Communications and Applications, 2(1):1–19, 2006.
[57] C. Li and T. Chen. Aesthetic Visual Quality Assessment of Paintings. IEEE Journal of Selected Topics in Signal Processing, 3(2):236–252, 2009.
[58] Y. Li and B. Merialdo. Multi-Video Summarization based on AV-MMR. In International Workshop on Content-Based Multimedia Indexing, 2010.
[59] Y. Li and B. Merialdo. Multi-Video Summarization Based on Video-MMR. In International Workshop on Image Analysis for Multimedia Interactive Services, 2010.
[60] Y. Li and B. Merialdo. Multi-Video Summarization based on OB-MMR. In International Workshop on Content-Based Multimedia Indexing, 2011.
[61] R. Lienhart, S. Pfeiffer, and W. Effelsberg. Video Abstracting. Communications of the ACM, 40:55–62, 1997.
[62] S. Lu, I. King, and M. Lyu. Video Summarization by Video Structure Analysis and Graph Optimization. In IEEE International Conference on Multimedia and Expo, 2004.
[63] X. Lu, C. Wang, J.-M. Yang, Y. Pang, and L. Zhang. Photo2Trip: Generating Travel Routes from Geo-tagged Photos for Trip Planning. In ACM International Conference on Multimedia, 2010.
[64] Y. Luo and X. Tang. Photo and Video Quality Evaluation: Focusing on the Subject. In European Conference on Computer Vision, 2008.
[65] Y.-F. Ma, L. Lu, H.-J. Zhang, and M. Li. A User Attention Model for Video Summarization. In ACM International Conference on Multimedia, 2002.
[66] A. G. Money and H. Agius. Video Summarisation: A Conceptual Framework and Survey of the State of the Art. Journal of Visual Communication and Image Representation, 19(2):121–143, 2008.
[67] A. K. Moorthy, P. Obrador, and N. Oliver. Towards Computational Models of the Visual Aesthetic Appeal of Consumer Videos. In European Conference on Computer Vision: Part V, 2010.
[68] M. Naaman, Y. J. Song, A. Paepcke, and H. Garcia-Molina. Automatic Organization for Digital Photographs with Geographic Coordinates. In ACM/IEEE-CS Joint Conference on Digital Libraries, 2004.
[69] C.-W. Ngo, Y.-F. Ma, and H.-J. Zhang. Automatic Video Summarization by Graph Modeling. In IEEE International Conference on Computer Vision, 2003.
[70] J. Nievergelt, H. Hinterberger, and K. Sevcik. The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems, 9(1):38–71, 1984.
[71] G. Noronha, C. Álvares, and T. Chambel. Sight Surfers: 360° Videos and Maps Navigation. In ACM Multimedia Workshop on Geotagging and its Applications in Multimedia, 2012.
[72] T. Ojala, M. Pietikäinen, and D. Harwood. A Comparative Study of Texture Measures with Classification Based on Featured Distributions. Pattern Recognition, 29(1):51–59, 1996.
[73] A. Okabe, B. Boots, and K. Sugihara. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams. John Wiley & Sons, Inc., 1992.
[74] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999.
[75] S. Palmer, E. Rosch, and P. Chase. Canonical Perspective and the Perception of Objects. Attention and Performance IX, pages 135–151, 1981.
[76] S. Papadopoulos, C. Zigkolis, S. Kapiris, Y. Kompatsiaris, and A. Vakali. ClustTour: City Exploration by Use of Hybrid Photo Clustering. In ACM International Conference on Multimedia, 2010.
[77] S. Pongnumkul, J. Wang, and M. Cohen. Creating Map-Based Storyboards for Browsing Tour Videos. In ACM Symposium on User Interface Software and Technology, 2008.
[78] P. Rigaux, M. Scholl, and A. Voisard. Spatial Databases with Application to GIS. Morgan Kaufmann Publishers Inc., 2002.
[79] J. Roettgers. Online Video Will Be More Popular Than Facebook and Twitter by 2017. gigaom.com/2013/05/29/online-video-will-be-more-popular-than-facebook-and-twitter-by-2017/, 2013.
[80] M. K. Saini, R. Gadde, S. Yan, and W. T. Ooi. MoViMash: Online Mobile Video Mashup. In ACM International Conference on Multimedia, 2012.
[81] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting Image Databases from the Web. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4):754–766, 2011.
[82] S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
[83] J. Shao, D. Jiang, M. Wang, H. Chen, and L. Yao. Multi-Video Summarization using Complex Graph Clustering and Mining. Computer Science and Information Systems, 7(1), 2010.
[84] Z. Shen, S. Arslan Ay, S. H. Kim, and R. Zimmermann. Automatic Tag Generation and Ranking for Sensor-Rich Outdoor Videos. In ACM International Conference on Multimedia, 2011.
[85] F. Shipman, A. Girgensohn, and L. Wilcox. Creating Navigable Multi-level Video Summaries. In IEEE International Conference on Multimedia and Expo, 2003.
[86] I. Simon, N. Snavely, and S. Seitz. Scene Summarization for Online Image Collections. In IEEE International Conference on Computer Vision, 2007.
[87] M. Smith and T. Kanade. Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.
[88] N. Snavely, R. Garg, S. M. Seitz, and R. Szeliski. Finding Paths Through the World's Photos. ACM Transactions on Graphics, 27(3):15:1–15:11, 2008.
[89] N. Snavely, S. M. Seitz, and R. Szeliski. Photo Tourism: Exploring Photo Collections in 3D. In ACM International Conference and Exhibition on Computer Graphics and Interactive Techniques, 2006.
[90] MarketingCharts staff. Netflix and YouTube, Half of North American Peak Downstream Traffic. www.marketingcharts.com/wp/interactive/netflix-youtube-half-of-north-american-peak-downstream-traffic-29549/, 2013.
[91] M. A. Stricker and M. Orengo. Similarity of Color Images. Proceedings of Storage and Retrieval for Image and Video Databases, 3:381–392, 1995.
[92] H.-H. Su, T.-W. Chen, C.-C. Kao, W. Hsu, and S.-Y. Chien. Preference-Aware View Recommendation System for Scenic Photos Based on Bag-of-Aesthetics-Preserving Features. IEEE Transactions on Multimedia, 14(3):833–843, 2012.
[93] D. Tjondronegoro, Y.-P. P. Chen, and B. Pham. Highlights for more Complete Sports Video Summarization. IEEE MultiMedia, 11(4):22–37, 2004.
[94] K. Toyama, R. Logan, and A. Roseway. Geographic Location Tags on Digital Images. In ACM International Conference on Multimedia, 2003.
[95] B. T. Truong and S. Venkatesh. Video Abstraction: A Systematic Review and Classification. ACM Transactions on Multimedia Computing, Communications and Applications, 3(1), 2007.
[96] P. Verna. A Spotlight on UGC Participants. www.emarketer.com/Article/Spotlight-on-UGC-Participants/1006914/, 2009.
[97] F. Wang and B. Merialdo. Multi-Document Video Summarization. In IEEE International Conference on Multimedia and Expo, 2009.
[98] G. Wang, H. Ma, B. Seo, and R. Zimmermann. Sensor-assisted Camera Motion Analysis and Motion Estimation Improvement for H.264/AVC Video Encoding. In International Workshop on Network and Operating System Support for Digital Audio and Video, 2012.
[99] X. Wang, T. Han, and S. Yan. An HOG-LBP Human Detector with Partial Occlusion Handling. In IEEE International Conference on Computer Vision, 2009.
[100] Y. Wang. Beauty is Here: Evaluating Aesthetics in Videos Using Multimodal Features and Free Training Data. In ACM International Conference on Multimedia, 2013.
[101] Z. Wang and Q. Li. Video Quality Assessment Using a Statistical Model of Human Visual Speed Perception. Journal of the Optical Society of America A, 24(12):B61–B69, 2007.
[102] S. Wilk and W. Effelsberg. Crowdsourced Evaluation of the Perceived Viewing Quality in User-Generated Video. In ACM International Workshop on Crowdsourcing for Multimedia, 2013.
[103] W. Wolf. Key Frame Selection by Motion Analysis. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996.
[104] C. Xu, J. Wang, H. Lu, and Y. Zhang. A Novel Framework for Semantic Annotation and Personalized Retrieval of Sports Video. IEEE Transactions on Multimedia, 10(3):421–436, 2008.
[105] C. Xu, J. Wang, K. Wan, Y. Li, and L. Duan. Live Sports Event Detection based on Broadcast Video and Web-casting Text. In ACM International Conference on Multimedia, 2006.
[106] M. Xu, N. Maddage, C. Xu, M. Kankanhalli, and Q. Tian. Creating Audio Keywords for Event Detection in Soccer Video. In International Conference on Multimedia and Expo, 2003.
[107] C.-Y. Yang, H.-H. Yeh, and C.-S. Chen. Video Aesthetic Quality Assessment by Combining Semantically Independent and Dependent Features. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.
[108] W. Yin, T. Mei, and C. W. Chen. Crowdsourced Learning to Photograph via Mobile Devices. In IEEE International Conference on Multimedia and Expo, 2012.
[109] K. Yu, J. Bi, and V. Tresp. Active Learning Via Transductive Experimental Design. In International Conference on Machine Learning, 2006.
[110] K. Yu, S. Zhu, W. Xu, and Y. Gong. Non-greedy Active Learning for Text Categorization Using Convex Transductive Experimental Design. In International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008.
[111] D. Zhang and S.-F. Chang. Event Detection in Baseball Video Using Superimposed Caption Recognition. In ACM International Conference on Multimedia, 2002.
[112] Y. Zhang, H. Ma, and R. Zimmermann. Dynamic Multi-video Summarization of Sensor-Rich Videos in Geo-Space. In International Conference on Multimedia Modeling, 2013.
[113] Y. Zhang, G. Wang, B. Seo, and R. Zimmermann. Multi-Video Summary and Skim Generation of Sensor-Rich Videos in Geo-Space. In Multimedia Systems Conference, 2012.
[114] Y. Zhang, L. Zhang, and R. Zimmermann. Aesthetics-Guided Summarization from Multiple User Generated Videos. ACM Transactions on Multimedia Computing, Communications and Applications, 2014.
[115] Y. Zhang and R. Zimmermann. DVS: A Dynamic Multi-video Summarization System of Sensor-rich Videos in Geo-space. In ACM International Conference on Multimedia, 2012.
[116] Y. Zhang and R. Zimmermann. Camera Shooting Location Recommendations for Landmarks in Geo-space. In IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2013.
[117] Y. Zhang, R. Zimmermann, L. Zhang, and D. A. Shamma. Points of Interest Detection from Multiple Sensor-Rich Videos in Geo-Space. In ACM International Conference on Multimedia, 2014.
[118] Y.-T. Zheng, Z.-J. Zha, and T.-S. Chua. Research and Applications on Georeferenced Multimedia: a Survey. Multimedia Tools and Applications, 51(1):77–98, 2011.
[119] Z. Wang, H. R. Sheikh, and A. C. Bovik. Objective Video Quality Assessment. In The Handbook of Video Databases: Design and Applications, pages 1041–1078, 2003.

[...]

Figure 2.1: Main topics in the literature survey and the relations to our work. (Diagram labels: Exploration in Geo-space; Location & Quality; Video Quality Evaluation; Static Summarization From Multiple Geo-referenced Videos; Interactive and Dynamic Exploration Among Geo-referenced Videos: For a Landmark, For a Region, For a Route; Camera Shooting Location Recommendation for Objects in Geo-Space.)

2.1.1 Video Summarization

... whether they are composed of only a series of independent images or of video segments that come with audio streams. In the next part, we investigate the mainstream techniques for single video summarization and for multiple video summarization in Section 2.1.1.

Single Video Summarization. In the field of video summarization, the majority of existing techniques are based on low-level media signals ...

... with multi-video summarization; then a few existing studies on multi-video summarization are discussed. 2) Our work generates summarizations from geo-referenced videos, so we investigate how current studies associate geo-properties with different media documents; a survey of visual exploration in geo-space is introduced. 3) Our work investigates a ...

... ensure retrieval of pertinent information. The users need to view all the videos to obtain their final selections.

Figure 1.2: Overview of the geo-referenced video search system. (Diagram labels: FOV of query; meta-data; videos; video list; end-user client; video acquisition; web interface for video retrieval; video storage.)

However, the volume of user generated videos is increasing and hence more relevant videos are retrieved ... determine the order among the selected video clips. All the above problems motivate us to produce a summarization from multiple geo-referenced videos, and we discuss the research challenges in the next section.

1.2 Research Challenges and Contributions

In this thesis, we design a method to produce a segment-composed video summarization (video skim) from multiple user generated geo-referenced videos ...

Figure 1.3: Overview of the main contributions of this thesis. There are three main components: summarization, exploration and improvement. (Diagram labels: Framework; ROI Detection & Summary Refinement; Interactive & Dynamic Exploration; Quality-Guided Summary; Summarization; Exploration; Recommend Camera Locations; Improvement.)

... a sub-area in geo-space from multiple videos. In the second part, an interactive and dynamic video exploration ... coverage, redundancy and inconsistency, and propose a heuristic solution to optimize each of these factors. The summary length is fixed (static) and depends entirely on the original information in the input videos. 2) Region of Interest Detection and Summarization from Geo-referenced Videos: The above work requires users to specify which landmarks should be included in the summary. Additionally, ...

... summarization, visual exploration in geo-space and visual recommendation. As illustrated in Fig. 2.1, each colored circle refers to a related topic and each arrow indicates how this topic is related to my contribution in this thesis. 1) Our main target is to create a video summarization, so we look into the mainstream techniques in this field. We start from single video summarization as it shares some ...

... highlights from a long list of information pieces [66]. The operation of video summarization creates a shorter video clip (the terms video segment and video section are used interchangeably in this proposal) or a video poster that includes only the important scenes of the original video streams, so that users can gain a better understanding of a video document without watching the ...

... manipulations. In the third part, we investigate whether external crowdsourced databases contribute to improving the summary quality. A brief introduction to each of these five works follows: 1) Static Summarization from Multiple Geo-Referenced Videos: We take a rectangular region query as an initial attempt and investigate how to create a summary for this region from its retrieved videos. We ...
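The rectangular region query mentioned in the last excerpt is served by the geo-referenced video search pipeline sketched in Figure 1.2: the stored location meta-data is matched against the query region and the matching videos become the input of the summarization. The snippet below is a naive, illustrative version of that retrieval step, not code from the thesis; Chapter 6 replaces such a linear scan with an R-tree index, and the data layout and names here are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VideoFrameMeta:
    video_id: str
    timestamp: float
    lat: float
    lng: float

def videos_in_region(frames: List[VideoFrameMeta],
                     region: Tuple[float, float, float, float]) -> List[str]:
    """Return ids of videos having at least one frame whose camera position
    lies inside the query rectangle (min_lat, min_lng, max_lat, max_lng).
    A linear scan stands in for the R-tree index described in Chapter 6."""
    min_lat, min_lng, max_lat, max_lng = region
    hits, seen = [], set()
    for f in frames:
        if min_lat <= f.lat <= max_lat and min_lng <= f.lng <= max_lng:
            if f.video_id not in seen:
                seen.add(f.video_id)
                hits.append(f.video_id)
    return hits

# Toy usage: two videos, one of which crosses the queried rectangle.
frames = [
    VideoFrameMeta("v1", 0.0, 1.3001, 103.7721),
    VideoFrameMeta("v1", 1.0, 1.3004, 103.7729),
    VideoFrameMeta("v2", 0.0, 1.3100, 103.7800),
]
print(videos_in_region(frames, (1.2990, 103.7710, 1.3010, 103.7740)))  # ['v1']
```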
