THE TRANSMISSION AND PROCESSING OF SENSOR-RICH VIDEOS IN MOBILE ENVIRONMENT

HAO JIA
B.E., HIT, CHINA

A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2013

Declaration

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

HAO Jia
30 Oct 2013

© 2013 HAO Jia
All Rights Reserved

Dedication

This thesis is dedicated to my beloved sister and friend, Hao Ming, and my beloved parents, Hao Peigang and Li Deying, who gave me unconditional support and love all my life.

Acknowledgements

This thesis is the result of five years of work during which I have been accompanied and supported by many people. Without them, the completion of my thesis would not have been possible. It is now my great pleasure to take this opportunity to thank them.

First and foremost, I would like to express my most profound gratitude to my supervisor, Prof. Roger Zimmermann, for his guidance and support. It has been an invaluable experience working with him in the past five years. His insights, suggestions and guidance helped me sharpen my research skills, and his inspiration, patience and encouragement helped me conquer the difficulties and complete my Ph.D. program successfully. It has been a great honor for me to be his student.

My gratitude and appreciation go to my advisory and examining committee, Prof. Wang Ye, Prof. Ooi Wei Tsang, and Prof. Pung Hung Keng, for their invaluable assistance, feedback and patience at all stages of this thesis. Their criticisms, comments, and advice were critical in making this thesis more accurate, more complete and clearer to read.

I would also like to thank the School of Computing, National University of Singapore, for providing me with the opportunity to pursue doctoral research with financial support.
My sincere thanks go out to Dr. Seon Ho Kim, Dr. Beomjoo Seo and Dr. Sakire Arslan Ay, with whom I have collaborated during my Ph.D. research. Their conceptual and technical insights into my research work have been invaluable.

I want to express my sincere appreciation to my dear colleagues Liang Ke, Ma He, Shen Zhijie, Zhang Ying, Ma Haiyang, Cui Weiwei, Wang Guanfeng and Yin Yifang in the Media Management Research Lab. We have experienced a lot together and moved forward with each other.

I also want to thank my dearest friends in NUS: Chen Qi, Deng Fanbo, Lu Meiyu, Ma He, Wang Xiaoli, Yang Xin and Zhang Meihui. I am grateful for the encouragement and enlightenment they gave me. They helped me through the most difficult period and made my life wonderful.

Last, but definitely not least, I would like to thank my family for their love and support. None of my achievements would be possible without their love and encouragement.

Publications

Peer Reviewed

• Jia Hao, Seon Ho Kim, Sakire Arslan Ay and Roger Zimmermann. Energy-Efficient Mobile Video Management using Smartphones. In Proceedings of the 2nd ACM Multimedia Systems Conference (ACM MMSys), February 2011.
• Jia Hao, Guanfeng Wang, Beomjoo Seo and Roger Zimmermann. Keyframe Presentation for Browsing of User-generated Videos on Map Interface. In Proceedings of the 19th annual ACM International Conference on Multimedia (ACM MM), November 2011.
• Beomjoo Seo, Jia Hao and Guanfeng Wang. Sensor-rich Video Exploration on a Map Interface. In Proceedings of the 19th annual ACM International Conference on Multimedia (ACM MM), November 2011.
• Jia Hao, Roger Zimmermann and Haiyang Ma. GTube: Geo-Predictive Video Streaming over HTTP in Mobile Environment. In the 5th ACM Multimedia Systems Conference (ACM MMSys), March 2014.

Under Review

• Jia Hao, Guanfeng Wang, Beomjoo Seo and Roger Zimmermann. Point of Interest Detection and Visual Distance Estimation for Sensor-rich Video. In IEEE TMM, 2014.
• Ke Liang, Jia Hao, Roger Zimmermann and David Y.C. Yau. Integrated Prefetching and Caching for Adaptive Streaming over HTTP: An Online Approach. In IEEE ICDCS, 2014.

Patent

• Roger ZIMMERMANN, Seon Ho KIM, Sakire ARSLAN AY, Beomjoo SEO, Zhijie SHEN, Guanfeng WANG, Jia HAO, Ying ZHANG. “APPARATUS, SYSTEM, AND METHOD FOR ANNOTATION OF MEDIA FILES WITH SENSOR DATA”. WIPO Patent Application No. 2012115593. 31 Aug. 2012.

CONTENTS

Summary
List of Figures
List of Tables

1 Introduction
  1.1 Background and Motivations
  1.2 Research Work and Contributions
    1.2.1 Energy-Efficient Video Acquisition and Upload
    1.2.2 Point of Interest Detection and Visual Distance Estimation
    1.2.3 Keyframe Presentation of User Generated Videos on a Map Interface
    1.2.4 Geo-Predictive Video Streaming
  1.3 Organization
  1.4 Terminology Definitions

2 Literature Review
  2.1 Energy Management on Mobile Devices
    2.1.1 System-Level Energy Management
    2.1.2 Application-Level Energy Management
    2.1.3 Summary
  2.2 Geo-Referenced Digital Media
    2.2.1 Techniques for Geo-referenced Images
    2.2.2 Techniques for Geo-referenced Videos
    2.2.3 Commercial Products
    2.2.4 Video Sensor Networks
    2.2.5 Summary
  2.3 Geo-Location Mining
    2.3.1 Mining Location History
    2.3.2 Landmark Mining from Social Sharing Websites
  2.4 Video Presentation
    2.4.1 Keyframe Extraction
    2.4.2 Video Summarization
    2.4.3 Summary
  2.5 Adaptive HTTP Streaming
    2.5.1 HTTP Streaming Fundamentals
    2.5.2 Quality Adaptation in Adaptive HTTP Streaming
    2.5.3 Location-Aided Video Delivery Systems
    2.5.4 Summary

3 Energy-Efficient Video Acquisition and Upload
  3.1 Introduction
  3.2 Power Model
    3.2.1 Modeled Hardware Components
    3.2.2 Analytical Power Model
    3.2.3 Validation of the Power Model
  3.3 System Design
    3.3.1 Data Acquisition and Upload
    3.3.2 Data Storage and Indexing
    3.3.3 Query Processing
  3.4 Experimental Evaluation
    3.4.1 Simulator Operation
    3.4.2 Simulator Architecture and Modules
    3.4.3 Experiments and Results
  3.5 Prototype
    3.5.1 Android Geo-Video Application
    3.5.2 User Interface
  3.6 Summary

4 Point of Interest Detection and Visual Distance Estimation
  4.1 Introduction
  4.2 Approach Design
    4.2.1 POI Detection
    4.2.2 Effective Visual Distance Estimation
  4.3 Experiments
    4.3.1 Data Collection
    4.3.2 Results
    4.3.3 Discussion
  4.4 Summary

5 Keyframe Presentation for Browsing of Videos on Map Interfaces
  5.1 Keyframe Extraction
    5.1.1 Visual Similarity Measurement
    5.1.2 Keyframe Selection
  5.2 Experiments
    5.2.1 Keyframe Extraction Results
    5.2.2 Keyframe Placement Results
  5.3 Prototype
    5.3.1 System Architecture
    5.3.2 Demonstration
  5.4 Summary

6 GTube: Geo-Predictive Video Streaming
  6.1 Introduction
  6.2 System Design
    6.2.1 Geo-Bandwidth Data Collection and Upload
    6.2.2 Geo-Bandwidth Query and Response
    6.2.3 Quality Adaptation
  6.3 Evaluation
    6.3.1 Datasets
    6.3.2 Experimental Setup
    6.3.3 Evaluation Metrics
    6.3.4 Experimental Results
    6.3.5 Discussion
  6.4 Summary

7 Conclusions
  7.1 Summary of Research
  7.2 Limitations
  7.3 Future Work

Bibliography

CHAPTER 7

Conclusions

In this chapter, we summarize the conclusions that we have reached on the transmission, processing and presentation of sensor-rich videos. We also present a few potential areas for extension and possible applications of these research results.

7.1 Summary of Research

This thesis set out to develop comprehensive methods for the transmission and processing of sensor-rich videos. The aim is to properly utilize the sensor metadata to achieve goals such as efficient video transmission, POI detection and effective video presentation. This thesis has proposed several methods to address these issues. Below we summarize the specific contributions and findings of our work.

First, in Chapter 3, we presented the design and prototype implementation of a mobile video acquisition and upload scheme that uses smartphones as mobile video sensors. We implemented an extensive simulator to demonstrate the energy efficiency of our system. The simulation results show that, compared to the Immediate strategy, our onDemand strategy saves between 10% and 40% of the mobile device's energy, prolongs the device lifetime by up to 50%, and reduces the total data transmitted by between 10% and 70%.

Second, in Chapter 4, we presented an approach to detect POIs and their distances from the camera location in a fully automated way. We provided two algorithms for POI identification and also a method to estimate the effective visual distance without examining the actual video content, purely based on associated sensor information.
In addition, we designed a cross-intersection elimination method to remove non-existing phantom POIs. The experimental results show that our approach can successfully detect POIs from the Singapore and Chicago datasets within a limited time period.

Third, Chapter 5 presented a novel and integrated video exploration approach where keyframes are positioned at an expected target location during playback on a map interface. Keyframes and their locations are computed in a fully automated manner. Thus, a number of visual cues are provided to the user to effectively navigate a large set of videos. We have implemented a prototype system to demonstrate the feasibility of our approach.

Finally, we presented GTube (Chapter 6): geo-predictive video streaming over HTTP in a mobile environment. We have developed a smartphone application to gather network information and relate it to GPS locations. The information collected is used to build a bandwidth map. A path prediction method and a geo-based bandwidth estimation method were presented for estimating future network conditions. We also provided two quality adaptation algorithms which make use of the predicted bandwidth obtained in the previous step. Experimental results show that the technique is effective in achieving continuous playback and in providing higher and more stable media quality to the end user.

7.2 Limitations

Our research has shown that, using location and viewing direction information coupled with timestamps, efficient video delivery systems can be developed, more interesting information can be mined from video repositories, and user-generated video presentation can be more natural. However, our research is not perfect. The limitations lie in several aspects:

First, our viewable scene model assumes a 2D camera plane. In most cases, this assumption does not affect the results for POI detection and visual distance estimation, because the third dimension can be omitted.
However, our method is unaware of the altitude and elevation angle of the camera. The estimation process always tries to find POIs on the ground plane, which may cause erroneous R estimation as shown in frame 260 of Fig. 4.15. A future solution for this problem is to construct a field-of-view model in 3D space so that the height of a POI can be considered.

Second, the impact of GPS and compass errors on the accuracy of POI detection deserves analysis. In our current implementation we applied the GPS location filtering method introduced by Hakeem et al. [43]. We simply filtered out GPS values with GPS error values higher than 20 meters. Because some sample points were discarded, a few gaps existed between some consecutive GPS measurements, and we linearly interpolated those values. We agree that our method does not always work (just as content-based methods do not always work). In our practical experience, sensor accuracy is definitely an aspect that requires attention. Overall, we found that our method is quite robust. Because our method is based on the FOVs constructed by continuous sensor data sampling, the sensor data error can be balanced out to a certain degree, and should not significantly affect the accuracy of POI detection and visual distance estimation.

Third, for bandwidth prediction in Chapter 6, as this is a newly proposed method, the original test geo-bandwidth dataset for the evaluation of the bandwidth prediction method was collected by the author alone. As the data collection task is laborious, we did not acquire a large enough dataset. In particular, we do not have adequate measurements at the same location at different times of the day. Therefore, in our current system, we assume that the bandwidth at a given location is constant. However, this is definitely not true in real situations. This problem can be solved by taking the geo-bandwidth data sampling time into consideration.
The input to the bandwidth prediction algorithm would then be not only the location but also the time instant. After finding the k-nearest locations of the current location, the algorithm can select, at each of these locations, the bandwidth value collected at the closest time of day, and perform the prediction.

7.3 Future Work

There are many open issues in the area of transmission, processing and presentation of sensor-rich video. In our future work we plan to extend our approach in the following aspects:

• For POI detection, our two proposed methods each have their own benefits. When targeting large-scale applications, we may consider a hybrid strategy that combines the two methods to achieve better overall performance. Currently, our visual distance R estimation algorithm only works when there are one or more POIs within the field-of-view. For frames with ambiguous content, a user feedback mechanism may help improve the R estimation results. Given the estimated distance R, we may use it to adjust the center vector length of the stored field-of-view slices and hence obtain a continuous stream of precise viewable scene descriptions corresponding to the video frames. We plan to utilize such data to facilitate many types of video applications such as video search and presentation. Furthermore, current work on spatio-temporal index structures cannot fully take advantage of dynamically changing field-of-view shapes. Therefore, a better index structure is needed for fast access to this type of data.

• For bandwidth prediction, we currently use only spatial distance as the weighting factor. Later we will take the temporal factor into consideration, expecting to get more accurate bandwidth results. We will also investigate the frequency of geo-bandwidth data uploads and the frequency of bandwidth map updates, as they may affect the interaction between server and clients and further affect the users’ perceived experience.
Due to the power-consuming nature of mobile streaming, it is necessary to provide energy-efficient streaming solutions for mobile devices. We plan to take the battery lifetime into consideration for the geo-bandwidth data collection, upload and quality adaptation modules. Nevertheless, we expect our approach of leveraging location information to facilitate efficient mobile video delivery to be useful for a wide range of novel applications.

• Our approach can be enhanced further by incorporating existing or emerging content-based tools. As content-based methods improve and more semantic information becomes available for describing video content, video management applications could do much more for users than they do now by leveraging content-based cues together with automatically collected sensor meta-data. It would also be interesting to compare our sensor-rich approach with the content-based approach.

• Using WiFi and GSM localization technologies, along with GPS, would be an alternative solution to avoid unnecessary energy consumption. Although in most cases GPS offers more accurate location information than WiFi and GSM localization, the superiority of GPS may decrease noticeably when the vehicle is moving in urban areas. Sometimes GPS has significant outliers due to tall buildings or tree cover, while WiFi localization can perform very well because there are many urban WiFi access points. Therefore it would be interesting to develop an online algorithm that dynamically selects the best location sensor to sample, considering the available energy and the current uncertainty of the trajectory.

• Currently, our application for sensor-rich video collection only works outdoors because a GPS receiver requires an unobstructed view of the sky. Indoor localization techniques for mobile phones [14, 13] need to be integrated into our current implementation in order to provide indoor information collection functionality.
• In addition to GPS and compass, other sensor devices can be embedded in the cameras to collect additional meta-data which can be used to enhance the search functionality. For example, a compact and portable distance sensor can be attached to cameras to estimate the distance to large objects in front of the camera. In our current viewable scene model we assume that no objects in geo-space block the camera view.

• To enable video search on a larger scale, a standard format for georeferenced video annotations must be established, and issues in enabling automated integration with other providers’ data have to be investigated. A standard file format for storing point of interest (POI) data is also needed, so that the POI data collected by different vendors and devices can be exchanged.

• Finally, encouraged by the conclusions in this thesis, it is worthwhile to work on a real-world deployment of the proposed sensor-rich video streaming system, providing an efficient and effective streaming service to end users.

Bibliography

[1] Open Source Media Framework (OSMF). http://www.osmf.org.
[2] Adobe. HTTP Dynamic Streaming on the Adobe Flash Platform. 2010.
[3] Y. Agarwal, R. Chandra, A. Wolman, P. Bahl, K. Chin, and R. Gupta. Wireless Wakeups Revisited: Energy Management for VOIP over Wi-Fi Smartphones. In ACM MobiSys, volume 7, 2007.
[4] S. Akhshabi, A. Begen, and C. Dovrolis. An Experimental Evaluation of Rate-adaptation Algorithms in Adaptive Streaming over HTTP. In 2nd annual ACM conference on Multimedia systems, pages 157–168, 2011.
[5] I. Akyildiz, T. Melodia, and K. Chowdhury. A Survey on Wireless Multimedia Sensor Networks. Computer Networks, 51, 2007.
[6] I. Akyildiz, T. Melodia, and K. Chowdhury. Wireless Multimedia Sensor Networks: A Survey. IEEE Wireless Communications, 14(6):32–39, 2007.
[7] Apple Inc. HTTP Live Streaming draft-pantos-http-live-streaming-06. Internet-Draft, 2011.
[8] Y. Arase, X. Xie, T. Hara, and S.
Nishio. Mining People’s Trips from Large Scale Geo-tagged Photos. In 18th ACM Intl. Conference on Multimedia, pages 133–142, 2010.
[9] S. Arslan Ay, L. Zhang, S. H. Kim, M. He, and R. Zimmermann. GRVS: A Georeferenced Video Search Engine. In 17th ACM Intl. Conference on Multimedia, Beijing, China, 19-24 October 2009.
[10] S. Arslan Ay, R. Zimmermann, and S. H. Kim. Viewable Scene Modeling for Geospatial Video Search. In 16th ACM Intl. Conference on Multimedia, pages 309–318, 2008.
[11] S. Arslan Ay, R. Zimmermann, and S. H. Kim. Relevance Ranking in Georeferenced Video Search. Multimedia Systems Journal, 16(2), February 2010.
[12] D. Ashbrook and T. Starner. Using GPS to Learn Significant Locations and Predict Movement Across Multiple Users. Personal and Ubiquitous Computing, 7(5):275–286, 2003.
[13] M. Azizyan, I. Constandache, and R. Roy Choudhury. SurroundSense: Mobile Phone Localization via Ambience Fingerprinting. In Proceedings of the 15th annual international conference on Mobile computing and networking, pages 261–272. ACM, 2009.
[14] P. Bolliger. Redpin-Adaptive, Zero-Configuration Indoor Localization through User Collaboration. In Proceedings of the first ACM international workshop on Mobile entity localization and tracking in GPS-less environments, pages 55–60. ACM, 2008.
[15] T. Brinkhoff. A Framework for Generating Network-Based Moving Objects. GeoInformatica, 6(2):153–180, 2002.
[16] J.-C. Chen, W.-T. Chu, J.-H. Kuo, C.-Y. Weng, and J.-L. Wu. Tiling Slideshow. In Proceedings of the 14th annual ACM international conference on Multimedia, pages 25–34. ACM, 2006.
[17] Y. Cheng, Y. Chawathe, A. LaMarca, and J. Krumm. Accuracy Characterization for Metropolitan-scale Wi-Fi Localization. In 3rd Intl. Conference on Mobile Systems, Applications, and Services, page 245. ACM, 2005.
[18] T. Cheung, K. Okamoto, F. Maker III, X. Liu, and V. Akella. Markov Decision Process (MDP) Framework for Optimizing Software on Mobile Phones. In 7th ACM Intl.
Conference on Embedded Software, pages 11–20. ACM, 2009.
[19] P. Chiu, A. Girgensohn, and Q. Liu. Stained-Glass Visualization for Highly Condensed Video Summaries. In Multimedia and Expo, 2004. ICME’04. 2004 IEEE International Conference on, volume 3, pages 2059–2062. IEEE, 2004.
[20] Cisco Systems, Inc. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2012-2017. White Paper, 2012.
[21] K. C. Clarke. Advances in Geographic Information Systems. Computers, environment and urban systems, 10(3):175–184, 1986.
[22] R. Cornea, S. Mohapatra, N. Dutt, A. Nicolau, and N. Venkatasubramanian. Integrated Power Management for Video Streaming to Mobile Handheld Devices. In Proc. of ACM Multimedia, pages 582–591. Citeseer, 2003.
[23] C. Cotsaces, N. Nikolaidis, and I. Pitas. Video Shot Detection and Condensed Representation: A Review. Signal Processing Magazine, IEEE, 23(2):28–37, 2006.
[24] I. D. Curcio, V. K. M. Vadakital, and M. M. Hannuksela. Geo-Predictive Real-Time Media Delivery in Mobile Environment. In 3rd workshop on Mobile video delivery, pages 3–8. ACM, 2010.
[25] L. De Cicco, S. Mascolo, and V. Palmisano. Feedback Control for Adaptive Live Video Streaming. In 2nd annual ACM conference on Multimedia systems, 2011.
[26] N. Diakopoulos and I. Essa. Mediating Photo Collage Authoring. In Proceedings of the 18th annual ACM symposium on User interface software and technology, pages 183–186. ACM, 2005.
[27] P. Duygulu, K. Barnard, J. De Freitas, and D. Forsyth. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In Computer Vision – ECCV 2002, volume 2353 of Lecture Notes in Computer Science, pages 349–354. Springer, 2006.
[28] B. Epshtein, E. Ofek, Y. Wexler, and P. Zhang. Hierarchical Photo Organization Using Geo-Relevance. In 15th ACM Intl. Symposium on Advances in Geographic Information Systems (GIS), pages 1–7, 2007.
[29] M. Ester, H. Kriegel, J. Sander, and X. Xu.
A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In 2nd Intl. Conference on Knowledge Discovery and Data Mining, volume 1996, pages 226–231. Portland: AAAI Press, 1996.
[30] S. Fang and R. Zimmermann. EnAcq: Energy-Efficient GPS Trajectory Data Acquisition based on Improved Map Matching. In 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 221–230. ACM, 2011.
[31] W. Feng, E. Kaiser, W. Feng, and M. Baillif. Panoptes: Scalable Low-Power Video Sensor Networking Technologies. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 1(2):151–167, 2005.
[32] F. Fitzek and M. Katz. Cooperation in Wireless Networks: Principles and Applications. Springer, 2006.
[33] Flickr. http://www.flickr.com.
[34] S. Gautam, G. Sarkis, E. Tjandranegara, E. Zelkowitz, Y.-H. Lu, and E. J. Delp. Multimedia for Mobile Environment: Image Enhanced Navigation. volume 6073, page 60730F. SPIE, 2006.
[35] Geobloggers. http://www.geobloggers.com.
[36] Geographic Midpoint Calculator. http://www.geomidpoint.com/calculation.html.
[37] F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory Pattern Mining. In 13th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, pages 330–339. ACM, 2007.
[38] A. Girgensohn, S. Bly, F. Shipman, J. Boreczky, and L. Wilcox. Home Video Editing Made Easy–Balancing Automation and User Control. In Human-Computer Interaction INTERACT, volume 1, pages 464–471, 2001.
[39] Google Goggles. http://www.google.com/mobile/goggles/.
[40] I. Google. Android – An Open Handset Alliance Project. http://developer.android.com.
[41] C. H. Graham, N. R. Bartlett, J. L. Brown, Y. Hsia, C. C. Mueller, and L. A. Riggs. Vision and Visual Perception. John Wiley & Sons, Inc., 1965.
[42] J. Graham, B. Erol, J. J. Hull, and D.-S. Lee. The Video Paper Multimedia Playback System.
In Proceedings of the eleventh ACM international conference on Multimedia, pages 94–95. ACM, 2003.
[43] A. Hakeem, R. Vezzani, M. Shah, and R. Cucchiara. Estimating Geospatial Trajectory of a Moving Camera. In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, volume 2, pages 82–87. IEEE, 2006.
[44] J. Hao, S. H. Kim, S. A. Ay, and R. Zimmermann. Energy-Efficient Mobile Video Management using Smartphones. In Proceedings of the second annual ACM conference on Multimedia systems, pages 11–22. ACM, 2011.
[45] J. Hao, G. Wang, B. Seo, and R. Zimmermann. Keyframe Presentation for Browsing of User-Generated Videos on Map Interfaces. In Proceedings of the 19th ACM international conference on Multimedia, pages 1013–1016. ACM, 2011.
[46] R. Hariharan and K. Toyama. Project Lachesis: Parsing and Modeling Location Histories. Geographic Information Science, pages 106–124, 2004.
[47] P. Havinga and G. Smit. Low Power System Design Techniques for Mobile Computers. CTIT technical reports series, 1997(32), 1997.
[48] T. Hwang, K. Choi, I. Joo, and J. Lee. MPEG-7 Metadata for Video-based GIS Applications. In 2003 IEEE International Geoscience and Remote Sensing Symposium, 2003. IGARSS’03. Proceedings, volume 6, 2003.
[49] Information Sciences Institute, The University of Southern California. The Network Simulator - ns-2, 2006.
[50] R. Ji, X. Xie, H. Yao, and W. Ma. Mining City Landmarks from Blogs by Graph Modeling. In 17th ACM Intl. Conference on Multimedia, pages 105–114. ACM, 2009.
[51] R. Kadobayashi and K. Tanaka. 3D Viewpoint-Based Photo Search and Information Browsing. In 28th ACM SIGIR, pages 621–622, 2005.
[52] R. Kakerow. Low Power Design Methodologies for Mobile Communication. In 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2002. Proceedings, pages 8–13, 2002.
[53] H.-W. Kang, X.-Q. Chen, Y. Matsushita, and X. Tang. Space-Time Video Montage.
In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 1331–1338. IEEE, 2006.
[54] S. Kang, J. Lee, H. Jang, H. Lee, Y. Lee, S. Park, T. Park, and J. Song. SeeMon: Scalable and Energy-Efficient Context Monitoring Framework for Sensor-Rich Mobile Environments. In 6th Intl. Conference on Mobile Systems, Applications, and Services, pages 267–280. ACM, 2008.
[55] H. A. Karimi and X. Liu. A Predictive Location Model for Location-based Services. In 11th ACM International Symposium on Advances in Geographic Information Systems, pages 126–133, 2003.
[56] L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr Helps Us Make Sense of the World: Context and Content in Community-contributed Media Collections. In 15th ACM Intl. Conference on Multimedia, pages 631–640. ACM, 2007.
[57] L. S. Kennedy and M. Naaman. Generating Diverse and Representative Image Search Results for Landmarks. In 17th Intl. Conference on the World Wide Web (WWW), pages 297–306, New York, NY, USA, 2008. ACM.
[58] J. Korhonen and Y. Wang. Power-Efficient Streaming for Mobile Terminals. In Proceedings of the international workshop on Network and operating systems support for digital audio and video, pages 39–44. ACM, 2005.
[59] P. Kulkarni, D. Ganesan, P. Shenoy, and Q. Lu. SensEye: a Multi-tier Camera Sensor Network. In 13th ACM Intl. Conference on Multimedia, page 238, 2005.
[60] A. Küpper. Location-based Services. Wiley, 2005.
[61] S. Lederer, C. Müller, and C. Timmerer. Dynamic Adaptive Streaming over HTTP Dataset. In 3rd Multimedia Systems Conference, pages 89–94. ACM, 2012.
[62] Q. Li, Y. Zheng, X. Xie, Y. Chen, W. Liu, and W. Ma. Mining User Similarity based on Location History. In 16th ACM SIGSPATIAL Intl. Conference on Advances in Geographic Information Systems, page 34. ACM, 2008.
[63] R. Lienhart. Comparison of Automatic Shot Boundary Detection Algorithms. In Proc. SPIE, volume 3656, pages 290–301, 1999.
[64] C. Liu, I.
Bouazizi, and M. Gabbouj. Rate Adaptation for Adaptive HTTP Streaming. In 2nd annual ACM conference on Multimedia systems, 2011.
[65] X. Liu, M. Corner, and P. Shenoy. SEVA: Sensor-Enhanced Video Annotation. In 13th ACM Intl. Conference on Multimedia, pages 618–627, 2005.
[66] X. Liu, M. Corner, and P. Shenoy. SEVA: Sensor-Enhanced Video Annotation. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 5(3):24, 2009.
[67] P. Longley, M. Goodchild, D. Maguire, and D. Rhind. Geographical Information Systems and Science. John Wiley & Sons Inc, 2005.
[68] X. Lu, C. Wang, J.-M. Yang, Y. Pang, and L. Zhang. Photo2Trip: Generating Travel Routes from Geo-tagged Photos for Trip Planning. In 18th ACM Intl. Conference on Multimedia, pages 143–152, 2010.
[69] Y.-F. Ma, X.-S. Hua, L. Lu, and H.-J. Zhang. A Generic Framework of User Attention Model and Its Application in Video Summarization. Multimedia, IEEE Transactions on, 7(5):907–919, 2005.
[70] Y.-F. Ma and H.-J. Zhang. Video Snapshot: A Bird View of Video Sequence. In Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of the 11th International, pages 94–101. IEEE, 2005.
[71] R. Mayo and P. Ranganathan. Energy Consumption in Mobile Devices: Why Future Systems Need Requirements–Aware Energy Scale-Down. Power-Aware Computer Systems, pages 26–40, 2005.
[72] D. McMullin, R. Trestian, and G. Muntean. Power Save-based Adaptive Multimedia Delivery Mechanism. In 9th IT & T Conference, page 6, 2009.
[73] M. Michael. Energy Awareness for Mobile Devices. In Research Seminar on Energy Awareness. Citeseer.
[74] J.-H. Min, H. Cha, and R. Ha. System-Level Integrated Power Management for Handheld Systems. Microprocessors and Microsystems, 33(3):201–210, 2009.
[75] R. K. Mok, X. Luo, E. W. Chan, and R. K. Chang. QDASH: a QoE-aware DASH System. In 3rd Multimedia Systems Conference, pages 11–22. ACM, 2012.
[76] M. Morzy.
Mining Frequent Trajectories of Moving Objects for Location Prediction. Machine Learning and Data Mining in Pattern Recognition, pages 667–680, 2007. [77] M. Naaman, Y. J. Song, A. Paepcke, and H. Garcia-Molina. Automatic Organization for Digital Photographs with Geographic Coordinates. In 4th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 53–62, 2004. [78] A. Natsev, J. R. Smith, J. Teˇsi´e, L. Xie, and R. Yan. IBM Multimedia Analysis and Retrieval System. In Proceedings of the 2008 international conference on Content-based image and video retrieval, pages 553–554. ACM, 2008. [79] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66, Stanford InfoLab, November 1999. [80] M. Park, J. Hong, and S. Cho. Location-based Recommendation System using Bayesian User’s Preference Model in Mobile Devices. Ubiquitous Intelligence and Computing, pages 1130–1139, 2007. 143 BIBLIOGRAPHY [81] M. Pedram, Q. Wu, and Q. Qiu. Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service. In dac, pages 834–839. ACM, 2001. [82] T. Pering, Y. Agarwal, R. Gupta, and R. Want. Coolspots: Reducing the Power Consumption of Wireless Mobile Devices with Multiple Radio Interfaces. In Proceedings of the Annual ACM/USENIX International Conference on Mobile Systems, Applications and Services (MobiSys), 2006. [83] G. Perrucci, F. Fitzek, G. Sasso, and M. Katz. Energy Saving Strategies for Mobile Devices using Wake-up Signals. In MobiMedia-4th International Mobile Multimedia Communications Conference, Oulu-Finland, 2008. [84] A. Pigeau and M. Gelgon. Building and Tracking Hierarchical Geographical & Temporal Partitions for Image Collection Management on Mobile Devices. In 13th ACM Intl. Conference on Multimedia, 2005. [85] M. Ra, J. Paek, A. Sharma, R. Govindan, M. Krieger, and M. Neely. Energy-delay Tradeoffs in Smartphone Applications. In 8th Intl. 
conference on Mobile systems, applications, and services, pages 255–270. ACM, 2010. [86] A. Rav-Acha, Y. Pritch, and S. Peleg. Making a Long Video Short: Dynamic Video Synopsis. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, pages 435–441. IEEE, 2006. [87] H. Riiser, H. Bergsaker, P. Vigmostad, P. Halvorsen, and C. Griwodz. A Comparison of Quality Scheduling in Commercial Adaptive HTTP Streaming Solutions on a 3G Network. In 4th Workshop on Mobile Video, pages 25–30. ACM, 2012. [88] H. Riiser, T. Endestad, P. Vigmostad, C. Griwodz, and P. Halvorsen. Video Streaming using a Location-based Bandwidth-Lookup Service for Bitrate Planning. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), 8(3):24, 2012. [89] C. Rother, L. Bordeaux, Y. Hamadi, and A. Blake. Autocollage. In ACM Transactions on Graphics (TOG), volume 25, pages 847–852. ACM, 2006. [90] F. Schirrmeister. Design for Low-Power at the Electronic System Level. ChipVision Design Systems, White paper, 2004. [91] B. Seo, J. Hao, and G. Wang. Sensor-rich Video Exploration on a Map Interface. In Proceedings of the 19th ACM international conference on Multimedia, pages 791–792. ACM, 2011. [92] B. Shahraray and D. Gibbon. Automatic Generation of Pictorial Transcripts of Video Programs. In Proceedings of SPIE, volume 2417, page 512, 1995. [93] Z. Shen, S. Arslan Ay, S. H. Kim, and R. Zimmermann. Automatic Tag Generation and Ranking for Sensor-rich Outdoor Videos. In Proceedings of the 19th ACM international conference on Multimedia, pages 93–102. ACM, 2011. 144 BIBLIOGRAPHY [94] E. Shih, P. Bahl, and M. Sinclair. Wake on Wireless: An Event Driven Energy Saving Strategy for Battery Operated Devices. In 8th Intl. Conference on Mobile Computing and Networking, pages 160–171. ACM, 2002. [95] F. Shipman, A. Girgensohn, and L. Wilcox. Hyper-Hitchcock: Towards the Easy Authoring of Interactive Video. 
In Human-Computer Interaction INTERACT, volume 3, pages 33–40, 2003. [96] A. Shye and B. Sholbrock. Into The Wild: Studying Real User Activity Patterns to Guide Power Optimization for Mobile Architectures. Micro, 2009. [97] I. Simon and S. M. Seitz. Scene Segmentation Using the Wisdom of Crowds. In Proc. ECCV, 2008. [98] V. Singh, J. Ott, and I. D. Curcio. Predictive Buffering for Streaming Video in 3G Networks. In World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2012 IEEE International Symposium on a, pages 1–10. IEEE, 2012. [99] J. Sivic, B. Russell, A. Efros, A. Zisserman, and W. Freeman. Discovering Objects and Their Location in Images. In 10th IEEE Intl. Conference on Computer Vision (ICCV), volume 1, pages 370–377, 2005. [100] N. Sklavos and K. Touliou. A System-Level Analysis of Power Consumption & Optimizations in 3G Mobile Devices. New Technologies, Mobility and Security, pages 217–227. [101] M. A. Smith and T. Kanade. Video Skimming and Characterization Through the Combination of Image and Language Understanding. In Content-Based Access of Image and Video Database, 1998. Proceedings., 1998 IEEE International Workshop on, pages 61–70. IEEE, 1998. [102] J. Sorber, N. Banerjee, M. Corner, and S. Rollins. Turducken: Hierarchical Power Management for Mobile Devices. In 3rd Intl. Conference on Mobile Systems, Applications, and Services, page 274. ACM, 2005. [103] M. Stemm and R. Katz. Measuring and Reducing Energy Consumption of Network Interfaces in Hand-held Devices. IEICE Transactions on Communications, 80(8):1125–1131, 1997. [104] T. Stockhammer. Dynamic Adaptive Streaming over HTTP: Standards and Design Principles. In 2nd annual ACM conference on Multimedia systems, pages 133–144. ACM, 2011. [105] Y. Takeuchi and M. Sugimoto. An Outdoor Recommendation System based on User Location History. In 1st Intl. Workshop on Personalized Context Modeling and Management for UbiComp Applications, pages 91–100, 2005. [106] B. Tiwana and L. Zhang. PowerTutor. 
2009. http://powertutor.org. 145 BIBLIOGRAPHY [107] C. Torniai, S. Battle, and S. Cayzer. Sharing, Discovering and Browsing Geotagged Pictures on the Web. In A. Scharl and P. K. Tochtermann, editors, The Geospatial Web: How Geo-Browsers, Social Software and the Web 2.0 are Shaping the Network Society. Springer, 2007. [108] K. Toyama, R. Logan, and A. Roseway. Geographic Location Tags on Digital Images. In 11th ACM Intl. Conference on Multimedia, pages 156–166, 2003. [109] S. Uchihashi, J. Foote, A. Girgensohn, and J. Boreczky. Video Manga: Generating Semantically Meaningful Video Summaries. In Proceedings of the seventh ACM international conference on Multimedia (Part 1), pages 383–392. ACM, 1999. [110] T. UEDA, T. AMAGASA, M. YOSHIKAWA, and S. UEMURA. A System for Retrieval and Digest Creation of Video Data based on Geographic Objects. Lecture notes in computer science, pages 768–778, 2002. [111] H. Van Antwerpen, N. Dutt, R. Gupta, S. Mohapatra, C. Pereira, N. Venkatasubramanian, and R. von Vignau. Energy-Aware System Design for Wireless Multimedia. In Proceedings of the conference on Design, automation and test in Europe-Volume 2, page 21124. IEEE Computer Society, 2004. [112] O. Vargas. Minimum Power Consumption in MobilePhone Memory Subsystems. Available: http://pd. pennnet. com/display article/244484/21/ARTCL/none/WIREL/Minimumpowerconsumption-in-mobile-phone-memory-subsystems. [113] V. Varsa and I. Curcio. Transparent End-to-End Packet Switched Streaming Service (PSS): Protocols and Codecs. Technical report, 3GPP TR 26.937 V1, 2003. [114] M. Viredaz, L. Brakmo, and W. Hamburgen. Energy Management on Handheld Devices. Queue, 1(7):52, 2003. [115] Y. Wang, J. Lin, M. Annavaram, Q. Jacobson, J. Hong, B. Krishnamachari, and N. Sadeh. A Framework of Energy Efficient Mobile Sensing for Automatic User State Recognition. In 7th Intl. Conference on Mobile Systems, Applications, and Services, pages 179–192, 2009. [116] W. Wolf. Key Frame Selection by Motion Analysis. 
In icassp, pages 1228–1231, 1996. [117] Woophy. http://www.woophy.com. [118] S. Xiang, L. Cai, and J. Pan. Adaptive Scalable Video Streaming in Wireless Networks. In 3rd Multimedia Systems Conference, pages 167–172. ACM, 2012. [119] J. Yao, S. S. Kanhere, and M. Hassan. Improving QoS in High-Speed Mobility using Bandwidth Maps. Mobile Computing, IEEE Transactions on, 11(4):603– 617, 2012. 146 BIBLIOGRAPHY [120] G. Yavas, D. Katsaros, O. Ulusoy, and Y. Manolopoulos. A Data Mining Approach for Location Prediction in Mobile Environments. Data & Knowledge Engineering, 54(2):121–146, 2005. [121] J. Yuan, H. Wang, L. Xiao, W. Zheng, J. Li, F. Lin, and B. Zhang. A Formal Study of Shot Boundary Detection. Circuits and Systems for Video Technology, IEEE Transactions on, 17(2):168–186, 2007. [122] A. Zambelli. IIS Smooth Streaming Technical Overview. Microsoft Corporation, 2009. [123] H. Zhang, J. Wu, D. Zhong, and S. Smoliar. An Integrated System for Contentbased Video Retrieval and Browsing. Pattern recognition, 30(4):643–658, 1997. [124] Y. Zheng, L. Zhang, X. Xie, and W. Ma. Mining Correlation between Locations using Human Location History. In 17th ACM SIGSPATIAL Intl. Conference on Advances in Geographic Information Systems, pages 472–475, 2009. [125] Y. Zheng, L. Zhang, X. Xie, and W. Ma. Mining Interesting Locations and Travel Sequences from GPS Trajectories. In 18th Intl. Conference on World Wide Web (WWW), pages 791–800, 2009. [126] Y. Zheng, M. Zhao, Y. Song, H. Adam, U. Buddemeier, A. Bissacco, F. Brucher, T. Chua, and H. Neven. Tour the World: Building a Web-scale Landmark Recognition Engine. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1085–1092. IEEE, 2009. [127] Y. Zhuang, Y. Rui, T. Huang, and S. Mehrotra. Adaptive Key Frame Extraction using Unsupervised Clustering. In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on, volume 1, pages 866–870, 1998. 147 [...]... 
... number of geo-tagged photos and videos have been accumulating continuously on the web, posing a challenging problem for mining this type of media data. Existing solutions attempt to examine the signal content of the videos and recognize objects and events. This is typically time-consuming and computationally expensive, and the results can be uneven in their quality. Therefore these methods face challenges ...
... of sensor-generated geospatial contextual data. The aggregation of multi-sourced geospatial data into a standalone meta-data tag allows video content to be identified by a number of precise, objective geospatial characteristics. These so-called sensor-rich videos can conveniently be captured with smartphones. In this thesis we investigate the transmission and processing of sensor-rich videos in a mobile environment. ...
... the viewing direction becomes very important. GPS data only identifies object locations, and therefore it is imperative to investigate the natural concepts of a viewing direction and a view point. For example, the location of the most salient object in the video is often not at the position of the camera, but may in fact be quite a distance away. Consider the example of a user videotaping the pyramids of ...
... on the energy consumption of the various CPUs, memories, interconnecting buses, the display and the RF part of the multi-core platform. Viredaz et al. [114] and Sklavos and Touliou [100] surveyed many energy-saving techniques for handheld devices in terms of improving the design and cooperation of system hardware, software as well as multiple sensing sources. In Table 2.1, one can see the participation of the ...
Figure 1.2: The framework of sensor-rich video transmission and processing. (Figure labels: Device and Software; d) Geo-Predictive Video Streaming; Video Server; Mobile Client.)
... content from the large binary-based video content. This small amount of meta-data is then transmitted to a server in real time, while the video content remains on the recording device, creating an extensive, resource-efficient catalogue of video ...
CHAPTER 1 INTRODUCTION
... detecting landmark places from photos. Compared to prior studies, ours differs in the following aspects:
• Accurate POI detection. We identify the location of interesting places that appear in users' videos, rather than the location where the user was standing, holding the camera.
• Automaticity. The proposed technique is fully automatic. It also does not require any training set ...
... build the bandwidth map. For estimating the future network condition, a path prediction method and a geo-based bandwidth estimation method are presented that utilize the bandwidth map. Finally, we provide two quality adaptation algorithms which make use of the predicted bandwidth obtained in the previous step. The proposed scheme enables the mobile client to intelligently use the location-specific bandwidth information ...
... continuous playback, thus guaranteeing the user-perceived quality of experience.
1.3 Organization
The remainder of this thesis describes our approach in detail. We will start with a survey of the related work and techniques in Chapter 2. Chapter 3 presents the design of a system for energy-efficient sensor-rich video acquisition and upload. Chapter 4 introduces the POI detection and ...
... interesting places (Point of Interest, POI) in user-generated sensor-rich videos, (2) how to leverage the viewing direction together with the GPS location to identify the salient objects in a video, and (3) how to efficiently estimate the visual distance to objects in a video frame. We do not restrict the movement of the camera operator (for example to a road network) and hence assume that mobile videos may be ...
However, the acquisition and transmission of large amounts of video data on mobile devices face fundamental challenges such as power and wireless bandwidth constraints. Furthermore, the search and presentation of large video databases remain a very challenging task. Mobile streaming suffers from discontinuous playback, which affects the user-perceived Quality of Service (QoS). To support diverse mobile ...