Glasgow Theses Service
http://theses.gla.ac.uk/
theses@gla.ac.uk

Blair, Calum Grahame (2014) Real-time video scene analysis with heterogeneous processors. EngD thesis.

http://theses.gla.ac.uk/5061/

Copyright and moral rights for this thesis are retained by the author
A copy can be downloaded for personal non-commercial research or study, without prior permission or charge
This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author
The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author
When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given

Real-time Video Scene Analysis with Heterogeneous Processors

Calum Grahame Blair
M.Eng.

A thesis submitted to The Universities of Glasgow, Edinburgh, Heriot-Watt, and Strathclyde for the degree of Doctor of Engineering in System Level Integration

© Calum Grahame Blair
May 2014

Abstract

Field-Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPUs) allow acceleration and real-time processing of computationally intensive computer vision algorithms. The decision to use either architecture in any application is determined by task-specific priorities such as processing latency, power consumption and algorithm accuracy. This choice is normally made at design time on a heuristic or fixed algorithmic basis; here we propose an alternative method for automatic runtime selection.

In this thesis, we describe our PC-based system architecture containing both platforms; this provides greater flexibility and allows dynamic selection of processing platforms to suit changing scene priorities.
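Runtime selection among platforms can be illustrated with a minimal sketch. It assumes each candidate implementation has already been characterised by its per-frame latency, power draw and detection error. The `Implementation` class, the `select_implementation` function, the normalised weighted-sum cost and all numbers are hypothetical. They are not the method or the measurements of this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One hypothetical mapping of a detector onto the available processors."""
    name: str          # hypothetical label, e.g. "fast" or "low_power"
    latency_ms: float  # per-frame processing time (lower is better)
    power_w: float     # whole-system power draw (lower is better)
    miss_rate: float   # detection error rate (lower is better)


def select_implementation(impls, w_latency, w_power, w_accuracy):
    """Return the candidate with the lowest priority-weighted cost.

    Each metric is divided by its maximum over the candidates so the three
    terms share a comparable 0..1 scale; a larger weight means that metric
    matters more under the current scene priorities.
    """
    max_lat = max(i.latency_ms for i in impls)
    max_pow = max(i.power_w for i in impls)
    max_err = max(i.miss_rate for i in impls)

    def cost(i):
        return (w_latency * i.latency_ms / max_lat
                + w_power * i.power_w / max_pow
                + w_accuracy * i.miss_rate / max_err)

    return min(impls, key=cost)


# Hypothetical figures for illustration only; these are not measured results.
CANDIDATES = [
    Implementation("low_power", latency_ms=300.0, power_w=90.0, miss_rate=0.30),
    Implementation("fast", latency_ms=40.0, power_w=190.0, miss_rate=0.33),
    Implementation("balanced", latency_ms=80.0, power_w=120.0, miss_rate=0.36),
]

print(select_implementation(CANDIDATES, 1, 0, 0).name)  # fast
print(select_implementation(CANDIDATES, 0, 1, 0).name)  # low_power
print(select_implementation(CANDIDATES, 1, 1, 0).name)  # balanced
```

Changing the weights at runtime, rather than fixing one mapping at design time, is what allows the choice of platform to follow changing priorities.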
Using the Histograms of Oriented Gradients (HOG) algorithm for pedestrian detection, we comprehensively explore algorithm implementation on FPGA, GPU and a combination of both, and show that the effect of data transfer time on overall processing performance is significant. We also characterise the performance of each implementation and quantify tradeoffs between power, time and accuracy when moving processing between architectures, then specify the optimal architecture to use when prioritising each of these. We apply this new knowledge to a real-time surveillance application representative of anomaly detection problems: detecting parked vehicles in videos. Using motion detection and car and pedestrian HOG detectors implemented across multiple architectures to generate detections, we use trajectory clustering and a Bayesian contextual motion algorithm to generate an overall scene anomaly level. This is in turn used to select the architectures on which the compute-intensive detectors run for the next frame, with higher anomalies selecting faster, higher-power implementations. Comparing dynamic context-driven prioritisation of system performance against a fixed mapping of algorithms to architectures shows that our dynamic mapping method is 10% more accurate at detecting events than the power-optimised version, at the cost of 12 W higher power consumption.

Acknowledgements

I would like to acknowledge the consistent and enthusiastic help and constructive advice given to me by my supervisor, Neil Robertson, throughout the course of this doctorate. I would also like to thank Siân Williams for all her procedural advice, before, during and after the winding-up of the ISLI. I'm also grateful for the work done by Scott Robson during his internship at Thales. Thanks are also due to the funders of this research, EPSRC and Thales Optronics.
Thanks are also due to my friends, especially Chris, Kenny and Johnathan, for dragging me out to the pub whenever this degree started to get too overwhelming. Doubly so for those – including Marek – willing to accompany me as I dragged them up and down various Munros. My thanks also go to Rebecca for her continued understanding, patience and enthusiasm. Above all, I would like to thank my family, Mum, Dad, Mhairi and Catriona, for all the support and encouragement they have given me throughout this period, and particularly for their frequent offers to appear – especially with the dog – in my video datasets.

Contents

Abstract
Acknowledgements
List of Publications
List of Tables
List of Figures
List of Abbreviations
Declaration of Originality
1. Introduction
  1.1. Academic Motivation and Problem Statement
    1.1.1. A Motivating Scenario
    1.1.2. Specifying Surveillance Subtasks
    1.1.3. Wider Applicability
  1.2. Industrial Motivation
  1.3. Aims
  1.4. Knowledge Transfer
    1.4.1. Research Outputs
    1.4.2. Knowledge Transfer within Thales
  1.5. Contributions
  1.6. Thesis Roadmap
2. Related Work
  2.1. Data Processing Architectures
    2.1.1. Processor Taxonomy
    2.1.2. Methods for CPU Acceleration
    2.1.3. Graphics Processing Units
    2.1.4. Field-Programmable Gate Arrays
    2.1.5. FPGA vs. GPU
    2.1.6. Alternative Architectures
  2.2. Parallelisable Detection Algorithms
    2.2.1. Algorithms for Pedestrian Detection
    2.2.2. Classification Methods: Support Vector Machines
    2.2.3. HOG Implementations
  2.3. Surveillance for Anomalous Behaviour
  2.4. Design Space Exploration
  2.5. Conclusion
3. Sensors, Processors and Algorithms
  3.1. Introduction
  3.2. Sensors
    3.2.1. Infrared
    3.2.2. Visual
  3.3. Processing Platforms
    3.3.1. Ter@pix Processor
  3.4. Simulation or Hardware?
    3.4.1. Modelling
  3.5. Algorithms for Scene Segmentation
    3.5.1. Vegetation Segmentation
    3.5.2. Road Segmentation
    3.5.3. Sky Segmentation
  3.6. Automatic Processing Pipeline Generation
  3.7. Conclusions
4. System Architecture
  4.1. Processor Specifications
  4.2. System Architecture
    4.2.1. PCIe
    4.2.2. Interface
    4.2.3. Interface Limitations
  4.3. Conclusion
5. Algorithm-Level Partitioning
  5.1. HOG Algorithm Analysis
    5.1.1. Algorithm Steps
    5.1.2. Partitioning
  5.2. Hardware Implementation
    5.2.1. Cell Histogram Operations
    5.2.2. Window Classification Operations
  5.3. Software and System Implementation Details
  5.4. Classifier Training
  5.5. Results
    5.5.1. Performance Considerations
    5.5.2. Detection Performance
    5.5.3. Performance Tradeoffs
    5.5.4. Analysis, Limitations, and State-of-the-Art
  5.6. Variations
    5.6.1. Kernel SVM Classification
    5.6.2. Pinned Memory
    5.6.3. Version Switching
    5.6.4. Embedded Evaluation
  5.7. Conclusion
6. Task-Level Partitioning for Anomaly Detection
  6.1. Introduction
  6.2. Datasets
    6.2.1. Bank Street Dataset
    6.2.2. i-LIDS Dataset
  6.3. A Problem Description and Related Work
  6.4. High-level Algorithm
  6.5. Algorithm Implementations
    6.5.1. Pedestrian Detection with HOG
    6.5.2. Car Detection with HOG
    6.5.3. Background Subtraction
    6.5.4. Detection Combination
    6.5.5. Detection Matching and Tracking
    6.5.6. Trajectory Clustering
    6.5.7. Contextual Knowledge
    6.5.8. Anomaly Detection
  6.6. Dynamic Mapping
    6.6.1. Priority Recalculation
    6.6.2. Implementation Mapping
  6.7. Evaluation Methodology
  6.8. Results
    6.8.1. Detection Performance on BankSt videos
    6.8.2. Detection Performance on i-LIDS videos
  6.9. Analysis
    6.9.1. Comparison to State-of-the-Art
    6.9.2. System Architecture Improvements
    6.9.3. Algorithm-Specific Improvements
    6.9.4. Task-Level Improvements
  6.10. Conclusion
7. Conclusion
  7.1. Summary
  7.2. Contributions
    7.2.1. Outcomes
  7.3. Future Research Directions and Improvements
A. Mathematical Formulae
  A.1. Vector Norms
  A.2. Kalman Filter
  A.3. Planar Homography
Bibliography

List of Publications

∙ Characterising Pedestrian Detection on a Heterogeneous Platform, C. Blair, N. M. Robertson, and D. Hume, in Workshop on Smart Cameras for Robotic Applications (SCaBot '12), IROS 2012.
∙ Characterising a Heterogeneous System for Person Detection in Video using Histograms of Oriented Gradients: Power vs. Speed vs. Accuracy, C. Blair, N. M. Robertson, and D. Hume, IEEE Journal of Emerging and Selected Topics in Circuits and Systems, vol. 3, no. 2, pp. 236–247, 2013.
∙ Event-Driven Dynamic Platform Selection for Power-Aware Real-Time Anomaly Detection in Video, C. G. Blair and N. M. Robertson, International Conference on Computer Vision Theory and Applications (VISAPP) 2014.

[...]
... pipeline on an FPGA system: histogram generation
2.17. Fast HOG pipeline on an FPGA system: classification
2.18. Analysis and information hierarchies in surveillance video
2.19. Surveillance analysis block diagram
2.20. Traffic trajectory analysis
2.21. Trajectory analysis via subtrees

... capable of pedestrian and vehicle detection, within a surveillance context. Conceptually, we use a vehicle with some onboard processing capability as a target platform, while keeping in mind its power constraints. Our commercial motivations involve ascertaining the best architecture on which to run such a system, and also whether or not a system with multiple heterogeneous processors outperforms, e.g., a single-GPU ...

... processors have become available. Using these, tasks such as face detection [19], which would have been infeasible in real time ten years ago, are now performed in real time within most consumer cameras and mobile phones [20].

[Figure: block diagram of a vision system; recoverable labels: scene, light, optical image, image acquisition, image array, pre-processing, classification & interpretation, actuation, scene constraints] ...

... problem within the field of surveillance. This work aims to answer two questions:
1. "How does the performance of an algorithm when partitioned temporally across a heterogeneous array of processors compare to the performance of the same algorithm in a singly-accelerated system, when considering a real-world image processing problem?"
2. "What is the optimal mapping of a set of algorithms to a heterogeneous ...

... consider whether image segmentation is required. We then focus on our choice of heterogeneous processors and discuss algorithms for exploring the design space.
∙ Chapter 4 is shaped by the previous chapter, and documents the system architecture we will use to perform real-time detection and hence surveillance. We give specifications of the processors used and discuss the interface for data transfer between them ...
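In a heterogeneous pipeline, every hop between processors adds a copy to the per-frame time, so the data transfer interface matters. The following minimal sketch assumes that stages run back-to-back, with no overlap between copies and compute, and that a single effective link bandwidth applies. The function name `pipeline_latency_ms`, the frame size, the bandwidth and the stage times are all hypothetical.

```python
def pipeline_latency_ms(compute_ms, transfer_bytes, bandwidth_bytes_per_s):
    """Per-frame latency of a sequential heterogeneous pipeline.

    compute_ms: processing time of each stage (e.g. one per processor).
    transfer_bytes: size of each copy between host and accelerators.
    Assumes no overlap between copies and compute, so all times simply add.
    """
    copy_ms = sum(1000.0 * b / bandwidth_bytes_per_s for b in transfer_bytes)
    return sum(compute_ms) + copy_ms


# Hypothetical example: a 640x480 8-bit frame is copied to the first
# processor, a frame-sized intermediate result is copied to the second,
# and a small result buffer returns to the host.
frame_bytes = 640 * 480
total = pipeline_latency_ms(
    compute_ms=[8.0, 4.0],
    transfer_bytes=[frame_bytes, frame_bytes, 4096],
    bandwidth_bytes_per_s=200e6,
)
print(round(total, 3))  # 15.092
```

Under these assumed numbers, copies account for about 3.1 ms of the 15.1 ms total, roughly a fifth. The simple sum does not capture a design that overlaps copies with compute, for example by using pinned host memory and asynchronous transfers.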
... the Engineering Doctorate in System Level Integration. The work is in the technical field of characterisation and deployment of heterogeneous architectures for acceleration of image processing algorithms, with a focus on real-time performance. This was carried out in combination with the Visionlab, part of the Institute for Sensors, Signals and Systems at Heriot-Watt University¹, and Thales Optronics ...

... applications [12], meaning that cameras from one vendor can in theory be paired with signal processing equipment from another, and processing equipment can be easily upgraded when required. Thales are thus concerned with the deployment of image-processing algorithms in embedded systems, and are aware that such technology operating with real-time performance has a wide variety of current and future applications, ...

... vehicles equipped with cameras allow us to explore areas of our world and universe which would be extremely hostile to humans. Grand aims such as these cover much of the motivation for research in this field. From an engineering perspective, many tasks within computer vision are difficult problems. The human brain has specialised hardware built for processing information from images, with a design time ...

... advance rapidly; using computers built within the last few years, we can now make reasonable progress towards creating implementations of complex signal processing algorithms which can run in real time. These same advances have allowed devices containing sensors and processors to shrink to where they become handheld or even smaller. Their ubiquity and low cost, along with their size, further expand the ...
... parts of a single algorithm, while Chapter 6 addresses the same topic at task level.

Note that throughout this work we refer to real-time operation. This uses the "soft" definition of real-time computing, where results received after a deadline are less useful. In a "hard" real-time system, failure to generate results by a deadline would be catastrophic. We use the frame rate of 30 frames per second, ...
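If the 30 frames per second rate serves as the per-frame deadline, the budget is 1000/30, or about 33.3 ms per frame. The following minimal sketch checks measured frame times against that soft deadline. The function `soft_deadline_report` and the sample times are hypothetical.

```python
FPS = 30.0
DEADLINE_MS = 1000.0 / FPS  # per-frame budget at 30 fps, about 33.3 ms


def soft_deadline_report(frame_times_ms, deadline_ms=DEADLINE_MS):
    """Summarise per-frame processing times against a soft deadline.

    Late frames are counted rather than treated as failures, matching the
    "soft" real-time definition; a hard real-time system would instead
    treat any late frame as a failure.
    """
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    late = sum(1 for t in frame_times_ms if t > deadline_ms)
    mean_ms = sum(frame_times_ms) / len(frame_times_ms)
    return {
        "late_frames": late,
        "late_fraction": late / len(frame_times_ms),
        "mean_fps": 1000.0 / mean_ms,
    }


# Hypothetical measured frame times in milliseconds.
report = soft_deadline_report([20.0, 30.0, 40.0, 50.0])
print(report["late_frames"], report["late_fraction"])  # 2 0.5
```

Here two of the four frames overrun the budget and the mean throughput falls to about 28.6 fps. Under the soft definition, this is a degradation rather than a failure.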