
Object Detection Based on Local Features (Phát hiện đối tượng dựa trên các đặc tính cục bộ)


DOCUMENT INFORMATION

Structure

  • 0001_Bia_Lot

  • 0002_ChuKy

  • 0003_NhiemVuLuanVan

  • 0004_Thesis

    • ACKNOWLEDGMENTS

    • THESIS SUMMARY

    • ABSTRACT

    • DECLARATION

    • OVERVIEW OF THE TOPIC

      • The object detection problem

      • Related work

      • Global features and local features

      • Approach to the problem

      • Scientific significance

      • Thesis outline

    • BUILDING THE OBJECT MODEL FROM LOCAL FEATURES

      • The global model

      • The part-based model

      • Using the part-based model for object detection

      • The mixture model

    • TRAINING THE OBJECT MODELS

      • Training the model

      • Initializing the parameters of the model-training problem

    • EXTRACTING OBJECT FEATURES

      • Extracting the HOG features of the object

      • PCA and dimensionality reduction of the feature vector

    • EXPERIMENTAL RESULTS AND CONCLUSIONS

      • Datasets

      • Evaluation criteria

      • Experimental results

      • Conclusions

    • Appendix

    • PAPERS RELATED TO THE CANDIDATE'S TOPIC

      • Histogram of Oriented Gradients based Vehicle Detection

      • Vehicle classification and detection based coarse data for warning traffic jam in VietNam

      • HOG AND GEOMETRICAL MODEL BASED MOVING VEHICLE DETECTION

      • A Robust Geometric Model of Road Extraction Method for Intelligent Traffic System

  • 0005_VCM-2016

  • 0006_NICS2016

  • 0007_ICIEA2017

  • 0008_NICS2017

  • 0009_LyLichTrichNgang

Content

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

ĐINH VĂN TUYẾN

OBJECT DETECTION BASED ON LOCAL FEATURES
(Phát hiện đối tượng dựa trên các đặc tính cục bộ)

Major: Control Engineering and Automation — Major code: 60520216

MASTER'S THESIS — Ho Chi Minh City, January 2018

THIS WORK WAS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY, VNU-HCM

Scientific advisor: Dr. Trịnh Hoàng Hơn
Examiner 1: Dr. Nguyễn Đức Thành
Examiner 2: Dr. Nguyễn Vĩnh Hảo

The master's thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on 15 January 2018. The thesis evaluation committee consisted of:

1. Assoc. Prof. Dr. Huỳnh Thái Hoàng
2. Dr. Phạm Việt Cường
3. Dr. Nguyễn Đức Thành
4. Dr. Nguyễn Vĩnh Hảo
5. Dr. Nguyễn Trọng Tài

Confirmed by the Chair of the evaluation committee and the Dean of the faculty managing the major after the thesis was revised (if any).

MASTER'S THESIS ASSIGNMENT
(Vietnam National University HCMC — Ho Chi Minh City University of Technology; Socialist Republic of Vietnam — Independence, Freedom, Happiness)

Student: Đinh Văn Tuyến — Student ID: 1570373
Date of birth: 18/10/1985 — Place of birth: Quảng Bình
Major: Control Engineering and Automation — Major code: 60520216

I. THESIS TITLE: OBJECT DETECTION BASED ON LOCAL FEATURES

II. TASKS AND CONTENT: The thesis applies the theory of the deformable part model (DPM) to object detection based on local features. The program is built so that the model can detect objects in different environments; even when an object appears with an unusual shape or some of its parts are occluded, the model still predicts the object's position. The thesis proposes an improved algorithm that optimizes the parameters of the model-training process in order to raise accuracy. A complete object-detection program based on local features, running automatically in Matlab, is assembled from smaller component programs so that its practical applicability can be assessed. Models of three object classes, "person", "face", and "car", are built and trained; the trained models are then applied to detect objects on standard benchmark datasets.

III. ASSIGNMENT DATE: 06/02/2017
IV. COMPLETION DATE: 03/12/2017
V. ADVISOR: Dr. Trịnh Hoàng Hơn

Ho Chi Minh City, 03 December 2017

ACKNOWLEDGMENTS

First of all, I would like to express my deep gratitude to my advisor, Dr. Trịnh Hoàng Hơn, who guided me devotedly so that I could complete this thesis. He has been an experienced mentor, a collaborator, a guide, and a friend; this thesis could not have been completed without his trust, his patient instruction, and diligent effort. He taught me not only how to explore different methods comprehensively and analyze results, but also how to absorb new ideas, which is essential not only for doing research but also for communicating its results.

I would like to sincerely thank all the lecturers of the Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, VNU-HCM, who taught me devotedly throughout my studies, and especially the lecturers of the Department of Automatic Control for their careful guidance and rigorous feedback from the proposal stage onward, which oriented my work on this thesis.

On this occasion I also sincerely thank my family and friends, who have always been at my side, encouraging and helping me throughout my studies and the completion of this thesis.

Ho Chi Minh City, 03 December 2017 — Đinh Văn Tuyến

THESIS SUMMARY

In recent years, many industries have invested in online services and e-commerce, using media channels to deliver image- and video-related information to consumers and customers.
With the rapid increase in the number of images, exploiting information from images receives ever more attention. For this, the computer must "understand" the content of an image, and the first step of understanding is detecting where the objects are in a picture and which objects the picture contains. Exploiting information from large volumes of heterogeneous images, and correlating visual information with the goals of an application, is increasingly emphasized; the production of new information creates the need for technologies to analyze these datasets. The development of image-based information-retrieval systems, together with the success of search and analysis tools, depends on the ability to grasp and fully describe the complexity and diversity of the data to be processed and the structural relationships among objects in images. Moreover, accessing the content of visual information raises complex issues, arising mainly from the large volume of data, the rich information content, and the subjectivity of interpretation on the user's side, which call for optimized products that support people in making sound decisions. To search image-based information accurately, the computer must understand the image; the first step is to answer which objects are in the image and where they are located. The "object detection" problem in computer vision is therefore a constant concern.

The difficulty of "object detection" comes from several sources: objects differ in shape and size; illumination, viewing direction, and partial occlusion change their appearance; and image processing and analysis demand considerable computation time and cost. From practical observation, an object is usually composed of several smaller parts: a "face" image is typically composed of eyes, mouth, nose, and chin; a "person" typically has a head, arms, legs, and torso; a car typically has four wheels, lights, mirrors, a body, and windows. Depending on the required accuracy, different levels of detail can be considered — for the "person" class, for instance, details down to the eyes, nose, and mouth. High accuracy in "object recognition" demands good features, and the features of an object consist of many such small components, called local features. On this basis, the thesis presents the method "Object detection based on local features": it describes the steps of modeling an object from several different components, shows how the features of each component are extracted and how components correlate with one another, presents the training method for the detection system, and applies the trained object models to popular benchmark datasets.
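The component-based modeling summarized above follows the deformable part model (DPM) named in the thesis assignment. As a hedged illustration only — the preview does not show the exact notation used in Chapter 2 — the standard DPM score of a placement p_0, ..., p_n of one root filter and n part filters over a HOG feature pyramid H is

score(p_0, ..., p_n) = Σ_{i=0..n} F_i · φ(H, p_i) − Σ_{i=1..n} d_i · (dx_i, dy_i, dx_i², dy_i²) + b

where F_i are the filter weights scored against the HOG features φ(H, p_i) at placement p_i, d_i are deformation parameters penalizing the displacement (dx_i, dy_i) of part i from its anchor relative to the root, and b is a bias term used when comparing mixture components. Detection keeps the placements whose score exceeds a threshold, which is what lets the model still predict an object's position when some parts are occluded or deformed.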
ABSTRACT

In recent years, many industries have been investing in online services and e-commerce, using media channels to provide image and video information to consumers and customers. With the rapid increase in the number of images, image-based information exploitation is increasingly emphasized. To do that, the computer must "understand" the content of the image, and the first step of understanding is to detect which objects are in the picture and where they are. Exploiting information from large numbers of heterogeneous images, and correlating visual information with the goal of the application, is emphasized; the production of new information has created the need for new technologies for analyzing datasets. The development of image-based information-search systems, along with the success of supporting search engines and analysis tools, depends on the ability to grasp and fully describe the complexity and diversity of the data to be processed and the structural relationships between objects in images. Moreover, accessing image information content involves complex issues arising mainly from the large amount of data, the rich content, and the subjective interpretation of users, requiring optimized products that support people in making the right and more reasonable decisions. To search for exact image-based information, the computer must understand the image; the first step is to answer which objects are in the image and where they are located. Therefore, the "object detection" problem in computer vision is a constant concern. Its difficulty is due to objects differing in shape and size, changes in lighting and viewing direction, partial occlusion, and the time and computational cost required by image processing and analysis. Based on actual observation, an object is usually composed of many small components: "face" images are usually composed of eyes, mouth, nose, and chin; a "human" image usually has parts such as head, arms, legs, and body; a car usually has parts such as four wheels, lights, mirrors, a trunk, and glass doors. Depending on the required accuracy, different levels of detail can be taken into account; as in the example above, the human subject can be considered down to parts such as eyes and mouth. The "object recognition" problem requires high accuracy and therefore good features; the feature of an object consists of many such small components, called local features. Based on this, the thesis presents the method "Object detection based on local features": it describes the steps of modeling an object from a variety of components, demonstrates how to extract local features and component correlations, presents the training method for object-detection systems, and applies the trained object models to currently popular datasets.

DECLARATION

I declare that this is my own research, carried out under the scientific guidance of Dr. Trịnh Hoàng Hơn. The research content and results in this thesis are honest and have never been published in any form before. If any fraud is found, I take full responsibility for the content of my thesis; Ho Chi Minh City University of Technology is not involved in any infringement of copyright that I may have caused during its completion.

Ho Chi Minh City, 03 December 2017 — Đinh Văn Tuyến

Vehicle classification and detection based coarse data for warning traffic jam in VietNam (2016 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science)

[Figure: extracted and retrieved rectangular focus regions from Fig. 3(c) and Fig. 3(d).]
[Figure: background model training — for i = 10, 20, 30, 40 consecutive focus-region pairs, the i-th training images of regions 1 and 2 with their corresponding retrieved backgrounds.]
[Figure: foreground extraction.]

The results of the pre-processing phase are illustrated in Fig. 8, where the first row is the detected foreground and the second row the candidates of moving objects. The binary foreground is cleaned in two steps. Remove the pepper: regions of fewer than P pixels are removed, where P must be larger than the smallest object size; in this report P equals 30 pixels. Fill image regions and holes: holes in the binary foreground are filled, a hole being a set of background pixels that cannot be reached by filling in the background from the edge of the image.
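A minimal sketch of this clean-up step, assuming Python with NumPy/SciPy (the papers do not publish code; P = 30 follows the text above):

import numpy as np
from scipy import ndimage

def clean_foreground(fg, min_size=30):
    """Pre-processing of the binary foreground: drop 'pepper' regions
    smaller than P = min_size pixels, then fill the remaining holes."""
    labels, n = ndimage.label(fg)                      # label connected blobs
    sizes = ndimage.sum(fg, labels, range(1, n + 1))   # pixel count per blob
    keep = np.flatnonzero(sizes >= min_size) + 1       # labels to keep
    cleaned = np.isin(labels, keep)
    return ndimage.binary_fill_holes(cleaned).astype(np.uint8)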
B. Extracting features and training thresholds

Morphological features have been used for vehicle-flow analysis in many published works such as [12], [13]; here they are used to detect motorbikes in Vietnam's traffic conditions. From the candidates of moving objects, the binary image is labeled into separate blobs, and feature parameters such as area, minor-axis length, and major-axis length are calculated for each blob. An aspect ratio is formed from the axis lengths:

a = (major axis length) / (minor axis length)    (5)

where a is the aspect ratio of a blob with its corresponding major and minor axes. Three feature parameters — area, major-axis length, and aspect ratio — are then used to classify and detect the moving objects. To obtain thresholds for these parameters, a set of moving objects (single motorbikes, double motorbikes, and cars) is manually extracted as ground truth; here a double motorbike means two adjacent motorbikes whose foregrounds cannot be separated. [Figure 9: initial ground-truth samples — first row single motorbikes, second row double motorbikes, last row cars.] With a large number of training samples, the feature parameters are computed and their statistics collected (see Fig. 10); from those statistics, the trained thresholds are chosen as in Table I.

TABLE I. Trained thresholds for vehicles
Kind of vehicle      Area            Aspect ratio     Major axis length
Single motorbikes    [560, 1900]     [0.137, 0.45]    [40, 80]
Double motorbikes    [1900, 3700]    [0.08, 0.3]      [130, 250]
Cars                 >= 3700         >= 0.3           >= 150
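A minimal sketch of this blob-feature step, assuming Python with OpenCV (the class names and thresholds are taken from Table I; note that Eq. (5) reads major/minor, yet Table I lists ranges below 1, so minor/major is assumed here and flagged in the code):

import cv2
import numpy as np

# Table I thresholds, as extracted from the paper.
THRESHOLDS = {
    "single_motorbike": {"area": (560, 1900),    "aspect": (0.137, 0.45),  "major": (40, 80)},
    "double_motorbike": {"area": (1900, 3700),   "aspect": (0.08, 0.3),    "major": (130, 250)},
    "car":              {"area": (3700, np.inf), "aspect": (0.3, np.inf),  "major": (150, np.inf)},
}

def in_range(v, lo_hi):
    return lo_hi[0] <= v <= lo_hi[1]

def classify_blobs(foreground):
    """Label the binary foreground and classify each blob by Table I."""
    detections = []
    n, labels = cv2.connectedComponents(foreground.astype(np.uint8))
    for lbl in range(1, n):
        pts = cv2.findNonZero((labels == lbl).astype(np.uint8))
        if pts is None or len(pts) < 5:      # fitEllipse needs >= 5 points
            continue
        area = float(len(pts))
        (_, _), axes, _ = cv2.fitEllipse(pts)
        major, minor = max(axes), min(axes)
        # ASSUMPTION: minor/major, since Table I's aspect ranges are < 1
        # even though Eq. (5) is printed as major/minor.
        aspect = minor / major
        for kind, th in THRESHOLDS.items():
            if (in_range(area, th["area"]) and in_range(aspect, th["aspect"])
                    and in_range(major, th["major"])):
                detections.append((kind, pts))
                break
    return detections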
V. EXPERIMENT RESULTS

A. Data set

The proposed method has been tested on two video clips, each consisting of approximately 19,300 frames (about 12 minutes 30 seconds). The clips are decomposed into image sequences, and two consecutive frames are taken from every five, so the capture interval between two consecutive images of a sequence is 0.2 seconds. The first 50 images of each sequence are used for training the geometrical model, the focus regions, and the background models; this training is performed offline and takes about 30 seconds. In the running phase, every two consecutive images of the sequence are treated as a pair and used for detecting and classifying moving objects. The first 100 images of each sequence are used for manually collecting the vehicle types (single motorbikes, double motorbikes, and cars); from the training images, 564 single motorbikes, 120 double motorbikes, and 180 cars were collected as ground truth. [Figure: candidates of moving objects.]

B. Obtained results

The proposed algorithm gives good results: the geometrical model is accurate and robust, and the background model is good enough for computing the candidates of moving objects. Even with no criteria on size and position, the moving objects appear fully and clearly enough for detection and classification. The detection function was challenged with 4,220 single motorbikes, 572 double motorbikes, and 387 cars; the accuracies are 93.7%, 95.45%, and 83.98%, respectively. Several results are shown in Fig. 11. Most incorrect cases for single and double motorbikes come from broken extracted foregrounds where the color of the moving object is similar to the background, as in the bottom-right image of Fig. 11; the errors on cars are mainly caused by triple motorbikes, which are not considered in this paper. [Figure 10: learned statistics of area, aspect ratio, and major-axis length for single motorbikes, double motorbikes, and cars.]

C. Discussions and future works

After detecting vehicles, it is easy to track a vehicle in the flow: a pair of focus regions is captured in two consecutive images, so a vehicle's images appear close together (see the car's images in Fig. 3(c, d)) and are easy to match for tracking. This information can be used for computing the density, velocity, and flow of vehicles, and tracking also helps verify detection and classification results, because before being part of a double or triple motorbike, a motorbike may have appeared as a single one. Vehicle density is very important for warning of traffic jams, and it is readily estimated from the analyzed data:

D_v = ( Σ_{i=1..h} Σ_{j=1..w} δ(i,j) ) / (h × w)    (6)

where h × w is the size of the focus region and δ(i,j) is the unit foreground pixel at (i, j) after pre-processing (see the bottom row of Fig. 8). All of the above is also our future work.
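A direct realization of Eq. (6), assuming the cleaned binary foreground from the pre-processing step (Python/NumPy):

import numpy as np

def vehicle_density(foreground):
    """Eq. (6): fraction of foreground pixels in the h-by-w focus region."""
    h, w = foreground.shape
    return float(foreground.sum()) / (h * w)   # equivalently foreground.mean()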
[Figure 11: detected and classified vehicles — red ellipses are single motorbikes, blue ellipses double motorbikes, green ellipses cars.]

VI. CONCLUSION

By using several well-known algorithms and a careful formulation of the conditions, the proposed method extracts information from image sequences to detect and classify vehicles. From the first 50 images of a sequence, the geometrical and background models are learned; this phase is trained offline and takes about 30 seconds. From the first 100 images, the ground truth of vehicles used for training the thresholds is collected. Based on the obtained thresholds, the vehicles are classified into single motorbikes, double motorbikes, and cars. The results show an accurate and robust geometrical model, a good enough background model, and a high detection rate. The proposed method is ready for computing important information such as the velocity, density, and flow of vehicles, and is highly promising for real applications.

REFERENCES

[1] G. Beauchamp-Baez, E. Rodriguez-Morales, and E. L. Muniz Marrero, "A Fuzzy Logic Based Phase Controller for Traffic Control," Proc. IEEE International Conference on Fuzzy Systems 1997, pp. 1533-1539, 1997.
[2] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, vol. 2, no. 2, 1991.
[3] O. Ibrahim, H. Elgendy, and A. M. ElShafee, "Speed Detection Camera System using Image Processing Techniques on Video Streams," International Journal of Computer and Electrical Engineering, vol. 3, no. 6, pp. 771-778, December 2011.
[4] D. Levinson, "The value of advanced traveller information systems for route choice," Transportation Research Part C: Emerging Technologies, 11-1:75-87, 2003.
[5] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," IJCV, vol. 60, pp. 91-110, Jan. 2004.
[6] H. Matthew, T. C. Au, and S. Peter, "Autonomous Intersection Management: Multi-Intersection Optimization," IROS, San Francisco, September 2011.
[7] C. Ozkurt and F. Camci, "Automatic traffic density estimation and vehicle classification for traffic surveillance systems using neural networks," Mathematical and Computational Applications, vol. 14, no. 3, pp. 187-196, 2009.
[8] C. Pappis and E. Mamdani, "A fuzzy logic controller for a traffic junction," IEEE Trans. Systems, Man, Cybernetics, SMC-7 (10), pp. 707-717, 1997.
[9] M. Papageorgiou, C. Diakaki, V. Dinopoulou, A. Kotsialos, and Y. Wang, "Review of road traffic control strategies," Proceedings of the IEEE, vol. 91, 2003.
[10] G. Russell, P. Shaw, and N. Ferguson, "Accurate Rapid Simulation of Urban Traffic using Discrete Modelling," Technical Report RR96-1, Napier University, January 1996.
[11] P. H. S. Torr and A. Zisserman, "MLESAC: A New Robust Estimator with Application to Estimating Image Geometry," CVIU, vol. 78, pp. 138-156, 2000.
[12] D.-Y. Huang, C.-H. Chen, W.-C. Hu, S.-C. Yi, Y.-F. Lin et al., "Feature-based vehicle flow analysis and measurement for a real-time traffic surveillance system," Journal of Information Hiding and Multimedia Signal Processing, vol. 3, no. 3, pp. 279-294, 2012.
[13] Y. M. Prutha and S. G. Anuradha, "Morphological image processing approach of vehicle detection for real-time traffic analysis," Int. J. Eng. Res. Technol., vol. 3, no. 5, 2014.

HOG AND GEOMETRICAL MODEL BASED MOVING VEHICLE DETECTION (2017 12th IEEE Conference on Industrial Electronics and Applications, ICIEA)

Hoang-Hon Trinh, Manh-Dung Ngo, Van-Tuyen Dinh — Ho Chi Minh City University of Technology, Viet Nam
Emails: thhon@icalabhcmut.edu.vn, nmdung@hcmut.edu.vn, dvtuyen@icalabhcmut.edu.vn

Abstract — This paper presents a new method for detecting moving vehicles based on a geometrical model and Histogram of Oriented Gradients (HOG) features. A geometrical model is built and used to reduce the size of the focus region and the processing time. The model has two components, a road boundary and a background model: the road boundary redefines the limits of the focus region, where the probability of detecting vehicles on the road is highest, while the background model indicates candidate moving vehicles inside the focus region. The background is extracted and updated automatically by the median method; from the background and a new video frame, the foreground is extracted as the candidate of a moving object. A HOG feature pyramid is then extracted from these candidate regions, and based on the scores of the filters at different positions and scales, objects are detected and classified. Finally, the Vietnam traffic dataset is used to verify the effectiveness and accuracy of the proposed method. Experimental results show that moving vehicles such as motorcycles, cars, buses, and trucks are detected and classified with high accuracy; the combination of the geometrical and background models significantly reduces the number of sliding-window evaluations on each frame.
I. INTRODUCTION

An intelligent surveillance system typically has four functional modules: motion detection, object classification, object tracking, and behavior recognition. Moving-object detection is the first step, identifying the objects of interest in the video, and its result affects the accuracy of the entire system. Based on the motion information in the video, methods for detecting moving objects fall into three categories: frame difference, optical-flow estimation, and background subtraction.

The frame-difference method detects moving objects from the difference between two or more consecutive images: Li & He [1] and Lei & Gong [2] detected moving objects from the change across three consecutive frames. Its advantage is simple computation, but it fails when objects move slowly, because the change between frames is small. Optical-flow estimation determines the motion field from image changes over time: assuming an object's appearance changes little in light intensity across two adjacent frames, Fleet & Weiss [3] used the gradient-constraint equation to determine object movement in an image sequence, but this needs a large number of computations and is very sensitive to noise and lighting changes. Background subtraction detects objects by comparing a new frame against the background, so the key issue is determining a precise background; methods for building and updating the background model have been surveyed and compared by Benezeth et al. [4] and Sobral et al. [5]. The two most common are the Gaussian mixture and the approximate median: the approximate-median method requires a buffer large enough to store frames before computing the median but fits many different backgrounds, while the Gaussian mixture needs only a small memory but does not adapt automatically to different backgrounds. Once the background is obtained, the foreground is extracted by subtracting the background from a new frame and thresholding, then normalized by removing noise, filling holes, and connecting adjacent areas. Background subtraction requires relatively little computation but is very sensitive to lighting changes. After the objects are found, classification is performed on the candidate regions; Parekh et al. [6] divide classification methods into four groups: shape-based, motion-based, color-based, and texture-based.

Recently, new methods have been applied to vehicle detection. Li et al. [7] proposed a vehicle-detection model based on an AND-OR graph (AOG) multiscale model, which circumvents the multiscale problem; their experiments show it deals efficiently with multiscale vehicles and adapts to different vehicle shapes and occlusion. Convolutional neural networks (CNN) are a new trend in object detection: Bautista et al. [8] applied a CNN to detect and classify vehicles from low-quality traffic cameras, with results suggesting the approach can be applied to real-time traffic-monitoring applications. These methods are usually applied to traffic datasets collected in developed countries, where vehicles are mainly cars and the distances between vehicles are relatively large; in Vietnam and some other Asian countries, personal vehicles are mainly motorcycles, appearing densely and moving in their own patterns, and it is this that makes vehicle detection difficult.

For detecting and classifying moving objects, this paper proposes the approach of Fig. 1 [placeholder: proposed scheme for moving-object detection]. To reduce processing time, a geometrical model is built and used to extract a focus region; inside it, a background model is built with the median method, and foregrounds are extracted as candidates of moving objects. HOG features of the candidates are extracted at several scales and passed to previously trained SVM classifiers, which indicate the kind of object. Because features are extracted at several scales, more than one bounding box may appear around an object, so redundant boxes must be removed to localize the object exactly. The method is tested on the Vietnam traffic dataset, videos filmed on various roads in Vietnam under different environmental conditions. The remainder of the paper is organized as follows: Section II introduces the construction of the geometrical model, Section III describes how HOG features are extracted, Section IV presents the experiments, and the last section summarizes the key ideas of the paper.
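Before the details, a minimal end-to-end sketch of this pipeline, assuming Python with OpenCV, NumPy, and scikit-image; the equations it implements are developed in Sections II and III below, and the kernel and Canny parameter choices here are illustrative assumptions, not the paper's Matlab implementation:

import cv2
import numpy as np
from collections import deque
from skimage.feature import hog

ALPHA0 = 45.0   # orientation threshold of Eq. (1), in degrees

def surviving_edges(gray):
    """Eqs. (1)-(2): keep Canny edge pixels whose gradient orientation
    alpha = atan2(dy, dx) lies in [-alpha0, alpha0]."""
    bw = cv2.Canny(gray, 50, 150)                    # edge image BW
    dx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)  # gradient components
    dy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)  # (dx, dy)
    alpha = np.degrees(np.arctan2(dy, dx))
    return np.where(np.abs(alpha) <= ALPHA0, bw, 0)  # BW_surv = BW * delta

class MedianBackground:
    """Eq. (3): per-pixel median over the last N focus regions (N = 40)."""
    def __init__(self, n=40):
        self.frames = deque(maxlen=n)   # oldest regions drop out automatically
    def update(self, region):           # one region every 0.8 s in the paper
        self.frames.append(region.astype(np.float32))
    def retrieve(self):
        return np.median(np.stack(self.frames), axis=0).astype(np.uint8)

def foreground(new_region, background):
    """Eq. (4): threshold |I_i - I_j| with Otsu's method; pepper removal
    and hole filling would follow, as sketched for the previous paper."""
    diff = cv2.absdiff(new_region, background)
    _, f = cv2.threshold(diff, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return f

def hog_descriptor(candidate, window=(80, 40)):
    """Table I parameters: 8x8 cells, 2x2 blocks, 9 bins; the motorbike
    window is shown, giving the 1296-dimensional descriptor."""
    patch = cv2.resize(candidate, (window[1], window[0]))  # cv2 wants (w, h)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2")    # Eq. (7)

The resulting descriptors would then be scored by the linear SVM of Section IV-B; the paper trains with VLFeat's vl_svmtrain in Matlab, whose regularizer λ = 1/(C·(numPos+numNeg)) (Eq. (8)) corresponds, after rescaling the objective, to a scikit-learn LinearSVC with C = 10.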
II. GEOMETRICAL MODEL

A. Road boundary detection

Determining the limits of the region of interest has important implications for object detection: it indicates the region where objects are most likely to appear and so avoids processing cost in unnecessary areas. In the Vietnamese traffic setting, the roadside path reserved for pedestrians is normally separated from the lanes reserved for vehicles, and based on the dominant roadside edges, limiting the region of interest can be performed automatically. An input RGB image is converted to a gray image and normalized to intensity values in [0, 1]. Image edges are computed with the Canny edge detector, whose outputs (edges and gradients) are used to compute the orientation. We assume the captured images are always upright, as in Fig. 2(a); the surviving edge pixels are those that pass the orientation condition

δ = 1 if −α0 ≤ α ≤ α0, and 0 otherwise    (1)

where α = atan(dx, dy) is the orientation of the gradient at pixel (x, y), whose components (dx, dy) come from the Canny operator, and α0 is a threshold, set to 45° in this paper. The surviving edge pixels are then

BW_surv(x, y) = BW(x, y) × δ(x, y)    (2)

where BW is the edge image obtained from the Canny operator. The orthogonal-least-squares method of Chen & Cowan (1991) is used to detect straight line segments, as in Fig. 2(e, f); the dominant vanishing point is computed by MSAC (m-estimator sample consensus, Torr & Zisserman [9]), and the boundary of the geometrical model is established from the inlier segments of the previous step, as illustrated in Fig. 2(g).

Given frames of the image sequence with a period of 0.2 seconds, a sub-region is focused on to reduce processing time and improve results. To do so, HOG features are computed for all regions of the road: a candidate region is chosen and slid along the geometrical-model boundary, with its size increasing at each search step; its HOG features are used to detect vehicles, and the result is recorded automatically for the sub-region. This training is repeated over the first 50 images, the match counts from each position are summed, and the candidate with the largest count is selected as the focus region of the geometrical model. For upright images the best candidates are usually at the bottom of the image, because this position gives the best foreground resolution; this logic is confirmed by the results in Fig. 2(h). [Figure 2: (a) original image, (b) gray image, (c) detected edges, (d) surviving edge pixels, (e) line-segment detection, (f) MSAC inliers, (g) detected boundary of the geometrical model, (h) detected focus region.]

B. Background model

Many background-detection methods have been surveyed and compared by Bouwmans et al. [10]; for its simplicity and efficiency, this paper chooses the median method, and to verify its effectiveness the results are compared with the method previously proposed by Trinh et al. [11]. Given a group of N continuous images in the sequence, each image supports one focus region for training the background; the equation used is

M_xy(t0) = median( I_xy(t−N), I_xy(t−N+1), ..., I_xy(t−1) )    (3)

where I_xy(t−N+i) is the pixel value at (x, y) in the (t−N+i)-th frame and M_xy(t0) is the median pixel value at (x, y) at processing time, taken over the previous N frames. To obtain an optimal N, the training set of the previous section (50 image pairs) is used: N is tried up to 50, the retrieved backgrounds are computed, and the best value is chosen for extracting moving objects. As N increases the background quality increases; training showed that N = 40 gives good enough background quality with acceptable processing time. [Figure 3: for frames i = 10, 20, 30, 40 — the focus region, the background retrieved by the method of Trinh et al. [11], and the background retrieved by the median method.] The earlier method still shows some stains when a new object moves through, while with N = 40 the median method gives results similar to the real background.

In the working phase the background model is continuously updated, again by Eq. (3): whenever the two newest images have been captured and analyzed, the two oldest focus regions used for updating the background are discarded. With 0.2 seconds per image pair, the collection window for N = 40 would be 8 seconds, which is too short against background changes caused by, for example, stopped vehicles; to overcome this, the window is increased to 32 seconds, i.e., an image pair is accumulated for the background update every 0.8 seconds. This scheme not only retrieves the background quickly, but also quickly absorbs static foreground (a car stopped for long enough is treated as background) and adapts quickly to changing sunlight.

After acquiring the geometrical model and extracting the background as above, the foreground pixels are indexed by

F(x, y) = 1 if |I_i(x, y) − I_j(x, y)| ≥ T, and 0 otherwise    (4)

where I_i and I_j are the new image and the trained background, respectively, and T is a threshold, selected here by Otsu's (1979) method. The binary image F is treated as the candidates of moving objects; F is illustrated in Fig. 4(a), and Fig. 4(b, c) shows the resulting candidates. As obtained, these results are unsuitable for classifying moving objects, so some processing is applied to increase accuracy. Remove the pepper: regions with fewer than P pixels are removed from F; P must be larger than the smallest object size and is set to 30 pixels here, chosen by experience. Fill image regions and holes: holes in F are filled, a hole being a set of pixels considered background and surrounded by foreground. [Figure 4: (a) foreground image in the focus region, (b, c) candidates of moving objects, (d) foreground after pepper removal and hole filling, (e, f) final candidates of moving objects.]

III. HOG DESCRIPTOR

The histogram of oriented gradients (HOG) algorithm was introduced by Dalal & Triggs [12]. Its main idea is to use well-normalized local histograms of image gradient orientation on a dense grid. A HOG object detector uses a sliding detection window moved across all locations and scales of the image; at each position, a HOG descriptor is computed for the window and conveyed to a pre-trained classifier, which classifies it as object or not. The original paper applied HOG successfully to pedestrian detection, but the algorithm has also been used to detect many other objects: face recognition by Déniz et al. [13], 3D object retrieval by Scherer et al. [14], recognizing human actions from sequences of depth maps by Yang et al. [15], and so on.

This part briefly presents the main steps of the HOG algorithm. Each image is divided into blocks, each block consists of a group of cells, and each cell contains a group of pixels. Gradient orientation and magnitude are calculated per pixel by convolving the image with the 1-D centered kernel [−1 0 1] by rows and its transpose by columns, giving Gx and Gy respectively:

G = sqrt(Gx² + Gy²)    (5)
θ = arctan(Gy / Gx)    (6)

For color images, the gradient with the largest magnitude and its corresponding orientation are selected. The cells are rectangular, and the bins are evenly spaced over 0 to 180 degrees if the gradient is "unsigned", or 0 to 360 degrees if "signed". To reduce aliasing, each pixel's contribution to the cell histogram is weighted by a Gaussian centered in the middle of the block and interpolated bilinearly between neighboring bin centers in both orientation and position. The next step creates the block description, providing some invariance to illumination and contrast; the two main variants are rectangular blocks (R-HOG) and circular blocks (C-HOG). In [12], R-HOG is a grid of ς × ς cells, each cell of η × η pixels with β orientation bins; for pedestrian detection, comparing experimental results, the authors found ς = 2, η = 8, and β = 9 to work best. The final step is block normalization: each block is represented by a feature vector v normalized by the L2 norm,

f = v / sqrt( ||v||² + ε² )    (7)

with ε a small constant. In this work, HOG descriptors are calculated for the candidates of moving objects instead of the entire image; the parameters for each object type are detailed in Table I, and the extraction steps are illustrated in Fig. 5 [placeholder: (a) candidate region, (b) gradient magnitude, (c) gradient orientation, (d) HOG visualization].

TABLE I. Parameters of the HOG descriptor per object class
                           Motorbike    Car        Bus and truck
Window size                80x40        128x64     200x104
Cell size                  8x8          8x8        8x8
Block size                 2x2          2x2        2x2
Number of orientations     9            9          9
Number of HOG blocks       9x4          15x7       24x12
Size of HOG descriptor     1296         3780       10368

IV. EXPERIMENT

A. Data set

The dataset consists of traffic videos filmed on various roads in Vietnam under different infrastructure, lighting, shadow, and weather conditions, divided into a training set and a test set. For the training set, 2,000 images of motorbikes, 1,000 images of cars, and 1,000 images of buses and trucks were cut out; each group contains no objects of the other groups and is called the positive set. Each image in a group is normalized to the window size of the corresponding group in Table I. Image parts that contain no object form the negative set; for each object group, the positive sets of the other groups can also serve as part of its negative set. The proposed method has been tested on ten video clips, each of approximately 21,600 frames (60 minutes); the clips are decomposed into image sequences, and two consecutive frames are taken from every five, so the capture interval between two consecutive images of a sequence is 0.2 seconds.

B. Train an SVM classifier

To train the classifier for each object class, HOG features are extracted from the class's training set with the parameters of Table I. The vl_svmtrain function of the VLFeat library (Vedaldi et al. [16]) is used to train an SVM classifier per class. The solver's regularization parameter is selected from the number of positive and negative images as

λ = 1 / ( C · (numPos + numNeg) )    (8)

where C was set to 10 and ε = 0.001. To test classification accuracy, a separate sample set similar to the training set was used; after feature extraction and classification, the obtained accuracies are 98%, 99%, and 96% for cars, motorcycles, and buses-trucks, respectively.

C. Object detection

The geometrical model is trained on the first image of each sequence, and the background model on the first forty images; both processes run offline, each taking a few seconds, and during the detection phase the background is updated periodically as described in Section II. Each new frame has its foreground extracted as described in Section II, producing the candidate region. The classifiers are scanned across the candidate region of each frame at multiple scales; the scale range is 0.6 to 1.4 with a step of 0.2, and the scale effectively decides the size of the vehicles detected. This typically yields multiple overlapping detections per frame, which must be fused: the non-maximum-suppression method of Dalal [17] is used to remove the excess bounding boxes. [Figure 6: (a) multiple overlapping bounding boxes around an object, (b) fusion of multiple detections.] Several detection results are shown in Fig. 7 [placeholder: red rectangles are motorbikes, yellow rectangles cars, blue rectangles buses and trucks].

D. Obtained results

The proposed algorithm gives good results: the geometrical model is accurate and robust, and background subtraction detects most moving objects. In some cases of sudden light changes a candidate region may contain no object, but the classifier acts as a second supervisor and re-confirms the objects in the candidate region. The detection function was challenged with 45,811 motorbikes, 3,715 cars, and 2,073 buses-trucks; the accuracies are 96.87%, 96.85%, and 89.6%, respectively. The minor errors have several causes: a moving object's color similar to the background, object shapes similar to a vehicle, and tree shadows on the street. The bus-truck class has the lowest recognition rate, partly because of the variety of sizes and shapes in that group; these minor errors do not change the final result, so they are not analyzed deeply here. The experiment was carried out on traffic data collected in Vietnam, on which no other method has yet been evaluated; comparing the proposed method against others on this dataset is left for future work.

V. CONCLUSIONS

In this paper, object detection was performed based on a background model and Histogram of Oriented Gradients (HOG) features. Modeling the background significantly reduces the number of sliding-window evaluations over the region of interest, and the HOG-based SVM classifier re-tests the accuracy of moving-object detection. The proposed method has been implemented on the Vietnam traffic dataset with high accuracy. The experiments mainly detect moving vehicles, but the method can be used to detect many other kinds of objects. It is ready for computing important traffic information and, since object classes are already assigned, for further tasks such as object tracking and behavior recognition toward a complete intelligent surveillance system.
ACKNOWLEDGMENT — This research is funded by Ho Chi Minh City University of Technology - VNU-HCM under grant number TSH-T-201622.

REFERENCES

[1] Q.-L. Li and J.-F. He, "Vehicles detection based on three-frame-difference method and cross-entropy threshold method," Computer Engineering, vol. 37, no. 4, pp. 172-174, 2011.
[2] L. Ding and N. Gong, "Moving objects detection based on improved three frame difference," Dianshi Jishu (Video Engineering), vol. 37, no. 1, pp. 151-153, 2013.
[3] D. Fleet and Y. Weiss, "Optical flow estimation," in Handbook of Mathematical Models in Computer Vision, Springer, 2006, pp. 237-257.
[4] Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger, "Review and evaluation of commonly-implemented background subtraction algorithms," in ICPR 2008, IEEE, 2008, pp. 1-4.
[5] A. Sobral and A. Vacavant, "A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos," Computer Vision and Image Understanding, vol. 122, pp. 4-21, 2014.
[6] H. S. Parekh, D. G. Thakore, and U. K. Jaliya, "A survey on object detection and tracking methods," International Journal of Innovative Research in Computer and Communication Engineering, vol. 2, no. 2, pp. 2970-2979, 2014.
[7] Y. Li, M. J. Er, and D. Shen, "A novel approach for vehicle detection using an and-or-graph-based multiscale model," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2284-2289, 2015.
[8] C. M. Bautista, C. A. Dy, M. I. Mañalac, R. A. Orbe, and M. Cordel, "Convolutional neural network for vehicle detection in low resolution traffic videos," in IEEE Region 10 Symposium (TENSYMP), 2016, pp. 277-281.
[9] P. H. Torr and A. Zisserman, "MLESAC: A new robust estimator with application to estimating image geometry," Computer Vision and Image Understanding, vol. 78, no. 1, pp. 138-156, 2000.
[10] T. Bouwmans, F. Porikli, B. Höferlin, and A. Vacavant, Background Modeling and Foreground Detection for Video Surveillance, CRC Press, 2014.
[11] H.-H. Trinh, H.-P. Bui, and N.-D. Luu, "Velocity and density of vehicles for warning traffic jam in Vietnam," in International Conference on Green and Human Information Technology (ICGHIT), 2015, pp. 300-303.
[12] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR 2005, IEEE, 2005, pp. 886-893.
[13] O. Déniz, G. Bueno, J. Salido, and F. De la Torre, "Face recognition using histograms of oriented gradients," Pattern Recognition Letters, vol. 32, no. 12, pp. 1598-1603, 2011.
[14] M. Scherer, M. Walter, and T. Schreck, "Histograms of oriented gradients for 3D object retrieval," 2010.
[15] X. Yang, C. Zhang, and Y. Tian, "Recognizing actions using depth motion maps-based histograms of oriented gradients," in Proceedings of the 20th ACM International Conference on Multimedia, 2012, pp. 1057-1060.
[16] A. Vedaldi and B. Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," in Proceedings of the 18th ACM International Conference on Multimedia, 2010, pp. 1469-1472.
[17] N. Dalal, "Finding people in images and videos," Ph.D. dissertation, Institut National Polytechnique de Grenoble (INPG), 2006.

A Robust Geometric Model of Road Extraction Method for Intelligent Traffic System (2017 4th NAFOSTED Conference on Information and Computer Science)

Van-Tuyen Dinh — Industry Institute of Science and Technology, Nguyen Tat Thanh University, Ho Chi Minh, Viet Nam, dvtuyen@ntt.edu.vn
Hoang-Hon Trinh — Ho Chi Minh City University of Technology, Ho Chi Minh, Viet Nam, thhon@icalabhcmut.edu.vn
Abstract — In an intelligent transportation system, the geometry of the street is an important factor in vehicle monitoring: it points out the areas of interest, reduces computing costs, increases accuracy in detecting and identifying objects, and facilitates data collection. This paper presents a new, robust method for extracting the geometric model of the road. The method is based on vehicle motion data and recovers the exact shape of the street without depending on edge elements. First, local features of consecutive frames are extracted and matched. Second, by extending lines through the matched keypoints, intersections are indicated; the vanishing point is obtained as the center of these intersections. Third, the vanishing point is combined with the extremes of the motion data to form the tangent boundaries of the geometry. Finally, by charting the area with the greatest keypoint-matching ratio between adjacent frames, the area of interest is reached. The Vietnam traffic dataset is used to verify the effectiveness and accuracy of the proposed method; experimental results show that it recovers the correct geometry of the road, and that using motion data is a robust way of extracting the vanishing point, unaffected by distracting edges.

Index Terms — Intelligent Traffic System, Geometric Model of Road, Lane Detection, Road Detection, Vanishing Point Detection

I. INTRODUCTION

The term Intelligent Transportation Systems or Intelligent Traffic Systems (ITS) refers to a traffic monitoring and control system that gives users better information while making the traffic network safer, more coordinated, and smarter. An ITS typically has major functions such as motion detection, object classification, object tracking, and behavior recognition. Video-based detection and identification systems are an integral part of ITS and have been researched heavily in recent times, e.g., [1], [2], [3]; they have enormous potential because of flexible data-collection methods and new techniques developed in computer vision. Video in a vehicle detection and identification system is a collection of consecutive frames, in which some image areas need not be processed because no vehicles appear there. Determining the limits of the region of interest has important implications for detection, since it indicates the region where objects are more likely to appear; identifying the right region of interest is therefore necessary to reduce the computational cost of the system and, on the other hand, to increase the accuracy of detection and identification.

Some research on intelligent transportation systems determines the region of interest by hand; the boundary determined this way is usually correct, but such limits do not indicate the region with the highest object-recognition rate. Wang and Zhang (2014) [2] and Huang et al. (2012) [4] pointed out the area of interest via the base lines of the image where objects appear the most; in this way the image areas on either side, where no object is present, are still included in the computation, and the method fails when the slope and the viewpoint change. Defining the area of interest automatically usually relies on the boundary edges of the road. Dinh et al. [5] built the geometric model from edges and detection rates: from the two edges of the road the authors extracted the boundary and thereby the vanishing point, ran a scaled sliding window along these two edges, and, from the matching rate of objects in each window, indicated the area of interest automatically; however, this method gives wrong results when the boundary of the street is unclear or the boundaries of other objects are more pronounced. A similar method was introduced by Kong et al. (2010) [6], who used a locally adaptive soft-voting algorithm to determine the road vanishing point and then applied constraints to indicate the region of interest. Some other methods define only the two sides of the road through the vanishing point: Fan and Shin (2016) [7] detect the vanishing point and distinguish the road trail from surrounding noise using the Weber adaptive local filter, while Zhao and colleagues [8] estimate the background noise by extracting significant texture features based on differential excitation and, using the Hough transform with constraints, propose an efficient lane-detection algorithm. These studies identify the region of interest from edge features; when the road edges are not dominant, multiple constraints are needed to limit false detection. In this paper, we propose a new method, outlined in Fig. 1, to determine the geometry of the road.
Defining an area of interest is usually done automatically by relying on the boundary edge of the road Dinh, et al [5] built the geometry model by relying on edge and detection rates From the two edges of the path, the authors extracted the edges, thereby pointing out the infinity A scaled-down slider window is run along these two edges Based on the matching rate of objects in each window, this method indicates the area of interest automatically However, this method gives wrong results when the boundary of the street is unclear or the boundary of other objects is more pronounced Another similar method introduced by Kong, et al (2010) [6], in this study the authors used Locally Adaptive Soft-Voting algorithm to determine the road vanishing point Then use constraints to indicate region of interest Some other methods only define two sides of the path by pointing out the road vanishing point Fan and Shin (2016) [7] detects vanishing point, distinguishes the road trail and surrounding noise by using the Weber adaptive local filter In this way the road trail is distinguished from the background noise Zhao and colleagues [8] estimate the background noise by extracting significant texture features based on differential excitation Using Hough transform and constraints, the authors team proposed an efficient lane detection algorithm These studies identify region of interest by relying on edge features In some cases, when the edges of road are not dominant, it is imperative to use multiple constraints to limit false detection In this paper, we propose a new method as seen in Figure 2017 4th NAFOSTED Conference on Information and Computer Science of Canny function are edge and gradient which are used to calculate the orientation In the experiment, the camera is always placed at an angle about 40 degrees to obtain images are always upright as in Fig.2(a) Corresponding to the nearfar law in computer vision, the edges of the boundary must be straight lines with angles in the range −α0 to α0 , in this paper α0 is 450 So the survived edge pixels are collected when they pass the follow (1) conditions: Image sequence (or movie) SIFT Keypoint Extract - Describe Remove non-moving Keypoints Keypoint matching Remove outliers RANSAC δ= - Expand straight lines and find intersections - Determine the strongest intersection Calculate the center of the intersection (Vanishing Point) Specify left and right margins 1, 0, −α0 ≤ α ≤ α0 Others (1) where, α = atan(dx, dy) , is the orientation of the gradient at pixel (x, y) whose components (dx, dy) are received from Canny operator Then the survived edge pixels are calculated, Statistics rate of the keypoint in the regions surv BW(x,y) = BW(x,y) × δ(x,y) (2) II P REVIOUS W ORK where, BW(x,y) is an edge image obtained from Canny operator The Orthogonal least squares method, Chen & Cowan (1991) [9] is used to detect straight line segment as in Fig.2(e, f ) The dominant vanishing point is calculated by MLESAC (Maximum Likelihood Estimation Sample Consensus, Zisserman & Torr (2000) [10]); the boundary of geometrical model is established from inlier segments of the previous step The result of this step is illustrated in Fig.2(g) Given a frame of sequence images with a period of time equals to 0.2 second, a sub region is focused to reduce time processing and increase effective results To so, HOGfeatures [11] are calculated for all regions of the road A candidate region is chosen, then sliding along the geometrical model boundary; the sizes are increasing for each searching time 
The HOG-features of them are used to detect vehicles, the result is automatically recorded for the subregion The training process is repeated for 50 first images Total number of matches from each position is summarized, and a candidate winning with a largest matched number is selected as focus regions of geometric models In the case of upright images, the best candidates usually looked at bottom images because this position gives the best foreground resolution; this logic is clear by the results obtained in Fig.2(h) The cases extract marginal of road were wrong, as illustrated in Fig.3 is the main reason was the edge of the road is not clear than the edges of other objects Objects with prominent edges are typically lane lines, objects standing close to the roadside such as electric poles, trees, edges of trucks This leads to the region of interest being incorrectly identified or smaller than reality Therefore, another method of extracting the area of interest is needed, without being influenced by the edge elements Previous studies were experimental on sample set of transportation in Vietnam In Vietnam set traffic pattern, normally the roadside path reserved for pedestrians separated from the lane reserved for vehicles Based on the properties of the edge roadside dominant, limiting the region of interest is performed automatically From an input image, the edges of the image are calculated using the Canny edge detector The results III ROBUST G EOMETRIC M ODEL Using edge features to determine street boundaries is only effective when these boundaries are clear Where other edges are more prominent than the boundary defining the area of interest is false In this section, the method of detecting areas of interest based on the motion characteristics of vehicles is presented Road boundary Region of interest Geometrical Model of Road Fig 1: Proposed scheme for Geometric Model Extraction to determine the geometry of the path First, consecutive frames are extracted keypoints Keypoints in adjacent two frames without moving are discarded Next, the keypoints in these two adjacent frames are matched together The straight lines connecting these extended keys will meet at some intersections For each pair of adjacent images an intersection is defined by the point where the number of straight lines passes through most The center of the intersection of the adjacent N frames is the vanishing point From this vanishing point, combined with the left and right margins of the moving objects, the two major edges of the region of interest are indicated Divide the image horizontally along these two boundaries and statistically measure the matching rate of each region, thus pointing out the limits of the region of interest The remainder of the paper is organized as follows, the second part introduces the geometry extraction method of the previously presented pathway and analyzes the case of false detection The third part introduces the proposed method The experiment was introduced in fourth part The final section is intended to summarize the main ideas of the article 978-1-5386-3210-9/17/$31.00 ©2017 IEEE 265 2017 4th NAFOSTED Conference on Information and Computer Science (a) (b) (c) (d) (e) (f) (g) (h) Fig 2: The results of geometrical model and focus region processing, (a) Original image, (b) gray image, (c) Detected edge, (d) Survived edge pixels passing orientation conditions, (e) Line segment detection result, (f) inlier of MLESAC algorithm, (f) detected boundary of geometrical model, (h) detected focus region 
In some cases the extraction of the road margins was wrong, as illustrated in Fig. 3. The main reason is that the edges of the road are less clear than the edges of other objects. Objects with prominent edges are typically lane lines, objects standing close to the roadside such as electric poles and trees, and the edges of trucks. This leads to the region of interest being identified incorrectly, or smaller than it really is. Therefore, another method of extracting the area of interest is needed, one that is not influenced by edge elements.

[Fig. 3: Failure cases in extracting the geometrical model of the road: (a, e, i) original image, (b, f, j) detected edges, (c, g, k) line segment detection result, (d, h, l) inliers of the MLESAC algorithm.]

III. ROBUST GEOMETRIC MODEL

Using edge features to determine street boundaries is only effective when these boundaries are clear; when other edges are more prominent than the boundary, the defined area of interest is false. In this section, a method for detecting the area of interest based on the motion characteristics of vehicles is presented.

Local features in adjacent frames are extracted and described by the SIFT algorithm [12]. The extracted points of interest are invariant to image scale, rotation, noise, and illumination changes, and partially invariant to affine distortion. Fig. 4 illustrates the keypoints detected in a frame: the centers of the red circles are the locations of the keypoints, while the outer blue squares and arrows illustrate the descriptor of the corresponding keypoint.

[Fig. 4: Detection and description of SIFT keypoints.]

Because the background changes only slightly, the keypoints of the background do not move much between two adjacent frames. These points belong to objects such as trees, houses, or other non-moving objects, and they are removed; this reduces the matching cost between two adjacent frames. Fig. 5 illustrates the removal of the non-moving keypoints; the remaining points largely belong to the vehicle objects.

[Fig. 5: Removal of non-moving keypoints in two adjacent frames.]

After removing the non-moving keypoints, the remaining points are largely located on the vehicles. These points are invariant to small changes in illumination, and, depending on the speed of the vehicles, their coordinates change between two adjacent frames. To find the closest descriptor, the first-nearest-neighbor method is used. Since many false matches appear, a threshold is used to eliminate mismatches; this method, introduced by Lowe in [12], is called the second-nearest-neighbor test. Fig. 6(a) illustrates the result of matching keypoints in two adjacent frames. Even with this threshold, there remain many matches pointing in the opposite direction compared with most other matches.

For each pair of matched keypoints, the parameter set $(s, \theta, t_x, t_y)$ represents the geometric transformation that maps a point $(x, y)$ in one frame to the corresponding point $(u, v)$ in the other frame:

$$\begin{bmatrix} u \\ v \end{bmatrix} = s\,R(\theta)\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix} \tag{3}$$

where $[t_x\ t_y]^T$ is the model translation, $s$ is an isotropic scaling, and $\theta$ is the rotation angle. Pairs undergoing a similar geometric change can be identified by using the MLESAC algorithm [10] to find consistent sets of parameters $(s, \theta, t_x, t_y)$. After applying this algorithm, most of the mismatches are eliminated; the result is shown in Fig. 6(b). If two adjacent frames are overlapped and the corresponding keypoints in the two frames are joined, we obtain the result shown in Fig. 7.

[Fig. 6: Keypoint matching: (a) SIFT matching using the second-nearest-neighbor method of Lowe [12], (b) SIFT matching combining the second-nearest-neighbor test and the geometric transformation.]

[Fig. 7: The shift of corresponding keypoints in two adjacent frames.]
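A minimal MATLAB sketch of this matching and verification step, assuming the VLFeat toolbox is installed (the vl_setup path and frame file names are hypothetical); MATLAB's estimateGeometricTransform, which uses MSAC, is shown here as a readily available stand-in for the MLESAC estimator of the paper:

```matlab
% SIFT extraction and matching with VLFeat, then geometric verification.
run('vlfeat/toolbox/vl_setup');                 % hypothetical VLFeat path

Ia = single(rgb2gray(imread('frame_t.png')));   % hypothetical frame names
Ib = single(rgb2gray(imread('frame_t1.png')));

[fa, da] = vl_sift(Ia);                         % 4xN keypoints, 128xN descriptors
[fb, db] = vl_sift(Ib);

matches = vl_ubcmatch(da, db, 1.5);             % Lowe's second-NN ratio test
pa = fa(1:2, matches(1, :))';                   % matched (x, y) in frame t
pb = fb(1:2, matches(2, :))';                   % matched (x, y) in frame t+1

% Discard near-static (background) keypoints before verification
moving = sqrt(sum((pa - pb).^2, 2)) > 2;        % assumed 2-pixel threshold
pa = pa(moving, :);  pb = pb(moving, :);

% Keep only pairs consistent with one similarity transform (s, theta, tx, ty);
% MSAC here replaces the MLESAC estimator used in the paper [10].
[tform, inA, inB] = estimateGeometricTransform(pa, pb, 'similarity');
```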
The lines through the matched keypoint pairs are extended, and the intersection of each pair of lines is calculated. In each pair of adjacent images there is a point through which the largest number of lines passes; this point is illustrated by the red dot in Fig. 8. However, this point still deviates from the vanishing point we want to find. By using multiple pairs of images in succession, each pair provides one candidate point, and the vanishing point is indicated by calculating the center of the candidate points.

[Fig. 8: Extensions of the lines through matched keypoint pairs in two adjacent frames; the marked intersection is the one with the most lines passing through it.]

At the same time, for each pair of frames, after the keypoint positions are calculated, the leftmost and rightmost keypoint positions are stored. Connecting these two points to the vanishing point indicates the edges of the street. This works because, as the number of vehicles increases, the vehicles tend to move toward the edges, in combination with the near-far law in computer vision. The two yellow lines in Fig. 9 illustrate this step.

Each image is divided into 10 equal partitions, illustrated by the blue lines in Fig. 9. In each partition, the ratio of matched keypoints between adjacent frames is calculated statistically. Partitions with a low ratio are removed; normally, these partitions are far away from the camera position. Partitions with matching ratios greater than a threshold are retained. Combining these partitions with the edges extracted in the previous step, the boundaries of the road are obtained.

[Fig. 9: Geometric model of the street. The yellow point is the location of the vanishing point, the two blue dots illustrate the extreme edges of the model, the green quadrilateral limits the region of interest, and the blue horizontal lines partition the image for the statistics of the matching ratio in each region.]
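A minimal sketch of the intersection voting and the per-partition statistics, reusing the matched coordinates pa, pb from the previous sketch; the frame size, accumulator resolution, and ratio threshold are illustrative assumptions:

```matlab
% Vanishing-point voting: intersect every pair of extended motion lines and
% take the densest cell of a pixel accumulator. pa, pb are Nx2 matched points.
H = 480;  W = 640;                       % assumed frame size
acc = zeros(H, W);
n = size(pa, 1);
for i = 1:n-1
    for j = i+1:n
        d1 = pb(i,:) - pa(i,:);          % motion direction of pair i
        d2 = pb(j,:) - pa(j,:);          % motion direction of pair j
        A = [d1', -d2'];                 % solve pa_i + t1*d1 = pa_j + t2*d2
        if abs(det(A)) < 1e-9, continue; end   % near-parallel lines
        t = A \ (pa(j,:) - pa(i,:))';
        p = round(pa(i,:) + t(1) * d1);  % intersection point (x, y)
        if p(1) >= 1 && p(1) <= W && p(2) >= 1 && p(2) <= H
            acc(p(2), p(1)) = acc(p(2), p(1)) + 1;
        end
    end
end
[~, idx] = max(acc(:));
[vpY, vpX] = ind2sub(size(acc), idx);    % candidate vanishing point

% Per-partition matching ratio: 10 horizontal bands, keep high-ratio bands.
bandEdges = linspace(1, H + 1, 11);
counts = histcounts(pb(:, 2), bandEdges);  % matched keypoints per band
ratio = counts / max(n, 1);
kept = find(ratio > 0.05);               % assumed ratio threshold
```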
IV. EXPERIMENT AND DISCUSSION

In this paper, the data set consists of traffic videos filmed on various roads in Vietnam with different infrastructure, lighting, shadows, and weather conditions. The clips are decomposed into image sequences, from which two out of every five frames are taken; that is, the capture time between two consecutive images in the sequence is 0.2 seconds. The computation time for each pair of adjacent frames is about 0.15 seconds. However, the extraction of the geometry takes place offline, before the other monitoring activities of the system, so the algorithm focuses only on accuracy. Detection, description, and matching of keypoints in adjacent frames are performed with a number of functions from the VLFeat 0.9.2 library [13].

The proposed algorithm has been tested on 20 videos collected in different ways. During the experiments, 40 consecutive frames were used to extract the position of the vanishing point. At the same time, in each frame the left and right boundaries of the keypoints were also saved. In some special cases, such as roads without commuter traffic, the vanishing point extraction fails; in such cases, the method described in Section II is used effectively. The statistics of the matching ratio between the partitions, illustrated in Fig. 9, limit the region of interest. Intuitively, it can be seen that high partition ratios usually occur in the regions closest to the camera position; this happens because the remote partitions contain only a small proportion of the objects' keypoints. Some results of the geometric model extraction of the street are illustrated in Fig. 10. Compared with the results of the previous algorithm, illustrated in Fig. 3, the cases that failed because of other edge factors are overcome.

[Fig. 10: Results of the proposed algorithm.]

V. CONCLUSIONS

By using the motion data of vehicles participating in traffic, this paper presented a robust way of identifying the geometry of the street. Unlike previous methods, it does not rely entirely on geometric characteristics of the street, such as the edges of the road or available lane markings, factors that are often confused with the geometry of other objects. The paper also introduced a new method for indicating the vanishing point, a very important element in computer vision. Statistics of the matching ratio of keypoints in fixed areas help limit the area of interest in the traffic monitoring problem. Experimental results suggest that this approach can be applied to geometric modeling problems in intelligent transportation systems.

REFERENCES

[1] Y. Tang, C. Zhang, R. Gu, P. Li, and B. Yang, "Vehicle detection and recognition for intelligent traffic surveillance system," Multimedia Tools and Applications, vol. 76, no. 4, pp. 5817–5832, Feb. 2015.
[2] H. Wang and H. Zhang, "A hybrid method of vehicle detection based on computer vision for intelligent transportation system," International Journal of Multimedia, 2014.
[3] M. Bommes, A. Fazekas, T. Volkenhoff, and M. Oeser, "Video based intelligent transportation systems - state of the art and future development," in Transportation Research Procedia, vol. 14, 2016, pp. 4495–4504.
[4] D.-Y. Huang, C.-H. Chen, W.-C. Hu, S.-C. Yi, Y.-F. Lin, et al., "Feature-based vehicle flow analysis and measurement for a real-time traffic surveillance system," Journal of Information Hiding and Multimedia Signal Processing, vol. 3, no. 3, pp. 279–294, 2012.
[5] V.-T. Dinh, N.-D. Luu, and H.-H. Trinh, "Vehicle classification and detection based coarse data for warning traffic jam in VietNam," in 2016 3rd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS), IEEE, Sep. 2016, pp. 223–228.
[6] H. Kong, J.-Y. Audibert, and J. Ponce, "General road detection from a single image," IEEE Transactions on Image Processing, 2010.
[7] X. Fan and H. Shin, "Road vanishing point detection using Weber adaptive local filter and salient-block-wise weighted soft voting," IET Computer Vision (UK), 2016.
[8] P. Zhao, B. Fang, and W.-B. Yang, "A robust lane detection algorithm based on differential excitation," in International Conference on Machine Learning and Cybernetics, IEEE, Jul. 2016, pp. 1015–1020.
[9] S. Chen, C. Cowan, and P. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309, Mar. 1991.
[10] P. Torr and A. Zisserman, "MLESAC: A new robust estimator with application to estimating image geometry," Computer Vision and Image Understanding, 2000.
[11] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, 2005, pp. 886–893.
[12] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[13] A. Vedaldi and B. Fulkerson, "VLFeat: An open and portable library of computer vision algorithms," http://www.vlfeat.org/, 2008.

CURRICULUM VITAE

Full name: ĐINH VĂN TUYẾN. Gender: Male.
Date of birth: 18/10/1985. Place of birth: Quảng Bình.
Contact address: Đinh Văn Tuyến, 81/2/30 Street 18D, Quarter 10, Bình Hưng Hòa A Ward, Bình Tân District, Ho Chi Minh City.
Email: dinhtuyen01@gmail.com. Phone: 0975.757.842.
Cohort (year of admission): 2015.

EDUCATION
2008 – 2011: Undergraduate study at Quang Binh University. Major: Information Technology.
2015 – 2017: Master's study at Ho Chi Minh City University of Technology. Major: Control Engineering and Automation.

WORK EXPERIENCE
03/2011 – 03/2016: Worked at Mậu Đạt Co., Ltd.
04/2017 – present: Working at Nguyễn Tất Thành University.