VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

GRADUATION THESIS
Building A Diagram Recognition Problem with Machine Vision Approach

Council: Computer Science
Advisor: Dr. Nguyen Duc Dung
Reviewer: Dr. Nguyen An Khuong
Student: Tran Hoang Thinh (1752516)

HO CHI MINH CITY, 08/2021

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

GRADUATION THESIS ASSIGNMENT
(Faculty of Computer Science and Engineering, Department of Computer Science)
Note: the student must attach this sheet to the first page of the thesis report.

Student: Tran Hoang Thinh, Student ID: 1752516
Thesis title: Building A Diagram Recognition Problem with Machine Vision Approach
Tasks (required content and initial data):
- Investigate approaches to the diagram recognition problem
- Research machine learning approaches for the problem
- Prepare data for the problem
- Propose and implement the diagram recognition system
- Evaluate the proposed model
Date of assignment: 1/3/2021
Date of completion: 30/6/2021
Advisor: Nguyen Duc Dung

THESIS DEFENSE EVALUATION SHEET (for the advisor)
August 1, 2021
Student: Tran Hoang Thinh, Student ID: 1752516, Major: Computer Science
Thesis: Building A Diagram Recognition Problem with Machine Vision Approach
Advisor: Nguyen Duc Dung
Strengths of the thesis: The team has successfully proposed the diagram recognition system. They built the initial dataset and performed labeling of the data for this task. The team utilized their knowledge of computer vision and machine learning to propose a suitable approach for this problem. The evaluation results are promising.
Shortcomings of the thesis: The dataset they built is still small, and the number of components that this model can recognize is also limited. Even though it obtained high accuracy, the team has not performed experiments under real conditions, i.e., images captured with shadows, low contrast, thin sketches, etc.
Recommendation: Approved for defense ( ) / Needs additions before defense ( ) / Not approved ( )
Overall assessment (excellent, good, average): Excellent (Giỏi). Score: /10
Signed: Nguyen Duc Dung

THESIS DEFENSE EVALUATION SHEET (for the reviewer)
2021
Student: Tran Hoang Thinh, Student ID: 1752516, Major: Computer Science
Thesis: "Building A Diagram Recognition Problem with Machine Vision Approach"
Reviewer: Nguyen An Khuong
Thesis overview: 35 pages, 18 figures, 53 references.
Strengths of the thesis:
- The thesis topic is interesting and well chosen. The author clearly understands the problem to be solved and masters the techniques and background knowledge needed to solve it.
- The author proposes three algorithms: an algorithm for improving Non-Maximum Suppression, and Algorithms 3 and 4 for diagram building.
- The thesis uses the Mask R-CNN model and its variant, Keypoint R-CNN, with some improvements and augmentation, to solve the offline diagram recognition task with rather high accuracy (~90%) and acceptable performance (< 2s for each diagram).
Shortcomings of the thesis:
- The thesis is not well written and is too short.
- The contributions of the author are not presented in a clear manner.
Recommendation: Approved for defense ( ) / Needs additions before defense ( ) / Not approved ( )
Questions the student must answer before the committee:
a. Is there any commercial or prototype app/software that solves this problem or similar ones? If yes, can you give some comments and remarks benchmarking your work against those?
b. Arrow keypoints often seem to coincide with one border of the bounding box, so how should we reduce the overlap between the arrow keypoint detection and the bounding box detection tasks?
Overall assessment: Excellent. Score: 9/10
Signed: Nguyen An Khuong

Declaration

We hereby declare that this is our own research project, conducted under the guidance of Dr. Nguyen. The research content and results are truthful and have never been published before. The data used for the analysis and comments were collected by us from many different sources and are clearly stated in the references. In addition, we also use several reviews and figures of other authors and organizations, all with citations and origins. If any fraud is detected, we take full responsibility for the content of our thesis; Ho Chi Minh City University of Technology is not related to any copyright infringement caused by us in the implementation process.

Best regards,
Tran Hoang Thinh

Acknowledgments

First and foremost, we would like to express our sincere gratitude to our advisor, Dr. Nguyen Duc Dung, for his support of our thesis and for his patience, enthusiasm, experience, and knowledge. He shared his experience and knowledge, which helped us in our research and in producing a good thesis. We also want to thank Dr. Nguyen An Khuong and Dr. Le Thanh Sach for their support in reviewing our thesis proposal and thesis. Finally, we would like to show our appreciation to the Computer Science Faculty and Ho
Chi Minh University of Technology for providing an academic environment for us to become what we are today.

Best regards,
Tran Hoang Thinh

Abstract

Diagrams have been among the most effective illustration tools for demonstrating and sharing ideas and suggestions. Besides text and images, drawing flow charts is the best way to give others a clear view of a plan with the least amount of work. Nowadays, many meetings require a blackboard so everyone can express their thoughts. This raises the problem of saving these drawings for future reference, since taking a picture does not allow the ideas to be re-edited, and they need to be redrawn to be suitable for professional documents. On the other hand, digitizing the chart requires redrawing the entire diagram using a computer or a special device such as a drawing board or a digital pen, which costs a lot and is not the most convenient tool to use. Therefore, it is necessary to find a way to convert the current, traditional hand-drawn diagrams into a digital version, simplifying the sharing process between users. Moreover, a digitized diagram also helps the user modify it and convert it to other forms that satisfy their requirements. This thesis focuses on stating the problem of digitizing diagrams and proposing a solution.

Contents

1 Introduction
  1.1 Overview
  1.2 Outline
  1.3 Objectives
2 Related works
  2.1 Object detection methods
    2.1.1 Introduction
    2.1.2 Traditional detectors
    2.1.3 CNN-based detectors
      2.1.3.1 CNN-based two-stage detection (region proposal based)
      2.1.3.2 CNN-based one-stage detection (regression/classification based)
  2.2 Diagram recognition
3 Background
  3.1 Faster R-CNN
    3.1.1 Backbone network
    3.1.2 Region Proposal Network
    3.1.3 Non-Maximum Suppression
  3.2 Mask R-CNN
    3.2.1 Binary masks
    3.2.2 Feature Pyramid Network
    3.2.3 Region of Interest Align
  3.3 Keypoint R-CNN
4 Proposed method
  4.1 Scope of the thesis
  4.2 Problem statement
    4.2.1 Problem definition
    4.2.2 Approaches
    4.2.3 Preparing input
  4.3 Proposed model
    4.3.1 Model selection
    4.3.2 Proposed model
      4.3.2.1 Feature map generator
      4.3.2.2 Proposal generator
      4.3.2.3 Instance generator
    4.3.3 Loss function and summary
  4.4 Diagram building
    4.4.1 Output
    4.4.2 Arrow refinement
    4.4.3 Symbol-Arrow relationship
    4.4.4 Text-Others relationship
5 Experiments and Results
  5.1 Data augmentation
  5.2 Experiments
    5.2.1 Perform training and inference without keypoints
    5.2.2 Perform training and inference with keypoints
    5.2.3 Building diagram structure from predictions
6 Conclusion
  6.1 Summary
  6.2 Challenges
  6.3 Future works

List of Tables

4.1  Output structures  22
4.2  Used fields  31
4.3  Output structures  32
5.1  Model results  42
5.2  Graph building technique experiment  44

PROPOSED METHOD

d*(A, B) = sqrt( ((x*_B - x*_A) |i*| w_i*)^2 + ((y*_B - y*_A) |j*| w_j*)^2 )
         = sqrt( (u^2 w_i* + v^2 w_j*) / (alpha^2 + beta^2) )          (4.3)

With a sufficient weight, one axis will have considerably higher priority than the other direction. Algorithm 6 describes the usage of the Weighted Euclidean Distance in the project with w_i* = and w_j* = .

Algorithm 6: Weighted Euclidean for the Symbol-Arrow relationship
Input:
  A = [a_1, a_2, ..., a_N] : list of arrow keypoints, each represented as a struct with parameters <xt, yt, xh, yh>
  S = [s_1, s_2, ..., s_N] : list of symbol coordinates, each represented as a struct with parameters <x, y, w, h>
  w_i, w_j : weight of each basis
Output:
  M : set of connections

M = []
for i in A:
    ar = a_i
    alpha = ar.xh - ar.xt
    beta  = ar.yh - ar.yt
    closest_head = INT_MAX
    closest_tail = INT_MAX
    id_head = -1
    id_tail = -1
    for j in S:
        if j == i:
            continue
        sym = s_j
        x = sym.x
        y = sym.y
        dh = weuclidean(alpha, beta, w_i, w_j, ar.xh, ar.yh, x, y)
        if dh < closest_head:
            closest_head = dh
            id_head = j
        dt = weuclidean(-alpha, -beta, w_i, w_j, ar.xt, ar.yt, x, y)  //
coefficient sign is inverted to get the vector from head to tail
        if dt < closest_tail:
            closest_tail = dt
            id_tail = j
    M = M + (i, id_tail, id_head)

4.4.4 Text-Others relationship

In general, there are three types of text: text that stays in a symbol, text on an arrow, and standalone text. While working to find a solution, we noticed the similarity between the Text-Others and the Arrow-Symbol relationships, as both can be solved using the Weighted Euclidean Distance. The following procedure describes the method used to generate the relationship:

1. Select a text object along with its bounding box.
2. If there exists a symbol bounding box that contains the center of the text bounding box, that text belongs to the symbol.
3. Otherwise, calculate the Weighted Euclidean Distance between the center of the text and the arrow vectors. The only difference is that for text, the orthogonal vector is preferred. If the shortest distance is smaller than delta, the text belongs to that arrow. Otherwise, the text stands alone.

In our testing, we select w_i* = 3, w_j* = and delta = 400.

Chapter 5: Experiments and Results

This chapter shows our progress in experimenting with different models and the corresponding results. Section 5.1 describes the methods used to augment the dataset. Section 5.2 presents our main experiments with different models.

5.1 Data augmentation

As mentioned in Section 4.2.3, the input fed to the proposed model comes from two sources: drawings and a COCO annotation file. To improve our precision in the experiments, we decided to implement several augmentation methods. We have tested the following techniques:

- cropping up to 50 percent of the image size
- stretching one dimension at a rate from twofold to tenfold
- rotating clockwise or counter-clockwise up to 10 degrees

For image augmentation, we use imgaug[62] for the task. For the JSON file, we perform the augmentation after loading the file. Taking the example of a bounding box B = <x_B, y_B, w_B, h_B> in an image I = <w_I, h_I>, each augmenting method is
performed in the following way:

Cropping: use the box clipping technique mentioned in Section 4.3.2.2.

Stretching: use the box position relative to the image. More specifically, when the new image size is <w*_I, h*_I>:

x*_B = x_B * w*_I / w_I
y*_B = y_B * h*_I / h_I
w*_B = w_B * w*_I / w_I
h*_B = h_B * h*_I / h_I

For rotation, the box coordinates rotate with respect to the image center. First, the box is changed to rectangle format:

x_1 = x_B - w_B        y_1 = y_B - h_B
x_2 = x_B + w_B        y_2 = y_B + h_B

Next, we record the new coordinates of the rectangle after a rotation by angle theta. With the rectangle format above, the four vertices of the rectangle are (x_1, y_1), (x_1, y_2), (x_2, y_1), (x_2, y_2). After a rotation about the image center (w_I/2, h_I/2), each vertex (x, y) has the new coordinates:

x* = (x - w_I/2) cos(theta) - (y - h_I/2) sin(theta) + w_I/2
y* = (x - w_I/2) sin(theta) + (y - h_I/2) cos(theta) + h_I/2

which gives the rotated vertices (x*_1, y*_1), (x*_2, y*_2), (x*_3, y*_3), (x*_4, y*_4). Finally, the new bounding box is calculated from these vertices:

x*_B = (max_i x*_i + min_i x*_i) / 2
y*_B = (max_i y*_i + min_i y*_i) / 2
w*_B = max_i x*_i - min_i x*_i
h*_B = max_i y*_i - min_i y*_i

In practice, we do not use bounding boxes for the rotation case. Instead, we use masks for the "symbol" and "text" classes and keypoint coordinates for the "arrow" class, similar to Section 4.2.3. While it is possible to implement other augmentations, doing so would disable the ability to recognize some symbols and texts. In the following experiments, each input image receives one random augmentation with probability 0.25.

5.2 Experiments

We perform experiments on the proposed model in two situations: with and without keypoints. All experiments are conducted on an NVIDIA GTX 1650 4GB with a batch size of . After that, we proceed to build the graph from
the predictions.

Figure 5.1: Sample prediction

5.2.1 Training and inference without keypoints

In this experiment, we perform training and inference on the proposed model without keypoints. For arrows, the keypoints are replaced by masks for visual effect. The input of this experiment contains 1000 self-labeled images, trained at a learning rate of 0.005 for 10000 iterations. Figures 5.1 and 5.2 show a sample image with predictions. Most objects are labeled correctly with high precision. The masks of arrows are predicted appropriately even though they are added only for visual effect. Figure 5.3 shows each loss term and their total. While this is an outstanding result, it is not a satisfying one, since it is impossible to reconstruct the diagram from this image: there is no noticeable relationship between symbols and objects.

Figure 5.2: Sample prediction with rotated input
Figure 5.3: Loss over iterations of the proposed model without keypoints

5.2.2 Training and inference with keypoints

In this experiment, we perform training and inference on the model with keypoints. The input of this experiment contains 1500 self-labeled images, trained at a learning rate of 0.0005 for 100000 iterations. Figures 5.1 and 5.2 show a sample image with predictions. Each arrow is represented as a line connecting the blue dot (tail) to the red dot (head). It is notable from the example that when keypoints are added, the performance of the model drops tremendously, from over 0.9 for most labels to the range between 0.5 and 0.8. Figure 5.6 and Table 5.1 further confirm this assumption. Box regression loss, the highest term in Figure 5.3, now barely contributes to the total loss, as more than half of the total loss comes from keypoints. To improve this, we tried to introduce a hyperparameter for the keypoint loss; however, it does not work, as long training balances out the parameter. Another possible solution to this problem is to train for more iterations; however, due to the limitations of the system, the required amount of time to train the model is enormous. We plan to solve this problem in the future.

Model              | Training time (hours) | FPS  | Learning rate | Iterations | mAP
Without keypoints  | 1.2                   | 1.95 | 0.005         | 10000      | 0.74
With keypoints     | 16                    | 1.73 | 0.0005        | 100000     | 0.46

Table 5.1: Model results

Figure 5.4: Sample diagram without text

5.2.3 Building diagram structure from predictions

In this experiment, we confirm the necessity of using the different building methods of Section 4.4: we construct the diagram from the previous predictions using the methods shown in Chapter 4. We perform inference on 1000 inputs without text and divide them equally into four groups, each using a different building technique. For each group, the image used in prediction is converted into JSON format, rebuilt, and compared with the original PNG. The detailed techniques of each group are as follows:

- Group I: no edge refinement, WED with (w_i*, w_j*) = (1, 1)
- Group II: edge refinement, WED with (w_i*, w_j*) = (1, 1)
- Group III: no edge refinement, WED with (w_i*, w_j*) = (1, sqrt(3))
- Group IV: edge refinement, WED with (w_i*, w_j*) = (1, sqrt(3))

Table 5.2 shows the results of running the experiment five times. While the Weighted Euclidean Distance shows its effectiveness, it is not certain that edge refinement improves precision, since the system is not able to fully predict every object and the inputs are distributed randomly. Figure 5.7 shows a drawing with no predictions scoring higher than 60 percent. Figure 5.8 shows a situation that the model will never be able to predict correctly.

Group | Size | Attempt 1 | Attempt 2 | Attempt 3 | Attempt 4 | Attempt 5 | Average
I     | 250  | 165       | 186       | 199       | 175       | 188       | 182.6
II    | 250  | 184       | 173       | 207       | 189       | 175       | 185.6
III   | 250  | 206       | 237       | 239       | 210       | 222       | 222.8
IV    | 250  | 241       | 229       | 235       | 238       | 220       | 232.6

Table 5.2: Graph building technique experiment

We also experiment with placing text inside objects. Figure 5.9 shows the correct output result of a drawing.

Figure 5.5: Sample diagram with text
Figure 5.6: Loss over iterations of the proposed model with keypoints
Figure 5.7: Drawing without predictions above a 60% score
Figure 5.8: Example of an impossible prediction
Figure 5.9: Sample output result

Chapter 6: Conclusion

6.1 Summary

This thesis focuses on formulating the problem of digitizing diagrams using a machine learning approach. We researched several object detection methods for solving this task. Although there is much research on Online Diagram Recognition, its Offline counterpart lags heavily, with a lack of both datasets and studies. We converted online diagram drawings to images and converted them to a universally accepted format. We proposed a two-step object detection model to solve the object detection task, and improved the original to give it the ability to choose an instance format for each category. Moreover, we introduced several techniques to rebuild a relation diagram from object instances and convert it to usable output. With many state-of-the-art models, we hope to create a model that can be applied in an application in the future.

6.2 Challenges

Currently, our model can detect objects within an acceptable range. However, the huge keypoint loss forces us to change the hyperparameter of the keypoint task, and by hindering the ability to predict at high precision, it prevents us from automatically predicting and labeling the dataset. Another problem comes from the impossible situations: our model does not allow two diagonal arrows to stay near each other, and negating this requires a better architecture. Last but not least, the computational cost of the algorithms used in building the diagram has to be improved, as they are time-consuming.

6.3 Future works

There are many improvements that
can be done in the future:

- Solving the tasks in the previous section.
- Labeling more data. If possible, we can develop an application to generate diagrams, as DiDi did. Moreover, we will try to add more symbols for different kinds of diagrams.
- Changing the source code to a format in which it can be deployed on a server. If the backbone network is too large, we can swap it for a smaller one to reduce the computation cost.

References

[1] Raimi Karim. Illustrated: 10 CNN architectures. https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d, 2019. Accessed: 2020.12.06.
[2] Tommy Huang. Non-maximum suppression (NMS). https://chih-sheng-huang821.medium.com/-non-maximum-suppression-nms-aa70c45adffa. Accessed: 2020.12.06.
[3] Juan Tapia Farias. Architecture of the original Mask R-CNN framework. https://www.researchgate.net/figure/Architecture-of-the-original-Mask-R-CNN-framework-The-CNN-represents-the-backb_fig5_334998795, 2019. Accessed: 2020.12.06.
[4] T.-Y. Lin, P. Dollar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie. Feature pyramid networks for object detection. arXiv:1612.03144, 2017.
[5] Jonathan Hui. Understanding feature pyramid networks for object detection (FPN). https://jonathan-hui.medium.com/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c, 2018. Accessed: 2020.12.06.
[6] Farhad Manjoo. Chinook, the unbeatable checkers-playing computer. https://www.salon.com/2007/07/19/checkers/, 2007. Accessed: 2021.06.10.
[7] CHESScom. Kasparov vs Deep Blue, the match that changed history. https://www.chess.com/article/view/deep-blue-kasparov-chess, 2018. Accessed: 2021.06.11.
[8] South Brunswick. Computer beats champion again, this time in Othello. https://apnews.com/article/cfb65936e48e403e5a87ad30c8a063ec, 1997. Accessed: 2021.06.11.
[9] MelOCinneide. How does AlphaZero play chess? https://www.chess.com/article/view/how-does-alphazero-play-chess, 2017. Accessed: 2021.06.11.
[10] Yiqing Xu. AI behind AlphaGo: Machine learning and neural network. Illumin Magazine, https://illumin.usc.edu/ai-behind-alphago-machine-learning-and-neural-network/, 2019. Accessed: 2021.06.11.
[11] Google Lens. https://lens.google. Accessed: 2020.12.06.
[12] Microsoft Math. https://math.microsoft.com. Accessed: 2020.12.06.
[13] Vivino. https://vivino.com. Accessed: 2020.12.06.
[14] Screenshop. https://screenshopit.com. Accessed: 2020.12.06.
[15] University of Nantes. OHFCD dataset. http://tc11.cvc.uab.es/datasets/OHFCD_1. Accessed: 2020.12.06.
[16] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. Computer Vision and Pattern Recognition, 1:511–518, 2001.
[17] P. Viola and M. Jones. Robust real-time face detection. International Journal of Computer Vision, 57:137–154, 2004.
[18] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition, 1:886–893, 2005.
[19] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Cascade object detection with deformable part models. Computer Vision and Pattern Recognition, pages 2241–2248, 2010.
[20] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of exemplar-SVMs for object detection and beyond. International Conference on Computer Vision, pages 89–96, 2011.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, pages 2278–2324, 1998.
[23] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012.
[24] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv:1409.4842, 2014.
[26] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. arXiv:1512.00567, 2015.
[27] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. Computer Vision and Pattern Recognition, 2016.
[28] L. Deng, M. L. Seltzer, D. Yu, A. Acero, A. r. Mohamed, and G. Hinton. Binary coding of speech spectrograms using a deep auto-encoder. INTERSPEECH, 2010.
[29] G. E. Dahl, M. A. Ranzato, A. r. Mohamed, and G. Hinton. Phone recognition with the mean-covariance restricted Boltzmann machine. Neural Information Processing Systems, 2010.
[30] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580, 2012.
[31] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
[32] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. Computer Society Conference on Computer Vision and Pattern Recognition, pages 580–587, 2014.
[33] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Region-based convolutional networks for accurate object detection and segmentation. Transactions on Pattern Analysis and Machine Intelligence, 38:142–158, 2016.
[34] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. arXiv:1406.4729, 2014.
[35] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. Computer Vision and Pattern Recognition, 2006.
[36] F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. European Conference on Computer Vision, 2010.
[37] R. Girshick. Fast R-CNN. International Conference on Computer Vision, 2015.
[38] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv:1506.01497, 2015.
[39] Y. Li, J. Dai, K. He, and J. Sun. R-FCN: Object detection via region-based fully convolutional networks. arXiv:1605.06409, 2016.
[40] K. He, G. Gkioxari, P. Dollar, and R. B. Girshick. Mask R-CNN. International Conference on Computer Vision, 2017.
[41] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv:1506.02640, 2015.
[42] J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. arXiv:1612.08242, 2016.
[43] J. Redmon and A. Farhadi. YOLOv3: An incremental improvement. arXiv:1804.02767, 2018.
[44] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv:2004.10934, 2020.
[45] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single shot multibox detector. European Conference on Computer Vision, 2016.
[46] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar. Focal loss for dense object detection. Pattern Analysis and Machine Intelligence, 2018.
[47] Zhang Shifeng, Wen Longyin, Bian Xiao, Lei Zhen, and Li Stan Z. Single-shot refinement neural network for object detection. In CVPR, 2018.
[48] J.-P. Valois, M. Cote, and M. Cheriet. Online recognition of sketched electrical diagrams. Proceedings of the Sixth International Conference on Document Analysis and Recognition, 2001.
[49] G. Feng, C. Viard-Gaudin, and Z. Sun. On-line hand-drawn electric circuit diagram recognition using 2D dynamic programming. Pattern Recognition, pages 3215–3223, 2009.
[50] T. Y. Ouyang and R. Davis. ChemInk: A natural real-time recognition system for chemical drawings. International Conference on Intelligent User Interfaces, pages 267–276, 2011.
[51] Y. Qi, M. Szummer, and T. P. Minka. Diagram structure recognition by Bayesian conditional random fields. Conference on Computer Vision and Pattern Recognition, pages 191–196, 2005.
[52] A. Lemaitre, H. Mouchere, J. Camillerapp, and B. Couasnon. Interest of syntactic knowledge for on-line flowchart recognition. Graphics Recognition: New Trends and Challenges, pages 89–98, 2013.
[53] C. Wang, H. Mouchere, C. Viard-Gaudin, and L. Jin. Combined segmentation and recognition of online handwritten diagrams with high order Markov random field. International Conference on Frontiers in Handwriting Recognition, pages 252–257, 2016.
[54] C. Wang, H. Mouchere, and C. Viard-Gaudin. Online flowchart understanding by combining max-margin Markov random field with grammatical analysis. International Journal on Document Analysis and Recognition, pages 123–136, 2017.
[55] M. Bresler, D. Prusa, and V. Hlavac. Modeling flowchart structure recognition as a max-sum problem. International Conference on Document Analysis and Recognition, pages 1215–1219, 2013.
[56] M. Bresler, D. Prusa, and V. Hlavac. Online recognition of sketched arrow-connected diagrams. International Journal on Document Analysis and Recognition, pages 253–267, 2016.
[57] M. Bresler, D. Prusa, and V. Hlavac. Recognizing off-line flowcharts by reconstructing strokes and using on-line recognition techniques. International Conference on Frontiers in Handwriting Recognition, pages 48–53, 2016.
[58] A. Bhattacharya, S. Roy, N. Sarkar, and S. Malakar. Circuit component detection in offline hand-drawn electrical and electronic circuit diagrams. Calcutta Conference, pages 151–156, 2020.
[59] Czech Technical University. FC database. https://cmp.felk.cvut.cz/~breslmar/flowcharts/. Accessed: 2020.12.06.
[60] Nakagawa Lab, Tokyo University of Agriculture and Technology. Kondate dataset. http://web.tuat.ac.jp/~nakagawa/database/en/kondate_proc.html. Accessed: 2020.12.06.
[61] P. Gervais, T. Deselaers, E. Aksan, and O. Hilliges. The DiDi dataset: Digital ink diagram data. Computer Vision and Pattern Recognition, 2020.
[62] imgaug. https://github.com/aleju/imgaug. Accessed: 2021.08.04.
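As an illustration of the Symbol-Arrow matching of Section 4.4.3, the Weighted Euclidean assignment can be sketched as follows. This is our own minimal Python rendering of Algorithm 6 under our reading of Eq. (4.3), where the arrow direction (alpha, beta) normalizes the weighted distance; the helper names and data layout are assumptions, not the thesis code.

```python
import math

def weuclidean(alpha, beta, wi, wj, x1, y1, x2, y2):
    # Weighted Euclidean Distance of Eq. (4.3): axis weights wi, wj skew the
    # distance, and the arrow vector (alpha, beta) normalizes its magnitude.
    u, v = x2 - x1, y2 - y1
    return math.sqrt((u * u * wi + v * v * wj) / (alpha * alpha + beta * beta))

def match_arrows(arrows, symbols, wi=1.0, wj=1.0):
    # For each arrow (xt, yt, xh, yh), find the symbol center (x, y) closest
    # to its head and to its tail, as in Algorithm 6.
    connections = []
    for a, (xt, yt, xh, yh) in enumerate(arrows):
        alpha, beta = xh - xt, yh - yt
        id_head = id_tail = -1
        best_head = best_tail = float("inf")
        for s, (x, y) in enumerate(symbols):
            dh = weuclidean(alpha, beta, wi, wj, xh, yh, x, y)
            if dh < best_head:
                best_head, id_head = dh, s
            # coefficient sign is inverted to get the vector from head to tail
            dt = weuclidean(-alpha, -beta, wi, wj, xt, yt, x, y)
            if dt < best_tail:
                best_tail, id_tail = dt, s
        connections.append((a, id_tail, id_head))
    return connections

# A rightward arrow from (0, 0) to (10, 0) between two symbol centers:
# the head should attach to the symbol at (15, 0), the tail to (-5, 0).
links = match_arrows([(0, 0, 10, 0)], [(-5, 0), (15, 0)])
```

With uniform weights this reduces to ordinary nearest-center matching; raising wj (as in Groups III and IV) penalizes vertical offsets, which is the effect Eq. (4.3) is after.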
