VIETNAM NATIONAL UNIVERSITY, HANOI
INTERNATIONAL SCHOOL
STUDENT RESEARCH REPORT
Supervisor: Kim Dinh Thai
TEAM LEADER INFORMATION
- Program: Automation and informatics
- Address: Thach That, Ha Noi
- Phone no /Email: 0333292913/ le2407204@gmail.com
II Academic Results
1st semester, 1st year: 2.75 (Good)
III Other achievements:
Advisor
Hanoi, April 15, 2023
Team Leader
We would like to send our most sincere acknowledgement to Mr. Kim Dinh Thai, Mr. Ha Manh Hung, and Mr. Quang Anh, who guided us on the right track with our research assignment. Thanks to their thorough and careful support, we have completed this scientific research. Mr. Kim Dinh Thai and Mr. Ha Manh Hung have cared for and supported us step by step, from the ideation to the completion phase of the research. They helped us understand the concepts and knowledge in our field, supported us in solving problems, and provided documents and resources to help us develop. They not only inspired us to generate creative ideas for the research but also motivated us when we were trying to overcome the difficulties in the process.
Mr. Quang Anh has extensive experience and knowledge in our field, and provided in-depth advice and guidance in implementing the topic so that we could complete it on time.
Without their support, we would not have been able to complete this research. Once more, we sincerely thank them for their huge contribution and look forward to collaborating with them in future projects.
Student
Le Ba Tung Duong
The study "Surgical Tool Detection in Minimally Invasive Surgery using Deep Neural Networks (YOLOv7) with Python" focuses on developing a method for detecting the surgical tools used in minimally invasive surgery (MIS) with the YOLOv7 deep neural network implemented in Python. The research aims to provide an efficient solution for detecting and identifying surgical tools during MIS procedures, improving the accuracy and safety of the procedure.
MIS procedures use small incisions and robotic arms, making it essential to detect and identify surgical tools accurately. YOLOv7 is a state-of-the-art object detection technology that can identify objects with high accuracy. The proposed method of using YOLOv7 to detect surgical tools will contribute to the development of computer-assisted surgical systems that can identify surgical tools in real time. This research could improve the efficiency of MIS procedures, making them safer and more accessible to patients.
Using image detection with YOLOv7, this research is expected to achieve high accuracy and fast detection times. The approach could be adapted to other surgical fields to provide a faster and more accurate way of identifying surgical tools.
In conclusion, this research has the potential to make MIS procedures safer and more efficient by providing a robust and accurate method for detecting surgical tools. The use of YOLOv7 for image-only detection is a significant step forward in computer-assisted surgery and could have broad applications across different surgical fields.
- Vietnamese:
The study "Detection of surgical tools in minimally invasive surgery using a deep neural network (YOLOv7) with Python" focuses on developing a method for detecting the surgical tools used in minimally invasive surgery (MIS) with the YOLOv7 deep neural network implemented in Python. The research aims to provide an efficient solution for detecting and identifying surgical tools in MIS procedures, improving the accuracy and safety of the procedure.
MIS procedures use small incisions and robotic arms, which makes detecting and identifying surgical tools very important. YOLOv7 implemented in Python is an advanced image detection technology that can identify objects with high accuracy. The proposed method of using YOLOv7 to detect surgical tools will contribute to the development of computer-assisted surgical systems that can identify surgical tools in real time. This research could improve the efficiency of MIS procedures, making them safer and more accessible to patients.
Using image detection with YOLOv7, this research is expected to achieve high accuracy and fast detection times. The method could be applied to other surgical fields to provide a faster and more accurate way of identifying surgical tools.
In summary, this research has the potential to make MIS procedures safer and more efficient by providing an accurate and robust method for detecting surgical tools. The use of YOLOv7 for image detection is an important step forward in computer-assisted surgery and could have broad applications across other surgical fields.
- Vietnamese: Detection of surgical instruments in endoscopic surgery using deep neural networks
2 Student’s Information:
INTRODUCTION
1 Concerning rationale of the study:
Nowadays, laparoscopic surgery is gradually replacing traditional open surgery because it has significant advantages such as less postoperative pain, faster recovery, a shorter hospital stay, smaller scars, and a lower risk of infection. However, it also has some drawbacks. Because the surgeon cannot directly see the patient's abdominal cavity and must observe indirectly through a display screen to perform the operation, laparoscopic surgery is actually much more difficult than traditional open surgery, especially for inexperienced doctors. In Vietnam in particular, the application of computer vision in endoscopic surgery, and the research and application of detecting surgical instruments, are still quite new and have not yet truly developed. Therefore, we propose applying object detection to real-time detection of laparoscopic surgical instruments to assist doctors in performing surgeries.
2 Research questions:
With the current advancements in computer vision, the successful application of technology to identify surgical instruments in laparoscopic surgery would bring significant advances to the country's medical sector. However, in order to do so, we must first address two major questions:
- How can surgical equipment be clearly identified under unstable conditions such as low light, a blurred camera angle, or camera shake?
- How does the performance of the deep neural network model compare to traditional computer vision techniques in surgical tool detection?
3 Research Objectives:
This study was conducted with the following three main objectives:
1 Build and expand a dataset of approximately 20,000 images of surgical instruments and human organs captured from various angles
2 Improve the algorithm's accuracy and proficiency in processing images
3 Develop and evaluate algorithms to assist in the detection of surgical tools
4 Research Methodology:
To detect the presence and location of surgical tools, this research will use YOLO (You Only Look Once), a model that detects and locates objects within an image. The research will employ the Python language, OpenCV, and related libraries to program the deep neural network model. The deep neural network model used in this research will be a real-time attention-guided convolutional neural network (CNN) that performs frame-by-frame detection of surgical tools in MIS videos. The CNN will consist of a coarse detection module (CDM) and a refined detection module (RDM), allowing for high accuracy in identifying the location of surgical tools.
Overall, the use of YOLO, Python, OpenCV, and the real-time attention-guided CNN model will enable accurate and efficient detection of surgical tools in MIS videos, ultimately improving surgical procedures and outcomes
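The project's actual model code is not shown here, but the post-processing stage that YOLO-style detectors apply to raw predictions can be sketched in plain NumPy: keep only confident boxes, then suppress boxes that heavily overlap a stronger detection (non-maximum suppression). All names and thresholds below are illustrative, not the project's actual implementation:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Keep confident boxes, then drop any box overlapping a stronger one."""
    keep_mask = scores >= conf_thresh
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)          # strongest detections first
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return boxes[kept], scores[kept]
```

For example, two strongly overlapping candidate boxes around the same grasper collapse to the single higher-scoring box, and a low-confidence box elsewhere is discarded.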
5 Structure:
The research project is structured into chapters aimed at addressing the issues outlined in the research objectives. The study includes the theoretical foundations for building the surgical tool detector with the YOLO model. From there, the quality of the model's training results can be evaluated, and the best model can be selected for practical application.
- The main body of the thesis consists of the following chapters:
Chapter 1: Neural network model overview
Chapter 2: Object detection problem and YOLO model
Chapter 3: Surgical Tool Detection in Minimally Invasive Surgery based on YOLO model
Chapter 4: Experimental results
Laparoscopic surgery is currently commonly used in Vietnam because its advantages are outstanding, including the ability to treat complex disorders, almost painless treatment, smaller scars, quicker healing, and shorter hospital stays. Because the surgeon cannot view the inside of the patient's abdominal cavity directly and must instead observe through monitors, laparoscopy is, in reality, far more challenging than the standard open surgical procedure. Additionally, the subsequent skill evaluation after training is still conducted manually and is based on the subjective observations and conclusions of a professional. As a result, we propose a system for automatically controlling the position of the camera to detect laparoscopic tools based on the surgeon's moving image.
- A neural network (NN) is only inspired by the brain and its functioning, rather than replicating all of its functions. Our main objective is to use this model to solve the problems we need to address.
1.2 Convolutional Neural Network - CNN
- The Convolutional Neural Network (CNN) is one of the most advanced deep learning models. CNNs allow you to build intelligent systems with extremely high accuracy. A CNN is designed to automatically learn features and characteristics of the input data through the application of filters and convolutional layers.
1.2.1 Components of a CNN model:
● Input layer: receives input data, typically images, audio, or video
● Convolutional layer: uses filters to search for features of the input data
● Non-linear layer: uses a non-linear activation function to help the model learn more complex features
● Pooling layer: reduces the size of the output of the convolutional layer
● Fully connected layer: connects neurons between layers to produce the final output
● Output layer: produces the final result of the model
1.2.2 How the CNN model works.
- The CNN model works by applying filters to the input images to search for features of the image. These filters are trained to identify important features of the image, such as edges, corners, curves, and other characteristics. Then, the identified features are fed into fully connected layers to produce the final prediction result.
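The filtering step described above can be made concrete with a small NumPy sketch: a hand-made vertical-edge kernel is slid over a tiny synthetic image, and the response is strongest where intensity changes from dark to bright. In a real CNN the kernel values are learned during training rather than fixed by hand:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D sliding-window filter (no kernel flip, as in CNN layers)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter: responds where intensity changes left-to-right.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# Synthetic image: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

response = conv2d(img, edge_kernel)
# The response is zero in flat regions and large at the dark/bright boundary.
```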
2 Improving CNN models with algorithms:
2.1 Dropout Algorithm:
- Dropout is a regularization technique that helps reduce overfitting in CNN models by randomly removing some neurons during training. Removing some neurons helps to reduce the impact of individual neurons on the final result.
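The random removal described above can be sketched in a few lines of NumPy. This is the common "inverted dropout" variant, in which surviving activations are rescaled during training so that the expected activation is unchanged at inference time (the function name and interface here are illustrative):

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob during
    training, scaling the survivors so the expected value is unchanged."""
    if not training or drop_prob == 0.0:
        return activations  # at inference time, the layer is a no-op
    mask = rng.random(activations.shape) >= drop_prob
    return activations * mask / (1.0 - drop_prob)
```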
2.2 Batch Normalization Algorithm:
- Batch normalization is a technique for normalizing the input values of each layer in the CNN model, helping to reduce bias and ensure that the input values have the same distribution. This helps the model converge faster and improves accuracy.
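The normalization step can be sketched for a batch of feature vectors: each feature is normalized to zero mean and unit variance over the batch, then rescaled by learnable parameters gamma and beta (fixed to their defaults here for illustration):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x has shape (batch_size, num_features); gamma and beta are the
    learnable scale and shift (scalars here for simplicity)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta
```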
2.3 The Transfer Learning Algorithm:
- Transfer learning is a technique that takes a model pre-trained on a large dataset and reuses the learned weights to help the model learn better on the current dataset. This technique helps to reduce training time and improve the accuracy of the model.
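The idea can be illustrated with a toy NumPy sketch: a weight matrix stands in for a pretrained feature extractor and is frozen, while only a small logistic "head" is trained on the new task. Everything here (shapes, synthetic data, learning rate) is an illustrative assumption, not the project's actual setup:

```python
import numpy as np

rng = np.random.default_rng(42)
W_base = rng.normal(size=(8, 4))      # stands in for pretrained weights: frozen
X = rng.normal(size=(64, 8))          # the new task's (small) dataset
feats = np.tanh(X @ W_base)           # frozen feature extractor output
y = (feats[:, 0] > 0).astype(float)   # toy labels the features can explain

def train_head(feats, y, lr=0.5, steps=300):
    """Train only the head weights; the base network is never updated."""
    w = np.zeros(feats.shape[1])
    losses = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w)))          # logistic head
        losses.append(-np.mean(y * np.log(p + 1e-9)
                               + (1 - y) * np.log(1 - p + 1e-9)))
        w -= lr * feats.T @ (p - y) / len(y)            # gradient step on head only
    return w, losses

w_head, losses = train_head(feats, y)
# The loss falls even though only the small head is trained,
# because the frozen base already produces useful features.
```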
2.4 Learning Rate Scheduling Algorithm:
- Learning Rate Scheduling is a technique for adjusting the learning rate of the model during training. Adjusting the learning rate helps the model learn better and reduces training time.
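One common schedule, step decay, can be written in a few lines: the learning rate is multiplied by a fixed factor every fixed number of epochs. The function name and default values below are illustrative:

```python
def step_decay(initial_lr, epoch, drop_every=10, factor=0.5):
    """Step decay: multiply the learning rate by `factor`
    once every `drop_every` epochs."""
    return initial_lr * factor ** (epoch // drop_every)

# Example: starting at 0.1, the rate halves at epochs 10, 20, 30, ...
```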
=> These algorithms can be used independently or in combination with each other to improve the performance of CNN models.
3 Applications of CNN in our scientific research project
- Image processing: CNN is used for object recognition, image classification, medical image processing, and many other applications
Specifically, in our project, we will use CNN to detect surgical tools. The team will work together to create datasets to be used for training the CNN model to automatically detect cases such as: grasper, scissors, hook, clipper,
Since the CNN is one of the most popular and effective deep neural network models today, the research team will use it in this project.
Chapter 2: Object Detection Problem and YOLO Model
· Types of Problems in Computer Vision
- In computer vision, there are numerous types of problems, depending on the specific task that the user wants to perform. However, below are some common types of problems:
2.1 Object recognition/detection: The problem of recognizing and detecting objects in images.
- Object recognition/detection is the problem of identifying and detecting objects in images or videos. The purpose of this problem is to determine the position, size, shape, and type of the objects in the image.
- In the detection problem, the model is trained to detect objects in images and provide regions of interest (bounding boxes) around the objects. These bounding boxes determine the position and size of the object in the image. Some popular models for object detection are YOLO, Faster R-CNN, and SSD.
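Detectors in the YOLO family conventionally predict each bounding box as normalized center coordinates plus width and height; drawing the box on a frame requires converting to pixel corner coordinates. A sketch of that conversion (the function name is ours, the format is YOLO's convention):

```python
def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Convert a normalized (center-x, center-y, width, height) box,
    with all values in [0, 1], to pixel (x1, y1, x2, y2) corners."""
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2

# A box centered in a 640x480 frame, half the width and a quarter
# of the height of the image, maps to pixel corners (160, 180, 480, 300).
```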
- In the recognition problem, the model is trained to determine the type of the objects that have been detected in the image. Some popular object recognition models are VGGNet, Inception, and ResNet.
- Object recognition/detection is used in many fields such as self-driving cars, security monitoring, object recognition in medical images, and artificial intelligence-related applications.
2.2 Image classification: The problem of classifying images into different categories, such as classifying dogs and cats.
- Image classification is the problem of classifying images into different categories. The purpose of this problem is to identify the category of the image being viewed from some pre-defined categories. For example, an image can be classified as "dog" or "cat," "car" or "airplane," "house" or "street."
- To solve this problem, we typically use deep learning models such as the Convolutional Neural Network (CNN). This model is usually trained on a large dataset of pre-classified images and then used to predict the category of new images. The training process typically involves optimizing the model's parameters to increase the accuracy of its predictions.
- Image classification is widely used in practical applications such as facial recognition, traffic detection, satellite image object identification, financial market analysis, and many other applications.
2.3 Image segmentation: The problem of dividing pixels in an image into different regions to help analyze the image in more detail.
- Image segmentation is the process of dividing an image into various parts, each part representing an object or a part of an object. This allows us to accurately identify and analyze the features of objects in the image. Image segmentation techniques are very important in computer vision and are widely used in many applications such as object recognition, medical analysis, facial recognition, animal analysis, and many other applications.
- There are many methods for performing image segmentation, including:
+ Thresholding: This method compares pixel values with a defined threshold to classify them into a certain group. It is suitable for simple images with clear objects.
+ Region-based segmentation: This method searches for regions in the image with similar characteristics and divides the image into segments based on those characteristics. It is often used in medical and pharmaceutical imaging.
+ Edge-based segmentation: This method uses the edges and boundaries of objects to separate them from other objects. It is suitable for images whose objects have clear boundaries.
+ Clustering-based segmentation: This method uses clustering algorithms such as k-means or mean-shift to divide the image into regions with similar characteristics. It is suitable for images with diverse characteristics and colors.
+ Deep learning-based segmentation: Deep learning methods such as Convolutional Neural Networks (CNNs) are used to learn the features of objects in the image and classify them into different classes. These deep learning models are becoming effective and popular image segmentation methods in many real-world applications.
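The simplest of the methods above, thresholding, can be shown in a few lines of NumPy: every pixel brighter than a chosen threshold is labeled foreground, the rest background (the tiny grayscale array and threshold value are illustrative):

```python
import numpy as np

def threshold_segment(gray, thresh):
    """Label each pixel foreground (1) or background (0) by intensity."""
    return (gray > thresh).astype(np.uint8)

# A tiny grayscale image: a bright region on the right, dark elsewhere.
gray = np.array([[ 10,  20, 200],
                 [ 30, 220, 210],
                 [ 15,  25, 205]])

mask = threshold_segment(gray, 128)
# mask marks the four bright pixels as foreground.
```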
2.4 Pose estimation, also known as pose estimation from images or image-based 3D pose estimation, is the task of determining the position and orientation of an object in 3D space based on images or video
- Pose estimation is the problem of determining the position and orientation of parts of a body or object in 3D space based on images or video. This problem can be applied to many purposes, such as human action recognition, object tracking in video, or determining the position of a robot in space.
- In pose estimation, the parts of the body or object are identified and marked in the image or video. Then, this information is used to build a 3D model of the object. This process often uses machine learning models such as Convolutional Neural Networks (CNNs) to analyze and extract features from images.
- Pose estimation is widely used in many fields such as sports, health care, entertainment, and manufacturing. For example, this problem can be used to track the position and actions of athletes in sports games, improving performance and diagnosing medical issues. It can also be used to monitor the position and actions of robots in manufacturing applications.
2.5 Object tracking: The problem of following an object in a sequence of images is called object tracking
- Object tracking is the problem of following the position and motion of an object in a sequence of video frames. The goal of this problem is to identify and maintain the position of an object that has been detected across consecutive frames.
Common object tracking algorithms use different methods to measure the similarity between regions of interest or features of the object across frames. For example, some methods use the shape and size information of the object to track it, while others use context-based tracking algorithms that do not rely on the shape of the object.
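One very simple similarity measure for linking detections across frames is centroid distance: each object from the previous frame is greedily matched to the nearest unclaimed detection in the new frame. This is a toy sketch of the association step, not a full tracker (names and the distance threshold are illustrative):

```python
import numpy as np

def associate(prev_centroids, new_centroids, max_dist=50.0):
    """Greedy nearest-neighbour matching of detections across two frames.

    Returns a dict mapping the index of each previous-frame object to the
    index of its matched detection in the new frame (if any is close enough)."""
    matches = {}
    used = set()
    for i, p in enumerate(prev_centroids):
        best_j, best_d = None, max_dist
        for j, c in enumerate(new_centroids):
            d = float(np.hypot(p[0] - c[0], p[1] - c[1]))
            if j not in used and d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches
```

Real trackers replace the centroid distance with appearance features, IoU, or motion models, but the frame-to-frame association logic has the same shape.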
- The applications of object tracking are vast, from security surveillance to observing the behavior of wild animals. It is also used in real-world applications such as autonomous vehicles and mobile robots to track objects.
2.6 Face recognition: The problem of identifying human faces in images or videos.
- Face recognition is the problem of identifying human faces in images or videos. This problem is typically divided into three main parts: detect, align, and recognize.
- In the detect phase, faces in the image are searched for and detected. The align phase ensures that the found faces are in the correct position and aligned so that they are of the same size and orientation. Then, in the recognize phase, machine learning algorithms are used to compare the features of the face with registered data in a database to determine the identity of the person.
- Algorithms used in face recognition include both traditional methods and deep learning-based methods such as Convolutional Neural Networks (CNNs). However, the problem still faces many challenges such as changes in lighting, viewing angle, skin color, hair, and the clothing of the person.
- Face recognition has many applications in fields such as security, surveillance, entertainment, and access control. For example, it can be used for security monitoring and crime detection, or to improve the user experience in entertainment applications. It can also be used for identity verification in access control, replacing traditional password entry.
2.7 Optical character recognition (OCR): The problem of recognizing characters in an image
- Optical Character Recognition (OCR) is the problem of converting handwritten or printed text from images or digital documents into computer-readable text that can be processed by a machine.
- Common OCR algorithms use character recognition methods, including template matching, segmentation, orientation, and character recognition, combining them to generate the corresponding text. Modern OCR methods use deep learning algorithms such as Convolutional Neural Networks (CNNs) to solve this problem. Using deep learning allows the model to automatically learn character features and improve the accuracy of character recognition.
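Template matching, the classical method mentioned above, can be sketched directly: a small glyph template is slid over the image and the position with the lowest sum-of-squared-differences is taken as the best match (the tiny arrays here are illustrative):

```python
import numpy as np

def best_match(image, template):
    """Slide the template over the image; return the top-left (row, col)
    of the window with the lowest sum-of-squared-differences."""
    th, tw = template.shape
    h, w = image.shape
    best, best_pos = None, None
    for i in range(h - th + 1):
        for j in range(w - tw + 1):
            ssd = np.sum((image[i:i + th, j:j + tw] - template) ** 2)
            if best is None or ssd < best:
                best, best_pos = ssd, (i, j)
    return best_pos

# Paste a tiny "glyph" into a blank image and find it again.
img = np.zeros((6, 8))
tmpl = np.array([[1.0, 1.0],
                 [1.0, 0.0]])
img[2:4, 5:7] = tmpl
```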
- The applications of OCR are vast, from converting old documents and storing them as digital documents, to supporting the management of records and documents for companies and organizations. OCR is also used in human-machine interaction products, such as recognizing handwriting on a control panel, or in applications for recognizing license plates, cards, and personal documents.
2.8 Scene reconstruction: The problem of creating a 3D model of a scene from 2D images