
Scientific Research Report: Research and development of a monitoring system for cashew nuts in the sorting and packaging line


DOCUMENT INFORMATION

Basic information

Title: Research and development of a monitoring system for cashew nuts in the sorting and packaging line
Author: Phạm Nhật Quang
Supervisor: Dr. Kim Đình Thái
University: Vietnam National University, Hanoi
Major: Applied Information Technology
Document type: Student Research Report
Year of publication: 2024
City: Hanoi
Number of pages: 40
File size: 1.57 MB

Structure

  • 1. Research Topic
  • 2. Student’s Information
  • 1. Concerning rationale of the study
  • 2. Research questions
  • 3. Object and Scope of the Study
  • 4. Research Methods
  • 5. Structure
  • CHAPTER 1: BACKGROUND
    • 1. Convolutional Neural Networks (CNN)
    • 2. Object Detection
  • CHAPTER 2: PROPOSED METHOD
    • 1. Overview Diagram of YOLOv8
    • 2. Web Application
    • 3. Counting and Tracking
  • CHAPTER 3: RESULTS
    • 1. Dataset
    • 2. Evaluation Metrics
    • 3. Detection Result
    • 4. Counting and Tracking result
    • 5. Web App Result
  • CHAPTER 4: CONCLUSION AND FUTURE WORK

Content

In this study, YOLOv8, a state-of-the-art object detection algorithm, was employed to train the AI model for recognizing and classifying cashew nuts on the production line.

Research Topic

• English: Research And Development Of A Monitoring System For Cashew Nuts In The Sorting And Packaging Line

• Vietnamese: Nghiên cứu và phát triển hệ thống giám sát cho hạt điều trong quy trình sắp xếp và đóng gói.

Student’s Information

Full name           Student ID   Class      Major
Phạm Nhật Quang     22071126     AIT2022A   Applied Information Technology
Lê Quốc Đạt         22070054     AIT2022B   Applied Information Technology
Lê Hữu Uy           22071116     AIT2022A   Applied Information Technology
Thảo                22070265     FDB2022A   FinTech and Digital
Trần Quang Tiệp     23070340     AIT2023A   Applied Information Technology

Cashew nuts are a popular and valuable commodity in the global market due to their nutritional value and versatility in various culinary applications. As consumer demand for high-quality cashew nuts continues to rise, ensuring the efficiency and consistency of the sorting and packaging processes has become increasingly important. To address this need, the research and development of a monitoring system specifically designed for cashew nuts in the sorting and packaging line has gained significant attention. The aim of this study is to develop an innovative monitoring system that integrates advanced technologies to enhance the overall efficiency and quality control of cashew nut processing. By implementing computer vision, machine learning, and data analysis techniques, this system will enable real-time monitoring and evaluation of key parameters throughout the sorting and packaging line. The primary objective of the monitoring system is to accurately detect and classify various defects or abnormalities in cashew nuts, such as broken or discolored nuts, foreign objects, or improper sizing.

By employing computer vision algorithms, the system will analyze visual data to identify and separate defective nuts from those meeting the required quality standards. This will help minimize the presence of inferior cashew nuts in the final packaged products, ensuring a consistent level of quality and customer satisfaction. Additionally, the monitoring system will collect and analyze data on process variables such as sorting speed, packaging efficiency, and error rates. This data-driven approach will provide valuable insights into the performance of the sorting and packaging line, facilitating continuous process optimization and enabling timely adjustments to improve productivity and reduce waste. The successful development and implementation of a robust monitoring system for cashew nuts in the sorting and packaging line have the potential to revolutionize the industry by streamlining operations, improving product quality, and optimizing resource utilization. This system will contribute to the overall growth and competitiveness of cashew nut producers, enabling them to meet the increasing demands of the market while maintaining stringent quality standards.

Concerning rationale of the study

The research and development of a monitoring system for cashew nuts in the sorting and packaging line is motivated by several key factors. The cashew nut industry is experiencing a growing demand for high-quality products, and it is essential to meet this demand with an efficient and reliable sorting and packaging process. One of the main drivers behind this research is the need to ensure consistent quality standards. Defective nuts, such as those that are broken, discolored, or malformed, can have a significant impact on the overall quality of the final product. Therefore, the implementation of a monitoring system that can accurately detect and separate these defects is crucial. By incorporating advanced technologies such as computer vision, machine learning, and data analytics, the system can automate the identification process, ensuring that only premium-quality cashew nuts are selected for packaging. Another important aspect is the reduction of human error. Manual sorting and packaging processes are labor-intensive and can be prone to mistakes. By integrating a monitoring system, the reliance on manual labor is minimized, and the system can provide real-time detection and separation of defective nuts. This not only improves the accuracy of the sorting process but also reduces the inclusion of substandard nuts in the final packaged products.

Research questions

How can a monitoring system using computer vision, machine learning, and data analytics be developed to identify and separate defective cashew nuts in the sorting and packaging process?

How can the accuracy and reliability of the monitoring system in detecting different types of defective cashew nuts be ensured?

Object and Scope of the Study

The objective of this study is to research and develop a comprehensive monitoring system for cashew nuts in the sorting and packaging line, with a focus on improving quality control and optimizing operational efficiency in the cashew nut industry. The study aims to achieve several specific objectives. Firstly, a monitoring system will be developed utilizing cutting-edge technologies such as computer vision, machine learning, and data analytics. This system will be designed to accurately identify and separate defective cashew nuts in real time during the sorting and packaging process.

To ensure the reliability and accuracy of the monitoring system, robust algorithms and models will be implemented. The system will be trained to detect various types of defects commonly found in cashew nuts, including broken nuts, discolored nuts, and deformed nuts. This will enable the system to effectively differentiate between acceptable and defective nuts, ensuring that only high-quality nuts are included in the final packaging. The scope of the study includes the research, design, and development of the monitoring system specifically tailored for cashew nuts in the sorting and packaging line. Emphasis will be placed on automation and optimization, ensuring seamless integration into the existing operational setup. Factors such as process flow, equipment compatibility, and human-machine interaction will be considered during the system's implementation. Furthermore, the study will involve the collection and analysis of data obtained from the monitoring system. This data will provide valuable insights into trends, patterns, and opportunities for process improvement in the cashew nut sorting and packaging line. By leveraging this information, potential enhancements and optimizations can be identified, leading to increased efficiency and productivity. The study's scope is limited to the sorting and packaging stage of cashew nut production and does not cover other aspects such as cultivation, harvesting, or post-packaging logistics.

Research Methods

The YOLOv8 method is utilized to recognize cashew nuts and classify them into the categories good, broken, peel, and burn. YOLOv8's object detection capabilities enable real-time detection and localization. A dataset of annotated images and videos from cashew nut production facilities is gathered. Finally, a classification model categorizes the detection results into the four designated categories, providing accurate identification and classification based on quality and condition.
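
As a minimal sketch of this workflow (not the project's exact code), the snippet below fine-tunes a pretrained YOLOv8 model with the Ultralytics Python package and runs inference on one frame; the file names "cashew.yaml" and "cashew_frame.jpg" and the exact class labels are illustrative assumptions.

    from ultralytics import YOLO

    # Start from pretrained COCO weights and fine-tune on the cashew dataset
    # ("cashew.yaml" is an assumed dataset config listing the four classes).
    model = YOLO("yolov8n.pt")
    model.train(data="cashew.yaml", epochs=100, imgsz=640)

    # Run inference on a single frame from the production line.
    results = model("cashew_frame.jpg")
    for box in results[0].boxes:
        cls_name = results[0].names[int(box.cls)]  # e.g. "good", "broken", "peel", "burn"
        print(cls_name, float(box.conf), box.xyxy.tolist())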

Structure

The research content of the topic is configured into small programs to solve the problems mentioned in the target study. Accordingly, the study provides a basis for discussion on the identification and classification of cashew nuts.

The main content of the report includes 4 chapters:

Chapter 1: Background

Chapter 2: Proposed Method

Chapter 3: Results

Chapter 4: Conclusion and Future Work

CHAPTER 1: BACKGROUND

Convolutional Neural Networks (CNN)

a) An introduction to CNN and its role in cashew identification:

Convolutional Neural Networks (CNNs) are artificial neural networks specifically engineered for processing image and video data. By employing filters and convolutions, CNNs automatically extract key features like edges and angles from images. By combining these features, they can recognize complex objects such as faces, animals, and cars. Their ability to learn and analyze image data without human intervention has made CNNs a prominent tool in computer vision applications such as facial recognition, image classification, self-driving cars, and medical imaging. CNNs have consistently achieved remarkable results in these domains, cementing their significance in the field of image data processing.

Convolutional Neural Networks play an important role in cashew nut identification in the cashew nut production and processing industry. By using CNNs, systems can automatically classify and recognize cashews based on images, which enhances performance and quality during production and quality control. With their flexibility and automatic learning capabilities, CNNs can distinguish cashew nuts by size, shape, color, and other characteristics. This makes the process of sorting and separating cashew nuts more automated, fast, and accurate, helping to optimize the production process and minimize sorting errors.

In addition, the use of CNNs also helps to create an image database of different types of cashews, thereby improving the training process and validating identification models. This enhances quality and consistency in the final product, while minimizing reliance on human factors in the cashew nut identification and sorting process.

b) CNN Basics:

- The convolutional layer is the core of a CNN and is responsible for extracting features from image data.

- In this layer, a number of filters are applied to parts of the input image to create feature maps.

- Each filter detects features such as edges, corners, or special patterns in the image.

- The pooling layer is often used after convolutional layers to reduce the size of the feature maps and the amount of information that needs to be processed.

- The most common variant is max pooling, where the maximum value in each specific area of the feature map is selected and retained, while other values are discarded.

- The use of the pooling layer minimizes overfitting and speeds up the network's computation.

- The fully connected layer is the last layer in a CNN, usually used to connect the features extracted by previous layers to the output layers.

- Each node in the fully connected layer connects to all nodes in the previous layer.

- This layer is often used to perform tasks such as classifying and detecting objects based on characteristics learned from the convolutional and pooling layers.

- These layers work together to learn and extract appropriate features from image data and perform tasks such as object classification and detection (a minimal network sketch follows this list).
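
The following minimal PyTorch sketch illustrates the three layer types described above working together; the input size (64x64 RGB) and the four output classes are assumptions for illustration, not the network used in this study.

    import torch
    import torch.nn as nn

    class SimpleCNN(nn.Module):
        def __init__(self, num_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # filters detect edges/corners
                nn.ReLU(),
                nn.MaxPool2d(2),                             # max pooling halves the feature map
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected output layer

        def forward(self, x):
            x = self.features(x)
            x = torch.flatten(x, 1)  # flatten feature maps for the fully connected layer
            return self.classifier(x)

    logits = SimpleCNN()(torch.randn(1, 3, 64, 64))  # -> tensor of shape (1, 4)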

c) Some common applications of CNN:

Up to now, Convolutional Neural Networks (CNNs) have proven their strength in the field of computer vision with a variety of applications. Here are some common uses of CNNs in computer vision:

- Image Classification: Convolutional Neural Networks (CNNs) have revolutionized image classification tasks. They excel at categorizing images into distinct classes, ranging from everyday objects like cats and dogs to specialized domains such as flower identification and material classification. CNNs' ability to extract high-level features from images has led to remarkable advancements in this field, making them an indispensable tool in image classification applications.

- Object Detection: One of the most important uses of CNNs is to detect and locate objects in images. By identifying bounding boxes containing specific objects in an image, CNNs make object recognition and analysis easier, from facial recognition to detecting cars on the road.

- Sentiment Analysis: CNNs have also been used to analyze emotions from faces, making the identification of emotions such as happy, sad, and angry more automatic and accurate than ever.

- License Plate Recognition: In the field of security and traffic, CNNs have been deployed to recognize and analyze information from license plates in images or videos, helping to improve traffic management and road safety.

- Medical Pathology Detection and Analysis: In the field of medicine, CNNs play an important role in detecting signs of pathology from medical images such as X-ray, MRI, and CT scans, thereby supporting the diagnosis and treatment of diseases.

- Image Enhancement: Finally, CNNs can be used to enhance image quality, including brightening, clarifying, reducing noise, and enhancing resolution, helping to improve user experience and image diagnosis.

=> With these diverse applications, CNNs have become a powerful tool in the field of computer vision, bringing great utility and advancement to many different fields.

Object Detection

Object detection is a computer vision task that aims to locate objects in digital images. As such, it is an instance of artificial intelligence that consists of training computers to see as humans do, specifically by recognizing and classifying objects according to semantic categories. Object localization is a technique for determining the location of specific objects in an image by demarcating each object with a bounding box.

There are many traditional and modern methods used for object detection in computer vision. Here are some notable methods:

• R-CNN (Region-based Convolutional Neural Networks): R-CNN is an advanced method that uses CNNs to propose important regions in images, then applies a CNN to extract features from each proposed region, and finally uses classifiers to classify the objects.

• YOLO (You Only Look Once): YOLO is an advanced object detection method that simultaneously predicts bounding boxes and class labels from the entire input image. This single-pass detection approach allows YOLO to process images much faster than traditional methods, making it an efficient choice for real-time object detection tasks.

• SSD (Single Shot MultiBox Detector): SSD is an object detection method capable of predicting bounding boxes and class scores in a single pass. SSDs combine convolutional neural networks with feature extraction at multiple scales to detect objects of different sizes.

CHAPTER 2: PROPOSED METHOD

Overview Diagram of YOLOv8

YOLOv8 is an improvement over its predecessor YOLO versions, enhancing performance and making the model faster, more accurate, and more user-friendly. An essential part of YOLOv8 continues to utilize the CSP architecture from YOLOv5. As shown in Figure 4, the C2F block extracts image features.

In YOLOv8, the 1×1 CBS convolution structure in the stage for generating feature maps, similar to PAN-FPN in YOLOv5, has been removed. Additionally, the C3 structure is replaced with the C2F structure to enhance feature extraction and learning. YOLOv8 also eliminates the anchor box mechanism (it is anchor-free), which addresses the complexity that arises when two objects share the same center point by constructing bounding boxes and assigning them to separate classes. Finally, mosaic data augmentation is a simple technique in which four different images are combined and fed into the model as a single input; this helps YOLOv8 learn natural objects from various positions and under occluded conditions.

Figure 4: C2f module in the YOLOv8 model
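
As a simplified sketch of the mosaic idea described above (real YOLOv8 mosaic also jitters a random center point and remaps the box labels, which is omitted here), four images can be tiled into one training sample:

    import cv2
    import numpy as np

    def simple_mosaic(imgs, size=640):
        """Tile four images into one size x size mosaic (labels not handled)."""
        half = size // 2
        canvas = np.zeros((size, size, 3), dtype=np.uint8)
        positions = [(0, 0), (0, half), (half, 0), (half, half)]
        for img, (y, x) in zip(imgs, positions):
            canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
        return canvas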

The YOLO (You Only Look Once) series of models has become famous in the computer vision world. YOLO's fame is attributable to its considerable accuracy while maintaining a small model size. YOLO models can be trained on a single GPU, which makes them accessible to a wide range of developers. Machine learning practitioners can deploy them at low cost on edge hardware or in the cloud.

YOLO has been nurtured by the computer vision community since its first launch in 2015 by Joseph Redmon. In the early days (versions 1-4), YOLO was maintained in C code in a custom deep learning framework written by Redmon called Darknet. The YOLOv8 author, Glenn Jocher at Ultralytics, shadowed the YOLOv3 repo in PyTorch (a deep learning framework from Facebook). As the training in the shadow repo got better, Ultralytics eventually launched its own model: YOLOv5.

YOLOv5 quickly became the world's SOTA repo given its flexible Pythonic structure. This structure allowed the community to invent new modeling improvements and quickly share them across repositories using similar PyTorch methods.

In the last two years, various models branched off of the YOLOv5 PyTorch repository, including Scaled-YOLOv4, YOLOR, and YOLOv7. Other models emerged around the world out of their own PyTorch-based implementations, such as YOLOX and YOLOv6. Along the way, each YOLO model has brought new SOTA techniques that continue to push the model's accuracy and efficiency.

Over the last six months, Ultralytics worked on researching the newest SOTA version of YOLO, YOLOv8. YOLOv8 was launched on January 10th, 2023.

YOLOv8 achieves strong accuracy on COCO. For example, the YOLOv8m model (the medium model) achieves a 50.2% mAP when measured on COCO. When evaluated against Roboflow 100, a dataset that specifically evaluates model performance on various task-specific domains, YOLOv8 scored substantially better than YOLOv5. More information on this is provided in the performance analysis in [1].

YOLOv8 outshines other models with its developer-friendly features A CLI streamlines the training process, while a Python package enhances the coding experience This comprehensive approach makes YOLOv8 highly accessible and user-friendly, surpassing its predecessors in terms of convenience.

Figure 5: Detailed illustration of the YOLOv8 model architecture

b) YOLOv8 Accuracy Improvements:

To cater to the needs of different applications, YOLOv8 comes in five different versions:

• YOLOv8n: The smallest model, with a 37.3 mAP score on COCO

• YOLOv8s: A step up, offering improved performance

• YOLOv8m: The medium model, reaching a 50.2 mAP score on COCO

• YOLOv8l: A larger model, offering higher accuracy at a higher computational cost

• YOLOv8x: The largest model, scoring an impressive 53.9 mAP on COCO

c) Advantages and limitations of YOLOv8:

Speed: YOLOv8 is considered fast and has low response times, helping to handle object recognition and image segmentation tasks in real time.

Accuracy: YOLOv8 is built on advances in deep learning and computer vision, ensuring high accuracy in object recognition.

Flexibility: YOLOv8 supports object recognition and segmentation on both GPU and CPU, leveraging technologies such as Nvidia's TensorRT and Intel's OpenVINO.

Figure 6: Comparison of the accuracy and performance of YOLO models

To use YOLOv8 effectively, you need to:

• Have in-depth knowledge of Machine Learning, Deep Learning and related algorithms

• Train the model on a sufficiently large and diverse dataset to achieve the highest efficiency

• Provide high computational resources to achieve fast and accurate processing

The YOLOv8 code is open source, released by Ultralytics under the AGPL-3.0 license; deployments that cannot comply with the AGPL terms require a commercial licensing agreement with Ultralytics

YOLOv8 may not perform well in all environments and may require additional tuning or optimization to achieve optimal performance.

d) YOLOv8 Labeling Tool:

Ultralytics, the creator and maintainer of YOLOv8, has partnered with Roboflow to make it a recommended annotation and export tool for use in your YOLOv8 projects. Using Roboflow, you can annotate data for all the tasks YOLOv8 supports (object detection, classification, and segmentation) and export the data so that you can use it with the YOLOv8 CLI or Python package.
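
As a hedged example of that workflow, a dataset exported from Roboflow in YOLOv8 format ships with a data.yaml file that can be passed straight to the YOLOv8 CLI; the paths below are assumptions:

    yolo detect train data=data.yaml model=yolov8n.pt epochs=100 imgsz=640
    yolo detect predict model=runs/detect/train/weights/best.pt source=cashew_line.mp4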

Table 1: The accuracy of YOLOv8 on COCO

An essential feature of YOLOv8 is its versatility, allowing it to be extended as a framework that is compatible with all previous YOLO versions. This makes it effortless to switch between different YOLO models and evaluate their performance. Consequently, for those who want to take advantage of the latest advancements in YOLO technology while retaining their existing YOLO models, YOLOv8 is the ideal choice.

Web Application

Web Application is an extensive open-source project showcasing the seamless integration of object detection, tracking, and segmentation tasks using YOLOv8 (an object detection algorithm) and Streamlit (a Python web application framework). The project offers a user-friendly and customizable interface designed to detect, track, and segment objects in real-time video streams from various sources.

Streamlit is a free and open-source framework for rapidly building and sharing beautiful machine learning and data science web apps. It is a Python-based library specifically designed for machine learning engineers. Data scientists and machine learning engineers are not web developers, and they are not interested in spending weeks learning to use web frameworks to build apps. Instead, they want a tool that is easier to learn and use, as long as it can display data and collect the parameters needed for modeling. Streamlit allows you to create a stunning-looking application with only a few lines of code.

The integration with Streamlit enables a user-friendly web interface. Users can easily interact with the application, selecting the video source, adjusting detection and tracking parameters, and visualizing real-time detection, tracking, and segmentation results. The interface is customizable, allowing users to tailor the application to their specific requirements.
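
A minimal sketch of such an interface, assuming trained weights saved as "best.pt" (the widget labels and defaults are illustrative), could look like this; it would be launched with "streamlit run app.py":

    import streamlit as st
    from PIL import Image
    from ultralytics import YOLO

    st.title("Cashew Nut Monitoring")
    conf = st.slider("Confidence threshold", 0.1, 1.0, 0.4)  # adjustable detection parameter
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

    if uploaded is not None:
        model = YOLO("best.pt")  # assumed path to the trained cashew weights
        results = model.predict(Image.open(uploaded), conf=conf)
        st.image(results[0].plot(), caption="Detections", channels="BGR")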

Web Application combines the power of object detection with advanced tracking and segmentation techniques. With the YOLOv8 algorithm, the project achieves accurate and efficient object detection capabilities. It can detect a wide range of objects with high precision, drawing bounding boxes around them for visualization.

In addition to object detection, Web Application incorporates object tracking functionality. It utilizes sophisticated tracking algorithms to follow the detected objects across consecutive frames, ensuring smooth and continuous tracking. This is particularly useful in scenarios where objects may undergo occlusion or changes in appearance.

By leveraging advanced image segmentation models, the project accurately extracts detailed object information and generates precise segmentations within video frames. This comprehensive processing provides users with a clear understanding of object structure and characteristics, enhancing their perception of the video content.

Web Application supports various video sources, including RTSP, UDP, and YouTube URLs. Users can also analyze static videos and images by uploading them via the web interface. This flexibility makes Web Application applicable to a wide range of use cases in fields such as surveillance, video analytics, and computer vision research.

In summary, Web Application is an open-source project that seamlessly integrates object detection, tracking, and segmentation tasks. With its user-friendly interface and customizable features, it enables users to detect, track, and segment objects in real-time video streams from different sources. Web Application provides a powerful solution for applications requiring comprehensive object analysis and understanding.

Counting and Tracking

a) OpenCV for YOLOv8 Object Tracking:

OpenCV is one of the most extensive open-source libraries for computer vision, containing almost every common image-processing algorithm. Leveraging OpenCV for YOLOv8 object tracking combines the advanced detection capabilities of YOLOv8 with the robust features of the OpenCV library, offering an innovative solution for sophisticated real-time object tracking.

OpenCV provides eight different trackers out of the box in OpenCV 4.2: BOOSTING, MIL, KCF, TLD, MEDIANFLOW, GOTURN, MOSSE, and CSRT. A majority of open-source trackers use OpenCV for most of their visualization and image processing work.
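
As a short sketch of one of these built-in trackers (in OpenCV 4.2 the constructors live directly on the cv2 module; newer releases move several of them to cv2.legacy), with the video path being an assumption:

    import cv2

    cap = cv2.VideoCapture("cashew_line.mp4")
    ok, frame = cap.read()
    bbox = cv2.selectROI("frame", frame)      # mark the target object in the first frame
    tracker = cv2.TrackerKCF_create()
    tracker.init(frame, bbox)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, bbox = tracker.update(frame)   # locate the target in the next frame
        if found:
            x, y, w, h = map(int, bbox)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("frame", frame)
        if cv2.waitKey(1) == 27:              # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()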

b) Object Tracking:

YOLOv8 object tracking is an extended part of object detection, where we identify the location and class of objects within the frame and maintain a unique ID for each detected object in subsequent video frames.

c) Types of Object Tracking:

This section focuses mainly on YOLOv8 object tracking and object counting. To fully understand the landscape of object-tracking technology, it is essential first to explore the various current tracking methods. This exploration will include a detailed look at different types of trackers and their unique capabilities.

In the paper by Zahra Soleimanitaleb and Mohammad Ali Keyvanrad [2], we learn that object tracking methods fall into different categories: from feature-based methods like optical flow, to estimation-based methods like the Kalman filter, to deep learning-based methods like GOTURN or SORT by Alex Bewley, Zongyuan Ge, et al. [3]

We can classify object tracking into two main categories based on the type and functionality of the available trackers.

d) Single Object Tracker (SOT)

Single object tracking involves marking the target object in the first frame and subsequently using a tracking algorithm to locate it in subsequent frames. OpenCV offers various built-in trackers, such as KCF, TLD, and GOTURN, which cater to a range of tracking applications. One notable extension is the Multiple Object Tracker (MOT), which enables tracking multiple objects simultaneously.

e) Multiple Object Tracker (MOT)

With a fast object detector, it makes sense to detect multiple objects and then run a tracker to follow them across single or consecutive frames.

Various advanced Multi-Object Tracking (MOT) systems exist, such as DeepSORT, FairMOT, ByteTrack, and BoT-SORT. These systems employ sophisticated algorithms to track multiple objects accurately and efficiently in video sequences.

In the paper by Gioele Ciaparrone, Francisco Luque Sánchez, et al. [3], the standard approach in Multiple Object Tracking (MOT) algorithms is tracking-by-detection, where detections (bounding boxes identifying targets in video frames) guide the tracking process. These detections are associated across frames to maintain consistent IDs for the same targets. This makes MOT primarily an assignment problem. Because modern detection frameworks ensure high-quality detections, most MOT methods focus on enhancing data association rather than detection. Many MOT datasets provide pre-determined detections, allowing algorithms to bypass detection and focus solely on comparing association quality; this approach isolates the impact of detector performance on tracking results.

f) Object Counting

YOLOv8 object counting is an extended part of object detection and object tracking. It begins with YOLOv8 object tracking to identify objects in video frames. These objects are then tracked across frames via algorithms like BoT-SORT or ByteTrack, maintaining consistent identification.

Object counting employs various methods, including whole-frame counting, line counting, and zone-based counting. Line counting involves tallying objects crossing a predefined line, while zone-based counting focuses on objects within designated areas. Additionally, in/out counting identifies objects entering or leaving specific zones. These methods provide accurate and efficient counting solutions across a range of applications.

To ensure accuracy, especially against double counting, objects are tracked in lists (like counting_list), and counts are incremented (count += 1) under specific conditions. Addressing challenges such as occlusions and dynamic conditions, the process yields valuable data for insights in fields ranging from traffic monitoring to retail analytics, showcasing its comprehensive utility.
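
A hedged sketch of that line-counting logic: an object is counted once when its track crosses a counting line, and its track ID is remembered in counting_list to prevent double counting (the line position and class names are assumptions):

    LINE_Y = 400                  # assumed y-coordinate of the counting line
    counting_list = []            # track IDs that have already been counted
    count = {"good": 0, "broken": 0, "peel": 0, "burn": 0}

    def update_count(track_id, cls_name, prev_cy, cy):
        # Count when the centroid moves from above the line to below it.
        if prev_cy < LINE_Y <= cy and track_id not in counting_list:
            counting_list.append(track_id)
            count[cls_name] += 1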

Object counting with YOLOv8 involves the following steps:

- Object Detection: YOLOv8 performs object detection by analyzing the image or video frame and identifying the presence of objects. It predicts the bounding boxes that tightly enclose the objects and assigns class labels to them.

- Object Localization: YOLOv8 accurately localizes the objects within the image by determining the coordinates of the bounding boxes that surround each object.

- Object Classification: YOLOv8 assigns class probabilities to each detected object, indicating the likelihood of it belonging to a specific category or class. This allows for the identification and differentiation of different types of objects.

- Object Counting: By analyzing the bounding boxes and class labels of the detected objects, object counting algorithms based on YOLOv8 can accurately count the occurrences of specific objects within the image or video stream. The count can be obtained by simply tallying the number of objects belonging to the desired class (a short tracking-and-counting sketch follows this list).
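
A short sketch of these steps with the Ultralytics tracking API (the weights file "best.pt" and the video path are assumptions; the tracker configs bytetrack.yaml and botsort.yaml ship with the package):

    from ultralytics import YOLO

    model = YOLO("best.pt")  # assumed trained cashew detector
    results = model.track("cashew_line.mp4", tracker="bytetrack.yaml", persist=True)
    for r in results:
        if r.boxes.id is not None:  # track IDs stay stable across frames
            for tid, cls in zip(r.boxes.id.int().tolist(), r.boxes.cls.int().tolist()):
                print("track", tid, "class", r.names[cls])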

CHAPTER 3: RESULTS

Dataset

In the industrial sector, product identification and classification is an important process that helps improve production efficiency and quality. There are many methods on the market today to perform this task, and one of the most commonly used is machine learning and computer vision technology. We are a research group working on cashew identification in industry. Our work includes purchasing cashew samples from manufacturing factories and collecting photos from internet sources, then classifying them into specific categories such as 'Burn', 'Broken', 'Good', and 'Peel'.

To ensure precise cashew classification, Roboflow's platform was employed. Among the available options (Detection, Segmentation, Classification), Detection was deemed optimal, as it not only categorizes cashews but also pinpoints their locations in the image. This enhanced accuracy and efficiency in the manufacturing process. Roboflow's flexibility simplified data labeling, while its detection models optimized recognition performance. Notably, bounding boxes facilitated rapid cashew identification by marking their image locations.

In addition, using Roboflow and the Detection task also brings flexibility and convenience. Detection models are capable of quickly processing images and videos and can be deployed on devices such as the Raspberry Pi, helping to optimize the system and reduce operating costs.

Figure 7: Labeling cashew nuts on Roboflow
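
As a hedged sketch, an annotated dataset can be pulled from Roboflow in YOLOv8 format with the roboflow package; the API key, workspace, project name, and version below are placeholders:

    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")
    project = rf.workspace("your-workspace").project("cashew-nuts")  # placeholder names
    dataset = project.version(1).download("yolov8")  # writes images, labels, and data.yaml locally
    print(dataset.location)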

Utilizing Roboflow's Detection task has greatly enhanced our industrial cashew identification and classification process. The flexibility, efficiency, and automation provided by Roboflow have enabled us to streamline the production process, and consequent improvements in quality have resulted from the optimized production process made possible by this technology.

- Before we feed images into the model to detect and classify objects, a series of preprocessing steps needs to be performed to prepare the data. First, we convert the image to the appropriate color space, usually RGB; this is extremely important so that the model can understand and process the image correctly. We then typically normalize the image sizes to ensure that they match the size of the training data, which helps ensure consistency and effectiveness of the training and prediction process. Finally, further preprocessing steps can include noise removal, light balancing, and resolution changes. These steps help clean and prepare the data for detection and classification.

- Next, after preprocessing the data, we use a convolutional neural network (CNN) to extract features from the image. CNNs are often pre-trained on large amounts of image data to learn basic features such as edges, corners, and the shapes of objects. The final layers of the CNN usually contain higher-level feature information, and we use them to predict objects in the image. This process helps us better understand the characteristics of objects in the image, thereby helping the model predict more accurately.

- The next step is object detection. In this case, we use the YOLOv8 network to predict bounding boxes for objects in the image. Each bounding box contains information about the position (x, y coordinates) and size of the object. YOLOv8 is capable of detecting multiple objects at the same time and provides a prediction probability for each object. Once we have the bounding boxes, we can determine the location and type of the objects in the image.

- Finally, we perform post-processing steps to refine and filter the results. These steps may include removing bounding boxes with low prediction probability, applying a non-maximum suppression algorithm to remove overlapping boxes, and assigning labels to the predicted objects (a compact NMS sketch follows below). The final result is a list of objects detected in the image, along with their locations and predicted probabilities. This provides a comprehensive approach to detecting and classifying objects in images effectively and accurately.
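
As a compact sketch of this post-processing (a plain NumPy version for illustration; production code would normally rely on the detection library's built-in NMS), low-confidence boxes are dropped and overlapping duplicates are suppressed:

    import numpy as np

    def nms(boxes, scores, conf_thr=0.25, iou_thr=0.5):
        """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences."""
        keep_conf = scores >= conf_thr                 # drop low-probability boxes
        boxes, scores = boxes[keep_conf], scores[keep_conf]
        order = scores.argsort()[::-1]                 # highest confidence first
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                    (boxes[order[1:], 3] - boxes[order[1:], 1])
            iou = inter / (area_i + areas - inter)
            order = order[1:][iou < iou_thr]           # discard boxes that overlap too much
        return boxes[keep], scores[keep]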

Evaluation Metrics

Table 2: The number of images and number of instances

Class Images Instances Box (P) R mAP50 mAP50-95

- Precision (P): the ratio between the number of correct positive predictions and the total number of positive predictions:

    Precision = TP / (TP + FP)

  where TP (True Positive) is the number of correct positive predictions and FP (False Positive) is the number of false positive predictions.

- Recall (R): the ratio between the number of correct positive predictions and the total number of actual positives:

    Recall = TP / (TP + FN)

  where FN (False Negative) is the number of actual positives predicted as negative.

- IoU (Intersection over Union): the ratio between the intersection area and the union area of the predicted bounding box and the ground-truth bounding box. IoU measures the degree of overlap between the two boxes.

- AP (Average Precision): the area under the Precision-Recall curve for a particular class. AP evaluates the performance of the object detection model and is obtained by computing the area under the Precision-Recall curve.

- mAP50: the mean AP over all classes calculated at an IoU threshold of 0.5.

- mAP50-95: the mean AP calculated at IoU thresholds from 0.5 to 0.95 with a step of 0.05. It provides a more comprehensive view of the performance of the object detection model at different levels of difficulty (a small computation sketch follows this list).
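
The following small sketch computes the basic quantities defined above (all inputs are illustrative values, not results from this study):

    def precision(tp, fp):
        return tp / (tp + fp)

    def recall(tp, fn):
        return tp / (tp + fn)

    def iou(a, b):
        # a and b are (x1, y1, x2, y2) bounding boxes
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    print(precision(90, 10), recall(90, 5), iou((0, 0, 10, 10), (5, 5, 15, 15)))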

Detection Result

After the first training with the YOLOv8l algorithm, we ran 100 epochs with nearly 1,000 images and videos. We obtained relatively accurate results on cashew nut identification using the Detection task.

Then we did the same with the remaining four YOLOv8 versions (YOLOv8n, YOLOv8s, YOLOv8m, and YOLOv8x) and also obtained relatively accurate results for the four classes "Burn", "Broken", "Good", and "Peel". They are shown in Figure 8, Figure 9, Figure 10, and Figure 11 below. For comparison purposes, we also summarize the corresponding results obtained from other models and report them along with our model results in each figure.

In the prediction results, each image of a cashew nut is labeled with the predicted quality grade and the probability the method assigns to that prediction.

Figure 8: Examples of the detection results obtained by four models for the “Burn” grade products

Figure 9: Examples of the detection results obtained by four models for the “Broken” grade products

Figure 10: Examples of the detection results obtained by four models for the “Good” grade products

Figure 11: Examples of the detection results obtained by four models for the “Peel” grade products

Counting and Tracking result

The results obtained from the application of the YOLOv8 model for counting and tracking cashew nuts were highly satisfactory. The system demonstrated robust performance in accurately detecting and classifying cashew nuts into the four predefined classes: good, peel, burn, and broken.

For the counting aspect, the model efficiently identified regions of interest within the video where the cashew nuts passed through. Each time a cashew nut crossed one of these regions, the model successfully detected its class and incremented the respective count. This counting mechanism proved to be reliable and accurate, providing a precise quantification of the number of cashew nuts in each class.

The object detection capabilities of the YOLOv8 model were particularly impressive. It displayed a high level of precision in localizing and recognizing individual cashew nuts within the video frames. The model's ability to handle variations in lighting conditions, orientations, and appearances of the cashew nuts contributed to its overall effectiveness.

The classification results were also noteworthy. The model accurately assigned the appropriate class label to each detected cashew nut. Whether a cashew nut was deemed good, peel, burn, or broken, the model consistently made correct predictions. This classification accuracy is crucial for quality control purposes, as it allows for effective sorting and categorization in the cashew nut production process.

Furthermore, the YOLOv8 model demonstrated real-time performance in processing the video frames. The inference speed was fast, enabling efficient analysis of the cashew nut production line without significant delays. This real-time capability is vital for applications in industrial settings where quick decision-making and process optimization are crucial.

Figure 12: Counting cashew nuts using Ultralytics YOLOv8

In summary, the results obtained from the application of the YOLOv8 model for counting and tracking cashew nuts using the Ultralytics framework were highly promising. The model exhibited excellent performance in accurately detecting, classifying, and counting cashew nuts in a simulated video of a production line. These results showcase the potential of YOLOv8-based approaches for automating counting and tracking tasks in the cashew nut industry, ultimately contributing to improved efficiency and quality control.

Web App Result

During the exploration and preparation for developing a website, our team discussed and decided to use the Python programming language as the main tool to build the foundation of the project. This decision was made after conducting extensive research and analysis of today's popular programming languages.

Python was chosen because of its flexibility and power in web application development. With a large community and strong support from the development community, Python offers a rich range of libraries and frameworks, helping us optimize the development process and create a final product with high performance and a great user experience.

The website was developed and operated locally, using the Python programming language to build both its interface and its features. Our main goal is to bring the problem of classifying and detecting cashews to the web, creating a powerful tool to help users identify cashews effectively and conveniently.

On this website, we have integrated two main methods to identify cashew nuts: image identification and video identification. This provides flexibility to users, allowing them to choose the method that best suits their specific needs and situations.

With guaranteed high accuracy, both of these identification methods bring reliability and efficiency during use. The video mode also integrates a feature that counts the number of cashew nuts that have passed through the counting line, helping users monitor output accurately and conveniently.

Furthermore, we provide a new feature that assists users in recognizing cashews through webcams, specific YouTube videos, and RTSP streams. This not only opens up new opportunities for users but also demonstrates our commitment to providing the most modern and convenient features. It expands the usability of the application and gives users flexibility and convenience in choosing data sources for cashew identification.

Figure 13: Interface of the cashew web app using the Detection task

Figure 14: Tracking and counting cashew nuts through video

To efficiently count objects in videos, crucial parameters such as dimensions and frame rate are defined. A designated region of interest guides the counting process. An object counter, tailored for video processing, is initialized to monitor objects throughout the video. Each frame is analyzed by a YOLO model, delivering object detection and tracking data. This information is then transmitted to the counter for counting and contour rendering. The resulting video is recorded, and resources are released upon completion. This automated approach streamlines object counting in videos, with applications in fields like security, traffic management, and product tracking.
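
A hedged sketch of that pipeline (file paths and the line position are assumptions, and the counting rule is simplified to a single crossing check):

    import cv2
    from ultralytics import YOLO

    model = YOLO("best.pt")                                   # assumed trained weights
    cap = cv2.VideoCapture("cashew_line.mp4")
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))                # video dimensions
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)                           # frame rate
    out = cv2.VideoWriter("counted.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    line_y = h // 2                                           # assumed counting line
    counted, total = set(), 0

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        r = model.track(frame, persist=True)[0]               # detection + tracking per frame
        if r.boxes.id is not None:
            for tid, box in zip(r.boxes.id.int().tolist(), r.boxes.xyxy.tolist()):
                cy = (box[1] + box[3]) / 2
                if cy > line_y and tid not in counted:        # count each track only once
                    counted.add(tid)
                    total += 1
        annotated = r.plot()
        cv2.line(annotated, (0, line_y), (w, line_y), (0, 0, 255), 2)
        cv2.putText(annotated, f"count: {total}", (20, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
        out.write(annotated)

    cap.release()                                             # release resources when done
    out.release()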

CHAPTER 4: CONCLUSION AND FUTURE WORK

Based on this research, we have utilized the YOLOv8 model for cashew nut recognition and successfully applied it to specific applications. The YOLOv8 model demonstrates remarkable abilities in cashew nut recognition and classification tasks.

Firstly, we employed the YOLOv8 model to count the number of cashew nuts within a specific region. Through training and testing, the model achieved high accuracy in counting cashew nuts while ensuring fast processing speed. This is particularly useful for quality control and production management in the cashew nut industry.

Moreover, the model can accurately identify and classify different types of cashew nuts based on their visual attributes. It can distinguish between good nuts, nuts with husks, broken nuts, and burnt nuts. This level of classification accuracy is crucial for quality control purposes, ensuring that only the finest cashew nuts reach the market.

The YOLOv8 model boasts robust cashew nut recognition capabilities, enabling it to handle challenges encountered in real-world scenarios. By effectively recognizing nuts with irregular shapes, sizes, and partial obscuration or overlap, the model ensures reliable and consistent performance in detecting cashews across various image and video formats.

Secondly, we developed an application that allows users to upload images and videos containing cashew nuts for recognition purposes. This application enables users to upload data from external sources, such as links to images or videos containing cashew nuts. The YOLOv8 model is then applied to recognize and classify the cashew nuts within that dataset. This provides convenience and flexibility for users, allowing them to use the application across multiple platforms and from various data sources.

In summary, the utilization of the YOLOv8 model and the development of a cashew nut recognition application have brought numerous benefits. We have successfully built a system for counting the number of cashew nuts and classifying them into categories such as good nuts, nuts with husks, broken nuts, and burnt nuts. Additionally, the application allows users to upload images and videos from various sources for cashew nut recognition.

These achievements not only provide utility in the cashew nut industry but also have the potential to expand the application of the YOLOv8 model and cashew nut recognition into other fields. The model's ability to accurately identify and classify cashew nuts, as well as its robustness in handling various challenges, contributes to enhanced productivity, quality assurance, and efficiency in cashew nut production processes.

In the future, the integration of the YOLOv8 model and cashew nut recognition technology holds great potential for further advancements and applications.

One potential direction for future development is the refinement and fine-tuning of the YOLOv8 model specifically for cashew nut recognition. By continuously training the model with larger and more diverse datasets, its accuracy and robustness can be further improved. This will enable it to handle a wider range of cashew nut variations, including different cultivars, sizes, and processing conditions.

Additionally, advancements in hardware and computing power may enable real-time cashew nut recognition and classification. This would allow for seamless integration of the monitoring system into existing production lines.


References

[1] Jacob Solawetz, Francesco. "What is YOLOv8? The Ultimate Guide." Roboflow Blog, Jan 11, 2023. https://blog.roboflow.com/whats-new-in-yolov8/
[2] Soleimanitaleb, Zahra, and Mohammad Ali Keyvanrad. "Single object tracking: a survey of methods, datasets, and evaluation metrics." arXiv preprint arXiv:2201.13066 (2022).
[3] Ciaparrone, Gioele, et al. "Deep learning in video multi-object tracking: A survey."
[4] Pham, Van-Nam, et al. "A Low-Cost Deep-Learning-Based System for Grading Cashew Nuts." Computers 13.3 (2024): 71.
[5] Zhang, Yongcheng, et al. "Machine Vision-Based Chinese Walnut Shell–Kernel Recognition and Separation." Applied Sciences 13.19 (2023): 10685.
[6] Jacob Murel and Eda Kavlakoglu. "What is Object Detection?" January 3, 2024.
