
A smart shopping cart with automated payment based on artificial intelligence


DOCUMENT INFORMATION

Basic information

Title: A Smart Shopping Cart With Automated Payment Based On Artificial Intelligence
Authors: Trần Ngô Minh Trí, Nguyễn Hoài Nam, Nguyễn Văn Tòng
Supervisor: TS. Bùi Hà Đức
University: Ho Chi Minh City University Of Technology And Education
Major: Electronics And Telecommunications Engineering Technology
Document type: Graduation Thesis
Year of publication: 2023
City: Ho Chi Minh City
Format
Number of pages: 114
File size: 8,45 MB

Structure

  • CHAPTER 1: INTRODUCTION
    • 1.1 Motivations
    • 1.2 Objective
    • 1.3 Research task
    • 1.4 Limitations
    • 1.5 Research subjects and scopes
    • 1.6 Outline
  • CHAPTER 2: LITERATURE REVIEW
    • 2.1 Service robot
    • 2.2 Introducing 2D images
    • 2.3 Deep learning and Convolutional Neural Networks
    • 2.4 Object detection
      • 2.4.1 Metrics used for the object detection task
      • 2.4.2 Choosing a model to deploy on Jetson Nano
      • 2.4.3 Understanding SSD (Single Shot MultiBox Detector)
      • 2.4.4 Understanding MobileNetV2
    • 2.5 Introducing TensorRT
    • 2.6 Object tracking
    • 2.7 OCR (Optical Character Recognition)
    • 2.9 PID controller
  • CHAPTER 3: HARDWARE AND MECHANICAL DESIGN
    • 3.1 Technical Requirements
    • 3.2 Design Proposal
    • 3.3 3D Structural Design of the Robot
    • 3.4 Building the robot base
      • 3.4.1 Calculations and motor selection
      • 3.4.2 Calculations and selection of the belt transmission system
      • 3.4.3 Selection of bearing supports
      • 3.4.4 Selection of omnidirectional wheels
      • 3.4.5 Designing the base plate
    • 3.5 Calculating the kinematics of the robot
    • 3.6 Calculating the dynamics of the robot
  • CHAPTER 4: ELECTRICAL DESIGN
    • 4.1 Technical requirements
    • 4.2 Block diagram and overview of the electrical system
    • 4.3 Power supply block
      • 4.3.1 Calculating and selecting the power supply
      • 4.3.2 Buck converter circuits
    • 4.4 Main data processing block
    • 4.5 Sensor block
    • 4.6 Control block
      • 4.6.1 STM32F103C8T6 microcontroller
      • 4.6.2 H-Bridge driver
    • 4.7 Actuator block
      • 4.7.1 DC motor
      • 4.7.2 Encoder
  • CHAPTER 5: ALGORITHM DESIGN
    • 5.1 Designing 2D and 3D image processing algorithms
    • 5.2 Following person module
    • 5.3 Automatic checkout module
      • 5.3.1 OCR (Optical Character Recognition) block
      • 5.3.2 Semantic Entity Recognition module
    • 5.4 Algorithm for Robot navigation
    • 5.5 The control algorithm on STM32
  • CHAPTER 6: EXPERIMENTS AND RESULTS
    • 6.1 PID Controller for Motor Speed Control
      • 6.1.1 The structure of a PID controller
      • 6.1.2 Finding the transfer function of the motor from experimentation
      • 6.1.3 Finding the parameters of the PID controller for speed control of the motor
      • 6.1.4 The PID control diagram of each motor
      • 6.1.5 The experimental results of the PID controller on two motors
    • 6.2 Deformation testing
      • 6.2.1 The main base plate deformation testing
      • 6.2.2 The cargo compartment deformation testing
      • 6.2.3 The base frame deformation testing
    • 6.3 Training the semantic entity recognition model
      • 6.3.1 Preparing data
      • 6.3.2 Data labeling
      • 6.3.3 Training the model
    • 6.4 Model inference
    • 6.5 Recognizing user actions
    • 6.6 User interface designing
    • 6.7 The result of the tracking model when the person is occluded

Content

INTRODUCTION

Motivations

The retail industry is crucial for economic growth in both emerging and developed nations, including Vietnam, where it generates approximately 8,015 trillion VND (around 348 million dollars) in revenue. In major cities, retail activities are primarily concentrated in supermarkets and shopping centers. However, the peak-hour shopping experience at these venues faces challenges, particularly during the payment process: most supermarkets rely on manual labor to process individual transactions, leading to congestion and slow payment processing that fails to meet customer demand and negatively impacts customer satisfaction. Addressing these challenges is crucial, and exploring innovative solutions can significantly improve the overall shopping experience during busy periods.

Figure 1.1 Customers waiting to pay at the supermarket

The Amazon Dash Cart features an integrated automated payment system, utilizing four cameras and five lights to capture images and identify products. This innovative cart enables customers to easily enter a store and select items for purchase.

Customers can effortlessly shop by using a smart cart that eliminates the need for a traditional checkout process. The cart employs advanced technology, including cameras, sensors, and machine learning algorithms, to monitor the items selected and returned by shoppers. Once customers exit the store, their Amazon accounts are automatically charged for the items they have chosen.

Figure 1.2 The Amazon Dash Cart scans items, and interacts with a customer’s Amazon account for payment

Veeve, a Seattle-based company, is innovating the shopping experience with an advanced shopping cart that features high-tech upgrades. This smart cart allows customers to check out directly from the cart using a built-in camera, which captures product images and identifies the items, streamlining the shopping process.

Customers face challenges when purchasing multiple heavy items, like several 1-liter milk bottles or soft drinks, as the carts are not equipped for automatic movement and must be manually pushed.

Figure 1.3 Veeve's new product attaches to regular carts, turning them into smart carts

To improve the efficiency of payment tasks in supermarkets, we propose the design of an Autonomous Mobile Robot that functions as a cart, featuring two essential capabilities: shopper tracking and automated payment processing through Artificial Intelligence. This solution aims to distribute the payment workload evenly, reducing pressure on cashiers and checkout counters. By allowing customers to browse and select items at their own pace, the robot enhances the overall shopping experience while seamlessly carrying goods and handling payments.

Objective

This product enhances the shopping experience in supermarkets and shopping malls by enabling customers to seamlessly check out items as they place them in their carts. Utilizing depth camera technology, it processes both 2D and 3D images, while a deep learning model trained on carefully labeled datasets ensures accurate item recognition and automatic checkout.

Research task

After surveying market demand, this research project is undertaken with the following tasks:

• Task 1: Survey existing solutions on the market related to the topic

• Task 2: Design a block diagram for the complete system and assign tasks to each member within the team

• Task 3: Design the mechanical system and calculate the kinematics of the mobile robot

• Task 4: Design the electrical and control systems

• Task 5: Review scientific papers relevant to this topic and propose a final solution

• Task 6: Collect and label data in practical situations

• Task 7: Train AI models with our datasets

• Task 8: Test the performance of our system with suitable metrics

• Task 9: Design and program the user interface for web browsers.

Limitations

Due to hardware and financial constraints, our cart is unable to keep pace with individuals moving at high speeds. Additionally, the AI model designed for product recognition has not yet been deployed on embedded devices, leading us to rely on wired webcams instead of wireless options.

Research subjects and scopes

• Deploy AI model (detection models, tracking models) on embedded devices

• Research point cloud processing for plane segmentation

• Apply OCR (Optical Character Recognition) models with different approaches

• Kinematic calculations for Mobile Robot

• Apply PID method to control motors

• Develop customer behavior analysis based on paths of mobile robots

• Deploy OCR model on embedded devices

• Build infrastructure for communication between mobile robots and servers

• Connect mobile robots with each other to create a robot network

• Improve processing speed of AI models.

Outline

The outline of the thesis is divided into seven chapters, which are as follows:

• Chapter 1: Introduction. This chapter gives a general introduction to our project, covering the topic's motivation, objectives, limitations, research activities, and scope

• Chapter 2: Literature review. In this chapter, we review the knowledge we used during research and the implementation process

• Chapter 3: Hardware and mechanical design. In this chapter, we discuss the mechanical structure, dynamics calculations, and selection of mechanical elements

• Chapter 4: Electrical design. This chapter provides the design of the electrical elements in detail and explains why we chose those elements

• Chapter 5: Algorithm design. In this chapter, we discuss the algorithms used in this project, such as object detection, tracking, and PID algorithms

• Chapter 6: Experiments and results. This chapter presents our experiments with the algorithms discussed in Chapter 5 and the results we obtained

• Chapter 7: Conclusion. In this chapter, we summarize the performance of our system and identify areas for future research

LITERATURE REVIEW

Service robot

Service robots, often referred to as assistive or service-oriented robots, are advanced technological systems that perform diverse tasks and provide assistance across various environments. By offering a broad spectrum of services, these robots are transforming industries and enhancing daily life for individuals, businesses, and society at large.

Service robots are engineered for seamless interaction and collaboration with humans, utilizing advanced sensors and AI algorithms. These machines can autonomously navigate their environment, manipulate objects, and communicate effectively through speech recognition and natural language processing, ensuring a user-friendly experience.

Service robots are revolutionizing healthcare by supporting professionals in hospitals and care facilities. They assist with crucial tasks like patient monitoring, medication delivery, and offering companionship, which enhances overall patient care. By alleviating the workload of healthcare staff, these robots improve efficiency and contribute to better patient outcomes.

Service robots are revolutionizing industries like logistics and warehousing by automating repetitive, labor-intensive tasks such as sorting and transporting goods. This automation boosts productivity and lowers operational costs. These robots can efficiently navigate complex environments, adapt to dynamic conditions, and work alongside human employees to enhance overall warehouse operations.

Service robots are increasingly utilized in the hospitality and service industry, playing a vital role in hotels, restaurants, and customer service centers. These robots greet guests, provide information, and deliver food and beverages, significantly enhancing customer experiences. By streamlining service delivery, they allow human staff to concentrate on more complex or specialized tasks, ultimately improving overall efficiency.

Service robots are transforming domestic life by streamlining household chores and aiding individuals with everyday tasks. These innovations range from robotic vacuum cleaners that autonomously maintain clean floors to personal assistant robots that manage schedules, appointments, and smart home devices, simplifying our lives by providing convenience and support.

Service robots are increasingly being integrated into education to enhance learning experiences through interactive and personalized engagement. These robots serve as tutors, delivering educational content, facilitating group activities, and adapting to individual learning needs. By promoting student participation and fostering creativity, service robots contribute significantly to the advancement of educational methodologies.

Service robots play a crucial role in disaster response and search-and-rescue missions, showcasing their ability to navigate dangerous environments and gather essential information. By accessing hard-to-reach locations and undertaking perilous tasks, these robots minimize risks for human responders while significantly improving the efficiency of rescue operations.

In summary, service robots are becoming essential across multiple industries and in everyday life. Their advanced features and adaptability are transforming how we work, live, and interact. As technology progresses, these robots will evolve further, enhancing efficiency, safety, and connectivity in our world.

An autonomous mobile robot (AMR) is a self-operating robot capable of navigating and interacting with its surroundings without human intervention. Unlike automated guided vehicles (AGVs), which depend on fixed paths or tracks and typically need operator supervision, AMRs utilize advanced sensors and algorithms to move freely and adapt to dynamic environments.

Autonomous Mobile Robots (AMRs) are increasingly utilized across diverse sectors, including manufacturing, healthcare, retail, and logistics. These robots can be programmed for specific tasks and enhanced with machine learning and artificial intelligence, which boosts their performance and adaptability. AMRs are engineered to enhance operational efficiency, lower costs, and improve safety in various environments.

Figure 2.1 Autonomous Mobile robots of major brands

The main advantages of AMRs include:

- Improved efficiency: AMRs can work continuously without breaks and can perform repetitive tasks with high accuracy, which can improve overall efficiency and productivity

- Cost-effective: AMRs can be a cost-effective solution for companies as they can reduce labor costs and increase productivity without requiring significant infrastructure changes

- Flexibility: AMRs are designed to operate in a variety of environments and can adapt to changes in their surroundings, which makes them highly flexible and versatile

- Enhanced safety: AMRs can be designed to operate in hazardous or dangerous environments, which can reduce the risk of injury or accidents for human workers

However, AMRs also have several drawbacks:

- High upfront costs: The initial investment required to purchase and deploy AMRs can be high, which can be a barrier to adoption for some companies

- Limited customization: While AMRs are flexible, they may not be able to perform highly specialized tasks or adapt to unique work environments without significant customization

- Maintenance and repair: AMRs require regular maintenance and repair to ensure that they operate efficiently and safely, which can add to the overall cost of ownership

AMRs also depend heavily on sophisticated technology, including sensors and software, making them susceptible to malfunctions or failures. Such vulnerabilities can lead to operational downtime and necessitate extra maintenance and repair efforts.

Introducing 2D images

The RGB (Red-Green-Blue) color model is a fundamental system in digital imaging and computer graphics, utilizing an additive approach that combines varying intensities of red, green, and blue light to produce a wide array of colors. This model operates on the principle that mixing red, green, and blue light at full intensity results in white light, while the absence of all three colors yields black.

The RGB color model typically uses an 8-bit value for each color channel—red, green, and blue—ranging from 0 to 255. This provides 256 intensity levels per channel, leading to more than 16 million possible color combinations. By adjusting the intensity of each channel, a wide spectrum of colors and shades can be generated.

In mathematical terms, an RGB image consists of three matrices stacked on top of each other, with each channel representing one color: red, green, or blue.
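As a minimal illustration of this representation (not code from the thesis), the snippet below builds a small RGB image as three stacked matrices with NumPy; the image size and pixel values are arbitrary:

import numpy as np

# An RGB image is a stack of three 2D matrices (height x width),
# one per channel, with 8-bit intensities in the range 0..255.
height, width = 4, 6
red   = np.full((height, width), 255, dtype=np.uint8)  # full-intensity red channel
green = np.zeros((height, width), dtype=np.uint8)      # no green
blue  = np.zeros((height, width), dtype=np.uint8)      # no blue

# Stack the channels along the last axis -> shape (height, width, 3)
image = np.stack([red, green, blue], axis=-1)

print(image.shape)   # (4, 6, 3)
print(image[0, 0])   # [255   0   0] -> a pure red pixel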

Deep learning and Convolutional Neural Networks

Deep learning, a subset of machine learning, involves training artificial neural networks with multiple layers to identify and learn intricate patterns in data. This technology has transformed numerous domains, such as computer vision, natural language processing, and speech recognition.

Convolutional Neural Networks (CNNs) are a widely used deep learning architecture ideal for image and video processing. By utilizing convolutional layers, CNNs automatically learn spatial hierarchies of features, effectively capturing local patterns, textures, and shapes in images. Their outstanding performance in image classification, object detection, and image segmentation tasks has established CNNs as a crucial tool in computer vision.

Object detection

Object detection is a crucial computer vision task that identifies and locates objects within images or videos, providing precise bounding box coordinates. Unlike simple image classification, it enables machines to recognize and interact with their environment in real time. This technology is vital for applications such as autonomous driving, surveillance, robotics, and augmented reality. Modern methods leverage deep learning techniques, particularly Convolutional Neural Networks (CNNs), and utilize advanced algorithms such as Faster R-CNN, YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector) to achieve high accuracy and efficiency in detecting objects within complex visual scenes.

Figure 2.2 Timeline of state-of-the-art object detection methods

Figure 2.3 Output of Object Detection

Some of the most popular deep learning models used in object detection tasks are:

Faster R-CNN is a widely used object detection algorithm known for its blend of accuracy and efficiency. It features a Region Proposal Network (RPN) that generates region proposals and utilizes a shared convolutional network to extract features from the entire image. The RPN predicts objectness scores and bounding box offsets, enabling the identification of potential object regions. This approach allows Faster R-CNN to deliver remarkable detection accuracy while being faster than earlier two-stage detectors. However, because it uses a two-stage architecture, its inference speed is about 0.2 s/image, which is slower than one-stage architectures.

The Single Shot MultiBox Detector (SSD) is a highly efficient object detection algorithm recognized for its remarkable speed and accuracy. It utilizes multiple convolutional layers of varying sizes to detect objects at different scales and aspect ratios, allowing it to effectively identify objects of diverse sizes in an image. By predicting bounding boxes and class probabilities at each convolutional layer with default anchor boxes, SSD excels at multi-scale detection. However, its reliance on feature maps with smaller resolutions can make it harder to accurately detect small objects.

YOLO (You Only Look Once) is an advanced object detection algorithm that transformed real-time object detection by treating it as a regression problem rather than relying on traditional region proposals. It segments the input image into a grid, predicting bounding boxes and class probabilities from each grid cell, which allows efficient real-time processing with just a single forward pass through the neural network. Although YOLO may sacrifice some localization accuracy, its speed makes it ideal for applications requiring real-time detection, including video analysis, robotics, and self-driving vehicles.

Figure 2.4 Accuracy of YOLOv2, SSD, and Faster-RCNN models for each object size category

Figure 2.5 The fps versus batch size in five detection algorithms

2.4.1 Metrics used for the object detection task

IoU (Intersection over Union): the ratio between the area of overlap and the area of union of the predicted box and the ground truth box. The higher the IoU score, the better the model performance.

Figure 2.6 Illustration of Intersection over Union (IoU)

• Precision: the fraction of the model's positive predictions that are correct

• Recall: the fraction of all ground-truth objects that the model correctly detects

• Formulas to calculate precision and recall: Precision = TP / (TP + FP); Recall = TP / (TP + FN), where:

• True positive (TP): a positive prediction of the model that is actually positive (correct)

• False positive (FP): a positive prediction of the model that is actually negative (incorrect)

• True negative (TN): a negative prediction of the model that is actually negative (correct)

• False negative (FN): a negative prediction of the model that is actually positive (missed)

In object detection models, higher precision indicates greater confidence in the detected bounding boxes, but this often results in fewer boxes being identified. Conversely, higher recall leads to the detection of more bounding boxes, although it comes with reduced confidence in those detections.

Average Precision (AP) is a key evaluation metric in object detection tasks, representing the average precision value for each class at various Intersection over Union (IoU) thresholds. It assesses the accuracy and quality of predicted bounding boxes, indicating how effectively a model ranks and localizes objects.

Mean average precision (mAP): It provides a comprehensive assessment of the detection performance across all classes by calculating the average precision (AP) for each class and then averaging them
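To make these metrics concrete, the following minimal sketch (illustrative only, not code from the thesis) computes IoU for two boxes in (x1, y1, x2, y2) format and precision/recall from TP/FP/FN counts; the sample boxes and counts are made up:

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2). Compute the intersection rectangle first.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Example with made-up numbers: a detection is usually counted as TP if IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))    # 25 / 175 ≈ 0.143 -> would count as a FP at IoU 0.5
print(precision_recall(tp=80, fp=20, fn=40))  # (0.8, 0.667)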

2.4.2 Choosing a model to deploy on Jetson Nano

Due to hardware limitations, it is essential to choose models that prioritize lightweight design and high inference speed. Since the size of the detected person in the frame remains relatively constant, inference speed takes precedence over accuracy. Consequently, MobileNetV2 is selected as the backbone, combined with the SSD (Single Shot MultiBox Detector) algorithm for efficient detection.

2.4.3 Understanding SSD (Single Shot MultiBox Detector)

In the first version, VGG16 is used as the backbone, and the architecture of this model is presented in the figure below:

Figure 2.7 The basic structure of the SSD network model

Convolutional predictors for detection: this model uses 3x3xp convolutional layers as filters for classification and bounding box regression instead of fully connected layers, as used in YOLO.

Multi-scale feature map detection: the input image is first processed by the backbone to extract features, with the resolution of the output reduced at each successive layer. This approach enables objects of different sizes to be detected within the image. Using a VGG16 backbone, the model generates a total of 8,732 predictions across six feature map layers.

In the SSD framework, each cell of a prediction feature map initializes k default boxes, each of which is classified into c classes, and the coordinates of each box are regressed as (cx, cy, w, h). Consequently, for an m x n feature map, the total number of outputs is (c + 4) * k * m * n.
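As a quick numerical check of the (c + 4) * k * m * n expression and of the 8,732 figure quoted above, the sketch below uses the commonly cited SSD300 feature-map sizes and default-box counts (assumed here, not taken from the thesis):

def ssd_outputs_per_layer(c, k, m, n):
    # c class scores + 4 box offsets for each of the k default boxes
    # at every one of the m*n feature-map locations.
    return (c + 4) * k * m * n

def total_default_boxes(layers):
    # Number of default boxes (predictions) = sum of k * m * n over all layers.
    return sum(k * m * n for (k, m, n) in layers)

# Commonly cited SSD300 setup: (k default boxes, m, n) for each prediction layer.
layers = [(4, 38, 38), (6, 19, 19), (6, 10, 10), (6, 5, 5), (4, 3, 3), (4, 1, 1)]
print(total_default_boxes(layers))                   # 8732 default boxes
print(ssd_outputs_per_layer(c=21, k=6, m=19, n=19))  # outputs produced by one 19x19 layer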

2.4.4 Understanding MobileNetV2

MobileNetV2 is an advanced deep learning architecture optimized for building efficient and lightweight neural network models, making it ideal for mobile and embedded devices with constrained computational resources. Developed by Google researchers, it enhances the capabilities of its predecessor, MobileNetV1.

MobileNetV2 has a significantly smaller model size compared to VGG16 which is a deeper model with a larger number of parameters, resulting in a larger model size and higher computational complexity

Figure 2.9 Performance of the SSD algorithm with several different backbones on the ImageNet and VOC datasets

As shown in Figure 2.9, MobileNetV2 achieved an impressive inference speed (39 FPS) with an mAP of around 65%, which is acceptable.

Introducing TensorRT

TensorRT, developed by NVIDIA, is a high-performance library designed for optimizing and accelerating the inference of deep learning models. It supports frameworks such as TensorFlow, PyTorch, and ONNX, making it an essential tool for enhancing the efficiency of deep learning applications.

TensorRT is designed for optimal deployment on NVIDIA GPUs, leveraging their hardware capabilities to achieve rapid and efficient inference. It employs a range of optimization techniques, including layer fusion, precision calibration, kernel auto-tuning, and dynamic tensor memory management, to enhance throughput and reduce latency during inference.

The main features and benefits of TensorRT include:

- Layer Optimization: TensorRT enhances neural network performance by implementing techniques like kernel auto-tuning, precision calibration, and layer fusion, which streamline the execution of individual layers

- Tensor Memory Management: It efficiently manages the memory required for intermediate tensors during inference, minimizing the memory footprint and reducing memory transfers between the CPU and GPU

- Dynamic Precision Calibration: TensorRT allows models to be calibrated at runtime using a portion of the training data. This process facilitates the quantization of the network to lower-precision formats like INT8, which enhances inference speed while maintaining accuracy

- Multi-GPU Support: TensorRT is designed to take advantage of multiple GPUs, enabling efficient scaling across multiple devices for even faster inference performance

- Framework Integration: TensorRT offers APIs and plugins that facilitate seamless integration with widely used deep learning frameworks like TensorFlow and PyTorch, enabling users to efficiently optimize and deploy their models

Our AI models for person tracking, which run on the Jetson Nano, are converted to TensorRT format to enhance performance and reduce processing time. The accompanying figure illustrates the processing speed of object detection models after their conversion to TensorRT format.

Figure 2.10 Performance results of various Machine Learning networks on Jetson Nano using NVIDIA's TensorRT library, with FP16 accuracy and batch_size 1
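A typical workflow for this kind of conversion (sketched below with an assumed MobileNetV2 classifier from torchvision as a stand-in for the thesis's detector, so the model name and input size are assumptions) is to export the PyTorch model to ONNX and then build a TensorRT engine from the ONNX file, for example with NVIDIA's trtexec tool on the Jetson Nano:

import torch
import torchvision

# Stand-in model: a MobileNetV2 classifier (the thesis uses an SSD-MobileNetV2 detector).
model = torchvision.models.mobilenet_v2(weights=None).eval()

# Dummy input matching the expected input shape (batch, channels, height, width).
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX, the interchange format that TensorRT can parse.
torch.onnx.export(model, dummy, "mobilenet_v2.onnx",
                  input_names=["input"], output_names=["output"], opset_version=11)

# On the Jetson Nano, the ONNX file can then be turned into a TensorRT engine, e.g.:
#   trtexec --onnx=mobilenet_v2.onnx --saveEngine=mobilenet_v2_fp16.engine --fp16
# (trtexec ships with TensorRT; the FP16 flag matches the benchmark settings in Figure 2.10.)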

Object tracking

The tracking algorithms in this product play a crucial role in helping the robot accurately locate the individual it is following, initially identified by the object detection model. It is essential to prevent any instances of mistakenly tracking the wrong object, especially during occlusion.

Historically, several tracking algorithms have been provided by the OpenCV library (a library specialized in 2D image processing):

- BOOSTING Tracker: a widely used tracking algorithm that integrates machine learning as an online variant of the AdaBoost algorithm. However, it has limitations when it comes to tracking occluded objects

- MIL (Multiple Instance Learning) Tracker: more accurate than the BOOSTING Tracker but slower, and it cannot stop when the object is lost

- KCF Tracker: combines the two tracking approaches above, improving speed and accuracy and stopping when an object is lost. However, it struggles to resume tracking when the object reappears

- TLD Tracker: this algorithm is unstable and prone to tracking the wrong object

- MOSSE Tracker: very fast but with a significant trade-off between speed and accuracy

- GOTURN Tracker: this algorithm applies deep learning to tracking but is difficult to use in practice

Deep learning models outperform traditional tracking algorithms by delivering remarkable speed and accuracy, effectively tackling challenges such as obscured objects and rapid movements.

Figure 2.11 The comparison accuracy of state-of-the-art tracking models

The STARK model, introduced at CVPR 2021, demonstrates impressive performance in person tracking tasks thanks to its speed and accuracy. Utilizing ResNet as its backbone and featuring six encoder layers, STARK employs a modern deep learning architecture. It is trained on several datasets, including LaSOT, GOT-10K, TrackingNet, and COCO, which contribute to its effectiveness as a tracking algorithm.

STARK-Lightning, a compact version of the STARK model, utilizes RepVGG16-4-Lite as its backbone, resulting in a speed increase of approximately seven times compared to the original version, though with a slight decrease in accuracy. Ultimately, we opted for STARK-Lightning to implement the person tracking feature.

OCR (Optical Character Recognition)

OCR technology transforms printed or handwritten text into machine-readable formats by analyzing the visual patterns of characters and converting them into electronic text. This sophisticated process has revolutionized data entry, document management, and text processing, making it an integral part of various industries and applications.

Optical Character Recognition (OCR) employs advanced algorithms and machine learning methods to identify and extract text from images or scanned documents. The OCR process consists of key steps, including image preprocessing, text detection, character segmentation, and the final recognition of characters.

Image preprocessing is a crucial first step that enhances and normalizes images to improve text quality and clarity. This process includes essential tasks like noise reduction, image binarization, and deskewing, all aimed at achieving optimal results for the following processing stages.

Text detection involves identifying and locating areas within an image that contain text. This process utilizes various techniques, including edge detection, connected component analysis, and deep learning methods, to effectively recognize and pinpoint text regions in images.

Character segmentation is a crucial process that involves isolating individual characters from text regions, especially in handwritten text or when characters are closely connected. The techniques used for segmentation can differ based on the specific script and language being analyzed.

Character recognition is the final step, converting the segmented characters into machine-readable text. OCR engines utilize statistical models, pattern matching, and machine learning algorithms to accurately identify and interpret characters. To enhance accuracy and performance, these models are trained on extensive datasets of annotated characters.

OCR technology is widely utilized across multiple industries for document digitization, allowing the transformation of paper documents into searchable and editable electronic formats. This capability is essential for effective archiving, information retrieval, and document management systems.

OCR technology also plays a crucial role in data entry and form processing by automating the extraction of information from documents like invoices, surveys, and application forms. This automation minimizes the reliance on manual data entry, significantly reducing errors and enhancing overall efficiency.

OCR significantly enhances accessibility for individuals with visual impairments by converting printed text into digital formats. This technology enables the use of screen readers and text-to-speech tools, making information more accessible to those with visual challenges.

The accuracy and performance of OCR systems have significantly improved over the years due to advancements in machine learning and deep learning techniques. The availability of large annotated datasets and powerful computing resources has enabled the development of sophisticated OCR models that can handle various fonts, languages, and writing styles.

One notable OCR framework is Tesseract, an open-source OCR engine maintained by Google. Tesseract has gained popularity for its accuracy and flexibility, supporting multiple languages and script types.

PaddleOCR is a powerful open-source OCR toolkit developed by the PaddlePaddle team, featuring a wide array of tools and pre-trained models for text detection, recognition, and layout analysis. With its user-friendly API and command-line interface, PaddleOCR is designed to be accessible for both developers and researchers. The advancements in OCR technology, exemplified by PaddleOCR, have significantly changed how we process and manage text-based information.
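As a small usage sketch of PaddleOCR's Python API (the image path and language setting below are assumptions, and the exact arguments and result layout can differ between PaddleOCR versions):

from paddleocr import PaddleOCR

# Load detection + recognition models once; lang='en' selects the English models.
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Run OCR on a receipt photo (hypothetical path).
result = ocr.ocr('receipt.jpg', cls=True)

# Each detected line comes back as (bounding box points, (text, confidence)).
for line in result[0]:
    box, (text, confidence) = line
    print(f'{text}  (conf={confidence:.2f})')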

Optical Character Recognition (OCR) has significantly improved digitization efficiency, streamlined data entry processes, and enhanced accessibility for visually impaired users. As algorithms and deep learning techniques continue to advance, OCR is poised for further evolution, opening up new opportunities and applications across diverse industries.

The mobile robot is designed to gather two essential signals from its environment: the horizontal angle deviation and the distance to a person. While the tracking model effectively provides the horizontal angle deviation, obtaining the distance signal is equally crucial for the robot's operation. Depth cameras provide 3D images or point clouds that can supply the necessary distance information for the robot to operate efficiently.

Depth cameras, utilizing technologies like time-of-flight and structured light, capture both depth and color data to create detailed 3D images and point clouds, which represent the geometry and structure of a scene. A point cloud consists of a collection of 3D points with precise spatial coordinates, enabling comprehensive analysis and processing. These technologies are widely used in fields such as augmented reality, robotics, autonomous vehicles, and 3D mapping, providing valuable insights into spatial characteristics.

Figure 2.12 RGB image (top-left), point cloud (top-right), and depth image (bottom) received from the depth camera Realsense D435
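A minimal sketch of reading depth from the RealSense D435 with Intel's pyrealsense2 library (the stream resolution, frame rate, and the pixel being queried are assumptions for illustration):

import pyrealsense2 as rs

# Configure and start the depth stream of the D435.
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth_frame = frames.get_depth_frame()
    if depth_frame:
        # Distance (in meters) at an example pixel, e.g. the center of a tracked person's box.
        distance_m = depth_frame.get_distance(320, 240)
        print(f'Distance at image center: {distance_m:.2f} m')
finally:
    pipeline.stop()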

PID controller

The PID (Proportional Integral Derivative) controller is a popular feedback control mechanism utilized in industrial control systems, particularly in closed-loop systems that rely on feedback signals. This controller is typically designed for SISO (Single Input Single Output) systems, which feature one input and one output.

In control systems, achieving stable operation and the desired output across various applications—such as motor speed, temperature, pressure, and flow control—requires an effective control algorithm. The PID (Proportional-Integral-Derivative) algorithm is among the most commonly utilized solutions for such control challenges. When properly tuned, the PID controller can effectively eliminate steady-state error, improve response speed, and minimize overshoot, making it a vital tool in control system design.

The PID algorithm operates by continuously monitoring the system's output and comparing it to the desired target value. When there is a discrepancy, or error, between the actual output and the set value, the PID controller calculates the necessary control input to correct the deviation and achieve the desired performance.

A closed-loop control system continuously adjusts its operation until the error reaches zero, indicating that the system has reached a stable state. This process involves feedback from the system output to the controller, enabling ongoing corrections to maintain optimal performance.

Figure 2.13 Diagram of a PID Controller

Based on the diagram above:

• PV(t): Process variable (the actual output of the system at time t)

• SP(t): Setpoint (the desired value of the output at time t)

• e(t) = SP(t) – PV(t): Deviation between the setpoint SP(t) and the process variable PV(t)

• MV(t): Output of the PID controller, calculated using the following formula:

MV(t) = Kp·e(t) + Ki·∫e(τ)dτ + Kd·de(t)/dt

The proportional gain (Kp) significantly influences the system response, with higher Kp values producing stronger and faster reactions. However, an excessively high Kp can cause overshoot and oscillation; if the gain exceeds a certain threshold, the oscillations no longer die out and the system becomes unstable.

The integral gain (Ki) plays a crucial role in system performance, as a higher value reduces the steady-state error more quickly. However, increasing it also leads to greater overshoot, since any negative error accumulated during the transient phase must be offset by a positive error before the system settles.

The derivative gain (Kd) can reduce overshoot, but it may also slow down the transient response and increase the risk of instability by amplifying signal noise in the derivative of the error. Therefore, the derivative term should always be used in combination with the proportional (P) or integral (I) terms.

In certain applications, it is sufficient to use only one or two terms, depending on the system's requirements. This can be accomplished by setting the gains of the unwanted terms to zero.

A PID controller is referred to as a PI, PD, P, or I controller when one or two of its terms are omitted. The PI controller is particularly common because the derivative action is sensitive to measurement noise, whereas omitting the integral term may prevent the system from reaching its desired setpoint.
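A minimal discrete-time sketch of this control law (illustrative only; the gains, sample time, and the first-order motor model used for the demonstration are assumptions, not the tuned values from Chapter 6):

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        # e(t) = SP(t) - PV(t)
        error = setpoint - measurement
        self.integral += error * self.dt                  # approximates the integral term
        derivative = (error - self.prev_error) / self.dt  # approximates the derivative term
        self.prev_error = error
        # MV(t) = Kp*e + Ki*integral(e) + Kd*de/dt
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Demo: drive a crude first-order "motor" model toward 100 RPM (all numbers made up).
pid = PID(kp=0.8, ki=2.0, kd=0.01, dt=0.01)
speed = 0.0
for _ in range(500):
    voltage = pid.update(setpoint=100.0, measurement=speed)
    speed += (2.0 * voltage - speed) * 0.05   # toy plant dynamics
print(round(speed, 1))  # settles near the 100 RPM setpoint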

There are multiple methods for tuning the PID controller's parameters [11]:

• Using supporting software tools

HARDWARE AND MECHANICAL DESIGN

Technical Requirements

The initial technical requirements for calculating and designing the mechanical system are as follows:

• Terrain: Moving on a flat tiled surface, without inclines or significant obstacles

• Base size of the robot: Based on real surveys of the distance between shelves in two supermarkets, Coopmart (District 9) and GO (Binh Duong), it ranges from 1,2m to 1,5m

➔ The chosen size of the robot base (length x width) is 0,5m x 0,4m

Frame size of the robot:

• The length and width of the robot frame will correspond to the base dimensions

• Height of the robot: Based on the average height of Vietnamese individuals, approximately 1,56m (female) and 1,68m (male), as well as the average height of regular supermarket trolleys

➔ The chosen size for the robot frame (length x width x height) is 0,5m x 0,4m x 1m

➔ The total weight of the robot and payload is approximately 70kg

➔ The maximum speed of the robot is 1 m/s = 3,6 km/h.

Design Proposal

The robot will feature two degrees of freedom, allowing for both linear motion and rotation. To achieve this, the team has selected a differential drive configuration, which includes two primary drive motors for the main wheels and four omnidirectional wheels to enhance balance and support a substantial payload.

Figure 3.1 Basic Configuration of a Robot

The diagram above represents the mechanism that the team finds most suitable for the project, considering factors such as control, design, and mechanical fabrication convenience

The robot control is as follows:

• Straight motion: Control both motors at the same speed and in the same direction

• Arc motion: Control the two motors at different speeds (one wheel faster and one wheel slower) to achieve curved motion

• Rotation around the center: Control both motors at the same speed but in opposite directions to rotate the robot around its center

• These control methods provide the desired maneuverability for the robot, allowing it to navigate in a straight line, follow curved paths, and rotate effectively

3.3 3D Structural Design of the Robot

After the ideation process and finalizing the design proposal, the team proceeded with the 3D design of the robot's structure:

Figure 3.2 3D Design of the Robot

The mechanical system of the robot is constructed with two main components:

The base is built using a 20x20 mm aluminum profile as the frame, with two aluminum-mica plates serving as the base panels The base consists of two layers:

- Upper layer: This layer accommodates the power supplies, low-voltage circuits, motor drivers, Jetson Nano, and microcontrollers

- Lower layer: This layer includes the belt drive system, axle bearings, motors, main drive wheels, and omnidirectional wheels

The cargo compartment is connected to the base by two L-shaped connecting bars that support a maximum load of 50 kg. A control panel positioned at chest level makes the robot convenient to operate. This design choice not only provides a spacious area for electrical system maintenance but also ensures easy access during the maintenance process.

Building the robot base

3.4.1 Calculations and motor selection

Calculation of the maximum required power of each motor P_dc. Based on the given input parameters:

• Total weight of the robot and maximum payload: m = 70 kg

• Maximum speed of the robot: v = 1 m/s

• Selection of the drive wheel diameter d w = 145 mm = 0,145 m (radius r w = 0,0725 meter) and maximum load capacity: 70 kg

Figure 3.5 The main drive wheels

The robot is equipped with two motors that power the middle wheels via a belt transmission system, complemented by four omnidirectional wheels at the front and rear. The required power of each motor is determined from:

P_dc = P_lv + P_mm

In which:

P_lv: the working power of the motor

P_mm: the power loss through the transmission systems

Figure 3.6 The main forces acting on the robot during movement

According to Newton's second law, we have:

F_k − F_ms = m·a  (3.2)

In which:

F_k: the total pulling force required for the robot to move

F_ms: the rolling friction force, F_ms = μ·m·g, where μ is the coefficient of rolling friction. Based on the table below, we choose μ = 0,01

Figure 3.7 Table of rolling friction coefficients

F_ms = μ·m·g = 70 × 9,81 × 0,01 = 6,87 (N)

Calculating the maximum acceleration a when accelerating from v_0 = 0 m/s to v_1 = 1 m/s in t = 1,5 seconds:

v_1 = v_0 + a·t  (3.4)

a = (v_1 − v_0) / t = 1 / 1,5 = 0,67 (m/s²)

From (3.2), we can deduce that:

F_k = m·a + F_ms = 70 × 0,67 + 6,87 = 53,77 (N)

The power required to provide sufficient traction for the robot:

P_k = F_k·v = 53,77 × 1 = 53,77 (W)

The working power of each motor:

P_lv = P_k / 2 = 26,89 (W)

The formula to calculate power loss:

P_mm = (1/η − 1)·P_lv

In which: η is the overall efficiency of the transmission systems. We use a belt transmission and a pair of rolling bearings, therefore η = η_b·η_r.

Referring to Table 2.3 on page 19 [12], we obtain: η_b = 0,95 (the efficiency of the belt transmission) and η_r = 0,99 (the efficiency of a pair of rolling bearings), so η = η_b·η_r = 0,95 × 0,99 = 0,94

P_mm = (1/0,94 − 1) × 26,89 = 1,72 (W)

From the working power and power loss, we can deduce the required power of each motor:

P_dc ≥ P_lv + P_mm = 26,89 + 1,72 = 28,61 (W)

Calculating the maximum rotational speed of the motor n_dc:

According to theoretical mechanics, we have: v = r·ω  (3.7)

The maximum rotational speed of the wheel:

ω_w = v / r_w = 1 / 0,0725 = 13,79 (rad/s)

n_w = ω_w · 60 / (2π) = 13,79 × 9,55 = 131,7 (RPM)

The maximum rotational speed of the motor: n_dc = u·n_w  (3.8)

In which u = u_b = 2 is the gear ratio of the toothed belt transmission, so n_dc = 2 × 131,7 = 263,4 (RPM)

Calculating the maximum torque of the motor and the wheel axle (working axis):

Based on the given input parameters:

• The required power of the motor: P dc = 28,61(W) = 0,02861(kW)

• The maximum rotational speed of the motor: n dc = 263,4(RPM)

• The total pulling force required for the robot to move: F k = 53,77(N)

• The radius of the wheel: r w = 0.0725 (m)

The maximum torque of the motor:

T_dc = 9,55·10⁶ · P_dc / n_dc = 9,55·10⁶ × 0,02861 / 263,4 = 1037,3 (Nmm) = 1,037 (Nm)

The maximum torque of the wheel axle:

Table 3.1 The calculated motor parameters

Shaft Motor shaft Working shaft

Based on the calculated motor parameters, the chosen motor is the Planet 24VDC 60W 320RPM, which meets the specified requirements
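The motor-sizing chain above can be reproduced with a short script (a sketch that simply re-evaluates the stated formulas with the stated inputs; its rounding therefore differs slightly from the thesis values):

import math

# Input parameters from the design requirements.
m = 70.0        # total mass of robot + payload (kg)
v = 1.0         # maximum speed (m/s)
r_w = 0.0725    # drive wheel radius (m)
mu = 0.01       # rolling friction coefficient
t_acc = 1.5     # time to accelerate from 0 to v (s)
eta = 0.95 * 0.99   # belt efficiency x bearing-pair efficiency
u = 2           # belt transmission gear ratio

g = 9.81
F_ms = mu * m * g                   # rolling friction force (N)
a = v / t_acc                       # required acceleration (m/s^2)
F_k = m * a + F_ms                  # total pulling force (N)
P_lv = F_k * v / 2                  # working power per motor (W)
P_dc = P_lv + (1 / eta - 1) * P_lv  # required motor power incl. transmission losses (W)

n_w = v / r_w * 60 / (2 * math.pi)  # wheel speed (RPM)
n_dc = u * n_w                      # motor speed (RPM)
T_dc = 9.55e6 * (P_dc / 1000) / n_dc  # motor torque (N*mm)

print(f"P_dc = {P_dc:.1f} W, n_dc = {n_dc:.1f} RPM, T_dc = {T_dc:.0f} N*mm")
# Approximately 28.5 W, 263 RPM and ~1030 N*mm, consistent with the values above
# (the small differences come from intermediate rounding in the thesis).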

Figure 3.8 Planet motor 24VDC 60W 320RPM

Technical specifications of the selected motor:

• Speed of the motor after the gear reduction: 320 RPM

3.4.2 Calculations and selection of the belt transmission system

To increase torque and decrease speed at the wheel, we implemented a toothed belt transmission system with a gear ratio of 2. The design and calculation of the wheel axle were performed concurrently with the selection of the toothed belt system, leading to an optimal transmission solution that also guided our choice of the wheel axle, ensuring compatibility and efficiency in the system.

The input parameters are as follows:

The module of the toothed belt is calculated using formula 4.28 on page 69 [12]: m = 35·∛(P₁/n₁)

The value of the module (m) is determined according to the standard specified in Table 4.27 on page 68 [12]: m = 1,5 mm, pitch of the belt p = 4,71 mm

The width of the belt (b) is calculated using formula 4.29 on page 69 [12]: b = ψ_d·m, with ψ_d = 7 (3.12)

We choose b = 10 mm, according to Table 4.28 on page 69 [12]

The number of teeth on the driving and driven pulleys We select them based on Table 4.29 on page 70 [12]:

=> The team has selected a commonly available 1:2 ratio belt transmission system from the market with the following specifications:

• Number of teeth on the large toothed pulley: 40 teeth

• Number of teeth on the small toothed pulley: 20 teeth

Based on the shaft diameter d = 8 mm and the shaft arrangement, the team has selected vertical bearing supports that use double-row deep groove ball bearings. These bearings are engineered to handle radial forces effectively while accommodating minor axial forces along the shaft. Their key benefit lies in their ability to allow the shaft to tilt up to 2 degrees relative to the bearing race, making them ideal for supporting long shafts and applications with misalignment or slight vibrations. These bearings excel in high-speed operation, ensuring low friction and vibration, and they do not require frequent lubrication, enhancing their efficiency and reliability.

Figure 3.10 Vertical shaft bearing support

The OMEGA bearing support has an 8 mm bore diameter and is shaped like the Greek letter omega. Constructed from a robust cast iron alloy, this bearing housing is designed to endure heavy loads and is highly resistant to cracking and fracturing under impact. Its interior is made of a specialized alloy steel that combines flexibility with exceptional wear resistance and good thermal expansion properties. This adaptable design enhances the overall flexibility of the bearing support.

• Distance between the two screw holes on the base: 42 mm

Omnidirectional wheels can rotate freely through 360 degrees and help the robot maintain better balance. The team has selected omnidirectional wheels with a diameter of 100 mm and a width of 50 mm that can withstand a maximum load of 70 kg.

The base plate, crafted from 5 mm thick aluminum 6061, offers an economical solution with high strength and widespread availability. It features strategically drilled holes and grooves designed for the motor mounting brackets, bearing supports, and connections to the robot's frame.

Calculating the kinematics of the robot

Table 3.2 Symbol table of kinematic parameters used for calculations

The linear velocity of the robot (m/s) v

The angular velocity of the robot (rad/s) ω

The angular velocity of the right wheel (rad/s) ω R

The angular velocity of the left wheel (rad/s) ω L

The speed of the right motor (RPM) n RM

The speed of the left motor (RPM) n LM

The linear velocity of the right wheel (m/s) v R

The linear velocity of the left wheel (m/s) v L

The orientation angle when the robot turns right or left (rad) θ

The radius of the main driving wheel (m) R = 0,0725

The distance between the centers of the two wheels (m) L = 0,4

The radius of the circular arc when the robot rotates at an angle θ (m) R_T

Considering the case of the robot moving straight (forward or backward):

Figure 3.13 Description of the robot moves in a straight line

The robot features two primary driving wheels, as depicted in Figure 3.13. Using the principles of theoretical mechanics, the linear velocity of each wheel can be calculated as:

v_R = R·ω_R = R·n_RM·2π / (60·2)  (3.13)

v_L = R·ω_L = R·n_LM·2π / (60·2)  (3.14)

The velocity of the robot's movement is the average of the linear velocities of the left and right wheels. Therefore, we have:

v = (v_R + v_L) / 2  (3.15)

To make the robot move in a straight line, the linear velocity of the left wheel and the linear velocity of the right wheel must be equal:

v = v_R = v_L  (3.16)

The relationship between the linear velocity of the robot and the rotational speed of its two motors during straight-line movement can therefore be expressed as:

v = R·n_RM·2π / (60·2)

Considering the case when the robot moves in a circular arc:

The length of the trajectory curve of robot L T is the average of the circular arcs L L and L R

Figure 3.14 Describes the robot while moving in a circular arc

The orientation angle θ of the robot is calculated as follows:

θ = L_T / R_T

Both L_R and L_L can be calculated based on the radius of motion R_T, the distance between the two wheel centers L, and the orientation angle θ of the robot's movement:

L_R = (R_T + L/2)·θ,  L_L = (R_T − L/2)·θ

From that, we can calculate the orientation angle of the robot based on the lengths L_R and L_L, as well as the distance between the two wheel centers L:

θ = (L_R − L_L) / L

In order for the angular velocity of the robot to be non-zero, the linear velocities of the right wheel v_R and the left wheel v_L must be different:

θ̇ = (L̇_R − L̇_L) / L = (v_R − v_L) / L

The angular velocity of the robot:

ω = R·(ω_R − ω_L) / L  (3.25)

From (3.13), (3.14), and (3.25), we have the following system of equations:

v = R·(ω_R + ω_L) / 2

ω = R·(ω_R − ω_L) / L

Solving the system of equations, we obtain the solution as follows:

ω_R = (2v + ω·L) / (2R)

ω_L = (2v − ω·L) / (2R)

Due to the use of a belt transmission system with a gear ratio of 2, the angular velocities of the left and right motors are twice those of the corresponding wheels:

ω_RM = 2·ω_R,  ω_LM = 2·ω_L

Therefore, the relationship between the speeds of the two motors (in RPM), the linear velocity, and the angular velocity of the robot is:

n_RM = (60 / 2π)·(2v + ω·L) / R,  n_LM = (60 / 2π)·(2v − ω·L) / R
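A small sketch of these relations (using the values R = 0,0725 m, L = 0,4 m, and gear ratio 2 given above; illustrative only, not the firmware code):

import math

R = 0.0725   # drive wheel radius (m)
L = 0.4      # distance between the two drive wheels (m)
GEAR = 2     # belt transmission ratio (motor turns twice per wheel turn)

def motor_speeds(v, omega):
    """Convert robot linear velocity v (m/s) and angular velocity omega (rad/s)
    into right/left motor speeds in RPM."""
    w_right_wheel = (2 * v + omega * L) / (2 * R)   # wheel angular velocities (rad/s)
    w_left_wheel = (2 * v - omega * L) / (2 * R)
    to_rpm = 60 / (2 * math.pi)
    return GEAR * w_right_wheel * to_rpm, GEAR * w_left_wheel * to_rpm

# Straight line at 1 m/s: both motors run at the same speed (about 263 RPM).
print(motor_speeds(v=1.0, omega=0.0))
# Turning in place: the motors run at equal speeds in opposite directions.
print(motor_speeds(v=0.0, omega=1.0))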

Calculating the dynamics of the robot

Table 3.3 Symbol table of dynamic parameters used for calculations

The longitudinal and lateral forces of the right wheel

The longitudinal and lateral forces of the left wheel (N) F lx , F ly

The mass of the robot (kg) m

The forward and lateral robot velocities (m/s) v x , v y

The linear velocity of the robot (m/s) v

The angular velocity of the robot (rad/s) ω

The moment of inertia of the robot with respect to the axis passing through the point G (kg·m²)

The right motor torques (Nm) 𝜏 𝑟

The left motor torques (Nm) 𝜏 𝑙

The mechanical constants of the motors 𝑘 𝑎

The electrical constants of motors 𝑘 𝑏

The supply voltages of the motors (right and left) (V) 𝑉 𝑟 , 𝑉 𝑙

The resistance of the motor (Ω) 𝑅 𝑎

The coefficient of viscous friction reduced to the motor shaft

The moment of inertia of the wheel with respect to its axis of rotation (kg·m²)

The angular velocity of the right wheel (rad/s) ω r

The angular velocity of the left wheel (rad/s) ω 𝑙

The orientation angle when the robot turns right or left

The radius of the main driving wheel (m) r = 0,0725

The distance between the centers of the two wheels (m) L = 0,4

After calculating the robot's kinematics, the team proceeds to calculate the robot's dynamics using the parameters listed in the table above

Figure 3.15 Description of the Dynamic Model of the Robot

The point G, depicted in Figure 3.15, serves as the robot's center of mass, center of rotation, and the tracking point for its desired trajectory. The robot's dynamics model was developed by applying the law of conservation of linear and angular momentum within the robot's body coordinate system, as referenced in [13].

Because the robot's rotation point G is situated on the axis between the drive wheels, forces acting along the y-axis do not generate any torque. Consequently, Equation (3.27) can be reformulated accordingly.

The angular velocity of the wheels can be determined by transforming the expressions for the right motor torques 𝜏 𝑟 and left motor torques 𝜏 𝑙 [13]:

The driving wheel dynamics equations take the form [13]:

By substituting the system of Equation (3.30) into the system of Equation (3.31) and transforming it into a form for determining longitudinal driving forces, a system of equations was obtained:

Then substituting the system of Equation (3.32) into Equation (3.28) and Equation (3.29) obtained:

ELECTRICAL DESIGN

Technical requirements

The initial technical requirements for calculating and designing the electrical system are as follows:

• A suitable power supply that matches the specifications of the devices in the system to operate at maximum power and for extended periods of time

• The devices are arranged in a logical manner, making them easy to install and replace in case of damage

• The system is equipped with power switches and emergency stop buttons to immediately halt the system in the event of any malfunction.

Block diagram and overview of the electrical system

Figure 4.1 Block diagram of the electrical system

The electrical system of the robot consists of 6 blocks:

• Block 1 is the power supply module for the entire system:

The 24VDC power supply, with a capacity of 14,000 mAh, is created by connecting two 12VDC accumulator batteries in series. This power supply provides energy to the two H-Bridge drivers, which control the operation of the two motors.

- The 12VDC power supply (5200 mAh) from the Lithium Polymer battery, with a high-discharge current, is reduced to 5V - 4A to provide power to the NVIDIA Jetson Nano Board

- The 24VDC power supply (7500 mAh) from the Lithium - ion battery is reduced to 3,3V to provide power to the STM32 microcontroller and motor encoders

• Block 2 is the main data collection and processing block of the robot, with the following two functions:

The Personal Following function leverages the Jetson Nano to process data from a 3D camera, calculating control parameters such as motor speed, which are then transmitted to Block 4 using UART communication.

- The Auto Checkout function utilizes a laptop: With the task of calculation, processing the data returned from the 2D camera, and sending the data to Block

• Block 3 is the sensor module of the robot, consisting of:

- Realsense Camera D435: Collecting image data from the environment

- Camera 2D: Collecting image data from the product

- RPLidar A1: collecting distance, position, and other information of the robot within the operating environment

• Block 4 is the control module of the robot:

- This block consists of an STM32F103C8T6 microcontroller and two H-Bridge drivers HI216

The STM32 microcontroller processes speed commands received from the Jetson Nano in Block 2, calculating the necessary adjustments. It then employs the H-Bridge drivers to control both the speed and direction of each motor.

The STM32 continuously receives data from the motor encoder, which is then input into the PID controller, ensuring stable motor performance and achieving the desired setpoint.

• Block 5 is the actuator module of the robot:

- The actuator module consists of two DC servo motors and encoders

• Block 6 is the user interface module for the automatic payment function:

- The team utilizes the iPad Gen 6 with Wi-Fi and 4G to display information about the purchased products Users can also interact with the screen to add and remove items

Power supply block

4.3.1 Calculating and selecting the power supply

The power supply block in the robot's electrical system is crucial for delivering energy to the entire system. To choose the right power supply, it is essential to assess the operational power needs of each component involved.

• The Planet 24VDC motor has a maximum power rating of 60W With two motors, the total maximum power would be 120W

• Camera Intel RealSense Depth D435 5V 1,5A: The maximum power rating is 7,5W

• RPLidar A1 5V 0,1A: The maximum power rating is 0,5W

• Jetson Nano 5V 4A: The maximum power rating is 20W

• H-bridge Driver can be disregarded due to the very low heat dissipation power

• Buck converter circuit XL4015 can be disregarded due to the very low heat dissipation power

• Buck converter circuit XL4016 can be disregarded due to the very low heat dissipation power

• STM32F103C8T6 3,3V 0,01A: The maximum power rating is 0,033W

The formula for calculating the required battery capacity is as follows: A = P·T / (V·γ), in which:

P: The total maximum power consumption of the devices

T: The continuous operating time corresponding to the maximum power consumption of the device

• The power supply for the H-bridge driver and the two motors:

With the maximum power of the two motors P = 120 W and the power supply voltage V = 24 V, the required battery capacity for one hour of operation is:

A = P·T / (V·γ) = 6,25 (Ah)

With the aforementioned accumulator battery, it can provide sufficient power for continuous motor operation for approximately 2 hours.

Figure 4.2 Accumulator battery 12VDC (14000 mAh)

• Using a 12VDC Lithium Polymer battery (5200mAh) to power Jetson Nano Board, Camera Intel RealSense Depth D435, RPLidar A1:

The maximum power consumption of the three devices is P = 20 + 7,5 + 0,5 = 28W and the power supply voltage is Vpower = 12V, so the required battery capacity for one hour is A = PT / (Vpower × γ) = 28 / (12 × 0,8) = 2,92 (Ah). The aforementioned Lithium Polymer battery can provide enough power for all three devices to operate for over an hour

Figure 4.3 Lithium Polymer 12VDC (5200 mAh)

• Using a 24VDC Lithium-ion battery (7500 mAh) to power the STM32F103C8T6 microcontroller:

The maximum power consumption of the STM32F103C8T6 microcontroller is P = 0,033W and the power supply voltage is Vpower = 24V, so the required battery capacity for one hour is A = PT / (Vpower × γ) = 0,033 / (24 × 0,8) = 0,0017 (Ah). The mentioned Lithium-ion battery can provide enough power for the STM32F103C8T6 to operate for many hours
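The capacity figures above can be reproduced with a short script. This is only an illustrative sketch of the relation A = PT / (Vpower × γ); the efficiency factor γ = 0,8 is the value implied by the calculations above, not a datasheet figure.

# Illustrative sketch of the battery-capacity relation A = P*T / (V * gamma).
def required_capacity_ah(power_w, hours, voltage_v, gamma=0.8):
    """Battery capacity in Ah needed to supply power_w for the given number of hours."""
    return power_w * hours / (voltage_v * gamma)

print(required_capacity_ah(120, 1, 24))    # two motors               -> 6.25 Ah
print(required_capacity_ah(28, 1, 12))     # Jetson + camera + lidar  -> ~2.92 Ah
print(required_capacity_ah(0.033, 1, 24))  # STM32                    -> ~0.0017 Ah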

The XL4016 Buck converter circuit is used to reduce the voltage from the 12VDC Lithium Polymer battery to 5V, providing power to the Jetson Nano Board

Figure 4.4 Buck converter circuit XL4016

The XL4015 Buck converter circuit is used to reduce the voltage from the 24VDC Lithium-ion battery to 5V, providing power to the STM32F103C8T6 microcontroller

Figure 4.5 Buck converter circuit XL4015

4.4 Main data processing block

Figure 4.6 The block diagram of the main data processing block

The Jetson Nano board serves as the primary data processing unit for the robot, enabling it to interact with its environment and determine its position relative to humans using input from the Realsense Camera D435. With this data, the robot follows a person by processing the received information and transmitting speed commands to the STM32 microcontroller through UART communication, which controls the motors. To handle complex data streams and support parallel processing, the team has integrated the Robot Operating System (ROS) into the system, allowing for seamless data transmission and programming in multiple languages, including Python and C++.

The NVIDIA Jetson Nano Developer Kit is a powerful embedded computer used here to process sensor data from the sensor block. It communicates with the STM32 microcontroller via UART to manage the speed of the two motors. Additionally, the Jetson Nano uses I2C communication to send commands to the LCD, providing real-time status updates so the user can monitor the robot's operation.
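As a minimal sketch of this UART link (not the exact protocol used on the robot), the snippet below sends a pair of wheel-speed setpoints from the Jetson Nano using the pyserial library; the serial device path, baud rate, and comma-separated message format are assumptions for illustration.

import serial  # pyserial

# Assumed port and baud rate; the real values depend on the wiring and the STM32 UART configuration.
uart = serial.Serial("/dev/ttyTHS1", baudrate=115200, timeout=0.1)

def send_speed(right_rpm: int, left_rpm: int) -> None:
    """Send a pair of wheel-speed setpoints as a newline-terminated ASCII message."""
    uart.write(f"{right_rpm},{left_rpm}\n".encode("ascii"))

send_speed(102, 102)  # example: drive straight at 102 RPM on both wheels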

4.5 Sensor block

Figure 4.8 Realsense Camera D435 of Intel

The Intel RealSense Camera D435, a second-generation model in the D400 series, has been significantly enhanced based on extensive feedback and development in stereo camera technology. This stereo vision depth camera uses dual image sensors to mimic human binocular vision, enabling it to perceive spatial depth in its surroundings.

• Depth Technology: Active IR stereo

• Depth Field of View (FOV) (Horizontal × Vertical × Diagonal): 86° x 57° (±3)

• Depth Stream Output Resolution: Up to 1280 x 720

• Depth Stream Output Frame Rate: Up to 90 fps

• RGB Sensor Resolution and Frame Rate: 1920 x 1080 at 30 fps

• Camera Dimension: 90 mm x 25 mm x 25 mm

The RealSense camera consists of three key components: a regular lens, an infrared lens, and an infrared laser lens. These lenses work together to detect the return of infrared light from objects in front of the device, enabling accurate depth perception.

4.6 Control block

Figure 4.9 The block diagram of the control block

The control block consists of the STM32F103C8T6 microcontroller, paired with two H-bridge drivers and two DC servo motors. It receives speed commands from the Jetson Nano through UART communication and implements a PID control algorithm to manage the speed of both motors.

The STM32 family, developed by STMicroelectronics, includes popular series such as F0, F1, F2, F3, and F4. Among these, the STM32F103 from the F1 series features a 32-bit ARM Cortex-M3 core operating at speeds up to 72MHz. This microcontroller is cost-effective compared to similar options and offers a wide range of user-friendly programming boards and tools.

The GPIO pins, timers, and communication protocols are configured to control the speed of the motors:

• Timer 2 and Timer 3 are used to generate PWM signals for controlling the speed of the right and left motors

• Timer 1 is used for timing purposes to calculate the speed of the motor

• Pin PA8 and PA9 are used to change the rotation direction of the right and left motors

• The UART1 module is used to receive and transmit data to the Jetson Nano

• Pin PA1 and PA2 are connected to the A and B channels of the right motor's encoder for reading the encoder pulses

• Pin PA3 and PA4 are connected to the A and B channels of the left motor's encoder for reading the encoder pulses

• Pin PB10 is used to output signals for controlling the lights

Figure 4.11 The configuration of the pins used

The H-bridge is a crucial power circuit commonly used in DC motor control applications. Various types of H-bridge drivers are available, and selecting the right one depends on the specific application needs, including current ratings, control voltage specifications, and frequency requirements.

Figure 4.12 The structure of the H-bridge driver

The H-bridge driver HI216 enables precise control of a motor's rotation direction by selectively closing pairs of contacts, specifically S1 with S4 or S2 with S3.

The symbols used in the circuit:

• D+, D-: The pins for reversing the motor's rotation direction

• P+, P-: The pins for the PWM signal input

• GND, VCC: The power supply for the motor (12VDC – 48 VDC)

• M1, M2: The output pins that connect to the two terminals of the motor.

4.7 Actuator block

A DC servo motor consists of two main components: a DC motor and an encoder

The project uses a DC servo motor with an encoder, designed for precise control of speed, position, and torque. Additionally, the motor features a gearbox that increases torque while reducing speed, making it suitable for applications requiring accurate performance.

• Motor speed after the gearbox: 320 rpm

DC motors convert direct current (DC) electrical energy into mechanical energy and consist of two main components: the stator and the rotor. The stator, made of permanent magnets or electromagnets, generates the machine's magnetic field, while the rotor consists of coils supplied with DC power through a commutator, which reverses the direction of the current as the rotor turns.

DC motors operate on the interaction between magnetic fields. When current flows through the rotor coil, it generates a magnetic field that interacts with the stator's field, creating a force that drives the motor and causes the coil to rotate around the shaft. A commutator (phase corrector) switches the direction of the current, enabling the motor to reverse or to rotate in a specific direction.

To control the speed of a DC motor, adjusting the voltage supplied to the rotor is essential. However, varying a DC voltage directly is inconvenient, so the pulse width modulation (PWM) technique is employed to simplify this process.

Pulse Width Modulation (PWM) is a technique that controls the ratio of time a signal is high (ON) versus low (OFF) within a single cycle. By adjusting the width of the high pulse while keeping the cycle length constant, PWM regulates the power delivered to a device.

PWM is an effective method for controlling DC motors by varying the average power supplied. A wider high pulse in the PWM signal delivers more power, allowing the motor to run at high speed; a narrower high pulse delivers less power, so the motor runs at a lower speed.

By modifying the ratio of high to low pulse durations, we can manage the input power to a DC motor. A microcontroller or similar electronic device generates the PWM signal, allowing the motor's speed and power to be regulated precisely according to the application's requirements.
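As a small illustration of this idea (not code that runs on the robot, where the PWM is generated by the STM32 timers), the sketch below relates a requested fraction of full speed to a PWM duty cycle and the resulting average voltage for a 24V supply.

SUPPLY_VOLTAGE = 24.0  # V, motor supply

def pwm_duty(speed_fraction: float) -> float:
    """Clamp the requested fraction of full speed to [0, 1] and use it as the duty cycle."""
    return max(0.0, min(1.0, speed_fraction))

def average_voltage(duty: float) -> float:
    """Average voltage applied to the motor for a given duty cycle."""
    return duty * SUPPLY_VOLTAGE

d = pwm_duty(0.5)
print(d, average_voltage(d))  # 0.5 -> 12.0 V on average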

An encoder is a vital element in closed-loop feedback systems, used to measure and provide data on rotational speed, the distance traveled by a wheel, and the angular position of a spinning disk. It is essential for any device with a rotating mechanism that requires accurate displacement calculations.

An absolute encoder delivers precise rotor position information without requiring extra signal processing, while a relative (incremental) encoder measures angular displacement from an initial position by generating pulses that indicate rotation angle, speed, and direction. Determining position or rotational speed with a relative encoder therefore requires additional signal processing. Due to its cost-effectiveness and suitability for this project, the relative encoder is the preferred choice.

An encoder converts mechanical motion into electrical signals. It consists of an encoder disk mounted on a rotating shaft, a light source (such as an LED or IR emitter), and photodetectors (such as photodiodes or phototransistors) positioned opposite the light source. As the disk rotates, the photodetectors detect variations in light intensity caused by the disk's slots or marks, which are then processed by signal conditioning electronics. This allows the encoder to deliver accurate position and motion information through digital or analog outputs, all housed in a protective casing for durability.

Incremental encoders use two channels, A and B, to convey the direction of rotation. These channels simultaneously generate electrical pulses with a phase difference, resulting in a pulse pattern known as quadrature encoding.

The phase difference between channels A and B is what determines the rotor's rotation direction. When the rotor rotates in one direction, channel A emits pulses ahead of channel B, while in the opposite direction, channel B leads. By analyzing the quadrature pulse pattern, the encoder reading system identifies the rotor's rotation direction.

The encoder tracks the rotor's position and rotational speed by counting pulses and monitoring phase changes between channels A and B. This dual-channel arrangement not only determines the direction of rotation but also provides high precision for position sensing and motor control.
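A minimal sketch of quadrature decoding is shown below; on the robot this logic runs inside the STM32 interrupt handlers, so the Python version here is purely illustrative.

# 4x quadrature decoding: each valid transition of the (A, B) pair adds +1 or -1
# to the position counter, and the sign of the count gives the rotation direction.
TRANSITION = {
    (0, 0): {(0, 1): +1, (1, 0): -1},
    (0, 1): {(1, 1): +1, (0, 0): -1},
    (1, 1): {(1, 0): +1, (0, 1): -1},
    (1, 0): {(0, 0): +1, (1, 1): -1},
}

class QuadratureDecoder:
    def __init__(self):
        self.state = (0, 0)
        self.count = 0

    def update(self, a: int, b: int) -> None:
        new_state = (a, b)
        self.count += TRANSITION[self.state].get(new_state, 0)
        self.state = new_state

decoder = QuadratureDecoder()
for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]:  # one full cycle in one direction
    decoder.update(a, b)
print(decoder.count)  # +4 counts; the opposite rotation would give -4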

CHAPTER 5: ALGORITHM DESIGN

5.1 Designing 2D and 3D image processing algorithms

There are two key tasks that the 2D and 3D image processing algorithms are responsible for:

To follow a person, the robot must first identify the person's location (a local planning task). This process relies on AI models, specifically an object detection model and a single object tracking model, which supply the controller with the parameters describing the person's position.

The second task involves detecting and recognizing text on product packaging and categorizing it into two classes: "product_name" for the actual product names and "None" for other text elements such as manufacturer details, volume, slogans, or flavors. The system is therefore divided into two modules: a person-following module and an automatic checkout module.

Figure 5.1 Diagram of system for the automated payment feature

Figure 5.2 Diagram of system for the individual person following feature

5.2 Following person module

This module processes two types of inputs: color images and depth images. Color images provide the 2D spatial information needed for human detection and tracking with deep learning models, while depth images provide the distance between the robot and the tracked person. To use both image types simultaneously, they must undergo an alignment step that maps their pixels into a common coordinate frame.
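A minimal sketch of this alignment step with the pyrealsense2 library is shown below; the stream resolutions and frame rates are example values, not necessarily the ones used on the robot.

import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)   # example settings
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)  # map depth pixels onto the color image grid

frames = pipeline.wait_for_frames()
aligned = align.process(frames)
depth_frame = aligned.get_depth_frame()
color_frame = aligned.get_color_frame()

color_image = np.asanyarray(color_frame.get_data())  # HxWx3 image for detection and tracking
depth_image = np.asanyarray(depth_frame.get_data())  # HxW depth map on the same pixel grid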

Figure 5.3 Input and output of detecting and tracking node

• Input of this node: color image and depth image from Realsense camera

• Output of this node:

- /is_person: indicates whether the tracked person is present within the bounding box

- /distance: the distance between the robot and the person

- /alpha: the deviation angle between the robot and the tracked person
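A sketch of how these three outputs could be derived from a detected bounding box and the aligned depth frame is given below; the 86° horizontal field of view comes from the camera specification above, while the 640-pixel image width and the use of the box centre are assumptions for illustration.

IMAGE_WIDTH = 640          # pixels, assumed color stream width
HORIZONTAL_FOV_DEG = 86.0  # from the D435 specification listed above

def person_measurements(bbox, depth_frame):
    """Return (is_person, distance, alpha) from a person bounding box and an aligned depth frame.

    bbox is (left, top, right, bottom) in pixels; depth_frame provides
    get_distance(x, y) in metres, as in pyrealsense2.
    """
    left, top, right, bottom = bbox
    cx = int((left + right) / 2)
    cy = int((top + bottom) / 2)

    distance = depth_frame.get_distance(cx, cy)              # metres at the box centre
    pixel_offset = cx - IMAGE_WIDTH / 2                      # horizontal offset from the image centre
    alpha = pixel_offset * HORIZONTAL_FOV_DEG / IMAGE_WIDTH  # approximate deviation angle in degrees
    return True, distance, alpha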

The MobileNetV2-SSD object detection algorithm is the most suitable choice for the Jetson Nano, offering the highest processing speed with satisfactory accuracy. Accelerated by NVIDIA's TensorRT tools, MobileNetV2-SSD delivers further improved performance and efficiency.

NVIDIA offers pre-trained MobileNetV2-SSD models in TensorRT format via the jetson-inference library. These models, trained on the COCO dataset with 91 classes, achieve about 39 frames per second (FPS) on the Jetson Nano platform.

To minimize noise during inference, we keep only bounding boxes with a confidence score above 0.8 and the class labeled "person". The first run takes approximately 2 minutes to convert the loaded model (PyTorch .pth or TensorFlow format) into the TensorRT .engine format. When the whole system runs, including ROS, a detection model, and a tracking model, it achieves around 10 FPS
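A minimal sketch of running the pre-trained SSD-MobileNet-V2 model through the jetson-inference library is shown below; the video source is a placeholder, and depending on the library version the modules are imported as jetson.inference / jetson.utils or jetson_inference / jetson_utils.

import jetson.inference
import jetson.utils

# Load the pre-trained SSD-MobileNet-V2 model (converted to a TensorRT engine on the first run)
# and keep only detections above the 0.8 confidence threshold used in this project.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.8)
camera = jetson.utils.videoSource("/dev/video0")  # placeholder video source

img = camera.Capture()
for det in net.Detect(img):
    if net.GetClassDesc(det.ClassID) == "person":
        # Person bounding box, later used to initialise the single-object tracker.
        print(det.Left, det.Top, det.Right, det.Bottom, det.Confidence)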

Figure 5.4 A result of object detection model

The STARK algorithms presented in Chapter 2 include four models: S50, ST50, ST101, and Lightning. We selected the Lightning model because it is optimized for edge devices, making it a suitable choice for single object tracking in this application.

We tested the STARK Lightning model in three cases:

• A person on the screen, forward, backward, rotate left, rotate right

• People walk across and obscure the tracked person

The whole system, when running the tracking model, achieved around 4 FPS. This processing speed is acceptable for the robot to track a person in ideal cases (normal walking speed)

Node "following_control": after receiving the three values alpha, distance, and is_person, this node calculates the speed of the two motors based on linear equations.

5.3 Automatic checkout module

5.3.1 OCR (Optical Character Recognition) block

This block is responsible for recognizing all of the text on the product packaging. The input image from the camera is passed through two models: text detection and text recognition. The text detection model identifies regions containing text by creating bounding boxes, which are then passed to the text recognition model to be converted into character strings.

Figure 5.5 Data processing pipeline of OCR system

Our survey of product packaging reveals that its text differs significantly from standard document formats, often featuring unusual shapes, curves, stylized designs, and vibrant colors. As a result, our team favors segmentation-based models, which adapt better to these text conditions.

Figure 5.6 Some sample of product name in the packaging

After careful investigation and consideration of the available models, we have decided to implement the Differentiable Binarization (DB) model in our system

Figure 5.7 Accuracy comparison of state-of-the-art scene text detection models

The DB text detection model stands out for its inference speed and F-measure score, making it one of the top choices in recent years. It integrates the binarization process into the segmentation network, allowing adaptive thresholds that simplify post-processing and improve text detection performance. Notably, the use of a lightweight backbone keeps the model efficient, striking a good balance between detection accuracy and speed.

Figure 5.8 Results of DB model

The SVTR architecture (Scene Text Recognition with a Single Visual Model) processes images through a Patch Embedding layer, which converts the input image into a sequence of patches. This mirrors the Patch Embedding module in the Vision Transformer model.

Applying the transformer architecture to images involves dividing each image into smaller patches so that it can be treated as a sequence, analogous to tokens in natural language processing. For a deeper understanding of this idea, refer to the article "An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale."

The architecture comprises three stages, each featuring two essential modules: the Mixing block and the Merging operation. These modules extract feature information at multiple scales while reducing redundant representations across stages; the details of each module clarify its role in the overall structure.

Figure 5.9 Accuracy comparison of state-of-the-art text recognition models

5.3.2 Semantic Entity Recognition module

This module processes the bounding boxes and the text they contain, classifying the content into two categories: "product_name" and "other." Once the product name is identified, we query the database to find the specific product the customer intends to purchase.

We chose LayoutXLM for its low inference time and effectiveness in multilingual document understanding. This multimodal pre-trained model is designed to bridge language barriers in visually rich documents. Notably, LayoutXLM has significantly outperformed existing state-of-the-art cross-lingual pre-trained models on the XFUND dataset.

Figure 5.10 Architecture of LayoutXLM model
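A minimal sketch of the database lookup that follows entity recognition is shown below, assuming a hypothetical SQLite table products(name, price); the actual database used in the project may be organised differently.

import sqlite3

conn = sqlite3.connect("products.db")  # hypothetical product database

def lookup_product(product_name: str):
    """Return (name, price) for the recognised product name, or None if it is not found."""
    cur = conn.execute(
        "SELECT name, price FROM products WHERE name LIKE ?",
        (f"%{product_name}%",),
    )
    return cur.fetchone()

print(lookup_product("Aquafina"))  # e.g. ('Aquafina 500ml', 5000)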

5.4 Algorithm for Robot navigation

Robot navigation here means controlling the robot to follow a person while maintaining a distance of about 1 meter. When the person moves straight, the robot follows, adjusting its speed based on the distance so as to keep the designated gap. If the person turns right, the robot follows a curved path to the right, with its angular velocity determined by the deviation angle; the same principle applies when the person turns left, giving smooth and safe following behavior.

The team is developing a node named "following_control" to receive data from a 3D camera through the following topics:

• /alpha: The parameter for the deviation angle between the person and the robot

• /distance: The parameter for the distance between the person and the robot

• /is_person: The parameter indicating whether there is a person in the frame or not

It has two values: 0 for no person and 1 for presence of a person

The control node is responsible for calculating the speed values for each motor and transmitting these speed commands to the STM32 microcontroller through UART communication.

Figure 5.11 The flowchart of the algorithm for robot navigation

The variable "is_person" determines the presence of a person within the robot's frame If no person is detected, the robot will rotate in place to locate one Upon detecting a person, the system collects distance and angle deviation data to compute control commands for the STM32 motor controller The "distance" variable quantifies the space between the robot and the person, with values ranging from 1 to 4 meters Meanwhile, the "alpha" variable indicates the angle deviation between the robot's camera and the person, with a range of -25° to 25°.

Figure 5.12 Describes the position of the person relative to the robot

To follow a person, the robot determines its position relative to the individual using two key variables obtained from the 3D camera: "distance" and "alpha" (the deviation angle). The robot is programmed to maintain a distance of 1 meter while moving at a linear velocity between 0.3 m/s and 0.7 m/s, which is safe and matches pedestrian speed in a supermarket environment. The robot moves straight when the deviation angle is within -7° to 7°; turning is more involved. Experiments established that a suitable angular velocity (ω) for following a person ranges from 0.2 rad/s to 0.3 rad/s. When the deviation angle is between -25° and -7°, the robot turns right, while a left turn is executed for angles between 7° and 25°. The robot keeps a linear velocity of 0.15 m/s during turns and stops when the distance to the person falls below 1 meter, ensuring shopper safety.

The linear velocity of the robot will vary depending on the 'distance' variable, according to the following linear equation:

The angular velocity of the robot will vary depending on the 'alpha' variable, according to the following linear equation:

From the kinematic calculations presented in chapter 3, we can calculate the required speed for each motor based on the following formula:

The speed of the right motor in RPM: n_RM = ((2v + ωL) / (πR)) × 30

The speed of the left motor in RPM: n_LM = ((2v − ωL) / (πR)) × 30
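A condensed sketch of the following_control logic described in this section is given below. Because the two linear equations are not reproduced in the text above, the interpolation endpoints used here (0.3–0.7 m/s over 1–4 m, and 0.2–0.3 rad/s over 7°–25°) are assumptions taken from the quoted operating ranges, and the wheel radius R and wheel separation L are placeholder values rather than the robot's measured geometry.

import math

def interpolate(x, x0, x1, y0, y1):
    """Clamp x to [x0, x1] and map it linearly onto [y0, y1]."""
    x = max(x0, min(x1, x))
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def following_control(is_person, distance, alpha, R=0.0725, L=0.40):
    """Return (n_right, n_left) wheel setpoints in RPM; R and L are placeholder values."""
    if not is_person:
        v, w = 0.0, 0.25                                   # rotate in place to search (assumed rate)
    elif distance <= 1.0:
        v, w = 0.0, 0.0                                    # stop when the person is closer than 1 m
    elif -7.0 <= alpha <= 7.0:
        v = interpolate(distance, 1.0, 4.0, 0.3, 0.7)      # assumed linear distance -> v mapping
        w = 0.0
    else:
        v = 0.15                                           # reduced linear speed while turning
        w = interpolate(abs(alpha), 7.0, 25.0, 0.2, 0.3)   # assumed linear alpha -> omega mapping
        if alpha < 0:
            w = -w                                         # sign convention assumed
    # Wheel speeds in RPM, using the kinematic conversion quoted above.
    n_right = (2 * v + w * L) / (math.pi * R) * 30
    n_left = (2 * v - w * L) / (math.pi * R) * 30
    return n_right, n_left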

5.5 The control algorithm on STM32

To enable the STM32 to receive speed commands from the Jetson Nano and control the motors, the team configures the GPIO pins, UART1, and three timers. One timer uses an interrupt to calculate the motor speed at a fixed interval, while the other two timers generate the PWM signals that drive the two motors.

Figure 5.13 The flowchart of the control algorithm on STM32

The program begins by initializing the required values and configuring the peripherals (GPIO, UART, and timers). It then enters a continuous while(1) loop, in which it checks the status of the emergency stop button so the robot can be halted immediately in an emergency. When an interrupt event occurs, the program executes the corresponding interrupt handler; three types of interrupts are used in the project:

• GPIO interrupt (External Interrupts): Read the pulses from a motor encoder

• Timer interrupts: Calculate the motor speed and the PWM pulse width

• UART interrupts: Receive speed commands from the Jetson Nano

UART interrupt function (ISR1): triggered when new data is received from the Jetson Nano. The handler parses the incoming data and updates the motor's set speed and direction accordingly.

GPIO interrupt function (ISR2): triggered when the encoder GPIO pins change logic level; the handler counts the pulses from the encoder

Timer interrupt function (ISR3): triggered by a timer overflow. It calculates the motor's current speed from the accumulated encoder pulses, then resets the pulse count and sets the motor's direction through the DIR pin of the H-bridge driver. The set speed and current speed are fed into the PID function, which computes the PWM duty to be output on the PWM pin of the H-bridge driver, thereby controlling the motor.

CHAPTER 6: EXPERIMENTS AND RESULTS

6.1 PID Controller for Motor Speed Control

6.1.1 The structure of a PID controller

The structure of a PID controller for motor speed control is constructed as the diagram below:

Figure 6.1 The diagram of a PID controller

• SP(t): The set speed or the desired motor speed

• MV(t): The voltage supplied to the motor, specifically the PWM pulse value

• PV(t): The actual speed of the motor calculated from the value returned by the encoder

• e(t) = SP(t) – PV(t): The error between the desired speed and the actual speed

6.1.2 Finding the transfer function of the motor from experimentation

The motor used is a 60 W DC servo motor rated at 320 RPM, with a gearbox ratio of 19.2 and an encoder resolution of 13 pulses per revolution (ppr). Testing shows that the motor reaches a maximum speed of approximately 332 RPM.

The formula for calculating the motor speed n_dc from the encoder count is: n_dc = (M × 60) / (ppr × u × N × T) (5.1)

• T: The sampling period (the time interval between consecutive encoder readings)

• n_dc: The motor speed during the time interval T

• M: The number of encoder pulses counted during the time interval T

• ppr: The number of pulses per revolution of a single encoder channel before the gearbox, ppr = 13

• u: The gear ratio of the gearbox, u = 19.2

• N: The encoder reading mode, N = 4 (counting the edges of both channels A and B)
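Formula (5.1) can be checked numerically with a short sketch; on the robot this calculation runs inside the STM32 timer interrupt, so the Python version below is only illustrative, and the pulse count of 17 is an arbitrary example value.

PPR = 13    # encoder pulses per revolution (single channel, before the gearbox)
U = 19.2    # gearbox ratio
N = 4       # 4x reading mode (both edges of channels A and B)
T = 0.01    # sampling period in seconds (10 ms)

def motor_speed_rpm(pulse_count: int) -> float:
    """Output-shaft speed in RPM from the pulses counted during one sampling period (formula 5.1)."""
    return pulse_count * 60 / (PPR * U * N * T)

print(motor_speed_rpm(17))  # 17 pulses in 10 ms -> about 102 RPM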

The steps to find the relative transfer function of the motor using the System Identification Tool in Matlab:

Step 1: Prepare the input data as a table of the motor speed rising from 0 to the maximum speed under a 24VDC supply. Apply a 24VDC power supply (from the accumulator battery) and record the changing motor speed with a sampling period of 10 ms. The collected data consists of 139 samples

Figure 6.2 The collected data on the speed variations of the two motors

Step 2: Save the collected data into an Excel file, then import the data from the Excel file into Matlab:

• Import the column of voltage values using the command: u1 = xlsread('PID.xlsx',1,'C2:C140')

• Import the column of speed values using the command: y1 = xlsread('PID.xlsx',1,'A2:A140')

• Save the two imported variables u1 and y1 into a single file using the command: save dcp u1 y1

Step 3: Use the System Identification Tool to find the relative transfer function for the right motor

• Open the interface of the System Identification Tool using a command: ident

Figure 6.3 The interface of the System Identification Tool

• Import the previously entered data on the speed variations of the motor from 0 to the maximum speed:

Figure 6.4 Displays the data that has been input into the System Identification Tool

After inputting the voltage data into the System Identification Tool, the corresponding output data, which reflects the right motor speed over time, will be visually represented in the figure below.

Figure 6.5 The graph shows the speed variations corresponding to the 24VDC voltage over time

Selecting a first-order transfer function without delay for the right motor:

Figure 6.6 The parameters of the relative transfer function for the right motor

Therefore, the relative transfer function for the right motor is:

Similarly, the relative transfer function for the left motor is:

Figure 6.7 The parameters of the relative transfer function for the left motor

6.1.3 Find the parameters of the PID controller for speed control of the motor

Through simulations in Matlab and various experiments with the actual motor, we identified the optimal PID controller parameters (Kp, Ki, Kd) tailored for each motor.

• Right motor: Kp = 0,083, Ki = 0,842, Kd = 0,0004

• Left motor: Kp = 0,0843, Ki = 0,8063, Kd = 0,000395

Figure 6.8 The general form of the PID control function
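A minimal discrete-time sketch of the PID law in Figure 6.8 is shown below, using the right-motor gains listed above; the 10 ms sample time and the PWM output limits are assumptions for illustration, and on the robot this loop runs inside the STM32 timer interrupt.

class PID:
    def __init__(self, kp, ki, kd, dt, out_min=0.0, out_max=100.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.out_min, self.out_max = out_min, out_max  # assumed PWM duty limits in percent
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        """One control step: speed error in RPM in, PWM duty command out."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(self.out_min, min(self.out_max, output))

right_pid = PID(kp=0.083, ki=0.842, kd=0.0004, dt=0.01)  # right-motor gains identified above
duty = right_pid.update(setpoint=102.0, measurement=95.0)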

6.1.4 The PID control diagram of each motor

Using the identified transfer function and PID parameters, the team builds a PID control diagram for each motor, as illustrated in the figures below.

Figure 6.9 The PID control diagram of the right motor

Figure 6.10 The PID control diagram of the left motor

6.1.5 The experimental results of the PID controller on two motors

Using the identified PID controller parameters, the robot achieves stable straight motion at a maximum speed of 1 m/s without a load and 0.7 m/s when carrying a load.

The experimental results with a set speed of 102 RPM and no load for each motor are represented in the graph below:

Figure 6.11 The experimental results of the PID controller with a speed of 102 RPM and no load on the right motor

Figure 6.12 The experimental results of the PID controller with a speed of 102 RPM and no load on the left motor

The experimental results with a set speed of 102 RPM and load (robot mass of 50kg and product mass of 20kg) for each motor are represented in the graph below:

Figure 6.13 The experimental results of the PID controller with a speed of 102 RPM and load on the right motor

Figure 6.14 The experimental results of the PID controller with a speed of 102 RPM and load on the left motor

The experimental results with a set speed of 181 RPM and no load for each motor are represented in the graph below:

Figure 6.15 The experimental results of the PID controller with a speed of 181 RPM and no load on the right motor

Figure 6.16 The experimental results of the PID controller with a speed of 181 RPM and no load on the left motor

The experimental results with a set linear velocity of 0,7 m/s and load (robot mass of 50kg and product mass of 20kg) for each motor are represented in the graph below:

Figure 6.17 The experimental results of the PID controller with a set linear velocity of 0,7 m/s and load on the right motor

Figure 6.18 The experimental results of the PID controller with a set linear velocity of 0,7 m/s and load on the left motor

Figure 6.19 The experimental results of the PID controller with a set linear velocity of 0,7 m/s and load on the robot

The experimental results with a set angular velocity of 0,3 rad/s and load (robot mass of 50kg and product mass of 20kg) for each motor are represented in the graph below:

Figure 6.20 The experimental results of the PID controller with a set angular velocity of 0,3 rad/s and load on the right motor

Figure 6.21 The experimental results of the PID controller with a set angular velocity of 0,3 rad/s and load on the left motor

Figure 6.22 The experimental results of the PID controller with a set angular velocity of 0,3 rad/s and load on the robot

6.2 Deformation testing

6.2.1 The main base plate deformation testing

The robot's main base plate, made of 5mm thick Aluminum Alloy A6061, supports a load of 700N. SolidWorks simulations indicate a maximum von Mises stress of 3.949e+07 N/m² and a minimum of 2.680e+03 N/m², both below the material's allowable stress of 5.515e+07 N/m². This confirms that the base plate maintains structural integrity and can endure the specified load without permanent damage, ensuring stability and durability under operating conditions.

Figure 6.23 Stress simulation result of the main base plate

Figure 6.24 Displacement simulation result of the main base plate

6.2.2 The cargo compartment deformation testing

The cargo compartment's structural integrity relies on two shaped aluminum bars made from Aluminum Alloy A6061, measuring 20x20mm, which connect it to the lower frame and support a maximum load of approximately 200N. SolidWorks simulation results show a maximum von Mises stress of 4.479e+07 N/m² and a minimum of 2.155e+03 N/m². Since the allowable stress for Aluminum Alloy A6061 is 5.515e+07 N/m², the analysis confirms that these aluminum bars maintain the required structural integrity.

Shaped aluminum bars are designed to meet strength requirements and effectively support load conditions, ensuring structural integrity and the ability to withstand specified loads without incurring permanent damage.

Figure 6.25 Stress simulation result of the cargo compartment

Figure 6.26 Displacement simulation result of the cargo compartment

6.2.3 The base frame deformation testing

Similar to the cargo compartment, the base frame also uses shaped aluminum bars of A6061 aluminum alloy with corresponding dimensions of 20x20mm.

To assess the strength and deformation of the base frame, a load of 1000N was applied, exceeding the vehicle's total weight plus its maximum payload. SolidWorks simulations show a maximum von Mises stress of 3.555e+07 N/m² and a minimum of 2.155e+03 N/m². These results indicate that the base frame maintains structural integrity, as the maximum stress remains below the allowable limit of 5.515e+07 N/m² for Aluminum Alloy A6061.

Figure 6.27 Stress simulation result of the base frame

Figure 6.28 Displacement simulation result of the base frame

Training the semantic entity recognition model

Our group has collected real-time data at the Coopmart supermarket in District 9

In this dataset, we collected a total of 2800 images containing product information. The list of products collected and labeled includes:

Figure 6.29 The list of products collected and labeled

After collecting real-time data, the group obtained the initial raw dataset as follows:

Figure 6.30 Chart of the dataset before preprocessing

To address the issue of imbalanced data, the team filtered and processed the dataset to prevent overfitting during model training. The summarized results of the processed data are shown below.

Figure 6.31 Chart of the dataset after preprocessing

After filtering the data, the group divided the dataset into two parts: 80% for training and 20% for validation.

After completing the data preprocessing phase, the team labeled the dataset in preparation for training, using the PPOCRLabel tool developed by the PaddleOCR team, which specializes in Computer Vision and Optical Character Recognition (OCR).

After labeling, we generated a TXT file that includes the content of all text bounding boxes, along with the corresponding class for each box. Each row in this file represents an individual image from the dataset; for example, one row contains the text and classes associated with a specific image.

Figure 6.32 Labels of data for LayoutXLM model

After labeling, the team proceeded to train the model. For hardware, an NVIDIA GeForce RTX 3090 Ti 24GB graphics card was used for training

The training process spanned 300 epochs. The loss, H-mean, precision, and recall curves over the training steps are shown below:

Figure 6.33 Loss function during LayoutXLM training process

Figure 6.34 H-mean (F1-score) during LayoutXLM training process

Figure 6.35 Precision during LayoutXLM training process

Figure 6.36 Recall during LayoutXLM training process

Model inference

The team ran the model in real-time on a laptop with the following hardware configuration:

• GPU: NVIDIA GeForce GTX 1050 Ti with 4GB RAM

With this laptop hardware, the product recognition system achieves about 5 frames per second (FPS). It detects products with cylindrical packaging, such as water bottles, and still recognizes product names correctly even when they are partially obscured.

Figure 6.37 Result of LayoutXLM model on an Aquafina product

Figure 6.38 Result of LayoutXLM model on a Lavie product

Recognizing user actions

This section aims to determine if the user is adding a product to the cart or removing it, ensuring the correct amount is reflected on the invoice.

Our team implemented the CSRT tracking algorithm from the OpenCV library, since the movement of the items is relatively simple. The object being tracked is the product name region previously identified by the LayoutXLM model.

The model analyzes the coordinates of the bounding box center when it enters and exits the Region of Interest (ROI). An increase in the Y-coordinate indicates that the customer is adding an item to the cart, while a decrease indicates that the customer is removing an item. This analysis determines customer behavior regarding item additions and removals in the shopping cart.
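A condensed sketch of this tracking-and-direction logic with OpenCV's CSRT tracker is given below; the frame sequence and the initial bounding box are placeholders, and depending on the OpenCV version the tracker is created with cv2.TrackerCSRT_create() or cv2.legacy.TrackerCSRT_create().

import cv2

def create_csrt():
    # The constructor moved to cv2.legacy in some OpenCV 4.x releases.
    if hasattr(cv2, "TrackerCSRT_create"):
        return cv2.TrackerCSRT_create()
    return cv2.legacy.TrackerCSRT_create()

def classify_action(frames, init_bbox):
    """Return 'add' or 'remove' from the motion of the tracked product-name box.

    frames: list of BGR images covering the item's pass through the ROI;
    init_bbox: (x, y, w, h) of the product name recognised by the SER model.
    """
    tracker = create_csrt()
    tracker.init(frames[0], init_bbox)

    first_cy = last_cy = init_bbox[1] + init_bbox[3] / 2
    for frame in frames[1:]:
        ok, bbox = tracker.update(frame)
        if not ok:
            break
        x, y, w, h = bbox
        last_cy = y + h / 2

    # Image Y grows downward: an increasing centre Y while crossing the ROI means the
    # item is moving down into the cart (add); a decreasing Y means it is being taken out.
    return "add" if last_cy > first_cy else "remove"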

User interface designing

After completing the basic functionalities of the system, the team proceeds to design a user interface with interactive capabilities, allowing users to add or remove products that were not detected

The interface includes the following sections:

Figure 6.39 GUI (Graphical User Interface) running on an Apple iPad 6th generation

The interface is divided into 6 sections as follows:

• Region 1: Data Input Area - used for customers to enter the name of the product they want to interact with

• Region 2: Search Button - helps search the database for information about the product entered in Area 1

• Region 3: Add to Invoice Button - allows customers to add the product entered in Region 1 to the invoice. This button is used when the model fails to detect the product

• Region 4: Remove from Invoice Button - allows customers to remove a product from the invoice. This button is used when the model fails to detect the product

• Region 5: Total Amount - displays the total amount of the invoice, showing the accumulated cost of all products

• Region 6: Invoice Information - displays detailed information about the invoice, including product type, price, and quantity.

The result of the tracking model when the person is occluded

We evaluated the STARK tracker in scenarios where the tracked person is occluded or leaves the screen before reappearing. The model performed well in both situations, showing no signs of confusing the target.

Figure 6.40 Results of the model in the case that the followed person is obscured by another person

Figure 6.41 Results of the model in the case that the followed person is out of screen and returns

When the tracked person moves off-screen, the tracker may latch onto an arbitrary bounding box. This issue is addressed with the template matching function from the OpenCV library, which computes a similarity score between 0 and 1 between a template and the current frame. The template is initialized from the output of the object detection model and compared against the tracked region in subsequent frames. If the similarity score falls below 0.3, the robot pauses until the score rises above 0.3 again, ensuring accurate tracking.
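A minimal sketch of this template-matching check with OpenCV is shown below; the 0.3 threshold is the value quoted above, while the grayscale conversion and the use of the maximum normalized correlation score are implementation choices assumed for illustration.

import cv2

SIMILARITY_THRESHOLD = 0.3  # threshold used above to decide whether to pause the robot

def person_still_visible(frame, template) -> bool:
    """Compare the current frame against the stored template of the followed person."""
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    templ_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    scores = cv2.matchTemplate(frame_gray, templ_gray, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, _ = cv2.minMaxLoc(scores)
    return max_score >= SIMILARITY_THRESHOLD  # below the threshold -> robot pauses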

CHAPTER 7: CONCLUSION AND FUTURE DEVELOPMENTS

We have achieved our goals: we developed the robot's mechanical and electrical systems, applied AI and advanced image processing algorithms for recognizing people and the environment, and designed a user-friendly graphical interface that lets customers interact with the mobile robot across various devices, including mobile phones, tablets, desktops, and Jetson devices.

To further improve the customer experience, a system more powerful than the NVIDIA GTX 1050 Ti 4GB is needed to run the product recognition models. In addition, it would be desirable for the mobile robot to navigate autonomously back to its charging dock after completing its tasks.

This product can improve the shopping experience in large supermarkets by removing the need for traditional payment counters, allowing customers to shop effortlessly, even with heavy items. This is expected to enhance the operational efficiency of supermarkets and shopping centers.

REFERENCES

[1] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen, MobileNetV2: Inverted Residuals and Linear Bottlenecks, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp 4510-4520

[2] Bin Yan, Houwen Peng, Jianlong Fu, Dong Wang, Huchuan Lu, Learning Spatio- Temporal Transformer for Visual Tracking, Computer Vision and Pattern Recognition

[3] Yiheng Xu et al., LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding, 2021

[4] Rich Tech Robotics, Autonomous food service robots, https://www.richtechrobotics.com/matradee

[5] Relay Robotics, The World’s First Hospitality Service Robot, https://www.relayrobotics.com/blog/2020/2/25/the-worlds-first-hospitalityservice-robot-doubled-in-room-dining-in-one-month-emc2-chicago

[6] Notebook Check, Xiaomi launches a cheaper robot vacuum, the Mijia Robot Vacuum Cleaner 3C, https://www.notebookcheck.net/Xiaomi-launches-a-cheaper-robot-vacuum-the-Mijia-Robot-Vacuum-Cleaner-3C.609385.0.html

[7] Following Inspiration, Wii go retail - The ultimate Customer‘s in-store experience, https://followinspiration.pt/index.php/pt/autonomous-robots/wii-go

[8] Robotis E – Manual, Turtlebot3, https://emanual.robotis.com/docs/en/platform/turtlebot3/overview/

[9] Wikipedia, Differential wheeled robot, https://en.wikipedia.org/wiki/Differential_wheeled_robot

[10] Juan Angel Gonzalez-Aguirre, Ricardo Osorio-Oliveros, Karen L. Rodríguez-Hernández, Javier Lizárraga-Iturralde, Rubén Morales-Menendez, Ricardo A. Ramírez-Mendoza, Mauricio Adolfo Ramírez-Moreno and Jorge de Jesús Lozoya-Santos (2021), Service Robots: Trends and Technology

[11] Wikipedia, PID controller, https://en.wikipedia.org/wiki/PID_controller

[12] PGS.TS Trịnh Chất, TS Lê Văn Uyển, Tính toán thiết kế hệ dẫn động cơ khí, Nhà xuất bản Giáo dục, 2006

[13] Michal Siwek, Jaroslaw Panasiuk, Leszek Baranowski, Wojciech Kaczmarek, Piotr Prusaczyk, Szymon Borys (Military University of Technology, Warsaw, Poland), Identification of Differential Drive Robot Dynamic Model Parameters, 2023

[14] Eka Maulana, M Aziz Muslim, Akhmad Zainuri, Inverse Kinematics of a Two-Wheeled Differential Drive an Autonomous Mobile Robot, 2014 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), 2014.
