
Design and development of social service robot based on nonverbal and verbal interaction


DOCUMENT INFORMATION

Basic information

Title: Design And Development Of Social Service Robot Based On Nonverbal And Verbal Interaction
Authors: Tran Quang Huy, Nguyen Duc Tai, Pham Minh Tuan
Supervisor: Assoc. Prof. Dr. Nguyen Truong Thinh
University: Ho Chi Minh City University of Technology and Education
Major: Electronics and Telecommunication Engineering Technology
Document type: Graduation Thesis
Year: 2023
City: Ho Chi Minh City
Number of pages: 177
File size: 12.12 MB

Structure

  • CHAPTER 1: INTRODUCTION
    • 1.1. Overview
    • 1.2. Literature review
      • 1.2.1. International research
      • 1.2.2. Domestic research
    • 1.3. The urgency of research
    • 1.4. The significance of the study in terms of science and practice
      • 1.4.1. The scientific significance
      • 1.4.2. The practical significance
    • 1.5. Research objectives
    • 1.6. The object and scope of the research
      • 1.6.1. The object of the research
      • 1.6.2. The scope of the research
    • 1.7. Research methodology
    • 1.8. Thesis organization
  • CHAPTER 2: MECHANICAL SYSTEM DESIGN
    • 2.1. Overview
    • 2.2. Design requirements
    • 2.3. Mechanical design for the robot base
      • 2.3.1. Calculation and selection of engine
      • 2.3.2. Transmission ratio distribution
      • 2.3.3. Calculation of belt drive
      • 2.3.4. Calculation and testing of shafts
      • 2.3.5. Calculation of bearing
      • 2.3.6. The durability simulation of the robot base
    • 2.4. Mechanical design for robot arms
      • 2.4.1. The forearm design
      • 2.4.2. The shoulder design
      • 2.4.3. The biceps design
      • 2.4.4. The elbow joint design
      • 2.4.5. The wrist design
      • 2.4.6. The hand design
      • 2.4.7. The robot neck design
  • CHAPTER 3: KINEMATICS AND DYNAMICS PROBLEMS
    • 3.1. Overview
    • 3.2. Kinematics and dynamics of the robot base
      • 3.2.1. Wheeled mobile robots
      • 3.2.2. Calculation of forward kinematics for the robot base
      • 3.2.3. Calculation of inverse kinematics for the robot base
      • 3.2.4. Dynamics of the robot base
    • 3.3. Kinematics and dynamics of the robot arms
      • 3.3.1. Setting up the D-H table for the robot arms
      • 3.3.2. Calculation of forward kinematics for robot arms
      • 3.3.3. Calculation of inverse kinematics for robot arms
      • 3.3.4. Calculation of velocity kinematics
      • 3.3.5. Dynamics of robot arms
    • 3.4. Kinematics of the robot head
  • CHAPTER 4: ELECTRICAL AND CONTROL SYSTEM DESIGN
    • 4.1. Overview
    • 4.2. Structure of the electrical system
    • 4.3. Calculation of battery
    • 4.4. Sensor system
      • 4.4.1. Encoder
      • 4.4.2. Ultrasonic sensor
    • 4.5. Control system design for robot arms
      • 4.5.1. Trajectory planning for robot arms
      • 4.5.2. Building algorithms for the arms and neck of the robot
    • 4.6. Control algorithm system design for the robot base
      • 4.6.1. Overall system
      • 4.6.2. Positioning method
      • 4.6.3. Path Planning Fuzzy Logic Controller (PPFLC)
      • 4.6.4. Obstacle Avoidance Fuzzy Logic Controller (OAFLC)
      • 4.6.5. Auto Docking Fuzzy Logic Controller (ADFLC)
  • CHAPTER 5: IMAGE PROCESSING SOFTWARE DESIGN FOR NONVERBAL INTERACTIONS
    • 5.1. Face detection method using Haar Cascade
      • 5.1.1. Introduction to face detection
      • 5.1.2. Haar-like features in the cascade filter method
      • 5.1.3. The AdaBoost classifier in the cascade method
      • 5.1.4. Cascade model
      • 5.1.5. How the Haar cascade filter works
    • 5.2. Hand detection
    • 5.3. Convolutional neural network algorithm (CNN)
      • 5.3.1. Neural networks
      • 5.3.2. Introduction to CNN
      • 5.3.3. Components of CNN
    • 5.4. Method to determine the distance from the robot to the human’s face
    • 5.5. Emotion recognition based on transfer learning
      • 5.5.1. Transfer learning
      • 5.5.2. Dataset of emotion recognition
      • 5.5.3. Methodology of emotion recognition based on transfer learning
    • 5.6. Identification recognition process
    • 5.7. Hand gesture recognition to control the robot
  • CHAPTER 6: NATURAL LANGUAGE PROCESSING SOFTWARE DESIGN FOR VERBAL INTERACTIONS
    • 6.1. Voice recognition
    • 6.2. Text processing techniques
      • 6.2.1. Tokenization
      • 6.2.2. Stopwords and punctuation removal
    • 6.3. Text classification
      • 6.3.1. The extraction of CNN in natural language processing
      • 6.3.2. Bidirectional Long Short-Term Memory model
      • 6.3.3. Text classification method based on CNN-BiLSTM model
      • 6.3.4. The result of text classification
    • 6.4. Question and answer model
      • 6.4.1. Training dataset
      • 6.4.2. Proposed method
    • 6.5. The large language model for handling the knowledge text
      • 6.5.1. ChatGPT’s language model
      • 6.5.2. Collect the information based on ChatGPT
    • 6.6. Named Entity Recognition in Vietnamese language
  • CHAPTER 7: EXPERIMENTS AND EVALUATIONS
    • 7.1. Overview
    • 7.2. Basic parameters of the robot
    • 7.3. Speech interaction experiments
    • 7.4. Emotion recognition experiments
    • 7.5. The combination of the base, 2 arms and neck of the robot
    • 7.6. System experiments in the real environment
    • 7.7. Self-charging experiments
      • 7.7.1. Navigation and obstacle avoidance experiments
      • 7.7.2. Auto docking experiments
  • CHAPTER 8: CONCLUSIONS AND RECOMMENDATIONS
    • 8.1. Conclusions
    • 8.2. Recommendations

Contents

INTRODUCTION

Overview

Robotics is an advancing field focused on the design, development, and deployment of robots, which are devices programmed to perform tasks autonomously or with human assistance. These robots vary widely, from industrial robots that enhance manufacturing efficiency by automating repetitive and hazardous tasks to humanoid robots that engage with people in social settings. Significant advancements in robotics have transformed industries, particularly manufacturing, where robots increase productivity and precision in processes like assembly and welding. Beyond manufacturing, robots are utilized in agriculture for tasks such as crop monitoring and spraying, as well as in logistics and transportation with the development of autonomous vehicles and delivery robots. Additionally, robots play a crucial role in space exploration, executing tasks in environments that are inhospitable to humans.

Figure 1.1: Number of industrial robots used in the world in the period 2011 – 2021 [27]

Robots are programmable devices capable of autonomously executing both repetitive and complex tasks, and social service robots are currently undergoing extensive research and development globally. These robots hold the promise of delivering significant value to society and individuals, enhancing both business operations and community interactions. By fostering human-robot collaboration through artificial intelligence (AI) and intelligent algorithms, social service robots can effectively engage with users. For example, in commercial settings, robots can identify potential customers, address their concerns, and provide alternative solutions to sales staff. In elderly care, robots can motivate users to maintain active lifestyles, encourage social interactions, and promote overall well-being.

The advancement of social service robots offers significant potential across various sectors and enhances individual well-being. These robots can assist with tasks, offer companionship, and promote societal advancement. By fostering collaboration among researchers, developers, and users, the integration of social robots can create a more interconnected and thriving future. Leveraging AI and human-robot interaction, social service robots can improve quality of life and stimulate considerable growth in both businesses and society.

Literature review

Several companies now provide solutions for commercial service robots, including Fetch Robotics, which specializes in mobile bases and manipulators for e-commerce warehouse automation. Fellow Robots offers robotic sales representatives for retail, while Savioke focuses on hotel services and senior care. Adept provides customizable general-purpose mobile platforms suitable for various applications, such as restaurant management. The costs of service robots have significantly decreased, and ongoing advancements in safety, self-calibration, and user-friendly interfaces are leading to their increased adoption in new sectors. These service robots aim to enhance worker efficiency, speed, and safety.

Fetch Robotics, founded in 2014 and headquartered in San Jose, California, specializes in the production of industrial robots. In 2015, the company launched the Fetch robot, featuring a mobile base and a robotic arm, designed for seamless operation in warehouse environments. This innovative robot includes an automatic charging mechanism that allows it to recharge as needed, ensuring uninterrupted productivity. It can function independently of human workers and completes repetitive activities like selecting and sorting objects and distributing commodities.

Future Robot released the FURO-K service robot to the market in South Korea in 2011. FURO features a large touchscreen that enables users to easily select its functions, displaying an image of a joyful, smiling girl that conveys emotions. When overwhelmed by multiple simultaneous requests, the robot displays uncertainty by shaking its head.

Figure 1.2: The Fetch robot working in a warehouse [28]

Figure 1.3: FURO-K Robot in public environment [29]

The demand for robots is rapidly increasing across various sectors, including sales, transportation, and personal assistance. One notable example is the OSHbot from Fellow Robots, designed to support retailers by providing detailed product information and guiding customers to the locations of items within the store. Equipped to communicate in 25 different languages, the OSHbot helps shoppers find products while offering insights on pricing and features. If customers choose not to follow the robot, it can still assist by pinpointing item locations on a map, enhancing the overall shopping experience.

Pepper, a groundbreaking robot co-developed by SoftBank Mobile and Aldebaran Robotics, has gained global attention for its unique ability to assess human emotions and adapt its behavior accordingly. Standing at just 121 cm tall and weighing 28 kg, Pepper features a 10.1-inch touch screen on its chest and a WiFi connection for constant internet communication. Designed for family use, this innovative robot moves using a rolling motion on a soleplate and is capable of verbal communication in 17 different languages. According to researchers, Pepper can understand up to 70-80% of the content of a spontaneous conversation and respond thanks to pre-programmed responses.

Figure 1.4: OSHbot working in a supermarket [30]

In the past, robots were often envisioned as rigid machines executing complex tasks, but today, they are increasingly seen as automated devices performing functions that resemble human activities. Savioke, a company focused on enhancing human experiences through robotics, has developed the Relay robot, designed to transport items between individuals, particularly in the hospitality sector. This innovative robot operates in hotels, delivering food and requested items to guests in the absence of staff, and has been implemented by major hotel chains like Starwood and InterContinental. Notably, Relay has proven to be more efficient than traditional delivery personnel, showcasing the evolving role of robotics in everyday life.

Figure 1.5: Pepper robot takes care of the patient [31]

Figure 1.6: The Relay robot serving in a hotel [32]

Japan is a global leader in robotics research and application, exemplified by Fujitsu Laboratories and Fujitsu Frontech's launch of the multifunctional ENON service robot. Designed to assist individuals with daily routines, ENON features a high-resolution integrated camera for effective monitoring and patrolling. Its advanced obstacle-detecting software enables smooth navigation through curving hallways, while it can guide and identify individuals for order acceptance. Additionally, ENON is equipped with voice recognition and speech synthesis capabilities, and it can connect to the Internet for remote control via a PC.

In 2010, Dr. Nguyen Duc Thanh from the Faculty of Electrical and Electronics Engineering at Ho Chi Minh City University of Technology developed an autonomous robot capable of recognizing emotions on human faces. This innovative robot consists of three main components: a head equipped with two cameras, and a motor that adjusts the camera's rotation angle. Its structure features three levels housing the machine and control circuits, while the base is supported by three mobile motors, enhancing its functionality and mobility.

Figure 1.7: ENON is interacting with human [33]

Equipped with a camera, the robot can identify human faces and analyze the captured data through its internal computer system to assess the emotional state of individuals, determining if they are happy, sad, surprised, or angry. In response to these emotions, the robot reacts accordingly, offering assistance during times of distress while reducing its engagement when individuals are content. However, the robot's emotional expressions are limited, its accuracy is subpar, and its mechanical stability is lacking, indicating that it has yet to meet the emotional interaction needs of humans.

Figure 1.8: Emotion recognition robot in Vietnam [34]

Figure 1.9: MORTA serving at a coffee shop [35]

Mr. Nguyen Quoc Phi developed and constructed the MORTA coffee robot (Figure 1.9) in 2017. Morta functions as a waiter that serves and invites customers for coffee using voice commands in multiple languages. Initially launched in a coffee shop in Hanoi, Morta is designed with two omnidirectional wheels and a balancing wheel, showcasing advanced robotic engineering. The restaurant's floor features aluminum tape to assist the robot's magnetic sensor in navigating to each table, while an ultrasonic sensor detects obstacles. If the robot encounters a collision that alters its path, the system automatically instructs it to return to its original position.

The Co Ba robot, created by a dedicated team of seven from the Department of Mechanical Engineering at Ho Chi Minh City University of Technology and Education since late 2013, showcases remarkable capabilities. Equipped with thermal and ultrasonic sensors, it can autonomously and remotely navigate to tables, skillfully maneuvering around obstacles and stopping promptly when necessary. Additionally, this innovative robot serves as an informative staff member, offering guests valuable insights into the menu, beverages, and distinctive features of its environment.

The urgency of research

The urgency of research for the thesis "Design and Development of Social Service Robot Based on Nonverbal and Verbal Interaction" is driven by several key factors.

Firstly, in the previous section, we examined the landscape of social service robot companies, highlighting the significant advancements made by renowned technology firms in this field.

Figure 1.10: The Co Ba robot serving in a restaurant [36]

In Vietnam, the production of social service robots is still in the research phase, with significant progress needed before manufacturing and mass production can begin. Currently, there are virtually no local companies producing service robots, forcing businesses and individuals to rely on expensive foreign imports. The high costs and insufficient software capabilities of these robots hinder their widespread adoption in the country.

The growing societal demand for advanced social service robots highlights the necessity for these technologies to interact effectively with humans. As automation and robotic assistance become increasingly prevalent, there is an urgent need for robots capable of understanding nonverbal cues and engaging in meaningful verbal communication. This research aims to develop socially intelligent robots that enhance human-robot interaction in public spaces like hospitals, restaurants, and hotels. The potential advantages of social service robots, including improved efficiency, enhanced customer experiences, and elevated service quality, underscore the importance of this research.

Developing robots capable of effective communication and interaction with humans presents a transformative opportunity for industries like hospitality, healthcare, and customer service. This research aims to investigate the integration of both verbal and nonverbal interactions in social service robots, focusing on the unique needs and challenges faced by these sectors.

The significance of the study in terms of science and practice

This study significantly advances research and development in social service robots by enhancing our understanding of robot learning and human interaction. By examining both verbal and nonverbal communication between robots and humans, it offers valuable insights into the dynamics of human-machine interaction.

Nonverbal interaction encompasses the exchange of information through facial expressions, body language, gestures, and other non-linguistic signals. This form of communication is essential in human interactions, as it conveys nuanced emotions, intentions, and social cues that enhance verbal communication.

Verbal interaction refers to the use of spoken or written language to share information and articulate thoughts and ideas. It includes the vocabulary, grammar, and syntax that people employ to communicate effectively with each other.

This research explores the relationship between nonverbal and verbal communication in human-robot interaction, offering crucial insights for designing robots capable of effectively interpreting and responding to both communication forms. Such understanding can lead to significant advancements in social robotics, fostering the creation of more intuitive and responsive robots that enhance human engagement and understanding.

Social service robots that utilize both nonverbal and verbal communication have significant potential across diverse sectors such as healthcare, education, tourism, and entertainment. These robots can effectively assist individuals with communication challenges, including the deaf or those who do not speak the local language. Furthermore, the advancement of social service robots is essential for alleviating workforce shortages in social services, while simultaneously enhancing social interaction and improving community life quality.

Research objectives

The aims of the current study were to design and develop a social service robot based on nonverbal and verbal interaction. The steps to accomplish these aims were:

• Design and development of autonomous humanoid-shaped robots with receptionist functions in public areas or organizations.

• Development of a robot capable of providing essential information to users upon specific requests: reminders, recommendations, advice, assistance, security, and activity provision.

• Development of image and audio processing software powered by artificial intelligence, including natural language processing and computer vision, enabling the robot to engage effectively in both nonverbal and verbal interactions.

• Design of an interactive system that can connect to hardware such as the robot platform, robot arms, and robot head to perform body language during communication.

• Design and development of an autonomous mobile robot that travels on a flat surface with an obstacle avoidance mechanism, capable of navigating to the charging station when energy is running low.

This study is divided into several sub-studies to achieve its objectives. Initially, an analysis and survey of existing social service robots will be conducted to pinpoint their limitations. Subsequently, new features will be developed to address these gaps in the market. Research materials and relevant knowledge will be collected to examine the receptionist workflow, leading to the proposal of suitable solutions. These proposed solutions will undergo review and feedback from the project supervisor, as well as insights from experienced professionals in programming and mechanical processing of robots, ensuring efficient project implementation and resource management.

The object and scope of the research

1.6.1 The object of the research

This research aims to design and develop a social service robot capable of engaging in both verbal and nonverbal interactions, enabling effective communication and social engagement with humans.

1.6.2 The scope of the research

This research focuses on designing and developing social service robots capable of engaging in both verbal and nonverbal interactions. Key areas of investigation include voice communication with users, facial recognition features, and the integration of robotic components such as arms, bases, and heads for dancing. Furthermore, the project includes the creation of a fuzzy controller system to facilitate the robot's navigation to its charging station.

This research focuses on developing social service robots capable of engaging with humans through both verbal and nonverbal communication, thereby improving their effectiveness in providing assistance and facilitating social interactions across diverse environments.

Research methodology

This section outlines the research methods employed in the study "Design and Development of Social Service Robot Based on Nonverbal and Verbal Interaction." The chosen techniques provide a systematic approach to effectively achieve the study's objectives and address the research questions.

The study's particular research techniques include:

This literature review examines key research and studies related to social service robots, nonverbal communication, verbal interaction, and human-robot interaction. It establishes a theoretical foundation for the research, offering valuable insights into existing theories, models, and best practices in the field.

The development of a social service robot prototype involves designing features for both verbal and nonverbal communication. This process requires careful selection of hardware, software creation, and the integration of sensors and actuators to ensure effective communication and engagement with users.

User testing and evaluation are essential for assessing the functionality, usability, and overall user experience of social service robots. Researchers can gather valuable insights into the robot's efficacy and acceptance through methods such as observation, surveys, interviews, and qualitative feedback from individuals or groups participating in testing sessions.

Data analysis involves utilizing suitable statistical techniques to evaluate the collected data, focusing on the effectiveness of both nonverbal and verbal interactions in reaching desired outcomes. This process also aims to identify areas that require improvement.

Thesis organization

This thesis is organized into 8 chapters:

• Chapter 1: Introduction - Introduces the research topic "Design and Development of Social Service Robots Based on Nonverbal and Verbal Interaction," outlining the significance of the study and its objectives. This chapter provides a comprehensive overview of the issues at hand and highlights related works in the field, setting the stage for a deeper exploration of social service robots and their interaction capabilities.

• Chapter 2: Mechanical system design - Motor selection, force calculation, shaft calculation, and the transmission system; material selection for manufacturing and assembly of robot components.

• Chapter 3: Kinematics and dynamics problems - Kinematics and dynamics problems for the robot are presented.

• Chapter 4: Electrical and control system design - Outlines the structure of the electrical system and introduces the sensor system. It proposes a trajectory planning method for the robot arms and develops three fuzzy logic controllers to effectively guide the robot to its charging station.

• Chapter 5: Image processing software design for nonverbal interactions - Delves into methods such as Haar Cascade for face detection and the use of convolutional neural networks and transfer learning for emotion recognition.

• Chapter 6: Natural language processing software design for verbal interactions - Illustrates the aspects of natural language processing used to build a chatbot that gives the robot the ability of verbal interaction.

• Chapter 7: Experiments and evaluations - Assesses the robot's performance in both nonverbal and verbal interactions. This chapter also details the experiments conducted on the three fuzzy logic controllers developed to enable the robot to efficiently return to its charging station.

• Chapter 8: Conclusions and recommendations - States the general conclusions and confirms the obtained results.

MECHANICAL SYSTEM DESIGN

Overview

In recent years, robotic moving platforms with automatic positioning and various service capabilities have seen significant advancements. While there are a few complex service robots available, most remain basic mobile platforms with limited functions. This section emphasizes the mechanical design of human-friendly service robots, aiming to create a distinctive product that embodies the vision for future robots. To establish essential functions, it is crucial to define the robot's role - such as receptionist, informant, or entertainer - and its primary tasks, including emotional expression, object handling, and teaching. Additionally, considerations regarding the robot's size and weight are necessary to ensure mobility in both home and public settings, informed by the experiences of existing robots in similar projects.

Table 2.1: Symbolic features for service robot design

1. Traditional: square shape, full gourd
2. Tranquility: longer sleeves, clear waist, high neck
4. Healthy body: strong angle fracture, strong mass contrast
5. Technological beauty: light dot line, solid block, earphone shape

Introducing MIA ROBOT, our social service robot designed for seamless verbal and nonverbal interactions. With a concept inspired by a friendly young lady, MIA features gentle curves and gourd-like forms that foster a sense of warmth and approachability. The robot's feminine appearance is accentuated by carefully crafted textures and proportions in its torso, pedestal, and shoulder joints. To enhance user engagement, MIA incorporates touch displays and flat screens, ensuring a friendly interface while maintaining a sleek, modern design.

Design requirements

The social service robot is designed for public interaction, utilizing both verbal and nonverbal communication. Its hardware and software must support advanced decision-making capabilities. Key components include a mobility base with DC servo motor-powered wheels, a flexible castor wheel for enhanced movement, and a body equipped with sensors and cameras for obstacle avoidance and facial recognition. A touch screen facilitates human interaction, while the robot's arms, featuring four degrees of freedom, create a welcoming atmosphere and can perform simple dance routines. The mechanical design is crucial, providing stability and support for the robot's operations, which simplifies programming tasks.

The mechanical part of the service robot consists of two main parts:

• The robot's frame must ensure the following factors: firmness, rigidity, compactness, and ease of disassembly and repair.

• The appearance of the robot must ensure aesthetics and compactness, and ensure that the robot is easy to move.

Figure 2.1: The overall structure of the social service robot

The robot operates using a 3-wheel mechanism, consisting of two active wheels and one castor wheel, and weighs approximately 40 kg, including its mechanical components, battery system, dual robot arms, actuators, and CPU. Powered by direct current (DC) electric motors, the robot can achieve a maximum speed of 0.3 m/s. Consequently, the group's requirements for the robot encompass the following essential functions and tasks:

• Movement: movement on flat, non-rough surfaces.

• Approach via camera and button: receive and process information about robot control activities within a radius of 200 cm.

• Emotion display screen: placed on the robot's face, used to express the robot's emotions and show the operating status.

• Chest screen: a screen located on the robot's chest facilitates user interaction through various integrated applications.

• Overall structure: the robot features three primary components - a head, body, and base - designed to stand at an adult human's height for optimal interaction. An iPad screen atop the robot conveys its emotions and includes a touch interface for user engagement. The robot's body is crafted with curves to enhance its friendly and lively appearance.

• Workspace: the robot moves on a uniform material plane, within a space limit of 5 meters × 5 meters.

Mechanical design for the robot base

The team conducted a thorough discussion and market analysis to establish the initial requirements for a cost-effective and stable social service robot. They designed a frame model (Figure 2.2) using aluminum alloy to minimize the robot's weight while ensuring strength and structural rigidity.

The robot base is divided into three floors:

• First floor: contains the motor drivers, motors, encoders, and some voltage stabilizer circuits.

• Second floor: houses the battery power supply, which powers the entire system. Microcontrollers mounted on the exterior of the second floor manage the mechanisms of the robot base.

• Third floor: holds the robot body so that it is firmly attached to the robot base.

Labels in Figure 2.2 - 1: robot body support; 2: upper body support; 3: wheel; 4: battery (12 V); 5: wheel axle bearing; 6: castor wheel; 7: the pose of the robot.

Service robots are increasingly sought after for their ability to autonomously navigate and perform tasks in complex, dynamic environments. These self-propelled robots execute movements based on predefined requirements, yet their movement capabilities are constrained by non-holonomic control systems and wheel design. To ensure accurate navigation, control commands are generated to guide the robot along a specific trajectory, maintaining the desired angular and linear velocities while addressing the limitations of non-holonomic systems and dynamic surroundings. Factors such as obstacles, wheel slip, tire wear, and structural changes can significantly impact robot movement. To optimize both the cost and space of the robot platform, a belt drive system was selected for its efficiency and compactness.

Figure 2.2: Details on the robot base

2.3.1 Calculation and selection of engine

A moving robot undergoes several stages: starting, accelerating, decelerating, and maintaining a constant speed. During the start-up stage, the movement resistance is highest, necessitating maximum engine pull; thus, power calculations and engine selection are based on this traction. The robot is estimated to carry a load of 40 kg and is designed to achieve a maximum velocity of 0.3 m/s, with a wheel diameter of 145 mm.

Friction force acting on one wheel:

$F_{ms} = \mu_k N$

where $\mu_k$ is the coefficient of friction and $N$ is the normal load on the wheel; $\mu_k = 0.68$ is selected for rubber on dry concrete.

Figure 2.4: Diagram of force distribution on an active wheel

Moment of inertia acting on one wheel:

According to Newton's second law, for the robot to move, the pulling force must exceed the total resistance force; we choose $F_k = 135$ N. The power of working capacity on the working machine shaft (constant load) [1]:

$P_{ct} = F_k v = 135 \times 0.3 = 40.5 \text{ W}$

where:

• $F_k$ is the pulling force of each motor
• $v$ is the speed of the robot

Number of revolutions of the wheel shaft [1]:

$n_{ct} = \frac{60000\,v}{\pi D} = \frac{60000 \times 0.3}{\pi \times 145} \approx 39.5 \text{ RPM}$

• Transmission efficiency of one pair of bearings: $\eta_{ol} = 0.995$

Required power on the motor shaft [1]:

$P_{dc} = \frac{P_{ct}}{\eta}$

where $\eta$ is the overall transmission efficiency. Conditions of engine selection (Equation 2.19, page 22, [1]): $P_{dc} \ge P_{ct}$ and $n_{dc} \approx n_{sb}$.

From the calculations above ($P_{dc} \approx 43$ W and $n_{dc} \approx 40$ RPM) and referring to the products available on the market, the DC servo motor S-7D55BW116 is chosen for the robot.
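As a quick sanity check of the numbers above, the following sketch recomputes the working-shaft power and wheel speed from the values quoted in the text (v = 0.3 m/s, D = 145 mm, F_k = 135 N); the overall efficiency value is an assumed, illustrative figure, not taken from the thesis.

```python
import math

v = 0.3      # robot speed (m/s), from the design requirements
D = 145.0    # wheel diameter (mm)
F_k = 135.0  # chosen pulling force per motor (N)
eta = 0.94   # assumed overall transmission efficiency (illustrative)

P_ct = F_k * v                       # power on the working shaft: 40.5 W
n_ct = 60000.0 * v / (math.pi * D)   # wheel shaft speed: ~39.5 RPM
P_dc = P_ct / eta                    # required motor power: ~43 W

print(f"P_ct = {P_ct:.1f} W, n_ct = {n_ct:.1f} RPM, P_dc = {P_dc:.1f} W")
```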

Table 2.2: Engine specifications and features

Name | Power | Voltage | $n_{dc}$ | Rated load torque

Transmission ratio of the system:

Number of wheel axle revolutions:

To prevent dangerous equipment fires caused by motor overload during movement, our robot platform utilizes a belt drive system instead of a direct motor connection. This design safeguards the motor and other components from damage when the load is excessive. Specifically, the system incorporates two toothed belts and two pulleys to drive the two motors and wheels, ensuring optimal performance and safety.

Working parameters of the transmitter:

• Power on the motor shaft: $P_{dc} = 43$ W
• Number of revolutions on the drive shaft: $n_{dc} = 45$ RPM
• Static load, impact, slight vibration
• The belt may be adjusted
• Periodically, the belt is tightened to maintain the proper tension

After analyzing market specifications, we select a coefficient m = 1.62. Based on the data from Table 4.28 [1] and a survey of available belt widths, we determine that a belt width of b = 10 mm is a reasonable choice.

Determine the parameters of the transmitter:

Choose the number of teeth of the drive pulley:

Choose $z_1 = 18$ (according to table 4.29, page 70, [1])

The axis distance a is selected according to the condition [1]:

$a_{min} \le a \le a_{max}$  (2.17)

So the axis distance a must satisfy: $32.4 \text{ mm} \le a \le 116.64 \text{ mm}$

Number of belt teeth $z_d$:

According to table 4.30 [1], choose the belt length $l_d = 203.2$ mm.

Axis distance is redefined according to (equation 4.6, [1]):

The diameter of the rings of the pulleys:

The outer diameter of pulley:

The embrace angle on the lead pulley:

Number of teeth that simultaneously engage on the drive pulley:

Belt test for specific ring force:

Specific ring force on belt:

The specific ring force on the belt must satisfy the following conditions:

Determine the initial tension force and the force acting on the belt:

The force acting on the belt:

2.3.4 Calculation and testing of shafts

Moment equilibrium equation at A in the Y-direction:

• Force balance equation in the Y-direction:

Figure 2.5: Force analysis on transmission

Figure 2.6: Analysis of forces acting when projecting onto coordinates zOy

Moment equilibrium equation at A in the X-direction:

• Force balance equation in the X-direction:

Figure 2.7: Analysis of forces acting when projecting onto coordinates zOx

Figure 2.8: Shear diagram Qy and moment diagram Mx

Shaft diameter at cross section A:

Figure 2.9: Shear diagram Qx and moment diagram My

Shaft diameter at cross section N:

From the calculations and a market survey, a shaft diameter of 12 mm is chosen.

Shaft test according to fatigue strength

Bending and torsional fatigue limits for the symmetry period:

The selected shaft has a circular cross-section. According to (table 10.6, page 196, [1]), the formulas for the bending resistance $W_j$ and torsional resistance $W_{0j}$ are:

The axes of the reducer all rotate, and the bending stress varies over a symmetric cycle, so according to (equation 10.22, page 196, [1]) we have:

$\sigma_{aj} = \sigma_{max,j} = \frac{M_j}{W_j}$

When the shaft rotates in one direction, the torsional stress varies over a pulsating cycle, so according to (equation 10.23, page 196, [1]) we have:

$\tau_{aj} = \tau_{mj} = \frac{\tau_{max,j}}{2} = \frac{T_j}{2 W_{0j}}$

Determine the coefficients for the hazardous sections according to (10.25) and (10.26) [1]:

 K x 1, 06 - Value of stress concentration factor (table 10.8, page 197, [1])

 K y  1, 6 - The value of the stability coefficient (table 10.9, page 197, [1])

Factor of safety considering only normal stress (equation 10.22, page 195, [1]):

$s_{\sigma j} = \frac{\sigma_{-1}}{K_{\sigma dj}\,\sigma_{aj} + \psi_\sigma\,\sigma_{mj}}$

Factor of safety considering only shear stress (equation 10.23, page 195, [1]):

$s_{\tau j} = \frac{\tau_{-1}}{K_{\tau dj}\,\tau_{aj} + \psi_\tau\,\tau_{mj}}$

Overall factor of safety (equation 10.19, page 195, [1]):

$s_j = \frac{s_{\sigma j}\, s_{\tau j}}{\sqrt{s_{\sigma j}^2 + s_{\tau j}^2}} \ge [s]$

Shaft test for static strength

To avoid excessive plastic deformation or failure from sudden overload, it is essential to perform a static strength test on the shaft, following the formula outlined in equation 10.27 on page 200 of reference [1].

→ The shaft meets the static strength condition.

Based on the shaft diameter d = 15 mm, looking up (table P2.7, page 254, [1]), a super-light ball bearing with symbol 1000902 is selected: inner diameter d = 15 mm; outer diameter D = 28 mm; dynamic load capacity C = 2.53 kN; static load capacity $C_o$ = 1.51 kN; r = 0.5 mm; B = 7 mm.

Radial load acting on the first bearing (at A):

Radial load acting on the second bearing (at B):

So we test with the larger bearing load: $F_r = F_{rA} = 887.104$ N

Conventional dynamic load for ball bearings (equation 11.3, page 214, [1]):

$Q = (X V F_r + Y F_a)\,K_t K_d$

where:

• $V = 1$: rotation coefficient (which ring rotates)
• $K_t = 1$: factor taking into account the effect of temperature
• $K_d$: factor taking into account load characteristics (table 11.3, page 215, [1])
• $X = 1$: radial load factor, because the bearing is subjected only to radial (centripetal) force
• $Y = 0$: axial load factor, because $F_a = 0$

Bearing life in millions of revolutions:

$L = \frac{60\, n\, L_h}{10^6}$

Dynamic loading capacity (equation 11.1, page 213, [1]):

$C_d = Q \sqrt[3]{L} \le C$

→ The rolling bearings meet the dynamic load conditions.

Conventional static load (equation 11.19, page 221, [1]):

$Q_o = X_o F_r + Y_o F_a$

where:

• $X_o = 0.6$ - radial load factor (table 11.6, page 221, [1])
• $Y_o = 0.5$ - axial load factor (table 11.6, page 221, [1])

→ The static load capacity of the bearing is guaranteed.

2.3.6 The durability simulation of the robot base

The robot base, constructed from A6061 aluminum alloy with a thickness of 5 mm, supports a load of 400 N. Simulation results from SolidWorks, illustrated in Figure 2.10, indicate that the critical stress points are at the fixed locations, with a maximum stress of 23.6 MPa and a minimum stress of 1.84 kPa. These values are well within the allowable stress of the aluminum, 55.15 MPa, corresponding to a safety factor of roughly 55.15/23.6 ≈ 2.3, confirming the durability of the robot base.

Figure 2.10: Stress simulation result of the base

Mechanical design for robot arms

Designing a mechanical hand that accurately mimics the human hand is a complex challenge due to its restricted geometric space, diverse touch inputs, and 22 degrees of freedom. This project employs a robotic arm modeled after the human arm, ensuring the size and weight closely resemble those of a human hand. It utilizes two servo motors in the forearm and five in the hand for finger flexion, with the entire system being 3D printed from plastic for efficient prototyping. Significant modifications have been made from both bio-anatomical and mechatronic perspectives to minimize issues associated with the DART hand while maintaining biological dimensions. The design incorporates space for drivers, motors, sensors, and electronics within strict geometric constraints, and features modular components for easy assembly, maintenance, and repair. Accurate control is achieved through essential position feedback at the joints.

The design of the robot's forearm was inspired by the human arm's golden ratio, resulting in a structure that closely resembles human anatomy. Equipped with two arms, the robot boasts nine degrees of freedom per arm assembly - four for the arm and five for the hand - allowing for a versatile range of motion. A hinge joint connects the base body to the arm, enhancing its operational capabilities. With two robotic arms, the robot achieves a broader work area, and its control system effectively coordinates the movements of both the manipulator and the platform. The robot performs tasks using data from an automatic camera system or through direct interaction with a human operator.

Figure 2.11: Displacement simulation result of the base

The body coupling serves as a vital joint, supporting the majority of the arm's weight, making its design essential for strength and compactness. Proper assembly of the arm module with the robot body is crucial, necessitating a logical mounting of components. To choose the appropriate motor, it is important to calculate the maximum torque at the shoulder joint. The overall design and material selection have led to an arm mass of approximately 1.5 kg, and estimating the arm's center of gravity is key for optimal performance.

Figure 2.12: Human arm’s golden ratio [37]

Figure 2.13: The robot arm workspaces

Figure 2.14: Center of gravity of the arm relative to the coupling motor

Table 2.4: Body coupling motor parameters

Speed after passing through the gearbox: 33 rpm
Number of encoder pulses: 12 ppr
Rated voltage of encoder: 5 VDC
Encoder channels: 2 (A and B)

From there, the torque at the motor mounting point is calculated as $T = m g d$, where:

• m: mass of the arm (approximately 1.5 kg)
• d: distance from the center of gravity of the swing arm to the motor shaft

To ensure stability and enhance system productivity, we recommend selecting a motor with a torque of 150% to 200% of the estimated requirement, together with an encoder for precise control. Consequently, the team decided on a 12 V DC servo motor for this application.

The shoulder joint features one degree of freedom, a spherical construction, and a limit switch on the bridge surface of the joint that controls the over-joint condition to prevent equipment damage. The motor is integrated within the transmission for the shoulder joint. As with the robot body joint, careful consideration is given to the choice of motor. This joint is vital because it supports the entire weight of the arm, ranking as the second most important joint after the trunk joint due to its high mobility.

According to the formula we have:

To ensure stability and enhance system productivity, we select a motor with a torque range of 150%-200% of the predicted requirements. The motor is equipped with an encoder for position feedback to enable precise control. Consequently, we opted for a DC motor with an encoder and worm gear reducer, which benefits from a screw transmission that increases load capacity. Additionally, this motor design prevents backward movement when power is suddenly cut off.

Table 2.5: Shoulder joint motor parameters

Speed after passing through the gearbox: 11 rpm
Number of encoder pulses: 6204 ppr
Rated voltage of encoder: 5 VDC

Figure 2.15: Arm center of gravity to shoulder joint motor

To ensure proper integration with the robot body, the biceps must match the size of a human arm, necessitating careful planning for motor attachment and control circuit space. Designers must maximize every inch of the biceps, which is constructed in two sections and secured with bolts and nuts. The chosen motor for this assembly is the MG946 RC servo motor, with the following parameters.

Table 2.6: Mg946 Servo motor parameters

Torque: 9 kg·cm (4.8 V), 12 kg·cm (6 V)

Figure 2.16: The outer shape and the biceps motor

The elbow joint, which connects the biceps and forearm, is a significant joint.

A well-designed robotic arm ensures strength, smooth operation, and minimal vibration. It is essential to position the motor effectively while maintaining a size comparable to a human arm. The forearm consists of two segments: one connected to the elbow joint and the other to the wrist. A rotating motor, centrally mounted between these two sections, acts as the connecting link.

The wrist serves as the crucial link between the arm and hand, presenting challenges in locating and mounting the motor due to its small size. To address this, the team chose the MG90S servo motor for the wrist, opting to keep the wrist stationary without any movement.

Figure 2.17: External shape and association with the forearm

Figure 2.18: The forearm is designed to fit the motors for the hand. The motors control the fingers by pulling wires.

The hand is the final component in the chain connecting the arm joints, primarily designed for grasping. However, in the context of service robots, its gripping power is limited, emphasizing arm mobility instead. Consequently, the team opted for fixed, unmoving fingers. The design of the hand is inspired by the proportions of a woman's hand, guiding the integration of the robot's arm and body.

The neck joint of the robot consists of two main components: the upper part, which connects to the robot head, and the lower part, which links to the neck rotation motor and is secured to the robot body. A key mechanism ensures the stability of these two sections, while the cylinder-shaped design of the lower neck enhances its stability. The upper neck joint, directly supporting the robot head, must be constructed from durable, anti-vibration materials to effectively bear the load.

Figure 2.19: Design of the upper and lower parts

Figure 2.20: 3D image of the hand

When designing the robot, it is crucial to consider the mass of the robot head, estimated at approximately 1.5 kg, as it directly impacts the selection of the neck motor. This motor is responsible for supporting the weight and initiating rotation, so its pulling power must exceed the frictional force between components by 3 to 4 times. Additionally, the motor's speed must remain within specified limits to ensure optimal performance.

In humans, a 180° rotation takes about 0.5 s. With the above requirements, the group proposed the GW4632 DC gear motor.

Table 2.7: Motor parameter of the robot’s neck

Speed after passing through the gearbox: 30 rpm
Number of encoder pulses: 2200 ppr
Rated voltage of encoder: 5 VDC

KINEMATICS AND DYNAMICS PROBLEMS

Overview

A social service robot is an autonomous mechatronic device that performs various tasks under computer control, capable of manipulating objects and moving its components. It operates based on a reference coordinate system, which establishes a precise relationship for tracking its position and orientation, thus creating a complete model of the robot. Kinematics, the study of mechanical motion, focuses on the geometric aspects of this movement without considering the causes of motion. In robot kinematics, forward kinematics analyzes the geometric relationships between control parameters and the robot's behavior in its workspace, allowing movement to be predicted from the joint parameters. Conversely, inverse kinematics determines the joint parameters necessary to reach a desired position, facilitating effective motion planning for the robot.

Kinematics and dynamics of the robot base

Figure 3.1: View of an idealized rolling wheel [5]

Our social service robot is a mobile robot, and its kinematics are crucial for understanding its operational workspace, which defines the range of positions it can occupy. As a self-contained device, the robot navigates its environment autonomously, but accurately determining its instantaneous position is complex due to the need to integrate its movements over time. Factors like wheel slip can further complicate motion estimation. This research focuses on a basic self-propelled social robot model with three wheels - two active and one castor - where the position is calculated as the midpoint between the active wheels, assuming no slipping on a flat surface. Additionally, we explore the dynamics of the robot system to tackle the challenges posed by impact forces during motion control. By delving into these areas, the study aims to enhance the understanding of mobile robot kinematics and dynamics, ultimately leading to effective control strategies that improve performance and safety in various environments.

Wheeled robots operate exclusively on the x-axis, with no movement capabilities along the y-axis At low speeds, the impact of wheel slip on the road surface is negligible.

When the robot is moving, we will assume:

• The robot moves without slipping

• Motion of the robot is considered on a flat surface

• Motion is considered at low speed, so the force of inertia in all directions is less than the force of friction

Kinematics involves determining the optimal trajectory for the robot to navigate from its starting point to the destination while avoiding obstacles. Once the trajectory is established, the next step is to implement a control strategy that enables the robot to accurately follow the designed path.

3.2.2 Calculation of forward kinematics for the robot base

The purpose of forward kinematics (FK) is to calculate a mobile robot's pose and motion given the linear velocity and angular velocity.

Mobile robots utilize a drive mechanism in which each wheel is driven independently. The differential drive mechanism, commonly used in robotics, typically consists of an axle with two independently driven wheels, allowing forward and backward motion. To maintain stability and prevent tipping, an additional supporting wheel is used. By varying the speed of each wheel, the robot can execute left or right turns, rotating around a point known as the Instantaneous Center of Curvature (ICC).

The differential drive system determines a vehicle's motion by controlling wheel velocities, requiring the robot to rotate around a point on the common axis of its two drive wheels By adjusting the relative velocities of these wheels, the rotation point can be shifted, enabling various trajectories It is essential that, at all times, this rotation point allows both the left and right wheels to maintain the same angular rate, denoted as ω, while revolving around the Instantaneous Center of Curvature (ICC).

where:

• l is the distance between the two wheels
• R is the distance from the ICC to the midpoint between the two wheels
• $v_r$ is the velocity of the right wheel and $v_l$ is the velocity of the left wheel
• $\omega$ is the angular velocity of the robot as it rotates around the ICC.

From the ICC geometry, the angular velocity of the robot is:

$\omega = \frac{v_r - v_l}{l}$  (3.3)

The radius of curvature from the center of movement of the robot to the instantaneous center of velocity is:

$R = \frac{l}{2}\cdot\frac{v_r + v_l}{v_r - v_l}$  (3.4)

From (3.3) and (3.4), the linear velocity of the robot base is computed by:

$V = \omega R = \frac{v_r + v_l}{2}$  (3.5)

The ICC point is given by:

$ICC = \left(x - R\sin\theta,\; y + R\cos\theta\right)$  (3.6)

At time $t + \delta t$, the pose of the robot with respect to the ICC is given by:

$\begin{bmatrix} x' \\ y' \\ \theta' \end{bmatrix} = \begin{bmatrix} \cos(\omega\,\delta t) & -\sin(\omega\,\delta t) & 0 \\ \sin(\omega\,\delta t) & \cos(\omega\,\delta t) & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x - ICC_x \\ y - ICC_y \\ \theta \end{bmatrix} + \begin{bmatrix} ICC_x \\ ICC_y \\ \omega\,\delta t \end{bmatrix}$  (3.7)

Equation (3.7) describes the motion of a robot rotating a distance R around its ICC with angular velocity $\omega$.

Using forward kinematics, the robot's pose can be determined at any time t by controlling the velocities of the left and right wheels; the robot moves in a direction $\theta(t)$ at a velocity $V(t)$:

$x(t) = \int_0^t V(t)\cos\theta(t)\,dt,\qquad y(t) = \int_0^t V(t)\sin\theta(t)\,dt,\qquad \theta(t) = \int_0^t \omega(t)\,dt$  (3.8)

For the case of a differential drive robot, Equation (3.8) is transformed into:

$x(t) = \frac{1}{2}\int_0^t (v_r + v_l)\cos\theta(t)\,dt,\qquad y(t) = \frac{1}{2}\int_0^t (v_r + v_l)\sin\theta(t)\,dt,\qquad \theta(t) = \frac{1}{l}\int_0^t (v_r - v_l)\,dt$  (3.9)
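A minimal numerical sketch of equations (3.3)-(3.7), propagating the pose (x, y, θ) over one time step via the ICC construction; the wheel separation and wheel speeds below are illustrative assumptions, not the thesis's values.

```python
import math

def diff_drive_step(x, y, theta, v_l, v_r, l, dt):
    """One forward-kinematics step of a differential-drive base (eqs. 3.3-3.7)."""
    if abs(v_r - v_l) < 1e-9:
        v = (v_r + v_l) / 2.0          # straight-line motion, ICC at infinity
        return x + v * math.cos(theta) * dt, y + v * math.sin(theta) * dt, theta
    omega = (v_r - v_l) / l                        # (3.3)
    R = (l / 2.0) * (v_r + v_l) / (v_r - v_l)      # (3.4)
    icc_x = x - R * math.sin(theta)                # (3.6)
    icc_y = y + R * math.cos(theta)
    dth = omega * dt
    # Rotate the current position about the ICC by omega*dt  (3.7)
    xn = math.cos(dth) * (x - icc_x) - math.sin(dth) * (y - icc_y) + icc_x
    yn = math.sin(dth) * (x - icc_x) + math.cos(dth) * (y - icc_y) + icc_y
    return xn, yn, theta + dth

# Example: l = 0.4 m (assumed); a slightly faster right wheel gives a gentle left turn
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = diff_drive_step(*pose, v_l=0.28, v_r=0.30, l=0.4, dt=0.05)
print(pose)
```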

3.2.3 Calculation of inverse kinematics for the robot base

The purpose of inverse kinematics (IK) is used to calculate the linear velocity and angular velocity of the robot in order to reach a given pose

To describe the position of the Mobile Robot (MR) in the workspace, two coordinate systems (shown in Figure 3.3) need to be defined:

• The robot's reference frame is described by $(x_R, y_R, \theta)$

• The axes $(x_G, y_G)$ define the inertial global reference frame

The position of P in inertial global reference frame is defined by:

The position of C in inertial global reference frame is defined by:

From (3.10) and (3.11), the relation between P and C can be described by:

Differentiating equation (3.14) with respect to time gives:

On the other hand, we have:

Substituting (3.18) into (3.17), we obtain:

So, the angular and linear velocities are calculated by:

In conclusion, we use equation (3.20) to calculate the linear and angular velocity $(V, \omega)$ of the robot, and obtain the pose of the robot from equation (3.9).

3.2.4 Dynamics of the robot base

The dynamics of a differential drive mobile robot are influenced by non-holonomic constraints, with its center of gravity positioned between two independently controlled wheels. The motor-driven wheels allow for precise movement, while the castor wheel ensures stability on flat surfaces. This configuration enhances the robot's balance during motion, as illustrated in Figure 3.4.

Here, $l$ is the distance between the two wheels, $R$ is the radius of each wheel, and $\theta$ is the angle between the x-axis and the robot's orientation.

Assume that the robot does not slide along any axis. Under this assumption, the kinematics of the robot are described by:

$\dot{q} = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 \\ \sin\theta & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} V \\ \omega \end{bmatrix}$

The equation of dynamics is written as:

$M(q)\ddot{q} + C(q,\dot{q})\dot{q} = B(q)\tau - A^{T}(q)\lambda$

where:

• $M(q)$ is the inertia matrix
• $A^{T}(q)$ is the Jacobian matrix of the non-holonomic constraint
• $B(q)$ is the input transformation matrix and $\tau$ is the wheel torque vector
• $\lambda$ is the constraint force vector
• $C(q,\dot{q})$ is the matrix containing the radial (centripetal) and rotational (Coriolis) components

Figure 3.4: Dynamics for robots moving on wheels

Since the robot is under non-holonomic control, its motion is subject to the constraint:

$\dot{x}\sin\theta - \dot{y}\cos\theta = 0$

Kinematics and dynamics of the robot arms

3.3.1 Setting up the D-H table for the robot arms

To model the degrees of freedom of the arm, each joint must have a reference system that includes the z and x axes. Given that the arm is designed with revolute joints and no translational joint, the z-axis is determined using the right-hand rule in the direction of rotation. By calculating the common perpendicular between the axes $z_{n-1}$ and $z_n$, the x-axis can be established, with the $x_n$ axis aligned with the direction of the segment $a_n$.

The coordinate system for the arm and neck can be modeled based on anatomical structures and rotational movements, as illustrated in Figures 3.5 and 3.6. Each joint within this system is assigned a specific set of coordinate axes to accurately represent its rotation.

Figure 3.5: Coordinate system of the human body [38]

Table 3.1: DH-parameters table for 4-DOF arm

Parameters  i , d i , a i and  i are defind as follows

  i : angle between axes x i  1 and x i about axis z i  1 to be taken positive when rotation is made counter-clockwise

Figure 3.6: Coordinate systems of the robot arm and robot neck

  i : angle between axes z i  1 and z i about axis x i to be taken positive when rotation is made counter-clockwise

Finally, the coordinate systems of the robot arm can be obtained and displayed as shown in Figure 3.6, and the D-H parameters are listed in Table 3.1.

3.3.2 Calculation of forward kinematics for robot arms

The forward kinematics problem involves determining the endpoint position of the robotic arm from the rotation angles of its joints. To calculate the position and orientation, it is essential to establish the parameters of the Denavit-Hartenberg (D-H) table. Once these parameters are defined, we can derive the transformation matrices of the gripper (final actuator) with respect to the robot's reference axis system.

The coordinate transformation between two consecutive frames i and i + 1 is obtained as follows:

$^{i}T_{i+1} = \begin{bmatrix} \cos\theta_{i+1} & -\sin\theta_{i+1}\cos\alpha_{i+1} & \sin\theta_{i+1}\sin\alpha_{i+1} & a_{i+1}\cos\theta_{i+1} \\ \sin\theta_{i+1} & \cos\theta_{i+1}\cos\alpha_{i+1} & -\cos\theta_{i+1}\sin\alpha_{i+1} & a_{i+1}\sin\theta_{i+1} \\ 0 & \sin\alpha_{i+1} & \cos\alpha_{i+1} & d_{i+1} \\ 0 & 0 & 0 & 1 \end{bmatrix}$  (3.24)

Based on the D-H table, substituting the parameter values of each joint into (3.24) yields the individual transformation matrices of the robot arm. Multiplying the successive transformations, as in equation (3.25), transforms coordinates from the end effector's frame to the base frame:

$^{0}T_{4} = {}^{0}T_{1}\,{}^{1}T_{2}\,{}^{2}T_{3}\,{}^{3}T_{4}$  (3.25)

After multiplying all the matrices together and reducing the expression, the coordinates of the gripper (working head) $(P_X, P_Y, P_Z)$ are obtained:

$P_X = \cos\theta_1\big(a_3\cos(\theta_2+\theta_3) + a_2\cos\theta_2\big),\quad P_Y = \sin\theta_1\big(a_3\cos(\theta_2+\theta_3) + a_2\cos\theta_2\big),\quad P_Z = a_3\sin(\theta_2+\theta_3) + a_2\sin\theta_2$  (3.26)
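A compact sketch of the chain (3.24)-(3.26) using the standard D-H transform; the D-H parameter rows below are placeholders, not the actual entries of Table 3.1.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between two consecutive D-H frames (eq. 3.24)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_rows):
    """Chain the joint transforms (eq. 3.25) and return (P_X, P_Y, P_Z) (eq. 3.26)."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_rows):
        T = T @ dh_transform(theta, d, a, alpha)
    return T[:3, 3]

# Placeholder (d, a, alpha) rows for a 4-DOF arm -- illustrative values only
dh_rows = [(0.10, 0.0, np.pi / 2), (0.0, 0.20, 0.0), (0.0, 0.18, 0.0), (0.0, 0.06, 0.0)]
print(forward_kinematics([0.0, 0.3, -0.5, 0.2], dh_rows))
```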

3.3.3 Calculation of inverse kinematics for robot arms

Inverse kinematics is a crucial challenge in robotics, as it enables the control of robotic movements towards specific coordinates. By determining the necessary joint-variable values from a given point's coordinates, the robot can effectively reach its target locations. Using the arm model with four degrees of freedom, we can perform these calculations to achieve precise movement control.

From equation (3.26), inverse kinematic is calculated by steps below:

Expanding the compound angles in (3.26), the arm-plane radius component is

a₃cos(θ₂)cos(θ₃) − a₃sin(θ₂)sin(θ₃) + a₂cos(θ₂) = a₂cos(θ₂) + a₃cos(θ₂ + θ₃)

and the height component is

a₃sin(θ₂)cos(θ₃) + a₃cos(θ₂)sin(θ₃) + a₂sin(θ₂) = a₂sin(θ₂) + a₃sin(θ₂ + θ₃)

Dividing P_Y by P_X eliminates θ₂ and θ₃, so that:

θ₁ = atan2(P_Y, P_X)

Squaring and summing the three coordinate equations of (3.26) gives:

P_X² + P_Y² + P_Z² = a₂² + a₃² + 2a₂a₃cos(θ₃)

cos(θ₃) = (P_X² + P_Y² + P_Z² − a₂² − a₃²) / (2a₂a₃), sin(θ₃) = ±√(1 − cos²(θ₃))

θ₃ = atan2(sin(θ₃), cos(θ₃))

Finally, with r = √(P_X² + P_Y²):

θ₂ = atan2(P_Z, r) − atan2(a₃sin(θ₃), a₂ + a₃cos(θ₃))

3.3.4 Calculation of Velocity Kinematics

Differentiating the forward kinematics (3.26) with respect to time relates the end-effector velocity to the joint velocities through the Jacobian matrix J(q), whose entries are the partial derivatives of P_X, P_Y and P_Z with respect to θ₁, θ₂ and θ₃:

       | −sin(θ₁)(a₂cos(θ₂) + a₃cos(θ₂+θ₃))   −cos(θ₁)(a₂sin(θ₂) + a₃sin(θ₂+θ₃))   −a₃cos(θ₁)sin(θ₂+θ₃) |
J(q) = |  cos(θ₁)(a₂cos(θ₂) + a₃cos(θ₂+θ₃))   −sin(θ₁)(a₂sin(θ₂) + a₃sin(θ₂+θ₃))   −a₃sin(θ₁)sin(θ₂+θ₃) |
       |  0                                     a₂cos(θ₂) + a₃cos(θ₂+θ₃)             a₃cos(θ₂+θ₃)        |

So the general velocity vector of the robot arm is determined by:

Ṗ = J(q)·q̇, with q̇ = [θ̇₁ θ̇₂ θ̇₃]ᵀ

3.3.5 Dynamics of robot arms

The dynamics of the arms are derived using the Lagrangian formulation. The kinetic energy of the system is equal to the sum of the kinetic energies of the links and joints:

K = Σᵢ Kᵢ

The potential energy of the system is equal to the sum of the potential energies of the links and joints:

P = Σᵢ Pᵢ

So the Lagrange function is:

L = K − P

The external torque for each rotational joint is then obtained from the Euler–Lagrange equation:

τᵢ = d/dt(∂L/∂θ̇ᵢ) − ∂L/∂θᵢ

3.4 Kinematics of the robot head

The robot head features a design with only one degree of freedom, which significantly simplifies the kinematic calculations. To facilitate this, we can establish the D-H parameters for the robot neck accordingly.

Table 3.2: DH-parameters table for head

The total displacement matrix, analogous to the rotation matrix of the neck joint, contains the coordinates of the endpoint we aim to determine. By multiplying the matrices of the rotating joints, we can derive this total displacement matrix.

CHAPTER 4: ELECTRICAL AND CONTROL SYSTEM DESIGN

4.1 Overview

Achieving stability in a mechatronic device relies not only on the mechanical system but also on the electrical control system, which is essential for operational stability. This chapter focuses on the analysis, design, and computation of the electrical and control systems to enhance the robot's performance and functionality. By systematically optimizing these systems, we can ensure the robot meets the desired operating parameters, ultimately improving its stability, precision, and efficiency during long-term operation.

4.2 Structure of the electrical system

The block diagram of the electrical control system for our social service robot, as illustrated in Figure 4.2, is organized into three distinct components: the supply power block, the processing block, and the signal block. This division simplifies the understanding of the robot's complex system.

Supply power is essential for energizing the system, as microcontrollers, CPUs, sensors, and actuators operate at varying voltage levels. To provide the necessary electrical supply, buck and boost converters are utilized. In this project, a battery serves as the power source, with a buck converter used for components needing less than 12 V and a boost converter for those requiring more than 12 V.

The processing block of our robot consists of essential devices that enable data processing and actuator control through command transmission. Key components include the CPU, microcontrollers, camera, microphone, and monitor. The camera and monitor facilitate nonverbal communication with users, while the microphone captures audio input. The CPU acts as the central processing unit, handling data from the camera, monitor, and microphone, displaying information on the monitor, generating audio for the loudspeaker, and managing the robot's arm and base movements. Additionally, microcontrollers within the robot's arms and base oversee motor operations, ensuring precise control.

Figure 4.2: Block diagram of the robot electrical system

The signal block is essential for robot functionality, incorporating the actuators and devices that exchange data with the processing block, including loudspeakers, sensors, motors, and the monitor. Motors control the movement of the robot's base and arms, while the monitor offers a visual interface for user interaction. Additionally, ultrasonic sensors detect obstacles, and encoder sensors determine the robot's position, enhancing its operational efficiency.

4.3 Calculation of battery

The robot platform is designed for autonomous tasks, necessitating a reliable and efficient power supply. The choice of storage battery is critical, as it directly impacts the robot's self-propulsion capabilities. Key factors influencing the battery configuration include size, weight, voltage, and charging method, which must be carefully considered during the design process. To enhance stability, the battery should be positioned to lower the robot's center of gravity, ideally aligned with the normal axis perpendicular to the triangular plane formed by the wheels' contact points. The battery's placement at the lowest point is essential for maintaining the overall mass distribution. For self-propelled robots, mobile power sources are crucial; a battery suffices for short operational periods, while longer durations require a direct power connection. This section details the calculations for the storage battery's power requirements, ensuring it can power all onboard equipment while providing adequate operating time.

Table 4.1: Total capacity of the robot system

Other sensors and power circuits: 10 W × 1 = 10 W

Formula to calculate battery capacity:

AH = (W × T) / (V × pf)

where:

 AH: capacity of the battery (Ah)
 W: the total power consumption of the system (W)
 T: the operating time (h)
 V: the battery voltage (V)
 pf: the performance factor

The total power consumption of the system is W = 288.2 W, the required operating time is T = 2 h, and the performance factor is pf = 0.7. With a 12 V battery this gives AH = (288.2 × 2) / (12 × 0.7) ≈ 68.62 Ah. To operate the robot for 2 hours, a 12 V–68.62 Ah storage battery is therefore needed, so we choose the Atlas 12 V–70 Ah battery for our robot.
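As a quick check, the sizing above can be reproduced with a short script; a minimal sketch, using the figures from Table 4.1:

```python
# Minimal sketch of the battery-capacity calculation AH = (W * T) / (V * pf).

def battery_capacity_ah(total_power_w, time_h, voltage_v, perf_factor):
    """Required battery capacity in amp-hours."""
    return (total_power_w * time_h) / (voltage_v * perf_factor)

ah = battery_capacity_ah(total_power_w=288.2, time_h=2, voltage_v=12, perf_factor=0.7)
print(f"Required capacity: {ah:.2f} Ah")  # -> 68.62 Ah, hence the 12 V-70 Ah battery
```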


4.4 Sensor system

4.4.1 Encoder

Encoders play a crucial role in numerous applications, including conveyors and motor shafts, by accurately determining velocity and angular position through the counting of revolutions on the encoder disk. They are primarily classified into two types: absolute encoders and incremental encoders. This section focuses exclusively on incremental encoders, which are utilized in our system for various applications, providing essential data for effective operation.

The revolution encoder produces two pulse trains, labeled A and B, utilizing light or magnetic sensors. The phase relationship between these pulse trains indicates the direction of rotation, and each rising edge in either pulse train signifies rotational movement of the encoder.

The direction of an encoder's rotation can be determined by the sequence of rising and falling edges between signals A and B. A rising edge on signal B following a rising edge on signal A indicates that the encoder is rotating in one direction. Conversely, if a rising edge on signal B occurs after a falling edge on signal A, the encoder is rotating in the opposite direction.

In our research, we utilize encoders to estimate the velocity and position of the robot's arms and base. This information enables the robotic arms to move towards a designated target point. Additionally, we develop three fuzzy logic controllers to determine the optimal trajectory for reaching the charging dock.
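The edge-based direction logic described above can be sketched in a few lines; a minimal illustration (the sign convention is an assumption, and a real implementation would sample all edges, e.g. with the X4 technique):

```python
# Illustrative quadrature decoding of the A/B pulse trains: on each rising
# edge of channel A, the level of channel B gives the rotation direction.

def update_position(position, prev_a, a, b):
    """Return the updated signed tick count after one sample of A and B."""
    if prev_a == 0 and a == 1:              # rising edge on channel A
        position += 1 if b == 0 else -1     # B low -> one direction, B high -> the other
    return position

pos = 0
pos = update_position(pos, prev_a=0, a=1, b=0)  # counts +1
pos = update_position(pos, prev_a=0, a=1, b=1)  # counts -1
print(pos)  # 0
```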

4.4.2 Ultrasonic sensor

In this study, ultrasonic sensors (HC-SR04) are used to avoid obstacles. The HC-SR04 sensor has four pins (as shown in Figure 4.4): Trigger, Echo, Vcc, and GND.

When the Trigger pin of the ultrasonic sensor receives a high pulse lasting at least 10 microseconds, it activates the transmitter to emit eight 40 kHz pulses. By measuring the time taken for the receiver to detect the reflected sound waves and using the speed of sound in dry air, approximately 344 m/s, the sensor calculates the distance to nearby obstacles.

Figure 4.3: The simple structure of encoder [7]

Figure 4.4: The ultrasonic sensor’s working principle [7]

The distance from the ultrasonic sensor to the obstacle is computed by:

d = (t × 0.0344) / 2

where:

 d is the distance from the ultrasonic sensor to the obstacle (cm)
 t is the period of time from the moment the transmitter emits the pulses until the receiver gets the reflected wave (µs)
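A minimal sketch of this conversion (0.0344 cm/µs corresponds to the ~344 m/s speed of sound used above):

```python
# HC-SR04 distance computation: d = (t * 0.0344) / 2, where t is the echo
# round-trip time in microseconds and the result is the one-way distance in cm.

def hcsr04_distance_cm(echo_time_us: float) -> float:
    return (echo_time_us * 0.0344) / 2.0

print(hcsr04_distance_cm(580))  # ~10 cm to the obstacle
```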

4.5 Control system design for robot arms

When designing a motion trajectory for a robot arm, it is essential to consider the workspace to avoid collisions with the robot body. The trajectory selection involves analyzing the arm's configuration, the start and end positions, the desired joint speeds, and the safety of both arms in Cartesian coordinate space. Coordination between the two arms allows for precise, stable, and collision-free movements. One approach is to consider each joint separately and use a polynomial function that satisfies the boundary conditions, transitioning from an initial angle θ_i at time t_i to a final angle θ_f at time t_f. Path planning requires the application of inverse kinematics and the interpolation equations.

4.5.1 Trajectory planning for robot arms

The inverse kinematics problem allows us to identify the starting and ending joint angles from their associated positions. For stable joint movement, the trajectory of each joint angle θ(t) must satisfy specific velocity and position constraints at the endpoints, adhering to four essential conditions.

The variable θ(t_f) denotes the angle at the terminal time, while θ(t_i) indicates the starting angle at time t = 0. The endpoint position constraint θ(t_f) reflects the joint angle at the end position relative to the starting position.

The velocities at the start and end points are set to zero to ensure a smooth and steady joint velocity, namely θ̇(t_i) = 0 and θ̇(t_f) = 0.

We use these four constraints to solve the equation below as a cubic polynomial:

θ(t) = h₁ + h₂·t + h₃·t² + h₄·t³

The equations of motion for the speed and acceleration of the arm are:

θ̇(t) = h₂ + 2h₃·t + 3h₄·t²
θ̈(t) = 2h₃ + 6h₄·t

The coefficients h₁, h₂, h₃ and h₄ are found by substituting the boundary values into the cubic polynomial. As seen below, the coefficients can be solved:

h₁ = θ_i,  h₂ = 0,  h₃ = 3(θ_f − θ_i)/t_f²,  h₄ = −2(θ_f − θ_i)/t_f³

The cubic polynomial function that describes the motion of each joint when the initial and final velocities are zero is therefore:

θ(t) = θ_i + (3(θ_f − θ_i)/t_f²)·t² − (2(θ_f − θ_i)/t_f³)·t³      (4.12)

The formulas for the angular velocity and angular acceleration can be found by differentiating equation (4.12), giving equation (4.13).

Upon recognizing the interactor's face within the interaction range, the robot initiates a handshake operation to boost engagement, utilizing a shoulder joint rotation angle θ₁. This robotic handshake will be analyzed through calculations to provide precise results.

To establish a trajectory for the shoulder joint, it is essential to ensure smooth control over a four-second movement. The robot arm must effectively manage the intermediate states, with the joint velocity reaching zero at the end of the motion.

To plan the motion from an initial stationary position with θ_{1i} = 0, a cubic polynomial function can be employed. By substituting the known parameters into equation (4.12), the necessary coefficients can be derived.

According to equations (4.12) and (4.13), the trajectory of the shoulder joint is obtained as shown in Figure 4.5.

The remaining joints are managed using third-order polynomial trajectory planning as well, ensuring that the operational limits and the requirements of the robot arm's tasks are met effectively. A code sketch of this trajectory generation is given below.
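A minimal sketch of the cubic trajectory of equation (4.12) for the handshake motion; the 60° target angle is an assumed value for illustration:

```python
import numpy as np

def cubic_trajectory(theta_i, theta_f, t_f, t):
    """Cubic joint trajectory with zero start/end velocities (Eq. 4.12/4.13)."""
    dth = theta_f - theta_i
    theta = theta_i + 3 * dth / t_f**2 * t**2 - 2 * dth / t_f**3 * t**3
    vel = 6 * dth / t_f**2 * t - 6 * dth / t_f**3 * t**2     # theta_dot(t)
    acc = 6 * dth / t_f**2 - 12 * dth / t_f**3 * t           # theta_ddot(t)
    return theta, vel, acc

t = np.linspace(0.0, 4.0, 9)                      # 4-second motion, as above
theta, vel, acc = cubic_trajectory(0.0, np.deg2rad(60), 4.0, t)
print(np.rad2deg(theta))    # smooth S-shaped profile from 0 to 60 degrees
print(vel[0], vel[-1])      # both 0: the velocity constraints are satisfied
```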

Figure 4.5: Trajectory of shoulder joint for robot arm

4.5.2 Build algorithm for arms and neck of the robot

During the execution of arm tasks, the robot's host transmits the signals essential for coordination, particularly when the arms and neck must work in unison. The robot's two arms are managed by two microcontrollers—one Master and one Slave—where the Master oversees both the neck movement and the overall control process. Upon receiving a command, the Master microcontroller sends signals to the Slave microcontroller, enabling the arms to reach the required positions to complete the designated tasks.

Figure 4.6: The flowchart for controlling the robot’s arms and neck

4.6 Control algorithm system design for the robot base

4.6.1 Overall system

Over the past century, robotics has made remarkable advancements to address human societal needs, shifting from industrial robots to service robots. These versatile machines are now central to research, integrating into various sectors such as healthcare, agriculture, shipbuilding, construction, defense, and household services. The evolving demands across these fields drive the development of mobile and service robots, which, despite structural variations, focus on service applications and operation in natural environments. As invaluable multifunctional helpers, robots are adapting to the changing expectations of society.

Service robots are increasingly operating in complex environments, yet they still rely on human assistance for battery charging. To achieve energy self-sufficiency, these social service robots must be able to recharge autonomously, which requires dedicated charging stations for docking and initiating the charging process. This research is crucial as it highlights the need to adapt and enhance traditional navigation algorithms to incorporate charging stations, ensuring seamless navigation while addressing the robots' charging needs.

This section discusses the implementation of three fuzzy logic controllers to guide the social service robot from its starting position to the charging station. Each controller features unique membership functions and rules to regulate the speed of the robot's wheels. The Path Planning Fuzzy Logic Controller (PPFLC) is utilized for route navigation, while the Obstacle Avoidance Fuzzy Logic Controller (OAFLC) ensures the robot can circumvent obstacles during its journey. Finally, the Auto Docking Fuzzy Logic Controller (ADFLC) facilitates the robot's docking at the charging station. Figure 4.7 provides a visual representation of the fuzzy logic algorithm.

4.6.2 Positioning method

The navigation problem remains a significant challenge in service robot research, prompting researchers to enhance positioning reliability through modern sensors, hardware systems, and advanced software algorithms. While the robot's position is determined by sensor measurements, these sensors have inherent limitations. Current positioning methods include GPS combined with encoder sensors, which are costly and primarily suited for outdoor use. Alternatively, Lidar sensors and cameras offer another approach, but they also come with high expenses and necessitate complex control techniques.

Figure 4.7: The flowchart of the fuzzy logic algorithm

Dead-reckoning is a widely used method for mobile robots, leveraging simple equations and revolution encoder data from the wheels for easy implementation. This technique converts wheel revolutions into the corresponding linear displacement, making it a feasible and low-cost option for experimental studies. Due to the short experimental duration and minimal cumulative error, dead-reckoning is acceptable for various tasks. In this research, encoder sensors are utilized to manage the wheel speed, followed by kinematic equations to calculate the robot's location.

The robot features two independently driven wheels and a castor wheel that helps maintain balance during movement. The kinematics of the robot, illustrated in Figure 4.9, allows its pose in the environment to be determined as it moves. The kinematic model of the differential drive mobile robot is given by [9-11]:

ẋ = v·cos(θ), ẏ = v·sin(θ), θ̇ = ω

where v = (v_R + v_L)/2 and ω = (v_R − v_L)/(2b) are the linear and angular velocities obtained from the right and left wheel velocities.

The dead-reckoning technique estimates the position and direction of the wheeled mobile robot (WMR) by using its current pose at time t = t(n) to calculate its subsequent pose at time t = t(n+1). This method relies on equation (2) of [9], which can be reformulated as shown in Eq. (4.17):

x(n+1) = x(n) + Δd·cos(θ(n) + Δθ/2)
y(n+1) = y(n) + Δd·sin(θ(n) + Δθ/2)      (4.17)
θ(n+1) = θ(n) + Δθ

Figure 4.8: The structure of the robot

where T_s = t(n+1) − t(n) represents the sample time used. The dead-reckoning method relies on estimating the distance traveled by the social service robot during the sample time T_s, which is determined by reading the encoder pulses on the robot's wheels. The mathematical model is described by the following equations:

Δd_L = πD·ΔP_L / N, Δd_R = πD·ΔP_R / N
Δd = (Δd_R + Δd_L) / 2, Δθ = (Δd_R − Δd_L) / (2b)

where ΔP = P(n+1) − P(n) represents the number of pulses measured from the left and right wheels of the robot during the sample time T_s. The robot has two driving wheels of diameter D, the distance between the wheels is 2b, and each encoder has a resolution of N pulses per revolution. Challenges such as wheel slipping during rapid motion and accumulated errors persist; to mitigate these issues, the robot operates at a reduced speed, employs a short sampling time, and implements the X4 encoding technique for the encoder readings.
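A minimal sketch of this odometry update; D, 2b and N are placeholder values, not the robot's actual parameters:

```python
import math

D, WHEEL_SPACING, N = 0.10, 0.30, 2000   # wheel diameter [m], 2b [m], pulses/rev

def dead_reckoning(x, y, th, dp_left, dp_right):
    """Update the pose (x, y, th) from per-sample encoder pulse increments."""
    dl = math.pi * D * dp_left / N        # left wheel travel during T_s
    dr = math.pi * D * dp_right / N       # right wheel travel during T_s
    dd = (dr + dl) / 2.0                  # travel of the robot center
    dth = (dr - dl) / WHEEL_SPACING       # heading change
    x += dd * math.cos(th + dth / 2.0)    # mid-point integration, Eq. (4.17)
    y += dd * math.sin(th + dth / 2.0)
    return x, y, th + dth

pose = (0.0, 0.0, 0.0)
pose = dead_reckoning(*pose, dp_left=100, dp_right=120)  # right wheel faster -> curve left
print(pose)
```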

Robot movement processes, including obstacle avoidance, rely on effective path planning to navigate a room. This involves segmenting the path into multiple parts, where the controller sequentially directs the robot along these segments. Each segment is defined by two points on the robot's motion plane: the starting point (x_start, y_start, θ_start) and the end point (x_goal, y_goal).

The objective is to control the robot's movement from one point to another, enabling it to autonomously dock at the charging station using its camera. To facilitate this process, three fuzzy logic controllers have been developed to assist the robot in navigating back to the charging station efficiently.

Figure 4.9: The robot moves from one point to another

4.6.3 Path Planning Fuzzy Logic Controller (PPFLC)

The PPFLC functions as the controller for the wheeled mobile robot (WMR), facilitating smooth motion tracking. It utilizes two key inputs: the distance between the robot's current position and its target location, and the orientation angle error between the desired and actual motion vectors. The controller's output signal regulates the speed of the robot's wheels, ensuring precise navigation.

Figure 4.11: The membership function for error of the orientation angle

Figure 4.10: Calculate inputs for PPFLC

The processing of each input in the PPFLC involves multiple membership functions, as depicted in Figures 4.11 and 4.12. The orientation angle error between the robot's desired and actual motion vectors is characterized using specific linguistic terms.

The orientation angle error is described by the terms BN (Big Negative), N (Negative), SN (Small Negative), Z (Zero), SP (Small Positive), P (Positive), and BP (Big Positive). The distance between the initial position and the target is described by the terms VF (Very Far), F (Far), SF (Small Far), M (Medium), N (Near), VN (Very Near), and Z (Zero):

Desired_Ang = { BN, N, SN, Z, SP, P, BP }
Distance = { VF, F, SF, M, N, VN, Z }

Figure 4.12: The membership function for distance between start point and the target

Figure 4.13: The membership function for the velocities of the right and left wheel

The PPFLC configuration, depicted in Figure 4.13, features output membership functions for the Right Velocity (R_Velocity) and Left Velocity (L_Velocity). The fuzzy outputs use the linguistic terms Z (Zero), S (Slow), M (Medium), B (Big), and VB (Very Big).

Finally, the fuzzy rule set is established based on the input and output signals. Since each of the two inputs has seven linguistic terms, we obtain the following 7 × 7 = 49 rules; Table 4.3 describes them.

Table 4.3: Fuzzy Rules for Path Planning Fuzzy Logic Controller

Distance \ Angle |  BN     |  N     |  SN    |  Z       |  SP    |  P     |  BP
Z                | ZL/MR   | ZL/SR  | ZL/SR  | ZL/ZR    | SL/ZR  | SL/ZR  | ML/ZR
VN               | SL/BR   | SL/BR  | ZL/MR  | SL/SR    | ML/ZR  | BL/SR  | BL/SR
N                | SL/VBR  | SL/BR  | SL/BR  | SL/SR    | BL/SR  | BL/SR  | VBL/SR
M                | SL/VBR  | SL/BR  | SL/BR  | ML/MR    | BL/SR  | BL/SR  | VBL/SR
SF               | SL/VBR  | SL/BR  | SL/BR  | BL/BR    | BL/SR  | BL/SR  | VBL/SR
F                | SL/VBR  | SL/BR  | ML/BR  | BL/BR    | BL/MR  | BL/SR  | VBL/SR
VF               | SL/VBR  | SL/BR  | SL/BR  | VBL/VBR  | BL/SR  | BL/SR  | VBL/SR

Each cell lists L_Velocity/R_Velocity, where the L and R suffixes denote the left and right wheel respectively (e.g., ZL/MR means L_Velocity = Z and R_Velocity = M).
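To make the inference mechanism concrete, the sketch below shows one fuzzification/defuzzification step with triangular membership functions; it uses only a small illustrative subset of the term sets and rules above, and the membership ranges and crisp speed values are assumptions:

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def ppflc_step(distance, angle_err):
    # Fuzzify with two example terms per input (subset of the real term sets)
    dist_near = trimf(distance, 0.0, 0.0, 0.5)
    dist_far  = trimf(distance, 0.3, 1.0, 2.0)
    ang_zero  = trimf(angle_err, -0.3, 0.0, 0.3)
    ang_pos   = trimf(angle_err, 0.1, 0.8, 1.5)

    # Rule firing strengths (min for AND), each mapped to crisp wheel speeds
    rules = [
        (min(dist_far, ang_zero),  (0.6, 0.6)),  # far & aligned -> both fast
        (min(dist_far, ang_pos),   (0.6, 0.3)),  # far & off-heading -> steer
        (min(dist_near, ang_zero), (0.2, 0.2)),  # near & aligned -> slow down
    ]
    # Weighted-average defuzzification over the fired rules
    w = sum(r[0] for r in rules) or 1e-9
    v_l = sum(r[0] * r[1][0] for r in rules) / w
    v_r = sum(r[0] * r[1][1] for r in rules) / w
    return v_l, v_r

print(ppflc_step(distance=1.2, angle_err=0.05))  # ~ (0.6, 0.6): drive straight
```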

4.6.4 Obstacle Avoidance Fuzzy Logic Controller (OAFLC)

To navigate obstacles in unfamiliar environments, the OAFLC is designed to generate control signals for the velocities of the right and left wheels. It processes inputs from seven ultrasonic sensors, which are organized into three groups: the left sensors (S1, S2), the front sensors (S3, S4, S5), and the right sensors (S6, S7), to effectively command the robot's movements.

CHAPTER 5: IMAGE PROCESSING SOFTWARE DESIGN FOR NONVERBAL INTERACTION

5.1 Face detection method using Haar Cascade

Face detection plays a vital role in computer vision and image processing, facilitating the identification and localization of human faces in images and video frames. A prominent method for achieving this is the Haar cascade algorithm, which utilizes Haar-like features and a machine learning approach to detect faces accurately and efficiently in real-time applications.

5.1.2 Haar-like features in the cascade filter method

Haar-like features are rectangular feature templates that are classified into several types, as illustrated in Figure 5.1:

Viola and Jones introduced four essential features for object identification, known as Haar-like features. Each feature is formed by combining two or three rectangles, which can be either black or white, as demonstrated in Figure 5.2.

To enhance object identification, the four primary Haar-Like features are expanded and organized into three distinct categories: Line features, Edge features, and Center-Surround features.

Figure 5.2: Haar-like square and rectangle features [39]

The Haar-like feature value f is calculated as the difference between the sums of the pixel intensities in the black and white areas:

f = Σ(pixels in black area) − Σ(pixels in white area)

Viola and Jones developed the Integral Image, a 2-dimensional array that matches the dimensions of the original image used for the Haar-like features. Each element of this array is the sum of the pixel values located above and to the left of the corresponding pixel in the image.

The formula of the Integral Image is:

II(x, y) = Σ_{x′ ≤ x, y′ ≤ y} I(x′, y′)

Assuming it is necessary to compute the total gray level value of area D as illustrated below, we can compute it using the following method:

where the value at point P4 on the Integral Image corresponds to the sum of A, B, C, and D. Similarly, the value at point P2 is the sum of A and B, the value at point P3 is the sum of A and C, and the value at point P1 is A. So the sum over area D can be rewritten as follows:

D = P4 + P1 − P2 − P3
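A minimal sketch of the integral image and the four-point rectangle sum, computed with NumPy on a toy image:

```python
import numpy as np

img = np.arange(36, dtype=np.int64).reshape(6, 6)   # toy 6x6 grayscale image
ii = img.cumsum(axis=0).cumsum(axis=1)              # integral image II(x, y)

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in [top..bottom, left..right] in O(1): P4 + P1 - P2 - P3."""
    p4 = ii[bottom, right]
    p2 = ii[top - 1, right] if top > 0 else 0
    p3 = ii[bottom, left - 1] if left > 0 else 0
    p1 = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    return p4 + p1 - p2 - p3

print(rect_sum(ii, 2, 2, 4, 4))    # matches the direct sum below
print(img[2:5, 2:5].sum())
```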

Figure 5.3: Three basic feature in Cascade filter [39]

5.1.3 The AdaBoost classifier in the cascade method

AdaBoost, a sophisticated nonlinear strong classifier developed by Freund and Schapire in 1995, utilizes the boosting technique to enhance classification performance. It operates by linearly combining multiple weak classifiers, resulting in a more robust and effective overall classifier.

The Cascade of Boosted Classifiers is an efficient model that utilizes multiple layers of AdaBoost with weak classifiers based on decision trees featuring Haar-like features. During training, the model evaluates all features in the training samples, which can be time-consuming. However, many samples contain easily recognizable background patterns that require only a few features for identification. Traditional classifiers, regardless of the complexity of the patterns, must consider all features, leading to unnecessary processing time. In contrast, the Cascade of Classifiers is designed to reduce processing time and minimize false positives by employing a tiered approach, where each layer focuses on progressively more challenging patterns. For an object to be classified accurately, it must pass through these layered AdaBoost models.

Figure 5.6: Combine weak learner into strong learner [40]

The AdaBoost layers enhance face detection by training later layers with challenging negative samples, specifically non-face patterns that the system misidentifies. This approach allows the classifier to learn from difficult backgrounds, reducing misidentification. Consequently, easily recognizable background patterns are filtered out in the first layer, optimizing processing time while ensuring effective face detection.

Viola and Jones use AdaBoost to combine weak classifiers h_t(x) built on Haar-like features into a strong classifier in the cascading model as follows:

H(x) = sign( Σ_t α_t·h_t(x) )

where α_t is the weight assigned to weak classifier h_t.

5.1.5 How the Haar cascade filter works

The Haar cascade algorithm detects faces through a systematic process that begins with the computation of Haar-like features, which are simple rectangular patterns capturing local image characteristics such as edges and contrast. These features are calculated by subtracting pixel values in defined regions of an image. To enhance computational efficiency, the integral image technique is utilized, allowing quick calculation of Haar-like features over any rectangular area. The Haar cascade classifier comprises multiple stages of weak classifiers, each focusing on specific Haar-like features. As the classifier evaluates regions of interest, it determines potential face regions, progressing through the stages while rejecting non-candidates to optimize processing time. During detection, the classifier scans the image or video frame at various scales and positions, applying its cascade of weak classifiers to each region of interest. Regions that pass through all stages are considered as detected faces.
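A minimal sketch of this detection process with OpenCV's bundled pre-trained frontal-face model; the input filename is a placeholder:

```python
import cv2

# Load OpenCV's pre-trained Haar cascade for frontal faces
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("frame.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Scan the image at multiple scales; each hit is an (x, y, w, h) face box
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", img)
```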

5.2 Hand detection

Hand detection plays a vital role in computer vision and human-computer interaction systems. Developed by Google, MediaPipe is an open-source framework that delivers a complete solution for creating real-time multimedia processing pipelines. It features pre-built models and modules, including hand detection, which provides precise and efficient hand tracking and gesture recognition functionalities.

MediaPipe's hand detection utilizes a deep learning-based pose estimation model to initially assess the hand's pose, determining its location and key points. This data allows for the extraction of a region of interest (ROI) focused on the hand, enhancing detection accuracy. Following this, a hand detection model, trained on extensive datasets, identifies and tracks the hand within the ROI. Additionally, MediaPipe features a model that estimates hand landmarks, including the fingertips and the palm center, facilitating precise hand tracking and gesture analysis. This comprehensive approach integrates pose estimation, region extraction, hand detection, and landmark estimation, resulting in real-time, accurate hand tracking and interaction capabilities.

Figure 5.8: Face detection using haar cascade

Figure 5.9: Hand detection using Mediapipe [17]
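A minimal sketch of running the MediaPipe hand pipeline on a webcam stream; the confidence thresholds are example values:

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2,
                                 min_detection_confidence=0.5,
                                 min_tracking_confidence=0.5)
draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV delivers BGR frames
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for lm in results.multi_hand_landmarks:   # 21 landmarks per hand
            draw.draw_landmarks(frame, lm, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("hands", frame)
    if cv2.waitKey(1) & 0xFF == 27:               # Esc to quit
        break
cap.release()
```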

5.3 Convolutional neural network algorithm (CNN)

Neural networks, also known as artificial neural networks, are computational models inspired by the human brain's structure and function. They consist of interconnected layers of artificial neurons that process and transmit information, making them popular in machine learning applications such as image recognition, natural language processing, and time series analysis. By mimicking biological neurons, neural networks learn from data through a training process that adjusts the weights and biases of the connections, typically utilizing large datasets and optimization algorithms like gradient descent. Stacking layers creates deeper architectures, allowing networks to learn hierarchical data representations. Their ability to capture complex patterns and relationships has led to impressive problem-solving capabilities in various domains.

Convolutional Neural Networks (CNNs) are extensively used for image recognition and classification, particularly in applications like object identification and face recognition. These models process input images by interpreting them as arrays of pixels, with dimensions defined by height, width, and depth (H x W x D). For instance, an RGB image measuring 6x6 pixels can be represented as a 6x6 matrix with three color channels.

Figure 5.10: The model of the neural network [41]

In the CNN model, input images are processed through multiple convolutional layers equipped with filters (kernels) during both the training and testing phases. This is followed by fully connected layers, culminating in the application of the Softmax function to classify objects by assigning probability values ranging from 0 to 1. The accompanying diagram illustrates the flow of the CNN in processing input images and classifying objects based on these probability values.

A Convolutional Neural Network (CNN) is a specialized neural network architecture tailored for processing grid-like data, particularly images. CNNs excel in various computer vision applications, including image classification, object detection, and segmentation. Key components of a CNN include convolutional layers, pooling layers, and fully connected layers, which work together to extract features and improve accuracy in visual tasks.

Convolutional layers serve as the fundamental components of Convolutional Neural Networks (CNNs), utilizing multiple filters or kernels that traverse the input image to execute element-wise multiplications and aggregations. This technique effectively captures local patterns and feature representations, allowing the network to learn hierarchical features across various levels of abstraction.

Figure 5.11: Convolutional neural network architecture [42]

Figure 5.12: The computational process of the convolutional layer [43]

Pooling layers are essential in deep learning as they decrease the spatial dimensionality of the feature maps produced by the convolutional layers. By employing techniques such as max pooling or average pooling, these layers downsample the feature maps, preserving the key information while minimizing the computational demands. This process enhances the network's robustness against translations and diminishes its sensitivity to minor spatial changes.

Activation functions are crucial for introducing non-linearity in neural networks, allowing them to learn intricate relationships. In Convolutional Neural Networks (CNNs), popular activation functions include the rectified linear unit (ReLU), sigmoid, and hyperbolic tangent (tanh). Among these, ReLU is favored for its simplicity and effectiveness in addressing the vanishing gradient problem.

Fully connected layers, or dense layers, are essential components of the CNN architecture, typically positioned at the end. They establish connections between every neuron in one layer and all neurons in the next, facilitating the final classification process. These layers utilize the features learned by the earlier convolutional and pooling layers to make accurate predictions.

Figure 5.14: The fully connected layer [43]

5.4 Method to determine the distance from the robot to the human's face

To calculate the distance from a camera to a face using Mediapipe and mathematical methods, begin by setting up your development environment with essential libraries like OpenCV and Mediapipe, followed by importing the necessary modules in your programming language.

To get started with video capture, ensure your environment is set up properly. Use OpenCV to initialize the camera and begin reading frames for further processing.

Leverage MediaPipe's capabilities to detect facial landmarks by loading its pre-trained face detection and landmark models. For each frame, apply the face detection model to identify the face region, establishing the area of interest. After detecting the face, utilize the facial landmark model to pinpoint specific facial landmarks within that region.

To accurately calculate distance on the face, it is essential to identify specific reference points, such as the center of the eyes or the tip of the nose. These landmarks serve as the foundation for measurement, and the corresponding points provided by the facial landmark model should be used for this purpose.

To determine the actual size of a reference object, select a feature with known dimensions; here we use the average distance between human eyes, approximately 6.3 cm, which is a commonly used anthropometric value.

After establishing the actual size of the reference object, the subsequent step involves measuring its size within the image. This measurement can be achieved by calculating the Euclidean distance between the facial landmarks associated with the reference object. The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) is computed as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)

Then, you can apply perspective projection principles to estimate the distance to the face. This calculation takes into account the camera's focal length, sensor size, and other relevant parameters.

To calculate the focal length F, a one-time calibration is performed: with the real inter-eye distance d, a known distance D from the face to the lens, and the measured inter-eye size p in the image (in pixels), the focal length is determined by:

F = (p × D) / d

Then, in our application, the distance D from the face to the lens is always calculated from the current inter-eye pixel distance p using the formula:

D = (d × F) / p

After calculating the distance, we can present it in a suitable format, whether by displaying it as text on the screen or using it for additional processing or application logic as required.
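A minimal sketch of the two-step pinhole calculation above; the calibration numbers are example values:

```python
import math

KNOWN_EYE_CM = 6.3                       # assumed real inter-eye distance

def pixel_dist(p1, p2):
    """Euclidean distance between two (x, y) landmarks in pixels."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def focal_length(calib_dist_cm, eye_px_at_calib):
    """F = (p * D) / d, measured once at a known camera-to-face distance D."""
    return (eye_px_at_calib * calib_dist_cm) / KNOWN_EYE_CM

def face_distance_cm(focal, eye_px):
    """D = (d * F) / p for the current frame's inter-eye pixel distance p."""
    return (KNOWN_EYE_CM * focal) / eye_px

F = focal_length(calib_dist_cm=50.0, eye_px_at_calib=120.0)   # one-time calibration
print(face_distance_cm(F, eye_px=60.0))  # eyes half as wide -> ~100 cm away
```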

5.5 Emotion recognition based on transfer learning

Facial emotion recognition (FER) using transfer learning (TL) [20] is an approach that leverages pre-trained deep learning models to accurately detect and classify emotions.

Figure 5.16: The field of view and focal length

Transfer learning enables the application of knowledge acquired from one task, such as image classification, to enhance performance in a related task like emotion recognition. By reusing learned features, this approach effectively leverages existing data to improve the accuracy of interpreting facial expressions.

Transfer learning offers significant advantages for recognition tasks, such as facial emotion recognition, by enabling faster training compared to building models from scratch. Pre-trained models, having already learned generic features from large-scale datasets, reduce the convergence time and perform well even with limited labeled data. These models generalize effectively to new tasks, extracting useful low-level features such as edges and textures, as well as high-level features related to facial expressions. Additionally, they capture essential knowledge about object shapes, textures, and spatial relationships, facilitating the learning of task-specific features. With a variety of pre-trained models available, users can select architectures that have shown strong performance in similar domains, providing flexibility and adaptability for specific recognition needs.

In this section, the structure of the emotion recognition method is shown in Figure 5.17. Here, the convolutional layers of the pre-trained model are kept, and the dense layers are removed. Then, new dense layers are created and added to the retained layers of the pre-trained model to create a classification model for the FER dataset. The newly added dense layers are a component of the classification model. Next, the newly added dense layers and a portion of the pre-trained model's convolutional layers are fine-tuned on the emotion dataset. In the proposed method, the fine-tuning is performed on the new classifier and on all retained layers of the pre-trained model.

The study utilizes the Fer2013 dataset, a well-established and highly regarded resource in facial emotion recognition research. This dataset was specifically developed to enhance the accuracy and effectiveness of emotion detection in facial expressions.

Figure 5.17: Architecture of transfer learning for emotion recognition

The dataset provides a standardized benchmark for assessing the performance of emotion recognition algorithms on facial expressions, comprising 35,887 grayscale images categorized into seven fundamental emotions: anger, fear, sadness, neutrality, happiness, surprise, and disgust. Each image has a resolution of 48 x 48 pixels, focusing on various facial expressions. For training purposes, 32,298 images are allocated, while the remaining images are reserved for validation. The dataset's distribution is illustrated in Figure 5.18.

The image of the Fer2013 dataset is shown in Figure 5.19

Figure 5.18: The distribution of the emotion dataset

Figure 5.19: The image of the Fer2013 dataset

5.5.3 Methodology of emotion recognition based on transfer learning

The AI model utilizes images for emotion classification by employing a CNN architecture based on transfer learning with the pre-trained VGG19 model, known for its excellent feature extraction capabilities. The VGG19 model, consisting of 19 layers (16 convolutional and 3 fully connected), is modified by removing its top layers and adding new dense layers to enhance the classification accuracy. Adjustments are made to tailor the model to the specific emotion recognition dataset, followed by fine-tuning on the Fer2013 dataset. The final layer is designed to predict one of seven emotional classes—angry, fear, sad, neutral, happy, surprise, and disgust—by utilizing seven neurons for classification. The fine-tuning process integrates the existing convolutional layers of VGG19 with the newly added dense layers to optimize performance on the emotion recognition task.

During the testing phase, a human face image undergoes preprocessing, including cropping and converting to grayscale, before being input into the model. Ultimately, the model predicts one of the seven possible emotions based on the processed image.

Figure 5.20: Method for emotion recognition based on Transfer Learning

The proposed model, illustrated in Figure 5.20, incorporates four additional dense layers, highlighted in green. It begins with a global average pooling (GAP) layer that reduces the spatial dimensions of the feature maps to a 1x1 size by averaging each channel, transforming the multi-dimensional feature maps into a 1D vector for the fully connected layers. The GAP layer replaces the max pooling layer of the pre-trained VGG19 model, leading to superior emotion classification outcomes. Following the GAP layer, two dense layers with 1024 neurons each facilitate the classification process, culminating in an output layer whose size corresponds to the number of emotions being classified. The model features five blocks for image feature extraction, each containing convolutional and max pooling layers, with the output of one block serving as the input of the next. Input data of size 48x48x3 is processed through these blocks, resulting in a 3x3x512 output, which the GAP layer converts into a 512-dimensional vector for the first dense layer. Classification occurs in the dense layers, where the final layer computes the probabilities of the seven distinct emotions. Fine-tuning enhances the accuracy, with the option to freeze certain blocks; the best results are achieved when all layers, including the dense layers, are fine-tuned, since the dense-layer weights are randomly initialized.
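A minimal Keras sketch of this architecture, following the description above (training details and data loading omitted; hyperparameter values are examples):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

# VGG19 convolutional base with the top (dense) layers removed
base = VGG19(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
base.trainable = True                      # fine-tune all retained layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),       # collapse feature maps to a 512-vector
    layers.Dense(1024, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(7, activation="softmax"), # angry/fear/sad/neutral/happy/surprise/disgust
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=60, validation_data=val_data)
```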

The model is trained on a dataset of 32,298 images, achieving an accuracy of over 91.85% in emotion classification after 60 epochs. On the evaluation dataset, the accuracy is 70.02%.

5.6 Identification recognition process

Implementing face recognition technology on robots is crucial for developing attendance systems in workplaces and classrooms. This technology not only facilitates user identification but also enhances security by activating protective measures when an unfamiliar person enters the robot's designated area.

Figure 5.21: Architecture of VGG19 model for emotion recognition method

The extraction algorithm employs a pre-trained deep neural network model called FaceNet, which generates a 128-dimensional vector representation of a person's face, capturing key features such as lip length, nose length, and eye distance. These measurements enable the comparison of facial similarity between images. Utilizing a Support Vector Machine (SVM) classifier, the algorithm evaluates the distance between the 128-dimensional encoding of the face to be recognized and those stored in the database. By establishing a threshold, it determines whether the result matches a known identity or an unknown individual. Upon successful detection and identification of a user's face, the program can access or update the user's real-time information in the database, enhancing the human-machine interaction.

Figure 5.22: Methodology for the identification recognition method
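A minimal sketch of the identification step; `embed(face_img)` is a hypothetical helper standing in for the FaceNet encoder that returns the 128-dimensional vector, and the threshold value is an example:

```python
import numpy as np
from sklearn.svm import SVC

def train_identifier(embeddings, labels):
    """Fit an SVM classifier on the 128-d face encodings from the database."""
    clf = SVC(kernel="linear", probability=True)
    clf.fit(np.asarray(embeddings), labels)
    return clf

def identify(clf, face_embedding, threshold=0.6):
    """Return the matched identity, or 'unknown' below the confidence threshold."""
    probs = clf.predict_proba([face_embedding])[0]
    best = int(np.argmax(probs))
    return clf.classes_[best] if probs[best] >= threshold else "unknown"
```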

5.7 Hand gesture recognition to control the robot

In nonverbal communication with robots, sending requests enhances adaptability during interactions. This section utilizes Google's MediaPipe library for hand gesture recognition, allowing users to select interactive functions through an OpenCV interface without touching the screen. Once a gesture is made, the robot receives control signals via the Socket and UART protocols to manage its head, arms, and base. This interaction method diversifies user engagement and boosts operational efficiency. The results of the identification system and interface are illustrated in Figure 5.24.

Figure 5.23: The result of identification recognition in the attendance system

Figure 5.24: The result of hand gesture recognition

CHAPTER 6: NATURAL LANGUAGE PROCESSING SOFTWARE DESIGN FOR THE VERBAL INTERACTION

6.1 Voice recognition

Speech-to-text technology, driven by Google's API, enables the conversion of spoken language into written text using advanced machine learning algorithms for real-time transcription. By incorporating the Google Speech-to-Text API into applications, developers can facilitate automatic transcription, significantly improving accessibility, productivity, and the overall user experience.

The Google Speech-to-Text API utilizes advanced deep neural networks to analyze audio recordings and produce highly accurate text transcriptions. Trained on extensive multilingual datasets, it effectively recognizes various speech patterns, accents, and languages. Whether transcribing phone calls, conference presentations, or other audio sources, the API processes the input to deliver precise transcriptions that capture both content and context, ensuring optimal efficiency in the speech-to-text conversion process.
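A minimal sketch of calling the Google speech service from Python through the `speech_recognition` package (one common integration route; Vietnamese, "vi-VN", is assumed as the interaction language):

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # calibrate for background noise
    audio = recognizer.listen(source)             # capture one utterance

try:
    text = recognizer.recognize_google(audio, language="vi-VN")
    print("User said:", text)
except sr.UnknownValueError:
    print("Speech was not intelligible")
except sr.RequestError as e:
    print("API request failed:", e)
```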

6.2 Text processing techniques

Tokenization is an essential preprocessing method in Natural Language Processing (NLP) that breaks text down into smaller units called tokens, which can be characters, words, or subwords. This technique is vital for various NLP tasks as it allows machines to effectively understand and process human language.

The process of tokenization involves segmenting a continuous stream of text, such as a sentence or a document, into discrete units. Tokenization is typically performed using dedicated NLP frameworks and libraries.

Figure 6.1: Speech to text method [44]

These NLP frameworks offer a variety of strategies, including rules-based heuristics, pre-trained models, and statistical techniques. The choice of tokenization method depends on the specific NLP task at hand. The most prevalent form of tokenization is word tokenization, which breaks text into individual words. For instance, the sentence "Tôi yêu Việt Nam" is tokenized into the words: ["Tôi", "yêu", "Việt", "Nam"].

Stopwords and punctuation removal are essential preprocessing techniques that enhance the quality and efficiency of text analysis in natural language processing (NLP). This process involves the elimination of insignificant or redundant words and punctuation marks from the text data. Stopwords, which are frequently occurring words in a language, often lack significant meaning and do not contribute substantially to the understanding of the text. In Vietnamese, examples of stopwords include "là", "và", and "những".

Removing stopwords reduces the data dimensionality and highlights the more informative words. This practice is advantageous in NLP tasks such as text classification, sentiment analysis, and information retrieval. By eliminating stopwords, we improve the computational efficiency, reduce noise, and increase the analysis accuracy, allowing the essential content-bearing words to emerge more clearly.

Punctuation marks such as periods, commas, question marks, and exclamation points play a crucial role in grammar and syntax, but they often do not enhance the semantic understanding of written text. In many natural language processing (NLP) tasks, removing punctuation can simplify the text and facilitate further processing. This removal also helps to eliminate inconsistencies from varying punctuation styles, particularly when handling noisy or unstructured data, such as social media posts and user-generated content.

By removing stopwords and punctuation marks, the resulting text data becomes more focused on the meaningful content, enabling more accurate analysis and interpretation. When interpreting NLP results, however, it is crucial to consider the removal of stopwords and punctuation in relation to the specific task and analytical needs. Certain stopwords or punctuation marks may hold significant meaning, and their exclusion could result in a loss of important information.
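A minimal sketch of both preprocessing steps for Vietnamese text, using the underthesea tokenizer (the stopword list here is only a small excerpt):

```python
import string
from underthesea import word_tokenize

STOPWORDS = {"là", "và", "những", "của", "thì"}

def preprocess(sentence):
    """Tokenize, then drop stopword and punctuation tokens."""
    tokens = word_tokenize(sentence)      # multi-word units like "Việt Nam" stay joined
    return [t for t in tokens
            if t.lower() not in STOPWORDS and t not in string.punctuation]

print(preprocess("Tôi yêu Việt Nam!"))    # ['Tôi', 'yêu', 'Việt Nam']
```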

6.3 Text classification

6.3.1 The extraction of CNN in natural language processing

Convolutional Neural Networks (CNNs) have achieved significant success in computer vision tasks such as image classification and object detection, but their application extends to Natural Language Processing (NLP) as well. In NLP, CNNs excel in tasks such as text classification, sentiment analysis, named entity recognition, and machine translation. These deep learning models effectively capture local patterns and hierarchical structures in data, drawing inspiration from the visual cortex of the human brain. By treating text as a one-dimensional signal, CNNs utilize convolutions to identify local patterns, or n-grams, within the input sequence, enhancing their performance across various NLP applications.

To process text in Natural Language Processing (NLP) using Convolutional Neural Networks (CNNs), the first step involves converting the input text into numerical representations through word embeddings, which encapsulate both semantic and syntactic information. The CNN architecture typically incorporates one or more convolutional layers that utilize 1D convolutions suited to the sequential nature of text data. These layers apply filters to the input sequence, detecting specific local patterns by performing element-wise multiplications and summations to generate feature maps. Following the convolutional layers, pooling layers, particularly max pooling, are employed to reduce the dimensionality of the feature maps while retaining the most significant features. By leveraging CNNs in NLP tasks, models can autonomously learn pertinent features and recognize both local and global patterns in text, enhancing their effectiveness across various applications.

6.3.2 Bidirectional Long Short-Term Memory model

A BiLSTM model consists of both a forward and a backward LSTM, representing an advanced form of recurrent neural networks (RNNs). This model incorporates gating units to overcome the vanishing gradient issue commonly faced by traditional RNNs. As a result, LSTMs are better equipped to capture long-term dependencies, enhancing the RNN's capability to identify and leverage relationships within long-distance data.

An LSTM cell unit features a unique structure consisting of four main components: the input gate (i_t), output gate (o_t), forget gate (f_t), and storage unit (c_t). This design allows LSTM modules to effectively manage the information flow, enhancing their capability in sequence prediction tasks.

The hidden state in the BiLSTM model is denoted h_t; σ denotes the sigmoid activation function and tanh the hyperbolic tangent activation function. The cell state c_t represents the long-term memory, and the output gate o_t controls the output from the memory cell. This model effectively extracts deeper semantic relationships within the context. Figure 6.4 illustrates the structure of the BiLSTM.

Equations (6.1) and (6.2) show the formulas for the model's hidden states at each moment:

→h_i = LSTM(x_i, →h_{i−1})      (6.1)
←h_i = LSTM(x_i, ←h_{i+1})      (6.2)

where x_i denotes the input vector at time i, →h_{i−1} is the forward hidden layer vector at time i−1, and ←h_{i+1} is the reverse hidden layer vector at time i+1.

6.3.3 Text classification method based on CNN-BiLSTM model

The dataset is organized in a CSV file comprising two columns: the content column and the label column, which categorizes the data into two distinct classes. This data was gathered by our team through daily conversations and academic insights to ensure a comprehensive dataset. We aimed for a balanced distribution between the two classes, achieving 51% for the first class and 49% for the second class. The structure of the dataset is illustrated in Figure 6.5.

This method involves preprocessing the input sentence through tokenization and punctuation removal to normalize the data and eliminate noise. It also includes filtering out stop words with limited communicative value, such as "là", "của", "làm", "và", "hay", "còn", "với", "không", "được", "cũng", "này", "cho", "nên", "đã", "đang", "muốn", "thì", "lại", "nếu", "ai", and similar words. The data is then transformed into a numerical format through an embedding layer with a dimension of 300, preparing it for model input. In this context, the CNN-BiLSTM model is utilized for text classification, effectively addressing the exploding and vanishing gradient issues commonly encountered in RNN backpropagation. This model is particularly useful for time series data analysis, where input sequences exhibit dependencies and benefit from prior information. The BiLSTM architecture enhances sequence learning by processing information in both directions, employing two separate LSTM networks: one that facilitates forward information flow and another that allows backward information flow.

Figure 6.5: The structure of data for text classification

The model utilizes a future-to-past sequence approach, allowing it to learn effectively by integrating contextual information from both directions. The output is generated by combining the latest states of the forward and backward LSTM networks. We have integrated a CNN network with a BiLSTM layer, where the data, following the embedding layer, undergoes feature extraction through a 1-dimensional convolution layer and the Bi-LSTM network. To mitigate overfitting, a dropout layer is subsequently incorporated.

After the dropout layer, a fully connected layer is utilized to transmit the results to the Sigmoid layer, ultimately producing the final output. As illustrated in Figure 6.6, the model's training process employs the Adam optimizer and uses Binary Crossentropy as the loss function.
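A minimal Keras sketch of this CNN-BiLSTM stack; VOCAB_SIZE, MAX_LEN and the filter/unit counts are placeholder values, while the 300-dimensional embedding, the dropout layer, and the sigmoid output follow the description above:

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 10000, 50

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 300),                    # 300-d word embeddings
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # local n-gram features
    layers.Bidirectional(layers.LSTM(64)),                # forward + backward context
    layers.Dropout(0.5),                                  # mitigate overfitting
    layers.Dense(1, activation="sigmoid"),                # two-class output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```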

6.3.4 The result of text classification

All the parameters of the training model are optimized during the actual training, so the selected parameters provide high accuracy on the entire data set. After 16 epochs of training, the model achieved an accuracy of 99.63% on the training dataset and 95.02% on the test dataset. The accuracy and loss trends are illustrated in Figure 6.7. Additionally, the evaluation process, represented by the confusion matrix in Figure 6.8, indicates that the model accurately classified 98% of type 1 data and 93% of type 2 data within the test dataset.

Figure 6.6: CNN combined Bi-LSTM Model for text classification

Figure 6.7: The accuracy and loss chart of model

6.4 Question and answer model

The dataset is designed to address users' communication needs, featuring 24 distinct tags along with corresponding patterns and responses. Each intent is systematically mapped to its respective tag, pattern, and response, forming a comprehensive dictionary-type dataset. To prevent overfitting in our neural network model, particularly given the limited data, we carefully adjust the parameters during training to optimize the model's performance. The structure of the training data is illustrated in Table 6.1.

Table 6.1: The structure of the training data

1. Tags: the labels of each type of user's question and the corresponding answer in the training dataset.
2. Patterns: the user requests; this is the input to the model.
3. Responses: the corresponding responses for the patterns.

Speech recognition [25] is implemented in this section for human-robot interaction. The Google Speech-to-Text API is utilized as the tool for recognizing the user's voice.

Figure 6.8: The confusion matrix for the evaluation process

This API service facilitates the conversion of speech to text in multiple languages, allowing programmers to integrate the Google Speech-to-Text API into their applications across various platforms. By using this service, the robot can accurately transcribe spoken words into text. Once the text is obtained, it undergoes natural language processing, starting with tokenization into individual words using the NLTK library in Python. This is followed by lemmatization to standardize each word and the removal of duplicates. Noise filtering is then applied by eliminating stopwords and punctuation, resulting in cleaner data that is suitable for training the model. The processed data consists of input patterns and their corresponding output tags. When an input sentence is fed into the model, it predicts the most suitable response. However, since the model cannot comprehend text directly, the sentences must be converted into numeric vectors before being input to the AI model.

In deep learning, we employ a multi-layered neural network to analyze input data, leveraging its powerful capability to extract meaningful features from the dataset.

Figure 6.9: Flowchart of the natural language processing algorithm

This section presents a dataset organized into multiple classes, each associated with a specific tag. When input data is processed by the model, it predicts the corresponding class and generates a random response from a predefined list. A feedforward neural network is employed for training on this dataset, where data flows in a single forward direction. The output of one hidden layer serves as the input of the subsequent layer, ensuring a structured flow of information throughout the network.

The process is sequential; the following mathematical equations describe it.

In this model the layers are indexed i ∈ [1, …, n]; v_i denotes the pre-activation vector of the i-th layer, and u_{i−1} is the output of the (i−1)-th layer, which serves as the input of the i-th layer. Each layer has a learnable weight matrix w_i and bias b_i, and uses the ReLU activation function for its computational efficiency:

v_i = w_i·u_{i−1} + b_i
u_i = ReLU(v_i)

The model begins with an input vector u_0 and processes it through two hidden layers, each containing 8 neurons, culminating in an output layer whose size matches the number of classification labels. The cross-entropy loss function is employed, with the results presented in Table 6.2.

Table 6.2: Loss value of model
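A minimal Keras sketch of this feedforward intent classifier; INPUT_DIM is a placeholder for the bag-of-words vector size, while the two 8-neuron ReLU layers, the 24-tag output, and the cross-entropy loss follow the description above:

```python
from tensorflow.keras import layers, models

INPUT_DIM, NUM_TAGS = 120, 24

model = models.Sequential([
    layers.Input(shape=(INPUT_DIM,)),               # input vector u_0
    layers.Dense(8, activation="relu"),             # hidden layer 1
    layers.Dense(8, activation="relu"),             # hidden layer 2
    layers.Dense(NUM_TAGS, activation="softmax"),   # one neuron per intent tag
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",      # cross-entropy loss
              metrics=["accuracy"])
```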

6.5 The large language model for handling the knowledge text

ChatGPT is a machine learning model based on the transformer architecture, designed to understand and generate human-like text. By being trained on extensive datasets, including web pages, books, and articles, it learns the patterns and structures of language. This pre-training allows ChatGPT to recognize statistical correlations in language data, enabling it to produce cohesive and contextually relevant text. Its versatility makes it suitable for various applications such as language translation, text summarization, and conversational agents.

6.5.2 Collect the information based on ChatGPT

To effectively meet user requests for information, the system must accurately extract and provide relevant data. This process often involves handling large volumes of information that require both precision and current relevance. While creating a static database and developing a training model can facilitate answering inquiries, it does not address the issue of outdated information.

To address these challenges, search engines like Google and Bing offer several methods for answering users. This section focuses on the ChatGPT tool due to its extensive training data, exceptional customization capabilities for information retrieval, and effectiveness in providing precise and updated solutions to user inquiries.

To provide accurate and relevant responses, user queries are categorized before being processed by the ChatGPT language model via an API. Instead of developing a new model, we leverage the pre-trained models to efficiently send and receive information. Given the extensive knowledge embedded in these models, generating precise output requires careful reconfiguration, utilizing parameters such as max_tokens (to limit the response length), n (to control the number of generated answers), and temperature (to adjust the response diversity). This approach ensures that the generated answers meet the user's needs and maintain high accuracy across various fields. The outcome of this knowledge processing is illustrated in Figure 6.10.
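A minimal sketch of such an API call, using the 2023-era `openai` Python package with the three parameters named above; the model name and parameter values are examples:

```python
import openai

openai.api_key = "YOUR_API_KEY"

def ask_knowledge_question(question: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
        max_tokens=150,    # bound the length of the answer
        n=1,               # number of candidate answers
        temperature=0.2,   # low value -> focused, less diverse answers
    )
    return response.choices[0].message["content"]

print(ask_knowledge_question("Thủ đô của Việt Nam là gì?"))
```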

Named Entity Recognition in the Vietnamese language

Effective human-robot communication requires the robot to recognize and store information about humans, which can be achieved with Named Entity Recognition (NER). NER is a core natural language processing (NLP) task that identifies and classifies named entities in text, such as names of people, organizations, locations, and dates. By extracting and labeling these entities, NER improves the understanding of text structure and semantics, supporting tasks such as information extraction, sentiment analysis, question answering, and text summarization. Recent advances in NER have significantly improved such applications, enabling automated analysis of large volumes of text. For Vietnamese, the Underthesea library offers efficient NER capabilities, allowing developers and researchers to extract and classify named entities from Vietnamese text effectively.
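A short sketch of Underthesea's NER interface is given below; the example sentence and the name-collection logic are illustrative assumptions, not the robot's exact pipeline.

    from underthesea import ner

    sentence = "Tôi tên là Nguyễn Văn An, tôi sống ở Hà Nội."  # illustrative example

    # ner() returns (word, POS tag, chunk tag, entity tag) tuples, where entity
    # tags follow the B-/I- scheme (e.g. B-PER, I-PER for persons, B-LOC for places).
    tokens = ner(sentence)

    # Collect consecutive PER-tagged tokens into full person names.
    names, current = [], []
    for word, pos, chunk, tag in tokens:
        if tag.endswith("PER"):
            current.append(word)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))

    print(names)  # expected to include "Nguyễn Văn An"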

Figure 6.10: The result of handling the knowledge text

Figure 6.11: The result of NER

EXPERIMENTS AND EVALUATIONS

Overview

There are three critical elements to consider when evaluating the overall effectiveness of a social service robot:

 The social service robot must be fully autonomous

 The robot must interact naturally with the user based on nonverbal and verbal interaction

 The robot must perform its tasks reliably and completely

A graphical display interface is designed to enhance user-robot interaction: an intuitive user interface is combined with the robot's functionalities so that the application can engage effectively with the real world.

Basic parameters of the robot

Table 7.1 provides a comprehensive overview of the hardware results obtained after the meticulous design and manufacturing of our robot

Figure 7.1: Complete social service robot model

Table 7.1 (excerpt), row 5: Motion kinematics of the base: 2 differential wheels, 1 castor wheel

Speech interaction experiments

A graphical user interface (GUI) facilitates effective human-robot communication and is built with the Tkinter module of the Python programming language. Tkinter integrates seamlessly with Python, which makes it well suited for developing the robot's applications. The GUI features a frame that displays the dialogue between human and robot, as well as a camera display frame that represents the robot's vision, enabling it to gather information about individuals through images. An example of this human-robot conversation is illustrated in Figure 7.2.
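A condensed Tkinter sketch of this two-frame layout is given below; the widget names, geometry, and sample dialogue are illustrative, not the exact GUI of Figure 7.2.

    import tkinter as tk

    root = tk.Tk()
    root.title("Social Service Robot - Interaction GUI")

    # Left frame: scrolling log of the human-robot conversation.
    chat_frame = tk.Frame(root)
    chat_frame.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
    chat_log = tk.Text(chat_frame, state=tk.DISABLED, wrap=tk.WORD)
    chat_log.pack(fill=tk.BOTH, expand=True)

    # Right frame: camera view representing the robot's vision
    # (frames from OpenCV would be drawn onto camera_label).
    camera_frame = tk.Frame(root, width=480, height=360, bg="black")
    camera_frame.pack(side=tk.RIGHT)
    camera_label = tk.Label(camera_frame)
    camera_label.pack()

    def append_message(speaker, text):
        """Append one line of dialogue to the conversation log."""
        chat_log.configure(state=tk.NORMAL)
        chat_log.insert(tk.END, f"{speaker}: {text}\n")
        chat_log.configure(state=tk.DISABLED)

    append_message("Human", "Hello, robot!")
    append_message("Robot", "Hello! How can I help you today?")
    root.mainloop()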

Figure 7.2: GUI of interaction system

To analyze the interaction process with the robot, a survey of 90 participants was conducted, including 8 university instructors, 20 engineers, 39 university students, and several freelancers, each of whom engaged with the robot for 2-3 minutes. Each participant was asked to communicate freely with the robot and then answered three questions about the experience: the robot's level of understanding, any technical issues encountered during communication, and their overall satisfaction with the interaction. The survey results are presented as pie charts, with responses categorized into five distinct levels.
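As a sketch, such five-level pie charts can be produced with matplotlib; the counts below are placeholders, not the actual survey tallies.

    import matplotlib.pyplot as plt

    levels = ["Very poor", "Poor", "Neutral", "Good", "Very good"]
    counts = [2, 4, 9, 40, 35]  # placeholder tallies out of 90 participants

    plt.pie(counts, labels=levels, autopct="%1.0f%%", startangle=90)
    plt.title("The understanding level of the robot")
    plt.axis("equal")  # draw the pie as a circle
    plt.show()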

The understanding level of the robot is rated on five levels of cognitive ability, ranging from very poor to very good. The survey shows that 83% of users perceive the robot as having fairly strong cognitive ability, noting that its responses feel quite authentic during interaction. Conversely, only 6% of users feel the robot lacks sufficient cognitive skill, particularly in maintaining a dialogue. This indicates that the robot engages with people convincingly in real-life situations.

Regarding technical issues during communication, 61% of users report minimal errors, indicating a generally positive perception of the system's reliability. Meanwhile, 25% of users experience minor problems, such as insensitive touch keys. Notably, 14% of users frequently face significant issues, including the system failing to respond or responding only partially. These results highlight the varying levels of user satisfaction and the need for ongoing improvements to the system's functionality.

Figure 7.3: The level of awareness of robot

Although the system runs effectively, there are still some issues to be resolved in a few cases. The chart showing the level of error is given in Figure 7.4.

The survey of user satisfaction with the robot's interaction uses five levels of feedback: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, and Very Satisfied. Levels 1 and 2 reflect dissatisfaction, level 3 indicates neutrality, and levels 4 and 5 denote satisfaction. Notably, 78% of users reported that the system responds quickly and effectively to their inquiries, whereas only 7% expressed dissatisfaction due to incorrect or unresponsive answers. Overall, the robot demonstrates strong reliability and human-like interaction capabilities, as illustrated in the satisfaction chart in Figure 7.5.

Figure 7.4: The level of error of robot

Figure 7.5: The level of satisfaction of user

Emotion recognition experiments

An experiment on emotion recognition evaluated how accurately the robot identifies human emotions during conversations. A total of 100 participants interacted with the robot, each completing a form containing a 7x7 table whose rows represent the emotion expressed by the participant and whose columns represent the emotion recognized by the robot. Participants were positioned within the robot's field of vision and displayed seven distinct emotions in random order. Throughout the interaction, the robot was tasked with recognizing these emotions based on text prompts from a supervisor. The findings of this experiment are summarized in Table 7.2.

Table 7.2: The results of the emotion recognition process

The results show that neutral emotions are predicted with the highest accuracy, 91%, while disgust has the lowest accuracy, 75%. Notably, 16% of disgust expressions were misclassified as anger, highlighting the confusion between these two similar emotions. The correct prediction rates of the other six emotions all exceed 80%, giving an overall average accuracy of 84.14% for the emotion recognition process.
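The per-class accuracies and the overall average reported above can be recomputed from the 7x7 confusion matrix; the sketch below uses placeholder counts, not the data of Table 7.2.

    import numpy as np

    emotions = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

    # confusion[i, j] = number of times emotion i (shown by the participant)
    # was recognized by the robot as emotion j. Placeholder counts only.
    confusion = np.full((7, 7), 2)
    np.fill_diagonal(confusion, 88)  # mostly-correct predictions on the diagonal

    per_class = confusion.diagonal() / confusion.sum(axis=1)  # row-wise accuracy
    for name, acc in zip(emotions, per_class):
        print(f"{name:>8}: {acc:.2%}")

    print(f"average accuracy: {per_class.mean():.2%}")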

The combination of the base, two arms, and neck of the robot

The test results indicate that the handshake action is effective: the robot arm responds accurately to the desired speed and position. Both robot arms remain stable during operation; however, the joints shake when the arm is commanded to move at high speed. This behavior is observed when the robot receives signals from the main controller.

The robot's processor effectively facilitates communication between the Master and Slave units. Overall, the design and programming of the robot arm and neck fulfill the previously established requirements.

The design of the robot, featuring a base, two arms, and a head, enables fluid movements that enhance human-robot interaction. As illustrated in Figure 7.7, the rotation of the robot's base synchronizes with the neck and arms, producing a variety of dynamic dance movements.

Figure 7.6: Photos of the robot's right arm performing the handshake task with the interactor

System experiments in the real environment

Figure 7.8: Robot uses body language while talking with people

Figure 7.9: Our robot is interacting with humans in public environments

In this section, human-robot interaction takes place in real-world settings, where timely robot responses are essential; this was verified through repeated testing. Emotion recognition, being computationally intensive, usually takes the longest during this process. It is also essential that the robot executes tasks smoothly, without conflicts, upon receiving input signals. The robot demonstrates hand gestures, such as a handshake for greetings, and can respond to dance requests by converting commands into hardware control signals. Experiments conducted in various environments confirmed that the system operates without conflicts.

Self-charging experiments

7.7.1 Navigation and Obstacle Avoidance experiments

In this research, a customized wheel-based robot platform was developed to navigate flat environments. A series of experiments integrating the PPFLC and OAFLC techniques was conducted to evaluate the robot's ability to reach its target. The experimental environment, depicted in Figure 7.10, included two obstacles placed on the path to assess the performance of the Obstacle Avoidance Fuzzy Logic Controller; the Path Planning Fuzzy Logic Controller was evaluated by the robot's arrival at its destination.

The robot is deployed in a similar environment in which an unexpected static obstacle appears in its path. As it scans the area around the obstacle, the robot continuously computes the best avoidance strategy.

Figure 7.10: Robot starts toward the charging station with two obstacles

The resulting plan, shown in Figure 7.11, demonstrates the robot's obstacle avoidance capability; its performance is evaluated by the successful navigation around the first obstacle, as depicted in Figure 7.12.

To further validate the effectiveness of the OAFLC (Obstacle Avoidance Fuzzy Logic Controller), a second obstacle is introduced along the robot's path. This demonstrates the algorithm's adaptability, as the robot navigates around the obstacles with ease.

Figure 7.11: The robot is computing a path to avoid the first obstacle

Figure 7.12: The robot passed the first obstacle

The robot then changes its direction to avoid the second obstacle, as shown in Figure 7.13. This successful maneuver confirms that the OAFLC satisfactorily meets the initial obstacle-avoidance requirements.

The robot starts from a designated point and navigates to its destination. In a subsequent experiment, it encounters two unexpected obstacles that require immediate adjustments to its trajectory. When the path is clear, the robot follows its pre-planned route without modification; in this instance, however, it must adapt its path to circumvent the obstacles. The robot ultimately reaches the target location, demonstrating the effectiveness of our Path Planning Fuzzy Logic Controller, and completes its first self-charging task, as illustrated in Figure 7.14.

Figure 7.13: The robot passed the second obstacle

Figure 7.14: The robot reached the desired target

Table 7.3: Experimental results of fuzzy controller for point-to-point moving algorithm

Table 7.4: Error values through 5 experiments

No | Xset (cm) | Yset (cm) | Xactual (cm) | Yactual (cm) | X

In this section, further experiments evaluate the performance of the Auto Docking Fuzzy Logic Controller (ADFLC) in guiding the robot's movements. Equipped with sensors and a camera, the robot perceives its surroundings and provides real-time feedback to the ADFLC, enabling the controller to make informed decisions and adapt its strategy. The experimental results were highly promising, showcasing the robustness and effectiveness of the ADFLC in the robot's navigation.

Figure 7.15: Starting to dock at the charging station

Figure 7.16: Adjusting the Aruco marker to the center of the camera frame

Figure 7.17: Continuing backward to reduce the distance Z

Figure 7.18: When the distance Z < 20 cm, the robot is controlled without the fuzzy logic controller

Figure 7.19: The robot has successfully reached the charging station

After implementing the ADFLC, the robot docks successfully at the charging station to recharge its battery. The robot's movement during the docking process is illustrated in Figures 7.15 to 7.19.

Figure 7.20: The process of entering the charging station

The process of the robot autonomously docking at its charging station is illustrated in Figure 7.20. By combining computer vision with a fuzzy controller, the robot navigates to the center line of the charging station and executes a reverse movement to complete the charging process. A camera on the charging station captures the robot's position, and the images are processed using an Aruco marker to determine its location accurately. This data is transmitted wirelessly to the robot and used by the fuzzy logic controller for precise docking control. The successful combination of computer vision, Aruco marker recognition, and the ADFLC demonstrates the robot's ability to locate and dock autonomously, a significant advancement in its autonomy and operational efficiency.
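A minimal sketch of the marker-detection step is given below, assuming opencv-contrib-python with the classic (pre-4.7) cv2.aruco API; the dictionary choice, calibration values, marker size, and camera index are placeholder assumptions, not the system's actual parameters.

    import cv2
    import cv2.aruco as aruco
    import numpy as np

    dictionary = aruco.Dictionary_get(aruco.DICT_4X4_50)   # assumed marker dictionary
    parameters = aruco.DetectorParameters_create()

    # Placeholder intrinsics; a real system uses values from camera calibration.
    camera_matrix = np.array([[600.0, 0.0, 320.0],
                              [0.0, 600.0, 240.0],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.zeros(5)
    MARKER_SIZE = 0.10  # marker side length in meters (assumed)

    cap = cv2.VideoCapture(0)  # camera on the charging station
    ok, frame = cap.read()
    if ok:
        corners, ids, _ = aruco.detectMarkers(frame, dictionary, parameters=parameters)
        if ids is not None:
            rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(
                corners, MARKER_SIZE, camera_matrix, dist_coeffs)
            x_offset, _, z_dist = tvecs[0][0]  # lateral offset and distance Z to the marker
            # x_offset and z_dist would be sent wirelessly to the robot's fuzzy
            # controller; below Z < 20 cm the docking switches to the simple
            # non-fuzzy control shown in Figure 7.18.
            print(f"offset: {x_offset:.3f} m, distance Z: {z_dist:.3f} m")
    cap.release()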

The developed system demonstrates a promising integration of computer vision, wireless communication, and fuzzy logic control, allowing the robot to autonomously navigate, locate, and dock for recharging. This contributes to the robotics field by facilitating more autonomous and efficient robotic systems across diverse applications.

CONCLUSIONS AND RECOMMENDATIONS

Conclusions

Under the guidance of Associate Professor Dr. Nguyen Truong Thinh, the project team has completed and tested the design and development of a social service robot based on both nonverbal and verbal interaction. Experimental results confirm the system's stable operation, highlighting the following research outcomes:

 Development of a humanoid service robot that operates independently, aiming to attract humans in public areas

 Development of AI-powered image and audio processing software, incorporating natural language processing, computer vision, and neural networks, which enables the robot to engage effectively in both verbal and nonverbal interaction

 The robot is capable of providing essential information to users upon specific requests: reminders, recommendations, advice, assistance, and activity provision

 Completion of the robot's structural design together with its dynamic and kinematic models, integrated with fuzzy logic controllers so that the robot can operate autonomously, avoid obstacles, and navigate to its charging station

Recommendations

Based on the obtained results, we propose the following improvements to the system:

 Improve the accuracy of robot position estimation using modern technologies such as LiDAR and GPS, and integrate ROS to improve the robot's autonomous navigation performance

 Increase the training data of the chatbot to make the AI more intelligent

 Develop additional features for our social service robot

REFERENCES

[1] Trịnh Chất, Lê Văn Uyển, Tính toán thiết kế hệ dẫn động cơ khí, Tập 1/2, NXB Giáo Dục, Hà Nội, 2013

[2] Nguyễn Hữu Lộc, Giáo trình cơ sở thiết kế máy, NXB ĐHQG TP HCM, 2016

[3] Nguyễn Trường Thịnh, Giáo trình kỹ thuật robot, NXB ĐHQG TP HCM, 2014

[4] Nguyễn Thị Phương Hà, Lý thuyết điều khiển hiện đại, NXB ĐHQG TP HCM, 2016

[5] Lưu Trọng Hiếu, Lê Hồng Lâm, Nguyễn Hữu Hiếu, Nghiên cứu và mô phỏng thiết kế bộ điều khiển mờ cho robot di động, Tạp chí Khoa học và Công nghệ Đại học Đà Nẵng, số 1(86), trang 48-51, 2015

[6] Gregory Dudek, Michael Jenkin, Computational Principles of Mobile Robotics, Cambridge University Press, New York, 2010

[7] Kevin M Lynch, Nicholas Marchuk, Matthew L Elwin, Embedded Computing and Mechatronics with the PIC32 Microcontroller, Newnes Press, Waltham, 2016

[8] Saeed B Niku, Introduction to Robotics: Analysis, Control, Applications, John Wiley & Sons, Inc., the United States of America, 2010

[9] Mohammed Faisal, Ramdane Hedjar, Mansour Alsulaiman, Khalid Al-Mutib, Fuzzy Logic Navigation and Obstacle Avoidance by a Mobile Robot in an Unknown Dynamic Environment, International Journal of Advanced Robotic Systems, Vol 10, 2013

[10] Nguyen Truong Thinh, Tuong Phuoc Tho, Nguyen Dao Xuan Hai, Adaptive Fuzzy Control for Autonomous Robots Operating in Complex Environments, pp 216-223, Vol 10, Iss 5, International Journal of Mechanical Engineering and Robotics Research, 2021

[11] Cherry Myint, Nu Nu Win, Position and Velocity Control for Two-Wheel Differential Drive Mobile Robot, pp 2849-2855, Vol 5, Iss 9, International Journal of Science, Engineering and Technology Research, 2016

[12] Chian-Song Chiu, Teng-Shung Chiang, Yu-Ting Ye, Fuzzy Obstacle Avoidance Control of a Two-Wheeled Mobile Robot, pp 1-6, Proceedings of 2015 International Automatic Control Conference (CACS), 2015

[13] L Cuimei, Q Zhiliang, J Nan, W Jianhua, Human face detection algorithm via Haar cascade classifier combined with three additional classifiers, pp 483-487, 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), 2017

[14] A B Kanburoglu, F B Tek, A Haar Classifier Based Call Number Detection and Counting Method for Library Books, pp 504-508, 2018 3rd International Conference on Computer Science and Engineering (UBMK), 2018

[15] Katie Li, Ashutosh Tiwari, Jeffrey Alcock, Pablo Bermell-Garcia, Categorisation of visualisation methods to support the design of Human-Computer Interaction Systems, pp 85-107, Applied ergonomics 55, 2016

[16] Fan Zhang et al, MediaPipe Hands: On-device Real-time Hand Tracking, arXiv:2006.10214 [cs.CV], 2020

[17] George Sung et al, On-device Real-time Hand Gesture Recognition, arXiv:2111.00038v1 [cs.CV], 2021

[18] H Abdi, D Valentin, B Edelman, Neural Networks, No 124, Sage University Paper Series on Quantitative Applications in the Social Sciences, 1999

[19] Teja Kattenborn, Jens Leitloff, Felix Schiefer, Stefan Hinz, Review on Convolutional Neural Networks (CNN) in vegetation remote sensing, pp 24-49, Volume 173, Iss 2, ISPRS Journal of Photogrammetry and Remote Sensing, 2021

[20] Qiang Yang , Yu Zhang , Wenyuan Dai , Sinno Jialin Pan, Transfer Learning, Cambridge University Press, 2020

[21] Paula Fortuna, Sérgio Nunes, A Survey on Automatic Detection of Hate Speech in Text, pp 1-30, Vol 51, Iss 4, No 85, ACM Computing Surveys, 2018

[22] S Siami-Namini, N Tavakoli, A S Namin, The Performance of LSTM and BiLSTM in Forecasting Time Series, pp 3285-3292, 2019 IEEE International Conference on Big Data (Big Data), 2019

[23] Y Yu, X Si, C Hu, J Zhang, A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures, pp 1235-1270, Vol 31, No 7, Neural Computation, 2019

[24] Xinlu Li, Yuanyuan Lei, Shengwei Ji, BERT‑ and BiLSTM‑Based Sentiment Analysis of Online Chinese Buzzwords, pp 1-15, Vol 14, Iss 11, Journal Future Internet, 2022

[25] Dong Yu, Li Deng, Automatic Speech Recognition A Deep Learning Approach, Springer, United Kingdom, 2016

[26] Shiliang Sun, Chen Luo, Junyu Chen, A review of natural language processing techniques for opinion mining systems, pp 10-25, Vol 36, Iss C, Information Fusion, 2017

[27] International Federation of Robotics, Executive Summary World Robotics 2021 Industrial Robots, 4/2023

[28] Tim Hornyak, Fetch warehouse robots can work in pairs, link: https://www.computerworld.com/article/2916553/fetch-warehouse-robots-can-work-in- pairs.html, 4/2023

[29] Keily, [MWC 2015] Future Robot Introduces Service Robot "FURO", link: https://us.aving.net/news/articleView.html?idxno5813, 4/2023

[30] OSHbot, the hardware-store service robot that helps customers find the right products without asking staff for help, 4/2023

[31] Chris Griffith, Next time you go to hospital or the doctor, look out for a robot named Pepper helping out, link: https://bom.so/WxTotH, 4/2023

[32] John Biggs, The Savioke Robot Is Headed To A Hotel Near You, link: https://techcrunch.com/2016/02/15/the-savioke-robot-is-headed-to-a-hotel-near-you/, 4/2023

[33] Robotnews, Enon - Fujitsu’s Service Robot, link: https://robotnews.wordpress.com/2006/04/08/enon-fujitsus-service-robot-2/, 4/2023

[34] Huong Thu, Robot nhận diện cảm xúc ra đời tại Việt Nam, link: https://vnexpress.net/robot- nhan-dien-cam-xuc-ra-doi-tai-viet-nam-2184369.html, 4/2023

[35] Mai Anh, Robot phục vụ cafe được sản xuất tại Việt Nam, link: https://vnexpress.net/robot- phuc-vu-cafe-duoc-san-xuat-tai-viet-nam-3684797.html, 4/2023

[36] Vu Thuy, Robot Cô Ba 'chạy sô' nhà hàng ở Sài Gòn, link: https://tuoitre.vn/robot-co-ba- chay-so-nha-hang-o-sai-gon-20180118224319402.htm, 4/2023

[37] Elevate Enrich, The Golden Ratio is Literally Everywhere!, https://www.elevateenrichment.sg/the-golden-ratio-is-literally-everywhere/, 4/2023

[38] MORSE, Human Posture (kinect version), https://www.openrobots.org/morse/doc/stable/user/sensors/mocap_posture.html, 4/2023

[39] Tìm hiểu về phương pháp nhận diện khuôn mặt của Viola và Jones, 5/2023

[40] Ryukkkk, Step-by-Step Guide to Implement Machine Learning VI – AdaBoost, link: https://www.codeproject.com/Articles/4114375/Step-by-Step-Guide-to-Implement-Machine- Learning, 5/2023

[41] Mark Patrick, Introduction to neural networks, link: https://www.electronicsworld.co.uk/introduction-to-neural-networks/12544/, 5/2023

[42] Nguyen Minh Thanh, Convolutional neural network, link: https://rpubs.com/thanhleo92/407837, 5/2023

[43] Tran Duc Trung, Tổng quan về Convolutional Neural Network và ví dụ phân loại ảnh, 5/2023

[44] Naveenkumar Paramasivam, Converting Speech To Text Using Python, link: https://www.c- sharpcorner.com/article/speech-to-text-recognition-using-python/, 5/2023

[45] Pranav Bhounsule, Robotics Lec7: Inverse kinematics of differential drive car (Fall 2020), link: https://www.youtube.com/watch?v=6oor2CnSx8M&t20s, 5/2023

APPENDIX

The appendix contains the Vietnamese title page and the title blocks of the engineering drawings of the robot's 3D-printed and machined parts; the assembly parts list from these drawings is reproduced below.

Parts list (excerpt from the assembly drawing):

No | Name | Qty | Material
6 | Rear cover (Nắp đậy sau) | 1 | PLA plastic
7 | Shoulder coupling (Khớp nối vai) | 2 | PLA plastic
11 | Elbow coupling (Khớp nối cù chỏ) | 2 | PLA plastic
12 | Elbow link (Nối cù chỏ) | 2 | PLA plastic
15 | Wrist coupling (Khớp nối cổ tay) | 2 | PLA plastic
19 | Base housing cover (Nắp đậy vỏ bệ) | 1 | Composite
30 | Shaft coupling for GW4632 DC motor (Khớp nối trục GW4632 DC) | 3 |
31 | Motor support plate (Tấm đỡ động cơ) | 1 | CT45
46 | Dell touchscreen (Màn hình cảm ứng Dell) | 1 |
49 | Screen mounting bracket (Gá liên kết màn hình) | 1 |

