Doctoral dissertation in Engineering Mechanics: Swimming gait control of elongated undulating fins based on the central pattern generators

Background

With advances in biology, materials, and robotics, it is becoming possible to build robots that move like animals and swim like fish [1]–[3]. Such a robot is a biologically inspired underwater vehicle (BIUV) that moves by mimicking the actions of aquatic animals [4]. Instead of screw propellers, BIUVs are propelled by biomimetic fins, flippers, or bodies. Like traditional Autonomous Underwater Vehicles (AUVs), BIUVs can be used for many purposes, such as marine resource exploration, seabed charting, military surveillance, environmental assessment, sea exploration, mine detection, and scientific research [5], [6]. BIUVs also have unique features that make them better than traditional AUVs, especially in how well they move. Among the ways animals move underwater, fish swimming is a popular topic of study [7]. Over millions of years of evolution and natural selection, fish have refined their body structure and swimming motion for moving underwater. Most fish are reported to swim with an efficiency above 80% [8]. Some Thunniform fish can swim with more than 90% efficiency, while the average efficiency of today's screw propellers is between 40% and 50% [8]. Fish can also turn with a turning radius of only 10% to 30% of their body length while still moving at high speed. This ability is far beyond that of any current ship, which usually has a turning radius much larger than its hull length and a turning speed less than half of its average cruising speed [8].

Compared to screw propellers, the movement of fish fins or bodies can give underwater robots more maneuverability, which can be used to fine-tune their positions [9]. These abilities inspire new designs that make it easier for artificial systems to operate in and interact with water [10]. The underwater ecosystem is also an essential consideration in the study of BIUVs, especially since conditions for marine life have been deteriorating because of the widespread use of propellers, which produce loud noise in their wake. Fish, by contrast, swim almost silently. Engineers are therefore also pushed to develop vehicles that do not use rotary propellers [11].

Biomimetic propulsion systems for swimming machines can learn a lot from how fish move [12]–[14]. Interest in robotic fish has grown steadily over the last 20 years. The goal of fish robot research in robotics is simple: to turn the idea behind biomimetic fish into new underwater vehicles that can help people. To achieve this goal, researchers need to study many things, such as the mechanical design of fish robots, the materials of biomimetic propellers, actuation methods and actuators for underwater environments, sensors and electronic systems for underwater measurements, swimming control for highly efficient locomotion, and intelligent control strategies for autonomous manipulation [2], [3], [15], [16].

This dissertation focuses on the motion controller of a propulsion module that uses the swimming mechanism of Gymnotiform fish. A key factor characterizing the flexibility of this propulsion system is the time needed to change from one swimming gait to another, which previous studies have not addressed. In addition, because the robot is intended for underwater mine removal, it is necessary to find a swimming posture that gives maximum thrust without changing the underwater sound frequencies that the robot produces as it moves. Within the scope of the thesis, the research has the following limitations:

1- The effects of disturbances in the marine environment cannot be simulated.

2- In the analytical calculation, the thesis focuses on thrust in the translational direction and temporarily ignores the lateral and oblique force components.

3- The effects of vortices and of the narrow experimental tank are considered negligible and will be addressed in future studies.

Motivation

One of the inevitable consequences of modern warfare is the explosive remnants left behind and their lasting impact on people's quality of life. For fishermen in the coastal areas of Vietnam, there is always a potential risk from mines still lying on the seabed among moss and mud. Using divers to carry out detection and destruction tasks is not only labor-intensive but also very risky. In recent years, many Vietnamese Navy units have recognized this problem and have used underwater robots to carry out mine clearance survey work. However, a new problem arises from the environment where the mines are located, which is often mossy and full of ocean garbage (Figure 1-1). Propeller-driven robots become stuck and do not work effectively there, so a solution to these problems is needed.

Figure 1-1 Underwater mine (source: Internet)

The above situation, together with the operating mechanisms of fish robots developed around the world, motivated me to pursue research on underwater robots that have high stability and a rigid body on which devices can be mounted. The contributions of the thesis lay the foundation for building a complete underwater robot for surveying and clearing mines left on the seabed.

Literature review

Aquatic Locomotion Modes of Fish

This section discusses the swimming mechanisms used by fish. The purpose is to provide a concise and helpful overview of the existing literature on aquatic biomechanisms. Breder proposed a well-known classification scheme and nomenclature for fish swimming, which is followed here. Fish swim using either their body and/or caudal fins (BCF) or their median and/or paired fins (MPF). The latter is usually used at low speeds because it offers more maneuverability and propulsive efficiency than BCF movements, which provide more thrust and acceleration. Specific swimming modes are identified for BCF and MPF locomotion based on the propulsor and the type of movement (oscillatory or undulatory) used to generate thrust.

Swimming locomotion has been divided into two general types based on the temporal features of the movements [1]:

• Periodic (or steady or sustained) swimming, in which propulsive movements are repeated cyclically. Periodic swimming enables fish to cover relatively large distances at a relatively constant speed.

• Transient (or unsteady) movements, such as rapid acceleration, escape maneuvers, and turns. These movements typically last milliseconds and are used to capture prey or evade predators.

Biologists and mathematicians have historically concentrated their research on periodic swimming. This is primarily because experimental measurements of transient movements are more challenging to set up, repeat, and verify than those of sustained swimming. As a result, the primary focus of this section is periodic swimming. However, given the importance of transient movements in providing fish with unique abilities in the aquatic environment, and scientists' increased interest in describing them in recent years, references to transient propulsion are made wherever possible. The classification of swimming movements in this section is based on Breder's (expanded) nomenclature [2]. Recently, Breder's nomenclature for describing fish swimming was criticized as oversimplified and ill-defined [3]. A discussion of that criticism is beyond the scope of this section, but Breder's scheme remains important here because it is the foundation for a complete classification system for swimming, one that considers the kinematics, locomotor behavior, and muscle activity of fish. Most fish move by bending their bodies into a backward-traveling wave that pushes them forward. This type of swimming is called body and/or caudal fin (BCF) locomotion. When fish instead use their median and pectoral fins for swimming, it is called median and/or paired fin (MPF) locomotion. The term "paired" refers to both the pectoral and pelvic fins, but the latter, although useful for stabilization and steering, contribute little to forward propulsion and are not linked to any specific mode of locomotion in the literature-based classifications. Around 15% of fish families use non-BCF modes for routine propulsion, while a much larger proportion that normally rely on BCF modes use MPF modes for maneuvering and stabilization [3].

Another frequently used distinction in the literature is between undulatory and oscillatory motions: undulatory motions involve the passage of a wave along the propulsive structure, whereas oscillatory motions involve the propulsive structure swiveling on its base without exhibiting a wave formation. The two types of motion should be considered a continuum, as oscillatory motion can be derived from a gradual increase in the undulation wavelength. In addition, both types of motion result from the coupled movement of the smaller elements that make up the propulsor. Generally, fish that routinely use the same propulsion method exhibit similar morphology. However, differences in form exist and are related to each species' particular mode of life. Three basic optimal designs for fish morphology are derived from specializations for accelerating, cruising, and maneuvering [4], and they are closely related to the locomotion method used (Figure 1-2). Because these designs are largely mutually exclusive, no single fish performs optimally in all three functions. However, fish are generally not specialists in a single activity; instead, they are locomotor generalists that incorporate design elements from all three specialists to varying degrees. References [4] and [5] give more information about how function and morphology work together in swimming fish.

Figure 1-2 Diagram of swimming propulsors and swimming functions

Within the primary classification of MPF and BCF propulsion, additional swimming modes can be identified for each group using Breder's [2] original classification and nomenclature (Figure 1-3). These modes should not be thought of as separate groups. Instead, they should be seen as prominent points on a continuum. Fish may use more than one swimming mode, either at the same time or at different speeds. Median and paired fins are often used together to provide thrust, with varying contributions from each, which results in smooth trajectories. Many fish also swim in MPF modes, which offer greater maneuverability, and switch to BCF modes at higher speeds and for quick acceleration.

Figure 1-3 Swimming modes: (a) BCF, (b) MPF [17]

In undulatory BCF modes, the propulsive wave travels along the fish's body in the direction opposite to the overall movement and at a speed greater than the overall swimming speed. Figure 1-3 shows four undulatory BCF locomotion modes, each distinguished by the wavelength and amplitude envelope of its propulsive wave. The modes also differ in how they generate thrust. Two main physical mechanisms have been identified: an added-mass method and a lift-based (vorticity) method. The undulatory BCF modes have long been associated with the added-mass method, but vorticity mechanisms are now also known to contribute to propulsion in Carangiform and Subcarangiform swimmers.

Figure 1-4 Gradation of BCF modes from (a) Anguilliform through (b) Subcarangiform and (c) Carangiform to (d) Thunniform

Anguilliform mode is characterized by large-amplitude undulations that involve the entire body (Figure 1-4(a)). The body carries at least one full wavelength of the propulsive wave, so the lateral forces largely cancel each other out, which reduces the body's tendency to recoil. By reversing the propagation direction of the propulsive wave, many Anguilliform swimmers can swim both backward and forward. Backward swimming requires greater lateral forces and body flexibility [7]. The eel and the lamprey [8] are well-known examples of this widespread movement style. The Subcarangiform mode (for example, trout) exhibits similar motions, but the amplitude of the undulations is small at the front of the body and increases only toward the rear (Figure 1-4(b)).

This is even more pronounced in Carangiform swimming, where the body undulations are limited to the last third of the body length (Figure 1-4(c)) and a relatively stiff caudal fin provides thrust. In general, Carangiform swimmers are faster than their Anguilliform or Subcarangiform counterparts. However, because of the relative rigidity of their bodies, their turning and accelerating abilities are severely limited. Furthermore, because the lateral forces are concentrated at the posterior, the body has a greater tendency to recoil.

The Thunniform mode is the most efficient mode of locomotion to have evolved in the aquatic environment. Thrust is generated by the lift-based method, which allows high cruising speeds to be maintained for extended periods. It is regarded as a culminating point in the evolution of swimming designs because it is found in a diverse range of vertebrates (teleost fish, sharks, and marine mammals) that evolved under different circumstances. Among teleost fish, the Thunniform mode is seen in scombrids, which include tuna and mackerel. The caudal fin alone generates more than 90% of the thrust, and significant lateral motion occurs only at the caudal fin and the area near the narrow peduncle. The body is well streamlined to reduce pressure drag, and the caudal fin is stiff and high, with a crescent-moon shape known as lunate (Figure 1-4(d)). Despite the strength of the caudal thrust, the body shape and mass distribution ensure that recoil forces are efficiently minimized and that the thrust causes very little sideslip. The design of Thunniform swimmers is optimized for fast swimming in calm water; it is not well suited to other activities such as slow swimming, turning maneuvers, and quick acceleration from rest, or to turbulent water.

Ostraciiform locomotion is the only BCF mode that is entirely oscillatory. It is distinguished by the caudal fin oscillating in a pendulum-like fashion while the rest of the body stays essentially rigid. Fish that use the Ostraciiform mode are frequently encased in rigid bodies and use MPF propulsion to navigate their (usually complicated) environment [9]. When used as an auxiliary locomotion method, caudal oscillations can aid in producing thrust at higher speeds, keeping the body adequately rigid, and tracking prey [6]. Although Ostraciiform movement looks similar to Thunniform swimming, it lacks the hydrodynamic adaptations and refinements found in Thunniform swimmers and has low hydrodynamic efficiency.

Many fish routinely employ undulating fins as alternate propulsors, as well as for maneuvering and stabilization. These propulsion systems can also provide enough thrust to serve as the sole means of locomotion at generally low speeds. Certain fish can actively bend their median fin rays because each fin ray has its own muscle group (usually six muscles) that allows it to move with two degrees of freedom. The muscular system of paired fins is even more complex, allowing movements such as rotations of individual fin rays. Reviews of the literature on the structure and properties of teleost fins are provided in [6], [3]. Figure 1-5 illustrates how this adaptability has played a crucial role in the growth of the undulatory MPF modes.

Figure 1-5 Growth of the undulatory MPF modes [3]

Rajiform mode is found in fish such as rays, skates, and manta rays, whose swimming has been likened to the flight of birds. To generate thrust, vertical undulations are passed along the pectoral fins, which are extremely large, triangular, and flexible. The amplitude of the undulations increases from the anterior portion to the apex of the fin and then decreases again toward the posterior part. The fins may also be flapped up and down.

In Diodontiform mode, the animal moves forward by passing undulations down its wide pectoral fins. Up to two full wavelengths may be visible across the fins, and the undulations are often combined with flapping movements of the fin.

Swimming in Amiiform mode is accomplished through undulations of a dorsal fin (typically long in base), with the body axis in many cases kept straight while swimming. The African freshwater electric eel is the best example of this mode. It lacks the anal and caudal fins, while its dorsal fin has many fin rays (up to 200) and extends along most of the body length before tapering to a posterior point.

The swimming mechanism of fishes

Studies related to the swimming mechanism of fishes have been ongoing for many years. Researchers have been interested in understanding how fish are able to swim efficiently through water and how this knowledge can be applied to the design of underwater vehicles and robots.

One area of research has focused on the undulating motion of fish fins. In one study, the authors investigated the factors contributing to the propulsive thrust and efficiency of undulating fins for various swimming modes. They noted that tissue fibers in the fin of cuttlefish may store elastic energy during fin bending, allowing the fin to function as a harmonic oscillator and increasing the efficiency of the fins during locomotion [125].

Another area of research has focused on the morphology of fish and how it affects swimming performance. One study developed a model for carangiform swimmers that addressed the mechanics of both the foil (caudal fin) and the body. The authors noted that the long, narrow peduncle of carangiforms allows the caudal fin to be located several chord lengths away from the main body, which affects the flow field around the fish and its swimming performance [126].

Researchers have also developed mathematical models to describe the effect of sinusoidal inputs over a cycle of fish locomotion. Leonard's research derived an averaging-formula approach to describe this effect, which is appealing because fish locomotion often involves oscillatory motions of the fins and body [127]. Li and Saimek developed a Kalman filter-based estimation scheme that recovers the hydrodynamic potential from a set of pressure measurements along a fish's body [128].

Overall, studies related to the swimming mechanism of fishes have provided valuable insights into the design of underwater vehicles and robots. By understanding how fish are able to swim efficiently through water, researchers can develop more effective and efficient underwater technologies.

The Development of Vertebrate Locomotion

An organism, such as a vertebrate, is a dynamic system that changes continuously from the moment it is formed. However, even though its parts change, the organism keeps functioning in a similar way. Self-organization is the process by which the organism grows and changes coherently. This process is based on genetic, chemical, mechanical, and activity-dependent mechanisms.

All vertebrate brains go through the same stages as they develop. Researchers have found that synaptogenesis depends on the animal's activity, and that this process happens both before and after the animal is born. The adult pattern of brain connections is therefore shaped by processes that depend on how the brain is used. In this sense, function also determines structure.

It is well known that all vertebrate embryos move before they are born [19], [20]. However, the effects of the pattern of prenatal behavior have only been fully understood in the last few years [19], [21]–[23]. It has been suggested that these prenatal movements could be how the nervous system connects sensory inputs to specific patterns of muscle activity. Prenatal movements can be broken down into the following stages [19], [20], [23]:

Pre-Motile Stage: In species like Xenopus laevis, when a fine hair touches the head during the pre-motile stage, right before the animal moves on its own, the animal bends away from the stimulus. Roberts [23] attributes this bending to a reflex pathway.

Bending of the head: In the next stage, the head bends forward in a way that is not coordinated with the rest of the body. The movement starts in the neck and shows no coordination. Unlike in swimming, a bend to the left is not regularly followed by a bend to the right; the direction of bending appears to be random. This lack of coordination shows that the animal is functionally split into two parts, one on each side. Only the head and neck are involved when the animal starts to move. The movement begins while the spinal cord is still being built. Over time, the animal's whole body starts to bend this way.

C-Bending: The forward bending of the head gives way to a "C" bend. The shape is made when the animal bends very far in one direction and then very far in the other. The two sides of the animal now work together, which is different from early head flexion.

S-Bending: The "S" bending replaces the "C" bending. As in the letter "S", two bend points become apparent. During the "C" bending phase, both sides of the spinal cord work simultaneously. During the "S" bending phase, we see the first signs that the spinal cord is starting to separate into different parts that work together.

S-Wave Traveling and Swimming: Eventually, the "S" bend gives way to a moving "S" wave and a "swimming" motion. In the traveling "S" wave, an "S"-shaped wave moves from the head toward the tail. In short, we can infer that development is made up of separate events that happen in a strict order. How the nervous system is organized anatomically also depends on how the body moves.

Locomotion control for elongated undulating fin

Many studies of bio-fish robots have considered the diversity of fish species. These studies pointed out that many significant factors affect the hydrodynamics of bio-fish robots. One such factor is the swimming pattern that enables bio-fish robots to perform complex operations such as turning, swaying, twisting, and curving. Research on locomotion control of robotic fish with undulating fins falls into two groups: offline swimming gait control and online swimming gait control.

Several studies used a sinusoidal kinematic equation to generate the undulating oscillatory motion of bio-fish robots. This locomotion control strategy can provide various swimming patterns by predefining the amplitude envelope, oscillatory frequency, and phase lag, which are the kinematic parameters of the sinusoidal generator. These works frequently model a specific swimming posture observed in fish, on the assumption that it is suitable for finding practical linear swimming rules. K.H. Low and colleagues have published a series of papers on robots that imitate fish whose undulating fins generate the propulsive motion. Most of these studies use a sinusoidal motion law. Some articles go into more detail about tuning the fin ray parameter set to get better thrust [10]–[14]. Mohsen Siahmansouri et al. extend the concepts of phase difference angle and thrust direction to further develop the motion controller for fish robots with a sine wave oscillator [15]. Our group is also working on propulsion systems that use undulating fins. Almost all of them use sine wave oscillators to control coordinated servo motors [16]–[18]. In general, however, locomotion control for a biomimetic robot is more than the fixed control of a single swimming posture. Adapting flexibly to the water environment requires a better control solution, one in which the swimming shape can adjust parameters such as frequency and amplitude smoothly.
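As a rough illustration of the offline, sinusoidal approach described above, the following Python sketch computes the deflection angle of each fin ray from a predefined amplitude envelope, oscillation frequency, and phase lag. The function name, the number of rays, and all parameter values are assumptions chosen for illustration; they are not taken from the cited studies.

```python
import math


def fin_ray_angles(t, amplitudes_deg, frequency_hz, phase_lag_rad):
    """Deflection angle (degrees) of every fin ray at time t (seconds).

    Ray i follows A_i * sin(2*pi*f*t - i*phase_lag), so neighbouring rays
    are shifted by a constant phase lag and a wave travels along the fin.
    """
    omega = 2.0 * math.pi * frequency_hz
    return [a * math.sin(omega * t - i * phase_lag_rad)
            for i, a in enumerate(amplitudes_deg)]


# Hypothetical parameters: 8 rays, uniform 30-degree envelope, 1 Hz, 45-degree lag.
envelope = [30.0] * 8
for t in (0.0, 0.25, 0.5):
    angles = fin_ray_angles(t, envelope, 1.0, math.pi / 4)
    print(t, [round(a, 1) for a in angles])
```

Because the angles are computed directly from the current parameter values, an abrupt change of frequency or amplitude in such a generator produces a jump in the commanded angle, which is one reason a smoother online controller is attractive.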

1.3.4.2 Online swimming gait controller using a central pattern generator

Orjan Ekeberg et al. laid the foundation for applying a central pattern generator (CPG) to control fish locomotion. According to biologists who study fish, the fin rays are coordinated by the spinal nervous system, which is not directly related to the brain. The model proposed by Orjan Ekeberg et al. realizes this idea. By changing only a few inputs, like those emitted by the central nervous system, the swimming postures could be switched flexibly without intervening deeply in the low-level details of fin motion control [19]. CPGs were also applied early on to motion control in simulations of humanoid robots, salamanders, and similar systems, using different oscillators. In 2006, Dai-bing Zhang et al. applied CPG control to a fish robot, building on the model of Orjan Ekeberg. Zhang proposed a sine-cosine oscillator, which the authors consider reasonable and flexible for fish movement and superior to traditional oscillators such as Matsuoka and Hopf [20]. In the same year (2006), a study on a CPG for motion control of a boxfish robot was published. Daisy Lachat et al. do not use classical oscillators but create a separate oscillator for each movement joint of the boxfish. The research objective is only to demonstrate the flexibility of movements such as turning the head and waving the fish body in response to different amplitude and frequency signals. The study presents neither the details of the CPG controller nor the parameters that determine the quality of the locomotion controller [21]. Drawing on a synthesis and evaluation of research on CPG applications in robotics, Auke Jan Ijspeert and colleagues in 2008 presented specific steps as a principle for designing a locomotion control CPG. The main steps include [22]:

(1) The CPG's overall design and architecture. This includes the number and type of oscillators or neurons in the circuit. It also involves selecting either position control or torque control for the robot.

(2) The type and topology of couplings. These determine the conditions for synchronization between oscillators and the resulting gaits, i.e., the stable phase relationships between oscillators.

(3) The waveforms. These determine which trajectory each joint angle performs during a cycle. The waveforms depend on the shape of the limit cycle produced by the chosen (neural) oscillator, and they can be transformed by the addition of filters.

(4) The effect of input signals, i.e., how control parameters can modulate essential quantities such as frequency, amplitude, phase lags, or waveforms.

(5) The effect of feedback signals, i.e., how feedback from the body affects the activity of the CPG (for instance, accelerating or decelerating it, depending on environmental conditions).

The fact that these five design axes are all highly interconnected presents a significant challenge when developing CPGs. These steps later became the standard procedure for developing locomotion control.
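To make the design axes above more concrete, the sketch below simulates a small chain of coupled phase oscillators in Python. It uses an amplitude-controlled phase oscillator form that is common in the CPG literature; this choice, the nearest-neighbour coupling, the explicit Euler integration, and all numerical values are assumptions made for illustration and are not claimed to be the model used in this dissertation or in the cited works.

```python
import math


def simulate_cpg(n=4, freq_hz=1.0, target_amp=1.0, phase_lag=math.pi / 4,
                 coupling=4.0, conv_rate=10.0, dt=0.001, t_end=10.0):
    """Integrate a chain of n coupled phase oscillators with explicit Euler.

    Phase:     dtheta_i/dt = 2*pi*f + sum_j w * r_j * sin(theta_j - theta_i - phi_ij)
    Amplitude: d2r_i/dt2   = a * (a/4 * (R - r_i) - dr_i/dt)
    Output:    x_i         = r_i * cos(theta_i)
    """
    theta = [0.0] * n   # oscillator phases
    r = [0.0] * n       # oscillator amplitudes
    dr = [0.0] * n      # amplitude rates
    for _ in range(int(t_end / dt)):
        dtheta = []
        for i in range(n):
            rate = 2.0 * math.pi * freq_hz
            if i > 0:       # coupling from the previous oscillator
                rate += coupling * r[i - 1] * math.sin(theta[i - 1] - theta[i] - phase_lag)
            if i < n - 1:   # coupling from the next oscillator
                rate += coupling * r[i + 1] * math.sin(theta[i + 1] - theta[i] + phase_lag)
            dtheta.append(rate)
        for i in range(n):
            ddr = conv_rate * (conv_rate / 4.0 * (target_amp - r[i]) - dr[i])
            dr[i] += ddr * dt
            r[i] += dr[i] * dt
            theta[i] += dtheta[i] * dt
    outputs = [r[i] * math.cos(theta[i]) for i in range(n)]
    return theta, r, outputs


theta, r, outputs = simulate_cpg()
lags = [theta[i] - theta[i + 1] for i in range(len(theta) - 1)]
print("phase lags (rad):", [round(x, 3) for x in lags])
print("amplitudes:", [round(x, 3) for x in r])
```

In this sketch the number of oscillators corresponds to axis (1), the coupling weights and phase lags to axis (2), the cosine output to axis (3), and the frequency and target amplitude to the input signals of axis (4). Feedback, axis (5), is not modelled.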

To control the movement of a fish that swims with a combination of pectoral, body, and caudal fins, Yonghui Hu and his colleagues built a CPG network with Matsuoka oscillators. The results show that the controller produces smooth motion when the frequency or amplitude changes. The authors consider this necessary to protect the servo motors from damage.

In addition, the article uses a genetic algorithm to find the optimal swimming posture for achieving the highest speed [23].

Online swimming gait control has also been tested and applied on a turtle robot. Wei Zhao and his colleagues developed a four-legged turtle robot with an online controller in 2008. In this study, the authors developed a CPG-based artificial neural network with a loop connection to control the movement of the four legs in a coordinated manner (Figure 1-6). Simulations and experiments demonstrated that the turtle robot, using only its four legs, moves smoothly during transitions in frequency and amplitude. Diving, floating, and turning on the spot are also carried out with the same flexibility as a real turtle.

Figure 1-6 CPG with a loop connection to control the movement of the four legs of a turtle-like underwater robot [24]

Researchers from the Chinese Academy of Sciences spent two years investigating the movement mechanism of a specially designed and built amphibious robot. With two wheels and a flexible body, this robot can walk on land and swim in water like a real fish. The most notable aspect of the work, reported in two publications, is a CPG-based motion controller that coordinates the movements of the body joints and the two front wheels. The neural nodes of the CPG network are connected in series, with associative branch links added. As a result, the robot's movements as it transitions from water to land and vice versa are smooth and rely on only a few parameters from the high-level controller. A sliding controller was also added to the locomotion controller in the second version, which the authors believe is an improvement over the first. As a result, when walking on land, the robot's turning radius has been significantly increased by this method [24], [25].

Figure 1-7 Configuration of the formulated CPG model: (a) simplified structure, (b) CPG network configuration [25]

Chen Wang and colleagues proposed a new CPG-based locomotion control method and used a robotic fish model to demonstrate its effectiveness. The proposed CPG model, a coupled linear oscillator system, has several advantages over existing models. First, the CPG model is simpler because it uses linear differential oscillators instead of nonlinear ones, which makes it easier to put into practice [26]. Second, the adaptive structural parameters keep the dynamic performance at a satisfactory level.

In addition, because the parameters of the CPG model are presented explicitly, its applications are easier to understand. Based on their experiments, the authors conclude that the CPG model is well suited to the locomotion control of a three-jointed robotic fish, and they believe it can represent all biomimetic multi-joint underwater robots with link structures. They also point out a limitation that their ongoing work addresses:

- The paper does not consider the stability of the locomotion because the authors chose locomotion speed as the only optimization goal. To improve stability and transient performance, they are experimenting with various optimization methods.

Discussion & Objective of the Dissertation

These earlier studies show that CPGs have been applied successfully to the locomotion control of biomimetic robots. However, most of these studies rely on trial-and-error data fitting to adjust a control parameter of the CPG model called the convergence rate. Increasing the convergence rate can reduce the time needed to reach the limit cycle; however, it can also increase the oscillatory error, defined as the difference between the intrinsic amplitude of the CPG and the maximum amplitude envelope of the CPG's output. Because the convergence rate has not been optimized, this issue remains a challenge for researchers. In terms of parameter optimization, several studies used the particle swarm optimization (PSO) algorithm to seek CPG parameters that minimize the difference between the desired oscillatory waveform and the generated output of the CPG [47], to reduce the number of control parameters [48], and to refine the feature parameters of the CPG [49].
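The two quantities discussed above, the time needed to reach the limit cycle and the oscillatory error, can be measured numerically for a given convergence rate. The Python sketch below does this for a Hopf oscillator integrated with an explicit Euler step. The oscillator type, the integration scheme, the 2% settling tolerance, and all numerical values are assumptions for illustration. The sketch only shows how the measurements could be made; how the oscillatory error varies with the convergence rate depends on the oscillator and on the integration scheme.

```python
import math


def hopf_settling(conv_rate, mu=1.0, freq_hz=1.0, dt=0.0005, t_end=10.0, tol=0.02):
    """Return (settling_time, oscillatory_error) for one convergence rate.

    Hopf oscillator: dx/dt = k*(mu - r^2)*x - omega*y
                     dy/dt = k*(mu - r^2)*y + omega*x,  with r^2 = x^2 + y^2.
    settling_time     : first time the radius is within tol of sqrt(mu)
    oscillatory_error : |peak of |x| over the last period - sqrt(mu)|
    """
    omega = 2.0 * math.pi * freq_hz
    target = math.sqrt(mu)          # intrinsic amplitude of the limit cycle
    x, y = 0.1, 0.0                 # start close to the origin
    settling_time = None
    peak = 0.0
    for step in range(int(t_end / dt)):
        r2 = x * x + y * y
        dx = conv_rate * (mu - r2) * x - omega * y
        dy = conv_rate * (mu - r2) * y + omega * x
        x += dx * dt
        y += dy * dt
        t = (step + 1) * dt
        if settling_time is None and abs(math.hypot(x, y) - target) <= tol * target:
            settling_time = t
        if t >= t_end - 1.0 / freq_hz:   # track the output peak over the last period
            peak = max(peak, abs(x))
    return settling_time, abs(peak - target)


for k in (1.0, 10.0):
    t_settle, err = hopf_settling(k)
    print(f"convergence rate {k}: settling time {t_settle:.3f} s, error {err:.4f}")
```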

The studies mentioned above on CPG-based bio-fish robots have not optimized the convergence rate (the coefficient that characterizes the time needed to change swimming form). Inspired by studies that apply RL to CPGs, this dissertation proposes a reinforcement learning-based optimization of a locomotion controller that uses a CPG network for an elongated undulating fin.

In addition to controlling motion, central pattern generator (CPG) networks [6–8] are used to make bio-robotic fish move rhythmically. Because of the slow response time, a genetic algorithm is used in [13] to improve the CPG with respect to the thrust of the fish robot. In [14], a CPG model is used to produce the undulating motion pattern and find the critical factor affecting propulsion. Although these mathematical models have been used successfully to set up CPG-based motion controllers, improving the propulsion force of a robotic fish with a CPG network is still a significant challenge. Some researchers have used optimization algorithms to select suitable parameters. To obtain the desired swimming pattern, a Hopf oscillator-based CPG network synthesized its parameters by following learning rules [15]. The modified CPG in [16] combines Andronov–Hopf oscillators with an artificial neural network (ANN) to produce different swimming motions. Heuristic search has been used extensively in recent years to tune the parameters of CPG networks. In [17], a genetic algorithm (GA) generates rhythms from CPG models by setting the weights of the oscillator couplings. In [18] and [19], the authors use particle swarm optimization (PSO) to find the best parameters of a Hopf oscillator-based CPG for better propulsion. These metaheuristic algorithms find CPG parameters effectively, but they often get stuck in local optima. This dissertation investigates a new idea, differential particle swarm optimization (D-PSO), to improve the optimization. The amplitude values of the CPG network are tuned to increase the average propulsive force of the undulating fin robot and achieve faster movement.

Outline of the Dissertation

The dissertation is organized into six chapters:

Chapter 1: reviews scientific publications in the field to identify the direction of the dissertation's contributions

Chapter 2: builds a CPG-based motion controller for a specific fish-robot propulsion module and, in parallel, models the thrust of that module

Chapter 3: optimizes, with a reinforcement learning algorithm, the coefficient that characterizes how fast the locomotion controller switches between swimming patterns (the convergence rate)

Chapter 4: selects, with a swarm optimization algorithm, the set of amplitude parameters of the motion controller that maximizes thrust while keeping the frequency unchanged

Chapter 5: tests the ability to change swimming pattern flexibly with the optimized convergence-rate coefficient found in Chapter 3, and measures the thrust produced by the best parameter set found in Chapter 4

Chapter 6: Conclusions and Future Research

DESIGN SWIMMING GAIT CONTROLLER AND THRUST MODELING

This chapter proposes a locomotion controller, inspired by the black knifefish, for an undulating elongated fin robot. The controller is built on a modified CPG network of sixteen coupled Hopf oscillators with feedback of the angle of each fin-ray. With the proposed controller, the undulating elongated fin robot can transform between swimming patterns naturally. The controller also allows the swimming pattern parameters, namely the amplitude envelope and the oscillatory frequency, to be configured to perform various swimming patterns.

Elongated undulating fin description

The elongated undulating fin comprises sixteen adjacent oblique fin-rays interconnected by a flexible membrane. Each fin-ray is driven by an RC servo motor that lets it sway about a rotary joint fixed to a supporting frame. Accordingly, each fin-ray acts as a rocker bar with a limited swing angle, and the phase difference between two adjacent fin-rays is regarded as the phase lag angle. The magnitude of the propulsive force can be adjusted by changing one of the kinematic parameters: amplitude envelope, oscillatory frequency, or swimming pattern. To switch between forward and reverse motion, the fin changes the sign of the phase lag angle. Additionally, to avoid a counter-torque on the fin, the number of oscillation wavelengths should be an even number. Traditionally, a swimming gait based on sine generators is implemented as follows:

Fish propel themselves by rhythmically undulating and/or oscillating their fins and/or bodies. This motion is described by Lighthill's elongated-body theory [34]. According to Lighthill's theory, thrust is generated by the formula:

$f_e(x)$: the envelope equation (see Figure 2-1)

$f$: the oscillation frequency of a body part, which also determines the speed of the traveling wave

Figure 2-1 Waveform commonly used by undulatory swimming machines [35]

The key point of the sine-generator approach to swimming gaits is to find the appropriate time-dependent control functions.

Figure 2-2 Parallel linkage mechanisms are used to make the fish robots move

The amplitude $A_i$ is determined by the following equation:

$f_e(x)$: the envelope equation (see Figure 2-1)

$l_i$: the length of the fin ray

$x_i$: the longitudinal position of the $i$-th joint

The instantaneous angle of a fin ray, $\theta_i(t)$, is given by the following equation:

$\varphi_{d,i}$: the phase difference between adjacent fin rays

In a water environment with many fluctuating factors, the frequency and amplitude must be changed continuously to achieve the control purpose. With the above formulas, consider a simulation in which the amplitude and frequency of a sine motion controller for fish fins are varied.

Figure 2-3 Amplitude and frequency changes observed at the switching time $t^*$

The common sine generator has the disadvantage that a sudden change in amplitude or frequency disrupts the generated motion or makes it discontinuous. Therefore, the locomotion of the fish robot is not smooth when it changes movement state. This drawback is not mentioned in [36]–[41]. However, other researchers identified this shortcoming and developed smoother CPG-based oscillation controllers.
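The discontinuity of an open-loop sine generator can be illustrated with a minimal sketch. It assumes the usual form θ(t) = A sin(2πft) for a single fin ray with zero phase lag; the function names and the numerical values are hypothetical and chosen only for illustration, not taken from the dissertation.

```python
import math


def sine_generator(t, amplitude, frequency):
    """Open-loop sine generator: theta(t) = A * sin(2*pi*f*t) (assumed form)."""
    return amplitude * math.sin(2.0 * math.pi * frequency * t)


def jump_at_switch(t_star, before, after):
    """Step in the output when (A, f) are switched instantaneously at t_star."""
    a0, f0 = before
    a1, f1 = after
    return sine_generator(t_star, a1, f1) - sine_generator(t_star, a0, f0)


# Hypothetical switch: (A = 1.0, f = 1 Hz) -> (A = 0.5, f = 2 Hz) at t* = 1.3 s.
t_star = 1.3
step = jump_at_switch(t_star, before=(1.0, 1.0), after=(0.5, 2.0))
print(f"output step at t* = {t_star} s: {step:+.4f}")
```

Because the generator has no internal state, the commanded angle jumps by `step` in a single sample whenever the parameters change away from a zero crossing, which a servo-driven fin ray cannot follow smoothly.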

Swimming gait controller for elongated undulating fin based on CPGs

Oscillating neuron models

The CPG is a network of oscillators that can produce rhythmic patterns for biomimetic robots. Several popular kinds of oscillators, such as Wilson-Cowan, Kuramoto, Matsuoka, Amplitude-Controlled Phase, Rowat-Selverston, and Hopf, have been applied successfully to generate the walking, swimming, and flapping gaits of biomimetic robots. However, the Hopf oscillator has been shown to perform well and to be more adaptable than the others [42]. Therefore, this research employs this type of oscillator to construct a modified CPG that generates the rhythmic locomotion of the elongated undulating fin. A typical structure of the Hopf oscillator is shown in Figure 2-4.

Figure 2-4 Typical structure of Hopf oscillator

The dynamics of the Hopf oscillator are expressed by the following differential equations:

$$\begin{aligned} \dot{u}(t) &= k\left(A^2 - u^2(t) - v^2(t)\right)u(t) - 2\pi f\,v(t) \\ \dot{v}(t) &= k\left(A^2 - u^2(t) - v^2(t)\right)v(t) + 2\pi f\,u(t) \end{aligned} \quad (2\text{-}4)$$

where $u$, $v$ are the time-variant state variables of the oscillator; $A$ is the amplitude of the steady-state oscillation; $f$ is the intrinsic frequency; and $k$ is the speed of convergence to the limit cycle ($k > 0$).

For comparison with the traditional sinusoidal generator, a single Hopf oscillator is simulated in the same manner, as shown in Figure 2-5.

Figure 2-5 Output of Hopf oscillator in abrupt change of amplitude and frequency

It can be observed from Figure 2-5 that the output of the Hopf oscillator makes a smooth transition when both the amplitude envelope and the oscillatory frequency change abruptly at an arbitrary time $t^*$. In addition, the Hopf oscillator of Eq. (2-4) converges quickly to the limit cycle. Even when starting from different arbitrary initial points, the output converges to a stable limit cycle with amplitude $A$. The convergence speed can be controlled by adjusting $k$ in Eq. (2-4): the output converges to the limit cycle more rapidly as $k$ increases, regardless of abrupt changes of amplitude and intrinsic frequency. Simulation results of the Hopf oscillator with eight different initial points for each scenario are illustrated in Figure 2-6. The output of the Hopf oscillator converges to the limit cycle rapidly.
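The convergence behaviour described here can be checked with a small numerical sketch. It integrates the standard two-state Hopf oscillator, u' = k(A² − u² − v²)u − 2πfv and v' = k(A² − u² − v²)v + 2πfu, with forward Euler. The step size, initial points, tolerance, and function names are assumptions made for illustration; the dissertation's own simulation settings may differ.

```python
import math


def hopf_step(u, v, amplitude, frequency, k, dt):
    """One forward-Euler step of the Hopf oscillator."""
    gain = k * (amplitude ** 2 - u * u - v * v)
    omega = 2.0 * math.pi * frequency
    du = gain * u - omega * v
    dv = gain * v + omega * u
    return u + dt * du, v + dt * dv


def radius_after(u0, v0, amplitude=1.0, frequency=1.0, k=10.0,
                 dt=0.001, t_end=5.0):
    """Distance from the origin after t_end seconds, starting at (u0, v0)."""
    u, v = u0, v0
    for _ in range(int(round(t_end / dt))):
        u, v = hopf_step(u, v, amplitude, frequency, k, dt)
    return math.hypot(u, v)


def settling_time(k, u0=0.1, v0=0.0, amplitude=1.0, frequency=1.0,
                  dt=0.001, tol=0.05, t_max=20.0):
    """First time at which the radius is within tol*A of the amplitude A."""
    u, v = u0, v0
    for n in range(int(round(t_max / dt))):
        if abs(math.hypot(u, v) - amplitude) <= tol * amplitude:
            return n * dt
        u, v = hopf_step(u, v, amplitude, frequency, k, dt)
    return float("inf")


if __name__ == "__main__":
    for start in [(0.1, 0.0), (2.0, -1.5), (0.0, 0.01)]:
        print(start, "-> radius", round(radius_after(*start), 3))
    for k in (1.0, 5.0, 10.0):
        print("k =", k, "settles in about", round(settling_time(k), 2), "s")
```

In this sketch, different starting points end up on a circle of radius close to A, and a larger k shortens the settling time, in line with the statement above.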

In this thesis, the dynamic swimming motion is analyzed using the Computational Fluid Dynamics (CFD) method, with the Hopf model used to generate the swimming behavior in the simulated environment.

Figure 2-6 Convergence to limit cycle of Hopf oscillator

Coupling Schemes

In the modified CPG network, each of the two terminal oscillators is affected by only one adjacent oscillator. Without loss of generality, the nonlinear function describing the modified CPG network is given as follows:

$$\dot{X}_i = F(X_i) + P_i = \begin{bmatrix} k\left(A_i^2 - u_i^2 - v_i^2\right)u_i - 2\pi f\,v_i + p_{u,i} \\ k\left(A_i^2 - u_i^2 - v_i^2\right)v_i + 2\pi f\,u_i + p_{v,i} \end{bmatrix} \quad (2\text{-}5)$$

where $X_i \triangleq [u_i \; v_i]^T$ is the state vector of the $i$-th oscillator; $F(X_i)$ represents a nonlinear function; and $P_i \triangleq [p_{u,i} \; p_{v,i}]^T$ is a perturbation vector.

To see how two oscillators work together, first consider an arbitrary phase difference held between them. Then consider one-way coupling: oscillator 1 perturbs oscillator 2, and there is no backward perturbation. In this case, $P_1$ is zero and $P_2$ is an unknown vector that must be determined. The coupling scheme is shown in Figure 2-7.

Figure 2-7 Single-directional coupling between two oscillators

Because $u_2$ is taken as the output of oscillator 2, it is not perturbed directly by $u_1$ (the output of oscillator 1). Instead, the perturbation is applied indirectly: the output of oscillator 1 influences the state $v_2$ of oscillator 2, as shown in Figure 2-8, and $u_2$ is then affected through its internal coupling with $v_2$.

If there is no coupling between the two oscillators and both start in the same state, $u_1$ always equals $u_2$. With coupling, oscillator 2 can be assumed to be synchronized with oscillator 1 when its phase equals $\varphi_1 + \delta$. The intermediate angle $\delta$ is introduced to adjust the phase difference $\varphi_d$.

"The coupling" term on oscillator two can then be defined by:

Where 𝜺 is a positive constant that determines the coupling strength

In a polar coordinate system, the perturbation component acting in the phase direction is tangential to the limit cycle (Figure 2-8).

Figure 2-8 Illustration of perturbation in the direction of phase angle φ

For the unperturbed oscillator 2, this direction is given by:

The perturbation on the phase is given by:

By substituting Eqs. (2-7) and (2-8) into (2-9), $p_{\varphi,2}$ can be expressed in terms of $\delta$ and $\varphi_2$:

In the phase-locked case, the two oscillators evolve with a stable phase difference $\varphi_{d,2}$:

After a short transient (from 0 to $t_0$), the two oscillators are synchronized with a constant phase difference $\varphi_{d,2}$. Therefore, in steady state:

The integration of Eq. (2-13) over time $t$ can be done implicitly by integrating over $\varphi_2$ in the steady state of the system [43].

If the system reaches the steady state at $\varphi_2 = \Phi_0$ (at $t = t_0$), phase locking holds for $\varphi_2 = \Phi > \Phi_0$ ($t > t_0$). Thus, Eq. (2-13) can be rewritten as:

Substituting Eq. (2-12) into Eq. (2-14) gives:

By solving Eq (2.15), the intermediate angle δ is:

2) = 𝜀(𝑐𝑜𝑠𝜑 𝑑,2 + 𝑣 1 𝑠𝑖𝑛𝜑 𝑑,2 ) (2-17)

Under one-directional coupling, the coupling terms are given by:

Coupling between two oscillators is not necessarily one-directional [44]. The output of oscillator (i+1) can also influence the evolution of oscillator (i). In this case, oscillator (i) is coupled with oscillator (i+1) with the opposite phase difference (Figure 2-9). The coupling terms $\mathbf{P}_i$ and $\mathbf{P}_{i+1}$ can be obtained in the same way as in the derivation of Eq. (2-7), which means:

Figure 2-9 Mutual coupling between two oscillators

The coupling formula for two mutually coupled oscillators can be expressed as follows:

The movement of an animal, or of an animal-like robot, may involve multiple oscillatory joints that must move in a coordinated way.

A single oscillator can therefore be subject to several couplings, as in Figure 2-10.

Figure 2-10 Couplings among three oscillators

Figure 2-10 shows a chain structure of three serially connected oscillators. Oscillator i is influenced by the outputs of oscillator i-1 and oscillator i+1.

In this case, 𝐏 𝑖−1 , 𝐏 𝑖 and 𝐏 𝑖+1 can also be defined by:

In both invertebrates and vertebrates, several topological couplings between the joints allow the muscles to function properly, with both excitatory and inhibitory roles. The actual CPG of an animal is a complex network of neurons. To reproduce the CPG for the control of a biomimetic robot, the coupling connections must be simplified and classified into four main topological structures: chain coupling, radial coupling, ring coupling, and full coupling. Each topological structure has properties that correspond to the biological characteristics of particular species. For example, chain coupling is mainly used to generate the movement of swimmers, while full coupling is often applied to rhythm generation for legged robots, because all the legs must be coupled jointly to achieve smooth motion under environmental changes.

Biologically, the long undulating fin consists of a series of rays. Environmental influences perturb the movement of each fin ray arbitrarily, and each perturbation affects only the adjacent fin rays. To produce the undulating motion of the long fin, this study constructs a chain of sixteen oscillators with bidirectional perturbations. Each oscillator excites one fin ray, and the effect of each fin ray on its adjacent fin rays is modeled through the bidirectional perturbation (Figure 2-10).

A simulation comparing a phase reversal from $-\pi/3$ to $\pi/3$ under one-way and two-way coupling gives the results shown in Figure 2-11.

Figure 2-11 Output u of oscillators CPG1 and CPG3 for the two types of coupling

According to the simulation results, the phase reversal takes less time with two-way coupling (9 s) than with one-way coupling (13.4 s). Therefore, two-way coupling is chosen to control the fin-ray oscillation.

Configurations of Oscillators

In both invertebrate and vertebrate organisms, several topological couplings between the joints allow the muscles to work properly, providing both stimulation and inhibition. The actual CPGs of animals are complicated networks with large numbers of neurons. To replicate the CPG for controlling biomimetic robots, it is necessary to simplify the coupling connections and categorize them into four main topological structures: chain coupling, radial coupling, ring coupling, and fully connected coupling [45].

• Radial coupling: a topology in which a single CPG affects multiple peer CPGs (see Figure 2-12).

Figure 2-12 Radial type CPG coupling

This coupling topology is usually suitable for humanoid robots, whose legs act as peers: signals from the spinal cord direct the same actions in both legs, and the two legs do not signal each other directly.

• Ring coupling: a structure in which the CPGs act in turn and are closed in a cycle. This structure is often associated with four-legged hopping robots, as in "Moving Control of Quadruped Hopping Robot Using Adaptive CPG Networks".

• Fully connected coupling: the most complex type of CPG network, which exhibits many biological properties of primates. Because the CPG units interact with one another, constructing the network requires assigning weights to each node.

Figure 2-10 Fully connected coupling

• Chain coupling:

This type of connection is the most common for fish robots, because each node is affected only by the previous node and the next node (Figure 2-14). Chain coupling is therefore especially suitable for reproducing the natural movement of fish robots.

Chain coupling has two primary forms, one-way and two-way (Figure 2-15, Figure 2-16). One-way chain coupling is usually applied to fish robots that use the body to move. Two-way chain coupling is most suitable for fish robots that use a fin mechanism along the body, because each fin ray affects the fin rays immediately before and after it.

Figure 2-15 One-way chain coupling

Figure 2-16 Two-way chain coupling

Swimming gait using Multiple Coupled CPG Oscillators

From the analysis in the sections above, the suitable CPG model for the elongated undulating fin robot is determined as follows:

Based on the references and the analysis of oscillators commonly applied to CPGs, the Hopf oscillator is chosen for this case (motion of fins along the body) because of its high stability and good recovery from disturbances in the underwater environment.

In fish with fins running along the body, the fin rays are peers, but each fin ray interacts with the fin rays immediately before and after it, so multiple coupling is best suited to producing realistic fish-like movement.

In the biological structure of the elongated undulating fin, the fin rays are arranged in a long row, and natural movement follows a chain coupling structure. The design in the following sections complies with the chain coupling structure.

With these choices, the locomotion controller for the studied fin model is shown in Figure 2-17.

Figure 2-17 Chain coupling structure CPGs model for Elongated Undulating Fin

With this structure, the fin generates its motion autonomously from the parameters $f$ and $A_1 \dots A_{16}$ supplied by the higher-level controller.

For the first oscillator (𝑖 = 1), there is only perturbation from the second oscillator (𝑖 + 1), thus the perturbation of the first oscillator is given by:

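$$P_1 = \begin{bmatrix} 0 \\ \beta\left(v_2\cos\varphi_d - u_2\sin\varphi_d\right) \end{bmatrix} \quad (2\text{-}27)$$

where $\beta$ is the coupling strength and $\varphi_d$ is the phase lag angle between two adjacent oscillators.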

In the same manner, the sixteenth oscillator is affected only by the perturbation from the fifteenth oscillator:

For the $i$-th oscillators ($i = 2, 3, \dots, 15$), the perturbation vector is given by the following:

With different amplitudes $A_i$, the modified CPG network provides different swimming patterns for the elongated undulating fin and can therefore produce different propulsive forces.
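The chain structure can be sketched in a few lines of code. The sketch below is not the dissertation's implementation: it keeps the term β(v_{i+1} cos φ_d − u_{i+1} sin φ_d) visible in Eq. (2-27) for the influence of the next oscillator, and it assumes the mirrored term β(u_{i−1} sin φ_d + v_{i−1} cos φ_d) for the influence of the previous oscillator, because Eqs. (2-28) and (2-29) are not legible in this copy. The forward-Euler step, the initial states, and all function names are likewise assumptions.

```python
import math


def chain_perturbation(u, v, beta, phi_d):
    """p_v for every oscillator of a bidirectionally coupled chain.

    p_u is taken as zero for all oscillators (u is not perturbed directly).
    """
    n = len(u)
    s, c = math.sin(phi_d), math.cos(phi_d)
    p_v = [0.0] * n
    for i in range(n):
        if i + 1 < n:  # influence of the next oscillator, as in Eq. (2-27)
            p_v[i] += beta * (v[i + 1] * c - u[i + 1] * s)
        if i > 0:      # influence of the previous oscillator (assumed mirror term)
            p_v[i] += beta * (u[i - 1] * s + v[i - 1] * c)
    return p_v


def simulate_chain(n=16, amplitudes=None, f=1.0, k=10.0, beta=0.8,
                   phi_d=-math.pi / 3, dt=0.001, t_end=10.0):
    """Forward-Euler integration of n chained Hopf oscillators."""
    amplitudes = amplitudes or [1.0] * n
    omega = 2.0 * math.pi * f
    u = [0.1] * n  # arbitrary small, identical initial states
    v = [0.0] * n
    for _ in range(int(round(t_end / dt))):
        p_v = chain_perturbation(u, v, beta, phi_d)
        new_u, new_v = [], []
        for i in range(n):
            gain = k * (amplitudes[i] ** 2 - u[i] ** 2 - v[i] ** 2)
            new_u.append(u[i] + dt * (gain * u[i] - omega * v[i]))
            new_v.append(v[i] + dt * (gain * v[i] + omega * u[i] + p_v[i]))
        u, v = new_u, new_v
    return u, v


def phase_lag(u, v, i):
    """Wrapped phase difference phi_{i+1} - phi_i in (-pi, pi]."""
    d = math.atan2(v[i + 1], u[i + 1]) - math.atan2(v[i], u[i])
    return math.atan2(math.sin(d), math.cos(d))


if __name__ == "__main__":
    u, v = simulate_chain(n=2, t_end=20.0)
    print("phase lag of a coupled pair:", round(phase_lag(u, v, 0), 3))
```

Under these assumptions, a mutually coupled pair settles near the commanded lag φ_d, and the default values f = 1 Hz, k = 10, β = 0.8, φ_d = −π/3 are the ones quoted later in the chapter for the simulations. The amplitudes A_i are passed per oscillator so that different envelopes can be tried.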

Modeling of elongated undulating fin

The diaphragm (fin membrane) is the primary and most important part of the fin module, because it directly generates thrust. The diaphragm was therefore modeled to investigate the factors affecting the motion and to identify the design elements. In this thesis, the model is adapted from the model introduced by Sfakiotakis [46], [47]. Given the experimental setup, the model ignores the horizontal thrust components, because the link between the thrust module and the guide rail is fixed, leaving only the rotational degree of freedom about the axis.

Consider a point $q$ located on the fin membrane between the $i$-th and $(i+1)$-th fin rays. Define the inertial coordinate system {𝑃} and the local coordinate system of the fin {𝑂}, with its origin at the head of the first fin ray (see Figure 2-19).

Figure 2-19 Representation of coordinate systems

The swing angles of the $i$-th and $(i+1)$-th fin rays are determined by the equation:

The swing angle represents the swimming amplitude of the fin rays. At each point on the diaphragm, the swing angle is interpolated from the two fin rays that bound it.

$G(i)$: the wave amplitude at the $i$-th fin ray;

$u(i, t)$: calculated according to Eqs. (2-27), (2-28), and (2-29)

The angular velocity of the fin ray is determined by:

The angular velocity represents the swimming frequency of the fin rays. This function, whose only variable is the swimming frequency, is used to calculate the thrust when a specific swimming mode is set up.

The position vector of the point 𝑞 is determined by the formula:

𝐷: distance between two fin rays

ℎ: distance from point q to origin of fin ray

Then, the coordinates of the point 𝑞 in the inertial frame of reference are expressed in terms of 𝑟⃗ 𝑞𝑖 and 𝑟⃗ 𝑞(𝑖+1)

𝑤: distance of i th fin ray to point 𝑞

Because the Reynolds number of the flow is large, the force tangential to the fin surface can be neglected, and the force produced by a differential diaphragm area $dS$ is taken to act normal to the diaphragm. The thrust of the fin can then be approximated as follows:

$\rho$: density of water (998.1 kg/m³);

$C_n$: drag coefficient ($C_n = 2.8$, from the ANSYS FLUENT database);

$(\vec{n}_0 \cdot \dot{\vec{r}}_q)\,\vec{n}_0$: the component of the velocity of point $q$ along the unit normal $\vec{n}_0$

Then, the normal vector of the dS component is determined by

The value of $C$ is the distance between two adjacent fin-ray peaks during the oscillation at time $t$, so $C = w_{max}$. The normal unit vector is written as:

$$\varsigma = \sqrt{2D^2\,\bar{w}(\bar{w} - 1)\left(1 - \cos(\theta_{i+1} - \theta_i)\right) + h^2 \sin^2(\theta_{i+1} - \theta_i) + D^2} \quad (2\text{-}45)$$

The velocity vector component at the point $q$ is the time derivative of the position of point $q$.

Therefore, the component of force produced by the diaphragm at point 𝑞 is:

Integrating the force $\vec{f}$ over the entire diaphragm, with $i \in [0, 15]$, $h \in [h_{min}, h_{max}]$, and $w \in [w_{min}, w_{max}]$, gives the force produced by the entire diaphragm:

Simulate the thrust of the fin ray when changing the waveform

With the mathematical model of the generated thrust and the flexibility of the locomotion controller in creating different swimming patterns, this section describes the most common swimming patterns of fish observable in nature: Elliptic, Quadratic, and Envelope. These swimming patterns are built by changing the set of amplitude parameters for each fin ray. Simulation tools are then used to calculate the thrust that the fin generates according to the mathematical model.

The three types of patterns are based on the same set of CPGs with the assumed parameter set $f = 1$ Hz, $k = 10$, $\varphi_d = -\pi/3$, $\beta = 0.8$, and a sampling time of 0.01 seconds. To match the required amplitude envelope, the output of the oscillators is calculated as follows:

$\theta_i = G_i u_i$, where $u_i$ is the output of each neural oscillator and $G_i$ is the maximum sway angle of each fin-ray, determined by $G_i = \arcsin(Y_i / L)$, with $Y_i$ the lateral amplitude envelope of each fin-ray and $L$ the length of the fin-ray; in this case $L = 150$ mm.

The Elliptic waveform: with the formula just presented, the amplitude envelope values $Y_i$ of the fin-rays for the Elliptic waveform are: {0, 5.7, 11.43, 17.14, 22.85, 28.57, 34.28, 40, 40, 34.28, 28.57, 22.85, 17.14, 11.43, 5.7, 0} mm.
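As a worked example, the maximum sway angles for the Elliptic envelope can be computed directly from the listed values. The sketch reads the gain formula as G_i = arcsin(Y_i / L), which is an interpretation: the arcsine needs a dimensionless argument, so the division by L is taken inside it. The constant and function names are hypothetical.

```python
import math

FIN_RAY_LENGTH_MM = 150.0

# Amplitude envelope Y_i of the Elliptic waveform, in millimetres.
ELLIPTIC_ENVELOPE_MM = [0, 5.7, 11.43, 17.14, 22.85, 28.57, 34.28, 40,
                        40, 34.28, 28.57, 22.85, 17.14, 11.43, 5.7, 0]


def sway_gains(envelope_mm, length_mm=FIN_RAY_LENGTH_MM):
    """Maximum sway angle per fin-ray in radians, assuming G_i = arcsin(Y_i / L)."""
    return [math.asin(y / length_mm) for y in envelope_mm]


gains = sway_gains(ELLIPTIC_ENVELOPE_MM)
print([round(math.degrees(g), 2) for g in gains])
```

With this reading, the largest envelope value of 40 mm corresponds to a sway angle of roughly 0.27 rad (about 15.5 degrees), and each oscillator output u_i is scaled by its own G_i to obtain the fin-ray angle.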

Figure 2-13 Transition from Static to Elliptic waveform

The results show that, with the input parameters stated above, the 16-ray fin thruster module transitions smoothly from rest to the Elliptic swimming shape within 3 seconds. The membrane profile of the fin module then stabilizes to the natural Elliptic waveform.

Figure 2-14 The thrust of the fin-ray module is generated relative to the Elliptic waveform

Thrust is calculated according to the mathematical model of Eq. (2-48), with the actual parameters of the fin ray and the water properties: $D = 0.032$ m; $\rho = 998.1$ kg/m³; $C_n = 2.8$; $h_{max} = 0.1$ m. The average force generated is 0.8 N, and the force oscillates harmonically at a frequency corresponding to the swimming frequency of the fin.

Quadratic waveform: with the model's fin ray length and spacing, the amplitude envelope of the Quadratic waveform is determined by the set $Y_1 \dots Y_{16}$ as follows: {0, 2.57, 5.33, 8, 10.67, 13.33, 16, 18.67, 21.33, 24, 26.67, 29.33, 32, 34.57, 37.33, 40} mm.

Figure 2-15 Transition from Static to Quadratic waveform

According to the simulation results of the motion controller for the propulsion module with 16 fin rays, the Quadratic swimming profile is established within 3 seconds. The profile corresponds to the swimming pattern observed in wild fish.

As with the Elliptic waveform, the thrust of the module is computed from the modeled coefficients and equations. With this swimming pattern, the thrust averages about 0.55 N and fluctuates with an amplitude of 0.05 N at the swimming frequency of the fin, as shown in Figure 2-23.

Figure 2-23 The thrust of the fin-ray module is generated relative to the Quadratic waveform

With the same CPG-based swimming controller, a linear swimming pattern only requires setting the parameters $Y_1$ to $Y_{16}$ all equal to 40 mm (the largest amplitude allowed by the physical design of the fin module). Figure 2-24 depicts the transition from the static state to the linear waveform. Figure 2-25 shows the computed thrust of the module for the linear waveform, which is 1.8 N.

Figure 2-24 Transition from Static to Linear waveform

Figure 2-25 The thrust of the fin-ray module is generated relative to the Linear waveform.

Conclusions

This chapter has presented the fundamental steps for constructing a motion controller for a fish robot with a 16-ray fin, based on the Central Pattern Generator (CPG) framework. The chapter also builds on a widely applied, previously published thrust computation model to reproduce the thrust generation of the entire vertical fin with 16 rays. Furthermore, the simulations give a more comprehensive insight into the swimming pattern control model and the thrust computation method for the finned fish robot.

OPTIMIZING CONVERGENCE SPEED OF SWIMMING GAIT CONTROLLER BASED ON CPG BY

Problem statement

The oceans cover about 70% of the Earth's surface, and the seafloor has considerable potential to yield great benefits for humanity. Therefore, ocean exploration is recognized as an essential field in ocean science [48]. Ocean exploration relies on two primary kinds of devices: remotely operated underwater vehicles (ROVs) and autonomous underwater vehicles (AUVs). Almost all conventional AUVs adopt water pumps, air-jet engines, or single propellers as the propulsion system [49], which produce loud noise that affects organisms living on the seabed. In addition, the topological structure of conventional AUVs is recognized as limiting their maneuverability and stability [50]. The propeller can also be jammed by sediment and seaweed when AUVs operate on the seafloor [51]–[53]. To overcome these drawbacks, a bionic underwater robot equipped with a biomimetic fin mechanism is well suited for ocean exploration [54]. Many studies of bio-fish robots have addressed the diversity of fish species [55]–[79]. These studies pointed out that many significant factors affect the hydrodynamics of bio-fish robots. One such factor is the swimming pattern, which enables bio-fish robots to perform complex operations such as turning, swaying, twisting, and curving. Several existing studies used a sinusoidal kinematic equation to generate the undulating oscillatory motion of bio-fish robots [49], [80]–[85]. This locomotion control strategy can provide various swimming patterns by predefining the amplitude envelope, oscillatory frequency, and phase lag, which are the kinematic parameters of the sinusoidal generator. However, it does not provide flexible transitions between swimming patterns, and it does not allow the kinematic parameters to be tuned online to adapt to environmental changes [55], [78].

To achieve efficient locomotion, earlier studies proposed central pattern generator (CPG) based locomotion controllers for a wide range of applications [58], [74], [86]–[92]. For the locomotion of bio-fish robots, an early study synthesized a locomotion controller using a Proportional-Integral-Derivative (PID) controller integrated with a CPG for a 3D prototype of the fish robot [71]. In 2008, Wang et al. [66] employed a modified Matsuoka oscillator to build a CPG-based locomotion controller for a prototype undulating-fin propulsion system with ten fin-rays. Simulation and experimental results showed that the variable model of the weight matrix is consistent with the thrust generated by the prototype. In 2011, a CPG-based controller of the proposed propulsion system was integrated with rotary position sensors to make the locomotion of the undulating fin more flexible [75]. That study also introduced two control levels: a high-level controller for commanding operation and a low-level controller for driving the actuators. In 2012, Zhou et al. [86] developed a manta ray robot with two wide, flexible pectoral fins, which used a CPG model to achieve rhythmic biomimetic movement. Simulation and experimental results showed that the yaw angle is stabilized, but the response time is slow. In 2014, Chunlin Zhou et al. [76] adopted a genetic algorithm to optimize the CPG-based controller of a fish robot with respect to thrust generation and achieve better conversion efficiency. To validate the CPG-based control approach for undulating-fin propulsion, in 2015 Michael Sfakiotakis et al. [79] evaluated the CPG under changes of a single amplitude parameter and under simultaneous parameter changes. The authors adopted a CPG model to produce the undulating motion pattern and find the critical factor that affects propulsion. A fish robot prototype inspired by the cuttlefish used a CPG model for its swimming motion [69]. That study presented the effect of the various kinematic parameters of the undulating fin and the validity of a fluid drag model used to estimate the generated thrust. Another study [55] used a CPG for undulating biological fins with six degrees of freedom to replicate fish-like swimming by changing the parameters of the CPG model. Various parameters of the CPG model, such as amplitude envelope, oscillatory frequency, and swimming pattern, can be adjusted to generate the undulating motion that produces the propulsion force. Yong Cao et al. [93] fixed the undulation frequency and the undulation amplitude as constant parameters while controlling the phase angle of the CPG neuron output to achieve various swimming patterns. It can be concluded that these earlier CPG-related studies have been successfully applied to the locomotion control of biomimetic robots. However, most of them rely on trial-and-error data fitting to adjust a control parameter of the CPG model called the convergence rate. Increasing the convergence rate shortens the time needed to reach the limit cycle, but it can also raise the oscillatory error, defined as the difference between the intrinsic amplitude of the CPG and the maximum amplitude envelope of the CPG's output. Because the convergence rate has not been optimized, this issue remains a challenge for researchers.

In terms of parameter optimization, several studies used the particle swarm optimization (PSO) algorithm to seek CPG parameters that minimize the difference between the desired oscillatory waveform and the generated output of the CPG [94], to reduce the number of control parameters [95], and to refine the feature parameters of the CPG [96]. PSO is similar to a genetic algorithm (GA) in that it searches for optimal solutions through iterations of a population, but PSO has proved faster to compute and easier to implement than GA [97]. However, PSO is susceptible to being trapped in local minima [98]. Reinforcement learning (RL) is an alternative optimization strategy that has recently been applied in areas such as robotic control, transportation, and energy supervision [99]–[105]. RL generates a sequence of actions to obtain the maximum numerical reward through interaction with the environment. RL methods can be categorized as model-based, which attempt to model the environment as a Markov Decision Process (MDP) [106], and model-free, which do not require an explicit model of the environment. One such model-free method is Q-learning, which is recognized as well suited to optimization problems that trade off computation time against effectiveness [102], [107]–[109]. According to these studies, Q-learning is feasible to implement in real time on programmable devices. For biomimetic robots, Y. Nakamura et al. [110], [111] used a reinforcement learning model for a CPG-based motion controller, namely the CPG-actor-critic, to learn the selection of motion patterns for biped robots. An actor observes the state of the biped robot and outputs a parameter of the motion controller. The motion controller with the selected parameter then produces the control signal.

The aforementioned studies of CPG-based bio-fish robots have not optimized the convergence rate. Inspired by the studies that apply RL to CPGs, this research proposes a reinforcement learning-based optimization of a locomotion controller using a CPG network for an elongated undulating fin. The elongated undulating fin comprises sixteen oblique fin-rays interconnected by a flexible membrane. It is controlled by the proposed CPG-based locomotion controller, in which sixteen coupled neural oscillators generate the locomotor signals for the sixteen fin-rays. The advantages of this control method over the sinusoidal kinematic equation are discussed. In contrast to previous studies, this research uses Q-learning with discrete states and actions to optimize the convergence rate of the CPG controller. The actor observes the undulating signal of the CPG-based locomotion controller and outputs a value of the convergence rate. The locomotion controller with the chosen convergence rate produces the control signal. Owing to its simplicity, the proposed controller is expected to be implementable on a microcontroller. Simulations and experiments are carried out to evaluate the performance and effectiveness of the proposed control method.

Theoretical foundations of reinforcement learning

Introduction to Reinforcement Learning

Reinforcement Learning (RL) belongs to the class of machine learning methods [112], [113] used to solve optimization problems by continuously adjusting the actions of agents. The theory of RL is based on observing and studying how animals behave when they interact with the environment to adapt and survive. Control algorithms based on RL imitate these animal instincts: they learn from mistakes, teach themselves, and use information taken directly from the environment together with information evaluated in the past to reinforce and adjust their behavior, so that a particular goal is optimized over time through continuous interaction. RL has been intensely researched, developed, and applied in machine learning since the 1980s [113]. However, RL has only really begun to develop in the control field since the early years of the 21st century. The development history of RL in the control field is provisionally divided into three phases. In the first phase (before 2005), RL theory from artificial intelligence was developed and extended to control. First, RL with the Markov model was defined by discretizing the state space [113]. Then, two basic iterative algorithms, PI (Policy Iteration) [46], [114] and VI (Value Iteration) [113], were used to approximate the optimal control law or evaluation function.

To apply these two algorithms, the mathematical model of the system needs to be determined in advance. Another proposed algorithm, whose parameter update rule does not depend on the system model, is the TD (Temporal Difference) prediction algorithm [115]. If the action, reward, and control state signals exchanged between the agent and the environment are quantized together with the state space, the Q-learning algorithm [115] can be applied. In Q-learning, the parameter update rule does not depend on the system model. In general, RL can thus be divided into model-based and model-free branches. Because the underwater environment is ever-changing, with many variables that cannot be predicted or modeled, this thesis focuses on the model-free branch.

Markov decision processes

In reinforcement learning, the designer does not teach an agent how to do something but rewards it, positively or negatively, based on its actions. The fundamental mathematical nature of this learning is embodied in Markov decision processes.

The agent and the environment are the two central actors in all of reinforcement learning. Within the Markov framework, they are described as follows:

Agent: in this case, a software program that makes intelligent decisions and acts as the learner in RL. The agent affects the environment through its actions and receives rewards in response to those actions.

Environment: the problem to be solved, together with its various influencing factors; it can be a natural environment with overlapping effects or a virtual simulation environment created to model those effects. Everything that the agent cannot change arbitrarily is part of the environment.

The agent and the environment interact continuously: the agent acts, and the environment responds to each action with a reward determined by its reward policy. The agent modifies its next behavior to optimize the reward received, not in the short term but in the long term, so as to achieve the highest total reward.

The boundary between agent and environment deserves special attention, because it is not as clear as the boundary between physical entities. For a fish swimming in water, for example, the environment includes not only the water but also the skeleton and muscles of the fish. In this view, everything that cannot be changed arbitrarily by the agent belongs to the environment. The same holds for the calculation of the reward: even though it may be computed inside the agent, it is subject to the policy dictated by the environment, so it cannot be part of the agent. The agent is thus defined by what it has absolute control over, and this is what drives reinforcement learning and continuous learning. A system may also contain several agents, each responsible for its own scope of work, rather than a single agent. For example, in one robot, the component that determines the swimming direction is one agent and the component that determines the swimming posture is another.

The reinforcement learning framework based on the Markov decision process reduces all sensors and actuators to three signals that represent the interaction between the agent and the environment: an action signal, which represents the agent's choice of its next behavior; a state signal, which defines the parameters on which the agent bases that choice; and a reward signal, which represents the goal the agent is aiming for. This framework cannot claim to cover all cases, but it is simple and basic, captures the essential relationships of reinforcement learning, and is widely applicable.

Goal: the final objective that the agent is aiming for. It is the cumulative sum of rewards over a learning cycle.

Reward: a short-term, instantaneous signal issued when an action occurs; it indicates how good the action that the agent has just performed was.

The goal of the learning process is the ultimate objective of training. Presenting goals as rewards is a characteristic feature of reinforcement learning. This may seem limiting at first, but it is flexible and can be applied across various areas, from robot training and animal training to recognition training. Note that the reward is a scalar quantity that indicates how desirable the action just taken was. It should also be emphasized that the reward is calculated entirely outside the agent's control: it follows the policies prescribed by the environment, and the agent cannot arbitrarily change the reward mechanism. Thus, the goal of reinforcement learning is the cumulative sum of rewards throughout the learning process, and each reward is the concrete representation of that goal at a given point in time.

Return: this term captures the foresight of reinforcement learning. As stated above, the agent's goal is to maximize the total reward, so the agent may be tempted to choose the action with the maximum reward at every step; in practice, this is not so simple. Foresight means that the agent must sometimes accept actions whose immediate reward is not the maximum, because those actions lead to later steps that yield the highest total reward.

Policy: a probability distribution over the set of actions. For example, actions that are highly rewarded have a high probability of being chosen, and vice versa.

The Markov property is expressed by the following equation:
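In its standard form, and assuming $S_i$ denotes the state at step $i$ (a notation chosen here for illustration), the property reads:

```latex
\mathbb{P}\left[ S_i \mid S_{i-1} \right] = \mathbb{P}\left[ S_i \mid S_1, \ldots, S_{i-1} \right]
```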

With this expression, the agent's i-th state depends only on its (i-1)-th state and is essentially unrelated to earlier states.

Markov Process: the Markov process is defined by the pair (𝑆, 𝒫), where 𝑆 is the set of states and 𝒫 is the set of state transition probabilities. The process is a sequence of random states S₁, S₂, etc., in which all states conform to the Markov property.

The state transition probability 𝒫_ss′ is the probability of jumping to state s' from the current state s expressed by the formula:
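In the usual notation, and assuming $S_t$ denotes the state at step $t$, this probability is written as:

```latex
\mathcal{P}_{ss'} = \mathbb{P}\left[ S_{t+1} = s' \mid S_t = s \right]
```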

To make the Markov process more intuitive, consider the states of a person as shown in Figure 3-1.

Figure 3-1 Diagram for the Markov process [83]

It is clear that each state offers several choices for the next state, each accompanied by a transition probability. Note that the probabilities of all transitions out of any one state sum to 1.0.
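The row-sum condition and the memoryless sampling can be sketched in a few lines. The states and probabilities below are hypothetical and are not those of Figure 3-1; the function names are likewise illustrative.

```python
import random

# Hypothetical states and transition probabilities (not those of Figure 3-1).
TRANSITIONS = {
    "rest": {"rest": 0.5, "swim": 0.4, "feed": 0.1},
    "swim": {"rest": 0.2, "swim": 0.6, "feed": 0.2},
    "feed": {"rest": 0.7, "swim": 0.3},
}

def check_row_sums(transitions, tol=1e-9):
    """Return True if the outgoing probabilities of every state sum to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in transitions.values())

def sample_chain(transitions, start, steps, rng):
    """Sample a state sequence; each step depends only on the current state."""
    state, path = start, [start]
    for _ in range(steps):
        next_states = list(transitions[state])
        weights = [transitions[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path

print(check_row_sums(TRANSITIONS))
print(sample_chain(TRANSITIONS, "rest", 10, random.Random(0)))
```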

Canonical RL algorithm

"Canonical RL algorithm" refers to a class of RL algorithms that share a common structure and approach to solving the RL problem

At a high level, a canonical RL algorithm maps a history of experiences (i.e., a sequence of states, actions, and rewards) to an action. The goal is to learn a policy that maximizes the expected cumulative reward over time.

There are two main types of canonical RL algorithms: model-based and model-free. Model-based algorithms maintain a model of the environment (i.e., a representation of the transition probabilities and rewards associated with each state-action pair) and use this model to plan ahead and make decisions. Model-free algorithms, on the other hand, do not maintain a model of the environment and instead learn a value function or policy directly from experience.

Two of the most well-known and widely used model-free RL algorithms are Q-learning and SARSA. Both maintain an estimate of the Q-function for each state-action pair and update it from the last experience with a learning rate. Q-learning bootstraps from the highest-valued action in the next state, whereas SARSA bootstraps from the next action actually chosen by the agent.
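The difference between the two update rules is small enough to show side by side. The following is a minimal sketch with a dictionary-based Q-table; the function names, the toy states and actions, and the default learning rate and discount factor are assumptions made for illustration.

```python
def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

actions = ("left", "right")
q_a = {(0, "left"): 0.0, (0, "right"): 0.0, (1, "left"): 1.0, (1, "right"): 2.0}
q_b = dict(q_a)

# Same experience (s=0, a="right", r=1, s_next=1), different bootstrap targets.
q_learning_update(q_a, 0, "right", 1.0, 1, actions)
sarsa_update(q_b, 0, "right", 1.0, 1, "left")
print(q_a[(0, "right")], q_b[(0, "right")])
```

When the agent's next action happens to be the greedy one, the two updates coincide; they differ only when the agent explores.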

Overall, the choice of RL algorithm depends on the specific problem at hand and the available resources (e.g., computational power and amount of training data).

Evaluation in RL

Evaluation in RL refers to the process of assessing the performance of an RL algorithm. There are two main approaches to evaluating RL algorithms: theoretical and empirical.

Theoretical evaluation involves establishing mathematical guarantees about the algorithm's performance. For example, one might prove that the algorithm converges to a desirable fixed point, or establish a bound on the sample complexity of exploration (i.e., the number of experiences needed for the algorithm to achieve a satisfactory level of performance). These guarantees can provide insight into the algorithm's behavior and help guide the design of new algorithms.

Empirical evaluation, on the other hand, involves conducting experiments to test the algorithm's performance on a specific task or set of tasks. This typically involves running the algorithm on a benchmark dataset or environment and measuring its performance in terms of some metric (e.g., accuracy, speed, or sample efficiency). Empirical evaluation can provide a more realistic assessment of an algorithm's performance, but may be more time-consuming and resource-intensive than theoretical evaluation.

Overall, both forms of evaluation are important for understanding and improving RL algorithms: theoretical guarantees explain an algorithm's behavior and guide design, while empirical evaluation gives a realistic assessment and helps identify areas for improvement.

Q-Learning

The Markov decision process describes the logic of reinforcement learning. However, it is not sufficient for every learning situation: there are situations where the environment cannot be modeled and the goal cannot be expressed through a reward-per-action policy known in advance. Q-learning is a form of reinforcement learning that allows agents to make decisions without a model and without a predefined policy. The agent makes the best decision based on its current state in the environment.

Model-free: this is an important concept. The agent does not rely on a known model of how rewards arise; instead, it learns by trying and failing. A child, for example, does not know that touching a hot object causes a burn, because the child has no prior notion of what is pleasant and what hurts; the only way to learn is by trial and error. Q-learning imitates this kind of learning and finds its next move by continuous trial and error.

In addition to the general concepts of reinforcement learning, Q-learning requires the following concepts:

Episode: a sequence of interactions that ends when the agent reaches a state in which it cannot perform further actions, that is, a terminal (stopped) state.

Q-values: to measure the quality of an action A taken in state S, a corresponding Q-value is used, denoted Q(S, A).

Temporal Difference: a specific method is required to determine the Q-value. Temporal Difference is an update rule that computes the Q-value from the state and action under consideration and the state and action adjacent to them in time; it is based on the Bellman equation.

The Bellman Equation: The equation is shown below:
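A common statement of the Bellman optimality equation for the action-value function, assuming reward $R$, discount factor $\gamma$, next state $S'$, and next action $A'$ (symbols introduced here for illustration), is:

```latex
Q^{*}(S, A) = \mathbb{E}\left[ R + \gamma \max_{A'} Q^{*}(S', A') \,\middle|\, S, A \right]
```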

Reinforcement learning-based optimization of the convergence speed

It should be noted from Eq. (2-5) that the convergence speed $k$ is chosen by trial and error so as to reach the limit cycle as quickly as possible. A large value of $k$ reduces the transient-state time, defined as the period from the start until the output of the 16th CPG begins its first cycle; however, it may increase the oscillatory error of the modified CPG network output. Thus, it is necessary to optimize this significant parameter. Q-learning is a value-based reinforcement learning algorithm that seeks a higher reward in each episode. This chapter employs Q-learning with discrete actions because the CPG needs some time to generate the oscillatory output corresponding to each chosen action before the next step can be taken. Furthermore, this algorithm does not require much computation time, which enables onboard implementation. Accordingly, the state variables $s_t \in S$ (where $S$ is the compact set of state variables) are the oscillatory error $s_t^1$ and the transient-state time $s_t^2$, with $s_t^1 \in S_1$, $s_t^2 \in S_2$, and $S_1, S_2 \subset S$. The shift of the convergence speed is chosen as the action variable $a_t \in A$. The interaction between the agent and the environment is shown in Figure 3-2.

Figure 3-2 Interaction of agent and environment

The reward function is designed to minimize the cost in the trade-off between the transient-state time and the oscillatory error. The proposed reward function is given as follows:

In Eq. (3-4), $s_t'$ is the next state variable, and $L_u$, $L_l$ are reward constants chosen such that $L_u \gg L_l$, to emphasize that minimizing the oscillatory error is more significant than minimizing the transient-state time. Thus, $L_u$ and $L_l$ are set to 100 and 10, respectively, in this case. The reward subfunctions $r_i(s_t^i)$ with $i = 1, 2$ are given by the following:

(3-5)

where $R_{max}$ and $R_{min}$ are the maximum and minimum rewards, set to 1 and 0.1, respectively.

In addition, the terminal state $s_T$, which is the condition for completing an episode, satisfies the constraint $s_T := \{ s_t \in S \mid \delta \triangleq (L_u |s_t^1| + L_l s_t^2) \le \min_{\delta}(\Delta_e) \}$, where $\Delta_e$ is the compact set of $\delta$ values of each episode.

The Q-value (action-value) function is updated by the simple Temporal Difference (TD) method:
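In the standard Q-learning form, and assuming $r_t$ denotes the reward of Eq. (3-4) obtained after taking $a_t$ in $s_t$, this update reads:

```latex
Q_t(s_t, a_t) = Q_{t-1}(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a_t'} Q_{t-1}(s_t', a_t') - Q_{t-1}(s_t, a_t) \right]
```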

where $\alpha$ is the learning rate ($0 \le \alpha < 1$); $\gamma$ is the discount factor ($0 \le \gamma < 1$); $a_t'$ is the next action variable; $Q_{t-1}(\cdot)$ denotes the current Q-value; and $Q_t(\cdot)$ denotes the new Q-value.

The next policy $\pi'(s_t, a_t)$ is implemented by the $\varepsilon$-greedy strategy, which is given by:

(3-7)

where $q$ is a uniform random number.
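The ε-greedy selection can be sketched as follows. The sketch assumes the common convention that the agent explores when q < ε and otherwise takes the action with the largest Q-value; the function name and the list-based Q-table row are hypothetical.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action index from one row of a Q-table.

    q_row   : list of Q-values, one per discrete action
    epsilon : exploration probability, e.g. 0.2
    rng     : random.Random instance supplying the uniform number q
    """
    q = rng.random()  # uniform random number in [0, 1)
    if q < epsilon:
        return rng.randrange(len(q_row))  # explore: uniformly random action
    return max(range(len(q_row)), key=q_row.__getitem__)  # exploit: greedy action

rng = random.Random(1)
picks = [epsilon_greedy([0.1, 0.9, 0.3], 0.2, rng) for _ in range(2000)]
print(picks.count(1) / len(picks))
```

With ε = 0.2 and three actions, the greedy action is expected in roughly 87% of the draws, since random exploration also lands on it one time in three.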

The optimal convergence speed can be determined by the optimal action value:

The pseudo-code of the Q-learning optimization for the convergence speed is given in Table 3-1. The impact of the transient-state time and the oscillatory error on the convergence speed is depicted in Figure 3-3 a). The distribution of the Q-value over the state variable and the action variable is illustrated in Figure 3-3 b).

Table 3-1 Pseudo-code of the Q-learning optimization

Algorithm: Q-learning based optimization of the convergence speed

3 Repeat for each step of the episode:

6 Take the action $a_t$ (passing the convergence speed $k$ to the modified CPG network)

$a_t$ (perceiving the oscillatory error and the transient-state time, calculating the reward value by Eq. (3-5))

9 Assign the next state to the current state ($s_t \leftarrow s_t'$)

10 Until the current state is the terminal state ($s_t \equiv s_T$)

11 Take the optimal action $a_t^* = \arg\max$
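The overall loop can be sketched end to end on a toy problem. Everything specific below is an assumption made for illustration: the candidate values of k, the surrogate cost (which replaces the real CPG simulation), the negative-cost reward (which stands in for Eq. (3-4)), the fixed episode length (which replaces the terminal-state test), and all function names. Only the weights 100 and 10, the learning rate 0.95, the discount factor 0.75, ε = 0.2, and the 1000 episodes are taken from the text.

```python
import random

K_VALUES = list(range(40, 150, 10))  # hypothetical candidate convergence speeds
ACTIONS = (-1, 0, 1)                  # shift k down, keep it, shift it up
L_U, L_L = 100.0, 10.0                # weights quoted in the text

def cost(k):
    """Hypothetical surrogate for L_u*|error| + L_l*transient time (not the CPG)."""
    osc_error = 0.05 * (k / 100.0) ** 2   # assumed to grow with k
    transient = 1.0 + 40.0 / k            # assumed to shrink with k
    return L_U * osc_error + L_L * transient

def step(state, action):
    """Shift the index of k, clipped to the table bounds, and return the reward."""
    nxt = min(max(state + action, 0), len(K_VALUES) - 1)
    return nxt, -cost(K_VALUES[nxt])      # stand-in reward: negative cost

def train(episodes=1000, steps=20, alpha=0.95, gamma=0.75, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in K_VALUES]
    for _ in range(episodes):
        s = rng.randrange(len(K_VALUES))  # random initial state
        for _ in range(steps):
            if rng.random() < epsilon:    # epsilon-greedy action selection
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=q[s].__getitem__)
            s_next, r = step(s, ACTIONS[a])
            # Temporal Difference update of the Q-value
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

def greedy_k(q, start=0):
    """Follow the greedy policy from a start index and return the k it settles on."""
    s = start
    for _ in range(len(K_VALUES)):
        a = max(range(len(ACTIONS)), key=q[s].__getitem__)
        s, _ = step(s, ACTIONS[a])
    return K_VALUES[s]

q_table = train()
print(greedy_k(q_table), min(K_VALUES, key=cost))
```

On the real system, `step` would run the modified CPG network with the chosen k and measure the oscillatory error and transient-state time instead of evaluating a closed-form surrogate.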

The choice of hyperparameters affects how the optimal Q-value in the Q-table is found. The smaller the learning rate, the longer the Q-table takes to converge, which makes the learning process difficult. For problems that are not too complicated, a learning rate close to 1 can be chosen to reduce the learning time; in this case, after many tests, the learning rate 𝛼 = 0.95 was chosen. The discount factor 𝛾, also called the reduction factor, plays a similar role and enters every Q-value calculation. The optimal state gives the greatest reward, and vice versa. Therefore, the closer the agent gets to the optimal action, the larger the Q-value. To make the Q-values more distinct at convergence, I chose a large discount factor, 𝛾 = 0.75, after multiple trials. The parameter ε of the ε-greedy policy is commonly used in Q-learning to find the optimal Q-table value. Here I chose ε = 0.2, which means that at each step of Q-learning, if the random number q exceeds ε (for example, q = 0.3), the action $\pi'(s_t, a_t)$ is the one with the maximum Q-value at time t-1. With an episode number of 𝑛 = 1000, the optimal Q-value reached approximately $Q^*(s_t, a_t) = 195.66$ for the optimal action $a_t^* = 96$. The convergence speed of the modified CPG network is therefore chosen as 𝑘 = 96 for both the simulation studies and the experimental studies in the next section.

Figure 3-3 a) Impact of transient-state time and oscillatory error on the convergence speed b) Distribution of Q-value on state variable and action variable

Simulation and discussion

In this Chapter, the simulation study of the modified CPG network is conducted in MATLAB to evaluate the flexible gait transitions of the elongated undulating fin with respect to the swimming pattern, intrinsic amplitude, oscillatory frequency, and number of waveforms. The swimming patterns used in this research are illustrated in Figure 3-4. The simulation results also demonstrate the effect of the convergence rate on the transient-state time and the oscillatory error of the modified CPG network (Figure 3-5).

Figure 3-4 Swimming patterns of elongated undulating fin propulsion

Figure 3-5 The relative convergence rate concerning transient-state time and oscillatory error

The modified CPG parameters for this study are $A_i = 1$ (with $i = 1, \ldots, 16$), $f = 1$, $\varphi_d = -\pi/3$, and $\beta = 0.8$, which allow the fin-rays to perform the cuttlefish-like swimming pattern. Figure 3-6 depicts the output of a single oscillator with 𝑘 chosen arbitrarily around the optimal value of 96 for comparison. As can be seen, with 𝑘 = 86 the transient-state time is about 1.45 seconds, whereas it is approximately 1.41 seconds for 𝑘 = 96 and 1.36 seconds for 𝑘 = 106. A larger 𝑘 thus reduces the transient-state time, because the modified CPG output converges to the limit cycle faster. Nevertheless, increasing the convergence rate 𝑘 causes a larger oscillatory error in the modified CPG output, as illustrated in Figure 3-6, which might affect the performance of the actuators that drive the fin-rays. Therefore, the oscillatory error is regarded as a more significant factor than the transient-state time.

Figure 3-6 The output of a single oscillator with 𝑘 = 86, 𝑘 = 96, 𝑘 = 106

This simulation study aims to clarify several aspects: smooth acceleration and deceleration with no jerk when the oscillatory frequency $f$ changes, flexible transition of the swimming pattern when the intrinsic amplitude $A_i$ changes, transition between forward and backward swimming when the phase lag angle $\varphi_d$ changes, and transition of the waveform number. As can be seen from Figure 3-7, the modified CPG network initially generates a nonharmonic swimming pattern with the linear waveform to mimic the cuttlefish-like gait for 2.5 seconds. Afterward, the oscillatory frequency is gradually increased from 1 Hz to 2 Hz, and the oscillatory output becomes faster so that the elongated undulating fin accelerates. From 5 to 7.5 seconds, the elongated undulating fin performs the quadratic swimming pattern. After 7.5 seconds, the swimming pattern is forced to change into the elliptic waveform. In Figure 3-8, the elongated undulating fin generates the elliptical waveform to mimic the stingray-like swimming pattern for the first 5 seconds, with a phase lag angle of $\varphi_d = -\pi/3$ for each fin-ray. At 5 seconds, the phase lag angle is abruptly changed to $\varphi_d = \pi/3$ so that the elongated undulating fin swims backward. It can be seen that the modified CPG network achieves a smoother gait transition than the kinematic sinusoidal generator. From 5 to 20 seconds, the elongated undulating fin swims backward. Afterward, the phase lag angle is changed back to $\varphi_d = -\pi/3$ to force the elongated undulating fin to swim forward. This scenario also reveals that a lower convergence rate gives a shorter transient-state time when the phase lag angle is changed to switch the swimming direction (see Figure 3-9).

Figure 3-7 Output of sixteen oscillators with changes of swimming pattern, oscillatory frequency, and waveform number

Figure 3-8 Output of sixteen oscillators with changes of phase lag angle enabling reversal of the swimming direction

Figure 3-9 Relation of transient-state time with respect to convergence rate

Figure 3-9 shows the CPG outputs for convergence rates of 𝑘 = 96 and 𝑘 = 10. For clarity, only the undulating signals of the first and third CPGs are shown. From 0 to 5 seconds, the CPGs produce the undulating waveform with a phase lag angle of $\varphi_d = -\pi/3$. This can be recognized from the fact that the output phase of the 1st CPG leads that of the 3rd CPG. At 5 seconds, the CPGs are commanded to change to a phase lag angle of $\varphi_d = \pi/3$.

Conclusions

This Chapter has presented the modified CPG network that generates the rhythm for the elongated undulating fin with sixteen fin-rays to mimic the swimming patterns of fish. The modified CPG network is composed of sixteen chain-coupled oscillators with bidirectional perturbation, because each fin-ray is affected only by its two adjacent oscillators. Both simulation and experimental results show that the modified CPG network is very promising for generating the rhythm of a fish robot. It allows the kinematic parameters to be changed abruptly with no jerk in the oscillation. Additionally, this research has investigated an intrinsic parameter of the CPG, the convergence rate, which has not been optimized before and is usually set by trial and error. The simulation results have revealed that a large convergence rate reduces the transient-state time; however, it may worsen the oscillatory error. Therefore, tuning the convergence rate is a trade-off between the transient-state time and the oscillatory error. To deal with this issue, the Q-learning algorithm is appropriate for finding the optimal convergence rate. To obtain a smooth oscillation that avoids damage to the RC servo motor, the reward function of the Q-learning weights the oscillatory error more heavily than the transient-state time. The optimal convergence rate found by Q-learning provides a short transient-state time and an acceptable oscillatory error in the simulation and experimental results under abrupt changes of kinematic parameters such as the amplitude envelope, oscillatory frequency, and waveform number. In particular, I have found that the transient-state time is longer with a large convergence rate when the phase lag angle is changed to the opposite value for reverse swimming. However, changing the convergence rate once the limit cycle of the CPG has been reached does not affect the CPG output. This suggests a piecewise switching function that changes the convergence rate according to the swimming operation. Consequently, the convergence rate should be changed from the optimal value to a smaller appropriate value before the phase lag angle is changed to switch from forward to backward swimming, and vice versa. Afterward, the convergence rate is changed back to the optimal value to obtain a short transient-state time.
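One possible shape of such a piecewise switching function is sketched below. It assumes the reversal times are known in advance; the window lengths `lead` and `hold`, the function name, and the use of k = 10 as the smaller value (taken from the comparison in Figure 3-9) are hypothetical choices, not results of this study.

```python
K_OPT = 96.0   # optimal convergence rate reported in the text
K_LOW = 10.0   # smaller rate, borrowed from the comparison in Figure 3-9

def scheduled_convergence_rate(t, reversal_times, lead=0.5, hold=2.0):
    """Return the convergence rate to use at time t (seconds).

    reversal_times : times at which the phase lag angle changes sign
    lead, hold     : hypothetical windows before and after each reversal
                     during which the smaller rate is applied
    """
    for t_rev in reversal_times:
        if t_rev - lead <= t <= t_rev + hold:
            return K_LOW
    return K_OPT

# Reversals at 5 s and 20 s, as in the scenario of Figure 3-8.
for t in (0.0, 4.6, 6.9, 7.1, 19.5):
    print(t, scheduled_convergence_rate(t, [5.0, 20.0]))
```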

The code is shown in Appendix A

FORCE OPTIMIZATION OF ELONGATED UNDULATING FIN ROBOT USING IMPROVED PSO BASED CPG

Problem statement

In research that I have published, four locomotion patterns were carried out to evaluate the influence of the locomotion pattern on the thrust force. The first three locomotion patterns resemble those of natural species (cuttlefish, knife fish, and stingray, respectively), and the fourth locomotion pattern was selected for comparison without reference to any natural aquatic species (Figure 4-1).

The experimental platform as shown in Figure 4-3 was adopted to investigate the thrust force performance of the undulating fin in various kinematic and pattern parameters The morphology parameters of the undulating fin are listed in Table 4-1.

Figure 4-2 Undulating fin in water tank

Table 4-1 Morphology parameters of the undulating robotic fin

Three sets of experiments were performed to obtain the thrust performance of the undulating fin, with different frequencies (0.5 Hz, 0.75 Hz, 1 Hz, 1.25 Hz), amplitudes (40 mm, 60 mm, 80 mm, 100 mm), and locomotion patterns (Mode 1, Mode 2, Mode 3, and Mode 4). These frequencies suit the response of the designed electrical control unit and are also within the frequency range observed in nature. The experimental data are sampled at 200 Hz, the sampling frequency that matches the sampling device; in principle, the higher the sampling frequency, the greater the accuracy of the measurement. The data are post-processed by adjacent averaging (100-point resolution) to remove the sensor noise. The results show that the static thrust force increases with the oscillatory frequency for patterns Mode 1 and Mode 3. For patterns Mode 2 and Mode 4, the static thrust force decreases once the oscillatory frequency exceeds a certain value. Likewise, the static thrust force generally increases with the amplitude envelope, except for pattern Mode 4, for which the static thrust force decreases once the amplitude envelope exceeds a certain value. I found that pattern Mode 4 produced the largest thrust force and that, for pattern Mode 3, the static thrust force increased linearly with both the amplitude envelope and the oscillatory frequency.
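The adjacent-averaging step can be sketched as a centered moving average. The function name and the truncation of the window near the ends of the record are assumptions; at a sampling rate of 200 Hz, a 100-point window spans 0.5 seconds of data.

```python
def adjacent_average(samples, window=100):
    """Smooth a signal with a centered moving average of up to `window` points.

    Near the ends of the record the window is truncated to the samples available.
    """
    half = window // 2
    n = len(samples)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)   # window covers samples[lo:hi]
        segment = samples[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# A single spike is spread over its neighbours by a 3-point window.
print(adjacent_average([0.0, 0.0, 3.0, 0.0, 0.0], window=3))
```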

To simulate CPG behavior, many researchers have established different mathematical models and sought the parametric values of the CPG that give the desired oscillation profiles. The authors in [116] propose a CPG integrated with PID to establish a motion controller for a fish robot prototype. In [117], a CPG network comprising ten Matsuoka oscillators is used to generate rhythmic signals. In 2012, Zhou et al. introduced a CPG model using two motors to drive two pectoral fins for a manta ray robot [118]. Because of the slow response time, an improved CPG using a genetic algorithm was developed for the thrust generation of the fish robot [118]. In [119], a CPG model is adopted to achieve the undulating motion pattern and find the critical factor that affects propulsion. Although the above mathematical models have been successfully applied to establish CPG-based motion controllers, enhancing the propulsive force of a robotic fish using a CPG network is still a significant challenge. To overcome this problem, some researchers have applied optimization algorithms to parameter selection. A Hopf oscillator-based CPG network performed parameter synthesis subject to some learning rules to obtain the desired swimming pattern [120]. Meanwhile, the modified CPG in [121] can produce different locomotion patterns of an actual fish by combining Andronov–Hopf oscillators and an artificial neural network (ANN). Recently, heuristic search has been widely applied to tuning the parameters of CPG networks. In [122], a genetic algorithm (GA) is used for rhythmic generation based on CPG models by establishing the weight values of the coordination between oscillators. The authors in [123], [124] both use particle swarm optimization (PSO) to find the optimal characteristic parameters of a Hopf oscillator-based CPG for improved propulsive performance. Although these meta-heuristic algorithms perform well in seeking CPG parameters, they are often trapped in local optima. In this chapter, a new differential particle swarm optimization (D-PSO) is investigated to address this problem.

This chapter addresses the need to optimize the parameters of the swimming patterns established in Chapter 3 with the selected convergence parameters. The main idea is to use common optimization algorithms to find the set of Gi parameters in equation 2.30, with the objective of maximizing thrust at each swimming frequency. First, a CPG using a chain topology of sixteen coupled Hopf oscillators is used to generate fishlike rhythmic movements. Next, the improved D-PSO is used to optimize the amplitude values of the CPG network, increasing the average propulsive force of the undulating fin robot so that it moves faster. Finally, the CPG parameter synthesis results obtained with different optimization methods, including D-PSO, PSO and GA, are compared to demonstrate the superiority of the proposed D-PSO algorithm.

Theory of Particle Swarm Optimization (PSO)

Introduction

Particle Swarm Optimization (PSO) is a population-based algorithm for solving optimization problems and belongs to the family of swarm intelligence (SI) methods. PSO was first proposed by Kennedy and Eberhart in 1995. According to these authors, despite its simplicity, the method works effectively in optimizing nonlinear continuous functions, finding extrema of functionals, and solving some multi-objective optimization problems. The algorithm is inspired by observations of biological populations in nature: the cooperative behavior of various species and the way populations organize their activities to reach a good solution quickly. The core idea of PSO is to simulate how animal swarms such as birds and fish search for food, with each individual using both its own information and that of its neighbors.

PSO is one of the evolutionary computation techniques, although it differs from the others in a few respects. Like other techniques, PSO initializes its population from a random distribution. What distinguishes PSO is that candidate solutions are evaluated and selected using information about the experience of each individual and of the swarm. This selection mechanism is considered more effective than that of classical evolutionary methods such as GA.

In recent years, thanks to its efficiency, the PSO algorithm has developed rapidly and appears in many studies on parameter tuning. Its performance has been verified on many popular benchmark functions, namely Schaffer, Sphere, Rosenbrock, Rastrigin and Griewank. As a consequence, PSO has been applied to unconstrained optimization, constrained optimization, multi-objective optimization, dynamic optimization problems, etc.
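For reference, the benchmark functions named above can be written in a few lines of Python. This is a sketch using the commonly published definitions; the exact variants and variable ranges used in the thesis are not given here, and the choice of the Schaffer N.2 form (two variables) is an assumption. All five have a global minimum value of 0.

```python
import math

def sphere(x):
    """Sum of squares; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def rosenbrock(x):
    """Banana-shaped valley; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))

def rastrigin(x):
    """Highly multimodal; global minimum 0 at the origin."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v)
                               for v in x)

def griewank(x):
    """Many regularly spaced local minima; global minimum 0 at the origin."""
    quadratic = sum(v * v for v in x) / 4000.0
    product = math.prod(math.cos(v / math.sqrt(i + 1))
                        for i, v in enumerate(x))
    return quadratic - product + 1.0

def schaffer_n2(x):
    """Schaffer N.2 (assumed variant), two variables; minimum 0 at (0, 0)."""
    x1, x2 = x
    numerator = math.sin(x1 * x1 - x2 * x2) ** 2 - 0.5
    denominator = (1.0 + 0.001 * (x1 * x1 + x2 * x2)) ** 2
    return 0.5 + numerator / denominator
```

Functions such as Rastrigin and Griewank are useful here because their many local minima expose an optimizer's tendency to get trapped.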

The concept of intelligent swarm

A swarm is a group of organisms that interact with each other. The term SI was first used in the context of mobile robot systems by Beni and Wang in the 1980s. In the early 1990s, research on intelligent swarms grew out of artificial life and social psychology, especially studies of societies of insects, birds and fish. These days, scientists treat swarm models as an important strategy for solving optimization problems, both constrained and unconstrained. Hence, the concept is commonly used in telecommunications, network design, robotics, military applications, etc.

Some research directions based on swarm intelligence:

Ant Colony Optimization (ACO): based on how ants find and build a route from their nest to a food source.

First, an ant finds food and returns to its nest, leaving a trail of pheromones along the way.

After that, the ants travel randomly along four possible routes; the more pheromone a route carries, the more attractive it is, which favors the short route.

At last, the ants follow only the shortest route, and the other routes gradually disappear.

Particle Swarm Optimization (PSO): inspired by how birds search for food and water. In the problem formulation, each individual has a position, a velocity and a communication channel. Individuals move in the solution space, and each is evaluated by one or more fitness criteria. Furthermore, individuals follow the best one within their range.

Classical PSO algorithm

The PSO algorithm, in its basic form, emulates the flocking behavior of birds, which directly inspired it. The fundamental model can be described as follows. Each bird is represented as a point in a Cartesian coordinate system and is initially assigned a random velocity and position. The program operates on the "nearest proximity velocity match rule," under which an individual matches the speed of its closest neighbor. As iterations proceed, all points converge to the same velocity. Because this model is oversimplified, it deviates significantly from real-world behavior. To address this, a random variable is added to the velocity component: during each iteration, in addition to following the "nearest proximity velocity match" rule, each speed is perturbed by a random variable. This modification brings the simulation closer to real-world behavior.

Heppner proposed a "cornfield model" to simulate the foraging behavior of a bird flock. In this model, a cornfield represents the location of the food source on a plane, and the birds are initially dispersed randomly across the plane. The birds move according to the following rules to locate the food. Let the position of the cornfield be $(x_0, y_0)$, and let the position and velocity of an individual bird be $(x, y)$ and $(v_x, v_y)$ respectively. The distance between the current position and the cornfield is used as the performance measure for the position and speed. A shorter distance to the "cornfield" implies better performance, while a greater distance indicates poorer performance. Each bird has memory and can remember the best position it has reached, denoted $pbest$.

The algorithm uses a velocity adjusting constant $a$ and a random number $rand$ in $[0, 1]$. The velocity components change according to the following rules:

if $x > pbest_x$, $v_x = v_x - rand \times a$; otherwise, $v_x = v_x + rand \times a$
if $y > pbest_y$, $v_y = v_y - rand \times a$; otherwise, $v_y = v_y + rand \times a$

Assuming the swarm has a means of communication, each individual also knows and remembers the best location found by the entire swarm so far, referred to as $gbest$. With a second velocity adjusting constant $b$, after the adjustment above the velocity components are further updated according to the following rules:

if $x > gbest_x$, $v_x = v_x - rand \times b$; otherwise, $v_x = v_x + rand \times b$
if $y > gbest_y$, $v_y = v_y - rand \times b$; otherwise, $v_y = v_y + rand \times b$
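The two rule sets translate almost directly into code. The sketch below is only an illustration of the rules as stated; the function names `nudge` and `cornfield_update` are hypothetical, and drawing a fresh random number for every adjustment is an assumption.

```python
import random

def nudge(v, pos, target, step, rng):
    """Decrease v if pos lies beyond target, otherwise increase it."""
    delta = rng.random() * step  # rand * a  (or rand * b)
    return v - delta if pos > target else v + delta

def cornfield_update(pos, vel, pbest, gbest, a, b, rng):
    """Apply the pbest rule, then the gbest rule, to each velocity component."""
    new_vel = []
    for q in range(len(pos)):
        v = nudge(vel[q], pos[q], pbest[q], a, rng)  # pull toward personal best
        v = nudge(v, pos[q], gbest[q], b, rng)       # pull toward swarm best
        new_vel.append(v)
    return new_vel

# A bird at (5, -5) with both best positions at the origin is pushed back
# toward the origin in each coordinate.
rng = random.Random(1)
print(cornfield_update([5.0, -5.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 1.0, 1.0, rng))
```

Note that the rules only use the sign of the offset from the best position, not its size, which is one reason the later PSO formulation replaced them with terms proportional to the distance.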

Computer simulation results indicate that when the ratio $a/b$ is relatively large, all individuals quickly converge towards the "cornfield." Conversely, if the ratio $a/b$ is small, the particles gather around the "cornfield" unsteadily and slowly. This simple simulation demonstrates the swarm's ability to locate the optimal point rapidly. Inspired by this model, Kennedy and Eberhart developed an evolutionary optimization algorithm. After numerous experiments, they established the basic algorithm as follows:

In this algorithm, each individual is abstracted as a particle without mass and volume, possessing only velocity and position. Hence, it is referred to as the "particle swarm optimization algorithm." On this foundation, the PSO algorithm can be summarized as follows. It is a swarm-based search process in which each individual, called a particle, represents a potential solution in the D-dimensional search space. Each particle remembers its own best position, the best position of the swarm, and its velocity. In each iteration, this information is combined to adjust the velocity in each dimension, which is then used to compute the new position of the particle. Particles continually change their states within the multidimensional search space until they reach a balanced or optimal state or exceed the calculation limits. The objective function establishes a unique connection between the different dimensions of the problem space. Numerous empirical studies have demonstrated the effectiveness of this algorithm as an optimization tool. The flowchart of the PSO algorithm is depicted in Figure 4-3.
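The summary above corresponds to a short loop. The following is a minimal sketch of the widely used inertia-weight form of PSO, not the thesis code: the function name `pso_minimize`, the default values of `w`, `c1`, `c2`, the zero initial velocities, the clamping of positions to the bounds, and the use of minimization are all assumptions made for illustration.

```python
import random

def pso_minimize(f, bounds, n_particles=20, max_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over box bounds [(lo, hi), ...] with inertia-weight PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions, zero initial velocities (assumption).
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_f = [f(x) for x in X]
    b = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[b][:], pbest_f[b]
    for _ in range(max_iter):
        for i in range(n_particles):
            for q in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                V[i][q] = (w * V[i][q]
                           + c1 * r1 * (pbest[i][q] - X[i][q])
                           + c2 * r2 * (gbest[q] - X[i][q]))
                lo, hi = bounds[q]
                X[i][q] = min(max(X[i][q] + V[i][q], lo), hi)
            fx = f(X[i])
            if fx < pbest_f[i]:            # update personal best
                pbest[i], pbest_f[i] = X[i][:], fx
                if fx < gbest_f:           # update swarm best immediately
                    gbest, gbest_f = X[i][:], fx
    return gbest, gbest_f

best_x, best_f = pso_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)
print(best_x, best_f)
```

Because every particle is pulled toward the same swarm best, the whole population can collapse onto a local optimum, which is the weakness the differential variant described next is meant to address.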

Figure 4-3 The flowchart of the PSO algorithm

Developed PSO-based CPG Optimization

D-PSO

Differential particle swarm optimization (DPSO) is a modified version of classical particle swarm optimization (PSO) that is capable of escaping from local optima in order to obtain a higher quality solution in classification problems.

The proposed DPSO adds one feature to classical PSO: the opinion of a particle selected randomly from the swarm. The randomly scaled difference between the particle and its opinion-giver is included in the particle's velocity equation, which is what allows it to escape from local minima. Mathematically, the concepts of DPSO can be expressed as follows:

In eqn (4-1), $c_3$ is the scaling factor and $r_3$ is a random number between 0 and 1, whereas $l$ denotes the expert particle corresponding to target particle $p$. In this equation, $l$ varies from 1 to $N$ with $l \neq p$. Figure 4-4 shows the search mechanism of the proposed DPSO in a multidimensional search space.

Figure 4-4 Proposed DPSO search mechanism of the $p$-th particle at the $k$-th iteration in a multidimensional search space [94]

In Figure 4-4, $Pbest_{p,q}^{k}$ represents the $q$-th component of the personal best of the $p$-th individual, whereas $Gbest_{q}^{k}$ represents the $q$-th component of the best individual of the population up to iteration $k$. Figure 4-4 shows that the proposed DPSO adds one more term, $V_{p}^{Diff} = c_3 r_3 (X_{l,q}^{k} - X_{p,q}^{k})$, to the velocity equation of classical PSO. This additional term allows the particles to escape from a local optimum and search for a better solution in other regions of the search space.
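As a sketch, and assuming the classical part follows the standard inertia-weight form with weight $w$, coefficients $c_1$, $c_2$ and random numbers $r_1$, $r_2$ in $[0, 1]$ (these symbols and the position update are assumptions), the velocity and position updates with the additional differential term can be written as:

```latex
\begin{aligned}
V_{p,q}^{k+1} &= w\,V_{p,q}^{k}
  + c_1 r_1 \left(Pbest_{p,q}^{k} - X_{p,q}^{k}\right)
  + c_2 r_2 \left(Gbest_{q}^{k} - X_{p,q}^{k}\right)
  + c_3 r_3 \left(X_{l,q}^{k} - X_{p,q}^{k}\right), \\
X_{p,q}^{k+1} &= X_{p,q}^{k} + V_{p,q}^{k+1}, \qquad l \in \{1,\dots,N\},\ l \neq p .
\end{aligned}
```

The last term vanishes as the swarm contracts, so it perturbs the search mainly while the particles are still spread out.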

A detailed flowchart of the proposed DPSO algorithm is shown in Figure 4-5

Figure 4-5 Flowchart of the proposed DPSO

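From Figure 4-5, the proposed DPSO algorithm can be expressed using the following steps:

1. Select the parameters of DPSO, including $w$, $c_1$, $c_2$ and $c_3$.

2. Initialize the positions $X$ and velocities $V$ of each particle of the population.

3. Evaluate the fitness of each particle, $F_p^{k} = f(X_p^{k}), \forall p$, and find the index $b$ of the best particle.

4. Set $Pbest_p^{k} = X_p^{k}, \forall p$, and $Gbest^{k} = X_b^{k}$.

5. Set the iteration counter $k = 1$.

6. Update the velocity and position of each particle using eqns (4-1) and (4-2).

7. Evaluate the updated fitness of each particle, $F_p^{k+1} = f(X_p^{k+1}), \forall p$, and find the index $b1$ of the best particle.

8. Update Pbest of each particle, $\forall p$: if $F_p^{k+1}$ is better than $F_p^{k}$, then $Pbest_p^{k+1} = X_p^{k+1}$; else $Pbest_p^{k+1} = Pbest_p^{k}$.

9. Update Gbest of the population: if $F_{b1}^{k+1}$ is better than $F_b^{k}$, then $Gbest^{k+1} = Pbest_{b1}^{k+1}$ and set $b = b1$; else $Gbest^{k+1} = Gbest^{k}$.

10. If $k < Maxite$, then $k = k + 1$ and go to step 6; else go to step 11.

11. The optimum solution is obtained; print the result Gbest.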
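The steps above can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: the function name `dpso_minimize`, the parameter defaults, the zero initial velocities, the clamping to bounds, and the use of minimization (smaller fitness is better) are all assumptions.

```python
import random

def dpso_minimize(f, bounds, n_particles=10, max_iter=300,
                  w=0.6, c1=1.4, c2=1.4, c3=0.4, seed=0):
    """PSO with an extra differential term toward a random peer l != p."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: random positions, zero velocities, initial fitness.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    F = [f(x) for x in X]
    pbest, pbest_f = [x[:] for x in X], F[:]
    b = min(range(n_particles), key=lambda p: F[p])
    gbest, gbest_f = X[b][:], F[b]
    for _ in range(max_iter):
        X_old = [x[:] for x in X]  # positions at iteration k
        for p in range(n_particles):
            # Opinion-giver: a randomly chosen particle other than p.
            l = rng.choice([i for i in range(n_particles) if i != p])
            for q in range(dim):
                r1, r2, r3 = rng.random(), rng.random(), rng.random()
                V[p][q] = (w * V[p][q]
                           + c1 * r1 * (pbest[p][q] - X_old[p][q])
                           + c2 * r2 * (gbest[q] - X_old[p][q])
                           + c3 * r3 * (X_old[l][q] - X_old[p][q]))
                lo, hi = bounds[q]
                X[p][q] = min(max(X_old[p][q] + V[p][q], lo), hi)
        # Re-evaluate, then update personal bests and the swarm best.
        F = [f(x) for x in X]
        for p in range(n_particles):
            if F[p] < pbest_f[p]:
                pbest[p], pbest_f[p] = X[p][:], F[p]
        b1 = min(range(n_particles), key=lambda p: pbest_f[p])
        if pbest_f[b1] < gbest_f:
            gbest, gbest_f = pbest[b1][:], pbest_f[b1]
    return gbest, gbest_f

best_x, best_f = dpso_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)
print(best_x, best_f)
```

To maximize a quantity such as average thrust with this sketch, one option is to pass the negated objective as `f`.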

Application of D-PSO to CPG model

The performance of the CPG model is sensitive to the amplitude values, so it is necessary to find the best parameter values in order to improve the propulsive performance. In this sub-section, the novel D-PSO is applied to obtain a set of optimum amplitudes by maximizing the average propulsive force, which serves as the objective function. The optimization problem of the CPG model can be posed in the following way:

Figure 4-6 Flowchart of the proposed approach

The developed D-PSO-based CPG optimization is performed as follows:

Select the parameters of D-PSO, including $w$, $c_1$, $c_2$ and $c_3$.

The initial positions and velocities of each individual in the swarm are set to random values.

Initialize the oscillation amplitudes of the CPG model within their ranges.

Call the Hopf oscillator-based CPG model.

The fitness function of each individual is evaluated by:

The particle having the best position is indexed as $p$, and the personal experience and the overall experience are selected as follows: $P_{best,i}^{ite} = X_i^{ite}, \forall i$, and $G_{best}^{ite} = X_p^{ite}$.

Initialize the iteration at $ite = 1$.

The velocity and position of each individual are updated by Equations (4.5) and (4.6).

The updated fitness function of each particle is re-evaluated, $FN_i^{ite+1} = f(X_i^{ite+1}), \forall i$ (4-7), and the particle with the best position is indexed as $q$.

The personal experience and the overall experience of the swarm are updated:

If $FN_i^{ite+1}$ is better than $FN_i^{ite}$, then $P_{best,i}^{ite+1} = X_i^{ite+1}$; else $P_{best,i}^{ite+1} = P_{best,i}^{ite}$.

If $FN_q^{ite+1}$ is better than $FN_p^{ite}$, then $G_{best}^{ite+1} = P_{best,q}^{ite+1}$ and set $p = q$; else $G_{best}^{ite+1} = G_{best}^{ite}$.

If $ite < Maxite$, then $ite = ite + 1$ and return to the velocity and position update step; else go to the final step.

The optimal parameters of the CPG are obtained as $G_{best}^{ite}$, and hence the maximum thrust force is determined.
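To make the role of the amplitude parameters concrete, the sketch below integrates a single, uncoupled Hopf oscillator in its generic textbook form, where the state settles onto a limit cycle whose radius equals the amplitude parameter. It is not the thesis model: the coupling between the sixteen oscillators and the thesis equations are not reproduced, and the function name `simulate_hopf`, the convergence gain `gamma`, the Euler step `dt` and the initial state are all assumptions.

```python
import math

def simulate_hopf(amplitude, freq_hz, gamma=10.0, dt=5e-4, t_end=5.0,
                  x0=0.1, y0=0.0):
    """Forward-Euler integration of a generic Hopf oscillator.

    dx/dt = gamma * (A^2 - r^2) * x - omega * y
    dy/dt = gamma * (A^2 - r^2) * y + omega * x,   r^2 = x^2 + y^2
    Returns the x trajectory and the final radius.
    """
    omega = 2.0 * math.pi * freq_hz
    x, y = x0, y0
    xs = []
    for _ in range(int(round(t_end / dt))):
        r2 = x * x + y * y
        dx = gamma * (amplitude ** 2 - r2) * x - omega * y
        dy = gamma * (amplitude ** 2 - r2) * y + omega * x
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
    return xs, math.hypot(x, y)

# At 2 Hz the output grows from a small initial state and settles
# close to the requested amplitude of 0.5.
xs, radius = simulate_hopf(amplitude=0.5, freq_hz=2.0)
print(round(radius, 2))
```

In an optimization loop of the kind listed above, each particle would supply one amplitude per oscillator, and the fitness would be computed from the resulting fin motion.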

Test Results and Discussion

Testing the D-PSO algorithm on the basic math function

To demonstrate the feasibility of the developed D-PSO technique in finding the extrema of functions, several basic math functions were first tested, as listed in Table 4-3.

Note that D-PSO is run with the parameters selected above and a population size of 10 particles.

Table 4-3 The tested five math functions

Function name Equation Variable range

Table 4-3 shows that the proposed D-PSO technique is able to optimize the five basic math functions successfully. For a population size of ten particles, D-PSO achieves mean square error (MSE) values of 8.11E-05, 0.000928, 0.000389, 3.37E-15, and 2.80E-12 for the five functions, respectively.

Testing the D-PSO algorithm on the modified CPG network

Table 4-4 gives the average propulsive force of the undulating fin, using the dynamic model driven by the Hopf oscillator-based CPG unit, with and without D-PSO optimization.

Table 4-4 Optimization results of CPG model with/without D-PSO algorithm

Table 4-4 shows that the average thrust before optimization, with randomly chosen parameters, is 0.52 N, whereas it increases to 3.6 N with the D-PSO-based CPG. The best amplitude parameters A1-A16, obtained at a constant intrinsic frequency of 2 Hz, are given in Table 4-4. Furthermore, the D-PSO-based CPG output and the corresponding average thrust force are shown in Figure 4-10 and Figure 4-11, respectively.

Figure 4-5 Simulation results with the D-PSO-based CPG - CPG outputs

Figure 4-6 Simulation results with the D-PSO-based CPG - The average thrust force

Table 4-5 Optimization results of CPG model using different meta-heuristic algorithms

Table 4-5 compares D-PSO with traditional PSO and GA to demonstrate the superiority of the method developed in this study. The parameters of PSO and GA were also selected through repeated runs with different population sizes. As a result, both acceleration coefficients of PSO are set to 2.0, while GA uses a crossover probability of 0.75 and a mutation fraction of 0.015. Moreover, Figure 4-12 illustrates the convergence characteristics of the average thrust obtained with the three optimization techniques.

Figure 4-12 shows that the optimum obtained by D-PSO is better than those obtained by the other CPG parameter optimizers. The average thrust is 3.60 N with the proposed D-PSO, while it reaches only 3.58 N and 3.57 N with PSO and GA, respectively. In other words, the D-PSO-based CPG gives the largest thrust. The proposed D-PSO also converges faster than PSO and GA: 2.3 iterations compared to 3.9 iterations and 10.5 iterations, respectively. It can further be seen from Figure 4-14 that the convergence characteristics of PSO and GA contain two steps, which indicates that these two methods are easily trapped in local maxima. This problem is solved by the proposed D-PSO, whose convergence curve shows only one step. This means D-PSO reaches the global best position more reliably than the other methods.
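To put the reported figures in relative terms, a quick calculation from the values quoted in this section (0.52 N before optimization; 3.60 N, 3.58 N and 3.57 N for D-PSO, PSO and GA) gives approximately:

```latex
\frac{3.60}{0.52} \approx 6.9, \qquad
\frac{3.60 - 3.58}{3.58} \approx 0.56\%, \qquad
\frac{3.60 - 3.57}{3.57} \approx 0.84\%
```

So the gain over the randomly chosen parameters is roughly sevenfold, while the margin of D-PSO over PSO and GA in final thrust is under one percent; the clearer difference between the three methods lies in how quickly they converge.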

Figure 4-7 The convergence characteristic of some CPG optimization techniques

Conclusions

A locomotion controller has been built for the fin module, which is very flexible in creating swimming patterns. At the same swimming frequency (which is also the sonar frequency the robot generates under water), different swimming patterns generate different thrusts. This chapter builds on the thrust calculation model developed in Chapter 2. The thrust optimization function takes as variables the amplitudes of the 16 fin rays and is optimized with the DPSO algorithm. This is a new approach to the practical problem of finding the set of control parameters that gives the greatest thrust without changing the emitted sonar frequency, which helps the robot leave a danger zone quickly. In addition, the improved swarm optimization algorithm helps avoid local optima and gives better results than two methods built on the same idea, the original GA and PSO. The simulation results show that DPSO gives a set of amplitude values that increases the stability of the thrust, and the thrust is about 0.02 N higher than with the other two methods. The approach can be applied over a wide range of frequencies. This makes it possible to build a table from which the best frequency and force can be looked up before the robot is turned on, so that hydroacoustic disturbances caused by changing the swimming frequency are avoided.

The code and data are shown in Appendix B

Experimentation is a very important part of research, especially for application-oriented topics. In this chapter, experiments are arranged to clarify the following factors:

• Ability of the CPG-based motion controller to switch flexibly when the frequency or amplitude changes

• Verify the convergence rate for some random values of the convergence factor K compared with the optimal value found in Chapter 3

• Compare the thrust generated, at a given frequency and convergence factor K, by the optimal set of amplitude parameters found in Chapter 4 and by a few sets of random parameters.

Introducing experimental models and measuring devices

The elongated undulating fin comprises sixteen oblique adjacent fin-rays interconnected by a flexible membrane. Each fin-ray is driven by an RC servo motor that makes the fin-ray sway around a rotary joint fixed to a supporting frame, as illustrated in Figure 5-1.

Specific parameters are presented in Table 5-1. To produce the propulsive force, the elongated undulating fin propagates a sinusoidal oscillation along the fin from the anterior to the posterior.

Table 5-1 Specific parameters of elongated undulating fin

Parameter    Value    Unit
Length       775      mm

Figure 5-1 Overview of elongated undulating fin

The number of servos corresponds to the number of fin rays. In real fish, the fin rays are very close together, and the number of fin rays within one complete wavelength is very large. When imitating fish under mechanical limitations, achieving smooth propulsion requires selecting an appropriate number of fin rays. Published research reports that 16 servos is the optimal choice for such a model [40]. Therefore, this model uses 16 fin rays to set up an underwater propulsion module that imitates the swimming mechanism of Gymnotiform fish. The distance between two adjacent rays is limited by the mechanical structure and is set to 30 mm. The servo motor is a commercially available type with the specific parameters in Table 5-2.

Idle current (stopped): 6 mA @ 7.4 V

Running current (no load): 180 mA @ 7.4 V

Peak stall torque: 35.5 kg.cm @ 7.4 V

Because the module is designed for actual underwater operation, water resistance is achieved through the geared transmission mechanism shown in Figure 5-2. In this design, the fin ray does not engage the servo motor shaft directly but is driven indirectly through two perpendicular bevel gears with a 1:1 gear ratio.

Figure 5-2 Fin ray drive mechanism

The fish fin is designed as a self-contained propulsion module, so the control system follows an open-loop control scheme. The control board, shown in Figure 5-3, receives the frequency signal from the central controller and then computes the corresponding rotation angle and PWM signal for each RC servo. To achieve synchronous regulation, each RC motor functions as a low-level controller, effectively embodying the essence of CPG-based control. The high-level controller only transmits control parameters, while the motor itself autonomously executes all position response operations. This approach draws inspiration from the joint control systems of humanoid robots and supports efficient, independent motor operation.
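The angle-to-PWM conversion performed by the control board can be illustrated with a short sketch. Everything specific here is an assumption rather than the thesis firmware: the plain traveling sine wave stands in for the CPG output, the function names are hypothetical, and the 1000 to 2000 microsecond pulse range for a swing of plus or minus 90 degrees is a typical hobby-servo convention, not a value taken from the servo used in the thesis.

```python
import math

def fin_ray_angle(i, t, amplitude_rad, freq_hz, phase_lag_rad):
    """Target angle of fin ray i at time t for a traveling sine wave."""
    return amplitude_rad * math.sin(2.0 * math.pi * freq_hz * t + i * phase_lag_rad)

def angle_to_pulse_us(angle_rad, center_us=1500.0,
                      us_per_rad=500.0 / (math.pi / 2.0),
                      min_us=1000.0, max_us=2000.0):
    """Map a servo angle to a PWM pulse width, clamped to the assumed range."""
    pulse = center_us + us_per_rad * angle_rad
    return min(max(pulse, min_us), max_us)

# Pulse widths for 16 rays at t = 0.1 s, f = 1 Hz, phase lag -pi/3 per ray,
# with an illustrative amplitude of 0.5 rad for every ray.
pulses = [angle_to_pulse_us(fin_ray_angle(i, 0.1, 0.5, 1.0, -math.pi / 3.0))
          for i in range(16)]
print([round(p) for p in pulses])
```

Clamping the pulse width keeps a bad amplitude value from driving a servo past its mechanical limits, which matters in an open-loop scheme where the high-level controller does not monitor the rays.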

Figure 5-3 Control system structure

The block diagram of the control circuit is designed as shown in Figure 5-4.

Figure 5-4 Block diagram of the control Fin module board

The experimental model is built at a 1:1 ratio to the simulation model. The servo motor and control diagram are briefly described in Chapter 2, and the design drawings are attached in the appendix. Specific parameters of the fin ray model are shown in Table 5-3.

5 Width of diaphragm part 150 mm

Figure 5-5 Module elongated undulating fin

The thrust measuring device is a specialized industrial instrument for measuring compressive force, the IMADA DS2-200N, with the following technical parameters:

Selectable Units: lbf (ozf), kgf (gf) or Newtons

Overload Capacity: 200% of F.S. (overload indicator flashes beyond 110% of F.S.)

Data Processing Speed: 1,000 data/second (30 data/second rate selectable)

Power: Rechargeable Ni-MH battery pack or AC adapter

Battery Indicator: Display flashes battery icon when battery is low

Setpoints: Programmable high/low setpoints with LCD indicators

Outputs: RS-232C, Digimatic and ±1 VDC analog output

Operating Temp: 32° to 100°F (0° to 40°C)

• Instrument for measuring the true angle of rotation of the fin ray

The actual angle of each fin ray is fed back from the position sensor of the servo motor itself. At each fin ray, the sensor signal is read by the ADC, passed through noise filtering, and transmitted to the computer via serial communication.

Figure 5-6 Instrument for measuring the true angle of rotation of the fin ray

• Experiment tank and equipment setup

The experimental system was set up in a 2 m x 4 m x 1.2 m water tank, which provides an environment in which the fin swims subject to the full effects of water. Note, however, that a few factors are still ignored, such as the effects of vortices and waves.

Figure 5-7 Experiment tank and equipment setup

• Software and automatic parameter recording tools

The software was developed in-house, combining MCU programming with a computer interface written in Python. The code is presented in detail in the attached appendix.

Figure 5-8 Software and automatic parameter recording tool

Experiment

Experiment 3

• Purpose: to validate the calculation in Chapter 4 of the set of amplitude parameters that produces the optimal thrust. In the experiment, many different sets of amplitude parameters are applied to the CPG-based locomotion controller of the fin module. The actual swimming wave shape is recorded, and the force of each wave shape is measured by the force meter mounted on the experimental model.

The setup is similar to experiment two, with an additional dynamometer to measure the thrust of the model. The force is recorded over time, in turn, for the swimming strokes optimized by PSO and for the swimming strokes commonly observed in practice. The parameters of the PSO-optimized swimming postures are taken from the simulation results. The actual swimming poses are selected as follows: k, $f = 1$ Hz, $\varphi_d = -\pi/3$, $\beta = 0.8$. Their amplitudes are chosen according to the following parameter with $h = \pi$

• Result: this verifies the responsiveness of the designed mechanism and the motion controller. The rotation angle of each fin ray is collected through the angle sensor located at each servo motor. These sensors are rotary resistive sensors. Different strokes are tested in turn to compare their thrust: Linear waveform (Figure 5-14); Quadratic waveform (Figure 5-15); Elliptic waveform (Figure 5-16); Random waveform (Figure 5-17); GA CPG waveform (Figure 5-18); Straight CPG waveform (Figure 5-19); PSO CPG waveform (Figure 5-20); DPSO waveform (Figure 5-21). In these figures, the blue line is the simulation and the red line is the actual measurement. The actual data collected is attached in Appendix D.

The linear swimming posture, commonly seen in cuttlefish, is produced by the CPG-based locomotion controller. The blue line is the simulation and the red line is the experimental measurement.

The quadratic waveform is common in some species such as black knifefish in South Africa, strabismus. This realistic swimming pose is performed on the robot under CPG-based 16-ray fin movement control.

The elliptic waveform is commonly seen in manta rays during exercise. This realistic swimming pose is performed on the robot under the control of the 16-ray fin movement on the CPG platform. These are three natural swimming postures that the robot can easily reproduce with the CPG motion controller. Next, swimming postures are generated by different optimization methods according to the thrust optimization criterion without changing the frequency; the measured and simulated results are presented in the graphs.

One thing is common to all swimming strokes: the measured red line initially coincides with the simulated blue line. Later on, a small deviation appears, but the contour still preserves the intended swimming shape. This can be explained by the slow response of the data acquisition system and the effect of water resistance in the small pool (the influence of eddy currents at the edges of the pool has been ignored).

The IMADA thrust meter is used to collect thrust data concurrently with the swimming posture test. In all force graphs, the red line is the simulation and the blue line is the actual measurement. The graphs include: force of Linear waveform (Figure 5-22); force of Quadratic waveform (Figure 5-23); force of Elliptic waveform (Figure 5-24); force of Random waveform (Figure 5-25); force of GA CPG waveform (Figure 5-26); force of Straight CPG waveform (Figure 5-27); force of PSO CPG waveform (Figure 5-28); force of DPSO waveform (Figure 5-29).

The result of force versus time is shown by the following graphs:

Figure 5-12 Force of linear waveform

Figure 5-13 Force of quadratic waveform

Figure 5-14 Force of elliptic waveform

Figure 5-15 Force of random waveform

Figure 5-16 Force of GA waveform

Figure 5-17 Force of straight CPG waveform

Figure 5-28 Force of D-PSO CPG waveform

Figure 5-29 Force of PSO CPG waveform

The graphs show spikes in the measured force that can be explained by vibrations of the sliding system when the fins are in motion; the base force profile follows the simulation. Once stabilized, the actual thrust is lower than the simulated thrust, which is explained by the friction of the sliding rail.

To make the generated thrust easy to compare, the measured thrust for the above swimming wave shapes is shown in Figure 5-30 and Figure 5-31. The average force of each stroke is plotted on the same chart in a different color with annotations.

Figure 5-18 Average force of strokes from CPG

Figure 5-19 Average force of swimming strokes observed from nature

This experiment generated many different swimming wave shapes in order to compare their thrust at the same frequency. The experimental results show that the wave shape generated with the amplitude parameters A1-A16 found by DPSO optimization produces the highest thrust. The experiments confirm once more the validity of optimizing thrust by finding the best set of amplitude parameters without changing the frequency.

Conclusions

This chapter summarizes the results of three experiments and compares them with the simulation outcomes to validate the computations presented in Chapters 2, 3, and 4. The first experiment demonstrates that using the CPG motion controller enables smooth swimming pattern transitions when adjusting the frequency or amplitude. The second experiment confirms that the swimming pattern conversion coefficient, K, obtained through reinforcement learning algorithms, outperforms other values. This coefficient achieves a balance between the swimming pattern transition speed and the control error. In practical terms, this factor affects the mechanical safety limits of the fin system. The final experiment substantiates, through real measurements, that the swimming pattern discovered by the D-PSO algorithm is optimal in terms of thrust generation.

The code and data are shown in Appendix C

Over the decades, research has produced underwater robots whose movement mechanisms resemble those of fish. Although there have been significant advances in performance and swimming ability, bringing robots up to the performance and flexibility of fish still has a long way to go and needs the contributions of many researchers. The thesis looks at fish robots as a way to address a pressing problem: underwater demining after naval wars.

To enhance applicability, the thesis first expands the scope of contributions by modularizing the propulsion unit, using the swimming mechanism of fish suited to the application. Based on this module, larger modules can be built, or several modules combined, to increase the mobility of a complete fish robot. Secondly, to focus on flexibility in changing swimming posture, the thesis builds a locomotion controller for the fin ray module based on the CPG mechanism of real fish. Thirdly, to obtain a basis for calculations applicable to complete robots, the thesis models the fin ray module using the theory of fluid interaction to calculate the thrust of a module for different swimming waveforms. The thesis also finds that the coefficient K in the locomotion controller of the fin module is an essential factor determining the flexibility of this swimming mechanism, and a reinforcement learning algorithm is applied to find the optimal K. Finally, the thesis addresses a specific goal: escaping from a dangerous area without triggering the acoustic sensing mechanism of a hydroacoustic non-contact fuse (that is, without changing the sonar frequency). To this end, it presents a solution for finding the swimming shape with optimal thrust at each frequency.

Dissertation contributions

The dissertation adopts a construction-oriented methodology, building on the earlier research of colleagues interested in fish robots and concentrating on finding solutions to the practical issues of the need for robots for surveying and removing underwater bombs In addition, create a biomimetic propulsion component to increase the applications The main contributions can be summed up as follows in terms of their impact on science:

• Modularizing the propulsion system to create a stand-alone system with a high self- control mechanism based on the CPG mechanism This module generates all motion control operations automatically, which only accepts a limited range of parameters,

106 including frequency and amplitude values This enables the rapid development of fish robots in a variety of sizes and scales

• Creating a controller that directs movement so as to improve flexibility, by identifying the distinctive K coefficient for the transition period of the swimming posture. The approach is fully applicable to biomimetic operational structures through a contemporary model-free reinforcement learning algorithm. Thanks to reinforcement learning, robots can become more in tune with how learning and evolution work in nature.

• The demand for maximal acceleration without changing the frequency arises in many surveying and locomotion scenarios for fish robots, not just in mine clearance applications. Therefore, the dissertation proposes a novel strategy that uses an enhanced swarm algorithm to avoid the local-optimum errors found in earlier published studies.
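The swarm-based search in the last contribution can be pictured with a plain particle swarm optimization (PSO) loop. The sketch below is the standard textbook PSO, not the enhanced variant developed in the dissertation, and `toy_thrust` is a hypothetical stand-in for the thrust model. The function names, swarm size, and coefficients are all assumptions.

```python
import numpy as np


def pso_maximize(objective, lower, upper, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO that maximizes `objective` inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                                # per-particle best positions
    best_val = np.array([objective(p) for p in pos])     # per-particle best values
    g = best_pos[np.argmax(best_val)].copy()             # swarm-wide best position
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lower, upper)           # keep particles inside the box
        val = np.array([objective(p) for p in pos])
        improved = val > best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = best_pos[np.argmax(best_val)].copy()
    return g, float(best_val.max())


def toy_thrust(params):
    """Hypothetical objective with a single peak at (0.6, 0.3); not the thesis model."""
    a, b = params
    return -((a - 0.6) ** 2) - (b - 0.3) ** 2


best, value = pso_maximize(toy_thrust, lower=[0.0, 0.0], upper=[1.0, 1.0])
print("best parameters:", best, "objective:", value)
```

A single-peak objective like this one cannot trap the swarm in a local optimum; the difficulty the dissertation targets arises only when the objective has several peaks.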

In addition to its academic contributions, my thesis also contributes to addressing the fundamental specifications of an underwater mine-detecting robot fish. Building upon the research conducted in the thesis, three ongoing projects by other researchers follow this study and develop it further into a complete robotic fish. These projects aim to integrate the findings and methodologies presented in the thesis to create a fully functional robot fish. By leveraging the knowledge and insights gained from this research, they strive to enhance the capabilities and performance of the robot fish for effective mine detection and exploration tasks in underwater environments. Their collaborative efforts build upon the foundation laid by the thesis, paving the way for advancements in underwater robotics and contributing to the development of innovative solutions for mine-clearing operations.

Future work

Building on the results obtained in the thesis, further work will be done to perfect the locomotion controller with optimal parameters. Specifically, I have prepared new machine-learning algorithms to help optimize energy consumption for each specific movement need. The CPG-based locomotion controller still has many open scientific questions for further research.

Based on the parameters researched in the thesis, the next step for me and my colleagues is to integrate multiple fish fin modules onto a rigid body to develop a versatile underwater mine-clearing robot. This project has been approved and funded by the government, and we have already commenced its implementation, which is expected to span a period of six months.

The objective of this project is to leverage the insights gained from the thesis and combine them with practical engineering approaches to create a robust and adaptable robot for bomb disposal in underwater environments. By integrating the fish fin modules onto a rigid body, we aim to enhance the manoeuvrability and agility of the robot, enabling it to navigate complex underwater terrains and effectively locate and neutralize underwater mines.

The government's support and approval of this project demonstrate the recognition of its significance and potential impact in addressing the challenges associated with underwater mine clearance. We are dedicated to successfully executing this project, contributing to the advancement of underwater robotics technology, and enhancing the safety and security of underwater operations.

International journals/Book series (ISI-Scopus)

1 V D Nguyen, “Force Optimization of Elongated Undulating Fin Robot Using Improved PSO-Based CPG,” in Computational Intelligence and Neuroscience, vol 2022, pp 1-11, 2022, doi: 10.1155/2022/2763865 Impact Factor: 3.633; Journal Rank: 3615; SJR: 0.605; (Currently, although the journal still has a Q1 index, it has been removed from the Scopus list)

2 V D Nguyen, D Q Vo, V T Duong, H H Nguyen, and T T Nguyen, “Reinforcement learning-based optimization of locomotion controller using multiple coupled CPG oscillators for elongated undulating fin propulsion,” in Mathematical Biosciences and Engineering, vol 19, no 1, pp 738-758, 2021, doi: 10.3934/mbe.2022033 Impact Factor: 1.285; h-index: 45; SJR:

3 V H Nguyen, V T Duong, H H Nguyen, V D Nguyen, V S Le, and T T Nguyen, “Kinematic Study on Generated Thrust of Bioinspired Robotic with Undulating Propulsion,” in International Conference on Advanced Mechatronic Systems (ICAMechS), 2020, doi: 10.1109/icamechs49982.2020.9310158 Impact Factor: 0.29; h-index: 13; SJR: 0.148;

Proceedings of the international conferences

1 V H Nguyen, C A T Pham, V D Nguyen, and T T Nguyen, “Study on Velocity Control of Gymnotiform Undulating Fin Module,” in Lecture Notes in Electrical Engineering, pp 714-

2 V H Nguyen, C A T Pham, V D Nguyen, H L Phan, and T T Nguyen, “Computational Study on Upward Force Generation of Gymnotiform Undulating Fin,” in Lecture Notes in

3 V D Nguyen, C A T Pham, V H Nguyen, T P Tran, and T T Nguyen, “Modular Design of Gymnotiform Undulating Fin,” in Lecture Notes in Electrical Engineering, pp 924-931,

4 V H Nguyen, C A T Pham, V D Nguyen, D H Kim, and T T Nguyen, “A Study on Force Generated by Gymnotiform Undulating Fin,” in 15th International Conference on Ubiquitous

5 V D Nguyen, D K Phan, C A T Pham, D H Kim, V T Dinh, and T T Nguyen, “Study on Determining the Number of Fin-Rays of a Gymnotiform Undulating Fin Robot,” in Lecture Notes in Electrical Engineering, pp 745-752, 2017, doi: 10.1007/978-3-319-69814-4_72

Proceedings of the national conferences

1 C N Nguyen, T N L Le, V D Nguyen, T P Tran, and T T Nguyen, “A Study on Localization of Floating Aquaculture Sludge Collecting Robot,” in Advances in Asian Mechanism and Machine Science, pp 512-518, 2021, doi: 10.1007/978-3-030-91892-7_48

2 A S Nguyen, V D Nguyen, H H Nguyen, and T T Nguyen, “A Novel Approach for Determining a Hit Point Based on Estimating Target Movement and Ballistic Table,” in Lecture Notes in Electrical Engineering, pp 703-713, 2020, doi: 10.1007/978-3-030-53021-1_70

[1] A Azuma, The Biokinetics of Flying and Swimming Springer, Tokyo, Japan, 1992 doi:

[2] G V Lauder, E J Anderson, J Tangorra, and P G A Madden, “Fish biorobotics: kinematics and hydrodynamics of self-propulsion,” in Journal of Experimental Biology, vol 210, no 16, pp 2767–2780, Aug 2007, doi: 10.1242/jeb.000265

[3] L Lionel, “Underwater Robots Part I: Current Systems and Problem Pose,” in Mobile Robots: towards New Applications, I-Tech Education and Publishing, Ed London, United Kingdom, 2006, vol.16, pp 310-334, doi: 10.5772/4697

[4] P R Bandyopadhyay, “Trends in biorobotic autonomous undersea vehicles,” in IEEE

Journal of Oceanic Engineering, vol 30, no 1, pp 109-139, Jan 2005, doi:

[5] L Lionel, “Underwater Robots Part I: Current Systems and Problem Pose,” in Mobile

United Kingdom, 2006, vol.16, pp 310-334, doi: 10.5772/4697

[6] L Lionel, “Underwater Robots Part II: Existing Solutions and Open Issues,” in Mobile

United Kingdom, 2006, vol.17, pp 336-372 doi: 10.5772/4698

[7] G V Lauder, E J Anderson, J Tangorra, and P G Madden, “Fish biorobotics: kinematics and hydrodynamics of self-propulsion” in J Exp Biol, vol 210(Pt 16), pp 2767-80, Aug 2007, doi: 10.1242/jeb.000265 PMID: 17690224

[8] M S Triantafyllou and G S Triantafyllou, “An Efficient Swimming Machine,” in Scientific American, vol 272, no 3, pp 64–70, 1995 doi: 10.1038/scientificamerican0395-64

[9] K Mohseni, “Pulsatile vortex generators for low-speed maneuvering of small underwater vehicles,” in Ocean Engineering, vol 33, iss 16, pp 2209-2223, 2006, ISSN 0029-8018, https://doi.org/10.1016/j.oceaneng.2005.10.022

[10] M Sfakiotakis, D M Lane and J B C Davies, “Review of fish swimming modes for aquatic locomotion,” in IEEE Journal of Oceanic Engineering, vol 24, no 2, pp 237-

[11] T R Asplund, “The effects of motorized watercraft on aquatic ecosystems,” in

Environmental Science, University of Wisconsin, Madison, pp 21 (PUBL-SS-948–00),

Mar 2000, http://dnr.wi.gov/org/water/fhp/papers/lakes pdf

[12] L Lionel, “Underwater Robots Part II: Existing Solutions and Open Issues,” in Mobile

United Kingdom, 2006, vol.17, pp 336-372 doi: 10.5772/4698

[13] M S Triantafyllou and G S Triantafyllou, “An Efficient Swimming Machine,” in Scientific American, vol 272, no 3, pp 64-70, 1995 doi: 10.1038/scientificamerican0395-64

[14] F E Fish and J J Rohr, “Review of Dolphin Hydrodynamics and Swimming Performance,” Technical report 1801, San Diego, Aug 1999, CA 92152–5001

[15] A J Ijspeert, A Crespi, D Ryczko, and J.M Cabelguen, “From Swimming to Walking with a Salamander Robot Driven by a Spinal Cord Model,” in Science, vol 315, no

[16] K H Low, C Zhou, and Y Zhong, “Gait Planning for Steady Swimming Control of

Biomimetic Fish Robots,” in Advanced Robotics, vol 23, no 7–8, pp 805–829, Jan

[17] P W Webb, “Form and Function in Fish Swimming,” in Scientific American, vol 251, no 1, pp 72–83, 1984

[18] C C Lindsey, “Form, Function and Locomotory Habits in Fish,” in Locomotion, Fish

[19] A Bekoff, “Development of Locomotion in Vertebrates: A Comparative Perspective,” in The Comparative Development of Adaptive Skills, Routledge, pp 57-94, Feb 2018, doi: 10.4324/9781351265164-3

[20] G E Coghill, “Anatomy and the Problem of Behaviour,” in Nature, Cambridge: University Press, vol 124, pp 648–649, 1929, doi: 10.1038/124648b0

[21] J C Fentress, “History of developmental neuroethology: Early contributions from ethology,” in Journal of Neurobiology, vol 23, no 10, pp 1355-1369, Dec 1992, PMID: 1487740, doi: 10.1002/neu.480231003

[22] S R Robinson and W P Smotherman, “Fundamental motor patterns of the mammalian fetus,” in Journal Neurobiol, vol 23, no 10, pp 1574–1600, Dec 1992, doi:

[23] A Roberts, “How does a nervous system produce behaviour? A case study in neurobiology,” in Sci Prog, vol 74, no 293 Pt 1, pp 31–51, 1990, PMID: 2176347

[24] W Zhao, Y Hu, L Wang and Y Jia, “Development of a flipper propelled turtle-like underwater robot and its CPG-based control algorithm,” in 47th IEEE Conference on Decision and Control, Cancun, Mexico, 2008, pp 5226-5231, doi: 10.1109/CDC.2008.4738819

[25] R Ding, J Yu, Q Yang, M Tan and J Zhang, “Robust gait control in biomimetic amphibious robot using central pattern generator,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 2010, pp 3067-3072, doi: 10.1109/IROS.2010.5651475

[26] W Wang, J Guo, Z Wang and G Xie, “Neural controller for swimming modes and gait transition on an ostraciiform fish robot,” in IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Wollongong, NSW, Australia, 2013, pp 1564-

[27] J Yu, R Ding, Q Yang, M Tan and J Zhang, “Amphibious Pattern Design of a Robotic Fish with Wheel‐propeller‐fin Mechanisms,” in Journal of Field Robotics, vol 30, no 5, pp 702–716, Jul 2013, doi: 10.1002/rob.21470

[28] T Wang, Y Hu, and J Liang, “Learning to swim: a dynamical systems approach to mimicking fish swimming with CPG,” in Robotica, vol 31, no 3, pp 361–369, May

[29] W Wang and G Xie, “CPG-based Locomotion Controller Design for a Boxfish-like Robot,” in International Journal of Advanced Robotic Systems, vol 11, no 6, p 87, Jun 2014, doi: 10.5772/58564

[30] X Niu, J Xu, Q Ren, and Q Wang, “Locomotion Learning for an Anguilliform

Robotic Fish Using Central Pattern Generator Approach,” in IEEE Transactions on Industrial Electronics, vol 61, no 9, pp 4780–4787, Sep 2014, doi:

[31] E Donati, F Corradi, C Stefanini and G Indiveri, “A spiking implementation of the lamprey's Central Pattern Generator in neuromorphic VLSI,” in IEEE Biomedical Circuits and Systems Conference (BioCAS) Proceedings, Lausanne, Switzerland, 2014, pp 512-515, doi: 10.1109/BioCAS.2014.6981775

[32] J Yu, C Wang, and G Xie, “Coordination of Multiple Robotic Fish with Applications to Underwater Robot Competition,” in IEEE Transactions on Industrial Electronics, vol 63, no 2, pp 1280–1288, Feb 2016, doi: 10.1109/TIE.2015.2425359

[33] S Zhang, Y Qian, P Liao, F Qin, and J Yang, “Design and Control of an Agile Robotic Fish with Integrative Biomimetic Mechanisms,” in IEEE/ASME Transactions on Mechatronics, vol 21, no 4, pp 1846–1857, Aug 2016, doi: 10.1109/TMECH.2016.2555703

[34] M J Lighthill, “Note on the swimming of slender fish,” in Journal of Fluid Mechanics, vol 9, no 2, pp 305–317, Oct 1960, doi: 10.1017/S0022112060001110

[35] K H Low, C Zhou and Y Zhong, “Gait Planning for Steady Swimming Control of

Biomimetic Fish Robots,” in Advanced Robotics, vol 23, no 7-8, pp 805-829, Apr

[36] M Sfakiotakis, D M Lane and J B C Davies, “Review of fish swimming modes for aquatic locomotion,” in IEEE Journal of Oceanic Engineering, vol 24, no 2, pp 237-

[37] K Mohseni, “Pulsatile vortex generators for low-speed maneuvering of small underwater vehicles,” in Ocean Eng, vol 33, no 16, pp 2209–2223, 2006, doi:

[38] S H William and J R David, Locomotion Academic Press, pp 576,

[39] P W Webb, “Form and Function in Fish Swimming,” in Scientific American, vol 251, no.1, pp 72–83, Aug 1984, http://www.jstor.org/stable/24969414

[40] P L Brower, “Design of a manta ray inspired underwater propulsive mechanism for long range, low power operation,” M.S thesis, Tufts University, USA, Aug 2008

[41] T Michael and T George, “An Efficient Swimming Machine,” in Scientific American, vol 272, no 3, pp 64-70, Mar.1995, doi: 10.1038/scientificamerican0395-64

[42] M A MacIver, E Fontaine, and J W Burdick, “Designing future underwater vehicles: principles and mechanisms of the weakly electric fish,” in IEEE Journal of Oceanic Engineering, vol 29, no 3, pp 651–659, Jul 2004, doi: 10.1109/JOE.2004.833210

[43] J Buchli, “Adaptive frequency oscillators and applications to adaptive locomotion control of compliant robots,” Ph.D dissertation, École Polytechnique Fédérale De Lausanne, Switzerland, 2007

[44] J Buchli, A J Ijspeert, M Murata, and N Wakamiya, “Distributed Central Pattern

Generator Model for Robotics Application Based on Phase Sensitivity Analysis,” in

Biologically Inspired Approaches to Advanced Information Technology, vol 3141,

[45] J Buchli, F Iida and A J Ijspeert, “Finding Resonance: Adaptive Frequency

Oscillators for Dynamic Legged Locomotion,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, pp 3903-3909, doi:

[46] M Sfakiotakis, J Fasoulas and R Gliva, “Dynamic modeling and experimental analysis of a two-ray undulatory fin robot,” in IEEE/RSJ International Conference on Intelligent

Robots and Systems (IROS), Hamburg, Germany, 2015, pp 339-346, doi:

[47] M Sfakiotakis and J Fasoulas, “Development and experimental validation of a model for the membrane restoring torques in undulatory fin mechanisms,” in 22nd Mediterranean Conference on Control and Automation, Palermo, Italy, 2014, pp 1540–

[48] J Yuh, “Design and Control of Autonomous Underwater Robots: A Survey,” in Autonomous Robots, vol 8, pp 7–24, 2000, doi: 10.1023/A:1008984701078

[49] K H Low, “Maneuvering of biomimetic fish by integrating a buoyancy body with modular undulating fins,” in International Journal of Humanoid Robotics, vol 4, no 4, pp 671–695, 2007, doi: 10.1142/S0219843607001217

[50] C Ren, X Zhi, Y Pu, F Zhang, “A multi-scale UAV image matching method applied to large-scale landslide reconstruction,” in Mathematical Biosciences and Engineering, vol 18, no 3, pp 2274-2287, 2021, doi: 10.3934/mbe.2021115

[51] G Ferri, A Munafo, and K D LePage, “An Autonomous Underwater Vehicle Data-Driven Control Strategy for Target Tracking,” in IEEE Journal of Oceanic Engineering, vol 43, no 2, pp 323–343, Apr 2018, doi: 10.1109/JOE.2018.2797558

[52] G Salavasidis, A Munafò, C.A Harris, T Prampart, R Templeton, M.H Smart, D T Roper, M Pebody, S.D McPhail, E Rogers and A.B Phillips, “Terrain-aided navigation for long-endurance and deep-rated autonomous underwater vehicles,” in Journal of Field Robotics, vol 36, no 2, pp 447–474, Mar 2019, doi:

[53] C I Sprague, Ö Özkahraman, A Munafo, R Marlow, A Phillips and P Ögren, “Improving the Modularity of AUV Control Systems using Behaviour Trees,” in IEEE/OES Autonomous Underwater Vehicle Workshop (AUV), Porto, Portugal, 2018, pp 1-6, doi: 10.1109/AUV.2018.8729810

[54] W Zhao, Y Hu, and L Wang, “Construction and Central Pattern Generator-Based Control of a Flipper-Actuated Turtle-Like Underwater Robot,” in Advanced Robotics, vol 23, no 1–2, pp 19–43, 2009, doi: 10.1163/156855308X392663

[55] C Zhou and K H Low, “Kinematic modeling framework for biomimetic undulatory fin motion based on coupled nonlinear oscillators,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 2010, pp 934–939 doi:

[56] J Yu, K Wang, M Tan, and J Zhang, “Design and control of an embedded vision guided robotic fish with multiple control surfaces,” in The Scientific World Journal, vol

[57] A J Ijspeert and A Crespi, “Online trajectory generation in an amphibious snake robot using a lamprey-like central pattern generator model,” in Proceedings - IEEE International Conference on Robotics and Automation, Rome, Italy, pp 262–268, 2007, doi: 10.1109/ROBOT.2007.363797

[58] D Korkmaz, G Ozmen Koca, G Li, C Bal, M Ay, and Z H Akpolat, “Locomotion control of a biomimetic robotic fish based on closed loop sensory feedback CPG model,” in Journal of Marine Engineering and Technology, vol 20, no 2, pp 125–137,

[59] J.-K Ryu, N Chong, B.-J You, and H Christensen, “Locomotion of snake-like robots using adaptive neural oscillators,” Intelligent Service Robotics, vol 3, pp 1–10, 2009, doi: 10.1007/s11370-009-0049-4

[60] M Ikeda, K Watanabe, and I Nagai, “Propulsion movement control using CPG for a Manta robot,” in the 6th International Conference on Soft Computing and Intelligent Systems, and the 13th International Symposium on Advanced Intelligence Systems, Kobe, Japan, 2012, pp 755–758 doi: 10.1109/SCIS-ISIS.2012.6505174

[61] L Shang, S Wang and M Tan, “Fuzzy logic PID based control design for a biomimetic underwater vehicle with two undulating long-fins,” in IEEE/RSJ International

Conference on Intelligent Robots and Systems, Taipei, Taiwan, 2010, pp 922-927, doi:

[62] J Zhang, “Multimodal swimming control of a robotic fish with pectoral fins using a CPG network,” in Chinese Science Bulletin, vol 57, pp 1209–1216, 2012

[63] K Inoue, S Ma, and C Jin, “Neural oscillator network-based controller for meandering locomotion of snake-like robots,” in IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, 2004, vol 5, pp 5064–5069 doi:

[64] C.L Zhou, “Modeling and Control of Swimming Gaits for Fish-like Robots Using Coupled Nonlinear Oscillators,” Ph.D dissertation, School of Mechanical & Aerospace Engineering, Nanyang Technological University, Singapore, 2012

[65] V D Nguyen, D K Phan, C A T Pham, D H Kim, V T Dinh, and T T Nguyen, “Study on Determining the Number of Fin-Rays of a Gymnotiform Undulating Fin Robot,” in Lecture Notes in Electrical Engineering, vol 465, pp 745–752, 2018, doi: 10.1007/978-3-319-69814-4_72

[66] X Dong, S Wang, Z Cao, and M Tan, “CPG Based Motion Control for an Underwater Thruster with Undulating Long-Fin,” in IFAC Proceedings Volumes, vol 41, no 2, pp 5433–5438, 2008, doi: 10.3182/20080706-5-KR-1001.00916

[67] A Crespi, D Lachat, A Pasquier, and A J Ijspeert, “Controlling swimming and crawling in a fish robot using a central pattern generator,” in Autonomous Robots, vol

[68] M Sfakiotakis, A Manolis, N Spyridakis, J Fasoulas, and M Arapis, “Development and Experimental Evaluation of an Undulatory Fin Prototype,” in Proceedings of the

22nd International Workshop on Robotics in Alpe-Adria-Danube Region, Smolenice,

[69] M Sfakiotakis, R Gliva, and M Mountoufaris, “Steering-plane motion control for an underwater robot with a pair of undulatory fin propulsors,” in 24th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 2016, pp 496–503 doi: 10.1109/MED.2016.7535989

[70] V H Nguyen, V D Nguyen, V T Duong, H H Nguyen, and T T Nguyen,

“Experimental Study on Kinematic Parameter and Undulating Pattern Influencing Thrust Performance of Biomimetic Underwater Undulating Driven Propulsor,” in

International Journal of Mechanical and Mechatronics Engineering, vol 20, no 5, pp

[71] W Zhao, J Yu, Y Fang, and L Wang, “Development of Multi-mode Biomimetic

Robotic Fish Based on Central Pattern Generator,” in IEEE/RSJ International

Conference on Intelligent Robots and Systems, Beijing, China, 2006, doi:

[72] X Wu and S Ma, “CPG-based control of serpentine locomotion of a snake-like robot,” in Mechatronics, vol 20, no 2, pp 326–334, 2010, doi: 10.1016/j.mechatronics.2010.01.006

[73] G Roza, M Minas, N Spyridakis, and M Sfakiotakis, “Development of a bio-inspired underwater robot prototype with undulatory fin propulsion,” in Proceedings of the 9th

International Conference on New Horizons in Industry, Business and Education,

[74] Z Lu, S Ma, B Li, and Y Wang, “3D Locomotion of a Snake-like Robot Controlled by Cyclic Inhibitory CPG Model,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, 2006, doi: 10.1109/IROS.2006.281801

[75] M Wang, J Yu, M Tan and G Zhang, "A CPG-based sensory feedback control method for robotic fish locomotion," in Proceedings of the 30th Chinese Control Conference,

[76] C Zhou and K H Low, “On-line Optimization of Biomimetic Undulatory Swimming by an Experiment-based Approach,” in Journal of Bionic Engineering, vol 11, no 2, pp 213–225, 2014, doi: 10.1016/S1672-6529(14)60042-1

[77] M Sfakiotakis, J Fasoulas, R Gliva, and A Yannakoudakis, “Model-based fin ray joint tracking control for undulatory fin mechanisms,” in International Congress on Ultra

Modern Telecommunications and Control Systems and Workshops, vol 2016-Janua, pp

[78] C Zhou and K H Low, “Design and locomotion control of a biomimetic underwater vehicle with fin propulsion,” in IEEE/ASME Transactions on Mechatronics, vol 17, no

[79] M Sfakiotakis, J Fasoulas, M M Kavoussanos, and M Arapis, “Experimental investigation and propulsion control for a bio-inspired robotic undulatory fin,” in Robotica, Jun 2015, vol 33, no 5, pp 1062–1084 doi: 10.1017/S0263574714002926

[80] M Özturan, A Bozanta, B Basarir-Ozel, E Akar, and M Coşkun, “A roadmap for an integrated university information system based on connectivity issues: Case of Turkey,” in The International Journal of Management Science and Information Technology (IJMSIT), vol 17, no 17, pp 1–23, 2015, doi: 10.14313/JAMRIS

[81] K H Low and A Willy, “Biomimetic motion planning of an undulating robotic fish fin,” in JVC/Journal of Vibration and Control, vol 12, no 12, pp 1337–1359, 2006, doi: 10.1177/1077546306070597

[82] R Ruiz-Torres, O M Curet, G V Lauder, and M A Maciver, “Erratum: Kinematics of the ribbon fin in hovering and swimming of the electric ghost knifefish,” in Journal of Experimental Biology, vol 217, no 20, pp 3765–3766, 2014, doi:

[83] K H Low, “Modelling and parametric study of modular undulating fin rays for fish robots,” in Mechanism and Machine Theory, vol 44, no 3, pp 615–632, 2009, doi:

[84] I English, H Liu, and O M Curet, “Robotic device shows lack of momentum enhancement for gymnotiform swimmers,” in Bioinspiration and Biomimetics, vol 14, no 2, 2019, doi: 10.1088/1748-3190/aaf983

[85] I D Neveln, R Bale, A P S Bhalla, O M Curet, N A Patankar, and M A MacIver, “Undulating fins produce off-axis thrust and flow structures,” in Journal of Experimental Biology, vol 217, no 2, pp 201–213, 2014, doi: 10.1242/jeb.091520

[86] M Ikeda, S Hikasa, K Watanabe, and I Nagai, “A CPG design of considering the attitude for the propulsion control of a Manta robot,” in 39th Annual Conference of the

IEEE Industrial Electronics Society, Vienna, Austria, 2013, pp 6354–6358 doi:

[87] C Liu, Q Chen, and D Wang, “CPG-inspired workspace trajectory generation and adaptive locomotion control for quadruped robots,” in IEEE transactions on systems, man, and cybernetics Part B, Cybernetics: a publication of the IEEE Systems, Man, and Cybernetics Society, vol 41, no 3, pp 867–880, 2011, doi: 10.1109/TSMCB.2010.2097589

[88] C M A Pinto, D Rocha, and C P Santos, “Hexapod robots: New CPG model for generation of trajectories,” in Journal of Numerical Analysis, Industrial and Applied

[89] T Wang, W Guo, M Li, F Zha, and L Sun, “CPG Control for Biped Hopping Robot in Unpredictable Environment,” in Journal of Bionic Engineering, vol 9, no 1, pp 29–

[90] S Inagaki, H Yuasa, and T Arai, “CPG model for autonomous decentralized multi- legged robot system—generation and transition of oscillation patterns and dynamics of oscillators,” in Robotics and Autonomous Systems, vol 44, no 3–4, pp 171–179, 2003, doi: 10.1016/S0921-8890(03)00067-8

[91] M Mokhtari, M Taghizadeh, and M Mazare, “Hybrid Adaptive Robust Control Based on CPG and ZMP for a Lower Limb Exoskeleton,” in Robotica, vol 39, no 2, pp 181–

[92] X Wu, L Teng, W Chen, G Ren, Y Jin, and H Li, “CPGs with continuous adjustment of phase difference for locomotion control,” in International Journal of Advanced Robotic Systems, vol 10, pp 1–13, 2013, doi: 10.5772/56490

[93] Y Cao, Y Lu, Y Cai, S Bi, and G Pan, “CPG-fuzzy-based control of a cownose-ray-like fish robot,” in Industrial Robot: the international journal of robotics research and application, vol 46, no 6, pp 779–791, 2019, doi: 10.1108/IR-02-2019-0029

[94] I B Jeong, C S Park, K I Na, S Han, and J H Kim, “Particle swarm optimization-based central patter generator for robotic fish locomotion,” in IEEE Congress of Evolutionary Computation, pp 152–157, New Orleans, LA, USA, 2011, doi:

[95] M C C Wang, G Xie, and L Wang, “CPG-based locomotion control of a robotic fish: Using linear oscillators and reducing control parameters via PSO,” in

International journal of innovative computing, information & control, vol 7, no 7, pp

[96] J Yu, Z Wu, M Wang, and M Tan, “CPG Network Optimization for a Biomimetic Robotic Fish via PSO,” in IEEE Transactions on Neural Networks and Learning Systems, vol 27, no 9, pp 1962–1968, 2016, doi: 10.1109/TNNLS.2015.2459913

[97] J Lee, S Lee, S Chang, and B.-H Ahn, “A Comparison of GA and PSO for Excess Return Evaluation in Stock Markets,” in Lecture Notes in Computer Science, vol 3562, no PART II, pp 221–230, 2005, doi: 10.1007/11499305_23

[98] C Niehaus, T Rofer, and T Laue, “Gait optimization on a humanoid robot using particle swarm optimization,” in Proceedings of the Second Workshop on Humanoid

Soccer Robots in conjunction with the sn, Pittsburgh, USA, pp 1-7, Nov.2007

[99] Y Zou, T Liu, D Liu, and F Sun, “Reinforcement learning-based real-time energy management for a hybrid tracked vehicle,” in Applied Energy, vol 171, pp 372–382,

[100] T Liu, Y Zou, D Liu, and F Sun, “Reinforcement learning-based energy management strategy for a hybrid electric tracked vehicle,” in Energies, vol 8, no 7, pp 7243–7260,

[101] R C Hsu, C T Liu, and D Y Chan, “A reinforcement-learning-based assisted power management with QoR provisioning for human-electric hybrid bicycle,” in IEEE Transactions on Industrial Electronics, vol 59, no 8, pp 3350–3359, 2012, doi:

[102] H Lee, C Kang, Y Il Park, N Kim, and S W Cha, “Online data-driven energy management of a hybrid electric vehicle using model-based Q-learning,” in IEEE Access, vol 8, pp 84444–84454, 2020, doi: 10.1109/ACCESS.2020.2992062

[103] T Liu, X Hu, S E Li, and D Cao, “Reinforcement Learning Optimized Look-Ahead

Energy Management of a Parallel Hybrid Electric Vehicle,” in IEEE/ASME Transactions on Mechatronics, vol 22, no 4, pp 1497–1507, 2017, doi:

[104] Y Lu, R He, X Chen, B Lin, and C Yu, “Energy-efficient depth-based opportunistic routing with q-learning for underwater wireless sensor networks,” in Sensors (Switzerland), vol 20, no 4, pp 1–25, 2020, doi: 10.3390/s20041025

[105] R Plate and C Wakayama, “Utilizing kinematics and selective sweeping in reinforcement learning-based routing algorithms for underwater networks,” in Ad Hoc Networks, vol 34, pp 105–120, Oct 2015, doi: 10.1016/j.adhoc.2014.09.012

[106] Y He, L Xing, Y Chen, W Pedrycz, L Wang and G Wu, “A Generic Markov Decision Process Model and Reinforcement Learning Method for Scheduling Agile Earth Observation Satellites,” in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol 52, no 3, pp 1463-1474, March 2022, doi: 10.1109/TSMC.2020.3020732

[107] Z Jin, Y Ma, Y Su, S Li, and X Fu, “A Q-learning-based delay-aware routing algorithm to extend the lifetime of underwater sensor networks,” in Sensors (Switzerland), vol 17, no 7, pp 1–15, 2017, doi: 10.3390/s17071660

[108] D Zhang, Z H Ye, P C Chen, and Q G Wang, “Intelligent event-based output feedback control with Q-learning for unmanned marine vehicle systems,” in Control Engineering Practice, vol 105, Aug 2020, doi: 10.1016/j.conengprac.2020.104616

[109] Z Chen, B Qin, M Sun, and Q Sun, “Q-Learning-based parameters adaptive algorithm for active disturbance rejection control and its application to ship course control,” in Neurocomputing, vol 408, pp 51–63, 2020, doi: 10.1016/j.neucom.2019.10.060

[110] Y Nakamura, T Mori, and S Ishii, “Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 3242, pp 972–981, 2004, doi: 10.1007/978-3-540-30217-9_98

[111] A Drago, G Carryon and J Tangorra, "Reinforcement Learning as a Method for

Tuning CPG Controllers for Underwater Multi-Fin Propulsion," in 2022 International

Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 2022, pp

[112] S G Khan, G Herrmann, F L Lewis, T Pipe, and C Melhuish, “Reinforcement learning and optimal adaptive control: An overview and implementation examples,” in

Annual Reviews in Control, vol 36, no 1, pp 42–59, Apr 2012, doi:

[113] A M Andrew, “Reinforcement Learning: An Introduction,” in Kybernetes, MIT Press, vol 27, no 9, pp 1093–1096, Jan.1998, doi: 10.1108/k.1998.27.9.1093.3

[114] S L Tanimoto, The elements of artificial intelligence: an introduction using LISP USA: Computer Science Press, Jan 1987, ISBN:978-0-88175-113-0, doi: abs/10.5555/30422

[115] R S Sutton, “Learning to predict by the methods of temporal differences,” in Mach Learn, vol 3, no 1, pp 9–44, Aug 1988, doi: 10.1007/BF00115009

[116] A Crespi, D Lachat, A Pasquier, and A J Ijspeert, “Controlling swimming and crawling in a fish robot using a central pattern generator,” in Autonomous Robots, vol

[117] A J Ijspeert, “Central pattern generators for locomotion control in animals and robots: A review,” in Neural Networks, vol 21, no 4, pp 642–653, May 2008, doi:

[118] W Zhao, J Yu, Y Fang, and L Wang, “Development of Multi-mode Biomimetic Robotic Fish Based on Central Pattern Generator,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2006, pp 3891–3896 doi:

[119] M Ikeda, S Hikasa, K Watanabe, and I Nagai, “A CPG design of considering the attitude for the propulsion control of a Manta robot,” in 39th Annual Conference of the IEEE Industrial Electronics Society, Oct 2013, pp 6354–6358 doi: 10.1109/IECON.2013.6700181

[120] C Zhou and K H Low, “On-line Optimization of Biomimetic Undulatory Swimming by an Experiment-based Approach,” in J Bionic Eng, vol 11, no 2, pp 213–225, Jun

[121] M Sfakiotakis, R Gliva, and M Mountoufaris, “Steering-plane motion control for an underwater robot with a pair of undulatory fin propulsors,” in 24th Mediterranean Conference on Control and Automation (MED), Athens, Greece, 2016, pp 496–503 doi: 10.1109/MED.2016.7535989

[122] Y Hu, J Liang, and T Wang, “Parameter Synthesis of Coupled Nonlinear Oscillators for CPG-Based Robotic Locomotion,” IEEE Transactions on Industrial Electronics, vol 61, no 11, pp 6183–6191, Oct 2014, doi: 10.1109/TIE.2014.2308150

[123] X Niu, J Xu, Q Ren, and Q Wang, “Locomotion Learning for an Anguilliform

Robotic Fish Using Central Pattern Generator Approach,” IEEE Transactions on Industrial Electronics, vol 61, no 9, pp 4780–4787, Sep 2014, doi:

[124] J H Barron-Zambrano and C Torres-Huitzil, “Two-phase GA parameter tunning method of CPGs for quadruped gaits,” in The 2011 International Joint Conference on Neural Networks, San Jose, CA, USA, 2011, pp 1767–1774 doi: 10.1109/IJCNN.2011.6033438

[125] W Andre, “Design and development of undulating fin,” in M.S thesis, Nanyang

[126] S D Kelly, R J Mason, C T Anhalt, R M Murray and J W Burdick, “Modelling and experimental investigation of carangiform locomotion for control,” in Proceedings of the 1998 American Control Conference, ACC (IEEE Cat No.98CH36207), Philadelphia, PA, USA, 1998, pp 1271-1276 vol.2, doi: 10.1109/ACC.1998.703619

[127] N E Leonard, "Control synthesis and adaptation for an underactuated autonomous underwater vehicle," in IEEE Journal of Oceanic Engineering, vol 20, no 3, pp 211-

[128] P Y Li and S Saimek, “Modeling and estimation of hydrodynamic potentials,” Proceedings of the 38th IEEE Conference on Decision and Control (Cat No.99CH36304), Phoenix, AZ, USA, 1999, pp 3253-3258 vol 4, doi: 10.1109/CDC.1999.827771

1 Q-learning simulation code for fin rays (Python)

import numpy as np
import random
import matplotlib.pyplot as plt


def cal_F(u, v, k, f, Am):
    """
    Calculate base element of hopf oscillator
    """
    return k*(Am*Am-u**2-v**2)*u - 2*np.pi*f*v, k*(Am*Am-u**2-v**2)*v + 2*np.pi*f*u


def cal_P_head(post_u, post_v, epsilon, psi):
    # Coupling term for the head oscillator, computed from the state of the oscillator after it
    return epsilon * (post_v*np.cos(psi) - post_u*np.sin(psi))


def cal_P_tail(pre_u, pre_v, epsilon, psi):
    # Coupling term for the tail oscillator, computed from the state of the oscillator before it
    return epsilon * (pre_u*np.sin(psi) + pre_v*np.cos(psi))


def cal_P_body(pre_u, pre_v, post_u, post_v, epsilon, psi):