
Research and development of an intelligent visual localization algorithm for unmanned aerial vehicles


DOCUMENT INFORMATION

Basic information

Title: Research and development of an intelligent visual localization algorithm for unmanned aerial vehicles
Author: Huỳnh Đức Minh
Supervisor: Dr. Nguyễn Anh Quang
University: Hanoi University of Science and Technology
Major: Electronic Engineering
Document type: Thesis
Year of publication: 2023
City: Hà Nội
Format
Number of pages: 67
File size: 3 MB

Structure

  • CHAPTER 1. BACKGROUND
    • 1.1 Introduction about UAV
    • 1.2 UAV monitor and interception
    • 1.3 Visual sensors
      • 1.3.1 RGB camera
      • 1.3.2 Depth camera
    • 1.4 UAV detection
      • 1.4.1 Instance segmentation
      • 1.4.2 Evaluation metrics
      • 1.4.3 YOLO algorithm
      • 1.4.4 Speed on UAV hardware
    • 1.5 Vision-based object localization in 3D space
      • 1.5.1 Coordinate systems and coordinate transformation
      • 1.5.2 UAV visual sensors system
      • 1.5.3 Stereo matching algorithm
      • 1.5.4 Depth calculation
    • 1.6 UAV controller
      • 1.6.1 PX4 Autopilot
      • 1.6.2 PID velocity controller
    • 1.7 Robot Operating System
    • 1.8 UAV simulator
  • CHAPTER 2. DESIGN OF THE UAV LOCALIZATION AND FOLLOWING SYSTEM
    • 2.1 Overview
    • 2.2 Stereo camera system setup
      • 2.2.1 Simulation
      • 2.2.2 Real-life setup
    • 2.3 UAV detection and segmentation
      • 2.3.1 Instance segmentation model
      • 2.3.2 Dataset construction
      • 2.3.3 Training process
      • 2.3.4 Inference process
      • 2.4.1 Disparity calculation
      • 2.4.3 Working logic of 3D location estimation block
    • 2.5 UAV controller algorithm
  • CHAPTER 3. EXPERIMENTS AND RESULTS
    • 3.1 UAV detection experiments
      • 3.1.1 Testing the UAV detection model in simulation
      • 3.1.2 Real-world dataset
      • 3.1.3 Summary of the UAV detection test
    • 3.2 UAV follow experiment
      • 3.2.1 Testing scenarios
      • 3.2.2 Summary of the UAV localization and follow test
  • CHAPTER 4. CONCLUSION
    • 4.1 Conclusion
    • 4.2 Future research directions

Content

The method utilizes the state-of-the-art object detection and instance segmentation algorithm YOLOv8 to detect the target UAV and produce its masks from the dual RGB images.

BACKGROUND

Introduction about UAV

Unmanned aerial vehicles (UAVs), or drones, are remote-controlled flying devices that operate without a human pilot onboard. Initially developed for military purposes, their use has since broadened to include commercial, scientific, industrial, civil, and entertainment applications, leading to their increased popularity and accessibility.

UAVs are increasingly being equipped with a variety of smart sensors and powerful companion computers, enabling many applications in different fields such as:

Figure 1.1 A delivery UAV from Amazon

- Military: Unmanned Aerial Vehicles (UAVs) have transformed military operations by enabling the delivery of explosives without the need for a pilot to enter hostile territory. These versatile drones are essential for surveillance, reconnaissance, aerial photography, and precise target tracking, while also playing a crucial role in countering aerial threats. Their use significantly diminishes the risks associated with combat missions by minimizing human involvement, enhancing the safety of military personnel.

- Logistics: UAVs are applied to automated freight tasks within the logistics service chain. A few countries have already started using this technology.

- Scientific research: UAVs play a crucial role in scientific research by collecting data in the air and over complex terrains that are often inaccessible to humans. Equipped with various sensors, these unmanned aerial vehicles gather vital environmental information. In the realm of space research, the United States and the United Kingdom lead the way in UAV utilization, with NASA's X-37B conducting numerous classified missions in space. Following NASA's example, other space agencies around the world are increasingly adopting UAVs for their research endeavors.

- Search and rescue: UAVs play a crucial role in Search and Rescue (SAR) operations by accessing hard-to-reach areas and covering larger territories than a single individual can. Their ability to patrol expansive regions enhances the efficiency and effectiveness of rescue missions. With the addition of thermal imaging devices, the identification of people in distress becomes much faster and more reliable.

- Civil and entertainment: UAVs are used in production activities such as agriculture, and in entertainment activities such as carrying cameras to take pictures or record videos.

The four most popular UAV structures nowadays are multi-rotor, fixed-wing, single-rotor, and vertical take-off and landing (VTOL) (Figure 1.2) [7].

Figure 1.2 The most popular modern UAV structures

Fixed-wing UAVs are aircraft-like drones equipped with fixed wings that utilize aerodynamic lift for gliding through the air. Their primary advantage is the ability to operate over long distances, making them ideal for military applications, research, surveillance, and large-area mapping. However, these UAVs face limitations, including an inability to hover, navigate complex terrains with obstacles, or carry heavy payloads.

Multi-rotor UAVs, including quadcopters, hexacopters, and octocopters, utilize multiple motors for vertical lift and maneuverability. Their capability for precise takeoff and landing, as well as navigation through complex terrains, makes them the preferred choice for civilian applications. These versatile drones are widely used in urban settings for tasks such as aerial photography, 3D scanning, visual inspections, and entertainment, particularly video and image recording.

Single-rotor UAVs feature a helicopter-like design, utilizing a primary rotor for thrust and a smaller tail rotor for directional control and stability. Known for their energy efficiency and ability to carry substantial payloads, these UAVs also pose safety risks due to their large, heavy rotors, necessitating meticulous maintenance and handling.

VTOL, or Vertical Take-Off and Landing, represents a UAV category that merges the benefits of rotor-based and fixed-wing designs. This type of UAV features propeller rotors on the fuselage for vertical lift, akin to multi-rotor drones, while rear propellers generate horizontal thrust, enabling efficient forward movement.

The emergence of VTOL UAVs has opened new possibilities for delivery services, combining the hovering capability of multi-rotors with the long-distance, high-speed travel of traditional fixed-wing aircraft. Despite their innovative design, this UAV technology is still in its infancy, with many applications and technologies currently undergoing research and testing.

The multi-rotor UAV family is a primary focus of this project due to its prevalence in urban settings and its high accessibility for civilians.

UAV monitor and interception

The increasing adoption of UAVs for civilian use in urban areas presents significant technical, social, and security challenges. Malicious drones pose threats by potentially carrying weapons or explosives for physical attacks, as well as devices for cyberattacks that target critical infrastructure. Privacy concerns also arise, as UAVs equipped with cameras can infringe upon personal and governmental privacy. To safeguard sensitive areas, it is essential to continuously monitor the aerial space for unauthorized UAVs using various detection methods, including RF signal monitoring, radar, acoustic signals, and computer vision technologies.

After detecting a malicious UAV, the next step is to intercept it using various techniques tailored to the situation. One of the most effective methods is employing a UAV jammer gun to disrupt the UAV's GPS or radio signals, prompting it to return to its original location or to land. However, this approach can inadvertently affect other nearby communications and poses challenges when dealing with autopilot UAVs. Additionally, the operator must be positioned close to the UAV to improve the chance of a successful interception. Alternative methods, such as net bazookas or trained animals, are also viable but come with their own limitations.

Figure 1.3 Some UAVs interception methods

Recent research has explored the use of one UAV to intercept intruder UAVs, utilizing either manual control or automatic detection. Although this approach remains limited in research and application, it offers significant advantages, including the ability to operate over long distances and at high altitudes, minimal human oversight, and easy scalability to cover extensive airspace by increasing the number of devices. This method represents a promising direction for future research and development.

Visual sensors

Modern automatic UAVs are equipped with a variety of sensors, including essential flight control components like Inertial Measurement Units (IMU), magnetometers, and GPS. In addition to these, UAVs commonly carry visual sensors, particularly RGB and depth cameras, to gather environmental information effectively.

An RGB camera utilizes a standard CMOS sensor to capture colored images of objects and is available in various shapes, sizes, prices, and image qualities that must be balanced for a specific application. When selecting a camera for a UAV, key priorities include a lightweight design, a wide field of view, depth of field, and vibration resistance, often resulting in a higher price point.

A few important parameters that need to be considered when choosing the camera for this application are:

Image resolution refers to the number of pixels in an image, indicating its clarity, and is typically measured by width and height dimensions (e.g., 1920x1080). Higher image resolution captures more environmental detail, facilitating more accurate results in image processing and AI applications. However, increased resolution also leads to longer processing times and higher memory usage, posing challenges given the limited hardware capabilities of UAVs.

The field of view (FOV) refers to the camera's viewing angle, typically measured in degrees, and includes horizontal (HFOV), vertical (VFOV), and diagonal (DFOV) components. A camera with a larger FOV covers a broader area, increasing the likelihood of detecting objects. However, at the same image resolution, a higher FOV results in lower image quality, particularly for distant objects, complicating the task for object detection algorithms.

Figure 1.4 Illustration of camera lens's field of view (FOV) [10]

To adjust the field of view (FOV) of a camera, selecting the appropriate focal length of the lens is essential. The focal length, defined as the distance from the lens's center to the point where distant objects come into focus, plays a critical role in determining the camera's FOV, alongside the size of the camera's image sensor.

We can calculate the FOV of the camera from the camera sensor size and the lens focal length using the following equation:
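A standard form of this relation, for a sensor of width $d_w$ (or height $d_h$) and a lens of focal length $f$, is:

$$\mathrm{HFOV} = 2\arctan\!\left(\frac{d_w}{2f}\right), \qquad \mathrm{VFOV} = 2\arctan\!\left(\frac{d_h}{2f}\right)$$

As a rough check against the real camera used later in Chapter 2: assuming the DF6HA-1 is the 6 mm variant and the FFY-U3-16S2C-C sensor is about 5.0 mm x 3.7 mm, this gives HFOV ≈ 2·arctan(5.0/12) ≈ 45° and VFOV ≈ 34.5°, consistent with the values quoted there.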

For cameras mounted on UAVs, a lens with a short focal length is usually preferred to achieve a high FOV, covering a larger area.

Depth cameras utilize advanced sensing technology to determine the distance of various points within a scene relative to the camera. They generate a series of images, with each frame representing a depth image, where the pixel values indicate the distance from the camera.

Some of the most advanced depth cameras currently available for commercial use are the Intel RealSense D455 [11] from Intel and the Zed2i [12] from Stereolab.

Figure 1.5 Intel RealSense Depth Camera D455

Depth cameras can experience significant noise issues, influenced by the method of depth acquisition and environmental conditions. Stereo and structured light systems calculate depth by identifying point correspondences across multiple views, but interpolation between these points can lead to depth inaccuracies. The layout of the scene and the objects within it further contribute to noise in depth images. Additionally, scene illumination plays a critical role, as depth cameras struggle under strong lighting conditions, particularly in natural outdoor settings. This challenge arises from the infrared (IR) components in natural light, which can interfere with the IR sensors of the depth camera, while high-intensity lighting may overshadow the lower-intensity IR light emitted by the camera, resulting in further depth errors.

UAV detection

The initial phase of any algorithm designed for 3D object localization involves identifying and localizing objects within a 2D image. This task encompasses detecting and enclosing objects in an image with bounding boxes while also classifying them. It is a well-established problem in artificial intelligence and computer vision, with renowned object detection algorithms such as YOLO and SSD.

The project employs an instance segmentation algorithm to improve the identification of UAVs in images, prioritizing accuracy for 3D position estimation. Unlike traditional methods like YOLO and RCNN that output only bounding boxes and classification confidences, the segmentation approach provides detailed information about the shape and size of the UAV. This improved precision is crucial for accurately determining the UAV's location in three-dimensional space.

Instance segmentation is a computer vision task focused on recognizing and isolating individual objects within an image. This process includes detecting the boundaries of each object and assigning a unique label to them. The primary objective is to create a pixel-wise segmentation map, ensuring that each pixel corresponds to a specific object instance.

Figure 1.7 illustrates the difference between the outputs of an object detection model and an instance segmentation model.

Figure 1.7 Difference in outputs of object detection and instance segmentation models

Figure 1.8 A sample instance segmentation output with UAV objects

The UAV's shape is unevenly distributed within the bounding box, with its body skewed towards the top and its thin legs often unrecognizable. This obscures the UAV's center of gravity in the image, rendering the bounding box an inadequate representation of its state. In contrast, the segmentation mask provides a more accurate depiction of the UAV's position and structure.

To evaluate the performance of the detection and segmentation model during training and testing, some of the following evaluation metrics are used:

Intersection over Union (IOU) is a crucial metric used to measure the overlap between two regions, primarily for evaluating object detection performance. It compares the predicted bounding box against the ground truth bounding box, and similarly applies to segmentation tasks by assessing the overlap between the predicted segmentation mask and the ground truth mask. The IOU value ranges from 0 to 1, where a higher value indicates a more accurate prediction.

Figure 1.9 IOU calculation on object detection and instance segmentation images

In object detection tasks, the Intersection over Union (IOU) threshold value is crucial for classifying predictions. A prediction is labeled as True Positive (TP) if it overlaps with a ground truth box with an IOU exceeding the threshold. Conversely, a prediction is classified as False Positive (FP) when it either lacks a corresponding ground truth or has an IOU below the threshold. Additionally, a False Negative (FN) occurs when there is no prediction for a correctly identified ground truth object bounding box.

A sample of these predictions can be seen in Figure 1.10

Figure 1.10 Visualization of TP, FP and FN

The Intersection over Union (IoU) can be computed for segmentation masks by considering the intersection as the pixels present in both the prediction and ground truth masks. Meanwhile, the union is defined as all pixels that appear in either the prediction or ground truth masks.
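A minimal sketch of this mask IoU in Python, assuming the prediction and ground truth are boolean NumPy arrays of the same shape:

```python
import numpy as np

def mask_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU between a predicted and a ground-truth segmentation mask."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()   # pixels present in both masks
    union = np.logical_or(pred, gt).sum()           # pixels present in either mask
    return float(intersection) / float(union) if union > 0 else 0.0
```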

Precision (P) quantifies the accuracy of predicted positive outcomes in relation to actual correct results. This metric is applicable to both object detection and segmentation tasks, and its value ranges from 0 to 1.

- Recall (R): this parameter measures the proportion of actual positives that were predicted correctly. As with precision, the value of recall ranges from 0 to 1.
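In terms of the TP, FP and FN counts defined above, the two metrics take their usual form:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$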

Confidence plays a crucial role in object detection models, which not only identify the bounding box or segmentation mask of an object but also predict the object's class with an associated confidence score. As illustrated in Figure 1.11, a higher confidence score indicates a greater certainty that the identified object is accurate. During both training and inference, a confidence threshold is established; predictions that fall below this threshold are discarded, ensuring only the most reliable detections are retained.

Figure 1.11 Object detection results with confidence scores

The Precision-Recall curve is a crucial tool for evaluating model performance: a static IOU threshold (e.g., 0.5) is set and the model's confidence threshold is varied from 0 to 1, yielding multiple precision and recall values. As confidence increases, the model's predictions become more accurate, enhancing precision by minimizing false positives; however, this often results in lower recall due to the exclusion of predictions made with lower confidence. By plotting all precision and recall values across the confidence spectrum, we can visualize the trade-off between these metrics, with a high area under the curve indicating both high precision (low false positive rate) and high recall (low false negative rate). Achieving high scores in both areas signifies that the classifier not only provides accurate results but also captures the majority of positive instances effectively.

Figure 1.12 Sample of a Precision-Recall curve [17]

Mean Average Precision (mAP) is computed using Equation 1-5, $\mathrm{mAP} = \frac{1}{n}\sum_{i=1}^{n} AP_i$, where $AP_i$ represents the average precision for class i and n denotes the total number of classes. The average precision for each class is defined based on specific metrics.

The average precision (AP) for a single object class is the area under its precision-recall curve; when only one class is detected, the AP and mean average precision (mAP) values are identical. mAP is typically calculated using two different Intersection over Union (IoU) settings: mAP50, which uses an IoU threshold of 0.5, and mAP50-95, a more stringent metric that averages scores across multiple IoU thresholds from 0.5 to 0.95 in 0.05 increments. Achieving a higher mAP50-95 score necessitates that the model's predictions align closely with the ground truths.
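A small worked example of the two settings, using illustrative (not measured) AP values for a single class:

```python
import numpy as np

# AP of one class evaluated at IoU thresholds 0.50, 0.55, ..., 0.95 (illustrative values).
ap_per_threshold = np.array([0.94, 0.93, 0.91, 0.88, 0.84, 0.78, 0.70, 0.58, 0.42, 0.20])

map50 = ap_per_threshold[0]           # AP at the single IoU threshold of 0.5
map50_95 = ap_per_threshold.mean()    # average over the ten thresholds 0.5:0.05:0.95

print(f"mAP50 = {map50:.3f}, mAP50-95 = {map50_95:.3f}")  # mAP50 = 0.940, mAP50-95 = 0.718
```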

The mean Average Precision (mAP) value serves as a comprehensive metric for evaluating the performance of detection models, making it a standard benchmark for comparing various models against each other.

The YOLOv8 algorithm was selected for object recognition and instance segmentation in this project due to its rapid inference speed and adequate accuracy, making it well suited to the hardware limitations of UAVs. Released in January 2023 by Ultralytics, the creators of YOLOv5, YOLOv8 supports various vision tasks, including object detection, segmentation, pose estimation, tracking, and classification. The architecture of the YOLOv8 object detection model is illustrated in Figure 1.13.

Figure 1.13 Structure of the YOLOv8 object detection model [20]

YOLOv8 builds on the foundational architecture established by earlier YOLO (You Only Look Once) algorithms. This series of models is designed to predict all objects within an image in a single forward pass, which accounts for their efficiency and speed in object detection.

Vision-based object localization in 3D space

Research on algorithms for determining object coordinates in a 3D environment using image sensors is gaining traction, particularly in self-driving vehicles and autonomous UAVs. These algorithms analyze data from color or depth cameras to identify objects and their positions in three-dimensional space, facilitating crucial tasks like obstacle avoidance, object tracking, and following.

1.5.1 Coordinate systems and coordinate transformation

To facilitate effective monitoring and control of the target UAV, it is essential to convert the camera-based coordinate values into the world coordinate system. This conversion enables accurate location calculation and enhances overall UAV management.

A coordinate frame consists of orthogonal axes fixed to a body, enabling the description of point positions relative to that body. These axes converge at a single point known as the origin. In the field of robotics, several key coordinate frames are utilized.

Figure 1.20 Body and camera coordinate system

The world frame is defined as the Oxyz coordinate system with its origin at the UAV's starting point (0,0,0). In this system, the Ox axis extends forward from the UAV upon powering up, the Oy axis extends to the left, and the Oz axis points upward. This coordinate system remains static throughout the UAV's operation.

The body frame, or robot frame, is defined by the Oxyz coordinate system with its origin at the UAV controller's position (0,0,0). In this system, the Ox axis consistently points forward, the Oy axis points to the left, and the Oz axis extends upward, perpendicular to the UAV's frame. This coordinate system is dynamic, moving in conjunction with the UAV. During flight, the UAV executes rotational motions such as roll, pitch, and yaw about these body frame axes.

Figure 1.21 Motion of the UAV along its body frame

The camera frame is the Oxyz coordinate system whose origin is positioned at the center of the camera or camera cluster. In this system, the Ox axis extends to the right, the Oy axis points downward, and the Oz axis points forward, perpendicular to the image plane. This coordinate system is often used in OpenCV image processing algorithms.

To convert a point's coordinates in space from one system to another, we utilize the transformation matrix [R|t]. This matrix comprises a 3x3 rotation matrix R and a 3x1 translation vector t.

The rotation and translation required to convert a point from coordinate frame 1 to coordinate frame 2 is identical to the transformation needed to align coordinate frame 2 with coordinate frame 1. Essentially, the transformation that aligns the target frame with the initial frame is the same as the one that transforms a point from the initial frame to the target frame. This concept is demonstrated with the help of Figure 1.22.

In 3D space, when a point P with coordinates p(x, y, z) is rotated together with its coordinate system by a rotation matrix R, the resulting point P' retains the same coordinates (x, y, z) in the new system. This simultaneous rotation of both the point and the coordinate system keeps the coordinates of P' unchanged, highlighting the relationship between the original and transformed frames.

To compute the coordinates of point P' in the original coordinate system, we use the formula p' = R*p, where p represents the coordinates (x, y, z) of point P. The rotation matrix R is the one that aligns the original coordinate system with the new one, enabling the calculation of the coordinates of P'. The translation component functions in the same manner, facilitating the transformation of coordinates between systems.

When mounting a camera on a UAV, aligning the camera's z-axis with the UAV's x-axis and translating it by a distance of ∆x along the x-axis creates a specific transformation matrix [R|t] from the camera frame to the body frame, as illustrated in Figure 1.23.

Figure 1.23 Transformation matrix from camera frame to UAV body frame
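A minimal NumPy sketch of this camera-to-body transformation, assuming the forward-facing mount described above (the offset dx below is a placeholder value, not the thesis configuration):

```python
import numpy as np

# Rotation that re-expresses camera-frame axes (x right, y down, z forward)
# in the body frame (x forward, y left, z up).
R_cam_to_body = np.array([[0.0,  0.0, 1.0],   # body x <- camera z
                          [-1.0, 0.0, 0.0],   # body y <- -camera x
                          [0.0, -1.0, 0.0]])  # body z <- -camera y
dx = 0.2                                      # camera offset along the body x-axis (placeholder)
t_cam_to_body = np.array([dx, 0.0, 0.0])

def cam_to_body(p_cam: np.ndarray) -> np.ndarray:
    """Apply [R|t]: p_body = R @ p_cam + t."""
    return R_cam_to_body @ p_cam + t_cam_to_body

print(cam_to_body(np.array([0.0, 0.0, 5.0])))  # a point 5 m ahead of the camera -> [5.2 0. 0.]
```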

The transformation from the body frame to the world frame involves a rotation and translation matrix that represents the current pose of the UAV. This matrix encapsulates both the position and orientation of the UAV, effectively expressing the rotation and translation needed to align the world frame with the UAV's current frame.

To localize an object such as a UAV in 3D space using visual sensors, various setups can be employed, including a single RGB camera, a combination of RGB and depth cameras, or a multi-camera system using several RGB cameras.

Several studies have investigated the use of a single camera mounted on a UAV for detecting other UAVs. However, relying solely on an RGB camera presents challenges in accurately determining the distance to the target UAV. This limitation arises from the unknown size of the target UAV, causing images of small UAVs close to the camera to resemble those of larger UAVs positioned farther away.

To address distance measurement challenges, depth cameras can be utilized, often in conjunction with RGB cameras, to detect and estimate object depth in 3D. However, a significant limitation of depth cameras is their short range, typically around 10 meters, and the considerable noise they exhibit at this distance. Additionally, identifying and classifying objects using RGB images remains more straightforward than with depth images.

UAV controller

The UAVs in this project utilize the PX4 Autopilot flight controller, an open-source software stack created by a global community of skilled developers from diverse research fields and industries. PX4 offers control solutions for a wide range of unmanned aerial vehicles, primarily targeting affordable and civilian autonomous aircraft.

Currently, 24 hardware vendors, including CUAV and Holybro, manufacture flight controller modules based on the PX4 platform, accompanied by various accessories. A notable example is the Holybro Pixhawk 6C flight controller, which is compatible with PX4. The software offers an autopilot mode, enabling UAV control programs to execute tasks autonomously, without human intervention. Additionally, PX4 is open-source and accessible to all users.

PX4 software, together with appropriate hardware (a flight controller module), supports the following features on UAVs:

- Manual, partially assisted or autonomous flight control with support from a companion computer

- Vehicle stabilization with support from sensors like gyroscope, accelerometer, magnetometer (compass) and barometer

- Control a variety of external systems like cameras and payloads

To effectively control a UAV, the flight controller software must be set to Offboard mode, allowing the vehicle to respond to external inputs like position, velocity, and thrust commands from a companion computer. For this project, the velocity flight mode is chosen to maintain UAV agility while simplifying the controller program logic. The implementation and testing of the velocity controller program, which runs on the companion computer and transmits velocity commands to the UAV's flight controller module, are detailed in my research paper [6].

The velocity controller program utilizes the UAV's current position and the desired target position provided by the flight controller. It then employs the Proportional-Integral-Derivative (PID) control algorithm to determine the appropriate velocity command for the UAV.

The PID controller is a widely utilized feedback control loop mechanism in industrial control systems and various applications that require continuous modulation. It continuously computes an error value, represented as e(t), which is the difference between the desired setpoint (SP) and the actual process variable (PV). The controller then applies corrections based on three terms: proportional (P), integral (I), and derivative (D), which collectively define its functionality. The structure of a PID controller is illustrated in Figure 1.28.

Figure 1.28 Structure of a PID controller

The PID controller continuously computes the error value e(t) by comparing the desired setpoint (SP) with the current process value y(t). To minimize this error, the controller adjusts the control variable u(t) that regulates the system, using the control function given in Equation 1-16:

$$u(t) = K_p\,e(t) + K_i \int_0^t e(\tau)\,d\tau + K_d \frac{de(t)}{dt} \quad \text{(Equation 1-16)}$$

where K_p, K_i and K_d are the coefficients for the proportional (P), integral (I) and derivative (D) terms of the controller.

The Proportional block generates a control output that is directly proportional to the current error value e(t). A larger error results in a correspondingly larger control output, determined by the gain factor K_p. However, relying solely on proportional control can lead to a steady-state error between the setpoint and the process value, as the controller needs an error to produce a proportional output. When the error diminishes, the control signal may become insufficient to influence the system effectively. Additionally, an excessively high value of K_p can cause instability and oscillation within the system.

The Integral block accounts for past values of the error e(t) by integrating them over time to generate the I term. When there is a residual error after proportional control is applied, the integral term works to eliminate it by incorporating the cumulative history of the error. Once the error is resolved, the integral term stabilizes; the proportional contribution shrinks as the error diminishes, but this is offset by the growing integral contribution. Note that a higher K_i value can lead to increased system overshoot.

The Derivative block estimates the future trend of the error e(t) based on its current rate of change. It is sometimes called "anticipatory control": a damping influence is applied according to how fast the error is changing, with a faster change producing a stronger damping effect. This mechanism aims to prevent the overshoot phenomenon, where the system over-corrects its state or becomes unstable, oscillating around the steady state due to excessively high K_p or K_i coefficients.

The PID control algorithm is applied in the velocity controller program, where the error e(t) represents the difference between the target position and the UAV's current 3D location, and the output u(t) serves as the velocity command for UAV control, calculated individually for the x, y, and z axes using Equation 1-17.
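A minimal sketch of such a PID velocity controller in Python; the gains and the velocity clamp below are placeholders, not the values used in the thesis:

```python
import numpy as np

class AxisPID:
    """Per-axis PID that turns a position error (metres) into a velocity command (m/s)."""
    def __init__(self, kp: float, ki: float, kd: float, v_max: float = 2.0):
        self.kp, self.ki, self.kd, self.v_max = kp, ki, kd, v_max
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt                  # I term: accumulated error
        derivative = (error - self.prev_error) / dt  # D term: error rate of change
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return float(np.clip(u, -self.v_max, self.v_max))

# One controller per axis (placeholder gains).
pids = {axis: AxisPID(kp=0.8, ki=0.01, kd=0.1) for axis in "xyz"}

def velocity_command(setpoint: np.ndarray, current: np.ndarray, dt: float) -> np.ndarray:
    """Velocity command (vx, vy, vz) computed independently for each axis."""
    errors = setpoint - current
    return np.array([pids[a].update(e, dt) for a, e in zip("xyz", errors)])
```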

Robot Operating System

The Robot Operating System (ROS) is an open-source framework comprising software libraries and toolkits that facilitate the development of robotics applications. It offers essential services for heterogeneous computer clusters, including hardware abstraction, low-level device control, and commonly used functionality. ROS employs a graph architecture to represent running processes, where nodes handle sensor data, control messages, state information, planning, and actuator communications through efficient message-passing and package management.

Figure 1.29 Structure of a simple ROS system

In a ROS (Robot Operating System) framework, processes are organized as nodes within a graph structure, interconnected by edges known as ROS topics. These nodes communicate by passing messages through topics, making service calls to one another, and offering services to other nodes. Additionally, they can access or modify shared data stored in a communal database, enhancing collaborative functionality in robotic applications.

The ROS Master provides the functionality of a parameter server by registering nodes, establishing node-to-node communication for topics, and managing updates to the parameter server. Instead of routing messages and service calls through the master, ROS enables direct peer-to-peer communication among registered node processes. This decentralized architecture is particularly advantageous for robotic systems, which typically consist of interconnected computer hardware and may rely on external computers for intensive computations or command execution.

In ROS, a node represents a single process within the ROS graph, identified by a unique name registered with the ROS master. Nodes can exist under various namespaces, allowing multiple nodes with different names to operate simultaneously, or they can be defined as anonymous, which appends a randomly generated identifier to their name. Central to ROS programming, nodes primarily function as the building blocks of client code, enabling them to receive information from other nodes, send data to peers, and handle requests for actions, thereby facilitating communication and collaboration within the ROS ecosystem.

Topics serve as named buses for nodes to exchange messages and require unique names within their namespace. To transmit messages, a node publishes to a topic, while receiving messages requires a subscription to that topic. This publish/subscribe model is anonymous: nodes are unaware of each other's identities in the communication process, knowing only that messages are being exchanged. The messages conveyed through a topic can vary significantly and may include user-defined content such as sensor data, motor control commands, state information, and actuator commands.
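A minimal rospy sketch of this publish/subscribe pattern; the node and topic names are illustrative, not those of the thesis code:

```python
#!/usr/bin/env python3
import rospy
from geometry_msgs.msg import PointStamped

def target_callback(msg: PointStamped):
    # Called once for every message published on the subscribed topic.
    rospy.loginfo("Target at x=%.2f y=%.2f z=%.2f", msg.point.x, msg.point.y, msg.point.z)

if __name__ == "__main__":
    rospy.init_node("uav_follower")
    # Subscribe to the estimated target position and publish the computed setpoint.
    rospy.Subscriber("/target_uav/position", PointStamped, target_callback)
    setpoint_pub = rospy.Publisher("/follower/setpoint", PointStamped, queue_size=10)
    rospy.spin()
```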

Nodes can advertise services, which are defined actions resulting in a single outcome. These services are typically used for tasks with a clear beginning and end, like capturing a one-frame image, rather than for continuous processes like sending velocity commands to a wheel motor or processing odometer data. Nodes not only advertise their own services but also invoke services from other nodes.

Using ROS allows complex systems to be decomposed into smaller, specialized programs that communicate via the ROS topic network. This framework enables the companion computer to connect to, control, and gather data from various UAV hardware components, including flight controllers, sensors, and cameras.

The ROS version used in this project is ROS Noetic Ninjemys, which officially supports the Ubuntu operating system version 20.04.

UAV simulator

Testing unmanned aerial vehicles (UAVs) in real-world scenarios can be costly and time-intensive, requiring skilled pilots and careful integration of hardware and software. Consequently, simulation tools have emerged as a viable alternative to traditional flight testing. Numerous UAV simulation software options are available, including Gazebo, Flightmare, and AIRSIM, each offering unique features and capabilities.

The project utilizes the AIRSIM photo-realistic environment as a UAV simulator, offering a customizable 3D space for testing UAV path planning and navigation capabilities. This environment accurately represents UAV dynamics and supports advanced physics for controller research, making it well suited for data collection. Additionally, AIRSIM supports AI and machine learning applications by providing a life-like simulation that is compatible with the PX4 flight controller.

Figure 1.30 AIRSIM simulator running with a UAV and multiple different image outputs

AIRSIM (Aerial Informatics and Robotics Simulation) is an open-source, cross-platform simulator designed for UAVs and ground vehicles, developed by Microsoft on top of Epic Games' Unreal Engine 4. It enables researchers to experiment with deep learning, computer vision, and reinforcement learning algorithms, making it well suited for advancing autonomous vehicle technology.

Figure 1.30 illustrates a simulated UAV capturing various image outputs, including RGB, depth, and segmentation images. This simulation facilitates the testing of autonomous solutions while eliminating concerns about potential real-world damage.

AIRSIM leverages Unreal Engine 4, granting users access to a wide array of simulation environments available on the Unreal Engine Marketplace for UAV experiments. Additionally, users can utilize the Unreal Engine Editor to create custom assets tailored to their specific applications.

AIRSIM offers comprehensive API support in both Python and C++, enabling users to perform a variety of simulation tasks such as environment interaction, multi-UAV control, and data recording from various sensors including cameras, Lidar, IMU, and GPS. Additionally, AIRSIM can connect to the PX4 flight controller, bridging the gap between UAV control in simulated and real-world environments and allowing UAV controller programs to be tested within the simulation.
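A minimal sketch of this Python API, assuming a running AirSim instance whose settings.json defines "front_left" and "front_right" cameras on the multirotor:

```python
import airsim

# Connect to the simulator over its RPC interface.
client = airsim.MultirotorClient()
client.confirmConnection()

# Request a synchronized stereo RGB pair in a single call.
responses = client.simGetImages([
    airsim.ImageRequest("front_left", airsim.ImageType.Scene, False, False),
    airsim.ImageRequest("front_right", airsim.ImageType.Scene, False, False),
])
for response in responses:
    print(response.width, "x", response.height, "-", len(response.image_data_uint8), "bytes")
```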

DESIGN OF THE UAV LOCALIZATION AND FOLLOWING SYSTEM

Overview

This chapter outlines the overall structure of the UAV detection, localization, and tracking system, which consists of four key components: a dual RGB camera setup, a UAV detection and segmentation algorithm, a 3D location estimation algorithm, and a UAV controller for tracking the target UAV. The subsequent sections detail the design process for each of these components.

Figure 2.1 Structure of the UAV detection, localization and following system

Stereo camera system setup

The dual camera system offers significant advantages for 3D object detection, using stereo matching to estimate object coordinates accurately. Unlike mono cameras, which struggle to differentiate between UAV models of similar shape but varying size, the dual system resolves this ambiguity. Additionally, it captures fast-moving UAVs well and operates over longer ranges than typical depth cameras, enhancing overall performance in aerial applications.

In the AIRSIM simulator, the dual RGB camera system on the following UAV is configured as follows:

- Image resolution: The resolution of a single image is chosen to be 1280x720 pixels. This resolution provides a sharp image, capturing small details of UAVs from a distance without generating excessively large data. In addition, 720p is the standard video recording resolution supported by most mini cameras designed for UAVs.

- Field of view: In this simulation, the horizontal field of view (HFOV) is set to 90 degrees to enhance target detection. In practical applications, this HFOV can be reduced when integrated with additional support systems to improve algorithmic accuracy. The experimental section also evaluates the performance of varying FOV values.

- Baseline: The chosen baseline value of 0.5 meters is suitable for mounting two cameras on medium to large-sized UAVs and ensures adequate accuracy for the initial 3D coordinate estimation tests in simulation. While a larger baseline can enhance accuracy, it is constrained by the UAV's physical dimensions.

Figure 2.2 Sample image from the RGB camera in the AIRSIM simulator

Images were captured using the AIRSIM C++ API through a custom C++ ROS package that manages the API calls to retrieve images from the simulation and ensures synchronization between them. These images are then transmitted to a Python ROS node, which runs the YOLOv8 UAV detection and segmentation model. A sample image is shown in Figure 2.2. However, due to API and simulator constraints, the output image pair is limited to a frequency of 20 Hz.

The project employs a camera system featuring two FLIR FFY-U3-16S2C-C color cameras paired with Fujinon DF6HA-1 lenses, chosen due to equipment accessibility constraints. Comprehensive specifications of the equipment are given in Tables 2.1 and 2.2.

Figure 2.3 FFY-U3-16S2C-C color camera

Table 2.1 FFY-U3-16S2C-C color camera specification

Figure 2.4 DF6HA-1 camera lens

Table 2.2 DF6HA-1 camera lens specification

The combination of the camera and lens produces an image resolution of 1440x1080, with a horizontal field of view (HFOV) of 44.9 degrees and a vertical field of view (VFOV) of 34.5 degrees. Although this configuration offers a narrower field of view than the simulator's camera, it captures visible objects in more detail, thereby improving the accuracy of detection and estimation. A sample output from the camera is shown in Figure 2.5.

Figure 2.5 Sample image from the real-life camera setup

Images were captured at a frequency of 60 Hz using a camera driver ROS package in conjunction with the Spinnaker SDK for FLIR cameras. This high-frequency, synchronized streaming of stereo images minimizes data latency, improving the performance of the algorithm.

UAV detection and segmentation

In this stage, the AI model is used to detect and segment the target UAV in the dual RGB images received from the camera system.

The authors of YOLOv8 provide several prebuilt model architectures with different sizes, accuracies and inference speeds, as shown in Table 2.3.

Table 2.3 Information about different sizes of YOLOv8 segmentation model

Due to hardware limitations on UAVs and the need for low-latency detection, the YOLOv8n-seg model was chosen for its compact size. This selection aims to achieve the fastest inference speed while conserving hardware resources for other functions, ultimately improving the efficiency of UAV position estimation.

Identifying and segmenting UAVs in RGB images is a largely overlooked challenge, resulting in a significant lack of adequate datasets for this task. While several UAV detection datasets are available, they are limited in terms of UAV types, flight scenarios, and environmental diversity, and often provide only bounding box annotations. Consequently, a dedicated dataset had to be constructed for this task.

Accurate identification and tracking of UAVs in dynamic environments require a robust dataset for detection from UAV-mounted cameras. The ideal approach involves using a real UAV equipped with a camera to capture images of various UAV models across diverse settings, followed by manual annotation of the captured data. However, this method demands significant human labor and resources. As an alternative, simulation data can be used, which allows automatic annotation and the generation of extensive, diverse datasets using multiple 3D models. Despite the advantages of simulation, discrepancies persist between simulated images and those obtained in reality. Therefore, the simulation data acquisition process needs to be carefully designed so that the data obtained using the simulation software is as close to reality as possible.

The project utilizes Microsoft's AIRSIM, a photo-realistic UAV simulation software built on Unreal Engine 4, which offers a highly detailed environment and API support for UAV simulation. The software allows the environment to be modified and provides annotated segmentation images based on object IDs. As illustrated in Figure 2.6, the simulator generates sample RGB and segmentation images, with distinct mask colors representing different object types in the scene, such as trees, benches, and globes.

Figure 2.6 Sample of RGB and segmentation images from AIRSIM

To enhance UAV detection across various operating environments, it is essential to capture images in a wide range of settings; however, obtaining such diverse data is challenging due to the limited availability and high cost of realistic environments. This project introduces a data acquisition method that integrates real UAV images with those generated in the AIRSIM simulator. Using the Media Player component within the simulation, real-life UAV footage can be displayed in front of a virtual camera while one or more UAVs maneuver between the camera and the image plane, seamlessly incorporating the UAV into the scene. This approach preserves the details of actual UAV datasets and automates the generation of segmentation mask labels using the simulator, streamlining the data collection process.

Figure 2.7 Data collection setup in the simulation environment

The VisDrone2019 dataset, collected by the AISKYEYE team at Tianjin University, serves as the foundation for this project, focusing on UAV-captured images in urban environments. A specific subset known as VisDrone-DET, which includes 4,478 images, was used to construct the image sequence required for our dataset. Sample images from this collection are shown in Figure 2.8.

Figure 2.8 Sample images from the VisDrone-DET dataset

Eight distinct UAV models, covering four unique 3D geometric designs in black and white variants, were programmed to fly at distances of 2 to 15 meters in front of the camera. This ensures a diverse range of UAV sizes in the dataset. All models were sourced from [49] with certain modifications. Figure 2.9 shows all the UAV models used in this dataset, which are referred to as UAV model 1, UAV model 2, and so on.

Figure 2.9 3D UAV models used in the AIRSIM simulator

The dataset restricts UAV distances to a maximum of 15 meters, because beyond this range the high field of view (FOV) of the camera causes the UAV to appear too small in the images. When these images are scaled down for input to the YOLO model, the UAV is reduced to an unrecognizable black or white blob, leading to noise and inaccuracies in detection results.

The raw dataset contains 40,427 RGB and segmentation image pairs. A sample of the collected raw dataset can be seen in Figure 2.10.

Figure 2.10 Sample of the collected dataset

The dataset was augmented using the imgaug tool to increase data variation, employing image modification techniques such as flipping, cropping, and affine transformations. Additionally, motion blur was applied to mimic a moving camera, along with random Gaussian blur and adjustments to the color channels to replicate low image quality and varying lighting and weather conditions. This process resulted in a dataset of 202,135 image pairs, which were split into training and validation sets at an 80:20 ratio.

Figure 2.11 Sample of the dataset after applying data augmentation
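A minimal imgaug sketch of such a pipeline; the exact operators and parameter ranges are illustrative, not the thesis settings:

```python
import imgaug.augmenters as iaa

# Augmentations applied jointly to each RGB image and its segmentation map.
seq = iaa.Sequential([
    iaa.Fliplr(0.5),                                    # horizontal flip
    iaa.Crop(percent=(0, 0.1)),                         # random crop
    iaa.Affine(rotate=(-15, 15), scale=(0.9, 1.1)),     # affine transform
    iaa.Sometimes(0.3, iaa.MotionBlur(k=7)),            # simulate a moving camera
    iaa.Sometimes(0.3, iaa.GaussianBlur(sigma=(0, 1.5))),
    iaa.MultiplyBrightness((0.6, 1.4)),                 # lighting variation
    iaa.AddToHueAndSaturation((-20, 20)),               # color channel adjustment
])

# images: list of HxWx3 uint8 arrays; segmaps: list of SegmentationMapsOnImage objects
# aug_images, aug_segmaps = seq(images=images, segmentation_maps=segmaps)
```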

This data collection method relies on realistic background images displayed in the simulation software, allowing the dataset to be expanded with various UAV perspective images available online. Additionally, self-captured images can be incorporated to fine-tune the model for the specific terrain required for deployment. This adaptable data acquisition approach is also suitable for a wide range of object detection applications.

The YOLOv8 model was trained using the default settings, with the scaling augmentation factor disabled to maintain the correct UAV mask size in the dataset. The training used an image size of 640x384, resized from 1280x720, over 200 epochs with a batch size of 36. The learning rate was set at 0.01, using stochastic gradient descent (SGD) as the optimizer with a momentum of 0.937 and a weight decay of 0.0005. Training was conducted on a single RTX 3060 Ti and took approximately 3.4 days.
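With the Ultralytics package, a training run along these lines could be reproduced roughly as follows (the dataset YAML path is an assumption):

```python
from ultralytics import YOLO

# Start from the pretrained nano segmentation checkpoint and fine-tune on the UAV dataset.
model = YOLO("yolov8n-seg.pt")
model.train(
    data="uav_seg.yaml",   # dataset description file (assumed name)
    imgsz=640,
    epochs=200,
    batch=36,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    scale=0.0,             # disable the scaling augmentation, as described above
)
```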

The final model demonstrates strong performance, achieving a mAP50 score of 0.94189 and a mAP50-95 score of 0.76262 for bounding box detection. It also records a mAP50 score of 0.92877 and a mAP50-95 score of 0.57537 for the instance mask segmentation task on the validation set.

After training, the final model is used to perform inference on each image of the image pair received from the dual cameras, in simulation or in reality. The model is set up with an inference image size of 640x384, a confidence threshold of 0.5, and FP16 precision mode to enhance inference speed. Each UAV mask obtained from the images is used for 3D location estimation, allowing the calculation of the target UAV's x, y, and z coordinates in the environment.
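A minimal sketch of this inference step with the Ultralytics API (the weight and image paths are assumptions):

```python
from ultralytics import YOLO
import cv2

model = YOLO("runs/segment/train/weights/best.pt")   # trained weights (assumed path)
left = cv2.imread("left.png")
right = cv2.imread("right.png")

# Segment both stereo images with the settings described above (640-wide input, conf 0.5, FP16).
results = model([left, right], imgsz=640, conf=0.5, half=True)
for r in results:
    if r.masks is not None:
        print("UAV masks found:", len(r.masks.data))  # r.masks.data holds the binary masks
```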

After the segmentation masks of the UAV have been obtained in the stereo image pair, the location of the UAV can be estimated.

Various methods exist for calculating the disparity map of an image pair. By applying the UAV mask obtained in the previous step, we can generate a disparity map that covers only the pixels associated with the UAV.

OpenCV's algorithm for computing disparity maps from rectified RGB images is often too slow for UAV applications, taking approximately 100-150 ms on weaker CPUs, and it struggles with noise in complex environments and fast-moving cameras. Comparable AI-based methods also demand significant time and GPU resources, which are limited on UAV hardware. To improve efficiency, we first identify the UAV's center in each image using the segmentation mask, then compute the disparity between these two points to determine the UAV's distance.
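A minimal sketch of this center-point disparity approach, assuming rectified images, a focal length fx in pixels and the stereo baseline B in metres:

```python
import numpy as np

def mask_center(mask: np.ndarray) -> np.ndarray:
    """Pixel centroid (u, v) of a boolean UAV mask."""
    v, u = np.nonzero(mask)
    return np.array([u.mean(), v.mean()])

def uav_depth(mask_left: np.ndarray, mask_right: np.ndarray,
              fx: float, baseline: float) -> float:
    """Depth Z = fx * B / d from the horizontal disparity d of the two mask centers."""
    d = mask_center(mask_left)[0] - mask_center(mask_right)[0]
    return fx * baseline / d if d > 0 else float("inf")

# Example call with placeholder calibration values:
# z = uav_depth(mask_left, mask_right, fx=620.0, baseline=0.5)
```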

UAV controller algorithm

The target UAV's coordinates, determined by the 3D location estimation block, are transmitted to the UAV controller block. This block calculates the setpoint that keeps the camera-equipped UAV at a specific distance from the target UAV while keeping the target within the camera's field of view for effective tracking.

Figure 2.15 Setpoint position placement for optimal following result

The setpoint for the pursuing UAV is placed on the straight line connecting it to the target UAV, at a preset distance, to optimize the pursuit path. This calculation prevents the following UAV from getting too close, ensuring a safe distance is maintained. The UAV controller determines the x and y coordinates of the setpoint, while the z coordinate matches that of the target UAV to keep it centered in the camera's frame for optimal visibility. Additionally, the yaw angle of the pursuing UAV is calculated so that it consistently faces the target, enhancing visibility throughout the pursuit.

The UAV controller block operates by first powering up the UAV, which then takes off to its designated starting position. Once airborne, the UAV searches for the target UAV; upon detection, the target's location is relayed to the controller block. This block calculates the desired setpoint for the UAV's flight path and maintains an array of the most recent setpoints to ensure smooth navigation. By averaging these setpoints, the controller determines the linear and angular velocities needed for the UAV to pursue the target while keeping it within the camera's field of view. This approach minimizes oscillations in motion around the target's actual position, and the final velocity commands are sent to the PX4 controller for execution.
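A minimal sketch of this setpoint and yaw computation (the keep-out distance is a placeholder, not the thesis value):

```python
import numpy as np

def follow_setpoint(p_follower: np.ndarray, p_target: np.ndarray, keep_distance: float = 4.0):
    """Setpoint on the follower-target line, keep_distance metres short of the target,
    at the target's altitude, with a yaw that faces the target."""
    delta = p_target - p_follower
    dist_xy = np.linalg.norm(delta[:2])
    if dist_xy < 1e-6:                       # degenerate case: directly above/below the target
        return p_follower.copy(), 0.0
    direction_xy = delta[:2] / dist_xy
    setpoint_xy = p_target[:2] - keep_distance * direction_xy
    setpoint = np.array([setpoint_xy[0], setpoint_xy[1], p_target[2]])
    yaw = float(np.arctan2(delta[1], delta[0]))   # keep the cameras pointed at the target
    return setpoint, yaw

# setpoint, yaw = follow_setpoint(np.array([0.0, 0.0, 20.0]), np.array([8.0, 3.0, 21.0]))
```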

If the target UAV is not detected initially, or if the follower loses track of the target for more than five seconds, it automatically switches to search mode, in which case it rotates at its current position, scanning the area to find the target UAV.

Figure 2.16 Algorithm flowchart of the UAV controller block

The 5-second timer on the target UAV's location is implemented to remember its last known position, because the UAV detection and segmentation block cannot reliably detect the target in every image frame, and the follower's pitch or roll rotations while moving can push the target out of the cameras' FOV. In such cases, the following UAV continues to move toward the last known location of the target for up to 5 seconds. If detection has not recovered after 5 seconds, the following UAV enters UAV searching mode.

EXPERIMENTS AND RESULTS

UAV detection experiments

This section assesses the UAV detection capabilities of the trained YOLOv8 segmentation model using a variety of self-collected datasets in both simulated and real-world environments, as well as publicly available UAV detection datasets. Because datasets offering segmentation masks for UAVs are scarce (most only provide bounding box annotations), the evaluation covers both instance segmentation and object detection results for UAV objects.

3.1.1 Testing the UAV detection model in simulation

A simulation dataset was captured in a photo-realistic park environment, featuring a target UAV flying randomly while the following UAV maintains a distance of approximately 7 meters from the smaller UAVs (models 6 and 7) and 10 meters from the larger UAVs (models 1 and 3). This setup allows images from an RGB camera to be recorded together with instance segmentation masks, which are compared to the segmentation results produced by the YOLOv8 model. A sample image from this dataset is shown in Figure 3.1.

Figure 3.1 Sample from the dataset for testing the UAV detection and segmentation model

Table 3.1 Result of the UAV detection and segmentation in simulation

              mAP50 (box)   mAP50-95 (box)   mAP50 (mask)   mAP50-95 (mask)
UAV model 1   0.9908        0.7554           0.9896         0.6208

The model demonstrates strong performance, as evidenced by the high mAP50 values shown in Table 3.1, indicating that it detects the UAVs consistently. Overall, the model's accuracy is notably high, although it has difficulty accurately detecting UAV model 6, which is a white UAV. In this case, the UAV's color blends in with the sky, making it much harder to detect.

This section evaluates several publicly available UAV detection datasets, specifically selecting those that align closely with scenarios involving an actively flying UAV. The criteria for these datasets include a horizontal viewing angle and a maximum distance of 20 meters, with the UAV in motion rather than stationary or held.

The following datasets were used in the testing:

The UAV-Eagle dataset is highly relevant to our scenarios, featuring 510 images of a custom quadcopter named Eagle captured in an unconstrained environment. The imagery predominantly shows a backdrop of trees and blue skies, interspersed with flat buildings. Notably, this dataset includes pre-existing annotations for object detection in YOLO format, detailing the object class along with the bounding box's center coordinates (x, y) and its dimensions (width and height).

Figure 3.2 Sample image from UAV-Eagle dataset

The Drone Detection Dataset features a collection of videos capturing flying objects, including 114 recordings of UAVs. However, the majority of these videos depict similar scenarios, predominantly showing a single UAV against a blue sky. Given the need for manual labeling and the homogeneity of the footage, only one video, V_DRONE_001.mp4, consisting of 301 annotated image frames, was used for model evaluation.

Figure 3.3 Sample image from the Drone-detection-dataset dataset

The self-recorded dataset was captured using the FFY-U3-16S2C-C color camera system mounted on a Tarot Iron Man 650 UAV, which was positioned approximately 13 meters from the building being observed. A sample image from this dataset is shown in Figure 3.4.

Figure 3.4 Sample image recorded using the FFY-U3-16S2C-C camera for testing

Table 3.2 presents the evaluation results on these real-life datasets, which only include bounding box labels for the object detection task. Consequently, only the mean Average Precision (mAP) for bounding box detection can be computed, even though the model also generates segmentation masks alongside the bounding box results.

Table 3.2 Results of the UAV detection and segmentation model testing with real-life dataset

Dataset mAP50 (box) mAP50-95 (box)

The YOLOv8n-seg UAV detection model demonstrates impressive accuracy given its compact structure. It outperforms the YOLOv3Tiny model on the UAV-Eagle dataset, achieving a mAP50 score of 0.8444 and surpassing the original paper's results. The model also recorded commendable scores on both the Drone-detection-dataset and the self-recorded dataset, highlighting its effectiveness in UAV detection tasks.

The lower mAP50-95 scores across all datasets may be attributed to inconsistencies in labeling, which often fail to capture the UAV's wing tips and slim legs accurately. This misalignment results in a noticeable gap between the detected bounding box and the ground-truth bounding box.

Self-recorded datasets featuring UAVs in complex environments or low light conditions pose significant detection challenges, as the UAVs may blend into their surroundings, as illustrated in Figure 3.5 The limited resolution of 640x384 and the simplistic structure of the Nano model hinder effective detection To improve performance in these demanding scenarios, advanced hardware capable of processing full-resolution images, along with a more sophisticated AI model that can capture and analyze greater detail, is essential.

Figure 3.5 A real-life sample in which the AI model failed to detect the UAV. The red box shows the ground-truth bounding box

3.1.3 Summary of the UAV detection test

The conducted tests utilizing both simulated and real-life image data demonstrate the model's effectiveness in detecting and segmenting the target UAV across various scenarios This validates the data collection approach that combines simulation with real-world imagery Although the current dataset emphasizes urban settings, it can be adjusted to suit different environments where the UAV operates Additionally, collecting data directly from the deployment location could enhance detection accuracy through model fine-tuning.

The model encounters challenges with more difficult detection tasks, as illustrated in Figure 3.5 While adopting a more sophisticated approach for UAV detection could enhance performance, it may also result in slower detection speeds.

3.2 UAV follow experiment

This section discusses the evaluation of the target UAV location estimation method and the tracking capabilities of the camera-equipped UAV Due to device limitations and safety concerns associated with real-world testing, this evaluation is confined to simulations conducted in the AIRSIM environment.

The simulated environment replicates an aerial view of a park, ensuring that the UAVs can navigate freely without any obstacles in their flying area.

3.2.1.1 Scenario 1: Testing the accuracy of the localization algorithm

The goal of this experiment is to evaluate the accuracy with which the camera system on the main UAV estimates the position of the target UAV.

The experiment involves a stationary UAV equipped with a stereo camera system and a second UAV flying within its field of view (FOV) at varying distances of 3 to 15 meters and a speed of 1 m/s The stationary UAV is positioned at coordinates (0,0,20), while the target UAV is modeled as UAV number 1 The stereo image stream, along with the poses of both UAVs, will be recorded Following the application of a localization algorithm to the stereo images, the estimated position of the target UAV will be compared to its ground truth pose to assess the estimation error All error calculations will utilize the rpg_trajectory_evaluation software, an open-source toolbox designed for trajectory evaluation.
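While the reported figures come from the rpg_trajectory_evaluation toolbox, the underlying quantities can be sketched as below, assuming the estimated and ground-truth target positions have already been associated by timestamp; the input file names are hypothetical.

```python
import numpy as np

def localization_errors(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: N x 3 arrays of target positions in the camera frame."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)   # absolute error per sample [m]
    rng = np.linalg.norm(gt_xyz, axis=1)             # range from camera to target [m]
    return err.mean(), err.max(), (err / rng).mean() * 100  # mean [m], max [m], mean [% of range]

# est = np.loadtxt("target_estimate.txt"); gt = np.loadtxt("target_groundtruth.txt")
# print(localization_errors(est, gt))
```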

Figure 3.6 Setup of the experiment with the stationary UAV carrying the camera system and the moving target UAV

The experimental results depicted in Figures 3.7 and 3.8 indicate that the 3D localization algorithm excels at distances under 10 meters, achieving a mean error of less than 5% and a maximum error of 10% However, as the separation between the two UAVs increases, the limitations of the stereo camera system become apparent.

At a separation of around 16 meters, the depth estimation error rises significantly, reaching a mean of 7.5% and a maximum of 22%, and the estimated depth fluctuates around the true value. Both effects stem from computing depth from the disparity of a single center pixel: as the disparity gets smaller, the depth difference between consecutive integer disparity values grows, so small disparity errors translate into increasingly large depth errors at longer range. This is visible in Figure 3.7, where the UAV, at about 15 meters from the camera, produces depth estimates oscillating between 14 and 16 meters.
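The quantization effect can be made concrete with the standard stereo relation Z = f·B/d: the depth gap between two neighbouring integer disparities grows roughly as Z²/(f·B). The focal length and baseline below are assumed values for illustration, not the exact simulated camera parameters.

```python
f_px, baseline = 320.0, 0.5                    # assumed focal length [px] and baseline [m]

def depth(d_px):
    return f_px * baseline / d_px              # Z = f * B / d

for z_target in (5.0, 10.0, 15.0):
    d = round(f_px * baseline / z_target)      # nearest integer disparity at this depth
    step = depth(d - 1) - depth(d)             # depth jump caused by a 1-px disparity error
    print(f"Z ~ {z_target:4.1f} m: d = {d:2d} px, 1-px error -> {step:.2f} m")
```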

Figure 3.7 The ground-truth and estimated flight paths of the target UAV

Figure 3.8 Absolute and mean error of the location estimation process

3.2.1.2 Scenario 2: Testing the UAV pursuit ability in a small area

This experiment assesses both the accuracy of the target UAV location estimate and the ability of the camera-equipped UAV to follow an agile target flying a complex path within a confined space.

The experiment involves two UAVs. The target UAV flies a random path within a 20x20x4 meter area centered at (x=0, y=0, z) in the world frame. It starts at a speed of 1 m/s, and the speed is increased in subsequent runs until the camera-equipped UAV can no longer maintain the 7-meter following distance. Each test lasts approximately 2 minutes.

The camera-equipped UAV hovers at the coordinates (x=0, y=0, z), the center of the target UAV's operational area. Once it is in position, the UAV tracking and following program is activated and the target UAV begins moving at the predetermined speed. Throughout the run, both the ground truth and the estimated location of the target UAV are recorded for comparison.
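For context, the following is a minimal sketch of one follow-control step: the estimated target position is turned into a velocity setpoint that closes the gap to a 7-meter standoff. The proportional gain and speed limit are assumptions, not the exact controller parameters used in the thesis.

```python
import numpy as np

STANDOFF, KP, V_MAX = 7.0, 0.8, 4.0            # assumed standoff [m], gain, speed limit [m/s]

def follow_velocity(own_pos, target_pos):
    """Velocity setpoint (world frame) that keeps a STANDOFF distance to the target."""
    offset = target_pos - own_pos
    dist = np.linalg.norm(offset)
    if dist < 1e-3:
        return np.zeros(3)
    v = KP * (dist - STANDOFF) * (offset / dist)   # move along the line of sight
    speed = np.linalg.norm(v)
    return v if speed <= V_MAX else v * (V_MAX / speed)

print(follow_velocity(np.array([0., 0., 20.]), np.array([10., 0., 20.])))  # -> [2.4, 0, 0]
```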

Figures 3.9 to 3.12 present the experimental results, illustrating the estimated and ground truth flight paths of the target UAV derived from the camera system of the following UAV Additionally, the final plot in each figure highlights the error in each world coordinate axis, comparing the estimated location to the ground truth location of the target UAV throughout its flight.

The experiment demonstrates that a UAV equipped with cameras can effectively track a target UAV at speeds of up to 3 meters per second However, when the target UAV's speed increases to 4 m/s, the tracking UAV struggles and ultimately loses track after covering 140 meters Additionally, the location estimation error for the target UAV rises with increased speed, ranging from approximately 0.2 meters on each axis at 1 m/s to nearly 1 meter at 4 m/s, yet the estimated flight path remains closely aligned with the actual trajectory of the target UAV.

Figure 3.9 Result of the first UAV follow test with target UAV speed of 1 m/s

Figure 3.10 Result of the first UAV follow test with target UAV speed of 2 m/s

Figure 3.11 Result of the first UAV follow test with target UAV speed of 3 m/s

Figure 3.12 Result of the first UAV follow test with target UAV speed of 4 m/s

3.2.1.3 Scenario 3: Testing the following ability over long distance

The goal of this experiment is to evaluate the ability to follow the target UAV when it flies directly away from the cameras at various speeds.

The experiment involves two UAVs. The target UAV initially hovers 7 meters in front of the following UAV, inside its camera field of view, then accelerates and travels a total distance of 30 meters. The objective of the following UAV is to pursue the target while maintaining the 7-meter distance. The experimental setup in the AIRSIM simulator is illustrated in Figure 3.13.

Figure 3.13 Setup of the UAV following over long distance experiment

Figure 3.14 shows the distance between the two UAVs along each axis for five target UAV speeds. Since both UAVs travel parallel to the x-axis, the positional difference along this axis is the quantity of interest. At low target speeds of 1-2 m/s, the following UAV keeps a steady distance of approximately 8 meters from the target. As the target speed increases, the setpoint queue in the controller of the following UAV introduces a delay, causing a noticeable lag before the following UAV eventually catches up. At a target speed of 6 m/s, the following UAV can no longer keep pace with the target.
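The lag can be illustrated with a toy setpoint queue (the queue length and update rate below are assumptions, not the thesis configuration): when setpoints are produced and consumed at the same rate through a fixed-length queue, the setpoint being executed is always queue-length/rate seconds old, which translates directly into a distance lag at higher target speeds.

```python
from collections import deque

RATE_HZ, QUEUE_LEN = 10, 20                   # assumed setpoint rate and queue length
queue = deque(maxlen=QUEUE_LEN)

for k in range(100):                          # simulate 10 s of setpoint generation
    queue.append(k / RATE_HZ)                 # timestamp [s] of each generated setpoint

lag_s = queue[-1] - queue[0]                  # age of the setpoint currently being executed
print(f"executed setpoint lags the newest one by ~{lag_s:.1f} s "
      f"(~{lag_s * 4:.0f} m at a 4 m/s target speed)")
```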

Figure 3.14 Results of the UAV following over long distance experiment

3.2.2 Summary of the UAV localization and follow test

The UAV detection and pursuit algorithm proved effective across the three testing scenarios. The target localization error averaged below 5% at close range and about 7.5% at a distance of 15-16 meters. The algorithm successfully followed the target UAV at speeds of up to 3 m/s on complex paths and up to 5 m/s when the target flew away in a straight line.

CHAPTER 4. CONCLUSION

4.1 Conclusion

The project has developed an advanced algorithm capable of detecting and tracking a target UAV using RGB cameras mounted on another UAV This innovative solution integrates the cutting-edge YOLOv8 object detection and instance segmentation algorithm, along with the depth estimation features of a stereo camera system and a velocity UAV controller on the PX4 Autopilot platform Extensive experiments have shown that the algorithm can accurately detect the target UAV and estimate its coordinates in various scenarios Although the UAV controller effectively follows the target, its performance is still challenged when the target UAV exhibits high speeds or complex movements.

4.2 Future research directions

Future work includes validating the system in real-life experiments and integrating it with complementary systems such as UAV identification and management, bringing it closer to practical deployment. In addition, the techniques developed here for building object detection datasets with a photo-realistic simulator and for estimating object locations from a UAV are broadly applicable to detecting and tracking other types of objects.

