
Graduation Thesis in Computer Science: Exploiting Efficient Pattern Mining Techniques for Unsupervised Person Re-identification


DOCUMENT INFORMATION

Basic information

Title: Exploiting Efficient Pattern Mining Techniques for Unsupervised Person Re-identification
Authors: Nguyen Tien Hung, Cao Van Hung
Supervisor: Dr. Nguyen Vinh Tiep
Institution: Vietnam National University Ho Chi Minh City
Major: Computer Science
Type: Graduation thesis
Year of publication: 2024
City: Ho Chi Minh City
Number of pages: 75
File size: 36.48 MB


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

FACULTY OF COMPUTER SCIENCE

NGUYỄN TIẾN HƯNG - 20520198

CAO VĂN HÙNG - 20520193

EXPLOITING EFFICIENT PATTERN MINING TECHNIQUES FOR UNSUPERVISED PERSON RE-IDENTIFICATION

HONORS BACHELOR IN COMPUTER SCIENCE

SUPERVISOR

DR. NGUYỄN VINH TIỆP

HO CHI MINH CITY, 2024


THESIS DEFENSE COMMITTEE

The thesis grading committee was established under Decision No. ...... dated ...... of the Rector of the University of Information Technology.

...... - Chair

...... - Secretary

...... - Member

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY    SOCIALIST REPUBLIC OF VIETNAM

UNIVERSITY OF INFORMATION TECHNOLOGY    Independence - Freedom - Happiness

Ho Chi Minh City, date ......

GRADUATION THESIS EVALUATION

BY THE SUPERVISOR

Thesis title:

EXPLOITING EFFICIENT PATTERN MINING TECHNIQUES FOR UNSUPERVISED PERSON RE-IDENTIFICATION

Students:    Supervisor:

Nguyễn Tiến Hưng 20520198    Dr. Nguyễn Vinh Tiệp

Cao Văn Hùng 20520193

Thesis evaluation

1 On the report:

Number of pages    Number of chapters

Number of data tables    Number of figures

Number of references    Product

Some remarks on the form of the report:

4 On the students' working attitude:

Evaluator

(Signature and full name)

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY    SOCIALIST REPUBLIC OF VIETNAM

UNIVERSITY OF INFORMATION TECHNOLOGY    Independence - Freedom - Happiness

Ho Chi Minh City, date ......

GRADUATION THESIS EVALUATION

BY THE REVIEWER

Thesis title:

EXPLOITING EFFICIENT PATTERN MINING TECHNIQUES FOR UNSUPERVISED PERSON RE-IDENTIFICATION

Students:    Reviewer:

Cao Văn Hùng 20520193

Thesis evaluation

5 On the report:

Number of pages    Number of chapters

Number of data tables    Number of figures

Number of references    Product

Some remarks on the form of the report:

8 On the students' working attitude:

Evaluator

(Signature and full name)

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY    SOCIALIST REPUBLIC OF VIETNAM

UNIVERSITY OF INFORMATION TECHNOLOGY    Independence - Freedom - Happiness

GRADUATION THESIS TOPIC REGISTRATION

Topic title (Vietnamese): Khai thác các kỹ thuật khai phá mẫu hiệu quả cho bài toán tái định danh người không giám sát

Topic title (English): Exploiting Efficient Pattern Mining Techniques for Unsupervised Person Re-identification

Language: Vietnamese

Supervisor: Dr. Nguyễn Vinh Tiệp

Duration: From 09/2023 to 12/2023.

Students:

Nguyễn Tiến Hưng - 20520198    Class: KHTN2020

Email: 20520198@gm.uit.edu.vn    Phone: 0915730264

Cao Văn Hùng - 20520193    Class: KHTN2020

Email: 20520193@gm.uit.edu.vn    Phone: 0966160250

Topic description: (Detailed description of the objectives, scope, subjects, methods, and expected results of the topic)

Unsupervised person re-identification is one of the important challenges in computer vision. It involves re-identifying a person from a set of images or videos without using labelled data. It has important applications in security surveillance, crowd management, and smart city systems.

Unsupervised person re-identification requires finding images of a person across multiple cameras and presenting the results as a list of person images from the database, ranked by similarity to the query image. This matters when collecting labelled data for the task is difficult or infeasible.

Scope:

- Focus on person re-identification without clothing changes.

- Focus on unsupervised learning models.

The objectives of this research are:

- Study state-of-the-art techniques for unsupervised person re-identification.

- Implement and test the methods on standard datasets, then evaluate the results.

- Propose and evaluate efficient pattern mining methods for unsupervised learning in person re-identification.

- Build a demo application.

Content and research methods:

- Content 1: Survey the approaches to person re-identification based on unsupervised learning.

  o Review the main existing approaches to the problem.

  o Review the existing standard evaluation datasets.

- Content 2: Implement and test the methods on standard datasets, then evaluate the results.

  o Implement the methods and algorithms studied in Content 1 on the standard datasets.

  o Evaluate the results to see how these methods perform in the context of unsupervised person re-identification.

- Content 3: Propose and evaluate an efficient pattern mining method for unsupervised learning in person re-identification.

  o Propose an improved method based on the survey results and the issues left unresolved in Contents 1 and 2.

- Content 4: Build a demo application.

  o Design the interface and build a web-based person re-identification system.

Expected results:

- A survey of the research directions in unsupervised person re-identification.

- Documentation describing the existing evaluation datasets for person re-identification.

- Detailed documentation of the proposed method and its evaluation results, with a comparison between the proposed method and related methods.

Work plan: (Brief description of the work plan and the division of tasks among the participating students)

Weeks 1-2: Carry out Content 1 - Survey the problem and the general approaches to unsupervised learning for person re-identification.

Weeks 3-4: Carry out Content 2 - Implement and test the methods on standard datasets, then evaluate the results.

Weeks 5-8: Carry out Content 3 - Propose and evaluate an efficient pattern mining method for unsupervised learning in person re-identification.

Weeks 9-12: Carry out Content 4 - Build the demo application.

Task assignment:

Cao Văn Hùng: survey related research, run experiments, write the report.

Nguyễn Tiến Hưng: survey related research, run experiments, write the report, build the demo application.

Supervisor's confirmation    Ho Chi Minh City, date ...... 2023

(Signature and full name)    Students

(Signature and full name)

This thesis was completed successfully thanks to a lot of help and support from many people. We are really thankful for their helpful feedback. We want to start by thanking our supervisor, Dr. Nguyen Vinh Tiep, for his great guidance and help throughout this research. His advice was really important in helping us do our research and finish this thesis.

We also want to say a big thank you to the Dean and all the teachers in the Computer Science Department at the University of Information Technology. They supported us a lot and taught us everything we needed to know to complete this thesis.

We are also thankful to the Multimedia Laboratory (MMLab-UIT) for giving us a good place to do our research and for the advanced equipment they provided. Also, a special thanks to the researchers at MMLab for their useful feedback and questions, which really helped make our research better. They helped us find and fix mistakes, which made this thesis better.

2.6.1 Consistent mining strategy

2.6.2 Adaptive mining strategy

List of Figures

1.1 A common flow chart of a person ReID system. Datasets are collected from multi-camera systems for training and testing. The training phase involves learning person feature representations. During testing, the system receives a query to locate a matching person in gallery images, resulting in a ranked list of potential matches [11].

1.2 Examples of some person ReID challenges. Each pair of images shows the same person except (h). (a) viewpoint variation, (b) pose variations, (c) illumination changes, (d) partial occlusion, (e) inaccurate pedestrian detection, (f) accessory change (the person has a back bag in the first image, but not in the second), (g) low resolution, (h) different people with similar clothing.

2.1 Illustration of triplet loss given one positive and one negative per anchor (Image source: Schroff et al., 2015).

2.2 Comparison between triplet and hard instance contrastive loss.

2.3 Illustration of feature alignment approach for unsupervised domain adaptation. Mid-level attribute features are aligned between source and target domains in a joint learning pipeline. Ja represents the attribute alignment loss between the source attributes and the target attributes.

2.4 ... extraction and clustering from unlabeled target data to generate pseudo-labels. 2) Model training using the unlabeled data along with these ...

2.5 Hard instance contrastive loss compares the input sample with hard positive instances that belong to the same cluster and hard negative instances from other clusters. Visualization from HHCL [10].

3.1 Hybrid Contrast Learning Framework [10]. 1) Initialization using a clustering algorithm to create pseudo labels and initialize memory banks. 2) Forward propagation to calculate cluster and hard instance contrastive losses. 3) Backpropagation to update the encoder model. 4) Updating the instance and cluster centroid memory banks.

3.3 Cluster-level contrastive loss.

3.4 Memory Base Hard Mining.

3.5 Overview of our method. The ClusterNCE computes the contrastive loss at the cluster level with a dynamic momentum update. In the instance-level contrastive loss, we apply adaptive positive mining. In this context, x ∈ X denotes the training dataset, q represents the query instance's feature vector, and c_k signifies the k-th cluster feature vector, with feature vectors of the same color belonging to the same cluster. Additionally, i_j represents the j-th instance memory.

3.6 Different ways to assign weights of cluster centroid ([9]).

3.7 Person ReID datasets exhibit varying levels of intra-class differences. (a) For large intra-class variations caused by factors like occlusion, lighting changes, and different viewpoints, mining the hardest positive pair can negatively impact metric learning. In these cases, opting for the least-hard pair is more beneficial. (b) When there is small intra-class variation, with visually similar features, both the hardest and least-hard pairs tend to show significant visual resemblances.

3.8 Visualization of different sampling methods.

4.1 Market1501.

4.2 Statistics of MSMT17.

4.3 MSMT17 vs Market1501. Each column shows two sample images of the same identity. MSMT17 presents a more challenging and realistic person ReID task.

List of Tables

4.1 Dataset comparison.

4.2 Ablation studies on proposed method.

4.3 Comparison with the state-of-the-art methods on Market-1501 and MSMT17.

In the evolving field of unsupervised person re-identification (Re-ID), this thesis focuses on enhancing methods for identifying individuals across various camera views without the aid of labelled data. Unsupervised Re-ID presents challenges in accurately learning from unannotated datasets and is crucial for applications in security and surveillance.

The research is motivated by the limitations of exclusive hard feature mining in unsupervised Re-ID, particularly the risk of incorporating noisy samples and the diminished impact of hard instances as training progresses. Addressing these challenges, the thesis aims to develop a more effective and stable training paradigm.

Contributions include the implementation of the Dynamic Centroid Update Policy (DCUP) for optimizing cluster representation vectors, and the Adaptive Positive Mining Instance Contrastive Loss, which balances hard and easy samples for a more effective training process. These advancements aim to address the limitations of current methods, offering robust and generalizable solutions for unsupervised Re-ID.


Chapter 1

Introduction

1.1 Overview

Enhancing public security and safety through robust surveillance systems is becoming increasingly crucial, especially considering the rising global crime rates. In recent years, statistics have shown a steady increase in various types of crimes across different regions, emphasizing the need for effective monitoring and response mechanisms. Implementing large-scale surveillance systems, equipped with advanced person re-identification (ReID) models, is a proactive response to this growing concern. These systems aid in crime prevention and investigation and play a vital role in ensuring public safety in densely populated urban areas, transportation hubs, and other public spaces. The integration of person ReID technology into these systems significantly enhances their capability to track and identify individuals in public spaces like airports, shopping centres, and urban streets, thereby enhancing public safety and security measures.

As computer vision technology continues to advance, various critical modules in a surveillance system should be developed for specialized tasks. Modules for object detection and classification are essential for identifying and categorizing individuals in a scene. With the large amount of data from numerous cameras, a retrieval module becomes necessary to efficiently search for specific persons. Person ReID plays a key role in this process, enabling the system to consistently recognize a target individual despite varying poses and environmental factors. Additionally, person ReID models aid the retrieval module by extracting effective image representations for more efficient searches.

1.2 Person Re-identification

FIGURE 1.1: A common flow chart of a person ReID system, from the multi-camera system through the collected images/videos and a query image/video to the ranking list. Datasets are collected from multi-camera systems for training and testing. The training phase involves learning person feature representations. During testing, the system receives a query to locate a matching person in gallery images, resulting in a ranked list of potential matches.
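The retrieval step in this flow chart can be illustrated with a minimal sketch. It assumes that an encoder has already turned the query and each gallery image into a feature vector and that cosine similarity is the matching score; the function names and the toy vectors are hypothetical and are not taken from the thesis.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

# Toy 3-dimensional features standing in for encoder outputs (assumed values).
query = [1.0, 0.0, 0.0]
gallery = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # in between
]
ranking = rank_gallery(query, gallery)
```

In a real system the gallery holds many thousands of features, so the pairwise scoring would typically be done as a single matrix product rather than a Python loop.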

1.2.1 Problem Definition

Person ReID can be defined as the human association task on the bounding boxes drawn by a person detection algorithm. The primary input in this process is the image or video feed from the cameras of a multi-camera network, where individuals are initially detected and their appearances captured within bounding boxes. The output, on the other hand, is the identification of these individuals across different camera feeds, linking appearances to establish a continuous identity track. This process involves sophisticated algorithms that analyse and compare visual features, including facial characteristics, clothing, and gait. The goal is to create a reliable link between the observed subject and their corresponding identity in the database.

In an image-based ReID system, the workflow integrates a person detection algorithm with a ReID algorithm. Video-based ReID also involves a human tracker that creates tracklets (sequences of detections of the same individual over time), bridging the gap between detection and ReID. The focus of person ReID research is predominantly on the matching of these bounding boxes or tracklets, ensuring accurate identification despite the visual disparities caused by different camera angles, occlusions, or varying resolutions.

We apply specific constraints to refine the focus of our research. First, our model focuses on the appearance aspects of an individual's identity, excluding scenarios involving changes in clothes. This decision aims to narrow the problem down to its fundamental aspects, which involve identifying and matching individuals based on consistent visual features.

Additionally, our study concentrates on image-based person ReID. As a result, the model we are developing is tailored to process only the cropped bounding boxes that encompass individuals. This limitation indicates that our system is intended to analyze still images or frames from video feeds, with individuals pre-detected and isolated by a person detection algorithm.

By concentrating on consistent appearance features and emphasizing image-based person ReID, we aim to develop a robust system tailored to real-world scenarios, particularly in surveillance and security contexts. While this approach has its limitations, such as not accounting for clothing changes and focusing solely on still images, it allows for targeted improvements in identity recognition and tracking within the defined parameters. Our research paves the way for more specialized models in person ReID, contributing valuable insights and methodologies to the field.

FIGURE 1.2: Examples of some person ReID challenges. Each pair of images shows the same person except (h). (a) viewpoint variation, (b) pose variations, (c) illumination changes, (d) partial occlusion, (e) inaccurate pedestrian detection, (f) accessory change (the person has a back bag in the first image, but not in the second), (g) low resolution, (h) different people with similar clothing.

• Appearance Variation: In person ReID, a significant challenge is appearance variability due to clothing or accessory changes. Over time and across different locations, individuals may alter their appearance, like changing clothes or accessories, which complicates identification.

• Viewpoint Challenges: The varying camera heights, distances, and angles result in different pedestrian shapes and sizes. Since a single image cannot capture a person from all angles, each view only provides partial information, leading to intra-class variation and inter-class confusion.

• Occlusion Issues: In crowded areas, target individuals can be obscured by others or objects, leading to incomplete feature representations. This occlusion often results in less robust person identification.

• Illumination Variation: Different illumination levels, particularly in indoor and outdoor camera networks, affect the matching of appearance representations. This variance contributes to inconsistencies in image brightness and quality.

• Domain Gap: Datasets, often recorded over short periods, exhibit specific clothing styles and illumination levels. For instance, the Market1501 and DukeMTMC-ReID datasets, recorded in different seasons, reflect seasonal clothing and lighting differences, affecting model generalization.

• Pose Variation: Human body articulation results in different appearances of the same individual. Models trained on specific poses struggle with identifying varied poses due to changes in body part localization and visibility.

• Low Resolution: The high cost of extensive camera coverage often results in sparse networks and low-resolution images, as cameras are placed high and far from subjects.

• Clothing Similarity: In large galleries, the likelihood of similar clothing increases, adding to matching ambiguity. Identifying unique visual signatures becomes more challenging.

• Gallery Size: Large public spaces covered by camera networks result in enormous candidate sets for ReID, increasing computational requirements.

• Data Labeling Issues: Creating a robust, supervised model requires extensive annotated data, which is often prohibitively expensive and labour-intensive to collect in large camera networks.

Each of these factors significantly impacts the effectiveness and accuracy of unsupervised person ReID systems, presenting challenges that need to be addressed for successful identification.

1.2.3 Application

The applications of person ReID are vast and varied, reflecting its growing importance in modern society. From enhancing public safety with city-wide surveillance systems to personalizing retail experiences and ensuring patient safety in healthcare, person ReID is becoming an indispensable tool. Its role in intelligent transportation for traffic analysis and in human-robot interaction for personalized assistance further demonstrates its versatility. As technology advances, the scope and impact of person ReID are set to expand, solidifying its position as a key technology in diverse sectors.

1.3 Unsupervised Person ReID

Person ReID has emerged as a pivotal task. This area initially centred around supervised learning methods, which relied heavily on extensive datasets with detailed annotations. These methods set the foundation for the field, harnessing richly labelled data to train models capable of identifying individuals with notable accuracy.

However, the field of ReID has increasingly shifted towards unsupervised learning techniques. This transition is primarily driven by the practical difficulties in obtaining large, labelled datasets, particularly in diverse real-world environments. Unsupervised learning methods offer a promising alternative, aiming to learn from unannotated data. Yet, this approach introduces the significant challenge of extracting reliable and discriminative features without the guidance provided by labelled datasets.

This challenge in unsupervised ReID underscores a critical aspect of contemporary research in computer vision: the balance between the need for robust, accurate identification and the practicalities of dataset availability and scalability. As the field progresses, the focus on effectively leveraging unannotated data while maintaining the accuracy and reliability of identification processes becomes increasingly paramount. This shift not only reflects the adaptability of the field but also highlights the ongoing efforts to develop innovative methodologies that can work within the constraints of real-world data availability.

In unsupervised ReID, two main strategies have emerged: Unsupervised Domain Adaptation (UDA) and Pure Unsupervised Learning (USL). UDA utilizes a pre-trained model from a labelled source domain to initialize or adapt to the target domain, often using style transfer methods. However, this approach faces challenges when the domains have significantly different categories, as the quality of pseudo-labels may be compromised by high labelling noise.

Recent trends have shifted focus towards pseudo-label-based methods that do not rely on source domain data. These methods generate pseudo labels either through pre-trained classifiers or clustering algorithms like K-means or DBSCAN [4]. Approaches like hierarchical clustering with hard-batch triplet loss and multi-label classification tasks have been developed to refine the quality of pseudo labels. Techniques like self-paced contrastive learning and asymmetric contrastive learning frameworks have also been employed to form more reliable clusters and enhance feature learning invariance.

State-of-the-art USL ReID pipelines typically involve memory dictionary initialization, pseudo-label generation, and neural network training. Innovations in this area include treating each sample as a cluster, gradually grouping similar samples, and employing self-paced contrastive learning frameworks for more reliable cluster creation. These advancements aim to improve the memory dictionary and loss function, contributing to the progressive refinement of ReID models.
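The first two stages of such a pipeline, pseudo-label generation and memory dictionary initialization, can be sketched as follows. This is a toy illustration under stated assumptions: a simple distance-threshold linking rule stands in for DBSCAN or K-means, the memory dictionary holds one mean feature per cluster, and the function names `cluster_by_threshold` and `init_centroid_memory` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_by_threshold(features, eps):
    """Assign pseudo-labels by linking samples closer than eps.

    A stand-in for DBSCAN/K-means: each connected group of samples
    (under the eps-neighbour relation) receives one integer label.
    """
    labels = [-1] * len(features)
    next_label = 0
    for i in range(len(features)):
        if labels[i] != -1:
            continue  # already assigned to an earlier cluster
        labels[i] = next_label
        stack = [i]
        while stack:
            cur = stack.pop()
            for j in range(len(features)):
                if labels[j] == -1 and euclidean(features[cur], features[j]) < eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

def init_centroid_memory(features, labels):
    """Initialise one memory entry per cluster as the mean of its members."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

# Toy 2-D features forming two well-separated groups (assumed values).
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
pseudo_labels = cluster_by_threshold(features, eps=0.5)
memory = init_centroid_memory(features, pseudo_labels)
```

The third stage, network training, would then use `pseudo_labels` as targets and `memory` as the set of cluster representatives in a contrastive loss, repeating the clustering at the start of each epoch.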

1.4 Motivations

Our research is primarily motivated by the limitations observed in exclusive hard feature mining within the realm of unsupervised person ReID. Through our experiments, we have identified critical issues with this prevailing approach:

• Risk of Noisy Sample Selection: The strategy of focusing solely on hard features increases the likelihood of incorporating noisy samples into the model's training process. These noisy samples can lead to inconsistencies and hinder the model's ability to learn effectively.

• Diminished Impact of Hard Instances: We have observed that the reliance on hard instances becomes less evident as the model progresses in its training. This can be attributed to the model potentially over-fitting to the hard samples, leading to a decreased capability in generalizing to new, diverse datasets.

These observations underline the necessity to explore more effective sample mining strategies in the unsupervised ReID domain. By investigating and implementing alternative approaches, we aim to develop methodologies that not only overcome these challenges but also enhance the overall performance and generalization capabilities of ReID models.

1.5 Objectives

In response to these challenges, our research aims to explore and implement strategies that could lead to more effective and stable training paradigms in unsupervised person ReID. Specifically, our objective is to adjust instance weights to address the noisy-sample issue. To tackle the identified issue of noisy samples and the instability they bring to network training, we plan to devise a method that appropriately adjusts the weights of different instances during the learning process. By doing so, we anticipate a more reliable and consistent performance from the ReID system, even in the challenging context of unsupervised learning scenarios.

1.6 Contributions

This thesis contributes to the field of unsupervised person ReID with two significant advancements:

• Dynamic Centroid Update Policy (DCUP): We apply the DCUP method, which optimizes cluster representation vectors dynamically in contrastive learning. This method replaces traditional static centroids with dynamic cluster centroids, enhancing the accuracy and adaptability of the learning process.

• Adaptive Positive Mining Instance Contrastive Loss: We implement an adaptive scheme for calculating instance-level contrastive loss. This approach utilizes a mix of both hard and easy samples, rather than a fixed selection strategy, leading to a more balanced and effective training process.

These contributions are aimed at addressing the limitations of current unsupervised ReID methods, offering more robust and generalizable solutions.
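A dynamic cluster centroid is commonly realised as a momentum (exponential moving average) update of the cluster memory, and the sketch below assumes that form. How the thesis chooses the momentum at each step is not specified in this chapter, so `momentum` is left as a plain argument; the function name and the toy values are hypothetical.

```python
def momentum_update(centroid, query_feature, momentum):
    """Move a cluster centroid towards a new query feature.

    Implements c <- m * c + (1 - m) * q element-wise. A momentum close
    to 1 keeps the centroid almost static; a smaller momentum lets the
    centroid follow the incoming features more quickly.
    """
    return [
        momentum * c + (1.0 - momentum) * q
        for c, q in zip(centroid, query_feature)
    ]

# Toy example: a centroid nudged towards a feature of the same cluster.
centroid = [1.0, 0.0]
query_feature = [0.0, 1.0]
updated = momentum_update(centroid, query_feature, momentum=0.9)
```

With `momentum=1.0` the function returns the centroid unchanged, which corresponds to the static centroids that the dynamic policy is meant to replace.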

1.7 Thesis Structure

This thesis is divided into five chapters, each of which is designed to provide the reader with a comprehensive overview of the topic:

• Chapter 1: Introduction. This initial chapter offers a comprehensive introduction to the unsupervised person ReID problem. It outlines the research motivation, defines key concepts, addresses the challenges in the field, and delineates our primary contributions.

• Chapter 2: Related Work. This chapter provides an in-depth review of existing literature on unsupervised person ReID. It focuses on the advancements and persistent challenges in this area of research.

• Chapter 3: Methodology. The focus here is on explaining our proposed methodologies, including innovative strategies like the Dynamic Centroid Update Policy (DCUP) and Adaptive Positive Mining Contrastive Loss. Additionally, this chapter delves into the baseline model applied in our research, exploring its features and limitations.

• Chapter 4: Experiments. In this chapter, we detail the experimental setup, procedures, and results. It highlights how our proposed methodologies have enhanced the framework's performance.

• Chapter 5: Discussions and Conclusion. The concluding chapter of the thesis encapsulates the significant findings and contributions of our research. It discusses the wider implications of our work in unsupervised person ReID and suggests directions for future research in the field.

Chapter 2

Related Works

2.1 Unsupervised representation learning

Recent strides in unsupervised representation learning, particularly through contrastive instance discrimination methods, are profoundly influencing unsupervised person ReID. These methods operate under the principle that each image represents a distinct class and use mechanisms such as memory banks or large mini-batches to differentiate positive instance representations from negatives. The refinement of these techniques, such as MoCoV2, underscores the vital role of data augmentation in developing robust representations. This progress is pivotal for unsupervised person ReID, as it enhances the model's ability to discern individuals in diverse and uncontrolled environments, a cornerstone for advancing security and monitoring systems without relying on labelled datasets. The integration of such learning methods into person ReID systems could significantly reduce the need for extensive annotated data, thus streamlining the deployment of these systems in real-world applications.
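To make the instance discrimination idea concrete, the following is a minimal sketch of an InfoNCE-style contrastive loss, the form of objective used by MoCo-family methods. It assumes that the query, the positive, and the negatives are already feature vectors; the function name `info_nce`, the toy vectors, and the temperature values are illustrative assumptions rather than settings from the thesis.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query.

    loss = -log( exp(q.k+ / t) / (exp(q.k+ / t) + sum_i exp(q.k_i / t)) )
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]

# Toy unit vectors: the positive matches the query, the negative does not.
query = [1.0, 0.0]
loss_aligned = info_nce(query, [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
loss_swapped = info_nce(query, [0.0, 1.0], [[1.0, 0.0]], temperature=1.0)
```

The loss is small when the query is closer to its positive than to every negative and grows when a negative scores higher, which is what pushes representations of different instances apart.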


2.2 Datasets

In this section, we introduce the datasets used to evaluate our proposed solution.

Market1501 is a large-scale, publicly available standard dataset for person ReID. It contains 1,501 identities recorded by six different cameras, with 32,668 pedestrian bounding boxes detected using the Deformable Part Models pedestrian detector. Each individual is represented by an average of 3.6 images from each camera angle. The dataset is divided into two parts: 751 identities for training and 750 for testing. In the official testing process, 3,368 query images are used to find accurate matches in a reference gallery of 19,732 images.

MSMT17 is a multi-scene, multi-time person ReID dataset, comprising 180 hours of video footage captured by 12 outdoor and 3 indoor cameras over 12 time periods. The videos cover a long duration with complex lighting variations and contain a large number of labelled identities (4,101 identities and 126,441 bounding boxes).

2.3 Deep neural networks

Deep neural networks like ResNet and OSNet play a crucial role in person ReID. ResNet, particularly noted for its skip connections, addresses the issue of vanishing gradients in deep networks, leading to the development of deeper structures, such as the 152-layer ResNet. This breakthrough not only surpassed the depth of VGGNets but also won the 2015 ImageNet challenge, establishing ResNet as a standard backbone for person ReID tasks due to its efficiency and versatility.

OSNet stands out in person ReID with its lightweight yet efficient architecture, inspired

14

Trang 33

Chapter 2 Related Works

by MobileNet. It employs only 2.2 million parameters, demonstrating its compactness. OSNet’s use of depthwise separable convolutions, which split a convolution into depthwise and pointwise kernels, effectively facilitates multi-scale feature learning. This ability renders OSNet particularly adept at meeting the specific requirements of person ReID tasks, striking a balance between performance and resource efficiency.
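The saving from depthwise separable convolutions can be checked with simple arithmetic. The sketch below counts the weights of a standard convolution and of its depthwise-plus-pointwise factorization. The channel sizes are illustrative assumptions, not values taken from OSNet, and biases are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a pointwise 1 x 1 conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 256, 256, 3  # hypothetical layer sizes
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For this hypothetical 3 x 3 layer with 256 input and output channels, the standard convolution needs 589,824 weights and the separable version 67,840, which is roughly an 8.7-fold reduction.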

The choice of an appropriate neural network backbone, such as ResNet or OSNet, is critical in unsupervised person ReID. These networks not only minimize computational demands but also serve as benchmarks for emerging methods within the research community, ensuring adaptability and efficiency across various scenarios in person ReID.

2.4 Loss Function

2.4.1 Triplet Loss

In the realm of person ReID, the triplet loss function stands as a pivotal contribution, greatly influencing subsequent research and applications in this field. Initially introduced in the landmark FaceNet paper by Schroff et al. [15], this loss function was designed to optimize the feature space for face recognition tasks. Its core principle involves minimizing the distance between an anchor and a positive sample (same identity) while maximizing the distance from a negative sample (different identity).


FIGURE 2.1: Illustration of triplet loss given one positive and one negative per anchor (Image source: Schroff et al. 2015)

Given one anchor input $x$, we select one positive sample $x^+$ and one negative sample $x^-$, meaning that $x^+$ and $x$ belong to the same class and $x^-$ is sampled from a different class. Triplet loss learns to minimize the distance between the anchor $x$ and the positive $x^+$ and, at the same time, maximize the distance between the anchor $x$ and the negative $x^-$, with the following equation:

$$\mathcal{L}_{\text{triplet}}(x, x^+, x^-) = \sum_{x \in \mathcal{X}} \max\left(0, \|f(x) - f(x^+)\|_2^2 - \|f(x) - f(x^-)\|_2^2 + \epsilon\right)$$

where the margin parameter $\epsilon$ is configured as the minimum offset between distances of similar vs. dissimilar pairs. It is crucial to select challenging $x^-$ to truly improve the model.

FIGURE 2.2: Comparison between (a) triplet loss and (b) hard instance contrastive loss
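The triplet loss described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming that the embeddings f(x) have already been computed and are passed in as lists of floats. The function names and the default margin of 0.3 are hypothetical choices made for this example.

```python
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.3) -> float:
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_l2(anchor, positive) - squared_l2(anchor, negative)
    return max(0.0, gap + margin)


def total_triplet_loss(triplets: Iterable[Tuple[Vector, Vector, Vector]],
                       margin: float = 0.3) -> float:
    """Sum of the per-triplet losses over a set of (anchor, pos, neg)."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets)


if __name__ == "__main__":
    easy = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])  # negative is already far
    hard = ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])  # negative is closer
    print(triplet_loss(*easy, margin=0.5))  # prints 0.0
    print(triplet_loss(*hard, margin=0.5))  # prints 3.5
```

The easy triplet contributes no loss because the negative already lies beyond the margin, which is why selecting challenging negatives matters for learning.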

Building upon the foundational concept of triplet loss, the batch-hard triplet loss further refines the approach to enhance learning efficacy, particularly in complex person ReID


scenarios. This variant, notably discussed in works such as [21], [1], and [5], introduces a more stringent selection criterion for triplet formation. With a random identity sampler used during training, each mini-batch is formed by randomly selecting $P$ identities and $K$ instances per identity, resulting in a batch size of $P \times K$. The batch-hard triplet loss is defined as:

$$\mathcal{L}_{\text{TripletHard}} = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ m + \max_{p=1,\dots,K} D(x_a^i, x_p^i) - \min_{\substack{j=1,\dots,P,\ j \neq i \\ n=1,\dots,K}} D(x_a^i, x_n^j) \Big]_+$$

where $x_a^i$ is the anchor, $x_p^i$ is the positive sample, which has the same identity as $x_a^i$, and $x_n^j$ is the negative sample, whose identity is different from that of $x_a^i$. $D(\cdot,\cdot)$ denotes the Euclidean distance, $[\cdot]_+ = \max(0, \cdot)$, and $m$ is the margin hyperparameter. Batch-hard triplet loss ensures that, given an anchor $x_a^i$, the positive $x_p^i$ is closer to $x_a^i$ than the negative $x_n^j$.
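The batch-hard selection can be illustrated with a small plain-Python sketch. It assumes a mini-batch of precomputed feature vectors with one integer identity label each, picks the farthest positive and the closest negative for every anchor, and sums the hinged terms. The function names and the default margin are hypothetical, and summing over anchors is an assumption of this sketch (many implementations average instead).

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance D(u, v)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(features: Sequence[Vector],
                            labels: Sequence[int],
                            margin: float = 0.3) -> float:
    """Sum over anchors of [m + hardest positive - hardest negative]_+."""
    total = 0.0
    for f_a, y_a in zip(features, labels):
        # Distances to same-identity samples; the anchor itself is included
        # (distance 0), which does not change the maximum.
        pos = [euclidean(f_a, f) for f, y in zip(features, labels) if y == y_a]
        # Distances to samples of every other identity.
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        if not neg:
            continue  # no negatives in the batch for this anchor
        total += max(0.0, margin + max(pos) - min(neg))
    return total


if __name__ == "__main__":
    # P = 2 identities, K = 2 instances each, on a 1-D line for readability.
    feats = [[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
    ids = [0, 0, 1, 1]
    print(batch_hard_triplet_loss(feats, ids, margin=0.5))  # prints 10.0
```

In this toy batch the two identities are interleaved, so every anchor has a hardest positive at distance 3 and a hardest negative at distance 1, giving a per-anchor loss of 2.5.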

2.4.2 InfoNCE Loss

In unsupervised person ReID, since there is no ground-truth person identity and the pseudo labels change during training, non-parametric classification losses such as InfoNCE are used as the identity loss. Prior works such as [10] present approaches to unsupervised person ReID using InfoNCE loss. Similarly, InfoNCE loss has been applied between cluster features and query instance features in unsupervised ReID. HHCL [10] integrates both cluster-level and instance-level contrastive learning with InfoNCE. The InfoNCE loss is defined as:

$$\mathcal{L}_q = -\mathbb{E}\left[\log \frac{\exp(q \cdot k^+ / \tau)}{\sum_{i=1}^{N} \exp(q \cdot k^i / \tau)}\right]$$

where $q$ is an encoded query and $k^+$ is a positive feature, which has the same label as $q$, selected from a set of candidates $k^1, k^2, \dots, k^N$. $\tau$ is a temperature hyperparameter that controls the scale of similarities.
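As a minimal sketch of how the InfoNCE loss behaves for a single query, the plain-Python function below computes the negative log-softmax of the positive similarity over all candidates. It assumes dot-product similarity on precomputed features. The function names, the `positive_index` argument, and the default temperature are hypothetical choices for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query: Vector, candidates: Sequence[Vector],
             positive_index: int, temperature: float = 0.05) -> float:
    """-log( exp(q.k+/tau) / sum_i exp(q.k_i/tau) ) for one query."""
    logits = [dot(query, k) / temperature for k in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[positive_index]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]  # e.g. two cluster features
    print(info_nce(q, keys, positive_index=0, temperature=1.0))
    print(info_nce(q, keys, positive_index=0, temperature=0.05))
```

Lowering the temperature sharpens the softmax, so a query that is already closest to its positive receives a smaller loss, while a mismatched query is penalized more strongly.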


2.5 Common approaches

2.5.1 Overview

Unsupervised person ReID methodologies are generally divided into two main categories based on their reliance on large-scale labelled datasets. The first category, unsupervised domain adaptive (UDA), requires a labelled source dataset for initial training before adapting to the target unsupervised domain. The second category, fully unsupervised ReID, operates without any labelled source data, relying entirely on unlabelled datasets for learning discriminative features for person ReID. This categorization reflects the varied approaches in handling the absence of labels in person ReID tasks.

2.5.2 Unsupervised domain adaptation

Unsupervised Domain Adaptation (UDA) for person ReID is an innovative approach in the field. It’s used when we need to identify people across different cameras, especially when the target domain (where we apply the model) doesn’t have labels. UDA solves this by transferring knowledge from a source domain that has labelled data.


• Image Style Transfer has emerged as a pioneering approach. Here, GAN-based techniques are paramount, with methods like CycleGAN reducing the domain discrepancy. PTGAN and SPGAN utilize this framework to transpose source images into the target style. The result is a seamless blending where the source identity is maintained within the stylistic context of the target domain, thus enabling a ReID model trained on these transformed images to perform more effectively in the target domain.

• Adversarial Learning introduces a competitive element, pitting a domain discriminator against the main model. The discriminator’s task is to distinguish between source and target domain distributions, while the main model endeavours to generate features that the discriminator cannot tell apart. This tug-of-war leads to the extraction of domain-agnostic features, a core objective of UDA ReID.


Date posted: 02/10/2024, 02:57
