VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE
NGUYỄN TIẾN HƯNG - 20520198
CAO VĂN HÙNG - 20520193
EXPLOITING EFFICIENT PATTERN MINING
TECHNIQUES FOR UNSUPERVISED PERSON
RE-IDENTIFICATION
BACHELOR OF SCIENCE (HONORS PROGRAM) IN COMPUTER SCIENCE
SUPERVISOR
DR. NGUYỄN VINH TIỆP
HO CHI MINH CITY, 2024
THESIS DEFENSE COMMITTEE
The graduation thesis grading committee was established under Decision No. ...... dated ...... of the Rector of the University of Information Technology.
1. ...... - Chair
2. ...... - Secretary
3. ...... - Member
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY          SOCIALIST REPUBLIC OF VIETNAM
UNIVERSITY OF INFORMATION TECHNOLOGY          Independence - Freedom - Happiness
Ho Chi Minh City, date ...... month ...... year ......
SUPERVISOR'S EVALUATION OF THE GRADUATION THESIS
Thesis title:
EXPLOITING EFFICIENT PATTERN MINING TECHNIQUES FOR UNSUPERVISED PERSON RE-IDENTIFICATION
Students:                          Supervisor:
Nguyễn Tiến Hưng 20520198          Dr. Nguyễn Vinh Tiệp
Cao Văn Hùng 20520193
Thesis evaluation
1. About the report:
Number of pages: ......          Number of chapters: ......
Number of data tables: ......    Number of figures: ......
Number of references: ......     Product: ......
Some comments on the presentation of the report:
4. About the students' working attitude:
Evaluator
(Signature and full name)
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY          SOCIALIST REPUBLIC OF VIETNAM
UNIVERSITY OF INFORMATION TECHNOLOGY          Independence - Freedom - Happiness
Ho Chi Minh City, date ...... month ...... year ......
REVIEWER'S EVALUATION OF THE GRADUATION THESIS
Thesis title:
EXPLOITING EFFICIENT PATTERN MINING TECHNIQUES FOR UNSUPERVISED PERSON RE-IDENTIFICATION
Students:                          Reviewer:
Nguyễn Tiến Hưng 20520198
Cao Văn Hùng 20520193
Thesis evaluation
1. About the report:
Number of pages: ......          Number of chapters: ......
Number of data tables: ......    Number of figures: ......
Number of references: ......     Product: ......
Some comments on the presentation of the report:
4. About the students' working attitude:
Evaluator
(Signature and full name)
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY          SOCIALIST REPUBLIC OF VIETNAM
UNIVERSITY OF INFORMATION TECHNOLOGY          Independence - Freedom - Happiness
GRADUATION THESIS TOPIC REGISTRATION
Topic title (Vietnamese): Khai thác các kỹ thuật khai phá mẫu hiệu quả cho bài toán tái định danh người không giám sát
Topic title (English): Exploiting Efficient Pattern Mining Techniques for Unsupervised Person Re-identification
Language: Vietnamese
Supervisor: Dr. Nguyễn Vinh Tiệp
Duration: From 09/2023 to 12/2023.
Students:
Nguyễn Tiến Hưng - 20520198 Class: KHTN2020
Email: 20520198@gm.uit.edu.vn Phone: 0915730264
Cao Văn Hùng - 20520193 Class: KHTN2020
Email: 20520193@gm.uit.edu.vn Phone: 0966160250
Topic description: (Describe in detail the objectives, scope, subjects, methods, and expected results of the topic)
Unsupervised person re-identification is one of the important challenges in computer vision. It involves re-identifying a person from a collection of images or videos without using labelled data. It has important applications in security surveillance, crowd management, and smart-city systems.
Unsupervised person re-identification requires finding images of a person across multiple cameras and presenting the result as a list of people from the database, ranked by their similarity to the query image. This matters when collecting labelled data for the task is difficult or infeasible.
Scope:
- Focus on person re-identification without clothing changes.
- Focus on unsupervised learning models.
The objectives of this study are:
- Study state-of-the-art techniques for unsupervised person re-identification.
- Implement and test the methods on standard datasets, then evaluate the results.
- Propose and evaluate efficient pattern mining methods for unsupervised learning in person re-identification.
- Build a demo application.
Research content and methods:
- Content 1: Survey the approaches to person re-identification based on unsupervised learning.
  o Review the main existing approaches to the problem.
  o Review the existing standard evaluation datasets.
- Content 2: Implement and test the methods on standard datasets, then evaluate the results.
  o Implement the methods and algorithms studied in Content 1 on standard datasets.
  o Evaluate the results to see how these methods perform in the context of unsupervised person re-identification.
- Content 3: Propose and evaluate an efficient pattern mining method for unsupervised learning in person re-identification.
  o Propose an improved method based on the survey results and the issues left unresolved in Contents 1 and 2.
- Content 4: Build a demo application.
  o Design the interface and build a web-based person re-identification system.
Expected results:
- A survey of the research directions in unsupervised person re-identification.
- A document describing the existing evaluation datasets for person re-identification.
- A document describing the proposed method in detail, with evaluation results and a comparison between the proposed method and related methods.
Work plan: (Briefly describe the work plan and the division of tasks among the participating students)
Weeks 1-2: Content 1 - Survey the problem and the general approaches to unsupervised learning for person re-identification.
Weeks 3-4: Content 2 - Implement and test the methods on standard datasets, then evaluate the results.
Weeks 5-8: Content 3 - Propose and evaluate an efficient pattern mining method for unsupervised learning in person re-identification.
Weeks 9-12: Content 4 - Build a demo application.
Task assignment:
Cao Văn Hùng: survey related research, run experiments, write the report.
Nguyễn Tiến Hưng: survey related research, run experiments, write the report, build the demo application.
Supervisor's confirmation          Ho Chi Minh City, date ...... month ...... 2023
(Signature and full name)          Students
                                   (Signature and full name)
This thesis was completed successfully thanks to the help and support of many people, and we are grateful for their feedback. We want to start by thanking our supervisor, Dr. Nguyen Vinh Tiep, for his guidance and help throughout this research. His advice was essential in helping us carry out our research and finish this thesis.
We also want to thank the Dean and all the teachers in the Computer Science Department at the University of Information Technology. They supported us a great deal and taught us what we needed to know to complete this thesis.
We are also thankful to the Multimedia Laboratory (MMLab-UIT) for giving us a good place to do our research and for the advanced equipment they provided. Special thanks go to the researchers at MMLab for their useful feedback and questions, which helped us find and fix mistakes and made this thesis better.
2.6.1 Consistent mining strategy
2.6.2 Adaptive mining strategy
List of Figures
1.1 A common flow chart of a person ReID system. Datasets are collected from multi-camera systems for training and testing. The training phase involves learning person feature representations. During testing, the system receives a query to locate a matching person in gallery images, resulting in a ranked list of potential matches.
1.2 Examples of some person ReID challenges. Each pair of images shows the same person except (h): (a) viewpoint variation, (b) pose variations, (c) illumination changes, (d) partial occlusion, (e) inaccurate pedestrian detection, (f) accessory change (the person has a back bag in the first image, but not in the second), (g) low resolution, (h) different people with similar clothing.
2.1 Illustration of triplet loss given one positive and one negative per anchor (Image source: Schroff et al. 2015).
2.2 Comparison between triplet and hard instance contrastive loss.
2.3 Illustration of the feature alignment approach for unsupervised domain adaptation. Mid-level attribute features are aligned between source and target domains in a joint learning pipeline, with an attribute alignment loss between the source attributes and the target attributes. Source: …
2.4 … feature extraction and clustering from unlabeled target data to generate pseudo-labels. 2) Model training using the unlabeled data along with these …
2.5 Hard instance contrastive loss compares the input sample with hard positive instances that belong to the same cluster and hard negative instances from other clusters. Visualization from HHCL [10].
3.1 Hybrid Contrast Learning Framework [10]. 1) Initialization using a clustering algorithm to create pseudo labels and initialize memory banks. 2) Forward propagation to calculate cluster and hard instance contrastive losses. 3) Backpropagation to update the encoder model. 4) Updating the instance and cluster centroid memory banks.
3.3 Cluster-level contrastive loss.
3.4 Memory Base Hard Mining.
3.5 Overview of our method. ClusterNCE computes the contrastive loss at the cluster level with a dynamic momentum update. In the instance-level contrastive loss, we apply adaptive positive mining. In this context, $x \in X$ denotes the training dataset, $q$ represents the query instance's feature vector, and $c_k$ signifies the k-th cluster feature vector, with feature vectors of the same color belonging to the same cluster. Additionally, $i_j$ represents the j-th instance memory.
3.6 Different ways to assign weights of cluster centroid ([9]).
3.7 Person ReID datasets exhibit varying levels of intra-class differences. (a) For large intra-class variations caused by factors like occlusion, lighting changes, and different viewpoints, mining the hardest positive pair can negatively impact metric learning. In these cases, opting for the least-hard pair is more beneficial. (b) When there is small intra-class variation, with visually similar features, both the hardest and least-hard pairs tend to show significant visual resemblances.
3.8 Visualization of different sampling methods.
4.1 Market1501.
4.2 Statistics of MSMT17.
4.3 MSMT17 vs Market1501. Each column shows two sample images of the same identity. MSMT17 presents a more challenging and realistic person ReID task.
List of Tables
4.1 Dataset comparison
4.2 Ablation studies on the proposed method
4.3 Comparison with the state-of-the-art methods on Market-1501 and MSMT17
In the evolving field of unsupervised person re-identification (Re-ID), this thesis focuses on enhancing methods for identifying individuals across various camera views without the aid of labelled data. Unsupervised Re-ID presents challenges in accurately learning from unannotated datasets and is crucial for applications in security and surveillance.
The research is motivated by the limitations of exclusive hard feature mining in unsupervised Re-ID, particularly the risk of incorporating noisy samples and the diminished impact of hard instances as training progresses. Addressing these challenges, the thesis aims to develop a more effective and stable training paradigm.
Contributions include the implementation of the Dynamic Centroid Update Policy (DCUP) for optimizing cluster representation vectors, and the Adaptive Positive Mining Instance Contrastive Loss, which balances hard and easy samples for a more effective training process. These advancements aim to address the limitations of current methods, offering robust and generalizable solutions for unsupervised Re-ID.
Chapter 1
Introduction
1.1 Overview
Enhancing public security and safety through robust surveillance systems is becoming increasingly crucial, especially considering the rising global crime rates. In recent years, statistics have shown a steady increase in various types of crimes across different regions, emphasizing the need for effective monitoring and response mechanisms. Implementing large-scale surveillance systems, equipped with advanced person re-identification (ReID) models, is a proactive response to this growing concern. These systems aid in crime prevention and investigation and play a vital role in ensuring public safety in densely populated urban areas, transportation hubs, and other public spaces. Integrating person ReID technology into these systems significantly enhances their capability to track and identify individuals in public spaces like airports, shopping centres, and urban streets.
As computer vision technology continues to advance, various critical modules in a surveillance system should be developed for specialized tasks. Modules for object detection and classification are essential for identifying and categorizing individuals in a scene. With the large amount of data from numerous cameras, a retrieval module becomes necessary to efficiently search for specific persons. Person ReID plays a key role in this process, enabling the system to consistently recognize a target individual despite varying poses and environmental factors. Additionally, person ReID models aid the retrieval module by extracting effective image representations for more efficient searches.
1.2 Person Re-identification
[Figure 1.1 panels: Multi-camera system, Images/videos, Query image/video, Ranking list]
FIGURE 1.1: A common flow chart of a person ReID system. Datasets are collected from multi-camera systems for training and testing. The training phase involves learning person feature representations. During testing, the system receives a query to locate a matching person in gallery images, resulting in a ranked list of potential matches.
1.2.1 Problem Definition
Person ReID can be defined as the task of associating people across the bounding boxes produced by a person detection algorithm. The primary input is the image or video feed from multiple cameras, in which individuals are first detected and their appearances captured within bounding boxes. The output is the identification of these individuals across different camera feeds, linking appearances to establish a continuous identity track. This process relies on algorithms that analyse and compare visual features, including facial characteristics, clothing, and gait. The goal is to create a reliable link between the observed subject and their corresponding identity in the database.
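The matching step described above can be illustrated with a minimal sketch. It assumes that each bounding box has already been mapped to a feature vector by some encoder; the function names `cosine_similarity` and `rank_gallery`, the toy vectors, and the choice of cosine similarity are illustrative assumptions, not the system used in this thesis.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query_feature, gallery_features):
    """Return (gallery index, similarity) pairs, most similar first."""
    scores = [
        (idx, cosine_similarity(query_feature, feat))
        for idx, feat in enumerate(gallery_features)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical 3-dimensional features; real ReID features are much larger.
query = [1.0, 0.0, 1.0]
gallery = [[0.0, 1.0, 0.0], [1.0, 0.1, 0.9], [0.5, 0.5, 0.5]]
ranking = rank_gallery(query, gallery)
ranked_indices = [idx for idx, _ in ranking]
print(ranked_indices)
```

The ranked list is the output that a ReID system presents to the user: gallery entries ordered by how closely they resemble the query.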
In an image-based ReID system, the workflow integrates a person detection algorithm with a ReID algorithm. Video-based ReID additionally involves a human tracker that creates tracklets (sequences of detections of the same individual over time), bridging the gap between detection and ReID. Person ReID research focuses predominantly on matching these bounding boxes or tracklets, ensuring accurate identification despite the visual disparities caused by different camera angles, occlusions, or varying resolutions.
We apply specific constraints to refine the focus of our research. First, our model focuses on the appearance aspects of an individual's identity, excluding scenarios involving changes in clothes. This decision narrows the problem down to its fundamental aspects, which involve identifying and matching individuals based on consistent visual features.
Additionally, our study concentrates on image-based person ReID. As a result, the model we are developing is tailored to process only the cropped bounding boxes that encompass individuals. Our system is therefore intended to analyze still images or frames from video feeds, with individuals pre-detected and isolated by a person detection algorithm.
By concentrating on consistent appearance features and emphasizing image-based person ReID, we aim to develop a robust system tailored to real-world scenarios, particularly in surveillance and security contexts. While this approach has its limitations, such as not accounting for clothing changes and focusing solely on still images, it allows for targeted improvements in identity recognition and tracking within the defined parameters. Our research paves the way for more specialized models in person ReID, contributing valuable insights and methodologies to the field.
FIGURE 1.2: Examples of some person ReID challenges. Each pair of images shows the same person except (h): (a) viewpoint variation, (b) pose variations, (c) illumination changes, (d) partial occlusion, (e) inaccurate pedestrian detection, (f) accessory change (the person has a back bag in the first image, but not in the second), (g) low resolution, (h) different people with similar clothing.
• Appearance Variation: In person ReID, a significant challenge is appearance variability due to clothing or accessory changes. Over time and across different locations, individuals may alter their appearance, like changing clothes or accessories, which complicates identification.
• Viewpoint Challenges: The varying camera heights, distances, and angles result in different pedestrian shapes and sizes. Since a single image can't capture a person from all angles, each view only provides partial information, leading to intra-class variation and inter-class confusion.
• Occlusion Issues: In crowded areas, target individuals can be obscured by others or objects, leading to incomplete feature representations. This occlusion often results in less robust person identification.
• Illumination Variation: Different illumination levels, particularly in indoor and outdoor camera networks, affect the matching of appearance representations. This variance contributes to inconsistencies in image brightness and quality.
• Domain Gap: Datasets, often recorded over short periods, exhibit specific clothing styles and illumination levels. For instance, the Market1501 and DukeMTMC-ReID datasets, recorded in different seasons, reflect seasonal clothing and lighting differences, affecting model generalization.
• Pose Variation: Human body articulation results in different appearances of the same individual. Models trained on specific poses struggle with identifying varied poses due to changes in body part localization and visibility.
• Low Resolution: The high cost of extensive camera coverage often results in sparse networks and low-resolution images, as cameras are placed high and far from subjects.
• Clothing Similarity: In large galleries, the likelihood of similar clothing increases, adding to matching ambiguity. Identifying unique visual signatures becomes more challenging.
• Gallery Size: Large public spaces covered by camera networks result in enormous candidate sets for ReID, increasing computational requirements.
• Data Labeling Issues: Creating a robust, supervised model requires extensive annotated data, which is often prohibitively expensive and labour-intensive to collect in large camera networks.
Each of these factors significantly impacts the effectiveness and accuracy of unsupervised person ReID systems, presenting challenges that need to be addressed for successful identification.
1.2.3 Application
The applications of person ReID are vast and varied, reflecting its growing importance in modern society. From enhancing public safety with city-wide surveillance systems to personalizing retail experiences and ensuring patient safety in healthcare, person ReID is becoming an indispensable tool. Its role in intelligent transportation for traffic analysis and in human-robot interaction for personalized assistance further demonstrates its versatility. As technology advances, the scope and impact of person ReID are set to expand, solidifying its position as a key technology in diverse sectors.
1.3 Unsupervised Person ReID
Person ReID has emerged as a pivotal task in computer vision. The area initially centred around supervised learning methods, which relied heavily on extensive datasets with detailed annotations. These methods set the foundation for the field, harnessing richly labelled data to train models capable of identifying individuals with notable accuracy.
However, the field of ReID has increasingly shifted towards unsupervised learning techniques. This transition is primarily driven by the practical difficulties in obtaining large, labelled datasets, particularly in diverse real-world environments. Unsupervised learning methods offer a promising alternative, aiming to learn from unannotated data. Yet, this approach introduces the significant challenge of extracting reliable and discriminative features without the guidance provided by labelled datasets.
This challenge in unsupervised ReID underscores a critical aspect of contemporary research in computer vision: the balance between the need for robust, accurate identification and the practicalities of dataset availability and scalability. As the field progresses, leveraging unannotated data effectively while maintaining the accuracy and reliability of identification becomes increasingly important. This shift not only reflects the adaptability of the field but also highlights the ongoing efforts to develop innovative methodologies that can work within the constraints of real-world data availability.
In unsupervised ReID, two main strategies have emerged: Unsupervised Domain Adaptation (UDA) and Pure Unsupervised Learning (USL). UDA utilizes a pre-trained model from a labelled source domain to initialize or adapt to the target domain, often using style transfer methods. However, this approach faces challenges when the domains have significantly different categories, as the quality of pseudo-labels may be compromised by high labelling noise.
Recent trends have shifted focus towards pseudo-label-based methods that do not rely on source domain data. These methods generate pseudo labels either through pre-trained classifiers or clustering algorithms like K-means or DBSCAN [4]. Approaches like hierarchical clustering with hard-batch triplet loss and multi-label classification tasks have been developed to refine the quality of pseudo labels. Techniques like self-paced contrastive learning and asymmetric contrastive learning frameworks have also been employed to form more reliable clusters and enhance feature learning invariance.
State-of-the-art USL ReID pipelines typically involve memory dictionary initialization, pseudo-label generation, and neural network training. Innovations in this area include treating each sample as a cluster, gradually grouping similar samples, and employing self-paced contrastive learning frameworks for more reliable cluster creation. These advancements aim to improve the memory dictionary and loss function, contributing to the progressive refinement of ReID models.
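As a rough illustration of the memory dictionary initialization step, the sketch below averages the features assigned to each pseudo label to obtain one centroid per cluster. This is a common simplification and only an assumption here, not the exact procedure of any cited method; the function name `init_cluster_memory` and the toy data are hypothetical. Samples labelled -1 are treated as un-clustered outliers, following the convention used by scikit-learn's DBSCAN implementation.

```python
def init_cluster_memory(features, pseudo_labels):
    """Build a {cluster id: centroid} dictionary from pseudo-labelled features.

    Assumption: the centroid is the plain mean of the cluster's features,
    and samples with a negative label (outliers) are skipped.
    """
    sums, counts = {}, {}
    for feat, label in zip(features, pseudo_labels):
        if label < 0:
            continue  # outlier: not assigned to any cluster
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

# Hypothetical 2-dimensional features with clustering output.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
pseudo_labels = [0, 0, 1, -1]
memory = init_cluster_memory(features, pseudo_labels)
print(memory)
```

In a full pipeline, the pseudo labels would be regenerated by clustering at regular intervals, and the memory would be rebuilt or updated accordingly.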
1.4 Motivations
Our research is primarily motivated by the limitations observed in exclusive hard feature mining within unsupervised person ReID. Through our experiments, we have identified critical issues with this prevailing approach:
• Risk of Noisy Sample Selection: The strategy of focusing solely on hard features increases the likelihood of incorporating noisy samples into the model's training process. These noisy samples can lead to inconsistencies and hinder the model's ability to learn effectively.
• Diminished Impact of Hard Instances: We have observed that the benefit of relying on hard instances becomes less evident as the model progresses in its training. This can be attributed to the model potentially over-fitting to the hard samples, leading to a decreased capability in generalizing to new, diverse datasets.
These observations underline the necessity to explore more effective sample mining strategies in the unsupervised ReID domain. By investigating and implementing alternative approaches, we aim to develop methodologies that not only overcome these challenges but also enhance the overall performance and generalization capabilities of ReID models.
1.5 Objectives
In response to these challenges, our research aims to explore and implement strategies that could lead to more effective and stable training paradigms in unsupervised person ReID. Specifically, our objective is to adjust instance weights to address the noisy-sample issue. To tackle noisy samples and the instability they bring to network training, we plan to devise a method that appropriately adjusts the weights of different instances during the learning process. By doing so, we anticipate more reliable and consistent performance from the ReID system, even in the challenging context of unsupervised learning.
1.6 Contributions
This thesis contributes to the field of unsupervised person ReID with two significant advancements:
• Dynamic Centroid Update Policy (DCUP): We apply the DCUP method, which optimizes cluster representation vectors dynamically in contrastive learning. This method replaces traditional static centroids with dynamic cluster centroids, enhancing the accuracy and adaptability of the learning process.
• Adaptive Positive Mining Instance Contrastive Loss: We implement an adaptive scheme for calculating instance-level contrastive loss. This approach utilizes a mix of both hard and easy samples, rather than a fixed selection strategy, leading to a more balanced and effective training process.
These contributions are aimed at addressing the limitations of current unsupervised ReID methods, offering more robust and generalizable solutions.
1.7 Thesis Structure
This thesis is divided into five chapters:
• Chapter 1: Introduction. This initial chapter introduces the unsupervised person ReID problem. It outlines the research motivation, defines key concepts, addresses the challenges in the field, and delineates our primary contributions.
• Chapter 2: Related Work. This chapter provides an in-depth review of existing literature on unsupervised person ReID. It focuses on the advancements and persistent challenges in this area of research.
• Chapter 3: Methodology. The focus here is on explaining our proposed methodologies, including the Dynamic Centroid Update Policy (DCUP) and Adaptive Positive Mining Contrastive Loss. Additionally, this chapter delves into the baseline model applied in our research, exploring its features and limitations.
• Chapter 4: Experiments. In this chapter, we detail the experimental setup, procedures, and results. It highlights how our proposed methodologies have enhanced the framework's performance.
• Chapter 5: Discussions and Conclusion. The concluding chapter summarizes the significant findings and contributions of our research. It discusses the wider implications of our work in unsupervised person ReID and suggests directions for future research in the field.
Chapter 2
Related Works
2.1 Unsupervised representation learning
Recent strides in unsupervised representation learning, particularly through contrastive instance discrimination methods, are profoundly influencing unsupervised person ReID. These methods operate under the principle that each image represents a distinct class and use mechanisms such as memory banks or large mini-batches to differentiate positive instance representations from negatives. The refinement of these techniques, such as MoCoV2, underscores the vital role of data augmentation in developing robust representations. This progress is pivotal for unsupervised person ReID, as it enhances the model's ability to discern individuals in diverse and uncontrolled environments, a cornerstone for advancing security and monitoring systems without relying on labelled datasets. The integration of such learning methods into person ReID systems could significantly reduce the need for extensive annotated data, thus streamlining the deployment of these systems in real-world applications.
2.2 Datasets
In this section, we introduce the datasets used to evaluate our proposed solution.
Market1501 is a large-scale, publicly available standard dataset for person ReID. It contains 1,501 identities recorded by six different cameras, with 32,668 pedestrian bounding boxes detected using the Deformable Part Models pedestrian detector. Each individual is represented by an average of 3.6 images from each camera angle. The dataset is divided into two parts: 751 identities for training and 750 for testing. In the official testing process, 3,368 query images are used to find accurate matches in a reference gallery of 19,732 images.
MSMT17 is a multi-scene, multi-time person ReID dataset, comprising 180 hours of video footage captured by 12 outdoor and 3 indoor cameras over 12 time periods. The videos cover a long duration with complex lighting variations and contain a large number of labelled identities (4,101 identities and 126,441 bounding boxes).
2.3 Deep neural networks
Deep neural networks like ResNet and OSNet play a crucial role in person ReID. ResNet, particularly noted for its skip connections, addresses the issue of vanishing gradients in deep networks, leading to the development of deeper structures, such as the 152-layer ResNet. This breakthrough not only surpassed the depth of VGGNets but also won the 2015 ImageNet challenge, establishing ResNet as a standard backbone for person ReID tasks due to its efficiency and versatility.
OSNet stands out in person ReID with its lightweight yet efficient architecture, inspired by MobileNet. It employs only 2.2 million parameters, demonstrating its compactness. OSNet's use of depthwise separable convolutions, which split a convolution into depthwise and pointwise kernels, effectively facilitates multi-scale feature learning. This ability renders OSNet particularly adept at meeting the specific requirements of person ReID tasks, striking a balance between performance and resource efficiency.
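To make the compactness argument concrete, the following sketch compares the weight count of a standard convolution with that of a depthwise separable one. It is a generic back-of-the-envelope calculation that ignores biases and normalization layers, and the channel sizes are arbitrary illustrative values, not OSNet's actual configuration.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k kernel plus a pointwise 1 x 1 kernel."""
    depthwise = kernel_size * kernel_size * in_channels  # one filter per channel
    pointwise = in_channels * out_channels               # 1 x 1 channel mixing
    return depthwise + pointwise

# Illustrative layer: 3 x 3 kernel, 64 input and 64 output channels.
standard = standard_conv_params(3, 64, 64)
separable = depthwise_separable_params(3, 64, 64)
print(standard, separable, round(standard / separable, 2))
```

For this layer the separable version needs roughly one eighth of the weights, which is the kind of saving that lets a lightweight backbone keep its parameter count low.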
The choice of an appropriate neural network backbone, such as ResNet or OSNet, is critical in unsupervised person ReID. These networks not only minimize computational demands but also serve as benchmarks for emerging methods within the research community, ensuring adaptability and efficiency across various scenarios in person ReID.
2.4 Loss Function
2.4.1 Triplet Loss
In person ReID, the triplet loss function stands as a pivotal contribution, greatly influencing subsequent research and applications in this field. Initially introduced in the landmark FaceNet paper by Schroff et al. [15], this loss function was designed to optimize the feature space for face recognition tasks. Its core principle involves minimizing the distance between an anchor and a positive sample (same identity) while maximizing the distance from a negative sample (different identity).
FIGURE 2.1: Illustration of triplet loss given one positive and one negative
per anchor (Image source: Schroff et al 2015)
Given one anchor input $x$, we select one positive sample $x^+$ and one negative sample $x^-$,
meaning that $x^+$ and $x$ belong to the same class and $x^-$ is sampled from a different
class. Triplet loss learns to minimize the distance between the anchor $x$ and the positive
$x^+$ and, at the same time, to maximize the distance between the anchor $x$ and the negative
$x^-$, with the following equation:

$$\mathcal{L}_{\text{triplet}}(x, x^+, x^-) = \sum_{x \in \mathcal{X}} \max\left(0, \|f(x) - f(x^+)\|_2^2 - \|f(x) - f(x^-)\|_2^2 + \epsilon\right)$$

where the margin parameter $\epsilon$ is configured as the minimum offset between distances
of similar vs dissimilar pairs. It is crucial to select challenging $x^-$ to truly improve the
model.
(a) Triplet loss | (b) Hard instance contrastive loss
FIGURE 2.2: Comparison between triplet and hard instance contrastive loss
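
As a minimal sketch of the triplet loss for a single (anchor, positive, negative) triplet, the snippet below works on already-computed embeddings represented as plain Python lists. The function names and the margin value of 0.3 are assumptions made for illustration; in practice the embeddings would come from the backbone network.

```python
from typing import Sequence


def squared_l2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.3) -> float:
    """Hinge on (anchor-positive distance) - (anchor-negative distance) + margin."""
    d_ap = squared_l2(anchor, positive)
    d_an = squared_l2(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.1, 0.0]
    easy_negative = [1.0, 0.0]   # far from the anchor
    hard_negative = [0.2, 0.0]   # almost as close as the positive
    print("easy negative:", triplet_loss(anchor, positive, easy_negative))
    print("hard negative:", triplet_loss(anchor, positive, hard_negative))
```

The easy negative already satisfies the margin and contributes zero loss, whereas the hard negative yields a positive loss, which illustrates why the choice of negatives matters.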
Building upon the foundational concept of triplet loss, the batch-hard triplet loss further
refines the approach to enhance learning efficacy, particularly in complex person ReID
scenarios. This variant, notably discussed in works such as [21], (1), and [5], introduces a
more stringent selection criterion for triplet formation. With the random identity
sampler used during training, each mini-batch is formed by randomly selecting P identities
and K instances of each identity, resulting in a batch size of P × K. The batch-hard
triplet loss is defined as:
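$$\mathcal{L}_{\text{TripletHard}} = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ m + \max_{p=1,\dots,K} D(x_a^i, x_p^i) - \min_{\substack{j=1,\dots,P \\ n=1,\dots,K \\ j \neq i}} D(x_a^i, x_n^j) \Big]_+$$

where $x_a^i$ is the anchor, $x_p^i$ is the positive sample, which has the same identity as
$x_a^i$, and $x_n^j$ is the negative sample, whose identity differs from that of $x_a^i$.
$D(\cdot, \cdot)$ denotes the Euclidean distance, $[\cdot]_+ = \max(0, \cdot)$, and $m$ is the
margin hyperparameter. The batch-hard triplet loss ensures that, given an anchor $x_a^i$, the
positive $x_p^i$ is closer to the anchor than the negative $x_n^j$.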
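
The batch-hard selection described above can be sketched as follows: for every anchor in the mini-batch, take the farthest sample with the same label and the closest sample with a different label. This is an illustrative sketch that assumes the features are plain lists, the labels are integer (pseudo-)identities, and the batch contains at least two identities; the function names and the margin of 0.3 are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(features: List[Sequence[float]],
                            labels: List[int],
                            margin: float = 0.3) -> float:
    """Sum of hinge terms using the hardest positive and hardest negative per anchor."""
    total = 0.0
    for f_a, y_a in zip(features, labels):
        # Distances to all samples sharing the anchor's label (the anchor itself gives 0).
        pos = [euclidean(f_a, f) for f, y in zip(features, labels) if y == y_a]
        # Distances to all samples with a different label (assumes at least one exists).
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        total += max(0.0, margin + max(pos) - min(neg))
    return total


if __name__ == "__main__":
    # P = 2 identities, K = 2 instances each (toy 2-D embeddings).
    feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
    ids = [0, 0, 1, 1]
    print(batch_hard_triplet_loss(feats, ids))
```

Whether the per-anchor terms are summed or averaged over the batch is a design choice; the sketch sums them.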
2.4.2 InfoNCE Loss
In unsupervised person ReID, there are no ground-truth person identities and the pseudo-labels
change during training, so a non-parametric classification loss such as InfoNCE
is used as the identity loss. Several works, including [10], apply InfoNCE loss to
unsupervised person ReID, for example between cluster features and query instance
features. HHCL [10] integrates both cluster-level and instance-level contrastive
learning with InfoNCE. The InfoNCE loss is defined as:
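$$\mathcal{L}_q = -\mathbb{E}\left[\log \frac{\exp(q \cdot k^+ / \tau)}{\sum_{i} \exp(q \cdot k^i / \tau)}\right]$$

where $q$ is an encoded query, $k^+$ is a positive feature that has the same label as
$q$ and is selected from a set of candidates $k^1, k^2, \dots$, and $\tau$ is a temperature
hyper-parameter that controls the scale of similarities.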
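
A minimal sketch of the InfoNCE loss for a single query is given below. It assumes the candidates are a list of feature vectors (for example cluster centroids), that similarity is a dot product, and that the index of the positive candidate is known from the pseudo-label; the function names and the default temperature of 0.05 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(query: Sequence[float],
             candidates: List[Sequence[float]],
             positive_index: int,
             temperature: float = 0.05) -> float:
    """Negative log-softmax of the positive candidate's similarity to the query."""
    logits = [dot(query, k) / temperature for k in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[positive_index]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]   # candidate 0 matches the query
    print("correct positive:", info_nce(q, keys, positive_index=0, temperature=1.0))
    print("wrong positive  :", info_nce(q, keys, positive_index=1, temperature=1.0))
```

A smaller temperature sharpens the softmax, so the loss penalizes candidates that are close to the query but carry a different label more strongly.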
2.5 Common approaches
2.5.1 Overview
Unsupervised person ReID methodologies are generally divided into two main categories
based on their reliance on large-scale labelled datasets. The first category, unsupervised
domain adaptive (UDA) ReID, requires a labelled source dataset for initial training before
adapting to the unlabelled target domain. The second category, fully unsupervised
ReID, operates without any labelled source data, relying entirely on unlabelled datasets
for learning discriminative features for person ReID. This categorization reflects the
varied approaches to handling the absence of labels in person ReID tasks.
2.5.2 Unsupervised domain adaptation
Unsupervised Domain Adaptation (UDA) for person ReID is used when people must be
identified across different cameras and the target domain (where the model is applied)
has no labels. UDA addresses this by transferring knowledge from a source domain that
has labelled data.
• Image Style Transfer has emerged as a pioneering approach. Here, GAN-based techniques
are paramount, with methods like CycleGAN reducing the domain discrepancy. PTGAN and
SPGAN use this framework to translate source images into the target style. The source
identity is maintained within the stylistic context of the target domain, which enables
a ReID model trained on these transformed images to perform more effectively in
the target domain.
• Adversarial Learning introduces a competitive element, pitting a domain discriminator
against the main model. The discriminator's task is to distinguish between
source and target domain distributions, while the main model endeavours to generate
features that the discriminator cannot tell apart. This tug-of-war leads to the
extraction of domain-agnostic features, a core objective of UDA ReID.