VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY

NGUYEN VAN DUC

ONE-CLASS CLASSIFICATION AND ITS APPLICATION IN ANOMALY DETECTION
(Phân loại dữ liệu một lớp và ứng dụng trong bài toán phát hiện bất thường)

Major: Computer Science
Major code: 8480101

MASTER'S THESIS

HO CHI MINH CITY, 2023

THIS THESIS WAS COMPLETED AT HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY, VNU-HCM

Scientific supervisor: Assoc. Prof. Dr. Le Hong Trang
Examiner 1: Dr. Nguyen Thi Ai Thao
Examiner 2: Assoc. Prof. Dr. Nguyen Tuan Dang

This master's thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, in 2023.

Master's thesis examination committee:
1. Chair: Assoc. Prof. Dr. Tran Minh Quang
2. Secretary: Dr. Phan Trong Nhan
3. Reviewer 1: Dr. Nguyen Thi Ai Thao
4. Reviewer 2: Assoc. Prof. Dr. Nguyen Tuan Dang
5. Member: Dr. Dang Tran Tri

Certification by the Chair of the thesis examination committee and the Dean of the faculty managing the major, after the thesis has been revised (if required):

CHAIR OF THE COMMITTEE
DEAN OF THE FACULTY OF COMPUTER SCIENCE AND ENGINEERING

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student's full name: NGUYEN VAN DUC. Student ID: 1970014
Date of birth: 21/01/1996. Place of birth: Dong Nai
Major: Computer Science. Major code: 8480101

I. THESIS TITLE: One-class classification and its application in anomaly detection (Phân loại dữ liệu một lớp và ứng dụng trong bài toán phát hiện bất thường)

II. TASKS AND CONTENTS: The task of this thesis is to study the one-class classification problem, focusing on image data and on learning methods based on deep features. The thesis must propose an improved model and evaluate it on benchmark datasets for anomaly detection in industrial manufacturing. The contents of the thesis include:
- A survey of the one-class classification problem, its practical applications, and the methods and models that have been proposed.
- A synthesis and analysis of deep-feature-based models for one-class classification, with experimental results on benchmark datasets for anomaly detection in industrial
manufacturing.
- A proposed improved model, with experiments and an evaluation of the results obtained.
- A paper on the achieved results, submitted for presentation at an international scientific conference that fits the research topic.

III. ASSIGNMENT DATE: 06/09/2021
IV. COMPLETION DATE: 05/12/2022
V. SUPERVISOR: Assoc. Prof. Dr. Le Hong Trang

Ho Chi Minh City
SUPERVISOR (full name and signature)
PROGRAM COMMITTEE (full name and signature)
DEAN OF THE FACULTY OF COMPUTER SCIENCE AND ENGINEERING (full name and signature)

Acknowledgments

Completing this research took a great deal of my own effort, but it also depended on the dedicated support and help of my family, my friends, and my teachers. On this occasion, I would like to thank everyone who has supported me during this time. I would like to send my sincere thanks to Assoc. Prof. Dr. Le Hong Trang, who directly supervised this work, provided the necessary materials, and encouraged me throughout the whole process. I also thank the board of management and all the lecturers of Ho Chi Minh City University of Technology, in particular the Faculty of Computer Science and Engineering, for creating the conditions that allowed me to complete this research. Once again, thank you all!

Thesis abstract

Anomaly detection is a problem applied in many different fields, especially when the available data belongs to only one class. The challenge of this problem is considerable, since the model cannot be trained to recognize anomalies the way a regular multi-class classifier is. In this study, the author presents results obtained from previous work and proposes improvements to the current best approach. The proposed method is evaluated on well-known benchmark datasets for this kind of one-class anomaly problem (such as MVTec and BTAD). The positive experimental results reinforce that the direction is sound and can be developed further in the future.

Abstract

Anomaly detection is a problem widely applied in different fields today, especially when the available data belongs to only one data class. The challenge of this problem is quite significant, as the model cannot be trained to recognize anomalies the way a regular multi-class classifier is. In this study, the author presents the results obtained from previous studies and proposes an improvement to the current best treatment direction. The proposed method is also
tested on well-known benchmark data for this one-class anomaly problem (such as MVTec and BTAD). The positive experimental results confirm that the current direction is correct and can be further developed.

Declaration

I declare that all the data and results in this master's thesis, "One-class classification and its application in anomaly detection," are truthful and have not been published before. The data presented and analyzed were collected by me from many different sources and are fully cited, with their origins clearly stated. If any fraud or copying of results from earlier research is discovered, I will take full responsibility.

Student: Nguyen Van Duc

Table of contents

Acknowledgments
Thesis abstract
Declaration
Table of contents
List of figures
List of tables
1 PREAMBLE
2 INTRODUCTION
2.1 Problem overview
2.2 Machine learning and the classification problem
2.3 Research objectives and scope
2.4 Foundations
2.5 Data
2.6 Evaluation methodology
3 Popular approaches to anomaly detection with one-class data
3.1 Components of the one-class problem
3.2 Proposed methods
4 A proposed one-class model
4.1 An improved distillation model
4.2 Experiments and results
5 CONCLUSION
LIST OF SCIENTIFIC PUBLICATIONS
An Improved Reverse Distillation Model for Unsupervised Anomaly Detection (paper)
References

List of figures

1.1 Research milestones of one-class classification over time [1]
2.1 Architecture of a multi-layer artificial neural network
2.2 The sigmoid function
2.3 An activation function compared with the sigmoid
2.4 The ReLU function compared with the sigmoid
2.5 The Leaky ReLU function compared with ReLU
2.6 The function y = x^2
2.7 The sequence of processing layers in a convolutional neural network for handwriting recognition [2]
2.8 Flattening a 3x3 image matrix into a 9x1 vector
2.9 Convolving a 5x5x1 image with a 3x3x1 kernel to obtain a 3x3x1 convolved feature
2.10 Executing a convolutional layer with a stride and same padding
2.11 Convolving a 5x5x1 image with a 3x3x1 kernel to obtain a 3x3x1 convolved feature
2.12 How max pooling works
2.13 The fully connected layer shown as the final layers behind the Flatten layer
2.14 Architecture of the
ResNet-34 network (far right) [3]
2.15 Variants of the ResNet network
2.16 A view of the ResNet-34 network architecture
2.17 The conv1 convolution block
2.18 The conv1 convolution block and max pooling
2.19 Example images taken from the MVTec AD dataset [4]
2.20 Number of papers using BTAD over the years [5]
2.21 Example images taken from the BTAD dataset [5]
2.22 Illustration of the ROC curve [6]
3.1 The training strategy used in OCCNN [7]
3.2 The different components of the OCGAN network architecture [8]
3.3 Knowledge distillation with a GAN
3.4 Progressive learning in KDGAN [9]
3.5 Architecture of the reverse distillation model [10]
4.1 The original knowledge distillation model
4.2 The first improvement to the distillation model
4.3 The second improvement to the distillation model
4.4 Observed results on the Bottle, Wood, Toothbrush, Zipper, and Hazelnut data

Table 4.3: Evaluation on MVTec using the AUROC and PRO measures (part 1). Each cell is AUROC/PRO; "-" marks a value not reported.

| Group    | Category        | US [13] | MF [16] | SPADE [19] | PaDiM [17] | Our         |
|----------|-----------------|---------|---------|------------|------------|-------------|
| Textures | Carpet          | -/87.9  | -/87.8  | 97.5/94.7  | 99.1/96.2  | 99/97       |
|          | Grid            | -/95.2  | -/86.5  | 93.7/86.7  | 97.3/94.6  | 99.3/97.5   |
|          | Leather         | -/94.5  | -/95.9  | 97.6/97.2  | 99.2/97.8  | 99.5/99.2   |
|          | Tile            | -/94.6  | -/88.1  | 87.4/75.9  | 94.1/86.0  | 95.7/90.4   |
|          | Wood            | -/91.1  | -/84.8  | 88.5/87.4  | 94.9/91.1  | 95.9/92.4   |
|          | Average         | -/92.7  | -/88.6  | 92.9/88.4  | 96.9/93.2  | 97.88/95.3  |
| Objects  | Bottle          | -/93.1  | -/88.8  | 98.4/95.5  | 98.3/94.8  | 98.8/96.7   |
|          | Cable           | -/81.8  | -/93.7  | 97.2/90.9  | 96.7/88.8  | 98.5/93.8   |
|          | Capsule         | -/96.8  | -/87.9  | 99.0/93.7  | 98.5/93.5  | 98.6/95.6   |
|          | Hazelnut        | -/96.5  | -/88.6  | 99.1/95.4  | 98.2/92.6  | 99/95.9     |
|          | Metal_Nut       | -/94.2  | -/86.9  | 98.1/94.4  | 97.2/85.6  | 97.5/92.4   |
|          | Pill            | -/96.1  | -/93.0  | 96.5/94.6  | 95.7/92.7  | 98.1/96.3   |
|          | Screw           | -/94.2  | -/95.4  | 98.9/96.0  | 98.5/94.4  | 99.6/98.2   |
|          | Toothbrush      | -/93.3  | -/87.7  | 97.9/93.5  | 98.8/93.1  | 99.1/94.5   |
|          | Transistor      | -/66.6  | -/92.6  | 94.1/87.4  | 97.5/84.5  | 92.4/78.8   |
|          | Zipper          | -/95.1  | -/93.6  | 96.5/92.6  | 98.5/95.9  | 98.3/95.7   |
|          | Average         | -/90.8  | -/90.8  | 97.6/93.4  | 97.8/91.6  | 97.99/93.79 |
|          | Overall average | -/91.4  | -/90.1  | 96.5/91.7  | 97.5/92.1  | 98.0/94.3   |

Deep-learning-based algorithms do not explain the reasons behind their assessments. From the images, however, we can see that, compared with the ground truth, the anomaly maps mark the positions of the anomalies in the images almost exactly. This
is genuinely necessary in practical situations: once the position of an anomaly on an object has been located, we can focus on repairing or classifying just that anomaly, saving time and effort. Next, this study experimented on the BeanTech Anomaly Detection (BTAD) dataset. This is a well-known dataset for evaluating anomaly detection models on one-class data. Tables 4.5 and 4.6 show the results of our model compared with the original knowledge distillation model. The proposed model raises the sample-level AUROC by up to 1.1%, reaching 94.77%, while achieving 97.63% at the pixel level.

Table 4.4: Evaluation on MVTec using the AUROC and PRO measures (part 2). Each cell is AUROC/PRO; "-" marks a value not reported.

| Group    | Category        | RIAD [20] | CutPaste [18] | RD [10]     | Our         |
|----------|-----------------|-----------|---------------|-------------|-------------|
| Textures | Carpet          | 96.3/-    | 98.3/-        | 99/97       | 99/97       |
|          | Grid            | 98.8/-    | 97.5/-        | 98.1/95.7   | 99.3/97.5   |
|          | Leather         | 99.4/-    | 99.5/-        | 99.4/99.1   | 99.5/99.2   |
|          | Tile            | 89.1/-    | 90.5/-        | 95.4/90.2   | 95.7/90.4   |
|          | Wood            | 85.8/-    | 95.5/-        | 95.3/90.9   | 95.9/92.4   |
|          | Average         | 93.9/-    | 96.3/-        | 97.44/94.58 | 97.88/95.3  |
| Objects  | Bottle          | 98.4/-    | 97.6/-        | 98.7/96.6   | 98.8/96.7   |
|          | Cable           | 84.2/-    | 90.0/-        | 97.3/91.0   | 98.5/93.8   |
|          | Capsule         | 92.8/-    | 97.4/-        | 98.7/95.6   | 98.6/95.6   |
|          | Hazelnut        | 96.1/-    | 97.3/-        | 98.9/95.5   | 99/95.9     |
|          | Metal_Nut       | 92.5/-    | 93.1/-        | 97.3/92.2   | 97.5/92.4   |
|          | Pill            | 95.7/-    | 95.7/-        | 98.3/96.6   | 98.1/96.3   |
|          | Screw           | 98.8/-    | 96.7/-        | 99.6/98.2   | 99.6/98.2   |
|          | Toothbrush      | 98.9/-    | 98.1/-        | 99.1/94.6   | 99.1/94.5   |
|          | Transistor      | 87.7/-    | 93.0/-        | 92.7/78.3   | 92.4/78.8   |
|          | Zipper          | 97.8/-    | 99.3/-        | 98.6/96.1   | 98.3/95.7   |
|          | Average         | 94.3/-    | 95.8/-        | 97.92/93.48 | 97.99/93.79 |
|          | Overall average | 94.2/-    | 96.0/-        | 97.8/93.8   | 98.0/94.3   |

Table 4.5: Experimental results on BTAD [5] using AUROC at the sample level, over the three BTAD product categories.

| Method   | 01   | 02   | 03   | Average |
|----------|------|------|------|---------|
| RD [10]  | 98.2 | 83.3 | 99.5 | 93.67   |
| Revise 1 | 96.5 | 87.9 | 99.6 | 94.67   |
| Revise 2 | 98.6 | 86   | 99.7 | 94.77   |

Table 4.6: Experimental results on BTAD [5] using AUROC and PRO at the pixel level (AUROC/PRO per cell).

| Method   | 01        | 02        | 03        | Average     |
|----------|-----------|-----------|-----------|-------------|
| RD [10]  | 96.3/75.5 | 96.6/58.5 | 99.7/86.4 | 97.53/73.47 |
| Revise 1 | 96.8/75.4 | 96.5/60.2 | 99.6/84.8 | 97.63/73.47 |
| Revise 2 | 96.9/77.9 | 96.4/59.4 | 99.6/83.7 | 97.63/73.67 |

Figure 4.4: Observed results on the Bottle, Wood, Toothbrush, Zipper, and Hazelnut data.

Chapter 5

CONCLUSION

Anomaly detection for datasets with only one class has been, and will remain, a difficult and highly practical topic, because in most real problems we mostly have data of a single class. This thesis presented the best-performing existing method and improved it into
a method of higher quality, supported by practical experimental evidence. The method is obviously not yet the best, and this research will continue to be refined. The results obtained show that the method is on a sound and highly practical track.

LIST OF SCIENTIFIC PUBLICATIONS

N. Van Duc, H. H. Bach and L. H. Trang, "An Improved Reverse Distillation Model for Unsupervised Anomaly Detection," 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM), Seoul, Korea, Republic of, 2023, pp. 1-6, doi: 10.1109/IMCOM56909.2023.10035610.

An Improved Reverse Distillation Model for Unsupervised Anomaly Detection

Nguyen Van Duc, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City; Vietnam National University Ho Chi Minh City, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Vietnam. nvduc.sdh19@hcmut.edu.vn

Hoang Huu Bach, Faculty of Information Technology, University of Engineering and Technology (UET), 144 Xuan Thuy, Cau Giay; Vietnam National University Hanoi, Hanoi, Vietnam.

Le Hong Trang (corresponding author), Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City; Vietnam National University Ho Chi Minh City, Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Vietnam. lhtrang@hcmut.edu.vn

Abstract—Using knowledge distillation for unsupervised anomaly detection problems is an efficient approach. Recently, a reverse distillation (RD) model was presented as a novel teacher-student (T-S) model for the problem [7]. In that model, the student network uses the one-class embedding from the teacher model as input, with the goal of restoring the teacher's representations. The knowledge distillation starts with high-level abstract representations and moves down to low-level aspects, using a module called one-class bottleneck embedding (OCBE). Although its performance is impressive, it can still benefit from the power of transforming input images before applying this architecture.
Instead of only using raw images, in this paper we transform them using augmentation techniques. The teacher encodes raw and transformed inputs to get a raw representation (encoded from the raw inputs) and a transformed representation (encoded from the transformed inputs). The student must restore, from the bottleneck of the transformed representation, the raw representation. Testing results obtained on benchmarks for AD and one-class novelty detection show that our proposed model outperforms the SOTA ones, proving the utility and applicability of the suggested strategy.

Index Terms—One-class Classification, Anomaly Detection, Deep Neural Networks, Deep Learning, Distillation

I. INTRODUCTION

Anomaly or novelty detection is the task of recognizing unexpected patterns in a collection of homogeneous natural images. Anomaly detection has several uses, including visual industrial inspection. Anomalies on production lines, however, are incredibly uncommon occurrences and are challenging to find manually. Automated anomaly detection is therefore crucial not only for quality control but also for saving human effort. Overall, the problem's nature makes it challenging to collect a significant volume of abnormal data, labeled or not. On the other hand, many manufacturing defects are tiny and can only be highlighted by high-resolution camera-based optical systems. Furthermore, many defect types share the same visual feature behavior, so it is difficult to differentiate such defects from each other. Therefore, building an anomaly detector using one-class classification, which only needs normal data, is feasible in this situation.

Fig. 1. Reverse Distillation Architecture [7]

In one-class classification, a model tries to learn the features of a normal sample set; the model then flags a test sample as anomalous if the sample's features are not represented. To identify such anomalies, one can, for example, check whether the data reconstruction error is significant: an autoencoder
which was trained to reconstruct normal data, can be used. A generative model recognizes anomalies by comparing a defined anomaly score to a given threshold; however, an image carries a large amount of semantic information that might not be captured by the anomaly score. A promising approach to anomaly detection is knowledge distillation from trained models. In the context of unsupervised AD, the teacher-student model is expected to extract different features on anomalies. This is performed by the student model, which is exposed only to normal samples during knowledge distillation [2], [18], [22]. Different strategies have been introduced to further improve the T-S model's ability to discriminate between different types of anomalies. For instance, US [2] ensembles multiple models trained on normal data at several scales to detect multi-scale abnormalities, and MKD [18] proposes multi-level feature alignment. The reverse distillation teacher-student model in [7] (described in Algorithm 2) is constructed from several components, including a teacher encoder and a student decoder, in contrast to the traditional knowledge distillation framework, where both teacher and student follow the encoder structure. The student decoder uses the low-dimensional embedding as input, rather than the raw data being fed to the teacher and student simultaneously, so that the student emulates the teacher's behavior by restoring the representations of the teacher model. We have improved reverse distillation to help maximize the reproducibility of both teacher and student. The student now learns not only features of the raw input but also of its transformations; the goal is for the student to learn that the raw input can be recovered even from different views of the class. This approach demonstrates significantly better efficiency on the MVTec Anomaly Detection (MVTec AD [3]) dataset compared to the original reverse distillation model.

II. RELATED WORK

Recently, anomaly detection on a one-class dataset has been
widely studied. Before the reverse distillation strategy, several other families of methods had success with this problem. One family uses normal support vectors: traditional anomaly detection techniques that concentrate on computing a compact, closed distribution for the one-class representation. The one-class support vector machine (OC-SVM) [20] and support vector data description (SVDD) [21] are well-known models of this approach. Their deep network versions can be found in [17] (DeepSVDD) and [24] (PatchSVDD); they were proposed to deal with high-dimensional data. Another family is the reconstruction-based methods, including the AutoEncoder (AE) [12] and Generative Adversarial Nets (GAN) [10]. In these methods, generative models trained on normal samples can accurately restore regions free of anomalies while failing on anomalous regions [1], [4], [19]. It remains the case, however, that deep models generalize so well that even atypical regions can be recovered [25]. To solve this problem, reconstruction-based methods are combined with memory mechanisms, image masking strategies, and pseudo-anomalies [9], [11], [15]. Metaformer (MF) [23] recently proposed using meta-learning to close the gap between model adaptation and reconstruction. Note that the proposed reverse knowledge distillation is also an encoder-decoder-based model. In a generative model, however, the encoder and decoder are trained concurrently from the start, whereas in our reverse distillation the teacher is frozen as a previously trained model, and anomalies are then detected in the semantic feature space rather than by the pixel-level reconstruction error. Lately, it has been demonstrated that networks pre-trained on a large dataset can extract discriminative features for anomaly detection [5], [24]. Utilizing a pre-trained model's anomaly-free features can help to determine anomalous samples [5]. The investigations in [16], [24] also obtained accurate anomaly detection results using the Mahalanobis distance. These approaches
are computationally expensive, since they necessitate memorizing every feature of the training data. It should also be noted that, although our method is likewise based on knowledge distillation, reverse distillation is the first to use an encoder and a decoder for the teacher-student model. Because of the diversity of the teacher and student networks, as well as the reverse data flow in the knowledge distillation, our approach of combining raw and transformed input differs from existing ones.

III. OUR IMPROVEMENT

A. The Original Teacher-Student Reverse Distillation

Our work improves the teacher-student reverse distillation method proposed in [7]. We now recall that architecture. The teacher network encodes the input image into an embedding vector, and the student network has to restore the representations of the teacher. If that direct flow is used, however, there are some disadvantages:
- The capacity of the teacher model is often relatively high. Although a high-capacity model can extract rich features, high-dimensional descriptors probably contain a significant amount of redundant information. High representation variety and redundancy make it hard for the student model to extract the crucial normal features.
- The final block of the encoder mostly captures the semantic and structural features of the input images, and it is difficult for the student decoder to restore low-level characteristics from this high-level representation alone. Previous attempts usually add skip routes, links between the encoder and decoder; in knowledge distillation this strategy fails, because the skip pathways reveal anomalous information to the student during inference.

Therefore, before the one-class embedding, a bottleneck module concatenates multi-scale representations to overcome the issue of low-level feature restoration in the decoder. Fig. 2 presents the original architecture: the flow runs through the teacher to the bottleneck and then to the student, and the loss function is computed from the outputs. Fig. 2. The
original reverse distillation.

Anomaly map and score. Let φ be the projection from the input data I to the bottleneck embedding space. The paired activations in the original reverse distillation are {f_E^k = E^k(I), f_D^k = D^k(φ)} in the teacher and student models, where E^k and D^k are the k-th encoding and decoding blocks, respectively, and f_E^k, f_D^k ∈ R^(C_k × H_k × W_k), with C_k, H_k, and W_k denoting the number of channels, the height, and the width of the k-th layer's activation tensor. The 2-D anomaly map M^k ∈ R^(H_k × W_k) is given by

    M^k(h, w) = 1 − (f_E^k(h, w))^T f_D^k(h, w) / (‖f_E^k(h, w)‖ ‖f_D^k(h, w)‖)    (1)

Accumulating the multi-scale anomaly maps in the multi-scale knowledge distillation gives the student loss

    L_KD = Σ_{k=1}^{K} (1 / (H_k W_k)) Σ_{h=1}^{H_k} Σ_{w=1}^{W_k} M^k(h, w)    (2)

In [7], the pixel-level anomaly score is considered for anomaly localization (AL) at the inference stage. The teacher model can express the abnormality of a query sample's features when that sample is abnormal. Because the student only learns to restore normal representations from the compact one-class embedding, it is likely to fail to restore abnormal features; in other words, when a test image is unusual, the student gives representations different from the teacher's. To pinpoint anomalies in a test image, M^k is up-sampled to the image size; the bilinear up-sampling operation, denoted by Ψ, is applied in this work. A score map is computed by adding up the anomaly maps pixel by pixel, with a Gaussian filter applied to remove noise:

    S_AL = Σ_{i=1}^{L} Ψ(M^i)    (3)

For samples with minor anomaly regions, averaging all the values of the score map S_AL is unfair with regard to anomaly detection: for an anomaly of any size there is a most responsive spot. Therefore, we define the sample-level anomaly score S_AD as the largest value of S_AL. The assumption is that the anomaly score maps of normal samples show no noticeable response.

B. The Improved Reverse Distillation

Reverse distillation exhibits excellent performance but still does not take advantage of augmentation techniques. Expecting to help the student decoder learn more information, we add some augmentations to the input image before passing it through the pipeline. From transformations of the same image, the student receives different inputs and must restore the original image representations. This pipeline, described in Fig. 3, is called Revise 1. The anomaly map is now computed by the formula below, where the student's output along the transform route is denoted by f1_D^k:

    M^k(h, w) = 1 − (f_E^k(h, w))^T f1_D^k(h, w) / (‖f_E^k(h, w)‖ ‖f1_D^k(h, w)‖)    (4)

Fig. 3. The first improved reverse distillation.

Revise 1 performs better than the original architecture on the texture datasets, but the difference is minor on the object datasets. Continuing our attempts to improve it, we formed Revise 2, a combination of the Revise 1 flow and the original flow. On object datasets, Revise 2 performs better than both previous flows, while being slightly worse than Revise 1 on texture datasets. We therefore suggest using Revise 1 for texture datasets and Revise 2 for object datasets. The flow of Revise 2 is described in Fig. 4. The combination of Revise 1 and Revise 2 for the different kinds of data gives impressive results, and we refer to it as the Improved Reverse Distillation Model.

Fig. 4. The second improved reverse distillation.

IV. EXPERIMENTS

We use the MVTec AD dataset [3] for our research. This is a dataset for comparing anomaly detection approaches in industrial inspection. More than 5,000 high-resolution photos (700 x 700 to 1024 x 1024 pixels) with five texture classes and ten object classes make up the dataset. Each class consists of training photos with no faults, testing images with varied defects, and defect-free images. Additionally, the testing data provides pixel-level annotations of every anomaly.
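The training objective of Section III can be condensed into code. The sketch below is a minimal PyTorch rendering, under our own illustrative names (not the authors' released implementation), of the cosine-distance anomaly map of Eq. (1), the multi-scale distillation loss of Eq. (2), and one Revise 1 training step in which the frozen teacher encodes both the raw and the augmented image while the student decodes the bottleneck of the augmented features. For brevity a single feature block stands in for the paper's multi-scale block pairs.

```python
import torch
import torch.nn.functional as F

def anomaly_map(f_e, f_d):
    # Eq. (1): one minus the channel-wise cosine similarity between the
    # teacher activation f_e and the student activation f_d at every
    # spatial position; works for (C, H, W) or batched (N, C, H, W) inputs.
    return 1.0 - F.cosine_similarity(f_e, f_d, dim=-3)

def kd_loss(teacher_feats, student_feats):
    # Eq. (2): average each block's anomaly map over its H_k x W_k grid,
    # then sum over the K blocks.
    return sum(anomaly_map(f_e, f_d).mean()
               for f_e, f_d in zip(teacher_feats, student_feats))

def revise1_step(teacher, bottleneck, student, augment, x, optimizer):
    # One Revise 1 training step (illustrative module names): the frozen
    # teacher encodes both the raw image and an augmented view; the student
    # decodes the bottleneck of the *augmented* features but is trained to
    # match the teacher's *raw* features.
    with torch.no_grad():                      # the teacher is never updated
        raw_feats = [teacher(x)]               # target representations
        aug_feats = [teacher(augment(x))]      # transformed representations
    student_feats = [student(bottleneck(f)) for f in aug_feats]
    loss = kd_loss(raw_feats, student_feats)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

Because the teacher runs under `torch.no_grad()`, gradients flow only through the student (and bottleneck), which matches the frozen-teacher design of reverse distillation.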
TABLE I. Results of anomaly detection on MVTec [3], AUROC (%) per category on 256x256 photos. Our improvement comes out best in the average scores of textures, objects, and overall.

| Group    | Category        | GT [8] | GN [1] | US [2] | PSVDD [24] | DAAD [11] | MF [23] | PaDiM [6] | CutPaste [13] | RD [7] | Our   |
|----------|-----------------|--------|--------|--------|------------|-----------|---------|-----------|---------------|--------|-------|
| Textures | Carpet          | 43.7   | 69.9   | 91.6   | 92.9       | 86.6      | 94      | 99.8      | 93.9          | 98.8   | 98    |
|          | Grid            | 61.9   | 70.8   | 81     | 94.6       | 95.7      | 85.9    | 96.7      | 100           | 95.4   | 100   |
|          | Leather         | 84.1   | 84.2   | 88.2   | 90.9       | 86.2      | 99.2    | 100       | 100           | 100    | 100   |
|          | Tile            | 41.7   | 79.4   | 99.1   | 97.8       | 88.2      | 99      | 98.1      | 94.6          | 99.3   | 99.3  |
|          | Wood            | 61.1   | 83.4   | 97.7   | 96.5       | 98.2      | 99.2    | 99.2      | 99.1          | 99.6   | 98.3  |
|          | Average         | 58.5   | 77.5   | 91.5   | 94.5       | 91        | 95.5    | 98.8      | 97.5          | 98.66  | 99.12 |
| Objects  | Bottle          | 74.4   | 89.2   | 99     | 98.6       | 97.6      | 99.1    | 99.9      | 98.2          | 100    | 100   |
|          | Cable           | 78.3   | 75.7   | 86.2   | 90.3       | 84.4      | 97.1    | 92.7      | 81.2          | 96.3   | 97.8  |
|          | Capsule         | 67     | 73.2   | 86.1   | 76.7       | 76.7      | 87.5    | 91.3      | 98.2          | 97.1   | 96.8  |
|          | Hazelnut        | 35.9   | 78.5   | 93.1   | 92         | 92.1      | 99.4    | 92        | 98.3          | 100    | 100   |
|          | Metal Nut       | 81.3   | 70     | 82     | 94         | 75.8      | 96.2    | 98.7      | 99.9          | 100    | 100   |
|          | Pill            | 63     | 74.3   | 87.9   | 86.1       | 90        | 90.1    | 93.3      | 94.9          | 96.8   | 97.7  |
|          | Screw           | 50     | 74.6   | 54.9   | 81.3       | 98.7      | 97.5    | 85.8      | 88.7          | 97     | 98.7  |
|          | Toothbrush      | 97.2   | 65.3   | 95.3   | 100        | 99.2      | 100     | 96.1      | 99.4          | 98.3   | 96.7  |
|          | Transistor      | 86.9   | 79.2   | 81.8   | 91.5       | 87.6      | 94.4    | 97.4      | 96.1          | 96.9   | 98.2  |
|          | Zipper          | 82     | 74.5   | 91.9   | 97.9       | 85.9      | 98.6    | 90.3      | 99.9          | 97.9   | 98.6  |
|          | Average         | 71.6   | 75.5   | 85.8   | 90.8       | 88.8      | 96      | 93.8      | 95.5          | 98.03  | 98.45 |
|          | Overall average | 67.2   | 76.2   | 87.7   | 92.1       | 89.5      | 95.8    | 95.5      | 96.1          | 98.2   | 98.7  |

TABLE II. Results of anomaly localization on MVTec (AUROC/PRO per cell). While PRO concentrates on region-based behavior, AUROC provides a pixel-level comparison. "-" marks a value not reported.

| Group    | Category        | US [2] | MF [23] | SPADE [5] | PaDiM [6] | RIAD [25] | CutPaste [13] | RD [7]      | Our         |
|----------|-----------------|--------|---------|-----------|-----------|-----------|---------------|-------------|-------------|
| Textures | Carpet          | -/87.9 | -/87.8  | 97.5/94.7 | 99.1/96.2 | 96.3/-    | 98.3/-        | 99/97       | 99/97       |
|          | Grid            | -/95.2 | -/86.5  | 93.7/86.7 | 97.3/94.6 | 98.8/-    | 97.5/-        | 98.1/95.7   | 99.3/97.5   |
|          | Leather         | -/94.5 | -/95.9  | 97.6/97.2 | 99.2/97.8 | 99.4/-    | 99.5/-        | 99.4/99.1   | 99.5/99.2   |
|          | Tile            | -/94.6 | -/88.1  | 87.4/75.9 | 94.1/86.0 | 89.1/-    | 90.5/-        | 95.4/90.2   | 95.7/90.4   |
|          | Wood            | -/91.1 | -/84.8  | 88.5/87.4 | 94.9/91.1 | 85.8/-    | 95.5/-        | 95.3/90.9   | 95.9/92.4   |
|          | Average         | -/92.7 | -/88.6  | 92.9/88.4 | 96.9/93.2 | 93.9/-    | 96.3/-        | 97.44/94.58 | 97.88/95.3  |
| Objects  | Bottle          | -/93.1 | -/88.8  | 98.4/95.5 | 98.3/94.8 | 98.4/-    | 97.6/-        | 98.7/96.6   | 98.8/96.7   |
|          | Cable           | -/81.8 | -/93.7  | 97.2/90.9 | 96.7/88.8 | 84.2/-    | 90.0/-        | 97.3/91.0   | 98.5/93.8   |
|          | Capsule         | -/96.8 | -/87.9  | 99.0/93.7 | 98.5/93.5 | 92.8/-    | 97.4/-        | 98.7/95.6   | 98.6/95.6   |
|          | Hazelnut        | -/96.5 | -/88.6  | 99.1/95.4 | 98.2/92.6 | 96.1/-    | 97.3/-        | 98.9/95.5   | 99/95.9     |
|          | Metal Nut       | -/94.2 | -/86.9  | 98.1/94.4 | 97.2/85.6 | 92.5/-    | 93.1/-        | 97.3/92.2   | 97.5/92.4   |
|          | Pill            | -/96.1 | -/93.0  | 96.5/94.6 | 95.7/92.7 | 95.7/-    | 95.7/-        | 98.3/96.6   | 98.1/96.3   |
|          | Screw           | -/94.2 | -/95.4  | 98.9/96.0 | 98.5/94.4 | 98.8/-    | 96.7/-        | 99.6/98.2   | 99.6/98.2   |
|          | Toothbrush      | -/93.3 | -/87.7  | 97.9/93.5 | 98.8/93.1 | 98.9/-    | 98.1/-        | 99.1/94.6   | 99.1/94.5   |
|          | Transistor      | -/66.6 | -/92.6  | 94.1/87.4 | 97.5/84.5 | 87.7/-    | 93.0/-        | 92.7/78.3   | 92.4/78.8   |
|          | Zipper          | -/95.1 | -/93.6  | 96.5/92.6 | 98.5/95.9 | 97.8/-    | 99.3/-        | 98.6/96.1   | 98.3/95.7   |
|          | Average         | -/90.8 | -/90.8  | 97.6/93.4 | 97.8/91.6 | 94.3/-    | 95.8/-        | 97.92/93.48 | 97.99/93.79 |
|          | Overall average | -/91.4 | -/90.1  | 96.5/91.7 | 97.5/92.1 | 94.2/-    | 96.0/-        | 97.8/93.8   | 98.0/94.3   |

Applying the Adam optimizer with β = (0.5, 0.999), we train our Revise models. The learning rate is set to 0.005, and 80 epochs are trained with a batch size of 16. The anomaly score map is smoothed using a Gaussian filter. To assess the effectiveness of this approach against other approaches and the original reverse distillation, we compare the three previously mentioned metrics: AUROC at the sample level, AUROC at the pixel level, and AUPRO at the pixel level. The original model is slightly, but not significantly, better on a few datasets; on most, the improved model is significantly higher. On average, this technique produced a new SOTA at all three values (98.7%, 98.0%, and 94.3%, respectively). Tables I and II provide more detailed reports. In addition to the outstanding numbers, the anomaly map output images in Fig. 5 make it easy to see why an image is considered abnormal by the model.

TABLE III. Results of anomaly detection on BTAD [14] using AUROC at the sample level, over the three BTAD product categories.

| Method   | 01   | 02   | 03   | Average |
|----------|------|------|------|---------|
| RD [7]   | 98.2 | 83.3 | 99.5 | 93.67   |
| Revise 1 | 96.5 | 87.9 | 99.6 | 94.67   |
| Revise 2 | 98.6 | 86   | 99.7 | 94.77   |
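The inference-time scoring used in these experiments (Eq. (3): bilinearly up-sample each block's anomaly map, sum pixel-wise, smooth with a Gaussian filter, then take the most responsive pixel as the sample-level score S_AD) can be sketched as follows. The helper names and the separable-filter implementation are our own, and the smoothing bandwidth σ must be supplied by the caller, since the paper's exact value is not recoverable from this text.

```python
import torch
import torch.nn.functional as F

def gaussian_smooth(img, sigma):
    # Separable Gaussian blur of a 2-D map (our stand-in for the paper's
    # noise-removal filter); sigma is the smoothing bandwidth.
    radius = max(1, int(3 * sigma))
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    k = torch.exp(-0.5 * (x / sigma) ** 2)
    k = k / k.sum()                                          # normalize to sum 1
    img = img[None, None]                                    # (1, 1, H, W)
    img = F.conv2d(img, k.view(1, 1, 1, -1), padding=(0, radius))  # horizontal
    img = F.conv2d(img, k.view(1, 1, -1, 1), padding=(radius, 0))  # vertical
    return img[0, 0]

def score_map(anomaly_maps, image_size, sigma):
    # Eq. (3): up-sample each block's anomaly map M^i to the input
    # resolution (bilinear Psi), sum them pixel-wise, then smooth.
    s = torch.zeros(image_size)
    for m in anomaly_maps:                                   # m: (H_k, W_k)
        s = s + F.interpolate(m[None, None], size=image_size,
                              mode="bilinear", align_corners=False)[0, 0]
    return gaussian_smooth(s, sigma)

def sample_score(s_al):
    # Sample-level score S_AD: the most responsive pixel of the score map,
    # rather than its average, so small anomalies are not diluted.
    return float(s_al.max())
```

Taking the maximum instead of the mean implements the observation above that a tiny defect dominates only a few pixels of S_AL.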
We also ran experiments on the BeanTech Anomaly Detection dataset (BTAD [14]). There are three kinds of industrial products in BTAD, with 2540 images. While the test set contains both normal and abnormal photos, the training set only includes normal images. We compare the results of our method with the original RD; Tables III and IV present the comparison. Our model surpasses the other approaches, raising the sample-level AUROC by up to 1.1%, with 94.77%, while achieving 97.63% at the pixel level.

TABLE IV. Results of anomaly localization on BTAD [14] using AUROC and PRO at the pixel level (AUROC/PRO per cell), over the three BTAD product categories.

| Method   | 01        | 02        | 03        | Average     |
|----------|-----------|-----------|-----------|-------------|
| RD [7]   | 96.3/75.5 | 96.6/58.5 | 99.7/86.4 | 97.53/73.47 |
| Revise 1 | 96.8/75.4 | 96.5/60.2 | 99.6/84.8 | 97.63/73.47 |
| Revise 2 | 96.9/77.9 | 96.4/59.4 | 99.6/83.7 | 97.63/73.67 |

Fig. 5. Anomaly map results. Left column: normal images of the Bottle, Wood, Toothbrush, Zipper, and Hazelnut classes. Next column: images with the ground-truth anomalies highlighted in white. The last column presents the anomaly heatmaps obtained by the Improved RD model.

V. CONCLUSION

We enhance the image representation by learning a multi-scale patch-based framework for anomaly identification. Our experimental findings demonstrate that considering an image's global and local context simultaneously facilitates learning an adequate representation for image anomaly identification. Additionally, we improved the model to increase its accuracy: letting the student learn features from the transformed image and reconstruct the raw image's representations brings a significant improvement. Our test findings show that the suggested technique achieves SOTA accuracy when applied to benchmark datasets for image anomaly detection and segmentation.

ACKNOWLEDGMENT

We acknowledge Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for supporting this study.

REFERENCES

[1] Samet Akcay, Amir Atapour Abarghouei, and Toby P. Breckon. Ganomaly: Semi-supervised anomaly detection via adversarial training. CoRR, abs/1805.06725, 2018.
[2] Paul Bergmann, Michael Fauser,
David Sattlegger, and Carsten Steger. Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings. CoRR, abs/1911.02357, 2019.
[3] Paul Bergmann, Xin Jin, David Sattlegger, and Carsten Steger. The mvtec 3d-ad dataset for unsupervised 3d anomaly detection and localization. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, pages 202-213. INSTICC, SciTePress, 2022.
[4] Paul Bergmann, Sindy Löwe, Michael Fauser, David Sattlegger, and Carsten Steger. Improving unsupervised defect segmentation by applying structural similarity to autoencoders. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. SCITEPRESS - Science and Technology Publications, 2019.
[5] Niv Cohen and Yedid Hoshen. Sub-image anomaly detection with deep pyramid correspondences. CoRR, abs/2005.02357, 2020.
[6] Thomas Defard, Aleksandr Setkov, Angelique Loesch, and Romaric Audigier. Padim: a patch distribution modeling framework for anomaly detection and localization. CoRR, abs/2011.08785, 2020.
[7] Hanqiu Deng and Xingyu Li. Anomaly detection via reverse distillation from one-class embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9737-9746, June 2022.
[8] Izhak Golan and Ran El-Yaniv. Deep anomaly detection using geometric transformations. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
[9] Dong Gong, Lingqiao Liu, Vuong Le, Budhaditya Saha, Moussa Reda Mansour, Svetha Venkatesh, and Anton van den Hengel. Memorizing normality to detect anomaly: Memory-augmented deep autoencoder for unsupervised anomaly detection, 2019.
[10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron
Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
[11] Jinlei Hou, Yingying Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu, and Hong Zhou. Divide-and-assemble: Learning block-wise memory for unsupervised anomaly detection. CoRR, abs/2107.13118, 2021.
[12] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes, 2013.
[13] Chun-Liang Li, Kihyuk Sohn, Jinsung Yoon, and Tomas Pfister. Cutpaste: Self-supervised learning for anomaly detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9664-9674, June 2021.
[14] Pankaj Mishra, Riccardo Verk, Daniele Fornasier, Claudio Piciarelli, and Gian Luca Foresti. VT-ADL: A vision transformer network for image anomaly detection and localization. CoRR, abs/2104.10036, 2021.
[15] Hyunjong Park, Jongyoun Noh, and Bumsub Ham. Learning memory-guided normality for anomaly detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14360-14369, 2020.
[16] Oliver Rippel, Patrick Mertens, and Dorit Merhof. Modeling the distribution of normal data in pre-trained deep features for anomaly detection, 2020.
[17] Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, and Marius Kloft. Deep one-class classification. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4393-4402. PMLR, 10-15 Jul 2018.
[18] Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad H. Rohban, and Hamid R. Rabiee. Multiresolution knowledge distillation for anomaly detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14902-14912, June 2021.
[19] Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks
to guide marker discovery, 2017.

[20] Bernhard Schölkopf, John Platt, John Shawe-Taylor, Alexander Smola, and Robert Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13:1443–1471, July 2001.

[21] David M. J. Tax and Robert P. W. Duin. Support vector data description. Machine Learning, 54:45–66, 2004.

[22] Guodong Wang, Shumin Han, Errui Ding, and Di Huang. Student-teacher feature pyramid matching for anomaly detection, 2021.

[23] Jhih-Ciang Wu, Ding-Jie Chen, Chiou-Shann Fuh, and Tyng-Luh Liu. Learning unsupervised metaformer for anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4369–4378, October 2021.

[24] Jihun Yi and Sungroh Yoon. Patch SVDD: Patch-level SVDD for anomaly detection and segmentation. In Proceedings of the Asian Conference on Computer Vision (ACCV), November 2020.

[25] Vitjan Zavrtanik, Matej Kristan, and Danijel Skočaj. Reconstruction by inpainting for visual anomaly detection. Pattern Recognition, 112:107706, April 2021.

References

[1] P. Perera, P. Oza, and V. M. Patel, "One-class classification: A survey," CoRR, vol. abs/2101.03064, 2021.

[2] C. S. Varshini, G. Hruday, G. S. Mysakshi Chandu, and S. K. S., "Sign language recognition," International Journal of Engineering Research and Technology (IJERT), vol. V9, June 2020.

[3] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CoRR, vol. abs/1512.03385, 2015.

[4] P. Bergmann, X. Jin, D. Sattlegger, and C. Steger, "The MVTec 3D-AD dataset for unsupervised 3D anomaly detection and localization," in Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, pp. 202–213, INSTICC, SciTePress, 2022.

[5] P. Mishra, R. Verk, D. Fornasier, C. Piciarelli, and G. L. Foresti, "VT-ADL: A vision transformer network for image anomaly detection and
localization," CoRR, vol. abs/2104.10036, 2021.

[6] P.-N. Tan, "Receiver operating characteristic," Encyclopedia of Database Systems, pp. 2349–2352, 2009.

[7] P. Oza and V. M. Patel, "One-class convolutional neural network," IEEE Signal Processing Letters, vol. 26, no. 2, pp. 277–281, 2019.

[8] P. Perera, R. Nallapati, and B. Xiang, "OCGAN: One-class novelty detection using GANs with constrained latent representations," CoRR, vol. abs/1903.08550, 2019.

[9] Z. Zhang, S. Chen, and L. Sun, "P-KDGAN: Progressive knowledge distillation with GANs for one-class novelty detection," in Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20 (C. Bessiere, ed.), pp. 3237–3243, International Joint Conferences on Artificial Intelligence Organization, 2020. Main track.

[10] H. Deng and X. Li, "Anomaly detection via reverse distillation from one-class embedding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9737–9746, June 2022.

[11] I. Golan and R. El-Yaniv, "Deep anomaly detection using geometric transformations," in Advances in Neural Information Processing Systems (S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, eds.), vol. 31, Curran Associates, Inc., 2018.

[12] S. Akcay, A. A. Abarghouei, and T. P. Breckon, "GANomaly: Semi-supervised anomaly detection via adversarial training," CoRR, vol. abs/1805.06725, 2018.

[13] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, "Uninformed students: Student-teacher anomaly detection with discriminative latent embeddings," CoRR, vol. abs/1911.02357, 2019.

[14] J. Yi and S. Yoon, "Patch SVDD: Patch-level SVDD for anomaly detection and segmentation," in Proceedings of the Asian Conference on Computer Vision (ACCV), November 2020.

[15] J. Hou, Y. Zhang, Q. Zhong, D. Xie, S. Pu, and H. Zhou,
"Divide-and-assemble: Learning block-wise memory for unsupervised anomaly detection," CoRR, vol. abs/2107.13118, 2021.

[16] J.-C. Wu, D.-J. Chen, C.-S. Fuh, and T.-L. Liu, "Learning unsupervised metaformer for anomaly detection," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4369–4378, October 2021.

[17] T. Defard, A. Setkov, A. Loesch, and R. Audigier, "PaDiM: A patch distribution modeling framework for anomaly detection and localization," CoRR, vol. abs/2011.08785, 2020.

[18] C.-L. Li, K. Sohn, J. Yoon, and T. Pfister, "CutPaste: Self-supervised learning for anomaly detection and localization," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9664–9674, June 2021.

[19] N. Cohen and Y. Hoshen, "Sub-image anomaly detection with deep pyramid correspondences," CoRR, vol. abs/2005.02357, 2020.

[20] V. Zavrtanik, M. Kristan, and D. Skočaj, "Reconstruction by inpainting for visual anomaly detection," Pattern Recognition, vol. 112, p. 107706, Apr. 2021.

CURRICULUM VITAE

Full name: NGUYỄN VĂN ĐỨC
Date of birth: 21/01/1996. Place of birth: Đồng Nai
Contact address: 587 khu 6, thị trấn Tân Phú, huyện Tân Phú, tỉnh Đồng Nai

EDUCATION
• 2014 - 2018: Undergraduate student, Ho Chi Minh City University of Technology (HCMUT)
• 2019 - 2022: Master's student, Ho Chi Minh City University of Technology (HCMUT)

WORK EXPERIENCE
• 2018 - 2021: FPT Telecom
• 2021 - present: VNG