Applying Vision Transformer Models to Medical Image Classification and Interpretation

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION

STUDENT SCIENTIFIC RESEARCH PROJECT

APPLYING VISION TRANSFORMER MODELS TO MEDICAL IMAGE CLASSIFICATION AND INTERPRETATION

PROJECT CODE: SV2022-24
PROJECT LEADER: PHẠM NGUYỄN NGỌC DIỄM
SKC007676
Ho Chi Minh City, June 2022

FINAL REPORT OF THE STUDENT SCIENTIFIC RESEARCH PROJECT

Field of science: Engineering
Student: Ngô Quang Khải
Gender: Male
Ethnicity: Kinh
Year of study: /Number of years of training:
Class, faculty: 181290C, Faculty of Electrical and Electronics Engineering
Major: Biomedical Electronics
Supervisor: Dr. Nguyễn Mạnh Hùng

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INFORMATION ON THE RESEARCH RESULTS OF THE PROJECT
Chapter 1: OVERVIEW
  1.1 Overview of related research
  1.2 Reasons for choosing the topic
  1.3 Objectives
  1.4 Research methods
  1.5 Subjects and scope of the project
  1.6 Report structure
Chapter 2: THEORETICAL BACKGROUND
  2.1 Introduction to X-ray images
    2.1.1 Digital images
    2.1.2 X-ray images
  2.2 Introduction to the software
    2.2.1 The Python programming language
    2.2.2 The PyTorch library
  2.3 Introduction to artificial neural networks
    2.3.1 Activation functions
    2.3.2 Gradient descent
    2.3.3 Loss functions
  2.4 CNN-based classification
    2.4.1 LeNet
    2.4.2 AlexNet
    2.4.3 VGG
    2.4.4 GoogLeNet
    2.4.5 ResNet
  2.5 Transformer-based classification
    2.5.1 Transformer architecture
    2.5.2 The Vision Transf...
  2.6 Introduction to Grad-CAM
Chapter 3: PROPOSED METHOD
  3.1 Data organization
  3.2 Overall pipeline
  3.4 Pre-trained models
    3.4.1 ViT L/16, S/16, ... models
    3.4.2 Resnet-18, Resn... models
    3.4.3 R50+B/16
  3.5 Evaluation metrics
Chapter 4: RESULTS - REMARKS - EVALUATION
  4.1 Quantitative results
    4.1.1 Image size ratio ...
    4.1.2 Comparison of ViT backbones
    4.1.3 Effect of the dataset ...
    4.1.4 Comparison with ... backbones
  4.2 Interpretability of the models
Chapter 5: CONCLUSIONS AND RECOMMENDATIONS
  5.1 Conclusions
  5.2 Recommendations
REFERENCES
APPENDIX

LIST OF TABLES

Table 1.1: Comparison of general image datasets and medical image datasets
Table 1.2: Existing datasets for medical image classification
Table 1.3: The CheXpert dataset, comprising 14 labeled disease types [8]
Table 1.4: Statistics of the VinDr-CXR dataset [10]
Table 3.1: Disease classes selected from the VinDr-CXR dataset
Table 3.2: ViT backbone architectures
Table 4.1: Effect of image size
Table 4.2: Comparison of backbones
Table 4.3: Results with and without the pre-trained dataset
Table 4.4: Results on the dataset with CNN backbones

LIST OF FIGURES

Figure 1.1: Co-occurrence of multiple diseases in the VinDr-CXR dataset
Figure 2.1: Example of an RGB color image
Figure 2.2: Example of a grayscale image
Figure 2.3: Chest X-ray image
Figure 2.4: Workflow of model training in PyTorch
Figure 2.5: Artificial neural network
Figure 2.6: Graph of the sigmoid function
Figure 2.7: Graph of the ReLU function
Figure 2.8: The role of learning-rate selection
Figure 2.9: Graph of the log(x) function
Figure 2.10: LeNet-5 architecture, designed to recognize handwritten digits in the MNIST dataset
Figure 2.11: Details of the components and parameters of the LeNet model
Figure 2.12: AlexNet architecture
Figure 2.13: Details of the components and parameters of the AlexNet model
Figure 2.14: The idea behind VGG: VGG blocks made of stacked convolutional layers
Figure 2.15: VGG-16 architecture
Figure 2.16: Structure of the Inception block
Figure 2.17: Structure of GoogLeNet
Figure 2.18: Shortcut connections used in ResNet
Figure 2.19: ResNet architecture (ResNet-18)
Figure 2.20: Transformer architecture [3]
Figure 2.21: How the attention matrix is computed
Figure 2.22: Multi-head attention
Figure 2.23: ViT model architecture [4]
Figure 2.24: An image split into small fixed-size patches
Figure 2.25: Flattening the patches into a sequence of vectors (illustration with patches)
Figure 2.26: The concept of attention in Vision Transformer
Figure 2.27: The concept of attention in Vision Transformer
Figure 2.28: Overview of how Grad-CAM works
Figure 3.1: Distribution of the data selected from the VinDr-CXR dataset
Figure 3.2: Data organization for the experiments
Figure 3.3: Overview of the proposed method
Figure 3.4: Resnet-18 architecture
Figure 3.5: Resnet-34 architecture
Figure 3.6: Changes in Resnet-50 (right) compared with the earlier versions (left)
Figure 3.7: Resnet-50 architecture
Figure 3.8: Grad-CAM attention map on the ants-and-bees dataset
Figure 4.1: Model interpretation with multiple lesion regions
Figure 4.2: Model interpretation with a single lesion region

LIST OF ABBREVIATIONS

No.  Abbreviation
1    1D
2    2D
3    AI
4    AtteMap
5    BN
6    CNN
7    CXR
8    DICOM
9    GAP
10   GN
11   Grad-CAM
12   ID
13   ILD
14   KNN
15   LN
16   LRN
17   MLP
18   MNIST
19   MSA
20   NIH
21   NLP
22   PACS
23   ReLU
24   ResNet
25   SVM
26   VGG
27   ViT

    # Main training loop: repeats passes over train_loader until
    # global_step reaches t_total.
    while True:
        model.train()
        epoch_iterator = tqdm(train_loader,
                              desc="Training (X / X Steps) (loss=X.X)",
                              bar_format="{l_bar}{r_bar}",
                              dynamic_ncols=True,
                              disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):
            batch = tuple(t.to(args.device) for t in batch)
            x, y = batch
            loss = model(x, y)

            # Scale the loss so accumulated gradients average over the micro-batches
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps
            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            if (step + 1) % args.gradient_accumulation_steps == 0:
                losses.update(loss.item() * args.gradient_accumulation_steps)
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                # Update the weights first, then advance the learning-rate
                # schedule (the order expected by PyTorch >= 1.1).
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                global_step += 1

                epoch_iterator.set_description(
                    "Training (%d / %d Steps) (loss=%2.5f)" % (global_step, t_total, losses.val)
                )
                # Only the main process writes TensorBoard logs
                if args.local_rank in [-1, 0]:
                    writer.add_scalar("train/loss", scalar_value=losses.val, global_step=global_step)
                    writer.add_scalar("train/lr", scalar_value=scheduler.get_lr()[0], global_step=global_step)
                # Periodic validation; keep the checkpoint with the best accuracy
                if global_step % args.eval_every == 0 and args.local_rank in [-1, 0]:
                    accuracy = valid(args, model, writer, test_loader, global_step)
                    if best_acc < accuracy:
                        save_model(args, model)
                        best_acc = accuracy
                    model.train()

                if global_step % t_total == 0:
                    break
        losses.reset()
        if global_step % t_total == 0:
            break

    if args.local_rank in [-1, 0]:
        writer.close()
    logger.info("Best Accuracy: \t%f" % best_acc)
    logger.info("End Training!")


def main():
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument("--name", required=True,
                        help="Name of this run. Used for monitoring.")
    parser.add_argument("--dataset", choices=["cifar10", "cifar100"], default="cifar10",
                        help="Which downstream task.")
    parser.add_argument("--model_type",
                        choices=["ViT-B_16", "ViT-B_32", "ViT-L_16", "ViT-L_32", "ViT-H_14",
                                 "R50-ViT-B_16", "ViT-S_16", "R26+ViT-S_32", "R+ViT-Ti_16"],
                        default="ViT-B_16",
                        help="Which variant to use.")
    parser.add_argument("--pretrained_dir", type=str, default="checkpoint/ViT-B_16.npz",
                        help="Where to search for pretrained ViT models.")
    parser.add_argument("--output_dir", default="output", type=str,
                        help="The output directory where checkpoints will be written.")
    parser.add_argument("--img_size", default=224, type=int,
                        help="Resolution size")
    parser.add_argument("--train_batch_size", default=512, type=int,
                        help="Total batch size for training.")
    parser.add_argument("--eval_batch_size", default=64, type=int,
                        help="Total batch size for eval.")
    parser.add_argument("--eval_every", default=100, type=int,
                        help="Run prediction on validation set every so many steps. "
                             "Will always run one evaluation at the end of training.")
    parser.add_argument("--learning_rate", default=3e-2, type=float,
                        help="The initial learning rate for SGD.")
    parser.add_argument("--weight_decay", default=0, type=float,
                        help="Weight decay if we apply some.")
    parser.add_argument("--num_steps", default=10000, type=int,
                        help="Total number of training steps to perform.")
    parser.add_argument("--decay_type", choices=["cosine", "linear"], default="cosine",
                        help="How to decay the learning rate.")
    parser.add_argument("--warmup_steps", default=500, type=int,
                        help="Step of training to perform learning rate warmup for.")
    parser.add_argument("--max_grad_norm", default=1.0, type=float,
                        help="Max gradient norm.")
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="local_rank for distributed training on gpus")
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for initialization")
    parser.add_argument("--gradient_accumulation_steps", type=int, default=1,
                        help="Number of updates steps to accumulate before performing a backward/update pass.")
    parser.add_argument("--fp16", action="store_true",
                        help="Whether to use 16-bit float precision instead of 32-bit")
    parser.add_argument("--fp16_opt_level", type=str, default="O2",
                        help="For fp16: Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']. "
                             "See details at https://nvidia.github.io/apex/amp.html")
    parser.add_argument("--loss_scale", type=float, default=0,
                        help="Loss scaling to improve fp16 numeric stability. Only used when fp16 set to True.\n"
                             "0 (default value): dynamic loss scaling.\n"
                             "Positive power of 2: static loss scaling value.\n")
    args = parser.parse_args()

    # Setup CUDA, GPU & distributed training
    if args.local_rank == -1:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        args.n_gpu = torch.cuda.device_count()
    else:
        # Initializes the distributed backend, which takes care of synchronizing nodes/GPUs
        torch.cuda.set_device(args.local_rank)
        device = torch.device("cuda", args.local_rank)
        torch.distributed.init_process_group(backend="nccl",
                                             timeout=timedelta(minutes=60))
        args.n_gpu = 1  # one GPU per process in distributed mode
    args.device = device

    # Setup logging
    logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
                        datefmt="%m/%d/%Y %H:%M:%S",
                        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN)
    logger.warning("Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s" %
                   (args.local_rank, args.device, args.n_gpu, bool(args.local_rank != -1), args.fp16))

    # Set seed
    set_seed(args)

    # Model & Tokenizer Setup
    args, model = setup(args)

    # Training
    train(args, model)
    save_model(args, model)


if __name__ == "__main__":
    main()

➢ Program for displaying the attention map

import typing
import io
import os

import torch
import numpy as np
import cv2
import matplotlib.pyplot as plt
from urllib.request import urlretrieve
from PIL import Image
from torchvision import transforms, datasets
from torch.utils.data import DataLoader

from models.modeling import VisionTransformer, CONFIGS

# Prepare the model (vis=True makes the forward pass also return attention weights)
config = CONFIGS["R50-ViT-B_16"]
model = VisionTransformer(config, num_classes=10, zero_head=False, img_size=224, vis=True)
PATH = 'model /Finetune_Chest14_R50B16_enhance.bin'
model.load_state_dict(torch.load(PATH))
model.eval()

VinXray_labels = dict(enumerate(open(' /VinXrayLabel.txt')))

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

im = Image.open("Images/051c2436c0acdb5e09d085c7e4a764f3.jpg")


def get_attention_map(img, get_mask=False):
    x = transform(img)
    logits, att_mat = model(x.unsqueeze(0))
    print('logits =', logits)

    # Stack the per-layer attention tensors and drop the batch dimension
    att_mat = torch.stack(att_mat).squeeze(1)

    # Average the attention weights across all heads
    att_mat = torch.mean(att_mat, dim=1)

    # To account for residual connections, we add an identity matrix to the
    # attention matrix and re-normalize the weights
    residual_att = torch.eye(att_mat.size(1))
    aug_att_mat = att_mat + residual_att
    aug_att_mat = aug_att_mat / aug_att_mat.sum(dim=-1).unsqueeze(-1)

    # Recursively multiply the weight matrices
    joint_attentions = torch.zeros(aug_att_mat.size())
    joint_attentions[0] = aug_att_mat[0]
    for n in range(1, aug_att_mat.size(0)):
        joint_attentions[n] = torch.matmul(aug_att_mat[n], joint_attentions[n - 1])

    # Attention of the class token (row 0) over the image patches in the last layer
    v = joint_attentions[-1]
    grid_size = int(np.sqrt(aug_att_mat.size(-1)))
    mask = v[0, 1:].reshape(grid_size, grid_size).detach().numpy()
    if get_mask:
        result = cv2.resize(mask / mask.max(), img.size)
    else:
        mask = cv2.resize(mask / mask.max(), img.size)[..., np.newaxis]
        # Convert to a 3-channel array so the mask also broadcasts over grayscale inputs
        result = (mask * np.asarray(img.convert("RGB"))).astype("uint8")
    return result


def plot_attention_map(original_img, att_map):
    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(16, 16))
    ax1.set_title('Original')
    ax2.set_title('Attention Map Last Layer')
    _ = ax1.imshow(original_img)
    _ = ax2.imshow(att_map)


result = get_attention_map(im)
plot_attention_map(im, result)

# Check mask for Attention Map
check_mask = get_attention_map(im, True)
plot_attention_map(im, check_mask)

➢ Program for displaying Grad-CAM

from pytorch_grad_cam import GradCAM, ScoreCAM, GradCAMPlusPlus, AblationCAM, XGradCAM, EigenCAM, LayerCAM, FullGrad
from pytorch_grad_cam.utils.image import show_cam_on_image, preprocess_image, deprocess_image
from pytorch_grad_cam import GuidedBackpropReLUModel
import torch
import torch.nn as nn
from torchvision import models
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load model
model = torch.load('model/ resnet50_ft_VinXray10class.pth')
model.eval()

# Select the target layer
target_layer = [model.layer4[-1]]

image_path = 'Images/1b2a7adb5705d9e3f5b63939046d93c7.jpg'
img = cv2.imread(image_path, 1)[:, :, ::-1]  # cv2 reads BGR; reverse the channels to get RGB
img = cv2.resize(img, (224, 224))
img = np.float32(img) / 255
input_tensor = preprocess_image(img, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

# Construct the CAM object once, and then re-use it on many images:
cam = GradCAM(model=model, target_layers=target_layer, use_cuda=True)


def show_cam(mask: np.ndarray, use_rgb: bool = False,
             colormap: int = cv2.COLORMAP_JET) -> np.ndarray:
    """Render the CAM mask as a standalone heatmap (no overlay on the image).

    By default the heatmap is in BGR format.

    :param mask: The CAM mask
    :param use_rgb: Whether to use an RGB or BGR heatmap; set this to True
        if the image it will be shown with is in RGB format
    :param colormap: The OpenCV colormap to be used
    :returns: The heatmap as a uint8 image
    """
    heatmap = cv2.applyColorMap(np.uint8(255 * mask), colormap)
    if use_rgb:
        heatmap = cv2.cvtColor(heatmap, cv2.COLOR_BGR2RGB)
    heatmap = np.float32(heatmap) / 255
    # Sanity check on the module-level image `img` loaded above
    if np.max(img) > 1:
        raise Exception("The input image should be np.float32 in the range [0, 1]")
    heatmap = heatmap / np.max(heatmap)
    return np.uint8(255 * heatmap)


# Show cam
target_category = None
cam.batch_size = 32

# Compute the CAM
grayscale_cam = cam(input_tensor=input_tensor,
                    target_category=target_category,
                    aug_smooth=True,
                    eigen_smooth=True)

# grayscale_cam is a batch of results; select one to display and save
grayscale_cam = grayscale_cam[0, :]
visualization = show_cam(grayscale_cam, use_rgb=False)
cv2.imwrite('cam_image.jpg', visualization)
cam_image = Image.open('cam_image.jpg')
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 12))
ax1.set_title('Original')
ax2.set_title('Grad-cam')
_ = ax1.imshow(img)
_ = ax2.imshow(cam_image)
plt.show()

# Show cam on image
visualization = show_cam_on_image(img, grayscale_cam, use_rgb=False)
cv2.imwrite('cam_image.jpg', visualization)
cam_image = Image.open('cam_image.jpg')
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 12))
ax1.set_title('Original')
ax2.set_title('Grad-cam')
_ = ax1.imshow(img)
_ = ax2.imshow(cam_image)
plt.show()

➢ Program for quantitative evaluation of the model's classification results:

import numpy as np
import pandas as pd


def check_FileName(GTData, PDData):
    """Return True if every predicted image name also appears in the ground truth."""
    GTNAME = GTData['ImageName'].to_list()
    PDNAME = PDData['ImageName'].to_list()
    count = 0
    for i in range(len(PDNAME)):
        if PDNAME[i] in GTNAME:
            count = count + 1
    return len(PDNAME) == count


def check_Col(GTData, PDData):
    """Return True if both tables have the same columns in the same order."""
    GT_Col = GTData.columns
    PD_Col = PDData.columns
    count = 0
    for i in range(len(GT_Col)):
        if GT_Col[i] == PD_Col[i]:
            count = count + 1
    return len(GT_Col) == count


class Evaluator:
    def __init__(self, GTfile, PDFile):
        self.GTname = GTfile
        self.GTData = pd.read_csv(self.GTname)
        self.PDname = PDFile
        self.PDData = pd.read_csv(self.PDname)
        if not check_FileName(self.GTData, self.PDData):
            raise ValueError("File name check did not pass")
        if not check_Col(self.GTData, self.PDData):
            raise ValueError("Column check did not pass")

    def get_result(self, thre=0.7, k=(1, 3, 5)):
        FileName = self.PDData['ImageName'].to_list()
        col = self.PDData.columns[2:]
        PreCision = []
        ReCall = []
        ACC_k = []
        for filename in FileName:
            # print(filename)
            gt = self.GTData[self.GTData['ImageName'] == filename][col].to_numpy()
            pred = self.PDData[self.PDData['ImageName'] == filename][col].to_numpy()

            # A class counts as predicted if its score is at least
            # thre times the highest score for this image
            The = pred.max() * thre
            pred_thd = pred >= The
            pred_thd = pred_thd.astype('float')
            precision = np.sum(gt * pred_thd) / np.sum(pred_thd)
            recall = np.sum(gt * pred_thd) / np.sum(gt)
            PreCision.append(precision)
            ReCall.append(recall)

            # Top-k accuracy: 1.0 if any of the k highest-scoring classes is a true label
            A_temp = []
            y_sort = np.argsort(pred)
            for i in range(len(k)):
                y_temp = np.zeros_like(gt)
                K = k[i]
                for j in range(K):
                    y_temp[0][int(y_sort[0][-(j + 1)])] = 1.0
                r = np.sum(y_temp * gt) > 0
                if r:
                    A_temp.append(1.0)
                else:
                    A_temp.append(0.0)
            ACC_k.append(A_temp)

        # Average the per-image metrics over the whole prediction file
        self.Pre_R = np.mean(np.array(PreCision))
        self.ReC_R = np.mean(np.array(ReCall))
        self.ACC_R = np.mean(np.array(ACC_k), axis=0)


if __name__ == "__main__":
    ...

... communication and management of medical imaging information and related data. DICOM is widely used to store and transmit medical images, allowing the integration of medical imaging devices such as scanners, servers, workstations, printers, ...

... medical images. It can be seen that medical image datasets are smaller than general image datasets in both the number of samples and the number of classes.

Table 1.1: Comparison of general image datasets and medical image datasets
Image dataset ...
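
The per-image metrics computed in Evaluator.get_result in the evaluation program above can be illustrated on a single toy image. The sketch below is a minimal pure-Python version under stated assumptions: per_image_metrics is a hypothetical helper name, the labels and scores are made-up values, and the guards against empty denominators are an addition that the original program does not have.

```python
def per_image_metrics(gt, pred, thre=0.7, ks=(1, 3, 5)):
    """Thresholded precision/recall and top-k hit flags for ONE image.

    gt   : list of 0/1 ground-truth labels, one per class
    pred : list of predicted scores, one per class (same length as gt)
    """
    # A class counts as predicted if its score reaches thre * (highest score)
    cutoff = max(pred) * thre
    picked = [1.0 if p >= cutoff else 0.0 for p in pred]

    hits = sum(g * s for g, s in zip(gt, picked))
    n_picked = sum(picked)
    n_true = sum(gt)
    # Assumption: return 0.0 instead of dividing by zero
    precision = hits / n_picked if n_picked else 0.0
    recall = hits / n_true if n_true else 0.0

    # Top-k: 1.0 if any of the k highest-scoring classes is a true label
    order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
    topk = [1.0 if any(gt[i] for i in order[:k]) else 0.0 for k in ks]
    return precision, recall, topk


# Toy example with 5 classes (values are illustrative only)
gt = [0, 1, 0, 1, 0]
pred = [0.1, 0.6, 0.9, 0.2, 0.05]
print(per_image_metrics(gt, pred, thre=0.6))
# → (0.5, 0.5, [0.0, 1.0, 1.0])
```

With thre=0.6 the cutoff is 0.54, so classes 1 and 2 are predicted; only class 1 is a true label, which gives precision 0.5 and recall 0.5. The highest-scoring class is not a true label, so the top-1 flag is 0.0, while the top-3 and top-5 flags are 1.0. Averaging these per-image values over all images gives the dataset-level figures.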