DR-Unet: Rethinking the ResUnet++ Architecture with Dual ResPath Skip Connection for Nuclei Segmentation


2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

Nhat-Minh Le (minh.ln181647@sis.hust.edu.vn), Dinh-Hung Le (hung.ld181504@sis.hust.edu.vn), Van-Truong Pham* (truong.phamvan@hust.edu.vn), Thi-Thao Tran (thao.tranthi@hust.edu.vn)
Faculty of Automation Engineering, School of Electrical and Electronics Engineering, Hanoi University of Science and Technology

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Abstract: Nuclei segmentation is a crucial stage in the analysis of cell microscopy images. By identifying nuclei, researchers can identify and characterize each cell in a sample. Several encoder-decoder models, such as U-Net, Multi ResUnet, DoubleUnet, and ResUnet++, have been applied to the Data Science Bowl 2018 dataset with excellent results. However, in ResUnet++ there is still a semantic gap between the features that are passed directly from the encoder to the decoder, and the extraction of information across many different regions is still limited. To improve the performance of ResUnet++ on this segmentation task, we propose a new architecture that uses Double ResPath (DR), called Double ResPath Unet (DR-Unet). The DR-Unet architecture retains some of the elements that made ResUnet++ successful, such as the residual block combined with a squeeze and excitation block. In addition, we pass the encoder features through ResPath, which bridges the semantic gap, instead of combining the encoder and decoder features directly. Moreover, we use Progressive Atrous Spatial Pyramidal Pooling (PASPP) in place of ASPP to capture contextual information more efficiently. Experimental results demonstrate that DR-Unet outperforms ResUnet++, DoubleUnet, and other models in the benchmark.

Index Terms: Deep Learning, Nuclei Segmentation, Image Segmentation, ResUnet++, Multi ResUnet

I. INTRODUCTION

Medical image segmentation is the task of segmenting objects of interest in medical images. More specifically, it is the task of labeling each pixel in a medical image. In this field, nuclei image segmentation has received considerable attention. Identifying the nuclei helps locate the cells under different conditions, which enables faster cures. It can also improve throughput for research and insight, reduce time-to-market for new drugs, and so on [1]. However, manual segmentation of cell nuclei is highly time-consuming and labor-intensive, and its accuracy depends heavily on the expertise of the annotators. Automatic nucleus segmentation is therefore an essential requirement.

The Otsu-based method [2], the watershed method [3], and active contours [4] are only a few of the traditional image segmentation approaches that have been applied to this problem. Most of these approaches, however, are inefficient, time-consuming, and computationally demanding. Another noteworthy factor is that the nuclei in the 2018 Data Science Bowl dataset [5] vary greatly in form, size, color, and border. Traditional techniques cannot distinguish certain cell nuclei because their borders are unclear.

Recently, with the vigorous development of Convolutional Neural Networks (CNNs), CNN-based image segmentation methods have shown superior performance compared to traditional methods in many segmentation tasks [6]. Long et al. proposed the Fully Convolutional Network (FCN) [7], one of the first deep learning architectures trained end-to-end for pixelwise prediction. FCN uses an encoder to extract features of the input image and a decoder to generate a segmentation mask from those features. Unet [8] is another popular method. Since then, many image segmentation architectures with encoder-decoder structures have been released with good results. ResUnet++ [9], developed from ResUnet, takes advantage of residual units, squeeze and excitation units, Atrous Spatial Pyramidal Pooling (ASPP), and attention units, showing great potential in medical image segmentation. Double Unet [10] uses two U-Net [8] architectures in sequence, with two encoders and two decoders. Multi ResUnet [11], an enhanced version of the standard U-Net architecture, uses the Multi ResBlock to capture more spatial information and ResPath to reduce the semantic gap between each pair of parallel features passed from the encoder to the decoder.

The above models have proven effective in nuclei segmentation, especially on the 2018 Data Science Bowl dataset [5]. We aim to inherit and combine the strengths of these models and develop a new architecture that achieves better results. With that motivation, this paper develops a novel architecture for medical image segmentation that uses a Double ResPath scheme, called DR-Unet, inspired by the ResUnet++ [9] architecture. We tested the model on the 2018 Data Science Bowl dataset [5]. The results indicate that the improved model is efficient and performs well compared to ResUnet++ and other models in the benchmark.

The paper is organized as follows. In Section II, we review some related work. The proposed model is presented in Section III. Experimental results on the 2018 Data Science Bowl dataset are given in Section IV. Finally, we summarize the paper and discuss future work and limitations in Section V.

II. RELATED WORK

Deep learning-based algorithms have recently been widely used in medical imaging applications such as image super-resolution, classification, and especially segmentation. Along with the development of deep learning, image segmentation has also achieved remarkable results. Many deep learning-based techniques have been used to segment cells and nuclei. However, some problems remain open.

Theoretically, the deeper the model, the higher the accuracy [12], [13]. However, a degradation problem appears when the model reaches a certain depth [14], [15]. He et al. suggested a residual learning framework [15] to overcome this problem and increase the depth of the model. Residual blocks have a simple architecture but allow deeper models to be trained without the degradation problem.

The Squeeze and Excitation (SE) block [17] aims to improve the quality of convolutional neural networks by recalibrating the features of each channel, enhancing the information shared between channels, and selectively emphasizing the channels that contain more important features. For any set of convolution layers, we can construct a corresponding SE block to recalibrate the feature maps. This is achieved in two steps:
• Features are passed through a transformation, usually several convolutional layers, which extracts the feature maps. These feature maps go through a squeeze function, which generates a descriptor for each channel.
• The excitation function is applied right after that. It takes the squeezed descriptors as input and computes weights that describe the dependence between the channels. These weights are then multiplied by the feature maps obtained before to emphasize the features that matter for the problem.

Medical image segmentation usually faces two issues. The first is that as the model becomes deeper, the resolution of the features is reduced by the repeated application of pooling layers, which makes it challenging to extract spatial information. The second is that the objects of interest have different scales. Chen et al. [18], [19] proposed Atrous Spatial Pyramid Pooling (ASPP) to deal with these problems. ASPP combines the outputs of atrous convolutions with different dilation rates to capture more global information while keeping the size of the feature map the same. In [20], Yan et al. proposed a Progressive ASPP (PASPP) model based on ASPP. PASPP still uses atrous convolution layers with different dilation rates, but the output features are not combined all at once; instead, they are combined gradually across different receptive fields. Fig. 1 depicts the architecture of the PASPP block.

Fig. 1. PASPP architecture with atrous convolutional layer.

ResPath was introduced in [11] by Ibtehaz and Rahman. It helps to reduce the semantic gap between the Encoder and the Decoder. In the U-net architecture [8], Ronneberger et al. proposed a shortcut link between the convolutional layers immediately before the MaxPooling layer in the Encoder and the convolutional layers immediately after the corresponding deconvolution layer in the Decoder. This allows the Encoder to send the Decoder information that would otherwise be lost in the pooling operations. However, Ibtehaz and Rahman [11] pointed out that a problem with the simple skip connection is the information imbalance between the features in the Encoder and the Decoder. The Encoder features are considered low-level compared to the Decoder features, because the Decoder features are computed in deeper layers of the network. Directly combining these features can therefore introduce discrepancies that adversely affect the segmentation results. To address this problem, Ibtehaz and Rahman [11] devised the "Res Path", which places several convolutional layers along the shortcut connection to lessen the information gap between the Encoder and the Decoder, as shown in Fig. 2.

Fig. 2. ResPath.

III. DR-UNET ARCHITECTURE

In the current work, we propose a new architecture that uses Double ResPath (DR), called Double ResPath Unet (DR-Unet), for nuclei segmentation. The architecture of the model is presented in Fig. 3. To extract feature information in the Encoder, we first employ one Stem block, followed by three SE blocks interspersed with three Residual blocks. The first PASPP block is used to collect information about multi-scale objects efficiently. In the Decoder, we build three Residual blocks, each taking as input the output of the previous block combined with information from the Double Res Path. Finally, we use a combination of the second PASPP block, which has six atrous convolutional layers with higher dilation rates, a 1×1 2D convolution layer, and a Sigmoid activation function to generate the output mask.

Fig. 3. Proposed DR-Unet architecture.

A. Residual block

Each Residual block consists of two successive 3×3 convolutional blocks. In [16], He et al. demonstrated that using Batch Normalization and ReLU activation as pre-activation is surprisingly effective. For this reason, we employ a convolutional block made of a batch normalization layer, a ReLU activation layer, and a convolutional layer. A 1×1 convolutional layer is applied on the shortcut that connects the input and output of the encoder block. A strided convolution layer is applied at the first convolutional layer of the encoder block to halve the spatial dimension of the feature maps. In the proposed architecture, the Squeeze and Excitation blocks are stacked together with the Residual blocks to increase effective generalization and improve the performance of the network [9].

B. Progressive Atrous Spatial Pyramid Pooling (PASPP)

Because of the potential and efficiency of PASPP, we adopt two PASPP blocks in the ResUnet++ architecture. Since the input features of the first PASPP block are 32×32, we employ four layers of atrous convolution with dilation rates of 1, 2, 4, respectively, similar to what Yan et al. [20] did in their model. In the second PASPP block, the input size is 256×256, and we believe that using more atrous convolutional layers and higher dilation rates leads to a more informed final result. For that reason, we adopt atrous convolutions with dilation rates of 1, 2, 4, 8, 16, 32, respectively.

C. Double ResPath

Inspired by ResPath and the ResUnet++ model, we propose a new shortcut called "Double Res Path". Double Res Path is illustrated in the corresponding figure. In the ResUnet++ architecture [9], the Attention block combines low-level features in the Encoder with high-level features in the Decoder to identify which parts of the network need more attention. We believe that concatenating the output of ResPath with the decoder feature both before and after the upsampling step preserves context information most accurately. We also find that when ResPath blocks are used, the attention block still helps, but only marginally; removing it from the model significantly reduces the computational cost while retaining good results. Furthermore, since the semantic gap between the Encoder and the Decoder shrinks in the deeper layers of the network, we gradually reduce the number of convolutional blocks along the Double ResPath [11] to 3, 2, 1, respectively. In line with the number of filters at the two ends of each Double ResPath, we set the number of filters in the Double ResPath layers to 64, 128, 256, respectively.

IV. EXPERIMENT AND RESULTS

A. Datasets

In the Data Science Bowl 2018 challenge [5], scientists worldwide were challenged to detect cells in a series of microscopy images using machine learning techniques. The primary task is to find image segmentation algorithms that can be applied to a large number of test images without any human intervention. Such methods can shorten the time needed to analyze images, allowing future researchers to run more experiments for research and clinical application. The Data Science Bowl 2018 dataset contains 670 training pairs and 65 testing pairs, where each pair includes an image and its corresponding masks [1]. The dataset includes five types of cell images, in different proportions: small fluorescent, purple tissue, pink and purple tissue, large fluorescent, and grayscale tissue [5].

B. Training

We implemented our model and used the Adam algorithm to optimize its trainable parameters, with a learning rate of 1×10^-4. Training runs over the dataset for 200 epochs with batch size [.]. Early stopping and ReduceLROnPlateau were also used.

C. Evaluation Metrics

To assess the segmentation performance of the network, we use the Dice Similarity Coefficient (DSC), defined as

$$\mathrm{DSC} = \frac{2 \times TP}{FN + FP + 2 \times TP} \tag{1}$$

where TP, TN, FP, FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. In addition to DSC, we use the Intersection over Union (IoU) index as an alternative evaluation measure, defined as

$$\mathrm{IoU} = \frac{TP}{TP + FN + FP} \tag{2}$$

D. Results

Fig. 4 presents representative segmentations produced by the proposed approach on the 2018 Data Science Bowl challenge dataset. As the figure shows, the predicted masks are in good agreement with the ground truths. The learning curves obtained from the training and validation sets are shown in Fig. 5. From this figure we can see that the DSC and IoU, as well as the accuracy, are stable after 100 epochs.

Fig. 4. Some representative segmentation results of DR-Unet on nuclei images from the 2018 Data Science Bowl challenge dataset.

Fig. 5. Learning curve.

For quantitative assessment, Table I reports the DSC and IoU scores of the proposed model and other state-of-the-art models. Using PASPP increases the average Dice score by 1.17 (to 92.40) and the IoU score by 1.5 compared to ASPP (91.23 Dice score and 85.20 IoU score). The comparative methods are Double Unet, Multi ResUnet, and ResUnet++. As shown in the table, our approach obtains the highest scores for both DSC and IoU. In addition, the proposed model has fewer trainable parameters (2.6M) than ResUnet++ (4.1M), Multi ResUnet (7.3M), and DoubleUnet (29.3M).

TABLE I
RESULT ON NUCLEI SEGMENTATION FROM 2018 DATA SCIENCE BOWL CHALLENGE

Method              Params        Dice    IoU
Double Unet         29,297,573    91.33   85.07
Multi ResUnet        7,275,844    90.92   84.40
ResUnet++            4,101,681    91.02   84.77
Ours (use PASPP)     2,560,973    92.40   86.70
Ours (use ASPP)      3,584,497    91.23   85.20

V. CONCLUSION

In this paper, we have proposed the DR-Unet architecture for nuclei image segmentation. In this novel model, we take advantage of Residual blocks and SE blocks. Furthermore, we replaced ASPP with PASPP, allowing more efficient extraction of semantic context, and developed the Double Res Path to reduce the semantic gap between features when they are combined. In our experiments, DR-Unet outperformed several previous state-of-the-art models on the 2018 Data Science Bowl dataset. This result also demonstrates the potential of DR-Unet in cell image segmentation. Nevertheless, we believe the DR-Unet architecture can be made even better. In the future, besides developing the DR-Unet model, we will also develop appropriate loss functions for image segmentation to achieve even higher performance.

ACKNOWLEDGMENT

This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005.

REFERENCES

[1] "2018 Data Science Bowl | Kaggle", Kaggle.com, 2021. [Online]. Available: https://www.kaggle.com/c/data-science-bowl-2018/overview [Accessed: 11-Nov-2021].
[2] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst. Man Cybern., vol. 9, pp. 62-66, 1979.
[3] C. Wählby, I.-M. Sintorn, F. Erlandsson, G. Borgefors, and E. Bengtsson, "Combining intensity, edge and shape information for 2D and 3D segmentation of cell nuclei in tissue sections," Journal of Microscopy, vol. 215, pp. 67-76, 2004.
[4] T. Hayakawa, V. B. Surya Prasath, H. Kawanaka, B. J. Aronow, and S. Tsuruoka, "Computational nuclei segmentation methods in digital pathology: A survey," Archives of Computational Methods in Engineering, vol. 28, pp. 1-13, 2021.
[5] J. C. Caicedo, A. Goodman, K. W. Karhohs, B. A. Cimini, J. Ackerman, M. Haghighi, C. Heng, T. Becker, M. Doan, C. McQuin, et al., "Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl," Nature Methods, vol. 16, no. 12, pp. 1247-1253, 2019.
[6] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis (MedIA), vol. 42, pp. 60-88, 2017.
[7] E. Shelhamer, J. Long, and T. Darrell, "Fully convolutional networks for semantic segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640-651, 2017. doi: 10.1109/tpami.2016.2572683.
[8] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[9] D. Jha, P. H. Smedsrud, M. A. Riegler, D. Johansen, T. De Lange, P. Halvorsen, and H. D. Johansen, "Resunet++: An advanced architecture for medical image segmentation," in Proceedings of IEEE International Symposium on Multimedia (ISM), 2019, pp. 225-2255.
[10] D. Jha, M. Riegler, D. Johansen, P. Halvorsen, and H. Johansen, "Doubleunet: A deep convolutional neural network for medical image segmentation," in IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), 2020, pp. 558-564.
[11] N. Ibtehaz and M. Rahman, "MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation," Neural Networks, vol. 121, pp. 74-87, 2020.
[12] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556, 2014.
[13] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv:1409.4842, 2014.
[14] Z. Zhang, Q. Liu, and Y. Wang, "Road extraction by deep residual unet," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 749-753, 2018.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in ECCV, 2016.
[17] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132-7141.
[18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," arXiv:1606.00915, 2016.
[19] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," CoRR, abs/1706.05587, 2017.
[20] Q. Yan, B. Wang, D. Gong, C. Luo, W. Zhao, J. Shen, Q. Shi, S. Jin, L. Zhang, and Z. You, "COVID-19 chest CT image segmentation: a deep convolutional neural network solution," arXiv preprint arXiv:2004.10987, 2020.
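
Illustrative sketch for the evaluation metrics (Section IV-C). The Dice and IoU definitions in Eqs. (1) and (2) can be computed from pixel-level confusion counts roughly as follows. This is a minimal sketch, not the authors' evaluation code: the function names, the flat 0/1 mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made here for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives.

    Both inputs are flat sequences of 0/1 pixel labels of equal length.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice_score(tp: int, fp: int, fn: int) -> float:
    """Eq. (1): DSC = 2*TP / (FN + FP + 2*TP)."""
    denom = fn + fp + 2 * tp
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def iou_score(tp: int, fp: int, fn: int) -> float:
    """Eq. (2): IoU = TP / (TP + FN + FP)."""
    denom = tp + fn + fp
    # Same assumed convention for empty masks as in dice_score.
    return 1.0 if denom == 0 else tp / denom


# Toy example: six pixels, hypothetical prediction and ground truth.
toy_pred = [1, 1, 0, 0, 1, 0]
toy_truth = [1, 0, 0, 1, 1, 0]
toy_tp, toy_fp, toy_fn = confusion_counts(toy_pred, toy_truth)
toy_dice = dice_score(toy_tp, toy_fp, toy_fn)
toy_iou = iou_score(toy_tp, toy_fp, toy_fn)
```

Neither metric uses the true negatives, so large background regions do not inflate the score, which matters for images where nuclei cover only a small part of the field of view.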
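
Illustrative sketch for the dilation rates (Sections II and III-B). The statement that atrous convolutions enlarge the captured context while keeping the feature-map size unchanged follows from standard dilated-convolution arithmetic, sketched below. The 3×3 kernel size, stride 1, and "same"-style padding are assumptions for illustration; the paper does not state the kernel size or padding used inside the PASPP blocks, and the helper names are hypothetical.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one axis: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def same_padding(kernel: int, dilation: int) -> int:
    """Padding per side that preserves the size for stride 1 and an odd kernel."""
    return dilation * (kernel - 1) // 2


def conv_output_size(size: int, kernel: int, dilation: int, padding: int, stride: int = 1) -> int:
    """Standard output-size formula for a (dilated) convolution along one axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Dilation rates quoted for the second PASPP block (256x256 input).
SECOND_BLOCK_RATES = [1, 2, 4, 8, 16, 32]

# Effective span of an assumed 3x3 kernel at each rate.
spans = {d: effective_kernel(3, d) for d in SECOND_BLOCK_RATES}

# Output size of each branch on a 256-pixel axis with size-preserving padding.
sizes = {d: conv_output_size(256, 3, d, same_padding(3, d)) for d in SECOND_BLOCK_RATES}
```

Under these assumptions the span of a single branch grows from 3 to 65 pixels as the rate goes from 1 to 32, with no extra weights per branch, which is consistent with the motivation given for using higher rates on the larger 256×256 input.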
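
Illustrative sketch for the Squeeze and Excitation steps (Section II). The two steps described there can be written out on plain nested lists as below. This is a sketch under stated assumptions, not the model's implementation: the squeeze is taken to be global average pooling and the excitation a two-layer fully connected network with ReLU and sigmoid, as in the original SE design [17]; the weights, shapes, and function names are hypothetical.

```python
import math
from typing import List

Channel = List[List[float]]  # one 2D feature map


def squeeze(feature_maps: List[Channel]) -> List[float]:
    """Squeeze step: global average pooling, one scalar per channel."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def excite(z: List[float], w1, b1, w2, b2) -> List[float]:
    """Excitation step: FC -> ReLU -> FC -> sigmoid, one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_block(feature_maps: List[Channel], w1, b1, w2, b2) -> List[Channel]:
    """Rescale every channel by its learned gate."""
    gates = excite(squeeze(feature_maps), w1, b1, w2, b2)
    return [
        [[g * v for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]


# Toy input: two 2x2 channels and hand-picked (hypothetical) weights.
toy_maps = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
toy_w1, toy_b1 = [[1.0, 1.0]], [0.0]             # 2 channels -> 1 hidden unit
toy_w2, toy_b2 = [[0.0], [0.0]], [0.0, 0.0]      # 1 hidden unit -> 2 gates
toy_out = se_block(toy_maps, toy_w1, toy_b1, toy_w2, toy_b2)
```

With the zero second-layer weights chosen here, every gate equals sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead produce different gates per channel, which is the channel-wise recalibration the text describes.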

Posted: 22/02/2023, 22:48