2021 8th NAFOSTED Conference on Information and Computer Science (NICS)

A modified FCN-based method for Left Ventricle endocardium and epicardium segmentation with new block modules

Do-Hai-Ninh Nham
School of Applied Mathematics and Informatics, Hanoi University of Science and Technology
ninh.ndh182714@sis.hust.edu.vn

Minh-Nhat Trinh
School of Electrical and Electronic Engineering, Hanoi University of Science and Technology
minhnhattrinh312@gmail.com

Tien-Thanh Tran
Smart Health Center, Vingroup Big Data Institute
trantienthanh081298@gmail.com

Van-Truong Pham
School of Electrical and Electronic Engineering, Hanoi University of Science and Technology
truong.phamvan@hust.edu.vn

Thi-Thao Tran
School of Electrical and Electronic Engineering, Hanoi University of Science and Technology
thao.tranthi@hust.edu.vn

Abstract: Cardiac segmentation of magnetic resonance images is crucial for the diagnosis of cardiac problems. Motivated by the increasing demand for advanced procedures for cardiac disease diagnosis and inspired by the structure of the receptive field block, in this paper we propose a new block module and assemble it into a deep fully convolutional neural network for automated left ventricle segmentation. With only one learning stage, our proposed model is trained end-to-end, pixels-to-pixels, and validated on two popular cardiac MRI benchmarks, the ACDC and SunnyBrook datasets. Several experiments show that our new model architecture performs better than several previous segmentation methods, with enhanced feature discriminability and robustness, despite having far fewer trainable parameters.

Index Terms: kernel size; dilation rate; fully convolutional neural network; cardiac MRI segmentation

I. INTRODUCTION

Cardiac magnetic resonance imaging (MRI) has been extensively used to interpret cardiovascular function. Segmenting the left ventricle (LV) from these MRI images is an imperative step in diagnosing cardiac problems, as
much information, such as systolic and diastolic issues, can then be observed sooner [1]. There is thus a pressing need for better LV segmentation methods. As manual clinical approaches can be tedious and prone to human error, automated approaches such as deep learning-based (DL) approaches [2], [3] and active contour models (ACMs) [4], [5] have proved their efficiency.

978-1-6654-1001-4/21/$31.00 ©2021 IEEE

Among ACM methods, the level set-based active contour model is effective in enabling the curve to update its topology throughout the segmentation process. Despite this benefit, it has a clear deficiency: unless the contour initialization is carefully constructed, the final result may carry less acceptable object boundary information.

Deep learning-based methods have also shown remarkable success in LV segmentation. Tran [6] employed a deep fully convolutional neural network (FCN) for cardiac MRI segmentation. In that paper, traditional fully connected layers are replaced with convolutional layers to obtain a coarse inference map, which is then upsampled by fractionally strided convolution (deconvolution) to yield temporally coherent, discriminative features and prominent results. Methods using U-Net [7] are also notable: thanks to skip connections, which allow deep, coarse, semantic feature maps from the encoder to propagate to the corresponding layers in the decoder, the output segmentation map is shown to be more accurate. Trinh et al. [3] combined the vanilla U-Net with Swish activation [8] and Squeeze-and-Excite blocks [9] for better parameter tuning and for highlighting the most salient boundary features. Tran et al. [10] formulated the issue as a two-stage problem and built a hybrid algorithm, in which the first phase hypothesizes raw segmentation features of all masks with a vanilla U-Net.
In the second phase, these features are used as initial level set functions for the multiphase active contour model (MPACM), which delivers consistently strong cardiac MRI contour prediction. However, all these previous deep network methods suffer from high computational cost, low inference speed and insufficient boundary information, the latter caused by the shortcomings of strided and dilated convolutions.

Motivated by these limitations, in this work we develop and validate a new approach for automatic contour detection in cardiac MR images. Our main contributions can be summarized as follows:

• We propose new block modules to optimize the receptive field on feature maps, aiming to enhance deep features, improve feature detection and correct occasional misalignment in MRI segmentation, particularly where cardiac MRI segmentation is sensitive to contour initialization.
• Using the FCN as backbone, we replace the FCN's convolutional block modules with our proposed modules before training the model end-to-end.
• We show that our new approach achieves state-of-the-art results on the ACDC and SunnyBrook datasets at real-time processing speed with only one training stage, and demonstrate the effectiveness of our notably light-weight model compared with other models.

II. RELATED WORK

Fully Convolutional Neural Network (FCN). FCN has been exploited in many proposals and applications. A convolutional network with fully convolutional inference was demonstrated for raw multi-class segmentation of particular tissues [11]. Within a convolutional network, multi-scale and sliding-window techniques can also be built for classification, localization and even detection [12]. Evolving from [12], FCN was adopted as a segmentation method [13], with the variants FCN-8s, FCN-16s and FCN-32s. FCN-8s and FCN-16s are validated to have more outstanding segmentation
performances, as the FCN-32s outcome is rougher due to the loss of spatial location and local information. Furthermore, the FCN in [6] was highlighted as the first to be employed for pixel-wise labelling (per-pixel classification) benchmarked on cardiac MRI datasets.

Receptive Field. In neuroscience, the receptive field is a region in the sensory periphery within which stimuli can influence the electrical activity of sensory cells [15]. Similarly, in a deep learning context, the Receptive Field (RF) is defined as a measure of the association of an output feature of any layer with the input region [16]. The Inception architecture [17] overcomes the hurdles of large variation in the location of salient content, overfitting and expensive computation by using multiple RF sizes (1 × 1, 3 × 3, 5 × 5). The network structure thus gets wider rather than deeper. Inception variants [18], [19] factorize convolutions of filter size n × n into a combination of 1 × n and n × 1 convolutions, attaining promising performance on object detection and classification. In the Deformable Convolutional Network (DCN) [20], the RF sizes of deformable filters are correlated with object scales and shapes, indicating that the deformation is learned effectively from the input image or feature maps. Compared with the larger but fixed dilation value of atrous convolution [21], deformable convolution applies a different dilation value to each point in the grid during convolution. In addition, DCN is reported to be an extremely light-weight Spatial Transformer Network [22], and it was the runner-up in the COCO Detection and Segmentation Challenge.

III. METHODOLOGY

A. The Proposed Block Module

In general, assuming that the input shape is n_h × n_w and the convolutional kernel shape is k_h × k_w, the output shape will be (n_h − k_h + 1) × (n_w − k_w + 1). Accordingly, the output shape of the convolutional layer is determined by the shape of the input and the shape of
the convolution kernel. Convolutional kernels with odd height and width (such as 1, 3, 5 or 7) are frequently used, since the spatial dimensionality can then be preserved by padding with the same number of rows on top and bottom and the same number of columns on left and right. Using odd kernels with such padding has a further benefit: for any two-dimensional tensor X, when the kernel size is odd and the padding is the same on all sides, the output has the same height and width as the input, and the output y[i, j] is computed by cross-correlating the input and the convolutional kernel with the window centered on x[i, j] [23].

The question is how to choose the most suitable kernels for the convolution. Salient parts of the input image can vary widely in shape and size, so selecting the right kernel size is difficult: a smaller kernel is preferred for locally distributed information, and a larger one for more global features. A very deep network makes it hard for gradients to propagate through the whole network during updates; in addition, naively stacking large convolutions is computationally expensive. Consequently, convolutional filters of several shapes at the same level need to be taken into account.

As noted above, an n × n convolution can be factorized into asymmetric 1 × n and n × 1 convolutions. For instance, a 3 × 3 convolution can be replaced by a 1 × 3 convolution followed by a 3 × 1 convolution covering the same receptive field, at a computational cost about 33% lower. Thus, instead of employing consecutive 3 × 3, 5 × 5, 7 × 7 and 9 × 9 operations, we merge 1 × 3 and 3 × 1, 1 × 5 and 5 × 1, 1 × 7 and 7 × 1, and 1 × 9 and 9 × 1 convolutions respectively, before applying an asymmetric convolution operation. One primary obstacle with the above modules is that even a modest number of such large convolutions could be prohibitively
expensive on top of a convolutional layer with a large number of filters. A 1 × 1 convolutional layer is used to overcome this problem, as it offers channel-wise pooling, also called feature map pooling or a projection layer. This simple technique with low-dimensional embeddings reduces dimensionality while retaining salient features; it also generates a one-to-one projection of the stack of feature maps, pooling features across channels after conventional pooling layers.

To perform well on MRI segmentation, the convolutions have to analyze both globally and locally distributed features. If ordinary discrete convolutions are applied throughout the network, large kernels are needed to achieve a global view, which makes the number of parameters surge. Therefore, we adopt dilated convolutions [24] at the asymmetric convolution layers to control the features' eccentricities, since they support exponential expansion of the RF without loss of resolution while the number of parameters grows only linearly.

Between two consecutive layers, an activation function and a Mean-Variance Normalization (MVN) layer [6] are used to limit the shift in pixel distribution right after a convolution. Compared with Batch Normalization [14], which reduces internal covariate shift and accelerates gradient flow through the network, MVN has a similar effect while being much simpler, as it only centers and standardizes a single batch at a time. As for the activation, Swish [8] is used instead of ReLU as the only activation function throughout training.

[Figure: Our modified FCN, based on the new receptive field block module]

[Figure: Our proposed block module (n = 2i + 1; i ∈ [1, 4])]

Overall, our proposed block modules are summarized in the block module figure. We create four block modules. In block
module i (i ∈ [1, 4]), there are four convolutional blocks whose kernel sizes are (1, 1), (1, n), (n, 1) and (n, n) respectively, where n = 2i + 1. All the dilation rates are set to (1, 1) except for the last one, which is (n − 2, n − 2).

B. Model Architecture

Assembling the new module into the FCN backbone [6], we construct an advanced one-stage segmentation model. Even with such a lightweight module, our modified FCN delivers results comparable to some of the state-of-the-art approaches while retaining a very fast speed. Our new 16-layer model architecture is displayed in the model architecture figure.

C. Loss function

Sigmoid activation is applied at the output layer to form the loss and the binary image; the output contains c = 2 labels, the total number of classes into which each pixel is classified. Let N be the total number of pixels in both the prediction and the ground truth, and let P and L be the predicted set and the ground truth set respectively, with |P| = |L| = N. P_ic and L_ic are the elements of P and L respectively, with i ∈ {1, 2, ..., N} and c ∈ {0, 1}; P_ic ∈ [0, 1] is the predicted label probability and L_ic ∈ {0, 1} is the ground truth label. Dice Loss is the loss function used in the training process, calculated as:

$$\text{Dice Loss} = \frac{\sum_{i=1}^{N} \left[ (P_{ic} + L_{ic}) - 2 P_{ic} L_{ic} \right]}{\sum_{i=1}^{N} (P_{ic} + L_{ic}) + \epsilon} \tag{1}$$

D. Evaluation Metrics

In medical image analysis, the Dice Similarity Coefficient (DSC) is a statistical tool for measuring the similarity between segmentation maps, while the Intersection over Union index (IoU) gauges the similarity and diversity of sample pixel sets. They are determined by:

$$\mathrm{DSC}(P, L) = \frac{\sum_{i=1}^{N} 2 P_{ic} L_{ic} + \epsilon}{\sum_{i=1}^{N} (P_{ic} + L_{ic}) + \epsilon} \tag{2}$$

$$\mathrm{IoU}(P, L) = \frac{\sum_{i=1}^{N} P_{ic} L_{ic} + \epsilon}{\sum_{i=1}^{N} (P_{ic} + L_{ic} - P_{ic} L_{ic}) + \epsilon} \tag{3}$$

The smoothing coefficient ϵ prevents division by zero; in our experiments ϵ is set to 1e−15. In a diagnostic test,
sensitivity (SEN) is the proportion of people diagnosed as positive among those who actually have the condition, while specificity (SPE) is the proportion of people diagnosed as negative among those who do not have the condition. In terms of the numbers of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP):

$$\mathrm{SEN} = \frac{TP}{TP + FN} \tag{4}$$

$$\mathrm{SPE} = \frac{TN}{TN + FP} \tag{5}$$

Accuracy (ACC) is also used as a statistical measure of how well a binary classification test correctly identifies or excludes a condition:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

IV. EXPERIMENTAL RESULTS

A. Benchmarks

There are 45 cine-MRI images in the Sunnybrook dataset [25] (2009 Cardiac MR Left Ventricle Segmentation Challenge data), comprising a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction and heart failure without infarction. A part of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI, held by a MICCAI workshop in 2009; the entire dataset is now available in the CAP database under a public domain license. There are three subsets of 15 cases each: training, validation and online, each with ground truth. The training set is used to train the model for LV endocardium and epicardium segmentation, while the validation and online sets are used for evaluation.

Released by the "Automatic Cardiac Diagnosis Challenge (ACDC)" workshop held in conjunction with the 20th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), the public ACDC dataset [26] comprises 4D cine-CMR scans of 100 patients, each with segmentation masks for the left ventricle (LV), the myocardium (Myo) and the right ventricle (RV) at the end-systolic (ES) and end-diastolic (ED) phases. The training database containing the manual segmentation masks is split into a training set and a test set with a ratio of 8:2 to assess the proposed model, before comparing the performance of several automatic methods on the segmentation of the left ventricular endocardium and epicardium for both end-diastolic and end-systolic instances.

B. Training

We applied our customized FCN to segment cardiac left ventricles on MRIs. Note that in each segmentation task on each dataset, we break the task down into sub-tasks for endocardium and epicardium contour prediction respectively. Our models are trained end-to-end; the cost is minimized over a number of epochs that varies by case, using the NADAM optimizer [27] with an initial learning rate of 0.001. The learning rate is multiplied by 0.5 every 10 epochs until it reaches 0.00001, and is then kept constant for the remainder of training. All images are pre-processed by center cropping and normalization, then augmented by random horizontal and vertical flips and by rotation within π radians. Training runs on an NVIDIA Tesla P100 16GB GPU.

C. Evaluation

TABLE I
PERFORMANCE COMPARISON OF VARIOUS MODELS ON THE ENDOCARDIUM OF THE SUNNYBROOK DATASET

Models        Size    DSC     IoU     SEN     SPE     ACC
Trinh [3]     32.4M   0.7850  0.6996  0.8991  0.9920  0.9746
FCN¹ [6]      10.9M   0.9156  0.8537  0.9336  0.9934  0.9866
Queirós [29]  -       0.90    -       -       -       -
U-Net [7]     31.0M   0.6123  0.5438  0.7294  0.9914  0.9731
Hu [30]       -       0.89    -       -       -       -
Ours          9.1M    0.9196  0.8580  0.9314  0.9942  0.9870
¹ w/o finetune and Xavier unit

TABLE II
PERFORMANCE COMPARISON OF VARIOUS MODELS ON THE EPICARDIUM OF THE SUNNYBROOK DATASET

Models        Size    DSC     IoU     SEN     SPE     ACC
Trinh [3]     32.4M   0.8620  0.7940  0.9277  0.9843  0.9726
FCN¹ [6]      10.9M   0.9466  0.9009  0.9542  0.9886  0.9814
Queirós [29]  -       0.94    -       -       -       -
U-Net [7]     31.0M   0.6896  0.6347  0.7150  0.9921  0.9568
Hu [30]       -       0.94    -       -       -       -
Ours          9.1M    0.9505  0.9100  0.9507  0.9910  0.9826
¹ w/o finetune and Xavier unit

We compare our algorithm with several other algorithms to
evaluate the efficiency of our new approach and report the mean value of each metric in the tables. The results in Table I and Table II show that our proposed method is the best on endocardium segmentation of the SunnyBrook dataset in terms of DSC (0.9196), IoU (0.8580), SPE (0.9942) and ACC (0.9870), and on epicardium segmentation of the SunnyBrook dataset in terms of DSC (0.9505), IoU (0.9100) and ACC (0.9826), with an SPE (0.9910) close to the highest value in Table II (0.9921, U-Net). This indicates that our proposed method is more confident in the predicted regions and provides more accurate boundary maps, even with an exceptionally light-weight backbone (9.1 million trainable parameters), compared with Tran et al. [10] (a two-stage architecture) and other popular architectures.

TABLE III
PERFORMANCE COMPARISON OF VARIOUS MODELS ON THE ENDOCARDIUM OF THE ACDC DATASET

Models        Size    DSC     IoU     SEN     SPE     ACC
FCN¹ [6]      10.9M   0.8576  0.8047  0.9272  0.9810  0.9798
FCN² [6]      10.9M   0.89    0.83    -       -       -
Trinh [3]     32.4M   0.9253  0.8964  0.9661  0.9966  0.9940
SegNet [28]   29.5M   0.8192  0.7470  0.9300  0.9829  0.9783
U-Net [7]     31.0M   0.8810  0.8225  0.9504  0.9909  0.9920
Ours          9.1M    0.8798  0.8360  0.9512  0.9852  0.9823
¹ w/o finetune and Xavier unit
² with finetune and Xavier unit

TABLE IV
PERFORMANCE COMPARISON OF VARIOUS MODELS ON THE EPICARDIUM OF THE ACDC DATASET

Models        Size    DSC     IoU     SEN     SPE     ACC
FCN¹ [6]      10.9M   0.9167  0.8803  0.9631  0.9947  0.9918
FCN² [6]      10.9M   0.92    0.89    -       -       -
SegNet [28]   29.5M   0.8896  0.8321  0.9526  0.9942  0.9907
U-Net [7]     31.0M   0.9189  0.8712  0.9618  0.9945  0.9918
Ours          9.1M    0.9273  0.8978  0.9656  0.9969  0.9943
¹ w/o finetune and Xavier unit
² with finetune and Xavier unit

Fig. 3. Endocardium contour prediction on the SunnyBrook dataset between different models in some complex cases.

To present a fairer verification of our proposed approach, the corresponding outcomes of ours and five different
approaches are reported in Table III and Table IV on the ACDC benchmark. From Table IV, our proposed method outperforms the other compared methods on epicardium segmentation of the ACDC dataset in terms of DSC (0.9273), IoU (0.8978), SPE (0.9969) and ACC (0.9943), which is comparable to some of the state-of-the-art approaches. Nevertheless, for the endocardium segmentation in Table III, Trinh et al. [3] obtain the best results with a Dice score of 0.9253, while our result stays modest with a Dice score of 0.8798.

Fig. 4. Epicardium contour prediction on the SunnyBrook dataset between different models in some complex cases.

D. Representative Results

In this section, we provide qualitative results produced by our proposed model for difficult input cases. As indicated in Fig. 3-5, our simple model can achieve better results than some existing methods, which verifies that our proposed method, despite being light-weight, obtains comparable performance on inputs with high complexity. From the figures and tables, we make the following further observations:

• U-Net-based networks seem to perform worse than FCN-based networks on the SunnyBrook dataset; in some particular cases the vanilla U-Net cannot provide an acceptable prediction, as Fig. 3 and Fig. 4 illustrate.
• In contrast, on the ACDC dataset FCN-based methods seem to give more mediocre results than U-Net-based methods. However, our FCN-based performance on the epicardium of the ACDC dataset shows that our approach can provide comparable results across different datasets (which very few algorithms, and no light-weight models, can do): as shown in Fig. 5, our light-weight network still outperforms the vanilla U-Net on part of the ACDC dataset segmentation.

Fig. 5. Epicardium contour prediction on the ACDC dataset between different models in some complex cases.

V. CONCLUSION

In this paper, we have
introduced a new light-weight deep-learning framework for cardiac MRI contour prediction. Our experimental outcomes clearly show that the new model achieves prominent performance on the two popular datasets SunnyBrook and ACDC. However, these are small datasets by the standards of the deep learning era. Therefore, in the future, we will devote more effort to complex MRI segmentation datasets, to minimize the segmentation errors of deep-learning approaches and to maximize the speed, efficiency and reliability of the deep-learning network.

ACKNOWLEDGEMENT

This research is funded by the Hanoi University of Science and Technology (HUST) under project number T2021-PC-005. Minh-Nhat Trinh was funded by
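For readers who wish to check the Dice Loss of Eq. (1) and the metrics of Eqs. (2)-(6) numerically, the following is a minimal sketch and not the implementation used in our experiments. It assumes that a prediction and its ground truth are given as flat Python lists of equal length for a single class c; the function names (dice_loss, dsc, iou, confusion_metrics) are hypothetical and chosen only for illustration.

```python
EPS = 1e-15  # smoothing coefficient, the value stated in Section III-D


def dice_loss(p, l, eps=EPS):
    """Eq. (1): sum((p + l) - 2*p*l) / (sum(p + l) + eps)."""
    num = sum((pi + li) - 2.0 * pi * li for pi, li in zip(p, l))
    den = sum(pi + li for pi, li in zip(p, l)) + eps
    return num / den


def dsc(p, l, eps=EPS):
    """Eq. (2): (sum(2*p*l) + eps) / (sum(p + l) + eps)."""
    num = sum(2.0 * pi * li for pi, li in zip(p, l)) + eps
    den = sum(pi + li for pi, li in zip(p, l)) + eps
    return num / den


def iou(p, l, eps=EPS):
    """Eq. (3): (sum(p*l) + eps) / (sum(p + l - p*l) + eps)."""
    num = sum(pi * li for pi, li in zip(p, l)) + eps
    den = sum(pi + li - pi * li for pi, li in zip(p, l)) + eps
    return num / den


def confusion_metrics(pred, truth):
    """Eqs. (4)-(6) for binary 0/1 lists.

    Assumes the ground truth contains at least one positive and one
    negative pixel, so that neither denominator is zero.
    """
    tp = sum(1 for a, b in zip(pred, truth) if a == 1 and b == 1)
    tn = sum(1 for a, b in zip(pred, truth) if a == 0 and b == 0)
    fp = sum(1 for a, b in zip(pred, truth) if a == 1 and b == 0)
    fn = sum(1 for a, b in zip(pred, truth) if a == 0 and b == 1)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return sen, spe, acc


if __name__ == "__main__":
    # Toy six-pixel example (illustrative values only, not from the paper).
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 0, 1, 1]
    print("Dice loss:", dice_loss(pred, truth))
    print("DSC:", dsc(pred, truth))
    print("IoU:", iou(pred, truth))
    print("SEN, SPE, ACC:", confusion_metrics(pred, truth))
```

Because Eqs. (1) and (2) share the same denominator, the Dice Loss and the DSC of the same pair sum to one, which gives a quick consistency check on any implementation.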