MS-UNet: A multi-scale UNet with feature recalibration approach for automatic liver and tumor segmentation in CT images


Contents

  • MS-UNet: A multi-scale UNet with feature recalibration approach for automatic liver and tumor segmentation in CT images

    • 1 Introduction

    • 2 Related literature

    • 3 Proposed work

      • 3.1 Proposed methodology

      • 3.2 Multi-scale features

      • 3.3 Multi-scale feature recalibration

    • 4 Experimental setup and result analysis

      • 4.1 Data preparation

      • 4.2 Data preprocessing

      • 4.3 Training strategy for the proposed network

        • 4.3.1 Loss function

        • 4.3.2 Data augmentation

        • 4.3.3 Implementation platform details

      • 4.4 Performance metrics

      • 4.5 Experimental results and analysis

      • 4.6 Comparison with other methods

      • 4.7 Discussion

    • 5 Conclusion

    • CRediT authorship contribution statement

    • Declaration of Competing Interest

    • Acknowledgements

    • References

Content

Computerized Medical Imaging and Graphics 89 (2021) 101885

Contents lists available at ScienceDirect

Computerized Medical Imaging and Graphics

journal homepage: www.elsevier.com/locate/compmedimag

MS-UNet: A multi-scale UNet with feature recalibration approach for automatic liver and tumor segmentation in CT images

Devidas T. Kushnure a,b,*, Sanjay N. Talbar a

a Department of Electronics and Telecommunication Engineering, Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, Maharashtra, India
b Department of Electronics and Telecommunication Engineering, Vidya Pratishthan's Kamalnayan Bajaj Institute of Engineering and Technology, Baramati, Maharashtra, India

* Corresponding author, research scholar at the Department of Electronics and Telecommunication Engineering, Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, Maharashtra, India. E-mail address: devidas.kushnure@vpkbiet.org (D.T. Kushnure).

https://doi.org/10.1016/j.compmedimag.2021.101885
Received 25 May 2020; Received in revised form 22 January 2021; Accepted 24 January 2021; Available online 24 February 2021.
0895-6111/© 2021 Elsevier Ltd. All rights reserved.

ARTICLE INFO

Keywords: Deep learning; Convolutional neural network; Liver and tumor segmentation; Multi-scale feature; Feature recalibration; CT images

ABSTRACT

Automatic liver and tumor segmentation play a significant role in the clinical interpretation and treatment planning of hepatic diseases. Segmenting the liver and tumors manually from hundreds of computed tomography (CT) images is tedious and labor-intensive, so the segmentation becomes expert-dependent. In this paper, we propose a multi-scale approach that improves the receptive field of the Convolutional Neural Network (CNN) by representing multi-scale features that extract global and local features at a more granular level. We also recalibrate the channel-wise responses of the aggregated multi-scale features, which enhances the high-level feature description ability of the network. The experimental results demonstrate the efficacy of the proposed model on the publicly available 3Dircadb dataset. The proposed approach achieved a dice similarity score of 97.13 % for the liver and 84.15 % for the tumor. A statistical significance analysis showed that the proposed model is statistically significant at a significance level of 0.05 (p-value < 0.05). The multi-scale approach improves the segmentation performance of the network while reducing the computational complexity and network parameters. The experimental results show that the proposed method outperforms state-of-the-art methods.

1 Introduction

According to the status report on the Global Burden of Cancer worldwide (GLOBOCAN) 2018, liver cancer incidence and mortality rates are rapidly increasing across the world; liver cancer is the sixth most common cancer and the second leading cause of cancer deaths worldwide (Bray et al., 2018). In the human body, the liver is one of the largest and most essential organs, involved in detoxification, filtering blood from the digestive tract, and supplying it to the body parts (Bilic et al., 2019). Thus, the liver often becomes the first site affected by the spread of metastatic tumors from primary sites such as the colorectum, breast, pancreas, ovaries, and lungs; the growth of liver tumors due to metastases is secondary liver cancer. Liver cancer that originates in the liver cells (hepatocytes), such as Hepatocellular Carcinoma (HCC), is primary liver cancer. HCC comprises a hereditarily and molecularly exceptionally heterogeneous group of malignant growths that usually emerge in a chronically damaged liver. HCC affects the hepatocytes, leading to mutations in the structure and shape of the affected liver cells that determine the progress of the cancer. These perceptible variations in shape and tissue structure allow the non-invasive identification of HCC in imaging (Christ et al., 2017).
Radio imaging modalities such as ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI) are utilized to detect anomalies in the upper and lower abdomen. Radio imaging is a non-invasive, painless, and precise technique for identifying internal injuries, helping clinical specialists diagnose complications and plan treatment to save the patient's life. Medical imaging techniques have become increasingly popular for diagnosing disease and monitoring its progression (Bilic et al., 2019). Owing to its ease of use and the short time required to capture the exact inner structure of the human body, the CT scan has become the medical expert's choice for diagnosing liver-related complications and anomalies (Luo et al., 2014).

Clinically, liver and tumor segmentation from CT images is an important task in hepatic disease diagnosis and treatment planning. Liver volume assessment is the directive before hepatectomy, and it assists doctors and surgeons in planning liver resection, liver transplantation, portal vein embolization, associating liver partition and portal vein ligation for staged hepatectomy (ALPPS) (Gotra et al., 2017), and post-treatment assessment. It is also essential for applications such as computer-aided diagnosis (CAD) and for deciding on interventional radiological treatment. Extracting the liver tumor volume with high accuracy is beneficial for planning Selective Internal Radiation Therapy (SIRT, radioembolization) to diminish the risk of an excessive or insufficient radiation dose relative to the patient's liver volume (Moghbel et al., 2018). Therapy planning for the liver and for primary and metastatic tumors using percutaneous ablation is a minimally invasive surgical procedure guided through image navigation (Spinczyk et al., 2019). Liver segmentation is therefore a significant stage in detecting hepatic complications early with radio imaging. CT is the medical expert's preferred imaging modality for hepatic diseases because of its robustness, wide availability, simple acquisition process, and high spatial resolution. In clinical routine, medical experts delineate the liver and tumor manually from CT images; manual segmentation is considered the gold standard in medical practice and research. However, manually outlining the liver and tumors is tedious and time-consuming, which can delay the diagnosis process, and the segmentation depends on the expert's knowledge and experience, which may cause erroneous segmentation outcomes. For these reasons, it is essential to provide a computer-based framework that automatically segments the liver and tumor with clinically acceptable accuracy and offers a second opinion that helps the physician conclude more accurately in less time.
Many researchers and the scientific community focus on developing frameworks for automatic liver and tumor segmentation with modern image processing and computer vision algorithms, and in the last three decades much scientific research on automatic and interactive segmentation strategies has been proposed in the literature. Even so, automatic liver and liver tumor segmentation from CT volumes remains challenging: the liver is a soft organ, and its shape depends strongly on the surrounding organs inside the abdomen. Apart from that, liver pathology is inconsistent and may modify the liver's signal intensity, density, and shape; there is little intensity difference between the liver and the tumor region; the liver's intensities are similar to those of its surrounding organs; and the boundaries between the liver and nearby organs such as the stomach and heart are feeble, as illustrated in Fig. 1. Usually, liver CT images are obtained using an injection protocol that enhances the liver in the CT images for medical interpretation. However, the injection phase determines the enhancement variation, and noise in the CT images increases with the enhancement, adding to the noise of the liver, which is already noisy without any enhancement (Moghbel et al., 2018). Because of these challenges, liver and tumor segmentation is a demanding task that has attracted much research attention in recent years.

2 Related literature

In the literature, several interactive and automatic methods for liver and tumor segmentation in CT volumes have been proposed. In 2007, grand challenge benchmarks on liver and liver tumor segmentation were conducted in conjunction with the MICCAI conference; most of the methods presented at the challenge were based on statistical shape models for automatic segmentation (Heimann et al., 2009). Furthermore, methods (Luo et al., 2014) based on liver gray intensities, structure, and texture features were proposed for automatic liver segmentation. The gray-level-based methods utilized the liver gray intensities for segmentation with intensity-based algorithms like region growing, active contours, graph cuts, thresholding, and clustering. The structure-based methods utilized the liver's repetitive geometry to create a probabilistic model that reconstructs the liver shape; the methods used for liver segmentation include the statistical shape model, the statistical pose model, and the probabilistic atlas. The texture-based methods utilized texture features to segment the liver, with algorithms based on machine learning and pattern recognition classifying the liver region from the texture feature description. A computer-aided diagnosis (CAD) system is envisaged as a second pair of eyes for the expert radiologist, provided the CAD system works accurately. Several methods utilized machine learning algorithms like the probabilistic neural network (PNN), support vector machine (SVM), region growing, alternative fuzzy C-means (AFCM), and Hidden Markov Model (HMM) to design CAD systems for liver and tumor segmentation (Moghbel et al., 2018).

Over the past few years, deep learning algorithms based on the Convolutional Neural Network (CNN) have become famous for visual recognition applications because of their powerful nonlinear feature extraction capabilities, realized through many different filters at different layers of the network, and their capability to process large amounts of data (Yamashita et al., 2018). CNN-based architectures have excelled in various applications like image classification, object detection, and action recognition. CNN-based networks like AlexNet, VGGNet, GoogleNet, and ResNet proved their capability for visual recognition tasks in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (Ueda et al., 2019).
Fig. 1. Sample images from the 3Dircadb dataset showing the complications in liver and tumor segmentation in abdominal CT scans: (a) low intensity difference between nearby organs (liver, stomach, and heart) and tumor; (b) ambiguous boundary between the liver and heart; and (c) ambiguous boundary between the liver and stomach.

After CNN's success in efficient classification tasks, researchers exploited the same backbone architectures for semantic segmentation. The Fully Convolutional Network (FCN) (Shelhamer et al., 2017) employed existing well-known classifier models for semantic segmentation by replacing the dense classifier layers. The FCN decoder architecture was considered the most successful for segmentation; the decoder network upsamples the segmented map to the input image size. For semantic segmentation of images, SegNet (Badrinarayanan et al., 2017) was proposed: it has an encoder-decoder network followed by a pixel-wise classification layer, where the encoder utilizes a network topology identical to the 13 convolutional layers of VGGNet.

In medical image processing, the semantic segmentation task is utilized to segment the anatomical structure of organs and to segment tumors, and the automatic segmentation of regions of interest from medical images using CNN-based architectures has proved effective (Litjens et al., 2017). The FCN-inspired encoder-decoder UNet architecture was proposed for biomedical image processing. The encoder-decoder design has become the choice for medical image segmentation: it has an encoding part, which condenses the information of the input image into a group of high-level features, and a decoding part, where the high-level features are utilized to rebuild a pixel-wise segmentation in one or multiple upsampling steps (Ronneberger et al., 2015). Following this paper, FCN-based algorithms utilized UNet-derived architectures for medical image segmentation. The Liver Tumor Segmentation Benchmark (LiTS) challenge was organized in conjunction with ISBI 2017 and MICCAI 2017; the presented methods were based on the CNN deep learning approach, and the majority were UNet-derived architectures. Almost all of the techniques applied specific preprocessing to the input data, like HU-value windowing, normalization, and standardization. Additionally, most of the techniques applied connected-lesion-component post-processing to the segmented map to discard the portions of lesions outside the liver region (Bilic et al., 2019).

Furthermore, most available liver and liver tumor segmentation networks are based on the FCN with the UNet encoder (contraction) and decoder (expansion) structure. In UNet, all layers are convolutional, achieving pixel-level prediction in a single forward step. UNet has encoding and decoding paths built using convolution, pooling, and upsampling layers; to improve the segmentation capability, the encoding-path features are concatenated with the decoding path at the respective stage using skip connections. To enhance the segmentation output further, a few proposed methods (Budak et al., 2020) (Gruber et al., 2019) utilized two UNet architectures jointly for liver and tumor segmentation. A complex CNN architecture for liver and kidney segmentation (Efremova et al., 2019) combined UNet and LinkNet-34 with an ImageNet-trained ResNet-34 feature encoder to reduce the convergence time and the overfitting problem.
Further, to improve the prediction capabilities of FCN-based architectures, residual connections (Drozdzal et al., 2016) (Zhang et al., 2019) were employed from a forward path in the intermediate feature maps, together with post-processing to refine the segmentation performance. A deep CNN model (Han, 2017) based on ResNet used long-range UNet and short-range ResNet residual connections, with post-processing using 3D connected-component labeling of all voxels labeled as lesion. Further post-processing methods were proposed to refine the segmentation performance of the algorithms: FCN-based encoder-decoder models (Zhang et al., 2017) (Christ et al., 2017) (Chlebus et al., 2017) demonstrated the effect of level set, graph cut, CRF, and random forest algorithms utilized as post-processing to refine segmentation results. A super-pixel-based CNN (Qin et al., 2019) divided the CT image into super-pixels by aggregating adjacent pixels with the same intensity, classified them into three classes (interior liver, liver boundary, and non-liver background), and utilized the CNN to predict the liver boundary.

Recently, many extensions of UNet that modify its core structure have been proposed to segment the liver and tumor. A novel hybrid densely connected UNet (Li et al., 2018) was proposed to explore intra-slice and inter-slice features by introducing a hybrid feature fusion layer using 2D and 3D DenseUNets: the 2D DenseUNet extracts the intra-slice features, the 3D DenseUNet extracts the inter-slice volumetric context, and these features are fused to portray the 3D interpretation using the feature fusion layer. The 3D residual attention-aware RA-UNet (Jin et al., 2018) was proposed using the residual learning approach to express multi-scale attention information and combine low-level features with high-level features. A modified UNet architecture (Seo et al., 2020) was proposed to exploit object-dependent feature extraction using a modified skip connection with an additional convolutional layer and residual path; the modified skip connection extracts high-level global features of small objects and high-level features of the high-resolution edge information of large objects. Recently, the UNet++ architecture (Zhou et al., 2020) was proposed with a redesigned skip connection that exploits multi-scale features using a feature fusion scheme from each encoding layer to the decoding layer.

In semantic segmentation, the CNN extracts the critical features of the image and effectively decides the coarse boundary of the target. However, at the end of the encoder the size of the feature maps is remarkably reduced, which constrains the accuracy of the CNN. The consecutive downsampling by pooling operations reduces the input image resolution to a small feature map, resulting in a loss of spatial information about the object, which is essential in analyzing medical images for accurate segmentation of target objects. Various methods such as deconvolution (Noh et al., 2015) and skip connections (Ronneberger et al., 2015) have been proposed, utilizing transpose convolution for upsampling and skip connections that link upper convolution layers with the deep layers so that the network can maximize the utilization of high-level features to preserve spatial information. However, these methods cannot recover the spatial information loss that occurs in the pooling and convolution operations. Moreover, CNN models need to process features at different scales to extract meaningful contextual information about the object and achieve successful semantic segmentation.
Multi-scale feature characterization has been achieved by the FCN with variable pooling layers, combining features from previous layers with deeper layers to maintain the global and local information of the object and achieve effective semantic segmentation (Long et al., 2015). The pyramid scene parsing network (PSPNet) (Zhao et al., 2017) utilized global context information accumulated from region-based features employing pyramid pooling, in which global and local information is characterized effectively by transforming the input features through multiple pooling operations and aggregating all the features. Later, the DeepLab systems (Chen et al., 2018) (Chen et al., 2017) were proposed to preserve the spatial resolution by utilizing the atrous convolution module: atrous convolutions are employed in series or in parallel to expand the receptive field of the CNN, and atrous spatial pyramid pooling with multiple atrous rates is utilized to gain a multi-scale context depiction. Channel-UNet (Chen et al., 2019) optimized the mapping of information between pixels in the convolution layers with spatial channel-wise convolution and adopted an iterative learning mechanism that expands the receptive field of the convolution layers.

In this paper, we propose a CNN with a UNet-based encoder-decoder multi-scale feature representation and recalibration architecture for liver and liver tumor segmentation. We utilize the bottleneck Res2Net module's ability to represent multi-scale features and improve the receptive field of the CNN. Further, we recalibrate the multi-scale features channel-wise with a squeeze-and-excitation (SE) network. We perform the experimentation on the publicly available 3Dircadb dataset. The results illustrate that the multi-scale UNet outperforms the state-of-the-art methods for liver and tumor segmentation. The following contributions are incorporated in the paper:

• We propose MS-UNet with a feature recalibration approach by exploiting the core idea of the UNet encoder-decoder architecture. The architectural difference is that our network utilizes the Res2Net module for multi-scale feature representation and the SE network for feature recalibration in the encoder and decoder stages. The combination of the Res2Net module followed by the SE network enhances the feature representation capability and learning potential of the network. The computational complexity and the parameters of the proposed network are reduced because of the Res2Net module.

• We employ the multi-scale Res2Net module in the network to enhance the receptive field of the CNN so that it covers the entire region of interest in the input features and characterizes the global and local information of the input at a more granular level by extracting multi-scale features. The input is therefore represented with multiple features at different scales, aggregated hierarchically, signifying detailed information about the input features.
• The aggregated multi-scale features extract more granular information about the input, which by itself limits the learning capacity of the network. To improve the network's learning ability and focus on the more prominent features of the object, we utilize the SE network. The SE network recalibrates the channel-wise feature responses by modeling interdependencies between the channels with squeeze and excitation operations. Feature recalibration enhances the network's sensitivity to the informative features of the object, improving the network's ability to learn prominent features; the feature extraction capability of successive network layers also increases, which boosts the segmentation performance.

• We experimentally verify the MS-UNet performance for liver and tumor segmentation in terms of multi-scale feature extraction ability by varying the scaling factor, and we evaluate the model with different statistical measures for each scaling factor. We trained the network from scratch, performed the experimentation on the publicly available 3Dircadb dataset manually annotated by medical experts, and demonstrated the network's segmentation performance. The proposed model is statistically significant at the significance level 0.05 (p-value < 0.05), as established by a statistical hypothesis test.

The remainder of the paper is organized as follows: Section 2 reviews the related literature, Section 3 explains the proposed methodology, Section 4 presents the experimental setup and result analysis, and Section 5 concludes the paper.

3 Proposed work

3.1 Proposed methodology

We propose a deep convolutional neural network with a multi-scale feature extraction and recalibration architecture for automatic liver and tumor segmentation. Fig. 2 shows the proposed MS-UNet encoder-decoder architecture. We embed the Res2Net bottleneck module with the SE network in place of the two 3 × 3 convolution operations in the UNet encoder-decoder stages. The bottleneck Res2Net module has the same architecture as the bottleneck ResNet, except that the single 3 × 3 convolution operation is replaced by smaller groups of 3 × 3 convolutions in hierarchical order to achieve multi-scale feature extraction and an improved receptive field. The Res2Net bottleneck architecture was designed to improve the layer-wise multi-scale feature representation at a more granular level and the receptive field of the CNN. We employ the Res2Net bottleneck module instead of the 3 × 3 convolutions in UNet to leverage its multi-scale feature extraction ability and improved receptive field to enhance the segmentation performance. The Res2Net module can extract the input features at a more granular level with its multi-scale characterization ability: it improves the receptive field of the CNN by splitting the input features into small blocks that are processed through multiple convolution blocks with different scale features. These multi-scale features are captured by multiple convolution layers with local receptive fields, and they empower the network to extract informative features by aggregating both spatial and channel-wise features.

In semantic segmentation, spatial information plays a significant role in locating the region of interest in the image. To empower the network to characterize the global and local information at a granular level and uplift the network's learning ability at each stage, we employ feature refinement through SENet. SENet recalibrates the features in two steps. First, it globalizes the fused multi-scale features channel-wise into a one-dimensional vector, called the squeeze operation. Second, it recalibrates the features by passing them through two dense layers that describe the weights for the input channels, called the excitation operation. The channel weights then scale the input multi-scale features and improve the feature representation potential of the network. The network gains the perception of coarse-grained context in the shallow layers and localization of fine-grained attributes in the deeper layers, which boosts the segmentation performance.

Table 1 shows the detailed architecture of the proposed multi-scale UNet, with the number of stages in the network along with the layer-wise activation function, convolution filter size, number of features, and feature shape. All the convolutional operations used in the Res2Net module follow the order Conv2D, Batch Normalization, ReLU. The entire training and testing pipeline is illustrated in Fig. 3: the input CT dataset is preprocessed first, the MS-UNet is then trained for liver and tumor segmentation, the model is tested on the test data, and the network performance is evaluated using statistical measures.

Fig. 2. Proposed MS-UNet encoder and decoder architecture (yellow blocks: Res2Net module and SE network; gray block: SE network). (For interpretation of the references to colour in the figure, the reader is referred to the web version of this article.)
Fig. 3. Illustration of the liver and tumor segmentation pipeline for MS-UNet.

Table 1
Details of operations performed and settings of layers in each encoding and decoding stage of the proposed network.

Encoder path (operation → output features and feature size):
• Input: 256 × 256 × 1
• Stage 1: Conv2D [3 × 3, BatchNorm, Relu] → 256 × 256 × 64; Res2Net + SE (Res2Net [64, scaling = 4, Relu]; SE [64, r = 8, Relu and Sigmoid]) → 256 × 256 × 64
• Stage 2: Max Pooling [2 × 2] → 128 × 128 × 64; Res2Net + SE (Res2Net [128, scaling = 4, Relu]; SE [128, r = 8, Relu and Sigmoid]) → 128 × 128 × 128
• Stage 3: Max Pooling [2 × 2] → 64 × 64 × 128; Res2Net + SE (Res2Net [256, scaling = 4, Relu]; SE [256, r = 8, Relu and Sigmoid]) → 64 × 64 × 256
• Stage 4: Max Pooling [2 × 2] → 32 × 32 × 256; Res2Net + SE (Res2Net [512, scaling = 4, Relu]; SE [512, r = 8, Relu and Sigmoid]) → 32 × 32 × 512
• Stage 5: Max Pooling [2 × 2] → 16 × 16 × 512; Res2Net + SE (Res2Net [1024, scaling = 4, Relu]; SE [1024, r = 8, Relu and Sigmoid]) → 16 × 16 × 1024

Decoder path (operation → output features and feature size):
• Upsampling (deconvolution layer) [2 × 2, strides = 2 × 2] → 32 × 32 × 1024; Res2Net + SE (Res2Net [512, scaling = 4, Relu]; SE [512, r = 8, Relu and Sigmoid]) → 32 × 32 × 512
• Upsampling (deconvolution layer) [2 × 2, strides = 2 × 2] → 64 × 64 × 512; Res2Net + SE (Res2Net [256, scaling = 4, Relu]; SE [256, r = 8, Relu and Sigmoid]) → 64 × 64 × 256
• Upsampling (deconvolution layer) [2 × 2, strides = 2 × 2] → 128 × 128 × 256; Res2Net + SE (Res2Net [128, scaling = 4, Relu]; SE [128, r = 8, Relu and Sigmoid]) → 128 × 128 × 128
• Upsampling (deconvolution layer) [2 × 2, strides = 2 × 2] → 256 × 256 × 128; Conv2D [3 × 3, BatchNorm, Relu] → 256 × 256 × 64; Res2Net + SE (Res2Net [64, scaling = 4, Relu]; SE [64, r = 8, Relu and Sigmoid]) → 256 × 256 × 64
• Output layer: Conv2D [1 × 1, Sigmoid] → 256 × 256 × 1
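To make the stage layout of Table 1 concrete, the following is a minimal Keras sketch of the encoder-decoder assembly, written as one possible reading of the table rather than the authors' released code. It assumes a stage operation res2net_se (the Res2Net bottleneck followed by SE recalibration), a sketch of which is given after Sections 3.2 and 3.3 below.

from tensorflow.keras import layers, Model

def build_ms_unet(res2net_se, input_shape=(256, 256, 1), scale=4):
    # Illustrative assembly of the MS-UNet stages in Table 1.
    inputs = layers.Input(input_shape)
    x = layers.Conv2D(64, 3, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    skips, widths = [], [64, 128, 256, 512]
    for w in widths:                                        # encoder stages 1-4
        x = res2net_se(x, w, scale)
        skips.append(x)                                     # kept for skip connections
        x = layers.MaxPooling2D(2)(x)
    x = res2net_se(x, 1024, scale)                          # bottleneck stage 5
    for w, skip in zip(reversed(widths), reversed(skips)):  # decoder stages
        x = layers.Conv2DTranspose(w, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])                 # skip connection
        x = res2net_se(x, w, scale)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # output layer
    return Model(inputs, outputs)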
3.2 Multi-scale features

In a CNN, the encoder extracts high-level information at each stage by downsampling the input using a pooling operation. However, pooling causes contextual information loss. The skip connection provides the low-resolution information to the respective stage of the decoder to recover the contextual information, but it cannot retrieve the loss due to the pooling layer and results in a coarse pixel map. The multi-scaling approach enables the CNN to extract different features at different scales: it enhances the receptive field layer-wise at a more granular level, which refines the network's feature characterization potential.

The layer-wise feature representation ability of the CNN is improved at a more granular level by improving the receptive field using the bottleneck Res2Net module (Gao et al., 2019); the detailed architecture of the Res2Net module is shown in Fig. 4. For multi-scale feature representation, the 3 × 3 convolution filters of n channels are replaced in the Res2Net module by a group of smaller filters, each with w channels, such that n = s × w, without additional computational burden. The small filter groups are connected in a hierarchical residual fashion, which increases the number of scales at which the output features are represented. The input features are evenly split into s subsets after the 1 × 1 convolution, in such a way that every subset has the same spatial size and 1/s of the channels of the input features. The feature subsets are denoted by f_i, where i ∈ {1, 2, 3, …, s}. Each f_i has a corresponding 3 × 3 convolution filter, except the first subset f_1; the convolution is denoted O_i(·), and its output is the multi-scale feature M_i. Lastly, the feature maps from all the subsets are concatenated and passed through a 1 × 1 filter to fuse the complete information. The output M_i can be written as (Eq. 1):

M_i = f_i,                   i = 1
M_i = O_i(f_i),              i = 2        (1)
M_i = O_i(f_i + M_{i-1}),    2 < i ≤ s

Eq. 1 depicts the multi-scale features, where O_i(·) is a 3 × 3 convolution operator that can receive feature information from all the preceding feature subsets {f_j, j ≤ i}. The fusion of all the features results in a Res2Net output with multiple dissimilar features covering different combinations of receptive fields. In the Res2Net module, the features are split and concatenated: the splits work in a multi-scale manner, which benefits the extraction of both global and local feature information, and the concatenation of features at different scales better fuses this information. The process of splitting and concatenation enables the convolutions to transform the features more efficiently, since directly combining the features from multiple 3 × 3 filters would result in many identical features due to the aggregation effect.

3.3 Multi-scale feature recalibration

Along with the multi-scale feature characterization ability of the Res2Net module, the SE network (Hu et al., 2018), as indicated in Fig. 2, is added before the residual connection. The SE network's primary purpose is to model the channel-wise feature responses by expressing the interdependencies between the channels and to recalibrate the fused features after concatenation.

In the squeeze operation, the input features are transformed by applying global average pooling to the input features of size W × H × C received from the 1 × 1 convolution of the Res2Net block, converting all the channels into a one-dimensional vector with dimension equal to the number of channels C. Let the input features be M = [M_1, M_2, M_3, …, M_C], where M_C ∈ R^{H×W} is one channel of the input features with size H × W. The global pooling produces a one-dimensional vector Z of size R^C whose element for the C-th channel is given by (Eq. 2):

Z_C = F_sq(M_C) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} M_C(i, j)        (2)

Z is the transformation of the input features M; it is an aggregation of the transformed features that can be interpreted as a cluster of local descriptors whose statistics are meaningful for the entire image. In the second operation, the aggregated information is utilized to capture the channel-wise dependencies. To isolate the channels and improve the generalization capability of the network, a simple gating mechanism is employed using two fully connected layers with ReLU and sigmoid activations. The first fully connected layer transforms Z with the ReLU activation δ, followed by the sigmoid activation σ, expressed as (Eq. 3):

E = F_ex(Z, W) = σ(g(Z, W)) = σ(W_2 δ(W_1 Z))        (3)

where W_1 ∈ R^{(C/r)×C} and W_2 ∈ R^{C×(C/r)}, and r is the dimensionality reduction factor, which decides the computational-cost-controlling capacity of SENet. We employed the dimensionality reduction factor r = 8, which exhibits the best segmentation performance for medical images (Rundo et al., 2019). The excitation operation returns the same channel dimension as the input features (M). The final output of the SE block is the scaling of the input features by the excitation output channel weights, which puts more emphasis on essential features and less on negligible ones. The scaling is expressed as (Eq. 4):

M̃_C = F_scale(M_C, E_C) = M_C · E_C        (4)

where M̃ = [M̃_1, M̃_2, M̃_3, …, M̃_C] and the scaling operation F_scale(M_C, E_C) is a channel-wise multiplication between E_C ∈ [0, 1] and M_C ∈ R^{H×W}.

The multi-scale feature and recalibration approach characterizes the high-level features in a better way. At each encoding stage, the network maintains the contextual information of the input object, and these features are concatenated with the corresponding decoding stages through the skip connections to reconstruct the object shape in the segmentation map.
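The following is a minimal Keras sketch of the combined stage operation: the Res2Net bottleneck implementing Eq. (1), after Gao et al. (2019), with the SE recalibration of Eqs. (2)-(4), after Hu et al. (2018), inserted before the residual addition as described above. Layer widths, the projection shortcut, and the helper names are illustrative assumptions, not the authors' released implementation; filters is assumed divisible by scale.

import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel):
    # Paper's stated operation order: Conv2D -> BatchNorm -> ReLU.
    x = layers.Conv2D(filters, kernel, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

def se_block(x, reduction=8):
    c = x.shape[-1]
    z = layers.GlobalAveragePooling2D()(x)                  # squeeze, Eq. (2)
    e = layers.Dense(c // reduction, activation="relu")(z)  # W1 with ReLU (delta)
    e = layers.Dense(c, activation="sigmoid")(e)            # W2 with sigmoid, Eq. (3)
    e = layers.Reshape((1, 1, c))(e)
    return layers.Multiply()([x, e])                        # channel scaling, Eq. (4)

def res2net_se(x, filters, scale=4, reduction=8):
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)   # projection shortcut
    y = conv_bn_relu(x, filters, 1)                           # 1x1 entry convolution
    subsets = layers.Lambda(lambda t: tf.split(t, scale, axis=-1))(y)  # s subsets, w = n/s
    outputs, prev = [subsets[0]], None                        # M_1 = f_1
    for f in subsets[1:]:
        z = f if prev is None else layers.Add()([f, prev])    # f_i + M_{i-1}, Eq. (1)
        z = conv_bn_relu(z, filters // scale, 3)              # O_i(.)
        outputs.append(z)
        prev = z
    y = layers.Concatenate()(outputs)                         # hierarchical multi-scale fusion
    y = conv_bn_relu(y, filters, 1)                           # 1x1 exit convolution
    y = se_block(y, reduction)                                # SE before the residual addition
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

Passing this res2net_se to the build_ms_unet sketch after Table 1 yields one end-to-end trainable reading of the MS-UNet architecture.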
4 Experimental setup and result analysis

4.1 Data preparation

The experimentation was performed using the publicly available 3D Image Reconstruction for Comparison of Algorithm Database (3Dircadb) for training and testing the network. It consists of 20 venous-phase enhanced CT volumes from various European hospitals, acquired with different CT scanners, covering 20 patients (10 women and 10 men) with hepatic tumors in 15 cases. The database has been manually annotated by medical experts for liver and tumor. The input size is 512 × 512, and the in-plane resolution ranges from 0.56 × 0.56 mm² to 0.86 × 0.86 mm². The CT slices are available in DICOM format; the number of slices per volume ranges from 74 to 260, and the slice thickness varies from 1 mm to 4 mm. The database provides significant variations in the shape and size of the liver and tumors, with tumors located in different Couinaud segments of the liver (3Dircadb, n.d.).

4.2 Data preprocessing

In a CT scan volume, the relative densities of the internal body organs are measured in Hounsfield Units (HU); in general, the HU values range from -1000 to 1000. The tumors grow in the liver parenchyma, which is the region of interest for segmentation, and the adjacent organs and irrelevant tissues in the abdomen may trouble the segmentation performance. The radiodensities in CT volumes for soft liver tissues vary from 40 HU to 50 HU (Jin et al., 2018). Removing irrelevant organs and unnecessary details from the CT images therefore leaves the liver and tumor region clean for segmentation.

We preprocess the entire CT data in a slice-by-slice fashion. First, we downsample the 512 × 512 CT images to 256 × 256 to reduce the computational burden. Second, we apply a global windowing step to the CT slices one by one: for windowing the HU values, we use a window of (-100, 400) HU, so that most of the irrelevant organs are removed from the CT slices, leaving the liver and tumor region clean for segmentation. Afterward, the dataset is normalized to the same scale [0, 1], which simplifies the network's learning and improves the convergence speed by supplying more easily proportionate images as input (Luo et al., 2014). Lastly, we perform image enhancement to obtain an enhanced liver and liver tumor region for segmentation. The HU-value windowing and image enhancement results, along with the histogram plots, are illustrated in Fig. 5. These preprocessing operations offer a clean liver and tumor region for segmentation.
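A minimal NumPy/scikit-image sketch of the described preprocessing follows, assuming a 2D slice already converted to Hounsfield Units; the enhancement step is shown as histogram equalization, matching the equalized-histogram panels of Fig. 5, though the paper does not name the exact enhancement routine.

import numpy as np
from skimage.transform import resize
from skimage import exposure

def preprocess_slice(hu_slice, window=(-100, 400), size=(256, 256)):
    img = resize(hu_slice.astype(np.float32), size, preserve_range=True)  # 512x512 -> 256x256
    img = np.clip(img, window[0], window[1])               # HU windowing (-100, 400)
    img = (img - window[0]) / (window[1] - window[0])      # normalize to [0, 1]
    img = exposure.equalize_hist(img)                      # image enhancement (assumed hist-eq)
    return img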
follows (Eq 10), ( ) |B| − |A| RAVD = Abs × 100 (10) |A| The surface distance measures are Average Symmetric Surface Dis­ tance (ASSD) and Maximum Symmetric Surface Distance (MSSD) ASSD denoted as (Eq 12), Let S(A) indicates the number of surface voxels of A The shortest distance of a random voxel V to S(A) is expressed as follows (Eq 11), 4.3.1 Loss function The model compiled using the dice coefficient as a metric and loss function as a dice loss, which is the complement of the Dice coefficient (Jin et al., 2018) The dice loss is expressed as (Eq 5), 2|A ∩ B| × 100 |A| + |B| (11) d(v, S(A) ) = ‖v − SA ‖ (5) SA ∈S(A) g2i Where ‖‖ denote Euclidean distance ) ( ∑ ∑ ASSD(A, B) = d(SA , S(B) ) + d(SB , S(A) ) |S(A) | + |S(B) | S ∈S(A) S ∈S(B) Where pi and gi are the binary predicted segmentation voxels and ground truth voxels, respectively, and N is the number of voxels It measures the similarity of two samples directly and accordingly network weights optimized by minimizing the loss B A (12) Maximum symmetric surface distance (MSSD) denoted as (Eq 13), { } MSSD(A, B) = max max d(SA , S(B) ), max d(SB , S(A) ) (13) 4.3.2 Data augmentation The deep neural networks are data hungry It needs enormous data to train without over-fitting and generalize well on the test data The nonavailability of a vast labeled medical image database publicly limits the deep learning model to apply in the medical applications However, despite the limited publicly available databases, it is possible to train the deep learning model employing the data augmentation technique It allows for augmenting the database by applying standard geometric transformations like translation, rotation, and scaling We train our network by employing data augmentation We augment the training images by employing image transformations on the dataset, which are rotation, scaling, shifting, flipping, and elastic deformation The data augmentation is beneficial to reduce the risk of overfitting during the training process and refines the generalization potential of the model on test data (Yamashita et al., 2018) SA ∈S(A) SB ∈S(B) The DSC, IoU, VOE, and RAVD measured in percentage, and surface distance measures (ASSD and MSSD) measured in millimeters (mm) For DSC and IoU 100 % is the best segmentation, and % is the worst segmentation, VOE, and RAVD 0% is the best segmentation, and 100 % is the worst segmentation These metrics expressed in percentage The ASSD and MSSD measured in millimeters (mm) For ASSD and MSSD mm is the best segmentation, and the upper value has not bound; the maximum value shows the worst segmentation 4.5 Experimental results and analysis The proposed network has multi-scale feature extraction and feature recalibration ability that leads to better segmentation performance To verify the multi-scale feature representation potential of the proposed network, we evaluated the proposed network for scaling factor s = 2, s = 4, and s = and measure the performance using the measures 4.3.3 Implementation platform details The proposed network has been implemented in the Keras high-level neural network application programming interface (API) (Chollet, D.T Kushnure and S.N Talbar Computerized Medical Imaging and Graphics 89 (2021) 101885 Fig Preprocessing effect on input image along with the histogram: the first column indicates the sample input CT slices with histogram, the second column indicates HU Windowing with Histogram, and the third column indicates the histogram equalized images and equalized histogram D.T Kushnure and S.N 
4.4 Performance metrics

To evaluate the segmentation performance between the ground truth and the segmentation map of the proposed network, we utilize performance metrics based on volumetric size similarity and surface distance measures (Heimann et al., 2009) (Jiang et al., 2018). In the definitions, the ground truth is denoted by A, and the segmented result is denoted by B.

The Dice Similarity Coefficient (DSC) score is expressed as (Eq. 6):

DSC(A, B) = (2 |A ∩ B| / (|A| + |B|)) × 100        (6)

The Volumetric Overlap Error (VOE) is expressed (Eqs. 8 and 9) using the Jaccard Coefficient (JC), or Intersection over Union (IoU) (Eq. 7):

JC(A, B) = |A ∩ B| / |A ∪ B|        (7)

VOE(A, B) = 1 − JC        (8)

VOE(A, B) = (1 − |A ∩ B| / |A ∪ B|) × 100        (9)

The Relative Absolute Volume Difference (RAVD) is denoted as (Eq. 10):

RAVD = Abs((|B| − |A|) / |A|) × 100        (10)

The surface distance measures are the Average Symmetric Surface Distance (ASSD) and the Maximum Symmetric Surface Distance (MSSD). Let S(A) denote the set of surface voxels of A. The shortest distance of an arbitrary voxel v to S(A) is expressed as (Eq. 11):

d(v, S(A)) = min_{s_A ∈ S(A)} ‖v − s_A‖        (11)

where ‖·‖ denotes the Euclidean distance. ASSD is denoted as (Eq. 12):

ASSD(A, B) = (1 / (|S(A)| + |S(B)|)) ( Σ_{s_A ∈ S(A)} d(s_A, S(B)) + Σ_{s_B ∈ S(B)} d(s_B, S(A)) )        (12)

The Maximum Symmetric Surface Distance (MSSD) is denoted as (Eq. 13):

MSSD(A, B) = max { max_{s_A ∈ S(A)} d(s_A, S(B)), max_{s_B ∈ S(B)} d(s_B, S(A)) }        (13)

The DSC, IoU, VOE, and RAVD are expressed in percentage: for DSC and IoU, 100 % is the best segmentation and 0 % is the worst, while for VOE and RAVD, 0 % is the best segmentation and 100 % is the worst. The ASSD and MSSD are measured in millimeters (mm): 0 mm is the best segmentation, and, the values having no upper bound, larger values indicate worse segmentation.
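A minimal NumPy sketch of the volumetric metrics in Eqs. (6)-(10) for binary masks follows; the surface-distance measures (ASSD and MSSD, Eqs. 11-13) additionally require surface-voxel extraction and are omitted from this sketch.

import numpy as np

def volumetric_metrics(gt, seg):
    a, b = gt.astype(bool), seg.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    dsc  = 100.0 * 2.0 * inter / (a.sum() + b.sum())   # Eq. (6)
    jc   = inter / union                               # Eq. (7)
    voe  = 100.0 * (1.0 - jc)                          # Eqs. (8)-(9)
    ravd = 100.0 * abs(int(b.sum()) - int(a.sum())) / a.sum()  # Eq. (10)
    return {"DSC": dsc, "IoU": 100.0 * jc, "VOE": voe, "RAVD": ravd}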
4.5 Experimental results and analysis

The proposed network has multi-scale feature extraction and feature recalibration abilities that lead to better segmentation performance. To verify the multi-scale feature representation potential of the proposed network, we evaluated it for the scaling factors s = 2, s = 4, and s = 8 and measured the performance using the metrics defined in the previous section; Table 3 presents the performance of the proposed network for the various scaling factors. The proposed network achieved a dice similarity coefficient of 97.13 % for liver and 84.15 % for tumor segmentation with scaling factor s = 4. The scaling factor improves the CNN's capability to extract features at a multi-scale level with an improved receptive field.

Table 3
Experimental results on the 3Dircadb database for liver and tumor segmentation.

Scaling factor (s) | Target | DSC (%) | IoU (%) | VOE (%) | RAVD (%) | ASSD (mm) | MSSD (mm)
2 | Liver | 95.87 | 91.23 | 18.87 | 1.72 | 10.83 | 20.31
2 | Tumor | 68.85 | 55.34 | 44.66 | 2.24 | 3.23 | 15.01
4 | Liver | 97.13 | 94.42 | 5.57 | 0.41 | 4.08 | 10.21
4 | Tumor | 84.15 | 72.64 | 27.36 | 0.22 | 1.64 | 7.04
8 | Liver | 96.71 | 95.13 | 6.37 | 0.04 | 4.52 | 12.15
8 | Tumor | 78.19 | 64.19 | 35.81 | 0.36 | 1.92 | 7.84

The segmentation improvements obtained by the multi-scale feature representation and recalibration approach for liver and tumor are shown in Fig. 6. The results depict that the multi-scale features capture more detailed liver and tumor information and segment complicated liver parenchyma and tumors with little segmentation error. The boundary-marked images show the error between the ground truth and the segmented results: the network captures fuzzy liver boundaries with little segmentation error, and regions where the liver and tumor have little intensity variation are also captured with little error. However, the network shows negative results where the tumor size is sufficiently small, as shown in Fig. 7. The network performs well on liver pixel segmentation, but small tumors are not segmented as accurately as large tumors, and the tumor segmentation error increases as the number of small tumors in the 2D CT images increases.

The proposed network can segment the liver anatomical structure from abdominal CT images with little segmentation error, even where the boundaries between the liver and nearby organs are feeble. However, the tumor pixels are not accurately classified as the tumor size decreases: the segmentation error increases, and the network produces false-negative results for small-sized tumors. The multi-scale approach offers reasonable segmentation quality for complex liver anatomical structures and large tumors, but the tumor segmentation performance for small tumors still needs to be raised to an adequate level.

We also analyzed the effect of the multi-scale features on the network complexity in terms of the number of parameters, layers, floating-point operations per second (FLOPS), and prediction time per image, as shown in Table 4. As the scaling factor increases, the total number of parameters, FLOPS, and prediction time decrease, indicating that the network complexity reduces with the scaling factor. However, the total number of layers increases with the scaling factor, which improves the feature extraction capability and learning ability of the network. The analysis shows a tradeoff between the scaling factor and the segmentation performance of the network.

Table 4
Network complexity in terms of parameters, layers, FLOPS, and prediction time.

Scaling factor (s) | # Parameters | # Layers | # FLOPS | Prediction time per image (msec)
2 | 11,753,282 | 181 | 30,262,490 | 7.4
4 | 10,774,722 | 271 | 28,303,216 | 6.2
8 | 9,549,314 | 451 | 25,851,404 | 5.8

Fig. 6. Sample segmentation results (scaling factor s = 4). Row 1: input images; Row 2: liver and tumor ground truth (GT) images (red: liver, orange: tumor); Row 3: overlay of GT and input images (red: liver, purple: tumor); Row 4: segmented liver and tumor (dark green: liver, faint green: tumor); Row 5: overlay of segmented output and input (dark green: liver, faint green: tumor); Row 6: boundary-marked images with GT and segmented result; Row 7: magnified liver region (red: liver GT, blue: tumor GT, green: segmented liver region, yellow: segmented tumor region). (For interpretation of the references to colour in the figure, the reader is referred to the web version of this article.)

Fig. 7. Sample false-negative segmentation results for liver tumor (scaling factor s = 4). Rows are arranged as in Fig. 6. (For interpretation of the references to colour in the figure, the reader is referred to the web version of this article.)
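As a hypothetical check of the parameter/scale tradeoff reported in Table 4, the earlier build_ms_unet and res2net_se sketches can be chained and rebuilt for each scaling factor; the absolute counts will differ from the paper's exact implementation.

for s in (2, 4, 8):
    model = build_ms_unet(res2net_se, scale=s)
    print(f"s = {s}: {model.count_params():,} parameters")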
4.6 Comparison with other methods

We verified the performance of the proposed method against state-of-the-art methods for liver and tumor segmentation, performing the experimentation on the publicly available 3Dircadb dataset, which provides considerable variation and complexity of the liver and tumors. The segmentation results of the various methods on the 3Dircadb database are shown in Tables 5 and 6, respectively. The dice similarity score is one of the most significant performance measures for evaluating segmentation algorithms on medical images. The proposed method offers a dice similarity score of 97.13 % for the liver and 84.15 % for the liver tumor, far better than the baseline UNet architecture (Ronneberger et al., 2015). Our method is superior to the ResNet-based method (Han, 2017) proposed for liver tumors, with dice scores improved by 3.33 % for liver and 22.15 % for tumor segmentation. We also compared the results with the recently proposed modified UNet architecture (Seo et al., 2020); our method showed dice scores improved by 1.05 % for the liver and 13.08 % for tumors.

Table 5
Comparative result analysis on the 3Dircadb database for liver segmentation.

Methods | DSC (%) | VOE (%) | RAVD (%) | ASSD (mm) | MSSD (mm)
UNet | 72.90 | 39.00 | 8.7 | 19.40 | 119
ResNet | 93.80 | 11.65 | 3.86 | 3.91 | 12.88
mU-Net | 96.08 | 9.86 | 0.32 | 3.60 | 9.42
Ours | 97.13 | 5.57 | 0.41 | 4.08 | 10.21

Table 6
Comparative result analysis on the 3Dircadb database for tumor segmentation.

Methods | DSC (%) | VOE (%) | RAVD (%) | ASSD (mm) | MSSD (mm)
UNet | 51.0 | 62.55 | 38.42 | 11.11 | 86.60
ResNet | 62.00 | 42.60 | 4.12 | 6.78 | 58.04
mU-Net | 70.87 | 31.16 | 0.76 | 1.37 | 6.48
Ours | 84.15 | 27.36 | 0.22 | 1.64 | 7.04

The complexity of our proposed model compared with the other UNet-based methods is shown in Table 7. The Res2Net block reduces the number of parameters and increases the number of layers in the network, resulting in better learning and greatly reducing the computational burden compared to the ResNet and mU-Net models; the prediction time is also reduced because of the lower complexity. Compared with the baseline UNet model, the number of parameters and FLOPS increases somewhat, but the segmentation performance improves significantly.

Table 7
Comparison of network complexity with other methods.

Methods | # Parameters | # Layers | # FLOPS | Prediction time per image (msec)
UNet | 7,759,521 | 32 | 15,512,223 | 4.3
ResNet | 47,764,097 | 53 | 95,503,660 | 9.1
mU-Net | 35,438,209 | 88 | 70,838,747 | 8.6
Ours | 10,774,722 | 271 | 28,303,216 | 6.2

We also illustrate the statistical significance of the proposed model. We calculated the p-value to demonstrate the statistical significance of the proposed model against the other models; the p-value is the statistical way to test a method's hypothesis and is decided using statistical tests on the data. The segmentation performance of the different models is compared with the proposed model through statistical significance analysis by calculating the p-value between two models. The proposed model was verified for statistical significance at the significance level α = 0.05, or confidence level 0.95. The statistical significance of the model is decided by the p-value using the non-parametric Wilcoxon signed-rank test (Demsar, 2006), which is used for hypothesis testing. The test was performed on the predicted results obtained from the different models: the pairs of dice scores of each test sample from the two models under comparison are used to perform the Wilcoxon signed-rank test (Zabihollahy et al., 2019). We test the models by setting the null hypothesis (H0) that the two models have no statistically significant difference in performance, and the alternative hypothesis (Ha) that the proposed model has statistically significant performance over the other model. We performed the test on different groups of samples using the dice score of each sample, calculated against the ground truth provided in the dataset, to compare the model performance. The significance level set for the test is α = 0.05: if the p-value is smaller than the significance level (α), the null hypothesis is rejected in favor of the alternative hypothesis, meaning that the proposed model has statistically significant performance over the other model. Table 8 indicates that the p-value varies with the sample size, and it provides strong evidence that the performance of the proposed model and the compared models is significantly different. Fig. 8 shows that the p-value of the proposed model against each other model is less than the significance level of 0.05; therefore, the p-value demonstrates that the segmentation performance of the proposed model differs statistically significantly from all the other models tested. As we increase the number of samples, the p-value becomes very small, which provides strong support that the proposed model is statistically significant compared with the other models utilized for comparison.

Table 8
Statistical significance analysis with the p-value of the proposed method and other previously proposed methods for different sample sizes.

Methods | p-value (# Samples = 30) | p-value (# Samples = 50)
UNet and MS-UNet | 0.0007 | 1.5669 × 10⁻⁵
ResNet and MS-UNet | 0.0017 | 1.0246 × 10⁻⁶
mU-Net and MS-UNet | 0.0009 | 2.9938 × 10⁻⁹

Fig. 8. Statistical significance for different sample sizes and the p-values of the different models for significance level 0.05.
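A minimal SciPy sketch of the described pairwise comparison follows; dice_ms_unet and dice_other are hypothetical arrays of per-sample dice scores for the two models under comparison.

from scipy.stats import wilcoxon

def is_significant(dice_ms_unet, dice_other, alpha=0.05):
    # H0: no difference between the paired per-sample dice scores.
    stat, p_value = wilcoxon(dice_ms_unet, dice_other)
    return p_value, p_value < alpha   # reject H0 when p < alpha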
4.7 Discussion

Automatic liver and tumor segmentation is challenging due to the fuzzy boundaries between adjacent organs and the small intensity difference between the liver and tumor. Nevertheless, automatic segmentation of the liver and tumor plays a significant role in clinical interpretation and treatment planning for liver-related complications. In this paper, we exploited the multi-scale feature extraction property of the Res2Net module, which improves the receptive field of the CNN. It is essential to contextualize the objects extracted at various scales for effective segmentation. We utilized the recalibration mechanism of SENet, which further improves the channel-wise feature responses by modeling the interdependencies between channels; the network can thereby learn the high-level contextual information in a better way and improve the performance of the deeper layers, which leads to better feature representation ability.

We experimentally verified the potential of the multi-scale feature extraction property of the CNN for liver and tumor segmentation, leveraging the multi-scale property by varying the scaling factor of the network. By changing the scaling factor, the multi-scale feature representation and the receptive field increase, offering a better representation of the detailed features and accurate segmentation of the target object. Finally, we proposed an end-to-end trained network for automatic liver and tumor segmentation and compared its performance with state-of-the-art liver and tumor segmentation methods.
The scaling factor decides the multi-scale property: as we increase the scaling factor, the multi-scale feature representation potential increases and the receptive field of the CNN improves, which enhances the network's segmentation performance. However, increasing the scaling factor to relatively large values does not guarantee continued performance improvement. This happens because the Res2Net module learns an appropriate receptive field: if the object in the image is already covered by the receptive fields available in the Res2Net module, then increasing the scale can limit the performance of the network (Gao et al., 2019). On the other hand, as the scaling factor increases, the network parameters and FLOPS reduce, hence reducing the network complexity, but the performance starts degrading because the multi-scale network is limited to the restricted receptive field it can learn. The experimental results demonstrated that when the scaling factor increased to s = 8, the network performance slightly degraded. The scaling factor is therefore a parameter that needs to be chosen appropriately to attain better segmentation performance.

We also demonstrated the performance of the proposed model using statistical significance analysis with the p-value at the significance level α = 0.05; the non-parametric Wilcoxon signed-rank test utilized for hypothesis testing demonstrated that the proposed model is statistically significant compared with the previously proposed models. Further, the network is trained for liver and tumor segmentation using a supervised learning strategy, which makes the performance of the model depend on the quality and scale of the dataset. The network performance can be upgraded by tuning the network hyper-parameters, like the mini-batch size, learning rate, and learning strategies. In future work, we plan to verify the model performance on 3D CT volumes for various scaling factors and to tweak network parameters, such as the output multi-scale feature expansion (by increasing the group of convolution operations in the Res2Net module) and the dimensionality reduction factor of SENet, to boost the network performance for liver and tumor segmentation.
Further, the network is trained for liver and tumor segmentation with a supervised learning strategy, so the performance of the model depends on the quality and the scale of the dataset. The network performance can be improved further by tuning hyper-parameters such as the mini-batch size, the learning rate, and the learning strategy. In future work, we plan to verify the model performance on 3D CT volumes for various scaling factors and to tweak network parameters such as the output multi-scale feature expansion, by increasing the group of convolution operations in the Res2Net module, and the dimensionality reduction factor of SENet, to boost the network performance for liver and tumor segmentation.

5 Conclusion

We presented the end-to-end trained multi-scale UNet architecture for automatic liver and tumor segmentation. We utilized the multi-scale feature extraction property of the Res2Net module, which improves the receptive field of the CNN and extracts the features at a more granular level. Multi-scale features better characterize the object and preserve the high-level contextual information of the object, which plays a significant role in the semantic segmentation task. The feature recalibration property of the SE network improves the network's feature learning capability by modeling the channel-wise feature responses and representing the interdependencies between the channels.
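For illustration, the recalibration step can be sketched as a standard squeeze-and-excitation block (Hu et al., 2018); the reduction ratio of 8 below is a placeholder, not necessarily the value used in MS-UNet.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=8):
    """Squeeze-and-excitation channel recalibration (after Hu et al., 2018)."""
    channels = x.shape[-1]
    # Squeeze: global average pooling summarizes each channel in one value.
    s = layers.GlobalAveragePooling2D()(x)
    # Excitation: a bottleneck MLP models the channel interdependencies.
    s = layers.Dense(channels // reduction, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)
    # Recalibrate: rescale every channel of the input by its learned weight.
    s = layers.Reshape((1, 1, channels))(s)
    return layers.multiply([x, s])

# Example usage on an encoder feature map.
inputs = tf.keras.Input(shape=(128, 128, 64))
model = tf.keras.Model(inputs, se_block(inputs))
```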
We trained the model from scratch and verified the effect of the scaling factor on the multi-scale features and on the liver and tumor segmentation performance. Our network performs well on both the liver and the tumor for the selected scaling factor s. The dice coefficients of our network for the liver and the tumor are superior to those of the state-of-the-art methods proposed for liver and tumor segmentation. The proposed approach requires only minimal preprocessing operations, that is, HU-value windowing and image enhancement. It could also be utilized for segmenting other organs in other medical imaging applications with different image modalities such as MRI, PET, and ultrasound.
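As a reference for the preprocessing step, HU-value windowing amounts to clipping the CT intensities to a liver-appropriate range and rescaling them, as in the sketch below. The window of [−100, 400] HU is a common choice for liver CT and is an assumption here, not necessarily the exact window from our pipeline.

```python
import numpy as np

def window_hu(volume, hu_min=-100.0, hu_max=400.0):
    """Clip a CT volume (in Hounsfield units) to a window and scale to [0, 1].

    A window around [-100, 400] HU suppresses air and bone while preserving
    the soft-tissue contrast between liver parenchyma and tumors.
    """
    windowed = np.clip(volume, hu_min, hu_max)
    return (windowed - hu_min) / (hu_max - hu_min)

# Example: a random stand-in for a 512 x 512 CT slice in HU.
ct_slice = np.random.randint(-1000, 1500, size=(512, 512)).astype(np.float32)
normalized = window_hu(ct_slice)
print(normalized.min(), normalized.max())  # values now lie in [0, 1]
```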
CRediT authorship contribution statement

Devidas T. Kushnure: Conceptualization, Methodology, Writing - original draft, Investigation, Visualization, Software. Sanjay N. Talbar: Supervision, Resources, Writing - review & editing.

Declaration of Competing Interest

All the authors declare no conflict of interest.

Acknowledgements

The authors would like to extend heartfelt thanks to the Faculty and Management of Vidya Pratishthan's Kamalnayan Bajaj Institute of Engineering and Technology, Baramati, and Vidya Pratishthan's Institute of Information Technology, Baramati, for facilitating the computational resources to perform the experimentation.

References

3Dircadb Dataset. URL: https://www.ircad.fr/research/3d-ircadb-01/
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems.
Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495. https://doi.org/10.1109/TPAMI.2016.2644615
Bilic, P., Christ, P.F., Vorontsov, E., Chlebus, G., Chen, H., Dou, Q., Fu, C.W., Han, X., Heng, P.A., Hesser, J., Kadoury, S., Konopczynski, T., Le, M., Li, C., Li, X., Lipkova, J., Lowengrub, J., Meine, H., Moltz, J.H., Pal, C., Piraud, M., Qi, X., Qi, J., Rempfler, M., Roth, K., Schenk, A., Sekuboyina, A., Zhou, P., Hulsemeyer, C., Beetz, M., Ettlinger, F., Gruen, F., Kaissis, G., Lohöfer, F., Braren, R., Holch, J., Hofmann, F., Sommer, W., Heinemann, V., Jacobs, C., Mamani, G.E.H., van Ginneken, B., Chartrand, G., Tang, A., Drozdzal, M., Kadoury, S., Ben-Cohen, A., Klang, E., Amitai, M.M., Konen, E., Greenspan, H., Moreau, J., Hostettler, A., Soler, L., Vivanti, R., Szeskin, A., Lev-Cohain, N., Sosna, J., Joskowicz, L., Kumar, A., Kore, A., Wang, C., Feng, D., Li, F., Krishnamurthi, G., He, J., Wu, J., Kim, J., Zhou, J., Ma, J., Li, J., Maninis, K.K., Kaluva, K.C., Bi, L., Khened, M., Bellver, M., Lin, Q., Yang, X., Yuan, Y., Chen, Y., Li, Y., Qiu, Y., Wu, Y., Menze, B., 2019. The liver tumor segmentation benchmark (LiTS). arXiv, 1–43.
Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A., Jemal, A., 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424. https://doi.org/10.3322/caac.21492
Budak, U., Guo, Y., Tanyildizi, E., Sengur, A., 2020. Cascaded deep convolutional encoder-decoder neural networks for efficient liver tumor segmentation. Med. Hypotheses 134, 109431. https://doi.org/10.1016/j.mehy.2019.109431
Chen, L.C., Papandreou, G., Schroff, F., Adam, H., 2017. Rethinking atrous convolution for semantic image segmentation. arXiv.
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L., 2018. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848. https://doi.org/10.1109/TPAMI.2017.2699184
Chen, Y., Wang, K., Liao, X., Qian, Y., Wang, Q., Yuan, Z., Heng, P.A., 2019. Channel-UNet: a spatial channel-wise convolutional neural network for liver and tumors segmentation. Front. Genet. 10, 1–13. https://doi.org/10.3389/fgene.2019.01110
Chlebus, G., Meine, H., Moltz, J.H., Schenk, A., 2017. Neural network-based automatic liver tumor segmentation with random forest-based candidate filtering. arXiv, 5–8.
Chollet, F., 2015. Keras [WWW Document]. URL https://github.com/keras-team/keras
Christ, P.F., Ettlinger, F., Grün, F., Elshaer, M.E.A., Lipková, J., Schlecht, S., Ahmaddy, F., Tatavarty, S., Bickel, M., Bilic, P., Rempfler, M., Hofmann, F., D'Anastasi, M., Ahmadi, S.A., Kaissis, G., Holch, J., Sommer, W., Braren, R., Heinemann, V., Menze, B., 2017. Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks. arXiv, 1–20.
Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C., 2016. The importance of skip connections in biomedical image segmentation. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 10008 LNCS, 179–187. https://doi.org/10.1007/978-3-319-46976-8_19
Efremova, D.B., Konovalov, D.A., Siriapisith, T., Kusakunniran, W., Haddawy, P., 2019. Automatic segmentation of kidney and liver tumors in CT images. arXiv. https://doi.org/10.24926/548719.038
Gao, S.H., Cheng, M.M., Zhao, K., Zhang, X.Y., Yang, M.H., Torr, P., 2019. Res2Net: A new multi-scale backbone architecture. arXiv, 1–10. https://doi.org/10.1109/tpami.2019.2938758
Gotra, A., Sivakumaran, L., Chartrand, G., Vu, K.N., Vandenbroucke-Menu, F., Kauffmann, C., Kadoury, S., Gallix, B., de Guise, J.A., Tang, A., 2017. Liver segmentation: indications, techniques and future directions. Insights Imaging 8, 377–392. https://doi.org/10.1007/s13244-017-0558-1
Gruber, N., Antholzer, S., Jaschke, W., Kremser, C., Haltmeier, M., 2019. A joint deep learning approach for automated liver and tumor segmentation. arXiv, 1–10.
Han, X., 2017. MR-based synthetic CT generation using a deep convolutional neural network method. Med. Phys. 44, 1408–1419. https://doi.org/10.1002/mp.12155
Heimann, T., van Ginneken, B., Styner, M.A., Arzhaeva, Y., Aurich, V., Bauer, C., Beck, A., Becker, C., Beichel, R., Bekes, G., Bello, F., Binnig, G., Bischof, H., Bornik, A., Cashman, P.M.M., Chi, Y., Córdova, A., Dawant, B.M., Fidrich, M., Furst, J.D., Furukawa, D., Grenacher, L., Hornegger, J., Kainmüller, D., Kitney, R.I., Kobatake, H., Lamecker, H., Lange, T., Lee, J., Lennon, B., Li, R., Li, S., Meinzer, H.P., Németh, G., Raicu, D.S., Rau, A.M., van Rikxoort, E.M., Rousson, M., Ruskó, L., Saddi, K.A., Schmidt, G., Seghers, D., Shimizu, A., Slagmolen, P., Sorantin, E., Soza, G., Susomboon, R., Waite, J.M., Wimmer, A., Wolf, I., 2009. Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Trans. Med. Imaging 28, 1251–1265. https://doi.org/10.1109/TMI.2009.2013851
Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E., 2018. Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141. https://doi.org/10.1109/CVPR.2018.00745
Jiang, H., Li, Shaojie, Li, Siqi, 2018. Registration-based organ positioning and joint segmentation method for liver and tumor segmentation. Biomed. Res. Int. 2018. https://doi.org/10.1155/2018/8536854
Jin, Q., Meng, Z., Sun, C., Wei, L., Su, R., 2018. RA-UNet: A hybrid deep attention-aware network to extract liver and tumor in CT scans. arXiv, 1–13. https://doi.org/10.3389/fbioe.2020.605132
Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A., 2018. H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 37, 2663–2674. https://doi.org/10.1109/TMI.2018.2845918
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M., van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005
Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431–3440.
Luo, S., Li, X., Li, J., 2014. Review on the methods of automatic liver segmentation from abdominal images. J. Comput. Commun. 02, 1–7. https://doi.org/10.4236/jcc.2014.22001
Moghbel, M., Mashohor, S., Mahmud, R., Saripan, M.I.Bin, 2018. Review of liver segmentation and computer assisted detection/diagnosis methods in computed tomography. Artif. Intell. Rev. 50, 497–537. https://doi.org/10.1007/s10462-017-9550-x
Noh, H., Hong, S., Han, B., 2015. Learning deconvolution network for semantic segmentation. Proceedings of the IEEE International Conference on Computer Vision, 1520–1528. https://doi.org/10.1109/ICCV.2015.178
Qin, W., Wu, J., Han, F., Yuan, Y., Zhao, W., Ibragimov, B., Gu, J., Xing, L., 2019. Superpixel-based and boundary-sensitive convolutional neural network for automated liver segmentation. Physiol. Behav. 176, 139–148. https://doi.org/10.1088/1361-6560/aabd19
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: convolutional networks for biomedical image segmentation. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9351, 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Rundo, L., Han, C., Nagano, Y., Zhang, J., Hataya, R., Militello, C., Tangherloni, A., Nobile, M.S., Ferretti, C., Besozzi, D., Gilardi, M.C., Vitabile, S., Mauri, G., Nakayama, H., Cazzaniga, P., 2019. USE-Net: Incorporating Squeeze-and-Excitation blocks into U-Net for prostate zonal segmentation of multi-institutional MRI datasets. Neurocomputing 365, 31–43. https://doi.org/10.1016/j.neucom.2019.07.006
Seo, H., Huang, C., Bassenne, M., Xiao, R., Xing, L., 2020. Modified U-Net (mU-Net) with incorporation of object-dependent high level features for improved liver and liver-tumor segmentation in CT images. IEEE Trans. Med. Imaging 39, 1316–1325. https://doi.org/10.1109/TMI.2019.2948320
Shelhamer, E., Long, J., Darrell, T., 2017. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 640–651. https://doi.org/10.1109/TPAMI.2016.2572683
Spinczyk, D., Badura, A., Sperka, P., Stronczek, M., Pycinski, B., Juszczyk, J., Czajkowska, J., Biesok, M., Rudzki, M., Wiecławek, W., Zarychta, P., Badura, P., Woloshuk, A., Zylkowski, J., Rosiak, G., Konecki, D., Milczarek, K., Rowinski, O., Pietka, E., 2019. Supporting diagnostics and therapy planning for percutaneous ablation of liver and abdominal tumors and pre-clinical evaluation. Comput. Med. Imaging Graph. 78, 101664. https://doi.org/10.1016/j.compmedimag.2019.101664
Ueda, D., Shimazaki, A., Miki, Y., 2019. Technical and clinical overview of deep learning in radiology. J. Radiol. 37, 15–33. https://doi.org/10.1007/s11604-018-0795-3
Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K., 2018. Convolutional neural networks: an overview and application in radiology. Insights Imaging 9, 611–629. https://doi.org/10.1007/s13244-018-0639-9
Zabihollahy, F., White, J.A., Ukwatta, E., 2019. Convolutional neural network-based approach for segmentation of left ventricle myocardial scar from 3D late gadolinium enhancement MR images. Med. Phys. 46, 1740–1751. https://doi.org/10.1002/mp.13436
Zhang, Yao, He, Z., Zhong, C., Zhang, Yang, Shi, Z., 2017. Fully convolutional neural network with post-processing methods for automatic liver segmentation from CT. In: Proc. 2017 Chinese Automation Congress (CAC 2017), pp. 3864–3869. https://doi.org/10.1109/CAC.2017.8243454
Zhang, W., Tang, P., Zhao, L., Huang, Q., 2019. A comparative study of U-Nets with various convolution components for building extraction. In: 2019 Joint Urban Remote Sensing Event (JURSE 2019). https://doi.org/10.1109/JURSE.2019.8809055
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J., 2017. Pyramid scene parsing network. In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). https://doi.org/10.1109/CVPR.2017.660
Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J., 2020. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Trans. Med. Imaging 39, 1856–1867. https://doi.org/10.1109/TMI.2019.2959609
