The 2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)

Dilated Residual Convolutional Neural Networks for Low-Dose CT Image Denoising

Nguyen Thanh Trung (1,2), Dinh-Hoan Trinh (3), Nguyen Linh Trung (1), Tran Thi Thuy Quynh (1), Manh-Ha Luu (1)
(1) VNU University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam
(2) University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Vietnam
(3) VIBOT ERL CNRS 6000 / ImViA, University of Bourgogne, France
Emails: nttrungktmt@ictu.edu.vn, linhtrung@vnu.edu.vn

Abstract—X-ray computed tomography (CT) imaging, which uses X-rays to acquire image data, is widely used in medicine. High X-ray doses may be harmful to the patient's health; therefore, X-ray doses are often reduced at the expense of reduced CT image quality. This paper presents a convolutional neural network model for low-dose CT image denoising, inspired by a recently introduced dilated residual network for despeckling of synthetic aperture radar images (SAR-DRN). In particular, batch normalization is added to some layers of SAR-DRN in order to adapt it to low-dose CT denoising. In addition, a pre-processing layer and a post-processing layer are added in order to enlarge the receptive field and to reduce the computational time. Moreover, the perceptual loss combined with the MSE loss is used in the training phase so that the proposed model preserves more subtle details in the denoised images. Experimental results show that the proposed model denoises low-dose CT images efficiently compared with some state-of-the-art methods.

Index Terms—Computed tomography, low-dose imaging, medical image denoising, dilated residual network, convolutional neural network, perceptual loss

I. INTRODUCTION

Computed tomography (CT) images play an important role in disease diagnosis and therapy. The quality of CT images is closely related to the X-ray radiation dose. The X-rays inherent to CT acquisition may be harmful to health, so it is necessary to develop methods that improve the quality of CT images without increasing the X-ray dose. Of interest in this paper are methods that can enhance the quality of low-dose CT (LDCT) images, which often contain more noise and artifacts.

Various methods have been proposed for reducing noise and artifacts in LDCT images. They can be categorized into three main groups: (i) sinogram processing before reconstruction [1], (ii) iterative reconstruction [2], and (iii) image post-processing after reconstruction [3], [4]. Sinogram filtration has the advantage that the noise characteristics of the sinogram can be modeled well, which is very useful for denoising, but it may reduce resolution and blur edges in the reconstructed images; moreover, the sinograms of commercial scanners are usually not available to the user. Iterative reconstruction methods give better reconstructed images than sinogram filtration but often require a higher computational cost, which limits their practical application. Image post-processing methods, which directly process the reconstructed images, have attracted many researchers because of the availability of reconstructed images and the low computational cost. Various such methods have been proposed in the literature; some borrow ideas from natural image denoising, while others are specifically designed for LDCT images. The noise in LDCT images is often complicated, so denoising these images with traditional methods may achieve only limited results
because these methods often rely on assumptions that do not model well the noise inherent to LDCT. With convolutional neural networks (CNNs), the problem can be addressed efficiently if a suitable dataset is available for model training. Several CNN-based methods have been proposed for LDCT image denoising [5]–[7]; they often require a large network size or use a complex network structure. In [8], the authors take advantage of dilated convolution, residual learning, and skip connections to propose a dilated residual network for despeckling of synthetic aperture radar images (SAR-DRN). The results of this work are impressive and inspire the method developed in this paper for denoising LDCT images. It is also noted that dilated convolution can enlarge the receptive field while keeping the filter size and the layer depth unchanged.

The noise in an LDCT image is generally different from that in a SAR image. Our objective is to adapt SAR-DRN so that the adapted model can denoise LDCT images better and faster than SAR-DRN while using the same parameter settings as SAR-DRN. To this end, we propose a model based on SAR-DRN with four changes: (i) apply batch normalization in layers of the network (see Figure 2), (ii) add two layers to the network, a downsample layer and an upsample layer (see Figure 2), (iii) use the perceptual loss combined with the MSE loss in the training phase, and (iv) replace the training data with a set of pairs of low-dose and normal-dose CT images. Hereafter we name the proposed method FDRN-LDCT, which stands for Fast Dilated Residual Network for Low-dose CT image denoising.

II. METHOD

A. Problem

Let x ∈ R^{n×m} be an LDCT image and y ∈ R^{n×m} be the corresponding normal-dose CT image. It can be assumed that x is a degraded version of y and that there exists a map from x to y, so denoising the LDCT image can be considered as constructing a function

F : x → y   (1)

that maps x to y. Because of the complexity of the noise in LDCT images [9], it is not easy to construct F by conventional methods. However, by using deep learning, F can be trained feasibly when a suitable dataset is available. In this work, we adapt the dilated residual network of [8] to denoise LDCT images.

B. Network structure

The proposed FDRN-LDCT network, shown in Figure 2, consists of a pre-processing layer, followed by nonlinear mapping layers and then a post-processing layer. The pre-processing layer, L0, down-samples the noisy input image: it splits an n × m input image into four sub-images of size ⌈n/2⌉ × ⌈m/2⌉. The function of this layer is illustrated in the downsample-layer figure. We added this layer to increase the receptive field of the original SAR-DRN network in [8]. The noisy sub-images are then passed through the nonlinear mapping layers to generate denoised sub-images, which are upsampled to yield the denoised image with the same size as the input image. In the nonlinear mapping layers, dilated convolutions with different dilation rates are used, arranged in a U-Net-like style (1-2-3-4-3-2-1). Dilated convolution is illustrated in Figure 1.

[Figure 1: Illustration of dilated convolution with different rates (source: https://www.researchgate.net/figure/)]
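To make the structure of Section II-B concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The channel width (64 filters), the 3×3 kernel size, the plain sequential layout without SAR-DRN's skip connections, and the form of the last layer are assumptions; only the pixel-unshuffle/shuffle pre- and post-processing, the dilation rates (1-2-3-4-3-2-1), and the BN blocks between dilated convolution and ReLU follow the description above.

```python
import torch
import torch.nn as nn


class FDRNLDCT(nn.Module):
    """Minimal sketch of the FDRN-LDCT structure described above.

    Pre-processing:  pixel-unshuffle splits the n x m input into 4 sub-images.
    Mapping:         dilated Conv -> BatchNorm -> ReLU blocks with dilation
                     rates arranged U-Net style (1-2-3-4-3-2-1).
    Post-processing: pixel-shuffle recombines the denoised sub-images.

    The channel width (64), the 3x3 kernels and the sequential layout are
    assumptions carried over from SAR-DRN [8]; the skip connections of the
    original network are not reproduced here.
    """

    def __init__(self, width: int = 64, rates=(1, 2, 3, 4, 3, 2, 1)):
        super().__init__()
        self.down = nn.PixelUnshuffle(2)      # (B,1,n,m) -> (B,4,n/2,m/2)
        self.up = nn.PixelShuffle(2)          # inverse of the pre-processing layer

        blocks, in_ch = [], 4                 # 4 channels = one CT slice after unshuffling
        for i, r in enumerate(rates):
            last = (i == len(rates) - 1)
            out_ch = 4 if last else width     # last layer maps back to 4 sub-images
            blocks.append(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r))
            if not last:                      # BN inserted between dilated conv and ReLU
                blocks += [nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.mapping = nn.Sequential(*blocks)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sub = self.down(x)                    # noisy sub-images
        return self.up(self.mapping(sub))     # denoised image, same size as the input


# Example: denoise a 512 x 512 quarter-dose slice (random data here).
if __name__ == "__main__":
    y_hat = FDRNLDCT()(torch.randn(1, 1, 512, 512))
    print(y_hat.shape)                        # torch.Size([1, 1, 512, 512])
```

In this reading, working on half-resolution sub-images is also what makes the model fast: each dilated convolution processes roughly a quarter of the original pixels while its receptive field, measured in the original image, is doubled.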
To deploy the dilated residual network of [8] for denoising LDCT images, we have to use a dataset that is very different from the data used for SAR-DRN. Accordingly, adaptation is needed, including the learning rate and the initialization. We adapt SAR-DRN to denoise LDCT images by adding 2-dimensional batch normalization blocks between each dilated convolution and ReLU. Batch normalization is beneficial in the training phase [10]; for instance, it gives more flexibility in the initialization.

[Figure 2: The structure of the proposed FDRN-LDCT model]

C. Loss function

While the architecture determines the complexity of the model, the loss function controls how the denoising model is learned from data [11] and thus plays an important role. In image restoration tasks, the (per-pixel) MSE loss is associated with over-smoothing of edges and loss of details, while the perceptual loss is more robust to these issues [12].

Suppose that the training set is a set of N image pairs {(x_i, y_i)}_{i=1}^{N}, and ŷ_i is the network output corresponding to input x_i, i = 1, …, N. The MSE loss is defined as

L_MSE(Θ) = (1/2N) Σ_{i=1}^{N} ‖ŷ_i − y_i‖² = (1/2N) Σ_{i=1}^{N} ‖F(x_i; Θ) − y_i‖²,   (2)

where Θ is the set of network parameters. The perceptual loss is determined in the feature space and, with the above training set, is defined as

L_Perception(Θ) = (1/(2N·w·h·d)) Σ_{i=1}^{N} ‖Φ(F(x_i; Θ)) − Φ(y_i)‖_F²,   (3)

where Φ is a feature extractor and w, h, and d are the width, the height, and the depth of the feature space, respectively. To take advantage of both of these losses, we propose a loss function that linearly combines them:

L(Θ) = L_MSE + λ L_Perception,   (4)

where λ is a nonnegative weight used to balance the role of each loss term.

III. EXPERIMENTS AND PERFORMANCE EVALUATION

To evaluate the performance of the proposed FDRN-LDCT model, several experiments are conducted. We compare the proposed method with some CNN-based denoising methods and a state-of-the-art conventional method, namely BM3D. Two common metrics, PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural SIMilarity), are used for objective comparison.

A. Data

The model is designed to map an LDCT image to a normal-dose one, so it requires a dataset containing such image pairs. In our experiments, we use the clinical dataset authorized for "the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge" by the Mayo Clinic. This dataset contains low-dose and full-dose images of 10 anonymous patients. For training, 22,600 patch pairs of quarter-dose and normal-dose images are extracted from 600 image pairs that are randomly selected from the 1,351 pairs of the training patients. Images of the remaining patients are used for validation (100 image pairs are randomly selected).

B. Models for comparison

To compare the performance of the proposed FDRN-LDCT model, the models in Table I are trained and used to denoise LDCT images. The original SAR-DRN is trained on CT image data with the same parameter settings as in [8]. The second model, DRN(BN)-LDCT, is trained to investigate its ability to denoise LDCT images compared with SAR-DRN. The third model, FDRN-LDCT, is trained to assess the improvement in performance over DRN(BN)-LDCT. For FDRN-LDCT, we use two types of loss, one with only the MSE loss and the other with both the MSE and perceptual losses, where VGG is used for the perceptual loss. All models are trained using the same dataset. A state-of-the-art model, RED-CNN [5], is also trained for comparison.
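The combined objective of (2)–(4), with the VGG-based perceptual term mentioned above, can be sketched in PyTorch as follows. This is an illustrative implementation, not the authors' code: the choice of VGG-16 features up to relu3_3 as Φ, the replication of the single CT channel to three channels, the omission of ImageNet normalization, and the use of mean-reduced MSE in place of the explicit 1/(2N) and 1/(whd) factors are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class CombinedLoss(nn.Module):
    """L(Theta) = L_MSE + lambda * L_Perception, following Eqs. (2)-(4).

    Assumptions (not specified in the paper): VGG-16 features up to relu3_3
    play the role of the extractor Phi, and the single CT channel is repeated
    to three channels to match the VGG input.
    """

    def __init__(self, lam: float = 0.1):
        super().__init__()
        self.lam = lam
        self.phi = vgg16(weights="IMAGENET1K_V1").features[:16].eval()  # torchvision >= 0.13
        for p in self.phi.parameters():       # Phi is fixed; only the denoiser is trained
            p.requires_grad_(False)

    def forward(self, y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        mse = F.mse_loss(y_hat, y)                             # per-pixel term, Eq. (2)
        feat_hat = self.phi(y_hat.repeat(1, 3, 1, 1))          # Phi(F(x; Theta))
        feat_ref = self.phi(y.repeat(1, 3, 1, 1))              # Phi(y)
        perc = F.mse_loss(feat_hat, feat_ref)                  # feature-space term, Eq. (3)
        return mse + self.lam * perc                           # Eq. (4), lambda = 0.1
```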
TABLE I
SUMMARY OF TRAINED NETWORKS

Network           Loss               Description
SAR-DRN           L_MSE              Original model in [8], trained with the CT dataset
DRN(BN)-LDCT      L_MSE              BN blocks inserted into some layers of SAR-DRN
FDRN-LDCT(MSE)    L_MSE              Downsample + DRN-LDCT + Upsample
FDRN-LDCT(VGG)    L_MSE + λL_VGG     FDRN-LDCT trained with the combined loss
RED-CNN           L_MSE              State-of-the-art CNN [5]

[Figure: Output images of SAR-DRN trained for LDCT denoising, panels (a) and (b)]

[Figure: The downsample layer]

C. Parameter Setting

The weight λ in (4) is set to 0.1. The ADAM algorithm [13] is used to minimize the loss function, with all its default hyper-parameter values. The number of epochs is set to 50; the learning rate starts at 10⁻² and is reduced by multiplying it by a factor of 0.5 after every 10 epochs. These settings are based on the settings in [8].

D. Results

Some output images of the SAR-DRN model trained for LDCT denoising are shown in the corresponding figure. It can be seen that the noise was not removed, so this model could not be used for LDCT denoising. The reason may be that the dataset used is not suitable for SAR-DRN, and/or the training parameters are not appropriate. The figure illustrating LDCT image denoising compares the results of several methods; it can be seen that all variants of SAR-DRN in Table I denoise LDCT images better than BM3D [14] and RED-CNN [5]. For objective comparison, the figure of statistical values shows the average SSIM and PSNR over 100 test images. The SSIM and PSNR of FDRN-LDCT are similar to those of DRN(BN)-LDCT and slightly better than those of RED-CNN.

[Figure: Statistical values of SSIM and PSNR over 100 test images for BM3D, RED-CNN, DRN(BN)-LDCT, FDRN-LDCT(MSE), and FDRN-LDCT(VGG)]

[Figure: Illustration of LDCT image denoising: (a) quarter-dose, (b) BM3D, (c) RED-CNN, (d) FDRN-LDCT(MSE), (e) FDRN-LDCT(VGG), (f) full-dose]

E. Network capacity and computational time

TABLE II
NETWORK CAPACITY AND TESTING TIME

Network         Capacity (MB)    Testing time on CPU (s)
RED-CNN         1.78             4.3807
DRN(BN)-LDCT    0.75             1.939
FDRN-LDCT       0.75             0.4753

Table II shows the capacity and the average computational time of RED-CNN, DRN(BN)-LDCT, and FDRN-LDCT over the test image set. The test image size is 512 × 512, and the computational time is measured when denoising on a CPU (Intel(R) Core(TM) i9-9900K @ 3.60 GHz × 8). The capacity of FDRN-LDCT is 0.75 MB, which is smaller than that of RED-CNN (1.78 MB). The computational time of FDRN-LDCT is also lower than that of RED-CNN and DRN(BN)-LDCT when testing is performed on the CPU.
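For reference, the parameter settings of Section III-C could be wired together as in the short sketch below; the model, loss, and data loader named here are the hypothetical components sketched earlier, not released code.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader


def train_fdrn_ldct(model: torch.nn.Module,
                    criterion: torch.nn.Module,
                    train_loader: DataLoader,
                    epochs: int = 50) -> None:
    """Training schedule of Section III-C: Adam with default hyper-parameters,
    50 epochs, learning rate starting at 1e-2 and halved every 10 epochs.
    `model` and `criterion` are assumed to be the FDRNLDCT and CombinedLoss
    sketches above; `train_loader` yields (quarter-dose, normal-dose) patch pairs."""
    optimizer = Adam(model.parameters(), lr=1e-2)             # default betas/eps as in [13]
    scheduler = StepLR(optimizer, step_size=10, gamma=0.5)    # x0.5 every 10 epochs
    model.train()
    for _ in range(epochs):
        for low_dose, full_dose in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(low_dose), full_dose)      # combined loss of Eq. (4)
            loss.backward()
            optimizer.step()
        scheduler.step()
```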
IV. CONCLUSION

In this paper, we have successfully adapted the SAR-DRN model for LDCT image denoising. Among the three adapted models, FDRN-LDCT is promising because it requires less computational time and a smaller network capacity while denoising LDCT images efficiently. In future work, we will concentrate on training the model with a richer dataset, optimizing the parameters, and applying techniques such as deformable convolution and separable convolution to improve the performance of FDRN-LDCT.

REFERENCES

[1] J. Wang, T. Li, H. Lu, and Z. Liang, "Penalized weighted least-squares approach to sinogram noise reduction and image reconstruction for low-dose X-ray computed tomography," IEEE Transactions on Medical Imaging, vol. 25, no. 10, pp. 1272–1283, 2006.
[2] M. J. Willemink, T. Leiner, P. A. de Jong, L. M. de Heer, R. A. Nievelstein, A. M. Schilham, and R. P. Budde, "Iterative reconstruction techniques for computed tomography part 2: initial results in dose reduction and image quality," European Radiology, vol. 23, no. 6, pp. 1632–1642, 2013.
[3] J. Ma, J. Huang, Q. Feng, H. Zhang, H. Lu, Z. Liang, and W. Chen, "Low-dose computed tomography image restoration using previous normal-dose scan," Medical Physics, vol. 38, no. 10, pp. 5713–5731, 2011.
[4] Y. Chen, X. Yin, L. Shi, H. Shu, L. Luo, J.-L. Coatrieux, and C. Toumoulin, "Improving abdomen tumor low-dose CT images using a fast dictionary learning based processing," Physics in Medicine & Biology, vol. 58, no. 16, p. 5803, 2013.
[5] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, "Low-dose CT with a residual encoder-decoder convolutional neural network," IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
[6] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, 2018.
[7] J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum, "Generative adversarial networks for noise reduction in low-dose CT," IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2536–2545, 2017.
[8] Q. Zhang, Q. Yuan, J. Li, Z. Yang, and X. Ma, "Learning a dilated residual network for SAR image despeckling," Remote Sensing, vol. 10, no. 2, p. 196, 2018.
[9] J. Hsieh et al., "Computed tomography: principles, design, artifacts, and recent advances," SPIE, Bellingham, WA, 2009.
[10] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[11] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, "3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1522–1534, 2018.
[12] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in European Conference on Computer Vision. Springer, 2016, pp. 694–711.
[13] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[14] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.