Advancements in Medical Image Segmentation and Innovative Ap-
Image segmentation is a pivotal and challenging topic in the field of computer vision [6]. Its objective is to partition an image in a way that accurately locates, identifies, and quantifies objects. This process holds crucial importance in medical imaging, supporting additional clinical analysis, diagnosis, therapy planning, and disease progression measurement. Within the domain of medical image segmentation, several primary obstacles exist. These include a scarcity of well-labeled benchmarks for training, a deficiency of annotated images [7], a lack of consistent segmentation techniques, poor image resolution, and significant variability in image quality across patients [8]. Precise calculation of segmentation accuracy and uncertainty is vital for gauging performance in other applications [9]. Consequently, this underscores the imperative for advanced methodologies, such as Artificial Intelligence (AI)-based approaches, to enable automated, generalizable, and efficient medical image segmentation.
In the context of developing AI systems, the attributes of generalization and robustness bear critical significance, particularly in clinical trials [10]. Consequently, the development of a resilient architecture suited for diverse biomedical applications becomes paramount. Recently, convolutional neural networks (CNNs) have emerged as advanced tools for automating the segmentation of medical images [11–13]. This includes various modalities such as X-rays, CT scans, and MRIs, with promising outcomes compared to conventional segmentation methods [14, 15]. Among different CNN versions, encoder-decoder networks like Fully Convolutional Networks (FCN) [16] and their successors such as U-Net [17] have gained substantial traction as semantic segmentation techniques for 2D images. A deep fully convolutional neural network designed for semantic pixel-wise segmentation, which requires fewer trainable parameters yet yields high-quality segmentation maps, was introduced by [18]. Addressing dense prediction challenges, a novel convolutional network module was proposed by [19]. This module utilized dilated convolutions to systematically aggregate multi-scale contextual features, resulting in a significant performance enhancement for advanced automated segmentation systems. Moreover, [20] introduced DeepLab as a segmentation method. DeepLabV3 [21], without DenseCRF fine-tuning, demonstrated considerable improvements over earlier DeepLab iterations, utilizing a synthetic approach with fewer convolutional layers than FCN and U-Net architectures, along with skip connections between the encoder and decoder paths. An efficient scene parsing network for comprehending complex receptive fields was proposed by [22]. This approach utilized global pyramidal characteristics to facilitate the acquisition of additional contextual information.
Throughout the training process, CNN model parameters are typically refined using gradient descent techniques, as outlined by [23], wherein errors are quantified by a loss function that contrasts predicted labels against ground truth labels. For classification endeavors, prevalent loss functions encompass cross-entropy (CE) and the L2 norm, often referred to as the mean squared error (MSE), as frequently cited in the works [24, 25]. Conversely, problems centered on segmentation have commonly engaged the Dice Coefficient (DC) and cross-entropy (CE) [17, 26]. Despite the recent strides made in CNN deployment for biomedical image segmentation, prevalent loss functions frequently revolve around pixel-wise similarity evaluation. Notably, the DC and CE are tailored towards specific region feature extraction. While this framework often yields impressive classification and segmentation outcomes, low loss function values do not always signify meaningful segmentation. Instances arise where noisy images produce several indistinct contours, signaling erroneous predictions, and the indistinctness of object boundaries stems from the difficulty in classifying pixels near the contour. An additional challenge arises from susceptibility to local minima due to aberrations within the training database, high dimensionality, and the non-convex attributes of loss functions, as illuminated by [27].
Among common deep CNN approaches, the fully convolutional network (FCN) [28] and U-Net [17] replace fully connected layers with deconvolutional operations to strengthen temporal coherence; they also use skip connections so that deeper layers inherit spatial information. Depthwise convolution [29] is a channel-wise n×n spatial convolution: it splits the input into its channels, convolves each channel with its own kernel, and then stacks the results back together. Pointwise convolution [29] is a 1×1 convolution used to adjust the channel dimension of the feature map. A depthwise separable convolution [29] is a depthwise convolution followed by a pointwise convolution, which helps prevent overfitting by reducing the number of connections in the model.
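The reduction in connections can be made concrete by counting weights. The sketch below is only illustrative: the function names and the channel sizes (64 input channels, 128 output channels, 3×3 kernels) are assumptions chosen for the example, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Number of weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k kernel per input channel
    pointwise = c_in * c_out   # 1 x 1 kernels that mix the channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable)  # 73728 8768
```

For these sizes the separable variant uses roughly one eighth of the weights of the standard convolution.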
Dilated convolution [30] expands the window size without increasing the number of weights by inserting zeros into the convolution kernel, while keeping the computation cost unchanged. Adaptive Dilated Convolution [31] generates and fuses multi-scale features of similar spatial sizes by setting different dilation rates for different channels. Building on dilated convolution, the Compact Dilation Convolution-based Module (CDCM) [32] is adopted in my proposed model to extract more useful features.
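As a minimal one-dimensional sketch of the zero-insertion idea (not the CDCM module itself), the code below dilates a 3-tap kernel with rate 2, which widens its window from 3 to 5 samples while the number of nonzero weights stays the same. The helper names and the example signal are assumptions made for illustration.

```python
import numpy as np


def dilate_kernel(kernel, rate):
    """Insert (rate - 1) zeros between neighbouring kernel taps."""
    kernel = np.asarray(kernel, dtype=float)
    size = rate * (len(kernel) - 1) + 1   # effective window size
    dilated = np.zeros(size)
    dilated[::rate] = kernel
    return dilated


def dilated_conv1d(signal, kernel, rate):
    """'Valid' 1-D cross-correlation of a signal with a dilated kernel."""
    signal = np.asarray(signal, dtype=float)
    return np.correlate(signal, dilate_kernel(kernel, rate), mode="valid")


x = np.arange(10.0)                  # example signal 0, 1, ..., 9
k = [1.0, 0.0, -1.0]                 # 3 taps, 2 of them nonzero
wide = dilate_kernel(k, rate=2)      # [1, 0, 0, 0, -1]: window of 5 samples
y = dilated_conv1d(x, k, rate=2)     # each output is x[i] - x[i + 4]
```

In practice a framework implements dilation by skipping input samples rather than by materializing the zeros, which is why the computation cost does not grow with the window.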
Region-based Tversky loss [33] and Focal Tversky loss [34] control the information flow implicitly through pixel-level affinity and tackle class-imbalance problems; however, their contour optimization is not good enough. There has been ongoing interest in using active contour models as loss functions in deep-learning solutions for better contour optimization. The region-based active contour Chan-Vese model [35] has been successful for training on images with two regions, each having a different mean pixel intensity. The LMS loss [36] is obtained by combining the advantages of the Mumford-Shah functional with the AC loss, with some adjustments. To meet the requirements of boundary optimization while addressing the class-imbalance problem, I propose a new Focal Active Contour loss function.
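For orientation, the region-based Tversky and Focal Tversky losses mentioned above can be sketched as follows. This is a hedged illustration of the commonly used binary form, not the loss proposed in this work. The names `alpha`, `beta`, `gamma`, and `eps` are assumptions: which of `alpha` and `beta` weights false positives versus false negatives, and whether the focal exponent is written as gamma or 1/gamma, varies between papers.

```python
import numpy as np


def tversky_index(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky index for binary masks or probabilities in [0, 1].
    Here alpha weights false positives and beta weights false negatives
    (an assumed convention)."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1.0 - target))
    fn = np.sum((1.0 - pred) * target)
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def focal_tversky_loss(pred, target, alpha=0.5, beta=0.5, gamma=1.0):
    """(1 - Tversky index) raised to a focal exponent gamma (assumed form)."""
    return (1.0 - tversky_index(pred, target, alpha, beta)) ** gamma


target = np.array([1, 1, 0, 0])
missed = np.array([1, 0, 0, 0])      # one false negative, no false positive
loss_fn_heavy = focal_tversky_loss(missed, target, alpha=0.3, beta=0.7)
loss_fp_heavy = focal_tversky_loss(missed, target, alpha=0.7, beta=0.3)
```

With `alpha = beta = 0.5` the index reduces to the Dice coefficient; raising `beta` penalizes missed foreground pixels more heavily, which is the lever used against class imbalance.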
This study yields several noteworthy contributions:
• Innovative Loss Function: I introduce a novel loss function tailored for the training process of deep-learning models. By incorporating elements of active contour methodology into the loss functions, I aim to tackle a persistent challenge encountered in medical imaging and computer vision: the problem of intensity inhomogeneity within image data. This amalgamation of techniques offers a promising avenue to address this issue effectively. It not only helps deep-learning models achieve more accurate and robust segmentation results but also paves the way for more precise and reliable image analysis across various applications, ultimately advancing the capabilities of AI-driven solutions in the field.
• End-to-End CNN Model Development: Inspired by PiDiNet, I propose a new architecture that reshapes this network from an FCN shape into a U-Net shape, using CDCM modules (without the subsequent CSAM) combined with an Attention module and a Depthwise-and-Pointwise module.
• Thorough Evaluation and Comparison: A comprehensive evaluation of both my proposed model and the introduced loss function is conducted across 2D and 3D datasets. These evaluations are benchmarked against existing state-of-the-art methods. Notably, my approach consistently demonstrates promising outcomes when compared to baseline algorithms. This observation is substantiated across diverse datasets including the Lesion Boundary Segmentation ISIC-2018 dataset, the dermoscopic PH2 dataset, the 2017 MICCAI sub-challenge on automatic cardiac diagnosis benchmark, and the 6-month infant brain MRI Segmentation (iSeg) benchmark.
THEORETICAL BASIS
Artificial Neural Networks
Deep learning is an important machine learning technique. It teaches a computer to filter inputs through layers in order to predict and categorize data. Observations may take the form of images, text, or sound. Deep learning is inspired by the way the human brain filters information, and its aim is to imitate how the brain works. There are about 100 billion neurons in the human brain, and a single neuron interacts with approximately 100,000 of its peers. That is what I am attempting to reproduce, albeit in a computational form. An artificial neuron (or node) receives one or more signals (input values), processes them, and transmits an output signal. This information is encoded as numbers and bits of binary data that a computer can process.
What about synapses? Every connection between neurons is assigned a weight, and these weights are central to Artificial Neural Networks (ANNs): they are how ANNs learn. By changing the weights, the ANN determines to what degree signals get passed along, and the weights are adjusted while the network is trained.
Several decades ago, McCulloch and Pitts proposed a very simple model of the biological neuron [37], later called an artificial neuron, which has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active. They demonstrated in their paper that even with such a simplistic model, a network of artificial neurons can be built to compute any logical proposition.
The Perceptron, one of the simplest ANN architectures, was created by Frank Rosenblatt [38]. It is based on a slightly different artificial neuron (Figure 2.1) called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU). The inputs and outputs are now numbers (rather than binary on/off values), and each input connection has a weight assigned to it. The TLU computes a weighted sum of its inputs, $z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = x^T w$, then applies a step function to that sum and outputs the result: $h_w(x) = \mathrm{step}(z)$, where $z = x^T w$.
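A TLU can be sketched in a few lines. In the example below, the Heaviside step with threshold at zero, the bias term `b`, and the particular weights (chosen so that the unit behaves as a logical AND gate) are assumptions made for illustration.

```python
import numpy as np


def step(z):
    """Heaviside step function: 1 where z >= 0, otherwise 0."""
    return (np.asarray(z) >= 0).astype(int)


def tlu(x, w, b=0.0):
    """Threshold logic unit: weighted sum of the inputs followed by a step.
    The bias b plays the role of the weight on a constant input x_0 = 1."""
    z = np.dot(x, w) + b
    return step(z)


# Hypothetical weights that make the TLU compute a logical AND.
w = np.array([1.0, 1.0])
b = -1.5
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = tlu(inputs, w, b)
print(outputs)  # [0 0 0 1]
```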
A Perceptron comprises a single layer of threshold logic units (TLUs), each connected to all the inputs. A layer in which every neuron is connected to every neuron in the preceding layer is called a fully connected layer or a dense layer. The Perceptron's inputs are fed to input neurons, which are pass-through units that simply output whatever input they receive. Together, these input neurons form the input layer. An additional bias feature ($x_0 = 1$) is commonly added, typically represented by a special neuron known as a bias neuron, which always outputs 1. Figure 2.2 shows a Perceptron with two inputs and three outputs. In this case, the Perceptron acts as a multi-output classifier, simultaneously classifying instances into three different binary classes.

Figure 2.1 Threshold logic unit: an artificial neuron applies a step function after calculating the weighted sum of its inputs [39]

Figure 2.2 Perceptron architecture with two input neurons, one bias neuron, and three output neurons [39]

Perceptrons are trained using rules that take into account the network's error when it makes predictions. The Perceptron learning rule reinforces the connections that help reduce the error [40]. In greater detail, the Perceptron is fed one training instance at a time and makes a prediction for each instance. For every output neuron that produces a wrong prediction, the connection weights from the inputs that would have contributed to the correct prediction are reinforced. This rule is given by Equation 2.1:

$$w_{i,j}^{(\text{next step})} = w_{i,j} + \eta\,(y_j - \hat{y}_j)\,x_i \qquad (2.1)$$

where:
• $w_{i,j}$ is the weight linking the $i$-th input neuron and the $j$-th output neuron.
• $x_i$ is the $i$-th input value of the current training instance.
• $y_j$ is the target output of the $j$-th output neuron for the current training instance.
• $\hat{y}_j$ is the output of the $j$-th output neuron for the current training instance.
• $\eta$ denotes the learning rate during training (typically adjusted as needed).
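The learning rule can be sketched as a short training loop that applies the update $\eta\,(y_j - \hat{y}_j)\,x_i$ after every training instance. The function names, the zero initialization, the learning rate of 1, and the toy task (two output neurons that learn logical AND and OR) are assumptions for illustration only.

```python
import numpy as np


def perceptron_train(X, Y, eta=1.0, epochs=20):
    """Train a single-layer Perceptron with the rule
    w_ij <- w_ij + eta * (y_j - y_hat_j) * x_i, one instance at a time."""
    n_samples, n_inputs = X.shape
    Xb = np.hstack([np.ones((n_samples, 1)), X])   # prepend bias input x_0 = 1
    W = np.zeros((n_inputs + 1, Y.shape[1]))       # one column per output neuron
    for _ in range(epochs):
        for x, y in zip(Xb, Y):
            y_hat = (x @ W >= 0).astype(float)     # step activation
            W += eta * np.outer(x, y - y_hat)      # no change where y_hat == y
    return W


def perceptron_predict(X, W):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (Xb @ W >= 0).astype(int)


# Toy data: the two output columns are logical AND and logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0, 0], [0, 1], [0, 1], [1, 1]], dtype=float)
W = perceptron_train(X, Y)
predictions = perceptron_predict(X, W)
```

Both targets are linearly separable, so the rule converges on this data; it would not converge for a target such as XOR, which a single layer cannot represent.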
Because the decision boundary of each output neuron is linear, Perceptrons are inherently unable to capture intricate patterns. However, stacking multiple Perceptrons mitigates these limitations. The resulting structure is known as a Multilayer Perceptron (MLP), as illustrated in Figure 2.3. The architecture comprises an input layer of pass-through neurons, one or more hidden layers of TLUs, and a final output layer of TLUs. Every layer except the output layer includes a bias neuron, and each layer is fully connected to the next. A deep neural network (DNN) is an ANN with a large number of hidden layers.

Figure 2.3 Multilayer Perceptron architecture with two inputs, one hidden layer of four neurons, and three output neurons [39]
Deep neural network
Deep Learning revolves around the exploration of deep neural networks (DNNs), which frequently consist of intricate sequences of computations. Representing the output of the hidden layers as $h^{(l)}(z)$, the computation for a neural network with $L$ hidden layers is written as:

$$f(a) = f\left[z^{(L+1)}\left(h^{(L)}\left(z^{(L)}\left(\cdots h^{(1)}\left(z^{(1)}(a)\right)\right)\right)\right)\right]$$
Each pre-activation function $z^{(l)}(a)$ is a linear operation governed by the weight matrix $W^{(l)}$ and bias $b^{(l)}$:

$$z_j^{(l)} = w_j^{(l)T} a^{(l-1)} + b_j^{(l)}$$
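The nested composition above amounts to a simple loop over layers. The following sketch assumes a sigmoid as the hidden activation $h$, leaves the output function $f$ as the identity, and uses randomly drawn weights with layer sizes 2-4-3; all of these choices, and the function names, are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(a, weights, biases, h=sigmoid):
    """Feedforward pass. Row j of weights[l] is w_j, so W @ a computes
    w_j^T a for every neuron j of the layer at once.
    Returns the final pre-activation z^(L+1) and all layer outputs."""
    activations = [a]
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ activations[-1] + b        # pre-activation of a hidden layer
        activations.append(h(z))           # hidden-layer output h(z)
    z_out = weights[-1] @ activations[-1] + biases[-1]
    return z_out, activations


rng = np.random.default_rng(0)
sizes = [2, 4, 3]                          # inputs, hidden neurons, outputs
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
z_out, activations = forward(np.array([0.5, -1.0]), weights, biases)
```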
The activation functions of the hidden layers, denoted $h^{(l)}(z)$, are typically the same across layers. However, distinct activation functions are sometimes employed to serve specific purposes. The feedforward step from the $(l-1)$-th layer to the $l$-th layer is illustrated in Figure 2.4.

Figure 2.4 Hidden layers in Deep Neural Network [41]

For years, researchers struggled to train Multilayer Perceptrons (MLPs) effectively. In 1986, however, David Rumelhart and his co-authors introduced a groundbreaking approach [42]: the backpropagation training algorithm, which remains a cornerstone of neural network training. In essence, it is Gradient Descent [43] combined with an efficient technique for computing gradients automatically. The backpropagation algorithm computes the gradient of the network's error with respect to every model parameter in just two passes through the network, one forward and one backward. It thereby determines how each connection weight and bias term should be adjusted to reduce the error. It then performs a regular Gradient Descent step using these gradients, and repeats the whole process until the network converges to a solution.
Key aspects of the backpropagation algorithm include:
• Mini-Batch Processing and Epochs: The algorithm handles one mini-batch at a time (typically containing a power-of-two number of instances for computational efficiency) and goes through the entire training set multiple times. Each complete pass is termed an epoch. This iterative process gradually reduces the loss.
• Forward Pass: Each mini-batch is passed from the input layer to the first hidden layer. The algorithm then computes the outputs of all neurons in this layer for every instance in the mini-batch. The result is passed on to the next layer, and so on, layer by layer, until the output layer is reached. This forward pass is exactly like making predictions, except that all intermediate results are kept because they are needed for the backward pass.
• Error Calculation: After the forward pass, the algorithm measures the network's output error.
• Output Contribution Evaluation: The algorithm computes how much each output connection contributed to the error. This is done analytically by applying the chain rule, which makes the step fast and precise.
• Backward Error Propagation: Again using the chain rule, the algorithm measures how much of these error contributions came from each connection in the layer below. This backward pass continues until the input layer is reached. In this way, it measures the error gradient across all the connection weights in the network.
• Gradient Descent Phase: Finally, the algorithm performs a Gradient Descent step that adjusts all the connection weights in the network using the error gradients just computed.
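The steps listed above can be sketched for a network with one hidden layer. This is a minimal illustration under stated assumptions, not a reference implementation: the sigmoid activations, the halved mean squared error, the toy XOR mini-batch, the learning rate, and all function names are choices made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(params, X, y):
    """One forward and one backward pass for a one-hidden-layer MLP."""
    W1, b1, W2, b2 = params
    n = X.shape[0]
    # Forward pass: intermediate results h and y_hat are kept for later.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Error calculation: halved mean squared error over the mini-batch.
    loss = 0.5 * np.sum((y_hat - y) ** 2) / n
    # Output contribution: chain rule through the output sigmoid.
    d_out = (y_hat - y) * y_hat * (1.0 - y_hat) / n
    grad_W2 = h.T @ d_out
    grad_b2 = d_out.sum(axis=0)
    # Backward error propagation to the hidden layer.
    d_hid = (d_out @ W2.T) * h * (1.0 - h)
    grad_W1 = X.T @ d_hid
    grad_b1 = d_hid.sum(axis=0)
    return loss, [grad_W1, grad_b1, grad_W2, grad_b2]


def gradient_descent_step(params, grads, eta):
    return [p - eta * g for p, g in zip(params, grads)]


rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])   # toy mini-batch
y = np.array([[0.0], [1.0], [1.0], [0.0]])                        # XOR targets
params = [rng.normal(size=(2, 3)), np.zeros(3),
          rng.normal(size=(3, 1)), np.zeros(1)]
loss_before, grads = loss_and_grads(params, X, y)
params_after = gradient_descent_step(params, grads, eta=0.1)
loss_after, _ = loss_and_grads(params_after, X, y)
```

A useful sanity check for any hand-written backward pass is to compare one analytic gradient entry against a finite-difference estimate of the loss.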
The backpropagation algorithm is important enough to summarize again: for each training step, it first makes a prediction (forward pass) and measures the error, then goes back through each layer to measure the error contribution of each connection (reverse pass), and finally adjusts the connection weights to reduce the error (Gradient Descent step). For this algorithm to work properly, a key change was made to the MLP's architecture: the step function was replaced with the logistic (sigmoid) function [44], $\sigma(z) = \dfrac{1}{1 + e^{-z}}$. The logistic function has a well-defined nonzero derivative everywhere, which allows Gradient Descent to make some progress at every step. In contrast, the step function consists only of flat segments, so there is no gradient to work with.
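The derivative of the logistic function has the closed form $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which is positive everywhere, peaks at 0.25 for $z = 0$, and becomes very small for inputs of large magnitude. The short sketch below evaluates it at a few sample points chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    """Derivative of the logistic function: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])   # sample inputs
grads = sigmoid_grad(z)
# The gradient is nonzero at every sample point, largest at z = 0,
# and several orders of magnitude smaller at z = -10 and z = 10.
```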
However, a challenge arises: as the algorithm progresses down to the lower layers, the gradients often become smaller and smaller because they are repeatedly multiplied by values less than 1. As a result, the Gradient Descent update leaves the connection weights of the lower layers virtually unchanged, and training never converges to a good solution. This is known as the vanishing gradients problem. Conversely, the gradients can grow larger and larger, so that layers receive excessively large weight updates and the algorithm diverges. This is termed the exploding gradients problem. The study in [45] attributed this behavior to the combination of the logistic activation function and the weight initialization procedure. It showed that, with this combination, the variance of each layer's outputs is much greater than the variance of its inputs. Going forward through the network, the variance keeps increasing after each layer until the activation function saturates in the upper layers. This saturation is made worse by the fact that the logistic function has a mean of 0.5 rather than 0.
Looking at the logistic activation function (Figure 2.5), the function saturates at 0 or 1 when its inputs become large in magnitude (negative or positive), and its derivative then approaches zero. Consequently, there is almost no gradient to propagate back when backpropagation starts, and what little gradient exists keeps being diluted as it travels down through the upper layers, leaving almost nothing for the lower layers. Glorot and Bengio [45] therefore proposed a way to significantly alleviate the unstable gradients problem, which underlies the Glorot and He initialization schemes.

Figure 2.5 Logistic activation function saturation [39]
Signals must flow properly in both directions through a neural network: forward when making predictions and backward when computing gradients. The authors argue that, for the signal to flow properly, the variance of the outputs of each layer should equal the variance of its inputs. Likewise, the gradients should have equal variance before and after flowing through a layer in the reverse direction. Both conditions cannot be guaranteed unless the layer has an equal number of inputs and neurons (referred to as the $\mathrm{fan}_{\mathrm{in}}$ and $\mathrm{fan}_{\mathrm{out}}$ of the layer).
However, Glorot and Bengio proposed a compromise that has proven effective in practice: the connection weights of each layer are initialized with random values as described by Equations (2.4) and (2.5), which involve a normal distribution and a uniform distribution, with $\mathrm{fan}_{\mathrm{avg}} = (\mathrm{fan}_{\mathrm{in}} + \mathrm{fan}_{\mathrm{out}})/2$. This strategy is referred to as Xavier initialization or Glorot initialization [45]. Its significance has been recognized for over a decade. Using Glorot initialization speeds up training considerably, and it is one of the techniques that contributed to the success of Deep Learning.
Similar techniques for different activation functions have been presented in other papers [46]. These approaches share a common framework with variations in the scale of the variance:

$$\sigma^2 = \frac{2}{\mathrm{fan}_{\mathrm{in}}}$$

In the case of the uniform distribution, the value of $r$ is computed as $r = \sqrt{3\sigma^2}$. In particular, the initialization technique tailored for the Rectified Linear Unit (ReLU) activation function, which is discussed in the subsequent subsection, is sometimes referred to as He initialization.
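Both initialization schemes can be sketched with a single helper. The code assumes the commonly stated variances, $1/\mathrm{fan}_{\mathrm{avg}}$ for Glorot and $2/\mathrm{fan}_{\mathrm{in}}$ for He, together with $r = \sqrt{3\sigma^2}$ for the uniform case; the function name, its parameters, and the layer sizes are hypothetical.

```python
import numpy as np


def init_weights(fan_in, fan_out, scheme="glorot", distribution="normal", rng=None):
    """Draw a (fan_in, fan_out) weight matrix with the target variance.
    Assumed variances: Glorot 1 / fan_avg, He 2 / fan_in."""
    rng = np.random.default_rng() if rng is None else rng
    if scheme == "glorot":
        variance = 1.0 / ((fan_in + fan_out) / 2.0)
    elif scheme == "he":
        variance = 2.0 / fan_in
    else:
        raise ValueError("scheme must be 'glorot' or 'he'")
    if distribution == "normal":
        return rng.normal(0.0, np.sqrt(variance), size=(fan_in, fan_out))
    r = np.sqrt(3.0 * variance)            # uniform on [-r, r] has variance r^2 / 3
    return rng.uniform(-r, r, size=(fan_in, fan_out))


rng = np.random.default_rng(0)
W_glorot = init_weights(400, 600, scheme="glorot", distribution="normal", rng=rng)
W_he = init_weights(400, 600, scheme="he", distribution="uniform", rng=rng)
```

The empirical variance of each matrix should be close to its target (0.002 for the Glorot example and 0.005 for the He example), which is easy to confirm with `W.var()`.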
The backpropagation algorithm works well not only with the logistic function but also with many other activation functions. Several common options are presented below.
To address the vanishing gradient problem [47] associated with sigmoid activation, the Linear Unit [48] or Rectified Linear Unit (ReLU) was introduced. The ReLU activation function is illustrated in Figure 2.6. Unlike the sigmoid function, ReLU does not saturate for positive inputs, which mitigates vanishing gradients. Specifically, its derivative is
0 for x