Introduction to Neural Networks and Deep Learning
Introduction to the Convolutional Network
Andres Mendez-Vazquez
March 28, 2021

Outline
Introduction: The Long Path; The Problem of Image Processing; Multilayer Neural Network Classification; Drawbacks; Possible Solution
Convolutional Networks: History; Local Connectivity; Sharing Parameters
Layers: Convolutional Layer; Convolutional Architectures; A Little Bit of Notation; Deconvolution Layer; Alternating Minimization; Non-Linearity Layer; Fixing the Problem, the ReLU Function; Back to the Non-Linearity Layer; Rectification Layer; Local Contrast Normalization Layer; Sub-sampling and Pooling; Strides; Normalization Layer (AKA Batch Normalization); Finally, the Fully Connected Layer
An Example of CNN: The Proposed Architecture; Backpropagation; Deriving w_{r,s,k}; Deriving the Kernel Filters

The Long Path [1]
Timeline figure, "A Small History of a Revolution": from the Neocognitron (1979), the early attempts (1989) and LeNet (1998), through the CNN stagnation of the early 2000s and the enablers of the revolution (max pooling and GPUs in 2006, NVIDIA programming in 2007, ImageNet in 2010), to spatial and depth exploitation (AlexNet 2012; ZfNet feature visualization and 3D CNNs 2013; VGG with small-size filters and its effective receptive field, and GoogleNet with the Inception block, parallelism and the bottleneck, 2014), skip connections and the depth revolution (Highway Net and ResNet 2015; Inception-V2/V3 factorization), residual and multi-path connectivity (FractalNet, DenseNet, ResNeXt, WideResNeXt, Inception-V4, Inception-ResNet, PolyNet, 2016-2017), width and feature-map exploitation (PyramidalNet, SENet, CMPE-SE), and the beginning of attention (residual attention modules, CBAM, channel-boosted CNNs, Transformer-CNNs, 2017-2018 and beyond).
The Problem of Image Processing

Digital Images as pixels in a digitized matrix [2]
Figure: an illumination source, the scene, and the digitized output image.

Further [2]
Pixel values typically represent
Gray levels, colors, heights, opacities, etc.
Something Notable
Remember that digitization implies that a digital image is an approximation of a real scene.

Images
Common image formats include
One sample per pixel (B&W or grayscale)
Three samples per pixel (Red, Green, and Blue)
Four samples per pixel (Red, Green, Blue, and "Alpha")

Therefore, we have the following process
Low-Level Process: Image -> Noise Removal -> Sharpening

Deriving w_{r,s,k}
Now, we need to derive $\frac{\partial z_1^{(5)}}{\partial w_{r,s,k}^{(5)}}$.
We know that
$$z_1^{(5)} = \sum_{k=1}^{m_1} \sum_{r=1}^{m_2} \sum_{s=1}^{m_3} w_{r,s,k}^{(5)} \left[Y_k^{(4)}\right]_{r,s}$$
Finally
$$\frac{\partial z_1^{(5)}}{\partial w_{r,s,k}^{(5)}} = \left[Y_k^{(4)}\right]_{r,s}$$

Maxpooling
This is not derived after all; we go directly for the max term. Assume you get the max element for f = 1, 2, ... and j = ...
$$\left[Y_f^{(3)}\right]_{x,y} = \left[B_f^{(3)}\right]_{x,y} + \sum_{k=-h_1^{(l)}}^{h_1^{(l)}} \sum_{t=-h_2^{(l)}}^{h_2^{(l)}} \left[K_f^{(3)}\right]_{k,t} \left[Y_1^{(2)}\right]_{x-k,\,y-t}$$

Therefore
We have the following chain of derivations, given $\left[Y_f^{(4)}\right]_{x,y} = f\left(\left[Y_f^{(3)}\right]_{x,y}\right)$:
$$\frac{\partial L}{\partial \left[K_f^{(3)}\right]_{k,t}} = \left(y_1^{(6)} - t_1\right) \frac{\partial f\left(z_1^{(5)}\right)}{\partial z_1^{(5)}} \times \frac{\partial z_1^{(5)}}{\partial \left[Y_f^{(4)}\right]_{x,y}} \times \frac{\partial f\left(\left[Y_f^{(3)}\right]_{x,y}\right)}{\partial \left[K_f^{(3)}\right]_{k,t}}$$

Therefore
We have
$$\frac{\partial z_1^{(5)}}{\partial \left[Y_f^{(4)}\right]_{x,y}} = w_{x,y,f}^{(5)}$$
Then, assuming that
$$\left[Y_f^{(3)}\right]_{x,y} = \left[B_f^{(3)}\right]_{x,y} + \sum_{k=-h_1^{(l)}}^{h_1^{(l)}} \sum_{t=-h_2^{(l)}}^{h_2^{(l)}} \left[K_f^{(3)}\right]_{k,t} \left[Y_1^{(2)}\right]_{x-k,\,y-t}$$
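As a side note (not in the original slides), the two fully connected partial derivatives above, $\partial z_1^{(5)}/\partial w_{r,s,k}^{(5)} = [Y_k^{(4)}]_{r,s}$ and $\partial z_1^{(5)}/\partial [Y_f^{(4)}]_{x,y} = w_{x,y,f}^{(5)}$, can be checked numerically. The following minimal NumPy sketch uses illustrative shapes and names (m1, m2, m3, W, Y4 are assumptions, not values from the proposed architecture):

```python
import numpy as np

# Minimal check of the fully connected layer derivatives (illustrative shapes).
# z1 = sum_{k,r,s} w[r, s, k] * Y4[k, r, s]
rng = np.random.default_rng(0)
m1, m2, m3 = 2, 4, 4                      # assumed: number of feature maps and spatial size
W = rng.standard_normal((m2, m3, m1))     # w^{(5)}_{r,s,k}
Y4 = rng.standard_normal((m1, m2, m3))    # [Y_k^{(4)}]_{r,s}

def z1(W, Y4):
    # pre-activation of the single output unit
    return np.einsum('rsk,krs->', W, Y4)

# Analytic gradients from the slides:
dz_dW = np.transpose(Y4, (1, 2, 0))       # dz1/dw_{r,s,k} = [Y_k^{(4)}]_{r,s}
dz_dY = np.transpose(W, (2, 0, 1))        # dz1/d[Y_f^{(4)}]_{x,y} = w^{(5)}_{x,y,f}

# Finite-difference check of one entry of each gradient.
eps = 1e-6
Wp = W.copy(); Wp[1, 2, 0] += eps
Yp = Y4.copy(); Yp[0, 1, 2] += eps
print(np.isclose((z1(Wp, Y4) - z1(W, Y4)) / eps, dz_dW[1, 2, 0], atol=1e-4))
print(np.isclose((z1(W, Yp) - z1(W, Y4)) / eps, dz_dY[0, 1, 2], atol=1e-4))
```

Because $z_1^{(5)}$ is linear in both the weights and the inputs, the finite difference matches the analytic result up to floating-point error.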
Therefore
We have
$$\frac{\partial f\left(\left[Y_f^{(3)}\right]_{x,y}\right)}{\partial \left[K_f^{(3)}\right]_{k,t}} = \frac{\partial f\left(\left[Y_f^{(3)}\right]_{x,y}\right)}{\partial \left[Y_f^{(3)}\right]_{x,y}} \times \frac{\partial \left[Y_f^{(3)}\right]_{x,y}}{\partial \left[K_f^{(3)}\right]_{k,t}}$$
Then
$$\frac{\partial f\left(\left[Y_f^{(3)}\right]_{x,y}\right)}{\partial \left[Y_f^{(3)}\right]_{x,y}} = f'\left(\left[Y_f^{(3)}\right]_{x,y}\right)$$

Finally, we have
The equation
$$\frac{\partial \left[Y_f^{(3)}\right]_{x,y}}{\partial \left[K_f^{(3)}\right]_{k,t}} = \left[Y_1^{(2)}\right]_{x-k,\,y-t}$$

The Other Equations
I will leave you to derive them
They follow the same repetitive procedure.
The interesting case is the average pooling.
The others are the stride and the deconvolution.
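Not part of the original slides: a small NumPy sketch, under assumed sizes (h1, h2, a 7x7 input, and the names Y2, K3, B3 are illustrative), that checks the final kernel-gradient equation $\partial [Y_f^{(3)}]_{x,y} / \partial [K_f^{(3)}]_{k,t} = [Y_1^{(2)}]_{x-k,\,y-t}$ by finite differences at an interior pixel:

```python
import numpy as np

# Minimal check of the kernel-gradient equation (illustrative sizes and names).
# [Y^(3)]_{x,y} = [B^(3)]_{x,y} + sum_{k,t} [K^(3)]_{k,t} [Y^(2)]_{x-k, y-t}
rng = np.random.default_rng(1)
h1, h2 = 1, 1                              # assumed kernel half-widths
Y2 = rng.standard_normal((7, 7))           # previous feature map [Y_1^{(2)}]
K3 = rng.standard_normal((2 * h1 + 1, 2 * h2 + 1))
B3 = rng.standard_normal((7, 7))

def conv_at(K, x, y):
    # direct evaluation of [Y^(3)]_{x,y} at an interior point (x, y)
    total = B3[x, y]
    for k in range(-h1, h1 + 1):
        for t in range(-h2, h2 + 1):
            total += K[k + h1, t + h2] * Y2[x - k, y - t]
    return total

x, y, k, t = 3, 3, 1, -1                   # pick an interior output pixel and one kernel tap
eps = 1e-6
Kp = K3.copy(); Kp[k + h1, t + h2] += eps
numeric = (conv_at(Kp, x, y) - conv_at(K3, x, y)) / eps
analytic = Y2[x - k, y - t]                # the slide result: [Y^(2)]_{x-k, y-t}
print(np.isclose(numeric, analytic, atol=1e-4))
```

Since the output is linear in each kernel entry, perturbing a single tap changes the output exactly by the corresponding input pixel, which is what the equation states.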
References
[1] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, "A survey of the recent architectures of deep convolutional neural networks," Artificial Intelligence Review, vol. 53, no. 8, pp. 5455-5516, 2020.
[2] R. Szeliski, Computer Vision: Algorithms and Applications. Berlin, Heidelberg: Springer-Verlag, 1st ed., 2010.
[3] S. Haykin, Neural Networks and Learning Machines. No. v. 10 in Neural Networks and Learning Machines, Prentice Hall, 2009.
[4] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, no. 1, p. 106, 1962.
[5] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[6] W. Zhang, K. Itoh, J. Tanida, and Y. Ichioka, "Parallel distributed processing model with local space-invariant interconnections and its optical architecture," Appl. Opt., vol. 29, pp. 4790-4797, Nov. 1990.
[7] J. J. Weng, N. Ahuja, and T. S. Huang, "Learning recognition and segmentation of 3-d objects from 2-d images," in 1993 (4th) International Conference on Computer Vision, pp. 121-128, IEEE, 1993.
[8] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, "Deconvolutional networks," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528-2535, IEEE, 2010.
[9] D. Krishnan and R. Fergus, "Fast image deconvolution using hyper-Laplacian priors," Advances in Neural Information Processing Systems, vol. 22, pp. 1033-1041, 2009.
[10] Y. Wang, J. Yang, W. Yin, and Y. Zhang, "A new alternating minimization algorithm for total variation image reconstruction," SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248-272, 2008.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[12] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," 2015.
[13] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.
[14] C. Kong and S. Lucey, "Take it in your stride: Do we need striding in CNNs?," arXiv preprint arXiv:1712.02502, 2017.
[15] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[16] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, "How does batch normalization help optimization?," in Advances in Neural Information Processing Systems, pp. 2483-2493, 2018.