Notes on neural networks and deep learning


Introduction to Neural Networks and Deep Learning
Introduction to the Convolutional Network
Andres Mendez-Vazquez
March 28, 2021

Outline
Introduction: The Long Path; The Problem of Image Processing; Multilayer Neural Network Classification; Drawbacks; Possible Solution.
Convolutional Networks: History; Local Connectivity; Sharing Parameters.
Layers: Convolutional Layer; Convolutional Architectures; A Little Bit of Notation; Deconvolution Layer; Alternating Minimization; Non-Linearity Layer; Fixing the Problem, the ReLU Function; Back to the Non-Linearity Layer; Rectification Layer; Local Contrast Normalization Layer; Sub-sampling and Pooling; Strides; Normalization Layer (AKA Batch Normalization); Finally, the Fully Connected Layer.
An Example of CNN: The Proposed Architecture; Backpropagation; Deriving w_{r,s,k}; Deriving the Kernel Filters.

The Long Path [1]
[Timeline figure: "A Small History of a Revolution -- Complex Architectures and the Attention Revolution." It traces CNN milestones from the Neocognitron (1979) and early ConvNet attempts (1989), through LeNet (1998), the early-2000s CNN stagnation, max pooling (2006), GPU computing and NVIDIA programming (2006-2007), and ImageNet (2010); the 2012-2013 breakthroughs with AlexNet, 3D CNNs, ZFNet and feature visualization; 2014 spatial exploitation via small effective receptive fields (VGG) and parallel Inception blocks (GoogLeNet, Inception-V2/V3/V4, Inception-ResNet, bottleneck and factorization); the 2015 depth revolution with skip connections (Highway Net, ResNet); 2016 multi-path connectivity (ResNeXt, FractalNet, DenseNet); 2017 width exploitation and the beginning of attention (Wide ResNet, PolyNet, PyramidalNet, SE-Net, CMPE-SE); and, from 2018 onward, feature-map exploitation, channel boosting, attention modules such as CBAM and Residual Attention, and Transformer-CNN hybrids.]

Digital Images as pixels in a digitized matrix [2]
[Figure: the imaging pipeline, from an illumination source and scene to the digitized output image.]

Further [2]
Pixel values typically represent gray levels, colors, heights, opacities, etc.
Something notable: digitization implies that a digital image is only an approximation of the real scene.

Images
Common image formats include:
one sample per pixel (B&W or grayscale);
three samples per pixel (Red, Green, and Blue);
four samples per pixel (Red, Green, Blue, and "Alpha").

Therefore, we have the following process
Low-level processing maps an image to an image: for example, noise removal and sharpening.
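The slides name noise removal and sharpening as typical low-level operations. As an illustration only, here is a minimal sketch that applies a 3x3 box-blur and a 3x3 sharpening kernel to a grayscale image; the kernels, the random image, and the function name are assumptions of mine, not taken from the slides.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation over the 'valid' region, for illustration."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical grayscale image with values in [0, 1].
img = np.random.rand(64, 64)

# Noise removal: 3x3 box blur (averaging kernel).
blur_kernel = np.full((3, 3), 1.0 / 9.0)
denoised = conv2d_valid(img, blur_kernel)

# Sharpening: identity plus a Laplacian-like high-pass kernel.
sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=float)
sharpened = conv2d_valid(img, sharpen_kernel)
```

Both operations are just small convolutions, which is exactly the kind of local, weight-shared computation a convolutional layer learns instead of hand-designing.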
Backpropagation: Deriving w_{r,s,k} and the Kernel Filters

Now we need to derive \partial z_1^{(5)} / \partial w_{r,s,k}^{(5)}.
We know that
z_1^{(5)} = \sum_{k} \sum_{r=1}^{m_2} \sum_{s=1}^{m_3} w_{r,s,k}^{(5)} \left[ Y_k^{(4)} \right]_{r,s}.
Finally,
\frac{\partial z_1^{(5)}}{\partial w_{r,s,k}^{(5)}} = \left[ Y_k^{(4)} \right]_{r,s}.

Maxpooling
The pooling operator is not differentiated as such; we go directly for the max term. Assume you get the max element of each pooled region of the feature map \left[ Y_f^{(3)} \right], where
\left[ Y_f^{(3)} \right]_{x,y} = \left[ B_f^{(3)} \right]_{x,y} + \sum_{k=-h_1^{(l)}}^{h_1^{(l)}} \sum_{t=-h_2^{(l)}}^{h_2^{(l)}} \left[ K_f^{(3)} \right]_{k,t} \left[ Y_1^{(2)} \right]_{x-k,\, y-t}.
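Since the max term is taken directly rather than differentiating the pooling operator, backpropagation through max pooling simply routes each upstream gradient entry to the position that won the max. A minimal sketch of that idea, assuming 2x2 non-overlapping windows and NumPy arrays (function names and shapes are illustrative, not from the slides):

```python
import numpy as np

def maxpool_forward(Y, size=2):
    """Non-overlapping max pooling; also record the argmax positions."""
    H, W = Y.shape
    out = np.zeros((H // size, W // size))
    argmax = {}  # (i, j) of pooled output -> (x, y) of the winning input element
    for i in range(H // size):
        for j in range(W // size):
            block = Y[i * size:(i + 1) * size, j * size:(j + 1) * size]
            r, c = np.unravel_index(np.argmax(block), block.shape)
            argmax[(i, j)] = (i * size + r, j * size + c)
            out[i, j] = block[r, c]
    return out, argmax

def maxpool_backward(dOut, argmax, input_shape):
    """Route each upstream gradient entry to the max position; zeros elsewhere."""
    dY = np.zeros(input_shape)
    for (i, j), (x, y) in argmax.items():
        dY[x, y] += dOut[i, j]
    return dY

# Usage on a toy feature map Y^{(3)}.
Y3 = np.random.randn(8, 8)
Y4, argmax = maxpool_forward(Y3)      # Y^{(4)} = maxpool(Y^{(3)})
dY4 = np.random.randn(*Y4.shape)      # pretend upstream gradient dL/dY^{(4)}
dY3 = maxpool_backward(dY4, argmax, Y3.shape)
```

Recording the argmax positions during the forward pass is what makes the backward routing cheap; every position outside the winners receives zero gradient.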
Therefore
We have
\frac{\partial L}{\partial \left[ K_f^{(3)} \right]_{k,t}} = \frac{\partial \left( y_1^{(6)} - t_1 \right)^2}{\partial \left[ K_f^{(3)} \right]_{k,t}}
and, given \left[ Y_f^{(4)} \right]_{x,y} = f\left( \left[ Y_f^{(3)} \right]_{x,y} \right), the following chain of derivations:
\frac{\partial L}{\partial \left[ K_f^{(3)} \right]_{k,t}} = \left( y_1^{(6)} - t_1 \right) \times \frac{\partial f\left( z_1^{(5)} \right)}{\partial z_1^{(5)}} \times \frac{\partial z_1^{(5)}}{\partial \left[ Y_f^{(4)} \right]_{x,y}} \times \frac{\partial f\left( \left[ Y_f^{(3)} \right]_{x,y} \right)}{\partial \left[ K_f^{(3)} \right]_{k,t}}.

Therefore
We have
\frac{\partial z_1^{(5)}}{\partial \left[ Y_f^{(3)} \right]_{x,y}} = w_{x,y,f}^{(5)},
assuming that
\left[ Y_f^{(3)} \right]_{x,y} = \left[ B_f^{(3)} \right]_{x,y} + \sum_{k=-h_1^{(l)}}^{h_1^{(l)}} \sum_{t=-h_2^{(l)}}^{h_2^{(l)}} \left[ K_f^{(3)} \right]_{k,t} \left[ Y_1^{(2)} \right]_{x-k,\, y-t}.

Therefore
We have
\frac{\partial f\left( \left[ Y_f^{(3)} \right]_{x,y} \right)}{\partial \left[ K_f^{(3)} \right]_{k,t}} = \frac{\partial f\left( \left[ Y_f^{(3)} \right]_{x,y} \right)}{\partial \left[ Y_f^{(3)} \right]_{x,y}} \times \frac{\partial \left[ Y_f^{(3)} \right]_{x,y}}{\partial \left[ K_f^{(3)} \right]_{k,t}},
where
\frac{\partial f\left( \left[ Y_f^{(3)} \right]_{x,y} \right)}{\partial \left[ Y_f^{(3)} \right]_{x,y}} = f'\left( \left[ Y_f^{(3)} \right]_{x,y} \right).

Finally, we have
The equation
\frac{\partial \left[ Y_f^{(3)} \right]_{x,y}}{\partial \left[ K_f^{(3)} \right]_{k,t}} = \left[ Y_1^{(2)} \right]_{x-k,\, y-t}.

The Other Equations
I will leave you to derive them; they follow the same repetitive procedure. The interesting case is average pooling; the others are the stride and the deconvolution.
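Putting the pieces of the chain together, here is a minimal numerical sketch of the kernel gradient with a finite-difference check. It is an assumption-laden toy: a single input map, one 3x3 kernel, a generic non-linearity f = tanh, a single fully connected output, squared error, and no pooling layer; the factor of 2 from the squared error is kept and the per-position contributions are summed over x, y. All names and shapes are illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, h = 6, 6, 1                             # map size, kernel half-width (3x3 kernel)
Y2 = rng.standard_normal((H, W))              # Y^{(2)}: input feature map
K  = rng.standard_normal((2*h + 1, 2*h + 1))  # K^{(3)}: kernel, indices k, t in [-h, h]
B  = rng.standard_normal((H, W))              # B^{(3)}: bias map
w  = rng.standard_normal((H, W))              # w^{(5)}: fully connected weights
t1 = 0.5                                      # target
f  = np.tanh                                  # generic non-linearity f and its derivative
df = lambda u: 1.0 - np.tanh(u) ** 2

def Y2_at(x, y):
    """Y^{(2)}_{x-k, y-t} with zero padding outside the image."""
    return Y2[x, y] if 0 <= x < H and 0 <= y < W else 0.0

def forward(K):
    # Y^{(3)}_{x,y} = B_{x,y} + sum_{k,t} K_{k,t} Y^{(2)}_{x-k, y-t}
    Y3 = np.array([[B[x, y] + sum(K[k + h, t + h] * Y2_at(x - k, y - t)
                                  for k in range(-h, h + 1)
                                  for t in range(-h, h + 1))
                    for y in range(W)] for x in range(H)])
    Y4 = f(Y3)                 # non-linearity layer
    z5 = np.sum(w * Y4)        # fully connected layer
    y6 = f(z5)                 # output
    L  = (y6 - t1) ** 2        # squared error
    return Y3, z5, y6, L

# Analytic gradient from the chain rule derived in the slides.
Y3, z5, y6, L = forward(K)
dK = np.zeros_like(K)
for k in range(-h, h + 1):
    for t in range(-h, h + 1):
        dK[k + h, t + h] = sum(2 * (y6 - t1) * df(z5) * w[x, y] * df(Y3[x, y])
                               * Y2_at(x - k, y - t)
                               for x in range(H) for y in range(W))

# Finite-difference check of one kernel entry.
eps, (k0, t0) = 1e-6, (0, 0)
Kp = K.copy()
Kp[k0 + h, t0 + h] += eps
numerical = (forward(Kp)[-1] - L) / eps
print(dK[k0 + h, t0 + h], numerical)   # the two values should agree closely
```

The printed analytic and numerical values should agree to several decimal places; this kind of finite-difference check is a quick way to validate a hand-derived backpropagation formula.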
References

[1] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, "A survey of the recent architectures of deep convolutional neural networks," Artificial Intelligence Review, vol. 53, no. 8, pp. 5455-5516, 2020.
[2] R. Szeliski, Computer Vision: Algorithms and Applications, 1st ed. Berlin, Heidelberg: Springer-Verlag, 2010.
[3] S. Haykin, Neural Networks and Learning Machines. Prentice Hall, 2009.
[4] D. H. Hubel and T. N. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," The Journal of Physiology, vol. 160, no. 1, p. 106, 1962.
[5] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[6] W. Zhang, K. Itoh, J. Tanida, and Y. Ichioka, "Parallel distributed processing model with local space-invariant interconnections and its optical architecture," Applied Optics, vol. 29, pp. 4790-4797, Nov. 1990.
[7] J. J. Weng, N. Ahuja, and T. S. Huang, "Learning recognition and segmentation of 3-D objects from 2-D images," in 1993 (4th) International Conference on Computer Vision, pp. 121-128, IEEE, 1993.
[8] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, "Deconvolutional networks," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528-2535, IEEE, 2010.
[9] D. Krishnan and R. Fergus, "Fast image deconvolution using hyper-Laplacian priors," Advances in Neural Information Processing Systems, vol. 22, pp. 1033-1041, 2009.
[10] Y. Wang, J. Yang, W. Yin, and Y. Zhang, "A new alternating minimization algorithm for total variation image reconstruction," SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248-272, 2008.
[11] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[12] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," 2015.
[13] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455-500, 2009.
[14] C. Kong and S. Lucey, "Take it in your stride: Do we need striding in CNNs?," arXiv preprint arXiv:1712.02502, 2017.
[15] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[16] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, "How does batch normalization help optimization?," in Advances in Neural Information Processing Systems, pp. 2483-2493, 2018.
