Artificial intelligence (AI) is the field of building intelligent machines and systems by means of computational models, techniques, and related technologies, enabling them to perform tasks that normally require human intelligence. Broadly speaking, it is a very wide discipline that draws on psychology, computer science, and engineering. Common examples of AI include self-driving cars, automatic translation software, virtual assistants on smartphones, and computer-controlled opponents in mobile games.
AI Foundations and Applications
8. Optimization of the learning process
Thien Huynh-The, HCM City University of Technology and Education
Jan 2023
Challenges in Deep Learning
Local minima
• The objective function of deep learning usually has many local minima
• The numerical solution obtained by the final iteration may only minimize the objective function locally, rather than globally, when the gradient of the objective function's solutions approaches or becomes zero
Challenges in Deep Learning
Vanishing gradient
• As more layers using certain activation functions are added to a neural network, the gradients of the loss function approach zero, making the network hard to train
• The simplest solution is to use other activation functions, such as ReLU, which does not produce a small derivative
• Residual networks are another solution, as they provide residual connections straight to earlier layers
Exploding gradient
• On the contrary, in some cases the gradients keep getting larger and larger as the backpropagation algorithm progresses. This, in turn, causes very large weight updates and makes gradient descent diverge. This is known as the exploding gradients problem (a small numerical illustration of the sigmoid vs. ReLU gradient behaviour follows below).
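The sigmoid derivative is at most 0.25, so the chain-rule product over many layers shrinks rapidly, while the ReLU derivative is exactly 1 for active units. A minimal NumPy sketch (illustration only, not part of the original slides; the layer depth and pre-activation values are assumed) makes this concrete:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth = 30
z = rng.normal(size=depth)                          # hypothetical pre-activation values, one per layer

sigmoid_factors = sigmoid(z) * (1 - sigmoid(z))     # each derivative factor is at most 0.25
relu_factors = (z > 0).astype(float)                # derivative is 1 for active units, 0 otherwise

# The backpropagated gradient is (roughly) scaled by the product of these factors.
print("product of sigmoid derivatives over 30 layers:",
      np.prod(sigmoid_factors))                     # essentially zero: vanishing gradient
print("product of ReLU derivatives (active units only):",
      np.prod(relu_factors[relu_factors > 0]))      # stays 1.0: no extra shrinking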
Challenges in Deep Learning
Which is better at preventing the vanishing gradient problem in a neural network with many activation layers: sigmoid or ReLU?
Challenges in Deep Learning
How to choose the right Activation Function?
A few guidelines to help you out:
• ReLU activation function should only be used in the hidden layers.
• Sigmoid/Logistic and Tanh functions should not be used in hidden layers as they make the model more susceptible to problems during training (due to vanishing gradients).
• Swish function is used in neural networks having a depth greater than 40 layers.
Choose the activation function for your output layer based on the type of prediction problem that you are solving (a small sketch of these output activations follows after the lists below):
• Regression: Linear activation function
• Binary classification: Sigmoid/Logistic activation function
• Multiclass classification: Softmax
• Multilabel classification: Sigmoid
The activation function used in hidden layers is typically chosen based on the type of neural network architecture.
• Convolutional Neural Network (CNN): ReLU activation function.
• Recurrent Neural Network: Tanh and/or Sigmoid activation function.
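As a quick illustration of the output-layer choices above, here is a small NumPy sketch (not from the slides; the function names and example logits are assumed):

import numpy as np

def linear(z):
    return z                                   # regression: raw, unbounded scores

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # binary / multilabel: independent probabilities in (0, 1)

def softmax(z):
    e = np.exp(z - np.max(z))                  # subtract the max for numerical stability
    return e / e.sum()                         # multiclass: probabilities that sum to 1

logits = np.array([2.0, -1.0, 0.5])            # hypothetical output-layer scores
print(softmax(logits))                         # multiclass prediction
print(sigmoid(logits))                         # multilabel prediction
print(linear(logits))                          # regression prediction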
Challenges in Deep Learning
Overfitting and underfitting
• Overfitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points. As a result, the model is useful in reference only to its initial data set, and not to any other data sets
• The model fits the training data well but does not perform well on the test data
• Underfitting is a scenario in data science where a data model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data
Optimization schemes - Momentum
• The method of momentum is designed to accelerate learning
• The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction
[Figure: gradient descent trajectories with 2 variables, comparing learning rates 0.4 and 0.6]
Optimization schemes - Momentum
• Instead of using only the gradient of the current step to guide the search, momentum also accumulates the gradients of past steps to determine the direction to go
• The equations of gradient descent are revised as follows
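The equations themselves did not survive the slide extraction; the standard momentum update (notation assumed here: learning rate $\eta$, momentum coefficient $\gamma$, loss $L$, parameters $\theta$) is commonly written as:

% Gradient descent with momentum (standard formulation; notation assumed)
\[
\begin{aligned}
v_t &= \gamma\, v_{t-1} + \eta\, \nabla_{\theta} L(\theta_{t-1}) \\
\theta_t &= \theta_{t-1} - v_t
\end{aligned}
\]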
Adaptive Gradient Descent (Adagrad)
• Decay the learning rate for parameters in proportion to their update history
• Adapts the learning rate to the parameters, performing smaller updates (low learning rates) for parameters associated with frequently occurring features, and larger updates (high learning rates) for parameters associated with infrequent features
• It is well-suited for dealing with sparse data
• Adagrad greatly improves the robustness of SGD and has been used for training large-scale neural networks
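A minimal NumPy sketch of a single Adagrad step (illustration only, not from the slides; the function and variable names are assumed):

import numpy as np

def adagrad_update(theta, grad, accum, lr=0.01, eps=1e-8):
    """One Adagrad step: the accumulated squared gradient shrinks the step size
    for parameters that have been updated often."""
    accum += grad ** 2                               # per-parameter sum of squared gradients
    theta -= lr * grad / (np.sqrt(accum) + eps)      # per-parameter adaptive learning rate
    return theta, accum

theta = np.zeros(3)                                  # hypothetical parameters
accum = np.zeros(3)
grad = np.array([0.5, 0.0, -2.0])                    # hypothetical gradient
theta, accum = adagrad_update(theta, grad, accum)
print(theta)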
Root Mean Squared Propagation (RMSProp)
• Adapts the learning rate to the parameters
• Divides the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight
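In symbols, a common way to write the RMSProp update (notation assumed: decay rate $\rho$, learning rate $\eta$, gradient $g_t$, small constant $\epsilon$) is:

% RMSProp update (standard formulation; notation assumed)
\[
\begin{aligned}
s_t &= \rho\, s_{t-1} + (1 - \rho)\, g_t^{2} \\
\theta_t &= \theta_{t-1} - \frac{\eta}{\sqrt{s_t} + \epsilon}\, g_t
\end{aligned}
\]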
Adaptive Moment Estimation (Adam)
• Adam combines two stochastic gradient descent approaches: Adaptive Gradients (Adagrad) and Root Mean Square Propagation (RMSProp)
• Adam also keeps an exponentially decaying average of past gradients, similar to SGD with momentum
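A minimal NumPy sketch of one Adam step (illustration only, not from the slides; the hyperparameter values shown are common defaults and all names are assumed). An update written out by hand like this is in the spirit of the assignment at the end of the deck, which forbids built-in functions:

import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum-style first moment plus RMSProp-style second moment,
    both with bias correction."""
    m = beta1 * m + (1 - beta1) * grad               # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2          # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                     # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(3)                                  # hypothetical parameters
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 4):                                # a few steps with a fixed hypothetical gradient
    grad = np.array([0.5, -1.0, 2.0])
    theta, m, v = adam_update(theta, grad, m, v, t)
print(theta)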
Dropout
• Helps avoid the overfitting problem
• Probabilistically dropping out nodes in the network is a simple and effective regularization method
• Dropout is implemented per-layer in a neural network
• A common value is a probability of 0.5 for retaining the output of each node in a hidden layer (see the short sketch below)
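A minimal NumPy sketch of inverted dropout applied to one hidden-layer activation (illustration only, not from the slides; the keep probability of 0.5 matches the value quoted above):

import numpy as np

def dropout(activations, keep_prob=0.5, training=True):
    """Randomly zero units during training and rescale the survivors so the
    expected activation is unchanged (inverted dropout)."""
    if not training:
        return activations                           # dropout is disabled at test time
    mask = (np.random.rand(*activations.shape) < keep_prob).astype(activations.dtype)
    return activations * mask / keep_prob

h = np.random.rand(4, 5)                             # hypothetical hidden-layer output
print(dropout(h, keep_prob=0.5))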
Assignment 2 (mandatory)
Design a multilayer neural network with an input layer, 02 hidden layers (sigmoid), and an output layer (softmax). Apply the optimization methods Momentum and Adam. Compare the accuracy and convergence time of the two methods. Assume that the MNIST dataset is used for training and testing the neural network. Important: the use of built-in functions is prohibited.
Students submit the Python code on Google Classroom.