MACHINE LEARNING PROGRAMMING COURSE MATERIAL


Artificial intelligence (AI) is the field concerned with building intelligent machines and systems by using computational models, techniques, and related technologies to perform tasks that require human intelligence. Overall, it is a very broad discipline, drawing on psychology, computer science, and engineering. Common examples of AI include self-driving cars, automatic translation software, virtual assistants on phones, and computer-controlled opponents in mobile games.


AI Foundations and Applications

Lecture 8: Optimization of the learning process

Thien Huynh-The

HCMC University of Technology and Education, January 2023


Challenges in Deep Learning

Local minima

• The objective function of deep learning usually has many local minima

• The numerical solution obtained by the final iteration may only minimize the objective function locally, rather than globally, as the gradient of the objective function approaches or becomes zero near such a solution.
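As an illustration (a minimal sketch added here; the toy objective f(x) = x·cos(πx), the step size, and the starting points are assumptions, not from the slides), plain gradient descent stops wherever the gradient reaches zero, so different initializations can end in different local minima:

    import numpy as np

    def f(x):                 # toy objective with several local minima
        return x * np.cos(np.pi * x)

    def grad_f(x):            # analytic derivative of f
        return np.cos(np.pi * x) - np.pi * x * np.sin(np.pi * x)

    def gradient_descent(x0, lr=0.01, steps=2000):
        x = x0
        for _ in range(steps):
            x -= lr * grad_f(x)       # stops moving once grad_f(x) is ~ 0
        return x

    # Different initializations stall at different stationary points.
    for x0 in (-0.5, 0.3, 1.2):
        x_end = gradient_descent(x0)
        print(f"start {x0:+.1f} -> x = {x_end:+.3f}, f(x) = {f(x_end):+.3f}")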


Challenges in Deep Learning

Vanishing gradient

• As more layers using certain activation functions are added to neural networks, the gradients of the loss function approach zero, making the network hard to train.

• The simplest solution is to use other activation functions, such as ReLU, which does not produce a small derivative.

• Residual networks are another solution, as they provide residual connections straight to earlier layers.

Exploding gradient

• On the contrary, in some cases the gradients keep getting larger and larger as the backpropagation algorithm progresses. This, in turn, causes very large weight updates and makes gradient descent diverge. This is known as the exploding gradient problem.
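A minimal numeric sketch (added for illustration; the input value, depths, and clipping threshold are assumptions) of why stacking sigmoid layers shrinks gradients while ReLU does not, together with gradient-norm clipping, a common remedy for exploding gradients:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):              # derivative of sigmoid, at most 0.25
        s = sigmoid(x)
        return s * (1.0 - s)

    # Backpropagation multiplies one local derivative per layer:
    # with sigmoid the product vanishes, with ReLU (slope 1) it does not.
    x = 0.5
    for depth in (5, 20, 50):
        print(depth, "layers | sigmoid:", sigmoid_grad(x) ** depth, "| relu:", 1.0 ** depth)

    def clip_by_norm(grad, max_norm=5.0):     # rescale overly large gradients
        norm = np.linalg.norm(grad)
        return grad * (max_norm / norm) if norm > max_norm else grad

    print(clip_by_norm(np.array([30.0, 40.0])))   # clipped down to norm 5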


Challenges in Deep Learning

Which activation function is better at preventing vanishing gradients in a neural network with many activation layers: sigmoid or ReLU?


Challenges in Deep Learning

How to choose the right Activation Function?

A few guidelines to help you choose:

• The ReLU activation function should be used only in the hidden layers.

• Sigmoid/Logistic and Tanh functions should not be used in hidden layers, as they make the model more susceptible to problems during training (due to vanishing gradients).

• The Swish function is used in neural networks with a depth greater than 40 layers.

Choose the activation function for your output layer based on the type of prediction problem that you are solving:

• Regression - Linear Activation Function

• Binary Classification - Sigmoid/Logistic Activation Function

• Multiclass Classification - Softmax

• Multilabel Classification - Sigmoid

The activation function used in hidden layers is typically chosen based on the type of neural network architecture.

• Convolutional Neural Network (CNN): ReLU activation function.

• Recurrent Neural Network: Tanh and/or Sigmoid activation function.
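A small NumPy sketch (the helper names and the dictionary are hypothetical, added only to illustrate the guideline above) mapping each prediction problem to the recommended output-layer activation:

    import numpy as np

    def linear(z):                    # regression
        return z

    def sigmoid(z):                   # binary and multilabel classification
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):                   # multiclass classification
        e = np.exp(z - z.max(axis=-1, keepdims=True))   # numerically stable
        return e / e.sum(axis=-1, keepdims=True)

    OUTPUT_ACTIVATION = {
        "regression": linear,
        "binary_classification": sigmoid,
        "multiclass_classification": softmax,
        "multilabel_classification": sigmoid,
    }

    logits = np.array([2.0, 0.5, -1.0])
    print(OUTPUT_ACTIVATION["multiclass_classification"](logits))   # sums to 1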


Challenges in Deep Learning

Overfitting and underfitting

• Overfitting is a modeling error in statistics that occurs when a function is too closely aligned to a limited set of data points. As a result, the model is useful in reference only to its initial data set, and not to any other data sets.

• The model fits the training data well, but it does not show good performance on the testing data.

• Underfitting is a scenario in data science where a data model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data.
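A minimal sketch (added for illustration; the sine data, noise level, and polynomial degrees are assumptions) showing the typical signatures: an underfit model has high error on both sets, while an overfit model has low training error but clearly higher test error:

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.sort(rng.uniform(0, 1, 20))
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
    x_test = np.sort(rng.uniform(0, 1, 200))
    y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 200)

    def mse(y, y_hat):
        return np.mean((y - y_hat) ** 2)

    # degree 1 tends to underfit, degree 15 tends to overfit, degree 3 fits well
    for degree in (1, 3, 15):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = mse(y_train, np.polyval(coeffs, x_train))
        test_err = mse(y_test, np.polyval(coeffs, x_test))
        print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")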


Optimization schemes - Momentum

• The method of momentum is designed to accelerate learning

• The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in their direction

[Figures: gradient descent trajectories with 2 variables, one with learning rate 0.4 and another example with learning rate 0.6]


Optimization schemes - Momentum

• Instead of using only the gradient of the current step to guide the search, momentum also accumulates the gradient of the past steps to determine the direction to go

• The equations of gradient descent are revised as follows.
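In the standard formulation (a sketch; the momentum coefficient gamma, typically 0.9, and the learning rate lr are assumed here), a velocity term v accumulates the gradients and the parameters move along it, v <- gamma*v + lr*grad and theta <- theta - v:

    import numpy as np

    def momentum_step(theta, grad, velocity, lr=0.01, gamma=0.9):
        # v <- gamma * v + lr * grad ; theta <- theta - v
        velocity = gamma * velocity + lr * grad
        theta = theta - velocity
        return theta, velocity

    theta = np.array([1.0, -2.0])
    velocity = np.zeros_like(theta)
    for _ in range(3):
        grad = 2 * theta                       # gradient of f(theta) = ||theta||^2
        theta, velocity = momentum_step(theta, grad, velocity)
    print(theta, velocity)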


Adaptive Gradient Descent (Adagrad)

• Decay the learning rate for parameters in proportion to their update history

• Adapts the learning rate to the parameters, performing smaller updates (low learning rates) for parameters associated with frequently occurring features, and larger updates (high learning rates) for parameters associated with infrequent features

• It is well-suited for dealing with sparse data

• Adagrad greatly improved the robustness of SGD and has been used for training large-scale neural nets
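A minimal sketch of the Adagrad update (parameter names and defaults are assumed here): squared gradients are accumulated per parameter, so frequently updated parameters receive smaller effective learning rates:

    import numpy as np

    def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
        accum = accum + grad ** 2                          # per-parameter history
        theta = theta - lr * grad / (np.sqrt(accum) + eps)
        return theta, accum

    theta = np.array([1.0, -2.0])
    accum = np.zeros_like(theta)
    for _ in range(3):
        theta, accum = adagrad_step(theta, 2 * theta, accum)
    print(theta)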


Root Mean Squared Propagation (RMSProp)

• Adapts the learning rate to the parameters

• Divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight
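A minimal sketch of the RMSProp update (the decay rate rho = 0.9 and the other defaults are assumptions): instead of the full history, it keeps an exponentially decaying average of squared gradients for each weight:

    import numpy as np

    def rmsprop_step(theta, grad, avg_sq, lr=0.01, rho=0.9, eps=1e-8):
        avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2    # running mean of squares
        theta = theta - lr * grad / (np.sqrt(avg_sq) + eps)
        return theta, avg_sq

    theta = np.array([1.0, -2.0])
    avg_sq = np.zeros_like(theta)
    for _ in range(3):
        theta, avg_sq = rmsprop_step(theta, 2 * theta, avg_sq)
    print(theta)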


Adaptive Moment Estimation (Adam)

• Adam combines two stochastic gradient descent approaches: Adaptive Gradients (Adagrad) and Root Mean Square Propagation (RMSProp)

• Adam also keeps an exponentially decaying average of past gradients similar to SGD with momentum
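A minimal sketch of the Adam update (the defaults lr = 0.001, beta1 = 0.9, beta2 = 0.999 are the commonly used values, assumed here): it combines a momentum-style average of gradients with an RMSProp-style average of squared gradients, plus bias correction for the zero-initialized moments:

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1.0 - beta1) * grad               # momentum-style average
        v = beta2 * v + (1.0 - beta2) * grad ** 2          # RMSProp-style average
        m_hat = m / (1.0 - beta1 ** t)                     # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    theta = np.array([1.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, 4):
        theta, m, v = adam_step(theta, 2 * theta, m, v, t)
    print(theta)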


Dropout

• Helps to avoid the overfitting problem

• Probabilistically dropping out nodes in the network is a simple and effective regularization method

• Dropout is implemented per-layer in a neural network

• A common value is a probability of 0.5 for retaining the output of each node in a hidden layer
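A minimal sketch of inverted dropout for one hidden layer (the keep probability and the example values are assumptions): each node's output is zeroed with probability 1 - keep_prob during training and the survivors are rescaled, so nothing has to change at test time:

    import numpy as np

    def dropout(activations, keep_prob=0.5, training=True, rng=np.random.default_rng()):
        if not training:
            return activations                    # no dropout at test time
        mask = rng.random(activations.shape) < keep_prob
        return activations * mask / keep_prob     # rescale the surviving nodes

    hidden = np.array([0.2, 1.5, -0.7, 0.9])
    print(dropout(hidden))                        # roughly half the entries zeroed
    print(dropout(hidden, training=False))        # unchanged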



Assignment 2 (mandatory)

Design a multilayer neural network with an input layer, two hidden layers (sigmoid), and an output layer (softmax). Apply the optimization methods Momentum and Adam. Compare the accuracy and convergence time of the two methods. Assume that the MNIST dataset is used for training and testing the neural network. Important: the use of built-in functions is prohibited.

Students submit the Python code on Google Class.
