Prediction of risk and return by using machine learning


Ho Chi Minh City, July 2023

GRADUATION THESIS MAJOR: ELECTRONICS AND COMMUNICATION ENGINEERING

PREDICTION OF RISK AND RETURN

BY USING MACHINE LEARNING

INSTRUCTOR: PHẠM NGỌC SƠN, PhD STUDENT: LÂM MINH NHẬT


FACULTY FOR HIGH QUALITY TRAINING

DEPARTMENT OF COMPUTER AND COMMUNICATIONS

GRADUATION PROJECT

Student ID: 19161037


Ho Chi Minh City, July 5th, 2023

GRADUATION PROJECT ASSIGNMENT

Student name: LÂM MINH NHẬT
Student ID: 19161037
Major: ELECTRONICS AND COMMUNICATION ENGINEERING
Class: 19161CLA
Advisor: PHẠM NGỌC SƠN, PhD
Phone number:
Date of assignment: June 22nd, 2023
Date of submission: June 23rd, 2023

1. Project title: RISK AND RETURN BY USING MACHINE LEARNING
2. Initial materials provided by the advisor: _
3. Content of the project: _
4. Final product:

CHAIR OF THE PROGRAM

(Sign with full name)

ADVISOR


PREAMBLE

During the implementation of this graduation project, I received a great deal of help, suggestions and guidance from teachers and friends.

For the completion of the project “RISK AND RETURN BY USING MACHINE LEARNING”, I sincerely thank Phạm Ngọc Sơn, PhD, lecturer of the Department of Computer Engineering - Telecommunications, Faculty of Electrical - Electronics, Ho Chi Minh City University of Technology and Education, for the dedicated guidance, facilitation and support that made it possible to complete this project successfully.

I would also like to thank the authors of the reference sources, who gave me more knowledge and more options while carrying out this topic.

Although I have tried to complete this topic as thoroughly as possible, errors in the research and in the practical approach, as well as limitations in knowledge and implementation time, cannot be avoided. I look forward to receiving your comments so that the topic can be supplemented and corrected to make it more complete.


ABBREVIATIONS

LSTMs    Long Short-Term Memory networks
RNN      Recurrent Neural Network
RMSE     Root-mean-square error
S&P500   Standard & Poor's 500 Index
CAGR     Compound Annual Growth Rate

MATHEMATICAL SYMBOLS

tanh            Hyperbolic tangent, the ratio of sinh to cosh
$x_t$           Data input
$h_t$           Hidden state (output) at time step t
$C_t$           Cell state
$f_t$           First layer state (forget gate)
$i_t$           Second layer state (input gate)
$\tilde{C}_t$   Third layer state (candidate values)
$o_t$           Fourth layer state (output gate)

Table of Contents

Chapter 1: Overview of project
1.1 Introduction
1.2 Objectives of the project
1.3 Limitation of the project
1.4 Project's layout
Chapter 2: Theoretical basis
2.1 Python programming language
2.1.1 NumPy
2.1.2 Pandas
2.1.3 Matplotlib
2.1.4 Keras
2.2 Machine learning
2.3 LSTM model
2.4 The core idea of LSTM
2.5 LSTM's operation
2.6 Investment portfolio
2.7 Sharpe's ratio concept
2.8 Review another project
Chapter 3: Implementation process
3.1 Model system
3.2 Main flowchart and simulation parameters
Chapter 4: Result of simulation analysis and assessment
4.1 CHP (Central Hydropower Corp)
4.2 SAB (Saigon Beer Alcohol Beverage Corp)
4.3 FPT (FPT Corp)
4.4 Evaluation
Chapter 5: Conclusion and development
5.1 Conclusion
5.2 Development direction of the project
References

List of figures

Figure 1) The repeating module in a standard RNN contains a single layer
Figure 2) The repeating module in an LSTM contains four interacting layers
Figure 3) The cell state runs straight down the entire chain with only some minor linear interactions
Figure 4) They are combined by a sigmoid network layer and a multiplication
Figure 5) Decision of what information is going to throw away from the cell state
Figure 6) Decision of what new information is going to store in the cell state
Figure 7) Updating the old cell state
Figure 8) Decision of what is going to output
Figure 9) William Forsyth Sharpe
Figure 10) Summarize daily prices for Amazon and Facebook
Figure 11) Plot daily prices for Amazon and Facebook
Figure 12) Summarize daily values for the S&P 500
Figure 13) Plot daily values for the S&P 500
Figure 14) Visualize the daily return of Amazon and Facebook
Figure 15) Visualize the daily return of S&P 500
Figure 16) Calculating Excess Returns for Amazon and Facebook vs S&P 500
Figure 17) The Average Difference in Daily Returns Stocks vs S&P 500
Figure 18) Standard Deviation of the Return Difference
Figure 19) Comparison of the Sharpe ratio between Amazon and Facebook
Figure 20) Model system
Figure 21) Flowchart
Figure 22) Development chart of CHP
Figure 23) The predicted price compares with valid price
Figure 24) Comparison of Sharpe Ratio
Figure 25) Development chart of SAB
Figure 28) Development chart of FPT
Figure 31) CHP’s Open Price in 23-06-09
Figure 32) SAB’s Open Price in 23-06-09
Figure 33) FPT’s Open Price in 23-06-09

Chapter 1: Overview of project

1.1 Introduction

The application of technology can help investors analyze stock market indices and indicators in order to make accurate and timely investment decisions. Machine learning is the "door" through which investors can learn how its methods and achievements are applied to stocks, a field that changes every day. By focusing on technical applications in quantitative finance, this project aims to give investors a realistic, intuitive and easy-to-understand view of the stock market.

1.2 Objectives of the project

Based on the Sharpe ratio formula, this project uses a machine learning model to simulate predicted stock values in the Google Colab environment. From the values predicted by the model, we draw conclusions and analyze the parameters. We then compare the “predicted Sharpe ratio”, computed from the model's output, with the “valid Sharpe ratio”, computed from data collected in reality. Finally, we give advice and observe whether the stocks evolve in line with the trend.

1.3 Limitation of the project

This project focuses on building machine learning models that simulate, analyze and predict the collected data through regression methods. In addition, the results above are only simulations, although they can be applied to actual models.

Chapter 4: Simulation analysis and evaluation. Provides simulation scenarios, flowcharts and simulation results, and comments on the results obtained.

Chapter 5: Conclusion and development direction. Provides conclusions and directions for developing the topic.

Chapter 2: Theoretical basis

2.1 Python programming language

Python is a high-level, general-purpose programming language developed by Guido van Rossum and first released in 1991. Python was designed to be simple to read, understand and remember.

Python, which is frequently used in the development of artificial intelligence, has a clean appearance and a clear structure, is convenient for beginners, and is simple to master. Its structure also lets users write code with few keystrokes.

Python consistently ranks as one of the most popular programming languages[2].

2.1.1 NumPy

NumPy is the fundamental Python package for scientific computing. It provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and a variety of routines for fast operations on arrays. These operations include discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more.

The ndarray object is the core of the NumPy package. It holds n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for speed. NumPy arrays differ from standard Python sequences in several important ways:

_ Unlike Python lists (which can grow dynamically), NumPy arrays have a fixed size at creation. Changing the size of an ndarray creates a new array and deletes the original.

_ The elements of a NumPy array must all be of the same data type, and therefore occupy the same amount of memory. The exception: arrays of (Python, including NumPy) objects are possible, which allows for arrays with elements of different sizes.

_ NumPy arrays make it easier to perform advanced mathematical and other operations on large amounts of data. Most of the time, these operations can be executed more efficiently and with less code than with Python's built-in sequences.

_ A growing number of scientific and mathematical Python packages use NumPy arrays as well. In other words, merely being able to use Python's built-in sequence types is not enough to make effective use of most (perhaps even all) of the scientific and mathematical Python-based software available today.
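As a brief illustration of these properties, the following sketch shows the single element type, the fixed size, and the loop-free arithmetic of an ndarray. The price values are made up for the example and are not taken from the project's data.

```python
import numpy as np

# Illustrative opening prices (made-up values, not project data)
prices = np.array([100.0, 102.0, 101.0, 105.0])

# All elements share one data type
print(prices.dtype)          # float64

# "Growing" an array builds a new array; the original is unchanged
longer = np.append(prices, 107.0)
print(prices.size, longer.size)

# Element-wise arithmetic without an explicit Python loop:
# simple daily returns r_t = p_t / p_{t-1} - 1
returns = prices[1:] / prices[:-1] - 1
print(returns.round(4))
```

The slice arithmetic in the last step runs in compiled code, which is why such operations are usually faster than the equivalent loop over a Python list.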

2.1.2 Pandas

Pandas is an open-source library designed primarily for working quickly and intuitively with relational or labeled data. It offers a range of data structures and operations for working with numerical data and time series. It is built on top of the NumPy library. Pandas is fast and gives users high performance and productivity.

Advantages:

- Quick and effective data manipulation and analysis

- It is possible to load data from different file objects

- Simple handling of missing data in both floating point and non-floating point data (expressed as NaN)

- Size mutability: columns in DataFrame and higher-dimensional objects can be added and removed

- Merging and connecting data sets

- Flexible data set reshaping and pivoting

- Time-series functionality is provided

- Effective group by functionality for splitting, applying, and combining data sets
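A few of the advantages listed above can be sketched as follows. The ticker names `AAA` and `BBB` and all prices are hypothetical and serve only to illustrate missing-data handling, adding a column, and group-by.

```python
import numpy as np
import pandas as pd

# Made-up daily open prices for two hypothetical tickers, one value missing
dates = pd.date_range("2023-01-02", periods=5, freq="D")
df = pd.DataFrame(
    {"AAA": [10.0, 10.5, np.nan, 11.0, 11.2],
     "BBB": [20.0, 19.8, 20.1, 20.4, 20.3]},
    index=dates,
)

# Missing data: carry the last known price forward
filled = df.ffill()

# Size mutability: add a derived column
filled["spread"] = filled["BBB"] - filled["AAA"]

# Reshape to long format, then group by ticker and average
long = filled[["AAA", "BBB"]].melt(var_name="ticker", value_name="price")
mean_by_ticker = long.groupby("ticker")["price"].mean()
print(mean_by_ticker)
```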

2.1.3 Matplotlib

Matplotlib is a Python visualization library for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was first presented by John Hunter in 2002. One of the biggest advantages of visualization is that it gives us visual access to vast volumes of data in forms that are simple to understand. Matplotlib offers several plot types, including line, bar, scatter and histogram.
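A minimal sketch of two of these plot types is given below. The data are made up, the output file name `open_price.png` is arbitrary, and the `Agg` backend is chosen only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
open_price = [10.0, 10.5, 10.4, 11.0, 11.2]   # made-up values

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot of the price over time
ax_line.plot(days, open_price, marker="o")
ax_line.set_xlabel("Day")
ax_line.set_ylabel("Open price")
ax_line.set_title("Line plot")

# Histogram of the same values
ax_hist.hist(open_price, bins=3)
ax_hist.set_title("Histogram")

fig.tight_layout()
fig.savefig("open_price.png")
```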

2.1.4 Keras

Keras provides a host of tools that make working with image and text data easier, which simplifies the coding necessary to build deep neural networks. The source is hosted on GitHub, and community help is available through the issues page and a Slack channel. In addition to classical neural networks, Keras also supports convolutional and recurrent neural networks. Commonly used utility layers such as dropout, batch normalization and pooling are available as well.

2.2 Machine learning

Machine learning is the study of computer algorithms that can recognize patterns in data, adapt to change, and improve over time. In addition to being one of the original goals of computer science, machine learning has grown in importance as we expect computers to handle problems that are increasingly complex and woven into our daily lives.

The goal of machine learning theory, also known as computational learning theory, is to understand the fundamentals of learning as a computational process. This discipline aims to understand, at a precise mathematical level, what capabilities and information are fundamentally needed to learn different kinds of tasks successfully, and to understand the basic algorithmic principles involved in getting computers to learn from data and to improve their performance with feedback. The goals of this theory are both to understand the basic issues of the learning process itself and to assist in the design of better automated learning systems[3].

Building mathematical models that capture important aspects of machine learning makes it possible to analyze the inherent ease or difficulty of different learning problems. Machine learning theory also involves proving guarantees for algorithms (under what circumstances they will succeed, and how much data and computation time they require) and developing machine learning algorithms that provably work.


There are important links between machine learning theory and other fields of study. One of the main objectives of cryptography is to allow users to communicate while preventing eavesdroppers from obtaining any information about what is being said. In this context, machine learning can be seen as developing algorithms for the eavesdropper. In particular, provably secure cryptosystems can be converted into problems that cannot be learned, and hard learning problems can be converted into proposed cryptosystems. There are also clear technical links between important machine learning methods and techniques developed in cryptography. For instance, boosting, a machine learning technique designed to extract as much power as possible from a given learning algorithm, is closely related to techniques for amplifying cryptosystems developed in cryptography.

Machine learning theory is also connected to economics. The development of fast-adapting algorithms sheds light on how states in a system can be reached quickly even when each participant has a wide range of options. Conversely, economic issues arise in machine learning when a computer program not only adapts to its environment but also influences that environment and the behavior of the other individuals in it. As both communities work to create tools for modeling and supporting electronic commerce, the links between the two fields have grown stronger in recent years.

2.3 LSTM model

Long Short-Term Memory networks, commonly known as LSTMs, are a special type of RNN capable of learning long-range dependencies. The LSTM was introduced by Hochreiter & Schmidhuber (1997)[4] and has since been refined and popularized by many people in the field. LSTMs work extremely well on many different problems, which is why they have become so popular. The LSTM is designed to avoid the long-term dependency problem. Remembering information for a long time is its default behavior; it does not need to be trained to do so. That is, its internal structure can retain information without any intervention.

Every recurrent network takes the form of a chain of repeating neural network modules. In standard RNNs, these modules have a very simple structure, usually a single tanh layer.

Figure 1) The repeating module in a standard RNN contains a single layer

The LSTM has the same chain-like architecture, but its modules have a different structure from those of a standard RNN. Instead of a single neural network layer, they have four layers that interact with each other in a very special way.

Figure 2) The repeating module in an LSTM contains four interacting layers

2.4 The core idea of LSTM

The key to the LSTM is the cell state, the horizontal line running along the top of the diagram.

The cell state is a kind of conveyor belt. It runs through all the links (network nodes) with only minor linear interactions, so information can easily flow along it unchanged.

Figure 3) The cell state runs straight down the entire chain with only some minor linear interactions

The LSTM can remove information from or add information to the cell state, carefully regulated by structures called gates.

Gates filter the information passing through them. They are composed of a sigmoid network layer and an element-wise multiplication.

Figure 4) They are combined by a sigmoid network layer and a multiplication

The sigmoid layer outputs a number in the range [0,1] that describes how much information may pass. An output of 0 means no information is allowed through, and an output of 1 means all information passes through. An LSTM has three such gates to maintain and control the cell state.
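The filtering behaviour of a gate can be sketched in a few lines. The input values below are hypothetical and chosen only to show the two extremes and the midpoint; note that the sigmoid approaches 0 and 1 without reaching them exactly.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activations of a gate and the information it filters
gate_input = np.array([-10.0, 0.0, 10.0])
information = np.array([5.0, 5.0, 5.0])

gate = sigmoid(gate_input)       # close to 0, exactly 0.5, close to 1
passed = gate * information      # element-wise multiplication

print(gate.round(3))
print(passed.round(3))
```

The first entry is almost completely blocked, the second is halved, and the third passes almost unchanged.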

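2.5 LSTM's operation

The first step of the LSTM is to decide what information will be discarded from the cell state. This decision is made by a sigmoid layer, the forget gate, which looks at $h_{t-1}$ and $x_t$ and outputs a number in the range [0,1] for each number in the cell state $C_{t-1}$. A value of 1 indicates that all the information is kept, and a value of 0 indicates that it is discarded.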

Figure 5) Decision of what information is going to throw away from the cell state
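In the standard LSTM formulation, this step is usually written as shown below. The weight matrix $W_f$ and the bias $b_f$ are not part of the symbol list of this thesis and are introduced here only for illustration; $\sigma$ is the sigmoid function and $[h_{t-1}, x_t]$ is the concatenation of the previous output and the current input.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```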

The next step is to decide what new information will be stored in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values to update. Next, a tanh layer creates a vector of new candidate values, $\tilde{C}_t$, that could be added to the state. In the next step, these two are combined to create an update to the state.

Figure 6) Decision of what new information is going to store in the cell state.
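Under the same assumptions as the usual LSTM formulation, the two parts of this step can be written as follows. The weights $W_i$, $W_C$ and biases $b_i$, $b_C$ are illustrative names that do not appear in the thesis.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
\end{aligned}
```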

Now it is time to update the old cell state $C_{t-1}$ to the new cell state $C_t$. The previous steps decided what to do, so now we just need to do it.

We multiply the old state by $f_t$ to remove the information we decided to forget earlier. Then we add $i_t * \tilde{C}_t$. This newly obtained state depends on how much we decided to update each state value.
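Written as a single equation, with $*$ denoting element-wise multiplication, the update reads:

```latex
C_t = f_t * C_{t-1} + i_t * \tilde{C}_t
```

As a made-up numerical example for one component: with $f_t = 0.9$, $C_{t-1} = 2.0$, $i_t = 0.5$ and $\tilde{C}_t = -1.0$, the new state is $C_t = 0.9 \cdot 2.0 + 0.5 \cdot (-1.0) = 1.3$.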


Figure 7) Updating the old cell state

Finally, we need to decide what to output. The output is based on the cell state, but is a filtered version of it. First, we run a sigmoid layer that decides which parts of the cell state to output. We then pass the cell state through a tanh function, which squashes its values to the interval [−1,1], and multiply the result by the output of the sigmoid gate to get the output we want.

Figure 8) Decision of what is going to output
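The four steps described in this section can be combined into one time step, sketched below with NumPy. This is an illustration under stated assumptions, not the Keras implementation used in the project: the function name `lstm_step`, the stacked weight matrix `W`, the bias `b`, the layer sizes and the random values are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenated vector [h_prev, x_t] to the four layers
    (forget, input, candidate, output); b is the matching bias.
    """
    n = h_prev.size
    z = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(z[0:n])               # forget gate
    i_t = sigmoid(z[n:2 * n])           # input gate
    c_tilde = np.tanh(z[2 * n:3 * n])   # candidate values
    o_t = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f_t * c_prev + i_t * c_tilde  # update the cell state
    h_t = o_t * np.tanh(c_t)            # filtered output
    return h_t, c_t

# Hypothetical sizes and random weights, for illustration only
rng = np.random.default_rng(0)
n_hidden, n_input = 3, 2
W = rng.normal(size=(4 * n_hidden, n_hidden + n_input))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_input)):   # a sequence of 5 inputs
    h, c = lstm_step(x_t, h, c, W, b)

print(h, c)
```

Because the output is a sigmoid value multiplied by a tanh value, every component of `h` stays strictly between −1 and 1, whatever the weights are.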

2.6 Investment portfolio

An investment portfolio is a combination of different asset classes, such as stocks, bonds, real estate and other assets, held to optimize returns and minimize investment risk. Risk is an ever-present factor in business and investing. A successful investor is one who can balance return and risk and achieve their investment goals.

Portfolio diversification is one of the important measures in risk management when investing. The right combination of asset classes with different levels of return and risk, such as gold, real estate, foreign currencies, stocks and bonds, helps reduce overall risk compared with holding a single type of security or a single asset class in a portfolio.
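The effect of diversification on risk can be illustrated with the standard two-asset portfolio variance formula, σ_p² = w₁²σ₁² + w₂²σ₂² + 2·w₁·w₂·ρ·σ₁·σ₂. The sketch below is not part of the project; the volatilities, weights and correlations are made-up numbers, and the function name `portfolio_volatility` is hypothetical.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = ((w1 * sigma1) ** 2 + (w2 * sigma2) ** 2
                + 2 * w1 * w2 * rho * sigma1 * sigma2)
    return math.sqrt(variance)

# Made-up annual volatilities of two assets
sigma_a, sigma_b = 0.20, 0.30

for rho in (1.0, 0.3, -0.5):
    vol = portfolio_volatility(0.5, sigma_a, sigma_b, rho)
    print(f"correlation {rho:+.1f}: portfolio volatility {vol:.4f}")
```

With perfectly correlated assets the portfolio volatility is simply the weighted average of the two volatilities; the lower the correlation, the lower the combined risk.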

Investment professionals assist clients with portfolio management to achieve their financial goals through asset selection and allocation.

2.7 Sharpe’s ratio concept

The Sharpe ratio, the ratio of an investment's expected excess return to its return volatility (standard deviation), is one of the most commonly used statistics in financial research[5]. It is now used in many different contexts, from performance attribution to tests of market efficiency to risk management. It was originally motivated by mean-variance analysis and the Sharpe-Lintner Capital Asset Pricing Model. The picture below is a portrait of William Forsyth Sharpe.

Figure 9) William Forsyth Sharpe

Given how frequently the Sharpe ratio is used and the many interpretations it has acquired over the years, it is remarkable that so little attention has been paid to its statistical properties[6]. Expected returns and volatilities are quantities that generally cannot be observed, so they must be estimated in some way. Because estimation errors are unavoidable, the Sharpe ratio is itself subject to estimation error, which raises the obvious question: how accurately are Sharpe ratios measured?
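For reference, the ratio and its usual sample estimate can be written as follows, where $R$ is the return of the investment, $R_f$ the risk-free (or benchmark) return, $\sigma$ the standard deviation of the return, and $\hat{\mu}$, $\hat{\sigma}$ the sample mean and sample standard deviation of the excess returns over $T$ observations. The last line is the asymptotic standard error given in [5] under the simplifying assumption of independent and identically distributed returns; for other return processes the expression differs.

```latex
\begin{aligned}
SR &= \frac{E[R] - R_f}{\sigma} \\
\widehat{SR} &= \frac{\hat{\mu}}{\hat{\sigma}} \\
SE\left(\widehat{SR}\right) &\approx \sqrt{\frac{1 + \tfrac{1}{2}\widehat{SR}^{2}}{T}}
\end{aligned}
```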

The Sharpe ratio measures the return received per unit of risk when using a trading strategy or investing in an asset.

William F. Sharpe created the Sharpe ratio, which investors use to analyze the relationship between an investment's return and its risk. The ratio represents the average return earned in excess of the risk-free return per unit of risk.
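A minimal sketch of this calculation, using only the Python standard library, is given below. The return series and the risk-free rate are made-up values, the function name `sharpe_ratio` is hypothetical, and scaling by the square root of 252 trading days is a common annualization convention rather than something prescribed by the thesis.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Made-up daily returns of one stock
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.003, 0.004]

daily_sharpe = sharpe_ratio(daily_returns, risk_free=0.0001)
# Common convention: scale by sqrt(252) trading days to annualize
annual_sharpe = daily_sharpe * math.sqrt(252)
print(round(daily_sharpe, 3), round(annual_sharpe, 3))
```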


2.8 Review another project

Project: Risk and Return by using Sharpe’s Ratio:

To become familiar with the ratio, the reviewed project calculates the Sharpe ratio for the stocks of Amazon and Facebook. The S&P 500, which tracks the performance of the 500 largest US stocks, is used as the benchmark.

An investment may be a wise decision if we expect it to return more than it costs. However, returns are uncertain and there is a range of possible outcomes, so they tell only half of the story. How can investments be compared when they may produce similar results on average but carry different degrees of risk? First, we summarize the number of samples for the two securities and compute some descriptive statistics.

Figure 10) Summarize daily prices for Amazon and Facebook

From the raw data, we plot the figure below to show how the two stocks develop and to observe the oscillation of the open price.

Figure 11) Plot daily prices for Amazon and Facebook
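The workflow of the reviewed project (summarize prices, compute daily returns, subtract the benchmark return, and form the ratio) can be sketched with pandas as follows. All prices are made up, and the column names `stock_A` and `stock_B` and the series `benchmark` are placeholders standing in for the two stocks and the S&P 500; they are not the data shown in the figures.

```python
import numpy as np
import pandas as pd

# Made-up prices; a real analysis would load historical data instead
dates = pd.date_range("2023-01-02", periods=6, freq="B")
stocks = pd.DataFrame(
    {"stock_A": [100.0, 101.0, 103.0, 102.0, 104.0, 106.0],
     "stock_B": [50.0, 50.5, 50.2, 51.0, 51.5, 51.2]},
    index=dates,
)
benchmark = pd.Series([2000.0, 2010.0, 2005.0, 2020.0, 2030.0, 2025.0],
                      index=dates, name="benchmark")

print(stocks.describe())            # summary of the daily prices

# Daily returns, and excess returns over the benchmark
stock_returns = stocks.pct_change().dropna()
benchmark_returns = benchmark.pct_change().dropna()
excess = stock_returns.sub(benchmark_returns, axis=0)

# Sharpe ratio: mean excess return over its standard deviation,
# annualized with sqrt(252) trading days (a common convention)
sharpe = excess.mean() / excess.std() * np.sqrt(252)
print(sharpe)
```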

References

[1] Application of Computational Intelligence in Data-Driven Trading - Cris Doloc
[2] Learning Python - Mark Lutz
[3] Machine Learning Theory - Avrim Blum
[4] A Review on the Long Short-Term Memory Model - Greg Van Houdt, Carlos Mosquera, Gonzalo Napoles
[5] The Statistics of Sharpe Ratios - Andrew W. Lo
[6] The Sharpe Ratio: Statistics and Applications - Steven E. Pav
[7] Probability and Statistics - Le Thi Mai Trang
[8] Risk and Return in General: Theory and Evidence - Eric Falkenstein