Ho Chi Minh City, July 2023
GRADUATION THESIS MAJOR: ELECTRONICS AND COMMUNICATION ENGINEERING
PREDICTION OF RISK AND RETURN
BY USING MACHINE LEARNING
INSTRUCTOR: PHẠM NGỌC SƠN, PhD STUDENT: LÂM MINH NHẬT
S K L 0 1 1 1 9 7
FACULTY FOR HIGH QUALITY TRAINING
DEPARTMENT OF COMPUTER AND COMMUNICATIONS
GRADUATION PROJECT
Student ID: 19161037
Ho Chi Minh City, July 2023
Major: ELECTRONICS AND COMMUNICATION ENGINEERING
PREDICTION OF RISK AND RETURN BY USING
MACHINE LEARNING
Advisor: PHẠM NGỌC SƠN, PhD
Student’s name: LÂM MINH NHẬT
Ho Chi Minh City, July 5th, 2023
GRADUATION PROJECT ASSIGNMENT
Student name: LÂM MINH NHẬT Student ID: 19161037
Major: ELECTRONICS AND
COMMUNICATION ENGINEERING
Class: 19161CLA Advisor: PHẠM NGỌC SƠN, PhD Phone number:
Date of assignment: June 22nd, 2023 Date of submission: June 23rd, 2023
1 Project title: RISK AND RETURN BY USING MACHINE LEARNING
2 Initial materials provided by the advisor: _
3 Content of the project: _
4 Final product:
CHAIR OF THE PROGRAM
(Sign with full name)
ADVISOR
(Sign with full name)
Ho Chi Minh City, July , 2023
ADVISOR’S EVALUATION SHEET
Student name: Student ID:
Student name: Student ID:
Student name: Student ID:
Major:
Project title:
Advisor:
EVALUATION
1 Content of the project:
2 Strengths:
3 Weaknesses:
4 Approval for oral defense? (Approved or denied)
5 Overall evaluation: (Excellent, Good, Fair, Poor)
Ho Chi Minh City, (month day, year)
ADVISOR
(Sign with full name)
Ho Chi Minh City, July, 2023
PRE-DEFENSE EVALUATION SHEET
Student name: Student ID:
Student name: Student ID:
Student name: Student ID:
Major:
Project title:
Name of Reviewer:
EVALUATION
1 Content and workload of the project:
2 Strengths:
3 Weaknesses:
4 Approval for oral defense? (Approved or denied)
5 Reviewer questions for project evaluation
6 Mark:……….(in words: )
Ho Chi Minh City, (month day, year)
REVIEWER
(Sign with full name)
PREAMBLE
During the implementation of this graduation project, our team received a great deal of help, suggestions, and guidance from teachers and friends.
For the completion of the project “RISK AND RETURN BY USING MACHINE LEARNING”, we sincerely thank Phạm Ngọc Sơn, PhD, Lecturer of the Department of Computer Engineering - Telecommunications, Faculty of Electrical - Electronics, University of Technology and Education of Ho Chi Minh City, for the dedicated guidance and support that enabled the team to complete this project successfully.
We would also like to thank the authors of the reference sources, which gave the group more knowledge and more options while carrying out the topic.
Although the group has tried to make this work as complete as possible, errors in the research and in the practical approach cannot be avoided, given our limited knowledge and implementation time. We look forward to receiving your comments so that the group can supplement and correct the topic.
ABBREVIATIONS
LSTM Long Short-Term Memory network
RNN Recurrent Neural Network
RMSE Root-mean-square error
S&P500 Standard & Poor's 500 Index
CAGR Compound Annual Growth Rate
MATHEMATICAL SYMBOLS
tanh Hyperbolic tangent function, the ratio of sinh to cosh
C̃_t Third layer state (candidate values)
x_t Data input
h_t Output (hidden state) at step t
C_t Cell state
f_t First layer state
i_t Second layer state
o_t Fourth layer state
Table of Contents
CHAPTER 1: OVERVIEW OF PROJECT
1.1 Introduction
1.2 Objectives of the project
1.3 Limitation of the project
1.4 Project's layout
CHAPTER 2: THEORETICAL BASIS
2.1 Python programming language
2.1.1 NumPy
2.1.2 Pandas
2.1.3 Matplotlib
2.1.4 Keras
2.2 Machine learning
2.3 LSTM model
2.4 The core idea of LSTM
2.5 LSTM's operation
2.6 Investment portfolio
2.7 Sharpe's ratio concept
2.8 Review another project
CHAPTER 3: IMPLEMENTATION PROCESS
3.1 Model system
3.2 Main flowchart and simulation parameters
CHAPTER 4: RESULT OF SIMULATION ANALYSIS AND ASSESSMENT
4.1 CHP (Central Hydropower Corp)
4.2 SAB (Saigon Beer Alcohol Beverage Corp)
4.3 FPT (FPT Corp)
4.4 Evaluation
CHAPTER 5: CONCLUSION AND DEVELOPMENT
5.1 Conclusion
5.2 Development direction of the project
REFERENCES
List of figures
Figure 1) The repeating module in a standard RNN contains a single layer
Figure 2) The repeating module in an LSTM contains four interacting layers
Figure 3) The cell state runs straight down the entire chain with only some minor linear interactions
Figure 4) Gates are composed of a sigmoid neural network layer and a multiplication
Figure 5) Deciding what information to discard from the cell state
Figure 6) Deciding what new information to store in the cell state
Figure 7) Updating the old cell state
Figure 8) Deciding what to output
Figure 9) William Forsyth Sharpe
Figure 10) Summarize daily prices for Amazon and Facebook
Figure 11) Plot daily prices for Amazon and Facebook
Figure 12) Summarize daily values for the S&P 500
Figure 13) Plot daily values for the S&P 500
Figure 14) Visualize the daily return of Amazon and Facebook
Figure 15) Visualize the daily return of S&P 500
Figure 16) Calculating Excess Returns for Amazon and Facebook vs S&P 500
Figure 17) The Average Difference in Daily Returns Stocks vs S&P 500
Figure 18) Standard Deviation of the Return Difference
Figure 19) Comparison of the Sharpe ratio between Amazon and Facebook
Figure 20) Model system
Figure 21) Flowchart
Figure 22) Development chart of CHP
Figure 23) The predicted price compared with the valid price
Figure 24) Comparison of Sharpe Ratio
Figure 25) Development chart of SAB
Figure 26) The predicted price compared with the valid price
Figure 27) Comparison of Sharpe Ratio
Figure 28) Development chart of FPT
Figure 29) The predicted price compared with the valid price
Figure 30) Comparison of Sharpe Ratio
Figure 31) CHP’s Open Price in 23-06-09
Figure 32) SAB’s Open Price in 23-06-09
Figure 33) FPT’s Open Price in 23-06-09
Chapter 1: Overview of project
1.1 Introduction
The application of technology can help investors analyze stock market indices and indicators to make accurate and timely investment decisions. Machine learning is a "door" through which investors can learn how to apply the achievements of machine learning to stocks, a field that changes every day. Focusing on technical application in quantitative finance gives investors a realistic, intuitive, and easy-to-understand view of the stock market.
1.2 Objectives of the project
Based on the Sharpe ratio formula, this project uses a machine learning model to simulate predicted stock values in the Google Colab environment. From the values predicted by the model, we draw conclusions and analyze the parameters. We then compare the “Predicted Sharpe’s ratio”, computed from the model's outputs, with the “Valid Sharpe’s ratio”, computed from data collected in reality. Finally, we give advice and observe whether each stock's movement follows the trend.
1.3 Limitation of the project
This project focuses on building machine learning models that simulate, analyze, and predict from the collected data using regression methods. In addition, the results presented are simulations only, and can be applied to actual models.
Chapter 4: Simulation analysis and evaluation. Provides simulation scenarios, flowcharts, simulation results, and comments on the results obtained.
Chapter 5: Conclusion and development direction. Provides conclusions and directions for the development of the topic.
Chapter 2: Theoretical basis
2.1 Python programming language
Python is a high-level, general-purpose programming language developed by Guido van Rossum and first released in 1991. Python was designed to be simple to read, understand, and remember.
Python, which is frequently used in artificial intelligence development, has a clean appearance and a clear structure, is convenient for beginners, and is easy to learn. Its syntax also lets users write code with few keystrokes.
Python consistently ranks as one of the most popular programming languages[2].
2.1.1 NumPy
NumPy is the fundamental Python package for scientific computing. It provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and a variety of routines for fast operations on arrays. These operations include discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and much more.
At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for speed. There are several important differences between NumPy arrays and standard Python sequences:
_ Unlike Python lists (which can grow dynamically), NumPy arrays have a fixed size when they are created. Changing the size of an ndarray creates a new array and deletes the original.
_ The elements of a NumPy array must all be of the same data type, and therefore occupy the same amount of memory. The exception: arrays of (Python, including NumPy) objects are possible, which allows arrays with elements of different sizes.
_ NumPy arrays make it easier to perform complex mathematical and other operations on large amounts of data. Most of the time, these operations can be executed more efficiently and with less code than is possible with Python's built-in sequences. In other words, knowing how to use Python's built-in sequence types is not enough to use most (perhaps even all) of today's scientific and mathematical Python-based software effectively; one also needs to know how to use NumPy arrays.
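The differences listed above can be seen in a few lines. The following is a minimal sketch; the variable names and the sample prices are illustrative assumptions and are not taken from the project.

```python
import numpy as np

# Hypothetical closing prices, used only for illustration.
prices = np.array([10.0, 10.5, 10.2, 11.0])

# All elements share one data type, so each occupies the same number of bytes.
print(prices.dtype, prices.itemsize)

# "Growing" an array builds a new array; the original keeps its size.
longer = np.append(prices, 11.4)

# Vectorized arithmetic runs in compiled code, with no explicit Python loop.
daily_returns = prices[1:] / prices[:-1] - 1.0

# The same computation on a plain Python list needs an explicit loop.
price_list = prices.tolist()
loop_returns = [
    price_list[i + 1] / price_list[i] - 1.0 for i in range(len(price_list) - 1)
]
```

Both versions give the same returns; the array version is shorter and avoids the per-element interpreter overhead of the loop.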
2.1.2 Pandas
Pandas is an open-source library designed primarily for working quickly and intuitively with relational or labeled data. It offers a range of data structures and operations for working with numerical data and time series. It is built on top of the NumPy library. Pandas is fast and gives users high performance and productivity.
Advantages:
- Quick and effective data manipulation and analysis
- It is possible to load data from different file objects
- Simple handling of missing data in both floating point and non-floating point data (expressed as NaN)
- Size mutability: columns in DataFrame and higher-dimensional objects can be added and removed
- Merging and connecting data sets
- Flexible data set reshaping and pivoting
- Time-series functionality is provided
- Effective group-by functionality for splitting, applying, and combining data sets
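Several of the advantages above can be sketched on a small price table. This is an illustrative example only: the tickers "AAA" and "BBB", the dates, and the prices are made-up assumptions, not data from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices for two made-up tickers; one value is missing.
dates = pd.date_range("2023-01-02", periods=4, freq="D", name="date")
prices = pd.DataFrame(
    {"AAA": [10.0, np.nan, 10.4, 10.6], "BBB": [20.0, 20.2, 19.8, 20.5]},
    index=dates,
)

# Missing data: carry the last known price forward.
filled = prices.ffill()

# Size mutability: add a derived column, then remove it again.
filled["spread"] = filled["BBB"] - filled["AAA"]
spread = filled.pop("spread")

# Time-series functionality: simple daily returns.
returns = filled.pct_change().dropna()

# Reshaping and group-by: long format, then the mean return per ticker.
long_format = returns.reset_index().melt(
    id_vars="date", var_name="ticker", value_name="ret"
)
mean_return = long_format.groupby("ticker")["ret"].mean()
```

Forward filling is only one way to treat a missing price; whether it is appropriate depends on the data source.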
2.1.3 Matplotlib
Matplotlib is a Python visualization library for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. John Hunter first introduced it in 2002. One of the greatest benefits of visualization is that it gives us visual access to large volumes of data in forms that are easy to understand. Matplotlib offers several kinds of plots, including line, bar, scatter, and histogram plots.
2.1.4 Keras
Keras provides a host of tools that make working with image and text data easier, which simplifies the code needed to build deep neural networks. The source is hosted on GitHub, and community support is available through the GitHub issues page and a Slack channel. In addition to standard neural networks, Keras supports convolutional and recurrent neural networks. It also provides commonly used utility layers such as dropout, batch normalization, and pooling.
2.2 Machine learning
Machine learning is the study of computer algorithms that can recognize patterns in data, adapt to change, and improve over time. Besides being one of the original goals of computer science, machine learning has grown in importance as we expect computers to handle problems that are more complex and more woven into our daily lives.
The goal of machine learning theory, also known as computational learning theory, is to understand the fundamentals of learning as a computational process. The field aims to understand, at a precise mathematical level, the capabilities and information needed to learn different kinds of tasks successfully, and the algorithmic principles involved in getting computers to learn from data and improve their performance with feedback. Its objectives are to understand the basic issues in the learning process itself and to help in designing better automated learning systems[3].
By building mathematical models that capture key aspects of machine learning, one can analyze the inherent ease or difficulty of different learning problems. Machine learning theory also involves proving guarantees for algorithms (under what conditions they will succeed, and how much data and computation time they require) and developing machine learning algorithms that provably work.
Machine learning theory has important links to other fields of study. One of the main objectives of cryptography is to allow users to communicate while preventing eavesdroppers from learning anything about what is being said. In this context, machine learning can be seen as developing algorithms for the eavesdropper. In particular, provably secure cryptosystems can be converted into problems that are provably hard to learn, and hard learning problems can be converted into proposed cryptosystems. There are also close connections at the technical level between important machine learning techniques and techniques developed in cryptography. For instance, boosting, a machine learning technique for getting the most power out of a given learning algorithm, is closely related to techniques developed in cryptography for amplifying cryptosystems.
There are also links to economics. The development of quickly adapting algorithms sheds light on how states in a system can be reached quickly even when each participant has a wide range of options. Conversely, economic problems arise in machine learning when a computer program not only adapts to its environment but also influences that environment and the behavior of the other participants in it. As both communities work to create tools for modeling and supporting electronic commerce, the links between these two fields have become stronger in recent years.
2.3 LSTM model
Long Short-Term Memory networks, commonly known as LSTMs, are a special type of RNN capable of learning long-term dependencies. The LSTM was introduced by Hochreiter & Schmidhuber (1997)[4] and has since been refined and popularized by many people in the field. LSTMs work extremely well on a wide variety of problems, which is why they have become so popular. The LSTM is explicitly designed to avoid the long-term dependency problem. Remembering information for a long time is its default behavior, not something it has to be trained to do.
Every recurrent network takes the form of a chain of repeating neural network modules. In standard RNNs, these modules have a very simple structure, usually a single tanh layer.
Figure 1) The repeating module in a standard RNN contains a single layer
The LSTM has the same chained architecture, but its modules have a different structure from those of a standard RNN. Instead of a single neural network layer, each module has four layers that interact with each other in a very particular way.
Figure 2) The repeating module in an LSTM contains four interacting layers
2.4 The core idea of LSTM
The key to the LSTM is the cell state, the horizontal line that runs through the top of the diagram.
The cell state is a kind of conveyor belt. It runs through the entire chain of modules with only some minor linear interactions, so information can easily flow along it unchanged.
Figure 3) The cell state runs straight down the entire chain with only some minor linear interactions
The LSTM can remove information from the cell state or add information to it, and this is carefully regulated by structures called gates.
Gates filter the information that passes through them. Each gate is composed of a sigmoid neural network layer and a pointwise multiplication.
Figure 4) Gates are composed of a sigmoid neural network layer and a multiplication
The sigmoid layer outputs a number in the range [0,1] that describes how much information is let through. An output of 0 means that no information is allowed to pass, and an output of 1 means that all information passes. An LSTM has three such gates to maintain and control the cell state.
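The gating mechanism described above can be illustrated with a few numbers. This is a minimal sketch in NumPy; the scores and the signal values are made-up assumptions chosen so that the three typical cases (blocked, half passed, fully passed) are visible.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation scores for three components of a signal.
scores = np.array([-20.0, 0.0, 20.0])
signal = np.array([5.0, 5.0, 5.0])

gate = sigmoid(scores)   # close to 0, exactly 0.5, close to 1
passed = gate * signal   # pointwise multiplication applies the gate
```

In practice the sigmoid only approaches 0 and 1, so a gate almost closes or almost opens rather than switching completely.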
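2.5 LSTM's operation
The first step is to decide what information to discard from the cell state. This decision is made by a sigmoid layer, the forget gate, which outputs a number between 0 and 1 for each value in the previous cell state C_{t−1}. A value of 1 indicates that the information is kept in full, and a value of 0 indicates that it is discarded completely.
Figure 5) Deciding what information to discard from the cell state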
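In the usual formulation of the LSTM, the forget step is written as the equation below. The weight matrix W_f and the bias b_f are standard notation assumed here; they do not appear in the symbol list of this thesis. The bracket [h_{t-1}, x_t] denotes the previous output concatenated with the current input, and σ is the sigmoid function.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```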
The next step is to decide what new information to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values to update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state. In the next step, these two are combined to create an update to the state.
Figure 6) Deciding what new information to store in the cell state
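Under the same assumed notation (weights W_i, W_C and biases b_i, b_C, which are not defined in the thesis itself), the two parts of this step are commonly written as:

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)
\qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```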
Now it is time to update the old cell state C_{t−1} to the new cell state C_t. The previous steps already decided what to do, so now we just need to do it.
We multiply the old state by f_t to remove the information we decided to forget earlier. Then we add i_t ∗ C̃_t, the new candidate values scaled by how much we decided to update each state value.
Figure 7) Updating the old cell state
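Written as a single equation, with ⊙ denoting element-wise multiplication (the ∗ used in the text), the update is:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

For example, a component with f_t = 1 and i_t = 0 carries its old value forward unchanged, while a component with f_t = 0 and i_t = 1 is replaced entirely by its candidate value.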
Finally, we need to decide what to output. The output is based on the cell state, but it is a filtered version of it. First, we run a sigmoid layer that decides which parts of the cell state to output. Then we pass the cell state through a tanh function, which squashes the values into the interval [−1,1], and multiply the result by the output of the sigmoid gate, so that only the selected parts are output.
Figure 8) Deciding what to output
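The four steps of this section can be put together in one function. The following is a minimal NumPy sketch of a single forward step, assuming the standard LSTM formulation; the dictionary layout of the weights, the function name lstm_step, the random weights, and the sample inputs are all illustrative assumptions and not the model trained in this project.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell.

    W and b are dicts keyed by "f", "i", "c", "o" (a hypothetical layout).
    Each W[k] has shape (hidden, hidden + inputs); each b[k] has shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate (first layer)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate (second layer)
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate values (third layer)
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate (fourth layer)
    h_t = o_t * np.tanh(c_t)                 # filtered output
    return h_t, c_t

# Tiny run with random, untrained weights: 1 input feature, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = {k: rng.normal(scale=0.5, size=(n_hidden, n_hidden + n_in)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for value in [0.10, 0.12, 0.11]:             # hypothetical scaled prices
    h, c = lstm_step(np.array([value]), h, c, W, b)
```

A library such as Keras performs the same computation internally and also learns the weights; the sketch only shows how the gates combine in one step.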
2.6 Investment portfolio
An investment portfolio is a combination of different asset classes, such as stocks, bonds, real estate, and other assets, chosen to optimize returns and minimize investment risk. Risk is an ever-present factor in business and investing. A successful investor is one who can balance return and risk and achieve the intended investment goals.
Portfolio diversification is one of the important measures in risk management when investing. The right combination of asset classes with different levels of return and risk, such as gold, real estate, foreign currencies, stocks, and bonds, reduces overall risk compared with holding a single security or a single asset class in a portfolio.
Investment professionals assist clients with portfolio management so that they achieve their financial goals through asset selection and allocation.
2.7 Sharpe’s ratio concept
The Sharpe ratio, which is the ratio of an investment's excess expected return to its return volatility or standard deviation, is one of the most often used statistics in financial research[5]. The Sharpe ratio is now used in many different contexts, from performance attribution to tests of market efficiency to risk management. It was originally motivated by mean-variance analysis and the Sharpe-Lintner Capital Asset Pricing Model. The picture below is a portrait of William Forsyth Sharpe.
Figure 9) William Forsyth Sharpe
Given how frequently the Sharpe ratio is used and the many interpretations it has acquired over the years, it is remarkable that so little attention has been paid to its statistical properties[6]. Expected returns and volatilities are quantities that are generally not observable, so they must be estimated in some way. Because estimation errors are unavoidable, the Sharpe ratio is also subject to estimation error, which raises the obvious question: how precisely are Sharpe ratios measured?
The Sharpe ratio measures the amount of return received per unit of risk when using a trading strategy or investing in an asset.
William F. Sharpe created the Sharpe ratio, which investors use to analyze the relationship between an investment's return and its risk. The ratio represents the average return earned in excess of the risk-free return for each unit of risk.
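In plain terms, the ratio described above is the mean of the excess returns divided by their standard deviation. The following is a minimal sketch of that calculation; the function name sharpe_ratio, the sample returns, and the annualization by the square root of 252 trading days are assumptions made for illustration, not the exact procedure of this project.

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=None):
    """Mean excess return divided by the standard deviation of excess returns.

    risk_free may be a single number or a series of benchmark returns
    with the same length as returns.
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    ratio = excess.mean() / excess.std(ddof=1)   # sample standard deviation
    if periods_per_year is not None:
        ratio *= np.sqrt(periods_per_year)       # common annualization convention
    return ratio

# Hypothetical daily returns, used only for illustration.
daily_returns = [0.01, -0.02, 0.015, 0.005, -0.004]
daily = sharpe_ratio(daily_returns)
annual = sharpe_ratio(daily_returns, periods_per_year=252)
```

Because the inputs are estimates from a finite sample, the resulting ratio carries estimation error, which is the point raised in the paragraph on statistical properties above.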
2.8 Review another project
Project: Risk and Return by using Sharpe’s Ratio
This project calculates the Sharpe ratio for the stocks of Amazon and Facebook in order to gain an understanding of the ratio. The S&P 500, which gauges the performance of the 500 biggest US equities, is used as a benchmark.
If we expect an investment to return more than it costs, it may be a wise decision. However, because returns are uncertain and there is a range of potential outcomes, expected returns tell only half of the story. How can different investments be compared when they may, on average, produce similar outcomes but carry different degrees of risk? First, we summarize the number of samples for the two securities and compute some descriptive statistics.
Figure 10) Summarize daily prices for Amazon and Facebook
From the raw data, we plot the prices, as in the figure below, to show how they develop and to observe the fluctuation of the opening price.
Figure 11) Plot daily prices for Amazon and Facebook