
Faculty of Computer Science and Engineering

Bachelor of Engineering Thesis

Stocks Price Trends Prediction Using Machine Learning Techniques

Committee: Computer Science

Supervisors: Dr. Nguyen An Khuong, HCMUT, VNU-HCM

Mr. Nguyen Thanh Phuong, New Mexico State University

Reviewer: Dr. Nguyen Hua Phung, HCMUT, VNU-HCM

Author: Nguyen Duc Phu, 1710234

Ho Chi Minh City, August 10, 2021

Thesis tasks:

ii) Study machine learning techniques for processing time series data, especially financial data.

iii) Collect, clean, and process financial data for training machine learning models.

iv) Build machine learning models to predict stock price trends.

v) Implement a simple stock trading simulator to verify the effectiveness of the models.

Thesis assignment date: 01/03/2021. Completion date: 14/06/2021.

Supervisors and their roles:

1) Nguyen An Khuong, HCMUT: suggested the topic direction and supervised the implementation.

2) Nguyen Tien Thinh, HCMUT: provided guidance on background knowledge and supervised the implementation.

Supervisor's evaluation

Topic: Stocks price trends prediction using machine learning techniques.

Supervisors:

• Nguyen An Khuong, Faculty of Computer Science and Engineering, HCMUT
• Nguyen Tien Thinh, Faculty of Computer Science and Engineering, HCMUT
• Phan Son Tu, Descartes Network
• Nguyen Thanh Phuong, New Mexico State University, USA

Overview of the report: 36 references; deliverables: a CD containing the files for data processing, model training, and model evaluation.

Main strengths of the thesis:

• The thesis is written in fairly good English with few errors; the presentation is clean, coherent, clear, and follows the required format.
• The student is capable, with a strong aptitude for self-study and a highly independent working attitude.
• The student has a solid grasp of the technical foundations and related technologies for processing time series data, has built machine learning models to predict stock price trends, and has implemented a simple stock trading simulation based on the models' predictions.
• The results achieved are of practical significance and are consistent with the objectives and scope set out at the beginning.

Main shortcomings of the thesis:

• A more detailed evaluation and analysis of the dataset used in the thesis is needed.
• The model hyperparameters have not been tuned for optimal results.
• A tool serving actual investment has not yet been implemented.

Recommendation: approved for defense.

Questions the student must answer before the Committee: none (the student will be questioned directly by the Committee).

Overall assessment (excellent, good, average): excellent. Score: 9.5/10.

Signature (full name)

Reviewer's evaluation

Topic: Stocks Price Trends Prediction Using Machine Learning Techniques. Reviewer: Dr. Nguyen Hua Phung.

Overview of the report:

Main strengths of the thesis:

The thesis predicts intraday price trends of traded stocks. It uses one month of trading data for 7 stocks, about 200 million records, provided by Wharton Research Data Services. The student processed the data (averaging millisecond-level trades into per-second records, normalizing the data, and creating 5-minute data windows), used existing machine learning techniques (LSTM, ResNet50) and combined them in two different ways (Hybrid, ResLSTM), then ran experiments and simulated trading based on the models' predictions. The results show that one of the models performs relatively well. The thesis is written in fairly good English with few errors.

- Is there a correlation between volume and the predicted price?

Recommendation: approved for defense / needs additional work before defense / not approved for defense. Questions the student must answer before the Committee:

a. Is the spread between the ask price and the bid price in the dataset large? If the spread is not large, would using only one of the two prices in the dataset be sufficient?

b. Is there a correlation between volume and the predicted price? What analysis did you perform to assess the correlation between volume and the predicted price before feeding volume into the machine learning model?

Overall assessment (excellent, good, average): excellent. Score: 9/10. Signature (full name)

Declaration

I certify that everything written in this thesis, as well as in the source code, is done by myself, with the exception of quoted reference knowledge as well as code provided by the original vendors, with no intention of plagiarising or duplicating from other sources. If any verification finds results contradicting the aforementioned statement, I shall take full responsibility before the Faculty and the University.

Author

Acknowledgements

We would like to express our very great appreciation to Dr. Nguyen An Khuong for his huge support and useful critiques during the planning, development, and completion of this thesis. His enthusiastic, credible, and continuous guidance played an important part in the completion of the thesis. Advice given by Dr. Nguyen Tien Thinh has been a great help in both technical and presentation aspects.

We also would like to offer our special thanks to the seniors, Mr. Nguyen Thanh Phuong, Mr. Van Tien Duc, Mr. Phan Son Tu, Mr. Tran Trung Hieu, Mr. Van Minh Hao, and Mr. Nguyen Tan Duc, for their helpful advice during our research process.

It would be incomplete without showing love to our family, the biggest motivation for us to complete the thesis.

Finally, we would like to thank our friends, Nguyen Dang Ha Nam and Nguyen Huy Hong Huy, as well as Nguyen Nguyen Vi, for their assistance.

Author

Abstract

The stock market is one of the most attractive topics today. Thanks to the recent rapid development of machine learning, especially deep learning, algorithmic trading has become more popular. With the purpose of constructing an automatic trading bot in mind, we decided to work on developing stock price trend predictors for our thesis as the first step. Besides two models using convolutional neural networks and long short-term memory, we also propose two hybrid forms of these models. The results are competitive in terms of training and evaluation performance compared to other studies. Moreover, trading simulations based on the signals of the trained models are conducted to provide more insights into the potential of applying machine learning models to the real-life stock market, in which one of our models achieves positive returns.

Contents

1.1 Introduction to research problem
1.2 Objectives of the study
1.3 Structure of the thesis

2 Background
2.1 Basic concepts of stock
2.3 High frequency data characteristics
2.3.1 Irregular temporal spacing
2.3.2 Discreteness
2.3.3 Diurnal patterns
2.4 Machine learning
2.4.1 Deep learning
2.4.2 Feedforward Neural Networks
2.4.3 Convolutional Neural Networks
2.4.4 Recurrent Neural Networks

3 Related Work

4 Proposed models and experiments
4.1 Data preparation

5.2 Models performance
5.3 Trading simulation performance

6 Conclusion and Future Work
6.1 Conclusion
6.2 Limitation and future work

List of Figures

2.1 Histogram of transaction price changes for Airgas stock
2.2 Analysis on the basis of publication year
2.3 Analysis based on prediction techniques (2019)
2.4 Analysis based on clustering techniques (2019)
2.5 Histogram of publication count in topics (2020)
2.6 Histogram of publication count in models (2020)
2.7 Topic-model heatmap (2020)
2.8 Neuron in human brain
2.9 Computer replication of neuron
2.10 Basic feedforward neural network
2.11 Convolutional Neural Network
2.12 An example of 2D convolution
2.13 Logistic sigmoid function
2.14 ReLU function
2.15 A residual block in ResNet
2.16 An example of Unfolding Computational Graph
2.17 Example of RNN architecture
2.18 Illustration of LSTM block
3.1 Performance of stocks midprice trends prediction of CNN
3.2 Performance of stocks midprice trends prediction of LSTM
3.3 Performance of DeepLOB for Nasdaq Nordic dataset
3.4 Performance of DeepLOB for London Stock Exchange dataset
4.1 Original midprice and processed midprice of averaged AMZN data
4.2 Original midprice and processed midprice of averaged AMD data
4.3 Original midprice and processed midprice of averaged AAPL data
4.4 Original midprice and processed midprice of averaged FB data
4.5 Original midprice and processed midprice of averaged TSLA data
4.6 Original midprice and processed midprice of averaged NVDA data
4.7 Original midprice and processed midprice of averaged MSFT data
4.8 Labeling using the first and last record
4.9 Labeling using averaged midprice with k = 1
4.10 Labeling using averaged midprice with k = 10
4.11 LSTM model, built with Keras
4.12 ResNet model, built with Keras and TensorFlow Hub
4.13 The first proposed model, built with Keras and TensorFlow Hub
4.14 The second proposed model, built with Keras and TensorFlow Hub
5.1 Accuracy of models in training
5.2 Kappa coefficient of models in training
5.3 Precision of models in training
5.4 Recall of models in training
5.5 F1-Score of models in training
5.6 Accuracy of models in evaluation
5.7 Kappa coefficient of models in evaluation
5.8 Precision of models in evaluation
5.9 Recall of models in evaluation
5.10 F1-Score of models in evaluation
5.11 The midprice of Apple Inc. stock in simulation
5.12 Cumulative returns of models in simulation

List of Tables

4.1 Percentage of labels using the first and last record of tensor
4.2 Percentage of labels using averaged midprice with k = 1
4.3 Percentage of labels using averaged midprice with k = 10
5.1 Performance of models after trained with seven datasets
5.2 Performance of models in the final evaluation
5.3 Performance of models in the simulation
5.4 Final Balance of models

List of Abbreviations

S&P500 Standard and Poor’s 500 Stock Index

AI Artificial Intelligence

ML Machine Learning

DL Deep Learning

LSTM Long Short Term Memory

CNN Convolutional Neural Network

LOB Limit order book

SEC Securities and Exchange Commission

WRDS Wharton Research Data Services

MLP Multilayer Perceptron

SVM Support Vector Machine

Chapter 1: Introduction

1.1 Introduction to research problem

The stock market is one of the most attractive topics today. The U.S. stock market ended 2020 at all-time highs despite a deadly pandemic. Global stocks (as measured by the MSCI World Index) climbed 14%. Global stocks have now posted two consecutive years of double-digit gains. In 2019, the MSCI World Index gained 24%, and U.S. stocks, as measured by the S&P500, added 28%.¹

Thanks to the quick development of computing and communications, electronic trading has become the main trading activity in place of traditional face-to-face trading, which also makes possible algorithmic trading supported by artificial intelligence (AI). Chi Nzelu, Head of Macro e-Trading, said, "Through automation, we can capture more data - a problem previously unsolvable by algorithms. Machine learning allows us to improve the quality of services in our trading ecosystem, which also should gradually improve over time." According to a survey conducted by J.P. Morgan in 2020², 71% of traders believe that AI and machine learning (ML) provide deep data analytics for their daily trading activity, whereas 58% of traders believe that AI and ML represent an opportunity to hone their trading decisions. Together with the growth of ML, especially deep learning (DL), algorithmic trading attracts more attention from researchers. The survey of recent applications of DL in the financial industry conducted by Ozbayoglu et al. [1] shows that price and price-trend prediction, along with algorithmic trading, attract the most interest from DL researchers. Also in [1], Ozbayoglu et al. claim that Long Short-Term Memory (LSTM) is the most used model thanks to its advantages in the financial time series research area. Meanwhile, Convolutional Neural Network (CNN) based models, which are well known in image processing, also gain popularity among researchers.

1 https://www.fidelity.com/learning-center/trading-investing/markets-sectors/2020-stock-market-report

2 https://www.jpmorgan.com/solutions/cib/markets/e-trading-2020

Problem statement. As we can see, due to an increasing number of daily traders, there is a demand for more accurate and efficient tools that are able to support daily traders in making better decisions and profits on stock markets. Therefore, our study is executed with the hope of providing some insights into the performance of DL models in algorithmic trading, i.e. price movement prediction. We hope that the work can be considered the first step toward constructing a useful DL tool for "intraday" traders³: an automatic trading bot. More specifically, this thesis concentrates on developing price trend predictors based on the "order book"⁴, which can be used as automatic decision makers in follow-up projects.

1.2 Objectives of the study

The thesis aims to develop machine learning models for predicting stock midprice movements based on high frequency limit order book (LOB) data and to simulate trading strategies using the proposed models. In particular, the thesis has the following specific objectives:

• Studying how the stock market works.

• Doing the engineering needed to preprocess the dataset.

• Applying machine learning techniques to forecast price trends during trading days of chosen U.S. stocks.

• Simulating trades and producing statistical reports on the outcome of the proposed models and strategies.

3 discussed in Section 2.2

4 discussed in Subsection 2.1.4

Because of the variety and diversity of the research area as well as the limited resources, within the scope of this thesis we consider the following restrictions:

• Ignoring the effects of other economic factors, such as "dark pools"⁵ and social network statements, that affect the market.

• Simulating simple trades in real time using quote data given by the Investors Exchange Cloud API⁶.

• Ignoring the effect of transaction costs when testing trading strategies.

We hope that the study can contribute to the literature of algorithmic trading using machine learning, provide more insights about DL algorithmic trading, and perhaps come up with models and strategies with better performance. Our expectations are to process the data and construct models that perform acceptably in training and simulation. Moreover, the thesis can also be considered the first step toward developing a comprehensive automated trading bot in further research, which may eventually be deployed in the real market.

5 discussed in Subsection 2.1.4

6 https://iexcloud.io/

1.3 Structure of the thesis

Based on the objectives that we have discussed in Section 1.2, the thesis is organized as follows:

Chapter 1: Introduction. We introduce the research problem as well as the objectives, scope, and structure of the thesis.

Chapter 2: Background. In this chapter, domain knowledge about finance and stocks is presented. Chapter 2 also includes the machine learning background relevant to the study.

Chapter 3: Related Work. We discuss the state of machine learning applications in the financial industry and the methodologies proposed by previous researchers for solving the research problem.

Chapter 4: Data, Models and Trading Strategies. This chapter showcases our data and the way we process it. The models and the training phase are also presented, followed by our trading strategies.

Chapter 5: Result. The performance of the models and the results of the simulated trades are demonstrated in detail.

Chapter 6: Conclusion and Future Work. We summarize the thesis, evaluate what we have achieved and what we have not, as well as some plans for future work on the research problem.

Chapter 2: Background

2.1 Basic concepts of stock

2.1.1 Stock definition

Stocks represent ownership in a company. By owning shares or stocks, investors own a piece of a company. The value of a stock increases when the company operates well; conversely, the stock may decrease in value when the company does not do well. Some companies may pay a dividend to the owners of stocks. People buy stocks for various reasons, such as capital appreciation¹, dividend payments, or the ability to vote and influence the company. Companies issue stock when they need money, for maintenance and development purposes, or to pay off debt [2]. Without stocks, companies may struggle to collect such a large amount of money from individual investors.

2.1.2 Stock markets

The stock market refers to the collection of venues where regular buying and selling activities happen between investors, as well as the issuance of shares of publicly held companies. Though it is called a stock market or equity market, other securities, like exchange-traded funds (ETFs), bonds, and gold, are also traded in the stock market.

1 occurs when a stock price rises.

Stock markets provide a secure and regulated environment where traders, companies, and organizations can safely take financial actions like trading. The stock markets have two missions, known as the "primary markets" and "secondary markets", which both follow the rules defined by the regulator.

The first task is that the stock market allows companies to hold an initial public offering (IPO), which refers to issuing and selling parts of itself (shares) to the public for fund-raising purposes. The second task is to provide a trading platform that allows transactions of the listed shares. For every transaction, traders, whether individuals or organizations, have to pay the stock market a fee, called a transaction fee.

Long-term investors and short-term traders are not the only two roles taking part in stock markets. Brokers, portfolio managers, investment banks, and market makers also contribute to the operation of a stock market [2].

2.1.3 Stock orders and types of orders

According to the U.S. SEC [2], market orders, limit orders, and stop-loss orders are among the most popular types of orders used in stock markets.²

Market Orders are the most common ones in trading. Market orders allow one to buy or sell immediately at the current price, which means buying a stock at or near the posted ask price, or selling a stock at or near the posted bid price. The last traded price is not necessarily the price at which a market order will be executed. Market orders mostly suit investors who want to issue transactions without any delay, even though the price is not guaranteed.

Limit Orders, which are sometimes referred to as pending orders, allow investors to guarantee the price at which the transaction, buy or sell, is executed. Limit orders determine the level the price must reach for the order to be filled. If the required level is not met, the limit order waits until it is fulfilled or canceled by the investor. Limit orders help traders acquire the best price possible, in exchange for giving up immediate execution.

Stop-Loss Orders, which are also referred to as stop orders, are orders to trade once the stock price reaches a specified milestone, known as the stop price. Different from a limit order, a stop order becomes a market order when the stop price is activated [2].

Other special orders which may be allowed by brokerage firms are Day Orders, Good-Till-Cancelled Orders, etc. However, in this thesis, we only care about Limit Orders, which form the limit order book. The following subsection discusses the limit order book (LOB).

2 https://www.investor.gov/introduction-investing/investing-basics/how-stock-markets-work/types-orders

2.1.4 Limit order book

An order book is dynamic, meaning it is constantly updated in real time throughout the day. Orders that specify execution only at market open or market close are maintained separately, known as the "opening order book" and the "closing order book" respectively.

There are typically three parts to an order book, i.e. buy orders, sell orders, and order history:

• Buy orders contain buyer information, including all the bids, the amount they wish to purchase, and the ask price.

• Sell orders are similar to buy orders.

• Market order histories show all the transactions that have taken place in the past.

3 https://www.investopedia.com/terms/o/order-book.asp

Although the order book is meant to provide transparency to market participants, some details are not included in the list, such as the "dark pool", a privately organized financial forum or exchange for trading securities that allows investors to trade without exposure until after the trade has been executed and reported, and gives certain investors the opportunity to place large orders and make trades without publicly revealing their intentions during the search for a buyer or seller. Because of the difficulty of including them in the modeling or simulation, these factors are considered non-existent in this thesis.

Local spatial structure. We also want to point out a feature of the LOB known as spatial structure. This property plays an important part in our model selection in Chapter 4. Spatial structure, or spatial dependence, refers to a structure whose linearly arranged members are interconnected. Said differently, records that lie in the same region are relevant to each other, and therefore each combination of local patterns results in a different effect on the outcome.

The advantage of neural networks, particularly CNNs, is the ability to exploit the spatial structure of the data. Sirignano provided some statistical evidence for local spatial structure in limit order books [3]. Evidence that the conditional movement of the future price depends only locally on the current limit order book state was given in [3], where the reported coefficients were fitted on the stock Amazon. For example, the larger the ask size at the current level, the less likely the future best ask price will reach a greater level. To strengthen the statement, Sirignano also conducted a detailed analysis across 489 stocks primarily drawn from the S&P500 and NASDAQ-100. The result supports the claim that the limit order book has a local spatial structure.
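To make the idea of local spatial structure more concrete, the short sketch below (not taken from the thesis; the prices, sizes, and the choice of three levels are illustrative assumptions) arranges a few LOB snapshots into an array whose first axis is time and whose second axis is the price level, which is exactly the kind of locality a convolutional model can exploit.

```python
import numpy as np

# Hypothetical LOB snapshots: for each time step we keep the best 3 bid/ask
# levels, each described by (bid_price, bid_size, ask_price, ask_size).
lob = np.array([
    [[100.1, 200, 100.2, 150], [100.0, 300, 100.3, 120], [99.9, 250, 100.4, 400]],
    [[100.1, 180, 100.2, 170], [100.0, 310, 100.3, 100], [99.9, 260, 100.4, 390]],
])
print(lob.shape)                              # (2, 3, 4): time x price levels x features

# Midprice derived from the best bid and best ask of each snapshot.
midprice = (lob[:, 0, 0] + lob[:, 0, 2]) / 2
print(midprice)
```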

from these price changes or make use of the spread between the bid price and the ask price. The number of trades made by this type of trader is usually high. Closing positions in the market is often a part of their strategies [4]. Another definition is provided by the U.S. Securities and Exchange Commission⁴ (SEC). Day traders rapidly buy, sell, and short stocks throughout the day in the hope that the stocks continue climbing or falling in value for the seconds or minutes they hold the shares, allowing them to lock in quick profits. Day trading is also mentioned by the U.S. SEC as an extremely risky strategy that can result in substantial financial losses in a very short period.

Short-term traders usually use 1-, 5-, 15-, 30- and 60-minute intervals to operate their plans. Traders use numerous intraday strategies, including:

• Scalping: one attempts to make small profits on small price changes.

• Range trading: one uses support and resistance levels to determine buy and sell decisions.

5 An inefficient market is one in which an asset's prices do not reflect its true value due to reasons like information asymmetries, human emotion, etc.

2.3 High frequency data characteristics

Modern financial datasets may contain millions of transactions or posted quotes in a single day, time-stamped to the nearest second or even millisecond. It is important for us to study the characteristics of such high-frequency data. According to Russell et al. [5], factors like irregular temporal spacing, diurnal patterns, and price discreteness may increase the complexity of the analysis of these data.

2.3.1 Irregular temporal spacing

Also from Russell et al. [5], we know that all transaction data are inherently irregularly spaced in time. The irregular spacing of the data can be interpreted as follows: some transactions appear to occur only seconds apart, while others, e.g. between 10:30 and 11:00, may be 5 or 10 minutes apart. Said differently, the time between two consecutive trades can vary.

Because most econometric⁶ models work with fixed intervals, this poses an immediate challenge. The time intervals over which the data will be analyzed must be decided.
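As a minimal illustration of this interval choice (a sketch with pandas, not the thesis's actual pipeline; the column names and the 1-second bar size are assumptions), irregularly spaced trades can be averaged onto a fixed time grid:

```python
import pandas as pd

# Hypothetical tick data: irregularly spaced trades within a few seconds.
trades = pd.DataFrame(
    {"price": [100.10, 100.12, 100.11, 100.15],
     "size":  [200,    50,     75,     120]},
    index=pd.to_datetime([
        "2021-03-01 10:30:00.120", "2021-03-01 10:30:00.870",
        "2021-03-01 10:30:02.430", "2021-03-01 10:30:05.010"]),
)

# Average the trades inside each 1-second bucket; empty buckets become NaN
# and are forward-filled so that every interval carries a value.
bars = trades.resample("1s").mean().ffill()
print(bars)
```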

2.3.2 Discreteness

All economic data are discrete [5]. Moreover, the variance of the long-run price process is usually large relative to the size of the minimum price movement. Meanwhile, price changes take only a handful of values, called ticks, which can be observed in many data sets. For example, over a year a stock price can increase by hundreds of dollars, even though the prices of two consecutive trades differ by only one or two ticks.

As an example, Fig. 2.1, drawn by Russell et al. [5], presents a histogram of Airgas transaction price changes after deleting the overnight and opening transactions. Fifty-two percent of the transactions have the same price as the previous ones. Over 70% of the transaction prices have a difference of zero (no change), up one tick, or down one tick. Because the bid and ask prices also increase or decrease by a number of ticks, the midprice owns the same property. This discreteness will have an impact on measuring volatility, dependence, or any characteristic of prices that is small relative to the tick size [5]. To sum up, stock prices are discrete, and the difference between the prices of two consecutive trades is relatively small compared to the variance of long-term price changes.

6 Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships.

Figure 2.1: Histogram of transaction price changes for Airgas stock (source [5]).

2.3.3 Diurnal patterns

It is obvious that all trading occurs in the daytime, resulting in very strong diurnal, or periodic, patterns. Russell et al. also discussed in [5] that volatility, volume, spreads, and the frequency of trades all follow the same U-shaped pattern, highest near the open and the close and lowest around the middle of the day. This is the result of the working hours of the stock market. Meanwhile, the time between trades tends to be shortest near the open and also prior to the close. Also in [5], they showed that the duration and standard deviation of mid-quote price changes, a.k.a. midprice changes, are also U-shaped. This can be interpreted as follows: the midprice changes are usually large at the beginning and the end of the trading day, and relatively small in the middle of the day.
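A quick way to check for such a diurnal pattern on one's own data is sketched below; it assumes a DataFrame indexed by timestamps with a `midprice` column (both names are assumptions, and the code is not from the thesis).

```python
import pandas as pd

def diurnal_profile(quotes: pd.DataFrame) -> pd.Series:
    """Average absolute midprice change for each minute of the trading day."""
    abs_change = quotes["midprice"].diff().abs()
    minute_of_day = quotes.index.hour * 60 + quotes.index.minute
    return abs_change.groupby(minute_of_day).mean()

# profile = diurnal_profile(quotes)  # expected to be roughly U-shaped over the day
```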

2.4 Machine learning

Machine learning is an application of artificial intelligence (AI). Within AI, machine learning has emerged as the method of choice for developing practical software for computer vision, speech recognition, natural language processing, and other applications. The effect of machine learning has also been felt broadly across computer science and across a range of industries concerned with data-intensive issues (Jordan et al. [6]).

A machine learning algorithm is a computational process that uses input data to achieve a desired task without being literally programmed. These algorithms automatically alter their parameters through a training process, in which samples of input data are provided along with the desired outcomes. The algorithm then modifies itself so that it not only achieves the desired outcomes when presented with the training inputs but can also produce the desired outcomes from unseen data (Naqa and Murphy [7]).

Machine learning in stock market prediction. In the last few years, AI has made many advances that enabled the creation of applications for finance professionals that could arguably disrupt the finance industry. Many applications already implement AI in different areas of finance, such as anomaly detection, portfolio management, credit evaluation, text mining, or algorithmic trading [8].

Stock market prediction has already become one of the most attractive areas. Therefore, several data mining and knowledge discovery techniques have been employed for analyzing market trends. Gandhmal et al. [9] constructed a detailed analysis and review of stock market prediction techniques, as discussed below.

Having surveyed 50 papers suggesting methodologies for stock market prediction, Gandhmal et al. [9] provided some insights into that research. Fig. 2.2 illustrates the number of research papers published in the years from 2010 to 2018. The proportion of each prediction technique is depicted in Fig. 2.3, and the percentage of each clustering technique in Fig. 2.4.


Figure 2.2: Analysis on the basis of publication year (source [9]).

Figure 2.3: Analysis based on prediction techniques (2019) (source [9]).

2.4.1 Deep learning

Deep Learning is a machine learning technique that constructs artificial neural networks to mimic the structure and function of the human brain. In practice, DL uses a large number of hidden layers, typically at least six, of non-linear processing to extract features from data and transform the data into different levels of abstraction. Wang et al. [10] conducted a survey about recent advances in DL, categorized into four groups, i.e. deep architectures and CNN, incremental learning, RNN, and generative models. Published in 2020, the paper provides readers with the current state of DL and reinforces the future of DL.

Figure 2.4: Analysis based on clustering techniques (2019) (source [9]).

Deep learning in finance. Stock market forecasting, algorithmic trading, credit risk assessment, portfolio allocation, asset pricing, and the derivatives market are among the areas where ML researchers have focused on developing models that can provide real-time working solutions for the financial industry [1]. DL in finance attracts more and more interest from researchers every year.

A glance at Fig. 2.5 shows us that financial text mining and algorithmic trading are the top two fields that researchers worked on most, followed by risk assessment, sentiment analysis, portfolio management, and fraud detection, respectively. The figure indicates that most of the papers were published within the past four years (2017-2020), which shows that the domain is very well received and actively studied.

Figure 2.5: Histogram of publication count in topics (2020) (source [1]).

In Fig. 2.6, we observe the dominance of RNN, DMLP, and CNN over the remaining models, which might be expected, since these models are the most commonly preferred and supported ones among DL techniques. Meanwhile, RNN is a relatively general model which possesses several variants, including LSTM, GRU, etc. Within the RNN choice, most of the proposed models belonged to LSTM, which is very popular in time series forecasting or regression problems thanks to its advantages compared to other models. It is also used quite often in algorithmic trading. More than 70% of the RNN papers consisted of LSTM models.

Furthermore, Fig. 2.7 gives the distribution of the models over the research areas through a model-topic heatmap. According to Ozbayoglu et al. [1], the number of models is larger than the number of papers because most of the papers contributed multiple DL models. The picture indicates the broad acceptance of RNN, DMLP, and CNN models in almost all financial application areas.

The survey comprehensively conducted by Ozbayoglu et al. [1] helps to form the way we approach the research problem, a small part of the algorithmic trading area, and also guides us in the model selection process.

Figure 2.6: Histogram of publication count in model types (2020) (source [1]).

2.4.2 Feedforward Neural Networks

The concept of feedforward neural networks originates from the artificial neural network, which is mainly formed by artificial neurons, or perceptrons.

Artificial neuron. The neural network mimics the function of the human brain, in which the basic calculation unit is the neuron. Fig. 2.8 shows details about neurons in the human brain, whereas Fig. 2.9 shows a computer replication. Impulses from other neurons go through the dendrites and return as output signals through the neuron's axon terminals. The dendrites carry signals to the cell body, where the sum over all of these signals is calculated. The result is then compared to a certain pre-defined threshold to decide whether the neuron can be activated, which means it sends an electrical signal to following neurons through its axon.

Following this concept, an artificial neuron takes input signals x, which are multiplied by weights w, and all of them are summed, possibly with a bias b. A trigger function f acting as a threshold then decides whether the neuron will be activated. The core idea of an artificial neuron is that the weights can be learned and thereby control the strength of the input values from the previous neurons [11].

Figure 2.7: Topic-model heatmap (2020) (source [1]).

Figure 2.8: Neuron in human brain (source [12]).


Figure 2.9: Computer replication of neuron (source [12]).
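The computation described above is just a weighted sum passed through a threshold-like trigger function; the minimal sketch below illustrates it with made-up weights, bias, and a step activation (none of these values come from the thesis).

```python
import numpy as np

def neuron(x, w, b, f):
    """One artificial neuron: activation applied to the weighted sum plus bias."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])     # input signals
w = np.array([0.8,  0.2, 0.1])     # learned weights
b = -0.3                           # bias
step = lambda z: float(z > 0)      # simple threshold trigger function
print(neuron(x, w, b, step))       # 1.0, since the weighted sum exceeds the threshold
```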

Feedforward Neural Networks. Feedforward neural networks (FNNs), also referred to as deep feedforward networks or Multilayer Perceptrons (MLPs), are the quintessential deep learning models. The goal of an FNN is to approximate some function f*; e.g. for a classifier, y* = f*(x) maps an input x to a category y. An FNN defines a mapping y = f(x; θ) and learns the values of the parameters θ = (w, b) that result in the best function approximation [13], where f is the approximation of f*.

These models are called feedforward since information flows through the function being evaluated from the input x, through the intermediate computations that define f, and finally to the output y, without being fed back into the model. Networks that are extended to include such feedback connections are called recurrent neural networks.

Fig. 2.10 demonstrates an example of an FNN with three layers. The input layer, drawn without incoming arrows, represents the input of the network. The final layer is called the output layer. The layer in the middle is called the hidden layer since, during the training phase, the behavior of this layer is decided by the learning algorithm rather than prescribed by the training data, i.e. the training data does not specify the desired output of this layer [13].
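Since the models in Chapter 4 are built with Keras, a three-layer feedforward network like the one in Fig. 2.10 can be sketched as follows; the layer sizes and the three-class output are illustrative assumptions, not the thesis's configuration.

```python
import tensorflow as tf

# Input layer -> one hidden layer -> output layer, i.e. y = f(x; theta).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(40,)),                      # input features x
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),   # output: up / stationary / down
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Training such a network amounts to learning θ = (w, b) for every Dense layer by minimizing the loss on labeled examples.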


Figure 2.10: Basic feedforward neural network (source [11]).

2.4.3 Convolutional Neural Networks

Convolutional neural networks (CNNs), invented by LeCun et al. [14], are feedforward neural networks that can exploit local spatial structures in the input data. Flattening high-dimensional time series, such as limit order book depth histories, would require a very large number of weights in a feedforward architecture. CNNs, shown in Fig. 2.11, attempt to reduce the network size by exploiting data locality. These properties are the reasons for us to choose CNNs in Section 4.2. This section discusses some basic knowledge about the architecture of CNNs and the model which we intend to use in the thesis.

Figure 2.11: Convolutional Neural Network.

Convolutional layer. The convolutional layer is considered the most important component in the design of CNNs. Convolutional layers include a set of filters, a.k.a. kernels, that can be learned through back-propagation. In DL models used for image processing, these filters are usually small in length and width, which is motivated by sparse interactions. For example, we can detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels, which both reduces the memory requirements and improves the statistical efficiency of the model [13]. Fig. 2.12 demonstrates a convolution operation between an input x of shape (3 × 4) and a kernel k of shape (2 × 2).

Three typical hyperparameters of a convolutional layer are the number of kernels, which determines the depth of the output, the stride, and the zero-padding. The output shape of the layer is then decided by these parameters [11]:

• The number of kernels K. The number of kernels determines the depth of the output. Each of these filters is responsible for detecting different features of the input.

• Stride S. The stride is the amount by which the kernel moves at each step.

• Zero-padding P. Zero-padding determines the number of zero borders to be padded around the original border of the input.

The size of the output along each spatial dimension is then calculated by

(Input size − Kernel size + 2 × P) / S + 1.

For example, in Fig. 2.12, the input size is (3 × 4), the kernel size is (2 × 2), the stride is S = 1, and the zero-padding is P = 0. The output therefore has shape (2 × 3), since (3 − 2 + 0)/1 + 1 = 2 and (4 − 2 + 0)/1 + 1 = 3.

Figure 2.12: An example of 2D convolution (source [13]).
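The output-size formula can be checked directly in Keras; the snippet below (an illustration, not thesis code) reproduces the (3 × 4) input and (2 × 2) kernel of Fig. 2.12.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 3, 4, 1).astype("float32")   # one (3 x 4) input with a single channel
conv = tf.keras.layers.Conv2D(filters=1, kernel_size=(2, 2), strides=1, padding="valid")
y = conv(x)
print(y.shape)  # (1, 2, 3, 1): (3 - 2 + 0)/1 + 1 = 2 and (4 - 2 + 0)/1 + 1 = 3
```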

Activation function. In Fig. 2.11, we can see that at the end of the convolutional layer there is a rectified linear unit (ReLU) function, which is an activation, a.k.a. non-linearity, function. Many non-linear functions can be used with artificial neurons. The two most commonly used activation functions are the logistic sigmoid and ReLU.


• Logistic sigmoid. The logistic sigmoid can be calculated as

σ(x) = 1 / (1 + e^(−x)).

This function takes a real number and returns a result in the range [0, 1]. There was a time when the logistic sigmoid was the default choice of activation function. However, it has fallen out of favor due to its major drawbacks: it saturates easily and is not zero-centered [11], as shown in Fig. 2.13.

Figure 2.13: Logistic sigmoid function (source [11]).

• Rectified linear unit. The ReLU function, shown in Fig. 2.14, has been replacing the "old-fashioned" logistic sigmoid to become the most widely used activation in recent years. The formula for ReLU is

f(x) = max(0, x).

The ReLU function has some advantages over the logistic sigmoid function:

- It significantly accelerates convergence when using the stochastic gradient descent algorithm, compared to the logistic sigmoid function [15].

- It is simple and easy to implement.


Figure 2.14: ReLU function (source [11]).
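For reference, both activation functions just discussed can be written in a few lines of NumPy (a standalone illustration, independent of the thesis code):

```python
import numpy as np

def sigmoid(x):
    # logistic sigmoid: squashes any real number into (0, 1), saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

z = np.array([-3.0, 0.0, 3.0])
print(sigmoid(z))   # approximately [0.047 0.5 0.953]
print(relu(z))      # [0. 0. 3.]
```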

Pooling layer. The purpose of the pooling layer is to reduce the spatial size of the representation in order to reduce the number of parameters and calculations in the network, thereby helping to control overfitting. In most cases, the pooling layer sits between two consecutive convolutional layers. The pooling layer applies a pooling function, such as max, average, or sum, independently on every depth slice of the input, reducing the data in length and width while the depth is preserved [11].

Pooling helps to make the representation approximately invariant to small changes of the input, which means the values of most of the pooled outputs do not change if we translate the input by a small amount. Invariance to local translation can be a useful property if we care more about whether some feature is present than exactly where it is [13].

Deep Residual Learning. Since AlexNet [15] was introduced, state-of-the-art CNN architectures have been going deeper and deeper. The descendants of AlexNet, VGGNet and GoogleNet, had 19 and 22 layers respectively [16], compared to the 5 layers of AlexNet.

However, the vanishing gradient causes challenging problems when training deep neural networks: as the gradient is back-propagated to earlier layers, repeated multiplication may make the gradient vanishingly small. As a result, when the network goes deeper, its performance plateaus or even starts degrading rapidly, which is called the degradation problem [17]. Deep Residual Learning (ResNet) was first introduced by He et al. [17] in order to address the degradation problem. The core idea of ResNet is introducing a so-called "identity shortcut connection" that skips one or more layers, as shown in Fig. 2.15.

Figure 2.15: A residual block in ResNet (source [17]).

Because of its compelling results, ResNet quickly became one of the most popular architectures in various computer vision tasks. Inspired by its success in image classification, we propose ResNet as a model for forecasting financial signals, which is likewise a classification problem.
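A residual block of the kind shown in Fig. 2.15 can be sketched in Keras as below; this is a generic illustration of the identity shortcut connection, not the exact block used in ResNet50 or in the thesis, and the filter count and input shape are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """y = F(x) + x: two conv layers plus an identity shortcut connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])          # the skip connection
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, filters=16)
model = tf.keras.Model(inputs, outputs)
```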

ResNet50. ResNet50 is a variant of ResNet with 50 convolutional layers. According to He et al. [17], the ResNet50 architecture contains the following elements:

• A convolutional layer with 64 filters of size 7 × 7 and a stride of 2, giving us 1 layer.

• A max pooling layer, also with a stride of 2.

• The following three layers repeated 3 times in total, giving us 9 layers:
1. 64 filters of size 1 × 1
2. 64 filters of size 3 × 3
3. 256 filters of size 1 × 1

• A combination of 3 layers repeated 4 times, which gives us 12 layers:
1. 128 filters of size 1 × 1
2. 128 filters of size 3 × 3
3. 512 filters of size 1 × 1

• This step contains 3 layers, repeated 6 times, giving us a total of 18 layers:
1. 256 filters of size 1 × 1
2. 256 filters of size 3 × 3
3. 1024 filters of size 1 × 1

• Three layers repeated 3 times, resulting in 9 layers:
1. 512 filters of size 1 × 1
2. 512 filters of size 3 × 3
3. 2048 filters of size 1 × 1

• An average pooling layer followed by a fully-connected layer containing 1000 neurons with a softmax function, giving us the last layer.
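In practice the 50-layer stack above rarely needs to be re-implemented by hand: Keras ships a prebuilt ResNet50 (the thesis figures mention TensorFlow Hub instead, so the `keras.applications` variant below, the input shape, and the 3-class head are all assumptions for illustration).

```python
import tensorflow as tf

# Pretrained ResNet50 backbone without the final 1000-way classifier;
# a small head is attached for a hypothetical 3-class trend prediction task.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                           input_shape=(224, 224, 3), pooling="avg")
outputs = tf.keras.layers.Dense(3, activation="softmax")(backbone.output)
model = tf.keras.Model(backbone.input, outputs)
```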

2.4.4 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) were first introduced by Rumelhart et al. [18] in 1986. RNNs are a family of neural networks for processing sequential data x^(1), ..., x^(τ), and they can scale to much longer sequences than would be practical for networks without sequence-based specialization [13]. The main idea of RNNs derives from early ideas found in machine learning and statistical models, namely sharing parameters across different parts of a model [13]. In this section, we only present RNNs briefly enough to use them in the later chapters.


Unfolding computational graph. A computational graph is a way to formalize the structure of a set of computations [13]. Unfolding an equation means repeatedly applying its definition. For example, consider a system

s^(t) = f(s^(t−1); θ),     (2.1)

where s^(t) represents the state at time t, and the function f maps the state at time t to the state at time t + 1. The same parameters θ of f are used for all time steps.

Figure 2.16: An example of Unfolding Computational Graph (source [13]).

If we add more information at time t, such as an input x^(t), into the function f in Equation 2.1, the new state now contains information about the whole past sequence:

s^(t) = f(s^(t−1), x^(t); θ),     (2.4)

where s^(t) denotes the hidden state at time t, calculated from the input x^(t). According to Goodfellow et al. [13], the unfolding technique has two major advantages:

• The input size is not affected by the sequence length, as the model concentrates on the transition from one state to the following state, rather than on the number of states.

• The transition function f in Equation 2.4, with the same parameters, can be used at every time step.
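Parameter sharing across time steps can be made concrete with a short loop: the same transition function f, with the same parameter matrices, is applied at every step (the tanh transition and the matrix sizes below are illustrative choices, not the thesis's model).

```python
import numpy as np

def f(s_prev, x_t, theta):
    """One recurrence step s(t) = f(s(t-1), x(t); theta), shared across all t."""
    W, U = theta
    return np.tanh(W @ s_prev + U @ x_t)

rng = np.random.default_rng(0)
theta = (rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))   # same parameters every step
s = np.zeros(4)
for x_t in rng.normal(size=(10, 2)):     # a sequence of 10 two-dimensional inputs
    s = f(s, x_t, theta)                 # the state summarizes the whole past sequence
print(s)
```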

Recurrent Neural Network. Inspired by the unfolding computational graph, Fig. 2.17 demonstrates a typical example of an RNN, in which an output is produced at each time step and the hidden units have recurrent connections between themselves [13].

Figure 2.17: Example of RNN architecture (source [13]).

In Fig. 2.17, the hidden state at time t, previously denoted by s^(t), is now denoted by h^(t); it maps the input x^(t) to an output o^(t). The prediction is computed as ŷ^(t) = softmax(o^(t)), and the loss L^(t) is then computed to measure the difference between the desired output y^(t) and the prediction ŷ^(t).

Three weight matrices, W, U, and V, together make up the RNN parameters.
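Finally, since Fig. 4.11 describes an LSTM model built with Keras, a minimal sketch of such a predictor is given below; the window length, the number of LOB features, the layer width, and the three trend classes are placeholders rather than the thesis's actual hyperparameters.

```python
import tensorflow as tf

# Input: a window of T time steps, each with F limit-order-book features.
T, F = 300, 40
model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, F)),
    tf.keras.layers.LSTM(64),                        # recurrent layer with hidden state h(t)
    tf.keras.layers.Dense(3, activation="softmax"),  # up / stationary / down
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```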
