
Faculty of Computer Science and Engineering

Bachelor of Engineering Thesis

Stocks Price Trends Prediction Using Machine Learning Techniques

Committee: Computer Science

Supervisors: Dr. Nguyen An Khuong, HCMUT, VNU-HCM

Mr. Nguyen Thanh Phuong, New Mexico State University

Reviewer: Dr. Nguyen Hua Phung, HCMUT, VNU-HCM

Author: Nguyen Duc Phu, 1710234

Ho Chi Minh City, August 10, 2021

Thesis tasks:

ii) Study machine learning techniques for processing time series data, especially financial data.

iii) Collect, clean, and process financial data for training machine learning models.

iv) Build machine learning models to predict stock price trends.

v) Implement a simple stock trading simulator to verify the effectiveness of the models.

Thesis assignment date: 01/03/2021. Completion date: 14/06/2021.

Supervisors and their roles:

1) Nguyen An Khuong, HCMUT: suggested the topic direction and supervised the implementation.

2) Nguyen Tien Thinh, HCMUT: provided guidance on background knowledge and supervised the implementation.

Supervisor's evaluation

Topic: Stocks price trends prediction using machine learning techniques.

Supervisors:

• Nguyen An Khuong, Faculty of Computer Science and Engineering, HCMUT
• Nguyen Tien Thinh, Faculty of Computer Science and Engineering, HCMUT
• Phan Son Tu, Descartes Network
• Nguyen Thanh Phuong, New Mexico State University, USA

Overview of the report: 36 references; deliverables: a CD containing the files for data processing, model training, and model evaluation.

Main strengths of the thesis:

• The thesis is written in fairly good English with few errors; the presentation is clean, coherent, clear, and follows the required format.
• The student is capable, with a strong aptitude for self-study and a highly independent working attitude.
• The student has a solid grasp of the technical foundations and related technologies for processing time series data, has built machine learning models to predict stock price trends, and has implemented a simple stock trading simulation based on the models' predictions.
• The results achieved are of practical significance and are consistent with the objectives and scope set out at the beginning.

Main shortcomings of the thesis:

• A more detailed evaluation and analysis of the dataset used in the thesis is needed.
• The model hyperparameters have not been tuned for optimal results.
• A tool serving actual investment has not yet been implemented.

Recommendation: approved for defense.

Questions the student must answer before the Committee: none (the student will be questioned directly by the Committee).

Overall assessment (excellent, good, average): excellent. Score: 9.5/10.

Signature (full name)

Reviewer's evaluation

Topic: Stocks Price Trends Prediction Using Machine Learning Techniques. Reviewer: Dr. Nguyen Hua Phung.

Overview of the report:

Main strengths of the thesis:

The thesis predicts intraday price trends of traded stocks. It uses one month of trading data for 7 stocks, about 200 million records, provided by Wharton Research Data Services. The student processed the data (averaging millisecond-level trades into per-second records, normalizing the data, and creating 5-minute data windows), used existing machine learning techniques (LSTM, ResNet50) and combined them in two different ways (Hybrid, ResLSTM), then ran experiments and simulated trading based on the models' predictions. The results show that one of the models performs relatively well. The thesis is written in fairly good English with few errors.

- Is there a correlation between volume and the predicted price?

Recommendation: approved for defense / needs additional work before defense / not approved for defense. Questions the student must answer before the Committee:

a. Is the spread between the ask price and the bid price in the dataset large? If the spread is not large, would using only one of the two prices in the dataset be sufficient?

b. Is there a correlation between volume and the predicted price? What analysis did you perform to assess the correlation between volume and the predicted price before feeding volume into the machine learning model?

Overall assessment (excellent, good, average): excellent. Score: 9/10. Signature (full name)

Declaration

I certify that everything written in this thesis, as well as in the source code, is done by myself, with the exception of quoted reference knowledge as well as code provided by the original vendors, with no intention of plagiarising or duplicating from other sources. If any verification finds results contradicting the aforementioned statement, I shall take full responsibility before the Faculty and the University.

Author

Acknowledgements

We would like to express our very great appreciation to Dr. Nguyen An Khuong for his huge support and useful critiques during the planning, development, and completion of this thesis. His enthusiastic, credible, and continuous guidance played an important part in the completion of the thesis. Advice given by Dr. Nguyen Tien Thinh has been a great help in both technical and presentation aspects.

We also would like to offer our special thanks to the seniors, Mr. Nguyen Thanh Phuong, Mr. Van Tien Duc, Mr. Phan Son Tu, Mr. Tran Trung Hieu, Mr. Van Minh Hao, and Mr. Nguyen Tan Duc, for their helpful advice during our research process.

It would be incomplete without showing love to our family, the biggest motivation for us to complete the thesis.

Finally, we would like to thank our friends, Nguyen Dang Ha Nam and Nguyen Huy Hong Huy, as well as Nguyen Nguyen Vi, for their assistance.

Author

Abstract

The stock market is one of the most attractive topics today. Thanks to the recent rapid development of machine learning, especially deep learning, algorithmic trading has become more popular. With the purpose of constructing an automatic trading bot in mind, we decided to work on developing stock price trend predictors for our thesis as the first step. Besides two models using convolutional neural networks and long short-term memory, we also propose two hybrid forms of these models. The results are competitive in terms of training and evaluation performance compared to other studies. Moreover, trading simulations based on the signals of the trained models are conducted to provide more insights into the potential of applying machine learning models to the real-life stock market, in which one of our models achieves positive returns.

Contents

1.1 Introduction to research problem
1.2 Objectives of the study
1.3 Structure of the thesis

2 Background
2.1 Basic concepts of stock
2.3 High frequency data characteristics
2.3.1 Irregular temporal spacing
2.3.2 Discreteness
2.3.3 Diurnal patterns
2.4 Machine learning
2.4.1 Deep learning
2.4.2 Feedforward Neural Networks
2.4.3 Convolutional Neural Networks
2.4.4 Recurrent Neural Networks

3 Related Work

4 Proposed models and experiments
4.1 Data preparation

5.2 Models performance
5.3 Trading simulation performance

6 Conclusion and Future Work
6.1 Conclusion
6.2 Limitation and future work

List of Figures

2.1 Histogram of transaction price changes for Airgas stock
2.2 Analysis on the basis of publication year
2.3 Analysis based on prediction techniques (2019)
2.4 Analysis based on clustering techniques (2019)
2.5 Histogram of publication count in topics (2020)
2.6 Histogram of publication count in models (2020)
2.7 Topic-model heatmap (2020)
2.8 Neuron in human brain
2.9 Computer replication of neuron
2.10 Basic feedforward neural network
2.11 Convolutional Neural Network
2.12 An example of 2D convolution
2.13 Logistic sigmoid function
2.14 ReLU function
2.15 A residual block in ResNet
2.16 An example of Unfolding Computational Graph
2.17 Example of RNN architecture
2.18 Illustration of LSTM block
3.1 Performance of stocks midprice trends prediction of CNN
3.2 Performance of stocks midprice trends prediction of LSTM
3.3 Performance of DeepLOB for Nasdaq Nordic dataset
3.4 Performance of DeepLOB for London Stock Exchange dataset
4.1 Original midprice and processed midprice of averaged AMZN data
4.2 Original midprice and processed midprice of averaged AMD data
4.3 Original midprice and processed midprice of averaged AAPL data
4.4 Original midprice and processed midprice of averaged FB data
4.5 Original midprice and processed midprice of averaged TSLA data
4.6 Original midprice and processed midprice of averaged NVDA data
4.7 Original midprice and processed midprice of averaged MSFT data
4.8 Labeling using the first and last record
4.9 Labeling using averaged midprice with k = 1
4.10 Labeling using averaged midprice with k = 10
4.11 LSTM model, built with Keras
4.12 ResNet model, built with Keras and TensorFlow Hub
4.13 The first proposed model, built with Keras and TensorFlow Hub
4.14 The second proposed model, built with Keras and TensorFlow Hub
5.1 Accuracy of models in training
5.2 Kappa coefficient of models in training
5.3 Precision of models in training
5.4 Recall of models in training
5.5 F1-Score of models in training
5.6 Accuracy of models in evaluation
5.7 Kappa coefficient of models in evaluation
5.8 Precision of models in evaluation
5.9 Recall of models in evaluation
5.10 F1-Score of models in evaluation
5.11 The midprice of Apple Inc. stock in simulation
5.12 Cumulative returns of models in simulation

List of Tables

4.1 Percentage of labels using the first and last record of tensor
4.2 Percentage of labels using averaged midprice with k = 1
4.3 Percentage of labels using averaged midprice with k = 10
5.1 Performance of models after trained with seven datasets
5.2 Performance of models in the final evaluation
5.3 Performance of models in the simulation
5.4 Final Balance of models

List of Abbreviations

S&P500 Standard and Poor’s 500 Stock Index

AI Artificial Intelligence

ML Machine Learning

DL Deep Learning

LSTM Long Short Term Memory

CNN Convolutional Neural Network

LOB Limit order book

SEC Securities and Exchange Commission

WRDS Wharton Research Data Services

MLP Multilayer Perceptron

SVM Support Vector Machine

Chapter 1: Introduction

1.1 Introduction to research problem

The stock market is one of the most attractive topics today. The U.S. stock market ended 2020 at all-time highs despite a deadly pandemic. Global stocks (as measured by the MSCI World Index) climbed 14%. Global stocks have now posted two consecutive years of double-digit gains. In 2019, the MSCI World Index gained 24%, and U.S. stocks, as measured by the S&P500, added 28%.¹

Thanks to the quick development of computing and communications, electronic trading has become the main trading activity in place of traditional face-to-face trading, which also makes possible algorithmic trading supported by artificial intelligence (AI). Chi Nzelu, Head of Macro e-Trading, said, "Through automation, we can capture more data - a problem previously unsolvable by algorithms. Machine learning allows us to improve the quality of services in our trading ecosystem, which also should gradually improve over time." According to a survey conducted by J.P. Morgan in 2020², 71% of traders believe that AI and machine learning (ML) provide deep data analytics for their daily trading activity, whereas 58% of traders believe that AI and ML represent an opportunity to hone their trading decisions. Together with the growth of ML, especially deep learning (DL), algorithmic trading attracts more attention from researchers. The survey of recent applications of DL in the financial industry conducted by Ozbayoglu et al. [1] shows that price and price-trend prediction, along with algorithmic trading, attract the most interest from DL researchers. Also in [1], Ozbayoglu et al. claim that Long Short-Term Memory (LSTM) is the most used model thanks to its advantages in the financial time series research area. Meanwhile, Convolutional Neural Network (CNN) based models, which are well known in image processing, also gain popularity among researchers.

1 https://www.fidelity.com/learning-center/trading-investing/markets-sectors/2020-stock-market-report

2 https://www.jpmorgan.com/solutions/cib/markets/e-trading-2020

Problem statement. As we can see, due to an increasing number of daily traders, there is a demand for more accurate and efficient tools that are able to support daily traders in making better decisions and profits on stock markets. Therefore, our study is executed with the hope of providing some insights into the performance of DL models in algorithmic trading, i.e. price movement prediction. We hope that the work can be considered the first step toward constructing a useful DL tool for "intraday" traders³: an automatic trading bot. More specifically, this thesis concentrates on developing price trend predictors based on the "order book"⁴, which can be used as automatic decision makers in follow-up projects.

1.2 Objectives of the study

The thesis aims to develop machine learning models for predicting stock midprice movements based on high frequency limit order book (LOB) data and to simulate trading strategies using the proposed models. In particular, the thesis has the following specific objectives:

• Studying how the stock market works.

• Doing the engineering needed to preprocess the dataset.

• Applying machine learning techniques to forecast price trends during trading days of chosen U.S. stocks.

• Simulating trades and producing statistical reports on the outcome of the proposed models and strategies.

3 discussed in Section 2.2

4 discussed in Subsection 2.1.4

Because of the variety and diversity of the research area as well as the limited resources, within the scope of this thesis we consider the following restrictions:

• Ignoring the effects of other economic factors, such as "dark pools"⁵ and social network statements, that affect the market.

• Simulating simple trades in real time using quote data given by the Investors Exchange Cloud API⁶.

• Ignoring the effect of transaction costs when testing trading strategies.

We hope that the study can contribute to the literature of algorithmic trading using machine learning, provide more insights about DL algorithmic trading, and perhaps come up with models and strategies with better performance. Our expectations are to process the data and construct models that perform acceptably in training and simulation. Moreover, the thesis can also be considered the first step toward developing a comprehensive automated trading bot in further research, which may eventually be deployed in the real market.

5 discussed in Subsection 2.1.4

6 https://iexcloud.io/

1.3 Structure of the thesis

Based on the objectives that we have discussed in Section 1.2, the thesis is organized as follows:

Chapter 1: Introduction. We introduce the research problem as well as the objectives, scope, and structure of the thesis.

Chapter 2: Background. In this chapter, domain knowledge about finance and stocks is presented. Chapter 2 also includes the machine learning background relevant to the study.

Chapter 3: Related Work. We discuss the state of machine learning applications in the financial industry and the methodologies proposed by previous researchers for solving the research problem.

Chapter 4: Data, Models and Trading Strategies. This chapter showcases our data and the way we process it. The models and the training phase are also presented, followed by our trading strategies.

Chapter 5: Result. The performance of the models and the results of the simulated trades are demonstrated in detail.

Chapter 6: Conclusion and Future Work. We summarize the thesis, evaluate what we have achieved and what we have not, as well as some plans for future work on the research problem.

Chapter 2: Background

2.1 Basic concepts of stock

2.1.1 Stock definition

Stocks represent ownership in a company. By owning shares or stocks, investors own a piece of a company. The value of a stock increases when the company operates well; conversely, the stock may decrease in value when the company does not do well. Some companies may pay a dividend to the owners of stocks. People buy stocks for various reasons, such as capital appreciation¹, dividend payments, or the ability to vote and influence the company. Companies issue stock when they need money, for maintenance and development purposes, or to pay off debt [2]. Without stocks, companies may struggle to collect such a large amount of money from individual investors.

2.1.2 Stock markets

The stock market refers to the collection of venues where regular buying and selling activities happen between investors, as well as the issuance of shares of publicly held companies. Though it is called a stock market or equity market, other securities, like exchange-traded funds (ETFs), bonds, and gold, are also traded in the stock market.

1 occurs when a stock price rises.

Stock markets provide a secure and regulated environment where traders, companies, and organizations can safely take financial actions like trading. The stock markets have two missions, known as the "primary markets" and "secondary markets", which both follow the rules defined by the regulator.

The first task is that the stock market allows companies to hold an initial public offering (IPO), which refers to issuing and selling parts of itself (shares) to the public for fund-raising purposes. The second task is to provide a trading platform that allows transactions of the listed shares. For every transaction, traders, whether individuals or organizations, have to pay the stock market a fee, called a transaction fee.

Long-term investors and short-term traders are not the only two roles taking part in stock markets. Brokers, portfolio managers, investment banks, and market makers also contribute to the operation of a stock market [2].

2.1.3 Stock orders and types of orders

According to the U.S. SEC [2], market orders, limit orders, and stop-loss orders are among the most popular types of orders used in stock markets.²

Market Orders are the most common ones in trading. Market orders allow one to buy or sell immediately at the current price, which means buying a stock at or near the posted ask price, or selling a stock at or near the posted bid price. The last traded price is not necessarily the price at which a market order will be executed. Market orders mostly suit investors who want to issue transactions without any delay, even though the price is not guaranteed.

Limit Orders, which are sometimes referred to as pending orders, allow investors to guarantee the price at which the transaction, buy or sell, is executed. Limit orders determine the level the price must reach for the order to be filled. If the required level is not met, the limit order waits until it is fulfilled or canceled by the investor. Limit orders help traders acquire the best price possible, in exchange for giving up immediate execution.

Stop-Loss Orders, which are also referred to as stop orders, are orders to trade once the stock price reaches a specified milestone, known as the stop price. Different from a limit order, a stop order becomes a market order when the stop price is activated [2].

Other special orders which may be allowed by brokerage firms are Day Orders, Good-Till-Cancelled Orders, etc. However, in this thesis, we only care about Limit Orders, which form the limit order book. The following subsection discusses the limit order book (LOB).

2 https://www.investor.gov/introduction-investing/investing-basics/how-stock-markets-work/types-orders

2.1.4 Limit order book

An order book is dynamic, meaning it is constantly updated in real time throughout the day. Orders that specify execution only at market open or market close are maintained separately, known as the "opening order book" and the "closing order book" respectively.

There are typically three parts to an order book, i.e. buy orders, sell orders, and order history:

• Buy orders contain buyer information, including all the bids, the amount they wish to purchase, and the ask price.

• Sell orders are similar to buy orders.

• Market order histories show all the transactions that have taken place in the past.

3 https://www.investopedia.com/terms/o/order-book.asp

Although the order book is meant to provide transparency to market participants, some details are not included in the list, such as the "dark pool", a privately organized financial forum or exchange for trading securities that allows investors to trade without exposure until after the trade has been executed and reported, and gives certain investors the opportunity to place large orders and make trades without publicly revealing their intentions during the search for a buyer or seller. Because of the difficulty of including them in the modeling or simulation, these factors are considered non-existent in this thesis.

Local spatial structure. We also want to point out a feature of the LOB known as spatial structure. This property plays an important part in our model selection in Chapter 4. Spatial structure, or spatial dependence, refers to a structure whose linearly arranged members are interconnected. Said differently, records that lie in the same region are relevant to each other, and therefore each combination of local patterns results in a different effect on the outcome.

The advantage of neural networks, particularly CNNs, is the ability to exploit the spatial structure of the data. Sirignano provided some statistical evidence for local spatial structure in limit order books [3]. Evidence that the conditional movement of the future price depends only locally on the current limit order book state was given in [3], where the reported coefficients were fitted on the stock Amazon. For example, the larger the ask size at the current level, the less likely the future best ask price will reach a greater level. To strengthen the statement, Sirignano also conducted a detailed analysis across 489 stocks primarily drawn from the S&P500 and NASDAQ-100. The result supports the claim that the limit order book has a local spatial structure.
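To make the idea of local spatial structure more concrete, the short sketch below (not taken from the thesis; the prices, sizes, and the choice of three levels are illustrative assumptions) arranges a few LOB snapshots into an array whose first axis is time and whose second axis is the price level, which is exactly the kind of locality a convolutional model can exploit.

```python
import numpy as np

# Hypothetical LOB snapshots: for each time step we keep the best 3 bid/ask
# levels, each described by (bid_price, bid_size, ask_price, ask_size).
lob = np.array([
    [[100.1, 200, 100.2, 150], [100.0, 300, 100.3, 120], [99.9, 250, 100.4, 400]],
    [[100.1, 180, 100.2, 170], [100.0, 310, 100.3, 100], [99.9, 260, 100.4, 390]],
])
print(lob.shape)                              # (2, 3, 4): time x price levels x features

# Midprice derived from the best bid and best ask of each snapshot.
midprice = (lob[:, 0, 0] + lob[:, 0, 2]) / 2
print(midprice)
```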

from these price changes or make use of the spread between the bid price and the ask price. The number of trades made by this type of trader is usually high. Closing positions in the market is often a part of their strategies [4]. Another definition is provided by the U.S. Securities and Exchange Commission⁴ (SEC). Day traders rapidly buy, sell, and short stocks throughout the day in the hope that the stocks continue climbing or falling in value for the seconds or minutes they hold the shares, allowing them to lock in quick profits. Day trading is also mentioned by the U.S. SEC as an extremely risky strategy that can result in substantial financial losses in a very short period.

Short-term traders usually use 1-, 5-, 15-, 30- and 60-minute intervals to operate their plans. Traders use numerous intraday strategies, including:

• Scalping: one attempts to make small profits on small price changes.

• Range trading: one uses support and resistance levels to determine buy and sell decisions.

5 An inefficient market is one in which an asset's prices do not reflect its true value due to reasons like information asymmetries, human emotion, etc.

2.3 High frequency data characteristics

Modern financial datasets may contain millions of transactions or posted quotes in a single day, time-stamped to the nearest second or even millisecond. It is important for us to study the characteristics of such high-frequency data. According to Russell et al. [5], factors like irregular temporal spacing, diurnal patterns, and price discreteness may increase the complexity of the analysis of these data.

2.3.1 Irregular temporal spacing

Also from Russell et al. [5], we know that all transaction data are inherently irregularly spaced in time. The irregular spacing of the data can be interpreted as follows: some transactions appear to occur only seconds apart, while others, e.g. between 10:30 and 11:00, may be 5 or 10 minutes apart. Said differently, the time between two consecutive trades can vary.

Because most econometric⁶ models work with fixed intervals, this poses an immediate challenge. The time intervals over which the data will be analyzed must be decided.
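As a minimal illustration of this interval choice (a sketch with pandas, not the thesis's actual pipeline; the column names and the 1-second bar size are assumptions), irregularly spaced trades can be averaged onto a fixed time grid:

```python
import pandas as pd

# Hypothetical tick data: irregularly spaced trades within a few seconds.
trades = pd.DataFrame(
    {"price": [100.10, 100.12, 100.11, 100.15],
     "size":  [200,    50,     75,     120]},
    index=pd.to_datetime([
        "2021-03-01 10:30:00.120", "2021-03-01 10:30:00.870",
        "2021-03-01 10:30:02.430", "2021-03-01 10:30:05.010"]),
)

# Average the trades inside each 1-second bucket; empty buckets become NaN
# and are forward-filled so that every interval carries a value.
bars = trades.resample("1s").mean().ffill()
print(bars)
```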

2.3.2 Discreteness

All economic data are discrete [5]. Moreover, the variance of the long-run price process is usually large relative to the size of the minimum price movement. Meanwhile, price changes take only a handful of values, called ticks, which can be observed in many data sets. For example, over a year a stock price can increase by hundreds of dollars, even though the prices of two consecutive trades differ by only one or two ticks.

As an example, Fig. 2.1, drawn by Russell et al. [5], presents a histogram of Airgas transaction price changes after deleting the overnight and opening transactions. Fifty-two percent of the transactions have the same price as the previous ones. Over 70% of the transaction prices have a difference of zero (no change), up one tick, or down one tick. Because the bid and ask prices also increase or decrease by a number of ticks, the midprice owns the same property. This discreteness will have an impact on measuring volatility, dependence, or any characteristic of prices that is small relative to the tick size [5]. To sum up, stock prices are discrete, and the difference between the prices of two consecutive trades is relatively small compared to the variance of long-term price changes.

6 Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships.

Figure 2.1: Histogram of transaction price changes for Airgas stock (source [5]).

2.3.3 Diurnal patterns

It is obvious that all trading occurs in the daytime, resulting in very strong diurnal, or periodic, patterns. Russell et al. also discussed in [5] that volatility, volume, spreads, and the frequency of trades all follow the same U-shaped pattern, highest near the open and the close and lowest around the middle of the day. This is the result of the working hours of the stock market. Meanwhile, the time between trades tends to be shortest near the open and also prior to the close. Also in [5], they showed that the duration and standard deviation of mid-quote price changes, a.k.a. midprice changes, are also U-shaped. This can be interpreted as follows: the midprice changes are usually large at the beginning and the end of the trading day, and relatively small in the middle of the day.
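A quick way to check for such a diurnal pattern on one's own data is sketched below; it assumes a DataFrame indexed by timestamps with a `midprice` column (both names are assumptions, and the code is not from the thesis).

```python
import pandas as pd

def diurnal_profile(quotes: pd.DataFrame) -> pd.Series:
    """Average absolute midprice change for each minute of the trading day."""
    abs_change = quotes["midprice"].diff().abs()
    minute_of_day = quotes.index.hour * 60 + quotes.index.minute
    return abs_change.groupby(minute_of_day).mean()

# profile = diurnal_profile(quotes)  # expected to be roughly U-shaped over the day
```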

2.4 Machine learning

Machine learning is an application of artificial intelligence (AI). Within AI, machine learning has emerged as the method of choice for developing practical software for computer vision, speech recognition, natural language processing, and other applications. The effect of machine learning has also been felt broadly across computer science and across a range of industries concerned with data-intensive issues (Jordan et al. [6]).

A machine learning algorithm is a computational process that uses input data to achieve a desired task without being literally programmed. These algorithms automatically alter their parameters through a training process, in which samples of input data are provided along with the desired outcomes. The algorithm then modifies itself so that it not only achieves the desired outcomes when presented with the training inputs but can also produce the desired outcomes from unseen data (Naqa and Murphy [7]).

Machine learning in stock market prediction. In the last few years, AI has made many advances that enabled the creation of applications for finance professionals that could arguably disrupt the finance industry. Many applications already implement AI in different areas of finance, such as anomaly detection, portfolio management, credit evaluation, text mining, or algorithmic trading [8].

Stock market prediction has already become one of the most attractive areas. Therefore, several data mining and knowledge discovery techniques have been employed for analyzing market trends. Gandhmal et al. [9] constructed a detailed analysis and review of stock market prediction techniques, as discussed below.

Having surveyed 50 papers suggesting methodologies for stock market prediction, Gandhmal et al. [9] provided some insights into that research. Fig. 2.2 illustrates the number of research papers published in the years from 2010 to 2018. The proportion of each prediction technique is depicted in Fig. 2.3, and the percentage of each clustering technique in Fig. 2.4.


Figure 2.2: Analysis on the basis of publication year (source [9]).

Figure 2.3: Analysis based on prediction techniques (2019) (source [9]).

2.4.1 Deep learning

Deep Learning is a machine learning technique that constructs artificial neural networks to mimic the structure and function of the human brain. In practice, DL uses a large number of hidden layers, typically at least six, of non-linear processing to extract features from data and transform the data into different levels of abstraction. Wang et al. [10] conducted a survey about recent advances in DL, categorized into four groups, i.e. deep architectures and CNN, incremental learning, RNN, and generative models. Published in 2020, the paper provides readers with the current state of DL and reinforces the future of DL.

Figure 2.4: Analysis based on clustering techniques (2019) (source [9]).

Deep learning in finance. Stock market forecasting, algorithmic trading, credit risk assessment, portfolio allocation, asset pricing, and the derivatives market are among the areas where ML researchers have focused on developing models that can provide real-time working solutions for the financial industry [1]. DL in finance attracts more and more interest from researchers every year.

A glance at Fig. 2.5 shows us that financial text mining and algorithmic trading are the top two fields that researchers worked on most, followed by risk assessment, sentiment analysis, portfolio management, and fraud detection, respectively. The figure indicates that most of the papers were published within the past four years (2017-2020), which shows that the domain is very well received and actively studied.

Figure 2.5: Histogram of publication count in topics (2020) (source [1]).

In Fig. 2.6, we observe the dominance of RNN, DMLP, and CNN over the remaining models, which might be expected, since these models are the most commonly preferred and supported ones among DL techniques. Meanwhile, RNN is a relatively general model which possesses several variants, including LSTM, GRU, etc. Within the RNN choice, most of the proposed models belonged to LSTM, which is very popular in time series forecasting or regression problems thanks to its advantages compared to other models. It is also used quite often in algorithmic trading. More than 70% of the RNN papers consisted of LSTM models.

Furthermore, Fig. 2.7 gives the distribution of the models over the research areas through a model-topic heatmap. According to Ozbayoglu et al. [1], the number of models is larger than the number of papers because most of the papers contributed multiple DL models. The picture indicates the broad acceptance of RNN, DMLP, and CNN models in almost all financial application areas.

The survey comprehensively conducted by Ozbayoglu et al. [1] helps to form the way we approach the research problem, a small part of the algorithmic trading area, and also guides us in the model selection process.

Figure 2.6: Histogram of publication count in model types (2020) (source [1]).

2.4.2 Feedforward Neural Networks

The concept of feedforward neural networks originates from the artificial neural network, which is mainly formed by artificial neurons, or perceptrons.

Artificial neuron. The neural network mimics the function of the human brain, in which the basic calculation unit is the neuron. Fig. 2.8 shows details about neurons in the human brain, whereas Fig. 2.9 shows a computer replication. Impulses from other neurons go through the dendrites and return as output signals through the neuron's axon terminals. The dendrites carry signals to the cell body, where the sum over all of these signals is calculated. The result is then compared to a certain pre-defined threshold to decide whether the neuron can be activated, which means it sends an electrical signal to following neurons through its axon.

Following this concept, an artificial neuron takes input signals x, which are multiplied by weights w, and all of them are summed, possibly with a bias b. A trigger function f acting as a threshold then decides whether the neuron will be activated. The core idea of an artificial neuron is that the weights can be learned and thereby control the strength of the input values from the previous neurons [11].

Figure 2.7: Topic-model heatmap (2020) (source [1]).

Figure 2.8: Neuron in human brain (source [12]).


Figure 2.9: Computer replication of neuron (source [12]).
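The computation described above is just a weighted sum passed through a threshold-like trigger function; the minimal sketch below illustrates it with made-up weights, bias, and a step activation (none of these values come from the thesis).

```python
import numpy as np

def neuron(x, w, b, f):
    """One artificial neuron: activation applied to the weighted sum plus bias."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])     # input signals
w = np.array([0.8,  0.2, 0.1])     # learned weights
b = -0.3                           # bias
step = lambda z: float(z > 0)      # simple threshold trigger function
print(neuron(x, w, b, step))       # 1.0, since the weighted sum exceeds the threshold
```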

Feedforward Neural Networks. Feedforward neural networks (FNNs), also referred to as deep feedforward networks or Multilayer Perceptrons (MLPs), are the quintessential deep learning models. The goal of an FNN is to approximate some function f*; e.g. for a classifier, y* = f*(x) maps an input x to a category y. An FNN defines a mapping y = f(x; θ) and learns the values of the parameters θ = (w, b) that result in the best function approximation [13], where f is the approximation of f*.

These models are called feedforward since information flows through the function being evaluated from the input x, through the intermediate computations that define f, and finally to the output y, without being fed back into the model. Networks that are extended to include such feedback connections are called recurrent neural networks.

Fig. 2.10 demonstrates an example of an FNN with three layers. The input layer, drawn without incoming arrows, represents the input of the network. The final layer is called the output layer. The layer in the middle is called the hidden layer since, during the training phase, the behavior of this layer is decided by the learning algorithm rather than prescribed by the training data, i.e. the training data does not specify the desired output of this layer [13].
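Since the models in Chapter 4 are built with Keras, a three-layer feedforward network like the one in Fig. 2.10 can be sketched as follows; the layer sizes and the three-class output are illustrative assumptions, not the thesis's configuration.

```python
import tensorflow as tf

# Input layer -> one hidden layer -> output layer, i.e. y = f(x; theta).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(40,)),                      # input features x
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),   # output: up / stationary / down
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Training such a network amounts to learning θ = (w, b) for every Dense layer by minimizing the loss on labeled examples.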


Figure 2.10: Basic feedforward neural network (source [11]).

2.4.3 Convolutional Neural Networks

Convolutional neural networks (CNNs), invented by LeCun et al. [14], are feedforward neural networks that can exploit local spatial structures in the input data. Flattening high-dimensional time series, such as limit order book depth histories, would require a very large number of weights in a feedforward architecture. CNNs, shown in Fig. 2.11, attempt to reduce the network size by exploiting data locality. These properties are the reasons for us to choose CNNs in Section 4.2. This section discusses some basic knowledge about the architecture of CNNs and the model which we intend to use in the thesis.

Figure 2.11: Convolutional Neural Network.

Convolutional layer. The convolutional layer is considered the most important component in the design of CNNs. Convolutional layers include a set of filters, a.k.a. kernels, that can be learned through back-propagation. In DL models used for image processing, these filters are usually small in length and width, which is motivated by sparse interactions. For example, we can detect small, meaningful features such as edges with kernels that occupy only tens or hundreds of pixels, which both reduces the memory requirements and improves the statistical efficiency of the model [13]. Fig. 2.12 demonstrates a convolution operation between an input x of shape (3 × 4) and a kernel k of shape (2 × 2).

Three typical hyperparameters of a convolutional layer are the number of kernels, which determines the depth of the output, the stride, and the zero-padding. The output shape of the layer is then decided by these parameters [11]:

• The number of kernels K. The number of kernels determines the depth of the output. Each of these filters is responsible for detecting different features of the input.

• Stride S. The stride is the amount by which the kernel moves at each step.

• Zero-padding P. Zero-padding determines the number of zero borders to be padded around the original border of the input.

The size of the output along each spatial dimension is then calculated by

(Input size − Kernel size + 2 × P) / S + 1.

For example, in Fig. 2.12, the input size is (3 × 4), the kernel size is (2 × 2), the stride is S = 1, and the zero-padding is P = 0. The output therefore has shape (2 × 3), since (3 − 2 + 0)/1 + 1 = 2 and (4 − 2 + 0)/1 + 1 = 3.

Figure 2.12: An example of 2D convolution (source [13]).
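The output-size formula can be checked directly in Keras; the snippet below (an illustration, not thesis code) reproduces the (3 × 4) input and (2 × 2) kernel of Fig. 2.12.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 3, 4, 1).astype("float32")   # one (3 x 4) input with a single channel
conv = tf.keras.layers.Conv2D(filters=1, kernel_size=(2, 2), strides=1, padding="valid")
y = conv(x)
print(y.shape)  # (1, 2, 3, 1): (3 - 2 + 0)/1 + 1 = 2 and (4 - 2 + 0)/1 + 1 = 3
```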

Activation function. In Fig. 2.11, we can see that at the end of the convolutional layer there is a rectified linear unit (ReLU) function, which is an activation, a.k.a. non-linearity, function. Many non-linear functions can be used with artificial neurons. The two most commonly used activation functions are the logistic sigmoid and ReLU.


• Logistic sigmoid. The logistic sigmoid can be calculated as

σ(x) = 1 / (1 + e^(−x)).

This function takes a real number and returns a result in the range [0, 1]. There was a time when the logistic sigmoid was the default choice of activation function. However, it has fallen out of favor due to its major drawbacks: it saturates easily and is not zero-centered [11], as shown in Fig. 2.13.

Figure 2.13: Logistic sigmoid function (source [11]).

• Rectified linear unit. The ReLU function, shown in Fig. 2.14, has been replacing the "old-fashioned" logistic sigmoid to become the most widely used activation in recent years. The formula for ReLU is

f(x) = max(0, x).

The ReLU function has some advantages over the logistic sigmoid function:

- It significantly accelerates convergence when using the stochastic gradient descent algorithm, compared to the logistic sigmoid function [15].

- It is simple and easy to implement.


Figure 2.14: ReLU function (source [11]).
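For reference, both activation functions just discussed can be written in a few lines of NumPy (a standalone illustration, independent of the thesis code):

```python
import numpy as np

def sigmoid(x):
    # logistic sigmoid: squashes any real number into (0, 1), saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # rectified linear unit: f(x) = max(0, x)
    return np.maximum(0.0, x)

z = np.array([-3.0, 0.0, 3.0])
print(sigmoid(z))   # approximately [0.047 0.5 0.953]
print(relu(z))      # [0. 0. 3.]
```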

Pooling layer. The purpose of the pooling layer is to reduce the spatial size of the representation in order to reduce the number of parameters and calculations in the network, thereby helping to control overfitting. In most cases, the pooling layer sits between two consecutive convolutional layers. The pooling layer applies a pooling function, such as max, average, or sum, independently on every depth slice of the input, reducing the data in length and width while the depth is preserved [11].

Pooling helps to make the representation approximately invariant to small changes of the input, which means the values of most of the pooled outputs do not change if we translate the input by a small amount. Invariance to local translation can be a useful property if we care more about whether some feature is present than exactly where it is [13].

Deep Residual Learning. Since AlexNet [15] was introduced, state-of-the-art CNN architectures have been going deeper and deeper. The descendants of AlexNet, VGGNet and GoogleNet, had 19 and 22 layers respectively [16], compared to the 5 layers of AlexNet.

However, the vanishing gradient causes challenging problems when training deep neural networks: as the gradient is back-propagated to earlier layers, repeated multiplication may make the gradient vanishingly small. As a result, when the network goes deeper, its performance plateaus or even starts degrading rapidly, which is called the degradation problem [17]. Deep Residual Learning (ResNet) was first introduced by He et al. [17] in order to address the degradation problem. The core idea of ResNet is introducing a so-called "identity shortcut connection" that skips one or more layers, as shown in Fig. 2.15.

Figure 2.15: A residual block in ResNet (source [17]).

Because of its compelling results, ResNet quickly became one of the most popular architectures in various computer vision tasks. Inspired by its success in image classification, we propose ResNet as a model for forecasting financial signals, which is likewise a classification problem.
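A residual block of the kind shown in Fig. 2.15 can be sketched in Keras as below; this is a generic illustration of the identity shortcut connection, not the exact block used in ResNet50 or in the thesis, and the filter count and input shape are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """y = F(x) + x: two conv layers plus an identity shortcut connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])          # the skip connection
    return layers.Activation("relu")(y)

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, filters=16)
model = tf.keras.Model(inputs, outputs)
```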

ResNet50. ResNet50 is a variant of ResNet with 50 convolutional layers. According to He et al. [17], the ResNet50 architecture contains the following elements:

• A convolutional layer with 64 filters of size 7 × 7 and a stride of 2, giving us 1 layer.

• A max pooling layer, also with a stride of 2.

• The following three layers repeated 3 times in total, giving us 9 layers:
1. 64 filters of size 1 × 1
2. 64 filters of size 3 × 3
3. 256 filters of size 1 × 1

• A combination of 3 layers repeated 4 times, which gives us 12 layers:
1. 128 filters of size 1 × 1
2. 128 filters of size 3 × 3
3. 512 filters of size 1 × 1

• This step contains 3 layers, repeated 6 times, giving us a total of 18 layers:
1. 256 filters of size 1 × 1
2. 256 filters of size 3 × 3
3. 1024 filters of size 1 × 1

• Three layers repeated 3 times, resulting in 9 layers:
1. 512 filters of size 1 × 1
2. 512 filters of size 3 × 3
3. 2048 filters of size 1 × 1

• An average pooling layer followed by a fully-connected layer containing 1000 neurons with a softmax function, giving us the last layer.
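In practice the 50-layer stack above rarely needs to be re-implemented by hand: Keras ships a prebuilt ResNet50 (the thesis figures mention TensorFlow Hub instead, so the `keras.applications` variant below, the input shape, and the 3-class head are all assumptions for illustration).

```python
import tensorflow as tf

# Pretrained ResNet50 backbone without the final 1000-way classifier;
# a small head is attached for a hypothetical 3-class trend prediction task.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                           input_shape=(224, 224, 3), pooling="avg")
outputs = tf.keras.layers.Dense(3, activation="softmax")(backbone.output)
model = tf.keras.Model(backbone.input, outputs)
```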

2.4.4 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) were first introduced by Rumelhart et al. [18] in 1986. RNNs are a family of neural networks for processing sequential data x^(1), ..., x^(τ), and they can scale to much longer sequences than would be practical for networks without sequence-based specialization [13]. The main idea of RNNs derives from early ideas found in machine learning and statistical models, namely sharing parameters across different parts of a model [13]. In this section, we only present RNNs briefly enough to use them in the later chapters.


Unfolding computational graph. A computational graph is a way to formalize the structure of a set of computations [13]. Unfolding an equation means repeatedly applying its definition. For example, consider a system

s^(t) = f(s^(t−1); θ),     (2.1)

where s^(t) represents the state at time t, and the function f maps the state at time t to the state at time t + 1. The same parameters θ of f are used for all time steps.

Figure 2.16: An example of Unfolding Computational Graph (source [13]).

If we add more information at time t, such as an input x^(t), into the function f in Equation 2.1, the new state now contains information about the whole past sequence:

s^(t) = f(s^(t−1), x^(t); θ),     (2.4)

where s^(t) denotes the hidden state at time t, calculated from the input x^(t). According to Goodfellow et al. [13], the unfolding technique has two major advantages:

• The input size is not affected by the sequence length, as the model concentrates on the transition from one state to the following state, rather than on the number of states.

• The transition function f in Equation 2.4, with the same parameters, can be used at every time step.
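Parameter sharing across time steps can be made concrete with a short loop: the same transition function f, with the same parameter matrices, is applied at every step (the tanh transition and the matrix sizes below are illustrative choices, not the thesis's model).

```python
import numpy as np

def f(s_prev, x_t, theta):
    """One recurrence step s(t) = f(s(t-1), x(t); theta), shared across all t."""
    W, U = theta
    return np.tanh(W @ s_prev + U @ x_t)

rng = np.random.default_rng(0)
theta = (rng.normal(size=(4, 4)), rng.normal(size=(4, 2)))   # same parameters every step
s = np.zeros(4)
for x_t in rng.normal(size=(10, 2)):     # a sequence of 10 two-dimensional inputs
    s = f(s, x_t, theta)                 # the state summarizes the whole past sequence
print(s)
```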

Recurrent Neural Network. Inspired by the unfolding computational graph, Fig. 2.17 demonstrates a typical example of an RNN, in which an output is produced at each time step and the hidden units have recurrent connections between themselves [13].

Figure 2.17: Example of RNN architecture (source [13]).

In Fig. 2.17, the hidden state at time t, previously denoted by s^(t), is now denoted by h^(t); it maps the input x^(t) to an output o^(t). The prediction is computed as ŷ^(t) = softmax(o^(t)), and the loss L^(t) is then computed to measure the difference between the desired output y^(t) and the prediction ŷ^(t).

Three weight matrices, W, U, and V, together make up the RNN parameters.
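Finally, since Fig. 4.11 describes an LSTM model built with Keras, a minimal sketch of such a predictor is given below; the window length, the number of LOB features, the layer width, and the three trend classes are placeholders rather than the thesis's actual hyperparameters.

```python
import tensorflow as tf

# Input: a window of T time steps, each with F limit-order-book features.
T, F = 300, 40
model = tf.keras.Sequential([
    tf.keras.Input(shape=(T, F)),
    tf.keras.layers.LSTM(64),                        # recurrent layer with hidden state h(t)
    tf.keras.layers.Dense(3, activation="softmax"),  # up / stationary / down
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```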
