
VIETNAM NATIONAL UNIVERSITY
UNIVERSITY OF ECONOMICS & BUSINESS

FACULTY OF FINANCE - BANKING

GRADUATION THESIS

DEEP LEARNING FOR PREDICTING STOCK MARKET INDEX

Instructor: Dr Nguyen Tien Chuong

Student: Nguyen Quang Bac

Student ID: 20050408

Class: QH2020E TCNH CLC 3

Course code: FIB4151

Ha Noi, 2023


ACKNOWLEDGMENTS

During the time of researching and carrying out this study, I received very enthusiastic help and valuable words of encouragement. With all respect and gratitude, I would like to express my sincere thanks to: the Board of Directors of the University of Economics and Business - Vietnam National University, which has built a learning environment that gives me, as well as students across the whole university, motivation and opportunities to access and practice scientific research. I sincerely thank the teachers and experts of the Faculty of Finance and Banking and other faculties in the university for taking the time to answer and analyze my questions, contributing to a foundation that helped me confidently carry out this research. I would like to express my deep gratitude to Dr Nguyen Tien Chuong, the person who directly guided me. He is a dedicated and enthusiastic teacher, offering new directions, detailed communication, and frank suggestions, and he was always conscious of showing me the greatest value I would receive after completing my research. I feel very fortunate to have received his support.

TABLE OF CONTENTS

PART 1: INTRODUCTION
1.1 The relevance of the research topic
1.2 Research objectives
1.3 Research question
1.4 Research scope
1.5 Materials and methods
1.6 Expected research contributions
1.7 Research structure

PART 2: THEORETICAL BASIS AND LITERATURE REVIEW
2.1 Theoretical basis
2.1.1 Overview of AI
2.1.2 Overview of deep learning
2.1.3 Overview of deep learning for predicting time series
2.2 A review of related previous studies
2.2.1 Review of research abroad
2.2.2 Review of research in Vietnam
2.3 Research gap

PART 3: RESEARCH METHODOLOGY
3.1 …
3.1.1 Long short-term memory (LSTM)
3.1.2 Gated recurrent unit (GRU)
3.1.3 Convolutional neural networks (CNN)
3.2 Materials and methods
3.3 Data set and …
3.3.1 …
3.3.2 Data denoising, normalization and input preparation
3.3.3 Feature selection strategy
3.4 Hyperparameter …
3.5 Model performance metrics
3.6 Implementation process

PART 4: EXPERIMENT AND RESULTS
4.1 Single layer deep learning model results
4.2 Multilayer deep learning results
4.3 Comparison of single and multilayer deep learning models

PART 5: CONCLUSION
5.1 …
5.2 Implications
5.3 Directions for future studies

REFERENCES

LIST OF ACRONYMS

ABBREVIATION   EXPANSION
AI             Artificial intelligence
HAR            Human activity recognition
RNN            Recurrent neural network
LSTM           Long short-term memory
GRU            Gated recurrent unit
CNN            Convolutional neural network
ARIMAX         Autoregressive integrated moving average with exogenous inputs
VNINDEX        Vietnam stock market index
RMSE           Root Mean Square Error
MAPE           Mean Absolute Percentage Error

LIST OF TABLES

Table 4.1 List of the best hyperparameters for single-neurons LSTM, GRU, and CNN models
Table 4.2 The performance scores of the single layer deep learning models in the test data
Table 4.3 List of the best hyperparameters for multi-layer LSTM, GRU, and CNN models
Table 4.4 The performance scores of the best multi-layer deep learning models in the test data
Table 4.5 … of the model's performance


LIST OF FIGURES

Fig 3.1 Long short-term memory (LSTM) architecture
Fig 3.2 Gated Recurrent Unit (GRU) architecture
Fig 3.3 CNN architecture with m filters for multivariate time series prediction
Fig 3.4 Train, validation and test split dataset
Fig 3.5 Correlation heatmap among the variables
Fig 3.6 Schematic diagram of the proposed research framework
Fig 4.1 VNINDEX closing price along with moving averages
Fig 4.2 Average scores obtained from LSTM, GRU, and CNN models: (a) RMSE, (b) MAPE, and (c) R on test dataset
Fig 4.3 Boxplots of metrics: (a) RMSE, (b) MAPE, and (c) R of the single-neurons LSTM, GRU, and CNN models
Fig 4.4 True vs predicted plots of the single layer models: (a) LSTM, (b) GRU, and (c) CNN on test data
Fig 4.5 Time series plots of the true and predicted values obtained from the GRU model with 150 neurons
Fig 4.6 … best multi-layer LSTM, GRU, and CNN models
Fig 4.7 True vs predicted plots of the best multi-layer models: (a) LSTM, (b) GRU, and (c) CNN on test data
Fig 4.8 Time series plots of the true and predicted values obtained from the GRU model with (100, 50) neurons
Fig 4.9 … multilayer deep learning models

PART 1: INTRODUCTION

1.1 The relevance of the research topic

The stock market, also known as the equity market, exerts a significant influence on today's economy. According to the efficient market hypothesis, stock prices are not predictable, and their movements are essentially random. However, recent technical analysis has shown that most of the stock value is reflected in historical data; therefore, understanding trends is crucial for effective predictions (Akhter and Misir, 2005). Moreover, stock markets are subject to the influence of a wide array of economic factors, including political events, the overall economic climate, commodity price indices, investor expectations, movements in other stock markets, and investor psychology, among other factors (Miao et al., 2007). Different technical indicators are used to derive statistical data from stock prices (Lehoczky and Schervish, 2018). In general, stock market indices, derived from stock prices, are heavily influenced by investment activities in the market and often serve as indicators of a country's economic conditions. The nature of stock price fluctuations is uncertain and presents risks for investors. The rise or fall in share prices plays a crucial role in determining an investor's profit. Predicting stock prices has always been a challenging problem, primarily due to its inherent unpredictability in the long term (Asadi et al., 2012).

Forecasting stock prices is regarded as one of the most difficult tasks in financial forecasting due to the complex nature of the stock market. Prediction will continue to be an interesting area of research, making researchers in the field eager to improve existing predictive models. The reason is that institutions and individuals are thereby empowered to make investment decisions and to plan and develop effective strategies for their daily and future endeavors. The desire of many investors is to lay hold of any forecasting method that could guarantee easy profit and minimize investment risk in the stock market. This remains a motivating factor for researchers to evolve and develop new predictive models (G.S. Atsalakis, 2011).

The existing methods for stock price forecasting can be classified as follows: fundamental analysis, technical analysis, and time series forecasting. Fundamental analysis is a type of investment analysis in which the share value of a company is estimated by analyzing its sales, earnings, profits, and other economic factors. This method is best suited for long-term forecasting. Technical analysis uses the historical price of stocks to identify the future price. This method is suitable for short-term predictions. The third method is the analysis of time series data. It basically involves two classes of algorithms: linear models and non-linear models.

The existing forecasting methods make use of both linear (AR, MA, ARIMA) and non-linear algorithms (ARCH, GARCH, neural networks), but they focus on predicting the stock index movement or price forecasting for a single stock using the daily closing price. Here we do not fit the data to a specific model; rather, we identify the latent dynamics existing in the data using deep learning architectures. In this work we use three different deep learning architectures (LSTM, GRU, CNN) for the price prediction of the VNINDEX and compare their performance.
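Feeding a price series to any of these architectures starts with framing it as supervised input/output windows. A minimal stdlib sketch of that framing (illustrative only; the window length and forecast horizon here are hypothetical choices, not the thesis's settings):

```python
def make_windows(series, lookback, horizon=1):
    """Slice a univariate series into (input window, target) pairs.

    Each input is `lookback` consecutive observations; the target is
    the value `horizon` steps after the window ends.
    """
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback + horizon - 1])
    return X, y

# Toy closing-price series (not real VNINDEX data)
prices = [10, 11, 12, 13, 14, 15]
X, y = make_windows(prices, lookback=3)
# X = [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
# y = [13, 14, 15]
```

The same pairs can then be reshaped as required by a recurrent or convolutional model; only the framing step is shown here.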

1.2 Research objectives

The objective of this research is to construct a model for predicting the price of the VNINDEX using Deep Learning (DL), optimizing the model's parameters, and evaluating its performance in predicting stock price trends in the future. By employing Deep Learning, we hope that this study will contribute to enhancing predictive capabilities and supporting investment decisions in the stock market sector.

1.3 Research question


What is the best method to use for predicting the Vietnam stock market index?

1.4 Research scope

Scope of content: Vietnam's stock market index (VNINDEX)

Scope of time: From October 16, 2000, to October 6, 2023

1.5 Materials and methods

Since the research focuses on mathematical modeling, the methodology will primarily involve quantitative methods and machine learning model training.

1.6 Expected research contributions

The research topic will contribute to the study and development of prediction methods and models for the VNINDEX and stock prices using the Deep Learning model. By applying and analyzing the prediction results on real stock data, the research can provide valuable insights into the effectiveness and applicability of the Deep Learning model in predicting the VNINDEX and stock prices in the Vietnamese market.

The findings from this research can help investors better understand stock price predictions, enabling them to make more informed investment decisions. This, in turn, has the potential to optimize profits and reduce risks in the Vietnamese stock market.

Additionally, the results of this research can serve as a reference for future studies that utilize machine learning models to predict the stock market index and stock prices.

1.7 Research structure

The rest of the research includes the following content:

Part 2: An overview of the research field of stock price prediction and the foundational knowledge used in the research. Specifically, this part will introduce the fundamental concepts of AI as well as the deep learning model.

Part 3: In this part, I delve into the proposed model and describe in detail the features of the Deep Learning model for stock price prediction. Additionally, data preprocessing steps, model training, and model evaluation methods are discussed.


Part 4: Presentation of experiments and comparison of the test results for each model. Different dataset organization approaches are explored, leading to observations about the implemented models and the selection of an optimal model based on visualized results.

Part 5: Presents the conclusion and future work, followed by a list of references and an appendix.


PART 2: THEORETICAL BASIS AND LITERATURE REVIEW

2.1 Theoretical basis

2.1.1 Overview of AI

Artificial intelligence (AI), known as a kind of machine intelligence, generally means the intelligence showcased by man-made machines. Typically, AI is defined as a type of computer program that emulates human intelligence. General textbooks describe it as the study and creation of an intelligent agent capable of observing its environment and taking actions to fulfill tasks given by human instructions (Russell et al., 2003). Classical definitions posit that AI, as an intelligent system, possesses the ability to accurately interpret external data and use that information as knowledge to accomplish its missions, with adaptability demonstrated throughout the process (Russell et al., 2009).

The field of artificial intelligence research is intricate and demands specialized knowledge for comprehension. It encompasses a broad and diverse array of subfields that are deep but not necessarily interconnected. Key areas of focus include the development of reasoning, knowledge, planning, learning, communication, perception, manipulation of objects, tool usage, and control of machines, all with the aim of making machines similar to or even surpassing human capabilities. A wide range of tools employing AI technology are utilized, spanning from research and mathematical optimization to logical deduction. Algorithms based on bionics, cognitive psychology, probability theory, and economics are also being developed. As a result, artificial intelligence has the potential to evolve into an ultimately smart machine that may even surpass human capabilities. In recent times, artificial intelligence has garnered increasing attention, particularly in its practical applications. This includes computers embedded with AI functions and various types of robots capable of making decisions on commercial or political matters.

Artificial intelligence also refers to the concept of simulation. In this process, machines are designed to mimic human behaviors, including learning, comprehension, planning, and decision-making. The fundamental idea is to imbue computers with intelligence, enabling them to perform tasks akin to human cognition and thereby enhancing their overall performance. Artificial intelligence covers a large scope of subjects, including computer science and psychology, combining elements from both the natural and social sciences, which adds to its complexity. It is also hard to figure out the relationship between artificial intelligence and thinking science, which is a reflection of practice and theory.

From the standpoint of cognitive science, artificial intelligence (AI) has its limitations, particularly in terms of logical thinking and the ability to visualize or imagine. While the advancement of artificial intelligence can be spurred by creative thinking and inspiration, it cannot entirely forsake its foundation in mathematical logic. Mathematics is intricately connected with linguistics and the thinking process, playing a crucial role in AI development and hastening its evolution. Strong AI proponents suggest that it is possible to create intelligent machines that exhibit sentience, self-awareness, and the ability to reason and solve problems. In contrast, advocates of weak AI argue that achieving AI with human-like capabilities is highly improbable. Although AI technology may appear intelligent on the surface, the underlying systems still lack independent thinking, let alone self-consciousness.

Artificial intelligence, as a field of study, has been under development for many years, resulting in a multitude of theories, methodologies, techniques, and corresponding systems aimed at simulating human intelligence. The overarching goal of artificial intelligence is to enable machines to comprehend, reason, and learn like humans, essentially using computers or other agents to replicate human intelligence. One widely adopted approach to realizing artificial intelligence is training neural networks to analyze and interpret relevant information through learning processes (Nilsson, 1982). The neural network improves its algorithm by iteratively adjusting results based on pairs of input data (x) and target output data (y), striving to make its predictions progressively closer to the desired y values. A neural network trained with both x and y values is the supervised learning type, while a network trained without target y values is the unsupervised learning type, also known as the clustering setting (Anderson et al., 1984).

The field of AI involves processes of cognition, decision-making, and feedback. Human brains sometimes perform poorly at intricate scientific and engineering calculations, a task that modern computers can not only complete but also execute more effectively, rapidly, and accurately than human cognition. Consequently, contemporary society no longer views these processes as daunting challenges, thanks to the aid of artificial intelligence. AI applications encompass a diverse array of subjects, spanning from deep learning and computer vision to intelligent robotics, speech and natural language recognition, context analysis, linguistics, and gesture control (Wodecki et al., 2019).

2.1.2 Overview of deep learning

Deep learning, as an applied branch of artificial intelligence, holds significant importance in the market, attracting many companies and substantial investor interest. In deep learning, a network is constructed, and the initial connections are randomly initialized. Data is then input into the network, and the network processes these inputs and learns from them. If the actions align with the specified criteria, the network increases the weights; if not, it reduces them. This weight adjustment process takes place over hundreds of thousands of iterations, enabling the network to learn patterns that would be difficult for humans to specify by hand. The foundation of deep learning can be traced back to research on artificial neural networks, with a common deep learning architecture being the multilayer perceptron, which includes multiple hidden layers.
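The iterative weight adjustment described above is, in practice, gradient descent: each pass nudges the weights against the error gradient. A stdlib-only sketch for a single linear neuron (an illustrative toy; real frameworks automate this across many layers, and the data and learning rate here are made up):

```python
# Gradient descent on one linear neuron: prediction = w * x + b.
# Repeated small weight corrections shrink the mean squared error,
# mirroring the iterative adjustment process described above.
def train(xs, ys, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0  # initial connections (zeros for reproducibility)
    n = len(xs)
    for _ in range(epochs):
        # Gradients of mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w  # adjust weights against the error gradient
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1
w, b = train([0, 1, 2, 3], [1, 3, 5, 7])
# w ≈ 2, b ≈ 1 after training
```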

Deep learning is characterized by its ability to combine low-level features to create more abstract high-level representations of attribute categories or features, uncovering distributed feature representations within the data. In 2006, Hinton et al. introduced the concept of deep learning and developed an unsupervised training algorithm based on belief networks, known as Deep Belief Networks (DBNs), aimed at addressing optimization challenges related to deep structures. Another significant advancement in deep learning is the multilayer autoencoder, a new structural innovation. Besides, the first truly multilayer structure learning algorithm is the convolutional neural network, invented by LeCun et al. It leverages relative spatial relationships to reduce the number of parameters, thus enhancing training performance. The goal of deep learning is to emulate the analytical learning process of the human brain within machines. These machines are designed to process and generate data, including text, speech, and images, similar to how the human brain operates. Deep learning methods can be categorized into two broad types, supervised learning and unsupervised learning, much like machine learning approaches.

Various frameworks and structures have been developed for deep learning. Convolutional neural networks (CNNs) are a machine learning model under deep supervised learning. In contrast, deep belief networks (DBNs) are a machine learning model under unsupervised learning. Deep learning primarily depends on factors such as data, algorithms, and computational power. The proliferation of the internet has led to the accumulation of vast amounts of data, and the increasing capabilities of computers have provided more robust computing power. Furthermore, the optimization of some algorithms has become increasingly mature, which promotes the development of deep learning (Genesereth et al., 1987).

2.1.3 Overview of Deep learning for predicting time series

Deep Learning (DL), a particular type of machine learning algorithm with multilayer structures for processing higher-level abstractions from the input dataset (Goodfellow et al., 2016), is very well suited to very large datasets, as most of its layer computations can be implemented in parallel and distributed computing techniques can be applied easily. Deep Learning models have been shown to perform satisfactorily in many time series analysis tasks, such as forecasting. For example, Deep Learning has proven excellent in human activity recognition (HAR) tasks, where wearable sensors can connect people with the cyber-physical system through HAR (Zheng et al., 2018). Deep Learning has also been employed for tipping-point prediction, with performance better than traditional early warning systems (Dablander et al., 2022). Nevertheless, the reliability of these forecasting methods is not guaranteed (Livieris et al., 2020). Deep Learning methods may also face the overfitting problem. Common preprocessing methods like smoothing, transformation, and estimation can remove the noise in time series signals in advance and improve the overall performance of time series models. The performance of Deep Learning models may improve with preprocessing of time series inputs, usually on the condition that the data distribution at test time is similar to that at training time. In many real-world applications, the stationarity of the datasets may no longer hold. There are four basic components that can cause non-stationarity in a time series: trend, seasonal, cyclical, and irregular components (Mahmoud & Mohammed, 2021). The trend component refers to long-term increases or decreases over time, with examples like long-term population growth. The seasonal component refers to the existence of seasonality.
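The smoothing mentioned above as a common preprocessing step can be as simple as a trailing moving average that damps the irregular component before training. A stdlib-only sketch (the window length is a hypothetical choice, not the thesis's setting):

```python
def moving_average(series, window):
    """Trailing moving average: each output averages the last `window` points.

    Smooths short-term (irregular) fluctuations so that longer-term
    trend structure is easier for a model to learn.
    """
    out = []
    for i in range(window - 1, len(series)):
        out.append(sum(series[i - window + 1:i + 1]) / window)
    return out

noisy = [10, 12, 11, 13, 12, 14, 13, 15]
smooth = moving_average(noisy, window=3)
# smooth = [11.0, 12.0, 12.0, 13.0, 13.0, 14.0]
```

Note the trade-off: larger windows remove more noise but also lag the underlying trend, which matters for short-horizon forecasting.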

Deep Learning is capable of modeling the complex non-linear relationships among variables, while a traditional neural network needs to assume that all input vectors are independent, which may make it ineffective for sequential data prediction (Rajagukguk et al., 2020). Comparing DL with conventional time series methods (Mahmoud & Mohammed, 2021; Cai et al., 2019), it is found that Deep Learning models can give better representation and classification. Cai et al. (2019) investigated the day-ahead multi-step load prediction of commercial buildings with RNN and CNN against the autoregressive integrated moving average with exogenous inputs (ARIMAX), a popular traditional time series method for the time series modeling of load forecasting. The results show that the CNN approach with a direct multi-step procedure can outperform the seasonal ARIMAX, with a 22.6% improvement in prediction accuracy. This illustrates that the Deep Learning hierarchical structure may be better able to handle data-dependent uncertainty. It is also shown that long-term trends can be explored better when the moving averages method is deployed as a preprocessing tool to smooth short-term fluctuations. Mahmoud and Mohammed (2021) presented a survey of Deep Learning models, such as CNN, RNN, LSTM, GRU, deep autoencoders (AEs), restricted Boltzmann machines (RBMs), and deep belief networks (DBNs), in the time series forecasting of electricity load and price, solar power, and finance, with comparison results showing that DL performs better than classical methods.

2.2 A review of related previous studies

2.2.1 Review of research abroad

Since the birth of the stock market, the volatility of stocks' closing prices has been closely watched at all levels, and there have been several ways to study the rules of the stock market to predict its performance. In a business environment, we want to accurately and effectively predict multiple financial variables to make the right decisions and avoid huge losses. However, financial time series data analysis and prediction is extremely difficult, and the most complex task is to improve investment decisions, because the stock market is essentially a dynamic, nonlinear, non-stationary, non-parametric, noisy, chaotic system. The price trend is extremely complex and easily affected by economic variables, industry-specific variables, company-specific variables, investors' psychological variables, political variables, etc. Therefore, how to predict the stock market more accurately is a subject of wide attention from many scholars at home and abroad.

Around the world, there are many research works on stock price prediction. Most researchers focus on machine learning algorithms, with theoretical investment analysis techniques being the method to predict stock prices directly through the study of historical stock market data. Predicting the stock market is one of the important and challenging issues. One of the most employed techniques in this area is the Artificial Neural Network (ANN), introduced by Verma et al. (2017). ANNs are often susceptible to the issue of overfitting. As an alternative to mitigate this overfitting concern, Support Vector Machines (SVMs) can be employed (M. S. Babu et al., 2012). Usmani et al. (2016) aimed to forecast Karachi Stock Exchange (KSE) trends in their study, primarily focusing on daily closing values, using various machine learning algorithms. They applied traditional statistical models like ARIMA and SMA for price prediction, along with other machine learning models such as SLP (Single Layer Perceptron), MLP (Multilayer Perceptron), RBF (Radial Basis Function), and SVM (Support Vector Machine). Among these methods, the MLP algorithm outperformed the others. Kumar and Haider (2019) compared the performance of single-class classifiers with multi-class classifiers, which are a combination of machine learning techniques such as decision trees, support vector machines, and logistic regression classifiers. The test results showed that multi-class classifiers outperformed the others and led to a more accurate model, with a growth in accuracy of about 10 to 12%. Cervelló-Royo and Guijarro (2020) compared the performance of four machine learning models to assess the predictive capabilities of technical indicators in the NASDAQ technology index. The results showed that Random Forest outperformed the other models considered in their study and could predict market trends in the next 10 days with a typical accuracy of 80%.

Long et al. (2020) examined deep neural network models using public market data and trading profiles to evaluate stock price movements. Experimental results showed that bidirectional LSTM could predict stock prices for financial decisions and was the most effective forecasting method compared to other models. Pang et al. (2018) attempted to improve a creative neural network method to obtain better stock market predictions. They proposed an LSTM with an embedding layer and an LSTM with an autoencoder to assess stock market movements. The results showed that the deep LSTM with embedding layers performed better, and the models' accuracies for the Shanghai Composite Index were 57.2% and 56.9%, respectively. Kelotra and Pandey (2020) used a deep convolutional LSTM model as a prediction tool to test the effectiveness of stock market movements. The model was trained using the Rider-based King Fisher optimization algorithm, and they achieved a minimum MSE and RMSE of 7.2487 and 2.6923. Bouktif et al. (2020) investigated the ability to predict stock market trend directions with an improved sentiment analysis approach. Ultimately, the proposed method outperformed the others significantly, predicting stock market trends with an accuracy of over 60% compared to other sentiment-based deep learning prediction methods.
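Several of the error scores quoted in this review (MSE, RMSE, and later MAPE) follow their standard definitions. A stdlib sketch of how such metrics are computed, with toy values rather than any study's actual data:

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large deviations quadratically."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent; assumes no zero actuals."""
    n = len(actual)
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

actual = [100.0, 110.0, 120.0]
predicted = [102.0, 108.0, 123.0]
# rmse ≈ 2.38, mape ≈ 2.11
```

RMSE is scale-dependent (it inherits the units of the index), while MAPE is scale-free, which is why studies often report both.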

Zhong and Enke (2019) proposed a comprehensive dataset of the SPDR S&P 500 ETF to evaluate returns with 60 economic and financial characteristics. Deep neural networks and artificial neural networks (ANNs) were used, via Principal Component Analysis (PCA), to predict the future daily returns of the stock market index. The results showed that deep neural networks outperformed the other networks when classifying based on PCA-represented data. Das, Mishra, and Rout (2019) optimized features by considering the social and biochemical aspects of the firefly algorithm. The results indicated that the firefly algorithm, applied to the Online Sequential Extreme Learning Machine (OSELM) prediction method, produced the best model among the tested models. Hoseinzade and Haratizadeh (2019) proposed a Convolutional Neural Network (CNN) framework that could be applied to various data collections related to different markets to explore predictive features for future market movements. The results showed significant improvement in prediction performance compared to other recent baseline methods.

Chung and Shin (2020) applied one of the deep learning methods (CNN) to predict stock market trends. Additionally, the Genetic Algorithm (GA) was used to systematically optimize the parameters of the CNN method, and the results showed that GA-CNN outperformed the comparison models. Sim et al. (2019) proposed CNNs for stock price prediction as a new learning method. The research aimed to address two issues: using CNNs and optimizing them for stock market data. Wen et al. (2019) applied the CNN algorithm to noisy time series data using common patterns, as a new method. The results demonstrated the effectiveness and superiority of this method over traditional signal processing methods, with an improved accuracy ranging from 4% to 7%.

Chung and Shin (2018) used a combination of LSTM and GA to improve a new stock market prediction model. The final results showed that the combined LSTM-GA model outperformed the standard model. Chen et al. (2018) used a transductive neural network, an extreme learning machine, and three traditional artificial neural networks to evaluate their performance on high-frequency stock market data. Their results indicated that deep learning methods extracted information from non-linear features and could strongly predict the future market. Chong et al. (2017) attempted to test the performance of deep learning algorithms for stock market prediction with three unsupervised feature extraction methods: PCA, restricted Boltzmann machines, and autoencoders. The final results, with significant improvements, showed that additional information could be extracted by deep neural networks beyond the auto-regressive model.

Long et al. (2019) proposed an innovative end-to-end model called the multi-filters neural network, specifically designed for the task of predicting prices and extracting features from financial time series data. Their results showed that the network outperformed conventional machine learning methods, statistical models, and convolutional and recurrent neural networks (CNN and LSTM) in terms of accuracy, stability, and profitability. Moew et al. (2019) suggested using deep neural networks with stepwise linear regression for feature engineering, along with exponential smoothing, for this task, with regression gradients as an indicator of movement intensity over a specific time frame. The results demonstrated the feasibility of the proposed method, with high accuracy, and considered the statistical significance of the results to confirm the contributions, as well as the significant impacts on modern economics.

2.2.2 Review of research in Vietnam

Tran Trung Hieu et al. (2012) proposed a hybrid approach called GA-SVR (Genetic Algorithm - Support Vector Regression) for predicting stock prices in Vietnam. In this hybrid method, GA simultaneously performs two tasks: optimizing parameters for SVR and selecting input features. The optimal parameters and selected input features are then used to train the SVR. The test results showed that the proposed method outperformed SVR and ANN and had practical applicability in the Vietnamese stock market, which is relatively young and volatile.

Means in the model significantly improved the algorithm's execution time. Nguyen Duc Hien and Le Manh Thanh (2015) proposed a stock price prediction model based on the combination of SOM (Self-Organizing Maps) and fm-SVM (fuzzy multiple Support Vector Machine). Experimental results on test data showed that the proposed model significantly improved prediction accuracy or was at least equivalent to other models, as indicated by values of NMSE, MAE, and DS. This improvement came along with a substantial reduction in the number of fuzzy rules used in the models.

Truong Thi Thuy Dung (2021) discussed the use of deep learning models, specifically LSTM (Long Short-Term Memory), for predicting the VNINDEX closing index. The results indicated that the LSTM model accurately captured the main trend of price movements in the next trading session and provided forecasts close to the actual values.

Hoang Van Hai and Dang Thi Thu Hien (2023) conducted research on time series forecasting in finance, specifically predicting the closing price of stocks based on historical data from the previous day. They used various technical indicators in their analysis to model scenarios of sharp price declines or sudden increases due to external shocks (such as epidemics or wars). Additionally, they applied and compared algorithms such as LSTM, BiLSTM, CNN, and ARIMA. The results showed that CNN outperformed the other models in predicting the closing price for the next day, highlighting CNN as an effective and promising tool for stock price prediction.

2.3 Research’s gap

Vietnam's stock market has experienced significant development in recent years. This market possesses distinctive characteristics compared to stock markets in other countries, including the presence of price limits such as ±7% (HOSE), ±10% (HNX), or ±15% (UPCOM) of the opening price for each index within a trading day. These limits are in place to control abnormal market fluctuations and mitigate market shocks related to political issues, among other factors. Despite the substantial developments witnessed in the Vietnamese stock market, there has been an insufficient amount of research on predicting stock price volatility using modern AI methods. Previous studies often relied on basic data and did not delve into additional factors that could impact stock market predictions. Several studies applied selection methods only to technical indicators while neglecting micro and macro factors, which may play a pivotal role in a stock index's trend. It is crucial to develop a model that strikes a well-balanced equilibrium between market variables while maintaining simplicity in the model's architecture. Our contribution lies in building a modern deep learning model that combines fundamental analysis, technical analysis, and macroeconomic variables to capture market behavior from various dimensions.


PART 3: RESEARCH METHODOLOGY

3.1 Models

3.1.1 Long short-term memory (LSTM)

The LSTM model was first introduced in a paper titled "Long Short-Term Memory" by Sepp Hochreiter and Jürgen Schmidhuber, published in 1997. In their paper, Hochreiter and Schmidhuber described the LSTM model as a solution to the vanishing gradient problem in traditional recurrent neural networks (RNNs).

The LSTM model proposed by Hochreiter and Schmidhuber consists of a memory cell and three interactive gates: the input gate, forget gate, and output gate. The memory cell is responsible for storing and updating the memory state, allowing the network to retain important information over long sequences.

Through the employment of these gates and the memory cell, LSTM models are adept at capturing long-term dependencies in sequential data while mitigating the vanishing gradient problem. This characteristic makes them especially well suited for tasks involving the processing and comprehension of sequential information, such as speech recognition, language modeling, and machine translation.

LSTM models have proven highly effective in various applications and have become widely popular. Their ability to remember information over long periods is an inherent feature, requiring no additional training for memory retention. While LSTMs share a sequential architecture with standard RNNs, the modules within them have a different structure: instead of having just one neural network layer, they consist of four layers that interact with each other in a specific way.

While standard RNNs are better at preserving information than traditional feedforward networks, they face challenges in effectively learning long-term dependencies due to the vanishing gradient problem (Hochreiter, 1998). In contrast, LSTM tackles the vanishing gradient problem by incorporating memory cells. An LSTM model comprises an input layer, a hidden layer, a cell state, and an output layer (Gers et al., 2000; Hochreiter and Schmidhuber, 1997). The core element in the LSTM architecture is the cell state, which traverses the sequence while undergoing only linear interactions to ensure the information flow remains intact. The gate mechanism of LSTM is responsible for removing or modifying the information within the cell state. This is a way to selectively transmit information, built from sigmoid layers, hyperbolic tangent layers, and element-wise multiplication operations.

Figure 3.1 illustrates the architecture of the LSTM at time t, designed for sequential input modeling. Specifically, four gates (output, change, input, and forget) are depicted with their operations at time t. For the given input sequence {x_1, x_2, ..., x_n}, x_t ∈ R^k is the input at time t. The memory cell c_t updates information through three gates: the input gate i_t, the forget gate f_t, and the change gate c̃_t. The hidden state h_t is updated using the output gate o_t and the memory cell c_t. At time t, the gates and their corresponding layers compute the following functions:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)        (1)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)        (2)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)        (3)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)     (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t             (5)
h_t = o_t ⊙ tanh(c_t)                        (6)

Here, σ and tanh represent the sigmoid and hyperbolic tangent activation functions, respectively, ⊙ is the element-wise multiplication operator, W ∈ R^{d×k} and U ∈ R^{d×d} are weight matrices, and b ∈ R^d is a bias vector. Furthermore, n, k, and d respectively denote the sequence length, the number of features, and the hidden size (Greff, Srivastava, Koutník, Steunebrink, & Schmidhuber, 2017; Lei, Liu, & Jiang, 2019; Qiu et al., 2020).
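To make the notation concrete, equations (1)-(6) can be sketched as a single forward step in NumPy. This is an illustrative implementation rather than the code used in our experiments: the weights are randomly initialized stand-ins for trained parameters, and the shapes follow the notation above (k features, hidden size d).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step implementing Eqs. (1)-(6).
    W: (d, k) input weights, U: (d, d) recurrent weights,
    b: (d,) biases for the gates i, f, o and the change gate c."""
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate, Eq. (1)
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate, Eq. (2)
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate, Eq. (3)
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # change gate, Eq. (4)
    c = f * c_prev + i * c_tilde                                # cell state, Eq. (5)
    h = o * np.tanh(c)                                          # hidden state, Eq. (6)
    return h, c

k, d = 8, 16                      # number of features, hidden size
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d, k)) for g in "ifoc"}
U = {g: rng.normal(size=(d, d)) for g in "ifoc"}
b = {g: np.zeros(d) for g in "ifoc"}

h, c = np.zeros(d), np.zeros(d)
for t in range(5):                # run a short input sequence through the cell
    h, c = lstm_step(rng.normal(size=k), h, c, W, U, b)
print(h.shape)                    # (16,)
```

Because o lies in (0, 1) and tanh(c) in (−1, 1), the hidden state stays bounded regardless of sequence length, while the linear update in Eq. (5) is what lets gradients flow through c_t without vanishing.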

3.1.2 Gated recurrent unit (GRU)

Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) introduced by Chung et al. in 2014 (Chollet, 2017) as a simpler alternative to Long Short-Term Memory (LSTM) networks. Like LSTM, GRU can process sequential data such as text, speech, and time series, and it models such data by allowing information to be selectively remembered or forgotten over time. However, GRU has a simpler architecture than LSTM, with fewer parameters, which can make it easier to train and more computationally efficient.

The main difference between GRU and LSTM is the way they handle the memory cell state. In LSTM, the memory cell state is maintained separately from the hidden state and is updated using three gates: the input gate, output gate, and forget gate. In GRU, the memory cell state is replaced with a "candidate activation vector," which is updated using two gates: the reset gate and the update gate. The reset gate determines how much of the previous hidden state to forget, while the update gate determines how much of the candidate activation vector to incorporate into the new hidden state. Overall, GRU is a popular alternative to LSTM for modeling sequential data, especially in cases where computational resources are limited or where a simpler architecture is desired.

Fig 3.2 Gated Recurrent Unit (GRU) architecture


Like other recurrent neural network architectures, GRU processes sequential data one element at a time, updating its hidden state based on the current input and the previous hidden state. At each time step, the GRU computes a "candidate activation vector" that combines information from the input and the previous hidden state. This candidate vector is then used to update the hidden state for the next time step. The short-term (h_t) and long-term (c_t) information of LSTM are merged into a single vector h_t in GRU. As opposed to the four gates in LSTM, GRU has three gates: the reset gate, change gate, and update gate. The update gate of GRU is equivalent to the forget gate and input gate of LSTM (Géron, 2019); thus, a single gate decides what to forget and what to update in GRU instead of two gates in LSTM. At time t, a GRU cell takes two pieces of information: the current input x_t and the short-term memory h_{t-1} from the previous cell. The update gate carries the long-term dependencies in GRU; it determines which past information needs to be passed to the next step. The reset gate takes the information from x_t and h_{t-1}, produces an output between 0 and 1 through the sigmoid layer, and then identifies which information to discard from the previous hidden state h_{t-1}. When the value is 1, it keeps all the information in the cell, while with a value of 0 it forgets all the information from the previous hidden state. Based on empirical evidence, LSTM and GRU have proven their effectiveness on many machine learning tasks (Agarap, 2018; Chorowski et al., 2015; Wen et al., 2015; Yang et al., 2016). The operations of each gate are shown in Fig. 3.2.

For a given input sequence {x_1, x_2, ..., x_n}, x_t ∈ R^k is the input at time t. The cell takes the input x_t and the hidden state h_{t-1} from the previous time step t − 1, outputs a new hidden state h_t, and forwards it to the next time step (Zhang et al., 2021). At time t, the respective gates and layers compute the following functions:

z_t = σ(W_z x_t + U_z h_{t-1} + b_z)              (7)
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)              (8)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t-1}) + b_h)   (9)
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t             (10)

where σ and tanh represent the sigmoid and hyperbolic tangent functions respectively, the operator ⊙ is the element-wise product, W ∈ R^{d×k} and U ∈ R^{d×d} are weight matrices, and b ∈ R^d are bias vectors. Moreover, n, k, and d are the sequence length, the number of features, and the hidden size respectively.
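As a sketch of equations (7)-(10), a GRU step can be written analogously to the LSTM step, with three weight sets instead of four (hence roughly a quarter fewer parameters). Again, the randomly initialized weights are illustrative stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step implementing Eqs. (7)-(10).
    W: (d, k) input weights, U: (d, d) recurrent weights,
    b: (d,) biases for the update (z), reset (r), and change (h) gates."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])              # update gate, Eq. (7)
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])              # reset gate, Eq. (8)
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # candidate, Eq. (9)
    return (1.0 - z) * h_prev + z * h_tilde                           # new hidden state, Eq. (10)

k, d = 8, 16                      # number of features, hidden size
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(d, k)) for g in "zrh"}
U = {g: rng.normal(size=(d, d)) for g in "zrh"}
b = {g: np.zeros(d) for g in "zrh"}

h = np.zeros(d)
for t in range(5):                # run a short input sequence through the cell
    h = gru_step(rng.normal(size=k), h, W, U, b)
print(h.shape)                    # (16,)
```

Eq. (10) makes the trade-off explicit: z_t interpolates between keeping the old state h_{t-1} and adopting the candidate h̃_t, folding LSTM's separate forget and input gates into one decision.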

3.1.3 Convolution neural networks (CNNs)

A Convolutional Neural Network (CNN) is a popular deep learning technique widely used in computer vision. It is inspired by Hubel and Wiesel's experiments on animals' natural visual perception mechanism (Hubel, 1959). Motivated by their experiments, Fukushima and Miyake developed a neural network model for the mechanism of visual pattern recognition (Fukushima & Miyake, 1982). In 1990, LeCun et al. developed a handwritten digit recognition model that is considered a base model for the current CNN architecture (LeCun et al., 1989). There have been many variations and benchmark models of CNN since. CNN models are also used in natural language processing, voice recognition, and stock price prediction. In this study, we discuss the CNN architecture, particularly for time series prediction.

A CNN architecture has the following components: input, convolutional layers with non-linear activation functions, pooling layers, fully connected layers, and output. All layers of a CNN have trainable parameters except the pooling layers. The number of convolutional, pooling, and fully connected layers varies based on the complexity of the task; generally, more layers are used for a complex task.

For a given multivariate time series {x_1, x_2, ..., x_n}, x_t ∈ R^k, a matrix of size w × k is formed as an input image, where w is the time step and k is the number of input variables. CNN views a time step as a sequence over which convolutional operations can be performed, as a one-dimensional image. Since each series has observations at the same time step, the input time series are parallel. We can reshape these three arrays of data (number of samples, time steps, number of features) into a single dataset where each row is a time step and each column is a separate time series (Brownlee, 2018b, 2018c). Thus, from n observations, we obtain n − w matrices of size w × k (we do not include the last observation in the input, as the last row is the output of the previous step's input). As in LSTM, each matrix is treated as an image of size w × k in CNN.

The convolution operations are performed on the time axis with filters of size l × k, where l is the length of a filter, a hyperparameter to be tuned. The number of filters m and the convolution stride s are other hyperparameters to be chosen based on experiments or domain knowledge. Then, from m filters, we get m univariate series of length ⌊(w − l)/s⌋ + 1.

After the convolution operations with m filters, we apply a non-linear activation function to the feature maps, and then apply a pooling operation of size p × 1 to each univariate feature map with a stride of s_p, which further reduces the size of each feature map.

Fig. 3.3 CNN architecture with m filters for multivariate time series prediction.


In Fig. 3.3, we take a data frame of size n × k with n observations and k input variables. For the input to the CNN, we take a time step of 5 (w = 5), so the size of an input image is 5 × k. For each image, we use m filters of size 2 × k and slide each filter along the time axis with stride 1. After the convolution operation, we get m feature maps from the m filters. We then apply non-linear activation functions such as ReLU or Leaky ReLU. A pooling of size 2 × 1 with stride 2 is performed for downsampling. Then, the feature maps from each filter are vectorized into a single sequence to form a fully connected layer. Finally, the output ŷ is predicted using the linear activation function.
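The windowed convolution described above can be sketched in NumPy. With w = 5, filter length l = 2, and stride 1, each filter produces a univariate series of length ⌊(5 − 2)/1⌋ + 1 = 4, and the 2 × 1 max pooling with stride 2 halves that to 2; the random window and filters here are illustrative only.

```python
import numpy as np

def conv1d_time(window, filters, stride=1):
    """Valid convolution of one w x k input 'image' with m filters of
    size l x k, slid along the time axis only (as in Fig. 3.3)."""
    w, k = window.shape
    m, l, k_f = filters.shape
    assert k == k_f, "filter width must match the number of input variables"
    out_len = (w - l) // stride + 1
    out = np.empty((out_len, m))
    for j in range(out_len):
        patch = window[j * stride : j * stride + l]            # l x k slice
        out[j] = np.tensordot(filters, patch, axes=([1, 2], [0, 1]))
    return out                                                 # out_len x m feature maps

rng = np.random.default_rng(1)
w, k, m, l = 5, 3, 4, 2           # time steps, features, filters, filter length
window = rng.normal(size=(w, k))
filters = rng.normal(size=(m, l, k))

fmap = np.maximum(conv1d_time(window, filters), 0.0)   # ReLU activation
pooled = fmap.reshape(2, 2, m).max(axis=1)             # 2 x 1 max pooling, stride 2
print(fmap.shape, pooled.shape)   # (4, 4) (2, 4)
```

In practice this whole pipeline is a single `Conv1D` + `MaxPooling1D` stack in Keras; the explicit loop above just makes the output-length arithmetic visible.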

3.2 Materials and Methods

Table 3.1 Computing environmental conditions

Development environment: Visual Studio Code
Languages and libraries: Python 3.9.13, TensorFlow, and Keras APIs


This research uses the Python 3.9.13 programming language. Python is a programming language that helps people work quickly and integrate systems efficiently. It is a widely used, general-purpose, high-level language, created by Guido van Rossum in 1991 and further developed by the Python Software Foundation. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.

Table 3.1 summarizes the experimental environment and libraries used for this study. The experiments use the Python programming environment along with the TensorFlow and Keras APIs. The research also utilized several libraries, including NumPy, Pandas, and scikit-learn, for data processing and index calculations. All the experiments are conducted on the machine configuration stated in Table 3.1.

3.3 Data set and materials

We start with a brief description of the data used in the proposed model. The prediction of closing prices is based on a combination of fundamental trading data, macroeconomic data, and technical indicators related to the underlying index. These input features are drawn from three distinct categories, as outlined in Table 3.2. To ensure uniformity among the variables, we have transformed the monthly data into daily data using the forward-filling method.
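The monthly-to-daily alignment can be sketched with pandas, whose `reindex` and `ffill` methods implement the forward-filling step; the CPI values and dates below are illustrative, not our actual data.

```python
import pandas as pd

# Hypothetical monthly CPI observations (illustrative values).
cpi_monthly = pd.Series(
    [104.2, 104.6],
    index=pd.to_datetime(["2023-08-01", "2023-09-01"]),
    name="cpi",
)

# Align to a daily (business-day) calendar: reindex to trading days,
# then forward-fill so each day carries the latest monthly value.
daily_index = pd.bdate_range("2023-08-01", "2023-09-08")
cpi_daily = cpi_monthly.reindex(daily_index).ffill()

print(cpi_daily.loc["2023-08-31"])  # 104.2 (still the August value)
print(cpi_daily.loc["2023-09-08"])  # 104.6
```

Forward filling assumes a macro release stays in force until the next one appears, which also avoids look-ahead bias: no day ever sees a value published after it.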

Table 3.2 List of features for the model

Feature                                    Source        Frequency
Interest Rate                              Tradingview   Monthly
Consumer Price Index                       Tradingview   Monthly
Technical indicators:
  Moving Average Convergence Divergence
  Average True Range
  Relative Strength Index

The first set of variables presented in Table 3.2 is fundamental, or historical, data on the closing price of VNINDEX. The data collection period spans from 16/10/2000 to 06/10/2023, with all historical trading data sourced from Tradingview (https://www.tradingview.com), one of the popular sources for detailed information about the stock market.

The second set of variables in Table 3.2 comprises macroeconomic variables that significantly influence stock market performance. The representative features affecting stock price prediction under the umbrella of macroeconomic factors are the Interest Rate (IR), the Consumer Price Index (CPI), and the Exchange Rate to the US Dollar (ER).

Interest Rate: Fama (1981, 1990) argues that interest rates have an inverse relationship with stock prices in the long run. This relationship stems directly from the present value model through the influence of the long-term interest rate on the discount rate (Uddin and Alam, 2010). The negative relationship is also based on the view that a rise in interest rates leads to higher borrowing costs, lower future profits, and an increase in the discount rate for equity investors, and subsequently a decrease in stock prices. Therefore, increases in interest rates have an indirect impact on stock prices (Ibrahim & Musah, 2014).

Exchange Rate: An exchange rate represents the value of one nation's currency relative to another's. The appreciation of the exchange rate can negatively affect export competitiveness, impacting economic growth (Adhikari, 2018; Paudel & Burke, 2015). Given the interconnectedness of the economy and the broader stock market, the variable "Exchange Rate to US Dollar" is included as a predictor in the model.

Consumer Price Index: The Consumer Price Index (CPI) is a measure that examines the weighted average prices of a basket of consumer goods and services. CPI is the most widely used measure of inflation and, by proxy, of the government's economic policy. There is a long-run relationship between the consumer price index and stock market data (Devkota, 2018; Panta, 2020; Shrestha & Pokhrel, 2019).

The final set of variables outlined in Table 3.2 comprises technical indicators, including Moving Average Convergence Divergence (MACD), Average True Range (ATR), and Relative Strength Index (RSI). These technical indicators are mathematical computations applied to variables such as price or other technical indicators. Traders frequently employ them in the market, primarily for the analysis of short-term price fluctuations.

MACD, developed by Gerald Appel, is one of the most widely applied technical indicators (Wang & Kim, 2018). It is a momentum oscillator calculated by subtracting the 26-day Exponential Moving Average (EMA) from the 12-day EMA (Murphy, 1999). The EMA differs from a simple moving average in that it assigns more weight to later data values. Traders use MACD to identify trends, directions, momentum, and potential reversals in stock prices. For example, if the MACD line crosses above 0, it signals a buying opportunity; similarly, when it crosses below 0, it signals a sell order. Research results demonstrate the significant potential of the MACD indicator in making optimal investment decisions (Chong & Ng, 2008; Chong, Ng, & Liew, 2014; Eric, Andjelic, & Redzepagic, 2009).
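The MACD line described above is straightforward to compute from closing prices with pandas; the 9-day signal line used below is the conventional companion to the 12/26 spans, and the price series is synthetic rather than actual VNINDEX data.

```python
import numpy as np
import pandas as pd

# Synthetic closing-price series standing in for VNINDEX closes.
rng = np.random.default_rng(42)
close = pd.Series(1000 + np.cumsum(rng.normal(0, 5, size=200)), name="close")

ema12 = close.ewm(span=12, adjust=False).mean()   # fast EMA
ema26 = close.ewm(span=26, adjust=False).mean()   # slow EMA
macd = ema12 - ema26                              # MACD line
signal = macd.ewm(span=9, adjust=False).mean()    # signal line

# A cross of the MACD line above zero is read as a buy signal.
buy = (macd > 0) & (macd.shift(1) <= 0)
print(int(buy.sum()), "zero-line crossovers flagged")
```

With `adjust=False`, each EMA starts at the first observation, so the MACD line begins at exactly zero and only diverges as the fast and slow averages separate.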

J. Welles Wilder Jr. is renowned for introducing several technical market indicators. ATR characterizes the average volatility level of a stock over a specific period. Stock traders use this indicator as part of their risk management strategy to set stop levels. It is also valuable for identifying significant price movements, aiding in the recognition of the onset of a trend. ATR is based on the concept of the true range, which measures a stock's trading range by highlighting price gaps and whether the
