
Analysis and Visualization of Stock Data from the New York Stock Exchange (NYSE) Using Recurrent Neural Network Architectures


VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY

UNIVERSITY OF ECONOMICS AND LAW

FINAL PROJECT - SEMESTER 2 (2022-2023)
COURSE: DATA ANALYSIS

PROJECT TITLE: Analysis and Visualization of Stock Data from the New York Stock Exchange (NYSE) Using Recurrent Neural Network Architectures

Students:

Student ID | Full name | Email
K214140935 | Nguyễn Minh Đức | ducnm21414c@st.uel.edu.vn
K214140937 | Huynh Thi Ngoc Ha | hahtn21414c@st.uel.edu.vn
K214140944 | Võ Quỳnh My | myvq21414c@st.uel.edu.vn
K214142102 | Nguyen Thi Kim Ngan | nganntk21414c@st.uel.edu.vn
K214130904 | Trịnh Văn Thanh | thanhtv21413c@st.uel.edu.vn

Supervisor: Dr. Nguyễn Thôn Dã

Ho Chi Minh City, 05/2023


TABLE OF CONTENTS

6.1 STRENGTHS
6.2 LIMITATION
6.3 PROPOSING DEVELOPMENT DIRECTIONS


ABSTRACT

Stock price movement is non-linear and complex. As a result, forecasting stock prices has been a difficult task for many researchers and analysts. Most of them are interested in the future trajectory of stock prices and aspire to build an accurate and successful model. Neural networks have recently become a popular tool for stock prediction. Convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), and gated recurrent units (GRU) are only a few of the many forecasting models available. In this article, each model takes an average of the stock market data from the previous five days (date, open, low, high, close, adj close, volume) and uses it to predict stock prices for the following five days. To test the CNN, RNN, LSTM, and GRU models, we use data for three companies obtained from Yahoo Finance.

Key words: Stock price prediction, CNN, RNN, GRU, LSTM, Vietnam


LIST OF TABLES

Table 1: Description of Coty firm
Table 2: Description of Coca-Cola firm
Table 3: Description of Kellogg firm
Table 4: Performance Metrics for Machine Learning Models - Coty Firm
Table 5: Performance Metrics for Machine Learning Models - Coca-Cola Firm
Table 6: Performance Metrics for Machine Learning Models - Kellogg Firm


LIST OF FIGURES

Figure 1: Overview model
Figure 2: Correlation of three stocks
Figure 3: Coty firm's closing price trend and prediction
Figure 4: Machine learning test with the last 60 observations of Coty firm
Figure 5: Coca-Cola firm's closing price trend and prediction
Figure 6: Machine learning test with the last 60 observations of Coca-Cola firm
Figure 7: Kellogg firm's closing price trend and prediction
Figure 8: Machine learning test with the last 60 observations of Kellogg firm


1 INTRODUCTION

The global stock market is a critical pillar of the modern economy, facilitating capital allocation, investment, and wealth creation on a massive scale. With trillions of dollars in daily trading volume and countless participants worldwide, it has evolved into a dynamic, interconnected network of exchanges in which the prices of individual stocks fluctuate in response to factors ranging from economic indicators to geopolitical events. The ever-changing nature of the market, coupled with the increasing demand for accurate stock price predictions, has led researchers and analysts to explore innovative methodologies and tools for forecasting stock prices effectively.

The importance of stock price prediction stems from its impact on investment decisions and financial outcomes. Successful forecasts enable investors to capitalize on emerging trends, identify undervalued assets, and time their trades effectively. By gaining insight into potential market opportunities and risks, investors can optimize their portfolio allocation, manage their exposure, and potentially achieve higher returns. Conversely, inaccurate or unreliable predictions may lead to missed opportunities, suboptimal decision-making, and financial losses. The scale and complexity of the stock market, combined with the multitude of factors influencing stock prices, make forecasting a challenging endeavor. Traditional approaches to prediction, such as fundamental analysis and technical indicators, often struggle to capture the intricate interplay of market variables, investor sentiment, and external events. Consequently, researchers and analysts have turned to other methodologies, including neural networks, to address the non-linear and complex nature of stock price movements.

The advent of neural networks has changed the field of stock price prediction. Neural networks, inspired by the structure and functioning of the human brain, offer a powerful toolset for modeling complex patterns and relationships within financial data. Among the wide array of neural network models available, this essay focuses on four prominent ones: Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU).

The objective of this study is to explore the effectiveness of these neural network models in predicting stock prices for three companies: COTY, KO, and K. These companies represent different sectors of the stock market and have distinct characteristics that make them interesting subjects for analysis. Using the average stock market data from the previous five days, including date, open, low, high, close, adjusted close, and volume, the models aim to forecast stock prices for the following five days.

Neural networks have gained significant attention in stock price prediction because they can capture non-linear relationships, adapt to changing market conditions, and learn from historical patterns. Reliance on them has intensified in recent years thanks to advances in computational power, data availability, and machine learning techniques. The availability of real-time financial data, along with vast historical datasets, enables researchers and analysts to train and fine-tune these models on rich and diverse information. This article aims to provide a comprehensive analysis of the selected neural network models, highlighting their strengths, limitations, and comparative performance in the context of stock price prediction.

To evaluate the performance of the CNN, RNN, LSTM, and GRU models, data for the three aforementioned companies is obtained from Yahoo Finance. Examining the accuracy and reliability of these models in forecasting stock prices yields insights that can help investors, analysts, and researchers make informed decisions and better understand stock market dynamics.

Overall, this essay explores the application of neural networks to stock price prediction. By examining the selected models and their performance on real-world financial data, it contributes to the ongoing discussion of the challenges and opportunities of forecasting stock prices. By shedding light on the effectiveness of neural networks, this research aims to improve the understanding of stock market dynamics and provide useful insights for the financial community.


2 RELATED WORKS

Stock market prediction has been the subject of extensive research, with various techniques and models employed to forecast stock prices. The approaches explored range from fundamental analysis and technical analysis to time series analysis. In time series analysis, both linear models such as AR, MA, ARIMA, ARMA, and CARIMA and non-linear models such as ARCH, GARCH, ANN, RNN, and LSTM have been widely used[1][2].

Researchers have also studied the influence of macro-economic factors on stock price movement, analyzing variables such as crude oil price, exchange rate, gold price, bank interest rate, and political stability[3]. Additionally, studies have investigated the correlation between price movements across different sectoral indices in the stock market, employing techniques such as frequent itemset mining[4].

Deep learning models, particularly RNN, LSTM, and CNN, have gained attention for their effectiveness in stock price prediction. For example, Roondiwala et al. applied an RNN-LSTM model to NIFTY-50 stocks, using high, close, open, and low prices as features, with a 21-day window to predict the following day's price movement[5]. Similarly, Kim et al. proposed the feature fusion LSTM-CNN model, which combines analysis of stock chart images using a CNN with analysis of historical price data using an LSTM. Their results showed that the combined model outperformed the individual models in prediction accuracy[6].

Other researchers have explored different deep-learning architectures to forecast stock prices. Hiransha et al. experimented with RNN, CNN, and LSTM models using past closing prices of IT sector companies such as TCS and Infosys, as well as the pharma sector company Cipla. They trained the models on data from a single company and successfully predicted future prices of stocks from both NSE and NYSE, with CNN showing superior performance[7].

Variations of LSTM have also been proposed to improve its performance. Gers & Schmidhuber introduced "peephole connections," which allow the gate layers to access the cell state, while Cho et al. proposed the Gated Recurrent Unit (GRU), which merges the forget and input gates into an "update gate." These modifications aim to enhance the learning capabilities of the models and simplify their architecture[8][9].

The models mentioned above represent only a fraction of the research conducted in this field. Other variations, such as Depth Gated LSTMs and Clockwork RNNs, have been proposed to address long-term dependencies and uncover more intricate dynamics in stock prices[10][11].

In summary, researchers have explored various techniques and models, including LSTM, CNN, and GRU, to forecast stock prices. These models leverage both historical price data and external factors to capture the complexities of stock market dynamics. Through the application of deep learning and advances in computational capabilities, these models offer promising avenues for improving the accuracy and reliability of stock price predictions.

3 BACKGROUND

3.1 Machine learning model

RNN (Recurrent Neural Network): a class of artificial neural networks in which connections between nodes can form a cycle, allowing the output of some nodes to affect subsequent input to the same nodes. This allows the network to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length sequences of inputs[12][13][14]. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.
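As a rough illustration of how an internal state lets an RNN handle sequences of any length, the following minimal sketch runs a plain (Elman-style) recurrence in NumPy. The function name `rnn_forward`, the weight names, the layer sizes, and the random inputs are all assumptions made for this example; this is not the model trained in this project.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a plain RNN over a sequence and return the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])          # internal state (memory), starts at zero
    states = []
    for x_t in inputs:                   # one step per element, so any length works
        # The new state depends on the current input and on the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
n_features, n_hidden = 1, 4              # illustrative sizes only
W_xh = rng.normal(scale=0.5, size=(n_hidden, n_features))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

short_seq = rng.normal(size=(3, n_features))    # made-up sequence of 3 steps
long_seq = rng.normal(size=(10, n_features))    # made-up sequence of 10 steps
short_states = rnn_forward(short_seq, W_xh, W_hh, b_h)
long_states = rnn_forward(long_seq, W_xh, W_hh, b_h)
print(short_states.shape, long_states.shape)    # (3, 4) (10, 4)
```

The same weights process both sequences; only the number of loop iterations changes.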

LSTM (Long Short-Term Memory): an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points (such as images) but also entire sequences of data (such as speech or video). This characteristic makes LSTM networks well suited to processing and predicting sequential data. For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition[15], speech recognition[16][17], machine translation[18][19], speech activity detection, robot control[20], video games[21], and healthcare.

CNN (Convolutional Neural Network): a class of artificial neural network most commonly applied to analyzing visual imagery[22]. CNNs use a mathematical operation called convolution in place of general matrix multiplication in at least one of their layers[23]. They are specifically designed to process pixel data and are used in image recognition and processing. They have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing[24], brain-computer interfaces[25], and financial time series.
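To make the convolution operation concrete for a time series, here is a minimal sketch that slides a two-element kernel over a short price sequence. The prices and the kernel are made up for illustration. Note that deep learning libraries typically implement the "convolution" layer as a cross-correlation (the kernel is not flipped), which is why `np.correlate` is used here instead of `np.convolve`.

```python
import numpy as np

prices = np.array([1.0, 2.0, 4.0, 7.0])   # made-up closing prices
kernel = np.array([-1.0, 1.0])            # hypothetical kernel: today minus yesterday

# Slide the kernel along the series; each output is a weighted sum of a
# small local window, not a product with one full weight matrix.
feature_map = np.correlate(prices, kernel, mode="valid")
print(feature_map.tolist())               # [1.0, 2.0, 3.0]
```

In a trained CNN the kernel values are learned; this fixed kernel simply extracts day-over-day changes.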

GRU (Gated Recurrent Unit): a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al.[9]. The GRU is like a long short-term memory (LSTM) with a forget gate[26], but has fewer parameters than LSTM, as it lacks an output gate. GRU's performance on certain tasks of polyphonic music modeling, speech signal modeling, and natural language processing was found to be similar to that of LSTM.
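The gating described above can be sketched in a few lines of NumPy. This is a minimal illustration of one common formulation of the GRU update, not the implementation used in this project; the parameter names (`W_z`, `U_z`, and so on), the sizes, and the input values are assumptions, and some references swap the roles of `z` and `1 - z`. The parameter count at the end assumes the standard formulations, where a GRU has three weight blocks and an LSTM has four.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step; p is a dict of weights with hypothetical names."""
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])   # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])   # reset gate
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # No output gate: the blended state is exposed directly as the output.
    return (1.0 - z) * h_prev + z * h_cand

def init_params(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    return p

n_in, n_hidden = 1, 8                     # illustrative sizes only
params = init_params(n_in, n_hidden)
h = np.zeros(n_hidden)
for price in [0.2, 0.4, 0.3]:             # made-up normalized prices
    h = gru_step(np.array([price]), h, params)

per_block = n_hidden * n_in + n_hidden * n_hidden + n_hidden
gru_params = 3 * per_block                # update, reset, candidate
lstm_params = 4 * per_block               # input, forget, output, candidate
print(h.shape, gru_params, lstm_params)   # (8,) 240 320
```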

3.2 Indicators

• MAE (Mean Absolute Error): the mean of the absolute errors between the model's predictions and the actual values. The lower the MAE, the more accurate the model.

Expected value: MAE < 1

• R²: the proportion of the variation in the dependent variable that is predictable from the independent variable(s). It measures the accuracy of the model relative to simply predicting the mean of the data set. The closer R² is to 1, the more accurate the model.

Expected value: 0.5 < R² < 1

• RMSE (Root Mean Square Error): the standard deviation of the residuals (prediction errors), computed as the square root of the mean squared error between the model's predictions and the actual values. The lower the RMSE, the more accurate the model.

Expected value: 0.5 < RMSE < 1.5
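The three indicators above can be computed directly from their standard definitions. The sketch below is a minimal NumPy version with made-up actual and predicted values; the function names are chosen for this example and are not taken from the project's code.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the average squared error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """R squared: 1 minus (residual sum of squares / total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([10.0, 11.0, 12.0, 13.0])   # made-up actual prices
y_pred = np.array([10.5, 10.5, 12.5, 12.5])   # made-up predictions

print(float(mae(y_true, y_pred)))              # 0.5
print(float(rmse(y_true, y_pred)))             # 0.5
print(round(float(r2(y_true, y_pred)), 3))     # 0.8
```

Because MAE and RMSE are expressed in the units of the target, thresholds such as MAE < 1 depend on the scale of the prices (raw or normalized) on which they are evaluated.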


4 METHODOLOGY

4.1 Overview model

INPUT > DATASET > TRAINING MODEL > PREDICTIONS > RESULTS

Figure 1: Overview model

4.2 Description of Data

The purpose of this research is to identify which of four models best predicts stock prices: RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), CNN (Convolutional Neural Network), and GRU (Gated Recurrent Unit).

This study is based on a financial dataset extracted from the Yahoo Finance website[27]. The dataset comprises 11,588 records for three companies with the ticker symbols COTY, KO, and K. Data was collected from 01-01-2005 to 04-29-2023; the number of transactions varies from day to day, with an average of about 8.237 transactions. For KO, we collected 4612 values from 03-01-2005 to 04-28-2023; for COTY, 2366 values from 12-06-2013 to 28-04-2023; and for K, 4613 values from 1-3-2005 to 28-04-2023. The data does not specify an explicit target but provides 7 columns that represent the realized percent return on each trade and the returns over 3 different time horizons. The objective is to populate an action column with one of two decisions: to trade or not to trade. Note that the exact nature of the trade (long or short) is unknown, as is the specific instrument or market traded; in other words, only the return values are provided for the output.

Attributes

Date | Open | Low | High | Close | Adj Close | Volume


4.3 Data preprocessing and Feature extraction

- Stage 1: Raw Data

In this stage, historical stock data is collected from Yahoo Finance; this data is used to compare the four models. The data is then saved to a CSV file.

- Stage 2: Data Preprocessing

The pre-processing stage involves:

a) Load and preprocess data

This step loads the CSV file into a DataFrame, sets the 'Date' column as the index, and parses the dates. The dropna() method removes any rows with missing values. Finally, the 'Close' column is selected as the feature data used for analysis.
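The loading step described above might look like the following minimal sketch with pandas. The original code is not reproduced in this text, so the variable names (`df`, `data`) and the sample rows are assumptions; an in-memory string stands in for the CSV file saved in Stage 1 so that the example runs on its own.

```python
import io
import pandas as pd

# Made-up rows standing in for the CSV file saved in Stage 1.
csv_text = """Date,Open,Low,High,Close,Adj Close,Volume
2023-04-24,63.5,63.2,64.1,63.9,63.9,1200000
2023-04-25,63.9,63.6,64.3,,,1100000
2023-04-26,64.0,63.5,64.4,63.7,63.7,1300000
"""

# Load the data, use 'Date' as the index, and parse it as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# Remove rows with missing values (the second sample row has no Close).
df = df.dropna()

# Keep only the 'Close' column as a 2D array of shape (n_samples, 1).
data = df[["Close"]].values
print(data.shape)   # (2, 1)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the path to the saved CSV.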

b) Normalize data to the interval [0, 1]

A MinMaxScaler object is created with a feature range of (0, 1). The fit_transform() method then fits the scaler to the feature data and transforms the data to that range. The result, data2, is the normalized feature data used for subsequent analysis or machine learning tasks.
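A minimal sketch of this normalization with scikit-learn is shown below. The closing prices are made up, and the names `data` and `scaler` are assumptions; only `MinMaxScaler`, `fit_transform()`, and `data2` come from the description above.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up closing prices, shaped (n_samples, 1) as the scaler expects.
data = np.array([[50.0], [55.0], [60.0], [52.5]])

# Map the smallest value to 0 and the largest to 1.
scaler = MinMaxScaler(feature_range=(0, 1))
data2 = scaler.fit_transform(data)
print(data2.ravel())   # approximately 0, 0.5, 1, 0.25

# Predictions made on the normalized scale can be mapped back to prices.
restored = scaler.inverse_transform(data2)
```

Keeping the fitted `scaler` around matters because model outputs are on the normalized scale and need `inverse_transform()` before they can be compared with actual prices.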

Finally, numpy arrays are used to store the training data. Each sample is appended to X_train as a 2D array of shape (timesteps, features), while the


Date posted: 23/08/2024, 21:34
