Analyzing Stock Market Movements Using News, Tweets, Stock Prices and Transactions Volume Data for APPLE (AAPL), GOOGLE (GOOG) and SONY (SNE)


Brijen Rai, Kean University, Union, USA, brai@kean.edu
Mangala Kasturi, Kean University, Union, USA, kasturim@kean.edu
Ching-yu Huang, Kean University, Union, USA, chuang@kean.edu

ABSTRACT

Goal: Today's financial markets show complex behavior that results from the decisions of many traders. The goal of this research is to measure the relationship between stock prices, trading volumes, and the number of mentions in financial news and tweets.

Method: We collected data sets for three companies: Apple, Google and Sony. Tweets were collected with a Python program using the Twitter API, and only the counts of tweets related to the stocks of these companies were extracted. News counts were collected with a Python program using News API, again restricted to news about the stocks of these companies. Stock data, including volume and close price, were collected for the same companies.

Findings: We find a positive correlation between the daily number of mentions of these companies in tweets and news and the daily close price and daily transactions volume of the company's stock after the tweets and news are released. Our results provide measurable support for the suggestion that activities in financial markets, news and tweets are fundamentally interlinked.

CCS Concepts
Information systems → Data stream mining

Keywords
Correlation, chi-square, similarity, tweets, news, stock price, volume, data mining

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
PRAI 2018, August 15–17, 2018, Union, NJ, USA
© 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-6482-9/18/08…$15.00
https://doi.org/10.1145/3243250.3243263

INTRODUCTION

Cross-correlations between volume change and price change were studied by Boris Podobnik, Davor Horvatic, Alexander M. Petersen, and H. Eugene Stanley [1]. There is a saying on Wall Street that "it takes volume to move stock prices." A number of studies have analyzed the relationship between price changes and the trading volume in financial markets [5-7]. Some of these studies [5, 7] have found a positive relationship between price change and the trading volume. To explain this relationship, Clark assumed that the daily price change is the sum of a random number of uncorrelated intraday price changes [7], and so predicted that the variance of the daily price change is proportional to the average number of daily transactions. If the number of transactions is proportional to the trading volume, then the trading volume is proportional to the variance of the daily price change.

The activities of stock markets influence the lives of many people within the financial sector. Financial benefits therefore lie in a better understanding of the behavior of this complex system. Research towards this goal has been driven by the enormous amount of data on financial transactions, with a growing number of studies in complex systems science seeking to analyze and model stock market behavior.

Traders may receive information not only by searching for it online, but also by passively or actively receiving news broadcast by large financial news channels. The decisions of traders may in turn lead to events that are described by the financial news. In this study, we quantify the relationship between activity in financial news and in markets by taking advantage of stock news data from News API.

With the advent of social media, information about public feelings has become abundant. Social media is becoming a natural platform for sharing public emotions about any topic and has a significant impact on overall public opinion. Twitter, a social media platform, has received a lot of attention from researchers in recent times. Twitter is a micro-blogging application that allows users to follow and comment on other users' thoughts or share their opinions in real time. More than […] million users post over 140 million tweets every day, which makes Twitter a corpus of valuable data for researchers. Each tweet is at most 140 characters long and expresses public opinion on a topic concisely. The information extracted from tweets is very useful for making predictions [8].

METHODS

To examine the relationship between financial news, tweets and market behavior, we analyzed a corpus of daily news retrieved from https://newsapi.org/ with News API, covering 14 February to 26 March 2018. We used a Python library called Tweepy to connect to the Twitter Streaming API and downloaded data daily from 12 February to 26 February 2018, between 3 pm and 4 pm. The resulting files range in size from 150 MB to 250 MB, and the number of tweets ranges from 50 to 10,000 per stock. For stock data, we downloaded CSV files from the Yahoo Finance website covering 12 February to 26 March 2018. Table 1 shows the list of companies under study.

Table 1. List of companies

  Company Name    Ticker Symbol
  Apple           AAPL
  Google          GOOG
  Sony            SNE

Once the data was collected, it was preprocessed and filtered to keep only discussions related to the above stocks. All collected data was then inserted into MySQL by a cron job that ran for 14 days, and the resulting tables are used for further processing and calculations. The raw data collected from Twitter amounts to 30,000 tweets for the three companies. The tweets were filtered by stock ticker symbol, which left around 5,000 tweets for each company. This processed data was further normalized and used for interpretation and evaluation of patterns and rules.

Platform: The front end is browser based and uses HTML, JavaScript and JpGraph. The middle tier uses Python and PHP running on an Apache web server, and the back end is a MySQL server. The browser passes data with the CGI POST method and displays the results. Figure 1 shows the 3-tier architecture used for our project and research.

Figure 1. 3-Tier architecture

Line Graphs: A line graph is useful for displaying data or information that changes continuously over time. Figure 2 displays the line graph for Twitter frequency. The x-axis shows the day and the y-axis shows the number of tweets mentioning a company's stock label on that day.

Figure 2. Line graph for Twitter Frequency

Figure 3 displays the line graph for Stock frequency. The x-axis shows the day and the y-axis shows the close price of a company on that day.

Figure 3. Line graph for Stock Frequency

Figure 4 displays the line graph for News frequency. The x-axis shows the day and the y-axis shows the news count of a company on that day.

Figure 4. Line graph for News Frequency

Correlation: Correlation is a statistical analysis used to measure and describe the relationship between two variables. The correlation coefficient is a statistic that ranges between +1 and -1. We interpret its value as follows:

  +1      perfect positive correlation: if the scores go up for one variable, they go up for the other
  > 0.8   strong correlation
  > 0.4   high correlation
  > 0.2   correlated
  < 0.2   not a strong correlation
  < 0.1   does not correlate
  0       no correlation (independence)
  -1      perfect negative correlation: if the scores go up for one variable, they go down for the other

Similarity: The similarity between Apple's stock prices and news count, and between its stock volume and news count, was calculated. To do this, the average of the stock prices, of the news count and of the stock volume over the above period was calculated. Any value of the stock price, news count or stock volume above the corresponding average was marked as binary variable "1". Any value lower than the average was marked as binary variable "0". The following table illustrates the Similarity method:

Table: Similarity Method used for stock close prices, volume and news count

Chi-Square Method: The chi-square test was used to test the hypothesis that a stock's close prices and news count are independent, that is, that there is no correlation between them. The test was based on a fixed significance level, with (rows - 1) x (columns - 1) = 1 x 1 = 1 degree of freedom. The same method was used to test the hypothesis that a stock's volume and news count are independent. If the hypothesis can be rejected, we say that these attributes are statistically dependent.

EXPERIMENT RESULTS

Correlation Results: Correlations between tweet counts, news counts, stock prices and transactions volume were calculated. The results are shown below:

  Comparison                     Correlation   Interpretation
  Apple Twitter vs Stock         0.39          > 0.2, correlated
  Google Twitter vs Stock        0.61          > 0.4, high correlation
  Sony Twitter vs Stock          0.37          > 0.2, correlated
  Apple vs Google Stock          0.93          > 0.8, strong correlation
  Apple vs Sony Stock            0.92          > 0.8, strong correlation
  Google vs Sony Stock           0.96          > 0.8, strong correlation
  Apple vs Google Twitter        0.81          > 0.8, strong correlation
  Apple vs Sony Twitter          0.77          > 0.4, high correlation
  Google vs Sony Twitter         0.71          > 0.4, high correlation

Our approach shows high to strong correlations between series of the same type: the values above average about 0.76 for Twitter vs Twitter and about 0.94 for Stock vs Stock. By comparison, the Twitter vs Stock correlations are weaker, averaging about 0.46.

Similarity Results: Using the similarity method described in the Methods section, the following is the result of the similarity test between Apple's stock close price and news count:

Table: AAPL Stock's Close Price and News count similarity test

As the data set collected is very small, the result shows 40% similarity between Apple's close price and news count.

Using the same method, the following is the result of the similarity test between Apple's stock transactions volume and news count:

Table: AAPL Stock's Transactions Volume vs News count similarity test

As the data set collected is very small, the result shows 44.44% similarity between Apple's transactions volume and news count.

Time permitting, more similarity experiments of this kind can be performed between the other data sets to find the relationships between them.

Chi-Square Results: These are the results of the chi-square tests described in the Methods section. A significance level of 0.01 was selected for the tests. Following are the results of the chi-square test for Apple's stock close price and news count:

Table: AAPL Stock's Close Price and News count Chi-Square test
Note: The contingency table provides the following information: the observed cell totals, (the expected cell totals) and [the chi-square statistic for each cell].

The chi-square statistic is 0.7778. The p-value is 0.377822. The result is not significant at p < 0.01. As the data set collected was small, the results might not be accurate. From the chi-square table, the χ² value needed to reject the hypothesis at the 0.01 significance level is >= 6.6349, but the value we obtained is χ² = 0.7778, which is < 6.6349, so we cannot reject the hypothesis. The data therefore give no evidence of an association between Apple's stock close price and news count.

Following are the results of the chi-square test for Apple's stock transactions volume and news count:

Table: AAPL Stock's Transactions Volume and News count Chi-Square test
Note: The contingency table provides the following information: the observed cell totals, (the expected cell totals) and [the chi-square statistic for each cell].

The chi-square statistic is 2.3333. The p-value is 0.12663. The result is not significant at p < 0.01. As the data set collected was small, the results might not be accurate. From the chi-square table, the χ² value needed to reject the hypothesis at the 0.01 significance level is >= 6.6349, but the value we obtained is χ² = 2.3333, which is < 6.6349, so we cannot reject the hypothesis. The data therefore give no evidence of an association between Apple's stock transactions volume and news count.

Time permitting, the chi-square test can be performed on the rest of the collected data, along with more experiments to find the relationships between the data sets.

CONCLUSIONS

In this paper, we have worked on identifying relationships between the Twitter and news activity around a company and its short-term market performance, using collected tweet data and news data. Overall, our analysis of individual company stocks gave positive correlations (0.37 to 0.61) with the Twitter data for that company. However, neither Apple's stock close price nor its transactions volume was associated with the news count. In future we can focus on collecting more Twitter and News data and analyze different combinations to find the correlation and similarity between the data sets.

REFERENCES

[1] Boris Podobnik, Davor Horvatic, Alexander M. Petersen, and H. Eugene Stanley. 2009. Cross-correlations between volume change and price change. Proceedings of the National Academy of Sciences of the United States of America, 106, 52 (December 29, 2009), 22079–22084. DOI= http://www.pnas.org/cgi/doi/10.1073/pnas.0911983106
[2] Line Graphs, Math Goodies, 2018. https://www.mathgoodies.com/lessons/graphs/line Accessed: 2018-07-01.
[3] Statistics - Correlation (Coefficient analysis), 2018. https://gerardnico.com/data_mining/correlation Accessed: 2018-07-01.
[4] Tushar Rao and Saket Srivastava. 2012. Analyzing Stock Market Movements Using Twitter Sentiment Analysis. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (Aug 26-29, 2012), 119–123.
[5] Charles C. Ying. 1966. Stock market prices and volume of sales. Econometrica, 34 (Jul 1966), 676–685.
[6] Robert L. Crouch. 1970. The volume of transactions and price changes on the New York Stock Exchange. Financial Analysts Journal, 26 (Jul.-Aug. 1970), 104–109.
[7] Peter K. Clark. 1973. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41 (Jan 1973), 135–155.
[8] Venkata Sasank Pagolu, Kamal Nayan Reddy Challa, Ganapati Panda, and Babita Majhi. 2016. Sentiment Analysis of Twitter Data for Predicting Stock Market Movements. In Proceedings of International Conference on Signal Processing, Communication, Power and Embedded System (Paralakhemundi, India, October 3-5, 2016). DOI= https://doi.org/10.1109/SCOPES.2016.7955659
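
The three analyses described in the paper (correlation between two daily series, the above-average binary similarity test, and the chi-square test of independence with 1 degree of freedom) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names and the sample numbers are hypothetical, the paper does not say which similarity measure was used (simple matching is assumed here), values equal to the average are treated as "0" by assumption, and no continuity correction is applied. Only the critical value 6.6349 is taken from the paper.

```python
import math

CRITICAL_VALUE_DF1_P01 = 6.6349  # chi-square critical value at 0.01, df = 1 (from the paper)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def binarize_above_mean(values):
    """Mark values above the series average as 1, all others as 0.

    Assumption: a value exactly equal to the average is marked 0.
    """
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def simple_matching(a_bits, b_bits):
    """Share of days on which both binary series agree (assumed measure)."""
    if len(a_bits) != len(b_bits) or not a_bits:
        raise ValueError("need two non-empty series of the same length")
    matches = sum(1 for a, b in zip(a_bits, b_bits) if a == b)
    return matches / len(a_bits)


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def chi_square_2x2(a_bits, b_bits):
    """Pearson chi-square statistic and p-value for two binary series."""
    if len(a_bits) != len(b_bits) or not a_bits:
        raise ValueError("need two non-empty series of the same length")
    observed = [[0, 0], [0, 0]]
    for a, b in zip(a_bits, b_bits):
        observed[a][b] += 1
    n = len(a_bits)
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    if 0 in row or 0 in col:
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, p_value_df1(stat)


if __name__ == "__main__":
    # Hypothetical daily values, for illustration only (not data from the paper).
    close_price = [171.0, 172.5, 175.0, 168.0, 167.5, 169.0, 176.0, 166.0]
    news_count = [12, 15, 4, 5, 3, 14, 16, 2]

    price_bits = binarize_above_mean(close_price)
    news_bits = binarize_above_mean(news_count)
    stat, p_value = chi_square_2x2(price_bits, news_bits)

    print("correlation:", round(pearson(close_price, news_count), 4))
    print("similarity:", simple_matching(price_bits, news_bits))
    print("chi-square:", round(stat, 4), "p-value:", round(p_value, 4))
    print("reject independence at 0.01:", stat >= CRITICAL_VALUE_DF1_P01)
```

With 1 degree of freedom the p-value has the closed form erfc(sqrt(χ²/2)), which is why no statistics library is needed. Applied to the statistics reported in the paper, it gives roughly 0.378 for χ² = 0.7778 and 0.127 for χ² = 2.3333, in agreement with the reported p-values.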

