
Politics and Big Data

The importance of social media as a way to monitor an electoral campaign is well established. Day-by-day, hour-by-hour evaluation of the evolution of online ideas and opinion allows observers and scholars to monitor trends and momentum in public opinion well before traditional polls. However, there are difficulties in recording and analyzing often brief, unverified comments, while the unequal age, gender, social and racial representation among social media users can produce inaccurate forecasts of final polls. Reviewing the different techniques employed using social media to nowcast and forecast elections, this book assesses their achievements and limitations while presenting a new technique of “sentiment analysis” to improve upon them. The authors carry out a meta-analysis of the existing literature to show the conditions under which social media-based electoral forecasts prove most accurate, while new case studies from France, the United States and Italy demonstrate how much more accurate “sentiment analysis” can prove.

Andrea Ceron is Assistant Professor of Political Science at Università degli Studi di Milano, Italy. Luigi Curini is Associate Professor of Political Science at Università degli Studi di Milano, Italy. Stefano Maria Iacus is Full Professor of Mathematical Statistics and Probability at Università degli Studi di Milano, Italy.

Politics and Big Data: Nowcasting and Forecasting Elections with Social Media
Andrea Ceron, Luigi Curini and Stefano M. Iacus

First published 2017 by Routledge, Park Square, Milton Park, Abingdon, Oxon OX14 4RN, and by Routledge, 711 Third Avenue, New York, NY 10017. Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2017 Andrea Ceron, Luigi Curini and Stefano M. Iacus

The right of Andrea Ceron, Luigi Curini and Stefano M. Iacus to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Library of Congress Cataloging in Publication Data
Names: Ceron, Andrea, author. | Curini, Luigi, author. | Iacus, Stefano M. (Stefano Maria), author.
Title: Politics and big data : nowcasting and forecasting elections with social media / Andrea Ceron, Luigi Curini and Stefano M. Iacus.
Description: Abingdon, Oxon ; New York, NY : Routledge, [2017] | Includes bibliographical references.
Identifiers: LCCN 2016034245 | ISBN 9781472466662 (hbk) | ISBN 9781315582733 (ebk)
Subjects: LCSH: Election forecasting. | Social media—Political aspects. | Big data—Political aspects. | Internet in political campaigns.
Classification: LCC JF1048 .C47 2017 | DDC 324.900285/57—dc23
LC record available at https://lccn.loc.gov/2016034245

ISBN: 978-1-4724-6666-2 (hbk)
ISBN: 978-1-315-58273-3 (ebk)

Typeset in Times New Roman by Apex CoVantage, LLC

Contents

List of figures vi
List of tables viii
Introduction 1
1 Social media electoral forecasts: an overview
2 From noise to signal in sentiment and opinion analysis 38
3 Nowcasting and forecasting electoral campaigns: evidence from France, the United States and Italy 68
4 Leaders, promises and negative campaigning: digging into an electoral campaign through social media 105
5 Social media and electoral forecasts: sources of bias and meta-analysis 132
6 Conclusion: “To predict or not to predict?” Future avenues of social media research within and beyond electoral forecasts 154
Postscript 171
Index 175

Figures

2.1 Needs and solutions in textual analysis 45
2.2 The data generating process in topic models 49
2.3 Intuitive reason why, when the noise D0 category is dominant in the data, the estimation of P(S | D) is reasonably more accurate than the estimation of counterpart P(D | S) 52
2.4 The estimated sentiment toward Donald Trump 62
2.5 Reasons for positive sentiment toward Donald Trump 63
2.6 Reasons for negative sentiment toward Donald Trump 64
3.1 First round of the 2012 French presidential election: vote share predictions according to electoral surveys and SASA by presidential candidates (SASA) 72
3.2 Flow of preferences expressed on Twitter during the electoral campaign for the second round of the 2012 French presidential election (SASA) 74
3.3 Second round of the 2012 French presidential election: vote share predictions according to electoral surveys and SASA by presidential candidates 75
3.4 2012 US presidential election: daily social voting intentions according to SASA 77
3.5 Estimates of the lead-lag analysis in the short term 87
3.6 General expectations on Twitter about the outcome of the 2012 centre-left primary election (SASA) 89
3.7 Centre-left primary election, first round: candidates’ share of votes according to SASA and comparison with actual results 90
3.8 First round of the 2012 Italian centre-left primary elections: absolute difference between final results and predictions according to survey polls and SASA 91
3.9 Italian Democratic Party primary election: daily social voting intentions according to SASA 96
3.10 Italian Democratic Party primary election: daily evolution of the level of negative campaigning in Twitter discussion over candidates 98
4.1 Flow of preferences expressed on Twitter (SASA) during the last month of electoral campaign for the five main coalitions in the 2013 Italian general elections 109
4.2 Predicted vote share for the centre-right and centre-left coalition according to survey polls throughout the 2013 Italian electoral campaign 112
4.3 Comparison between actual vote share and, respectively, predictions according to survey polls, instant polls and SASA 113
4.4 Evolution of the different strategies across the campaign 117
4.5 Marginal effect of negative campaign toward rivals conditional on the magnitude of negative campaign suffered (with 95% confidence interval) 120
4.6 Marginal effect of distributive/clientelistic policy promises conditional on the party making the promise (with 95% confidence interval) 121
4.7 Evolution of sentiment toward Italian political leaders before the 2013 Italian general elections (SASA) 123
4.8 Evolution of sentiment toward Italian political leaders after the 2013 Italian general elections (SASA) 124
5.1 Evolution over time of the MAE of social media leaders’ approval (SASA) 135
5.2 The evolution of the MAE over time as the election date approached (SASA) 138
5.3 Predicted and actual vote shares related to the first round of the 2012 French legislative elections 140
5.4 Marginal effect of the number of tweets on the MAE as turnout changes (with 90% confidence interval) 142
5.5 Marginal effect of number of posts on the MAE at different levels of turnout (with 95% confidence interval) 150
6.1 Distribution of ideological self-placement of citizens by frequency of SNS usage 159
6.2 Distribution of ideological self-placement of citizens by perceived importance of SNS for expressing own views 159
PS.1 2016 US presidential election: daily social voting intentions according to iSA 173

Tables

1.1 A typology of methods employed to perform social media-based electoral forecast 16
2.1 What characterizes text analysis techniques 42
2.2 Example of document-term matrix 43
2.3 Comparison of individual and aggregated classifiers 58
3.1 Temporal evolution of the main topics of the US presidential campaign according to social media conversations 78
3.2 Reaction to the presidential debates during the US electoral campaign according to survey polls and Twitter (SASA) 79
3.3 2012 US presidential election: accuracy of the predictions 83
3.4 Comparison of the accuracy of Twitter forecast made through mentions, automated sentiment analysis and SASA method (2012 US presidential election, popular vote) 84
3.5 Estimates of the lead-lag analysis in the short term 86
3.6 Comparison of the accuracy of Twitter forecast and surveys polls in the Italian primary election of the centre-left coalition (first round) 92
3.7 Comparison of the accuracy of Twitter forecast and surveys polls in the Italian primary election of the centre-left coalition (second round) 93
3.8 Comparison of the accuracy of Twitter forecast made through mentions, automated sentiment analysis and SASA method (Italian primary election of the centre-left coalition, first round) 93
3.9 Italian Democratic Party primary election: contrasting the actual votes of the candidates with Twitter (SASA) predictions (both weighted and non-weighted) 97
4.1 Effect of different campaign strategies on a party’s share of voting intentions 119
4.2 Explaining surveys voting intentions according to leaders’ positive sentiment 126
5.1 Average difference and correlation of leaders’ popularity ratings between mass surveys and social media 134
5.2 Fractional logit of the MAE 141
5.3 Number of predictions by election type 144
5.4 List of countries and number of predictions 145
5.5 Determinants of the accuracy of social media predictions: fractional logit of the electoral forecast MAE 148
6.1 Weighting social media predictions: fractional logit of the electoral forecast MAE 157
6.2 Distribution of ideological self-placement of Italian voters versus subsample active on social media 158
6.3 Survey data as determinants of the accuracy of social media predictions: fractional logit of the electoral forecast MAE 163

Future avenues of social media research

We can notice from Table 6.3 that anchoring social media predictions
to surveys can dramatically decrease the MAE; therefore, social media data can be profitably integrated with survey data to improve the performance of the prediction. The MAE decreases, in fact, by points on average when surveys (i.e., the variable Polls) are taken into account when making social media electoral forecasts. Interestingly, we also found a negative correlation (−0.11) between the MAE obtained by traditional survey polls, which summarizes how easily the election could be predicted using traditional tools, and the MAE of social media predictions. This result is confirmed in our meta-analysis (see Table 6.3), which revealed that the MAE of social media was lower in electoral contexts in which the MAE of surveys was higher.12 In other words, the MAE of social media analysis and that of traditional survey polls moved in opposite directions: when the MAE of the survey polls increased, that is, when electoral results could hardly be predicted by means of surveys, the MAE of social media analysis decreased, that is, the accuracy of social media analysis increased. And vice versa. Why did this happen?
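Before turning to that question, the accuracy measure used throughout this discussion can be made concrete with a toy computation. The snippet below is a minimal sketch in Python; the vote shares are invented for illustration and are not data from the book:

```python
# Minimal sketch of the MAE comparison discussed above.
# All vote shares are invented for illustration; they are not the book's data.

def mae(predicted, actual):
    """Mean absolute error between predicted and actual vote shares, in points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

actual       = [38.0, 31.0, 22.0]   # hypothetical final vote shares (%)
survey_polls = [35.0, 34.0, 23.0]   # hypothetical survey-based forecast
social_media = [37.0, 32.0, 22.5]   # hypothetical social media-based forecast

print(round(mae(survey_polls, actual), 2))   # 2.33
print(round(mae(social_media, actual), 2))   # 0.83
```

On these made-up numbers the social media forecast happens to be the more accurate one; across the elections in the meta-analysis, as noted above, the two errors tended to move in opposite directions (correlation −0.11).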
Probably the information available on social media made it possible to catch some trends that were not captured by traditional opinion polls; for instance, voting intentions toward new parties, anti-system parties or parties markedly affected by the spiral of silence due to social desirability. These features, as well as other trends that manifest themselves in the online realm first (see, for instance, Chapter 3), may not emerge when analyzing public opinion through traditional techniques, thereby increasing the MAE of surveys; however, these same trends might be caught by social media analysis, and for this reason the MAE of social media predictions can decrease. This result points, therefore, to the importance of integrating social media data and survey data. If it is true, as our analysis has shown, that survey data can improve the accuracy of social media predictions, it is also true that survey research can benefit from listening to social media-based predictions, as these predictions can in some cases anticipate new trends and can be used to evaluate changes in public opinion and to adjust the estimates of surveys. Summing up, to reiterate what was already noted, our review and the connected statistical analysis suggest that the analysis of social media is not always as bad as it is sometimes depicted; in fact, depending on the technique used and on the electoral context, we observed many promising predictions based on social media. In this regard, this is a path worth pursuing.

Conclusion

The application of sentiment analysis to politics and electoral results is just one of the many examples employing Big Data that we have witnessed in recent years. Some of those examples have clearly shown the utility of addressing old research questions in new ways, and of addressing new questions that emerge from this process. After all, the goal of social science research is to understand causal processes, and this inevitably requires deductive theory
testing (Dalton 2016). Social and political scientists are accustomed to using statistical methods to test theories; while doing so, the focus is typically on the coefficients of the theoretically derived input variables. When using Big Data in politics (in particular with respect to elections), the focus is usually on the outputs rather than the inputs (Wilkerson and Casas 2017), leading researchers to be first and foremost concerned with prediction accuracy and less concerned with explanation. Still, there is nothing inherently contradictory between Big Data methods and theory testing (Nagler and Tucker 2015). For example, in Chapter 4 we have shown what we can learn by employing social media data in terms of the role of political leaders and their valence endowment, the impact of policy promises and that of negative campaigning. Of course, other researchers highlight the limits of these new types of data. This is far from being a surprise; on the contrary, it is a quite common story that appears every time we have to deal with new data (big or small, it does not matter). For example, at the early stage of genomic analysis, we saw many naive applications of standard statistical or machine learning techniques to, for example, microarray data. Of course, gene expressions contain information, but standard machine learning or data mining techniques will not reveal any. Indeed, joint multiple-testing techniques and other ad hoc methods have been developed to assess, on correct statistical grounds, the functional relationships between genes and pathologies. To summarize, when social scientists come to data or, even earlier, when they approach data analysis, the main focus should not be the data per se or the model alone. The focus should always be on the correct statistical model given the data at hand. The blind application of machine learning, data mining techniques or other well-known methods may
severely affect the results of the analysis, no matter how good these techniques are for the specific task they were designed for. Social science is not merely a set of tools but is mostly a way of looking at the world. That is, social science is (or should also be) about how one thinks of problems and how he or she tries (eventually) to solve them. Such a process, from studying problems to solving them, can explain why – at least in some contexts – “the influence of quantitative social science (including the related technologies, methodologies, and data) on the real world has been growing fast” (King 2014: 3). Big Data are “today’s data”, and clearly all data are different beasts which require different methodologies. But these new data are more and more accessible, and even if they require powerful backends and advanced statistical engines, these can today be easily managed from everyone’s laptop, even using open source tools. Computer science and information technology are certainly important, as are the connections with other disciplinary fields such as political communication and political science; but the general principle of looking into the data and thinking about the problem in a statistical way seems to be too often neglected. Our results underpin a theoretical framework that goes in a different direction from the idea of building a representative sample of the online population in order to carry out analysis at the individual level. On the contrary, the online crowd could be more useful for anticipating trends (and outcomes) when it is considered in the aggregate, as this can provide the finest description of the balance of power between different parties/candidates. What is more, our analysis has provided strong evidence in favor of mixing data from different sources in order to better understand public opinion dynamics. At the same time, however, our results confirm that the method does matter, and the information available on
social media should be analyzed using the proper techniques. Of course, as with any other data, social media research will require replicability (King 1995) to gauge its viability as a source to study public opinion (see also Lacy et al. 2015 for a discussion of best practices with respect to content analysis and social media data). Right now this is still (largely) lacking, due to the interplay of the massive amount of data analyzed (difficult to store and share on standard public platforms), their economic value and the (sometimes) rather obscure and ad hoc algorithms adopted to produce the reported estimates (Sudulich et al. 2014). But at least with respect to this latter aspect, some steps have lately been taken in the right direction. To conclude, it is time for the data to go back to social science or, better, for the social scientists (or data scientists, as it is now fashionable to call people working on data) to start looking at Big Data (with B and D capitalized) in an inclusive way. We consider this book a contribution in this direction.

Notes

1 http://www.guardian.co.uk/technology/2011/mar/17/us-spy-operation-social-networks
2 http://www.digitalspy.com/music/news/a471915/justin-bieber-twitter-followers-50-percent-are-fake-says-report.html
3 http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html
4 This is a problem also for standard offline surveys, as poll response rates have kept falling dramatically in recent years, owing to mobile phones, caller identification and a rise in phone solicitation, while the difficulties of reaching many population segments still persist (Goidel 2011; Hillygus 2011; Tourangeau and Plewes 2013).
5 For example, the percentage of population ages 55–64 increased on Twitter by 79% in the last year. See Global Web Index, ‘SOCIAL PLATFORMS GWI.8 UPDATE: Decline of Local Social Media Platforms.’ URL: https://www.globalwebindex.net/
social-platforms-gwi-8-update-decline-of-local-social-media-platforms/
6 Notice that the number of observations was equal to 231 because in five cases we were not able to evaluate whether the prediction was made focusing on only one tweet per user, as such information was not provided by the source of the forecasts.
7 Polling organizations also commonly use post-stratification weights to adjust the collected sample characteristics to match estimated population values (Little 1993).
8 Even when we exclude from the analysis the predictions based on SASA, which only produces an aggregate measure of sentiment and therefore does not allow one to distinguish the opinions of single users, the effect of Weight is still not statistically significant at the 95% level of confidence.
9 In other words, if for surveys of probability samples topic coverage follows naturally from population coverage, for social media analyses topic coverage can, in principle, be achieved without population coverage. That is, other mechanisms of information propagation that are particular to the dynamics of social media may lead a collection of posts to accurately distill larger conversations in the full population despite the lack of population coverage among posters (Schober et al. 2016).
10 Interestingly, the contrast that we have often highlighted between methods that focus on the individual opinion of each single user and methods that, on the contrary, focus directly on the aggregate opinion recalls the long-lived (and heated) debate between those who see public opinion as the aggregate of individual opinions that can be recovered, for example, through surveys, and those who adopt an Enlightenment idea of public opinion that conceives it as the collective judgment arising during a public deliberation (Habermas 1980; Bourdieu 1979), i.e., as a “sharing opinion” rather than an “aggregating opinion”.
11 In a few cases we did not find any
survey data and, therefore, it was impossible to compute the value of the variable MAE Polls. Accordingly, the number of observations dropped to 215.
12 Notice that if we add the variable MAE Polls to Models 1 and 2 of Table 5.5 (Chapter 5), we find the same negative association between MAE and MAE Polls.

References

Ampofo, L., Anstead, N., and O’Loughlin, B. (2011) ‘Trust, confidence, and credibility’, Information, Communication & Society, 14(6): 850–871. doi: 10.1080/1369118X.2011.587882.
Anderson, C. (2008) ‘The end of theory: The data deluge makes the scientific method obsolete’. Available at: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
Bakker, T.P., and De Vreese, C.H. (2011) ‘Good news for the future? Young people, internet use, and political participation’, Communication Research, 20: 1–20.
Barberá, P. (2015) ‘Who is the most conservative republican candidate for president?’, Washington Post, June 16. Available at: https://www.washingtonpost.com/blogs/monkey-cage/wp/2015/06/16/who-is-the-most-conservative-republican-candidate-for-president/
Barberá, P., and Rivero, G. (2014) ‘Understanding the political representativeness of Twitter users’, Social Science Computer Review, 33(6): 712–729. doi: 10.1177/0894439314558836.
Best, S.J., and Krueger, B.S. (2005) ‘Analyzing the representativeness of internet political participation’, Political Behavior, 27(2): 183–216.
Bhutta, C.B. (2012) ‘Not by the book: Facebook as a sampling frame’, Sociological Methods & Research, 41: 57–88.
Bourdieu, P. (1979) ‘Public opinion does not exist’, in Siegelaub, S., and Mattelart, A. (eds.), Communication and Class Struggle, 124–310. New York: International General/IMMRC.
Butler, D. (2013) ‘When Google got flu wrong: US outbreak foxes a leading web-based method for tracking seasonal flu’, Nature, February 13. Available at: http://www.nature.com/news/when-google-got-flu-wrong-1.12413
Ceron, A., Curini, L., and Iacus, S.M. (2016) ‘iSA: A fast, scalable and accurate algorithm for sentiment analysis of social media content’, Information Sciences, 367–368: 105–124.
Choy, M., Cheong, M., Ma, N., and Koo, P. (2011) ‘A sentiment analysis of Singapore presidential election 2011 using Twitter data with census correction’, ArXiv. Available at: http://arxiv.org/abs/1108.5520
Choy, M., Cheong, M., Ma, N., and Koo, P. (2012) ‘US presidential election 2012 prediction using census corrected Twitter model’, ArXiv. Available at: http://arxiv.org/abs/1211.0938
Conover, M., Ratkiewicz, J., Francisco, M.R., Gonçalves, B., Menczer, F., and Flammini, A. (2011) ‘Political polarization on Twitter’, ICWSM, 133: 89–96.
Cook, S., Conrad, C., Fowlkes, A., and Mohebbi, M. (2011) ‘Assessing Google Flu Trends performance in the United States during the 2009 influenza virus A (H1N1) pandemic’, PloS One, 6(8): e23610.
Dalton, R. (2016) ‘The potential of “big data” for the cross-national study of political behavior’, International Journal of Sociology, 46: 1–13.
de Zúñiga, H.G., Nakwon, J., and Valenzuela, S. (2012) ‘Social media use for news and individuals’ social capital, civic engagement and political participation’, Journal of Computer-Mediated Communication, 17(3): 319–336.
Diaz, F., Gamon, M., Hofman, J.M., and Kiciman, E. (2016) ‘Online and social media data as an imperfect continuous panel survey’, PloS One. doi: 10.1371/journal.pone.0145406.
Dwi Prasetyo, N., and Hauff, C. (2015) ‘Twitter-based election prediction in the developing world’, Proceedings of the 26th ACM Conference on Hypertext & Social Media, Guzelyurt, Cyprus, 1–4 September 2015.
Farrell, H., and Drezner, D.W. (2008) ‘The power and politics of blogs’, Public Choice, 134(1–2): 15–30.
Franch, F. (2013) ‘(Wisdom of the crowds)2: 2010 UK election prediction with social media’, Journal of Information Technology & Politics, 10(1): 57–71.
Franklin, M.N. (2002) ‘The dynamics of electoral participation’, in LeDuc, L., Niemi, R.G., and Norris, P. (eds.), Comparing Democracies 2: New Challenges in the Study of Elections and Voting. Thousand Oaks; London: Sage, 163.
Gayo-Avello, D. (2011) ‘Don’t turn social media into another “Literary Digest” poll’, Communications of the ACM, 54(10): 121–128.
Gayo-Avello, D. (2013) ‘A meta-analysis of state-of-the-art electoral prediction from Twitter data’, Social Science Computer Review, 31(6): 649–679.
Gayo-Avello, D., Metaxas, P.T., and Mustafaraj, E. (2011) ‘Limits of electoral predictions using Twitter’, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 17–21 July 2011.
Goidel, K. (2011) Political Polling in the Digital Age: The Challenge of Measuring and Understanding Public Opinion. Baton Rouge: LSU Press.
Habermas, J. (1980) The Structural Transformation of the Public Sphere. Cambridge, MA: The MIT Press.
Herbst, S. (2011) ‘Un(numbered) voices? Reconsidering the meaning of public opinion in a digital age’, in Goidel, K. (ed.), Political Polling in the Digital Age: The Challenge of Measuring and Understanding Public Opinion, 85–98. Baton Rouge: Louisiana State University Press.
Hillygus, D.S. (2011) ‘The evolution of election polling in the United States’, Public Opinion Quarterly, 75(5): 962–981.
Hopkins, D., and King, G. (2010) ‘A method of automated nonparametric content analysis for social science’, American Journal of Political Science, 54(1): 229–247.
Huberty, M. (2015) ‘Can we vote with our tweet? On the perennial difficulty of election forecasting with social media’, International Journal of Forecasting, 31(3): 992–1007.
Iacus, S.M. (2014) ‘Big data or big fail? The good, the bad and the ugly and the missing role of statistics’, Electronic Journal of Applied Statistical Analysis, 5(11): 4–11.
Jensen, M.J., and Anstead, N. (2013) ‘Psephological investigations: Tweets, votes, and unknown unknowns in the Republican nomination process’, Policy & Internet, 5(2): 161–182.
Jensen, M.J., Jorba, L., and Anduiza, E. (2012) ‘Introduction’, in Anduiza, E., Jensen, M.J., and Jorba, L. (eds.), Digital Media and Political Engagement Worldwide: A Comparative Study, 1–15. New York: Cambridge University Press.
King, G. (1995) ‘Replication, replication’, PS: Political Science and Politics, 28: 444–452.
King, G. (2014) ‘Restructuring the social sciences: Reflections from Harvard’s Institute for Quantitative Social Science’, PS: Political Science and Politics, 47(1): 165–172.
Lacy, S., Watson, B.R., Riffe, D., and Lovejoy, J. (2015) ‘Issues and best practices in content analysis’, Journalism & Mass Communication Quarterly, 92(4): 791–811.
Lasorsa, D.L., Lewis, S.C., and Holton, A.E. (2012) ‘Normalizing Twitter: Journalism practice in an emerging communication space’, Journalism Studies, 13(1): 19–36.
Little, R. (1993) ‘Post-stratification: A modeler’s perspective’, Journal of the American Statistical Association, 88: 1001–1012.
Moy, P., and Murphy, J. (2016) ‘Problems and prospects in survey research’, Journalism & Mass Communication Quarterly, 93(1): 16–37.
Murphy, J.J. (2014) ‘Using respondent tweets to fill in survey gaps’, Quirk’s Marketing Research Media. Available at: http://www.quirks.com/articles/2014/20140125-1.aspx
Murphy, J.J., Keating, M.D., and Edgar, J. (2014) ‘Crowdsourcing in the cognitive interviewing process’, Proceedings of the 2013 Federal Committee on Statistical Methodology Research Conference, Washington, DC, 4–6 November 2013.
Nagler, J., and Tucker, J. (2015) ‘Drawing inferences and testing theories with big data’, PS: Political Science and Politics, 48: 84–88.
Noble, I. (2003) ‘Human genome finally complete’, BBC News. Available at: http://news.bbc.co.uk/1/hi/sci/tech/2940601.stm
O’Connor, B., Balasubramanyan, R., Routledge, B., and Smith, N.A. (2010) ‘From tweets to polls: Linking text sentiment to public opinion time series’, Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, Washington, DC, 23–26 May.
Plutzer, E. (2002) ‘Becoming a habitual voter: Inertia, resources, and growth in young adulthood’, American Political Science Review, 96(1): 41–56.
Rainie, L., Smith, A., Schlozman, K.L., Brady, H., and Verba, S. (2012) ‘Social media and political engagement’, Pew Internet & American Life Project. Available at: http://pewinternet.org/Reports/2012/Political-engagement.aspx
Rhodes, B.B., and Marks, E.L. (2011) ‘Using Facebook to locate sample members’, Survey Practice, 4(5). Available at: http://www.surveypractice.org/index.php/SurveyPractice/article/view/83/pdf
Schober, M.F., Pasek, J., Guggenheim, L., Lampe, C., and Conrad, F.G. (2016) ‘Social media analyses for social measurement’, Public Opinion Quarterly, 80(1): 180–211. doi: 10.1093/poq/nfv048.
Shi, L., Agarwal, N., Agrawal, A., Spoelstra, G., and Spolestra, J. (2012) ‘Predicting US primary elections with Twitter’, unpublished manuscript. Available at: http://snap.stanford.edu/social2012/papers/shi.pdf
Sigelman, L., Roeder, P.W., Jewell, M.E., and Baer, M.A. (1985) ‘Voting and nonvoting: A multi-election perspective’, American Journal of Political Science, 29(4): 749–765.
Spierings, N., and Jacobs, K. (2014) ‘Getting personal? The impact of social media on preferential voting’, Political Behavior, 36(1): 215–234.
Sudulich, L., Wall, M., Gibson, R., Cantijoch, M., and Ward, S. (2014) ‘Introduction: The importance of method in the study of the “political internet”’, in Cantijoch, M., Gibson, R., and Ward, S. (eds.), Analyzing Social Media Data and Web Networks: New Methods for Political Science, 1–21. Basingstoke: Palgrave Macmillan.
Tjong Kim Sang, E., and Bos, J. (2012) ‘Predicting the 2011 Dutch senate election results with Twitter’, Proceedings of SASN 2012, the EACL 2012 Workshop on Semantic Analysis in Social Networks, Avignon, France, 23 April 2012.
Tourangeau, R., and Plewes, T.J. (2013) Nonresponse in Social Science Surveys: A Research Agenda. Washington, DC: The National Academies Press.
Tsakalidis, A., Papadopoulos, S., Cristea, A., and Kompatsiaris, Y. (2015) ‘Predicting elections for multiple countries using Twitter and polls’, IEEE Intelligent Systems. doi: 10.1109/MIS.2015.17.
Vaccari, C., Valeriani, A., Barberá, P., Bonneau, R., Jost, J.T., Nagler, J., and Tucker, J. (2013) ‘Social media and political communication: A survey of Twitter users during the 2013 Italian general election’, Rivista Italiana di Scienza Politica, 43: 325–355.
Wang, W., Rothschild, D., Goel, S., and Gelman, A. (2014) ‘Forecasting elections with non-representative polls’, International Journal of Forecasting. Available at: http://www.sciencedirect.com/science/article/pii/S0169207014000879
Wei, L., and Hindman, D.B. (2011) ‘Does the digital divide matter more? Comparing the effects of new media and old media use on the education-based knowledge gap’, Mass Communication and Society, 14(2): 216–235.
Wilkerson, J., and Casas, A. (2017) ‘Large-scale computerized text analysis in political science: Opportunities and challenges’, Annual Review of Political Science, forthcoming.

Postscript (15 November 2016)

A lesson from the 2016 US presidential race: you cannot predict elections without social media data?

Soon after having submitted the final draft of our book, one of the most shocking elections in American political history (to say the least) took place. Given that we monitored the last month of the 2016 US presidential campaign day by day, exactly as we did for the analogous race in 2012 (see Chapter 3), we feel somehow compelled to add this short postscript.1 The triumph of Donald Trump on the 8th of November was, in one way or another, a huge surprise, but perhaps not equally a surprise for everyone. The results have shown a striking disconnect between the traditional media and polls on the one hand, and the electorate on the other. For weeks the media reported a virtually closed match, with predictions based on surveys that gave Hillary Clinton a clear lead in the popular vote as well as in almost every swing state, quite often with a comfortable margin. Then came election night, and we woke up to a very different reality than the one portrayed for weeks in the press. True, the media misrepresented to a certain degree the interpretation given to poll data throughout the campaign,2 but it is not heresy to argue that those same polls did not fare that well. Indeed, until election night, several predictive models, mainly based on the analysis of survey data, assigned Hillary Clinton a probability of victory that ranged from 72% – according to Nate Silver’s popular blog FiveThirtyEight – to 85% (New York Times) to an astonishing 98%–99%, according for example to the Huffington Post and the Princeton Election
Consortium.3 The prediction about the final votes share of the two main candidates was inaccurate too Nate Silver attributed a margin of 3.6% to Hillary Clinton, while the estimates provided by Realclearpolitics.com, which – as we have already pointed out in Chapter – computes the average value of survey polls, claimed that Clinton was leading by 3.3 points Overall, the outcome of single surveys was even worse and polling companies working for ABC, CBS, FOX News, and The Economist gave to Hillary Clinton a 4-point advantage Only the IBD survey, which considered Trump points above Clinton, and the Los Angeles Times, which registered +3.2 for the Republican candidate, were suggesting that Trump could at least win 172  Postscript the national vote.4 Honestly, these predictions were not completely wrong given that the actual result falls within the margin of error of some of them Nevertheless, survey data clearly failed to catch the strong signal coming from American voters Why that? One possibility is related to what is called “confirmation bias”, i.e., excluding the evidence that one does not like (or expect) and filtering in only what confirms a hypothesis Possibly, however, the biggest reason goes back to what we already noticed in Chapter Exactly as the “shy Tory” thesis in the UK elections has been set forward as an explanation for the mismatch between the prior forecasts and the election outcome, the same could be argued for a “shy Trump voter” effect Accordingly, Trump supporters could have refrained from expressing their voting choice in survey polls, either because they were distrustful of institutions, including polling companies, or because after the scandals reported in the media involving the Republican candidate, they felt uncomfortable expressing their actual voting behavior.5 The “confirmation bias effect” combined with the “shy Trump voter effect” to generate a “perfect storm” that produced a very weak performance of survey polls (without mentioning 
the extremely poor results of the exit polls6) However, as we have argued throughout the present book, people could feel more free to express online their personal views without being affected by conformism and social desirability And this is what, after all, happened, at least according to our analysis To monitor the evolution of American public opinion, we followed our usual approach illustrated at length in the previous pages, focusing on comments coming from the United States that were discussing explicitly about the two candidates on Twitter (Overall, we have analyzed around 32.5 million posts on a daily basis, excluding the huge peaks during the presidential debates) However, we also decided to apply what is discussed in Chapter 5; that is, we integrated survey data with the information on the mood of social media analyzed by means of iSA as follows: from 19th of September till the 2nd of October, we ran an econometric model in which we used the results of our sentiment analysis to predict on a daily basis the estimates of national survey data as published on Real Clear Politics web-page It turned out that by using just the percentage of negative comments toward Donald Trump and the online voting intentions expressed in support of Hillary Clinton we were able to explain 97% of the overall trend in the survey Then, beginning on the 3rd of October, we relied on the just mentioned two social-media indicators, conveniently weighted according to the analysis just illustrated, to produce our estimates of the electoral forecast According to our analysis, at the beginning of October Donald Trump appeared to overtake Hillary Clinton in preferences (see Figure PS.1) Then, Trump faced difficulty when attacked on tax evasion, and even more after the video published by the Washington Post in which Trump denigrated the image of women If the sex video scandal brought down Trump in early October, the problems involving Clinton’s campaign (highlighted by WikiLeaks) and the 
desire to “drain the swamp”, favored Trump’s recovery Even discounting for the propaganda bots used by Trump staff (estimated to be about 20%–30% of the total volume7), the sentiment toward Trump pushed upwards, and in the second half of October, although polls Postscript  173 Figure PS.1  2016 US presidential election: daily social voting intentions according to iSA showed Clinton largely ahead, the Republican candidate was sharply rising, so much so that according to our estimates, on the day in which the FBI director “reopened” the email case against Hillary Clinton, Trump climbed over Clinton in the popular vote The race then remained narrow, a sort of “too close to call” contest, with Clinton oscillating around the only vantage point (+1.2 our final forecast) Hillary Clinton indeed won the popular vote by 0.6 points (at least according to the updated vote-counting procedure reported as in 15th November), far away from the 3–4 margin points that were predicted on the eve by pollsters But, as we all know, in American elections what really matter are the swing states Among the 14 swing states considered, on 10 occasions our forecast proved to be correct (Florida, Ohio, North Carolina, Georgia, Arizona, Virginia, Colorado, Nevada, New Hampshire, Minnesota) Here, too, our data was able to capture some important signals that escaped other analysts On Twitter, for example, the Trump victory in Ohio and Florida was never questioned, even in mid-October, although other analysts regarded Florida as Clinton’s territory Similarly, the sentiment predicted a fairly easy victory for Hillary Clinton in Nevada and Colorado, while the surveys considered the two were very much in the balance And while no one cared about Pennsylvania, we regarded this as a key state, as it was indeed Five days before the vote, our estimate recorded an advantage of Trump in Pennsylvania, so that on that day, our forecast predicted Trump as the next president of the United States.8 In a nutshell, 
we forecasted a more open game than others did, giving Trump a good percentage of success rate, predicting the victory of the Republican candidate 174  Postscript in some key states and coming very close to the real gap between the two candidates in the popular vote Contrary to media enthusiasm for the oncoming Hillary Clinton victory, our model suggested a higher level of uncertainty: if throwing 10 times the “dice”, in 4.1 circumstances, we would in fact predict a victory for Trump.9 And in the end, Donald Trump won Our estimates proved therefore to be more accurate than survey polls alone, suggesting that combining two sources of data on public opinion can produce gains in terms of accuracy, exactly as discussed in Chapter In fact, survey data can mitigate the bias of social media, while social media can attenuate the bias of survey polls In this respect, the appropriate mixture of multiple sources of data seems nowadays the best way to really get the whole picture, if our aim is to understand the behavior of a volatile public opinion The 2016 American elections confirm once again the ability of social networks to grasp in advance current trends in public opinion and in society at large Obviously, as in all disciplines, the right tools are needed, and this book is about methods after all Still, we cannot deny that public opinion has profoundly changed The way to measure it must change as well: there is no longer the option to avoid listening to social media information This is the path for better nowcasting (and forecasting) politics Notes 1 http://sentimeter.corriere.it/2016/10/17/presidenziali-americane-2016-secondo-la-rete/ 2 http://www.realclearpolitics.com/articles/2016/11/12/it_wasnt_the_polls_that_missed_ it_was_the_pundits_132333.html 3 http://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_ clinton_vs_johnson_vs_stein-5952.html This could have 
been true especially for women backing Donald Trump See http:// fivethirtyeight.com/features/the-polls-missed-trump-we-asked-pollsters-why/?ex_ cid=2016-forecast 6 http://www.slate.com/votecastr_election_day_turnout_tracker.html http://firstmonday.org/ojs/index.php/fm/article/view/7090/5653 8 http://sentimeter.corriere.it/2016/11/03/se-si-votasse-oggi-il-presidente-sarebbe-trumpalmeno-per-la-rete/ 9 http://www.corriere.it/elezioni-presidenziali-usa-2016/notizie/trump-nuovo-presidenteusa-segnali-sottovalutati-due-lezioni-imparare-sull-opinione-pubblica-473743a8-a72e11e6–8208–49eea13f646a.shtml Index Aarts, K 98 administrative data Affordable Care Act 78 agenda-setting power 7–8 agglomerative methods 47–8 Agrawal, M 12 Alfano, Angelino 116, 122–5 American Idol 14 Amrit, C 98 Antweiler, W 13 Arab Spring 9, 13 Augustine, C.B 14 Avery, J.M backlash effect 121 Barclay, F.P 19 Barnes, M.D 14 Barrett, P 19 Basilaia, E Benghazi-gate 81 Bennett, W.L Berlusconi, Silvio 107, 110, 116–17, 122–5, 133 Bermingham, A 22 Bersani, Pier Luigi 39, 87–94, 106, 110, 116, 123, 133, 134 Big Data 1–2, 39, 154–5; results weighted against individual data 155–61 Bin Laden, Osama 13, 80 Bloomberg, Michael 81–2 Bos, J 21, 22 Bossi, Umberto 94–9, 133 Burnap, P 12 Burton, S.H 14 Cambraia, C.A 19 Cameron, M.P 19 campaigning, electoral 105; effectiveness of positive and negative Twitter 113–18; empirical analysis and results 118–22; online voting intentions in 2013 Italian election and 105–13; relationship between leaders sentiment and voting intentions and 122–6 Campante, F Cancellieri, Annamaria 97, 99 Canova, L 12 Casini, Pier Ferdinando 116, 123, 133, 134 Ceron, A Chelala, P 21 Chiao, C 85 Christie, Chris 81–2 CIA (Central Intelligence Agency) 13 Civati, Giuseppe 95, 96–7, 98–9 classification methods 47–52 clicktivism Clinton, Hillary 171–4 cluster analysis 47–8 cluster ensembles 48 Coletto, M 97 collective action 9–10 computational approach to social mediabased electoral forecasts 16–20 Comte, 
F 85 confirmation bias 172 Conway, B.A Corriere della Sera 95 Coskun, M 13 Cristea, A 161 Cristianini, N 14 criticisms of social media-based electoral forecasts 22–4 Cunha, E 19 Cunningham, J.A 14 Cuperlo, Gianni 95, 96–7, 99 Curini, L 8, 12 Dassen, A 98 data mining 47 Dave, K 39 176  Index de Jong, F 85 de Voogd, L 21 Di Pietro, Antonio 106, 123 dissimilarity 47 dissociative methods 47 Dotterweich, L Downs 142 Durahim, A.O 13 Durante, R dynamic multi-topic models 49 Dziurzynski, L 12 Eastwood, Clint 79 e-campaigning 10–11 Eichstaedt, J.C 12 emotions and well-being 12–13 endorsement data 19 estimation of policy positions using SNS 11–12 Facebook 38; Arab Spring and 9; Big Data revolution and 2; electoral forecasts and 15; endorsement data 19; growth of 5; sentiment analysis 21; voting behavior declarations on 18 Falck, O Financial Times 154 Fini, Gianfranco 123, 133 first-level agenda-setting 7–9 forecasts, electoral 132; best practices 154–5; comparing the approval of Italian leaders in 2011 and 132–5; computational approach to social media-based 16–20; discussion of SASA method for 149–51; limits and criticism of 22–4; made through SASA 68–71; meta-analysis of existing studies 143–8; sentiment analysis and machine learning applied to 20–2; 2015 Veneto regional election 135–8; 2012 French legislative elections 138–43; 2012 US presidential election 76–85; use of social media in 14–16; weighting results with individual data in 155–61; see also nowcasts, electoral France: legislative elections, 2012 138–43; presidential election, 2012 71–5 Franch, F 19 Frank, M.Z 13 functional words 40 Giannino, Oscar 112 Giraud-Carrier, C 14 GitHub 54, 59 Gold, R Gonỗalves, M.A 19 Google 13; Flu Trends 14, 154; Insights 8; Trends Grillo, Beppe 106, 112, 116, 123, 124–5 Grimmer, J 48 Guardian, The 154 Guggenheim, L Hale, K Hansen, B 14 Hansen, H.M 14 Hanson, C.L 14 hashtags 69–70; see also Twitter Hayashi-Yoshida asynchronous covariance estimator 86 Heblich, S Hierarchical methods 48 
Hoffmann, M 856 Hollande, Franỗois 71–5 Hopkins, D 39, 51, 52 Hosch-Dayican, B 98 Howard, P.N human and automatic tagging 46–7 Human Genome Project 154 Hung, K 85 Hurricane Sandy 81–2 Hussain, M.M Iacus, S.M 8, 12 Ingroia, Antonio 106, 112, 116 iSA (integrated Sentiment Analysis) 39–40, 52–4; Donald Trump data example 59–65; online voting intentions and 107–13; United States presidential election, 2016 and 172–3 iSAX package 54–8 Italy: comparing approval of leaders in 2011 and election forecasts 132–5; effectiveness of positive and negative Twitter-campaigning in 2013 election in 113–18; online voting intentions in 2013 election 105–13; party leader primaries, 2013 94–9; primary election, 2012 87–94; 2015 Veneto regional election 135–8 Jha, S 12 Jiang, Y Johnson, T.J Jungherr, A Kang, J 85 Kaye, B.K Keim, D.A 48 Kenski, K Kern, M.L 12 Kim, A 14 Index  177 King, G 39, 48, 51, 52 Kompatsiaris, Y 161 Kroutil, L.A 14 Lakshmikanth, S.K 12 Lampos, V 14 Lawrence, S 39 LDA (latent Dirichlet allocation) method 48–9 lead-lag estimation 85 Lee, C 85 Lee, S 85 Lindsay, R 21, 22 Lucas, R.E 12 Lucchese, C 97 machine learning 21–2, 49–52 Mcneal, R MacWilliams, M.C 11 Magno, G 19 Maroni, Roberto 116, 123 mean absolute error (MAE) 23 Meraz, S Mo Jang, S Monti, Mario 106–10, 116–17, 123 Moretti, Alessandra 136–8 Mourdock, Richard 81 Murphy, J.J 14, 161 negative campaigning, effectiveness of 113–18; empirical analysis and results 118–22 Newman, R.W New York Times 13 Nijman, T 85 noise 50–1; iSAX package and 54–8 nowcasts, electoral 68, 70–1; Hollande versus Sarkozy in 2012 French presidential election 71–5; see also forecasts, electoral Obama, Barack 20, 21, 38; Twitter analysis using SASA approach and 76–85 Obamacare 78 Occupy Wall Street movement old versus new media 7–9 Open Source Indicators (OSI) 13 opinion mining 39 opinion polls 161–4 organic data 1–2 Orlando, S 97 Papadopoulos, S 161 Park, G.J 12 Parmelee, J.H Pennock, D.M 39 Perego, R 97 Pew Research Center 18 Philipp, G.S 
18 Pichandy, C 19 Pittella, Gianni 95 Polgreen, P.M 14 politics: flourishing relationship between society, social media and 5–14; impact of the Web on 5–7 positive campaigning, effectiveness of 113–18; empirical analysis and results 118–22 Prodi, Romano 110 public policy 9–10 quantitative and qualitative text analysis 38–9 ReadMe algorithm 52 Realclearpolitics.com 82 Recorded Future 13 Renault, E 85 Renzi, Matteo 38–9, 87–94, 95, 96, 97, 98–9, 110 Richards, A.K 14 Rich Site Summary (RSS) feeds 16 Robert, C 85 Romney, Mitt 20, 38; Twitter analysis using SASA approach and 76–85 Rosenbaum, M 85–6 R statistical environment 54, 59–65 Sage, A.J 14 Salvini, Matteo 94–9 Sarkozy, Nicolas 71–5 SASA approach 68–71, 149–51; France presidential election, 2012 71–5; Italy primary elections 87–99; online voting intentions and 111; United States presidential election, 2012 76–87 Scharkow, M Schwartz, H.A 12 Schwarzer, S 21 scoring techniques in text analysis 44–6 second-level agenda-setting 8–9 Segerberg, A Segre, A.M 14 Self-Organizing Maps (SOM) 48 self-selection bias 24 Seligman, M.E 12 sentiment analysis (SA) 20–2, 38, 164–6; Donald Trump data example 59–65; opinion mining and 39; principles of 39–41; quantitative and qualitative analysis of texts in 38–9; relationship to opinion polls 161–4; stemming in 41–4; see also text analysis 178  Index Servizio Pubblico 124 Shaw, D.L Signorini, A 14 Silver, Nate 171 slacktivism Smeaton, A 22 Sobbrio, F social desirability 111, 127, 164, 172 social media and social network sites (SNS): agenda-setting power 7; analysis relationship to opinion polls 161–4; best practices for using data from 154–5; collective action and public policy influence 9–10; compared to traditional media 7–9; data 2; e-campaigning using 10–11; economic, social and political forecasts using 13–14; electoral forecasts and 14–16; emotions and well-being measured using 12–13; estimating policy positions using 11–12; flourishing relationship between politics, society 
and 5–14; online voting intentions in 2013 Italian election and 105–13; see also Twitter spatial theory of voting 114 Sprenger, T.O 18 stemming 41–4 Stephens, M 12 Stewardson, B 19 Sudhakaran, S 19 Surveys Voting Intentions 125 text analysis: classification methods 47–52; fundamental principles of 39–41; making text digestible to statistical models for 41–4; opinion discovery and 38; pros and cons of human and automatic tagging in 46–7; quantitative and qualitative 38–9; scoring techniques 44–6 text mining 47 Tjong Kim Sang, E 21, 22 topic models 48–9 traditional sentiment analysis 21 training set 41 transaction data Trump, Donald 59–65; Twitter analysis using iSA approach and 172–4 Tsakalidis, A 161 Tufekci, Z Tumasjan, A 18 Twindex 84 Twitter 38; agenda-setting power 7, 8, 9; analysis of mentions of parties and candidates on 15; Big Data revolution and 2; drug abuse analysis 14; effectiveness of positive and negative campaigning on 113–18; endorsement data 19; estimating policy positions using 11; growth of 5; “hate map” 12; nowcasting using 69–70, 71–5; reciprocal influence between new and old media and 8; Recorded Future 13; SASA approach and 68–70; sentiment analysis 20, 21; social media crowd and 18; voting behavior declarations on 18; see also forecasts, electoral; social media and social network sites (SNS) Ungar, L.H 12 United States presidential election, 2012: surveys versus sentiment in 85–7; Twitter analysis using SASA approach and 76–85 United States presidential election, 2016: Twitter analysis using iSA approach and 172–4 Vargo, C.J Vendola, Nichi 116, 123, 133, 134 Venkat, A 19 Véronis, J 16 Virgilio, A 19 Vogelgesang, J volume data 19 voting intentions: online 105–13; relationship between leaders sentiment and 122–6 Wallsten, K 7, Wang, D Welpe, I.M 18 West, J.H 14 Wikipedia 15 Williams, M.L 12 Wilson, C Woodly, D Wordfish 44–6 Wordscores 46 Yoshida, N 85–6 Young Bae, S YouTube Zaia, Luca 136–8
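The poll-anchoring procedure described in the postscript has three steps: regress the daily poll average on the two social media indicators over a calibration window, freeze the fitted weights and use them to turn later sentiment readings into poll-scale forecasts, and finally translate a forecast margin into a win probability. The book's own analyses rely on R and the iSA/iSAX tools; the snippet below is only an illustrative sketch in Python, with invented placeholder numbers and an assumed (not estimated) forecast-error spread, not the authors' actual model or data.

```python
import numpy as np
from math import erf, sqrt

# Hypothetical daily series for the calibration window (19 September-2 October).
# In the book these would be: the Real Clear Politics average of Clinton's lead,
# the share of negative comments about Trump (an iSA output), and online voting
# intentions for Clinton. Every value below is a made-up placeholder.
poll_lead     = np.array([2.1, 2.5, 2.3, 3.0, 2.8, 3.2, 2.9, 3.1])    # points
neg_trump     = np.array([0.41, 0.44, 0.42, 0.47, 0.45, 0.48, 0.44, 0.46])
clinton_share = np.array([0.36, 0.37, 0.36, 0.38, 0.38, 0.39, 0.37, 0.38])

# Step 1: ordinary least squares of the poll series on the two indicators
# (with an intercept); the coefficients are the weights reused later.
X = np.column_stack([np.ones_like(neg_trump), neg_trump, clinton_share])
beta, *_ = np.linalg.lstsq(X, poll_lead, rcond=None)

# In-sample fit (the book reports that ~97% of the poll trend was explained).
fitted = X @ beta
r2 = 1 - np.sum((poll_lead - fitted) ** 2) / np.sum((poll_lead - poll_lead.mean()) ** 2)

# Step 2: after 3 October, each new day's sentiment readings are converted
# into a poll-scale forecast of Clinton's lead using the frozen weights.
def forecast_lead(neg: float, share: float) -> float:
    return float(beta @ np.array([1.0, neg, share]))

# Step 3: turn a forecast lead and an assumed error spread sigma into a
# Trump win probability, assuming normally distributed forecast errors.
def trump_win_probability(clinton_lead: float, sigma: float) -> float:
    return 0.5 * (1 - erf(clinton_lead / (sigma * sqrt(2.0))))
```

With a +1.2 Clinton lead and a spread of 5 points, `trump_win_probability(1.2, 5.0)` comes out at roughly 0.4, in the neighborhood of the 4.1-in-10 the authors report; the agreement is illustrative only, since sigma here is assumed rather than estimated from the data.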
