Big data in complex and social networks

DOCUMENT INFORMATION

Basic information

Pages: 253
File size: 13.4 MB

Content

BIG DATA IN COMPLEX AND SOCIAL NETWORKS

Chapman & Hall/CRC Big Data Series

SERIES EDITOR
Sanjay Ranka

AIMS AND SCOPE
This series aims to present new research and applications in Big Data, along with the computational tools and techniques currently in development. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical data analytics, large-scale e-commerce, and other relevant topics that may be proposed by potential contributors.

PUBLISHED TITLES
BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY MANAGERS
Vivek Kale
BIG DATA IN COMPLEX AND SOCIAL NETWORKS
My T. Thai, Weili Wu, and Hui Xiong
BIG DATA OF COMPLEX NETWORKS
Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
BIG DATA: ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
NETWORKING FOR BIG DATA
Shui Yu, Xiaodong Lin, Jelena Mišić, and Xuemin (Sherman) Shen

BIG DATA IN COMPLEX AND SOCIAL NETWORKS
EDITED BY
My T. Thai, University of Florida, USA
Weili Wu, University of Texas at Dallas, USA
Hui Xiong, Rutgers, The State University of New Jersey, USA

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business.
No claim to original U.S. Government works.
Printed on acid-free paper.
Version Date: 20161014
International Standard Book Number-13: 978-1-4987-2684-9 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface vii
Editors ix

Section I Social Networks and Complex Networks
Chapter 1 Hyperbolic Big Data Analytics within Complex and Social Networks 3
Eleni Stai, Vasileios Karyotis, Georgios Katsinis, Eirini Eleni Tsiropoulou and Symeon Papavassiliou
Chapter 2 Scalable Query and Analysis for Social Networks 37
Tak-Lon (Stephen) Wu, Bingjing Zhang, Clayton Davis, Emilio Ferrara, Alessandro Flammini, Filippo Menczer and Judy Qiu

Section II Big Data and Web Intelligence
Chapter 3 Predicting Content Popularity in Social Networks 65
Yan Yan, Ruibo Zhou, Xiaofeng Gao and Guihai Chen
Chapter 4 Mining User Behaviors in Large Social Networks 95
Meng Jiang and Peng Cui

Section III Security and Privacy Issues of Social Networks
Chapter 5 Mining Misinformation in Social Media 125
Liang Wu, Fred Morstatter, Xia Hu and Huan Liu
Chapter 6 Rumor Spreading and Detection in Online Social Networks 153
Wen Xu and Weili Wu

Section IV Applications
Chapter 7 A Survey on Multilayer Networks and the Applications 183
Huiyuan Zhang, Huiling Zhang and My T. Thai
Chapter 8 Exploring Legislative Networks in a Multiparty System 213
Jose Manuel Magallanes

Index 233

Preface

In the past decades, the world has witnessed a blossoming of online social networks, such as Facebook and Twitter. This has revolutionized the way humans interact and drastically changed the landscape of information sharing in today's cyberspace. Along with the explosive growth of social networks, huge volumes of data have been generated. Research on big data, referring to these large datasets, gives insight into many domains, especially complex and social network applications.

In the research area of big data, the management and analysis of large-scale datasets are quite challenging because the collected data are highly unstructured. The large size of social networks, spatio-temporal effects, and interactions between users are among the challenges in uncovering behavioral mechanisms. Many recent research projects process and analyze data from social networks in an attempt to better understand complex networks, which motivated us to prepare an in-depth treatment of recent advances in big data and social networks.

This handbook presents recent developments on the theoretical, algorithmic, and application aspects of big data in complex social networks. It consists of four parts, covering a wide range of topics. The first part focuses on data storage and data processing. Efficient storage of data can fundamentally support intensive data access and queries, which enables sophisticated analysis. Data processing and visualization help to communicate information clearly and efficiently.

The second part of this handbook is devoted to the extraction of essential information and the prediction of web content. By performing big data analysis, we can better understand the interests, locations, and search histories of users and predict their behaviors more accurately.

The book next focuses on the protection of privacy and security in Part III. Modern social media enables people to share and seek information effectively, but it also provides effective channels for the propagation of rumors and misinformation. It is therefore essential to model rumor diffusion, identify misinformation in massive data, and design intervention strategies.

Finally, Part IV discusses emerging applications of big data and social networks, with a particular focus on multilayer networks and multiparty systems.

We would like to take this opportunity to thank all authors, the anonymous referees, and Taylor & Francis Group for helping us to finalize this handbook. Our thanks also go to our students for their help during the processing of all contributions. Finally, we hope that this handbook will encourage research on the many intriguing open questions and applications in the area of big data and social networks that still remain.

My T. Thai
Weili Wu
Hui Xiong

Editors

My T. Thai is a professor and associate chair for research in the Department of Computer and Information Sciences and Engineering at the University of Florida. She received her PhD degree in computer science from the University of Minnesota in 2005. Her current research interests include algorithms, cybersecurity, and optimization in network science and engineering, including communication networks, smart grids, social networks, and their interdependency. The results of her work have led to books and 120+ articles published in various prestigious journals and conferences on networking and combinatorics.
Dr. Thai has engaged in many professional activities. She has been a TPC chair for many IEEE conferences, has served as an associate editor for the Journal of Combinatorial Optimization (JOCO), Optimization Letters, the Journal of Discrete Mathematics, and IEEE Transactions on Parallel and Distributed Systems, and as a series editor of Springer Briefs in Optimization. Recently, she cofounded and is co-Editor-in-Chief of the Computational Social Networks journal. She has received many research awards, including a UF Research Foundation Fellowship, a UF Provost's Excellence Award for Assistant Professors, a Department of Defense (DoD) Young Investigator Award, and an NSF (National Science Foundation) CAREER Award.

Weili Wu is a full professor in the Department of Computer Science, University of Texas at Dallas. She received her PhD in 2002 and her MS in 1998 from the Department of Computer Science, University of Minnesota, Twin Cities. She received her BS in mechanical engineering from Liaoning University of Engineering and Technology in China in 1989. From 1989 to 1991, she was a mechanical engineer at the Chinese Academy of Mine Science and Technology, where she was then an associate researcher and associate chief engineer from 1991 to 1993. Her current research mainly deals with the general area of data communication and data management, focusing on the design and analysis of algorithms for optimization problems that occur in wireless networking environments and various database systems. She has published more than 200 research papers in various prestigious journals and conferences such as IEEE Transactions on Knowledge and Data Engineering (TKDE), IEEE Transactions on Mobile Computing (TMC), IEEE Transactions on Multimedia (TMM), ACM Transactions on Sensor Networks (TOSN), IEEE Transactions on Parallel and Distributed [...]

[Figure 8.10 plot: panels "10 time points, γ=1.5, ω=60.0" and "10 time points, γ=1.5, ω=20.0"; axes: individual legislators of the 2006–2011 Congress, grouped by party, and time point (2006-I to 2010-II).]

FIGURE 8.10 Party switching of legislators in the 2006–2011 Congress. Each vertical line is the separation of legislators by party: (from left to right) Alianza por el Futuro, Frente de Centro, Partido Aprista Peruano, Peru Posible, Restauracion Nacional, Unidad Nacional, and Union por el Peru.

In Equation 8.2, $i$ and $j$ are legislator indices, and $s$ and $r$ are slice (year) indices. The adjacency tensor satisfies $A_{ijs} = 1$ if nodes $i$ and $j$ are connected in slice $s$, and $A_{ijs} = 0$ otherwise; $k_{is}$ is the degree (or strength, in weighted networks) of node $i$ in slice $s$; $m_s$ is the number of edges (or the sum of weights, in weighted networks) in slice $s$; and $\gamma_s$ is the resolution parameter in slice $s$. $C_{jsr} = 1$ if slices $s$ and $r$ are connected via node $j$, and $C_{jsr} = 0$ otherwise. The factor $2\mu$ is used for the normalization condition $Q \in [-1, 1]$. I use the simplest version of interslice connection, in which
$C_{jsr} = \omega$ if two adjacent years $s$ and $r$ share legislator $j$ and $C_{jsr} = 0$ otherwise, and $\gamma_s = \gamma$ for all slices. Therefore, the interslice connection strength $\omega$ and the intraslice resolution parameter $\gamma$ span the control-parameter plane. As in [17], I used the Louvain method [2] to maximize $Q$ through community aggregation.

8.5 DISCUSSION OF RESULTS

This work confirms the usefulness of social network analysis for studying the interaction of legislators in multiparty systems. Since much of the literature on cosponsorship uses the US Congress as its case study, this work applies the available techniques to a different country, where many parties win seats but most legislators know they will not survive the next election.

In this case, I first focused on discovering whether cosponsorship can complement the accepted reelection strategies, which are mostly exogenous and lack a network approach. The results clearly show how different network metrics reveal the positions that legislators occupied as time went by. However, as was seen, the strategies followed by reelected and non-reelected legislators were similar or hard to differentiate, which motivated the use of a data mining technique. The results obtained by the association rule algorithm showed that certain combinations of endogenous strategies, involving centrality and homophily (EI index), indicate that reelection was achieved by legislators who kept intermediate values in those dimensions.

On the other hand, graph theory proved very useful to highlight that the splitting of a party in Congress (macro level) does not correlate with cosponsorship, as this network showed a single component during four of the five years a legislator is in office. In the same manner, the multislice technique proved very effective at detecting the party switching process, which happens in parallel with party splitting.

In general, party switching in Peru is heavily criticized; since the phenomenon started in the 1990s, most mass media have portrayed it as a dishonest act. In these circumstances, switching is a risky strategy, but it is nevertheless followed by many legislators. As a by-product of this analysis, I could also detect party discipline at the individual level, as there are parties whose legislators never quit (Alianza por el Futuro (Fujimoristas), Frente de Centro, and Partido Aprista Peruano).

8.6 FURTHER RESEARCH

I have focused this work on a particular set of proposals, the law proposals, mainly because they were the most numerous and also because they addressed the issues of greatest impact on the governability of Peru. However, I have not analyzed the other proposals I also collected. For example, I have amendments, which are more sophisticated proposals that may need not only a signature on the proposal but also more dedication on the part of the legislator, since, if an amendment is accepted for a roll-call vote, it needs at least 66% of the votes to be approved. A less contentious, and also the least interesting, proposal is the Declaration, which is a symbolic act: congressmen declare some issue as relevant for the nation and the like. However, both of these types of proposals may present different challenges.

It will also be important to study more Congresses. I am already organizing the data available from 1995, which will give me two more Congresses to analyze, with the possibility of comparing important structural differences among them. This could be very interesting because the legislators from 2001 onward consider themselves the leaders of the restoration of democracy after the so-called dictatorship of the 1990s; therefore, I should expect differences between the 1995–2000 Congress and those that came after. Another interesting strategy would be to carry out ego-network analysis, considering that each case represents an interesting story in itself; however, that may require further qualitative fieldwork.

I now know more about Congress dynamics. I can clearly see that the Peruvian political class is still learning how to survive in the complex political setting in which it is embedded. It may also seem that these legislators are in a "beginner stage," in which their inner motivations for political survival distract them from building strong political parties. This need not continue indefinitely; there will come a time when they realize that keeping this mindset may in fact make survival harder for them, but I cannot anticipate when this will happen.

ACKNOWLEDGMENT

I thank Sang Hon Lee for his collaboration on the multislice technique. I am also grateful to Mason Porter and James Fowler for their interest in this topic and their advice during the organization of the data. I also owe a lot to the different scholars with whom I have discussed these data: Claudio Cioffi-Revilla, Robert Axtell, Andrew Crooks, Maksim Tsvetovat, William Kennedy, Jennifer Victor, and Bruce Desmarais. Finally, particular thanks to my co-researchers on this topic, Scott Morgensten and Ernesto Calvo, with whom I may soon finish a comparative research study on cosponsorship. Previous versions of this work were prepared with the participation of my colleagues Annetta Burger, Adriana Martinez, and Nikhil Murali from George Mason University. I would also like to acknowledge my former student Maria Alejandra Guzman, with whom I started working on this topic in Peru.

Bibliography

[1] Eduardo Aleman and Ernesto Calvo. Explaining Policy Ties in Presidential Congresses: A Network Analysis of Bill Initiation Data. Political Studies, 61(2):356–377, June 2013.
[2] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008, October 2008.
[3] Kathleen A. Bratton and Stella M. Rouse. Networks in the Legislative Arena:
How Group Dynamics Affect Cosponsorship Legislative Studies Quarterly, 36(3):423–460, August 2011 Bibliography 231 [4] John A Ferejohn On the Decline of Competition in Congressional Elections The American Political Science Review, 71(1):166–176, March 1977 [5] James H Fowler Legislative cosponsorship networks in the US House and Senate Social Networks, 28(4):454–465, October 2006 [6] Matt Groeninger “Circular layout”, Gephi Marketplace, March 2, 2013 https://marketplace.gephi.org/plugin/circular-layout/ [7] Justin H Gross and Cosma Shalizi Cosponsorship in the us senate: A multilevel approach to detecting the subtle influence of social relational factors on legislative behavior Unpublished Manuscript, Department of Statistics, Carnegie Mellon University, 2008 [8] Michael Hahsler, Bettina Grn, and Kurt Hornik A computational environment for mining association rules and frequent item sets Journal of Statistical Software, 14(15), pages 1–25, 2005 [9] Harry Henderson Campaign and Election Reform Library in a book Facts On File, New York, 2004 [10] IDEA Estudios sobre el Congreso Peruano: grupos parlamentarios, disciplina partidaria y desempeo profesional UARM, Instituto Etica y Desarrollo ; IDEA Internacional, Lima, 2009 bibtex: IDEA2009 [11] Gregory Koger and James H Fowler Parties and Agenda-Setting in the Senate, 1973-1998 SSRN Electronic Journal, 2007 [12] D Krackhardt and R.N Stern Informal Networks and Organizational Crises: An Experimental Simulation ILR Reprints ILR Press, New York State School of Industrial and Labor Relations, Cornell University, 1988 [13] Kevin Lees Incumbents Arent Latin Americas Problem | Americas Quarterly, November 2014 [14] Steven D Levitt and Catherine D Wolfram Decomposing the sources of incumbency advantage in the U.S House Legislative Studies Quarterly, 22(1):45–60, February 1997 [15] Michael J Malbin, editor Life after Reform: When the Bipartisan Campaign Reform Act Meets Politics Campaigning American style Rowman & Littlefield, 
Lanham, MD, 2003 [16] David R Mayhew Congressional elections: The case of the vanishing marginals Polity, 6(3):295–317, April 1974 [17] Peter J Mucha, Thomas Richardson, Kevin Macon, Mason A Porter, and Jukka-Pekka Onnela Community structure in time-dependent, multiscale, and multiplex networks Science, 328(5980):876–878, 2010 232 Bibliography [18] Ivan Pastine, Tuvana Pastine, and Paul Redmond Incumbent-Quality Advantage and Counterfactual Electoral Stagnation in the US Senate: Incumbent-Quality Advantage Politics, 35(1):32–45, February 2015 [19] Pascal Pons and Matthieu Latapy Computing communities in large networks using random walks (long version) arXiv:physics/0512106, December 2005 arXiv: physics/0512106 [20] Nils Ringe and Jennifer Nicoll Victor Bridging the Information Gap: Legislative Member Organizations as Social Networks in the United States and the European Union The University of Michigan Press, Ann Arbor, 2013 [21] Kenneth A Shepsle and Barry R Weingast Political preferences for the pork barrel: A generalization American Journal of Political Science, 25(1):96–111, February 1981 [22] Wendy K Tam Cho and James H Fowler Legislative success in a small world: Social network analysis and the dynamics of congressional legislation The Journal of Politics, 72(01):124, January 2010 [23] Daniel Zovatto Reelection, continuity and hyper-presidentialism in Latin America, The Brookings Institute, February 12, 2014 https://www.brookings.edu/research/opinions/2014/02/12-reelectioncontinuity-hyperpresidentialism-zovatto Index A Abstract Syntax Trees (AST), 40 AP, see Average precision Apache high-level language, syntax and its common features, 39–45 Abstract Syntax Trees, 40 GRUNT interactive shell interface, 40 Hive, 42–44 HiveQL queries, 43 Pig, 39–42 RDBMS technique, 43 Spark SQL/Shark, 44–45 UNIX bash scripting, 40 AST, see Abstract Syntax Trees Average precision (AP), 85 B Belief exchange, 131 Big data analytics (BDA), 5, 9–13 network science and, 6–8 Bill 
co-sponsorship and social network analysis, 219–228 classification of proposals, 220 cosponsorship network, 221 EI index, 223, 225 organizing the data, 219–221 party splitting and switching, 226–228 re-election strategies, 222–226 rule association, 224 Bing search, 90 Bots, 137 Box-office revenue prediction, 75–76 Brownian Motion, 77 C Citeseer co-authorship dataset, 77 Computer science networks, 204 Confusion matrix, 86–87 Content Delivery Network (CDN), 89 Content popularity in social networks, prediction of, 65–94 classification of social network, 69–72 major-based social network, 70–72 narrow-sensed social network, 69 news-based social network, 69–70 non-intuitive social network, 71 description of social network, 67 evaluation, 84–88 average precision, 85 classification prediction, 85–87 confusion matrix, 86–87 correlation-based method, 87–88 Discounted Cumulative Gain, 84 error-based method, 88 F-score, 86 importance of, 84 Linear Regression, 88 metrics, 84–88 numerical prediction, 87–88 ranking prediction, 84–85 SVM Regression, 88 levels of social network, 67 life cycles, prediction based on, 78–81 233 234 Index discussion on the types of life cycles, 78–79 Hashtag, life cycles for, 80–81 influence decay, life cycle with, 79–80 rebounding, 79 regression models, 81 long tail, 67–69 look forward, 88–90 online marketing, 89 search engines, 90 network topology, prediction based on, 81–83 discussion on node attribute, 82 edge, 81 In-Degree, 81, 82 Latent Dirichlet Allocation model, 82 micro-blogs, 82 node, 81 Out-Degree, 81, 82 state-of-the-art of predicting popular online content, 83 prediction model, 72–83 correlation-based method, 73 feature selection, 72–74 mature tool, 73 prediction based on life cycles, 78–81 prediction based on network topology, 81–83 prediction based on user behaviors, 75–78 text content, 74 unique method, 73–74 user behaviors, prediction based on, 75–78 box-office revenue prediction, 75–76 news popularity prediction, 76 story popularity 
prediction, 77–78 user behavior prediction, 76–77 Crowdsourcing, 127, 144 D Deception detection for online news, 174 Dependency edges, 190 Dialogue act classification, 173 Digg, 70, 75, 77, 95 Discounted Cumulative Gain (DCG), 84 E Economical networks, 205–206 Embedding of networked data, 17–21 Exponential random graph (ERGM) statistical modeling, 219 Extract, Transform, and Load (ETL) data processing, 47 F Facebook, 5, 69, 185 data infrastructure team, 42 friend-request dataset, 77 growth of, 192 problem definition, 102 rumor cascades, 155 -style social website in China, 97 total number of monthly active users, 66 wall messages spreading between users, 135 -wallpost dataset, 77 Flexible evolutionary multi-faceted analysis (FEMA), 102, 104–107 Flickr, 66, 75 F-score, 86 G Gigantic datasets, 5,see also Hyperbolic big data analytics framework Google, co-founders, 48 Correlate, 218 Index News Archives, 79 Plus+, 185, 192 search, 90 search engine, 48 Graph theory (rumor source detection), 158–172 algorithm, 163–169 discussion, 171–172 model, 161–162 Multiple Rumor Source Detection, 159, 160 networks with partial observations, detecting multiple rumor sources in, 160 Set Resolving Set, 159, 160, 161 simulation results, 169–171 susceptible-infected-recovered model, 158 GRUNT interactive shell interface, 40 H Harp, see Pig and Harp, integrated high-level dataflow system with Hashtag, life cycles for, 80–81 High-level dataflow system, see Pig and Harp, integrated high-level dataflow system with HMS, see Hyperbolic multidimensional scaling Honeypot accounts, 139 HSOM, see Self-organizing map in hyperbolic space Hybrid Random Walk (HRW) method, 109 Hyperbolic big data analytics framework, 3–36 big data analytics based on hyperbolic space, 9–13 big data and network science, 6–8 235 big data challenges and complex networks, complex networks, big data and the big data chain, 6–8 data correlations and dimensionality reduction in hyperbolic space, 14–17 embedding of networked 
data, 17–21 HyperMap embedding, 19–21 Rigel embedding in hyperboloid model, 17–19 greedy routing over hyperbolic coordinates and applications, 21–23 hyperbolic geometric space, fundamentals of, 11–13 optimization techniques, 23–29 advertisement allocation over online social networks, 23–27 file allocation optimization in wireless cellular networks, 27–29 outline, scope and objectives, big data analytics, gigantic datasets, Internet of Things, social networks analysis, visualization analytics, 29–32 adaptive focus in hyperbolic space, 30–31 general graphs, 31–32 hierarchical (tree) graphs, 31 hybrid scheme, 31 hyperbolic multidimensional scaling, 31 Mobius transformations, 31 “pie segment,” 31 self-organizing map, 31 self-organizing map in hyperbolic space, 31 treemap, 31 Hyperbolic multidimensional scaling (HMS), 31 HyperMap embedding, 19–21 Hypernetworks, 191, 192 I Independent Cascade (IC) model, 130 Influence decay, life cycle with, 79–80 Influence maximization (IM), 193 Information diffusion in social networks, 128–131 Independent Cascade model, 130 Linear Threshold model, 130–131 Maximum Influence Arborescence, 131 roles for diffusion, 128 SIMPATH, 131 SIR model, 128 Tipping model, 129 Instagram, 66 Integrated high-level dataflow system, see Pig and Harp, integrated high-level dataflow system with Interconnected networks, 190
International Trade Network (ITN), 205 Internet of Things (IoT), K K-means clustering, 48 Knowledge transfer (large social networks), 108–115 cross-domain behavior modeling, 108–109 cross-domain link weight, 109 Hybrid Random Walk algorithm, 109–114 within-domain link weight, 109 L Laplacian matrix, 202 Large social networks, user behaviors in, see User behaviors in large social networks, mining of Latent Dirichlet Allocation (LDA) model, 82 Legislative networks in a multiparty system, 213–232 background, 214–218 institutional conditions, 216–217 political scenario, 214–216 2006–2011 Congress, 217–218 bill co-sponsorship and social network analysis, 219–228 classification of proposals, 220 cosponsorship network, 221 EI index, 223, 225 organizing the data, 219–221 party splitting and switching, 226–228 re-election strategies, 222–226 rule association, 224 co-sponsorship as a network, 218–219 discussion of results, 229 exponential random graph statistical modeling, 219 further research, 229–230 graph theory, 229 Library and information science (LIS), 174 Life cycles, prediction based on, 78–81 discussion on the types of life cycles, 78–79 Hashtag, life cycles for, 80–81 influence decay, life cycle with, 79–80 rebounding, 79 regression models, 81 Linear Threshold (LT) model, 130–131 LinkedIn, 71 LIS, see Library and information science M Machine learning (rumor detection), 172–175 claim check in presidential debates, 175 deception detection for online news, 174 library and information science, 174 natural language processing, 172–173, 174 part-of-speech taggers, 173 real-time rumor debunking on Twitter, 175 semantic classification combined with propagation patterns, 174 sentiment classification, 173 speech act classification, 173 towards information credibility, 173–175 Twitter parsers, 173 MAE, see Mean Absolute Error Major-based social network, 70–72 Matlab, 47 Maximum Influence Arborescence (MIA), 131 MCGC, see Mutually connected giant components MCICM, see Multi-Campaign Independence Cascade Model Mean Absolute Error (MAE), 145 “Meme,” 47 Mesostructure, 190 MIA, see Maximum Influence Arborescence Micro-blogs, 82 Mining, see Misinformation in social media, mining of; User behaviors in large social networks, mining of Misinformation in social media, mining of, 125–152 crowdsourcing, 127 definition of misinformation in social networks, 126 evaluation, 144–146 accuracy, 145 crowdsourcing, 144 datasets, 144–145 Mean Absolute Error, 145 metrics, 145–146 outcome of simulation, 146 precision, recall and F-measure, 145 future work, 146–147 intentionally spread misinformation,
126 misinformation identification, 134–140 honeypot, 139 manual labeling, 139 misinformation detection, 134–136 Natural Language Processing techniques, 135 Online Social Spammer Detection, 137 Part-Of-Speech tags, 135 spammers and bots, 137 spreader detection, 136–140 suspension list, 130 training datasets, 139 misinformation intervention, 141–143 combating rumors with facts, 142–143 malicious account detection in an early stage, 141 Multi-Campaign Independence Cascade Model, 142 simulation, 143 social media data, 143 misinformation modeling, 127–134 belief exchange, 131 Independent Cascade model, 130 information diffusion in social networks, 128–131 Linear Threshold model, 130–131 Maximum Influence Arborescence, 131 misinformation diffusion, 131–134 network structure, 133, 134 roles for diffusion, 128 SIMPATH, 131 SIR model, 128 Tipping model, 129 unintentionally spread misinformation, 126 Mobius transformations, 30, 31 MRSD problem, see Multiple Rumor Source Detection Multi-Campaign Independence Cascade Model (MCICM), 142 Multidimensional networks, 190–191 Multilayer networks and applications, 183–211 applications, 203–206 computer science networks, 204 economical networks, 205–206 International Trade Network, 205 power grids, 205 social networks, 203–204 transportation networks, 204–205 dynamics in multilayer networks, 192–200 cascading failures, 193–194 cascading model, 196 clique lossless aggregation scheme, 197–198 diffusion models in multilayer networks, 194–197 diffusion spreading in multilayer networks, 192–194 extensions to other diffusion models, 199–200 influence maximization, 193 mutually connected giant components, 194 network aggregation and synchronization, 197–200 representative vertex, 197 SIR model, 196–197 social influence, 192 spreading of disease, 194 star lossless aggregation scheme, 198–199 superposition network, 197 threshold model, 195–196 network representation, 186–191 adjacency representation, 188–189 dependency edges, 190
general representation, 187–188 hypernetworks, 191, 192 independent networks, 190 interconnected networks, 190 mesostructure, 190 multidimensional networks, 190–191 multilevel networks, 191 multiplex network, 190 network types, 189–191 node space, 187 singular value decomposition, 186 temporal networks, 191 network structure and measurements, 200–203 betweenness, 200–201 closed walk, 202 clustering and transitivity, 201 cycle, 202 Laplacian matrix, 202 matrices and spectral properties, 202–203 node degree, 200 walks and paths, 201–202 Multinomial Naive Bayes Classifier, 175 Multiparty system, see Legislative networks in a multiparty system Multiple Rumor Source Detection (MRSD), 159, 160 Mutually connected giant components (MCGC), 194 N Naive Bayes Classifier (NBC), 175 Narrow-sensed social network, 69 National Congress of Peru (NCP), 217 Natural language processing (NLP), 135, 172, 174 Netflix, 89 Network representation (multilayer networks), 186–191 adjacency representation, 188–189 dependency edges, 190 general representation, 187–188 hypernetworks, 191, 192 independent networks, 190 interconnected networks, 190 mesostructure, 190 multidimensional networks, 190–191 multilevel networks, 191 multiplex network, 190 network types, 189–191 node space, 187 singular value decomposition, 186 temporal networks, 191 Network topology, prediction based on, 81–83 discussion on node attribute, 82 edge, 81 In-Degree, 81, 82 Latent Dirichlet Allocation model, 82 micro-blogs, 82 node, 81 Out-Degree, 81, 82 state-of-the-art of predicting popular online content, 83 News-based social network, 69–70 News popularity prediction, 76 NLP, see Natural language processing Node space, 187 Non-intuitive social network, 70, 71 O Online marketing, 89 Online news, deception detection for, 174 Online social networks (OSNs), 192 Online Social Spammer Detection (OSSD), 137 P Part-Of-Speech (POS) tags, 135, 173 PCA, see Principal Components Analysis Pig and Harp, integrated high-level
dataflow system with, 37–61 ad-hoc queries (Truthy and Twitter data), 46–47 Apache high-level language, syntax and its common features, 39–45 Abstract Syntax Trees, 40 GRUNT interactive shell interface, 40 Hive, 42–44 HiveQL queries, 43 Pig, 39–42 RDBMS technique, 43 Spark SQL/Shark, 44–45 UNIX bash scripting, 40 benchmarks, 51–56 performance of ad-hoc queries, 51–52 performance of data analysis, 53–56 iterative scientific applications, 47–50 Hive K-means, 48 K-means clustering and PageRank, 48–50 Pig+Harp K-means script, 49 Pig PageRank, 49 “meme,” 47 MOE large-memory cluster, 46 Pig, Hive and Spark SQL comparison, 45–46 User-Defined Functions, 46 Poincare disk, 30 POS tags, see Part-Of-Speech tags Power grids, 205 Principal Components Analysis (PCA), 73 Python, 47, 48, 220 R Random Forest Classifier (RFC), 175 Rebounding, 79 Renren, online social platform, 97, 98, 158 Rigel embedding in hyperboloid model, 17–19 R programming language, 47 Rule association, 224 Rumor spreading and detection in online social networks, 153–180 graph theory based approach, 158–172 algorithm, 163–169 discussion, 171–172 model, 161–162 Multiple Rumor Source Detection, 159, 160 networks with partial observations, detecting multiple rumor sources in, 160 Set Resolving Set, 159, 160, 161 simulation results, 169–171 susceptible-infected-recovered model, 158 machine learning based approach, 172–175 claim check in presidential debates, 175 deception detection for online news, 174 library and information science, 174 natural language processing, 172–173, 174 part-of-speech taggers, 173 real-time rumor debunking on Twitter, 175 semantic classification combined with propagation patterns, 174 sentiment classification, 173 speech act classification, 173 towards information credibility, 173–175 Twitter parsers, 173 understanding rumor cascades, 155–158 analysis on real social networks, 157–158 PUSH-PULL strategy, 156 small world effect, 155 structure properties of social networks, 155–156 theoretical analysis on simulated social networks, 156–157 why rumor spreads so fast, 156–158
S Scalable query and analysis, see Pig and Harp, integrated high-level dataflow system with Search engine algorithms, quality of, 84 models derived from, 82 predicting technique and, 90 research on next generation of, 48 state-of-the-art, 90 Self-organizing map (SOM), 31 Self-organizing map in hyperbolic space (HSOM), 31 Sentiment classification, 173 Set Resolving Set (SRS), 159, 160, 161 Singular value decomposition (SVD), 186 SIR model, see Susceptible-infected-recovered model Slashdot, 95 Small world effect, 155 Social networks analysis (SNA), SOM, see Self-organizing map Spammers, 137 Speech act classification, 173 SRS, see Set Resolving Set Story popularity prediction, 77–78 Superposition network, 197 Support Vector Classifier (SVM), 175 Supra-Laplacian matrix, 202 Susceptible-infected-recovered (SIR) model, 128, 158, 159 SVD, see Singular value decomposition T Temporal networks, 191 Tencent Weibo, 97, 98 Tipping model (information diffusion), 129 Transportation networks, 204–205 Treemap, 31 Twitter, 66, 69, 185 Company Statistics, 66 growth of, 192 information credibility on, 181 parsers, 173 real-time rumor debunking on, 175 rumor cascades, 155 spreading Osama Bin Laden’s death in, 82 streaming API, 47 -style microblogging platform in China, 97, 98 U UDF, see User-Defined Functions UGC, see User-Generated Content UNIX bash scripting, 40 User behaviors, prediction based on, 75–78 box-office revenue prediction, 75–76 news popularity prediction, 76 story popularity prediction, 77–78 user behavior prediction, 76–77 User behaviors in large social networks, mining of, 95–121 exercises, 115–118 knowledge transfer, 108–115 cross-domain behavior modeling, 108–109 cross-domain link weight, 109 Hybrid Random Walk algorithm, 109–114 within-domain link weight, 109 prediction, 101–107 flexible evolutionary multi-faceted analysis, 104–107 modeling multi-faceted dynamic
behaviors, 102–104 spatio-temporal contexts, 102 social recommendation, 96–101 individual preference, 97 interpersonal influence, 97 social contextual factor analysis, 97–98 social contextual modeling for recommendation, 98–101 User-Defined Functions (UDF), 46 User-Generated Content (UGC), 117 V Visualization analytics (hyperbolic big data analytics), 29–32 adaptive focus in hyperbolic space, 30–31 general graphs, 31–32 hierarchical (tree) graphs, 31 hybrid scheme, 31 hyperbolic multidimensional scaling, 31 Mobius transformations, 31 “pie segment,” 31 self-organizing map, 31 self-organizing map in hyperbolic space, 31 treemap, 31 W Walks and paths, 201–202 Weibo, 69 Wiener Process (WP), 77 Within-domain link weight, 109 Y Yahoo! news, number of click counts for, 88 Pig introduced by, 39
