
Collaborate Computing: Networking, Applications and Worksharing


Shangguang Wang, Ao Zhou (Eds.)

Collaborate Computing: Networking, Applications and Worksharing
12th International Conference, CollaborateCom 2016
Beijing, China, November 10–11, 2016
Proceedings

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, volume 201

Editorial Board
Ozgur Akan, Middle East Technical University, Ankara, Turkey
Paolo Bellavista, University of Bologna, Bologna, Italy
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong, Hong Kong
Geoffrey Coulson, Lancaster University, Lancaster, UK
Falko Dressler, University of Erlangen, Erlangen, Germany
Domenico Ferrari, Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla, UCLA, Los Angeles, USA
Hisashi Kobayashi, Princeton University, Princeton, USA
Sergio Palazzo, University of Catania, Catania, Italy
Sartaj Sahni, University of Florida, Florida, USA
Xuemin Sherman Shen, University of Waterloo, Waterloo, Canada
Mircea Stan, University of Virginia, Charlottesville, USA
Jia Xiaohua, City University of Hong Kong, Kowloon, Hong Kong
Albert Y. Zomaya, University of Sydney, Sydney, Australia

More information about this series at http://www.springer.com/series/8197

Editors
Shangguang Wang, Beijing University of Posts and Telecommunications, Beijing, China
Ao Zhou, Beijing University of Posts and Telecommunications, Beijing, China

ISSN 1867-8211
ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-319-59287-9
ISBN 978-3-319-59288-6 (eBook)
DOI 10.1007/978-3-319-59288-6
Library of Congress Control Number: 2017942991

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by
Springer Nature. The registered company is Springer International Publishing AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Preface

Over the past two decades, many organizations and individuals have relied on electronic collaboration between distributed teams of humans, computer applications, and/or autonomous robots to achieve higher productivity and produce joint products that would have been impossible to develop without the contributions of multiple collaborators. Technology has evolved from standalone tools to open systems supporting collaboration in multi-organizational settings, and from general purpose tools to specialized collaboration grids. Future collaboration solutions that fully realize the promises of electronic collaboration require advancements in networking, technology and systems, user interfaces and interaction paradigms, and interoperation with application-specific components and tools.

The CollaborateCom 2016 conference series is a major venue in which to present the successful efforts to address the challenges presented by collaborative networking, technology and systems, and applications. This year's conference continued with several of the changes made for CollaborateCom 2015, and its topics of interest include, but are not limited to: participatory sensing, crowdsourcing, and citizen science; architectures, protocols, and enabling technologies for collaborative computing networks and systems; autonomic computing and quality of services in collaborative networks, systems, and applications; collaboration in pervasive and cloud computing environments; collaboration in data-intensive scientific discovery; collaboration in social media; big data and spatio-temporal data in collaborative environments/systems; collaboration techniques in data-intensive computing and cloud computing.

Overall, CollaborateCom 2016 received a record 116 paper submissions, up slightly from 2015 and continuing the growth compared with other years. All papers were rigorously reviewed, with all papers receiving at least three and many four or more reviews with substantive comments. After an on-line discussion process, we accepted 43 technical track papers and 33 industry track papers, three papers for the Multivariate Big Data Collaborations Workshop and two papers for the Social Network Analysis Workshop. ACM/Springer CollaborateCom 2016 continued the level of technical excellence that recent CollaborateCom conferences have established and upon which we expect future ones to expand.

This level of technical achievement would not be possible without the invaluable efforts of many others. My sincere appreciation is extended first to the area chairs, who made my role easy. I also thank the many Program Committee members, as well as their subreviewers, who contributed many hours for their reviews and discussions, without which we could not have realized our vision of technical excellence. Further, I thank the CollaborateCom 2016 Conference Committee, who provided invaluable assistance in the paper-review process and various other places that a successful conference requires. Finally, and most of all, the entire committee acknowledges the contributions of the authors who submitted their high-quality work, for without community support the conference would not happen.

April 2017
Shangguang Wang
Ao Zhou

Organization

General Chair and Co-chairs
Shangguang Wang, Beijing University of Posts and Telecommunications, Beijing, China
Zibin Zheng, Sun Yat-sen University, China
Xuanzhe Liu, Peking University, China

TPC Co-chairs
Ao Zhou, Beijing University of Posts and Telecommunications, China
Yutao Ma, Wuhan University, China
Mingdong Tang, Hunan University of Science and Technology, China

Workshop Chairs
Shuiguang Deng, Zhejiang University, China
Sherry Xu, CSIRO, China

Local Arrangements Chairs
Ruisheng Shi, Beijing University of Posts and Telecommunications, China
Jialei Liu, Beijing University of Posts and Telecommunications, China
Publication Chairs
Shizhan Chen, Tianjin University, China
Yucong Duan, Hainan University, China
Lingyan Zhang, Beijing University of Posts and Telecommunications, China

Social Media Chairs
Xin Xin, Beijing Institute of Technology, China
Jinliang Xu, Beijing University of Posts and Telecommunications, China

Website Chair
Songtai Dai, Beijing University of Posts and Telecommunications, China

Conference Manager
Lenka Laukova, EAI - European Alliance for Innovation, China

Contents

Default Track

Web APIs Recommendation for Mashup Development Based on Hierarchical Dirichlet Process and Factorization Machines
    Buqing Cao, Bing Li, Jianxun Liu, Mingdong Tang, and Yizhi Liu
A Novel Hybrid Data Mining Framework for Credit Evaluation
    Yatao Yang, Zibin Zheng, Chunzhen Huang, Kunmin Li, and Hong-Ning Dai
Parallel Seed Selection for Influence Maximization Based on k-shell Decomposition
    Hong Wu, Kun Yue, Xiaodong Fu, Yujie Wang, and Weiyi Liu
The Service Recommendation Problem: An Overview of Traditional and Recent Approaches
    Yali Zhao and Shangguang Wang
Gaussian LDA and Word Embedding for Semantic Sparse Web Service Discovery
    Gang Tian, Jian Wang, Ziqi Zhao, and Junju Liu
Quality-Assure and Budget-Aware Task Assignment for Spatial Crowdsourcing
    Qing Wang, Wei He, Xinjun Wang, and Lizhen Cui
Collaborative Prediction Model of Disease Risk by Mining Electronic Health Records
    Shuai Zhang, Lei Liu, Hui Li, and Lizhen Cui
An Adaptive Multiple Order Context Huffman Compression Algorithm Based on Markov Model
    Yonghua Huo, Zhihao Wang, Junfang Wang, Kaiyang Qu, and Yang Yang
Course Relatedness Based on Concept Graph Modeling
    Pang Jingwen, Cao Qinghua, and Sun Qing
Rating Personalization Improves Accuracy: A Proportion-Based Baseline Estimate Model for Collaborative Recommendation
    Zhenhua Tan, Liangliang He, Hong Li, and Xingwei Wang
A MapReduce-Based Distributed SVM for Scalable Data Type Classification
    Chong Jiang, Ting Wu, Jian Xu, Ning Zheng, Ming Xu, and Tao Yang
A Method of Recovering HBase Records from HDFS Based on Checksum File
    Lin Zeng, Ming Xu, Jian Xu, Ning Zheng, and Tao Yang
A Continuous Segmentation Algorithm for Streaming Time Series
    Yupeng Hu, Cun Ji, Ming Jing, Yiming Ding, Shuo Kuai, and Xueqing Li
Geospatial Streams Publish with Differential Privacy
    Yiwen Nie, Liusheng Huang, Zongfeng Li, Shaowei Wang, Zhenhua Zhao, Wei Yang, and Xiaorong Lu
A More Flexible SDN Architecture Supporting Distributed Applications
    Wen Wang, Cong Liu, and Jun Wang
Real-Time Scheduling for Periodic Tasks in Homogeneous Multi-core System with Minimum Execution Time
    Ying Li, Jianwei Niu, Jiong Zhang, Mohammed Atiquzzaman, and Xiang Long
Sweets: A Decentralized Social Networking Service Application Using Data Synchronization on Mobile Devices
    Rongchang Lai and Yasushi Shinjo
LBDAG-DNE: Locality Balanced Subspace Learning for Image Recognition
    Chuntao Ding and Qibo Sun
Collaborative Communication in Multi-robot Surveillance Based on Indoor Radio Mapping
    Yunlong Wu, Bo Zhang, Xiaodong Yi, and Yuhua Tang
How to Win Elections
    Abdallah Sobehy, Walid Ben-Ameur, Hossam Afifi, and Amira Bradai
Research on Short-Term Prediction of Power Grid Status Data Based on SVM
    Jianjun Su, Yi Yang, Danfeng Yan, Ye Tang, and Zongqi Mu
An Effective Buffer Management Policy for Opportunistic Networks
    Yin Chen, Wenbin Yao, Ming Zong, and Dongbin Wang
Runtime Exceptions Handling for Collaborative SOA Applications
    Bin Wen, Ziqiang Luo, and Song Lin
Data-Intensive Workflow Scheduling in Cloud on Budget and Deadline Constraints
    Zhang Xin, Changze Wu, and Kaigui Wu
PANP-GM: A Periodic Adaptive Neighbor Workload Prediction Model Based on Grey Forecasting for Cloud Resource Provisioning
    Yazhou Hu, Bo Deng, Fuyang Peng, Dongxia Wang, and Yu Yang
Dynamic Load Balancing for Software-Defined Data Center Networks
    Yun Chen, Weihong Chen, Yao Hu, Lianming Zhang, and Yehua Wei
A Time-Aware Weighted-SVM Model for Web Service QoS Prediction
    Dou Kai, Guo Bin, and Li Kuang
An Approach of Extracting Feature Requests from App Reviews
    Zhenlian Peng, Jian Wang, Keqing He, and Mingdong Tang
QoS Prediction Based on Context-QoS Association Mining
    Yang Hu, Qibo Sun, and Jinglin Li
Collaborate Algorithms for the Multi-channel Program Download Problem in VOD Applications
    Wenli Zhang, Lin Yang, Kepi Zhang, and Chao Peng
Service Recommendation Based on Topics and Trend Prediction
    Lei Yu, Zhang Junxing, and Philip S. Yu
Real-Time Dynamic Decomposition Storage of Routing Tables
    Wenlong Chen, Lijing Lan, Xiaolan Tang, Shuo Zhang, and Guangwu Hu
Routing Model Based on Service Degree and Residual Energy in WSN
    Zhenzhen Sun, Wenlong Chen, Xiaolan Tang, and Guangwu Hu
Abnormal Group User Detection in Recommender Systems Using Multi-dimension Time Series
    Wei Zhou, Junhao Wen, Qingyu Xiong, Jun Zeng, Ling Liu, Haini Cai, and Tian Chen
Dynamic Scheduling Method of Virtual Resources Based on the Prediction Model
    Dongju Yang, Chongbin Deng, and Zhuofeng Zhao
A Reliable Replica Mechanism for Stream Processing
    Weilong Ding, Zhuofeng Zhao, and Yanbo Han

An Improvement Direction for the Simple Random Walk Sampling

For each biased sampling algorithm, the largest real-world graph, which has 26,475 nodes, is selected as the original network, and the network sizes of the sequence of sampling graphs are set to fall in the range from 16,301 to 25,988, consistent with those of the 23 smallest real-world Internet graphs. For a given network size of the sampling graphs, each sampling algorithm runs ten times and the corresponding statistic is the average over the ten realizations.

Comparison of ME1 and WSD on Sampling Algorithms

As shown in Fig. 1, the SRW performs much more stably on the normalized Laplacian spectrum than the other biased sampling algorithms, and its corresponding curves are much closer to those of the real-world
dataset. To analyze the SRW more closely, we compare it with one mutation of the SRW, called Random Walk Flying Back (RWFB). The only difference between the SRW and the RWFB is that at each new iteration, with probability c, the RWFB flies back to the original seed selected by the SRW and restarts the random walk [2]. As shown in Fig. 2, when c = 0.1, the RWFB performs much better than the SRW. The comparison between the SRW and the RWFB will be used to explore the improvement direction of this type of algorithm.

[Figure 2: two panels plotted against the network size n, each with curves for real topologies, RWFB sampling (c = 0.1) and SRW sampling.]
Fig. 2. Comparisons of the ME1 and the WSD on real-world and random walk sampling graphs. (a) ME1/n vs. n. (b) WSD/n vs. n.

Analysis for the Numerical Results

According to Sect. 2.2, for the Internet topology, node classification is an important feature reflected by the normalized Laplacian spectrum. Because the Internet topology has plenty of nodes with degree one, the pendant set P(G) and the inner noise node set OI are respectively related to the largest and the smallest cardinalities. Thus, we analyze these two node set features for the diverse sampling algorithms.

The BFS is a graph traversal algorithm that constructs tree-like sampling graphs, since each node is visited exactly once by this type of algorithm. The breadth-first principle induces small depths (where depth is defined as the maximum distance between root and leaves) in the tree-like sampling graphs. As is well known, trees with small depths have an extremely large number of pendants (i.e., leaves with degree one). Additionally, this situation leads to extremely small cardinalities of the other node sets. Although the FF is a randomized version, its behavior in many respects remains similar to that of the BFS. Thus, for the BFS and the FF, the pendant numbers are far larger than that of the real-world dataset, and the inner noise node numbers are approximately equal to zero.

The RJ is a hybrid algorithm of the SRW and the Random Node (RN) samplings. The SRW is biased towards high-degree nodes while the RN uniformly samples each node. Thus, the small-degree node number of the hybrid algorithm RJ is larger than that of the SRW. Note that the pendant set is an important component of the small-degree nodes, which explains why the pendant number of the RJ is obviously larger than that of the real-world dataset. Therefore, the bad performances of the BFS, the FF and the RJ on node classification are critical reasons for the best performance of the SRW on the normalized Laplacian spectrum, as shown in Fig. 1.

According to Sect. 2.2, the Internet topology can be divided into eight node classifications. Specifically, the pendant set P(G), the inner isolated node set II, the quasi-pendant set Q(G) and the inner binate node set BI occupy the vast majority of the Internet nodes [9]. As shown in Fig. 3, we exhibit the evolving features of the four node sets on real-world and two random walk sampling graphs. Next, we analyze the physical meaning embedded in Fig. 3 and investigate the improvement direction of the random walk sampling.

Based on Fig. 3, for the RWFB, the pendant number, quasi-pendant number and inner binate node number are decreased and the inner isolated node number is increased in contrast to those of the SRW. The physical interpretation of these phenomena is presented in Fig. 4. In Fig. 4(a), each periphery node is attached to only one core node, so these periphery nodes are single-homed. As the links between periphery and core nodes increase, more and more nodes are transformed from single-homed to multi-homed, as shown in Fig. 4(b). As is well known, multi-homed nodes have better fault tolerance. Due to the rich club
phenomenon, extremely few core nodes attract the majority of periphery nodes to connect with them. As shown in Fig. 4(b), inner binate nodes are generated by the small-degree core nodes which are connected with only one periphery node. If we remove the links between two inner binate nodes, in contrast to Fig. 4(a), all of the pendants, quasi-pendants and inner binate nodes will be reduced, and the inner isolated nodes will be added, as shown in Fig. 4(c). Thus, Fig. 4 explains why more added multi-homed nodes and more reduced inner binate nodes of the RWFB sampling graphs, compared to those of the SRW, induce the phenomenon shown in Fig. 3. The ME1 quantifies the periphery number minus the core number [9], so the ME1 of Fig. 4(c) is obviously larger than that of Fig. 4(a), which verifies the phenomenon of Fig. 2(a). With the transformation from single-homed to multi-homed networks, the WSD monotonically decreases in general [9], so the WSD of Fig. 4(c) is commonly smaller than that of Fig. 4(a), which verifies the phenomenon of Fig. 2(b). Therefore, we can determine that adding multi-homed nodes and reducing inner binate nodes are the key reasons for the better performance of the RWFB in Fig. 2.

Although the RWFB performs better than the SRW on the normalized Laplacian spectrum, the stability of its evolving process on the spectrum is still unsatisfactory. Specifically, as shown in Fig. 3, as the size reduction ratio of the sampling graphs increases, the curves of the RWFB move farther and farther away from those of the real-world dataset. Additionally, the time complexity of the RWFB is very high, since flying back to the seed greatly increases the average visiting time of each node in the original network. However, based on the physical interpretation of Fig. 4, we can determine that adding multi-homed nodes and reducing inner binate nodes are valuable improvement directions for the simple random walk sampling algorithms.

[Figure 3: four panels plotted against the network size n, each with curves for real topologies, RWFB sampling (c = 0.1) and SRW sampling.]
Fig. 3. Comparisons of the node classification on real-world and random walk sampling graphs. (a) Pendant number/n vs. n. (b) Inner isolated node number/n vs. n. (c) Quasi-pendant number/n vs. n. (d) Inner binate node number/n vs. n.

[Figure 4: three schematic networks of core and periphery nodes labeled P, Q, II and IB.]
Fig. 4. Physical meaning embedded in Fig. 3. (a) A network with abundant single-homed nodes. (b) More multi-homed nodes are added. (c) More inner binate nodes are reduced. Note that white and black nodes respectively compose the core and periphery of the Internet, and P, Q, II and IB respectively denote pendant, quasi-pendant, inner isolated node and inner binate node.

Although only realistic autonomous system level Internet topologies with snapshots from Jan 2004 to Nov 2007 are analyzed in this paper, our recent studies [5, 9] verified that the physical meanings of the ME1 and the WSD hold for plenty of Internet evolving topologies. Specifically, the core and periphery of the Internet (associated with the ME1) are respectively composed of the transit and stub nodes, which is consistent with the classical transit-stub
model of the Internet [13]. Moreover, the transformation from single-homed to multi-homed (indicated by the WSD) reflects the Internet's requirement for better fault tolerance. Also, realistic Internet topologies derived from different data sources (e.g., AS-733, Oregon and AS-Caida) [12] keep plenty of similar size-independent structures [8]. Therefore, the derived results of this paper can be applied to more general cases of the Internet topology.

Conclusion

The normalized Laplacian spectrum is critical for evaluating graph sampling algorithms applied in Internet visualization. In this paper, we use the spectrum to investigate the advantages and deficiencies of the SRW samplings and observe that the SRW and its mutation perform much better than other biased samplings. Additionally, based on the physical interpretation for the better performance of the RWFB, we indicate that adding multi-homed nodes and reducing inner binate nodes are important improvement directions for this type of SRW algorithm. In future work, following these improvement directions, we will design another mutation of the SRW which has better performance on the spectrum and higher runtime efficiency.

Acknowledgments. We would like to thank the anonymous reviewers for their comments that helped improve this paper. This paper is supported by the National Natural Science Foundation of China with Grant Nos. 61402485 and 61303061.

References

1. Lee, C.H., Xu, X., Eun, D.Y.: Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling. ACM SIGMETRICS Perform. Eval. Rev. 40, 319–330 (2012)
2. Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: The 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636 (2006)
3. Xu, X., Lee, C.H.: A general framework of hybrid graph sampling for complex network analysis. In: 2014 Proceedings IEEE INFOCOM, pp. 2795–2803 (2014)
4. Kurant, M., Markopoulou, A., Thiran, P.: Towards unbiased BFS sampling. IEEE J. Sel. Areas Commun. 29, 1799–1809 (2011)
5. Jiao, B., Zhou, Y., Du, J., et al.: Study on the stability of the topology interactive growth mechanism using graph spectra. IET Commun. 8, 2845–2857 (2014)
6. Jiao, B., Nie, Y., Shi, J., et al.: Scaling of weighted spectral distribution in deterministic scale-free networks. Phys. A Stat. Mech. Appl. 451, 632–645 (2016)
7. Jiao, B., Shi, J., Wu, X., et al.: Correlation between weighted spectral distribution and average path length in evolving networks. Chaos Interdisc. J. Nonlinear Sci. 26, 023110 (2016)
8. Jiao, B., Nie, Y., Shi, J., et al.: Accurately and quickly calculating the weighted spectral distribution. Telecommun. Syst. 62, 231–243 (2016)
9. Jiao, B., Shi, J.: Graph perturbations and corresponding spectral changes in internet topologies. Comput. Commun. 76, 77–86 (2016)
10. Vukadinović, D., Huang, P., Erlebach, T.: On the spectrum and structure of internet topology graphs. In: Unger, H., Böhme, T., Mikler, A. (eds.) IICS 2002. LNCS, vol. 2346, pp. 83–95. Springer, Heidelberg (2002). doi:10.1007/3-540-48080-3_8
11. Fay, D., Haddadi, H., Thomason, A., et al.: Weighted spectral distribution for internet topology analysis: theory and applications. IEEE/ACM Trans. Networking 18, 164–176 (2010)
12. Leskovec, J.: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/
13. Calvert, K., Doar, M., Zegura, E.: Modeling internet topology. IEEE Trans. Commun. 35, 160–163 (1997)

Detecting False Information of Social Network in Big Data

Yi Xu1(✉), Furong Li1, Jianyi Liu1, Ru Zhang1, Yuangang Yao2, and Dongfang Zhang3

1 Information Security Center, Beijing University of Posts and Telecommunications, Beijing, China
{xuyi0511,ronger19930711,liujy,zhangru}@bupt.edu.cn
2 China Information Technology Security Evaluation Center, Beijing, China
yaoyg@itsec.gov.cn
3 First Research Institute of the Ministry of Public Security of PRC, Beijing, China
40319005@qq.com

Abstract. With the rapid development of social networks, the information published on these platforms attracts more and more attention. Because false information can cause great harm, research on detecting false information in social networks is of great significance. This paper presents a model for social network false information detection. The model first converts the information published on a social network into a three-dimensional vector, then compares this vector with the three-dimensional vectors converted from Internet events and calculates the similarity between the two. It then checks the consistency of the social network event and the Internet event. Finally, by gathering statistics and analyzing the results, we obtain the similarity between the social network event and the Internet event, and on this basis judge whether the social network information is false.

Keywords: Social network · Information · Similarity · False · Detection

1 Introduction

At present, with the rapid growth of data volume, we have entered the era of Big
Data. While Big Data brings us extremely rich information, it is accompanied by a large amount of false or outdated data, which greatly reduces the application value of Big Data. The so-called "false information" is information that is not authentic and may cause negative influence. The problem is especially serious for social networks: because publishing information on a social network is open, anonymous and convenient, and the spread of information is extensive and rapid, the problem of false information has become more and more serious. For example, on April 24, 2013, hackers who had stolen the Twitter account of the AP announced that the White House had been hit by a bomb attack, causing the Dow Jones Industrial Average to plunge in a short time [1]. In March 2016, some microblog posts claimed that somebody had been infected with H7N9 by eating chicken, which led to public panic about a pandemic virus. Such information may fool cyber citizens and, more seriously, may cause social unrest [2]. It follows that detecting false information is important for preventing its spread.

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2017. S. Wang and A. Zhou (Eds.): CollaborateCom 2016, LNICST 201, pp. 642–651, 2017. DOI: 10.1007/978-3-319-59288-6_65

At present, research on social network false information detection is attracting more and more attention, and domestic and international scholars have achieved some research results in this area. Akritidis et al. [3] present two mechanisms: the BP-Index mechanism and the BI-Index mechanism. The BP-Index mechanism is in charge of assessing the number of blogs that users have published; the BI-Index mechanism judges bloggers' influence according to the number of links and comments in a certain time. The two mechanisms are combined to evaluate whether bloggers are recently influential or recently productive. Zolfaghar and Aghaie [4] present a method to forecast users' trust issues with machine learning. Their paper considers that social credibility can be divided into five aspects: relationship credibility, honor credibility, knowledge credibility, similarity credibility and individuation credibility, and maps these five aspects into characteristic sets consisting of the context, behavior and feature information of the credibility network's topological structure. However, that paper does not analyze the forecasting of users' trust issues in a dynamic state. Calais et al. [5] propose a real-time emotion analysis method based on a transfer learning strategy. The method acquires users' prejudice towards information in a social network and treats this prejudice as the essential attribute of users' behaviors, translating it into textual features in order to construct an emotion classification model that realizes emotion analysis. Castillo et al. [6] propose a method for automatically assessing the credibility of Twitter information, aimed at Twitter as a typical representative of social networks. The paper analyzes text content to judge information credibility by means of users' emotions and opinions on the information. However, this method depends on manual work, and its efficiency is low in practical applications. Qiao et al. [7] present a trust calculating algorithm based on the context of social network users. This method divides user trust into two parts, generated respectively by familiarity and by similarity among different social network users. That paper also provides the specific computing method. In the research on information credibility, there are a few typical systems that can help users judge the credibility of network information from several angles, mainly including WISDOM, reframeit.com, Honto Search, Blekko.com, etc. [8].

On the problem of social network false information detection, scholars have obtained some achievements from different views; however, the achievements are few and scattered, so there is no systematic theory, and many problems remain to be solved. This paper presents a model of social network false information detection, which compares the social network event with the Internet event, calculates the similarity between the two, and then detects their consistency; on this basis, we can judge whether the social network information is false. This paper also provides specific computational formulas for these processes. Compared with common false information detection methods, this paper takes the point of view of the information itself instead of the users, and compares social network information with Internet information, which can guarantee that the information detection is authentic and authoritative.

In this paper, Sect. 2 presents the model of social network false information detection and explains every module of this model; Sect. 3 designs contrast experiments to prove that the model in this paper is feasible; finally, we summarize the work and outline future work.

2 Detecting False Information of Social Network

This paper first extracts keywords from the social network information and submits them as query items to Google, then screens the Internet information from the web pages returned by Google. Based on the screened webpages, the webpage events are extracted, and the information is converted into a three-dimensional vector $\vec{E}(e, t, p)$, where E represents the event vector, e the event name, t the time and p the place. In this way we obtain the three-dimensional vector $\vec{E}_i(e_i, t_i, p_i)$ converted from social network events and $\vec{E}_j(e_j, t_j, p_j)$ converted from Internet events. We then calculate the similarity between these two vectors and detect the consistency of the emotional tendency of the social network event and the Internet event. After gathering statistics and analyzing the results, we obtain the similarity between the social network event and the Internet event, and on this basis decide whether the social network information is false. The model is shown in Fig. 1.

[Figure 1: block diagram of the detection model.]
Fig. 1. The model of social network false information detection

2.1 Internet Information Screen

In order to screen the
Internet information, this paper first extracts keywords from the social network information and submits them to Google as query terms; we then screen the web pages returned by Google by their website quality value. At present, two measures are commonly used to describe website quality: PageRank and Alexa. PageRank describes the importance of a website [9], and Alexa reflects the popularity of a website [10].

Detecting False Information of Social Network in Big Data 645

This paper combines PageRank and Alexa to rank the web pages. The Website Quality Value is defined as follows:

$$WQ(A) = \alpha \cdot \mathrm{PageRank}(A) + \beta \cdot \left[1 - \frac{\mathrm{Alexa}(A)}{10000}\right] \tag{1}$$

In this formula, $A$ is the website to be evaluated, 10000 is the number of sites in the list of Chinese websites published by Alexa, and $\alpha$, $\beta$ ($0 \le \alpha, \beta \le 1$ and $\alpha + \beta = 1$) are the weights of PageRank and Alexa.

2.2 Event Extract

In social network information and Internet information, every event can be described by three elements: event name, time and place. This paper converts the
information into a three-dimensional vector $\vec{E}(e, t, p)$, where $\vec{E}$ is the event vector, $e$ the event name, $t$ the time and $p$ the place. Based on the website screening, this paper extracts events from the web pages returned by Google.

Most web information in web pages is in HTML format, so we present a method based on Chinese text density to extract the main body. First, we remove the HTML tags from a web page while preserving blank positions; what remains is the text, denoted $Ltext$, with $L$ lines in total. Starting from the first line, we divide the text into blocks with spacing $k$, denoted $Block_i$, and count the total number of characters $SChar_i$ and the number of Chinese characters $CChar_i$ in $Block_i$. The density of Chinese characters, denoted $Den_i$, is given by formula (2):

$$Den_i = \frac{CChar_i}{SChar_i} \tag{2}$$

The text is divided into $L-k$ blocks, and we plot the distribution with $[1, L-k]$ on the horizontal axis and $Den_i$ on the vertical axis. The main body of a web page inevitably contains a large number of Chinese characters, which leads to a sudden rise in Chinese character density. What we need to do is locate the points of sudden rise and sudden fall so as to delimit a region with high Chinese character density; the text in this region is the main body of the web page.

After obtaining the main body of a web page, this paper searches for the sentences that contain trigger words. A “trigger word” is a word that describes the status of an event; it indicates that the event occurred and therefore serves well to determine the type of event, for example “happen” and “outbreak” [11]. In natural language processing, the context window from −8 to +9 around a core word contains more than 85% of the information [12]; we call this window the effective range of an event sentence. Because the sentence between the two periods nearest to the trigger word mostly lies within this effective range, this paper takes the sentence between the two periods nearest to the trigger word as the Internet event sentence. For the event sentences, we use rule matching
to extract event information. Based on ICTCLAS, this paper defines the matching rule “\{}/t” to extract time information and, similarly, the matching rule “\{}/ns” or “\{}/nsf” to extract place information. The process of extracting the event three-dimensional vector is as follows:

Step 1: Segment the text and traverse the words, checking whether each word belongs to the trigger word dictionary. If not, continue searching. If so, take the sentence between the two periods nearest to the trigger word as the event sentence.
Step 2: Use rule matching to extract the time information, denoted $t$, and the place information, denoted $p$, from the event sentence; after removing $t$ and $p$, the rest of the sentence is the event name, denoted $e$.
Step 3: Use the three extracted elements to form the event three-dimensional vector $\vec{E}(e, t, p)$.

2.3 Similarity Calculate

Given the event three-dimensional vector $\vec{E}(e, t, p)$, we compare the social network event vector $\vec{E}_i(e_i, t_i, p_i)$ with the event vector $\vec{E}_j(e_j, t_j, p_j)$ extracted from web pages. The similarity value is calculated as follows:

$$\mathrm{Sim}(\vec{E}_i, \vec{E}_j) = \cos(\vec{E}_i, \vec{E}_j) = \frac{\vec{E}_i \cdot \vec{E}_j}{\ldots}$$
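
The three extraction steps of Sect. 2.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the text has already been segmented and tagged in an ICTCLAS-style `word/tag` format with space-separated tokens, that the tag `t` marks time words and `ns` or `nsf` mark place words, and that sentences end with the full-width period `。`. The names `TRIGGER_WORDS`, `Event`, `parse_tokens` and `extract_event` are hypothetical, and the two-word trigger dictionary is only a stand-in for the real one.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical trigger-word dictionary ("happen", "outbreak"); an assumption.
TRIGGER_WORDS = {"发生", "爆发"}
TIME_TAGS = {"t"}            # assumed tag for time words
PLACE_TAGS = {"ns", "nsf"}   # assumed tags for place words
SKIP_TAGS = TIME_TAGS | PLACE_TAGS
PERIOD = "。"                 # assumed sentence delimiter


class Event(NamedTuple):
    name: str   # e
    time: str   # t
    place: str  # p


def parse_tokens(tagged_text: str) -> List[Tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    tokens = []
    for item in tagged_text.split():
        word, sep, tag = item.rpartition("/")
        if not sep:  # token without a tag
            word, tag = item, ""
        tokens.append((word, tag))
    return tokens


def extract_event(tagged_text: str) -> Optional[Event]:
    """Return the event vector for the first trigger word found, else None."""
    tokens = parse_tokens(tagged_text)
    for i, (word, _) in enumerate(tokens):
        if word not in TRIGGER_WORDS:
            continue  # Step 1: keep searching
        # Step 1: the event sentence lies between the two nearest periods.
        start = i
        while start > 0 and tokens[start - 1][0] != PERIOD:
            start -= 1
        end = i
        while end < len(tokens) and tokens[end][0] != PERIOD:
            end += 1
        sentence = tokens[start:end]
        # Step 2: pull out time and place; the rest is the event name.
        time = "".join(w for w, t in sentence if t in TIME_TAGS)
        place = "".join(w for w, t in sentence if t in PLACE_TAGS)
        name = "".join(w for w, t in sentence if t not in SKIP_TAGS)
        # Step 3: assemble the three-dimensional event vector.
        return Event(name, time, place)
    return None
```

For a tagged input such as `"2016年/t 北京/ns 发生/v 火灾/n 。/wj"`, the sketch returns an `Event` whose time is `2016年`, whose place is `北京`, and whose name is the remaining text of the sentence.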

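Formulas (1) and (2) above are simple enough to sketch directly. The following Python sketch makes several assumptions that the text does not state: the PageRank score is already normalized to [0, 1], the Alexa value is a rank between 1 and 10000, $Block_i$ spans lines $i$ through $i+k$ (which gives exactly $L-k$ blocks), every character in a block counts toward $SChar_i$, and "Chinese character" means a code point in the CJK Unified Ideographs range U+4E00 to U+9FFF. The function names `website_quality` and `chinese_density` are hypothetical.

```python
import re
from typing import List

# Assumption: Chinese characters are those in the CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def website_quality(pagerank: float, alexa_rank: int,
                    alpha: float = 0.5, beta: float = 0.5,
                    list_size: int = 10000) -> float:
    """Formula (1): WQ(A) = alpha * PageRank(A) + beta * [1 - Alexa(A) / 10000]."""
    if not (0 <= alpha <= 1 and 0 <= beta <= 1):
        raise ValueError("alpha and beta must lie in [0, 1]")
    if abs(alpha + beta - 1) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    return alpha * pagerank + beta * (1 - alexa_rank / list_size)


def chinese_density(lines: List[str], k: int) -> List[float]:
    """Formula (2): Den_i = CChar_i / SChar_i for each of the L - k blocks."""
    densities = []
    for i in range(len(lines) - k):
        block = "".join(lines[i:i + k + 1])   # assumed block: lines i .. i+k
        total = len(block)                    # SChar_i
        chinese = len(CJK.findall(block))     # CChar_i
        densities.append(chinese / total if total else 0.0)
    return densities
```

A run of consecutive blocks with density close to 1 then marks the candidate main body; how the points of sudden rise and fall are detected is left open here, as it is in the text.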
Date posted: 05/03/2019, 09:03
