Web Mining Applications in E-Commerce and E-Services


Studies in Computational Intelligence, Volume 172
Editor-in-Chief: Prof. Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, ul. Newelska 01-447 Warsaw, Poland. E-mail: kacprzyk@ibspan.waw.pl

I-Hsien Ting and Hui-Ju Wu (Eds.)
Web Mining Applications in E-Commerce and E-Services

Dr. I-Hsien Ting
Department of Information Management, National University of Kaohsiung, No. 700, Kaohsiung University Road, Kaohsiung City, 811, Taiwan
Email: iting@nuk.edu.tw

Dr. Hui-Ju Wu
Institute of Human Resource Management, National Changhua University of Education, No. 2, Shi-Da Road, Changhua City, 500, Taiwan
Email: d94311001@mail.ncue.edu.tw

ISBN 978-3-540-88080-6
e-ISBN 978-3-540-88081-3
DOI 10.1007/978-3-540-88081-3
Studies in Computational Intelligence, ISSN 1860-949X
Library of Congress Control Number: 2008935505
© 2009 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed in acid-free paper. springer.com

Preface

Web mining has become a popular area of research, integrating the different research areas of data mining and the World Wide Web. According to the taxonomy of Web mining, there are three sub-fields of Web-mining research: Web usage mining, Web content mining and Web structure mining. These three research fields cover most content and activities on the Web. With the rapid growth of the World Wide
Web, Web mining has become a hot topic and is now part of the mainstream of Web research, such as Web information systems and Web intelligence. Among all of the possible applications in Web research, e-commerce and e-services have been identified as important domains for Web-mining techniques. Web-mining techniques play an important role in e-commerce and e-services: they are useful tools for understanding how e-commerce and e-service Web sites and services are used, which enables the provision of better services for customers and users. Thus, this book focuses on Web-mining applications in e-commerce and e-services.

Some chapters in this book are extended from papers presented at WMEE 2008 (the 2nd International Workshop for E-commerce and E-services). In addition, we invited well-known researchers in this area to contribute to the book. The chapters are introduced as follows.

In chapter 1, Peter I. Hofgesang presents an introduction to online web usage mining and provides background information followed by a comprehensive overview of the related work. In addition, the chapter outlines the major, and yet mostly unsolved, challenges in the field.

In chapter 2, Gulden Uchyigit presents an overview of some of the techniques, algorithms and methodologies, along with the challenges, of using semantic information in the representation of domain knowledge, user needs and recommendation algorithms.

In chapter 3, Bettina Berendt and Daniel Trümper describe a novel method for analyzing large corpora. Using an ontology created with methods of global analysis, a corpus is divided into groups of documents sharing similar topics. The introduced local analysis allows the user to examine the relationships of documents in a more detailed way.

In chapter 4, Jean-Pierre Norguet et al. propose a method based on output page mining and present a solution to answer the need for summarized and conceptual audience metrics in Web
analytics. The authors describe several methods for collecting the Web pages output by Web servers, and show that aggregating the occurrences of taxonomy terms in these pages can provide audience metrics for the Web site topics.

In chapter 5, Leszek Borzemski presents empirical experience gained from Web performance mining research, in particular in the development of predictive models describing Web performance behavior from the perspective of end-users. The author evaluates Web performance from the perspective of Web clients; Web performance is therefore considered in the sense of Web server-to-browser throughput or Web resource download speed.

In chapter 6, Ali Mroue and Jean Caussanel describe an approach for automatically finding the prototypic browsing behavior of web users. User access logs are examined in order to extract the most significant user navigation access patterns. Such an approach gives an efficient way to better understand how users act, and leads to improvements in the structure of websites for better navigation.

In chapter 7, Istvan K. Nagy and Csaba Gaspar-Papanek investigate the time spent on web pages (TSP) as a disregarded indicator of the quality of online content. The authors present the factors that influence the TSP measure and give a TSP data preprocessing methodology that eliminates the effects of these factors. In addition, the authors introduce the concepts of sequential browsing and revisitation to restore users' navigation patterns more exactly, based on TSP and the restored browser stack.

In chapter 8, Yingzi Jin et al. describe an attempt to learn a ranking of companies from a social network that has been mined from the web. The authors conduct an experiment using the social network among 312 Japanese companies related to the electrical products industry to learn and predict the ranking of companies according to their market capitalization. This study specifically examines a new approach to using web information for
advanced analysis by integrating multiple relations among named entities.

In chapter 9, Jun Shen and Shuai Yuan propose a modelling-based approach to designing and developing a P2P-based service coordination system and its components. The peer profiles are described with the WSMO (Web Service Modelling Ontology) standard, mainly for the quality of service and geographic features of the e-services that would be invoked by various peers. To fully explore the usability of service categorization and mining, the authors implement an ontology-driven unified algorithm to select the most appropriate peers. The UOW-SWS prototype also shows that the enhanced peer coordination is more adaptive and effective in dynamic business processes.

In chapter 10, I-Hsien Ting and Hui-Ju Wu study the issues of using web mining techniques for on-line social networks analysis. The chapter introduces and reviews techniques and concepts of web mining and social networks analysis, and discusses how to use web mining techniques for on-line social networks analysis. Moreover, it proposes a process for using web mining for on-line social networks analysis, which can be treated as a general process in this research area. Challenges and future research are also discussed.

In summary, this book sets out to highlight the trends in theory and practice that are likely to influence e-commerce and e-services practices in web mining research. Applying Web-mining techniques to e-commerce and e-services enhances their value and expands the research fields of Web mining, e-commerce and e-services.

I-Hsien Ting
Hui-Ju Wu

Contents

Online Mining of Web Usage Data: An Overview
Peter I. Hofgesang

Semantically Enhanced Web Personalization
Gulden Uchyigit

Semantics-Based Analysis and Navigation of Heterogeneous Text Corpora: The Porpoise News and Blogs Engine
Bettina Berendt, Daniel Trümper

Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining
Jean-Pierre Norguet, Esteban Zimányi, Ralf Steinberger

Towards Web Performance Mining
Leszek Borzemski

Anticipate Site Browsing to Anticipate the Need
Ali Mroue, Jean Caussanel

User Behaviour Analysis Based on Time Spent on Web Pages
Istvan K. Nagy, Csaba Gaspar-Papanek

Ranking Companies on the Web Using Social Network Mining
Yingzi Jin, Yutaka Matsuo, Mitsuru Ishizuka

Adaptive E-Services Selection in P2P-Based Workflow with Multiple Property Specifications
Jun Shen, Shuai Yuan

Web Mining Techniques for On-Line Social Networks Analysis: An Overview
I-Hsien Ting, Hui-Ju Wu

Author Index

Online Mining of Web Usage Data: An Overview

Peter I. Hofgesang
VU University Amsterdam, Department of Computer Science, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
hpi@few.vu.nl

Abstract. In recent years, web usage mining techniques have helped online service providers to enhance their services, and to restructure and redesign their websites in line with the insights gained. The application of these techniques is essential in building intelligent, personalised online services. More recently, it has been recognised that the shift from traditional to online services (and so the growing numbers of online customers and the increasing traffic generated by them) brings new challenges to the field. Highly demanding real-world E-commerce and E-services applications, where the rapid, and possibly changing, large-volume data streams do not allow offline processing, motivate the development of new, highly efficient real-time web usage mining techniques. This chapter provides an introduction to online web usage mining and presents an overview of the latest developments. In addition, it outlines the major, and yet mostly unsolved, challenges in the field.

Keywords: Online web usage mining, survey, incremental algorithms, data stream mining.

1 Introduction

In the case of traditional,
“offline” web usage mining (WUM), usage and other user-related data are analysed and modelled offline. The mining process is not time-limited: the entire process typically takes days or weeks, and the entire data set is available upfront, prior to the analysis. Algorithms may perform several iterations over the entire data set, so data instances can be read more than once. However, as the number of online users (and the traffic generated by them) greatly increases, these techniques become inapplicable. Services with more than a critical amount of user access traffic need to apply highly efficient, real-time processing techniques that are constrained both computationally and in terms of memory requirements. Real-time, or online, WUM techniques (as we refer to them throughout this chapter) that provide solutions to these problems have recently received great attention from both academia and industry.

Figure 1 provides a schematic overview of the online WUM process. User interactions with the web server are presented as a continuous flow of usage data; the data are pre-processed (including being filtered and sessionised) on-the-fly; models are incrementally updated when new data instances arrive; and refreshed models are applied, e.g. to update (personalised) websites, to instantly alert on detected changes in user behaviour, and to report on performance analysis or on results of monitoring user behaviour to support online decision making.

Fig. 1. An overview of online WUM. User interactions with a web server are pre-processed continuously and fed into online WUM systems that process the data and update the models in real-time. The outputs of these models are used, e.g., to monitor user behaviour in real-time, to support online decision making, and to update personalised services on-the-fly.

I.-H. Ting, H.-J. Wu (Eds.): Web Mining Appl. in E-Commerce & E-Services, SCI 172, pp. 1–23. springerlink.com © Springer-Verlag Berlin Heidelberg 2009

This book chapter is
intended to be an introduction to online WUM and aims to provide an overview of the latest developments in the field; in this respect it is, to the best of our knowledge, the first survey on the topic.

The remainder of this chapter is organised as follows. In Section 2, we provide a brief general introduction to WUM and the new online challenges. We survey the literature related to online WUM in three sections: Section 3 overviews the efficient and compact structures used in (or even developed for) online WUM, Section 4 overviews online algorithms for WUM, and Section 5 presents the work related to real-time monitoring systems. The most important (open) challenges are described in Section 6. Finally, the last section provides a discussion.

2 Background

This section provides a background to traditional WUM; describes incremental learning to efficiently update WUM models in a single pass over the clickstream; and, finally, motivates the need for highly efficient real-time, change-aware algorithms for high-volume, streaming web usage data through the description of web dynamics, characterising changing websites and usage data.

2.1 Web Usage Mining

Web or application servers log all relevant information available on user–server interaction. These log data, also known as web user access or clickstream data, can be used to explore, model, and predict user behaviour. WUM is the application of data mining techniques to perform these steps, to discover and analyse patterns automatically in (enriched) clickstream data. Its applications include customer profiling, personalisation of online services, product and content recommendations, and various other applications in E-commerce and web marketing. There are three major stages in the WUM process (see Figure 2): (I) data collection and pre-processing, (II) pattern discovery, and (III) pattern analysis (see, for example, [18, 51, 67]).

Web Usage Data Sources

The clickstream data contain
information on each user click, such as the date and time of the click, the URI of the visited web sources, and some sort of user identifier (IP address, browser type and, in the case of authentication-required sites, login names). An example of (artificially designed) user access log data can be seen in Table 1. In addition to server-side log data, some applications allow the installation of special software on the client side (see, for example, [3]) to collect various other information (e.g. scrolling activity, active window) and, in some cases, more reliable information (e.g. actual page view time). Web access information can be further enriched by, for example, user registration information, search queries, and geographic and demographic information.

Pre-processing

Raw log data need to be pre-processed; first, by filtering all irrelevant data and possible noise, then by identifying unique visitors, and by recovering

Fig. 2. An overview of the web usage mining process

J. Shen and S. Yuan

service requestors to select certain services without understanding details of their principles. In contrast, our new UOW-SWS is built by taking into consideration new intuitive correlations between various service quality measurements, and is also validated on a well-founded peer-to-peer e-service workflow system, which the authors have developed in the past [14, 17].

Conclusion and Future Work

Integration of QoS and spatial feature profiles in P2P-based service provision can benefit both e-service requestors and providers. It enables adaptive and autonomous service selection and composition, thus addressing quality-based user requirements with regard to non-functional properties such as availability, performance and spatial distance. In this chapter, we incorporate QoS specification and spatial consideration for P2P-based service mining and selection. In dynamic and decentralised environments, how to utilise WSMO to extend QoS and spatial-feature mechanisms into e-service composition is a significant
issue, and it brings a new set of critical challenges and requirements yet to be explored and answered. We augmented the WSMO description with real QoS perspectives and geographic profiles. We also designed and implemented an effective algorithm to facilitate peer selection. In the near future, we expect to refine our service peer selection model by focusing on concrete and detailed geographic features for location-based services, and we will improve our prototype for P2P-based workflow so that it operates more effectively under dynamic circumstances. Through this effort, we will extend more complicated and useful specifications (e.g. representing more complicated geographical knowledge) as well as protocols to enhance the accessibility, reliability and availability of e-services in decentralised information systems.

References

1. Ankolekar, A., Burstein, M., Hobbs, J.R., Lassila, O., Martin, D.L., McDermott, D., McIlraith, S.A., Narayanan, S., Paolucci, M., Payne, T.R., Sycara, K.: DAML-S: Web service description for the semantic web. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002. LNCS, vol. 2342, p. 348. Springer, Heidelberg (2002), http://www.cs.cmu.edu/~terryp/Pubs/ISWC2002-DAMLS.pdf
2. JXTA: JXTA v2.0 Protocols Specification, http://jxta-spec.dev.java.net/JXTAProtocols.pdf
3. Kim, J.W., Kim, J.Y., Kim, C.S.: Semantic LBS: Ontological approach for enhancing interoperability in location based services. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM 2006 Workshops. LNCS, vol. 4277, pp. 792–801. Springer, Heidelberg (2006)
4. Lee, K., Jeon, J., Lee, W., Jeong, S., Park, S.: QoS for Web services: Requirements and Possible Approaches. W3C Working Group Note 25 (2003), http://www.w3c.or.kr/kr-office/TR/2003/ws-qos/
5. Liu, Y.T., Ngu, A.H.H., Zeng, L.Z.: QoS computation and policing in dynamic Web service selection. In: Proceedings of International Conference on World Wide Web (WWW 2004), Alternate track papers & posters, pp. 66–73
6. LSDIS: METEOR-S: Semantic Web services and Processes, http://lsdis.cs.uga.edu/projects/meteor-s/
7. Mou, Y., Cao, J., Zhang, S.S., Zhang, J.H.: Interactive Web Service Choice-Making Based on Extended QoS Model. In: The Fifth International Conference on Computer and Information Technology (CIT 2005), pp. 1130–1134. IEEE Press, Los Alamitos (2005)
8. Negri, A., Poggi, A., Tomaiuolo, M., Turci, P.: Ontologies and web services: Agents for e-business applications. In: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2006), pp. 907–914. ACM Press, New York (2006)
9. Papaioannou, I.V., Tsesmetzis, D.T., Roussaki, I.G., Miltiades, E.A.: QoS Ontology Language for Web-Services. In: Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA 2006), vol. 1, pp. 18–20. IEEE Press, Los Alamitos (2006)
10. Ran, S.: A model for Web services Discovery with QoS. ACM SIGecom Exchanges 4(1), 1–10
11. Roman, D., Keller, U., Lausen, H., et al.: Web Service Modeling Ontology. Applied Ontology 1(1), 77–106 (2005)
12. Schlosser, M., Sintek, M., Decker, S., Nejdl, W.: A scalable and ontology-based P2P infrastructure for semantic web services. In: Proceedings of the Second International Conference on Peer-to-Peer Computing (P2P 2002), p. 104 (2002)
13. Shen, J., Yan, J., Yang, Y.: SwinDeW-S: extending P2P workflow systems for adaptive composite Web services. In: Australian Software Engineering Conference (ASWEC 2006), pp. 61–69 (2006)
14. Shen, J., Yang, Y., Yan, J.: A P2P based Service Flow System with Advanced Ontology-based Service Profiles. Advanced Engineering Informatics 21(2), 221–229 (2007)
15. Toma, I., Foxvog, D., Jaeger, M.C.: Modeling QoS characteristics in WSMO. In: Workshop on Middleware for Service Oriented Computing (MW4SOC 2006), pp. 42–47 (2006)
16. Tsesmetzis, D.T., Roussaki, I.G., Papaioannou, I.V., Anagnostou, M.E.: QoS awareness support in Web-Service semantics. In: Proceedings of the Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT/ICIW 2006), p. 128 (2006)
17. Yan, J., Yang, Y., Raikundalia, G.: SwinDeW - a p2p based decentralized workflow management system. IEEE Transactions on Systems, Man and Cybernetics - Part A 36(5), 922–935 (2006)
18. Yuan, S., Shen, J.: Mining E-Services in P2P-based Workflow Enactments. Web Mining Applications in E-commerce and E-services 32(2), 163–178 (2008) (special issue) (Online Information Review)
19. Yuan, S., Shen, J.: QoS Aware Service Selection in P2P-Based Business Process Frameworks. In: Proceedings of the 9th IEEE Conference on E-Commerce Technology and the 4th IEEE Conference on Enterprise Computing, E-Commerce and E-Services (CEC/EEE 2007), Tokyo, Japan, pp. 675–682 (2007)
20. Zhou, C., Chia, L.T., Lee, B.S.: DAML-QoS Ontology for Web services. In: Proceedings of International Conference on Web services (ICWS 2004), pp. 472–479 (2004)

Web Mining Techniques for On-Line Social Networks Analysis: An Overview

I-Hsien Ting (1) and Hui-Ju Wu (2)
(1) Department of Information Management, National University of Kaohsiung, No. 700, Kaohsiung University Road, Kaohsiung City, 811, Taiwan, iting@nuk.edu.tw
(2) Institute of Human Resource Management, National Changhua University of Education, d94311001@mail.ncue.edu.tw

Abstract. On-line social networking has become a very popular application in the age of Web 2.0. This chapter provides a study about
the issues of using web mining techniques for on-line social networks analysis. Techniques and concepts of web mining and social networks analysis are introduced and reviewed, together with a discussion of how to use web mining techniques for on-line social networks analysis. Moreover, a process for using web mining for on-line social networks analysis is proposed, which can be treated as a general process in this research area. Discussions of the challenges and future research are also included.

Keywords: Web Mining, Social Networking, Social Networks Analysis, Association Rule, Visualization.

1 Introduction

Social networks analysis is an interesting research direction that analyzes the structures and relationships of social networks, for example the density, centrality and cliques of a social network structure [32]. A social network is usually formed and constructed by the daily, continuous communication of people, and it therefore includes different relationships, such as the positions, betweenness and closeness among individuals or groups [21]. In order to understand social structure, social relationships and social behaviors, social networks analysis is therefore an essential technique that has to be studied.

In recent years, on-line social networking has been a very popular application in the age of Web 2.0 [16], allowing users to communicate, interact and share on the World Wide Web [1]. Some on-line social networking websites have even become the most popular sites on the web [29][11][22]: for example, Flickr for online album sharing, YouTube for online video sharing, and LinkedIn for connecting colleagues and employees. Some portal-based on-line social networking websites, such as Facebook and MySpace, integrate multiple functions. In addition, the rapid growth of blog systems also provides good platforms for users to communicate and share. Thus, online social networking is now part of our
human life. Huge resources of communication content, relationships and behaviors for on-line social networks analysis are produced by the incredible development of on-line social networking websites and applications [9].

I.-H. Ting, H.-J. Wu (Eds.): Web Mining Appl. in E-Commerce & E-Services, SCI 172, pp. 169–179. springerlink.com © Springer-Verlag Berlin Heidelberg 2009

The history of social networks analysis is older than anyone who is reading this chapter: it spans more than a hundred years, from around the 1900s, mostly in the research area of sociology [32]. During this period, studies of social networks analysis focused on small groups and small social networks. However, it becomes harder and harder to analyze broad social networks, such as the World Wide Web, manually. Therefore, computing power and information technologies have become very important tools for social networks analysis, and the research direction is now moving from sociology to computer science.

For on-line social networks analysis, the analysis targets are mainly resources from the web, such as the contents of the web, the structures of the web and the usage behaviors of users on the web. Among the information techniques that can be used for the analysis of on-line social networks, web mining is claimed to be the most suitable one [7]. Web mining is an application of data mining whose analysis targets come mainly from the World Wide Web; it comprises web content mining, web structure mining and web usage mining [5]. Therefore, web mining techniques are well suited to on-line social networks analysis, which is the focus of this chapter.

The chapter is structured as follows: section 1 is the background and introduction of this chapter, and the literature reviews about social network analysis and web mining are provided in section 2. In section 3, a study of
how web mining techniques can be used for on-line social networks analysis is included, and a general process for applying web mining techniques to on-line social networks analysis is proposed in section 4. In section 5, the challenges of using web mining for on-line social networks analysis are discussed, together with future research directions.

2 Background Literature Review

In this section, related literature about social networks analysis and web mining is reviewed, in order to present a broad view of these two topics for readers.

2.1 Social Networks Analysis

In the research area of social networks analysis, the main task is usually how to extract social networks from different communication resources [21][26]. The data used for building social networks are relational data [32], which can be obtained and transformed from different resources, including the web, email communication, internet relay chats, telephone communications, organization and business events, etc. [4]. For example, email communication is a rich source for extracting and constructing social networks. In email social network extraction, the relationship between email senders and receivers can be derived by measuring the frequency of email communication while taking the communication behavior (such as reply, forward, etc.)
into account [8]. The transformed relational data can then be used for social network construction.

In the past three decades, social network analysis has developed a range of concepts and methods for detecting structural patterns, identifying how different types of relationship interrelate, analyzing the implications that structural patterns have for the behavior of network members, and studying the impact of network members and their social relationships on social structures [37][32][35].

Types of Social Network Analysis

A social network has a set of relations or ties, which can be viewed in two different ways. One approach focuses on an individual, called the ego, and puts it at the center of the network; this is the ego-centered network. Members of the network are defined by their relations with the ego. Ego-centered network analysis can show the range and breadth of connectivity for individuals and identify those who have access to diverse pools of information and resources. The ego-centered approach is useful when the population is large, or the boundaries of the population are hard to define [23][38].

The second approach considers the whole network, based on some specific criterion of population boundaries such as a formal organization, department, club or kinship group. Whole network analysis can identify those members of the network who emerge as central figures or who act as bridges between different groups. This approach requires responses from all members on their relations with all others in the same environment, such as the extent of email and video communication in a workgroup [18].

Key Concepts of Social Network Analysis

Network analysis provides a rich and systematic means of assessing such networks by mapping and analyzing relationships among people, teams, departments or even the entire organization [25]. A network is composed of three elements: (1) actors, (2) relations between actors, and (3) the linkages among actors. Actors and their actions are viewed as interdependent rather
than independent, autonomous units. Actors can be persons, organizations, groups, or any other set of related entities. Relations between actors are depicted as links between the corresponding nodes [39]. A tie connects a pair of actors by one or more relations. Pairs may maintain a tie based on one relation only, or a multiplex tie based on many relations. Ties have characteristics such as content, direction and strength, but they are most often described simply as weak or strong. Social network analysts have found that multiplex ties are more intimate, voluntary, supportive and durable [36]. In addition, the linkages among actors have several characteristics: direction, degree, and content. The direction of a linkage covers symmetrical and asymmetrical relations; the degree of a linkage is the strength of the relation; and the content of a linkage includes friendship, information, power, influence, etc. Owing to the complex properties of nodes, relations, and linkages, scholars who use the concept of a network in their studies define it in different ways [13].

SNA Techniques

Visualization is an active topic in social network analysis and a well-suited technique for this area. By visualizing a social network, its characteristics can be understood easily, such as the structure of the network, the distribution of nodes, the links (relationships) between nodes, and the clusters and groups in the network [19][35].

In addition to social network extraction and visualization, other measurements can be used for social network analysis as well [35]. For example, the degree of centrality is a measurement of betweenness and closeness in a social network [34]. Betweenness centrality indicates the extent to which a node lies on the shortest paths between other pairs of nodes. Closeness centrality analyzes the centrality structure of a network based on the geodesic
distances among nodes in a social network [6]. The clustering coefficient is a measurement used to discover the clusters in a social network and to quantify how strongly nodes cluster together. The density measurement can be used to analyze the connectivity and the degree of nodes and links in a social network [24]. The path length and reachability measurements can be used to analyze how one node can be reached from another in a social network. The structural hole is another measurement in social network analysis: it can be used to discover the holes in a social network so that they can be filled and the network expanded [10]. These sparse regions are structural holes that provide opportunities for brokering information flows among actors. Thus, maximizing the structural holes spanned, or minimizing redundancy between actors, is an important aspect of constructing an efficient, information-rich network [3].

2.2 Web Mining

The introduction of this chapter gave a brief explanation of web mining. It is an application of data mining, which is a technique for discovering and extracting useful information from large data sets or databases [17]. Web mining can therefore be defined as discovering or extracting useful information from the web [5].

Different Types of Web Mining

According to their analysis targets and resources, web mining techniques can be divided into three types: Web Content Mining, Web Structure Mining and Web Usage Mining.

Web content mining is a web mining technique for analyzing the contents of the web, such as texts, graphs, graphics, etc. [2]. Recently, most web content mining research has focused on text data processing, and little of it on other multimedia data. Natural language processing is therefore the main technology used in this area. The concepts and techniques of the Semantic Web and ontologies also have to be studied [14][27].

Web structure mining is a technique that can be used to analyze the
links and structure of websites. Graph theory usually supplies the main concepts for analyzing and explaining the structure of websites in web structure mining. In addition, extracting the structure of websites is always essential in this research area [10]. A common concern is therefore how to design and implement a crawler (also called a spider or bot) to extract and construct the structure of websites, as in the research topic of the deep web.

Web usage mining is a web mining technique that can be used to analyze how websites have been used, such as the navigation behavior of users. Server-side clickstream data (log files) is the main source used for web usage mining. Client-side data (such as client-side log files and cookies) is sometimes used as well, for example to record users' behavior more completely. Web usage mining analyses include basic statistical analysis of users' navigation behavior in a website, such as how many times the website has been browsed, where the users come from, etc. More advanced analyses are also possible, such as more complex analysis to understand the navigation history of users within a website, or cross-website analysis [31].

Web Mining Techniques

Traditional data mining techniques can also be applied to web mining, such as classification, clustering, association rule mining, and visualization. In web mining, classification algorithms can be used to classify users into different classes according to their browsing behavior. For example, a classification application may classify users according to their browsing time. After classification, a useful classification rule such as “30% of users browse product/food during the hours 8:00-10:00 PM” can be discovered. The difference between classification and clustering is that the classes in classification are predefined
(supervised), but in clustering are not predefined (unsupervised). The criterion by which items are assigned to different clusters is the degree of similarity among them. The main purpose of clustering is to maximize both the similarity of the items within a cluster and the difference between clusters [20].

The association rule mining technique can be used to indicate pages that are most often referenced together and to discover the direct or indirect relationships between web pages in users’ browsing behavior [31]. For example, an association rule in the web usage mining area might take the form “people who view web page index.htm also view product.htm, with support = 50% and confidence = 60%”.

Visualization is a special analysis technique in web mining that allows data and information to be understood or recognized by the human eye. Graphical and visual means are used to represent data, information and analysis results [19]. In web structure mining, visualization usually plays an important role in illustrating the structure of hypertexts and links within a website, or the linking relationships between websites. For the other two types of web mining, visualization is also an ideal tool for modeling data or information. For example, a graph (or map) can be used in web usage mining to present the traversal paths of users, or a statistical chart can show web usage information. This approach enables the analyst to understand and interpret the results of web usage mining very efficiently.

Web Mining Techniques for On-Line Social Networks Analysis

This section discusses how the three types of web mining and the web mining techniques introduced in the previous section can be used for on-line social networks analysis.

3.1 The Three Web Mining Types for On-Line Social Networks Analysis

Web content mining, text mining and natural language processing are very useful
techniques that can be used for on-line social network analysis. For example, web content mining can be used to categorize or classify the documents of an on-line social networking website, especially the articles of blogs or text forums. Article categorization is usually the first task in many on-line social networks analyses or applications. Furthermore, web content mining can also be used to analyze users’ reading interests, such as their favorite contents. However, most on-line social networks analysis tasks require it to work together with other types of web mining and techniques. For example, an analysis of reading interests like this one needs the concepts and techniques of both web content mining and web usage mining.

Web usage mining plays an important role in on-line social networks analysis as well. It is useful for the social network extraction task discussed earlier in this chapter. The usage data and users’ communication in an on-line social networking website can be transformed into relational data for social network construction [24]. In addition, web usage mining is also a tool for measuring the degree of centrality. For example, the closeness of blog users can be measured by:

Closeness = (f * (w * b)) + (f * (w * r)) + (f * (w * i))    (1)

In the equation above, f denotes the frequency of a blog behavior, and w is the closeness weight of each blog behavior. The three blog behaviors are b = browsing, r = reading and i = interaction. This is just a simple case of web usage mining, and the techniques of web usage mining allow many other means of on-line social networks analysis.

Web structure mining is the third kind of web mining. It is also useful for extracting and constructing on-line social networks, by extracting links from the WWW, email or other sources. Web structure mining can also be used to analyze
the path length and reachability, or to find structural holes, which are very basic and traditional social networks analyses. Web structure mining usually provides graphs and visualizations to represent the data and information of social networks, which enables the analyst to understand and analyze social networks easily [15].

In most on-line social networks analyses, no single type of web mining is sufficient on its own, which differs from other web usage mining applications. Sometimes all three types of web mining may be used in one particular on-line social network analysis.

3.2 Web Mining Techniques for On-Line Social Networks Analysis

There are many different kinds of web mining techniques, such as those discussed earlier in this chapter. In this section, two of them, clustering and association rule mining, are used as examples to explain how web mining techniques can be applied to on-line social networks analysis.

Clustering is an important web mining technique for on-line social networks analysis. In social networks analysis, finding a group of close people within a network or across networks is usually the main task. Normally, this task is achieved by using visualization in a small social network. However, only a few groups in a social network can be expected to be discovered with this approach, and further analyses are hard to carry out. The clustering technique can therefore help to identify more groups and clusters in a large social network. Moreover, it can provide more detailed information than visualization alone [33], including the closeness of a group, detailed information about the members of a group, and the relationships between groups in a social network.

Association rule mining is another web mining technique that is popular in traditional data mining applications, such as marketing analysis, and it is
therefore also called market-basket analysis. In social network analysis, association rule mining can help to discover hidden relationships between nodes in a social network or even across networks. For example, an association rule for on-line social networks analysis may be “person A who knows person B also knows person C, with support 0.9 and confidence 0.5” or “a person who reads person A’s blog articles also reads person B’s blog articles, with support 0.9 and confidence 0.5”. Association rule mining can therefore provide a different kind of analysis, yield more relational data, and identify more nodes and relationships in social networks. In addition, association rule mining is helpful for applications that follow social networks analysis, such as recommendation systems or information filtering systems [28].

The Process to Use Web Mining for On-Line Social Networks Analysis

This section proposes a general process for using web mining in on-line social networks analysis and discusses each step of the process in detail. The figure in this section presents the general process. Its steps are: selection of analysis targets, selection of on-line social networks analysis, data preparation, web mining techniques selection, results presentation and interpretation, and recommendation and action.

The first step in the process is the selection of analysis targets, such as the web, email, telephone communication, etc. Sometimes more than one target is selected, because some analyses focus on multiple targets. After this step, the kind of on-line social networks analysis to be carried out is selected.

Once the analysis targets and the on-line social networks analysis methodology have been selected, the next step is data preparation. Related data is collected in this
stage for analysis, and the data is cleaned and converted into its final format for storage in a database.

The next step in the process is to select web mining techniques and apply them. As discussed in the previous sections of this chapter, more than one web mining technique may be selected, and sometimes different types of web mining technique need to be combined. After suitable techniques for on-line social networks analysis have been selected, they are used to analyze the data collected and prepared in the third step of the process.

[Figure: flowchart of the process. Selection of Analysis Targets; Selection of Online Social Networks Analysis; Data Preparation; Web Content Mining, Web Usage Mining, Web Structure Mining; Web Mining Techniques Selection; Results Presentation and Interpretation; Recommendation and Action]

Fig. The general process to use web mining for on-line social networks analysis

The results of web mining are then presented and interpreted, either manually or automatically. Visualization is sometimes used to assist the presentation of analysis results, such as the extracted social networks.

The last step of the general process is recommendation and action. This step is optional, and the process may end once the analysis results have been generated. The recommendation and action step deals with the analysis results generated in the previous step. For example, if structural holes in a social network have been discovered by on-line social networks analysis, recommendations about how to fill the holes are generated (manually or automatically), and appropriate actions are then taken on those recommendations.

The general process to use web mining for on-line social networks analysis can be a continuous process. In some research projects, the
process is started again after recommendations have been generated and actions have been taken, either to evaluate the performance of the actions or to carry out a new social networks analysis.

The process proposed in this section is a general one, and not every case that uses web mining techniques for on-line social networks analysis will follow it. In some cases, the process must be modified to suit the requirements of a particular on-line social networks analysis project. For example, the recommendation and action step may not be necessary in some cases and can be removed from the general process.

Discussion

This chapter has presented a study of applying the concepts and techniques of web mining to on-line social networks, and has reviewed the related literature on web mining and social networks analysis. It has also described how to use web mining for on-line social networks analysis and proposed a general process for doing so. Web mining based techniques are proving useful for the analysis of on-line social network data, especially for large datasets that cannot be analyzed by traditional methods. This chapter should help researchers by providing a review of the research, explaining how web mining can be useful for on-line social network analysis, and motivating them to pursue new research in different fields.

How to use the techniques of web mining for on-line social networks analysis is an interesting topic. However, there are several challenges to overcome in this research area. For example, data sampling is a big issue when using web mining for on-line social networks analysis. In other web mining applications, data sampling is a simple way to reduce the amount of data. However, in on-line social networks analysis, it becomes a hard
task to select suitable samples that represent the real social networks. Furthermore, how to combine different types of web mining techniques for a particular on-line social networks analysis is often an issue. The approach and process proposed in this chapter can help with some cases of on-line social networks analysis, but the process sometimes needs tuning. It should be useful for practitioners from the internet world to understand how web mining based techniques can help them handle the relations of on-line social networks.

In future research, we will shift our focus to overcoming the challenges discussed above, such as how to reduce the data size without affecting the characteristics of the social networks. In addition, we will focus on how to apply web mining techniques to real on-line social networking websites, such as blog websites and on-line albums. These should be our next targets for using web mining techniques in on-line social networks analysis.

References

[1] Adamic, L.A., Adar, E.: Friends and Neighbors on the Web. Social Networks 25, 211–230 (2007)
[2] Agrawal, R., Rajagopalan, S., Srikant, R., Xu, Y.: Mining Newsgroup Using Networks Arising From Social Behavior. In: Proceedings of World Wide Web Conference, Budapest, Hungary, pp. 529–535 (2003)
[3] Burt, R.S.: Structural Holes. Harvard University Press, Cambridge (1992)
[4] Cai, D., Shao, Z., He, X., Yan, X., Han, J.: Mining Hidden Community in Heterogeneous Social Networks. In: Proceedings of LinkKDD 2005 Conference, August 21, Chicago, IL, USA, pp. 58–65 (2005)
[5] Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence, Newport Beach, CA, USA, pp. 558–567 (1997)
[6] Cross, R., Parker, A.: The Hidden Power of Social Networks. Harvard University Press (2004)
[7] Chakrabarti, S.:
Mining the Web: Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, San Francisco (2003)
[8] Chin, A., Chignell, M.: Finding Evidence of Community from Blogging Co-Citations: A Social Network Analytic Approach. In: Proceedings of the IADIS International Conference on Web Based Communities 2006, San Sebastian, Spain, February 26-28 (2006)
[9] Churchill, E.F., Halverson, C.A.: Social Networks and Social Networking. IEEE Internet Computing, 14–19 (September/October 2005)
[10] Fu, F., Chen, X., Liu, L., Wang, L.: Social Dilemmas in An Online Social Network: The Structure and Evolution of Cooperation. Physics Letters A 371, 58–64 (2007)
[11] Fu, F., Liu, L., Wang, L.: Empirical Analysis of Online Social Networks in the Age of Web 2.0. Physica A 387, 675–684 (2008)
[12] Furukawa, T., Matsuo, Y., Ohmukai, I., Uchiyama, K., Ishizuka, M.: Social Networks and Reading Behavior in the Blogosphere. In: Proceedings of ICWSM 2007, Boulder, Colorado, USA, pp. 51–58 (2007)
[13] Garton, L., Haythornthwaite, C., Wellman, B.: Studying Online Social Networks. Journal of Computer Mediated Communication 3 (1997)
[14] Godbole, N., Srinivasaiah, M., Skiena, S.: Large-Scale Sentiment Analysis for News and Blogs. In: Proceedings of ICWSM 2007, Boulder, Colorado, USA (2007)
[15] Goodreau, S.M.: Advances in Exponential Random Graph (p*) Models Applied to A Large Social Network. Social Networks 29, 231–248 (2007)
[16] Goth, G.: Are Social Networking Growing Up?
IEEE Distributed Systems Online 9(2) (February 2008)
[17] Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001)
[18] Haythornthwaite, C., Wellman, B., Mantei, M.: Work relationships and media use: A social network analysis. Group Decision and Negotiation 4(3), 193–211 (1995)
[19] Heer, J., Boyd, D.: Vizster: Visualizing Online Social Network. In: Proceedings of 2005 IEEE Symposium, October 23-25, Minneapolis, MN USA, pp. 32–39 (2005)
[20] Jain, A.K., Murty, M.N., Flynn, P.J.: Data Clustering: A Review. ACM Computing Surveys 31(3), 264–323 (1999)
[21] Jin, Y.Z., Matsuo, Y., Ishizuka, M.: Extracting Social Networks among Various Entities on the Web. In: Proceedings of the Fourth European Semantic Web Conference (2007)
[22] Kumar, R., Novak, J., Tomkins, A.: Structure and Evolution of Online Social Networks. In: Proceedings of KDD 2006 Conference, August 20-23, Philadelphia, Pennsylvania, USA, pp. 611–617 (2006)
[23] Laumann, E., Marsden, P., Prensky, D.: The boundary specification problem in network analysis. In: Burt, R., Minor, M. (eds.)
Applied network analysis, pp. 18–34. Sage, Beverly Hills (1983)
[24] Lento, T., Welser, H.T., Gu, L., Smith, M.: The Ties that Blog: Examining the Relationship Between Social Ties and Continued Participation in the Wallop Weblogging System. In: Proceedings of the 15th International World Wide Web Conference, May 23-26, Edinburgh, Scotland (2006)
[25] Lutters, W.G., Ackerman, M.S., Boster, J., McDonald, D.W.: Creating a knowledge mapping instrument: approximation techniques for mapping knowledge networks in organizations (ICS Technical Report No. 99–32). Center for Research on Information Technology and Organizations, University of California, Irvine, CA (2001)
[26] Matsuo, Y., Tomobe, H., Nishimura, T.: Robust Estimation of Google Counts for Social Network Extraction. In: Proceedings of Twenty Second Conference on Artificial Intelligence (AAAI 2007), July 22-26, Vancouver BC Canada (2007)
[27] Mika, P.: Flink: Semantic Web Technology for the Extraction and Analysis of Social Networks. Web Semantics 3(2-3), 211–223 (2005)
[28] Mishne, G.: Using Blog Properties to Improve Retrieval. In: Proceedings of ICWSM 2007, Boulder, Colorado, USA (2007)
[29] Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and Analysis of Online Social Networks. In: Proceedings of 2007 Internet Measurement Conference, October 24-26, San Diego, California, USA, pp. 29–42 (2007)
[30] Nowson, S., Oberlander, J.: Identifying More Bloggers. In: Proceedings of ICWSM 2007, Boulder, Colorado, USA (2007)
[31] Pierrakos, D., Paliouras, G., Papatheodorou, C., Spyropoulos, C.D.: Web Usage Mining As A Tool for Personalization: A Survey. User Modelling and User Adapted Interaction 13(4), 311–372 (2003)
[32] Scott, J.: Social Network Analysis: A Handbook, 2nd edn. SAGE Publications, Thousand Oaks (2000)
[33] Tseng, B.L., Tatemura, J., Wu, Y.: Tomographic Clustering to Visualize Blog Communities as Mountain Views.
In: Proceedings of World Wide Web 2005 Conference, Chiba, Japan, May 10-14 (2005)
[34] Wang, Y., Li, X.: Social Network Analysis of Interaction in Online Learning Communities. In: Proceedings of Seventh IEEE International Conference on Advanced Learning Technologies, July 18-20, Niigata Japan, pp. 699–700 (2007)
[35] Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications. Cambridge University Press, Great Britain (2003)
[36] Wellman, B., Wortley, S.: Different strokes from different folks: Community ties and social support. American Journal of Sociology (96), 558–588 (1990)
[37] Wellman, B., Berkowitz, S.D. (eds.): Social structures: A network approach, pp. 19–61. Cambridge University Press, Cambridge (1988)
[38] Wellman, B.: Studying personal communities. In: Marsden, P., Lin, N. (eds.) Social structure and network analysis, pp. 61–80. Sage, Beverly Hills (1982)
[39] Wellman, B., Salaff, J., Dimitrova, D., Garton, L., Gulia, M., Haythornthwaite, C.: Computer networks as social networks: Virtual community, computer-supported cooperative work and telework. Annual Review of Sociology (22), 213–238 (1997)

Author Index

Berendt, Bettina
Borzemski, Leszek
Caussanel, Jean
Gaspar-Papanek, Csaba
Hofgesang, Peter I.
Ishizuka, Mitsuru
Jin, Yingzi
Matsuo, Yutaka
Mroue, Ali
Nagy, Istvan K.
Norguet, Jean-Pierre
Shen, Jun
Steinberger, Ralf
Ting, I-Hsien
Träumper, Daniel
Uchyigit, Gulden
Wu, Hui-Ju
Yuan, Shuai
Zimányi, Esteban

Date posted: 02/03/2020, 13:49

Table of Contents

    Online Mining of Web Usage Data: An Overview

    Semantically Enhanced Web Personalization

    Data Preparation: Ontology Learning, Extraction and Pre-processing

    User Modelling with Semantic Data

    Summary and Future Work

    Semantics-Based Analysis and Navigation of Heterogeneous Text Corpora: The Porpoise News and Blogs Engine

    Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining

    Motivations and Related Work

    Log Files and Content Journaling

    Conclusions and Future Work