
Social network analysis




Document information

Pages: 44
File size: 1.92 MB


Abstract

Social network analysis is one of the most active research topics today. It has been widely used in various domains such as sociology, biology, economics, and information science. From the very beginning, researchers have used the concept of centrality to analyze networks. In 1948, Bavelas [14] proposed the idea of centrality as applied to human communication. He was specifically concerned with communication in small groups and hypothesized a relationship between structural centrality and influence in group processes. It has long been agreed that centrality is an important structural factor of social networks, and many centrality measures have been proposed, including four widely used ones: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality [34]. The Web is an example of a social network: references from page to page create the hyperlink structure of the Internet. The most interesting application of analyzing this network is information retrieval systems (search engines). After crawling web pages to a local store, we build a network from the links between the pages and then compute the quality of each page, which is called its static rank. The static rank helps information retrieval systems return more relevant results for a query. PageRank and HITS are the two most widely used algorithms for calculating static rank in today's search engines. In addition, social networking sites, also referred to as blogs, have become more and more popular. These sites have their own properties that challenge traditional search engines in some contexts. For example, when users search for other users, the engine has to find the users with the shortest paths to the user issuing the query [23]. PageRank can also be applied to blog search, but it needs some modification to fit the properties of blogs.
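As an illustration of the static-rank computation described above, the following Python sketch computes PageRank by power iteration over a directed edge list. It is not the thesis's implementation: the function name `pagerank`, the damping factor 0.85, the convergence tolerance, and the rule that a node splits its rank among its out-links in proportion to the tie weights (for instance, numbers of comments) are all assumptions. With every weight set to 1 it reduces to standard PageRank.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Compute PageRank scores by power iteration.

    edges: iterable of (source, target, weight) triples of a directed graph,
    with positive weights. With every weight equal to 1 this is standard
    PageRank. With larger weights (e.g. numbers of comments), each node splits
    its rank among its out-links in proportion to their weights; this
    weighting rule is an assumption, not the thesis's published method.
    """
    nodes = set()
    out_weight = {}  # total outgoing weight of each node
    in_links = {}    # target -> list of (source, weight)
    for src, dst, w in edges:
        nodes.update((src, dst))
        out_weight[src] = out_weight.get(src, 0) + w
        in_links.setdefault(dst, []).append((src, w))

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if v not in out_weight)
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] * w / out_weight[u] for u, w in in_links.get(v, []))
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical comment network: ("A", "B", 3) means A left 3 comments on B's blog.
comments = [("A", "B", 3), ("C", "B", 1), ("B", "A", 1), ("A", "C", 1)]
for blogger, score in sorted(pagerank(comments).items(), key=lambda kv: -kv[1]):
    print(blogger, round(score, 3))
```

In this hypothetical network, A's three comments on B pass more of A's rank to B than its single comment passes to C; with unit weights, A's endorsement would be split evenly between B and C, which raises C's score.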
Recently, several local search engines have appeared in Vietnam, including xalo, 7sac, baamboo, socbay, and headvances, but only three of them (xalo, baamboo, and headvances) offer blog search, and none uses a link-based ranking algorithm to improve its ranking. We consider two bloggers to be linked if one of them has left a comment on the other's blog. More precisely, we model these relations as a network whose nodes are bloggers and whose ties are "commenting" relations. If blogger A left n comments on blogger B, we create two corresponding nodes A and B and a directed tie from A to B with weight n. We have modified the PageRank algorithm to take the weight of each tie into account, so that it calculates the static rank of each blogger more precisely.

Acknowledgement

I would like to thank my supervisors, Assoc. Prof. Dr. Ha Quang Thuy and Ms. Nguyen Thu Trang of the College of Technology, VNUH, for all their understanding, support, and encouragement, which helped me finish this thesis. I also want to thank my colleagues at Tinh Van Media for all their help, especially Mr. Pham Thuc Truong Luong and Mr. Nguyen Quan Son for allowing me to run experiments on their search platform. Finally, I thank my dear friends, who are always beside me, encourage me, and spent time proofreading the manuscript.

Contents

Abstract
Acknowledgement
List of Figures
Chapter 1. Introduction to Social Network
  1. Social network
  2. Network construction
  3. Network representation
  4. A brief introduction of graph theory
  5. Social network's characteristics
  6. Social network analysis – SNA
Chapter 2. Ranking in social network – Social rank
  1. Introduction
  2. Ranking in social networks
Chapter 3. Ranking bloggers and Experiments
  1. Background and Motivation
  2. Ranking bloggers by PageRank
  3. Experiment setup and Results
Conclusion and Future works
Bibliography

List of Figures

Figure 1: A symmetric relationship
Figure 2: A directional relationship
Figure 3: Internet Alliances
Figure 4: A socio-gram
Figure 5: Graph and adjacency matrix
Figure 6: Six degrees of separation
Figure 7: Real-world examples of small world networks
Figure 8: The Kite Network
Figure 9: An example showing how PageRank works
Figure 10: Đầu gấu's blog
Figure 11: The corresponding network of Đầu gấu's blog
Figure 12: Blog Ranking Architecture
Figure 13: A part of the Yahoo 360 network
Figure 14: Top 10 bloggers based on number of comments
Figure 15: Top 10 bloggers based on PageRank

Chapter 1
Introduction to Social Network

1. Social network

A social network is a social structure made of nodes and ties, where nodes may be people, groups, or organizations, and ties may be relations, flows, or exchanges between the nodes [33]. In the simplest form, the network contains two nodes and one relationship that connects them [12]. The context might be people studying at the same university: Minh and Thu have a relationship because they study in the same class, so in this network there is a tie between the two nodes Minh and Thu.

Figure 1: A symmetric relationship (Minh and Thu, "studying at the same university")

The previous network is undirected, or symmetric: if A knows B, then B knows A as well. Examples of such relationships are friendship, neighborhood, kinship, companionship, or simply living in the same room. In reality, however, many relationships are directional, such as financial exchange, liking (or disliking), information flow, or disease transmission. For instance, Minh likes Thu, but Thu might not like Minh.

Figure 2: A directional relationship (Minh "likes" Thu)

More complex networks have multiple relationships.
These networks model many kinds of relationships between objects, so there may be several different ties between the same two nodes [12]. Relationships can be more than sharing some attributes or being at the same place at the same time; a flow between objects can also form a relationship. Liking, for example, might lead to an exchange of gifts. In an organization, knowledge flows between people: they share information and experiences, and these exchanges constitute a network [12]. A tie may also have a weight, indicating the strength of the relationship between the two objects. A long-time friendship should be stronger than the acquaintance with someone to whom you have just said "hi" in the street. A social network need not be social in context. There are many real-world instances of technological, business, economic, and biological networks, such as electrical power grids, telephone call graphs, the World Wide Web, co-authorship and citation networks of scientists, the spread of computer viruses, or the water supply network of a city. The exchange of email within organizations, newsgroups, chat rooms, and friendships are examples from sociology [16].

Figure 3: Internet Alliances
Source: http://www.orgnet.com/netindustry.html

2. Network construction

Given a set of nodes, there are several strategies for collecting information (objects and relations) and building a network. The first approaches are full network methods, which yield the maximum amount of information but can be costly and difficult to execute, and may be difficult to generalize. On the other hand, some approaches yield considerably less information about the network structure but are often less costly, and their observations on a sample often generalize more easily to a larger population. There is no single right way for all research questions and problems; each method has its own advantages and disadvantages.
In this section, I give an overview of four major methods used in practice; see [29] for more details.

2.1.1. Full network methods

This approach begins with a set of actors and tries to collect information (relations or ties) between each actor and all other actors. For example, we could collect friendship data from all pairs of students in a college, count the number of vehicles moving between all pairs of cities, or look at the flow of email between all pairs of employees in an organization. Because we collect information on all pairs of actors, full network methods draw a complete picture of the relations in the population. Full network data is needed to properly define and measure many structural concepts of network analysis. The main disadvantage of this approach is the cost of collecting the information; the process is very expensive.

2.1.2. Snowball methods

In these methods, we choose a set of actors as a starting point. We then add the other actors who have connections with each actor in the set. The process continues until no new actors are identified, or until we decide to stop. Isolated actors are not located by this method, and the structure of the resulting network depends greatly on how we choose the initial actors.

2.1.3. Ego-centric networks (with alter connections)

In many cases, it is neither feasible nor necessary to track down the full network starting from some initial nodes, as the snowball method does. Instead, we can begin with a set of initial nodes and identify the nodes that have connections with them. Then, we determine which of the nodes identified in this first stage are connected to one another.

2.1.4. Ego-centric networks (ego only)

Ego-centric methods focus on the individual rather than on the network as a whole. These methods collect information on the connections among the actors connected to each focal ego, which still presents a fairly good picture of the “local” networks, or “neighborhoods”, of individuals.
Such information is useful for understanding how networks affect individuals.

3. Network representation

In order to analyze a social network, we need a way to represent it in a computational structure and to see what it looks like. Network analysis uses graphs and adjacency matrices to model social networks and applies graph theory to analyze them. Graphs are a very useful way to present information about social networks. In simple networks, it is easy to look at the graph and spot patterns. Network analysis uses one kind of graphic display, consisting of points to represent objects or nodes and lines to represent ties or relations. This graphic is called a socio-gram. Socio-grams use various colors, shapes, names, etc., to represent different actors and relations [29].

Figure 4: A socio-gram
Source: http://blogs.bnet.com/bnet1/images/sociogram_los_alamos.jpg

[...]

... algorithms from social network analysis, which use the link structure to determine the importance of individuals in the network.

2. Ranking in social networks

In social network analysis, one fundamental problem is ranking individuals in a society according to their implicit importance, for example their power or influence, as determined by the topology of the network. Precisely, given a social network, the...

[...]

... Source: [25]

Formally, we represent a network as a graph G = (V, E) consisting of a set of vertices V = {v_i} that represent social entities and a set of edges E = {e_ij}, where e_ij represents information about the connection between nodes i and j [25].

4. A brief introduction of graph theory

Graph theory is a necessary foundation of social network analysis. As social networks can be represented as graphs, ...
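To make the representation concrete, the following minimal Python sketch turns a graph G = (V, E), given as a vertex list and a list of weighted ties, into an adjacency matrix, and reads off out-degrees and in-degrees as row and column sums. Ranking actors by degree is the simplest form of ranking by topology (degree centrality). The function names and the third actor "Lan" are hypothetical; Minh and Thu reuse the examples of Figures 1 and 2.

```python
def adjacency_matrix(vertices, edges, directed=True):
    """Build the adjacency matrix of G = (V, E).

    vertices: list of node labels; its order fixes the rows and columns.
    edges: iterable of (u, v, weight) triples; use weight 1 for unweighted ties.
    Cell [i][j] holds the total weight of ties from vertices[i] to vertices[j].
    For an undirected (symmetric) network each tie fills both [i][j] and [j][i].
    """
    index = {v: k for k, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        matrix[index[u]][index[v]] += w
        if not directed and u != v:
            matrix[index[v]][index[u]] += w
    return matrix


def degrees(matrix):
    """Return (out-degrees, in-degrees): the row sums and column sums.

    With weighted cells these are weighted degrees (total tie strength).
    """
    out_deg = [sum(row) for row in matrix]
    in_deg = [sum(col) for col in zip(*matrix)]
    return out_deg, in_deg


# Figure 1 (symmetric tie) as a matrix.
print(adjacency_matrix(["Minh", "Thu"], [("Minh", "Thu", 1)], directed=False))
# Directional "likes" ties; "Lan" is a hypothetical third actor with a weight-2 tie.
likes = adjacency_matrix(["Minh", "Thu", "Lan"], [("Minh", "Thu", 1), ("Lan", "Thu", 2)])
print(likes, degrees(likes))
```

A symmetric relationship gives a symmetric matrix, while a directional one fills only the cell from the tie's source to its target; here Thu has the highest in-degree, so a degree-based ranking would place Thu first.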
[...]

... this kind of network is fault tolerant. Because failures occur at random and the number of small-degree nodes is enormous, the likelihood that a hub would be affected is negligible. Even if such an event occurs, the network will not lose its connectedness, because of the remaining hubs. This property makes scale-free networks highly stable and robust [36].

6. Social network analysis

... [15] deals with mapping and measuring the nodes and the relations between nodes in a social network. As stated previously, the nodes might be people, organizations, etc., and the relations might be friendship, kinship, or water flow. Social network analysis has become a key technique in modern sociology, anthropology, geography, social psychology, sociolinguistics, information science, communication studies, ...

[...]

... century, people have used the social network metaphor to model complex sets of relationships between actors of social systems at all scales. Analysts reason from whole to part, from structure to relation to individual, from behavior to attitude [33]. So why do we study social networks, and what can we learn from their structure? The reason is that the structure of a network affects its functionality...

[...]

... connected to each other [31]. Besides regular and random graphs, the two extreme types of graph, network analysts also study other types of networks, the two most important of which are small world and scale-free networks.

5.1. Small world networks

The experiments conducted by Stanley Milgram and his colleagues on social networks of people in the United States gave rise to the concept of the “small world”. The phrase...
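The fault-tolerance argument can be illustrated on a toy graph. The Python sketch below is not a scale-free network generator: it builds a hypothetical network with two hubs and nine degree-1 nodes and measures the size of the largest connected component after a node fails. Since 9 of the 11 nodes have degree 1, a uniformly random failure most likely removes a leaf, which leaves the other ten nodes connected; only the less likely loss of a hub cuts nodes off. The node names and helper functions are illustrative assumptions.

```python
from collections import deque


def build_undirected(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)  # mark removed nodes as visited so the search skips them
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Hypothetical toy network: hub H with six leaves, hub G with three leaves, H-G linked.
edges = [("H", "G")] + [("H", f"L{i}") for i in range(1, 7)] + [("G", f"M{i}") for i in range(1, 4)]
adj = build_undirected(edges)
leaves = [v for v in adj if len(adj[v]) == 1]
print(f"{len(leaves)} of {len(adj)} nodes have degree 1")
print("remove a leaf:", largest_component_size(adj, {"L1"}))
print("remove hub H:", largest_component_size(adj, {"H"}))
```

In this toy every leaf hangs off a single hub, so losing hub H isolates its six leaves; the claim that the network stays connected after a hub failure relies on the remaining hubs of a much larger, more densely linked network.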
[...]

... We also know some social networks as blogs. The word comes from the phrase “web log”, which describes people's activity of writing down information such as their everyday activities and interests; people then coined the verb “blog”. Hence, people participating in these social networks are called bloggers. In order to participate in a social network, people have to...

[...]

... (a), (b), and (c) are from www.nd.edu/~networks/publications.html#talks0001 by Barabasi, Oltvai, Jeong et al. Figure 6 (d) is from http://tam.cornell.edu/Strogatz.html#pub

5.2. Scale free networks

Many real-world networks are scale free: their degree distribution follows a power law, so its shape does not change no matter how many nodes the network has. The degree distribution of scale-free networks follows the Yule-Simon distribution...

[...]

... patterns [29]. The most common form of matrix in social network analysis is the adjacency matrix, a square matrix with as many rows and columns as there are actors in the network. The weights or scores in the cells of the matrix give information about the ties between each pair of actors. This kind of matrix represents who is next to, or adjacent to, whom in the “social space” mapped by the relations that we have...

[...]

... world network is characterized by shorter than expected path lengths [16]. In a small world network, most nodes can be reached from every other node in a small number of hops, or steps.

Figure 6: Six degrees of separation
Source: http://en.wikipedia.org

Figure 7: Real-world examples of small world networks: (a) science co-author network, (b) connected pages on a part of the Internet, (c) biochemical pathway network, ...

Posted: 20/08/2014, 09:36

References

[1] Lê Ngọc Dương Cầm, Phương Thanh. "Cô gái Đồ Long": Đối mặt tại toà ["Cô gái Đồ Long": Face to face in court]. Dân trí, 2008.
[2] Bùi Ngọc Lan. Nghiên cứu mạng thư điện tử và ứng dụng trong lọc thư rác [A study of email networks and its application to spam filtering]. Undergraduate thesis, Faculty of Information Technology, College of Technology, VNU Hanoi, 2006.
[3] Nguyễn Thu Trang. Link spam với đồ thị web và hạng trang web [Link spam with the web graph and web page ranking]. Undergraduate thesis, Faculty of Information Technology, College of Technology, VNU Hanoi, 2006.
[4] Vietnamnet. Blogger Việt Nam có cơ hội "rinh" giải thưởng lớn [Vietnamese bloggers have a chance to "grab" big prizes]. Vietnamnet, 2007.

English References

[5] Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas. Link analysis ranking: algorithms, theory, and experiments. ACM, 2005.
[6] Alan Mislove, Peter Druschel, Massimiliano Marcon, Bobby Bhattacharjee, Krishna P. Gummadi. Measurement and analysis of online social networks. ACM, 2007.
[7] Amy Greenwald, John Wick. QuickRank: A Recursive Ranking Algorithm. The 1st International Workshop on Computational Social Choice (COMSOC-2006), 2006.
[8] Amy N. Langville, Carl D. Meyer. The use of linear algebra by web search engines. Technical report, Department of Mathematics, North Carolina State University, 2004.
[10] Boanerges Aleman-Meza. Searching and ranking documents based on relevance of semantic relationships. IEEE, 2006.
[11] Caroline McCarthy. ComScore's latest numbers: Worldwide social-networking growth. News.com, 2007.
[12] Charles Kadushin. Introduction to Social Network Theory. http://home.earthlink.net/~ckadushin/Texts/, 2004, pages 7-13.
[13] Danah M. Boyd, Nicole B. Ellison. Social network sites: definition, history, and scholarship. Journal of Computer-Mediated Communication, 2007.
[14] Duncan J. Watts, Peter Sheridan Dodds, M. E. J. Newman. Identity and search in social networks. Technical report, Department of Sociology, Columbia University, New York, 2002.
[15] Elizabeth F. Churchill, Christine A. Halverson. Social networks and social networking. IEEE, 2005.
[16] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques, 2nd edition. Morgan Kaufmann Publishers, pages 555-560.
[17] John Battelle. Google Announces New Index Size, Shifts Focus from Counting. http://battellemedia.com/archives/001889.php, 2005.
[26] Orgnet.com. Social Network Analysis, A Brief Introduction. http://www.orgnet.com/sna.html
[33] Wikipedia. Social network. http://en.wikipedia.org/wiki/Social_network
[34] Wikipedia. Centrality. http://en.wikipedia.org/wiki/Eigenvector_centrality
[35] Wikipedia. Spamdexing. http://en.wikipedia.org/wiki/Spamdexing
[36] Wikipedia. Scale free network. http://en.wikipedia.org/wiki/Scale-free_network
