DOCUMENT INFORMATION
Basic information
Format
Pages
306
Size
3.7 MB
Content
[...]... on the Web in Chapter 7. The chapter also deals with the basic principles of Web crawlers. Web crawling is essential to gather information about the Web and in this sense is a prerequisite for the study of the Web graph in Chapter 3. Chapter 3 studies the Internet and the Web as large graphs. It describes, models, and analyzes the power-law distribution of Web sizes, connectivity, PageRank, and the ‘small-world’...
the ‘glue’ of this book. Chapter 2 provides an introduction to the Internet and the Web and the foundations of the WWW technologies that are necessary to understand the rest of the book, including the structure of Web documents, the basics of Internet protocols, Web server log files, and so forth. Server log files, for instance, are important to thoroughly understand the analysis of human behavior on the...
such as webs of scientific citations, social relations, or even protein interactions. In this sense, it is fair to say that a predominant fraction of our book is about the Web and the information aspects of the Internet. We use Web every time we refer to the World Wide Web and web when we refer to a broader class of networks or other kinds of networks, i.e. web of citations. As the Internet and the Web...
Mathematics, Economics and Business, and Social Sciences. The topic is quite broad. On the surface the Web could appear to be a limited subdiscipline of computer science, but in reality it is impossible for a single researcher to have an in-depth knowledge and understanding of all the areas of science and technology touched by the Internet and the Web. While we do not claim to cover all aspects of the Internet...
but they have also themselves become the objects of active scientific investigation. And not only for computer scientists and engineers, but also for mathematicians, economists, social scientists, and even biologists. There are many reasons why the Internet and the Web are exciting, albeit young, topics for scientific investigation. These reasons go beyond the need to improve the underlying technology and...
tried to respect. Internet, in particular, is the more general term and implicitly includes physical aspects of the underlying networks as well as mechanisms such as email and peer-to-peer activities that are not directly associated with the Web. The term Web, on the other hand, is associated with the information stored and available on the Internet. It is also a term that points to other complex networks...
technology and to harness the Web for commercial applications. Because the Internet and the Web can be viewed as dynamic constellations of interconnected processors and Web pages, respectively, they can be monitored in many ways and at many different levels of granularity, ranging from packet traffic, to user behavior, to the graphical structure of Web pages and their hyperlinks. These measurements provide...
was either greater than five billion or not. There is, however, considerable uncertainty about what this number was back in January 2003 since, as we will discuss later in Chapters 2 and 3, accurately estimating the size of the Web is a quite challenging problem. Consequently there is uncertainty about whether the proposition e is true or not.
Modeling the Internet and the Web. P. Baldi, P. Frasconi and P...
the basic axioms of probability. The ‘normalization’ constant in the denominator of Equation (1.1) can be calculated by noting that $P(D) = P(D \mid e)P(e) + P(D \mid \bar{e})P(\bar{e})$. It is easy to see that $P(e \mid D)$ depends both on the prior and the likelihood in terms of ‘competing’ with the alternative hypothesis $\bar{e}$: the larger they are relative to the prior for $\bar{e}$ and the likelihood for $\bar{e}$, then the...
1 observations for the first of the two words. Since the prior is flat, the posterior Beta distribution has the same shape as the likelihood function. Figure 1.3 shows the same inference problem with the same data, but with a different prior: now the prior is ‘stronger’ and favors a parameter π that is around 0.5. In this case the likelihood and the posterior have different shapes and the posterior in effect...
Modeling the Internet and the Web: Probabilistic Methods and Algorithms. Pierre Baldi, School of Information and Computer...
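The two Bayesian steps mentioned in the excerpts above (the normalization constant for a proposition e against its complement, and a Beta posterior under a flat versus a ‘stronger’ prior) can be illustrated with a minimal Python sketch. The excerpts elide the book's actual data and prior parameters, so every number below is hypothetical, and the function names are illustrative rather than taken from the book. The Beta update relies on the standard Beta-binomial conjugacy result.

```python
def posterior_binary(prior_e, lik_e, lik_not_e):
    """Return P(e | D) for a proposition e and its complement.

    prior_e   : P(e)
    lik_e     : P(D | e)
    lik_not_e : P(D | not e)
    """
    # Normalization constant: P(D) = P(D|e)P(e) + P(D|not e)P(not e)
    evidence = lik_e * prior_e + lik_not_e * (1.0 - prior_e)
    return lik_e * prior_e / evidence


def beta_posterior(a, b, successes, trials):
    """Conjugate update: a Beta(a, b) prior with binomial data gives
    a Beta(a + successes, b + failures) posterior."""
    return a + successes, b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode of a Beta(a, b) distribution (valid for a > 1 and b > 1)."""
    return (a - 1) / (a + b - 2)


if __name__ == "__main__":
    # Hypothetical numbers: equal prior, data four times likelier under e.
    print(posterior_binary(prior_e=0.5, lik_e=0.8, lik_not_e=0.2))

    # Hypothetical data: 3 occurrences of the first word in 10 observations.
    r, n = 3, 10

    # Flat prior Beta(1, 1): the posterior mode equals the
    # maximum-likelihood estimate r / n.
    flat = beta_posterior(1, 1, r, n)
    print(flat, beta_mode(*flat))

    # Hypothetical 'stronger' prior concentrated near 0.5: the posterior
    # mean is pulled toward 0.5 and away from r / n.
    strong = beta_posterior(50, 50, r, n)
    print(strong, beta_mean(*strong))
```

With the flat prior the posterior peaks at the same value as the likelihood. With the stronger prior the same data moves the estimate only slightly away from 0.5, which matches the qualitative contrast the excerpt draws between the two figures.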