Web content delivery

Web Information Systems Engineering and Internet Technologies Book Series

Series Editor: Yanchun Zhang, Victoria University, Australia

Editorial Board:
Robin Chen, AT&T
Umeshwar Dayal, HP
Arun Iyengar, IBM
Keith Jeffery, Rutherford Appleton Lab
Xiaohua Jia, City University of Hong Kong
Yahiko Kambayashi, Kyoto University
Masaru Kitsuregawa, Tokyo University
Qing Li, City University of Hong Kong
Philip Yu, IBM
Hongjun Lu, HKUST
John Mylopoulos, University of Toronto
Erich Neuhold, IPSI
Tamer Ozsu, Waterloo University
Maria Orlowska, DSTC
Gultekin Ozsoyoglu, Case Western Reserve University
Michael Papazoglou, Tilburg University
Marek Rusinkiewicz, Telcordia Technology
Stefano Spaccapietra, EPFL
Vijay Varadharajan, Macquarie University
Marianne Winslett, University of Illinois at Urbana-Champaign
Xiaofang Zhou, University of Queensland

Other Books in the Series:
Semistructured Database Design by Tok Wang Ling, Mong Li Lee, Gillian Dobbie, ISBN 0-378-23567-1

Web Content Delivery

Edited by
Xueyan Tang, Nanyang Technological University, Singapore
Jianliang Xu, Hong Kong Baptist University
Samuel T. Chanson, Hong Kong University of Science and Technology

Springer

Library of Congress Cataloging-in-Publication Data
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-10: 0-387-24356-9 (HB)    e-ISBN-10: 0-387-27727-7
ISBN-13: 978-0387-24356-6 (HB)    e-ISBN-13: 978-0387-27727-1

© 2005 by Springer Science+Business Media, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America
9 8 7 6 5 4 3 2 1    SPIN 11374763
springeronline.com

Contents

Preface

Part I Web Content Delivery

1 Web Workload Characterization: Ten Years Later
Adepele Williams, Martin Arlitt, Carey Williamson, and Ken Barker

2 Replica Placement and Request Routing
Magnus Karlsson

3 The Time-to-Live Based Consistency Mechanism
Edith Cohen and Haim Kaplan

4 Content Location in Peer-to-Peer Systems: Exploiting Locality
Kunwadee Sripanidkulchai and Hui Zhang

Part II Dynamic Web Content

5 Techniques for Efficiently Serving and Caching Dynamic Web Content
Arun Iyengar, Lakshmish Ramaswamy and Bianca Schroeder

6 Utility Computing for Internet Applications
Claudia Canali, Michael Rabinovich and Zhen Xiao

7 Proxy Caching for Database-Backed Web Sites
Qiong Luo

Part III Streaming Media Delivery

8 Generating Internet Streaming Media Objects and Workloads
Shudong Jin and Azer Bestavros

9 Streaming Media Caching
Jiangchuan Liu

10 Policy-Based Resource Sharing in Streaming Overlay Networks
K. Selçuk Candan, Yusuf Akca, and Wen-Syan Li

11 Caching and Distribution Issues for Streaming Content Distribution Networks
Michael Zink and Prashant Shenoy

12 Peer-to-Peer Assisted Streaming Proxy
Lei Guo, Songqing Chen and Xiaodong Zhang

Part IV Ubiquitous Web Access

13 Distributed Architectures for Web Content Adaptation and Delivery
Michele Colajanni, Riccardo Lancellotti and Philip S. Yu

14 Wireless Web Performance Issues
Carey Williamson

15 Web Content Delivery Using Thin-Client Computing
Albert M. Lai and Jason Nieh

16 Optimizing Content
Delivery in Wireless Networks
Pablo Rodriguez Rodriguez

17 Multimedia Adaptation and Browsing on Small Displays
Xing Xie and Wei-Ying Ma

Preface

The concept of content delivery (also known as content distribution) is becoming increasingly important due to rapidly growing demands for efficient distribution of, and fast access to, information on the Internet. Content delivery is a broad topic: the contents for distribution cover a wide range of types with significantly different characteristics and performance concerns, including HTML documents, images, multimedia streams, database tables, and dynamically generated contents. Moreover, to facilitate ubiquitous information access, the network architectures and hardware devices also vary widely. They range from broadband wired/fixed networks to bandwidth-constrained wireless/mobile networks, and from powerful workstations/PCs to personal digital assistants (PDAs) and cellular phones with limited processing and display capabilities. All these levels of diversity introduce numerous challenges for content delivery technologies. It is desirable to deliver contents in their best quality based on the nature of the contents, network connections and client devices.

This book aims to provide a snapshot of state-of-the-art research and development activities on web content delivery and to lay the foundations for future web applications. The book focuses on four main areas: (1) web content delivery; (2) dynamic web content; (3) streaming media delivery; and (4) ubiquitous web access. It consists of 17 chapters written by leading experts in the field.

The book is designed for a professional audience including academic researchers and industrial practitioners who are interested in the most recent research and development activities on web content delivery. It is also suitable as a textbook or reference book for graduate-level students in computer science and engineering.

Part I WEB CONTENT DELIVERY

Chapter 1

WEB WORKLOAD CHARACTERIZATION: TEN YEARS LATER

Adepele Williams, Martin Arlitt, Carey Williamson, and Ken Barker
Department of Computer Science, University of Calgary
2500 University Drive NW, Calgary, AB, Canada T2N 1N4
{awilliam,arlitt,carey,barker}@cpsc.ucalgary.ca

Abstract
In 1996, Arlitt and Williamson [Arlitt et al., 1997] conducted a comprehensive workload characterization study of Internet Web servers. By analyzing access logs from academic, research, and industrial Web sites (3 of them academic) in 1994 and 1995, the authors identified 10 invariants: workload characteristics common to all the sites that are likely to persist over time. In the present work, we revisit the 1996 work by Arlitt and Williamson, repeating many of the same analyses on new data sets collected in 2004. In particular, we study access logs from the same academic sites used in the 1996 paper. Despite a 30-fold increase in overall traffic volume from 1994 to 2004, our main conclusion is that there have been no dramatic changes in Web server workload characteristics in the last 10 years. Although there have been many changes in Web technologies (e.g., new protocols, scripting languages, caching infrastructures), most of the 1996 invariants still hold true today. We postulate that these invariants will continue to hold in the future, because they represent fundamental characteristics of how humans organize, store, and access information on the Web.

Keywords: Web servers, workload characterization

Introduction
Internet traffic volume continues to grow rapidly, having almost doubled every year since 1997 [Odlyzko, 2003]. This trend, dubbed "Moore's Law [Moore, 1965] for data traffic", is attributed to increased Web awareness and the advent of sophisticated Internet networking technology [Odlyzko, 2003]. Emerging technologies such as Voice-over-Internet Protocol (VoIP) telephony and Peer-to-Peer (P2P) applications (especially for music and video file sharing) further

Multimedia Adaptation and Browsing on Small Displays
Figure 17.2. The binary tree used in presentation optimization.

The upper bound is the sum of the IF values of the unchecked blocks after the current level, in other words, the sum of the IF values of all blocks in P except those discarded before the current level. We perform a depth-first traversal on this tree according to the following constraints:
• Whenever the upper bound of a node is smaller than the best IF value currently achieved, the whole sub-tree of that node, including the node itself, is truncated.
• At the same time, for each node we check Equation 17.3 to verify its validity. If the constraint is broken, the node and its whole sub-tree are truncated, because including a new block can only increase the sum of MPS values.
• If we arrive at a block set with an IF value larger than the current best IF value, we replace the current best IF value with this one.

Checking the bounds on the possible IF value greatly reduces the computation cost. Other techniques can reduce the traversal time further, such as arranging all the objects in decreasing order of their importance values at the beginning of the search, since in many cases only a few objects contribute the majority of the IF value. The complexity of this algorithm is exponential in the number of information objects in the worst case. However, our approach can be conducted efficiently, because the number of information objects is often fewer than a few dozen and the importance values are always distributed quite unevenly among information objects.

4 Adapting Web Pages for Small Displays

In this section, we show how we apply the previous content model to define a web page representation that is scalable to various display sizes.

4.1 Document Representation Model

We adopt an approach similar to the fisheye view [Sarkar and Brown, 1994]: when the display area shrinks, some parts of the web page are summarized and then presented, together with the other unsummarized parts, adaptively to end users with aesthetic layouts. Here an information block is defined as a logically independent portion of the HTML page. There are two issues we should take care of in web page adaptation:
• Web authors usually do not want their content to be randomly shuffled after adaptation. Therefore, we should preserve the relative position of information blocks.
• Information blocks such as text usually require a minimal height and width to be properly displayed. In addition, some information blocks, such as text blocks, can be reflowed while others cannot.

Based on the content representation model described in Definition 1, we introduce a scalable web page representation, as shown in Definition 2. The extensions to the original representation model are mainly twofold:
• In order to give authors control over the final page layout, we leverage binary slicing trees, a data structure widely used in the computer-aided design community [Cohoon and Paris, 1987], instead of an unordered set to organize the information blocks.
• We add three additional properties to each information object in order to characterize its special display constraints.

Definition 2: The resulting web page representation is a binary slicing tree with N leaf nodes. Each inner node is labelled with either v or h, denoting a vertical or horizontal split, and each leaf node is an information block defined as follows:

$$B_i = (IMP_i, MPS_i, ALT_i, MPH_i, MPW_i, ADJ_i) \tag{17.6}$$

where
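
The slicing-tree structure of Definition 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class names `Block` and `Split`, the fields `mpw` and `mph` (read here as the minimal width and height a block needs, standing in for MPW_i and MPH_i), and the convention that a v split places its children side by side while an h split stacks them are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Block:
    """Leaf node: one information block (hypothetical, simplified fields)."""
    name: str
    mpw: float  # assumed meaning: minimal width for proper display
    mph: float  # assumed meaning: minimal height for proper display


@dataclass
class Split:
    """Inner node labelled 'v' (vertical split) or 'h' (horizontal split)."""
    kind: str
    first: "Node"
    second: "Node"


Node = Union[Block, Split]


def min_size(node: Node) -> Tuple[float, float]:
    """Return the (width, height) needed to show every block in the subtree."""
    if isinstance(node, Block):
        return node.mpw, node.mph
    w1, h1 = min_size(node.first)
    w2, h2 = min_size(node.second)
    if node.kind == "v":
        # Assumption: a vertical split puts the children side by side.
        return w1 + w2, max(h1, h2)
    if node.kind == "h":
        # Assumption: a horizontal split stacks the children.
        return max(w1, w2), h1 + h2
    raise ValueError(f"unknown split label: {node.kind!r}")


def leaves_in_order(node: Node) -> List[str]:
    """List block names in tree order, which fixes their relative position."""
    if isinstance(node, Block):
        return [node.name]
    return leaves_in_order(node.first) + leaves_in_order(node.second)
```

Because the tree is ordered, traversing it always yields the blocks in the same sequence, which is one way to honour the requirement that content not be shuffled after adaptation.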

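The depth-first search with bound checking described under Figure 17.2 can be sketched as a small branch-and-bound routine. The sketch rests on assumptions that the excerpt does not confirm: the IF value of a block set is taken to be the sum of its blocks' importance values, Equation 17.3 is stood in for by a simple test that the sum of MPS values does not exceed a display area, and the function name `best_block_set` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def best_block_set(importance: Sequence[float],
                   mps: Sequence[float],
                   area: float) -> Tuple[float, List[int]]:
    """Return (best IF value, sorted indices of the chosen blocks)."""
    n = len(importance)
    # Visit blocks in decreasing order of importance so that good
    # solutions are found early and pruning becomes effective.
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)

    # suffix[k] = total importance of the blocks not yet decided at level k.
    suffix = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix[k] = suffix[k + 1] + importance[order[k]]

    best_value = 0.0
    best_set: List[int] = []

    def visit(level: int, chosen: List[int],
              current_if: float, used_area: float) -> None:
        nonlocal best_value, best_set
        # Record an improvement over the best IF value found so far.
        if current_if > best_value:
            best_value, best_set = current_if, sorted(chosen)
        if level == n:
            return
        # Upper bound: everything chosen so far plus everything undecided,
        # i.e. all blocks except those already discarded.
        if current_if + suffix[level] < best_value:
            return
        i = order[level]
        # Include branch, taken only if the size constraint still holds
        # (assumed stand-in for Equation 17.3).
        if used_area + mps[i] <= area:
            chosen.append(i)
            visit(level + 1, chosen, current_if + importance[i],
                  used_area + mps[i])
            chosen.pop()
        # Exclude branch.
        visit(level + 1, chosen, current_if, used_area)

    visit(0, [], 0.0, 0.0)
    return best_value, best_set
```

For example, `best_block_set([50, 30, 20], [60, 50, 40], 100)` selects blocks 0 and 2: the two most important blocks together need 110 units of area and do not fit, and once the value 70 has been found the subtree that excludes block 0 is pruned because its upper bound is only 50.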