Content description model and framework for efficient content distribution

CONTENT DESCRIPTION MODEL AND FRAMEWORK FOR EFFICIENT CONTENT DISTRIBUTION

ZHANG SHUTAO
(B Eng (Hons.) NUS)
HT00-6864A

A THESIS SUBMITTED FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF COMPUTER SCIENCE
SCHOOL OF COMPUTING
NATIONAL UNIVERSITY OF SINGAPORE
2005

Acknowledgement

I owe my deepest gratitude and appreciation to my thesis supervisor, Dr Chi Chi-Hung, for giving me the opportunity to work with him and my lab mates. I thank him for his continued guidance, insight, patience, encouragement, and above all, his confidence in me, without which this thesis would not have been possible. I am grateful to him for all the time and effort he has spent in helping me improve my research and this document. I would also like to thank Dr Chi Chi-Hung for giving me advice on how to choose my career path at this important stage of life.

I sincerely thank all my lab mates for offering me much needed assistance and for sharing their invaluable insights during my research. Special thanks to my dear friend Wang Hong-Guang for his sincere help and encouragement during the most difficult time of my research. I also want to thank Yuan Jun-Li and Li Qi-Ming for sharing their valuable advice on my research experiment.

Finally, I would like to express my immeasurable appreciation to my wife, my parents and my parents-in-law for their love, trust, inspiration and understanding.

Contents

Summary
List of Figures
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Framework for Customized Content Delivery
2.2 Content Description Model
2.3 Client Descriptions
2.4 Server Side Approaches
2.5 Existing Software Tools
2.6 Summary
Chapter 3 A General Content Description Model
3.1 General Settings
3.2 Proposed Content Description Model
3.2.1 Web Objects
3.2.2 Object Description Scheme
3.2.3 Discussion
Chapter 4 A Framework for Efficient Content Distribution
4.1 Design Objectives
4.2 Overall Architecture
4.3 Server Operations
4.4 Proxy Operations
4.4.1 Mapping User Descriptions to Content Descriptions
4.4.2 Managing Local Content Descriptions
4.5 User Operations
4.6 Summary
Chapter 5 A Case Study on the Framework
5.1 Simulation Setup
5.2 Web Object Size
5.3 Web Object Latency
5.4 XHTML Page Latency
5.5 Summary
Chapter 6 Conclusion
Reference

Summary

Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points with different capabilities. To accommodate the needs arising from heterogeneous user preferences and device capabilities, web intermediaries, called proxies, have started to perform various functions, including Web content caching and image transcoding, on the Web content before it is distributed to the users. As different functions require different content semantic information, which we refer to as content descriptions, web servers are hosting a large amount of content descriptions to help proxies perform these functions.

In this heterogeneous environment, efficient content distribution has become a problem because of a few challenging issues. First, it is not clear how a proxy should decide which functions to perform given any user preferences and device capabilities, because it is not easy, if possible at all, for every proxy to understand every type of device and user, and the users may not know all the functions provided by proxies, either. If this is not properly handled, we may end up delivering unacceptable content to users. Secondly, to provide semantic information about different attributes of Web content, the server may need to store a large amount of content descriptions. Delivering all the descriptions about a Web page to a
proxy when the Web page is requested may be highly inefficient, because the proxy may only need a small fraction of the content descriptions to perform the desired functions. Thirdly, repeatedly delivering the same content descriptions to the same proxy is unnecessary, but so far there is no mechanism for a proxy to properly cache and reuse the content descriptions that it has already retrieved.

In this thesis, we propose a content description model and framework for efficient content distribution. The content description model employs ideas from Resource Description Framework [3] and External Annotation [2], which allow flexible descriptions for Web content. The model also allows a server to efficiently select any subset of the descriptions of any Web page and deliver them to a proxy. The framework consists of several algorithms: for the proxies to map user preferences and device capabilities to a set of functions to be performed, for the server to select and deliver the necessary content descriptions to the proxy, and for the proxy to efficiently cache and reuse the content descriptions.

To evaluate the performance of our framework, we conduct a simulation study with certain simplifications (the details are given in Chapter 5). We employ real world Web objects identified from network traces, and study how our content description model and framework reduce the size of the Web objects, the delay in retrieving Web objects, the number of chunks in HTTP responses, and the delay of entire Web pages. We give some preliminary results and some discussions.

List of Figures

2.1 ICAP Response Modification
2.2 ICAP Request Modification
2.3 InfoPyramid Model
3.1 General Settings
3.2 Description for a Simple XHTML Page
4.1 The Framework Overview
4.2 Mapping User Descriptions to Functions
4.3 Mapping Functions to Set of Attribute Descriptions
4.4 Caching and Validation for Content Descriptions
4.5 Managing Local Attribute Descriptions
5.1 A Sample Content Selection Flow
5.2 Web Object Size Reduction
5.3 Web Object Latency Reductions
5.4 Chunk Number Distribution for Web Objects
5.5 HTML Chunk Number Reduction
5.6 XHTML Page Latency
5.7 Effect of Different Parallel Connections with User Description D1
5.8 Effect of Different Parallel Connections with User Description D2
5.9 Effect of Different Parallel Connections with User Description D3
5.10 Effect of Different Parallel Connections with User Description D4
5.11 Effect of Different Parallel Connections with User Description D5

Chapter 1 Introduction

According to recent surveys [7, 8, 9, 18], the Internet keeps growing rapidly. The World Wide Web (or Web in short), which is based on the Hyper Text Transfer Protocol (HTTP), has become the main platform for information distribution on the Internet. Thompson et al. [9] conducted a study on InternetMCI's backbone and found that Web traffic occupied more than half of the total Internet traffic.

Today, the Web has become a highly heterogeneous environment. Users are accessing information on the Web pervasively through heterogeneous end points, including personal computers and workstations on traditional wired networks, and devices based on more recent wireless technologies. Wireless devices such as smart phones, palm-top devices, and laptop computers are playing a very important role on the Internet. All these Web accessing devices have various capabilities due to their widely diversified hardware computation power (e.g., ...

Figure 5.8 Effect of Different Parallel Connections with User Description D2 (chart "Page Latency Reduction (D2)": percentage of reduction against Web object size, from 0-1K to >100K bytes, for C=4, C=8, C=16 and C=32)

In the above figures, the variable "C" refers to the number of parallel connections. For example, "C=4" means there are 4 parallel connections. We observe that the page latency reduction is higher with a larger number of parallel connections, except for objects smaller than 1K bytes. For example, in Figure 5.8, when the Web object size is between 10K and 20K, the percentage of reduction in page latency is 10.88% and 17.94% for 4 and 8 parallel connections respectively. The marginal benefit in page latency reduction is therefore 17.94% - 10.88% = 7.06% when the number of parallel connections increases from 4 to 8. This marginal benefit, however, decreases as the number of parallel connections increases further.

This observation can be explained by how a Web object and its embedded objects are transferred over HTTP. Web objects are divided into chunks and sent sequentially, one chunk after another. When the browser gets a chunk, it parses the content in the chunk and starts to download the embedded objects discovered in that chunk, provided that there are available parallel connections. So when the number of parallel connections increases, the chance of delay due to waiting for an available connection to download the embedded objects is reduced, and thus the page latency is reduced. But this improvement stops when the number of parallel connections is greater than the number of embedded objects in a chunk. This is why we observe a decreasing marginal effect on the reduction of page latency as the number of parallel connections increases. Objects smaller than 1K bytes are very unlikely to have more than 4 embedded objects, so their reduction ratio remains nearly unchanged with different numbers of parallel connections.
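The explanation above can be illustrated with a toy model. The following Python sketch is not the simulator used in this thesis; it is a minimal illustration under stated assumptions: the page arrives as equally spaced chunks, every embedded object takes a fixed download time, and a free connection is reused as soon as it becomes available. The function name `page_latency`, the parameter `chunk_time`, and the example page are all hypothetical.

```python
import heapq


def page_latency(chunks, connections, chunk_time=1.0):
    """Toy estimate of page latency (all names and timings are illustrative).

    chunks: one list per HTML chunk, holding the download time of each
            embedded object discovered in that chunk.
    connections: number of parallel connections available to the browser.
    chunk_time: assumed time between the arrival of consecutive chunks.
    """
    # Min-heap of the times at which each connection becomes free.
    free_at = [0.0] * connections
    heapq.heapify(free_at)
    finish = 0.0
    for index, objects in enumerate(chunks):
        arrival = (index + 1) * chunk_time  # chunks arrive one after another
        finish = max(finish, arrival)
        for duration in objects:
            # An object can start only after its chunk has arrived and
            # after some connection has become free.
            start = max(arrival, heapq.heappop(free_at))
            end = start + duration
            heapq.heappush(free_at, end)
            finish = max(finish, end)
    return finish


# Hypothetical page: 12 embedded objects in the first chunk, 6 in the second.
example_page = [[1.0] * 12, [1.0] * 6]
for c in (4, 8, 16, 32):
    print(c, page_latency(example_page, c))
```

In this toy setting the latency drops from 6.0 to 4.0 when going from 4 to 8 connections, from 4.0 to 3.0 when going from 8 to 16, and does not change between 16 and 32, because no chunk has more than 12 embedded objects. The numbers themselves carry no meaning beyond showing the same diminishing pattern as the measured curves.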
Figure 5.9 Effect of Different Parallel Connections with User Description D3 (chart "Page Latency Reduction (D3)": percentage of reduction against Web object size, for C=4, C=8, C=16 and C=32)

Figure 5.10 Effect of Different Parallel Connections with User Description D4 (chart "Page Latency Reduction (D4)": percentage of reduction against Web object size, for C=4, C=8, C=16 and C=32)

For user descriptions D3 and D4, we notice that the marginal benefit of increasing the number of parallel connections is much smaller with D3 than with D4. This is because there are many more embedded images than scripts; hence, after content selection, more embedded objects remain with user description D4 than with D3, which implies that increasing the number of parallel connections has a larger effect in reducing page latency with D4.

Figure 5.11 Effect of Different Parallel Connections with User Description D5 (chart "Page Latency Reduction (D5)": percentage of reduction against Web object size, for C=4, C=8, C=16 and C=32)

With user description D5, the page latency reduction ratio does not change at all when the number of parallel connections changes. This is expected, as there are no embedded objects left after content selection in this scenario.

5.5 Summary

In this chapter, we conduct a simulation study on how the framework improves the efficiency of content delivery for real world Web objects. The effect on content delivery is evaluated in three aspects: Web object size, Web object latency and XHTML page latency. In the following, we summarize our findings.

The framework reduces Web object sizes to some extent. On average, across all Web object sizes, there is a reduction of 4.5%, 12.3%, 21.2%, 22.3% and 35% with user descriptions D1, D2, D3, D4 and D5, respectively. The reduction in size helps to reduce the network resources wasted in transferring content that is not needed by
users.

When we consider Web object latency, we find that the latency reduction is not proportional to the size reduction. This is because Web objects are delivered in chunks, and there is little effect on Web object latency when the number of chunks needed to deliver a Web object remains the same. As a result, we find that the framework is more effective in reducing Web object latency when the object size is larger than 1K bytes. The reduction ratio stabilizes at 15% to 20% for user descriptions D1 to D4 and at 25% for user description D5 when the Web object size becomes large.

Besides size and Web object latency, we further study XHTML page latency. This is more complicated than Web object latency alone, as we need to take into account several other important factors, including the position and latency of the embedded objects and the parallelism in downloading them. To study how these factors affect the page latency, we take a two-step approach. First, we assume that a fixed number of parallel connections is available and study how the framework affects XHTML page latency with different user descriptions. After that, we study how the number of parallel connections affects the efficiency of the framework by repeating our experiments for different numbers of parallel connections.

With the number of parallel connections fixed, we find that the page latency reduction ratio is highly correlated to the number of embedded objects in a page. We observe that for user descriptions D1, D2 and D4, the reduction ratio is around 7% to 10%, while for D3 and D5 this ratio shoots up to 25% and 45% on average, respectively.

When we change the number of parallel connections, we find that increasing the number of parallel connections helps to reduce the page latency. However, the marginal benefit decreases as the number of parallel connections increases, i.e., the increment in the reduction ratio of page latency from 4 to 8 parallel connections is higher than that from 8 to 16 parallel connections. There is only a
small improvement in page latency reduction ratio when the number of parallel connections increases from 16 to 32.

From the above observations, we conclude that the simulated framework achieves its goal of improving the efficiency of content delivery from the server to the user by reducing the sizes of the Web objects and the latencies of the Web objects and the pages.

Chapter 6 Conclusion and Future Work

In this thesis, we study the problem of efficient content distribution in a heterogeneous environment. We propose a content description model and framework. The content description model defines how the Web objects are described and how these descriptions are organized. Our model supports a wide range of content descriptions and allows a server to easily select any subset of the content descriptions of a given Web page. The framework consists of several algorithms and guidelines for a proxy to easily and correctly map user device capabilities and preferences to a set of functions to be performed, and to determine which content descriptions are necessary to perform these functions. We also define HTTP protocol extensions and local rules for a proxy to select, cache and validate content descriptions from a server.

With the proposed model and framework, we can improve the efficiency of content distribution by sending only the desired content to a user, by requesting only the necessary content descriptions from a server, and by caching and reusing existing content descriptions at the proxy. We also conduct a simulation study on the proposed model and framework with Web objects from the real world and varying user preferences and device capabilities. Our results suggest that our model and framework can be applied to achieve efficient Web content distribution.

In this thesis, we only implement the content description model and delivery framework on a LAN. This is not a complete implementation of the model and the framework. Due to the complexity of the framework and variety of existing computer
systems, implementation of all the features of the framework on the Internet is beyond the scope of the thesis. As the chunk-level simulation for HTML object delivery indicates, the model and the simplified framework are effective in the LAN. We expect the effectiveness to be even higher in a complete implementation in a heterogeneous environment on the Internet, because features such as retrieving only the necessary content descriptions from the server and caching and reusing Web content and content descriptions will further boost the performance of the framework. This remains to be proven by more complete implementations in the future.

In this framework, we do not consider scenarios where multiple proxies cooperate with each other to perform various functions on Web content. Such cooperation would require each proxy to know what has been done by the others, and could further improve efficiency. It would make the framework even better suited to a heterogeneous environment and is left to further research.

Besides the above, we have assumed data integrity from the server to the client in the framework. This assumption is not always valid in a real world network, where malicious network nodes can modify the content in transit. To make this framework more realistic, we need to incorporate mechanisms to handle data integrity and to delegate content adaptation operations to authorized proxies. Since there are existing solutions in the literature for data integrity [35] and proxy delegation [36], they can be added to make the framework more robust.

References

[1] ESI language specification 1.0, 2000, http://www.esi.org
[2] Masahiro H., Goh K., Kouichi O., Hirose S., Sandeep S., "Annotation-based Web content transcoding", The 9th WWW Conference, 2000.
[3] Jeffrey C. M., "Server Directed Transcoding", Computer Communications 24(2):155-162, February 2001.
[4] A. Fox, S. Gribble, Y. Chawathe, and E. A. Brewer, "Adapting to Network and Client Variation Using Infrastructural Proxies: Lessons and Perspectives", IEEE Personal Communications, August 1998.
[5] Device independence, http://www.w3c.org/2001/di
[6] X. Fu and V. Karamcheti, "Why path-based adaptation? Performance implications of different adaptation mechanisms for network content delivery", Technical Report TR2003-843, Computer Science Department, New York University, July 2003.
Computer Industry Almanac Inc., http://www.c-i-a.com/pr032102.htm
[7] K. G. Coffman and A. M. Odlyzko, "Growth of the Internet", Optical Fiber Telecommunications IV, I. P. Kaminow and T. Li, eds., Academic Press, March 2002.
[8] K. Thompson, G. J. Miller, and R. Wilder, "Wide-Area Internet Traffic Patterns and Characteristics", IEEE Network, vol. 11, no. 6, pp. 10-23, November 1997.
[9] Composite Capability and Preference Profile, http://www.w3.org/Mobile/CCPP/
[10] WAP Forum, http://www.wapforum.org/
[11] HTTP: Hypertext Transfer Protocol, http://www.w3.org/Protocols/
[12] A. Luotonen and K. Altis, "World-Wide Web Proxies", In Proc. of 1st International World Wide Web Conference, Geneva (Switzerland), May 1994.
[13] Internet Content Adaptation Protocol (I-CAP), http://www.i-cap.org
[14] A. Fox, S. D. Gribble, E. A. Brewer, and E. Amir, "Adapting to Network and Client Variability via On-Demand Dynamic Distillation", In Proc. of 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Cambridge (MA), October 1996.
[15] Open Pluggable Edge Service, http://www.ietf-opes.org
[16] Content selection for device independence, http://www.w3c.org/TR/cselection/
[17] Internet Systems Consortium, http://www.isc.org/index.pl?/ops/ds/reports/2004-01/
[18] eTForecasts, http://www.etforecasts.com/pr/pr0402b.htm
[19] WebOverDrive, http://www.freedownloadscenter.com/Web_Authoring/Misc Web_Authoring_Tools/WebOverdrive.html
[20] J. R. Smith, R. Mohan, and C. S. Li, "Transcoding Internet Content for Heterogeneous Client Devices", In Proc. of IEEE International Symposium on Circuits and Systems (ISCAS), Monterey (CA), June 1998.
[21] Jeffrey C. M., Fred D., Anja F., and Balachander K., "Potential benefits of delta encoding and data compression for HTTP", In Proc. SIGCOMM '97 Conference, pages 181-194, 1997.
[22] Dynamic HTML Conversion to WML, http://html2wml.sourceforge.net/
[23] IRCACHE Proxy Traces, http://ircache.nlanr.net
[24] General Packet Radio Service, http://www.gsmworld.com/technology/gprs/intro.shtml
[25] WebSTAT, http://www.webstat.com/
[26] Peter Cranstone, "HTTP Compression Speeds up the web", http://www.R25.com/internet/software/servers/http/compression/
[27] World Wide Web Consortium, "The effect of HTML Compression on a LAN and a PPP Modem Line", http://www.w3.org/Protocols/HTTP/Performance/Compression/LAN.html, http://www.w3.org/Protocols/HTTP/Performance/Compression/PPP.html
[28] Yuan J.-L. and Chi C.-H., "Understanding Compression in Web Content Delivery", The 9th International Workshop on Web Content Caching and Distribution, Beijing, China, 2004.
[29] ClickZ Internet Statistics and Demographics, http://www.clickz.com/stats/
[30] R. Mohan, J. R. Smith, and C. S. Li, "Adapting Multimedia Internet Content for Universal Access", IEEE Transactions on Multimedia, vol. 1, no. 1, pp. 104-114, March 1999.
[31] A. Fox, S. D. Gribble, E. A. Brewer, and E. Amir, "Adapting to Network and Client Variability via On-Demand Dynamic Distillation", In Proc. of 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Cambridge (MA), October 1996.
[32] Bjorn K., Lu H.-H., Jeffrey M., "Architecture and pragmatics of Server directed transcoding", Proceedings of the 7th International Workshop on Web Content Caching and Distribution, Boulder, CO, USA, August 2002.
[33] UAProf profile repository, http://w3development.de/rdf/uaprof_repository/
[34] Yu X. Y., "Data integrity framework for web content delivery", Master Thesis, National University of Singapore, 2003.
[35] Open Pluggable Edge Services, http://www.ietf-opes.org/
[36] User Agent Profile, WAP Forum, http://www.wapforum.org/what/technical/SPEC-UAProf-19991110.pdf
[37] Ora L., Ralph S., "Resource Description Framework (RDF) Model and Syntax Specification", World Wide Web Consortium Recommendation, 1999, http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[38] "Dynamic Content Acceleration: A Caching Solution to Enable Scalable Dynamic Web Page Generation", White Paper, Chutney Technologies, May 2001.
[39] M. Conner, G. Copeland, G. Flurry, "Scaling Up e-Business Applications with Caching", DeveloperToolbox Magazine, August 2000, http://service2.boulder.ibm.com/devtools/news0800/art7.htm
[40] A. Iyengar, J. Challenger, D. Dias, P. Dantzig, "High-Performance Web Site Design Techniques", IEEE Internet Computing, March/April 2000.
[41] Surendar Chandra and Carla Schlatter Ellis, "JPEG Compression Metric as a Quality Aware Image Transcoding", Department of Computer Science, Duke University, Durham, NC 27708, July 2004.
[42] Nathaniel Good, J. Ben Schafer, Joseph A. Konstan, Al Borchers, Badrul Sarwar, Jon Herlocker, and John Riedl, "Combining Collaborative Filtering with Personal Agents for Better Recommendations", Department of Computer Science and Engineering, University of Minnesota, June 2004.
[43] The Extensible Device Independent Markup Language, http://www.volantis.com
[44] Resources for the Extensible Markup Language (XML), http://www.w3.org/XML/
[45] Resources for the XML Path Language (XPath), http://www.w3.org/TR/xpath
[46] IBM WebSphere Portal Family, http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=products/portal&S_TACT=101CMM04
[47] BEA WebLogic Portal, http://www.bea.com/framework.jsp?CNT=index.htm&FP=/content/products/weblogic/portal
[48] Online repository system for mobile devices, http://www.volantis.com
[49] Michael J. and Hu Ye Jian, "Multimedia Description Framework (MDF) for Content Description of Audio/Video Documents", School of EEE, Nanyang Technological University, May 2004.
[50] Annotation of Web Content for Transcoding, http://www.w3.org/1999/07/NOTEannot-19990710/
