Web Caching
Elliot Jaffe
Presentation for the Seminar on Database and Internet
Hebrew University, Fall 2002

Agenda
- Caching: Why, Where, How, What
- Some empirical data: Zipf's Law
- Content Delivery Networks
- Bibliography

Why cache?
- Number of unique pages X: 800M < X < 2.2B
- Number of unique web sites: 8,500,000
- Static pages: 30% - 40%
- Pages revisited: 80%
- Expected hit rate: 24% - 32% (static fraction times revisit fraction: 0.30 x 0.80 = 0.24, 0.40 x 0.80 = 0.32)

Why cache?
- Bandwidth
- Latency
- Performance = Response Time
- Server Load
- Failure Redundancy

Where
[Figure: network diagram showing where caches sit. Labels: Browser caches, Intranet cache, Local ISP cache, ISP, CDN, L4 Switch, Reverse Proxies in front of the Content Servers, Data Center]

Hot-potato routing
- Get traffic off of your network as soon as possible
- Bounces traffic around the Internet
- Increases the chance of dropped packets
- Increases latency
[Figure: route map with labels "You are here" and "Destination"]

How: Types of Caches
- Simple Proxy
- Transparent Proxy
- Reverse Proxy
- Adaptive Caching
- Push Caching
- Active Caching
- Streaming Caches

How: Simple Proxy
- Harvest/Squid
- Provides web content for a fixed user base
- Standalone operation
- May be transparent
- Commodity product/technology
- Easy to get 90% correct

How: Transparent Proxy
- No client configuration
- Violates the end-to-end paradigm
- Client thinks it is talking directly to the server
- Server thinks it is talking to the cache
- Implemented as:
  - Pass-through unit
  - L4 switch

How: Reverse Proxy
- Designed to offload duties from one or more specific servers
- Data size is limited to the size of the static content on the server
- Challenge is fast, disk-less operation
- Cache consistency is easy
- Single point of failure

On the Use and Performance of Content Distribution Networks
- Focus is on client-perceived performance
- Build a canonical web page with images from the CDN server

On the Use and Performance of Content Distribution Networks
- If each CDN serves different content, then how did they create comparable pages?
- Size matters!
- Select images of (almost) identical sizes from each of the CDN services

On the Use and Performance of Content Distribution Networks
Step 1:
- For services using only DNS redirection, get an IP address from the DNS server
- For services using rewriting, get the page and extract the CDN content server from the page
- Amortize the DNS lookup time over all images in this page

On the Use and Performance of Content Distribution Networks
Step 2:
- Download all the images from the IP address of the identified server
- Throw this data away
- The purpose is to make sure that there are no cache misses

On the Use and Performance of Content Distribution Networks
Step 3:
- Download all the images from the IP address of the identified server, just like a browser would (4 in parallel)
- Repeat every 30 minutes over a period of 24 hours, with a 10-minute jitter

On the Use and Performance of Content Distribution Networks
- Results

On the Use and Performance of Content Distribution Networks
Four Conclusions
- Forcing a DNS lookup in the critical path of resource retrieval does not generally result in better server choices
- The download time from a previously selected server is often better than the download time from the newly selected server
- CDN servers are generally not loaded, so frequent DNS lookups are not helpful
- It makes sense for CDNs to increase the DNS TTL given to a client unless the servers are known to be loaded

On the Use and Performance of Content Distribution Networks
- Is this a better study?
- More detailed results
- Relates to observed performance
- A good marketing white paper
- What did we learn?

Dirty Secrets of the CDN world
- CDNs are tremendously underutilized
- CDNs are over-architected
- The value of a CDN is its remote presence in the ISP
- Not in its ability to load balance
- Remember the ISP Interconnect?
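
The measurement procedure in Steps 2 and 3 of the CDN study (warm the cache, then time a browser-like download of 4 images in parallel, repeated every 30 minutes for 24 hours with jitter) could be sketched roughly as follows. This is only an illustration, not the study's actual tooling: `schedule_minutes`, `measure_page`, and the caller-supplied `fetch` function are hypothetical names, Step 1 (resolving the CDN server) is left out, and the jitter is assumed to be a uniform random offset of 0 to 10 minutes added to each nominal start time.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def schedule_minutes(period=30, duration=24 * 60, jitter=10, rng=None):
    """Return jittered start times (in minutes) for one 24-hour run."""
    if rng is None:
        rng = random.Random()
    # Assumption: jitter is a uniform offset between 0 and `jitter` minutes.
    return [t + rng.uniform(0, jitter) for t in range(0, duration, period)]


def measure_page(image_urls, fetch, parallel=4):
    """Warm the cache, then time a browser-like parallel download.

    `fetch` is a hypothetical caller-supplied function: url -> bytes.
    Returns (elapsed_seconds, total_bytes) for the timed pass only.
    """
    # Step 2: fetch every image once and throw the data away, so the
    # timed pass below should not be served from a cold cache.
    for url in image_urls:
        fetch(url)

    # Step 3: fetch again, `parallel` images at a time, and time the page.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        sizes = [len(body) for body in pool.map(fetch, image_urls)]
    elapsed = time.perf_counter() - start
    return elapsed, sum(sizes)


if __name__ == "__main__":
    # Stand-in for a real HTTP GET against the selected CDN server.
    def fake_fetch(url):
        return b"x" * 1024

    urls = [f"http://cdn.example/img{i}.gif" for i in range(8)]
    elapsed, total = measure_page(urls, fake_fetch)
    print(f"{len(schedule_minutes())} runs planned; "
          f"downloaded {total} bytes in {elapsed:.4f}s")
```

Passing `fetch` in as a parameter keeps the sketch runnable without network access; a real measurement would replace `fake_fetch` with an HTTP client pointed at the IP address obtained in Step 1.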

P2P content delivery systems
- PUSH content to the leaf nodes
- Serve other leaf nodes from the edges
- Kontiki
[Figure: diagram with a Content Manager and clients]

P2P CDN
Four Challenges
- Aggregate input streams
- Deal with unstable peers
- Manage malicious peers
- Who really pays for this?

P2P Caching?
Discussion:
- Is this a good idea?
- What are the issues?
- Where is the payback?

Bibliography
- Gray, Shenoy, "Rules of Thumb in Data Engineering", 1999, revised March 2000, Microsoft Research, MS-TR-99-100
- Berners-Lee, Fielding, Frystyk, "Hypertext Transfer Protocol -- HTTP/1.0", IETF RFC 1945, http://www.w3.org/Protocols/rfc1945/rfc1945
- Fielding, Gettys, Mogul, Frystyk, Masinter, Leach, Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", ftp://ftp.isi.edu/in-notes/rfc2616.txt
- Greg Barish and Katia Obraczka, "World Wide Web Caching: Trends and Techniques", IEEE Communications, May 2000, http://www.isi.edu/people/katia/cache-survey.pdf
- Breslau, Cao, Fan, Phillips, Shenker, "Web Caching and Zipf-like Distributions: Evidence and Implications", IEEE Infocom 1999
- K. L. Johnson, J. F. Carr, M. S. Day, and M. F. Kaashoek, "The measured performance of content distribution networks", in Proceedings of the 5th International Web Caching Workshop and Content Delivery Workshop (Lisbon, Portugal), May 2000, www.terena.nl/conf/wcw/Proceedings/S4/S4-1.pdf
- B. Krishnamurthy, C. Wills, Y. Zhang, "On the Use and Performance of Content Distribution Networks", in ACM SIGCOMM Internet Measurement Workshop 2001, http://www.icir.org/vern/imw-2001/imw2001-papers/10.pdf
- "Zipf Distribution of Web Site Popularity", http://www.useit.com/alertbox/zipf.html
- S. Gribble, E. Brewer, "System Design Issues for Internet Middleware Services: Deductions from a Large Client Trace", Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, California, December 1997
- "The Internet is a little bit broken", http://www.internap.com/about/theproblem.html
- "Reliable Internet Connectivity with BGP, Chapter 7, Influencing Entrance Selection", http://www.bgpbook.com/archpolicyenter.html
- A. Cockburn, B. McKenzie, "What Web Users Do? An Empirical Analysis of Web Use", http://www.cosc.cantebury.ac.nz/~andy/papers/ijhcs Analysis.pdf