Chapter 9
Internet Site Functionality Design

i-NET+ EXAM OBJECTIVES COVERED IN THIS CHAPTER:

Identify the issues that affect Internet site functionality (e.g., performance, security, and reliability). Content may include the following:
- Bandwidth
- Internet connection points
- Audience access
- Internet Service Provider (ISP)
- Connection types
- Corrupt files
- Files taking too long to load
- Inability to open files
- Resolution of graphics

Describe the concept of caching and its implications. Content may include the following:
- Server caching
- Client caching
- Proxy caching
- Cleaning out client-side cache
- Server may cache information as well
- Web page update settings in browsers

Describe different types of search indexes: static index/site map, keyword index, full-text index. Examples could include the following:
- Searching your site
- Searching content
- Indexing your site for a search

Copyright © 2000 SYBEX Inc., Alameda, CA. www.sybex.com

Perhaps the most important aspect of implementing and maintaining a Web site is making sure that it is accessible and usable by your audience. Regardless of how wonderful the content is, if users cannot access the site in a timely and reliable way, they will go elsewhere for the information they seek. Therefore, it is important to know enough about the technologies that run the Internet to ensure that your site will meet the demands of its users.

In this chapter, you will learn about several critical topics that have an impact on a site’s functionality and usability:
- Site functionality issues
- Technology and content-type planning
- Caching
- Site indexing

Each of these major topics contributes to the overall usability of a Web site.

Site Functionality Issues

Internet users are a fickle bunch. Technological glitches not only harm functionality, they often cost sites their reputation for usability and reliability. Everyone has given up on a Web page because it is too slow or just plain broken. What do we do if www.amazon.com/ goes down?
We go to www.barnesandnoble.com/. What do we do if a site requires ActiveX and our corporate security policy is to disallow ActiveX? We go to a different site. It is important to know the most common errors users experience and why they occur.

Functionality errors manifest themselves in three ways:
- Users can’t get to the site at all.
- It takes too long to download and view a page.
- The document they request is missing or appears to be broken.

In the following sections, we’ll take a look at the technological factors underneath each of these errors.

Connectivity Failure

The most basic Web browser error is when a user fails to get any information from your Web server. These attempts will generate a warning message in the browser such as “Host not found” or “Request timed out.” Such a warning message is shown in Figure 9.1, in which the user is trying to go to the Web site www.bahoozit.com, which does not exist.

FIGURE 9.1 “DNS Not Found” error message

Not all error messages indicate a connectivity problem. If the server gives a dreaded “404: File not found” error, for example, your client is connecting to the server but the requested document cannot be found. If the host wasn’t found or the request timed out, there was never a full-fledged connection between the client and the server. Because connectivity errors mean that the server never gets a full connection to the client, such problems are often never logged on the server.

As explained in Chapter 2, several client queries and server responses need to succeed for a user to browse a Web page: the server’s domain name needs to be resolved into an IP address, and the client needs to make a successful request to the Web server at that address.
If the user can’t get to a site at all, the problem could be caused by one of several factors:
- The client’s network settings or DNS services are not working.
- The client’s connection to the Internet is down.
- The server’s hardware or software is malfunctioning or overwhelmed.
- The server’s connection to the Internet is down or overwhelmed.
- Available IP network connections between client and server are over-saturated.
- The server’s DNS records are corrupt or unavailable.

Determining the exact cause of failure requires some troubleshooting. For more information on troubleshooting, see Chapter 10.

Another common reason that users get an error message is that the domain name they entered is incorrect. The best way to counter this potential problem is to register a domain name that is short, descriptive of your organization, and easy to remember. If the domain name isn’t distinctive, people can forget it and try similar names. Some organizations register multiple domain names that people might try. George W. Bush, for his U.S. presidential campaign, registered domains like www.gwbush.com and www.bush.com. In addition to registering a primary domain name, some organizations will register common misspellings.

Download and View Time

One common reason a user doesn’t use a Web site is that it is too slow. How slow is too slow? Researchers at Yale claim that 10 seconds is the threshold of frustration. Users may wait longer than that if the information cannot readily be found elsewhere or if they are particularly interested in a site, but then again, they might not. So depending on the patience level of the audience, pages should finish loading within 10 seconds of the time a user clicks a link.
In the following sections, you’ll learn:
- The different stages of a request that can eat into those 10 seconds
- How to estimate the time it takes to download a page
- How available bandwidth limits download speeds
- Examples and rules of thumb for download times

These sections will enable you to estimate whether your page is going to be too slow.

Stages of a Request

The 10 seconds a user will wait gets split up into several steps, and each step uses up a portion of that time. The major steps are as follows:
1. DNS lookup and initial connection from client to Web server occurs.
2. Request sits in the Web server queue, waiting to be serviced.
3. Server generates response to the request (gets a file, runs a script).
4. Server transmits the data to the client.
5. Client renders/displays the data.

Combined, steps 1 and 5, which are the ones most clearly out of the control of the server, generally take a second or two. The time required for steps 2 and 3 depends on the server configuration, although they can often also be reduced to less than a second (you’ll learn more about this in “Planning Robust Back-End Service” later in this chapter). The bulk of the time, therefore, is spent on step 4, transmitting the data from server to client.

Step 5 can sometimes take longer than one second. Slow computers may take several seconds to parse and render HTML documents. Even fast computers can get bogged down by complex HTML code, such as nested tables.

Determining Transmission Time

Step 4 generally takes the longest amount of time, so it has the most impact on the apparent speed of the Web site. If the Web page takes too long to load, the user will leave. Therefore, it is important to be able to estimate how long a Web page will take to download for different types of users.
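The five stages listed above can be treated as a simple time budget. The Python sketch below is only an illustration: the stage timings are made-up values chosen to match the rough figures in this section (they are assumptions, not measurements), and the function names are hypothetical. It shows how much of the 10-second threshold is left for transmission once the other stages are accounted for.

```python
# A time budget for one page request. All timings are hypothetical.
FRUSTRATION_THRESHOLD_S = 10.0

# Assumed seconds spent in each stage other than transmission (step 4).
OVERHEAD_S = {
    "dns_and_connect": 1.0,     # step 1
    "server_queue": 0.25,       # step 2
    "generate_response": 0.25,  # step 3
    "client_render": 1.0,       # step 5
}


def transmission_budget(overhead, threshold=FRUSTRATION_THRESHOLD_S):
    """Seconds left for step 4 after the other stages are accounted for."""
    return max(0.0, threshold - sum(overhead.values()))


def within_threshold(transmit_seconds, overhead, threshold=FRUSTRATION_THRESHOLD_S):
    """True if the whole request finishes within the threshold."""
    return transmit_seconds + sum(overhead.values()) <= threshold


if __name__ == "__main__":
    print(transmission_budget(OVERHEAD_S))    # seconds available for step 4
    print(within_threshold(6.0, OVERHEAD_S))  # a 6-second transfer
    print(within_threshold(9.0, OVERHEAD_S))  # a 9-second transfer
```

With these assumed overheads, a transfer that looks comfortably under 10 seconds on its own can still push the whole request past the threshold.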
Transmission time is the size of the page divided by the speed at which it is downloaded. The size of the page is measured in kilobytes and includes the HTML, graphics, and multimedia files. The standard way to express this is as follows:

Time of Download = (Size of Page ÷ Available Bandwidth)

If a site has a 100K page, the time it will take someone to download it with a 5KB/s connection can be estimated. Using 100K for the size of the page and 5KB/s for the available bandwidth, the formula shows that it would take 20 seconds to download:

X seconds = (100 kilobytes ÷ 5 kilobytes per second) = 20 seconds

Be careful not to confuse bytes and bits. People write about file sizes and download speeds using the terms kilobytes and kilobits. Bytes are generally 8 bits. Also, a kilobyte of file size conventionally means 1,024 bytes, whereas transmission rates in kilobits per second count 1,000 bits per kilobit; for rough estimates the difference is small enough to ignore. Unfortunately, some folks, especially advertisers, represent kilobits and kilobytes with inconsistent symbols. Kilobits are referred to as K, k, kb, and Kb. Kilobytes are referred to as K, k, kB, and KB. The symbols K and k are ambiguous! When looking at a number like 14K or 14k, a good rule of thumb is that modem-like devices are generally measured in terms of kilobits per second, and file sizes are almost always measured in kilobytes. Lacking any other clues, KB is likely to be kilobytes and kb (or Kb) kilobits. When writing, choose clear notation, such as KB and kb.

Bandwidth Bottlenecks

When data is downloaded, it flows in a pipeline from the server to the server’s Internet connection to the general Internet, then from the client’s network connection to the client. So the available bandwidth is the speed of the slowest segment of the pipeline. In 1999 in the United States, the slowest segment is generally the client’s network connection. If a U.S.
browser is visiting a server in Kenya, however, the slowest segment is likely going to be the slow connection between the Kenyan and U.S. national backbones.

Theoretical and Practical Download Speeds

The goal of Web designers should be to design pages that won’t take too long to download. Network connections, however, rarely perform exactly as advertised. Therefore, you should consider the following:
- Know the theoretical speed of different devices.
- Take these speeds with a grain of salt.

It is easy to determine the theoretical speed of any device. A 56Kbps modem, for example, should be able to download about 7K per second. You can determine that with this formula:

(56Kbps ÷ 8 bits per byte) = 7KB/s

Table 9.1 lists the theoretical speeds of several types of network connections.

TABLE 9.1 Network Connection Speed

Network Connection    Theoretical Speed
Modem                 Up to 56Kbps
BRI ISDN              64–128Kbps
Frame Relay           56Kbps–1.544Mbps
T1                    1.544Mbps
E1                    2.048Mbps
E2                    8.448Mbps
Cable modem           (Varies widely)
Ethernet              10Mbps (variations go up to 100 or even 1000Mbps)
OC-3                  155.52Mbps

Real-world factors like initial connection times, intervening devices, and line noise slow downloads to below their advertised limits. Even with a fast server and a good ISP, a 56Kbps modem, for example, will rarely achieve that speed. Over fully analog phone lines, 56Kbps modems operate at only 33.6Kbps. If an ISP has digital lines, there is a chance that its users will be able to get 56Kbps download speed, but uploads will stay at 33.6Kbps. A 14.4Kbps modem will often download at 1.5KB/s, a 28.8Kbps modem at 3KB/s, a 56Kbps modem will optimistically download at 5KB/s, and an unloaded T1 dedicated line will download at 180KB/s.
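The download-time formula and the rule-of-thumb speeds above can be combined into a short calculation. The following Python sketch is only an illustration: the function names are invented for this example, the practical rates are the rough figures quoted in this section rather than measurements, and the 10-second limit is the threshold of frustration cited earlier.

```python
# Rough download-time estimates from page size and connection speed.
# The practical rates below are this chapter's rules of thumb, in
# kilobytes per second; they are assumptions, not measurements.
PRACTICAL_KB_PER_S = {
    "14.4Kbps modem": 1.5,
    "28.8Kbps modem": 3.0,
    "56Kbps modem": 5.0,
    "T1 (unloaded)": 180.0,
}

FRUSTRATION_THRESHOLD_S = 10.0


def theoretical_kb_per_s(kilobits_per_second):
    """Convert an advertised rate in kilobits/s to kilobytes/s (8 bits per byte)."""
    return kilobits_per_second / 8


def download_seconds(page_kb, kb_per_second):
    """Time of download = size of page / available bandwidth."""
    return page_kb / kb_per_second


def report(page_kb):
    """Print an estimate for each connection type listed above."""
    for name, rate in PRACTICAL_KB_PER_S.items():
        seconds = download_seconds(page_kb, rate)
        verdict = "ok" if seconds <= FRUSTRATION_THRESHOLD_S else "too slow"
        print(f"{name:16} {seconds:6.1f} s  {verdict}")


if __name__ == "__main__":
    print(theoretical_kb_per_s(56))  # advertised 56Kbps expressed in KB/s
    report(100)                      # the 100K page from the earlier example
```

Using the practical rates rather than the theoretical ones gives the more pessimistic, and usually more realistic, estimate for each class of user.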
DSL and cable modem users will notice large variances in their download speeds, anywhere from 384Kbps to 10Mbps. Even DSL services that are advertised at 384Kbps frequently get download speeds of 800Kbps (100KB/s) during unloaded times and 100Kbps or slower when the DSL network is saturated. For more information on benchmarking DSL and cable modems, see the links on home1.gte.net/awiner/.

Example: A Page Viewers Might Abandon

Freshmeat (www.freshmeat.net), a popular Unix software directory, weighs in at 78K, almost all HTML. As you can see in Figure 9.2, modem users have a smaller Internet pipeline than DSL users do. It will take a 28.8K modem user about 26 seconds to download a page this size, whereas a DSL modem running at the advertised 384Kbps would receive it in about 2 seconds. Downloading this page hovers on the threshold of frustration for 56Kbps modem users; fickle users might get bored with waiting for the page and jump over to see if linuxapps.com is loading any quicker (at a slightly slimmer 75K).

FIGURE 9.2 Download times for freshmeat.net (DSL vs. a 56K modem for the 78KB www.freshmeat.net home page: 1 second on 384K DSL, 12 seconds on a 56Kbps modem)

Example: A Page Viewers Would Not Abandon

Google (www.google.com) has a highly functional search page of only 12K. As you can see in Figure 9.3, even a 14.4 modem user can download the page in less than the 10-second threshold of frustration. It is unlikely that even impatient users would abandon the Google page during the roughly 8 seconds it takes to load in order to try another search engine like www.hotbot.com (a lean 30K).

FIGURE 9.3 Download times for google.com

You can test out the probable download times of any page on the Internet with this free online tool: www2.imagiware.com.

Inability to Open or View Files

If people can’t use the files on your site, they will often feel frustrated and give up.
Files that cannot be opened are either corrupt or are somehow incompatible with certain software and hardware configurations. In this section, you will learn the following:
- How a browser successfully recognizes a file
- What stops a browser from opening a multimedia file
- What stops a browser from opening an HTML file
- How to identify and fix corrupt files

It is important for Web site owners to fix the broken files and mark incompatible ones with warnings as to who can and cannot use them.

Many times when someone says a file “won’t open,” it is because the file is simply not there. Broken links and missing files are quite common on the WWW. People move the files in their Web site around a lot, and the links to their old files are not automatically updated. See Chapter 10 on how to set up a system to counter this potential source of errors.

How a Browser Recognizes a File

Browsers sometimes fail to display a file or display it in a mangled fashion. To understand why they fail, like good doctors we need to first understand what happens with our patient when everything goes right and the browser succeeds in displaying a file.

The technology that makes this happen is MIME file types. MIME is an acronym that stands for Multipurpose Internet Mail Extensions. It allows Web browsers and e-mail clients to recognize and view lots of different types of files. Servers that deliver pages tag each page with a file type, and clients display these file types as best they can. Read www.whatis.com/mime.htm for details.
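The tagging described above can be illustrated with Python’s standard mimetypes module, which maps file extensions to MIME types in much the same spirit as a Web server’s lookup table. This is a minimal sketch, not a description of how any particular server is implemented: the helper name content_type_for and the sample file names are hypothetical, and the fallback type is an assumption about what a server might send for an unrecognized extension.

```python
import mimetypes


def content_type_for(filename, default="application/octet-stream"):
    """Return the MIME type a server might send for this file name.

    Falls back to a generic binary type (an assumed default) when the
    extension is not recognized.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


if __name__ == "__main__":
    # Hypothetical file names, chosen only to show the mapping.
    for name in ("index.html", "logo.gif", "photo.jpeg", "mystery.zzz9"):
        print(f"{name:14} -> Content-Type: {content_type_for(name)}")
```

A browser that receives an unrecognized or generic type has little to go on, which is one reason a misconfigured lookup table on the server can make a perfectly good file appear broken.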
In a foreign culture, even people who know the language need to be told when something is a joke. They often don’t pick up the subtle clues they need to change the context of their understanding from “serious” to “joke.” In a similar way, browsers need to be told explicitly what mode they should use to interpret each file.

Browsers handle many different types of files. The first Web browser was designed to display only HTML. Later browsers learned to understand files from Gopher servers, FTP servers, and WWWAIS index servers. The next generation of browsers learned to display inline images like GIF and JPEG files. More recently, browsers can open Adobe Acrobat portable documents, Java applets, XML documents, and others.

When a browser downloads a file, the Web server tells the browser exactly what type of file it is. The server uses a configuration file (MIME.TYPES in [...]

... Sites designed for these two different audiences might very well differ in the media use.

Content-Technology Policies

After gathering information about your audience, the next step is to draft a content type policy. The policy will guide the entire organization as to what content types to use on the Web site. The goal of the content policy is to make sure the Web site ...

... pictures. Would your audience want to download a Shockwave plug-in in order to view your site? If you have a news site, nice pictures would be appealing, but a Shockwave game might not be compelling. But if your site provides Web-based tools for diagramming atoms, scientists would probably have enough...
FIGURE 9.5 Configuring client MIME types

FIGURE 9.6 The Edit Type dialog box

Web servers send a MIME header with each file, specifying what type of file it is. The Web site administrator maintains a lookup table on the Web server...

FIGURE 9.8 The Web Networks Web site in a graphical browser

There is a techno-political movement that supports the lowest common denominator approach: www.anybrowser.org/campaign/

85% POLICY

The 85% policy states that you should “use technologies that will reach many people, but don’t let the stragglers drag functionality down for other viewers.” A lot of sites...

Technology and Content Planning

The best way to ensure a well-functioning Web site is to plan ahead. By planning ahead, administrators can address potential problems before their customers are screaming for blood. Also, comprehensive planning leads to optimal trade-offs with factors like high functionality versus compatibility. This...

... degradation is that everyone can use the site as they would like to use it; in other words, “the user is always right.” The cost is added complexity in maintaining multiple versions of documents or in documents that degrade well.

FIGURE 9.9 www.browsercaps.com asks what site version to use

Implementing a...
marketing successes

Redundancy Equals Reliability

Your back-end service is only as good as its weakest link. If your organization’s name servers don’t work, no one will be able to get to your site, and it doesn’t matter if your site has plenty of bandwidth. Don’t forget to ensure that network services like DNS...

... considerably, so they will design the site so that the majority of the visitors will be able to visit it and take advantage of its features. Businesses generally put up a Web site to sell something. A glitzy Web site may sell more than a plain one, even if the glitzy page is theoretically not accessible to those with slower modems.

ADAPTIVE CONTENT

Using an adaptive content policy, Web site developers don’t...

... discard requests after a threshold has been reached. This is often unacceptable and is used only as a safety measure to make sure overloaded Web sites don’t lock up.

Therefore, it is often necessary to increase the rate of fulfilled requests. This rate is the number of children actively fulfilling requests...

... accessibility and glamour. Instead, sites can deliver advanced features to clients that can use them and deliver standard features to those who can’t. This way, the whole audience is well served. Creating such a Web site, however, adds complexity and often cost.

There are two ways to create a Web site that provides high functionality to advanced clients and also gracefully provides reduced functionality: Differential