CHAPTER 12 : Network Troubleshooting Methodology

The first networking protocols were proprietary; that is, each networking vendor developed its own set of rules. Computers using an individual vendor’s protocol could communicate with each other, but not with computers that were using the networking product of a different vendor. This had the effect of locking a business into a particular product; the business would always have to use the same vendor’s products to maintain compatibility.

The solution to this problem was the development of protocols based on open standards. Organizations such as the International Organization for Standardization (ISO) were charged with overseeing the definition and control of these standards and publishing them so they would be available to any vendor that wanted to create products that adhered to them. The advantage to consumers is that they are no longer forced to patronize a single vendor. The advantage to the vendor is that its products are more widely compatible, and therefore can be used in networks that started out using a different vendor’s products.

A model provides an easy-to-understand description of the networking architecture and serves as the framework for the standards. The OSI model has become a common reference point for discussion of network protocols and connection devices.

As we look at the OSI and DoD models, you’ll see that they both use layers to represent areas of functionality. In OSI terms, each of the layered specifications uses the services of the layer below it to build an enriched service. The layered approach provides a logical division of responsibility, where each layer handles only the functions that are specific to that layer. You can think of this like the teamwork exhibited by a good assembly-line crew that’s building an automobile. One worker may be responsible for fitting a wheel onto the axle, another for inserting and tightening the screws, and so forth.
There are several advantages to this type of working model:

Each worker only has to be concerned with his or her own area of responsibility.
Each worker becomes extremely proficient at his or her particular job through constant repetition.
Working together in sequence, the team of workers is able to produce the final product much more quickly and efficiently than one person, or a group of people with no assigned responsibilities, could.
If something goes wrong (for instance, if a particular part was put on incorrectly), the supervisor knows who is to blame for the problem.

Likewise, when the networking protocols are divided into layers, communication generally flows more smoothly, and when it doesn’t, troubleshooting is easier because you are better able to narrow down the source of the problem to a specific layer. We’ll be using the ISO’s OSI model to help us in troubleshooting network connectivity issues.

Reviewing the OSI Model

You should remember that the OSI model consists of seven layers. When one computer communicates with another one, data at the sending computer is passed down from one layer to the next until the physical layer finally puts it out onto the network cable. At the receiving end, it travels back up in reverse order. Although the data travels down the layers on one side and up the layers on the other, the logical communication link is between each layer and its matching counterpart, as shown in Figure 12.1.

Here’s how it works: as the data goes down through the layers, it is encapsulated, or enclosed within a larger unit, as each layer adds its own header information. When it reaches the receiving computer, the process occurs in reverse; the information is passed upward through each layer, and as it does so, the encapsulation information is evaluated and then stripped off one layer at a time.
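The encapsulation and decapsulation process just described can be sketched in a few lines of Python. This is a simplified illustration, not a real protocol stack: the bracketed header tags and the function names encapsulate and decapsulate are assumptions made for the example. Real layers add binary headers (and, at the data link layer, a trailer as well), and the physical layer transmits bits rather than adding a header of its own.

```python
# Simplified sketch of OSI encapsulation; the text "headers" are illustrative only.
LAYERS = [
    "Application", "Presentation", "Session",
    "Transport", "Network", "Data Link", "Physical",
]

def encapsulate(data: str) -> str:
    """Pass data down the stack; each layer wraps it in its own header."""
    unit = data
    for layer in LAYERS:                  # top (Application) to bottom (Physical)
        unit = f"[{layer}]{unit}"         # the lowest layer's header ends up outermost
    return unit

def decapsulate(unit: str) -> str:
    """Pass data up the stack; each layer checks and strips its peer's header."""
    for layer in reversed(LAYERS):        # bottom (Physical) to top (Application)
        header = f"[{layer}]"
        if not unit.startswith(header):
            raise ValueError(f"{layer} layer: expected header is missing")
        unit = unit[len(header):]
    return unit

if __name__ == "__main__":
    wire = encapsulate("hello")
    print(wire)                # headers nested with the Physical tag outermost
    print(decapsulate(wire))   # the original data, as the sending application wrote it
```

Each layer on the receiving side only ever inspects the header written by its counterpart on the sending side, which is the "logical communication link" between matching layers.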
The information added by the network layer, for example, will be read and processed by the network layer on the receiving side. After processing, each layer removes the header information that was added by its corresponding layer on the sending side. The data is finally presented to the application layer, and then to the user’s application at the receiving computer. At this point, the data is in the form it was in when sent by the user application at the source machine. Figure 12.2 illustrates how the header information is added to the data as it progresses down through the layers.

FIGURE 12.1 Each Layer of the OSI Model Communicates with the Corresponding Layer. [Figure: the sending and receiving computers each show the seven layers (Application, Presentation, Session, Transport, Network, Data Link, Physical), connected at the bottom by the network media.]

FIGURE 12.2 Each Layer Adds its Own Header Information.

Establishing a Troubleshooting Strategy

This chapter will provide a number of examples of how to troubleshoot network issues. The most important thing that you can do when troubleshooting is to be organized and methodical in your approach to solving problems. If you work in a rushed fashion, you’re likely to miss a crucial troubleshooting step or forget what you did to solve the problem the next time it occurs. In general, you can break down the work of troubleshooting a network issue into the following seven steps:

1. Identify the symptoms and potential causes

The first step is determining exactly what’s wrong, and ensuring that there is actually a problem. For example, if you upgrade the speed of your company’s Internet connection so that a file transfer that used to take 30 seconds now finishes almost instantly, a user might think that the file transfer didn’t work.
In this case there really wasn’t an error, but the user thought that there was a problem because something she was used to had suddenly changed. So your first step should be to determine the exact symptoms of the problem. Is a user unable to access the Internet entirely, or is it only that she is unable to receive her e-mail while Web browsing is working fine? Ask questions and get very specific details on the issue. Screenshots of error messages can be helpful as well. Having as much information as possible at hand will help you determine the potential causes of an issue.

2. Identify the affected area

Is only one user having trouble accessing the Internet, or is there an issue that’s affecting an entire subnet or your entire network? If all users on a particular subnet are experiencing connectivity troubles, you might suspect the default gateway or the network connection between that subnet and the rest of your network. If the issue is restricted to a single user, you’ll be more likely to examine the specific user’s workstation to check for hardware failures or software misconfigurations.

3. Establish what has changed

Once you’ve determined what the problem is, you should try to find out if anything has changed that may have caused the issue to surface. If an issue is impacting multiple machines, think about what may have changed across the enterprise; if it is isolated to a particular machine, question the user as to what they may have changed recently. Often a problem seems to come out of nowhere, only for you to find out later that other administrators were making changes that you were not aware of. Communication between teams is critical in any organization, and being aware of what changes may be taking place in your organization is important. Knowing what has changed on a network will usually give you a good starting point from which you can begin the troubleshooting process.

4. Select the most probable cause

Based on the information you gathered in the first three steps, and using the troubleshooting tools we covered in Chapter 11, try to determine the most likely reason why you are experiencing the particular problem. Keep in mind that your first guess might be incorrect and that there may be more than one cause. Also, don’t get discouraged if you have to run through a process of elimination not just once, but a second and even a third or fourth time before finding the true root cause.

5. Implement an action plan and solution, including potential impact

The first part of this seems straightforward enough: “Fix the problem!” But many administrators will overlook the second half of the statement, which is just as important: you don’t want to do something to fix one problem that will only end up causing another, potentially worse, problem. So think carefully about the impact of any troubleshooting steps that you are going to take, especially if they involve a server, router, or other device that many users rely on. Also, once you have devised a course of action for a solution, be sure to think through what the ramifications of implementing it may be. For example, you may decide that you need to stop and restart the WWW publishing service on your Microsoft Web server to solve an issue being experienced by one particular user, but you need to plan this carefully because it will affect every user that is connected to the Web server. You should be even more careful in changing any configuration items on a server or router so that the change you’re making doesn’t create additional problems. A good rule of thumb is to follow any change management guidance that your organization may have in place and to be thoughtful about any changes you make to production systems.

6. Test the result

In addition to determining if your solution actually fixed the problem you were experiencing, you should also look at the overall connectivity of your network to make sure that you didn’t inadvertently create another issue by implementing the fix you chose in Step 5.

7. Document the solution and process

Even though you probably think that you’ll never forget how to fix this problem, especially if it’s one that you had to work at solving for hours and hours, the simple fact of the matter is that if you don’t write down your solution, odds are quite good that you won’t remember it, or won’t be able to reproduce it precisely, if the issue recurs many months down the line. You should document the information that you gathered in each of these steps: how the problem manifested itself, the systems that were affected, and the details about the solution and how you implemented it. Most importantly, don’t forget to document any side effects that came about as a result of implementing the fix. Taking the time to go through this step at the time of resolution can save you hours down the road when you encounter a similar issue and find that you need to retrace your steps.

In the next sections, we will begin discussing the layers of the OSI model. We will map each one back to the troubleshooting methodology discussed in this section. We will start with Layer 1 and work our way through Layer 7, and within each section we will take the time to discuss components that exist at each tier and how you may approach them. Bringing together the concepts of the OSI model, a solid troubleshooting methodology, and the appropriate tools is the combination that will enable you to successfully tackle just about any network issue that may arise.
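One way to make Step 7 routine is to record each incident in a fixed structure that mirrors the seven steps. The sketch below is only an illustration: the class name IncidentRecord, its field names, and the sample values are assumptions made for this example and are not part of any standard or tool.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class IncidentRecord:
    """Hypothetical troubleshooting log entry; fields follow the seven steps."""
    symptoms: str                       # Step 1: what exactly is wrong
    affected_area: str                  # Step 2: one user, a subnet, or the whole network
    recent_changes: List[str] = field(default_factory=list)   # Step 3
    probable_cause: str = ""            # Step 4
    action_taken: str = ""              # Step 5: the fix that was applied
    potential_impact: str = ""          # Step 5: who else the fix could affect
    verified: bool = False              # Step 6: was the result tested?
    side_effects: List[str] = field(default_factory=list)     # Step 7

    def missing_steps(self) -> List[str]:
        """Return the names of fields that still need to be filled in."""
        missing = [
            name
            for name in ("probable_cause", "action_taken", "potential_impact")
            if not getattr(self, name)
        ]
        if not self.verified:
            missing.append("verified")
        return missing

    def is_complete(self) -> bool:
        """True once every step that can be checked has been recorded."""
        return not self.missing_steps()

if __name__ == "__main__":
    # Sample values are invented for illustration.
    record = IncidentRecord(
        symptoms="User cannot receive e-mail; Web browsing works",
        affected_area="single workstation",
    )
    print(record.missing_steps())
```

Keeping the potential impact and side effects as separate fields reflects the emphasis in Steps 5 and 7 on recording what else the fix touched, not just the fix itself.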
TROUBLESHOOTING THE PHYSICAL LAYER

In many ways, the physical layer is the easiest to troubleshoot because it deals with devices and concepts that are concrete and tangible. You can actually see the network cabling and NICs that exist at the physical layer, whereas concepts such as IP addresses and Media Access Control (MAC) addresses are intangible items that are simply configured on the physical devices. The physical layer deals with such things as the type of signal transmission, cable type, and the actual layout or path of the network wiring. These are things we can see, touch, or at least easily represent with a drawing or diagram. Because of this, the functions of the physical layer devices, which include NICs, cables, connectors, hubs, and repeaters, are also relatively easy to understand.

Physical layer devices are mostly items that you’ll find in any networking equipment catalog. The basics are deceptively simple: you insert a network card into an expansion slot on each computer, plug a piece of cable into each network card, and plug the other end of each cable into a hub. But leafing through any type of equipment catalog will reveal that physical layer issues are a little more complex. Some cable manufacturers offer literally thousands of different cables, and the variety of available network cards and connectivity devices is just as overwhelming. Getting a network up and running at the physical level requires a good bit of knowledge about what works with what and which hardware type is best for your particular situation.

The NIC is the hardware device most essential to establishing communication between computers. Although there are ways to connect computers without a NIC (using a modem over the phone lines or via a serial null modem cable, for instance), in most cases where there is a network, there is a NIC for each participating computer.
There are some basic guidelines to consider when selecting a NIC for any device. First, the NIC must match the bus type for which you have an open slot in the computer. It also must be of the correct media access type, and it must have the correct connector for the cable your network uses. Transfer speed is an additional important consideration, and the NIC should be rated to transfer data at the proper speed. Ethernet normally transmits at 10, 100, or 1000 Mbps, and Token Ring runs at 4 or 16 Mbps.

The network media is the cable or wireless technology on which the signal is sent. Cable types include thin and thick coaxial cable, which is similar to cable TV cable. They also include twisted-pair, such as that used for modern telephone lines and available in both shielded and unshielded types, and fiber optic, which sends pulses of light through thin strands of glass or plastic for fast, reliable communication. Fiber optic is expensive and can be difficult to work with. Wireless media can include radio waves, laser, infrared, and microwave.

Hubs and repeaters are devices that operate at the physical layer. Repeaters connect two network segments (usually thin or thick coax) and boost the signal so the distance of the cabling can be extended past the normal limits at which attenuation, or weakening, interferes with the reliable transmission of the data. Hubs are generally used with Ethernet twisted-pair cable, and most modern hubs are repeaters with multiple ports. Hubs also strengthen the signal before passing it back out to the computers attached to them. Hubs can be categorized as follows:

Active hubs are the ones just described. They serve as both a connection point and a signal booster. Data that comes in is passed back out on all ports.
Passive hubs serve as connection points only; they do not boost the signal before passing it on. Passive hubs do not require electricity and thus won’t have a power cord as active hubs do.
Intelligent or “smart” hubs include a microprocessor chip with diagnostic capabilities, so that you can monitor the transmission on individual ports.

The NIC is another device that functions at the physical layer, and it is responsible for preparing the data to be sent out over the network media. Exactly how that preparation is done depends on which medium is being used; a Token Ring NIC is different from an Ethernet NIC, for example, because the two use different access methods. And even though 10Base2, 10Base5, and 10BaseT Ethernet networks all use Carrier Sense Multiple Access with Collision Detection (CSMA/CD) as their access method, they use different cable and connector types. It is possible to get a “combo” card that has connectors for all three, though such cards are not common nowadays.

Another consideration at the physical layer is whether the signaling method will use the entire bandwidth of the cable to transmit the data, or only use one frequency. When all frequencies are used, the transmission method is called baseband. If only part of the bandwidth is used (thus allowing other signals to share the bandwidth), it is referred to as broadband. Traditionally, baseband transmission has been associated with digital signaling and broadband with analog, but this does not always hold true. For instance, digital subscriber line (DSL) is a high-speed technology offered by many telephone companies for Internet connectivity. DSL is a broadband technology: it uses only part of the wire’s bandwidth to transmit data. Voice communication can take place simultaneously on the same cable, using a different frequency than the one used by the data communications. Cable television is another example of broadband transmission, bringing dozens of different channels into your home on just one coaxial cable.

Another important physical layer issue is the layout, or topology, of the network.
This refers to whether the cables are arranged in a line going directly from computer to computer (bus), in a circle going from computer to computer with the last connecting back to the first (ring), or in a spoke-like fashion with each computer connecting directly to a central hub (star). A fourth topology, the mesh, is used when every computer is connected to every other computer, creating redundant data pathways and high fault tolerance, at the cost of increasing complexity as the network grows. Wireless communications can use the cellular topology that is widely used for wireless telephone networks. In this case, an area is divided into slightly overlapping cells, representing connection points.

The physical layout of the network will influence other factors, such as which media access method (and thus which cable type) is used. All of the physical layer factors, such as the cable type, access method, topology, and so on, when considered together, define the architecture of the network. Popular network architectures include Ethernet, ARCnet, Token Ring, and AppleTalk.

Layer 1 Troubleshooting

When troubleshooting the physical layer, you’ll be most concerned with NICs, network cables, and hubs. We’ll start by looking at potential issues with NICs, and ways to troubleshoot problems with them. Configuring the NIC at the physical layer is the first step in creating a TCP/IP connection. Although an improperly configured card is not a protocol-specific issue, it may be mistaken for one, and you can lose a great deal of time trying to troubleshoot TCP/IP when the problem actually lies with the NIC itself. Thus, it’s important for you to know how to determine when a TCP/IP connection is failing due to a lower-level problem related to physical components such as the computer’s NIC or physical cabling.
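The point above, that lower layers should be ruled out before time is spent on TCP/IP settings, can be expressed as an ordered list of checks that stops at the first failure. This is a sketch only: the function first_failing_layer and the check descriptions are hypothetical, and each check here is a placeholder that you would replace with a real test, such as inspecting the link light or swapping in a known-good cable.

```python
from typing import Callable, List, Optional, Tuple

# Each check is (layer name, description, test function returning True on success).
Check = Tuple[str, str, Callable[[], bool]]

def first_failing_layer(checks: List[Check]) -> Optional[Tuple[str, str]]:
    """Run checks in order, lowest layer first; return the first one that fails."""
    for layer, description, test in checks:
        if not test():
            return layer, description
    return None   # every check passed

if __name__ == "__main__":
    # Placeholder results; in practice each lambda would perform a real test.
    checks: List[Check] = [
        ("Physical", "cable seated and link light on", lambda: True),
        ("Physical", "NIC detected with a working driver", lambda: False),
        ("Network", "TCP/IP address configuration", lambda: True),
    ]
    print(first_failing_layer(checks))
    # prints ('Physical', 'NIC detected with a working driver')
```

Because the function returns at the first failure, the TCP/IP check in this example is never reached, which mirrors the advice not to troubleshoot the protocol until the hardware has been ruled out.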
One easy way to determine whether the problem lies at the physical layer is to attempt to establish a connection using a different protocol. If your computer is unable to communicate with others on the network using TCP/IP, but can make the connection when NetBEUI or NWLink is installed on the machines, you can surmise that the problem lies in the way that TCP/IP is configured, and you can start troubleshooting the protocol configuration accordingly. If you still have no luck in making a connection with other network transport protocols, it is likely that you have a problem with the hardware or the hardware drivers. This simple test can save you much time and effort when you begin the troubleshooting process.

In addition, when you are troubleshooting physical devices like cables or NICs, it’s also helpful to have spare equipment to swap out so that you can quickly determine the cause of the problem. If you suspect that the network cable is at fault, switch it with one that you know works; if connectivity is restored, then you know that the cable was the culprit. If switching the cable doesn’t help, try using a different wall jack or another port on the switch or hub. If switching to a different port solves the problem, then you have isolated the point of failure.

The Role of the NIC

The NIC, also called the network adapter or just the network card, plays an essential role in TCP/IP and other network communications. The NIC is the device that physically joins the computer and the cable or other network media, but its function is more complex than that. Network cards also have memory chips, called buffers, in which information is stored so that if the data comes in or goes out too quickly, it can “rest” there while the bottleneck clears, until there is room for it to pass onto the cable or up into the computer’s components.

It is essential that you ensure that the NIC installed in the computer is the proper type for both the media and architecture used by your network.
For instance, Ethernet and Token Ring require different types of NICs. This is because of the different ways in which the media access methods function. And, of course, the card must have the proper connector for the cable type being used. These are basic, relatively straightforward issues, but don’t overlook them when troubleshooting connectivity problems.

If you are installing a NIC on a Windows-based computer, be sure to check the Windows Hardware Compatibility List (HCL) or Windows catalog to ensure that your card is supported. The list can be accessed from the Microsoft Web site at www.microsoft.com/hcl. Although devices not listed may still work with Windows, if your card is on the list you can be confident that it has been tested and is compatible with the operating system.

Driver Issues

Like other hardware devices, the NIC requires a software driver to provide the interface between the operating system and the card. Be sure the driver that is designated for your specific model of NIC is installed and that it is the most recent version. In many cases, simply installing an updated NIC driver can solve a connection problem.

Recent Windows operating systems such as Windows XP, Windows Server 2003, and Windows Vista support a large number of common brands and models of NICs, and the drivers are included on the Windows installation CDs. However, these may not be the latest versions. Always check the manufacturer’s Web site for a download area where you can obtain the latest drivers.

Because Windows 2000, XP, Server 2003, and Vista are plug-and-play operating systems, supported cards are more likely to be automatically detected and the drivers installed from the Windows installation files. You may be prompted to supply the disk or network location if the driver cannot be automatically located, but compared to systems like Windows NT 4.0, this is still a huge step up.
Windows NT 4.0 does not include native plug-and-play functionality. Just be cautioned that drivers installed by the operating system may be outdated, and even if a device is successfully detected through plug-and-play, it is still wise to verify that the most current drivers are in use. In Exercise 12.1, we’ll walk through the steps of updating a driver for a NIC installed on a Windows Vista PC.

EXERCISE 12.1 Updating NIC Drivers

1. Click Start | Control Panel | System. Select the Device Manager in the upper left-hand corner. The list of installed devices will be displayed as shown in Figure 12.3.

2. Click the + sign next to Network Adapters. Double-click the NIC installed in this computer and select the Driver tab. You’ll see a screen similar to the one shown in Figure 12.4.