CHAPTER 10: Network Management

Setup and configuration utilities on computers and other devices
Administrative features in software
Files, such as those containing other passwords or documentation containing procedures

Once this list is prepared, it should be stored in a secure location that is known to key personnel and that only they can access. This might be a safe, a locked cabinet in a branch office where tape backups are stored, or a safety deposit box at a bank. In an emergency, the list can then be used to perform necessary tasks while you’re unavailable, or tasks that you can’t perform because of unforeseen circumstances in a disaster.

Notification Documentation

If security is compromised or another major problem occurs, it’s important that the right people are notified as soon as possible. Prompt notification allows a crisis to be dealt with swiftly, so that problems aren’t left unresolved for an extended period of time and allowed to increase in severity. When critical incidents such as system failures or intrusions occur, the appropriate person or people must be called in to deal with the situation.

Notification documentation includes contact information for specific people in an organization, their roles, and when they should be called. Having this documentation helps ensure that the appropriate person is called at the appropriate time to deal with an issue. For example, during regular business hours, the network administrator might be called to deal with any network issues. After work and on weekends, however, members of the IT department may rotate on-call duty, so that a different staff member is responsible for fixing problems during off hours. If the organization calls the on-call person and he or she is unable to fix the problem, the on-call person would then contact the network administrator.
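The call order described above can be expressed as a small routing rule. The sketch below is illustrative only: the role names, the assumed business hours (Monday to Friday, 8:00 to 17:00), and the function names are hypothetical and are not part of any real notification system.

```python
from datetime import datetime

# Hypothetical roles; a real notification list would name people and contact methods.
NETWORK_ADMIN = "network administrator"
ON_CALL = "on-call IT staff member"

# Assumed business hours: Monday to Friday, 8:00 to 17:00.
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 17


def is_business_hours(when: datetime) -> bool:
    """Return True on weekdays between the assumed opening and closing hours."""
    return when.weekday() < 5 and BUSINESS_START_HOUR <= when.hour < BUSINESS_END_HOUR


def call_order(when: datetime) -> list[str]:
    """Return who to call, in order, for a network problem reported at `when`.

    During business hours the network administrator is called directly.
    Off hours, the on-call person is called first and escalates to the
    network administrator only if he or she cannot fix the problem.
    """
    if is_business_hours(when):
        return [NETWORK_ADMIN]
    return [ON_CALL, NETWORK_ADMIN]


if __name__ == "__main__":
    print(call_order(datetime(2009, 4, 15, 10, 0)))  # a Wednesday morning
    print(call_order(datetime(2009, 4, 18, 23, 0)))  # a Saturday night
```

Here a problem reported on a Wednesday morning goes straight to the network administrator, while one reported on a Saturday night goes to the on-call person first and reaches the administrator only through escalation.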
In doing so, the right person for the right job is called in, in the right order. Having a call order or chain of command is important to notification procedures. After all, while the bosses of a company may claim they want to be notified of problems, chances are that they won’t appreciate being called in the middle of the night because someone forgot a password. At the same time, you don’t want everyone in the organization to be able to contact you directly, or you might get dozens of phone calls at home about the same problem. Even worse, you may get calls about inane problems that don’t warrant calling you at home (such as someone needing their maiden name changed to their married name on a system, or someone being unable to find a particular Web page on the corporate intranet).

During off hours, people need to know that problems must go through a particular person or group of people who have a list of people to notify. This might be a help desk, receptionists, dispatchers, or others who work a 24-hour shift rotation (or are at least there when you’re not). The person or department that holds the notification list acts as a filter for problems and is responsible for determining whom to call, and whether it’s necessary to call someone in after normal business hours. Those responsible for notifying people need up-to-date information on how to contact members of the IT staff and certain other employees within the organization. If the problem is catastrophic, such as a fire at a particular office building, then management and various IT staff members may need to be called in. Those who are called in to solve a problem can then determine whether additional people need to be contacted.

The contact information included in notification documentation should provide several methods of contacting the appropriate person. The list of people to contact might include each person’s name, role in the organization, and pager number.
A more extensive list might include each person’s phone extension within the company, home phone number, cell phone number, address, and other information that will ensure the person can be contacted. If such extensive information isn’t included with the notification documentation, the documentation should reference where additional contact information is available, such as an employee database.

Notification procedures should also include contact information for certain outside parties who are contracted to support specific systems. For example, if there is a problem with the air conditioner in a server room, a heating/cooling company that’s under contract might be called in to fix the system. However, when outside parties are called in on an emergency basis, it’s important that other procedures and practices are still followed (such as signing them in and out of secure areas). Except in extreme circumstances, a company’s policies shouldn’t be ignored in a crisis. Although notification procedures may be a starting point in a disaster, other plans, policies, and procedures also come into play.

Different Methods and Rationales for Network Performance Optimization

Network Performance Optimization

Network performance optimization starts with assessing your network’s status on an ongoing basis by monitoring network traffic and logs. Data rates, available bandwidth, WAN link status, backup time, device response rate, and component failures are just a few of the things we need to keep tabs on to ensure the network is optimized. The techniques we will use to address performance issues are quality of service (QoS), traffic shaping, load balancing, high availability, caching engines, and fault tolerance.

QoS

You’re at home and decide to call your friend using your VoIP phone.
When you talk on the phone, the last thing you want to hear is your voice echoing in the background, or, even worse, to have to wait a few seconds before your voice reaches the phone at the distant end. QoS is a method used all over the world to ensure things like this don’t happen. VoIP is one type of traffic that must be monitored on the network and placed high on the QoS priority list. QoS can be thought of as a measure of how well a network service (such as VoIP) actually performs on your network compared to its expected or predicted performance. Problems might include the following:

Dropped packets – Routers hold arriving packets in buffers, and when a busy router’s buffers are already full, it fails to deliver (drops) some packets. Some, none, or all of the packets might be dropped, depending on the state of the network, and it is impossible to determine in advance what will happen. While the destination host waits for the missing packets, the receiving application may ask for them to be retransmitted, which adds further delay to the overall transmission. For real-time services like VoIP, where delays are measured in milliseconds, a packet held in a buffer too long may arrive too late to be useful, so it is effectively lost even if it is eventually delivered.
Delay/Latency – Because VoIP is a real-time voice service, excessive delay degrades the conversation and can make it impossible to understand the person at the distant end. Overcrowded data links on routers in the transit path of your packets can delay data packets. Long queues, or indirect routes taken to avoid congestion, are common causes of latency within your VoIP network.
Jitter – The Internet is a complex mesh of routers interconnected across the world. There is no single path to a given destination; in fact, some packets travel completely different paths and end up at the same destination. When delays in transit vary, some packets that leave after others might arrive at the destination first. This variation in packet delay is called “jitter”. Applications like VoIP cannot be used effectively when jitter is high.
Error – Sometimes packets are misdirected, combined together, or corrupted while en route. The receiver has to detect this and, just as if the packet had been dropped, ask the sender to resend it.

Some common QoS protocols include Resource ReSerVation Protocol (RSVP) and MultiProtocol Label Switching (MPLS). Both help manage bandwidth allocation for services such as VoIP. The Differentiated Services (DiffServ) model specifies a way of classifying and managing network traffic on IP networks: each packet in the network is marked with QoS information. Integrated Services (IntServ) is a QoS model that allows applications to signal their QoS requirements to the network before transmitting information. Each networked device along the path receives this signal and reserves the network resources needed to ensure that the flow’s packets achieve the desired QoS.

There are eight levels of QoS, as described in Table 10.1. These levels determine how traffic is prioritized relative to the rest of the network traffic. The lowest-numbered level, “0”, is best effort: data is delivered on a first-come, first-served basis whenever the network can carry it, with no guarantee at all. Real-time voice and video traffic is usually assigned level “5”, which targets less than 100 ms of latency and jitter, while levels “6” and “7” are reserved for network control traffic.
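To show how priority levels decide which traffic leaves a congested interface first, here is a minimal strict-priority scheduler. It is a sketch under simplifying assumptions: each packet carries a level from 0 to 7 as in Table 10.1, the highest level is always sent first, and packets of equal level leave in first-come, first-served order. The class names are hypothetical, and real switches and routers often use weighted or hybrid queuing instead, so that low-priority traffic is not starved.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class Packet:
    priority: int  # QoS level from Table 10.1: 0 (best effort) through 7
    payload: str


class StrictPriorityQueue:
    """Always sends the highest QoS level first; equal levels leave first-come, first-served."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Packet]] = []
        self._arrivals = itertools.count()  # arrival counter used to break ties

    def enqueue(self, packet: Packet) -> None:
        if not 0 <= packet.priority <= 7:
            raise ValueError("QoS level must be between 0 and 7")
        # heapq is a min-heap, so the priority is negated to pop the highest level first.
        heapq.heappush(self._heap, (-packet.priority, next(self._arrivals), packet))

    def dequeue(self) -> Packet:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = StrictPriorityQueue()
    queue.enqueue(Packet(0, "Web page"))
    queue.enqueue(Packet(5, "voice sample 1"))
    queue.enqueue(Packet(3, "business database query"))
    queue.enqueue(Packet(5, "voice sample 2"))
    while queue:
        print(queue.dequeue().payload)
```

Running the example prints the two voice samples first (in arrival order), then the business database query, and finally the best-effort Web page. Because the scheduler is strict, a steady stream of high-priority traffic could starve level 0 entirely, which is one reason weighted schemes are common in practice.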
Table 10.1 Levels of QoS
Priority Level – Traffic Type
0 – Best effort
1 – Background
2 – Standard (spare)
3 – Excellent load (business critical)
4 – Controlled load (streaming multimedia)
5 – Voice and video (interactive media and voice) [less than 100 ms latency and jitter]
6 – Layer 3 network control reserved traffic [less than 10 ms latency and jitter]
7 – Layer 2 network control reserved traffic [lowest latency and jitter]

Traffic Shaping

Traffic shaping is important, as the news article suggests. Another common term for this control of computer network traffic to optimize performance is packet shaping. The goals of traffic shaping are to make better use of the available bandwidth and to lower latency for important traffic. This is done by specifying what traffic you will allow in or out of your network, and at what rate (rate limiting) over a span of time (bandwidth throttling). Traffic shaping is most commonly applied at the border routers (those bordering your network’s perimeter) to delay incoming network traffic. Internal routers and outbound traffic can also be shaped.

NEWS ARTICLE BY THE LLAMA LEDGER BY JAMES KRELLENSTEIN ON 4/15/09

Perhaps one of the most universal complaints regarding Simon’s Rock is the speed (or lack thereof) of Internet access on campus. During peak usage, approximately 5 to 10 p.m. on weekdays, the Internet seems to grind to a proverbial halt. Yet the Internet situation at Simon’s Rock is complex, due in part to the campus’s relatively rural location, which makes access to the Internet more difficult than schools with a more urban locale. According to Information Technology Services (ITS), the amount of bandwidth available to campus is approximately 15 Mbps (~1.875 megabytes per second) downstream and 15 Mbps upstream. Although this may sound like a lot, one must remember that all of Simon’s Rock’s approximately 450 students and 40 faculty members share this single connection. To maximize the usability of the available bandwidth, ITS currently utilizes a technology known as “traffic shaping”. Traffic Shaping allows ITS to prioritize certain Internet traffic over others; for example, access to academic Web sites is prioritized over access to bandwidth intensive sites such as YouTube or Hulu. Systems Administrator Peter C. Lai argues that the Internet on campus would be simply “unusable” without traffic shaping.

Interestingly, though the Internet may seem slower than one is used to, Simon’s Rock’s Internet speed is not unusually slow, especially considering the school’s small size. According to EDUCAUSE’s Core Data Service 2007 Survey (the last year for which data is available), approximately 30.5 percent of BA institutions have Internet speeds of 12.1 to 44 Mbps, the same category which Simon’s Rock is in. Even so, ITS has entertained the idea of increasing the Internet speed on campus. Unfortunately, Simon’s Rock’s rural location makes this rather difficult and expensive. Currently Simon’s Rock must rent its own fiber optic connection from campus to downtown Great Barrington in order to access the Internet Service Provider (ISP). The fiber, which is owned by Verizon, costs more than $3,000 a month to rent. This is in addition to the cost that the actual ISP charges, which is over $100 a month. Furthermore, if Simon’s Rock wished to go over 20 Mbps, ITS would need to update its Packet Shaper (the device that allows ITS to traffic shape), which would be a very significant investment.

Link to article: http://media.www.llamaledger.com/media/storage/paper1178/news/2009/04/15/News/Ever-Wondered.Why.Simons.Rocks.Internet.Is.So.Slow-3710993.shtml

The traffic contract is the agreed policy specifying how much traffic may enter or leave the network, and when; traffic policing is the enforcement of that contract.
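Rate limiting of this kind is commonly implemented with a token bucket, an algorithm the text does not name, so treat it as one widespread approach rather than the method any particular device uses. In the sketch below (a simulation with hypothetical function names and numbers), tokens measured in bytes accumulate at the contracted rate up to a burst size. A policer forwards a packet only if enough tokens are available when it arrives and drops it otherwise; a shaper instead holds the packet in a queue until enough tokens have accumulated. The shaper assumes an unlimited queue, whereas real shapers drop packets when their buffer overflows.

```python
class TokenBucket:
    """Tokens (bytes) accrue at `rate` per second, up to a maximum of `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # contracted rate, bytes per second
        self.burst = burst    # bucket capacity, bytes
        self.tokens = burst   # the bucket starts full
        self.last = 0.0       # time of the most recent refill, seconds

    def refill(self, now: float) -> None:
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now


def police(packets, rate, burst):
    """Forward conforming packets at once; drop any packet that exceeds the contract.

    `packets` is a list of (arrival_time_seconds, size_bytes) in arrival order.
    Returns the (send_time, size) pairs that were forwarded.
    """
    bucket = TokenBucket(rate, burst)
    forwarded = []
    for arrival, size in packets:
        bucket.refill(arrival)
        if bucket.tokens >= size:
            bucket.tokens -= size
            forwarded.append((arrival, size))
        # otherwise the packet is dropped
    return forwarded


def shape(packets, rate, burst):
    """Delay non-conforming packets until enough tokens accrue; nothing is dropped."""
    bucket = TokenBucket(rate, burst)
    sent = []
    clock = 0.0
    for arrival, size in packets:
        if size > burst:
            raise ValueError("a packet larger than the bucket can never conform")
        clock = max(clock, arrival)  # FIFO: wait for arrival and for earlier packets
        bucket.refill(clock)
        if bucket.tokens < size:
            clock += (size - bucket.tokens) / rate  # time until enough tokens exist
            bucket.refill(clock)
        bucket.tokens -= size
        sent.append((clock, size))
    return sent


if __name__ == "__main__":
    # Hypothetical contract: 1,000 bytes/s with a 1,500-byte burst.
    # Five 1,000-byte packets arrive together at time 0.
    packets = [(0.0, 1000)] * 5
    print("policed:", police(packets, rate=1000, burst=1500))
    print("shaped: ", shape(packets, rate=1000, burst=1500))
```

With these inputs the policer forwards only the first packet and drops the other four, while the shaper sends all five at 0.0, 0.5, 1.5, 2.5, and 3.5 seconds. The trade-off is the one Figure 10.11 illustrates: policing adds no delay but loses the burst, and shaping keeps every packet at the cost of queuing delay.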
Traffic that exceeds the contract is dropped or re-marked by the policer, so enforcing compliance with the traffic contract is how traffic sources are held to the traffic policy in effect. Figure 10.11 shows the difference between traffic policing and traffic shaping. Traffic shaping produces a much smoother traffic rate than traffic policing. Policing simply cuts off traffic that exceeds the configured rate, so the rate is clipped at the limit during bursts and dips whenever traffic is naturally below the limit. Shaping, on the other hand, buffers the excess and sends it later, smoothing the traffic into optimal utilization of the bandwidth allocated on a particular link.

Load Balancing

Load balancing is a technique used on computer networks to distribute incoming traffic across multiple network devices when there are indications of increased network traffic, or “load”. Load balancing allows a group, or cluster, of data center servers to share inbound traffic while appearing to the outside world as if there is only one external connection to the Internet. Figure 10.12 shows a typical network configured for load balancing. Once traffic comes into the network through the single external entry point, it is distributed among the internally connected servers so that they share high traffic volumes. Read about how Google™ uses load balancing, and see Google’s first production server in Figure 10.13.

High Availability

High availability is a system design approach that, once implemented, assures a specific degree of operational continuity (uptime) over a given period of time. The goal of high availability is to give users maximum uptime so that they can access network resources anytime and anywhere. Reducing unplanned downtime increases a business’s potential productivity and avoids the bottlenecks that failures create in the network. Ever wonder how you’re able to check in so quickly at the airport? Did you ever think about the critical networks that process and support airport customer self check-in kiosks?
These network infrastructures are the backbone for millions of transactions that allow for quick customer check-in. If these networks didn’t use high availability, you might soon be flying with a competing airline. Check out how high availability helps Czech Airlines below.

FIGURE 10.11 Traffic Shaping versus Traffic Policing. Image courtesy of supportwiki.cisco.com, see Doc ID: 19645. http://www.cisco.com/en/US/tech/tk543/tk545/technologies_tech_note09186a00800a3a25.shtml#policingvsshaping
[Figure panels: Policing and Shaping, each plotting traffic rate against time]

FIGURE 10.12 Load Balancing. Image courtesy of hostway.co.uk

GOOGLE™ IS USING LOAD BALANCING!

“When an attempt to connect to Google is made, DNS servers resolve www.google.com to multiple IP addresses, which acts as a first level of load balancing by directing clients to different Google clusters. (When a domain name resolves to multiple IP addresses, typical implementation of clients is to use the first IP address for communication; the order of IP addresses provided by DNS servers for a domain name is typically done using Round Robin policy.) Each Google cluster has thousands of servers, and upon connection to a cluster further load balancing is performed by hardware in the cluster, in order to send the queries to the least loaded Web server. This makes Google one of the biggest and most complex content delivery networks.”
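The two levels of load balancing in the Google sidebar can be sketched in a few lines: a DNS server that rotates the order of the addresses it returns (round robin), and a cluster front end that sends each new request to the least-loaded server. This is a toy model, not Google’s implementation; the IP addresses (taken from the 192.0.2.0/24 documentation range), the server names, and the use of active-connection counts as the measure of load are all assumptions for illustration.

```python
from collections import deque


class RoundRobinDNS:
    """Returns every address for a name, rotating the order on each query."""

    def __init__(self, addresses: list[str]) -> None:
        self._addresses = deque(addresses)

    def resolve(self) -> list[str]:
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next query sees the next address first
        return answer


class LeastLoadedBalancer:
    """Sends each new request to the server with the fewest active connections."""

    def __init__(self, servers: list[str]) -> None:
        self.active = {server: 0 for server in servers}

    def assign(self) -> str:
        # min() returns the first server in insertion order when loads tie
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def finish(self, server: str) -> None:
        self.active[server] -= 1


if __name__ == "__main__":
    dns = RoundRobinDNS(["192.0.2.10", "192.0.2.20", "192.0.2.30"])
    # Typical clients use the first address in the answer.
    for client in range(4):
        print(f"client {client} -> cluster at {dns.resolve()[0]}")

    cluster = LeastLoadedBalancer(["web1", "web2", "web3"])
    first, second = cluster.assign(), cluster.assign()  # web1, then web2
    cluster.finish(first)    # web1's request completes
    print(cluster.assign())  # web1 and web3 are idle; web1 comes first
```

The four simulated clients are directed to the clusters at .10, .20, .30, and then .10 again. Counting active connections is only one possible load measure; hardware balancers may also weigh server CPU load or response time.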
FIGURE 10.13 Google’s First Production Server with the Hair Pulled Back, Revealing a Rack of Cheap Networked PCs, Circa 1999. Image courtesy of Wikimedia Commons, picture taken by Steve Jurvetson

Caching Engines

A cache is a copy of original data that is saved so that computers can access it locally instead of retrieving the same data again from the Internet. Clients can access this “cached” data more quickly because it is stored in a temporary location for a specific amount of time, which network administrators usually configure to give the best performance. Cache engines are servers dedicated to caching data for clients. These servers usually keep a database in which the cache is stored, and they return cached content to the local host on request. Cached data that isn’t used often enough is discarded and must be fetched again the next time a client requests it. Common uses of cache engines include Web servers and proxy servers. Even large companies like Google™ and Yahoo® use caching engines, but their solutions are usually dedicated hardware clusters that can handle millions of requests per second. Figure 10.14 shows a typical setup for cache engines and their clients.

Fault Tolerance

Fault tolerance refers to the measures you have in place in case of a network failure, including a plan for replacing a failed device with a backup device that is ready to take its place. Fault tolerance is also known as redundancy.
Having more than one network router is important, especially if your company cannot conduct business as usual without network connectivity. Redundant network links are helpful too. As your business grows, you might need to hire more people to take sales calls. What about your network? If your only Internet connection breaks and you have hired 30 salespeople to send product brochures to potential clients via e-mail, your organization can no longer conduct business. Fault tolerance means having multiple paths from one endpoint to another, ready to be activated if needed.

NEWS ARTICLE: CZECH AIRLINES SELECTS SITA AS SINGLE NETWORK SUPPLIER ON 4/26/09

“Czech Airlines, which carried 5.6 million passengers last year, today announced a four-year deal with aviation IT specialist SITA to become the single supplier of its global network communication and messaging services.”

“The agreement between the national flag carrier of the Czech Republic and SITA involves the evolution of their existing network infrastructure to increase performance while reducing cost by 25%. The new hybrid network which includes IP VPN, ISP, DSL and SITA AirportHub connections, will serve 74 airport and remotely located offices. This hybrid solution ensures full back-up for all services in both the airline’s headquarters and remote locations.”

http://www.boarding.no/art.asp?id=36195

FIGURE 10.14 Caching Engine.

SUMMARY OF EXAM OBJECTIVES

In this chapter we discussed why networked information systems need to be managed. If our goal in network management is to keep our networks from spiraling out of control, we must always keep in mind the activities, techniques, measures, and gear that pertain to how we operate, administer, maintain, and provision networked information systems, in order to ensure the highest available network connectivity within your information technology (IT) department’s budget.
In this chapter we covered three main exam objectives: network management, configuration management (CM), and network monitoring. In the network management section, we discussed how to keep track of resources in the network and how they are assigned, maintained, upgraded, repaired (preventative maintenance), and configured for optimal resource usage and network performance using monitoring techniques. In the CM section, we discussed why and how to establish documentation such as wiring schematics for your WAN links and local POP, the differences between physical and logical network diagrams, how baselines can assist in network troubleshooting, and why creating and implementing policies and regulations is so important to keep your network in good standing.