Certification Zone - Tutorial
http://www.certificationzone.com/cisco/studyguides/component.html?module=studyguides 5/31/2005

Tutorial
Cisco High Availability Techniques
by Howard Berkowitz

Contents

Introduction
Importance to the Entire Certification Process
Trends
So How High an Availability Level Is Enough?
Redundancy Doesn't Always Mean High Availability
Availability Terminology
State
Shared Risk Groups
Paging Mr. Murphy
Selecting Recovery Strategies
Cost and Complexity in Selecting Strategies
Recovery Time Requirements in Selecting Strategies
1:N, 1:1, and 1+1 Protection Strategies
1:1 Can Be Simple
Local Restoration Protection Strategy
Dynamic Discovery Recovery Strategy
Layer 1/2 High Availability for Links and Interfaces
Layer Failover and Bandwidth on Demand
Dial Recovery
SONET and POS
Layer Aggregation
Aggregating 802.3
802.3 Aggregation Control
Multilink PPP and Multichassis Multilink
Resilient Packet Rings (RPR)
Two Rings to Protect Them
Layer 1/2 Spanning Tree High Availability
Lessons Learned about L2 STP Solutions
STP Failure Modes
Core/Backbone Switch Failure
Distribution Switch Failure
IEEE 802.1w Rapid Spanning Tree Protocol (RSTP)
Port Types in 802.1d and 802.1w
Port States in 802.1d and 802.1w
Restrictions on Edge Ports
Root Wars and Root Guard
Non-STP Port-Based Mechanisms
PortFast, BPDU Guard, and 802.1w Functional Equivalence
Preventing Broadcast Storms
802.1x Port Based Authentication
STP Convergence Time
Effect of Diameter in Basic 802.1d
Single and Multiple Spanning Trees and Relationships to VLANs
MSTP: Subdividing the Spanning Tree for Faster Convergence
MSTP Regions
IST, CIST, and CST
VLAN Tagging and VLAN Trunk Protocol (VTP)
VLAN-to-Spanning Tree Relationships
PVST
Router Location and Failover
First-Hop Redundancy
HSRP and VRRP
IRDP
Gateway Load Balancing Protocol
Exit Selection
Passive RIP
Kinds of Default
Generating Default
Special Applications of Type and Type Defaults
General Routing
Hardware Failover in Routing
High Availability for Power
High Availability for Route Processors
Partition Repair
Backbone
Nonbackbone
Getting Creative with Tunnels and Nonredistributed Static Routes
High Availability BGP and Supporting IGPs
Conditional Advertisement
BGP Route Oscillation
Deterministic MED
Something You Might Not Know You Know about Loop Prevention
Graceful Restart/Cisco Nonstop Forwarding
IETF Graceful Restart Principles
End-of-RIB: A Really Good Thing
Handling Multiple Restarts
Cisco NSF Routing and Forwarding Operation
NSF for Link State IGPs
BGP Operation
Protection against Routing Attack
Ingress Filtering
Unicast Reverse Path Forwarding (uRPF)
uRPF Restrictions
Authentication
Midboxes
Reverse Route Injection
PIX Failover
Historical Perspective on NAT
High Availability Based on NAT
IOS SNAT Failover IPSec
Troubleshooting and Recovery
High Availability Applications
Conclusion
References
Books, Reports, and Articles
IETF Requests for Comment
IETF Internet Drafts

Introduction

A wide range of Cisco features makes more sense when you look at them as a group and understand their relationships. It can be awfully confusing to look at individual features and try to understand them. Indeed, mailing lists such as those at www.groupstudy.com are filled with questions about "How do I make feature X do something?" rather than the broader and more CCIE-lab-like question, "What features are potential alternatives for this problem/requirement/task?"
Importance to the Entire Certification Process

Being able to look at a group of features is increasingly important in the CCIE written exam, including knowing about platform-specific features. In this paper, you'll see things that might appear on the written exam, but won't occur in the lab because the lab doesn't contain the necessary hardware. Remember, however, that first you have to pass the written exam. Platform-specific features as well as "speeds and feeds" also are important in the various specialist exams.

Figure 1. Road Map for This Document

In this Tutorial, I'll show the relationships among an assortment of features that generally improve availability. Many of these are both at Layer 1 and Layer 2, so they don't necessarily get tested in a lab without special hardware (e.g., high-end routers, SONET/SDH, multiple parallel LAN and WAN links). However, you still may see them on the CCIE or specialist written tests.

Understanding what goes on "under the hood" of high availability will improve your networking marketability, your understanding of principles, and, to a certain extent, your ability to pass certification exams. Some of the under-the-hood principles include knowing why increasing redundancy can decrease availability, learning different ways things fail and can be protected against failure, and some of the less obvious protocol mechanisms that exist for reliability. Above all, practical high-reliability design means learning to put in adequate functionality but not to overcomplicate.

It is the matter of overcomplicating that causes me to raise a flag with the certification process. Some of the more bizarre redistribution schemes and the like may, at one time, have been justified for increasing reliability, but they have long been obsolete in the real world. Cisco testers, unfortunately, love complexity if it lets them exercise obscure IOS knobs. You Have Been Warned.

Keep in mind that there is a growing trend to move availability mechanisms to higher layers. This may seem counterintuitive as new and powerful switches, such as the 3550, move into the lab, but you will often find their high availability functions to be as much at Layer 3 as at Layer 2. Incidentally, please don't try to analyze any of this functionality with the marketing term "Layer switching"; it will just confuse you.

Figure 2. Graphic Conventions for Figures

While some of the high availability features use their own specialized protocols, BGP has become the Swiss army knife of networking. It is used to carry all manner of setup and control information for features. As a result, and because of the need for more scalable Internet routing, BGP has been significantly extended. See the section "High Availability BGP and Supporting IGPs" for information on the major extensions. Some of the extensions have been described in the CertificationZone BGP Tutorials already online; others will be discussed here; and yet others, primarily associated with virtual private networks, will be discussed in (future) VPN Tutorials.

Trends

Network design is always evolving, based on new ways to look at products and technologies. One evolution has been the de-emphasizing of VLANs, especially campus VLANs. Such VLANs were introduced when campus L2 switches had a significant speed advantage over L3 devices, but current Cisco routers have made up for that speed restriction.

Bridging and L2 switching were long pushed for high speed and low cost, but L2 topologies historically are less flexible than L3 topologies. Until recently, L2 networks were extremely limited in load sharing. There are certainly applications for VLANs, but they are now much less the dominant campus technique than they were three or four years ago [Sharan 2002].

So, your view of emerging designs is to de-emphasize older L2 techniques while simultaneously using new L2 enhancements and increasing availability of the L3 techniques used to interconnect L2 domains.

So How High an Availability Level Is Enough?

Selecting the appropriate level of availability is as much a business as a technical decision. In her book Planning for Survivable Networks, Annlee Hines has written extensively on the basis of these decisions. If you ever plan to recommend real network designs rather than simply pass tests, read her book! [Hines 2002]

My WAN Survival Guide [Berkowitz 2000] discusses some of these cost-benefit trade-offs from the enterprise standpoint, and my Building Service Provider Networks [Berkowitz 2002] looks at the trade-offs from the service provider viewpoint.

Table 1. Broad Goals for High Availability [Berkowitz 2000]

Availability Level | Server | Network
"Do nothing special" | Backups | Locked network equipment
"Increased availability: protect the data" | Full or partial disk mirroring, transaction logging | Dial/ISDN backup
"High availability: protect the system" | Clustered servers | Redundant routers; no single-point-of-failure local loop
"Disaster recovery: protect the organization" | Alternate server sites | No single-point-of-failure national backbone

High availability involves a great many cost trade-offs, some of which are "Layer 8" business rather than technical considerations.

Table 2. Readiness Costs for High Availability (Capital Expense)

Direct | Indirect
Backup equipment | Design
Additional lines/bandwidth | Network administrator time due to additional complexity; higher salaries for higher skills
Floor space, ventilation, and electrical power for additional resources | Performance drops due to fault tolerance overhead

If you choose to "pay me later" and accept failures, what are some of the costs of failures when they occur?

Table 3. Costs of Lack of Availability

Direct | Indirect
Revenue loss | Lost marketing opportunities
Overtime charges for repair | Shareholder suits
Salaries of idle production staff | Staff morale

Redundancy Doesn't Always Mean High Availability

Radia Perlman's doctoral thesis [Perlman 1988] was on the "Byzantine generals problem." She demonstrated that adding more network elements during certain kinds of failures not only does not increase availability but actually decreases it. The theoretical problem deals with a situation where the decision maker receives conflicting information from multiple sources, some of which is known to be untrue, but it is not known which information is untrue. Sounds familiar from mutual redistribution problems, hmm?
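A rough way to put numbers on the idea that extra elements do not automatically raise availability is the textbook series/parallel calculation. The sketch below is not from the tutorial. It assumes independent failures (which elements on a common power feed or cable duct would violate), and the function names and the 0.999 and 0.998 figures are illustrative assumptions only.

```python
from math import prod


def series_availability(parts):
    """Every element must work, so the individual availabilities multiply."""
    return prod(parts)


def parallel_availability(parts):
    """Any one element suffices: 1 minus the probability that all fail."""
    return 1 - prod(1 - a for a in parts)


def downtime_hours_per_year(availability):
    """Expected downtime in hours over a 365-day year."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    single_router = 0.999      # assumed availability of one router
    failover_logic = 0.998     # assumed availability of the failover mechanism

    redundant_pair = parallel_availability([single_router, single_router])
    # The failover mechanism sits in series with the redundant pair.
    pair_with_failover = series_availability([redundant_pair, failover_logic])

    for label, value in [
        ("single router", single_router),
        ("ideal redundant pair", redundant_pair),
        ("pair plus failover logic", pair_with_failover),
    ]:
        print(f"{label}: {value:.6f}, "
              f"{downtime_hours_per_year(value):.2f} h/year down")
```

With these assumed figures, the redundant pair only helps if whatever performs the failover is itself more available than the single router it protects; otherwise the added element drags the total below the single-router case.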
Actually, the Byzantine generals problem applies to most routing mechanisms and related mechanisms such as Layer 2 spanning trees. Aside from the theoretical aspects, adding more components provides more and more opportunity for Murphy's Law to function, causing configuration errors or introducing stresses when components fail. In an idealized network, you have just enough network elements to meet all requirements (including recovery from failure), but not more. Of course, no network is ideal.

Availability Terminology

Remember that the CCIE written exam is more concerned with protocol theory and features than specific configuration of routers to use them. This section will give you a good deal of information relevant to the theory of many protocols.

State

All connection-oriented protocols are stateful, but not all stateful protocols are connection-oriented. I don't say this to be confusing, but to illustrate an important and neglected aspect of protocols. Obviously, when two peers are going to have bi-directional communications, they have to know about one another. In addition, they are going to use some resources, whether a telephone line, a set of TCP buffers, or some other resource associated with the protocol.

Having state means that the participants retain knowledge of one another. In a true connection-oriented protocol, resources are reserved for the communication: the bandwidth of the telephone channel. That bandwidth, in classic telephony, will be available even if both parties are silent. Some of the VoIP bandwidth reduction techniques juggle conversations and available bandwidth, but that is a special case.

There are a significant number of protocols, however, that are connectionless but stateful. Such protocols retain knowledge, but do not commit resources. Typically, they make resources available from a pool shared by multiple stateful associations. A flow is such an
association, which does not necessarily reserve dedicated resources.

Stateful protocols can have an explicit connection phase, or can be soft state. Soft-state protocols maintain state as long as the participants periodically hear some type of keepalive message. Semi-soft-state protocols use explicit connections, but have no teardown.

Stateless protocol implementations retain no knowledge of prior events or relationships. The classic example of a stateless protocol is IP in a router, which makes packet-by-packet decisions on forwarding.

Shared Risk Groups

We often speak of single points of failure. Multiprotocol Label Switching (MPLS) has refined that definition into the shared risk group (SRG). The basic definition of an SRG is "a set of network elements that will be affected by the same fault." SRGs can apply to all sorts of network resources, and a given resource can belong to more than one SRG. A shared risk group of routers might be all of those on a common electrical power supply.

Table 4. Basic Shared Risk Groups

Layer | Hardware and software elements at shared risk
Infrastructure | Commercial power
Physical | Cable in common duct, single shared medium
Data Link | Cables in common multilink bundle
Network | Router; routing software session/instance
Transport | TCP software
Application | Single DNS server

One of the classic SRGs is the common cable or cable duct that gets cut by construction workers. While building alternate cable runs to the telco end office historically is prohibitively expensive, new Cisco technology gives you some creative alternatives. It may not be expensive, balanced against the cost of downtime, to run a wireless LAN from your main router to a router in a nearby building. That alternate router would connect to the end office, at the very least, via a different cable, and ideally would connect to an entirely different office. The bandwidth available to you from
one wireless LAN, or a small number of parallel wireless LANs, usually will be comparable to your normal WAN uplink. When the WAN bandwidth requirements are substantial, you still can get laser or wireless links from non-Cisco vendors, providing short-haul bandwidth up to OC12 (622-Mbps) rates.

Paging Mr. Murphy

Murphy's First Law states, "Whatever can go wrong, will." His Second Law says, "What has gone wrong will get worse." High availability measures will never be able to deal with every possible Murphy case. As a result, the MPLS Recovery Draft [Hellstrand 2002] first approaches the problem of single link (group) failures between network elements, generalizing this model to single interface and single router failures. The latter two are equivalent to SRG failures.

Other failure modes not considered here include congestion from broadcast storms and the like, Byzantine errors, host or host link errors, etc. Illegal protocol packets or hardware failures clearly are error events. Inopportune events impact high availability as well. A good example of an inopportune event is the arrival of one or more error notifications, or explicit restart/recovery messages, while recovery or restart is in progress.

Selecting Recovery Strategies

Approaches to recovering from failures depend on whether tight resource control in the network is needed, as, for example, where bandwidth is explicitly allocated to meet QoS actions. If overall control of this sort is needed, there may need to be a central (or distributed) network management element, usually called a head end.

Before selecting a technology, know your tolerance for outages and your budget. This discussion assumes that the recovery technology does have sufficient resources to protect against at least a single failure without human intervention. Outside the scope of this discussion are failures where mean time to repair (MTTR) is significant because it requires human intervention, possibly at unmanned sites, and possibly where spares need
to be shipped in.

You must, however, always remember why you want a particular level of survivability and build against the defined requirements. Designers and, unfortunately, traditional telephony people often use the 50 ms cutover goal of SONET as the gold standard. This number is derived from SS7 characteristics of large carrier networks. VoIP is much more tolerant of drops, tolerating 140 ms to s.

Cost and Complexity in Selecting Strategies

Part of the cost of any recovery strategy is the cost of resources that do not routinely carry operational traffic but are devoted to backup. Such resources are assumed in 1:N, 1:1, and 1+1, and possibly local repair models. Dynamic discovery does not make this assumption. The more resources committed, the more expensive the solution.

See Table 5 for a summary of recovery strategies, which are detailed in subsequent subsections of this discussion. Local restoration and reversion, also discussed below, can apply to any of the modes of this table. For more detail, see "Local Restoration Protection Strategy" and "Dynamic Discovery Recovery Strategy" later in this section. Another consideration is whether the recovery must consider end-to-end performance.

Table 5. Recovery Modes

Mode | Functionality
Dynamic discovery | Relies on sufficient statistical redundancy that routing protocols can find a nondedicated backup path
1:N | Switching onto available backup in pool
1:1 | Switching to assigned backup. The assigned backup might be carrying preemptible packet traffic
1+1 | Replicating onto both paths

All of these strategies provide restoration, but may or may not provide reversion. In restoration, the high availability system has done its job when the failed resource is replaced by another. In reversion, the high availability system also needs to restore the original conditions of resources after the failure is fixed. Another
consideration is whether a backup resource needs to be found for the new working resource. Reversion implies, to some extent, that the original resource backs up the new working resource, but the risk of the original resource being down may make that inadequate.

End-to-end recovery needs to know about SRGs. It needs to know that a recovery action will either minimize the number of resources put into an SRG in which a failure occurred or completely avoid that SRG. Local repair is not aware of end-to-end recovery.

Recovery Time Requirements in Selecting Strategies

Data protocols that are extremely timing-critical, such as IBM Systems Network Architecture (SNA) without local acknowledgement and DEC Local Area Transport (LAT), are becoming uncommon.

MPLS work on availability has produced a generally useful list of timers, generalized here for IP as well as MPLS (Table 6).

Table 6. Failure Detection Timers

Path Failure (PF)
  MPLS definition: Recovery mechanisms have decided the path has totally lost connectivity.
  IP routing definition: BGP or IGP route withdrawal or loss of keepalives at a lower layer.

Link Failure (LF)
  MPLS definition: MPLS recovery mechanisms have been informed of a lower-layer total failure.
  IP routing definition: Typically implementation-specific, although OSPF does have a specific notification abstraction, especially for demand circuits. Usually associated with an SNMP trap.

Fault Indication Signal (FIS)
  MPLS definition: A signal repeatedly transmitted that a fault along a path has occurred, passed along the path until it reaches a network element capable of initiating recovery.
  IP routing definition: BGP or IGP route withdrawal. Generally considered poor practice to announce periodically.

Fault Recovery Signal (FRS)
  MPLS definition: Indication that a fault along a working path has been repaired.
  IP routing definition: BGP or IGP re-announcement of previously withdrawn route.

You may have real-time applications such as
telepresence, telemetry, etc. that must have predictable delay. Delay may also be a commercial differentiator for competitive offerings of mission-critical business applications such as automatic teller machines, credit authorization, and transaction-based Internet commerce.

1:N, 1:1, and 1+1 Protection Strategies

In order of strength of protection (and cost), there are three basic modes for media/link protection: 1:N, 1:1, and 1+1. (See Figure 3.) These modes dedicate backup resources. An additional mode, dynamic discovery, assumes resources are there but does not pre-allocate them.

Do remember that to use some of these strategies, you will have to have physical topologies that put the backup resource in physical proximity to the working resource.

Figure 3. 1:1 and 1:N Protection

1:N provides one backup resource for N working resources, N being greater than 1. Think of a multiresource PPP or EtherChannel with more than two links, when any two links will handle the traffic.

1:1 dedicates a backup resource for each working resource. Think dual ring FDDI or primary/backup power supplies.

Both 1:N and 1:1 schemes may use the backup resource for lower-priority traffic, which can instantly be pre-empted if the working resource fails.

1:1 Can Be Simple

Don't forget that you can do protection not just on high-speed facilities, but also with such external equipment as intelligent CSUs.

Figure 4. Intelligent CSU with 1:1 Failover

1+1 restoration sends the same data on both resources (i.e., the solid and dotted lines in Figure 3), so that the data is immediately available in the event of failure. This is the basic mode of operation of the SS7 data resource protocol SSCOP, an extension of LAP-B.

1+1 protection adds application complexity, because the applications need to be able to decide which copy of information should be used. Alternatives vary with the
nature of the application (Table 7).

Table 7. Dealing with Redundant Application Data

Application Rule | Example | Requirement
Apply once and only once | Add deposit to bank account | Take first and block others (stateful)
Apply multiple times | Weather map for same interval | Just keep applying

Local Restoration Protection Strategy

Other strategies include local restoration around the fault (Figure 5). Local restoration is especially attractive in optical networks, where a fault can be bypassed much as in FDDI.

Figure 28. Blackholing as a Result of Excessive Summarization

Getting Creative with Tunnels and Nonredistributed Static Routes

Refinements of Floating Static Routes

Especially for dial backup, static routes are a common recovery technique. Floating static routes are less preferred (due to a high administrative distance) than dynamic routing. They are selected when the routing protocol fails.

Alternatively, for traffic engineering or other purposes, you might use static routes to describe the preferred path, but use dynamic routing as a backup. See Figure 29.

Figure 29. OSPF Traffic Engineering Workaround

Partition Repair

One way to recover from a partitioned nonzero area, assuming two ABRs, is to tunnel through area 0.0.0.0. As with the traffic engineering example, create a tunnel made up of static routes that are not redistributed into the dynamic routing protocol, and place that tunnel in the nonzero area.

Another application of tunneling in recovery is to tunnel through a routing domain of a different type or process.

Figure 30. Tunneling through Heterogeneous Domains

This can be an alternative for both backbone and nonbackbone areas.

High Availability BGP and Supporting IGPs

Some other BGP high availability
features, such as soft refresh, multiple reflectors in clusters, and route flap damping, have been discussed in other papers. See my series of CertificationZone BGP Study Guides.

Table 17. BGP Extensions

Update Note: Internet Drafts are working documents, which usually but not always become RFCs. The -nn version numbers were correct at the time of writing, but may be updated by the time you read this. Alternatively, the draft might expire or be replaced by an RFC.

Feature or Design Issue | Cisco Name | IETF Reference
Capabilities advertisement | | www.ietf.org/rfc/rfc2842.txt
Extended communities | | www.ietf.org/internet-drafts/draft-ietf-idr-bgp-ext-communities-05.txt
Outbound route filtering | | www.ietf.org/internet-drafts/draft-ietf-idr-route-filter-08.txt and www.ietf.org/internet-drafts/draft-ietf-idr-aspath-orf-04.txt
Soft refresh | Soft Reset | RFC 2918
Graceful restart | Nonstop forwarding with SSO | www.ietf.org/internet-drafts/draft-ietf-idr-restart-06.txt
Persistent route oscillation | Design technique, not specific knob | RFC 3345
4-byte AS numbers | | www.ietf.org/internet-drafts/draft-ietf-idr-as4bytes-06.txt
Always-compare-MED versus deterministic-MED | | Referenced in RFC 3345
Multiprotocol BGP | | RFC 2858 (new draft in process)

Conditional Advertisement

Conditional advertisement is a powerful BGP feature first available in IOS 11.1CC and 11.2 and generally available in subsequent releases. It provides a dynamic way of selecting less-preferred backup routes, covering situations where you might otherwise encounter blackholing due to aggregate addresses hiding the loss of a more-specific route.

In normal BGP operation, if a route is in the Adj-RIB-Out for a particular peer, it will be transmitted. BGP conditional advertisement, however, creates an additional functionality, implemented in route maps, which can be applied before making the decision to
propagate an advertisement. You can either permit or deny an advertisement from a specific router, based on the presence or absence of a particular prefix in the main routing table.

You define a non-exist-map to match a specific prefix. If this prefix is not in the routing table, an update specified in an advertise-map clause will be propagated. As long as the prefix, typically more specific, is in the routing table, the conditional route in the advertise-map will not be announced.

BGP Route Oscillation

ISP operators have discovered that, under certain circumstances, route reflector clusters and confederations can cause oscillating or partially visible routes. The underlying problem lies in the interaction of MEDs with RRs and confederations. The general workaround is a matter of how you use existing BGP features rather than a new knob to be added.

This is one of those issues where you may want to ask a lab proctor whether you should do a route reflector or confederation solution that protects against route oscillation. I suspect this is more an operational technique that may not have made its way into the exams.

Deterministic MED

You should be familiar with the role of MED in the sequence of BGP route selection rules. There are some nuances of interpretation that reduce BGP instability. The command bgp deterministic-med is one of them.

The most basic reason to want deterministic MED is consistency. If you don't enable it, the same BGP configuration may select different routes at different times, depending on the order in which the BGP updates were received. Routing researchers have demonstrated that nondeterministic ordering can lead to loops and inconsistent routing decisions [RFC 3345]. Some of the underlying mechanisms are being fixed in the new BGP standard, but enabling deterministic ordering is a Good Thing for dealing with sometimes-flaky
real-world implementations.

While Cisco recommends always using this command unless you have a very, very specific reason not to do so, it is not enabled by default. When you enable it, try to do so on all routers simultaneously. If you cannot enable it everywhere at once, do it router by router, constantly monitoring for routing loops.

Enabling bgp deterministic-med forces the MED to be considered in all routes received from different peers in the same AS.

We've already noted that not enabling bgp deterministic-med can cause unpredictable and unrepeatable results. Even worse, especially in more complex iBGP configurations, not turning it on can cause route oscillation changes as you watch. This problem appears as one of two types, which [RFC 3345] calls Type I and Type II churn.

Type I churn happens when:

  The AS uses a single level of RRs or confederations, and
  The AS accepts the MED attribute on updates from two or more ASs, and
  The MED values are unique for each route

Alternatively, nondeterministic MED can cause churn of this type. See the detailed timing tables in [RFC 3345]. The most basic way to prevent churn is to make sure that intracluster or intraconfederation-AS IGP metrics are always less than the corresponding intercluster IGP metrics.

You can get Type II churn when:

  There is a hierarchy of multiple levels of RR clusters or confederation sub-ASs, AND
  Your AS accepts unique MED values for the same prefix from more than one AS (i.e., always-compare-med has to be enabled)

One of the easiest ways to avoid this type of churn is to consider all of your BGP selection criteria and set policies so there is a distinct list of preferences by AS. In other words, this requires that any route from AS1 is always most preferred, any route from AS2 next preferred, and so forth.

It helps to avoid having multilevel RR or confederation hierarchies, and, indeed, you probably won't have enough routers in the CCIE lab to create this condition. Nevertheless, if you have this situation, fully meshing the members of a RR cluster avoids concern. Route reflection, even in this case, still is a valid scalability technique because you are still reducing the number of peerings outside the cluster.

To avoid churn, use local-pref as the principal selection criterion pertaining to the originating AS, and then use IGP cost and metric.

Something You Might Not Know You Know about Loop Prevention

Did you ever wonder why OSPF always prefers an intra-area route to an interarea, an interarea to an external type 1, and an external type 1 to an external type 2? Without getting into all the theory, the motivation is preventing loop formation from data that might be generated inside an area and leak back into it via an ABR, or generated outside the domain and leak back into it via an ASBR.

Route reflector clusters and confederations are levels of hierarchy, much as are OSPF and ISIS areas. We have the same sort of concern that we want to keep routing information as local as possible inside the area with an IGP, or inside the reflector cluster or confederation with BGP.

BGP can get even more complex, because you aren't strictly limited to two levels of hierarchy. You can set up tiers of reflectors and confederations. For example, the reflector of one cluster could very well be a client in a higher-level cluster. Hierarchies inside BGP sometimes are the only way to solve scaling problems, but their design and configuration is a challenge for truly expert ISP routing engineers.

Graceful Restart/Cisco Nonstop Forwarding

The IETF has been working with the concept of graceful restart. Graceful restart takes an optimistic view of the relationships between routing protocols and the forwarding table. It assumes that even though a routing session may have crashed, the routes in the forwarding table are probably still good, and can be used while the routing protocol session reinitializes.

Obviously, graceful restart helps keep traffic flowing in the event of failures. Less obvious but equally important is that graceful restart and route flap damping both contribute to internet stability by slowing the rate at which flaps propagate.

Originally, this work focused on BGP, but it soon became apparent that BGP needed the support of IGPs to maintain connectivity within its AS. Work began on graceful restart for OSPF and ISIS.

Cisco's implementation of graceful restart is called Cisco nonstop forwarding (NSF). It requires stateful switchover (SSO) to be running. The primary goal of NSF is to keep a router with redundant route processors forwarding while the backup RP takes over from the primary. Distributed line cards or forwarding processors maintain synchronization with the FIB wherever it is active.

IETF Graceful Restart Principles

Fundamentally, graceful restart protocols make two assumptions:

  The forwarding table/FIB isn't made completely invalid by an associated routing protocol failure.
  When a failed link to another router is restored, you can go through resynchronization rather than full topology discovery (e.g., Loc-RIB in BGP, link state databases in OSPF and ISIS).

This is an optimistic view, as opposed to the conventional pessimistic view that withdraws all routes learned from a routing process if that process fails. In other words, it is more important that some forwarding continue, and route flaps be avoided, than it is to avoid possibly misrouting or blackholing some packets.

When a standard BGP speaker
loses connectivity to a route, it must withdraw the route and propagate the withdrawal to all peer speakers to whom it previously advertised the route. It may advertise a new route with a different next hop, but it must withdraw the failed route.

End-of-RIB: A Really Good Thing

One feature of this capability, which could be useful beyond the original intent of the graceful restart capability, is the end-of-RIB marker. The presence of this marker in an update indicates that all routes have been sent and that, if the interface has been batching any routes, full convergence can begin. Indeed, it has been recommended that as long as a BGP speaker can generate end-of-RIB, it can be useful that it advertises the graceful restart capability even if it cannot retain its FIB during a BGP restart.

Since BGP builds on information from IGPs in its AS, it makes sense to wait for IGP graceful restart and reconvergence before beginning BGP graceful restart. Routers cannot fully implement graceful restart unless both ends of a session support it, although Cisco has a partial solution to NSF-unaware neighbors for ISIS.

Once the receiving speaker receives the end-of-RIB marker from all peers that have indicated that they are restarting, it can begin to run route selection on the received routes. A router won't do any BGP advertising until it receives the end-of-RIB marker from all relevant peers and can build the appropriate routing protocol tables. However, it will continue to forward while waiting for end-of-RIB, using what might be an increasingly stale FIB. Once the routing protocol tables can be rebuilt, the FIB must be flushed and stale data removed. Not doing this would result in chaos if multiple restarts took place.

Handling Multiple Restarts

To introduce sanity, the restarting BGP speaker must mark its existing routes as stale. This is one of the protocol extensions in graceful restart. Stale routes will be flushed when end-of-RIB is reached. In addition, a router can put a timer on stale routes, which will be flushed when the timer expires. A separate timer can be configured to defer the start of route selection after end-of-RIB.

Cisco NSF Routing and Forwarding Operation

As was recognized by the IETF as the minimum necessary set of protocols, Cisco supports NSF for BGP, ISIS, and OSPF. It requires SSO and CEF. Since the lowest-end platforms on which it runs are in the 7000 series, you will not find it in the CCIE lab given the current equipment list. It could, however, be a CCIE written question.

Cisco distinguishes between NSF-aware and NSF-capable devices. NSF-aware devices run an IOS release supporting NSF, while NSF-capable devices are NSF-aware devices that have NSF configured. All neighbors of a BGP, OSPF, or ISIS router must be at least NSF-aware to have an NSF-capable session. BGP neighbors also must have the graceful restart capability.

Obviously, if graceful restart is implemented, a router has to tell its peers that it does have that capability. It does that through capabilities advertisement.

NSF switchover for ISIS and OSPF means that the databases will need to resynchronize, but the adjacencies do not need to be reestablished. Of course, if the neighbor does not respond to the resynchronize request before a timer expires, it must be considered dead and the routes involved really must be withdrawn.

In NSF, the routing actually runs only on the active RP; the standby RP does not monitor the protocols, as does a standby PIX. Think of the Cisco implementation as 1:1 failover. There is a Cisco extension for sharing state information with the standby RP, but OSPF and BGP have to do full database resynchronization.

NSF for Link State IGPs

After NSF switches link state protocol operation to the standby RP, the first step is to send NSF signals to its peer routers. Assuming the neighbors are NSF-capable, they recognize that they need to
resynchronize their link state databases, but they can bypass neighbor establishment. From the responses to the NSF signal, the newly active RP rebuilds a neighbor list. After it recognizes a neighbor, it can begin to resynchronize with it. After resynchronization, the newly active RP purges its LSDB of stale information, generates a new RIB, and updates the FIB.

Once NSF restart begins, a timer preventing new restarts from initiating begins to run. Not having such a timer risks repeated restarts with resulting flaps.

OSPF and ISIS assume, as their basic mode of operation, that neighbors are NSF-capable. Cisco's implementation has an extension for ISIS, called Cisco ISIS, that can resynchronize even if the neighbors are not NSF-aware. The difference in this protocol is that adjacency and LSDB information does transfer to the standby RP, approaching 1+1 protection.

BGP graceful restart should begin only after IGP convergence.

BGP Operation

Cisco supports attempting to establish graceful restart with both NSF-capable and non-NSF-capable routers. Graceful restart/NSF is particularly important in protecting the global Internet, in that it improves worldwide stability by avoiding route flaps for all routes accepted from a peer on which there has been a failure.

BGP uses the capabilities advertisement feature in its open message to tell a peer that it supports graceful restart. Remember that this feature is intended for situations where BGP simply quits rather than does a controlled shutdown with the notification message.

The capabilities advertisement message contains a timer value that is an estimate of the time needed for a restart to complete. To be useful, the value of this timer must be less than the BGP session timeout. Failure of the advertising router to complete reinitialization by the time this timer expires allows its peers to mark it as
unavailable without waiting for the relatively long BGP session timeout.

After an RP switchover, an NSF-aware peer marks all routes received from the restarting peer as stale but continues to use them for the time set in the specific advertisement of graceful restart capability. In other words, the functioning peer will continue to forward to the peer in failover while the latter is reconverging. There is an assumption that the FIBs on the restarting peer are still mostly valid.

As the non-restarting peer continues to forward, it also rebuilds the BGP session. One of the key aspects of the recreated session is that the non-restarting peer knows that it has received all routes from its peer when it receives an end-of-RIB marker. Using the new RIB information, the non-restarting peer then removes stale routes from its Loc-RIB.

Protection against Routing Attack

You'll want to install any of the attack protections, with the possible exception of authentication, at the edge of your network. The edge, in this case, definitely includes all end user-enterprise interfaces, and may include interprovider links. On a case-by-case basis, you may decide not to filter on links with providers that you know have the same protections as you on all of their customer links.

Ingress Filtering

RFC 2827 is the current version of the ingress filtering method for protecting against certain attacks. In practice, this method lacks scalability when dealing with large ISP routers, and also depends on trusted headers. Unicast reverse path forwarding (uRPF) is a more scalable technique, but fully implementing the solution requires cryptographic authentication of routing updates and possibly packet headers.

Unicast Reverse Path Forwarding (uRPF)

The problem with filtering is that, barring locally written automatic filter generators, it imposes a very heavy maintenance and
change control workload. Some providers indeed have such generators that generate filters based on information in local or public route registries (see www.radb.net). Without per-packet authentication, it also assumes source addresses are not spoofed within the legitimate address block.

Unicast RPF, however, does not use access lists, but instead uses the much faster FIB. FIB dependency implies that CEF is a prerequisite. In unicast RPF, the source addresses are checked against the FIB to see that there is a reverse route to the interface on which the packets arrived. If there is not, the packets are dropped.

uRPF Restrictions

uRPF will not look inside tunnels or inside higher-layer protocols that include IP addresses in their messages.

Performance is the first motivation for uRPF. It also is reasonably automated, since it depends on routing protocols rather than manual ACL configuration. You are still free to use ACLs, which operate before the uRPF check on the ingress interface and just before forwarding on the egress interface.

Contrary to growing urban legend, uRPF can work with some modes of multihoming. Specifically, it requires that the multihoming does not go to multiple interfaces on the same router. In practice, this usually means that uRPF should be deployed on external interfaces to customers or peer ISPs.

The general operational experience is that it is reasonable to implement uRPF on aggregation routers but not on access or core routers. Access servers often default to one or more aggregation routers, and do not have the full routes necessary for uRPF to work. Core routers should be focused on forwarding, not filtering.

When there are multiple paths in the routing table to the ingress interface, the incoming packets must take the best path. Equal-cost multipath routes are all considered best, as are EIGRP unequal-cost routes. If the ingress interface sends routing updates, the incoming packet must comply with the route specified by that update.

Authentication

I discussed
BGP authentication in my BGP Tutorials. MD5 authentication of BGP protocol messages is only part of making sure your eBGP is robust, potentially while under attack. All that the MD5 authentication does is verify that the update came from an authorized neighbor. If you can assume that every router back to the source is also authenticated, you have much better confidence, but we can't verify this with available technology. There are proposals for digital signature verification at each AS in the path, but these are only proposals.

No amount of authentication can protect you against a legitimate router that generates bad information. You need to rely on other safeguards, such as prefix limits and more complex acceptance policies, the latter often based on information in routing registries (see www.radb.net). Prefix limiting and flap dampening still do not protect against deliberate or accidental denial of service by large amounts of BGP traffic. Using committed access rate (CAR) traffic policing on BGP is yet another viable safeguard.

Midboxes

One of the basic issues in midboxes such as firewalls, network address translators, and security gateways is whether they must preserve sessions in the event of a component failure. Of course, we would prefer that users not notice failures, but user-transparent failover is complex and resource-intensive. Failover schemes in which the next session (including the user's reestablished session) will always be directed to a working set of resources are much easier to accomplish [RFC 2391].

Remember that part of failover can be solved in application hosts or clusters of them. If you are doing an e-commerce shopping cart, the shopping cart should be saved, at least temporarily, on a host. All the world's failover mechanisms on switches, routers, and midboxes can't save it.

Reverse Route Injection

RRI generates static
routes to the backup devices in 1:1 protection schemes. It is primarily intended for secure VPNs, either for load balancing or high availability, and works with both static and dynamic crypto maps.

RRI deals with both static and dynamic mappings. In its static role, it builds a static route for each destination covered by an extended access list. For the dynamic case, it builds a static route to the RRI peer that connects a subnet (i.e., a security gateway) or a host.

PIX Failover

The PIX has long had a failover feature, but one that depends on additional cabling and that, by not keeping the backup synchronized, can cause data or session failure. In the traditional mode, the backup monitors, but does not have full information and typically will drop connections during failover.

The newer stateful failover mechanism does pass state information on active connections to the standby PIX. You need to configure the failover link command to identify a link used to pass state information between the primary and standby PIX.

High Availability Based on NAT

Historical Perspective on NAT

NAT was first implemented on the PIX. Cisco acquired the PIX from a company called Network Translation.

NAT can protect against failures of servers or of paths to the servers. Stateful NAT can protect against failure of the NAT device itself.

Whether you implement it in IOS or on a PIX, another use of NAT is not concerned with the efficiency of assigning addresses, but with various load distribution and fault tolerance mechanisms beyond what can be done with pure IP routing.

Using NAT, you can establish a virtual host on the inside network that coordinates load sharing among real hosts. Destination addresses that match an access list are replaced with addresses from a rotary pool. Allocation is done on a round-robin basis, and only when a new connection is opened from the outside to the inside. Non-TCP traffic is passed untranslated (unless other translations are in effect).

Obviously, for any NAT failover, there must
be a physical link between the NAT devices, preferably dedicated to the purpose. This is not necessary in NSF because failover is through the backplane. While Cisco has not made any announcement about failover between physically different router chassis, it would be logical that such a link would be necessary in such configurations.

Having a pool of servers behind NATs, or stateless NATs, certainly protects against new transactions going to a dead server. Server pools, in general, are a form of 1:N protection. To preserve transactions in process, at least stateful NAT is necessary.

IOS SNAT Failover

Stateful failover for IOS NAT has been added, although in a phased approach. The major limitation is that its phase does not support NAT with application layer gateways (ALGs) that deal with protocols that contain embedded addresses in the higher-layer protocol payload, such as FTP redirection.

You establish translation groups among two or more routers. You'll configure HSRP with a group name rather than an IP address, and include the group-name in an ip snat statement. For NAT, you have redundancy group-names in the ip snat commands, which, in turn, point to ip nat pool definitions. You must explicitly configure the NAT as stateful.

IPSec

There are several levels of IPSec failover. A basic mode uses HSRP to find a new gateway, but HSRP alone doesn't synchronize state among the alternate IPSec gateways. You need to configure IKE keepalive to pass that information. Otherwise, during a failover with HSRP, all security associations (SAs) will be lost and will need to be reestablished with IKE.

You define the IPSec tunnel endpoint as the virtual IP address shared among the IPSec devices. Of course, if this were the only thing done, you would still need to resynchronize. To avoid this, you use IKE keepalive both to detect failover and to handle state
during failover. IPSec will use the HSRP virtual IP address as the destination identifier for security associations (SAs) and key management protocol (ISAKMP) identity.

Troubleshooting and Recovery

Remember that pings and traceroutes through tunnels cannot see the tunnel endpoints.

Figure 31. Tunnel Challenges

Pings and traceroutes to the tunnel endpoints do not see the intermediate hops.

Figure 32. Original Configuration

A ping to 1.1.1.1 works. Both the ICMP Echo Request and Echo Reply go through the normal path from you to AS2.

Figure 33. Link Failure to Primary ISP

Shut down the link to AS2 (Figure 33). You can ping 2.2.2.2 because AS3 knows that 2.2.2.2 is in its address space, and it knows your address because it advertises your address space inside AS2, but not beyond. Providers will usually advertise address space assigned to their direct customers by other providers, but they will want both a specific request from you and permission from the provider responsible for the address block.

With the link to AS1 back up, you can ping 3.3.3.3 because AS4 knows how to get back to the aggregate, advertised by AS2, containing your more-specific prefix.

A ping to 3.3.3.3 will also fail. While AS3 knows how to send to AS4, AS3 does not readvertise your address block to AS4. AS4 sends the response to AS2, which cannot deliver the response to you while its link is down.

To avoid problems like this, you must make sure that your prefix propagates at least from all of your directly connected providers.

High Availability Applications

As storage area networking and application fault tolerance become more important, the possibility grows that you may be asked questions on the CCIE written exam that deal with application-level
availability. Here are some representative considerations.

Table 18. Threats and Countermeasures for Application Availability [Berkowitz 2000]

Threat | Alternative Countermeasures
At a single site, single server failure or out of service for maintenance | Local clustering
Overload of the virtual server clusters at multiple sites | Intelligent load distribution with DNS
Loss of connectivity to a site | Intelligent directory
Server crash | Backup, checkpointing

Even if the server stays up, you may need additional measures to be sure that the data on it remains valid.

Table 19. Server Data Integrity [Berkowitz 2000]

File/Database-Oriented: Backup; Multiple backup with shipping; Reciprocal remote backup
Transaction-Oriented: Transaction log; Remote transaction log; Parallel database; Two-phase commit

Other applications are inherently more distributed, and can benefit from an assortment of backup site techniques. The backup site can be cold standby, in which the application and database must be loaded onto a new server and that server brought into operation, possibly needing system software to be set up. Cold standby means a restoration time of hours at best. It will be accurate up to the last physical backup taken.

Hot standby sites have communication links to the primary site and are updated in near real time. There are several strategies for hot standby that entail increased cost but also will result in decreased outage time in the event of a failure:

- Remote transaction logging: The incremental log file of the primary site is recorded (or copied) at the remote site. In the event of a primary site failure, this file will be closed and used to update the database.
- Mirrored but not synchronized: As the primary database records each transaction, it generates a message containing the contents of the transaction and sends this message to the backup system. Beyond the TCP or other transport-level error control, the primary system does not know whether the secondary system actually has updated with the change. When this method works well, there may be only one record difference between the two in the event of a failure. Congestion or other errors can leave more records in an ambiguous state.
- Mirrored and transaction-synchronized: As the primary prepares to record a transaction, it sends a copy to the backup and waits for a confirmation before it commits the change to its own database. If no confirmation is received, the change is rolled back and retried or treated as an error depending on site policy. An additional safeguard may include waiting for the remote database to confirm that the primary database has been changed. This method has the highest overhead, but also the best protection for database integrity.

Conclusion

The appropriate level of availability for a system always has to be justified, at least in real networks, through cost-benefit analysis. Infinite availability has an infinite cost.

Always remember that high availability does not mean simply adding redundancy. Indeed, inappropriate redundancy can cause greater overhead and risk of failures as well as complicate troubleshooting.

We have a very wide range of techniques for improving availability. They exist at every OSI layer, although the trend is to move recovery mechanisms to higher levels than had been the practice not long ago. Layer 1 mechanisms such as SONET APS are inferior to Layer 2 mechanisms such as 802.17, and both traditional routing and MPLS have superiorities over classical spanning trees.

Since there are so many mechanisms for enhancing availability, you'll certainly encounter some in certification tests. Many candidates try to learn every mechanism, but fail in a lab test or real-world requirement because they do not
understand the interaction of the many mechanisms. This Tutorial emphasizes the goals of mechanisms and their interactions, rather than their detailed individual configuration.

References

Books, Reports, and Articles

[Awerbuch 2002] Awerbuch, B., et al. "An On-Demand Secure Routing Protocol Resilient to Byzantine Failures." www.jhuisi.jhu.edu/institute/docs/B_Awerbuch_wise2002_sec_routing.pdf
[Berkowitz 2000] Berkowitz, H. WAN Survival Guide. New York: John Wiley & Sons, 2000.
[Berkowitz 2002] Berkowitz, H. Building Service Provider Networks. New York: John Wiley & Sons, 2002.
[Greene 2002] Greene, B. R., and P. Smith. Cisco ISP Essentials. Cisco Press, 2002.
[Hines 2002] Hines, A. A. Planning for Survivable Networks. New York: John Wiley & Sons, 2002.
[Perlman 1988] Perlman, R. "Network Layer Protocols with Byzantine Robustness." Ph.D. dissertation, Massachusetts Institute of Technology, 1988. Laboratory of Computer Science document MIT-LCS-TR-429. www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-429.pdf
[Sharan 2002] Sharan, C. "Deploying Campus Networks." Networkers 2002. www.cisco.com/networkers/nw02/post/presentations/docs/RST-271.pdf
[Tiara 2000] Tasman Networks. "Multilink Multimegabit Access." White paper. plasma.tasmannetworks.com/Public/PDFs/MultilinkAccess.pdf

IETF Requests for Comment

[RFC 1256] Deering, S., et al. "ICMP Router Discovery Messages."
[RFC 2391] Srisuresh, P., and D. Gan. "Load Sharing using IP Network Address Translation (LSNAT)." 1998.
[RFC 2663] Srisuresh, P., and M. Holdredge. "IP Network Address Translator (NAT) Terminology and Considerations." 1999.
[RFC 2827] Ferguson, P., and D. Senie. "Network Ingress Filtering: Defeating Denial of Service Attacks which Employ IP Source Address Spoofing."
[RFC 2858] Bates, T., Y. Rekhter, R. Chandra, and D. Katz. "Multiprotocol Extensions for BGP-4." June 2000.
[RFC 2918] Chen, E. "Route Refresh Capability for BGP-4." 2000.
[RFC 3027] Holdredge, M., and P. Srisuresh. "Protocol Complications with the IP Network Address Translator (NAT)." Work in Progress, IETF NAT Working Group, 1999.
[RFC 3235] Senie, D. "NAT Friendly Application Design Guidelines." 2002.
[RFC 3345] McPherson, D., et al. "Border Gateway Protocol (BGP) Persistent Route Oscillation Condition."
[RFC 3392] Chandra, R., and J. Scudder. "Capabilities Advertisement with BGP-4." 2002.

IETF Internet Drafts

Internet Drafts are working documents, which usually but not always become RFCs. The -nn version numbers were correct at the time of writing, but may be updated by the time you read this. Alternatively, the draft might expire or be replaced by an RFC.

www.ietf.org/internet-drafts/draft-ietf-idr-bgp-ext-communities-05.txt
www.ietf.org/internet-drafts/draft-ietf-idr-route-filter-08.txt
www.ietf.org/internet-drafts/draft-ietf-idr-aspath-orf-04.txt
www.ietf.org/internet-drafts/draft-ietf-idr-restart-06.txt
www.ietf.org/internet-drafts/draft-ietf-idr-as4bytes-06.txt

[IE-HiAv-WP1-F06] [2003-04-30-02]