High-Availability Enterprise Network Design
haviland@cisco.com
505 0911_04F9_c3 © 1999, Cisco Systems, Inc

Staying On Target
HA Focus vs Distractions!
Focus: Five nines is job one!
Distractions:
  “Variety” of vendors, protocols, designs, etc
  “Flat networks are easier”: beware!
  Inherited complexity: hard to purge
  The latest cool stuff: older is more stable
  “Feature rich”: let’s use all the knobs!
  Change is hard, sometimes $$$

HA Features of the Catalyst 6500
Consider for Backbones & Server Farms
✔ Fabric Redundancy: switch fabric module in CatOS 6.1
✔ Supervisor Redundancy: HA feature in CatOS 5.4.1; stateful recovery; image versioning on the fly
✔ MSFC Redundancy: config-sync feature (IOS 12.1.3, CatOS 6.1); HSRP pair

Thinking Outside the Box
For HA/HP design “outside the box”
☛ the logical design is critical
☛ network features & protocols
☛ geophysical diversity is powerful
Inside: “HA”, RAID, UPS, MTBF, etc

Dramatis Personae
Our Cast of Symbols
✔ Links: GE, DPT, SONET, etc; Channel GigE
✔ L2 switching (Catalyst 4000): L2 forwarding in hardware
✔ L3 switching (Catalyst 6500): L3/L2 forwarding in hardware
✔ Routing: L3 forwarding (SW or HW)
✔ Control plane = IOS (Cisco 7500, Cisco 12000): routing protocols & features
✔ QoS where required
✔ Application intelligence

HA Gigabit Campus Architecture
survivable modules + survivable backbone
[Diagram labels: Access L2; Client Blocks; Distribution L3]
☛ Define the mission critical parts first!
[Diagram labels, continued: Ethernet or ATM; Layer or Layer Backbone; Distribution L3; Server Block; Access L2; E or FE port; GE or GEC; Server Farm]

High Availability Design
Why a Modular ABC Approach
✔ Many new products, features, technologies
✔ HA and HP application operation is the goal
✔ Start with modular, structured approach (the “logical” design)
✔ Add multicast, VoIP, DPT, DWDM

Design the Solution
Then Pick the Products
[Chart: price per 10/100 (axis ticks $100 to $350) for Catalyst 2912G, Catalyst 2948G, Catalyst 2980G, Catalyst 4XXX, Catalyst 5XXX, Catalyst 6XXX, with “New” and “New Modules” callouts]
[Table rows: 10/100 Ports; Gigabit Ports; Backplane; Switching Capacity. Values as extracted, column mapping not recoverable: 32-96, 24, 6-12, 24-500+, 3-38+, 24-350+, 8-64+, 24 Gbps, 20 Mpps, 1.2-3.6 + 10Gbps, Up to 72 Mpps, 250+ Gbps, Up to 150 Mpps]

HA Design Reality Check!
Assume Things Fail - Then What?
✔ Networks are complex
✔ Things break, people make mistakes
✔ What happens if a failure occurs?
✔ Simple, structured, deterministic design required for fast recovery
✔ The “tradeoffs”: your choices are important

Network Recovery
How Long? What Happens?
[Diagram labels, continued: Building; Branches; Access Layer; Distribution Layer; WAN; Core L3; WAN backup; Server Distribution Layer; Server Farm]

2) VLAN Building Block
make L2 design match L3 design
All VLANs terminate at L3 boundary
[Diagram: four access switches, each labeled “All VLANs, All Subnets”, using UplinkFast, with uplinks marked FE/BO and FO/BE (FO = forwarding odd, BE = blocking even, etc); “More flexible”; GE/GEC VLAN Trunks; 10/100 BaseT; GE or GEC L2 Path between the distribution switches]
Distribution pair (L2/L3), Dual Path with Tracking:
  STP root VLANs 10 12 14 16 / HSRP primary subnets 10 12 14 16
  STP root VLANs 11 13 15 17 / HSRP primary subnets 11 13 15 17

3) Large-Scale Server Farm Building Block
Dual-NIC Server Example: Fault Tolerant Mode (FTM), Same IP Address - seamless recovery
based on VLAN building block
aggregates traffic - high BW
[Diagram labels: Access L2; UplinkFast; GE/GEC VLAN Trunks; 10/100 BaseT; GE or GEC L2 Path]
Distribution pair (L2/L3), Dual Path with Tracking:
  STP root VLANs EVEN / HSRP primary subnets EVEN
  STP root VLANs ODD / HSRP primary subnets ODD

4) Small-Scale Server Farm Building Block
Simplified building block with no STP loops
Use if port density permits
Use if no oversubscription (non-blocking) is a requirement
Dual-NIC Server Example: Fault Tolerant Mode (FTM), Same IP Address - seamless recovery
[Diagram labels: 10/100 BaseT; GE or GEC L2 Path]
Distribution pair (L2/L3), Dual Path with Tracking:
  HSRP primary subnets EVEN
  HSRP primary subnets ODD

Redundant Backbone Models
all good - increasing scale
1) Collapsed L3 Backbone
2) Full Mesh
3) Partial Mesh
4) Dual-Path L2 Switched
5) Dual-Path L3 Switched

1) Collapsed L3 Backbone
large building or small campus
Scale depends on physical plant and policy more than performance
[Diagram labels: Clients; Access L2; GE/GEC; Collapsed Backbone Core L3; 10/100 BaseT; Server Farm]
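
The “make L2 design match L3 design” rule from the VLAN building block can be checked mechanically. The sketch below is only an illustration: it assumes VLAN N carries subnet N (as the slide's numbering suggests), that each distribution switch is STP root and HSRP primary for the same set of IDs, and it uses hypothetical switch names `dist-a` and `dist-b` that do not appear in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributionSwitch:
    """One switch of the L2/L3 distribution pair (names are hypothetical)."""
    name: str
    stp_root_vlans: frozenset
    hsrp_primary_subnets: frozenset


def misaligned_ids(switch: DistributionSwitch) -> set:
    """Return IDs where STP root and HSRP primary sit on different switches.

    Assumes VLAN N carries subnet N, so the two sets should be identical.
    """
    return set(switch.stp_root_vlans ^ switch.hsrp_primary_subnets)


# VLAN/subnet numbers are taken from the slide; the switch names are made up.
dist_a = DistributionSwitch(
    "dist-a", frozenset({10, 12, 14, 16}), frozenset({10, 12, 14, 16})
)
dist_b = DistributionSwitch(
    "dist-b", frozenset({11, 13, 15, 17}), frozenset({11, 13, 15, 17})
)

for sw in (dist_a, dist_b):
    bad = misaligned_ids(sw)
    print(sw.name, "aligned" if not bad else f"misaligned: {sorted(bad)}")
```

An empty result for both switches means the L2 forwarding path for each VLAN ends on the same switch that acts as its default gateway.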
[Diagram label, continued: GE or GEC]

2) Full Mesh Backbone
small campus - n squared limitation
  blocks - peerings
  blocks - 15 peerings
  blocks - 28 peerings
  blocks - 45 peerings
Note importance of passive wiring closet interfaces in meshed designs!
[Diagram labels: Access L2; Client Blocks; Distribution L3; Distribution L3; Server Block; Access L2; E or FE port; GE or GEC]

3) Partial Mesh Backbone
medium campus - traffic flow to server farm
[Diagram labels: Access L2; Client Blocks; Distribution L3; Predominant traffic pattern; Distribution/Core L3; Server Block; Access L2; E or FE port; GE or GEC]

4) Dual-Path L2 Switched Backbone
no STP loops or VLAN trunks in core
[Diagram labels: North; West; South; Access L2; Client Blocks; Distribution L3; Dual L2 Backbone; Core L2; “red” core subnet=VLAN=ELAN; “blue” core subnet=VLAN=ELAN; E or FE port; GE or GEC]

5a) Benefits of a L3 Backbone
✔ Multicast PIM routing control
✔ Load balancing
✔ No blocked links
✔ Fast convergence EIGRP/OSPF
✔ Greater scalability overall
✔ Router peering reduced
✔ IOS features in the backbone

5b) Dual-Path L3 Backbone
largest scale, intelligent multicast
All routed links, consider subnet count!
[Diagram labels: Access L2; Client Block; Distribution L3; Core L3; Server Farm Block]
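
The “n squared limitation” of the full mesh backbone follows from the pairwise count n(n-1)/2. The short sketch below shows that the peering figures surviving on the slide (15, 28, 45) correspond to 6, 8 and 10 fully meshed L3 peers. How those peer counts map to building blocks is not recoverable from the extracted text, so the function deliberately counts peers, not blocks; the function name is illustrative.

```python
def full_mesh_peerings(n_peers: int) -> int:
    """Number of pairwise adjacencies among n fully meshed L3 peers."""
    if n_peers < 0:
        raise ValueError("n_peers must be non-negative")
    return n_peers * (n_peers - 1) // 2


for n in (6, 8, 10):
    print(f"{n} peers -> {full_mesh_peerings(n)} peerings")
# 6 peers -> 15 peerings
# 8 peers -> 28 peerings
# 10 peers -> 45 peerings
```

Going from 6 to 10 peers adds four routers but triples the number of adjacencies, which is why the deck reserves this model for a small campus.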
[Diagram labels, continued: Distribution L3; Access L2; E or FE port; GE or GEC]

Restore Considerations
✔ Restoring can take longer in some cases: more complex - schedule
✔ On power up L1 may come up before L3 builds routing table - temporary black hole for HSRP
✔ Use “preempt delay” for HSRP

Campus Failover
Layer Recovery & Tuning
UplinkFast
  No tuning, seconds, wiring closet only
  Only applies with forwarding & blocking link
STP
  Tune ‘diameter’ on root switch
  Improves recovery time (maxage)
PortFast
  Server or desktop ports only
  Move directly from linkup into forwarding
Backbonefast
  Converges sec + 2xFwd_delay for indirect link failures
  Eliminates maxage timeout

Campus Failover
Layer Recovery & Tuning
Caution with aggressive tuning
Good when network is stable, highly summarized
HSRP (fast LAN links)
  Tune hello timer sec, dead timer sec
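
To illustrate why tuning ‘diameter’ on the root switch improves recovery time, the sketch below derives max age and forward delay from the diameter using the commonly cited IEEE 802.1D relationships. Treat it as an approximation under stated assumptions: the function names are illustrative, the round-up of half seconds is an assumption about how a switch would apply the formula, and the extra constant term on the Backbonefast slide is missing from the extracted text, so it is left out here.

```python
def stp_timers(diameter: int, hello: int = 2) -> tuple[int, int]:
    """Return (max_age, forward_delay) in seconds for a given STP diameter.

    Uses the commonly cited 802.1D relationships:
      max_age       = 4*hello + 2*diameter - 2
      forward_delay = (4*hello + 3*diameter) / 2, rounded up (assumption)
    """
    if diameter < 2 or hello < 1:
        raise ValueError("expected diameter >= 2 and hello >= 1")
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = (4 * hello + 3 * diameter + 1) // 2  # integer ceiling
    return max_age, forward_delay


def indirect_failure_recovery(max_age: int, forward_delay: int,
                              skip_max_age: bool = False) -> int:
    """Seconds to recover from an indirect link failure.

    skip_max_age=True models a feature that eliminates the max age timeout,
    leaving only the listening and learning phases (2 x forward_delay).
    """
    return (0 if skip_max_age else max_age) + 2 * forward_delay


for d in (7, 5, 3):
    age, fwd = stp_timers(d)
    print(f"diameter {d}: max_age={age}s, forward_delay={fwd}s, "
          f"recovery={indirect_failure_recovery(age, fwd)}s, "
          f"without max age={indirect_failure_recovery(age, fwd, True)}s")
```

With the default hello of 2 seconds, a diameter of 7 reproduces the familiar defaults of 20 seconds max age and 15 seconds forward delay; a smaller diameter shortens both, which is the effect the tuning slide relies on.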