High-Availability Campus Design – Best Practices
Presented by Dr Peter J Welcher, Chesapeake Netcraftsmen
Slide 1

About the Speaker
• Dr Pete Welcher
  – Cisco CCIE #1773, CCSI #94014, CCIP
  – Network design & management consulting
    • Stock quotation firm, 3000 routers, TCP/IP
    • Another stock quotation firm, 2000 routers, UDP broadcasts
    • Hotel chain, 1000 routers, SNA
    • Government agency, 1500 routers
  – Taught many of the Cisco courses
• CiscoWorld / Enterprise Networking Magazine articles
  – http://www.netcraftsmen.net/welcher/papers
Slide 2
Copyright © 2003, Chesapeake Netcraftsmen

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 3

What Are We Trying To Do?
• Cost-effective building / campus switched network design
• Modern L2/L3 switches
• Redundancy, high availability
• Reasonably high level of security
• AVVID support
  – Future IP Telephony
  – IP Video Conferencing (IPVC)
  – Other Video services
• Support for QoS
Slide 4

Assumption
• Cisco’s Networkers 2002 RST-271 presentation preceded this; see http://www.cisco.com/networkers/nw02/post/presentations/docs/RST-271.pdf
Slide 5

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 6

Connect Distribution Layer?
• When do the Distribution Layer switches need a trunk or link between them?
(Diagram: Access and Distribution layers)
Slide 7

Connect Distribution Layer?
• When do the Distribution Layer switches need a trunk or L3 link between them?
  – When there is more than one port on the switch in a VLAN, need a trunk
  – When each distribution switch has only one connection to the WAN or Core router or switch
  – Best to avoid sending inter-switch traffic through IDF switches
• Consider having such links!
Slide 8

Use of L3 Switching
• Use L3 switching rather than L2
  – “Within reason”
• L3 in the IDF?
  – Do you really want each IDF switch doing routing?
  – What does this buy you if every port on the switch is on one or two VLANs?
  – Making uplinks L3 may lead to slower failover!
  – Can use L3 capabilities in IDF for QoS but not routing
(Diagram: L2 or L3 switching?)
Slide 9

Control L3 adjacencies
• Not needed if IDF’s are doing L3 routing
• The point: do not need the two distribution layer MSFC’s becoming neighbors on each IDF VLAN
• Select 2-4 preferred VLAN’s
• Make the rest passive interfaces
• Preferable: use inter-switch links for this, perhaps with a few “preferred VLAN’s” in case both links fail
Slide 10

HSRP
• Use HSRP, making the MSFC’s the HSRP routers
  – Set HSRP primary to match preferred STP root switch
  – Tune VLAN interface costs (or delay with EIGRP) for symmetric return traffic
• If you have single links to Core or WAN, can use tracking of that link with pre-empt
• Consider load distribution
  – Two VLAN’s per IDF
  – Odds / evens for root switch
(Diagram labels: Root for odd VLAN’s; Blocking state)
Slide 11

Don’t Forget the Servers
• Having redundant switching doesn’t do much good if a single switch failure takes out key servers (DNS, DHCP, ecommerce, etc.)
• Use dual-attached servers
  – One IP address or two?
  – Depends on NIC support, application needs
  – Test failover!
  – Re-test failover after making changes
Slide 12

Servers: 2 or 3 NIC’s?
• Can consider additional NIC for servers
  – If server NIC performance, queues and drops are of concern
  – Possible issue here: massive backup, network storage appliance, or SAN data flows across the network
  – 3rd NIC allows separate address, separate subnet for this
  – Routing and route hiding can direct large flows onto dedicated links or NIC’s if desired
  – Can isolate client-server traffic and NIC from the large flows (or management traffic)
  – Alternative: QoS for network, shared NIC use for server
Slide 13

Cat 6500 and RPR
• Choices for Supervisor redundancy
  – RPR: stateless failover, takes time to resume switching
  – RPR+: stateful failover of Sup and L2
  – CatOS may be much faster than Native IOS
• Choices for MSFC redundancy
  – Dual router mode: more routers, complexity
  – Single router mode (SRM redundancy): simpler
  – Single router mode preferred with FlexWAN module
    • Inactive MSFC doesn’t “see” the FlexWAN interface
    • Easy to lose configuration of FlexWAN interface
Slide 14

Cat 4507 Versus Cat 6500
• Availability of NAM, IDS, firewall blades for chassis
• FlexWAN support
• Degree and speed of failover (RPR vs RPR+)
• Port density
• Gig/10 Gig performance
• Possibly feature differences, would need to check details…
Slide 15

Jumbo Frames
• K Tolly has been agitating for non-standard Jumbo frames for higher speed media
  – They provide better performance on boxes with CPU or NIC processing limitations, but smarter NIC’s and drivers alleviate the need
• Design / support issues
  – Need to implement and configure jumbos consistently
  – Need equipment that supports them
• Conclusions
  – Jumbos may be limiting on hardware purchases
  – A real factor in MTTR relating to SAN or high-speed outages – not a good place to add complexity
Slide 16

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 17

Auto-Negotiate
• Auto-negotiate or lock down settings?
  – Certain NIC cards do not interoperate with autonegotiation in Catalysts
    • This may affect other ports
  – Can hard-code settings on key FastEthernet ports
    • Servers, routers
    • Do set both ends
  – Generally, let client PC’s negotiate?
• Use high port error counts to track down situations where one end is hard-coded and the other is trying to negotiate
  – Symptom: failed negotiations (one end hard-coded) end up as 100 half-duplex
Slide 18

Overall L2 Wisdom
• Control L2 and Spanning Tree Protocol (STP)
  – Known root switch, known preferred link, known blocking port
• Don’t mess with the STP timers
  – Can cause instability; our recommended design uses UplinkFast for very fast failover anyway
• Do keep STP scale small: “spanning tree weirdness”
• Don’t disable STP: cabling and other accidents WILL happen
• Don’t overdo L2 redundancy
  – It can hurt stability and convergence time
Slide 19

Trunking Scheme
• Use 802.1q for trunking
  – Interoperability, standard, more future support
  – Set Cisco switches to “desirable”, at both ends
  – This allows possible connectivity if one end won’t trunk for some reason
  – Can make trunk native VLAN the management VLAN for same reason
• Trunking provides CoS bits for QoS to use
  – Useful with lower-end switches, not needed for L3-capable switches
• Strongly consider breaking up VLAN’s
  – Manually prune VLAN’s on trunks where not needed
  – This controls impact of rogue switch, prevents large Spanning Tree instability
  – Use this to break up VLAN size
  – Use some other VLAN for management of switches
Slide 20

QoS and the Campus
• Throwing bandwidth at the problem is relatively easy in a building or campus
• This may not eliminate drops
• QoS is crucial to support AVVID applications
• QoS is crucial where there are potentially large-bandwidth applications on campus
  – Video on demand
  – Network backup
  – SAN
• This is a separate seminar talk!
Slide 33

IP Multicast Support
• Make sure unicast routing is consistent
• Use sparse mode
  – You don’t want to see Dense mode floods
• Use IGMP snooping
• Use AutoRP
  – Can provide redundancy
  – Good way to manage RP selection
• Know your application traffic
  – We’ve been seeing some interesting behavior when there are many multicast sources
  – Consider bidirectional PIM and Source Specific Multicast (SSM) where appropriate
Slide 34

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 35

Wireless Campus Design Issues
• Isolate WLAN’s outside a firewall as untrusted networks
  – Security!
• Subnetting and wireless mobility scheme
• Authentication (WLAN, network)
  – Security!
• Data confidentiality across WLAN
  – Security!
Slide 36

Isolating WLAN’s Outside a Firewall
• WLAN: anybody (in principle) can access it
  – Must therefore be untrusted, “outside” network(s)
• Choices:
  – Separate infrastructure cabling WAP’s to firewall
    • Costly!
  – Separate isolation VLAN (building-sized?) with all WAP’s in it, force traffic from VLAN through firewall
    • # users, scale of Spanning Tree?
  – Multiple isolation VLAN’s
    • Per-floor? 3-D signal propagation?
    • Mobility support by vendor?
Slide 37

Subnetting and Mobility
• Mobility depends on WAP vendor
• No mobility support → all WAP’s must be in same VLAN & subnet
  – One VLAN spanning the building or campus??
  – Number of users on that one VLAN??
• Cisco proxy Mobile IP client in WAP
  – Standard approach to mobility
  – Scales well
• Some WLAN “switches” provide mobility for clients on the WAP’s they control
  – They act as crypto and mobility gateways
Slide 38

WLAN and Network Authentication
• Securing access to WAP
  – WEP isn’t even close to secure, don’t rely on it
  – PEAP and LEAP versions of 802.1x contending for standard status, WPA alternatives, appear adequate for controlling WAP access
  – DoS preventing association may still be possible with WPA
  – Availability and interoperability issues possible in short term with multi-vendor approaches
• Alternative/workaround: IPSec IKE as authentication
  – Using IPSec/IKE as sole authentication means hackers can use WAP, although not to access switches or network
Slide 39

WLAN Confidentiality
• Current choices:
  – Eventually 802.11i will provide a standard using AES
    • But don’t hold your breath! Crypto chip support needed
  – WEP with frequent rekeying (Cisco TKIP) does work
  – WPA is current Wi-Fi scheme, uses TKIP plus other measures to render adequate confidentiality
  – IPSec
• Advantage to IPSec
  – IT team may already support it for mobile, remote workers
  – One method of access for user, one type of client, no matter where
  – Simplifies firewall rules connecting WLAN’s to rest of network
  – But does leave WAP access somewhat open
  – Hence need to “harden” switches on path from WAP to IPSec concentrator
Slide 40

Other WLAN Security
• Each PC on WLAN is in principle exposed as if connected directly to Internet
  – 802.11 does allow for PC-PC communications without a WAP, unless you disable this
• Need personal firewall software on PC!!!
  – Need safeguards to ensure personal firewall in use when user on WLAN
  – Virus scanning?
Slide 41

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 42

Design for Manageability
• Keep VLAN’s small, known root switch
• Tie VLAN’s and addressing to location
• Use templates and cookie-cutter approach for absolute configuration consistency
• Do configuration QA using another means
  – show commands versus text comparison
• Do QA testing
  – This often runs into deployment deadlines
Slide 43

Configuring for Manageability
• See http://www.netcraftsmen.net/welcher/papers/snmptemplate.html
• See http://www.netcraftsmen.net/welcher/papers/index.htm for other articles about manageability
Slide 44

Pay Attention to…
• Boot up testing
  – Can turn on full testing
  – Affects reboot and possibly failover time
• Syslog messages to CiscoWorks server
  – But not to console
• Send SNMP traps to HP Openview or CW DFM
• NTP
• CDP
• Manage the important ports!
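
The text-comparison style of configuration QA mentioned on the Design for Manageability slide can be sketched in a few lines of Python. This is only an illustration, not a tool from the presentation: the function name `missing_template_lines`, the template lines, the addresses, and the sample device configuration are all hypothetical, and the check simply reports template lines that do not appear anywhere in a saved configuration.

```python
def missing_template_lines(template: str, config: str) -> list[str]:
    """Return template lines that do not appear in the device config.

    Comparison is per line, ignoring leading/trailing whitespace.
    Blank lines and lines starting with '!' (comments) are skipped.
    """
    present = {line.strip() for line in config.splitlines()}
    missing = []
    for line in template.splitlines():
        wanted = line.strip()
        if not wanted or wanted.startswith("!"):
            continue
        if wanted not in present:
            missing.append(wanted)
    return missing


# Hypothetical template and saved config, for illustration only.
TEMPLATE = """
! management baseline
service timestamps log datetime
logging 10.1.1.10
ntp server 10.1.1.20
"""

DEVICE_CONFIG = """
hostname idf-sw-01
service timestamps log datetime
ntp server 10.1.1.20
"""

if __name__ == "__main__":
    for line in missing_template_lines(TEMPLATE, DEVICE_CONFIG):
        print("missing:", line)
```

A check like this only catches missing lines; it does not validate ordering or per-interface context, so it complements rather than replaces verification with show commands.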
Slide 45

AUX VLAN for IP Phones
• I see this as a management issue
• Separate VLAN and addressing for phones can make some tasks easier
  – Passing selected DHCP options
  – Implementing QoS
  – Troubleshooting
Slide 46

Network Management
• Cisco recommends:
  – Do something simple, and do it well
  – Reduce staff overload due to excessive data polling, collection, tools, and manual analysis
    • Find out what is working well on the network and leave it alone
    • Concentrate on what is not working
Slide 47

Network Management Toolset
• HP Openview as an NMS
  – Primarily for Fault Management
  – Real-time graphs
• CiscoWorks
  – Resource Manager Essentials for Configuration, syslog, inventory, and software manager
  – Campus Manager for VLAN and user management
  – Device Fault Manager for first-line performance management
• Low-Cost Performance Management
  – Cricket (freeware: Linux/UNIX or NT*)
  – RTG (Linux/UNIX freeware), plus web forms & CGI scripting
  – SolarWinds Orion
  – What’s Up Gold
Slide 48

Configuration Essentials
• Passwords (telnet, enable)
• Permit filter for telnet or SSH access
  – Use SSH to prevent exposure of passwords?
• Use banner messages that do not give info away unnecessarily
• Consider TACACS+ or Radius
  – Who accessed the device, when
  – Audit trail
  – Privileges
  – Can use Cisco ACS on Windows
Slide 49

Secure Management Traffic
• SNMP: how to secure it?
  – SNMPv3??
  – Separate VLAN for management
• One alternative: out-of-band management
  – How cost-effective is this?
  – Separate LAN to devices for SNMP and telnet access: costly
  – Modems on remote AUX ports and/or terminal server (+ modem?) connected to consoles: good back door if you lose network connectivity
Slide 50

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 51

Keys to High Availability (HA)
• Redundancy
  – Use first where needed most
  – One bulletproof chassis versus two chassis?
• HA Costs
  – Redundancy
  – Spare parts onsite
  – Tools & test gear
  – Skills
  – Training
  – Staff head-count
Slide 52

Achieving High Availability
• MTBF & MTTR are both factors
• Increasing MTBF
  – Redundancy
  – Fast failover & protocols
    • But test them
  – Emergency Power Off button guard cover
  – Good security practices!
• Lower MTTR
  – Tools
  – Techniques
  – Staff skills, training
  – Solid simple design
  – Good current maps!
  – Port descriptions!
  – Spare parts
  – Building, room access (key versus guard travel time)
Slide 53

Human Procedures
• Avoid being the cause of an outage
  – Configuration change control
  – Image change control and procedures
  – Testing prior to changes
  – Operations control (tight but not too inflexible)
• Noticing failures which do not cause outages
• Understanding what’s going on
  – Training
  – Time to observe traffic flows and protocols
• Application profiles
  – Knowing what matters
  – Knowing who to contact
Slide 54

Diversity
• There are many aspects to diversity:
  – Facilities
    • Cabling
    • Power
    • Heating/cooling
    • Water (pipes, pumps, drainage…)
    • Rack location (fire sprinkler heads?)
  – Geographic diversity
  – WAN
    • Dual links
    • Dual carriers (but is it really diverse?)
    • Diverse technologies
Slide 55

Managed HA
• Improving Operations impact on HA isn’t as easy as it looks
  – High costs
  – May require culture and job description changes
• Manage outage information
  – Outage frequency, severity, MTTR
  – Outage cause reports
• Net management
  – Training and procedures
  – Event management procedures, rulesets
  – Performance and Capacity management
• Service Level management
  – Target SLA’s
  – Continual improvement
  – The right incentives
Slide 56

Agenda
• Introduction and Motivation
• Campus Design Principles
• LAN Protocols – Further Comments
• Adding Wireless
• Managing the Campus Network
• High Availability
• Wrap-Up
Slide 57

See Also
• Gigabit Campus Network Design: Principles and Architecture
  http://www.cisco.com/en/US/customer/netsol/ns110/ns146/ns147/ns17/networking_solutions_white_paper09186a00800a3e16.shtml
• Gigabit Campus Design Configuration and Recovery Analysis
  http://www.cisco.com/en/US/customer/netsol/ns110/ns146/ns147/ns17/networking_solutions_white_paper09186a00800a3e0b.shtml
• Best Practices for Catalyst 4000, 5000, and 6000 Series Switch Configuration and Management
  http://www.cisco.com/en/US/customer/products/hw/switches/ps663/products_tech_note09186a0080094713.shtml
Slide 58

Summary
• Keep Campus Designs simple and redundant
• Use a cookie-cutter approach and configuration templates
• Use L3 to keep STP domains very small and limit the scope of any outages
• Management practices and procedures are an important part of achieving High Availability
Disclaimer: this presentation touches on most of the high-level issues, but it definitely does not cover all the details of campus design and IPT planning
Slide 59

THANK YOU !
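
As a footnote to the MTBF and MTTR bullets in the High Availability section, the usual steady-state relationship is availability = MTBF / (MTBF + MTTR). The sketch below is an illustration using that standard formula, not something taken from the slides; the function names and the example figures are assumptions chosen only to show why lowering MTTR matters as much as raising MTBF.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years ignored for simplicity


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of outage per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: same MTBF, MTTR cut from 4 hours to 2 hours.
    for mttr in (4.0, 2.0):
        a = availability(10_000.0, mttr)
        print(f"MTTR {mttr} h: availability {a:.5f}, "
              f"downtime {downtime_hours_per_year(a):.2f} h/year")
```

With MTBF held fixed, halving MTTR roughly halves the expected downtime, which is why maps, port descriptions, spares, and room access all appear on the MTTR list.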
Slide 60

A Word From Us …
• We can provide
  – Network design review: how to make what you have work better
  – Periodic strategic advice: what’s the next step for your network or staff
  – Network management tools & procedures advice: what’s right for you
  – Implementation guidance (your staff does the details) or full implementation
• We do
  – Small- and Large-Scale Routing and Switching (design, health check, etc.)
  – IPsec VPN and V3PN (design and implementation)
  – QoS (strategy, design and implementation)
  – IP Telephony (preparedness survey, design, and implementation)
  – Call Manager deployment
  – Security
  – Network Management (design, installation, tuning, tech transfer, etc.)
Slide 61

Cisco Certifications
Chesapeake Netcraftsmen is certified by Cisco in:
• IP Telephony
• Network Management
• Wireless
• Security
• (Routing and Switching)
Slide 62

... port-by-port is painful!
Slide 21

EtherChannel
• EtherChannel is a great way to scale up bandwidth
  – We have seen odd low frequency failures, say or out of 100 channels, once every months or so
  –