High Availability with Cisco Active Network Abstraction


White Paper

Executive Summary

It is axiomatic that in a service provider network, downtime equals revenue loss. Service providers need high-availability capabilities throughout their networks that keep services running while other systems isolate faults and assist with root-cause identification to facilitate rapid problem resolution. These activities require network management systems that are themselves highly available.

The high-availability characteristics of the Cisco® Active Network Abstraction (ANA) system are based on a distributed software architecture with internal process monitoring and interunit active-passive clustering and failover. The high-availability solution scales in capacity and performance with the Cisco ANA system, with no performance bottleneck or single point of failure.

The high-availability solutions presented herein address local-level high availability for continuous day-to-day operations. Enabling the network management system for disaster recovery is beyond the scope of this discussion.

High-Availability Service Requirements

To help operators keep network services alive, network management systems must themselves be highly available. High availability begins with hardware and link redundancies that should be designed into the primary system deployment. The network management platform should include the following capabilities to assure system availability during faults:

  • Fault isolation: As much as possible, a fault should not impair performance of other processes in the system or lead to cascading failures that disable the entire network management system.

  • Automatic process restart: Each self-monitoring unit should be able to identify and restart processes as needed, with the ability to escalate response and generate alarms if a configured number of process restarts fails to resolve the fault.

  • Automatic failover to clustered standby units: The system should be able to detect failed hardware units and initiate automatic failover to standby units without human intervention.

  • Event and alarm reporting: Real-time alerts should allow operators to respond to events and alarms as they occur, and a centrally located database should allow them to review event histories.

Cisco ANA Software Architecture

Cisco ANA software represents the Cisco vision of an entirely new management architecture that facilitates end-to-end, service-level management in very large multivendor, multitechnology, multiservice networks. This solution manages converged, multiservice infrastructures based on the Cisco IP NGN architecture. It is based upon a virtualized network model that creates service-level views, facilitating rapid integration of existing provisioning, fault management, and billing systems. This customizable, integrated management platform simplifies service provisioning, configuration, monitoring, and troubleshooting processes to reduce operational expenditures, shorten time to revenue, and increase customer satisfaction.

Cisco ANA software is a sophisticated management fabric that intercedes between the physical network infrastructure and the operators and OSS/BSS systems that manage it (Figure 1). As the central mediation and normalization platform for the entire network management system of very large networks, Cisco ANA services must be available at all times. With no single point of failure, the distributed software architecture inherently helps ensure continuous availability of assurance and fulfillment functionality by automatically detecting and recovering from a wide range of hardware and software failures. The distributed architecture also confines the impact radius of single faults to prevent a "domino" effect that can lead to catastrophic system failure.

Figure 1. Cisco ANA Architecture

The Cisco ANA architecture includes the following components:

  • Cisco ANA Gateway: The appliance through which all Cisco ANA clients and OSS/BSS applications access the Cisco ANA system. It enforces access control and security for all connections and manages client sessions. It maps network resources to their business context. This capability allows Cisco ANA to contain information that is not directly housed in the network (such as VPNs and subscribers) and display it to northbound OSS/BSS applications. It maintains a repository for system settings, topological data, and snapshots of active alarms and events.

  • Cisco ANA Units: Unit software is loaded onto a third-party server. Units host autonomous virtual network elements (VNEs), each of which corresponds to a real network element. Multiple Cisco ANA Units are interconnected to form a fabric of VNEs, which can intercommunicate with one another. Depending upon server system size and VNE type, each Unit can host thousands of autonomous VNE processes. Multiple Units allow administrators to optimize VNE distribution, affording the option to provision geographic proximity between each VNE and its managed network element. VNEs continuously resynchronize themselves with their associated devices to help ensure that no data corruption occurs during the failover cycle of a Unit.

  • Cisco ANA clients: The comprehensive suite of GUI-based applications for managing the network using the Cisco ANA platform. There are three types of Cisco ANA client:

    • Cisco ANA NetworkVision: The primary GUI application for Cisco ANA, used to visualize every management function that the system supports.

    • Cisco ANA EventVision: The tool for viewing historical events detected by the Cisco ANA system.

    • Cisco ANA Manage: The system administration and configuration tool for the entire Cisco ANA platform.

High Availability in Cisco ANA Systems

The Cisco ANA system supports two levels of availability:

  • Software-level, per Unit availability: Internal processes in each Unit and the Gateway.

  • System-level availability: N+m standby Unit clustering and a 1+1 standby Gateway server.

Software-Level, Per Unit Availability

Each Unit executes several processes: one control process and several agent virtual machine (AVM) processes that execute VNEs. Each process within the Unit is completely independent, a crucial software design concept that prevents the failure of a single process from affecting other processes on the same Unit. Even if the high-availability capabilities of a Cisco ANA Unit are not configured, any failure of a VNE, AVM, or Unit has only a local impact, and the rest of the Cisco ANA system continues normal operations.

The control process in a Cisco ANA Unit is called AVM-99, a software module responsible for monitoring the health and availability of all processes in the Unit. Using a watchdog protocol, the AVM-99 module pings other AVMs in the Unit to verify their status. All watchdog parameters in the AVM-99 module are configurable by the operator.

The watchdog requires continual handshakes with every AVM process in the Unit. The AVM-99 module automatically stops and reloads any process that fails to complete a handshake, simultaneously reporting the reload as an alarm to the management console. This local reload is very fast, with minimal downtime. The process can use previously cached information to prevent data loss during reload.

The AVM-99 module implements run-time adaptation and escalation. If a process crashes more than a configurable number of times within a defined time period, the AVM-99 module suspends the process and sends an alarm to the Cisco ANA EventVision client, where the operator can take further steps to resolve the problem.

System-Level Availability

A flexible clustering architecture for both the Units and the Gateway enables system-level availability services for the entire Cisco ANA system. This architecture assumes that the system is deployed with standard network redundancy interconnecting the nodes with dual pathways and appropriate security policies.

N+m Standby Unit Clustering

Administrators can configure the Cisco ANA system to automatically initiate a failover to a standby Unit for a number of reasons, such as Unit hardware failures, operating system failures, power failures, network failures, or a timed-out AVM reload process.

The flexible Cisco ANA architecture allows clustering of one or more Units with a standby server Unit. This flexibility allows the service provider to balance budget with operational risks. The value of m may be greater than 1. For example, a service provider may determine that Cisco ANA Units that host VNEs for mission-critical core devices require a 1+1 cluster, while Units hosting less critical VNEs may be adequately served using a 5+1 or 12+2 cluster configuration. In highly distributed Cisco ANA environments, Cisco recommends clustering Units according to geography.

The administrator can define one or more protection groups that correspond to clusters of active Units and a standby Unit. Should any active Unit in a protection group fail, the Gateway initiates a failover to the standby Unit in that protection group (Figure 2).

Figure 2. Cisco ANA Protection Group

Running a Protection Manager process, the Cisco ANA Gateway monitors the system health of each Unit associated with it. The Protection Manager monitors the entire Cisco ANA system, much as the AVM-99 module operates within a single Unit. When the Protection Manager detects a malfunctioning Unit, it automatically signals the standby server in the cluster to load the configuration of the faulty Unit. Switchover incurs no risk of information loss and requires no persistent storage synchronization.

The Cisco ANA Gateway server includes a Goldensource directory that identifies every Unit attached to it by a unique AVM number. When the standby Unit comes online, it identifies itself to the Gateway, which downloads the unique AVM configurations of the failed Unit. The Unit then polls the network to reload its VNEs, a process that typically takes about half an hour. If they are still running, VNEs on the faulty Unit switch into maintenance mode to prevent double polling of the network from both the failed Unit and the new active Unit. All events are recorded in the Cisco ANA EventVision system log, which enables operators to initiate repair or replacement of the failed Unit.

Standby Gateway Server

Because a single Gateway server governs access to the entire Cisco ANA system, high availability of the Gateway is essential. Cisco ANA version 3.5.2 uses an active-passive 1+1 high-availability architecture, where a standby server assumes control of the Cisco ANA system in case of primary Gateway failure. The physical configuration of the high-availability Gateway environment is shown in Figure 3, with dual Ethernet connections between primary and secondary Gateway servers and associated storage units.

Figure 3. Cisco ANA Gateway Server High-Availability Configuration

For another layer of protection, Cisco ANA AVM-100 allows sharing an IP address between active and standby Cisco ANA Gateway servers, enabling seamless communication with associated Units when a failover occurs.

Achieving High Availability in Cisco ANA

When the network management system remains available, operators can do their jobs more effectively as they strive to eliminate downtime and revenue loss for the service provider. As with any technology system, achieving high availability for a Cisco ANA system requires a synergy among redundant links and hardware, properly configured monitoring and failover processes, and the operators managing the system. As the worldwide leader in networking for the Internet, Cisco remains committed to continually enhancing the high availability of Cisco ANA software, its premier service provider management system.

Printed in USA. All contents are Copyright © 1992–2007 Cisco Systems, Inc. All rights reserved. This document is Cisco Public Information. C11-405350-00 05/07
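The restart-and-escalate behavior described for the AVM-99 watchdog (reload a process that misses a handshake, then suspend it and raise an alarm once it has crashed more than a configured number of times within a defined period) can be illustrated with a short Python sketch. This is not Cisco code: the class name `ProcessSupervisor`, its parameters `max_restarts` and `window_seconds`, the process label "AVM-7", and the alarm strings are all assumptions made for illustration, and the sliding-window counting is one plausible reading of "within a defined time period".

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ProcessSupervisor:
    """Hypothetical per-process bookkeeping for a watchdog (not the AVM-99 API)."""

    name: str
    max_restarts: int = 3          # assumed threshold; operator-configurable in the paper
    window_seconds: float = 60.0   # assumed length of the "defined time period"
    suspended: bool = False
    crash_times: deque = field(default_factory=deque)
    alarms: list = field(default_factory=list)

    def on_missed_handshake(self, now: float) -> str:
        """Record a failed handshake at time `now` and return the action taken."""
        if self.suspended:
            # A suspended process waits for the operator; no further reloads.
            return "suspended"
        # Forget crashes that are older than the sliding window.
        while self.crash_times and now - self.crash_times[0] > self.window_seconds:
            self.crash_times.popleft()
        self.crash_times.append(now)
        if len(self.crash_times) > self.max_restarts:
            # Escalate: too many crashes inside the window.
            self.suspended = True
            self.alarms.append(f"{self.name}: suspended, operator action required")
            return "suspend"
        # Normal case: reload the process and report the reload as an alarm.
        self.alarms.append(f"{self.name}: reloaded")
        return "reload"


if __name__ == "__main__":
    supervisor = ProcessSupervisor("AVM-7", max_restarts=2, window_seconds=60.0)
    for t in (0.0, 10.0, 20.0, 30.0):
        print(t, supervisor.on_missed_handshake(t))
    print(supervisor.alarms)
```

With `max_restarts=2`, the first two missed handshakes inside the window lead to reloads, the third leads to suspension, and later calls report the process as already suspended. Crashes spaced further apart than the window never accumulate, which matches the idea that only repeated failures in a short period should be escalated to the operator.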

Date posted: 27/10/2019, 23:08

Table of Contents

  • High Availability with Cisco Active Network Abstraction

    • Executive Summary

    • Cisco ANA Software Architecture

    • High Availability in Cisco ANA Systems

    • Software-Level, Per Unit Availability

    • System-Level Availability

      • N+m Standby Unit Clustering

      • Achieving High Availability in Cisco ANA
