Basic Steps in Disaster-Recovery Planning

The disaster-recovery planning process may vary from organization to organization, but the basic steps that must be performed in all cases are listed below:

Establish a planning committee. The top management of the organization must be involved in developing the disaster-recovery plan. Management should be responsible for coordinating the plan and ensuring its effectiveness within the organization. Adequate time and resources must be committed to developing an effective plan; these resources include both funding and the effort of all personnel involved. The planning committee should include representatives from all operational areas of the organization. This is essential because departments commonly maintain separate plans, and these plans must be coordinated. Failure to coordinate them can result in multiple demands on the same resource, incompatible strategies, time delays, and, in the worst case, failure to carry out the plan properly in an emergency.

Identify serious risks. The planning committee should carry out a risk and business-impact analysis that covers a range of possible disasters, including natural, technical, and human threats. Each operational area of the organization should be analyzed to determine the potential consequences and impact of several disaster scenarios. The risk-assessment process should also evaluate the safety of critical documents and vital records. Traditionally, fire has posed the greatest threat to organizations. Intentional human tampering, however, should also be considered. The plan should provide for the "worst case" scenario: the destruction of the main building. It is important to assess the impact and consequences of losing information and services. The planning committee should also analyze the costs of minimizing potential exposures.

Establish priorities.
Here, you should determine the most important considerations for processing and operations and carefully evaluate the critical requirements of each department. Determine the maximum amount of time that the department and the organization can operate without each critical system. Critical needs are defined as the procedures and equipment required to continue operations should a department, computer center, main facility, or a combination of these be destroyed or become inaccessible.

Determine recovery strategies. Here, you should consider all aspects of your organization's information system, including the following:
- Facilities
- Hardware
- Software
- Communications
- Data files
- Customer services
- User operations
- End-user systems
- Other processing operations

Assign a disaster team. Once this has been done, develop disaster-recognition and initial-reaction procedures. At a minimum, these must include the following:
- Initial-reaction procedures to a disaster report
- Notification procedures for police, fire, and medical care
- Notification procedures for management
- Procedures for mobilizing the disaster team
- Procedures for assessing the damage and recording critical events in logs for audit purposes

Take a complete inventory of all equipment and software. This is an essential part of any recovery plan. At a minimum, it should include the following:
- A listing of all equipment by type and model number. The list should include equipment such as mission-critical servers, mainframe computers, bridges, routers, and gateways.
- Name, address, and telephone number of the manufacturer/vendor
- Date of purchase and original cost
- Locations of third-party equipment suppliers
- Associated software packages, including all software required for the operation of mission-critical equipment
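As an illustration only, the equipment inventory items listed above could be kept as structured records rather than free-form notes. The following is a minimal Python sketch; the class names, field names, and sample values are all hypothetical and are not taken from any particular inventory tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Vendor:
    """Manufacturer or vendor contact details."""
    name: str
    address: str
    phone: str


@dataclass
class EquipmentRecord:
    """One entry of the equipment inventory (field names are assumptions)."""
    equipment_type: str          # e.g. "server", "router", "gateway"
    model_number: str
    vendor: Vendor
    purchase_date: date
    original_cost: float
    mission_critical: bool = False
    third_party_suppliers: List[str] = field(default_factory=list)
    required_software: List[str] = field(default_factory=list)


def mission_critical_items(inventory: List[EquipmentRecord]) -> List[EquipmentRecord]:
    """Return only the records flagged as mission-critical."""
    return [item for item in inventory if item.mission_critical]


# Hypothetical sample data, for illustration only.
example_vendor = Vendor("Example Corp", "1 Example Street", "555-0100")
inventory = [
    EquipmentRecord("server", "SRV-1000", example_vendor, date(2003, 5, 1), 12000.0,
                    mission_critical=True, required_software=["database engine"]),
    EquipmentRecord("router", "RT-200", example_vendor, date(2002, 11, 15), 1800.0),
]

critical = mission_critical_items(inventory)
print([item.model_number for item in critical])
```

Keeping the mission-critical flag on each record makes it easy to produce the short list of equipment that the recovery team has to replace first.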
The software inventory must include the following information:
- The purpose of the software
- Date of acquisition
- License and version number
- Original cost
- Address and telephone number of the vendor
- Names, addresses, and phone numbers of service and technical-support centers, etc.

Develop recovery procedures. You should take the following aspects into consideration:
- Procedures for ensuring and maintaining physical security
- Coordination of restoration for the original site
- Restoration of electronic equipment
- Reloading of software
- Restoration of power, UPS, and common building systems
- Replacement of fire-suppression systems
- Rewiring of the building
- Restoring the LAN
- Restoring the WAN connections

Document the plan. Try to make the plan easy to understand for any technical person or other co-worker who might be called upon to help execute the plan or support recovery efforts. Whenever possible, illustrate the plan with diagrams. A comprehensive recovery plan generally includes the following information:
- Emergency call lists for management and recovery teams
- Vendor call-out and escalation lists
- Inventory and report forms
- Carrier call-out and escalation lists
- Maintenance forms
- Hardware lists and serial numbers
- Software lists and license numbers
- Team-member duties and responsibilities
- Network schematic diagrams
- Equipment-room floor-grid diagrams
- Contract and maintenance agreements
- Special operating instructions for sensitive equipment
- Cellular telephone inventory and agreements
- Miscellaneous

Present the plan to all staff and train employees.

Test the plan and review it with all employees. If necessary, re-evaluate and re-document your plan after having done this.

Understand that disaster-recovery planning is not a short-term project. On the contrary, it is a very complex, labor-intensive, and time-consuming process. Nor is it a project that you can forget about after it has been set up and approved.
An efficient recovery plan is one that works, so it must be kept current. To ensure this, it must be revised, tested, and practiced on a regular basis. In other words, your disaster-recovery plan must be a living plan! The basic steps of a typical disaster-recovery plan are outlined in Fig. 2.1.

Figure 2.1: The basic steps of a typical disaster-recovery plan

Emerging Technologies for Disaster-Recovery Solutions

In traditional disaster-recovery planning, a complete daily backup to tape is the key feature. After backup, tapes are usually shipped to a safe site. Theoretically, when a disaster strikes, these backup copies are shipped to an alternate site, where IT specialists perform the recovery, and the business can be up and running again. According to statistical data, more than 75% of all companies in the United States rely on this technique. This is understandable, since tape backup is a traditional, well-tested, and workable technology. More importantly, it involves relatively low costs.

However, this approach also has its drawbacks. First, it carries a recovery period of at least 24 hours, which in practice means 24 to 48 hours of downtime (for example, one day to ship the tapes to the alternate site and another day to restore, troubleshoot, and actually get the system working again). If your company can afford 48 hours of downtime, then this traditional approach is appropriate. However, many companies can't afford even one hour of downtime, and the number of such companies is growing constantly (e-businesses, for example).

To meet these needs, new technologies have been developed, such as electronic vaulting and mirroring. With electronic vaulting, data backup is performed over the network to a remote site. Some companies use vaulting because they find it to be a more convenient, reliable, and automated way to do nightly backups. Some companies use the computer at the vaulting site (the "catcher") as a temporary replacement for the down server.
In this case, performance might suffer, but service won't be totally interrupted. However, electronic vaulting costs at least twice as much as a traditional tape backup and, furthermore, can't on its own help you achieve a recovery window of less than one hour.

To achieve even shorter recovery times, it is necessary to mirror data to an identical system dedicated to performing the mirroring function. A probe installed on the protected server continuously sends "OK" messages to the mirroring server. When the probe ceases to send these messages (or sends an emergency request), the mirroring machine steps in. Theoretically, this scheme provides instantaneous recovery; even in a large-scale emergency, where communication or Internet services may be affected, it is possible to bring the mirroring system up within an hour. Banks, stock exchanges, and e-commerce companies often employ this scheme. Most other companies can't afford this technology, but current trends indicate that its use will grow significantly with the growth of e-commerce.

Disaster-Recovery Services Market

As has already been mentioned, disaster-recovery planning is both crucial and very complicated. Therefore, along with performing the basic steps in your disaster-recovery plan and making technical decisions, you have to decide whether you are going to implement the plan in-house or use the services of a specialized disaster-recovery firm. The main advantages of in-house disaster-recovery planning are obvious: better control and lower costs. However, despite these advantages, there are also drawbacks:
- Proper disaster-recovery planning is much easier said than done. It really is difficult and consumes a lot of employee time.
- Money-saving techniques always involve certain compromises (such as performance degradation, for example).
Because of these drawbacks, if you decide to implement the disaster-recovery plan in-house, it is recommended that you consider retaining a consulting firm to help you define your needs and develop proper procedures. After all, if you suffer a data loss as a result of a disaster, you don't want any additional surprises.

In today's business environment, more and more organizations, especially large ones, are opting to use specialized companies that provide disaster-recovery services. The largest and best-known of these service providers, accounting for a 90% share of the entire market, are the following:
- Comdisco Continuity Services (http://www.comdisco.com)
- SunGard Recovery Services (http://www.sungard.com)
- IBM Global Services Business Continuity and Recovery Services

If your company is a large one, it is clearly better to contact one of these three large service providers, since smaller disaster-recovery service providers are unlikely to fulfill your needs. In any case, a large disaster-recovery service provider is preferable for the simple reason that it has more resources at its disposal. For example, consider how many other companies may be experiencing a disaster at the same time as you. Smaller disaster-recovery companies could be paralyzed by requests for help in such a situation.

Over the long haul, however, the industry seems prepared to meet the needs of enterprises looking to strengthen their disaster-recovery and storage options. One factor in this equation is the Storage Service Provider (SSP) market, which emerged on the basis of a "pay-as-you-go" storage-utility model. After anemic market uptake, many SSPs have recently reinvented themselves and begun to provide managed storage services for companies that are looking for someone to help them rein in their own runaway storage resources.
Some SSPs, such as StorageNetworks (www.storagenetworks.com) and StorageWay (www.storageway.com), have entered into symbiotic relationships with telecommunication companies and other service providers (including Qwest, Yipes, and BellSouth, to name just a few). In some cases, they are using a service provider's infrastructure to deliver services to their own customers. This time around, the SSPs are primarily supplying the service providers with the software and know-how necessary to deliver managed storage, backup, and recovery services to their customers.

If you didn't have a disaster-recovery plan before now, the need to set one up should now be obvious. The lack of a disaster-recovery plan is, simply put, dangerous, so you can't afford to wait until the last minute to set one up. Always remember that surviving a disaster depends largely on comprehensive planning, selecting the appropriate products, documenting procedures, and constantly updating your disaster-recovery plan. These elements are essential for any IT firm or organization.

Data Backup as Part of Disaster-Recovery Plan

Surprisingly enough, most people only appreciate the importance of backup after a massive data loss occurs or important files get corrupted or accidentally deleted. Even among the most talented and qualified administrators, not all can say that they have a good set of backups at hand. The situation to avoid is one where a recent backup turns out to be unavailable at the moment it is required. Your backup plan must be an integral part of the disaster-recovery plan and, as such, requires serious testing. General recommendations for creating a solid backup plan are as follows:
- Make sure that your backup hardware is adequate for the job and provides sufficient backup capacity. Don't forget to consider how much time the backup procedure takes to complete.
If you have limited time, consider purchasing faster hardware or implementing a parallel backup scheme, which can significantly reduce the backup time, provided that it is properly configured.
- Be aware of open files. Although the backup software supplied with Windows XP and Windows Server 2003 handles open files gracefully thanks to the volume-snapshot technology first introduced with Windows XP, all of its predecessors simply skip open files during the backup process. Where necessary, purchase backup agents or modules to back up databases, mail servers, and applications in which open files are continuously maintained. For example, it is strongly recommended that you purchase such agents for Microsoft Exchange, SQL Server, and Oracle.
- Assign a person to be responsible for checking event logs to be certain that backup procedures are being carried out according to the plan.
- Keep your backup tapes as fresh as possible and store them properly. Also, be certain that tape devices are kept clean and are located in a clean environment. Since tapes are relatively fragile, test them periodically by doing a restore from each tape. Also, have multiple copies of tapes at hand and be certain that they are usable. It is also recommended that you consider copying data to more stable media, such as magneto-optical (MO) disks or CD-RW.
- If your users prefer not to store their files on the server, make them responsible for backing up their own hard drives.

Note
In any case, a solid backup system must be considered a key component of any network, server, or even critical workstation. Do your best to avoid situations where a user, asked to provide a backup copy, ends up answering: "What? A backup copy? What is a backup copy?"

For many organizations, particularly smaller ones, the concept of fault tolerance extends only as far as doing a nightly tape backup.
In many cases, the reason cited for using this measure alone is the lack of available funds to do more, but perhaps more common is a simple lack of appreciation of the impact a downed server can have. Tape backups provide an insurance policy against one thing: data loss. They do not, however, protect against downtime and, quite often, constitute the slowest link in the system-recovery process.

Now, we come to the most interesting point: the role of registry backup and recovery in your plan for regular data backup. The saddest fact about this is that most users still don't realize how fast and easy the restoration process can be if, instead of reinstalling the OS and performing full system recovery, you simply restore the damaged configuration using one of the methods described in this chapter.
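Returning to the earlier recommendation to check how long a backup takes: the estimate is simple arithmetic, and a rough sketch of it in Python follows. The function names, the data volume, and the throughput figures are hypothetical, and the code assumes that parallel streams scale perfectly, which real hardware rarely achieves, so treat the result as an optimistic lower bound on backup time.

```python
def backup_hours(data_gb: float, throughput_gb_per_hour: float,
                 parallel_streams: int = 1) -> float:
    """Estimate backup duration in hours.

    Assumes ideal linear scaling across parallel streams (an assumption,
    not a measured property of any particular backup product).
    """
    if data_gb < 0 or throughput_gb_per_hour <= 0 or parallel_streams < 1:
        raise ValueError("invalid backup parameters")
    return data_gb / (throughput_gb_per_hour * parallel_streams)


def fits_window(data_gb: float, throughput_gb_per_hour: float,
                window_hours: float, parallel_streams: int = 1) -> bool:
    """Check whether the estimated backup fits into the available window."""
    return backup_hours(data_gb, throughput_gb_per_hour,
                        parallel_streams) <= window_hours


# Hypothetical figures: 400 GB of data, 40 GB/hour per drive, 8-hour night window.
single = backup_hours(400, 40)                      # one drive
dual = backup_hours(400, 40, parallel_streams=2)    # two drives in parallel
print(single, fits_window(400, 40, window_hours=8))
print(dual, fits_window(400, 40, window_hours=8, parallel_streams=2))
```

With these sample figures, a single drive overruns the nightly window while two parallel streams fit, which is the kind of check worth repeating whenever the data volume grows.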