
O'Reilly Unix Backup and Recovery, part 7


The second commercial method of backing up Oracle is to use Oracle7's EBU or Oracle8's rman. EBU and rman are internal Oracle products designed to give a backup utility one or more streams of backup data from the database. The command that is run is called obackup or rman. After a one-time setup, the commercial backup software can communicate with Oracle at any time to initiate a backup. It tells Oracle that it wants to back up instance ORACLE_SID and that it is able to receive n threads of data. (See "Commercial Backup Utilities" in Chapter 5 for an explanation of how backup threads work.) EBU/rman then does all the internal communication needed to supply the backup utility with n threads of data. Both the utility and EBU/rman record the time of the backup for future reference.

After things have been set up, a DBA can also run the obackup or rman command from the command line. This command then calls the appropriate programs to connect with the backup utility. The commercial backup utility responds to this as it would to any other backup request, loading volumes as necessary.

Since EBU is no longer supported in Oracle8, we do not cover it here. Recovery Manager is supported in Oracle8 and has a number of advantages over EBU. One of the main advantages is that it understands the structure of the database much better. It can be told, for example, to restore a tablespace. It knows which files are in that tablespace and restores the most recent backup of those files. Once that is accomplished, it can be told to recover that tablespace, that is, to apply media recovery to it. This is far better than having to find out which files to restore. rman is too complex to be covered in detail in a chapter of this size; consult Oracle's Backup and Recovery Guide for an explanation of how rman works.
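As a rough illustration of the tablespace-level restore described above, the following shell sketch prints a small rman command sequence. The function name, the tablespace name users, and the channel name t1 are illustrative assumptions, not taken from the book, and the exact statements should be checked against Oracle's Backup and Recovery Guide for the release in use.

```sh
#!/bin/sh
# Hypothetical helper (not from the book): print an rman run block that
# takes one tablespace offline, restores and recovers it, and brings it
# back online. The tablespace name is passed as the first argument.
tablespace_restore_cmds() {
    ts=$1
    cat <<EOF
run {
  allocate channel t1 type 'sbt_tape';
  sql 'alter tablespace $ts offline immediate';
  restore tablespace $ts;
  recover tablespace $ts;
  sql 'alter tablespace $ts online';
  release channel t1;
}
EOF
}

# Example: tablespace_restore_cmds users > /var/tmp/restore_users.rman
```

The point of the sketch is that the DBA names the tablespace only; rman works out which datafiles belong to it.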
What I would like to include in this chapter, however, is what is not included in the documentation: how to use rman to completely automate the process of backing up all Oracle instances on a server. To completely automate such a process, you must start at the top, with the oratab file (the oratab file contains a list of all Oracle instances). A script should read the oratab file, then generate backup requests for rman based on that file. These backup requests can be used to back up both the databases and the archive logs. Such a script has to use rman command files as well, in order to give rman all the commands that it needs. I have used rman and have written such scripts (they are included here for example only). Unlike oraback.sh, these scripts have not been extensively tested on multiple platforms, but they are short, and their principles can be used to automate the backups of any Unix database server.

Sample rman scripts

The sample scripts are rman.sh and the database and archivelog rman command files. rman.sh is the "parent" script. It is called from cron with one required argument: the name of a database or archivelog command file. This tells rman.sh what it is supposed to do.

    $ rman.sh [ database.full.rman | database.inc.rman ]

If called in this manner, rman.sh tells rman to use the command file database.level.rman. This command file tells rman to back up the entire database and switch log files when it is done. The level of the backup is determined by which rman command file is called. (database.full does a level-0 backup, and database.inc does a level-1 backup.) If the SID_PARALLELISM parameter at the top of the script is set to a number higher than 1, it backs up multiple instances at one time.

    $ rman.sh [ archivelog.full.rman | archivelog.inc.rman ]

If called in this manner, rman.sh tells rman to use the command file archivelog.level.rman. This command file tells rman to back up all archive logs it finds but not to delete them when it is done.
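The oratab-driven approach described above can be sketched in isolation. This is a minimal sketch, assuming the usual SID:ORACLE_HOME:flag layout of oratab; the function names list_sids and plan_backups are hypothetical, and the second function only prints what would be run rather than calling rman.

```sh
#!/bin/sh
# list_sids ORATAB
# Print the instance name (first colon-separated field) of every entry,
# skipping comment lines, blank lines, and the "*" wildcard entry.
list_sids() {
    grep -v '^#' "$1" | awk -F: 'NF >= 2 && $1 != "" && $1 != "*" { print $1 }'
}

# plan_backups ORATAB CMDFILE
# Print one line per instance: the SID, the rman command file that would
# be used, and the log file name (same naming pattern as rman.sh).
plan_backups() {
    for sid in $(list_sids "$1"); do
        echo "$sid $2 rman.$sid.$2.log"
    done
}

# Example: plan_backups /var/opt/oracle/oratab database.full.rman
```

Keeping the oratab parsing in one small function makes it easy to test against a copy of the file before any backup is attempted.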
(rman has an option to delete the archive logs once they have been backed up, but I believe it is better to leave the files around for a few days before they are deleted.) Again, the level is determined by which command file is called.

The rman.sh Script

Here is the rman.sh script:

    #!/bin/sh
    #
    ########################################################
    ## Site-specific section (change as appropriate)

    PATH=/usr/bin:/usr/sbin:/usr/ucb:/oracle/app/oracle/product/8.0.4/bin:/oracle/opt/bin:/oracle/opt/rcs:/oracle/app/oracle/olap/olap/bin:/oracle/backupbin
    ORACLE_BASE=/oracle/app/oracle
    DEBUG=Y                    # Set this to "Y" to turn on set -x for all functions
    BINDIR=/oracle/backupbin   # Location of this and related programs
    ORACLE=oracle              # ID that script will run as
    DBAGROUP=dba               # GROUP that should own backup directory
    ORATAB=/var/opt/oracle/oratab
    ORACLE_HOME=`grep -v '^#' $ORATAB|awk -F':' '{print $2}'|tail -1`
    TMP=/var/tmp               # Where temporary and permanent logs are kept
    PATH=$PATH:/usr/bin:/usr/sbin:/sbin:/usr/5bin:/bin:$BINDIR
    GLOBAL_LOGIN_PASSWD=internal/manager
    RMAN_LOGIN_PASSWD=rman/rman
    RMAN_SID=admin
    ORIG_PATH=$PATH
    SID_PARALLELISM=2          # The number of instances to back up simultaneously.
    LOGDIR=/oracle/backupbin

    Preback() {                # Run prior to backup.
      [ "$DEBUG" = Y ] && set -x
    }

    Postback() {               # Run after entire backup finishes.
      [ "$DEBUG" = Y ] && set -x
    }

    export BINDIR ORATAB ORACONF TMP PATH ORIG_PATH

    ## End site-specific configuration section
    ########################################################

    Usage() {
      echo "Usage: $0: cmdfile
    (Substitute 'cmdfile' with an rman cmdfile script (located in $BINDIR)
    that will be run by $0. e.g. database.rman)"
      exit 1
    }

    [ "$DEBUG" = Y ] && set -x

    # Every instance listed in oratab, minus comments and the "*" entry
    ORACLE_SIDS=`grep -v '^#' $ORATAB|awk -F':' '{print $1}'|grep -v '\*'`

    [ $# -eq 1 ] || Usage

    CMDFILE=$1
    PSID=sodfwer98w7uo2krwer987wer   # Dummy value that matches no process

    for ORACLE_SID in $ORACLE_SIDS ; do

      CT=`ps -ef|grep -c "rman.target"`

      while [ $CT -gt $SID_PARALLELISM ] ; do
        # Give the last command a little time to get going and/or fail.
        sleep 15
        if [ `ps -ef|grep -c " $PSID "` -gt 1 ] ; then
          # If the command that we just backgrounded is now running, add it to the CT.
          CT=`ps -ef|grep -c "rman.target"`
          sleep 30
        else
          # If not, break out of this loop cause we'll be here forever.
          break
        fi
      done

      rman cmdfile "${BINDIR}/$CMDFILE" > $LOGDIR/rman.$ORACLE_SID.$CMDFILE.log 2>&1 &
      PSID=$!
    done

The database.full.rman command file (level-0 backup)

Here is the rman command file used to perform a level-0 backup:

    Run {
      target passwd@oracle_sid;
      rcvcat passwd@rman_sid;
      allocate channel t1 type 'sbt_tape';
      allocate channel t2 type 'sbt_tape';
      allocate channel t3 type 'sbt_tape';
      allocate channel t4 type 'sbt_tape';
      backup incremental level 0
        format 'backup_test_%t_%s_%p'
        database ;
      sql 'alter system archive log current' ;}

The archivelog.full.rman command file (level-0 archive logs)

Here is the rman command file used to back up all level-0 archive logs:

    Run {
      target passwd@oracle_sid;
      rcvcat passwd@rman_sid;
      allocate channel t1 type 'sbt_tape';
      allocate channel t2 type 'sbt_tape';
      allocate channel t3 type 'sbt_tape';
      allocate channel t4 type 'sbt_tape';
      backup incremental level 0
        format 'backup_test_%t_%s_%p'
        archivelog all ;
      sql 'alter system archive log current' ;}

The database.inc.rman command file (level-1 backups)

Here is the rman command file used to perform a level-1 backup:

    Run {
      target passwd@oracle_sid;
      rcvcat passwd@rman_sid;
      allocate channel t1 type 'sbt_tape';
      allocate channel t2 type 'sbt_tape';
      allocate channel t3 type 'sbt_tape';
      allocate channel t4 type 'sbt_tape';
      backup incremental level 1
        format 'backup_test_%t_%s_%p'
        database ;
      sql 'alter system archive log current' ;}

The archivelog.inc.rman command file (level-1 archive logs)

Here is the rman command file used to back up all level-1 archive logs:

    Run {
      target passwd@oracle_sid;
      rcvcat passwd@rman_sid;
      allocate channel t1 type 'sbt_tape';
      allocate channel t2 type 'sbt_tape';
      allocate channel t3 type 'sbt_tape';
      allocate channel t4 type 'sbt_tape';
      backup incremental level 1
        format 'backup_test_%t_%s_%p'
        archivelog all ;
      sql 'alter system archive log current' ;}

Difficulties with rman

Oracle has come a long way since alter tablespace begin backup. rman is a powerful, flexible tool, but it is also a complex one with a large command set that must be learned in order to use it properly. (I wish they didn't make it so hard.) The default documentation also tells you to enter the rman password on the command line, which makes it available to anyone who can enter ps -ef. (The preceding scripts do not do this, but only because the passwords were entered manually into the command files.) The Oracle Enterprise Manager is designed to make rman and other Oracle products easy to use. A DBA learning rman for the first time would do well to experiment with this tool.

Managing the Archived Redologs

How common is the question, "Should I have archiving turned on?" Yes, yes, a thousand times yes! When in doubt, archive it out! Here's what is possible only if archiving is enabled:

• Recover up to the point of failure.
• Recover from a backup that is a month or more old, if all the archived redologs since then are available.
• Perform a complete backup of the database without even shutting it down.

The existence of archive logs does all this without adding significant overhead to the entire process. The only difference between having archiving on or off is whether or not Oracle copies the current redolog out to disk when it "switches" from one redolog to the next. That's because even with archiving off, it still logs every transaction in the online redologs. That means that the only overhead associated with archiving is the overhead of copying the online file to the archive location, which is why there may be only a 1-3 percent performance hit in an environment with many transactions, if there is one at all.
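To put the 1-3 percent figure in concrete terms, here is a small shell calculation. The 300-minute (five-hour) job length is only an illustrative input, and the function name is hypothetical; the arithmetic is integer and rounds down.

```sh
#!/bin/sh
# overhead_minutes MINUTES PERCENT
# Extra run time, in whole minutes, that a given percentage overhead
# adds to a job of the given length.
overhead_minutes() {
    echo $(( $1 * $2 / 100 ))
}

# Show the range quoted in the text for a 300-minute job.
for pct in 1 2 3; do
    echo "$pct percent of 300 minutes = $(overhead_minutes 300 "$pct") minutes"
done
```

For a 300-minute job this prints 3, 6, and 9 minutes for 1, 2, and 3 percent respectively.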
Feel free to experiment, but it is very difficult to justify turning off archiving on any production database.

Archiving Saves the Day

I know of one company that had a 250-GB database that did not use archiving at all. The biggest downside to this was that they could not do hot backups, and a cold backup took too long. The result was that they didn't do any backups! The DBAs didn't want to turn on archiving because they said that it would make the batch loads take too long. They also believed that having archiving turned on would somehow cause database corruption. This is just not possible. Again, the only difference between running and not running archiving is whether the old redolog is copied to the archive destination. The rest of the database works exactly the same.

I tried to convince them to turn on archiving. I even bet them that turning on archiving would not add more than a 3 percent overhead to their load times. In other words, a five-hour load would take only five hours and nine minutes. I lost the bet because it took five hours and ten minutes. The DBAs agreed to turn on archiving, and the database received its first backup in five years. Two weeks later that database lost five disks, believe it or not. We were able to recover the database overnight with no downtime to the users.

In my opinion, there are only two environments in which turning off archiving is acceptable. The first is an environment in which the data does not matter. What type of environment would that be? The only one is a true test environment that is using fake data or data restored from production volumes. No structure changes are being made to this database, and any changes made to the data will be discarded.
This database does not need archiving and probably doesn't even need to be backed up at all.* It should be mentioned, though, that if you're doing any type of benchmarking of a database that will go into production, backup and archiving should be running.** The test will be more realistic, even if all the archive logs are deleted as soon as they are made.

Development databases do not fall into this category. That's because, although the data in a development database may be unimportant, the structure of the database often is highly important. If archiving is off, a DBA cannot restore any development work that he has done since the last backup. That creates the opportunity to lose hours' or even days' worth of work, just so a development database can be 1-3 percent faster. That is a big risk for such a small gain.

The second type of database that doesn't need archive logs is a completely read-only database or a "partially read-only" database where an archive log restore would be slower than a reload of the original data. The emergence of the data warehouse has created this scenario. There are now some databases that have completely read-only tablespaces and never have data loaded into them. This type of database can be backed up once and then left alone until it changes again.

A partially read-only database is one that stays read-only for long periods of time and is updated by a batch process that runs nightly, weekly, or even as needed. The idea is that, instead of saving hundreds of redologs, the database would be restored from a backup that was taken before the load. The DBA then could redo the load. There are two choices in this scenario. The first is to turn off archiving, making sure that there is a good cold backup after each database load. If the load aborted or a disk crashed after the load but before the next backup, you could simply load the older backup and then redo the load.
The cold backup will cost some downtime, but having archiving off will speed up the loads somewhat. The other option is to turn on archiving. That allows taking a hot backup anytime and creates the option of using the redologs to reload the data instead of doing an actual data reload. This method allows for greater backup flexibility. However, depending on the database and the type of data, an archive log restore could take longer than a reload of the original data, especially if it is a multithreaded load. It is a tradeoff of performance for recoverability. Test both ways to see which one works best for you.

* Did I just say that?
** I say this because I remember being told to turn off archiving and not run backups because the DBAs were running a "load test" to see how well the database would perform. I always argued that such a test was worthless, since it did not test the database under real conditions.

Recovering Oracle

Since an Oracle database consists of several interrelated parts, recovering such a database is done through a process of elimination. Identify which pieces work, then recover the pieces that don't work. The following recovery guide follows that logic and works regardless of the chosen backup method. It consists of a flowchart (Figure 15-1) and a procedure whose numbered steps correspond to the elements in the flowchart.

Using This Recovery Guide

The following process for recovering an Oracle database assumes nothing. Specifically, it does not assume that the cause of the database failure is known. By following these steps you'll work through a series of tasks that determine which parts of the database are no longer functional. You then can bring the database up as soon as possible, while allowing recovery of the pieces that are damaged. ("Damaged" may mean that a file is either missing or corrupted.)

Start with Step 1. If it succeeds, it directs you to Step 10. If the "startup mount" fails, it directs you to Step 2.
Each of the steps follows a similar pattern, directing you to the appropriate step following the failure or success of the current step. The flowchart follows the same pattern as the printed steps. Once you are familiar with the details of each step, you may find the flowchart easier to follow than the printed instructions. If you are following the flowchart and get to a step that is unfamiliar to you, simply refer to the printed steps.

The electronic version of this procedure* contains a flowchart that is an HTML image map. Each decision or action box in the flowchart is a hyperlink to the appropriate section of the printed procedure.

For more detailed information about individual steps, please consult Oracle's documentation, especially the Oracle8 Backup and Recovery Manual.

* It is available on the CD that comes with this book and at http://www.backupcentral.com.

Restore or recover?

In this chapter, the words "restore" and "recover" have different meanings. "Restore" means to use the backup and restore system to restore that particular file or files. For example, if it says to restore a database file that was backed up to disk, simply copy the backup copy of that file from the backup directory on disk to its original location. If a commercial backup utility is being used, it means to restore that file using that product's interface. The term "recover," on the other hand, refers to doing something within Oracle to synchronize the various pieces of the database. For example, recover database rolls through all the redologs and applies any applicable changes to the datafiles associated with that database.

Figure 15-1. Oracle recovery flowchart

Step 1: Try Startup Mount

The first step in verifying the condition of an Oracle database is to attempt to mount it. This works because mounting a database (without opening it) reads the control files but does not open the datafiles.
If the control files are mirrored,* Oracle attempts to open each of the control files that are listed in the initORACLE_SID.ora file. If any of them is damaged, the mount fails.

To mount a database, simply run svrmgrl, connect to the database, and enter startup mount:

    $ svrmgrl
    SVRMGR > connect internal;
    Connected.
    SVRMGR > startup mount;
    Statement processed.

If it succeeds, the output looks something like this:

    SVRMGR > startup mount;
    ORACLE instance started.
    Total System Global Area      5130648 bytes
    Fixed Size                      44924 bytes
    Variable Size                 4151836 bytes
    Database Buffers               409600 bytes
    Redo Buffers                   524288 bytes

If the attempt to mount the database fails, the output looks something like this:

    SVRMGR > startup mount;
    Total System Global Area      5130648 bytes
    Fixed Size                      44924 bytes
    Variable Size                 4151836 bytes
    Database Buffers               409600 bytes
    Redo Buffers                   524288 bytes
    ORACLE instance started.
    ORA-00205: error in identifying controlfile, check alert log for more info

If the attempt to mount the database succeeds, proceed to Step 10. If it fails, proceed to Step 2.

* Which they'd better be! If you learn anything from this procedure, it should be that you really don't want to lose all of the control files and/or all of the current online redologs. Oracle will mirror them for you if you just tell it to do so. So do it!

Step 2: Are All Control Files Missing?

Don't panic if the attempt to mount the database fails. Control files are easily restored if they were mirrored and can even be rebuilt from scratch if necessary. The first important piece of information is that one or more control files are missing.

Unfortunately, since Oracle aborts the mount at the first failure it encounters, it could be missing one, two, or all of the control files, but so far you know only about the first missing file. So, before embarking on a course of action, determine the severity of the problem. In order to do that, do a little research.
First, determine the names of all of the control files. Do that by looking at the configORACLE_SID.ora file next to the term control_files. It looks something like this:

    control_files = (/db/Oracle/a/oradata/crash/control01.ctl,
                     /db/Oracle/b/oradata/crash/control02.ctl,
                     /db/Oracle/c/oradata/crash/control03.ctl)

It's also important to get the name of the control file that Oracle is complaining about. Find this by looking for the phrase controlfile: in the alert log. (The alert log can be found in the location specified by the background_dump_dest value in the configinstance.ora file. Typically, it is in the ORACLE_BASE/ORACLE_SID/admin/bdump directory.) In that directory, there should be a file called alert_ORACLE_SID.log. In that file, there should be an error that looks something like this:

    Sat Feb 21 13:46:19 1998
    alter database mount exclusive
    Sat Feb 21 13:46:20 1998
    ORA-00202: controlfile: '/db/a/oradata/crash/control01.ctl'
    ORA-27037: unable to obtain file status
    SVR4 Error: 2: No such file or directory

Some of the following procedures may say to overwrite a potentially corrupted control file. Since one never knows which file may be needed, always make backup copies of all of the control files before doing any of this. That offers an "undo" option that isn't possible otherwise. (Make copies of the online redologs as well.)

With the names of all of the control files and the name of the damaged file, it's easy to determine the severity of the problem. Do this by listing each of the control files and comparing their size and modification time. (Remember the game "One of these is not like the others" on Sesame Street?) The following scenarios assume that the control files were mirrored to three locations, which is a very common practice. The possible scenarios are:

[...]

... good and one damaged log. This is why redologs are mirrored! Copy the good redolog to the damaged redolog's location. For example, if /db/Oracle/a/oradata/crash/redocrash01.log was missing, but /db/Oracle/b/oradata/crash/redocrash01.log was intact, issue the following command:

    $ cp /db/Oracle/b/oradata/crash/redocrash01.log \
         /db/Oracle/a/oradata/crash/redocrash01.log

All redologs in at least one log...

... No such file or directory

Always make backups of all the control files before copying any of them on top of one another. The next step would be to copy a known good control file to the damaged control file's location. Once that is done, return to Step 1 and try the startup mount again.

"But I don't have a good control file!" It's possible that there may be no known good control file, which is what would...

... log group should have the same modification time. For example, the output of the preceding example command shows that /db/Oracle/a/oradata/crash/redocrash01.log and /db/Oracle/b/oradata/crash/redocrash01.log are in log group 1. They should have the same modification time and size. The same should be true for groups 2 and 3. There are a couple of possible scenarios: One or more log groups has at least one...

... modification times, there's no way to know which one is good. Try the following steps. First, make backup copies of all the files:

    $ cp /a/control1.ctl /a/control1.ctl.sav
    $ cp /b/control2.ctl /b/control2.ctl.sav
    $ cp /c/control3.ctl /c/control3.ctl.sav

Second, try copying one file to all locations. Skip control3.ctl, since it's obviously damaged. Try starting with control1.ctl:

    $ cp /a/control1.ctl /b/control2.ctl...
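The "research" and "make backup copies first" advice in the excerpts above can be sketched as a pair of shell functions. This is a sketch under stated assumptions: the function names are hypothetical, the config file is assumed to contain a single control_files = ( ... ) entry in the format shown earlier, and the .sav suffix follows the cp example in the text.

```sh
#!/bin/sh
# list_control_files CONFIG
# Print one control file path per line, taken from the
# "control_files = ( ... )" entry, which may span several lines.
list_control_files() {
    { tr '\n' ' ' < "$1"; echo; } |
        sed -n 's/.*control_files *= *(\([^)]*\)).*/\1/p' |
        tr ',' '\n' |
        sed -e 's/^ *//' -e 's/ *$//' -e '/^$/d'
}

# save_control_files CONFIG
# For each control file that exists, make a FILE.sav safety copy and
# list it so sizes and modification times can be compared by eye.
# Files that do not exist are reported as MISSING.
save_control_files() {
    list_control_files "$1" | while read -r f; do
        if [ -f "$f" ]; then
            cp -p "$f" "$f.sav" && ls -l "$f"
        else
            echo "MISSING: $f"
        fi
    done
}

# Example: save_control_files $ORACLE_HOME/dbs/configcrash.ora
```

The cp -p option preserves the original modification time on the copy, which matters here because the procedure compares modification times to decide which control file is good.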
    ... /logs1/redolog01.log
    2 /logs1/redolog02.log
    3 /logs1/redolog03.log

For this example, we will mirror these three logs to /logs2 and /logs3. I prefer to keep the filenames of the members of a log group the same. Therefore, in this example, redolog01.log will be mirrored to /logs1, /logs2, and /logs3. To do this, we issue the following commands:

    SVRMGR > alter database add logfile member '/logs2/redolog01.log' to group 1;
    Statement processed.
    SVRMGR...

... to do so, Oracle will automatically roll through all the archived redologs and the online redolog. Then it says, Media recovery complete. However, once Oracle rolls through all the archived redologs, it may prompt for the online redolog. It does this by prompting for an archived redolog with a number that is higher than the most recent archived redolog available. This means that it is looking for the online...

... Hopefully your backup system has been running the backup control file to trace command on a regular basis. (The output of this command is a SQL script that rebuilds the control files automatically.) To rebuild the control file using the create controlfile script, proceed to Steps 4 through 7. If the backup control file to trace command has been running, proceed to Steps 4 through 7. If not, proceed to...

... Restore control files from backup

The very first step in this process is to find and restore the most recent backup of the control file. This would be the result of a backup control file to filename command. This is the only supported method of backing up the control file. Some people (oraback.sh included) also copy the control file manually. If there is a manual copy of the control file that is more recent... an "official" copy, try to use it first. However, if it doesn't work, use a backup copy created by the backup control file to filename command. Whatever backup control file is used, copy it to all of the locations and filenames listed in the configORACLE_SID.ora file after the phrase control_files:

    control_files = (/db/Oracle/a/oradata/crash/control01.ctl,
                     /db/Oracle/b/oradata/crash/control02.ctl,...

... v$datafile and v$logfile Output (continued)

    1 /db/Oracle/b/oradata/crash/redocrash01.log
    2 /db/Oracle/a/oradata/crash/redocrash03.log
    3 /db/Oracle/c/oradata/crash/redocrash02.log
    6 rows selected
    SVRMGR >

Look at each of the files shown by the preceding command. First, look at the datafiles. Each of the datafiles probably has the same modification time, or there might be a group of them with one modification...
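The redolog mirroring example in the excerpts above repeats one statement per log group and per mirror location, so it lends itself to a small generator. In this sketch the function name is hypothetical, and the naming scheme DIR/redologNN.log (with a slash between directory and filename) is an assumption based on the text's preference for identical member filenames; the generated statements would still be reviewed and run by hand in svrmgrl.

```sh
#!/bin/sh
# mirror_statements NGROUPS DIR...
# Print one "alter database add logfile member" statement for each log
# group (1..NGROUPS) and each mirror directory given.
mirror_statements() {
    ngroups=$1
    shift
    g=1
    while [ "$g" -le "$ngroups" ]; do
        for dir in "$@"; do
            printf "alter database add logfile member '%s/redolog%02d.log' to group %d;\n" \
                "$dir" "$g" "$g"
        done
        g=$((g + 1))
    done
}

# Example: mirror_statements 3 /logs2 /logs3
```

For three groups and two mirror directories this prints six statements, beginning with the one for /logs2/redolog01.log in group 1.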

Date posted: 14/08/2014, 02:22