To create an external table, use the CREATE TABLE command with the keywords ORGANIZATION EXTERNAL. These tell Oracle that the table does not exist as a segment. Then specify the layout and location of the operating system file. For example,

create table new_dept
(deptno number(2), dname varchar2(14), loc varchar2(13))
organization external (
  type oracle_loader
  default directory jon_dir
  access parameters
    (records delimited by newline
     badfile 'depts.bad'
     discardfile 'depts.dsc'
     logfile 'depts.log'
     fields terminated by ','
     missing field values are null)
  location ('depts.txt'));

This command will create an external table that will be populated by the DEPTS.TXT file shown in the section "SQL*Loader" earlier in this chapter.

Figure 23-1  Managing directories with SQL*Plus

The syntax for the ACCESS PARAMETERS is virtually identical to the SQL*Loader controlfile syntax and is used because the TYPE has been set to ORACLE_LOADER. The specification for the DEFAULT DIRECTORY gives the Oracle directory where Oracle will look for the source datafile, and where it will write the log and other files.

External tables can be queried in exactly the same way as internal tables. Any SQL involving a SELECT will function against an external table: they can be used in joins, views, and subqueries. They cannot have indexes, constraints, or triggers.

Exercise 23-1: Use SQL*Loader and External Tables

In this exercise, you will use SQL*Loader to insert data into a table, and also to generate the CREATE TABLE script for an external table.

1. Connect to your database as user SYSTEM (in the examples, the SYSTEM password is ORACLE) with SQL*Plus.

2. Create a table to use for the exercise:
   create table names(first varchar2(10),last varchar2(10));

3. Using any editor that will create plain text files, create a file names.txt with these values (or similar):
   John,Watson
   Roopesh,Ramklass
   Sam,Alapati

4. Using the editor, create a controlfile names.ctl with these settings:
   load data
   infile 'names.txt'
   badfile 'names.bad'
   truncate
   into table names
   fields terminated by ','
   trailing nullcols
   (first,last)
   This controlfile will truncate the target table before carrying out the insert.

5. From an operating system prompt, run SQL*Loader as follows:
   sqlldr system/oracle control=names.ctl

6. Study the log file names.log that will have been generated.

7. With SQL*Plus, confirm that the rows have been inserted:
   select * from names;

8. To generate a statement that will create an external table, you can use SQL*Loader and an existing controlfile:
   sqlldr userid=system/oracle control=names.ctl external_table=generate_only

9. This will have generated a CREATE TABLE statement in the log file names.log, which will look something like this:
   CREATE TABLE "SYS_SQLLDR_X_EXT_NAMES"
   (
     "FIRST" VARCHAR2(10),
     "LAST" VARCHAR2(10)
   )
   ORGANIZATION external
   (
     TYPE oracle_loader
     DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
     ACCESS PARAMETERS
     (
       RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
       BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'names.bad'
       LOGFILE 'names.log_xt'
       READSIZE 1048576
       FIELDS TERMINATED BY "," LDRTRIM
       MISSING FIELD VALUES ARE NULL
       REJECT ROWS WITH ALL NULL FIELDS
       (
         "FIRST" CHAR(255) TERMINATED BY ",",
         "LAST" CHAR(255) TERMINATED BY ","
       )
     )
     location
     (
       'names.txt'
     )
   ) REJECT LIMIT UNLIMITED
10. From your SQL*Plus session, create an Oracle directory pointing to the operating system directory where your names.txt file is. For example,
    create directory system_dmp as '/home/oracle';

11. Make any edits you wish to the command shown in Step 9. For example, you might want to change the name of the table being created ("SYS_SQLLDR_X_EXT_NAMES" isn't very useful) to something more meaningful. You will need to change both the DEFAULT DIRECTORY and BADFILE settings to point to the directory created in Step 10.

12. Run the statement created in Step 11 from your SQL*Plus session.

13. Query the table with a few SELECT and DML statements. You will find that a log file is generated for every SELECT, and that DML is not permitted.

14. Tidy up: delete the names.txt and names.ctl files; drop the tables; as SYS, drop the directory.

Data Pump

In the normal course of events, SELECT and DML commands are used to extract data from the database and to insert data into it, but there are occasions when you will need a much faster method for bulk operations. For many reasons it may be desirable to extract a large amount of data and the associated object definitions from a database in a form that will allow it to be easily loaded into another. One obvious purpose for extracting large amounts of data is for backups, but there are others, such as archiving historical data before deleting it from the live system, or transferring data between production and test environments, or between an online system and a data warehouse. Data Pump (introduced with release 10g and enhanced with 11g) is a tool for large-scale, high-speed data transfer between Oracle databases.

Data Pump Architecture

Data Pump is a server-side utility. You initiate Data Pump jobs from a user process, either SQL*Plus or through Enterprise Manager, but all the work is done by server processes. This improves performance dramatically over the old Export/Import utilities, because the Data Pump processes running on the server have direct access to the datafiles and the SGA; they do not have to go via a session. Also, it is possible to launch a Data Pump job and then detach from it, leaving it running in the background. You can reconnect to the job to monitor its progress at any time.

There are a number of processes involved in a Data Pump job, two queues, a number of files, and one table. First, the processes: the user processes are expdp and impdp (for Unix) or expdp.exe and impdp.exe (Windows). These are used to launch, control, and monitor Data Pump jobs. Alternatively, there is an Enterprise Manager interface. The expdp or impdp user process establishes a session against the database through a normal server process. This session then issues commands to control and monitor Data Pump jobs. When a Data Pump job is launched, at least two processes are started: a Data Pump Master process (the DMnn) and one or more worker processes (named DWnn). If multiple Data Pump jobs are running concurrently, each will have its own DMnn process and its own set of DWnn processes. As the name implies, the master process controls the workers. If you have enabled parallelism, then each DWnn may make use of two or more parallel execution servers (named Pnnn).

Two queues are created for each Data Pump job: a control queue and a status queue. The DMnn divides up the work to be done and places the individual tasks that make up the job on the control queue.
The worker processes pick up these tasks and execute them, perhaps making use of parallel execution servers. This queue operates on a deliver-exactly-once model: messages are enqueued by the DMnn and dequeued by the worker that picks them up. The status queue is for monitoring purposes: the DMnn places messages on it describing the state of the job. This queue operates on a publish-and-subscribe model: any session (with appropriate privileges) can query the queue to monitor the job's progress.

The files generated by Data Pump come in three forms: SQL files, dump files, and log files. SQL files are DDL statements describing the objects included in the job. You can choose to generate them (without any data) as an easy way of getting this information out of the database, perhaps for documentation purposes or as a set of scripts to recreate the database. Dump files contain the exported data, formatted with XML tags. The use of XML means that there is a considerable overhead in dump files for describing the data: a small table like the REGIONS table in the HR sample schema will generate a 94KB dump file. While this overhead may seem disproportionately large for a tiny table like that, it becomes trivial for larger tables. The log files describe the history of the job run.

EXAM TIP  Remember the three Data Pump file types: SQL files, log files, and dump files.

Finally, there is the control table. This is created for you by the DMnn when you launch a job, and is used both to record the job's progress and to describe it. It is included in the dump file as the final item of the job.

Directories and File Locations

Data Pump always uses Oracle directories. These are needed to locate the files that it will read or write, and its log files. One directory is all that is needed, but often a job will use several: if many gigabytes of data are to be written out in parallel to many files, you may want to spread the disk activity across directories in different file systems.

If a directory is not specified in the Data Pump command, there are defaults. Every 11g database has an Oracle directory that can be used, named DATA_PUMP_DIR. If the environment variable ORACLE_BASE was set at database creation time, the operating system location will be the ORACLE_BASE/admin/database_name/dpdump directory; if ORACLE_BASE was not set, it will be ORACLE_HOME/admin/database_name/dpdump (where database_name is the name of the database). To identify the location in your database, query the view DBA_DIRECTORIES. However, the fact that this Oracle directory exists does not mean it can be used; any user wishing to use Data Pump will have to be granted read and/or write permissions on it first.

Specifying the directory (or directories) to use for a Data Pump job can be done at four levels. In decreasing order of precedence, these are

• A per-file setting within the Data Pump job
• A parameter applied to the whole Data Pump job
• The DATA_PUMP_DIR environment variable
• The DATA_PUMP_DIR directory object

So it is possible to control the location of every file explicitly, or a single Oracle directory can be nominated for the job, or an environment variable can be used, or failing all of these, Data Pump will use the default directory. The environment variable should be set on the client side but will be used on the server side.
An example of setting it on Unix is

DATA_PUMP_DIR=SCOTT_DIR; export DATA_PUMP_DIR

or on Windows:

set DATA_PUMP_DIR=SCOTT_DIR

Direct Path or External Table Path?

Data Pump has two methods for loading and unloading data: the direct path and the external table path. The direct path bypasses the database buffer cache. For a direct path export, Data Pump reads the datafile blocks directly from disk, extracts and formats the content, and writes it out as a dump file. For a direct path import, Data Pump reads the dump file, uses its content to assemble blocks of table data, and writes them directly to the datafiles. The write is above the "high water mark" of the table, with the same benefits as those described earlier for a SQL*Loader direct load.

The external table path uses the database buffer cache. Even though Data Pump is manipulating files that are external to the database, it uses the database buffer cache as though it were reading and writing an internal table. For an export, Data Pump reads blocks from the datafiles into the cache through a normal SELECT process; from there, it formats the data for output to a dump file. During an import, Data Pump constructs standard INSERT statements from the content of the dump file and executes them by reading blocks from the datafiles into the cache, where the INSERT is carried out in the normal fashion. As far as the database is concerned, external table Data Pump jobs look like absolutely ordinary (though perhaps rather large) SELECT or INSERT operations. Both undo and redo are generated, as they would be for any normal DML statement, and your end users may well complain while these jobs are in progress. Commit processing is absolutely normal.

So what determines whether Data Pump uses the direct path or the external table path? You as DBA have no control; Data Pump itself makes the decision based on the complexity of the objects. Only simple structures, such as heap tables without active triggers, can be processed through the direct path; more complex objects such as clustered tables force Data Pump to use the external table path, because interaction with the SGA is needed to resolve the complexities. In either case, the dump file generated is identical.

EXAM TIP  The external table path insert uses a regular commit, like any other DML statement. A direct path insert does not use a commit; it simply shifts the high water mark of the table to include the newly written blocks. Data Pump files generated by either path are identical.

Using Data Pump Export and Import

Data Pump is commonly used for extracting large amounts of data from one database and inserting it into another, but it can also be used to extract other information such as PL/SQL code or various object definitions. There are several interfaces: command-line utilities, Enterprise Manager, and a PL/SQL API. Whatever purpose and technique are used, the files are always in the Data Pump proprietary format; it is not possible to read a Data Pump file with any tool other than Data Pump.

Capabilities

Fine-grained object and data selection facilities mean that Data Pump can export either the complete database or any part of it. It is possible to export table definitions with or without their rows; PL/SQL objects; views; sequences; or any other object type.
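As an illustration of this object selection, the following is a minimal sketch of a metadata-only export (the directory DP_DIR, the dump file name, and the choice of the HR schema are assumptions for the example, not values used elsewhere in this chapter):

expdp system/oracle schemas=hr content=metadata_only directory=dp_dir dumpfile=hr_ddl.dmp

Because CONTENT=METADATA_ONLY is specified, the dump file will contain the object definitions but no rows; a later import of that file could use the SQLFILE parameter to write the DDL out as a script instead of executing it.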
If exporting a table, it is possible to apply a WHERE clause to restrict the rows exported (though this may make direct path impossible) or to instruct Data Pump to export a random sample of the table, expressed as a percentage.

Parallel processing can speed up Data Pump operations. Parallelism can come at two levels: the number of Data Pump worker processes, and the number of parallel execution servers each worker process uses.

An estimate facility can calculate the space needed for a Data Pump export, without actually running the job.

The Network Mode allows transfer of a Data Pump data set from one database to another without ever staging it on disk. This is implemented by a Data Pump export job on the source database writing the data over a database link to the target database, where a Data Pump import job reads the data from the database link and inserts it.

Remapping facilities mean that objects can be renamed or transferred from one schema to another and (in the case of data objects) moved from one tablespace to another as they are imported.

When exporting data, the output files can be compressed and encrypted.

Using Data Pump with the Command-Line Utilities

The executables expdp and impdp are installed into the ORACLE_HOME/bin directory. Following are several examples of using them. Note that in all cases the command must be a single one-line command; the line breaks are purely for readability.

To export the entire database:

expdp system/manager@orcl11g full=y parallel=4
dumpfile=datadir1:full1_%U.dmp,datadir2:full2_%U.dmp,
datadir3:full3_%U.dmp,datadir4:full4_%U.dmp
filesize=2G compression=all

This command will connect to the database as user SYSTEM and launch a full Data Pump export, using four worker processes working in parallel. Each worker will generate its own set of dump files, uniquely named according to the %U template, which generates strings of eight unique characters. Each worker will break up its output into files of 2GB (perhaps because of underlying file system restrictions) of compressed data. A corresponding import job (which assumes that the files generated by the export have all been placed in one directory) would be

impdp system/manager@dev11g full=y directory=data_dir parallel=4
dumpfile=full1_%U.dmp,full2_%U.dmp,full3_%U.dmp,full4_%U.dmp

This command makes a selective export of the PL/SQL objects belonging to two schemas:

expdp system/manager schemas=hr,oe directory=code_archive
dumpfile=hr_oe_code.dmp
include=function,include=package,include=procedure,include=type

This command will extract everything from a Data Pump export that was in the HR schema, and import it into the DEV schema:

impdp system/manager directory=usr_data dumpfile=usr_dat.dmp
schemas=hr remap_schema=hr:dev

Using Data Pump with Database Control

The Database Control interface to Data Pump generates the API calls that are invoked by the expdp and impdp utilities, but unlike the utilities it makes it possible to see the scripts and, if desired, copy, save, and edit them. To reach the Data Pump facilities, from the database home page select the Data Movement tab. In the Move Row Data section, there are four links that will launch wizards:

• Export to Export Files  Define Data Pump export jobs.
• Import from Export Files  Define Data Pump import jobs.
• Import from Database  Define a Data Pump network mode import.
• Monitor Export and Import Jobs  Attach to running jobs to observe their progress, to pause or restart them, or to modify their operation.
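The last of these links corresponds to functionality that is also available from the command line: a session can query the data dictionary for running jobs and reattach to a job that was launched and detached from earlier. As a minimal sketch (the job name FULL_EXPORT is an assumption for the example):

select owner_name, job_name, state from dba_datapump_jobs;

expdp system/oracle attach=FULL_EXPORT

Once attached, interactive commands such as STATUS, STOP_JOB, and START_JOB can be issued at the Export> prompt.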
The final stage of each wizard gives the option to see the PL/SQL code that is being generated. The job is run by the Enterprise Manager job system, either immediately or according to a schedule. Figure 23-2 shows this final step of scheduling a simple export job of the HR.REGIONS table.

Figure 23-2  The final step of the Database Control Data Pump Export Wizard

Exercise 23-2: Perform a Data Pump Export and Import

In this exercise, you will carry out a Data Pump export and import using Database Control.

1. Connect to your database as user SYSTEM with SQL*Plus, and create a table to use for the exercise:
   create table ex232 as select * from all_users;

2. Connect to your database as user SYSTEM with Database Control. Navigate to the Export Wizard: select the Data Movement tab from the database home page, then the Export To Export Files link in the Move Row Data section.

3. Select the radio button for Tables. Enter your operating system username and password for host credentials (if these have not already been saved as preferred credentials) and click CONTINUE.

4. In the Export: Tables window, click ADD and find the table SYSTEM.EX232. Click NEXT.

5. In the Export: Export Options window, select the directory SYSTEM_DMP (created in Exercise 23-1) as the Directory Object for Optional Files. Click NEXT.

6. In the Export: Files window, choose the directory SYSTEM_DMP and click NEXT.

7. In the Export: Schedule window, give the job a name and click NEXT to run the job immediately.

8. In the Review window, click SUBMIT JOB.

9. When the job has completed, study the log file that will have been created in the operating system directory mapped onto the Oracle directory SYSTEM_DMP. Note the name of the Data Pump file EXPDAT01.DMP produced in the directory.

10. Connect to the database with SQL*Plus, and drop the table:
    drop table system.ex232;

11. In Database Control, select the Data Movement tab from the database home page, then the Import from Export Files link in the Move Row Data section.

12. In the Import: Files window, select your directory and enter the filename noted in Step 9. Select the radio button for Tables. Enter your operating system username and password for host credentials (if these have not already been saved as preferred credentials) and click CONTINUE.

13. In the Import: Tables window, click ADD. Search for and select the SYSTEM.EX232 table. Click SELECT and NEXT.

14. In the Import: Re-Mapping window, click NEXT.

15. In the Import: Options window, click NEXT.

16. In the Import: Schedule window, give the job a name and click NEXT.

17. In the Import: Review window, click SUBMIT JOB.

18. When the job has completed, confirm that the table has been imported by querying it from your SQL*Plus session.

Tablespace Export and Import

A variation on Data Pump export/import is the tablespace transport capability. This is a facility whereby entire tablespaces and their contents can be copied from one database to another. This is the routine (a sketch with example commands follows the list):

1. Make the source tablespace(s) read only.

2. Use Data Pump to export the metadata describing the tablespace(s) and their contents.

3. Copy the datafile(s) and Data Pump export file to the destination system.

4. Use Data Pump to import the metadata.

5. Make the tablespace(s) read-write on both source and destination.
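The following is a minimal sketch of these five steps, assuming a single tablespace named TRANS_DATA with one datafile, an Oracle directory DP_DIR available on both systems, and the same endian format on source and destination (all of these names are assumptions for the example). On the source database, make the tablespace read only and export the metadata:

alter tablespace trans_data read only;

expdp system/oracle directory=dp_dir dumpfile=trans_data.dmp transport_tablespaces=trans_data

Copy the datafile and the dump file to the destination system, then import the metadata, naming the copied datafile:

impdp system/oracle directory=dp_dir dumpfile=trans_data.dmp transport_datafiles='/u01/oradata/dest/trans_data01.dbf'

Finally, make the tablespace read-write on both databases:

alter tablespace trans_data read write;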
An additional step that may be required when transporting tablespaces from one platform to another is to convert the endian format of the data. A big-endian platform (such as Solaris on SPARC chips) stores a multibyte value such as a 16-bit integer with the most significant byte first. A little-endian platform (such as Windows on Intel chips) stores the least significant byte first. Transporting tablespaces across platforms with a different endian format requires converting the datafiles: you do this with the RMAN command CONVERT.

To determine the platform on which a database is running, query the column PLATFORM_NAME in V$DATABASE. Then, to see the list of currently supported platforms and their endian-ness, query the view V$TRANSPORTABLE_PLATFORM:

orcl> select * from v$transportable_platform order by platform_name;

PLATFORM_ID PLATFORM_NAME                        ENDIAN_FORMAT
----------- ------------------------------------ -------------
          6 AIX-Based Systems (64-bit)           Big
         16 Apple Mac OS                         Big
         19 HP IA Open VMS                       Little
         15 HP Open VMS                          Little
          5 HP Tru64 UNIX                        Little
          3 HP-UX (64-bit)                       Big
          4 HP-UX IA (64-bit)                    Big
         18 IBM Power Based Linux                Big
          9 IBM zSeries Based Linux              Big
         13 Linux 64-bit for AMD                 Little
         10 Linux IA (32-bit)                    Little
         11 Linux IA (64-bit)                    Little
         12 Microsoft Windows 64-bit for AMD     Little
          7 Microsoft Windows IA (32-bit)        Little
          8 Microsoft Windows IA (64-bit)        Little
         20 Solaris Operating System (AMD64)     Little
         17 Solaris Operating System (x86)       Little
          1 Solaris[tm] OE (32-bit)              Big
          2 Solaris[tm] OE (64-bit)              Big

19 rows selected.

Database Control has a wizard that takes you through the entire process of transporting a tablespace (or several). From the database home page, select the Data Movement tab and then the Transport Tablespaces link in the Move Database Files section.
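Where the endian formats of the two platforms differ, the copied datafiles must be converted before they can be plugged in. The following is a minimal sketch using RMAN against the source database while the tablespace is still read only (the tablespace name TRANS_DATA, the target platform, and the output location are assumptions for the example):

rman target /

RMAN> convert tablespace trans_data
      to platform 'Solaris[tm] OE (64-bit)'
      format '/tmp/transport/%U';

Alternatively, the conversion can be carried out on the destination system with CONVERT DATAFILE ... FROM PLATFORM, and the converted copies are then the files named in the impdp TRANSPORT_DATAFILES parameter.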