Oracle® Database
Globalization Support Guide
10g Release 2 (10.2)
B14225-02
December 2005

Copyright © 1996, 2005, Oracle. All rights reserved.

Primary Author: Cathy Shea
Contributing Authors: Paul Lane, Cathy Baird
Contributors: Dan Chiba, Winson Chu, Claire Ho, Gary Hua, Simon Law, Geoff Lee, Peter Linsley, Qianrong Ma, Keni Matsuda, Meghna Mehta, Valarie Moore, Shige Takeda, Linus Tanaka, Makoto Tozawa, Barry Trute, Ying Wu, Peter Wallack, Chao Wang, Huaqing Wang, Simon Wong, Michael Yau, Jianping Yang, Qin Yu, Tim Yu, Weiran Zhang, Yan Zhu

The Programs (which include both the software and documentation) contain proprietary information; they are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright, patent, and other intellectual and industrial property laws. Reverse engineering, disassembly, or decompilation of the Programs, except to the extent required to obtain interoperability with other independently created software or as specified by law, is prohibited.

The information contained in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. This document is not warranted to be error-free. Except as may be expressly permitted in your license agreement for these Programs, no part of these Programs may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose.

If the Programs are delivered to the United States Government or anyone licensing or using the Programs on behalf of the United States Government, the following notice is applicable:

U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are "commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the Programs, including documentation and technical data, shall be subject to the licensing restrictions set forth in the applicable Oracle license agreement, and, to the extent applicable, the additional rights set forth in FAR 52.227-19, Commercial Computer Software--Restricted Rights (June 1987). Oracle Corporation, 500 Oracle Parkway, Redwood City, CA 94065

The Programs are not intended for use in any nuclear, aviation, mass transit, medical, or other inherently dangerous applications. It shall be the licensee's responsibility to take all appropriate fail-safe, backup, redundancy and other measures to ensure the safe use of such applications if the Programs are used for such purposes, and we disclaim liability for any damages caused by such use of the Programs.

Oracle, JD Edwards, PeopleSoft, and Retek are registered trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

The Programs may provide links to Web sites and access to content, products, and services from third parties. Oracle is not responsible for the availability of, or any content provided on, third-party Web sites. You bear all risks associated with the use of such content. If you choose to purchase any products or services from a third party, the relationship is directly between you and the third party. Oracle is not responsible for: (a) the quality of third-party products or services; or (b) fulfilling any of the terms of the agreement with the third party, including delivery of products or services and warranty obligations related to purchased products or services. Oracle is not responsible for any loss or damage of any sort that you may incur from dealing with any third party.
Contents

Preface
    Intended Audience
    Documentation Accessibility
    Structure
    Related Documents
    Conventions

What's New in Globalization Support?
    Oracle Database 10g Release 2 (10.2) New Features in Globalization
    Oracle Database 10g Release 1 (10.1) New Features in Globalization

1 Overview of Globalization Support
    Globalization Support Architecture
    Locale Data on Demand
    Architecture to Support Multilingual Applications
    Using Unicode in a Multilingual Database
    Globalization Support Features
    Language Support
    Territory Support
    Date and Time Formats
    Monetary and Numeric Formats
    Calendars Feature
    Linguistic Sorting
    Character Set Support
    Character Semantics
    Customization of Locale and Calendar Data
    Unicode Support

2 Choosing a Character Set
    Character Set Encoding
    What is an Encoded Character Set?
    Which Characters Are Encoded?
    Phonetic Writing Systems
    Ideographic Writing Systems
    Punctuation, Control Characters, Numbers, and Symbols
    Writing Direction
    What Characters Does a Character Set Support?
    ASCII Encoding
    How are Characters Encoded?
    Single-Byte Encoding Schemes
    Multibyte Encoding Schemes
    Naming Convention for Oracle Character Sets
    Length Semantics
    Choosing an Oracle Database Character Set
    Current and Future Language Requirements
    Client Operating System and Application Compatibility
    Character Set Conversion Between Clients and the Server
    Performance Implications of Choosing a Database Character Set
    Restrictions on Database Character Sets
    Restrictions on Character Sets Used to Express Names
    Database Character Set Statement of Direction
    Choosing Unicode as a Database Character Set
    Choosing a National Character Set
    Summary of Supported Datatypes
    Changing the Character Set After Database Creation
    Monolingual Database Scenario
    Character Set Conversion in a Monolingual Scenario
    Multilingual Database Scenarios
    Restricted Multilingual Support
    Unrestricted Multilingual Support

3 Setting Up a Globalization Support Environment
    Setting NLS Parameters
    Choosing a Locale with the NLS_LANG Environment Variable
    Specifying the Value of NLS_LANG
    Overriding Language and Territory Specifications
    Locale Variants
    Should the NLS_LANG Setting Match the Database Character Set?
    NLS Database Parameters
    NLS Data Dictionary Views
    NLS Dynamic Performance Views
    OCINlsGetInfo() Function
    Language and Territory Parameters
    NLS_LANGUAGE
    NLS_TERRITORY
    Overriding Default Values for NLS_LANGUAGE and NLS_TERRITORY During a Session
    Date and Time Parameters
    Date Formats
    NLS_DATE_FORMAT
    NLS_DATE_LANGUAGE
    Time Formats
    NLS_TIMESTAMP_FORMAT
    NLS_TIMESTAMP_TZ_FORMAT
    Calendar Definitions
    Calendar Formats
    First Day of the Week
    First Calendar Week of the Year
    Number of Days and Months in a Year
    First Year of Era
    NLS_CALENDAR
    Numeric and List Parameters
    Numeric Formats
    NLS_NUMERIC_CHARACTERS
    NLS_LIST_SEPARATOR
    Monetary Parameters
    Currency Formats
    NLS_CURRENCY
    NLS_ISO_CURRENCY
    NLS_DUAL_CURRENCY
    Oracle Support for the Euro
    NLS_MONETARY_CHARACTERS
    NLS_CREDIT
    NLS_DEBIT
    Linguistic Sort Parameters
    NLS_SORT
    NLS_COMP
    Character Set Conversion Parameter
    NLS_NCHAR_CONV_EXCP
    Length Semantics
    NLS_LENGTH_SEMANTICS

4 Datetime Datatypes and Time Zone Support
    Overview of Datetime and Interval Datatypes and Time Zone Support
    Datetime and Interval Datatypes
    Datetime Datatypes
    DATE Datatype
    TIMESTAMP Datatype
    TIMESTAMP WITH TIME ZONE Datatype
    TIMESTAMP WITH LOCAL TIME ZONE Datatype
    Inserting Values into Datetime Datatypes
    Choosing a TIMESTAMP Datatype
    Interval Datatypes
    INTERVAL YEAR TO MONTH Datatype
    INTERVAL DAY TO SECOND Datatype
    Inserting Values into Interval Datatypes
    Datetime and Interval Arithmetic and Comparisons
    Datetime and Interval Arithmetic
    Datetime Comparisons
    Explicit Conversion of Datetime Datatypes
    Datetime SQL Functions
    Datetime and Time Zone Parameters and Environment Variables
    Datetime Format Parameters
    Time Zone Environment Variables
    Daylight Saving Time Session Parameter
    Choosing a Time Zone File
    Upgrading the Time Zone File
    Setting the Database Time Zone
    Setting the Session Time Zone
    Converting Time Zones With the AT TIME ZONE Clause
    Support for Daylight Saving Time
    Examples: The Effect of Daylight Saving Time on Datetime Calculations

5 Linguistic Sorting and String Searching
    Overview of Oracle's Sorting Capabilities
    Using Binary Sorts
    Using Linguistic Sorts
    Monolingual Linguistic Sorts
    Multilingual Linguistic Sorts
    Multilingual Sorting Levels
    Primary Level Sorts
    Secondary Level Sorts
    Tertiary Level Sorts
    Linguistic Sort Features
    Base Letters
    Ignorable Characters
    Contracting Characters
    Expanding Characters
    Context-Sensitive Characters
    Canonical Equivalence
    Reverse Secondary Sorting
    Character Rearrangement for Thai and Laotian Characters
    Special Letters
    Special Combination Letters
    Special Uppercase Letters
    Special Lowercase Letters
    Case-Insensitive and Accent-Insensitive Linguistic Sorts
    Examples of Case-Insensitive and Accent-Insensitive Sorts
    Specifying a Case-Insensitive or Accent-Insensitive Sort
    Linguistic Sort Examples
    Performing Linguistic Comparisons
    Linguistic Comparison Examples
    Using Linguistic Indexes
    Linguistic Indexes for Multiple Languages
    Requirements for Using Linguistic Indexes
    Set NLS_SORT Appropriately
    Specify NOT NULL in a WHERE Clause If the Column Was Not Declared NOT NULL
    Example: Setting Up a French Linguistic Index
    Searching Linguistic Strings
    SQL Regular Expressions in a Multilingual Environment
    Character Range '[x-y]' in Regular Expressions
    Collation Element Delimiter '[. .]' in Regular Expressions
    Character Class '[: :]' in Regular Expressions
    Equivalence Class '[= =]' in Regular Expressions
    Examples: Regular Expressions

6 Supporting Multilingual Databases with Unicode
    Overview of Unicode
    What is Unicode?
    Supplementary Characters
    Unicode Encodings
    UTF-8 Encoding
    UCS-2 Encoding
    UTF-16 Encoding
    Examples: UTF-16, UTF-8, and UCS-2 Encoding
    Oracle's Support for Unicode
    Implementing a Unicode Solution in the Database
    Enabling Multilingual Support with Unicode Databases
    Enabling Multilingual Support with Unicode Datatypes
    How to Choose Between a Unicode Database and a Unicode Datatype Solution
    When Should You Use a Unicode Database?
    When Should You Use Unicode Datatypes?
    Comparing Unicode Character Sets for Database and Datatype Solutions
    Unicode Case Studies
    Designing Database Schemas to Support Multiple Languages
    Specifying Column Lengths for Multilingual Data
    Storing Data in Multiple Languages
    Store Language Information with the Data
    Select Translated Data Using Fine-Grained Access Control
    Storing Documents in Multiple Languages in LOB Datatypes
    Creating Indexes for Searching Multilingual Document Contents
    Creating Multilexers
    Creating Indexes for Documents Stored in the CLOB Datatype
    Creating Indexes for Documents Stored in the BLOB Datatype

7 Programming with Unicode
    Overview of Programming with Unicode
    Database Access Product Stack and Unicode
    SQL and PL/SQL Programming with Unicode
    SQL NCHAR Datatypes
    The NCHAR Datatype
    The NVARCHAR2 Datatype
    The NCLOB Datatype
    Implicit Datatype Conversion Between NCHAR and Other Datatypes
    Exception Handling for Data Loss During Datatype Conversion
    Rules for Implicit Datatype Conversion
    SQL Functions for Unicode Datatypes
    Other SQL Functions
    Unicode String Literals
    NCHAR String Literal Replacement
    Using the UTL_FILE Package with NCHAR Data
    OCI Programming with Unicode
    OCIEnvNlsCreate() Function for Unicode Programming
    OCI Unicode Code Conversion
    Data Integrity
    OCI Performance Implications When Using Unicode
    OCI Unicode Data Expansion
    Setting UTF-8 to the NLS_LANG Character Set in OCI
    Binding and Defining SQL CHAR Datatypes in OCI
    Binding and Defining SQL NCHAR Datatypes in OCI
    Handling SQL NCHAR String Literals in OCI
    Binding and Defining CLOB and NCLOB Unicode Data in OCI
    Pro*C/C++ Programming with Unicode
    Pro*C/C++ Data Conversion in Unicode
    Using the VARCHAR Datatype in Pro*C/C++
    Using the NVARCHAR Datatype in Pro*C/C++
    Using the UVARCHAR Datatype in Pro*C/C++
    JDBC Programming with Unicode
    Binding and Defining Java Strings to SQL CHAR Datatypes
    Binding and Defining Java Strings to SQL NCHAR Datatypes
    Using the SQL NCHAR Datatypes Without Changing the Code
    Using SQL NCHAR String Literals in JDBC
    Data Conversion in JDBC
    Data Conversion for the OCI Driver
    Data Conversion for Thin Drivers
    Data Conversion for the Server-Side Internal Driver
    Using oracle.sql.CHAR in Oracle Object Types
    oracle.sql.CHAR
    Accessing SQL CHAR and NCHAR Attributes with oracle.sql.CHAR
    Restrictions on Accessing SQL CHAR Data with JDBC
    Character Integrity Issues in a Multibyte Database Environment
    ODBC and OLE DB Programming with Unicode
    Unicode-Enabled Drivers in ODBC and OLE DB
    OCI Dependency in Unicode
    ODBC and OLE DB Code Conversion in Unicode
    OLE DB Code Conversions
    ODBC Unicode Datatypes
    OLE DB Unicode Datatypes
    ADO Access
    XML Programming with Unicode
    Writing an XML File in Unicode with Java
    Reading an XML File in Unicode with Java
    Parsing an XML Stream in Unicode with Java

8 Oracle Globalization Development Kit
    Overview of the Oracle Globalization Development Kit
    Designing a Global Internet Application
    Deploying a Monolingual Internet Application
    Deploying a Multilingual Internet Application
    Developing a Global Internet Application
    Locale Determination
    Locale Awareness
    Localizing the Content
    Getting Started with the Globalization Development Kit
    GDK Quick Start
    Modifying the HelloWorld Application
    GDK Application Framework for J2EE
    Making the GDK Framework Available to J2EE Applications
    Integrating Locale Sources into the GDK Framework
    Getting the User Locale From the GDK Framework
    Implementing Locale Awareness Using the GDK Localizer
    Defining the Supported Application Locales in the GDK
    Handling Non-ASCII Input and Output in the GDK Framework
    Managing Localized Content in the GDK
    Managing Localized Content in JSPs and Java Servlets
    Managing Localized Content in Static Files
    GDK Java API
    Oracle Locale Information in the GDK
    Oracle Locale Mapping in the GDK
    Oracle Character Set Conversion (JDK 1.4 and Later) in the GDK
    Oracle Date, Number, and Monetary Formats in the GDK
    Oracle Binary and Linguistic Sorts in the GDK
    Oracle Language and Character Set Detection in the GDK
    Oracle Translated Locale and Time Zone Names in the GDK
    Using the GDK for E-Mail Programs
    The GDK Application Configuration File
    locale-charset-maps
    page-charset
    application-locales
    locale-determine-rule
    locale-parameter-name
    message-bundles
    url-rewrite-rule
    Example: GDK Application Configuration File
    GDK for Java Supplied Packages and Classes
    oracle.i18n.lcsd
    oracle.i18n.net
    oracle.i18n.servlet
    oracle.i18n.text
    oracle.i18n.util
    GDK for PL/SQL Supplied Packages
    GDK Error Messages

9 SQL and PL/SQL Programming in a Global Environment
    Locale-Dependent SQL Functions with Optional NLS Parameters
    Default Values for NLS Parameters in SQL Functions
    Specifying NLS Parameters in SQL Functions
    Unacceptable NLS Parameters in SQL Functions
    Other Locale-Dependent SQL Functions
    The CONVERT Function
    SQL Functions for Different Length Semantics
    LIKE Conditions for Different Length Semantics
    Character Set SQL Functions
    Converting from Character Set Number to Character Set Name
    Converting from Character Set Name to Character Set Number
    Returning the Length of an NCHAR Column
    The NLSSORT Function
    NLSSORT Syntax
    Comparing Strings in a WHERE Clause
    Using the NLS_COMP Parameter to Simplify Comparisons in the WHERE Clause
    Controlling an ORDER BY Clause
    Miscellaneous Topics for SQL and PL/SQL Programming in a Global Environment
    SQL Date Format Masks
    Calculating Week Numbers
    SQL Numeric Format Masks
    Loading External BFILE Data into LOB Columns

10 OCI Programming in a Global Environment
    Using the OCI NLS Functions
    Specifying Character Sets in OCI
    Getting Locale Information in OCI
    Mapping Locale Information Between Oracle and Other Standards
    Manipulating Strings in OCI
    Classifying Characters in OCI
    Converting Character Sets in OCI
    OCI Messaging Functions
    lmsgen Utility

11 Character Set Migration
    Overview of Character Set Migration
    Data Truncation
    [...]

... territory definition files in Oracle Database 10g Release 1

See Also: "Obsolete Locale Data" on page A-29

1 Overview of Globalization Support

This chapter provides an overview of Oracle globalization support. It includes the following topics:
■ Globalization Support Architecture
■ Globalization Support Features

Globalization Support Architecture

Oracle's globalization support enables you to store,...
sets
■ It supports the Unicode datatype based on the Unicode standard

See Also:
■ Chapter 6, "Supporting Multilingual Databases with Unicode"
■ Chapter 7, "Programming with Unicode"
■ "Enabling Multilingual Support with Unicode Datatypes" on page 6-6

Globalization Support Features

Oracle's standard features include:
■ Language Support...

... 6, "Supporting Multilingual Databases with Unicode"

2 Choosing a Character Set

This chapter explains how to choose a character set. It includes the following topics:
■ Character Set Encoding
■ Length Semantics
■ Choosing an Oracle Database Character Set
■ Changing the Character Set After Database Creation
■ Monolingual Database Scenario
■ Multilingual Database...

... messages are translated

Territory Support

The database supports cultural conventions that are specific to geographical locations. The default local time format, date format, and numeric and monetary conventions depend on the local territory setting. Setting different NLS parameters allows the database session to use different cultural...

... set up a globalization support environment, choose and migrate a character set, customize locale data, do linguistic sorting, program in a global environment, and program with Unicode.

This preface contains these topics:
■ Intended Audience
■ Documentation Accessibility
■ Structure
■ Related Documents
■ Conventions

Intended Audience

Oracle Database Globalization Support Guide is intended for database...
Features in Globalization

■ Unicode 4.0 Support
Unicode support has been enhanced to support the latest version of the Unicode standard.
See Also: Chapter 6, "Supporting Multilingual Databases with Unicode"

■ Character Set Scanner Utilities Enhancements
The Database Character Set Scanner (CSSCAN) introduces two new parameters, QUERY and COLUMN, which offer finer control in performing selective scanning. Support...

... examples in this guide follow OFA conventions. Refer to Oracle Database Platform Guide for Windows for additional information about OFA compliances and for information about installing Oracle products in non-OFA compliant directories.

What's New in Globalization Support?

This section describes new features of globalization support and provides pointers to additional information.

Oracle Database 10g...

... languages. It ensures that database utilities, error messages, sort order, and date, time, monetary, numeric, and calendar conventions automatically adapt to any native language and locale.

In the past, Oracle's globalization support capabilities were referred to as National Language Support (NLS) features. National Language Support is a subset of globalization support. National Language Support is the ability...

... SQL*Plus is started by the UNIX user who owns the Oracle software from the Oracle home in which the RDBMS software is installed, and SQL*Plus connects to the database through an adapter by specifying the ORACLE_SID parameter, SQL*Plus is considered a client. Its behavior is ruled by client-side NLS parameters.

Another example...
Installing the Database Character Set Scanner System Tables
Starting the Database Character Set Scanner
Creating the Database Character Set Scanner Parameter File
Getting Command-Line Help for the Database Character Set Scanner
Database Character Set Scanner Parameters
Database Character Set Scanner Sessions: Examples
Full Database Scan: Examples