The New C Standard - P4


5.2.1 Character sets

Table 221.2: Relative frequency (most common to least common, with parentheses used to bracket extremely rare letters) of letter usage in various human languages (the English ranking is based on the British National Corpus). Based on Kelk. [729]

Language   Letters
English    etaoinsrhldcumfpgwybvkxjqz
French     esaitnrulodcmpộvqfbghjxốyờzõỗợựụỷùkởw
Norwegian  erntsilakodgmvfupbhứjyồổcwzx(q)
Swedish    eantrsildomkgvọfhupồửbcyjxwzộq
Icelandic  anriestulgmkfhvoỏỵớdjúbyổỳửpộ ` ycxwzq
Hungarian  eatlnskomzrigỏộydbvhj ofupửúcuớỳỹxw(q)

222 The representation of each member of the source and execution basic character sets shall fit in a byte.

Commentary
This is a requirement on the implementation. The definition of character already specifies that it fits in a byte. However, a character constant has type int, which could be thought to imply that the value representation of characters need not fit in a byte. This wording clarifies the situation. The representation of members of the basic execution character set is also required to be a nonnegative value.

C++
1.7p1: A byte is at least large enough to contain any member of the basic execution character set and . . .
This requirement reverses the dependency given in the C Standard, but the effect is the same.

Common Implementations
On hosts where characters have a width of 16 or 32 bits, that choice has usually been made because of addressability issues (pointers only being able to point at storage on 16- or 32-bit address boundaries). It is not usually necessary to increase the size of a byte because of representational issues to do with the character set. In the EBCDIC character set, the value of 'a' is 129 (in Ascii it is 97).
If the implementation-defined value of CHAR_BIT is 8, then this character, and some others, will not be representable in the type signed char (in most implementations the representation actually used is the negative value whose least significant eight bits are the same as those of the corresponding bits in the positive value, in the character set). In such implementations the type char will need to have the same representation as the type unsigned char.
The ICL 1900 series used a 6-bit byte. Implementing this requirement on such a host would not have been possible.

Coding Guidelines
A general principle of coding guidelines is to recommend against the use of representation information. In this case the standard is guaranteeing that a character will fit within a given amount of storage. Relying on this requirement might almost be regarded as essential in some cases.

Example
    void f(void)
    {
    char C_1 = 'W'; /* Guaranteed to fit in a char. */
    char C_2 = '$'; /* Not guaranteed to fit in a char. */
    signed char C_3 = 'W'; /* Not guaranteed to fit in a signed char. */
    }

June 24, 2009 v 1.2

223 In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

Commentary
This is a requirement on the implementation. The Committee realized that a large number of existing programs depended on this statement being true. It is certainly true for the two major character sets used in the English-speaking world, Ascii and EBCDIC, and for all of the human language digit encodings specified in Unicode, see Table 797.1. The Committee thus saw fit to bless this usage. Not only is it possible to perform relational comparisons on the digit characters (e.g., '0' < '1' is always true) but arithmetic operations can also be performed (e.g., '0' + 1 == '1').
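The contiguity guarantee for digit characters is what makes the familiar idiom of subtracting '0' portable. The following is a minimal sketch of that idiom; the helper names digit_value and digit_char are hypothetical and do not come from the standard or from this text.

```c
/* Hypothetical helper: numeric value of a decimal digit character,
   or -1 if c is not a digit. It relies only on '0'..'9' being
   contiguous, which sentence 223 guarantees. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

/* Hypothetical helper: digit character for a value in the range 0..9.
   The caller is assumed to pass a value in that range. */
int digit_char(int value)
{
    return '0' + value;
}
```

Because no character codes are hard-wired, the same source should behave identically on an Ascii host and an EBCDIC host.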
A similar statement for the alphabetic characters cannot be made because it would not be true for at least one character set in common use (e.g., EBCDIC).

C++
The above wording has been proposed as the response to C++ DR #173.

Other Languages
Most languages that have not recently had their specifications updated do not specify any representational properties for the values of their execution character sets. Java specifies the use of the Unicode character set (newer versions of the language specify newer versions of the Unicode Standard; all of which are the same as Ascii for their first 128 values), so this statement also holds true. Ada specifies the subset of ISO 10646 known as the Basic Multilingual Plane (the original language standard specified ISO 646).

Coding Guidelines
This requirement on an implementation provides a guarantee of representation information that developers can make use of (e.g., in relational comparisons, see Table 866.3). The following are suggested wordings for deviations from the guideline recommendation dealing with making use of representation information.

Dev 569.1 An integer character constant denoting a digit character may appear in the visible source as the operand of an additive operator.

Example
    #include <stdio.h>

    extern char c_glob = '4';

    int main(void)
    {
    if ('0' + 3 == '3')
       printf("Sentence 221 is TRUE\n");

    if (c_glob < '5')
       printf("Sentence 221 may be TRUE\n");
    if (c_glob < 53) /* '5' == 53 in ASCII */
       printf("Sentence 221 does not apply\n");
    }

224 In source files, there shall be some way of indicating the end of each line of text;

Commentary
This is a requirement on the implementation. The C library makes a distinction between text and binary files. However, there is no requirement that source files exist in either of these forms.
The worst-case scenario: in a host environment that did not have a native method of delimiting lines, an implementation would have to provide or define its own convention and supply tools for editing such files. Some integrated development environments do define their own conventions for storing source files and other associated information.

C++
The C++ Standard does not specify this level of detail (although it does refer to end-of-line indicators, 2.1p1n1).

Common Implementations
Unicode Technical Report #13: "Unicode newline guidelines" discusses the issues associated with representing new-lines in files. The ISO 6429 standard also defines NEL (NExt Line, hexadecimal 0x85) as an end-of-line indicator. The Microsoft Windows convention is to indicate the end-of-line with a carriage return/line feed pair, \r\n (a convention that goes back through CP/M to DEC RT-11); the Unix convention is to use a single line feed character, \n; the Macintosh convention is to use the carriage return character, \r.
Some mainframes implement a form of text file that mimics punched cards by having fixed-length lines. Each line contains the same number of characters, often 80. The space after the last user-written character is sometimes padded with spaces; at other times it is padded with null characters.

225 this International Standard treats such an end-of-line indicator as if it were a single new-line character.

Commentary
The standard is not interested in the details of the byte representation of end-of-line on storage media. It makes use of the concept of end-of-line and uses the conceptual simplification of treating it as if it were a single character.

C++
2.1p1n1: . . . (introducing new-line characters for end-of-line indicators) . . .

226 In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line.
Commentary
This is a requirement on the implementation. These characters form part of the set of 96 execution character set members (counting the null character) defined by the standard, plus new line which is introduced in translation phase 1. However, these characters are not in the basic source character set, and are represented in it using escape sequences.

Other Languages
Few other languages include the concept of control characters, although many implementations provide semantics for them in source code (they are usually mapped exactly from the source to the execution character set). Java defines the same control characters as C and gives them their equivalent Ascii values. However, it does not define any semantics for these characters.

Common Implementations
ECMA-48 Control Functions for Coded Character Sets, Fifth Edition (available free from their Web site, http://www.ecma-international.ch ) was fast-tracked as the third edition of ISO/IEC 6429. This standard defines significantly more control functions than those specified in the C Standard.

227 If any other characters are encountered in a source file (except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token), the behavior is undefined.

Commentary
The standard does not prohibit such characters from occurring in a source file outright. The Committee was aware of implementations that used such characters to extend the language. For instance, the use of the @ character in an object definition to specify its address in storage. The list of exceptions is extensive. The only usage remaining, for such characters, is as a punctuator. Any other character has to be accepted as a preprocessing token. It may subsequently, for instance, be stringized.
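As an illustration of the stringizing case, the sketch below passes @ to the # operator, so the character only ever exists as a preprocessing token. The names STRINGIZE and at_sign_text are hypothetical. The expectation that this translates cleanly follows from the exception quoted in sentence 227; a particular translator may still choose to issue a warning.

```c
#define STRINGIZE(x) #x   /* hypothetical macro name */

/* The @ below is never converted to a token: the # operator turns
   the preprocessing token into the string literal "@". */
const char *at_sign_text(void)
{
    return STRINGIZE(@);
}
```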
It is the attempt to convert this preprocessing token into a token where the undefined behavior occurs.

C90
Support for additional characters in identifiers is new in C99.

C++
2.1p1: Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character.
The C++ Standard specifies the behavior and a translator is required to handle source code containing such a character. A C translator is permitted to issue a diagnostic and fail to translate the source code.

Other Languages
Most languages regard the appearance of an unknown character in the source as some form of error. Like C, most language implementations support additional characters in string literals and comments.

Common Implementations
Most implementations generate a diagnostic, either when the preprocessing token containing one of these characters is converted to a token, or as a result of the very likely subsequent syntax violation. Some implementations [728] define the @ character to be a token, its usual use being to provide the syntax for specifying the address at which an object is to be placed in storage. It is generally followed by an integer constant expression.

Coding Guidelines
An occurrence of a character outside of the basic source character set, in one of these contexts, is most likely to be a typing mistake and is very likely to be diagnosed by the translator. The other possibility is that such characters were intended to be used because use is being made of an extension. This issue is discussed elsewhere.

Example
    static int glob @ 0x100; /* Put glob at location 0x100. */

228 A letter is an uppercase letter or a lowercase letter as defined above;

Commentary
This defines the term letter.
There is a third kind of case that characters can have, titlecase (a term sometimes applied to words where the first letter is in uppercase, or titlecase, and the other letters are in lowercase). In most instances titlecase is the same as uppercase, but there are a few characters where this is not true; for instance, the titlecase of the Unicode character U+01C9, lj, is U+01C8, Lj, and its uppercase is U+01C7, LJ.

C90
This definition is new in C99.

229 in this International Standard the term does not include other characters that are letters in other alphabets.

Commentary
All implementations are required to support the basic source character set to which this terminology applies. Annex D lists those universal character names that can appear in identifiers. However, they are not referred to as letters (although they may well be regarded as such in their native language).
The term letter assumes that the orthography (writing system) of a language has an alphabet. Some orthographies, for instance Japanese, don't have an alphabet as such (let alone the concept of upper- and lowercase letters). Even when the orthography of a language does include characters that are considered to be matching upper- and lowercase letters by speakers of that language (e.g., æ and Æ, å and Å), the C Standard does not define these characters to be letters.

C++
The definition used in the C++ Standard, 17.3.2.1.3 (the footnote applies to C90 only), implies this is also true in C++.

Coding Guidelines
The term letter has a common usage meaning in a number of different languages. Developers do not often use this term in its C Standard sense. Perhaps the safest approach for coding guideline documents to take is to avoid use of this term completely.

230 The universal character name construct provides a way to name other characters.

Commentary
In theory all characters on planet Earth and beyond.
In practice, those defined in ISO 10646.

C90
Support for universal character names is new in C99.

Other Languages
Other language standards are slowly moving to support ISO 10646. Java supports a similar concept.

Common Implementations
Support for these characters is relatively new. It will take time before similarities between implementations become apparent.

231 Forward references: universal character names (6.4.3), character constants (6.4.4.4), preprocessing directives (6.10), string literals (6.4.5), comments (6.4.9), string (7.1.1).

5.2.1.1 Trigraph sequences

232 Before any other processing takes place, each occurrence of one of the following sequences of three characters (called trigraph sequences 12)) is replaced with the corresponding single character.

Commentary
Trigraphs were an invention of the C committee. They are a method of supporting the input (into source files, not executing programs) and the printing of some C source characters in countries whose alphabets, and keyboards, do not include them in their national character set. Digraphs, discussed elsewhere, are another sequence of characters that are replaced by a corresponding single character. The \? escape sequence was introduced to allow sequences of ?s to occur within string literals.
The wording was changed by the response to DR #309 (the previous wording began "All occurrences in a source file").

Other Languages
Until recently many computer languages did not attempt to be as worldly as C, requiring what might be called an Ascii keyboard. Pascal specifies what it calls lexical alternatives for some lexical tokens. The character sequences making up these lexical alternatives are only recognized in a context where they can form a single, complete token.

Common Implementations
On the Apple Macintosh host, the notation '????' is used to denote the unknown file type.
Translators in this environment often disable trigraphs by default to prevent unintended replacements from occurring.

233
    ??=  #      ??)  ]      ??!  |
    ??(  [      ??'  ^      ??>  }
    ??/  \      ??<  {      ??-  ~

Commentary
The above sequences were chosen to minimize the likelihood of breaking any existing, conforming, C source code.

Other Languages
Many languages use a small subset, or none, of these problematic source characters, reducing the potential severity of the problem. The Pascal standard specifies (. and .) as alternative lexical representations of [ and ] respectively.

Common Implementations
Recognizing trigraph sequences entails a check against every character read in by the translator. Performance profiling of translators has shown that a large percentage of time is spent in the lexer. A study by Waite [1469] found that 41% of total translation time was spent in a handcrafted lexer (with little code optimization performed by the translator). An automatically produced lexer (the lex tool was used) consumed 3 to 5 times as much time.
One vendor, Borland, which took pride in, and was known for, the speed at which its translators operated, did not include trigraph processing in the main translator program. A stand-alone utility was provided to perform trigraph processing. Those few programs that used trigraphs needed to be processed by this utility, generating a temporary file that was processed by the main translator program. While using this pre-preprocessor was a large overhead for programs that used trigraphs, performance was not degraded for source code that did not contain them.

Usage
There are insufficient trigraphs in the visible form of the .c files to enable any meaningful analysis of the usage of different trigraphs to be made.

234 No other trigraph sequences exist.

Commentary
The set of characters for which trigraphs were created to provide an alternative spelling are known, and unlikely to be extended.
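The per-character check that trigraph recognition requires can be sketched as a single pass over a buffer, using the nine mappings the standard defines (??> maps to } and ??< maps to {). This is an illustration under stated assumptions, not how any particular translator implements translation phase 1; the names trigraph_char and replace_trigraphs are hypothetical.

```c
/* Hypothetical helper: replacement character for the trigraph whose
   third character is c, or 0 if "??" followed by c is not a trigraph. */
static char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case ')':  return ']';
    case '!':  return '|';
    case '(':  return '[';
    case '\'': return '^';
    case '>':  return '}';
    case '/':  return '\\';
    case '<':  return '{';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copies src to dst, replacing each trigraph with
   its single character. The output never grows, so dst needs room for
   strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r;
        /* src[1] is only read when src[0] is '?', and src[2] only when
           src[1] is '?', so no read goes past the terminating null. */
        if (src[0] == '?' && src[1] == '?' &&
            (r = trigraph_char(src[2])) != 0) {
            *dst++ = r;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Every input character is examined at least once, which is the cost the performance discussion above refers to. A ? that does not begin a trigraph is copied unchanged.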
Coding Guidelines
Although no other trigraph sequences exist, sequences of two adjacent question marks in string literals may lead to confusion. Developers may be unsure about whether they represent a trigraph or not. Using the escape sequence \? on at least one of these question marks can help clarify the intent.

Example
    char *unknown_trigraph = "??++";
    char *cannot_be_trigraph = "?\? ";

Usage
The visible form of the .c files contained 593 (.h 10) instances of two question marks (i.e., ??) in string literals that were not followed by a character that would have created a trigraph sequence.

235 Each ? that does not begin one of the trigraphs listed above is not changed.

Commentary
Two ?s followed by any character other than those listed above is not a trigraph.

Common Implementations
No implementation is known to define any other sequence of ?s to be replaced by other characters.

Coding Guidelines
No other trigraph sequences are defined by the standard, have been notified for future addition to the standard, or are used in known implementations. Placing restrictions on other uses of other sequences of ?s provides no benefit.

236 EXAMPLE 1
    ??=define arraycheck(a,b) a??(b??) ??!??! b??(a??)
becomes
    #define arraycheck(a,b) a[b] || b[a]

Commentary
This example was added by the response to DR #310 and is intended to show a common trigraph usage.

237 EXAMPLE 2 The following source line
    printf("Eh???/n");
becomes (after replacement of the trigraph sequence ??/)
    printf("Eh?\n");

Commentary
This illustrates the sometimes surprising consequences of trigraph processing.

5.2.1.2 Multibyte characters

238 The source character set may contain multibyte characters, used to represent members of the extended character set.

Commentary
The mapping from physical source file multibyte characters to the source character set occurs in translation phase 1.
Whether multibyte characters are mapped to UCNs, single characters (if possible), or remain as multibyte characters depends on the model used by the implementation.

C++
The representations used for multibyte characters, in source code, invariably involve at least one character that is not in the basic source character set:
2.1p1: Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character.
The C++ Standard does not discuss the issue of a translator having to process multibyte characters during translation. However, implementations may choose to replace such characters with a corresponding universal-character-name.

Other Languages
Most programming languages do not contain the concept of multibyte characters.

Common Implementations
Support for multibyte characters in identifiers, using a shift state encoding, is sometimes seen as an extension. Support for multibyte characters in this context using UCNs is new in C99. The most common implementations have been created to support the various Japanese character sets.

Coding Guidelines
The standard does not define how multibyte characters are to be represented. Any program that contains them is dependent on a particular implementation to do the right thing. Converting programs that existed before support for universal character names became available may not be economically viable.
Some coding guideline documents recommend against the use of characters that are not specified in the C Standard. Simply prohibiting multibyte characters because they rely on implementation-defined behavior ignores the cost/benefit issues applicable to the developers who need to read the source. These are complex issues for which your author has insufficient experience with which to frame any applicable guideline recommendations.
239 The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set.

Commentary
Multibyte characters could be read from a file during program execution, or even created by assigning byte values to contiguous array elements. These multibyte sequences could then be interpreted by various library functions as representing certain (wide) characters.
The execution character set need not be fixed at translation time. A program's locale can be changed at execution time (by a call to the setlocale function). Such a change of locale can alter how multibyte characters are interpreted by a library function.

C++
There is no explicit statement about such behavior being permitted in the C++ Standard. The C header <wchar.h> (specified in Amendment 1 to C90) is included by reference and so the support it defines for multibyte characters needs to be provided by C++ implementations.

Other Languages
Most languages do not include library functions for handling multibyte characters.

Coding Guidelines
Use of multibyte characters during program execution is an applications issue that is outside the scope of these coding guidelines.

240 For both character sets, the following shall hold:

Commentary
This is a set of requirements that applies to an implementation. It is the minimum set of guaranteed requirements that a program can rely on.

Coding Guidelines
The set of requirements listed in the following C-sentences is fairly general. Dealing with implementations that do not meet the requirements listed in these sentences is outside the scope of these coding guidelines.

241 — The basic character set shall be present and each character shall be encoded as a single byte.

Commentary
This is a requirement on the implementation. It prevents an implementation from being purely multibyte-based.
The members of the basic character set are guaranteed to always be available and fit in a byte.

Common Implementations
An implementation that includes support for an extended character set might choose to define CHAR_BIT to be 16 (most of the commonly used characters in ISO 10646 are representable in 16 bits, each in UTF-16; at least those likely to be encountered outside of academic research and the traditional Chinese written in Hong Kong). Alternatively, an implementation may use an encoding where the members of the basic character set are representable in a byte, but some members of the extended character set require more than one byte for their encoding. One such representation is UTF-8.

242 — The presence, meaning, and representation of any additional members is locale-specific.

Commentary
On program startup the execution locale is the "C" locale. During execution it can be set under program control. The standard is silent on what the translation time locale might be.

Common Implementations
The full Ascii character set is used by a large number of implementations.

Coding Guidelines
It often comes as a surprise to developers to learn what characters the C Standard does not require to be provided by an implementation. Source code readability could be affected if any of these additional members appear within comments and cannot be meaningfully displayed. Balancing the benefits of using additional members against the likelihood of not being able to display them is a management issue.
The use of any additional members during the execution of a program will be driven by the user requirements of the application. This issue is outside the scope of these coding guidelines.
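UTF-8, mentioned under sentence 241, is an encoding in which single-byte characters and longer sequences can be told apart from the first byte alone. A minimal sketch of that classification follows; the function name utf8_sequence_length is hypothetical, and the sketch does not validate continuation bytes or reject overlong forms.

```c
/* Hypothetical helper: number of bytes in the UTF-8 sequence that
   starts with lead byte b, or 0 if b cannot start a sequence
   (it is a continuation byte or not a valid lead byte). */
int utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: single-byte character */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx or 11111xxx */
}
```

On an Ascii-based host every member of the basic character set falls in the first case, which is how such an implementation can satisfy the single-byte requirement while still supporting an extended character set.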
243 — A multibyte character set may have a state-dependent encoding, wherein each sequence of multibyte characters begins in an initial shift state and enters other locale-specific shift states when specific multibyte characters are encountered in the sequence.

Commentary
State-dependent encodings are essentially finite state machines. When a state encoding, or any multibyte encoding, is being used the number of characters in a string literal is not the same as the number of bytes encountered before the null character. There is no requirement that the sequence of shift states and characters representing an extended character be unique.
There are situations where the visual appearance of two or more characters is considered to be a single character. For instance (using ISO 10646 as the example encoding), the two characters LATIN SMALL LETTER O (U+006F) followed by COMBINING CIRCUMFLEX ACCENT (U+0302) represent the grapheme cluster (the ISO 10646 term [334] for what might be considered a user character) ô, not the two characters o ^. Some languages use grapheme clusters that require more than one combining character, for instance ô ¯. Unicode (not ISO 10646) defines a canonical accent ordering to handle sequences of these combining characters. The so-called combining characters are defined to combine with the character that comes immediately before them in the character stream. For backwards compatibility with other character encodings, and ease of conversion, the ISO 10646 Standard provides explicit codes for some accent characters; for instance, LATIN SMALL LETTER O WITH CIRCUMFLEX (U+00F4) also denotes ô.
A character that is capable of standing alone, the o above, is known as a base character. A character that modifies a base character, the circumflex accent above, is known as a combining character (the visible forms of some combining characters are called diacritic characters).
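The gap between character count and byte count can be made concrete by spelling out the bytes of the two representations of ô discussed above. UTF-8 is an assumption here, since the text gives ISO 10646 code points without fixing an encoding, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* LATIN SMALL LETTER O (U+006F) followed by COMBINING CIRCUMFLEX
   ACCENT (U+0302), written out as UTF-8 bytes. */
static const char decomposed[] = "\x6f\xcc\x82";

/* LATIN SMALL LETTER O WITH CIRCUMFLEX (U+00F4) as UTF-8 bytes. */
static const char precomposed[] = "\xc3\xb4";

/* Byte counts as seen by strlen; both strings display as one user character. */
size_t decomposed_bytes(void)  { return strlen(decomposed); }
size_t precomposed_bytes(void) { return strlen(precomposed); }
```

Under these assumptions the decomposed form occupies three bytes and the precomposed form two, and a byte-wise comparison such as strcmp treats them as different strings even though they denote the same visible character.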
Most character encodings do not contain any combining characters, and those that do contain them rarely specify whether they should occur before or after the modified base character. Claims that a particular standard requires the combining character to occur before the base character it modifies may be based on a misunderstanding. For instance, ISO/IEC 6937 specifies a single-byte encoding for base characters and a double-byte encoding for some visual combinations of (diacritic + base) Latin letter. These double-byte encodings are precomposed in the sense that they represent a single character; there is no single-byte encoding for the diacritic character, and the representation of the second byte happens to be the same as that of the single-byte representation of the corresponding base character (e.g., 0xC14F represents LATIN CAPITAL LETTER O WITH GRAVE and 0xC16F represents LATIN SMALL LETTER O WITH GRAVE).

C90
The C90 Standard specified implementation-defined shift states rather than locale-specific shift states.

C++
The definition of multibyte character, 1.3.8, says nothing about encoding issues (other than that more than one byte may be used). The definition of multibyte strings, 17.3.2.1.3.2, requires the multibyte characters to begin and end in the initial shift state.

Common Implementations
Most methods for state-dependent encoding are based on ISO/IEC 2022:1994 (identical to the standard ECMA-35 "Character Code Structure and Extension Techniques", freely available from their Web site, http://www.ecma.ch ). This uses a different structure than that specified in ISO/IEC 10646–1.
The encoding method defined by ISO 2022 supports both 7-bit and 8-bit codes. It divides these codes up into control characters (known as C0 and C1) and graphics characters (known as G0, G1, G2, and G3). In the initial shift state the C0 and G0 characters are in effect.

Table 243.1: Commonly seen ISO 2022 Control Characters. The alternative values for SS2 and SS3 are only available for 8-bit codes.

Name             Acronym  Code Value         Meaning
Escape           ESC      0x1b               Escape
Shift-In         SI       0x0f               Shift to the G0 set
Shift-Out        SO       0x0e               Shift to the G1 set
Locking-Shift 2  LS2      ESC 0x6e           Shift to the G2 set
Locking-Shift 3  LS3      ESC 0x6f           Shift to the G3 set
Single-Shift 2   SS2      ESC 0x4e, or 0x8e  Next character only is in G2
Single-Shift 3   SS3      ESC 0x4f, or 0x8f  Next character only is in G3

Some of the control codes and their values are listed in Table 243.1. The codes SI, SO, LS2, and LS3 are known as locking shifts. They cause a change of state that lasts until the next control code is encountered. A stream that uses locking shifts is said to use stateful encoding.
ISO 2022 specifies an encoding method: it does not specify what the values within the range used for graphic characters represent. This role is filled by other standards, such as ISO 8859. A C implementation that supports a state-dependent encoding chooses which character sets are available in each state that it supports (the C Standard only defines the character set for the initial shift state).

Table 243.2: An implementation where G1 is ISO 8859–1, and G2 is ISO 8859–7 (Greek).

Encoded values     0x61  0x62  0x63  0x0e  0xe6  0x1b 0x6e  0xe1  0xe2  0xe3  0x0f
Control character                    SO          LS2                          SI
Graphic character  a     b     c           æ                α     β     γ

Having to rely on implicit knowledge of what character set is intended to be used for G1, G2, and so on, is not always satisfactory. A method of specifying the character sets in the sequence of bytes is needed. The [...]
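The finite-state nature of a locking-shift encoding can be sketched as a small decoder that tracks which graphic set is currently in effect. The sketch handles only SI, SO, and the ESC-prefixed LS2 and LS3 codes from Table 243.1; single shifts and character-set designation are left out. The type and function names are hypothetical.

```c
#include <stddef.h>

enum gset { G0, G1, G2, G3 };   /* hypothetical names for the graphic sets */

/* Walks len bytes of in, starting in the initial shift state (G0).
   For each graphic byte, records the set in effect in sets[] and the
   byte itself in bytes[]; returns the number of graphic bytes found.
   Both output arrays need room for len entries. */
size_t decode_locking_shifts(const unsigned char *in, size_t len,
                             enum gset *sets, unsigned char *bytes)
{
    enum gset current = G0;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c == 0x0f) {                                   /* SI */
            current = G0;
        } else if (c == 0x0e) {                            /* SO */
            current = G1;
        } else if (c == 0x1b && i + 1 < len && in[i + 1] == 0x6e) {
            current = G2;                                  /* LS2 */
            i++;                                           /* skip 0x6e */
        } else if (c == 0x1b && i + 1 < len && in[i + 1] == 0x6f) {
            current = G3;                                  /* LS3 */
            i++;                                           /* skip 0x6f */
        } else {
            /* Any other byte, including an ESC not followed by 0x6e or
               0x6f, is treated as a graphic byte in this sketch. */
            sets[n] = current;
            bytes[n] = c;
            n++;
        }
    }
    return n;
}
```

Given the eleven bytes of Table 243.2, the decoder reports seven graphic bytes: three in G0, one in G1, and three in G2. Mapping each (set, byte) pair to an actual character would need the tables of whichever standards the implementation assigns to each set.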
... value occurs at translation time. The execution time value actually received by the display device is outside the scope of the standard. The library function fputc could map the value represented by these single char object into any sequence of bytes necessary.

C++
This requirement can be deduced from 2.2p3.

Other Languages
Java explicitly ...

... the scope of these coding guidelines.

258 \b (backspace) Moves the active position to the previous position on the current line.

Commentary
The standard specifies that the active position is moved. It says nothing about what might happen to any character displayed prior to the backspace at the new current active position.

5.2.2 Character display semantics

... C99 in that it renders the behavior of the program as unspecified. The program simply writes the character; how the device handles the character is beyond its control.

C++
The C++ Standard does not discuss character display semantics.

Common Implementations
The most common implementation behavior is to ignore the request, leaving the active position unchanged. Some VDUs have the ability to wrap back to the ...

... viewed as having the same effect as writing the appropriate number of backspace characters. However, the effect of writing a backspace character might be to erase the previous character, while a carriage return does not cause the contents of a line to be erased. Like backspace, the standard says nothing about the effect of writing characters at the position on a line that ...
... says nothing about the order in which lines are organized. The vertical tab (and new line) escape sequences move the active position in the same line direction. There is no escape sequence for moving the active position in the opposite direction, similar to backspace for movement within a line. The concept of vertical tabulation implicitly invokes the concept of current page. This concept is primarily applied ...

... character. Other devices write all subsequent characters, up to the next new-line character, at the final position. On some displays, writing to the bottom right corner of a display has an effect other than displaying the character output, for instance, clearing the screen or causing it to scroll. Both termcap and ncurses provide configuration options that specify whether writing to this display location ...

... written characters (which can occur in Arabic). This specification implies that the positions are a fixed width apart. The graphic representation of a character is known as a glyph.

C++
The C++ Standard does not discuss character display semantics.

Common Implementations
In some oriental languages, character glyphs can usually be organized into two groups, one being twice the width of the other. Implementations ...

... described here.

Coding Guidelines
A program cannot assume that any of the functionality described will occur when the escape sequence is sent to a display device. The root cause for the variability in support for the intended behaviors is the variability of the display devices. In most cases an implementation's action is to send the binary representation of the escape sequence to the device. The manufacturers ...
... lines on the page of the display device being written. However, it does place a dependency on the characteristics of the display device being known to the host executing the program, or on the device itself, to respond to the data sent to it.

261 \n (new line) Moves the active position to the initial position of the next line.

Commentary
What happens to the preceding ...

... applicable to C++ source files.

Coding Guidelines
In some cases source files can contain multibyte characters and be translated by translators that have no knowledge of the structure of these multibyte characters. The developer is relying on the translator ignoring them in comments containing their native language, or simply copying the character sequence in a string literal into the program image. In other ...

Date posted: 26/01/2014, 07:20
