Software Solutions for Engineers and Scientists, Episode 3

        ADD     CL,CL           ; Double number to get shift count
        SHL     DX,CL           ; Shift mask bits left
        AND     BX,DX           ; Mask off all other tag bits
        SHR     BX,CL           ; Shift unmasked tag bits right
;
;***************************|
; move message to caller's  |
;     buffer and exit       |
;***************************|
; At this point BX holds the tag code
        MOVZX   EAX,BX          ; Tag code to EAX, upper bits cleared
; The value in AX is multiplied by 8 to obtain the offset
; of the corresponding tag code text message
; Message is then moved to the caller's buffer addressed by EDI
        LEA     ESI,TAG_MESS_TBL ; Offset of table
        MOV     CL,8            ; Length of each message
        MUL     CL              ; AX -> offset of correct message
        ADD     ESI,EAX         ; Add to table offset
; At this point:
;       ESI -> 8-byte number type message
;       EDI -> caller's buffer with 8 bytes minimum space
        MOV     ECX,8           ; Counter for 8 bytes
TRANSFER_8:
        MOV     AL,[ESI]        ; Get message character
        MOV     [EDI],AL        ; Place in caller's buffer
        INC     ESI             ; Bump buffer pointers
        INC     EDI
        LOOP    TRANSFER_8
; End of processing
        CLD
        RET
_GET_TAG        ENDP

SOFTWARE ON-LINE

The GET_TAG procedure is found in the Un32_4 module of the MATH32 library, in the book's on-line software.

The contents of the Stack Top register can be determined more precisely using the FXAM or FTST instructions and interpreting the resulting condition code bits, as described in Section 7.0.3.

Instruction and Data Pointers

The Instruction and Data Pointer registers are part of the math unit environment (see Figure 7.6). These two registers are jointly called the exception pointers. After each floating-point instruction is executed, the math unit automatically saves its operation code and address, as well as the operand's address if one was contained in the instruction. This data, which is saved internally in the math unit, can be examined by storing the environment in memory. The operation of saving and inspecting the environment is shown in the GET_TAG procedure listed previously.
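
As a rough companion to the GET_TAG listing above, the following C sketch models its shift-and-mask step and the 8-byte message copy. It assumes that CL holds a physical register number from 0 to 7 and that DX starts out as the 2-bit mask 11B, since the opening of the procedure is not shown here. The names tag_code and copy_tag_message, and the message texts in the table, are hypothetical; only the 8-byte entry length comes from the listing. The tag encodings (00 valid, 01 zero, 10 special, 11 empty) are the standard x87 ones.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for TAG_MESS_TBL: one 8-byte, space-padded
   message per tag code. The wording of the messages is assumed. */
static const char TAG_MESS_TBL[4][9] = {
    "VALID   ",   /* tag 00: valid number            */
    "ZERO    ",   /* tag 01: zero                    */
    "SPECIAL ",   /* tag 10: NaN, infinity, denormal */
    "EMPTY   "    /* tag 11: empty register          */
};

/* Return the 2-bit tag of physical register reg (0..7). */
unsigned tag_code(uint16_t tag_word, unsigned reg)
{
    unsigned shift = reg * 2u;            /* ADD CL,CL              */
    unsigned mask  = 0x3u << shift;       /* SHL DX,CL (DX = 11B)   */
    return (tag_word & mask) >> shift;    /* AND BX,DX ; SHR BX,CL  */
}

/* Copy the 8-byte message for the register's tag into buf.
   No terminating NUL is written, as in the TRANSFER_8 loop. */
void copy_tag_message(uint16_t tag_word, unsigned reg, char buf[8])
{
    memcpy(buf, TAG_MESS_TBL[tag_code(tag_word, reg)], 8);
}
```

With a tag word of 0FFFFH every register decodes as 3 (empty), which is the state that FINIT leaves behind.
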
The information provided by the instruction and data pointers is often used by exception handler routines to identify the instruction that generated an error.

In the 80287, 80387, and the math unit of the 486 and the Pentium, the storage formats for the instruction and data pointers depend on the operating mode as well as the memory model. In real mode the value stored is in the form of a 20-bit physical address and an 11-bit math unit opcode. In protected mode the value stored is the 32-bit virtual address of the last coprocessor instruction. The 8087 stores this data in the real mode format described above. Figure 7.8 is a map of the data stored in the exception pointers while the processor is operating in 16-bit real mode.

Figure 7.8 Exception Pointers Memory Layout

Notice that on the 8087 the instruction address saved in the environment area does not include a possible segment override prefix. This was changed in the 80287 so that the address pointer includes a possible segment override. A portable error handler routine would have to take this difference into account.

As shown in Figure 7.6, the location of the exception pointers within the environment area changes according to the memory model. In the 16-bit model the instruction pointer is at byte offset 6 from the start of the environment area and the data pointer at byte offset 10. In the flat 32-bit memory model the instruction pointer is at byte offset 12 and the data pointer at byte offset 20. The following code fragment shows how the various data elements of the math unit environment area can be defined in the 32-bit memory model.
        .486
        .MODEL  flat
        .DATA
;
; Storage for environment variables in 32-bit memory model
ENVIRO_FPU      DD      0       ; FPU control word - 4 bytes
STATUS_FPU      DD      0       ; FPU status word  - 4 bytes
TAG_WORD        DD      0       ; FPU tag word     - 4 bytes
INST_POINTER    DD      0       ; Instruction ptr  - 8 bytes
                DD      0
DATA_POINTER    DD      0       ; Data pointer     - 8 bytes
                DD      0
;                                 =========
;                          total   28 bytes

[Figure 7.8: exception pointers in 16-bit real mode. The instruction pointer holds the instruction address (20 bits) and the opcode (11 bits); the data pointer holds the data address (20 bits); the remaining bits are unused. Note: the 5 most significant bits of the opcode field are always 11011B.]

In the 16-bit memory model the various areas can be defined as follows:

        .486
        .MODEL  medium
        .DATA
;
; Storage for environment variables in 16-bit memory model
ENVIRO_FPU      DW      0       ; FPU control word - 2 bytes
STATUS_FPU      DW      0       ; FPU status word  - 2 bytes
TAG_WORD        DW      0       ; FPU tag word     - 2 bytes
INST_POINTER    DD      0       ; Instruction ptr  - 4 bytes
DATA_POINTER    DD      0       ; Data pointer     - 4 bytes
;                                 =========
;                          total   14 bytes

The different memory layout of the math unit environment area compromises the portability of applications that execute in the various memory models. Applications must take these variations into account not only in defining the memory map, but also in coding CPU instructions that access the stored data. In the preceding code fragments the various data elements of the math unit environment are defined using variables of different sizes. For example, in the 16-bit model the status word is stored in a word variable, while in the 32-bit model it is stored in a doubleword variable. The coding for retrieving the status word into a 16-bit register could be as follows:

        MOV     AX,STATUS_FPU

while in a 32-bit model program the code would have to be changed to:

        MOV     EAX,STATUS_FPU
        AND     EAX,0FFFFH      ; Clear unused bits

7.0.5 Math Unit State Area

The coprocessor state area is a data area that holds the environment area plus the eight registers in the math unit stack.
Since the state area includes the environment, its size changes according to the memory model. In the 16-bit model the state area consists of 94 bytes, while in the 32-bit flat model it requires 108 bytes. The difference of 14 bytes is the difference in size of the environment area in the two models, as discussed in the previous section. The math unit instruction set contains the FSAVE instruction, which stores the state area in memory. The FRSTOR instruction serves to reload a saved state into the math unit. Figure 7.9 is a map of the data stored in the state area.

System and application software usually save the coprocessor state whenever they wish to clean up the math unit for a new task. In a multitasking environment this can occur at every context or task switch. In addition, an interrupt service routine or an exception handler saves the math unit state in order to use the coprocessor for its own calculations; later the math unit is restored to its original contents.

Figure 7.9 Memory Map of Math Unit State Area

[Figure not reproduced. Environment area: control register, status register, tag word, instruction pointer, and data pointer. Register stack area: ST(0) through ST(7), each with sign (S), exponent, and significand fields. Offsets run from 0 to 92 in the 16-bit memory model and from 0 to 106 in the 32-bit flat memory model.]

7.1 Math Unit Instruction Patterns

You have seen that the math unit Data registers seem to share the characteristics of explicit storage units and those of a stack structure. Another feature of the math unit is that its instruction set can access memory operands using all the memory addressing modes of the central processor. This is due to the fact that the CPU performs all address calculations on behalf of the math unit.
The result is an abundance of math unit operand patterns that are suitable for most programming situations.

A useful coding style is to use the comment area to keep track of the state of the math unit register stack. In this book we often use this notation style, although text space limitations often force the use of abbreviations that may be somewhat cryptic. In the code fragments listed in the following section we labeled three columns with the designations of the first three stack registers: ST, ST(1), and ST(2). Thus, the comment field is a snapshot of a portion of the math unit stack after the instruction executes. Examples of this coding style are found in the following sections.

7.1.1 Register Operands

Some math unit instructions can be coded using explicit Numeric Data register operands, for example:

                            ;|   ST   | ST(1)  | ST(2)  |
; Initialize processor
        FINIT               ;| EMPTY  | EMPTY  | EMPTY  |
; Perform operations
        FLD1                ;| 1.0    | EMPTY  | EMPTY  |
        FLDZ                ;| 0.0    | 1.0    | EMPTY  |
        FLDPI               ;| 3.1415 | 0.0    | 1.0    |
        FADD    ST,ST(2)    ;| 4.1415 | 0.0    | 1.0    |
        FADD    ST(1),ST    ;| 4.1415 | 4.1415 | 1.0    |

In this listing the FADD instructions specifically designate which stack registers must be added, and which register holds the sum. Another type of FPU opcode automatically pops the stack after the instruction executes. The mnemonics for these instructions end with the letter “P” (pop), for example, FADDP. The following code fragment illustrates the action of the FADDP instruction.

                            ;|   ST   | ST(1)  | ST(2)  |
; Initialize processor
        FINIT               ;| EMPTY  | EMPTY  | EMPTY  |
; Perform operations
        FLD1                ;| 1.0    | EMPTY  | EMPTY  |
        FLDPI               ;| 3.1415 | 1.0    | EMPTY  |
        FADDP   ST(1),ST    ;| 4.1415 | EMPTY  | EMPTY  |

In the preceding fragment notice that the stack is popped after the instruction executes. Therefore, the destination operand cannot be the Stack Top register; if this were the case, the sum would be destroyed.
Consequently it is illegal to code

        FADDP   ST,ST(i)

The math unit instruction set makes it possible to not designate registers explicitly, but the implicit encoding can sometimes produce unexpected results. For example:

                            ;|   ST   | ST(1)  | ST(2)  |
; Initialize processor
        FINIT               ;| EMPTY  | EMPTY  | EMPTY  |
; Perform operations
        FLD1                ;| 1.0    | EMPTY  | EMPTY  |
        FLDPI               ;| 3.1415 | 1.0    | EMPTY  |
        FADD                ;| 4.1415 | EMPTY  | EMPTY  |

Notice that the FADD instruction is used with implicit operands. In this case the stack registers ST(1) and ST are added and the stack is then popped. The action of coding FADD with no operands is the same as coding FADDP ST(1),ST. However, a programmer might reasonably expect the implicit form to behave like FADD ST,ST(1), which does not pop the stack.

7.1.2 Memory Operands

The math unit can access numeric data stored in memory using any of the five CPU addressing modes: direct, register indirect, base, indexed, and based indexed addressing. A difference between processor and coprocessor memory addressing is that math unit opcodes that reference memory have a single operand. For instance, it is possible to load a memory variable into any of the processor's general purpose registers:

        MOV     AX,MEM_VALUE_1  ; First variable to AX
        MOV     BX,MEM_VALUE_2  ; Second variable to BX
        MOV     DX,MEM_VALUE_1  ; First variable to DX

However, the two-operand format is not valid in the math unit instruction set. This is due to the fact that, if the instruction is a load (FLD, FILD, or FBLD), the destination is always the Stack Top register (ST), while if the operation is a store, the source is assumed to be in the Stack Top register. In instructions that perform calculations, a memory operand is always a source. For example:

        FLD     SINGLE_PREC     ; Memory variable to ST
        FST     DOUBLE_PREC     ; ST stored in memory variable
        FIADD   SHORT_INT       ; ST = ST + integer memory variable
        .
        .
        .
        LEA     BX,DOUBLE_PREC  ; Set pointer to memory variable
        FADD    QWORD PTR [BX]  ; ST = ST + variable -> [BX]

7.2 Math Unit Instruction Set

The math unit instructions are classified into six groups according to their operation. The groups are named data transfer, arithmetic, comparison, transcendental, constant, and processor control. In the following sections we present a brief description of the instructions in each of these groups.

7.2.1 Data Transfer Instructions

The data transfer instructions are used to move numeric data between stack registers, and between registers and memory. Any of the seven math unit data types can be read from memory storage into the Stack Top register. The math unit automatically converts the numeric data into the extended precision format as it is loaded into the register stack. The data transfer instructions automatically update the Tag register. Separate instructions are provided for loading and storing real, integer, and packed binary coded decimal numbers. The FI prefix identifies the integer load and store instructions and the FB prefix the packed BCD transfers.

The FST (store real) instruction transfers the stack top to the destination operand, which can be a memory variable or another stack register. However, FST can only be used to store the stack top into a single or double precision real variable. FSTP (store real and pop) must be used to store into a memory destination in extended precision real format. Constants, special encodings, temporary results, and other operational data that could affect the precision of the final result should always be stored in extended precision format. On the other hand, final results should not be represented in the extended format since this defeats its purpose, which is absorbing rounding and computational errors.

The store opcodes that end in the letter “P” pop the stack after the data transfer is executed.
The encoding FSTP ST(0) pops the stack without a data transfer, effectively discarding the contents of ST(0). Table 7.4 describes the nine opcodes related to math unit data transfer instructions.

Table 7.4 Math Unit Data Transfer Instructions

MNEMONIC   OPERATION                               EXAMPLES

TRANSFER OF REAL NUMBERS
FLD        Load real memory variable or stack      FLD   SINGLE_REAL
           register onto stack top. Value is       FLD   DOUBLE_REAL
           converted to extended real format.      FLD   EXTENDED_REAL
           Coding FLD ST(0) duplicates the         FLD   ST(2)
           stack top.
FST        Store stack top in another stack        FST   ST(3)
           register or in a real memory            FST   SINGLE_REAL
           variable. Rounding is according         FST   DOUBLE_REAL
           to RC field of control word.
FSTP       Store stack top in another stack        FSTP  ST(2)
           register or in a real memory            FSTP  SINGLE_REAL
           variable and pop stack. Rounding        FSTP  DOUBLE_REAL
           is according to RC field in             FSTP  EXTENDED_REAL
           control word.
FXCH       Swap contents of stack top and          FXCH  ST(2)
           another stack register. If no           FXCH
           explicit register, ST(1) is used.

INTEGER TRANSFERS
FILD       Load word, short or long integer        FILD  WORD_INTEGER
           to stack top. Loaded number is          FILD  SHORT_INTEGER
           converted to extended real.             FILD  LONG_INTEGER
FIST       Round stack top to integer.             FIST  WORD_INTEGER
           Rounding is according to the RC         FIST  SHORT_INTEGER
           field in the control word. FIST
           stores in integer memory variable.
           FISTP (see below) must be used to
           store a long integer.
FISTP      Round stack top to integer, per         FISTP WORD_INTEGER
           RC field in the control word, store     FISTP SHORT_INTEGER
           in variable and pop stack.              FISTP LONG_INTEGER

TRANSFER OF PACKED BCD
FBLD       Load packed BCD to stack top.           FBLD  PACKED_BCD
FBSTP      Store stack top as a packed BCD         FBSTP PACKED_BCD
           integer and pop stack. Non-integers
           are rounded before storing.

7.2.2 Nontranscendental Instructions

The math unit nontranscendental instructions provide the basic arithmetic operations required by ANSI/IEEE 754. These are: addition, subtraction, multiplication, division, and remainder. In addition, the math unit instruction set includes several other operations not required by the standard, such as the calculation of square roots, rounding, scaling, partial remainder, change of sign, and the extraction of exponent and significand. In the original Intel literature the nontranscendental instructions were called the arithmetic instructions.

Basic Arithmetic

The fundamental arithmetic instructions that perform addition, subtraction, multiplication, and division are straightforward and uncomplicated. Addition and multiplication are commutative, that is, the result is independent of the order of the operands. In order to extend this symmetry to all fundamental arithmetic operations, the math unit provides opcodes for reversing the operands of subtraction and division. Furthermore, there are separate operand modes for performing integer and real arithmetic. Table 7.5 lists the operand options for the math unit nontranscendental instructions that perform basic arithmetic.

In Table 7.5 notice that if no explicit operand is present in the mnemonic, the math unit operates as a pure stack machine. In this case the source operand is assumed to be in ST and the destination in ST(1). After performing the calculation the result is stored in ST(1) and the stack is popped, effectively replacing both operands with the result. Perhaps a more reasonable way of implementing a classical stack operation is to use an operand in the form ST(1),ST and the pop mnemonic form of the opcode (see Table 7.5). For example, in the instruction FADDP ST(1),ST the sum of ST and ST(1) is placed in ST(1) and the stack is popped.
The result is the same as coding FADD with no operand, but the action of the instruction is more clearly expressed by the explicit encoding.

Table 7.5 Operand Modes for Arithmetic Instructions

INSTRUCTION TYPE     MNEMONIC      OPERAND                 SAMPLE CODING
                     FORMAT        DESTINATION,SOURCE
implicit             F opcode      {ST(1),ST}              FADD
(pop stack)
registers            F opcode      ST(i),ST or ST,ST(i)    FADD  ST,ST(1)
(explicit)
register             F opcode P    ST(i),ST                FADDP ST(2),ST
(explicit and pop)
memory               F opcode      {ST},MEM_VAR            FADD  MEM_VAR
(real number)
memory               FI opcode     {ST},MEM_INT            FIADD MEM_INT
(integer number)

F opcode:   ACTION:
ADD         destination <= destination + source
SUB         destination <= destination - source
SUBR        destination <= source - destination
MUL         destination <= destination · source
DIV         destination <= destination / source
DIVR        destination <= source / destination

Legend: Braces { } indicate implicit operands

Scaling and Square Root

The FSQRT instruction calculates the square root of the number in ST(0). Intel documentation states that the algorithm used in the calculation of the square root ensures that the FSQRT instruction executes faster than ordinary division. At the time of the introduction of the 8087 this level of square root calculation performance had no precedent in commercial floating-point hardware. The result of the square root is accurate to within one-half of the last significand digit, which is the same precision obtained by the add, subtract, multiply, and divide operations.

The FSCALE (scale) opcode is designed to provide fast multiplication and division by integral powers of 2. The operation interprets the value in ST(1) as an exponent and adds its value to the exponent field of the number in ST. This action can be expressed as

        ST <= ST · 2^ST(1)

For example, if the value in ST(1) is the integer 3, then the FSCALE instruction performs

        ST <= ST · 2^3
        ST <= ST · 8

If ST = 1 then FSCALE calculates a power of 2.
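
The scaling expression above has a close analogue in C's standard ldexp function, which also multiplies by an integral power of 2 by adjusting the binary exponent. The sketch below is only a software model of the FSCALE arithmetic, not of the instruction itself; fscale_model is a hypothetical name, and the scale factor is passed as an int rather than read from ST(1).

```c
#include <math.h>

/* Software model of FSCALE: returns st * 2^n.
   st plays the role of ST and n the integer held in ST(1). */
double fscale_model(double st, int n)
{
    return ldexp(st, n);   /* adds n to the binary exponent of st */
}
```

Under these assumptions fscale_model(5.0, 3) gives 40.0, that is 5 · 8, and fscale_model(1.0, 3) gives 8.0, the power of 2 itself. Because only the exponent changes, no rounding takes place as long as the result stays within the normal range.
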
A negative value in ST(1) is subtracted from the exponent, which effectively divides the operand in ST by the corresponding power of 2. The following fragment shows the processing for quickly and accurately obtaining π/4, a constant sometimes used in argument reduction prior to the calculation of trigonometric functions.

        .DATA
;
NEG_TWO DW      -2              ; Storing of constant -2
        .CODE
        .
        .
        .
                            ;|   ST   | ST(1)  | ST(2)  |
                            ;| EMPTY  | EMPTY  | EMPTY  |
        FILD    NEG_TWO     ;| -2     | EMPTY  | EMPTY  |
        FLDPI               ;| PI     | -2     | EMPTY  |
        FSCALE              ;| PI/4   | -2     | EMPTY  |
        FSTP    ST(1)       ;| PI/4   | EMPTY  | EMPTY  |
; At this point ST(0) holds PI/4

In the 8087 and 80287 the scaling factor, in ST(1), must be an integer in the range ±32767. However, there is no limit to the scaling factor in the 80387 and the math unit of the 486 and the Pentium. In the newer machines, if the value in ST(1) is not an integer, it is chopped toward zero to an integer before it is added to the exponent of ST. In order to ensure that the scaling factor is an integer, it is good programming practice to define it in an integer variable and load it into the math unit by means of the FILD instruction, as in the preceding fragment.

Partial Remainder

The FPREM (partial remainder) instruction performs modulo division of ST by ST(1). In this case the modulus is assumed to be in ST(1). Like FSCALE, the FPREM instruction allows no explicit operands. FPREM produces an exact result, therefore the precision exception does not occur and the rounding field of the control word has no effect. FPREM allows implementing operations of finite algebra and modular arithmetic on the math unit. These operations, sometimes referred to as clock arithmetic, are based on closed number systems which wrap around to the first number in the set. For example, consider a 12-hour clock showing the present time as 2 o'clock. The clock time 54 hours later is calculated as follows:

[...]
... trigonometric function, in the 80387 and the math unit of the 486 and the Pentium this reduction is usually unnecessary, since these math units have a considerably expanded operand range. Specifically, the valid operand range in the 8087 and 80287 is an angle between 0 and π/4 radian, while in the 80387 and the math unit of the 486 and the Pentium this range is between 0 and 2^64 radian. Considering...

... math unit control word (see Figure 7.3).

FXTRACT (extract exponent and significand) breaks down the number at the stack top into its exponent and significand fields. The exponent is stored in ST(1) and the significand in ST. Notice that this conversion refers to the actual binary exponents and significands in extended precision format and not to their decimal equivalents. For example, suppose that the number...

... instruction and to scale the results. The transcendental instructions require that the operands be in ST, or in ST and ST(1), and return the result in ST. All trigonometric transcendentals assume operands in radian measure. In the 8087 and 80287 the scope and operand range of the trigonometric transcendentals were limited. For this reason the calculation routines had to include prologue code to scale the operand...

... exponentials, as would be convenient for directly calculating 10^y, e^y, or x^y. The instruction F2XM1 calculates 2 to the x and subtracts 1 from the result. The reason for subtracting 1 is to improve accuracy for values of x close to 0. In the 8087 and 80287 the operand range is limited to 0 <= x <= 0.5. In the 80387 and the math unit of the 486 and the Pentium the operand range was expanded to -1 < x < 1. However,...
... testing or argument reduction.

It has been documented by Intel that in the 80387 and the math unit of the 486 and the Pentium, argument reduction to the first octant is performed internally using a higher precision constant for the modulus π/4 than can be represented externally. For this reason, it is undesirable to use argument reduction routines designed for the 8087 and the 80287 when developing...

Table 7.10 Math Unit Transcendental Instructions

MNEMONIC       OPERATION                               EXAMPLES
FCOS (80387)   Calculates cosine of stack top and      FCOS
               returns value in ST. |ST| < 2^63.
               Input in radians.
FSIN (80387)   Calculates sine of stack top and        FSIN
               returns value in ST. |ST| < 2^63.
               Input in radians.
FSINCOS        Calculates sine and cosine of ST.
               Cosine appears in ST and sine in
               ST(1). |ST| < 2^63.
...
