Software Solutions for Engineers and Scientists, Part 2

eration described above. In this opcode the right-most bit is moved into the carry flag. Figure 3.2 shows the action of the 80x86 shift instructions.

The 80x86 opcodes for performing a bit shift to the left are SHL (shift logical left) and SAL (shift arithmetic left). Notice that SHL and SAL are different mnemonics for the same operation (see Figure 3.2). In SHL and SAL it is the left-most bit of the operand that is moved into the carry flag.

The terms logical and arithmetic, as used in the SHL and SAL opcodes, reflect a potential problem associated with shifting bits in a signed representation. The problem is that negative numbers in two’s complement form always have the high bit set. Therefore, when the bits of a two’s complement number are shifted, the sign bit can change unpredictably. For this reason, in left-shift operations of signed operands the sign bit is moved into the carry flag. After performing the shift, software can test the carry flag and make the necessary adjustments.

On the other hand, in a right-shift operation the sign bit is moved from bit number 7 to bit number 6, and a zero bit is introduced into the sign bit position. This action makes all signed numbers positive. In order to make possible shift operations of signed numbers the 80x86 instruction set has a separate opcode for the right-shift of signed numbers. The SAR opcode (shift arithmetic right) preserves the sign bit (bit number 7) while shifting all other bits to the right. This action can be seen in the diagram for the SAR instruction in Figure 3.2. Note that, in the SAR instruction, the left-most bit (sign bit) is both preserved and shifted. For example, the value 10000000B becomes 11000000B after executing the SAR operation. This action is sometimes called a sign extension operation.
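The difference between the logical and the arithmetic right shift can be modeled in a few lines of C++. The following is only a sketch under stated assumptions: the names ShiftResult, shl8, shr8, and sar8 are hypothetical and do not come from the book's software, the operand is assumed to be a byte, and each function models a shift of a single bit position.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a one-bit 80x86 shift on a byte operand:
// the new operand value plus the state of the carry flag.
struct ShiftResult {
    std::uint8_t value;
    bool carry;
};

// SHL/SAL: bit 7 moves into the carry flag and a 0 enters bit 0.
ShiftResult shl8(std::uint8_t x) {
    return { static_cast<std::uint8_t>(x << 1), (x & 0x80) != 0 };
}

// SHR: bit 0 moves into the carry flag and a 0 enters bit 7,
// so the result always looks positive in a signed interpretation.
ShiftResult shr8(std::uint8_t x) {
    return { static_cast<std::uint8_t>(x >> 1), (x & 0x01) != 0 };
}

// SAR: bit 0 moves into the carry flag; bit 7 is both preserved
// and copied into bit 6.
ShiftResult sar8(std::uint8_t x) {
    std::uint8_t sign = x & 0x80;
    return { static_cast<std::uint8_t>((x >> 1) | sign), (x & 0x01) != 0 };
}

// Small self-check using the value discussed in the text.
void check_shifts() {
    assert(sar8(0x80).value == 0xC0);   // 10000000B -> 11000000B
    assert(shr8(0x80).value == 0x40);   // 10000000B -> 01000000B
}
```

With the operand 10000000B, sar8 returns 11000000B, as described in the text, while shr8 returns 01000000B and shl8 returns 00000000B with the carry flag set.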
Figure 3.2 80x86 Bit Shift Instructions
[Diagram: SHL (shift logical left) and SAL (shift arithmetic left), SHR (shift logical right), and SAR (shift arithmetic right), showing bits 7 to 0 of the operand and the carry flag (CF)]

The 8-bit microprocessors that preceded the 80x86 family (such as the Intel 8080, the Zilog Z80, and the MOS Technology 6502) did not include multiplication and division instructions. In these chips multiplication and division had to be performed by software. One approach to multiplication was through repeated addition. Occasionally this approach is still useful. The following code fragment illustrates multiplication by repeated addition using 80x86 code.

; Multiplication of AL * CX using repeated addition
        MOV     AH,0            ; Clear register used to
                                ; accumulate sum
        MOV     AL,10           ; Load multiplicand
        MOV     CX,6            ; Load multiplier
MULTIPLY:
        ADD     AH,AL           ; Add AL to sum in AH
        LOOP    MULTIPLY
; AH now holds product of 10 * 6

An often-used method for performing fast multiplication and division operations is by shifting the bits of the operand. This method is based on the positional properties of the binary number system. In the binary number scheme the value of each digit is a successive power of 2 (see Chapter 1). Therefore, by shifting all digits to the left, the value 0001B (1 decimal) successively becomes 0010B (2 decimal), 0100B (4 decimal), and 1000B (8 decimal).

A limitation of binary multiplication by means of bit shift operations is that the multiplier must be a power of 2. If not, then the software must shift by a power of 2 that is smaller than the multiplier and add the multiplicand as many times as necessary to complete the product. For example, to multiply by 5 we can shift left twice and add the value of the multiplicand once. A more practical approach can be based on the same algorithm used in longhand multiplication.
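Both ideas, multiplying by 5 with two shifts and one addition and the general longhand approach, can be sketched in C++. This is a minimal illustration and not the book's routine: the function names times5 and shift_add_multiply are hypothetical, and the operands are assumed to be unsigned bytes so that the product always fits in 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Multiply by 5: shift left twice (times 4) and add the multiplicand once.
std::uint16_t times5(std::uint8_t x) {
    return static_cast<std::uint16_t>((x << 2) + x);
}

// Longhand (shift-and-add) multiplication of two unsigned bytes.
// Each multiplier bit is tested from low to high. A 1-bit adds the
// current shifted copy of the multiplicand to the accumulator;
// a 0-bit skips the addition. The multiplicand copy is shifted left
// once per multiplier bit in either case.
std::uint16_t shift_add_multiply(std::uint8_t multiplicand,
                                 std::uint8_t multiplier) {
    std::uint16_t product = 0;
    std::uint16_t shifted = multiplicand;   // multiplicand * 2^i
    while (multiplier != 0) {
        if (multiplier & 1) {
            product = static_cast<std::uint16_t>(product + shifted);
        }
        shifted = static_cast<std::uint16_t>(shifted << 1);
        multiplier = static_cast<std::uint8_t>(multiplier >> 1);
    }
    return product;
}

// Small self-check with easily verified products.
void check_multiply() {
    assert(times5(7) == 35);
    assert(shift_add_multiply(10, 6) == 60);
}
```

The loop ends as soon as no 1-bits remain in the multiplier, so small multipliers need fewer iterations than the full eight.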
For example, the multiplication of 00101101B (45 decimal) by 01101101B (109 decimal) can be expressed as a series of products and shifts, in the following manner:

               00101101B = 45 decimal
       times   01101101B = 109 decimal
               ---------
               00101101
              00000000
             00101101
            00101101
           00000000
          00101101
         00101101
        00000000
        ----------------
        001001100101001B = 4905 decimal

The actual calculations using this method of binary multiplication are quite simple, since the product by a 0 digit is zero and the product by a 1 digit is the multiplicand itself. The multiplication routine simply tests each digit in the multiplier. If the digit is 1, the multiplicand is shifted left and added into an accumulator. If the digit is 0, then the bits are shifted but the addition is skipped.

Shift-based multiplication routines were quite popular in processors that were not equipped with a multiplication instruction. In the case of the 80x86 there seems to be little use for multiplication routines based on bit shifts, since the processor is capable of performing efficient multiplications internally. For this reason, 80x86 programmers find little practical use for the SAR and SAL opcodes in developing arithmetic routines, although these opcodes are still useful for other bit manipulations.

Bit Rotate Instructions

The 80x86 rotate instructions also shift the bits in the operand to the left or right. The difference between the shift and the rotate is that in the rotate the bit shifted out is either re-introduced at the other end of the operand or is stored in the carry flag. The ROL opcode (rotate left) shifts the bits to the left while the high-order bit is cycled back to the low-order bit position, as well as stored in the carry flag. The ROR opcode operates in a similar manner, except that the action takes place left-to-right. In both instructions, ROL and ROR, the carry flag is used to store the recycled bit, which can be conveniently tested by the software.
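The action of ROL and ROR on a byte operand can be modeled as follows. This is a sketch only: RotateResult, rol8, and ror8 are hypothetical names introduced for illustration, and each call models a rotation of one bit position.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a one-bit rotate on a byte operand:
// the new operand value plus the state of the carry flag.
struct RotateResult {
    std::uint8_t value;
    bool carry;
};

// ROL: bit 7 is cycled back into bit 0 and also copied to the carry flag.
RotateResult rol8(std::uint8_t x) {
    bool out = (x & 0x80) != 0;
    return { static_cast<std::uint8_t>((x << 1) | (out ? 0x01 : 0x00)), out };
}

// ROR: bit 0 is cycled back into bit 7 and also copied to the carry flag.
RotateResult ror8(std::uint8_t x) {
    bool out = (x & 0x01) != 0;
    return { static_cast<std::uint8_t>((x >> 1) | (out ? 0x80 : 0x00)), out };
}

// Small self-check: the recycled bit appears at the other end
// of the operand and in the carry flag.
void check_rotates() {
    assert(rol8(0x81).value == 0x03 && rol8(0x81).carry);
    assert(ror8(0x01).value == 0x80 && ror8(0x01).carry);
}
```

Since no bit is ever lost, applying rol8 (or ror8) eight times in succession returns the original byte.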
Figure 3.3 shows the action of the 80x86 rotate instructions.

Figure 3.3 80x86 Bit Rotate Instructions
[Diagram: ROL (rotate left), RCL (rotate through carry left), ROR (rotate right), and RCR (rotate through carry right), showing bits 7 to 0 of the operand and the carry flag (CF)]

Two rotate instructions, RCL (rotate through carry left) and RCR (rotate through carry right), use the carry flag as a temporary storage for the bit that is shifted out. This action can be seen in the diagrams of Figure 3.3. Note that the bit shifted out is not recovered at the other end of the operand until the instruction is re-executed. It is also interesting that by repeating the rotation as many times as there are bits in the destination operand the ROL and ROR instructions restore the original value. This requires rotating a byte-size operand 8 times, a word-size operand 16 times, and so on. Because RCL and RCR include the carry flag in the rotation, they require one additional repetition.

Double Precision Shift Instructions

The 386 introduced two new opcodes for performing bitwise operations on long bit strings. These opcodes have the mnemonics SHLD (double precision shift left) and SHRD (double precision shift right). The instructions are also available in the 486 and the Pentium. The double precision shift instructions SHLD and SHRD require 3 operands. For example:

        SHLD    AX,BX,12

The left-most operand (AX) is the destination of the shift. The right-most operand (12) is the bit count. The middle operand (BX) is the source. The bits in the source operand are moved into the destination operand, starting with the source’s high-order bits. Source and destination must be of the same size; for example, if the destination is a word-size register or memory variable then the source has to be a word-size register. By the same token, if the destination is a doubleword register or memory location then the source must also be 32 bits wide. The destination may be a machine register or a memory operand, but the source must be a machine register.
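The effect of the double precision shifts on 32-bit operands can be modeled in C++ as shown here. This is a sketch under assumptions, not the processor's definition: shld32 and shrd32 are hypothetical names, only the destination result is modeled (the flags are ignored), and the count is reduced to the range 0 to 31.

```cpp
#include <cassert>
#include <cstdint>

// Model of SHLD dest,src,count for 32-bit operands: dest is shifted
// left and the vacated low-order bits are filled with the high-order
// bits of src. The source value itself is not modified.
std::uint32_t shld32(std::uint32_t dest, std::uint32_t src, unsigned count) {
    count &= 31;                         // assumption: count kept in 0..31
    if (count == 0) {
        return dest;                     // avoids an undefined 32-bit shift
    }
    return (dest << count) | (src >> (32 - count));
}

// Model of SHRD dest,src,count: dest is shifted right and the vacated
// high-order bits are filled with the low-order bits of src.
std::uint32_t shrd32(std::uint32_t dest, std::uint32_t src, unsigned count) {
    count &= 31;
    if (count == 0) {
        return dest;
    }
    return (dest >> count) | (src << (32 - count));
}

// Small self-check with values that can be verified by hand.
void check_double_shifts() {
    assert(shld32(0x00003456u, 0x10000000u, 4) == 0x00034561u);
    assert(shrd32(0x12345678u, 0x000000ABu, 8) == 0xAB123456u);
}
```

A count that is a multiple of 4 moves whole hexadecimal (or packed BCD) digits from one operand into the other, which is why shld32(0x00003456, 0x10000000, 4) yields 0x00034561.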
The count operand can be an immediate byte or the value in the CL register. The limit of the shift count is 31 bits. The following code fragment shows a double precision bit shift.

; Demonstration of the action performed by the double precision
; shift left (SHLD)
        MOV     EAX,3456H       ; One operand to destination
        MOV     EBX,10000000H   ; Source operand
        SHLD    EAX,EBX,4       ; Shift left EAX digits 4 bits
                                ; and introduce EBX bits into
                                ; EAX bits vacated by the shift
; At this point:
;       EAX = 34561H
;       EBX = 10000000H (unchanged)

The most common use of the SHLD and SHRD instructions is in manipulating long bit strings. For example, you can overlay a memory variable with a register value, as in the following code fragment using inline assembly:

int var1;
int main()
{
_asm
  {
        MOV     EBX,12300000H   ; Source operand
        SHLD    var1,EBX,12
// ASSERT:
//      VAR1 = 123H
  }
}

In the above code fragments notice that the SHLD instruction has been used to shift packed BCD digits. The digit shift is accomplished by selecting a bit count that is a multiple of 4, since each digit takes up 4 bits. In this manner a bit count of 8 would have shifted 2 packed BCD digits. Also notice that the source register is unchanged by the double precision shift.

Shift and Rotate Addressing Modes

The addressing modes for shift and rotate opcodes have undergone several changes in the different microprocessors of the 80x86 line. In the 8086 and 8088, shift and rotate can use a count in the CL register or the number 1 as an immediate operand. Later processors allow an 8-bit immediate operand. The following code fragment illustrates the valid addressing modes in each case.

; Shift and rotate addressing modes in the 8086 and 8088 chips
        SHL     AL,1            ; Shift left 1 bit position
        MOV     CL,4            ; Shift count to CL
        SHL     AL,CL           ; Shift left 4 bit positions
        .
        .
        .
; Shift and rotate addressing modes in the 80286, 80386, 486,
; and Pentium, in which an 8-bit immediate operand can be specified
; directly
        SHR     AX,3            ; Shift right 3 bits
        .
        .
        .
; In the 80386, 486, and Pentium the shift and rotate opcodes allow
; a 32-bit register operand as a destination, for example
        SHL     EBX,4           ; Shift EBX 4 bits
        .
        .
        .

3.3.2 Comparison, Bit Scan, and Bit Test Instructions

The CMP (compare) instruction changes the flags as if a subtraction had taken place but does not change the value of the operands. The action can be described as setting the status register as if the source operand had been subtracted from the destination. The instruction is typically followed by a conditional jump. The following code fragment shows the use of CMP in determining the relative value of an operand in a machine register.

; Use of CMP to determine if AX > BX, AX < BX, or AX = BX
; Code assumes that the values in AX and BX are unsigned binary
        CMP     AX,BX           ; Simulate AX minus BX
        JA      AX_ABOVE        ; Go if AX > BX
        JB      AX_BELOW        ; Go if AX < BX
; At this point AX = BX
        .
        .
        .
; Entry point for AX > BX
AX_ABOVE:
        .
        .
        .
; Entry point for AX < BX
AX_BELOW:
        .
        .
        .

The TEST instruction performs a logical AND and updates the flags without changing the operands. If a TEST instruction is followed by JNZ, the jump is taken if there are matching 1-bits in both operands. The following code fragment shows the use of the TEST opcode.

; Use of TEST to determine if bit 7 of the AL register is set
        TEST    AL,10000000B    ; ANDing AL and binary mask
        JNZ     HIGH_BIT_SET    ; Go if AL bit 7 = 1
; At this point AL bit 7 = 0
        .
        .
        .
; Entry point for AL bit 7 set
HIGH_BIT_SET:
        .
        .
        .

The 80386 CPU introduced several new bit manipulating instructions that allow more elaborate bit scanning and testing. The BSF (bit scan forward) opcode scans the source operand low-to-high and stores, in the destination operand, the bit position of the first 1-bit found.
If all bits of the source operand are 0, then the zero flag is set; otherwise the zero flag is cleared. BSR (bit scan reverse) performs the same test but starting at the high-order bit position. Both instructions require word or doubleword operands; byte operands are not allowed. The following code fragment shows the operation of BSF and BSR.

; Use of the BSF and BSR instructions to determine the number of
; the first bit set in the source operand.
        MOV     AX,10001000B    ; Right-to-left first bit
                                ; set is number 3
        BSF     BX,AX           ; AX bit number into BX
; At this point BX = 03 since the first bit set is in bit
; position number 3 when read low-to-high. Zero flag is clear
        BSR     CX,AX           ; AX bit number into CX
                                ; read high-to-low
; At this point CX = 07 since bit number 7 of AX is the first
; bit set when read high-to-low. Zero flag is clear

The bit test opcodes BT (bit test), BTS (bit test and set), BTR (bit test and reset), and BTC (bit test and complement) were also introduced with the 386 processor. All of these opcodes copy the value of a specified bit into the carry flag. The code can later include a JC or JNC instruction to direct execution according to the state of the carry flag. In addition, the bit tested can be modified in the destination operand: BTS sets the tested bit, BTR clears the tested bit, and BTC complements the tested bit. The following code fragment shows the action of these opcodes.

; Use of BT, BTS, BTR, and BTC opcodes to test and manipulate
; bits according to their position
        MOV     AX,10001000B    ; Set value in operand
        BT      AX,3            ; Test AX bit 3
; Carry flag is set since AX bit 3 is set.
; AX is not changed
        BTS     AX,0            ; Test AX bit 0
; Carry flag is clear since AX bit 0 is not set
; AX = 10001001B since the instruction sets the specified bit
        BTR     AX,7            ; Test AX bit 7
; Carry flag is set since AX bit 7 is set
; AX = 00001001B since bit 7 is reset (cleared) by BTR
        BTC     AX,1            ; Test AX bit 1
; Carry flag is clear since AX bit 1 is not set
; AX = 00001011B since bit 1 is toggled (complemented) by BTC

Signed and Unsigned Conditional Jumps

The 80x86 provides two categories of conditional jump opcodes: one for operating on unsigned integers and one for operating on signed numbers in two’s complement form. For example, JA (jump if above) and JB (jump if below) assume that the operands are unsigned integers while JG (jump if greater) and JL (jump if less) assume that the operands are signed numbers in two’s complement format. Table 3.2 shows the 80x86 conditional jump instructions according to their signed or unsigned interpretation.

Notice in Table 3.2 that the conditional jump instructions that assume signed operands use the sign and the overflow flag to determine their action. The sign flag is clear when the result of the previous operation is a binary positive number, that is, one in which the high bit is 0. The sign flag is set if the result of the previous operation is a binary negative number, that is, one in which the high bit is set. On the other hand, unsigned arithmetic routines usually ignore the sign flag since the high-order bit of unsigned binary numbers is interpreted as value. The overflow flag indicates a signed positive number that is too large to represent in the format, or a signed negative number that is too small. In signed arithmetic this flag indicates an overflow; however, it is usually ignored when operating on unsigned binary numbers.

Several jump instructions in Table 3.2 are based on the parity flag, namely: JNP (jump if no parity), JPO (jump if parity odd), JP (jump if parity), and JPE (jump if parity even).
This flag is set if the low-order eight bits of the result contain an even number of 1-bits (parity even) and cleared otherwise. This flag was provided for compatibility with the Intel 8080 and 8085 processors. Although the parity flag can be used to assure the integrity of data transmissions, it has no application in arithmetic or logic routines.

Table 3.2 x86 Conditional Jumps

MNEMONIC   FLAG ACTION                DESCRIPTION

CONDITIONAL JUMPS THAT ASSUME UNSIGNED OPERANDS
JA         (CF or ZF) = 0             jump if above
JNBE                                  jump if not below or equal
JAE        CF = 0                     jump if above or equal
JNB                                   jump if not below
JNC                                   jump if no carry
JB         CF = 1                     jump if below
JNAE                                  jump if not above or equal
JC                                    jump if carry set
JBE        (CF or ZF) = 1             jump if below or equal
JNA                                   jump if not above
JE         ZF = 1                     jump if equal
JZ                                    jump if zero
JNE        ZF = 0                     jump if not equal
JNZ                                   jump if not zero
JNP        PF = 0                     jump if no parity
JPO                                   jump if parity odd
JP         PF = 1                     jump if parity
JPE                                   jump if parity even

CONDITIONAL JUMPS THAT ASSUME SIGNED OPERANDS
JG         ((SF xor OF) or ZF) = 0    jump if greater
JNLE                                  jump if not less or equal
JGE        (SF xor OF) = 0            jump if greater or equal
JNL                                   jump if not less
JL         (SF xor OF) = 1            jump if less
JNGE                                  jump if not greater or equal
JLE        ((SF xor OF) or ZF) = 1    jump if less or equal
JNG                                   jump if not greater
JNO        OF = 0                     jump if no overflow
JNS        SF = 0                     jump if positive (no sign)
JO         OF = 1                     jump if overflow
JS         SF = 1                     jump if negative (sign set)

Legend: CF = carry flag, ZF = zero flag, PF = parity flag, SF = sign flag, OF = overflow flag

3.3.3 Increment, Decrement, and Sign Extension Instructions

The INC (increment) instruction adds 1 to the value of the destination while the DEC (decrement) instruction subtracts 1. INC and DEC are often used in manipulating pointers although they find occasional application in arithmetic routines, mainly in adjusting after overflow or underflow conditions. Both instructions assume that the operand is an unsigned integer; therefore they do not affect the carry flag.
For this reason, when operating with signed magnitudes it is preferable to use the ADD and SUB instructions.

The 80x86 instruction set also includes several opcodes whose action is often described as performing a sign extension of the source operand. CBW (convert byte to word) converts a signed byte in two’s complement form into a signed word, also in two’s complement. The source is always the AL register and the destination is AX. The conversion is performed by copying the most significant bit of AL into all AH bits. Therefore the signed byte 83H is converted into the word FF83H, hence the use of the term sign extension to describe its action. The opcode CWD (convert word to doubleword) performs the same conversion regarding a word in AX to a doubleword in DX:AX.

The 80386 processor introduced two new sign extension instructions designed to operate on 32-bit and 64-bit operands. CWDE (convert word to doubleword extended) converts a signed 16-bit number in AX into a signed 32-bit number in EAX. CDQ (convert doubleword to quadword) assumes a two’s complement number in EAX and converts it into a signed 64-bit integer in EDX:EAX. The sign extension opcodes are useful in performing signed multiplication and division when one of the operands is in a different format than the destination. The following code fragment is a demonstration of the use of the CBW instruction.

; Use of CBW to multiply a signed word operand in BX by a
; signed byte in AL
        MOV     BX,-1234        ; Load word multiplier
        MOV     AL,-104         ; Load multiplicand (98H)
        CBW                     ; Convert to word
; At this point AX holds FF98H (signed byte converted to word)
        IMUL    BX              ; -1234 * -104
; Result of -1234 * -104 is 128,336.
; The product is stored in DX:AX as 0001:F550H

3.3.4 486 and Pentium Proprietary Instructions

The 486 and Pentium processors introduced 4 new instructions that are related to arithmetic processing; these are: BSWAP (byte swap), XADD (exchange and add), CMPXCHG (compare and exchange), and CMPXCHG8B (compare and exchange 8 bytes).

BSWAP

The BSWAP instruction reverses the byte order in a 32-bit machine register. One use of BSWAP is in converting data between the little endian and the big endian formats. In this sense it is possible to use BSWAP to reverse the order of unpacked decimal digits loaded from a memory operand into a 32-bit machine register. For example: assume four unpacked decimal digits are stored in a memory operand with the least significant digit in the lowest order location, as would be the case in a conventional BCD format. When these digits are loaded into a machine register by means of a MOV instruction their order would be reversed. The following code simulates this situation.

DATA            SEGMENT
FOUR_DIGITS     DB      01H,02H,03H,04H
DATA            ENDS

If these digits are now loaded into a 32-bit machine register, typically by means of a pointer register, their order would be reversed, as shown in the following fragment.

        LEA     SI,FOUR_DIGITS          ; Pointer to unpacked BCD
        MOV     EAX,DWORD PTR [SI]      ; Load EAX using pointer
; EAX = 04030201H

At this point the unpacked BCD digits are reversed in the EAX register. In a 486 or Pentium machine the situation can be easily corrected by means of the BSWAP instruction. The instruction would reverse the bytes in EAX, as follows:

        BSWAP   EAX             ; Swap bytes in EAX
; EAX = 01020304H

Figure 3.4 shows the action of the BSWAP instruction.

Figure 3.4 Action of the 486 BSWAP Instruction

In a 386 CPU reversing the byte order in a 32-bit register requires several XCHG (exchange) operations. The following procedure simulates the BSWAP in an 80386 machine.
BSWAP_EAX       PROC    NEAR
; Simulate the 486 BSWAP EAX instruction on a 386 machine
; Comments assume that on entry EAX = 0403 0201H
; After byte inversion EAX will hold 0102 0304H
;
        PUSH    EBX             ; Save EBX in stack
        MOV     EBX,EAX         ; Copy EAX in EBX
        SHR     EBX,16          ; Shift high word into low word
; At this point:
;       EAX = 0403 0201H
;       EBX = 0000 0403H
        XCHG    AH,AL           ; EAX = 0403 0102H
        SHL     EAX,16          ; EAX = 0102 0000H
        XCHG    BH,BL           ; EBX = 0000 0304H
        OR      EAX,EBX         ; EAX = 0102 0304H
        POP     EBX             ; Restore EBX
        RET
BSWAP_EAX       ENDP

XADD

The 486 XADD (exchange and add) instruction requires a source operand in a machine register and a destination operand, which can be a register or a memory variable. When XADD executes, the source operand is replaced with the destination and the [...]

... "bcd20math.h"
// Source numbers for tests (to 33 significant digits)
char asc1[] = "1.22334455667788771122334455667711";
char asc2[] = "1.11111111111111111111111111111111";
// BCD20 data
char num1[20];
char num2[20];
char bcdResult[20];
// ASCII data
char ascResult[52];
int main()
{
    // BCD20 addition
    AsciiToBcd20(asc1, num1);
    AsciiToBcd20(asc2, num2);
    SignAddBcd20(num1,...

... SignAddBcd12() performs signed addition of two floating-point BCD numbers encoded in BCD12 format
2. SignSubBcd12() performs signed subtraction of two floating-point BCD numbers encoded in BCD12 format
3. SignMulBcd12() performs signed multiplication of two floating-point BCD numbers encoded in BCD12 format
4. SignDivBcd12() performs signed division of two floating-point BCD numbers encoded in BCD12 format...
... extensive BCD format. The BCD20 format allows representing 34 significant digits and uses the same sign encoding and exponent range as BCD12. The structure of the BCD20 format is shown in Figure 4.4 and in Table 4.1.

Figure 4.4 Map of the BCD20 Format
[Diagram: sign of number (1 BCD digit), sign of exponent (1 BCD digit), exponent (4 BCD digits), significand (34 BCD digits, 17 bytes); 20 bytes in total]

... representation used in the BCD12 and BCD20 encodings.
2. The routines calculate results to double the number of significand digits of the input format, plus a possible carry. That is, the BCD12 routines calculate to 37 binary coded decimal digits, and the BCD20 routines to 69 binary coded decimal digits. These results are rounded and returned in the BCD12 or BCD20 format of the operands, respectively. Doubling...

... following sections, refer to the BCD12 format. The BCD20 format is described at the end of this chapter. For each function in BCD12 arithmetic there is a corresponding one in BCD20.

4.2 Floating-Point BCD Addition

The function SignAddBcd12(), listed in Section 4.6, performs the signed addition of two floating-point numbers encoded in BCD12 format. The processing assumes that the BCD12 number has been normalized...

... significant digits of the formats are maintained in multiplication and division.
3. While the BCD12 and BCD20 formats store digits in packed form, the arithmetic routines unpack these digits prior to performing numerical calculations. One reason for this practice is that the Intel CPUs do not contain instructions for multiplication and division of packed BCD operands. In order to maintain uniform processing all...

... significant digits. The processing of BCD20 numbers is similar to that of BCD12; therefore BCD20 routines are not listed in the text. These functions can be found in the bcd20math.cpp module that is furnished in the book’s CD ROM.

4.0.1 ANSI/IEEE 854 Standard

On March 12, 1987, the Standards Board of the Institute of Electrical and Electronics Engineers approved the IEEE Standard for Radix-Independent Floating-Point...

...
        MOV     EDI,bcd2
        MOV     EBX,result
        CALL    SIGN_DIV_BCD12
    }
    return;
}

SOFTWARE ON-LINE
The C++ functions for the BCD12 arithmetic are found in the file BCD12.h located in the folder Sample Code\Chapter04\BCD12 Arithmetic in the book’s on-line software. The project BCD12 Arithmetic exercises and tests the low-level procedures located in the Un32_2 module of the MATH32 library.

4.6 High-Precision BCD Arithmetic

One...

... standard C and C++ use these standards in representing floating point numbers. The C/C++ float type corresponds to ANSI/IEEE single format and the C/C++ double type to ANSI/IEEE double format. Table 2.2 shows that the significand in the ANSI/IEEE 754 double format is 53 binary digits wide, to which we must add an implicit 1-bit. The largest decimal significand allowed in 54 bits is 720,575,940,379,277,743,...

... exponents of the multiplicand and the multiplier.
4. The significand of the product is the significand of the multiplicand times the significand of the multiplier.
5. The operations performed on the significands may require adjusting exponents in order to maintain a normalized result.

Figure 4.2 is a flowchart of the processing performed by the SignMulBcd12() function.
[Flowchart: START; SAVE ENTRY DATA AND CLEAR BUFFERS; ...]

... does not specify formats for floating-point numbers or encodings of integers or strings representing decimal numbers. Therefore BCD and ASCII formats, such as the BCD12 and BCD20, used in the...

... coded and stored in floating-point BCD12 and BCD20 formats. This means that the processing algorithms are based on the floating-point exponential representation used in the BCD12 and BCD20 encodings.
