Floating-Point Comparison Operations


3.11.6 Floating-Point Comparison Operations

AVX2 provides two instructions for comparing floating-point values:

Instruction         Based on    Description
ucomiss S1, S2      S2 - S1     Compare single precision
ucomisd S1, S2      S2 - S1     Compare double precision

These instructions are similar to the CMP instructions (see Section 3.6), in that they compare operands S1 and S2 (but in the opposite order from what one might expect) and set the condition codes to indicate their relative values. As with cmpq, they follow the ATT-format convention of listing the operands in reverse order. Argument S2 must be in an XMM register, while S1 can be either in an XMM register or in memory.

The floating-point comparison instructions set three condition codes: the zero flag ZF, the carry flag CF, and the parity flag PF. We did not document the parity flag in Section 3.6.1, because it is not commonly found in GCC-generated x86 code.

For integer operations, this flag is set when the most recent arithmetic or logical operation yielded a value where the least significant byte has even parity (i.e., an even number of ones in the byte). For floating-point comparisons, however, the flag is set when either operand is NaN. By convention, any comparison in C is considered to fail when one of the arguments is NaN, and this flag is used to detect such a condition. For example, even the comparison x == x yields 0 when x is NaN.
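The NaN convention described above can be checked directly in C. The following is a minimal sketch, assuming default IEEE floating-point semantics (for example, no -ffast-math); the helper names is_unordered and trichotomy_holds are hypothetical and not part of the original text. Note that the inequality operator != is the one relational test that yields 1 for a NaN operand, which is why x != x works as a NaN test.

```c
#include <math.h>   /* NAN macro, for callers that need a NaN value */

/* Hypothetical helper: returns 1 when at least one argument is NaN.
   It relies on the rule that x == x yields 0 only when x is NaN. */
int is_unordered(float a, float b)
{
    return (a != a) || (b != b);
}

/* Hypothetical helper: returns 1 when exactly the usual trichotomy holds,
   that is, when a < b, a == b, or a > b succeeds. With a NaN operand all
   three comparisons fail, so the result is 0. */
int trichotomy_holds(float a, float b)
{
    return (a < b) || (a == b) || (a > b);
}
```

Calling trichotomy_holds(NAN, 0.0f) yields 0, while any pair of ordinary values yields 1.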

The condition codes are set as follows:

Ordering S2:S1    CF    ZF    PF
Unordered          1     1     1
S2 < S1            1     0     0
S2 = S1            0     1     0
S2 > S1            0     0     0

The unordered case occurs when either operand is NaN. This can be detected with the parity flag. Commonly, the jp (for "jump on parity") instruction is used to conditionally jump when a floating-point comparison yields an unordered result. Except for this case, the values of the carry and zero flags are the same as those for an unsigned comparison: ZF is set when the two operands are equal, and CF is

(a) C code

    typedef enum {NEG, ZERO, POS, OTHER} range_t;

    range_t find_range(float x)
    {
        int result;
        if (x < 0)
            result = NEG;
        else if (x == 0)
            result = ZERO;
        else if (x > 0)
            result = POS;
        else
            result = OTHER;
        return result;
    }

(b) Generated assembly code

     range_t find_range(float x)
     x in %xmm0
 1   find_range:
 2     vxorps    %xmm1, %xmm1, %xmm1     Set %xmm1 = 0
 3     vucomiss  %xmm0, %xmm1            Compare 0:x
 4     ja        .L5                     If >, goto neg
 5     vucomiss  %xmm1, %xmm0            Compare x:0
 6     jp        .L8                     If NaN, goto posornan
 7     movl      $1, %eax                result = ZERO
 8     je        .L3                     If =, goto done
 9   .L8:                              posornan:
10     vucomiss  .LC0(%rip), %xmm0       Compare x:0
11     setbe     %al                     Set result = NaN ? 1 : 0
12     movzbl    %al, %eax               Zero-extend
13     addl      $2, %eax                result += 2 (POS for > 0, OTHER for NaN)
14     ret                               Return
15   .L5:                              neg:
16     movl      $0, %eax                result = NEG
17   .L3:                              done:
18     rep; ret                          Return

Figure 3.51 Illustration of conditional branching in floating-point code.


set when S2 < S1. Instructions such as ja and jb are used to conditionally jump on various combinations of these flags.

As an example of floating-point comparisons, the C function of Figure 3.51(a) classifies argument x according to its relation to 0.0, returning an enumerated type as the result. Enumerated types in C are encoded as integers, and so the possible function values are: 0 (NEG), 1 (ZERO), 2 (POS), and 3 (OTHER). This final outcome occurs when the value of x is NaN.

GCC generates the code shown in Figure 3.51(b) for find_range. The code is not very efficient: it compares x to 0.0 three times, even though the required information could be obtained with a single comparison. It also generates floating-point constant 0.0 twice: once using vxorps, and once by reading the value from memory. Let us trace the flow of the function for the four possible comparison results:

x < 0.0    The ja branch on line 4 will be taken, jumping to the end with a return value of 0.

x = 0.0    The ja (line 4) and jp (line 6) branches will not be taken, but the je branch (line 8) will, returning with %eax equal to 1.

x > 0.0    None of the three branches will be taken. The setbe (line 11) will yield 0, and this will be incremented by the addl instruction (line 13) to give a return value of 2.

x = NaN    The jp branch (line 6) will be taken. The third vucomiss instruction (line 10) will set both the carry and the zero flag, and so the setbe instruction (line 11) and the following instruction will set %eax to 1. This gets incremented by the addl instruction (line 13) to give a return value of 3.

In Homework Problems 3.73 and 3.74, you are challenged to hand-generate more efficient implementations of find_range.

Practice Problem

Function funct3 has the following prototype:

double funct3(int *ap, double b, long c, float *dp);

For this function, GCC generates the following code:

    double funct3(int *ap, double b, long c, float *dp)
    ap in %rdi, b in %xmm0, c in %rsi, dp in %rdx
    funct3:
      vmovss      (%rdx), %xmm1
      vcvtsi2sd   (%rdi), %xmm2, %xmm2
      vucomisd    %xmm2, %xmm0
      jbe         .L8
      vcvtsi2ssq  %rsi, %xmm0, %xmm0
      vmulss      %xmm1, %xmm0, %xmm1
      vunpcklps   %xmm1, %xmm1, %xmm1
      vcvtps2pd   %xmm1, %xmm0
      ret
    .L8:
      vaddss      %xmm1, %xmm1, %xmm1
      vcvtsi2ssq  %rsi, %xmm0, %xmm0
      vaddss      %xmm1, %xmm0, %xmm0
      vunpcklps   %xmm0, %xmm0, %xmm0
      vcvtps2pd   %xmm0, %xmm0
      ret

Write a C version of funct3.

3.11.7 Observations about Floating-Point Code

We see that the general style of machine code generated for operating on floating-point data with AVX2 is similar to what we have seen for operating on integer data.

Both use a collection of registers to hold and operate on values, and they use these registers for passing function arguments.

Of course, there are many complexities in dealing with the different data types and the rules for evaluating expressions containing a mixture of data types, and AVX2 code involves many more different instructions and formats than is usually seen with functions that perform only integer arithmetic.

AVX2 also has the potential to make computations run faster by performing parallel operations on packed data. Compiler developers are working on automating the conversion of scalar code to parallel code, but currently the most reliable way to achieve higher performance through parallelism is to use the extensions to the C language supported by GCC for manipulating vectors of data. See Web Aside OPT:SIMD on page 546 to see how this can be done.
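As a small taste of those extensions, the sketch below uses GCC's vector_size attribute to multiply eight packed single-precision values with one C expression. The type name vec8f and the function vec_mul8 are hypothetical, chosen for this illustration only; the code assumes GCC or a compatible compiler such as Clang. Whether the multiplication becomes a single AVX instruction depends on the compiler flags in use (for example, -mavx2); without them the compiler is free to split the operation into narrower pieces.

```c
#include <string.h>   /* memcpy */

/* Eight floats packed into 32 bytes (GCC vector extension).
   vec8f is a hypothetical name used only in this sketch. */
typedef float vec8f __attribute__((vector_size(32)));

/* Computes dst[i] = a[i] * b[i] for i = 0..7.
   memcpy is used for the loads and stores so that the arrays
   need no special alignment. */
void vec_mul8(const float *a, const float *b, float *dst)
{
    vec8f va, vb, vr;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vr = va * vb;                 /* element-wise multiply on packed data */
    memcpy(dst, &vr, sizeof vr);
}
```

Keeping the vector values in local variables, rather than passing them as function arguments, avoids calling-convention differences between builds with and without AVX enabled.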
