cnt. GCC generates the assembly code shown as Figure 3.16(c). Our rendition of the machine code into C is shown as the function gotodiff_se (Figure 3.16(b)).
It uses the goto statement in C, which is similar to the unconditional jump of
3. Actually, it can return a negative value if one of the subtractions overflows. Our interest here is to demonstrate machine code, not to implement robust code.
210 Chapter 3 Machine-Level Representation of Programs
(a) Original C code

    long lt_cnt = 0;
    long ge_cnt = 0;

    long absdiff_se(long x, long y)
    {
        long result;
        if (x < y) {
            lt_cnt++;
            result = y - x;
        }
        else {
            ge_cnt++;
            result = x - y;
        }
        return result;
    }

(b) Equivalent goto version

     1  long gotodiff_se(long x, long y)
     2  {
     3      long result;
     4      if (x >= y)
     5          goto x_ge_y;
     6      lt_cnt++;
     7      result = y - x;
     8      return result;
     9   x_ge_y:
    10      ge_cnt++;
    11      result = x - y;
    12      return result;
    13  }

(c) Generated assembly code

        long absdiff_se(long x, long y)
        x in %rdi, y in %rsi
     1  absdiff_se:
     2    cmpq    %rsi, %rdi            Compare x:y
     3    jge     .L2                   If >= goto x_ge_y
     4    addq    $1, lt_cnt(%rip)      lt_cnt++
     5    movq    %rsi, %rax
     6    subq    %rdi, %rax            result = y - x
     7    ret                           Return
     8  .L2:                            x_ge_y:
     9    addq    $1, ge_cnt(%rip)      ge_cnt++
    10    movq    %rdi, %rax
    11    subq    %rsi, %rax            result = x - y
    12    ret                           Return

Figure 3.16 Compilation of conditional statements. (a) C procedure absdiff_se contains an if-else statement. The generated assembly code is shown (c), along with (b) a C procedure gotodiff_se that mimics the control flow of the assembly code.
assembly code. Using goto statements is generally considered a bad programming style, since their use can make code very difficult to read and debug. We use them in our presentation as a way to construct C programs that describe the control flow of machine code. We call this style of programming "goto code."
In the goto code (Figure 3.16(b)), the statement goto x_ge_y on line 5 causes a jump to the label x_ge_y (since it occurs when x >= y) on line 9. Continuing the
Section 3.6 Control 211
Aside Describing machine code with C code
Figure 3.16 shows an example of how we will demonstrate the translation of C language control constructs into machine code. The figure contains an example C function (a) and an annotated version of the assembly code generated by GCC (c). It also contains a version in C that closely matches the structure of the assembly code (b). Although these versions were generated in the sequence (a), (c), and (b), we recommend that you read them in the order (a), (b), and then (c). That is, the C rendition of the machine code will help you understand the key points, and this can guide you in understanding the actual assembly code.
execution from this point, it completes the computations specified by the else portion of function absdiff_se and returns. On the other hand, if the test x >= y fails, the procedure will carry out the steps specified by the if portion of absdiff_se and return.
The assembly-code implementation (Figure 3.16(c)) first compares the two operands (line 2), setting the condition codes. If the comparison result indicates that x is greater than or equal to y, it then jumps to a block of code starting at line 8 that increments global variable ge_cnt, computes x-y as the return value, and returns. Otherwise, it continues with the execution of code beginning at line 4 that increments global variable lt_cnt, computes y-x as the return value, and returns. We can see, then, that the control flow of the assembly code generated for absdiff_se closely follows the goto code of gotodiff_se.
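As a quick check of this correspondence, the two C versions from Figure 3.16 can be compiled together and compared on a few inputs. The sketch below is only illustrative: the helper check_same_flow is our own name, not part of the original example, and it resets the counters before each call so that the two runs can be compared. It assumes inputs small enough that the subtractions do not overflow.

```c
#include <assert.h>

long lt_cnt = 0;
long ge_cnt = 0;

/* Figure 3.16(a): structured if-else version */
long absdiff_se(long x, long y)
{
    long result;
    if (x < y) {
        lt_cnt++;
        result = y - x;
    } else {
        ge_cnt++;
        result = x - y;
    }
    return result;
}

/* Figure 3.16(b): goto version mirroring the assembly control flow */
long gotodiff_se(long x, long y)
{
    long result;
    if (x >= y)
        goto x_ge_y;
    lt_cnt++;
    result = y - x;
    return result;
 x_ge_y:
    ge_cnt++;
    result = x - y;
    return result;
}

/* Hypothetical helper: checks that both versions return the same value
   and increment the same counter for one input pair. */
void check_same_flow(long x, long y)
{
    lt_cnt = ge_cnt = 0;
    long r1 = absdiff_se(x, y);
    long lt1 = lt_cnt, ge1 = ge_cnt;

    lt_cnt = ge_cnt = 0;
    long r2 = gotodiff_se(x, y);

    assert(r1 == r2);
    assert(lt1 == lt_cnt && ge1 == ge_cnt);
}
```

Calling check_same_flow with pairs where x < y, x > y, and x == y exercises both paths through each function.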
The general form of an if-else statement in C is given by the template
    if (test-expr)
        then-statement
    else
        else-statement
where test-expr is an integer expression that evaluates either to zero (interpreted as meaning "false") or to a nonzero value (interpreted as meaning "true"). Only one of the two branch statements (then-statement or else-statement) is executed.
For this general form, the assembly implementation typically adheres to the following form, where we use C syntax to describe the control flow:
        t = test-expr;
        if (!t)
            goto false;
        then-statement
        goto done;
    false:
        else-statement
    done:
That is, the compiler generates separate blocks of code for then-statement and else-statement. It inserts conditional and unconditional branches to make sure the correct block is executed.
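To make the template concrete, here is a minimal sketch that applies it by hand to a small if-else statement. The function sign_scale and its goto counterpart are hypothetical examples of our own, not taken from the text. The label is named false_branch rather than false, since false is a macro or keyword in some C dialects.

```c
#include <assert.h>

/* Hypothetical example function with an if-else statement */
long sign_scale(long x)
{
    long r;
    if (x < 0)
        r = -2 * x;
    else
        r = 3 * x;
    return r;
}

/* The same function rewritten according to the goto template */
long sign_scale_goto(long x)
{
    long r;
    long t = x < 0;        /* t = test-expr */
    if (!t)
        goto false_branch;
    r = -2 * x;            /* then-statement */
    goto done;
 false_branch:
    r = 3 * x;             /* else-statement */
 done:
    return r;
}

/* Hypothetical helper: both versions should agree on any input */
void check_sign_scale(long x)
{
    assert(sign_scale(x) == sign_scale_goto(x));
}
```

Note that exactly one of the two assignments to r executes on any call, which is the property the conditional and unconditional branches are there to guarantee.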
Practice Problem 3.16
When given the C code

    void cond(long a, long *p)
    {
        if (p && a > *p)
            *p = a;
    }
GCC generates the following assembly code:

    void cond(long a, long *p)
    a in %rdi, p in %rsi
    cond:
      testq   %rsi, %rsi
      je      .L1
      cmpq    %rdi, (%rsi)
      jge     .L1
      movq    %rdi, (%rsi)
    .L1:
      rep; ret
A. Write a goto version in C that performs the same computation and mimics the control flow of the assembly code, in the style shown in Figure 3.16(b). You might find it helpful to first annotate the assembly code as we have done in our examples.
B. Explain why the assembly code contains two conditional branches, even though the C code has only one if statement.
An alternate rule for translating if statements into goto code is as follows:
        t = test-expr;
        if (t)
            goto true;
        else-statement
        goto done;
    true:
        then-statement
    done:
A. Rewrite the goto version of absdiff_se based on this alternate rule.
B. Can you think of any reasons for choosing one rule over the other?
Practice Problem 3.18
Starting with C code of the form

    long test(long x, long y, long z) {
        long val = ____;
        if (____) {
            if (____)
                val = ____;
            else
                val = ____;
        } else if (____)
            val = ____;
        return val;
    }
GCC generates the following assembly code:

    long test(long x, long y, long z)
    x in %rdi, y in %rsi, z in %rdx
    test:
      leaq    (%rdi,%rsi), %rax
      addq    %rdx, %rax
      cmpq    $-3, %rdi
      jge     .L2
      cmpq    %rdx, %rsi
      jge     .L3
      movq    %rdi, %rax
      imulq   %rsi, %rax
      ret
    .L3:
      movq    %rsi, %rax
      imulq   %rdx, %rax
      ret
    .L2:
      cmpq    $2, %rdi
      jle     .L4
      movq    %rdi, %rax
      imulq   %rdx, %rax
    .L4:
      rep; ret
Fill in the missing expressions in the C code.
3.6.6 Implementing Conditional Branches with Conditional Moves
The conventional way to implement conditional operations is through a conditional transfer of control, where the program follows one execution path when a condition holds and another when it does not. This mechanism is simple and general, but it can be very inefficient on modern processors.
An alternate strategy is through a conditional transfer of data. This approach computes both outcomes of a conditional operation and then selects one based on whether or not the condition holds. This strategy makes sense only in restricted cases, but it can then be implemented by a simple conditional move instruction that is better matched to the performance characteristics of modern processors.
Here, we examine this strategy and its implementation with x86-64.
Figure 3.17(a) shows an example of code that can be compiled using a conditional move. The function computes the absolute value of the difference of its arguments x and y, as did our earlier example (Figure 3.16). Whereas the earlier example had side effects in the branches, modifying the value of either lt_cnt or ge_cnt, this version simply computes the value to be returned by the function.
(a) Original C code

    long absdiff(long x, long y)
    {
        long result;
        if (x < y)
            result = y - x;
        else
            result = x - y;
        return result;
    }

(b) Implementation using conditional assignment

     1  long cmovdiff(long x, long y)
     2  {
     3      long rval = y-x;
     4      long eval = x-y;
     5      long ntest = x >= y;
     6      /* Line below requires
     7         single instruction: */
     8      if (ntest) rval = eval;
     9      return rval;
    10  }

(c) Generated assembly code

        long absdiff(long x, long y)
        x in %rdi, y in %rsi
     1  absdiff:
     2    movq    %rsi, %rax
     3    subq    %rdi, %rax      rval = y-x
     4    movq    %rdi, %rdx
     5    subq    %rsi, %rdx      eval = x-y
     6    cmpq    %rsi, %rdi      Compare x:y
     7    cmovge  %rdx, %rax      If >=, rval = eval
     8    ret                     Return rval

Figure 3.17 Compilation of conditional statements using conditional assignment. (a) C function absdiff contains a conditional expression. The generated assembly code is shown (c), along with (b) a C function cmovdiff that mimics the operation of the assembly code.
For this function, GCC generates the assembly code shown in Figure 3.17(c), whose approximate form is given by the C function cmovdiff in Figure 3.17(b). Studying the C version, we can see that it computes both y-x and x-y, naming these rval and eval, respectively. It then tests whether x is greater than or equal to y, and if so, copies eval to rval before returning rval. The assembly code in Figure 3.17(c) follows the same logic. The key is that the single cmovge instruction (line 7) of the assembly code implements the conditional assignment (line 8) of cmovdiff. It will transfer the data from the source register to the destination only if the cmpq instruction of line 6 indicates that x is greater than or equal to y (as indicated by the suffix ge).
To understand why code based on conditional data transfers can outperform code based on conditional control transfers (as in Figure 3.16), we must understand something about how modern processors operate. As we will see in Chapters 4 and 5, processors achieve high performance through pipelining, where an instruction is processed via a sequence of stages, each performing one small portion of the required operations (e.g., fetching the instruction from memory, determining the instruction type, reading from memory, performing an arithmetic operation, writing to memory, and updating the program counter). This approach achieves high performance by overlapping the steps of the successive instructions, such as fetching one instruction while performing the arithmetic operations for a previous instruction. To do this requires being able to determine the sequence of instructions to be executed well ahead of time in order to keep the pipeline full of instructions to be executed. When the machine encounters a conditional jump (referred to as a "branch"), it cannot determine which way the branch will go until it has evaluated the branch condition. Processors employ sophisticated branch prediction logic to try to guess whether or not each jump instruction will be followed.
As long as it can guess reliably (modern microprocessor designs try to achieve success rates on the order of 90%), the instruction pipeline will be kept full of instructions. Mispredicting a jump, on the other hand, requires that the processor discard much of the work it has already done on future instructions and then begin filling the pipeline with instructions starting at the correct location. As we will see, such a misprediction can incur a serious penalty, say, 15-30 clock cycles of wasted effort, causing a serious degradation of program performance.
As an example, we ran timings of the absdiff function on an Intel Haswell processor using both methods of implementing the conditional operation. In a typical application, the outcome of the test x < y is highly unpredictable, and so even the most sophisticated branch prediction hardware will guess correctly only around 50% of the time. In addition, the computations performed in each of the two code sequences require only a single clock cycle. As a consequence, the branch misprediction penalty dominates the performance of this function. For x86-64 code with conditional jumps, we found that the function requires around 8 clock cycles per call when the branching pattern is easily predictable, and around 17.50 clock cycles per call when the branching pattern is random. From this, we can infer that the branch misprediction penalty is around 19 clock cycles. That means the time required by the function ranges between around 8 and 27 cycles, depending on whether or not the branch is predicted correctly.
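The arithmetic behind these figures can be sketched in a few lines of C. The model assumed here is that a call costs a fixed number of cycles when the branch is predicted correctly, plus a fixed penalty when it is mispredicted, and that a random branching pattern is mispredicted half the time. The function names are our own and are only illustrative.

```c
#include <assert.h>

/* Average cycles per call when a fraction p of the branches is
   mispredicted: t_ok cycles without misprediction, plus a penalty
   of t_mp cycles for each misprediction. */
double avg_cycles(double t_ok, double t_mp, double p)
{
    return t_ok + p * t_mp;
}

/* Penalty inferred from the time for a predictable pattern (t_ok)
   and the time for a random pattern (t_ran), assuming p = 0.5. */
double mispredict_penalty(double t_ok, double t_ran)
{
    return 2.0 * (t_ran - t_ok);
}

/* Hypothetical helper: reproduces the numbers quoted for the
   Haswell measurements. */
void check_haswell_numbers(void)
{
    double t_mp = mispredict_penalty(8.0, 17.5);
    assert(t_mp == 19.0);
    assert(avg_cycles(8.0, t_mp, 0.5) == 17.5);
    assert(avg_cycles(8.0, t_mp, 1.0) == 27.0);  /* always mispredicted */
}
```

With 8 cycles for the predictable case and 17.5 for the random case, the sketch gives a penalty of 19 cycles and a worst case of 27 cycles, matching the range stated above.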
Aside How did you determine this penalty?
Assume the probability of misprediction is $p$, the time to execute the code without misprediction is $T_{\mathrm{OK}}$, and the misprediction penalty is $T_{\mathrm{MP}}$. Then the average time to execute the code as a function of $p$ is $T_{\mathrm{avg}}(p) = (1 - p)T_{\mathrm{OK}} + p(T_{\mathrm{OK}} + T_{\mathrm{MP}}) = T_{\mathrm{OK}} + pT_{\mathrm{MP}}$. We are given $T_{\mathrm{OK}}$ and $T_{\mathrm{ran}}$, the average time when $p = 0.5$, and we want to determine $T_{\mathrm{MP}}$. Substituting into the equation, we get $T_{\mathrm{ran}} = T_{\mathrm{avg}}(0.5) = T_{\mathrm{OK}} + 0.5T_{\mathrm{MP}}$, and therefore $T_{\mathrm{MP}} = 2(T_{\mathrm{ran}} - T_{\mathrm{OK}})$. So, for $T_{\mathrm{OK}} = 8$ and $T_{\mathrm{ran}} = 17.5$, we get $T_{\mathrm{MP}} = 19$.
On the other hand, the code compiled using conditional moves requires around 8 clock cycles regardless of the data being tested. The flow of control does not depend on data, and this makes it easier for the processor to keep its pipeline full.
Practice Problem 3.19
Running on an older processor model, our code required around 16 cycles when the branching pattern was highly predictable, and around 31 cycles when the pattern was random.
A. What is the approximate miss penalty?
B. How many cycles would the function require when the branch is mispredicted?
Figure 3.18 illustrates some of the conditional move instructions available with x86-64. Each of these instructions has two operands: a source register or memory location S, and a destination register R. As with the different SET (Section 3.6.2) and jump (Section 3.6.3) instructions, the outcome of these instructions depends on the values of the condition codes. The source value is read from either memory or the source register, but it is copied to the destination only if the specified condition holds.
The source and destination values can be 16, 32, or 64 bits long. Single-byte conditional moves are not supported. Unlike the unconditional instructions, where the operand length is explicitly encoded in the instruction name (e.g., movw and movl), the assembler can infer the operand length of a conditional move instruction from the name of the destination register, and so the same instruction name can be used for all operand lengths.
Unlike conditional jumps, the processor can execute conditional move instructions without having to predict the outcome of the test. The processor simply reads the source value (possibly from memory), checks the condition code, and then either updates the destination register or keeps it the same. We will explore the implementation of conditional moves in Chapter 4.
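The behavior just described can be modeled in C. The following is a sketch under our own assumptions, not a description of the hardware: the type cond_codes and the functions compare, cmovge, and cmovb are hypothetical names, and int64_t is used in place of long to fix the operand width at 64 bits. The function compare models the condition codes set by cmpq b, a (which computes a - b), and each cmov function returns the new value of the destination, using the signed and unsigned move conditions for "greater or equal" and "below".

```c
#include <assert.h>
#include <stdint.h>

/* Condition codes modeled as 0/1 values */
typedef struct {
    unsigned zf, sf, cf, of;
} cond_codes;

/* Model of "cmpq b, a": set the codes according to a - b */
cond_codes compare(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t = ua - ub;                 /* wraps modulo 2^64 */
    unsigned sa = (unsigned)(ua >> 63);   /* sign bit of a */
    unsigned sb = (unsigned)(ub >> 63);   /* sign bit of b */
    unsigned st = (unsigned)(t >> 63);    /* sign bit of a - b */
    cond_codes c;
    c.zf = (t == 0);
    c.sf = st;
    c.cf = (ua < ub);                     /* unsigned borrow */
    c.of = (sa != sb) && (st != sa);      /* signed overflow */
    return c;
}

/* Model of cmovge S,R: copy src when ~(SF ^ OF) holds */
int64_t cmovge(cond_codes c, int64_t src, int64_t dest)
{
    return !(c.sf ^ c.of) ? src : dest;
}

/* Model of cmovb S,R: copy src when CF holds */
int64_t cmovb(cond_codes c, int64_t src, int64_t dest)
{
    return c.cf ? src : dest;
}

/* Hypothetical helper: the cmovge condition should match signed >= */
void check_cmovge(int64_t a, int64_t b)
{
    assert(cmovge(compare(a, b), 1, 0) == (a >= b));
}
```

In every case the source value is available before the condition is checked; only the final copy depends on the condition codes, which is why no prediction is needed.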
To understand how conditional operations can be implemented via conditional data transfers, consider the following general form of conditional expression and assignment:
    Instruction    Synonym    Move condition       Description
    cmove   S,R    cmovz      ZF                   Equal / zero
    cmovne  S,R    cmovnz     ~ZF                  Not equal / not zero
    cmovs   S,R               SF                   Negative
    cmovns  S,R               ~SF                  Nonnegative
    cmovg   S,R    cmovnle    ~(SF ^ OF) & ~ZF     Greater (signed >)
    cmovge  S,R    cmovnl     ~(SF ^ OF)           Greater or equal (signed >=)
    cmovl   S,R    cmovnge    SF ^ OF              Less (signed <)
    cmovle  S,R    cmovng     (SF ^ OF) | ZF       Less or equal (signed <=)
    cmova   S,R    cmovnbe    ~CF & ~ZF            Above (unsigned >)
    cmovae  S,R    cmovnb     ~CF                  Above or equal (unsigned >=)
    cmovb   S,R    cmovnae    CF                   Below (unsigned <)
    cmovbe  S,R    cmovna     CF | ZF              Below or equal (unsigned <=)

Figure 3.18 The conditional move instructions. These instructions copy the source value S to its destination R when the move condition holds. Some instructions have "synonyms," alternate names for the same machine instruction.
v = test-expr ? then-expr : else-expr;
The standard way to compile this expression using conditional control transfer would have the following form:
        if (!test-expr)
            goto false;
        v = then-expr;
        goto done;
    false:
        v = else-expr;
    done:
This code contains two code sequences: one evaluating then-expr and one evaluating else-expr. A combination of conditional and unconditional jumps is used to ensure that just one of the sequences is evaluated.
For the code based on a conditional move, both the then-expr and the else-expr are evaluated, with the final value chosen based on the evaluation of test-expr. This can be described by the following abstract code:
    v = then-expr;
    ve = else-expr;
    t = test-expr;
    if (!t) v = ve;
The final statement in this sequence is implemented with a conditional move: value ve is copied to v only if test condition t does not hold.
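As an illustration of this abstract form, the sketch below writes one conditional expression both ways. The expression and the function names are hypothetical examples of our own. Whether a compiler actually emits a conditional move for either version depends on the compiler and its options, so the comments describe only the intended correspondence.

```c
#include <assert.h>

/* Hypothetical example: v = (x > y) ? x + 1 : y - 1 */
long select_expr(long x, long y)
{
    return (x > y) ? x + 1 : y - 1;
}

/* The same computation in the conditional-data-transfer form */
long select_cmov_style(long x, long y)
{
    long v  = x + 1;   /* then-expr, always evaluated */
    long ve = y - 1;   /* else-expr, always evaluated */
    long t  = x > y;   /* test-expr */
    if (!t) v = ve;    /* candidate for a single conditional move */
    return v;
}

/* Hypothetical helper: both forms should agree on any input */
void check_select(long x, long y)
{
    assert(select_expr(x, y) == select_cmov_style(x, y));
}
```

Both expressions here are cheap and free of side effects, which is what makes evaluating them unconditionally safe.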
Not all conditional expressions can be compiled using conditional moves.
Most significantly, the abstract code we have shown evaluates both then-expr and else-expr regardless of the test outcome. If one of those two expressions could possibly generate an error condition or a side effect, this could lead to invalid behavior. Such is the case for our earlier example (Figure 3.16). Indeed, we put the side effects into this example specifically to force GCC to implement this function using conditional transfers.
As a second illustration, consider the following C function:
    long cread(long *xp) {
        return (xp ? *xp : 0);
    }
At first, this seems like a good candidate to compile using a conditional move to set the result to zero when the pointer is null, as shown in the following assembly code:
        long cread(long *xp)
        Invalid implementation of function cread
        xp in register %rdi
     1  cread:
     2    movq    (%rdi), %rax      v = *xp
     3    testq   %rdi, %rdi        Test x
     4    movl    $0, %edx          Set ve = 0
     5    cmove   %rdx, %rax        If x==0, v = ve
     6    ret                       Return v
This implementation is invalid, however, since the dereferencing of xp by the movq instruction (line 2) occurs even when the test fails, causing a null pointer dereferencing error. Instead, this code must be compiled using branching code.
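One way to see where the problem lies is to separate the selection from the dereference. The sketch below is our own illustration, not code from the text: cread_alt and check_cread are hypothetical names. It selects between xp and the address of a zero-valued variable, which has no side effects, and only then dereferences the chosen pointer. This form never reads through a null pointer, but whether a given compiler turns the selection into a conditional move is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Version from the text: the dereference must stay behind the test */
long cread(long *xp)
{
    return (xp ? *xp : 0);
}

/* Hypothetical alternative: choose a safe address first, then read it */
long cread_alt(long *xp)
{
    static const long zero = 0;
    const long *p = xp ? xp : &zero;   /* selecting a pointer is side-effect free */
    return *p;                         /* p is never null here */
}

/* Hypothetical helper: both versions should agree, including for NULL */
void check_cread(long *xp)
{
    assert(cread(xp) == cread_alt(xp));
}
```

The design point is that only the pointer value is chosen conditionally; the single memory read happens afterward and is always valid.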
Using conditional moves also does not always improve code efficiency. For example, if either the then-expr or the else-expr evaluation requires a significant computation, then this effort is wasted when the corresponding condition does not hold. Compilers must take into account the relative performance of wasted computation versus the potential for performance penalty due to branch misprediction. In truth, they do not really have enough information to make this decision reliably; for example, they do not know how well the branches will follow predictable patterns. Our experiments with GCC indicate that it only uses conditional moves when the two expressions can be computed very easily, for example, with single add instructions. In our experience, GCC uses conditional control transfers even in many cases where the cost of branch misprediction would exceed even more complex computations.
Overall, then, we see that conditional data transfers offer an alternative strategy to conditional control transfers for implementing conditional operations. They can only be used in restricted cases, but these cases are fairly common and provide a much better match to the operation of modern processors.
Practice Problem 3.20
In the following C function, we have left the definition of operation OP incomplete:

    #define OP ____ /* Unknown operator */

    long arith(long x) {
        return x OP 8;
    }
When compiled, GCC generates the following assembly code:

    long arith(long x)
    x in %rdi
    arith:
      leaq    7(%rdi), %rax
      testq   %rdi, %rdi
      cmovns  %rdi, %rax
      sarq    $3, %rax
      ret

A. What operation is OP?
B. Annotate the code to explain how it works.
Starting with C code of the form

    long test(long x, long y) {
        long val = ____;
        if (____) {
            if (____)
                val = ____;
            else
                val = ____;
        } else if (____)
            val = ____;
        return val;
    }
GCC generates the following assembly code:

    long test(long x, long y)
    x in %rdi, y in %rsi
    test:
      leaq    0(,%rdi,8), %rax
      testq   %rsi, %rsi
      jle     .L2
      movq    %rsi, %rax
      subq    %rdi, %rax
      movq    %rdi, %rdx
      andq    %rsi, %rdx
      cmpq    %rsi, %rdi
      cmovge  %rdx, %rax
      ret
    .L2:
      addq    %rsi, %rdi
      cmpq    $-2, %rsi
      cmovle  %rdi, %rax
      ret
Fill in the missing expressions in the C code.