[ Table of Contents | Index ]
Appendix D
D. Optimal Code Sequences
These code sequences are derived from the output of the GNU superoptimizer version 2.5 configured for the PowerPC architecture. The superoptimizer generates all sequences that perform a specified function through an exhaustive search. See Granlund and Kenner [1992] for further details. When multiple forms were found (most cases), sequences that did not set the Carry bit, had more parallelism, had fewer register operands, or generalized to 64-bit implementations were favored. A clever compiler might want to consider multiple sequences to minimize resource conflicts.
The GNU superoptimizer includes a large number of goal functions, basic operations for which the superoptimizer can attempt to find equivalent instruction sequences. Some of the goal functions, primarily shifts, for which the PowerPC architecture has direct instruction support have been removed from this table. The values of v0, v1, and so forth are stored in R3, R4, and so forth, respectively. At the end of the code sequence, the highest numbered register contains the result.
D.1 Comparisons and Comparisons Against Zero
These operators provide a truth value for the relationship between two values. Versions for both signed and unsigned values are required. They are branch-free forms which produce a truth value in a register with the semantics of ANSI C. That is, true is one, false is zero. Special forms for comparison against zero are listed because they are frequently shorter than the general sequence.
eq: equal to r = v0 == v1;
subf R5,R3,R4 cntlzw R6,R5 srwi R7,R6,5
______________________________________________________________________
ne: not equal to r = v0 != v1;
subf R5,R3,R4 subf R5,R3,R4 addic R6,R5,-1 subf R6,R4,R3 subfe R7,R6,R5 or R7,R6,R5 srwi R8,R7,31
______________________________________________________________________
les: less than or equal to (signed) r = (signed_word) v0 <= (signed_word) v1;
ges: greater than or equal to (signed) r = (signed_word) v1 >= (signed_word) v0;
srwi R5,R3,31 srawi R6,R4,31 subfc R7,R3,R4 adde R8,R6,R5
______________________________________________________________________
leu: less than or equal to (unsigned) r = (unsigned_word) v0 <= (unsigned_word) v1;
geu: greater than or equal to (unsigned) r = (unsigned_word) v1 >= (unsigned_word) v0;
li R6,-1 subfc R5,R3,R4 subfze R7,R6
______________________________________________________________________
lts: less than (signed) r = (signed_word) v0 < (signed_word) v1;
gts: greater than (signed) r = (signed_word) v1 > (signed_word) v0;
subfc R5,R4,R3 eqv R6,R4,R3 srwi R7,R6,31 addze R8,R7 rlwinm R9,R8,0,31,31
______________________________________________________________________
ltu: less than or equal to (unsigned) r = (unsigned_word) v0 < (unsigned_word) v1;
gtu: greater than or equal to (unsigned) r = (unsigned_word) v1 > (unsigned_word) v0;
subfc R5,R4,R3 subfe R6,R6,R6 neg R7,R6
______________________________________________________________________
eq0: equal to 0 r = v0 == 0;
subfic R4,R3,0 cntlzw R4,R3 adde R5,R4,R3 srwi R5,R4,5
______________________________________________________________________
ne0: not equal to 0 r = v0 != 0;
addic R4,R3,-1 subfe R5,R4,R3
______________________________________________________________________
les0: less than or equal to 0 (signed) r = (signed_word) v0 <= 0;
neg R4,R3 orc R5,R3,R4 srwi R6,R5,31
______________________________________________________________________
ges0: greater than or equal to 0 (signed) r = (signed_word) v0 >= 0;
srwi R4,R3,31 xori R5,R4,1
______________________________________________________________________
lts0: less than 0 (signed) r = (signed_word) v0 < 0;
srwi R4,R3,31
______________________________________________________________________
gts0: greater than 0 (signed) r = (signed_word) v0 > 0;
neg R4,R3 andc R5,R4,R3 srwi R6,R5,31
______________________________________________________________________
D.2 Negated Comparisons and Negated Comparisons Against Zero
These are branch-free forms that place a full word truth value into a register. Negated comparisons return 0 if the condition is false and -1 (0xFFFF_FFFF on a 32-bit machine) if it is true. That is, each bit in the word reflects the truth value of the comparison. In general, these sequences are building blocks for specialized sequences, but may be constructed by a compiler during optimization. Compare to zero is a special case because shorter forms are frequently available.
neq: negative equal to r = -(v0 == v1);
subf R5,R4,R3 addic R6,R5,-1 subfe R7,R7,R7
______________________________________________________________________
nne: negative not equal to r = -(v0 != v1);
subf R5,R4,R3 subfic R6,R5,0 subfe R7,R7,R7
______________________________________________________________________
nles: negative less than or equal to (signed) r = -((signed_word) v0 <= (signed_word) v1);
nges: negative greater than or equal to (signed) r = -((signed_word) v1 >= (signed_word) v0);
xoris R5,R3,0x8000 subf R6,R3,R4 addc R7,R6,R5 subfe R8,R8,R8
______________________________________________________________________
nleu: negative less than or equal to (unsigned) r = -((unsigned_word) v0 <= (unsigned_word) v1);
ngeu: negative greater than or equal to (unsigned) r = -((unsigned_word) v1 >= (unsigned_word) v0);
subfc R5,R3,R4 addze R6,R3 subf R7,R6,R3
______________________________________________________________________
nlts: negative less than (signed) r = -((signed_word) v0 < (signed_word) v1);
ngts: negative greater than (signed) r = -((signed_word) v1 > (signed_word) v0);
subfc R5,R4,R3 srwi R6,R4,31 srwi R7,R3,31 subfe R8,R7,R6
______________________________________________________________________
nltu: negative less than (unsigned) r = -((unsigned_word) v0 < (unsigned_word) v1);
ngtu: negative greater than (unsigned) r = -((unsigned_word) v1 > (unsigned_word) v0);
subfc R5,R4,R3 subfe R6,R6,R6
______________________________________________________________________
neq0: negative equal to 0 r = -(v0 == 0);
addic R4,R3,-1 subfe R5,R5,R5
______________________________________________________________________
nne0: negative not equal to 0 r = -(v0 != 0);
subfic R4,R3,0 subfe R5,R5,R5
______________________________________________________________________
nles0: negative less than or equal to 0 (signed) r = -((signed_word) v0 <= 0);
addic R4,R3,-1 srwi R5,R3,31 subfze R6,R5
______________________________________________________________________
nges0: negative greater than or equal to 0 (signed) r = -((signed_word) v0 >= 0);
srwi R4,R3,31 addi R5,R4,-1
______________________________________________________________________
nlts0: negative less than 0 (signed) r = -((signed_word) v0 < 0);
srawi R4,R3,31
______________________________________________________________________
ngts0: negative greater than 0 (signed) r = -((signed_word) v0 > 0);
subfic R4,R3,0 srwi R5,R3,31 addme R6,R5
______________________________________________________________________
D.3 Comparison Operators
This operation provides an index value that captures the full relationship between two values: -1 if less than, 0 if equal, and 1 if greater than. Comparisons occur frequently in sorting and searching. Frequently the value computed is used as an index rather than tested with branch instructions.
cmpu: compare (unsigned) r = (unsigned_word) v0 > (unsigned_word) v1? 1 : ((unsigned_word) v0 < (unsigned_word) v1 ? -1 : 0);
subf R5,R4,R3 subfc R6,R3,R4 subfe R7,R4,R3 subfe R8,R7,R5
______________________________________________________________________
D.4 Sign Manipulation
These operations manipulate the sign of a value. The sign function sgn returns zero if the value of its argument is zero, 1 if it is greater than zero, and -1 if it is less than zero. The absolute value and negated absolute values return the input value made either a positive or negative, respectively.
sgn: r = (signed_word) v0 > 0 ? 1 : ((signed_word) v0 < 0 ? -1 : 0);
xoris R4,R3,0x8000 addc R4,R3,R3 srawi R5,R4,31 subfe R5,R3,R4 subfze R6,R5 subfe R6,R5,R3
______________________________________________________________________
abs: r = (signed_word) v0 < 0 ? -v0 : v0;
srawi R4,R3,31 add R5,R4,R3 xor R6,R5,R4
______________________________________________________________________
nabs r = (signed_word) v0 > 0 ? -v0 : v0;
srawi R4,R3,31 subf R5,R3,R4 xor R6,R5,R4
______________________________________________________________________
D.5 Comparisons with Addition
These sequences all handle the sum or difference of a value with the result of a relational operator: a conditional increment or decrement. As with other relational expressions, both signed and unsigned forms are necessary. Likewise, the case of one of the relation operands having the value of zero is specialized as there are shorter code sequences. These sequences are for optimizing common C constructs like:
if(cond) a++;
if(cond)a=someval; else a=someval+1;
eq+: if equal to, increment r = (v0 == v1) + v2;
subf R6,R3,R4 subfic R7,R6,0 addze R8,R5
______________________________________________________________________
ne+: if not equal to, increment r = (v0 != v1) + v2;
subf R6,R3,R4 addic R7,R6,-1 addze R8,R5
______________________________________________________________________
les+: if less than or equal to (signed), increment r = ((signed_word) v0 <= (signed_word) v1) + v2;
ges+: if greater than or equal to (signed), increment r = ((signed_word) v1 >= (signed_word) v0) + v2;
xoris R6,R3,0x8000 xoris R7,R4,0x8000 subfc R8,R6,R7 addze R9,R5
______________________________________________________________________
leu+: if less than or equal to (unsigned), increment r = ((unsigned_word) v0 <= (unsigned_word) v1) + v2;
geu+: if greater than or equal to (unsigned), increment r = ((unsigned_word) v1 >= (unsigned_word) v0) + v2;
subfc R6,R3,R4 addze R7,R5
______________________________________________________________________
lts+: if less than (signed), increment r = ((signed_word) v0 < (signed_word) v1) + v2;
gts+: if greater than (signed), increment r = ((signed_word) v1 > (signed_word) v0) + v2;
subf R6,R4,R3 xoris R7,R4,0x8000 addc R8,R7,R6 addze R9,R5
______________________________________________________________________
ltu+: if less than (unsigned), increment r = ((unsigned_word) v0 < (unsigned_word) v1) + v2;
gtu+: if greater than (unsigned), increment r = ((unsigned_word) v1 > (unsigned_word) v0) + v2;
subfc R6,R4,R3 subfze R7,R5 neg R8,R7
______________________________________________________________________
eq0+: if equal to 0, increment r = (v0 == 0) + v1;
subfic R5,R3,0 addze R6,R4
______________________________________________________________________
ne0+: if not equal to 0, increment r = (v0 != 0) + v1;
addic R5,R3,-1 addze R6,R4
______________________________________________________________________
les0+: if less than or equal to 0 (signed), increment r = ((signed_word) v0 <= 0) + v1;
subfic R5,R3,0 srwi R6,R3,31 adde R7,R6,R4
______________________________________________________________________
ges0+: if greater than or equal to 0 (signed), increment r = ((signed_word) v0 >= 0) + v1;
addi R5,R4,1 srwi R6,R3,31 subf R7,R6,R5
______________________________________________________________________
lts0+: if less than 0 (signed), increment r = ((signed_word) v0 < 0) + v1;
srwi R5,R3,31 add R6,R5,R4
______________________________________________________________________
gts0+: if greater than 0 (signed), increment r = ((signed_word) v0 > 0) + v1;
neg R5,R3 srawi R6,R5,31 addze R7,R4
______________________________________________________________________
D.6 Bit Manipulation
The clear_lsb(a) function clears the least-significant 1-bit. The clear_lsb2(a,b) function clears the bit in b corresponding to the least-significant 1-bit of a.
clear_lsb: r = v0 & ~(v0 & -v0);
neg R4,R3 andc R5,R3,R4
______________________________________________________________________
clear_lsb2: r = v1 & ~(v0 & -v0);
neg R5,R3 and R6,R5,R3 andc R7,R4,R6
______________________________________________________________________
[ Table of Contents | Index ]
|