IBM Search Java Site Map Microelectronics News Products Services Technology About Us IBM Microelectronics Order Contact Legal

power pcProducts

overview
news
products
documents
performance
technology


[ Table of Contents | Index ]
Appendix D

D. Optimal Code Sequences


These code sequences are derived from the output of the GNU superoptimizer version 2.5 configured for the PowerPC architecture. The superoptimizer generates all sequences that perform a specified function through an exhaustive search. See Granlund and Kenner [1992] for further details. When multiple forms were found (most cases), sequences that did not set the Carry bit, had more parallelism, had fewer register operands, or generalized to 64-bit implementations were favored. A clever compiler might want to consider multiple sequences to minimize resource conflicts.

The GNU superoptimizer includes a large number of goal functions, basic operations for which the superoptimizer can attempt to find equivalent instruction sequences. Some of the goal functions, primarily shifts, for which the PowerPC architecture has direct instruction support have been removed from this table. The values of v0, v1, and so forth are stored in R3, R4, and so forth, respectively. At the end of the code sequence, the highest numbered register contains the result.


D.1 Comparisons and Comparisons Against Zero

These operators provide a truth value for the relationship between two values. Versions for both signed and unsigned values are required. They are branch-free forms which produce a truth value in a register with the semantics of ANSI C. That is, true is one, false is zero. Special forms for comparison against zero are listed because they are frequently shorter than the general sequence.

eq: equal to
r = v0 == v1;


subf R5,R3,R4
cntlzw R6,R5
srwi R7,R6,5

______________________________________________________________________

ne: not equal to
r = v0 != v1;


subf R5,R3,R4 subf R5,R3,R4
addic R6,R5,-1 subf R6,R4,R3
subfe R7,R6,R5 or R7,R6,R5
srwi R8,R7,31

______________________________________________________________________

les: less than or equal to (signed)
r = (signed_word) v0 <= (signed_word) v1;


ges: greater than or equal to (signed)
r = (signed_word) v1 >= (signed_word) v0;


srwi R5,R3,31
srawi R6,R4,31
subfc R7,R3,R4
adde R8,R6,R5

______________________________________________________________________

leu: less than or equal to (unsigned)
r = (unsigned_word) v0 <= (unsigned_word) v1;


geu: greater than or equal to (unsigned)
r = (unsigned_word) v1 >= (unsigned_word) v0;


li R6,-1
subfc R5,R3,R4
subfze R7,R6

______________________________________________________________________

lts: less than (signed)
r = (signed_word) v0 < (signed_word) v1;


gts: greater than (signed)
r = (signed_word) v1 > (signed_word) v0;


subfc R5,R4,R3
eqv R6,R4,R3
srwi R7,R6,31
addze R8,R7
rlwinm R9,R8,0,31,31

______________________________________________________________________

ltu: less than or equal to (unsigned)
r = (unsigned_word) v0 < (unsigned_word) v1;


gtu: greater than or equal to (unsigned)
r = (unsigned_word) v1 > (unsigned_word) v0;


subfc R5,R4,R3
subfe R6,R6,R6
neg R7,R6

______________________________________________________________________

eq0: equal to 0
r = v0 == 0;


subfic R4,R3,0 cntlzw R4,R3
adde R5,R4,R3 srwi R5,R4,5

______________________________________________________________________

ne0: not equal to 0
r = v0 != 0;


addic R4,R3,-1
subfe R5,R4,R3

______________________________________________________________________

les0: less than or equal to 0 (signed)
r = (signed_word) v0 <= 0;


neg R4,R3
orc R5,R3,R4
srwi R6,R5,31

______________________________________________________________________

ges0: greater than or equal to 0 (signed)
r = (signed_word) v0 >= 0;


srwi R4,R3,31
xori R5,R4,1

______________________________________________________________________

lts0: less than 0 (signed)
r = (signed_word) v0 < 0;


srwi R4,R3,31
______________________________________________________________________

gts0: greater than 0 (signed)
r = (signed_word) v0 > 0;


neg R4,R3
andc R5,R4,R3
srwi R6,R5,31

______________________________________________________________________

D.2 Negated Comparisons and Negated Comparisons Against Zero

These are branch-free forms that place a full word truth value into a register. Negated comparisons return 0 if the condition is false and -1 (0xFFFF_FFFF on a 32-bit machine) if it is true. That is, each bit in the word reflects the truth value of the comparison. In general, these sequences are building blocks for specialized sequences, but may be constructed by a compiler during optimization. Compare to zero is a special case because shorter forms are frequently available.


neq: negative equal to
r = -(v0 == v1);


subf R5,R4,R3
addic R6,R5,-1
subfe R7,R7,R7

______________________________________________________________________

nne: negative not equal to
r = -(v0 != v1);


subf R5,R4,R3
subfic R6,R5,0
subfe R7,R7,R7

______________________________________________________________________

nles: negative less than or equal to (signed)
r = -((signed_word) v0 <= (signed_word) v1);


nges: negative greater than or equal to (signed)
r = -((signed_word) v1 >= (signed_word) v0);


xoris R5,R3,0x8000
subf R6,R3,R4
addc R7,R6,R5
subfe R8,R8,R8

______________________________________________________________________

nleu: negative less than or equal to (unsigned)
r = -((unsigned_word) v0 <= (unsigned_word) v1);


ngeu: negative greater than or equal to (unsigned)
r = -((unsigned_word) v1 >= (unsigned_word) v0);


subfc R5,R3,R4
addze R6,R3
subf R7,R6,R3

______________________________________________________________________

nlts: negative less than (signed)
r = -((signed_word) v0 < (signed_word) v1);


ngts: negative greater than (signed)
r = -((signed_word) v1 > (signed_word) v0);


subfc R5,R4,R3
srwi R6,R4,31
srwi R7,R3,31
subfe R8,R7,R6

______________________________________________________________________

nltu: negative less than (unsigned)
r = -((unsigned_word) v0 < (unsigned_word) v1);


ngtu: negative greater than (unsigned)
r = -((unsigned_word) v1 > (unsigned_word) v0);


subfc R5,R4,R3
subfe R6,R6,R6

______________________________________________________________________

neq0: negative equal to 0
r = -(v0 == 0);


addic R4,R3,-1
subfe R5,R5,R5


______________________________________________________________________
nne0: negative not equal to 0
r = -(v0 != 0);


subfic R4,R3,0
subfe R5,R5,R5

______________________________________________________________________

nles0: negative less than or equal to 0 (signed)
r = -((signed_word) v0 <= 0);


addic R4,R3,-1
srwi R5,R3,31
subfze R6,R5

______________________________________________________________________

nges0: negative greater than or equal to 0 (signed)
r = -((signed_word) v0 >= 0);


srwi R4,R3,31
addi R5,R4,-1

______________________________________________________________________

nlts0: negative less than 0 (signed)
r = -((signed_word) v0 < 0);


srawi R4,R3,31
______________________________________________________________________

ngts0: negative greater than 0 (signed)
r = -((signed_word) v0 > 0);


subfic R4,R3,0
srwi R5,R3,31
addme R6,R5

______________________________________________________________________

D.3 Comparison Operators

This operation provides an index value that captures the full relationship between two values: -1 if less than, 0 if equal, and 1 if greater than. Comparisons occur frequently in sorting and searching. Frequently the value computed is used as an index rather than tested with branch instructions.

cmpu: compare (unsigned)
r = (unsigned_word) v0 > (unsigned_word) v1? 1 : ((unsigned_word) v0 < (unsigned_word) v1 ? -1 : 0);


subf R5,R4,R3
subfc R6,R3,R4
subfe R7,R4,R3
subfe R8,R7,R5

______________________________________________________________________

D.4 Sign Manipulation

These operations manipulate the sign of a value. The sign function sgn returns zero if the value of its argument is zero, 1 if it is greater than zero, and -1 if it is less than zero. The absolute value and negated absolute values return the input value made either a positive or negative, respectively.

sgn:
r = (signed_word) v0 > 0 ? 1 : ((signed_word) v0 < 0 ? -1 : 0);


xoris R4,R3,0x8000 addc R4,R3,R3
srawi R5,R4,31 subfe R5,R3,R4
subfze R6,R5 subfe R6,R5,R3

______________________________________________________________________

abs:
r = (signed_word) v0 < 0 ? -v0 : v0;


srawi R4,R3,31
add R5,R4,R3
xor R6,R5,R4

______________________________________________________________________

nabs
r = (signed_word) v0 > 0 ? -v0 : v0;


srawi R4,R3,31
subf R5,R3,R4
xor R6,R5,R4

______________________________________________________________________

D.5 Comparisons with Addition

These sequences all handle the sum or difference of a value with the result of a relational operator: a conditional increment or decrement. As with other relational expressions, both signed and unsigned forms are necessary. Likewise, the case of one of the relation operands having the value of zero is specialized as there are shorter code sequences. These sequences are for optimizing common C constructs like:

if(cond) a++;

if(cond)a=someval;
else a=someval+1;

eq+: if equal to, increment
r = (v0 == v1) + v2;


subf R6,R3,R4
subfic R7,R6,0
addze R8,R5

______________________________________________________________________

ne+: if not equal to, increment
r = (v0 != v1) + v2;


subf R6,R3,R4
addic R7,R6,-1
addze R8,R5

______________________________________________________________________

les+: if less than or equal to (signed), increment
r = ((signed_word) v0 <= (signed_word) v1) + v2;


ges+: if greater than or equal to (signed), increment
r = ((signed_word) v1 >= (signed_word) v0) + v2;


xoris R6,R3,0x8000
xoris R7,R4,0x8000
subfc R8,R6,R7
addze R9,R5

______________________________________________________________________

leu+: if less than or equal to (unsigned), increment
r = ((unsigned_word) v0 <= (unsigned_word) v1) + v2;


geu+: if greater than or equal to (unsigned), increment
r = ((unsigned_word) v1 >= (unsigned_word) v0) + v2;


subfc R6,R3,R4
addze R7,R5

______________________________________________________________________

lts+: if less than (signed), increment
r = ((signed_word) v0 < (signed_word) v1) + v2;


gts+: if greater than (signed), increment
r = ((signed_word) v1 > (signed_word) v0) + v2;


subf R6,R4,R3
xoris R7,R4,0x8000
addc R8,R7,R6
addze R9,R5

______________________________________________________________________

ltu+: if less than (unsigned), increment
r = ((unsigned_word) v0 < (unsigned_word) v1) + v2;


gtu+: if greater than (unsigned), increment
r = ((unsigned_word) v1 > (unsigned_word) v0) + v2;


subfc R6,R4,R3
subfze R7,R5
neg R8,R7

______________________________________________________________________

eq0+: if equal to 0, increment
r = (v0 == 0) + v1;


subfic R5,R3,0
addze R6,R4

______________________________________________________________________

ne0+: if not equal to 0, increment
r = (v0 != 0) + v1;


addic R5,R3,-1
addze R6,R4

______________________________________________________________________

les0+: if less than or equal to 0 (signed), increment
r = ((signed_word) v0 <= 0) + v1;


subfic R5,R3,0
srwi R6,R3,31
adde R7,R6,R4

______________________________________________________________________

ges0+: if greater than or equal to 0 (signed), increment
r = ((signed_word) v0 >= 0) + v1;


addi R5,R4,1
srwi R6,R3,31
subf R7,R6,R5

______________________________________________________________________

lts0+: if less than 0 (signed), increment
r = ((signed_word) v0 < 0) + v1;


srwi R5,R3,31
add R6,R5,R4

______________________________________________________________________

gts0+: if greater than 0 (signed), increment
r = ((signed_word) v0 > 0) + v1;


neg R5,R3
srawi R6,R5,31
addze R7,R4

______________________________________________________________________

D.6 Bit Manipulation

The clear_lsb(a) function clears the least-significant 1-bit. The clear_lsb2(a,b) function clears the bit in b corresponding to the least-significant 1-bit of a.

clear_lsb:
r = v0 & ~(v0 & -v0);


neg R4,R3
andc R5,R3,R4

______________________________________________________________________

clear_lsb2:
r = v1 & ~(v0 & -v0);


neg R5,R3
and R6,R5,R3
andc R7,R4,R6

______________________________________________________________________


[ Table of Contents | Index ]
Copyright 1998 IBMchips