IBM Search Java Site Map Microelectronics News Products Services Technology About Us IBM Microelectronics Order Contact Legal

power pcProducts

overview
news
products
documents
performance
technology


[ Table of Contents | Index ]

3.3 Floating-Point Operations

The principles underlying floating-point instruction selection parallel those for fixed-point instruction selection: The total number of instructions should be reduced, especially the number of long-latency instructions (divides, loads and stores). Dependences should be minimized to increase flexibility of scheduling and opportunities for parallel computation.

An important difference between floating-point and fixed-point instruction selection, however, involves the inherent rounding of floating-point calculations. In general, the finite floating-point format is an imperfect representation of a real number, hence the need for rounding. The error introduced during rounding can be analyzed, and various approaches have been developed to reproducibly control this error. The IEEE 754 Standard for Binary Floating-Point Arithmetic (IEEE 754) specifies the most common approach. The PowerPC architecture includes the features needed to conform to IEEE 754. The requirements of IEEE 754, however, restrict the optimizations available during code selection.

The PowerPC architecture's floating-point support includes:


3.3.1 Typing, Conversions and Rounding

The PowerPC architecture includes 32 64-bit Floating-Point Registers. These Floating-Point Registers contain the values of all source and destination operands for all floating-point operations (except for floating-point loads and stores). The floating-point data types supported by the PowerPC architecture include the IEEE 754 double and single formats and a 32-bit two's complement integer format. 64-bit implementations also support a 64-bit two's complement integer format.

During a load into a Floating-Point Register, the processor converts 32-bit single-precision values to the 64-bit double-precision format. A full complement of single-precision arithmetic operations accept single-precision values as input and round their results to single-precision. Maintaining the single-precision values in double-precision format in the Floating-Point Registers removes the need for these values to undergo conversion to double-precision. In addition, a double-precision operation can trivially perform the multiplication of two single-precision values to produce a double-precision result.

Some implementations perform all floating-point operations (except division) in double-precision format, rounding the final result to the target precision. Some implementations perform single-precision operations with less latency than the comparable double-precision operations. In summary, there may be timing differences between single-precision and double-precision operations, so consult Appendix B and specific implementation documentation for further information.

The 64-bit integer format fills the entire Floating-Point Register, while the 32-bit integer format resides in the low-order half of the register. These integer values result from a direct load into the Floating-Point Register, a conversion from a floating-point value, or the transferal of the contents of the FPSCR by mffs. The PowerPC architecture includes several instructions to support conversion between integer and floating-point formats in the Floating-Point Registers. Detailed algorithms describing their behavior can be found in Appendix B of Book I of The PowerPC Architecture. Code examples that perform these conversions appear in Section 3.3.8 of this book and Appendix E.3 of Book I of The PowerPC Architecture.

An important optimization involves avoiding the conversion of integer induction variables to floating-point values in floating-point computational loops if the integer values take part in floating-point computations. Integer-to-float conversions are time-consuming. One way to avoid this is to create a floating-point value that increments in lock step with the induction variable. Using this floating-point value in the calculation avoids the conversion.

The Floating Round to Single-Precision (frsp) instruction explicitly rounds the value in a Floating-Point Register to single-precision, with all applicable IEEE 754 exceptions. PowerPC single-precision arithmetic operations automatically round their results to single-precision.

The value of the RN field in the Floating-Point Status and Control Register (FPSCR) determines the rounding mode as follows:

The FPSCR instructions, mtfsb0, mtfsb1, mtfsfi, and mtfsf, can write this field.


3.3.2 Memory Access

The floating-point instruction set architecture includes load and store functionality analogous to that provided for fixed-point operations. This includes load, store, load with update, store with update, load indexed, store indexed, load with update indexed, and store with update indexed. No floating-point exceptions occur during the execution of these instructions, although the processor may generate Data Storage or Alignment interrupts.


3.3.2.1 Single-Precision Loads and Stores

The single-precision floating-point load copies the addressed word from memory and interprets it as a 32-bit single-precision value, which is reformatted as a 64-bit double-precision value and written to a Floating-Point Register. The single-precision store functions in reverse, reformatting the double-precision contents of a Floating-Point Register to a single-precision format and writing the result to the addressed word in memory. The detailed algorithms for the single-precision floating-point loads and stores are found in Sections 4.6.2 and 4.6.3 of Book I of The PowerPC Architecture, respectively.

A single-precision store operation does not round the register value to single-precision, but merely performs a format conversion when copying the data. This format conversion may involve denormalizing the value, but does not include rounding. For the storage of a properly rounded single-precision result, it is the compiler's responsibility either to round the result to single-precision with the frsp instruction prior to the store or to ensure that the value in the register falls in the range of the single-precision format. Storing a double-precision value that is not rounded to single-precision can produce unexpected results when the operand is out of range.


3.3.2.2 Double-Precision Loads and Stores

The double-precision load copies the addressed doubleword from memory to a Floating-Point Register without modification. Similarly, the double-precision store copies the contents of a Floating-Point Register to the addressed doubleword in memory without modification.


3.3.2.3 Endian Conversion

The fixed-point load with byte reversal and store with byte reversal instructions enable the endian reversal of floating-point values. The procedure for a single-precision floating-point value involves loading it into a General-Purpose Register using the Load Word with Byte Reversal instruction, and then storing it normally. The procedure for a double-precision floating-point value involves loading both of its 32-bit halves into General-Purpose Registers using Load Word with Byte Reversal instructions, followed by storing the two halves normally but reversing their order (upper to lower and lower to upper). Alternatively, the preceding procedures may also use normal loads and byte-reversal stores.


3.3.2.4 Touch Instructions

High-performance compilers and carefully crafted library code may use the touch instructions to give the processor a hint regarding which data to load into the cache. See the discussion in Section 4.4.3 on page 134 for details.


3.3.3 Floating-Point Move Instructions

The floating-point move instructions copy data from one Floating-Point Register to another, altering the sign bit in some cases. These instructions do not alter the FPSCR.


3.3.4 Computation

All floating-point arithmetic operations use only Floating-Point Register values as source and destination operands. Floating-point comparison operations use only Floating-Point Register values as source operands and a Condition Register field as a destination operand. Floating-point operands or results that are NaNs or infinities may increase execution time.


3.3.4.1 Setting Status Bits

All arithmetic, rounding and conversion floating-point instructions have a recording form, which copies bits 0:3 of the FPSCR to field 1 of the Condition Register, reflecting possible exceptions that occurred during calculation of the result. In addition, all floating-point operations update the FPSCR to characterize the result and associated exceptions.


3.3.4.2 Arithmetic

Arithmetic instructions have two forms: single- and double-precision. The processor does not round the source operands of a single-precision operation to single-precision, and for this reason and others, using double-precision valued operands leads to undefined results. All operands for single-precision operations should either be the result of a single-precision operation or should be previously rounded to single-precision. Use single-precision operators on single-precision values, and double-precision operators on single- or double-precision values.

Multiplication followed by a dependent addition is the most common idiom in floating-point code, so the PowerPC architecture includes a set of fused multiply-add instructions. The multiply-add instructions do not round the multiplication result before performing the addition. This property is useful for many algorithms, some of which appear in subsequent sections. On the other hand, this property does not conform to IEEE 754, which requires rounding after every operation, but the results of the fused multiply-add instructions always differ from IEEE 754 results in a way that would be considered more accurate.

Many PowerPC implementations have floating-point units designed around the fused multiply-add functionality. Regardless of the arithmetic operation, the actual execution involves both a multiplication and an addition. Addition operations include an effective multiply by 1. Multiplication operations include an effective add to 0. In these implementations, the multiply-add operations are faster than separate multiply and add steps.

For the negative multiply-add or multiply-subtract instructions, the rounding occurs prior to the sign inversion. Therefore, when the rounding mode is round toward ± , the fnmadd instruction, defined as -[A × C + B], is not the same as -A × C - B, and the compiler cannot substitute fnmadd for -A × C - B if IEEE 754 compatibility is required. Similar concerns apply to fnmsub and -A × C + B.

On all existing designs, the division operation requires a large number of cycles to execute. In the special case that the divisor is a power of 2, you may replace the division with multiplication by the reciprocal, which can be represented precisely, unless the divisor is a denormal other than 2-127 (for single-precision) or 2-1023 (for double-precision).


3.3.4.3 Floating-Point Comparison

The processor writes the 4-bit result of floating-point compare instructions to both the specified field of the Condition Register and the FPCC field of the FPSCR. This result indicates the relation between the two values: less than, greater than, equal to, or unordered. If either of the operands is a SNaN, VXSNAN in FPSCR is set. If interrupts on Invalid Operations are enabled, the processor generates the corresponding interrupt.

If either operand is a NaN, the Floating Compare Ordered (fcmpo) instruction sets VXVC in FPSCR. If interrupts on Invalid Operations are enabled, the processor generates the corresponding interrupt.


3.3.5 FPSCR Instructions

FPSCR instructions appear to synchronize the effects of all floating-point instructions. In fact, an FPSCR instruction may not execute until:

No subsequent instruction that depends on or alters the FPSCR may execute until the FPSCR instruction has completed. Therefore, avoid unnecessary use of FPSCR instructions.

FPSCR instructions do not set the FEX and VX bits in FPSCR explicitly, but rather these bits represent the logical OR of the exception bits and the Invalid Operation exception bits, respectively. On some implementations, updating fewer than all eight fields of the FPSCR may result in substantially poorer performance than updating all the fields.


3.3.6 Optional Floating-Point Instructions

The PowerPC architecture includes some optional instructions, which are divided into two groups: general-purpose (fsqrt and fsqrts) and graphics (stfiwx, fres, frsqrte, and fsel). An implementation must support all instructions in a given group if it supports any of them. If good performance is required for all implementations, avoid the optional instructions. On some implementations, they cause interrupts and hence take on the order of 100 cycles or more to execute. If higher performance on some set of implementations that directly support these instructions is desired, their use should be restricted to library code for reasons of portability. See Appendix B and implementation-specific documentation for availability and performance of these instructions.


3.3.6.1 Square Root

The fsqrt[s] instruction computes the square root, accurate to the indicated precision. With the exception of -0, if the operand is negative, the operation produces a QNaN for a result and causes an Invalid Square Root Invalid Operation exception. The square root of -0 is -0.


3.3.6.2 Storage Access

The stfiwx instruction stores the lower 32 bits of a Floating-Point Register to a word in memory without modification. If the source operand was produced by a single-precision instruction, the value stored in memory is undefined.


3.3.6.3 Reciprocal Estimate

The fres instruction generates an estimate of a reciprocal, accurate to 1 part in 256. If greater accuracy is required, this result can serve as the seed for a Newton-Raphson approximation algorithm.


3.3.6.4 Reciprocal Square Root Estimate

The frsqrte instruction generates an estimate of the reciprocal square root, accurate to 1 part in 32. If greater accuracy is required, this result can serve as the initial seed for a Newton-Raphson approximation algorithm.


3.3.6.5 Selection

The optional fsel instruction conditionally copies one of two values to the target register on the basis of comparison of a third value to zero. This instruction can implement comparison to zero, minimum or maximum calculations, and simple if-then-else constructions. Examples of its use appear in Section 3.3.9. This instruction does not cause an exception if one of the operands is an SNaN. Therefore, using it to replace comparison-branch combinations does not comply with IEEE 754.


3.3.7 IEEE 754 Considerations

IEEE 754 places requirements on the whole system, including the processor, run-time libraries, operating system, and compiler. The division of responsibility for complying with IEEE 754 varies among systems, but those parts handled by the hardware do not have to be emulated by the software.

The PowerPC architecture's hardware support for the IEEE 754 standard includes:

Given the hardware support, the run-time libraries are expected to provide functions to perform the following:

These functions are available in run-time libraries, such as those specified in the AIX API.

The operating system must provide:

The compiler must generate code in a predictable manner consistent with the source language definition. The influence of IEEE 754 occurs principally during optimization. The integer optimizations, such as strength reduction, common subexpression elimination, and even evaluation of constants at compile time, are greatly restricted in their use. Some optimizations such as scheduling, inlining of code, and certain algebraic identities remain legitimate. Farnum [1988] and Goldberg [1991] examine many of these issues.


3.3.7.1 Relaxations

Certain PowerPC floating-point extensions do not comply with IEEE 754. These include the multiply-add instructions and non-IEEE mode.

Multiply-Add
IEEE 754 requires the rounding of results that do not fit in the destination format. Because the multiply-add operation maintains the full precision of the multiplication result, rather than rounding before the addition, the final result may differ from the IEEE 754 result in a manner considered to be more accurate. Compilers should incorporate a mode that handles multiplications and additions separately for strict compatibility with IEEE 754.

Non-IEEE Mode
Setting the NI bit in the FPSCR places the processor in Non-IEEE mode. In this mode, all implementations convert denormalized results to zero with the appropriate sign. An implementation may demonstrate other modified behaviors in this mode. See the relevant implementation-specific documentation for more information. Non-IEEE mode makes the performance of floating-point arithmetic deterministic, but the results obtained in this mode may differ from those of the same calculations in IEEE mode.


3.3.8 Data Format Conversion

High-level languages define rules specifying various implicit conversions or coercions, in addition to the explicit conversion requests in the source code. Compilers may execute these conversions between different types using calls to functions in the run-time library. For simple cases, the compiler may emit the code directly.

The code examples in this section often use the optional fsel instruction. If IEEE 754 compatibility is required or if the instruction is not supported on the target processor, replace this instruction with the equivalent comparison-branch sequence.


3.3.8.1 Floating-Point to Integer

These examples convert a floating-point value in FR1 to an integer in R3. The processor always transfers values between the floating-point unit and the fixed-point unit through memory.

In general, floating-point to integer conversions require rounding. The rounding mode is that indicated in the RN field of the FPSCR unless the "z" form of the convert to integer instruction is used. Then, regardless of the rounding mode, the value is rounded toward zero, effectively truncating the fractional part of the floating-point value.

The code sequences in Figures 3-48 and 3-49 convert a floating-point value to a 32-bit and a 64-bit signed integer value, respectively. The Floating Convert To Integer instructions convert the floating-point value to an integer in the Floating-Point Register. The Store Float-Point Double instructions directly copy the bit pattern of this integer to memory. Then, the integer is loaded into a General-Purpose Register.



Figure 3-48. Convert Floating-Point to 32-Bit Signed Integer Code Sequence

32-Bit Implementation
fctiw[z] FR2,FR1 # convert to integer
stfd FR2,disp(R1) # copy unmodified to memory
lwz R3,disp+4(R1) # load the low-order 32 bits
64-Bit Implementation
fctiw[z] FR2,FR1 # convert to integer
stfd FR2,disp(R1) # copy unmodified to memory
lwa R3,disp+4(R1) # load low-order 32 bits with sign
# extension



Figure 3-49. Convert Floating-Point to 64-Bit Signed Integer Code Sequence

64-Bit Implementation Only
fctid[z] FR2,FR1 # convert to doubleword integer
stfd FR2,disp(R1) # store float double
ld R3,disp(R1) # load double word

The code sequences in Figures 3-50 and 3-51 convert a floating-point value to a 32-bit and a 64-bit unsigned integer value, respectively. The Floating Convert To Integer instruction converts a floating-point value to a signed integer. Using this instruction to convert to an unsigned integer requires some adjustments. If the floating-point input value is negative, replace it with 0. If the floating-point input value is greater than the maximum for the format (2M - 1), replace it with 2M - 1, where M is the number of bits in the resulting integer (32 or 64). If the floating-point input value lies in the range 2M-1 to 2M - 1, the range where a signed integer is negative, reduce the input value by 2M-1. Then, convert the floating-point value to an integer using a Floating Convert To Integer instruction. If the input was greater than 2M-1, add 2M-1. Negative values return 0; values exceeding 2M - 1 return 2M - 1. The 64-bit implementation of the conversion to a 32-bit unsigned integer uses the fctid[z] instruction to avoid the extra code associated with the 231 to 232 range.



Figure 3-50. Convert Floating-Point to 32-Bit Unsigned Integer Code Sequence

# FR0 = 0.0
# FR1 = value to be converted
# FR3 = 232 - 1
# FR4 = 231
# R3 = returned result
# disp = displacement from R1
32-Bit Implementation
fsel FR2,FR1,FR1,FR0 # use 0 if < 0
fsub FR5,FR3,FR1 # use 232-1 if >= 232
fsel FR2,FR5,FR2,FR3
fsub FR5,FR2,FR4 # subtract 2**31
fcmpu cr2,FR2,FR4
fsel FR2,FR5,FR5,FR2 # use diff if >= 231
# next part same as conversion to
# signed integer word
fctiw[z] FR2,FR2 # convert to integer
stfd FR2,disp(R1) # copy unmodified to memory
lwz R3,disp+4(R1) # load low-order word
blt cr2,$+8 # add 231 if input
xoris R3,R3,0x8000 # was >= 231
64-Bit Implementation
fsel FR2,FR1,FR1,FR0 # use 0 if < 0
fsub FR4,FR3,FR1 # use 232-1 if >= 232
fsel FR2,FR4,FR2,FR3
# next part same as conversion to
# signed integer word except
# convert to double
fctid[z] FR2,FR2 # convert to doubleword integer
stfd FR2,disp(R1) # copy unmodified to memory
lwz R3,disp+4(R1) # load low-order word, zero extend



Figure 3-51. Convert Floating-Point to 64-Bit Unsigned Integer Code Sequence

64-Bit Implementation Only
# FR0 = 0.0
# FR1 = value to be converted
# FR3 = 264 - 2048
# FR4 = 263
# R4 = 263
# R3 = returned result
# disp = displacement from R1
fsel FR2,FR1,FR1,FR0 # use 0 if < 0
fsub FR5,FR3,FR1 # use max if > max
fsel FR2,FR5,FR2,FR3
fsub FR5,FR2,FR4 # subtract 263
fcmpu cr2,FR2,FR4 # use diff if >= 263
fsel FR2,FR5,FR5,FR2
# next part same as conversion to
# signed integer doubleword
fctid[z] FR2,FR2 # convert to integer
stfd FR2,disp(R1) # copy unmodified to memory
ld R3,disp(R1) # load doubleword integer value
blt cr2,$+8 # add 263 if input
add R3,R3,R4 # was >= 263


3.3.8.2 Integer to Floating-Point

The code sequences in Figures 3-52 and 3-53 convert an integer in R3 to a floating-point value in FR1. 64-bit implementations include an instruction that simplifies this task.

In a 32-bit implementation, you may convert a 32-bit integer to a floating-point value as follows. Flip the integer sign bit and place the result in the low-order part of a doubleword in memory. Create the high-order part with sign and exponent fields such that the resulting doubleword value interpreted as a hexadecimal floating-point value is 0x1.00000dddddddd×10D, where 0xdddddddd is the hexadecimal sign-flipped integer value. Then, load the doubleword as a floating-point value. Subtract the hexadecimal floating-point value 0x1.0000080000000×10D from the previous value to generate the result.

The 64-bit implementations possess the Floating Convert From Integer Doubleword (fcfid) instruction, which converts an integer to a floating-point value. Both 64-bit implementation examples transfer the value to the floating-point unit, where the fcfid instruction is used.

The conversion of a 32-bit integer to a floating-point value is always exact. The conversion of a 64-bit integer to a floating-point value may require rounding, which conforms to the mode indicated in the RN field of the FPSCR.



Figure 3-52. Convert 32-Bit Signed Integer to Floating-Point Code Sequence

32-Bit Implementation
# FR2 = 0x4330000080000000
addis R0,R0,0x4330 # R0 = 0x43300000
stw R0,disp(R1) # store upper half
xoris R3,R3,0x8000 # flip sign bit
stw R3,disp+4(R1) # store lower half
lfd FR1,disp(R1) # float load double of value
fsub FR1,FR1,FR2 # subtract 0x4330000080000000
64-Bit Implementation
extsw R3,R3 # extend sign
std R3,disp(R1) # store doubleword integer
lfd FR1,disp(R1) # load integer into FPR
fcfid FR1,FR1 # convert to floating-point value



Figure 3-53. Convert 64-Bit Signed Integer to Floating-Point Code Sequence

64-Bit Implementation Only
std R3,disp(R1) # store doubleword
lfd FR1,disp(R1) # load float double
fcfid FR1,FR1 # convert to floating-point integer

The code sequences in Figure 3-54 convert an unsigned 32-bit integer to a floating-point value. These code examples parallel those given for the signed case in Figure 3-52.

In a 32-bit implementation, construct the floating-point value in memory, as before, but do not flip the sign bit. Subtract the hexadecimal floating-point value 0x1.0000080000000×10D from the loaded value to produce the result.

In a 64-bit implementation, replace the sign extension in the signed version with a zero extension performed by the rldicl instruction.



Figure 3-54. Convert 32-Bit Unsigned Integer to Floating-Point Code Sequence

32-Bit Implementation
# FR2 = 0x4330000000000000
addis R0,R0,0x4330 # R0 = 0x43300000
stw R0,disp(R1) # store high half
stw R3,disp+4(R1) # store low half
lfd FR1,disp(R1) # float load double of value
fsub FR1,FR1,FR2 # subtract 0x4330000000000000
64-Bit Implementation
rldicl R0,R3,0,32 # zero extend value
std R0,disp(R1) # store doubleword value to memory
lfd FR1,disp(R1) # load value to FPU
fcfid FR1,FR1 # convert to floating-point integer

The code sequence in Figure 3-55 converts a 64-bit unsigned integer value to a floating-point value in a 64-bit implementation. The first example converts the two 32-bit halves separately and combines the results at the end with a multiply-add instruction.

The second example presents an alternative, shorter sequence that can be used if the rounding mode is Round toward ± , or if the rounding mode does not matter. This example converts the entire integer doubleword in a single step with a subsequent addition to correct for negative values. The addition operation may cause inaccuracy in certain rounding modes.



Figure 3-55. Convert 64-Bit Unsigned Integer to Floating-Point Code Sequence

64-Bit Implementation (All Rounding Modes)
# FR4 = 232
rldicl R2,R3,32,32 # isolate high half
rldicl R0,R3,0,32 # isolate low half
std R2,disp(R1) # store high half
std R0,disp+8(R1) # store low half
lfd FR2,disp(R1) # load high half
lfd FR1,disp+8(R1) # load low half
fcfid FR2,FR2 # convert each half to floating-
fcfid FR1,FR1 # point integer (no round)
fmadd FR1,FR4,FR2,FR1 # (232)*high + low
# (only add can round)
Alternate Version (Only for Round toward ± )
# FR2 = 264
std R3,disp(R1) # store doubleword
lfd FR1,disp(R1) # load float double
fcfid FR1,FR1 # convert to floating-point integer
fadd FR4,FR1,FR2 # add 264
fsel FR1,FR1,FR1,FR4 # if R3 < 0


3.3.8.3 Rounding to Floating-Point Integer

The code sequences in Figure 3-56 round a floating-point value in FR1 to a floating-point integer in FR3. The RN field in the FPSCR determines the rounding mode, unless you use the "z" form of the convert to integer instruction. Regardless of the rounding mode, the "z" form rounds the value toward zero, effectively truncating the fractional part of the floating-point value.

You may round a floating-point value to a 32-bit integer as follows. For non-negative values, add 252 and then subtract it, allowing the floating-point hardware to perform the rounding using the rounding mode given by the RN field. For negative values, add -252 and then subtract it. If you require a rounding mode different from that specified in the RN field, this technique would require modification of RN.

In a 64-bit implementation, you may round a floating-point value to a 64-bit floating-point integer by converting to a 64-bit integer and then converting back to a floating-point value. If VXCVI is set, indicating an invalid integer conversion, the value does not fit into the 64-bit integer format (i.e., the value is greater than or equal to 264). These large values have no fractional part.



Figure 3-56. Round to Floating-Point Integer Code Sequence

32-Bit Implementation
# FR0 = 0.0
# FR2 = 0x43300000 = 252
# FR3 = 0xC3300000 = -252
fcmpu cr6,FR1,FR0
bt cr6[lt],lab # branch if value < 0.0
fcmpu cr7,FR1,FR2
bt cr7[gt],exit # input was floating-point integer
fadd FR4,FR1,FR2 # add 252
fsub FR1,FR4,FR2 # subtract 252
b exit
lab:
fcmpu cr7,FR1,FR3
bt cr7[lt],exit # input was floating-point integer
fadd FR4,FR1,FR3 # add -252
fsub FR1,FR4,FR3 # subtract -252
exit:
64-Bit Implementation
mtfsb0 23 # clear VXCVI
fctid[z] FR3,FR1 # convert to fixed-point integer
fcfid FR3,FR3 # convert back again
mcrfs 7,5 # transfer VXCVI to CR
bf 31,$+8 # skip if VXCVI was 0
fmr FR3,FR1 # input was floating-point integer


3.3.9 Floating-Point Branch Elimination

When IEEE 754 conformance is not required, the optional Floating Select (fsel) instruction can eliminate branches in certain floating-point constructions that involve comparisons. The fsel instruction has 4 operands. The first is the target register. The second operand is compared to 0.0. If it is greater than or equal to 0.0, the contents of the third operand are copied to the target register. Otherwise, the contents of the fourth operand are copied to the target register. fsel does not cause an exception if any of the operands are SNaNs. It may also generate results different from what the equivalent comparison-branch code yields in some special cases. The floating-point compare and select instructions ignore the sign of 0.0 (i.e., +0.0 is equal to -0.0).

The code examples in Figure 3-57 perform a greater than or equal to 0.0 comparison for both comparison-branch and fsel object code. The fsel code does not cause an exception for a NaN, unlike the comparison-branch code.

The use of the Condition Register Logical instruction reduces the number of branches from two to one. Unlike the integer case, greater than or equal to is not equivalent to not less than because the result of the comparison could be unordered.



Figure 3-57. Greater Than or Equal to 0.0 Code Example

Fortran Source Code
if (a .ge. 0.0) then x = y
else x = z
Branching Assembly Code
# (FR0) = 0.0
fcmpo cr7,FRa,FR0 # causes exception if a is NaN
cror cr5[fe],cr7[fe],cr7[fg] # temp = (a .gt. 0.0)
# .or. (a .eq. 0.0)
bf cr5[fe],lab1 # if (.not.temp) then branch
fmr FRx,FRy # x = y
b lab2
lab1:
fmr FRx,FRz # x = z
lab2:
fsel Assembly Code
fsel FRx,FRa,FRy,FRz # if (a .ge. 0.0) then x = y
# else x = z
# no exception if a is NaN

The code examples in Figure 3-58 perform a greater than 0.0 comparison. The fsel code uses the comparison (-a 0), while reversing the assignments. If a is a NaN, the result of the assignment is reversed from the branch case. Again, an SNaN or a QNaN does not cause an exception in the fsel case.



Figure 3-58. Greater Than 0.0 Code Example

Fortran Source Code
if (a .gt. 0.0) then x = y
else x = z
Branching Assembly Code
# (FR0) = 0.0
fcmpo cr7,FRa,FR0 # causes exception if a is NaN
bf cr7[fg],lab1 # if .not.(a .gt. 0.0) then branch
fmr FRx,FRy # x = y
b lab2
lab1:
fmr FRx,FRz # x = z
# if (a is NaN) then x = z
lab2:
fsel Assembly Code
fneg FRs,FRa # s = -a
fsel FRx,FRs,FRz,FRy # if (s .ge. 0.0) then x = z
# else x = y
# if (a is NaN) then x = y
# no exception if a is NaN

The code examples in Figure 3-59 perform an equal to 0.0 comparison. Two fsel instructions are used in series so that x is set to y only if a 0.0 and a 0.0. Otherwise, x is set to z. Again, a NaN does not cause an exception.



Figure 3-59. Equal to 0.0 Code Example

Fortran Source Code
if (a .eq. 0.0) then x = y
else x = z
Branching Assembly Code
# (FR0) = 0.0
fcmpo cr7,FRa,FR0 # causes exception if a is NaN
bf cr7[fe],lab1 # if (a .ne. 0) then branch
fmr FRx,FRy # x = y
b lab2
lab1:
fmr FRx,FRz # x = z
lab2:
fsel Assembly Code
fsel FRx,FRa,FRy,FRz # if (a .ge. 0.0) then x = y
# else x = z
fneg FRs,FRa # s = -a
fsel FRx,FRs,FRx,FRz # if (s .ge. 0.0) then x = x
# else x = z
# no exception if a is NaN

The code examples in Figure 3-60 perform the min(a,b) function. You may compute this function by comparing 0.0 to the difference between a and b, and using the result to select a or b conditionally. If either a or b is a NaN, a different result from the comparison-branch version may occur. Moreover, the subtraction operation may cause exceptions that do not occur for the comparison-branch case. The code for the maximum function is analogous.



Figure 3-60. Minimum Code Example

Fortran Source Code
x = min(a,b)
Branching Assembly Code
fcmpo cr7,FRa,FRb # causes exception if a or b is NaN
bf cr7[fl],lab1 # if (.not.(a .lt. b)) then branch
fmr FRx,FRa # min = a
b lab2
lab1:
fmr FRx,FRb # min = b
# if (a or b is NaN) then min = b
lab2:
fsel Assembly Code
fsub FRs,FRa,FRb # s = a - b
# causes exception if
# a or b is SNaN
fsel FRx,FRs,FRb,FRa # if (s .ge. 0.0) then min = b
# else min = a
# if (a or b is NaN) then min = a
# no exception if a or b is QNaN

The code examples in Figure 3-61 compare a and b for equality. Two fsel instructions are used in series so that x is set to y only if (a - b) 0.0 and (a - b) 0.0. Otherwise, x is set to z. This comparison between two values is similar to the minimum function, having similar incompatibilities with the comparison-branch approach.



Figure 3-61. a Equal To b Code Example

Fortran Source Code
if (a .eq. b) then x = y
else x = z
Branching Assembly Code
fcmpo cr7,FRa,FRb # causes exception if a or b is NaN
bf cr6[fe],lab1 # if (a .eq. b) then branch
fmr FRx,FRy # x = y
b lab2
lab1:
fmr FRx,FRz # x = z
lab2:
fesl Assembly Code
fsub FRs,FRa,FRb # s = a - b
# causes exception if
# a or b is SNaN
fsel FRx,FRs,FRy,FRz # if (s .ge. 0.0) then x = y
# else x = z
fneg FRs,FRs # s = -s
fsel FRx,FRs,FRx,FRz # if (s .ge. 0.0) then x = z
# else x = z
# no exception if a or b is QNaN


3.3.10 DSP Filters

The dot product represents the core of DSP algorithms. In fact, matrix multiplication forms the basis of much scientific programming. Figure 3-62 shows the C source for the example of a matrix product.



Figure 3-62. Matrix Product: C Source Code

for(i=0;i<10;i++){
for(j=0;j<10;j++){
c[i][j]=0;
for(k=0;k<10;k++){
c[i][j] = c[i][j] + a[i][k] * b[k][j];
}}}

The central fragment/inner loop code is:

c[i][j] = c[i][j] + a[i][k] × b[k][j];

assume:

Ra -> points to array a - 8

Rb-> points to array b - 80

Rc-> points to array c - 8

Figure 3-63 shows the object code for the double-precision floating-point case. The multiply-add instructions and update forms of the load and store instructions combine to form a tight loop. This example represents an extremely naive compilation that neglects loop transformations.



Figure 3-63. Double-Precision Matrix Product: Assembly Code

dloop:
lfdu FR0,8(Ra) # load a[i][k], bump to next element
lfdu FR1,80(Rb) # load b[k][j], bump to next element
fmadd FR2,FR0,FR1,FR2 # c[i][j] + a[i][k] * b[k][j]
bdnz dloop
stfdu FR2,8(Rc) # store c[i][j] element


3.3.11 Replace Division with Multiplication by Reciprocal

Because a floating-point division operation requires significantly more cycles to execute than a floating-point multiplication operation, you might replace repeated division by a loop invariant with calculation of the reciprocal prior to entering the loop and multiplication by the reciprocal inside the loop. The combined rounding error from the calculation of the reciprocal and from multiplication of the reciprocal by the dividend may differ from the rounding error of simple division. Hence, this procedure may yield a result different from that required by IEEE 754. The code example in Figure 3-64 includes a Newton-Raphson iteration to correct for this error. This transformation requires the full precision of the intermediate result during the multiply-add operation. The correction generates the IEEE 754 result so long as the divisor is a non-zero normal number and the dividend is not infinite. If the divisor is a denormal, multiplying by the reciprocal may give a different result than division (the reciprocal of a denormalized number may be infinite). Adding the special cases to the code complicates the use of this method by the compiler. If it is known that the special cases do not occur, this technique simplifies.



Figure 3-64. Convert Division to Multiplication by Reciprocal Code Example

Fortran Source Code
do 10 j = 1,n
10 x(j) = x(j)/y
Assembly Code
# FR1 = 1.0
# FR5 = y
# R3 = address of x(j) - 8
# CTR = n
fdiv FR2,FR1,FR5 # calculate the reciprocal
# FR2 = rec = 1/y
loop:
lfd FR3,8(R3) # load x(j)
fmul FR4,FR3,FR2 # multiply by the reciprocal
# FR4 = q = x(j)*rec
fneg FR6,FR4
fmadd FR7,FR6,FR5,FR3 # calculate residual
# FR5 = res = x(j) - q*y
fmadd FR8,FR7,FR2,FR4 # correct x(j)*rec
# x(j)/y = x(j)*rec + (res * rec)
stfdu FR8,8(R3) # store x(j)/y with update
bdnz loop


3.3.12 Floating-Point Exceptions

The default responses defined by IEEE 754 for floating-point exception conditions are satisfactory for many cases, but certain situations require the additional software control offered by a trap. The PowerPC architecture provides four floating-point exception modes: Ignore Exceptions, Imprecise Nonrecoverable, Imprecise Recoverable, and Precise. Many implementations, however, support only the Ignore Exceptions and Precise modes. Depending on the implementation, Precise mode may substantially degrade the performance of the processor. Therefore, software alternatives to Precise mode that enable trapping on floating-point exceptions, but maintain performance become important. The choice of using an alternative depends sensitively on the way floating-point operations are handled in the compiler and the performance of Precise mode on the target implementations.

These software alternatives involve placing the processor into Ignore Exceptions mode, but enabling the desired exceptions through the Floating-Point Status and Control Register (FPSCR). Interrogation of the FEX bit in the FPSCR reveals whether an enabled exception has occurred. This interrogation may be carried out using run-time library functions, such as those described in the AIX API. Another approach uses a recording floating-point instruction form followed by a conditional branch to test the copy of FEX placed in CR1. If FEX is set indicating that an enabled exception has occurred, control is transferred to the appropriate trap handling routine. This control transfer can be managed with an unconditional trap instruction that generates an interrupt, which transfers control to the usual floating-point interrupt handler or with a simple call to a service routine. Figure 3-65 shows an example that uses a recording floating-point instruction, a conditional branch to interrogate FEX, and an unconditional trap to transfer control to the interrupt handler. The interrupt handler must fully diagnose the cause of the interrupt, which involves interrogating the FPSCR register.



Figure 3-65. Precise Interrupt in Software Code Example

fma. FR1,FR1,FR2,FR3 # recording floating-point operation
bt cr1[fex],lab2 # branch if there is an exception
lab1:
...
lab2:
trap # unconditional trap to handler
b lab1 # return

Depending on the frequency of floating-point operations and the cost of running in Precise mode for the implementation under consideration, the overhead associated with the application of software-controlled trapping to all floating-point instructions may degrade performance more than running in Precise mode. The potential benefit of these approaches may occur only when a single test for exceptions at the end of a series of floating-point operations suffices. In this case, if an exception occurs during the execution of the series, you may not be able identify the operation that caused it.


[ Table of Contents | Index ]
Copyright 1998 IBMchips