
4.2 Hazards

The PowerPC architecture requires any implementation to contain enough interlocks so that the sequential execution model is maintained. This section examines the various mechanisms that PowerPC implementations use to maintain the sequential execution model in the face of potential data hazards, control hazards, and structural hazards.


4.2.1 Data Hazards

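A data hazard is a situation in which an instruction has a data dependence or a name dependence on a prior instruction, and the two occur close enough together in the instruction sequence that the processor could generate a result inconsistent with the sequential execution model. There are three ways for a data hazard to occur: read after write (RAW), in which the later instruction could read an operand before the earlier instruction has written it (a true dependence); write after read (WAR), in which the later instruction could write a register before the earlier instruction has read it (an antidependence); and write after write (WAW), in which the later instruction could write a register before the earlier instruction has written it, leaving the wrong final value (an output dependence).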
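The three hazard types, read after write (RAW), write after read (WAR), and write after write (WAW), can be told apart by comparing the destination and source registers of two instructions in program order. The sketch below is a hypothetical software model, not part of any PowerPC tool: `Insn` and `hazards` are illustrative names, and each instruction is assumed to write at most one register.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insn:
    """Hypothetical model of an instruction: one destination, some sources."""
    text: str
    dest: Optional[str]
    srcs: tuple = ()


def hazards(earlier: Insn, later: Insn) -> set:
    """Return the hazard types that `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")  # true dependence: later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")  # antidependence: later overwrites what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found


add = Insn("add r3,r4,r5", "r3", ("r4", "r5"))
print(hazards(add, Insn("subf r6,r3,r7", "r6", ("r3", "r7"))))  # {'RAW'}
print(hazards(add, Insn("addi r4,r8,1", "r4", ("r8",))))        # {'WAR'}
print(hazards(add, Insn("mr r3,r9", "r3", ("r9",))))            # {'WAW'}
```

Only the RAW case is a true dependence; the WAR and WAW cases arise purely from reusing a register name, which is why renaming can remove them.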

The simplest way to eliminate a data hazard is for the processor to execute the instructions sequentially and, if necessary, to stall the later instruction in program order until the earlier instruction completes its use of the shared operand. Forwarding (also called feedback or bypassing) improves performance when handling true dependences. In a simple model of instruction execution, an instruction writes its result to a register, from which a subsequent dependent instruction then reads its source operand. Forwarding improves on this by delivering the result of the first instruction to the subsequent instruction at the same time as the result is written to the register file. For example, the final stage of processing an integer instruction writes the result to a General-Purpose Register for access by subsequent instructions, but this write-back may require an extra cycle. PowerPC implementations usually include forwarding logic that provides the result to subsequent instructions during the completion stage, which permits dependent integer instructions to execute in consecutive cycles.

Forwarding may apply to results within an execution unit for a subsequent instruction in that unit, or to results of one unit required by another unit. During execution of an integer comparison, for instance, the processor may forward a Condition Register field result directly to the Branch Processing Unit for use by a subsequent branch instruction. The effects of forwarding are reflected in the instruction timings of a given implementation.
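The cycle savings from forwarding can be seen in a toy timing model of a chain of dependent single-cycle integer instructions. The model and its parameters (`exec_latency`, `writeback_delay`) are assumptions for illustration, not the timing of any particular PowerPC implementation: without forwarding, each result is assumed to become readable only after one extra write-back cycle.

```python
def chain_start_cycles(n, exec_latency=1, writeback_delay=1, forwarding=True):
    """Cycle in which each instruction of a fully dependent chain starts
    executing, in a hypothetical in-order pipeline."""
    starts = []
    operand_ready = 0  # earliest cycle the next instruction's operand is usable
    for _ in range(n):
        start = operand_ready
        starts.append(start)
        # With forwarding, the result is usable as soon as execution ends;
        # otherwise the consumer must wait for the register-file write-back.
        operand_ready = start + exec_latency + (0 if forwarding else writeback_delay)
    return starts


print(chain_start_cycles(4))                    # [0, 1, 2, 3]
print(chain_start_cycles(4, forwarding=False))  # [0, 2, 4, 6]
```

Under these assumptions forwarding lets each dependent instruction start in the cycle immediately after its producer, halving the length of the chain.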

To avoid RAW hazards, the processor must sequentially execute the relevant instructions. Renaming of operands, however, may be used to eliminate WAR and WAW hazards. Dynamic register renaming capability varies among PowerPC implementations from none to full renaming of General-Purpose Registers, Floating-Point Registers, and Condition Register fields.

On some implementations, certain registers have associated shadow registers. Shadow registers are most often provided for branch-unit registers, such as the Link Register and the Count Register. For example, a shadow register stack for the Link Register may allow speculative execution of function calls.
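One plausible organization for a Link Register shadow stack, assumed here for illustration, is a return-address stack: a branch-and-link pushes the return address, and a branch to the Link Register is predicted to return to the most recently pushed address. The depth and the choice to discard the oldest entry on overflow are assumptions, and `ReturnAddressStack` with its methods are hypothetical names.

```python
from collections import deque


class ReturnAddressStack:
    """Hypothetical shadow stack for the Link Register (LR)."""

    def __init__(self, depth=4):
        # A deque with maxlen drops the oldest entry when a push overflows it.
        self.entries = deque(maxlen=depth)

    def on_call(self, return_address):
        # A branch-and-link (such as bl) sets LR to the next instruction's
        # address; remember it so the matching return can be predicted.
        self.entries.append(return_address)

    def predict_return(self):
        # Predict a branch to LR (such as blr); None means no prediction.
        return self.entries.pop() if self.entries else None


ras = ReturnAddressStack(depth=2)
ras.on_call(0x1004)
ras.on_call(0x2008)
print(hex(ras.predict_return()))  # 0x2008
print(hex(ras.predict_return()))  # 0x1004
print(ras.predict_return())       # None
```

Because nested calls return in reverse order, a stack lets the fetch logic follow a return before the architected Link Register value is known.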

Full register renaming allocates a new rename register for every result. High-performance implementations include a large rename register file; when it is full, the processor stalls at dispatch until entries become available. The string instructions tend to serialize the processor because their multiple destination registers are difficult to rename. The update instructions produce two results, which most implementations can handle unless a large number of update instructions appear consecutively. Knowledge of the processor's dynamic register renaming capability is therefore important during register allocation. Register allocation introduces many antidependences as it tries to optimize register reuse. If the implementation has minimal or no dynamic register renaming, the compiler should rename registers statically, assigning different registers to independent values, to improve performance.
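The renaming and dispatch-stall behavior described above can be modeled with a map from architectural to physical registers plus a free list. This is a simplified sketch under stated assumptions: `RenameTable` is a hypothetical name, each instruction has one destination, and a physical register is freed only when the caller releases it (real hardware frees it once a later writer of the same architectural register completes).

```python
class RenameTable:
    """Hypothetical rename map with a finite pool of physical registers."""

    def __init__(self, arch_regs, num_physical):
        # Initially each architectural register maps to its own physical register.
        self.map = {r: f"p{i}" for i, r in enumerate(arch_regs)}
        self.free = [f"p{i}" for i in range(len(arch_regs), num_physical)]

    def rename(self, dest, srcs):
        """Return (physical dest, physical sources), or None when no rename
        register is free and dispatch must stall."""
        if not self.free:
            return None
        phys_srcs = tuple(self.map[s] for s in srcs)  # read mappings before writing
        phys_dest = self.free.pop(0)
        self.map[dest] = phys_dest
        return phys_dest, phys_srcs

    def release(self, phys):
        """Return a physical register to the free list."""
        self.free.append(phys)


rt = RenameTable(["r3", "r4", "r5"], num_physical=5)
print(rt.rename("r3", ["r4", "r5"]))  # ('p3', ('p1', 'p2'))
print(rt.rename("r3", ["r5"]))        # ('p4', ('p2',)): WAW gone, new register
print(rt.rename("r4", ["r3"]))        # None: rename file full, dispatch stalls
rt.release("p0")
print(rt.rename("r4", ["r3"]))        # ('p0', ('p4',))
```

The last instruction writes r4 into p0 while the first instruction still reads r4 from p1, so the WAR hazard disappears as well; only the RAW link from r3 (p4) remains.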


4.2.2 Control Hazards

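Control hazards result when an unresolved branch makes the correct path of execution uncertain. When a processor encounters an unresolved conditional branch, it has three options for handling the control hazard: stall until the branch is resolved, speculatively execute down both possible paths, or predict how the branch will be resolved and speculatively execute down the predicted path.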

Stalling until the branch is resolved is the simplest alternative, but it leaves some of the execution units idle. Speculative execution down multiple branch paths may require a substantial increase in hardware. All current PowerPC implementations predict how the branch will be resolved and speculatively continue execution down the predicted path. Accurate branch prediction algorithms may allow speculatively computed results to be used more than 90% of the time, depending on the program, the prediction algorithm, and the hardware support for prediction.
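A rough way to compare stalling with prediction is to count the cycles lost per conditional branch. The sketch assumes, for illustration only, that a correctly predicted branch costs nothing extra and that both a stall and a misprediction cost a fixed number of cycles; the function names and the 4-cycle figure are hypothetical, not measurements of any implementation.

```python
def stall_cost(resolve_cycles):
    """Cycles lost per branch if the processor always waits for resolution."""
    return resolve_cycles


def predict_cost(accuracy, mispredict_cycles):
    """Expected cycles lost per branch if the processor predicts and only
    pays a penalty when the prediction is wrong."""
    return (1.0 - accuracy) * mispredict_cycles


# With 90% accuracy and a 4-cycle penalty, prediction loses about 0.4 cycles
# per branch on average, versus 4 cycles for always stalling.
print(stall_cost(4), round(predict_cost(0.90, 4), 2))  # 4 0.4
```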

Conditional branch instructions include a static prediction bit that allows a compiler to specify how the processor should predict the branch, although some implementations ignore this bit. Section 3.1.4 describes the static branch prediction mechanism.

Dynamic branch prediction uses hardware to track the history of specific branches. Although software does not directly control these mechanisms, they can significantly affect code performance. Knowledge of their behavior can help software to estimate the costs of misprediction for those processors that implement dynamic prediction. The main dynamic prediction mechanisms used in current implementations include branch target address caches and branch history tables.

A Branch Target Address Cache (BTAC) stores the target addresses of taken branches, indexed by the address of the branch instruction. When the branch instruction is fetched again, the fetch logic automatically fetches from the cached target address on the next cycle, even before the fetched instructions are decoded. Correctly predicted branches may thus effectively execute in zero cycles. This approach saves a cycle, but if the branch is later resolved as mispredicted, a misprediction delay may occur; the size of this delay depends on the pipeline stage at which the misprediction is identified. Some implementations may index target addresses in the BTAC by an address that covers two or more instructions. In such implementations, branches should be separated so that a taken branch does not write its target address over the target address of another branch that shares the same entry.
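The interference between branches that share a BTAC entry can be reproduced with a small model. The sketch assumes a hypothetical direct-mapped BTAC in which each entry covers a block of two 4-byte instructions; `BTAC`, its size, and its methods are illustrative, not the organization of a specific processor.

```python
class BTAC:
    """Hypothetical direct-mapped BTAC; each entry covers a block of
    `insns_per_entry` consecutive 4-byte instructions."""

    def __init__(self, entries=64, insns_per_entry=2):
        self.entries = entries
        self.block_bytes = 4 * insns_per_entry
        self.table = {}  # index -> (tag, target address)

    def _index_tag(self, addr):
        block = addr // self.block_bytes
        return block % self.entries, block // self.entries

    def lookup(self, fetch_addr):
        """Predicted next fetch address for this fetch block, or None."""
        index, tag = self._index_tag(fetch_addr)
        entry = self.table.get(index)
        return entry[1] if entry is not None and entry[0] == tag else None

    def record_taken(self, branch_addr, target):
        """Cache the target of a branch that was resolved as taken."""
        index, tag = self._index_tag(branch_addr)
        self.table[index] = (tag, target)


btac = BTAC()
btac.record_taken(0x1000, 0x2000)
print(hex(btac.lookup(0x1000)))    # 0x2000
btac.record_taken(0x1004, 0x3000)  # second branch in the same 8-byte block
print(hex(btac.lookup(0x1000)))    # 0x3000: first branch's target overwritten
```

Placing the two branches in different 8-byte blocks, as the text recommends, gives each its own entry in this model.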

A Branch History Table (BHT) maintains a record of recent outcomes (taken or not taken) for conditional branches. Many implementations have branch history tables that associate 2 bits with each conditional branch in the table. The four states of the 2-bit code stand for strongly taken, weakly taken, weakly not taken, and strongly not taken. Figure 4-2 shows the relationship between these four states. A conditional branch whose BHT entry is taken, either strongly or weakly, is predicted taken. Likewise, a branch whose entry is not taken, either strongly or weakly, is predicted not taken. If a branch is strongly taken, for example, and is mispredicted once, its state becomes weakly taken, so on the next encounter the branch is still predicted taken. Requiring two mispredictions to reverse the prediction for a branch prevents a single anomalous event from changing the prediction. If the branch is mispredicted twice, however, the prediction reverses.


Figure 4-2. 2-Bit Branch History Table Algorithm
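The 2-bit scheme can be sketched as a table of saturating counters indexed by branch address. This assumes one common encoding consistent with the description above (0 = strongly not taken through 3 = strongly taken), plus a hypothetical table size, initial state, and indexing function; it is not the exact design of any particular implementation.

```python
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


class BranchHistoryTable:
    """Hypothetical BHT of 2-bit saturating counters."""

    def __init__(self, entries=512, initial=WEAK_NOT_TAKEN):
        self.entries = entries
        self.counters = [initial] * entries

    def _index(self, branch_addr):
        # Instructions are 4 bytes, so drop the two low-order address bits.
        return (branch_addr >> 2) % self.entries

    def predict_taken(self, branch_addr):
        # Strongly or weakly taken -> predict taken.
        return self.counters[self._index(branch_addr)] >= WEAK_TAKEN

    def update(self, branch_addr, taken):
        # Move one step toward the actual outcome, saturating at either end.
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, STRONG_TAKEN)
        else:
            self.counters[i] = max(self.counters[i] - 1, STRONG_NOT_TAKEN)


bht = BranchHistoryTable()
for _ in range(3):
    bht.update(0x1000, taken=True)  # now strongly taken
bht.update(0x1000, taken=False)     # one misprediction: weakly taken
print(bht.predict_taken(0x1000))    # True
bht.update(0x1000, taken=False)     # second misprediction: weakly not taken
print(bht.predict_taken(0x1000))    # False
```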

The PowerPC architecture offers no means for the operating system to communicate a context switch to the dynamic branch prediction hardware, so the saved history may reflect another context. The processor still executes the code correctly, but additional mispredictions, with the associated performance degradation, may occur.


4.2.3 Structural Hazards

Structural hazards occur when different instructions simultaneously access the same hardware resource, which can be an execution unit, a dispatch or reservation slot, a register file port, a store queue slot, and so forth. The processor handles this hazard by stalling the later instruction in program order until the resource becomes available. Hardware designers can reduce this conflict by duplicating the resource in contention and adding the logic needed to integrate the copies correctly into the processor.
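Stalling on a structural hazard, and the benefit of duplicating a resource, can be seen in a toy issue model. Assumptions, for illustration only: instructions issue in program order, any number may issue in one cycle if units are free, data dependences are ignored, and each instruction occupies its unit for `occupancy` cycles (that is, the unit is not pipelined). `issue_cycles` is a hypothetical name.

```python
def issue_cycles(kinds, units, occupancy=1):
    """kinds: unit type needed by each instruction, in program order.
    units: number of copies of each unit type.
    Returns the cycle in which each instruction issues."""
    free_at = {kind: [0] * count for kind, count in units.items()}
    cycles = []
    earliest = 0  # in-order issue: never before the previous instruction
    for kind in kinds:
        copies = free_at[kind]
        k = min(range(len(copies)), key=copies.__getitem__)  # soonest-free copy
        start = max(earliest, copies[k])  # stall until that copy is free
        copies[k] = start + occupancy
        cycles.append(start)
        earliest = start
    return cycles


print(issue_cycles(["mul", "mul", "add"], {"mul": 1, "add": 1}, occupancy=2))  # [0, 2, 2]
print(issue_cycles(["mul", "mul", "add"], {"mul": 2, "add": 1}, occupancy=2))  # [0, 0, 0]
```

With one multiplier, the second multiply stalls and, because issue is in order, the independent add stalls behind it; a duplicated multiplier removes both delays.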


4.2.4 Serialization

To maintain a processor and memory state consistent with the sequential execution model, implementations may in certain situations serialize the execution of a whole class of instructions or even of all instructions. These situations may involve hazards or modifications of the processor state. For example, if there is more than one Fixed-Point Unit, additional precautions may be required to ensure that shared resources, such as the XER fields, are updated in program order. If the floating-point rounding mode is changed, the processor must ensure that all subsequent floating-point operations execute in the new mode. If a precise floating-point exception mode is enabled, floating-point instructions may need to execute in program order. Serialization might involve placing an interlock on the dispatch of certain instructions, preventing a particular instruction from executing until it is the oldest uncompleted instruction in the pipeline, or flushing the instruction pipeline and then refetching and re-executing the instructions that follow a particular instruction. Appendix B and the user manuals for specific implementations contain further details about serializing instructions and the processor's response to them.
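The "wait until oldest" form of serialization can be sketched as a start-time rule. This hypothetical model (`start_cycles` is an illustrative name) assumes one instruction dispatched per cycle, ignores data dependences, and does not hold back younger instructions, which a real dispatch interlock or pipeline flush would do.

```python
def start_cycles(insns):
    """insns: (latency, serializing) pairs in program order, one dispatched
    per cycle. A serializing instruction may not start until every older
    instruction has finished, i.e. until it is the oldest uncompleted one.
    Other instructions start at dispatch (data dependences are ignored)."""
    starts, finishes = [], []
    for dispatch_cycle, (latency, serializing) in enumerate(insns):
        start = dispatch_cycle
        if serializing and finishes:
            start = max(start, max(finishes))
        starts.append(start)
        finishes.append(start + latency)
    return starts


program = [(3, False), (1, False), (1, True), (1, False)]
print(start_cycles(program))  # [0, 1, 3, 3]
```

The serializing instruction, dispatched in cycle 2, waits until the 3-cycle instruction ahead of it finishes in cycle 3.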


Copyright 1998 IBMchips