
Appendix B

B. Summary of PowerPC 6xx Implementations


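This appendix summarizes the implementation features of the currently available PowerPC 6xx processors that are potentially of interest to compiler writers. These features mainly concern the performance of the programmer interface outlined in Book I of The PowerPC Architecture. The execution units are abbreviated as follows in this appendix:

BPU: Branch Processing Unit
BRU: branch unit (Common Model)
CFX: complex (multiple-cycle) fixed-point unit of the PowerPC 604
FPU: Floating-Point Unit
FXU: Fixed-Point Unit
LSU: Load/Store Unit
SFX: single-cycle fixed-point unit of the PowerPC 604 (there are two)
SRU: System Register Unit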


B.1 Feature Summary

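Figure B-1 compares the currently available processors with the Common Model described in Section 4.3.6 on page 117. It summarizes instruction fetch, queueing, issue, and completion widths; rename registers; execution units and reservation stations; the completion unit; caches and TLBs; load/store reordering and queues; and branch prediction.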



Figure B-1. PowerPC 6xx Processor Features

| Feature | Common Model | PowerPC 601 Processor | PowerPC 603e Processor | PowerPC 604 Processor |
|---|---|---|---|---|
| Implementation Type | 32-Bit | 32-Bit | 32-Bit | 32-Bit |
| Maximum Number of Instructions Fetched per Cycle | | 8 | 2 | 4 |
| Instruction Queue | | 8-Entry | 6-Entry | 8-Entry |
| Maximum Number of Instructions Issued per Cycle | 3 | 3 | 3 | 4 |
| Number of Rename Registers | (implied) | LR—2 | GPR—5, FPR—4, LR—1, CTR—1, CR—1 | GPR—12, FPR—8, LR—1, CTR—1, CR—8 |
| Execution Units | BPU, FXU, FPU | BPU, FXU, FPU | BPU, FXU, LSU, SRU, FPU | BPU, 2 SFXs, CFX, LSU, FPU |
| Reservation Stations | | none | FXU—1, LSU—1, SRU—1, FPU—1 | BPU—2, SFX—2 each, CFX—2, LSU—2, FPU—2 |
| Maximum Number of Instructions Completed per Cycle | 3 | 3 | 2 | 4 |
| Completion Unit | | none | 5-Entry | 16-Entry |
| Caches | (implied) | 8-Way 32KB Unified, 64-Byte Cache Block | 4-Way 16KB I- and D-Caches, 32-Byte Cache Block | 4-Way 16KB I- and D-Caches, 32-Byte Cache Block |
| TLBs | | 256-Entry Unified TLB | 2-Way 64-Entry ITLB and DTLB | 2-Way 128-Entry ITLB and DTLB |
| Reorder Loads and Stores | | during bus transactions | during cache access | during cache access |
| Load and Store Queues | | 2-Entry Read, 3-Entry Write | 1-Entry Store Queue | 4-Entry Finish Load Queue, 6-Entry Store Queue |
| Branch Prediction | Static | Static, 1 Level of Prediction | Static, 1 Level of Prediction | Dynamic, 64-Entry BTAC, 512-Entry BHT (2 bits per entry), 2 Levels of Prediction |
GPR—General-Purpose Register
FPR—Floating-Point Register
LR—Link Register
CTR—Count Register
CR—Condition Register
BTAC—Branch Target Address Cache
BHT—Branch History Table
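Figure B-1 credits the PowerPC 604 with a 512-entry BHT holding 2 bits per entry. A common way to use two bits per entry is a saturating counter that must be wrong twice before its prediction flips. The C sketch below assumes that scheme and indexes the table with low-order bits of the branch's word address. Both choices are assumptions, since this appendix does not describe the 604's indexing or how the BHT cooperates with the BTAC; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 512 two-bit saturating counters.
 * Counter values 0-1 predict not taken, 2-3 predict taken.
 * The indexing scheme is an assumption, not the documented 604 hash. */
#define BHT_ENTRIES 512u

typedef struct {
    uint8_t counter[BHT_ENTRIES];
} bht_t;

static unsigned bht_index(uint32_t branch_addr) {
    /* Instructions are word-aligned, so drop the two low-order bits. */
    return (branch_addr >> 2) & (BHT_ENTRIES - 1u);
}

static bool bht_predict(const bht_t *t, uint32_t branch_addr) {
    return t->counter[bht_index(branch_addr)] >= 2;
}

/* Train the entry with the resolved outcome; returns true on a misprediction. */
static bool bht_update(bht_t *t, uint32_t branch_addr, bool taken) {
    unsigned i = bht_index(branch_addr);
    bool mispredicted = bht_predict(t, branch_addr) != taken;
    if (taken && t->counter[i] < 3)
        t->counter[i]++;
    else if (!taken && t->counter[i] > 0)
        t->counter[i]--;
    return mispredicted;
}

/* Mispredictions for a loop-closing branch over `passes` executions of a
 * loop with `trips` iterations each, starting from a cleared table. */
static unsigned loop_mispredictions(unsigned passes, unsigned trips) {
    bht_t t = {{0}};
    const uint32_t addr = 0x1000u;   /* arbitrary branch address */
    unsigned misses = 0;
    assert(trips > 0);
    for (unsigned p = 0; p < passes; p++)
        for (unsigned k = 0; k < trips; k++)
            /* taken back to the loop top on every iteration but the last */
            misses += bht_update(&t, addr, k + 1 < trips);
    return misses;
}
```

For a 10-iteration loop, this model mispredicts three times on the first pass (twice while the cleared counter warms up, once at loop exit) and once per pass after that. A static taken prediction for the backward branch also misses once per pass.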


B.2 Serialization

To maintain the appearance of in-order execution, the processor must occasionally enforce some degree of sequential execution. The degree depends on both the implementation and the instruction. The PowerPC 601 implementation uses a system of tags that flow through the fixed-point pipeline; therefore, only the instructions that the PowerPC architecture defines as synchronizing exhibit serializing behavior. The PowerPC 603e and 604 implementations let instructions move through the pipelines more independently, so certain instructions impose the degrees of serialization indicated in the following sections.


B.2.1 PowerPC 603e Processor Classifications

The PowerPC 603e processor uses the following categories for its serializing instructions:


B.2.2 PowerPC 604 Processor Classifications

The PowerPC 604 processor uses the following categories for its serializing instructions:


B.3 Instruction Timing

The following cycle counts assume that:

The columns in the table are:



Figure B-2. Branch Instructions

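| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| b[l][a], bc[l][a], bcctr[l], bclr[l] | Common Model | BRU | 1 | | |
| | 601 | BPU | 1 | | |
| | 603e | BPU | 1 | | |
| | 604 | BPU | 1 | | |
| crand, cror, crnand, crnor, crxor, creqv, crandc, crorc | Common Model | BRU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | BPU | 1 | 1 | execution |
| mcrf | Common Model | BRU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | BPU | 1 | 1 | execution |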



Figure B-3. Load and Store Instructions

| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| lbz, lbzu, lbzux, lbzx, lha, lhau, lhaux, lhax, lhz, lhzu, lhzux, lhzx, lwz, lwzu, lwzux, lwzx, lhbrx, lwbrx | Common Model | FXU | 1 | 2 | |
| | 601 | FXU | 1 | 2 | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 2 | |
| stb, stbu, stbux, stbx, sth, sthu, sthux, sthx, stw, stwu, stwux, stwx, sthbrx, stwbrx | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 3 | execution |
| lfd, lfdu, lfdux, lfdx, lfs, lfsu, lfsux, lfsx | Common Model | FXU | 1 | 3 | |
| | 601 | FXU | 1 | 3 | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 3 | |
| stfd, stfdu, stfdux, stfdx, stfs, stfsu, stfsux, stfsx | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 3 | execution |
| lmw | Common Model | FXU | #reg | #reg + 1 | |
| | 601 | FXU | #reg | #reg + 1 | |
| | 603e | LSU | #reg + 2 | #reg + 2 | dispatch |
| | 604 | LSU | #reg + 2 | #reg + 2 | string/multiple |
| stmw | Common Model | FXU | #reg | #reg + 1 | |
| | 601 | FXU | #reg | #reg | |
| | 603e | LSU | #reg + 1 | #reg + 1 | dispatch |
| | 604 | LSU | #reg + 2 | #reg + 2 | string/multiple |
| lswi, lswx | Common Model | FXU | #reg | #reg + 1 | |
| | 601 | FXU | #reg | #reg + 1 | |
| | 603e | LSU | #reg + 2 | #reg + 2 | dispatch |
| | 604 | LSU | 2 #reg + 2 | 2 #reg + 2 | string/multiple |
| stswi, stswx | Common Model | FXU | #reg | #reg + 1 | |
| | 601 | FXU | #reg | #reg | |
| | 603e | LSU | #reg + 1 | #reg + 1 | dispatch |
| | 604 | LSU | #reg + 2 | #reg + 2 | string/multiple |
| lwarx | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 2 | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 3+bus | execution |
| stwcx. | Common Model | FXU | 1 | 1/2 | |
| | 601 | FXU | 2 | 2/3 | |
| | 603e | LSU | 10 | 10 | completion |
| | 604 | LSU | 3 | 3 | execution |
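The execution-time column of Figure B-3 can be turned into a quick cost comparison between lmw and a run of lwz instructions, for instance when deciding how to restore callee-saved registers in an epilogue. The C sketch below counts execution time only; it ignores cache misses and the extra cost of lmw being dispatch-serializing on the 603e and string/multiple-serializing on the 604. The enum and function names are hypothetical.

```c
#include <assert.h>

typedef enum { PPC601, PPC603E, PPC604 } ppc_impl;

/* Execution time (cycles) of lmw for `nregs` registers, from Figure B-3:
 * 601: #reg; 603e and 604: #reg + 2. */
static unsigned lmw_exec_cycles(ppc_impl impl, unsigned nregs) {
    assert(nregs >= 1 && nregs <= 32);
    return impl == PPC601 ? nregs : nregs + 2;
}

/* Execution time of the equivalent run of lwz instructions:
 * 1 cycle each on the 601, 603e, and 604 (Figure B-3). */
static unsigned lwz_run_exec_cycles(unsigned nregs) {
    assert(nregs >= 1 && nregs <= 32);
    return nregs;
}
```

Restoring r24 through r31 (eight registers) on the 603e gives 10 cycles for lmw against 8 for eight lwz instructions, while on the 601 both come to 8; lmw still replaces eight instructions with one.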



Figure B-5. Fixed-Point Computational Instructions

| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| addi, addis, add[o][.], subf[o][.], addic[.], subfic, addc[o][.], subfc[o][.], neg[o][.] | Common Model | FXU | 1 | 1/3 | |
| | 601 | FXU | 1 | 1/1 | |
| | 603e | FXU, SRU | 1 | 1/2 | |
| | 604 | SFX | 1 | 1/1 | |
| adde[o][.], subfe[o][.], addme[o][.], subfme[o][.], addze[o][.], subfze[o][.] | Common Model | FXU | 1 | 1/3 | |
| | 601 | FXU | 1 | 1/1 | |
| | 603e | FXU, SRU | 1 | 1/2 | |
| | 604 | SFX | 1 | 1/1 | execution |
| mulli | Common Model | FXU | 3-5 | 3-5 | |
| | 601 | FXU | 5 | 5 | |
| | 603e | FXU | 2-3 | 2-3 | |
| | 604 | CFX | 3 | 3 | |
| mulhw[.], mullw[o][.] | Common Model | FXU | 5 | 5/8 | |
| | 601 | FXU | 5-9 | 5-9/5-9 | |
| | 603e | FXU | 2-5 | 2-5/3-6 | |
| | 604 | CFX | 3-4 | 3-4/4-5 | |
| mulhwu[.] | Common Model | FXU | 5 | 5/8 | |
| | 601 | FXU | 5-10 | 5-10/5-10 | |
| | 603e | FXU | 2-6 | 2-6/3-7 | |
| | 604 | CFX | 1-2 | 3-4/4-5 | |
| divw[o][.], divwu[o][.] | Common Model | FXU | 19 | 19/21 | |
| | 601 | FXU | 36 | 36/36 | |
| | 603e | FXU | 37 | 37/38 | |
| | 604 | CFX | 19 | 20/21 | |
| cmp, cmpi, cmpl, cmpli | Common Model | FXU | 1 | 3 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | FXU, SRU | 1 | 1 | |
| | 604 | SFX | 1 | 1 | |
| and[.], or[.], nand[.], nor[.], xor[.], eqv[.], andc[.], orc[.], andi., andis., ori, oris, xori, xoris, extsb[.], extsh[.] | Common Model | FXU | 1 | 1/3 | |
| | 601 | FXU | 1 | 1/1 | |
| | 603e | FXU | 1 | 1/2 | |
| | 604 | SFX | 1 | 1/2 | |
| | 603e | LSU | 8 | 8/9 | |
| | 604 | LSU | 1 | 3/4 | execution |



Figure B-4. Cache Control Instructions

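| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| dcbf, dcbst | Common Model | | | | |
| | 601 | FXU | 1 | 1 | |
| | 603e | LSU | 2 (miss), 5 (hit) | 2 (miss), 5 (hit) | completion |
| | 604 | LSU | 1 | 3 | execution |
| dcbi | Common Model | | | | |
| | 601 | FXU | 1 | 1 | |
| | 603e | LSU | 2 | 2 | completion |
| | 604 | LSU | 1 | 3 | execution |
| dcbt, dcbtst | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | LSU | 2 | 2 | completion |
| | 604 | LSU | 1 | 2 | execution |
| dcbz | Common Model | | | | |
| | 601 | FXU | 1 | 1 | |

Figure B-5. Fixed-Point Computational Instructions (continued)

| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| cntlzw[.] | Common Model | FXU | 1 | 1/3 | |
| | 601 | FXU | 1 | 1/1 | |
| | 603e | FXU | 1 | 1/2 | |
| | 604 | SFX | 1 | 1/2 | |
| rlwimi[.], rlwinm[.], rlwnm[.], slw[.], sraw[.], srawi[.], srw[.] | Common Model | FXU | 1 | 1/3 | |
| | 601 | FXU | 1 | 1/1 | |
| | 603e | FXU | 1 | 1/2 | |
| | 604 | SFX | 1 | 1/2 | |
| mtlr, mtctr | Common Model | FXU | 1 | 4 | |
| | 601 | FXU | 1 | 2 | |
| | 603e | SRU | 2 | 2 | |
| | 604 | CFX | 1 | 1 | dispatch |
| mflr, mfctr | Common Model | FXU | 1 | 2 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | CFX | 1 | 3 | execution |
| mtxer | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 2 | 2 | dispatch |
| | 604 | CFX | 1 | 1 | completion |
| mfxer | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | CFX | 3 | 3 | execution |
| mtcrf | Common Model | FXU | 1 | 3 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | SFX / CFX | 1 / 1 | 1 / 1 | dispatch/execution |
| mcrxr | Common Model | FXU | 1 | 3 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | dispatch |
| | 604 | CFX | 1 | 3 | execution |
| mfcr | Common Model | FXU | 1 | 1 | |
| | 601 | FXU | 1 | 1 | |
| | 603e | SRU | 1 | 1 | completion |
| | 604 | CFX | 1 | 3 | execution |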
Setting the Overflow bit causes postdispatch serialization in the PowerPC 604 processor.
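Figure B-5 puts divwu at a latency of 20 cycles on the 604 and 36 to 37 cycles on the 601 and 603e, against 2 to 10 cycles for the high-word multiplies. Compilers therefore commonly replace unsigned division by a constant with a multiply by a precomputed reciprocal followed by a shift (the technique published by Granlund and Montgomery). The C sketch below handles the single divisor 10: the constant 0xCCCCCCCD (ceil(2^35 / 10)) and the final shift of 3 are specific to that divisor, and mulhwu() is a hypothetical portable helper that computes what the mulhwu instruction returns, not a compiler intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the 64-bit product of two unsigned words,
 * the value the PowerPC mulhwu instruction produces. */
static uint32_t mulhwu(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 10 without a divide instruction.
 * 0xCCCCCCCD = ceil(2^35 / 10); taking the high word divides by 2^32,
 * and the shift by 3 supplies the remaining factor of 2^3.
 * The rounding error is small enough that the result is exact
 * for every 32-bit unsigned x. */
static uint32_t udiv10(uint32_t x) {
    return mulhwu(x, 0xCCCCCCCDu) >> 3;
}
```

On the 604, for example, this trades a divwu with a 20-cycle latency for a mulhwu (latency 3 to 4) and a 1-cycle shift.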



Figure B-6. Floating-Point Instructions

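| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| fmr[.], fabs[.], fnabs[.], fneg[.] | Common Model | FPU | 1 | 3/10 | |
| | 601 | FPU | 1 | 4/4 | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |
| fadd[.], fsub[.] | Common Model | FPU | 1 | 3/10 | |
| | 601 | FPU | 1 | 4/4 | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |
| fmul[.], fmadd[.], fmsub[.], fnmadd[.], fnmsub[.] | Common Model | FPU | 1 | 3/10 | |
| | 601 | FPU | 2 | 5/5 | |
| | 603e | FPU | 2 | 4/4 | |
| | 604 | FPU | 1 | 3/4 | |
| fadds[.], fsubs[.], fmuls[.], fmadds[.], fmsubs[.], fnmadds[.], fnmsubs[.] | Common Model | FPU | 1 | 3/10 | |
| | 601 | FPU | 1 | 4/4 | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |
| fdiv[.] | Common Model | FPU | 19 | 21/28 | |
| | 601 | FPU | 31 | 31/31 | |
| | 603e | FPU | 33 | 33/33 | |
| | 604 | FPU | 32 | 32/33 | |
| fdivs[.] | Common Model | FPU | 19 | 21/28 | |
| | 601 | FPU | 17 | 17/17 | |
| | 603e | FPU | 18 | 18/18 | |
| | 604 | FPU | 18 | 18/19 | |
| fctiw[.], fctiwz[.], frsp[.] | Common Model | FPU | 1 | 3/10 | |
| | 601 | FPU | 1 | 4/4 | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |
| fcmpo, fcmpu | Common Model | FPU | 1 | 8 | |
| | 601 | FPU | 1 | 2 | |
| | 603e | FPU | 1 | 3 | |
| | 604 | FPU | 1 | 3 | |
| mffs[.] | Common Model | FPU | 1 | 1/8 | |
| | 601 | FPU | 1 | 4/4 | |
| | 603e | FPU | 1 | 3/3 | completion |
| | 604 | FPU | 1 | 3/4 | |
| mcrfs | Common Model | BPU, FPU | 1 | 1 | |
| | 601 | FXU | 1 | 4 | |
| | 603e | FPU | 3 | 4 | completion |
| | 604 | FPU | 3 | 3 | |
| mtfsf[.], mtfsfi[.], mtfsb0[.], mtfsb1[.] | Common Model | FPU | 1 | 1/8 | |
| | 601 | FXU | 4 | 4/4 | |
| | 603e | FPU | 3 | 3/3 | completion |
| | 604 | FPU | 3 | 3/4 | |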
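The timing tables give each instruction both an execution time and a latency. Reading execution time as the number of cycles the unit stays busy, and latency (the first figure of a pair) as the cycles until a dependent instruction can use the result, is an interpretation rather than a definition quoted from this appendix. Under that reading, the small C model below shows why the distinction matters for scheduling: it issues operations in order on one unit, round-robin across a chosen number of independent dependency chains. chain_cycles() is a hypothetical name.

```c
#include <assert.h>

/* Cycle at which the last of `n_ops` operations delivers its result.
 * Each op keeps the single unit busy for `exec` cycles and produces its
 * result `latency` cycles after issue. Op i belongs to chain i % chains
 * and cannot issue before the previous op of its chain has its result. */
static unsigned chain_cycles(unsigned n_ops, unsigned chains,
                             unsigned exec, unsigned latency) {
    enum { MAX_CHAINS = 16 };
    unsigned ready[MAX_CHAINS] = {0}; /* when each chain's latest result is available */
    unsigned unit_free = 0;           /* first cycle the unit can accept a new op */
    unsigned finish = 0;
    assert(chains >= 1 && chains <= MAX_CHAINS);
    for (unsigned i = 0; i < n_ops; i++) {
        unsigned c = i % chains;
        unsigned issue = ready[c] > unit_free ? ready[c] : unit_free;
        unit_free = issue + exec;
        ready[c] = issue + latency;
        if (ready[c] > finish)
            finish = ready[c];
    }
    return finish;
}
```

With the 603e fadd figures (execution time 1, latency 3), six additions into one accumulator finish at cycle 18 in this model, while three accumulators finish at cycle 8. Four fdiv operations on the 604 (execution time 32, latency 32) take 128 cycles whether or not they depend on each other, because the unit itself is busy. Splitting a floating-point sum across accumulators changes rounding, so a compiler can only do it when the language rules or options permit reassociation.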



Figure B-7. Optional Instructions

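| Instructions | Implementation | Execution Unit | Execution Time | Latency | Serialize |
|---|---|---|---|---|---|
| stfiwx | Common Model | | | | |
| | 601 | | | | |
| | 603e | LSU | 1 | 2 | |
| | 604 | LSU | 1 | 3 | execution |
| fres[.] | Common Model | | | | |
| | 601 | | | | |
| | 603e | FPU | 18 | 18/18 | |
| | 604 | FPU | 18 | 18/19 | |
| frsqrte[.] | Common Model | | | | |
| | 601 | | | | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |
| fsel[.] | Common Model | | | | |
| | 601 | | | | |
| | 603e | FPU | 1 | 3/3 | |
| | 604 | FPU | 1 | 3/4 | |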
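Figure B-7 shows frsqrte executing in a single cycle, far cheaper than the 31 to 33 cycles of fdiv in Figure B-6, but the architecture defines its result only as an estimate (accurate to within one part in 32). Code that needs a full-precision reciprocal square root typically refines the estimate with Newton-Raphson steps. Because portable C cannot name frsqrte, the sketch below substitutes a well-known integer bit-manipulation estimate of similarly coarse accuracy. rsqrt_estimate() and rsqrt_refined() are hypothetical names, and the stand-in is not what the hardware computes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for frsqrte: a coarse 1/sqrt(x) estimate (a few percent
 * relative error) from the exponent-halving bit trick.
 * Valid for positive, normal x only. */
static double rsqrt_estimate(double x) {
    uint64_t bits;
    double y;
    assert(x > 0.0);
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5FE6EB50C7B537A9ull - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Newton-Raphson refinement of y ~ 1/sqrt(x):
 *   y' = y * (1.5 - 0.5 * x * y * y)
 * Each step roughly squares the relative error. */
static double rsqrt_refined(double x, int steps) {
    double y = rsqrt_estimate(x);
    for (int i = 0; i < steps; i++)
        y = y * (1.5 - 0.5 * x * y * y);
    return y;
}
```

Starting from an error of one part in 32, three steps bring the relative error to roughly 1e-11, and a fourth reaches the rounding limit of double precision; each step needs only a few multiplies and one multiply-add.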


B.4 Misalignment Handling

PowerPC processors can automatically handle some misaligned accesses. Figure B-8 shows the number of transfers required to access various misaligned operands, or indicates that the processor generates an alignment interrupt. For the indicated processors, all misaligned accesses in Little-Endian mode cause alignment interrupts. Moreover, the use of any load multiple, store multiple, load string, or store string operation in Little-Endian mode causes an alignment interrupt.



Figure B-8. Number of Accesses for Misaligned Operands

Misalignment Type 601 603e 604
Halfword cross 2B boundary 1 1 1
cross 4B boundary 1 2 2
cross 8B boundary 2 2 2
cross 4KB boundary interrupt¹ interrupt¹ interrupt¹
cross 256MB boundary interrupt interrupt interrupt
Word cross 4B boundary 1 2 2
cross 8B boundary 2 2 2
cross 4KB boundary interrupt¹ interrupt¹ interrupt¹
cross 256MB boundary interrupt interrupt interrupt
load/store multiple not word-aligned 1.5#reg⁴ interrupt interrupt
word-aligned, but cross 4KB boundary #reg 2 2
word-aligned, but cross 256MB boundary interrupt interrupt³ interrupt³
load/store string cross 256MB boundary interrupt interrupt³ interrupt³
lwarx, stwcx. not word-aligned interrupt interrupt interrupt
single-precision floating-point cross 4B boundary 1 interrupt interrupt
cross 8B boundary 2 interrupt interrupt
cross 4KB boundary interrupt¹ interrupt interrupt
cross 256MB boundary interrupt interrupt interrupt
double-precision floating-point odd-word-aligned 2 2 2
not-word-aligned 2 interrupt interrupt
cross 4KB boundary interrupt¹ interrupt interrupt
cross 256MB boundary interrupt interrupt³ interrupt
— means not applicable.
interrupt refers to an alignment interrupt.
T refers to the T bit in the Segment Register.
DR refers to the Data Relocation bit in the Machine State Register.
¹ If T = 0 and DR = 1.
² If miss in TLB, get PTE and restart instruction.
³ If T changes.
⁴ The number of cycles depends on the position of the access relative to the doubleword boundaries.
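A compiler or runtime that wants to predict the cost of a possibly misaligned integer access can encode the PowerPC 604 column of Figure B-8 directly. The C sketch below covers only halfword and word loads and stores; it treats the footnoted 4KB-boundary cases as alignment interrupts without modeling the T and DR conditions, and applies the rule that any misaligned access in Little-Endian mode interrupts. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `size` bytes starting at `ea` span a `boundary`-byte boundary
 * (boundary must be a power of two). */
static bool crosses(uint32_t ea, uint32_t size, uint32_t boundary) {
    uint32_t mask = ~(boundary - 1u);
    return (ea & mask) != ((ea + size - 1u) & mask);
}

/* Accesses the 604 needs for a halfword (size 2) or word (size 4)
 * integer load/store per Figure B-8, or -1 for an alignment interrupt. */
static int accesses_604(uint32_t ea, uint32_t size, bool little_endian) {
    assert(size == 2 || size == 4);
    if ((ea & (size - 1u)) == 0)
        return 1;                      /* naturally aligned */
    if (little_endian)
        return -1;                     /* every misaligned LE access interrupts */
    if (crosses(ea, size, 0x10000000u) || crosses(ea, size, 0x1000u))
        return -1;                     /* 256MB or 4KB boundary */
    if (crosses(ea, size, 4u))
        return 2;                      /* 4B or 8B boundary */
    return 1;                          /* halfword crossing only a 2B boundary */
}
```

For example, a word at address 0x1001 costs two accesses, while one at 0x1FFE straddles a 4KB boundary and takes an alignment interrupt.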


Copyright 1998 IBMchips