Chapter 1

1. Introduction


High-performance computer systems depend on good hardware design coupled with powerful compilers and operating systems. Although announced in 1991, the PowerPC architecture represents the end product of nearly 20 years of evolution, starting with work on the 801 system at IBM. From the beginning, advanced hardware and software techniques were intermingled to develop first RISC and then superscalar computer systems. This guide describes how a compiler may select and schedule code that realizes the performance potential of the architecture.


1.1 RISC Technologies

The time required to execute a program is the product of the path length (the number of instructions), the number of cycles per instruction, and the cycle time. These three variables interact with one another. For example, reducing the cycle time shrinks the window of time in which useful work can be performed, so a complex instruction may no longer be able to finish within the shorter cycle. Its function must then be split into multiple simpler instructions, increasing the path length. Identifying the optimal combination of these variables in the form of an instruction set architecture is therefore a challenging problem whose solution depends on the hardware technology and the software requirements.

Historically, CISC architectures evolved in response to the limited availability of memory because complex instructions result in smaller programs. As technology improved, memory cost dropped and access times decreased, so the decode and execution of the instructions became the limiting steps in instruction processing. Work at IBM, Berkeley, and Stanford demonstrated that performance improved if the instruction set was simple and instructions required a small number of cycles to execute, preferably one cycle. The reduction in cycle time and in the number of cycles needed to process an instruction was a good trade-off against the increased path length. Development along these RISC lines continued at IBM and elsewhere. The physical design of the computer was simplified in exchange for increased hardware management by compilers and operating systems.

The work at IBM led to the development of the POWER™ architecture, which implemented parallel instruction (superscalar) processing, introduced some compound instructions to reduce instruction path lengths in critical areas, incorporated floating-point as a first-class data type, and simplified the architecture as a compiler target. Multiple pipelines permitted the simultaneous execution of different instructions, effectively reducing the number of cycles required to execute each instruction. The POWER architecture refined the original RISC approach by improving the mapping of the hardware architecture to the needs of programming languages. The functionality of key instructions was increased by combining multiple operations in a single instruction: the load and store with update instructions, which perform the access and load the effective address into the base register; the floating-point multiply-add instructions; the branch-on-count instructions, which decrement the Count Register and test the contents for zero; and the rotate-and-mask instructions. This increased functionality significantly reduced the path length for critical areas of code, such as loops, at the expense of moderately longer pipeline stages.

The POWER instruction set architecture and the hardware implementation were developed together so that they share a common partitioning based on function, minimizing the interaction between different functions. Because the instruction set was partitioned in this way, the compiler could better arrange the code so that fewer inter-instruction dependencies impeded superscalar dispatch. The role of the compiler became more important because it had to generate code that could extract the performance potential of this superscalar hardware.

IBM, Motorola, and Apple jointly defined the PowerPC architecture as an evolution of the POWER architecture. The modifications to the POWER architecture include:


1.2 Compilers and Optimization

The quality of code generated by a compiler is measured in terms of its size and execution speed, and the compiler must balance these factors for the particular programming environment. Quality is most profoundly affected by the choice of algorithm and data structures, choices that are the province of the individual programmer. Given the algorithm and data structures, quality depends on cooperation among the compiler, the processor architecture, and the specific implementation to best exploit the resources of the computer system. Modern processors rely on statistical properties of programs and on the compiler's ability to transform and schedule the specification of the algorithm in a semantically equivalent way that improves the performance of individual programs. Today, most programming is done in a high-level language. The compilers for these languages are free to generate the best possible machine code, provided that the semantics of the language are preserved. This book concentrates on compilers for procedure-oriented languages, such as C or Fortran.

Optimizations are traditionally classified as machine-independent or machine-dependent. Compilers usually perform machine-independent optimizations by transforming an intermediate language version of the program into an equivalent optimized program, also expressed in the intermediate language. The choice of optimizations normally considered machine-independent and their order of application, however, may actually be machine-dependent. Most classical compiler issues, including the front-end syntactic and semantic checks, the intermediate language, and most machine-independent optimizations, are not covered here; they are described elsewhere in the literature. This book focuses principally on implementation-dependent optimizations specific to the PowerPC architecture.

Machine-dependent optimizations require detailed knowledge of the processor architecture, the Application Binary Interface (ABI), and the processor implementation. Detailed issues of code choice depend mostly on the architecture: typical compilers examine the intermediate representation of the program and select semantically equivalent machine instructions. The ABI is a convention that allows programs to function in a particular programming environment, but it restricts the type of code that a compiler can emit in many contexts, so two PowerPC compilers that target different operating environments may generate quite different optimized code for the same program. Some machine-dependent optimizations, such as program layout, scheduling, and alignment, depend on the implementation of the architecture. The PowerPC architecture has a number of implementations, each with different constraints on these optimizations.


1.3 Assumptions

The assumptions made in this book include:


Copyright 1998 IBMchips