CmpSci 535 Notes from Lecture 4

The Instruction Set Architecture

As you are aware from your assembly language programming course, machines are programmed with binary codes that represent instructions and data. The fact that the program and data can both reside in memory is called the "stored program concept" and is what makes computers so flexible.

Instructions are usually divided into fields that represent different information required to carry out an operation (such as the op-code, data addresses, register operands, etc.). In your prior experience with the Intel processors you worked with a complex instruction set. Let's briefly review the features of this instruction set architecture:

Intel Instruction Set Architecture

Eight "General Purpose" registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP

The first four can be treated as 32 or 16 bit registers, with the lower 16 bits being further divided into bytes that are designated separately and which provide limited compatibility with the 8-bit Intel architectures.

Six Segment registers: CS, SS, DS, ES, FS, GS

CS being Code Segment, SS being Stack Segment, DS being Data Segment (along with the other three)

Instruction Pointer (16 or 32 bits), Flags Register (32 bits, with some undefined, although each new generation seems to define more of them)

Memory Management Registers:

Control Registers:

Floating Point Registers:

Control Register (16 bits), Status Register (16 bits), Tag Word (16 bits), Instruction Pointer (48 bits), Data Pointer (48 bits)

Debug Registers

The Intel 386/486/Pentium has 12 addressing modes:

Operands in the Intel processors can be 8, 16, 32, 48, 64, or 80 bits long. The instruction set also supports string operations over the entire address range.

Instructions in the Intel design can be as small as one byte or as long as 12 bytes and any combination in between. The first bytes generally contain opcode, mode specifiers, and register fields, while the remainder are for address displacement and immediate data.

Now for the sake of comparison, let's take a look at the Instruction Set Architecture of the MIPS family of RISC processors.

MIPS R4000

The MIPS family has four addressing modes:

Memory accesses in the MIPS architecture can transfer any number of bytes from 1 to 8.

There are three instruction formats, all of which are 32 bits in length.
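To see what a fixed 32-bit format buys us, here is a minimal sketch of how one such word splits into fields. It assumes the standard MIPS register (R) format, with field widths of 6, 5, 5, 5, 5, and 6 bits; the function name decode_r_format is made up for illustration and is not from the text.

```python
# Minimal sketch: split a 32-bit MIPS R-format instruction word into fields.
# Assumption: standard R-format layout op(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
# The name decode_r_format is hypothetical.

def decode_r_format(word):
    """Return the six R-format fields of a 32-bit instruction word."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31..26: opcode
        "rs":    (word >> 21) & 0x1F,  # bits 25..21: first source register
        "rt":    (word >> 16) & 0x1F,  # bits 20..16: second source register
        "rd":    (word >> 11) & 0x1F,  # bits 15..11: destination register
        "shamt": (word >> 6) & 0x1F,   # bits 10..6:  shift amount
        "funct": word & 0x3F,          # bits 5..0:   function code
    }

# add $8, $17, $18 is encoded as the word 0x02324020
fields = decode_r_format(0x02324020)
print(fields)
```

Because every field sits at a fixed bit position, decoding takes a handful of shifts and masks. A variable-length instruction needs its first bytes examined before the decoder even knows where the remaining fields are.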

Comparison

Even at this cursory level of review, it's clear how much simpler the Instruction Set Architecture of the MIPS processor is. The same is true of most RISC processors.

What are the advantages of making the ISA simple?

Easier to program.

Easier to build.

Requires less hardware for basic functionality, leaving room for circuitry to increase performance.

Let's take a closer look at each of these reasons in turn.

A RISC ISA is easier to program in the sense that we have fewer alternatives to choose between -- there are only a few ways to accomplish a given operation that make sense. In a CISC ISA, we have many alternatives to choose from. While we, as assembly language programmers, may be comfortable with making those choices, most compilers don't have the necessary sophistication to make the subtle distinctions necessary. In fact, one of the studies that resulted in the RISC approach was an analysis of object code on CISC architectures that showed only a tiny fraction of the available instructions were being used. The compilers were generating simple, straightforward code and did not have the sophistication to make use of the many special operations in the CISC ISA.

However, a RISC ISA is also more difficult to program in that it often requires a greater number of instructions than the CISC ISA to accomplish the same task. It is not unusual for a RISC object file to be 50% larger than a CISC object for the same program.

The RISC architecture is easier to build in that it is more regular, has a simpler set of interconnections, fewer special cases, and generally takes less circuitry to implement. However, it is also more difficult to build in that it must run faster than a CISC architecture in order to execute the greater number of instructions in the same time. Faster circuits are harder to design and consume more power. But at least the RISC design compensates for this difficulty by eliminating other sources of complexity. The RISC processor also requires a larger cache memory. Why?

Because its object code is larger, it has to hold more instructions in cache.

RISC does require less hardware for the basic functionality, which (if we ignore the need for a larger cache) leaves room for more acceleration circuits to be added. For example, the complex pipeline of the Intel P6 is only just catching up to where RISC processors have been for a full generation. RISC processors also have a greater degree of instruction level parallelism than CISC processors.

The result of the streamlined RISC design is that a RISC processor is typically 1.5 to 5 times faster than a CISC processor of the same period. It is actually possible to emulate a CISC processor in software on some RISC processors with nearly the same performance as the actual CISC processor.
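The trade-off described above (more instructions, but each one simpler and faster) can be made concrete with the usual relation: execution time = instruction count x cycles per instruction / clock rate. The sketch below uses invented numbers purely for illustration. They are not measurements of any real processor, and the only figure taken from these notes is the 50% larger instruction count.

```python
# Minimal sketch of the CPU time relation:
#   time = instruction_count * cycles_per_instruction / clock_rate
# All numeric values below are hypothetical, chosen only to illustrate the idea.

def cpu_time(instruction_count, cycles_per_instruction, clock_hz):
    """Execution time in seconds for a program on a given machine."""
    return instruction_count * cycles_per_instruction / clock_hz

# Hypothetical CISC machine: fewer instructions, more cycles each, slower clock.
cisc_seconds = cpu_time(1_000_000, 4.0, 100e6)

# Hypothetical RISC machine: 50% more instructions, fewer cycles each, faster clock.
risc_seconds = cpu_time(1_500_000, 1.2, 150e6)

speedup = cisc_seconds / risc_seconds
print(f"CISC {cisc_seconds:.3f} s, RISC {risc_seconds:.3f} s, speedup {speedup:.2f}x")
```

With these made-up values, the RISC machine comes out a little over three times faster even though it executes half again as many instructions. The lower cycles per instruction and the higher clock rate more than make up for the larger count.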

Why CISC?

The natural question at this point is, "Why did CISC develop, if it is lower in performance?"

Architects at the time were using less-developed memory technology. Magnetic core was expensive, and the amount of it in a machine was typically quite small (64K words was considered large). It was also slower than the circuitry of the CPU.

Given that you have very little memory and it is slow, what would you do to the instruction set to compensate?

Make each instruction do more work.

This solution saves memory (fewer instructions are needed to accomplish the same task) and it saves time (instructions don't have to be fetched as often to keep the CPU busy).

Thus, architects of the time set about looking for ways to combine simpler instructions into more complex ones. In addition, they had pressure from assembly language programmers (who were proportionally as numerous then as C programmers are today) to make their jobs easier by providing more powerful instructions. The result was higher performance.

Then the technology shifted: memory became cheap (allowing it to be large) and fast, and compilers started generating efficient code (which put a lot of assembly language programmers out of work). Since the compilers only used a subset of the instructions, the complexity became excess baggage.

What Distinguishes RISC from CISC?

Many people have wrestled with this question, and their answers are usually only partly satisfying. Some say it is the orientation toward a large bank of general purpose registers. But others point to the Motorola 68000, which has a bank of general purpose registers, and note that it is clearly CISC. Others point to the smaller number of addressing modes, but there are counterexamples -- some RISC processors actually have nearly as many modes as CISC processors. The number of instructions is often cited, but again there are counterexamples. The same is true for the fixed size of a RISC instruction (one word) vs. the variable size of a CISC instruction, and for the relative clock rates.

In the end, we may conclude from these commentators that RISC is a philosophy that attempts to restrict a design to simple, regular, general purpose features as much as possible, but is hard to pin down to specifics. The truly cynical might conclude that it is just a convenient label for marketing purposes. However, one aspect that truly seems to set RISC apart is that it keeps independent operations separate. In particular, it separates computation from memory access.

Why is this important? Keep in mind that a cache doesn't always hold the required portion of main memory.

Given that memory accesses may be slow, if they are independent of computations, a clever compiler can rearrange the operations so that memory accesses are done while the CPU is busy with other work. If the memory operation is tightly bound to the computation (as in a CISC ISA), then there is less flexibility for this reordering and the CPU spends more time waiting.
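The effect of this reordering can be sketched with a toy model. The machine below is an assumption for illustration only: it issues one instruction per cycle, in order, and stalls an instruction until its source registers are ready. The 4-cycle load latency, the register names, and the function run_cycles are all hypothetical.

```python
# Toy model (hypothetical): in-order, single-issue machine that stalls an
# instruction until all of its source registers have been produced.

LATENCY = {"load": 4, "add": 1, "mul": 1, "sub": 1}  # assumed cycle counts

def run_cycles(program, latency=LATENCY):
    """Return the cycle at which the last result of the program is available."""
    ready_at = {}  # register name -> cycle when its value becomes available
    cycle = 0      # earliest cycle at which the next instruction may issue
    for op, dest, sources in program:
        # Registers never written by the program are treated as ready at cycle 0.
        start = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        ready_at[dest] = start + latency[op]
        cycle = start + 1
    return max(ready_at.values())

# The add uses the loaded value immediately, so the machine waits on memory.
naive = [
    ("load", "r1", ["r10"]),
    ("add",  "r2", ["r1", "r3"]),
    ("mul",  "r4", ["r5", "r6"]),
    ("sub",  "r7", ["r8", "r9"]),
]

# Same instructions, but the independent mul and sub are moved into the load delay.
reordered = [naive[0], naive[2], naive[3], naive[1]]

naive_cycles = run_cycles(naive)
reordered_cycles = run_cycles(reordered)
print(naive_cycles, reordered_cycles)
```

In this model the naive order takes 7 cycles and the reordered one takes 5, because the independent work fills the time the load would otherwise leave idle. If the load and the add were fused into one memory-to-register instruction, there would be no separate load for the compiler to move.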

In a complex implementation of the ISA, with multiple arithmetic units, separating operations from each other frees the compiler to rearrange them to better utilize the multiple units. For example, if a program is operating on an integer array, the address arithmetic and the arithmetic on the array values may compete for the same units. If the address arithmetic can be split into separate operations, however, the compiler may be able to rearrange them with respect to the array value operations so that the units are fully utilized. If that address arithmetic is instead done by a single instruction that locks up the arithmetic units, it may be necessary for the computations on the array values to wait until the units are unlocked, even though some of them may be idle for part of the address instruction.

The Importance of Compiler Technology to RISC

RISC ISAs are simple to program in a naive, unoptimized manner. The trouble is that the sophisticated acceleration mechanisms we've touched on are not part of the ISA. Even though they are hidden, they affect the performance of the system in ways that are dependent on the sequence of instructions executed.

For example, if a 2-dimensional array is laid out with its row values appearing sequentially in memory, and a programmer can either access the data by reading along the rows or along the columns, what order should she or he choose? From the ISA, we would say it doesn't matter. But knowing that there is a cache that always loads sequences of words from memory, we can expect much higher performance if we write the code to access the array along its rows. That way, we get to spread the penalty for going to main memory across a series of accesses that find the subsequent data already in the cache.
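A small simulation makes the difference visible. The cache modeled here is an assumption chosen for simplicity: direct-mapped, 16 lines of 8 words each, with a 64 x 64 array of one-word elements stored row by row. The sizes and the function count_misses are hypothetical and do not describe any particular machine.

```python
# Toy direct-mapped cache (hypothetical sizes) used to compare row-order and
# column-order traversal of a 2-D array stored row by row.

def count_misses(addresses, line_words=8, num_lines=16):
    """Count cache misses for a sequence of word addresses."""
    lines = [None] * num_lines      # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_words  # block of memory that contains this word
        index = block % num_lines   # the one cache line that block can occupy
        if lines[index] != block:
            lines[index] = block    # fetch the whole block from main memory
            misses += 1
    return misses

ROWS, COLS = 64, 64

# Element (r, c) lives at word address r * COLS + c in a row-by-row layout.
row_order = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
col_order = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print(row_misses, col_misses)
```

Under these assumptions the row-order walk misses once per 8-word block (512 misses), while the column-order walk misses on every one of its 4096 accesses. Both loops touch exactly the same elements, and nothing in the ISA distinguishes them.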

With more complicated acceleration mechanisms, such as having multiple arithmetic units, it becomes difficult for an assembly language programmer to keep all of the details straight. Essentially, there is a large space of potential arrangements of instructions, and one must search this for the arrangement that makes the best use of all of the available resources. People can do this well for a little while, but for millions of instructions it is just too much effort. However, a compiler can dogmatically apply these optimizations across an entire program and in the end produce more efficient code.

Because RISC ISAs are specifically designed to allow their implementers the flexibility to incorporate a greater number of these sorts of enhancements, they depend even more than CISC ISAs on good compiler technology.

Unfortunately, because of the lag between development of an architecture and a mature optimizing compiler for it, architects often base design decisions for the next generation on inaccurate assumptions. The result has been that newer architectures are incorporating more compiler-like optimizations into the hardware. However, this added complexity makes it even more difficult to build a good optimizing compiler!

An Example of RISC Assembly Code

Go through example on p. 144 of text, using overhead copy. Point out that none of the operate instructions reference memory -- they all work on registers. Note how the registers are saved on the stack, especially $31, which is the return jump pointer. Also note how the less-than comparisons are done in separate instructions from the branches, because they require computation, and the RISC approach tries to separate computation from unrelated operations (control flow in this case). Also note that the equal and not-equal tests do not require computation in the sense that they do not need to do arithmetic.



© Copyright 1995, 1996 Charles C. Weems Jr. All rights reserved.