As you are aware from your assembly language programming course, machines
are programmed with binary codes that represent instructions and data. The
fact that the program and data can both reside in memory is called the "stored
program concept" and is what makes computers so flexible.
Instructions are usually divided into fields that represent different information
required to carry out an operation (such as the op-code, data addresses,
register operands, etc.). In your prior experience with the Intel processors
you worked with a complex instruction set. Let's briefly review the features
of this instruction set architecture:
Eight "General Purpose" registers: EAX, EBX, ECX, EDX, ESI,
EDI, EBP, ESP
The first four can be treated as 32 or 16 bit registers, with the lower
16 bits being further divided into bytes that are designated separately
and which provide limited compatibility with the 8-bit Intel architectures.
Six Segment registers: CS, SS, DS, ES, FS, GS
CS being Code Segment, SS being Stack Segment, DS being Data Segment (along
with the other three)
Instruction Pointer (16 or 32 bits), Flags Register (32 bits, with some
undefined, although each new generation seems to define more of them)
Memory Management Registers:
Control Registers:
Floating Point Registers:
Control Register (16 bits), Status Register (16 bits), Tag Word (16 bits),
Instruction Pointer (48 bits), Data Pointer (48 bits)
Debug Registers
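The way the first four general purpose registers are subdivided can be modeled with masks and shifts. The Python sketch below is only an illustration. AX, AH, and AL are the standard x86 names for the sub-registers of EAX (these notes do not spell them out), and the sample value is arbitrary.

```python
# Model EAX as a 32-bit value; the sample value is arbitrary.
eax = 0x12345678

ax = eax & 0xFFFF       # AX: the low 16 bits of EAX
al = ax & 0xFF          # AL: the low byte of AX
ah = (ax >> 8) & 0xFF   # AH: the high byte of AX

print(hex(ax), hex(ah), hex(al))  # prints: 0x5678 0x56 0x78
```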
The Intel 386/486/Pentium has 12 addressing modes.
Operands in the Intel processors can be 8, 16, 32, 48, 64, or 80 bits
long. The instruction set also supports string operations over the entire
address range.
Instructions in the Intel design can be as small as one byte or as long
as 12 bytes and any combination in between. The first bytes generally contain
opcode, mode specifiers, and register fields, while the remainder are for
address displacement and immediate data.
Now for the sake of comparison, let's take a look at the Instruction Set
Architecture of the MIPS family of RISC processors.
The MIPS family has four addressing modes.
Memory accesses in the MIPS architecture are to any multiple between
1 and 8 bytes.
There are three instruction formats, all of which are 32 bits in length.
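Because every instruction is one 32-bit word with fields at fixed positions, decoding takes only a few shifts and masks. The sketch below assumes the standard MIPS32 register (R) format, whose field layout is not given in these notes: a 6-bit opcode, three 5-bit register numbers (rs, rt, rd), a 5-bit shift amount, and a 6-bit function code. The function name decode_r_type is hypothetical.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # bits 25..21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20..16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15..11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10..6, shift amount
        "funct":  word & 0x3F,          # bits 5..0, selects the operation
    }

# 0x02324020 encodes "add $t0, $s1, $s2" (registers 8, 17, and 18).
fields = decode_r_type(0x02324020)
print(fields)
```

Compare this with the Intel encoding described above, where the decoder must first work out how long the instruction is before it can locate the remaining fields.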
Even at this cursory level of review, it's clear how much simpler the
Instruction Set Architecture of the MIPS processor is. The same is true
of most RISC processors.
What are the advantages of making the ISA simple?
Easier to program.
Easier to build.
Requires less hardware for basic functionality, leaving room for circuitry
to increase performance.
Let's take a closer look at each of these reasons in turn.
A RISC ISA is easier to program in the sense that we have fewer alternatives
to choose between -- there are only a few ways to accomplish a given operation
that make sense. In a CISC ISA, we have many alternatives to choose from.
While we, as assembly language programmers, may be comfortable with making
those choices, most compilers don't have the necessary sophistication to
make the subtle distinctions necessary. In fact, one of the studies that
resulted in the RISC approach was an analysis of object code on CISC architectures
that showed only a tiny fraction of the available instructions were being
used. The compilers were generating simple, straightforward code and did
not have the sophistication to make use of the many special operations in
the CISC ISA.
However, a RISC ISA is also more difficult to program in that it often requires
a greater number of instructions than the CISC ISA to accomplish the same
task. It is not unusual for a RISC object file to be 50% larger than a CISC
object file for the same program.
The RISC architecture is easier to build in that it is more regular, has
a simpler set of interconnections, fewer special cases, and generally takes
less circuitry to implement. However, it is also more difficult to build
in that it must run faster than a CISC architecture in order to execute
the greater number of instructions in the same time. Faster circuits are
harder to design and consume more power. But at least the RISC design compensates
for this difficulty by eliminating other sources of complexity. The RISC
processor also requires a larger cache memory. Why?
Because its object code is larger, it has to hold more instructions in
cache.
RISC does require less hardware for the basic functionality, which (if we
ignore the need for a larger cache) leaves room for more acceleration circuits
to be added. For example, the complex pipeline of the Intel P6 is only just
catching up to where RISC processors have been for a full generation. RISC
processors also have a greater degree of instruction level parallelism than
CISC processors.
The result of the streamlined RISC design is that a RISC processor is typically
1.5 to 5 times faster than a CISC processor of the same period. It is actually
possible to emulate a CISC processor in software on some RISC processors
with nearly the same performance as the actual CISC processor.
The natural question at this point is, "Why did CISC develop, if
it is lower in performance?"
Architects at the time were using less-developed memory technology. Magnetic
core was expensive, and the amount of it in a machine was typically quite
small (64K words was considered large). It was also slower than the circuitry
of the CPU.
Given that you have very little memory and it is slow, what would you do
to the instruction set to compensate?
Make each instruction do more work.
This solution saves memory (fewer instructions are needed to accomplish
the same task) and it saves time (instructions don't have to be fetched
as often to keep the CPU busy).
Thus, architects of the time set about looking for ways to combine simpler
instructions into more complex ones. In addition, they had pressure from
assembly language programmers (who were proportionally as numerous then
as C programmers are today) to make their jobs easier by providing more
powerful instructions. The result was higher performance.
Then the technology shifted: memory became cheap (allowing it to be large)
and fast, and compilers started generating efficient code (which put a lot
of assembly language programmers out of work). Since the compilers only
used a subset of the instructions, the complexity became excess baggage.
Many people have wrestled with the question of what exactly defines RISC, and their answers are usually
only partly satisfying. Some say it is the orientation toward a large bank
of general purpose registers. But others point to the Motorola 68000, which
has a bank of general purpose registers and note that it is clearly CISC.
Others point to the smaller number of addressing modes, but there are counterexamples
-- some RISC processors actually have nearly as many modes as CISC processors.
The number of instructions is often cited, but again there are counterexamples.
The same is true for the fixed size of a RISC instruction (one word) vs
the variable size of a CISC instruction, and the relative clock rates.
In the end, we may conclude from these commentators that RISC is a philosophy
that attempts to restrict a design to simple, regular, general purpose features
as much as possible, but is hard to pin down to specifics. The truly cynical
might conclude that it is just a convenient label for marketing purposes.
However, one aspect that truly seems to set RISC apart is that it keeps
independent operations separate. In particular, it separates computation
from memory access.
Why is this important? Keep in mind that a cache doesn't always hold the
required portion of main memory.
Given that memory accesses may be slow, if they are independent of computations,
a clever compiler can rearrange the operations so that memory accesses are
done while the CPU is busy with other work. If the memory operation is tightly
bound to the computation (as in a CISC ISA), then there is less flexibility
for this reordering and the CPU spends more time waiting.
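The benefit of reordering can be seen in a toy timing model. The Python sketch below is an illustration under stated assumptions, not a model of any real processor: it assumes a single-issue, in-order machine in which a load takes 3 cycles and an add takes 1 cycle, and an instruction stalls until its source registers are ready. The function run and the register names are hypothetical.

```python
LATENCY = {"load": 3, "add": 1}  # assumed latencies, in cycles

def run(program):
    """Return the cycle at which the last result of the program is ready."""
    ready = {}   # register name -> cycle at which its value is available
    cycle = 0    # earliest cycle in which the next instruction may issue
    for op, dest, sources in program:
        # Stall until every source register has been produced.
        issue = max([cycle] + [ready.get(s, 0) for s in sources])
        ready[dest] = issue + LATENCY[op]
        cycle = issue + 1
    return max(ready.values())

# Each load is immediately followed by the add that needs its result.
bound = [
    ("load", "r1", []), ("add", "r2", ["r1"]),
    ("load", "r3", []), ("add", "r4", ["r3"]),
]

# Both loads are issued first, so their latencies overlap.
reordered = [
    ("load", "r1", []), ("load", "r3", []),
    ("add", "r2", ["r1"]), ("add", "r4", ["r3"]),
]

print(run(bound), run(reordered))  # prints: 8 5
```

The same four operations finish in 5 cycles instead of 8 once the loads are hoisted. That move is only possible when the load and the add are separate instructions.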
In a complex implementation of the ISA, with multiple arithmetic units, separating
operations from each other frees the compiler to rearrange them to better
utilize the multiple units. For example, if a program is operating on an
integer array, the address arithmetic and the arithmetic on the array values
may compete for the same units. If the address arithmetic can be split into
separate operations, however, the compiler may be able to rearrange them
with respect to the array value operations so that the units are fully utilized.
If that address arithmetic is instead done by a single instruction that
locks up the arithmetic units, it may be necessary for the computations
on the array values to wait until the units are unlocked, even though some
of them may be idle for part of the address instruction.
RISC ISAs are simple to program in a naive, unoptimized manner. The trouble
is that the sophisticated acceleration mechanisms we've touched on are not
part of the ISA. Even though they are hidden, they affect the performance
of the system in ways that are dependent on the sequence of instructions
executed.
For example, if a 2-dimensional array is laid out with its row values appearing
sequentially in memory, and a programmer can either access the data by reading
along the rows or along the columns, what order should she or he choose?
From the ISA, we would say it doesn't matter. But knowing that there is
a cache that always loads sequences of words from memory, we can expect
much higher performance if we write the code to access the array along its
rows. That way, we get to spread the penalty for going to main memory across
a series of accesses that find the subsequent data already in the cache.
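The effect of traversal order can be demonstrated with a toy cache simulation. The Python sketch below assumes a small direct-mapped cache with 8 lines of 4 words each and a 16 by 16 array of one-word elements stored row by row. All of these sizes, and the function name count_misses, are hypothetical and chosen only to make the effect visible.

```python
def count_misses(addresses, line_words=4, num_lines=8):
    """Count misses in a toy direct-mapped cache (sizes are assumptions)."""
    lines = [None] * num_lines       # memory block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_words   # block of memory containing this word
        index = block % num_lines    # cache line that this block maps to
        if lines[index] != block:
            misses += 1              # go to main memory and load the whole block
            lines[index] = block
    return misses

ROWS, COLS = 16, 16

# Word address of element (r, c) in a row-major layout is r * COLS + c.
row_order = [r * COLS + c for r in range(ROWS) for c in range(COLS)]
col_order = [r * COLS + c for c in range(COLS) for r in range(ROWS)]

print(count_misses(row_order), count_misses(col_order))  # prints: 64 256
```

Reading along the rows misses once per 4-word block and then hits on the next three words. Reading down the columns, in this small model, misses on every single access, because each block is evicted before its neighboring words are used.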
With more complicated acceleration mechanisms, such as having multiple arithmetic
units, it becomes difficult for an assembly language programmer to keep
all of the details straight. Essentially, there is a large space of potential
arrangements of instructions, and one must search this for the arrangement
that makes the best use of all of the available resources. People can do
this well for a little while, but for millions of instructions it is just
too much effort. However, a compiler can systematically apply these optimizations
across an entire program and in the end produce more efficient code.
Because RISC ISAs are specifically designed to allow their implementers
the flexibility to incorporate a greater number of these sorts of enhancements,
they depend even more than CISC ISAs on good compiler technology.
Unfortunately, because of the lag between development of an architecture
and a mature optimizing compiler for it, architects often base design decisions
for the next generation on inaccurate assumptions. The result has been that
newer architectures are incorporating more compiler-like optimizations into
the hardware. However, this added complexity makes it even more difficult
to build a good optimizing compiler!
Go through example on p. 144 of text, using overhead copy. Point out
that none of the operate instructions reference memory -- they all work
on registers. Note how the registers are saved on the stack, especially
$31, which is the return jump pointer. Also note how the less-than comparisons
are done in separate instructions from the branches, because they require
computation, and the RISC approach tries to separate computation from unrelated
operations (control flow in this case). Also note that the equal and not-equal
tests do not require computation in the sense that they do not need to do
arithmetic.