MIPS R4400 Pipeline Case Study
The MIPS R4400 has an 8-stage pipeline with the following stages:
IF  Instruction fetch, first half
IS  Instruction fetch, second half
RF  Register fetch
EX  Execute
DF  Data fetch, first half
DS  Data fetch, second half
TC  Tag check
WB  Write back
In IF, the branch logic selects the instruction
address and the I-cache fetch begins. The instruction TLB begins the
virtual-to-physical address translation.
In IS, the fetch and translation complete.
In RF, the instruction decode occurs and the processor
checks for interlocks. The instruction cache tag is checked against the page
frame from the instruction TLB. Operands are fetched from the register file.
In EX, one of the following occurs: the ALU performs a
register-to-register operation; the ALU calculates the data's virtual address
for a load or store; or the ALU determines whether a branch condition is true and
calculates the virtual target address if the operation is a branch.
In DF, one of the following occurs: the data cache
fetch and data TLB translation begin for a load or store; the branch target
instruction address translation and TLB update begin for a branch; or nothing
happens for a register-to-register operation.
In DS, one of the following happens: the data cache
fetch and TLB translation complete for a load/store, and the shifter aligns data
to a word or doubleword boundary; the branch instruction address translation
and TLB update complete for a branch; or nothing happens for a
register-to-register operation.
In TC, for a load/store, the data cache performs a tag
check: the physical address from the TLB is checked against the cache tag to determine
whether the access hit. Nothing happens for a register-to-register operation.
In WB, the result of a register-to-register operation
is written back to the register file. Branches do nothing during this stage.
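The per-stage behaviour described above can be summarized as a lookup table. The Python sketch below simply transcribes the text. The class names `alu`, `load_store` and `branch` and the `activity` helper are illustrative assumptions. Combinations that the text does not cover, such as a branch in TC or a load in WB, are deliberately left unspecified rather than guessed.

```python
# Per-stage activity by instruction class, transcribed from the descriptions above.
# "all" means every instruction class does the same thing in that stage.
ACTIVITY = {
    "IF": {"all": "select fetch address; start I-cache fetch and ITLB translation"},
    "IS": {"all": "complete I-cache fetch and ITLB translation"},
    "RF": {"all": "decode; check interlocks; I-cache tag check; read registers"},
    "EX": {
        "alu": "perform register-to-register operation",
        "load_store": "compute data virtual address",
        "branch": "evaluate condition; compute target virtual address",
    },
    "DF": {
        "alu": "nothing",
        "load_store": "start D-cache fetch and data TLB translation",
        "branch": "start target address translation and TLB update",
    },
    "DS": {
        "alu": "nothing",
        "load_store": "complete D-cache fetch and translation; align data",
        "branch": "complete target address translation and TLB update",
    },
    "TC": {
        "alu": "nothing",
        "load_store": "D-cache tag check against TLB physical address",
    },
    "WB": {
        "alu": "write result to register file",
        "branch": "nothing",
    },
}


def activity(kind: str, stage: str) -> str:
    """Look up what an instruction of class `kind` does in `stage`."""
    per_stage = ACTIVITY[stage]
    if "all" in per_stage:
        return per_stage["all"]
    # Cases the text does not describe (e.g. branches in TC) are reported as such.
    return per_stage.get(kind, "not described in the text")


if __name__ == "__main__":
    for stage in ACTIVITY:  # dicts keep insertion order, so stages print in pipeline order
        print(f"{stage}: {activity('load_store', stage)}")
```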
The following diagram shows what happens at each stage
in the pipeline for load/store and branch operations.

Diagram legend:
IC1    Instruction cache access, part 1
IC2    Instruction cache access, part 2
ITLB1  Instruction address translation, part 1
ITLB2  Instruction address translation, part 2
ITC    Instruction cache tag check
IDEC   Instruction decode
RF     Register operand fetch
ALU    ALU operation (register-to-register)
DVA    Data virtual address calculation
DC1    Data cache access, part 1
DC2    Data cache access, part 2
LSA    Load/store alignment with shifter
JTLB1  Data/address translation, part 1
JTLB2  Data/address translation, part 2
DTC    Data cache tag check
IVA    Instruction virtual address calculation
WB     Write back to register file
Notice that the load/store unit and the branch unit
each do their own virtual address calculation, but after that they both use the
same TLB.
Branch Delay
The R4400 detects whether a conditional branch will be
taken in stage 4 (EX). Thus, the correct fetch address cannot be determined
until three subsequent instructions have already entered the pipe behind the
branch. Normally these would have to be flushed, but it is also possible to mark
them as independent of the branch and let them proceed normally, filling the
branch delay slots with useful work. It is also possible for the compiler to move
instructions into the branch delay slots.
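The count of three follows directly from the stage position. With one instruction fetched per cycle, a branch that resolves in the fourth stage has three younger instructions behind it. The Python sketch below shows that arithmetic. It assumes no stalls, and the helper names are hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def instructions_behind(resolve_stage: str) -> int:
    """Instructions fetched after a branch before its outcome is known.

    Assumes one fetch per cycle with no stalls: a branch in stage s
    (0-based index) has s younger instructions behind it.
    """
    return STAGES.index(resolve_stage)


def target_fetch_cycle(branch_if_cycle: int, resolve_stage: str = "EX") -> int:
    """Earliest cycle in which the correct next instruction can enter IF."""
    return branch_if_cycle + STAGES.index(resolve_stage) + 1


if __name__ == "__main__":
    print(instructions_behind("EX"))  # 3 (the instructions in IF, IS and RF)
    print(target_fetch_cycle(1))      # 5 (branch fetched in cycle 1, resolved in cycle 4)
```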
Load Delay
A load completes in DS. Thus, the loaded operand cannot be
used by either of the two instructions immediately following the load; the
first instruction that can use it is the third one after the load, when that
instruction reaches its EX stage. The result of the load at the DS stage is
automatically redirected to the RF stage of the instruction using the operand,
so that it is available for that instruction's EX.
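The "two slots" figure can be checked with the same stage-index arithmetic. This is a minimal Python sketch. It assumes one instruction is issued per cycle and that the DS result is forwarded (redirected) straight to the consumer, as the paragraph describes. The function names are illustrative only.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def min_issue_distance(ready_stage: str, use_stage: str) -> int:
    """Smallest issue distance d (consumer issued d cycles after producer)
    such that the consumer's `use_stage` starts after the producer's
    `ready_stage` has finished, assuming one issue per cycle and that the
    result is forwarded as soon as it is ready."""
    return STAGES.index(ready_stage) - STAGES.index(use_stage) + 1


def load_delay_slots() -> int:
    """Slots between a load and the first instruction that can use its data."""
    return min_issue_distance("DS", "EX") - 1


def interlock_cycles(distance: int) -> int:
    """Cycles a dependent instruction must be held (a load-interlock slip)
    if it is issued only `distance` cycles after the load."""
    return max(0, min_issue_distance("DS", "EX") - distance)


if __name__ == "__main__":
    print(load_delay_slots())   # 2
    print(interlock_cycles(1))  # 2: the very next instruction waits two cycles
    print(interlock_cycles(3))  # 0
```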
Pipeline Faults
The R4400 is designed around a hierarchy of potential
fault conditions: a fault is either an exception or an interlock, and
interlocks are further divided into stalls and slips.

Stalls are interlocks that halt the entire pipeline,
while slips allow part of the pipeline (usually the part already past the
offending stage) to proceed.
Each exception or interlock condition is detected in
just one stage, which lets the processor identify the offending instruction
uniquely. For example:
In IS, instruction TLB misses are detected.
In RF, instruction cache misses are detected, as are load interlocks,
multiply busy, divide busy, multiply/divide slips, shifts > 32 bits,
FPU busy, and instruction translation exceptions.
In EX, exceptions such as interrupt, bus error,
illegal instruction, breakpoint, and system call are detected.
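This shows why detecting each condition in a single stage identifies the instruction uniquely. In a stall-free pipeline, the detecting stage and the cycle together pin down exactly one instruction. The Python sketch below works under that assumption. The condition names are abbreviated, and `offending_instruction` is a hypothetical helper.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# A few detection stages taken from the text above (not a complete list).
DETECT_STAGE = {
    "instruction TLB miss": "IS",
    "instruction cache miss": "RF",
    "load interlock": "RF",
    "illegal instruction": "EX",
}


def offending_instruction(cycle: int, condition: str) -> int:
    """Number of the instruction that caused `condition` detected in `cycle`.

    Assumes instruction k is fetched (enters IF) in cycle k and that no
    earlier stall has disturbed that alignment. Because each condition is
    detected in exactly one stage, exactly one instruction occupies it.
    """
    return cycle - STAGES.index(DETECT_STAGE[condition])


if __name__ == "__main__":
    # In cycle 6, an I-cache miss (RF) belongs to instruction 4,
    # while an illegal instruction (EX) belongs to instruction 3.
    print(offending_instruction(6, "instruction cache miss"))  # 4
    print(offending_instruction(6, "illegal instruction"))     # 3
```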
When an exception occurs, the offending instruction
and all instructions following it in the pipe are cancelled. Any stalls or
later exceptions raised by these cancelled instructions are also cancelled. The
following are the MIPS R4400 exceptions, with the stage in which each is
detected:
Instruction translation or address exception (RF)
External interrupt (EX)
Instruction bus error (EX)
Instruction virtual address coherent (EX)
Illegal instruction (EX)
Breakpoint (EX)
System call (EX)
Coprocessor unusable (EX)
Instruction error-correcting code error (EX)
Integer overflow (DS)
Floating point interrupt (DS)
Execute stage (programmed) traps (DS)
Data translation or address exception (TC)
Translation lookaside buffer modified (TC)
Data bus error (WB)
Memory reference address debugger comparison (WB)
Data virtual address coherent (WB)
Data error-correcting code error (WB)
Non-maskable interrupt (WB)
Hardware reset (WB)
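The cancellation rule can be sketched in a few lines of Python. One consequence follows from it. If exceptions are detected in several stages during the same cycle, the one in the latest stage belongs to the oldest instruction. Cancelling from that instruction removes all the younger ones, so it is effectively the exception that is serviced. `EXCEPTION_STAGE` is a subset of the list above. `cancel` and `exception_taken` are hypothetical helpers, not the hardware's actual priority logic.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# Subset of the exception list above: exception -> stage in which it is detected.
EXCEPTION_STAGE = {
    "Instruction translation or address exception": "RF",
    "Illegal instruction": "EX",
    "System call": "EX",
    "Integer overflow": "DS",
    "Data translation or address exception": "TC",
    "Data bus error": "WB",
}


def cancel(in_flight: list[str], offending: str) -> tuple[list[str], list[str]]:
    """Split in-flight instructions (ordered oldest first) into those that
    continue and those cancelled: the offending one and everything after it."""
    i = in_flight.index(offending)
    return in_flight[:i], in_flight[i:]


def exception_taken(detected: list[str]) -> str:
    """Among exceptions detected in the same cycle, pick the one in the latest
    stage: it belongs to the oldest instruction, and cancelling from there
    also cancels every younger instruction."""
    return max(detected, key=lambda e: STAGES.index(EXCEPTION_STAGE[e]))


if __name__ == "__main__":
    kept, cancelled = cancel(["i1", "i2", "i3", "i4"], "i2")
    print(kept, cancelled)  # ['i1'] ['i2', 'i3', 'i4']
    print(exception_taken(["Illegal instruction", "Integer overflow"]))  # Integer overflow
```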
For a stall, the entire pipeline is frozen until the
interlock is resolved. At the end of the stall, a two-cycle restart sequence
inserts corrected information into the pipe before the pipeline is released.
Stalls are caused by the following events and detected
in the stage indicated:
Instruction TLB miss (IS)
Instruction cache miss (RF)
Coprocessor possible exception (DF)
Integer sign extend (DF)
Store interlock (DF)
Data cache miss (TC)
Watch address exception (TC)
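The following is a simplified Python model that makes "the entire pipeline is frozen" concrete. It is a sketch built on assumptions. It applies a pure freeze and ignores the back-up and two-cycle restart shown in the traces later in this section. `ideal_rows` and `freeze` are hypothetical helpers.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def ideal_rows(n_instr: int) -> list[list[str]]:
    """Stall-free trace: instruction i enters IF in column i."""
    width = n_instr + len(STAGES) - 1
    return [[""] * i + STAGES + [""] * (width - i - len(STAGES)) for i in range(n_instr)]


def freeze(rows: list[list[str]], col: int, n: int) -> list[list[str]]:
    """Freeze the whole pipeline for `n` cycles starting at column `col`.

    Instructions already in the pipe show '-' (held) for those cycles;
    instructions not yet fetched, or already finished, simply shift right.
    """
    out = []
    for row in rows:
        in_flight = any(row[:col]) and any(row[col:])
        hold = "-" if in_flight else ""
        out.append(row[:col] + [hold] * n + row[col:])
    return out


if __name__ == "__main__":
    for row in freeze(ideal_rows(3), col=3, n=2):
        print("".join(f"|{c:<3}" for c in row) + "|")
```

Every instruction that is already in the pipe is delayed by the same number of cycles, which is what distinguishes a stall from a slip.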
In a slip, pipeline stages that depend on the
condition being resolved are held, and the rest are allowed to continue. All
slips are detected in the RF stage. They are:
Load interlock
Multiply unit busy
Divide unit busy
Multiply/Divide single cycle slip
Variable shift or shift > 32 bits
FPU busy
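The two lists can be combined into one table. For each interlock, the table records whether it is a stall or a slip and where it is detected. In this Python sketch, the rule for which stages a slip holds is an assumption. It is based on the "usually the part already past the offending stage" wording. `is_held` is a hypothetical name.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# Interlock conditions from the two lists above: name -> (kind, detecting stage).
INTERLOCKS = {
    "Instruction TLB miss": ("stall", "IS"),
    "Instruction cache miss": ("stall", "RF"),
    "Coprocessor possible exception": ("stall", "DF"),
    "Integer sign extend": ("stall", "DF"),
    "Store interlock": ("stall", "DF"),
    "Data cache miss": ("stall", "TC"),
    "Watch address exception": ("stall", "TC"),
    "Load interlock": ("slip", "RF"),
    "Multiply unit busy": ("slip", "RF"),
    "Divide unit busy": ("slip", "RF"),
    "Multiply/Divide single cycle slip": ("slip", "RF"),
    "Variable shift or shift > 32 bits": ("slip", "RF"),
    "FPU busy": ("slip", "RF"),
}


def is_held(condition: str, stage: str) -> bool:
    """Whether `stage` is held while `condition` is being resolved.

    A stall freezes every stage. For a slip, this sketch assumes the
    detecting stage and everything before it are held, while stages
    already past it keep moving.
    """
    kind, detect = INTERLOCKS[condition]
    if kind == "stall":
        return True
    return STAGES.index(stage) <= STAGES.index(detect)


if __name__ == "__main__":
    print([s for s in STAGES if is_held("Load interlock", s)])  # ['IF', 'IS', 'RF']
```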
In the following pipeline trace, the ALU operation
depends on the load and is the third instruction after it, so two instructions
separate them. However, the load's data cache miss is not detected (in TC) until
the ALU operation has already executed on incorrect data (EX-). Thus the
pipeline stalls and must be backed up before it is restarted.
Cycle     |1  |2  |3  |4  |5  |6  |7  |8  |9  |10 |11 |12 |13 |14 |15 |16 |17 |
Run/Stall |R  |R  |R  |R  |R  |R  |R  |S  |S  |S  |S  |S  |R  |R  |R  |R  |R  |
Restart   |-  |-  |-  |-  |-  |-  |-  |-  |-  |-  |R  |R  |-  |-  |-  |-  |-  |
Load      |IF |IS |RF |EX |DF |DS |TC |-  |-  |DF |DS |TC |WB |   |   |   |   |
???       |   |IF |IS |RF |EX |DF |DS |-  |-  |-  |DF |DS |TC |WB |   |   |   |
???       |   |   |IF |IS |RF |EX |DF |-  |-  |-  |-  |DF |DS |TC |WB |   |   |
ALU op    |   |   |   |IF |IS |RF |EX-|-  |-  |-  |RF |EX+|DF |DS |TC |WB |   |
???       |   |   |   |   |IF |IS |RF |-  |-  |-  |-  |-  |EX |DF |DS |TC |WB |
Notice that the instructions ahead of the ALU
operation all restart with their DF stage after the miss is serviced. The ALU
operation, however, must repeat its RF stage at the same time as the missed load
completes its DS (the data is redirected). EX+ represents a recomputation of the
result with the corrected data. Note that, because the miss was just serviced,
the load's second TC should not detect any problems.
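The timing relationships described in this paragraph can be read off the trace mechanically. The Python sketch below transcribes two rows of the trace as an illustrative subset. `TRACE`, `RUN_STALL` and `cycle_of` are hypothetical names. The sketch checks two things. First, the load's miss is detected in the same cycle as the ALU operation's EX-. Second, the ALU operation's repeated RF coincides with the load's second DS.

```python
# Two rows of the first trace, transcribed; index 0 is cycle 1, "" is idle, "-" is held.
TRACE = {
    "Load":   ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "-", "-", "DF", "DS", "TC", "WB", "", "", "", ""],
    "ALU op": ["", "", "", "IF", "IS", "RF", "EX-", "-", "-", "-", "RF", "EX+", "DF", "DS", "TC", "WB", ""],
}
RUN_STALL = "RRRRRRRSSSSSRRRRR"  # R = run, S = stall, one character per cycle


def cycle_of(row: list[str], stage: str, occurrence: int = 1) -> int:
    """1-based cycle of the `occurrence`-th appearance of `stage` in `row`."""
    hits = [c + 1 for c, cell in enumerate(row) if cell == stage]
    return hits[occurrence - 1]


if __name__ == "__main__":
    # The miss (load TC) is detected in the cycle the ALU op executes on stale data.
    print(cycle_of(TRACE["Load"], "TC"), cycle_of(TRACE["ALU op"], "EX-"))  # 7 7
    # The repeated RF of the ALU op coincides with the load's second DS.
    print(cycle_of(TRACE["Load"], "DS", 2), cycle_of(TRACE["ALU op"], "RF", 2))  # 11 11
    print(RUN_STALL.count("S"))  # 5
```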
Instruction Abort After Interlock
Suppose that an ALU operation results in an overflow
and the next instruction has an I-cache miss. The miss is detected in that
instruction's RF stage, while the overflow is not detected until the ALU
operation reaches DF, so the overflow is detected only after the interlock for a
later instruction has been serviced. Thus, several instructions have entered
the pipe, stalled, and been restarted, but must now be cancelled because the
overflow requires an exception handler to execute.
Cycle     |1  |2  |3  |4  |5  |6  |7  |8  |9  |10 |11 |12 |13 |14 |15 |16 |
Run/Stall |R  |R  |R  |R  |S  |S  |S  |S  |S  |R  |R  |R  |R  |R  |   |R  |
Stall     |   |   |   |   |ICM|ICM|ICM|ICM|ICM|   |   |   |   |   |   |   |
Restart   |   |   |   |   |   |   |   |R  |R  |   |   |   |   |   |   |   |
ALU       |IF |IS |RF |EX |   |   |   |   |   |DF |DS |TC |WB |   |   |   |
          |   |   |   |   |   |   |   |   |   |OVF|   |   |   |   |   |   |
???       |   |IF |IS |RF |   |   |IF |IS |RF |EX |DF |DS |TC |WB |   |   |
          |   |   |   |ICM|   |   |   |   |   |   |   |   |   |   |   |   |
???       |   |   |IF |IS |   |   |   |IF |IS |RF |EX |DF |DS |TC |WB |   |
???       |   |   |   |IF |   |   |   |