MIPS R4400 Pipeline Case Study

The MIPS R4400 has an 8-stage pipeline with the following stages:

Instruction First     IF
Instruction Second    IS
Register File         RF
Execute               EX
Data First            DF
Data Second           DS
Data Tag Check        TC
Write Back            WB

In IF, the branch logic selects the instruction address and the I-cache fetch begins. The instruction TLB begins the virtual-to-physical address translation.

In IS, the fetch and translation complete.

In RF, the instruction decode occurs and the processor checks for interlocks. The instruction cache tag is checked against the page frame from the instruction TLB. Operands are fetched from the register file.

In EX, one of the following occurs: The ALU performs a register-to-register operation; the ALU calculates the data's virtual address for a load or store; the ALU determines whether a branch condition is true and calculates the virtual target address if the operation is a branch.

In DF, one of the following occurs: The data cache fetch and data TLB translation begin for a load or store; the branch target instruction address translation and TLB update begin for a branch; nothing happens for a register-to-register operation.

In DS, one of the following occurs: the data cache fetch and data TLB translation complete for a load or store, and the shifter aligns data to a word or double word boundary; the branch instruction address translation and TLB update complete for a branch; nothing happens for a register-to-register operation.

In TC, for a load/store the data cache performs a tag check -- the TLB physical address is checked against the cache tag to determine whether it hit. Nothing happens for a register-to-register operation.

In WB, the result of a register-to-register operation is written back to the register file. Branches do nothing during this stage.
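The stage order described above can be sketched as a small schedule generator. This is only an illustration, assuming one instruction enters the pipe per cycle and nothing stalls or slips; the function names `ideal_schedule` and `format_schedule` are hypothetical and not part of any MIPS tool.

```python
# Stage mnemonics in pipeline order, as listed in the text.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def ideal_schedule(num_instructions):
    """Return one row per instruction giving the stage occupied in each cycle.

    Assumption: one instruction issues per cycle and there are no interlocks.
    Cycles in which an instruction is not in the pipe are None.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [None] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is one cycle behind i - 1
        rows.append(row)
    return rows


def format_schedule(rows):
    """Render the schedule as fixed-width text, one line per instruction."""
    return "\n".join(
        " ".join((cell or "").ljust(2) for cell in row).rstrip()
        for row in rows
    )


print(format_schedule(ideal_schedule(3)))
```

Each printed row is shifted one cycle to the right of the row above it, which is the staircase pattern that the pipeline traces later in this case study start from.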

The following diagram shows what happens at each stage in the pipeline for load/store and branch operations.

IC1    Instruction cache access part 1
IC2    Instruction cache access part 2
ITLB1  Instruction address translation part 1
ITLB2  Instruction address translation part 2
ITC    Instruction cache tag check
IDEC   Instruction decode
RF     Register operand fetch
ALU    ALU operation (register-to-register)
DVA    Data virtual address calculation
DC1    Data cache access part 1
DC2    Data cache access part 2
LSA    Load/store alignment with shifter
JTLB1  Data/address translation part 1
JTLB2  Data/address translation part 2
DTC    Data cache tag check
IVA    Instruction virtual address calculation
WB     Write back to register file

Notice that the load/store unit and the branch unit each do their own virtual address calculation, but after that they both use the same TLB.


Branch Delay

The R4400 determines whether a conditional branch is taken in stage 4 (EX). Thus, the correct fetch address cannot be determined until three subsequent instructions have entered the pipe behind the branch. Normally these would have to be flushed, but it is also possible to tag them as independent of the branch and let them proceed normally, filling the branch delay slots with useful work. It is also possible for the processor to move instructions into the branch slots.
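The count of three instructions follows from the stage order alone. A minimal sketch, assuming one instruction is fetched per cycle; the helper name `instructions_fetched_behind` is hypothetical:

```python
# Stage mnemonics in pipeline order, as listed earlier in the text.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def instructions_fetched_behind(resolve_stage):
    """Count the instructions that enter the pipe behind an instruction
    by the time it reaches resolve_stage.

    Assumption: one new instruction enters IF every cycle.
    """
    return STAGES.index(resolve_stage) - STAGES.index("IF")


# A branch resolved in EX (stage 4) has three instructions behind it.
print(instructions_fetched_behind("EX"))
```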

Load Delay

A load completes in DS. Thus, the operand cannot be used until the instruction that follows the two delay slots behind the load reaches its EX stage. The result of the load in DS is automatically redirected to the RF stage of the instruction that uses the operand, so that it is available for that instruction's EX stage.
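The two delay slots can be derived from the stage order. The sketch below assumes, as the paragraph above states, that the load's DS stage lines up with the RF stage of the instruction that consumes the operand; the helper name `load_delay_slots` is hypothetical.

```python
# Stage mnemonics in pipeline order, as listed earlier in the text.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def load_delay_slots(complete_stage="DS", forward_to="RF"):
    """Number of instruction slots between a load and its earliest consumer.

    Assumption: the load result in complete_stage is redirected to the
    consumer while the consumer is in forward_to, one instruction per cycle.
    """
    # How many instructions behind the load the consumer must be issued.
    distance = STAGES.index(complete_stage) - STAGES.index(forward_to)
    # The slots strictly between the load and that consumer.
    return distance - 1


print(load_delay_slots())
```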

Pipeline Faults

The R4400 is designed around a hierarchy of potential fault conditions that are arranged as follows:

Stalls are interlocks that halt the entire pipeline while slips allow part of the pipeline (usually the part already past the offending stage) to proceed.

Each exception or interlock condition is detected in just one stage, permitting the system to identify the offending instruction uniquely. For example:

In IS, Instruction TLB misses are detected.

In RF, instruction cache misses are detected, as are load interlock, multiply busy, divide busy, mul/div slip, shift > 32 bits, FPU busy, and instruction translation exceptions.

In EX, exceptions such as interrupt, bus error, illegal instruction, breakpoint, system call, etc. are detected.

When an exception occurs, the offending instruction and all instructions following it in the pipe are cancelled. Any stalled instructions or other exceptions referencing this one are also cancelled. The following are the MIPS R4400 exceptions, with the stage in which they are detected indicated:

Instruction translation or address exception (RF)

External interrupt (EX)

Instruction bus error (EX)

Instruction virtual address coherent (EX)

Illegal instruction (EX)

Breakpoint (EX)

System call (EX)

Coprocessor unusable (EX)

Instruction error-correcting code error (EX)

Integer overflow (DS)

Floating point interrupt (DS)

Execute stage (programmed) traps (DS)

Data translation or address exception (TC)

Translation lookaside buffer modified (TC)

Data bus error (WB)

Memory reference address debugger comparison (WB)

Data virtual address coherent (WB)

Data error correcting code error (WB)

Non-maskable interrupt (WB)

Hardware reset (WB)
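The list above can be held in a lookup table, and the cancellation rule can be written as a split of the in-flight instructions. This is a sketch only: the stage assignments are transcribed from the list, while the names `EXCEPTION_STAGE`, `exceptions_detected_in`, and `cancel_on_exception` are hypothetical.

```python
# Detection stage for each exception, transcribed from the list above.
EXCEPTION_STAGE = {
    "Instruction translation or address exception": "RF",
    "External interrupt": "EX",
    "Instruction bus error": "EX",
    "Instruction virtual address coherent": "EX",
    "Illegal instruction": "EX",
    "Breakpoint": "EX",
    "System call": "EX",
    "Coprocessor unusable": "EX",
    "Instruction error-correcting code error": "EX",
    "Integer overflow": "DS",
    "Floating point interrupt": "DS",
    "Execute stage (programmed) traps": "DS",
    "Data translation or address exception": "TC",
    "Translation lookaside buffer modified": "TC",
    "Data bus error": "WB",
    "Memory reference address debugger comparison": "WB",
    "Data virtual address coherent": "WB",
    "Data error correcting code error": "WB",
    "Non-maskable interrupt": "WB",
    "Hardware reset": "WB",
}


def exceptions_detected_in(stage):
    """Return the exceptions detected in the given stage, in list order."""
    return [name for name, s in EXCEPTION_STAGE.items() if s == stage]


def cancel_on_exception(in_flight, offending):
    """Split in-flight instructions (oldest first) into (kept, cancelled).

    The offending instruction and everything behind it are cancelled;
    older instructions ahead of it are kept.
    """
    position = in_flight.index(offending)
    return in_flight[:position], in_flight[position:]


kept, cancelled = cancel_on_exception(["load", "add", "sub", "or"], "add")
print(kept, cancelled)
```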

For a stall, the entire pipeline is frozen until the interlock is resolved. A two-cycle restart sequence inserts corrected information into the pipe before it is released.

Stalls are caused by the following events and detected in the stage indicated:

Instruction TLB miss (IS)

Instruction cache miss (RF)

Coprocessor possible exception (DF)

Integer sign extend (DF)

Store interlock (DF)

Data cache miss (TC)

Watch address exception (TC)

In a slip, pipeline stages that depend on the condition being resolved are held, and the rest are allowed to continue. All slips are detected in the RF stage. They are:

Load interlock

Multiply unit busy

Divide unit busy

Multiply/Divide single cycle slip

Variable shift or shift > 32 bits

FPU busy
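The difference between a stall and a slip can be sketched as a function that returns the stages held by each interlock. The table is transcribed from the two lists above. The rule used for slips (hold every stage up to and including the detecting stage, release the rest) is an assumption based on the remark that the part of the pipeline already past the offending stage usually proceeds; the names `INTERLOCKS` and `held_stages` are hypothetical.

```python
# Stage mnemonics in pipeline order, as listed earlier in the text.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

# Interlock -> (kind, detecting stage), transcribed from the lists above.
INTERLOCKS = {
    "Instruction TLB miss": ("stall", "IS"),
    "Instruction cache miss": ("stall", "RF"),
    "Coprocessor possible exception": ("stall", "DF"),
    "Integer sign extend": ("stall", "DF"),
    "Store interlock": ("stall", "DF"),
    "Data cache miss": ("stall", "TC"),
    "Watch address exception": ("stall", "TC"),
    "Load interlock": ("slip", "RF"),
    "Multiply unit busy": ("slip", "RF"),
    "Divide unit busy": ("slip", "RF"),
    "Multiply/divide single cycle slip": ("slip", "RF"),
    "Variable shift or shift > 32 bits": ("slip", "RF"),
    "FPU busy": ("slip", "RF"),
}


def held_stages(interlock):
    """Return the pipeline stages that are held while the interlock is active."""
    kind, stage = INTERLOCKS[interlock]
    if kind == "stall":
        # A stall freezes the entire pipeline.
        return list(STAGES)
    # Assumed slip rule: hold the detecting stage and everything before it.
    return STAGES[: STAGES.index(stage) + 1]


print(held_stages("Data cache miss"))
print(held_stages("Load interlock"))
```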

 

In the following pipeline trace, the ALU operation depends on the Load and is scheduled two operations behind it. However, the cache miss is not detected until the ALU operation has already executed on incorrect data (EX-).  Thus the pipeline stalls and must be backed up before it is restarted.

 

Cycle      1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17
Run/Stall  R    R    R    R    R    R    R    S    S    S    S    S    R    R    R    R    R
Restart    –    –    –    –    –    –    –    –    –    –    R    R    –    –    –    –    –
Load       IF   IS   RF   EX   DF   DS   TC   –    –    DF   DS   TC   WB
???             IF   IS   RF   EX   DF   DS   –    –    –    DF   DS   TC   WB
???                  IF   IS   RF   EX   DF   –    –    –    –    DF   DS   TC   WB
ALU op                    IF   IS   RF   EX-  –    –    –    RF   EX+  DF   DS   TC   WB
???                            IF   IS   RF   –    –    –    –    –    EX   DF   DS   TC   WB

Notice that the instructions ahead of the ALU operation where the stall was detected all restart with their DF stage because of the miss. The ALU operation, however, must repeat its RF stage at the same time as the missed load completes its DS (the data is redirected). The EX+ represents a recomputation of the result with the corrected data. Note that, because the miss was just serviced, the TC for the load should not detect any problems.

Instruction Abort After Interlock

Suppose that an ALU operation results in an overflow and the next instruction has an I-cache miss. Because the miss is detected in RF and the overflow is detected later, at DF, the overflow is detected only after the interlock for the following instruction has been serviced. Thus, several instructions have entered the pipe, stalled, and been restarted, but must now be cancelled because the overflow calls for an exception handler to execute.

 

Cycle      1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
Run/Stall  R    R    R    R    S    S    S    S    S    R    R    R    R    R         R
Stall                          ICM  ICM  ICM  ICM  ICM
Restart                                       R    R
ALU        IF   IS   RF   EX                            DF   DS   TC   WB
                                                        OVF
???             IF   IS   RF             IF   IS   RF   EX   DF   DS   TC   WB
                          ICM
???                  IF   IS                  IF   IS   RF   EX   DF   DS   TC   WB
???                       IF