Here the goal is performance: we want to make the clock cycle as short as possible.

A step may reuse values computed in a previous cycle.

All operations within a step run in parallel.

Step 1: Instruction fetch

We have the following operations and control signals:

  • IorD: select the PC as the memory address
  • MemRead: read the memory at that address
  • IRWrite: write the instruction into the instruction register
  • ALUSrcA: send the PC to the ALU
  • ALUSrcB: send 4 to the ALU
  • ALUOp: instruct the ALU to perform an add
  • PCSource: send the ALU output to the PC
  • PCWrite: write to the PC
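
The signals above amount to two actions that both read the old PC, which is why they fit in one cycle. A minimal sketch, assuming a hypothetical `state` dictionary holding `pc` and `ir` and a `memory` dictionary keyed by byte address (these names and the instruction words are placeholders, not part of the notes):

```python
MASK = 0xFFFFFFFF  # assume 32-bit registers

def fetch(state, memory):
    """Step 1: load the instruction register and advance the PC in one cycle."""
    old_pc = state["pc"]  # both actions below read the PC as it was at the start of the cycle
    return {
        **state,
        # IorD selects the PC as address, MemRead reads memory, IRWrite latches the result
        "ir": memory[old_pc],
        # ALUSrcA = PC, ALUSrcB = 4, ALUOp = add, PCSource = ALU output, PCWrite
        "pc": (old_pc + 4) & MASK,
    }

# Placeholder instruction words at addresses 0 and 4
memory = {0: 0x00A00093, 4: 0x00B00113}
state = fetch({"pc": 0, "ir": None}, memory)
print(hex(state["ir"]), state["pc"])
```

Returning a new dictionary instead of mutating `state` mirrors the hardware behaviour: registers only take their new values at the end of the cycle.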

Step 2: Instruction Decode and Register Fetch

Since we don’t yet know which instruction will be executed, we can only do work that doesn’t depend on the instruction type:

  • Read rs1 and rs2
  • Compute the branch target address with the ALU

These are optimistic optimizations: they cause no harm if they turn out to be unnecessary, but may be helpful once we know the instruction.
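
As a rough illustration of this step, the sketch below reads both source registers and computes a candidate branch target without knowing whether the instruction is a branch. The function name, the 32-entry register list, and the already-extracted `branch_offset` are assumptions made for the example; how the offset is encoded and which PC value it is relative to depend on the ISA.

```python
MASK = 0xFFFFFFFF  # assume 32-bit registers

def decode(pc, regs, rs1, rs2, branch_offset):
    """Step 2: do the work that is safe for every instruction type."""
    a = regs[rs1]                            # value of rs1, kept for step 3
    b = regs[rs2]                            # value of rs2, kept for step 3
    alu_out = (pc + branch_offset) & MASK    # speculative branch target
    return a, b, alu_out

regs = [0] * 32
regs[5], regs[6] = 7, 9
print(decode(8, regs, 5, 6, -4))  # (7, 9, 4)
```

If the instruction turns out not to be a branch, `alu_out` is simply overwritten in the next step, so the extra work costs nothing.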

Step 3: Execution, Mem Addr Computation, Branch Completion

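The ALU operates on the operands prepared in step 2. What it does depends on the instruction type:

  • Memory load/store: compute the memory address
  • Arithmetic instruction (R-type): perform the operation on the two register values
  • Branch: compare the two register values to decide whether the branch is taken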
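
The three cases can be sketched as a single dispatch on the instruction type. This is only an illustration: the `kind` strings, the argument names, and the choice of a branch-if-equal comparison are assumptions, not something the notes specify.

```python
import operator

MASK = 0xFFFFFFFF  # assume 32-bit registers

def execute(kind, a, b, imm, pc, alu_out, alu_op=None):
    """Step 3: return (new_alu_out, new_pc) for one instruction.

    a, b    : register values read in step 2
    imm     : immediate offset (used by loads/stores)
    alu_out : branch target computed speculatively in step 2
    """
    if kind == "mem":
        # Load/store: the ALU computes the memory address
        return (a + imm) & MASK, pc
    if kind == "rtype":
        # R-type: the ALU applies the requested operation to both registers
        return alu_op(a, b) & MASK, pc
    if kind == "branch":
        # Branch (equality assumed): if taken, the target from step 2 becomes the PC
        return alu_out, (alu_out if a == b else pc)
    raise ValueError(f"unknown instruction kind: {kind}")

print(execute("mem", 100, 0, 8, 4, 0))                  # (108, 4)
print(execute("rtype", 3, 4, 0, 4, 0, operator.add))    # (7, 4)
print(execute("branch", 5, 5, 0, 4, 64))                # (64, 64)
```

Note that only the branch case touches the PC in this step, which is why a branch needs no further cycles.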

Step 4: Memory Access or R-type Instruction Completion

A load/store instruction accesses memory, or an arithmetic instruction writes its result to the register file.

Step 5: Memory Read Completion

For loads only: the data read from memory in the previous step is written to the register file.

Defining the Control FSM

The FSM starts with two states that are common to all instruction types: instruction fetch (step 1) and instruction decode and register fetch (step 2).

After that, we switch to another FSM, depending on the opcode read by the decoding step.

Memory Reference FSM

We either read data from memory or write data to it:

  • If we are writing, we just need to write the register value read in step 2 into memory
  • If we are reading, we need
    • To load the word from memory
    • And then write it to a register

Therefore, a store needs 4 cycles to complete the FSM, and a load needs 5.

R-Type (Arithmetic Operation)

Here, we just have to write the ALU’s result to the register file.

Therefore this FSM requires only 4 cycles.

Branch

If the instruction is a branch and its condition holds, we can jump to the target address computed in step 2. This lets a branch complete in only 3 cycles.
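
The cycle counts for the four instruction types follow directly from the state sequences. A small sketch, where the state names are invented labels for the steps described in these notes:

```python
# States shared by every instruction (steps 1 and 2)
COMMON = ["fetch", "decode"]

# States that follow, chosen by the opcode seen during decode
TAIL = {
    "load":   ["mem_addr", "mem_read", "write_back"],
    "store":  ["mem_addr", "mem_write"],
    "rtype":  ["execute", "rtype_completion"],
    "branch": ["branch_completion"],
}

def states_for(kind):
    """Full sequence of FSM states visited by one instruction."""
    return COMMON + TAIL[kind]

def cycles(kind):
    """One clock cycle per state."""
    return len(states_for(kind))

for kind in TAIL:
    print(kind, cycles(kind))
# load 5
# store 4
# rtype 4
# branch 3
```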