Here the goal is performance. We want to optimize how short a clock cycle is
We might reuse values from a previous cycle.
All operations run in parallel.
Step 1: Instruction fetch
We have the following operations and control signals:
- IorD → Select PC as the source address
- MemRead → Read a memory address
- IRWrite → Write the instruction into the instruction register
- ALUSrcA → Send the PC to ALU
- ALUSrcB → Send 4 to ALU
- ALUOp → Instruct ALU to perform add
- PCSource → Send ALU output to PC
- PCWrite → Write to the PC
Step 2: Instruction Decode and Register
Since we don’t yet know the operation that will be run, we have to do things that don’t need any information:
- Read
rs1andrs2 - Compute the branch operation with the ALU
These are optimistic optimizations, because they cause no harm if useless, but may be helpful once we know the instruction
Step 3: Execution, Mem Addr Computation, Branch Completion
The ALU operates on the operands that were prepared in step 2, which can either be:
- Memory load/store
- Arithmetic instruction (R-type)
- Branch
Step 4: Memory Access or R-type Instruction Completion
A load/store instruction accesses memory, or an arithmetic instruction writes its result to the register file
Step 5: Memory Read Completion
Uses the memory we read at the last step and writes it to the register file
Defining the Control FSM
The FSM has two steps which are common between all the different types we can end up having:
After that, we switch to another FSM, depending on the operand read by the decoding step
Memory Reference FSM
We either read/write data to memory
- If we are writing, we need to just write the register computed at step 2 into memory
- If we are reading, we need
- To load the word into memory
- And then write it to a register
Therefore we see that to load/store from memory, we need 4/5 cycles to complete the FSM.
R-Type (Arithmetic Operation)
Here, we just have to write the ALU’s result to the register file.
Therefore this FSM requires only 4 cycles.
Branch
If the operation is to branch, then we can jump to the value we got in step 2. This makes the branch operation work in only 3 cycles.