Saturday, 7 March 2020

Optimizing the fetch decode execute cycle II

In the last article we identified a couple of opportunities to decrease the number of cycles needed to execute the branch, pop and load immediate instructions. The key issue here was that we did not start setting the mem_raddr register until the DECODE cycle, so the bytes we needed could not be read until after that.

Because we already know the opcode of any instruction in the FETCH3 cycle, we can set the mem_raddr register at that point. For a pop instruction we load it with the contents of the stack pointer. For instructions that are followed by additional bytes, like the two-byte offset for the branch instruction and the four bytes of the load immediate instruction, we simply keep incrementing mem_raddr. And if we set the mem_raddr register two cycles earlier, we can read those bytes two cycles earlier as well.
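To make that choice concrete, here is a minimal Python sketch of the decision taken in FETCH3. It is a model of the idea, not the actual hardware description: the opcode names, the function next_mem_raddr and the behaviour for all other instructions (leave mem_raddr alone) are assumptions made for illustration.

```python
# Hypothetical opcode names; the real encoding is not given in this article.
POP, BRANCH, LOAD_IMMEDIATE, OTHER = "pop", "branch", "loadimm", "other"

# Instructions that are followed by extra bytes in memory (per the text above).
HAS_TRAILING_BYTES = {BRANCH, LOAD_IMMEDIATE}


def next_mem_raddr(opcode, mem_raddr, stackpointer):
    """Pick the read address to set in FETCH3, once the opcode is known."""
    if opcode == POP:
        # Start reading the value to pop straight away.
        return stackpointer
    if opcode in HAS_TRAILING_BYTES:
        # Keep walking through the bytes that follow the instruction.
        return mem_raddr + 1
    # Assumption: other instructions have nothing to prefetch.
    return mem_raddr
```

For example, next_mem_raddr(POP, 0x102, 0x8000) yields the stack pointer value 0x8000, while the branch and load immediate cases yield 0x103.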

This newly implemented scenario is summed up in the table below.


The highlighted areas show where the changes are. From the second column we can see that we are setting or updating the mem_raddr register for every cycle from FETCH1 to EXEC3, and reading a byte in every cycle from FETCH3 to EXEC5.
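The two-cycle gap between setting an address and reading the byte can be illustrated with a toy model. This is only a sketch in Python: the DelayedMemory class and the run helper are hypothetical, and the latency of two cycles is an assumption taken from the offset between the FETCH1 to EXEC3 and FETCH3 to EXEC5 ranges mentioned above.

```python
class DelayedMemory:
    """Toy memory: a byte becomes readable `latency` cycles after its address is set."""

    def __init__(self, contents, latency=2):
        self.contents = contents
        # Addresses in flight; None means no address was set in that cycle.
        self.in_flight = [None] * latency

    def tick(self, mem_raddr=None):
        """Advance one cycle and return the byte that becomes available, if any."""
        self.in_flight.append(mem_raddr)
        ready = self.in_flight.pop(0)
        return None if ready is None else self.contents[ready]


def run(contents, addresses, latency=2):
    """Set one address per cycle, then idle until all pending reads have completed."""
    mem = DelayedMemory(contents, latency)
    schedule = list(addresses) + [None] * latency
    return [mem.tick(a) for a in schedule]
```

Calling run([10, 20, 30, 40], [0, 1, 2, 3]) returns [None, None, 10, 20, 30, 40]: once the first address has been set, a byte arrives in every cycle, just two cycles behind the address that requested it. Setting the addresses earlier therefore moves the whole stream of bytes earlier by the same amount.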

This means that the load immediate and pop instructions are done in EXEC3, and the branch instruction even one cycle earlier. It is not two cycles earlier because, although we read two bytes fewer, we also need to add the offset to the program counter, and that takes a cycle.

Some more opportunities


There are still a few opportunities left for optimizing the mover, load byte and push instructions, and I'll probably discuss those in a future article.

CPU design

The CPU design as currently implemented largely follows the diagram shown below. It features a 16 x 32bit register file and 16 bit instructi...