Because we know the opcode for any instruction already in the FETCH3 cycle we can set the mem_raddr register with the contents of the stackpointer if we are dealing with a pop instruction or keep on incrementing the mem_raddr for those instructions that are followed by some bytes after the instructions itself, like the two byte offset fir the branch instruction and the four bytes of the load immediate instruction. And if we set the mem_raddr register two cycles earlier that means that we can actually read those bytes two cycles earlier as well.
This newly implemented scenario is summed up in the table below (click to enlarge)
The highlighted areas show where the changes are. From the second column we can see that we are setting or updating the mem_raddr register for every cycle from FETCH1 to EXEC3, and reading a byte in every cycle from FETCH3 to EXEC5.
This means that for the load immediate and pop instructions we're done in EXEC3 and for the branch instruction even one cycle earlier (not two cycles because although we read two bytes less, we also need to add the offset to the program counter and that takes a cycle).
Some more opportunities
There are still a few opportunities left for optimization for the mover, load byte and the push instruction and i'll probably discuss that in a future article.