ip1
is r[15]+1
):
case(state)
FETCH1 : begin
r[0] <= 0;
r[1] <= 1;
r[13][31] <= 1; // force the always on bit
mem_raddr <= ip;
state <= halted ? FETCH1 : FETCH2;
end
FETCH2 : state <= FETCH3;
FETCH3 : begin
instruction[15:8] <= mem_data_out;
r[15] <= ip1;
state <= FETCH4;
end
FETCH4 : begin
mem_raddr <= ip;
state <= FETCH5;
end
FETCH5 : state <= FETCH6;
FETCH6 : begin
instruction[7:0] <= mem_data_out;
r[15] <= ip1;
...
So between every assignment to the mem_raddr
register (in state FETCH1 and FETCH4) and the retrieval of the byte from the mem_data_out
register (in state FETCH3 and FETCH6) we had a wait cycle.
Now it is true that for the ice40 BRAM there needs to be two clock cycles between setting the address and reading the byte, but we can already set the new address in the next cycle, allowing us to read a byte every clock cycle once we set the initial address.
This alternative approach looks like this:
case(state)
FETCH1 : begin
r[0] <= 0;
r[1] <= 1;
r[13][31] <= 1; // force the always on bit
mem_raddr <= ip;
state <= halted ? FETCH1 : FETCH2;
end
FETCH2 : begin
state <= FETCH3;
r[15] <= ip1;
mem_raddr <= ip1;
end
FETCH3 : begin
instruction[15:8] <= mem_data_out;
state <= FETCH4;
end
FETCH4 : begin
instruction[7:0] <= mem_data_out;
r[15] <= ip1;
...
So we set the address in states FETCH1 and FETCH2 and read the corresponding bytes in states FETCH3 and FETCH4 respectively, saving us 2 clock cycles for every instruction. Since the most used instructions took 8 cycles and now 6, this is a reduction of 25%. Not bad I think.
And although not very well documented (or documented at all actually) this setup works for SPRAMS as well.