Sunday, 15 March 2020
Count leading zeros in Verilog
This post documents (with code) the implementation of a new module to count leading zeros.
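The post has the actual code; purely as a minimal sketch of the idea, a count-leading-zeros module can be written as a casez priority encoder (the module name, the 8-bit width and the output encoding below are illustrative, not necessarily what the post uses):

module clz8(input [7:0] value, output reg [3:0] count);
    always @* begin
        casez (value)
            8'b1???????: count = 4'd0;
            8'b01??????: count = 4'd1;
            8'b001?????: count = 4'd2;
            8'b0001????: count = 4'd3;
            8'b00001???: count = 4'd4;
            8'b000001??: count = 4'd5;
            8'b0000001?: count = 4'd6;
            8'b00000001: count = 4'd7;
            default:     count = 4'd8; // value is all zeros
        endcase
    end
endmodule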
Saturday, 14 March 2020
Verilog test benches
for(i=7; i>=0 ; i--) begin
...
end
The simple solution of course was to replace i-- with i=i-1, but it is still not completely clear to me what the exact differences are between the Verilog versions supported by Yosys and Icarus.
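For reference, a Verilog-2001 friendly version of the loop looks like this; i-- is a SystemVerilog operator, which is presumably why the tools rejected it (the surrounding module and the $display body here are just placeholders):

module loop_example;
    integer i;
    initial begin
        // explicit decrement instead of the SystemVerilog i-- operator
        for (i = 7; i >= 0; i = i - 1) begin
            $display("i = %0d", i);   // placeholder body
        end
    end
endmodule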
clz_tb: ../clz.v clz_tb.v
	$(VERILOG) -o $@ $^ ; vvp $@ | awk "/FATAL/{exit(1)}"
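The awk filter only does something useful if the testbench actually prints a line containing FATAL when a check fails. A minimal sketch of that pattern (module and signal names are illustrative, not the actual clz_tb.v):

module clz_tb;
    reg  [7:0] value;
    wire [3:0] count;

    clz8 dut(.value(value), .count(count));   // clz8 as sketched above

    initial begin
        value = 8'b0001_0000; #1;
        if (count !== 4'd3)
            $display("FATAL: clz of %b should be 3, got %0d", value, count);
        $finish;
    end
endmodule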
Doubts about Verilog
I can't tell at this point if VHDL or other hardware description languages are any better, but the longer I work with Verilog the more doubts I have: it does not clearly separate simulation from synthesis, its syntax (especially the scoping rules for variables) is illogical, you can't define functions with more than one statement (at least not in Verilog-2001), and every implementation is allowed to diverge from the standard by choosing which features to implement. I am not sure why people in the hardware world accept this; I couldn't imagine this happening to Python implementations, for example.
Anyway, it sort of works, so we'll see where it gets us; maybe with a bit more experience it will be less awkward to work with.
Saturday, 7 March 2020
Optimizing the fetch decode execute cycle II
Because we know the opcode for any instruction already in the FETCH3 cycle, we can set the mem_raddr register with the contents of the stackpointer if we are dealing with a pop instruction, or keep on incrementing mem_raddr for those instructions that are followed by some bytes after the instruction itself, like the two byte offset for the branch instruction and the four bytes of the load immediate instruction. And if we set the mem_raddr register two cycles earlier, that means we can actually read those bytes two cycles earlier as well.
This newly implemented scenario is summed up in the table below.
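As a rough sketch of what this looks like in the state machine (state names, the opcode encoding and the signal widths here are guesses for illustration, not the actual implementation): in fetch3 the high byte of the instruction, and therefore the opcode, is already available, so mem_raddr can be pointed at whatever will be needed next.

module fetch_sketch(
    input             clk,
    input      [7:0]  mem_rdata,
    output reg [31:0] mem_raddr
);
    localparam FETCH3 = 3'd2, FETCH4 = 3'd3;   // illustrative state encoding
    localparam OP_POP = 4'h9;                  // placeholder opcode value

    reg [2:0]  state = FETCH3;
    reg [15:0] instruction;
    reg [31:0] r [0:15];                       // register file; r[14] is the stackpointer (writes omitted here)

    always @(posedge clk) begin
        case (state)
            FETCH3: begin
                instruction[15:8] <= mem_rdata;    // high byte holds the opcode
                if (mem_rdata[7:4] == OP_POP)
                    mem_raddr <= r[14];            // pop: read from the stack
                else
                    mem_raddr <= mem_raddr + 1;    // keep reading the bytes after the instruction
                state <= FETCH4;
            end
            FETCH4: state <= FETCH3;               // rest of the machine omitted
        endcase
    end
endmodule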
Some more opportunities
Thursday, 5 March 2020
Optimizing the fetch decode execute cycle
By closely looking at the timing diagrams for memory access we could reduce the number of cycles in the fetch part significantly. Meanwhile I implemented some additional optimizations, and currently the MOVE and LOADL instructions clock in (pun intended) at 4 and 9 cycles respectively, a speedup of about 2x compared to the initial implementation.
The diagram below illustrates the different activities that take place in the various states:
The important bit to understand here is that we do not read anything from memory in the decode and exec1 states. For some instructions this is inevitable, because only after reading the second byte of an instruction (in fetch4) and adding the two source registers (available in decode, because adding those two registers takes a clock cycle) can we load the mem_raddr register and start loading two cycles later.
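The two cycle gap comes from the memory interface: assuming a synchronous memory where both the read address and the read data are registered (an assumption on my part, the actual memory may differ), a byte only appears on mem_rdata two clock edges after mem_raddr is set, so the address has to be valid two states before the state that consumes the byte. A minimal sketch of such a memory:

module mem_sketch(
    input             clk,
    input      [31:0] mem_raddr,
    output reg [7:0]  mem_rdata
);
    reg [7:0]  ram [0:65535];          // 64 KB is just a convenient size for the sketch
    reg [31:0] addr_q;

    always @(posedge clk) begin
        addr_q    <= mem_raddr;            // edge 1: the address is registered
        mem_rdata <= ram[addr_q[15:0]];    // edge 2: the data becomes available
    end
endmodule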
However, for instructions like LOADIL (load immediate long word) and SETBRA, the data and offset respectively are located just after the actual instruction, so we could keep on incrementing mem_raddr in the fetch3 and fetch4 states so that the first two bytes would be available in the decode and exec1 states, as indicated by the highlighted 'gaps' in the table.
Even for the POP instruction we know what the address should be, because we can refer to register 14 (the stackpointer). The only thing we have to keep in mind is that we need to decide whether to keep on incrementing the mem_raddr register or to load it with the address in the stackpointer. We can make this decision in the fetch3 state already, because that is where we read the high byte of the instruction, which contains the instruction's opcode.
So next on my agenda is to see whether we can indeed implement this idea. It would potentially shave off another 2 cycles from the LOADIL, SETBRA and POP instructions, so it is certainly worth the effort.
CPU design
The CPU design as currently implemented largely follows the diagram shown below. It features a 16 x 32-bit register file and 16-bit instructi...