Thursday, 16 January 2020

Additional instructions

The Robin cpu/soc is coming along nicely but when i started playing around with implementing a compiler it became quickly clear that code generation was hindered by not having relative branch instructions that could reach destinations beyond -128 or +127 bytes (a 8-bit signed integer).

Long branch

So I expanded the instruction set to take a full 32-bit signed offset. If the 8bit offset is zero, the next 4 bytes will be used as a the offset. The complete instruction now looks like this:

[15:12] opcode (13)
[11: 8] condition
[ 7: 0] offset
Optional: 4 bytes offset (if offset == 0)

The condition is used to check against the flags register. The highest bit of the condition determines if a flag should be set or unset and because bit 31 of the flags register is always 1 we even have an option for an unconditional branch (or even to never take the branch, which is rather useless)

if cond[2:0] & R13[31:29] == cond[3]  then PC += offset  ? offset : (PC)

Bit 30 and 29 of the flags register are the negative (sign) and zero bit respectively.

Stack instructions

I also added pop and push instructions to reduce code size, even though it is a bit at odds with the RISC philosophy. These always use R14 as the stack pointer and the opcode looks like this:

[15:12] opcode (15)
[11: 8] register
[ 7: 0] 1 = pop, 2 = push

Verilog observations

I have a few other instructions I wish to implement, for example to sign extend a byte to a long, but already i am using almost all available LUTs on the iCEbreaker.
There are a few options though: until now i have been using next-pnr's heap placer which is quite fast (just a few seconds on my machine). The sa placer however is much slower (more than 60 seconds) but also generates a result that saves me about 250 LUTs!
The second option is to play around with the numerical values of the state labels. This may sound weird but the current implementation of the cpu has 29 states, i.e. a 5 bit state register. If i number them consecutively from 0 - 28 yosys uses more LUTs than when I assign the last state the number 31. Apparently the huge multiplexer generated for this state machine benefits from gaps in the list of possible states.
In the end I intend to simplify and optimise this design but for now I stick with the sa placer.

CPU design

The CPU design as currently implemented largely follows the diagram shown below. It features a 16 x 32bit register file and 16 bit instructi...