Saturday 4 January 2020

More memory: spram

The iCE40 up5k that is used on the iCEBreaker board provides another type of memory besides the ubiquitous block ram: single port ram (spram).
No less than 128 Kbytes are provided and although it is a little bit unclear to me at the moment how fast they are, theY seem to function quite well with two clock cycle delay, so I can integrate them with my current design without a any changes to the CPU.

Implementation

The 128 Kbytes are provided as four blocks, each 16k x 16bits. As far as I know Yosys does not yet offer automatic inference, which means we have to use the iCE40 primitives directly. This may sound complicated but it is not as hard as it sounds.
The blocks take a 14 bit address (i.e. can address 16K words) and will read or write 16 bits at the time. Because we are interested in 8 bit bytes rather than words we need to make sure we return or write either the upper half or the lower half of a word depending on the address. For reading this means simply selecting, for writing this means setting a writemask that will limit which bits of a 16 bit word are actually written on receiving a write enable signal. Such a write mask itself is not 16 bit wide but just 4: 1 bit for each 4 bit nibble. We make this selection based on bit 14.


Code

The code below (GitHub) shows the implementation details. We use all four SB_SPRAM256KA blocks available on the up5k and use the top two bits of the 17 bit address to select a block. Bit 14 is then used to calculate the write mask (called nibble mask here). The same nibble mask is also used to select either the high or low byte from any 16 bit word we read from any of the four blocks. Note that our module's input data (wdata) is a byte but we always write 16 bit words. To this end we simply double the incoming byte; whether we actually write to high or low byte is determined by the write mask we construct and pass to the .MASKWREN input of the blocks.

// byte addressable spram
// uses all 128MB

module spram (
 input clk,
 input wen,
 input [16:0] addr,
 input [7:0] wdata,
 output [7:0] rdata
);

wire cs_0 = addr[16:15] == 0;
wire cs_1 = addr[16:15] == 1;
wire cs_2 = addr[16:15] == 2;
wire cs_3 = addr[16:15] == 3;

wire nibble_mask_hi = addr[14];
wire nibble_mask_lo = !addr[14];

wire [15:0] wdata16 = {wdata, wdata};

wire [15:0] rdata_0,rdata_1,rdata_2,rdata_3;
wire [7:0] rdata_0b = nibble_mask_hi ? rdata_0[15:8] : rdata_0[7:0];
wire [7:0] rdata_1b = nibble_mask_hi ? rdata_1[15:8] : rdata_1[7:0];
wire [7:0] rdata_2b = nibble_mask_hi ? rdata_2[15:8] : rdata_2[7:0];
wire [7:0] rdata_3b = nibble_mask_hi ? rdata_3[15:8] : rdata_3[7:0];

assign rdata = cs_0 ? rdata_0b : cs_1 ? rdata_1b : cs_2 ? rdata_2b : rdata_3b;

SB_SPRAM256KA ram0
  (
    .ADDRESS(addr[13:0]),
    .DATAIN(wdata16),
    .MASKWREN({nibble_mask_hi, nibble_mask_hi, nibble_mask_lo, nibble_mask_lo}),
    .WREN(wen),
    .CHIPSELECT(cs_0),
    .CLOCK(clk),
    .STANDBY(1'b0),
    .SLEEP(1'b0),
    .POWEROFF(1'b1),
    .DATAOUT(rdata_0)
  );

SB_SPRAM256KA ram1
  (
    .ADDRESS(addr[13:0]),
    .DATAIN(wdata16),
    .MASKWREN({nibble_mask_hi, nibble_mask_hi, nibble_mask_lo, nibble_mask_lo}),
    .WREN(wen),
    .CHIPSELECT(cs_1),
    .CLOCK(clk),
    .STANDBY(1'b0),
    .SLEEP(1'b0),
    .POWEROFF(1'b1),
    .DATAOUT(rdata_1)
  );

SB_SPRAM256KA ram2
  (
    .ADDRESS(addr[13:0]),
    .DATAIN(wdata16),
    .MASKWREN({nibble_mask_hi, nibble_mask_hi, nibble_mask_lo, nibble_mask_lo}),
    .WREN(wen),
    .CHIPSELECT(cs_2),
    .CLOCK(clk),
    .STANDBY(1'b0),
    .SLEEP(1'b0),
    .POWEROFF(1'b1),
    .DATAOUT(rdata_2)
  );

SB_SPRAM256KA ram3
  (
    .ADDRESS(addr[13:0]),
    .DATAIN(wdata16),
    .MASKWREN({nibble_mask_hi, nibble_mask_hi, nibble_mask_lo, nibble_mask_lo}),
    .WREN(wen),
    .CHIPSELECT(cs_3),
    .CLOCK(clk),
    .STANDBY(1'b0),
    .SLEEP(1'b0),
    .POWEROFF(1'b1),
    .DATAOUT(rdata_3)
  );

endmodule

Notes

Because of the values passed to the standby sleep and poweroff inputs we effectively keep the everything running full blast and presumably consuming quite a lot of power (relatively speaking). Since i have no idea at the moment hiw long it would take to resume from standby, i leave it at that for now.

CPU design

The CPU design as currently implemented largely follows the diagram shown below. It features a 16 x 32bit register file and 16 bit instructi...