Small FPGA things: contineous integration

Showing posts with label contineous integration. Show all posts

Sunday, 23 February 2020

The Robin SoC on the iCEbreaker: current status

The main focus in the last couple of weeks has been on the simplification of the CPU and ALU.

Simplification

The main decoding loop in the CPU was rather convoluted so both were redesigned a bit to improve readability of the Verilog code as well as reduce resource consumption. (The ALU code was updated in place, the CPU code got a new file)

Because by now I also have some experience with the code that is being generated by the compiler, I was able to remove unused instructions and ALU operations. Previously the pop, push and setxxx instructions were considered sub-instructions within one special opcode, now they are individual instructions (in case of pop and push) or rolled into a single set-and-branch instruction. The new instruction set architecture was highlighted in a separate article.

Less resources

All in all this redesign shrunk the number of LUTs consumed from 5279 (yes, just one LUT removed from 100%) to 4879 (92%), which is pretty neat because it leaves some room for additional functionality or tweaks. The biggest challenge by the way is Yosys: even slight changes in the design, like assigning different values to labels of a case statement that is not full, may result in a different number of LUTs consumed. This is something that needs some more research, maybe Yosys offers additional optimization options that let me get the lowest resource count in a more predictable manner.

Better testing

A significant amount of effort was spent on designing more and better regression tests. Both for the SoC and the support programs (assembler, simulator, ...) regression tests and syntax checkers were added. Most of these were also added to GitHub push actions, with the exception of the actual hardware tests because I cannot run those on GitHub. And of course this mainly done to show a few green banners on the repository home page 😀

Bug fixes

With a better testing framework in place it is far easier to check whether changes don't inadvertently break something. This was put to work in fixing one of the more annoying bugs left in the ALU design: previously shift left by more than 31 and shift right by 0 did not give a proper result. This is now fixed.

Frustrations

The up5k on the iCEbreaker board has 8 dsp cores. We currently use 4 of them to implement 32x32 bit multiplication. The SB_MAC16 primitives we use for this are inferred by Yosys from some multiplication statements we use in the ALU (i.e. we do not instantiate them directly) and work fine.
However, when I want to instantiate some of them directly and configure them to be used as 32 bit adders these instantiations will still multiply instead of add! No matter what I do, the result stays teh same. I have to admit I have no idea how Yosys infers stuff so it might very well be that my direct instantiation gets rewritten by some Yosys stage, so I will have to do some more research here.

What next?

I think next on the agenda is performance: I think I use too many read states for the fetch/decode/execute cycle. The Lattice technical documentation seems to imply we can read and write new data every clock cycle, at least for block ram. Unfortunately the docs for the SPRAM are less clear. Anyway, this area for sure needs some attention.

Sunday, 16 February 2020

Regression tests

In a previous post I mentioned the importance of creating a proper framework for testing.

In order to properly debug more complex programs like some functions in the libc library, I created a simulator. This simulator allows you to set breakpoints and single step through a program but also contains features that come in handy for regression tests, like dumping the contents of registers and memory locations after running a program. This information can be checked against known values in a test.

The simulator is great for more complex programs and to verify that elements of the toolchain like the assembler keep working as designed and as such these tests can even be automated as a GitHub push triggered action. These tests do not test the actual hardware though.

Our monitor program is scriptable however, so I designed a couple of tests that execute all instructions and alu operations and verify the results against known values. These tests still don't cover all edge cases but are sufficient to verify proper performance once I start refactoring the cpu and alu. Also, the hardware tests are mirrored in a test for the simulator (in the makefile) , so can be used to test its behavior as well.

Refactoring the cpu and alu

There are many things that can be improved in the design, especially when considering resource usage, instruction set design and performance.

I already started rewriting the very complex state machine into something simpler but without changing the instruction set (apart from removing the obsolete loadw and storw instructions). This already saves more than a hundred LUTs but I want to do more.

Things I have in mind:

Removing carry related functionality
Moving pop/push from 'special' sub-opcodes to their own instruction opcodes
Merging the setxxx and branch instructions into a combined instruction

Now that we have a somewhat decent test framework in place these changes can be more easily tested against regressions. The results of these refactorings will be reported on in future articles.

Saturday, 8 February 2020

A simulator for the Robin cpu

When I started playing around with a SoC design that I wanted to implement on the iCEbreaker, I quickly realized that without proper testing tools even a moderately ambiteous design would quickly become too complex to change and improve.

There exist of course tools to simulate verilog designs and even perform formal verification but my skill level is not quite up to that yet. On top of that I am convinced that many changes that I want to try out in the cpu would benefit from regression tests that are based on real code, i.e. code generated by a compiler instead of artificial tiny bits of code: code that you do not directly implement yourself tends to expose issues in the instruction set or bugs in it hardware implementation quicker than when you deliberately try to construct tiny test cases.

For these more realistic tests an assembler and a C compiler were created and they were used to implement small string and floating point libraries mimicking some of the functions in the C standard library. And they proved their worth as they uncovered among other things bugs in the handling of conditional branches for example.

However, as we will use the assembler and compiler to perform regression tests on the cpu it is important that these tools themselves are as bug free as possible, even when we add new functionality or change implementation details. Ideally some contineous integration would be implemented using GitHub actions that would be triggered on every push.

There is one catch here though: we cannot perform the final test in our chain of dependencies simply because the GitHub machines do not have an iCEbreaker board attached ☺️

We can deal with this challenge by creating a program the will simulate the cpu we have implemented on our fpga. This way we should be able to perform the tests for the compiler/assembler toolchain against this simulator with the added benefit of having more debugging options available (because they are much easier to implement in a bit of Python that in our resource constrained hardware.

The first version of this simulator is now commited and i hope to create some contineous integration actions in the near future.