The first actual implementation of a UART plus a FIFO is shown below. Because there is no flow control whatsoever on the UART's RX/TX lines, I decided not to implement a FIFO on the transmit side. I think it would be useless: either we can send (and the bytes will be buffered on the receiving end), or the transfer fails without anything to indicate that something went wrong. An outbound FIFO would therefore make no difference and would only consume resources.
Implementation notes
In the implementation we have two processes (always blocks in the Verilog code). The top one reacts to the received signal and transfers the incoming byte to the FIFO. It does not assert the fifo_in_write signal immediately, because the data has to be stable first; instead it asserts it on the next clock cycle (indicated by the red dashed signal). The bottom process checks whether the FIFO is not empty and whether the UART is not already transmitting. If both conditions are met, it transfers a byte from the FIFO to the UART by asserting the fifo_read signal.
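To make the timing of the two processes concrete, here is a minimal cycle-level model written in Python rather than Verilog. It is a sketch under assumptions, not the author's code: the class name RxFifoTxLoop, the FIFO depth, the fixed tx_cycles transmit time and the drop-on-full behavior are hypothetical, and the fifo_read strobe and the hand-off to the transmitter are collapsed into a single step. The model samples all state before updating it, mimicking Verilog nonblocking assignments, so a received byte only reaches the FIFO on the clock edge after it was latched.

```python
from collections import deque


class RxFifoTxLoop:
    """Hypothetical cycle-level model of the two always blocks (not the author's Verilog).

    Process 1: 'received' pulse -> latch the byte, assert fifo_in_write one cycle later.
    Process 2: FIFO not empty and transmitter idle -> assert fifo_read, start sending.
    """

    def __init__(self, depth=16, tx_cycles=3):
        self.depth = depth          # assumed FIFO depth
        self.tx_cycles = tx_cycles  # assumed number of cycles to send one byte
        self.fifo = deque()
        self.data_latch = 0         # byte latched when 'received' was seen
        self.fifo_in_write = False  # write strobe, delayed by one cycle
        self.fifo_read = False      # read strobe from process 2
        self.tx_count = 0           # > 0 while the UART is transmitting
        self.sent = []              # bytes handed to the transmitter, in order

    def tick(self, received=False, rx_byte=0):
        """Evaluate one rising clock edge."""
        # Sample the pre-edge state, as nonblocking assignments would.
        was_empty = len(self.fifo) == 0
        was_full = len(self.fifo) >= self.depth
        tx_busy = self.tx_count > 0
        write_now = self.fifo_in_write

        # Process 2: read only if the FIFO is not empty and the UART is idle.
        self.fifo_read = not was_empty and not tx_busy
        if self.fifo_read:
            self.sent.append(self.fifo.popleft())
            self.tx_count = self.tx_cycles
        elif tx_busy:
            self.tx_count -= 1

        # FIFO write port: uses the strobe registered on the previous edge, so the
        # latched byte has been stable for a full clock period. A full FIFO drops it.
        if write_now and not was_full:
            self.fifo.append(self.data_latch)

        # Process 1: latch the incoming byte now and strobe the write on the next edge.
        self.fifo_in_write = received
        if received:
            self.data_latch = rx_byte
```

Calling `tick(received=True, rx_byte=...)` once and then idling shows the byte entering the FIFO one edge later and being handed to the transmitter on the edge after that; further bytes wait in the FIFO until `tx_count` drops back to zero.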
The top-level code is available on GitHub; the FIFO code is a separate module.
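The FIFO module itself is not reproduced in this section. A common way to build such a FIFO, sketched here as a Python model purely for illustration, is a circular buffer whose read and write pointers carry one extra wrap bit so that "full" and "empty" can be told apart. The class SyncFifo, its addr_bits parameter and the drop-on-overflow behavior are assumptions, not taken from the author's module.

```python
class SyncFifo:
    """Hypothetical model of a pointer-based synchronous FIFO (not the author's module)."""

    def __init__(self, addr_bits=2):
        self.size = 1 << addr_bits         # number of storage slots
        self.mem = [0] * self.size
        self.mask = (self.size << 1) - 1   # pointers are addr_bits + 1 bits wide
        self.wr_ptr = 0
        self.rd_ptr = 0

    @property
    def empty(self):
        # Identical pointers, including the wrap bit: nothing stored.
        return self.wr_ptr == self.rd_ptr

    @property
    def full(self):
        # Same slot address but different wrap bit: writer is one lap ahead.
        return (self.wr_ptr ^ self.rd_ptr) == self.size

    def write(self, byte):
        if self.full:
            return False  # assumed: drop the byte on overflow
        self.mem[self.wr_ptr & (self.size - 1)] = byte
        self.wr_ptr = (self.wr_ptr + 1) & self.mask
        return True

    def read(self):
        if self.empty:
            return None
        byte = self.mem[self.rd_ptr & (self.size - 1)]
        self.rd_ptr = (self.rd_ptr + 1) & self.mask
        return byte
```

Using the extra pointer bit avoids keeping a separate item counter; in hardware the empty and full flags then reduce to a pointer comparison.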