microwatt

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	d40c1c1a25	icache: Use narrower block RAMs	5 years ago
Benjamin Herrenschmidt	d415e5544a	fetch/icache: Fit icache in BRAM	5 years ago
Anton Blanchard	1e3e16e500	Add an icache testbench	5 years ago

Benjamin Herrenschmidt

d40c1c1a25

icache: Use narrower block RAMs

We only ever access the cache memory for at most the wishbone bus
width at a time. So having the BRAMs organized as a cache-line-wide
port is a waste of resources.

Instead, use a wishbone-wide memory and store a line as consecutive
rows in the BRAM.

This significantly improves BRAM usage in the FPGA as we can now use
more rows in the BRAM blocks. It also saves a few LUTs and muxes.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
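
The row layout described in this commit can be pictured with a small behavioral model. This is only a sketch, not the VHDL from the commit: the parameter values (64-byte lines, 8-byte rows, 32 lines), the direct-mapped organization, and the function names are assumptions chosen for illustration.

```python
# Hypothetical parameters, chosen for illustration only.
LINE_SIZE = 64   # bytes per cache line (assumed)
ROW_SIZE = 8     # bytes per BRAM row, i.e. the assumed wishbone data width
NUM_LINES = 32   # number of cache lines (assumed, direct-mapped)

ROWS_PER_LINE = LINE_SIZE // ROW_SIZE
NUM_ROWS = NUM_LINES * ROWS_PER_LINE


def row_index(addr):
    """Return the BRAM row that holds the byte at addr."""
    line = (addr // LINE_SIZE) % NUM_LINES        # which cache line
    row_in_line = (addr % LINE_SIZE) // ROW_SIZE  # which row inside that line
    return line * ROWS_PER_LINE + row_in_line


def refill_rows(addr):
    """Return the rows written, one per bus beat, when refilling addr's line."""
    line_base = addr - addr % LINE_SIZE
    first_row = row_index(line_base)
    # A line occupies consecutive rows, so a refill walks them in order.
    return [first_row + beat for beat in range(ROWS_PER_LINE)]
```

With these assumed sizes the memory is 256 rows of 64 bits rather than 32 rows of 512 bits, which is the "more rows, narrower port" shape the commit message refers to.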

Benjamin Herrenschmidt

d415e5544a

fetch/icache: Fit icache in BRAM

The goal is to have the icache fit in BRAM by latching the output
into a register. In order to avoid timing issues, we need to give
the BRAM a full cycle on reads, and thus we source the BRAM address
directly from fetch1 latched NIA.

(Note: This will be problematic if/when we want to hash the address.
We'll probably be better off having fetch1 latch a fully hashed address
along with the normal one, so the icache can use the former to address
the BRAM and pass the latter along.)

One difficulty is that we cannot really stall the icache without adding
more combo logic that would break the "one full cycle" BRAM model. This
means that on stalls from decode, by the time we stall fetch1, it has
already gone to the next address, which the icache is already latching.

We work around this by having a "stash" buffer in fetch2 that will stash
away the icache output on a stall, and override the output of the icache
with the content of the stash buffer when unstalling.

This requires a rewrite of the stop/step debug logic as well. We now
do most of the hard work in fetch1, which makes more sense.

Note: Vivado is still not inferring a built-in output register for the
BRAMs. I don't want to add another cycle... I don't fully understand why
it wouldn't be able to treat current_row as such, but clearly it won't. At
least the timing seems good enough now for 100MHz, possibly more.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
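
The stash workaround described in this commit can be sketched as a cycle-level behavioral model. This is a simplified illustration, not the fetch2 VHDL: the class and method names are hypothetical, and the choice to keep only the first output seen during a multi-cycle stall is an assumption. The model also leaves out fetch1 and the icache themselves, so the caller supplies the icache output for each cycle.

```python
class Fetch2Stash:
    """Hypothetical model of a one-entry stash buffer in fetch2."""

    def __init__(self):
        self.stash = None  # icache output saved while decode is stalling

    def step(self, icache_out, stall):
        """Advance one cycle and return the value passed on to decode.

        Returns None while stalled, since nothing is passed on in that cycle.
        """
        if stall:
            # The icache cannot be stalled, so save what it produced.
            # Assumption: only the first output of a stall is kept.
            if self.stash is None:
                self.stash = icache_out
            return None
        if self.stash is not None:
            # Unstalling: the stashed value overrides the icache output.
            out, self.stash = self.stash, None
            return out
        return icache_out
```

Feeding the model one icache output per cycle shows the intended effect: the value that arrived while decode was stalling is delivered first once the stall ends, and the live icache output is used again on the cycle after that.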

Anton Blanchard

1e3e16e500

Add an icache testbench

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>

3 Commits (b56b46b7d1a537d9f99608d4957bb5183d512421)