You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
microwatt/fpga
Paul Mackerras 0fb207be60 fetch1: Implement a simple branch target cache
This implements a cache in fetch1, where each entry stores the address
of a simple branch instruction (b or bc) and the target of the branch.
When fetching sequentially, if the address being fetched matches the
cache entry, then fetching will be redirected to the branch target.
The cache has 1024 entries and is direct-mapped, i.e. indexed by bits
11..2 of the NIA.

The bus from execute1 now carries information about taken and
not-taken simple branches, which fetch1 uses to update the cache.
The cache entry is updated for both taken and not-taken branches, with
the valid bit being set if the branch was taken and cleared if the
branch was not taken.

If fetching is redirected to the branch target then that goes down the
pipe as a predicted-taken branch, and decode1 does not do any static
branch prediction.  If fetching is not redirected, then the next
instruction goes down the pipe as normal and decode1 does its static
branch prediction.

In order to make timing, the lookup of the cache is pipelined, so on
each cycle the cache entry for the current NIA + 8 is read.  This
means that after a redirect (from decode1 or execute1), only the third
and subsequent sequentially-fetched instructions will be able to be
predicted.

This improves the coremark value on the Arty A7-100 from about 180 to
about 190 (more than 5%).

The BTC is optional.  Builds for the Artix 7 35-T part have it off by
default because the extra ~1420 LUTs it takes mean that the design
doesn't fit on the Arty A7-35 board.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
..
LICENSE
acorn-cle-215.xdc acorn: Add support for the Acorn CLE 215+ 4 years ago
arty_a7.xdc Arty A7: Document pin connections for on-board headers 4 years ago
clk_gen_bypass.vhd Fix clk_gen_bypass 5 years ago
clk_gen_ecp5.vhd Add PLL for ECP5 device 4 years ago
clk_gen_mcmm.vhd Improve PLL/MMCM clocks configuration 5 years ago
clk_gen_plle2.vhd acorn: Add support for the Acorn CLE 215+ 4 years ago
cmod_a7-35.xdc Add SPI configuration to Xilinx constraint files 5 years ago
firmware.hex
fpga-random.vhdl Add random number generator and implement the darn instruction 4 years ago
fpga-random.xdc Add random number generator and implement the darn instruction 4 years ago
genesys2.xdc fpga: Add support for Genesys2 4 years ago
hello_world.hex hello_world: Use new headers and frequency from syscon 5 years ago
main_bram.vhdl Fix some ghdlsynth issues with fpga_bram 5 years ago
nexys-video.xdc spi: Add SPI Flash controller 4 years ago
nexys_a7.xdc Add SPI configuration to Xilinx constraint files 5 years ago
pp_fifo.vhd pp_fifo: Fix full fifo losing all data on simultaneous push & pop 5 years ago
pp_soc_uart.vhd uart: Remove combinational loops on ack and stall signal 4 years ago
pp_utilities.vhd
soc_reset.vhdl soc_reset: Use counters, add synchronizers 5 years ago
soc_reset_tb.vhdl Exit cleanly from testbench on success 5 years ago
top-acorn-cle-215.vhdl acorn: Add support for the Acorn CLE 215+ 4 years ago
top-arty.vhdl fetch1: Implement a simple branch target cache 4 years ago
top-generic.vhdl fetch1: Implement a simple branch target cache 4 years ago
top-genesys2.vhdl fpga: Add support for Genesys2 4 years ago
top-nexys-video.vhdl fetch1: Implement a simple branch target cache 4 years ago