-- microwatt/fpga/top-generic.vhdl

library ieee;
use ieee.std_logic_1164.all;

library work;
use work.wishbone_types.all;

entity toplevel is
    generic (
        MEMORY_SIZE          : positive := (384*1024);
        RAM_INIT_FILE        : string   := "firmware.hex";
        RESET_LOW            : boolean  := true;
        CLK_INPUT            : positive := 100000000;
        CLK_FREQUENCY        : positive := 100000000;
        HAS_FPU              : boolean  := true;
        HAS_BTC              : boolean  := false;
        ICACHE_NUM_LINES     : natural  := 64;
        LOG_LENGTH           : natural  := 512;
        DISABLE_FLATTEN_CORE : boolean  := false;
        UART_IS_16550        : boolean  := true
        );
    port (
        ext_clk   : in  std_ulogic;
        ext_rst   : in  std_ulogic;

        -- UART0 signals:
        uart0_txd : out std_ulogic;
        uart0_rxd : in  std_ulogic
        );
end entity toplevel;

architecture behaviour of toplevel is

    -- Reset signals:
    signal soc_rst : std_ulogic;
    signal pll_rst : std_ulogic;

    -- Internal clock signals:
    signal system_clk : std_ulogic;
    signal system_clk_locked : std_ulogic;

begin

    -- Reset controller: derives the PLL reset (pll_rst) and the SoC reset
    -- (soc_rst) from the external reset input. RESET_LOW selects whether
    -- ext_rst is active low.
    reset_controller: entity work.soc_reset
        generic map(
            RESET_LOW => RESET_LOW
            )
        port map(
            ext_clk       => ext_clk,
            pll_clk       => system_clk,
            pll_locked_in => system_clk_locked,
            ext_rst_in    => ext_rst,
            pll_rst_out   => pll_rst,
            rst_out       => soc_rst
            );

    -- Clock generator (PLL/MMCM): produces system_clk at CLK_FREQUENCY Hz
    -- from ext_clk at CLK_INPUT Hz, and reports lock to the reset controller.
    clkgen: entity work.clock_generator
        generic map(
            CLK_INPUT_HZ  => CLK_INPUT,
            CLK_OUTPUT_HZ => CLK_FREQUENCY
            )
        port map(
            ext_clk        => ext_clk,
            pll_rst_in     => pll_rst,
            pll_clk_out    => system_clk,
            pll_locked_out => system_clk_locked
            );

    -- Main SoC
    soc0: entity work.soc
        generic map(
            MEMORY_SIZE          => MEMORY_SIZE,
            RAM_INIT_FILE        => RAM_INIT_FILE,
            SIM                  => false,
            CLK_FREQ             => CLK_FREQUENCY,
            HAS_FPU              => HAS_FPU,
            HAS_BTC              => HAS_BTC,
            ICACHE_NUM_LINES     => ICACHE_NUM_LINES,
            LOG_LENGTH           => LOG_LENGTH,
            DISABLE_FLATTEN_CORE => DISABLE_FLATTEN_CORE,
            UART0_IS_16550       => UART_IS_16550
            )
        port map (
            system_clk => system_clk,
            rst        => soc_rst,
            uart0_txd  => uart0_txd,
            uart0_rxd  => uart0_rxd
            );

end architecture behaviour;
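
-- Example (hypothetical sketch, not part of upstream microwatt): a
-- board-specific wrapper that reuses this generic toplevel and overrides
-- some of its generics. The values are illustrative assumptions modelled on
-- the smaller Artix 7 35-T case, where the FPU and the branch target cache
-- are left out to save LUTs. The entity name toplevel_example_a7_35 and its
-- port names (clk100, rst_n, uart_txd, uart_rxd) are made up for this
-- sketch. Generics not listed (ICACHE_NUM_LINES, LOG_LENGTH,
-- DISABLE_FLATTEN_CORE, UART_IS_16550) keep their defaults. It is left
-- commented out so that analysing this file does not add an extra,
-- unreferenced top-level entity that a tool could pick as the design top.
--
-- library ieee;
-- use ieee.std_logic_1164.all;
--
-- entity toplevel_example_a7_35 is
--     port (
--         clk100   : in  std_ulogic;  -- 100 MHz board oscillator (assumed)
--         rst_n    : in  std_ulogic;  -- active-low reset input (assumed)
--         uart_txd : out std_ulogic;
--         uart_rxd : in  std_ulogic
--         );
-- end entity toplevel_example_a7_35;
--
-- architecture rtl of toplevel_example_a7_35 is
-- begin
--     top: entity work.toplevel
--         generic map (
--             MEMORY_SIZE   => 384*1024,  -- 384 KiB of on-chip RAM
--             RAM_INIT_FILE => "firmware.hex",
--             RESET_LOW     => true,      -- rst_n is active low
--             CLK_INPUT     => 100000000,
--             CLK_FREQUENCY => 100000000,
--             HAS_FPU       => false,     -- drop the FPU to save LUTs
--             HAS_BTC       => false      -- drop the branch target cache
--             )
--         port map (
--             ext_clk   => clk100,
--             ext_rst   => rst_n,
--             uart0_txd => uart_txd,
--             uart0_rxd => uart_rxd
--             );
-- end architecture rtl;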

-- Revision history (from git blame):
--
-- * "Split FPGA toplevel from soc"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 5 years ago)
--   Original structure of this file.
--   This will be useful when we start needing different toplevels for
--   different boards. We keep the reset and clock generators in the
--   toplevel as they will eventually be taken over by litedram when we
--   integrate it, and they are more likely to change on different system
--   types.
--
-- * "Some yosys fixes"
--   (Anton Blanchard <anton@linux.ibm.com>, 5 years ago)
--   Lines: library work / use work.wishbone_types.all; CLK_FREQ mapping.
--   This gets the yosys build further along, but I'm now chasing what looks
--   like a yosys bug.
--
-- * "Reduce simulated and default FPGA RAM to 384kB"
--   (Anton Blanchard <anton@linux.ibm.com>, 5 years ago)
--   Lines: MEMORY_SIZE default.
--   Micropython has been able to fit into 384kB for ages, so let's reduce
--   our simulated RAM. This is useful for testing if micropython will run
--   on an ECP5 85k, which has enough BRAM for 384kB but not enough for
--   512kB.
--
-- * "Improve PLL/MMCM clocks configuration"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 5 years ago)
--   Lines: RESET_LOW and CLK_INPUT generics; clkgen generic map.
--   We can now pass both the input clock and target clock frequency via
--   generics. Add support for both 50MHz and 100MHz target freqs for both
--   cases.
--
-- * "Add option to not flatten hierarchy"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 5 years ago)
--   Lines: CLK_FREQUENCY generic; SIM mapping.
--   Vivado by default tries to flatten the module hierarchy to improve
--   placement and timing. However this makes debugging timing issues really
--   hard as the net names in the timing report can be pretty bogus. This
--   adds a generic that can be used to control attributes to stop Vivado
--   from flattening the main core components. The resulting design will
--   have worse timing overall but it will be easier to understand what the
--   worst timing paths are and address them.
--
-- * "core: Add support for floating-point loads and stores"
--   (Paul Mackerras <paulus@ozlabs.org>, 4 years ago)
--   Lines: HAS_FPU generic and mapping.
--   This extends the register file so it can hold FPR values, and
--   implements the FP loads and stores that do not require conversion
--   between single and double precision. We now have the FP, FE0 and FE1
--   bits in MSR. FP loads and stores cause a FP unavailable interrupt if
--   MSR[FP] = 0. The FPU facilities are optional and their presence is
--   controlled by the HAS_FPU generic passed down from the top-level board
--   file. It defaults to true for all except the A7-35 boards.
--
-- * "fetch1: Implement a simple branch target cache"
--   (Paul Mackerras <paulus@ozlabs.org>, 4 years ago)
--   Lines: HAS_BTC generic and mapping.
--   This implements a cache in fetch1, where each entry stores the address
--   of a simple branch instruction (b or bc) and the target of the branch.
--   When fetching sequentially, if the address being fetched matches the
--   cache entry, then fetching will be redirected to the branch target. The
--   cache has 1024 entries and is direct-mapped, i.e. indexed by bits 11..2
--   of the NIA. The bus from execute1 now carries information about taken
--   and not-taken simple branches, which fetch1 uses to update the cache.
--   The cache entry is updated for both taken and not-taken branches, with
--   the valid bit being set if the branch was taken and cleared if the
--   branch was not taken. If fetching is redirected to the branch target
--   then that goes down the pipe as a predicted-taken branch, and decode1
--   does not do any static branch prediction. If fetching is not
--   redirected, then the next instruction goes down the pipe as normal and
--   decode1 does its static branch prediction. In order to make timing, the
--   lookup of the cache is pipelined, so on each cycle the cache entry for
--   the current NIA + 8 is read. This means that after a redirect (from
--   decode1 or execute1), only the third and subsequent sequentially-fetched
--   instructions will be able to be predicted. This improves the coremark
--   value on the Arty A7-100 from about 180 to about 190 (more than 5%).
--   The BTC is optional. Builds for the Artix 7 35-T part have it off by
--   default because the extra ~1420 LUTs it takes mean that the design
--   doesn't fit on the Arty A7-35 board.
--
-- * "Reduce the size of icache to help yosys ECP5 builds (#303)"
--   (Michael Neuling <mikey@neuling.org>, 3 years ago)
--   Lines: ICACHE_NUM_LINES generic and mapping.
--   The icache RAM is currently LUT RAM, not block RAM. This massively
--   bloats the icache size. We think this is due to yosys not inferring the
--   RAM correctly but that's yet to be confirmed. Work around this for now
--   by reducing the default size of the icache RAM for the ECP5 builds. On
--   the ECP5 85K builds, this gets us from 95% down to 76% and helps our CI
--   to pass.
--
-- * "Add LOG_LENGTH to top-generic.vhdl"
--   (Anton Blanchard <anton@linux.ibm.com>, 4 years ago)
--   Lines: LOG_LENGTH generic and mapping.
--   The other top level files allow LOG_LENGTH to be configured.
--
-- * "uart: Import and hook up opencore 16550 compatible UART"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 4 years ago)
--   Lines: DISABLE_FLATTEN_CORE generic and mapping; UART0_IS_16550 mapping.
--   This imports via fusesoc a 16550 compatible (i.e. "standard") UART, and
--   wires it up optionally in the SoC instead of the potato one. This also
--   adds support for a second UART (which is always a 16550) to Arty, wired
--   to the JC "bottom" port.
--
-- * "uart: Make 16550 the default"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 4 years ago)
--   Lines: UART_IS_16550 default.
--
-- * "Fix PLL reset signal name in toplevel"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 5 years ago)
--   Lines: pll_rst signal and its two port mappings.
--   It shouldn't have a _n suffix, it's active positive.
--
-- * "soc: Don't require dram wishbones signals to be wired by toplevel"
--   (Benjamin Herrenschmidt <benh@kernel.crashing.org>, 4 years ago)
--   Lines: uart0_rxd mapping.
--   Currently, when not using litedram, the top level still has to hook up
--   "dummy" wishbones to the main dram and control dram busses coming out of
--   the SoC and provide ack signals. Instead, make the SoC generate the acks
--   internally when not using litedram and use defaults to make the wiring
--   entirely optional.