microwatt


Author	SHA1	Message	Date
Anton Blanchard	faab169307	Allow ALT_RESET_ADDRESS to be overridden Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	3 years ago
Paul Mackerras	4cf2921b0b	soc: Re-do peripheral address decode to improve timing This generates a series of io_cycle_* signals which are clean latches and which become the 'cyc' signals of the wishbone buses going to various peripherals (syscon, uarts, XICS, GPIO, etc.). Effectively this is done by moving the address decoding into the slave_io_latch process. The slave_io_type, which drives the multiplexer which selects which wishbone to look for a response on, is reduced to just 8 values in the expectation that an 8-way multiplexer will use less logic than one with more than 8 inputs. With this, timing is considerably better on the A7-100T. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Anton Blanchard	af6bc48d36	Merge pull request #329 from paulusmack/wb-fix Wishbone addressing fix	3 years ago
Paul Mackerras	ca4eb46aea	Make wishbone addresses be in units of doublewords or words This makes the 64-bit wishbone buses have the address expressed in units of doublewords (64 bits), and similarly for the 32-bit buses the address is in units of words (32 bits). This is to comply with the wishbone spec. Previously the addresses on the wishbone buses were in units of bytes regardless of the bus data width, which is not correct and caused problems with interfacing with externally-generated logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	734e4c4a52	core: Add a short multiplier This adds an optional 16 bit x 16 bit signed multiplier and uses it for multiply instructions that return the low 64 bits of the product (mull[dw][o] and mulli, but not maddld) when the operands are both in the range -2^15 .. 2^15 - 1. The "short" 16-bit multiplier produces its result combinatorially, so a multiply that uses it executes in one cycle. This improves the coremark result by about 4%, since coremark does quite a lot of multiplies and they almost all have operands that fit into 16 bits. The presence of the short multiplier is controlled by a generic at the execute1, SOC, core and top levels. For now, it defaults to off for all platforms, and can be enabled using the --has_short_mult flag to fusesoc. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Anton Blanchard	591e96d1a2	gpio: Add HAS_GPIO to avoid verilator build errors The verilator build fails with warnings and errors, because NGPIO is 0 and we do things like: gpio_out : out std_ulogic_vector(NGPIO - 1 downto 0); Set NGPIO to something reasonable (eg 32) and add HAS_GPIO to avoid building the macro entirely if it isn't in use. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	3 years ago
Michael Neuling	84473eda1b	Merge pull request #277 from paulus/gpio A few cleanups. GPIO IRQ number is now 4 as 3 is now taken by the SD card.	4 years ago
Paul Mackerras	21ed730514	arty_a7: Add litesdcard interface This adds litesdcard.v generated from the litex/litesdcard project, along with logic in top-arty.vhdl to connect it into the system. There is now a DMA wishbone coming in to soc.vhdl which is narrower than the other wishbone masters (it has 32-bit data rather than 64-bit) so there is a widening/narrowing adapter between it and the main wishbone master arbiter. Also, litesdcard generates a non-pipelined wishbone for its DMA connection, which needs to be converted to a pipelined wishbone. We have a latch on both the incoming and outgoing sides of the wishbone in order to help make timing (at the cost of two extra cycles of latency). litesdcard generates an interrupt signal which is wired up to input 3 of the ICS (IRQ 19). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	eb7eba2d92	dcache: Snoop writes to memory by other agents This adds a path where the wishbone that goes out to memory and I/O also gets fed back to the dcache, which looks for writes that it didn't initiate, and invalidates any cache line that gets written to. This involves a second read port on the cache tag RAM for looking up the snooped writes, and effectively a second write port on the cache valid bit array to clear bits corresponding to snoop hits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Anton Blanchard	5cc5d8f030	Merge pull request #281 from antonblanchard/cache-tlb-parameters Pass icache/dcache/tlb parameters down from soc	4 years ago
Anton Blanchard	91a53d8001	Allow SPI BOOT_CLOCKS to be overridden by top level Our SPI controller sends 8 dummy clocks at boot which Ben added for some Xilinx boards. This should be harmless but it is confusing the flash testbench in the Caravel project. Add a parameter so it can be overridden at the top level. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	4 years ago
Anton Blanchard	2d21b95f87	Pass icache/dcache/tlb parameters down from soc We want much smaller caches and tlbs when building for sky130, so allow the toplevel file to override the defaults. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	4 years ago
Paul Mackerras	f06ffcf9b7	Add a GPIO controller and use it to drive the shield I/O pins on the Arty This adds a GPIO controller which provides 32 bits of I/O. The registers are modelled on the set used by the gpio-ftgpio010.c driver in the Linux kernel. Currently there is no interrupt capability implemented, though an interrupt line from the GPIO subsystem to the XICS has been connected. For the Arty A7 board, GPIO lines 0 to 13 are connected to the pins labelled IO0 to IO13 on the "shield" connector, GPIO lines 14 to 29 connect to IO26 to IO41, GPIO line 30 connects to the pin labelled A (aka IO42), and GPIO line 31 is connected to LED 7. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Michael Neuling	9a6a7e9fe5	Merge pull request #268 from paulusmack/btc Implement branch target cache	4 years ago
Anton Blanchard	481f3cdfea	Add some wishbone checking Check that stb, cyc and ack are never undefined. While not really needed here, this also tests if --pragma synthesis_off/--pragma synthesis_on works on all the tools we use. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	4 years ago
Paul Mackerras	0fb207be60	fetch1: Implement a simple branch target cache This implements a cache in fetch1, where each entry stores the address of a simple branch instruction (b or bc) and the target of the branch. When fetching sequentially, if the address being fetched matches the cache entry, then fetching will be redirected to the branch target. The cache has 1024 entries and is direct-mapped, i.e. indexed by bits 11..2 of the NIA. The bus from execute1 now carries information about taken and not-taken simple branches, which fetch1 uses to update the cache. The cache entry is updated for both taken and not-taken branches, with the valid bit being set if the branch was taken and cleared if the branch was not taken. If fetching is redirected to the branch target then that goes down the pipe as a predicted-taken branch, and decode1 does not do any static branch prediction. If fetching is not redirected, then the next instruction goes down the pipe as normal and decode1 does its static branch prediction. In order to make timing, the lookup of the cache is pipelined, so on each cycle the cache entry for the current NIA + 8 is read. This means that after a redirect (from decode1 or execute1), only the third and subsequent sequentially-fetched instructions will be able to be predicted. This improves the coremark value on the Arty A7-100 from about 180 to about 190 (more than 5%). The BTC is optional. Builds for the Artix 7 35-T part have it off by default because the extra ~1420 LUTs it takes mean that the design doesn't fit on the Arty A7-35 board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	97586e7e99	soc: Drive uart1_irq to 0 when we don't have UART1 The tools complain about uart1_irq not being driven and not having a default when HAS_UART1 is false. This sets it to 0 in that case. Fixes: `7575b1e0c2` ("uart: Import and hook up opencore 16550 compatible UART") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	45cd8f4fc3	core: Add support for floating-point loads and stores This extends the register file so it can hold FPR values, and implements the FP loads and stores that do not require conversion between single and double precision. We now have the FP, FE0 and FE1 bits in MSR. FP loads and stores cause a FP unavailable interrupt if MSR[FP] = 0. The FPU facilities are optional and their presence is controlled by the HAS_FPU generic passed down from the top-level board file. It defaults to true for all except the A7-35 boards. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Benjamin Herrenschmidt	fb5c16d05e	uart: Make 16550 the default Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	7575b1e0c2	uart: Import and hook up opencore 16550 compatible UART This imports via fusesoc a 16550 compatible (i.e. "standard") UART, and wires it up optionally in the SoC instead of the potato one. This also adds support for a second UART (which is always a 16550) to Arty, wired to the JC "bottom" port. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	8366710217	liteeth: Hook up LiteX LiteEth ethernet controller Currently only generated for Arty. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	bb54af59de	xics: Add support for reduced priority field size This makes the ICS support fewer than the 8 architected priority bits and sets the soc to use 3 bits by default. Setting all the supported bits translates to "masked" (and reads back as 0xff); any smaller value is used as-is. Linux doesn't use priorities above 5, so this is a way to save silicon. The number of supported priority bits is exposed to the OS via the config register. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	5c2fc47e2c	xics: Add simple ICS Move the external interrupt generation to a separate module "ICS" (source controller) which has a register per source, currently containing only the priority control. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Michael Neuling	b90a0a2139	Merge pull request #208 from paulusmack/faster Make the core go faster Several major improvements in here: - Simple branch predictor - Reduced latency for mispredicted branches and interrupts by removing fetch2 stage - Cache improvements o Request critical dword first on refill o Handle hits while refilling, including on line being refilled o Sizes doubled for both D and I - Loadstore improvements: can now do one load or store every two cycles in most cases - Optimized 2-cycle multiplier for Xilinx 7-series parts using DSP slices - Timing improvements, including: o Stash buffer in decode1 o Reduced width of execute1 result mux o Improved SPR decode in decode1 o Some non-critical operations take a cycle longer so we can break some long combinatorial chains - Core logging: logs 256 bits of info every cycle into a ring buffer, to help with debugging and performance analysis This increases the LUT usage for the "synth" + A35 target from 9182 to 10297 (about 12%).	4 years ago
Paul Mackerras	78de4fef72	Make LOG_LENGTH configurable per FPGA variant This plumbs the LOG_LENGTH parameter (which controls how many entries the core log RAM has) up to the top level so that it can be set on the fusesoc command line and have different default values on different FPGAs. It now defaults to 512 entries generally and on the Artix-7 35 parts, and 2048 on the larger Artix-7 FPGAs. It can be set to 0 if desired. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Benjamin Herrenschmidt	67b6117ebf	soc: Slight cleanup of IRQ assignments Use a separate process to assign selected interrupts to the interrupt array, and document them. There's only one interrupt for now but that will change and this way is clearer. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	e07b3dd6fa	soc: Rename uart_dat8 to uart0_dat8 Just for consistency. Will come in handy if we ever add a second one Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	f9f18906a3	soc: Rename wb_dram_ctrl to wb_ext_io and rework decoding This makes the control bus currently going out of "soc" towards litedram more generic for external IO devices added by the top-level rather than inside the SoC proper. This is mostly renaming of signals and a small change on how the address decoder operates, using a separate "cascaded" decode for the external IOs. We make the region 0xc8nn_nnnn be the "external IO" region for now. This will make it easier / cleaner to add more external devices. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	bf7def5503	soc: Don't require dram wishbones signals to be wired by toplevel Currently, when not using litedram, the top level still has to hook up "dummy" wishbones to the main dram and control dram busses coming out of the SoC and provide ack signals. Instead, make the SoC generate the acks internally when not using litedram and use defaults to make the wiring entirely optional. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	1ffc89e58b	soc: Add defaults for some input signals That way the top-levels don't need to assign them. Also remove generics that are set to the default anyway. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	4244b54984	soc: Remove unused RESET_LOW generic Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	6c3a8bf417	bram: Remove combinational loop on stall It hurts timing and is pointless Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	e5aa0e9dc9	uart: Remove combinational loops on ack and stall signal They hurt timing by forcing signals to come from the master and back again in one cycle. Stall isn't sampled by the master unless there is an active cycle, so masking it with cyc is pointless. Masking acks is somewhat pointless too, as we don't handle early dropping of cyc in any of our slaves properly anyway. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	cc4dcb3597	spi: Add SPI Flash controller This adds an SPI flash controller which supports direct memory-mapped access to the flash along with a manual mode to send commands. The direct mode can be set via generic to default to single wire or quad mode. The controller supports normal, dual and quad accesses with configurable commands, clock divider, dummy clocks etc. The SPI clock can be an even divider of sys_clk starting at 2 (so max 50 MHz with our typical Arty designs). A flash offset is carried via generics to syscon to tell SW which portion of the flash is reserved for the FPGA bitfile. There is currently no plumbing to make the CPU reset past that address (TBD). Note: Operating at 50 MHz has proven unreliable without adding some delay to the sampling of the input data. I'm working on improving this; in the meantime, I'm leaving the default set at 25 MHz. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	4 years ago
Benjamin Herrenschmidt	bf1b98b958	litedram: Add support for booting without BRAM This adds an option to disable the main BRAM and instead copy a payload stashed along with the init code in the secondary BRAM into DRAM and boot from there Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	f86fb74bfe	irq: Simplify xics->core irq input Use a simple wire. common.vhdl types are better kept for things local to the core. We can add more wires later if we need to for HV irqs etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	573b6b4bc4	soc: Rework interconnect This changes the SoC interconnect such that the main 64-bit wishbone out of the processor is first split between only 3 slaves (BRAM, DRAM and a general "IO" bus) instead of all the slaves in the SoC. The IO bus leg is then latched and down-converted to 32 bits data width, before going through a second address decoder for the various IO devices. This significantly reduces routing and timing pressure on the main bus, allowing us to get rid of frequent timing violations when synthesizing on smallish FPGAs such as the Artix-7 35T found on the original Arty board. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	8d64090a68	sw: Add full memory map to .h and use it for litedram .lds Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	4e78b8078e	Merge branch 'master' into litedram	5 years ago
Benjamin Herrenschmidt	acbdd396a5	soc/core: Add reset latches This adds one-cycle latches to the various resets out of the soc and into the various core modules. It seems to help Vivado P&R a bit and has been shown to avoid timing violations under some circumstances. Interestingly those resets never seem to appear in the bad timing path. It looks like those long resets simply impose placement constraints that Vivado satisfies at the expense of timing elsewhere. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	c19b5b8cc7	litedram: Update to new LiteX/LiteDRAM version Things have changed a bit in upstream LiteX. LiteDRAM now exposes a wishbone for the CSRs for example. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Paul Mackerras	941499133e	soc: Work around compile error with ghdl 0.37-dev The ghdl packaged in Fedora 31 doesn't like a port map of the form "rst => rst or core_reset", so this works around the problem by doing the OR in a separate statement. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Benjamin Herrenschmidt	7f1f6b8525	litedram: Add support for Microwatt-initialized controller This adds support for initializing the memory controller from microwatt rather than using a built-in RiscV processor. This might require some fixes to LiteX and LiteDRAM (they haven't been merged as of this commit yet). This is enabled in the shipped generated files and can be changed by modifying the generator script to pass False to "mw_init". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	025cf5efe8	syscon: Add syscon registers These provide some info about the SoC (though it's still somewhat incomplete and needs more work, see comments). There's also a control register for selecting DRAM vs. BRAM at 0 (and for soft-resetting the SoC but that isn't wired up yet). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	8bb3c8f8b6	soc: Add DRAM address decoding Still not attached to any board Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	6853d22203	core: Add alternate reset address An external signal can control whether the core will start executing at the standard or the alternate reset address. This will be used when litedram is initialized by microwatt itself, to route the reset to the built-in init code secondary block RAM. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Michael Neuling	b4f20c20b9	XICS interrupt controller New unified ICP and ICS XICS compliant interrupt controller. Configurable number of hardware sources. Fixed hardware source number based on hardware line taken. All hardware interrupts are a fixed priority. Level interrupts supported only. Hardwired to 0xc0004000 in SOC (UART is kept at 0xc0002000). Signed-off-by: Michael Neuling <mikey@neuling.org>	5 years ago
Anton Blanchard	3ad3e2abfd	Removed unused core_terminated signal Right now it's unused. We can add it back when we add an LED to signify the core has terminated. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Benjamin Herrenschmidt	bc2acfde2f	wb_arbiter: Make arbiter size parametric Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	8e0389b973	ram: Rework main RAM interface This replaces the simple_ram_behavioural and mw_soc_memory modules with a common wishbone_bram_wrapper.vhdl that interfaces the pipelined WB with a lower-level RAM module, along with FPGA and sim variants of the latter. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
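
Commit ca4eb46aea changes the Wishbone address from byte units to units of the bus data width (doublewords on the 64-bit buses, words on the 32-bit buses). The Python sketch below only illustrates that arithmetic; it is not code from microwatt (which is written in VHDL), and the helper name `wb_address` is hypothetical. On a real Wishbone bus the byte offset within a doubleword or word is conveyed by the SEL (byte-select) lines rather than by the address.

```python
def wb_address(byte_addr: int, data_width_bits: int) -> int:
    """Convert a byte address into a Wishbone address in bus-width units.

    Hypothetical helper: a 64-bit bus addresses doublewords (8 bytes) and a
    32-bit bus addresses words (4 bytes), so the low 3 or 2 bits are dropped.
    """
    if data_width_bits not in (32, 64):
        raise ValueError("only 32- and 64-bit buses are modelled here")
    bytes_per_unit = data_width_bits // 8     # 4 or 8
    shift = bytes_per_unit.bit_length() - 1   # log2: 4 -> 2, 8 -> 3
    return byte_addr >> shift


# Example: the XICS address given in commit b4f20c20b9, on each bus width.
print(hex(wb_address(0xC0004000, 64)))  # 0x18000800
print(hex(wb_address(0xC0004000, 32)))  # 0x30001000
```

The same byte address therefore appears as two different Wishbone addresses depending on which bus it travels over, which is why byte-addressed buses caused trouble when interfacing with externally generated logic.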

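Commit 734e4c4a52 sends a multiply to the short 16 x 16 signed multiplier only when both operands lie in -2^15 .. 2^15 - 1. The Python sketch below, with hypothetical names, checks that condition and shows why the shortcut is safe: the product of two such operands has magnitude at most 2^30, so it fits in a 32-bit signed value, and sign-extending that value yields exactly the low 64 bits of the full product. It models operands as plain signed integers and ignores the word-size details of mullw and the overflow-setting "o" forms.

```python
MASK64 = (1 << 64) - 1


def fits_s16(x: int) -> bool:
    """True if x is in the signed 16-bit range -2**15 .. 2**15 - 1."""
    return -(1 << 15) <= x <= (1 << 15) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul_low64(a: int, b: int) -> tuple[bool, int]:
    """Return (used_short_path, low 64 bits of a*b as an unsigned value)."""
    if fits_s16(a) and fits_s16(b):
        # Hypothetical short path: keep only a 32-bit product, then
        # sign-extend it to 64 bits. |a*b| <= 2**30, so nothing is lost.
        short_product = sign_extend(a * b, 32)
        return True, short_product & MASK64
    # Otherwise fall back to a full-width multiply.
    return False, (a * b) & MASK64


used_short, result = mul_low64(-3, 7)
print(used_short, hex(result))
```

In this model both paths always agree on the result; the only difference is which (hypothetical) multiplier produced it, which mirrors the commit's point that the short path is purely a latency optimization.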

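Commit 0fb207be60 describes a 1024-entry direct-mapped branch target cache indexed by bits 11..2 of the NIA, updated for every simple branch, with the valid bit set when the branch was taken and cleared when it was not. The Python class below is a behavioural sketch of that policy only. The class and method names are hypothetical, storing the full branch address as the tag is a simplifying assumption, and the pipelined lookup of NIA + 8 described in the commit is not modelled.

```python
class BranchTargetCache:
    """Behavioural sketch of a direct-mapped branch target cache."""

    ENTRIES = 1024  # 2**10 entries, so the index is 10 bits wide

    def __init__(self) -> None:
        self.valid = [False] * self.ENTRIES
        self.branch_addr = [0] * self.ENTRIES  # assumed full-address tag
        self.target = [0] * self.ENTRIES

    @staticmethod
    def index(nia: int) -> int:
        # Bits 11..2 of the NIA: drop the 2 low bits (instructions are
        # 4-byte aligned) and keep the next 10 bits.
        return (nia >> 2) & (BranchTargetCache.ENTRIES - 1)

    def update(self, branch_nia: int, target: int, taken: bool) -> None:
        # Updated for both taken and not-taken simple branches:
        # valid is set if taken and cleared if not taken.
        i = self.index(branch_nia)
        self.branch_addr[i] = branch_nia
        self.target[i] = target
        self.valid[i] = taken

    def predict(self, nia: int):
        """Return the predicted target if this fetch address hits, else None."""
        i = self.index(nia)
        if self.valid[i] and self.branch_addr[i] == nia:
            return self.target[i]
        return None


btc = BranchTargetCache()
btc.update(0x1000, 0x2000, taken=True)
print(hex(btc.predict(0x1000)))  # 0x2000
print(btc.predict(0x2000))       # None: same index as 0x1000, different tag
```

Because the cache is direct-mapped, addresses 4 KiB apart share an entry, so a later branch at an aliasing address simply overwrites the earlier one.
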
61 Commits (8ecb30da051f8486ac01db3df7c639844826ede9)
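
Commit bb54af59de narrows the ICS priority field to 3 bits by default, treats the all-ones value as "masked", and reads a masked priority back as 0xff. The Python sketch below models that encoding. The function names are hypothetical, and treating every written value at or above the all-ones pattern as masked is an assumption, since the commit message only says that small values are used as-is.

```python
PRIO_BITS = 3                        # soc default per commit bb54af59de
PRIO_MASKED = (1 << PRIO_BITS) - 1   # all supported bits set -> masked


def pack_priority(pri8: int) -> int:
    """Store an 8-bit architected priority in PRIO_BITS bits."""
    # Assumption: any value >= the all-ones pattern is treated as masked.
    if pri8 >= PRIO_MASKED:
        return PRIO_MASKED
    return pri8


def unpack_priority(stored: int) -> int:
    """Read the stored priority back as an 8-bit architected value."""
    return 0xFF if stored == PRIO_MASKED else stored


print(unpack_priority(pack_priority(5)))         # 5
print(hex(unpack_priority(pack_priority(0xFF))))  # 0xff
```

With 3 bits, priorities 0 to 6 survive a write and read-back unchanged, which covers the 0 to 5 range the commit notes Linux actually uses.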