microwatt

Commit Graph

Author	SHA1	Message	Date
Benjamin Herrenschmidt	a3857aac94	litedram: Add an L2 cache with store queue This adds a cache between the wishbone and litedram with the following features (at this point, it's still evolving) - 128 bytes line width in order to have a reasonable amount of litedram pipelining on the 128-bit wide data port. - Configurable geometry otherwise - Stores are acked immediately on wishbone whether hit or miss (minus a 2 cycles delay if there's a previous load response in the way) and sent to LiteDRAM via 8 entries (configurable) store queue Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	6fe077910b	litedram: Add simulation support This adds a simulated litedram model along with the necessary Makefile gunk to verilate it and wrap it for use by ghdl. The core_dram_tb test bench is a variant of core_tb with LiteDRAM simulated. It's not built by default; an explicit make core_dram_tb is necessary so as not to require verilator to be installed for the normal build process (also it's slow'ish). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Anton Blanchard	04c56a0c52	Pass clock frequency to UART sim wrapper The UART sim wrapper is currently hard wired to 50 MHz. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	7b14819dbb	A little less shouting in the Makefile Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	01da807476	Fix the simulated DMI Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	a9e7194de5	Merge Makefile and Makefile.synth We still need a way to specify our FPGA target on the command line, but this at least gets us down to a common Makefile. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	6326efaca4	Add Makefile command line variables to enable docker and podman Instead of having to edit the Makefile, we can now do: make DOCKER=1 or make PODMAN=1 Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	224e7734a8	Rework Makefile Instead of building each file one by one (and having to track all the dependencies manually), use the ghdl -c command that does analysis and elaboration in one go. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	4e78b8078e	Merge branch 'master' into litedram	6 years ago
Benjamin Herrenschmidt	803ee9ef35	Makefile: Improve clean a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Paul Mackerras	c164a2f4ea	Merge branch 'mmu' Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	025cf5efe8	syscon: Add syscon registers These provide some info about the SoC (though it's still somewhat incomplete and needs more work, see comments). There's also a control register for selecting DRAM vs. BRAM at 0 (and for soft-resetting the SoC but that isn't wired up yet). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Paul Mackerras	8160f4f821	Add framework for implementing an MMU This adds a new module to implement an MMU. At the moment it doesn't do very much. Tlbie instructions now get sent by loadstore1 to mmu, which sends them to dcache, rather than loadstore1 sending them directly to dcache. TLB misses from dcache now get sent by loadstore1 to mmu, which currently just returns an error. Loadstore1 then generates a DSI in response to the error return from mmu. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	982cf166dd	litedram: Add basic support for LiteX LiteDRAM This comes in two parts: - A generator script which uses LiteX to generate litedram cores along with their init files for various boards (currently Arty and Nexys-video). This comes with configs for arty and nexys_video. - A fusesoc "generator" which uses pre-generated litedram cores The generation process is manual on purpose. This includes pre-generated cores for the two above boards. This is done so that one doesn't have to install LiteX to build microwatt. In addition, the generator script or wrapper vhdl tend to break when LiteX changes significantly, which happens. This is still rather standalone and hasn't been plumbed into the SoC or the FPGA toplevel files yet. At this point LiteDRAM self-initializes using a built-in VexRiscv "Minimum" core obtained from LiteX and included in this commit. There is some plumbing to generate cores that are initialized by Microwatt directly but this isn't working yet and so isn't enabled yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Paul Mackerras	102b304db7	Merge remote-tracking branch 'remotes/origin/master'	6 years ago
Anton Blanchard	4160f2138d	Merge pull request #165 from mikey/xics Implement XICS compliant interrupt controller	6 years ago
Paul Mackerras	a05ee9fc7f	Makefile: fix typo Fix a typo which meant that the console tests weren't getting executed by 'make check'. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	167e37d667	Plumb insn_type through to loadstore1 In preparation for adding a TLB to the dcache, this plumbs the insn_type from execute1 through to loadstore1, so that we can have other operations besides loads and stores (e.g. tlbie) going to loadstore1 and thence to the dcache. This also plumbs the unit field of the decode ROM from decode2 through to execute1 to simplify the logic around which ops need to go to loadstore1. The load and store data formatting are now not conditional on the op being OP_LOAD or OP_STORE. This eliminates the inferred latches clocked by each of the bits of r.op that we were getting previously. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Michael Neuling	b4f20c20b9	XICS interrupt controller New unified ICP and ICS XICS compliant interrupt controller. Configurable number of hardware sources. Fixed hardware source number based on hardware line taken. All hardware interrupts are a fixed priority. Level interrupts supported only. Hardwired to 0xc0004000 in SOC (UART is kept at 0xc0002000). Signed-off-by: Michael Neuling <mikey@neuling.org>	6 years ago
Michael Neuling	ff162e42eb	Add VHDL TAGS Adds `make TAGS` Signed-off-by: Michael Neuling <mikey@neuling.org>	6 years ago
Michael Neuling	9d7df2d507	Add test cases for new exceptions and supervisor state This adds test cases for: - sc, illegals and decrementer exceptions - decrementer overflow - rfid - mt/mf sprg0/1 srr0/1 - mtdec - mtmsrd - sc It also adds these test cases to make check/check_light Signed-off-by: Michael Neuling <mikey@neuling.org>	6 years ago
Dan Horák	ab50c7710d	make the sources volume mount SELinux friendly Signed-off-by: Dan Horák <dan@danny.cz>	6 years ago
Anton Blanchard	471c7e2197	Consolidate VHPI code We had many copies of the VHPI marshalling/unmarshalling code. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	f77b31a552	Merge pull request #134 from paulusmack/master Add bypass from execute1 output to input	6 years ago
Anton Blanchard	c18830a5e5	Add an option to use Docker Some distros don't have a version of ghdl with the LLVM or GCC backend, so add a Docker image as an alternative. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Anton Blanchard	a4dbbfda4a	Fix Makefile dependency issue with files in vhdl/* GHDL doesn't seem to have a way to specify the location of the object file it writes, so right now they are all ending up in the root directory. The Makefile rules did not reflect that, so make would continually rebuild the files in fpga/*. Fix the rules to match what GHDL is doing. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Paul Mackerras	39d18d2738	Make divider hang off the side of execute1 With this, the divider is a unit that execute1 sends operands to and which sends its results back to execute1, which then sends them to writeback. Execute1 now sends a stall signal when it gets a divide or modulus instruction until it gets a valid signal back from the divider. Divide and modulus instructions are no longer marked as single-issue. The data formatting step that used to be done in decode2 for div and mod instructions is now done in execute1. We also do the absolute value operation in that same cycle instead of taking an extra cycle inside the divider for signed operations with a negative operand. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	2167186b5f	Make multiplier hang off the side of execute1 With this, the multiplier isn't a separate pipe that decode2 issues instructions to, but rather is a unit that execute1 sends operands to and which sends the result back to execute1, which then sends it to writeback. Execute1 now sends a stall signal when it gets a multiply instruction until it gets a valid signal back from the multiplier. This all means that we no longer need to mark the multiply instructions as single-issue. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	e4f475e17f	sprs: Store common SPRs in register file This stores the most common SPRs in the register file. This includes CTR and LR and a not yet final list of others. The register file is set to 64 entries for now. Specific types are defined that can represent a GPR index (gpr_index_t) or a GPR/SPR index (gspr_index_t) along with conversion functions between the two. In order to deal with some forms of branch updating both LR and CTR, we introduced a delayed update of LR after a branch link. Note: We currently stall the pipeline on such a delayed branch, but we could avoid stalling fetch in that specific case as we know we have a branch delay. We could also limit that to the specific case where we need to update both CTR and LR. This allows us to make bcreg, mtspr and mfspr pipelined. decode1 will automatically force the single issue flag on mfspr/mtspr to a "slow" SPR. [paulus@ozlabs.org - fix direction of decode2.stall_in] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	8e0389b973	ram: Rework main RAM interface This replaces the simple_ram_behavioural and mw_soc_memory modules with a common wishbone_bram_wrapper.vhdl that interfaces the pipelined WB with a lower-level RAM module, along with FPGA and sim variants of the latter. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	9a63c098a5	Move log2/ispow2 to a utils package (Out of icache and dcache) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	cb4451498f	dcache: Add testbench A very simple one for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	b513f0fb48	dcache: Add a dcache This replaces loadstore2 with a dcache. The dcache unit is loosely based on the icache one (same basic cache layout), but has some significant logic additions to deal with stores, loads with update, non-cachable accesses and other differences due to operating in the execution part of the pipeline rather than the fetch part. The cache is store-through, though a hit with an existing line will update the line rather than invalidate it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Paul Mackerras	f49a5a99a5	Remove execute2 stage Since the condition setting got moved to writeback, execute2 does nothing aside from wasting a cycle. This removes it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	374f4c536d	writeback: Do data formatting and condition recording in writeback This adds code to writeback to format data and test the result against zero for the purpose of setting CR0. The data formatter is able to shift and mask by bytes and do byte reversal and sign extension. It can also put together bytes from two input doublewords to support unaligned loads (including unaligned byte-reversed loads). The data formatter starts with an 8:1 multiplexer that is able to direct any byte of the input to any byte of the output. This lets us rotate the data and simultaneously byte-reverse it. The rotated/reversed data goes to a register for the unaligned cases that overlap two doublewords. Then there is per-byte logic that does trimming, sign extension, and splicing together bytes from a previous input doubleword (stored in data_latched) and the current doubleword. Finally the 64-bit result is tested to set CR0 if rc = 1. This removes the RC logic from the execute2, multiply and divide units, and the shift/mask/byte-reverse/sign-extend logic from loadstore2. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	813f834012	Add CR hazard detection To keep things simple we treat the CR as a single entity. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	bdc26b7527	Add GPR hazard detection Check GPRs against any writers in the pipeline. All instructions are still marked single in pipeline at this stage. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Anton Blanchard	e4c98dce36	Merge pull request #100 from antonblanchard/gpr-hazard-5-a Separate issue control into its own unit	7 years ago
Anton Blanchard	d5346d0abf	Separate issue control into its own unit Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Paul Mackerras	4396eddc31	countzero: Add a testbench Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	3c6e66dc96	Merge pull request #83 from paulusmack/logical execute: Consolidate count-leading/trailing-zeroes implementations	7 years ago
Anton Blanchard	4b7b702e01	Merge pull request #81 from antonblanchard/logical Consolidate logical instructions	7 years ago
Paul Mackerras	24a4a796ce	execute: Consolidate count-leading/trailing-zeroes implementations This adds combinatorial logic that does 32-bit and 64-bit count leading and trailing zeroes in one unit, and consolidates the four instructions under a single OP_CNTZ opcode. This saves 84 slice LUTs on the Arty A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Anton Blanchard	b8fb721b81	Consolidate logical instructions Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common invert on the input and output. This saves us about 200 LUTs. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
Benjamin Herrenschmidt	b56b46b7d1	icache: Set associative icache This adds support for set associativity to the icache. It can still be direct mapped by setting NUM_WAYS to 1. The replacement policy uses a simple tree-PLRU for each set. This is only lightly tested, tests pass but I have to double check that we are using the ways effectively and not creating duplicates. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	004eb074c9	plru: Add a simple PLRU module Tested in sim only for now Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Paul Mackerras	f7c393ba7e	Add a rotate/mask/shift unit and use it in execute1 This adds a new entity 'rotator' which contains combinatorial logic for rotating and masking 64-bit values. It implements the operations of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl, rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions. It consists of a 3-stage 64-bit rotator using 4:1 multiplexors at each stage, two mask generators, output logic and control logic. The insn_type_t values used for these instructions have been reduced to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask instructions (clear both left and right, clear left, clear right variants), OP_SHL for left shifts, and OP_SHR for right shifts. The control signals for the rotator are derived from the opcode and from the is_32bit and is_signed fields of the decode_rom_t. The rotator is instantiated as an entity in execute1 so that we can be sure we only have one of it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Paul Mackerras	c9e92483b8	decode: Push mtspr/mfspr register decoding down into execute1 Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops, just pass down mfspr and mtspr ops with the spr number and let execute1 decode which SPR we're addressing. This will help reduce the number of instruction bits decode1 needs to look at. In fact we now pass down the whole instruction from decode2 to execute1. We will need more bits of the instruction in future, and the tools should just optimize away any that we don't end up using. Since the 'aa' bit was just a copy of an instruction bit, we can now remove it from the record. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 years ago
Benjamin Herrenschmidt	586abb70a0	Update dependency Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	26f70264b3	Update Makefile dependencies Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	7 years ago
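
The approach described in commit b8fb721b81 ("Consolidate logical instructions") can be illustrated with a small software model. This is only a sketch of the idea, not the VHDL from the commit: the table name LOGICAL_OPS, the function logical, and the choice to apply the input invert to the second operand are assumptions made for illustration. The instruction semantics follow the Power ISA definitions (andc and orc complement the second operand; nand, nor and eqv complement the result).

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers

# Hypothetical encoding: mnemonic -> (base operation, invert input, invert output).
# Eight instructions collapse onto three base operations plus two invert flags.
LOGICAL_OPS = {
    "and":  ("and", False, False),
    "andc": ("and", True,  False),
    "nand": ("and", False, True),
    "or":   ("or",  False, False),
    "orc":  ("or",  True,  False),
    "nor":  ("or",  False, True),
    "xor":  ("xor", False, False),
    "eqv":  ("xor", False, True),
}


def logical(mnemonic, a, b):
    """Compute a 64-bit logical instruction via a shared invert-in/invert-out path."""
    base, invert_in, invert_out = LOGICAL_OPS[mnemonic]
    a &= MASK64
    b &= MASK64
    if invert_in:
        b ^= MASK64  # common invert on the (second) input
    if base == "and":
        result = a & b
    elif base == "or":
        result = a | b
    else:
        result = a ^ b
    if invert_out:
        result ^= MASK64  # common invert on the output
    return result
```

For example, logical("andc", 0b1100, 0b1010) clears the bits of the first operand that are set in the second, and logical("eqv", x, x) yields all ones for any x.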


70 Commits (a3857aac940437c18a46d518929ea7e78ac7e61e)