microwatt

Commit Graph

Author	SHA1	Message	Date
Anton Blanchard	f5ca58b3c4	Merge pull request #123 from antonblanchard/spi-conf Add SPI configuration to Xilinx constraint files	6 years ago
Anton Blanchard	20674e0d65	Add SPI configuration to Xilinx constraint files Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	6 years ago
Paul Mackerras	23ade0b1c3	decode2: Minor cleanup Remove unused variable is_reg in decode_input_reg_a. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	e4f475e17f	sprs: Store common SPRs in register file This stores the most common SPRs in the register file. This includes CTR and LR and a not yet final list of others. The register file is set to 64 entries for now. Specific types are defined that can represent a GPR index (gpr_index_t) or a GPR/SPR index (gspr_index_t) along with conversion functions between the two. In order to deal with some forms of branch updating both LR and CTR, we introduced a delayed update of LR after a branch link. Note: We currently stall the pipeline on such a delayed branch, but we could avoid stalling fetch in that specific case as we know we have a branch delay. We could also limit that to the specific case where we need to update both CTR and LR. This allows us to make bcreg, mtspr and mfspr pipelined. decode1 will automatically force the single issue flag on mfspr/mtspr to a "slow" SPR. [paulus@ozlabs.org - fix direction of decode2.stall_in] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	afdd593502	spr: Add translation from SPR to special GPR number We will want to store some SPRs in the register file using a set of "extra" registers. This provides a function for doing the translation along with some SPR definitions. This isn't used yet Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	5a0458dec1	divider: Fix overflow calculation We were signalling overflow when neg_result=1 but the result was zero. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	d04887fdcd	decode1: Add OE=1 forms of add/sub, mul and div instructions Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Paul Mackerras	ec9b27660f	execute: Copy XER[SO] to CR for cmp[i] and cmpl[i] instructions We were copying in XER[SO] for the dot-form instructions but not the explicit compare instructions. Fix this. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	501b6daf9b	Add basic XER support The carry is currently internal to execute1. We don't handle any of the other XER fields. This creates a type called "xer_common_t" that contains the commonly used XER bits (CA, CA32, SO, OV, OV32). The value is stored in the CR file (though it could be a separate module). The rest of the bits will be implemented as a separate SPR and the two parts reconciled in mfspr/mtspr in later commits. We always read XER in decode2 (there is little point not to) and send it down all pipeline branches as it will be needed in writeback for all types of instructions when CR0:SO needs to be updated (such forms exist for all pipeline branches even if we don't yet implement them). To avoid having to track XER hazards, we forward it back in EX1. This assumes that other pipeline branches that can modify it (mult and div) are running single issue for now. One additional hazard to beware of is an XER:SO modifying instruction in EX1 followed immediately by a store conditional. Due to our writeback latency, the store will go down the LSU with the previous XER value, thus the stcx. will set CR0:SO using an obsolete SO value. I doubt there exists any code relying on this behaviour being correct but we should account for it regardless, possibly by ensuring that stcx. remain single issue initially, or later by adding some minimal tracking or moving the LSU into the same pipeline as execute. Missing some obscure XER affecting instructions like addex or mcrxrx. [paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of arguments to set_ov] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	f291efa266	decode1: Mark ALU ops using carry as pipelined There is no reason not to that I can think of Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Benjamin Herrenschmidt	1249a11349	cr_file: Check write_cr_enable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	6 years ago
Anton Blanchard	ac7df6fc04	Merge pull request #120 from antonblanchard/spr-decode-cleanup spr: Cleanup decoding of SPR numbers	6 years ago
Anton Blanchard	726e4db66a	Merge pull request #119 from antonblanchard/reduce-pipe-depth control: Reduce pipeline depth to 1	6 years ago
Anton Blanchard	9b1394e236	Merge pull request #118 from antonblanchard/bus-pipeline Bus pipeline	6 years ago
Benjamin Herrenschmidt	98bd8b73c0	control: Reduce pipeline depth to 1 To match our one stage execute. This might change back if we end up adding 2 stages to match the LSU, but in that case we'll want forwards as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	83a8bb0238	spr: Cleanup decoding of SPR numbers Use a function to obtain the integer number and use constants with the architected numbers. Replace std_match with a case statement. This also has the side effect of returning 0 instead of some random previous result on mfspr of an unknown SPR. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	6 years ago
Benjamin Herrenschmidt	cff4b13a9b	wb_arbiter: Early master selection This flips the arbiter muxes on the same cycle as a new request comes in, thus avoiding a cycle latency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	bc2acfde2f	wb_arbiter: Make arbiter size parametric Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	472d8f94a2	wb_arbiter: Avoid IDLE cycle when not changing master Consecutive accesses from the same master shouldn't need an IDLE cycle. Completely remove the IDLE state and switch master when the bus is idle, but stay on the last selected one between cycles. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	336f0e0690	ram: Ack stores early Stores only need a single cycle, so we can ack them early if there isn't an older ack already in the pipeline Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	8e0389b973	ram: Rework main RAM interface This replaces the simple_ram_behavioural and mw_soc_memory modules with a common wishbone_bram_wrapper.vhdl that interfaces the pipelined WB with a lower-level RAM module, along with an FPGA and a sim variants of the latter. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	9a63c098a5	Move log2/ispow2 to a utils package (Out of icache and dcache) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	3349bdc798	ram: Add block RAM pipelining This adds an output buffer to help with timing and allows the BRAMs to actually pipeline. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	797b1bb045	decode: Reformat decode_types.vhdl Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	d2762e70e5	Add option to not flatten hierarchy Vivado by default tries to flatten the module hierarchy to improve placement and timing. However this makes debugging timing issues really hard as the net names in the timing report can be pretty bogus. This adds a generic that can be used to control attributes to stop Vivado from flattening the main core components. The resulting design will have worse timing overall but it will be easier to understand what the worst timing paths are and address them. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	48f260761b	writeback: Slightly improve timing The CR update currently depends on the complete data formatting mux chain. This makes it source its inputs from a bit earlier in the chain, thus improving timing a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	365f60b693	simple_ram: Turn on pipelining With a 1 cycle delay Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	c22734d0d9	wb_debug: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	3df018cdc0	icache: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	d363daa692	dcache: Add wishbone pipelining support Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	e638c3e8ae	fpga/bram: Generate stall signal This doesn't yet pipeline the block RAM, just generate a valid stall signal so it's compatible with a pipelined master Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	37acb35773	simple_ram: Add pipelining support The generic PIPELINE_DEPTH can be set to 0 to keep it operating as a non-pipelined slave, or a larger value indicating the amount of extra cycles between requests and acks. It will always generate a valid stall signal, so it can be used in either mode with a pipelined master (but only in non-pipelined mode with a non-pipelined master). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	df1a9237f6	intercon: Generate stall signals for non-pipelined slaves So far the UART and the "miss" case. Memory will be pipelined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	7a4a9b6377	wb_arbiter: Forward stall signals They are set to '1' for non-selected devices Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	b1424e859e	icache_tb: Initialize stop_mark Too much red in gtkwave.. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	79101041d6	wishbone: Add stall signal Pipelined wishbone needs it Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	559b3bcf2d	pp_uart: reformat Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Anton Blanchard	9620a76281	Merge pull request #115 from antonblanchard/reduce-wishbone Reduce wishbone	7 years ago
Anton Blanchard	247d7d4aa0	Merge pull request #113 from mikey/exec-sim-remove Remove SIM generic from execute1	7 years ago
Anton Blanchard	1b6c246379	Merge pull request #114 from antonblanchard/dcache Dcache from Ben	7 years ago
Michael Neuling	bd4ac06243	Remove SIM generic from execute1 This does nothing, so remove. Signed-off-by: Michael Neuling <mikey@neuling.org>	7 years ago
Benjamin Herrenschmidt	6dd0b514ac	Reduce wishbone address size to 32-bit For now ... it reduces the routing pressure on the FPGA This needs manual adjustment of the address decoder in soc.vhdl, at least until I can figure out how to deal with std_match Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> # Conflicts: # soc.vhdl	7 years ago
Benjamin Herrenschmidt	1a63c39704	Make it possible to change wishbone address size All that needs to be changed now is the size in wishbone_types.vhdl and the address decoder in soc.vhdl Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	cb4451498f	dcache: Add testbench A very simple one for now... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	742b21480e	insn: Simplistic implementation of icbi We don't yet have a proper snooper for the icache, so for now make icbi just flush the whole thing Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	a0d95e791e	insn: Implement isync instruction The instruction works by redirecting fetch to nia+4 (hopefully using the same adder used to generate LR) and doing a backflush. Along with being single issue, this should guarantee that the next instruction only gets fetched after the pipe's been emptied. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	6e0ee0b0db	icache & dcache: Fix store way variable We used the variable "way" in the wrong state in the cache when updating a line valid bit after the end of the wishbone transactions, we need to use the latched "store_way". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	587a5e3c45	dcache: Cleanup (mostly cosmetic) Clearly separate the 2 stages of load hits, improve naming and comments, clarify the writeback controls etc... Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	265fbf894b	icache/dcache: Make both caches 32 lines, 2 ways Adding lines seems to add only little extra as the BRAMs aren't full, 2 ways is our current compromise to limit pressure on small FPGAs. We could go to 64 lines for a little more, but timing is becoming a bit too tight for my liking on the tags/LRU path of the icache, so let's leave it at 32 for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago
Benjamin Herrenschmidt	174378b190	dcache: Introduce an extra cycle latency to make timing This makes the BRAMs use an output buffer, introducing an extra cycle latency. Without this, Vivado won't make timing at 100 MHz. We stash all the necessary response data in delayed latches, the extra cycle is NOT a state in the state machine, thus it's fully pipelined and doesn't involve stalling. This introduces an extra non-pipelined cycle for loads with update to avoid collision on the writeback output between the now delayed load data and the register update. We could avoid it by moving the register update in the pipeline bubble created by the extra update state, but it's a bit trickier, so I leave that for a later optimization. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	7 years ago

501 Commits (7bc118c7db9aa003301a903002acfc32cec04ed7)