Commit Graph

10 Commits (7575b1e0c2b1c21847ed3103185858d1a512ead6)

Author SHA1 Message Date
Paul Mackerras 6701e7346b core: Use a busy signal rather than a stall
This changes the instruction dependency tracking so that we can
generate a "busy" signal from execute1 and loadstore1 which comes
along one cycle later than the current "stall" signal.  This will
enable us to signal busy cycles only when we need to from loadstore1.

The "busy" signal from execute1/loadstore1 indicates "I didn't take
the thing you gave me on this cycle", as distinct from the previous
stall signal which meant "I took that but don't give me anything
next cycle".  That means that decode2 proactively gives execute1
a new instruction as soon as it has taken the previous one (assuming
there is a valid instruction available from decode1), and that then
sits in decode2's output until execute1 can take it.  So instructions
are issued by decode2 somewhat earlier than they used to be.

Decode2 now only signals a stall upstream when its output buffer is
full, meaning that we can fill up bubbles in the upstream pipe while a
long instruction is executing.  This gives a small boost in
performance.

This also adds dependency tracking for rA updates by update-form
load/store instructions.

The GPR and CR hazard detection machinery now has one extra stage,
which may not be strictly necessary.  Some of the code now really
only applies to PIPELINE_DEPTH=1.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Benjamin Herrenschmidt 31b55b2a75 core: Improve core reset
The icache would still spit out an instruction which could
cause a 0x700 instead of a reset.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Paul Mackerras b14d982011 execute: Implement bypass from output of execute1 to input
This enables back-to-back execution of integer instructions where
the first instruction writes a GPR and the second reads the same
GPR.  This is done with a set of multiplexers at the start of
execute1 which enable any of the three input operands to be taken
from the output of execute1 (i.e. r.e.write_data) rather than the
input from decode2 (i.e. e_in.read_data[123]).

This also requires changes to the hazard detection and handling.
Decode2 generates a signal indicating that the GPR being written
is available for bypass, which is true for instructions that are
executed in execute1 (rather than loadstore1/dcache).  The
gpr_hazard module stores this "bypassable" bit, and if the same
GPR needs to be read by a subsequent instruction, it outputs a
"use_bypass" signal rather than generating a stall.  The
use_bypass signal is then latched at the output of decode2 and
passed down to execute1 to control the input multiplexer.

At the moment there is no bypass on the inputs to loadstore1, but that
is OK because all load and store instructions are marked as
single-issue.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard 1c05f330c6 control: Fix build issue with Fedora 31 version of GHDL
I'm hitting an issue with the Fedora 31 version of GHDL that
appears to be fixed upstream:

control.vhdl:105:39:error: actual expression must be globally static

Add a signal to get rid of error.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Benjamin Herrenschmidt e4f475e17f sprs: Store common SPRs in register file
This stores the most common SPRs in the register file.

This includes CTR and LR and a not yet final list of others.

The register file is set to 64 entries for now. Specific types
are defined that can represent a GPR index (gpr_index_t) or
a GPR/SPR index (gspr_index_t) along with conversion functions
between the two.

On order to deal with some forms of branch updating both LR and
CTR, we introduced a delayed update of LR after a branch link.

Note: We currently stall the pipeline on such a delayed branch,
but we could avoid stalling fetch in that specific case as we
know we have a branch delay. We could also limit that to the
specific case where we need to update both CTR and LR.

This allows us to make bcreg, mtspr and mfspr pipelined. decode1
will automatically force the single issue flag on mfspr/mtspr to
a "slow" SPR.

[paulus@ozlabs.org - fix direction of decode2.stall_in]

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Benjamin Herrenschmidt 98bd8b73c0 control: Reduce pipeline depth to 1
To match our one stage execute.

This might change back if we end up adding 2 stages to match the
LSU, but in that case we'll want forwards as well.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Anton Blanchard 813f834012 Add CR hazard detection
To keep things simple we treat the CR as a single entity.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard bb65d0b899 Remove issue restrictions on a number of instructions
Anything that isn't a load or store and anything that doesn't read the
CR can go as soon as its inputs are ready.

While we could also allow SPR read/write and carry read/write, we plan
to change them to be read in decode2 and written in writeback soon and
they will need separate hazard detection to be added.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard bdc26b7527 Add GPR hazard detection
Check GPRs against any writers in the pipeline.

All instructions are still marked single in pipeline at
this stage.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard d5346d0abf Separate issue control into its own unit
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago