Make the core go faster
Several major improvements in here:
- Simple branch predictor
- Reduced latency for mispredicted branches and interrupts by removing fetch2 stage
- Cache improvements
o Request critical dword first on refill
o Handle hits while refilling, including on line being refilled
o Sizes doubled for both D and I
- Loadstore improvements: can now do one load or store every two cycles in most cases
- Optimized 2-cycle multiplier for Xilinx 7-series parts using DSP slices
- Timing improvements, including:
o Stash buffer in decode1
o Reduced width of execute1 result mux
o Improved SPR decode in decode1
o Some non-critical operation take a cycle longer so we can break some long combinatorial chains
- Core logging: logs 256 bits of info every cycle into a ring buffer, to help with debugging and performance analysis
This increases the LUT usage for the "synth" + A35 target from 9182 to 10297 = 12%.
This plumbs the LOG_LENGTH parameter (which controls how many entries
the core log RAM has) up to the top level so that it can be set on
the fusesoc command line and have different default values on
different FPGAs.
It now defaults to 512 entries generally and on the Artix-7 35 parts,
and 2048 on the larger Artix-7 FPGAs. It can be set to 0 if desired.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Use a separate process to assign selected interrupts to the
interrupt array, and document them.
There's only one interrupt *for now* but that will change
and this way is clearer.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This makes the control bus currently going out of "soc" towards
litedram more generic for external IO devices added by the
top-level rather than inside the SoC proper.
This is mostly renaming of signals and a small change on how the
address decoder operates, using a separate "cascaded" decode for
the external IOs.
We make the region 0xc8nn_nnnn be the "external IO" region for
now.
This will make it easier / cleaner to add more external devices.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, when not using litedram, the top level still has to hook
up "dummy" wishbones to the main dram and control dram busses coming
out of the SoC and provide ack signals.
Instead, make the SoC generate the acks internally when not using
litedram and use defaults to make the wiring entirely optional.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
That way the top-level's don't need to assign them
Also remove generics that are set to the default anyways
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
They hurt timing forcing signals to come from the master and back
again in one cycle. Stall isn't sampled by the master unless there
is an active cycle so masking it with cyc is pointless. Masking acks
is somewhat pointless too as we don't handle early dropping of cyc
in any of our slaves properly anyways.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an SPI flash controller which supports direct
memory-mapped access to the flash along with a manual
mode to send commands.
The direct mode can be set via generic to default to single
wire or quad mode. The controller supports normal, dual and quad
accesses with configurable commands, clock divider, dummy clocks
etc...
The SPI clock can be an even divider of sys_clk starting at 2
(so max 50Mhz with our typical Arty designs).
A flash offset is carried via generics to syscon to tell SW about
which portion of the flash is reserved for the FPGA bitfile. There
is currently no plumbing to make the CPU reset past that address (TBD).
Note: Operating at 50Mhz has proven unreliable without adding some
delay to the sampling of the input data. I'm working in improving
this, in the meantime, I'm leaving the default set at 25 Mhz.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an option to disable the main BRAM and instead copy a
payload stashed along with the init code in the secondary BRAM
into DRAM and boot from there
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Use a simple wire. common.vhdl types are better kept for things
local to the core. We can add more wires later if we need to for
HV irqs etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This changes the SoC interconnect such that the main 64-bit wishbone out
of the processor is first split between only 3 slaves (BRAM, DRAM and a
general "IO" bus) instead of all the slaves in the SoC.
The IO bus leg is then latched and down-converted to 32 bits data width,
before going through a second address decoder for the various IO devices.
This significantly reduces routing and timing pressure on the main bus,
allowing to get rid of frequent timing violations when synthetizing on
small'ish FPGAs such as the Artix-7 35T found on the original Arty board.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds one-cycle latches to the various resets out of the soc and
into the various core modules. It *seems* to help vivado P&R a bit
and has shown to avoid timing violations under some circumstances.
Interestingly those resets never seem to appear in the bad timing
path. It looks like those long resets simply impose placement
constraints that Vivado satisfies at the expense of timing elsewhere.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Things have changed a bit in upstream LiteX. LiteDRAM now exposes a
wishbone for the CSRs for example.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The ghdl packaged in Fedora 31 doesn't like a port map of the form
"rst => rst or core_reset", so this works around the problem by
doing the OR in a separate statement.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds support for initializing the memory controller from microwatt
rather than using a built-in RiscV processor. This might require some
fixes to LiteX and LiteDRAM (they haven't been merged as of this commit
yet).
This is enabled in the shipped generated files and can be changed via
modifying the generator script to pass False to "mw_init"
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These provides some info about the SoC (though it's still somewhat
incomplete and needs more work, see comments).
There's also a control register for selecting DRAM vs. BRAM at 0
(and for soft-resetting the SoC but that isn't wired up yet).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
An external signal can control whether the core will start
executing at the standard or the alternate reset address.
This will be used when litedram is initialized by microwatt
itself, to route the reset to the built-in init code secondary
block RAM.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
New unified ICP and ICS XICS compliant interrupt controller.
Configurable number of hardware sources.
Fixed hardware source number based on hardware line taken. All
hardware interrupts are a fixed priority. Level interrupts supported
only.
Hardwired to 0xc0004000 in SOC (UART is kept at 0xc0002000).
Signed-off-by: Michael Neuling <mikey@neuling.org>
This replaces the simple_ram_behavioural and mw_soc_memory modules
with a common wishbone_bram_wrapper.vhdl that interfaces the
pipelined WB with a lower-level RAM module, along with an FPGA
and a sim variants of the latter.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Vivado by default tries to flatten the module hierarchy to improve
placement and timing. However this makes debugging timing issues
really hard as the net names in the timing report can be pretty
bogus.
This adds a generic that can be used to control attributes to stop
vivado from flattening the main core components. The resulting design
will have worst timing overall but it will be easier to understand
what the worst timing path are and address them.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For now ... it reduces the routing pressure on the FPGA
This needs manual adjustment of the address decoder in soc.vhdl, at
least until I can figure out how to deal with std_match
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
# Conflicts:
# soc.vhdl
# Conflicts:
# soc.vhdl
The current scheme has UART0 repeating throughout the UART address range.
This patch tightens the address checking so that it only occurs once.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
This module adds some simple core controls:
reset, stop, start, step
along with icache clear and reading the NIA and core
status bits
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org
This adds a debug module off the DMI (debug) bus which can act as a
wishbone master to generate read and write cycles.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds a simple bus that can be mastered from an external
system via JTAG, which will be used to hookup various debug
modules.
It's loosely based on the RiscV model (hence the DMI name).
The module currently only supports hooking up to a Xilinx BSCANE2
but it shouldn't be too hard to adapt it to support different TAPs
if necessary.
The JTAG protocol proper is not exactly the RiscV one at this point,
though I might still change it.
This comes with some sim variants of Xilinx BSCANE2 and BUFG and a
test bench.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>