The fetch2 stage existed primarily to provide a stash buffer for the
output of icache when a stall occurred. However, we can get the same
effect -- of having the input to decode1 stay unchanged on a stall
cycle -- by using the read enable of the BRAMs in icache, and by
adding logic to keep the outputs unchanged on a clock cycle when
stall_in = 1. This reduces branch and interrupt latency by one
cycle.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This logs 256 bits of data per cycle to a ring buffer in BRAM. The
data collected can be read out through 2 new SPRs or through the
debug interface.
The new SPRs are LOG_ADDR (724) and LOG_DATA (725). LOG_ADDR contains
the buffer write pointer in the upper 32 bits (in units of entries,
i.e. 32 bytes) and the read pointer in the lower 32 bits (in units of
doublewords, i.e. 8 bytes). Reading LOG_DATA gives the doubleword
from the buffer at the read pointer and increments the read pointer.
Setting bit 31 of LOG_ADDR inhibits the trace log system from writing
to the log buffer, so the contents are stable and can be read.
There are two new debug addresses which function similarly to the
LOG_ADDR and LOG_DATA SPRs. The log is frozen while either or both of
the LOG_ADDR SPR bit 31 or the debug LOG_ADDR register bit 31 are set.
The buffer defaults to 2048 entries, i.e. 64kB. The size is set by
the LOG_LENGTH generic on the core_debug module. Software can
determine the length of the buffer because the length is ORed into the
buffer write pointer in the upper 32 bits of LOG_ADDR. Hence the
length of the buffer can be calculated as 1 << (31 - clz(LOG_ADDR)).
There is a program to format the log entries in a somewhat readable
fashion in scripts/fmt_log/fmt_log.c. The log_entry struct in that
file describes the layout of the bits in the log entries.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This outputs a carriage return rather than a newline after the
display of the progress count during the load and save operations.
This makes the output more compact and better looking.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently the test writes to the XICS and then checks that the
expected interrupt has happened. This turns into a stbcix
instruction followed immediately by a load from the variable that
indicates whether an interrupt has happened. It is possible for
it to take a few cycles for the store to reach the XICS and the
interrupt request signal to come back to the core, particularly
with improvements to the load/store unit and dcache.
This therefore adds a delay between storing to the XICS and
checking for the occurrence of an interrupt, so as to give the
signals time to propagate. The delay loop does an arbitrary 10
iterations, and each iteration does two loads and one store to
(cacheable) memory.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
If a bug causes an indeterminate value to be written to a GPR, an
assert causes simulation to abort. Move the assert after the report
of the GPR index and value so that we get to know what the bad value
is before the simulation terminates.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
They hurt timing forcing signals to come from the master and back
again in one cycle. Stall isn't sampled by the master unless there
is an active cycle so masking it with cyc is pointless. Masking acks
is somewhat pointless too as we don't handle early dropping of cyc
in any of our slaves properly anyways.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It will look for an ELF binary at the flash offset specified
for the board (currently 0x300000 on Arty but that could be
changed).
Note: litedram is regenerated in order to rebuild the init code,
which was done using a newer version of litedram from LiteX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This require the s25fl128s.vhd flash model and FMF libraries,
which will be built when passed to the Makefile via the
FLASH_MODEL_PATH argument. Otherwise a dummy module is used
which ties MISO to '1'.
The model isn't included as I'm not sure its licence (GPL) is
at this point, but it can be obtained from
https://github.com/ozbenh/microspi
FLASH_MODEL_PATH=<path to microspi>/model
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This stores the output of the PLRU big mux and clears the
tags and valid bits on the next cycle.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds an SPI flash controller which supports direct
memory-mapped access to the flash along with a manual
mode to send commands.
The direct mode can be set via generic to default to single
wire or quad mode. The controller supports normal, dual and quad
accesses with configurable commands, clock divider, dummy clocks
etc...
The SPI clock can be an even divider of sys_clk starting at 2
(so max 50Mhz with our typical Arty designs).
A flash offset is carried via generics to syscon to tell SW about
which portion of the flash is reserved for the FPGA bitfile. There
is currently no plumbing to make the CPU reset past that address (TBD).
Note: Operating at 50Mhz has proven unreliable without adding some
delay to the sampling of the input data. I'm working in improving
this, in the meantime, I'm leaving the default set at 25 Mhz.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There is a long timing path to generate the ack signal from
the L2 cache as it's fully combinational for stores, including
signals coming from litedram.
Instead, pipeline the store acks. This will introduce a cycle
latency but should improve timing. Also the core will eventually
be smart enough not to wait for store acks to complete them anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The DRAM related pins have some small changes in LiteX, so resync
and add the false path information as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This breaks the long stall signal coming back to the processor
and helps improve overall timing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, there's a huge mux gathering the output of all the PLRUs
to select the victim way on cache miss. This is fed combinationally
into the clearing of the valid and tags.
In order to help timing, let's store it instead and perform the
clearing on the next cycle. The L2 doesn't respond to requests
when not in IDLE state so this should have no negative effects.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Github workflow gives us longer run times and faster startup.
Major kudos for this goes to @eine for the initial version and for
pushing us in this direction.
Signed-off-by: Michael Neuling <mikey@neuling.org>
litedram build directory used by the generator and the
verilator obj_dir can be taken out
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We still had some wires bringing an extra serial port out of
litedram for the built-in riscv processor. This is all gone now
so take them out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some fields might get extended with extra bits, use the appropriate
masks when reading the values.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The rewrite of the Makefile to use "ghdl -c" somewhat broke building
the unisim library as ghdl doesn't yet support putting files in
separate libraries from a single command line invocation.
The workaround at the time was to put the entire project in "unisim"
which is ... weird and will break if we try to add another library
such as fmf.
This fixes it by generating the library separately using "ghdl -i"
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The changes in d3c274d01e ("flash-arty: Add support for specifying the file type")
added a local jtagspi.cfg, which meant openocd must be run from the root
of the microwatt directory.
This puts the content into the xilinx-xc7.cfg so the script can be used
from any path again.
Signed-off-by: Joel Stanley <joel@jms.id.au>
icbi currently just resets the icache. This has some nasty side
effects such as also clearing the TLB, but also the wishbone interface.
That means that any ongoing cycle will be dropped.
However, most of our slaves don't handle that well and will continue
sending acks for already issued requests.
Under some circumstances we can thus restart an icache load and get
spurious ack/data from the wishbone left over from the "cancelled"
sequence.
This has broken booting Linux for me.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These were missed earlier when the single-issue flag was turned off on
the other loads and stores by commit 1a244d3470 ("Remove single-issue
constraint for most loads and stores").
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We don't run these but we should.
The SOC tests have bit rotted. We need to fix them but leave them out
for now.
Signed-off-by: Michael Neuling <mikey@neuling.org>
This increases the number of L2 lines from 32 to 64. The BRAM usage is the
same as they were only half used. There's an increase in LUTs and registers
due to the extra tags and valid bits, but none of it should be in a
space constrained or critical timing path.
We could make it wider instead (256 bytes lines) which would reduce usage
instead, but this increases the latency by 8 cycles. Something to consider
once the L2 is capable of early response on miss and starting reloads
from any point in a line.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
By adding logic to decode2 to be able to send the instruction address
down the A input, and making CONST_DX_HI (renamed to CONST_DXHI4) add
4 to the immediate value (easy since the bottom 16 bits were zero),
we can do addpcis using the main adder. This reduces the width of the
result mux and frees up one value in insn_type_t, since we can now use
OP_ADD for addpcis.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Support for this has bitrotted and would require refactoring of L2 to
be brought back. It's also not really needed anymore now that we ship
pre-generated litedram and that LiteX supports what we do.
So take it out, which simplifies some of the scripts as well. This also
fixes up CSR alignment the sim model.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>