While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
decode1 has a lot of logic that uses i_out.insn without first looking at
i_out.valid. Play it safe and never output X state.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
While we should only look at this when d_out.valid = 1, we may as well
remove some U state across interfaces.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
While trying to reduce U/X state issues, I noticed that our BSS is not
being initialised in the hello world test.
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
These instructions are similar to those at
https://ozlabs.org/~joel/microwatt/README
except they describe how to build the artifacts from scratch instead of
downloading them.
Signed-off-by: Joel Stanley <joel@jms.id.au>
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when setting up fusesoc.
Also remove the text about a working directory; fusesoc doesn't need
one.
Signed-off-by: Joel Stanley <joel@jms.id.au>
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.
Every other user of log2ceil() gets this right.
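
As an illustration of the off-by-one, here is a minimal Python sketch. It
assumes, as described above, that log2ceil() returns the number of bits
needed to store its argument; `bits_needed` is a hypothetical stand-in for
that behaviour, not the VHDL function itself.

```python
def bits_needed(value: int) -> int:
    """Number of bits required to store a non-negative value (hypothetical
    stand-in for the log2ceil() behaviour described in this commit)."""
    return value.bit_length()


memory_size = 4096  # example size in bytes; valid addresses are 0..4095

# Passing memory_size asks for enough bits to store the value 4096 itself.
too_wide = bits_needed(memory_size)

# Passing memory_size - 1 asks for enough bits to store the highest address.
just_right = bits_needed(memory_size - 1)

print(too_wide, just_right)
```

For a power-of-two memory_size the first form is one bit wider than the
address range requires.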
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Set the icache to be privileged and to access physical memory directly.
Also set big_endian to 0 to match the testbench result.
Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
Revert to linking dynamically by default. To link statically, use
`make STATIC_URJTAG=1`.
Fixes #351
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
We've had these for a while now:
- D/I cache
- GPR bypassing
- Supervisor state (and can boot Linux)
We still need Vector/VMX/VSX (and probably some other things)
Signed-off-by: Michael Neuling <mikey@neuling.org>
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device. LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.
Updates from Mikey to fix some compile issues.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority. This reduces LUT utilization.
The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.
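
The four steps above can be sketched in software as follows. This is only
a behavioural model in Python, not the VHDL: the function names, the
(pending, priority) tuple layout, and the default of 8 priority levels are
assumptions made for illustration.

```python
def priority_encode(bits):
    """Return the index of the first set entry, or None if none is set."""
    for index, is_set in enumerate(bits):
        if is_set:
            return index
    return None


def select_interrupt(sources, num_priorities=8):
    """sources[i] is a (pending, priority) tuple for source number i.

    Returns (source, priority) for the most favoured pending interrupt,
    or None when nothing is pending.
    """
    # Step 1: one bit per priority, set if any source is pending there.
    prio_pending = [False] * num_priorities
    for pending, prio in sources:
        if pending:
            prio_pending[prio] = True

    # Step 2: the most favoured priority is the lowest number with a bit set.
    best_prio = priority_encode(prio_pending)
    if best_prio is None:
        return None

    # Step 3: one bit per source, set if pending at the selected priority.
    src_pending = [pending and prio == best_prio for pending, prio in sources]

    # Step 4: the lowest numbered source pending at that priority.
    best_src = priority_encode(src_pending)
    return best_src, best_prio


example = [(False, 0), (True, 3), (True, 1), (True, 1)]
print(select_interrupt(example))
```

In the example, sources 2 and 3 are both pending at priority 1, so source
2 is selected.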
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This implements an alternative count-leading-zeroes algorithm which
uses fewer LUTs to generate the higher-order bits (2..5) of the
result.
By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB. This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits. Bit 4 of the result requires
looking at bits 63, 47, 31 and 15. In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.
It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5. This saves almost 60 6-input LUTs on the Artix-7.
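
The bit selection described above can be checked with a small Python
model. This is a sketch, not the VHDL in helpers.vhdl: it computes only
the index of the rightmost 1 bit of a 64-bit v, the function names are
hypothetical, and returning 64 for v = 0 is an assumption.

```python
MASK64 = (1 << 64) - 1


def bit(x: int, i: int) -> int:
    """Return bit i of x as 0 or 1."""
    return (x >> i) & 1


def rightmost_one_index(v: int) -> int:
    """Index of the rightmost 1 bit in a 64-bit value v."""
    v &= MASK64
    if v == 0:
        return 64  # assumption: the all-zero case is handled separately
    neg = (-v) & MASK64
    onehot = v & neg  # a single 1 at the rightmost 1 bit of v
    smear = v | neg   # ones from bit 63 down to that bit, zeroes below

    result = 0
    # Bits 0 and 1 of the result: OR together the relevant one-hot bits.
    if onehot & 0xAAAAAAAAAAAAAAAA:
        result |= 1
    if onehot & 0xCCCCCCCCCCCCCCCC:
        result |= 2

    # Bits 2..5: only a few bits of (v | -v) are examined per result bit.
    # Bit 5 looks at bits 63 and 31; bit 4 at bits 63, 47, 31 and 15.
    for b in range(2, 6):
        w = 1 << b
        hit = 0
        for base in range(0, 64, 2 * w):
            top = bit(smear, base + 2 * w - 1)  # 1 if index <= top position
            mid = bit(smear, base + w - 1)      # 0 if index > mid position
            hit |= top & (mid ^ 1)
        result |= hit << b
    return result


print(rightmost_one_index(0x0000_0100_0000_0000))
```

Each window test asks whether the rightmost 1 falls in the upper half of
a 2w-bit block, which is exactly when bit b of its index is set.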
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.). Effectively
this is done by moving the address decoding into the slave_io_latch
process. The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.
With this, timing is considerably better on the A7-100T.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
"-b ecp5" selects the ECP5 interface, which talks to a JTAGG
primitive.
For example, with an FT232H JTAG board:
./mw_debug -t 'ft2232 vid=0x0403 pid=0x6014' -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a ..Welcom
00000000ff003890: 63694d206f742065 e to Mic
00000000ff003898: 2120747461776f72 rowatt !
00000000ff0038a0: 0000000000000a0a ........
00000000ff0038a8: 67697320636f5320 Soc sig
00000000ff0038b0: 203a65727574616e nature:
Core: running
NIA: c0000000000187f8
MSR: 9000000000001033
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087
Tested on an OrangeCrab with a 48MHz sysclk and an FT232H at up to
30MHz (though libusb/urjtag is by far the bottleneck vs the JTAG clock).
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache. This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle. In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.
However, this logic adds complexity, and removing it improves timing,
so this removes it.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>