Commit Graph

1161 Commits (602ba25c7070665d2ac99e9d6de3897ae2f2c3e7)
 

Author SHA1 Message Date
Anton Blanchard b47b71821e loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 71d4b5ed20 core_debug: Initialise gspr_index
Another case of U state being driven out of a module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard a527d9b959 core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard e7f0a7c7ac icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_iout.valid. Play it safe and never output X state.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 39220be311 dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 843361f2be execute1: sub_mux_sel and result_mux_sel are unused
Remove them.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard d3a7517318 divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as remove
some U state across interfaces.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 1ff852b012
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
3 years ago
Anton Blanchard e2438071a1 loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store completes.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard b7c4d3c5c3
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
3 years ago
Anton Blanchard f06abb67ad icache: Hook up PMU events
We weren't connecting the icache PMU events up.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 64d2def0c6 fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard ff442d1bdb Zero BSS in hello world test
While trying to reduce U/X state issues, I notice that our BSS is not
being initialised in the hello world test.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard b8fc5636a4
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
3 years ago
Anton Blanchard ebdddcc402 Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard a750365ffa Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Joel Stanley 9ec22af256 README: Add Linux on Microwatt instructions
These instructions are similar to those at

 https://ozlabs.org/~joel/microwatt/README

except they describe how to build the artifacts from scratch instead of
downloading them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Joel Stanley a31725d989 README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when seetting up fusesoc.

Also remove the text about a working directory; fusesoc doesn't need
one.

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Michael Neuling f5e06c2d4b
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
3 years ago
Anton Blanchard 948f6f43a7 Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 8bf48ac094
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
3 years ago
Anton Blanchard b5accb78b2 wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.

Every other user of log2ceil() gets this right.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 30fd936c12
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
3 years ago
Michael Neuling af1b76d944
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
3 years ago
Michael Neuling 9b96ab730c
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
3 years ago
Anton Blanchard 0b39947f8d Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 00bf0af21c xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 50b4cb9423 fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Tianrui Wei 844ca0e6b5
fix: fix icache_tb not finishing correctly
Setting icache to be privileged and accessing physical memory directly.
And set big_endian to 0 to correspond to the testbench result.

Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
3 years ago
Michael Neuling f01f3d233a
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
3 years ago
Matt Johnston c0c00d05bc mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`

Fixes #351

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Michael Neuling ffcdaaa92d
Update the README Issues (#350)
We've had these for a while now:
 - D/I cache
 - GPR bypassing
 - Supervisor state (and can boot linux)

We still need Vector/VMX/VSX (and probably some other things)

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling b4770197a2
Merge pull request #349 from madscientist159/master
Extend LiteDRAM VHDL wrapper to allow more than one clock line
3 years ago
Raptor Engineering Development Team fcb783a0fb Extend LiteDRAM VHDL wrapper to allow more than one clock line
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device.  LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.

Updates from Mikey to fix some compile issues.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 2b97fb0bf3
Merge pull request #348 from paulusmack/reduce
Reduce LUT usage
3 years ago
Paul Mackerras 0aa898c7a6 xics: Rework the irq_gen process
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority.  This reduces LUT utilization.

The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1720a0584a Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1086988883 countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses less LUTs to generate the higher-order bits (2..5) of the
result.

By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB.  This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits.  Bit 4 of the result requires
looking at bits 63, 47, 31 and 15.  In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.

It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5.  This saves almost 60 6-input LUTs on the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 4cf2921b0b soc: Re-do peripheral address decode to improve timing
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.).  Effectively
this is done by moving the address decoding into the slave_io_latch
process.  The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.

With this timing is considerably better on the A7-100T.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 27b660ef76
Merge pull request #346 from mkj/dmi_ecp5
Add DMI and mw_debug for ECP5
3 years ago
Anton Blanchard 5a5a082601
Merge pull request #343 from mikey/orange-crab-ci
ci: Add new Orange Crab build
3 years ago
Matt Johnston 9c64f8a98b mw_debug: Add Lattice ECP5 support
"-b ecp5" will select ECP5 interface that talks to a JTAGG
primitive.

For example with a FT232H JTAG board:

./mw_debug  -t 'ft2232 vid=0x0403 pid=0x6014'  -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a  ..Welcom
00000000ff003890: 63694d206f742065  e to Mic
00000000ff003898: 2120747461776f72  rowatt !
00000000ff0038a0: 0000000000000a0a  ........
00000000ff0038a8: 67697320636f5320   Soc sig
00000000ff0038b0: 203a65727574616e  nature:
Core: running
 NIA: c0000000000187f8
 MSR: 9000000000001033

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 3775650df3 dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087

Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston eb20195a10 mw_debug: Link urjtag statically
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 763138798e mw_debug: use isxdigit for hex arguments
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 04cc4a842c mw_debug: Add -s frequency argument
Chose -s for speed, vs -f for --force

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston e05ae0c8cb mw_debug: pass target parameters to urjtag
An example

./mw_debug -d -t 'ft2232 vid=0x0403 pid=0x6014'

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Paul Mackerras 49ec80ac3e fetch1/icache1: Remove the use_previous logic
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache.  This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle.  In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.

However, this logic adds complexity, and removing it improves timing,
so this removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras cef3660e74
Merge pull request #345 from antonblanchard/popcnt-go-fast
popcnt* timing improvements from Paul
3 years ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago