Compare commits

...

148 Commits

Author SHA1 Message Date
Paul Mackerras 35e0dbed34
Merge pull request #353 from tianrui-wei/master
fix: fix icache_tb not finishing correctly
2 years ago
Michael Neuling cd52390bf1
Merge pull request #373 from antonblanchard/icache-insn-u-state
icache: Don't output X on i_out.insn
2 years ago
Michael Neuling b983d5080e
Merge pull request #376 from antonblanchard/loadstore-init
loadstore1: reduce U state being output
2 years ago
Michael Neuling d4db331467
Merge pull request #374 from antonblanchard/icache-unused-sig
core: Remove unused icache_inv signal
2 years ago
Michael Neuling ee5e3778ed
Merge pull request #364 from shenki/readme-updates
Readme updates
2 years ago
Michael Neuling c43692f4c7
Merge pull request #372 from antonblanchard/dcache-unused-sig
dcache: remove unused do_write signal
2 years ago
Michael Neuling 956df2c863
Merge pull request #371 from antonblanchard/unused-sig
execute1: sub_mux_sel and result_mux_sel are unused
2 years ago
Michael Neuling 3627f102db
Merge pull request #370 from antonblanchard/divider-init
divider: Fix d_out.overflow U state issue
2 years ago
Paul Mackerras 6e1e763c02
Merge pull request #368 from antonblanchard/icache-pmu-events
icache: Hook up PMU events
2 years ago
Anton Blanchard 1047239a37
Merge pull request #377 from antonblanchard/fpu-init
fpu: Reduce uninitialised signals
2 years ago
Anton Blanchard 9d35340bb1 fpu: Reduce uninitialised signals
Reduce uninitialised signals coming out of the FPU.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Michael Neuling b82eea5933
Merge pull request #366 from antonblanchard/hello-world-bss
Zero BSS in hello world test
2 years ago
Anton Blanchard d3aff67fa7
Merge pull request #375 from antonblanchard/core_debug-init
core_debug: Initialise gspr_index
2 years ago
Anton Blanchard b47b71821e loadstore1: reduce U state being output
While these signals should only be read when valid is true, they
are only a small number of bits and we want to reduce the amount of
U/X state bouncing around the chip.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 71d4b5ed20 core_debug: Initialise gspr_index
Another case of U state being driven out of a module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard a527d9b959 core: Remove unused icache_inv signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard e7f0a7c7ac icache: Don't output X on i_out.insn
decode1 has a lot of logic that uses i_out.insn without first looking at
i_iout.valid. Play it safe and never output X state.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 39220be311 dcache: remove unused do_write signal
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 843361f2be execute1: sub_mux_sel and result_mux_sel are unused
Remove them.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard d3a7517318 divider: Fix d_out.overflow U state issue
While we should only look at this when d_out.valid = 1, we may as remove
some U state across interfaces.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 1ff852b012
Merge pull request #369 from antonblanchard/loadstore-pmu-init
loadstore1: Initialise PMU events
2 years ago
Anton Blanchard e2438071a1 loadstore1: Initialise PMU events
The loadstore1 PMU events are U state until a load and a store completes.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard b7c4d3c5c3
Merge pull request #367 from antonblanchard/fpu-typo
fpu: Fix capitalisation of Execute1ToFPUType
2 years ago
Anton Blanchard f06abb67ad icache: Hook up PMU events
We weren't connecting the icache PMU events up.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
2 years ago
Anton Blanchard 64d2def0c6 fpu: Fix capitalisation of Execute1ToFPUType
While this is not an issue in VHDL, I noticed this when running
a script over the source and we may as well fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard ff442d1bdb Zero BSS in hello world test
While trying to reduce U/X state issues, I notice that our BSS is not
being initialised in the hello world test.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard b8fc5636a4
Merge pull request #365 from antonblanchard/less-fpga-init
Remove some FPGA style signal inits
3 years ago
Anton Blanchard ebdddcc402 Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard a750365ffa Remove some FPGA style signal inits
These don't work on the ASIC flow, so remove them and initialise
them explicitly where required.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Joel Stanley 9ec22af256 README: Add Linux on Microwatt instructions
These instructions are similar to those at

 https://ozlabs.org/~joel/microwatt/README

except they describe how to build the artifacts from scratch instead of
downloading them.

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Joel Stanley a31725d989 README: Add uart to fusesoc instructions
The SoC defaults to using the uart16550 so provide instructions on how
to fetch that library when seetting up fusesoc.

Also remove the text about a working directory; fusesoc doesn't need
one.

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Michael Neuling f5e06c2d4b
Merge pull request #361 from antonblanchard/alt-reset-address
Allow ALT_RESET_ADDRESS to be overridden
3 years ago
Anton Blanchard 948f6f43a7 Allow ALT_RESET_ADDRESS to be overridden
This allows us to boot from flash for example.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 8bf48ac094
Merge pull request #360 from antonblanchard/log2ceil-issue
wishbone_bram_wrapper ram_addr_bits is 1 bit off
3 years ago
Anton Blanchard b5accb78b2 wishbone_bram_wrapper ram_addr_bits is 1 bit off
log2ceil() returns the number of bits required to store a value, so we
need to pass in memory_size-1, not memory_size.

Every other user of log2ceil() gets this right.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 30fd936c12
Merge pull request #358 from antonblanchard/unused-sig
Remove unused sequential signal from Fetch1ToIcacheType
3 years ago
Michael Neuling af1b76d944
Merge pull request #356 from antonblanchard/fpu-constant
fpu: Make inverse_table a constant
3 years ago
Michael Neuling 9b96ab730c
Merge pull request #357 from antonblanchard/xics-warning
xics: Fix warning when comparing two std_ulogic_vectors
3 years ago
Anton Blanchard 0b39947f8d Remove unused sequential signal from Fetch1ToIcacheType
GHDL synthesis is flagging a warning about this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 00bf0af21c xics: Fix warning when comparing two std_ulogic_vectors
Use unsigned() to make it clear what we are doing.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 50b4cb9423 fpu: Make inverse_table a constant
GHDL synthesis is complaining that inverse_table is never stored to.
Change it to a constant.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Tianrui Wei 844ca0e6b5
fix: fix icache_tb not finishing correctly
Setting icache to be privileged and accessing physical memory directly.
And set big_endian to 0 to correspond to the testbench result.

Signed-off-by: Tianrui Wei <tianrui@tianruiwei.com>
3 years ago
Michael Neuling f01f3d233a
Merge pull request #352 from mkj/static-urjtag
mw_debug: Add STATIC_URJTAG flag
3 years ago
Matt Johnston c0c00d05bc mw_debug: Add STATIC_URJTAG flag
Revert to linking dynamically by default, can statically link with
`make STATIC_URJTAG=1`

Fixes #351

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Michael Neuling ffcdaaa92d
Update the README Issues (#350)
We've had these for a while now:
 - D/I cache
 - GPR bypassing
 - Supervisor state (and can boot linux)

We still need Vector/VMX/VSX (and probably some other things)

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling b4770197a2
Merge pull request #349 from madscientist159/master
Extend LiteDRAM VHDL wrapper to allow more than one clock line
3 years ago
Raptor Engineering Development Team fcb783a0fb Extend LiteDRAM VHDL wrapper to allow more than one clock line
This is necessary for the upcoming Arctic Tern system enablement,
since Arctic Tern uses two DRAM devices and a separate clock line
is routed to each device.  LiteX handles this behavior correctly,
therefore we assume other hardware exists that uses a similar
DRAM clock design.

Updates from Mikey to fix some compile issues.

Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 2b97fb0bf3
Merge pull request #348 from paulusmack/reduce
Reduce LUT usage
3 years ago
Paul Mackerras 0aa898c7a6 xics: Rework the irq_gen process
At present, the loop in the irq_gen process generates a chain of
comparators and other logic to work out the source number and priority
of the most-favoured (lowest priority number) pending interrupt.
This replaces that chain with (1) logic to generate an array of bits,
one per priority, indicating whether any interrupt is pending at that
priority, (2) a priority encoder to select the most favoured priority
with an interrupt pending, (3) logic to generate an array of bits, one
per source, indicating whether an interrupt is pending at the priority
calculated in step 2, and (4) a priority encoder to work out the
lowest numbered source that has an interrupt pending at the selected
priority.  This reduces LUT utilization.

The priority encoder function implemented here uses the optimized
count-leading-zeroes logic from helpers.vhdl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1720a0584a Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1086988883 countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses less LUTs to generate the higher-order bits (2..5) of the
result.

By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB.  This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits.  Bit 4 of the result requires
looking at bits 63, 47, 31 and 15.  In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.

It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5.  This saves almost 60 6-input LUTs on the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 4cf2921b0b soc: Re-do peripheral address decode to improve timing
This generates a series of io_cycle_* signals which are clean latches
and which become the 'cyc' signals of the wishbone buses going to
various peripherals (syscon, uarts, XICS, GPIO, etc.).  Effectively
this is done by moving the address decoding into the slave_io_latch
process.  The slave_io_type, which drives the multiplexer which
selects which wishbone to look for a response on, is reduced to just 8
values in the expectation that an 8-way multiplexer will use less
logic than one with more than 8 inputs.

With this timing is considerably better on the A7-100T.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 27b660ef76
Merge pull request #346 from mkj/dmi_ecp5
Add DMI and mw_debug for ECP5
3 years ago
Anton Blanchard 5a5a082601
Merge pull request #343 from mikey/orange-crab-ci
ci: Add new Orange Crab build
3 years ago
Matt Johnston 9c64f8a98b mw_debug: Add Lattice ECP5 support
"-b ecp5" will select ECP5 interface that talks to a JTAGG
primitive.

For example with a FT232H JTAG board:

./mw_debug  -t 'ft2232 vid=0x0403 pid=0x6014'  -s 30000000 -b ecp5 mr ff003888 6
Connected to libftdi driver.
Found device ID: 0x41113043
00000000ff003888: 6d6f636c65570a0a  ..Welcom
00000000ff003890: 63694d206f742065  e to Mic
00000000ff003898: 2120747461776f72  rowatt !
00000000ff0038a0: 0000000000000a0a  ........
00000000ff0038a8: 67697320636f5320   Soc sig
00000000ff0038b0: 203a65727574616e  nature:
Core: running
 NIA: c0000000000187f8
 MSR: 9000000000001033

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 3775650df3 dmi_dtm_ecp5: Use ECP5 JTAGG for DMI
This uses the JTAGG primitive which is similar to BSCANE2.
The LUT4 delay approach came from Florian and Greg in
https://github.com/enjoy-digital/litex/pull/1087

Has been tested on an OrangeCrab with 48MHz sysclk
FT232H up to 30MHz (though libusb/urjtag is by far the bottleneck vs
the JTAG clock)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston eb20195a10 mw_debug: Link urjtag statically
liburjtag isn't in Debian, so usually we're pointing at a urjtag
build directory when building mw_debug

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 763138798e mw_debug: use isxdigit for hex arguments
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 04cc4a842c mw_debug: Add -s frequency argument
Chose -s for speed, vs -f for --force

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston e05ae0c8cb mw_debug: pass target parameters to urjtag
An example

./mw_debug -d -t 'ft2232 vid=0x0403 pid=0x6014'

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Paul Mackerras 49ec80ac3e fetch1/icache1: Remove the use_previous logic
This removes logic that I added some time ago with the thought that it
would enable us to do prefetching in the icache.  This logic detects
when the fetch address is an odd multiple of 4 and the next address in
sequence from the previous cycle.  In that case the instruction we
want is in the output register of the icache RAM already so there is
no need to do another read or any icache tag or TLB lookup.

However, this logic adds complexity, and removing it improves timing,
so this removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras cef3660e74
Merge pull request #345 from antonblanchard/popcnt-go-fast
popcnt* timing improvements from Paul
3 years ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 286757f0f7 ci: Add new Orange Crab build
This builds the Orange Crab v0.21 + litedram image

Signed-off-by: Michael Neuling <mikey@neuling.org>
3 years ago
Michael Neuling 6ff3b2499c
Merge pull request #342 from mkj/orangecrab-merge
Orangecrab working with litedram

Fixed up a few simple merge conflicts in the Makefile.
3 years ago
Michael Neuling cdd661d844
Merge branch 'master' into orangecrab-merge 3 years ago
Michael Neuling fda8879e2f
Merge pull request #341 from mkj/progtools
orangecrab programming targets
3 years ago
Michael Neuling ffbf2f9964
Merge pull request #340 from mkj/orangecrab-ghdl-plugin
Makefile: detect when ghdl is a yosys plugin
3 years ago
Matt Johnston 049f0549d8 orangecrab: Fix sdcard wishbone addressing
Orangecrab missed out on:

Make wishbone addresses be in units of doublewords or words
Author: Paul Mackerras <paulus@ozlabs.org>
Date:   Wed Sep 15 18:18:09 2021 +1000

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston abc6a4f372 orangecrab: use litesdcard
Currently not working (tested in Linux)

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 42959184dd litesdcard: add lattice, regenerate
Modifies litescard generate script to take a clock speed.

Regenerated verilog with latest litesdcard
e52c731 ("Bump year.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston d794cc70b1 orangecrab: No BTC, LOG_LENGTH, dram NUM_LINES
Reduce litedram NUM_LINES 64->8
This allows us to meet timing. Can probably
be improved in future with better BRAM usage.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston a8d9203c5d orangecrab: Use litedram
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 57d4c4c117 orangecrab: set HAS_SHORT_MULT
It seems free, generated as a single MULT18X18D

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston a9b467f43b orangecrab: add Orange Crab r0.2 target
top-orangecrab0.2 is a copy of top-arty with various changes.
USRMCLK is added for the SPI clock
ethernet is removed

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 8901e84d8d litedram: Add orangecrab-85-0.2 target
Parameters are based on
https://github.com/gregdavill/OrangeCrab-test-sw/blob/main/hw/OrangeCrab-bitstream.py
and litex-boards orangecrab.py

rtt_nom and cmd_delay are overridden for OrangeCrab, we do the same here.

Generated with litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 08021ae28e litedram: set Makefile -Werror
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 5a3cdc8b22 litedram: disable block_until_ready, regenerate
Recent litedram gets stuck at memtest unless block_until_ready=False.
(discussion in https://github.com/enjoy-digital/litedram/pull/292)

This change regenerates with latest litedram and litex
62abf9c ("litedram_gen: Add block_until_ready port parameter to control blocking behaviour.")
add2746a ("tools/litex_cli: Rename wb to bus.")

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 5e90133b61 Makefile: add ecpprog targets
The 0x80000 offset is specific to the OrangeCrab bootloader.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 7761bf8b71 Makefile: Add DFU programming
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Matt Johnston 2ec0d5fccd Makefile: detect when ghdl is a yosys plugin
oss-cad-suite builds it as a plugin, some other toolchains
have it built in.

Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
3 years ago
Anton Blanchard 67164a6ffa
Merge pull request #338 from shenki/yosys-read-verilog
Makefile: Use read_verilog with yosys
3 years ago
Joel Stanley 9ceb463957 Makefile: Use read_verilog with yosys
Yosys changed command line behaviour following the v0.12 release.  Work
around this by using read_verilog, which maintains the old behaviour.

This should work fine for current yosys and be compatible with
future releases.

See https://github.com/YosysHQ/yosys/issues/3109

Signed-off-by: Joel Stanley <joel@jms.id.au>
3 years ago
Michael Neuling 7fa7b45faa
Merge pull request #337 from paulusmack/fixes
ECP5: Adjust PLL constants so the PLL lock indication works
3 years ago
Paul Mackerras d458b5845c ECP5: Adjust PLL constants so the PLL lock indication works
At present, code (such as simple_random) which produces serial port
output during the first few milliseconds of operation produces garbled
output.  The reason is that the clock has not yet stabilized and is
running slow, resulting in the bit time of the serial characters being
too long.

The ECP5 data sheet says that the phase detector should be operated
between 10 and 400 MHz.  The current code operates it at 2MHz.
Consequently, the PLL lock indication doesn't work, i.e. it is always
zero.  The current code works around that by inverting it, i.e. taking
the "not locked" indication to mean "locked".

Instead, we now run it at 12MHz, chosen because the common external
clock inputs on ECP5 boards are 12MHz and 48MHz.  Normally this would
mean that the available system clock frequencies would be multiples of
12MHz, but this is a little inconvenient as we use 40MHz on the Orange
Crab v0.21 boards.  Instead, by using the secondary clock output for
feedback, we can have any divisor of the PLL frequency as the system
clock frequency.

The ECP5 data sheet says the PLL oscillator can run at 400 to 800
MHz.  Here we choose 480MHz since that allows us to generate 40MHz and
48MHz easily and is a multiple of 12MHz.

With this, the lock signal works correctly, and the inversion can be
removed.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 8a030502a2
Merge pull request #336 from paulusmack/fixes
Makefile: Correct parameters for the Orange Crab 85F
3 years ago
Paul Mackerras a5c9b3c412 Makefile: Add a target for the Orange Crab v0.21 with LFE5U-85F
The existing orange crab target is for an older board with a
LFE5UM5G-85F device.  Newer orange crab boards (v0.21) have a
LFE5U-85F device in the -8 speed grade, so make a new target for them
called ORANGE-CRAB-0.21.

Also add flags to ecppack to indicate that the bitstream should be
compressed and can be loaded at 38.8MHz.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 9cbe1f4a17
Merge pull request #334 from antonblanchard/icbi-issue
Add a test for icbi and dcbz issues
3 years ago
Anton Blanchard 099862bee9
Merge pull request #335 from ozbenh/misc
Misc cleanups and icache fix
3 years ago
Benjamin Herrenschmidt e675eba0df icache: req_laddr becomes req_raddr
Uses real_addr_t and only stores the real address bits

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt 5cfa65e836 Introduce addr_to_wb() and wb_to_addr() helpers
These convert addresses to/from wishbone addresses, and use them
in parts of the caches, in order to make the code a bit more readable.

Along the way, rename some functions in the caches to make it a bit
clearer what they operate on and fix a bug in the icache STOP_RELOAD state where
the wb address wasn't properly converted.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt d745995207 Introduce real_addr_t and addr_to_real()
This moves REAL_ADDR_BITS out of the caches and defines a real_addr_t
type for a real address, along with a addr_to_real() conversion helper.

It makes the vhdl a bit more readable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Anton Blanchard 2d142a6c01 tests/misc: Add a store/dcbz test
We have a bug where an store near a dcbz can cause the dcbz to only zero
8 bytes. Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 00259458c7 tests/misc: Add an icbi test
We have a bug where an icbi can cause an instruction to execute twice.
Add a test case for this.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 13439c76ba
Merge pull request #333 from ozbenh/wukong
Add support for  QMTech Wukong v2 board
3 years ago
Benjamin Herrenschmidt d564672a82 Regenerate litedram and liteeth
Note: There are a few patches to upstream to fix an upstream breakage
of litedram standalone generator, and fix some issues with liteeth
in the way it's used on Wukong. All these have pending pull requests.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt da0189af1e Add support for QMTech Wukong v2 board
For now only the V2 of the board (slightly different pinout)
and only the A100T variant. I also haven't added GPIOs or anything
else on the PMODs really.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt 621a0f6b28 fpga/clk_gen_plle2: Add support for 50Mhz->100Mhz
50Mhz clkin, 100Mhz sys_clk, as needed for Wukon v2

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt 4b1a413a2f Add support for more spansion flash
That's the one on the Wukong v2

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Anton Blanchard c7579d74b0
Merge pull request #332 from paulusmack/fixes
Bug fixes
3 years ago
Paul Mackerras 70270c066a dcache: Fix bug with dcbz closely following stores with the same tag
This fixes a bug where a dcbz can get incorrectly handled as an
ordinary 8-byte store if it arrives while the dcache state machine is
handling other stores with the same tag value (i.e. within the same
set-sized area of memory).  The logic that says whether to include a
new store in the current wishbone cycle didn't take into account
whether the new store was a dcbz.  This adds a "req.dcbz = '0'" factor
so that it does.  This is necessary because dcbz is handled more like
a cache line refill (but writing to memory rather than reading) than
an ordinary store.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 9b3b57710a icache: Fix icache invalidation
This fixes two bugs in the flash invalidation of the icache.

The first is that an instruction could get executed twice.  The
i-cache RAM is 2 instructions (64 bits) wide, so one read can supply
results for 2 cycles.  The fetch1 stage tells icache when the address
is equal to the address of the previous cycle plus 4, and in cases
where that is true, bit 2 of the address is 1, and the previous cycle
was a cache hit, we just use the second word of the doubleword read
from the cache RAM.  However, the cache hit/miss logic also continues
to operate, so in the case where the first word hits but the second
word misses (because of an icache invalidation or a snoop occurring in
the first cycle), we supply the instruction from the data previously
read from the icache RAM but also stall fetch1 and start a cache
reload sequence, and subsequently supply the second instruction
again.  This fixes the issue by inhibiting req_is_miss and stall_out
when use_previous is true.

The second bug is that if an icache invalidation occurs while
reloading a line, we continue to reload the line, and make it valid
when the reload finishes, even though some of the data may have been
read before the invalidation occurred.  This adds a new state
STOP_RELOAD which we go to if an invalidation happens while we are in
CLR_TAG or WAIT_ACK state.  In STOP_RELOAD state we don't request any
more reads from memory and wait for the reads we have previously
requested to be acked, and then go to IDLE state.  Data returned is
still written to the icache RAM, but that doesn't matter because the
line is invalid and is never made valid.

Note that we don't have to worry about invalidations due to snooped
writes while reloading a line, because the wishbone arbiter won't
switch to another master once it has started sending our reload
requests to memory.  Thus a store to memory will either happen before
any of our reads have got to memory, or after we have finished the
reload (in which case we will no longer be in WAIT_ACK state).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 83dea94793 decode1: Conditional trap instructions don't need to be single-issue
They can generate interrupts, but that doesn't mean they have to
single-issue.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 9aaa6d3ca3
Merge pull request #330 from antonblanchard/orange-crab-freq
Orange Crab is 48MHz not 50MHz, bump PLL frequency
3 years ago
Anton Blanchard 537e446562
Merge pull request #331 from ozbenh/misc
jtag tooling improvements & gitignore fix
3 years ago
Benjamin Herrenschmidt e6cb72fcd9 Add liteeth/build to gitignore
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt b557ec3a05 mw_debug: Default to jtag backend if unspecified
It avoids typing it all the time

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt 4bdfef9a20 mw_debug: Probe cable if unspecified
Instead of defaulting to DigilentHS1

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Benjamin Herrenschmidt 814d6914d0 flash-arty: Add cable argument
To select the cable config. Defaults to arty

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3 years ago
Anton Blanchard af6bc48d36
Merge pull request #329 from paulusmack/wb-fix
Wishbone addressing fix
3 years ago
Anton Blanchard 06266fe84a Orange Crab is 48MHz not 50MHz, bump PLL frequency
I'm not sure why I set the input frequency for the Orange Crab to 50MHz.
Since we easily make timing now, bump our output frequency to 48MHz as
well.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 0a415410c9
Merge pull request #328 from paulusmack/shortmult
core: Add a short multiplier
3 years ago
Michael Neuling c2f5db6fca
Merge pull request #327 from paulusmack/master
loadstore1 timing improvement
3 years ago
Paul Mackerras ca4eb46aea Make wishbone addresses be in units of doublewords or words
This makes the 64-bit wishbone buses have the address expressed in
units of doublewords (64 bits), and similarly for the 32-bit buses the
address is in units of words (32 bits).  This is to comply with the
wishbone spec.  Previously the addresses on the wishbone buses were in
units of bytes regardless of the bus data width, which is not correct
and caused problems with interfacing with externally-generated logic.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 734e4c4a52 core: Add a short multiplier
This adds an optional 16 bit x 16 bit signed multiplier and uses it
for multiply instructions that return the low 64 bits of the product
(mull[dw][o] and mulli, but not maddld) when the operands are both in
the range -2^15 .. 2^15 - 1.   The "short" 16-bit multiplier produces
its result combinatorially, so a multiply that uses it executes in one
cycle.  This improves the coremark result by about 4%, since coremark
does quite a lot of multiplies and they almost all have operands that
fit into 16 bits.

The presence of the short multiplier is controlled by a generic at the
execute1, SOC, core and top levels.  For now, it defaults to off for
all platforms, and can be enabled using the --has_short_mult flag to
fusesoc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras bb5f356386 loadstore1: Make r1.req.addr not depend on l_in.valid
Some critical path reports showed r1.req.addr depending on l_in.valid,
which then depended ultimately on the dcache's r1.ls_valid.  In fact
we can update r1.req.addr (and other fields of r1.req, except for
r1.req.valid) independently of l_in.valid as long as busy = 0.
We do also need to preserve r1.req.addr0 when l_in.valid = 0, so we
pull it out of r1.req and store it separately in r1.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 2224b28c2c
Merge pull request #324 from paulusmack/master
Performance and timing improvements
3 years ago
Paul Mackerras 54b0e8b8c8 core: Predict not-taken conditional branches using BTC
This adds a bit to the BTC to store whether the corresponding branch
instruction was taken last time it was encountered.  That lets us pass
a not-taken prediction down to decode1, which for backwards direct
branches inhibits it from redirecting fetch to the target of the
branch.  This increases coremark by about 2%.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 0cdaa2778f xilinx-mult: Move some registers later in the data flow
This changes s0 to use the P register rather than the A/B/C input
registers, thus improving the timing of the multiplier output.  The
m00, m02 and m03 multipliers now use their P registers rather than the
M registers, moving the addition they do from the second cycle to the
first.

Also, the XOR that inverts the 32 LSBs is moved before the output
register.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 77d9891d2f
Merge pull request #326 from antonblanchard/dcache-nc-fix
dcache: Loads from non-cacheable PTEs load entire 64 bits
3 years ago
Anton Blanchard 39e1d10069
Merge pull request #325 from paulusmack/fixes
decode1: Fix form of isel marked as single-issue
3 years ago
Anton Blanchard b29c58f3d1 dcache: Loads from non-cacheable PTEs load entire 64 bits
A non-cacheable load should only load the data requested and no more. We
do the right thing for real mode cache inhibited storage instructions,
but when loading through a non-cacheable PTE we load the entire 64 bits
regardless of the size.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Paul Mackerras d4cfdb1bfe decode1: Fix form of isel marked as single-issue
The row in the decode table for isel with BC=0 was inadvertently left
marked as single-issue by commit 813f834012 ("Add CR hazard
detection", 2019-10-15).  Fix it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 09bd01a49e
Merge pull request #323 from paulusmack/fixes
More instruction fixes
3 years ago
Paul Mackerras 06e07c69a8 decode1: Fix maddld and maddhdu to not set CR0
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras a68921edca core: Fix mcrxrx, addpcis and bpermd
- mcrxrx put the bits in the wrong order

- addpcis was setting CR0 if the instruction bit 0 = 1, which it
  shouldn't

- bpermd was producing 0 always and additionally had the wrong bit
  numbering

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling 18eb029f0a
Merge pull request #322 from paulusmack/fixes
Fix bug with load hitting two previous stores
3 years ago
Paul Mackerras ba34914465 tests/misc: Add a test for a load that hits two preceding stores
This checks that the store forwarding machinery in the dcache
correctly combines forwarded stores when they are partial stores
(i.e. only writing part of the doubleword, as for a byte store).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 0b23a5e760 dcache: Simplify data input to improve timing
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1a9834c506 dcache: Fix bug with forwarding of stores
We have two stages of forwarding to cover the two cycles of latency
between when something is written to BRAM and when that new data can
be read from BRAM.  When the writes to BRAM result from store
instructions, the write may write only some bytes of a row (8 bytes)
and not others, so we have a mask to enable only the written bytes to
be forwarded.  However, we only forward written data from either the
first stage of forwarding or the second, not both.  So if we have
two stores in succession that write different bytes of the same row,
and then a load from the row, we will only forward the data from the
second store, and miss the data from the first store; thus the load
will get the wrong value.

To fix this, we make the decision on which forward stage to use for
each byte individually.  This results in a 4-input multiplexer feeding
r1.data_out, with its inputs being the BRAM, the wishbone, the current
write data, and the 2nd-stage forwarding register.  Each byte of the
multiplexer is separately controlled.  The code for this multiplexer
is moved to the dcache_fast_hit process since it is used for cache
hits as well as cache misses.

This also simplifies the BRAM code by ensuring that we can use the
same source for the BRAM address and way selection for writes, whether
we are writing store data or cache line refill data from memory.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras f812832ad7 dcache: Move way selection and forwarding earlier
This moves the way multiplexer for the data from the BRAM, and the
multiplexers for forwarding data from earlier stores or refills,
before a clock edge rather than after, so that now the data output
from the dcache comes from a clean latch.  To do this we remove the
extra latch on the output of the data BRAM (i.e. ADD_BUF=false) and
rearrange the logic.  The choice whether to forward or not now depends
not on way comparisons but rather on a tag comparisons, for the sake
of timing.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Michael Neuling a9e6263aab
Merge pull request #319 from antonblanchard/verilator-ci
Add some Verilator CI tests
3 years ago
Michael Neuling 8bbb0018b4
Merge pull request #318 from paulusmack/pmu
PMU enhancements
3 years ago
Michael Neuling 05fe709ffc
Merge pull request #320 from antonblanchard/litedram-regenerate
litedram: Regenerate from upstream litex
3 years ago
Anton Blanchard c0f7f54276 litedram: Regenerate from upstream litex
This adds Joel's sdcard feature reporting to the firmware.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard ee38a31152 ci: Add verilator tests
Now we have some verilator tests, add them to the CI.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard c81583c128 makefile: Check environment for MEMORY_SIZE/RAM_INIT_FILE
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard efb387b0d2 makefile: Add some verilator micropython tests
These are the same micropython tests we use against the ghdl
simulation.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 8acd5a5607 verilator: Specify top level module
While verilator finds the correct top level module with the current
setup, if we start adding simulation models it can get confused.

Explicitly specify the top level module.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard 7e2de602ee makefile: Simplify microwatt-verilator target, add Docker image
Recent versions of verilator support the --build option, allowing
us to remove a step.

Also add a Docker image for verilator.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Paul Mackerras 65c43b488b PMU: Add several more events
This implements most of the architected PMU events.  The ones missing
are mostly the ones that depend on which level of the cache hierarchy
data is fetched from.  The events implemented here, and their raw
event codes, are:

    Floating-point operation completed (100f4)
    Load completed (100fc)
    Store completed (200f0)
    Icache miss (200fc)
    ITLB miss (100f6)
    ITLB miss resolved (400fc)
    Dcache load miss (400f0)
    Dcache load miss resolved (300f8)
    Dcache store miss (300f0)
    DTLB miss (300fc)
    DTLB miss resolved (200f6)
    No instruction available and none being executed (100f8)
    Instruction dispatched (200f2, 300f2, 400f2)
    Taken branch instruction completed (200fa)
    Branch mispredicted (400f6)
    External interrupt taken (200f8)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 8cdb00652b
Merge pull request #316 from antonblanchard/verilator-fix
Rename 'do' signal to avoid verilator System Verilog warning
3 years ago
Paul Mackerras 71af8016da
Merge pull request #317 from antonblanchard/gpio-fix
gpio: Add HAS_GPIO to avoid verilator build errors
3 years ago
Paul Mackerras e33fb26e7a PMU: Fix PMC5/6 behaviour when MMCR0[PMCC] = 11
The architecture states that when MMCR0[PMCC] = 0b11, PMC5 and PMC6
are not part of the Performance Monitor, meaning that they are not
controlled by bits in MMCRs, and counter negative conditions in PMCs 5
and 6 don't generate Performance Monitor alerts, exceptions or
interrupts.  It doesn't say that PMC5 and PMC6 are frozen in this
case, so presumably they should continue to count run instructions and
run cycles.

This implements that behaviour.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Anton Blanchard 591e96d1a2 gpio: Add HAS_GPIO to avoid verilator build errors
The verilator build fails with warnings and errors, because NGPIO
is 0 and we do things like:

        gpio_out : out std_ulogic_vector(NGPIO - 1 downto 0);

Set NGPIO to something reasonable (eg 32) and add HAS_GPIO to avoid
building the macro entirely if it isn't in use.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Anton Blanchard bc0f7cf236 Rename 'do' signal to avoid verilator System Verilog warning
Experimenting with using ghdl to do VHDL to Verilog conversion (instead
of ghdl+yosys), verilator complains that a signal is a SystemVerilog
keyword:

%Error: microwatt.v:15013:18: Unexpected 'do': 'do' is a SystemVerilog keyword misused as an identifier.
        ... Suggest modify the Verilog-2001 code to avoid SV keywords, or use `begin_keywords or --language.

We could probably make this go away by disabling SystemVerilog, but
it's easy to rename the signal in question. Rename di at the same
time.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
3 years ago
Michael Neuling 2bd00f5119
Merge pull request #315 from paulusmack/pmu
Add basic PMU implementation
3 years ago
Paul Mackerras a7873b45f7 core: Add a basic performance monitor unit (PMU) implementation
This is the start of an implementation of a PMU according to PowerISA
v3.0B.  Things not implemented yet include most architected events,
the BHRB, event-based branches, thresholding, MMCR0[TBCC] field, etc.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago

@ -72,7 +72,7 @@ jobs:
fail-fast: false
max-parallel: 2
matrix:
task: [ ECP5-EVN, ORANGE-CRAB ]
task: [ ECP5-EVN, ORANGE-CRAB, ORANGE-CRAB-0.21 ]
runs-on: ubuntu-latest
env:
DOCKER: 1
@ -93,3 +93,17 @@ jobs:
steps:
- uses: actions/checkout@v2
- run: make DOCKER=1 microwatt.v

verilator:
runs-on: ubuntu-latest
env:
DOCKER: 1
FPGA_TARGET: verilator
RAM_INIT_FILE: micropython/firmware.hex
MEMORY_SIZE: 524288
steps:
- uses: actions/checkout@v2
- run: |
sudo apt update
sudo apt install -y python3-pexpect
make -j$(nproc) test_micropython_verilator test_micropython_verilator_long

1
.gitignore vendored

@ -13,4 +13,5 @@ tests/*/*.hex
tests/*/*.elf
TAGS
litedram/build/*
liteeth/build/*
obj_dir/*

@ -1,16 +1,22 @@
GHDL ?= ghdl
GHDLFLAGS=--std=08
CFLAGS=-O3 -Wall
VERILATOR_FLAGS=-O3 #--trace
# Need to investigate why yosys is hitting verilator warnings, and eventually turn on -Wall
VERILATOR_FLAGS=-O3 -Wno-fatal -Wno-CASEOVERLAP -Wno-UNOPTFLAT #--trace
# It takes forever to build with optimisation, so disable by default
#VERILATOR_CFLAGS=-O3

GHDLSYNTH ?= ghdl.so
# some yosys builds have ghdl plugin built in, otherwise need "-m ghdl"
GHDLSYNTH ?= $(shell ($(YOSYS) -H | grep -q ghdl) || echo -m ghdl)
YOSYS ?= yosys
NEXTPNR ?= nextpnr-ecp5
ECPPACK ?= ecppack
ECPPROG ?= ecpprog
OPENOCD ?= openocd
VUNITRUN ?= python3 ./run.py
VERILATOR ?= verilator
DFUUTIL ?= dfu-util
DFUSUFFIX ?= dfu-suffix

# We need a version of GHDL built with either the LLVM or gcc backend.
# Fedora provides this, but other distros may not. Another option is to use
@ -33,12 +39,13 @@ PWD = $(shell pwd)
DOCKERARGS = run --rm -v $(PWD):/src:z -w /src
GHDL = $(DOCKERBIN) $(DOCKERARGS) ghdl/ghdl:buster-llvm-7 ghdl
CC = $(DOCKERBIN) $(DOCKERARGS) ghdl/ghdl:buster-llvm-7 gcc
GHDLSYNTH = ghdl
GHDLSYNTH = -m ghdl
YOSYS = $(DOCKERBIN) $(DOCKERARGS) hdlc/ghdl:yosys yosys
NEXTPNR = $(DOCKERBIN) $(DOCKERARGS) hdlc/nextpnr:ecp5 nextpnr-ecp5
ECPPACK = $(DOCKERBIN) $(DOCKERARGS) hdlc/prjtrellis ecppack
OPENOCD = $(DOCKERBIN) $(DOCKERARGS) --device /dev/bus/usb hdlc/prog openocd
VUNITRUN = $(DOCKERBIN) $(DOCKERARGS) ghdl/vunit:llvm python3 ./run.py
VERILATOR = $(DOCKERBIN) $(DOCKERARGS) verilator/verilator:latest
endif

VUNITARGS += -p10
@ -53,9 +60,9 @@ core_files = decode_types.vhdl common.vhdl wishbone_types.vhdl fetch1.vhdl \
decode1.vhdl helpers.vhdl insn_helpers.vhdl \
control.vhdl decode2.vhdl register_file.vhdl \
cr_file.vhdl crhelpers.vhdl ppc_fx_insns.vhdl rotator.vhdl \
logical.vhdl countzero.vhdl multiply.vhdl divider.vhdl execute1.vhdl \
logical.vhdl countbits.vhdl multiply.vhdl divider.vhdl execute1.vhdl \
loadstore1.vhdl mmu.vhdl dcache.vhdl writeback.vhdl core_debug.vhdl \
core.vhdl fpu.vhdl
core.vhdl fpu.vhdl pmu.vhdl

soc_files = wishbone_arbiter.vhdl wishbone_bram_wrapper.vhdl sync_fifo.vhdl \
wishbone_debug_master.vhdl xics.vhdl syscon.vhdl gpio.vhdl soc.vhdl \
@ -138,29 +145,54 @@ $(soc_dram_tbs): %: $(soc_dram_files) $(soc_dram_sim_files) $(soc_dram_sim_obj_f
endif

# Hello world
MEMORY_SIZE=8192
RAM_INIT_FILE=hello_world/hello_world.hex
MEMORY_SIZE ?=8192
RAM_INIT_FILE ?=hello_world/hello_world.hex

# Micropython
#MEMORY_SIZE=393216
#RAM_INIT_FILE=micropython/firmware.hex

FPGA_TARGET ?= ORANGE-CRAB
FPGA_TARGET ?= ORANGE-CRAB-0.21

# FIXME: icache RAMs aren't being inferrenced as block RAMs on ECP5
# with yosys, so make it smaller for now as a workaround.
ICACHE_NUM_LINES=4

# OrangeCrab with ECP85
clkgen=fpga/clk_gen_ecp5.vhd
toplevel=fpga/top-generic.vhdl
dmi_dtm=dmi_dtm_dummy.vhdl
LITEDRAM_GHDL_ARG=

# OrangeCrab with ECP85 (original v0.0 with UM5G-85 chip)
ifeq ($(FPGA_TARGET), ORANGE-CRAB)
RESET_LOW=true
CLK_INPUT=50000000
CLK_FREQUENCY=40000000
CLK_INPUT=48000000
CLK_FREQUENCY=48000000
LPF=constraints/orange-crab.lpf
PACKAGE=CSFBGA285
NEXTPNR_FLAGS=--um5g-85k --freq 40
NEXTPNR_FLAGS=--um5g-85k --freq 48
OPENOCD_JTAG_CONFIG=openocd/olimex-arm-usb-tiny-h.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5UM5G-85F.cfg
ECP_FLASH_OFFSET=0x80000
endif

# OrangeCrab with ECP85 (v0.21)
ifeq ($(FPGA_TARGET), ORANGE-CRAB-0.21)
RESET_LOW=true
CLK_INPUT=48000000
CLK_FREQUENCY=48000000
LPF=constraints/orange-crab-0.2.lpf
PACKAGE=CSFBGA285
NEXTPNR_FLAGS=--85k --speed 8 --freq 48 --timing-allow-fail --ignore-loops
OPENOCD_JTAG_CONFIG=openocd/olimex-arm-usb-tiny-h.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5U-85F.cfg
DFU_VENDOR=1209
DFU_PRODUCT=5af0
ECP_FLASH_OFFSET=0x80000
toplevel=fpga/top-orangecrab0.2.vhdl
litedram_target=orangecrab-85-0.2
soc_extra_v += litesdcard/generated/lattice/litesdcard_core.v
dmi_dtm=dmi_dtm_ecp5.vhdl
endif

# ECP5-EVN
@ -175,12 +207,17 @@ OPENOCD_JTAG_CONFIG=openocd/ecp5-evn.cfg
OPENOCD_DEVICE_CONFIG=openocd/LFE5UM5G-85F.cfg
endif

ifneq ($(litedram_target),)
soc_extra_synth += litedram/extras/litedram-wrapper-l2.vhdl \
litedram/generated/$(litedram_target)/litedram-initmem.vhdl
soc_extra_v += litedram/generated/$(litedram_target)/litedram_core.v
LITEDRAM_GHDL_ARG=-gUSE_LITEDRAM=true
endif

GHDL_IMAGE_GENERICS=-gMEMORY_SIZE=$(MEMORY_SIZE) -gRAM_INIT_FILE=$(RAM_INIT_FILE) \
-gRESET_LOW=$(RESET_LOW) -gCLK_INPUT=$(CLK_INPUT) -gCLK_FREQUENCY=$(CLK_FREQUENCY) -gICACHE_NUM_LINES=$(ICACHE_NUM_LINES)
-gRESET_LOW=$(RESET_LOW) -gCLK_INPUT=$(CLK_INPUT) -gCLK_FREQUENCY=$(CLK_FREQUENCY) -gICACHE_NUM_LINES=$(ICACHE_NUM_LINES) \
$(LITEDRAM_GHDL_ARG)

clkgen=fpga/clk_gen_ecp5.vhd
toplevel=fpga/top-generic.vhdl
dmi_dtm=dmi_dtm_dummy.vhdl

ifeq ($(FPGA_TARGET), verilator)
RESET_LOW=true
@ -193,18 +230,16 @@ fpga_files = fpga/soc_reset.vhdl \
fpga/pp_fifo.vhd fpga/pp_soc_uart.vhd fpga/main_bram.vhdl \
nonrandom.vhdl

synth_files = $(core_files) $(soc_files) $(fpga_files) $(clkgen) $(toplevel) $(dmi_dtm)
synth_files = $(core_files) $(soc_files) $(soc_extra_synth) $(fpga_files) $(clkgen) $(toplevel) $(dmi_dtm)

microwatt.json: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) -m $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; synth_ecp5 -abc9 -nowidelut -json $@ $(SYNTH_ECP5_FLAGS)" $(uart_files)
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; read_verilog $(uart_files) $(soc_extra_v); synth_ecp5 -abc9 -nowidelut -json $@ $(SYNTH_ECP5_FLAGS)"

microwatt.v: $(synth_files) $(RAM_INIT_FILE)
$(YOSYS) -m $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; write_verilog $@"
$(YOSYS) $(GHDLSYNTH) -p "ghdl --std=08 --no-formal $(GHDL_IMAGE_GENERICS) $(synth_files) -e toplevel; write_verilog $@"

# Need to investigate why yosys is hitting verilator warnings, and eventually turn on -Wall
microwatt-verilator: microwatt.v verilator/microwatt-verilator.cpp verilator/uart-verilator.c
verilator $(VERILATOR_FLAGS) -CFLAGS "$(VERILATOR_CFLAGS) -DCLK_FREQUENCY=$(CLK_FREQUENCY)" --assert --cc $< --exe verilator/microwatt-verilator.cpp verilator/uart-verilator.c -o $@ -Iuart16550 -Wno-fatal -Wno-CASEOVERLAP -Wno-UNOPTFLAT
make -C obj_dir -f Vmicrowatt.mk
$(VERILATOR) $(VERILATOR_FLAGS) -CFLAGS "$(VERILATOR_CFLAGS) -DCLK_FREQUENCY=$(CLK_FREQUENCY)" -Iuart16550 --assert --cc --exe --build $^ -o $@ -top-module toplevel
@cp -f obj_dir/microwatt-verilator microwatt-verilator

microwatt_out.config: microwatt.json $(LPF)
@ -212,13 +247,28 @@ microwatt_out.config: microwatt.json $(LPF)
mv -f $@.tmp $@

microwatt.bit: microwatt_out.config
$(ECPPACK) --svf microwatt.svf $< $@
$(ECPPACK) --compress --freq 38.8 --svf microwatt.svf $< $@

microwatt.svf: microwatt.bit

prog: microwatt.svf
$(OPENOCD) -f $(OPENOCD_JTAG_CONFIG) -f $(OPENOCD_DEVICE_CONFIG) -c "transport select jtag; init; svf $<; exit"

microwatt.dfu: microwatt.bit
cp $< $@.tmp
$(DFUSUFFIX) -v $(DFU_VENDOR) -p $(DFU_PRODUCT) -a $@.tmp
mv $@.tmp $@

dfuprog: microwatt.dfu
$(DFUUTIL) -a 0 -D $<

ecpprog: microwatt.bit
$(ECPPROG) -S $<

ecpflash: microwatt.bit
test -n "$(ECP_FLASH_OFFSET)" || (echo Error: No ECP_FLASH_OFFSET defined for target; exit 1)
$(ECPPROG) -o $(ECP_FLASH_OFFSET) $<

tests = $(sort $(patsubst tests/%.out,%,$(wildcard tests/*.out)))
tests_console = $(sort $(patsubst tests/%.console_out,%,$(wildcard tests/*.console_out)))

@ -240,9 +290,15 @@ $(tests_console): core_tb
test_micropython: core_tb
@./scripts/test_micropython.py

test_micropython_verilator: microwatt-verilator
@./scripts/test_micropython_verilator.py

test_micropython_long: core_tb
@./scripts/test_micropython_long.py

test_micropython_verilator_long: microwatt-verilator
@./scripts/test_micropython_verilator_long.py

tests_soc_tb = $(patsubst %_tb,%_tb_test,$(soc_tbs))

%_test: %

@ -103,14 +103,8 @@ sudo dnf install fusesoc

```
fusesoc init
```

- Create a working directory and point FuseSoC at microwatt:

```
mkdir microwatt-fusesoc
cd microwatt-fusesoc
fusesoc library add microwatt /path/to/microwatt/
fusesoc fetch uart16550
fusesoc library add microwatt /path/to/microwatt
```

- Build using FuseSoC. For hello world (Replace nexys_video with your FPGA board such as --target=arty_a7-100):
@ -128,6 +122,68 @@ You should then be able to see output via the serial port of the board (/dev/tty
fusesoc run --target=nexys_video microwatt
```

## Linux on Microwatt

Mainline Linux supports Microwatt as of v5.14. The Arty A7 is the best tested
platform, but it's also been tested on the OrangeCrab and ButterStick.

1. Use buildroot to create a userspace

A small change is required to glibc in order to support the VMX/AltiVec-less
Microwatt, as float128 support is mandiatory and for this in GCC requires
VSX/AltiVec. This change is included in Joel's buildroot fork, along with a
defconfig:
```
git clone -b microwatt https://github.com/shenki/buildroot
cd buildroot
make ppc64le_microwatt_defconfig
make
```

The output is `output/images/rootfs.cpio`.

2. Build the Linux kernel
```
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
make ARCH=powerpc microwatt_defconfig
make ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- \
CONFIG_INITRAMFS_SOURCE=/buildroot/output/images/rootfs.cpio -j`nproc`
```

The output is `arch/powerpc/boot/dtbImage.microwatt.elf`.

3. Build gateware using FuseSoC

First configure FuseSoC as above.
```
fusesoc run --build --target=arty_a7-100 microwatt --no_bram --memory_size=0
```

The output is `build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit`.

4. Program the flash

This operation will overwrite the contents of your flash.

For the Arty A7 A100, set `FLASH_ADDRESS` to `0x400000` and pass `-f a100`.

For the Arty A7 A35, set `FLASH_ADDRESS` to `0x300000` and pass `-f a35`.
```
microwatt/openocd/flash-arty -f a100 build/microwatt_0/arty_a7-100-vivado/microwatt_0.bit
microwatt/openocd/flash-arty -f a100 dtbImage.microwatt.elf -t bin -a $FLASH_ADDRESS
```

5. Connect to the second USB TTY device exposed by the FPGA

```
minicom -D /dev/ttyUSB1
```

The gateware has firmware that will look at `FLASH_ADDRESS` and attempt to
parse an ELF there, loading it to the address specified in the ELF header
and jumping to it.

## Testing

- A simple test suite containing random execution test cases and a couple of
@ -139,8 +195,5 @@ make -j$(nproc) check

## Issues

This is functional, but very simple. We still have quite a lot to do:

- There are a few instructions still to be implemented
- Need to add caches and bypassing (in progress)
- Need to add supervisor state (in progress)
- There are a few instructions still to be implemented:
- Vector/VMX/VSX

@ -21,6 +21,7 @@ package common is
constant MSR_FE1 : integer := (63 - 55); -- Floating Exception mode
constant MSR_IR : integer := (63 - 58); -- Instruction Relocation
constant MSR_DR : integer := (63 - 59); -- Data Relocation
constant MSR_PMM : integer := (63 - 61); -- Performance Monitor Mark
constant MSR_RI : integer := (63 - 62); -- Recoverable Interrupt
constant MSR_LE : integer := (63 - 63); -- Little Endian

@ -54,6 +55,34 @@ package common is
constant SPR_PTCR : spr_num_t := 464;
constant SPR_PVR : spr_num_t := 287;

-- PMU registers
constant SPR_UPMC1 : spr_num_t := 771;
constant SPR_UPMC2 : spr_num_t := 772;
constant SPR_UPMC3 : spr_num_t := 773;
constant SPR_UPMC4 : spr_num_t := 774;
constant SPR_UPMC5 : spr_num_t := 775;
constant SPR_UPMC6 : spr_num_t := 776;
constant SPR_UMMCR0 : spr_num_t := 779;
constant SPR_UMMCR1 : spr_num_t := 782;
constant SPR_UMMCR2 : spr_num_t := 769;
constant SPR_UMMCRA : spr_num_t := 770;
constant SPR_USIER : spr_num_t := 768;
constant SPR_USIAR : spr_num_t := 780;
constant SPR_USDAR : spr_num_t := 781;
constant SPR_PMC1 : spr_num_t := 787;
constant SPR_PMC2 : spr_num_t := 788;
constant SPR_PMC3 : spr_num_t := 789;
constant SPR_PMC4 : spr_num_t := 790;
constant SPR_PMC5 : spr_num_t := 791;
constant SPR_PMC6 : spr_num_t := 792;
constant SPR_MMCR0 : spr_num_t := 795;
constant SPR_MMCR1 : spr_num_t := 798;
constant SPR_MMCR2 : spr_num_t := 785;
constant SPR_MMCRA : spr_num_t := 786;
constant SPR_SIER : spr_num_t := 784;
constant SPR_SIAR : spr_num_t := 796;
constant SPR_SDAR : spr_num_t := 797;

-- GPR indices in the register file (GPR only)
subtype gpr_index_t is std_ulogic_vector(4 downto 0);

@ -127,6 +156,12 @@ package common is
constant FPSCR_NI : integer := 63 - 61;
constant FPSCR_RN : integer := 63 - 63;

-- Real addresses
-- REAL_ADDR_BITS is the number of real address bits that we store
constant REAL_ADDR_BITS : positive := 56;
subtype real_addr_t is std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
function addr_to_real(addr: std_ulogic_vector(63 downto 0)) return real_addr_t;

-- Used for tracking instruction completion and pending register writes
constant TAG_COUNT : positive := 4;
constant TAG_NUMBER_BITS : natural := log2(TAG_COUNT);
@ -165,8 +200,8 @@ package common is
priv_mode : std_ulogic;
big_endian : std_ulogic;
stop_mark: std_ulogic;
sequential: std_ulogic;
predicted : std_ulogic;
pred_ntaken : std_ulogic;
nia: std_ulogic_vector(63 downto 0);
end record;

@ -178,6 +213,12 @@ package common is
insn: std_ulogic_vector(31 downto 0);
big_endian: std_ulogic;
next_predicted: std_ulogic;
next_pred_ntaken: std_ulogic;
end record;

type IcacheEventType is record
icache_miss : std_ulogic;
itlb_miss_resolved : std_ulogic;
end record;

type Decode1ToDecode2Type is record
@ -303,6 +344,51 @@ package common is
is_extended => '0', is_modulus => '0',
neg_result => '0', others => (others => '0'));

type PMUEventType is record
no_instr_avail : std_ulogic;
dispatch : std_ulogic;
ext_interrupt : std_ulogic;
instr_complete : std_ulogic;
fp_complete : std_ulogic;
ld_complete : std_ulogic;
st_complete : std_ulogic;
br_taken_complete : std_ulogic;
br_mispredict : std_ulogic;
ipref_discard : std_ulogic;
itlb_miss : std_ulogic;
itlb_miss_resolved : std_ulogic;
icache_miss : std_ulogic;
dc_miss_resolved : std_ulogic;
dc_load_miss : std_ulogic;
dc_ld_miss_resolved : std_ulogic;
dc_store_miss : std_ulogic;
dtlb_miss : std_ulogic;
dtlb_miss_resolved : std_ulogic;
ld_miss_nocache : std_ulogic;
ld_fill_nocache : std_ulogic;
end record;
constant PMUEventInit : PMUEventType := (others => '0');

type Execute1ToPMUType is record
mfspr : std_ulogic;
mtspr : std_ulogic;
spr_num : std_ulogic_vector(4 downto 0);
spr_val : std_ulogic_vector(63 downto 0);
tbbits : std_ulogic_vector(3 downto 0); -- event bits from timebase
pmm_msr : std_ulogic; -- PMM bit from MSR
pr_msr : std_ulogic; -- PR bit from MSR
run : std_ulogic;
nia : std_ulogic_vector(63 downto 0);
addr : std_ulogic_vector(63 downto 0);
addr_v : std_ulogic;
occur : PMUEventType;
end record;

type PMUToExecute1Type is record
spr_val : std_ulogic_vector(63 downto 0);
intr : std_ulogic;
end record;

type Decode2ToRegisterFileType is record
read1_enable : std_ulogic;
read1_reg : gspr_index_t;
@ -396,6 +482,14 @@ package common is
cache_paradox : std_ulogic;
end record;

type DcacheEventType is record
load_miss : std_ulogic;
store_miss : std_ulogic;
dcache_refill : std_ulogic;
dtlb_miss : std_ulogic;
dtlb_miss_resolved : std_ulogic;
end record;

type Loadstore1ToMmuType is record
valid : std_ulogic;
tlbie : std_ulogic;
@ -465,6 +559,12 @@ package common is
interrupt => '0', intr_vec => 0,
srr0 => (others => '0'), srr1 => (others => '0'));

type Loadstore1EventType is record
load_complete : std_ulogic;
store_complete : std_ulogic;
itlb_miss : std_ulogic;
end record;

type Execute1ToWritebackType is record
valid: std_ulogic;
instr_tag : instr_tag_t;
@ -595,6 +695,11 @@ package common is
write_cr_mask => (others => '0'),
write_cr_data => (others => '0'));

type WritebackEventType is record
instr_complete : std_ulogic;
fp_complete : std_ulogic;
end record;

end common;

package body common is
@ -679,4 +784,9 @@ package body common is
begin
return tag1.valid = '1' and tag2.valid = '1' and tag1.tag = tag2.tag;
end;

function addr_to_real(addr: std_ulogic_vector(63 downto 0)) return real_addr_t is
begin
return addr(real_addr_t'range);
end;
end common;

@ -0,0 +1,225 @@
LOCATE COMP "ext_clk" SITE "A9";
IOBUF PORT "ext_clk" IO_TYPE=LVCMOS33;

// LOCATE COMP "ext_rst_n" SITE "J2"; // io_13
// IOBUF PORT "ext_rst_n" PULLMODE=UP IO_TYPE=LVCMOS33 DRIVE=4;

// user_button as reset
LOCATE COMP "ext_rst_n" SITE "J17";
IOBUF PORT "ext_rst_n" IO_TYPE=SSTL135_I;

LOCATE COMP "usb_d_p" SITE "N1";
LOCATE COMP "usb_d_n" SITE "M2";
LOCATE COMP "usb_pullup" SITE "N2";

IOBUF PORT "usb_d_p" IO_TYPE=LVCMOS33;
IOBUF PORT "usb_d_n" IO_TYPE=LVCMOS33;
IOBUF PORT "usb_pullup" IO_TYPE=LVCMOS33;

LOCATE COMP "led0_g" SITE "M3";
LOCATE COMP "led0_r" SITE "K4";
LOCATE COMP "led0_b" SITE "J3";

IOBUF PORT "led0_g" IO_TYPE=LVCMOS33;
IOBUF PORT "led0_g" IO_TYPE=LVCMOS33;
IOBUF PORT "led0_b" IO_TYPE=LVCMOS33;

// discontinuous gpio numbers, match orangecrab litex platform
LOCATE COMP "pin_gpio_0" SITE "N17"; // tx
LOCATE COMP "pin_gpio_1" SITE "M18"; // rx
LOCATE COMP "pin_gpio_2" SITE "C10"; // sda
LOCATE COMP "pin_gpio_3" SITE "C9"; // scl
//
LOCATE COMP "pin_gpio_5" SITE "B10"; // io_5
LOCATE COMP "pin_gpio_6" SITE "B9"; // ...
//
LOCATE COMP "pin_gpio_9" SITE "C8"; //
LOCATE COMP "pin_gpio_10" SITE "B8"; //
LOCATE COMP "pin_gpio_11" SITE "A8"; //
LOCATE COMP "pin_gpio_12" SITE "H2"; //
LOCATE COMP "pin_gpio_13" SITE "J2"; // io_13
LOCATE COMP "pin_gpio_14" SITE "N15"; // miso
LOCATE COMP "pin_gpio_15" SITE "R17"; // sck
LOCATE COMP "pin_gpio_16" SITE "N16"; // mosi

LOCATE COMP "pin_io_a0" SITE "L4";
LOCATE COMP "pin_io_a1" SITE "N3";
LOCATE COMP "pin_io_a2" SITE "N4";
LOCATE COMP "pin_io_a3" SITE "H4";
LOCATE COMP "pin_io_a4" SITE "G4";
LOCATE COMP "pin_io_a5" SITE "T17";

IOBUF PORT "pin_gpio_0" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_1" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_2" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_3" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_5" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_6" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_9" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_10" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_11" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_12" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_13" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_14" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_15" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_gpio_16" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a0" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a1" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a2" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a3" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a4" IO_TYPE=LVCMOS33;
IOBUF PORT "pin_io_a5" IO_TYPE=LVCMOS33;

LOCATE COMP "ddram_a[0]" SITE "C4";
LOCATE COMP "ddram_a[1]" SITE "D2";
LOCATE COMP "ddram_a[2]" SITE "D3";
LOCATE COMP "ddram_a[3]" SITE "A3";
LOCATE COMP "ddram_a[4]" SITE "A4";
LOCATE COMP "ddram_a[5]" SITE "D4";
LOCATE COMP "ddram_a[6]" SITE "C3";
LOCATE COMP "ddram_a[7]" SITE "B2";
LOCATE COMP "ddram_a[8]" SITE "B1";
LOCATE COMP "ddram_a[9]" SITE "D1";
LOCATE COMP "ddram_a[10]" SITE "A7";
LOCATE COMP "ddram_a[11]" SITE "C2";
LOCATE COMP "ddram_a[12]" SITE "B6";
LOCATE COMP "ddram_a[13]" SITE "C1";
LOCATE COMP "ddram_a[14]" SITE "A2";
LOCATE COMP "ddram_a[15]" SITE "C7";
IOBUF PORT "ddram_a[0]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[1]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[2]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[3]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[4]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[5]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[6]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[7]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[8]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[9]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[10]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[11]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[12]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[13]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[14]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_a[15]" IO_TYPE=SSTL135_I SLEWRATE=FAST;

LOCATE COMP "ddram_ba[0]" SITE "D6";
LOCATE COMP "ddram_ba[1]" SITE "B7";
LOCATE COMP "ddram_ba[2]" SITE "A6";
LOCATE COMP "ddram_cas_n" SITE "D13";
LOCATE COMP "ddram_cs_n" SITE "A12";
LOCATE COMP "ddram_dm[0]" SITE "D16";
LOCATE COMP "ddram_dm[1]" SITE "G16";
LOCATE COMP "ddram_ras_n" SITE "C12";
LOCATE COMP "ddram_we_n" SITE "B12";
IOBUF PORT "ddram_ba[0]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_ba[1]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_ba[2]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_cas_n" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_cs_n" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_dm[0]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_dm[1]" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_ras_n" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_we_n" IO_TYPE=SSTL135_I SLEWRATE=FAST;

// from litex platform, termination disabled to reduce heat
LOCATE COMP "ddram_dq[0]" SITE "C17";
LOCATE COMP "ddram_dq[1]" SITE "D15";
LOCATE COMP "ddram_dq[2]" SITE "B17";
LOCATE COMP "ddram_dq[3]" SITE "C16";
LOCATE COMP "ddram_dq[4]" SITE "A15";
LOCATE COMP "ddram_dq[5]" SITE "B13";
LOCATE COMP "ddram_dq[6]" SITE "A17";
LOCATE COMP "ddram_dq[7]" SITE "A13";
LOCATE COMP "ddram_dq[8]" SITE "F17";
LOCATE COMP "ddram_dq[9]" SITE "F16";
LOCATE COMP "ddram_dq[10]" SITE "G15";
LOCATE COMP "ddram_dq[11]" SITE "F15";
LOCATE COMP "ddram_dq[12]" SITE "J16";
LOCATE COMP "ddram_dq[13]" SITE "C18";
LOCATE COMP "ddram_dq[14]" SITE "H16";
LOCATE COMP "ddram_dq[15]" SITE "F18";
IOBUF PORT "ddram_dq[0]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[1]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[2]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[3]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[4]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[5]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[6]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[7]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[8]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[9]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[10]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[11]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[12]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[13]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[14]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;
IOBUF PORT "ddram_dq[15]" IO_TYPE=SSTL135_I SLEWRATE=FAST TERMINATION=OFF;

LOCATE COMP "ddram_dqs_n[0]" SITE "A16";
LOCATE COMP "ddram_dqs_n[1]" SITE "H17";
LOCATE COMP "ddram_dqs_p[0]" SITE "B15";
LOCATE COMP "ddram_dqs_p[1]" SITE "G18";
IOBUF PORT "ddram_dqs_n[0]" IO_TYPE=SSTL135D_I SLEWRATE=FAST DIFFRESISTOR=100 TERMINATION=OFF;
IOBUF PORT "ddram_dqs_n[1]" IO_TYPE=SSTL135D_I SLEWRATE=FAST DIFFRESISTOR=100 TERMINATION=OFF;
IOBUF PORT "ddram_dqs_p[0]" IO_TYPE=SSTL135D_I SLEWRATE=FAST DIFFRESISTOR=100 TERMINATION=OFF;
IOBUF PORT "ddram_dqs_p[1]" IO_TYPE=SSTL135D_I SLEWRATE=FAST DIFFRESISTOR=100 TERMINATION=OFF;

LOCATE COMP "ddram_clk_p" SITE "J18";
LOCATE COMP "ddram_clk_n" SITE "K18";
IOBUF PORT "ddram_clk_p" IO_TYPE=SSTL135D_I SLEWRATE=FAST;
IOBUF PORT "ddram_clk_n" IO_TYPE=SSTL135D_I SLEWRATE=FAST;

LOCATE COMP "ddram_cke" SITE "D18";
LOCATE COMP "ddram_odt" SITE "C13";
LOCATE COMP "ddram_reset_n" SITE "L18";
IOBUF PORT "ddram_cke" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_odt" IO_TYPE=SSTL135_I SLEWRATE=FAST;
IOBUF PORT "ddram_reset_n" IO_TYPE=SSTL135_I SLEWRATE=FAST;

LOCATE COMP "ddram_vccio[0]" SITE "K16";
LOCATE COMP "ddram_vccio[1]" SITE "D17";
LOCATE COMP "ddram_vccio[2]" SITE "K15";
LOCATE COMP "ddram_vccio[3]" SITE "K17";
LOCATE COMP "ddram_vccio[4]" SITE "B18";
LOCATE COMP "ddram_vccio[5]" SITE "C6";
LOCATE COMP "ddram_gnd[0]" SITE "L15";
LOCATE COMP "ddram_gnd[1]" SITE "L16";
IOBUF PORT "ddram_vccio[0]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_vccio[1]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_vccio[2]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_vccio[3]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_vccio[4]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_vccio[5]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_gnd[0]" IO_TYPE=SSTL135_II SLEWRATE=FAST;
IOBUF PORT "ddram_gnd[1]" IO_TYPE=SSTL135_II SLEWRATE=FAST;

// We use USRMCLK instead for clk
// LOCATE COMP "spi_flash_clk" SITE "U16";
// IOBUF PORT "spi_flash_clk" IO_TYPE=LVCMOS33;
LOCATE COMP "spi_flash_cs_n" SITE "U17";
IOBUF PORT "spi_flash_cs_n" IO_TYPE=LVCMOS33;
LOCATE COMP "spi_flash_mosi" SITE "U18";
IOBUF PORT "spi_flash_mosi" IO_TYPE=LVCMOS33;
LOCATE COMP "spi_flash_miso" SITE "T18";
IOBUF PORT "spi_flash_miso" IO_TYPE=LVCMOS33;
LOCATE COMP "spi_flash_wp_n" SITE "R18";
IOBUF PORT "spi_flash_wp_n" IO_TYPE=LVCMOS33;
LOCATE COMP "spi_flash_hold_n" SITE "N18";
IOBUF PORT "spi_flash_hold_n" IO_TYPE=LVCMOS33;

LOCATE COMP "sdcard_data[0]" SITE "J1";
LOCATE COMP "sdcard_data[1]" SITE "K3";
LOCATE COMP "sdcard_data[2]" SITE "L3";
LOCATE COMP "sdcard_data[3]" SITE "M1";
LOCATE COMP "sdcard_cmd" SITE "K2";
LOCATE COMP "sdcard_clk" SITE "K1";
LOCATE COMP "sdcard_cd" SITE "L1";

IOBUF PORT "sdcard_data[0]" IO_TYPE=LVCMOS33 SLEWRATE=FAST PULLMODE=UP;
IOBUF PORT "sdcard_data[1]" IO_TYPE=LVCMOS33 SLEWRATE=FAST PULLMODE=UP;
IOBUF PORT "sdcard_data[2]" IO_TYPE=LVCMOS33 SLEWRATE=FAST PULLMODE=UP;
IOBUF PORT "sdcard_data[3]" IO_TYPE=LVCMOS33 SLEWRATE=FAST PULLMODE=UP;
IOBUF PORT "sdcard_cmd" IO_TYPE=LVCMOS33 SLEWRATE=FAST PULLMODE=UP;
IOBUF PORT "sdcard_clk" IO_TYPE=LVCMOS33 SLEWRATE=FAST;
IOBUF PORT "sdcard_cd" IO_TYPE=LVCMOS33;

@ -64,8 +64,8 @@ architecture rtl of control is

signal r_int, rin_int : reg_internal_type := reg_internal_init;

signal gpr_write_valid : std_ulogic := '0';
signal cr_write_valid : std_ulogic := '0';
signal gpr_write_valid : std_ulogic;
signal cr_write_valid : std_ulogic;

type tag_register is record
wr_gpr : std_ulogic;
@ -245,6 +245,8 @@ begin
end if;

if rst = '1' then
gpr_write_valid <= '0';
cr_write_valid <= '0';
v_int := reg_internal_init;
valid_tmp := '0';
end if;

@ -13,6 +13,7 @@ entity core is
EX1_BYPASS : boolean := true;
HAS_FPU : boolean := true;
HAS_BTC : boolean := true;
HAS_SHORT_MULT : boolean := false;
ALT_RESET_ADDRESS : std_ulogic_vector(63 downto 0) := (others => '0');
LOG_LENGTH : natural := 512;
ICACHE_NUM_LINES : natural := 64;
@ -116,21 +117,20 @@ architecture behave of core is
signal complete: instr_tag_t;
signal terminate: std_ulogic;
signal core_rst: std_ulogic;
signal icache_inv: std_ulogic;
signal do_interrupt: std_ulogic;

-- Delayed/Latched resets and alt_reset
signal rst_fetch1 : std_ulogic := '1';
signal rst_fetch2 : std_ulogic := '1';
signal rst_icache : std_ulogic := '1';
signal rst_dcache : std_ulogic := '1';
signal rst_dec1 : std_ulogic := '1';
signal rst_dec2 : std_ulogic := '1';
signal rst_ex1 : std_ulogic := '1';
signal rst_fpu : std_ulogic := '1';
signal rst_ls1 : std_ulogic := '1';
signal rst_wback : std_ulogic := '1';
signal rst_dbg : std_ulogic := '1';
signal rst_fetch1 : std_ulogic;
signal rst_fetch2 : std_ulogic;
signal rst_icache : std_ulogic;
signal rst_dcache : std_ulogic;
signal rst_dec1 : std_ulogic;
signal rst_dec2 : std_ulogic;
signal rst_ex1 : std_ulogic;
signal rst_fpu : std_ulogic;
signal rst_ls1 : std_ulogic;
signal rst_wback : std_ulogic;
signal rst_dbg : std_ulogic;
signal alt_reset_d : std_ulogic;

signal sim_cr_dump: std_ulogic;
@ -147,6 +147,12 @@ architecture behave of core is

signal msr : std_ulogic_vector(63 downto 0);

-- PMU event bus
signal icache_events : IcacheEventType;
signal loadstore_events : Loadstore1EventType;
signal dcache_events : DcacheEventType;
signal writeback_events : WritebackEventType;

-- Debug status
signal dbg_core_is_stopped: std_ulogic;

@ -244,6 +250,7 @@ begin
wishbone_out => wishbone_insn_out,
wishbone_in => wishbone_insn_in,
wb_snoop_in => wb_snoop_in,
events => icache_events,
log_out => log_data(96 downto 43)
);

@ -333,6 +340,7 @@ begin
generic map (
EX1_BYPASS => EX1_BYPASS,
HAS_FPU => HAS_FPU,
HAS_SHORT_MULT => HAS_SHORT_MULT,
LOG_LENGTH => LOG_LENGTH
)
port map (
@ -352,6 +360,10 @@ begin
bypass_cr_data => execute1_cr_bypass,
icache_inval => ex1_icache_inval,
dbg_msr_out => msr,
wb_events => writeback_events,
ls_events => loadstore_events,
dc_events => dcache_events,
ic_events => icache_events,
terminate_out => terminate,
log_out => log_data(134 downto 120),
log_rd_addr => log_rd_addr,
@ -393,6 +405,7 @@ begin
m_out => loadstore1_to_mmu,
m_in => mmu_to_loadstore1,
dc_stall => dcache_stall_out,
events => loadstore_events,
log_out => log_data(149 downto 140)
);

@ -427,6 +440,7 @@ begin
wishbone_in => wishbone_data_in,
wishbone_out => wishbone_data_out,
snoop_in => wb_snoop_in,
events => dcache_events,
log_out => log_data(170 downto 151)
);

@ -441,6 +455,7 @@ begin
w_out => writeback_to_register_file,
c_out => writeback_to_cr_file,
f_out => writeback_to_fetch1,
events => writeback_events,
interrupt_out => do_interrupt,
complete_out => complete
);

@ -154,6 +154,7 @@ begin
stopping <= '0';
terminated <= '0';
log_trigger_delay <= 0;
gspr_index <= (others => '0');
else
if do_log_trigger = '1' or log_trigger_delay /= 0 then
if log_trigger_delay = 255 then

@ -121,6 +121,7 @@ begin
DRAM_ABITS => 24,
DRAM_ALINES => 1,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => DRAM_INIT_FILE,
PAYLOAD_SIZE => ROM_SIZE

@ -0,0 +1,136 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.helpers.all;

entity bit_counter is
port (
clk : in std_logic;
rs : in std_ulogic_vector(63 downto 0);
count_right : in std_ulogic;
do_popcnt : in std_ulogic;
is_32bit : in std_ulogic;
datalen : in std_ulogic_vector(3 downto 0);
result : out std_ulogic_vector(63 downto 0)
);
end entity bit_counter;

architecture behaviour of bit_counter is
-- signals for count-leading/trailing-zeroes
signal inp : std_ulogic_vector(63 downto 0);
signal inp_r : std_ulogic_vector(63 downto 0);
signal sum : std_ulogic_vector(64 downto 0);
signal sum_r : std_ulogic_vector(64 downto 0);
signal onehot : std_ulogic_vector(63 downto 0);
signal edge : std_ulogic_vector(63 downto 0);
signal bitnum : std_ulogic_vector(5 downto 0);
signal cntz : std_ulogic_vector(63 downto 0);

-- signals for popcnt
signal dlen_r : std_ulogic_vector(3 downto 0);
signal pcnt_r : std_ulogic;
subtype twobit is unsigned(1 downto 0);
type twobit32 is array(0 to 31) of twobit;
signal pc2 : twobit32;
subtype threebit is unsigned(2 downto 0);
type threebit16 is array(0 to 15) of threebit;
signal pc4 : threebit16;
subtype fourbit is unsigned(3 downto 0);
type fourbit8 is array(0 to 7) of fourbit;
signal pc8 : fourbit8;
signal pc8_r : fourbit8;
subtype sixbit is unsigned(5 downto 0);
type sixbit2 is array(0 to 1) of sixbit;
signal pc32 : sixbit2;
signal popcnt : std_ulogic_vector(63 downto 0);

begin
countzero_r: process(clk)
begin
if rising_edge(clk) then
inp_r <= inp;
sum_r <= sum;
end if;
end process;

countzero: process(all)
variable bitnum_e, bitnum_o : std_ulogic_vector(5 downto 0);
begin
if is_32bit = '0' then
if count_right = '0' then
inp <= bit_reverse(rs);
else
inp <= rs;
end if;
else
inp(63 downto 32) <= x"FFFFFFFF";
if count_right = '0' then
inp(31 downto 0) <= bit_reverse(rs(31 downto 0));
else
inp(31 downto 0) <= rs(31 downto 0);
end if;
end if;

sum <= std_ulogic_vector(unsigned('0' & not inp) + 1);

-- The following occurs after a clock edge
edge <= sum_r(63 downto 0) or inp_r;
bitnum_e := edgelocation(edge, 6);
onehot <= sum_r(63 downto 0) and inp_r;
bitnum_o := bit_number(onehot);
bitnum(5 downto 2) <= bitnum_e(5 downto 2);
bitnum(1 downto 0) <= bitnum_o(1 downto 0);

cntz <= 57x"0" & sum_r(64) & bitnum;
end process;

popcnt_r: process(clk)
begin
if rising_edge(clk) then
for i in 0 to 7 loop
pc8_r(i) <= pc8(i);
end loop;
dlen_r <= datalen;
pcnt_r <= do_popcnt;
end if;
end process;

popcnt_a: process(all)
begin
for i in 0 to 31 loop
pc2(i) <= unsigned("0" & rs(i * 2 downto i * 2)) + unsigned("0" & rs(i * 2 + 1 downto i * 2 + 1));
end loop;
for i in 0 to 15 loop
pc4(i) <= ('0' & pc2(i * 2)) + ('0' & pc2(i * 2 + 1));
end loop;
for i in 0 to 7 loop
pc8(i) <= ('0' & pc4(i * 2)) + ('0' & pc4(i * 2 + 1));
end loop;

-- after a clock edge
for i in 0 to 1 loop
pc32(i) <= ("00" & pc8_r(i * 4)) + ("00" & pc8_r(i * 4 + 1)) +
("00" & pc8_r(i * 4 + 2)) + ("00" & pc8_r(i * 4 + 3));
end loop;
popcnt <= (others => '0');
if dlen_r(3 downto 2) = "00" then
-- popcntb
for i in 0 to 7 loop
popcnt(i * 8 + 3 downto i * 8) <= std_ulogic_vector(pc8_r(i));
end loop;
elsif dlen_r(3) = '0' then
-- popcntw
for i in 0 to 1 loop
popcnt(i * 32 + 5 downto i * 32) <= std_ulogic_vector(pc32(i));
end loop;
else
popcnt(6 downto 0) <= std_ulogic_vector(('0' & pc32(0)) + ('0' & pc32(1)));
end if;
end process;

result <= cntz when pcnt_r = '0' else popcnt;

end behaviour;

@ -11,11 +11,11 @@ use work.common.all;
library osvvm;
use osvvm.RandomPkg.all;

entity countzero_tb is
entity countbits_tb is
generic (runner_cfg : string := runner_cfg_default);
end countzero_tb;
end countbits_tb;

architecture behave of countzero_tb is
architecture behave of countbits_tb is
constant clk_period: time := 10 ns;
signal rs: std_ulogic_vector(63 downto 0);
signal is_32bit, count_right: std_ulogic := '0';
@ -23,13 +23,15 @@ architecture behave of countzero_tb is
signal clk: std_ulogic;

begin
zerocounter_0: entity work.zero_counter
bitcounter_0: entity work.bit_counter
port map (
clk => clk,
rs => rs,
result => res,
count_right => count_right,
is_32bit => is_32bit
is_32bit => is_32bit,
do_popcnt => '0',
datalen => "0000"
);

clk_process: process

@ -1,60 +0,0 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.helpers.all;

entity zero_counter is
port (
clk : in std_logic;
rs : in std_ulogic_vector(63 downto 0);
count_right : in std_ulogic;
is_32bit : in std_ulogic;
result : out std_ulogic_vector(63 downto 0)
);
end entity zero_counter;

architecture behaviour of zero_counter is
signal inp : std_ulogic_vector(63 downto 0);
signal sum : std_ulogic_vector(64 downto 0);
signal msb_r : std_ulogic;
signal onehot : std_ulogic_vector(63 downto 0);
signal onehot_r : std_ulogic_vector(63 downto 0);
signal bitnum : std_ulogic_vector(5 downto 0);

begin
countzero_r: process(clk)
begin
if rising_edge(clk) then
msb_r <= sum(64);
onehot_r <= onehot;
end if;
end process;

countzero: process(all)
begin
if is_32bit = '0' then
if count_right = '0' then
inp <= bit_reverse(rs);
else
inp <= rs;
end if;
else
inp(63 downto 32) <= x"FFFFFFFF";
if count_right = '0' then
inp(31 downto 0) <= bit_reverse(rs(31 downto 0));
else
inp(31 downto 0) <= rs(31 downto 0);
end if;
end if;

sum <= std_ulogic_vector(unsigned('0' & not inp) + 1);
onehot <= sum(63 downto 0) and inp;

-- The following occurs after a clock edge
bitnum <= bit_number(onehot_r);

result <= x"00000000000000" & "0" & msb_r & bitnum;
end process;
end behaviour;

@ -46,6 +46,8 @@ entity dcache is
wishbone_out : out wishbone_master_out;
wishbone_in : in wishbone_slave_out;

events : out DcacheEventType;

log_out : out std_ulogic_vector(19 downto 0)
);
end entity dcache;
@ -65,8 +67,6 @@ architecture rtl of dcache is

-- Bit fields counts in the address

-- REAL_ADDR_BITS is the number of real address bits that we store
constant REAL_ADDR_BITS : positive := 56;
-- ROW_BITS is the number of bits to select a row
constant ROW_BITS : natural := log2(BRAM_ROWS);
-- ROW_LINEBITS is the number of bits to select a row within a line
@ -120,7 +120,7 @@ architecture rtl of dcache is
type cache_valids_t is array(index_t) of cache_way_valids_t;
type row_per_line_valid_t is array(0 to ROW_PER_LINE - 1) of std_ulogic;

-- Storage. Hopefully "cache_rows" is a BRAM, the rest is LUTs
-- Storage. Hopefully implemented in LUTs
signal cache_tags : cache_tags_array_t;
signal cache_tag_set : cache_tags_set_t;
signal cache_valids : cache_valids_t;
@ -287,7 +287,7 @@ architecture rtl of dcache is
op : op_t;
valid : std_ulogic;
dcbz : std_ulogic;
real_addr : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
real_addr : real_addr_t;
data : std_ulogic_vector(63 downto 0);
byte_sel : std_ulogic_vector(7 downto 0);
hit_way : way_t;
@ -315,15 +315,13 @@ architecture rtl of dcache is
tlb_hit_way : tlb_way_t;
tlb_hit_index : tlb_index_t;

-- 2-stage data buffer for data forwarded from writes to reads
forward_data1 : std_ulogic_vector(63 downto 0);
forward_data2 : std_ulogic_vector(63 downto 0);
forward_sel1 : std_ulogic_vector(7 downto 0);
forward_valid1 : std_ulogic;
forward_way1 : way_t;
forward_row1 : row_t;
use_forward1 : std_ulogic;
-- data buffer for data forwarded from writes to reads
forward_data : std_ulogic_vector(63 downto 0);
forward_tag : cache_tag_t;
forward_sel : std_ulogic_vector(7 downto 0);
forward_valid : std_ulogic;
forward_row : row_t;
data_out : std_ulogic_vector(63 downto 0);

-- Cache miss state (reload state machine)
state : state_t;
@ -355,6 +353,8 @@ architecture rtl of dcache is

signal r1 : reg_stage_1_t;

signal ev : DcacheEventType;

-- Reservation information
--
type reservation_t is record
@ -383,12 +383,16 @@ architecture rtl of dcache is
signal r0_valid : std_ulogic;
signal r0_stall : std_ulogic;

signal use_forward1_next : std_ulogic;
signal use_forward2_next : std_ulogic;
signal fwd_same_tag : std_ulogic;
signal use_forward_st : std_ulogic;
signal use_forward_rl : std_ulogic;
signal use_forward2 : std_ulogic;

-- Cache RAM interface
type cache_ram_out_t is array(way_t) of cache_row_t;
signal cache_out : cache_ram_out_t;
signal ram_wr_data : cache_row_t;
signal ram_wr_select : std_ulogic_vector(ROW_SIZE - 1 downto 0);

-- PLRU output interface
type plru_out_t is array(index_t) of std_ulogic_vector(WAY_BITS-1 downto 0);
@ -406,12 +410,13 @@ architecture rtl of dcache is
signal tlb_hit : std_ulogic;
signal tlb_hit_way : tlb_way_t;
signal pte : tlb_pte_t;
signal ra : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
signal ra : real_addr_t;
signal valid_ra : std_ulogic;
signal perm_attr : perm_attr_t;
signal rc_ok : std_ulogic;
signal perm_ok : std_ulogic;
signal access_ok : std_ulogic;
signal tlb_miss : std_ulogic;

-- TLB PLRU output interface
type tlb_plru_out_t is array(tlb_index_t) of std_ulogic_vector(TLB_WAY_BITS-1 downto 0);
@ -447,9 +452,9 @@ architecture rtl of dcache is
end;

-- Returns whether this is the last row of a line
function is_last_row_addr(addr: wishbone_addr_type; last: row_in_line_t) return boolean is
function is_last_row_wb_addr(addr: wishbone_addr_type; last: row_in_line_t) return boolean is
begin
return unsigned(addr(LINE_OFF_BITS-1 downto ROW_OFF_BITS)) = last;
return unsigned(addr(LINE_OFF_BITS - ROW_OFF_BITS - 1 downto 0)) = last;
end;

-- Returns whether this is the last row of a line
@ -459,15 +464,15 @@ architecture rtl of dcache is
end;

-- Return the address of the next row in the current cache line
function next_row_addr(addr: wishbone_addr_type) return std_ulogic_vector is
function next_row_wb_addr(addr: wishbone_addr_type) return std_ulogic_vector is
variable row_idx : std_ulogic_vector(ROW_LINEBITS-1 downto 0);
variable result : wishbone_addr_type;
begin
-- Is there no simpler way in VHDL to generate that 3 bits adder ?
row_idx := addr(LINE_OFF_BITS-1 downto ROW_OFF_BITS);
row_idx := addr(ROW_LINEBITS - 1 downto 0);
row_idx := std_ulogic_vector(unsigned(row_idx) + 1);
result := addr;
result(LINE_OFF_BITS-1 downto ROW_OFF_BITS) := row_idx;
result(ROW_LINEBITS - 1 downto 0) := row_idx;
return result;
end;

@ -571,6 +576,7 @@ begin
r.doall := m_in.doall;
r.tlbld := m_in.tlbld;
r.mmu_req := '1';
r.d_valid := '1';
else
r.req := d_in;
r.req.data := (others => '0');
@ -578,21 +584,19 @@ begin
r.doall := '0';
r.tlbld := '0';
r.mmu_req := '0';
r.d_valid := '0';
end if;
r.d_valid := '0';
if rst = '1' then
r0_full <= '0';
elsif (r1.full = '0' and d_in.hold = '0') or r0_full = '0' then
r0 <= r;
r0_full <= r.req.valid;
end if;
-- Sample data the cycle after a request comes in from loadstore1.
-- If another request has come in already then the data will get
-- put directly into req.data below.
if r0.req.valid = '1' and r.req.valid = '0' and r0.d_valid = '0' and
r0.mmu_req = '0' then
elsif r0.d_valid = '0' then
-- Sample data the cycle after a request comes in from loadstore1.
-- If this request is already moving into r1 then the data will get
-- put directly into req.data in the dcache_slow process below.
r0.req.data <= d_in.data;
r0.d_valid <= '1';
r0.d_valid <= r0.req.valid;
end if;
end if;
end process;
@ -605,6 +609,8 @@ begin
r0_valid <= r0_full and not r1.full and not d_in.hold;
stall_out <= r0_stall;

events <= ev;

-- TLB
-- Operates in the second cycle on the request latched in r0.req.
-- TLB updates write the entry at the end of the second cycle.
@ -689,6 +695,7 @@ begin
pte <= (others => '0');
end if;
valid_ra <= tlb_hit or not r0.req.virt_mode;
tlb_miss <= r0_valid and r0.req.virt_mode and not tlb_hit;
if r0.req.virt_mode = '1' then
ra <= pte(REAL_ADDR_BITS - 1 downto TLB_LG_PGSZ) &
r0.req.addr(TLB_LG_PGSZ - 1 downto ROW_OFF_BITS) &
@ -712,6 +719,7 @@ begin
if rising_edge(clk) then
tlbie := r0_valid and r0.tlbie;
tlbwe := r0_valid and r0.tlbld;
ev.dtlb_miss_resolved <= tlbwe;
if rst = '1' or (tlbie = '1' and r0.doall = '1') then
-- clear all valid bits at once
for i in tlb_index_t loop
@ -793,11 +801,10 @@ begin

-- Cache tag RAM second read port, for snooping
cache_tag_read_2 : process(clk)
variable addr : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
variable addr : real_addr_t;
begin
if rising_edge(clk) then
addr := (others => '0');
addr(snoop_in.adr'left downto 0) := snoop_in.adr;
addr := addr_to_real(wb_to_addr(snoop_in.adr));
snoop_tag_set <= cache_tags(get_index(addr));
snoop_wrtag <= get_tag(addr);
snoop_index <= get_index(addr);
@ -820,11 +827,13 @@ begin
variable s_hit : std_ulogic;
variable s_tag : cache_tag_t;
variable s_pte : tlb_pte_t;
variable s_ra : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
variable s_ra : real_addr_t;
variable hit_set : std_ulogic_vector(TLB_NUM_WAYS - 1 downto 0);
variable hit_way_set : hit_way_set_t;
variable rel_matches : std_ulogic_vector(TLB_NUM_WAYS - 1 downto 0);
variable rel_match : std_ulogic;
variable fwd_matches : std_ulogic_vector(TLB_NUM_WAYS - 1 downto 0);
variable fwd_match : std_ulogic;
begin
-- Extract line, row and tag from request
req_index <= get_index(r0.req.addr);
@ -840,8 +849,10 @@ begin
hit_way := 0;
is_hit := '0';
rel_match := '0';
fwd_match := '0';
if r0.req.virt_mode = '1' then
rel_matches := (others => '0');
fwd_matches := (others => '0');
for j in tlb_way_t loop
hit_way_set(j) := 0;
s_hit := '0';
@ -861,11 +872,15 @@ begin
if s_tag = r1.reload_tag then
rel_matches(j) := '1';
end if;
if s_tag = r1.forward_tag then
fwd_matches(j) := '1';
end if;
end loop;
if tlb_hit = '1' then
is_hit := hit_set(tlb_hit_way);
hit_way := hit_way_set(tlb_hit_way);
rel_match := rel_matches(tlb_hit_way);
fwd_match := fwd_matches(tlb_hit_way);
end if;
else
s_tag := get_tag(r0.req.addr);
@ -879,39 +894,28 @@ begin
if s_tag = r1.reload_tag then
rel_match := '1';
end if;
if s_tag = r1.forward_tag then
fwd_match := '1';
end if;
end if;
req_same_tag <= rel_match;

-- See if the request matches the line currently being reloaded
if r1.state = RELOAD_WAIT_ACK and req_index = r1.store_index and
rel_match = '1' then
-- For a store, consider this a hit even if the row isn't valid
-- since it will be by the time we perform the store.
-- For a load, check the appropriate row valid bit.
is_hit := not r0.req.load or r1.rows_valid(req_row mod ROW_PER_LINE);
hit_way := replace_way;
end if;
fwd_same_tag <= fwd_match;

-- Whether to use forwarded data for a load or not
use_forward1_next <= '0';
if get_row(r1.req.real_addr) = req_row and r1.req.hit_way = hit_way then
-- Only need to consider r1.write_bram here, since if we are
-- writing refill data here, then we don't have a cache hit this
-- cycle on the line being refilled. (There is the possibility
-- that the load following the load miss that started the refill
-- could be to the old contents of the victim line, since it is a
-- couple of cycles after the refill starts before we see the
-- updated cache tag. In that case we don't use the bypass.)
use_forward1_next <= r1.write_bram;
use_forward_st <= '0';
use_forward_rl <= '0';
if r1.store_row = req_row and rel_match = '1' then
-- Use the forwarding path if this cycle is a write to this row
use_forward_st <= r1.write_bram;
if r1.state = RELOAD_WAIT_ACK and wishbone_in.ack = '1' then
use_forward_rl <= '1';
end if;
end if;
use_forward2_next <= '0';
if r1.forward_row1 = req_row and r1.forward_way1 = hit_way then
use_forward2_next <= r1.forward_valid1;
use_forward2 <= '0';
if r1.forward_row = req_row and fwd_match = '1' then
use_forward2 <= r1.forward_valid;
end if;

-- The way that matched on a hit
req_hit_way <= hit_way;

-- The way to replace on a miss
if r1.write_tag = '1' then
replace_way <= to_integer(unsigned(plru_victim(r1.store_index)));
@ -919,6 +923,23 @@ begin
replace_way <= r1.store_way;
end if;

-- See if the request matches the line currently being reloaded
if r1.state = RELOAD_WAIT_ACK and req_index = r1.store_index and
rel_match = '1' then
-- Ignore is_hit from above, because a load miss writes the new tag
-- but doesn't clear the valid bit on the line before refilling it.
-- For a store, consider this a hit even if the row isn't valid
-- since it will be by the time we perform the store.
-- For a load, check the appropriate row valid bit; but also,
-- if use_forward_rl is 1 then we can consider this a hit.
is_hit := not r0.req.load or r1.rows_valid(req_row mod ROW_PER_LINE) or
use_forward_rl;
hit_way := replace_way;
end if;

-- The way that matched on a hit
req_hit_way <= hit_way;

-- work out whether we have permission for this access
-- NB we don't yet implement AMR, thus no KUAP
rc_ok <= perm_attr.reference and (r0.req.load or perm_attr.changed);
@ -1014,28 +1035,9 @@ begin
-- Return data for loads & completion control logic
--
writeback_control: process(all)
variable data_out : std_ulogic_vector(63 downto 0);
variable data_fwd : std_ulogic_vector(63 downto 0);
variable j : integer;
begin
-- Use the bypass if are reading the row that was written 1 or 2 cycles
-- ago, including for the slow_valid = 1 case (i.e. completing a load
-- miss or a non-cacheable load).
if r1.use_forward1 = '1' then
data_fwd := r1.forward_data1;
else
data_fwd := r1.forward_data2;
end if;
data_out := cache_out(r1.hit_way);
for i in 0 to 7 loop
j := i * 8;
if r1.forward_sel(i) = '1' then
data_out(j + 7 downto j) := data_fwd(j + 7 downto j);
end if;
end loop;

d_out.valid <= r1.ls_valid;
d_out.data <= data_out;
d_out.data <= r1.data_out;
d_out.store_done <= not r1.stcx_fail;
d_out.error <= r1.ls_error;
d_out.cache_paradox <= r1.cache_paradox;
@ -1043,7 +1045,7 @@ begin
-- Outputs to MMU
m_out.done <= r1.mmu_done;
m_out.err <= r1.mmu_error;
m_out.data <= data_out;
m_out.data <= r1.data_out;

-- We have a valid load or store hit or we just completed a slow
-- op such as a load miss, a NC load or a store
@ -1067,7 +1069,7 @@ begin
-- Request came from loadstore1...
-- Load hit case is the standard path
if r1.hit_load_valid = '1' then
report "completing load hit data=" & to_hstring(data_out);
report "completing load hit data=" & to_hstring(r1.data_out);
end if;

-- error cases complete without stalling
@ -1077,7 +1079,7 @@ begin

-- Slow ops (load miss, NC, stores)
if r1.slow_valid = '1' then
report "completing store or load miss data=" & to_hstring(data_out);
report "completing store or load miss data=" & to_hstring(r1.data_out);
end if;

else
@ -1099,6 +1101,13 @@ begin

end process;

-- RAM write data and select multiplexers
ram_wr_data <= r1.req.data when r1.write_bram = '1' else
wishbone_in.dat when r1.dcbz = '0' else
(others => '0');
ram_wr_select <= r1.req.byte_sel when r1.write_bram = '1' else
(others => '1');

--
-- Generate a cache RAM for each way. This handles the normal
-- reads, writes from reloads and the special store-hit update
@ -1112,7 +1121,6 @@ begin
rams: for i in 0 to NUM_WAYS-1 generate
signal do_read : std_ulogic;
signal rd_addr : std_ulogic_vector(ROW_BITS-1 downto 0);
signal do_write : std_ulogic;
signal wr_addr : std_ulogic_vector(ROW_BITS-1 downto 0);
signal wr_data : std_ulogic_vector(wishbone_data_bits-1 downto 0);
signal wr_sel : std_ulogic_vector(ROW_SIZE-1 downto 0);
@ -1123,7 +1131,7 @@ begin
generic map (
ROW_BITS => ROW_BITS,
WIDTH => wishbone_data_bits,
ADD_BUF => true
ADD_BUF => false
)
port map (
clk => clk,
@ -1132,7 +1140,7 @@ begin
rd_data => dout,
wr_sel => wr_sel_m,
wr_addr => wr_addr,
wr_data => wr_data
wr_data => ram_wr_data
);
process(all)
begin
@ -1148,37 +1156,13 @@ begin
-- For timing, the mux on wr_data/sel/addr is not dependent on anything
-- other than the current state.
--
wr_sel_m <= (others => '0');

do_write <= '0';
if r1.write_bram = '1' then
-- Write store data to BRAM. This happens one cycle after the
-- store is in r0.
wr_data <= r1.req.data;
wr_sel <= r1.req.byte_sel;
wr_addr <= std_ulogic_vector(to_unsigned(get_row(r1.req.real_addr), ROW_BITS));
if i = r1.req.hit_way then
do_write <= '1';
end if;
else
-- Otherwise, we might be doing a reload or a DCBZ
if r1.dcbz = '1' then
wr_data <= (others => '0');
else
wr_data <= wishbone_in.dat;
end if;
wr_addr <= std_ulogic_vector(to_unsigned(r1.store_row, ROW_BITS));
wr_sel <= (others => '1');

if r1.state = RELOAD_WAIT_ACK and wishbone_in.ack = '1' and replace_way = i then
do_write <= '1';
end if;
end if;
wr_addr <= std_ulogic_vector(to_unsigned(r1.store_row, ROW_BITS));

-- Mask write selects with do_write since BRAM doesn't
-- have a global write-enable
if do_write = '1' then
wr_sel_m <= wr_sel;
wr_sel_m <= (others => '0');
if i = replace_way and
(r1.write_bram = '1' or
(r1.state = RELOAD_WAIT_ACK and wishbone_in.ack = '1')) then
wr_sel_m <= ram_wr_select;
end if;

end process;
@ -1189,20 +1173,60 @@ begin
-- It also handles error cases (TLB miss, cache paradox)
--
dcache_fast_hit : process(clk)
variable j : integer;
variable sel : std_ulogic_vector(1 downto 0);
variable data_out : std_ulogic_vector(63 downto 0);
begin
if rising_edge(clk) then
if req_op /= OP_NONE then
report "op:" & op_t'image(req_op) &
" addr:" & to_hstring(r0.req.addr) &
" nc:" & std_ulogic'image(r0.req.nc) &
" idx:" & integer'image(req_index) &
" tag:" & to_hstring(req_tag) &
" way: " & integer'image(req_hit_way);
end if;
report "op:" & op_t'image(req_op) &
" addr:" & to_hstring(r0.req.addr) &
" nc:" & std_ulogic'image(r0.req.nc) &
" idx:" & integer'image(req_index) &
" tag:" & to_hstring(req_tag) &
" way: " & integer'image(req_hit_way);
end if;
if r0_valid = '1' then
r1.mmu_req <= r0.mmu_req;
end if;

-- Bypass/forwarding multiplexer for load data.
-- Use the bypass if are reading the row of BRAM that was written 0 or 1
-- cycles ago, including for the slow_valid = 1 cases (i.e. completing a
-- load miss or a non-cacheable load), which are handled via the r1.full case.
for i in 0 to 7 loop
if r1.full = '1' or use_forward_rl = '1' then
sel := '0' & r1.dcbz;
elsif use_forward_st = '1' and r1.req.byte_sel(i) = '1' then
sel := "01";
elsif use_forward2 = '1' and r1.forward_sel(i) = '1' then
sel := "10";
else
sel := "11";
end if;
j := i * 8;
case sel is
when "00" =>
data_out(j + 7 downto j) := wishbone_in.dat(j + 7 downto j);
when "01" =>
data_out(j + 7 downto j) := r1.req.data(j + 7 downto j);
when "10" =>
data_out(j + 7 downto j) := r1.forward_data(j + 7 downto j);
when others =>
data_out(j + 7 downto j) := cache_out(req_hit_way)(j + 7 downto j);
end case;
end loop;
r1.data_out <= data_out;

r1.forward_data <= ram_wr_data;
r1.forward_tag <= r1.reload_tag;
r1.forward_row <= r1.store_row;
r1.forward_sel <= ram_wr_select;
r1.forward_valid <= r1.write_bram;
if r1.state = RELOAD_WAIT_ACK and wishbone_in.ack = '1' then
r1.forward_valid <= '1';
end if;

-- Fast path for load/store hits. Set signals for the writeback controls.
r1.hit_way <= req_hit_way;
r1.hit_index <= req_index;
@ -1254,37 +1278,15 @@ begin
-- operates at stage 1.
--
dcache_slow : process(clk)
variable stbs_done : boolean;
variable stbs_done : boolean;
variable req : mem_access_request_t;
variable acks : unsigned(2 downto 0);
begin
if rising_edge(clk) then
r1.use_forward1 <= use_forward1_next;
r1.forward_sel <= (others => '0');
if use_forward1_next = '1' then
r1.forward_sel <= r1.req.byte_sel;
elsif use_forward2_next = '1' then
r1.forward_sel <= r1.forward_sel1;
end if;

r1.forward_data2 <= r1.forward_data1;
if r1.write_bram = '1' then
r1.forward_data1 <= r1.req.data;
r1.forward_sel1 <= r1.req.byte_sel;
r1.forward_way1 <= r1.req.hit_way;
r1.forward_row1 <= get_row(r1.req.real_addr);
r1.forward_valid1 <= '1';
else
if r1.dcbz = '1' then
r1.forward_data1 <= (others => '0');
else
r1.forward_data1 <= wishbone_in.dat;
end if;
r1.forward_sel1 <= (others => '1');
r1.forward_way1 <= replace_way;
r1.forward_row1 <= r1.store_row;
r1.forward_valid1 <= '0';
end if;
ev.dcache_refill <= '0';
ev.load_miss <= '0';
ev.store_miss <= '0';
ev.dtlb_miss <= tlb_miss;

-- On reset, clear all valid bits to force misses
if rst = '1' then
@ -1357,7 +1359,7 @@ begin
req.data := d_in.data;
end if;
-- Select all bytes for dcbz and for cacheable loads
if r0.req.dcbz = '1' or (r0.req.load = '1' and r0.req.nc = '0') then
if r0.req.dcbz = '1' or (r0.req.load = '1' and r0.req.nc = '0' and perm_attr.nocache = '0') then
req.byte_sel := (others => '1');
else
req.byte_sel := r0.req.byte_sel;
@ -1377,7 +1379,7 @@ begin
-- Main state machine
case r1.state is
when IDLE =>
r1.wb.adr <= req.real_addr(r1.wb.adr'left downto 0);
r1.wb.adr <= addr_to_wb(req.real_addr);
r1.wb.sel <= req.byte_sel;
r1.wb.dat <= req.data;
r1.dcbz <= req.dcbz;
@ -1417,6 +1419,7 @@ begin
-- Track that we had one request sent
r1.state <= RELOAD_WAIT_ACK;
r1.write_tag <= '1';
ev.load_miss <= '1';

when OP_LOAD_NC =>
r1.wb.cyc <= '1';
@ -1449,6 +1452,9 @@ begin
r1.wb.we <= '1';
r1.wb.cyc <= '1';
r1.wb.stb <= '1';
if req.op = OP_STORE_MISS then
ev.store_miss <= '1';
end if;

-- OP_NONE and OP_BAD do nothing
-- OP_BAD & OP_STCX_FAIL were handled above already
@ -1461,26 +1467,26 @@ begin
-- If we are still sending requests, was one accepted ?
if wishbone_in.stall = '0' and r1.wb.stb = '1' then
-- That was the last word ? We are done sending. Clear stb.
if is_last_row_addr(r1.wb.adr, r1.end_row_ix) then
if is_last_row_wb_addr(r1.wb.adr, r1.end_row_ix) then
r1.wb.stb <= '0';
end if;

-- Calculate the next row address
r1.wb.adr <= next_row_addr(r1.wb.adr);
r1.wb.adr <= next_row_wb_addr(r1.wb.adr);
end if;

-- Incoming acks processing
r1.forward_valid1 <= wishbone_in.ack;
if wishbone_in.ack = '1' then
r1.rows_valid(r1.store_row mod ROW_PER_LINE) <= '1';
-- If this is the data we were looking for, we can
-- complete the request next cycle.
-- Compare the whole address in case the request in
-- r1.req is not the one that started this refill.
if req.valid = '1' and req.same_tag = '1' and
((r1.dcbz = '1' and req.dcbz = '1') or
(r1.dcbz = '0' and req.op = OP_LOAD_MISS)) and
r1.store_row = get_row(req.real_addr) then
-- (Cases where req comes from r0 are handled as a load
-- hit.)
if r1.full = '1' and r1.req.same_tag = '1' and
((r1.dcbz = '1' and req.dcbz = '1') or r1.req.op = OP_LOAD_MISS) and
r1.store_row = get_row(r1.req.real_addr) then
r1.full <= '0';
r1.slow_valid <= '1';
if r1.mmu_req = '0' then
@ -1488,8 +1494,6 @@ begin
else
r1.mmu_done <= '1';
end if;
r1.forward_sel <= (others => '1');
r1.use_forward1 <= '1';
end if;

-- Check for completion
@ -1500,6 +1504,7 @@ begin
-- Cache line is now valid
cache_valids(r1.store_index)(r1.store_way) <= '1';

ev.dcache_refill <= not r1.dcbz;
r1.state <= IDLE;
end if;

@ -1523,15 +1528,17 @@ begin
-- See if there is another store waiting to be done
-- which is in the same real page.
if req.valid = '1' then
r1.wb.adr(SET_SIZE_BITS - 1 downto 0) <=
req.real_addr(SET_SIZE_BITS - 1 downto 0);
r1.wb.adr(SET_SIZE_BITS - ROW_OFF_BITS - 1 downto 0) <=
req.real_addr(SET_SIZE_BITS - 1 downto ROW_OFF_BITS);
r1.wb.dat <= req.data;
r1.wb.sel <= req.byte_sel;
end if;
if acks < 7 and req.same_tag = '1' and
if acks < 7 and req.same_tag = '1' and req.dcbz = '0' and
(req.op = OP_STORE_MISS or req.op = OP_STORE_HIT) then
r1.wb.stb <= '1';
stbs_done := false;
r1.store_way <= req.hit_way;
r1.store_row <= get_row(req.real_addr);
if req.op = OP_STORE_HIT then
r1.write_bram <= '1';
end if;
@ -1573,8 +1580,6 @@ begin
else
r1.mmu_done <= '1';
end if;
r1.forward_sel <= (others => '1');
r1.use_forward1 <= '1';
r1.wb.cyc <= '0';
r1.wb.stb <= '0';
end if;
@ -1589,7 +1594,7 @@ begin
dcache_log: process(clk)
begin
if rising_edge(clk) then
log_data <= r1.wb.adr(5 downto 3) &
log_data <= r1.wb.adr(2 downto 0) &
wishbone_in.stall &
wishbone_in.ack &
r1.wb.stb & r1.wb.cyc &

@ -114,8 +114,8 @@ architecture behaviour of decode1 is
36 => (LDST, NONE, OP_STORE, RA_OR_ZERO, CONST_SI, RS, NONE, '0', '0', '0', '0', ZERO, '0', is4B, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- stw
37 => (LDST, NONE, OP_STORE, RA_OR_ZERO, CONST_SI, RS, RA, '0', '0', '0', '0', ZERO, '0', is4B, '0', '0', '1', '0', '0', '0', NONE, '0', '0', NONE), -- stwu
8 => (ALU, NONE, OP_ADD, RA, CONST_SI, NONE, RT, '0', '0', '1', '0', ONE, '1', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- subfic
2 => (ALU, NONE, OP_TRAP, RA, CONST_SI, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- tdi
3 => (ALU, NONE, OP_TRAP, RA, CONST_SI, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '1', '0', NONE, '0', '1', NONE), -- twi
2 => (ALU, NONE, OP_TRAP, RA, CONST_SI, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- tdi
3 => (ALU, NONE, OP_TRAP, RA, CONST_SI, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '1', '0', NONE, '0', '0', NONE), -- twi
26 => (ALU, NONE, OP_XOR, NONE, CONST_UI, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- xori
27 => (ALU, NONE, OP_XOR, NONE, CONST_UI_HI, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- xoris
others => illegal_inst
@ -133,9 +133,9 @@ architecture behaviour of decode1 is
constant decode_op_4_array : op_4_subop_array_t := (
-- unit fac internal in1 in2 in3 out CR CR inv inv cry cry ldst BR sgn upd rsrv 32b sgn rc lk sgl rpt
-- op in out A out in out len ext pipe
2#110000# => (ALU, NONE, OP_MUL_H64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '1', RC, '0', '0', NONE), -- maddhd
2#110001# => (ALU, NONE, OP_MUL_H64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- maddhdu
2#110011# => (ALU, NONE, OP_MUL_L64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '1', RC, '0', '0', NONE), -- maddld
2#110000# => (ALU, NONE, OP_MUL_H64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '1', NONE, '0', '0', NONE), -- maddhd
2#110001# => (ALU, NONE, OP_MUL_H64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- maddhdu
2#110011# => (ALU, NONE, OP_MUL_L64, RA, RB, RCR, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '1', NONE, '0', '0', NONE), -- maddld
others => decode_rom_init
);

@ -166,7 +166,7 @@ architecture behaviour of decode1 is
-- mcrf; and cr logical ops
2#000# => (ALU, NONE, OP_CROP, NONE, NONE, NONE, NONE, '1', '1', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE),
-- addpcis
2#001# => (ALU, NONE, OP_ADD, CIA, CONST_DXHI4, NONE, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE),
2#001# => (ALU, NONE, OP_ADD, CIA, CONST_DXHI4, NONE, RT, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE),
-- bclr, bcctr, bctar
2#100# => (ALU, NONE, OP_BCREG, SPR, SPR, NONE, SPR, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '1', '0', NONE),
-- isync
@ -210,7 +210,7 @@ architecture behaviour of decode1 is
2#1011001010# => (ALU, NONE, OP_ADD, RA, NONE, NONE, RT, '0', '0', '0', '0', CA, '1', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- addzeo
2#0000011100# => (ALU, NONE, OP_AND, NONE, RB, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- and
2#0000111100# => (ALU, NONE, OP_AND, NONE, RB, RS, RA, '0', '0', '1', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- andc
2#0011111100# => (ALU, NONE, OP_BPERM, NONE, NONE, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- bperm
2#0011111100# => (ALU, NONE, OP_BPERM, NONE, RB, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- bperm
2#0100111010# => (ALU, NONE, OP_BCD, NONE, NONE, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- cbcdtd
2#0100011010# => (ALU, NONE, OP_BCD, NONE, NONE, RS, RA, '0', '0', '1', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- cdtbcd
2#0000000000# => (ALU, NONE, OP_CMP, RA, RB, NONE, NONE, '0', '1', '1', '0', ONE, '0', NONE, '0', '0', '0', '0', '0', '1', NONE, '0', '0', NONE), -- cmp
@ -256,7 +256,7 @@ architecture behaviour of decode1 is
2#1101111011# => (ALU, NONE, OP_EXTSWSLI, NONE, CONST_SH, RS, RA, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- extswsli
2#1111010110# => (ALU, NONE, OP_ICBI, NONE, NONE, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- icbi
2#0000010110# => (ALU, NONE, OP_ICBT, NONE, NONE, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- icbt
2#0000001111# => (ALU, NONE, OP_ISEL, RA_OR_ZERO, RB, NONE, RT, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- isel
2#0000001111# => (ALU, NONE, OP_ISEL, RA_OR_ZERO, RB, NONE, RT, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- isel
2#0000101111# => (ALU, NONE, OP_ISEL, RA_OR_ZERO, RB, NONE, RT, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- isel
2#0001001111# => (ALU, NONE, OP_ISEL, RA_OR_ZERO, RB, NONE, RT, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- isel
2#0001101111# => (ALU, NONE, OP_ISEL, RA_OR_ZERO, RB, NONE, RT, '1', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- isel
@ -410,8 +410,8 @@ architecture behaviour of decode1 is
2#0011001000# => (ALU, NONE, OP_ADD, RA, NONE, NONE, RT, '0', '0', '1', '0', CA, '1', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- subfze
2#1011001000# => (ALU, NONE, OP_ADD, RA, NONE, NONE, RT, '0', '0', '1', '0', CA, '1', NONE, '0', '0', '0', '0', '0', '0', RC, '0', '0', NONE), -- subfzeo
2#1001010110# => (ALU, NONE, OP_NOP, NONE, NONE, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- sync
2#0001000100# => (ALU, NONE, OP_TRAP, RA, RB, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- td
2#0000000100# => (ALU, NONE, OP_TRAP, RA, RB, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '1', '0', NONE, '0', '1', NONE), -- tw
2#0001000100# => (ALU, NONE, OP_TRAP, RA, RB, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- td
2#0000000100# => (ALU, NONE, OP_TRAP, RA, RB, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '1', '0', NONE, '0', '0', NONE), -- tw
2#0100110010# => (LDST, NONE, OP_TLBIE, NONE, RB, RS, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- tlbie
2#0100010010# => (LDST, NONE, OP_TLBIE, NONE, RB, RS, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '0', NONE), -- tlbiel
2#1000110110# => (ALU, NONE, OP_NOP, NONE, NONE, NONE, NONE, '0', '0', '0', '0', ZERO, '0', NONE, '0', '0', '0', '0', '0', '0', NONE, '0', '1', NONE), -- tlbsync
@ -740,6 +740,8 @@ begin
bv.br_offset := br_offset;
if f_in.next_predicted = '1' then
v.br_pred := '1';
elsif f_in.next_pred_ntaken = '1' then
v.br_pred := '0';
end if;
bv.predict := v.br_pred and f_in.valid and not flush_in and not busy_out and not f_in.next_predicted;
-- after a clock edge...

@ -215,7 +215,6 @@ architecture behaviour of decode2 is
OP_AND => "001", -- logical_result
OP_OR => "001",
OP_XOR => "001",
OP_POPCNT => "001",
OP_PRTY => "001",
OP_CMPB => "001",
OP_EXTS => "001",
@ -234,7 +233,8 @@ architecture behaviour of decode2 is
OP_DIV => "011",
OP_DIVE => "011",
OP_MOD => "011",
OP_CNTZ => "100", -- countzero_result
OP_CNTZ => "100", -- countbits_result
OP_POPCNT => "100",
OP_MFSPR => "101", -- spr_result
OP_B => "110", -- next_nia
OP_BC => "110",

@ -42,6 +42,8 @@ begin
quot <= (others => '0');
running <= '0';
count <= "0000000";
is_32bit <= '0';
overflow <= '0';
elsif d_in.valid = '1' then
if d_in.is_extended = '1' then
dend <= '0' & d_in.dividend & x"0000000000000000";

@ -0,0 +1,298 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;

library work;
use work.wishbone_types.all;

entity dmi_dtm is
generic(ABITS : INTEGER:=8;
DBITS : INTEGER:=64);

port(sys_clk : in std_ulogic;
sys_reset : in std_ulogic;
dmi_addr : out std_ulogic_vector(ABITS - 1 downto 0);
dmi_din : in std_ulogic_vector(DBITS - 1 downto 0);
dmi_dout : out std_ulogic_vector(DBITS - 1 downto 0);
dmi_req : out std_ulogic;
dmi_wr : out std_ulogic;
dmi_ack : in std_ulogic
-- dmi_err : in std_ulogic TODO: Add error response
);
end entity dmi_dtm;

architecture behaviour of dmi_dtm is
-- Signals coming out of the JTAGG block
signal jtag_reset_n : std_ulogic;
signal tdi : std_ulogic;
signal tdo : std_ulogic;
signal tck : std_ulogic;
signal jce1 : std_ulogic;
signal jshift : std_ulogic;
signal update : std_ulogic;

-- signals to match dmi_dtb_xilinx
signal jtag_reset : std_ulogic;
signal capture : std_ulogic;
signal jtag_clk : std_ulogic;
signal sel : std_ulogic;
signal shift : std_ulogic;

-- delays
signal jce1_d : std_ulogic;
constant TCK_DELAY : INTEGER := 8;
signal tck_d : std_ulogic_vector(TCK_DELAY+1 downto 1);

-- ** JTAG clock domain **

-- Shift register
signal shiftr : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- Latched request
signal request : std_ulogic_vector(ABITS + DBITS + 1 downto 0);

-- A request is present
signal jtag_req : std_ulogic;

-- Synchronizer for jtag_rsp (sys clk -> jtag_clk)
signal dmi_ack_0 : std_ulogic;
signal dmi_ack_1 : std_ulogic;

-- ** sys clock domain **

-- Synchronizer for jtag_req (jtag clk -> sys clk)
signal jtag_req_0 : std_ulogic;
signal jtag_req_1 : std_ulogic;

-- ** combination signals
signal jtag_bsy : std_ulogic;
signal op_valid : std_ulogic;
signal rsp_op : std_ulogic_vector(1 downto 0);

-- ** Constants **
constant DMI_REQ_NOP : std_ulogic_vector(1 downto 0) := "00";
constant DMI_REQ_RD : std_ulogic_vector(1 downto 0) := "01";
constant DMI_REQ_WR : std_ulogic_vector(1 downto 0) := "10";
constant DMI_RSP_OK : std_ulogic_vector(1 downto 0) := "00";
constant DMI_RSP_BSY : std_ulogic_vector(1 downto 0) := "11";

attribute ASYNC_REG : string;
attribute ASYNC_REG of jtag_req_0: signal is "TRUE";
attribute ASYNC_REG of jtag_req_1: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_0: signal is "TRUE";
attribute ASYNC_REG of dmi_ack_1: signal is "TRUE";

-- ECP5 JTAGG
component JTAGG is
generic (
ER1 : string := "ENABLED";
ER2 : string := "ENABLED"
);
port(
JTDO1 : in std_ulogic;
JTDO2 : in std_ulogic;
JTDI : out std_ulogic;
JTCK : out std_ulogic;
JRTI1 : out std_ulogic;
JRTI2 : out std_ulogic;
JSHIFT : out std_ulogic;
JUPDATE : out std_ulogic;
JRSTN : out std_ulogic;
JCE1 : out std_ulogic;
JCE2 : out std_ulogic
);
end component;

component LUT4 is
generic (
INIT : std_logic_vector
);
port(
A : in STD_ULOGIC;
B : in STD_ULOGIC;
C : in STD_ULOGIC;
D : in STD_ULOGIC;
Z : out STD_ULOGIC
);
end component;

begin

jtag: JTAGG
generic map(
ER2 => "DISABLED"
)
port map (
JTDO1 => tdo,
JTDO2 => '0',
JTDI => tdi,
JTCK => tck,
JRTI1 => open,
JRTI2 => open,
JSHIFT => jshift,
JUPDATE => update,
JRSTN => jtag_reset_n,
JCE1 => jce1,
JCE2 => open
);

-- JRTI1 looks like it could be connected to SEL, but
-- in practise JRTI1 is only high briefly, not for the duration
-- of the transmission. possibly mw_debug could be modified.
-- The ecp5 is probably the only jtag device anyway.
sel <= '1';

-- TDI needs to align with TCK, we use LUT delays here.
-- From https://github.com/enjoy-digital/litex/pull/1087
tck_d(1) <= tck;
del: for i in 1 to TCK_DELAY generate
attribute keep : boolean;
attribute keep of l: label is true;
begin
l: LUT4
generic map(
INIT => b"0000_0000_0000_0010"
)
port map (
A => tck_d(i),
B => '0', C => '0', D => '0',
Z => tck_d(i+1)
);
end generate;
jtag_clk <= tck_d(TCK_DELAY+1);

-- capture signal
jce1_sync : process(jtag_clk)
begin
if rising_edge(jtag_clk) then
jce1_d <= jce1;
capture <= jce1 and not jce1_d;
end if;
end process;

-- latch the shift signal, otherwise
-- we miss the last shift in
-- (maybe because we are delaying tck?)
shift_sync : process(jtag_clk)
begin
if (sys_reset = '1') then
shift <= '0';
elsif rising_edge(jtag_clk) then
shift <= jshift;
end if;
end process;

jtag_reset <= not jtag_reset_n;

-- dmi_req synchronization
dmi_req_sync : process(sys_clk)
begin
-- sys_reset is synchronous
if rising_edge(sys_clk) then
if (sys_reset = '1') then
jtag_req_0 <= '0';
jtag_req_1 <= '0';
else
jtag_req_0 <= jtag_req;
jtag_req_1 <= jtag_req_0;
end if;
end if;
end process;
dmi_req <= jtag_req_1;

-- dmi_ack synchronization
dmi_ack_sync: process(jtag_clk, jtag_reset)
begin
-- jtag_reset is async (see comments)
if jtag_reset = '1' then
dmi_ack_0 <= '0';
dmi_ack_1 <= '0';
elsif rising_edge(jtag_clk) then
dmi_ack_0 <= dmi_ack;
dmi_ack_1 <= dmi_ack_0;
end if;
end process;
-- jtag_bsy indicates whether we can start a new request, we can when
-- we aren't already processing one (jtag_req) and the synchronized ack
-- of the previous one is 0.
--
jtag_bsy <= jtag_req or dmi_ack_1;

-- decode request type in shift register
with shiftr(1 downto 0) select op_valid <=
'1' when DMI_REQ_RD,
'1' when DMI_REQ_WR,
'0' when others;

-- encode response op
rsp_op <= DMI_RSP_BSY when jtag_bsy = '1' else DMI_RSP_OK;

-- Some DMI out signals are directly driven from the request register
dmi_addr <= request(ABITS + DBITS + 1 downto DBITS + 2);
dmi_dout <= request(DBITS + 1 downto 2);
dmi_wr <= '1' when request(1 downto 0) = DMI_REQ_WR else '0';

-- TDO is wired to shift register bit 0
tdo <= shiftr(0);

-- Main state machine. Handles shift registers, request latch and
-- jtag_req latch. Could be split into 3 processes but it's probably
-- not worthwhile.
--
shifter: process(jtag_clk, jtag_reset, sys_reset)
begin
if jtag_reset = '1' or sys_reset = '1' then
shiftr <= (others => '0');
jtag_req <= '0';
request <= (others => '0');
elsif rising_edge(jtag_clk) then

-- Handle jtag "commands" when sel is 1
if sel = '1' then
-- Shift state, rotate the register
if shift = '1' then
shiftr <= tdi & shiftr(ABITS + DBITS + 1 downto 1);
end if;

-- Update state (trigger)
--
-- Latch the request if we aren't already processing one and
-- it has a valid command opcode.
--
if update = '1' and op_valid = '1' then
if jtag_bsy = '0' then
request <= shiftr;
jtag_req <= '1';
end if;
-- Set the shift register "op" to "busy". This will prevent
-- us from re-starting the command on the next update if
-- the command completes before that.
shiftr(1 downto 0) <= DMI_RSP_BSY;
end if;

-- Request completion.
--
-- Capture the response data for reads and clear request flag.
--
-- Note: We clear req (and thus dmi_req) here which relies on tck
-- ticking and sel set. This means we are stuck with dmi_req up if
-- the jtag interface stops. Slaves must be resilient to this.
--
if jtag_req = '1' and dmi_ack_1 = '1' then
jtag_req <= '0';
if request(1 downto 0) = DMI_REQ_RD then
request(DBITS + 1 downto 2) <= dmi_din;
end if;
end if;

-- Capture state, grab latch content with updated status
if capture = '1' then
shiftr <= request(ABITS + DBITS + 1 downto 2) & rsp_op;
end if;

end if;
end if;
end process;
end architecture behaviour;

@ -44,6 +44,7 @@ begin
DRAM_ABITS => 24,
DRAM_ALINES => 1,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => DRAM_INIT_FILE,
PAYLOAD_SIZE => DRAM_INIT_SIZE
@ -250,10 +251,10 @@ begin
report "Back to back 4 stores 4 reads on hit...";
clr_acks;
for i in 0 to 3 loop
wb_write(add_off(a, i*8), make_pattern(i), x"ff");
wb_write(add_off(a, i), make_pattern(i), x"ff");
end loop;
for i in 0 to 3 loop
wb_read(add_off(a, i*8));
wb_read(add_off(a, i));
end loop;
wait_acks(8);
for i in 0 to 7 loop
@ -268,10 +269,10 @@ begin
a(10) := '1';
clr_acks;
for i in 0 to 3 loop
wb_write(add_off(a, i*8), make_pattern(i), x"ff");
wb_write(add_off(a, i), make_pattern(i), x"ff");
end loop;
for i in 0 to 3 loop
wb_read(add_off(a, i*8));
wb_read(add_off(a, i));
end loop;
wait_acks(8);
for i in 0 to 7 loop
@ -286,8 +287,8 @@ begin
a(10) := '1';
clr_acks;
for i in 0 to 3 loop
wb_write(add_off(a, i*8), make_pattern(i), x"ff");
wb_read(add_off(a, i*8));
wb_write(add_off(a, i), make_pattern(i), x"ff");
wb_read(add_off(a, i));
end loop;
wait_acks(8);
for i in 0 to 3 loop
@ -299,29 +300,29 @@ begin
a(11) := '1';
clr_acks;
wb_write(add_off(a, 0), x"1111111100000000", x"ff");
wb_write(add_off(a, 8), x"3333333322222222", x"ff");
wb_write(add_off(a, 16), x"5555555544444444", x"ff");
wb_write(add_off(a, 24), x"7777777766666666", x"ff");
wb_write(add_off(a, 32), x"9999999988888888", x"ff");
wb_write(add_off(a, 40), x"bbbbbbbbaaaaaaaa", x"ff");
wb_write(add_off(a, 48), x"ddddddddcccccccc", x"ff");
wb_write(add_off(a, 56), x"ffffffffeeeeeeee", x"ff");
wb_write(add_off(a, 64), x"1111111100000000", x"ff");
wb_write(add_off(a, 72), x"3333333322222222", x"ff");
wb_write(add_off(a, 80), x"5555555544444444", x"ff");
wb_write(add_off(a, 88), x"7777777766666666", x"ff");
wb_write(add_off(a, 96), x"9999999988888888", x"ff");
wb_write(add_off(a,104), x"bbbbbbbbaaaaaaaa", x"ff");
wb_write(add_off(a,112), x"ddddddddcccccccc", x"ff");
wb_write(add_off(a,120), x"ffffffffeeeeeeee", x"ff");
wb_write(add_off(a, 1), x"3333333322222222", x"ff");
wb_write(add_off(a, 2), x"5555555544444444", x"ff");
wb_write(add_off(a, 3), x"7777777766666666", x"ff");
wb_write(add_off(a, 4), x"9999999988888888", x"ff");
wb_write(add_off(a, 5), x"bbbbbbbbaaaaaaaa", x"ff");
wb_write(add_off(a, 6), x"ddddddddcccccccc", x"ff");
wb_write(add_off(a, 7), x"ffffffffeeeeeeee", x"ff");
wb_write(add_off(a, 8), x"1111111100000000", x"ff");
wb_write(add_off(a, 9), x"3333333322222222", x"ff");
wb_write(add_off(a, 10), x"5555555544444444", x"ff");
wb_write(add_off(a, 11), x"7777777766666666", x"ff");
wb_write(add_off(a, 12), x"9999999988888888", x"ff");
wb_write(add_off(a, 13), x"bbbbbbbbaaaaaaaa", x"ff");
wb_write(add_off(a, 14), x"ddddddddcccccccc", x"ff");
wb_write(add_off(a, 15), x"ffffffffeeeeeeee", x"ff");
wait_acks(16);

report "Scattered from middle of line...";
clr_acks;
wb_read(add_off(a,24));
wb_read(add_off(a,32));
wb_read(add_off(a, 3));
wb_read(add_off(a, 4));
wb_read(add_off(a, 0));
wb_read(add_off(a,16));
wb_read(add_off(a, 2));
wait_acks(4);
read_data(d);
assert d = x"7777777766666666" report "bad data (24), got " & to_hstring(d) severity failure;

@ -14,6 +14,7 @@ entity execute1 is
generic (
EX1_BYPASS : boolean := true;
HAS_FPU : boolean := true;
HAS_SHORT_MULT : boolean := false;
-- Non-zero to enable log data collection
LOG_LENGTH : natural := 0
);
@ -45,6 +46,12 @@ entity execute1 is
icache_inval : out std_ulogic;
terminate_out : out std_ulogic;

-- PMU event buses
wb_events : in WritebackEventType;
ls_events : in Loadstore1EventType;
dc_events : in DcacheEventType;
ic_events : in IcacheEventType;

log_out : out std_ulogic_vector(14 downto 0);
log_rd_addr : out std_ulogic_vector(31 downto 0);
log_rd_data : in std_ulogic_vector(63 downto 0);
@ -67,6 +74,11 @@ architecture behaviour of execute1 is
mul_finish : std_ulogic;
div_in_progress : std_ulogic;
cntz_in_progress : std_ulogic;
no_instr_avail : std_ulogic;
instr_dispatch : std_ulogic;
ext_interrupt : std_ulogic;
taken_branch_event : std_ulogic;
br_mispredict : std_ulogic;
log_addr_spr : std_ulogic_vector(31 downto 0);
end record;
constant reg_type_init : reg_type :=
@ -75,6 +87,8 @@ architecture behaviour of execute1 is
busy => '0', terminate => '0', intr_pending => '0',
fp_exception_next => '0', trace_next => '0', prev_op => OP_ILLEGAL, br_taken => '0',
mul_in_progress => '0', mul_finish => '0', div_in_progress => '0', cntz_in_progress => '0',
no_instr_avail => '0', instr_dispatch => '0', ext_interrupt => '0',
taken_branch_event => '0', br_mispredict => '0',
others => (others => '0'));

signal r, rin : reg_type;
@ -82,23 +96,23 @@ architecture behaviour of execute1 is
signal a_in, b_in, c_in : std_ulogic_vector(63 downto 0);
signal cr_in : std_ulogic_vector(31 downto 0);
signal xerc_in : xer_common_t;
signal mshort_p : std_ulogic_vector(31 downto 0) := (others => '0');

signal valid_in : std_ulogic;
signal ctrl: ctrl_t := (others => (others => '0'));
signal ctrl_tmp: ctrl_t := (others => (others => '0'));
signal ctrl: ctrl_t;
signal ctrl_tmp: ctrl_t;
signal right_shift, rot_clear_left, rot_clear_right: std_ulogic;
signal rot_sign_ext: std_ulogic;
signal rotator_result: std_ulogic_vector(63 downto 0);
signal rotator_carry: std_ulogic;
signal logical_result: std_ulogic_vector(63 downto 0);
signal countzero_result: std_ulogic_vector(63 downto 0);
signal do_popcnt: std_ulogic;
signal countbits_result: std_ulogic_vector(63 downto 0);
signal alu_result: std_ulogic_vector(63 downto 0);
signal adder_result: std_ulogic_vector(63 downto 0);
signal misc_result: std_ulogic_vector(63 downto 0);
signal muldiv_result: std_ulogic_vector(63 downto 0);
signal spr_result: std_ulogic_vector(63 downto 0);
signal result_mux_sel: std_ulogic_vector(2 downto 0);
signal sub_mux_sel: std_ulogic_vector(2 downto 0);
signal next_nia : std_ulogic_vector(63 downto 0);
signal current: Decode2ToExecute1Type;

@ -125,6 +139,10 @@ architecture behaviour of execute1 is
signal random_cond : std_ulogic_vector(63 downto 0);
signal random_err : std_ulogic;

-- PMU signals
signal x_to_pmu : Execute1ToPMUType;
signal pmu_to_x : PMUToExecute1Type;

-- signals for logging
signal exception_log : std_ulogic;
signal irq_valid_log : std_ulogic;
@ -213,6 +231,24 @@ architecture behaviour of execute1 is
return msr_out;
end;

-- Work out whether a signed value fits into n bits,
-- that is, see if it is in the range -2^(n-1) .. 2^(n-1) - 1
function fits_in_n_bits(val: std_ulogic_vector; n: integer) return boolean is
variable x, xp1: std_ulogic_vector(val'left downto val'right);
begin
x := val;
if val(val'left) = '0' then
x := not val;
end if;
xp1 := bit_reverse(std_ulogic_vector(unsigned(bit_reverse(x)) + 1));
x := x and not xp1;
-- For positive inputs, x has ones at the positions
-- to the left of the leftmost 1 bit in val.
-- For negative inputs, x has ones to the left of
-- the leftmost 0 bit in val.
return x(n - 1) = '1';
end;

-- Tell vivado to keep the hierarchy for the random module so that the
-- net names in the xdc file match.
attribute keep_hierarchy : string;
@ -247,13 +283,15 @@ begin
datalen => e_in.data_len
);

countzero_0: entity work.zero_counter
countbits_0: entity work.bit_counter
port map (
clk => clk,
rs => c_in,
count_right => e_in.insn(10),
is_32bit => e_in.is_32bit,
result => countzero_result
do_popcnt => do_popcnt,
datalen => e_in.data_len,
result => countbits_result
);

multiply_0: entity work.multiply
@ -279,6 +317,25 @@ begin
err => random_err
);

pmu_0: entity work.pmu
port map (
clk => clk,
rst => rst,
p_in => x_to_pmu,
p_out => pmu_to_x
);

short_mult_0: if HAS_SHORT_MULT generate
begin
short_mult: entity work.short_multiply
port map (
clk => clk,
a_in => a_in(15 downto 0),
b_in => b_in(15 downto 0),
m_out => mshort_p
);
end generate;

dbg_msr_out <= ctrl.msr;
log_rd_addr <= r.log_addr_spr;

@ -287,6 +344,31 @@ begin
c_in <= e_in.read_data3;
cr_in <= e_in.cr;

x_to_pmu.occur <= (instr_complete => wb_events.instr_complete,
fp_complete => wb_events.fp_complete,
ld_complete => ls_events.load_complete,
st_complete => ls_events.store_complete,
itlb_miss => ls_events.itlb_miss,
dc_load_miss => dc_events.load_miss,
dc_ld_miss_resolved => dc_events.dcache_refill,
dc_store_miss => dc_events.store_miss,
dtlb_miss => dc_events.dtlb_miss,
dtlb_miss_resolved => dc_events.dtlb_miss_resolved,
icache_miss => ic_events.icache_miss,
itlb_miss_resolved => ic_events.itlb_miss_resolved,
no_instr_avail => r.no_instr_avail,
dispatch => r.instr_dispatch,
ext_interrupt => r.ext_interrupt,
br_taken_complete => r.taken_branch_event,
br_mispredict => r.br_mispredict,
others => '0');
x_to_pmu.nia <= current.nia;
x_to_pmu.addr <= (others => '0');
x_to_pmu.addr_v <= '0';
x_to_pmu.spr_num <= e_in.insn(20 downto 16);
x_to_pmu.spr_val <= c_in;
x_to_pmu.run <= '1';

-- XER forwarding. To avoid having to track XER hazards, we use
-- the previously latched value. Since the XER common bits
-- (SO, OV[32] and CA[32]) are only modified by instructions that are
@ -310,7 +392,7 @@ begin
logical_result when "001",
rotator_result when "010",
muldiv_result when "011",
countzero_result when "100",
countbits_result when "100",
spr_result when "101",
next_nia when "110",
misc_result when others;
@ -322,6 +404,7 @@ begin
r <= reg_type_init;
ctrl.tb <= (others => '0');
ctrl.dec <= (others => '0');
ctrl.cfar <= (others => '0');
ctrl.msr <= (MSR_SF => '1', MSR_LE => '1', others => '0');
else
r <= rin;
@ -459,7 +542,11 @@ begin

case current.sub_select(1 downto 0) is
when "00" =>
muldiv_result <= multiply_to_x.result(63 downto 0);
if HAS_SHORT_MULT and r.mul_in_progress = '0' then
muldiv_result <= std_ulogic_vector(resize(signed(mshort_p), 64));
else
muldiv_result <= multiply_to_x.result(63 downto 0);
end if;
when "01" =>
muldiv_result <= multiply_to_x.result(127 downto 64);
when "10" =>
@ -633,7 +720,7 @@ begin
end if;
when "100" =>
-- MCRXRX
newcrf := xerc_in.ov & xerc_in.ca & xerc_in.ov32 & xerc_in.ca32;
newcrf := xerc_in.ov & xerc_in.ov32 & xerc_in.ca & xerc_in.ca32;
when others =>
end case;
if current.insn_type = OP_MTCRF then
@ -692,6 +779,18 @@ begin
v.div_in_progress := '0';
v.cntz_in_progress := '0';
v.mul_finish := '0';
v.ext_interrupt := '0';
v.taken_branch_event := '0';
v.br_mispredict := '0';

x_to_pmu.mfspr <= '0';
x_to_pmu.mtspr <= '0';
x_to_pmu.tbbits(3) <= ctrl.tb(63 - 47);
x_to_pmu.tbbits(2) <= ctrl.tb(63 - 51);
x_to_pmu.tbbits(1) <= ctrl.tb(63 - 55);
x_to_pmu.tbbits(0) <= ctrl.tb(63 - 63);
x_to_pmu.pmm_msr <= ctrl.msr(MSR_PMM);
x_to_pmu.pr_msr <= ctrl.msr(MSR_PR);

spr_result <= (others => '0');
spr_val := (others => '0');
@ -701,7 +800,7 @@ begin
ctrl_tmp.tb <= std_ulogic_vector(unsigned(ctrl.tb) + 1);
ctrl_tmp.dec <= std_ulogic_vector(unsigned(ctrl.dec) - 1);

irq_valid := ctrl.msr(MSR_EE) and (ctrl.dec(63) or ext_irq_in);
irq_valid := ctrl.msr(MSR_EE) and (pmu_to_x.intr or ctrl.dec(63) or ext_irq_in);

v.terminate := '0';
icache_inval <= '0';
@ -716,6 +815,8 @@ begin
rot_clear_right <= '1' when e_in.insn_type = OP_RLC or e_in.insn_type = OP_RLCR else '0';
rot_sign_ext <= '1' when e_in.insn_type = OP_EXTSWSLI else '0';

do_popcnt <= '1' when e_in.insn_type = OP_POPCNT else '0';

illegal := '0';
if r.intr_pending = '1' then
v.e.srr1 := r.e.srr1;
@ -763,12 +864,16 @@ begin
elsif irq_valid = '1' then
-- Don't deliver the interrupt until we have a valid instruction
-- coming in, so we have a valid NIA to put in SRR0.
if ctrl.dec(63) = '1' then
if pmu_to_x.intr = '1' then
v.e.intr_vec := 16#f00#;
report "IRQ valid: PMU";
elsif ctrl.dec(63) = '1' then
v.e.intr_vec := 16#900#;
report "IRQ valid: DEC";
elsif ext_irq_in = '1' then
v.e.intr_vec := 16#500#;
report "IRQ valid: External";
v.ext_interrupt := '1';
end if;
exception := '1';

@ -801,6 +906,9 @@ begin
v.intr_pending := '0';
end if;

v.no_instr_avail := not (e_in.valid or l_in.busy or l_in.in_progress or r.busy or fp_in.busy);
v.instr_dispatch := valid_in and not exception and not illegal;

if valid_in = '1' and exception = '0' and illegal = '0' and e_in.unit = ALU then
v.e.valid := '1';

@ -859,7 +967,7 @@ begin
when OP_ADDG6S =>
when OP_CMPRB =>
when OP_CMPEQB =>
when OP_AND | OP_OR | OP_XOR | OP_POPCNT | OP_PRTY | OP_CMPB | OP_EXTS |
when OP_AND | OP_OR | OP_XOR | OP_PRTY | OP_CMPB | OP_EXTS |
OP_BPERM | OP_BCD =>

when OP_B =>
@ -870,6 +978,7 @@ begin
if ctrl.msr(MSR_BE) = '1' then
do_trace := '1';
end if;
v.taken_branch_event := '1';
when OP_BC | OP_BCREG =>
-- read_data1 is CTR
-- for OP_BCREG, read_data2 is target register (CTR, LR or TAR)
@ -885,6 +994,7 @@ begin
taken_branch := r.br_taken;
end if;
v.br_taken := taken_branch;
v.taken_branch_event := taken_branch;
abs_branch := e_in.br_abs;
if e_in.repeat = '0' or e_in.second = '1' then
is_branch := '1';
@ -919,7 +1029,7 @@ begin
end if;
do_trace := '0';

when OP_CNTZ =>
when OP_CNTZ | OP_POPCNT =>
v.e.valid := '0';
v.cntz_in_progress := '1';
v.busy := '1';
@ -963,6 +1073,12 @@ begin
when 725 => -- LOG_DATA SPR
spr_val := log_rd_data;
v.log_addr_spr := std_ulogic_vector(unsigned(r.log_addr_spr) + 1);
when SPR_UPMC1 | SPR_UPMC2 | SPR_UPMC3 | SPR_UPMC4 | SPR_UPMC5 | SPR_UPMC6 |
SPR_UMMCR0 | SPR_UMMCR1 | SPR_UMMCR2 | SPR_UMMCRA | SPR_USIER | SPR_USIAR | SPR_USDAR |
SPR_PMC1 | SPR_PMC2 | SPR_PMC3 | SPR_PMC4 | SPR_PMC5 | SPR_PMC6 |
SPR_MMCR0 | SPR_MMCR1 | SPR_MMCR2 | SPR_MMCRA | SPR_SIER | SPR_SIAR | SPR_SDAR =>
x_to_pmu.mfspr <= '1';
spr_val := pmu_to_x.spr_val;
when others =>
-- mfspr from unimplemented SPRs should be a nop in
-- supervisor mode and a program interrupt for user mode
@ -1017,6 +1133,11 @@ begin
ctrl_tmp.dec <= c_in;
when 724 => -- LOG_ADDR SPR
v.log_addr_spr := c_in(31 downto 0);
when SPR_UPMC1 | SPR_UPMC2 | SPR_UPMC3 | SPR_UPMC4 | SPR_UPMC5 | SPR_UPMC6 |
SPR_UMMCR0 | SPR_UMMCR2 | SPR_UMMCRA |
SPR_PMC1 | SPR_PMC2 | SPR_PMC3 | SPR_PMC4 | SPR_PMC5 | SPR_PMC6 |
SPR_MMCR0 | SPR_MMCR1 | SPR_MMCR2 | SPR_MMCRA | SPR_SIER | SPR_SIAR | SPR_SDAR =>
x_to_pmu.mtspr <= '1';
when others =>
-- mtspr to unimplemented SPRs should be a nop in
-- supervisor mode and a program interrupt for user mode
@ -1039,10 +1160,20 @@ begin
icache_inval <= '1';

when OP_MUL_L64 | OP_MUL_H64 | OP_MUL_H32 =>
v.e.valid := '0';
v.mul_in_progress := '1';
v.busy := '1';
x_to_multiply.valid <= '1';
if HAS_SHORT_MULT and e_in.insn_type = OP_MUL_L64 and e_in.insn(26) = '1' and
fits_in_n_bits(a_in, 16) and fits_in_n_bits(b_in, 16) then
-- Operands fit into 16 bits, so use short multiplier
if e_in.oe = '1' then
-- Note 16x16 multiply can't overflow, even for mullwo
set_ov(v.e, '0', '0');
end if;
else
-- Use standard multiplier
v.e.valid := '0';
v.mul_in_progress := '1';
v.busy := '1';
x_to_multiply.valid <= '1';
end if;

when OP_DIV | OP_DIVE | OP_MOD =>
v.e.valid := '0';
@ -1068,6 +1199,7 @@ begin
end if;
if taken_branch /= e_in.br_pred then
v.e.redirect := '1';
v.br_mispredict := is_direct_branch;
end if;
v.e.br_last := is_direct_branch;
v.e.br_taken := taken_branch;
@ -1092,7 +1224,7 @@ begin
-- valid_in = 0. Hence they don't happen in the same cycle as any of
-- the cases above which depend on valid_in = 1.
if r.cntz_in_progress = '1' then
-- cnt[lt]z always takes two cycles
-- cnt[lt]z and popcnt* always take two cycles
v.e.valid := '1';
elsif r.mul_in_progress = '1' or r.div_in_progress = '1' then
if (r.mul_in_progress = '1' and multiply_to_x.valid = '1') or

@ -40,7 +40,8 @@ architecture behaviour of fetch1 is
type reg_internal_t is record
mode_32bit: std_ulogic;
rd_is_niap4: std_ulogic;
predicted: std_ulogic;
predicted_taken: std_ulogic;
pred_not_taken: std_ulogic;
predicted_nia: std_ulogic_vector(63 downto 0);
end record;
signal r, r_next : Fetch1ToIcacheType;
@ -52,7 +53,7 @@ architecture behaviour of fetch1 is
constant BTC_TAG_BITS : integer := 62 - BTC_ADDR_BITS;
constant BTC_TARGET_BITS : integer := 62;
constant BTC_SIZE : integer := 2 ** BTC_ADDR_BITS;
constant BTC_WIDTH : integer := BTC_TAG_BITS + BTC_TARGET_BITS;
constant BTC_WIDTH : integer := BTC_TAG_BITS + BTC_TARGET_BITS + 1;
type btc_mem_type is array (0 to BTC_SIZE - 1) of std_ulogic_vector(BTC_WIDTH - 1 downto 0);

signal btc_rd_data : std_ulogic_vector(BTC_WIDTH - 1 downto 0) := (others => '0');
@ -83,12 +84,13 @@ begin
end if;
if advance_nia = '1' then
r.predicted <= r_next.predicted;
r.pred_ntaken <= r_next.pred_ntaken;
r.nia <= r_next.nia;
r_int.predicted <= r_next_int.predicted;
r_int.predicted_taken <= r_next_int.predicted_taken;
r_int.pred_not_taken <= r_next_int.pred_not_taken;
r_int.predicted_nia <= r_next_int.predicted_nia;
r_int.rd_is_niap4 <= r_next.sequential;
r_int.rd_is_niap4 <= r_next_int.rd_is_niap4;
end if;
r.sequential <= r_next.sequential and advance_nia;
-- always send the up-to-date stop mark and req
r.stop_mark <= stop_in;
r.req <= not rst;
@ -107,13 +109,12 @@ begin
signal btc_wr : std_ulogic;
signal btc_wr_data : std_ulogic_vector(BTC_WIDTH - 1 downto 0);
signal btc_wr_addr : std_ulogic_vector(BTC_ADDR_BITS - 1 downto 0);
signal btc_wr_v : std_ulogic;
begin
btc_wr_data <= w_in.br_nia(63 downto BTC_ADDR_BITS + 2) &
btc_wr_data <= w_in.br_taken &
w_in.br_nia(63 downto BTC_ADDR_BITS + 2) &
w_in.redirect_nia(63 downto 2);
btc_wr_addr <= w_in.br_nia(BTC_ADDR_BITS + 1 downto 2);
btc_wr <= w_in.br_last;
btc_wr_v <= w_in.br_taken;

btc_ram : process(clk)
variable raddr : unsigned(BTC_ADDR_BITS - 1 downto 0);
@ -131,7 +132,7 @@ begin
if inval_btc = '1' or rst = '1' then
btc_valids <= (others => '0');
elsif btc_wr = '1' then
btc_valids(to_integer(unsigned(btc_wr_addr))) <= btc_wr_v;
btc_valids(to_integer(unsigned(btc_wr_addr))) <= '1';
end if;
end if;
end process;
@ -143,9 +144,11 @@ begin
begin
v := r;
v_int := r_int;
v.sequential := '0';
v.predicted := '0';
v_int.predicted := '0';
v.pred_ntaken := '0';
v_int.predicted_taken := '0';
v_int.pred_not_taken := '0';
v_int.rd_is_niap4 := '0';

if rst = '1' then
if alt_reset_in = '1' then
@ -172,19 +175,21 @@ begin
if r_int.mode_32bit = '1' then
v.nia(63 downto 32) := (others => '0');
end if;
elsif r_int.predicted = '1' then
elsif r_int.predicted_taken = '1' then
v.nia := r_int.predicted_nia;
v.predicted := '1';
else
v.sequential := '1';
v_int.rd_is_niap4 := '1';
v.pred_ntaken := r_int.pred_not_taken;
v.nia := std_ulogic_vector(unsigned(r.nia) + 4);
if r_int.mode_32bit = '1' then
v.nia(63 downto 32) := x"00000000";
end if;
if btc_rd_valid = '1' and r_int.rd_is_niap4 = '1' and
btc_rd_data(BTC_WIDTH - 1 downto BTC_TARGET_BITS)
btc_rd_data(BTC_WIDTH - 2 downto BTC_TARGET_BITS)
= v.nia(BTC_TAG_BITS + BTC_ADDR_BITS + 1 downto BTC_ADDR_BITS + 2) then
v_int.predicted := '1';
v_int.predicted_taken := btc_rd_data(BTC_WIDTH - 1);
v_int.pred_not_taken := not btc_rd_data(BTC_WIDTH - 1);
end if;
end if;
v_int.predicted_nia := btc_rd_data(BTC_TARGET_BITS - 1 downto 0) & "00";

@ -82,33 +82,42 @@ architecture bypass of clock_generator is
CLKINTFB : out std_logic );
end component;

signal clkos : std_ulogic;
signal clkop : std_logic;
signal lock : std_logic;

-- PLL constants based on prjtrellis example
constant PLL_IN : natural := 2000000;
constant PLL_OUT : natural := 600000000;
-- PLL constants
-- According to the datasheet, PLL_IN needs to be between 10 and 400 MHz
-- PLL_OUT needs to be between 400 and 800 MHz
-- PLL_IN is chosen based on 12 and 48 MHz being common values
-- for the reference clock.
constant PLL_IN : natural := 12000000;
constant PLL_OUT : natural := 480000000;

-- Configration for ECP5 PLL
constant PLL_CLKOP_DIV : natural := PLL_OUT/CLK_OUTPUT_HZ;
constant PLL_CLKFB_DIV : natural := CLK_OUTPUT_HZ/PLL_IN;
constant PLL_CLKOS_DIV : natural := 2;
constant PLL_CLKFB_DIV : natural := PLL_OUT/PLL_CLKOS_DIV/PLL_IN;
constant PLL_CLKI_DIV : natural := CLK_INPUT_HZ/PLL_IN;

begin
pll_clk_out <= clkop;
pll_locked_out <= not lock; -- FIXME: EHXPLLL lock signal active low?!?
pll_locked_out <= lock;

clkgen: EHXPLLL
generic map(
CLKOP_CPHASE => 11, -- FIXME: Copied from prjtrells.
CLKOP_DIV => PLL_CLKOP_DIV,
CLKOS_ENABLE => "ENABLED",
CLKOS_DIV => PLL_CLKOS_DIV,
CLKFB_DIV => PLL_CLKFB_DIV,
CLKI_DIV => PLL_CLKI_DIV
CLKI_DIV => PLL_CLKI_DIV,
FEEDBK_PATH => "CLKOS"
)
port map (
CLKI => ext_clk,
CLKOP => clkop,
CLKFB => clkop,
CLKOS => clkos,
CLKFB => clkos,
LOCK => lock,
RST => pll_rst_in,
PHASESEL1 => '0',
@ -118,8 +127,8 @@ begin
PHASELOADREG => '0',
STDBY => '0',
PLLWAKESYNC => '0',
ENCLKOP => '0',
ENCLKOS => '0',
ENCLKOP => '1',
ENCLKOS => '1',
ENCLKOS2 => '0',
ENCLKOS3 => '0'
);

@ -70,6 +70,18 @@ architecture rtl of clock_generator is
report "Unsupported output frequency" severity failure;
return bad_settings;
end case;
when 50000000 =>
case output_hz is
when 100000000 =>
return (clkin_period => 20.0,
clkfbout_mult => 32,
clkout_divide => 16,
divclk_divide => 1,
force_rst => '0');
when others =>
report "Unsupported output frequency" severity failure;
return bad_settings;
end case;
when others =>
report "Unsupported input frequency" severity failure;
return bad_settings;

@ -17,8 +17,8 @@ entity main_bram is
port(
clk : in std_logic;
addr : in std_logic_vector(HEIGHT_BITS - 1 downto 0) ;
di : in std_logic_vector(WIDTH-1 downto 0);
do : out std_logic_vector(WIDTH-1 downto 0);
din : in std_logic_vector(WIDTH-1 downto 0);
dout : out std_logic_vector(WIDTH-1 downto 0);
sel : in std_logic_vector((WIDTH/8)-1 downto 0);
re : in std_ulogic;
we : in std_ulogic
@ -68,14 +68,14 @@ begin
for i in 0 to 7 loop
if sel(i) = '1' then
memory(to_integer(unsigned(addr)))((i + 1) * 8 - 1 downto i * 8) <=
di((i + 1) * 8 - 1 downto i * 8);
din((i + 1) * 8 - 1 downto i * 8);
end if;
end loop;
end if;
if re = '1' then
obuf <= memory(to_integer(unsigned(addr)));
end if;
do <= obuf;
dout <= obuf;
end if;
end process;


@ -94,6 +94,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -252,6 +256,9 @@ begin
-- but for now, assert it's 100Mhz
assert CLK_FREQUENCY = 100000000;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

reset_controller: entity work.soc_reset
generic map(
RESET_LOW => false,
@ -272,6 +279,7 @@ begin
DRAM_ABITS => 26,
DRAM_ALINES => 16,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -304,8 +312,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -16,6 +16,7 @@ entity toplevel is
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := true;
HAS_BTC : boolean := true;
HAS_SHORT_MULT : boolean := false;
USE_LITEDRAM : boolean := false;
NO_BRAM : boolean := false;
DISABLE_FLATTEN_CORE : boolean := false;
@ -28,6 +29,7 @@ entity toplevel is
UART_IS_16550 : boolean := false;
HAS_UART1 : boolean := true;
USE_LITESDCARD : boolean := false;
HAS_GPIO : boolean := true;
NGPIO : natural := 32
);
port(
@ -161,6 +163,10 @@ architecture behaviour of toplevel is
signal gpio_out : std_ulogic_vector(NGPIO - 1 downto 0);
signal gpio_dir : std_ulogic_vector(NGPIO - 1 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -193,6 +199,7 @@ begin
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_SHORT_MULT => HAS_SHORT_MULT,
HAS_DRAM => USE_LITEDRAM,
DRAM_SIZE => 256 * 1024 * 1024,
DRAM_INIT_SIZE => PAYLOAD_SIZE,
@ -207,6 +214,7 @@ begin
UART0_IS_16550 => UART_IS_16550,
HAS_UART1 => HAS_UART1,
HAS_SD_CARD => USE_LITESDCARD,
HAS_GPIO => HAS_GPIO,
NGPIO => NGPIO
)
port map (
@ -378,11 +386,15 @@ begin
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -415,8 +427,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n
@ -548,7 +560,7 @@ begin
wb_eth_cyc <= wb_ext_io_in.cyc and wb_ext_is_eth;

-- Remove top address bits as liteeth decoder doesn't know about them
wb_eth_adr <= x"000" & "000" & wb_ext_io_in.adr(16 downto 2);
wb_eth_adr <= x"000" & "000" & wb_ext_io_in.adr(14 downto 0);

-- LiteETH isn't pipelined
wb_eth_out.stall <= not wb_eth_out.ack;
@ -638,7 +650,7 @@ begin
-- Gate cyc with chip select from SoC
wb_sdcard_cyc <= wb_ext_io_in.cyc and wb_ext_is_sdcard;

wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(15 downto 2);
wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(13 downto 0);

wb_sdcard_out.stall <= not wb_sdcard_out.ack;


@ -13,6 +13,7 @@ entity toplevel is
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := true;
HAS_BTC : boolean := false;
HAS_SHORT_MULT: boolean := false;
ICACHE_NUM_LINES : natural := 64;
LOG_LENGTH : natural := 512;
DISABLE_FLATTEN_CORE : boolean := false;
@ -74,6 +75,7 @@ begin
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_SHORT_MULT => HAS_SHORT_MULT,
ICACHE_NUM_LINES => ICACHE_NUM_LINES,
LOG_LENGTH => LOG_LENGTH,
DISABLE_FLATTEN_CORE => DISABLE_FLATTEN_CORE,

@ -97,6 +97,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -270,11 +274,15 @@ begin
rst_out => open
);

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 25,
DRAM_ALINES => 15,
DRAM_DLINES => 32,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 256,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -307,8 +315,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n

@ -16,6 +16,7 @@ entity toplevel is
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := true;
HAS_BTC : boolean := true;
HAS_SHORT_MULT: boolean := false;
USE_LITEDRAM : boolean := false;
NO_BRAM : boolean := false;
DISABLE_FLATTEN_CORE : boolean := false;
@ -138,6 +139,10 @@ architecture behaviour of toplevel is
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_logic_vector(0 downto 0);
signal ddram_clk_n_vec : std_logic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
@ -170,6 +175,7 @@ begin
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_SHORT_MULT=> HAS_SHORT_MULT,
HAS_DRAM => USE_LITEDRAM,
DRAM_SIZE => 512 * 1024 * 1024,
DRAM_INIT_SIZE => PAYLOAD_SIZE,
@ -328,11 +334,15 @@ begin
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 25,
DRAM_ALINES => 15,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
@ -365,8 +375,8 @@ begin
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p,
ddram_clk_n => ddram_clk_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n
@ -444,7 +454,7 @@ begin
wb_eth_cyc <= wb_ext_io_in.cyc and wb_ext_is_eth;

-- Remove top address bits as liteeth decoder doesn't know about them
wb_eth_adr <= x"000" & "000" & wb_ext_io_in.adr(16 downto 2);
wb_eth_adr <= x"000" & "000" & wb_ext_io_in.adr(14 downto 0);

-- LiteETH isn't pipelined
wb_eth_out.stall <= not wb_eth_out.ack;
@ -533,7 +543,7 @@ begin
-- Gate cyc with chip select from SoC
wb_sdcard_cyc <= wb_ext_io_in.cyc and wb_ext_is_sdcard;

wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(15 downto 2);
wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(13 downto 0);

wb_sdcard_out.stall <= not wb_sdcard_out.ack;


@ -0,0 +1,512 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.wishbone_types.all;

entity toplevel is
generic (
MEMORY_SIZE : integer := 16384;
RAM_INIT_FILE : string := "firmware.hex";
RESET_LOW : boolean := true;
CLK_INPUT : positive := 100000000;
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := true;
HAS_BTC : boolean := false;
USE_LITEDRAM : boolean := true;
NO_BRAM : boolean := true;
SCLK_STARTUPE2 : boolean := false;
SPI_FLASH_OFFSET : integer := 4194304;
SPI_FLASH_DEF_CKDV : natural := 1;
SPI_FLASH_DEF_QUAD : boolean := true;
LOG_LENGTH : natural := 0;
UART_IS_16550 : boolean := true;
HAS_UART1 : boolean := false;
USE_LITESDCARD : boolean := true;
ICACHE_NUM_LINES : natural := 64;
NGPIO : natural := 0
);
port(
ext_clk : in std_ulogic;
ext_rst_n : in std_ulogic;

-- UART0 signals:
pin_gpio_0 : out std_ulogic;
pin_gpio_1 : in std_ulogic;

-- LEDs
led0_b : out std_ulogic;
led0_g : out std_ulogic;
led0_r : out std_ulogic;

-- SPI
spi_flash_cs_n : out std_ulogic;
spi_flash_mosi : inout std_ulogic;
spi_flash_miso : inout std_ulogic;
spi_flash_wp_n : inout std_ulogic;
spi_flash_hold_n : inout std_ulogic;

-- SD card wires
sdcard_data : inout std_ulogic_vector(3 downto 0);
sdcard_cmd : inout std_ulogic;
sdcard_clk : out std_ulogic;
sdcard_cd : in std_ulogic;

-- DRAM wires
ddram_a : out std_ulogic_vector(13 downto 0);
ddram_ba : out std_ulogic_vector(2 downto 0);
ddram_ras_n : out std_ulogic;
ddram_cas_n : out std_ulogic;
ddram_we_n : out std_ulogic;
ddram_cs_n : out std_ulogic;
ddram_dm : out std_ulogic_vector(1 downto 0);
ddram_dq : inout std_ulogic_vector(15 downto 0);
ddram_dqs_p : inout std_ulogic_vector(1 downto 0);
ddram_clk_p : out std_ulogic_vector(0 downto 0);
-- only the positive differential pin is instantiated
--ddram_dqs_n : inout std_ulogic_vector(1 downto 0);
--ddram_clk_n : out std_ulogic_vector(0 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;

ddram_gnd : out std_ulogic_vector(1 downto 0);
ddram_vccio : out std_ulogic_vector(5 downto 0)
);
end entity toplevel;

architecture behaviour of toplevel is

-- Reset signals:
signal soc_rst : std_ulogic;
signal pll_rst : std_ulogic;

-- Internal clock signals:
signal system_clk : std_ulogic;
signal system_clk_locked : std_ulogic;

-- External IOs from the SoC
signal wb_ext_io_in : wb_io_master_out;
signal wb_ext_io_out : wb_io_slave_out;
signal wb_ext_is_dram_csr : std_ulogic;
signal wb_ext_is_dram_init : std_ulogic;
signal wb_ext_is_sdcard : std_ulogic;

-- DRAM main data wishbone connection
signal wb_dram_in : wishbone_master_out;
signal wb_dram_out : wishbone_slave_out;

-- DRAM control wishbone connection
signal wb_dram_ctrl_out : wb_io_slave_out := wb_io_slave_out_init;

-- LiteSDCard connection
signal ext_irq_sdcard : std_ulogic := '0';
signal wb_sdcard_out : wb_io_slave_out := wb_io_slave_out_init;
signal wb_sddma_out : wb_io_master_out := wb_io_master_out_init;
signal wb_sddma_in : wb_io_slave_out;
signal wb_sddma_nr : wb_io_master_out;
signal wb_sddma_ir : wb_io_slave_out;
-- for conversion from non-pipelined wishbone to pipelined
signal wb_sddma_stb_sent : std_ulogic;

-- Control/status
signal core_alt_reset : std_ulogic;

-- Status LED
signal led0_b_pwm : std_ulogic;
signal led0_r_pwm : std_ulogic;
signal led0_g_pwm : std_ulogic;

-- Dumb PWM for the LEDs, those RGB LEDs are too bright otherwise
signal pwm_counter : std_ulogic_vector(8 downto 0);

-- SPI flash
signal spi_sck : std_ulogic;
signal spi_cs_n : std_ulogic;
signal spi_sdat_o : std_ulogic_vector(3 downto 0);
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- GPIO
signal gpio_in : std_ulogic_vector(NGPIO - 1 downto 0);
signal gpio_out : std_ulogic_vector(NGPIO - 1 downto 0);
signal gpio_dir : std_ulogic_vector(NGPIO - 1 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
if USE_LITEDRAM and NO_BRAM then
return 0;
else
return MEMORY_SIZE;
end if;
end function;

function get_payload_size return natural is
begin
if USE_LITEDRAM and NO_BRAM then
return MEMORY_SIZE;
else
return 0;
end if;
end function;

constant BRAM_SIZE : natural := get_bram_size;
constant PAYLOAD_SIZE : natural := get_payload_size;

COMPONENT USRMCLK
PORT(
USRMCLKI : IN STD_ULOGIC;
USRMCLKTS : IN STD_ULOGIC
);
END COMPONENT;
attribute syn_noprune: boolean ;
attribute syn_noprune of USRMCLK: component is true;

begin

-- Main SoC
soc0: entity work.soc
generic map(
MEMORY_SIZE => BRAM_SIZE,
RAM_INIT_FILE => RAM_INIT_FILE,
SIM => false,
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_DRAM => USE_LITEDRAM,
DRAM_SIZE => 256 * 1024 * 1024,
DRAM_INIT_SIZE => PAYLOAD_SIZE,
HAS_SPI_FLASH => true,
SPI_FLASH_DLINES => 4,
SPI_FLASH_OFFSET => SPI_FLASH_OFFSET,
SPI_FLASH_DEF_CKDV => SPI_FLASH_DEF_CKDV,
SPI_FLASH_DEF_QUAD => SPI_FLASH_DEF_QUAD,
LOG_LENGTH => LOG_LENGTH,
UART0_IS_16550 => UART_IS_16550,
HAS_UART1 => HAS_UART1,
HAS_SD_CARD => USE_LITESDCARD,
ICACHE_NUM_LINES => ICACHE_NUM_LINES,
HAS_SHORT_MULT => true,
NGPIO => NGPIO
)
port map (
-- System signals
system_clk => system_clk,
rst => soc_rst,

-- UART signals
uart0_txd => pin_gpio_0,
uart0_rxd => pin_gpio_1,

-- UART1 signals
--uart1_txd => uart_pmod_tx,
--uart1_rxd => uart_pmod_rx,

-- SPI signals
spi_flash_sck => spi_sck,
spi_flash_cs_n => spi_cs_n,
spi_flash_sdat_o => spi_sdat_o,
spi_flash_sdat_oe => spi_sdat_oe,
spi_flash_sdat_i => spi_sdat_i,

-- GPIO signals
gpio_in => gpio_in,
gpio_out => gpio_out,
gpio_dir => gpio_dir,

-- External interrupts
ext_irq_sdcard => ext_irq_sdcard,

-- DRAM wishbone
wb_dram_in => wb_dram_in,
wb_dram_out => wb_dram_out,

-- IO wishbone
wb_ext_io_in => wb_ext_io_in,
wb_ext_io_out => wb_ext_io_out,
wb_ext_is_dram_csr => wb_ext_is_dram_csr,
wb_ext_is_dram_init => wb_ext_is_dram_init,
wb_ext_is_sdcard => wb_ext_is_sdcard,

-- DMA wishbone
wishbone_dma_in => wb_sddma_in,
wishbone_dma_out => wb_sddma_out,

alt_reset => core_alt_reset
);

-- SPI Flash
--
spi_flash_cs_n <= spi_cs_n;
spi_flash_mosi <= spi_sdat_o(0) when spi_sdat_oe(0) = '1' else 'Z';
spi_flash_miso <= spi_sdat_o(1) when spi_sdat_oe(1) = '1' else 'Z';
spi_flash_wp_n <= spi_sdat_o(2) when spi_sdat_oe(2) = '1' else 'Z';
spi_flash_hold_n <= spi_sdat_o(3) when spi_sdat_oe(3) = '1' else 'Z';
spi_sdat_i(0) <= spi_flash_mosi;
spi_sdat_i(1) <= spi_flash_miso;
spi_sdat_i(2) <= spi_flash_wp_n;
spi_sdat_i(3) <= spi_flash_hold_n;

uclk: USRMCLK port map (
USRMCLKI => spi_sck,
USRMCLKTS => '0'
);

nodram: if not USE_LITEDRAM generate
signal ddram_clk_dummy : std_ulogic;
begin
reset_controller: entity work.soc_reset
generic map(
RESET_LOW => RESET_LOW
)
port map(
ext_clk => ext_clk,
pll_clk => system_clk,
pll_locked_in => system_clk_locked,
ext_rst_in => ext_rst_n,
pll_rst_out => pll_rst,
rst_out => soc_rst
);

clkgen: entity work.clock_generator
generic map(
CLK_INPUT_HZ => CLK_INPUT,
CLK_OUTPUT_HZ => CLK_FREQUENCY
)
port map(
ext_clk => ext_clk,
pll_rst_in => pll_rst,
pll_clk_out => system_clk,
pll_locked_out => system_clk_locked
);

led0_b_pwm <= '1';
led0_r_pwm <= '1';
led0_g_pwm <= '0';
core_alt_reset <= '0';

end generate;

has_dram: if USE_LITEDRAM generate
signal dram_init_done : std_ulogic;
signal dram_init_error : std_ulogic;
signal dram_sys_rst : std_ulogic;
signal rst_gen_rst : std_ulogic;
begin

-- Eventually dig out the frequency from
-- litesdram generate.py sys_clk_freq
-- but for now, assert it's 48Mhz for orangecrab
assert CLK_FREQUENCY = 48000000;

reset_controller: entity work.soc_reset
generic map(
RESET_LOW => RESET_LOW,
PLL_RESET_BITS => 18,
SOC_RESET_BITS => 1
)
port map(
ext_clk => ext_clk,
pll_clk => system_clk,
pll_locked_in => system_clk_locked,
ext_rst_in => ext_rst_n,
pll_rst_out => pll_rst,
rst_out => rst_gen_rst
);

-- Generate SoC reset
soc_rst_gen: process(system_clk)
begin
if ext_rst_n = '0' then
soc_rst <= '1';
elsif rising_edge(system_clk) then
soc_rst <= dram_sys_rst or not system_clk_locked;
end if;
end process;

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
NUM_LINES => 8, -- reduce from default of 64 to make smaller/timing
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
)
port map(
clk_in => ext_clk,
rst => pll_rst,
system_clk => system_clk,
system_reset => dram_sys_rst,
core_alt_reset => core_alt_reset,
pll_locked => system_clk_locked,

wb_in => wb_dram_in,
wb_out => wb_dram_out,
wb_ctrl_in => wb_ext_io_in,
wb_ctrl_out => wb_dram_ctrl_out,
wb_ctrl_is_csr => wb_ext_is_dram_csr,
wb_ctrl_is_init => wb_ext_is_dram_init,

init_done => dram_init_done,
init_error => dram_init_error,

ddram_a => ddram_a,
ddram_ba => ddram_ba,
ddram_ras_n => ddram_ras_n,
ddram_cas_n => ddram_cas_n,
ddram_we_n => ddram_we_n,
ddram_cs_n => ddram_cs_n,
ddram_dm => ddram_dm,
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_clk_p => ddram_clk_p,
-- only the positive differential pin is instantiated
--ddram_dqs_n => ddram_dqs_n,
--ddram_clk_n => ddram_clk_n,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,

ddram_reset_n => ddram_reset_n
);

ddram_gnd <= "00";
-- for power consumption.
-- https://github.com/orangecrab-fpga/orangecrab-hardware/issues/19#issuecomment-683479378
ddram_vccio <= "111111";

led0_b_pwm <= not dram_init_done;
led0_r_pwm <= dram_init_error;
led0_g_pwm <= dram_init_done and not dram_init_error;

end generate;


-- SD card pmod
has_sdcard : if USE_LITESDCARD generate
component litesdcard_core port (
clk : in std_ulogic;
rst : in std_ulogic;
-- wishbone for accessing control registers
wb_ctrl_adr : in std_ulogic_vector(29 downto 0);
wb_ctrl_dat_w : in std_ulogic_vector(31 downto 0);
wb_ctrl_dat_r : out std_ulogic_vector(31 downto 0);
wb_ctrl_sel : in std_ulogic_vector(3 downto 0);
wb_ctrl_cyc : in std_ulogic;
wb_ctrl_stb : in std_ulogic;
wb_ctrl_ack : out std_ulogic;
wb_ctrl_we : in std_ulogic;
wb_ctrl_cti : in std_ulogic_vector(2 downto 0);
wb_ctrl_bte : in std_ulogic_vector(1 downto 0);
wb_ctrl_err : out std_ulogic;
-- wishbone for SD card core to use for DMA
wb_dma_adr : out std_ulogic_vector(29 downto 0);
wb_dma_dat_w : out std_ulogic_vector(31 downto 0);
wb_dma_dat_r : in std_ulogic_vector(31 downto 0);
wb_dma_sel : out std_ulogic_vector(3 downto 0);
wb_dma_cyc : out std_ulogic;
wb_dma_stb : out std_ulogic;
wb_dma_ack : in std_ulogic;
wb_dma_we : out std_ulogic;
wb_dma_cti : out std_ulogic_vector(2 downto 0);
wb_dma_bte : out std_ulogic_vector(1 downto 0);
wb_dma_err : in std_ulogic;
-- connections to SD card
sdcard_data : inout std_ulogic_vector(3 downto 0);
sdcard_cmd : inout std_ulogic;
sdcard_clk : out std_ulogic;
sdcard_cd : in std_ulogic;
irq : out std_ulogic
);
end component;

signal wb_sdcard_cyc : std_ulogic;
signal wb_sdcard_adr : std_ulogic_vector(29 downto 0);

begin
litesdcard : litesdcard_core
port map (
clk => system_clk,
rst => soc_rst,
wb_ctrl_adr => wb_sdcard_adr,
wb_ctrl_dat_w => wb_ext_io_in.dat,
wb_ctrl_dat_r => wb_sdcard_out.dat,
wb_ctrl_sel => wb_ext_io_in.sel,
wb_ctrl_cyc => wb_sdcard_cyc,
wb_ctrl_stb => wb_ext_io_in.stb,
wb_ctrl_ack => wb_sdcard_out.ack,
wb_ctrl_we => wb_ext_io_in.we,
wb_ctrl_cti => "000",
wb_ctrl_bte => "00",
wb_ctrl_err => open,
wb_dma_adr => wb_sddma_nr.adr,
wb_dma_dat_w => wb_sddma_nr.dat,
wb_dma_dat_r => wb_sddma_ir.dat,
wb_dma_sel => wb_sddma_nr.sel,
wb_dma_cyc => wb_sddma_nr.cyc,
wb_dma_stb => wb_sddma_nr.stb,
wb_dma_ack => wb_sddma_ir.ack,
wb_dma_we => wb_sddma_nr.we,
wb_dma_cti => open,
wb_dma_bte => open,
wb_dma_err => '0',
sdcard_data => sdcard_data,
sdcard_cmd => sdcard_cmd,
sdcard_clk => sdcard_clk,
sdcard_cd => sdcard_cd,
irq => ext_irq_sdcard
);

-- Gate cyc with chip select from SoC
wb_sdcard_cyc <= wb_ext_io_in.cyc and wb_ext_is_sdcard;

wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(13 downto 0);

wb_sdcard_out.stall <= not wb_sdcard_out.ack;

-- Convert non-pipelined DMA wishbone to pipelined by suppressing
-- non-acknowledged strobes
process(system_clk)
begin
if rising_edge(system_clk) then
wb_sddma_out <= wb_sddma_nr;
if wb_sddma_stb_sent = '1' or
(wb_sddma_out.stb = '1' and wb_sddma_in.stall = '0') then
wb_sddma_out.stb <= '0';
end if;
if wb_sddma_nr.cyc = '0' or wb_sddma_ir.ack = '1' then
wb_sddma_stb_sent <= '0';
elsif wb_sddma_in.stall = '0' then
wb_sddma_stb_sent <= wb_sddma_nr.stb;
end if;
wb_sddma_ir <= wb_sddma_in;
end if;
end process;

end generate;

-- Mux WB response on the IO bus
wb_ext_io_out <= wb_sdcard_out when wb_ext_is_sdcard = '1' else
wb_dram_ctrl_out;

leds_pwm : process(system_clk)
begin
if rising_edge(system_clk) then
pwm_counter <= std_ulogic_vector(signed(pwm_counter) + 1);
if pwm_counter(8 downto 4) = "00000" then
led0_b <= led0_b_pwm;
led0_r <= led0_r_pwm;
led0_g <= led0_g_pwm;
else
led0_b <= '0';
led0_r <= '0';
led0_g <= '0';
end if;
end if;
end process;

end architecture behaviour;

@ -0,0 +1,587 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library unisim;
use unisim.vcomponents.all;

library work;
use work.wishbone_types.all;

entity toplevel is
generic (
MEMORY_SIZE : integer := 16384;
RAM_INIT_FILE : string := "firmware.hex";
RESET_LOW : boolean := true;
CLK_FREQUENCY : positive := 100000000;
HAS_FPU : boolean := true;
HAS_BTC : boolean := true;
HAS_SHORT_MULT : boolean := false;
USE_LITEDRAM : boolean := false;
NO_BRAM : boolean := false;
DISABLE_FLATTEN_CORE : boolean := false;
SPI_FLASH_OFFSET : integer := 4194304;
SPI_FLASH_DEF_CKDV : natural := 1;
SPI_FLASH_DEF_QUAD : boolean := true;
LOG_LENGTH : natural := 512;
USE_LITEETH : boolean := false;
UART_IS_16550 : boolean := true;
HAS_UART1 : boolean := false;
USE_LITESDCARD : boolean := false;
HAS_GPIO : boolean := false;
NGPIO : natural := 32
);
port(
ext_clk : in std_ulogic;
ext_rst_n : in std_ulogic;

-- UART0 signals:
uart_main_tx : out std_ulogic;
uart_main_rx : in std_ulogic;

-- LEDs
led0_n : out std_ulogic;
led1_n : out std_ulogic;

-- SPI
spi_flash_cs_n : out std_ulogic;
spi_flash_mosi : inout std_ulogic;
spi_flash_miso : inout std_ulogic;
spi_flash_wp_n : inout std_ulogic;
spi_flash_hold_n : inout std_ulogic;

-- Ethernet
eth_clocks_tx : in std_ulogic;
eth_clocks_gtx : out std_ulogic;
eth_clocks_rx : in std_ulogic;
eth_rst_n : out std_ulogic;
eth_mdio : inout std_ulogic;
eth_mdc : out std_ulogic;
eth_rx_dv : in std_ulogic;
eth_rx_er : in std_ulogic;
eth_rx_data : in std_ulogic_vector(7 downto 0);
eth_tx_en : out std_ulogic;
eth_tx_er : out std_ulogic;
eth_tx_data : out std_ulogic_vector(7 downto 0);
eth_col : in std_ulogic;
eth_crs : in std_ulogic;

-- SD card
sdcard_data : inout std_ulogic_vector(3 downto 0);
sdcard_cmd : inout std_ulogic;
sdcard_clk : out std_ulogic;
sdcard_cd : in std_ulogic;

-- DRAM wires
ddram_a : out std_ulogic_vector(13 downto 0);
ddram_ba : out std_ulogic_vector(2 downto 0);
ddram_ras_n : out std_ulogic;
ddram_cas_n : out std_ulogic;
ddram_we_n : out std_ulogic;
ddram_dm : out std_ulogic_vector(1 downto 0);
ddram_dq : inout std_ulogic_vector(15 downto 0);
ddram_dqs_p : inout std_ulogic_vector(1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic
);
end entity toplevel;

architecture behaviour of toplevel is

-- Reset signals:
signal soc_rst : std_ulogic;
signal pll_rst : std_ulogic;

-- Internal clock signals:
signal system_clk : std_ulogic;
signal system_clk_locked : std_ulogic;

-- External IOs from the SoC
signal wb_ext_io_in : wb_io_master_out;
signal wb_ext_io_out : wb_io_slave_out;
signal wb_ext_is_dram_csr : std_ulogic;
signal wb_ext_is_dram_init : std_ulogic;
signal wb_ext_is_eth : std_ulogic;
signal wb_ext_is_sdcard : std_ulogic;

-- DRAM main data wishbone connection
signal wb_dram_in : wishbone_master_out;
signal wb_dram_out : wishbone_slave_out;

-- DRAM control wishbone connection
signal wb_dram_ctrl_out : wb_io_slave_out := wb_io_slave_out_init;

-- LiteEth connection
signal ext_irq_eth : std_ulogic;
signal wb_eth_out : wb_io_slave_out := wb_io_slave_out_init;

-- LiteSDCard connection
signal ext_irq_sdcard : std_ulogic := '0';
signal wb_sdcard_out : wb_io_slave_out := wb_io_slave_out_init;
signal wb_sddma_out : wb_io_master_out := wb_io_master_out_init;
signal wb_sddma_in : wb_io_slave_out;
signal wb_sddma_nr : wb_io_master_out;
signal wb_sddma_ir : wb_io_slave_out;
-- for conversion from non-pipelined wishbone to pipelined
signal wb_sddma_stb_sent : std_ulogic;

-- Control/status
signal core_alt_reset : std_ulogic;

-- SPI flash
signal spi_sck : std_ulogic;
signal spi_cs_n : std_ulogic;
signal spi_sdat_o : std_ulogic_vector(3 downto 0);
signal spi_sdat_oe : std_ulogic_vector(3 downto 0);
signal spi_sdat_i : std_ulogic_vector(3 downto 0);

-- ddram clock signals as vectors
signal ddram_clk_p_vec : std_ulogic_vector(0 downto 0);
signal ddram_clk_n_vec : std_ulogic_vector(0 downto 0);

-- Fixup various memory sizes based on generics
function get_bram_size return natural is
begin
if USE_LITEDRAM and NO_BRAM then
return 0;
else
return MEMORY_SIZE;
end if;
end function;

function get_payload_size return natural is
begin
if USE_LITEDRAM and NO_BRAM then
return MEMORY_SIZE;
else
return 0;
end if;
end function;
constant BRAM_SIZE : natural := get_bram_size;
constant PAYLOAD_SIZE : natural := get_payload_size;
begin

-- Main SoC
soc0: entity work.soc
generic map(
MEMORY_SIZE => BRAM_SIZE,
RAM_INIT_FILE => RAM_INIT_FILE,
SIM => false,
CLK_FREQ => CLK_FREQUENCY,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_SHORT_MULT => HAS_SHORT_MULT,
HAS_DRAM => USE_LITEDRAM,
DRAM_SIZE => 256 * 1024 * 1024,
DRAM_INIT_SIZE => PAYLOAD_SIZE,
DISABLE_FLATTEN_CORE => DISABLE_FLATTEN_CORE,
HAS_SPI_FLASH => true,
SPI_FLASH_DLINES => 4,
SPI_FLASH_OFFSET => SPI_FLASH_OFFSET,
SPI_FLASH_DEF_CKDV => SPI_FLASH_DEF_CKDV,
SPI_FLASH_DEF_QUAD => SPI_FLASH_DEF_QUAD,
LOG_LENGTH => LOG_LENGTH,
HAS_LITEETH => USE_LITEETH,
UART0_IS_16550 => UART_IS_16550,
HAS_UART1 => HAS_UART1,
HAS_SD_CARD => USE_LITESDCARD,
HAS_GPIO => HAS_GPIO,
NGPIO => NGPIO
)
port map (
-- System signals
system_clk => system_clk,
rst => soc_rst,

-- UART signals
uart0_txd => uart_main_tx,
uart0_rxd => uart_main_rx,

-- SPI signals
spi_flash_sck => spi_sck,
spi_flash_cs_n => spi_cs_n,
spi_flash_sdat_o => spi_sdat_o,
spi_flash_sdat_oe => spi_sdat_oe,
spi_flash_sdat_i => spi_sdat_i,

-- External interrupts
ext_irq_eth => ext_irq_eth,
ext_irq_sdcard => ext_irq_sdcard,

-- DRAM wishbone
wb_dram_in => wb_dram_in,
wb_dram_out => wb_dram_out,

-- IO wishbone
wb_ext_io_in => wb_ext_io_in,
wb_ext_io_out => wb_ext_io_out,
wb_ext_is_dram_csr => wb_ext_is_dram_csr,
wb_ext_is_dram_init => wb_ext_is_dram_init,
wb_ext_is_eth => wb_ext_is_eth,
wb_ext_is_sdcard => wb_ext_is_sdcard,

-- DMA wishbone
wishbone_dma_in => wb_sddma_in,
wishbone_dma_out => wb_sddma_out,

alt_reset => core_alt_reset
);

-- SPI Flash
spi_flash_cs_n <= spi_cs_n;
spi_flash_mosi <= spi_sdat_o(0) when spi_sdat_oe(0) = '1' else 'Z';
spi_flash_miso <= spi_sdat_o(1) when spi_sdat_oe(1) = '1' else 'Z';
spi_flash_wp_n <= spi_sdat_o(2) when spi_sdat_oe(2) = '1' else 'Z';
spi_flash_hold_n <= spi_sdat_o(3) when spi_sdat_oe(3) = '1' else 'Z';
spi_sdat_i(0) <= spi_flash_mosi;
spi_sdat_i(1) <= spi_flash_miso;
spi_sdat_i(2) <= spi_flash_wp_n;
spi_sdat_i(3) <= spi_flash_hold_n;

STARTUPE2_INST: STARTUPE2
port map (
CLK => '0',
GSR => '0',
GTS => '0',
KEYCLEARB => '0',
PACK => '0',
USRCCLKO => spi_sck,
USRCCLKTS => '0',
USRDONEO => '1',
USRDONETS => '0'
);

nodram: if not USE_LITEDRAM generate
signal ddram_clk_dummy : std_ulogic;
begin
reset_controller: entity work.soc_reset
generic map(
RESET_LOW => RESET_LOW
)
port map(
ext_clk => ext_clk,
pll_clk => system_clk,
pll_locked_in => system_clk_locked,
ext_rst_in => ext_rst_n,
pll_rst_out => pll_rst,
rst_out => soc_rst
);

clkgen: entity work.clock_generator
generic map(
CLK_INPUT_HZ => 50000000,
CLK_OUTPUT_HZ => CLK_FREQUENCY
)
port map(
ext_clk => ext_clk,
pll_rst_in => pll_rst,
pll_clk_out => system_clk,
pll_locked_out => system_clk_locked
);

core_alt_reset <= '0';

-- Vivado barfs on those differential signals if left
-- unconnected. So instanciate a diff. buffer and feed
-- it a constant '0'.
dummy_dram_clk: OBUFDS
port map (
O => ddram_clk_p,
OB => ddram_clk_n,
I => ddram_clk_dummy
);
ddram_clk_dummy <= '0';

end generate;

has_dram: if USE_LITEDRAM generate
signal dram_init_done : std_ulogic;
signal dram_init_error : std_ulogic;
signal dram_sys_rst : std_ulogic;
signal rst_gen_rst : std_ulogic;
begin

-- Eventually dig out the frequency from the generator
-- but for now, assert it's 100Mhz
assert CLK_FREQUENCY = 100000000;

reset_controller: entity work.soc_reset
generic map(
RESET_LOW => RESET_LOW,
PLL_RESET_BITS => 18,
SOC_RESET_BITS => 1
)
port map(
ext_clk => ext_clk,
pll_clk => system_clk,
pll_locked_in => system_clk_locked,
ext_rst_in => ext_rst_n,
pll_rst_out => pll_rst,
rst_out => rst_gen_rst
);

-- Generate SoC reset
soc_rst_gen: process(system_clk)
begin
if ext_rst_n = '0' then
soc_rst <= '1';
elsif rising_edge(system_clk) then
soc_rst <= dram_sys_rst or not system_clk_locked;
end if;
end process;

ddram_clk_p_vec <= (others => ddram_clk_p);
ddram_clk_n_vec <= (others => ddram_clk_n);

dram: entity work.litedram_wrapper
generic map(
DRAM_ABITS => 24,
DRAM_ALINES => 14,
DRAM_DLINES => 16,
DRAM_CKLINES => 1,
DRAM_PORT_WIDTH => 128,
PAYLOAD_FILE => RAM_INIT_FILE,
PAYLOAD_SIZE => PAYLOAD_SIZE
)
port map(
clk_in => ext_clk,
rst => pll_rst,
system_clk => system_clk,
system_reset => dram_sys_rst,
core_alt_reset => core_alt_reset,
pll_locked => system_clk_locked,

wb_in => wb_dram_in,
wb_out => wb_dram_out,
wb_ctrl_in => wb_ext_io_in,
wb_ctrl_out => wb_dram_ctrl_out,
wb_ctrl_is_csr => wb_ext_is_dram_csr,
wb_ctrl_is_init => wb_ext_is_dram_init,

init_done => dram_init_done,
init_error => dram_init_error,

ddram_a => ddram_a,
ddram_ba => ddram_ba,
ddram_ras_n => ddram_ras_n,
ddram_cas_n => ddram_cas_n,
ddram_we_n => ddram_we_n,
ddram_cs_n => open,
ddram_dm => ddram_dm,
ddram_dq => ddram_dq,
ddram_dqs_p => ddram_dqs_p,
ddram_dqs_n => ddram_dqs_n,
ddram_clk_p => ddram_clk_p_vec,
ddram_clk_n => ddram_clk_n_vec,
ddram_cke => ddram_cke,
ddram_odt => ddram_odt,
ddram_reset_n => ddram_reset_n
);

end generate;

has_liteeth : if USE_LITEETH generate

component liteeth_core port (
sys_clock : in std_ulogic;
sys_reset : in std_ulogic;
gmii_eth_clocks_tx : in std_ulogic;
gmii_eth_clocks_gtx : out std_ulogic;
gmii_eth_clocks_rx : in std_ulogic;
gmii_eth_rst_n : out std_ulogic;
gmii_eth_mdio : inout std_ulogic;
gmii_eth_mdc : out std_ulogic;
gmii_eth_rx_dv : in std_ulogic;
gmii_eth_rx_er : in std_ulogic;
gmii_eth_rx_data : in std_ulogic_vector(7 downto 0);
gmii_eth_tx_en : out std_ulogic;
gmii_eth_tx_er : out std_ulogic;
gmii_eth_tx_data : out std_ulogic_vector(7 downto 0);
gmii_eth_col : in std_ulogic;
gmii_eth_crs : in std_ulogic;
wishbone_adr : in std_ulogic_vector(29 downto 0);
wishbone_dat_w : in std_ulogic_vector(31 downto 0);
wishbone_dat_r : out std_ulogic_vector(31 downto 0);
wishbone_sel : in std_ulogic_vector(3 downto 0);
wishbone_cyc : in std_ulogic;
wishbone_stb : in std_ulogic;
wishbone_ack : out std_ulogic;
wishbone_we : in std_ulogic;
wishbone_cti : in std_ulogic_vector(2 downto 0);
wishbone_bte : in std_ulogic_vector(1 downto 0);
wishbone_err : out std_ulogic;
interrupt : out std_ulogic
);
end component;

signal wb_eth_cyc : std_ulogic;
signal wb_eth_adr : std_ulogic_vector(29 downto 0);

-- Change this to use a PLL instead of a BUFR to generate the 25Mhz
-- reference clock to the PHY.
constant USE_PLL : boolean := false;
begin
liteeth : liteeth_core
port map(
sys_clock => system_clk,
sys_reset => soc_rst,
gmii_eth_clocks_tx => eth_clocks_tx,
gmii_eth_clocks_gtx => eth_clocks_gtx,
gmii_eth_clocks_rx => eth_clocks_rx,
gmii_eth_rst_n => eth_rst_n,
gmii_eth_mdio => eth_mdio,
gmii_eth_mdc => eth_mdc,
gmii_eth_rx_dv => eth_rx_dv,
gmii_eth_rx_er => eth_rx_er,
gmii_eth_rx_data => eth_rx_data,
gmii_eth_tx_en => eth_tx_en,
gmii_eth_tx_er => eth_tx_er,
gmii_eth_tx_data => eth_tx_data,
gmii_eth_col => eth_col,
gmii_eth_crs => eth_crs,
wishbone_adr => wb_eth_adr,
wishbone_dat_w => wb_ext_io_in.dat,
wishbone_dat_r => wb_eth_out.dat,
wishbone_sel => wb_ext_io_in.sel,
wishbone_cyc => wb_eth_cyc,
wishbone_stb => wb_ext_io_in.stb,
wishbone_ack => wb_eth_out.ack,
wishbone_we => wb_ext_io_in.we,
wishbone_cti => "000",
wishbone_bte => "00",
wishbone_err => open,
interrupt => ext_irq_eth
);

-- Gate cyc with "chip select" from soc
wb_eth_cyc <= wb_ext_io_in.cyc and wb_ext_is_eth;

-- Remove top address bits as liteeth decoder doesn't know about them
wb_eth_adr <= x"000" & "000" & wb_ext_io_in.adr(14 downto 0);

-- LiteETH isn't pipelined
wb_eth_out.stall <= not wb_eth_out.ack;

end generate;

no_liteeth : if not USE_LITEETH generate
ext_irq_eth <= '0';
end generate;

-- SD card pmod
has_sdcard : if USE_LITESDCARD generate
component litesdcard_core port (
clk : in std_ulogic;
rst : in std_ulogic;
-- wishbone for accessing control registers
wb_ctrl_adr : in std_ulogic_vector(29 downto 0);
wb_ctrl_dat_w : in std_ulogic_vector(31 downto 0);
wb_ctrl_dat_r : out std_ulogic_vector(31 downto 0);
wb_ctrl_sel : in std_ulogic_vector(3 downto 0);
wb_ctrl_cyc : in std_ulogic;
wb_ctrl_stb : in std_ulogic;
wb_ctrl_ack : out std_ulogic;
wb_ctrl_we : in std_ulogic;
wb_ctrl_cti : in std_ulogic_vector(2 downto 0);
wb_ctrl_bte : in std_ulogic_vector(1 downto 0);
wb_ctrl_err : out std_ulogic;
-- wishbone for SD card core to use for DMA
wb_dma_adr : out std_ulogic_vector(29 downto 0);
wb_dma_dat_w : out std_ulogic_vector(31 downto 0);
wb_dma_dat_r : in std_ulogic_vector(31 downto 0);
wb_dma_sel : out std_ulogic_vector(3 downto 0);
wb_dma_cyc : out std_ulogic;
wb_dma_stb : out std_ulogic;
wb_dma_ack : in std_ulogic;
wb_dma_we : out std_ulogic;
wb_dma_cti : out std_ulogic_vector(2 downto 0);
wb_dma_bte : out std_ulogic_vector(1 downto 0);
wb_dma_err : in std_ulogic;
-- connections to SD card
sdcard_data : inout std_ulogic_vector(3 downto 0);
sdcard_cmd : inout std_ulogic;
sdcard_clk : out std_ulogic;
sdcard_cd : in std_ulogic;
irq : out std_ulogic
);
end component;

signal wb_sdcard_cyc : std_ulogic;
signal wb_sdcard_adr : std_ulogic_vector(29 downto 0);

begin
litesdcard : litesdcard_core
port map (
clk => system_clk,
rst => soc_rst,
wb_ctrl_adr => wb_sdcard_adr,
wb_ctrl_dat_w => wb_ext_io_in.dat,
wb_ctrl_dat_r => wb_sdcard_out.dat,
wb_ctrl_sel => wb_ext_io_in.sel,
wb_ctrl_cyc => wb_sdcard_cyc,
wb_ctrl_stb => wb_ext_io_in.stb,
wb_ctrl_ack => wb_sdcard_out.ack,
wb_ctrl_we => wb_ext_io_in.we,
wb_ctrl_cti => "000",
wb_ctrl_bte => "00",
wb_ctrl_err => open,
wb_dma_adr => wb_sddma_nr.adr,
wb_dma_dat_w => wb_sddma_nr.dat,
wb_dma_dat_r => wb_sddma_ir.dat,
wb_dma_sel => wb_sddma_nr.sel,
wb_dma_cyc => wb_sddma_nr.cyc,
wb_dma_stb => wb_sddma_nr.stb,
wb_dma_ack => wb_sddma_ir.ack,
wb_dma_we => wb_sddma_nr.we,
wb_dma_cti => open,
wb_dma_bte => open,
wb_dma_err => '0',
sdcard_data => sdcard_data,
sdcard_cmd => sdcard_cmd,
sdcard_clk => sdcard_clk,
sdcard_cd => sdcard_cd,
irq => ext_irq_sdcard
);

-- Gate cyc with chip select from SoC
wb_sdcard_cyc <= wb_ext_io_in.cyc and wb_ext_is_sdcard;

wb_sdcard_adr <= x"0000" & wb_ext_io_in.adr(13 downto 0);

wb_sdcard_out.stall <= not wb_sdcard_out.ack;

-- Convert non-pipelined DMA wishbone to pipelined by suppressing
-- non-acknowledged strobes
process(system_clk)
begin
if rising_edge(system_clk) then
wb_sddma_out <= wb_sddma_nr;
if wb_sddma_stb_sent = '1' or
(wb_sddma_out.stb = '1' and wb_sddma_in.stall = '0') then
wb_sddma_out.stb <= '0';
end if;
if wb_sddma_nr.cyc = '0' or wb_sddma_ir.ack = '1' then
wb_sddma_stb_sent <= '0';
elsif wb_sddma_in.stall = '0' then
wb_sddma_stb_sent <= wb_sddma_nr.stb;
end if;
wb_sddma_ir <= wb_sddma_in;
end if;
end process;

end generate;

-- Mux WB response on the IO bus
wb_ext_io_out <= wb_eth_out when wb_ext_is_eth = '1' else
wb_sdcard_out when wb_ext_is_sdcard = '1' else
wb_dram_ctrl_out;

led0_n <= system_clk_locked;
led1_n <= not soc_rst;

end architecture behaviour;

@ -0,0 +1,487 @@
################################################################################
# clkin, reset, uart pins...
################################################################################

set_property -dict { PACKAGE_PIN M21 IOSTANDARD LVCMOS33 } [get_ports { ext_clk }];

set_property -dict { PACKAGE_PIN H7 IOSTANDARD LVCMOS33 } [get_ports { ext_rst_n }];

set_property -dict { PACKAGE_PIN E3 IOSTANDARD LVCMOS33 } [get_ports { uart_main_tx }];
set_property -dict { PACKAGE_PIN F3 IOSTANDARD LVCMOS33 } [get_ports { uart_main_rx }];

################################################################################
# LEDs
################################################################################

set_property -dict { PACKAGE_PIN V16 IOSTANDARD LVCMOS33 } [get_ports { led0_n }];
set_property -dict { PACKAGE_PIN V17 IOSTANDARD LVCMOS33 } [get_ports { led1_n }];

################################################################################
# SPI Flash
################################################################################ema

set_property -dict { PACKAGE_PIN P18 IOSTANDARD LVCMOS33 } [get_ports { spi_flash_cs_n }];
set_property -dict { PACKAGE_PIN R14 IOSTANDARD LVCMOS33 } [get_ports { spi_flash_mosi }];
set_property -dict { PACKAGE_PIN R15 IOSTANDARD LVCMOS33 } [get_ports { spi_flash_miso }];
set_property -dict { PACKAGE_PIN P14 IOSTANDARD LVCMOS33 } [get_ports { spi_flash_wp_n }];
set_property -dict { PACKAGE_PIN N14 IOSTANDARD LVCMOS33 } [get_ports { spi_flash_hold_n }];

################################################################################
# Micro SD
################################################################################

set_property -dict { PACKAGE_PIN M5 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_data[0] }];
set_property -dict { PACKAGE_PIN M7 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_data[1] }];
set_property -dict { PACKAGE_PIN H6 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_data[2] }];
set_property -dict { PACKAGE_PIN J6 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_data[3] }];
set_property -dict { PACKAGE_PIN J8 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_cmd }];
set_property -dict { PACKAGE_PIN L4 IOSTANDARD LVCMOS33 SLEW FAST } [get_ports { sdcard_clk }];
set_property -dict { PACKAGE_PIN N6 IOSTANDARD LVCMOS33 } [get_ports { sdcard_cd }];

# Put registers into IOBs to improve timing
set_property IOB true [get_cells -hierarchical -filter {NAME =~*.litesdcard/sdcard_*}]

################################################################################
# PMOD header J10 (high-speed, no protection resisters)
################################################################################

#set_property -dict { PACKAGE_PIN D5 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_1 }];
#set_property -dict { PACKAGE_PIN G5 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_2 }];
#set_property -dict { PACKAGE_PIN G7 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_3 }];
#set_property -dict { PACKAGE_PIN G8 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_4 }];
#set_property -dict { PACKAGE_PIN E5 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_7 }];
#set_property -dict { PACKAGE_PIN E6 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_8 }];
#set_property -dict { PACKAGE_PIN D6 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_9 }];
#set_property -dict { PACKAGE_PIN G6 IOSTANDARD LVCMOS33 } [get_ports { pmod_j10_10 }];

################################################################################
# PMOD header J11 (high-speed, no protection resisters)
################################################################################

#set_property -dict { PACKAGE_PIN H4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_1 }];
#set_property -dict { PACKAGE_PIN F4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_2 }];
#set_property -dict { PACKAGE_PIN A4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_3 }];
#set_property -dict { PACKAGE_PIN A5 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_4 }];
#set_property -dict { PACKAGE_PIN J4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_7 }];
#set_property -dict { PACKAGE_PIN G4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_8 }];
#set_property -dict { PACKAGE_PIN B4 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_9 }];
#set_property -dict { PACKAGE_PIN B5 IOSTANDARD LVCMOS33 } [get_ports { pmod_j11_10 }];

################################################################################
# HDR 20X2 connector
################################################################################

## TODO

################################################################################
# Ethernet (generated by LiteX)
################################################################################

# eth_clocks:0.tx
set_property LOC M2 [get_ports {eth_clocks_tx}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_clocks_tx}]

# eth_clocks:0.gtx
set_property LOC U1 [get_ports {eth_clocks_gtx}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_clocks_gtx}]

# eth_clocks:0.rx
set_property LOC P4 [get_ports {eth_clocks_rx}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_clocks_rx}]

# eth:0.rst_n
set_property LOC R1 [get_ports {eth_rst_n}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rst_n}]

# eth:0.mdio
set_property LOC H1 [get_ports {eth_mdio}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_mdio}]

# eth:0.mdc
set_property LOC H2 [get_ports {eth_mdc}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_mdc}]

# eth:0.rx_dv
set_property LOC L3 [get_ports {eth_rx_dv}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_dv}]

# eth:0.rx_er
set_property LOC U5 [get_ports {eth_rx_er}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_er}]

# eth:0.rx_data
set_property LOC M4 [get_ports {eth_rx_data[0]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[0]}]

# eth:0.rx_data
set_property LOC N3 [get_ports {eth_rx_data[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[1]}]

# eth:0.rx_data
set_property LOC N4 [get_ports {eth_rx_data[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[2]}]

# eth:0.rx_data
set_property LOC P3 [get_ports {eth_rx_data[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[3]}]

# eth:0.rx_data
set_property LOC R3 [get_ports {eth_rx_data[4]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[4]}]

# eth:0.rx_data
set_property LOC T3 [get_ports {eth_rx_data[5]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[5]}]

# eth:0.rx_data
set_property LOC T4 [get_ports {eth_rx_data[6]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[6]}]

# eth:0.rx_data
set_property LOC T5 [get_ports {eth_rx_data[7]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_rx_data[7]}]

# eth:0.tx_en
set_property LOC T2 [get_ports {eth_tx_en}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_en}]

# eth:0.tx_er
set_property LOC J1 [get_ports {eth_tx_er}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_er}]

# eth:0.tx_data
set_property LOC R2 [get_ports {eth_tx_data[0]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[0]}]

# eth:0.tx_data
set_property LOC P1 [get_ports {eth_tx_data[1]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[1]}]

# eth:0.tx_data
set_property LOC N2 [get_ports {eth_tx_data[2]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[2]}]

# eth:0.tx_data
set_property LOC N1 [get_ports {eth_tx_data[3]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[3]}]

# eth:0.tx_data
set_property LOC M1 [get_ports {eth_tx_data[4]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[4]}]

# eth:0.tx_data
set_property LOC L2 [get_ports {eth_tx_data[5]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[5]}]

# eth:0.tx_data
set_property LOC K2 [get_ports {eth_tx_data[6]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[6]}]

# eth:0.tx_data
set_property LOC K1 [get_ports {eth_tx_data[7]}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_tx_data[7]}]

# eth:0.col
set_property LOC U4 [get_ports {eth_col}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_col}]

# eth:0.crs
set_property LOC U2 [get_ports {eth_crs}]
set_property IOSTANDARD LVCMOS33 [get_ports {eth_crs}]

################################################################################
# DRAM (generated by LiteX)
################################################################################

# ddram:0.a
set_property LOC E17 [get_ports {ddram_a[0]}]
set_property SLEW FAST [get_ports {ddram_a[0]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[0]}]

# ddram:0.a
set_property LOC G17 [get_ports {ddram_a[1]}]
set_property SLEW FAST [get_ports {ddram_a[1]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[1]}]

# ddram:0.a
set_property LOC F17 [get_ports {ddram_a[2]}]
set_property SLEW FAST [get_ports {ddram_a[2]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[2]}]

# ddram:0.a
set_property LOC C17 [get_ports {ddram_a[3]}]
set_property SLEW FAST [get_ports {ddram_a[3]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[3]}]

# ddram:0.a
set_property LOC G16 [get_ports {ddram_a[4]}]
set_property SLEW FAST [get_ports {ddram_a[4]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[4]}]

# ddram:0.a
set_property LOC D16 [get_ports {ddram_a[5]}]
set_property SLEW FAST [get_ports {ddram_a[5]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[5]}]

# ddram:0.a
set_property LOC H16 [get_ports {ddram_a[6]}]
set_property SLEW FAST [get_ports {ddram_a[6]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[6]}]

# ddram:0.a
set_property LOC E16 [get_ports {ddram_a[7]}]
set_property SLEW FAST [get_ports {ddram_a[7]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[7]}]

# ddram:0.a
set_property LOC H14 [get_ports {ddram_a[8]}]
set_property SLEW FAST [get_ports {ddram_a[8]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[8]}]

# ddram:0.a
set_property LOC F15 [get_ports {ddram_a[9]}]
set_property SLEW FAST [get_ports {ddram_a[9]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[9]}]

# ddram:0.a
set_property LOC F20 [get_ports {ddram_a[10]}]
set_property SLEW FAST [get_ports {ddram_a[10]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[10]}]

# ddram:0.a
set_property LOC H15 [get_ports {ddram_a[11]}]
set_property SLEW FAST [get_ports {ddram_a[11]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[11]}]

# ddram:0.a
set_property LOC C18 [get_ports {ddram_a[12]}]
set_property SLEW FAST [get_ports {ddram_a[12]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[12]}]

# ddram:0.a
set_property LOC G15 [get_ports {ddram_a[13]}]
set_property SLEW FAST [get_ports {ddram_a[13]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_a[13]}]

# ddram:0.ba
set_property LOC B17 [get_ports {ddram_ba[0]}]
set_property SLEW FAST [get_ports {ddram_ba[0]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_ba[0]}]

# ddram:0.ba
set_property LOC D18 [get_ports {ddram_ba[1]}]
set_property SLEW FAST [get_ports {ddram_ba[1]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_ba[1]}]

# ddram:0.ba
set_property LOC A17 [get_ports {ddram_ba[2]}]
set_property SLEW FAST [get_ports {ddram_ba[2]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_ba[2]}]

# ddram:0.ras_n
set_property LOC A19 [get_ports {ddram_ras_n}]
set_property SLEW FAST [get_ports {ddram_ras_n}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_ras_n}]

# ddram:0.cas_n
set_property LOC B19 [get_ports {ddram_cas_n}]
set_property SLEW FAST [get_ports {ddram_cas_n}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_cas_n}]

# ddram:0.we_n
set_property LOC A18 [get_ports {ddram_we_n}]
set_property SLEW FAST [get_ports {ddram_we_n}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_we_n}]

# ddram:0.dm
set_property LOC A22 [get_ports {ddram_dm[0]}]
set_property SLEW FAST [get_ports {ddram_dm[0]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dm[0]}]

# ddram:0.dm
set_property LOC C22 [get_ports {ddram_dm[1]}]
set_property SLEW FAST [get_ports {ddram_dm[1]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dm[1]}]

# ddram:0.dq
set_property LOC D21 [get_ports {ddram_dq[0]}]
set_property SLEW FAST [get_ports {ddram_dq[0]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[0]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[0]}]

# ddram:0.dq
set_property LOC C21 [get_ports {ddram_dq[1]}]
set_property SLEW FAST [get_ports {ddram_dq[1]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[1]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[1]}]

# ddram:0.dq
set_property LOC B22 [get_ports {ddram_dq[2]}]
set_property SLEW FAST [get_ports {ddram_dq[2]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[2]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[2]}]

# ddram:0.dq
set_property LOC B21 [get_ports {ddram_dq[3]}]
set_property SLEW FAST [get_ports {ddram_dq[3]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[3]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[3]}]

# ddram:0.dq
set_property LOC D19 [get_ports {ddram_dq[4]}]
set_property SLEW FAST [get_ports {ddram_dq[4]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[4]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[4]}]

# ddram:0.dq
set_property LOC E20 [get_ports {ddram_dq[5]}]
set_property SLEW FAST [get_ports {ddram_dq[5]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[5]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[5]}]

# ddram:0.dq
set_property LOC C19 [get_ports {ddram_dq[6]}]
set_property SLEW FAST [get_ports {ddram_dq[6]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[6]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[6]}]

# ddram:0.dq
set_property LOC D20 [get_ports {ddram_dq[7]}]
set_property SLEW FAST [get_ports {ddram_dq[7]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[7]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[7]}]

# ddram:0.dq
set_property LOC C23 [get_ports {ddram_dq[8]}]
set_property SLEW FAST [get_ports {ddram_dq[8]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[8]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[8]}]

# ddram:0.dq
set_property LOC D23 [get_ports {ddram_dq[9]}]
set_property SLEW FAST [get_ports {ddram_dq[9]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[9]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[9]}]

# ddram:0.dq
set_property LOC B24 [get_ports {ddram_dq[10]}]
set_property SLEW FAST [get_ports {ddram_dq[10]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[10]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[10]}]

# ddram:0.dq
set_property LOC B25 [get_ports {ddram_dq[11]}]
set_property SLEW FAST [get_ports {ddram_dq[11]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[11]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[11]}]

# ddram:0.dq
set_property LOC C24 [get_ports {ddram_dq[12]}]
set_property SLEW FAST [get_ports {ddram_dq[12]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[12]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[12]}]

# ddram:0.dq
set_property LOC C26 [get_ports {ddram_dq[13]}]
set_property SLEW FAST [get_ports {ddram_dq[13]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[13]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[13]}]

# ddram:0.dq
set_property LOC A25 [get_ports {ddram_dq[14]}]
set_property SLEW FAST [get_ports {ddram_dq[14]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[14]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[14]}]

# ddram:0.dq
set_property LOC B26 [get_ports {ddram_dq[15]}]
set_property SLEW FAST [get_ports {ddram_dq[15]}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_dq[15]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dq[15]}]

# ddram:0.dqs_p
set_property LOC B20 [get_ports {ddram_dqs_p[0]}]
set_property SLEW FAST [get_ports {ddram_dqs_p[0]}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_dqs_p[0]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dqs_p[0]}]

# ddram:0.dqs_p
set_property LOC A23 [get_ports {ddram_dqs_p[1]}]
set_property SLEW FAST [get_ports {ddram_dqs_p[1]}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_dqs_p[1]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dqs_p[1]}]

# ddram:0.dqs_n
set_property LOC A20 [get_ports {ddram_dqs_n[0]}]
set_property SLEW FAST [get_ports {ddram_dqs_n[0]}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_dqs_n[0]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dqs_n[0]}]

# ddram:0.dqs_n
set_property LOC A24 [get_ports {ddram_dqs_n[1]}]
set_property SLEW FAST [get_ports {ddram_dqs_n[1]}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_dqs_n[1]}]
set_property IN_TERM UNTUNED_SPLIT_40 [get_ports {ddram_dqs_n[1]}]

# ddram:0.clk_p
set_property LOC F18 [get_ports {ddram_clk_p}]
set_property SLEW FAST [get_ports {ddram_clk_p}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_clk_p}]

# ddram:0.clk_n
set_property LOC F19 [get_ports {ddram_clk_n}]
set_property SLEW FAST [get_ports {ddram_clk_n}]
set_property IOSTANDARD DIFF_SSTL135 [get_ports {ddram_clk_n}]

# ddram:0.cke
set_property LOC E18 [get_ports {ddram_cke}]
set_property SLEW FAST [get_ports {ddram_cke}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_cke}]

# ddram:0.odt
set_property LOC G19 [get_ports {ddram_odt}]
set_property SLEW FAST [get_ports {ddram_odt}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_odt}]

# ddram:0.reset_n
set_property LOC H17 [get_ports {ddram_reset_n}]
set_property SLEW FAST [get_ports {ddram_reset_n}]
set_property IOSTANDARD SSTL135 [get_ports {ddram_reset_n}]

################################################################################
# Design constraints and bitsteam attributes
################################################################################

set_property INTERNAL_VREF 0.675 [get_iobanks 16]

set_property CONFIG_VOLTAGE 3.3 [current_design]
set_property CFGBVS VCCO [current_design]

set_property BITSTREAM.GENERAL.COMPRESS TRUE [current_design]
set_property BITSTREAM.CONFIG.CONFIGRATE 33 [current_design]
set_property CONFIG_MODE SPIx4 [current_design]

################################################################################
# Clock constraints
################################################################################

create_clock -name sys_clk_pin -period 20.00 [get_ports { ext_clk }];

create_clock -name eth_rx_clk -period 8.0 [get_nets has_liteeth.liteeth/eth_rx_clk]
create_clock -name eth_tx_clk -period 8.0 [get_nets has_liteeth.liteeth/eth_tx_clk]

set_clock_groups -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/sys_clk]] -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/eth_rx_clk]] -asynchronous

set_clock_groups -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/sys_clk]] -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/eth_tx_clk]] -asynchronous

set_clock_groups -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/eth_rx_clk]] -group [get_clocks -include_generated_clocks -of [get_nets has_liteeth.liteeth/eth_tx_clk]] -asynchronous

################################################################################
# False path constraints (from LiteX as they relate to LiteDRAM and LiteEth)
################################################################################

set_false_path -quiet -through [get_nets -hierarchical -filter {mr_ff == TRUE}]

set_false_path -quiet -to [get_pins -filter {REF_PIN_NAME == PRE} -of_objects [get_cells -hierarchical -filter {ars_ff1 == TRUE || ars_ff2 == TRUE}]]

set_max_delay 2 -quiet -from [get_pins -filter {REF_PIN_NAME == C} -of_objects [get_cells -hierarchical -filter {ars_ff1 == TRUE}]] -to [get_pins -filter {REF_PIN_NAME == D} -of_objects [get_cells -hierarchical -filter {ars_ff2 == TRUE}]]

@ -16,7 +16,7 @@ entity fpu is
clk : in std_ulogic;
rst : in std_ulogic;

e_in : in Execute1toFPUType;
e_in : in Execute1ToFPUType;
e_out : out FPUToExecute1Type;

w_out : out FPUToWritebackType
@ -197,7 +197,7 @@ architecture behaviour of fpu is
-- Each output value is the inverse of the center of the input
-- range for the value, i.e. entry 0 is 1 / (1 + 1/512),
-- entry 1 is 1 / (1 + 3/512), etc.
signal inverse_table : lookup_table := (
constant inverse_table : lookup_table := (
-- 1/x lookup table
-- Unit bit is assumed to be 1, so input range is [1, 2)
18x"3fc01", 18x"3f411", 18x"3ec31", 18x"3e460", 18x"3dc9f", 18x"3d4ec", 18x"3cd49", 18x"3c5b5",
@ -549,6 +549,10 @@ begin
r.do_intr <= '0';
r.fpscr <= (others => '0');
r.writing_back <= '0';
r.dest_fpr <= (others =>'0');
r.cr_mask <= (others =>'0');
r.cr_result <= (others =>'0');
r.instr_tag.valid <= '0';
else
assert not (r.state /= IDLE and e_in.valid = '1') severity failure;
r <= rin;

@ -40,8 +40,8 @@ architecture behaviour of gpio is
constant GPIO_REG_DATA_CLR : std_ulogic_vector(GPIO_REG_BITS-1 downto 0) := "00101";

-- Current output value and direction
signal reg_data : std_ulogic_vector(NGPIO - 1 downto 0) := (others => '0');
signal reg_dirn : std_ulogic_vector(NGPIO - 1 downto 0) := (others => '0');
signal reg_data : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_dirn : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_in1 : std_ulogic_vector(NGPIO - 1 downto 0);
signal reg_in2 : std_ulogic_vector(NGPIO - 1 downto 0);

@ -58,7 +58,7 @@ begin

-- Wishbone response
wb_rsp.ack <= wb_in.cyc and wb_in.stb;
with wb_in.adr(GPIO_REG_BITS + 1 downto 2) select reg_out <=
with wb_in.adr(GPIO_REG_BITS - 1 downto 0) select reg_out <=
reg_data when GPIO_REG_DATA_OUT,
reg_in2 when GPIO_REG_DATA_IN,
reg_dirn when GPIO_REG_DIR,
@ -79,7 +79,7 @@ begin
wb_out.ack <= '0';
else
if wb_in.cyc = '1' and wb_in.stb = '1' and wb_in.we = '1' then
case wb_in.adr(GPIO_REG_BITS + 1 downto 2) is
case wb_in.adr(GPIO_REG_BITS - 1 downto 0) is
when GPIO_REG_DATA_OUT =>
reg_data <= wb_in.dat(NGPIO - 1 downto 0);
when GPIO_REG_DIR =>

@ -60,11 +60,25 @@ _start:

.global boot_entry
boot_entry:
LOAD_IMM64(%r10,__bss_start)
LOAD_IMM64(%r11,__bss_end)
subf %r11,%r10,%r11
addi %r11,%r11,63
srdi. %r11,%r11,6
beq 2f
mtctr %r11
1: dcbz 0,%r10
addi %r10,%r10,64
bdnz 1b

/* setup stack */
LOAD_IMM64(%r1, STACK_TOP - 0x100)
2: LOAD_IMM64(%r1,__stack_top)
li %r0,0
stdu %r0,-32(%r1)
LOAD_IMM64(%r12, main)
mtctr %r12,
mtctr %r12
bctrl
attn // terminate on exit
b .

#define EXCEPTION(nr) \

Binary file not shown.

Binary file not shown.

@ -35,24 +35,24 @@ a64b5a7d14004a39
a602487d05009f42
a64b5a7d14004a39
2402004ca64b7b7d
3c20000048000004
3d40000048000004
794a07c6614a0000
614a1900654a0000
616b00003d600000
656b0000796b07c6
7d6a5850616b1980
796bd183396b003f
7d6903a641820014
394a00407c0057ec
3c2000004200fff8
782107c660210000
60211f0064210000
618c00003d800000
658c0000798c07c6
7d8903a6618c1014
480000004e800421
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
0000000000000000
6021398064210000
f801ffe138000000
3d8000007c1243a6
798c07c6618c0000
618c1000658c0000
4e8004217d8903a6
4800000000000200
0000000000000000
0000000000000000
0000000000000000
@ -510,150 +510,150 @@ a64b5a7d14004a39
0000000000000000
0000000000000000
0000000000000000
e8010010ebc1fff0
7c0803a6ebe1fff8
3c4000014e800020
7c0802a638429800
f8010010fbe1fff8
480001edf821ffd1
6000000060000000
4800015538628000
4800004960000000
7c7f1b7860000000
57ff063e5463063e
60000000480000b9
4082ffe02c1f000d
480000a53860000a
4bffffd060000000
0100000000000000
3c40000100000180
6000000038429800
6000000089228090
2c09000039428088
e92a000041820030
7c0004ac39290014
712900017d204eaa
e86a00004182ffec
7c601eaa7c0004ac
4e8000205463063e
39290010e92a0000
7d204eea7c0004ac
4082ffec71290001
38630008e86a0000
7c601eea7c0004ac
000000004bffffd0
0000000000000000
384298003c400001
8922809060000000
3942808860000000
4182002c2c090000
39290014e92a0000
7d204eaa7c0004ac
4182ffec71290020
7c0004ace92a0000
4e8000207c604faa
39290010e92a0000
7d204eea7c0004ac
4082ffec71290008
e94a00005469063e
7d2057ea7c0004ac
000000004e800020
0000000000000000
384298003c400001
fbe1fff87c0802a6
3be3fffffbc1fff0
f821ffd1f8010010
2c3e00008fdf0001
3821003040820010
4bfffe4438600000
4082000c281e000a
4bffff453860000d
4bffff3d7fc3f378
60000000480001ed
3862800060000000
6000000048000155
6000000048000049
5463063e7c7f1b78
480000b957ff063e
2c1f000d60000000
3860000a4082ffe0
60000000480000a5
000000004bffffd0
0000028001000000
386000007c691b78
2c0a00007d4918ae
386300014d820020
000000004bfffff0
0000000000000000
0000018001000000
384298003c400001
614a00203d40c000
7c0004ac794a0020
3d20c0007d4056ea
61290008794a0600
7c0004ac79290020
712900207d204eea
3d20c00041820018
7929002061290040
7d204eea7c0004ac
3d00c0007929f804
6108200079290fc3
6000000079080020
3d00001cf9028088
7d4a439261082000
6000000041820084
9922809039200001
6108200c3d00c000
790800203920ff80
7d2047aa7c0004ac
7c0004ace9228088
e92280887d404faa
39290004794ac202
7d404faa7c0004ac
39400003e9228088
7c0004ac3929000c
e92280887d404faa
8922810860000000
3942810060000000
418200302c090000
39290014e92a0000
7d204eaa7c0004ac
4182ffec71290001
7c0004ace86a0000
5463063e7c601eaa
e92a00004e800020
7c0004ac39290010
e92280887d404faa
3929000839400007
7d404faa7c0004ac
600000004e800020
99228090394affff
612920183d20c000
7c0004ac79290020
4e8000207d404fea
712900017d204eea
e86a00004082ffec
7c0004ac38630008
4bffffd07c601eea
0000000000000000
3c40000100000000
6000000038429800
2c24000089228090
600000002f890000
419e0030e9228088
3940000241820024
418200082c230000
39290004614a0001
6000000089228108
2c09000039428100
e92a00004182002c
7c0004ac39290014
712900207d204eaa
e92a00004182ffec
7c604faa7c0004ac
e92a00004e800020
7c0004ac39290010
712900087d204eea
5469063e4082ffec
7c0004ace94a0000
4e8000207d2057ea
0000000000000000
3c40000100000000
7c0802a638429800
fbc1fff0fbe1fff8
f80100103be3ffff
8fdf0001f821ffd1
408200102c3e0000
3860000038210030
281e000a480001e8
3860000d4082000c
7fc3f3784bffff45
4bffffd04bffff3d
0100000000000000
7c691b7800000280
7d4918ae38600000
4d8200202c0a0000
4bfffff038630001
0000000000000000
3c40000100000000
3d40c00038429800
794a0020614a0020
7d4056ea7c0004ac
794a06003d20c000
7929002061290008
7d204eea7c0004ac
4182001871290020
612900403d20c000
7c0004ac79290020
7929f8047d204eea
79290fc33d00c000
7908002061082000
f902810060000000
610820003d00001c
418200847d4a4392
3920000160000000
3d00c00099228108
3920ff806108200c
7c0004ac79080020
e92281007d2047aa
7d404faa7c0004ac
394000004e800020
418200084bffffe0
794ac202e9228100
7c0004ac39290004
e92281007d404faa
3929000c39400003
7d404faa7c0004ac
39290010e9228100
7d404faa7c0004ac
39400007e9228100
7c0004ac39290008
4e8000207d404faa
394affff60000000
3d20c00099228108
7929002061292018
7d404fea7c0004ac
000000004e800020
0000000000000000
384298003c400001
8922810860000000
600000002c090000
41820024e9228100
78840e282c230000
6084000141820008
7c0004ac39290004
4e8000207c804faa
418200082c240000
3929002060630002
7c604fea7c0004ac
000000004e800020
0000000000000000
0000000000000010
0141780400527a01
0000001800010c1b
fffffc4800000018
300e460000000070
000000019f7e4111
0000000000000010
0141780400527a01
0000001000010c1b
fffffc8800000018
0000000000000084
0000002c00000010
00000080fffffcf8
0000002800000000
fffffd6400000040
4109450000000060
300e43029e019f00
42000e0a447e4111
0000000b4106dedf
0000006c00000010
00000028fffffd98
e8010010ebc1fff0
7c0803a6ebe1fff8
000000104e800020
00527a0100000000
00010c1b01417804
0000001800000018
00000070fffffc40
9f7e4111300e4600
0000001000000001
00527a0100000000
00010c1b01417804
0000001800000010
00000084fffffc80
0000001000000000
fffffcf00000002c
0000000000000080
0000004000000028
00000060fffffd5c
9e019f0041094500
447e4111300e4302
4106dedf42000e0a
000000100000000b
fffffd900000006c
0000000000000028
0000008000000010
0000012cfffffda4
0000001000000000
fffffdac00000080
000000000000012c
0000009400000010
00000074fffffec4
fffffebc00000094
0000000000000068
0000000000000000
0000000000000000
0000000000000000
0000000000000000

@ -1,13 +1,27 @@
SECTIONS
{
_start = .;
. = 0;
_start = .;
.head : {
KEEP(*(.head))
}
}
. = 0x1000;
.text : { *(.text) }
.text : { *(.text) *(.text.*) *(.rodata) *(.rodata.*) }
. = 0x1800;
.data : { *(.data) }
.bss : { *(.bss) }
.data : { *(.data) *(.data.*) *(.got) *(.toc) }
. = ALIGN(0x80);
__bss_start = .;
.bss : {
*(.dynsbss)
*(.sbss)
*(.scommon)
*(.dynbss)
*(.bss)
*(.common)
*(.bss.*)
}
. = ALIGN(0x80);
__bss_end = .;
. = . + 0x2000;
__stack_top = .;
}

@ -28,7 +28,9 @@ package helpers is

function bit_reverse(a: std_ulogic_vector) return std_ulogic_vector;
function bit_number(a: std_ulogic_vector(63 downto 0)) return std_ulogic_vector;
function edgelocation(v: std_ulogic_vector; nbits: natural) return std_ulogic_vector;
function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector;
function count_right_zeroes(val: std_ulogic_vector) return std_ulogic_vector;
end package helpers;

package body helpers is
@ -247,16 +249,50 @@ package body helpers is
return ret;
end;

-- Count leading zeroes operation
-- Assuming the input 'v' is a value of the form 1...10...0,
-- the output is the bit number of the rightmost 1 bit in v.
-- If v is zero, the result is zero.
function edgelocation(v: std_ulogic_vector; nbits: natural) return std_ulogic_vector is
variable p: std_ulogic_vector(nbits - 1 downto 0);
variable stride: natural;
variable b: std_ulogic;
variable k: natural;
begin
stride := 2;
for i in 0 to nbits - 1 loop
b := '0';
for j in 0 to (2**nbits / stride) - 1 loop
k := j * stride;
b := b or (v(k + stride - 1) and not v(k + (stride/2) - 1));
end loop;
p(i) := b;
stride := stride * 2;
end loop;
return p;
end function;

-- Count leading zeroes operations
-- Assumes the value passed in is not zero (if it is, zero is returned)
function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable rev: std_ulogic_vector(val'left downto val'right);
function count_right_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable sum: std_ulogic_vector(val'left downto val'right);
variable onehot: std_ulogic_vector(val'left downto val'right);
variable edge: std_ulogic_vector(val'left downto val'right);
variable bn, bn_e, bn_o: std_ulogic_vector(5 downto 0);
begin
sum := std_ulogic_vector(- signed(val));
onehot := sum and val;
edge := sum or val;
bn_e := edgelocation(std_ulogic_vector(resize(signed(edge), 64)), 6);
bn_o := bit_number(std_ulogic_vector(resize(unsigned(onehot), 64)));
bn := bn_e(5 downto 2) & bn_o(1 downto 0);
return bn;
end;

function count_left_zeroes(val: std_ulogic_vector) return std_ulogic_vector is
variable rev: std_ulogic_vector(val'left downto val'right);
begin
rev := bit_reverse(val);
sum := std_ulogic_vector(- signed(rev));
onehot := sum and rev;
return bit_number(std_ulogic_vector(resize(unsigned(onehot), 64)));
return count_right_zeroes(rev);
end;

end package body helpers;

@ -46,8 +46,6 @@ entity icache is
TLB_SIZE : positive := 64;
-- L1 ITLB log_2(page_size)
TLB_LG_PGSZ : positive := 12;
-- Number of real address bits that we store
REAL_ADDR_BITS : positive := 56;
-- Non-zero to enable log data collection
LOG_LENGTH : natural := 0
);
@ -70,6 +68,7 @@ entity icache is

wb_snoop_in : in wishbone_master_out := wishbone_master_out_init;

events : out IcacheEventType;
log_out : out std_ulogic_vector(53 downto 0)
);
end entity icache;
@ -170,7 +169,7 @@ architecture rtl of icache is
signal eaa_priv : std_ulogic;

-- Cache reload state machine
type state_t is (IDLE, CLR_TAG, WAIT_ACK);
type state_t is (IDLE, STOP_RELOAD, CLR_TAG, WAIT_ACK);

type reg_internal_t is record
-- Cache hit state (Latches for 1 cycle BRAM access)
@ -197,6 +196,8 @@ architecture rtl of icache is

signal r : reg_internal_t;

signal ev : IcacheEventType;

-- Async signals on incoming request
signal req_index : index_t;
signal req_row : row_t;
@ -204,14 +205,13 @@ architecture rtl of icache is
signal req_tag : cache_tag_t;
signal req_is_hit : std_ulogic;
signal req_is_miss : std_ulogic;
signal req_laddr : std_ulogic_vector(63 downto 0);
signal req_raddr : real_addr_t;

signal tlb_req_index : tlb_index_t;
signal real_addr : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
signal real_addr : real_addr_t;
signal ra_valid : std_ulogic;
signal priv_fault : std_ulogic;
signal access_ok : std_ulogic;
signal use_previous : std_ulogic;

-- Cache RAM interface
type cache_ram_out_t is array(way_t) of cache_row_t;
@ -234,7 +234,7 @@ architecture rtl of icache is
end;

-- Return the cache row index (data memory) for an address
function get_row(addr: std_ulogic_vector(63 downto 0)) return row_t is
function get_row(addr: std_ulogic_vector) return row_t is
begin
return to_integer(unsigned(addr(SET_SIZE_BITS - 1 downto ROW_OFF_BITS)));
end;
@ -248,9 +248,9 @@ architecture rtl of icache is
end;

-- Returns whether this is the last row of a line
function is_last_row_addr(addr: wishbone_addr_type; last: row_in_line_t) return boolean is
function is_last_row_wb_addr(wb_addr: wishbone_addr_type; last: row_in_line_t) return boolean is
begin
return unsigned(addr(LINE_OFF_BITS-1 downto ROW_OFF_BITS)) = last;
return unsigned(wb_addr(LINE_OFF_BITS - ROW_OFF_BITS - 1 downto 0)) = last;
end;

-- Returns whether this is the last row of a line
@ -260,16 +260,16 @@ architecture rtl of icache is
end;

-- Return the address of the next row in the current cache line
function next_row_addr(addr: wishbone_addr_type)
function next_row_wb_addr(wb_addr: wishbone_addr_type)
return std_ulogic_vector is
variable row_idx : std_ulogic_vector(ROW_LINEBITS-1 downto 0);
variable result : wishbone_addr_type;
begin
-- Is there no simpler way in VHDL to generate that 3 bits adder ?
row_idx := addr(LINE_OFF_BITS-1 downto ROW_OFF_BITS);
row_idx := wb_addr(ROW_LINEBITS - 1 downto 0);
row_idx := std_ulogic_vector(unsigned(row_idx) + 1);
result := addr;
result(LINE_OFF_BITS-1 downto ROW_OFF_BITS) := row_idx;
result := wb_addr;
result(ROW_LINEBITS - 1 downto 0) := row_idx;
return result;
end;

@ -298,10 +298,9 @@ architecture rtl of icache is
end;

-- Get the tag value from the address
function get_tag(addr: std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
endian: std_ulogic) return cache_tag_t is
function get_tag(addr: real_addr_t; endian: std_ulogic) return cache_tag_t is
begin
return endian & addr(REAL_ADDR_BITS - 1 downto SET_SIZE_BITS);
return endian & addr(addr'left downto SET_SIZE_BITS);
end;

-- Read a tag from a tag memory row
@ -397,7 +396,7 @@ begin
wr_dat(ii * 8 + 7 downto ii * 8) <= wishbone_in.dat(j * 8 + 7 downto j * 8);
end loop;
end if;
do_read <= not (stall_in or use_previous);
do_read <= not stall_in;
do_write <= '0';
if wishbone_in.ack = '1' and replace_way = i then
do_write <= '1';
@ -465,7 +464,7 @@ begin
end if;
eaa_priv <= pte(3);
else
real_addr <= i_in.nia(REAL_ADDR_BITS - 1 downto 0);
real_addr <= addr_to_real(i_in.nia);
ra_valid <= '1';
eaa_priv <= '1';
end if;
@ -494,6 +493,7 @@ begin
itlb_ptes(wr_index) <= m_in.pte;
itlb_valids(wr_index) <= '1';
end if;
ev.itlb_miss_resolved <= m_in.tlbld and not rst;
end if;
end process;

@ -502,16 +502,6 @@ begin
variable is_hit : std_ulogic;
variable hit_way : way_t;
begin
-- i_in.sequential means that i_in.nia this cycle is 4 more than
-- last cycle. If we read more than 32 bits at a time, had a cache hit
-- last cycle, and we don't want the first 32-bit chunk, then we can
-- keep the data we read last cycle and just use that.
if unsigned(i_in.nia(INSN_BITS+2-1 downto 2)) /= 0 then
use_previous <= i_in.req and i_in.sequential and r.hit_valid;
else
use_previous <= '0';
end if;

-- Extract line, row and tag from request
req_index <= get_index(i_in.nia);
req_row <= get_row(i_in.nia);
@ -520,8 +510,7 @@ begin
-- Calculate address of beginning of cache row, will be
-- used for cache miss processing if needed
--
req_laddr <= (63 downto REAL_ADDR_BITS => '0') &
real_addr(REAL_ADDR_BITS - 1 downto ROW_OFF_BITS) &
req_raddr <= real_addr(REAL_ADDR_BITS - 1 downto ROW_OFF_BITS) &
(ROW_OFF_BITS-1 downto 0 => '0');

-- Test if pending request is a hit on any way
@ -566,13 +555,18 @@ begin
-- I prefer not to do just yet as it would force fetch2 to know about
-- some of the cache geometry information.
--
i_out.insn <= read_insn_word(r.hit_nia, cache_out(r.hit_way));
if r.hit_valid = '1' then
i_out.insn <= read_insn_word(r.hit_nia, cache_out(r.hit_way));
else
i_out.insn <= (others => '0');
end if;
i_out.valid <= r.hit_valid;
i_out.nia <= r.hit_nia;
i_out.stop_mark <= r.hit_smark;
i_out.fetch_failed <= r.fetch_failed;
i_out.big_endian <= r.big_endian;
i_out.next_predicted <= i_in.predicted;
i_out.next_pred_ntaken <= i_in.pred_ntaken;

-- Stall fetch1 if we have a miss on cache or TLB or a protection fault
stall_out <= not (is_hit and access_ok);
@ -587,8 +581,7 @@ begin
if rising_edge(clk) then
-- keep outputs to fetch2 unchanged on a stall
-- except that flush or reset sets valid to 0
-- If use_previous, keep the same data as last cycle and use the second half
if stall_in = '1' or use_previous = '1' then
if stall_in = '1' then
if rst = '1' or flush_in = '1' then
r.hit_valid <= '0';
end if;
@ -622,11 +615,12 @@ begin
icache_miss : process(clk)
variable tagset : cache_tags_set_t;
variable tag : cache_tag_t;
variable snoop_addr : std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0);
variable snoop_addr : real_addr_t;
variable snoop_tag : cache_tag_t;
variable snoop_cache_tags : cache_tags_set_t;
begin
if rising_edge(clk) then
ev.icache_miss <= '0';
-- On reset, clear all valid bits to force misses
if rst = '1' then
for i in index_t loop
@ -651,8 +645,7 @@ begin
-- Detect snooped writes and decode address into index and tag
-- Since we never write, any write should be snooped
snoop_valid <= wb_snoop_in.cyc and wb_snoop_in.stb and wb_snoop_in.we;
snoop_addr := (others => '0');
snoop_addr(wb_snoop_in.adr'left downto 0) := wb_snoop_in.adr;
snoop_addr := addr_to_real(wb_to_addr(wb_snoop_in.adr));
snoop_index <= get_index(snoop_addr);
snoop_cache_tags := cache_tags(get_index(snoop_addr));
snoop_tag := get_tag(snoop_addr, '0');
@ -699,18 +692,19 @@ begin
" way:" & integer'image(replace_way) &
" tag:" & to_hstring(req_tag) &
" RA:" & to_hstring(real_addr);
ev.icache_miss <= '1';

-- Keep track of our index and way for subsequent stores
r.store_index <= req_index;
r.store_row <= get_row(req_laddr);
r.store_row <= get_row(req_raddr);
r.store_tag <= req_tag;
r.store_valid <= '1';
r.end_row_ix <= get_row_of_line(get_row(req_laddr)) - 1;
r.end_row_ix <= get_row_of_line(get_row(req_raddr)) - 1;

-- Prep for first wishbone read. We calculate the address of
-- the start of the cache line and start the WB cycle.
--
r.wb.adr <= req_laddr(r.wb.adr'left downto 0);
r.wb.adr <= addr_to_wb(req_raddr);
r.wb.cyc <= '1';
r.wb.stb <= '1';

@ -742,17 +736,23 @@ begin
if wishbone_in.stall = '0' and r.wb.stb = '1' then
-- That was the last word ? We are done sending. Clear stb.
--
if is_last_row_addr(r.wb.adr, r.end_row_ix) then
if is_last_row_wb_addr(r.wb.adr, r.end_row_ix) then
r.wb.stb <= '0';
end if;

-- Calculate the next row address
r.wb.adr <= next_row_addr(r.wb.adr);
r.wb.adr <= next_row_wb_addr(r.wb.adr);
end if;

-- Abort reload if we get an invalidation
if inval_in = '1' then
r.wb.stb <= '0';
r.state <= STOP_RELOAD;
end if;

-- Incoming acks processing
if wishbone_in.ack = '1' then
r.rows_valid(r.store_row mod ROW_PER_LINE) <= '1';
r.rows_valid(r.store_row mod ROW_PER_LINE) <= not inval_in;
-- Check for completion
if is_last_row(r.store_row, r.end_row_ix) then
-- Complete wishbone cycle
@ -768,6 +768,18 @@ begin
-- Increment store row counter
r.store_row <= next_row(r.store_row);
end if;

when STOP_RELOAD =>
-- Wait for all outstanding requests to be satisfied, then
-- go to IDLE state.
if get_row_of_line(r.store_row) = get_row_of_line(get_row(wb_to_addr(r.wb.adr))) then
r.wb.cyc <= '0';
r.state <= IDLE;
end if;
if wishbone_in.ack = '1' then
-- Increment store row counter
r.store_row <= next_row(r.store_row);
end if;
end case;
end if;

@ -797,7 +809,7 @@ begin
log_data <= i_out.valid &
i_out.insn &
wishbone_in.ack &
r.wb.adr(5 downto 3) &
r.wb.adr(2 downto 0) &
r.wb.stb & r.wb.cyc &
wishbone_in.stall &
stall_out &
@ -812,4 +824,7 @@ begin
end process;
log_out <= log_data;
end generate;

events <= ev;

end;

@ -74,6 +74,9 @@ begin
i_out.req <= '0';
i_out.nia <= (others => '0');
i_out.stop_mark <= '0';
i_out.priv_mode <= '1';
i_out.virt_mode <= '0';
i_out.big_endian <= '0';

m_out.tlbld <= '0';
m_out.tlbie <= '0';

@ -13,6 +13,7 @@ entity litedram_wrapper is
DRAM_ABITS : positive;
DRAM_ALINES : natural;
DRAM_DLINES : natural;
DRAM_CKLINES : natural;
DRAM_PORT_WIDTH : positive;

-- Pseudo-ROM payload
@ -69,8 +70,8 @@ entity litedram_wrapper is
ddram_dq : inout std_ulogic_vector(DRAM_DLINES-1 downto 0);
ddram_dqs_p : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_clk_n : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic
@ -93,8 +94,8 @@ architecture behaviour of litedram_wrapper is
ddram_dq : inout std_ulogic_vector(DRAM_DLINES-1 downto 0);
ddram_dqs_p : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(DRAM_DLINES/8-1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_clk_n : out std_ulogic_vector(DRAM_CKLINES-1 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;
@ -163,7 +164,6 @@ architecture behaviour of litedram_wrapper is
-- Select a WB word inside DRAM port width
constant WB_WORD_COUNT : positive := DRAM_DBITS/WBL;
constant WB_WSEL_BITS : positive := log2(WB_WORD_COUNT);
constant WB_WSEL_RIGHT : positive := log2(WBL/8);

-- BRAM organisation: We never access more than wishbone_data_bits at
-- a time so to save resources we make the array only that wide, and
@ -312,10 +312,20 @@ architecture behaviour of litedram_wrapper is
-- Helper functions to decode incoming requests
--

-- Return the DRAM real address from a wishbone address
function get_real_addr(addr: wishbone_addr_type) return std_ulogic_vector is
variable ra: std_ulogic_vector(REAL_ADDR_BITS - 1 downto 0) := (others => '0');
begin
ra(REAL_ADDR_BITS - 1 downto wishbone_log2_width) :=
addr(REAL_ADDR_BITS - wishbone_log2_width - 1 downto 0);
return ra;
end;

-- Return the cache line index (tag index) for an address
function get_index(addr: wishbone_addr_type) return index_t is
begin
return to_integer(unsigned(addr(SET_SIZE_BITS - 1 downto LINE_OFF_BITS)));
return to_integer(unsigned(addr(SET_SIZE_BITS - wishbone_log2_width - 1 downto
LINE_OFF_BITS - wishbone_log2_width)));
end;

-- Return the cache row index (data memory) for an address
@ -378,7 +388,8 @@ architecture behaviour of litedram_wrapper is
-- Get the tag value from the address
function get_tag(addr: wishbone_addr_type) return cache_tag_t is
begin
return addr(REAL_ADDR_BITS - 1 downto SET_SIZE_BITS);
return addr(REAL_ADDR_BITS - wishbone_log2_width - 1 downto
SET_SIZE_BITS - wishbone_log2_width);
end;

-- Read a tag from a tag memory row
@ -447,7 +458,7 @@ begin
wb_ctrl_stb <= '0';
else
-- XXX Maybe only update addr when cyc = '1' to save power ?
wb_ctrl_adr <= x"0000" & wb_ctrl_in.adr(15 downto 2);
wb_ctrl_adr <= x"0000" & wb_ctrl_in.adr(13 downto 0);
wb_ctrl_dat_w <= wb_ctrl_in.dat;
wb_ctrl_sel <= wb_ctrl_in.sel;
wb_ctrl_we <= wb_ctrl_in.we;
@ -608,7 +619,7 @@ begin
if stall = '1' and wb_out.stall = '0' and wb_in.cyc = '1' and wb_in.stb = '1' then
wb_stash <= wb_in;
if TRACE then
report "stashed wb req ! addr:" & to_hstring(wb_in.adr) &
report "stashed wb req ! addr:" & to_hstring(wb_in.adr & "000") &
" we:" & std_ulogic'image(wb_in.we) &
" sel:" & to_hstring(wb_in.sel);
end if;
@ -621,7 +632,7 @@ begin
wb_req <= wb_stash;
wb_stash.cyc <= '0';
if TRACE then
report "unstashed wb req ! addr:" & to_hstring(wb_stash.adr) &
report "unstashed wb req ! addr:" & to_hstring(wb_stash.adr & "000") &
" we:" & std_ulogic'image(wb_stash.we) &
" sel:" & to_hstring(wb_stash.sel);
end if;
@ -636,7 +647,7 @@ begin

if TRACE then
if wb_in.cyc = '1' and wb_in.stb = '1' then
report "latch new wb req ! addr:" & to_hstring(wb_in.adr) &
report "latch new wb req ! addr:" & to_hstring(wb_in.adr & "000") &
" we:" & std_ulogic'image(wb_in.we) &
" sel:" & to_hstring(wb_in.sel);
end if;
@ -665,12 +676,12 @@ begin

if TRACE then
if req_op = OP_LOAD_HIT then
report "Load hit addr:" & to_hstring(wb_req.adr) &
report "Load hit addr:" & to_hstring(wb_req.adr & "000") &
" idx:" & integer'image(req_index) &
" tag:" & to_hstring(req_tag) &
" way:" & integer'image(req_hit_way);
elsif req_op = OP_LOAD_MISS then
report "Load miss addr:" & to_hstring(wb_req.adr);
report "Load miss addr:" & to_hstring(wb_req.adr & "000");
end if;
if read_ack_0 = '1' then
report "read data:" & to_hstring(cache_out(read_way_0));
@ -771,20 +782,19 @@ begin
begin
-- Extract line, row and tag from request
req_index <= get_index(wb_req.adr);
req_row <= get_row(wb_req.adr(REAL_ADDR_BITS-1 downto 0));
req_row <= get_row(get_real_addr(wb_req.adr));
req_tag <= get_tag(wb_req.adr);

-- Calculate address of beginning of cache row, will be
-- used for cache miss processing if needed
req_laddr <= wb_req.adr(REAL_ADDR_BITS - 1 downto ROW_OFF_BITS) &
(ROW_OFF_BITS-1 downto 0 => '0');
req_laddr <= get_real_addr(wb_req.adr);


-- Do we have a valid request in the WB latch ?
valid := wb_req.cyc = '1' and wb_req.stb = '1';

-- Store signals (hard wired for 64-bit wishbone at the moment)
req_wsl <= wb_req.adr(WB_WSEL_RIGHT+WB_WSEL_BITS-1 downto WB_WSEL_RIGHT);
req_wsl <= wb_req.adr(WB_WSEL_BITS-1 downto 0);
for i in 0 to WB_WORD_COUNT-1 loop
if to_integer(unsigned(req_wsl)) = i then
req_we(WBSL*(i+1)-1 downto WBSL*i) <= wb_req.sel;
@ -892,7 +902,7 @@ begin
variable stq_wsl : std_ulogic_vector(WB_WSEL_BITS-1 downto 0);
begin
storeq_wr_data <= wb_req.dat & wb_req.sel &
wb_req.adr(WB_WSEL_RIGHT+WB_WSEL_BITS-1 downto WB_WSEL_RIGHT);
wb_req.adr(WB_WSEL_BITS-1 downto 0);

-- Only queue stores if we can also send a command
if req_op = OP_STORE_HIT or req_op = OP_STORE_MISS then
@ -927,13 +937,13 @@ begin
if rising_edge(system_clk) then
if req_op = OP_STORE_HIT then
report "Store hit to:" &
to_hstring(wb_req.adr(DRAM_ABITS+3 downto 0)) &
to_hstring(wb_req.adr(DRAM_ABITS downto 0) & "000") &
" data:" & to_hstring(req_wdata) &
" we:" & to_hstring(req_we) &
" V:" & std_ulogic'image(user_port0_cmd_ready);
else
report "Store miss to:" &
to_hstring(wb_req.adr(DRAM_ABITS+3 downto 0)) &
to_hstring(wb_req.adr(DRAM_ABITS downto 0) & "000") &
" data:" & to_hstring(req_wdata) &
" we:" & to_hstring(req_we) &
" V:" & std_ulogic'image(user_port0_cmd_ready);
@ -954,7 +964,8 @@ begin
if req_op = OP_STORE_HIT or req_op = OP_STORE_MISS then
-- For stores, forward signals directly. Only send command if
-- the FIFO can accept a store.
user_port0_cmd_addr <= wb_req.adr(DRAM_ABITS+ROW_OFF_BITS-1 downto ROW_OFF_BITS);
user_port0_cmd_addr <= wb_req.adr(DRAM_ABITS + ROW_OFF_BITS - wishbone_log2_width - 1 downto
ROW_OFF_BITS - wishbone_log2_width);
user_port0_cmd_we <= '1';
user_port0_cmd_valid <= storeq_wr_ready;
else

@ -102,8 +102,8 @@ entity litedram_core is
ddram_dq : inout std_ulogic_vector(15 downto 0);
ddram_dqs_p : inout std_ulogic_vector(1 downto 0);
ddram_dqs_n : inout std_ulogic_vector(1 downto 0);
ddram_clk_p : out std_ulogic;
ddram_clk_n : out std_ulogic;
ddram_clk_p : out std_ulogic_vector(0 downto 0);
ddram_clk_n : out std_ulogic_vector(0 downto 0);
ddram_cke : out std_ulogic;
ddram_odt : out std_ulogic;
ddram_reset_n : out std_ulogic;

@ -31,6 +31,7 @@
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -31,6 +31,7 @@
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

@ -100,7 +100,7 @@ def generate_one(t):

def main():

targets = ['arty','nexys-video', 'genesys2', 'acorn-cle-215', 'sim']
targets = ['arty','nexys-video', 'genesys2', 'acorn-cle-215', 'wukong-v2', 'orangecrab-85-0.2', 'sim']
for t in targets:
generate_one(t)

@ -31,6 +31,7 @@
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -31,6 +31,7 @@
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -0,0 +1,39 @@
# Matt Johnston 2021
# Based on parameters from Greg Davill's Orangecrab-test-sw

{
"cpu": "None", # CPU type (ex vexriscv, serv, None)
"device": "LFE5U-85F-8MG285C",
"memtype": "DDR3", # DRAM type

"sdram_module": "MT41K256M16", # SDRAM modules of the board or SO-DIMM
"sdram_module_nb": 2, # Number of byte groups
"sdram_rank_nb": 1, # Number of ranks
"sdram_phy": "ECP5DDRPHY", # Type of FPGA PHY

# Electrical ---------------------------------------------------------------
"rtt_nom": "disabled", # Nominal termination. ("disabled" from orangecrab)
"rtt_wr": "60ohm", # Write termination. (Default)
"ron": "34ohm", # Output driver impedance. (Default)

# Frequency ----------------------------------------------------------------
"init_clk_freq": 24e6,
"input_clk_freq": 48e6, # Input clock frequency
"sys_clk_freq": 48e6, # System clock frequency (DDR_clk = 4 x sys_clk)

# 0 if freq >64e6 else 100. https://github.com/enjoy-digital/litedram/issues/130
"cmd_delay": 100,

# Core ---------------------------------------------------------------------
"cmd_buffer_depth": 16, # Depth of the command buffer

"dm_swap": true,

# User Ports ---------------------------------------------------------------
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -32,6 +32,7 @@ CPPFLAGS += -I$(LXSRC_DIR) -I$(LXINC_DIR) -I$(LXINC_DIR)/base -I$(LXSRC_DIR)/lib

CPPFLAGS += -isystem $(shell $(CC) -print-file-name=include)
CFLAGS = -Os -g -Wall -std=c99 -m64 -mabi=elfv2 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -fdata-sections -ffunction-sections -fno-delete-null-pointer-checks
CFLAGS += -Werror
ASFLAGS = $(CPPFLAGS) $(CFLAGS)
LDFLAGS = -static -nostdlib -T $(OBJ)/$(PROGRAM).lds --gc-sections


@ -125,7 +125,7 @@ static bool check_flash(void)

/* Supported flash types for quad mode */
if (id[0] == 0x01 &&
(id[1] == 0x02 || id[1] == 0x20) &&
(id[1] == 0x02 || id[1] == 0x20 || id[1] == 0x60) &&
(id[2] == 0x18 || id[2] == 0x19)) {
check_spansion_quad_mode();
quad = true;

@ -31,6 +31,7 @@
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -0,0 +1,37 @@
# This file is Copyright (c) 2018-2019 Florent Kermarrec <florent@enjoy-digital.fr>
# License: BSD

{
# General ------------------------------------------------------------------
"cpu": "None", # CPU type (ex vexriscv, serv, None)
"speedgrade": -1, # FPGA speedgrade
"memtype": "DDR3", # DRAM type

# PHY ----------------------------------------------------------------------
"cmd_latency": 0, # Command additional latency
"sdram_module": "MT41K128M16", # SDRAM modules of the board or SO-DIMM
"sdram_module_nb": 2, # Number of byte groups
"sdram_rank_nb": 1, # Number of ranks
"sdram_phy": "A7DDRPHY", # Type of FPGA PHY

# Electrical ---------------------------------------------------------------
"rtt_nom": "60ohm", # Nominal termination
"rtt_wr": "60ohm", # Write termination
"ron": "34ohm", # Output driver impedance

# Frequency ----------------------------------------------------------------
"input_clk_freq": 50e6, # Input clock frequency
"sys_clk_freq": 100e6, # System clock frequency (DDR_clk = 4 x sys_clk)
"iodelay_clk_freq": 200e6, # IODELAYs reference clock frequency

# Core ---------------------------------------------------------------------
"cmd_buffer_depth": 16, # Depth of the command buffer

# User Ports ---------------------------------------------------------------
"user_ports": {
"native_0": {
"type": "native",
"block_until_ready": False,
},
},
}

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -0,0 +1,123 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use std.textio.all;

library work;
use work.wishbone_types.all;
use work.utils.all;

entity dram_init_mem is
generic (
EXTRA_PAYLOAD_FILE : string := "";
EXTRA_PAYLOAD_SIZE : integer := 0
);
port (
clk : in std_ulogic;
wb_in : in wb_io_master_out;
wb_out : out wb_io_slave_out
);
end entity dram_init_mem;

architecture rtl of dram_init_mem is

constant INIT_RAM_SIZE : integer := 24576;
constant RND_PAYLOAD_SIZE : integer := round_up(EXTRA_PAYLOAD_SIZE, 8);
constant TOTAL_RAM_SIZE : integer := INIT_RAM_SIZE + RND_PAYLOAD_SIZE;
constant INIT_RAM_ABITS : integer := log2ceil(TOTAL_RAM_SIZE-1);
constant INIT_RAM_FILE : string := "litedram_core.init";

type ram_t is array(0 to (TOTAL_RAM_SIZE / 4) - 1) of std_logic_vector(31 downto 0);

-- XXX FIXME: Have a single init function called twice with
-- an offset as argument
procedure init_load_payload(ram: inout ram_t; filename: string) is
file payload_file : text open read_mode is filename;
variable ram_line : line;
variable temp_word : std_logic_vector(63 downto 0);
begin
for i in 0 to RND_PAYLOAD_SIZE-1 loop
exit when endfile(payload_file);
readline(payload_file, ram_line);
hread(ram_line, temp_word);
ram((INIT_RAM_SIZE/4) + i*2) := temp_word(31 downto 0);
ram((INIT_RAM_SIZE/4) + i*2+1) := temp_word(63 downto 32);
end loop;
assert endfile(payload_file) report "Payload too big !" severity failure;
end procedure;

impure function init_load_ram(name : string) return ram_t is
file ram_file : text open read_mode is name;
variable temp_word : std_logic_vector(63 downto 0);
variable temp_ram : ram_t := (others => (others => '0'));
variable ram_line : line;
begin
report "Payload size:" & integer'image(EXTRA_PAYLOAD_SIZE) &
" rounded to:" & integer'image(RND_PAYLOAD_SIZE);
report "Total RAM size:" & integer'image(TOTAL_RAM_SIZE) &
" bytes using " & integer'image(INIT_RAM_ABITS) &
" address bits";
for i in 0 to (INIT_RAM_SIZE/8)-1 loop
exit when endfile(ram_file);
readline(ram_file, ram_line);
hread(ram_line, temp_word);
temp_ram(i*2) := temp_word(31 downto 0);
temp_ram(i*2+1) := temp_word(63 downto 32);
end loop;
if RND_PAYLOAD_SIZE /= 0 then
init_load_payload(temp_ram, EXTRA_PAYLOAD_FILE);
end if;
return temp_ram;
end function;

impure function init_zero return ram_t is
variable temp_ram : ram_t := (others => (others => '0'));
begin
return temp_ram;
end function;

impure function initialize_ram(filename: string) return ram_t is
begin
report "Opening file " & filename;
if filename'length = 0 then
return init_zero;
else
return init_load_ram(filename);
end if;
end function;
signal init_ram : ram_t := initialize_ram(INIT_RAM_FILE);

attribute ram_style : string;
attribute ram_style of init_ram: signal is "block";

signal obuf : std_ulogic_vector(31 downto 0);
signal oack : std_ulogic;
begin

init_ram_0: process(clk)
variable adr : integer;
begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else
for i in 0 to 3 loop
if wb_in.sel(i) = '1' then
init_ram(adr)(((i + 1) * 8) - 1 downto i * 8) <=
wb_in.dat(((i + 1) * 8) - 1 downto i * 8);
end if;
end loop;
end if;
oack <= '1';
end if;
wb_out.ack <= oack;
wb_out.dat <= obuf;
end if;
end process;

wb_out.stall <= '0';

end architecture rtl;

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -100,7 +100,7 @@ begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS-1 downto 2))));
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

@ -0,0 +1,123 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use std.textio.all;

library work;
use work.wishbone_types.all;
use work.utils.all;

entity dram_init_mem is
generic (
EXTRA_PAYLOAD_FILE : string := "";
EXTRA_PAYLOAD_SIZE : integer := 0
);
port (
clk : in std_ulogic;
wb_in : in wb_io_master_out;
wb_out : out wb_io_slave_out
);
end entity dram_init_mem;

architecture rtl of dram_init_mem is

constant INIT_RAM_SIZE : integer := 24576;
constant RND_PAYLOAD_SIZE : integer := round_up(EXTRA_PAYLOAD_SIZE, 8);
constant TOTAL_RAM_SIZE : integer := INIT_RAM_SIZE + RND_PAYLOAD_SIZE;
constant INIT_RAM_ABITS : integer := log2ceil(TOTAL_RAM_SIZE-1);
constant INIT_RAM_FILE : string := "litedram_core.init";

type ram_t is array(0 to (TOTAL_RAM_SIZE / 4) - 1) of std_logic_vector(31 downto 0);

-- XXX FIXME: Have a single init function called twice with
-- an offset as argument
procedure init_load_payload(ram: inout ram_t; filename: string) is
file payload_file : text open read_mode is filename;
variable ram_line : line;
variable temp_word : std_logic_vector(63 downto 0);
begin
for i in 0 to RND_PAYLOAD_SIZE-1 loop
exit when endfile(payload_file);
readline(payload_file, ram_line);
hread(ram_line, temp_word);
ram((INIT_RAM_SIZE/4) + i*2) := temp_word(31 downto 0);
ram((INIT_RAM_SIZE/4) + i*2+1) := temp_word(63 downto 32);
end loop;
assert endfile(payload_file) report "Payload too big !" severity failure;
end procedure;

impure function init_load_ram(name : string) return ram_t is
file ram_file : text open read_mode is name;
variable temp_word : std_logic_vector(63 downto 0);
variable temp_ram : ram_t := (others => (others => '0'));
variable ram_line : line;
begin
report "Payload size:" & integer'image(EXTRA_PAYLOAD_SIZE) &
" rounded to:" & integer'image(RND_PAYLOAD_SIZE);
report "Total RAM size:" & integer'image(TOTAL_RAM_SIZE) &
" bytes using " & integer'image(INIT_RAM_ABITS) &
" address bits";
for i in 0 to (INIT_RAM_SIZE/8)-1 loop
exit when endfile(ram_file);
readline(ram_file, ram_line);
hread(ram_line, temp_word);
temp_ram(i*2) := temp_word(31 downto 0);
temp_ram(i*2+1) := temp_word(63 downto 32);
end loop;
if RND_PAYLOAD_SIZE /= 0 then
init_load_payload(temp_ram, EXTRA_PAYLOAD_FILE);
end if;
return temp_ram;
end function;

impure function init_zero return ram_t is
variable temp_ram : ram_t := (others => (others => '0'));
begin
return temp_ram;
end function;

impure function initialize_ram(filename: string) return ram_t is
begin
report "Opening file " & filename;
if filename'length = 0 then
return init_zero;
else
return init_load_ram(filename);
end if;
end function;
signal init_ram : ram_t := initialize_ram(INIT_RAM_FILE);

attribute ram_style : string;
attribute ram_style of init_ram: signal is "block";

signal obuf : std_ulogic_vector(31 downto 0);
signal oack : std_ulogic;
begin

init_ram_0: process(clk)
variable adr : integer;
begin
if rising_edge(clk) then
oack <= '0';
if (wb_in.cyc and wb_in.stb) = '1' then
adr := to_integer((unsigned(wb_in.adr(INIT_RAM_ABITS - 3 downto 0))));
if wb_in.we = '0' then
obuf <= init_ram(adr);
else
for i in 0 to 3 loop
if wb_in.sel(i) = '1' then
init_ram(adr)(((i + 1) * 8) - 1 downto i * 8) <=
wb_in.dat(((i + 1) * 8) - 1 downto i * 8);
end if;
end loop;
end if;
oack <= '1';
end if;
wb_out.ack <= oack;
wb_out.dat <= obuf;
end if;
end process;

wb_out.stall <= '0';

end architecture rtl;

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

@ -1,6 +1,6 @@
#!/bin/bash

TARGETS="arty nexys-video"
TARGETS="arty nexys-video wukong-v2"

ME=$(realpath $0)
echo ME=$ME

@ -0,0 +1,17 @@
# This file is Copyright (c) 2020 Florent Kermarrec <florent@enjoy-digital.fr>
# License: BSD

# PHY ----------------------------------------------------------------------
phy: LiteEthPHYGMIIMII
vendor: xilinx
device: xc7
# Core ---------------------------------------------------------------------
clk_freq: 100e6
core: wishbone
endianness: little
ntxslots: 2
nrxslots: 2

soc:
mem_map:
ethmac: 0x00010000

@ -1,5 +1,5 @@
//--------------------------------------------------------------------------------
// Auto-generated by Migen (35203d6) & LiteX (--------) on 2021-08-09 13:54:48
// Auto-generated by Migen (a5bc262) & LiteX (de028765) on 2021-09-24 12:37:00
//--------------------------------------------------------------------------------
module liteeth_core(
input wire sys_clock,
@ -41,15 +41,15 @@ wire main_maccore_maccore_bus_errors_we;
reg main_maccore_maccore_bus_errors_re = 1'd0;
wire main_maccore_maccore_bus_error;
reg [31:0] main_maccore_maccore_bus_errors = 32'd0;
wire sys_clk;
(* dont_touch = "true" *) wire sys_clk;
wire sys_rst;
wire por_clk;
reg main_maccore_int_rst = 1'd1;
reg main_maccore_ethphy_reset_storage = 1'd0;
reg main_maccore_ethphy_reset_re = 1'd0;
wire eth_rx_clk;
(* dont_touch = "true" *) wire eth_rx_clk;
wire eth_rx_rst;
wire eth_tx_clk;
(* dont_touch = "true" *) wire eth_tx_clk;
wire eth_tx_rst;
wire main_maccore_ethphy_reset0;
wire main_maccore_ethphy_reset1;
@ -1260,13 +1260,13 @@ end
assign main_preamble_checker_source_payload_data = main_preamble_checker_sink_payload_data;
assign main_preamble_checker_source_payload_last_be = main_preamble_checker_sink_payload_last_be;
always @(*) begin
builder_liteethmacpreamblechecker_next_state <= 1'd0;
main_preamble_checker_source_first <= 1'd0;
main_preamble_checker_sink_ready <= 1'd0;
main_preamble_checker_source_last <= 1'd0;
main_preamble_checker_source_payload_error <= 1'd0;
main_preamble_checker_error <= 1'd0;
main_preamble_checker_source_valid <= 1'd0;
builder_liteethmacpreamblechecker_next_state <= 1'd0;
builder_liteethmacpreamblechecker_next_state <= builder_liteethmacpreamblechecker_state;
case (builder_liteethmacpreamblechecker_state)
1'd1: begin
@ -1602,13 +1602,13 @@ assign main_padding_checker_source_payload_data = main_padding_checker_sink_payl
assign main_padding_checker_source_payload_last_be = main_padding_checker_sink_payload_last_be;
assign main_padding_checker_source_payload_error = main_padding_checker_sink_payload_error;
always @(*) begin
main_tx_last_be_source_valid <= 1'd0;
main_tx_last_be_sink_ready <= 1'd0;
main_tx_last_be_source_first <= 1'd0;
main_tx_last_be_source_last <= 1'd0;
main_tx_last_be_source_payload_data <= 8'd0;
main_tx_last_be_source_payload_error <= 1'd0;
builder_liteethmactxlastbe_next_state <= 1'd0;
main_tx_last_be_source_valid <= 1'd0;
builder_liteethmactxlastbe_next_state <= builder_liteethmactxlastbe_state;
case (builder_liteethmactxlastbe_state)
1'd1: begin
@ -2041,15 +2041,15 @@ assign main_writer_stat_fifo_syncfifo_dout = main_writer_stat_fifo_rdport_dat_r;
assign main_writer_stat_fifo_syncfifo_writable = (main_writer_stat_fifo_level != 2'd2);
assign main_writer_stat_fifo_syncfifo_readable = (main_writer_stat_fifo_level != 1'd0);
always @(*) begin
main_writer_start <= 1'd0;
main_writer_counter_t_next_value_ce <= 1'd0;
main_writer_ongoing <= 1'd0;
main_writer_slot_ce <= 1'd0;
main_writer_errors_status_f_next_value <= 32'd0;
main_writer_stat_fifo_sink_valid <= 1'd0;
main_writer_errors_status_f_next_value_ce <= 1'd0;
main_writer_start <= 1'd0;
builder_liteethmacsramwriter_next_state <= 3'd0;
main_writer_counter_t_next_value <= 32'd0;
main_writer_slot_ce <= 1'd0;
main_writer_counter_t_next_value_ce <= 1'd0;
builder_liteethmacsramwriter_next_state <= builder_liteethmacsramwriter_state;
case (builder_liteethmacsramwriter_state)
1'd1: begin
@ -2181,6 +2181,7 @@ assign main_reader_cmd_fifo_syncfifo_dout = main_reader_cmd_fifo_rdport_dat_r;
assign main_reader_cmd_fifo_syncfifo_writable = (main_reader_cmd_fifo_level != 2'd2);
assign main_reader_cmd_fifo_syncfifo_readable = (main_reader_cmd_fifo_level != 1'd0);
always @(*) begin
main_reader_source_source_last <= 1'd0;
builder_liteethmacsramreader_next_state <= 2'd0;
main_reader_counter_next_value <= 11'd0;
main_reader_read_address <= 11'd0;
@ -2189,7 +2190,6 @@ always @(*) begin
main_reader_eventsourcepulse_trigger <= 1'd0;
main_reader_source_source_valid <= 1'd0;
main_reader_start <= 1'd0;
main_reader_source_source_last <= 1'd0;
builder_liteethmacsramreader_next_state <= builder_liteethmacsramreader_state;
case (builder_liteethmacsramreader_state)
1'd1: begin
@ -2292,8 +2292,8 @@ always @(*) begin
builder_maccore_wishbone_dat_r <= 32'd0;
builder_maccore_adr <= 14'd0;
builder_maccore_we <= 1'd0;
builder_maccore_dat_w <= 32'd0;
builder_maccore_wishbone_ack <= 1'd0;
builder_maccore_dat_w <= 32'd0;
builder_next_state <= builder_state;
case (builder_state)
1'd1: begin
@ -2348,9 +2348,9 @@ assign builder_maccore_wishbone_cyc = (builder_shared_cyc & builder_slave_sel[1]
assign builder_shared_err = (main_bus_err | builder_maccore_wishbone_err);
assign builder_wait = ((builder_shared_stb & builder_shared_cyc) & (~builder_shared_ack));
always @(*) begin
builder_shared_ack <= 1'd0;
builder_error <= 1'd0;
builder_shared_dat_r <= 32'd0;
builder_shared_ack <= 1'd0;
builder_shared_ack <= (main_bus_ack | builder_maccore_wishbone_ack);
builder_shared_dat_r <= (({32{builder_slave_sel_r[0]}} & main_bus_dat_r) | ({32{builder_slave_sel_r[1]}} & builder_maccore_wishbone_dat_r));
if (builder_done) begin
@ -2381,8 +2381,8 @@ always @(*) begin
end
assign builder_csrbank0_bus_errors_r = builder_interface0_bank_bus_dat_w[31:0];
always @(*) begin
builder_csrbank0_bus_errors_we <= 1'd0;
builder_csrbank0_bus_errors_re <= 1'd0;
builder_csrbank0_bus_errors_we <= 1'd0;
if ((builder_csrbank0_sel & (builder_interface0_bank_bus_adr[8:0] == 2'd2))) begin
builder_csrbank0_bus_errors_re <= builder_interface0_bank_bus_we;
builder_csrbank0_bus_errors_we <= (~builder_interface0_bank_bus_we);
@ -2411,8 +2411,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_writer_length_r = builder_interface1_bank_bus_dat_w[31:0];
always @(*) begin
builder_csrbank1_sram_writer_length_we <= 1'd0;
builder_csrbank1_sram_writer_length_re <= 1'd0;
builder_csrbank1_sram_writer_length_we <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 1'd1))) begin
builder_csrbank1_sram_writer_length_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_writer_length_we <= (~builder_interface1_bank_bus_we);
@ -2429,8 +2429,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_writer_ev_status_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank1_sram_writer_ev_status_re <= 1'd0;
builder_csrbank1_sram_writer_ev_status_we <= 1'd0;
builder_csrbank1_sram_writer_ev_status_re <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 2'd3))) begin
builder_csrbank1_sram_writer_ev_status_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_writer_ev_status_we <= (~builder_interface1_bank_bus_we);
@ -2465,8 +2465,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_reader_ready_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank1_sram_reader_ready_re <= 1'd0;
builder_csrbank1_sram_reader_ready_we <= 1'd0;
builder_csrbank1_sram_reader_ready_re <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 3'd7))) begin
builder_csrbank1_sram_reader_ready_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_reader_ready_we <= (~builder_interface1_bank_bus_we);
@ -2483,8 +2483,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_reader_slot0_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank1_sram_reader_slot0_we <= 1'd0;
builder_csrbank1_sram_reader_slot0_re <= 1'd0;
builder_csrbank1_sram_reader_slot0_we <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 4'd9))) begin
builder_csrbank1_sram_reader_slot0_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_reader_slot0_we <= (~builder_interface1_bank_bus_we);
@ -2510,8 +2510,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_reader_ev_pending_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank1_sram_reader_ev_pending_we <= 1'd0;
builder_csrbank1_sram_reader_ev_pending_re <= 1'd0;
builder_csrbank1_sram_reader_ev_pending_we <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 4'd12))) begin
builder_csrbank1_sram_reader_ev_pending_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_reader_ev_pending_we <= (~builder_interface1_bank_bus_we);
@ -2537,8 +2537,8 @@ always @(*) begin
end
assign builder_csrbank1_preamble_errors_r = builder_interface1_bank_bus_dat_w[31:0];
always @(*) begin
builder_csrbank1_preamble_errors_we <= 1'd0;
builder_csrbank1_preamble_errors_re <= 1'd0;
builder_csrbank1_preamble_errors_we <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 4'd15))) begin
builder_csrbank1_preamble_errors_re <= builder_interface1_bank_bus_we;
builder_csrbank1_preamble_errors_we <= (~builder_interface1_bank_bus_we);
@ -2608,8 +2608,8 @@ always @(*) begin
end
assign builder_csrbank2_mdio_r_r = builder_interface2_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank2_mdio_r_we <= 1'd0;
builder_csrbank2_mdio_r_re <= 1'd0;
builder_csrbank2_mdio_r_we <= 1'd0;
if ((builder_csrbank2_sel & (builder_interface2_bank_bus_adr[8:0] == 2'd2))) begin
builder_csrbank2_mdio_r_re <= builder_interface2_bank_bus_we;
builder_csrbank2_mdio_r_we <= (~builder_interface2_bank_bus_we);
@ -3334,275 +3334,95 @@ end
assign main_writer_stat_fifo_wrport_dat_r = memdat_1;
assign main_writer_stat_fifo_rdport_dat_r = storage_3[main_writer_stat_fifo_rdport_adr];

reg [13:0] storage_4[0:1];
reg [13:0] memdat_2;
always @(posedge sys_clk) begin
if (main_reader_cmd_fifo_wrport_we)
storage_4[main_reader_cmd_fifo_wrport_adr] <= main_reader_cmd_fifo_wrport_dat_w;
memdat_2 <= storage_4[main_reader_cmd_fifo_wrport_adr];
end

always @(posedge sys_clk) begin
end

assign main_reader_cmd_fifo_wrport_dat_r = memdat_2;
assign main_reader_cmd_fifo_rdport_dat_r = storage_4[main_reader_cmd_fifo_rdport_adr];

reg [7:0] mem_grain0[0:381];
reg [31:0] mem[0:381];
reg [8:0] memadr_4;
reg [7:0] memdat_3;
reg [31:0] memdat_2;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain0[main_writer_memory0_adr] <= main_writer_memory0_dat_w[7:0];
mem[main_writer_memory0_adr] <= main_writer_memory0_dat_w;
memadr_4 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_3 <= mem_grain0[main_sram0_adr0];
memdat_2 <= mem[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[7:0] = mem_grain0[memadr_4];
assign main_sram0_dat_r0[7:0] = memdat_3;
assign main_writer_memory0_dat_r = mem[memadr_4];
assign main_sram0_dat_r0 = memdat_2;

reg [7:0] mem_grain1[0:381];
reg [31:0] mem_1[0:381];
reg [8:0] memadr_5;
reg [7:0] memdat_4;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain1[main_writer_memory0_adr] <= main_writer_memory0_dat_w[15:8];
memadr_5 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_4 <= mem_grain1[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[15:8] = mem_grain1[memadr_5];
assign main_sram0_dat_r0[15:8] = memdat_4;

reg [7:0] mem_grain2[0:381];
reg [8:0] memadr_6;
reg [7:0] memdat_5;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain2[main_writer_memory0_adr] <= main_writer_memory0_dat_w[23:16];
memadr_6 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_5 <= mem_grain2[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[23:16] = mem_grain2[memadr_6];
assign main_sram0_dat_r0[23:16] = memdat_5;

reg [7:0] mem_grain3[0:381];
reg [8:0] memadr_7;
reg [7:0] memdat_6;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain3[main_writer_memory0_adr] <= main_writer_memory0_dat_w[31:24];
memadr_7 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_6 <= mem_grain3[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[31:24] = mem_grain3[memadr_7];
assign main_sram0_dat_r0[31:24] = memdat_6;

reg [7:0] mem_grain0_1[0:381];
reg [8:0] memadr_8;
reg [7:0] memdat_7;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain0_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[7:0];
memadr_8 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_7 <= mem_grain0_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[7:0] = mem_grain0_1[memadr_8];
assign main_sram1_dat_r0[7:0] = memdat_7;

reg [7:0] mem_grain1_1[0:381];
reg [8:0] memadr_9;
reg [7:0] memdat_8;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain1_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[15:8];
memadr_9 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_8 <= mem_grain1_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[15:8] = mem_grain1_1[memadr_9];
assign main_sram1_dat_r0[15:8] = memdat_8;

reg [7:0] mem_grain2_1[0:381];
reg [8:0] memadr_10;
reg [7:0] memdat_9;
reg [31:0] memdat_3;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain2_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[23:16];
memadr_10 <= main_writer_memory1_adr;
mem_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w;
memadr_5 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_9 <= mem_grain2_1[main_sram1_adr0];
memdat_3 <= mem_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[23:16] = mem_grain2_1[memadr_10];
assign main_sram1_dat_r0[23:16] = memdat_9;
assign main_writer_memory1_dat_r = mem_1[memadr_5];
assign main_sram1_dat_r0 = memdat_3;

reg [7:0] mem_grain3_1[0:381];
reg [8:0] memadr_11;
reg [7:0] memdat_10;
reg [13:0] storage_4[0:1];
reg [13:0] memdat_4;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain3_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[31:24];
memadr_11 <= main_writer_memory1_adr;
if (main_reader_cmd_fifo_wrport_we)
storage_4[main_reader_cmd_fifo_wrport_adr] <= main_reader_cmd_fifo_wrport_dat_w;
memdat_4 <= storage_4[main_reader_cmd_fifo_wrport_adr];
end

always @(posedge sys_clk) begin
memdat_10 <= mem_grain3_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[31:24] = mem_grain3_1[memadr_11];
assign main_sram1_dat_r0[31:24] = memdat_10;
assign main_reader_cmd_fifo_wrport_dat_r = memdat_4;
assign main_reader_cmd_fifo_rdport_dat_r = storage_4[main_reader_cmd_fifo_rdport_adr];

reg [7:0] mem_grain0_2[0:381];
reg [8:0] memadr_12;
reg [8:0] memadr_13;
reg [31:0] mem_2[0:381];
reg [8:0] memadr_6;
reg [8:0] memadr_7;
always @(posedge sys_clk) begin
memadr_12 <= main_reader_memory0_adr;
memadr_6 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[0])
mem_grain0_2[main_sram0_adr1] <= main_sram0_dat_w[7:0];
memadr_13 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[7:0] = mem_grain0_2[memadr_12];
assign main_sram0_dat_r1[7:0] = mem_grain0_2[memadr_13];

reg [7:0] mem_grain1_2[0:381];
reg [8:0] memadr_14;
reg [8:0] memadr_15;
always @(posedge sys_clk) begin
memadr_14 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
mem_2[main_sram0_adr1][7:0] <= main_sram0_dat_w[7:0];
if (main_sram0_we[1])
mem_grain1_2[main_sram0_adr1] <= main_sram0_dat_w[15:8];
memadr_15 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[15:8] = mem_grain1_2[memadr_14];
assign main_sram0_dat_r1[15:8] = mem_grain1_2[memadr_15];

reg [7:0] mem_grain2_2[0:381];
reg [8:0] memadr_16;
reg [8:0] memadr_17;
always @(posedge sys_clk) begin
memadr_16 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
mem_2[main_sram0_adr1][15:8] <= main_sram0_dat_w[15:8];
if (main_sram0_we[2])
mem_grain2_2[main_sram0_adr1] <= main_sram0_dat_w[23:16];
memadr_17 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[23:16] = mem_grain2_2[memadr_16];
assign main_sram0_dat_r1[23:16] = mem_grain2_2[memadr_17];

reg [7:0] mem_grain3_2[0:381];
reg [8:0] memadr_18;
reg [8:0] memadr_19;
always @(posedge sys_clk) begin
memadr_18 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
mem_2[main_sram0_adr1][23:16] <= main_sram0_dat_w[23:16];
if (main_sram0_we[3])
mem_grain3_2[main_sram0_adr1] <= main_sram0_dat_w[31:24];
memadr_19 <= main_sram0_adr1;
mem_2[main_sram0_adr1][31:24] <= main_sram0_dat_w[31:24];
memadr_7 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[31:24] = mem_grain3_2[memadr_18];
assign main_sram0_dat_r1[31:24] = mem_grain3_2[memadr_19];
assign main_reader_memory0_dat_r = mem_2[memadr_6];
assign main_sram0_dat_r1 = mem_2[memadr_7];

reg [7:0] mem_grain0_3[0:381];
reg [8:0] memadr_20;
reg [8:0] memadr_21;
reg [31:0] mem_3[0:381];
reg [8:0] memadr_8;
reg [8:0] memadr_9;
always @(posedge sys_clk) begin
memadr_20 <= main_reader_memory1_adr;
memadr_8 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[0])
mem_grain0_3[main_sram1_adr1] <= main_sram1_dat_w[7:0];
memadr_21 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[7:0] = mem_grain0_3[memadr_20];
assign main_sram1_dat_r1[7:0] = mem_grain0_3[memadr_21];

reg [7:0] mem_grain1_3[0:381];
reg [8:0] memadr_22;
reg [8:0] memadr_23;
always @(posedge sys_clk) begin
memadr_22 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
mem_3[main_sram1_adr1][7:0] <= main_sram1_dat_w[7:0];
if (main_sram1_we[1])
mem_grain1_3[main_sram1_adr1] <= main_sram1_dat_w[15:8];
memadr_23 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[15:8] = mem_grain1_3[memadr_22];
assign main_sram1_dat_r1[15:8] = mem_grain1_3[memadr_23];

reg [7:0] mem_grain2_3[0:381];
reg [8:0] memadr_24;
reg [8:0] memadr_25;
always @(posedge sys_clk) begin
memadr_24 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
mem_3[main_sram1_adr1][15:8] <= main_sram1_dat_w[15:8];
if (main_sram1_we[2])
mem_grain2_3[main_sram1_adr1] <= main_sram1_dat_w[23:16];
memadr_25 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[23:16] = mem_grain2_3[memadr_24];
assign main_sram1_dat_r1[23:16] = mem_grain2_3[memadr_25];

reg [7:0] mem_grain3_3[0:381];
reg [8:0] memadr_26;
reg [8:0] memadr_27;
always @(posedge sys_clk) begin
memadr_26 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
mem_3[main_sram1_adr1][23:16] <= main_sram1_dat_w[23:16];
if (main_sram1_we[3])
mem_grain3_3[main_sram1_adr1] <= main_sram1_dat_w[31:24];
memadr_27 <= main_sram1_adr1;
mem_3[main_sram1_adr1][31:24] <= main_sram1_dat_w[31:24];
memadr_9 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[31:24] = mem_grain3_3[memadr_26];
assign main_sram1_dat_r1[31:24] = mem_grain3_3[memadr_27];
assign main_reader_memory1_dat_r = mem_3[memadr_8];
assign main_sram1_dat_r1 = mem_3[memadr_9];

(* ars_ff1 = "true", async_reg = "true" *) FDPE #(
.INIT(1'd1)

@ -1,5 +1,5 @@
//--------------------------------------------------------------------------------
// Auto-generated by Migen (35203d6) & LiteX (--------) on 2021-08-09 13:54:49
// Auto-generated by Migen (a5bc262) & LiteX (de028765) on 2021-09-24 12:37:01
//--------------------------------------------------------------------------------
module liteeth_core(
input wire sys_clock,
@ -39,16 +39,16 @@ wire main_maccore_maccore_bus_errors_we;
reg main_maccore_maccore_bus_errors_re = 1'd0;
wire main_maccore_maccore_bus_error;
reg [31:0] main_maccore_maccore_bus_errors = 32'd0;
wire sys_clk;
(* dont_touch = "true" *) wire sys_clk;
wire sys_rst;
wire por_clk;
reg main_maccore_int_rst = 1'd1;
reg main_maccore_ethphy_reset_storage = 1'd0;
reg main_maccore_ethphy_reset_re = 1'd0;
wire eth_rx_clk;
(* dont_touch = "true" *) wire eth_rx_clk;
wire eth_rx_rst;
wire main_maccore_ethphy_eth_rx_clk_ibuf;
wire eth_tx_clk;
(* dont_touch = "true" *) wire eth_tx_clk;
wire eth_tx_rst;
wire eth_tx_delayed_clk;
reg main_maccore_ethphy_reset0 = 1'd0;
@ -1051,6 +1051,7 @@ assign main_sink_payload_error = main_rx_cdc_source_source_payload_error;
assign main_ps_preamble_error_i = main_preamble_checker_error;
assign main_ps_crc_error_i = main_liteethmaccrc32checker_error;
always @(*) begin
main_tx_gap_inserter_source_payload_last_be <= 1'd0;
main_tx_gap_inserter_source_payload_error <= 1'd0;
main_tx_gap_inserter_sink_ready <= 1'd0;
main_tx_gap_inserter_source_valid <= 1'd0;
@ -1060,7 +1061,6 @@ always @(*) begin
main_tx_gap_inserter_counter_liteethmacgap_next_value_ce <= 1'd0;
main_tx_gap_inserter_source_last <= 1'd0;
main_tx_gap_inserter_source_payload_data <= 8'd0;
main_tx_gap_inserter_source_payload_last_be <= 1'd0;
builder_liteethmacgap_next_state <= builder_liteethmacgap_state;
case (builder_liteethmacgap_state)
1'd1: begin
@ -1163,8 +1163,8 @@ assign main_preamble_checker_source_payload_last_be = main_preamble_checker_sink
always @(*) begin
main_preamble_checker_source_payload_error <= 1'd0;
main_preamble_checker_error <= 1'd0;
main_preamble_checker_sink_ready <= 1'd0;
main_preamble_checker_source_valid <= 1'd0;
main_preamble_checker_sink_ready <= 1'd0;
builder_liteethmacpreamblechecker_next_state <= 1'd0;
main_preamble_checker_source_first <= 1'd0;
main_preamble_checker_source_last <= 1'd0;
@ -1445,6 +1445,7 @@ assign main_ps_preamble_error_o = (main_ps_preamble_error_toggle_o ^ main_ps_pre
assign main_ps_crc_error_o = (main_ps_crc_error_toggle_o ^ main_ps_crc_error_toggle_o_r);
assign main_padding_inserter_counter_done = (main_padding_inserter_counter >= 6'd59);
always @(*) begin
builder_liteethmacpaddinginserter_next_state <= 1'd0;
main_padding_inserter_counter_liteethmacpaddinginserter_next_value <= 16'd0;
main_padding_inserter_counter_liteethmacpaddinginserter_next_value_ce <= 1'd0;
main_padding_inserter_sink_ready <= 1'd0;
@ -1454,7 +1455,6 @@ always @(*) begin
main_padding_inserter_source_payload_data <= 8'd0;
main_padding_inserter_source_payload_last_be <= 1'd0;
main_padding_inserter_source_payload_error <= 1'd0;
builder_liteethmacpaddinginserter_next_state <= 1'd0;
builder_liteethmacpaddinginserter_next_state <= builder_liteethmacpaddinginserter_state;
case (builder_liteethmacpaddinginserter_state)
1'd1: begin
@ -1945,8 +1945,8 @@ always @(*) begin
builder_liteethmacsramwriter_next_state <= 3'd0;
main_writer_slot_ce <= 1'd0;
main_writer_counter_t_next_value <= 32'd0;
main_writer_start <= 1'd0;
main_writer_counter_t_next_value_ce <= 1'd0;
main_writer_start <= 1'd0;
main_writer_ongoing <= 1'd0;
main_writer_errors_status_f_next_value <= 32'd0;
main_writer_stat_fifo_sink_valid <= 1'd0;
@ -2191,8 +2191,8 @@ assign main_bus_dat_r = (((({32{main_slave_sel_r[0]}} & main_sram0_bus_dat_r0) |
always @(*) begin
builder_maccore_adr <= 14'd0;
builder_maccore_we <= 1'd0;
builder_maccore_wishbone_ack <= 1'd0;
builder_maccore_dat_w <= 32'd0;
builder_maccore_wishbone_ack <= 1'd0;
builder_next_state <= 1'd0;
builder_maccore_wishbone_dat_r <= 32'd0;
builder_next_state <= builder_state;
@ -2250,8 +2250,8 @@ assign builder_shared_err = (main_bus_err | builder_maccore_wishbone_err);
assign builder_wait = ((builder_shared_stb & builder_shared_cyc) & (~builder_shared_ack));
always @(*) begin
builder_shared_ack <= 1'd0;
builder_shared_dat_r <= 32'd0;
builder_error <= 1'd0;
builder_shared_dat_r <= 32'd0;
builder_shared_ack <= (main_bus_ack | builder_maccore_wishbone_ack);
builder_shared_dat_r <= (({32{builder_slave_sel_r[0]}} & main_bus_dat_r) | ({32{builder_slave_sel_r[1]}} & builder_maccore_wishbone_dat_r));
if (builder_done) begin
@ -2273,8 +2273,8 @@ always @(*) begin
end
assign builder_csrbank0_scratch0_r = builder_interface0_bank_bus_dat_w[31:0];
always @(*) begin
builder_csrbank0_scratch0_we <= 1'd0;
builder_csrbank0_scratch0_re <= 1'd0;
builder_csrbank0_scratch0_we <= 1'd0;
if ((builder_csrbank0_sel & (builder_interface0_bank_bus_adr[8:0] == 1'd1))) begin
builder_csrbank0_scratch0_re <= builder_interface0_bank_bus_we;
builder_csrbank0_scratch0_we <= (~builder_interface0_bank_bus_we);
@ -2348,8 +2348,8 @@ always @(*) begin
end
assign builder_csrbank1_sram_writer_ev_enable0_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
builder_csrbank1_sram_writer_ev_enable0_re <= 1'd0;
builder_csrbank1_sram_writer_ev_enable0_we <= 1'd0;
builder_csrbank1_sram_writer_ev_enable0_re <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 3'd5))) begin
builder_csrbank1_sram_writer_ev_enable0_re <= builder_interface1_bank_bus_we;
builder_csrbank1_sram_writer_ev_enable0_we <= (~builder_interface1_bank_bus_we);
@ -2357,8 +2357,8 @@ always @(*) begin
end
assign main_reader_start_start_r = builder_interface1_bank_bus_dat_w[0];
always @(*) begin
main_reader_start_start_re <= 1'd0;
main_reader_start_start_we <= 1'd0;
main_reader_start_start_re <= 1'd0;
if ((builder_csrbank1_sel & (builder_interface1_bank_bus_adr[8:0] == 3'd6))) begin
main_reader_start_start_re <= builder_interface1_bank_bus_we;
main_reader_start_start_we <= (~builder_interface1_bank_bus_we);
@ -2500,8 +2500,8 @@ always @(*) begin
end
assign builder_csrbank2_mdio_w0_r = builder_interface2_bank_bus_dat_w[2:0];
always @(*) begin
builder_csrbank2_mdio_w0_we <= 1'd0;
builder_csrbank2_mdio_w0_re <= 1'd0;
builder_csrbank2_mdio_w0_we <= 1'd0;
if ((builder_csrbank2_sel & (builder_interface2_bank_bus_adr[8:0] == 1'd1))) begin
builder_csrbank2_mdio_w0_re <= builder_interface2_bank_bus_we;
builder_csrbank2_mdio_w0_we <= (~builder_interface2_bank_bus_we);
@ -3451,20 +3451,96 @@ end
assign main_writer_stat_fifo_wrport_dat_r = memdat_1;
assign main_writer_stat_fifo_rdport_dat_r = storage_3[main_writer_stat_fifo_rdport_adr];

reg [31:0] mem[0:381];
reg [8:0] memadr_4;
reg [31:0] memdat_2;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem[main_writer_memory0_adr] <= main_writer_memory0_dat_w;
memadr_4 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_2 <= mem[main_sram0_adr0];
end

assign main_writer_memory0_dat_r = mem[memadr_4];
assign main_sram0_dat_r0 = memdat_2;

reg [31:0] mem_1[0:381];
reg [8:0] memadr_5;
reg [31:0] memdat_3;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w;
memadr_5 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_3 <= mem_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r = mem_1[memadr_5];
assign main_sram1_dat_r0 = memdat_3;

reg [13:0] storage_4[0:1];
reg [13:0] memdat_2;
reg [13:0] memdat_4;
always @(posedge sys_clk) begin
if (main_reader_cmd_fifo_wrport_we)
storage_4[main_reader_cmd_fifo_wrport_adr] <= main_reader_cmd_fifo_wrport_dat_w;
memdat_2 <= storage_4[main_reader_cmd_fifo_wrport_adr];
memdat_4 <= storage_4[main_reader_cmd_fifo_wrport_adr];
end

always @(posedge sys_clk) begin
end

assign main_reader_cmd_fifo_wrport_dat_r = memdat_2;
assign main_reader_cmd_fifo_wrport_dat_r = memdat_4;
assign main_reader_cmd_fifo_rdport_dat_r = storage_4[main_reader_cmd_fifo_rdport_adr];

reg [31:0] mem_2[0:381];
reg [8:0] memadr_6;
reg [8:0] memadr_7;
always @(posedge sys_clk) begin
memadr_6 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[0])
mem_2[main_sram0_adr1][7:0] <= main_sram0_dat_w[7:0];
if (main_sram0_we[1])
mem_2[main_sram0_adr1][15:8] <= main_sram0_dat_w[15:8];
if (main_sram0_we[2])
mem_2[main_sram0_adr1][23:16] <= main_sram0_dat_w[23:16];
if (main_sram0_we[3])
mem_2[main_sram0_adr1][31:24] <= main_sram0_dat_w[31:24];
memadr_7 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r = mem_2[memadr_6];
assign main_sram0_dat_r1 = mem_2[memadr_7];

reg [31:0] mem_3[0:381];
reg [8:0] memadr_8;
reg [8:0] memadr_9;
always @(posedge sys_clk) begin
memadr_8 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[0])
mem_3[main_sram1_adr1][7:0] <= main_sram1_dat_w[7:0];
if (main_sram1_we[1])
mem_3[main_sram1_adr1][15:8] <= main_sram1_dat_w[15:8];
if (main_sram1_we[2])
mem_3[main_sram1_adr1][23:16] <= main_sram1_dat_w[23:16];
if (main_sram1_we[3])
mem_3[main_sram1_adr1][31:24] <= main_sram1_dat_w[31:24];
memadr_9 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r = mem_3[memadr_8];
assign main_sram1_dat_r1 = mem_3[memadr_9];

FD FD(
.C(main_maccore_ethphy_clkin),
.D(main_maccore_ethphy_reset0),
@ -3534,262 +3610,6 @@ PLLE2_ADV #(
.LOCKED(main_maccore_ethphy_locked)
);

reg [7:0] mem_grain0[0:381];
reg [8:0] memadr_4;
reg [7:0] memdat_3;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain0[main_writer_memory0_adr] <= main_writer_memory0_dat_w[7:0];
memadr_4 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_3 <= mem_grain0[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[7:0] = mem_grain0[memadr_4];
assign main_sram0_dat_r0[7:0] = memdat_3;

reg [7:0] mem_grain1[0:381];
reg [8:0] memadr_5;
reg [7:0] memdat_4;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain1[main_writer_memory0_adr] <= main_writer_memory0_dat_w[15:8];
memadr_5 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_4 <= mem_grain1[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[15:8] = mem_grain1[memadr_5];
assign main_sram0_dat_r0[15:8] = memdat_4;

reg [7:0] mem_grain2[0:381];
reg [8:0] memadr_6;
reg [7:0] memdat_5;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain2[main_writer_memory0_adr] <= main_writer_memory0_dat_w[23:16];
memadr_6 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_5 <= mem_grain2[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[23:16] = mem_grain2[memadr_6];
assign main_sram0_dat_r0[23:16] = memdat_5;

reg [7:0] mem_grain3[0:381];
reg [8:0] memadr_7;
reg [7:0] memdat_6;
always @(posedge sys_clk) begin
if (main_writer_memory0_we)
mem_grain3[main_writer_memory0_adr] <= main_writer_memory0_dat_w[31:24];
memadr_7 <= main_writer_memory0_adr;
end

always @(posedge sys_clk) begin
memdat_6 <= mem_grain3[main_sram0_adr0];
end

assign main_writer_memory0_dat_r[31:24] = mem_grain3[memadr_7];
assign main_sram0_dat_r0[31:24] = memdat_6;

reg [7:0] mem_grain0_1[0:381];
reg [8:0] memadr_8;
reg [7:0] memdat_7;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain0_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[7:0];
memadr_8 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_7 <= mem_grain0_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[7:0] = mem_grain0_1[memadr_8];
assign main_sram1_dat_r0[7:0] = memdat_7;

reg [7:0] mem_grain1_1[0:381];
reg [8:0] memadr_9;
reg [7:0] memdat_8;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain1_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[15:8];
memadr_9 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_8 <= mem_grain1_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[15:8] = mem_grain1_1[memadr_9];
assign main_sram1_dat_r0[15:8] = memdat_8;

reg [7:0] mem_grain2_1[0:381];
reg [8:0] memadr_10;
reg [7:0] memdat_9;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain2_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[23:16];
memadr_10 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_9 <= mem_grain2_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[23:16] = mem_grain2_1[memadr_10];
assign main_sram1_dat_r0[23:16] = memdat_9;

reg [7:0] mem_grain3_1[0:381];
reg [8:0] memadr_11;
reg [7:0] memdat_10;
always @(posedge sys_clk) begin
if (main_writer_memory1_we)
mem_grain3_1[main_writer_memory1_adr] <= main_writer_memory1_dat_w[31:24];
memadr_11 <= main_writer_memory1_adr;
end

always @(posedge sys_clk) begin
memdat_10 <= mem_grain3_1[main_sram1_adr0];
end

assign main_writer_memory1_dat_r[31:24] = mem_grain3_1[memadr_11];
assign main_sram1_dat_r0[31:24] = memdat_10;

reg [7:0] mem_grain0_2[0:381];
reg [8:0] memadr_12;
reg [8:0] memadr_13;
always @(posedge sys_clk) begin
memadr_12 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[0])
mem_grain0_2[main_sram0_adr1] <= main_sram0_dat_w[7:0];
memadr_13 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[7:0] = mem_grain0_2[memadr_12];
assign main_sram0_dat_r1[7:0] = mem_grain0_2[memadr_13];

reg [7:0] mem_grain1_2[0:381];
reg [8:0] memadr_14;
reg [8:0] memadr_15;
always @(posedge sys_clk) begin
memadr_14 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[1])
mem_grain1_2[main_sram0_adr1] <= main_sram0_dat_w[15:8];
memadr_15 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[15:8] = mem_grain1_2[memadr_14];
assign main_sram0_dat_r1[15:8] = mem_grain1_2[memadr_15];

reg [7:0] mem_grain2_2[0:381];
reg [8:0] memadr_16;
reg [8:0] memadr_17;
always @(posedge sys_clk) begin
memadr_16 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[2])
mem_grain2_2[main_sram0_adr1] <= main_sram0_dat_w[23:16];
memadr_17 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[23:16] = mem_grain2_2[memadr_16];
assign main_sram0_dat_r1[23:16] = mem_grain2_2[memadr_17];

reg [7:0] mem_grain3_2[0:381];
reg [8:0] memadr_18;
reg [8:0] memadr_19;
always @(posedge sys_clk) begin
memadr_18 <= main_reader_memory0_adr;
end

always @(posedge sys_clk) begin
if (main_sram0_we[3])
mem_grain3_2[main_sram0_adr1] <= main_sram0_dat_w[31:24];
memadr_19 <= main_sram0_adr1;
end

assign main_reader_memory0_dat_r[31:24] = mem_grain3_2[memadr_18];
assign main_sram0_dat_r1[31:24] = mem_grain3_2[memadr_19];

reg [7:0] mem_grain0_3[0:381];
reg [8:0] memadr_20;
reg [8:0] memadr_21;
always @(posedge sys_clk) begin
memadr_20 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[0])
mem_grain0_3[main_sram1_adr1] <= main_sram1_dat_w[7:0];
memadr_21 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[7:0] = mem_grain0_3[memadr_20];
assign main_sram1_dat_r1[7:0] = mem_grain0_3[memadr_21];

reg [7:0] mem_grain1_3[0:381];
reg [8:0] memadr_22;
reg [8:0] memadr_23;
always @(posedge sys_clk) begin
memadr_22 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[1])
mem_grain1_3[main_sram1_adr1] <= main_sram1_dat_w[15:8];
memadr_23 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[15:8] = mem_grain1_3[memadr_22];
assign main_sram1_dat_r1[15:8] = mem_grain1_3[memadr_23];

reg [7:0] mem_grain2_3[0:381];
reg [8:0] memadr_24;
reg [8:0] memadr_25;
always @(posedge sys_clk) begin
memadr_24 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[2])
mem_grain2_3[main_sram1_adr1] <= main_sram1_dat_w[23:16];
memadr_25 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[23:16] = mem_grain2_3[memadr_24];
assign main_sram1_dat_r1[23:16] = mem_grain2_3[memadr_25];

reg [7:0] mem_grain3_3[0:381];
reg [8:0] memadr_26;
reg [8:0] memadr_27;
always @(posedge sys_clk) begin
memadr_26 <= main_reader_memory1_adr;
end

always @(posedge sys_clk) begin
if (main_sram1_we[3])
mem_grain3_3[main_sram1_adr1] <= main_sram1_dat_w[31:24];
memadr_27 <= main_sram1_adr1;
end

assign main_reader_memory1_dat_r[31:24] = mem_grain3_3[memadr_26];
assign main_sram1_dat_r1[31:24] = mem_grain3_3[memadr_27];

(* ars_ff1 = "true", async_reg = "true" *) FDPE #(
.INIT(1'd1)
) FDPE (

File diff suppressed because it is too large Load Diff

@ -1,6 +1,7 @@
#!/bin/bash

VENDORS="xilinx"
# vendor:sysclk
VENDORS="xilinx:100 lattice:48"

ME=$(realpath $0)
echo ME=$ME
@ -13,8 +14,10 @@ mkdir -p $BUILD_PATH
GEN_PATH=$PARENT_PATH/generated
mkdir -p $GEN_PATH

for i in $VENDORS
for i_clk in $VENDORS
do
i=$(echo $i_clk | cut -d : -f 1)
clk=$(echo $i_clk | cut -d : -f 2)
TARGET_BUILD_PATH=$BUILD_PATH/$i
TARGET_GEN_PATH=$GEN_PATH/$i
rm -rf $TARGET_BUILD_PATH
@ -22,8 +25,8 @@ do
mkdir -p $TARGET_BUILD_PATH
mkdir -p $TARGET_GEN_PATH

echo "Generating $i in $TARGET_BUILD_PATH"
(cd $TARGET_BUILD_PATH && litesdcard_gen --vendor $i)
echo "Generating $i in $TARGET_BUILD_PATH"
(cd $TARGET_BUILD_PATH && litesdcard_gen --vendor $i --clk-freq $clk)

cp $TARGET_BUILD_PATH/build/gateware/litesdcard_core.v $TARGET_GEN_PATH/
done

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

@ -33,6 +33,8 @@ entity loadstore1 is

dc_stall : in std_ulogic;

events : out Loadstore1EventType;

log_out : out std_ulogic_vector(9 downto 0)
);
end loadstore1;
@ -66,7 +68,6 @@ architecture behave of loadstore1 is
noop : std_ulogic;
mode_32bit : std_ulogic;
addr : std_ulogic_vector(63 downto 0);
addr0 : std_ulogic_vector(63 downto 0);
byte_sel : std_ulogic_vector(7 downto 0);
second_bytes : std_ulogic_vector(7 downto 0);
store_data : std_ulogic_vector(63 downto 0);
@ -97,7 +98,7 @@ architecture behave of loadstore1 is
constant request_init : request_t := (valid => '0', dc_req => '0', load => '0', store => '0', tlbie => '0',
dcbz => '0', read_spr => '0', write_spr => '0', mmu_op => '0',
instr_fault => '0', load_zero => '0', do_update => '0', noop => '0',
mode_32bit => '0', addr => (others => '0'), addr0 => (others => '0'),
mode_32bit => '0', addr => (others => '0'),
byte_sel => x"00", second_bytes => x"00",
store_data => (others => '0'), instr_tag => instr_tag_init,
write_reg => 7x"00", length => x"0",
@ -113,6 +114,7 @@ architecture behave of loadstore1 is
type reg_stage1_t is record
req : request_t;
issued : std_ulogic;
addr0 : std_ulogic_vector(63 downto 0);
end record;

type reg_stage2_t is record
@ -123,6 +125,7 @@ architecture behave of loadstore1 is
wait_mmu : std_ulogic;
one_cycle : std_ulogic;
wr_sel : std_ulogic_vector(1 downto 0);
addr0 : std_ulogic_vector(63 downto 0);
end record;

type reg_stage3_t is record
@ -146,6 +149,7 @@ architecture behave of loadstore1 is
intr_vec : integer range 0 to 16#fff#;
nia : std_ulogic_vector(63 downto 0);
srr1 : std_ulogic_vector(15 downto 0);
events : Loadstore1EventType;
end record;

signal req_in : request_t;
@ -271,10 +275,27 @@ begin
if rising_edge(clk) then
if rst = '1' then
r1.req.valid <= '0';
r1.req.tlbie <= '0';
r1.req.is_slbia <= '0';
r1.req.instr_fault <= '0';
r1.req.load <= '0';
r1.req.priv_mode <= '0';
r1.req.sprn <= (others => '0');
r1.req.xerc <= xerc_init;

r2.req.valid <= '0';
r2.req.tlbie <= '0';
r2.req.is_slbia <= '0';
r2.req.instr_fault <= '0';
r2.req.load <= '0';
r2.req.priv_mode <= '0';
r2.req.sprn <= (others => '0');
r2.req.xerc <= xerc_init;

r2.wait_dc <= '0';
r2.wait_mmu <= '0';
r2.one_cycle <= '0';

r3.dar <= (others => '0');
r3.dsisr <= (others => '0');
r3.state <= IDLE;
@ -282,6 +303,8 @@ begin
r3.interrupt <= '0';
r3.stage1_en <= '1';
r3.convert_lfs <= '0';
r3.events.load_complete <= '0';
r3.events.store_complete <= '0';
flushing <= '0';
else
r1 <= r1in;
@ -358,7 +381,7 @@ begin

-- Translate a load/store instruction into the internal request format
-- XXX this should only depend on l_in, but actually depends on
-- r1.req.addr0 as well (in the l_in.second = 1 case).
-- r1.addr0 as well (in the l_in.second = 1 case).
loadstore1_in: process(all)
variable v : request_t;
variable lsu_sum : std_ulogic_vector(63 downto 0);
@ -399,23 +422,21 @@ begin
end if;

addr := lsu_sum;

if l_in.second = '1' then
if l_in.update = '0' then
-- for the second half of a 16-byte transfer,
-- use the previous address plus 8.
addr := std_ulogic_vector(unsigned(r1.req.addr0(63 downto 3)) + 1) & r1.req.addr0(2 downto 0);
addr := std_ulogic_vector(unsigned(r1.addr0(63 downto 3)) + 1) & r1.addr0(2 downto 0);
else
-- for an update-form load, use the previous address
-- as the value to write back to RA.
addr := r1.req.addr0;
addr := r1.addr0;
end if;
end if;
if l_in.mode_32bit = '1' then
addr(63 downto 32) := (others => '0');
end if;
v.addr := addr;
v.addr0 := addr;

-- XXX Temporary hack. Mark the op as non-cachable if the address
-- is the form 0xc------- for a real-mode access.
@ -513,28 +534,31 @@ begin
variable v : reg_stage1_t;
variable req : request_t;
variable dcreq : std_ulogic;
variable addr : std_ulogic_vector(63 downto 0);
variable issue : std_ulogic;
begin
v := r1;
dcreq := '0';
req := req_in;
if flushing = '1' then
-- Make this a no-op request rather than simply invalid.
-- It will never get to stage 3 since there is a request ahead of
-- it with align_intr = 1.
req.dc_req := '0';
issue := '0';

if busy = '0' then
req := req_in;
v.issued := '0';
if flushing = '1' then
-- Make this a no-op request rather than simply invalid.
-- It will never get to stage 3 since there is a request ahead of
-- it with align_intr = 1.
req.dc_req := '0';
end if;
issue := l_in.valid and req.dc_req;
if l_in.valid = '1' then
v.addr0 := req.addr;
end if;
else
req := r1.req;
end if;

-- Note that l_in.valid is gated with busy inside execute1
if l_in.valid = '1' then
dcreq := req.dc_req and stage1_issue_enable and not d_in.error and not dc_stall;
v.req := req;
v.issued := dcreq;
elsif r1.req.valid = '1' then
if r1.req.valid = '1' then
if r1.req.dc_req = '1' and r1.issued = '0' then
req := r1.req;
dcreq := stage1_issue_enable and not dc_stall and not d_in.error;
v.issued := dcreq;
issue := '1';
elsif r1.issued = '1' and d_in.error = '1' then
v.issued := '0';
elsif stage2_busy_next = '0' then
@ -542,23 +566,25 @@ begin
-- in r1 will go into r2
if r1.req.dc_req = '1' and r1.req.two_dwords = '1' and r1.req.dword_index = '0' then
-- construct the second request for a misaligned access
v.req.dword_index := '1';
v.req.addr := std_ulogic_vector(unsigned(r1.req.addr(63 downto 3)) + 1) & "000";
req.dword_index := '1';
req.addr := std_ulogic_vector(unsigned(r1.req.addr(63 downto 3)) + 1) & "000";
if r1.req.mode_32bit = '1' then
v.req.addr(32) := '0';
req.addr(32) := '0';
end if;
v.req.byte_sel := r1.req.second_bytes;
v.issued := stage1_issue_enable and not dc_stall;
dcreq := stage1_issue_enable and not dc_stall;
req := v.req;
else
v.req.valid := '0';
req.byte_sel := r1.req.second_bytes;
issue := '1';
end if;
end if;
end if;
if r3in.interrupt = '1' then
v.req.valid := '0';
dcreq := '0';
req.valid := '0';
issue := '0';
end if;

v.req := req;
dcreq := issue and stage1_issue_enable and not d_in.error and not dc_stall;
if issue = '1' then
v.issued := dcreq;
end if;

stage1_req <= req;
@ -581,7 +607,7 @@ begin

-- Byte reversing and rotating for stores.
-- Done in the second cycle (the cycle after l_in.valid = 1).
byte_offset := unsigned(r1.req.addr0(2 downto 0));
byte_offset := unsigned(r1.addr0(2 downto 0));
for i in 0 to 7 loop
k := (to_unsigned(i, 3) - byte_offset) xor r1.req.brev_mask;
j := to_integer(k) * 8;
@ -591,6 +617,7 @@ begin
if stage3_busy_next = '0' and
(r1.req.valid = '0' or r1.issued = '1' or r1.req.dc_req = '0') then
v.req := r1.req;
v.addr0 := r1.addr0;
v.req.store_data := store_data;
v.wait_dc := r1.req.valid and r1.req.dc_req and not r1.req.load_sp and
not (r1.req.two_dwords and not r1.req.dword_index);
@ -668,6 +695,7 @@ begin
do_update := '0';
v.convert_lfs := '0';
v.srr1 := (others => '0');
v.events := (others => '0');

-- load data formatting
-- shift and byte-reverse data bytes
@ -796,6 +824,7 @@ begin
mmu_mtspr := r2.req.write_spr;
if r2.req.instr_fault = '1' then
v.state := MMU_LOOKUP;
v.events.itlb_miss := '1';
else
v.state := TLBIE_WAIT;
end if;
@ -838,6 +867,9 @@ begin
v.state := IDLE;
end if;

v.events.load_complete := r2.req.load and complete;
v.events.store_complete := (r2.req.store or r2.req.dcbz) and complete;

-- generate DSI or DSegI for load/store exceptions
-- or ISI or ISegI for instruction fetch exceptions
v.interrupt := exception;
@ -873,7 +905,7 @@ begin
write_data := sprval;
when "01" =>
-- update reg
write_data := r2.req.addr0;
write_data := r2.addr0;
when "10" =>
-- lfs result
write_data := load_dp_data;
@ -946,6 +978,8 @@ begin
e_out.in_progress <= in_progress;
e_out.interrupt <= r3.interrupt;

events <= r3.events;

-- Busy calculation.
stage3_busy_next <= r2.req.valid and not (complete or part_done or exception);


@ -20,20 +20,7 @@ end entity logical;

architecture behaviour of logical is

subtype twobit is unsigned(1 downto 0);
type twobit32 is array(0 to 31) of twobit;
signal pc2 : twobit32;
subtype threebit is unsigned(2 downto 0);
type threebit16 is array(0 to 15) of threebit;
signal pc4 : threebit16;
subtype fourbit is unsigned(3 downto 0);
type fourbit8 is array(0 to 7) of fourbit;
signal pc8 : fourbit8;
subtype sixbit is unsigned(5 downto 0);
type sixbit2 is array(0 to 1) of sixbit;
signal pc32 : sixbit2;
signal par0, par1 : std_ulogic;
signal popcnt : std_ulogic_vector(63 downto 0);
signal parity : std_ulogic_vector(63 downto 0);
signal permute : std_ulogic_vector(7 downto 0);

@ -109,35 +96,6 @@ begin
variable negative : std_ulogic;
variable j : integer;
begin
-- population counts
for i in 0 to 31 loop
pc2(i) <= unsigned("0" & rs(i * 2 downto i * 2)) + unsigned("0" & rs(i * 2 + 1 downto i * 2 + 1));
end loop;
for i in 0 to 15 loop
pc4(i) <= ('0' & pc2(i * 2)) + ('0' & pc2(i * 2 + 1));
end loop;
for i in 0 to 7 loop
pc8(i) <= ('0' & pc4(i * 2)) + ('0' & pc4(i * 2 + 1));
end loop;
for i in 0 to 1 loop
pc32(i) <= ("00" & pc8(i * 4)) + ("00" & pc8(i * 4 + 1)) +
("00" & pc8(i * 4 + 2)) + ("00" & pc8(i * 4 + 3));
end loop;
popcnt <= (others => '0');
if datalen(3 downto 2) = "00" then
-- popcntb
for i in 0 to 7 loop
popcnt(i * 8 + 3 downto i * 8) <= std_ulogic_vector(pc8(i));
end loop;
elsif datalen(3) = '0' then
-- popcntw
for i in 0 to 1 loop
popcnt(i * 32 + 5 downto i * 32) <= std_ulogic_vector(pc32(i));
end loop;
else
popcnt(6 downto 0) <= std_ulogic_vector(('0' & pc32(0)) + ('0' & pc32(1)));
end if;

-- parity calculations
par0 <= rs(0) xor rs(8) xor rs(16) xor rs(24);
par1 <= rs(32) xor rs(40) xor rs(48) xor rs(56);
@ -153,7 +111,7 @@ begin
for i in 0 to 7 loop
j := i * 8;
if rs(j+7 downto j+6) = "00" then
permute(i) <= rb(to_integer(unsigned(rs(j+5 downto j))));
permute(i) <= rb(to_integer(unsigned(not rs(j+5 downto j))));
else
permute(i) <= '0';
end if;
@ -178,8 +136,6 @@ begin
tmp := not tmp;
end if;

when OP_POPCNT =>
tmp := popcnt;
when OP_PRTY =>
tmp := parity;
when OP_CMPB =>

@ -18,7 +18,7 @@ filesets:
- ppc_fx_insns.vhdl
- sim_console.vhdl
- logical.vhdl
- countzero.vhdl
- countbits.vhdl
- control.vhdl
- execute1.vhdl
- fpu.vhdl
@ -27,6 +27,7 @@ filesets:
- dcache.vhdl
- divider.vhdl
- rotator.vhdl
- pmu.vhdl
- writeback.vhdl
- insn_helpers.vhdl
- core.vhdl
@ -105,6 +106,12 @@ filesets:
- fpga/clk_gen_plle2.vhd : {file_type : vhdlSource-2008}
- fpga/top-arty.vhdl : {file_type : vhdlSource-2008}

wukong-v2:
files:
- fpga/wukong-v2.xdc : {file_type : xdc}
- fpga/clk_gen_plle2.vhd : {file_type : vhdlSource-2008}
- fpga/top-wukong-v2.vhdl : {file_type : vhdlSource-2008}

cmod_a7-35:
files:
- fpga/cmod_a7-35.xdc : {file_type : xdc}
@ -137,6 +144,7 @@ targets:
- uart_is_16550
- has_fpu
- has_btc
- has_short_mult
tools:
vivado: {part : xc7a100tcsg324-1}
toplevel : toplevel
@ -242,6 +250,7 @@ targets:
- uart_is_16550
- has_fpu
- has_btc
- has_short_mult
generate: [litedram_nexys_video, liteeth_nexys_video, litesdcard_nexys_video]
tools:
vivado: {part : xc7a200tsbg484-1}
@ -262,6 +271,7 @@ targets:
- has_uart1
- has_fpu=false
- has_btc=false
- has_short_mult
- use_litesdcard
tools:
vivado: {part : xc7a35ticsg324-1L}
@ -284,6 +294,7 @@ targets:
- has_uart1
- has_fpu=false
- has_btc=false
- has_short_mult
generate: [litedram_arty, liteeth_arty, litesdcard_arty]
tools:
vivado: {part : xc7a35ticsg324-1L}
@ -304,6 +315,7 @@ targets:
- has_uart1
- has_fpu
- has_btc
- has_short_mult
- use_litesdcard
tools:
vivado: {part : xc7a100ticsg324-1L}
@ -326,11 +338,56 @@ targets:
- has_uart1
- has_fpu
- has_btc
- has_short_mult
generate: [litedram_arty, liteeth_arty, litesdcard_arty]
tools:
vivado: {part : xc7a100ticsg324-1L}
toplevel : toplevel

wukong-v2-a100t-nodram:
default_tool: vivado
filesets: [core, wukong-v2, soc, fpga, debug_xilinx, uart16550, xilinx_specific, litesdcard]
parameters:
- memory_size
- ram_init_file
- use_litedram=false
- use_liteeth=false
- use_litesdcard=true
- disable_flatten_core
- spi_flash_offset=4194304
- clk_frequency=100000000
- log_length=2048
- uart_is_16550
- has_fpu
- has_btc
- has_short_mult
generate: [litesdcard_wukong-v2]
tools:
vivado: {part : xc7a100tfgg676-1}
toplevel : toplevel

wukong-v2-a100t:
default_tool: vivado
filesets: [core, wukong-v2, soc, fpga, debug_xilinx, litedram, liteeth, uart16550, xilinx_specific, litesdcard]
parameters:
- memory_size=0
- ram_init_file
- use_litedram=true
- use_liteeth=true
- use_litesdcard=true
- disable_flatten_core
- no_bram=true
- spi_flash_offset=4194304
- log_length=0
- uart_is_16550
- has_fpu
- has_btc
- has_short_mult
generate: [litedram_wukong-v2, liteeth_wukong-v2, litesdcard_wukong-v2]
tools:
vivado: {part : xc7a100tfgg676-1}
toplevel : toplevel

cmod_a7-35:
default_tool: vivado
filesets: [core, cmod_a7-35, soc, fpga, debug_xilinx, uart16550, xilinx_specific]
@ -388,6 +445,18 @@ generate:
generator: litedram_gen
parameters: {board : genesys2}

litedram_wukong-v2:
generator: litedram_gen
parameters: {board : wukong-v2}

liteeth_wukong-v2:
generator: liteeth_gen
parameters: {board : wukong-v2}

litesdcard_wukong-v2:
generator: litesdcard_gen
parameters: {vendor : xilinx}

parameters:
memory_size:
datatype : int
@ -429,6 +498,12 @@ parameters:
paramtype : generic
default : true

has_short_mult:
datatype : bool
description : Include a 16 bit x 16 bit single-cycle multiplier in the core
paramtype : generic
default : false

disable_flatten_core:
datatype : bool
description : Prevent Vivado from flattening the main core components

@ -86,3 +86,22 @@ begin
rin <= v;
end process;
end architecture behaviour;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity short_multiply is
port (
clk : in std_ulogic;

a_in : in std_ulogic_vector(15 downto 0);
b_in : in std_ulogic_vector(15 downto 0);
m_out : out std_ulogic_vector(31 downto 0)
);
end entity short_multiply;

architecture behaviour of short_multiply is
begin
m_out <= std_ulogic_vector(signed(a_in) * signed(b_in));
end architecture behaviour;

@ -0,0 +1,6 @@
interface ftdi
ftdi_vid_pid 0x0403 0x6010
ftdi_channel 0
ftdi_layout_init 0x00e8 0x60eb
reset_config none
adapter_khz 25000

@ -7,7 +7,7 @@ import sys

BASE = os.path.dirname(os.path.abspath(__file__))

def flash(config, flash_proxy, address, data, filetype="", set_qe=False):
def flash(cable, config, flash_proxy, address, data, filetype="", set_qe=False):
script = "; ".join([
"init",
"jtagspi_init 0 {{{}}}".format(flash_proxy),
@ -17,7 +17,7 @@ def flash(config, flash_proxy, address, data, filetype="", set_qe=False):
"exit"
])
print(script)
subprocess.call(["openocd", "-f", config, "-c", script])
subprocess.call(["openocd", "-f", cable, "-f", config, "-c", script])

def get_version():
a = subprocess.run(["openocd", "-v"], capture_output=True)
@ -33,6 +33,7 @@ parser.add_argument("file", help="file to write to flash")
parser.add_argument("-a", "--address", help="offset in flash", type=lambda x: int(x,0), default=0)
parser.add_argument("-f", "--fpga", help="a35, a100 or a200", default="a35")
parser.add_argument("-t", "--filetype", help="file type such as 'bin'", default="")
parser.add_argument("-c", "--cable", help="cable type such as 'arty'", default="arty")
args = parser.parse_args()

version = get_version()
@ -49,5 +50,6 @@ else:

proxy = os.path.join(BASE, proxy)
config = os.path.join(BASE, "xilinx-xc7{}.cfg".format(version))
cable = os.path.join(BASE, "{}.cfg".format(args.cable))

flash(config, proxy, args.address, args.file, args.filetype.lower())
flash(cable, config, proxy, args.address, args.file, args.filetype.lower())

@ -1,9 +1,3 @@
interface ftdi
ftdi_vid_pid 0x0403 0x6010
ftdi_channel 0
ftdi_layout_init 0x00e8 0x60eb
reset_config none
adapter_khz 25000

source [find cpld/xilinx-xc7.cfg]


@ -1,13 +1,6 @@
# This file is the same sa xilinx-xc7.cfg, except we use
# verify_image instead of verify_bank

interface ftdi
ftdi_vid_pid 0x0403 0x6010
ftdi_channel 0
ftdi_layout_init 0x00e8 0x60eb
reset_config none
adapter_khz 25000

source [find cpld/xilinx-xc7.cfg]

# From jtagspi.cfg with modification to support

@ -0,0 +1,365 @@
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library work;
use work.common.all;
use work.decode_types.all;

entity pmu is
port (
clk : in std_ulogic;
rst : in std_ulogic;
p_in : in Execute1ToPMUType;
p_out : out PMUToExecute1Type
);
end entity pmu;

architecture behaviour of pmu is

-- MMCR0 bit numbers
constant MMCR0_FC : integer := 63 - 32;
constant MMCR0_FCS : integer := 63 - 33;
constant MMCR0_FCP : integer := 63 - 34;
constant MMCR0_FCM1 : integer := 63 - 35;
constant MMCR0_FCM0 : integer := 63 - 36;
constant MMCR0_PMAE : integer := 63 - 37;
constant MMCR0_FCECE : integer := 63 - 38;
constant MMCR0_TBSEL : integer := 63 - 40;
constant MMCR0_TBEE : integer := 63 - 41;
constant MMCR0_BHRBA : integer := 63 - 42;
constant MMCR0_EBE : integer := 63 - 43;
constant MMCR0_PMCC : integer := 63 - 45;
constant MMCR0_PMC1CE : integer := 63 - 48;
constant MMCR0_PMCjCE : integer := 63 - 49;
constant MMCR0_TRIGGER : integer := 63 - 50;
constant MMCR0_FCPC : integer := 63 - 51;
constant MMCR0_PMAQ : integer := 63 - 52;
constant MMCR0_PMCCEXT : integer := 63 - 54;
constant MMCR0_CC56RUN : integer := 63 - 55;
constant MMCR0_PMAO : integer := 63 - 56;
constant MMCR0_FC1_4 : integer := 63 - 58;
constant MMCR0_FC5_6 : integer := 63 - 59;
constant MMCR0_FC1_4W : integer := 63 - 62;

-- MMCR2 bit numbers
constant MMCR2_FC0S : integer := 63 - 0;
constant MMCR2_FC0P0 : integer := 63 - 1;
constant MMCR2_FC0M1 : integer := 63 - 3;
constant MMCR2_FC0M0 : integer := 63 - 4;
constant MMCR2_FC0WAIT : integer := 63 - 5;
constant MMCR2_FC1S : integer := 54 - 0;
constant MMCR2_FC1P0 : integer := 54 - 1;
constant MMCR2_FC1M1 : integer := 54 - 3;
constant MMCR2_FC1M0 : integer := 54 - 4;
constant MMCR2_FC1WAIT : integer := 54 - 5;
constant MMCR2_FC2S : integer := 45 - 0;
constant MMCR2_FC2P0 : integer := 45 - 1;
constant MMCR2_FC2M1 : integer := 45 - 3;
constant MMCR2_FC2M0 : integer := 45 - 4;
constant MMCR2_FC2WAIT : integer := 45 - 5;
constant MMCR2_FC3S : integer := 36 - 0;
constant MMCR2_FC3P0 : integer := 36 - 1;
constant MMCR2_FC3M1 : integer := 36 - 3;
constant MMCR2_FC3M0 : integer := 36 - 4;
constant MMCR2_FC3WAIT : integer := 36 - 5;
constant MMCR2_FC4S : integer := 27 - 0;
constant MMCR2_FC4P0 : integer := 27 - 1;
constant MMCR2_FC4M1 : integer := 27 - 3;
constant MMCR2_FC4M0 : integer := 27 - 4;
constant MMCR2_FC4WAIT : integer := 27 - 5;
constant MMCR2_FC5S : integer := 18 - 0;
constant MMCR2_FC5P0 : integer := 18 - 1;
constant MMCR2_FC5M1 : integer := 18 - 3;
constant MMCR2_FC5M0 : integer := 18 - 4;
constant MMCR2_FC5WAIT : integer := 18 - 5;
constant MMCR2_FC6S : integer := 9 - 0;
constant MMCR2_FC6P0 : integer := 9 - 1;
constant MMCR2_FC6M1 : integer := 9 - 3;
constant MMCR2_FC6M0 : integer := 9 - 4;
constant MMCR2_FC6WAIT : integer := 9 - 5;

-- MMCRA bit numbers
constant MMCRA_TECX : integer := 63 - 36;
constant MMCRA_TECM : integer := 63 - 44;
constant MMCRA_TECE : integer := 63 - 47;
constant MMCRA_TS : integer := 63 - 51;
constant MMCRA_TE : integer := 63 - 55;
constant MMCRA_ES : integer := 63 - 59;
constant MMCRA_SM : integer := 63 - 62;
constant MMCRA_SE : integer := 63 - 63;

-- SIER bit numbers
constant SIER_SAMPPR : integer := 63 - 38;
constant SIER_SIARV : integer := 63 - 41;
constant SIER_SDARV : integer := 63 - 42;
constant SIER_TE : integer := 63 - 43;
constant SIER_SITYPE : integer := 63 - 48;
constant SIER_SICACHE : integer := 63 - 51;
constant SIER_SITAKBR : integer := 63 - 52;
constant SIER_SIMISPR : integer := 63 - 53;
constant SIER_SIMISPRI : integer := 63 - 55;
constant SIER_SIDERAT : integer := 63 - 56;
constant SIER_SIDAXL : integer := 63 - 59;
constant SIER_SIDSAI : integer := 63 - 62;
constant SIER_SICMPL : integer := 63 - 63;

type pmc_array is array(1 to 6) of std_ulogic_vector(31 downto 0);
signal pmcs : pmc_array;
signal mmcr0 : std_ulogic_vector(31 downto 0);
signal mmcr1 : std_ulogic_vector(63 downto 0);
signal mmcr2 : std_ulogic_vector(63 downto 0);
signal mmcra : std_ulogic_vector(63 downto 0);
signal siar : std_ulogic_vector(63 downto 0);
signal sdar : std_ulogic_vector(63 downto 0);
signal sier : std_ulogic_vector(63 downto 0);

signal doinc : std_ulogic_vector(1 to 6);
signal doalert : std_ulogic;
signal doevent : std_ulogic;

signal prev_tb : std_ulogic_vector(3 downto 0);

begin
-- mfspr mux
with p_in.spr_num(3 downto 0) select p_out.spr_val <=
32x"0" & pmcs(1) when "0011",
32x"0" & pmcs(2) when "0100",
32x"0" & pmcs(3) when "0101",
32x"0" & pmcs(4) when "0110",
32x"0" & pmcs(5) when "0111",
32x"0" & pmcs(6) when "1000",
32x"0" & mmcr0 when "1011",
mmcr1 when "1110",
mmcr2 when "0001",
mmcra when "0010",
siar when "1100",
sdar when "1101",
sier when "0000",
64x"0" when others;

p_out.intr <= mmcr0(MMCR0_PMAO);

pmu_1: process(clk)
begin
if rising_edge(clk) then
if rst = '1' then
mmcr0 <= 32x"80000000";
else
for i in 1 to 6 loop
if p_in.mtspr = '1' and to_integer(unsigned(p_in.spr_num(3 downto 0))) = i + 2 then
pmcs(i) <= p_in.spr_val(31 downto 0);
elsif doinc(i) = '1' then
pmcs(i) <= std_ulogic_vector(unsigned(pmcs(i)) + 1);
end if;
end loop;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "1011" then
mmcr0 <= p_in.spr_val(31 downto 0);
mmcr0(MMCR0_BHRBA) <= '0'; -- no BHRB yet
mmcr0(MMCR0_EBE) <= '0'; -- no EBBs yet
else
if doalert = '1' then
mmcr0(MMCR0_PMAE) <= '0';
mmcr0(MMCR0_PMAO) <= '1';
mmcr0(MMCR0_PMAQ) <= '0';
end if;
if doevent = '1' and mmcr0(MMCR0_FCECE) = '1' and mmcr0(MMCR0_TRIGGER) = '0' then
mmcr0(MMCR0_FC) <= '1';
end if;
if (doevent = '1' or pmcs(1)(31) = '1') and mmcr0(MMCR0_TRIGGER) = '1' then
mmcr0(MMCR0_TRIGGER) <= '0';
end if;
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "1110" then
mmcr1 <= p_in.spr_val;
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "0001" then
mmcr2 <= p_in.spr_val;
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "0010" then
mmcra <= p_in.spr_val;
-- we don't support random sampling yet
mmcra(MMCRA_SE) <= '0';
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "1100" then
siar <= p_in.spr_val;
elsif doalert = '1' then
siar <= p_in.nia;
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "1101" then
sdar <= p_in.spr_val;
elsif doalert = '1' then
sdar <= p_in.addr;
end if;
if p_in.mtspr = '1' and p_in.spr_num(3 downto 0) = "0000" then
sier <= p_in.spr_val;
elsif doalert = '1' then
sier <= (others => '0');
sier(SIER_SAMPPR) <= p_in.pr_msr;
sier(SIER_SIARV) <= '1';
sier(SIER_SDARV) <= p_in.addr_v;
end if;
end if;
prev_tb <= p_in.tbbits;
end if;
end process;

pmu_2: process(all)
variable tbdiff : std_ulogic_vector(3 downto 0);
variable tbbit : std_ulogic;
variable freeze : std_ulogic;
variable event : std_ulogic;
variable j : integer;
variable inc : std_ulogic_vector(1 to 6);
variable fc14wo : std_ulogic;
begin
event := '0';

-- Check for timebase events
tbdiff := p_in.tbbits and not prev_tb;
tbbit := tbdiff(3 - to_integer(unsigned(mmcr0(MMCR0_TBSEL + 1 downto MMCR0_TBSEL))));
if tbbit = '1' and mmcr0(MMCR0_TBEE) = '1' then
event := '1';
end if;

-- Check for counter negative events
if mmcr0(MMCR0_PMC1CE) = '1' and pmcs(1)(31) = '1' then
event := '1';
end if;
if mmcr0(MMCR0_PMCjCE) = '1' and
(pmcs(2)(31) or pmcs(3)(31) or pmcs(4)(31)) = '1' then
event := '1';
end if;
if mmcr0(MMCR0_PMCjCE) = '1' and
mmcr0(MMCR0_PMCC + 1 downto MMCR0_PMCC) /= "11" and
(pmcs(5)(31) or pmcs(6)(31)) = '1' then
event := '1';
end if;

-- Event selection
inc := (others => '0');
fc14wo := '0';
case mmcr1(31 downto 24) is
when x"f0" =>
inc(1) := '1';
fc14wo := '1'; -- override MMCR0[FC1_4WAIT]
when x"f2" | x"fe" =>
inc(1) := p_in.occur.instr_complete;
when x"f4" =>
inc(1) := p_in.occur.fp_complete;
when x"f6" =>
inc(1) := p_in.occur.itlb_miss;
when x"f8" =>
inc(1) := p_in.occur.no_instr_avail;
when x"fa" =>
inc(1) := p_in.run;
when x"fc" =>
inc(1) := p_in.occur.ld_complete;
when others =>
end case;

case mmcr1(23 downto 16) is
when x"f0" =>
inc(2) := p_in.occur.st_complete;
when x"f2" =>
inc(2) := p_in.occur.dispatch;
when x"f4" =>
inc(2) := p_in.run;
when x"f6" =>
inc(2) := p_in.occur.dtlb_miss_resolved;
when x"f8" =>
inc(2) := p_in.occur.ext_interrupt;
when x"fa" =>
inc(2) := p_in.occur.br_taken_complete;
when x"fc" =>
inc(2) := p_in.occur.icache_miss;
when x"fe" =>
inc(2) := p_in.occur.dc_miss_resolved;
when others =>
end case;

case mmcr1(15 downto 8) is
when x"f0" =>
inc(3) := p_in.occur.dc_store_miss;
when x"f2" =>
inc(3) := p_in.occur.dispatch;
when x"f4" =>
inc(3) := p_in.occur.instr_complete and p_in.run;
when x"f6" =>
inc(3) := p_in.occur.dc_ld_miss_resolved;
when x"f8" =>
inc(3) := tbbit;
when x"fe" =>
inc(3) := p_in.occur.dtlb_miss;
when others =>
end case;

case mmcr1(7 downto 0) is
when x"f0" =>
inc(4) := p_in.occur.dc_load_miss;
when x"f2" =>
inc(4) := p_in.occur.dispatch;
when x"f4" =>
inc(4) := p_in.run;
when x"f6" =>
inc(4) := p_in.occur.br_mispredict;
when x"f8" =>
inc(4) := p_in.occur.ipref_discard;
when x"fa" =>
inc(4) := p_in.occur.instr_complete and p_in.run;
when x"fc" =>
inc(4) := p_in.occur.itlb_miss_resolved;
when x"fe" =>
inc(4) := p_in.occur.ld_miss_nocache;
when others =>
end case;

inc(5) := (mmcr0(MMCR0_CC56RUN) or p_in.run) and p_in.occur.instr_complete;
inc(6) := mmcr0(MMCR0_CC56RUN) or p_in.run;

-- Evaluate freeze conditions
freeze := mmcr0(MMCR0_FC) or
(mmcr0(MMCR0_FCS) and not p_in.pr_msr) or
(mmcr0(MMCR0_FCP) and not mmcr0(MMCR0_FCPC) and p_in.pr_msr) or
(not mmcr0(MMCR0_FCP) and mmcr0(MMCR0_FCPC) and p_in.pr_msr) or
(mmcr0(MMCR0_FCM1) and p_in.pmm_msr) or
(mmcr0(MMCR0_FCM0) and not p_in.pmm_msr);

if freeze = '1' or mmcr0(MMCR0_FC1_4) = '1' or
(mmcr0(MMCR0_FC1_4W) = '1' and p_in.run = '0' and fc14wo = '0') then
inc(1) := '0';
end if;
if freeze = '1' or mmcr0(MMCR0_FC1_4) = '1' or
(mmcr0(MMCR0_FC1_4W) = '1' and p_in.run = '0') then
inc(2 to 4) := "000";
end if;
if freeze = '1' or mmcr0(MMCR0_FC5_6) = '1' then
inc(5 to 6) := "00";
end if;
if mmcr0(MMCR0_TRIGGER) = '1' then
inc(2 to 6) := "00000";
end if;
for i in 1 to 6 loop
j := (i - 1) * 9;
if (mmcr2(MMCR2_FC0S - j) = '1' and p_in.pr_msr = '0') or
(mmcr2(MMCR2_FC0P0 - j) = '1' and p_in.pr_msr = '1') or
(mmcr2(MMCR2_FC0M1 - j) = '1' and p_in.pmm_msr = '1') or
(mmcr2(MMCR2_FC0M1 - j) = '1' and p_in.pmm_msr = '1') then
inc(i) := '0';
end if;
end loop;

-- When MMCR0[PMCC] = "11", PMC5 and PMC6 are not controlled by the
-- MMCRs and don't generate events, but do continue to count run
-- instructions and run cycles.
if mmcr0(MMCR0_PMCC + 1 downto MMCR0_PMCC) = "11" then
inc(5) := p_in.run and p_in.occur.instr_complete;
inc(6) := p_in.run;
end if;

doinc <= inc;
doevent <= event;
doalert <= event and mmcr0(MMCR0_PMAE);
end process;

end architecture behaviour;

@ -1,10 +1,16 @@
CFLAGS = -O2 -g -Wall -std=c99
# CFLAGS += -I urjtag/urjtag/include/ -L urjtag/urjtag/src/.libs/
#
ifeq ($(STATIC_URJTAG), 1)
LIBURJTAG=-Wl,-Bstatic -lurjtag -Wl,-Bdynamic -lftdi1 -lusb-1.0 -lreadline
else
LIBURJTAG=-lurjtag
endif

all: mw_debug

mw_debug: mw_debug.c
$(CC) -o $@ $^ $(CFLAGS) -lurjtag
$(CC) -o $@ $^ $(CFLAGS) $(LIBURJTAG)

clean:
rm -f mw_debug

@ -49,7 +49,7 @@
static bool debug;

struct backend {
int (*init)(const char *target);
int (*init)(const char *target, int freq);
int (*reset)(void);
int (*command)(uint8_t op, uint8_t addr, uint64_t *data);
};
@ -67,13 +67,15 @@ static void check(int r, const char *failstr)

static int sim_fd = -1;

static int sim_init(const char *target)
static int sim_init(const char *target, int freq)
{
struct sockaddr_in saddr;
struct hostent *hp;
const char *p, *host;
int port, rc;

(void)freq;

if (!target)
target = "localhost:13245";
p = strchr(target, ':');
@ -210,22 +212,33 @@ static struct backend sim_backend = {

static urj_chain_t *jc;

static int jtag_init(const char *target)
static int common_jtag_init(const char *target, int freq)
{
const char *sep;
const char *cable;
char *params[] = { NULL, };
urj_part_t *p;
uint32_t id;
int rc, part;
const int max_params = 20;
char *params[max_params+1];
int rc;

if (!target)
target = "DigilentHS1";
sep = strchr(target, ':');
target = "probe";
memset(params, 0x0, sizeof(params));
sep = strchr(target, ' ');
cable = strndup(target, sep - target);
if (sep && *sep) {
fprintf(stderr, "jtag cable params not supported yet\n");
return -1;
char *param_str = strdup(sep);
char *s = param_str;
for (int i = 0; *s; s++) {
if (*s == ' ') {
if (i >= max_params) {
fprintf(stderr, "Too many jtag cable params\n");
return -1;
}
*s = '\0';
params[i] = s+1;
i++;
}
}
}
if (debug)
printf("Opening jtag backend cable '%s'\n", cable);
@ -237,12 +250,39 @@ static int jtag_init(const char *target)
}
jc->main_part = 0;

if (strcmp(cable, "probe") == 0) {
char *cparams[] = { NULL, NULL,};
rc = urj_tap_cable_usb_probe(cparams);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG cable probe failed: %s\n", urj_error_describe());
return -1;
}
cable = strdup(cparams[1]);
}
rc = urj_tap_chain_connect(jc, cable, params);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG cable detect failed\n");
fprintf(stderr, "JTAG cable detect failed: %s\n", urj_error_describe());
return -1;
}

if (freq) {
urj_tap_cable_set_frequency(jc->cable, freq);
}

return 0;
}

static int bscane2_init(const char *target, int freq)
{
urj_part_t *p;
uint32_t id;
int rc;

rc = common_jtag_init(target, freq);
if (rc < 0) {
return rc;
}

/* XXX Hard wire part 0, that might need to change (use params and detect !) */
rc = urj_tap_manual_add(jc, 6);
if (rc < 0) {
@ -255,7 +295,7 @@ static int jtag_init(const char *target)
}
urj_part_parts_set_instruction(jc->parts, "BYPASS");

jc->active_part = part = 0;
jc->active_part = 0;

p = urj_tap_chain_active_part(jc);
if (!p) {
@ -291,6 +331,69 @@ static int jtag_init(const char *target)
return 0;
}

static int ecp5_init(const char *target, int freq)
{
urj_part_t *p;
uint32_t id;
int rc;

rc = common_jtag_init(target, freq);
if (rc < 0) {
return rc;
}

/* XXX Hard wire part 0, that might need to change (use params and detect !) */
rc = urj_tap_manual_add(jc, 8);
if (rc < 0) {
fprintf(stderr, "JTAG failed to add part! : %s\n", urj_error_describe());
return -1;
}
if (jc->parts == NULL || jc->parts->len == 0) {
fprintf(stderr, "JTAG Something's wrong after adding part! : %s\n", urj_error_describe());
return -1;
}
urj_part_parts_set_instruction(jc->parts, "BYPASS");

jc->active_part = 0;

p = urj_tap_chain_active_part(jc);
if (!p) {
fprintf(stderr, "Failed to get active JTAG part\n");
return -1;
}
rc = urj_part_data_register_define(p, "IDCODE_REG", 32);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG failed to add IDCODE_REG register! : %s\n",
urj_error_describe());
return -1;
}
// READ_ID = 0xE0 = 11100000, from Lattice TN1260 sysconfig guide
if (urj_part_instruction_define(p, "IDCODE", "11100000", "IDCODE_REG") == NULL) {
fprintf(stderr, "JTAG failed to add IDCODE instruction! : %s\n",
urj_error_describe());
return -1;
}
rc = urj_part_data_register_define(p, "USER2_REG", 74);
if (rc != URJ_STATUS_OK) {
fprintf(stderr, "JTAG failed to add USER2_REG register !\n");
return -1;
}
// ER1 = 0x32 = 00110010b
if (urj_part_instruction_define(p, "USER2", "00110010", "USER2_REG") == NULL) {
fprintf(stderr, "JTAG failed to add USER2 instruction !\n");
return -1;
}
urj_part_set_instruction(p, "IDCODE");
urj_tap_chain_shift_instructions(jc);
urj_tap_chain_shift_data_registers(jc, 1);
id = urj_tap_register_get_value(p->active_instruction->data_register->out);
printf("Found device ID: 0x%08x\n", id);
urj_part_set_instruction(p, "USER2");
urj_tap_chain_shift_instructions(jc);

return 0;
}

static int jtag_reset(void)
{
return 0;
@ -330,8 +433,14 @@ static int jtag_command(uint8_t op, uint8_t addr, uint64_t *data)
return rc;
}

static struct backend jtag_backend = {
.init = jtag_init,
static struct backend bscane2_backend = {
.init = bscane2_init,
.reset = jtag_reset,
.command = jtag_command,
};

static struct backend ecp5_backend = {
.init = ecp5_init,
.reset = jtag_reset,
.command = jtag_command,
};
@ -653,7 +762,7 @@ static void ltrig_set(uint64_t addr)

static void usage(const char *cmd)
{
fprintf(stderr, "Usage: %s -b <jtag|sim> <command> <args>\n", cmd);
fprintf(stderr, "Usage: %s -b <jtag|ecp5|sim> <command> <args>\n", cmd);

fprintf(stderr, "\n");
fprintf(stderr, " CPU core:\n");
@ -697,7 +806,7 @@ int main(int argc, char *argv[])
{
const char *progname = argv[0];
const char *target = NULL;
int rc, i = 1;
int rc, i = 1, freq = 0;

b = NULL;

@ -708,9 +817,10 @@ int main(int argc, char *argv[])
{ "backend", required_argument, 0, 'b' },
{ "target", required_argument, 0, 't' },
{ "debug", no_argument, 0, 'd' },
{ "frequency", no_argument, 0, 's' },
{ 0, 0, 0, 0 }
};
c = getopt_long(argc, argv, "dhb:t:", lopts, &oindex);
c = getopt_long(argc, argv, "dhb:t:s:", lopts, &oindex);
if (c < 0)
break;
switch(c) {
@ -720,8 +830,10 @@ int main(int argc, char *argv[])
case 'b':
if (strcmp(optarg, "sim") == 0)
b = &sim_backend;
else if (strcmp(optarg, "jtag") == 0)
b = &jtag_backend;
else if (strcmp(optarg, "jtag") == 0 || strcmp(optarg, "bscane2") == 0)
b = &bscane2_backend;
else if (strcmp(optarg, "ecp5") == 0)
b = &ecp5_backend;
else {
fprintf(stderr, "Unknown backend %s\n", optarg);
exit(1);
@ -730,17 +842,22 @@ int main(int argc, char *argv[])
case 't':
target = optarg;
break;
case 's':
freq = atoi(optarg);
if (freq == 0) {
fprintf(stderr, "Bad frequency %s\n", optarg);
exit(1);
}
break;
case 'd':
debug = true;
}
}

if (b == NULL) {
fprintf(stderr, "No backend selected\n");
exit(1);
}
if (b == NULL)
b = &bscane2_backend;

rc = b->init(target);
rc = b->init(target, freq);
if (rc < 0)
exit(1);
for (i = optind; i < argc; i++) {
@ -782,7 +899,7 @@ int main(int argc, char *argv[])
if ((i+1) >= argc)
usage(argv[0]);
addr = strtoul(argv[++i], NULL, 16);
if (((i+1) < argc) && isdigit(argv[i+1][0]))
if (((i+1) < argc) && isxdigit(argv[i+1][0]))
count = strtoul(argv[++i], NULL, 16);
mem_read(addr, count);
} else if (strcmp(argv[i], "mw") == 0) {
@ -800,7 +917,7 @@ int main(int argc, char *argv[])
if ((i+1) >= argc)
usage(argv[0]);
filename = argv[++i];
if (((i+1) < argc) && isdigit(argv[i+1][0]))
if (((i+1) < argc) && isxdigit(argv[i+1][0]))
addr = strtoul(argv[++i], NULL, 16);
load(filename, addr);
} else if (strcmp(argv[i], "save") == 0) {

@ -0,0 +1,31 @@
#!/usr/bin/python3

import os
import subprocess
from pexpect import fdpexpect
import sys
import signal

cmd = [ './microwatt-verilator' ]

devNull = open(os.devnull, 'w')
p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
stdin=subprocess.PIPE, stderr=devNull)

exp = fdpexpect.fdspawn(p.stdout)
exp.logfile = sys.stdout.buffer

exp.expect('Type "help\(\)" for more information.')
exp.expect('>>>')

p.stdin.write(b'print("foo")\r\n')
p.stdin.flush()

# Catch the command echoed back to the console
exp.expect('foo', timeout=600)

# Now catch the output
exp.expect('foo', timeout=600)
exp.expect('>>>')

os.kill(p.pid, signal.SIGKILL)

@ -0,0 +1,39 @@
#!/usr/bin/python3

import os
import subprocess
from pexpect import fdpexpect
import sys
import signal

cmd = [ './microwatt-verilator' ]

devNull = open(os.devnull, 'w')
p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
stdin=subprocess.PIPE, stderr=devNull)

exp = fdpexpect.fdspawn(p.stdout)
exp.logfile = sys.stdout.buffer

exp.expect('Type "help\(\)" for more information.')
exp.expect('>>>')

p.stdin.write(b'n2=0\r\n')
p.stdin.write(b'n1=1\r\n')
p.stdin.write(b'for i in range(5):\r\n')
p.stdin.write(b' n0 = n1 + n2\r\n')
p.stdin.write(b' print(n0)\r\n')
p.stdin.write(b' n2 = n1\r\n')
p.stdin.write(b' n1 = n0\r\n')
p.stdin.write(b'\r\n')
p.stdin.flush()

exp.expect('n1 = n0', timeout=600)
exp.expect('1', timeout=600)
exp.expect('2', timeout=600)
exp.expect('3', timeout=600)
exp.expect('5', timeout=600)
exp.expect('8', timeout=600)
exp.expect('>>>', timeout=600)

os.kill(p.pid, signal.SIGKILL)

@ -21,8 +21,8 @@ entity main_bram is
port(
clk : in std_logic;
addr : in std_logic_vector(HEIGHT_BITS - 1 downto 0) ;
di : in std_logic_vector(WIDTH-1 downto 0);
do : out std_logic_vector(WIDTH-1 downto 0);
din : in std_logic_vector(WIDTH-1 downto 0);
dout : out std_logic_vector(WIDTH-1 downto 0);
sel : in std_logic_vector((WIDTH/8)-1 downto 0);
re : in std_ulogic;
we : in std_ulogic
@ -50,9 +50,9 @@ begin
addr64 := (others => '0');
addr64(HEIGHT_BITS + 2 downto 3) := addr;
if we = '1' then
report "RAM writing " & to_hstring(di) & " to " &
report "RAM writing " & to_hstring(din) & " to " &
to_hstring(addr & pad_zeros) & " sel:" & to_hstring(sel);
behavioural_write(di, addr64, to_integer(unsigned(sel)), identifier);
behavioural_write(din, addr64, to_integer(unsigned(sel)), identifier);
end if;
if re = '1' then
behavioural_read(ret_dat_v, addr64, to_integer(unsigned(sel)), identifier);
@ -60,7 +60,7 @@ begin
" returns " & to_hstring(ret_dat_v);
obuf <= ret_dat_v(obuf'left downto 0);
end if;
do <= obuf;
dout <= obuf;
end if;
end process;


@ -59,7 +59,9 @@ entity soc is
SIM : boolean;
HAS_FPU : boolean := true;
HAS_BTC : boolean := true;
HAS_SHORT_MULT : boolean := false;
DISABLE_FLATTEN_CORE : boolean := false;
ALT_RESET_ADDRESS : std_logic_vector(63 downto 0) := (23 downto 0 => '0', others => '1');
HAS_DRAM : boolean := false;
DRAM_SIZE : integer := 0;
DRAM_INIT_SIZE : integer := 0;
@ -81,7 +83,8 @@ entity soc is
DCACHE_TLB_SET_SIZE : natural := 64;
DCACHE_TLB_NUM_WAYS : natural := 2;
HAS_SD_CARD : boolean := false;
NGPIO : natural := 0
HAS_GPIO : boolean := false;
NGPIO : natural := 32
);
port(
rst : in std_ulogic;
@ -220,15 +223,15 @@ architecture behaviour of soc is
signal dmi_core_ack : std_ulogic;

-- Delayed/latched resets and alt_reset
signal rst_core : std_ulogic := '1';
signal rst_uart : std_ulogic := '1';
signal rst_xics : std_ulogic := '1';
signal rst_spi : std_ulogic := '1';
signal rst_gpio : std_ulogic := '1';
signal rst_bram : std_ulogic := '1';
signal rst_dtm : std_ulogic := '1';
signal rst_wbar : std_ulogic := '1';
signal rst_wbdb : std_ulogic := '1';
signal rst_core : std_ulogic;
signal rst_uart : std_ulogic;
signal rst_xics : std_ulogic;
signal rst_spi : std_ulogic;
signal rst_gpio : std_ulogic;
signal rst_bram : std_ulogic;
signal rst_dtm : std_ulogic;
signal rst_wbar : std_ulogic;
signal rst_wbdb : std_ulogic;
signal alt_reset_d : std_ulogic;

-- IO branch split:
@ -237,20 +240,28 @@ architecture behaviour of soc is
SLAVE_IO_ICP,
SLAVE_IO_ICS,
SLAVE_IO_UART1,
SLAVE_IO_SPI_FLASH_REG,
SLAVE_IO_SPI_FLASH_MAP,
SLAVE_IO_SPI_FLASH,
SLAVE_IO_GPIO,
SLAVE_IO_EXTERNAL,
SLAVE_IO_NONE);
signal slave_io_dbg : slave_io_type;
SLAVE_IO_EXTERNAL);
signal current_io_decode : slave_io_type;

signal io_cycle_none : std_ulogic;
signal io_cycle_syscon : std_ulogic;
signal io_cycle_uart : std_ulogic;
signal io_cycle_uart1 : std_ulogic;
signal io_cycle_icp : std_ulogic;
signal io_cycle_ics : std_ulogic;
signal io_cycle_spi_flash : std_ulogic;
signal io_cycle_gpio : std_ulogic;
signal io_cycle_external : std_ulogic;

function wishbone_widen_data(wb : wb_io_master_out) return wishbone_master_out is
variable wwb : wishbone_master_out;
begin
wwb.adr := wb.adr & "00"; -- XXX note wrong adr usage in wishbone_master_out
wwb.adr := wb.adr(wb.adr'left downto 1);
wwb.dat := wb.dat & wb.dat;
wwb.sel := x"00";
if wwb.adr(2) = '0' then
if wb.adr(0) = '0' then
wwb.sel(3 downto 0) := wb.sel;
else
wwb.sel(7 downto 4) := wb.sel;
@ -324,8 +335,9 @@ begin
SIM => SIM,
HAS_FPU => HAS_FPU,
HAS_BTC => HAS_BTC,
HAS_SHORT_MULT => HAS_SHORT_MULT,
DISABLE_FLATTEN => DISABLE_FLATTEN_CORE,
ALT_RESET_ADDRESS => (23 downto 0 => '0', others => '1'),
ALT_RESET_ADDRESS => ALT_RESET_ADDRESS,
LOG_LENGTH => LOG_LENGTH,
ICACHE_NUM_LINES => ICACHE_NUM_LINES,
ICACHE_NUM_WAYS => ICACHE_NUM_WAYS,
@ -404,7 +416,7 @@ begin
variable top_decode : std_ulogic_vector(3 downto 0);
begin
-- Top-level address decoder
top_decode := wb_master_out.adr(31 downto 29) & dram_at_0;
top_decode := wb_master_out.adr(28 downto 26) & dram_at_0;
slave_top := SLAVE_TOP_BRAM;
if std_match(top_decode, "0000") then
slave_top := SLAVE_TOP_BRAM;
@ -462,14 +474,20 @@ begin
-- Misc
variable has_top : boolean;
variable has_bot : boolean;
variable do_cyc : std_ulogic;
variable end_cyc : std_ulogic;
variable slave_io : slave_io_type;
variable match : std_ulogic_vector(31 downto 12);
begin
if rising_edge(system_clk) then
do_cyc := '0';
end_cyc := '0';
if (rst) then
state := IDLE;
wb_io_out.ack <= '0';
wb_io_out.stall <= '0';
wb_sio_out.cyc <= '0';
wb_sio_out.stb <= '0';
end_cyc := '1';
has_top := false;
has_bot := false;
else
@ -485,12 +503,12 @@ begin
wb_io_out.stall <= '1';

-- Start cycle downstream
wb_sio_out.cyc <= '1';
do_cyc := '1';
wb_sio_out.stb <= '1';

-- Copy write enable to IO out, copy address as well
wb_sio_out.we <= wb_io_in.we;
wb_sio_out.adr <= wb_io_in.adr(wb_sio_out.adr'left downto 3) & "000";
wb_sio_out.adr <= wb_io_in.adr(wb_sio_out.adr'left - 1 downto 0) & '0';

-- Do we have a top word and/or a bottom word ?
has_top := wb_io_in.sel(7 downto 4) /= "0000";
@ -514,7 +532,7 @@ begin
wb_sio_out.sel <= wb_io_in.sel(7 downto 4);

-- Bump address
wb_sio_out.adr(2) <= '1';
wb_sio_out.adr(0) <= '1';

-- Wait for ack
state := WAIT_ACK_TOP;
@ -542,14 +560,14 @@ begin
wb_sio_out.sel <= wb_io_in.sel(7 downto 4);

-- Bump address and set STB
wb_sio_out.adr(2) <= '1';
wb_sio_out.adr(0) <= '1';
wb_sio_out.stb <= '1';

-- Wait for new ack
state := WAIT_ACK_TOP;
else
-- We are done, ack up, clear cyc downstram
wb_sio_out.cyc <= '0';
-- We are done, ack up, clear cyc downstream
end_cyc := '1';

-- And ack & unstall upstream
wb_io_out.ack <= '1';
@ -573,7 +591,7 @@ begin
end if;

-- We are done, ack up, clear cyc downstram
wb_sio_out.cyc <= '0';
end_cyc := '1';

-- And ack & unstall upstream
wb_io_out.ack <= '1';
@ -584,144 +602,149 @@ begin
end if;
end case;
end if;

-- Create individual registered cycle signals for the wishbones
-- going to the various peripherals
if do_cyc = '1' or end_cyc = '1' then
io_cycle_none <= '0';
io_cycle_syscon <= '0';
io_cycle_uart <= '0';
io_cycle_uart1 <= '0';
io_cycle_icp <= '0';
io_cycle_ics <= '0';
io_cycle_spi_flash <= '0';
io_cycle_gpio <= '0';
io_cycle_external <= '0';
wb_sio_out.cyc <= '0';
wb_ext_is_dram_init <= '0';
wb_spiflash_is_map <= '0';
wb_spiflash_is_reg <= '0';
wb_ext_is_dram_csr <= '0';
wb_ext_is_eth <= '0';
wb_ext_is_sdcard <= '0';
end if;
if do_cyc = '1' then
-- Decode I/O address
-- This is real address bits 29 downto 12
match := "11" & wb_io_in.adr(26 downto 9);
slave_io := SLAVE_IO_SYSCON;
if std_match(match, x"FF---") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_dram_init <= '1';
elsif std_match(match, x"F----") then
slave_io := SLAVE_IO_SPI_FLASH;
io_cycle_spi_flash <= '1';
wb_spiflash_is_map <= '1';
elsif std_match(match, x"C8---") then
-- Ext IO "chip selects"
if std_match(match, x"--00-") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_dram_csr <= '1';
elsif (std_match(match, x"--02-") or std_match(match, x"--03-")) and
HAS_LITEETH then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_eth <= '1';
elsif std_match(match, x"--04-") and HAS_SD_CARD then
slave_io := SLAVE_IO_EXTERNAL;
io_cycle_external <= '1';
wb_ext_is_sdcard <= '1';
else
io_cycle_none <= '1';
end if;
elsif std_match(match, x"C0000") then
slave_io := SLAVE_IO_SYSCON;
io_cycle_syscon <= '1';
elsif std_match(match, x"C0002") then
slave_io := SLAVE_IO_UART;
io_cycle_uart <= '1';
elsif std_match(match, x"C0003") then
slave_io := SLAVE_IO_UART1;
io_cycle_uart1 <= '1';
elsif std_match(match, x"C0004") then
slave_io := SLAVE_IO_ICP;
io_cycle_icp <= '1';
elsif std_match(match, x"C0005") then
slave_io := SLAVE_IO_ICS;
io_cycle_ics <= '1';
elsif std_match(match, x"C0006") then
slave_io := SLAVE_IO_SPI_FLASH;
io_cycle_spi_flash <= '1';
wb_spiflash_is_reg <= '1';
elsif std_match(match, x"C0007") then
slave_io := SLAVE_IO_GPIO;
io_cycle_gpio <= '1';
else
io_cycle_none <= '1';
end if;
current_io_decode <= slave_io;
wb_sio_out.cyc <= '1';
end if;
end if;
end process;
-- IO wishbone slave intercon.
-- IO wishbone slave interconnect.
--
slave_io_intercon: process(wb_sio_out, wb_syscon_out, wb_uart0_out, wb_uart1_out,
wb_ext_io_out, wb_xics_icp_out, wb_xics_ics_out,
wb_spiflash_out)
variable slave_io : slave_io_type;

variable match : std_ulogic_vector(31 downto 12);
variable ext_valid : boolean;
slave_io_intercon: process(all)
begin

-- Simple address decoder.
slave_io := SLAVE_IO_NONE;
match := "11" & wb_sio_out.adr(29 downto 12);
if std_match(match, x"FF---") and HAS_DRAM then
slave_io := SLAVE_IO_EXTERNAL;
elsif std_match(match, x"F----") then
slave_io := SLAVE_IO_SPI_FLASH_MAP;
elsif std_match(match, x"C0000") then
slave_io := SLAVE_IO_SYSCON;
elsif std_match(match, x"C0002") then
slave_io := SLAVE_IO_UART;
elsif std_match(match, x"C0003") then
slave_io := SLAVE_IO_UART1;
elsif std_match(match, x"C8---") then
slave_io := SLAVE_IO_EXTERNAL;
elsif std_match(match, x"C0004") then
slave_io := SLAVE_IO_ICP;
elsif std_match(match, x"C0005") then
slave_io := SLAVE_IO_ICS;
elsif std_match(match, x"C0006") then
slave_io := SLAVE_IO_SPI_FLASH_REG;
elsif std_match(match, x"C0007") then
slave_io := SLAVE_IO_GPIO;
end if;
slave_io_dbg <= slave_io;
wb_uart0_in <= wb_sio_out;
wb_uart0_in.cyc <= '0';
wb_uart0_in.cyc <= io_cycle_uart;
wb_uart1_in <= wb_sio_out;
wb_uart1_in.cyc <= '0';
wb_uart1_in.cyc <= io_cycle_uart1;

wb_spiflash_in <= wb_sio_out;
wb_spiflash_in.cyc <= '0';
wb_spiflash_is_reg <= '0';
wb_spiflash_is_map <= '0';
wb_spiflash_in.cyc <= io_cycle_spi_flash;
-- Clear top bits so they don't make their way to the
-- flash chip.
wb_spiflash_in.adr(27 downto 26) <= "00";

wb_gpio_in <= wb_sio_out;
wb_gpio_in.cyc <= '0';
wb_gpio_in.cyc <= io_cycle_gpio;

-- Only give xics 8 bits of wb addr (for now...)
wb_xics_icp_in <= wb_sio_out;
wb_xics_icp_in.adr <= (others => '0');
wb_xics_icp_in.adr(7 downto 0) <= wb_sio_out.adr(7 downto 0);
wb_xics_icp_in.cyc <= '0';
wb_xics_icp_in.adr(5 downto 0) <= wb_sio_out.adr(5 downto 0);
wb_xics_icp_in.cyc <= io_cycle_icp;
wb_xics_ics_in <= wb_sio_out;
wb_xics_ics_in.adr <= (others => '0');
wb_xics_ics_in.adr(11 downto 0) <= wb_sio_out.adr(11 downto 0);
wb_xics_ics_in.cyc <= '0';
wb_xics_ics_in.adr(9 downto 0) <= wb_sio_out.adr(9 downto 0);
wb_xics_ics_in.cyc <= io_cycle_ics;

wb_ext_io_in <= wb_sio_out;
wb_ext_io_in.cyc <= '0';
wb_ext_io_in.cyc <= io_cycle_external;

wb_syscon_in <= wb_sio_out;
wb_syscon_in.cyc <= '0';

wb_ext_is_dram_csr <= '0';
wb_ext_is_dram_init <= '0';
wb_ext_is_eth <= '0';
wb_ext_is_sdcard <= '0';
wb_syscon_in.cyc <= io_cycle_syscon;

-- Default response, ack & return all 1's
wb_sio_in.dat <= (others => '1');
wb_sio_in.ack <= wb_sio_out.stb and wb_sio_out.cyc;
wb_sio_in.stall <= '0';

case slave_io is
case current_io_decode is
when SLAVE_IO_EXTERNAL =>
-- Ext IO "chip selects"
--
-- DRAM init is special at 0xFF* so we just test the top
-- bit. Everything else is at 0xC8* so we test only bits
-- 23 downto 16.
--
ext_valid := false;
if wb_sio_out.adr(29) = '1' and HAS_DRAM then -- DRAM init is special
wb_ext_is_dram_init <= '1';
ext_valid := true;
elsif wb_sio_out.adr(23 downto 16) = x"00" and HAS_DRAM then
wb_ext_is_dram_csr <= '1';
ext_valid := true;
elsif wb_sio_out.adr(23 downto 16) = x"02" and HAS_LITEETH then
wb_ext_is_eth <= '1';
ext_valid := true;
elsif wb_sio_out.adr(23 downto 16) = x"03" and HAS_LITEETH then
wb_ext_is_eth <= '1';
ext_valid := true;
elsif wb_sio_out.adr(23 downto 16) = x"04" and HAS_SD_CARD then
wb_ext_is_sdcard <= '1';
ext_valid := true;
end if;
if ext_valid then
wb_ext_io_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_ext_io_out;
end if;

wb_sio_in <= wb_ext_io_out;
when SLAVE_IO_SYSCON =>
wb_syscon_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_syscon_out;
when SLAVE_IO_UART =>
wb_uart0_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_uart0_out;
when SLAVE_IO_ICP =>
wb_xics_icp_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_xics_icp_out;
when SLAVE_IO_ICS =>
wb_xics_ics_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_xics_ics_out;
when SLAVE_IO_UART1 =>
wb_uart1_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_uart1_out;
when SLAVE_IO_SPI_FLASH_MAP =>
-- Clear top bits so they don't make their way to the
-- fash chip.
wb_spiflash_in.adr(29 downto 28) <= "00";
wb_spiflash_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_spiflash_out;
wb_spiflash_is_map <= '1';
when SLAVE_IO_SPI_FLASH_REG =>
wb_spiflash_in.cyc <= wb_sio_out.cyc;
when SLAVE_IO_SPI_FLASH =>
wb_sio_in <= wb_spiflash_out;
wb_spiflash_is_reg <= '1';
when SLAVE_IO_GPIO =>
wb_gpio_in.cyc <= wb_sio_out.cyc;
wb_sio_in <= wb_gpio_out;
when others =>
end case;

-- Default response, ack & return all 1's
if io_cycle_none = '1' then
wb_sio_in.dat <= (others => '1');
wb_sio_in.ack <= wb_sio_out.stb and wb_sio_out.cyc;
wb_sio_in.stall <= '0';
end if;

end process;

-- Syscon slave
@ -766,7 +789,7 @@ begin
txd => uart0_txd,
rxd => uart0_rxd,
irq => uart0_irq,
wb_adr_in => wb_uart0_in.adr(11 downto 0),
wb_adr_in => wb_uart0_in.adr(9 downto 0) & "00",
wb_dat_in => wb_uart0_in.dat(7 downto 0),
wb_dat_out => uart0_dat8,
wb_cyc_in => wb_uart0_in.cyc,
@ -783,7 +806,7 @@ begin
port map (
wb_clk_i => system_clk,
wb_rst_i => rst_uart,
wb_adr_i => wb_uart0_in.adr(4 downto 2),
wb_adr_i => wb_uart0_in.adr(2 downto 0),
wb_dat_i => wb_uart0_in.dat(7 downto 0),
wb_dat_o => uart0_dat8,
wb_we_i => wb_uart0_in.we,
@ -825,7 +848,7 @@ begin
port map (
wb_clk_i => system_clk,
wb_rst_i => rst_uart,
wb_adr_i => wb_uart1_in.adr(4 downto 2),
wb_adr_i => wb_uart1_in.adr(2 downto 0),
wb_dat_i => wb_uart1_in.dat(7 downto 0),
wb_dat_o => uart1_dat8,
wb_we_i => wb_uart1_in.we,
@ -913,20 +936,22 @@ begin
icp_out => ics_to_icp
);

gpio : entity work.gpio
generic map(
NGPIO => NGPIO
)
port map(
clk => system_clk,
rst => rst_gpio,
wb_in => wb_gpio_in,
wb_out => wb_gpio_out,
gpio_in => gpio_in,
gpio_out => gpio_out,
gpio_dir => gpio_dir,
intr => gpio_intr
);
gpio0_gen: if HAS_GPIO generate
gpio : entity work.gpio
generic map(
NGPIO => NGPIO
)
port map(
clk => system_clk,
rst => rst_gpio,
wb_in => wb_gpio_in,
wb_out => wb_gpio_out,
gpio_in => gpio_in,
gpio_out => gpio_out,
gpio_dir => gpio_dir,
intr => gpio_intr
);
end generate;

-- Assign external interrupts
interrupts: process(all)

@ -43,14 +43,14 @@ architecture rtl of spi_flash_ctrl is
-- Register indices
constant SPI_REG_BITS : positive := 3;

-- Register addresses (matches wishbone addr downto 2, ie, 4 bytes per reg)
-- Register addresses (matches wishbone addr downto 0, ie, 4 bytes per reg)
constant SPI_REG_DATA : std_ulogic_vector(SPI_REG_BITS-1 downto 0) := "000";
constant SPI_REG_CTRL : std_ulogic_vector(SPI_REG_BITS-1 downto 0) := "001";
constant SPI_REG_AUTO_CFG : std_ulogic_vector(SPI_REG_BITS-1 downto 0) := "010";
constant SPI_REG_INVALID : std_ulogic_vector(SPI_REG_BITS-1 downto 0) := "111";

-- Control register
signal ctrl_reg : std_ulogic_vector(15 downto 0) := (others => '0');
signal ctrl_reg : std_ulogic_vector(15 downto 0);
alias ctrl_reset : std_ulogic is ctrl_reg(0);
alias ctrl_cs : std_ulogic is ctrl_reg(1);
alias ctrl_rsrv1 : std_ulogic is ctrl_reg(2);
@ -58,7 +58,7 @@ architecture rtl of spi_flash_ctrl is
alias ctrl_div : std_ulogic_vector(7 downto 0) is ctrl_reg(15 downto 8);

-- Auto mode config register
signal auto_cfg_reg : std_ulogic_vector(29 downto 0) := (others => '0');
signal auto_cfg_reg : std_ulogic_vector(29 downto 0);
alias auto_cfg_cmd : std_ulogic_vector(7 downto 0) is auto_cfg_reg(7 downto 0);
alias auto_cfg_dummies : std_ulogic_vector(2 downto 0) is auto_cfg_reg(10 downto 8);
alias auto_cfg_mode : std_ulogic_vector(1 downto 0) is auto_cfg_reg(12 downto 11);
@ -126,9 +126,9 @@ architecture rtl of spi_flash_ctrl is
signal auto_latch_adr : std_ulogic;

-- Automatic mode latches
signal auto_data : std_ulogic_vector(wb_out.dat'left downto 0) := (others => '0');
signal auto_cnt : integer range 0 to 63 := 0;
signal auto_state : auto_state_t := AUTO_BOOT;
signal auto_data : std_ulogic_vector(wb_out.dat'left downto 0);
signal auto_cnt : integer range 0 to 63;
signal auto_state : auto_state_t;
signal auto_last_addr : std_ulogic_vector(31 downto 0);

begin
@ -162,7 +162,7 @@ begin
wb_map_valid <= wb_valid and wb_sel_map;

-- Register decode. For map accesses, make it look like "invalid"
wb_reg <= wb_req.adr(SPI_REG_BITS+1 downto 2) when wb_reg_valid else SPI_REG_INVALID;
wb_reg <= wb_req.adr(SPI_REG_BITS - 1 downto 0) when wb_reg_valid else SPI_REG_INVALID;

-- Shortcut because we test that a lot: data register access
wb_reg_dat_v <= '1' when wb_reg = SPI_REG_DATA else '0';
@ -351,6 +351,8 @@ begin
if rst = '1' then
auto_last_addr <= (others => '0');
auto_state <= AUTO_BOOT;
auto_cnt <= 0;
auto_data <= (others => '0');
else
auto_state <= auto_next;
auto_cnt <= auto_cnt_next;
@ -393,7 +395,7 @@ begin

-- Convert wishbone address into a flash address. We mask
-- off the 4 top address bits to get rid of the "f" there.
addr := "00" & wb_req.adr(29 downto 2) & "00";
addr := "00" & wb_req.adr(27 downto 0) & "00";

-- Calculate the next address for store & compare later
auto_lad_next <= std_ulogic_vector(unsigned(addr) + 4);

Some files were not shown because too many files have changed in this diff Show More

Loading…
Cancel
Save