Commit Graph

39 Commits (dee3783d79c40d6a9e986e2d032ae0afad3064ff)

Author SHA1 Message Date
Paul Mackerras 8160f4f821 Add framework for implementing an MMU
This adds a new module to implement an MMU.  At the moment it doesn't
do very much.  Tlbie instructions now get sent by loadstore1 to mmu,
which sends them to dcache, rather than loadstore1 sending them
directly to dcache.  TLB misses from dcache now get sent by loadstore1
to mmu, which currently just returns an error.  Loadstore1 then
generates a DSI in response to the error return from mmu.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Joel Stanley 6a3d2d95df Set default RAM to be 16K in microwatt.core
This allows it to run hello world out of the box.

Signed-off-by: Joel Stanley <joel@jms.id.au>
5 years ago
Benjamin Herrenschmidt 8e0389b973 ram: Rework main RAM interface
This replaces the simple_ram_behavioural and mw_soc_memory modules
with a common wishbone_bram_wrapper.vhdl that interfaces the
pipelined WB with a lower-level RAM module, along with an FPGA
and a sim variants of the latter.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt d2762e70e5 Add option to not flatten hierarchy
Vivado by default tries to flatten the module hierarchy to improve
placement and timing. However this makes debugging timing issues
really hard as the net names in the timing report can be pretty
bogus.

This adds a generic that can be used to control attributes to stop
vivado from flattening the main core components. The resulting design
will have worst timing overall but it will be easier to understand
what the worst timing path are and address them.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt b513f0fb48 dcache: Add a dcache
This replaces loadstore2 with a dcache

The dcache unit is losely based on the icache one (same basic cache
layout), but has some significant logic additions to deal with stores,
loads with update, non-cachable accesses and other differences due to
operating in the execution part of the pipeline rather than the fetch
part.

The cache is store-through, though a hit with an existing line will
update the line rather than invalidate it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Paul Mackerras f49a5a99a5 Remove execute2 stage
Since the condition setting got moved to writeback, execute2 does
nothing aside from wasting a cycle.  This removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard 813f834012 Add CR hazard detection
To keep things simple we treat the CR as a single entity.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard bdc26b7527 Add GPR hazard detection
Check GPRs against any writers in the pipeline.

All instructions are still marked single in pipeline at
this stage.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard d5346d0abf Separate issue control into its own unit
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 9b8c094cf6 Fix cmod-a7 frequency
The cmod-a7 is ignoring the clk_frequency parameter and running at
100 MHz. Fix it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 3c6e66dc96
Merge pull request #83 from paulusmack/logical
execute: Consolidate count-leading/trailing-zeroes implementations
5 years ago
Anton Blanchard 4b7b702e01
Merge pull request #81 from antonblanchard/logical
Consolidate logical instructions
5 years ago
Paul Mackerras 24a4a796ce execute: Consolidate count-leading/trailing-zeroes implementations
This adds combinatorial logic that does 32-bit and 64-bit count
leading and trailing zeroes in one unit, and consolidates the
four instructions under a single OP_CNTZ opcode.

This saves 84 slice LUTs on the Arty A7-100.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard b8fb721b81 Consolidate logical instructions
Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common
invert on the input and output. This saves us about 200 LUTs.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Benjamin Herrenschmidt b56b46b7d1 icache: Set associative icache
This adds support for set associativity to the icache. It can still
be direct mapped by setting NUM_WAYS to 1.

The replacement policy uses a simple tree-PLRU for each set.

This is only lightly tested, tests pass but I have to double check
that we are using the ways effectively and not creating duplicates.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt 004eb074c9 plru: Add a simple PLRU module
Tested in sim only for now

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Paul Mackerras f7c393ba7e Add a rotate/mask/shift unit and use it in execute1
This adds a new entity 'rotator' which contains combinatorial logic
for rotating and masking 64-bit values.  It implements the operations
of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl,
rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions.
It consists of a 3-stage 64-bit rotator using 4:1 multiplexors at
each stage, two mask generators, output logic and control logic.

The insn_type_t values used for these instructions have been reduced
to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask
instructions (clear both left and right, clear left, clear right
variants), OP_SHL for left shifts, and OP_SHR for right shifts.
The control signals for the rotator are derived from the opcode
and from the is_32bit and is_signed fields of the decode_rom_t.

The rotator is instantiated as an entity in execute1 so that we can
be sure we only have one of it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Benjamin Herrenschmidt 9961d70dfb Improve PLL/MMCM clocks configuration
We can now pass both the input clock and target clock frequency
via generics. Add support for both 50Mhz and 100Mhz target freqs
for both cases.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt 492bf06740 corefile: Remove duplicate wishbone_debug_master
It's both in core and soc, it should only be in the latter

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt ab5c6ab9ac fpga: Arty A7's don't need multiple filesets
the XDC is identical between variants, so is the fileset

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Paul Mackerras 1158c81500 fpga: Add definitions for Arty A7-100 board
These are a copy of the A7-35 definitions with 35 changed to 100.
The A7-100 uses the same .xdc file (arty_a7-35.xdc) as the A7-35
since the only difference between the two is the FPGA part; the
hardware and connections on the two boards are identical.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard b57325ce29 Merge branch 'divider' of https://github.com/paulusmack/microwatt 5 years ago
Anton Blanchard d82f4c18b6 Add core_debug.vhdl to fusesoc configs
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Paul Mackerras d5bc6c8824 Add a divider unit and a testbench for it
This adds a divider unit, connected to the core in much the same way
that the multiplier unit is connected.  The division algorithm is
very simple-minded, taking 64 clock cycles for any division (even
32-bit division instructions).

The decoding is simplified by making use of regularities in the
instruction encoding for div* and mod* instructions.  Instead of
having PPC_* encodings from the first-stage decoder for each of the
different div* and mod* instructions, we now just have PPC_DIV and
PPC_MOD, and the inputs to the divider that indicate what sort of
division operation to do are derived from instruction word bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Benjamin Herrenschmidt b46f81fae4 Wishbone debug module
This adds a debug module off the DMI (debug) bus which can act as a
wishbone master to generate read and write cycles.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt ee52fd4d80 Add a debug (DMI) bus and a JTAG interface to it on Xilinx FPGAs
This adds a simple bus that can be mastered from an external
system via JTAG, which will be used to hookup various debug
modules.

It's loosely based on the RiscV model (hence the DMI name).

The module currently only supports hooking up to a Xilinx BSCANE2
but it shouldn't be too hard to adapt it to support different TAPs
if necessary.

The JTAG protocol proper is not exactly the RiscV one at this point,
though I might still change it.

This comes with some sim variants of Xilinx BSCANE2 and BUFG and a
test bench.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Anton Blanchard 89849a6856 Add a simple direct mapped icache
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Benjamin Herrenschmidt 3ac1dbc737 Share soc.vhdl between FPGA and sim
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt d21ef5836d Pass wishbone record to bram memory module
(And rename it to mw_soc_memory).

This makes soc.vhdl simpler and provides the same interface as
the simulated memory, which will help when sharing soc.vhdl
with sim later

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt a69a93b466 Split FPGA toplevel from soc
This will be useful when we start needing different toplevels for
different boards.

We keep the reset and clock generators in the toplevel as they will
eventually be taken over by litedram when we integrate it, and they
are more likely to change on different system types.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Anton Blanchard 270d7b1b9a Cmod A7-35 support
This adds support for the Digilane Cmod A7-35.

I had to use the MMCM because the clock (12 MHz) is below the PLL
minimum of 19 MHz.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 03fd06deaf Rework SOC reset
The old reset code was overly complicated and never worked properly.
Replace it with a simpler sequence that uses a couple of shift registers
to assert resets:

- Wait a number of external clock cycles before removing reset from
  the PLL.

- After the PLL locks and the external reset button isn't pressed,
  wait a number of PLL clock cycles before removing reset from the SOC.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5e140298a5 Rework decode2
The decode2 stage was spaghetti code and needed cleaning up.
Create a series of functions to pull fields from a ppc instruction
and also a series of helpers to extract values for the execution
units.

As suggested by Paul, we should pass all signals to the execution
units and only set the valid signal conditionally, which should
use less resources.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5379b805ec Arty A7 reset pin is C2
Use C2 for reset, and fix up a few whitespace issues.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
riktw 4ebd6fc1f7 Added support for building for Arty A7 boards 5 years ago
Olof Kindgren 12327034d6 Add and use plle2 primitive for nexys boards 5 years ago
Olof Kindgren b9bf19f912 Added synthesis target
The synth target can be used to analyze the core after synthesis
without running P&R. Currently, the only edalize backends that
support synthesis without P&R are vivado and icestorm, and icestorm
needs yosys built with verific support to parse vhdl.

To run synthesis only for a part, run

fusesoc run --target=synth --tool=vivado microwatt --part=<part>

where part is a valid Xilinx part such as xc7a100tcsg324-1
5 years ago
Olof Kindgren 250d09ed2d Add Nexys Video support 5 years ago
Olof Kindgren 5e56b14125 Add FuseSoC core description file with Nexys A7 support 5 years ago