Commit Graph

40 Commits (9a63c098a5471e40ca0364a867d30204f0288bc4)

Author SHA1 Message Date
Benjamin Herrenschmidt 9a63c098a5 Move log2/ispow2 to a utils package
(Out of icache and dcache)


Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt cb4451498f dcache: Add testbench
A very simple one for now...

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt b513f0fb48 dcache: Add a dcache
This replaces loadstore2 with a dcache

The dcache unit is loosely based on the icache one (same basic cache
layout), but has some significant logic additions to deal with stores,
loads with update, non-cacheable accesses and other differences due to
operating in the execution part of the pipeline rather than the fetch
part.

The cache is store-through, though a hit with an existing line will
update the line rather than invalidate it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
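The store-through behaviour described in b513f0fb48 can be illustrated with a small software model. This is only a sketch under assumptions: the class name, the byte-granular memory dict, the line size, and the choice not to allocate a line on a store miss are all hypothetical and are not taken from the dcache source.

```python
class StoreThroughCache:
    """Toy model: a line-granular cache in front of a backing memory dict."""

    def __init__(self, memory, line_size=64):
        self.memory = memory          # address -> byte value (hypothetical backing store)
        self.line_size = line_size
        self.lines = {}               # line base address -> bytearray

    def _base(self, addr):
        return addr - (addr % self.line_size)

    def load(self, addr):
        base = self._base(addr)
        if base not in self.lines:
            # Load miss: fill the whole line from memory.
            self.lines[base] = bytearray(
                self.memory.get(base + i, 0) for i in range(self.line_size))
        return self.lines[base][addr - base]

    def store(self, addr, value):
        # Stores always go through to memory.
        self.memory[addr] = value & 0xFF
        base = self._base(addr)
        if base in self.lines:
            # Store hit: update the cached line instead of invalidating it.
            self.lines[base][addr - base] = value & 0xFF
        # Store miss: no allocation (an assumption of this sketch).
```

A later load to a stored address therefore always sees the new value, whether or not the line was resident at the time of the store.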
Paul Mackerras f49a5a99a5 Remove execute2 stage
Since the condition setting got moved to writeback, execute2 does
nothing aside from wasting a cycle.  This removes it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Paul Mackerras 374f4c536d writeback: Do data formatting and condition recording in writeback
This adds code to writeback to format data and test the result
against zero for the purpose of setting CR0.  The data formatter
is able to shift and mask by bytes and do byte reversal and sign
extension.  It can also put together bytes from two input
doublewords to support unaligned loads (including unaligned
byte-reversed loads).

The data formatter starts with an 8:1 multiplexer that is able
to direct any byte of the input to any byte of the output.  This
lets us rotate the data and simultaneously byte-reverse it.
The rotated/reversed data goes to a register for the unaligned
cases that overlap two doublewords.  Then there is per-byte logic
that does trimming, sign extension, and splicing together bytes
from a previous input doubleword (stored in data_latched) and the
current doubleword.  Finally the 64-bit result is tested to set
CR0 if rc = 1.

This removes the RC logic from the execute2, multiply and divide
units, and the shift/mask/byte-reverse/sign-extend logic from
loadstore2.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
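The data formatting steps described in 374f4c536d (byte selection, optional byte reversal, sign extension, then a test against zero) can be sketched in software. This is a simplified model, not the VHDL: the function names are hypothetical, a little-endian byte layout is assumed, and the unaligned case that spans two doublewords (the data_latched path) is deliberately left out.

```python
def format_load(dword, offset, length, byte_reverse=False, sign_extend=False):
    """Pick `length` bytes starting at byte `offset` of a 64-bit doubleword."""
    if offset + length > 8:
        raise ValueError("sketch only handles accesses within one doubleword")
    src = dword.to_bytes(8, "little")
    out = bytearray(8)
    # Per-output-byte selection: any input byte can be steered to any output
    # byte, which gives rotation and byte reversal in the same step.
    for i in range(length):
        j = length - 1 - i if byte_reverse else i
        out[i] = src[offset + j]
    # Trimming is implicit (unused bytes stay zero); sign extension fills the
    # upper bytes when the top bit of the loaded value is set.
    if sign_extend and out[length - 1] & 0x80:
        for i in range(length, 8):
            out[i] = 0xFF
    return int.from_bytes(out, "little")


def cr0_bits(result):
    """Return (LT, GT, EQ) for a 64-bit result treated as a signed value."""
    signed = result - (1 << 64) if result >> 63 else result
    return (signed < 0, signed > 0, signed == 0)
```

For example, `format_load(0x1122334455667788, 2, 4)` selects the four bytes starting at byte 2, and passing `byte_reverse=True` returns the same bytes in the opposite order.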
Anton Blanchard 813f834012 Add CR hazard detection
To keep things simple we treat the CR as a single entity.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard bdc26b7527 Add GPR hazard detection
Check GPRs against any writers in the pipeline.

All instructions are still marked single in pipeline at
this stage.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
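The check described in bdc26b7527 amounts to comparing an instruction's source registers against every pending write in the pipeline. A minimal sketch, assuming a hypothetical representation in which each in-flight stage reports either the GPR number it will write or `None`:

```python
def gpr_hazard(source_regs, pipeline_writes):
    """Return True if any source GPR is still to be written by an in-flight op.

    source_regs:     GPR numbers the new instruction reads.
    pipeline_writes: one entry per in-flight stage, a GPR number or None.
    """
    pending = {reg for reg in pipeline_writes if reg is not None}
    return any(reg in pending for reg in source_regs)
```

An issue stage built on this would stall the instruction while the function returns True.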
Anton Blanchard e4c98dce36
Merge pull request from antonblanchard/gpr-hazard-5-a
Separate issue control into its own unit
Anton Blanchard d5346d0abf Separate issue control into its own unit
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Paul Mackerras 4396eddc31 countzero: Add a testbench
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Anton Blanchard 3c6e66dc96
Merge pull request from paulusmack/logical
execute: Consolidate count-leading/trailing-zeroes implementations
Anton Blanchard 4b7b702e01
Merge pull request from antonblanchard/logical
Consolidate logical instructions
Paul Mackerras 24a4a796ce execute: Consolidate count-leading/trailing-zeroes implementations
This adds combinatorial logic that does 32-bit and 64-bit count
leading and trailing zeroes in one unit, and consolidates the
four instructions under a single OP_CNTZ opcode.

This saves 84 slice LUTs on the Arty A7-100.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
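One way a single unit can serve all four count-zeroes instructions, as 24a4a796ce describes, is to bit-reverse the operand for the trailing-zero case and then share one leading-zero counter. Whether the VHDL does exactly this is an assumption; the function and parameter names below are hypothetical.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def count_zeroes(value, is_32bit=False, count_right=False):
    """Count leading (default) or trailing zero bits of a 32- or 64-bit value."""
    width = 32 if is_32bit else 64
    value &= (1 << width) - 1          # 32-bit forms ignore the upper word
    if count_right:
        value = bit_reverse(value, width)
    count = 0
    for i in range(width - 1, -1, -1): # scan from the most significant bit
        if (value >> i) & 1:
            break
        count += 1
    return count
```

A zero operand yields the full width (32 or 64) in either direction.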
Anton Blanchard b8fb721b81 Consolidate logical instructions
Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common
invert on the input and output. This saves us about 200 LUTs.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
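The consolidation in b8fb721b81 can be sketched as three base operations plus two optional inversions. The flag names and the table below are hypothetical; the sketch assumes the input inversion applies to the second operand, which matches the architectural definitions of andc and orc.

```python
MASK64 = (1 << 64) - 1


def logical(op, a, b, invert_in=False, invert_out=False):
    """op is 'and', 'or' or 'xor'; the invert flags select the variant."""
    if invert_in:
        b = ~b & MASK64
    if op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    else:
        raise ValueError(op)
    if invert_out:
        result = ~result & MASK64
    return result


# mnemonic -> (base op, invert input, invert output)
VARIANTS = {
    "and":  ("and", False, False),
    "andc": ("and", True,  False),
    "nand": ("and", False, True),
    "or":   ("or",  False, False),
    "orc":  ("or",  True,  False),
    "nor":  ("or",  False, True),
    "xor":  ("xor", False, False),
    "eqv":  ("xor", False, True),
}


def run(mnemonic, a, b):
    op, inv_in, inv_out = VARIANTS[mnemonic]
    return logical(op, a, b, inv_in, inv_out)
```

Eight mnemonics thus share three gates' worth of core logic, with the two inverters doing the rest.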
Benjamin Herrenschmidt b56b46b7d1 icache: Set associative icache
This adds support for set associativity to the icache. It can still
be direct mapped by setting NUM_WAYS to 1.

The replacement policy uses a simple tree-PLRU for each set.

This is only lightly tested, tests pass but I have to double check
that we are using the ways effectively and not creating duplicates.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
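For readers unfamiliar with the tree-PLRU policy mentioned in b56b46b7d1, here is a minimal software sketch of one set's replacement state. The class and method names are hypothetical, and the sketch assumes a power-of-two number of ways of at least 2 (the direct-mapped NUM_WAYS = 1 case needs no replacement state).

```python
class TreePLRU:
    """Tree pseudo-LRU state for a single cache set."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of two"
        self.ways = ways
        self.levels = ways.bit_length() - 1
        # One bit per internal node of a binary tree, stored in heap order.
        # Each bit points towards the half that should be evicted next.
        self.tree = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: make every node on the path point away from `way`."""
        node = 0
        for level in range(self.levels - 1, -1, -1):
            bit = (way >> level) & 1
            self.tree[node] = 1 - bit
            node = 2 * node + 1 + bit

    def victim(self):
        """Follow the node bits from the root to find the way to replace."""
        node = 0
        way = 0
        for _ in range(self.levels):
            bit = self.tree[node]
            way = (way << 1) | bit
            node = 2 * node + 1 + bit
        return way
```

With 4 ways, touching way 0 flips the root to point at the other half, so the next victim comes from ways 2 and 3.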
Benjamin Herrenschmidt 004eb074c9 plru: Add a simple PLRU module
Tested in sim only for now

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul Mackerras f7c393ba7e Add a rotate/mask/shift unit and use it in execute1
This adds a new entity 'rotator' which contains combinatorial logic
for rotating and masking 64-bit values.  It implements the operations
of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl,
rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions.
It consists of a 3-stage 64-bit rotator using 4:1 multiplexers at
each stage, two mask generators, output logic and control logic.

The insn_type_t values used for these instructions have been reduced
to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask
instructions (clear both left and right, clear left, clear right
variants), OP_SHL for left shifts, and OP_SHR for right shifts.
The control signals for the rotator are derived from the opcode
and from the is_32bit and is_signed fields of the decode_rom_t.

The rotator is instantiated as an entity in execute1 so that we can
be sure we only have one of it.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
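The rotate-then-mask structure described in f7c393ba7e can be modelled in a few lines. This sketch covers only three of the listed instructions and is not derived from the VHDL: the helper names are hypothetical, the three rotate stages are assumed to handle 16-bit, 4-bit and 1-bit steps respectively, and the mask helper uses the architecture's big-endian bit numbering (bit 0 is the most significant).

```python
MASK64 = (1 << 64) - 1


def rotl64(x, n):
    """Rotate left by n (0..63) in three stages, each a 4:1 choice."""
    for shift_bits in (4, 2, 0):
        sel = (n >> shift_bits) & 3      # 2-bit select for this stage
        amt = sel << shift_bits          # 0/16/32/48, then 0/4/8/12, then 0..3
        x = ((x << amt) | (x >> (64 - amt))) & MASK64
    return x


def mask(mb, me):
    """Ones from bit mb through bit me inclusive, wrapping if mb > me."""
    left = MASK64 >> mb                       # bits mb..63
    right = (MASK64 << (63 - me)) & MASK64    # bits 0..me
    return left & right if mb <= me else left | right


def rlwinm(rs, sh, mb, me):
    lo = rs & 0xFFFFFFFF
    dup = (lo << 32) | lo   # a 32-bit rotate is a 64-bit rotate of the doubled word
    return rotl64(dup, sh) & mask(mb + 32, me + 32)


def rldicl(rs, sh, mb):
    return rotl64(rs, sh) & mask(mb, 63)


def rldicr(rs, sh, me):
    return rotl64(rs, sh) & mask(0, me)
```

The shift instructions fit the same datapath: a left shift is a rotate followed by a mask that clears the bits that wrapped around.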
Paul Mackerras c9e92483b8 decode: Push mtspr/mfspr register decoding down into execute1
Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops,
just pass down mfspr and mtspr ops with the spr number and let
execute1 decode which SPR we're addressing.  This will help reduce
the number of instruction bits decode1 needs to look at.

In fact we now pass down the whole instruction from decode2 to
execute1.  We will need more bits of the instruction in future,
and the tools should just optimize away any that we don't end
up using.  Since the 'aa' bit was just a copy of an instruction
bit, we can now remove it from the record.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Benjamin Herrenschmidt 586abb70a0 Update dependency
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard 26f70264b3 Update Makefile dependencies
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard b57325ce29 Merge branch 'divider' of https://github.com/paulusmack/microwatt
Paul Mackerras d5bc6c8824 Add a divider unit and a testbench for it
This adds a divider unit, connected to the core in much the same way
that the multiplier unit is connected.  The division algorithm is
very simple-minded, taking 64 clock cycles for any division (even
32-bit division instructions).

The decoding is simplified by making use of regularities in the
instruction encoding for div* and mod* instructions.  Instead of
having PPC_* encodings from the first-stage decoder for each of the
different div* and mod* instructions, we now just have PPC_DIV and
PPC_MOD, and the inputs to the divider that indicate what sort of
division operation to do are derived from instruction word bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
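A division that always takes 64 cycles, as in d5bc6c8824, is consistent with a bit-serial algorithm that produces one quotient bit per clock. The sketch below uses restoring shift-and-subtract division; that choice, and the function name, are assumptions rather than a description of the actual divider.

```python
def divide_unsigned(dividend, divisor):
    """Unsigned 64-bit division, one quotient bit per loop iteration."""
    assert divisor != 0, "division by zero is not modelled in this sketch"
    quotient = 0
    remainder = 0
    for i in range(63, -1, -1):          # 64 iterations, like 64 clock cycles
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:         # subtract only when it fits
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

The same loop yields both results, so a div* instruction would take the quotient and a mod* instruction the remainder.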
Benjamin Herrenschmidt 42d802bed0 Add distclean to Makefile
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt 98f0994698 Add core debug module
This module adds some simple core controls:

  reset, stop, start, step

along with icache clear and reading the NIA and core
status bits

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt b46f81fae4 Wishbone debug module
This adds a debug module off the DMI (debug) bus which can act as a
wishbone master to generate read and write cycles.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt ee52fd4d80 Add a debug (DMI) bus and a JTAG interface to it on Xilinx FPGAs
This adds a simple bus that can be mastered from an external
system via JTAG, which will be used to hook up various debug
modules.

It's loosely based on the RISC-V model (hence the DMI name).

The module currently only supports hooking up to a Xilinx BSCANE2
but it shouldn't be too hard to adapt it to support different TAPs
if necessary.

The JTAG protocol proper is not exactly the RISC-V one at this point,
though I might still change it.

This comes with some sim variants of Xilinx BSCANE2 and BUFG and a
test bench.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard 135805d2ac
Merge pull request from antonblanchard/execute-cleanup
execute1 no longer needs sim_console
Anton Blanchard 6d85920068 execute1 no longer needs sim_console
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard 1b6eef2a5d Fix multiply_tb
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard 1e3e16e500 Add an icache testbench
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard 89849a6856 Add a simple direct mapped icache
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard b6b2c78163 Update Makefile dependencies
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt 3ac1dbc737 Share soc.vhdl between FPGA and sim
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt 8bfd6e5eae Use simulated UART in core test bench
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard 03fd06deaf Rework SOC reset
The old reset code was overly complicated and never worked properly.
Replace it with a simpler sequence that uses a couple of shift registers
to assert resets:

- Wait a number of external clock cycles before removing reset from
  the PLL.

- After the PLL locks and the external reset button isn't pressed,
  wait a number of PLL clock cycles before removing reset from the SOC.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
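The shift-register approach described in 03fd06deaf can be modelled as a register that starts full of ones and shifts in zeros, with reset asserted until the ones have drained. The class below is a hypothetical sketch; the register length and the `hold` input are assumptions made for illustration.

```python
class ResetShifter:
    """Reset stays asserted for `length` clocks after `hold` is released."""

    def __init__(self, length):
        self.length = length
        self.bits = (1 << length) - 1    # all ones: reset asserted at power-on

    def clock(self, hold):
        if hold:
            # Condition not yet met: reload the register.
            self.bits = (1 << self.length) - 1
        else:
            self.bits >>= 1              # shift a zero in from the top

    @property
    def reset(self):
        return (self.bits & 1) == 1
```

Two instances would model the sequence in the commit message: one clocked by the external clock to release the PLL reset, and one clocked by the PLL clock with `hold` true while the PLL is unlocked or the reset button is pressed.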
Anton Blanchard 5e140298a5 Rework decode2
The decode2 stage was spaghetti code and needed cleaning up.
Create a series of functions to pull fields from a ppc instruction
and also a series of helpers to extract values for the execution
units.

As suggested by Paul, we should pass all signals to the execution
units and only set the valid signal conditionally, which should
use fewer resources.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard f98370f9e6
Merge pull request from antonblanchard/travis-test
Add an initial travis.yml
Anton Blanchard 2ee269abdb Add an initial travis.yml
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard 96787091a6 Add -Wall to CFLAGS
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
Anton Blanchard 5a29cb4699 Initial import of microwatt
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>