Commit Graph

115 Commits (cef3660e74f606f49d18df2fc0e27a858bbd4579)

Author SHA1 Message Date
Paul Mackerras 25b9450475 divider: Do absolute-value ops in divider instead of decode
This moves the negation of negative operands for signed divide and
modulus operations out of the decode2 stage and into the divider.
If either of the operands for a signed divide or modulus operation
is negative, the divider now takes an extra cycle to negate the
operands that are negative.

The interface to the divider now has an 'is_signed' signal rather
than a 'neg_result' signal, and the dividend and divisor can be
negative, so divider_tb had to be updated for the new interface.

The reason for doing this is that one of the worst timing violations
on the Arty A7-100 at 100MHz involved the carry chain in the adders
that did the negation of the dividend and divisor in the decode stage.
Moving the negations to a separate cycle fixes that and also seems to
reduce the total number of slice LUTs used.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Anton Blanchard b57325ce29 Merge branch 'divider' of https://github.com/paulusmack/microwatt 5 years ago
Paul Mackerras d5bc6c8824 Add a divider unit and a testbench for it
This adds a divider unit, connected to the core in much the same way
that the multiplier unit is connected.  The division algorithm is
very simple-minded, taking 64 clock cycles for any division (even
32-bit division instructions).

The decoding is simplified by making use of regularities in the
instruction encoding for div* and mod* instructions.  Instead of
having PPC_* encodings from the first-stage decoder for each of the
different div* and mod* instructions, we now just have PPC_DIV and
PPC_MOD, and the inputs to the divider that indicate what sort of
division operation to do are derived from instruction word bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
5 years ago
Benjamin Herrenschmidt 98f0994698 Add core debug module
This module adds some simple core controls:

  reset, stop, start, step

along with icache clear and reading the NIA and core
status bits

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org
5 years ago
Anton Blanchard 89849a6856 Add a simple direct mapped icache
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 1d00c75ecc Remove nia from loadstore and multiply
Neither unit needs the NIA, so remove it.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 92a7152370 Rework pipeline, add stall and flush signals
This adds stall and flush signals to the pipeline.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 31a6fb6ef5 More second write port removal
I missed the register file updates for the second write port
removal.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard fb4cad6eaf Remove second write port
We only need two write ports for load with update instructions.
Having two write ports just for this instruction is expensive.

For now we will force them to be the only instruction in the
pipeline, and take two cycles of writeback.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 0254e40685 Fix issues with CR rework
It simulated fine, but didn't synthesize. Fix some obvious issues
to get us going again.

Fixes: 9fbaea6f08 ("Rework CR file and add forwarding")
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard a1ab1d3e56
Merge pull request #25 from antonblanchard/register_file_printing
Clean up register read debug output
5 years ago
Anton Blanchard 04eb9583e6 Clean up register read debug output
Right now we continually print all 3 possible GPRs an instruction
may be using. Add signals so we only print GPRs when they are
actually read. This should hopefully optimise away when synthesized.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 9fbaea6f08 Rework CR file and add forwarding
Handle the CR as a single field with per nibble enables. Forward any
writes in the same cycle.

If this proves to be an issue for timing, we may want to revisit
this in the future. For now, it keeps things simple.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5e140298a5 Rework decode2
The decode2 stage was spaghetti code and needed cleaning up.
Create a series of functions to pull fields from a ppc instruction
and also a series of helpers to extract values for the execution
units.

As suggested by Paul, we should pass all signals to the execution
units and only set the valid signal conditionally, which should
use less resources.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5a29cb4699 Initial import of microwatt
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago