Commit Graph

9 Commits (9568e5f848894d402f20710b3b2f0abf8037c081)

Author SHA1 Message Date
Paul Mackerras 9568e5f848 multiply: Move data formatting out of decode2
At present, decode2 does some formatting of the input data for the
multiply unit - truncation to 32 bits for 32-bit operations and then
sign or zero extension to 65 bits.  This is going to prevent forwarding
of results within the execute pipeline in future, so we move the
formatting to the first cycle of the multiply pipeline.

It turns out that we have a wasted cycle at the front of the multiply
pipe, because decode2 has a register at its output and multiply has
a register at its input.  For now we use this cycle to do the data
formatting.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
6 years ago
Paul Mackerras 374f4c536d writeback: Do data formatting and condition recording in writeback
This adds code to writeback to format data and test the result
against zero for the purpose of setting CR0.  The data formatter
is able to shift and mask by bytes and do byte reversal and sign
extension.  It can also put together bytes from two input
doublewords to support unaligned loads (including unaligned
byte-reversed loads).

The data formatter starts with an 8:1 multiplexer that is able
to direct any byte of the input to any byte of the output.  This
lets us rotate the data and simultaneously byte-reverse it.
The rotated/reversed data goes to a register for the unaligned
cases that overlap two doublewords.  Then there is per-byte logic
that does trimming, sign extension, and splicing together bytes
from a previous input doubleword (stored in data_latched) and the
current doubleword.  Finally the 64-bit result is tested to set
CR0 if rc = 1.

This removes the RC logic from the execute2, multiply and divide
units, and the shift/mask/byte-reverse/sign-extend logic from
loadstore2.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
6 years ago
Benjamin Herrenschmidt 48e6e719d3 Multiply needs to be 16 stages to fix all timing issues
This seems dependent on the FPGA type/size, so we should probably
make it a toplevel generic, but for now this helps on the
Arty A7-35

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
6 years ago
Anton Blanchard 8dd97fbe7f Reformat multiply code
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago
Anton Blanchard 99dd4de54e Don't use VHDL 2008 condition operator in multiply
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago
Anton Blanchard 68533c4cfb Reduce multiply to 2 cycles
We want all non load/store ops to take 2 cycles to make
tracking write back easier.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago
Anton Blanchard a22afbdb5b Quieten multiply warning
We no longer gate multiply with the valid signal, so it's complaining
a lot. Comment out the warning.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago
Anton Blanchard 18b9b39a2c Simplify multiply
No need to gate everything with the valid bit.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago
Anton Blanchard 5a29cb4699 Initial import of microwatt
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
6 years ago