This adds an 'is_signed' signal to MultiplyInputType to indicate
whether the data1 and data2 fields are to be interpreted as signed or
unsigned numbers.
The 'not_result' field is replaced by a 'subtract' field which
provides a more intuitive interface for requesting that the product be
subtracted from the addend rather than added, i.e. subtract = 1 gives
C - A * B, vs. subtract = 0 giving C + A * B. (Previously the users
of the multipliers got the same effect by complementing the addend and
setting not_result = 1.)
The is_32bit field is removed because it is no longer used now that we
have a separate 32-bit multiplier.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This rearranges the way that partial products are generated and summed
so that the partial products that could be negative in a signed
multiplier are now sign-extended. The inputs are still zero-extended,
however.
The overflow detection logic now only detects 64-bit overflow, since
32-bit multiplications are handled in a separate multiplier.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds an optional 16 bit x 16 bit signed multiplier and uses it
for multiply instructions that return the low 64 bits of the product
(mull[dw][o] and mulli, but not maddld) when the operands are both in
the range -2^15 .. 2^15 - 1. The "short" 16-bit multiplier produces
its result combinatorially, so a multiply that uses it executes in one
cycle. This improves the coremark result by about 4%, since coremark
does quite a lot of multiplies and they almost all have operands that
fit into 16 bits.
The presence of the short multiplier is controlled by a generic at the
execute1, SOC, core and top levels. For now, it defaults to off for
all platforms, and can be enabled using the --has_short_mult flag to
fusesoc.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This changes s0 to use the P register rather than the A/B/C input
registers, thus improving the timing of the multiplier output. The
m00, m02 and m03 multipliers now use their P registers rather than the
M registers, moving the addition they do from the second cycle to the
first.
Also, the XOR that inverts the 32 LSBs is moved before the output
register.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We now expect the overflow signal from the multiplier to come along
one cycle later than the product.
This breaks up a long combinatorial path and improves timing.
This also changes some uses of v.<field> to r.<field> in the slow
op logic, which should help timing as well.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This makes the interface to the multiplier more general so an instance
of it can be used in the FPU. It now has a 128-bit addend that is
added on to the product. Instead of an input to negate the output,
it now has a "not_result" input to complement the output. Execute1
uses not_result=1 and addend=-1 to get the effect of negating the
output. The interface is defined this way because this is what can
be done easily with the Xilinx DSP slices in xilinx-mult.vhdl.
This also adds clock enable signals to the DSP slices, mostly for the
sake of reducing power consumption.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a custom implementation of the multiplier which uses 16
DSP48E1 slices to do a 64x64 bit multiplication in 2 cycles.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>