Commit Graph

5 Commits (6ef9395f10c07d30233b0f32585f16f2baca427e)

Author SHA1 Message Date
Paul Mackerras 0ceace927c Xilinx FPGAs: Eliminate Vivado critical warnings
This resolves various warnings and critical warnings from Vivado.

In particular, the asynchronous loops in the xilinx hardware RNG were
giving a lot of critical warnings, which proved to be difficult to
suppress, so this instead makes all the xilinx platforms use the
'nonrandom.vhdl' implementation, which always returns an error.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras e030a500e8 Allow integer instructions and load/store instructions to execute together
Execute1 and loadstore1 now send each other stall signals that
indicate that a valid instruction in stage 2 can't complete in this
cycle, and hence any valid instruction in stage 1 in the other unit
can't move to stage 2.  With this in place, an ALU instruction can
move into stage 1 while a LSU instruction is in stage 2.

Since the FPU doesn't yet have a way to stall completion, we can't yet
start FPU instructions while any LSU or ALU instruction is in
progress.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2 years ago
Paul Mackerras 1720a0584a Use alternative count-leading-zeroes algorithm in the FPU and LSU
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 1086988883 countzero: Use alternative algorithm for higher bits
This implements an alternative count-leading-zeroes algorithm which
uses less LUTs to generate the higher-order bits (2..5) of the
result.

By doing (v | -v) rather than (v & -v), we get a value which has ones
from the MSB down to the rightmost 1 bit in v and then zeroes down to
the LSB.  This means that we can generate the MSB of the result (the
index of the rightmost 1 bit in v) just by looking at bits 63 and 31
of (v | -v), assuming that v is 64 bits.  Bit 4 of the result requires
looking at bits 63, 47, 31 and 15.  In contrast, each bit of the
result using (v & -v), which has a single 1, requires ORing together
32 bits.

It turns out that the minimum LUT usage comes from using (v & -v) to
generate bits 0 and 1 of the result, and using (v | -v) to generate
bits 2 to 5.  This saves almost 60 6-input LUTs on the Artix-7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras 2491aa7fc5 core: Make popcnt* take two cycles
This moves the calculation of the result for popcnt* into the
countbits unit, renamed from countzero, so that we can take two cycles
to get the result.  The motivation for this is that the popcnt*
calculation was showing up as a critical path.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago