microwatt

Commit Graph

Author	SHA1	Message	Date
Lars Asplund	478b787c10	Replaced VHDL assert and report with VUnit checking and logging The VUnit log package is a SW style logging framework in VHDL and the check package is an assertion library doing its error reporting with VUnit logging. These testbenches don't use, and do not need, very advanced logging/checking features but the following was possible to improve - Checking equality in VHDL can be quite tedious with a lot of type conversions and long message strings to explain the data received and what was expected. VUnit's check_equal procedure allow comparison between same or similar types and automatically create the error message for you. - The code has report statements used for testbench progress reporting and debugging. These were replaced with the info and debug procedures. info logs are visible by default while debug is not. This means that debug logs don't have to be commented, which they are now, when not used. Instead there is a show procedure making debug messages visible. The show procedure has been commented to hide the debug messages but a more elegant solution is to control visibility from a generic and then set that generic from the command line. I've left this as a TODO but the run script allow you to extend the standard CLI of VUnit to add new options and you can also set generics from the run script. - VUnit log messages are color coded if color codes are supported by the terminal. It makes it quicker to spot messages of different types when there are many log messages. Error messages will always be made visible on the terminal but you must use the -v (verbose) to see other logs. - Some tests have a lot of "metvalue detected" warning messages from the numeric_std package and these clutter the logs when using the -v option. VUnit has a simulator independent option allowing you to suppress those messages. That option has been enabled. Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>	4 years ago
Lars Asplund	0940b8a9d3	Organized VUnit testbenches into test cases. Several of the testbenches have stimuli code divided into sections preceded with a header comment explaining what is being tested. These sections have been made into VUnit test cases. The default behavior of VUnit is to run each test case in a separate simulation which comes with a number of benefits: * A failing test case doesn't prevent other test cases to be executed * Test cases are independent. A test case cannot fail as a side-effect to a problem with another test case * Test execution can be more parallelized and the overall test execution time reduced Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>	4 years ago
Lars Asplund	08c0c4c1b4	Make core testbenches recognized by VUnit This commit also removes the dependencies these testbenches have on VHPIDIRECT. The use of VHPIDIRECT limits the number of available simulators for the project. Rather than using foreign functions the testbenches can be implemented entirely in VHDL where equivalent functionality exists. For these testbenches the VHPIDIRECT-based randomization functions were replaced with VHDL-based functions. The testbenches recognized by VUnit can be executed in parallel threads for better simulation performance using the -p option to the run.py script Signed-off-by: Lars Asplund <lars.anders.asplund@gmail.com>	4 years ago
Anton Blanchard	ab86b58d95	Exit cleanly from testbench on success Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Paul Mackerras	c9a2076dd3	execute1: Remember dest GPR, RC, OE, XER for slow operations For multiply and divide operations, execute1 now records the destination GPR number, RC and OE from the instruction, and the XER value. This means that the multiply and divide units don't need to record those values and then send them back to execute1. This makes the interface to those units a bit simpler. They simply report an overflow signal along with the result value, and execute1 takes care of updating XER if necessary. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	39d18d2738	Make divider hang off the side of execute1 With this, the divider is a unit that execute1 sends operands to and which sends its results back to execute1, which then send them to writeback. Execute1 now sends a stall signal when it gets a divide or modulus instruction until it gets a valid signal back from the divider. Divide and modulus instructions are no longer marked as single-issue. The data formatting step that used to be done in decode2 for div and mod instructions is now done in execute1. We also do the absolute value operation in that same cycle instead of taking an extra cycle inside the divider for signed operations with a negative operand. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	374f4c536d	writeback: Do data formatting and condition recording in writeback This adds code to writeback to format data and test the result against zero for the purpose of setting CR0. The data formatter is able to shift and mask by bytes and do byte reversal and sign extension. It can also put together bytes from two input doublewords to support unaligned loads (including unaligned byte-reversed loads). The data formatter starts with an 8:1 multiplexer that is able to direct any byte of the input to any byte of the output. This lets us rotate the data and simultaneously byte-reverse it. The rotated/reversed data goes to a register for the unaligned cases that overlap two doublewords. Then there is per-byte logic that does trimming, sign extension, and splicing together bytes from a previous input doubleword (stored in data_latched) and the current doubleword. Finally the 64-bit result is tested to set CR0 if rc = 1. This removes the RC logic from the execute2, multiply and divide units, and the shift/mask/byte-reverse/sign-extend logic from loadstore2. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	d4f51e08c8	divider: Return 0 for invalid and overflow cases, like P9 does This adds logic to detect the cases where the quotient of the division overflows the range of the output representation, and return all zeroes in those cases, which is what POWER9 does. To do this, we extend the dividend register by 1 bit and we do an extra step in the division process to get a 2^64 bit of the quotient, which ends up in the 'overflow' signal. This catches all the cases where dividend >= 2^64 * divisor, including the case where divisor = 0, and the divde/divdeu cases where \|RA\| >= \|RB\|. Then, in the output stage, we also check that the result fits in the representable range, which depends on whether the division is a signed division or not, and whether it is a 32-bit or 64-bit division. If dividend >= 2^64 or the result doesn't fit in the representable range, write_data is set to 0 and write_cr_data to 0x20000000 (i.e. cr0.eq = 1). POWER9 sets the top 32 bits of the result to zero for 32-bit signed divisions, and sets CR0 when RC=1 according to the 64-bit value (i.e. CR0.LT is always 0 for 32-bit signed divisions, even if the 32-bit result is negative). However, modsw with a negative result sets the top 32 bits to all 1s. We follow suit. This updates divider_tb to check the invalid cases as well as the valid case. This also fixes a small bug where the reset signal for the divider was driven from rst when it should have been driven from core_rst. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	25b9450475	divider: Do absolute-value ops in divider instead of decode This moves the negation of negative operands for signed divide and modulus operations out of the decode2 stage and into the divider. If either of the operands for a signed divide or modulus operation is negative, the divider now takes an extra cycle to negate the operands that are negative. The interface to the divider now has an 'is_signed' signal rather than a 'neg_result' signal, and the dividend and divisor can be negative, so divider_tb had to be updated for the new interface. The reason for doing this is that one of the worst timing violations on the Arty A7-100 at 100MHz involved the carry chain in the adders that did the negation of the dividend and divisor in the decode stage. Moving the negations to a separate cycle fixes that and also seems to reduce the total number of slice LUTs used. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	d5bc6c8824	Add a divider unit and a testbench for it This adds a divider unit, connected to the core in much the same way that the multiplier unit is connected. The division algorithm is very simple-minded, taking 64 clock cycles for any division (even 32-bit division instructions). The decoding is simplified by making use of regularities in the instruction encoding for div* and mod* instructions. Instead of having PPC_* encodings from the first-stage decoder for each of the different div* and mod* instructions, we now just have PPC_DIV and PPC_MOD, and the inputs to the divider that indicate what sort of division operation to do are derived from instruction word bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago

10 Commits (boxarty-20211011)