microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	fdb3ef6874	Finish off taking SPRs out of register file. With this, the register file now contains 64 entries, for 32 GPRs and 32 FPRs, rather than the 128 it had previously. Several things get simplified - decode1 no longer has to work out the ispr{1,2,o} values, decode_input_reg_{a,b,c} no longer have the t = SPR case, etc. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2 years ago
Paul Mackerras	2491aa7fc5	core: Make popcnt* take two cycles. This moves the calculation of the result for popcnt* into the countbits unit, renamed from countzero, so that we can take two cycles to get the result. The motivation for this is that the popcnt* calculation was showing up as a critical path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	a68921edca	core: Fix mcrxrx, addpcis and bpermd. (1) mcrxrx put the bits in the wrong order; (2) addpcis was setting CR0 if the instruction bit 0 = 1, which it shouldn't; (3) bpermd was producing 0 always and additionally had the wrong bit numbering. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	658feabfd4	core: Make result multiplexing explicit. This adds an explicit multiplexer feeding v.e.write_data in execute1, with the select lines determined in the previous cycle based on the insn_type. Similarly, for multiply and divide instructions, there is now an explicit multiplexer. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	83816cb9e3	core: Implement BCD Assist instructions addg6s, cdtbcd, cbcdtod. To avoid adding too much logic, this moves the adder used by OP_ADD out of the case statement in execute1.vhdl so that the result can be used by OP_ADDG6S as well. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	b739372f7e	core: Implement the bpermd instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	9b40b5a77b	logical: Only do output inversion for OP_AND, OP_OR and OP_XOR. It's not needed for the other ops (popcnt, parity, etc.) and the logical unit shows up as a critical path from time to time. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	ec2fa61792	execute1: Reduce width of the result mux to help timing. This reduces the number of different things that are assigned to the result variable: (1) the computations for the popcnt, prty, cmpb and exts instruction families are moved into the logical unit; (2) the result of mfspr from the slow SPRs is computed in 'spr_val' before being assigned to 'result'; (3) writes to LR as a result of a blr or bclr instruction are done through the exc_write path to writeback. This eases timing considerably. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	0c714f1be6	execute: Move popcnt and prty instructions into the logical unit. This implements logic in the logical entity to calculate the results of the popcnt* and prty* instructions. We now have one insn_type_t value for the 3 popcnt variants and one for the two prty variants, using the length field of the decode_rom_t to distinguish between them. The implementations in logical.vhdl use recursive algorithms rather than the simple functions in ppc_fx_insns.vhdl. This gives a saving of about 140 slice LUTs on the A7-100 and improves timing slightly. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Anton Blanchard	b8fb721b81	Consolidate logical instructions. Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common invert on the input and output. This saves us about 200 LUTs. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago

10 Commits (41328306f3c7308425018748b09ba4d9f44febc5)
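
The consolidation described in commit b8fb721b81 (one AND, one OR and one XOR operation, with a shared inversion on the input and on the output) can be illustrated with a small behavioural model. This is only a sketch in Python, not the VHDL from the repository: the names `logical`, `VARIANTS` and `execute`, the choice of inverting the second operand, and the mapping of eqv to XOR with output inversion are assumptions made for illustration, based on the usual Power ISA definitions of these instructions.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers


def logical(op, rs, rb, invert_in=False, invert_out=False):
    """Hypothetical shared datapath: optional input invert, one base op, optional output invert."""
    if invert_in:
        rb = ~rb & MASK64          # assumed: the inversion applies to the second operand
    if op == "and":
        result = rs & rb
    elif op == "or":
        result = rs | rb
    elif op == "xor":
        result = rs ^ rb
    else:
        raise ValueError(f"unknown base op: {op}")
    if invert_out:
        result = ~result           # masked to 64 bits on return
    return result & MASK64


# Each mnemonic becomes (base op, invert_in, invert_out); this table is illustrative only.
VARIANTS = {
    "and":  ("and", False, False),
    "andc": ("and", True,  False),
    "nand": ("and", False, True),
    "or":   ("or",  False, False),
    "orc":  ("or",  True,  False),
    "nor":  ("or",  False, True),
    "xor":  ("xor", False, False),
    "eqv":  ("xor", False, True),
}


def execute(mnemonic, rs, rb):
    """Look up the decode entry for a mnemonic and run it through the shared datapath."""
    op, inv_in, inv_out = VARIANTS[mnemonic]
    return logical(op, rs, rb, inv_in, inv_out)
```

For example, `execute("andc", 0b1100, 0b1010)` evaluates `0b1100 & ~0b1010`. Eight mnemonics share three base operations, which is the kind of sharing the commit message credits with saving about 200 LUTs.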