microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	24a4a796ce	execute: Consolidate count-leading/trailing-zeroes implementations This adds combinatorial logic that does 32-bit and 64-bit count leading and trailing zeroes in one unit, and consolidates the four instructions under a single OP_CNTZ opcode. This saves 84 slice LUTs on the Arty A7-100. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Anton Blanchard	b8fb721b81	Consolidate logical instructions Consolidate and/andc/nand, or/orc/nor and xor/eqv, using a common invert on the input and output. This saves us about 200 LUTs. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Paul Mackerras	f7c393ba7e	Add a rotate/mask/shift unit and use it in execute1 This adds a new entity 'rotator' which contains combinatorial logic for rotating and masking 64-bit values. It implements the operations of the rlwinm, rlwnm, rlwimi, rldicl, rldicr, rldic, rldimi, rldcl, rldcr, sld, slw, srd, srw, srad, sradi, sraw and srawi instructions. It consists of a 3-stage 64-bit rotator using 4:1 multiplexors at each stage, two mask generators, output logic and control logic. The insn_type_t values used for these instructions have been reduced to just 5: OP_RLC, OP_RLCL and OP_RLCR for the rotate and mask instructions (clear both left and right, clear left, clear right variants), OP_SHL for left shifts, and OP_SHR for right shifts. The control signals for the rotator are derived from the opcode and from the is_32bit and is_signed fields of the decode_rom_t. The rotator is instantiated as an entity in execute1 so that we can be sure we only have one of it. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	90b6e27380	Generalize the mul_32bit and mul_signed fields of decode_rom_t This changes the names of the mul_32bit and mul_signed fields of decode_rom_t to is_32bit and is_signed, so they can be used with other types of operations besides multiplies. This plumbs the is_32bit and is_signed flags down into execute1, though they are not used at this point. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	7fe84220a5	decode: Avoid multiplexing from instruction reg fields to regfile address ports This aims to simplify the logic between the instruction image and the register file read address ports and reduce the size of the decode tables. With this patch, the input_reg_a column of the decode tables can only select RA or zeroes, the input_reg_b column can only select RB or a constant (0, -1, or an immediate value from the instruction), and the input_reg_c column can only select RS or zeroes. That means that the rotate/shift/logical ops now have their first input coming in via the input_reg_c column. That means we need to add a read_data3 field to the Decode2ToExecuteType record, but that will go away again when we split out the rotate/mask/logical ops to their own unit. As a related but not tightly connected change, this patch also sets the read1_enable signal to the register file to be 0 when RA=0 and the input_reg_a for the instruction is RA_OR_ZERO (previously it was 1). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	96b402a4bf	Consolidate add/subtract instructions into a single op All of the PPC add and subtract instructions, including carrying and extended versions, do much the same arithmetic operation: result = (I xor A) + B + C where A is the value from RA, I provides a logical inversion of A (i.e. I is 0 or -1), B is either from RB or is a constant 0 or -1, and C is 0, 1 or the carry bit from XER (CA). To consolidate all the add/subtract instructions into a single OP_ADD, we add a column to decode_rom_t to indicate when A should be inverted, and change the input_carry field to a 3-state selector to select C in the equation above. This also adds a new "CONST_M1" value for input_reg_b_t to indicate that B is a constant -1. This allows us to implement addme and subfme. The addex instruction appears not to exist, so the comments referring to it are removed. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	b0f302ecf4	decode: Make all update-form indexed loads and stores use RA_OR_ZERO Experimentation on POWER9 indicates that the invalid form of lbzux with RA=0 uses just RB as the address, not R0 + RB. Extrapolating this to all update-form loads and stores with RA=0, change all the update-form loads and stores to use RA_OR_ZERO rather than RA. This then means that all decode ROM entries with insn_type = LDST have input_reg_a = RA_OR_ZERO. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	58b06eb5f3	decode: Remove const fields from decode_rom_t The const* fields of decode_rom_t drove multiplexers in decode2 that picked out various instruction fields and put them into the const* fields of the Decode2ToExecute1Type record, from where they were used in execute1. However, the code in execute1 can just as easily use the appropriate fields of the original instruction word, since that is now available in execute1. This therefore changes the code to do that, resulting in smaller decode tables. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	143d0ae9e4	decode: Fix larx/stcx instructions to use RA_OR_ZERO not RA The l?arx and st?cx. instructions are defined to use the normal indexed mode address calculations, i.e. (RA|0) + RB. Fix their entries in the decode table to say RA_OR_ZERO rather than RA. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	bbae2d1eda	decode: Index minor op table with insn bits for opcode 31 This changes decode_op_31_array from being indexed by a ppc_insn_t (which is derived from the instruction word by a whole series of if/elsif statements) to being indexed directly by bits 10...1 of the instruction word. With this we no longer need ppc_insn. This then means that the decode1 stage doesn't distinguish between mfcr and mfocrf, or between mtcrf and mtocrf, since those are distinguished by the value in bit 20 of the instruction. To accommodate that, execute1 changes so that the one op value (OP_MFCR) does either the mfcr or the mfocrf behaviour depending on bit 20 of the instruction word; and similarly for mtcrf/mtocrf. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	21d3f8a5ed	decode: Index minor op table with insn bits for opcode 30 This comprises the 64-bit rotate and mask instructions. In order to reduce the table index to 3 bits, we combine rldcl and rldcr into a single op (OP_RLDCX), and choose the right mask at execute time based on bit 1 of the instruction word. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	00e9f801f6	decode: Index minor op table with insn bits for opcode 19 This changes the decoding of major opcode 19 from using the ppc_insn_t index to using bits of the instruction word directly. Opcode 19 has a 10-bit minor opcode field (bits 10..1) but the space is sparsely filled. Therefore we index a table of single-bit entries with the 10-bit minor opcode to filter out the illegal minor opcodes, and index a table using just 3 bits -- 5, 3 and 2 -- of the instruction to get the decode entry. This groups together all the instructions in 4 columns of the opcode map as a single entry. That means that mcrf and all the CR logical ops get grouped together, and bcctr, bclr and bctar get grouped together. At present the CR logical ops are not implemented, so their grouping has no impact. The code for bclr and bcctr in execute1 is now common, using a single op, and it now determines the branch address by looking at bit 10 of the instruction word at execute time. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	e30a87593a	decode: Start moving towards decoding by major opcode first With this, we have a table for most major opcodes and separate tables for each major opcode that has further decoding required. These tables are still mostly indexed by the ppc_insn_t values, however. A few things are still decoded completely at the top level: nop, attn and sim_config. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Paul Mackerras	c9e92483b8	decode: Push mtspr/mfspr register decoding down into execute1 Instead of doing mfctr, mflr, mftb, mtctr, mtlr as separate ops, just pass down mfspr and mtspr ops with the spr number and let execute1 decode which SPR we're addressing. This will help reduce the number of instruction bits decode1 needs to look at. In fact we now pass down the whole instruction from decode2 to execute1. We will need more bits of the instruction in future, and the tools should just optimize away any that we don't end up using. Since the 'aa' bit was just a copy of an instruction bit, we can now remove it from the record. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Benjamin Herrenschmidt	3e6f656a90	Add MCRF instruction Hopefully it's not too timing catastrophic. The variable newcrf will be handy for the other CR ops when we implement them I suspect. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Benjamin Herrenschmidt	554ae88540	Implement absolute branches Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	b57325ce29	Merge branch 'divider' of https://github.com/paulusmack/microwatt	5 years ago
Anton Blanchard	5a6f8d26d1	Rename OP_SUBFC -> OP_SUBFE, OP_ADDC -> OP_ADDE These were somewhat badly named. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Paul Mackerras	d5bc6c8824	Add a divider unit and a testbench for it This adds a divider unit, connected to the core in much the same way that the multiplier unit is connected. The division algorithm is very simple-minded, taking 64 clock cycles for any division (even 32-bit division instructions). The decoding is simplified by making use of regularities in the instruction encoding for div* and mod* instructions. Instead of having PPC_* encodings from the first-stage decoder for each of the different div* and mod* instructions, we now just have PPC_DIV and PPC_MOD, and the inputs to the divider that indicate what sort of division operation to do are derived from instruction word bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	5 years ago
Benjamin Herrenschmidt	98f0994698	Add core debug module This module adds some simple core controls: reset, stop, start, step along with icache clear and reading the NIA and core status bits. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	7bb88d5321	Merge pull request #59 from antonblanchard/trap-decode Fix make check	5 years ago
Anton Blanchard	427effdaa9	Fix make check We need to finish support for all the trap instructions, but for now we at least need a decode entry for tw, so we know to stall until the previous instruction completes. Some of our test cases were failing because the trap executed before the previous instruction completed. All these trap instructions need to be resolved at completion, not in execute. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	9867fb6149	Add a decode for the nop instruction We want these to go out without any GPR dependencies, so add a specific entry in decode for them. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	acdb2ea157	No need to gate nia or insn in decode1 Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	b9e28598b4	Explicitly check against '1' in if statements nvc doesn't like what I think is a VHDL 2008 construct. Let's just check against '1' explicitly. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	a2df2a10a2	Remove sim console We can force all existing code to use the UART console by passing 0 in bit zero of the sim config register. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	92a7152370	Rework pipeline, add stall and flush signals This adds stall and flush signals to the pipeline. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	9687034d78	Add a decode bit to mark an instruction as single through the pipeline This is used by the pipelining patches. Mark everyone as single through the pipeline to start. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Benjamin Herrenschmidt	b0ade2857f	decode1 array fix header Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	5 years ago
Anton Blanchard	9fbaea6f08	Rework CR file and add forwarding Handle the CR as a single field with per nibble enables. Forward any writes in the same cycle. If this proves to be an issue for timing, we may want to revisit this in the future. For now, it keeps things simple. Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
Anton Blanchard	5a29cb4699	Initial import of microwatt Signed-off-by: Anton Blanchard <anton@linux.ibm.com>	5 years ago
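
Commit b8fb721b81 above describes folding and/andc/nand, or/orc/nor and xor/eqv into three base operations with a common invert on the input and on the output. The following is a minimal Python sketch of that idea, not the VHDL from the repository. The function name `logical`, its parameters, and the choice to apply the input invert to the RB operand are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers


def logical(op, rs, rb, invert_in=False, invert_out=False):
    """Hypothetical model of a consolidated logical unit.

    op is one of "and", "or", "xor". invert_in complements the RB
    operand before the operation (an assumption), and invert_out
    complements the result afterwards.
    """
    if invert_in:
        rb = ~rb & MASK64
    if op == "and":
        result = rs & rb
    elif op == "or":
        result = rs | rb
    elif op == "xor":
        result = rs ^ rb
    else:
        raise ValueError("unknown base operation: " + op)
    if invert_out:
        result = ~result
    return result & MASK64


# The eight instructions expressed as (base op, invert_in, invert_out).
VARIANTS = {
    "and": ("and", False, False),
    "andc": ("and", True, False),
    "nand": ("and", False, True),
    "or": ("or", False, False),
    "orc": ("or", True, False),
    "nor": ("or", False, True),
    "xor": ("xor", False, False),
    "eqv": ("xor", False, True),
}
```

With this encoding a decode table would only need a base operation plus two invert flags per instruction, instead of a separate opcode for each of the eight variants.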

31 Commits (24a4a796ce1e4bf370e00f801bc1cee6faf7d8f7)
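
Commit 96b402a4bf above states that all the add and subtract instructions compute result = (I xor A) + B + C, where I is 0 or -1, B comes from RB or is a constant 0 or -1, and C is 0, 1 or the carry bit. The following is a small Python sketch of that equation on 64-bit values. The function name `op_add` and its parameter names are hypothetical, and the per-instruction settings in the comments follow from the equation and the usual Power ISA definitions rather than from the repository's decode tables.

```python
MASK64 = (1 << 64) - 1  # model 64-bit registers; also the constant -1


def op_add(a, b, invert_a=False, carry_in=0):
    """Compute (I xor A) + B + C and return (result, carry_out).

    invert_a selects I = -1 (all ones) instead of I = 0.
    carry_in is C: 0, 1, or the current carry bit.
    """
    i = MASK64 if invert_a else 0
    total = ((a ^ i) & MASK64) + (b & MASK64) + carry_in
    return total & MASK64, total >> 64


def add(ra, rb):
    # add: I = 0, B = RB, C = 0
    return op_add(ra, rb)


def subf(ra, rb):
    # subf (RB - RA): I = -1, B = RB, C = 1
    return op_add(ra, rb, invert_a=True, carry_in=1)


def addme(ra, ca):
    # addme (RA + CA - 1): I = 0, B = constant -1, C = CA
    return op_add(ra, MASK64, carry_in=ca)


def subfme(ra, ca):
    # subfme (~RA + CA - 1): I = -1, B = constant -1, C = CA
    return op_add(ra, MASK64, invert_a=True, carry_in=ca)
```

Each instruction differs only in the three selectors (invert A, source of B, source of C), which is what lets a single operation cover the whole family.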