microwatt

Commit Graph

Author	SHA1	Message	Date
Paul Mackerras	7619df6b78	core: Implement HRMOR as a read-only zero register (#450) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	7 months ago
Paul Mackerras	d2bf3f3580	core: Implement hypervisor doorbell interrupt and msg* instructions This implements the hypervisor doorbell exception and interrupt and the msgsnd, msgclr and msgsync instructions (msgsync is a no-op). The msgsnd instruction can generate a hypervisor doorbell interrupt on any CPU in the system. To achieve this, each core sends its hypervisor doorbell messages to the soc level, which ORs together the bits for each CPU and sends it to that CPU. The privileged doorbell exception/interrupt and the msgsndp/msgclrp instructions are not required since we don't implement SMT. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	ca872faede	core: Consolidate several OP_* values into a single OP_COMPUTE This replaces OP_ADDG6S, OP_BCD, OP_BREV, OP_CMPB, OP_CMPEQB, OP_CMPRB, OP_CROP, OP_EXTS, OP_EXTSWSLI, OP_ISEL, OP_LOGIC, OP_MFCR, OP_PRTY, OP_RLC, OP_RLCL, OP_RLCR, OP_SETB, OP_SHL, OP_SHR, and OP_XOR with a single OP_COMPUTE. The replaced operations are all ones which just compute a result value (for GPR or CR) in execute1, don't have any other side effects, and aren't used in decode2 to determine other signals. The operation to be performed is sufficiently defined by the result and subresult fields in the decode table. With the elimination of OP_SPARE, this reduces the number of insn_type_t values to 44. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	8 months ago
Paul Mackerras	8f6c727309	execute1: Rework data paths for mfspr and mtspr Data being written to an SPR by mtspr now comes in to execute2 via ex1.write_spr_data (renamed from ex1.ramspr_odd_data) rather than ex1.e.write_data. This eliminates the need for the main result mux in execute1 to be able to pass the c_in value through. For mfspr, the no-op behaviour is obtained by selecting ex1.write_spr_data as spr_result in execute2. We already had ex1.write_spr_data being set from c_in, so no new logic is required there. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	10 months ago
Paul Mackerras	fc3ff2d340	logical: Use sub_select rather than insn_type to select logical op Also select the RS passthrough in the logical unit by default for mfspr, which is needed for the no-op SPRs and the no-op behaviour of privileged mfspr to unimplemented SPRs. For slow SPRs the RS behaviour gets passed through from execute1 to execute2 and replaced by the correct result in execute2's result mux. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	10 months ago
Paul Mackerras	54173a0677	decode: Move result_sel and subresult_sel into main decode table Instead of working out result_sel and subresult_sel in decode2 from the insn_type, they now come directly from the main decode table in decode1. This reduces the need for distinct insn_type values and should enable us to avoid expanding insn_type beyond 6 bits. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	10 months ago
Paul Mackerras	e14712e45c	core: Simplify operand presentation for hash instructions This removes the cases in the decode stages which allowed the C register address to come from the RB field for the hash instructions (hashst[p], hashchk[p]), and generated a negative immediate value for the B operand. The motivation is to simplify the logic for the C register address. Instead, the unusual construction of the address for the hash instructions is handled in the loadstore1_in process, and the hash computation uses the A and B operands rather than A and C. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	11 months ago
Paul Mackerras	b14dd43ce6	Merge pull request #443 from paulusmack/compliance More architecture compliance improvements: LPCR, [U]SIER[23], [U]MMCR3, HMER, HMEER. Remove HFSCR and associated logic.	12 months ago
Paul Mackerras	8f7326a824	core: Implement various SPRs which read zero and ignore writes This implements [U]SIER2, [U]SIER3, [U]MMCR3, HMER and HMEER as SPRs which return zero when read, and ignore writes. The zero value is provided via the slow SPR read multiplexer. To avoid increasing the size of the selector from 4 bits to 5, the (implementation specific) LOG_ADDR and LOG_DATA SPRs now share a single selector value. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	12 months ago
Paul Mackerras	1b6ee631bc	core: Implement LPCR register This implements the LPCR (Logical Partition Control Register) with 5 read/write bits. The other 59 bits are read-only; two (HR and UPRT) read as 1 and the rest as 0. The bits that are implemented are: * HAIL - enables taking interrupts with relocation on * LD - enables large decrementer mode * HEIC - disables external interrupts when set * LPES - controls how external interrupts are delivered * HVICE - does nothing at present since there is no source of Hypervisor Virtualization Interrupts. This also fixes a bug where MSR[RI] was getting cleared by the delivery of hypervisor interrupts. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	63fff5e05c	core: Remove HFSCR and Hypervisor Facility Unavailable interrupt logic HFSCR is associated with the LPAR (Logical Partitioning) feature, which is not required for SFFS designs, so remove it and the associated logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	b63773f6e9	FPU: Move computation of main adder inputs out of the state machine Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	413907e4bc	soc: Move timebase back into the core and enable writing to it Instead of a single global timebase register in the SoC, we now have a timebase counter in each core; however, now they are only reset by the soc reset, not the core reset. Thus they stay in sync even when some cores are disabled (via the syscon cpu_ctrl register). This implements mtspr to the TBLW and TBUW SPRs, which write the lower and upper 32 bits of this core's timebase, respectively. In order to fulfil the ISA's requirements that (a) some method for getting the timebases into sync and (b) some method for preventing userspace from reading the timebase be provided by the platform, this adds a syscon register TB_CTRL with two read/write bits implemented; bit 0 freezes all the timebases in the system when set, and bit 1 makes reading the timebase privileged (in all cores). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	f705fc5e19	core: Implement reserved/no-op SPR numbers SPR numbers 808 - 811 do nothing when read or written, that is, mfspr doesn't modify the destination register. This is accomplished in the same way that privileged mfspr to an unimplemented SPR is made a no-op, by supplying the old contents of the destination register as an input and writing that same value back. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	c49c32b5fe	core: Implement DEXCR and HDEXCR registers Of the defined aspect bits (which are all read-write), only the NPHIE and PHIE bits have any function at all, since Microwatt is an in-order single-issue machine and never does any branch speculation. Also, since there is no privileged non-hypervisor mode, the high 32 bits of DEXCR do nothing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	1395bde3cc	core: Store hash key SPRs in the SPR RAM This moves HASHKEYR and HASHPKEYR to the SPR RAM that also stores things such as SRR0/1, LR and CTR. For hashst[p] and hashchk[p] instructions, execute1 reads the relevant key register from the RAM and sends it to loadstore1. This saves several LUTs. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	2c7d1e5d9c	decode: Split input B selection into two fields Instead of a single input_reg_b_t field in the decode table which selects both whether input B is a register or constant and which constant (immediate value) to use, we now have one field which selects whether input B is immediate (constant), a GPR, or an FPR, and a separate field to select which sort of immediate value to use. This results in simpler logic and better timing. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	3bcc31fdda	core: Implement hashstp and hashchkp instructions and HASHPKEYR register These provide facilities similar to hashst, hashchk and HASHKEYR, but restricted to privileged mode. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	00a3db8457	decode1: Indicate instruction privilege in main decode table Previously the computation of whether an instruction is privileged or not was done based on the insn_type. However, that meant that lcix (OP_LOAD) and stcix (OP_STORE) couldn't be made privileged, and neither could tlbsync (OP_NOP). Instead, this adds a field to the main instruction decode table to indicate privileged instructions, and makes the cache-inhibited loads and stores privileged, along with tlbsync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	0a11e8455f	core: Implement hashst and hashchk instructions These are done in loadstore1. The HashDigest function is computed in 9 cycles; for 8 cycles, a state machine does 4 steps of key expansion per cycle, and for each of 4 lanes of data, does 4 steps of ciphering; then there is 1 cycle to combine the results into the final hash value. At present, hashchk does not overlap the computation of the hash with fetching of data from memory (in the case of a cache miss). The 'is_signed' field in the instruction decode table is used to distinguish hashst and hashchk from ordinary loads and stores. We have a new 'RBC' value for input_reg_c_t which says that we are reading RB but we want the value to come in via the C port; this is because we want the 5-bit immediate offset on the B port. Note that in the list of insn_code values, hashst/chk have been put in the section for instructions with an RB operand, which is not strictly correct given that the B port is used for the immediate D operand; however, adding them to the section for instructions without an RB operand would have made that section exceed 128 entries, causing changes to the padding needed. The only downside to having hashst/chk where they are is that the debug logic can't use the RB port to read GPR/FPRs when a hashst/chk instruction is being decoded. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	5a28f76b6f	execute1: Implement CIABR CIABR (Completed Instruction Address Breakpoint Register) is an SPR that contains an instruction address. When the instruction at that address completes, the CPU takes a Trace interrupt before executing the next instruction (provided the instruction doesn't cause some other interrupt and isn't an rfid, hrfid or rfscv instruction). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	7437f699ca	core: Implement the PIR SPR This reports the CPU core number, currently always 0, but this will be useful in future for distinguishing which CPU is which in a multiprocessor system. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	5121e0f392	core: Implement sync instructions This implements all the sync variants (sync, lwsync, ptesync, etc.) as a LSU op that gets sent down to the dcache and completes once the dcache state machine is idle. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	ba4614c5f4	dcache: Implement data cache touch and flush instructions This implements dcbf, dcbt and dcbtst in the dcache. The dcbst (data cache block store) instruction remains a no-op because our dcache is write-through and therefore never has modified data that could need to be written back. Dcbt (data cache block touch) and dcbtst (data cache block touch for store) behave similarly except that dcbtst is a no-op on a readonly page. Neither instruction ever causes an interrupt. If they miss in the cache and the page is cacheable, they are handled like a load miss except that they complete as soon as the state machine starts handling the load miss rather than waiting for any data. Dcbf (data cache block flush) can cause a data storage interrupt. If it hits in the cache, the state machine goes to a new FLUSH_CYCLE state in which the cache line valid bit is cleared. In order to avoid having more than 8 values in op_t, this combines OP_STORE_MISS and OP_STORE_HIT into a single state. A new OP_NOP state is used for operations which can complete immediately without changing any dcache state (now used for dcbt/dcbtst causing access exception or on a non-cacheable page, or dcbf that misses the cache). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	722f239c02	Reimplement quadword loads and stores This adds implementations of lq, plq, stq, pstq, lqarx and stqcx. Because register file addresses are now computed in decode1 before we have the decode table entry for the instruction, we have to check the icode directly to know when to read register RS|1 before RS (i.e. for stq and stqcx in LE mode, but not pstq). For the second instance of the instruction, loadstore1 uses the EA from the first instance + 8. It generates an alignment interrupt for unaligned lqarx and stqcx and for lq in LE mode with an unaligned address. (The reason for the latter case is that it writes RT|1 before RT, and if we have RA = RT|1 and the second instance traps, we will have overwritten RA.) Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	d358981d43	Generate doubled instructions in decode1 rather than decode2 This will allow us to read different source registers for the two pieces, which will be needed for instructions like stq. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	fa9df33f7e	Implement cfuged, pdepd and pextd This implements the cfuged, pdepd and pextd instructions in a new unit called bit_sorter (so called because cfuged and pextd can be viewed as sorting the bits of the mask). The cnt* instructions and the popcnt* instructions now use the same OP_COUNTB insn_type so as to free up an insn_type value to use for the new instructions. The new instructions are implemented using a slow and simple algorithm that takes 64 cycles to compute the result. The ex1 stage is stalled while this happens, as for a 64-bit multiply, or for a divide when there is no FPU. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	d7d7a3afd4	Implement VRSAVE SPR VRSAVE is a 32-bit software-use SPR accessible in user mode. It is stored in the SPR RAM. The value read from the RAM is trimmed to 32 bits at the ramspr_read process. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	d112a7ad94	Implement scv and rfscv The main quirk here is that scv sets LR and CTR instead of SRR0 and SRR1, and likewise rfscv uses LR and CTR. Also, scv uses a set of 128 interrupt vectors starting at 0x17000. Fortunately, the layout of the SPR RAM was already such that LR and CTR were in the even and odd halves respectively at the same index, so reading or writing LR and CTR instead of SRR0 and SRR1 is quite easy. Use of scv is subject to an FSCR bit but not an HFSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	a88fa9c459	Implement DSCR The DSCR (Data Stream Control Register) is a user-accessible SPR that controls aspects of data prefetching. It has 25 bits of state defined in the ISA. This implements the register as 25 read/write bits that do nothing, since we don't have any prefetching. The DSCR is accessible at two SPR numbers, 3 (unprivileged) and 17 (privileged). Access via these SPR numbers is controlled by an FSCR bit and an HFSCR bit. The FSCR bit controls access via SPR 3 in user mode. The HFSCR bit controls access via SPR 3 in user mode and either SPR number in privileged non-hypervisor mode, but since we don't implement privileged non-hypervisor mode, it does essentially the same thing as the FSCR bit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	205c0e2c78	Implement the wait instruction This implements the behaviour of the 'wait 0' instruction of pausing execution of instructions until an exception arises. The exceptions that terminate a wait are a pending trace exception, external interrupt request, PMU interrupt request, or decrementer negative exception. These exception conditions terminate a wait even if not enabled to generate an interrupt (e.g. if MSR[EE] is zero). This is implemented by having execute1 assert its busy_out signal while the wait state exists. The wait state is set by the completion of the wait instruction and cleared by a pending exception. If the WC operand of the wait instruction is non-zero, indicating wait for reservation loss or wait for a short period, then the wait instruction does not wait, but just acts as a no-op. In order to make space in the insn_type_t type without going over 64 elements, this combines OP_DCBT and OP_ICBT into a single OP_XCBT, since they were both no-ops (except for their influence on how SRR1 is set on a trace interrupt, where they were identical). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	7bc7f335f1	Implement CTRL register The CTRL register has a single bit called RUN. It has some unusual behaviours: - It can only be written via SPR number 152, which is privileged - It can only be read via SPR number 136, which is non-privileged - Reading in problem state (user mode) returns the RUN bit in bit 0, but reading in privileged state (hypervisor mode) returns the RUN bit in bits 0 and 15. - Reading SPR 152 in problem state causes a HEAI (illegal instruction) interrupt, but reading in privileged state is a no-op; this is the same as for an unimplemented SPR. The RUN bit goes to the PMU and is also plumbed out to drive a LED on the Arty board. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	d2777dd1dd	Generate Hypervisor Emulation Assistance Interrupt for illegal instructions This implements the HEIR register (Hypervisor Emulation Instruction Register) and arranges for an illegal instruction to cause a Hypervisor Emulation Assistance Interrupt (HEAI) at vector 0xE40, and set HEIR to the illegal instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	e3f4ccedec	Implement facility unavailable and hypervisor facility unavailable interrupts This adds the FSCR and HFSCR registers and implements the associated behaviours of taking a facility unavailable or hypervisor facility unavailable interrupt if certain actions are attempted while the relevant [H]FSCR bit is zero. At present, two FSCR enable bits and three HFSCR enable bits are implemented. FSCR has bits for prefixed instructions and accesses to the TAR register, and HFSCR has those plus a bit that enables access to floating-point registers and instructions. FSCR and HFSCR can be accessed through the debug interface using register addresses 0x2e and 0x2f. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	1 year ago
Paul Mackerras	2dceb28830	Improve timing of redirect_nia going from decode1 to fetch1 This moves the addition that computes the branch target address for statically predicted taken branches before a clock edge, so the redirect_nia signal going to fetch1 comes from a clean latch. The address generation logic is also simplified somewhat, and conditional absolute branches to negative addresses are no longer predicted taken (this should have no impact on performance as such branches are basically never used). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	1c4b5def36	Improve timing of redirect_nia going from writeback to fetch1 This gets rid of the adder in writeback that computes redirect_nia. Instead, the main adder in the ALU is used to compute the branch target for relative branches. We now decode b and bc differently depending on the AA field, generating INSN_brel, INSN_babs, INSN_bcrel or INSN_bcabs as appropriate. Each one has a separate entry in the decode table in decode1; the *rel versions use CIA as the A input. The bclr/bcctr/bctar and rfid instructions now select ramspr_result for the main result mux to get the redirect address into ex1.e.write_data. For branches which are predicted taken but not actually taken, we need to redirect to the following instruction. We also need to do that for isync. We do this in the execute2 stage since whether or not to do it depends on the branch result. The next_nia computation is moved to the execute2 stage and comes in via a new leg on the secondary result multiplexer, making next_nia available ultimately in ex2.e.write_data. This also means that the next_nia leg of the primary result multiplexer is gone. Incrementing last_nia by 4 for sc (so that SRR0 points to the following instruction) is also moved to execute2. Writing CIA+4 to LR was previously done through the main result multiplexer. Now it comes in explicitly in the ramspr write logic. Overall this removes the br_offset and abs_br fields and the logic to add br_offset and next_nia, and one leg of the primary result multiplexer, at the cost of a few extra control signals between execute1 and execute2 and some multiplexing for the ramspr write side and an extra input on the secondary result multiplexer. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	06ff486567	icache: Restore primary opcode to instruction word The icache stores a predecoded insn_code value for each instruction, and so as to fit in 36 bits, omits the primary opcode (the most significant 6 bits) of each instruction. Previously, for valid instructions, the primary opcode field of the instruction delivered to decode1 was a part-representation of the insn_code value rather than the actual primary opcode. This adds a lookup table to compute the primary opcode from the insn_code and deliver it in the instruction words supplied to decode1. In order that each insn_code can be associated with a single primary opcode value, the various no-operation instructions with primary opcode 31 (the reserved no-ops and dss, dst and dstst) have been given a new insn_code, INSN_rnop, leaving INSN_nop for the preferred no-op (ori r0,r0,0). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	b50170cd1d	Implement byte reversal instructions This implements the byte-reverse halfword, word and doubleword instructions: brh, brw, and brd. These instructions were added to the ISA in version 3.1. They use a new OP_BREV insn_type value. The logic for these instructions is implemented in logical.vhdl. In order to avoid going over 64 insn_type values, OP_AND and OP_OR were combined into OP_LOGIC, which is like OP_AND except that the RS input can be inverted as well as the RB input. The various forms of OR instruction are then implemented using the identity a OR b = NOT (NOT a AND NOT b) The 'is_signed' field of the instruction decode table is used to indicate that RS should be inverted. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	c4492c843a	Implement interrupts for prefixed instructions This arranges to generate an illegal instruction type program interrupt for illegal prefixed instructions, that is, those where the suffix is not a legal value given the prefix, or the prefix has a reserved value in the subtype field. This implementation doesn't generate an interrupt for the invalid 8LS:D and MLS:D instruction forms where R = 1 and RA != 0. (In those cases it uses (RA) as the addend, i.e. it ignores the R bit.) This detects the case where the address of an instruction prefix is equal mod 64 to 60, and generates an alignment interrupt in that case. This also arranges to set bit 34 of SRR1 when an interrupt occurs due to a prefixed instruction, for those interrupts where that is required (i.e. trace, alignment, floating-point unavailable, data storage, data segment, and most cases of program interrupt). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	39ca675ce3	Decode prefixed instructions This adds logic to do basic decoding of the prefixed instructions defined in PowerISA v3.1B which are in the SFFS (Scalar Fixed plus Floating-Point Subset) compliancy subset. In PowerISA v3.1B SFFS, there are 14 prefixed load/store instructions plus the prefixed no-op instruction (pnop). The prefixed load/store instructions all use an extended version of D-form, which has an extra 18 bits of displacement in the prefix, plus an 'R' bit which enables PC-relative addressing. When decode1 sees an instruction word where the insn_code is INSN_prefix (i.e. the primary opcode was 1), it stores the prefix word and sends nothing down to decode2 in that cycle. When the next valid instruction word arrives, it is interpreted as a suffix, meaning that its insn_code gets modified before being used to look up the decode table. The insn_code values are rearranged so that the values for instructions which are the suffix of a valid prefixed instruction are all at even indexes, and the corresponding prefixed instructions follow immediately, so that an insn_code value can be converted to the corresponding prefixed value by setting the LSB of the insn_code value. There are two prefixed instructions, pld and pstd, for which the suffix is not a valid SFFS instruction by itself, so these have been given dummy insn_code values which decode as illegal (INSN_op57 and INSN_op61). For a prefixed instruction, decode1 examines the type and subtype fields of the prefix and checks that the suffix is valid for the type and subtype. This check doesn't affect which entry of the decode table is used; the result is passed down to decode2, and will in future be acted upon in execute1. The instruction address passed down to decode2 is the address of the prefix. To enable this, part of the instruction address is saved when the prefix is seen, and then the instruction address received from icache is partly overlaid by the saved prefix address. 
Because prefixed instructions are not permitted to cross 64-byte boundaries, we only need to save bits 5:2 of the instruction address to do this. If the alignment restriction ever gets relaxed, we will then need to save more bits of the address. Decode2 has been extended to handle the R bit of the prefix (in 8LS and MLS forms) and to be able to generate the 34-bit immediate value from the prefix and suffix. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	7af0e001ad	Move insn_codes for mcrfs, mtfsb0/1 and mtfsfi This moves the insn_code values for mcrfs, mtfsb0/1 and mtfsfi into the region used for floating-point instructions. This means that in no-FPU implementations, they will get turned into illegal instructions in predecode. We then don't need the code in execute1 that makes FP instructions illegal in no-FPU implementations. We also remove the NONE value for unit_t, since it was only ever used with insn_type = OP_ILLEGAL, and the check for unit = NONE was redundant with the check for insn_type = OP_ILLEGAL. Thus the check for unit = NONE is no longer needed and is removed here. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	3 years ago
Paul Mackerras	21ab36a0c0	Pre-decode instructions when writing them to icache This splits out the decoding done in the decode0 step into a separate predecoder, used when writing instructions into the icache. The icache now holds 36 bits per instruction rather than 32. For valid instructions, those 36 bits comprise the bottom 26 bits of the instruction word, a 9-bit insn_code value (which uniquely identifies the instruction), and a zero in the MSB. For illegal instructions, the MSB is one and the full instruction word is in the bottom 32 bits. Having the full instruction word available for illegal instructions means that it can be printed in the log when simulating, or in future could be placed in the HEIR register. If we don't have an FPU, then the floating-point instructions are regarded as illegal. In that case, the insn_code values would fit into 8 bits, which could be used in future to reduce the size of decode_rom from 512 to 256 entries. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	26dc1e879c	Eliminate use of primary opcode outside of decode1 This changes code that previously looked at the primary opcode (bits 26 to 31) of the instruction to use other methods, in places other than in stage0 of decode1. * Extend rc_t to have a new value, RCOE, indicating that the instruction has both Rc and OE bits. * Decode2 now tells execute1 whether the instruction has a third operand, used for distinguishing between multiply and multiply-add instructions. * The invert_a field of the decode ROM is overloaded for load/store instructions to indicate cache-inhibited loads and stores. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	c9aea45ffe	decode1: Divide insn_code values into ranges to indicate register usage This lets us compute r_out.reg_*_addr and r_out.read_2_enable values without needing access to the primary opcode value. We also have that non-FP instructions are < 256. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	c3ee10f013	decode1: Split instruction decoding into two steps This reduces the block RAM requirements for instruction decoding by splitting it into two steps. The first, in a new pipeline stage called decode0 (implemented by code in decode1.vhdl) maps the instruction to a 9-bit instruction code using major and row decode ROMs. The second maps the 9-bit code to the final decode_rom_t (about 44 bits wide). Branch prediction done in decode is now done in decode0 rather than decode1. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	5380d80039	decode1: Use block RAMs in decode This combines the various decode arrays in decode1 into two, one indexed by the major opcode (bits 31--26 of the instruction) together with bits 4--0 of the instruction, and the other indexed mostly by the minor opcode (bits 10--1), with some swizzles to accommodate the relevant parts of the minor opcode space for opcodes 19, 31, 59 and 63 within a 2k entry ROM (11 address bits). These are called the "major" and the "row" decode ROMs respectively. (Bits 10--6 of the instruction are called the "row index", and bits 5--1, or 5--0 for some opcodes, are called the "column index", because of the way the opcode maps in the ISA are laid out.) Both ROMs are looked up each cycle and the result from one or other, or from an override in ri.override_decode, are selected after a clock edge. This uses quite a lot of BRAM resources. In future a predecode step will reduce the BRAM usage substantially. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	47895f8aff	decode2: Decode unit and single-pipe attributes for mfspr/mtspr in decode2 Instead of doing that in decode1. That lets us get rid of the force_single and override_unit fields of reg_internal_t in decode1, which will simplify following changes to decode1. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	932da4c114	FPU: Simplify IDLE state code Do more decoding of the instruction ahead of the IDLE state processing so that the IDLE state code becomes much simpler. To make the decoding easier, we now use four insn_type_t codes for floating-point operations rather than two. This also rearranges the insn_type_t values a little to get the 4 FP opcode values to differ only in the bottom 2 bits, and put OP_DIV, OP_DIVE and OP_MOD next to them. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Paul Mackerras	7a60c118ed	loadstore1: Simplify address generation in OP_FETCH_FAILED case Instead of having a multiplexer in loadstore1 in order to be able to put the instruction address into v.addr, we now set decode.input_reg_a to CIA in the decode table entry for OP_FETCH_FAILED. That means that the operand selection machinery in decode2 will supply the instruction address to loadstore1 on the lv.addr1 input and no special case is needed in loadstore1. This saves a few LUTs (~40 on the Artix-7). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	4 years ago
Michael Neuling	602ba25c70	Metavalue cleanup for decoder1.vhdl Signed-off-by: Michael Neuling <mikey@neuling.org>	4 years ago
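
Illustration for commits 39ca675ce3 and c4492c843a (prefixed instructions). The following Python sketch is not taken from the Microwatt sources; it only models three points made in those commit messages: a suffix insn_code is turned into its prefixed counterpart by setting the LSB, the 34-bit immediate is built from 18 displacement bits in the prefix plus the 16-bit displacement of the suffix, and a prefix whose address is equal to 60 mod 64 triggers an alignment interrupt. The function names are hypothetical, and the assumption that the displacement fields occupy the low 18 bits of the prefix word and the low 16 bits of the suffix word reflects a reading of the ISA's MLS:D and 8LS:D forms rather than anything stated in the log.

```python
def prefixed_code(suffix_code: int) -> int:
    """Suffix-capable insn_codes sit at even indexes; the prefixed
    variant follows immediately, so setting the LSB selects it."""
    return suffix_code | 1


def imm34(prefix_word: int, suffix_word: int) -> int:
    """Build the signed 34-bit displacement from prefix and suffix.

    Assumption: the 18 extra bits are the low 18 bits of the prefix
    and the ordinary D field is the low 16 bits of the suffix.
    """
    d0 = prefix_word & 0x3FFFF      # 18 bits from the prefix
    d1 = suffix_word & 0xFFFF       # 16 bits from the suffix
    value = (d0 << 16) | d1         # 34-bit unsigned value
    if value & (1 << 33):           # sign bit of the 34-bit field
        value -= 1 << 34
    return value


def prefix_misaligned(prefix_addr: int) -> bool:
    """A prefix in the last word of a 64-byte block would make the
    instruction cross the boundary, which is not permitted."""
    return prefix_addr % 64 == 60
```

Because the prefix and suffix always lie in the same 64-byte block, only the word offset within that block differs between their addresses, which is why the commit message says that saving a few low address bits of the prefix is enough.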


198 Commits (master)
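
Illustration for commit 21ab36a0c0 (pre-decoding into the icache). The Python sketch below models the 36-bit icache entry described in that commit message: for a valid instruction, a zero MSB, a 9-bit insn_code and the bottom 26 bits of the instruction word; for an illegal instruction, an MSB of one and the full 32-bit word in the bottom bits. The helper names are hypothetical, and placing the insn_code directly above the 26 instruction bits is an inference from the stated field widths, not a detail confirmed by the log.

```python
INSN_BITS = 26            # low instruction bits kept for valid entries
CODE_BITS = 9             # width of the insn_code value
ILLEGAL_FLAG = 1 << 35    # MSB of the 36-bit entry


def pack_valid(insn_code: int, word: int) -> int:
    """Entry for a valid instruction: MSB 0, insn_code, low 26 bits."""
    if not 0 <= insn_code < (1 << CODE_BITS):
        raise ValueError("insn_code must fit in 9 bits")
    return (insn_code << INSN_BITS) | (word & ((1 << INSN_BITS) - 1))


def pack_illegal(word: int) -> int:
    """Entry for an illegal instruction: MSB 1, full word in low 32 bits."""
    return ILLEGAL_FLAG | (word & 0xFFFFFFFF)


def unpack(entry: int):
    """Return ("illegal", word) or ("valid", insn_code, low_26_bits)."""
    if entry & ILLEGAL_FLAG:
        return ("illegal", entry & 0xFFFFFFFF)
    code = (entry >> INSN_BITS) & ((1 << CODE_BITS) - 1)
    return ("valid", code, entry & ((1 << INSN_BITS) - 1))
```

A valid entry drops the primary opcode (the top 6 bits of the word), which is why the later commit 06ff486567 adds a lookup table to recover it from the insn_code.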