Commit Graph

12 Commits (master)

Author SHA1 Message Date
Paul Mackerras 39ca675ce3 Decode prefixed instructions
This adds logic to do basic decoding of the prefixed instructions
defined in PowerISA v3.1B which are in the SFFS (Scalar Fixed plus
Floating-Point Subset) compliancy subset.  In PowerISA v3.1B SFFS,
there are 14 prefixed load/store instructions plus the prefixed no-op
instruction (pnop).  The prefixed load/store instructions all use an
extended version of D-form, which has an extra 18 bits of displacement
in the prefix, plus an 'R' bit which enables PC-relative addressing.

When decode1 sees an instruction word where the insn_code is
INSN_prefix (i.e. the primary opcode was 1), it stores the prefix word
and sends nothing down to decode2 in that cycle.  When the next valid
instruction word arrives, it is interpreted as a suffix, meaning that
its insn_code gets modified before being used to look up the decode
table.

The insn_code values are rearranged so that the values for
instructions which are the suffix of a valid prefixed instruction are
all at even indexes, and the corresponding prefixed instructions
follow immediately, so that an insn_code value can be converted to the
corresponding prefixed value by setting the LSB of the insn_code
value.  There are two prefixed instructions, pld and pstd, for which
the suffix is not a valid SFFS instruction by itself, so these have
been given dummy insn_code values which decode as illegal (INSN_op57
and INSN_op61).

For a prefixed instruction, decode1 examines the type and subtype
fields of the prefix and checks that the suffix is valid for the type
and subtype.  This check doesn't affect which entry of the decode
table is used; the result is passed down to decode2, and will in
future be acted upon in execute1.

The instruction address passed down to decode2 is the address of the
prefix.  To enable this, part of the instruction address is saved when
the prefix is seen, and then the instruction address received from
icache is partly overlaid by the saved prefix address.  Because
prefixed instructions are not permitted to cross 64-byte boundaries,
we only need to save bits 5:2 of the instruction to do this.  If the
alignment restriction ever gets relaxed, we will then need to save
more bits of the address.

Decode2 has been extended to handle the R bit of the prefix (in 8LS
and MLS forms) and to be able to generate the 34-bit immediate value
from the prefix and suffix.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
9 months ago
Paul Mackerras 4b2c23703c core: Implement quadword loads and stores
This implements the lq, stq, lqarx and stqcx. instructions.

These instructions all access two consecutive GPRs; for example the
"lq %r6,0(%r3)" instruction will load the doubleword at the address
in R3 into R7 and the doubleword at address R3 + 8 into R6.  To cope
with having two GPR sources or destinations, the instruction gets
repeated at the decode2 stage, that is, for each lq/stq/lqarx/stqcx.
coming in from decode1, two instructions get sent out to execute1.

For these instructions, the RS or RT register gets modified on one
of the iterations by setting the LSB of the register number.  In LE
mode, the first iteration uses RS|1 or RT|1 and the second iteration
uses RS or RT.  In BE mode, this is done the other way around.  In
order for decode2 to know what endianness is currently in use, we
pass the big_endian flag down from icache through decode1 to decode2.
This is always in sync with what execute1 is using because only rfid
or an interrupt can change MSR[LE], and those operations all cause
a flush and redirect.

There is now an extra column in the decode tables in decode1 to
indicate whether the instruction needs to be repeated.  Decode1 also
enforces the rule that lq with RT = RT and lqarx with RA = RT or
RB = RT are illegal.

Decode2 now passes a 'repeat' flag and a 'second' flag to execute1,
and execute1 passes them on to loadstore1.  The 'repeat' flag is set
for both iterations of a repeated instruction, and 'second' is set
on the second iteration.  Execute1 does not take asynchronous or
trace interrupts on the second iteration of a repeated instruction.

Loadstore1 uses 'next_addr' for the second iteration of a repeated
load/store so that we access the second doubleword of the memory
operand.  Thus loadstore1 accesses the doublewords in increasing
memory order.  For 16-byte loads this means that the first iteration
writes GPR RT|1.  It is possible that RA = RT|1 (this is a legal
but non-preferred form), meaning that if the memory operand was
misaligned, the first iteration would overwrite RA but then the
second iteration might take a page fault, leading to corrupted state.
To avoid that possibility, 16-byte loads in LE mode take an
alignment interrupt if the operand is not 16-byte aligned.  (This
is the case anyway for lqarx, and we enforce it for lq as well.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
3 years ago
Paul Mackerras fc2968f132 FPU: Implement remaining FPSCR-related instructions
This implements mcrfs, mtfsfi, mtfsb0/1, mffscr, mffscrn, mffscrni and
mffsl.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 45cd8f4fc3 core: Add support for floating-point loads and stores
This extends the register file so it can hold FPR values, and
implements the FP loads and stores that do not require conversion
between single and double precision.

We now have the FP, FE0 and FE1 bits in MSR.  FP loads and stores
cause a FP unavailable interrupt if MSR[FP] = 0.

The FPU facilities are optional and their presence is controlled by
the HAS_FPU generic passed down from the top-level board file.  It
defaults to true for all except the A7-35 boards.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Paul Mackerras 290b05f97d core: Implement the maddhd, maddhdu and maddld instructions
These instructions use major opcode 4 and have a third GPR input
operand, so we need a decode table for major opcode 4 and some
plumbing to get the RC register operand read.

The multiply-add instructions use the same insn_type_t values as the
regular multiply instructions, and we distinguish in execute1 by
looking at the major opcode.  This turns out to be convenient because
we don't have to add any cases in the code that handles the output of
the multiplier, and it frees up some insn_type_t values.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Shawn Anastasio e606772aeb Implement the addpcis instruction
This commit adds support for the addpcis instruction from ISA 3.0.

A new input_reg_b_t type, CONST_DX_HI, was added to support the
shifted immediate value used in DX-Form instructions.

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
4 years ago
Tom Vijlbrief c05441bf47 Implement CRNOR and friends
Signed-off-by: Tom Vijlbrief <tvijlbrief@gmail.com>
4 years ago
Benjamin Herrenschmidt 501b6daf9b Add basic XER support
The carry is currently internal to execute1. We don't handle any of
the other XER fields.

This creates type called "xer_common_t" that contains the commonly
used XER bits (CA, CA32, SO, OV, OV32).

The value is stored in the CR file (though it could be a separate
module). The rest of the bits will be implemented as a separate
SPR and the two parts reconciled in mfspr/mtspr in latter commits.

We always read XER in decode2 (there is little point not to)
and send it down all pipeline branches as it will be needed in
writeback for all type of instructions when CR0:SO needs to be
updated (such forms exist for all pipeline branches even if we don't
yet implement them).

To avoid having to track XER hazards, we forward it back in EX1. This
assumes that other pipeline branches that can modify it (mult and div)
are running single issue for now.

One additional hazard to beware of is an XER:SO modifying instruction
in EX1 followed immediately by a store conditional. Due to our writeback
latency, the store will go down the LSU with the previous XER value,
thus the stcx. will set CR0:SO using an obsolete SO value.

I doubt there exist any code relying on this behaviour being correct
but we should account for it regardless, possibly by ensuring that
stcx. remain single issue initially, or later by adding some minimal
tracking or moving the LSU into the same pipeline as execute.

Missing some obscure XER affecting instructions like addex or mcrxrx.

[paulus@ozlabs.org - fix CA32 and OV32 for OP_ADD, fix order of
 arguments to set_ov]

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
4 years ago
Benjamin Herrenschmidt 3e6f656a90 Add MCRF instruction
Hopefully it's not too timing catastrophic. The variable newcrf will
be handy for the other CR ops when we implement them I suspect.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Benjamin Herrenschmidt 554ae88540 Implement absolute branches
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5 years ago
Anton Blanchard 9ff86a62f5 Reformat insn_helpers
Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago
Anton Blanchard 5e140298a5 Rework decode2
The decode2 stage was spaghetti code and needed cleaning up.
Create a series of functions to pull fields from a ppc instruction
and also a series of helpers to extract values for the execution
units.

As suggested by Paul, we should pass all signals to the execution
units and only set the valid signal conditionally, which should
use less resources.

Signed-off-by: Anton Blanchard <anton@linux.ibm.com>
5 years ago