To vec_not or not
Now let's look at a similar example that adds some surprising
complexity. When we look at the negated compare forms, we cannot find
exact matches in the PowerISA. But a little knowledge of Boolean
algebra can show the way to the equivalent functions.
First, consider the X86 compare-not-equal case, where we might expect to
find the equivalent vec_cmpne builtins for PowerISA:
Well, not exactly. Looking at the OpenPOWER ABI document, we see a
reference to vec_cmpne for all numeric types. But when we look in the
current GCC 6 documentation, we find that vec_cmpne is not on the list.
So it is planned in the ABI, but not implemented yet.
Looking at the PowerISA 2.07B, we find a VSX Vector Compare Equal To
Double-Precision instruction but no Not Equal. In fact, the only vector double
compare instructions are equal, greater than, and greater than or equal. Not
only is there no not-equal compare, there are no less-than or
less-than-or-equal compares either.
So what is going on here? Partly this is the Reduced Instruction
Set Computer (RISC) design philosophy. In this case the compiler can generate
all the required compares using the existing vector instructions and simple
transforms based on Boolean algebra. So vec_cmpne(A,B) is simply
vec_not(vec_cmpeq(A,B)). And vec_cmplt(A,B) is simply vec_cmpgt(B,A), based on
the identity A < B iff B > A. Similarly, vec_cmple(A,B) is implemented as
vec_cmpge(B,A).
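The operand-swap identities can be checked with a small portable model. The sketch below is plain C rather than real <altivec.h> code, so it builds on any platform; the model_* types and functions are hypothetical stand-ins for a two-lane vector double and for the greater-than and greater-than-or-equal compares, with each result lane set to all ones for true and all zeros for false.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: a two-lane vector double and a compare
   result whose lanes are all ones (true) or all zeros (false). */
typedef struct { double lane[2]; } model_v2df;
typedef struct { uint64_t lane[2]; } model_mask;

/* Stand-in for a greater-than compare, lane by lane. */
static inline model_mask model_cmpgt(model_v2df a, model_v2df b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] > b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* Stand-in for a greater-than-or-equal compare, lane by lane. */
static inline model_mask model_cmpge(model_v2df a, model_v2df b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] >= b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* A < B iff B > A: swap the operands of greater-than. */
static inline model_mask model_cmplt(model_v2df a, model_v2df b)
{
    return model_cmpgt(b, a);
}

/* A <= B iff B >= A: swap the operands of greater-than-or-equal. */
static inline model_mask model_cmple(model_v2df a, model_v2df b)
{
    return model_cmpge(b, a);
}

/* Usage: lane 0 compares 1.0 with 2.0, lane 1 compares 3.0 with 3.0. */
static inline void demo_swapped_compares(void)
{
    model_v2df a = {{1.0, 3.0}};
    model_v2df b = {{2.0, 3.0}};
    model_mask lt = model_cmplt(a, b);
    model_mask le = model_cmple(a, b);
    assert(lt.lane[0] == UINT64_MAX && lt.lane[1] == 0);
    assert(le.lane[0] == UINT64_MAX && le.lane[1] == UINT64_MAX);
}
```

No new compare operation is needed for less-than or less-than-or-equal; only the operand order changes.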
Wait a minute, there is no vec_not() either. We cannot find it in the
PowerISA, the OpenPOWER ABI, or the GCC PowerPC Altivec Built-in documentation.
There is no vec_move() either! How can this possibly work?
This is RISC philosophy again. We can always use a logical
instruction (like bitwise and or or) to effect a move, given that we also have
nondestructive 3 register instruction forms. In the PowerISA most instructions
have two input registers and a separate result register. So if the result
register number is different from either input register, then the inputs are
not clobbered (nondestructive). Of course nothing prevents you from specifying
the same register for both inputs or even all three registers (result and both
inputs). And sometimes it is useful.
The statement B = vec_or (A,A) is effectively a vector move/copy
from A to B. And A = vec_or (A,A) is obviously a nop (no operation). In fact
the PowerISA defines the preferred nop and register move for vector registers
in this way.
The PowerISA implements the logical operators nor (not or) and nand (not and).
The PowerISA provides these instructions for fixed point and vector logical
operations. So vec_not(A) can be implemented as vec_nor(A,A).
So for the implementation of _mm_cmpne we propose the following:
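As a rough illustration of the idea, the sketch below composes an equal compare with a nor of the result against itself. It is a portable scalar model in plain C, not code built on <altivec.h>; model_cmpeq, model_nor, and model_cmpneq_pd are hypothetical names, and on PPC64LE the two steps would presumably correspond to vec_cmpeq followed by vec_nor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: a two-lane vector double and a compare
   result whose lanes are all ones (true) or all zeros (false). */
typedef struct { double lane[2]; } model_v2df;
typedef struct { uint64_t lane[2]; } model_mask;

/* Stand-in for an equal compare, lane by lane. */
static inline model_mask model_cmpeq(model_v2df a, model_v2df b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* Stand-in for a bitwise nor of two masks. */
static inline model_mask model_nor(model_mask a, model_mask b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ~(a.lane[i] | b.lane[i]);
    return r;
}

/* Not equal: compare for equality, then invert with nor(t, t). */
static inline model_mask model_cmpneq_pd(model_v2df a, model_v2df b)
{
    model_mask t = model_cmpeq(a, b);
    return model_nor(t, t);
}

/* Usage: lane 0 holds equal values, lane 1 holds different values. */
static inline void demo_cmpneq(void)
{
    model_v2df a = {{1.0, 2.0}};
    model_v2df b = {{1.0, 3.0}};
    model_mask ne = model_cmpneq_pd(a, b);
    assert(ne.lane[0] == 0);
    assert(ne.lane[1] == UINT64_MAX);
}
```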
The Intel Intrinsics also include the not forms of the relational
compares:
Neither the PowerISA, the OpenPOWER ABI, nor the GCC PowerPC Altivec Built-in
documentation provides any direct equivalents to the not greater than
class of compares. Again you don't really need them if you know Boolean
algebra. For ordered (non-NaN) operands we can use identities like
{not (A < B) iff A >= B} and {not (A <= B) iff A > B}.
So the PPC64LE implementation follows:
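A minimal sketch of that mapping, again as a portable scalar model in plain C rather than actual PPC64LE code. The model_* names are hypothetical; model_cmpnlt and model_cmpnle stand for the not-less-than and not-less-than-or-equal forms, expressed through stand-ins for the greater-than-or-equal and greater-than compares.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: a two-lane vector double and a compare
   result whose lanes are all ones (true) or all zeros (false). */
typedef struct { double lane[2]; } model_v2df;
typedef struct { uint64_t lane[2]; } model_mask;

/* Stand-in for a greater-than compare, lane by lane. */
static inline model_mask model_cmpgt(model_v2df a, model_v2df b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] > b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* Stand-in for a greater-than-or-equal compare, lane by lane. */
static inline model_mask model_cmpge(model_v2df a, model_v2df b)
{
    model_mask r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] >= b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* not (A < B) iff A >= B, assuming ordered (non-NaN) operands. */
static inline model_mask model_cmpnlt(model_v2df a, model_v2df b)
{
    return model_cmpge(a, b);
}

/* not (A <= B) iff A > B, assuming ordered (non-NaN) operands. */
static inline model_mask model_cmpnle(model_v2df a, model_v2df b)
{
    return model_cmpgt(a, b);
}

/* Usage: lane 0 compares 1.0 with 2.0, lane 1 compares 3.0 with 3.0. */
static inline void demo_not_compares(void)
{
    model_v2df a = {{1.0, 3.0}};
    model_v2df b = {{2.0, 3.0}};
    model_mask nlt = model_cmpnlt(a, b);
    model_mask nle = model_cmpnle(a, b);
    assert(nlt.lane[0] == 0 && nlt.lane[1] == UINT64_MAX);
    assert(nle.lane[0] == 0 && nle.lane[1] == 0);
}
```

If either lane holds a NaN, both A < B and A >= B are false, so the substitution is exact only for ordered values.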
These patterns repeat for the scalar versions of the not compares. In
general, the larger pattern described in this chapter also applies to the other
float and integer types with similar interfaces.