To vec_not or not

Now let's look at a similar example that adds some surprising complexity. When we look at the negated compare forms we cannot find exact matches in the PowerISA. But a little knowledge of Boolean algebra can show the way to the equivalent functions.

First the X86 compare not equal case where we might expect to find the equivalent vec_cmpne builtins for PowerISA:

Well, not exactly. Looking at the OpenPOWER ABI document we see a reference to vec_cmpne for all numeric types. But when we look in the current GCC 6 documentation we find that vec_cmpne is not on the list. So it is planned in the ABI, but not implemented yet.

Looking at the PowerISA 2.07B we find a VSX Vector Compare Equal to Double-Precision but no Not Equal. In fact we see only vector double compare instructions for greater than and greater than or equal in addition to the equal compare. Not only can't we find a not equal, there are no less than or less than or equal compares either.

So what is going on here? Partially this is the Reduced Instruction Set Computer (RISC) design philosophy. In this case the compiler can generate all the required compares using the existing vector instructions and simple transforms based on Boolean algebra. So vec_cmpne(A,B) is simply vec_not(vec_cmpeq(A,B)). And vec_cmplt(A,B) is simply vec_cmpgt(B,A) based on the identity A < B iff B > A. Similarly vec_cmple(A,B) is implemented as vec_cmpge(B,A).

Wait a minute, there is no vec_not() either. We cannot find it in the PowerISA, the OpenPOWER ABI, or the GCC PowerPC Altivec Built-in documentation. There is no vec_move() either! How can this possibly work?

This is RISC philosophy again. We can always use a logical instruction (like bitwise AND or OR) to effect a move, given that we also have nondestructive 3 register instruction forms. In the PowerISA most instructions have two input registers and a separate result register.
So if the result register number is different from either input register then the inputs are not clobbered (nondestructive). Of course nothing prevents you from specifying the same register for both inputs or even all three registers (result and both inputs). And sometimes it is useful.

The statement B = vec_or (A,A) is effectively a vector move/copy from A to B. And A = vec_or (A,A) is obviously a nop (no operation). In fact the PowerISA defines the preferred nop and register move for vector registers in this way.

The PowerISA implements the logical operators nor (not or) and nand (not and), and provides these instructions for fixed point and vector logical operations. So vec_not(A) can be implemented as vec_nor(A,A).

So for the implementation of _mm_cmpne we propose the following:

The Intel Intrinsics also include the not forms of the relational compares:

Neither the PowerISA, the OpenPOWER ABI, nor the GCC PowerPC Altivec Built-in documentation provides any direct equivalents to the not greater than class of compares. Again you don't really need them if you know Boolean algebra. We can use identities like {not (A < B) iff A >= B} and {not (A <= B) iff A > B}. So the PPC64LE implementation follows:

These patterns repeat for the scalar version of the not compares. And in general the larger pattern described in this chapter applies to the other float and integer types with similar interfaces.
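
The Boolean transforms described in this section can be checked with a small scalar model in portable C. This is only a sketch under stated assumptions: the model_* functions and the two struct types are hypothetical stand-ins invented for illustration, not PowerISA instructions or GCC built-ins, and each lane is compared with ordinary C operators rather than vector instructions. The model starts from only the primitives the text says exist (compare equal, greater than, greater than or equal, plus the or and nor logicals) and derives everything else from them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 2 x double vector register and of the
   2 x 64-bit mask a vector compare produces. Not real PowerISA types. */
typedef struct { double d[2]; } v2df_model;
typedef struct { uint64_t m[2]; } v2mask_model;

/* Primitive compares: each lane becomes all ones (true) or all zeros. */
v2mask_model model_cmpeq(v2df_model a, v2df_model b) {
    v2mask_model r;
    for (int i = 0; i < 2; i++) r.m[i] = (a.d[i] == b.d[i]) ? UINT64_MAX : 0;
    return r;
}
v2mask_model model_cmpgt(v2df_model a, v2df_model b) {
    v2mask_model r;
    for (int i = 0; i < 2; i++) r.m[i] = (a.d[i] > b.d[i]) ? UINT64_MAX : 0;
    return r;
}
v2mask_model model_cmpge(v2df_model a, v2df_model b) {
    v2mask_model r;
    for (int i = 0; i < 2; i++) r.m[i] = (a.d[i] >= b.d[i]) ? UINT64_MAX : 0;
    return r;
}

/* Primitive logicals: or and nor, with a separate result (nondestructive). */
v2mask_model model_or(v2mask_model a, v2mask_model b) {
    v2mask_model r;
    for (int i = 0; i < 2; i++) r.m[i] = a.m[i] | b.m[i];
    return r;
}
v2mask_model model_nor(v2mask_model a, v2mask_model b) {
    v2mask_model r;
    for (int i = 0; i < 2; i++) r.m[i] = ~(a.m[i] | b.m[i]);
    return r;
}

/* Derived operations, built only from the primitives above. */
v2mask_model model_move(v2mask_model a) { return model_or(a, a); }   /* B = or(A,A)  */
v2mask_model model_not(v2mask_model a)  { return model_nor(a, a); }  /* not = nor(A,A) */

v2mask_model model_cmpne(v2df_model a, v2df_model b) {
    return model_not(model_cmpeq(a, b));   /* A != B iff not (A == B) */
}
v2mask_model model_cmplt(v2df_model a, v2df_model b) {
    return model_cmpgt(b, a);              /* A < B iff B > A */
}
v2mask_model model_cmple(v2df_model a, v2df_model b) {
    return model_cmpge(b, a);              /* A <= B iff B >= A */
}
v2mask_model model_cmpnlt(v2df_model a, v2df_model b) {
    return model_cmpge(a, b);              /* not (A < B) iff A >= B */
}
v2mask_model model_cmpnle(v2df_model a, v2df_model b) {
    return model_cmpgt(a, b);              /* not (A <= B) iff A > B */
}
```

One caution when applying the not-compare identities to floating point: they hold for ordered operands. Under IEEE 754 every ordered comparison involving a NaN is false, so not (A < B) is true for a NaN lane while A >= B is false. Code that depends on NaN behavior should check this case explicitly.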