<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_vec_or_not">
<title>To vec_not or not</title>
<para>Now let's look at a similar example that adds some surprising
complexity. When we look at the negated compare forms, we cannot find
exact matches in the PowerISA. But a little knowledge of Boolean
algebra shows the way to equivalent functions.</para>
<para>First, the X86 compare-not-equal case, where we might expect to
find equivalent <literal>vec_cmpne</literal> builtins for PowerISA:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>
<para>Well, not exactly. Looking at the OpenPOWER ABI document, we see a
reference to <literal>vec_cmpne</literal> for all numeric types. But when
we look in the current GCC 6 documentation, we find that
<literal>vec_cmpne</literal> is not on the list. So it is planned
in the ABI but not yet implemented.</para>
<para>Looking at the PowerISA 2.07B, we find a VSX Vector Compare Equal To
Double-Precision but no Not Equal. In fact, the only vector double compare
instructions are equal, greater than, and greater than or equal. Not only
is there no not-equal compare, there are no less-than or
less-than-or-equal compares either.</para>
<para>So what is going on here? Partly this is the Reduced Instruction
Set Computer (RISC) design philosophy. In this case the compiler can generate
all the required compares using the existing vector instructions and simple
transforms based on Boolean algebra. So
<literal>vec_cmpne(A,B)</literal> is simply
<literal>vec_not(vec_cmpeq(A,B))</literal>. And
<literal>vec_cmplt(A,B)</literal> is simply
<literal>vec_cmpgt(B,A)</literal>, based on the
identity A &lt; B <emphasis><emphasis role="bold">iff</emphasis></emphasis> B &gt; A.
Similarly, <literal>vec_cmple(A,B)</literal> is implemented as
<literal>vec_cmpge(B,A)</literal>.</para>
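<para>The transforms above can be sketched in portable C. This is only a
scalar model under stated assumptions, not the compiler's code generation:
the <literal>v2d</literal> and <literal>v2m</literal> types and every
<literal>model_</literal> function are hypothetical stand-ins invented for
this illustration, with each compare producing an all-ones or all-zeros mask
per lane.
<programlisting><![CDATA[#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane model of a vector double (not a real VSX type). */
typedef struct { double lane[2]; } v2d;
/* Per-lane compare result: all ones for true, all zeros for false. */
typedef struct { uint64_t lane[2]; } v2m;

/* Stand-ins for the three compares the text says the ISA provides. */
static v2m model_cmpeq(v2d a, v2d b)
{
    v2m r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

static v2m model_cmpgt(v2d a, v2d b)
{
    v2m r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] > b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

static v2m model_cmpge(v2d a, v2d b)
{
    v2m r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = (a.lane[i] >= b.lane[i]) ? UINT64_MAX : 0;
    return r;
}

/* Stand-in for the missing vec_not: complement every lane. */
static v2m model_not(v2m m)
{
    v2m r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = ~m.lane[i];
    return r;
}

/* The derived compares, following the transforms in the text. */
static v2m model_cmpne(v2d a, v2d b) { return model_not(model_cmpeq(a, b)); }
static v2m model_cmplt(v2d a, v2d b) { return model_cmpgt(b, a); }
static v2m model_cmple(v2d a, v2d b) { return model_cmpge(b, a); }

/* Self-check: lane 0 compares 1.0 with 2.0, lane 1 compares 5.0 with 5.0. */
static void check_derived_compares(void)
{
    v2d a = { { 1.0, 5.0 } };
    v2d b = { { 2.0, 5.0 } };
    v2m ne = model_cmpne(a, b);
    v2m lt = model_cmplt(a, b);
    v2m le = model_cmple(a, b);
    assert(ne.lane[0] == UINT64_MAX && ne.lane[1] == 0);
    assert(lt.lane[0] == UINT64_MAX && lt.lane[1] == 0);
    assert(le.lane[0] == UINT64_MAX && le.lane[1] == UINT64_MAX);
}]]></programlisting></para>

<para>Calling <literal>check_derived_compares()</literal> exercises all three
derived forms. Only the operand order or a final complement differs from the
three primitive compares.</para>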
<para>Wait a minute, there is no <literal>vec_not()</literal> either. It cannot
be found in the PowerISA, the OpenPOWER ABI, or the GCC PowerPC AltiVec
Built-in documentation. There is no <literal>vec_move()</literal> either!
How can this possibly work?</para>
<para>This is RISC philosophy again. We can always use a logical
instruction (like bitwise <emphasis role="bold">and</emphasis> or
<emphasis role="bold">or</emphasis>) to effect a move, given that we also have
nondestructive three-register instruction forms. In the PowerISA, most
instructions have two input registers and a separate result register. So if
the result register number is different from either input register, the
inputs are not clobbered (nondestructive). Of course, nothing prevents you
from specifying the same register for both inputs, or even for all three
registers (result and both inputs). And sometimes it is useful.</para>
<para>The statement <literal>B = vec_or (A,A)</literal> is effectively a
vector move/copy from <literal>A</literal> to <literal>B</literal>. And
<literal>A = vec_or (A,A)</literal> is obviously a
<emphasis role="bold"><literal>nop</literal></emphasis> (no operation). In
fact, the PowerISA defines the preferred <literal>nop</literal> and register
move for vector registers in this way.</para>
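<para>The move and <literal>nop</literal> idioms above can be checked with a
small portable C model. This is a sketch only: the <literal>vreg</literal>
type and the <literal>model_or</literal> and <literal>model_move</literal>
functions are hypothetical names invented here to mimic a three-operand
<emphasis role="bold">or</emphasis> on a 128-bit register. They are not part
of any PowerISA or GCC interface.
<programlisting><![CDATA[#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves. */
typedef struct { uint64_t half[2]; } vreg;

/* Three-operand OR: the result is separate from both inputs,
   so neither input is clobbered (nondestructive). */
static vreg model_or(vreg a, vreg b)
{
    vreg r = { { a.half[0] | b.half[0], a.half[1] | b.half[1] } };
    return r;
}

/* A register move is an OR of a register with itself. */
static vreg model_move(vreg a) { return model_or(a, a); }

static void check_move(void)
{
    vreg a = { { 0x0123456789abcdefULL, 0xfedcba9876543210ULL } };

    /* B = vec_or (A,A): B receives a copy of A. */
    vreg b = model_move(a);
    assert(b.half[0] == a.half[0] && b.half[1] == a.half[1]);

    /* A = vec_or (A,A): A is unchanged, so this acts as a nop. */
    a = model_or(a, a);
    assert(a.half[0] == 0x0123456789abcdefULL);
    assert(a.half[1] == 0xfedcba9876543210ULL);
}]]></programlisting></para>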
<para>The PowerISA implements the logical operators
<emphasis role="bold">nor</emphasis> (<emphasis role="bold">not or</emphasis>)
and <emphasis role="bold">nand</emphasis> (<emphasis role="bold">not and</emphasis>).
It provides these instructions for
fixed-point and vector logical operations. So <literal>vec_not(A)</literal>
can be implemented as <literal>vec_nor(A,A)</literal>.
So for the implementation of <literal>_mm_cmpneq_pd</literal> and
<literal>_mm_cmpneq_sd</literal> we propose the following:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
  /* Compare for equality, then invert the mask with NOR.  */
  __v2df temp = (__v2df) vec_cmpeq ((__v2df)__A, (__v2df)__B);
  return ((__m128d)vec_nor (temp, temp));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
  __v2df a, b, c;
  /* Splat element 0 of each input so the compare sees only the low doubles.  */
  a = vec_splat (__A, 0);
  b = vec_splat (__B, 0);
  c = (__v2df)vec_cmpeq (a, b);
  c = (__v2df)vec_nor (c, c);
  /* Element 0 is the compare result; element 1 passes through from __A.  */
  return ((__m128d){c[0], __A[1]});
}]]></programlisting></para>
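<para>The lane behavior of the scalar form can be illustrated in portable C.
This is a sketch under stated assumptions, not the proposed header code:
<literal>pair_d</literal>, <literal>model_nor64</literal>,
<literal>lane_bits</literal>, and <literal>model_cmpneq_sd</literal> are
hypothetical names invented for this illustration. It models a
<emphasis role="bold">nor</emphasis> of a mask with itself as a complement,
and shows that only element 0 carries the compare result while element 1 is
copied from the first operand.
<programlisting><![CDATA[#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-lane model of a vector double (not a real VSX type). */
typedef struct { double lane[2]; } pair_d;

/* NOR on one 64-bit lane; NOR of a value with itself is its complement. */
static uint64_t model_nor64(uint64_t a, uint64_t b) { return ~(a | b); }

/* Reinterpret the bits of a lane, much as the vector casts do. */
static uint64_t lane_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Model of the scalar form: lane 0 holds the mask, lane 1 comes from a. */
static pair_d model_cmpneq_sd(pair_d a, pair_d b)
{
    uint64_t eq = (a.lane[0] == b.lane[0]) ? UINT64_MAX : 0;
    uint64_t ne = model_nor64(eq, eq);
    pair_d r;
    memcpy(&r.lane[0], &ne, sizeof ne);
    r.lane[1] = a.lane[1];
    return r;
}

static void check_cmpneq_sd(void)
{
    pair_d a = { { 1.0, 7.0 } };
    pair_d b = { { 2.0, 9.0 } };

    pair_d r = model_cmpneq_sd(a, b);
    assert(lane_bits(r.lane[0]) == UINT64_MAX); /* 1.0 != 2.0: all ones */
    assert(r.lane[1] == 7.0);                   /* upper lane from a    */

    r = model_cmpneq_sd(a, a);
    assert(lane_bits(r.lane[0]) == 0);          /* 1.0 == 1.0: all zeros */
}]]></programlisting></para>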
<para>The Intel Intrinsics also include the not forms of the relational
compares:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B);
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return (__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>
<para>Neither the PowerISA, the OpenPOWER ABI, nor the GCC PowerPC AltiVec
Built-in documentation provides any direct equivalents to the negated
relational compares. Again, you don't really need them if you know Boolean
algebra. For ordered operands (neither is a NaN), we can use identities like
{<emphasis role="bold">not</emphasis> (A &lt; B) iff A &gt;= B} and
{<emphasis role="bold">not</emphasis> (A &lt;= B) iff A &gt; B}.
So the PPC64LE implementation follows:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpge (__A, __B));
}

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
  return ((__m128d)vec_cmpgt (__A, __B));
}]]></programlisting></para>
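<para>The two identities used above can be checked with scalar C doubles.
This is a sketch, not part of the proposed header: the four helper functions
and <literal>check_not_identities</literal> are hypothetical names invented
for this illustration. It also records a property of IEEE 754 comparisons
worth keeping in mind when relying on these identities: when an operand is a
NaN, every ordered comparison is false, so the two sides of each identity no
longer agree.
<programlisting><![CDATA[#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Scalar stand-ins for the negated compares and their replacements. */
static bool not_less_than(double a, double b)    { return !(a < b); }
static bool greater_or_equal(double a, double b) { return a >= b; }
static bool not_less_equal(double a, double b)   { return !(a <= b); }
static bool greater_than(double a, double b)     { return a > b; }

static void check_not_identities(void)
{
    const double v[] = { -1.5, 0.0, 2.0, INFINITY };
    const int n = (int)(sizeof v / sizeof v[0]);

    /* For ordered values the identities hold for every pair. */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            assert(not_less_than(v[i], v[j]) == greater_or_equal(v[i], v[j]));
            assert(not_less_equal(v[i], v[j]) == greater_than(v[i], v[j]));
        }
    }

    /* With a NaN operand, (a < b) and (a >= b) are both false,
       so not (a < b) is true while a >= b is false. */
    assert(not_less_than(NAN, 1.0));
    assert(!greater_or_equal(NAN, 1.0));
}]]></programlisting></para>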
<para>These patterns repeat for the scalar versions of the
<emphasis role="bold">not</emphasis> compares. In general, the larger
pattern described in this chapter applies to the other
float and integer types with similar interfaces.</para>
</section>