You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Programming-Guides/Vector_Intrinsics/sec_vec_or_not.xml

153 lines
7.0 KiB
XML

This file contains invisible Unicode characters!

This file contains invisible Unicode characters that may be processed differently from what appears below. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to reveal hidden characters.

<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2017 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<section xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="sec_vec_or_not">
<title>To vec_not or not</title>
<para>Now lets look at a similar example that adds some surprising
complexity. When we look at the negated compare forms we can not find
exact matches in the PowerISA. But a little knowledge of boolean
algebra can show the way to the equivalent functions.</para>
<para>First the X86 compare not equal case where we might expect to
find the equivalent vec_cmpne builtins for PowerISA:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B);
}
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>
<para>Well not exactly. Looking at the OpenPOWER ABI document we see a
reference to
<literal>vec_cmpne</literal> for all numeric types. But when we look in the current
GCC 6 documentation we find that
<literal>vec_cmpne</literal> is not on the list. So it is planned
in the ABI, but not implemented yet.</para>
<para>Looking at the PowerISA 2.07B we find a VSX Vector Compare Equal to
Double-Precision but no Not Equal. In fact we see only vector double compare
instructions for greater than and greater than or equal in addition to the
equal compare. Not only can't we find a not equal, there is no less than or
less than or equal compares either.</para>
<para>So what is going on here? Partially this is the Reduced Instruction
Set Computer (RISC) design philosophy. In this case the compiler can generate
all the required compares using the existing vector instructions and simple
transforms based on Boolean algebra. So
<literal>vec_cmpne(A,B)</literal> is simply <literal>vec_not
(vec_cmpeq(A,B))</literal>. And <literal>vec_cmplt(A,B)</literal> is simply
<literal>vec_cmpgt(B,A)</literal> based on the
identity A &lt; B <emphasis><emphasis role="bold">iff</emphasis></emphasis> B &gt; A.
Similarly <literal>vec_cmple(A,B)</literal> is implemented as
<literal>vec_cmpge(B,A)</literal>.</para>
<para>What a minute, there is no <literal>vec_not()</literal> either. Can not find it in the
PowerISA, the OpenPOWER ABI, or the GCC PowerPC Altivec Built-in documentation.
There is no <literal>vec_move()</literal> either! How can this possibly work?</para>
<para>This is RISC philosophy again. We can always use a logical
instruction (like bit wise <emphasis role="bold">and</emphasis> or
<emphasis role="bold">or</emphasis>) to effect a move, given that we also have
nondestructive 3 register instruction forms. In the PowerISA most instruction
have two input registers and a separate result register. So if the result
register number is  different from either input register then the inputs are
not clobbered (nondestructive). Of course nothing prevents you from specifying
the same register for both inputs or even all three registers (result and both
inputs).  And some times it is useful.</para>
<para>The statement <literal>B = vec_or (A,A)</literal> is is effectively a vector move/copy
from <literal>A</literal> to <literal>B</literal>. And <literal>A = vec_or (A,A)</literal> is obviously a
<emphasis role="bold"><literal>nop</literal></emphasis> (no operation). In fact the
PowerISA defines the preferred <literal>nop</literal> and register move for vector registers
in this way.</para>
<para>The PowerISA implements the logical operators
<emphasis role="bold">nor</emphasis> (<emphasis role="bold">not or</emphasis>)
and <emphasis role="bold">nand</emphasis> (<emphasis role="bold">not and</emphasis>).  
The PowerISA provides these instruction for
fixed point and vector logical operations. So <literal>vec_not(A)</literal>
can be implemented as <literal>vec_nor(A,A)</literal>.
So for the implementation of _mm_cmpne we propose the following:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_pd (__m128d __A, __m128d __B)
{
__v2df temp = (__v2df ) vec_cmpeq ((__v2df) __A, (__v2df)__B);
return ((__m128d)vec_nor (temp, temp));
}
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpneq_sd (__m128d __A, __m128d __B)
{
__v2df a, b, c;
a = vec_splat(__A, 0);
b = vec_splat(__B, 0);
c = (__v2df)vec_cmpeq(a, b);
c = (__v2df)vec_nor(c, c);
return ((__m128d){c[0], __A[1]});
}]]></programlisting></para>
<para>The Intel Intrinsics also include the not forms of the relational
compares:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B);
}
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
return (__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B);
}]]></programlisting></para>
<para>The PowerISA and OpenPOWER ABI, or GCC PowerPC Altivec Built-in
documentation do not provide any direct equivalents to the  not greater than
class of compares. Again you don't really need them if you know Boolean
algebra. We can use identities like
{<emphasis role="bold">not</emphasis> (A &lt; B) iff A &gt;= B} and
{<emphasis role="bold">not</emphasis> (A
&lt;= B) iff A &gt; B}. So the PPC64LE implementation follows:
<programlisting><![CDATA[extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnlt_pd (__m128d __A, __m128d __B)
{
return ((__m128d)vec_cmpge (__A, __B));
}
extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_cmpnle_pd (__m128d __A, __m128d __B)
{
return ((__m128d)vec_cmpgt (__A, __B));
}]]></programlisting></para>
<para>These patterns repeat for the scalar version of the
<emphasis role="bold">not</emphasis> compares. And
in general the larger pattern described in this chapter applies to the other
float and integer types with similar interfaces.</para>
</section>