<!--
  Copyright (c) 2019 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_techniques">

  <!-- Chapter Title goes here. -->
  <title>Vector Programming Techniques</title>

  <section>
    <title>Help the Compiler Help You</title>
    <para>
      Start with scalar code, which is the most portable, and help
      the compiler vectorize it automatically. Align your data on
      16-byte boundaries wherever possible, and tell the compiler
      that it is aligned. Use <code>__restrict__</code> pointers to
      promise that the data they reference does not alias.
    </para>
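    <para>
      The following is a minimal sketch of these hints, not a
      definitive recipe. The function name <code>scale_add</code> is
      hypothetical, and <code>__builtin_assume_aligned</code> is a
      GCC and Clang builtin assumed to be available; other compilers
      may spell the alignment hint differently. Whether the loop is
      actually vectorized depends on the compiler and the options
      used.
    </para>
    <programlisting language="c"><![CDATA[
```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * __restrict__ promises that x and y never overlap.
 * The caller must really pass 16-byte-aligned pointers;
 * passing misaligned data after making this promise is undefined. */
static void scale_add(size_t n, float a,
                      const float *__restrict__ x,
                      float *__restrict__ y)
{
    /* GCC/Clang builtin: tell the compiler the data is 16-byte aligned. */
    const float *xa = __builtin_assume_aligned(x, 16);
    float *ya = __builtin_assume_aligned(y, 16);

    /* Plain scalar loop; the hints above make it easier to auto-vectorize. */
    for (size_t i = 0; i < n; i++)
        ya[i] = a * xa[i] + ya[i];
}
```
]]></programlisting>
    <para>
      Callers can obtain suitably aligned storage with, for example,
      <code>__attribute__((aligned(16)))</code> on an array
      declaration. With GCC, building at <code>-O3</code> with
      <code>-fopt-info-vec</code> should report whether the loop was
      vectorized.
    </para>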
  </section>

  <section>
    <title>Use Portable Intrinsics</title>
    <para>
      Individual compilers may provide other intrinsic support. Only
      the intrinsics in this manual are guaranteed to be portable
      across compliant compilers.
    </para>
    <para>
      Some compilers provide compatibility headers that ease porting
      of code written for other architectures. Recent GCC and Clang
      compilers support compatibility headers for the lower levels
      of the x86 vector architecture. These can be used initially
      for ease of porting, but for best performance, it is preferable
      to rewrite important sections of code with native Power
      intrinsics.
    </para>
  </section>

  <section>
    <title>Use Assembly Code Sparingly</title>
    <para>filler</para>
    <section>
      <title>Inline Assembly</title>
      <para>filler</para>
    </section>
    <section>
      <title>Assembly Files</title>
      <para>filler</para>
    </section>
  </section>

  <section>
    <title>Other Vector Programming APIs</title>
    <para>In addition to the intrinsic functions provided in this
    reference, programmers should be aware of other vector programming
    API resources.</para>
    <section>
      <title>x86 Vector Portability Headers</title>
      <para>
        Recent versions of the <code>gcc</code> and <code>clang</code>
        open-source compilers provide "drop-in" portability headers
        for portions of the Intel Architecture Instruction Set
        Extensions (see <xref linkend="VIPR.intro.links" />). These
        headers mirror the APIs of the Intel headers that have the
        same names. Support is provided for the MMX and SSE layers,
        up through SSE4. At this time, no support for the AVX layers
        is envisioned.
      </para>
      <para>
        The portability headers provide the same semantics as the
        corresponding Intel APIs, but use VMX and VSX instructions
        to emulate the Intel vector instructions. It should be
        emphasized that these headers are provided for portability,
        and will not necessarily perform optimally (although in many
        cases the performance is very good). Using these headers is
        often a good first step in porting a library that uses Intel
        intrinsics to POWER, after which more detailed rewriting of
        algorithms is usually desirable for best performance.
      </para>
      <para>
        Access to the portability APIs occurs automatically when
        including one of the corresponding Intel header files, such as
        <code>&lt;mmintrin.h&gt;</code>.
      </para>
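      <para>
        As an illustration, the sketch below uses a few SSE
        intrinsics from <code>&lt;xmmintrin.h&gt;</code>; the same
        source is intended to build on x86 and, through the
        portability headers, on POWER. The helper name
        <code>add_f32</code> is hypothetical. The
        <code>NO_WARN_X86_INTRINSICS</code> macro reflects the
        behavior of the GCC and Clang portability headers as
        understood at the time of writing, which stop with an error
        until the macro is defined; treat it as an assumption and
        check your compiler's headers.
      </para>
      <programlisting language="c"><![CDATA[
```c
/* Assumption: on POWER, the portability headers require this macro
 * as an acknowledgment that the x86 intrinsics are emulated. */
#if defined(__powerpc64__) && !defined(NO_WARN_X86_INTRINSICS)
#define NO_WARN_X86_INTRINSICS 1
#endif

#include <xmmintrin.h>  /* SSE intrinsics (Intel header name) */

/* Hypothetical helper: dst[i] = a[i] + b[i], four floats at a time.
 * n is assumed to be a multiple of 4; unaligned loads and stores are
 * used so the caller does not need to align the arrays. */
static void add_f32(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
}
```
]]></programlisting>
      <para>
        Once code like this runs correctly on POWER, hot loops are
        candidates for rewriting with the native intrinsics described
        in this manual.
      </para>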
    </section>
    <section>
      <title>The POWER Vector Library (pveclib)</title>
      <para>The POWER Vector Library, also known as
      <code>pveclib</code>, is a separate project available from
      GitHub (see <xref linkend="VIPR.intro.links" />). The
      <code>pveclib</code> project builds on the intrinsics
      described in this manual to provide higher-level vector
      interfaces that are highly portable. The goals of the project
      include:
      </para>
      <itemizedlist>
        <listitem>
          <para>
            Providing equivalent functions across versions of the
            PowerISA. For example, the <emphasis>Vector
            Multiply-by-10 Unsigned Quadword</emphasis> operation
            introduced in PowerISA 3.0 (POWER9) can be implemented
            using a few vector instructions on earlier PowerISA
            versions.
          </para>
        </listitem>
        <listitem>
          <para>
            Providing equivalent functions across compiler versions.
            For example, intrinsics provided in later versions of the
            compiler can be implemented as inline functions with
            inline asm in earlier compiler versions.
          </para>
        </listitem>
        <listitem>
          <para>
            Providing higher-order functions not provided directly by
            the PowerISA. One example is a vector SIMD implementation
            of ASCII <code>__isalpha</code> and similar functions.
            Another example is full <code>__int128</code>
            implementations of <emphasis>Count Leading
            Zeroes</emphasis>, <emphasis>Population Count</emphasis>,
            and <emphasis>Multiply</emphasis>.
          </para>
        </listitem>
      </itemizedlist>
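      <para>
        To give a feel for what a higher-order function of this kind
        involves, the sketch below classifies 16 ASCII bytes at once.
        It is not <code>pveclib</code> code and does not use the
        <code>pveclib</code> API: it relies on the generic vector
        extensions of GCC and Clang
        (<code>__attribute__((vector_size))</code>) so that it builds
        on more than one architecture, and the names
        <code>u8x16</code>, <code>splat_u8</code>,
        <code>ascii_isalpha_mask</code>, and
        <code>mark_alpha16</code> are all hypothetical.
      </para>
      <programlisting language="c"><![CDATA[
```c
#include <string.h>

/* 16 unsigned bytes in one vector (GCC/Clang vector extension). */
typedef unsigned char u8x16 __attribute__((vector_size(16)));

/* Replicate one byte into all 16 lanes. */
static u8x16 splat_u8(unsigned char b)
{
    u8x16 v;
    memset(&v, b, sizeof v);
    return v;
}

/* Return 0xFF in each lane holding an ASCII letter, 0x00 otherwise. */
static u8x16 ascii_isalpha_mask(u8x16 c)
{
    /* Setting bit 0x20 maps 'A'..'Z' onto 'a'..'z'. */
    u8x16 folded = c | splat_u8(0x20);
    /* Letters now land in 0..25; every other byte value wraps
     * (modulo 256) to something that is 26 or larger. */
    u8x16 offset = folded - splat_u8('a');
    /* A lane-wise compare yields all-ones or all-zero lanes. */
    return (u8x16)(offset < splat_u8(26));
}

/* Classify 16 bytes from src, writing one mask byte per input byte. */
static void mark_alpha16(const unsigned char *src, unsigned char *out)
{
    u8x16 c;
    memcpy(&c, src, sizeof c);      /* unaligned-safe load */
    u8x16 m = ascii_isalpha_mask(c);
    memcpy(out, &m, sizeof m);      /* unaligned-safe store */
}
```
]]></programlisting>
      <para>
        The single unsigned compare replaces two range checks per
        character, which is the kind of branch-free formulation that
        maps well onto vector instructions.
      </para>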
    </section>
  </section>

</chapter>