Vector Programming Techniques
Help the Compiler Help You
Start with scalar code, which is the most portable, and give
the compiler the information it needs to vectorize that code.
Align your data on 16-byte boundaries wherever possible, and
tell the compiler that it is aligned. Use __restrict__ pointers
to promise that the data they point to does not alias.
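
As a minimal sketch of the advice above, the following scalar loop
carries both promises. The function name add_arrays is hypothetical,
and __builtin_assume_aligned is a GCC and Clang built-in that this
section does not itself name; it is assumed here as one way to tell
the compiler about alignment.

```c
#include <stddef.h>

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats.
   The caller promises that all three arrays are 16-byte aligned
   and that they do not overlap. */
void add_arrays(float *__restrict__ dst, const float *__restrict__ a,
                const float *__restrict__ b, size_t n)
{
    /* Assumption: GCC/Clang built-in that records the alignment promise. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *x = __builtin_assume_aligned(a, 16);
    const float *y = __builtin_assume_aligned(b, 16);

    /* Plain scalar loop; with the promises above the compiler is free
       to vectorize it without runtime alias or alignment checks. */
    for (size_t i = 0; i < n; i++)
        d[i] = x[i] + y[i];
}
```

Passing a misaligned or overlapping buffer to such a function breaks
the promise and results in undefined behavior, so the guarantees
belong in the function's documented contract.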
Use Portable Intrinsics
Individual compilers may provide other intrinsic support. Only
the intrinsics in this manual are guaranteed to be portable
across compliant compilers.
Some compilers may provide compatibility headers for use with
other architectures. Recent GCC and Clang compilers support
compatibility headers for the lower levels of the x86 vector
architecture. These can be used initially for ease of porting,
but for best performance, it is preferable to rewrite important
sections of code with native Power intrinsics.
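
For illustration, a hot loop rewritten with native intrinsics might
look like the sketch below. The function name add_floats is
hypothetical, and the choice of vec_xl, vec_xst, and vec_add from
<altivec.h> is an assumption about which intrinsics fit this loop.
The scalar fallback is included only so that the sketch also compiles
on targets without VSX.

```c
#include <stddef.h>
#if defined(__VSX__)
#include <altivec.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__VSX__)
    /* Four floats per iteration. vec_xl and vec_xst take a byte
       offset and do not require 16-byte alignment. */
    for (; i + 4 <= n; i += 4) {
        long off = (long)(i * sizeof(float));
        __vector float va = vec_xl(off, a);
        __vector float vb = vec_xl(off, b);
        vec_xst(vec_add(va, vb), off, dst);
    }
#endif
    /* Scalar loop: handles the tail, or the whole array when VSX
       is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```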
Use Assembly Code Sparingly
Other Vector Programming APIs
In addition to the intrinsic functions provided in this
reference, programmers should be aware of other vector programming
API resources.
x86 Vector Portability Headers
Recent versions of the gcc and clang open source compilers
provide "drop-in" portability headers for portions of the
Intel Architecture Instruction Set Extensions (see ). These
headers mirror the APIs of Intel headers having the same
names. Support is provided for the MMX and SSE layers, up
through SSE4. At this time, no support for the AVX layers is
envisioned.
The portability headers provide the same semantics as the
corresponding Intel APIs, but using VMX and VSX instructions
to emulate the Intel vector instructions. It should be
emphasized that these headers are provided for portability,
and will not necessarily perform optimally (although in many
cases the performance is very good). Using these headers is
often a good first step in porting a library using Intel
intrinsics to POWER, after which more detailed rewriting of
algorithms is usually desirable for best performance.
Access to the portability APIs occurs automatically when
including one of the corresponding Intel header files, such as
<mmintrin.h>.
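
As a rough sketch, code written against the SSE API can be compiled
unchanged once the header is included. The function name madd4 and
the HAVE_SSE_API macro are hypothetical. The NO_WARN_X86_INTRINSICS
macro is not described in this section; it is the acknowledgement
that the GCC and Clang portability headers ask for on POWER before
they can be used. The scalar fallback exists only so the sketch
compiles on targets that have neither SSE nor the portability
headers.

```c
#if defined(__SSE__) || defined(__powerpc64__)
/* Needed on POWER to acknowledge use of the portability headers;
   it has no effect on x86. */
#define NO_WARN_X86_INTRINSICS 1
#include <xmmintrin.h>   /* SSE API; the portability header on POWER */
#define HAVE_SSE_API 1   /* hypothetical macro used by this sketch */
#endif

/* Hypothetical helper: dst[k] = a[k] * b[k] + c[k] for k = 0..3. */
void madd4(float *dst, const float *a, const float *b, const float *c)
{
#if defined(HAVE_SSE_API)
    /* Unaligned loads and stores, so no alignment is assumed. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);
    _mm_storeu_ps(dst, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
    for (int k = 0; k < 4; k++)
        dst[k] = a[k] * b[k] + c[k];
#endif
}
```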
The POWER Vector Library (pveclib)
The POWER Vector Library, also known as pveclib, is a separate
project available from GitHub (see ). The pveclib project
builds on top of the intrinsics described in this manual to
provide higher-level vector interfaces that are highly
portable. The goals of the project include:
Providing equivalent functions across versions of the
PowerISA. For example, the Vector
Multiply-by-10 Unsigned Quadword operation
introduced in PowerISA 3.0 (POWER9) can be implemented
using a few vector instructions on earlier PowerISA
versions.
Providing equivalent functions across compiler versions.
For example, intrinsics provided in later versions of the
compiler can be implemented as inline functions with
inline asm in earlier compiler versions.
Providing higher-order functions not provided directly by
the PowerISA. One example is a vector SIMD implementation
for ASCII __isalpha and similar functions. Another example
is full __int128 implementations of Count Leading Zeroes,
Population Count, and Multiply.
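
To make the notion of a full __int128 operation concrete, the sketch
below builds 128-bit Population Count and Count Leading Zeroes from
64-bit compiler built-ins. It is a portable scalar illustration only,
not pveclib code and not its API. The names u128, popcount_u128, and
clz_u128 are hypothetical, and unsigned __int128 is assumed to be
available, as it is with GCC and Clang on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical type and helpers; these are not pveclib interfaces. */
typedef unsigned __int128 u128;

/* Population count of a 128-bit value: sum of the two 64-bit halves. */
int popcount_u128(u128 x)
{
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t lo = (uint64_t)x;
    return __builtin_popcountll(hi) + __builtin_popcountll(lo);
}

/* Count leading zeroes of a 128-bit value; returns 128 for zero.
   __builtin_clzll is undefined for a zero argument, so each half
   is checked before it is passed in. */
int clz_u128(u128 x)
{
    uint64_t hi = (uint64_t)(x >> 64);
    uint64_t lo = (uint64_t)x;
    if (hi != 0)
        return __builtin_clzll(hi);
    if (lo != 0)
        return 64 + __builtin_clzll(lo);
    return 128;
}
```

A vector implementation would keep the value in a single vector
register rather than splitting it into two scalar halves, but the
result for each input is the same.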