|
|
|
<!--
|
|
|
|
Copyright (c) 2019 OpenPOWER Foundation
|
|
|
|
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
you may not use this file except in compliance with the License.
|
|
|
|
You may obtain a copy of the License at
|
|
|
|
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
See the License for the specific language governing permissions and
|
|
|
|
limitations under the License.
|
|
|
|
|
|
|
|
-->
|
|
|
|
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
|
|
|
|
|
|
|
|
<!-- Chapter Title goes here. -->
|
|
|
|
<title>Introduction to Vector Programming on Power</title>
|
|
|
|
|
|
|
|
<section>
|
|
|
|
<title>A Brief History</title>
|
|
|
|
<para>
|
|
|
|
The history of vector programming on <phrase
|
|
|
|
revisionflag="changed"><trademark
|
|
|
|
class="registered">Power</trademark></phrase> processors begins
|
|
|
|
with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The
|
|
|
|
AIM partners developed the Power Vector Media Extension (VMX) to
|
|
|
|
accelerate multimedia applications, particularly image
|
|
|
|
processing. VMX is the name still used by IBM for this
|
|
|
|
instruction set. Freescale (formerly Motorola) used the
|
|
|
|
trademark <phrase revisionflag="changed"><trademark
|
|
|
|
class="trade">AltiVec</trademark>,</phrase> while Apple at one
|
|
|
|
time called it "Velocity
|
|
|
|
Engine." While VMX remains the most official name, the term
|
|
|
|
AltiVec is still in common use today. Freescale's AltiVec
|
|
|
|
Technology Programming Interface Manual (the "AltiVec PIM") is
|
|
|
|
still available online for reference (see <xref
|
|
|
|
linkend="VIPR.intro.links" />).
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The original VMX specification provided for thirty-two 128-bit
|
|
|
|
vector registers (VRs). Each register can be treated as
|
|
|
|
containing sixteen 8-bit character values, eight 16-bit short
|
|
|
|
integer values, four 32-bit integer values, or four 32-bit
|
|
|
|
single-precision floating-point values (the "VMX data types").
|
|
|
|
Furthermore, the integer data types have signed, unsigned, and
|
|
|
|
boolean variants. An extensive set of arithmetic, logical,
|
|
|
|
comparison, conversion, memory access, and permute class
|
|
|
|
operations were specified to operate on these registers.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The AltiVec PIM documents intrinsic functions to be used by
|
|
|
|
programmers to access the VMX instruction set. Because similar
|
|
|
|
operations are provided for all the VMX data types, the PIM
|
|
|
|
provides for overloaded intrinsics that can operate on different
|
|
|
|
data types. However, such function overloading is not normally
|
|
|
|
acceptable in the C programming language, so compilers compliant
|
|
|
|
with the AltiVec PIM (such as GCC and Clang) were required to
|
|
|
|
add special handling to their parsers to permit this. The PIM
|
|
|
|
suggested (but did not mandate) the use of a header file,
|
|
|
|
<code><altivec.h></code>, for implementations that provide
|
|
|
|
AltiVec intrinsics. This is common practice for all compliant
|
|
|
|
compilers today.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The first chips incorporating the VMX instruction set were
|
|
|
|
introduced by Freescale in 1999, and used primarly in Apple
|
|
|
|
desktop computers. IBM's last desktop CPU (the PowerPC 970)
|
|
|
|
also included AltiVec support, and was used in the Apple
|
|
|
|
PowerMac G5. IBM initially omitted support for VMX from its
|
|
|
|
server-class computers, but added support for it in the POWER6
|
|
|
|
<phrase revisionflag="added">processor-based</phrase> server
|
|
|
|
family.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
IBM extended VMX by introducing the Vector-Scalar Extension
|
|
|
|
(VSX) for the <phrase revisionflag="changed"><trademark
|
|
|
|
class="registered">POWER7</trademark></phrase> family of
|
|
|
|
processors. VSX adds sixty-four
|
|
|
|
128-bit vector-scalar registers (VSRs); however, to optimize the amount
|
|
|
|
of per-process register state, the registers overlap with the
|
|
|
|
VRs and the scalar floating-point registers (FPRs) (see <xref
|
|
|
|
linkend="VIPR.intro.unified" />). The VSRs can represent all
|
|
|
|
the data types representable by the VRs, and can also be treated
|
|
|
|
as containing two 64-bit integers or two 64-bit double-precision
|
|
|
|
floating-point values. However, ISA support for two 64-bit
|
|
|
|
integers in VSRs was limited until Version 2.07 (<phrase
|
|
|
|
revisionflag="changed"><trademark
|
|
|
|
class="registered">POWER8</trademark></phrase>) of the
|
|
|
|
Power ISA, and only the VRs are supported for these
|
|
|
|
instructions.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
Both the VMX and VSX instruction sets have been expanded for the
|
|
|
|
<phrase revisionflag="changed">POWER8, <trademark
|
|
|
|
class="registered">POWER9</trademark>, and <trademark
|
|
|
|
class="registered">Power10</trademark></phrase> processor
|
|
|
|
families. Starting with POWER8,
|
|
|
|
a VSR can now contain a single 128-bit integer; and starting
|
|
|
|
with POWER9, a VSR can contain a single 128-bit IEEE floating-point
|
|
|
|
value. Again, the ISA currently only supports 128-bit
|
|
|
|
operations on values in the VRs.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The VMX and VSX instruction sets together may be referred to as
|
|
|
|
the Power SIMD (single-instruction, multiple-data)
|
|
|
|
instructions.
|
|
|
|
</para>
|
|
|
|
<section>
|
|
|
|
<title>Little-Endian Linux</title>
|
|
|
|
<para>
|
|
|
|
The Power architecture has supported operation in either
|
|
|
|
big-endian (BE) or little-endian (LE) mode from the
|
|
|
|
beginning. However, IBM's Power servers were only shipped
|
|
|
|
with big-endian operating systems (<phrase
|
|
|
|
revisionflag="changed"><trademark
|
|
|
|
class="registered">AIX</trademark>, IBM i, <trademark
|
|
|
|
class="registered">Linux</trademark></phrase>) prior to
|
|
|
|
the introduction of POWER8. With POWER8, IBM began
|
|
|
|
supporting little-endian Linux distributions for the first
|
|
|
|
time, and introduced a new application binary interface (the
|
|
|
|
64-Bit ELFv2 ABI Specification <xref
|
|
|
|
linkend="VIPR.intro.links" />) that can be used for either
|
|
|
|
big- or little-endian support. In practice, the ELFv2 ABI is
|
|
|
|
currently used only for little-endian Linux.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
Although Power has always supported big- and little-endian
|
|
|
|
memory accesses, the introduction of vector register support
|
|
|
|
added a layer of complexity to programming for processors
|
|
|
|
operating in different endian modes. Arrays of elements
|
|
|
|
loaded into a VR or VSR will be indexed from left to right in
|
|
|
|
the register in big-endian mode, but will be indexed from
|
|
|
|
right to left in the register in little-endian mode. However,
|
|
|
|
the VMX and VSX instructions originally assumed that elements
|
|
|
|
will always be indexed from left to right in the register.
|
|
|
|
This is an inconvenience that needs to be hidden from the
|
|
|
|
application programmer wherever possible. To this end, IBM
|
|
|
|
developed a bi-endian vector programming model (see <xref
|
|
|
|
linkend="VIPR.biendian" />). The intrinsic functions provided
|
|
|
|
for the bi-endian vector programming model are described in
|
|
|
|
<xref linkend="VIPR.vec-ref" />.
|
|
|
|
</para>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.unified">
|
|
|
|
<title>The Unified Vector Register Set</title>
|
|
|
|
<para>
|
|
|
|
In <phrase revisionflag="changed"><trademark
|
|
|
|
class="trade">OpenPOWER</trademark>-compliant</phrase>
|
|
|
|
processors, floating-point and vector
|
|
|
|
operations are implemented using a unified vector-scalar model.
|
|
|
|
As shown in <xref linkend="FPR-VSR" /> and <xref
|
|
|
|
linkend="VR-VSR" />, there are 64 vector-scalar registers; each
|
|
|
|
is 128 bits wide.
|
|
|
|
</para>
|
|
|
|
<figure pgwide="1" xml:id="FPR-VSR">
|
|
|
|
<title>Floating-Point Registers as Part of VSRs</title>
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject>
|
|
|
|
<imagedata fileref="fig-fpr-vsr.png" format="PNG"
|
|
|
|
scalefit="1" width="100%" />
|
|
|
|
</imageobject>
|
|
|
|
</mediaobject>
|
|
|
|
</figure>
|
|
|
|
<figure pgwide="1" xml:id="VR-VSR">
|
|
|
|
<title>Vector Registers as Part of VSRs</title>
|
|
|
|
<mediaobject>
|
|
|
|
<imageobject>
|
|
|
|
<imagedata fileref="fig-vr-vsr.png" format="PNG"
|
|
|
|
scalefit="1" width="100%" />
|
|
|
|
</imageobject>
|
|
|
|
</mediaobject>
|
|
|
|
</figure>
|
|
|
|
<para>
|
|
|
|
The vector-scalar registers can be addressed with VSX
|
|
|
|
instructions, for vector and scalar processing of all 64
|
|
|
|
registers, or with the "classic" Power floating-point
|
|
|
|
instructions to refer to a 32-register subset of these, having
|
|
|
|
64 bits per register. They can also be addressed with VMX
|
|
|
|
instructions to refer to a 32-register subset of 128-bit registers.
|
|
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.reporting">
|
|
|
|
<title>Where to Report Bugs</title>
|
|
|
|
<para>
|
|
|
|
This reference provides guidance on using vector intrinsics that
|
|
|
|
are supported by all compatible compilers. If you find a
|
|
|
|
problem when using one of the intrinsics with a compatible
|
|
|
|
compiler, please report a bug! Bug reporting procedures differ
|
|
|
|
depending on which compiler you're using.
|
|
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis role="underline">GCC</emphasis>. The reporting
|
|
|
|
procedure for bugs against the GNU Compiler Collection is
|
|
|
|
described at <link
|
|
|
|
xlink:href="https://gcc.gnu.org/bugs/">https://gcc.gnu.org/bugs/</link>.
|
|
|
|
The GCC bugzilla tracker is located at <link
|
|
|
|
xlink:href="https://gcc.gnu.org/bugzilla/">https://gcc.gnu.org/bugzilla/</link>.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis role="underline">Clang/LLVM</emphasis>. The
|
|
|
|
reporting procedure for bugs against the Clang compiler is
|
|
|
|
described at <link
|
|
|
|
xlink:href="https://llvm.org/docs/HowToSubmitABug.html">https://llvm.org/docs/HowToSubmitABug.html</link>.
|
|
|
|
The LLVM bug tracking system is located at <link
|
|
|
|
xlink:href="https://bugs.llvm.org/enter_bug.cgi">https://bugs.llvm.org/enter_bug.cgi</link>.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis role="underline">The XL <phrase
|
|
|
|
revisionflag="added">and Open XL</phrase>
|
|
|
|
compilers</emphasis>. For XL <phrase
|
|
|
|
revisionflag="added">and Open XL</phrase> compilers provided
|
|
|
|
with the Linux Community Edition, you can provide feedback
|
|
|
|
to the XL compiler team via email
|
|
|
|
(<email>compinfo@cn.ibm.com</email>); for other editions of
|
|
|
|
XL <phrase revisionflag="added">and Open XL</phrase>
|
|
|
|
compilers, please open a <link
|
|
|
|
xlink:href="https://www.ibm.com/mysupport/s/">Case</link>.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.links">
|
|
|
|
<title>Useful Links</title>
|
|
|
|
<para>
|
|
|
|
The following documents provide additional reference materials.
|
|
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>64-Bit ELF V2 ABI Specification - Power
|
|
|
|
Architecture.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture">https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>AltiVec Technology Program Interface
|
|
|
|
Manual.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
|
|
|
|
in the link creation in PDF documents -->
|
|
|
|
<link xlink:href="https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM%2epdf">https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>Intel Architecture Instruction Set Extensions and
|
|
|
|
Future Features Programming Reference.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
|
|
|
|
in the link creation in PDF documents -->
|
|
|
|
<link xlink:href="https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference%2epdf">https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>Power Instruction Set Architecture</emphasis>,
|
|
|
|
Version 3.0B Specification.
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0">https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
<phrase revisionflag="added">
|
|
|
|
Need to update this to Version 3.1B, which is not yet published.
|
|
|
|
</phrase>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>POWER8 Processor User's Manual for the Single-Chip
|
|
|
|
Module.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk">https://ibm.ent.box.com/s/649rlau0zjcc0yrulqf4cgx5wk3pgbfk
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>POWER9 Processor User's Manual.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link
|
|
|
|
xlink:href="https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k">https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem revisionflag="added">
|
|
|
|
<para>
|
|
|
|
<emphasis>Power10 Processor User's Manual.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link
|
|
|
|
xlink:href="https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k">Not
|
|
|
|
yet available, link is to P9 user's manual
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>Power Vector Library.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://github.com/open-power-sdk/pveclib">https://github.com/open-power-sdk/pveclib
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>POWER8 In-Core Cryptography: The Unofficial
|
|
|
|
Guide.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link
|
|
|
|
xlink:href="https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf">https://github.com/noloader/POWER8-crypto/blob/master/power8-crypto.pdf
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>Using the GNU Compiler Collection.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<!-- Note: the %2e substitution for the "." in the URL is needed to work around unknown issues
|
|
|
|
in the link creation in PDF documents -->
|
|
|
|
<link xlink:href="https://gcc.gnu.org/onlinedocs/gcc%2epdf">https://gcc.gnu.org/onlinedocs/gcc.pdf
|
|
|
|
</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
<emphasis>GCC's Assembler Syntax.</emphasis> Felix Cloutier.
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://www.felixcloutier.com/documents/gcc-asm.html">https://www.felixcloutier.com/documents/gcc-asm.html</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem revisionflag="added">
|
|
|
|
<para>
|
|
|
|
<emphasis>The GNU C Library Project.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="https://www.gnu.org/software/libc">https://www.gnu.org/software/libc</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem revisionflag="added">
|
|
|
|
<para>
|
|
|
|
<emphasis>Matrix-Multiply Assist Best Practices Guide.</emphasis>
|
|
|
|
<emphasis>
|
|
|
|
<link xlink:href="http://www.redbooks.ibm.com/redpapers/pdfs/redp5612.pdf">https://www.redbooks.ibm.com/redpapers/pdfs/redp5612.pdf</link>
|
|
|
|
</emphasis>
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.marks" revisionflag="added">
|
|
|
|
<title>Trademarks</title>
|
|
|
|
<para>
|
|
|
|
AIX, POWER7, POWER8, POWER9, and Power10 are trademarks or
|
|
|
|
registered trademarks of International Business Machines
|
|
|
|
Corporation. Linux is a registered trademark of Linus
|
|
|
|
Torvalds. Intel is s registered trademark of Intel Corporation
|
|
|
|
or its subsidiaries. AltiVec is a trademark of Freescale
|
|
|
|
Semiconductor, Inc.
|
|
|
|
</para>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.conf">
|
|
|
|
<title>Conformance to this Specification</title>
|
|
|
|
<orderedlist>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Vector programs on OpenPOWER systems should follow the guide
|
|
|
|
and best practices for vector programming as outlined in
|
|
|
|
<xref linkend="VIPR.biendian" /> and in <xref
|
|
|
|
linkend="section_techniques" />.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
<listitem>
|
|
|
|
<para>
|
|
|
|
Compliant compilers on OpenPOWER systems should provide
|
|
|
|
suitable support for intrinsic functions, preferably as
|
|
|
|
built-in vector functions that translate to one or more
|
|
|
|
Power ISA instructions as described in <xref
|
|
|
|
linkend="VIPR.biendian" /> and in <xref
|
|
|
|
linkend="VIPR.vec-ref" />. Compliant compilers targeting a
|
|
|
|
supported ISA level (2.7 or 3.0, for example) should provide
|
|
|
|
support for all intrinsic functions valid for that ISA
|
|
|
|
level, except where an intrinsic function is marked as
|
|
|
|
phased in, deferred, or deprecated.
|
|
|
|
</para>
|
|
|
|
</listitem>
|
|
|
|
</orderedlist>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
</chapter>
|