|
|
|
<!--
|
|
|
|
Copyright (c) 2019 OpenPOWER Foundation
|
|
|
|
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
|
you may not use this file except in compliance with the License.
|
|
|
|
You may obtain a copy of the License at
|
|
|
|
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
|
See the License for the specific language governing permissions and
|
|
|
|
limitations under the License.
|
|
|
|
|
|
|
|
-->
|
|
|
|
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="section_intro">
|
|
|
|
|
|
|
|
<!-- Chapter Title goes here. -->
|
|
|
|
<title>Introduction to Vector Programming on POWER</title>
|
|
|
|
|
|
|
|
<section>
|
|
|
|
<title>A Brief History</title>
|
|
|
|
<para>
|
|
|
|
The history of vector programming on POWER processors begins
|
|
|
|
with the AIM (Apple, IBM, Motorola) alliance in the 1990s. The
|
|
|
|
AIM partners developed the Power Vector Media Extension (VMX) to
|
|
|
|
accelerate multimedia applications, particularly image
|
|
|
|
processing. VMX is the name still used by IBM for this
|
|
|
|
instruction set. Freescale (formerly Motorola) used the
|
|
|
|
trademark "AltiVec," while Apple at one time called it "Velocity
|
|
|
|
Engine." While VMX remains the most official name, the term
|
|
|
|
AltiVec is still in common use today. Freescale's AltiVec
|
|
|
|
Technology Programming Interface Manual (the "AltiVec PIM") is
|
|
|
|
still available online for reference (see <xref
|
|
|
|
linkend="VIPR.intro.links" />).
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The original VMX specification provided for thirty-two 128-bit
|
|
|
|
vector registers (VRs). Each register can be treated as
|
|
|
|
containing sixteen 8-bit character values, eight 16-bit short
|
|
|
|
integer values, four 32-bit integer values, or four 32-bit
|
|
|
|
single-precision floating-point values (the "VMX data types").
|
|
|
|
Furthermore, the integer data types have signed, unsigned, and
|
|
|
|
boolean variants. An extensive set of arithmetic, logical,
|
|
|
|
comparison, conversion, memory access, and permute class
|
|
|
|
operations were specified to operate on these registers.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The AltiVec PIM documents intrinsic functions to be used by
|
|
|
|
programmers to access the VMX instruction set. Because similar
|
|
|
|
operations are provided for all the VMX data types, the PIM
|
|
|
|
provides for overloaded intrinsics that can operate on different
|
|
|
|
data types. However, such function overloading is not normally
|
|
|
|
acceptable in the C programming language, so compilers compliant
|
|
|
|
with the AltiVec PIM (such as <code>gcc</code> and
|
|
|
|
<code>clang</code>) were required to add special handling to
|
|
|
|
their parsers to permit this. The PIM suggested (but did not
|
|
|
|
mandate) the use of a header file,
|
|
|
|
<code><altivec.h></code>, for implementations that provide
|
|
|
|
AltiVec intrinsics. This is common practice for all compliant
|
|
|
|
compilers today.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
The first chips incorporating the VMX instruction set were
|
|
|
|
introduced by Freescale in 1999, and used primarly in Apple
|
|
|
|
desktop computers. IBM's last desktop CPU (the PowerPC 970)
|
|
|
|
also included AltiVec support, and was used in the Apple
|
|
|
|
PowerMac G5. IBM initially omitted support for VMX from its
|
|
|
|
server-class computers, but added support for it in the POWER6
|
|
|
|
server family.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
IBM extended VMX by introducing the Vector-Scalar Extension
|
|
|
|
(VSX) for the POWER7 family of processors. VSX adds 64 logical
|
|
|
|
Vector Scalar Registers (VSRs); however, to optimize the amount
|
|
|
|
of per-process register state, the registers overlap with the
|
|
|
|
VRs and the scalar floating-point registers (FPRs) (see <xref
|
|
|
|
linkend="VIPR.intro.unified" />). The VSRs can represent all
|
|
|
|
the data types representable by the VRs, and can also be treated
|
|
|
|
as containing two 64-bit integers or two 64-bit double-precision
|
|
|
|
floating-point values.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
Both the VMX and VSX instruction sets have been expanded for the
|
|
|
|
POWER8 and POWER9 processor families. Starting with POWER8,
|
|
|
|
a VSR can now contain a single 128-bit integer; and starting
|
|
|
|
with POWER9, a VSR can contain a single 128-bit floating-point
|
|
|
|
value. The VMX and VSX instruction sets together may be
|
|
|
|
referred to as the POWER SIMD (single-instruction,
|
|
|
|
multiple-data) instructions.
|
|
|
|
</para>
|
|
|
|
<section>
|
|
|
|
<title>Little-Endian Linux</title>
|
|
|
|
<para>
|
|
|
|
The POWER architecture has supported operation in either
|
|
|
|
big-endian (BE) or little-endian (LE) mode from the
|
|
|
|
beginning. However, IBM's POWER servers were only shipped
|
|
|
|
with big-endian operating systems (AIX, Linux, i5/OS) prior to
|
|
|
|
the introduction of POWER8. With POWER8, IBM began
|
|
|
|
supporting little-endian Linux distributions for the first
|
|
|
|
time, and introduced a new application binary interface (the
|
|
|
|
64-Bit ELFv2 ABI Specification <xref
|
|
|
|
linkend="VIPR.intro.links" />) that can be used for either
|
|
|
|
big- or little-endian support. In practice, the ELFv2 ABI is
|
|
|
|
currently used only for little-endian Linux.
|
|
|
|
</para>
|
|
|
|
<para>
|
|
|
|
Although POWER has always supported big- and little-endian
|
|
|
|
memory accesses, the introduction of vector register support
|
|
|
|
added a layer of complexity to programming for processors
|
|
|
|
operating in different endian modes. Arrays of elements
|
|
|
|
loaded into a VR or VSR will be indexed from left to right in
|
|
|
|
the register in big-endian mode, but will be indexed from
|
|
|
|
right to left in the register in little-endian mode. However,
|
|
|
|
the VMX and VSX instructions originally assumed that elements
|
|
|
|
will always be indexed from left to right in the register.
|
|
|
|
This is an inconvenience that needs to be hidden from the
|
|
|
|
application programmer wherever possible. To this end, IBM
|
|
|
|
developed a bi-endian vector programming model (see <xref
|
|
|
|
linkend="VIPR.biendian" />). The intrinsic functions provided
|
|
|
|
for the bi-endian vector programming model are described in
|
|
|
|
<xref linkend="VIPR.vec-ref" />.
|
|
|
|
</para>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.unified">
|
|
|
|
<title>The Unified Vector Register Set</title>
|
|
|
|
<para>filler</para>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
<section xml:id="VIPR.intro.links">
|
|
|
|
<title>Useful Links</title>
|
|
|
|
<para>filler</para>
|
|
|
|
</section>
|
|
|
|
|
|
|
|
</chapter>
|