|
|
|
|
|
|
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
|
|
|
|
|
|
|
|
|
|
<!-- Chapter Title goes here. -->
|
|
|
|
|
|
|
|
|
|
<title>The POWER Bi-Endian Vector Programming Model</title>
|
|
|
|
|
|
|
|
|
|
<para>
|
|
|
|
|
To ensure portability of applications optimized to exploit the
|
|
|
|
|
SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
|
|
|
|
|
set of functions and data types for SIMD programming. ELF
|
|
|
|
|
V2-compliant compilers will provide suitable support for these
|
|
|
|
|
functions, preferably as built-in functions that translate to one
|
|
|
|
|
or more POWER ISA instructions.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
Compilers are encouraged, but not required, to provide built-in
|
|
|
|
|
functions to access individual instructions in the IBM POWER®
|
|
|
|
|
instruction set architecture. In most cases, each such built-in
|
|
|
|
|
function should provide direct access to the underlying
|
|
|
|
|
instruction.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
However, to ease porting between little-endian (LE) and big-endian
|
|
|
|
|
(BE) POWER systems, and between POWER and other platforms, it is
|
|
|
|
|
preferable that some built-in functions provide the same semantics
|
|
|
|
|
on both LE and BE POWER systems, even if this means that the
|
|
|
|
|
built-in functions are implemented with different instruction
|
|
|
|
|
sequences for LE and BE. To achieve this, the vector built-in
functions form a set derived from the operations provided by the
Power vector SIMD
instructions. Unlike traditional “hardware intrinsic” built-in
|
|
|
|
|
functions, no fixed mapping exists between these built-in
|
|
|
|
|
functions and the generated hardware instruction sequence. Rather,
|
|
|
|
|
the compiler is free to generate optimized instruction sequences
|
|
|
|
|
that implement the semantics of the program specified by the
|
|
|
|
|
programmer using these built-in functions.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
This flexibility applies primarily to the POWER SIMD instructions. As
we've seen, these instructions operate on groups of 2, 4,
8, or 16 vector elements at a time in 128-bit registers. On a
|
|
|
|
|
big-endian POWER platform, vector elements are loaded from memory
|
|
|
|
|
into a register so that the 0th element occupies the high-order
|
|
|
|
|
bits of the register, and the (N – 1)th element occupies the
|
|
|
|
|
low-order bits of the register. This is referred to as big-endian
|
|
|
|
|
element order. On a little-endian POWER platform, vector elements
|
|
|
|
|
are loaded from memory such that the 0th element occupies the
|
|
|
|
|
low-order bits of the register, and the (N – 1)th element
|
|
|
|
|
occupies the high-order bits. This is referred to as little-endian
|
|
|
|
|
element order.
|
|
|
|
|
</para>
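<para>
The two element orders can be modeled with plain integer
arithmetic. The following sketch is illustrative only: the helper
name <code>element_low_bit</code> and the convention of numbering
register bits from 0 at the low-order end are assumptions made for
this example and are not defined by the ABI.
</para>
<programlisting><![CDATA[#include <stdbool.h>

/* Hypothetical helper: returns the position of the least significant
   bit of element `index` within a 128-bit vector register that holds
   `count` equally sized elements. Bit 0 is the low-order bit of the
   register (a numbering chosen for this sketch only). */
static unsigned element_low_bit(unsigned index, unsigned count,
                                bool big_endian_order)
{
    unsigned width = 128u / count;            /* bits per element */

    if (big_endian_order)
        return (count - 1u - index) * width;  /* element 0 is high-order */
    return index * width;                     /* element 0 is low-order */
}]]></programlisting>
<para>
For a vector of four words, element 0 starts at bit 96 in big-endian
element order and at bit 0 in little-endian element order.
</para>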
|
|
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
|
|
|
|
|
<title>Vector Data Types</title>
|
|
|
|
|
<para>
|
|
|
|
|
Languages provide support for the data types in <xref
|
|
|
|
|
linkend="VIPR.biendian.vectypes" /> to represent vector data
|
|
|
|
|
types stored in vector registers.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
For the C and C++ programming languages (and related/derived
|
|
|
|
|
languages), these data types may be accessed based on the type
|
|
|
|
|
names listed in <xref linkend="VIPR.biendian.vectypes" /> when
|
|
|
|
|
Power ISA SIMD language extensions are enabled using either the
|
|
|
|
|
<code>vector</code> or <code>__vector</code> keywords.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
For the Fortran language, a separate table gives a
correspondence between Fortran and C/C++ language types.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
The assignment operator always performs a byte-by-byte data copy
|
|
|
|
|
for vector data types.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
Like other C/C++ language types, vector types may be defined to
|
|
|
|
|
have const or volatile properties. Vector data types can be
|
|
|
|
|
defined as being in static, auto, and register storage.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
Pointers to vector types are defined like pointers of other
|
|
|
|
|
C/C++ types. Pointers to vector objects may be defined to have
|
|
|
|
|
const and volatile properties. The address held in a pointer to a
vector object must be divisible by 16, as vector objects are always aligned on
|
|
|
|
|
quadword (128-bit) boundaries.
|
|
|
|
|
</para>
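<para>
A minimal, portable sketch of this alignment requirement follows.
The helper name <code>is_quadword_aligned</code> and the buffer
<code>aligned_buffer</code> are hypothetical, and the C11
<code>_Alignas</code> specifier stands in for the alignment that a
compiler gives a vector object automatically.
</para>
<programlisting><![CDATA[#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the address is a multiple of 16,
   that is, when it is suitably aligned for a pointer to a vector type. */
static bool is_quadword_aligned(const void *address)
{
    return ((uintptr_t)address % 16u) == 0;
}

/* A buffer forced onto a quadword boundary with C11 _Alignas. */
static _Alignas(16) unsigned char aligned_buffer[32];]]></programlisting>
<para>
Here <code>is_quadword_aligned(aligned_buffer)</code> and
<code>is_quadword_aligned(aligned_buffer + 16)</code> hold, whereas
<code>is_quadword_aligned(aligned_buffer + 1)</code> does not.
</para>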
|
|
|
|
|
<para>
|
|
|
|
|
The preferred way to access vectors at an application-defined
|
|
|
|
|
address is by using vector pointers and the C/C++ dereference
|
|
|
|
|
operator <code>*</code>. Similar to other C/C++ data types, the
|
|
|
|
|
array reference operator <code>[]</code> may be applied to a
vector pointer, with the usual definition, to access the
<emphasis>n</emphasis>th vector object relative to that
pointer. The dereference operator <code>*</code> may
|
|
|
|
|
<emphasis>not</emphasis> be used to access data that is not
|
|
|
|
|
aligned at least to a quadword boundary. Built-in functions
|
|
|
|
|
such as <code>vec_xl</code> and <code>vec_xst</code> are
|
|
|
|
|
provided for unaligned data access.
|
|
|
|
|
</para>
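<para>
The following sketch shows one way unaligned access might be
written. When the compiler defines <code>__VSX__</code>, it uses
<code>vec_xl</code> and <code>vec_xst</code>; elsewhere it falls
back to <code>memcpy</code> on a GCC/Clang generic vector type so
that the example stays portable. The fallback and the names
<code>v4si</code>, <code>load_unaligned</code>,
<code>store_unaligned</code>, and <code>copy_four_ints</code> are
assumptions of this example.
</para>
<programlisting><![CDATA[#include <string.h>

#if defined(__VSX__)
#include <altivec.h>
typedef __vector signed int v4si;
#else
/* Portable stand-in for a vector of 4 signed words. */
typedef int v4si __attribute__((vector_size(16)));
#endif

/* Loads four ints from an address that need not be quadword aligned. */
static v4si load_unaligned(const int *source)
{
#if defined(__VSX__)
    return vec_xl(0, source);
#else
    v4si value;
    memcpy(&value, source, sizeof value);
    return value;
#endif
}

/* Stores four ints to an address that need not be quadword aligned. */
static void store_unaligned(v4si value, int *destination)
{
#if defined(__VSX__)
    vec_xst(value, 0, destination);
#else
    memcpy(destination, &value, sizeof value);
#endif
}

/* Copies four ints through a vector value without dereferencing
   a vector pointer. */
static void copy_four_ints(int *destination, const int *source)
{
    store_unaligned(load_unaligned(source), destination);
}]]></programlisting>
<para>
Because neither helper dereferences a vector pointer, both can be
called with addresses that are aligned only to the element type.
</para>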
|
|
|
|
|
<para>
|
|
|
|
|
Compilers are expected to recognize multiple
operations that can be combined into a single hardware
instruction. For example, a load-and-splat hardware instruction
|
|
|
|
|
might be generated for the following sequence:
|
|
|
|
|
</para>
|
|
|
|
|
<programlisting>double *double_ptr;
|
|
|
|
|
register vector double vd = vec_splats(*double_ptr);</programlisting>
|
|
|
|
|
<table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
|
|
|
|
|
<title>Vector Types</title>
|
|
|
|
|
<tgroup cols="4">
|
|
|
|
|
<colspec colname="c1" colwidth="20*" />
|
|
|
|
|
<colspec colname="c2" colwidth="10*" align="center" />
|
|
|
|
|
<colspec colname="c3" colwidth="15*" align="center" />
|
|
|
|
|
<colspec colname="c4" colwidth="40*" />
|
|
|
|
|
<thead>
|
|
|
|
|
<row>
|
|
|
|
|
<entry align="center">
|
|
|
|
|
<para>
|
|
|
|
|
<emphasis role="bold">Power SIMD C Types</emphasis>
|
|
|
|
|
</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry align="center">
|
|
|
|
|
<para>
|
|
|
|
|
<emphasis role="bold">sizeof</emphasis>
|
|
|
|
|
</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry align="center">
|
|
|
|
|
<para>
|
|
|
|
|
<emphasis role="bold">Alignment</emphasis>
|
|
|
|
|
</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry align="center">
|
|
|
|
|
<para>
|
|
|
|
|
<emphasis role="bold">Description</emphasis>
|
|
|
|
|
</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
</thead>
|
|
|
|
|
<tbody>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector unsigned char</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 16 unsigned bytes.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector signed char</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 16 signed bytes.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector bool char</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 16 bytes with a value of either 0 or
|
|
|
|
|
2<superscript>8</superscript> – 1.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector unsigned short</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 8 unsigned halfwords.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector signed short</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 8 signed halfwords.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector bool short</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 8 halfwords with a value of either 0 or
|
|
|
|
|
2<superscript>16</superscript> – 1.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector unsigned int</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 4 unsigned words.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector signed int</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 4 signed words.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector bool int</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 4 words with a value of either 0 or
|
|
|
|
|
2<superscript>32</superscript> – 1.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector unsigned long<footnote xml:id="vlong">
|
|
|
|
|
<para>The vector long types are deprecated due to their
|
|
|
|
|
ambiguity between 32-bit and 64-bit environments. The use
|
|
|
|
|
of the vector long long types is preferred.</para>
|
|
|
|
|
</footnote></para>
|
|
|
|
|
<para>vector unsigned long long</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 2 unsigned doublewords.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector signed long<footnoteref linkend="vlong" /></para>
|
|
|
|
|
<para>vector signed long long</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 2 signed doublewords.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector bool long<footnoteref linkend="vlong" /></para>
|
|
|
|
|
<para>vector bool long long</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 2 doublewords with a value of either 0 or
|
|
|
|
|
2<superscript>64</superscript> – 1.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector unsigned __int128</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 1 unsigned quadword.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector signed __int128</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 1 signed quadword.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector _Float16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 8 half-precision floats.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector float</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 4 single-precision floats.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
<row>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>vector double</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>16</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Quadword</para>
|
|
|
|
|
</entry>
|
|
|
|
|
<entry>
|
|
|
|
|
<para>Vector of 2 double-precision floats.</para>
|
|
|
|
|
</entry>
|
|
|
|
|
</row>
|
|
|
|
|
</tbody>
|
|
|
|
|
</tgroup>
|
|
|
|
|
</table>
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|
<title>Vector Operators</title>
|
|
|
|
|
<para>
|
|
|
|
|
In addition to the dereference and assignment operators, the
|
|
|
|
|
Power SIMD Vector Programming API provides the usual
|
|
|
|
|
operators that are valid on pointers; these operators are also
|
|
|
|
|
valid for pointers to vector types.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
The traditional C/C++ operators are defined on vector types
|
|
|
|
|
with “do all” semantics for unary and binary <code>+</code>,
|
|
|
|
|
unary and binary <code>-</code>, binary <code>*</code>, binary
<code>%</code>, and binary <code>/</code>, as well as the shift
operators, the unary and binary logical operators, the comparison
operators, and the ternary <code>?:</code> operator.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
For unary operators, the specified operation is performed on
|
|
|
|
|
the corresponding base element of the single operand to derive
|
|
|
|
|
the result value for each vector element of the vector
|
|
|
|
|
result. The result type of unary operations is the type of the
|
|
|
|
|
single input operand.
|
|
|
|
|
</para>
|
|
|
|
|
<para>
|
|
|
|
|
For binary operators, the specified operation is performed on
|
|
|
|
|
the corresponding base elements of both operands to derive the
|
|
|
|
|
result value for each vector element of the vector
|
|
|
|
|
result. Both operands of the binary operators must have the
|
|
|
|
|
same vector type with the same base element type. The result
type of binary operators is the same as the type of the input
|
|
|
|
|
operands.
|
|
|
|
|
</para>
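<para>
As an illustration of the “do all” semantics, the sketch below uses
the GCC/Clang generic vector extension as a portable stand-in for
<code>vector signed int</code>. That substitution, and the names
<code>v4si</code>, <code>multiply_add</code>, and
<code>negate</code>, are assumptions of this example.
</para>
<programlisting><![CDATA[/* Portable stand-in for a vector of 4 signed words. */
typedef int v4si __attribute__((vector_size(16)));

/* Binary * and +: each result element is computed from the
   corresponding elements of a, b, and c. */
static v4si multiply_add(v4si a, v4si b, v4si c)
{
    return a * b + c;
}

/* Unary -: each result element is the negation of the
   corresponding element of the single operand. */
static v4si negate(v4si a)
{
    return -a;
}]]></programlisting>
<para>
With <code>a = {1, 2, 3, 4}</code>, <code>b = {10, 20, 30, 40}</code>,
and <code>c = {1, 1, 1, 1}</code>, the call
<code>multiply_add(a, b, c)</code> yields
<code>{11, 41, 91, 161}</code>.
</para>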
|
|
|
|
|
<para>
|
|
|
|
|
Further, the array reference operator may be applied to vector
|
|
|
|
|
data types, yielding an l-value corresponding to the specified
|
|
|
|
|
element in accordance with the vector element numbering rules (see
|
|
|
|
|
<xref linkend="VIPR.biendian.layout" />). An l-value may either
|
|
|
|
|
be assigned a new value or accessed for reading its value.
|
|
|
|
|
</para>
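<para>
The sketch below shows the array reference operator applied to a
vector value, again using the GCC/Clang generic vector extension as
a portable stand-in for <code>vector signed int</code>. The names
<code>v4si</code> and <code>set_element_and_sum</code> are
hypothetical.
</para>
<programlisting><![CDATA[/* Portable stand-in for a vector of 4 signed words. */
typedef int v4si __attribute__((vector_size(16)));

/* Writes through the l-value v[index], then reads every element
   back through the same operator to form their sum. The vector is
   passed by value, so the caller's copy is left unchanged. */
static int set_element_and_sum(v4si v, unsigned index, int value)
{
    v[index] = value;                  /* l-value assigned a new value */
    return v[0] + v[1] + v[2] + v[3];  /* l-values read for their values */
}]]></programlisting>
<para>
For <code>v = {1, 2, 3, 4}</code>, the call
<code>set_element_and_sum(v, 2, 10)</code> returns 17.
</para>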
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section xml:id="VIPR.biendian.layout">
|
|
|
|
|
<title>Vector Layout and Element Numbering</title>
|
|
|
|
|
<para>
|
|
|
|
|
filler
|
|
|
|
|
</para>
|
|
|
|
|
</section>
|
|
|
|
|
|
|
|
|
|
<section>
|
|
|
|
|