The POWER Bi-Endian Vector Programming Model
To ensure portability of applications optimized to exploit the
SIMD functions of POWER ISA processors, the ELF V2 ABI defines a
set of functions and data types for SIMD programming. ELF
V2-compliant compilers will provide suitable support for these
functions, preferably as built-in functions that translate to one
or more POWER ISA instructions.
Compilers are encouraged, but not required, to provide built-in
functions to access individual instructions in the IBM POWER®
instruction set architecture. In most cases, each such built-in
function should provide direct access to the underlying
instruction.
However, to ease porting between little-endian (LE) and big-endian
(BE) POWER systems, and between POWER and other platforms, it is
preferable that some built-in functions provide the same semantics
on both LE and BE POWER systems, even if this means that the
built-in functions are implemented with different instruction
sequences for LE and BE. To achieve this, vector built-in
functions provide a set of functions derived from the set of
hardware functions provided by the Power vector SIMD
instructions. Unlike traditional “hardware intrinsic” built-in
functions, no fixed mapping exists between these built-in
functions and the generated hardware instruction sequence. Rather,
the compiler is free to generate optimized instruction sequences
that implement the semantics of the program specified by the
programmer using these built-in functions.
This is primarily applicable to the POWER SIMD instructions. As
we've seen, this set of instructions operates on groups of 2, 4,
8, or 16 vector elements at a time in 128-bit registers. On a
big-endian POWER platform, vector elements are loaded from memory
into a register so that the 0th element occupies the high-order
bits of the register, and the (N – 1)th element occupies the
low-order bits of the register. This is referred to as big-endian
element order. On a little-endian POWER platform, vector elements
are loaded from memory such that the 0th element occupies the
low-order bits of the register, and the (N – 1)th element
occupies the high-order bits. This is referred to as little-endian
element order.
Much of the information in this chapter was formerly part of
Chapter 6 of the 64-Bit ELF V2 ABI Specification for POWER.
Vector Data Types
Languages provide support for the data types listed in the Vector Types
table below to represent vector data stored in vector registers.
For the C and C++ programming languages (and related/derived
languages), these data types may be accessed using the type names
listed in the Vector Types table when Power ISA SIMD language
extensions are enabled with either the vector or __vector
keywords. [FIXME:
We haven't talked about these at all. Need to borrow some
description from the AltiVec PIM about the usage of vector,
bool, and pixel, and supplement with the problems this causes
with strict-ANSI C++. Maybe a separate section on "Language
Elements" should precede this one.]
For the Fortran language, the Fortran Vector Data Types table later in
this chapter gives a correspondence between Fortran and C/C++ language types.
The assignment operator always performs a byte-by-byte data copy
for vector data types.
Like other C/C++ language types, vector types may be defined to
have const or volatile properties. Vector data types can be
defined as being in static, auto, and register storage.
Pointers to vector types are defined like pointers to other
C/C++ types. Pointers to vector objects may be defined to have
const and volatile properties. The value of a pointer to a vector
object must be divisible by 16, as vector objects are always aligned on
quadword (128-bit) boundaries.
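The keyword spellings, storage classes, and alignment rules described above can be illustrated with a short C sketch; the identifiers are illustrative only, and the example assumes a GCC- or Clang-style compiler with the Power SIMD extensions enabled (for example, -maltivec or -mvsx):
#include <altivec.h>   /* Power SIMD language extension */

/* File-scope vector objects; both keyword spellings are accepted. */
static vector signed int    vsi = { 1, 2, 3, 4 };
static const __vector float cvf = { 1.0f, 2.0f, 3.0f, 4.0f };

void vector_storage_example(void)
{
    register vector double vd = { 0.0, 0.0 };   /* register storage is permitted   */
    vector signed int copy = vsi;               /* assignment copies all 16 bytes  */

    /* Pointers to vector types behave like other C/C++ pointers; the
       pointed-to objects are always quadword (16-byte) aligned.         */
    const vector signed int *p = &vsi;

    _Static_assert(sizeof *p == 16, "vector objects occupy 16 bytes");

    (void)vd; (void)copy; (void)cvf; (void)p;
}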
The preferred way to access vectors at an application-defined
address is by using vector pointers and the C/C++ dereference
operator *. Similar to other C/C++ data types, the array reference
operator [] may be used with a vector pointer, with the usual
definition of accessing the nth vector from the pointer. The
dereference operator * may not be used to access data that is not
aligned at least to a quadword boundary. Built-in functions
such as vec_xl and vec_xst are
provided for unaligned data access.
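For unaligned access, a minimal sketch using vec_xl and vec_xst (and, for aligned data, the dereference and array reference operators) might look like the following; the function and parameter names are illustrative:
#include <altivec.h>

/* Copy four floats through a vector register; src and dst need not be
   quadword aligned because vec_xl/vec_xst perform unaligned accesses.
   The first argument of each built-in is a byte offset added to the pointer. */
void copy_four_floats(float *src, float *dst)
{
    vector float v = vec_xl(0, src);
    vec_xst(v, 0, dst);
}

/* When the data is known to be quadword aligned, ordinary dereference and
   array indexing on a vector pointer may be used; vp[i] names the i-th
   vector starting at vp.                                                  */
void square_vectors(vector float *vp, int n)
{
    for (int i = 0; i < n; ++i)
        vp[i] = vp[i] * vp[i];
}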
Compilers are expected to recognize multiple operations that can be
combined into a single hardware instruction. For example, a
load-and-splat hardware instruction might be generated for the
following sequence:
double *double_ptr;                                    /* assumed to point to a valid double        */
register vector double vd = vec_splats(*double_ptr);   /* scalar load replicated into both elements */
Vector Types

Power SIMD C Types            sizeof  Alignment  Description
vector unsigned char          16      Quadword   Vector of 16 unsigned bytes.
vector signed char            16      Quadword   Vector of 16 signed bytes.
vector bool char              16      Quadword   Vector of 16 bytes with a value of either 0 or 2^8 – 1.
vector unsigned short         16      Quadword   Vector of 8 unsigned halfwords.
vector signed short           16      Quadword   Vector of 8 signed halfwords.
vector bool short             16      Quadword   Vector of 8 halfwords with a value of either 0 or 2^16 – 1.
vector unsigned int           16      Quadword   Vector of 4 unsigned words.
vector signed int             16      Quadword   Vector of 4 signed words.
vector bool int               16      Quadword   Vector of 4 words with a value of either 0 or 2^32 – 1.
vector unsigned long (*)      16      Quadword   Vector of 2 unsigned doublewords.
vector unsigned long long
vector signed long (*)        16      Quadword   Vector of 2 signed doublewords.
vector signed long long
vector bool long (*)          16      Quadword   Vector of 2 doublewords with a value of either 0 or 2^64 – 1.
vector bool long long
vector unsigned __int128      16      Quadword   Vector of 1 unsigned quadword.
vector signed __int128        16      Quadword   Vector of 1 signed quadword.
vector _Float16               16      Quadword   Vector of 8 half-precision floats.
vector float                  16      Quadword   Vector of 4 single-precision floats.
vector double                 16      Quadword   Vector of 2 double-precision floats.

(*) The vector long types are deprecated due to their ambiguity between
32-bit and 64-bit environments. The use of the vector long long types
is preferred.
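As a quick compile-time check of the size and alignment columns in the table, a sketch using C11 static assertions (assuming a C11 compiler with the vector extensions enabled):
#include <altivec.h>
#include <stdalign.h>

/* Every Power SIMD vector type is 16 bytes wide and quadword aligned. */
_Static_assert(sizeof(vector unsigned char) == 16, "16-byte vectors");
_Static_assert(sizeof(vector signed int)    == 16, "16-byte vectors");
_Static_assert(alignof(vector double)       == 16, "quadword alignment");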
Vector Operators
In addition to the dereference and assignment operators, the
Power Bi-Endian Vector Programming Model provides the usual operators
that are valid on pointers; these operators are also valid for pointers
to vector types.
The traditional C/C++ operators are defined on vector types
with “do all” semantics for unary and binary +, unary and binary –,
binary *, binary %, and binary /, as well as the unary and binary
shift, logical, and comparison operators, and the ternary ?: operator.
For unary operators, the specified operation is performed on
the corresponding base element of the single operand to derive
the result value for each vector element of the vector
result. The result type of unary operations is the type of the
single input operand.
For binary operators, the specified operation is performed on
the corresponding base elements of both operands to derive the
result value for each vector element of the vector
result. Both operands of the binary operators must have the
same vector type with the same base element type. The result
of binary operators is the same type as the type of the input
operands.
Further, the array reference operator may be applied to vector
data types, yielding an l-value corresponding to the specified
element in accordance with the vector element numbering rules (see
Vector Layout and Element Numbering below). An l-value may either
be assigned a new value or accessed for reading its value.
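A brief sketch of these operator semantics (the function and variable names are illustrative):
#include <altivec.h>

/* Element-wise ("do all") operator semantics on same-typed vectors. */
vector signed int operator_demo(vector signed int a, vector signed int b)
{
    vector signed int sum   = a + b;    /* element-wise addition        */
    vector signed int neg   = -a;       /* unary minus on each element  */
    vector signed int prod  = a * b;    /* element-wise multiplication  */
    vector signed int shift = a << b;   /* element-wise left shift      */

    /* The array reference operator yields an l-value for a single element,
       numbered according to the vector element numbering rules.            */
    sum[0] = neg[0] + prod[1] + shift[2];

    return sum;
}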
Vector Layout and Element Numbering
Vector data types consist of a homogeneous sequence of elements
of the base data type specified in the vector data
type. Individual elements of a vector can be addressed by a
vector element number. Element numbers can be established either
by counting from the “left” of a register and assigning the
left-most element the element number 0, or from the “right” of
the register and assigning the right-most element the element
number 0.
In big-endian environments, establishing element counts from the
left makes the element stored at the lowest memory address the
lowest-numbered element. Thus, when vectors and arrays of a
given base data type are overlaid, vector element 0 corresponds
to array element 0, vector element 1 corresponds to array
element 1, and so forth.
In little-endian environments, establishing element counts from
the right makes the element stored at the lowest memory address
the lowest-numbered element. Thus, when vectors and arrays of a
given base data type are overlaid, vector element 0 will
correspond to array element 0, vector element 1 will correspond
to array element 1, and so forth.
Consequently, the vector numbering schemes can be described as
big-endian and little-endian vector layouts and vector element
numberings.
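A small sketch showing the overlay property described above: when a vector and an array of its base type share storage, element i of each names the same bytes on both big- and little-endian systems (the union and variable names are illustrative):
#include <altivec.h>
#include <stdio.h>

union overlay {
    vector signed int v;
    signed int        a[4];
};

int main(void)
{
    union overlay u;
    u.v = (vector signed int){ 10, 20, 30, 40 };

    /* On both big- and little-endian systems this prints 10 20 30 40:
       vector element i and array element i refer to the same storage. */
    for (int i = 0; i < 4; ++i)
        printf("%d ", u.a[i]);
    printf("\n");
    return 0;
}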
This element numbering shall also be used by the []
accessor method to vector elements provided as an extension of
the C/C++ languages by some compilers, as well as for other
language extensions or library constructs that directly or
indirectly refer to elements by their element number.
Application programs may query the vector element ordering in
use by testing the __VEC_ELEMENT_REG_ORDER__ macro. This macro
has two possible values:
__ORDER_LITTLE_ENDIAN__
Vector elements use little-endian element ordering.
__ORDER_BIG_ENDIAN__
Vector elements use big-endian element ordering.
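For example, a compile-time query might look like the following (assuming a compiler, such as GCC on POWER targets, that defines these macros):
#include <altivec.h>

#if __VEC_ELEMENT_REG_ORDER__ == __ORDER_BIG_ENDIAN__
const char *vec_order = "big-endian element ordering";
#elif __VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__
const char *vec_order = "little-endian element ordering";
#else
#error "Unknown vector element register order"
#endif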
Vector Built-In Functions
Some of the POWER SIMD hardware instructions refer, implicitly
or explicitly, to vector element numbers. For example, the
vspltb
instruction has as one of its inputs an
index into a vector. The element at that index position is to
be replicated in every element of the output vector. For
another example, the vmuleuh instruction operates on
the even-numbered elements of its input vectors. The hardware
instructions define these element numbers using big-endian
element order, even when the machine is running in little-endian
mode. Thus, a built-in function that maps directly to the
underlying hardware instruction, regardless of the target
endianness, has the potential to confuse programmers on
little-endian platforms.
It is more useful to define the built-in functions that map to these
instructions so that they use natural element order. That is, the
explicit or implicit element numbers specified by such built-in
functions should be interpreted using big-endian element order
on a big-endian platform, and using little-endian element order
on a little-endian platform.
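As an illustration, a sketch of how natural element order affects a splat; the function name is illustrative, and the note about index adjustment reflects typical compiler behavior rather than a required implementation:
#include <altivec.h>

/* vec_splat interprets its element index in natural element order, so the
   same source gives the same result on BE and LE systems.                */
vector unsigned char splat_first(vector unsigned char v)
{
    /* Replicate element 0 of v (the element at the lowest memory address)
       into all 16 result elements. On a little-endian system the compiler
       typically adjusts the immediate index used by the underlying vspltb
       instruction to preserve these semantics.                            */
    return vec_splat(v, 0);
}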
The individual built-in function descriptions contain notes on endian
issues that apply to each built-in function. Furthermore, each built-in
function that requires a different compiler implementation for
big-endian than for little-endian is accompanied by sample compiler
implementations for both BE and LE. These sample implementations are
intended only as examples; compiler designers are free to use other
methods to implement the specified semantics as they see fit.
Extended Data Movement Functions
The built-in functions in the table below map to AltiVec/VMX load and
store instructions and provide access to the “auto-aligning”
memory instructions of the VMX ISA, in which low-order address
bits are discarded before the memory access is performed. These
instructions load and store data in accordance with the
program's current endian mode, and do not need to be adapted
by the compiler to reflect little-endian operation during code
generation.
VMX Memory Access Built-In Functions

Built-in Function  Corresponding POWER Instructions  Implementation Notes
vec_ld             lvx                               Hardware works as a function of endian mode.
vec_lde            lvebx, lvehx, lvewx               Hardware works as a function of endian mode.
vec_ldl            lvxl                              Hardware works as a function of endian mode.
vec_st             stvx                              Hardware works as a function of endian mode.
vec_ste            stvebx, stvehx, stvewx            Hardware works as a function of endian mode.
vec_stl            stvxl                             Hardware works as a function of endian mode.
Previous versions of the VMX built-in functions defined
intrinsics to access the VMX instructions lvsl and lvsr, which could
be used in conjunction with vec_vperm and VMX load and store
instructions for unaligned access. The vec_lvsl and vec_lvsr
interfaces are deprecated in accordance with the interfaces specified
here. For compatibility, the built-in pseudo sequences published in
previous VMX documents continue to work with little-endian data layout
and the little-endian vector layout described in this
document. However, the use of these sequences in new code is
discouraged and usually results in worse performance. It is
recommended (but not required) that compilers issue a warning
when these functions are used in little-endian
environments. It is recommended that programmers use the
vec_xl and vec_xst vector built-in
functions to access unaligned data streams. See the
descriptions of these built-in functions for further description and
implementation details.
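To make the distinction concrete, here is an illustrative sketch contrasting the auto-aligning vec_ld with the unaligned vec_xl; the function name and the pointer argument are hypothetical:
#include <altivec.h>
#include <stdint.h>

vector float load_examples(float *p)
{
    /* vec_ld discards the low-order four bits of the effective address,
       so it always loads the enclosing 16-byte-aligned quadword.         */
    vector float aligned_load = vec_ld(0, p);

    /* vec_xl performs a true unaligned load starting at p.               */
    vector float unaligned_load = vec_xl(0, p);

    /* The two results differ whenever p is not quadword aligned.         */
    return ((uintptr_t)p % 16 == 0) ? aligned_load : unaligned_load;
}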
Big-Endian Vector Layout in Little-Endian Environments
(Deprecated)
Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
for POWER provided for optional compiler support for using
big-endian element ordering in little-endian environments.
This was initially deemed useful for porting certain libraries
that assumed big-endian element ordering regardless of the
endianness of their input streams. In practice, this
introduced serious compiler complexity without much utility.
Thus this support (previously controlled by the -maltivec=be and/or
-qaltivec=be switches) is now deprecated. Current versions of the gcc
and clang open-source compilers do not implement this support.
Language-Specific Vector Support for Other
Languages
Fortran
The Fortran Vector Data Types table below shows the
correspondence between the C/C++ types described in this
document and their Fortran equivalents. In Fortran, the
Boolean vector data types are represented by VECTOR(UNSIGNED(n)).
Fortran Vector Data Types

XL Fortran Vector Type   XL C/C++ Vector Type
VECTOR(INTEGER(1))       vector signed char
VECTOR(INTEGER(2))       vector signed short
VECTOR(INTEGER(4))       vector signed int
VECTOR(INTEGER(8))       vector signed long long, vector signed long (*)
VECTOR(INTEGER(16))      vector signed __int128
VECTOR(UNSIGNED(1))      vector unsigned char
VECTOR(UNSIGNED(2))      vector unsigned short
VECTOR(UNSIGNED(4))      vector unsigned int
VECTOR(UNSIGNED(8))      vector unsigned long long, vector unsigned long (*)
VECTOR(UNSIGNED(16))     vector unsigned __int128
VECTOR(REAL(4))          vector float
VECTOR(REAL(8))          vector double
VECTOR(PIXEL)            vector pixel

(*) The vector long types are deprecated due to their ambiguity between
32-bit and 64-bit environments. The use of the vector long long types
is preferred.
Because the Fortran language does not support pointers, vector
built-in functions that expect pointers to a base type take an
array element reference to indicate the address of a memory
location that is the subject of a memory access built-in
function.
Because the Fortran language does not support type casts, the
vec_convert and vec_concat built-in functions shown in the table below
are provided to perform bit-exact type conversions between vector
types.
Built-In Vector Conversion Functions

VEC_CONCAT (ARG1, ARG2) (Fortran)
  Purpose:
    Concatenates two elements to form a vector.
  Result value:
    The resulting vector consists of the two scalar elements,
    ARG1 and ARG2, assigned to elements 0 and 1 (using the
    environment’s native endian numbering), respectively.
    Note: This function corresponds to the C/C++ vector
    constructor (vector type){a,b}. It is provided only for
    languages without vector constructors.

    vector signed long long vec_concat (signed long long,
    signed long long);
    vector unsigned long long vec_concat (unsigned long long,
    unsigned long long);
    vector double vec_concat (double, double);

VEC_CONVERT (V, MOLD)
  Purpose:
    Converts a vector to a vector of a given type.
  Class:
    Pure function
  Argument type and attributes:
    V     Must be an INTENT(IN) vector.
    MOLD  Must be an INTENT(IN) vector. If it is a
          variable, it need not be defined.
  Result type and attributes:
    The result is a vector of the same type as MOLD.
  Result value:
    The result is as if it were on the left-hand side of an
    intrinsic assignment with V on the right-hand side.
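For reference, a sketch of the C/C++ constructs that these Fortran functions correspond to, per the notes above: the vector constructor for VEC_CONCAT, and a cast between vector types (which reinterprets the 128 bits without changing them under the AltiVec/VSX language extensions) for VEC_CONVERT. The function names are illustrative only:
#include <altivec.h>

/* C/C++ equivalent of Fortran VEC_CONCAT: the vector constructor assigns
   the scalars to elements 0 and 1 in native element order.              */
vector double make_pair(double a, double b)
{
    return (vector double){ a, b };
}

/* C/C++ equivalent of Fortran VEC_CONVERT: a cast between vector types
   performs a bit-exact reinterpretation.                                */
vector float reinterpret_bits(vector signed int v)
{
    return (vector float)v;
}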
Limitations
vec_sld
vec_perm