Intel Intrinsic functions

So what is an intrinsic function? From Wikipedia:
In compiler theory, an
intrinsic function is a function available for use in a given
programming
language whose implementation is handled specially by the compiler.
Typically, it substitutes a sequence of automatically generated instructions
for the original function call, similar to an
inline function.
Unlike an inline function though, the compiler has an intimate knowledge of the
intrinsic function and can therefore better integrate it and optimize it for
the situation. This is also called builtin function in many languages.

The “Intel Intrinsics” API provides access to the many
instruction set extensions (Intel Technologies) that Intel has added (and
continues to add) over the years. The intrinsics provided access to new
instruction capabilities before the compilers could exploit them directly.
Initially these intrinsic functions were defined for the Intel and Microsoft
compilers and were eventually implemented in and contributed to GCC.

The Intel Intrinsics have a specific type and naming structure. In
this naming structure, function names start with a common prefix (MMX and SSE use
the '_mm' prefix, while AVX and AVX-512 added the '_mm256' and '_mm512' prefixes), then a short
functional name ('set', 'load', 'store', 'add', 'mul', 'blend', 'shuffle', '…') and a suffix
('_pd', '_sd', '_pi32'...) with type and packing information. See
for the list of common intrinsic suffixes.

Oddly, many of the MMX/SSE operations are not vector operations at all. There
are a lot of scalar operations on a single float, double, or long long type. In
effect these are scalars that can take advantage of the larger (xmm) register
space. On the 32-bit Intel architecture they also provided IEEE 754 float and
double types, and 64-bit integers, which either did not exist or were hard to
implement in the base i386/387 instruction set. These scalar operations use a suffix
starting with '_s' (_sd for scalar double float,
_ss for scalar float, and _si64
for scalar long long).

True vector operations use the packed or extended packed suffixes,
starting with '_p' or '_ep' (_pd for vector double,
_ps for vector float, and
_epi32 for vector int). The use of '_ep'
seems to be reserved to disambiguate
intrinsics that existed in the (64-bit vector) MMX extension from the extended
(128-bit vector) SSE equivalent. For example,
_mm_add_pi32 is an MMX operation on
a pair of 32-bit integers, while
_mm_add_epi32 is an SSE2 operation on a vector
of four 32-bit integers.

The GCC builtins for the
i386 target
(which includes x86 and x86_64) are not
the same as the Intel Intrinsics. While they have similar intent and cover most
of the same functions, they use a different naming scheme (prefixed with
__builtin_ia32_, then the function name with a type suffix) and use GCC vector type
modes for operand types. For example:
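As a minimal sketch of this naming scheme, assuming GCC on an x86_64 target (where SSE and SSE2 are enabled by default), the block below calls two builtins listed in the GCC manual for this target: __builtin_ia32_addss and __builtin_ia32_addsd. The typedef names v4sf and v2df mirror the manual's notation but are defined locally, and the helper names (add_low_float, add_low_double, check_builtin_adds) are hypothetical. Other compilers may not provide these builtins.

```c
#include <assert.h>

/* GCC vector types; the names v4sf and v2df follow the GCC manual's
   notation but are defined here, not predefined by the compiler. */
typedef float  v4sf __attribute__((vector_size(16)));
typedef double v2df __attribute__((vector_size(16)));

/* Scalar single-precision add: element 0 is a[0] + b[0],
   elements 1..3 are copied from a. */
static v4sf add_low_float(v4sf a, v4sf b)
{
    return __builtin_ia32_addss(a, b);
}

/* Scalar double-precision add: element 0 is a[0] + b[0],
   element 1 is copied from a. */
static v2df add_low_double(v2df a, v2df b)
{
    return __builtin_ia32_addsd(a, b);
}

/* Small self-check; call it from main() to exercise both builtins. */
static void check_builtin_adds(void)
{
    v4sf fa = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf fb = {10.0f, 20.0f, 30.0f, 40.0f};
    v4sf fr = add_low_float(fa, fb);
    assert(fr[0] == 11.0f && fr[1] == 2.0f);

    v2df da = {1.0, 2.0};
    v2df db = {10.0, 20.0};
    v2df dr = add_low_double(da, db);
    assert(dr[0] == 11.0 && dr[1] == 2.0);
}
```

Note that the same operation (add the low element) needs a differently named builtin for each operand type, which is the point of the comparison with the PowerPC built-ins.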
A key difference between GCC built-ins for i386 and PowerPC is
that the x86 built-ins have a different name for each operation and type, while the
PowerPC AltiVec built-ins tend to have a single generic built-in for each
operation, across a set of compatible operand types.

In GCC the Intel Intrinsic header (*intrin.h) files are implemented
as a set of inline functions using the Intel Intrinsic API names and types.
These functions are implemented as either GCC C vector extension code or via
one or more GCC builtins for the i386 target. So let's take a look at some
examples from GCC's SSE2 intrinsic header emmintrin.h:
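The following is a simplified sketch modeled on that header, not a verbatim copy of it. It assumes GCC on an x86_64 target. The names my_m128d, my_v2df, my_add_pd, my_add_sd, and check_add_wrappers are hypothetical stand-ins chosen so the block compiles on its own without including emmintrin.h; the real header uses __m128d, __v2df, and additional inlining attributes.

```c
#include <assert.h>

/* Hypothetical stand-ins for the header's types: an API-level type
   (may_alias, like __m128d) and an internal element-typed vector. */
typedef double my_m128d __attribute__((vector_size(16), may_alias));
typedef double my_v2df  __attribute__((vector_size(16)));

/* Packed add written directly as GCC C vector extension code:
   the '+' operator adds both double elements. */
static inline my_m128d my_add_pd(my_m128d a, my_m128d b)
{
    return (my_m128d)((my_v2df)a + (my_v2df)b);
}

/* Scalar add delegated to the i386 target builtin:
   only element 0 is added, element 1 is copied from a. */
static inline my_m128d my_add_sd(my_m128d a, my_m128d b)
{
    return (my_m128d)__builtin_ia32_addsd((my_v2df)a, (my_v2df)b);
}

/* Small self-check; call it from main() to compare the two wrappers. */
static void check_add_wrappers(void)
{
    my_m128d a = {1.0, 2.0};
    my_m128d b = {10.0, 20.0};

    my_m128d packed = my_add_pd(a, b);
    assert(packed[0] == 11.0 && packed[1] == 22.0);

    my_m128d scalar = my_add_sd(a, b);
    assert(scalar[0] == 11.0 && scalar[1] == 2.0);
}
```

The vector extension form is portable to any GCC target that supports the vector type, while the builtin form ties the function to the i386 target.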
Note that
_mm_add_pd is implemented directly as GCC C vector
extension code, while
_mm_add_sd is implemented via the GCC builtin
__builtin_ia32_addsd. From the
discussion above we know the _pd suffix
indicates a packed vector double while the _sd suffix indicates a scalar double
in an XMM register.
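The difference between these suffixes can be observed from user code. The sketch below assumes an x86_64 compiler that ships emmintrin.h; the helper names (add_packed_double, add_scalar_double, add_packed_int32, check_suffixes) are hypothetical, while the _mm_* calls are standard SSE2 intrinsics.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* _pd: packed double, both 64-bit lanes are added. */
static void add_packed_double(const double a[2], const double b[2],
                              double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* _sd: scalar double, only lane 0 is added; lane 1 is copied from a. */
static void add_scalar_double(const double a[2], const double b[2],
                              double out[2])
{
    _mm_storeu_pd(out, _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* _epi32: extended packed 32-bit integers, four lanes are added. */
static void add_packed_int32(const int a[4], const int b[4], int out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Small self-check; call it from main() to see the lane behavior. */
static void check_suffixes(void)
{
    const double a[2] = {1.0, 2.0};
    const double b[2] = {10.0, 20.0};
    double out[2];

    add_packed_double(a, b, out);
    assert(out[0] == 11.0 && out[1] == 22.0);

    add_scalar_double(a, b, out);
    assert(out[0] == 11.0 && out[1] == 2.0);
}
```

The unaligned load and store intrinsics are used here so that the example makes no assumption about the alignment of the caller's arrays.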