GCC Vector Extensions

The GCC vector extensions use common syntax but are implemented in a target specific way. Using the C vector extensions requires the __gnu_inline__ attribute to avoid syntax errors in case the user specified C standard compliance (-std=c90, -std=c11, etc.) that would normally disallow such extensions.

The GCC implementation for PowerPC64 Little Endian is (mostly) functionally compatible with x86_64 vector extension usage. We can use the same type definitions (at least for vector_size (16)), operations, syntax {...} for vector initializers and constants, and array syntax [] for vector element access. So simple arithmetic / logical operations on whole vectors should work as is.

The caveat is that the interface data type of the Intel intrinsic may not match the data types of the operation, so it may be necessary to cast the operands to the specific type for the operation. This also applies to vector initializers and accessing vector elements: you need to use the appropriate type to get the expected results. Of course this applies to x86_64 as well. For example, a packed single-precision add can be written as (__m128) ((__v4sf)__A + (__v4sf)__B).

Note the cast from the interface type (__m128) to the implementation type (__v4sf, defined in the intrinsic header) for the vector float add (+) operation. This is enough for the compiler to select the appropriate vector add instruction for the float type. Then the result (which is __v4sf) needs to be cast back to the expected interface type (__m128).

Note also the use of array syntax (((__v4sf)__A)[0]) to extract the lowest (left most) element of a vector. (Here we are using logical left and logical right, which will not match the PowerISA register view in Little Endian. Logical left is the left most element for initializers {left, …, right}, storage order and array order, where the left most element is [0].) The cast (__v4sf) ensures that the compiler knows we are extracting the left most 32-bit float. The compiler ensures the code generated matches the Intel behavior for PowerPC64 Little Endian.
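A minimal sketch of the casting pattern described above, written with generic GCC vector types so it compiles on any GCC target. The type names m128, m128i, v4sf and v4su and the *_sketch functions are hypothetical stand-ins for the header's __m128, __m128i, __v4sf, __v4si and the corresponding intrinsics; this is not the actual header code. The integer case shows why the cast matters: the interface type holds two 64-bit elements, and only the cast to a four-element 32-bit type turns + into a 32-bit lane add (an unsigned element type is assumed here to avoid signed overflow).

```c
#include <assert.h>

/* Hypothetical stand-ins for the intrinsic header types.  "Interface"
   types play the role of __m128 / __m128i; "implementation" types play
   the role of __v4sf / __v4si (unsigned here to avoid signed overflow). */
typedef float        m128  __attribute__ ((vector_size (16), may_alias));
typedef long long    m128i __attribute__ ((vector_size (16), may_alias));
typedef float        v4sf  __attribute__ ((vector_size (16)));
typedef unsigned int v4su  __attribute__ ((vector_size (16)));

/* Packed float add: cast to the implementation type so the compiler
   picks a vector float add, then cast the result back. */
static inline m128 add_ps_sketch (m128 a, m128 b)
{
  return (m128) ((v4sf) a + (v4sf) b);
}

/* Extract logical element [0], the left most element in initializer,
   storage and array order. */
static inline float cvtss_f32_sketch (m128 a)
{
  return ((v4sf) a)[0];
}

/* Packed 32-bit add on an interface type with 64-bit elements: the
   (v4su) casts make '+' operate on four independent 32-bit lanes. */
static inline m128i add_epi32_sketch (m128i a, m128i b)
{
  return (m128i) ((v4su) a + (v4su) b);
}

/* Small usage checks. */
static void check_examples (void)
{
  m128 a = { 1.0f, 2.0f, 3.0f, 4.0f };
  m128 b = { 10.0f, 20.0f, 30.0f, 40.0f };
  m128 s = add_ps_sketch (a, b);
  assert (s[0] == 11.0f && s[3] == 44.0f);
  assert (cvtss_f32_sketch (a) == 1.0f);

  m128i x = (m128i) (v4su) { 0xFFFFFFFFu, 7u, 0u, 0u };
  m128i y = (m128i) (v4su) { 1u, 1u, 0u, 0u };
  v4su r = (v4su) add_epi32_sketch (x, y);
  /* Lane [0] wraps to 0 and does not carry into lane [1]. */
  assert (r[0] == 0u && r[1] == 8u);
}
```

Without the (v4su) casts, the + in add_epi32_sketch would be a two-lane 64-bit add, and on a little-endian target the carry out of lane [0] would turn the 8 in lane [1] into a 9.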
The code generation is complicated by the fact that PowerISA vector registers are Big Endian (element 0 is the left most word of the vector) and scalar loads / stores are also to / from the left most word / dword. X86 scalar loads / stores are to / from the right most element of the XMM vector register. The PowerPC64 ELF V2 ABI mimics the X86 Little Endian behavior by placing logical element [0] in the right most element of the vector register. This may require the compiler to generate additional instructions to place the scalar value in the expected position.

Application code with extensive use of scalar (vs packed) intrinsic loads / stores should be flagged for rewrite to C code using existing scalar types (float, double, int, long, etc.). The compiler may be able to vectorize this scalar code using the native vector SIMD instruction set.

Another example is the set reverse order intrinsic (for example _mm_setr_ps), which uses initializer syntax to collect a set of scalars into a vector. Code with constant initializer values will generate a vector constant of the appropriate endianness. However, code with variables in the initializer can get complicated, as it often requires transfers between register sets and perhaps format conversions. We can assume that the compiler will generate the correct code, but if this class of intrinsics shows up as a hot spot, a rewrite to native PPC vector built-ins may be appropriate. For example, an initializer that replicates a variable into all the vector elements might not be recognized as a "load and splat", and making this explicit (for example with the vec_splats built-in) may help the compiler generate better code.
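A minimal sketch of the set intrinsics written with initializer syntax, again using hypothetical type and function names (m128, v4sf, setr_ps_sketch, set_ps_sketch, set1_ps_sketch) rather than the actual header code. Because the arguments are variables, the compiler has to build the vector at run time; a constant initializer would instead be folded into a vector constant. The native PowerPC alternative for the replicated case (vec_splats from <altivec.h>) is deliberately not used, so the sketch stays portable across GCC targets.

```c
#include <assert.h>

/* Hypothetical stand-ins for __m128 (interface) and __v4sf
   (implementation). */
typedef float m128 __attribute__ ((vector_size (16), may_alias));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Set reverse order: the first argument becomes logical element [0]. */
static inline m128 setr_ps_sketch (float z, float y, float x, float w)
{
  return (m128) (v4sf) { z, y, x, w };
}

/* Set (Intel order): the last argument becomes element [0], so the
   initializer lists the arguments in reverse. */
static inline m128 set_ps_sketch (float z, float y, float x, float w)
{
  return (m128) (v4sf) { w, x, y, z };
}

/* Replicate one variable into all elements.  Written this way, the
   compiler may or may not recognize it as a splat. */
static inline m128 set1_ps_sketch (float f)
{
  return (m128) (v4sf) { f, f, f, f };
}

/* Small usage checks. */
static void check_set_examples (void)
{
  m128 r = setr_ps_sketch (1.0f, 2.0f, 3.0f, 4.0f);
  m128 s = set_ps_sketch (1.0f, 2.0f, 3.0f, 4.0f);
  m128 t = set1_ps_sketch (5.0f);
  assert (r[0] == 1.0f && r[3] == 4.0f);
  assert (s[0] == 4.0f && s[3] == 1.0f);
  assert (t[0] == 5.0f && t[1] == 5.0f && t[2] == 5.0f && t[3] == 5.0f);
}
```

Comparing the generated code for set1_ps_sketch with an explicit splat on the target is one way to check whether the "load and splat" was recognized.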