Some more intrinsic examples

The intrinsic _mm_cvtpd_ps converts a packed vector double into a packed vector single float. Since only 2 doubles fit into a 128-bit vector, only 2 floats are returned, and they occupy only half (64 bits) of the XMM register. For this intrinsic the 64 bits are packed into the logical left half of the result register (float elements 0 and 1), and the logical right half of the register is set to zero (as per the Intel cvtpd2ps instruction).

The PowerISA provides the VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp) instruction. In the ABI this is vec_floato (vector double).
This instruction converts each double element, then transfers converted element 0 to float element 1 and converted element 1 to float element 3. Float elements 0 and 2 are undefined (the hardware may leave any value there). This does not match the expected results for _mm_cvtpd_ps.
vec_floato ({1.0, 2.0}) result = {<undefined>, 1.0, <undefined>, 2.0}
_mm_cvtpd_ps ({1.0, 2.0}) result = {1.0, 2.0, 0.0, 0.0}

So we need to reposition the results to word elements 0 and 2, which allows a pack operation to deliver the correct format. A merge-odd operation does this: it splats element 1 into element 0 and element 3 into element 2. The pack operation then combines the low half of each doubleword from the merged result and from a vector of zeros to generate the required format.
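To trace the lane movements on any machine, here is a hypothetical, portable scalar model of the steps above. It does not call the real POWER intrinsics: model_floato, model_mergeo_self, model_pack and model_cvtpd_ps are made-up names that only mimic the data layout of xvcvdpsp (vec_floato), a merge-odd of the result with itself, and a doubleword pack with a vector of zeros. Words are numbered as in the text, words 0 and 2 are assumed to be the low halves of the two doublewords, and the value standing in for the undefined words is arbitrary.

```c
/* Hypothetical scalar model of the lane movements described above.
   A 128-bit VSX register is modeled as four 32-bit words, numbered
   as in the text (element 0 first). This is NOT the real vec_floato,
   merge-odd, or pack intrinsics; it only mirrors their data layout. */
typedef struct { float w[4]; } v4sf_model;

/* Stand-in for the contents of the "undefined" words; any value may appear. */
#define GARBAGE (-12345.0f)

/* Model of xvcvdpsp / vec_floato: converted element 0 lands in word 1,
   converted element 1 lands in word 3, words 0 and 2 are undefined. */
static v4sf_model model_floato(const double d[2]) {
    v4sf_model r = {{GARBAGE, (float)d[0], GARBAGE, (float)d[1]}};
    return r;
}

/* Model of a merge-odd of a vector with itself: {a1, a1, a3, a3},
   i.e. word 1 is splatted into word 0 and word 3 into word 2. */
static v4sf_model model_mergeo_self(v4sf_model a) {
    v4sf_model r = {{a.w[1], a.w[1], a.w[3], a.w[3]}};
    return r;
}

/* Model of a doubleword pack: keep the low 32-bit half of each
   doubleword of a (modeled as words 0 and 2), then of b. */
static v4sf_model model_pack(v4sf_model a, v4sf_model b) {
    v4sf_model r = {{a.w[0], a.w[2], b.w[0], b.w[2]}};
    return r;
}

/* Emulated _mm_cvtpd_ps result: {(float)d0, (float)d1, 0.0f, 0.0f}. */
static v4sf_model model_cvtpd_ps(const double d[2]) {
    const v4sf_model vzero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    v4sf_model temp = model_floato(d);   /* raw vec_floato layout    */
    temp = model_mergeo_self(temp);      /* move results to words 0, 2 */
    return model_pack(temp, vzero);      /* pack with zeros           */
}
```

Calling model_cvtpd_ps with {1.0, 2.0} yields {1.0, 2.0, 0.0, 0.0}, matching the _mm_cvtpd_ps result above, while the raw model_floato output still carries the placeholder value in words 0 and 2. Because the merge-odd makes both 32-bit halves of each doubleword identical, the pack selects the right value whichever half it keeps.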
This technique is also used to implement _mm_cvttpd_epi32, which converts a packed vector double into a packed vector int with truncation. The PowerISA instruction xvcvdpsxws places its results in a layout similar to that of xvcvdpsp and requires the same fix-up.
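A similar hypothetical scalar model can trace the _mm_cvttpd_epi32 case. It assumes, as stated above, that xvcvdpsxws leaves converted element 0 in word 1 and converted element 1 in word 3. The names model_xvcvdpsxws and model_cvttpd_epi32 are made up, and only in-range inputs are modeled: NaN and out-of-range values are left out, because a C cast of such values is undefined behavior.

```c
#include <stdint.h>

/* Hypothetical scalar model (not real intrinsics): four 32-bit integer
   words, numbered as in the text (element 0 first). */
typedef struct { int32_t w[4]; } v4si_model;

/* Stand-in for the contents of the "undefined" words. */
#define GARBAGE_I32 (-12345)

/* Model of xvcvdpsxws for in-range inputs only: truncate toward zero,
   converted element 0 -> word 1, converted element 1 -> word 3. */
static v4si_model model_xvcvdpsxws(const double d[2]) {
    v4si_model r = {{GARBAGE_I32, (int32_t)d[0], GARBAGE_I32, (int32_t)d[1]}};
    return r;
}

/* Same fix-up as for _mm_cvtpd_ps: merge-odd splat, then pack with zeros. */
static v4si_model model_cvttpd_epi32(const double d[2]) {
    v4si_model t = model_xvcvdpsxws(d);
    v4si_model m = {{t.w[1], t.w[1], t.w[3], t.w[3]}}; /* merge-odd of t with t   */
    v4si_model r = {{m.w[0], m.w[2], 0, 0}};            /* pack low halves, zeros */
    return r;
}
```

For example, {-2.7, 3.9} converts to {-2, 3, 0, 0}, since the cast truncates toward zero.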