Some more intrinsic examples

The intrinsic _mm_cvtpd_ps converts a packed vector of doubles into a packed vector of single-precision floats. Since only 2 doubles fit into a 128-bit vector, only 2 floats are returned, and they occupy only half (64 bits) of the XMM register. For this intrinsic the 64 bits are packed into the logical left half of the result register and the logical right half of the register is set to zero (as per the Intel cvtpd2ps instruction).

The PowerISA provides the VSX Vector Round and Convert Double-Precision to Single-Precision format (xvcvdpsp) instruction. In the ABI this is vec_floato (vector double). This instruction converts each double element, then transfers converted element 0 to float element 1, and converted element 1 to float element 3. Float elements 0 and 2 are undefined (the hardware may leave any value there). This does not match the expected results for _mm_cvtpd_ps.

    vec_floato ({1.0, 2.0}) result = {<undefined>, 1.0, <undefined>, 2.0}
    _mm_cvtpd_ps ({1.0, 2.0}) result = {1.0, 2.0, 0.0, 0.0}

So we need to re-position the results to word elements 0 and 2, which allows a pack operation to deliver the correct format. A merge-odd operation copies element 1 to element 0 and element 3 to element 2. The pack operation then combines the low half of each doubleword from the vector result and from a vector of zeros to generate the required format.

This technique is also used to implement _mm_cvttpd_epi32, which converts a packed vector double into a packed vector int. The PowerISA instruction xvcvdpsxws uses a similar layout for the result as xvcvdpsp and requires the same fix-up.
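The convert, merge-odd, pack sequence described above can be traced with a small portable model. This is only a sketch in plain C, not the actual header implementation: the types vec2d and vec4f, the model_* functions, and the UNDEFINED_LANE marker are all hypothetical names introduced for illustration, and each function mimics just the element movement the text describes, using the same element numbering.

```c
#include <assert.h>

/* Hypothetical scalar stand-ins for 128-bit vectors; not real vector types. */
typedef struct { double d[2]; } vec2d;
typedef struct { float f[4]; } vec4f;

/* Arbitrary marker for lanes the hardware leaves undefined (assumption:
   any value may appear there, so the fix-up must never depend on it). */
#define UNDEFINED_LANE (-12345.0f)

/* Models xvcvdpsp / vec_floato as described in the text: converted
   double i lands in float element 2*i + 1; elements 0 and 2 are undefined. */
vec4f model_floato(vec2d a)
{
    vec4f r = {{ UNDEFINED_LANE, (float)a.d[0], UNDEFINED_LANE, (float)a.d[1] }};
    return r;
}

/* Models merge-odd: interleaves the odd elements of a and b. */
vec4f model_mergeo(vec4f a, vec4f b)
{
    vec4f r = {{ a.f[1], b.f[1], a.f[3], b.f[3] }};
    return r;
}

/* Models the doubleword pack: keeps the low word of each doubleword
   (elements 0 and 2) of a, followed by the same words of b. */
vec4f model_pack_low_words(vec4f a, vec4f b)
{
    vec4f r = {{ a.f[0], a.f[2], b.f[0], b.f[2] }};
    return r;
}

/* The full fix-up: convert, merge-odd with itself, pack against zeros. */
vec4f model_mm_cvtpd_ps(vec2d a)
{
    const vec4f vzero = {{ 0.0f, 0.0f, 0.0f, 0.0f }};
    vec4f temp = model_floato(a);      /* {undef, c0, undef, c1} */
    temp = model_mergeo(temp, temp);   /* {c0, c0, c1, c1}       */
    return model_pack_low_words(temp, vzero); /* {c0, c1, 0, 0}  */
}

/* Self-check using the values from the example in the text. */
void check_example(void)
{
    vec2d in = {{ 1.0, 2.0 }};
    vec4f out = model_mm_cvtpd_ps(in);
    assert(out.f[0] == 1.0f && out.f[1] == 2.0f);
    assert(out.f[2] == 0.0f && out.f[3] == 0.0f);
}
```

Calling check_example() walks the {1.0, 2.0} input through all three steps. Because the merge-odd step overwrites elements 0 and 2 before the pack reads them, the undefined lanes never reach the result, which is why the same fix-up can be reused for other conversions with this result layout.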