<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation
  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  
-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_handling_mmx">
  <title>Dealing with MMX</title>
  
  <para>MMX is actually the harder case. The <literal>__m64</literal>
  type supports SIMD vectors of the integer types (char, short, int, long).
  The Intel API defines <literal>__m64</literal> as:
  <programlisting><![CDATA[typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));]]></programlisting></para>

  <para>This 8-byte vector type is problematic for the PowerPC target (it is
  not really supported in GCC), and we would prefer to use a native PowerISA
  type that can be passed in a single register. The PowerISA Rotate Under Mask
  instructions can easily extract and insert integer fields of a General
  Purpose Register (GPR). This implies that MMX integer types can be handled
  as an internal union of arrays for the supported element types. So a 64-bit
  unsigned long long is the best type for parameter passing and return values,
  especially for the 64-bit (<literal>_si64</literal>) operations, as these
  normally generate a single PowerISA instruction.
  So for the PowerPC implementation we will define
  <literal>__m64</literal> as:
  <programlisting><![CDATA[typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64;]]></programlisting></para>
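  <para>The union-of-arrays idea can be sketched as follows. This is only an
  illustration, not the actual header implementation: the names
  <literal>m64_sketch</literal>, <literal>m64_union_sketch</literal>,
  <literal>sketch_add_si64</literal>, and <literal>sketch_add_pi32</literal>
  are hypothetical, chosen so they do not collide with the real
  <literal>__m64</literal> API. The sketch assumes a GCC-compatible compiler
  and relies on the GCC behavior that reading a union member other than the
  one last written reinterprets the same bits.</para>
  <programlisting><![CDATA[#include <stdint.h>

/* Hypothetical stand-in for the PowerPC __m64 definition above. */
typedef __attribute__ ((__aligned__ (8))) unsigned long long m64_sketch;

/* Hypothetical union giving element-wise views of the same 64 bits. */
typedef union
{
  m64_sketch as_m64;
  int8_t as_char[8];
  int16_t as_short[4];
  int32_t as_int[2];
  int64_t as_long_long;
} m64_union_sketch;

/* 64-bit (_si64 style) add: the whole value is a single GPR operand,
   so a carry out of the low word propagates into the high word. */
static inline m64_sketch
sketch_add_si64 (m64_sketch a, m64_sketch b)
{
  return a + b;
}

/* 32-bit element (_pi32 style) add: each element wraps on its own.
   The arithmetic is done as unsigned to avoid signed overflow. */
static inline m64_sketch
sketch_add_pi32 (m64_sketch a, m64_sketch b)
{
  m64_union_sketch x, y, r;
  x.as_m64 = a;
  y.as_m64 = b;
  for (int i = 0; i < 2; i++)
    r.as_int[i] = (int32_t) ((uint32_t) x.as_int[i] + (uint32_t) y.as_int[i]);
  return r.as_m64;
}]]></programlisting>
  <para>With these definitions, adding <literal>0x0000000100000001</literal>
  to <literal>0x00000001FFFFFFFF</literal> carries across the word boundary in
  <literal>sketch_add_si64</literal>, while
  <literal>sketch_add_pi32</literal> wraps each 32-bit element separately.
  Both functions take and return a plain 64-bit integer, which is what allows
  the value to be passed in a single GPR.</para>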

  <para>The SSE extensions include some copy / convert operations for
  <literal>__m128</literal> to / from <literal>__m64</literal>, and this
  includes some int to / from float conversions. However, in these cases the
  float operands always reside in SSE (XMM) registers (which match the
  PowerISA vector registers) and the MMX registers only contain integer
  values. POWER8 (PowerISA-2.07) has direct move instructions between GPRs and
  VSRs. So these transfers are normally a single instruction and any
  conversions can be handled in the vector unit.</para>

  <para>When transferring a <literal>__m64</literal> value to a vector
  register we should also execute a splat doubleword
  (<literal>xxspltd</literal>) instruction to ensure there is valid data in
  all four float element lanes before doing floating point operations. This
  avoids causing extraneous floating point exceptions that might be generated
  by uninitialized parts of the vector. The top two lanes will have the
  floating point results that are in position for direct transfer to a GPR or
  for storing via Store Floating-Point Double (<literal>stfd</literal>). These
  operations are internal to the intrinsic implementation and there is no
  requirement to keep temporary vectors in correct Little Endian form.</para>
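  <para>A minimal sketch of this splat-then-convert pattern is shown below.
  It uses the GCC generic vector extension rather than PowerISA instructions,
  so it only models the data flow. Whether the compiler emits a direct move
  and a splat for it depends on the target and options. The type names and
  the helper <literal>sketch_cvt_m64_to_floats</literal> are hypothetical.
  Which lane receives which integer depends on the endianness of the
  target.</para>
  <programlisting><![CDATA[/* GCC/Clang generic vector types standing in for a 128-bit vector register. */
typedef unsigned long long v2du_sketch __attribute__ ((vector_size (16)));
typedef int v4si_sketch __attribute__ ((vector_size (16)));
typedef float v4sf_sketch __attribute__ ((vector_size (16)));

/* Hypothetical helper: convert the two 32-bit integers held in a 64-bit
   __m64-style value to float, filling all four lanes with valid data. */
static inline v4sf_sketch
sketch_cvt_m64_to_floats (unsigned long long b)
{
  /* Splat the doubleword so both halves of the vector hold valid data. */
  v2du_sketch splat = { b, b };
  /* Reinterpret the same 128 bits as four 32-bit integers. */
  v4si_sketch ints = (v4si_sketch) splat;
  v4sf_sketch result;
  /* Element-wise int to float conversion, written as a loop for clarity. */
  for (int i = 0; i < 4; i++)
    result[i] = (float) ints[i];
  return result;
}]]></programlisting>
  <para>Because the input doubleword is duplicated, lanes 0 and 1 hold the
  same converted values as lanes 2 and 3, so no lane is converted from
  uninitialized data.</para>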

  <para>Also, for the smaller element sizes and higher element counts (MMX
  <literal>_pi8</literal> and <literal>_pi16</literal> types),
  the number of Rotate Under Mask instructions required to
  disassemble the 64-bit <literal>__m64</literal>
  into elements, perform the element calculations,
  and reassemble the elements into a single <literal>__m64</literal>
  value can get larger. In this
  case we can generate shorter instruction sequences by transferring (via a
  direct move instruction) the GPR <literal>__m64</literal> value to
  a vector register, performing the
  SIMD operation there, then transferring the <literal>__m64</literal>
  result back to a GPR.</para>
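  <para>The round trip through a vector register can be sketched for an
  8-bit element add as follows. As above, this uses the GCC generic vector
  extension as a stand-in for the vector unit, and the names
  <literal>vec_dw_sketch</literal>, <literal>vec_u8_sketch</literal>, and
  <literal>sketch_add_pi8</literal> are hypothetical. The instructions
  actually generated depend on the compiler and target.</para>
  <programlisting><![CDATA[/* GCC/Clang generic vector types standing in for a 128-bit vector register. */
typedef unsigned long long vec_dw_sketch __attribute__ ((vector_size (16)));
typedef unsigned char vec_u8_sketch __attribute__ ((vector_size (16)));

/* Hypothetical _pi8 style add of eight 8-bit elements packed in 64 bits. */
static inline unsigned long long
sketch_add_pi8 (unsigned long long a, unsigned long long b)
{
  /* GPR to vector register: both doublewords receive the same value. */
  vec_dw_sketch da = { a, a };
  vec_dw_sketch db = { b, b };
  vec_u8_sketch va = (vec_u8_sketch) da;
  vec_u8_sketch vb = (vec_u8_sketch) db;
  /* One SIMD add; each byte lane wraps independently. */
  vec_u8_sketch sum = va + vb;
  /* Vector register back to GPR: either doubleword holds the result. */
  vec_dw_sketch dsum = (vec_dw_sketch) sum;
  return dsum[0];
}]]></programlisting>
  <para>The single vector add replaces eight extract, add, and insert
  sequences on the GPR side. Because both doublewords carry the same data,
  the result can be read from either one regardless of endianness.</para>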
  
</section>