
<!--
  Copyright (c) 2019 OpenPOWER Foundation
  
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  
-->
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
  
  <!-- Chapter Title goes here. -->
  <title>The Power Bi-Endian Vector Programming Model</title>

  <para>
    To ensure portability of applications optimized to exploit the
    SIMD functions of Power ISA processors, this reference defines a
    set of functions and data types for SIMD programming.  Compliant
    compilers will provide suitable support for these functions,
    preferably as built-in functions that translate to one or more
    Power ISA instructions.
  </para>
  <para>
    Compilers are encouraged, but not required, to provide built-in
    functions to access individual instructions in the IBM Power®
    instruction set architecture. In most cases, each such built-in
    function should provide direct access to the underlying
    instruction.
  </para>
  <para>
    However, to ease porting between little-endian (LE) and big-endian
    (BE) Power systems, and between Power and other platforms, it is
    preferable that some built-in functions provide the same semantics
    on both LE and BE Power systems, even if this means that the
    built-in functions are implemented with different instruction
    sequences for LE and BE. To achieve this, the vector built-in
    functions are derived from, but not tied to, the operations
    provided by the Power SIMD instructions. Unlike
    traditional “hardware intrinsic” built-in functions, no fixed
    mapping exists between these built-in functions and the generated
    hardware instruction sequence. Rather, the compiler is free to
    generate optimized instruction sequences that implement the
    semantics that the programmer specifies using these
    built-in functions.
  </para>
  <para>
    As we've seen, the Power SIMD instructions operate on groups of 1,
    2, 4, 8, or 16 vector elements at a time in 128-bit registers. On
    a big-endian Power platform, vector elements are loaded from
    memory into a register so that the 0th element occupies the
    high-order bits of the register, and the (N &#8211; 1)th element
    occupies the low-order bits of the register. This is referred to
    as big-endian element order. On a little-endian Power platform,
    vector elements are loaded from memory such that the 0th element
    occupies the low-order bits of the register, and the (N &#8211;
    1)th element occupies the high-order bits. This is referred to as
    little-endian element order.
  </para> 

  <note>
    <para>
      Much of the information in this chapter was formerly part of
      Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
    </para>
  </note>

  <section>
    <title>Language Elements</title>
    <para>
      The C and C++ languages are extended to use new identifiers
      <code>vector</code>, <code>pixel</code>, <code>bool</code>,
      <code>__vector</code>, <code>__pixel</code>, and
      <code>__bool</code>.  These keywords are used to specify vector
      data types (<xref linkend="VIPR.ch-data-types" />).  Because
      these identifiers may conflict with keywords in more recent
      language standards for C and C++, compilers may implement these
      in one of two ways.
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  <code>__vector</code>, <code>__pixel</code>,
	  <code>__bool</code>, and <code>bool</code> are defined as
	  keywords, with <code>vector</code> and <code>pixel</code> as
	  predefined macros that expand to <code>__vector</code> and
	  <code>__pixel</code>, respectively.
	</para>
      </listitem>
      <listitem>
	<para>
	  <code>__vector</code>, <code>__pixel</code>, and
	  <code>__bool</code> are defined as keywords in all contexts,
	  while <code>vector</code>, <code>pixel</code>, and
	  <code>bool</code> are treated as keywords only within the
	  context of a type declaration.
	</para>
      </listitem>
    </itemizedlist>
    <para>
      As a motivating example, the <emphasis
      role="bold">vector</emphasis> token is used as a type in the
      C++ Standard Template Library, and hence cannot be used as an
      unrestricted keyword, but can be used in the context-sensitive
      implementation.  For example, <emphasis role="bold">vector
      char</emphasis> is distinct from <emphasis
      role="bold">std::vector</emphasis> in the context-sensitive
      implementation.
    </para>
    <para>
      Vector literals may be specified using a type cast and a set of
      literal initializers in parentheses or braces.  For example,
    </para>
    <programlisting>vector int x = (vector int) (4, -1, 3, 6);
vector double g = (vector double) { 3.5, -24.6 };</programlisting>
    <para>
      Current C compilers do not support literals for
      <code>__int128</code> types.  When constructing a <code>vector
      __int128</code> constant from smaller literals such as
      <code>int</code> or <code>long long</code>, you must test for
      endianness and reverse the order of the smaller literals for
      little-endian mode.
    </para>
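    <para>
      As a minimal sketch of the endianness test described above, the
      following function builds one 128-bit constant from two 64-bit
      literals.  The names <code>HI64</code>, <code>LO64</code>, and
      <code>make_q_constant</code> are hypothetical, and the
      <code>__POWER8_VECTOR__</code> guard is an assumption about how
      a compiler advertises support for <code>vector __int128</code>;
      adjust both to suit your compiler.
    </para>
    <programlisting><![CDATA[#if defined(__POWER8_VECTOR__)  /* assumed feature-test macro */
#include <altivec.h>

#define HI64 0x0123456789ABCDEFULL  /* high-order doubleword (hypothetical) */
#define LO64 0xFEDCBA9876543210ULL  /* low-order doubleword (hypothetical) */

static vector unsigned __int128 make_q_constant(void)
{
#if defined(__LITTLE_ENDIAN__)
  /* LE element order: element 0 is the low-order doubleword. */
  vector unsigned long long parts = { LO64, HI64 };
#else
  /* BE element order: element 0 is the high-order doubleword. */
  vector unsigned long long parts = { HI64, LO64 };
#endif
  /* The vector cast reinterprets the 128 bits without changing them. */
  return (vector unsigned __int128) parts;
}
#endif]]></programlisting>
    <para>
      With either element order, the single quadword element of the
      result should hold <code>HI64</code> in its high-order 64 bits
      and <code>LO64</code> in its low-order 64 bits.
    </para>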
  </section>

  <section xml:id="VIPR.ch-data-types">
    <title>Vector Data Types</title>
    <para>
      Languages provide support for the data types in <xref
      linkend="VIPR.biendian.vectypes" /> to represent vector data
      types stored in vector registers.
    </para> 
    <para>
      For the C and C++ programming languages (and related/derived
      languages), the "Power SIMD C Types" listed in the leftmost
      column of <xref linkend="VIPR.biendian.vectypes" /> may be used
      when Power SIMD language extensions are enabled.  Either
      <code>vector</code> or <code>__vector</code> may be used in the
      type name.  Note that the ELFv2 ABI for Power also includes a
      <code>vector _Float16</code> data type.  As of this writing, no
      current compilers for Power have implemented such a type.  This
      document does not include that type or any intrinsics related to
      it.
    </para> 
    <para>
      For the Fortran language, <xref
      linkend="VIPR.biendian.fortran-types" /> gives a correspondence
      between Fortran and C/C++ language types.
    </para>
    <para>
      The assignment operator always performs a byte-by-byte data copy
      for vector data types.
    </para>
    <para>
      Like other C/C++ language types, vector types may be declared
      with the <code>const</code> or <code>volatile</code> qualifiers.
      Vector objects can be defined in <code>static</code>,
      <code>auto</code>, and <code>register</code> storage.
    <para>
      Pointers to vector types are defined like pointers to other
      C/C++ types. Pointers to vector objects may be defined to have
      const and volatile properties.  Pointers to vector objects must
      hold addresses divisible by 16, as vector objects are always
      aligned on quadword (16-byte, or 128-bit) boundaries.
    </para>
    <para>
      The preferred way to access vectors at an application-defined
      address is by using vector pointers and the C/C++ dereference
      operator <code>*</code>. As with other C/C++ data types, the
      array reference operator <code>[]</code> may be applied to a
      vector pointer, with the usual definition, to access the
      <emphasis>N</emphasis>th vector object relative to that
      pointer. The dereference operator <code>*</code> may
      <emphasis>not</emphasis> be used to access data that is not
      aligned at least to a quadword boundary.  Built-in functions
      such as <code>vec_xl</code> and <code>vec_xst</code> are
      provided for unaligned data access.  Please refer to <xref
      linkend="VIPR.biendian.unaligned" /> for an example.
    </para>
    <para>
      One vector type may be cast to another vector type without
      restriction.  Such a cast is simply a reinterpretation of the
      bits, and does not change the data.
    </para>
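    <para>
      To illustrate that a vector cast reinterprets bits rather than
      converting values, the sketch below returns the raw bit pattern
      of one element of a <code>vector float</code>.  The helper name
      <code>first_element_bits</code> is hypothetical, and the sketch
      assumes IEEE single-precision floats and a compiler that
      supports the <code>[]</code> operator on vector values.
    </para>
    <programlisting><![CDATA[#if defined(__ALTIVEC__)  /* compile only where vector support is enabled */
#include <altivec.h>

/* Return the bit pattern of element 0 of a vector float. */
static unsigned int first_element_bits(vector float f)
{
  /* The cast copies the 128 bits unchanged; no float-to-int
     conversion takes place. */
  vector unsigned int bits = (vector unsigned int) f;
  return bits[0];
}
#endif]]></programlisting>
    <para>
      Under these assumptions, passing a vector whose element 0 is
      <code>1.0f</code> yields <code>0x3F800000</code>, the encoding
      of 1.0, rather than the integer 1.
    </para>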
    <para>
      Compilers are expected to recognize sequences of
      operations that can be combined into a single hardware
      instruction. For example, a load-and-splat hardware instruction
      (such as <emphasis role="bold">lxvdsx</emphasis>)
      might be generated for the following sequence:
    </para>
    <programlisting>double *double_ptr;
register vector double vd = vec_splats(*double_ptr);</programlisting>
    <table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
      <title>Vector Types</title>
      <tgroup cols="4">
        <colspec colname="c1" colwidth="20*" />
        <colspec colname="c2" colwidth="10*" align="center" />
        <colspec colname="c3" colwidth="15*" align="center" />
        <colspec colname="c4" colwidth="40*" />
        <thead>
          <row>
            <entry align="center">
              <para>
                <emphasis role="bold">Power SIMD C Types</emphasis>
              </para>
            </entry>
            <entry align="center">
              <para>
                <emphasis role="bold">sizeof</emphasis>
              </para>
            </entry>
            <entry align="center">
              <para>
                <emphasis role="bold">Alignment</emphasis>
              </para>
            </entry>
            <entry align="center">
              <para>
                <emphasis role="bold">Description</emphasis>
              </para>
            </entry>
          </row>
        </thead>
        <tbody>
          <row>
            <entry>
              <para>vector unsigned char</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 16 unsigned bytes.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector signed char</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 16 signed bytes.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector bool char</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 16 bytes with a value of either 0 or 
              2<superscript>8</superscript> &#8211; 1.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector unsigned short</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 8 unsigned halfwords.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector signed short</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 8 signed halfwords.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector bool short</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 8 halfwords with a value of either 0 or 
              2<superscript>16</superscript> &#8211; 1.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector pixel</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 8 halfwords, each interpreted as a 1-bit
	      channel and three 5-bit channels.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector unsigned int</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 4 unsigned words.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector signed int</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 4 signed words.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector bool int</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 4 words with a value of either 0 or 
              2<superscript>32</superscript> &#8211; 1.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector unsigned long<footnote xml:id="vlong">
              <para>The vector long types are deprecated due to their
              ambiguity between 32-bit and 64-bit environments. The use
              of the vector long long types is preferred.</para>
              </footnote></para>
              <para>vector unsigned long long</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 2 unsigned doublewords.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector signed long<footnoteref linkend="vlong" /></para>
              <para>vector signed long long</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 2 signed doublewords.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector bool long<footnoteref linkend="vlong" /></para>
              <para>vector bool long long</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 2 doublewords with a value of either 0 or 
              2<superscript>64</superscript> &#8211; 1.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector unsigned __int128</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 1 unsigned quadword.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector signed __int128</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 1 signed quadword.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector float</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 4 single-precision floats.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para>vector double</para>
            </entry>
            <entry>
              <para>16</para>
            </entry>
            <entry>
              <para>Quadword</para>
            </entry>
            <entry>
              <para>Vector of 2 double-precision floats.</para>
            </entry>
          </row>
        </tbody>
      </tgroup>
    </table>
  </section>

  <section>
    <title>Vector Operators</title>
    <para>
      In addition to the dereference and assignment operators, the
      Power Bi-Endian Vector Programming Model provides the usual
      operators that are valid on pointers; these operators are also
      valid for pointers to vector types.
    </para>
    <para>
      The traditional C/C++ operators are defined on vector types
      for unary and binary <code>+</code>,
      unary and binary <code>-</code>, binary <code>*</code>, binary
      <code>%</code>, and binary <code>/</code>, as well as the unary
      and binary shift, logical, and comparison operators, and the
      ternary <code>?:</code> operator.  These operators perform their
      operations "elementwise" on the base elements of the operands,
      as follows.
    </para>
    <para>
      For unary operators, the specified operation is performed on
      each base element of the single operand to derive the result
      value placed into the corresponding element of the vector
      result. The result type of unary operations is the type of the
      single operand.  For example,
    </para>
    <programlisting>vector signed int a, b;
a = -b;</programlisting>
    <para>
      produces the same result as
    </para>
    <programlisting>vector signed int a, b;
a = vec_neg (b);</programlisting>
    <para>
      For binary operators, the specified operation is performed on
      corresponding base elements of both operands to derive the
      result value for each vector element of the vector result. Both
      operands of the binary operators must have the same vector type
      with the same base element type. The result of binary operators
      is the same type as the type of the operands.  For example,
    </para>
    <programlisting>vector signed int a, b;
a = a + b;</programlisting>
    <para>
      produces the same result as
    </para>
    <programlisting>vector signed int a, b;
a = vec_add (a, b);</programlisting>
    <para>
      Further, the array reference operator may be applied to vector
      data types, yielding an l-value corresponding to the specified
      element in accordance with the vector element numbering rules (see 
      <xref linkend="VIPR.biendian.layout" />). An l-value may either
      be assigned a new value or accessed for reading its value.  For
      example,
    </para>
    <programlisting>vector signed int a;
signed int b, c;
b = a[0];
a[3] = c;</programlisting>
  </section>

  <section xml:id="VIPR.biendian.layout">
    <title>Vector Layout and Element Numbering</title>
    <para>
      Vector data types consist of a homogeneous sequence of elements
      of the base data type specified in the vector data
      type. Individual elements of a vector can be addressed by a
      vector element number.  To understand how vector elements are
      represented in memory and in registers, it is best to start with
      some simple concepts of endianness.
    </para>
    <figure pgwide="1" xml:id="scalar-endian">
      <title>Scalar Quantities and Endianness</title>
      <mediaobject>
	<imageobject>
	  <imagedata fileref="Scalar-endian.png" format="PNG"
		     scalefit="1" width="100%" />
	</imageobject>
      </mediaobject>
    </figure>
    <para>
      <xref linkend="scalar-endian" /> shows different representations
      of a 64-bit scalar integer with the hexadecimal value
      <code>0x0123456789ABCDEF</code>.  We say that the most
      significant byte (MSB) of this value is <code>0x01</code>, and
      its least significant byte (LSB) is <code>0xEF</code>.  The scalar
      value is stored using eight bytes of memory.  On a little-endian
      (LE) system, the LSB is stored at the lowest address of these
      eight bytes, and the MSB is stored at the highest address.  On a
      big-endian (BE) system, the MSB is stored at the lowest address
      of these eight bytes, and the LSB is stored at the highest
      address.  Regardless of the memory order, the register
      representation of the scalar value is identical; the MSB is
      located on the "left" end of the register, and the LSB is
      located on the "right" end.
    </para>
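    <para>
      The memory order of a scalar can be observed with a few lines
      of portable C.  This is only a sketch; the function names are
      hypothetical, and the value is the one used in the discussion
      above.
    </para>
    <programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/* Copy the eight bytes of a 64-bit scalar as they sit in memory,
   lowest address first. */
static void scalar_memory_image(uint64_t value, unsigned char out[8])
{
  memcpy(out, &value, sizeof value);
}

/* Return 1 if the LSB (0xEF here) is stored at the lowest address,
   which is the little-endian layout; return 0 for big-endian. */
static int lsb_is_stored_first(void)
{
  unsigned char image[8];
  scalar_memory_image(UINT64_C(0x0123456789ABCDEF), image);
  return image[0] == 0xEF;
}]]></programlisting>
    <para>
      On an LE system the copied bytes begin with <code>0xEF</code>
      and end with <code>0x01</code>; on a BE system the order is
      reversed.  The value held in a register is the same in both
      cases.
    </para>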
    <para>
      Of course, the concept of "left" and "right" is a useful
      fiction; there is no guarantee that the circuitry of a hardware
      register is laid out this way.  However, we will see, as we deal
      with vector elements, that the concepts of left and right are
      more natural for human understanding than byte and element
      significance.  Indeed, most programming languages have
      operators, such as shift-left and shift-right, that use this
      same terminology.
    </para>
    <para>
      Let's move from scalars to arrays, which are more interesting to
      us since we can use vector registers to operate on arrays, or
      portions of larger arrays.  Suppose we
      have an array of bytes with values 0 through 15, as shown in
      <xref linkend="byte-array-endian" />.  Note that each byte is a
      separate data element with only one possible representation in
      memory, so the array of bytes looks identical in memory,
      regardless of whether we are using a BE system or an LE system.
      But when we load these 16 bytes into a vector register, perhaps
      by using the ISA 3.0 <emphasis role="bold">lxv</emphasis>
      instruction, the byte at the lowest address on an LE system will
      be placed in the LSB of the vector register, but on a BE system
      will be placed in the MSB of the vector register.  Thus the
      array elements appear "right to left" in the register on an LE
      system, and "left to right" in the register on a BE system.
    </para>
    <figure pgwide="1" xml:id="byte-array-endian">
      <title>Byte Arrays and Endianness</title>
      <mediaobject>
	<imageobject>
	  <imagedata fileref="Byte-array-endian.png" format="PNG"
		     scalefit="1" width="100%" />
	</imageobject>
      </mediaobject>
    </figure>
    <para>
      Things become even more interesting when we consider arrays of
      larger elements.  In <xref linkend="word-array-endian" />, we
      see the layout of an array of four 32-bit integers, where the 0th
      element has hexadecimal value <code>0x00010203</code>, the 1st
      element has value <code>0x04050607</code>, the 2nd element has
      value <code>0x08090A0B</code>, and the 3rd element has value
      <code>0x0C0D0E0F</code>.  The order of the array elements in
      memory is the same for both LE and BE systems; but the layout of
      each element itself is reversed.  When the <emphasis
      role="bold">lxv</emphasis> instruction is used to load the
      memory into a vector register, again the low address is loaded
      into the LSB of the register for LE, but loaded into the MSB of
      the register for BE.  The effect is that the array elements
      again appear right-to-left on an LE system and left-to-right on a
      BE system.  Note that each 32-bit element of the array has its
      most significant bit "on the left" whether an LE or BE system is
      in use.  This is of course necessary for proper arithmetic to be
      performed on the array elements by vector instructions.
    </para>
    <figure pgwide="1" xml:id="word-array-endian">
      <title>Word Arrays and Endianness</title>
      <mediaobject>
	<imageobject>
	  <imagedata fileref="Word-array-endian.png" format="PNG"
		     scalefit="1" width="100%" />
	</imageobject>
      </mediaobject>
    </figure>
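    <para>
      The memory side of the word-array layout can be checked in the
      same spirit with portable C.  In this sketch, which uses the
      hypothetical names <code>word_array</code> and
      <code>word_array_memory_image</code>, the array elements keep
      their order in memory while the bytes within each element
      follow the endianness of the system.
    </para>
    <programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/* The four 32-bit elements discussed above. */
static const uint32_t word_array[4] = {
  0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F
};

/* Copy the 16 bytes of the array as they sit in memory,
   lowest address first. */
static void word_array_memory_image(unsigned char out[16])
{
  memcpy(out, word_array, sizeof word_array);
}]]></programlisting>
    <para>
      On a BE system the copied bytes read <code>00 01 02 03 04
      ...  0F</code>.  On an LE system they read <code>03 02 01 00 07
      06 05 04 0B 0A 09 08 0F 0E 0D 0C</code>: element 0 still comes
      first, but each element is stored least significant byte first.
    </para>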

<!-- Element numbers can be established either
      by counting from the “left” of a register and assigning the
      left-most element the element number 0, or from the “right” of
      the register and assigning the right-most element the element
      number 0.
      </para>
      -->
    <para>
      Thus on a BE system, we number vector elements starting with 0
      on the left, while on an LE system, we number vector elements
      starting with 0 on the right.  We will informally refer to these
      as big-endian and little-endian vector element numberings and
      vector layouts.
    </para>
    <para>
      This element numbering shall also be used by the <code>[]</code>
      accessor for vector elements, which some compilers provide as an
      extension of the C/C++ languages, as well as by other
      language extensions or library constructs that directly or
      indirectly refer to elements by their element number.
    </para>
    <para>
      Application programs may query the vector element ordering in
      use by testing the <code>__VEC_ELEMENT_REG_ORDER__</code> macro.
      This macro has two possible values:
    </para>
    <informaltable frame="none" rowsep="0" colsep="0">
      <tgroup cols="2">
        <colspec colname="c1" colwidth="40*" />
        <colspec colname="c2" colwidth="60*" />
        <tbody>
          <row>
            <entry>
              <para><code>__ORDER_LITTLE_ENDIAN__</code></para>
            </entry>
            <entry>
              <para>Vector elements use little-endian element ordering.</para>
            </entry>
          </row>
          <row>
            <entry>
              <para><code>__ORDER_BIG_ENDIAN__</code></para>
            </entry>
            <entry>
              <para>Vector elements use big-endian element ordering.</para>
            </entry>
          </row>
        </tbody>
      </tgroup>
    </informaltable>
    <para>
      This macro is no longer as useful as it once was.  The primary use
      case was for big-endian vector layout in little-endian
      environments, which is now deprecated as discussed in <xref
      linkend="VIPR.biendian.BELE" />.  Testing for
      <code>__BIG_ENDIAN__</code> or <code>__LITTLE_ENDIAN__</code> is
      generally equivalent.
    </para>
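    <para>
      One possible way to combine these tests is sketched below.  The
      function name is hypothetical, and the final fallback to
      <code>__BYTE_ORDER__</code> is an assumption for compilers that
      define none of the macros named above; it is not part of this
      programming model.
    </para>
    <programlisting><![CDATA[/* Return 1 if vector elements use little-endian element order,
   0 if they use big-endian element order. */
static int vector_elements_are_little_endian(void)
{
#if defined(__VEC_ELEMENT_REG_ORDER__)
  return __VEC_ELEMENT_REG_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#elif defined(__LITTLE_ENDIAN__)
  return 1;
#elif defined(__BIG_ENDIAN__)
  return 0;
#else
  /* Assumption: the compiler provides __BYTE_ORDER__ instead. */
  return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#endif
}]]></programlisting>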
    <note>
      <para>
	Remember that each element in a vector has the same representation
	in both big- and little-endian element orders.  That is, an
	<code>int</code> is always 32 bits, with the sign bit in the
	high-order position.  Programmers must be aware of this when
	programming with mixed data types, such as an instruction that
	multiplies two <code>short</code> elements to produce an
	<code>int</code> element.  Always access entire elements to
	avoid potential endianness issues.
      </para>
    </note>
  </section>

  <section>
    <title>Vector Built-In Functions</title>
    <para>
      Some of the Power SIMD hardware instructions refer, implicitly
      or explicitly, to vector element numbers.  For example, the
      <code>vspltb</code> instruction has as one of its inputs an
      index into a vector.  The element at that index position is to
      be replicated in every element of the output vector.  As
      another example, the <code>vmuleuh</code> instruction operates on
      the even-numbered elements of its input vectors.  The hardware
      instructions define these element numbers using big-endian
      element order, even when the machine is running in little-endian
      mode.  Thus, a built-in function that maps directly to the
      underlying hardware instruction, regardless of the target
      endianness, has the potential to confuse programmers on
      little-endian platforms.
    </para>
    <para>
      It is more useful to define built-in functions that map to these
      instructions to use natural element order.  That is, the
      explicit or implicit element numbers specified by such built-in
      functions should be interpreted using big-endian element order
      on a big-endian platform, and using little-endian element order
      on a little-endian platform.
    </para>
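    <para>
      As a small illustration of natural element order, the sketch
      below splats one byte with <code>vec_splat</code>.  The wrapper
      name <code>splat_second_byte</code> is hypothetical.  The index
      1 is counted from the lowest-addressed element on both BE and
      LE platforms.
    </para>
    <programlisting><![CDATA[#if defined(__ALTIVEC__)  /* compile only where vector support is enabled */
#include <altivec.h>

/* Replicate element 1 of the input, counted in the platform's
   natural element order, into all 16 elements of the result. */
static vector unsigned char splat_second_byte(vector unsigned char v)
{
  return vec_splat(v, 1);
}
#endif]]></programlisting>
    <para>
      If the input holds the bytes 0 through 15 in array order, every
      element of the result is 1 on either platform.  A compiler
      might implement this with <code>vspltb</code> using index 1 on
      a big-endian target and index 14 (that is, 15 &#8211; 1) on a
      little-endian target, although other instruction sequences are
      equally acceptable.
    </para>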
    <para>
      The descriptions of the built-in functions in <xref
      linkend="VIPR.vec-ref" /> contain notes on endian issues that
      apply to each built-in function.  Furthermore, for each built-in
      function that requires a different compiler implementation for
      big-endian than for little-endian, a sample compiler
      implementation is given for both BE and LE.  These sample
      implementations are only intended as examples; designers of a
      compiler are free to use other methods to implement the
      specified semantics.
    </para>
    <para>
      Of course, most built-in functions operate only on corresponding
      sets of elements of input vectors to produce output vectors, and
      thus are not "endian-sensitive."  A complete list of
      endian-sensitive built-in functions can be found in <xref
      linkend="VIPR.biendian.sensitive" />.
    </para>
    <section>
      <title>Extended Data Movement Functions</title>
      <para>
	The built-in functions in <xref
	linkend="VIPR.biendian.vmx-mem" /> map to AltiVec/VMX load and
	store instructions and provide access to the “auto-aligning”
	memory instructions of the VMX ISA where low-order address
	bits are discarded before performing a memory access. These
	instructions load and store data in accordance with the
	program's current endian mode, and do not need to be adapted
	by the compiler to reflect little-endian operation during code
	generation.
      </para>
      <para>
	Before the bi-endian programming model was introduced, the
	<code>vec_lvsl</code> and <code>vec_lvsr</code> intrinsics
	were supported.  These could be used in conjunction with
	<code>vec_perm</code> and VMX load and store instructions for
	unaligned access. The <code>vec_lvsl</code> and
	<code>vec_lvsr</code> interfaces are deprecated in accordance
	with the interfaces specified here. For compatibility, the
	built-in pseudo sequences published in previous VMX documents
	continue to work with little-endian data layout and the
	little-endian vector layout described in this document.
	However, the use of these sequences in new code is discouraged
	and usually results in worse performance. It is recommended
	that compilers issue a warning when these functions are used
	in little-endian environments.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
        <title>VMX Memory Access Built-In Functions</title>
        <tgroup cols="3">
          <colspec colname="c1" colwidth="15*" align="center" />
          <colspec colname="c2" colwidth="35*" align="center" />
          <colspec colname="c3" colwidth="50*" />
          <thead>
            <row>
              <entry>
                <para>
                  <emphasis role="bold">Built-in Function</emphasis>
                </para>
              </entry>
              <entry>
                <para>
                  <emphasis role="bold">Corresponding Power
                  Instructions</emphasis>
                </para>
              </entry>
              <entry align="center">
                <para>
                  <emphasis role="bold">Implementation Notes</emphasis>
                </para>
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                <para>vec_ld</para>
              </entry>
              <entry>
                <para>lvx</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>vec_lde</para>
              </entry>
              <entry>
                <para>lvebx, lvehx, lvewx</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>vec_ldl</para>
              </entry>
              <entry>
                <para>lvxl</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>vec_st</para>
              </entry>
              <entry>
                <para>stvx</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>vec_ste</para>
              </entry>
              <entry>
                <para>stvebx, stvehx, stvewx</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>vec_stl</para>
              </entry>
              <entry>
                <para>stvxl</para>
              </entry>
              <entry>
                <para>Hardware works as a function of endian mode.</para>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>
      <para>
	Instead of the deprecated <code>vec_lvsl</code> and
	<code>vec_lvsr</code> sequences, it is recommended that
	programmers use the <code>vec_xl</code> and
	<code>vec_xst</code> vector built-in functions to access
	unaligned data streams.  See the descriptions of these
	functions in <xref linkend="VIPR.vec-ref" /> for
	implementation details.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.biendian.sensitive">
	<title>Endian-Sensitive Built-In Functions</title>
	<tgroup cols="3">
          <colspec colname="c1" colwidth="15*" align="center" />
          <colspec colname="c2" colwidth="15*" align="center" />
          <colspec colname="c3" colwidth="15*" align="center" />
	  <tbody>
            <row>
              <entry>
		<para>vec_bperm</para>
              </entry>
              <entry>
		<para>vec_mergeo</para>
              </entry>
              <entry>
		<para>vec_sld</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_cipher_be</para>
              </entry>
              <entry>
		<para>vec_mfvscr</para>
              </entry>
              <entry>
		<para>vec_sldw</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_cipherlast_be</para>
              </entry>
              <entry>
		<para>vec_mule</para>
              </entry>
              <entry>
		<para>vec_sll</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_doublee</para>
              </entry>
              <entry>
		<para>vec_mulo</para>
              </entry>
              <entry>
		<para>vec_slo</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_doubleh</para>
              </entry>
              <entry>
		<para>vec_ncipher_be</para>
              </entry>
              <entry>
		<para>vec_slv</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_doublel</para>
              </entry>
              <entry>
		<para>vec_ncipherlast_be</para>
              </entry>
              <entry>
		<para>vec_splat</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_doubleo</para>
              </entry>
              <entry>
		<para>vec_pack</para>
              </entry>
              <entry>
		<para>vec_srl</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_extract</para>
              </entry>
              <entry>
		<para>vec_pack_to_short_fp32</para>
              </entry>
              <entry>
		<para>vec_sro</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_extract_fp32_from_shorth</para>
              </entry>
              <entry>
		<para>vec_packpx</para>
              </entry>
              <entry>
		<para>vec_srv</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_extract_fp32_from_shortl</para>
              </entry>
              <entry>
		<para>vec_packs</para>
              </entry>
              <entry>
		<para>vec_sum2s</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_extract_4b</para>
              </entry>
              <entry>
		<para>vec_packsu</para>
              </entry>
              <entry>
		<para>vec_sums</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_float2</para>
              </entry>
              <entry>
		<para>vec_perm</para>
              </entry>
              <entry>
		<para>vec_unpackh</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_floate</para>
              </entry>
              <entry>
		<para>vec_permxor</para>
              </entry>
              <entry>
		<para>vec_unpackl</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_floato</para>
              </entry>
              <entry>
		<para>vec_pmsum_be</para>
              </entry>
              <entry>
		<para>vec_unsigned2</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_gb</para>
              </entry>
              <entry>
		<para>vec_reve</para>
              </entry>
              <entry>
		<para>vec_unsignede</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_insert</para>
              </entry>
              <entry>
		<para>vec_sbox_be</para>
              </entry>
              <entry>
		<para>vec_unsignedo</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_insert_4b</para>
              </entry>
              <entry>
		<para>vec_shasigma_be</para>
              </entry>
              <entry>
		<para>vec_xl (ISA 2.07 only)</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_mergee</para>
              </entry>
              <entry>
		<para>vec_signed2</para>
              </entry>
              <entry>
		<para>vec_xl_be</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_mergeh</para>
              </entry>
              <entry>
		<para>vec_signede</para>
              </entry>
              <entry>
		<para>vec_xst (ISA 2.07 only)</para>
              </entry>
            </row>
            <row>
              <entry>
		<para>vec_mergel</para>
              </entry>
              <entry>
		<para>vec_signedo</para>
              </entry>
              <entry>
		<para>vec_xst_be</para>
              </entry>
            </row>
	  </tbody>
	</tgroup>
      </table>
    </section>
    <section xml:id="VIPR.biendian.BELE">
      <title>Big-Endian Vector Layout in Little-Endian Environments
      (Deprecated)</title>
      <para>
	Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
	for Power allowed compilers to optionally support big-endian
	element ordering in little-endian environments.
	This was initially deemed useful for porting certain libraries
	that assumed big-endian element ordering regardless of the
	endianness of their input streams.  In practice, this
	introduced serious compiler complexity without much utility.
	Thus this support (previously controlled by switches
	<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
	now deprecated.  Current versions of the GCC and Clang
	open-source compilers do not implement this support.
      </para>
    </section>
  </section>

  <section>
    <title>Language-Specific Vector Support for Other
    Languages</title>
    <section>
      <title>Fortran</title>
      <para>
	<xref linkend="VIPR.biendian.fortran-types" /> shows the
	correspondence between the C/C++ types described in this
	document and their Fortran equivalents. In Fortran, the
	Boolean vector data types are represented by
	<code>VECTOR(UNSIGNED(</code><emphasis>n</emphasis><code>))</code>.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.biendian.fortran-types">
        <title>Fortran Vector Data Types</title>
        <tgroup cols="2">
          <colspec colname="c1" colwidth="50*" />
          <colspec colname="c2" colwidth="50*" />
          <thead>
            <row>
              <entry align="center">
                <para>
                  <emphasis role="bold">XL Fortran Vector Type</emphasis>
                </para>
              </entry>
              <entry align="center">
                <para>
                  <emphasis role="bold">XL C/C++ Vector Type</emphasis>
                </para>
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                <para>VECTOR(INTEGER(1))</para>
              </entry>
              <entry>
                <para>vector signed char</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(INTEGER(2))</para>
              </entry>
              <entry>
                <para>vector signed short</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(INTEGER(4))</para>
              </entry>
              <entry>
                <para>vector signed int</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(INTEGER(8))</para>
              </entry>
              <entry>
                <para>vector signed long long, vector signed long<footnote
		xml:id="vlongappalling">
                  <para>The vector long types are deprecated due to their
                  ambiguity between 32-bit and 64-bit environments. The use
                  of the vector long long types is preferred.</para>
		</footnote></para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(INTEGER(16))</para>
              </entry>
              <entry>
                <para>vector signed __int128</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(UNSIGNED(1))</para>
              </entry>
              <entry>
                <para>vector unsigned char</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(UNSIGNED(2))</para>
              </entry>
              <entry>
                <para>vector unsigned short</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(UNSIGNED(4))</para>
              </entry>
              <entry>
                <para>vector unsigned int</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(UNSIGNED(8))</para>
              </entry>
              <entry>
                <para>vector unsigned long long, vector unsigned long<footnoteref
		linkend="vlongappalling" /></para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(UNSIGNED(16))</para>
              </entry>
              <entry>
                <para>vector unsigned __int128</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(REAL(4))</para>
              </entry>
              <entry>
                <para>vector float</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(REAL(8))</para>
              </entry>
              <entry>
                <para>vector double</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VECTOR(PIXEL)</para>
              </entry>
              <entry>
                <para>vector pixel</para>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>
      <para>
	Because the Fortran language does not support pointers, vector
	built-in functions that expect pointers to a base type take an
	array element reference to indicate the address of a memory
	location that is the subject of a memory access built-in
	function.
      </para>
      <para>
	Because the Fortran language does not support type casts, the
	<code>vec_convert</code> and <code>vec_concat</code> built-in
	functions shown in <xref linkend="VIPR.endian.convert" /> are
	provided to perform bit-exact type conversions between vector
	types.
      </para>
      <table frame="all" pgwide="1" xml:id="VIPR.endian.convert">
        <title>Built-In Vector Conversion Functions</title>
        <tgroup cols="2">
          <colspec colname="c1" colwidth="30*" align="center" />
          <colspec colname="c2" colwidth="70*" />
          <thead>
            <row>
              <entry>
                <para>
                  <emphasis role="bold">Group</emphasis>
                </para>
              </entry>
              <entry align="center">
                <para>
                  <emphasis role="bold">Description</emphasis>
                </para>
              </entry>
            </row>
          </thead>
          <tbody>
            <row>
              <entry>
                <para>VEC_CONCAT (ARG1, ARG2)<?linebreak?>(Fortran)</para>
                <para></para>
              </entry>
              <entry>
                <para>Purpose:</para>
                <para>Concatenates two elements to form a vector.</para>
                <para>Result value:</para>
                <para>The resulting vector consists of the two scalar elements,
                ARG1 and ARG2, assigned to elements 0 and 1 (using the
                environment’s native endian numbering), respectively.</para>
                <itemizedlist>
                  <listitem>
                    <para><emphasis role="bold">Note:  </emphasis>This function corresponds to the C/C++ vector
                    constructor (vector type){a,b}. It is provided only for
                    languages without vector constructors.</para>
                  </listitem>
                </itemizedlist>
              </entry>
            </row>
            <row>
              <entry>
                <para></para>
              </entry>
              <entry>
                <para>vector signed long long vec_concat (signed long long,
                signed long long);</para>
              </entry>
            </row>
            <row>
              <entry>
                <para></para>
              </entry>
              <entry>
                <para>vector unsigned long long vec_concat (unsigned long long,
                unsigned long long);</para>
              </entry>
            </row>
            <row>
              <entry>
                <para></para>
              </entry>
              <entry>
                <para>vector double vec_concat (double, double);</para>
              </entry>
            </row>
            <row>
              <entry>
                <para>VEC_CONVERT(V, MOLD)</para>
              </entry>
              <entry>
                <para>Purpose:</para>
                <para>Converts a vector to a vector of a given type.</para>
                <para>Class:</para>
                <para>Pure function</para>
                <para>Argument type and attributes:</para>
                <itemizedlist spacing="compact">
                  <listitem>
                    <para>V Must be an INTENT(IN) vector.</para>
                  </listitem>
                  <listitem>
                    <para>MOLD Must be an INTENT(IN) vector. If it is a
                    variable, it need not be defined.</para>
                  </listitem>
                </itemizedlist>
                <para>Result type and attributes:</para>
                <para>The result is a vector of the same type as MOLD.</para>
                <para>Result value:</para>
                <para>The result is as if it were on the left-hand side of an
                intrinsic assignment with V on the right-hand side.</para>
              </entry>
            </row>
          </tbody>
        </tgroup>
      </table>
    </section>
  </section>

  <section>
    <title>Examples and Limitations</title>
    <section xml:id="VIPR.biendian.unaligned">
      <title>Unaligned vector access</title>
      <para>
	A common programming error is to cast a pointer to a base type
	(such as <code>int</code>) to a pointer of the corresponding
	vector type (such as <code>vector int</code>), and then
	dereference the pointer.  This constitutes undefined behavior,
	because it casts a pointer with a smaller alignment
	requirement to a pointer with a larger alignment requirement.
	Compilers may not produce the code that you expect when a
	program contains undefined behavior.
      </para>
      <para>
	Thus, do not write the following:
      </para>
      <programlisting>  int a[4096];
  vector int x = *((vector int *) a);</programlisting>
      <para>
	Instead, write this:
      </para>
      <programlisting>  int a[4096];
  vector int x = vec_xl (0, a);</programlisting>
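      <para>
	The same built-in function also covers addresses that are not
	multiples of 16.  The following is only a sketch under stated
	assumptions: <code>sum4_unaligned</code> is a hypothetical
	helper, the <code>__VSX__</code> guard assumes the predefined
	macro used by GCC and Clang, and the <code>memcpy</code> branch
	is a portable stand-in so that the example also compiles on
	targets without vector support.
      </para>
      <programlisting><![CDATA[  #include <string.h>

  #if defined(__VSX__)
  #include <altivec.h>
  #endif

  /* Hypothetical helper: sum four consecutive ints starting at p.
     p only needs the alignment of int, not 16-byte alignment. */
  int sum4_unaligned (const int *p)
  {
  #if defined(__VSX__)
    __vector signed int v = vec_xl (0, p);  /* no pointer cast needed */
    return vec_extract (v, 0) + vec_extract (v, 1)
         + vec_extract (v, 2) + vec_extract (v, 3);
  #else
    int tmp[4];
    memcpy (tmp, p, sizeof tmp);            /* portable stand-in */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
  #endif
  }]]></programlisting>
      <para>
	For example, with <code>a[i]</code> set to <code>i</code>,
	calling <code>sum4_unaligned (a + 1)</code> adds elements 1
	through 4 even though <code>a + 1</code> is generally not
	16-byte aligned.
      </para>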
    </section>
    <section xml:id="VIPR.biendian.sld">
      <title>vec_sld and vec_sro are not bi-endian</title>
      <para>
	One oddity in the bi-endian vector programming model is that
	<code>vec_sld</code> has big-endian semantics for code
	compiled for both big-endian and little-endian targets.  As a
	result, any code that uses <code>vec_sld</code> without guarding
	it with a test on endianness is likely to be incorrect.
      </para>
      <para>
	At the time that the bi-endian model was being developed, it
	was discovered that existing code in several Linux packages
	was using <code>vec_sld</code> in order to perform multiplies,
	or to otherwise shift portions of base elements left.  A
	straightforward little-endian implementation of
	<code>vec_sld</code> would concatenate the two input vectors
	in reverse order and shift bytes to the right.  This would
	only give compatible results for <code>vector char</code>
	types.  Those using this intrinsic as a cheap multiply, or to
	shift bytes within larger elements, would see different
	results on little-endian versus big-endian with such an
	implementation.  Therefore it was decided that
	<code>vec_sld</code> would not have a bi-endian
	implementation.
      </para>
      <para>
	<code>vec_sro</code> is not bi-endian for similar reasons.
      </para>
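      <para>
	The "cheap multiply" usage can be illustrated with a byte-level
	model in plain C.  This is a sketch, not compiler output: the
	helper names are hypothetical, register images are written as
	16 bytes from left to right, and a little-endian register is
	assumed to hold element 0 in its rightmost word.  The same
	register-level shift is applied for both endian modes, which
	is how the big-endian semantics described above are modeled.
      </para>
      <programlisting><![CDATA[  #include <stdint.h>
  #include <string.h>

  /* Register image of four 32-bit elements, bytes left to right.
     For little endian (le != 0), element 0 is the rightmost word. */
  static void ints_to_reg (const uint32_t e[4], int le, uint8_t r[16])
  {
    for (int i = 0; i < 4; i++) {
      int w = le ? 3 - i : i;
      for (int k = 0; k < 4; k++)
        r[4 * w + k] = (uint8_t) (e[i] >> (24 - 8 * k));
    }
  }

  /* Inverse of ints_to_reg. */
  static void reg_to_ints (const uint8_t r[16], int le, uint32_t e[4])
  {
    for (int i = 0; i < 4; i++) {
      int w = le ? 3 - i : i;
      e[i] = 0;
      for (int k = 0; k < 4; k++)
        e[i] = (e[i] << 8) | r[4 * w + k];
    }
  }

  /* Hypothetical model of vec_sld (a, b, n) on int vectors:
     concatenate the register images a:b, shift left by n bytes,
     and keep the leftmost 16 bytes. */
  void model_vec_sld (const uint32_t a[4], const uint32_t b[4],
                      unsigned n, int le, uint32_t out[4])
  {
    uint8_t cat[32], r[16];
    ints_to_reg (a, le, cat);
    ints_to_reg (b, le, cat + 16);
    memcpy (r, cat + (n & 15u), 16);
    reg_to_ints (r, le, out);
  }]]></programlisting>
      <para>
	With <code>a</code> holding small values whose top byte is
	zero, <code>b</code> all zeros, and <code>n</code> equal to 1,
	this model multiplies every element by 256 for either setting
	of <code>le</code>.  When the top bytes are not zero, the byte
	shifted into the low end of each element comes from a
	different neighboring element in the two modes, so the model
	no longer gives matching results.
      </para>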
    </section>
    <section xml:id="VIPR.biendian.vperm">
      <title>Limitations on bi-endianness of vec_perm</title>
      <para>
	The <code>vec_perm</code> intrinsic is bi-endian, provided
	that it is used to reorder entire elements of the input
	vectors.
      </para>
      <para>
	To see why, consider the code generated for the following
	example:
      </para>
      <programlisting>  vector int t;
  vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
  vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
  vector char c = (vector char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
  t = vec_perm (a, b, c);</programlisting>
      <para>
	For big endian, a compiler should generate:
      </para>
      <programlisting>  vperm  t,a,b,c</programlisting>
      <para>
	For little endian targeting a POWER8 system, a compiler should
	generate:
      </para>
      <programlisting>  vnand  d,c,c
  vperm  t,b,a,d</programlisting>
      <para>
	For little endian targeting a POWER9 system, a compiler should
	generate:
      </para>
      <programlisting>  vpermr  t,b,a,c</programlisting>
      <para>
	Note that the <code>vpermr</code> instruction itself performs
	the modification of the permute control vector (PCV)
	<code>c</code> that the <code>vnand</code> instruction performs
	explicitly for POWER8.  Because the hardware reads only the
	bottom 5 bits of each element of the PCV, complementing the
	PCV has the effect of subtracting each of its original
	elements from 31.
      </para>
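      <para>
	The equivalence between complementing a PCV element and
	subtracting it from 31 can be checked with a few lines of
	plain C.  This is an illustrative sketch;
	<code>complemented_selector</code> is a hypothetical name.
      </para>
      <programlisting><![CDATA[  #include <stdint.h>

  /* Low 5 bits of the byte produced by "vnand d,c,c" for one PCV
     element c.  NAND of a value with itself is its complement. */
  unsigned complemented_selector (uint8_t c)
  {
    uint8_t d = (uint8_t) ~(c & c);
    return d & 31u;               /* only these bits are read */
  }]]></programlisting>
      <para>
	For every byte value <code>c</code>, the result equals
	<code>31 - (c &amp; 31)</code>; for example, 28 becomes 3 and
	0 becomes 31.
      </para>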
      <para>
	Note also that the PCV <code>c</code> has element values that
	are contiguous in groups of 4.  This selects entire elements
	from the input vectors <code>a</code> and <code>b</code> to
	reorder.  Thus the intent of the code is to select the first
	integer element of <code>a</code>, the last integer element of
	<code>b</code>, the last integer element of <code>a</code>,
	and the second integer element of <code>b</code>, in that
	order.
      </para>
      <para>
	The big-endian result is {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f,
	0x14151617}, as shown here:
      </para>
      <informaltable frame="all">
	<tgroup cols="17">
          <colspec colname="c1" colwidth="1*" />
          <colspec colname="c2" colwidth="1*" />
          <colspec colname="c3" colwidth="1*" />
          <colspec colname="c4" colwidth="1*" />
          <colspec colname="c5" colwidth="1*" />
          <colspec colname="c6" colwidth="1*" />
          <colspec colname="c7" colwidth="1*" />
          <colspec colname="c8" colwidth="1*" />
          <colspec colname="c9" colwidth="1*" />
          <colspec colname="c10" colwidth="1*" />
          <colspec colname="c11" colwidth="1*" />
          <colspec colname="c12" colwidth="1*" />
          <colspec colname="c13" colwidth="1*" />
          <colspec colname="c14" colwidth="1*" />
          <colspec colname="c15" colwidth="1*" />
          <colspec colname="c16" colwidth="1*" />
          <colspec colname="c17" colwidth="1*" />
          <tbody>
            <row>
              <entry align="center">
		<para><emphasis role="bold">a</emphasis></para>
              </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>08</para>
	      </entry>
	      <entry align="center">
		<para>09</para>
	      </entry>
	      <entry align="center">
		<para>0A</para>
	      </entry>
	      <entry align="center">
		<para>0B</para>
	      </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">b</emphasis></para>
              </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>1A</para>
	      </entry>
	      <entry align="center">
		<para>1B</para>
	      </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">c</emphasis></para>
              </entry>
	      <entry align="center">
		<para>0</para>
	      </entry>
	      <entry align="center">
		<para>1</para>
	      </entry>
	      <entry align="center">
		<para>2</para>
	      </entry>
	      <entry align="center">
		<para>3</para>
	      </entry>
	      <entry align="center">
		<para>28</para>
	      </entry>
	      <entry align="center">
		<para>29</para>
	      </entry>
	      <entry align="center">
		<para>30</para>
	      </entry>
	      <entry align="center">
		<para>31</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>20</para>
	      </entry>
	      <entry align="center">
		<para>21</para>
	      </entry>
	      <entry align="center">
		<para>22</para>
	      </entry>
	      <entry align="center">
		<para>23</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">t</emphasis></para>
              </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </informaltable>
      <para>
	For little endian, each element of the PCV is subtracted from
	31, giving {31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
	Since the elements appear in reverse order in a register when
	loaded from little-endian memory, the elements appear in the
	register from left to right as
	{8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}.  So the following
	<code>vperm</code> instruction will again select entire
	elements using the groups of 4 contiguous bytes, and the
	values of the integers will be reordered without compromising
	each integer's contents.  The little-endian result matches the
	big-endian result, as shown.  Observe that <emphasis
	role="bold">a</emphasis> and <emphasis
	role="bold">b</emphasis> switch positions for little-endian
	code generation.
      </para>
      <informaltable frame="all">
	<tgroup cols="17">
          <colspec colname="c1" colwidth="1*" />
          <colspec colname="c2" colwidth="1*" />
          <colspec colname="c3" colwidth="1*" />
          <colspec colname="c4" colwidth="1*" />
          <colspec colname="c5" colwidth="1*" />
          <colspec colname="c6" colwidth="1*" />
          <colspec colname="c7" colwidth="1*" />
          <colspec colname="c8" colwidth="1*" />
          <colspec colname="c9" colwidth="1*" />
          <colspec colname="c10" colwidth="1*" />
          <colspec colname="c11" colwidth="1*" />
          <colspec colname="c12" colwidth="1*" />
          <colspec colname="c13" colwidth="1*" />
          <colspec colname="c14" colwidth="1*" />
          <colspec colname="c15" colwidth="1*" />
          <colspec colname="c16" colwidth="1*" />
          <colspec colname="c17" colwidth="1*" />
          <tbody>
            <row>
              <entry align="center">
		<para><emphasis role="bold">b</emphasis></para>
              </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>1A</para>
	      </entry>
	      <entry align="center">
		<para>1B</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">a</emphasis></para>
              </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	      <entry align="center">
		<para>08</para>
	      </entry>
	      <entry align="center">
		<para>09</para>
	      </entry>
	      <entry align="center">
		<para>0A</para>
	      </entry>
	      <entry align="center">
		<para>0B</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">c</emphasis></para>
              </entry>
	      <entry align="center">
		<para>8</para>
	      </entry>
	      <entry align="center">
		<para>9</para>
	      </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>0</para>
	      </entry>
	      <entry align="center">
		<para>1</para>
	      </entry>
	      <entry align="center">
		<para>2</para>
	      </entry>
	      <entry align="center">
		<para>3</para>
	      </entry>
	      <entry align="center">
		<para>28</para>
	      </entry>
	      <entry align="center">
		<para>29</para>
	      </entry>
	      <entry align="center">
		<para>30</para>
	      </entry>
	      <entry align="center">
		<para>31</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">t</emphasis></para>
              </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </informaltable>
      <para>
	Now, suppose instead that the original PCV does not reorder
	entire integers at once:
      </para>
      <programlisting>  vector char c = (vector char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
      <para>
	The result of the big-endian implementation would be:
      </para>
      <programlisting>  t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
      <informaltable frame="all">
	<tgroup cols="17">
          <colspec colname="c1" colwidth="1*" />
          <colspec colname="c2" colwidth="1*" />
          <colspec colname="c3" colwidth="1*" />
          <colspec colname="c4" colwidth="1*" />
          <colspec colname="c5" colwidth="1*" />
          <colspec colname="c6" colwidth="1*" />
          <colspec colname="c7" colwidth="1*" />
          <colspec colname="c8" colwidth="1*" />
          <colspec colname="c9" colwidth="1*" />
          <colspec colname="c10" colwidth="1*" />
          <colspec colname="c11" colwidth="1*" />
          <colspec colname="c12" colwidth="1*" />
          <colspec colname="c13" colwidth="1*" />
          <colspec colname="c14" colwidth="1*" />
          <colspec colname="c15" colwidth="1*" />
          <colspec colname="c16" colwidth="1*" />
          <colspec colname="c17" colwidth="1*" />
          <tbody>
            <row>
              <entry align="center">
		<para><emphasis role="bold">a</emphasis></para>
              </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>08</para>
	      </entry>
	      <entry align="center">
		<para>09</para>
	      </entry>
	      <entry align="center">
		<para>0A</para>
	      </entry>
	      <entry align="center">
		<para>0B</para>
	      </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">b</emphasis></para>
              </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>1A</para>
	      </entry>
	      <entry align="center">
		<para>1B</para>
	      </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">c</emphasis></para>
              </entry>
	      <entry align="center">
		<para>0</para>
	      </entry>
	      <entry align="center">
		<para>20</para>
	      </entry>
	      <entry align="center">
		<para>31</para>
	      </entry>
	      <entry align="center">
		<para>4</para>
	      </entry>
	      <entry align="center">
		<para>7</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>6</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>30</para>
	      </entry>
	      <entry align="center">
		<para>3</para>
	      </entry>
	      <entry align="center">
		<para>2</para>
	      </entry>
	      <entry align="center">
		<para>8</para>
	      </entry>
	      <entry align="center">
		<para>9</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	      <entry align="center">
		<para>5</para>
	      </entry>
	      <entry align="center">
		<para>22</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">t</emphasis></para>
              </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>08</para>
	      </entry>
	      <entry align="center">
		<para>09</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </informaltable>
      <para>
	For little-endian, the modified PCV would be
	{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
	the register as
	{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}.  The final
	little-endian result would be
      </para>
      <programlisting>  t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
      <para>
	which bears no resemblance to the big-endian result.
      </para>
      <informaltable frame="all">
	<tgroup cols="17">
          <colspec colname="c1" colwidth="1*" />
          <colspec colname="c2" colwidth="1*" />
          <colspec colname="c3" colwidth="1*" />
          <colspec colname="c4" colwidth="1*" />
          <colspec colname="c5" colwidth="1*" />
          <colspec colname="c6" colwidth="1*" />
          <colspec colname="c7" colwidth="1*" />
          <colspec colname="c8" colwidth="1*" />
          <colspec colname="c9" colwidth="1*" />
          <colspec colname="c10" colwidth="1*" />
          <colspec colname="c11" colwidth="1*" />
          <colspec colname="c12" colwidth="1*" />
          <colspec colname="c13" colwidth="1*" />
          <colspec colname="c14" colwidth="1*" />
          <colspec colname="c15" colwidth="1*" />
          <colspec colname="c16" colwidth="1*" />
          <colspec colname="c17" colwidth="1*" />
          <tbody>
            <row>
              <entry align="center">
		<para><emphasis role="bold">b</emphasis></para>
              </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>1E</para>
	      </entry>
	      <entry align="center">
		<para>1F</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>19</para>
	      </entry>
	      <entry align="center">
		<para>1A</para>
	      </entry>
	      <entry align="center">
		<para>1B</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>16</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>13</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">a</emphasis></para>
              </entry>
	      <entry align="center">
		<para>0C</para>
	      </entry>
	      <entry align="center">
		<para>0D</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0F</para>
	      </entry>
	      <entry align="center">
		<para>08</para>
	      </entry>
	      <entry align="center">
		<para>09</para>
	      </entry>
	      <entry align="center">
		<para>0A</para>
	      </entry>
	      <entry align="center">
		<para>0B</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>02</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">c</emphasis></para>
              </entry>
	      <entry align="center">
		<para>9</para>
	      </entry>
	      <entry align="center">
		<para>26</para>
	      </entry>
	      <entry align="center">
		<para>18</para>
	      </entry>
	      <entry align="center">
		<para>22</para>
	      </entry>
	      <entry align="center">
		<para>23</para>
	      </entry>
	      <entry align="center">
		<para>29</para>
	      </entry>
	      <entry align="center">
		<para>28</para>
	      </entry>
	      <entry align="center">
		<para>1</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>25</para>
	      </entry>
	      <entry align="center">
		<para>14</para>
	      </entry>
	      <entry align="center">
		<para>24</para>
	      </entry>
	      <entry align="center">
		<para>27</para>
	      </entry>
	      <entry align="center">
		<para>0</para>
	      </entry>
	      <entry align="center">
		<para>11</para>
	      </entry>
	      <entry align="center">
		<para>31</para>
	      </entry>
	    </row>
            <row>
              <entry align="center">
		<para><emphasis role="bold">t</emphasis></para>
              </entry>
	      <entry align="center">
		<para>15</para>
	      </entry>
	      <entry align="center">
		<para>06</para>
	      </entry>
	      <entry align="center">
		<para>0E</para>
	      </entry>
	      <entry align="center">
		<para>0A</para>
	      </entry>
	      <entry align="center">
		<para>0B</para>
	      </entry>
	      <entry align="center">
		<para>01</para>
	      </entry>
	      <entry align="center">
		<para>00</para>
	      </entry>
	      <entry align="center">
		<para>1D</para>
	      </entry>
	      <entry align="center">
		<para>10</para>
	      </entry>
	      <entry align="center">
		<para>05</para>
	      </entry>
	      <entry align="center">
		<para>12</para>
	      </entry>
	      <entry align="center">
		<para>04</para>
	      </entry>
	      <entry align="center">
		<para>07</para>
	      </entry>
	      <entry align="center">
		<para>1C</para>
	      </entry>
	      <entry align="center">
		<para>17</para>
	      </entry>
	      <entry align="center">
		<para>03</para>
	      </entry>
	    </row>
	  </tbody>
	</tgroup>
      </informaltable>
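      <para>
	The byte movement in the two tables above can be cross-checked
	with a small scalar model.  The following is only a sketch in
	portable C, not a description of how any compiler implements
	<code>vec_perm</code>.  The names <code>vperm_model</code>,
	<code>int_image</code>, <code>int_elements</code>, and
	<code>vec_perm_model</code> are hypothetical, and the
	little-endian path assumes the swapped-input, modified-PCV
	sequence described in this section.
      </para>
      <programlisting language="c"><![CDATA[
```c
#include <stdint.h>

/* Register images are stored left to right, most significant byte first,
   matching the rows of the tables above. */

/* Model of the vperm instruction: result byte i is byte (rc[i] mod 32)
   of the 32-byte concatenation ra || rb. */
static void vperm_model(uint8_t rt[16], const uint8_t ra[16],
                        const uint8_t rb[16], const uint8_t rc[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = rc[i] & 31u;
        rt[i] = (idx < 16u) ? ra[idx] : rb[idx - 16u];
    }
}

/* Register image of a vector int: element 0 is leftmost in big-endian
   element order and rightmost in little-endian element order. */
static void int_image(uint8_t reg[16], const uint32_t v[4], int little_endian)
{
    for (int e = 0; e < 4; e++) {
        int slot = little_endian ? 3 - e : e;
        for (int k = 0; k < 4; k++)
            reg[4 * slot + k] = (uint8_t)(v[e] >> (24 - 8 * k));
    }
}

/* Inverse of int_image: recover the four int elements from a register. */
static void int_elements(uint32_t v[4], const uint8_t reg[16], int little_endian)
{
    for (int e = 0; e < 4; e++) {
        int slot = little_endian ? 3 - e : e;
        v[e] = 0;
        for (int k = 0; k < 4; k++)
            v[e] = (v[e] << 8) | reg[4 * slot + k];
    }
}

/* Hypothetical model of t = vec_perm(a, b, c) for vector int a and b. */
void vec_perm_model(uint32_t t[4], const uint32_t a[4], const uint32_t b[4],
                    const uint8_t c[16], int little_endian)
{
    uint8_t ra[16], rb[16], rc[16], rt[16];

    int_image(ra, a, little_endian);
    int_image(rb, b, little_endian);
    if (little_endian) {
        /* Modified PCV 31 - c[i]; char element i sits at register byte
           15 - i, and the two inputs are swapped. */
        for (int i = 0; i < 16; i++)
            rc[15 - i] = (uint8_t)(31u - (c[i] & 31u));
        vperm_model(rt, rb, ra, rc);
    } else {
        for (int i = 0; i < 16; i++)
            rc[i] = c[i];
        vperm_model(rt, ra, rb, rc);
    }
    int_elements(t, rt, little_endian);
}
```
]]></programlisting>
      <para>
	Calling <code>vec_perm_model</code> with the
	<code>a</code>, <code>b</code>, and byte-scrambling
	<code>c</code> of this example yields the big-endian result
	when <code>little_endian</code> is 0 and the little-endian
	result when it is 1.  A PCV that moves only whole integers
	yields the same integers in both modes.
      </para>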
      <para>
	The lesson here is to use <code>vec_perm</code> only to
	reorder entire elements of a vector.  If you must use
	<code>vec_perm</code> for another purpose, your code must
	include a test for endianness and provide separate algorithms
	for big- and little-endian targets.  Examples of this may be
	seen in the Power Vector Library project (see <xref
	linkend="VIPR.intro.links" />).
      </para>
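      <para>
	One possible shape for such an endianness test is sketched
	below.  It is an illustration only and is not taken from the
	Power Vector Library.  To stay portable it replaces
	<code>vec_perm</code> with a hypothetical scalar stand-in,
	<code>perm_in_memory_order</code>, and the helper name
	<code>scramble_bytes</code> is likewise an assumption.  The
	test relies on the <code>__BYTE_ORDER__</code> and
	<code>__ORDER_LITTLE_ENDIAN__</code> macros predefined by GCC
	and Clang.  The little-endian PCV is derived here on the
	assumption that both inputs hold 4-byte elements.
      </para>
      <programlisting language="c"><![CDATA[
```c
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for vec_perm, working on memory images: result byte i
   is byte c[i] of the 32 bytes formed by a followed by b. */
static void perm_in_memory_order(uint8_t t[16], const uint8_t a[16],
                                 const uint8_t b[16], const uint8_t c[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 31u;
        t[i] = (idx < 16u) ? a[idx] : b[idx - 16u];
    }
}

/* Hypothetical helper: apply the byte-scrambling PCV from the example and
   produce the same four integers on either endianness. */
void scramble_bytes(uint32_t t[4], const uint32_t a[4], const uint32_t b[4])
{
    /* PCV written for big-endian byte numbering. */
    static const uint8_t c_be[16] =
        {0, 20, 31, 4, 7, 17, 6, 19, 30, 3, 2, 8, 9, 13, 5, 22};
    uint8_t c[16], am[16], bm[16], tm[16];

    for (int i = 0; i < 16; i++) {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
        /* Little-endian: the bytes of each 4-byte element are stored in
           reverse order, so mirror both the result position and the
           source index within their words. */
        c[i] = (uint8_t)(c_be[i ^ 3] ^ 3);
#else
        /* Big-endian, or byte order unknown (assumed big-endian here). */
        c[i] = c_be[i];
#endif
    }

    memcpy(am, a, sizeof am);
    memcpy(bm, b, sizeof bm);
    perm_in_memory_order(tm, am, bm, c);
    memcpy(t, tm, sizeof tm);
}
```
]]></programlisting>
      <para>
	With the <code>a</code> and <code>b</code> values of this
	example, the sketch produces the integers of the big-endian
	result in either byte order.  In real vector code the same
	conditional would select between two constant PCVs passed to
	<code>vec_perm</code>.
      </para>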
    </section>
  </section>

</chapter>
-												Create outline for front matter chapters.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>

											
										
										
											6 years ago
+								<!--
 								  Copyright (c) 2019 OpenPOWER Foundation
 								  Licensed under the Apache License, Version 2.0 (the "License");
 								  you may not use this file except in compliance with the License.
 								  You may obtain a copy of the License at
 								    http://www.apache.org/licenses/LICENSE-2.0
 								  Unless required by applicable law or agreed to in writing, software
 								  distributed under the License is distributed on an "AS IS" BASIS,
 								  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 								  See the License for the specific language governing permissions and
 								  limitations under the License.
 								-->
 								<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
-												More work on history.

											
										
										
											6 years ago
+								xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.biendian">
-												Create outline for front matter chapters.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>

											
										
										
											6 years ago
 								  <!-- Chapter Title goes here. -->
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								  <title>The Power Bi-Endian Vector Programming Model</title>
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
 								  <para>
 								    To ensure portability of applications optimized to exploit the
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    SIMD functions of Power ISA processors, this reference defines a
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    set of functions and data types for SIMD programming.  Compliant
 								    compilers will provide suitable support for these functions,
 								    preferably as built-in functions that translate to one or more
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    Power ISA instructions.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								  </para>
 								  <para>
 								    Compilers are encouraged, but not required, to provide built-in
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    functions to access individual instructions in the IBM Power®
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    instruction set architecture. In most cases, each such built-in
 								    function should provide direct access to the underlying
 								    instruction.
 								  </para>
 								  <para>
 								    However, to ease porting between little-endian (LE) and big-endian
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    (BE) Power systems, and between Power and other platforms, it is
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    preferable that some built-in functions provide the same semantics
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    on both LE and BE Power systems, even if this means that the
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    built-in functions are implemented with different instruction
 								    sequences for LE and BE. To achieve this, vector built-in
 								    functions provide a set of functions derived from the set of
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    hardware functions provided by the Power SIMD instructions. Unlike
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    traditional “hardware intrinsic” built-in functions, no fixed
 								    mapping exists between these built-in functions and the generated
 								    hardware instruction sequence. Rather, the compiler is free to
 								    generate optimized instruction sequences that implement the
 								    semantics of the program specified by the programmer using these
 								    built-in functions.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								  </para>
 								  <para>
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    As we've seen, the Power SIMD instructions operate on groups of 1,
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+, 4, 8, or 16 vector elements at a time in 128-bit registers. On
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    a big-endian Power platform, vector elements are loaded from
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    memory into a register so that the 0th element occupies the
 								    high-order bits of the register, and the (N &#8211; 1)th element
 								    occupies the low-order bits of the register. This is referred to
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								    as big-endian element order. On a little-endian Power platform,
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    vector elements are loaded from memory such that the 0th element
 								    occupies the low-order bits of the register, and the (N &#8211;
 )th element occupies the high-order bits. This is referred to as
 								    little-endian element order.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								  </para>
-												Create outline for front matter chapters.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>

											
										
										
											6 years ago
-												Completed incorporating portions of Chapter 6 from the ELFv2 ABI.

											
										
										
											6 years ago
+								  <note>
 								    <para>
 								      Much of the information in this chapter was formerly part of
-												Changed to consistently use Power versus POWER, Power ISA versus
PowerISA, etc.  Added graphic to vec_gb.

											
										
										
											5 years ago
+								      Chapter 6 of the 64-Bit ELF V2 ABI Specification for Power.
-												Completed incorporating portions of Chapter 6 from the ELFv2 ABI.

											
										
										
											6 years ago
+								    </para>
 								  </note>
-												Create outline for front matter chapters.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>

											
										
										
											6 years ago
+								  <section>
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    <title>Language Elements</title>
 								    <para>
 								      The C and C++ languages are extended to use new identifiers
 								      <code>vector</code>, <code>pixel</code>, <code>bool</code>,
 								      <code>__vector</code>, <code>__pixel</code>, and
 								      <code>__bool</code>.  These keywords are used to specify vector
 								      data types (<xref linkend="VIPR.ch-data-types" />).  Because
-												Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

											
										
										
											5 years ago
+								      these identifiers may conflict with keywords in more recent
 								      language standards for C and C++, compilers may implement these
 								      in one of two ways.
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    </para>
 								    <itemizedlist>
 								      <listitem>
 									<para>
 									  <code>__vector</code>, <code>__pixel</code>,
 									  <code>__bool</code>, and <code>bool</code> are defined as
 									  keywords, with <code>vector</code> and <code>pixel</code> as
 									  predefined macros that expand to <code>__vector</code> and
 									  <code>__pixel</code>, respectively.
 									</para>
 								      </listitem>
 								      <listitem>
 									<para>
 									  <code>__vector</code>, <code>__pixel</code>, and
 									  <code>__bool</code> are defined as keywords in all contexts,
 									  while <code>vector</code>, <code>pixel</code>, and
 									  <code>bool</code> are treated as keywords only within the
 									  context of a type declaration.
 									</para>
 								      </listitem>
 								    </itemizedlist>
-												Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

											
										
										
											5 years ago
+								    <para>
 								      As a motivating example, the <emphasis
 								      role="bold">vector</emphasis> token is used as a type in the
 								      C++ Standard Template Library, and hence cannot be used as an
 								      unrestricted keyword, but can be used in the context-sensitive
 								      implementation.  For example, <emphasis role="bold">vector
 								      char</emphasis> is distinct from <emphasis
 								      role="bold">std::vector</emphasis> in the context-sensitive
 								      implementation.
 								    </para>
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    <para>
 								      Vector literals may be specified using a type cast and a set of
 								      literal initializers in parentheses or braces.  For example,
 								    </para>
 								    <programlisting>vector int x = (vector int) (4, -1, 3, 6);
 								vector double g = (vector double) { 3.5, -24.6 };</programlisting>
-												Make updates for comments received so far, including issue #4 and
issue #5.  XL bug report support for Linux is still pending.

											
										
										
											5 years ago
+								    <para>
-												Fix typos.

											
										
										
											5 years ago
+								      Current C compilers do not support literals for
-												Make updates for comments received so far, including issue #4 and
issue #5.  XL bug report support for Linux is still pending.

											
										
										
											5 years ago
+								      <code>__int128</code> types.  When constructing a <code>vector
 								      __int128</code> constant from smaller literals such as
 								      <code>int</code> or <code>long long</code>, you must test for
 								      endianness and reverse the order of the smaller literals for
 								      little-endian mode.
 								    </para>
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								  </section>
 								  <section xml:id="VIPR.ch-data-types">
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    <title>Vector Data Types</title>
 								    <para>
 								      Languages provide support for the data types in <xref
 								      linkend="VIPR.biendian.vectypes" /> to represent vector data
 								      types stored in vector registers.
 								    </para>
 								    <para>
 								      For the C and C++ programming languages (and related/derived
-												Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

											
										
										
											5 years ago
+								      languages), the "Power SIMD C Types" listed in the leftmost
 								      column of <xref linkend="VIPR.biendian.vectypes" /> may be used
 								      when Power SIMD language extensions are enabled.  Either
 								      <code>vector</code> or <code>__vector</code> may be used in the
 								      type name.  Note that the ELFv2 ABI for Power also includes a
 								      <code>vector _Float16</code> data type.  As of this writing, no
 								      current compilers for Power have implemented such a type.  This
 								      document does not include that type or any intrinsics related to
 								      it.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    </para>
 								    <para>
-												Completed incorporating portions of Chapter 6 from the ELFv2 ABI.

											
										
										
											6 years ago
+								      For the Fortran language, <xref
 								      linkend="VIPR.biendian.fortran-types" /> gives a correspondence
 								      between Fortran and C/C++ language types.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    </para>
 								    <para>
 								      The assignment operator always performs a byte-by-byte data copy
 								      for vector data types.
 								    </para>
 								    <para>
 								      Like other C/C++ language types, vector types may be defined to
 								      have const or volatile properties. Vector data types can be
 								      defined as being in static, auto, and register storage.
 								    </para>
 								    <para>
 								      Pointers to vector types are defined like pointers of other
 								      C/C++ types. Pointers to vector objects may be defined to have
 								      const and volatile properties.  Pointers to vector objects must
-												Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

											
										
										
											5 years ago
+								      be addresses divisible by 16, as vector objects are always
 								      aligned on quadword (16-byte, or 128-bit) boundaries.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    </para>
 								    <para>
 								      The preferred way to access vectors at an application-defined
 								      address is by using vector pointers and the C/C++ dereference
 								      operator <code>*</code>. Similar to other C/C++ data types, the
 								      array reference operator <code>[]</code> may be used to access
 								      vector objects with a vector pointer with the usual definition
-												Consistency of case for N-1 and Nth.

											
										
										
											5 years ago
+								      to access the <emphasis>N</emphasis>th vector element from a
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								      vector pointer. The dereference operator <code>*</code> may
 								      <emphasis>not</emphasis> be used to access data that is not
 								      aligned at least to a quadword boundary.  Built-in functions
 								      such as <code>vec_xl</code> and <code>vec_xst</code> are
-												Resolve a number of comments from Paul Clarke, and one from Steve Munroe.

											
										
										
											5 years ago
+								      provided for unaligned data access.  Please refer to <xref
 								      linkend="VIPR.biendian.unaligned" /> for an example.
-												Begin converting Chapter 6 of ELFv2 ABI document.

											
										
										
											6 years ago
+								    </para>
-												Significant updates to chapters 1-3.  Delete old outline file.

											
										
										
											5 years ago
+								    <para>
 								      One vector type may be cast to another vector type without
 								      restriction.  Such a cast is simply a reinterpretation of the
 								      bits, and does not change the data.
 								    </para>
 								    <para>
 								      Compilers are expected to recognize sequences of operations
 								      that can be combined into a single hardware
 								      instruction. For example, a load-and-splat hardware instruction
 								      (such as <emphasis role="bold">lxvdsx</emphasis>)
 								      might be generated for the following sequence:
 								    </para>
 								    <programlisting>double *double_ptr;
 								register vector double vd = vec_splats(*double_ptr);</programlisting>
 								    <table frame="all" pgwide="1" xml:id="VIPR.biendian.vectypes">
 								      <title>Vector Types</title>
 								      <tgroup cols="4">
 								        <colspec colname="c1" colwidth="20*" />
 								        <colspec colname="c2" colwidth="10*" align="center" />
 								        <colspec colname="c3" colwidth="15*" align="center" />
 								        <colspec colname="c4" colwidth="40*" />
 								        <thead>
 								          <row>
 								            <entry align="center">
 								              <para>
 								                <emphasis role="bold">Power SIMD C Types</emphasis>
 								              </para>
 								            </entry>
 								            <entry align="center">
 								              <para>
 								                <emphasis role="bold">sizeof</emphasis>
 								              </para>
 								            </entry>
 								            <entry align="center">
 								              <para>
 								                <emphasis role="bold">Alignment</emphasis>
 								              </para>
 								            </entry>
 								            <entry align="center">
 								              <para>
 								                <emphasis role="bold">Description</emphasis>
 								              </para>
 								            </entry>
 								          </row>
 								        </thead>
 								        <tbody>
 								          <row>
 								            <entry>
 								              <para>vector unsigned char</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 16 unsigned bytes.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector signed char</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 16 signed bytes.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector bool char</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 16 bytes with a value of either 0 or
 2<superscript>8</superscript> &#8211; 1.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector unsigned short</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 8 unsigned halfwords.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector signed short</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 8 signed halfwords.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector bool short</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 8 halfwords with a value of either 0 or
 2<superscript>16</superscript> &#8211; 1.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector pixel</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 8 halfwords, each interpreted as a 1-bit
 									      channel and three 5-bit channels.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector unsigned int</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 4 unsigned words.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector signed int</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 4 signed words.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector bool int</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 4 words with a value of either 0 or
 2<superscript>32</superscript> &#8211; 1.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector unsigned long<footnote xml:id="vlong">
 								              <para>The vector long types are deprecated due to their
 								              ambiguity between 32-bit and 64-bit environments. The use
 								              of the vector long long types is preferred.</para>
 								              </footnote></para>
 								              <para>vector unsigned long long</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 2 unsigned doublewords.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector signed long<footnoteref linkend="vlong" /></para>
 								              <para>vector signed long long</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 2 signed doublewords.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector bool long<footnoteref linkend="vlong" /></para>
 								              <para>vector bool long long</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 2 doublewords with a value of either 0 or
 2<superscript>64</superscript> &#8211; 1.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector unsigned __int128</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 1 unsigned quadword.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector signed __int128</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 1 signed quadword.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector float</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 4 single-precision floats.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>vector double</para>
 								            </entry>
 								            <entry>
 								              <para>16</para>
 								            </entry>
 								            <entry>
 								              <para>Quadword</para>
 								            </entry>
 								            <entry>
 								              <para>Vector of 2 double-precision floats.</para>
 								            </entry>
 								          </row>
 								        </tbody>
 								      </tgroup>
 								    </table>
 								  </section>
 								  <section>
 								    <title>Vector Operators</title>
 								    <para>
 								      In addition to the dereference and assignment operators, the
 								      Power Bi-Endian Vector Programming Model provides the usual
 								      operators that are valid on pointers; these operators are also
 								      valid for pointers to vector types.
 								    </para>
 								    <para>
 								      The traditional C/C++ operators are defined on vector types
 								      for unary and binary <code>+</code>,
 								      unary and binary <code>-</code>, binary <code>*</code>, binary
 								      <code>%</code>, and binary <code>/</code>, as well as the unary
 								      and binary shift, logical and comparison operators, and the
 								      ternary <code>?:</code> operator.  These operators perform their
 								      operations "elementwise" on the base elements of the operands,
 								      as follows.
 								    </para>
 								    <para>
 								      For unary operators, the specified operation is performed on
 								      each base element of the single operand to derive the result
 								      value placed into the corresponding element of the vector
 								      result. The result type of unary operations is the type of the
 								      single operand.  For example,
 								    </para>
 								    <programlisting>vector signed int a, b;
 								a = -b;</programlisting>
 								    <para>
 								      produces the same result as
 								    </para>
 								    <programlisting>vector signed int a, b;
 								a = vec_neg (b);</programlisting>
 								    <para>
 								      For binary operators, the specified operation is performed on
 								      corresponding base elements of both operands to derive the
 								      result value for each vector element of the vector result. Both
 								      operands of the binary operators must have the same vector type
 								      with the same base element type. The result of binary operators
 								      is the same type as the type of the operands.  For example,
 								    </para>
 								    <programlisting>vector signed int a, b;
 								a = a + b;</programlisting>
 								    <para>
 								      produces the same result as
 								    </para>
 								    <programlisting>vector signed int a, b;
 								a = vec_add (a, b);</programlisting>
 								    <para>
 								      Further, the array reference operator may be applied to vector
 								      data types, yielding an l-value corresponding to the specified
 								      element in accordance with the vector element numbering rules (see
 								      <xref linkend="VIPR.biendian.layout" />). An l-value may either
 								      be assigned a new value or accessed for reading its value.  For
 								      example,
 								    </para>
 								    <programlisting>vector signed int a;
 								signed int b, c;
 								b = a[0];
 								a[3] = c;</programlisting>
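 								    <para>
 								      As a minimal sketch of the elementwise semantics described in
 								      this section, the following functions apply a binary operator,
 								      a unary operator, and the array reference operator to a
 								      four-element integer vector.  The type <code>v4si</code> and
 								      the function names are hypothetical.  The sketch assumes the
 								      GCC/Clang generic vector extension as a stand-in for
 								      <code>vector signed int</code>, so that it also compiles for
 								      targets other than Power.
 								    </para>
 								    <programlisting><![CDATA[#include <stdint.h>

/* Stand-in for "vector signed int": four 32-bit ints in 16 bytes.
   The vector_size attribute is a GCC/Clang extension (an assumption
   of this sketch), not part of the Power vector programming model. */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Binary operator: each result element is a[i] + b[i]. */
static v4si add_elementwise (v4si a, v4si b)
{
  return a + b;
}

/* Unary operator: each result element is -b[i]. */
static v4si negate_elementwise (v4si b)
{
  return -b;
}

/* Array reference operator: read element 0, then write element 3. */
static v4si copy_first_to_last (v4si a)
{
  a[3] = a[0];
  return a;
}]]></programlisting>
 								    <para>
 								      With <code>a</code> holding the elements 1, 2, 3, 4 and
 								      <code>b</code> holding 10, 20, 30, 40,
 								      <code>add_elementwise (a, b)</code> yields the elements
 								      11, 22, 33, 44.
 								    </para>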
 								  </section>
 								  <section xml:id="VIPR.biendian.layout">
 								    <title>Vector Layout and Element Numbering</title>
 								    <para>
 								      Vector data types consist of a homogeneous sequence of elements
 								      of the base data type specified in the vector data
 								      type. Individual elements of a vector can be addressed by a
 								      vector element number.  To understand how vector elements are
 								      represented in memory and in registers, it is best to start with
 								      some simple concepts of endianness.
 								    </para>
 								    <figure pgwide="1" xml:id="scalar-endian">
 								      <title>Scalar Quantities and Endianness</title>
 								      <mediaobject>
 									<imageobject>
 									  <imagedata fileref="Scalar-endian.png" format="PNG"
 										     scalefit="1" width="100%" />
 									</imageobject>
 								      </mediaobject>
 								    </figure>
 								    <para>
 								      <xref linkend="scalar-endian" /> shows different representations
 								      of a 64-bit scalar integer with the hexadecimal value
 								      <code>0x0123456789ABCDEF</code>.  We say that the most
 								      significant byte (MSB) of this value is <code>0x01</code>, and
 								      its least significant byte (LSB) is <code>0xEF</code>.  The scalar
 								      value is stored using eight bytes of memory.  On a little-endian
 								      (LE) system, the LSB is stored at the lowest address of these
 								      eight bytes, and the MSB is stored at the highest address.  On a
 								      big-endian (BE) system, the MSB is stored at the lowest address
 								      of these eight bytes, and the LSB is stored at the highest
 								      address.  Regardless of the memory order, the register
 								      representation of the scalar value is identical; the MSB is
 								      located on the "left" end of the register, and the LSB is
 								      located on the "right" end.
 								    </para>
 								    <para>
 								      Of course, the concept of "left" and "right" is a useful
 								      fiction; there is no guarantee that the circuitry of a hardware
 								      register is laid out this way.  However, we will see, as we deal
 								      with vector elements, that the concepts of left and right are
 								      more natural for human understanding than byte and element
 								      significance.  Indeed, most programming languages have
 								      operators, such as shift-left and shift-right, that use this
 								      same terminology.
 								    </para>
 								    <para>
 								      Let's move from scalars to arrays, which are more interesting to
 								      us since we can use vector registers to operate on arrays, or
 								      portions of larger arrays.  Suppose we
 								      have an array of bytes with values 0 through 15, as shown in
 								      <xref linkend="byte-array-endian" />.  Note that each byte is a
 								      separate data element with only one possible representation in
 								      memory, so the array of bytes looks identical in memory,
 								      regardless of whether we are using a BE system or an LE system.
 								      But when we load these 16 bytes into a vector register, perhaps
 								      by using the ISA 3.0 <emphasis role="bold">lxv</emphasis>
 								      instruction, the byte at the lowest address on an LE system will
 								      be placed in the LSB of the vector register, but on a BE system
 								      will be placed in the MSB of the vector register.  Thus the
 								      array elements appear "right to left" in the register on an LE
 								      system, and "left to right" in the register on a BE system.
 								    </para>
 								    <figure pgwide="1" xml:id="byte-array-endian">
 								      <title>Byte Arrays and Endianness</title>
 								      <mediaobject>
 									<imageobject>
 									  <imagedata fileref="Byte-array-endian.png" format="PNG"
 										     scalefit="1" width="100%" />
 									</imageobject>
 								      </mediaobject>
 								    </figure>
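 								    <para>
 								      The register placement of a byte array can be modeled in
 								      portable C.  In the following sketch, copying 16 bytes into an
 								      <code>unsigned __int128</code> stands in for a vector load such
 								      as <emphasis role="bold">lxv</emphasis>.  The function names are
 								      hypothetical, and the sketch assumes a 64-bit compiler that
 								      provides the <code>__int128</code> extension.
 								    </para>
 								    <programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/* Model a 16-byte load into a 128-bit register by copying the bytes
   into an unsigned __int128 (a compiler extension, assumed here),
   then return the register's least significant byte. */
static uint8_t lsb_after_load (const uint8_t bytes[16])
{
  unsigned __int128 reg;
  memcpy (&reg, bytes, sizeof reg);
  return (uint8_t) (reg & 0xFF);
}

/* Same model, but return the register's most significant byte. */
static uint8_t msb_after_load (const uint8_t bytes[16])
{
  unsigned __int128 reg;
  memcpy (&reg, bytes, sizeof reg);
  return (uint8_t) (reg >> 120);
}]]></programlisting>
 								    <para>
 								      For an array of bytes with values 0 through 15,
 								      <code>lsb_after_load</code> returns 0 on an LE system and 15 on
 								      a BE system, while <code>msb_after_load</code> returns 15 on an
 								      LE system and 0 on a BE system.
 								    </para>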
 								    <para>
 								      Things become even more interesting when we consider arrays of
 								      larger elements.  In <xref linkend="word-array-endian" />, we
 								      see the layout of an array of four 32-bit integers, where the 0th
 								      element has hexadecimal value <code>0x00010203</code>, the 1st
 								      element has value <code>0x04050607</code>, the 2nd element has
 								      value <code>0x08090A0B</code>, and the 3rd element has value
 								      <code>0x0C0D0E0F</code>.  The order of the array elements in
 								      memory is the same for both LE and BE systems; but the layout of
 								      each element itself is reversed.  When the <emphasis
 								      role="bold">lxv</emphasis> instruction is used to load the
 								      memory into a vector register, again the low address is loaded
 								      into the LSB of the register for LE, but loaded into the MSB of
 								      the register for BE.  The effect is that the array elements
 								      again appear right-to-left on an LE system and left-to-right on a
 								      BE system.  Note that each 32-bit element of the array has its
 								      most significant bit "on the left" whether an LE or BE system is
 								      in use.  This is of course necessary for proper arithmetic to be
 								      performed on the array elements by vector instructions.
 								    </para>
 								    <figure pgwide="1" xml:id="word-array-endian">
 								      <title>Word Arrays and Endianness</title>
 								      <mediaobject>
 									<imageobject>
 									  <imagedata fileref="Word-array-endian.png" format="PNG"
 										     scalefit="1" width="100%" />
 									</imageobject>
 								      </mediaobject>
 								    </figure>
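 								    <para>
 								      The difference between element order and the layout of each
 								      element can be checked with a short sketch in portable C.  The
 								      array <code>words</code> holds the four values used in
 								      <xref linkend="word-array-endian" />; the function name
 								      <code>lowest_address_byte</code> is hypothetical.
 								    </para>
 								    <programlisting><![CDATA[#include <stdint.h>
#include <string.h>

/* The four 32-bit array elements from the word-array figure. */
static const uint32_t words[4] =
  { 0x00010203, 0x04050607, 0x08090A0B, 0x0C0D0E0F };

/* Return the byte stored at the lowest address of element i.
   The element order in memory is the same on LE and BE systems,
   but the bytes within each element are reversed. */
static uint8_t lowest_address_byte (int i)
{
  uint8_t raw[sizeof words];
  memcpy (raw, words, sizeof words);
  return raw[4 * i];
}]]></programlisting>
 								    <para>
 								      On an LE system, <code>lowest_address_byte (0)</code> returns
 								      <code>0x03</code>; on a BE system it returns <code>0x00</code>.
 								      In both cases <code>words[0]</code> still reads as
 								      <code>0x00010203</code>.
 								    </para>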
 								<!-- Element numbers can be established either
 								      by counting from the “left” of a register and assigning the
 								      left-most element the element number 0, or from the “right” of
 								      the register and assigning the right-most element the element
 								      number 0.
 								      </para>
 								      -->
 								    <para>
 								      Thus on a BE system, we number vector elements starting with 0
 								      on the left, while on an LE system, we number vector elements
 								      starting with 0 on the right.  We will informally refer to these
 								      as big-endian and little-endian vector element numberings and
 								      vector layouts.
 								    </para>
 								    <para>
 								      This element numbering shall also be used by the <code>[]</code>
 								      accessor for vector elements, which some compilers provide as an
 								      extension of the C/C++ languages, as well as by other
 								      language extensions or library constructs that directly or
 								      indirectly refer to elements by their element number.
 								    </para>
 								    <para>
 								      Application programs may query the vector element ordering in
 								      use by testing the __VEC_ELEMENT_REG_ORDER__ macro. This macro
 								      has two possible values:
 								    </para>
 								    <informaltable frame="none" rowsep="0" colsep="0">
 								      <tgroup cols="2">
 								        <colspec colname="c1" colwidth="40*" />
 								        <colspec colname="c2" colwidth="60*" />
 								        <tbody>
 								          <row>
 								            <entry>
 								              <para>__ORDER_LITTLE_ENDIAN__</para>
 								            </entry>
 								            <entry>
 								              <para>Vector elements use little-endian element ordering.</para>
 								            </entry>
 								          </row>
 								          <row>
 								            <entry>
 								              <para>__ORDER_BIG_ENDIAN__</para>
 								            </entry>
 								            <entry>
 								              <para>Vector elements use big-endian element ordering.</para>
 								            </entry>
 								          </row>
 								        </tbody>
 								      </tgroup>
 								    </informaltable>
 								    <para>
 								      This macro is no longer as useful as it once was.  The primary use
 								      case was for big-endian vector layout in little-endian
 								      environments, which is now deprecated as discussed in <xref
 								      linkend="VIPR.biendian.BELE" />.  Testing for
 								      <code>__BIG_ENDIAN__</code> or <code>__LITTLE_ENDIAN__</code>
 								      is generally equivalent.
 								    </para>
 								    <note>
 								      <para>
 									Remember that each element in a vector has the same representation
 									in both big- and little-endian element orders.  That is, an
 									<code>int</code> is always 32 bits, with the sign bit in the
 									high-order position.  Programmers must be aware of this when
 									programming with mixed data types, such as an instruction that
 									multiplies two <code>short</code> elements to produce an
 									<code>int</code> element.  Always access entire elements to
 									avoid potential endianness issues.
 								      </para>
 								    </note>
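 								    <para>
 								      As a sketch of how an application might test the element
 								      ordering, the following code checks
 								      <code>__VEC_ELEMENT_REG_ORDER__</code> when it is defined.  The
 								      names <code>ELEMENT_ORDER</code> and
 								      <code>element_zero_is_rightmost</code> are hypothetical, and
 								      the fallback to <code>__BYTE_ORDER__</code> is an assumption
 								      made so that the sketch also compiles with GCC-compatible
 								      compilers for targets other than Power.
 								    </para>
 								    <programlisting><![CDATA[/* Fall back to the compiler's byte-order macro when
   __VEC_ELEMENT_REG_ORDER__ is not defined (an assumption of this
   sketch, so that it also builds for non-Power targets). */
#if defined (__VEC_ELEMENT_REG_ORDER__)
#define ELEMENT_ORDER __VEC_ELEMENT_REG_ORDER__
#else
#define ELEMENT_ORDER __BYTE_ORDER__
#endif

/* Return 1 when vector elements use little-endian element order,
   that is, when element 0 is the right-most element of a vector
   register.  Return 0 for big-endian element order. */
static int element_zero_is_rightmost (void)
{
  return ELEMENT_ORDER == __ORDER_LITTLE_ENDIAN__;
}]]></programlisting>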
 								  </section>
 								  <section>
 								    <title>Vector Built-In Functions</title>
 								    <para>
 								      Some of the Power SIMD hardware instructions refer, implicitly
 								      or explicitly, to vector element numbers.  For example, the
 								      <code>vspltb</code> instruction has as one of its inputs an
 								      index into a vector.  The element at that index position is to
 								      be replicated in every element of the output vector.  For
 								      another example, the <code>vmuleuh</code> instruction operates on
 								      the even-numbered elements of its input vectors.  The hardware
 								      instructions define these element numbers using big-endian
 								      element order, even when the machine is running in little-endian
 								      mode.  Thus, a built-in function that maps directly to the
 								      underlying hardware instruction, regardless of the target
 								      endianness, has the potential to confuse programmers on
 								      little-endian platforms.
 								    </para>
 								    <para>
 								      It is more useful for built-in functions that map to these
 								      instructions to use natural element order.  That is, the
 								      explicit or implicit element numbers specified by such built-in
 								      functions should be interpreted using big-endian element order
 								      on a big-endian platform, and using little-endian element order
 								      on a little-endian platform.
 								    </para>
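 								    <para>
 								      As a sketch of the natural element order rule, the hypothetical
 								      helper below shows one way an element number written by the
 								      programmer could be translated into the big-endian element
 								      number that a hardware instruction expects.  It is only an
 								      illustration under these assumptions, not a description of any
 								      particular compiler's implementation.
 								    </para>
 								    <programlisting><![CDATA[/* Hypothetical helper: translate a natural element number into the
   big-endian element number used by the hardware instruction.
   nelems is the number of elements in the vector (16 for bytes,
   4 for words); little_endian is nonzero on an LE platform. */
static int hw_element_index (int natural_index, int nelems,
                             int little_endian)
{
  return little_endian ? nelems - 1 - natural_index : natural_index;
}]]></programlisting>
 								    <para>
 								      For example, a request for byte element 0 on a little-endian
 								      platform translates to hardware element 15, while on a
 								      big-endian platform the element number is unchanged.
 								    </para>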
 								    <para>
 								      The descriptions of the built-in functions in <xref
 								      linkend="VIPR.vec-ref" /> contain notes on endian issues that
 								      apply to each built-in function.  Furthermore, a built-in
 								      function requiring a different compiler implementation for
 								      big-endian than it uses for little-endian has a sample
 								      compiler implementation for both BE and LE.  These sample
 								      implementations are only intended as examples; designers of a
 								      compiler are free to use other methods to implement the
 								      specified semantics.
 								    </para>
 								    <para>
 								      Of course, most built-in functions operate only on corresponding
 								      sets of elements of input vectors to produce output vectors, and
 								      thus are not "endian-sensitive."  A complete list of
 								      endian-sensitive built-in functions can be found in <xref
 								      linkend="VIPR.biendian.sensitive" />.
 								    </para>
 								    <section>
 								      <title>Extended Data Movement Functions</title>
 								      <para>
 									The built-in functions in <xref
 									linkend="VIPR.biendian.vmx-mem" /> map to AltiVec/VMX load and
 									store instructions and provide access to the “auto-aligning”
 									memory instructions of the VMX ISA, where low-order address
 									bits are discarded before performing a memory access. These
 									instructions load and store data in accordance with the
 									program's current endian mode, and do not need to be adapted
 									by the compiler to reflect little-endian operation during code
 									generation.
 								      </para>
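 								      <para>
 								        The "auto-aligning" behavior can be illustrated with a small
 								        portable C model of the effective address used by a
 								        quadword access such as <code>lvx</code> or
 								        <code>stvx</code>: the low-order four bits of the address
 								        are cleared.  The helper name
 								        <code>vmx_quadword_address</code> is hypothetical and is used
 								        here only for illustration.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Model of the address actually accessed by a 16-byte VMX load or store:
   the low-order four address bits are discarded.  Hypothetical helper. */
static uintptr_t vmx_quadword_address(uintptr_t ea)
{
    uintptr_t aligned = ea & ~(uintptr_t)0xF;
    assert(aligned % 16 == 0);   /* result is always quadword aligned */
    return aligned;
}
```
]]></programlisting>
 								      <para>
 								        For example, in this model a request at address 0x1003 accesses
 								        the quadword at 0x1000, so the three bytes preceding the
 								        requested address are included and the last three requested
 								        bytes are not.
 								      </para>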
 								      <para>
 									Before the bi-endian programming model was introduced, the
 									<code>vec_lvsl</code> and <code>vec_lvsr</code> intrinsics
 									were supported.  These could be used in conjunction with
 									<code>vec_perm</code> and VMX load and store instructions for
 									unaligned access. The <code>vec_lvsl</code> and
 									<code>vec_lvsr</code> interfaces are deprecated in favor of
 									the interfaces specified here. For compatibility, the
 									built-in pseudo sequences published in previous VMX documents
 									continue to work with little-endian data layout and the
 									little-endian vector layout described in this document.
 									However, the use of these sequences in new code is discouraged
 									and usually results in worse performance. It is recommended
 									that compilers issue a warning when these functions are used
 									in little-endian environments.
 								      </para>
 								      <table frame="all" pgwide="1" xml:id="VIPR.biendian.vmx-mem">
 								        <title>VMX Memory Access Built-In Functions</title>
 								        <tgroup cols="3">
 								          <colspec colname="c1" colwidth="15*" align="center" />
 								          <colspec colname="c2" colwidth="35*" align="center" />
 								          <colspec colname="c3" colwidth="50*" />
 								          <thead>
 								            <row>
 								              <entry>
 								                <para>
 								                  <emphasis role="bold">Built-in Function</emphasis>
 								                </para>
 								              </entry>
 								              <entry>
 								                <para>
 								                  <emphasis role="bold">Corresponding Power
 								                  Instructions</emphasis>
 								                </para>
 								              </entry>
 								              <entry align="center">
 								                <para>
 								                  <emphasis role="bold">Implementation Notes</emphasis>
 								                </para>
 								              </entry>
 								            </row>
 								          </thead>
 								          <tbody>
 								            <row>
 								              <entry>
 								                <para>vec_ld</para>
 								              </entry>
 								              <entry>
 								                <para>lvx</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>vec_lde</para>
 								              </entry>
 								              <entry>
 								                <para>lvebx, lvehx, lvewx</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>vec_ldl</para>
 								              </entry>
 								              <entry>
 								                <para>lvxl</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>vec_st</para>
 								              </entry>
 								              <entry>
 								                <para>stvx</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>vec_ste</para>
 								              </entry>
 								              <entry>
 								                <para>stvebx, stvehx, stvewx</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>vec_stl</para>
 								              </entry>
 								              <entry>
 								                <para>stvxl</para>
 								              </entry>
 								              <entry>
 								                <para>Hardware works as a function of endian mode.</para>
 								              </entry>
 								            </row>
 								          </tbody>
 								        </tgroup>
 								      </table>
 								      <para>
 									Rather than the deprecated <code>vec_lvsl</code> and
 									<code>vec_lvsr</code> sequences, it is recommended that
 									programmers use the <code>vec_xl</code> and
 									<code>vec_xst</code> vector built-in functions to access
 									unaligned data streams.  See the descriptions of these
 									built-in functions in <xref
 									linkend="VIPR.vec-ref" /> for further description and
 									implementation details.
 								      </para>
 								      <table frame="all" pgwide="1" xml:id="VIPR.biendian.sensitive">
 									<title>Endian-Sensitive Built-In Functions</title>
 									<tgroup cols="3">
 								          <colspec colname="c1" colwidth="15*" align="center" />
 								          <colspec colname="c2" colwidth="15*" align="center" />
 								          <colspec colname="c3" colwidth="15*" align="center" />
 									  <tbody>
 								            <row>
 								              <entry>
 										<para>vec_bperm</para>
 								              </entry>
 								              <entry>
 										<para>vec_mergeo</para>
 								              </entry>
 								              <entry>
 										<para>vec_sld</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_cipher_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_mfvscr</para>
 								              </entry>
 								              <entry>
 										<para>vec_sldw</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_cipherlast_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_mule</para>
 								              </entry>
 								              <entry>
 										<para>vec_sll</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_doublee</para>
 								              </entry>
 								              <entry>
 										<para>vec_mulo</para>
 								              </entry>
 								              <entry>
 										<para>vec_slo</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_doubleh</para>
 								              </entry>
 								              <entry>
 										<para>vec_ncipher_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_slv</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_doublel</para>
 								              </entry>
 								              <entry>
 										<para>vec_ncipherlast_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_splat</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_doubleo</para>
 								              </entry>
 								              <entry>
 										<para>vec_pack</para>
 								              </entry>
 								              <entry>
 										<para>vec_srl</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_extract</para>
 								              </entry>
 								              <entry>
 										<para>vec_pack_to_short_fp32</para>
 								              </entry>
 								              <entry>
 										<para>vec_sro</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_extract_fp32_from_shorth</para>
 								              </entry>
 								              <entry>
 										<para>vec_packpx</para>
 								              </entry>
 								              <entry>
 										<para>vec_srv</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_extract_fp32_from_shortl</para>
 								              </entry>
 								              <entry>
 										<para>vec_packs</para>
 								              </entry>
 								              <entry>
 										<para>vec_sum2s</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_extract_4b</para>
 								              </entry>
 								              <entry>
 										<para>vec_packsu</para>
 								              </entry>
 								              <entry>
 										<para>vec_sums</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_float2</para>
 								              </entry>
 								              <entry>
 										<para>vec_perm</para>
 								              </entry>
 								              <entry>
 										<para>vec_unpackh</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_floate</para>
 								              </entry>
 								              <entry>
 										<para>vec_permxor</para>
 								              </entry>
 								              <entry>
 										<para>vec_unpackl</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_floato</para>
 								              </entry>
 								              <entry>
 										<para>vec_pmsum_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_unsigned2</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_gb</para>
 								              </entry>
 								              <entry>
 										<para>vec_reve</para>
 								              </entry>
 								              <entry>
 										<para>vec_unsignede</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_insert</para>
 								              </entry>
 								              <entry>
 										<para>vec_sbox_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_unsignedo</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_insert_4b</para>
 								              </entry>
 								              <entry>
 										<para>vec_shasigma_be</para>
 								              </entry>
 								              <entry>
 										<para>vec_xl (ISA 2.07 only)</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_mergee</para>
 								              </entry>
 								              <entry>
 										<para>vec_signed2</para>
 								              </entry>
 								              <entry>
 										<para>vec_xl_be</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_mergeh</para>
 								              </entry>
 								              <entry>
 										<para>vec_signede</para>
 								              </entry>
 								              <entry>
 										<para>vec_xst (ISA 2.07 only)</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 										<para>vec_mergel</para>
 								              </entry>
 								              <entry>
 										<para>vec_signedo</para>
 								              </entry>
 								              <entry>
 										<para>vec_xst_be</para>
 								              </entry>
 								            </row>
 									  </tbody>
 									</tgroup>
 								      </table>
 								    </section>
 								    <section xml:id="VIPR.biendian.BELE">
 								      <title>Big-Endian Vector Layout in Little-Endian Environments
 								      (Deprecated)</title>
 								      <para>
 									Versions 1.0 through 1.4 of the 64-Bit ELFv2 ABI Specification
 									for Power provided for optional compiler support for using
 									big-endian element ordering in little-endian environments.
 									This was initially deemed useful for porting certain libraries
 									that assumed big-endian element ordering regardless of the
 									endianness of their input streams.  In practice, this
 									introduced serious compiler complexity without much utility.
 									Thus, this support (previously controlled by switches
 									<code>-maltivec=be</code> and/or <code>-qaltivec=be</code>) is
 									now deprecated.  Current versions of the GCC and Clang
 									open-source compilers do not implement this support.
 								      </para>
 								    </section>
 								  </section>
 								  <section>
 								    <title>Language-Specific Vector Support for Other
 								    Languages</title>
 								    <section>
 								      <title>Fortran</title>
 								      <para>
 									<xref linkend="VIPR.biendian.fortran-types" /> shows the
 									correspondence between the C/C++ types described in this
 									document and their Fortran equivalents. In Fortran, the
 									Boolean vector data types are represented by
 									<code>VECTOR(UNSIGNED(</code><emphasis>n</emphasis><code>))</code>.
 								      </para>
 								      <table frame="all" pgwide="1" xml:id="VIPR.biendian.fortran-types">
 								        <title>Fortran Vector Data Types</title>
 								        <tgroup cols="2">
 								          <colspec colname="c1" colwidth="50*" />
 								          <colspec colname="c2" colwidth="50*" />
 								          <thead>
 								            <row>
 								              <entry align="center">
 								                <para>
 								                  <emphasis role="bold">XL Fortran Vector Type</emphasis>
 								                </para>
 								              </entry>
 								              <entry align="center">
 								                <para>
 								                  <emphasis role="bold">XL C/C++ Vector Type</emphasis>
 								                </para>
 								              </entry>
 								            </row>
 								          </thead>
 								          <tbody>
 								            <row>
 								              <entry>
 								                <para>VECTOR(INTEGER(1))</para>
 								              </entry>
 								              <entry>
 								                <para>vector signed char</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(INTEGER(2))</para>
 								              </entry>
 								              <entry>
 								                <para>vector signed short</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(INTEGER(4))</para>
 								              </entry>
 								              <entry>
 								                <para>vector signed int</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(INTEGER(8))</para>
 								              </entry>
 								              <entry>
 								                <para>vector signed long long, vector signed long<footnote
 										xml:id="vlongappalling">
 								                  <para>The vector long types are deprecated due to their
 								                  ambiguity between 32-bit and 64-bit environments. The use
 								                  of the vector long long types is preferred.</para>
 										</footnote></para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(INTEGER(16))</para>
 								              </entry>
 								              <entry>
 								                <para>vector signed __int128</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(UNSIGNED(1))</para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned char</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(UNSIGNED(2))</para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned short</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(UNSIGNED(4))</para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned int</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(UNSIGNED(8))</para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned long long, vector unsigned long<footnoteref
 										linkend="vlongappalling" /></para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(UNSIGNED(16))</para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned __int128</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(REAL(4))</para>
 								              </entry>
 								              <entry>
 								                <para>vector float</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(REAL(8))</para>
 								              </entry>
 								              <entry>
 								                <para>vector double</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VECTOR(PIXEL)</para>
 								              </entry>
 								              <entry>
 								                <para>vector pixel</para>
 								              </entry>
 								            </row>
 								          </tbody>
 								        </tgroup>
 								      </table>
 								      <para>
 									Because the Fortran language does not support pointers, vector
 									built-in functions that expect pointers to a base type take an
 									array element reference to indicate the address of a memory
 									location that is the subject of a memory access built-in
 									function.
 								      </para>
 								      <para>
 									Because the Fortran language does not support type casts, the
 									<code>vec_convert</code> and <code>vec_concat</code> built-in
 									functions shown in <xref linkend="VIPR.endian.convert" /> are
 									provided to perform bit-exact type conversions between vector
 									types.
 								      </para>
 								      <table frame="all" pgwide="1" xml:id="VIPR.endian.convert">
 								        <title>Built-In Vector Conversion Functions</title>
 								        <tgroup cols="2">
 								          <colspec colname="c1" colwidth="30*" align="center" />
 								          <colspec colname="c2" colwidth="70*" />
 								          <thead>
 								            <row>
 								              <entry>
 								                <para>
 								                  <emphasis role="bold">Group</emphasis>
 								                </para>
 								              </entry>
 								              <entry align="center">
 								                <para>
 								                  <emphasis role="bold">Description</emphasis>
 								                </para>
 								              </entry>
 								            </row>
 								          </thead>
 								          <tbody>
 								            <row>
 								              <entry>
 								                <para>VEC_CONCAT (ARG1, ARG2)<?linebreak?>(Fortran)</para>
 								                <para></para>
 								              </entry>
 								              <entry>
 								                <para>Purpose:</para>
 								                <para>Concatenates two elements to form a vector.</para>
 								                <para>Result value:</para>
 								                <para>The resulting vector consists of the two scalar elements,
 								                ARG1 and ARG2, assigned to elements 0 and 1 (using the
 								                environment’s native endian numbering), respectively.</para>
 								                <itemizedlist>
 								                  <listitem>
 								                    <para><emphasis role="bold">Note:  </emphasis>This function corresponds to the C/C++ vector
 								                    constructor (vector type){a,b}. It is provided only for
 								                    languages without vector constructors.</para>
 								                  </listitem>
 								                </itemizedlist>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para></para>
 								              </entry>
 								              <entry>
 								                <para>vector signed long long vec_concat (signed long long,
 								                signed long long);</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para></para>
 								              </entry>
 								              <entry>
 								                <para>vector unsigned long long vec_concat (unsigned long long,
 								                unsigned long long);</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para></para>
 								              </entry>
 								              <entry>
 								                <para>vector double vec_concat (double, double);</para>
 								              </entry>
 								            </row>
 								            <row>
 								              <entry>
 								                <para>VEC_CONVERT(V, MOLD)</para>
 								              </entry>
 								              <entry>
 								                <para>Purpose:</para>
 								                <para>Converts a vector to a vector of a given type.</para>
 								                <para>Class:</para>
 								                <para>Pure function</para>
 								                <para>Argument type and attributes:</para>
 								                <itemizedlist spacing="compact">
 								                  <listitem>
 								                    <para>V Must be an INTENT(IN) vector.</para>
 								                  </listitem>
 								                  <listitem>
 								                    <para>MOLD Must be an INTENT(IN) vector. If it is a
 								                    variable, it need not be defined.</para>
 								                  </listitem>
 								                </itemizedlist>
 								                <para>Result type and attributes:</para>
 								                <para>The result is a vector of the same type as MOLD.</para>
 								                <para>Result value:</para>
 								                <para>The result is as if it were on the left-hand side of an
 								                intrinsic assignment with V on the right-hand side.</para>
 								              </entry>
 								            </row>
 								          </tbody>
 								        </tgroup>
 								      </table>
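 								      <para>
 								        For readers more familiar with C, a bit-exact conversion of
 								        the kind <code>VEC_CONVERT</code> performs can be modeled as
 								        a byte-for-byte copy between two 16-byte objects.  The
 								        following is a portable sketch only; the
 								        <code>vec4f</code> and <code>vec4u</code> types and the
 								        <code>convert_bits</code> helper are hypothetical stand-ins
 								        for vector types, and the bit patterns mentioned afterward
 								        assume IEEE 754 single-precision <code>float</code>.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte stand-ins for two vector types. */
typedef struct { float    f[4]; } vec4f;
typedef struct { uint32_t u[4]; } vec4u;

/* Reinterpret the 16 bytes of v as four 32-bit unsigned integers.
   No numeric conversion takes place; only the type changes. */
static vec4u convert_bits(vec4f v)
{
    vec4u r;
    assert(sizeof r == sizeof v);
    memcpy(&r, &v, sizeof r);
    return r;
}
```
]]></programlisting>
 								      <para>
 								        Under the IEEE 754 assumption, converting an element holding
 								        1.0 yields the bit pattern 0x3f800000 rather than the integer
 								        value 1.
 								      </para>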
 								    </section>
 								  </section>
 								  <section>
 								    <title>Examples and Limitations</title>
 								    <section xml:id="VIPR.biendian.unaligned">
 								      <title>Unaligned vector access</title>
 								      <para>
 									A common programming error is to cast a pointer to a base type
 									(such as <code>int</code>) to a pointer of the corresponding
 									vector type (such as <code>vector int</code>), and then
 									dereference the pointer.  This constitutes undefined behavior,
 									because it casts a pointer with a smaller alignment
 									requirement to a pointer with a larger alignment requirement.
 									Compilers may not produce the code that you expect in the
 									presence of undefined behavior.
 								      </para>
 								      <para>
 									Thus, do not write the following:
 								      </para>
 								      <programlisting>  int a[4096];
 								  vector int x = *((vector int *) a);</programlisting>
 								      <para>
 									Instead, write this:
 								      </para>
 								      <programlisting>  int a[4096];
 								  vector int x = vec_xl (0, a);</programlisting>
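 								      <para>
 									The same principle can be tried on any platform with a
 									portable analogue.  The sketch below is not the
 									<code>vec_xl</code> implementation; it assumes the GCC/Clang
 									generic vector extension (<code>vector_size</code>) as a
 									stand-in for <code>vector int</code>, and the
 									<code>v4si</code> type and <code>load_unaligned</code> helper
 									are hypothetical names.  It copies the bytes instead of
 									dereferencing a cast pointer, so it makes no alignment
 									assumption about the source.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Generic 16-byte vector of four ints (GCC/Clang extension), standing in
   for 'vector int'.  Hypothetical type name. */
typedef int v4si __attribute__((vector_size(16)));

/* Load four ints starting 'offset' bytes past 'base'.  memcpy makes no
   alignment assumption, unlike dereferencing a cast pointer. */
static v4si load_unaligned(size_t offset, const int *base)
{
    v4si v;
    assert(base != NULL);
    memcpy(&v, (const char *)base + offset, sizeof v);
    return v;
}
```
]]></programlisting>
 								      <para>
 									With this helper, <code>load_unaligned (sizeof (int), a)</code>
 									returns the four elements beginning at <code>a[1]</code>, an
 									address that is generally not 16-byte aligned.
 								      </para>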
 								    </section>
 								    <section xml:id="VIPR.biendian.sld">
 								      <title>vec_sld and vec_sro are not bi-endian</title>
 								      <para>
 									One oddity in the bi-endian vector programming model is that
 									<code>vec_sld</code> has big-endian semantics for code
 									compiled for both big-endian and little-endian targets.  That
 									is, any code that uses <code>vec_sld</code> without guarding
 									it with a test on endianness is likely to be incorrect.
 								      </para>
 								      <para>
 									At the time that the bi-endian model was being developed, it
 									was discovered that existing code in several Linux packages
 									was using <code>vec_sld</code> in order to perform multiplies,
 									or to otherwise shift portions of base elements left.  A
 									straightforward little-endian implementation of
 									<code>vec_sld</code> would concatenate the two input vectors
 									in reverse order and shift bytes to the right.  This would
 									only give compatible results for <code>vector char</code>
 									types.  Those using this intrinsic as a cheap multiply, or to
 									shift bytes within larger elements, would see different
 									results on little-endian versus big-endian with such an
 									implementation.  Therefore it was decided that
 									<code>vec_sld</code> would not have a bi-endian
 									implementation.
 								      </para>
 								      <para>
 									<code>vec_sro</code> is not bi-endian for similar reasons.
 								      </para>
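 								      <para>
 									The element-level difference described above for
 									<code>vec_sld</code> can be illustrated with a small scalar
 									model.  The following is only a sketch, not compiler code: it
 									assumes that "big-endian semantics" means the same
 									register-level left shift is applied on both targets, and it
 									models a register as 16 bytes with byte 0 leftmost.  The
 									names <code>ints_to_reg</code>, <code>reg_to_ints</code>,
 									<code>model_vec_sld</code>, and
 									<code>check_vec_sld_model</code> are hypothetical helpers
 									introduced for illustration.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 16-byte register image of a vector int.  image[0] is the
   leftmost (most significant) byte of the register.  On little
   endian, element 0 occupies the rightmost four bytes. */
void ints_to_reg(const uint32_t e[4], int little_endian, uint8_t image[16])
{
    for (int i = 0; i < 4; i++) {
        int word = little_endian ? 3 - i : i;
        for (int j = 0; j < 4; j++)
            image[4 * word + j] = (uint8_t)(e[i] >> (24 - 8 * j));
    }
}

/* Inverse of ints_to_reg. */
void reg_to_ints(const uint8_t image[16], int little_endian, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int word = little_endian ? 3 - i : i;
        e[i] = 0;
        for (int j = 0; j < 4; j++)
            e[i] = (e[i] << 8) | image[4 * word + j];
    }
}

/* Model of vec_sld (a, b, n) for vector int, with 0 <= n <= 15: take
   16 bytes starting n bytes from the left of the 32-byte
   concatenation of the register images of a and b.  The same
   register-level operation is used for both endiannesses. */
void model_vec_sld(const uint32_t a[4], const uint32_t b[4], unsigned n,
                   int little_endian, uint32_t out[4])
{
    uint8_t cat[32], res[16];

    ints_to_reg(a, little_endian, cat);
    ints_to_reg(b, little_endian, cat + 16);
    memcpy(res, cat + n, 16);
    reg_to_ints(res, little_endian, out);
}

/* Shifting by one whole element (4 bytes) gives different
   element-level results on the two targets. */
void check_vec_sld_model(void)
{
    const uint32_t a[4] = {0xA0, 0xA1, 0xA2, 0xA3};
    const uint32_t b[4] = {0xB0, 0xB1, 0xB2, 0xB3};
    uint32_t be[4], le[4];

    model_vec_sld(a, b, 4, 0, be);   /* {a[1], a[2], a[3], b[0]} */
    model_vec_sld(a, b, 4, 1, le);   /* {b[3], a[0], a[1], a[2]} */
    assert(be[0] == 0xA1 && be[3] == 0xB0);
    assert(le[0] == 0xB3 && le[3] == 0xA2);
}
```
]]></programlisting>
 								      <para>
 									In this model a shift of fewer than 4 bytes still moves bytes
 									leftward within each element on both targets, which is the
 									behavior the existing code relied on.  Only the source of the
 									bytes shifted in from neighboring elements differs.
 								      </para>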
 								    </section>
 								    <section xml:id="VIPR.biendian.vperm">
 								      <title>Limitations on bi-endianness of vec_perm</title>
 								      <para>
 									The <code>vec_perm</code> intrinsic is bi-endian, provided
 									that it is used to reorder entire elements of the input
 									vectors.
 								      </para>
 								      <para>
 									To see why, consider the code generated for the following
 									example:
 								      </para>
 								      <programlisting>  vector int t;
 								  vector int a = (vector int){0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
 								  vector int b = (vector int){0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
 								  vector char c = (vector char){0,1,2,3,28,29,30,31,12,13,14,15,20,21,22,23};
 								  t = vec_perm (a, b, c);</programlisting>
 								      <para>
 									For big endian, a compiler should generate:
 								      </para>
 								      <programlisting>  vperm  t,a,b,c</programlisting>
 								      <para>
 									For little endian targeting a POWER8 system, a compiler should
 									generate:
 								      </para>
 								      <programlisting>  vnand  d,c,c
 								  vperm  t,b,a,d</programlisting>
 								      <para>
 									For little endian targeting a POWER9 system, a compiler should
 									generate:
 								      </para>
 								      <programlisting>  vpermr  t,b,a,c</programlisting>
 								      <para>
 									Note that the <code>vpermr</code> instruction itself performs
 									the modification of the permute control vector (PCV)
 									<code>c</code> that the <code>vnand</code> instruction
 									performs explicitly for POWER8.  Because only the bottom 5
 									bits of each element of the PCV are read by the hardware,
 									complementing the PCV has the effect of subtracting the
 									original elements of the PCV from 31.
 								      </para>
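 								      <para>
 									The arithmetic behind that last statement can be checked
 									directly.  The sketch below is plain scalar C, not vector
 									code, and the helper names <code>pcv_index</code>,
 									<code>pcv_complement</code>, and
 									<code>check_complement_identity</code> are hypothetical.  It
 									confirms that complementing a PCV byte and keeping its low 5
 									bits gives 31 minus the original 5-bit index.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Index that the permute hardware reads from one PCV byte: the low
   5 bits only. */
uint8_t pcv_index(uint8_t pcv_byte)
{
    return (uint8_t)(pcv_byte & 0x1F);
}

/* Effect of "vnand d,c,c" on one PCV byte: a bitwise complement. */
uint8_t pcv_complement(uint8_t pcv_byte)
{
    return (uint8_t)~pcv_byte;
}

/* Verify the identity for every possible PCV byte value. */
void check_complement_identity(void)
{
    for (unsigned x = 0; x < 256; x++)
        assert(pcv_index(pcv_complement((uint8_t)x))
               == 31 - pcv_index((uint8_t)x));
}
```
]]></programlisting>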
 								      <para>
 									Note also that the PCV <code>c</code> has element values that
 									are contiguous in groups of 4.  This selects entire elements
 									from the input vectors <code>a</code> and <code>b</code> to
 									reorder.  Thus the intent of the code is to select the first
 									integer element of <code>a</code>, the last integer element of
 									<code>b</code>, the last integer element of <code>a</code>,
 									and the second integer element of <code>b</code>, in that
 									order.
 								      </para>
 								      <para>
 									The big-endian result is {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f,
 									0x14151617}, as shown here:
 								      </para>
 								      <informaltable frame="all">
 									<tgroup cols="17">
 								          <colspec colname="c1" colwidth="1*" />
 								          <colspec colname="c2" colwidth="1*" />
 								          <colspec colname="c3" colwidth="1*" />
 								          <colspec colname="c4" colwidth="1*" />
 								          <colspec colname="c5" colwidth="1*" />
 								          <colspec colname="c6" colwidth="1*" />
 								          <colspec colname="c7" colwidth="1*" />
 								          <colspec colname="c8" colwidth="1*" />
 								          <colspec colname="c9" colwidth="1*" />
 								          <colspec colname="c10" colwidth="1*" />
 								          <colspec colname="c11" colwidth="1*" />
 								          <colspec colname="c12" colwidth="1*" />
 								          <colspec colname="c13" colwidth="1*" />
 								          <colspec colname="c14" colwidth="1*" />
 								          <colspec colname="c15" colwidth="1*" />
 								          <colspec colname="c16" colwidth="1*" />
 								          <colspec colname="c17" colwidth="1*" />
 								          <tbody>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">a</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>08</para>
 									      </entry>
 									      <entry align="center">
 										<para>09</para>
 									      </entry>
 									      <entry align="center">
 										<para>0A</para>
 									      </entry>
 									      <entry align="center">
 										<para>0B</para>
 									      </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">b</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>1A</para>
 									      </entry>
 									      <entry align="center">
 										<para>1B</para>
 									      </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">c</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>0</para>
 									      </entry>
 									      <entry align="center">
 										<para>1</para>
 									      </entry>
 									      <entry align="center">
 										<para>2</para>
 									      </entry>
 									      <entry align="center">
 										<para>3</para>
 									      </entry>
 									      <entry align="center">
 										<para>28</para>
 									      </entry>
 									      <entry align="center">
 										<para>29</para>
 									      </entry>
 									      <entry align="center">
 										<para>30</para>
 									      </entry>
 									      <entry align="center">
 										<para>31</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>20</para>
 									      </entry>
 									      <entry align="center">
 										<para>21</para>
 									      </entry>
 									      <entry align="center">
 										<para>22</para>
 									      </entry>
 									      <entry align="center">
 										<para>23</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">t</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									    </row>
 									  </tbody>
 									</tgroup>
 								      </informaltable>
 								      <para>
 									For little endian, the modified PCV is obtained by subtracting
 									each element of the original PCV from 31, giving
 									{31,30,29,28,3,2,1,0,19,18,17,16,11,10,9,8}.
 									Since the elements appear in reverse order in a register when
 									loaded from little-endian memory, the elements appear in the
 									register from left to right as
 									{8,9,10,11,16,17,18,19,0,1,2,3,28,29,30,31}.  So the following
 									<code>vperm</code> instruction will again select entire
 									elements using the groups of 4 contiguous bytes, and the
 									values of the integers will be reordered without compromising
 									each integer's contents.  The little-endian result matches the
 									big-endian result, as shown.  Observe that <emphasis
 									role="bold">a</emphasis> and <emphasis
 									role="bold">b</emphasis> switch positions for little endian
 									code generation.
 								      </para>
 								      <informaltable frame="all">
 									<tgroup cols="17">
 								          <colspec colname="c1" colwidth="1*" />
 								          <colspec colname="c2" colwidth="1*" />
 								          <colspec colname="c3" colwidth="1*" />
 								          <colspec colname="c4" colwidth="1*" />
 								          <colspec colname="c5" colwidth="1*" />
 								          <colspec colname="c6" colwidth="1*" />
 								          <colspec colname="c7" colwidth="1*" />
 								          <colspec colname="c8" colwidth="1*" />
 								          <colspec colname="c9" colwidth="1*" />
 								          <colspec colname="c10" colwidth="1*" />
 								          <colspec colname="c11" colwidth="1*" />
 								          <colspec colname="c12" colwidth="1*" />
 								          <colspec colname="c13" colwidth="1*" />
 								          <colspec colname="c14" colwidth="1*" />
 								          <colspec colname="c15" colwidth="1*" />
 								          <colspec colname="c16" colwidth="1*" />
 								          <colspec colname="c17" colwidth="1*" />
 								          <tbody>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">b</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>1A</para>
 									      </entry>
 									      <entry align="center">
 										<para>1B</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">a</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									      <entry align="center">
 										<para>08</para>
 									      </entry>
 									      <entry align="center">
 										<para>09</para>
 									      </entry>
 									      <entry align="center">
 										<para>0A</para>
 									      </entry>
 									      <entry align="center">
 										<para>0B</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">c</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>8</para>
 									      </entry>
 									      <entry align="center">
 										<para>9</para>
 									      </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>0</para>
 									      </entry>
 									      <entry align="center">
 										<para>1</para>
 									      </entry>
 									      <entry align="center">
 										<para>2</para>
 									      </entry>
 									      <entry align="center">
 										<para>3</para>
 									      </entry>
 									      <entry align="center">
 										<para>28</para>
 									      </entry>
 									      <entry align="center">
 										<para>29</para>
 									      </entry>
 									      <entry align="center">
 										<para>30</para>
 									      </entry>
 									      <entry align="center">
 										<para>31</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">t</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									    </row>
 									  </tbody>
 									</tgroup>
 								      </informaltable>
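 								      <para>
 									The two code-generation strategies can also be compared with
 									a scalar model.  This is a sketch under stated assumptions
 									rather than a description of any particular compiler: a
 									register is modeled as 16 bytes with byte 0 leftmost, the
 									little-endian path follows the POWER8 sequence shown earlier
 									(<code>vpermr</code> is expected to give the same result),
 									and all function names are hypothetical helpers.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Register image of a vector int: image[0] is the leftmost byte.  On
   little endian, element 0 occupies the rightmost four bytes. */
void ints_to_reg(const uint32_t e[4], int little_endian, uint8_t image[16])
{
    for (int i = 0; i < 4; i++) {
        int word = little_endian ? 3 - i : i;
        for (int j = 0; j < 4; j++)
            image[4 * word + j] = (uint8_t)(e[i] >> (24 - 8 * j));
    }
}

/* Inverse of ints_to_reg. */
void reg_to_ints(const uint8_t image[16], int little_endian, uint32_t e[4])
{
    for (int i = 0; i < 4; i++) {
        int word = little_endian ? 3 - i : i;
        e[i] = 0;
        for (int j = 0; j < 4; j++)
            e[i] = (e[i] << 8) | image[4 * word + j];
    }
}

/* Register image of a vector char: on little endian, element 0 is
   the rightmost byte. */
void chars_to_reg(const uint8_t e[16], int little_endian, uint8_t image[16])
{
    for (int i = 0; i < 16; i++)
        image[little_endian ? 15 - i : i] = e[i];
}

/* The vperm instruction: byte i of t is byte (c[i] mod 32) of the
   32-byte concatenation of x and y. */
void model_vperm(const uint8_t x[16], const uint8_t y[16],
                 const uint8_t c[16], uint8_t t[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 31u;
        t[i] = idx < 16 ? x[idx] : y[idx - 16];
    }
}

/* Model of vec_perm (a, b, c) for vector int. */
void model_vec_perm(const uint32_t a[4], const uint32_t b[4],
                    const uint8_t c[16], int little_endian, uint32_t out[4])
{
    uint8_t ra[16], rb[16], rc[16], rt[16];

    ints_to_reg(a, little_endian, ra);
    ints_to_reg(b, little_endian, rb);
    chars_to_reg(c, little_endian, rc);
    if (little_endian) {
        for (int i = 0; i < 16; i++)
            rc[i] = (uint8_t)~rc[i];      /* vnand  d,c,c   */
        model_vperm(rb, ra, rc, rt);      /* vperm  t,b,a,d */
    } else {
        model_vperm(ra, rb, rc, rt);      /* vperm  t,a,b,c */
    }
    reg_to_ints(rt, little_endian, out);
}

/* With the whole-element PCV from the example, both paths agree. */
void check_vec_perm_model(void)
{
    const uint32_t a[4] = {0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f};
    const uint32_t b[4] = {0x10111213, 0x14151617, 0x18191a1b, 0x1c1d1e1f};
    const uint8_t c[16] = {0, 1, 2, 3, 28, 29, 30, 31,
                           12, 13, 14, 15, 20, 21, 22, 23};
    const uint32_t want[4] = {0x00010203, 0x1c1d1e1f, 0x0c0d0e0f, 0x14151617};
    uint32_t be[4], le[4];

    model_vec_perm(a, b, c, 0, be);
    model_vec_perm(a, b, c, 1, le);
    for (int i = 0; i < 4; i++)
        assert(be[i] == want[i] && le[i] == want[i]);
}
```
]]></programlisting>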
 								      <para>
 									Now, suppose instead that the original PCV does not reorder
 									entire integers at once:
 								      </para>
 								      <programlisting>  vector char c = (vector char){0,20,31,4,7,17,6,19,30,3,2,8,9,13,5,22};</programlisting>
 								      <para>
 									The result of the big-endian implementation would be:
 								      </para>
 								      <programlisting>  t = {0x00141f04, 0x07110613, 0x1e030208, 0x090d0516};</programlisting>
 								      <informaltable frame="all">
 									<tgroup cols="17">
 								          <colspec colname="c1" colwidth="1*" />
 								          <colspec colname="c2" colwidth="1*" />
 								          <colspec colname="c3" colwidth="1*" />
 								          <colspec colname="c4" colwidth="1*" />
 								          <colspec colname="c5" colwidth="1*" />
 								          <colspec colname="c6" colwidth="1*" />
 								          <colspec colname="c7" colwidth="1*" />
 								          <colspec colname="c8" colwidth="1*" />
 								          <colspec colname="c9" colwidth="1*" />
 								          <colspec colname="c10" colwidth="1*" />
 								          <colspec colname="c11" colwidth="1*" />
 								          <colspec colname="c12" colwidth="1*" />
 								          <colspec colname="c13" colwidth="1*" />
 								          <colspec colname="c14" colwidth="1*" />
 								          <colspec colname="c15" colwidth="1*" />
 								          <colspec colname="c16" colwidth="1*" />
 								          <colspec colname="c17" colwidth="1*" />
 								          <tbody>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">a</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>08</para>
 									      </entry>
 									      <entry align="center">
 										<para>09</para>
 									      </entry>
 									      <entry align="center">
 										<para>0A</para>
 									      </entry>
 									      <entry align="center">
 										<para>0B</para>
 									      </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">b</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>1A</para>
 									      </entry>
 									      <entry align="center">
 										<para>1B</para>
 									      </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">c</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>0</para>
 									      </entry>
 									      <entry align="center">
 										<para>20</para>
 									      </entry>
 									      <entry align="center">
 										<para>31</para>
 									      </entry>
 									      <entry align="center">
 										<para>4</para>
 									      </entry>
 									      <entry align="center">
 										<para>7</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>6</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>30</para>
 									      </entry>
 									      <entry align="center">
 										<para>3</para>
 									      </entry>
 									      <entry align="center">
 										<para>2</para>
 									      </entry>
 									      <entry align="center">
 										<para>8</para>
 									      </entry>
 									      <entry align="center">
 										<para>9</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									      <entry align="center">
 										<para>5</para>
 									      </entry>
 									      <entry align="center">
 										<para>22</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">t</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>08</para>
 									      </entry>
 									      <entry align="center">
 										<para>09</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									    </row>
 									  </tbody>
 									</tgroup>
 								      </informaltable>
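 								      <para>
 									Whether a given PCV reorders entire elements, which is the
 									condition stated at the start of this section, can be tested
 									mechanically.  The following sketch is one possible
 									formalization of that condition, not part of any compiler or
 									library; the function names
 									<code>pcv_moves_whole_elements</code> and
 									<code>check_pcv_examples</code> are hypothetical.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the 16-byte PCV moves whole elements of elem_size
   bytes, and 0 otherwise.  elem_size must be 1, 2, 4, 8, or 16.
   Each aligned group of elem_size PCV bytes must start at a multiple
   of elem_size and then count upward by one.  Only the low 5 bits of
   each PCV byte are examined, as in the hardware. */
int pcv_moves_whole_elements(const uint8_t pcv[16], unsigned elem_size)
{
    for (unsigned i = 0; i < 16; i += elem_size) {
        unsigned first = pcv[i] & 31u;

        if (first % elem_size != 0)
            return 0;
        for (unsigned j = 1; j < elem_size; j++)
            if ((pcv[i + j] & 31u) != first + j)
                return 0;
    }
    return 1;
}

/* The two PCVs used in this section, checked for 4-byte elements. */
void check_pcv_examples(void)
{
    const uint8_t whole[16] = {0, 1, 2, 3, 28, 29, 30, 31,
                               12, 13, 14, 15, 20, 21, 22, 23};
    const uint8_t mixed[16] = {0, 20, 31, 4, 7, 17, 6, 19,
                               30, 3, 2, 8, 9, 13, 5, 22};

    assert(pcv_moves_whole_elements(whole, 4) == 1);
    assert(pcv_moves_whole_elements(mixed, 4) == 0);
}
```
]]></programlisting>
 								      <para>
 									With an element size of 1 every PCV passes this test, which
 									is consistent with <code>vector char</code> permutes having
 									no such restriction.
 								      </para>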
 								      <para>
 									For little-endian, the modified PCV would be
 									{31,11,0,27,24,14,25,12,1,28,29,23,22,18,26,9}, appearing in
 									the register as
 									{9,26,18,22,23,29,28,1,12,25,14,24,27,0,11,31}.  The final
 									little-endian result would be
 								      </para>
 								      <programlisting>  t = {0x071c1703, 0x10051204, 0x0b01001d, 0x15060e0a};</programlisting>
 								      <para>
 									which bears no resemblance to the big-endian result.
 								      </para>
 								      <informaltable frame="all">
 									<tgroup cols="17">
 								          <colspec colname="c1" colwidth="1*" />
 								          <colspec colname="c2" colwidth="1*" />
 								          <colspec colname="c3" colwidth="1*" />
 								          <colspec colname="c4" colwidth="1*" />
 								          <colspec colname="c5" colwidth="1*" />
 								          <colspec colname="c6" colwidth="1*" />
 								          <colspec colname="c7" colwidth="1*" />
 								          <colspec colname="c8" colwidth="1*" />
 								          <colspec colname="c9" colwidth="1*" />
 								          <colspec colname="c10" colwidth="1*" />
 								          <colspec colname="c11" colwidth="1*" />
 								          <colspec colname="c12" colwidth="1*" />
 								          <colspec colname="c13" colwidth="1*" />
 								          <colspec colname="c14" colwidth="1*" />
 								          <colspec colname="c15" colwidth="1*" />
 								          <colspec colname="c16" colwidth="1*" />
 								          <colspec colname="c17" colwidth="1*" />
 								          <tbody>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">b</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>1E</para>
 									      </entry>
 									      <entry align="center">
 										<para>1F</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>19</para>
 									      </entry>
 									      <entry align="center">
 										<para>1A</para>
 									      </entry>
 									      <entry align="center">
 										<para>1B</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>16</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>13</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">a</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>0C</para>
 									      </entry>
 									      <entry align="center">
 										<para>0D</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0F</para>
 									      </entry>
 									      <entry align="center">
 										<para>08</para>
 									      </entry>
 									      <entry align="center">
 										<para>09</para>
 									      </entry>
 									      <entry align="center">
 										<para>0A</para>
 									      </entry>
 									      <entry align="center">
 										<para>0B</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>02</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">c</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>9</para>
 									      </entry>
 									      <entry align="center">
 										<para>26</para>
 									      </entry>
 									      <entry align="center">
 										<para>18</para>
 									      </entry>
 									      <entry align="center">
 										<para>22</para>
 									      </entry>
 									      <entry align="center">
 										<para>23</para>
 									      </entry>
 									      <entry align="center">
 										<para>29</para>
 									      </entry>
 									      <entry align="center">
 										<para>28</para>
 									      </entry>
 									      <entry align="center">
 										<para>1</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>25</para>
 									      </entry>
 									      <entry align="center">
 										<para>14</para>
 									      </entry>
 									      <entry align="center">
 										<para>24</para>
 									      </entry>
 									      <entry align="center">
 										<para>27</para>
 									      </entry>
 									      <entry align="center">
 										<para>0</para>
 									      </entry>
 									      <entry align="center">
 										<para>11</para>
 									      </entry>
 									      <entry align="center">
 										<para>31</para>
 									      </entry>
 									    </row>
 								            <row>
 								              <entry align="center">
 										<para><emphasis role="bold">t</emphasis></para>
 								              </entry>
 									      <entry align="center">
 										<para>15</para>
 									      </entry>
 									      <entry align="center">
 										<para>06</para>
 									      </entry>
 									      <entry align="center">
 										<para>0E</para>
 									      </entry>
 									      <entry align="center">
 										<para>0A</para>
 									      </entry>
 									      <entry align="center">
 										<para>0B</para>
 									      </entry>
 									      <entry align="center">
 										<para>01</para>
 									      </entry>
 									      <entry align="center">
 										<para>00</para>
 									      </entry>
 									      <entry align="center">
 										<para>1D</para>
 									      </entry>
 									      <entry align="center">
 										<para>10</para>
 									      </entry>
 									      <entry align="center">
 										<para>05</para>
 									      </entry>
 									      <entry align="center">
 										<para>12</para>
 									      </entry>
 									      <entry align="center">
 										<para>04</para>
 									      </entry>
 									      <entry align="center">
 										<para>07</para>
 									      </entry>
 									      <entry align="center">
 										<para>1C</para>
 									      </entry>
 									      <entry align="center">
 										<para>17</para>
 									      </entry>
 									      <entry align="center">
 										<para>03</para>
 									      </entry>
 									    </row>
 									  </tbody>
 									</tgroup>
 								      </informaltable>
 								      <para>
 									The lesson here is to use <code>vec_perm</code> only to
 									reorder entire elements of a vector.  If you must use
 									<code>vec_perm</code> for another purpose, your code must
 									include a test for endianness and separate algorithms for big- and
 									little-endian.  Examples of this may be seen in the Power
 									Vector Library project (see <xref linkend="VIPR.intro.links"
 									/>).
 								      </para>
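 								      <para>
 									As a minimal sketch of such an endianness test, the
 									following example gathers the most-significant byte of each
 									word of a vector, a use of <code>vec_perm</code> that splits
 									elements apart and therefore needs a different permute
 									control vector on each endianness.  The names
 									<code>msb_of_words</code>, <code>msb_ctl</code>, and
 									<code>MSB_OFFSET</code> are hypothetical and chosen only for
 									illustration.  The sketch assumes a compiler that predefines
 									<code>__BYTE_ORDER__</code>, as GCC and Clang do, and it
 									includes a scalar stand-in, which is not part of the vector
 									API, so that it can also be built on targets without vector
 									support.
 								      </para>
 								      <programlisting><![CDATA[
```c
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Assumption: the compiler predefines __BYTE_ORDER__ and
   __ORDER_LITTLE_ENDIAN__, as GCC and Clang do.  */
#if !defined(__BYTE_ORDER__)
#error "This sketch assumes the GCC/Clang __BYTE_ORDER__ macro"
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define MSB_OFFSET 3   /* LE: most-significant byte is stored last.  */
#else
#define MSB_OFFSET 0   /* BE: most-significant byte is stored first.  */
#endif

/* Permute control selecting the most-significant byte of each word of
   the first input.  Because this picks bytes out of elements, the
   indices depend on endianness.  Result bytes 4..15 are unused and
   simply repeat index 0.  */
static const uint8_t msb_ctl[16] = {
  MSB_OFFSET, MSB_OFFSET + 4, MSB_OFFSET + 8, MSB_OFFSET + 12,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Hypothetical helper: copy the most-significant byte of each of the
   four words in IN to OUT.  */
void msb_of_words (const uint32_t in[4], uint8_t out[4])
{
  uint8_t res[16];
#if defined(__ALTIVEC__)
  __vector unsigned char a, ctl, t;
  memcpy (&a, in, 16);
  memcpy (&ctl, msb_ctl, 16);
  t = vec_perm (a, a, ctl);
  memcpy (res, &t, 16);
#else
  /* Scalar stand-in for targets without vector support: byte i of
     the result is the byte of the input named by control byte i.  */
  uint8_t bytes[16];
  memcpy (bytes, in, 16);
  for (int i = 0; i < 16; i++)
    res[i] = bytes[msb_ctl[i] & 15];
#endif
  memcpy (out, res, 4);
}
```
]]></programlisting>
 								      <para>
 									Only the control vector differs between the two cases here;
 									the call to <code>vec_perm</code> itself is unchanged.  A
 									permute that moves whole words would not need the
 									<code>MSB_OFFSET</code> distinction at all.
 								      </para>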
 								    </section>
 								  </section>
 								</chapter>