<?xml version="1.0" encoding="UTF-8"?>
<!--
  Copyright (c) 2017 OpenPOWER Foundation

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.

-->
<section xmlns="http://docbook.org/ns/docbook"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  version="5.0"
  xml:id="sec_power_vector_scalar_floatingpoint">
  <title>Vector-Scalar Floating-Point Operations (VSX)</title>

  <para>With PowerISA 2.06 (POWER7) we extended the vector SIMD capabilities
  of the PowerISA:</para>

  <itemizedlist spacing="compact">
    <listitem>
      <para>Extend the available vector and floating-point scalar register
      sets from 32 registers each to a combined register set of 64 x 64-bit
      scalar floating-point and
      64 x 128-bit vector registers.</para>
    </listitem>
    <listitem>
      <para>Enable scalar double float operations on all 64 scalar
      registers.</para>
    </listitem>
    <listitem>
      <para>Enable vector double and vector float operations for all 64
      vector registers.</para>
    </listitem>
    <listitem>
      <para>Enable super-scalar execution of vector instructions and support
      2 independent vector floating-point pipelines for parallel execution of 4 x
      64-bit floating-point Fused Multiply-Adds (FMAs) and 8 x 32-bit FMAs per
      cycle.</para>
    </listitem>
  </itemizedlist>

  <para>With PowerISA 2.07 (POWER8) we added single-precision scalar
  floating-point instructions to VSX. This completes the floating-point
  computational set for VSX. This ISA release also clarified how these
  instructions operate in the Little Endian storage model.</para>

  <para>While the focus was on enhanced floating-point computation (for High
  Performance Computing), VSX also extended the ISA with additional storage
  access, logical, and permute (merge, splat, shift) instructions. These
  additions were necessary to extend these operations to cover all 64 VSX
  registers, and they improve unaligned storage access for vectors (not
  available in VMX).</para>

  <para>PowerISA 2.07B Chapter 7. Vector-Scalar Floating-Point Operations
  starts with an introduction and overview (sections 7.1-7.5).
  The early sections (7.1 and 7.2) describe the layout of the 64 VSX registers
  and how they relate to (overlap and inter-operate with) the existing
  floating-point scalar (FPRs) and vector (VMX VRs) registers.

  <literallayout><literal>7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.1.1 Overview of the Vector-Scalar Extension  . . . . . . . . . . . 317
7.2 VSX Registers  . . . . . . . . . . . . . . . . . . . . . . . . . 318
7.2.1 Vector-Scalar Registers  . . . . . . . . . . . . . . . . . . . 318
7.2.2 Floating-Point Status and Control Register . . . . . . . . . . 321</literal></literallayout></para>

  <para>The key definitions are given in “7.1.1.1 Compatibility with Category
  Floating-Point and Category Decimal Floating-Point Operations” and
  “7.1.1.2 Compatibility with Category Vector Operations”:
    <blockquote>
      <para>The instruction sets defined in Chapter 4.
      Floating-Point Facility and Chapter 5. Decimal
      Floating-Point retain their definition with one primary
      difference. The FPRs are mapped to doubleword
      element 0 of VSRs 0-31. The contents of doubleword 1
      of the VSR corresponding to a source FPR specified
      by an instruction are ignored. The contents of
      doubleword 1 of a VSR corresponding to the target
      FPR specified by an instruction are undefined.</para>

      <para>The instruction set defined in Chapter 6. Vector Facility
      [Category: Vector], retains its definition with one
      primary difference. The VRs are mapped to VSRs
      32-63.</para></blockquote></para>

  <note><para>The reference to scalar element 0 above is from the big endian
  register perspective of the ISA. In the PPC64LE ABI implementation, and for the
  purpose of porting Intel intrinsics, this is logical doubleword element 1. Intel SSE
  scalar intrinsics operate on logical element [0], which is in the wrong
  position for PowerISA FPU and VSX scalar floating-point operations. Another
  important point is what happens to the other half of the VSR when you execute a
  scalar floating-point instruction (<emphasis>The contents of doubleword 1 of a VSR …
  are undefined.</emphasis>)</para></note>
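
  <para>As a minimal sketch of what this means for a port, the following C
  fragment uses GCC vector extensions to spell out the element-level contract
  of an SSE scalar double add such as <literal>_mm_add_sd</literal>: the
  operation applies to logical element [0] and element [1] is carried over
  from the first operand. The names <literal>v2df</literal> and
  <literal>add_sd_like</literal> are hypothetical and chosen for illustration
  only; this is not how any particular header implements the intrinsic.</para>

  <programlisting language="c"><![CDATA[
```c
/* Illustrative sketch: v2df and add_sd_like are hypothetical names,
   not part of any Intel or PowerISA header.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-level contract of an SSE scalar double add:
   element [0] receives a[0] + b[0], element [1] is copied from a.  */
static inline v2df
add_sd_like (v2df a, v2df b)
{
  v2df result = a;           /* start from a so element [1] stays defined */
  result[0] = a[0] + b[0];   /* scalar operation on logical element [0] */
  return result;
}
```
]]></programlisting>

  <para>Because a PowerISA scalar floating-point instruction leaves the other
  doubleword of its target undefined, a port has to rebuild the full vector
  result explicitly, as this sketch does, rather than assume the scalar
  instruction preserves it.</para>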

  <para>The compiler will hide some of this detail when generating code for
  little endian vector element [] notation and most vector built-ins. For example,
  <literal>vec_splat (A, 0)</literal> is transformed for
  PPC64LE to <literal>xxspltd VRT,VRA,1</literal>.
  What the compiler <emphasis><emphasis role="bold">cannot</emphasis></emphasis>
  hide is the different placement of scalars within vector registers.</para>
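
  <para>The element [] notation can be sketched with GCC vector extensions as
  follows. The type name <literal>v2df</literal> and the helper
  <literal>splat_element0</literal> are hypothetical and used only for
  illustration. The source refers to logical element [0] on both big endian
  and little endian targets; which instruction and immediate the compiler
  emits depends on the target, the compiler version, and the options used.</para>

  <programlisting language="c"><![CDATA[
```c
/* Illustrative sketch: v2df and splat_element0 are hypothetical names.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Copy logical element [0] of a into both elements of the result.
   The compiler maps the logical index to the register doubleword.  */
static inline v2df
splat_element0 (v2df a)
{
  v2df result = { a[0], a[0] };
  return result;
}
```
]]></programlisting>

  <para>With <literal>altivec.h</literal> available, the built-in form
  <literal>vec_splat (a, 0)</literal> expresses the same operation.</para>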

  <para>Vector registers (VRs) 0-31 overlay and can be accessed from vector
  scalar registers (VSRs) 32-63. The ABI also specifies that VR2-13 are used to
  pass parameters and return values. In some cases the same (or similar)
  operations exist in both VMX and VSX instruction forms, while in other cases
  operations only exist for VMX (byte level permute and shift) or VSX (Vector
  double).</para>

  <para>Register selection that avoids unnecessary vector moves and follows
  the ABI, while maintaining the correct instruction-specific register numbering,
  can be tricky. The
  <link xlink:href="https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/Machine-Constraints.html#Machine-Constraints">GCC register constraint</link>
  annotations for inline
  assembler using vector instructions are challenging, even for experts. So only
  experts should be writing assembler, and then only in extraordinary
  circumstances. You should leave these details to the compiler (using vector
  extensions and vector built-ins) whenever possible.</para>
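
  <para>As a small sketch of leaving register selection to the compiler, the
  following fragment writes an element-wise multiply-add on vector double
  using GCC vector extensions only. The names <literal>v2df</literal> and
  <literal>mul_add</literal> are hypothetical. No register numbers or
  constraint letters appear in the source; the instructions actually generated
  depend on the target and compiler options.</para>

  <programlisting language="c"><![CDATA[
```c
/* Illustrative sketch: v2df and mul_add are hypothetical names.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise a * b + c on two doubles.  Register allocation and
   instruction selection are left entirely to the compiler.  */
static inline v2df
mul_add (v2df a, v2df b, v2df c)
{
  return a * b + c;
}
```
]]></programlisting>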

  <para>The next sections get into the details of floating-point
  representation, operations, and exceptions. They describe the implementation
  details for the IEEE-754R and C/C++ language standards that most developers only
  access via higher level APIs. Most programmers will not need this level of
  detail, but it is there if needed.

  <literallayout><literal>7.3 VSX Operations . . . . . . . . . . . . . . . . . . . . . . . . . 326
7.3.1 VSX Floating-Point Arithmetic Overview . . . . . . . . . . . . 326
7.3.2 VSX Floating-Point Data  . . . . . . . . . . . . . . . . . . . 327
7.3.3 VSX Floating-Point Execution Models  . . . . . . . . . . . . . 335
7.4 VSX Floating-Point Exceptions  . . . . . . . . . . . . . . . . . 338
7.4.1 Floating-Point Invalid Operation Exception . . . . . . . . . . 341
7.4.2 Floating-Point Zero Divide Exception . . . . . . . . . . . . . 347
7.4.3 Floating-Point Overflow Exception. . . . . . . . . . . . . . . 349
7.4.4 Floating-Point Underflow Exception . . . . . . . . . . . . . . 351</literal></literallayout></para>

  <para>Next comes an overview of the VSX storage access instructions for big and
  little endian and for aligned and unaligned data addresses. This includes
  diagrams that illuminate the differences.

  <literallayout><literal>7.5 VSX Storage Access Operations  . . . . . . . . . . . . . . . . . 356
7.5.1 Accessing Aligned Storage Operands . . . . . . . . . . . . . . 356
7.5.2 Accessing Unaligned Storage Operands . . . . . . . . . . . . . 357
7.5.3 Storage Access Exceptions  . . . . . . . . . . . . . . . . . . 358</literal></literallayout></para>

  <para>Section 7.6 starts with a VSX Instruction Set Summary, which is the
  place to start to get a feel for the types and operations supported. The
  emphasis on floating-point, both scalar and vector (especially vector double), is
  pronounced. Many of the scalar and single-precision vector instructions look
  like duplicates of what we have seen in the Chapter 4 Floating-Point and
  Chapter 6 Vector facilities. The difference here is new instruction encodings
  to access the full 64 VSX register space.</para>

  <para>In addition, there are a small number of logical instructions
  included to support predication (selecting / masking vector elements based on
  comparison results), and a set of permute, merge, shift, and splat instructions that
  operate on VSX word (float) and doubleword (double) elements. As mentioned
  for VMX section 6.8, these instructions are good to study as they are useful
  for realigning elements from PowerISA vector results to the form required for Intel
  intrinsics.

  <literallayout><literal>7.6 VSX Instruction Set . . . . . . . . . . . . . . . . . . . . . .  359
7.6.1 VSX Instruction Set Summary . . . . . . . . . . . . . . . . .  359
7.6.1.1 VSX Storage Access Instructions . . . . . . . . . . . . . .  359
7.6.1.2 VSX Move Instructions . . . . . . . . . . . . . . . . . . .  360
7.6.1.3 VSX Floating-Point Arithmetic Instructions  . . . . . . . .  360
7.6.1.4 VSX Floating-Point Compare Instructions . . . . . . . . . .  363
7.6.1.5 VSX DP-SP Conversion Instructions . . . . . . . . . . . . .  364
7.6.1.6 VSX Integer Conversion Instructions . . . . . . . . . . . .  364
7.6.1.7 VSX Round to Floating-Point Integer Instructions  . . . . .  366
7.6.1.8 VSX Logical Instructions. . . . . . . . . . . . . . . . . .  366
7.6.1.9 VSX Permute Instructions. . . . . . . . . . . . . . . . . .  367
7.6.2 VSX Instruction Description Conventions . . . . . . . . . . .  368
7.6.3 VSX Instruction Descriptions  . . . . . . . . . . . . . . . .  392</literal></literallayout></para>

  <para>The VSX Instruction Descriptions section contains the detailed
  description for each VSX category instruction. The table entries from the
  Instruction Set Summary are formatted in the document as hyperlinks to the
  corresponding instruction descriptions.</para>

</section>