You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
1365 lines
33 KiB
XML
1365 lines
33 KiB
XML
<!--
|
|
Copyright (c) 2021 OpenPOWER Foundation
|
|
|
|
Licensed under the Apache License, Version 2.0 (the "License");
|
|
you may not use this file except in compliance with the License.
|
|
You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
|
|
-->
|
|
<chapter version="5.0" xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude"
|
|
xmlns:xlink="http://www.w3.org/1999/xlink" xml:id="VIPR.mma-ref"
|
|
revisionflag="added">
|
|
|
|
<!-- Chapter Title goes here. -->
|
|
<title>Matrix-Multiply Assist (MMA) Intrinsic Reference</title>
|
|
|
|
<section>
|
|
<title>Introduction</title>
|
|
<para>
|
|
Version 3.1 of the Power Instruction Set Architecture
|
|
Specification (see <xref linkend="VIPR.intro.links" />)
|
|
introduced instructions to accelerate matrix multiplication
|
|
computations. These instructions operate both on the VSRs and
|
|
on eight new 512-bit accumulator registers (ACCs). Intrinsic
|
|
functions to access these instructions are described in this
|
|
chapter.
|
|
</para>
|
|
<para>
|
|
Although the ACCs are treated as separate registers from the
|
|
VSRs, each <code>ACC[i]</code> may use its associated VSRs
|
|
<code>4i</code> to <code>4i+3</code> as scratch space. That is,
|
|
when <code>ACC[i]</code> contains defined data, the contents of
|
|
VSRs <code>4i</code> to <code>4i+3</code> are undefined until
|
|
an <code>xxmfacc</code> instruction is used to copy the contents
|
|
of <code>ACC[i]</code> to the VSRs. Writing to a VSR associated
|
|
with <code>ACC[i]</code> that contains defined data will cause
|
|
<code>ACC[i]</code> to become undefined.
|
|
</para>
|
|
<para>
|
|
This reference is not intended to be a complete introduction to
|
|
MMA concepts. The reader is directed to the Matrix-Multiply
|
|
Assist Best Practices Guide (see <xref
|
|
linkend="VIPR.intro.links" />) and to the POWER ISA.
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Type Support</title>
|
|
<para>
|
|
Many of the MMA instructions operate on aligned pairs of vectors
|
|
(that is, an even numbered vector and the next-higher numbered
|
|
vector), or on aligned quads of vectors (that is, a vector
|
|
number divisible by four and the three next-higher numbered
|
|
vectors). Compilers that support the MMA intrinsic functions
|
|
must define two types, <code>__vector_pair</code> and
|
|
<code>__vector_quad</code>, to represent these concepts.
|
|
Pointers and references to these types must also be supported
|
|
where these concepts exist in the source language.
|
|
</para>
|
|
</section>
|
|
|
|
<section>
|
|
<title>Intrinsic Functions</title>
|
|
<para>
|
|
The intrinsics in this section are not overloaded. Each is
|
|
presented with its prototype and the instruction it represents.
|
|
The string "vuc" is used as shorthand for "vector unsigned
|
|
char" throughout.
|
|
</para>
|
|
<section>
|
|
<title>Memory Access</title>
|
|
<para>
|
|
Load and store vector pairs.
|
|
</para>
|
|
<indexterm>
|
|
<primary>lxvp</primary>
|
|
<secondary>__builtin_vsx_lxvp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>stxvp</primary>
|
|
<secondary>__builtin_vsx_stxvp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>lxvpx</primary>
|
|
<secondary>__builtin_vsx_lxvp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>stxvpx</primary>
|
|
<secondary>__builtin_vsx_stxvp</secondary>
|
|
</indexterm>
|
|
<para>
|
|
<informaltable frame="all">
|
|
<tgroup cols="2">
|
|
<colspec colname="c0" colwidth="40*" />
|
|
<colspec colname="c1" colwidth="10*" />
|
|
<thead>
|
|
<row>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Prototype</emphasis></para>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Instruction</emphasis></para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
|
|
__vector_pair __builtin_vsx_lxvp (signed long a, const __vector_pair* b)
|
|
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
lxvp r,a(b)
|
|
or
|
|
lxvpx r,b,a
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
|
|
void __builtin_vsx_stxvp (__vector_pair s, signed long a, const __vector_pair* b)
|
|
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
stxvp s,a(b)
|
|
or
|
|
stxvpx s,b,a
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>Assembly and Disassembly of Large Types</title>
|
|
<para>
|
|
The following intrinsics are used to construct
|
|
<code>__vector_pair</code> and <code>__vector_quad</code>
|
|
objects from 128-bit vectors, and deconstruct them into such
|
|
vectors. The disassembly interfaces place the results into
|
|
arrays of vectors using natural element order. The build
|
|
interfaces treat the vector input arguments as if they form an
|
|
array of vectors, with the first vector argument being array
|
|
element 0 in natural element order, the second vector argument
|
|
being array element 1, and so forth. The assemble interfaces
|
|
are deprecated because they do not give consistent results for
|
|
big- and little-endian targets, and users should use the build
|
|
interfaces instead.
|
|
</para>
|
|
<para>
|
|
<informaltable frame="all">
|
|
<tgroup cols="2">
|
|
<colspec colname="c0" colwidth="40*" />
|
|
<colspec colname="c0" colwidth="10*" />
|
|
<thead>
|
|
<row>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Prototype</emphasis></para>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Notes</emphasis></para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_assemble_acc (__vector_quad*, vuc, vuc, vuc, vuc)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para>Deprecated</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_build_acc (__vector_quad*, vuc, vuc, vuc, vuc)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para></para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_disassemble_acc (void*, __vector_quad*)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para></para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_vsx_assemble_pair (__vector_pair*, vuc, vuc)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para>Deprecated</para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_vsx_build_pair (__vector_pair*, vuc, vuc)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para></para>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_vsx_disassemble_pair (void*, __vector_pair*)
|
|
</programlisting>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para></para>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>Accumulator Clear Operation</title>
|
|
<para>
|
|
This intrinsic function initializes an accumulator to zeros.
|
|
</para>
|
|
<indexterm>
|
|
<primary>xxsetaccz</primary>
|
|
<secondary>__builtin_mma_xxsetaccz</secondary>
|
|
</indexterm>
|
|
<para>
|
|
<informaltable frame="all">
|
|
<tgroup cols="2">
|
|
<colspec colname="c0" colwidth="28*" />
|
|
<colspec colname="c0" colwidth="10*" />
|
|
<thead>
|
|
<row>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Prototype</emphasis></para>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Instruction</emphasis></para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xxsetaccz (__vector_quad* a)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xxsetaccz a
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>Conversion Operations</title>
|
|
<para>
|
|
These instructions convert between vectors of single precision
|
|
and bfloat16 types.
|
|
</para>
|
|
<indexterm>
|
|
<primary>xvcvbf16spn</primary>
|
|
<secondary>__builtin_vsx_xvcvbf16spn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvcvspbf16</primary>
|
|
<secondary>__builtin_vsx_xvcvspbf16</secondary>
|
|
</indexterm>
|
|
<para>
|
|
<informaltable frame="all">
|
|
<tgroup cols="2">
|
|
<colspec colname="c0" colwidth="28*" />
|
|
<colspec colname="c0" colwidth="10*" />
|
|
<thead>
|
|
<row>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Prototype</emphasis></para>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Instruction</emphasis></para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
vuc __builtin_vsx_xvcvbf16spn (vuc a)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvcvbf16spn a
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
vuc __builtin_vsx_xvcvspbf16 (vuc a)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvcvspbf16 a
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</para>
|
|
</section>
|
|
<section>
|
|
<title>Outer Product Operations</title>
|
|
<para>
|
|
Each of these intrinsics generates an instruction to perform
|
|
an outer product operation.
|
|
</para>
|
|
<indexterm>
|
|
<primary>pmxvbf16ger2</primary>
|
|
<secondary>__builtin_mma_pmxvbf16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvbf16ger2nn</primary>
|
|
<secondary>__builtin_mma_pmxvbf16ger2nn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvbf16ger2np</primary>
|
|
<secondary>__builtin_mma_pmxvbf16ger2np</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvbf16ger2pn</primary>
|
|
<secondary>__builtin_mma_pmxvbf16ger2pn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvbf16ger2pp</primary>
|
|
<secondary>__builtin_mma_pmxvbf16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf16ger2</primary>
|
|
<secondary>__builtin_mma_pmxvf16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf16ger2nn</primary>
|
|
<secondary>__builtin_mma_pmxvf16ger2nn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf16ger2np</primary>
|
|
<secondary>__builtin_mma_pmxvf16ger2np</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf16ger2pn</primary>
|
|
<secondary>__builtin_mma_pmxvf16ger2pn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf16ger2pp</primary>
|
|
<secondary>__builtin_mma_pmxvf16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf32ger</primary>
|
|
<secondary>__builtin_mma_pmxvf32ger</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf32gernn</primary>
|
|
<secondary>__builtin_mma_pmxvf32gernn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf32gernp</primary>
|
|
<secondary>__builtin_mma_pmxvf32gernp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf32gerpn</primary>
|
|
<secondary>__builtin_mma_pmxvf32gerpn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf32gerpp</primary>
|
|
<secondary>__builtin_mma_pmxvf32gerpp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf64ger</primary>
|
|
<secondary>__builtin_mma_pmxvf64ger</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf64gernn</primary>
|
|
<secondary>__builtin_mma_pmxvf64gernn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf64gernp</primary>
|
|
<secondary>__builtin_mma_pmxvf64gernp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf64gerpn</primary>
|
|
<secondary>__builtin_mma_pmxvf64gerpn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvf64gerpp</primary>
|
|
<secondary>__builtin_mma_pmxvf64gerpp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi16ger2</primary>
|
|
<secondary>__builtin_mma_pmxvi16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi16ger2pp</primary>
|
|
<secondary>__builtin_mma_pmxvi16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi16ger2s</primary>
|
|
<secondary>__builtin_mma_pmxvi16ger2s</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi16ger2spp</primary>
|
|
<secondary>__builtin_mma_pmxvi16ger2spp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi4ger8</primary>
|
|
<secondary>__builtin_mma_pmxvi4ger8</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi4ger8pp</primary>
|
|
<secondary>__builtin_mma_pmxvi4ger8pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi8ger4</primary>
|
|
<secondary>__builtin_mma_pmxvi8ger4</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi8ger4pp</primary>
|
|
<secondary>__builtin_mma_pmxvi8ger4pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>pmxvi8ger4spp</primary>
|
|
<secondary>__builtin_mma_pmxvi8ger4spp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvbf16ger2</primary>
|
|
<secondary>__builtin_mma_xvbf16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvbf16ger2nn</primary>
|
|
<secondary>__builtin_mma_xvbf16ger2nn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvbf16ger2np</primary>
|
|
<secondary>__builtin_mma_xvbf16ger2np</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvbf16ger2pn</primary>
|
|
<secondary>__builtin_mma_xvbf16ger2pn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvbf16ger2pp</primary>
|
|
<secondary>__builtin_mma_xvbf16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf16ger2</primary>
|
|
<secondary>__builtin_mma_xvf16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf16ger2nn</primary>
|
|
<secondary>__builtin_mma_xvf16ger2nn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf16ger2np</primary>
|
|
<secondary>__builtin_mma_xvf16ger2np</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf16ger2pn</primary>
|
|
<secondary>__builtin_mma_xvf16ger2pn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf16ger2pp</primary>
|
|
<secondary>__builtin_mma_xvf16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf32ger</primary>
|
|
<secondary>__builtin_mma_xvf32ger</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf32gernn</primary>
|
|
<secondary>__builtin_mma_xvf32gernn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf32gernp</primary>
|
|
<secondary>__builtin_mma_xvf32gernp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf32gerpn</primary>
|
|
<secondary>__builtin_mma_xvf32gerpn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf32gerpp</primary>
|
|
<secondary>__builtin_mma_xvf32gerpp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf64ger</primary>
|
|
<secondary>__builtin_mma_xvf64ger</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf64gernn</primary>
|
|
<secondary>__builtin_mma_xvf64gernn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf64gernp</primary>
|
|
<secondary>__builtin_mma_xvf64gernp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf64gerpn</primary>
|
|
<secondary>__builtin_mma_xvf64gerpn</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvf64gerpp</primary>
|
|
<secondary>__builtin_mma_xvf64gerpp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi16ger2</primary>
|
|
<secondary>__builtin_mma_xvi16ger2</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi16ger2pp</primary>
|
|
<secondary>__builtin_mma_xvi16ger2pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi16ger2s</primary>
|
|
<secondary>__builtin_mma_xvi16ger2s</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi16ger2spp</primary>
|
|
<secondary>__builtin_mma_xvi16ger2spp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi4ger8</primary>
|
|
<secondary>__builtin_mma_xvi4ger8</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi4ger8pp</primary>
|
|
<secondary>__builtin_mma_xvi4ger8pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi8ger4</primary>
|
|
<secondary>__builtin_mma_xvi8ger4</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi8ger4pp</primary>
|
|
<secondary>__builtin_mma_xvi8ger4pp</secondary>
|
|
</indexterm>
|
|
<indexterm>
|
|
<primary>xvi8ger4spp</primary>
|
|
<secondary>__builtin_mma_xvi8ger4spp</secondary>
|
|
</indexterm>
|
|
<para>
|
|
<informaltable frame="all">
|
|
<tgroup cols="2">
|
|
<colspec colname="c0" colwidth="28*" />
|
|
<colspec colname="c1" colwidth="10*" />
|
|
<thead>
|
|
<row>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Prototype</emphasis></para>
|
|
</entry>
|
|
<entry align="center" valign="middle">
|
|
<para><emphasis role="bold">Instruction</emphasis></para>
|
|
</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvbf16ger2 (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvbf16ger2 a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvbf16ger2nn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvbf16ger2nn a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvbf16ger2np (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvbf16ger2np a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvbf16ger2pn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvbf16ger2pn a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvbf16ger2pp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvbf16ger2pp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf16ger2 (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf16ger2 a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf16ger2nn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf16ger2nn a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf16ger2np (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf16ger2np a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf16ger2pn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf16ger2pn a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf16ger2pp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf16ger2pp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf32ger (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf32ger a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf32gernn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf32gernn a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf32gernp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf32gernp a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf32gerpn (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf32gerpn a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf32gerpp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf32gerpp a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf64ger (__vector_quad* a, __vector_pair b,
|
|
vuc c, const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf64ger a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf64gernn (__vector_quad* a, __vector_pair b,
|
|
vuc c, const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf64gernn a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf64gernp (__vector_quad* a, __vector_pair b,
|
|
vuc c, const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf64gernp a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf64gerpn (__vector_quad* a, __vector_pair b,
|
|
vuc c, const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf64gerpn a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvf64gerpp (__vector_quad* a, __vector_pair b,
|
|
vuc c, const int d, const int e)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvf64gerpp a,b,c,d,e
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi16ger2 (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi16ger2 a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi16ger2pp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi16ger2pp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi16ger2s (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi16ger2s a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi16ger2spp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi16ger2spp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi4ger8 (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi4ger8 a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi4ger8pp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi4ger8pp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi8ger4 (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi8ger4 a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi8ger4pp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi8ger4pp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_pmxvi8ger4spp (__vector_quad* a, vuc b, vuc c,
|
|
const int d, const int e, const int f)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
pmxvi8ger4spp a,b,c,d,e,f
|
|
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvbf16ger2 (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvbf16ger2 a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvbf16ger2nn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvbf16ger2nn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvbf16ger2np (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvbf16ger2np a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvbf16ger2pn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvbf16ger2pn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvbf16ger2pp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvbf16ger2pp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf16ger2 (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf16ger2 a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf16ger2nn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf16ger2nn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf16ger2np (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf16ger2np a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf16ger2pn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf16ger2pn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf16ger2pp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf16ger2pp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf32ger (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf32ger a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf32gernn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf32gernn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf32gernp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf32gernp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf32gerpn (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf32gerpn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf32gerpp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf32gerpp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf64ger (__vector_quad* a, __vector_pair b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf64ger a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf64gernn (__vector_quad* a, __vector_pair b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf64gernn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf64gernp (__vector_quad* a, __vector_pair b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf64gernp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf64gerpn (__vector_quad* a, __vector_pair b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf64gerpn a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvf64gerpp (__vector_quad* a, __vector_pair b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvf64gerpp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi16ger2 (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi16ger2 a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi16ger2pp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi16ger2pp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi16ger2s (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi16ger2s a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi16ger2spp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi16ger2spp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi4ger8 (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi4ger8 a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi4ger8pp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi4ger8pp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi8ger4 (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi8ger4 a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi8ger4pp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi8ger4pp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>
|
|
<programlisting>
|
|
void __builtin_mma_xvi8ger4spp (__vector_quad* a, vuc b, vuc c)
|
|
</programlisting>
|
|
</entry>
|
|
<entry>
|
|
<programlisting>
|
|
xvi8ger4spp a,b,c
|
|
</programlisting>
|
|
</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</para>
|
|
</section>
|
|
</section>
|
|
|
|
</chapter>
|