Programming-Guides/Porting_Vector_Intrinsics/ch_howto_start.xml

130 lines
5.5 KiB
XML

This file contains invisible Unicode characters!

This file contains invisible Unicode characters that may be processed differently from what appears below. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to reveal hidden characters.

<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright (c) 2017 OpenPOWER Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<chapter xmlns="http://docbook.org/ns/docbook"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="5.0"
xml:id="ch_howto_start">
<title>How do we work this?</title>
<para>The working assumption is to start with the existing GCC headers from
<literal>./gcc/config/i386/</literal>, then convert them to PowerISA
and add them to <literal>./gcc/config/rs6000/</literal>.
I assume we will replicate the existing header structure
and retain the existing header file and intrinsic names.
This also allows us to reuse existing DejaGNU test cases from
<literal>./gcc/testsuite/gcc.target/i386</literal>, modify
them as needed for the POWER target, and add them to
<literal>./gcc/testsuite/gcc.target/powerpc</literal>.</para>
<para>We can be flexible on the sequence that headers/intrinsics and test
cases are ported.  This should be based on customer need and resolving
internal dependencies.  This implies an oldest-to-newest / bottoms-up (MMX,
SSE, SSE2, …) strategy. The assumption is, existing community and user
application codes, are more likely to have optimized code for previous
generation ubiquitous (SSE, SSE2, ...) processors than the latest (and rare)
SkyLake AVX512.</para>
<para>I would start with an existing header from the current GCC
 ./gcc/config/i386/ and copy the header comment (including FSF copyright) down
to any vector typedefs used in the API or implementation. Skip the Intel
intrinsic implementation code for now, but add the ending #end if matching the
headers conditional guard against multiple inclusion. You can add additional
#include's as needed. For example:
<programlisting><![CDATA[/* Copyright (C) 2003-2017 Free Software Foundation, Inc
...
/* This header provides a best effort implementation of the Intel X86
* SSE2 intrinsics for the PowerPC target. This implementation is a
* combination of compiled C vector codes or equivalent sequences of
* GCC vector builtins from the GCC PowerPC Altivec target.
*
* However some details of this implementation will differ from
* the X86 due to differences in the underlying hardware or GCC
* implementation. For example the PowerPC target only uses unordered
* floating point compares. */
#ifndef EMMINTRIN_H_
#define EMMINTRIN_H_
#include <altivec.h>
#include <assert.h>
/* We need definitions from the SSE header files. */
#include <xmmintrin.h>
/* The Intel API is flexible enough that we must allow aliasing with other
vector types, and their scalar components. */
typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
/* Internal data types for implementing the intrinsics. */
typedef __vector float __v4sf;
/* more typedefs. */
/* The intrinsic implmentations go here. */
#endif /* EMMINTRIN_H_ */]]></programlisting></para>
<note><para>
The interface typedef (__m128) uses the GCC vector builtin
extension syntax, while the internal typedef (__v4sf) uses the altivec
vector extension syntax.
This allows the internal typedefs to work correctly with the PowerPC
overloaded vector builtins. Also we use the __vector (vs vector) type prefix
to avoid name space conflicts with C++.
</para></note>
<para>Then you can start adding small groups of related intrinsic
implementations to the header to be compiled and  examine the generated code.
Once you have what looks like reasonable code you can grep through
 ./gcc/testsuite/gcc.target/i386 for examples using the intrinsic names you
just added. You should be able to find functional tests for most X86
intrinsics. </para>
<para>The
<link xlink:href="https://gcc.gnu.org/onlinedocs/gccint/Testsuites.html#Testsuites">
<emphasis role="italic">GCC testsuite</emphasis></link>
uses the DejaGNU  test framework as documented in the
<link xlink:href="https://gcc.gnu.org/onlinedocs/gccint/">
<emphasis role="italic">GNU Compiler Collection (GCC)
Internals</emphasis></link>
manual. GCC adds its own DejaGNU directives and extensions,
that are embedded in the testsuite source as comments.  Some are platform
specific and will need to be adjusted for tests that are ported to our
platform. For example
<programlisting><![CDATA[/* { dg-do run } */
/* { dg-options "-O2 -msse2" } */
/* { dg-require-effective-target sse2 } */]]></programlisting></para>
<para>should become something like
<programlisting><![CDATA[/* { dg-do run } */
/* { dg-options "-O3 -mpower8-vector" } */
/* { dg-require-effective-target lp64 } */
/* { dg-require-effective-target p8vector_hw { target powerpc*-*-* } } */]]></programlisting></para>
<para>Repeat this process until you have equivalent DejaGNU test
implementations for all the intrinsics in that header and associated
test cases that execute without error.</para>
<xi:include href="sec_prefered_methods.xml"/>
<xi:include href="sec_prepare.xml"/>
<xi:include href="sec_differences.xml"/>
</chapter>