First draft of PC-relative changes for internal review.

Signed-off-by: Bill Schmidt <wschmidt@linux.ibm.com>
pull/92/head
Bill Schmidt 6 years ago
parent 8af2567a7f
commit 449df05f12

@ -94,11 +94,11 @@
<revhistory>
<!-- TODO: Set the initial version information and clear any old information out -->
<revision>
<date>2018-03-02</date>
<date>2018-03-14</date>
<revdescription>
<itemizedlist spacing="compact">
<listitem>
<para>Revision 1.5: POWER10 support.</para>
<para>Revision 1.5a: PC-relative addressing first draft.</para>
</listitem>
</itemizedlist>
</revdescription>

@ -4032,7 +4032,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</figure>

<note>
<para><xref linkend="dbdoclet.50655240_30073" /> , the alignment of the
<para>In <xref linkend="dbdoclet.50655240_30073" />, the alignment
of the
structure is not affected by the unnamed short and int fields. The
named members are aligned relative to the start of the structure.
However, it is possible that the alignment of the named members is
@ -4044,6 +4045,70 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</section>
</section>
</section>
<section revisionflag="added" xml:id="dbdoclet.50655240_AddrModel">
<title revisionflag="added">Global Data Addressing Models</title>
<para revisionflag="added">This specification provides for two global data
addressing models. The traditional addressing model, which we will call
"TOC-based," relies on a dedicated table-of-contents (TOC) pointer to
obtain the addresses of global data. PowerISA version 3.1 introduces new
"PC-relative" instructions that can be used to obtain the addresses of
global data relative to the current instruction address (CIA). Code that
is targeted to run on hardware compliant with PowerISA 3.1 may make use of
this capability with a "PC-relative" addressing model.</para>
<para revisionflag="added">Each compilation unit must adhere entirely to
one addressing model or the other. However, it is expressly possible to
link TOC-based and PC-relative compilation units into a single
executable, or to dynamically link from a compilation unit with one
addressing model to a compilation unit with the other addressing model.
In particular, a PC-relative compilation unit may be linked with an
existing TOC-based library. Note that a "compilation unit" may consist of
hand-written assembly code as well as high-level source code.</para>
<para revisionflag="added">Compilers and other tools performing
link-time optimizations that repackage functions into different
compilation units must not mix PC-relative and TOC-based functions in
the same compilation unit. [To discuss: This could be permitted, but
the value is unclear and it would be likely to spawn occasional
linker bugs.] Similarly, programmers should not be allowed to
designate a single function in a TOC-based compilation unit to use the
PC-relative addressing model, or vice versa (for example, by using GCC's
"#pragma GCC target" syntax). [To discuss: How should this be recorded and
communicated? Perhaps add to e_flags in the ELF header for module
objects only? We can communicate the need for PC-relative PLT stubs
to the linker on calls with a reloc, so the linker may not need this,
but perhaps other tools will?]</para>
<para revisionflag="added">Details of the two addressing models will be
provided throughout this specification. However, a brief description
of each is in order.</para>
<section revisionflag="added" xml:id="dbdoclet.50655240_TOCBased">
<title revisionflag="added">TOC-Based Addressing Model</title>
<para revisionflag="added">In the traditional TOC-based addressing model,
each function uses register r2 (see <xref
linkend="dbdoclet.50655240_68174" />) to access global memory. A variety
of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc.,
may be used to address the global data, but all these techniques use the
TOC pointer r2 as part of the data reference.</para>
<para revisionflag="added">With the cooperation of the linker, each
function in a TOC-based compilation unit is responsible for the
establishment and maintenance of its own TOC pointer. All functions
within a compilation unit have the same TOC pointer, so local function
calls may assume it does not change. An external function call may be
resolved to a function in a shared object having a different TOC
pointer, so a caller in a TOC-based compilation unit must save its TOC
pointer prior to making a call outside the compilation unit, and restore
its value upon return before the TOC pointer may be used to access global
data.</para>
</section>
<section revisionflag="added" xml:id="dbdoclet.50655240_PCRel">
<title revisionflag="added">PC-Relative Addressing Model</title>
<para revisionflag="added">A function in a PC-relative compilation unit
has no TOC pointer. All accesses to global data are made relative to
the current instruction address. Since functions in TOC-based
compilation units are responsible for establishment and maintenance
of their own TOC pointers, register r2 may be used freely within a
PC-relative compilation unit, with no need to save or restore the
register when modifying it.</para>
</section>
</section>
<section xml:id="dbdoclet.50655240_85672">
<title>Function Calling Sequence</title>
<para>The standard sequence for function calls is outlined in this section.
@ -4208,15 +4273,22 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</entry>
<entry>
<para>Nonvolatile<footnote>
<para>Register r2 is nonvolatile with respect to calls
between functions in the same compilation unit. It is saved
and restored by code inserted by the linker resolving a
call to an external function. For more information, see
<xref linkend="dbdoclet.50655240_51083" />.</para>
</footnote></para>
<para><phrase revisionflag="changed">In a TOC-based
compilation unit, register</phrase> r2 is nonvolatile with
respect to calls between functions in the same compilation
unit. It is saved and restored by code inserted by the linker
resolving a call to an external function. For more
information, see <xref linkend="dbdoclet.50655240_51083"
/>.</para>
</footnote><phrase revisionflag="added"> or
Volatile<footnote>
<para>Register r2 is volatile and available for use in
PC-relative compilation units.</para>
</footnote></phrase></para>
</entry>
<entry>
<para>TOC pointer.</para>
<para>TOC pointer <phrase revisionflag="added"> for
TOC-based compilation units</phrase>.</para>
</entry>
</row>
<row>
@ -4388,7 +4460,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
</table>
<para>&#160;</para>
<bridgehead xml:id="dbdoclet.50655240_51083">TOC Pointer
Usage</bridgehead>
Usage <phrase revisionflag="added">(TOC-Based Compilation Units
Only)</phrase></bridgehead>
<para>As described in
<xref linkend="dbdoclet.50655241_73385" />, the TOC pointer, r2, is
commonly initialized by the global function entry point when a function
@ -4497,12 +4570,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
mask the value received from
<emphasis role="bold">mfocr</emphasis> to avoid corruption of the resulting
(partial) condition register word.</para>
<para>This erratum does not apply to the POWER9 processor.</para>
<para>This erratum does not apply to <phrase
revisionflag="changed">POWER9 and subsequent
processors.</phrase></para>
</note>

<para><anchor xml:id="dbdoclet.50655240_Power-ISA-version-and-the-user-s-manual"
xreflabel="" />For more information, see
<citetitle>Power ISA</citetitle>, version 3.0 and "Fixed-Point Invalid
<citetitle>Power ISA</citetitle>, version <phrase
revisionflag="changed">3.0B</phrase> and "Fixed-Point Invalid
Forms and Undefined Conditions" in
<citetitle>POWER9 Processor User's Manual.</citetitle></para>
<bridgehead>Floating-Point Registers</bridgehead>
@ -5124,8 +5200,16 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
is volatile over a function call.</para>
<para>&#160;</para>
<bridgehead>TOC Pointer Doubleword</bridgehead>
<para>If a function changes the value of the TOC pointer register, it
shall first save it in the TOC pointer doubleword.</para>
<para>If a function <phrase revisionflag="added">in a TOC-based
compilation unit</phrase> changes the value of the TOC pointer
register, it shall first save it in the TOC pointer doubleword.
<phrase revisionflag="added">The TOC pointer doubleword is reserved
for future use for functions in a PC-relative compilation
unit. [To discuss: This has implications for alloca, as if we
reserve it for future use, then the TOC pointer doubleword must be
copied during a dynamic allocation operation. I suspect it is
better to suffer that slight penalty rarely in order to have the
flexibility to use this for another future purpose.]</phrase></para>
</section>
<section xml:id="dbdoclet.50655240_15141">
<title>Optional Save Areas</title>
@ -5252,7 +5336,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">
<para>Functions without a suitable declaration available to the
caller to determine the called function's characteristics (for
example, functions in C without a prototype in scope, in accordance
with Brian Kernighan and Dennis Ritche,
with Brian Kernighan and Dennis <phrase
revisionflag="changed">Ritchie</phrase>,
<citetitle>The C Programming Language</citetitle>, 1st
edition).</para>
</listitem>
@ -6220,6 +6305,16 @@ ld r12, 0(r12)

ld r12, symbol2@got(r2)
lvx v1, 0, r12</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">
By using PC-relative addressing.
</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1

plvx v1, symbol@pcrel(0), 1</programlisting>
<para>In the OpenPOWER ELF V2 ABI, position-dependent code built with
this addressing scheme may have a Global Offset Table (GOT) in the data
segment that holds addresses. (For more information, see
@ -6259,6 +6354,12 @@ lvx v1, 0, r12</programlisting>
loaded in the first 2 GB of the address space because direct address
references and TOC-pointer initializations can be performed using a
two-instruction sequence.</para>
<para revisionflag="added">
PC-relative offsets are always 34 bits for all code models, with
a maximum addressing reach of 16 GB. The effective addressing reach
for global data is 8 GB, since data sections are always located at
higher virtual addresses than text sections.
</para>
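<para revisionflag="added">
The reach figures above follow from treating the 34-bit offset as a
signed displacement from the current instruction address. The
following C sketch illustrates that arithmetic only. The helper name
fits_pcrel34 and the two macros are hypothetical and are not defined
by this ABI.
</para>
<programlisting revisionflag="added"><![CDATA[
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. A signed 34-bit
 * displacement covers the range [-2^33, 2^33 - 1], a span of 16 GB,
 * of which 8 GB lies at addresses above the referencing instruction. */
#define PCREL34_MIN (-(INT64_C(1) << 33))      /* -8 GB    */
#define PCREL34_MAX ((INT64_C(1) << 33) - 1)   /* 8 GB - 1 */

/* Reports whether a byte offset from the current instruction address
 * can be encoded as a signed 34-bit displacement. */
bool fits_pcrel34(int64_t offset)
{
    return offset >= PCREL34_MIN && offset <= PCREL34_MAX;
}
```
]]></programlisting>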
</section>
<section>
<title>Position-Independent Code</title>
@ -6318,6 +6419,47 @@ ld r12, 0(r12)

ld r12, symbol2@got(r2)
lvx v1, 0, r12</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">By using PC-relative addressing (for
private data).</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@pcrel(0), 1

plvx v1, symbol@pcrel(0), 1</programlisting>
<itemizedlist>
<listitem>
<para revisionflag="added">By using PC-relative GOT-indirect
addressing (for shared data or very large span from code to data):
</para>
</listitem>
</itemizedlist>
<programlisting revisionflag="added">pld r12, symbol@got@pcrel(0), 1
ld r12, 0(r12)

pld r12, symbol@got@pcrel(0), 1
lvx v1, 0, r12</programlisting>
<para revisionflag="added">
A compiler may generate a PC-relative addressing sequence to access
static or restricted-visibility data, but must generate a PC-relative
GOT-indirect sequence for extern data. Extern data may be satisfied
from a statically or dynamically linked source, so the compiler must
be conservative. The compiler and linker can cooperate to replace a
PC-relative GOT-indirect sequence with a PC-relative sequence when
the data reference is satisfied at static link time. See
<xref linkend="dbdoclet.50655241_OptPCRel" />.
</para>
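<para revisionflag="added">
As a rough illustration of the rule above, the following C sketch
contrasts a static object with an extern one. The comments name the
kind of sequence a compiler may choose. They are assumptions made for
illustration, not the required output of any particular compiler, and
the identifiers call_count, shared_total, and record are hypothetical.
</para>
<programlisting revisionflag="added"><![CDATA[
```c
/* Illustration only; actual code generation depends on the compiler
 * and its options. */

static int call_count;    /* static: a PC-relative sequence may be used  */
extern int shared_total;  /* extern: a PC-relative GOT-indirect sequence
                             is the conservative choice                  */

/* Defined here only so that the sketch links as a single file. In a
 * real program this definition would typically live in another
 * compilation unit or in a shared object. */
int shared_total;

int record(int amount)
{
    call_count++;            /* for example, plwz and pstw using
                                call_count@pcrel                         */
    shared_total += amount;  /* for example, pld using
                                shared_total@got@pcrel, then lwz and stw
                                through the loaded address               */
    return call_count;
}
```
]]></programlisting>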
<para revisionflag="added">[To discuss: I'd like to see the assembler
support "pld r12, symbol@pcrel" as an alternative to "pld r12,
symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an
alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any
prefix load/store with only two arguments is PC-relative; the
second argument is either a 34-bit offset or a GPR. Is this
reasonable or too confusing? Another alternative would be "pld r12,
symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the
GPR case. I guess we want something readable that isn't too
complex for the assembler to sort out.]</para>
<para>Position-independent executables or shared objects have a GOT in
the data segment that holds addresses. When the system creates a memory
image from the file, the GOT entries are updated to reflect the
@ -6335,6 +6477,8 @@ lvx v1, 0, r12</programlisting>
</section>
<section xml:id="dbdoclet.50655240_19143">
<title>Code Models</title>
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para>Compilers may provide different code models depending on the
expected size of the TOC and the size of the entire executable or
shared library.</para>
@ -6359,7 +6503,8 @@ lvx v1, 0, r12</programlisting>
addition, accesses to module-local code and data objects use TOC
pointer relative addressing with 32-bit offsets. Using TOC pointer
relative addressing removes a level of indirection, resulting in
faster access and a smaller GOT. However. it limits the size of the
faster access and a smaller GOT. <phrase
revisionflag="changed">However,</phrase> it limits the size of the
entire binary to between 2 GB and 4 GB, depending on the placement
of the TOC base.</para>
<note>
@ -6379,6 +6524,53 @@ lvx v1, 0, r12</programlisting>
TOCs, or by some other method. The suggested allocation order of
sections is provided in
<xref linkend="dbdoclet.50655241_66700" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
Compilers may provide different code models depending on the size of
the entire executable or shared library. There is no small code
model for PC-relative compilation units.
</para>
<itemizedlist revisionflag="added">
<listitem>
<para>
Medium code model: Accesses to module-local code and data objects
use PC-relative addressing with 34-bit offsets.
Position-independent code uses PC-relative GOT-indirect
addressing to access other objects in the binary.
</para>
</listitem>
<listitem>
<para>
Large code model: Used when 34-bit offsets are insufficient to
reach global data or the GOT from at least one text section.
This model is similar to the medium code model, except that
PC-relative offsets of up to 64 bits are used by generating them
into a register. [To discuss: None of the options for this seem ideal.
It takes about 5 instructions to generate a 64-bit constant into
a register, though we can perhaps use linker optimizations to
replace with a smaller sequence when available. A second choice
is to place the offset in a .quad in the text section to reach
the .got entry, but this would incur a load-load dependency.
(Are there cases where this requires a text relocation resolution
during dynamic linking?) A third choice is to fail the compile
and require TOC addressing with large code model when 34-bit
offsets aren't enough, though that doesn't initially seem
reasonable. Whatever we choose, we should document the sequence
and any associated linker optimizations.]
</para>
</listitem>
</itemizedlist>
<para revisionflag="added">
As with TOC-based compilation units, the medium code model is the
default for compilers, and is applicable to most programs and
libraries. The code examples in this document generally use the
medium code model.
</para>
<para revisionflag="added">
When linking PC-relative relocatable objects, the linker should
attempt to place the .got section near the text sections.
</para>
</section>
</section>
<section xml:id="dbdoclet.50655240_12107">
@ -6387,9 +6579,50 @@ lvx v1, 0, r12</programlisting>
section.</para>
<section xml:id="dbdoclet.50655240___RefHeading___Toc377640597">
<title>Function Prologue</title>
<para>A function's prologue establishes addressability by initializing
a TOC pointer in register r2, if necessary, and a stack frame, if
necessary, and may save any nonvolatile registers it uses.</para>
<para revisionflag="added">The function prologue is responsible for
the following functions:</para>
<itemizedlist revisionflag="added">
<listitem>
<para>Establishing addressability to global data</para>
</listitem>
<listitem>
<para>Creating a stack frame when required</para>
</listitem>
<listitem>
<para>Saving any nonvolatile registers that are used by the
function</para>
</listitem>
<listitem>
<para>Saving any limited-access bits that are used by the function,
per the rules described in <xref
linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem>
</itemizedlist>
<para revisionflag="added">This ABI shall be used in conjunction with
the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction;
for example, to yield a word in r0 including all three preserved CRs as
follows:</para>
<programlisting revisionflag="added">mfocrf r0, crf2
mfocrf r1, crf3
or r0, r0, r1
mfocrf r1, crf4
or r0, r0, r1</programlisting>
<para revisionflag="added">Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at
the point where the <emphasis role="bold">mfocrf</emphasis>
instruction is performed.</para>
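<para revisionflag="added">
The reason the OR combination described above is safe can be modeled
in C. The sketch below is illustrative only: it assumes the simplest
permitted behavior, in which every field other than the selected one
reads as 0, and the function names mfocrf_model and preserved_cr_word
are hypothetical.
</para>
<programlisting revisionflag="added"><![CDATA[
```c
#include <stdint.h>

/* Model of one permitted mfocrf result, for illustration only: the
 * selected 4-bit CR field appears in its own position and every other
 * field reads as 0. */
uint32_t mfocrf_model(uint32_t cr, unsigned field)   /* field: 0..7 */
{
    unsigned shift = 28u - 4u * field;   /* CR0 is the top 4 bits */
    return cr & (UINT32_C(0xF) << shift);
}

/* OR the results for CR2, CR3, and CR4 into a single word, as the
 * assembly sequence above does. */
uint32_t preserved_cr_word(uint32_t cr)
{
    return mfocrf_model(cr, 2) | mfocrf_model(cr, 3) | mfocrf_model(cr, 4);
}
```
]]></programlisting>
<para revisionflag="added">
Because each partial result contributes either 0 or the correct value
to every field position, the OR of the results carries the correct
value in each selected field.
</para>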
<bridgehead revisionflag="added">TOC-Based Compilation
Units</bridgehead>
<para><phrase revisionflag="changed">In a TOC-based compilation unit,
a</phrase> function's prologue establishes addressability by
initializing a TOC pointer in register r2, if necessary, and a stack
frame, if necessary, and may save any nonvolatile registers it
uses.</para>
<para>All functions have a global entry point (GEP) available to any
caller and pointing to the beginning of the prologue. Some functions
may have a secondary entry point to optimize the cost of TOC pointer
@ -6420,9 +6653,10 @@ addi r2, r2, .TOC.-func@l</programlisting>
form that is faster due to instruction fusion, such as:</para>
<programlisting>lis r2, .TOC.@ha
addi r2, r2, .TOC.@l</programlisting>
<para>In addition to establishing addressability, the function prologue
<para revisionflag="deleted">In addition to establishing
addressability, the function prologue
is responsible for the following functions:</para>
<itemizedlist>
<itemizedlist revisionflag="deleted">
<listitem>
<para>Creating a stack frame when required</para>
</listitem>
@ -6436,24 +6670,25 @@ addi r2, r2, .TOC.@l</programlisting>
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640581" /></para>
</listitem>
</itemizedlist>
<para>This ABI shall be used in conjunction with the Power Architecture
that implements the
<para revisionflag="deleted">This ABI shall be used in conjunction with
the Power Architecture that implements the
<emphasis role="bold">mfocrf</emphasis> architecture level. Further,
OpenPOWER-compliant processors shall implement implementation-defined
bits in a manner to allow the combination of multiple
<emphasis role="bold">mfocrf</emphasis> results with an OR instruction; for example,
to yield a word in r0 including all three preserved CRs as
follows:</para>
<programlisting>mfocrf r0, crf2
<programlisting revisionflag="deleted">mfocrf r0, crf2
mfocrf r1, crf3
or r0, r0, r1
mfocrf r1, crf4
or r0, r0, r1</programlisting>
<para>Specifically, this allows each OpenPOWER-compliant processor
implementation to set each field to hold either 0 or the correct
in-order value of the corresponding CR field at the point where the
<emphasis role="bold">mfocrf</emphasis> instruction is performed.</para>
<para>&#160;</para>
<para revisionflag="deleted">Specifically, this allows each
OpenPOWER-compliant processor implementation to set each field to hold
either 0 or the correct in-order value of the corresponding CR field at
the point where the <emphasis role="bold">mfocrf</emphasis>
instruction is performed.</para>
<para revisionflag="deleted">&#160;</para>
<bridgehead>Assembly Language Syntax for Defining Entry
Points</bridgehead>
<para>When a function has two entry points, the global entry point is
@ -6472,6 +6707,14 @@ or r0, r0, r1</programlisting>
the meaning of the second parameter, which is put in the three
most-significant bits of the st_other field in the ELF Symbol Table
entry.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
In a PC-relative compilation unit, the function prologue does not
require any setup code to establish addressability to global data.
Therefore, there is also no need for a function to have a separate
local entry point.
</para>
</section>
<section xml:id="dbdoclet.50655240_13754">
<title>Function Epilogue</title>
@ -6884,11 +7127,13 @@ _restvr_31: addi r12,r0,-16
<xref linkend="dbdoclet.50655242_page119" /> shows an example of this
method.</para>
<para>Examples of absolute and position-independent compilations are
shown in
<xref linkend="dbdoclet.50655240_12719" />,
<xref linkend="dbdoclet.50655240_page77" />, and
<xref linkend="dbdoclet.50655240_19926" />. These examples show the C
language statements together with the generated assembly language. The
shown in <phrase revisionflag="changed"><xref
linkend="dbdoclet.50655240_12719" />,
<xref linkend="dbdoclet.50655240_page77" />,
<xref linkend="dbdoclet.50655240_19926" />, and
<xref linkend="dbdoclet.50655240_StaticPCRel" /></phrase>. These
examples show the
C language statements together with the generated assembly language. The
assumption for these figures is that only executables can use absolute
addressing while shared objects must use position-independent code
addressing. The figures are intended to demonstrate the compilation of
@ -7151,6 +7396,60 @@ stw r0,0,(r7)</programlisting>
</tbody>
</tgroup>
</table>

<table frame="all" pgwide="1" xml:id="dbdoclet.50655240_StaticPCRel"
revisionflag="added">
<title>PC-Relative Load and Store</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">C Code</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Assembly Code</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<programlisting>extern int src;
extern int dst;
int *ptr;

dst = src;

ptr = &amp;dst;

*ptr = src;


</programlisting>
</entry>
<entry>
<programlisting>.extern src
.extern dst
.extern ptr
.section ".text"
plwz r9, src@pcrel(0), 1
pstw r9, dst@pcrel(0), 1
paddi r11, 0, dst@pcrel, 1
pstd r11, ptr@pcrel(0), 1
pld r11, ptr@pcrel(0), 1
plwz r9, src@pcrel(0), 1
stw r9, 0(r11)</programlisting>
</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<itemizedlist>
<listitem>
@ -7311,9 +7610,16 @@ nop</programlisting>
<xref linkend="dbdoclet.50655242_20388" />.</para>
</listitem>
</orderedlist>
<para revisionflag="added">
For a function call in a PC-relative compilation unit, the nop in
<xref linkend="dbdoclet.50655240_85319" /> should not be generated.
</para>
<para>For indirect function calls, the address of the function to be
called is placed in r12 and the CTR register. A bctrl instruction is used
to perform the indirect branch as shown in
to perform the indirect branch as shown in
<phrase revisionflag="added">
<xref linkend="dbdoclet.50655240_95364" />,
</phrase>
<xref linkend="dbdoclet.50655240_16744" />, and
<xref linkend="dbdoclet.50655240_95225" />. The ELF V2 ABI requires the
address of the called function to be in r12 when a cross-module function
@ -7381,7 +7687,11 @@ bctrl</programlisting>
</table -->
<para>
<xref linkend="dbdoclet.50655240_16744" /> shows how to make an indirect
function call using small-model position-independent code.</para>
function call using small-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 are not required in a PC-relative compilation
unit.</phrase>
</para>

<figure xml:id="dbdoclet.50655240_16744">
<title>Small-Model Position-Independent Indirect Function Call</title>
@ -7451,7 +7761,11 @@ ld r2,24(r1)</programlisting>
</table -->
<para>
<xref linkend="dbdoclet.50655240_95225" /> shows how to make an indirect
function call using large-model position-independent code.</para>
function call using large-model position-independent code.
<phrase revisionflag="added">Note that the store and reload of the
TOC pointer r2 are not required in a PC-relative compilation
unit.</phrase>
</para>

<figure xml:id="dbdoclet.50655240_95225">
<title>Large-Model Position-Independent Indirect Function Call</title>
@ -7521,6 +7835,7 @@ ld r2,24(r1)</programlisting>
</tbody>
</tgroup>
</table -->
<bridgehead revisionflag="added">TOC-Based Compilation Units</bridgehead>
<para>Function calls need to be performed in conjunction with
establishing, maintaining, and restoring addressability through the TOC
pointer register, r2. When a function is called, the TOC pointer register
@ -7553,6 +7868,19 @@ bl target
<xref linkend="dbdoclet.50655240___RefHeading___Toc377640597" />,
<xref linkend="dbdoclet.50655241_95185" />, and
<xref linkend="dbdoclet.50655241_47572" />.</para>
<bridgehead revisionflag="added">PC-Relative Compilation
Units</bridgehead>
<para revisionflag="added">
As with TOC-based compilation units, for calls to functions resolved at
runtime, the linker must generate stub code to load the function
address from the PLT. When the stub code is generated on behalf of
an indirect call in a PC-relative compilation unit, the linker may
omit the save and restore of r2 from the stub code. This behavior
is optional but recommended. Calls in PC-relative code should not
be marked with the R_PPC64_TOCSAVE or R_PPC64_REL24_NOTOC relocations.
[To discuss: Do we need a relocation to identify this as a PC-relative
call?]
</para>
</section>
<section xml:id="dbdoclet.50655240_47036">
<title>Branching</title>
@ -7947,6 +8275,75 @@ f1:
.long .TOC. - Ldefault
.long .TOC. - Lcase13</programlisting>
</figure>
<para revisionflag="added">
<xref linkend="dbdoclet.50655240_PCRelSwitch" /> shows a switch
implementation for PC-relative compilation units. [TBD: This needs to
be a figure, not a table, which may require working with Annette and
FrameMaker to get something that looks similar to the other figures.
All we have in the document for the other figures is .png files from
the old FrameMaker version. Or maybe we should just convert all the
other figures to tables.]
</para>
<table frame="all" pgwide="1" xml:id="dbdoclet.50655240_PCRelSwitch"
revisionflag="added">
<title>
Position-Independent Switch Code (PC-Relative Addressing)
</title>
<tgroup cols="2">
<colspec colname="c1" colwidth="30*" />
<colspec colname="c2" colwidth="70*" />
<thead>
<row>
<entry>
<para>
<emphasis role="bold">C Code</emphasis>
</para>
</entry>
<entry>
<para>
<emphasis role="bold">Assembly Code</emphasis>
</para>
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
<programlisting>switch(j)
{
case 0:
...
case 1:
...
case 3:
...
default:
...
}


</programlisting>
</entry>
<entry>
<programlisting> cmplwi r12, 4
bge .Ldefault
slwi r12, r12, 2
paddi r10, r0, .Ltab@pcrel, 1
lwax r8, r10, r12
add r10, r8, r10
mtctr r10
bctr
.p2align 2
.Ltab:
.word (.Lcase0-.Ltab)
.word (.Lcase1-.Ltab)
.word (.Ldefault-.Ltab)
.word (.Lcase3-.Ltab)</programlisting>
</entry>
</row>
</tbody>
</tgroup>
</table>
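<para revisionflag="added">
The address arithmetic behind the table-relative dispatch above can
be sketched in C, using plain integers in place of code addresses.
This is a model for illustration only. The function name
switch_target and its parameters are hypothetical, and the table size
of four entries matches the example.
</para>
<programlisting revisionflag="added"><![CDATA[
```c
#include <stdint.h>

/* Illustration only: each table entry holds (case label - table
 * address) as a signed 32-bit word, so the table itself needs no
 * dynamic relocation. */
uint64_t switch_target(uint64_t tab_addr, const int32_t tab[4],
                       uint32_t j, uint64_t default_addr)
{
    /* cmplwi is an unsigned comparison, so a negative selector is
     * also routed to the default label. */
    if (j >= 4)
        return default_addr;

    /* lwax sign-extends the entry, so a label located before the
     * table (a negative offset) is handled correctly. */
    return tab_addr + (uint64_t)(int64_t)tab[j];
}
```
]]></programlisting>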
</section>
<section xml:id="dbdoclet.50655240_32686">
<title>Dynamic Stack Space Allocation</title>
@ -8019,6 +8416,11 @@ addi r3,r1,p ; R3 = new data area following parameter save area.</pro
a value that needs to be preserved. In the future, if it is defined and
if the function uses the Reserved word, the LR save doubleword must also
be copied.</para>
<para revisionflag="added">
It is unnecessary to copy the TOC pointer doubleword for a
PC-relative compilation unit. [To discuss: Should we, for future
use of this slot for another purpose?]
</para>
<note>
<para>Additional instructions will be necessary for an allocation of
variable size. If a dynamic deallocation will occur, the r1 stack

@ -245,7 +245,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
</section>
<section xml:id="dbdoclet.50655241_66700">
<title>TOC</title>
<para>The TOC is part of the data segment of an executable program.</para>
<para>The TOC is part of the data segment of an executable program
<phrase revisionflag="added">built from at least one TOC-based object
file</phrase>.</para>
<para>This section describes a common layout of the TOC in an executable
file or shared object. Particular tools are not required to follow the
layout specified here.</para>
@ -280,19 +282,21 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
instruction of the two instruction form with a nop and rewriting the second
instruction. Consequently, the TOC pointer must be live during the first
and second instruction of a two-instruction reference.)</para>
<para>&#160;</para>
<bridgehead>Modules Containing Multiple TOCs</bridgehead>
<para>The link editor may create multiple TOCs. In such a case, the
constituent .got, .toc, .sdata, and .sbss sections are conceptually
repeated as necessary, with each TOC typically using a TOC pointer value
of its base plus 0x8000. Any constituent section of type SHT_NOBITS in
any TOC but the last is converted to type SHT_PROGBITS filled with
zeros.</para>
<para>When multiple TOCs are present, linking must take care to save,
initialize, and restore TOC pointers within a single module when calling
from one function to a second function using a different TOC pointer
value. Many of the same issues associated with a cross-module call apply
also to calls within a module but using different TOC pointers.</para>
<para revisionflag="deleted">&#160;</para>
<section>
<title revisionflag="changed">Modules Containing Multiple TOCs</title>
<para>The link editor may create multiple TOCs. In such a case, the
constituent .got, .toc, .sdata, and .sbss sections are conceptually
repeated as necessary, with each TOC typically using a TOC pointer value
of its base plus 0x8000. Any constituent section of type SHT_NOBITS in
any TOC but the last is converted to type SHT_PROGBITS filled with
zeros.</para>
<para>When multiple TOCs are present, linking must take care to save,
initialize, and restore TOC pointers within a single module when calling
from one function to a second function using a different TOC pointer
value. Many of the same issues associated with a cross-module call apply
also to calls within a module but using different TOC pointers.</para>
</section>
</section>
<section xml:id="dbdoclet.50655241_73385">
<title>Symbol Table</title>
@ -302,7 +306,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.</progra
resolved dynamically by an associated shared object will have a symbol
table entry for that symbol. This entry will identify the symbol as
undefined by setting the st_shndx member to SHN_UNDEF.</para>
<para>The OpenPOWER ABI uses the three most-significant bits in the
<para><phrase revisionflag="added">For TOC-based compilation
units,</phrase> <phrase revisionflag="changed">the</phrase> OpenPOWER
ABI uses the three most-significant bits in the
symbol st_other field to specify the number of instructions between a
function's global entry point and local entry point. The global entry
point is used when it is necessary to set up the TOC pointer (r2) for the
@ -2115,10 +2121,273 @@ my_func:
</tbody>
</tgroup>
</informaltable>
<para revisionflag="added">
In the following figure, prefix34 specifies a 34-bit field split
between bits 14-31 and 48-63 of a doubleword. The other bits
remain unchanged. This is used by PC-relative load and store
instructions.
</para>
<informaltable frame="all" rowsep="0" colsep="0" revisionflag="added">
<tgroup cols="5">
<colspec colname="c1" colwidth="7*" />
<colspec colname="c2" colwidth="7*" />
<colspec colname="c3" colwidth="2*" />
<colspec colname="c4" colwidth="8*" />
<colspec colname="c5" colwidth="8*" />
<tbody>
<row>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c3" align="center">
<para>prefix34</para>
</entry>
</row>
<row rowsep="1">
<entry align="left">
<para>0</para>
</entry>
<entry align="right" colsep="1">
<para>13</para>
</entry>
<entry align="left">
<para>14</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right">
<para>31</para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c4" align="center">
<para>prefix34 (continued)</para>
</entry>
</row>
<row>
<entry align="left">
<para>32</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para>47</para>
</entry>
<entry align="left">
<para>48</para>
</entry>
<entry align="right">
<para>63</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
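<para>
As an illustration only, the following Python sketch shows one way a
tool might place a signed 34-bit value into the prefix34 field. It
assumes IBM bit numbering (bit 0 is the most-significant bit of the
doubleword) and assumes that the high-order 18 bits of the value occupy
bits 14-31 and the low-order 16 bits occupy bits 48-63. Neither
assumption is stated in the text above, and the function name
insert_prefix34 is hypothetical.
</para>

```python
def insert_prefix34(doubleword, value):
    """Return `doubleword` with `value` stored in the prefix34 field.

    Assumed layout (IBM bit numbering, bit 0 = most significant):
      bits 14-31  high-order 18 bits of the 34-bit value
      bits 48-63  low-order 16 bits of the 34-bit value
    All other bits of the doubleword are left unchanged.
    """
    if value not in range(-(2**33), 2**33):
        raise ValueError("value does not fit in a signed 34-bit field")

    field = value % 2**34                    # two's-complement, 34 bits
    high18, low16 = divmod(field, 2**16)

    # Split the doubleword into its upper and lower 32-bit words.
    upper, lower = divmod(doubleword, 2**32)
    upper = upper - upper % 2**18 + high18   # replace bits 14-31
    lower = lower - lower % 2**16 + low16    # replace bits 48-63
    return upper * 2**32 + lower
```

<para>
For example, insert_prefix34(0, -1) sets exactly the 34 field bits and
nothing else, which makes it easy to compare against the figure.
</para>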
<para revisionflag="added">
In the following figure, prefix34ds is similar to prefix34, but only
32 bits are encoded: the two least-significant bits of the 34-bit value
must be zero and are not part of the field. This is used, for example,
by the pldu instruction. In addition to the use of this relocation
field with the DS forms, prefix34ds relocations are also used in
conjunction with DQ forms, such as the plq instruction. In those
instances, the linker and assembler collaborate to create valid DQ
forms, and raise an error if the specified offset does not meet the
constraints of a valid DQ instruction form displacement.
</para>
<informaltable frame="all" rowsep="0" colsep="0" revisionflag="added">
<tgroup cols="7">
<colspec colname="c1" colwidth="7*" />
<colspec colname="c2" colwidth="7*" />
<colspec colname="c3" colwidth="2*" />
<colspec colname="c4" colwidth="7*" />
<colspec colname="c5" colwidth="7*" />
<colspec colname="c6" colwidth="1*" />
<colspec colname="c7" colwidth="1*" />
<tbody>
<row>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c7" namest="c3" align="center">
<para>prefix34ds</para>
</entry>
</row>
<row rowsep="1">
<entry align="left">
<para>0</para>
</entry>
<entry align="right" colsep="1">
<para>13</para>
</entry>
<entry align="left">
<para>14</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right">
<para>31</para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para> </para>
</entry>
<entry nameend="c5" namest="c4" align="center" colsep="1">
<para>prefix34ds (continued)</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
<row>
<entry align="left">
<para>32</para>
</entry>
<entry>
<para> </para>
</entry>
<entry align="right" colsep="1">
<para>47</para>
</entry>
<entry align="left">
<para>48</para>
</entry>
<entry align="right" colsep="1">
<para>61</para>
</entry>
<entry align="left">
<para>62</para>
</entry>
<entry align="right">
<para>63</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
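<para>
The displacement check described for prefix34ds could be sketched as
follows. This is a rough illustration, not a definitive implementation:
the function name prefix34ds_field is hypothetical, and the requirement
that a DQ-form displacement be a multiple of 16 is an assumption carried
over from existing DQ instruction forms rather than something this draft
states.
</para>

```python
def prefix34ds_field(offset, dq_form=False):
    """Return the 32-bit prefix34ds field value for a byte offset.

    DS forms: the two least-significant bits of the offset must be zero.
    DQ forms: assumed here to need a multiple of 16 (not stated in the
    draft); the field itself is still the offset shifted right by two.
    """
    if offset not in range(-(2**33), 2**33):
        raise ValueError("offset does not fit in a signed 34-bit value")

    alignment = 16 if dq_form else 4
    if offset % alignment != 0:
        kind = "DQ" if dq_form else "DS"
        raise ValueError(f"offset {offset} is not a valid {kind} displacement")

    # Dropping the two zero bits leaves a 32-bit two's-complement field.
    return (offset // 4) % 2**32
```

<para>
Raising an error rather than silently rounding mirrors the behavior the
paragraph above asks of the assembler and linker.
</para>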
</section>
<section xml:id="dbdoclet.50655241_51269">
<title>Relocation Notations</title>
<para>The following notations are used in the relocation table.</para>
<para revisionflag="added">
[There seem to be a number of missing notations in this table. We
have #higher[a], #highest[a], and got, and perhaps the @ notation
could use further description. Also, there is some usage of #high and
#higha instead of #hi and #ha, which I assume is a mistake.]
</para>
<para> </para>
<informaltable frame="none" rowsep="0" colsep="0">
<tgroup cols="2">
@ -2350,6 +2619,15 @@ my_func:
<para>tp + tprel = (S + A)</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>pcrel</para>
</entry>
<entry>
<para>Represents the offset of the symbol being relocated
relative to the current instruction address.</para>
</entry>
</row>
<row>
<entry>
<para>tlsgd</para>
@ -4143,9 +4421,84 @@ my_func:
<para> </para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL34</para>
</entry>
<entry>
<para>256?</para>
</entry>
<entry>
<para>prefix34</para>
</entry>
<entry>
<para>@pcrel</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL34_DS</para>
</entry>
<entry>
<para>257?</para>
</entry>
<entry>
<para>prefix34ds*</para>
</entry>
<entry>
<para>@pcrel >> 2</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_GOT_PCREL34</para>
</entry>
<entry>
<para>258?</para>
</entry>
<entry>
<para>prefix34</para>
</entry>
<entry>
<para>@got@pcrel</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_GOT_PCREL34_DS</para>
</entry>
<entry>
<para>259?</para>
</entry>
<entry>
<para>prefix34ds*</para>
</entry>
<entry>
<para>@got@pcrel >> 2</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>R_PPC64_PCREL_OPT</para>
</entry>
<entry>
<para>260?</para>
</entry>
<entry>
<para> </para>
</entry>
<entry>
<para> </para>
</entry>
</row>
</tbody>
</tgroup>
</table>
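<para>
To make the @pcrel and @got@pcrel expressions concrete, here is a
minimal Python sketch. It assumes the conventional ELF notation in which
S is the symbol value, A is the addend, and P is the address of the
instruction being relocated, and it assumes the GOT variant substitutes
the address of the symbol's GOT entry for S + A. The draft does not
spell out either formula, so both are labeled assumptions, and the
function names are hypothetical.
</para>

```python
def pcrel(symbol, addend, place):
    """@pcrel, assumed to be S + A - P.

    symbol -- S, the value (address) of the referenced symbol
    addend -- A, the relocation addend
    place  -- P, the address of the instruction being relocated
    """
    return symbol + addend - place


def got_pcrel(got_entry, place):
    """@got@pcrel, assumed to be the GOT entry address minus P."""
    return got_entry - place
```

<para>
The result of either function would then be range-checked and stored in
the relocation field named in the table.
</para>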
<para revisionflag="added">
[To discuss: Assuming we build up 64-bit PC-relative offsets into a
register using shifts/adds, we'll need the #lo, #ha, #higher[a],
#highest[a] relocs to be defined also.]
</para>
</section>
<section xml:id="dbdoclet.50655241_90220">
<title>Relocation Descriptions</title>
@ -4239,6 +4592,13 @@ my_func:
associated with a global entry point. See
<xref linkend="dbdoclet.50655241_95185" /> for discussion of its
use.</para>
<para revisionflag="added">R_PPC64_PCREL_OPT</para>
<para revisionflag="added">
This relocation type requests that the annotated load or store
instruction and its immediately preceding instruction be optimized by
the linker when the referenced symbol can be statically resolved.
See <xref linkend="dbdoclet.50655241_OptPCRel" /> for details.
</para>
</section>
<section>
<title>Assembler Syntax</title>
@ -4301,10 +4661,14 @@ addi 2,2,.TOC.-func@l</programlisting>
requirements as indicated in this section.</para>
<section xml:id="dbdoclet.50655241_69294">
<title>Function Call</title>
<para>The static linker must modify a nop instruction after a bl function
<para><phrase revisionflag="added">For TOC-based compilation
units,</phrase> <phrase revisionflag="changed">the</phrase>
static linker must modify a nop instruction after a bl function
call to restore the TOC pointer in r2 from 24(r1) when an external symbol
that may use the TOC may be called, as in
<xref linkend="dbdoclet.50655240_88555" />. Object files must contain a
<xref linkend="dbdoclet.50655240_88555" />.
<phrase revisionflag="added">TOC-based</phrase>
<phrase revisionflag="changed">object</phrase> files must contain a
nop slot after a bl instruction to an external symbol.</para>
</section>
<section>
@ -4375,6 +4739,46 @@ target:
rewrite address references created using GOT-indirect loads and bl+4
sequences to use TOC-relative address computation.</para>
</section>
<section xml:id="dbdoclet.50655241_OptPCRel" revisionflag="added">
<title>Displacement Optimization for PC-Relative Accesses</title>
<para>
Compilers and assembly programmers must assume that references to
extern data having unrestricted visibility may be satisfied by a
dynamically linked object, and must therefore use PC-relative
GOT-indirect addressing for such references. A linker may
determine that such a reference is satisfied during static linking
and replace the reference with direct PC-relative addressing.
For example:
</para>
<programlisting>pld r12, symbol@got@pcrel(0), 1
lvx v1, 0, r12</programlisting>
<para>The previous sequence may be replaced by:</para>
<programlisting>nop
plvx v1, symbol@pcrel(0), 1</programlisting>
<para>
However, this optimization is not universally safe, since r12 is no
longer loaded with the address of the symbol. The compiler or
programmer must ensure that the value of r12 is not subsequently used,
and communicate a request for this optimization by placing an
R_PPC64_PCREL_OPT relocation on the second instruction in the
sequence. The compiler or programmer must further ensure that the two
instructions are not separated by intervening instructions.
</para>
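<para>
The linker-side decision described above might look roughly like the
following Python sketch. It is illustrative only: the instruction
representation, the optimize_pair function, and the PREFIXED_FORM table
are hypothetical, the table holds only the lvx/plvx pair from the
example, and accepting any relocation whose name begins with
R_PPC64_GOT_PCREL34 on the first instruction is an assumption.
</para>

```python
# Hypothetical mapping from an indexed access to its PC-relative form.
# Only the pair used in the example above is listed.
PREFIXED_FORM = {"lvx": "plvx"}


def optimize_pair(first, second, symbol_is_local):
    """Decide how two adjacent instructions are rewritten.

    first, second   -- (mnemonic, set of relocation names) tuples for
                       two instructions with nothing between them
    symbol_is_local -- True if the reference was resolved at static
                       link time
    Returns the pair of mnemonics after the (possible) optimization.
    """
    mnem1, relocs1 = first
    mnem2, relocs2 = second

    eligible = (
        symbol_is_local
        and mnem1 == "pld"
        and any(r.startswith("R_PPC64_GOT_PCREL34") for r in relocs1)
        and "R_PPC64_PCREL_OPT" in relocs2      # compiler's explicit request
        and mnem2 in PREFIXED_FORM
    )
    if not eligible:
        return (mnem1, mnem2)                   # leave the sequence alone
    return ("nop", PREFIXED_FORM[mnem2])
```

<para>
Without the R_PPC64_PCREL_OPT marker the sketch leaves the sequence
untouched, since the linker cannot know whether r12 is used later.
</para>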
<para>
[To discuss: This optimization is crucial for making PC-relative
performance good enough to replace TOC-relative addressing. I
thought about allowing the compiler to separate the two instructions,
and place an instruction-distance value in the
R_PPC64_PCREL_OPT relocation field, but ultimately I think this
becomes difficult to implement, and I hope that the load-from-DSO
case is infrequent enough that the load-load dependency won't kill
us. Definitely need other opinions/ideas here.]
</para>
<para>
[To discuss: Can we add optimizations for PC-relative offsets built
for large code model? Only applies if we use shift/add sequences.]
</para>
</section>
</section>
<section>
@ -6979,7 +7383,9 @@ nop</programlisting>
<entry>
<para>One-bit field. This field is set to 1 if this function
does not have a TOC. For example, a stackless leaf assembly
language routine with no references to external objects.</para>
language routine with no references to external objects.
<phrase revisionflag="added">[To discuss: What value should be
set for PC-relative functions?]</phrase></para>
</entry>
</row>
<row>
@ -7147,6 +7553,15 @@ nop</programlisting>
parameters are placed in the Parameter Save Area.</para>
</entry>
</row>
<row revisionflag="added">
<entry>
<para>???</para>
</entry>
<entry>
<para>[To discuss: Can/should we add a flag for PC-relative?]
</para>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>

@ -796,14 +796,18 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
code that contains any of the various R_PPC64_GOT* relocations or when
linking code that references the .TOC. address. The GOT consists of an
8-byte header that contains the TOC base (the first TOC base when
multiple TOCs are present), followed by an array of 8-byte addresses. The
link editor shall emit dynamic relocations as appropriate for each entry
in the GOT. At runtime, the dynamic linker will apply these relocations
after the addresses of all memory segments are known (and thus the
addresses of all symbols). While the GOT may be appear to be an array of
absolute addresses, this ABI does not preclude the GOT containing
nonaddress entries and specifies the presence of nonaddress tls_index
entries.</para>
multiple TOCs are present), followed by an array of 8-byte addresses.
<phrase revisionflag="added">
The 8-byte header value is undefined when all linked compilation units
are PC-relative.
</phrase>
The link editor shall emit dynamic relocations as appropriate for each
entry in the GOT. At runtime, the dynamic linker will apply these
relocations after the addresses of all memory segments are known (and
thus the addresses of all symbols). While the GOT may appear to be an
array of absolute addresses, this ABI does not preclude the GOT
containing nonaddress entries and specifies the presence of nonaddress
tls_index entries.</para>
<para>Absolute addresses are generated for all GOT relocations by the
dynamic linker before giving control to general application code.
(However, IFUNC resolution functions may be invoked before relocation is
@ -812,7 +816,10 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
the executable or shared objects in a different process image. After the
initial mapping of the process image by the dynamic linker, memory
segments reside at fixed addresses for the life of a process.</para>
<para>The symbol .TOC. may be used to access the GOT or in TOC-relative
<para><phrase revisionflag="added">When at least one TOC-based
compilation unit is to be linked,</phrase>
<phrase revisionflag="changed">the</phrase>
symbol .TOC. may be used to access the GOT or in TOC-relative
addressing to other data constructs, such as the procedure linkage table.
The symbol may be offset by 0x8000 bytes, or another offset, from the
start of the .got section. This offset allows the use of the full (64 KB)
@ -826,8 +833,13 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
<para>In PIC code, the TOC pointer r2 points to the TOC base, enabling
easy reference. For static nonrelocatable modules, the GOT address is
fixed and can be directly used by code.</para>
<para>All functions except leaf routines must load the value of the TOC
base into the TOC register r2.</para>
<para>All functions <phrase revisionflag="added">in TOC-based
compilation units</phrase> except leaf routines must load the value of
the TOC base into the TOC register r2.</para>
<para revisionflag="added">
Functions in PC-relative compilation units access GOT entries directly
using PC-relative addressing.
</para>
</section>
<section>
<title>Function Addresses</title>
@ -980,12 +992,19 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
bl target
.reloc ., R_PPC64_TOCSAVE, tocsaveloc
nop</programlisting>
<orderedlist>
<orderedlist continuation="continues">
<listitem>
<para>3. The caller has not set up r2 to hold the TOC pointer. This
<para>The caller has not set up r2 to hold the TOC pointer. This
is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of
R_PPC64_REL24) on the call instruction.</para>
</listitem>
<listitem revisionflag="added">
<para>
The caller is PC-relative and does not need to save the TOC
pointer. [To discuss: Do we need a relocation, or will we have
a module-level bit the linker can detect?]
</para>
</listitem>
</orderedlist>
<para>In any scenario, the PLT call stub must transfer control to the
function whose address is provided in the associated PLT entry. This
@ -1033,6 +1052,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
ld r12,func@plt@l(r12)
mtctr r12
bctr</programlisting>
<para revisionflag="added">
A possible implementation for case 4 looks as follows:
</para>
<programlisting>pld r12, func@plt@got@pcrel(0), 1
mtctr r12
bctr</programlisting>
<para revisionflag="added">
[To discuss: Is that the right assembly syntax?]
</para>
<para>To support lazy binding, the link editor also provides a set of
symbol resolver stubs, one for each PLT entry. Each resolver stub
consists of a single instruction, which is usually a branch to a common
@ -1103,10 +1131,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */</progra
res_1: b PLTresolve
...</programlisting>
<para>After resolution, the value of a PLT entry in the PLT is the
address of the functions global entry point, unless the resolver can
determine that a module-local call occurs with a shared TOC value wherein
the TOC is shared between the caller and the callee.</para>
<para> </para>
address of the function's global entry point, unless the resolver
can determine that a module-local call occurs with a shared TOC value
wherein the TOC is shared between the caller and the
<phrase revisionflag="changed">callee,</phrase>
<phrase revisionflag="added">or a module-local call occurs in a
PC-relative compilation unit. [?]</phrase></para>
</section>
</section>
</section>
