From 16ef9435f59b13b4a530f0efef1ef80ad1844b3a Mon Sep 17 00:00:00 2001 From: Bill Schmidt Date: Thu, 29 Mar 2018 10:12:34 -0500 Subject: [PATCH] First draft of PC-relative changes for internal review. Signed-off-by: Bill Schmidt --- specification/bk_main.xml | 4 +- specification/ch_2.xml | 474 +++++++++++++++++++++++++++++++++++--- specification/ch_3.xml | 451 ++++++++++++++++++++++++++++++++++-- specification/ch_4.xml | 64 +++-- 4 files changed, 920 insertions(+), 73 deletions(-) diff --git a/specification/bk_main.xml b/specification/bk_main.xml index 54c98b4..71c611b 100644 --- a/specification/bk_main.xml +++ b/specification/bk_main.xml @@ -94,11 +94,11 @@ - 2018-03-02 + 2018-03-14 - Revision 1.5: POWER10 support. + Revision 1.5a: PC-relative addressing first draft. diff --git a/specification/ch_2.xml b/specification/ch_2.xml index ec43e5f..7b8cbc2 100644 --- a/specification/ch_2.xml +++ b/specification/ch_2.xml @@ -4032,7 +4032,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> - , the alignment of the + In , the alignment + of the structure is not affected by the unnamed short and int fields. The named members are aligned relative to the start of the structure. However, it is possible that the alignment of the named members is @@ -4044,6 +4045,70 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> +
+ Global Data Addressing Models + This specification provides for two global data + addressing models. The traditional addressing model, which we will call + "TOC-based," relies on a dedicated table-of-contents (TOC) pointer to + obtain the addresses of global data. Power ISA version 3.1 introduces new + "PC-relative" instructions that can be used to obtain the addresses of + global data relative to the current instruction address (CIA). Code that + is targeted to run on hardware compliant with Power ISA 3.1 may make use of + this capability with a "PC-relative" addressing model. + Each compilation unit must adhere entirely to + one addressing model or the other. However, it is expressly permitted to + link TOC-based and PC-relative compilation units into a single + executable, or to dynamically link from a compilation unit with one + addressing model to a compilation unit with the other addressing model. + In particular, a PC-relative compilation unit may be linked with an + existing TOC-based library. Note that a "compilation unit" may consist of + hand-written assembly code as well as high-level source code. + Compilers and other tools performing + link-time optimizations that repackage functions into different + compilation units must not mix PC-relative and TOC-based functions in + the same compilation unit. [To discuss: This could be permitted, but + the value is unclear and it would be likely to spawn occasional + linker bugs.] Similarly, compilers should not allow programmers to + specify a single function in a TOC-based compilation unit to use the + PC-relative addressing model or vice versa; for example, by using GCC's + "#pragma GCC target" syntax. [To discuss: How should this be recorded and + communicated? Perhaps add to e_flags in the ELF header for module + objects only? We can communicate the need for PC-relative PLT stubs + to the linker on calls with a reloc, so the linker may not need this, + but perhaps other tools will?] 
+ Details of the two addressing models are + provided throughout this specification. A brief description + of each follows.
+ TOC-Based Addressing Model + In the traditional TOC-based addressing model, + each function uses register r2 (see ) to access global memory. A variety + of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc., + may be used to address the global data, but all these techniques use the + TOC pointer r2 as part of the data reference. + With the cooperation of the linker, each + function in a TOC-based compilation unit is responsible for the + establishment and maintenance of its own TOC pointer. All functions + within a compilation unit have the same TOC pointer, so local function + calls may assume it does not change. An external function call may be + resolved to a function in a shared object having a different TOC + pointer, so a caller in a TOC-based compilation unit must save its TOC + pointer prior to making a call outside the compilation unit, and restore + its value upon return before the TOC pointer may be used to access global + data. +
+
+ PC-Relative Addressing Model + A function in a PC-relative compilation unit + has no TOC pointer. All accesses to global data are made relative to + the current instruction address. Since functions in TOC-based + compilation units are responsible for establishment and maintenance + of their own TOC pointers, register r2 may be used freely within a + PC-relative compilation unit, with no need to save or restore the + register when modifying it. +
+
Function Calling Sequence The standard sequence for function calls is outlined in this section. @@ -4208,15 +4273,22 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Nonvolatile - Register r2 is nonvolatile with respect to calls - between functions in the same compilation unit. It is saved - and restored by code inserted by the linker resolving a - call to an external function. For more information, see - . - + In a TOC-based + compilation unit, register r2 is nonvolatile with + respect to calls between functions in the same compilation + unit. It is saved and restored by code inserted by the linker + resolving a call to an external function. For more + information, see . + or + Volatile + Register r2 is volatile and available for use in + PC-relative compilation units. + - TOC pointer. + TOC pointer for + TOC-based compilation units. @@ -4388,7 +4460,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194">   TOC Pointer - Usage + Usage (TOC-Based Compilation Units + Only) As described in , the TOC pointer, r2, is commonly initialized by the global function entry point when a function @@ -4497,12 +4570,15 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> mask the value received from mfocr to avoid corruption of the resulting (partial) condition register word. - This erratum does not apply to the POWER9 processor. + This erratum does not apply to POWER9 and subsequent + processors. For more information, see - Power ISA, version 3.0 and "Fixed-Point Invalid + Power ISA, version 3.0B and "Fixed-Point Invalid Forms and Undefined Conditions" in POWER9 Processor User's Manual. Floating-Point Registers @@ -5124,8 +5200,16 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> is volatile over a function call.   TOC Pointer Doubleword - If a function changes the value of the TOC pointer register, it - shall first save it in the TOC pointer doubleword. + If a function in a TOC-based + compilation unit changes the value of the TOC pointer + register, it shall first save it in the TOC pointer doubleword. 
+ The TOC pointer doubleword is reserved + for future use by functions in a PC-relative compilation + unit. [To discuss: This has implications for alloca, because if we + reserve it for future use, then the TOC pointer doubleword must be + copied during a dynamic allocation operation. I suspect it is + better to suffer that slight penalty rarely in order to have the + flexibility to use this for another future purpose.]
Optional Save Areas @@ -5252,7 +5336,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Functions without a suitable declaration available to the caller to determine the called function's characteristics (for example, functions in C without a prototype in scope, in accordance - with Brian Kernighan and Dennis Ritche, + with Brian Kernighan and Dennis Ritchie, The C Programming Language, 1st edition). @@ -6220,6 +6305,16 @@ ld r12, 0(r12) ld r12, symbol2@got(r2) lvx v1, 0, r12 + + + + By using PC-relative addressing. + + + + pld r12, symbol@pcrel(0), 1 + +plvx v1, symbol@pcrel(0), 1 In the OpenPOWER ELF V2 ABI, position-dependent code built with this addressing scheme may have a Global Offset Table (GOT) in the data segment that holds addresses. (For more information, see @@ -6259,6 +6354,12 @@ lvx v1, 0, r12 loaded in the first 2 GB of the address space because direct address references and TOC-pointer initializations can be performed using a two-instruction sequence. + + PC-relative instruction displacements are signed 34-bit values in + every code model. They span 16 GB, from 8 GB below to just under + 8 GB above the current instruction address. The effective addressing + reach for global data is therefore 8 GB, since data sections are + always located at higher virtual addresses than text sections. +
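The 16 GB and 8 GB figures follow directly from the width of the displacement field. The C sketch below checks whether a target can be reached from a given instruction address. It assumes that the displacement is measured from the address of the prefixed instruction itself. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds of a signed 34-bit displacement, in bytes. */
#define PCREL34_MIN (-(INT64_C(1) << 33))    /* 8 GB below the CIA */
#define PCREL34_MAX ((INT64_C(1) << 33) - 1) /* just under 8 GB above */

/* Number of distinct byte offsets a 34-bit field can encode:
   2^34 bytes, i.e. 16 GB. */
static uint64_t pcrel34_span_bytes(void)
{
    return UINT64_C(1) << 34;
}

/* True if target_addr can be reached from an instruction at insn_addr
   with a single 34-bit PC-relative displacement. */
static bool pcrel34_reaches(uint64_t insn_addr, uint64_t target_addr)
{
    /* Unsigned subtraction wraps; reinterpret the result as signed. */
    int64_t disp = (int64_t)(target_addr - insn_addr);
    return disp >= PCREL34_MIN && disp <= PCREL34_MAX;
}
```

Data sections lie above text sections, so only the non-negative half of this range (up to 2^33 - 1 bytes) is useful for reaching global data. That half gives the 8 GB figure.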
Position-Independent Code @@ -6318,6 +6419,47 @@ ld r12, 0(r12) ld r12 symbol2@got(r2) lvx v1, 0, r12 + + + By using PC-relative addressing (for + private data). + + + pld r12, symbol@pcrel(0), 1 + +plvx v1, symbol@pcrel(0), 1 + + + By using PC-relative GOT-indirect + addressing (for shared data or a very large span from code to data): + + + + pld r12, symbol@got@pcrel(0), 1 +ld r12, 0(r12) + +pld r12, symbol@got@pcrel(0), 1 +lvx v1, 0, r12 + + A compiler may generate a PC-relative addressing sequence to access + static or restricted-visibility data, but must generate a PC-relative + GOT-indirect sequence for extern data. Extern data may be satisfied + from a statically or dynamically linked source, so the compiler must + be conservative. The compiler and linker can cooperate to replace a + PC-relative GOT-indirect sequence with a PC-relative sequence when + the data reference is satisfied at static link time. See + . + + [To discuss: I'd like to see the assembler + support "pld r12, symbol@pcrel" as an alternative to "pld r12, + symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an + alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any + prefixed load/store with only two arguments is PC-relative; the + second argument is either a 34-bit offset or a GPR. Is this + reasonable or too confusing? Another alternative would be "pld r12, + symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the + GPR case. I guess we want something readable that isn't too + complex for the assembler to sort out.] Position-independent executables or shared objects have a GOT in the data segment that holds addresses. When the system creates a memory image from the file, the GOT entries are updated to reflect the @@ -6335,6 +6477,8 @@ lvx v1, 0, r12
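The rule for choosing between the two sequences can be summarized as a small decision function. This is only a sketch. The enumerations are hypothetical stand-ins for whatever symbol information a compiler tracks. The assembly strings use the draft syntax shown above, with paddi computing the address directly, as in the PC-relative load and store example.

```c
/* Hypothetical model of what the compiler knows about a data symbol.
   VIS_RESTRICTED stands for any non-default visibility (assumption). */
enum sym_binding    { SYM_STATIC, SYM_EXTERN };
enum sym_visibility { VIS_DEFAULT, VIS_RESTRICTED };

enum pcrel_seq { PCREL_DIRECT, PCREL_GOT_INDIRECT };

/* Extern data with default visibility may be satisfied by a dynamically
   linked object, so its address must come from the GOT. Static or
   restricted-visibility data is known to be in this module and can be
   addressed directly. */
static enum pcrel_seq choose_pcrel_seq(enum sym_binding b, enum sym_visibility v)
{
    if (b == SYM_EXTERN && v == VIS_DEFAULT)
        return PCREL_GOT_INDIRECT;
    return PCREL_DIRECT;
}

/* Illustrative first instruction of each sequence (draft syntax). */
static const char *pcrel_seq_asm(enum pcrel_seq s)
{
    switch (s) {
    case PCREL_GOT_INDIRECT:
        return "pld r12, symbol@got@pcrel(0), 1"; /* load address from GOT */
    case PCREL_DIRECT:
    default:
        return "paddi r12, 0, symbol@pcrel, 1";   /* compute address directly */
    }
}
```

The GOT-indirect choice is conservative by design. The linker optimization referenced above can later turn it into the direct form once the reference is known to be satisfied at static link time.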
Code Models + TOC-Based Compilation + Units Compilers may provide different code models depending on the expected size of the TOC and the size of the entire executable or shared library. @@ -6359,7 +6503,8 @@ lvx v1, 0, r12 addition, accesses to module-local code and data objects use TOC pointer relative addressing with 32-bit offsets. Using TOC pointer relative addressing removes a level of indirection, resulting in - faster access and a smaller GOT. However. it limits the size of the + faster access and a smaller GOT. However, it limits the size of the entire binary to between 2 GB and 4 GB, depending on the placement of the TOC base. @@ -6379,6 +6524,53 @@ lvx v1, 0, r12 TOCs, or by some other method. The suggested allocation order of sections is provided in . + PC-Relative Compilation + Units + + Compilers may provide different code models depending on the size of + the entire executable or shared library. There is no small code + model for PC-relative compilation units. + + + + + Medium code model: Accesses to module-local code and data objects + use PC-relative addressing with 34-bit offsets. + Position-independent code uses PC-relative GOT-indirect + addressing to access other objects in the binary. + + + + + Large code model: This model is used when 34-bit offsets are + insufficient to reach global data or the GOT from at least one text + section. It is similar to the medium code model, except that + PC-relative offsets of up to 64 bits are generated into a + register. [To discuss: None of the options for this seem ideal. + It takes about 5 instructions to generate a 64-bit constant into + a register, though we can perhaps use linker optimizations to + replace with a smaller sequence when available. A second choice + is to place the offset in a .quad in the text section to reach + the .got entry, but this would incur a load-load dependency. + (Are there cases where this requires a text relocation resolution + during dynamic linking?)
A third choice is to fail the compile + and require TOC addressing with large code model when 34-bit + offsets aren't enough, though that doesn't initially seem + reasonable. Whatever we choose, we should document the sequence + and any associated linker optimizations.] + + + + + As with TOC-based compilation units, the medium code model is the + default for compilers, and is applicable to most programs and + libraries. The code examples in this document generally use the + medium code model. + + + When linking PC-relative relocatable objects, the linker should + attempt to place the .got section near the text sections. +
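The draft has not yet chosen a large-code-model sequence. One common way to build an arbitrary 64-bit constant in five instructions is lis/ori/sldi/oris/ori, which fills the #highest, #higher, #hi, and #lo pieces in turn. The C sketch below illustrates that assumption only and is not a proposed ABI sequence. It splits a 64-bit offset into those pieces and simulates the instructions to show that the value is rebuilt exactly. Adding the result to an address derived from the CIA is not shown.

```c
#include <stdint.h>

/* 16-bit pieces of a 64-bit value, named after the ABI's #highest,
   #higher, #hi and #lo notations. */
struct imm64_parts { uint16_t highest, higher, hi, lo; };

static struct imm64_parts split_imm64(int64_t value)
{
    uint64_t v = (uint64_t)value;
    struct imm64_parts p = {
        (uint16_t)(v >> 48), (uint16_t)(v >> 32),
        (uint16_t)(v >> 16), (uint16_t)v
    };
    return p;
}

/* Simulates:  lis  rX, highest
               ori  rX, rX, higher
               sldi rX, rX, 32
               oris rX, rX, hi
               ori  rX, rX, lo                                        */
static int64_t rebuild_imm64(struct imm64_parts p)
{
    /* lis: shift left 16, then sign-extend from 32 bits. */
    uint64_t r = (uint64_t)(int64_t)(int32_t)((uint32_t)p.highest << 16);
    r |= p.higher;              /* ori: OR in zero-extended 16 bits */
    r <<= 32;                   /* sldi: discards the sign-extension bits */
    r |= (uint64_t)p.hi << 16;  /* oris */
    r |= p.lo;                  /* ori */
    return (int64_t)r;
}
```

Because every piece after the first is ORed in without sign extension, no "#ha"-style carry adjustment is needed. The relocations would be the unadjusted #highest, #higher, #hi, and #lo forms.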
@@ -6387,9 +6579,50 @@ lvx v1, 0, r12 section.
Function Prologue - A function's prologue establishes addressability by initializing - a TOC pointer in register r2, if necessary, and a stack frame, if - necessary, and may save any nonvolatile registers it uses. + The function prologue is responsible for + the following functions: + + + Establishing addressability to global data + + + Creating a stack frame when required + + + Saving any nonvolatile registers that are used by the + function + + + Saving any limited-access bits that are used by the function, + per the rules described in + + + This ABI shall be used in conjunction with + the Power Architecture that implements the + mfocrf architecture level. Further, + OpenPOWER-compliant processors shall implement implementation-defined + bits in a manner to allow the combination of multiple + mfocrf results with an OR instruction; + for example, to yield a word in r0 including all three preserved CRs as + follows: + mfocrf r0, crf2 +mfocrf r1, crf3 +or r0, r0, r1 +mfocrf r1, crf4 +or r0, r0, r1 + Specifically, this allows each + OpenPOWER-compliant processor implementation to set each field to hold + either 0 or the correct in-order value of the corresponding CR field at + the point where the mfocrf + instruction is performed. + TOC-Based Compilation + Units + In a TOC-based compilation unit, + a function's prologue establishes addressability by + initializing a TOC pointer in register r2, if necessary, and a stack + frame, if necessary, and may save any nonvolatile registers it + uses. All functions have a global entry point (GEP) available to any caller and pointing to the beginning of the prologue. 
Some functions may have a secondary entry point to optimize the cost of TOC pointer @@ -6420,9 +6653,10 @@ addi r2, r2, .TOC.-func@l form that is faster due to instruction fusion, such as: lis r2, .TOC.@ha addi r2, r2, .TOC.@l - In addition to establishing addressability, the function prologue + In addition to establishing + addressability, the function prologue is responsible for the following functions: - + Creating a stack frame when required @@ -6436,24 +6670,25 @@ addi r2, r2, .TOC.@l - This ABI shall be used in conjunction with the Power Architecture - that implements the + This ABI shall be used in conjunction with + the Power Architecture that implements the mfocrf architecture level. Further, OpenPOWER-compliant processors shall implement implementation-defined bits in a manner to allow the combination of multiple mfocrf results with an OR instruction; for example, to yield a word in r0 including all three preserved CRs as follows: - mfocrf r0, crf2 + mfocrf r0, crf2 mfocrf r1, crf3 or r0, r0, r1 mfocrf r1, crf4 or r0, r0, r1 - Specifically, this allows each OpenPOWER-compliant processor - implementation to set each field to hold either 0 or the correct - in-order value of the corresponding CR field at the point where the - mfocrf instruction is performed. -   + Specifically, this allows each + OpenPOWER-compliant processor implementation to set each field to hold + either 0 or the correct in-order value of the corresponding CR field at + the point where the mfocrf + instruction is performed. +   Assembly Language Syntax for Defining Entry Points When a function has two entry points, the global entry point is @@ -6472,6 +6707,14 @@ or r0, r0, r1 the meaning of the second parameter, which is put in the three most-significant bits of the st_other field in the ELF Symbol Table entry. + PC-Relative Compilation + Units + + In a PC-relative compilation unit, the function prologue does not + require any setup code to establish addressability to global data. 
+ Therefore there is also no need for a function to have a separate + local entry point. +
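A PC-relative function has no separate local entry point, so the st_other bits described for TOC-based functions would presumably not encode an offset for it. The sketch below decodes those three bits. It assumes the ELF V2 encoding, in which 0 and 1 mean the local and global entry points coincide, 2 through 6 mean an offset of 2^value bytes, and 7 is reserved. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define STO_LOCAL_SHIFT 5                      /* three most-significant bits */
#define STO_LOCAL_MASK  (7u << STO_LOCAL_SHIFT)

/* Byte offset from the global entry point to the local entry point
   encoded in st_other, or -1 for the reserved value 7. */
static int local_entry_offset(uint8_t st_other)
{
    unsigned v = (st_other & STO_LOCAL_MASK) >> STO_LOCAL_SHIFT;
    if (v == 7)
        return -1;      /* reserved */
    if (v <= 1)
        return 0;       /* no separate local entry point */
    return 1 << v;      /* 2..6 -> 4, 8, 16, 32, 64 bytes */
}
```

The lower five bits of st_other are masked off, so symbol visibility bits do not affect the result.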
Function Epilogue @@ -6884,11 +7127,13 @@ _restvr_31: addi r12,r0,-16 shows an example of this method. Examples of absolute and position-independent compilations are - shown in - , - , and - . These examples show the C - language statements together with the generated assembly language. The + shown in , + , + , and + . These + examples show the + C language statements together with the generated assembly language. The assumption for these figures is that only executables can use absolute addressing while shared objects must use position-independent code addressing. The figures are intended to demonstrate the compilation of @@ -7151,6 +7396,60 @@ stw r0,0,(r7) + + + PC-Relative Load and Store + + + + + + + + C Code + + + + + Assembly Code + + + + + + + + extern int src; +extern int dst; +int *ptr; + +dst = src; + +ptr = &dst; + +*ptr = src; + + + + + + .extern src +.extern dst +.extern ptr +.section ".text" +plwz r9, src@pcrel(0), 1 +pstw r9, dst@pcrel(0), 1 +paddi r11, 0, dst@pcrel, 1 +pstd r11, ptr@pcrel(0), 1 +pld r11, ptr@pcrel(0), 1 +plwz r9, src@pcrel(0), 1 +stw r9, 0(r11) + + + + +
@@ -7311,9 +7610,16 @@ nop .
+ + For a function call in a PC-relative compilation unit, the nop in + should not be generated. + For indirect function calls, the address of the function to be called is placed in r12 and the CTR register. A bctrl instruction is used - to perform the indirect branch as shown in + to perform the indirect branch as shown in + + , + , and . The ELF V2 ABI requires the address of the called function to be in r12 when a cross-module function @@ -7381,7 +7687,11 @@ bctrl shows how to make an indirect - function call using small-model position-independent code. + function call using small-model position-independent code. + Note that the store and reload of the + TOC pointer r2 are not required in a PC-relative compilation + unit. +
Small-Model Position-Independent Indirect Function Call @@ -7451,7 +7761,11 @@ ld r2,24(r1) shows how to make an indirect - function call using large-model position-independent code. + function call using large-model position-independent code. + Note that the store and reload of the + TOC pointer r2 are not required in a PC-relative compilation + unit. +
Large-Model Position-Independent Indirect Function Call @@ -7521,6 +7835,7 @@ ld r2,24(r1) + TOC-Based Compilation Units Function calls need to be performed in conjunction with establishing, maintaining, and restoring addressability through the TOC pointer register, r2. When a function is called, the TOC pointer register @@ -7553,6 +7868,19 @@ bl target , , and . + PC-Relative Compilation + Units + + As with TOC-based compilation units, for calls to functions resolved at + runtime, the linker must generate stub code to load the function + address from the PLT. When the stub code is generated on behalf of + a call from a PC-relative compilation unit, the linker may + omit the save and restore of r2 from the stub code. This behavior + is optional but recommended. Calls in PC-relative code should not + be marked with the R_PPC64_TOCSAVE or R_PPC64_REL24_NOTOC relocations. + [To discuss: Do we need a relocation to identify this as a PC-relative + call?] +
Branching @@ -7947,6 +8275,75 @@ f1: .long .TOC. - Ldefault .long .TOC. - Lcase13 + + shows a switch + implementation for PC-relative compilation units. [TBD: This needs to + be a figure, not a table, which may require working with Annette and + FrameMaker to get something that looks similar to the other figures. + All we have in the document for the other figures is .png files from + the old FrameMaker version. Or maybe we should just convert all the + other figures to tables.] + + + + Position-Independent Switch Code (PC-Relative Addressing) + + + + + + + + + C Code + + + + + Assembly Code + + + + + + + + switch(j) +{ +case 0: +... +case 1: +... +case 3: +... +default: +... +} + + + + + + cmplwi r12, 4 + bge .Ldefault + slwi r12, r12, 2 + paddi r10, r0, .Ltab@pcrel, 1 + lwax r8, r10, r12 + add r10, r8, r10 + mtctr r10 + bctr + .p2align 2 +.Ltab: + .long (.Lcase0-.Ltab) + .long (.Lcase1-.Ltab) + .long (.Ldefault-.Ltab) + .long (.Lcase3-.Ltab) + + + + +
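The dispatch in the PC-relative switch example can be modeled in C to show why the table needs neither a TOC pointer nor dynamic relocations. Each entry is a difference between two labels. This assumes the table and the case labels are in the same section, so each difference resolves at assembly time. The addresses in the usage below are hypothetical.

```c
#include <stdint.h>

/* Each table entry is a signed 32-bit offset from the table's own
   address (.Ltab) to a case label, so the table holds no absolute
   addresses. */
static uint64_t switch_target(uint64_t tab_addr, const int32_t *tab,
                              uint32_t ncases, uint32_t j,
                              uint64_t default_addr)
{
    if (j >= ncases)                  /* cmplwi r12, 4 ; bge .Ldefault */
        return default_addr;
    int64_t off = tab[j];             /* slwi + lwax: sign-extended load */
    return tab_addr + (uint64_t)off;  /* add r10, r8, r10 ; mtctr ; bctr */
}
```

For example, with .Ltab at 0x1000 and the four labels at 0xF00, 0xF40, 0xF80 (default), and 0xFC0, the table holds -0x100, -0xC0, -0x80, and -0x40. A value of j greater than 3 goes to the default label.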
Dynamic Stack Space Allocation @@ -8019,6 +8416,11 @@ addi r3,r1,p ; R3 = new data area following parameter save area. + + It is unnecessary to copy the TOC pointer doubleword for a + PC-relative compilation unit. [To discuss: Should we, for future + use of this slot for another purpose?] + Additional instructions will be necessary for an allocation of variable size. If a dynamic deallocation will occur, the r1 stack diff --git a/specification/ch_3.xml b/specification/ch_3.xml index aa084d2..c5ee16d 100644 --- a/specification/ch_3.xml +++ b/specification/ch_3.xml @@ -245,7 +245,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.
TOC - The TOC is part of the data segment of an executable program. + The TOC is part of the data segment of an executable program + built from at least one TOC-based object + file. This section describes a common layout of the TOC in an executable file or shared object. Particular tools are not required to follow the layout specified here. @@ -280,19 +282,21 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. -   - Modules Containing Multiple TOCs - The link editor may create multiple TOCs. In such a case, the - constituent .got, .toc, .sdata, and .sbss sections are conceptually - repeated as necessary, with each TOC typically using a TOC pointer value - of its base plus 0x8000. Any constituent section of type SHT_NOBITS in - any TOC but the last is converted to type SHT_PROGBITS filled with - zeros. - When multiple TOCs are present, linking must take care to save, - initialize, and restore TOC pointers within a single module when calling - from one function to a second function using a different TOC pointer - value. Many of the same issues associated with a cross-module call apply - also to calls within a module but using different TOC pointers. +   +
+ Modules Containing Multiple TOCs + The link editor may create multiple TOCs. In such a case, the + constituent .got, .toc, .sdata, and .sbss sections are conceptually + repeated as necessary, with each TOC typically using a TOC pointer value + of its base plus 0x8000. Any constituent section of type SHT_NOBITS in + any TOC but the last is converted to type SHT_PROGBITS filled with + zeros. + When multiple TOCs are present, linking must take care to save, + initialize, and restore TOC pointers within a single module when calling + from one function to a second function using a different TOC pointer + value. Many of the same issues associated with a cross-module call apply + also to calls within a module but using different TOC pointers. +
Symbol Table @@ -302,7 +306,9 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. - The OpenPOWER ABI uses the three most-significant bits in the + For TOC-based compilation + units, the OpenPOWER + ABI uses the three most-significant bits in the symbol st_other field to specify the number of instructions between a function's global entry point and local entry point. The global entry point is used when it is necessary to set up the TOC pointer (r2) for the @@ -2115,10 +2121,273 @@ my_func: + + In the following figure, prefix34 specifies a 34-bit field split + between bits 14-31 and 48-63 of a doubleword. The other bits + remain unchanged. This is used by PC-relative load and store + instructions. + [Figure: prefix34 relocation field. Bits 0-13: unchanged. Bits 14-31: prefix34. Bits 32-47: unchanged. Bits 48-63: prefix34 (continued).] + In the following figure, prefix34ds is similar to prefix34, but is + effectively only 32 bits wide, because the two least-significant bits + must be zero and are not part of the field. This is used, for example, + by the pldu instruction. In addition to the use of this relocation + field with the DS forms, prefix34ds relocations are also used in + conjunction with DQ forms, such as the plq instruction. In those + instances, the linker and assembler collaborate to create valid DQ + forms. They raise an error if the specified offset does not meet the + constraints of a valid DQ instruction form displacement. + [Figure: prefix34ds relocation field. Bits 0-13: unchanged. Bits 14-31: prefix34ds. Bits 32-47: unchanged. Bits 48-61: prefix34ds (continued). Bits 62-63: unchanged.] +
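The two field layouts can be made concrete with a small C sketch of how a linker might write them. It assumes the figures' big-endian bit numbering (bit 0 is the most-significant bit of the prefix word). It also represents the instruction as a hypothetical struct of two 32-bit words that are already in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Prefix word = bits 0-31 of the figure, suffix word = bits 32-63. */
struct prefixed_insn { uint32_t prefix, suffix; };

/* prefix34: the high 18 bits of the value go to bits 14-31 (the low 18
   bits of the prefix word); the low 16 bits go to bits 48-63 (the low
   16 bits of the suffix word). All other bits are preserved. */
static void insert_prefix34(struct prefixed_insn *insn, int64_t value)
{
    uint64_t v = (uint64_t)value & UINT64_C(0x3FFFFFFFF);   /* 34 bits */
    insn->prefix = (insn->prefix & ~UINT32_C(0x3FFFF)) | (uint32_t)(v >> 16);
    insn->suffix = (insn->suffix & ~UINT32_C(0xFFFF)) | (uint32_t)(v & 0xFFFF);
}

/* prefix34ds: same layout, but bits 62-63 are not part of the field and
   are preserved, so the value must be a multiple of 4. */
static bool insert_prefix34ds(struct prefixed_insn *insn, int64_t value)
{
    if (value % 4 != 0)
        return false;                 /* not representable */
    uint64_t v = (uint64_t)value & UINT64_C(0x3FFFFFFFF);
    insn->prefix = (insn->prefix & ~UINT32_C(0x3FFFF)) | (uint32_t)(v >> 16);
    insn->suffix = (insn->suffix & ~UINT32_C(0xFFFC)) | (uint32_t)(v & 0xFFFC);
    return true;
}
```

Writing value & ~3 into bits 48-61 is positionally the same as writing value >> 2 into the 14-bit subfield. This matches the "@pcrel >> 2" expression used for the _DS relocations.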
Relocation Notations The following notations are used in the relocation table. + + [There seem to be a number of missing notations in this table. We + have #higher[a], #highest[a], and got, and perhaps the @ notation + could use further description. Also, there is some usage of #high and + #higha instead of #hi and #ha, which I assume is a mistake.] + @@ -2350,6 +2619,15 @@ my_func: tp + tprel = (S + A) + + pcrel + Represents the offset of the symbol being relocated + relative to the current instruction address. + tlsgd @@ -4143,9 +4421,84 @@ my_func: + R_PPC64_PCREL34, value 256?, field prefix34, expression @pcrel + R_PPC64_PCREL34_DS, value 257?, field prefix34ds*, expression @pcrel >> 2 + R_PPC64_GOT_PCREL34, value 258?, field prefix34, expression @got@pcrel + R_PPC64_GOT_PCREL34_DS, value 259?, field prefix34ds*, expression @got@pcrel >> 2 + R_PPC64_PCREL_OPT, value 260? (no field or expression) + [To discuss: Assuming we build up 64-bit PC-relative offsets into a + register using shifts/adds, we'll need the #lo, #ha, #higher[a], + #highest[a] relocs to be defined also.] +
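The arithmetic behind the new PC-relative relocations can be sketched in C. The sketch assumes that pcrel means S + A - P, where S is the symbol value, A is the addend, and P is the address of the prefixed instruction. For the GOT variants, the same computation would presumably apply with the address of the symbol's GOT entry in place of S + A. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field value for R_PPC64_PCREL34 (ds == false) or R_PPC64_PCREL34_DS
   (ds == true), assuming pcrel = S + A - P. Returns false on overflow
   or misalignment, which a linker would report as an error. */
static bool pcrel34_field_value(uint64_t S, int64_t A, uint64_t P,
                                bool ds, int64_t *field)
{
    int64_t pcrel = (int64_t)(S + (uint64_t)A - P);
    if (pcrel < -(INT64_C(1) << 33) || pcrel > (INT64_C(1) << 33) - 1)
        return false;            /* does not fit in 34 signed bits */
    if (ds) {
        if (pcrel % 4 != 0)
            return false;        /* low two bits must be zero */
        *field = pcrel / 4;      /* @pcrel >> 2; exact for multiples of 4 */
    } else {
        *field = pcrel;          /* @pcrel */
    }
    return true;
}
```

Division by 4 is used instead of a right shift so that negative offsets are handled portably. The two are equivalent here because the value is known to be a multiple of 4.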
Relocation Descriptions @@ -4239,6 +4592,13 @@ my_func: associated with a global entry point. See for discussion of its use. + R_PPC64_PCREL_OPT + + This relocation type requests that the annotated load or store + instruction and its immediately preceding instruction be optimized by + the linker when the referenced symbol can be statically resolved. + See for details. +
Assembler Syntax @@ -4301,10 +4661,14 @@ addi 2,2,.TOC.-func@l requirements as indicated in this section.
Function Call - The static linker must modify a nop instruction after a bl function + For TOC-based compilation + units, the + static linker must modify a nop instruction after a bl function call to restore the TOC pointer in r2 from 24(r1) when an external symbol that may use the TOC may be called, as in - . Object files must contain a + . + TOC-based + object files must contain a nop slot after a bl instruction to an external symbol.
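The linker-side rewrite for TOC-based callers can be sketched in C using the standard encodings of nop (ori r0,r0,0) and ld r2,24(r1). A caller in a PC-relative compilation unit has no such slot, so this step does not apply to it. The function name is hypothetical. The instruction words are assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPC_NOP         UINT32_C(0x60000000)  /* ori r0,r0,0 */
#define PPC_LD_R2_24_R1 UINT32_C(0xE8410018)  /* ld r2,24(r1) */

/* Called after the bl at insns[i] has been redirected to a call stub
   that may change r2. Returns false if the required nop slot is
   missing, which a linker would diagnose as an error. */
static bool restore_toc_after_call(uint32_t *insns, size_t n, size_t i)
{
    if (i + 1 >= n || insns[i + 1] != PPC_NOP)
        return false;
    insns[i + 1] = PPC_LD_R2_24_R1;  /* reload the caller's TOC pointer */
    return true;
}
```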
@@ -4375,6 +4739,46 @@ target: rewrite address references created using GOT-indirect loads and bl+4 sequences to use TOC-relative address computation.
+
+ Displacement Optimization for PC-Relative Accesses + + Compilers and assembly programmers must assume that references to + extern data having unrestricted visibility may be satisfied by a + dynamically linked object, and must therefore use PC-relative + GOT-indirect addressing for such references. A linker may + determine that such a reference is satisfied during static linking + and replace the reference with direct PC-relative addressing. + For example: + + pld r12, symbol@got@pcrel(0), 1 +lvx v1, 0, r12 + The previous sequence may be replaced by: + nop +plvx v1, symbol@pcrel(0), 1 + + However, this optimization is not universally safe, since it + changes the value of r12 following the data reference. The + compiler or programmer must ensure that the value of r12 is not + subsequently used, and communicate a request for this optimization + by placing an R_PPC64_PCREL_OPT relocation on the second instruction in + the sequence. The compiler or programmer must further ensure that + the two instructions are not separated by intervening instructions. + + + [To discuss: This optimization is crucial for making PC-relative + performance good enough to replace TOC-relative addressing. I + thought about allowing the compiler to separate the two instructions, + and place an instruction-distance value in the + R_PPC64_PCREL_OPT relocation field, but ultimately I think this + becomes difficult to implement, and I hope that the load-from-DSO + case is infrequent enough that the load-load dependency won't kill + us. Definitely need other opinions/ideas here.] + + + [To discuss: Can we add optimizations for PC-relative offsets built + for large code model? Only applies if we use shift/add sequences.] + +
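In the rewrite shown above, the 8-byte pld and the 4-byte lvx become a 4-byte nop and an 8-byte plvx. Both sequences occupy 12 bytes, but the prefixed access moves to P + 4, so its displacement must be recomputed from that address. The C sketch below does this. It also assumes the Power ISA 3.1 rule that a prefixed instruction must not cross a 64-byte boundary. If that rule applies, the linker cannot perform the rewrite in place when P + 4 falls 4 bytes before such a boundary. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original:  pld  r12, sym@got@pcrel  at P      (8 bytes)
              lvx  v1, 0, r12          at P + 8  (4 bytes)
   Optimized: nop                      at P      (4 bytes)
              plvx v1, sym@pcrel       at P + 4  (8 bytes)          */
static bool pcrel_opt_displacement(uint64_t P, uint64_t sym_addr, int64_t *disp)
{
    uint64_t new_insn = P + 4;        /* address of the prefixed access */
    /* Assumption: a prefixed instruction may not cross a 64-byte
       boundary, so it cannot start at offset 60 within a block. */
    if ((new_insn & 63) == 60)
        return false;
    int64_t d = (int64_t)(sym_addr - new_insn);
    if (d < -(INT64_C(1) << 33) || d > (INT64_C(1) << 33) - 1)
        return false;                 /* out of reach: keep the GOT load */
    *disp = d;
    return true;
}
```

When the function returns false, the linker would simply leave the original GOT-indirect sequence in place, which remains correct.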
@@ -6979,7 +7383,9 @@ nop One-bit field. This field is set to 1 if this function does not have a TOC. For example, a stackless leaf assembly - language routine with no references to external objects. + language routine with no references to external objects. + [To discuss: What value should be + set for PC-relative functions?] @@ -7147,6 +7553,15 @@ nop parameters are placed in the Parameter Save Area. + + + ??? + + + [To discuss: Can/should we add a flag for PC-relative?] + + + diff --git a/specification/ch_4.xml b/specification/ch_4.xml index 81b1854..98b15e5 100644 --- a/specification/ch_4.xml +++ b/specification/ch_4.xml @@ -796,14 +796,18 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ + multiple TOCs are present), followed by an array of 8-byte addresses. + + The 8-byte header value is undefined when all linked compilation units + are PC-relative. + + The link editor shall emit dynamic relocations as appropriate for each + entry in the GOT. At runtime, the dynamic linker will apply these + relocations after the addresses of all memory segments are known (and + thus the addresses of all symbols). While the GOT may appear to be an + array of absolute addresses, this ABI does not preclude the GOT + containing nonaddress entries and specifies the presence of nonaddress + tls_index entries. Absolute addresses are generated for all GOT relocations by the dynamic linker before giving control to general application code. (However, IFUNC resolution functions may be invoked before relocation is @@ -812,7 +816,10 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - The symbol .TOC. may be used to access the GOT or in TOC-relative + When at least one TOC-based + compilation unit is to be linked, + the + symbol .TOC. may be used to access the GOT or in TOC-relative + addressing to other data constructs, such as the procedure linkage table.
The symbol may be offset by 0x8000 bytes, or another offset, from the start of the .got section. This offset allows the use of the full (64 KB) @@ -826,8 +833,13 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */In PIC code, the TOC pointer r2 points to the TOC base, enabling easy reference. For static nonrelocatable modules, the GOT address is fixed and can be directly used by code. - All functions except leaf routines must load the value of the TOC - base into the TOC register r2. + All functions in TOC-based + compilation units except leaf routines must load the value of + the TOC base into the TOC register r2. + + Functions in PC-relative compilation units access GOT entries directly + using PC-relative addressing. +
Function Addresses @@ -980,12 +992,19 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - + - 3. The caller has not set up r2 to hold the TOC pointer. This + The caller has not set up r2 to hold the TOC pointer. This is indicated by use of a R_PPC64_REL24_NOTOC relocation (instead of R_PPC64_REL24) on the call instruction. + + + The caller is PC-relative and does not need to save the TOC + pointer. [To discuss: Do we need a relocation, or will we have + a module-level bit the linker can detect?] + + In any scenario, the PLT call stub must transfer control to the function whose address is provided in the associated PLT entry. This @@ -1033,6 +1052,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ + + A possible implementation for case 4 looks as follows: + + pld r12, func@plt@got@pcrel(0), 1 +mtctr r12 +bctr + + [To discuss: Is that the right assembly syntax?] + To support lazy binding, the link editor also provides a set of symbol resolver stubs, one for each PLT entry. Each resolver stub consists of a single instruction, which is usually a branch to a common @@ -1103,10 +1131,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ After resolution, the value of a PLT entry in the PLT is the - address of the function’s global entry point, unless the resolver can - determine that a module-local call occurs with a shared TOC value wherein - the TOC is shared between the caller and the callee. - + address of the function’s global entry point, unless the resolver + can determine that a module-local call occurs with a shared TOC value + wherein the TOC is shared between the caller and the + callee, + or a module-local call occurs in a + PC-relative compilation unit. [?]