From 87dccbeaf5c8822bae9cf44aba3914685ca2d7af Mon Sep 17 00:00:00 2001 From: Bill Schmidt Date: Thu, 10 May 2018 15:37:52 -0500 Subject: [PATCH] Changes for fourth draft of PC-relative changes. Signed-off-by: Bill Schmidt --- specification/bk_main.xml | 13 +- specification/ch_1.xml | 24 +- specification/ch_2.xml | 440 ++++++++++++++++++----------------- specification/ch_3.xml | 478 +++++++++++++++++++++++++++++--------- 4 files changed, 623 insertions(+), 332 deletions(-) diff --git a/specification/bk_main.xml b/specification/bk_main.xml index fc2cddb..7ec589f 100644 --- a/specification/bk_main.xml +++ b/specification/bk_main.xml @@ -57,7 +57,7 @@ Freescale Semiconductor, Inc - Revision 1.5c draft + Revision 1.5d draft OpenPOWER @@ -93,6 +93,17 @@ + + 2018-05-10 + + + + Revision 1.5d: PC-relative addressing fourth + draft. + + + + 2018-04-28 diff --git a/specification/ch_1.xml b/specification/ch_1.xml index cd198f1..5471be6 100644 --- a/specification/ch_1.xml +++ b/specification/ch_1.xml @@ -181,6 +181,28 @@
Changes from Revision 1.4 - TBD + + + + Errata recorded at https://openpowerfoundation.org/?resource_lib=openpower-elfv2-errata-elfv2-abi-version-1-4 + have been incorporated into this document. + + + + + PowerISA version 3.1 introduces PC-relative instructions for + accessing code and data. Thus compilers and assembly programmers + that target version 3.1 or later can, if desired, avoid usage of + a TOC pointer for such accesses. The ABI has been updated to + describe the implications of this new capability. For specifics, + see , , , , , and . + + +
diff --git a/specification/ch_2.xml b/specification/ch_2.xml index f3aebeb..3e00de3 100644 --- a/specification/ch_2.xml +++ b/specification/ch_2.xml @@ -4086,7 +4086,9 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> protocol requirements for external function calls, and summarizes the protocol requirements for local function calls. Each entry in these - tables is further described in the referenced section. + tables is further described in the referenced section. A program may + contain any combination of the function call protocols in these + tables. Note that this ABI does not define protocols where the caller does not use a TOC pointer, but does preserve r2. It is most efficient when @@ -7899,16 +7901,8 @@ nop address of the called function to be in r12 when a cross-module function call is made. -
- Indirect Function Call (Absolute Medium Model) - - - - - -
- + shows how to make an indirect function call using small-model position-independent code. -
- Small-Model Position-Independent Indirect Function Call - - - - - -
- + shows how to make an indirect function call using large-model position-independent code. -
+ Large-Model Position-Independent Indirect Function Call - - - - - - + + + + + + + + C Code + + + + + Assembly Code + + + + + + + + extern void function( ); +extern void (*ptrfunc) ( ); + + +ptrfunc = function; + + + + + +(*ptrfunc) ( ); + + + + + + + + + + + +.section .text +/* TOC pointer is in r2 */ +addis r9,r2,ptrfunc@got@ha +ld r9,ptrfunc@got@l(r9) +addis r12,r2,function@got@ha +ld r12,function@got@l(r12) +std r12,0(r9) + +addis r9,r2,ptrfunc@got@ha +ld r9,ptrfunc@got@l(r9) +ld r12,0(r9) +std r2,24(r1) +mtctr r12 +bctrl +ld r2,24(r1) + + + + +
shows how to make an indirect function call using PC-relative addressing in a - function that does not preserve r2. [TBD: Formatting] + function that does not preserve r2. @@ -8114,24 +8158,21 @@ bctrl use the same r2 value. This scheme avoids having a compiler generate an overconservative r2 save and restore around every external call. - There are two cases where the caller should not provide a nop after + There are two cases where the caller need not provide a nop after the bl instruction performing a call: - When the caller is not guaranteed to preserve r2 (see - ); or + When the bl instruction is marked with an + R_PPC64_REL24_NOTOC relocation (see + ); or When the callee is in the same compilation unit and is guaranteed to preserve r2. - In the first case, the bl instruction must be marked with an - R_PPC64_REL24_NOTOC relocation. See . For calls to functions resolved at runtime, the linker must generate stub code to load the function address from the PLT. The stub code also must save r2 to 24(r1) unless either the call is marked with an - R_PPC64_REL24_NOTOC relocation as above, or - the call is marked + R_PPC64_REL24_NOTOC relocation, or the call is marked with an R_PPC64_TOCSAVE relocation that points to a nop provided in the caller's prologue. In either case, the stub code can omit the r2 save. @@ -8165,20 +8206,12 @@ bl target shows the model for branch instructions. -
- Branch Instruction Model - - - - - -
- +
Selecting one of multiple branches is accomplished in C with switch statements. An address table is used by the compiler to implement the switch statement selections in cases where the case labels satisfy @@ -8232,17 +8265,8 @@ b .L01 application) loaded into the low or high address range, absolute addressing of a branch table yields the best performance. -
- Absolute Switch Code (Within) for static modules located in low - or high 2 GB of address space - - - - - -
- + A faster variant of this code may be used to locate branch targets in the bottom 2 GB of the address space in conjunction with the lwz instruction in place of the lwa instruction. -
- Absolute Switch Code (Beyond) for static modules beyond the top - or bottom 2 GB of the address space - - - - - -
- + For position-independent code targeted at being dynamically loaded to different address ranges as DSO, the preferred code pattern uses TOC-relative addressing by taking advantage of the fact that the TOC @@ -8381,19 +8398,10 @@ bctr relative offsets from the start address of the branch table ensures position-independence when code is loaded at different addresses. - -
+ Position-Independent Switch Code for Small/Medium Models (preferred, with TOC-relative addressing) - - - - - - - +
For position-independent code targeted at being dynamically loaded to different address ranges as a DSO or a position-independent executable (PIE), the preferred code pattern uses TOC-indirect addresses for code @@ -8457,17 +8466,8 @@ bctr table ensures position independence when code is loaded at different addresses. -
- Position-Independent Switch Code for All Models (alternate, with - GOT-indirect addressing) - - - - - -
- + shows how, in the medium code model, PIC code can be used to avoid using the lwa instruction, which may @@ -8549,7 +8550,7 @@ f1:
shows a switch - implementation for PC-relative compilation units. [TBD: Formatting] + implementation for PC-relative compilation units. @@ -8589,6 +8590,7 @@ default: } + diff --git a/specification/ch_3.xml b/specification/ch_3.xml index 6c64091..eaac5e0 100644 --- a/specification/ch_3.xml +++ b/specification/ch_3.xml @@ -992,7 +992,8 @@ my_func: In the following figure, low24 specifies a 24-bit field taking up - bits 6–29 of a word and maintaining 4-byte alignment. The other bits + bits 6–29 of a word. The 32-bit + word is 4-byte aligned. The other bits remain unchanged. A call or unconditional branch instruction is an example of this field. @@ -2228,10 +2229,9 @@ my_func: - In the following figure, prefix34ds is similar to prefix34, but is - really just 32 bits because the two least-significant bits must be - zero and are not really part of the field. This is used, for example, - by the pld instruction. + In the following figure, prefix32 specifies a 32-bit field taking up + bits 14-31 and 48-61 of a doubleword. The doubleword is 8-byte + aligned. This is used, for example, by the pld instruction. @@ -2274,7 +2274,7 @@ my_func: - prefix34ds + prefix32 @@ -2334,7 +2334,7 @@ my_func: - prefix34ds (continued) + prefix32 (continued) @@ -2370,10 +2370,9 @@ my_func: - In the following figure, prefix34dq is similar to prefix34, but is - really just 31 bits because the three least-significant bits must be - zero and are not really part of the field. This is used, for example, - by the plxv instruction. + In the following figure, prefix31 specifies a 31-bit field taking up + bits 14-31 and 48-60 of a doubleword. The doubleword is 8-byte + aligned. This is used, for example, by the plxv instruction. 
@@ -2416,7 +2415,7 @@ my_func: - prefix34dq + prefix31 @@ -2476,7 +2475,7 @@ my_func: - prefix34dq (continued) + prefix31 (continued) @@ -2512,11 +2511,9 @@ my_func: - In the following figure, prefix28dq specifies a 25-bit field split - between bits 20-31 and 48-60 of a doubleword. The other bits - remain unchanged, and the 25-bit field is assumed to be concatenated - with three zero bits on the right to form a 28-bit offset. This is - used, for example, by the pmlxv instruction. + In the following figure, prefix25 specifies a 25-bit field taking up + bits 20-31 and 48-60 of a doubleword. The doubleword is 8-byte + aligned. This is used, for example, by the pmlxv instruction. @@ -2558,7 +2555,7 @@ my_func: - prefix28dq + prefix25 @@ -2609,7 +2606,7 @@ my_func: - prefix28dq (continued) + prefix25 (continued) @@ -2678,7 +2675,8 @@ my_func: G - Represents the offset from .TOC. at which the address of + Represents the address in + the .TOC. at which the address of the relocation entry’s symbol resides during execution. This implies the creation of a .got section. For more information, see @@ -2694,7 +2692,8 @@ my_func: L - Represents the section offset or address of the procedure + Represents the section + offset or address of the procedure linkage table entry for the symbol. This implies the creation of a .plt section if one does not already exist. It also implies the creation of a procedure linkage table (PLT) entry @@ -2922,15 +2921,6 @@ my_func: tp + tprel = (S + A) - - - pcrel - - - Represents the offset of the symbol being relocated - relative to the current instruction address. - - tlsgd @@ -3205,7 +3195,9 @@ my_func: half16* - G + + G – .TOC. + @@ -3219,7 +3211,9 @@ my_func: half16 - #lo(G) + + #lo(G – .TOC.) + @@ -3233,7 +3227,9 @@ my_func: half16* - #hi(G) + + #hi(G – .TOC.) + @@ -3247,7 +3243,9 @@ my_func: half16* - #ha(G) + + #ha(G – .TOC.) + @@ -3389,7 +3387,9 @@ my_func: half16 - #lo(L) + + #lo(L – .TOC.) 
+ @@ -3403,7 +3403,9 @@ my_func: half16* - #hi(L) + + #hi(L – .TOC.) + @@ -3417,7 +3419,9 @@ my_func: half16* - #ha(L) + + #ha(L – .TOC.) + @@ -3781,7 +3785,10 @@ my_func: half16ds* - G >> 2 + + (G – .TOC.) >> + 2 + @@ -3795,7 +3802,10 @@ my_func: half16ds - #lo(G) >> 2 + + #lo(G – .TOC.) >> + 2 + @@ -3809,7 +3819,10 @@ my_func: half16ds - #lo(L) >> 2 + + #lo(L – .TOC.) >> + 2 + @@ -4633,223 +4646,447 @@ my_func: none - + + + R_PPC64_PLTSEQ + + + 119 + + + none + + + none + + + + + R_PPC64_PLTCALL + + + 120 + + + none + + + none + + + + + R_PPC64_PLTSEQ_NOTOC + + + 121 + + + none + + + none + + + + + R_PPC64_PLTCALL_NOTOC + + + 122 + + + none + + + none + + + + + R_PPC64_PLT16_LO_NOTOC + + + 123 + + + half16 + + + #lo(L – .TOC.) + + + + + R_PPC64_PLT16_HI_NOTOC + + + 124 + + + half16* + + + #hi(L – .TOC.) + + + + + R_PPC64_PLT16_HA_NOTOC + + + 125 + + + half16* + + + #ha(L – .TOC.) + + + + + R_PPC64_PLT16_LO_DS_NOTOC + + + 126 + + + half16ds + + + #lo(L – .TOC.) >> 2 + + + - R_PPC64_IRELATIVE + R_PPC64_PCREL34 - 248 + 127 - doubleword64 + prefix34* - See - . 
+ S + A – P - + - R_PPC64_REL16 + R_PPC64_PCREL32 - 249 + 128 - half16* + prefix32* - S + A – P + (S + A – P) >> 2 - + - R_PPC64_REL16_LO + R_PPC64_PCREL31 - 250 + 129 - half16 + prefix31* - #lo(S + A – P) + (S + A – P) >> 3 - + - R_PPC64_REL16_HI + R_PPC64_PCREL25 - 251 + 130 - half16* + prefix25* - #hi(S + A – P) + (S + A – P) >> 3 - + - R_PPC64_REL16_HA + R_PPC64_GOT_PCREL34 - 252 + 131 - half16* + prefix34* - #ha(S + A – P) + G – P - + - R_PPC64_GNU_VTINHERIT + R_PPC64_GOT_PCREL32 - 253 + 132 - + prefix32* - + (G – P) >> 2 - + - R_PPC64_GNU_VTENTRY + R_PPC64_GOT_PCREL31 - 254 + 133 - + prefix31* - + (G – P) >> 3 - R_PPC64_PCREL34 + R_PPC64_GOT_PCREL25 + + + 134 + + + prefix25* + + + (G – P) >> 3 + + + + + R_PPC64_PCREL_OPT + + + 135 + + + none + + + none + + + + + R_PPC64_PLT_PCREL34 - 256 + 136 prefix34* - @pcrel + S + A – P - R_PPC64_PCREL34_DS + R_PPC64_PLT_PCREL32 - 257 + 137 - prefix34ds* + prefix32* - @pcrel >> 2 + (S + A – P) >> 2 - R_PPC64_PCREL34_DQ + R_PPC64_PLT_PCREL31 - 258 + 138 - prefix34dq* + prefix31* - @pcrel >> 3 + (S + A – P) >> 3 - R_PPC64_PCREL28_DQ + R_PPC64_PLT_PCREL25 - 259 + 139 - prefix28dq* + prefix25* - @pcrel >> 3 + (S + A – P) >> 3 - R_PPC64_GOT_PCREL34 + R_PPC64_PLT_PCREL34_NOTOC - 260 + 140 prefix34* - @got@pcrel + S + A – P - R_PPC64_GOT_PCREL34_DS + R_PPC64_PLT_PCREL32_NOTOC - 261 + 141 - prefix34ds* + prefix32* - @got@pcrel >> 2 + (S + A – P) >> 2 - R_PPC64_GOT_PCREL34_DQ + R_PPC64_PLT_PCREL31_NOTOC - 262 + 142 - prefix34dq* + prefix31* - @got@pcrel >> 3 + (S + A – P) >> 3 - R_PPC64_GOT_PCREL28_DQ + R_PPC64_PLT_PCREL25_NOTOC - 263 + 143 - prefix28dq* + prefix25* - @got@pcrel >> 3 + (S + A – P) >> 3 - + - R_PPC64_PCREL_OPT + R_PPC64_IRELATIVE + + + 248 + + + doubleword64 + + + See + . 
+ + + + + R_PPC64_REL16 + + + 249 + + + half16* - 264 + S + A – P + + + + + R_PPC64_REL16_LO + + + 250 + + + half16 + + + #lo(S + A – P) + + + + + R_PPC64_REL16_HI + + + 251 + + + half16* + + + #hi(S + A – P) + + + + + R_PPC64_REL16_HA + + + 252 + + + half16* + + + #ha(S + A – P) + + + + + R_PPC64_GNU_VTINHERIT + + + 253 + + + + + + + + + + + R_PPC64_GNU_VTENTRY + + + 254 @@ -4961,6 +5198,25 @@ my_func: associated with a global entry point. See for discussion of its use. + R_PPC64_PLTSEQ, R_PPC64_PLTCALL + + These relocations mark the instruction as being part of an inline + PLT call sequence in a function where r2 is a valid TOC pointer. + R_PPC64_PLTCALL is used to mark the call instruction, while + R_PPC64_PLTSEQ is used on other instructions in the sequence that + don't have PLT relocations. All instructions in a given sequence + shall have relocations with the same symbol and addend. Note that + R_PPC64_PLTCALL also implicitly marks the nop or TOC-restoring + instruction immediately following the call instruction. + + R_PPC64_PLTSEQ_NOTOC, + R_PPC64_PLTCALL_NOTOC + + These relocations are like the corresponding R_PPC64_PLTSEQ and + R_PPC64_PLTCALL relocations, but are used in functions where r2 is + not a valid TOC pointer. All instructions in the sequence shall use + _NOTOC variant relocations. + R_PPC64_PCREL_OPT This relocation type requests that the annotated @@ -5023,7 +5279,7 @@ addi 2,2,.TOC.-func@l .quad func@localentry -
+
Assembler- and Linker-Mediated Executable Optimization To optimize object code, the assembler and linker may rewrite object code to implement the function call and return conventions and access to