From 508ef6ce66489aae7a2e3c12810585f4aa3662d5 Mon Sep 17 00:00:00 2001 From: Bill Schmidt Date: Fri, 13 Apr 2018 13:06:47 -0500 Subject: [PATCH] Second draft of PC-relative addressing changes. Signed-off-by: Bill Schmidt --- specification/bk_main.xml | 13 +- specification/ch_1.xml | 4 + specification/ch_2.xml | 444 ++++++---------- specification/ch_3.xml | 1065 ++++++++++++++++++++++++++++++++----- specification/ch_4.xml | 68 +-- 5 files changed, 1121 insertions(+), 473 deletions(-) diff --git a/specification/bk_main.xml b/specification/bk_main.xml index 71c611b..9fd664f 100644 --- a/specification/bk_main.xml +++ b/specification/bk_main.xml @@ -57,7 +57,7 @@ Freescale Semiconductor, Inc - Revision 1.5 draft + Revision 1.5b draft OpenPOWER @@ -93,6 +93,17 @@ + + 2018-04-13 + + + + Revision 1.5b: PC-relative addressing second + draft. + + + + 2018-03-14 diff --git a/specification/ch_1.xml b/specification/ch_1.xml index 4ac5519..1316397 100644 --- a/specification/ch_1.xml +++ b/specification/ch_1.xml @@ -179,4 +179,8 @@ +
+ Changes from release 1.4 + TBD +
diff --git a/specification/ch_2.xml b/specification/ch_2.xml index 7b8cbc2..5668293 100644 --- a/specification/ch_2.xml +++ b/specification/ch_2.xml @@ -4045,70 +4045,6 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> -
- Global Data Addressing Models - This specification provides for two global data - addressing models. The traditional addressing model, which we will call - "TOC-based," relies on a dedicated table-of-contents (TOC) pointer to - obtain the addresses of global data. PowerISA version 3.1 introduces new - "PC-relative" instructions that can be used to obtain the addresses of - global data relative to the current instruction address (CIA). Code that - is targeted to run on hardware compliant with PowerISA 3.1 may make use of - this capability with a "PC-relative" addressing model. - Each compilation unit must adhere entirely to - one addressing model or the other. However, it is expressly possible to - link TOC-based and PC-relative compilation units into a single - executable, or to dynamically link from a compilation unit with one - addressing model to a compilation unit with the other addressing model. - In particular, a PC-relative compilation unit may be linked with an - existing TOC-based library. Note that a "compilation unit" may consist of - hand-written assembly code as well as high-level source code. - Compilers and other tools performing - link-time optimizations that repackage functions into different - compilation units must not mix PC-relative and TOC-based functions in - the same compilation unit. [To discuss: This could be permitted, but - the value is unclear and it would be likely to spawn occasional - linker bugs.] Similarly, programmers should not be allowed to - specify a single function in a TOC-based compilation unit to use the - PC-relative addressing model or vice versa; for example, using GCC's - "#pragma target" syntax. [To discuss: How should this be recorded and - communicated? Perhaps add to e_flags in the ELF header for module - objects only? We can communicate the need for PC-relative PLT stubs - to the linker on calls with a reloc, so the linker may not need this, - but perhaps other tools will?] 
- Details of the two addressing models will be - provided throughout this specification. However, a brief description - of each is in order. -
- TOC-Based Addressing Model - In the traditional TOC-based addressing model, - each function uses register r2 (see ) to access global memory. A variety - of techniques, known as TOC-relative, TOC-indirect, GOT-relative, etc., - may be used to address the global data, but all these techniques use the - TOC pointer r2 as part of the data reference. - With the cooperation of the linker, each - function in a TOC-based compilation unit is responsible for the - establishment and maintenance of its own TOC pointer. All functions - within a compilation unit have the same TOC pointer, so local function - calls may assume it does not change. An external function call may be - resolved to a function in a shared object having a different TOC - pointer, so a caller in a TOC-based compilation unit must save its TOC - pointer prior to making a call outside the compilation unit, and restore - its value upon return before the TOC pointer may be used to access global - data. -
-
- PC-Relative Addressing Model - A function in a PC-relative compilation unit - has no TOC pointer. All accesses to global data are made relative to - the current instruction address. Since functions in TOC-based - compilation units are responsible for establishment and maintenance - of their own TOC pointers, register r2 may be used freely within a - PC-relative compilation unit, with no need to save or restore the - register when modifying it. -
-
Function Calling Sequence The standard sequence for function calls is outlined in this section. @@ -4273,22 +4209,25 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> Nonvolatile - In a TOC-based - compilation unit, register r2 is nonvolatile with - respect to calls between functions in the same compilation - unit. It is saved and restored by code inserted by the linker - resolving a call to an external function. For more - information, see . + Register r2 is nonvolatile with respect to calls + between functions in the same compilation unit when the caller requires a TOC + pointer. It is saved and restored by code inserted + by the linker resolving a call to an external function. For + more information, see and . or Volatile - Register r2 is volatile and available for use in - PC-relative compilation units. + Register r2 is volatile and available for use in a + function whose symbol table entry contains an st_other + field wherein the three most-significant bits have a value + of 001. See + . - TOC pointer for - TOC-based compilation units. + TOC pointer. @@ -4460,8 +4399,7 @@ xml:id="dbdoclet.50655240_pgfId-1156194">   TOC Pointer - Usage (TOC-Based Compilation Units - Only) + Usage As described in , the TOC pointer, r2, is commonly initialized by the global function entry point when a function @@ -4476,14 +4414,19 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> dynamic linker. For references through function pointers, it is the compiler's or assembler programmer's responsibility to insert appropriate TOC save and restore code. If the function is called from - the same module as the callee, the callee must preserve the value of - r2. (See - for a description of function - entry conventions.) - When a function calls another function, the TOC pointer must have - a legal value pointing to the TOC base, which may be initialized as - described in - . + the same module as the callee, the callee must normally preserve the value of r2. 
+ However, if the callee's symbol table + entry is flagged to indicate the callee does not preserve r2, the + caller is responsible for saving and restoring the TOC pointer if it + needs it. (See + for more information.) + When a function calls another function that requires a TOC pointer, the TOC + pointer must have a legal value pointing to the TOC base, which may be + initialized as described in . When global data is accessed, the TOC pointer must be available for dereference at the point of all uses of values derived from the TOC pointer in conjunction with the @l operator. This property is used by @@ -4513,12 +4456,12 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> context. - When a function is entered through its global entry point, + When a function that requires a + TOC pointer is entered through its global entry point, register r12 contains the entry-point address. For more information, see the description of dual entry points in - and - - . + + and .   @@ -5200,16 +5143,8 @@ xml:id="dbdoclet.50655240_pgfId-1156194"> is volatile over a function call.   TOC Pointer Doubleword - If a function in a TOC-based - compilation unit changes the value of the TOC pointer - register, it shall first save it in the TOC pointer doubleword. - The TOC pointer doubleword is reserved - for future use for functions in a PC-relative compilation - unit. [To discuss: This has implications for alloca, as if we - reserve it for future use, then the TOC pointer doubleword must be - copied during a dynamic allocation operation. I suspect it is - better to suffer that slight penalty rarely in order to have the - flexibility to use this for another future purpose.] + If a function changes the value of the TOC pointer register, + it shall first save it in the TOC pointer doubleword.
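The text above relies on r12 holding the global entry-point address when a TOC-requiring function is entered at its GEP. The Python sketch below models the conventional global-entry TOC setup, `addis r2,r12,.TOC.-func@ha` followed by `addi r2,r2,.TOC.-func@l`. It assumes #lo and #ha as given in the relocation notations (#ha(x) = (x + 0x8000) >> 16) and that both instructions sign-extend their 16-bit immediates. `toc_from_gep`, `ha`, and `lo` are hypothetical helper names, not part of any toolchain API.

```python
MASK64 = (1 << 64) - 1


def sext16(v: int) -> int:
    """Interpret the low 16 bits of v as a signed 16-bit immediate."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def lo(x: int) -> int:
    """#lo(x) as consumed by addi: low 16 bits, sign-extended."""
    return sext16(x)


def ha(x: int) -> int:
    """#ha(x) = (x + 0x8000) >> 16, truncated to 16 bits, sign-extended by addis."""
    return sext16((x + 0x8000) >> 16)


def toc_from_gep(r12: int, toc_minus_func: int) -> int:
    """Model 'addis r2,r12,(.TOC.-func)@ha' then 'addi r2,r2,(.TOC.-func)@l'."""
    r2 = (r12 + (ha(toc_minus_func) << 16)) & MASK64  # addis
    r2 = (r2 + lo(toc_minus_func)) & MASK64           # addi
    return r2
```

Because `addi` sign-extends its immediate, #ha rounds up whenever bit 15 of the offset is set. This lets the pair reconstruct any .TOC.-func offset within roughly plus or minus 2 GB.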
Optional Save Areas @@ -6250,6 +6185,20 @@ s6 - 72 (stored) When instructions hold relative addresses, a program library can be loaded at various positions in virtual memory and is referred to as a position-independent code model. + When generating code for PowerISA version 3.1 + or above, this specification provides two ways to address non-local data + and text. The historical method relies on a dedicated table-of-contents + (TOC) pointer to obtain such addresses. PowerISA version 3.1 introduces + new "PC-relative" instructions that can be used to obtain such + addresses relative to the current instruction address (CIA). Both + methods may be used in the same executable, dynamically shared + object (DSO), object file, or even in the same function. If a + function does not require a TOC pointer for addressing, it is not required + to establish this pointer in register r2, and may choose not to preserve + register r2's value provided that the function's symbol table entry is + appropriately annotated. Full details of function call linkage + requirements are provided in .
Code Model Overview Executable modules can be built to use either position-dependent or @@ -6312,9 +6261,9 @@ lvx v1, 0, r12 - pld r12, symbol@pcrel(0), 1 + pld r12, symbol@pcrel -plvx v1, symbol@pcrel(0), 1 +plxv v1, symbol@pcrel In the OpenPOWER ELF V2 ABI, position-dependent code built with this addressing scheme may have a Global Offset Table (GOT) in the data segment that holds addresses. (For more information, see @@ -6355,7 +6304,7 @@ plvx v1, symbol@pcrel(0), 1 references and TOC-pointer initializations can be performed using a two-instruction sequence. - PC-relative offsets are always 34 bits for all code models, with + PC-relative offsets are usually 34 bits for all code models, with a maximum addressing reach of 16GB. The effective addressing reach for global data is 8GB, since data sections are always located at higher virtual addresses than text sections. @@ -6425,9 +6374,9 @@ lvx v1, 0, r12 private data). - pld r12, symbol@pcrel(0), 1 + pld r12, symbol@pcrel -plvx v1, symbol@pcrel(0), 1 +plxv v1, symbol@pcrel By using PC-relative GOT-indirect @@ -6435,10 +6384,10 @@ plvx v1, symbol@pcrel(0), 1 - pld r12, symbol@got@pcrel(0), 1 + pld r12, symbol@got@pcrel ld r12, 0(r12) -pld r12, symbol@got@pcrel(0), 1 +pld r12, symbol@got@pcrel lvx v1, 0, r12 A compiler may generate a PC-relative addressing sequence to access @@ -6450,16 +6399,6 @@ lvx v1, 0, r12 the data reference is satisfied at static link time. See . - [To discuss: I'd like to see the assembler - support "pld r12, symbol@pcrel" as an alternative to "pld r12, - symbol@pcrel(0), 1", and "pld r12, symbol@got@pcrel" as an - alternative to "pld r12, symbol@got@pcrel(0), 1". In general, any - prefix load/store with only two arguments is PC-relative; the - second argument is either a 34-bit offset or a GPR. Is this - reasonable or too confusing? Another alternative would be "pld r12, - symbol@pcrel(cia)" for an offset, and "pld r12, r5, cia" for the - GPR case. 
I guess we want something readable that isn't too - complex for the assembler to sort out.] Position-independent executables or shared objects have a GOT in the data segment that holds addresses. When the system creates a memory image from the file, the GOT entries are updated to reflect the @@ -6477,11 +6416,11 @@ lvx v1, 0, r12
Code Models - TOC-Based Compilation - Units Compilers may provide different code models depending on the expected size of the TOC and the size of the entire executable or - shared library. + shared library. Assuming that the + TOC pointer is used to address data and/or text, the following + considerations apply: Small code model: The TOC is accessed using 16-bit offsets @@ -6524,52 +6463,26 @@ lvx v1, 0, r12 TOCs, or by some other method. The suggested allocation order of sections is provided in . - PC-Relative Compilation - Units - Compilers may provide different code models depending on the size of - the entire executable or shared library. There is no small code - model for PC-relative compilation units. - - - - - Medium code model: Accesses to module-local code and data objects - use PC-relative addressing with 34-bit offsets. - Position-independent code uses PC-relative GOT-indirect - addressing to access other objects in the binary. - - - - - Large code model: Used when 34-bit offsets are insufficient to - reach global data or the GOT from at least one text section, - this is similar to the medium code model, except that up to - 64-bit PC-relative offsets are used by generating them into a - register. [To discuss: None of the options for this seem ideal. - It takes about 5 instructions to generate a 64-bit constant into - a register, though we can perhaps use linker optimizations to - replace with a smaller sequence when available. A second choice - is to place the offset in a .quad in the text section to reach - the .got entry, but this would incur a load-load dependency. - (Are there cases where this requires a text relocation resolution - during dynamic linking?) A third choice is to fail the compile - and require TOC addressing with large code model when 34-bit - offsets aren't enough, though that doesn't initially seem - reasonable. Whatever we choose, we should document the sequence - and any associated linker optimizations.] 
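- - - - - Medium code model: Accesses to module-local code and data objects - use PC-relative addressing with 34-bit offsets. - Position-independent code uses PC-relative GOT-indirect - addressing to access other objects in the binary. - - - - - Large code model: Used when 34-bit offsets are insufficient to - reach global data or the GOT from at least one text section, - this is similar to the medium code model, except that up to - 64-bit PC-relative offsets are used by generating them into a - register. [To discuss: None of the options for this seem ideal. - It takes about 5 instructions to generate a 64-bit constant into - a register, though we can perhaps use linker optimizations to - replace with a smaller sequence when available. A second choice - is to place the offset in a .quad in the text section to reach - the .got entry, but this would incur a load-load dependency. - (Are there cases where this requires a text relocation resolution - during dynamic linking?) A third choice is to fail the compile - and require TOC addressing with large code model when 34-bit - offsets aren't enough, though that doesn't initially seem - reasonable. Whatever we choose, we should document the sequence - and any associated linker optimizations.] - - - - - As with TOC-based compilation units, the medium code model is the - default for compilers, and is applicable to most programs and - libraries. The code examples in this document generally use the - medium code model. + PC-relative addressing may be used in either the small or the + medium code model, and is identical for both. Accesses to + module-local code and data objects use PC-relative addressing with + up to 34-bit offsets. Position-independent code uses PC-relative + GOT-indirect addressing to access other objects in the binary. + If the PC-relative addressing span is insufficient to reach a data + item, that access must either be made relative to the TOC + pointer or use a PC-relative indexed-form instruction. + PC-relative indexed-form instructions provide + up to 64 bits of offset from the current instruction address. + [To discuss: I'm deliberately leaving this flexible for now. + Any concerns? It appears we will probably not see a + load-high-immediate-32 sort of instruction in P10, so we won't + be able to define those kinds of relocs yet.] - When linking PC-relative relocatable objects, the linker should - attempt to place the .got section near the text sections. + When linking objects that contain PC-relative relocations, the + linker should attempt to place the .got section near the text + sections.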
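A signed 34-bit displacement reaches from -8 GB to just under +8 GB around the current instruction address, which is where the 16GB total span quoted earlier comes from. The sketch below treats addresses as plain integers and uses a hypothetical policy function, `choose_access`. It checks whether a data access fits in a PC-relative displacement, and otherwise falls back to the TOC-relative or PC-relative indexed-form alternatives described above.

```python
from typing import Optional, Tuple

PCREL34_MIN = -(1 << 33)       # -8 GB
PCREL34_MAX = (1 << 33) - 1    # +8 GB minus one byte


def pcrel34_offset(cia: int, target: int) -> Optional[int]:
    """Return target - cia if it fits a signed 34-bit displacement, else None."""
    offset = target - cia
    return offset if PCREL34_MIN <= offset <= PCREL34_MAX else None


def choose_access(cia: int, target: int) -> Tuple[str, int]:
    """Pick an addressing strategy for one data access (hypothetical policy)."""
    offset = pcrel34_offset(cia, target)
    if offset is not None:
        return ("pcrel34", offset)
    # Out of reach: use TOC-relative addressing, or a PC-relative
    # indexed-form access with the wider offset held in a register.
    return ("toc-or-indexed", target - cia)
```

Because data sections follow text sections, only the positive half of this range is normally useful for data, giving the 8GB effective reach.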
@@ -6579,50 +6492,13 @@ lvx v1, 0, r12 section.
Function Prologue - The function prologue is responsible for - the following functions: - - - Establishing addressability to global data - - - Creating a stack frame when required - - - Saving any nonvolatile registers that are used by the - function - - - Saving any limited-access bits that are used by the function, - per the rules described in - - - This ABI shall be used in conjunction with - the Power Architecture that implements the - mfocrf architecture level. Further, - OpenPOWER-compliant processors shall implement implementation-defined - bits in a manner to allow the combination of multiple - mfocrf results with an OR instruction; - for example, to yield a word in r0 including all three preserved CRs as - follows: - mfocrf r0, crf2 -mfocrf r1, crf3 -or r0, r0, r1 -mfocrf r1, crf4 -or r0, r0, r1 - Specifically, this allows each - OpenPOWER-compliant processor implementation to set each field to hold - either 0 or the correct in-order value of the corresponding CR field at - the point where the mfocrf - instruction is performed. - TOC-Based Compilation - Units - In a TOC-based compilation unit, - a function's prologue establishes addressability by + A function's prologue establishes addressability by initializing a TOC pointer in register r2, if necessary, and a stack frame, if necessary, and may save any nonvolatile registers it - uses. + uses. Not all functions must initialize + a TOC pointer, and not all functions must preserve the existing value + of r2. See for more + information. All functions have a global entry point (GEP) available to any caller and pointing to the beginning of the prologue. Some functions may have a secondary entry point to optimize the cost of TOC pointer @@ -6636,7 +6512,9 @@ or r0, r0, r1 entry point when the r2 register is known to hold a valid TOC base value. Function pointers shared between modules shall always use the global entry point to specify the address of a function. 
- When a linker causes control to transfer to a global entry point, + When a linker causes control to transfer to a global entry point + of a function that requires a TOC + pointer, it must insert a glue code sequence that loads r12 with the global entry-point address. Code at the global entry point can assume that register r12 points to the GEP. @@ -6653,10 +6531,9 @@ addi r2, r2, .TOC.-func@l form that is faster due to instruction fusion, such as: lis r2, .TOC.@ha addi r2, r2, .TOC.@l - In addition to establishing - addressability, the function prologue + In addition to establishing addressability, the function prologue is responsible for the following functions: - + Creating a stack frame when required @@ -6670,7 +6547,7 @@ addi r2, r2, .TOC.@l - This ABI shall be used in conjunction with + This ABI shall be used in conjunction with the Power Architecture that implements the mfocrf architecture level. Further, OpenPOWER-compliant processors shall implement implementation-defined @@ -6678,12 +6555,12 @@ addi r2, r2, .TOC.@l mfocrf results with an OR instruction; for example, to yield a word in r0 including all three preserved CRs as follows: - mfocrf r0, crf2 + mfocrf r0, crf2 mfocrf r1, crf3 or r0, r0, r1 mfocrf r1, crf4 or r0, r0, r1 - Specifically, this allows each + Specifically, this allows each OpenPOWER-compliant processor implementation to set each field to hold either 0 or the correct in-order value of the corresponding CR field at the point where the mfocrf @@ -6707,14 +6584,6 @@ or r0, r0, r1 the meaning of the second parameter, which is put in the three most-significant bits of the st_other field in the ELF Symbol Table entry. - PC-Relative Compilation - Units - - In a PC-relative compilation unit, the function prologue does not - require any setup code to establish addressability to global data. - Therefore there is also no need for a function to have a separate - local entry point. -
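The mfocrf requirement above works because every field of each result is either 0 or the correct CR value, and ORing such values can never produce a wrong field. The following Python model is a sketch under stated assumptions. The 32-bit CR is numbered with field 0 as the most significant nibble. A hypothetical `impl(n)` callback describes which extra fields a given processor implementation happens to copy when `mfocrf` selects field n. The names `mfocrf_model` and `save_nonvolatile_cr` are illustrative only.

```python
def cr_field_mask(n: int) -> int:
    """Mask for CR field n (0-7) in the 32-bit CR; field 0 is most significant."""
    if not 0 <= n <= 7:
        raise ValueError("CR field must be 0-7")
    return 0xF << (28 - 4 * n)


def mfocrf_model(cr: int, n: int, also_copied: set) -> int:
    """Model 'mfocrf rT, crfN': field n is always copied correctly; fields in
    also_copied are copied too; every other field reads as 0."""
    result = cr & cr_field_mask(n)
    for f in also_copied:
        result |= cr & cr_field_mask(f)
    return result


def save_nonvolatile_cr(cr: int, impl) -> int:
    """mfocrf crf2; mfocrf crf3; or; mfocrf crf4; or (as in the prologue text)."""
    r0 = 0
    for n in (2, 3, 4):
        r0 |= mfocrf_model(cr, n, impl(n))
    return r0
```

Whatever `impl` returns, fields 2 through 4 of the combined word equal the live CR, which is the property the prologue relies on.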
Function Epilogue @@ -7438,12 +7307,12 @@ ptr = &dst; .extern dst .extern ptr .section ".text" -plwz r9, src@pcrel(0), 1 -pstw r9, dst@pcrel(0), 1 -paddi r11, 0, dst@pcrel, 1 -pstd r11, ptr@pcrel(0), 1 -pld r11, ptr@pcrel(0), 1 -plwz r9, src@pcrel(0), 1 +plwz r9, src@pcrel +pstw r9, dst@pcrel +paddi r11, dst@pcrel +pstd r11, ptr@pcrel +pld r11, ptr@pcrel +plwz r9, src@pcrel stw r9, 0(r11) @@ -7467,8 +7336,8 @@ stw r9, 0(r11) a signed 32-bit offset from a base register. - For a PIC code (see - and + For TOC-based PIC + code (see and ), the offset in the Global Offset Table where the value of the symbol is stored is given by the assembly syntax symbol@got. This syntax represents the @@ -7611,8 +7480,8 @@ nop - For a function call in a PC-relative compilation unit, the nop in - should not be generated. + For a function call in a function that does not preserve r2, the nop in + need not be generated. For indirect function calls, the address of the function to be called is placed in r12 and the CTR register. A bctrl instruction is used @@ -7688,9 +7557,6 @@ bctrl shows how to make an indirect function call using small-model position-independent code. - Note that the store and reload of the - TOC pointer r2 is not required in a PC-relative compilation - unit.
@@ -7762,9 +7628,6 @@ ld r2,24(r1) shows how to make an indirect function call using large-model position-independent code. - Note that the store and reload of the - TOC pointer r2 is not required in a PC-relative compilation - unit.
@@ -7776,8 +7639,14 @@ ld r2,24(r1)
- - TOC-Based Compilation Units - Function calls need to be performed in conjunction with + + Function calls often + need to be performed in conjunction with establishing, maintaining, and restoring addressability through the TOC pointer register, r2. When a function is called, the TOC pointer register - may be modified. The caller must provide a nop after the bl instruction - performing a call, if r2 is not known to have the same value in the - callee. This is generally true for external calls. The linker will - replace the nop with an r2 restoring instruction if the caller and callee - use different r2 values, The linker leaves it unchanged if they use the - same r2 value. This scheme avoids having a compiler generate an + may be modified. In many cases, + the caller must provide a nop + after the bl instruction performing a call, if r2 is not known to have + the same value in the callee. This is generally true for external calls. + The linker will replace the nop with an r2 restoring instruction if the + caller and callee use different r2 values. The linker leaves it unchanged if they + use the same r2 value. This scheme avoids having a compiler generate an overconservative r2 save and restore around every external call. + + There are two cases where the caller should not provide a nop after + the bl instruction performing a call: + + When the caller is not guaranteed to preserve r2 (see + ); or + When the callee is in the same compilation unit and + is guaranteed to preserve r2. + + In both cases, the bl instruction must be marked with an + R_PPC64_REL24_NOTOC relocation. + For calls to functions resolved at runtime, the linker must generate stub code to load the function address from the PLT. 
- The stub code also must save r2 to 24(r1) unless the call is marked + The stub code also must save r2 to 24(r1) unless + either the call is marked with an + R_PPC64_REL24_NOTOC relocation as above, or + the call is marked with an R_PPC64_TOCSAVE relocation that points to a nop provided in the - caller's prologue. In that case, the stub code can omit the r2 save. - Instead, the linker replaces the prologue nop with an r2 save. + caller's prologue. In either + case, the stub code can omit the r2 save. + In the latter case, + the linker replaces the prologue nop with an r2 save. tocsaveloc: nop ... @@ -7868,19 +7747,6 @@ bl target , , and . - PC-Relative Compilation - Units - - As with TOC-based compilation units, for calls to functions resolved at - runtime, the linker must generate stub code to load the function - address from the PLT. When the stub code is generated on behalf of - an indirect call in a PC-relative compilation unit, the linker may - omit the save and restore of r2 from the stub code. This behavior - is optional but recommended. Calls in PC-relative code should not - be marked with the R_PPC64_TOCSAVE or R_PPC64_REL24_NOTOC relocations. - [To discuss: Do we need a relocation to identify this as a PC-relative - call?] -
Branching @@ -8277,12 +8143,7 @@ f1: shows a switch - implementation for PC-relative compilation units. [TBD: This needs to - be a figure, not a table, which may require working with Annette and - FrameMaker to get something that looks similar to the other figures. - All we have in the document for the other figures is .png files from - the old FrameMaker version. Or maybe we should just convert all the - other figures to tables.] + implementation for PC-relative compilation units. [TBD: Formatting] @@ -8328,7 +8189,7 @@ default: cmplwi r12, 4 bge .Ldefault slwi r12, 2 - paddi r10, r0, .Ltab@pcrel, 1 + paddi r10, .Ltab@pcrel lwax r8, r10, r12 add r10, r8, r10 mtctr r10 @@ -8416,11 +8277,6 @@ addi r3,r1,p ; R3 = new data area following parameter save area. - - It is unnecessary to copy the TOC pointer doubleword for a - PC-relative compilation unit. [To discuss: Should we, for future - use of this slot for another purpose?] - Additional instructions will be necessary for an allocation of variable size. If a dynamic deallocation will occur, the r1 stack @@ -8794,6 +8650,10 @@ addi r3,r1,p ; R3 = new data area following parameter save area.. + + [Ignorant question to discuss: Are there any impacts to unwinding from + new r2 preservation rules?] + diff --git a/specification/ch_3.xml b/specification/ch_3.xml index c5ee16d..4d04a29 100644 --- a/specification/ch_3.xml +++ b/specification/ch_3.xml @@ -245,9 +245,7 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations.
TOC - The TOC is part of the data segment of an executable program - built from at least one TOC-based object - file. + The TOC is part of the data segment of an executable program. This section describes a common layout of the TOC in an executable file or shared object. Particular tools are not required to follow the layout specified here. @@ -272,6 +270,11 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. + + [To discuss: Alan, is it appropriate to make any adjustments here + in the presence of PC-relative addressing, to get any sections closer + to .text, or are we as ideal as we can get already?] + The medium code model is expected to provide a sufficiently large TOC to provide all data addressing needs of a module with a single TOC. Compilers may generate two-instruction medium code model references @@ -306,9 +309,7 @@ e_ident[EI_DATA] ELFDATA2LSB For all little-endian implementations. - For TOC-based compilation - units, the OpenPOWER - ABI uses the three most-significant bits in the + The OpenPOWER ABI uses the three most-significant bits in the symbol st_other field to specify the number of instructions between a function's global entry point and local entry point. The global entry point is used when it is necessary to set up the TOC pointer (r2) for the @@ -490,6 +491,427 @@ my_func: optimize the prologue sequence. Nor does the absence of this relocation forbid the linker from optimizing the prologue sequence.
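As a hedged illustration of how a tool might read the three most-significant bits of st_other, the sketch below decodes them into a global-to-local entry distance. It assumes the encoding in the published ELF v2 ABI: values 0 and 1 mean the two entry points coincide, values 2 through 6 place the local entry point (1 << value) bytes after the global one, and value 7 is reserved. The constant names follow the STO_PPC64_LOCAL_BIT and STO_PPC64_LOCAL_MASK definitions in glibc's elf.h, and the arithmetic mirrors its PPC64_LOCAL_ENTRY_OFFSET macro.

```python
STO_PPC64_LOCAL_BIT = 5
STO_PPC64_LOCAL_MASK = 7 << STO_PPC64_LOCAL_BIT  # three most-significant bits


def st_other_local_bits(st_other: int) -> int:
    """Extract the 3-bit local-entry value from an ELF st_other byte."""
    return (st_other & STO_PPC64_LOCAL_MASK) >> STO_PPC64_LOCAL_BIT


def local_entry_offset(st_other: int) -> int:
    """Bytes from the global entry point to the local entry point."""
    bits = st_other_local_bits(st_other)
    if bits == 7:
        raise ValueError("st_other local-entry value 7 is reserved")
    # 0 and 1 -> 0 bytes (single entry point); 2..6 -> (1 << bits) bytes.
    return ((1 << bits) >> 2) << 2
```

For example, a function whose prologue begins with the two-instruction addis/addi TOC setup would carry the value 3: `local_entry_offset(3 << 5) // 4` gives 2 instructions.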
+
+ Function Call Linkage Protocols + + The compiler (or assembly programmer) and linker cooperate to make + function calls as efficient as possible. Different protocols are + required depending on whether a call is local (caller and callee in + the same compilation unit), whether the caller requires r2 to be + preserved, and whether the callee promises to preserve r2. The + "st_other bits" in the caller's and callee's symbol table entries, + described in , are used to + determine information about r2 preservation requirements. + + + A function that does not require a TOC pointer may have its + st_other bits set to 0 or 1, and its local and global entry points + are the same. If its st_other bits are 0, it preserves r2; if + its st_other bits are 1, it does not promise to do so. It is best + that a function with st_other bits set to 0 does not contain any + function calls; see the Note for st_other 0 in + . + + + summarizes the + protocol requirements for external function calls, and + summarizes the + protocol requirements for local function calls. Each entry in these + tables is further described in the referenced section. + +
Protocols for External Function Calls
+ Caller st_other bits | Callee st_other bits | PLT stub   | nop needed? | Relocation          | Section link
+ 1                    | 0–6                  | No r2 save | No          | R_PPC64_REL24_NOTOC | External Call, Nonpreserving Caller
+ 2–6                  | 0–6                  | r2 save    | Yes         | N/A                 | External Call, Preserving Caller
+ Protocols for Local Function Calls
+ Caller st_other bits | Callee st_other bits | Call method    | nop needed? | Relocation          | Section link
+ 1                    | 0–1                  | Local          | No          | R_PPC64_REL24_NOTOC | Local Call, Nonpreserving Caller, Callee Needs No TOC
+ 1                    | 2–6                  | r12 setup stub | No          | R_PPC64_REL24_NOTOC | Local Call, Nonpreserving Caller, Callee Requires TOC
+ 2–6                  | 0                    | Local          | No          | R_PPC64_REL24_NOTOC | Local Call, Preserving Caller, Preserving Callee
+ 2–6                  | 1                    | r2 save stub   | Yes         | N/A                 | Local Call, Preserving Caller, Nonpreserving Callee
+ 2–6                  | 2–6                  | Local          | No          | R_PPC64_REL24_NOTOC | Local Call, Preserving Caller, Preserving Callee
+
+ External Call, Preserving Caller + + When a function that preserves r2 makes any call to an external + function, the compiler generates a nop instruction after the bl + instruction for the call. The linker generates a procedure linkage + table (PLT) stub that saves r2, and replaces the nop instruction with + a restore of r2. If the callee requires a TOC, the PLT stub also + includes code to place the callee's global entry point into r12. + See for a full + description of PLT stubs. + +
+
+ External Call, Nonpreserving Caller + + When a function that does not preserve r2 makes any call to an + external function, the compiler does not generate a nop instruction + after the bl instruction for the call. Instead, the compiler + annotates the bl instruction with an R_PPC64_REL24_NOTOC + relocation. The linker generates a PLT stub that does not include + a save of r2. If the callee requires a TOC, the PLT stub also + includes code to place the callee's global entry point into r12. + +
+
+ Local Call, Nonpreserving Caller, Callee Needs No TOC + + When a function that does not preserve r2 makes a local call to + a function that does not require a TOC pointer, the compiler + generates a direct call to the function's local entry point, and + does not generate a nop instruction after the call. The compiler + annotates the bl instruction with an R_PPC64_REL24_NOTOC relocation. + +
+
Local Call, Nonpreserving Caller, Callee Requires TOC + + When a function that does not preserve r2 makes a local call to + a function that requires a TOC pointer, the compiler does not + generate a nop instruction after the bl instruction for the call, + and annotates the bl instruction with an R_PPC64_REL24_NOTOC + relocation. The linker generates a PLT stub that does not include a save of r2, + but does include code to place the callee's global entry point into + r12. +
+
+ Local Call, Preserving Caller, Preserving Callee + + When a function that preserves r2 makes a local call to a function + that also preserves r2, the compiler generates a direct call to the + function's local entry point, and does not generate a nop + instruction after the call. The compiler annotates the bl + instruction with an R_PPC64_REL24_NOTOC relocation. + +
+
+ Local Call, Preserving Caller, Nonpreserving Callee + + When a function that preserves r2 makes a local call to a function + that does not preserve r2, the compiler generates a nop instruction + after the call. The linker generates a PLT stub that saves r2, but + does not include code to place the callee's global entry point into + r12, and replaces the nop instruction with a restore of r2. + +
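The two protocol tables and the subsections above can be condensed into a single decision function. The Python below is a sketch, not a linker implementation. `call_protocol` and `CallProtocol` are hypothetical names, and st_other values are the 3-bit local-entry field. The tables list only caller values 1 and 2–6. This sketch assumes a caller value of 0 behaves like other r2-preserving callers, following the prose of the subsections.

```python
from dataclasses import dataclass
from typing import Optional

NOTOC = "R_PPC64_REL24_NOTOC"


@dataclass(frozen=True)
class CallProtocol:
    method: str                # how the bl reaches the callee
    nop_needed: bool           # nop after bl, later replaced by an r2 restore
    relocation: Optional[str]  # relocation the compiler attaches to the bl


def requires_toc(bits: int) -> bool:
    return 2 <= bits <= 6


def preserves_r2(bits: int) -> bool:
    # 1 means "does not promise to preserve r2"; 0 (assumed) and 2-6 preserve it.
    return bits != 1


def call_protocol(local: bool, caller: int, callee: int) -> CallProtocol:
    for bits in (caller, callee):
        if not 0 <= bits <= 6:
            raise ValueError("st_other local-entry bits must be 0-6")
    if not local:
        if not preserves_r2(caller):
            return CallProtocol("PLT stub, no r2 save", False, NOTOC)
        return CallProtocol("PLT stub, r2 save", True, None)
    if not preserves_r2(caller):
        if requires_toc(callee):
            return CallProtocol("r12 setup stub", False, NOTOC)
        return CallProtocol("local", False, NOTOC)
    if not preserves_r2(callee):
        return CallProtocol("r2 save stub", True, None)
    return CallProtocol("local", False, NOTOC)
```

In every row of both tables, a nop is needed exactly when a stub saves r2, because the linker turns that nop into the matching r2 restore.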
+
Use of the Small Data Area For a data item in the .sdata or .sbss sections, a compiler may @@ -2069,71 +2491,325 @@ my_func:
[Figure: bit positions 0–15 of the preceding relocation-field diagram (context).]
+ In the following figure, prefix34 specifies a 34-bit field split
+ between bits 14-31 and 48-63 of a doubleword. The other bits
+ remain unchanged. This is used by many PC-relative load and store
+ instructions.
+ [Figure: prefix34. Bits 0–13 unchanged; bits 14–31 prefix34; bits 32–47 unchanged; bits 48–63 prefix34 (continued).]
+ In the following figure, prefix34ds is similar to prefix34, but is
+ effectively only 32 bits wide because the two least-significant bits
+ must be zero and are not part of the field. This is used, for example,
+ by the pld instruction.
+ [Figure: prefix34ds. Bits 0–13 unchanged; bits 14–31 prefix34ds; bits 32–47 unchanged; bits 48–61 prefix34ds (continued); bits 62–63 not part of the field.]
+ In the following figure, prefix34dq is similar to prefix34, but is
+ effectively only 31 bits wide because the three least-significant bits
+ must be zero and are not part of the field. This is used, for example,
+ by the plxv instruction.
[Figure: prefix34dq, bits 0–13 unchanged, bits 14–31 field; prefix34dq (continued), bits 32–47 unchanged, bits 48–60 field, bits 61–63 unchanged] @@ -2232,37 +2938,29 @@ my_func: - In the following figure, prefix34ds is similar to prefix34, but is - really just 32 bits because the two least-significant bits must be - zero and are not really part of the field. This is used, for example, - by the pldu instruction. In addition to the use of this relocation - field with the DS forms, prefix34ds relocations are also used in - conjunction with DQ forms, such as the plq instruction. In those - instances, the linker and assembler collaborate to create valid DQ - forms. They raise an error if the specified offset does not meet the - constraints of a valid DQ instruction form displacement. + In the following figure, prefix28dq specifies a 25-bit field split + between bits 20-31 and 48-60 of a doubleword. The other bits + remain unchanged, and the 25-bit field is assumed to be concatenated + with three zero bits on the right to form a 28-bit offset. This is + used, for example, by the pmlxv instruction. [Figure: prefix28dq, bits 0–19 unchanged, bits 20–31 field; prefix28dq (continued), bits 32–47 unchanged, bits 48–60 field, bits 61–63 unchanged] @@ -2382,12 +3071,6 @@ my_func:
Relocation Notations The following notations are used in the relocation table. - - [There seem to be a number of missing notations in this table. We - have #higher[a], #highest[a], and got, and perhaps the @ notation - could use further description. Also, there is some usage of #high and - #higha instead of #hi and #ha, which I assume is a mistake.] - @@ -2525,7 +3208,8 @@ my_func: #hi(value) - Denotes bits 16–63 of the indicated value. That + Denotes bits 16–31 of the indicated value. That is: #hi(x) = x >> 16 @@ -2535,12 +3219,57 @@ my_func: #ha(value) - Denotes the high adjusted value: bits 16–63 of the + Denotes the high adjusted value: bits + 16–31 of the indicated value, compensating for #lo( ) being treated as a signed number. That is: #ha(x) = (x + 0x8000) >> 16 + + + #higher(value) + + + Denotes bits 32–47 of the indicated value. That + is: + #higher(x) = x >> 32 + + + + + #highera(value) + + + Denotes the higher adjusted value: bits 32–47 + of the + indicated value, compensating for #hi( ) being treated as a + signed number. That is: + #highera(x) = (x + 0x80000000) >> 32 + + + + + #highest(value) + + + Denotes bits 48–63 of the indicated value. That + is: + #highest(x) = x >> 48 + + + + + #highesta(value) + + + Denotes the highest adjusted value: bits 48–63 + of the + indicated value, compensating for #higher( ) being treated as a + signed number. That is: + #highesta(x) = (x + 0x800000000000) >> 48 + + TP @@ -4206,7 +4935,7 @@ my_func: half16 - #high(S + A) + #hi(S + A) @@ -4220,7 +4949,7 @@ my_func: half16 - #higha(S + A) + #ha(S + A) @@ -4234,7 +4963,9 @@ my_func: half16 - #high(@tprel) + + #hi(@tprel) + @@ -4248,7 +4979,9 @@ my_func: half16 - #higha(@tprel) + + #ha(@tprel) + @@ -4262,7 +4995,9 @@ my_func: half16 - #high(@dtprel) + + #hi(@dtprel) + @@ -4276,7 +5011,9 @@ my_func: half16 - #higha(@dtprel) + + #ha(@dtprel) + @@ -4426,10 +5163,10 @@ my_func: R_PPC64_PCREL34 - 256? 
+ 256 - prefix34 + prefix34* @pcrel @@ -4440,7 +5177,7 @@ my_func: R_PPC64_PCREL34_DS - 257? + 257 prefix34ds* @@ -4449,15 +5186,43 @@ my_func: @pcrel >> 2 + + + R_PPC64_PCREL34_DQ + + + 258 + + + prefix34dq* + + + @pcrel >> 3 + + + + + R_PPC64_PCREL28_DQ + + + 259 + + + prefix28dq* + + + @pcrel >> 3 + + R_PPC64_GOT_PCREL34 - 258? + 260 - prefix34 + prefix34* @got@pcrel @@ -4468,7 +5233,7 @@ my_func: R_PPC64_GOT_PCREL34_DS - 259? + 261 prefix34ds* @@ -4477,12 +5242,40 @@ my_func: @got@pcrel >> 2 + + + R_PPC64_GOT_PCREL34_DQ + + + 262 + + + prefix34dq* + + + @got@pcrel >> 3 + + + + + R_PPC64_GOT_PCREL28_DQ + + + 263 + + + prefix28dq* + + + @got@pcrel >> 3 + + R_PPC64_PCREL_OPT - 260? + 264 @@ -4494,11 +5287,6 @@ my_func: - - [To discuss: Assuming we build up 64-bit PC-relative offsets into a - register using shifts/adds, we'll need the #lo, #ha, #higher[a], - #highest[a] relocs to be defined also.] -
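The PC-relative relocations in this table (R_PPC64_PCREL34 and its _DS, _DQ, and 28_DQ variants) combine a value computation with one of the prefix field layouts. The following is a minimal sketch of how a link editor might apply them, not a definitive implementation. It assumes that @pcrel means S + A - P, that a prefixed instruction is modeled as one 64-bit value with the prefix word in the high half and ISA bit 0 as its most-significant bit (as in the field figures), and that the 28-bit prefix28dq offset is signed; byte order in memory is not modeled. The names apply_pcrel and enum field are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layouts from the figures (hypothetical names). */
enum field { PREFIX34, PREFIX34DS, PREFIX34DQ, PREFIX28DQ };

/* Sketch: compute @pcrel = S + A - P (assumed meaning) and insert it into
 * the prefixed-instruction doubleword *insn. Returns false, leaving *insn
 * untouched, if the offset is misaligned or out of range. */
static bool apply_pcrel(uint64_t *insn, enum field f,
                        uint64_t S, int64_t A, uint64_t P)
{
    assert(insn);
    /* Number of implied zero bits: 0 for prefix34, 2 for _DS, 3 for _DQ. */
    unsigned shift = (f == PREFIX34) ? 0 : (f == PREFIX34DS) ? 2 : 3;
    /* Width of the full signed offset, including the implied zero bits. */
    unsigned bits = (f == PREFIX28DQ) ? 28 : 34;
    int64_t pcrel = (int64_t)(S + (uint64_t)A - P);
    int64_t limit = INT64_C(1) << (bits - 1);

    if (pcrel & ((INT64_C(1) << shift) - 1))
        return false;                  /* implied low bits must be zero */
    if (pcrel < -limit || pcrel >= limit)
        return false;                  /* does not fit the signed field */

    uint64_t v = (uint64_t)pcrel;
    /* Upper part: ISA bits 14-31 (20-31 for prefix28dq), which are
     * bit positions 32-49 (32-43) counting from the least-significant bit. */
    uint64_t hi_mask = (f == PREFIX28DQ) ? 0xFFFu : 0x3FFFFu;
    uint64_t hi = (v >> 16) & hi_mask;
    /* Lower part: ISA bits 48-63 minus the implied-zero bits, which keep
     * whatever the instruction already had there. */
    uint64_t lo_mask = 0xFFFFu & ~((UINT64_C(1) << shift) - 1);
    uint64_t lo = v & lo_mask;

    *insn &= ~((hi_mask << 32) | lo_mask);   /* other bits unchanged */
    *insn |= (hi << 32) | lo;
    return true;
}
```

Placing the unshifted offset bits directly is equivalent to storing @pcrel >> 2 (or >> 3) in the narrower _DS and _DQ fields, because those fields start exactly where the implied zero bits end.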
Relocation Descriptions @@ -4583,10 +5371,16 @@ my_func: R_PPC64_REL24_NOTOC This relocation type is used to specify a function call where the TOC pointer is not initialized. It is similar to R_PPC64_REL24 in that it - specifies a symbol to be resolved. However, if the symbol is resolved by - inserting a call to a PLT stub code, the PLT stub code must not rely on - the presence of a valid TOC base address in TOC register r2 to reference - the PLT function table. + specifies a symbol to be resolved. If the + symbol resolves to a function that requires a TOC pointer (as + determined by st_other bits), then a link editor must arrange for the + call to be via the global entry point of the called function. If + the symbol is resolved by + inserting a call to a PLT stub, the PLT stub code must + not rely on the presence of + a valid TOC base address in TOC + register r2 to reference + the PLT function table. R_PPC64_ENTRY This relocation type may optionally be associated with a global entry point. See @@ -4595,7 +5389,7 @@ my_func: R_PPC64_PCREL_OPT This relocation type requests that the annotated load or store - instruction and its immediately preceding instruction be optimized by + instruction and its immediately following instruction be optimized by the linker when the referenced symbol can be statically resolved. See for details. @@ -4661,15 +5455,16 @@ addi 2,2,.TOC.-func@l requirements as indicated in this section.
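For R_PPC64_REL24_NOTOC, the link editor decides from the callee's st_other bits whether the callee needs a TOC pointer. The sketch below decodes the local-entry field (bits 5-7 of st_other) into a byte offset between the global and local entry points. Treating a nonzero offset as "requires a TOC pointer" is an assumption of this sketch, based on the global entry point being the code that sets up r2; the helper names are hypothetical, and the meaning of field values 0, 1, and 7 is not modeled beyond yielding no offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local-entry field in bits 5-7 of st_other. These names are local to
 * this sketch; glibc's <elf.h> calls them STO_PPC64_LOCAL_BIT and
 * STO_PPC64_LOCAL_MASK. */
#define LOCAL_ENTRY_SHIFT 5
#define LOCAL_ENTRY_MASK  (7u << LOCAL_ENTRY_SHIFT)

/* Bytes from the global to the local entry point encoded in st_other:
 * field values 0 and 1 give 0; values 2-6 give 4, 8, 16, 32, 64. */
static unsigned local_entry_offset(uint8_t st_other)
{
    unsigned v = (st_other & LOCAL_ENTRY_MASK) >> LOCAL_ENTRY_SHIFT;
    assert(v != 7);                    /* treated as reserved here */
    return ((1u << v) >> 2) << 2;
}

/* Heuristic (assumption): a nonzero offset means the global entry point
 * sets up r2, so a caller without a valid TOC pointer must enter there. */
static bool callee_requires_toc(uint8_t st_other)
{
    return local_entry_offset(st_other) != 0;
}
```

The low bits of st_other (symbol visibility) are masked off, so they do not affect the result.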
Function Call - For TOC-based compilation - units, the + When present, + the static linker must modify a nop instruction after a bl function call to restore the TOC pointer in r2 from 24(r1) when an external symbol that may use the TOC may be called, as in . - TOC-based - object files must contain a - nop slot after a bl instruction to an external symbol. + A function must contain a + nop slot after a bl instruction to an external symbol + unless the bl instruction is annotated with + an R_PPC64_REL24_NOTOC relocation.
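The nop-slot rule above can be sketched as a small fixup routine. It assumes the word after the bl is available as a 32-bit instruction value and uses the standard encodings of nop (ori r0,r0,0, 0x60000000) and of the TOC restore ld r2,24(r1) (0xE8410018). The function name and its two flags are hypothetical stand-ins for information a real link editor would derive from the relocation type and the resolved symbol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INSN_NOP        0x60000000u  /* nop (ori r0,r0,0) */
#define INSN_LD_R2_24R1 0xE8410018u  /* ld r2,24(r1): restore TOC pointer */

/* Hypothetical fixup for the instruction word following a bl to an
 * external symbol. Returns false if a required nop slot is missing. */
static bool fixup_call_slot(uint32_t *slot, bool has_notoc_reloc,
                            bool callee_may_use_toc)
{
    assert(slot);
    if (has_notoc_reloc)
        return true;                 /* no nop slot required; leave word alone */
    if (*slot != INSN_NOP)
        return false;                /* caller lacks the required nop slot */
    if (callee_may_use_toc)
        *slot = INSN_LD_R2_24R1;     /* reload r2 from 24(r1) after the call */
    return true;
}
```

When the callee cannot disturb r2, the nop is simply left in place.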
Reference Optimization @@ -4750,33 +5545,25 @@ target: and replace the reference with direct PC-relative addressing. For example: - pld r12, symbol@got@pcrel(0), 1 + pld r12, symbol@got@pcrel lvx v1, 0, r12 The previous sequence may be replaced by: - nop -plvx v1, symbol@pcrel(0), 1 + plxv v1, symbol@pcrel +nop However, this optimization is not universally safe, since it changes the value of r12 following the data reference. The compiler or programmer must ensure that the value of r12 is not subsequently used, and communicate a request for this optimization - by placing a RELOC_PPC64_PCREL_OPT on the second instruction in - the sequence. The compiler or programmer must further ensure that + by placing an R_PPC64_PCREL_OPT relocation on the first instruction + in the sequence. The compiler or programmer must further ensure that the two instructions are not separated by intervening instructions. - [To discuss: This optimization is crucial for making PC-relative - performance good enough to replace TOC-relative addressing. I - thought about allowing the compiler to separate the two instructions, - and place an instruction-distance value in the - RELOC_PPC64_PCREL_OPT relocation field, but ultimately I think this - becomes difficult to implement, and I hope that the load-from-DSO - case is infrequent enough that the load-load dependency won't kill - us. Definitely need other opinions/ideas here.] - - - [To discuss: Can we add optimizations for PC-relative offsets built - for large code model? Only applies if we use shift/add sequences.] + [To discuss: A possible alternative, due to Alan, is to allow the + code to separate but emit "pld".."lvx;nop" and optimize to + "dnop".."plxv". In this case the PCREL_OPT should be placed on + both groups of insns. Should we pursue?]
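The R_PPC64_PCREL_OPT rewrite can be modeled symbolically to show why it is safe to apply in place: the 8-byte pld plus the 4-byte lvx are replaced by an 8-byte plxv plus a 4-byte nop, so code size and the addresses of all following instructions are unchanged, and the plxv sits at the address the pld used for its own PC-relative displacement. This is a hypothetical sketch that models only mnemonics and sizes, not instruction encodings; the struct and function names are illustrative, and further checks a real linker may need (for example, alignment of the target for the prefix34dq field) are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Symbolic instruction: mnemonic and size in bytes (encodings omitted). */
struct insn { const char *op; unsigned size; };

/* Hypothetical model of the PCREL_OPT rewrite. seq[0] is the annotated
 * pld of the GOT entry and seq[1] the dependent lvx. disp is the symbol
 * address minus the address of seq[0], i.e. the new plxv displacement. */
static bool pcrel_opt(struct insn seq[2], bool resolved_statically,
                      int64_t disp)
{
    const int64_t limit = INT64_C(1) << 33;   /* signed 34-bit reach */
    assert(seq);
    if (!resolved_statically || disp < -limit || disp >= limit)
        return false;                          /* keep the GOT load */
    if (strcmp(seq[0].op, "pld") != 0 || strcmp(seq[1].op, "lvx") != 0)
        return false;                          /* pattern not recognized */
    seq[0] = (struct insn){ "plxv", 8 };       /* load the data directly */
    seq[1] = (struct insn){ "nop", 4 };        /* r12 is no longer written */
    return true;
}
```

Because r12 is no longer written after the rewrite, the requirement that r12 is not used afterwards is what makes this transformation valid.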
diff --git a/specification/ch_4.xml b/specification/ch_4.xml index 98b15e5..3b5a9a6 100644 --- a/specification/ch_4.xml +++ b/specification/ch_4.xml @@ -698,20 +698,25 @@ PPC_FEATURE_HAS_VSX 0x00000080 /* P7 Vector Extension. */ PPC_FEATURE_PSERIES_PERFMON_COMPAT 0x00000040 PPC_FEATURE_TRUE_LE 0x00000002 PPC_FEATURE_PPC_LE 0x00000001 + Bit 0x00000004 is reserved for kernel use. + AT_HWCAP2 The a_val member of this entry is a bit map of hardware capabilities. Some bit mask values include: - PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */ -PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */ -PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */ -PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */ -PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */ -PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */ -PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the - Vector.AES category */ -PPC_FEATURE2_HTM_NOSC 0x01000000 -PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */ -PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ + PPC_FEATURE2_ARCH_2_07 0x80000000 /* ISA 2.07 */ +PPC_FEATURE2_HAS_HTM 0x40000000 /* Hardware Transactional Memory */ +PPC_FEATURE2_HAS_DSCR 0x20000000 /* Data Stream Control Register */ +PPC_FEATURE2_HAS_EBB 0x10000000 /* Event Base Branching */ +PPC_FEATURE2_HAS_ISEL 0x08000000 /* Integer Select */ +PPC_FEATURE2_HAS_TAR 0x04000000 /* Target Address Register */ +PPC_FEATURE2_HAS_VCRYPTO 0x02000000 /* The processor implements the + Vector.AES category */ +PPC_FEATURE2_HTM_NOSC 0x01000000 +PPC_FEATURE2_ARCH_3_00 0x00800000 /* ISA 3.0 */ +PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ +PPC_FEATURE2_DARN 0x00200000 /* darn instruction */ +PPC_FEATURE2_SCV 0x00100000 /* scv syscall */ +PPC_FEATURE2_HTM_NO_SUSPEND 0x00080000 /* TM without suspended state */ When a process starts to execute, its stack holds the arguments, environment, and auxiliary vector 
received from the exec call. The system makes no guarantees about the relative arrangement of argument strings, @@ -797,10 +802,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - The 8-byte header value is undefined when all linked compilation units - are PC-relative. - The link editor shall emit dynamic relocations as appropriate for each entry in the GOT. At runtime, the dynamic linker will apply these relocations after the addresses of all memory segments are known (and @@ -816,10 +817,7 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - When at least one TOC-based - compilation unit is to be linked, - the - symbol .TOC. may be used to access the GOT or in TOC-relative + The symbol .TOC. may be used to access the GOT or in TOC-relative addressing to other data constructs, such as the procedure linkage table. The symbol may be offset by 0x8000 bytes, or another offset, from the start of the .got section. This offset allows the use of the full (64 KB) @@ -830,15 +828,15 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - In PIC code, the TOC pointer r2 points to the TOC base, enabling + In PIC code that uses the + TOC, the TOC pointer r2 points to the TOC base, enabling easy reference. For static nonrelocatable modules, the GOT address is fixed and can be directly used by code. - All functions in TOC-based - compilation units except leaf routines must load the value of - the TOC base into the TOC register r2. + All functions except leaf routines must + load the value of the TOC base into the TOC register r2. - Functions in PC-relative compilation units access GOT entries directly - using PC-relative addressing. + Code may access GOT entries directly using PC-relative addressing, + where available.
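The 0x8000 bias on .TOC. exists so that a signed 16-bit displacement from r2 covers the first 64 KB of the GOT. The following minimal sketch checks that reach, assuming .TOC. is placed exactly 0x8000 bytes past the start of .got (the text allows other offsets); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can the GOT entry at address `entry` be reached
 * from the TOC pointer with a signed 16-bit D-form displacement? */
static bool toc_reachable(uint64_t got_start, uint64_t entry)
{
    uint64_t toc = got_start + 0x8000;         /* value of .TOC. */
    int64_t disp = (int64_t)(entry - toc);     /* displacement from r2 */
    return disp >= -0x8000 && disp <= 0x7FFF;  /* fits a signed 16-bit field */
}
```

PC-relative access has no such window: a signed 34-bit displacement reaches from -2^33 to 2^33 - 1 bytes (about 8 GiB in either direction) from the instruction itself.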
@@ -998,13 +996,6 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - - - The caller is PC-relative and does not need to save the TOC - pointer. [To discuss: Do we need a relocation, or will we have - a module-level bit the linker can detect?] - - In any scenario, the PLT call stub must transfer control to the function whose address is provided in the associated PLT entry. This @@ -1053,14 +1044,12 @@ PPC_FEATURE2_HAS_IEEE128 0x00400000 /* VSX IEEE Binary Float 128-bit */ - A possible implementation for case 4 looks as follows: + When PC-relative addressing is available, a simpler variant + may be used instead for cases 2 and 3: - pld r12, func@plt@got@pcrel(0), 1 + pld r12, func@plt@pcrel mtctr r12 bctr - - [To discuss: Is that the right assembly syntax?] - To support lazy binding, the link editor also provides a set of symbol resolver stubs, one for each PLT entry. Each resolver stub consists of a single instruction, which is usually a branch to a common @@ -1133,10 +1122,7 @@ bctr After resolution, the value of a PLT entry in the PLT is the address of the function’s global entry point, unless the resolver can determine that a module-local call occurs with a shared TOC value - wherein the TOC is shared between the caller and the - callee, - or a module-local call occurs in a - PC-relative compilation unit. [?] + wherein the TOC is shared between the caller and the callee.